Represent any character using only Python builtin function calls, no literals
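Python has built-in functions like len(), str(), chr(), etc. (not() counts here too: it's really the not operator applied to an empty tuple, but it reads like a call.)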
This site builds any Unicode character using only those functions: no numbers, no strings, no variables, and only one parameter per call.
Just nested function calls.
For example, chr(max(range(ord(min(str(bytes())))))) evaluates to &.
Try pasting the expression into a Python console (or using print() on it in a Python file).
After you enter a character, the visualize button walks you through the evaluation step by step.
Also (new!): this works for strings too, but for those we allow multiple parameters.
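If you'd rather see the layers than trust the result, here is a small sketch that evaluates the example expression one call at a time, innermost first. The `steps` list and its labels are just illustration, not something the site itself produces.

```python
# Evaluate chr(max(range(ord(min(str(bytes())))))) one layer at a time.
steps = [
    ("bytes()", bytes()),                                   # empty bytes object
    ("str(...)", str(bytes())),                             # its repr as text: b''
    ("min(...)", min(str(bytes()))),                        # smallest character: '
    ("ord(...)", ord(min(str(bytes())))),                   # code point of '
    ("max(range(...))", max(range(ord(min(str(bytes())))))),  # one less
    ("chr(...)", chr(max(range(ord(min(str(bytes()))))))),  # back to a character
]

for label, value in steps:
    print(f"{label:>16} -> {value!r}")
```

The last line printed is the `&` from the example above.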
Right around the whole Among Us era, people started noticing that
chr(sum(range(ord(min(str(not())))))) in Python evaluates to
ඞ, a Unicode character that looks suspiciously like the Among Us crewmate.
This was an amazing discovery. But (unfortunately), I immediately tried to generalize it.
Could any Unicode character (there are ~160,000) be represented like this?
The rules, to be clear (as I made them up):
not().pow(a,b) isn't allowed.
Since we only aim to find the Unicode value of the character and then
apply chr() to it, the struggle is essentially to find a
neat representation for each number 1–160,000.
And the representation MUST be neat, in a sense, since Python won't let you have more than 200 nested parentheses. It seemed like a cool challenge idk
My initial attempt used three tools:
len(bin(len(str(not())))) = 5
sum(range(n)) = n(n-1)/2, to jump up quickly
max(range(n)) = n-1, to step back down
The idea: apply sum(range()) a few times to overshoot your target,
then decrement back down with max(range()).
For example, to get to 6:
5 = len(bin(len(str(not()))))
10 = sum(range(5)) = sum(range(len(bin(len(str(not()))))))
6 = max(range(max(range(max(range(max(range(10))))))))
This algorithm works, but for bigger numbers it eventually can't fit Python's 200 parentheses limit. In this example, to reach 100, you'd have to use sum(range()) thrice on 5 to get 990, and then do 890 decrements! Even if you have many initial seeds, the quadratic growth is much too sparse to reach arbitrarily large numbers within the parentheses budget.
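To put numbers on that, here is a rough sketch that counts how many jumps and decrements the overshoot-then-decrement idea needs. The function name `naive_cost` and the single seed of 5 are assumptions made for illustration; the real attempt may have used more seeds.

```python
def naive_cost(target, seed=5):
    """Return (jumps, decrements) needed to reach target from seed."""
    value, jumps = seed, 0
    while value < target:
        value = value * (value - 1) // 2   # same result as sum(range(value))
        jumps += 1
    return jumps, value - target           # each decrement is one max(range())


jumps, decrements = naive_cost(100)
print(jumps, decrements)                   # 3 jumps (5 -> 10 -> 45 -> 990), 890 decrements

# Every sum(range()) or max(range()) wrapper adds two nested calls,
# on top of whatever the seed itself costs.
print(2 * (jumps + decrements))
```

For a target of 100 that is already far beyond 200 nested calls, which is the whole problem.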
I went looking for formulas to get from a number n to f(n), and found many, but credit to Gemini for finding the key: a direct function that turns n into 3n. This is incredibly useful, because it means we can represent each number n in "O(log n)" parentheses. Basically, it's like writing the number in base 3, except in the opposite direction: we round up and subtract, since we can only subtract and not add.
Two operations are enough:
Subtract 1: max(range(n)) returns n - 1.
range(n) produces 0, 1, ..., n-1. max() picks the last one. Costs 2 parentheses.
Multiply by 3: len(str(list(bytes(n)))) returns exactly 3n.
bytes(n) creates n zero-bytes. list() turns it into [0, 0, ..., 0]. str() gives "[0, 0, ..., 0]", which is always exactly 3n characters for n ≥ 1 (n digits, n-1 two-character separators, and 2 brackets). Costs 4 parentheses.
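Both identities are easy to check by brute force. The helper names `decrement` and `triple` below are made up for this sketch; the bodies are exactly the two expressions described above.

```python
def decrement(n):
    return max(range(n))             # last element of 0, 1, ..., n-1


def triple(n):
    return len(str(list(bytes(n))))  # length of "[0, 0, ..., 0]"


for n in range(1, 500):
    assert decrement(n) == n - 1
    assert triple(n) == 3 * n

print(str(list(bytes(3))))           # [0, 0, 0]
print(triple(0))                     # 2, because "[]" still has its brackets
```

Note the edge case: tripling only holds from n = 1 upward, and max(range(0)) raises ValueError, so neither operation is useful on 0.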
The algorithm: to represent n, decompose it in base 3.
At each step, build ceil(n/3), triple it, then subtract the remainder (0, 1, or 2).
Stop when you reach a base anchor — a small number you can construct
directly (like len(str(not())) = 4).
*note: The only base anchors you actually need are 1 = int(not()) and 0 = int(not(not())).
function build(n):
    if n is a base anchor:
        return the anchor expression
    q = ceil(n / 3)
    r = 3 * q - n                // r is 0, 1, or 2
    expr = triple(build(q))      // multiply by 3
    expr = subtract(expr, r)     // subtract 0, 1, or 2
    return expr
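Here is a minimal runnable version of that pseudocode, assuming only the two anchors from the note (0 and 1). The site uses more anchors, so its output will be shorter; the `ANCHORS` dict and the string-building approach are choices made for this sketch.

```python
ANCHORS = {0: "int(not(not()))", 1: "int(not())"}


def build(n):
    """Return an expression string that evaluates to n."""
    if n in ANCHORS:
        return ANCHORS[n]
    q = -(-n // 3)                                 # ceil(n / 3) in integer arithmetic
    r = 3 * q - n                                  # 0, 1, or 2
    expr = f"len(str(list(bytes({build(q)}))))"    # multiply by 3
    for _ in range(r):
        expr = f"max(range({expr}))"               # subtract 1, r times
    return expr


for n in (2, 13, 100):
    assert eval(build(n)) == n

print(build(13))
print(chr(eval(build(ord("A")))))                  # A
```

The recursion terminates because ceil(n/3) is strictly smaller than n for every n ≥ 2.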
Example: build(13)
You can think about it this way: 13 = 15-2 = 5*3-2 = (3*2-1)*3-2
build(13):
    q = ceil(13/3) = 5, r = 15 - 13 = 2
    13 = triple(build(5)) - 2
    build(5):
        q = ceil(5/3) = 2, r = 6 - 5 = 1
        5 = triple(build(2)) - 1
        build(2):
            2 is a base anchor
            return len(str(ord(min(str(not())))))
working back up:
    build(5) = triple(len(str(ord(min(str(not())))))) - 1
    build(13) = triple(that) - 2
Then wrap the whole thing in chr() to get the character.
Algorithm stats (base-3 only, no optimizations)
Python has a 200 nested parentheses limit. The base-3 algorithm stays well under that for all Unicode code points (max 1,114,111).
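The claim can be checked without building any expression strings, since the nesting depth follows the same recursion as the algorithm. This sketch assumes the minimal anchors 0 = int(not(not())) and 1 = int(not()), and counts not() as one nested call; with more anchors the real numbers would be lower.

```python
MAX_CODE_POINT = 0x10FFFF          # 1,114,111

# depth[n] = nested calls in the base-3 expression for n
depth = [0] * (MAX_CODE_POINT + 1)
depth[0], depth[1] = 3, 2          # int(not(not())) and int(not())

for n in range(2, MAX_CODE_POINT + 1):
    q = -(-n // 3)                 # ceil(n / 3), always smaller than n here
    r = 3 * q - n                  # 0, 1, or 2 decrements
    depth[n] = depth[q] + 4 + 2 * r

worst = max(depth) + 1             # +1 for the outer chr()
print(worst)
assert worst < 200
```

The loop needs at most 13 levels of tripling (3^13 is already larger than 1,114,111), and each level costs at most 8 nested calls, which is why the bound holds with room to spare.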
The base-3 algorithm works for everything but isn't always the shortest.
The idea: for each number in the database, the optimizer asks "could any strategy + some
smaller number produce this more cheaply?" For example, for target 51: the 3x strategy
inverts to 51/3 = 17. If wrapping 17's expression in len(str(list(bytes(...))))
is shorter than what we already have for 51, we replace it.
It does this for every number (0–200,000) and every strategy. Each pass can improve expressions, and improvements cascade — if a later pass finds a shorter way to build 17, everything that depends on 17 (like 51) automatically gets shorter too.
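A toy version of that loop might look like the following. It is only a sketch under several assumptions: a small range (0 to 2,000), four strategies whose formulas and costs come from the tables further down the page, the two minimal anchors, and a plain list instead of a database. The names `cost`, `source`, and `STRATEGIES` are invented here.

```python
LIMIT = 2000       # illustration only; the page covers a much larger range
SIZE = LIMIT + 3   # headroom so 3 * ceil(n / 3) always fits in the table

# (name, forward formula, cost in nested calls)
STRATEGIES = [
    ("3n", lambda n: 3 * n, 4),
    ("4n+3", lambda n: 4 * n + 3, 3),
    ("5n+5", lambda n: 5 * n + 5, 4),
    ("triangular", lambda n: n * (n - 1) // 2, 2),
]

INF = float("inf")
cost = [INF] * SIZE
source = [None] * SIZE             # (strategy name, number it was built from)
cost[0], cost[1] = 3, 2            # anchors int(not(not())) and int(not())

changed = True
while changed:                     # keep making passes until nothing gets shorter
    changed = False
    for n in range(1, SIZE):       # growth strategies: n -> f(n)
        if cost[n] == INF:
            continue
        for name, formula, extra in STRATEGIES:
            m = formula(n)
            if m < SIZE and cost[n] + extra < cost[m]:
                cost[m] = cost[n] + extra
                source[m] = (name, n)
                changed = True
    for n in range(SIZE - 1, 0, -1):   # decrement: n -> n - 1 via max(range(n))
        if cost[n] + 2 < cost[n - 1]:
            cost[n - 1] = cost[n] + 2
            source[n - 1] = ("dec", n)
            changed = True

print(cost[51], source[51])
```

Because each entry only stores a cost and a pointer to the number it came from, a cheaper way to reach 17 lowers the cost of 51 on the next pass without any special handling.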
Available strategies
Exact multipliers
All based on stringifying bytes objects in different ways:
| Expression | Result | Cost |
|---|---|---|
| len(str(list(bytes(n)))) | = 3n | 4 parens |
| len(str(bytes(n))) | = 4n + 3 | 3 parens |
| len(ascii(str(bytes(n)))) | = 5n + 5 | 4 parens |
4n+3 costs only 3 parens (cheaper than 3x), so it's often better when you can land on the right value with the +3 offset.
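The three formulas in the table can be confirmed with a quick brute-force check. The `CHECKS` list and `mismatches` helper are scaffolding for this sketch only.

```python
# (expression as a function of n, closed form it should equal)
CHECKS = [
    (lambda n: len(str(list(bytes(n)))), lambda n: 3 * n),
    (lambda n: len(str(bytes(n))), lambda n: 4 * n + 3),
    (lambda n: len(ascii(str(bytes(n)))), lambda n: 5 * n + 5),
]


def mismatches(limit=300):
    """Return every (check index, n) where expression and formula disagree."""
    return [
        (i, n)
        for i, (expr, formula) in enumerate(CHECKS)
        for n in range(1, limit)
        if expr(n) != formula(n)
    ]


print(mismatches())           # []
print(str(bytes(2)))          # b'\x00\x00' -> 4 characters per byte, plus b''
```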
Zip chain — higher exact multiples via nested tuples
Each zip() wrapper turns each element into a deeper tuple, adding exactly 3n to the string length.
| Expression | Result | Cost |
|---|---|---|
| len(str(list(zip(bytes(n))))) | = 6n | 5 parens |
| len(str(list(zip(zip(bytes(n)))))) | = 9n | 6 parens |
| ...k zips... | = 3(k+1)n | 4+k parens |
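The general 3(k+1)n rule can be checked with a loop that applies zip() k times. The helper name `zip_chain` is invented for this sketch; a for loop stands in for the literal nesting.

```python
def zip_chain(n, k):
    """len(str(list(zip(...zip(bytes(n))...)))) with k nested zip() calls."""
    iterable = bytes(n)
    for _ in range(k):
        iterable = zip(iterable)      # each layer wraps every element in a 1-tuple
    return len(str(list(iterable)))


for k in range(6):
    for n in range(1, 50):
        assert zip_chain(n, k) == 3 * (k + 1) * n

print(str(list(zip(zip(bytes(2))))))  # [((0,),), ((0,),)]
```

Each extra layer adds "(", ",", and ")" around every element, which is where the extra 3n characters come from.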
Ascii exponential — exponential multiplier for linear paren cost
str(bytes(n)) produces backslash escapes like \x00.
Each ascii() call escapes those backslashes again, doubling them.
So the number of backslashes doubles with each wrap, and the amount added to the length doubles along with it:
str(bytes(2)) = "b'\x00\x00'"             len = 11
ascii(that) = "\"b'\\x00\\x00'\""         len = 15
ascii(ascii(that)) = ...                  len = 23
General formula with k layers of ascii(): f(n) = (2^k + 3)n + (2^(k+1) + 1)
| Layers of ascii() | f(n) | Cost |
|---|---|---|
| k=1 (one ascii) | 5n + 5 | 4 parens |
| k=2 (two ascii) | 7n + 9 | 5 parens |
| k=3 | 11n + 17 | 6 parens |
| k=4 | 19n + 33 | 7 parens |
| k=5 | 35n + 65 | 8 parens |
| k=6 | 67n + 129 | 9 parens |
| k=10 | 1027n + 2049 | 13 parens |
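The general formula can be tested directly against CPython's behaviour. The helper `ascii_layers` is a name made up for this sketch, and the loop stands in for writing out k nested ascii() calls.

```python
def ascii_layers(n, k):
    """len(ascii(...ascii(str(bytes(n)))...)) with k nested ascii() calls."""
    text = str(bytes(n))
    for _ in range(k):
        text = ascii(text)            # re-escapes every backslash and inner quote
    return len(text)


def formula(n, k):
    return (2**k + 3) * n + (2 ** (k + 1) + 1)


for k in range(1, 11):
    for n in range(1, 20):
        assert ascii_layers(n, k) == formula(n, k)

print([ascii_layers(2, k) for k in range(3)])   # [11, 15, 23]
```

The constant term grows too because each layer also adds a new pair of outer quotes and escapes the quotes left over from the previous layers.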
Triangular jump — quadratic growth for 2 parens
sum(range(n)) = n(n-1)/2
Full list of strategies (many are useless): strategies.py
Base anchors: anchors.py
If you can think of any I'm missing, submit them on GitHub!
A simple SQLite database stores the shortest known expression for each number from 0 to 200,000. Each entry records which strategy produced it and which smaller number it depends on. So, for example, if formula_5(20) = 50 and we later find a shorter expression for 20, we automatically plug that in and get a shorter representation for 50.
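One way such a table could be laid out is sketched below. The schema, the column names, the `TEMPLATES` dict, and the `render` function are all guesses for illustration, not the project's actual code. The point is that storing a strategy plus a parent, rather than the full expression, is what makes improvements propagate for free.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE expressions (
        n        INTEGER PRIMARY KEY,
        strategy TEXT NOT NULL,      -- which strategy produced n
        parent   INTEGER             -- number it was built from (NULL for anchors)
    )
    """
)

# Hypothetical strategy templates; {} is where the parent's expression goes.
TEMPLATES = {
    "anchor_1": "int(not())",
    "3n": "len(str(list(bytes({}))))",
    "dec": "max(range({}))",
}

rows = [(1, "anchor_1", None), (3, "3n", 1), (2, "dec", 3), (6, "3n", 2), (5, "dec", 6)]
conn.executemany("INSERT INTO expressions VALUES (?, ?, ?)", rows)


def render(n):
    """Assemble the expression for n by following parent links."""
    strategy, parent = conn.execute(
        "SELECT strategy, parent FROM expressions WHERE n = ?", (n,)
    ).fetchone()
    template = TEMPLATES[strategy]
    return template if parent is None else template.format(render(parent))


print(render(5))
assert eval(render(5)) == 5
```

If the row for 2 were later replaced by a shorter recipe, render(5) would pick it up on the next call without touching the rows for 6 or 5.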
Current stats
Strategy breakdown
How many numbers use each strategy as their shortest representation, in the final database.
Optimization history
Each row shows the state of the database after a round of improvements. Minimal = only seeds 0 and 1. Full algorithm = adds 44 base anchors. Optimizer = tries every strategy on every number and keeps the shortest. Deep search = same, but allows up to 10 extra decrements to bridge gaps (instead of 2).
| avg depth (average function calls across all numbers) | max depth (worst-case function calls for any number) | avg len (average expression string length) |
|---|---|---|
So we've got single characters figured out. But what about whole strings?
Python has no builtin concat(a, b). There's no way to join two strings
without operators (+), methods (.join()), or syntax ([]).
(at least, I really don't think so!)
The single-character system works nicely because it produces a linear AST
(abstract syntax tree). Every expression is a straight line: each function wraps the previous one,
flowing in one direction without any branches or merges.
So, methods that take in no arguments would be better, since they at least keep the AST linear.
And you can think of doing x.to_bytes() as basically just doing to_bytes(x). Just looks less nice.
Multiple parameters, on the other hand, introduce branches in the AST :(
The giant integer problem
To keep the AST completely linear for a whole string, that would mean uniquely representing each string as a single number. There is a clear way to do this - write the string as a number in base 256, where each byte is a digit (basically, just write the string out as raw bytes and interpret it as an integer). For example, "hello" as a big-endian integer is about 448 billion.
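For a concrete feel of the mapping, here is a small sketch of the string-to-integer round trip using int.from_bytes and int.to_bytes. These methods are used only to illustrate the encoding; they would of course break the site's own no-methods rules.

```python
text = "hello"

# Interpret the UTF-8 bytes as one big-endian base-256 number.
as_int = int.from_bytes(text.encode("utf-8"), "big")
print(as_int)                       # about 448 billion

# Going back: work out how many bytes the number needs, then decode them.
byte_count = (as_int.bit_length() + 7) // 8
round_trip = as_int.to_bytes(byte_count, "big").decode("utf-8")
print(round_trip)                   # hello
```

Every extra character multiplies the number by 256, so the integer grows exponentially with the length of the string.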
But representing huge numbers with our restricted toolset doesn't scale. CPython physically cannot handle the execution:
Tripling relies on len(str(list(bytes(n)))).
Python physically executes bytes(n). For billion-scale numbers, that means allocating gigabytes of RAM at runtime, which typically ends in a MemoryError.
SyntaxError.
len() is capped at
sys.maxsize (2^63 - 1 on 64-bit builds), and a range() longer than that can't report its length. A 10-character string easily exceeds this, breaking any length-based math tricks.
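The size limit is easy to hit in practice. This sketch turns a 10-character string into an integer (int.from_bytes is used purely for illustration) and shows len() refusing to work with it; the `overflowed` flag only exists so the outcome can be inspected.

```python
import sys

# A 10-character ASCII string is 80 bits, more than sys.maxsize can hold.
big = int.from_bytes("0123456789".encode("utf-8"), "big")
print(big > sys.maxsize)            # True

overflowed = False
try:
    len(range(big))                 # the range object exists, but its length cannot be returned
except OverflowError as exc:
    overflowed = True
    print("len() refuses:", exc)
```

The range object itself is created lazily without trouble; the failure comes from len(), and anything that has to walk the range would simply never finish.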
The linear AST approach dies at ~8 characters. The numbers just get too big :(
I really doubt there's a different way to map between strings and integers in a one-to-one way.
There are other issues with this too: for example, you can't do x = b"ඞ" in Python, you get:
SyntaxError: bytes can only contain ASCII literal characters
So you'd have to build the bytes one by one, which would require multiple parameters!
Also, to even decode the bytes back into a string, you'd need to specify that you are doing str(x, "u8"), which
uses multiple parameters...
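Both obstacles can be reproduced in a few lines. In this sketch the byte values 224, 182, 158 are the UTF-8 encoding of ඞ (U+0D9E), and compile() is used only so the SyntaxError can be caught instead of crashing the script.

```python
# 1. A non-ASCII bytes literal is rejected by the parser.
literal_rejected = False
try:
    compile('x = b"ඞ"', "<demo>", "exec")
except SyntaxError as exc:
    literal_rejected = True
    print("SyntaxError:", exc.msg)

# 2. Building the bytes from integers works, but needs a multi-element tuple...
raw = bytes((224, 182, 158))

# 3. ...and decoding with str() needs a second argument for the codec.
print(str(raw, "u8"))               # ඞ
```

Without the second argument, str(raw) would return the repr of the bytes object rather than the decoded text.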
Forced to branch
Since massive numbers are impossible, we have to build each byte independently and pack them together.
But merging independent branches requires either commas (multi-arg functions) or dot notation (methods).
(And dunder methods like .__add__() are outright banned, as they are just operators in disguise).
Here are a few early attempts at merging that didn't make the cut:
str().join([chr(a), chr(b)]) - Relies on [] list syntax. Rejected.
str().join(Exception(chr(a), chr(b)).args) - Avoids [], but introduces both commas AND attribute access. Rejected.
The map(ord) pipeline
This led to horizontal packing using zip():
bytes(map(ord, next(zip( chr(b1), chr(b2), ... )))).decode()
zip() on single-char strings produces an iterator that yields one tuple. next() extracts it.
map(ord, ...) converts back to integers, bytes() builds the byte string,
and .decode() gives the final string.
This works. But passing ord as a bare reference into map() is basically cheating—it's an uncalled function identifier, not the evaluated result of a call. Plus, it still relies on a method (.decode()).
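For reference, here is the map(ord) pipeline as runnable code. The wrapper `pack_with_map` is a made-up helper that feeds in the byte values with a list comprehension and star-unpacking; the rejected expression itself is the single return line.

```python
def pack_with_map(text):
    """Rebuild a non-empty string via zip -> next -> map(ord) -> bytes -> decode."""
    chars = [chr(b) for b in text.encode("utf-8")]   # one single-char string per byte
    return bytes(map(ord, next(zip(*chars)))).decode()


print(pack_with_map("py"))      # py
print(pack_with_map("ඞ"))       # ඞ

# The same thing written out by hand for two bytes:
print(bytes(map(ord, next(zip(chr(112), chr(121))))).decode())   # py
```

Each UTF-8 byte is below 256, so chr() followed by ord() round-trips it safely before bytes() reassembles the sequence.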
The final solution: the eval() literal pipeline
Instead of building a string and decoding it, we can build the Python source code
for the string literal as raw bytes, and let eval() parse it.
To get pure integers into zip() without using map(),
we use reversed(range(N))—an iterator whose first element is exactly N-1.
eval(bytes(next(zip(
    reversed(range(40)),   # yields 39 (')
    reversed(range(113)),  # yields 112 (p)
    reversed(range(122)),  # yields 121 (y)
    reversed(range(40))    # yields 39 (')
))))
Step by step for "py":
1. repr("py") gives 'py'. Its UTF-8 bytes are 39, 112, 121, 39.
2. Each byte b is wrapped as reversed(range(b + 1)). This yields b as its first element.
3. zip packs the iterators. next pulls the first element from each, creating the tuple: (39, 112, 121, 39).
4. bytes naturally consumes the tuple to build b"'py'".
5. eval accepts bytes natively in Python 3, parses the literal, and returns "py".
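The steps above can be turned into a small generator. This is a sketch, not the site's code: `eval_pipeline` is a hypothetical name, and it writes the range bounds as plain integer literals where the real thing would substitute a nested-call expression for each number.

```python
def eval_pipeline(text):
    """Return source for an eval(bytes(next(zip(...)))) expression producing text."""
    literal_bytes = repr(text).encode("utf-8")       # bytes of the quoted literal
    parts = ", ".join(f"reversed(range({b + 1}))" for b in literal_bytes)
    return f"eval(bytes(next(zip({parts}))))"


source = eval_pipeline("py")
print(source)
print(eval(source))                 # py

# Non-ASCII text works too, since eval() decodes bytes source as UTF-8.
print(eval(eval_pipeline("ඞ")))     # ඞ
```

Using repr() to produce the literal means quotes and control characters in the input are escaped for free.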
So yeah. We end up using multiple parameters for zip. But no methods.
And it still comes out looking pretty cool I think.