represent any character using only python builtin function calls, no literals github
Basically, this is a code golf challenge to represent each unicode character in a weird way, in python.
More specifically, this site builds unicode characters from python's built-in functions, like: len(), str(), not(), chr(), etc.
Also, it follows these rules:
not().pow(a,b) isn't allowed.Just nested function calls.
For example, chr(max(range(ord(min(str(bytes())))))) evaluates to '&'.
Try pasting it into a Python console.
To try this out, enter a character into the "type here" input (or click random for a random one).
After that, the visualize button walks you through the evaluation step by step.
The "use optimized database" toggle switches between a an optimized expression from the database, or the expression we get through the psuedo-base-3 algorithm, (read the appropriate tabs for more information).
Also, I expanded this beyond single characters:
Strings: by allowing multiple arguments, this works for entire strings too (try the strings tab).
Arbitrary code: by also allowing methods, you can compile entire Python scripts (try the exec() tab).
After you understand how the optimizations work, you can interactively view the underlying graph, that we end up running Dijkstra on: graph visualization.
Right around the whole among us era, it was gaining traction that
chr(sum(range(ord(min(str(not())))))) in Python evaluates to
'ඞ', a unicode character that looks suspiciously like the among us crewmate.
This was an amazing discovery. But (unfortunately), I immediately tried to generalize it.
Could any Unicode character (there's ~160,000) be represented like this?
Since we only aim to find the Unicode value of the character and then
apply chr() to it, the struggle is essentially to find a
neat representation for each number 1–160,000.
And the representation MUST be neat, in a sense, since Python won't let you have more than 200 nested parentheses. It seemed like a cool challenge idk
* Technically, not is an OPERATOR in python, not a function. But since the among us guy used it I'm using it too.
Before any algorithm, we need a handful of numbers we can construct from zero-argument builtins alone. These become the starting points for everything else.
Booleans → 0 and 1
not() returns True; not(not()) returns False.
Wrapping in int() converts to 1 and 0.
len(str(x)) — length of string representations
Python objects have predictable string forms, so len(str(x)) gives a fixed small integer.
on not() | → len("True") = 4 |
on int(not()) via bin | → len("0b1") = 3, len("0b100") = 5 |
on frozenset() | → len("frozenset()") = 11 |
len(str(type(x))) — length of type name strings
str(type(x)) gives "<class 'typename'>".
Each type has a different name length, covering a spread of values from 13 to 33.
on int(), not(), float() … | → 13, 14, 15 … |
on iterators: iter(set()), iter(list()) … | → 22, 23, 24, 26, 28 … |
on reversed: reversed(list()), reversed(dict()) | → 30, 33 |
ord(min/max(str(x))) — ASCII codes of characters inside strings
Every Python value has a string representation that contains specific characters.
ord(min(str(x))) picks the smallest character; ord(max(...)) picks the largest.
For example, str(not()) = "True" → ord(min(...)) = 84 ('T').
on not(), not(not()) | → 84 'T', 117 'u', 70 'F', 115 's' |
on float(), tuple() | → 46 '.', 48 '0', 40 '(', 41 ')' |
on bytes(), list(), dict() | → 39 "'", 98 'b', 91 '[', 93 ']', 123 '{', 125 '}' |
on complex(), set(), frozenset() | → 106 'j', 116 't', 122 'z' |
on hex()/oct() of int(not()) | → 120 'x', 111 'o' |
on type(type(not())) | → 121 'y' |
My initial attempt used three tools:
len(bin(len(str(not())))) = 5sum(range(n)) = n(n-1)/2 — to jump up quicklymax(range(n)) = n-1 — to step back down
The idea: apply sum(range()) a few times to overshoot your target,
then decrement back down with max(range()).
For example, to get to 6:
5 = len(bin(len(str(not())))) 10 = sum(range(5)) = sum(range(len(bin(len(str(not())))))) 6 = max(range(max(range(max(range(max(range(10))))))))
This algorithm works, but for bigger numbers it eventually can't fit Python's 200 parentheses limit. In this example, to reach 100, you'd have to use sum(range()) thrice on 5 to get 990, and then do 890 decrements! Even if you have many initial seeds, the quadratic growth is much too sparse to reach arbitrarily large numbers within the parentheses budget.
After the initial attempt, I went looking for many strategies to get from a number n to f(n), and found many, but credit to Gemini for finding the key! A direct function that turns n into 3n. This is incredibly useful, because it means we can represent each number n in "O(log n)" parentheses. Basically, it's kind of like the algorithm to find a number in base 3, except the opposite direction, since we can only subtract and not add.
Two operations are enough:
Subtract 1: max(range(n)) returns n - 1.
range(n) produces 0, 1, ..., n-1. max() picks the last one. Costs 2 parentheses.
Multiply by 3: len(str(list(bytes(n)))) returns exactly 3n.
bytes(n) creates n zero-bytes. list() turns it into [0, 0, ..., 0]. str() gives "[0, 0, ..., 0]" — always exactly 3n characters. Costs 4 parentheses.
The algorithm: to represent n, decompose it in base 3.
At each step, build ceil(n/3), triple it, then subtract the remainder (0, 1, or 2).
Stop when you reach a base anchor — a small number you can construct
directly (like len(str(not())) = 4).
*note: The only base anchors you actually need is 1 = int(not()). Every natural number can be built from interweaving 1, f(x)=3x, g(x)=x-1.
function build(n):
if n is a base anchor:
return the anchor expression
q = ceil(n / 3)
r = 3 * q - n // r is 0, 1, or 2
expr = triple(build(q)) // multiply by 3
expr = subtract(expr, r) // subtract 0, 1, or 2
return expr
Example: build(13)
You can think about it this way: 13 = 15-2 = 5*3-2 = (3*2-1)*3-2
build(13):
q = ceil(13/3) = 5, r = 15 - 13 = 2
13 = triple(build(5)) - 2
build(5):
q = ceil(5/3) = 2, r = 6 - 5 = 1
5 = triple(build(2)) - 1
build(2):
2 is a base anchor
return len(str(ord(min(str(not())))))
working back up:
build(5) = triple(len(str(ord(min(str(not())))))) - 1
build(13) = triple(that) - 2
Then wrap the whole thing in chr() to get the character.
Algorithm stats (base-3 only, no optimizations)
Python has a 200 nested parentheses limit. The base-3 algorithm stays well under that for all Unicode code points (max 1,114,111).
The base-3 algorithm works for everything but isn't always the shortest. Besides that first algorithm, we actually found a lot of other strategies to jump between numbers.
Decrement
| max(range(n)) | = n − 1 | 2 parens |
range(n) yields 0…n-1. max() picks the last element.
Exact multipliers
All based on stringifying bytes/bytearray objects:
| len(str(list(bytes(n)))) | = 3n | 4 parens |
| len(str(bytes(n))) | = 4n + 3 | 3 parens |
| len(str(bytearray(n))) | = 4n + 14 | 3 parens |
| len(ascii(str(bytes(n)))) | = 5n + 5 | 4 parens |
bytearray(n) stringifies as bytearray(b'\x00...') — a 14-char prefix, then 4 chars per byte. Same paren cost as quad_plus_3 but a different constant, filling in different gaps.
Zip chain — higher exact multiples via nested tuples
Each zip() wrapper turns each element into a deeper tuple, adding exactly 3n to the string length.
| len(str(list(zip(bytes(n))))) | = 6n | 5 parens |
| len(str(list(zip(zip(bytes(n)))))) | = 9n | 6 parens |
| ...k zips... | = 3(k+1)n | 4+k parens |
Ascii exponential — exponential multiplier for linear paren cost
str(bytes(n)) produces backslash escapes like \x00.
Each ascii() call escapes those backslashes again, doubling them.
So the string roughly doubles in length with each wrap:
str(bytes(2)) = "b'\x00\x00'" len = 11 ascii(that) = "\"b'\\x00\\x00'\"" len = 15 ascii(ascii(that)) = ... len = 23
General formula with k layers of ascii(): f(n) = (2^k + 3)n + (2^(k+1) + 1)
| k=1 (one ascii) | 5n + 5 | 4 parens |
| k=2 (two ascii) | 7n + 9 | 5 parens |
| k=3 | 11n + 17 | 6 parens |
| k=4 | 19n + 33 | 7 parens |
| k=5 | 35n + 65 | 8 parens |
| k=6 | 67n + 129 | 9 parens |
| k=10 | 1027n + 2049 | 13 parens |
Triangular jump — quadratic growth for 2 parens
sum(range(n)) = n(n-1)/2
Super-linear (n·log n) — stringify a list of integers
When you stringify a list like [0, 1, 2, ..., n-1], the result grows faster than n
because numbers gain digits as they get larger. These strategies exploit that.
| len(str(list(range(n)))) | ≈ n·log10(n) | 4 parens |
| len(str(list(zip(range(n))))) | ≈ n·log10(n) + 5n | 5 parens |
| len(str(list(enumerate(bytes(n))))) | ≈ n·log10(n) + 6n | 5 parens |
| len(str(dict(enumerate(bytes(n))))) | ≈ n·log10(n) + 5n | 5 parens |
| len(str(dict(enumerate(range(n))))) | ≈ 2n·log10(n) | 5 parens |
Digit-count (logarithmic)
These grow with the number of digits of n, not n itself — extremely cheap for large n.
| len(str(slice(n))) | = digits(n) + 19 | 3 parens |
str(slice(n)) → "slice(None, n, None)" — always 19 chars of boilerplate plus the digits of n.
There are more strategies in the codebase — only those with >1,000 uses in the final database are listed here.
Finding the shortest paths
Initially, to optimize this whole thing, I would run a script to loop over every number and check all possibilities for other numbers to reach it. For example, to see if we can find a shorter representation from the triple strategy for n=150, we would check how long it would be to do triple(best_representation_for_50). If it was an improvement, we'd keep it. It worked, but it was clunky and required running multiple passes until things stopped getting shorter.
But as I was practicing algorithms exams, I realized — we can represent this as a directed graph!
We can treat each number 0...200,000 as a node, and each strategy is just a set of edges connecting them. For example, the strategy decrement gives us a directed edge from each node n to n-1, and the triple strategy is an edge from n to 3n. In this model, every representation is simply a path through this graph.
So instead of guessing and checking backwards, we use Dijkstra's Algorithm. To do this, we create a starting node s with an edge to all the base anchors according to the base anchors' length of representation. Our goal is then reduced to finding the shortest path from s to every other node in the set {0, 1, ..., 200000}.
We can actually weight the edges differently — by length or depth — depending on what we want:
For instance, each directed edge n -> n-1 from decrement (max(range(n))) has a depth of 2 and a length of 12. Additionally, we can use a mix of both to break up ties between two representations with the same depth but different lengths, or vice versa. Dijkstra guarantees that we find the absolute shortest paths from our list of strategies. :)
If you're curious about how this graph ends up looking:
Click here to explore the graph interactively click nodes, hover over edges, trace paths
I built that with Cytoscape.js: You can click a neighbor node to move to it, and walk around the graph.
Hovering over an edge will tell you what strategy it's using.
Also, you can trace the shortest path from s to any other node (int or character).
And it plays a chord for each edge you move from which I think is fun :)
Full list of strategies (many are useless): strategies.py
· Base anchors: anchors.py
If you can think of any I'm missing:
Submit them on GithHub!
A SQLite database stores the shortest known expression for each integer from 0 to 200,000. Each entry records which strategy produced it and which parent number it was derived from. The whole database is built in one shot by running Dijkstra's algorithm over the integer graph — each strategy is a set of directed edges, and Dijkstra guarantees the globally shortest path to every node.
Current stats
Strategy breakdown
Optimization history
Each row shows the database state at a snapshot in time. Minimal = base-3 algorithm seeded with only 0 and 1. Full algorithm = base-3 with all 44 base anchors. Optimizer / deep search = the old iterative DP approach (now replaced). Dijkstra = current method — single forward-search pass, globally optimal for the strategy set. Each Dijkstra run appears twice: depth optimizes for fewest function calls (parens), length optimizes for shortest expression string — both are stored in the database.
| avg depthaverage function calls across all numbers | max depthworst-case function calls for any number | avg lenaverage expression string length | max lenlongest expression string |
|---|
TL;DR: this works for strings too, but we allow multiple argumets. While this tab is open, the input accepts strings as well.
So we've got single characters figured out. But what about whole strings?
Python has no builtin concat(a, b). There's no way to join two strings
without operators (+), methods (.join()), or syntax ([]).
(at least, I really don't think so!)
The single-character system works nicely because it produces a linear AST
(abstract syntax tree). Every expression is a straight line: each function wraps the previous one,
flowing in one direction without any branches or merges.
So, methods that take in no arguments would be better, since they at least keep the AST linear.
And you can think of doing x.to_bytes() as basically just doing to_bytes(x). Just looks less nice.
Multiple parameters, on the other hand, introduce branches in the AST :(
The giant integer problem
To keep the AST completely linear for a whole string, that would mean uniquely representing each string as a single number. There is a clear way to do this - write the string as a number in base 256, where each byte is a digit (basically, just write the string out as raw bytes and interpret it as an integer). For example, "hello" as a big-endian integer is about 448 billion.
But representing huge numbers with our restricted toolset doesn't scale. CPython physically cannot handle the execution:
len(str(list(bytes(n)))) for tripling.
Python physically executes bytes(n). For billion-scale numbers, that means allocating gigabytes of RAM at runtime, resulting in an instant MemoryError.
SyntaxError.
len() and range() are capped at
sys.maxsize (263 - 1). A 10-character string easily exceeds this, breaking any length-based math tricks.
The linear AST approach dies at ~8 characters. The numbers just get too big :(
I really doubt there's a different way to map between strings and integers in a one-to-one way.
There are other issues with this too: for example, you can't do x = b"ඞ" in Python, you get:
SyntaxError: bytes can only contain ASCII literal characters
So you'd have to build the bytes one by one, which would require multiple parameters!
Also, to even decode the bytes back into a string, you'd need to specify that you are doing str(x, "u8"), which
uses multiple parameters...
Forced to branch
Since massive numbers are impossible, we have to build each byte independently and pack them together.
But merging independent branches requires either commas (multi-arg functions) or dot notation (methods).
(And dunder methods like .__add__() are outright banned, as they are just operators in disguise).
Here are a few early attempts at merging that didn't make the cut:
str().join([chr(a), chr(b)]) - Relies on [] list syntax. Rejected.str().join(Exception(chr(a), chr(b)).args) - Avoids [], but introduces both commas AND attribute access. Rejected.The map(ord) pipeline
This led to horizontal packing using zip():
bytes(map(ord, next(zip( chr(b1), chr(b2), ... )))).decode()
zip() on single-char strings produces a tuple. next() extracts it.
map(ord, ...) converts back to integers, bytes() builds the byte string,
and .decode() gives the final string.
This works. But passing ord as a bare reference into map() is basically cheating—it's an uncalled function identifier, not the evaluated result of a call. Plus, it still relies on a method (.decode()).
The final solution: the eval() literal pipeline
Instead of building a string and decoding it, we can build the Python source code
for the string literal as raw bytes, and let eval() parse it.
To get pure integers into zip() without using map(),
we use reversed(range(N))—an iterator whose first element is exactly N-1.
eval(bytes(next(zip(
reversed(range(40)), # yields 39 (')
reversed(range(113)), # yields 112 (p)
reversed(range(122)), # yields 121 (y)
reversed(range(40)) # yields 39 (')
))))
Step by step for "py":
repr("py") gives 'py'. Its UTF-8 bytes are 39, 112, 121, 39.reversed(range(...)). This yields b as its first element.zip packs the iterators. next pulls the first element from each, creating the tuple: (39, 112, 121, 39).bytes naturally consumes the tuple to build b"'py'".eval accepts bytes natively in Python 3, parses the literal, and returns "py".
So yeah. We end up using multiple paramaters for zip. But no methods.
And it still comes out looking pretty cool I think.
TL;DR: you can run arbitrary python code, but we allow both multiple arguments and methods. Paste it into the editor above and click "compile". Copy the output into any Python REPL or file and run it.
The strings technique uses eval(bytes(next(zip(...)))) to reconstruct a string.
Similarly, we can convert an entire python script into a string, and then run exec() on it.
However, I don't like that the word exec is right there in the output. It makes it too obvious what's going on -
I think it's funnier if it's not clear at all how we actually end up executing the code.
The best way to do this uses method calls :( but it still looks cool I think:
To get exec without writing it, we can extract it from the builtins module. Here is how the chain works:
vars() gives us a dict of the current environment..get("__builtins__") pulls the builtins module itself out of that dict.vars(...) on that module rips out its internal dictionary of every single built-in function..get("exec") pulls the actual exec function out of that second dict.Put it all together and pass a payload to it, and it looks like this:
vars(vars().get("__builtins__")).get("exec")(b"print('payload')")
Swap the strings and payload for our bytes trick, and we get the final output:
vars(vars().get(bytes(...).decode())).get(bytes(...).decode())(bytes(...))
The three bytes(...) blobs encode: the string "__builtins__",
the string "exec", and the actual payload script. The interpreter reconstructs and calls
exec entirely from its own memory, which I think is cool!