- Python really needs to take the Typescript approach of "all valid Python4 is valid Python3". And then add value types so we can have int64 etc. And allow object refs to be frozen after instantiation to avoid the indirection tax.
Sensible type-annotated python code could be so much faster if it didn't have to assume everything could change at any time. Most things don't change, and if they do they change on startup (e.g. ORM bindings).
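A frozen dataclass is probably the closest thing today to "frozen after instantiation". The minimal sketch below uses an illustrative `Point` class and shows why the interpreter still can't treat such an object as truly immutable. The freeze is enforced by a generated `__setattr__`, and `object.__setattr__` simply bypasses it.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Point:
    x: int
    y: int


p = Point(1, 2)

try:
    p.x = 10  # rejected by the dataclass-generated __setattr__
except FrozenInstanceError:
    blocked = True
else:
    blocked = False

# Escape hatch: skip the class's __setattr__ entirely.
object.__setattr__(p, "x", 10)
print(blocked, p.x)  # True 10
```

So "frozen" here is a runtime check layered on top of an ordinary mutable object, not a guarantee an optimizer could build on.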
- To clarify, it is nuts that in an object method, there is a performance enhancement through caching a member value.
  class SomeClass:
      def __init__(self):
          self.x = 0
      def SomeMethod(self):
          q = self.x
          # do stuff with q, because otherwise you're dereferencing self.x all the damn time
          return q
- This is not just a performance concern; this describes completely different behaviour. You forgot that self.x is really type(self).__getattribute__(self, 'x') (which falls back to __getattr__ when normal lookup fails), and that you can implement both however you like. There is no object identity across the values returned by __getattribute__.
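A minimal sketch of the point about attribute lookup being ordinary code. `Reader` and `Fallback` are illustrative classes. A property runs on every `self.x` and can hand back a fresh object each time, while `__getattr__` only fires when the normal `__getattribute__` lookup fails.

```python
class Reader:
    """Illustrative class: `x` is a property, so every `self.x` runs code."""

    def __init__(self):
        self.reads = 0

    @property
    def x(self):
        self.reads += 1
        return [self.reads]  # a brand-new object on every access


class Fallback:
    real = "found normally"

    def __getattr__(self, name):
        # Only reached when normal lookup via __getattribute__ fails.
        return f"synthesized {name}"


r = Reader()
a, b = r.x, r.x
print(a, b, a is b)  # [1] [2] False

f = Fallback()
print(f.real)     # found normally
print(f.missing)  # synthesized missing
```

This is why hoisting `q = self.x` into a local is a semantic change, not just an optimization: for a class like `Reader`, the two spellings observe different objects.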
- This level of dynamism is commonly forgotten/omitted because it is most often not at all needed. "There is no object identity across the values [retrieved by self.x]" is a very curious choice to many.
- It's very Pythonic to expose e.g. state via the existence of attributes. This also makes it possible to dynamically expose foreign-language interfaces. You can really craft the interface you like, because exposing the interface is itself just normal code that returns strings and objects.
You are right that it is not needed often, but somewhere in the library stack there is often a part that does exactly this to expose a nice interface.
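A sketch of that kind of dynamically exposed interface. `CommandProxy` is a hypothetical class, and the command names stand in for whatever a foreign service might report. Attributes that don't exist in the class are synthesized on demand from strings.

```python
class CommandProxy:
    """Hypothetical proxy that exposes a foreign command set as attributes."""

    def __init__(self, commands):
        self._commands = frozenset(commands)  # e.g. names reported by some service
        self.sent = []

    def __getattr__(self, name):
        # Called only when normal lookup fails. Read __dict__ directly so this
        # can't recurse if _commands isn't set yet (e.g. during copying).
        commands = self.__dict__.get("_commands", frozenset())
        if name not in commands:
            raise AttributeError(name)

        def call(*args):
            self.sent.append((name, args))
            return f"{name}{args}"

        return call

    def __dir__(self):
        # Make the synthesized names visible to dir() and tab completion.
        return sorted(set(super().__dir__()) | self._commands)


proxy = CommandProxy(["ping", "reboot"])
print(proxy.ping(1))             # ping(1,)
print(hasattr(proxy, "reboot"))  # True
print(hasattr(proxy, "wipe"))    # False
```

Because "does this attribute exist?" is answered by running code, no static analysis of the class body can tell an optimizer what `proxy.ping` will be.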
- This is just an analogy, but in Swift, String is such a commonly used hot path that the type is designed to accommodate different backing representations in a performant way. The type has bits in its layout that indicate the backing storage, e.g. a constant string is just a pointer to the bytes in the binary and, unless the String escapes or mutates, it incurs no heap allocation at all; it is just a stack allocation and a pointer.
JavaScript implementations do their own magic, since most objects aren't constantly mutating their prototypes or doing other fun things. They effectively fast-path property accesses and fall back if that assumption proves incorrect.
Couldn't Python tag objects that don't need such dynamism (the vast majority) so it can take the fast path on them?
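The "fast path with a fallback" idea usually comes down to a guard. The engine remembers where a lookup was resolved and reuses that answer as long as a per-type version tag hasn't changed. CPython itself keeps internal per-type version tags for similar caches. The toy sketch below only illustrates the guard. `Versioned` and `AttrCache` are hypothetical names, only class-level attributes such as methods are cached, and only the class itself is versioned, whereas a real implementation would also invalidate subclasses.

```python
class Versioned(type):
    """Metaclass that bumps a per-class version tag on every class attribute write."""

    def __setattr__(cls, name, value):
        super().__setattr__(name, value)
        # Write the tag via type.__setattr__ so this override doesn't recurse.
        type.__setattr__(cls, "_version", cls.__dict__.get("_version", 0) + 1)


class AttrCache:
    """Toy monomorphic inline cache for one class-level attribute name."""

    def __init__(self, name):
        self.name = name
        self.key = None  # (class, version) the cached value is valid for
        self.value = None
        self.hits = 0
        self.misses = 0

    def load(self, cls):
        key = (cls, cls.__dict__.get("_version", 0))
        if key == self.key:  # guard passed: fast path
            self.hits += 1
            return self.value
        self.misses += 1  # guard failed: slow path walks the MRO
        for klass in cls.__mro__:
            if self.name in klass.__dict__:
                self.key, self.value = key, klass.__dict__[self.name]
                return self.value
        raise AttributeError(self.name)


class Shape(metaclass=Versioned):
    def area(self):
        return "v1"


cache = AttrCache("area")
print(cache.load(Shape)(Shape()))  # prints v1; cache miss
print(cache.load(Shape)(Shape()))  # prints v1; cache hit
Shape.area = lambda self: "v2"     # bumps Shape's version tag
print(cache.load(Shape)(Shape()))  # prints v2; the guard caught the change
print(cache.hits, cache.misses)    # 1 2
```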
- Java also has a performance cost to accessing class fields, as exemplified by this (now-replaced) code in the JDK itself: https://github.com/openjdk/jdk/blob/jdk8-b120/jdk/src/share/...
- Any decent JIT compiler (and HotSpot's is world class) will optimize this out. Likely this was done very early on in development, or just to reduce bytecode size to help inlining heuristics that depend on it.
- String is also a pretty damn fundamental object, and I'm sure trim() calls are extremely common too. I wouldn't be surprised if making sure that seemingly small optimizations like this apply in the interpreter, before the JIT kicks in, is not a premature optimization in that context.
There might be common scenarios where this had a real, significant performance impact, e.g. use cases where it's such a bottleneck in the interpreter that it measurably affects warm-up time. Also, string manipulation seems like the kind of thing you see in small scripts that end before a JIT even kicks in but that are also called very often (although I don't know how many people would reach for Java in that case).
EDIT: also, if you're a commercial entity trying to get people to use your programming language, it's probably a good idea to make the language perform less badly with the most common terrible code. And accidentally quadratic or worse string manipulation involving excessive calls to trim() seems like a very likely scenario in that context.
- But what if whatever you call is also accessing and changing the attribute?
- If what you call gets inlined, then the compiler can see that it either does or doesn't modify the attribute and optimize it accordingly. Even virtual calls can often be inlined via, e.g., class hierarchy analysis and inline caches.
If these analyses don't apply and the callee could do anything, then of course the compiler can't keep the value hoisted. But a function call has to occur anyway, so the hoisted value will be pushed/popped from the stack and you might as well reload it from the object's field anyway, rather than waste a stack slot.
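A small Python illustration of the hazard raised above. The `Account` class is hypothetical. Hoisting an attribute into a local is only safe if the callee provably leaves it alone, and otherwise the local goes stale.

```python
class Account:
    def __init__(self):
        self.balance = 100

    def charge_fee(self):
        # An opaque callee that mutates the attribute behind the caller's back.
        self.balance -= 5

    def report(self):
        cached = self.balance        # hoisted into a local
        self.charge_fee()            # may change self.balance
        return cached, self.balance  # (stale, fresh)


print(Account().report())  # (100, 95)
```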
- Another thread can access it and do that, how could the compiler possibly know about it?
- There are documented ways to ensure that changes are visible across threads (e.g. locks). If these are not used, the compiler is within its rights to not go out of its way to pull changes from another thread.
- That was a niche optimization primarily targeting code running in the interpreter. Even the most basic optimizing compiler in HotSpot's tiered compilation chain at that time (the client compiler, C1) would be able to optimize that into a register. Since String is such an important class, even small things like this are done.
- > it is nuts that in an object method, there is a performance enhancement through caching a member value
I don't understand what you think is nuts about this. It's an interpreted language and the word `self` is not special in any way (it's just convention - you can call the first param to a method anything you want). So there's no way for the interpreter/compiler/runtime to know you're accessing a field of the class itself (let alone that that field isn't a computed property or something like that).
Lots of hot takes that people have (like this one) are rooted in just a fundamental misunderstanding of the language and of programming languages in general <shrugs>.
- > the word `self` is not special in any way (it's just convention - you can call the first param to a method anything you want).
The name `self` is a convention, yes, but interestingly in Python methods the first parameter is special beyond the standard "bound method" stuff. See for example PEP 367 (New Super) for how `super()` resolution works (TL;DR: when a method uses `super`, the compiler creates an implicit `__class__` cell referencing the lexically defining class, and zero-argument `super()` combines that with the method's first parameter).
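A small sketch of that behaviour, with illustrative class and function names. The first parameter can be named anything. Zero-argument `super()` only works in functions compiled inside a class body, because only those get the implicit `__class__` cell.

```python
class Base:
    def greet(this):  # the first parameter doesn't have to be called `self`
        return "base"


class Child(Base):
    def greet(me):
        # Zero-argument super() uses the implicit __class__ cell (Child)
        # together with this method's first argument (me).
        return "child+" + super().greet()


def detached(self):
    # Defined outside any class body, so no __class__ cell is created.
    return super().greet()


Child.detached = detached

print(Child().greet())  # child+base

error = None
try:
    Child().detached()
except RuntimeError as exc:
    error = exc  # zero-arg super() can't find the __class__ cell here
print(type(error).__name__)  # RuntimeError
```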
- I don't think it's a hot take to say much of Python's design is nuts. It's a very strange language.
- What's nuts is that the language doesn't guarantee that successive references to the same member value within the same function body are stable. You can look it up once, go off and do something else, and look it up again and it's changed. It's dynamism taken to an unnecessary extreme. Nobody in the real world expects this behaviour. Making it just a bit less dynamic wouldn't change the fundamentals of the language but it would make it a lot more tractable.
- > What's nuts is that the language doesn't guarantee that successive references to the same member value within the same function body are stable. You can look it up once, go off and do something else, and look it up again and it's changed.
There is no such thing as 'successive references to the same member value' here. It's not that you look up the same object and it can change, it's that you are not referring to the same object at all.
self.x is actually type(self).__getattribute__(self, 'x') (with __getattr__ as the fallback), which can in fact return a different thing each time. `self.x` IS a string lookup, and that is not an implementation detail but a major design goal. This is the dynamism that is one of the selling points of Python: it allows you to change and modify interfaces to reflect state. It's nice for some things and it is what makes Python Python. If you don't want that, use another language.
- In Python, attribute accesses aren't stable! `self.x` where `x` is a property is not guaranteed to refer to the same thing.
And getting rid of descriptors would be a _fundamental change to the language_. An immense one. Loads of features are built on descriptors or descriptor-like things.
And what you're complaining about is also not true in Javascript world either... I believe you can build descriptor-like things in JS now as well.
_But_ if you want that you can use stuff like mypyc + annotations to get that for you. There are tools that let you get to where you want. Just not out of the box because Python isn't that language.
Remember, this is a scripting language, not a compiled language. Every optimization for things you talk about would be paid on program load (you have pyc stuff but still..)
Gotta show up with proof that what you're saying is verifiable and works well. Up until ~6 or 7 years ago CPython had a concept of being easy to onboard onto. Dataflow analyses make the codebase harder to deal with.
Having said all of that.... would be nice to just inline RPython-y code and have it all work nicely. I don't need it on everything and proving safety is probably non-trivial but I feel like we've got to be closer to doing this than in the past.
I ... think in theory the JIT can solve for that too. In theory.
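A minimal sketch of what "built on descriptors" means. `Logged` is a made-up data descriptor that records every read. The last lines show that plain functions are descriptors too, which is how `obj.method` turns into a bound method.

```python
class Logged:
    """Minimal data descriptor that records every read (illustrative)."""

    def __set_name__(self, owner, name):
        self.public = name
        self.private = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:  # accessed on the class itself
            return self
        obj.reads.append(self.public)
        return getattr(obj, self.private)

    def __set__(self, obj, value):
        setattr(obj, self.private, value)


class Point:
    x = Logged()

    def __init__(self, x):
        self.reads = []
        self.x = x  # routed through Logged.__set__


p = Point(3)
print(p.x + p.x, p.reads)  # 6 ['x', 'x']

# Functions are descriptors too: __get__ produces the bound method.
bound = Point.__dict__["__init__"].__get__(p, Point)
print(bound.__self__ is p)  # True
```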
- >Remember, this is a scripting language, not a compiled language
This is the fundamental issue, the "elephant in the room" that everyone seems to be overlooking and sweeping under the carpet.
The extreme compiled-language crowd goes gung-ho with slow-to-compile and complicated Rust (more so than C++), while the rest of the world gladly hacks on its shiny ML/AI code in a scripting language, Python, the "duct-tape glue language", with most if not all of the fast engine libraries (e.g. PyTorch) written in unsafe C/C++.
The problem is that Python was meant for scripting not properly designed software system engineering. After all, it's based on ABC, a language for beginners, with an asterisk attached: "intended for teaching or prototyping, but not as a systems-programming language" [1].
In ten years' time people will most probably look in horror at the tech debt in the Python software stacks they have to maintain for business continuity. Or, for their own sanity, they will rewrite the entire thing in a more stable, compiled, modern language with fast development and a good ecosystem, like D, with native engine libraries and seamless integration with C and, to some extent, C++ if necessary.
[1] ABC (programming language)
- > The problem is that Python was meant for scripting not properly designed software system engineering.
What something was meant to do has never, ever stopped people. People find creative ways to use tools in unintended ways all the time. It's what we do.
We can call this dumb or get misanthropic about it, or we can try to understand why people all over the world choose to use Python in "weird" ways, and what this tells us about the way people relate to computing.
- > What's nuts is that the language doesn't guarantee that successive references to the same member value within the same function body are stable.
The language supports multiple threads and doesn’t have private fields (https://docs.python.org/3/tutorial/classes.html#private-vari...), so the runtime cannot rule out that the value gets changed in-between.
And yes, it often is obvious to humans that’s not intended to happen, and almost never what happens, but proving that is often hard or even impossible.
- wouldn't a concurrent change without synchronization be UB anyway? Also parent wants to cache the address, not the value (but you have to cache the value if you want to optimize manually)
- Not necessarily UB, but absolutely "spooky action" nondeterministic race conditions that make things difficult to understand.
- Why would it be UB? All objects are behind (thin) pointers, which can be overwritten atomically.
- > Nobody in the real world expects this behaviour.
For example, numbers and strings are immutable objects in Python. If self.x is a number and its numeric value is changed by a method call, self.x will be a different object after that. I'd dare say people expect this to work.
- Basically all object-oriented languages work like that. You access a member; you call a method which changes that member; you expect that change to be visible lower in the code, and there are no statically computable guarantees that the particular member is not touched in the called method (which is potentially overridden in a subclass). It's not dynamism; even C++ works the same. It's an inherent tax on OOP. All you can do is try to minimize the cost of that additional dereference. I'm not even touching threads here.
now, functional languages don't have this problem at all.
- OOP has nothing to do with it. In your C++ example, foo(bar const&); is basically the same as bar.foo();. At the end of the day, whether passing it in as an argument or accessing this via the method call syntax it's just a pointer to a struct. Not to mention, a C++ compiler can, and often does, choose to put even references to member variables in registers and access them that way within the method call.
This is a Python specific problem caused by everything being boxed by default and the interpreter does not even know what's in the box until it dereferences it, which is a problem that extends to the "self" object. In contrast, in C++ the compiler knows everything there is to know about the type of `this`, which avoids the issue.
- That's not true. I mean: it's true that it has little to do with OOP, but most imperative languages (only exception I know is Rust) have the issue, it's not "Python specific". For example (https://godbolt.org/z/aobz9q7Y9):
  #include <cstdio>
  struct S { const int x; int f() const; };
  int S::f() const {
      int a = x;
      printf("hello\n");
      int b = x;
      return a - b;
  }
The compiler can't reuse 'x' unless it's able to prove that it definitely couldn't have changed during the `printf()` call, and it's unable to prove it. The member is loaded twice. C++ compilers can usually only prove it for trivial code with completely inlined functions that doesn't mutate any external state, or mutates it in a definitely-not-aliasing way (strict aliasing). (And the `const` makes no difference here at all.)
In Python the difference is that it can basically never prove it at all.
- > This is a Python specific problem caused by everything being boxed
I would say it is part python being highly dynamic and part C++ being full of undefined behavior.
A C++ compiler will only optimize member access if it can prove that the member isn't overwritten in the same thread. Compatible pointers, opaque method calls, ... the list of reasons why that optimization can fail is near endless; C even added the restrict keyword because just having write access to two pointers of compatible types can force the compiler to reload values constantly. In Python anything is a function call to some unknown code, and any function could get access to any variable on the stack (manipulating Python stack frames is fun).
Then there is the fun stuff C++ compilers get up to with variables that are modified by different threads: while(!done) turning into while(true) because you didn't tell the compiler that done needs to be thread-safe is always fun.
- What is going on here is not that an attribute might be changed concurrently and the interpreter can't optimize the access. That is also a consideration. But the major issue is that an attribute doesn't really refer to a single thing at all; it means whatever object is returned by a function call that implements a string lookup. __getattribute__ (and its fallback __getattr__) is not an implementation detail of the language, but something an object can implement however it wants, just like __len__ or __gt__. It's part of the object behaviour, not part of the static interface. This is a fundamental design goal of the Python language.
- > This is a Python specific problem caused by everything being boxed by default and the interpreter does not even know what's in the box until it dereferences it
That's not the whole story. Every attribute access is a call to __getattribute__ (falling back to __getattr__), which can return whatever object it wants.
bar.foo(...) is actually type(bar).__getattribute__(bar, 'foo')(...)
This dynamism is what makes Python Python and it allows you to wrap domain state in interface structure.
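A sketch of that desugaring, using an illustrative `Bar` class. CPython avoids materializing the bound method for plain calls internally, but the observable semantics are as if the lookup below happens every time.

```python
class Bar:
    def foo(self, n):
        return n * 2


bar = Bar()

# Roughly what `bar.foo(21)` means: look the name up through the type's
# __getattribute__, which finds the function and binds it to `bar`.
method = type(bar).__getattribute__(bar, "foo")
print(method(21), bar.foo(21))  # 42 42

# The binding step is the function's own descriptor __get__.
print(Bar.__dict__["foo"].__get__(bar, Bar)(21))  # 42

# Every access builds a fresh bound-method object, equal but not identical.
print(bar.foo is bar.foo, bar.foo == bar.foo)  # False True
```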
- > same member value within the same function body are stable
Did you miss the part where I explained to you there's no way to identify that it's a member variable?
> Nobody in the real world expects this behaviour
As has already been explained to you by a sibling comment you are in fact wrong and there are in fact plenty of people in the real world who do actually expect this behavior.
So I'll repeat myself: lots of hot takes from just pure, unadulterated, possibly willful ignorance.
- The above is a very thick response that doesn't address the parent's points; it just sweeps them under the rug with "that's just how it was designed/it works".
"Did you miss the part where I explained to you there's no way to identify that it's a member variable?"
No, you did miss the case where that in itself can be considered nuts - or at least an unfortunate early decision.
"this just how things are dunn around diz here parts" is not an argument.
- > No, you did miss the case where that in itself can be considered nuts - or at least an unfortunate early decision.
This is not a side implementation detail that they got wrong; this is a fundamental design goal of Python. You can find that nuts, but then just don't use Python, because that is one of the things that make Python Python.
- > considered nuts - or at least an unfortunate early decision
Please explain to us then how exactly you would infer a variable with an arbitrary name is actually a reference to the class instance in an interpreted language.
- >Please explain to us then how exactly you would infer a variable with an arbitrary name is actually a reference to the class instance in an interpreted language.
Did I stutter when I wrote about "an unfortunate early decision"? Who said it has to be "an arbitrary name"?
Even so, you could add a bloody marker declaring the arbitrary name (which 99% of the time would be self anyway) as an instruction to the interpreter. If it fails, it fails, like countless other things that can fail at runtime in Python today.
- But now you are no longer talking about the way Python works, but the way you want Python to work - and that has nothing to do with Python.
- "The way our economy works is bad"
"That's how it's been since forever, it's an essential part of country X"
"Yes, and it's a badly designed part".
"But now you are no longer talking about the way country X works, but the way you want country X to work - and that has nothing to do with country X."
See how the argument quickly degenerates?
- You mean even if x is not a property?
- That was how the Mojo language started. And then soon after the hype they said that being a superset of Python was no longer the goal. Probably because being a superset of Python is not a guarantee for performance either.
- Being a superset would mean all valid Python 3 is valid Python 4. A valuable property for sure, but not what OP suggested. In fact, it is the exact opposite.
- > python code could be so much faster if it didn't have to assume everything could change at any time
Definitely, but then it wouldn't be Python. One of the core principles of Python's design is to be extremely dynamic, and that anything can change at any time.
There are many other, pretty good, strictly dynamically typed languages which work just as well if not better than Python, for many purposes.
- I feel that this excuse is being trotted out too much. Most engineers never get to choose the programming language used for 90% of their professional projects.
And when Python is a mainstream language on top of which large, globally known websites, AI tools, core system utilities, etc are built, we should give up the purity angle and be practical.
Even the new performance push in Python land is a reflection of this. A long time ago some optimizations were refused in order to not complicate the default Python implementation.
- You’re always free to create your own Python-like language that caters more toward your goals. No excuses, then.
- This is not a substantive response to
> Most engineers never get to choose the programming language used for 90% of their professional projects.
If it were up to me, there are plenty of languages to choose from that meet my technical needs just fine, but the political friction of getting all of my colleagues (most of whom are not software engineers at all) to use my language of choice is entirely insurmountable. Therefore, I have a vested interest in seeing practical changes to Python. The existence or invention of other languages is irrelevant.
- If you're a contributor to Python, my apologies.
- I’m not a Python contributor, so no need to apologize to me. But if you have strong ideas about what Python should be, perhaps you should step up and contribute that code rather than saying that others are offering excuses for why they won’t deliver what you want. I have worked on other open source projects where users were very entitled, to the point of demanding that the project team deliver them certain features. It’s not fun. It’s ironic that open source often brings out both the best and the worst in people. Suggesting changes and new features is fine, even critical to a strong roadmap. But we all need to realize that maintainers may have other goals and there’s no obligation on their part to implement anything. The beauty of open source is that you can customize or fork as much as you want to match your goals. But then you’re responsible for doing the work and if your changes are public you may have your own set of users demanding their own favorite changes.
- I have made some experiments with P2W, my experimental Python (subset) to WASM compiler. Initial figures are encouraging (5x speedup, on specific programs).
https://github.com/abilian/p2w
NB: some preliminary results:
  p2w is 4.03x SLOWER than gcc (geometric mean)
  p2w is 5.50x FASTER than cpython (geometric mean)
  p2w is 1.24x FASTER than pypy (geometric mean)
- But that's just not what Python is for. Move your performance-critical logic into a native module.
- Performance is one part of the discussion, but cleanliness is another. A Python4 that actually used typing in the interpreter, had value types, had a comptime phase to allow most metaprogramming to work (like monkey patching for tests) would be great! It would be faster, cleaner, easier to reason about, and still retain the great syntax and flexibility of the language.
- I too see potential in this. In recent years it started feeling a bit weird switching between Go, Python and Rust codebases, with Python code looking more and more like a traditional statically typed language but not getting the performance benefits. I know, I know, there are libraries and frameworks that make heavy use of the fun stuff you can do with strings (leading to the breakdown of even the latest and greatest IDE tooling and red squiggly lines all over your code), and don’t get me started on async etc.
Funnily enough I’ve found Python to be excellent for modelling my problem domain with Pydantic (so far basically unparalleled, open for suggestions in Go/Rust), while the language also gets out of my way when I get creative with list expressions and the like. So overall, still it is extremely productive for the work I’m doing, I just need to spin up more containers in prod.
- > A Python4 that actually used typing in the interpreter, had value types, had a comptime phase to allow most metaprogramming to work (like monkey patching for tests) would be great! It would be faster, cleaner, easier to reason about, and still retain the great syntax and flexibility of the language.
And what prevents someone from designing such a language?
- PSF has full time employees. If someone else does it as a personal project it would remain a personal project and we'd never hear about it.
- I’ll be happy if overnight all Python code in the world can reap 10-100x performance benefits without changing much of a codebase; you can continue having a soup of multiple languages.
- Me too, but changing the referential semantics would be a massive breaking change. That doesn't qualify as "without changing much of a codebase". And tacking on a giant new orthogonal type system to avoid breaking existing code would be akin to creating a new language. Why bother when you can just write Python modules in Rust?
- I’d like to be good looking and drive a Ferrari. But that probably isn’t going to happen, either.
- Any program written in Python of any significant size is literally a soup of multiple languages.
- There's no project that isn't like that.
- >> Sensible type-annotated python code could be so much faster if it didn't have to assume everything could change at any time.
Then it wouldn't be Python any more.
- Fine by me. I don't particularly like Python, but it's the de facto standard in my field so I have to use it (admittedly this is an improvement over a decade ago, when MATLAB was the de facto standard). I don't care about preserving the spirit of Python; I just care that the thing that bears the name Python meets my needs.
- I share your view. Python's flexibility is central to Python.
Even type annotations, though useful, can get in the way for certain tasks. Betting on things like these to speed things up would be a mistake, since it would kind of force you to follow that style.
Anything that accelerates things should rely on runtime data, not on type annotations that won't change.
- > Python really needs to take the Typescript approach of "all valid Python4 is valid Python3"
It is called type hints, and is already there. TS typing doesn't bring any perf benefits over plain JS.
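A quick sketch of what "hints, not contracts" means at runtime. The function name is illustrative. Annotations are stored as metadata, and nothing checks them when the function is called.

```python
def double(x: int) -> int:
    return x * 2


print(double.__annotations__)  # {'x': <class 'int'>, 'return': <class 'int'>}
print(double("ab"))            # abab  (no runtime check; str * 2 just works)
```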
- You really need dedicated types for `int64` and something like `final`. Consider:
  class Foo:
      __slots__ = ("a", "b")
      a: int
      b: float
There are multiple issues with Python that prevent optimizations:
* a user can define a subtype `class my_int(int)`, so you cannot optimize the layout of `class Foo`
* the builtin `int` is a big-int-like number and `float` is a heap-allocated object, so operations on them are branchy and allocating.
And the fact that Foo is mutable and that `id(foo.a)` has to produce something complicates things further.
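A runnable version of the `Foo` example with `__slots__`. The `my_int` subclass comes from the comment; the rest is an illustrative sketch. Nothing stops a slot annotated `int` from holding a subclass instance with its own `__dict__`, an integer far beyond 64 bits, or something that isn't an int at all.

```python
class Foo:
    __slots__ = ("a", "b")
    a: int  # bare annotations create no class attributes, so no clash with __slots__
    b: float


class my_int(int):
    """A user subtype of int; unlike plain ints, its instances get a __dict__."""


foo = Foo()
foo.a = my_int(3)
foo.a.note = "extra per-instance state"  # more than a bare int64 slot could hold
foo.b = 1.5
print(isinstance(foo.a, int), foo.a + 1, foo.a.note)

big = Foo()
big.a = 2 ** 100  # still a valid int: arbitrary precision, no fixed 64-bit layout
big.b = "oops"    # annotations are not enforced at all
print(big.a.bit_length(), big.b)  # 101 oops
```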
- Maybe, but I quoted specific part I was replying to. TS has no impact on runtime performance of JS. Type hints in Python have no impact on runtime performance of Python (unless you try things like mypyc etc; actually, mypy provides `from mypy_extensions import i64`)
Therefore Python has no use for TS-like superset, because it already has facilities for static analysis with no bearing on runtime, which is what TS provides.
- Because the python devs weren't allowed to optimize on types. They are only hints, not contracts. If they become contracts, it will get 5-10x faster. But `const` would be more important than core types.
- What OP means is that they need to:
1) Add a TS-like language on top of Python in a backwards-compatible way
2) Introduce frozen/final runtime types
3) Use 1 and 2 to drive runtime optimizations
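For point 2, note that today's `typing.final` and `typing.Final` are purely static markers. A minimal sketch with an illustrative `Config` class: a type checker such as mypy flags both the subclass and the reassignment below, but the interpreter accepts them.

```python
from typing import Final, final


@final
class Config:
    LIMIT: Final = 10


# A static type checker rejects both of the following;
# the interpreter accepts them without complaint.
class Sneaky(Config):
    pass


Config.LIMIT = 11
print(issubclass(Sneaky, Config), Config.LIMIT)  # True 11
```

Turning these into runtime guarantees is exactly the semantic change being discussed; as things stand, an optimizer can't rely on them.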
- Still makes no sense. OP demands introduction of different runtime semantics, but this doesn't require adding more language constructs (TS-like superset). Current type hints provide all necessary info on the language level, and it is a matter of implementation to use them or not.
From all posts it looks like what OP wants is a different language that looks somewhat like Python syntax-wise, so calling for "backwards-compatible" superset is pointless, because stuff that is being demanded would break compatibility by necessity.
- I went sort of this route in an experiment with Claude. I really want Python for .NET, but I said: damn the expense, prioritize .NET compatibility, and remove anything that isn't feasibly supported. It means zero Python libs, but all of NuGet is supported. The rules are: all signatures need types, and if you declare a type, it is that type, no exceptions, just like in C# (if you squint when looking at var in a funny way). I wound up with reasonable results, just a huge trade of the entire Python ecosystem for .NET with an insanely Python-esque syntax.
Still churning on it, will probably publish it and do a proper blog post once I've built something interesting with the language itself.
- IronPython -> TitaniumPython?
- Isn't RPython doing that, allowing changes on startup and then it's basically statically typed? Does it still exist? Was it ever production-ready? I only once read a paper about it decades ago.
- It exists in the sense that PyPy exists.
As far as I can tell, it only ever existed to make PyPy possible, and was only defined/specified in terms of PyPy's needs.
- RPython is great, but it changes semantics in all sorts of ways. No sets for example. WTF? The native Set type is one of the best features of Python. Tuples also get mangled in RPython.
- I think sadly a lot of Python in the wild relies heavily, somewhere, on the crazy unoptimisable stuff. For example pytest monkey patches everything everywhere all the time.
You could make this clean break and call it Python 4 but frankly I fear it won't be Python anymore.
- As a person who has spent a lot of time with pytest, I'm ready for a testing framework that doesn't do any of that non-obvious stuff. I generally use unittest as much as I can these days; it's so much less _weird_ about how it does things. Like jeez, pytest, do you _really_ need to stress-test every obscure language feature? Your job is to call tests.
- Yeah, I've been thinking about how I'd do it from scratch, honestly. (One of the reasons Pytest could catch on is that it supported standard library `unittest` classes, and still does. But the standard library option is already ugly as sin, being essentially an ancient port of JUnit.)
I think it's not so much that Pytest is using obscure language features (decorators are cool and the obvious choice for a lot of this kind of stuff) but that it wants too much magic to happen in terms of how the "fixtures" automatically connect together. I would think that "Explicit is better than implicit" and "Simple is better than complex" go double for tests. But things like `pytest.mark.parametrize` are extremely useful.
- If you do that you then have a less productive language for many use cases IMHO.
All the dynamism from Python should stay where it is.
Just JIT and remember a type maybe, but do not force a type from a type hint or such things.
As a minimum, I would say not relying on that is the correct thing. You could exploit it, but not force it to change the semantics.
- I think there are ways that it could be reined in quite a bit with most people not noticing. But it would still be a different language.
- Perl 6 showed what happens when you do something like that.
- Allowing metaprogramming at module import (or another defined phase) would cover most monkey-patching use cases. A `from __future__ import python4` would allow developers to declare their code optimisable.
- > Python really needs to take the Typescript approach of "all valid Python4 is valid Python3
Great idea, but I'm not convinced that they learned anything from the Python 2 to 3 transition, so I wouldn't hold my breath.
If you want a language system without contempt for backward compatibility, you're probably better off with Java/C++/JavaScript/etc. (though using JS libraries is like building on quicksand.) Bit of a shame since I want to like Python/Rust/Swift/other modern-ish languages, but it turns out that formal language specifications were actually a pretty good idea. API stability is another.
- is that you, python core dev team? ;-)
- SPy [1] is a new attempt at something like this.
TL;DR: SPy is a variant of Python specifically designed to be statically compilable while retaining a lot of the "useful" dynamic parts of Python.
The effort is led by Antonio Cuni, Principal Software Engineer at Anaconda. Still very early days but it seems promising to me.
- There will be no Python 4, and 3.X policy requires forward compat, so we are already there.
- Oh, and while we're at it, fix the "empty array is instantiated at parse time so all your functions with a default empty array argument share the same object" bullshit.
- We don't call them "arrays".
It has nothing to do with whether the list is empty. It has nothing to do with lists at all. It's the behaviour of default arguments.
It happens at the time that the function object is created, which is during runtime.
You only notice because lists are mutable. You should already prefer not to mutate parameters, and it especially doesn't make sense to mutate a parameter that has a default value because the point of mutating parameters is that the change can be seen by the caller, but a caller that uses a default value can't see the default value.
The behaviour can be used intentionally. (I would argue that it's overused intentionally; people use it to "bind" loop variables to lambdas when they should be using `functools.partial`.)
If you're getting got by this, you're fundamentally expecting Python to work in a way that Pythonistas consider not to make sense.
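A minimal sketch of the two behaviours described above: the default object being created once when the `def` runs (so a mutated default list is shared across calls), and "binding" loop variables to lambdas via a default argument versus `functools.partial`. The names `append_to` and `identity` are illustrative only.

```python
import functools


def append_to(item, target=[]):
    # The default list is created once, when the `def` statement executes,
    # and every call that omits `target` reuses that same object.
    target.append(item)
    return target


first = append_to(1)
second = append_to(2)
print(first is second, second)  # True [1, 2]

# Late binding: each lambda looks up `i` when it is called, after the loop is done.
late = [lambda: i for i in range(3)]
print([f() for f in late])  # [2, 2, 2]

# The default-argument trick captures the value `i` had when each lambda was created.
bound_default = [lambda i=i: i for i in range(3)]
print([f() for f in bound_default])  # [0, 1, 2]


def identity(x):
    return x


# functools.partial makes the same binding explicit.
bound_partial = [functools.partial(identity, i) for i in range(3)]
print([f() for f in bound_partial])  # [0, 1, 2]
```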
- It's best practice to avoid mutable defaults even if you're not planning to mutate the argument.
It's just slightly annoying having to work around this by defaulting to None.
- You don't need to use `None`. If you indeed aren't planning to mutate the argument, then use something immutable that provides the necessary interface. Typically, this will be `()`, and then your logic doesn't require the special case. I genuinely don't understand, after 20+ years of this, why everyone else has decided that the `None` check should be idiomatic. It's just, ugh. I'm pretty sure I've even seen people do this where a string is expected and `''` is right there staring at them as the obvious option.
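To illustrate the `()` suggestion, here is a small sketch comparing an immutable tuple default with the `None` idiom; the function names are hypothetical. With `()` the body needs no special case, because iterating over an empty tuple simply does nothing.

```python
def total_length(words=()):
    # An immutable default can be shared safely across calls, and the body
    # needs no `None` special case: iterating over () just does nothing.
    return sum(len(w) for w in words)


def total_length_none(words=None):
    # The `None` idiom, for comparison: one extra branch per such parameter.
    if words is None:
        words = []
    return sum(len(w) for w in words)


print(total_length(), total_length(["ab", "c"]))  # 0 3
```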
- Execution time, not parse time. It's a side effect of function declarations being statements that are executed, not the list/dict itself. It would happen with any object.
- It's still ridiculous. A hypothetical Python4 would treat function declarations as declarations not executable statements, with no impact on real world code except to remove all the boilerplate checks.
- There is no such thing as a "function declaration" in Python. The keyword is "def", which is the first three letters of the word "define" (and not a prefix of "declare"), for a reason.
The entire point of it being an executable statement is to let you change things on the fly. This is key to how the REPL works. If I have `def foo(): ...` twice, the second one overwrites the first. There's no need to do any checks ahead of time, and it works the same way in the REPL as in a source file, without any special logic, for the exact same reason that `foo = 1` works when done twice. It's actually very elegant.
People who don't like these decisions have plenty of other options for languages they can use. Only Python is Python. Python should not become not-Python in order to satisfy people who don't like Python and don't understand what Python is trying to be.
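A short sketch of what "`def` is an executable statement" means in practice: the default expression is evaluated once, at the moment the `def` runs; a second `def` simply rebinds the name; and a `def` can sit inside ordinary control flow. `make_default` and `version_label` are hypothetical names used only for this demonstration.

```python
import sys

evaluations = []


def make_default():
    # Records every evaluation of the default expression.
    evaluations.append("evaluated")
    return []


# make_default() runs right here, once, when this `def` statement executes.
def foo(arg=make_default()):
    return arg


foo()
foo()
print(len(evaluations))  # 1: calling foo() does not re-evaluate the default


# A second `def foo` just rebinds the name, exactly like `foo = 1` done twice.
def foo():
    return "redefined"


print(foo())  # redefined

# Because `def` is an ordinary statement, it can sit inside normal control flow.
if sys.version_info >= (3,):
    def version_label():
        return "py3"
else:
    def version_label():
        return "py2"

print(version_label())  # py3
```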
- You are describing a completely different language, that differs in very major ways from Python. You can of course create that, but please don't call it Python 4 !
- You think so but then you write a function with a default argument pointing to some variable that is a list and now suddenly the semantics of that are... what?
- you could just treat argument initialization as an executable expression which is called every time you call a function. If you have a=[], then it's a new [] every time. If a=MYLIST then it's a reference to the same MYLIST. Simple. And most sane languages do it this way, I really don't know why python has (and maintain) this quirk.
- What are the semantics of the following:
    b = ComplexObject(...)
    # do things with b
    def foo(self, arg=b):
        # use b
        return foo
Should it create a copy of b every time the function is invoked? If you want that right now, you can just call b.copy(); but if a copy is always created, then you cannot implement the current choice.
Should the semantics of this be any different?
    def foo(self, arg=ComplexObject(...)):
Now imagine:
    ComplexObject = list
- I wonder why that kind of ambiguity or complexity even comes to your mind at all. Just because python is weird?
def foo(self, arg=expression):
could, and should work as if it was written like this (pseudocode)
    def foo(self, arg?):
        if is_not_given(arg):
            arg = expression
if "expression" is a literal or a constructor, it'd be called right there and produce new object, if "expression" is a reference to an object in outer scope, it'd be still the same object.
it's a simple code transformation, very, very predictable behavior, and most languages with closures and default values for arguments do it this way. Except python.
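The transformation described above can already be emulated in today's Python with a private sentinel object. This is only a sketch of the proposed semantics under that assumption (it is not PEP 671's syntax): a literal default yields a fresh object per call, while a name from the outer scope still refers to the same object.

```python
_MISSING = object()  # private sentinel: "argument not passed", distinct from an explicit None

SHARED = ["outer"]


def fresh(arg=_MISSING):
    # Emulates `arg=[]` re-evaluated on every call: a new list each time.
    if arg is _MISSING:
        arg = []
    arg.append("x")
    return arg


def outer_ref(arg=_MISSING):
    # Emulates `arg=SHARED` re-evaluated on every call: the name still
    # refers to the same outer-scope object, so no copy is made.
    if arg is _MISSING:
        arg = SHARED
    return arg


print(fresh(), fresh())       # ['x'] ['x']
print(outer_ref() is SHARED)  # True
```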
- What you want is for an assignment in a function definition to be a lambda.
Assignment of unevaluated expressions is not a thing yet in Python and would be really surprising. If you really want that, that is what you get with a lambda:
    def foo(self, arg=lambda: expression):
> most languages with closures and default values for arguments do it this way.
Do these also evaluate function definitions at runtime?
- yes they do. check ruby for example.
- Let's not get started on the cached shared object refs for small integers....
- What realistic use case do you have for caring about whether two integers of the same value are distinct objects? Modern versions of Python warn about doing unpredictable things with `is` exactly because you are not supposed to do those things. Valid use cases for `is` at all are rare.
- `if v is not None`, as opposed to `if not v`, is one of those use cases if you store 0 or False or an empty list, etc.
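A quick sketch of the difference: a truthiness check treats every falsy value as "missing", while the identity check only treats `None` that way. The helper names are illustrative.

```python
def describe_plain(v):
    # Truthiness check: 0, False, "" and [] are all treated like None.
    return "missing" if not v else "present"


def describe_identity(v):
    # Identity check: only None counts as missing.
    return "missing" if v is None else "present"


values = [None, 0, False, "", []]
print([describe_plain(v) for v in values])
# ['missing', 'missing', 'missing', 'missing', 'missing']
print([describe_identity(v) for v in values])
# ['missing', 'present', 'present', 'present', 'present']
```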
- > Valid use cases for `is` at all are rare.
There might not be that many of them, depending on how you count, but they're not rare in the slightest. For example, you have to use `is` in the common case where you want the default value of a function argument to be an empty list.
- I assume you refer to the `is None` idiom. That happens often enough, but I count it as exactly one use case, and I think it's usually poorly considered anyway. Again, you probably don't actually want the default value to be an empty list, because it doesn't make a lot of sense to mutate something that the caller isn't actually required to provide (unless the caller never provides it and you're just abusing the default-argument behaviour for some kind of cache).
Using, for example, `()` as a default argument, and cleaning up your logic to not do those mutations, is commonly simpler and more expressive. A lot of the community has the idea that a tuple should represent heterogeneous fixed-length data and a list should be homogeneous; but I consider (im)mutability to be a much more interesting property of types.
- Could you expand on this? For example, this works just fine:
    def silly_append(item, orig=[]):
        return orig + [item]
Edit: Oh, I think you probably mean in cases where you're mutating the input list.
- If you change this you break a common optimization:
https://github.com/python/cpython/blob/3.14/Lib/json/encoder...
Default value is evaluated once, and accessing parameter is much cheaper than global
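A minimal sketch of the "turn a global into a local via a default argument" pattern this comment refers to, using `math.hypot` as a stand-in (the linked encoder code is not reproduced here). Both functions compute the same result; the second binds the function object once, at definition time, so the loop uses a plain local name instead of a global lookup plus an attribute lookup. If defaults were re-evaluated on every call, that `math.hypot` lookup would be repeated on each call instead of happening once. Any speed difference should be measured (e.g. with `timeit`) rather than assumed.

```python
import math


def norms_global(points):
    result = []
    for x, y in points:
        # `math` is a global lookup and `.hypot` an attribute lookup, on every iteration.
        result.append(math.hypot(x, y))
    return result


def norms_local(points, _hypot=math.hypot):
    # The default is evaluated once, when the `def` runs; inside the body
    # `_hypot` is an ordinary local variable, the cheapest kind of name lookup.
    result = []
    for x, y in points:
        result.append(_hypot(x, y))
    return result


pts = [(3, 4), (5, 12)]
print(norms_global(pts) == norms_local(pts))  # True
```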
- there is PEP 671 for that, which introduces extra syntax for the behavior you want. people rely on the current behavior so you can't really change it
- I've been occasionally glancing at the PR/issue tracker to keep up to date with things happening with the JIT, but I've never seen where the high-level discussions were happening; the issues and PRs always jumped right to the gritty details. Is there anywhere a high-level introduction/example of how trace projection vs recording work and differ? Googling for the terms often returns the CPython issue tracker as the first result, and the repo's jit.md is relatively barebones and rarely updated :(
Similarly, I don't entirely understand refcount elimination; I've seen the codegen difference, but since the codegen happens at build time, does this mean each opcode is possibly split into two (or more?) stencils, with and without removed increfs/decrefs? With so many opcodes and their specialized variants, how many stencils are there now?
- > I've never seen where the high level discussions were happening
Thanks for your interest. This is something we could improve on. We were supposed to document the JIT better in 3.15, but right now we're crunching for the 3.15 release. I'll try to get to updating the docs soon if there's enough interest. PEP 744 does not document the new frontend.
I wrote a somewhat high-level overview here in a previous blog post https://fidget-spinner.github.io/posts/faster-jit-plan.html#...
> does this mean each opcode is possibly split into two (or more?) stencils, with and without removed increfs/decrefs?
This is a great question, the answer is not exactly! The key is to expose the refcount ops in the intermediate representation (IR) as one single op. For example, BINARY_OP becomes BINARY_OP, POP_TOP (DECREF), POP_TOP (DECREF). That way, instead of optimizing for n operations, we just need to expose refcounting of n operations and optimize only 1 op (POP_TOP). Thus, we just need to refactor the IR to expose refcounting (which was the work I divided up among the community).
If you have any more questions, I'm happy to answer them either in public or email.
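To make the "expose refcounting as one op, then optimize only that op" idea concrete, here is a toy model in Python. It is only an illustration of the reasoning in the answer above, not CPython's optimizer: the `Op` class, the string operands and the `no_decref_needed` set (standing in for whatever analysis proves a decref is unnecessary, e.g. an immortal constant) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    arg: str = ""


def lower(ops):
    # Expose refcounting in the IR: instead of hiding two decrefs inside
    # BINARY_OP, emit one explicit POP_TOP (a decref) per consumed input.
    out = []
    for op in ops:
        out.append(op)
        if op.name == "BINARY_OP":
            lhs, rhs = op.arg.split(",")
            out.append(Op("POP_TOP", lhs))
            out.append(Op("POP_TOP", rhs))
    return out


def eliminate_decrefs(ops, no_decref_needed):
    # A single pass that only has to understand POP_TOP: drop it when the
    # value is known not to need a decref.
    return [op for op in ops if not (op.name == "POP_TOP" and op.arg in no_decref_needed)]


trace = [Op("LOAD_FAST", "a"), Op("LOAD_CONST", "1"), Op("BINARY_OP", "a,1")]
lowered = lower(trace)
optimized = eliminate_decrefs(lowered, no_decref_needed={"1"})
print([(o.name, o.arg) for o in optimized])
```

The point of the design is that any new opcode only needs to be lowered into "operation plus explicit POP_TOPs"; the elimination pass itself never has to change.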
- Update: I put up a PR to document the trace recording interpreter https://github.com/python/cpython/pull/146110
- You’ll probably want to look to the PEPs. Haven't dug into this topic myself but looks related https://peps.python.org/pep-0744/
- I think CPython already had tier2 and some tracing infrastructure when the copy-and-patch JIT backend was added; it's the "JIT frontend" that's more obscure to me.
- discussions might be happening on the Python forums, which are pretty active.
https://discuss.python.org/t/pep-744-jit-compilation/50756/8... here's one thing
I do think you can also just outright ask questions about it on the forums and you'll get some answers.
At the end of the day there's only so many people working on this though.
- UPDATE: I misunderstood the question :-/ You can ignore this.
I love playing with compilers for fun, so maybe I can shed some light. I’ll explain it in a simplified way for everyone’s benefit (going to ignore the stack):
When an object is passed between functions in Python, it doesn’t get copied. Instead, a reference to the object’s memory address is sent. This reference acts as a pointer to the object’s data. Think of it like a sticky note with the object’s memory address written on it. Now, imagine throwing away one sticky note every time a function that used a reference returns.
When an object has zero references, it can be freed from memory and reused. Ensuring the number of references, or the “reference count” is always accurate is therefore a big deal. It is often the source of memory leaks, but I wouldn’t attribute it to a speed up (only if it replaces GC, then yes).
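The sticky-note picture can be observed directly in CPython. This sketch is CPython-specific: `sys.getrefcount` reports an implementation detail (including a temporary reference for its own argument), so only the change in the count is meaningful. `weakref.finalize` registers a callback that runs when the object is reclaimed, which under reference counting happens as soon as the last reference goes away.

```python
import sys
import weakref


class Payload:
    pass


obj = Payload()
before = sys.getrefcount(obj)  # absolute value is an implementation detail
holder = [obj]                 # hand out one more "sticky note"
after = sys.getrefcount(obj)
print(after - before)          # 1

freed = []
weakref.finalize(obj, freed.append, "freed")  # runs when the object is reclaimed
del holder                     # drop the list's reference
del obj                        # last reference gone: CPython frees it immediately
print(freed)                   # ['freed']
```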
- what at all does this comment have to do with what it's replying to?
- I misread the original comment, thinking it was a question about what refcount elimination is, rather than how it affects the JIT's performance(?).
- Oh man, Python 2 > 3 was such a massive shift. It took almost half a decade if not more, and yet it mainly changed superficial syntax stuff. They should have allowed ABIs to break and gotten these internal things done. They could probably have come up with a new, tighter API for integrating with other lower-level languages, so that going forward Python internals could be changed more freely without breaking everything.
- The text encoding stuff wasn't a small change considering what it could break, at least. And remember we're sometimes talking about software that would cost a lot of money to migrate or upgrade. I still maintain some 2.x python code-bases that will be very expensive to migrate and the customer is not willing to invest that money.
Although your general sentiment is something I agree with (if it's going to be painful, do it and get it over with), I don't believe anybody knew or could've guessed what the reaction of the ecosystem would be.
Your last point about being able to change internals more freely is also great in theory but very difficult (if not impossible) to achieve in practice.
I don't know. Having maintained some small projects that were free and open source, I saw the hostility and entitlement that can come from that position. And those projects were a speck of dust next to something like Python. So I think the core team is doing the best they can. It was always going to be damned if you do, damned if you don't.
- > I still maintain some 2.x python code-bases that will be very expensive to migrate and the customer is not willing to invest that money.
Slight tangent: if Claude can decimate IBM stock price by migrating off Cobol for cheap, surely we can do Python 2 to 3 now, too?
About the internals: we sort of missed an opportunity there, but back then they also didn't quite know what they were doing (or at least we have better ideas of what's useful today). And making the step from 2 to 3 even bigger might have been a bad idea?
- I wasn't aware that migrating projects off Cobol has become cheap and it would only take a Claude subscription.
In my experience, the problem had always been maintaining the business logic and any integrations with third-party software that also may be running legacy code-bases or have been abandoned. It can get quite complicated, from what I've seen. Now of course if you're talking about well maintained code-bases with 100%, or close to 100% test coverage, and that includes the integration part along with having the ability to maintain the user experience and/or user interface then yes it becomes a relatively easy process of "just write the code". But, in my experience, this has never been the case.
For the 2.x code-bases I maintain, the customers simply don't want to pay for any of it. They might choose to at a later time, but so far it has been more cost effective for them to pay me to maintain that legacy code than pay to have it migrated. Other customers have different needs and thus budget differently.
I'll refrain from judging if 2 to 3 was a missed opportunity or not. I believe the core team does actually know what they're doing and that any decision would've been criticized.
- > I wasn't aware that migrating projects off Cobol has become cheap and it would only take a Claude subscription.
It's like you never even saw u/bumlazer42069's seminal post that 'itd only take 3 prompts and a weekend for me to port all that cobol to typescript'
- IBM shares fell 13% in a single day last month:
"IBM Sinks Most Since 2000 as Anthropic Touts Cobol Tool"
https://finance.yahoo.com/news/ibm-sinks-most-since-2000-210...
It may not be "cheap", but possibly cheaper than IBM's consulting.
- I skip news like that. It's an AI business hyping one of their tools in a major AI hype-cycle. Shares can go up and down based on sentiment. My point still stands.
To me, there's a big difference between saying that migration projects can now be assisted with some AI tooling and saying that it is cheap and to just get Claude to do it.
Maybe I am out of touch but the former is realistic and the latter is just magical hand-waving.
- Share-pricing operates on illusions. Just selling a plausible claim can influence the price. Whether they will deliver at the end, doesn't matter at that moment.
- IBM share price is back to where it was pre-Anthropic press release.
- Sure, but imagine how much higher it would have gone in the counterfactual world where Anthropic didn't have an automatic port-from-Cobol tool.
- Remember that those who trade on the stock market are not programmers with decades of experience writing cobol.
- > I believe the core team does actually know what they're doing and that any decision would've been criticized.
I agree with the latter. About the former: they probably made good decisions given the information available at the time. I mean that nowadays they know more than they did in the past.
- Absolutely, I had a 2 -> 3 code base I'd mostly given up on, and Claude was amazing. It even rewrote some libraries I used that had no py3 versions; it decided to just write the parts of the libraries I needed.
It does much better with good tests. In my case the output was a statically generated website, so I could just say 'make the same website, given these inputs'.
- I cannot believe people are still acting like Python 2->3 was a huge fuck-up and an enormous missed opportunity. When in reality Python is by most measures the most popular language and became so AFTER that switch.
Since the switch we have seen enormous companies being built from scratch. There is no reason for anyone to be complaining about it being too hard to upgrade in 2026
- Living through it... Python 3 made a lot of changes for the better but 3.0 in particular included a bunch of unforced errors that made it too hard for people to upgrade in one go.
It wasn't until much later (I would say 3.4 or 3.5?) that we had good tooling to allow for migrating from Python 2 to Python 3 gradually, which is what most tools needed to do.
The final thing that made Python upgrading easy was making a bunch of changes (along with stuff like six) so that you could write code that would run identically in Python 2 and Python 3. That lets you do refactors over time, little cleanups, and not have the huge "move to Python 3" commit.
- > Python is by most measures the most popular language and became so AFTER that switch
The switch had nothing to do with Python's rise in popularity though; it was because of NumPy and later PyTorch being adopted by data scientists and later for machine learning tasks that themselves became very popular. Python's popularity rose alongside those.
> There is no reason for anyone to be complaining about it being too hard to upgrade in 2026
The "complaints" are about unnecessary and pointless breakage that made many codebases very difficult to upgrade for years. That by now most of these codebases have been abandoned, upgraded, or kept on Python2 until the end of time doesn't mean these pains didn't happen, nor that the language's developers inflicting them on their users was a good idea just because some largely unrelated external factors made the language popular several years later.
- > that was very difficult for many codebases to upgrade for years.
In case people have forgotten: python 3.3 through 3.5 (and 3.6 I think) each had to reintroduce something that was removed to make the upgrade easier. Jumping from 2.7 to 3.3 (or higher depending on what you needed) was the recommended route because of this, it was less work than going to 3.0, 3.1, or 3.2
- It took a long time for python 3 to add the necessary backwards compatibility features to allow people to switch over. Once they did it was fine, but it was a massive fuck up until then. The migration took far longer than it should have.
It's widely regarded as a disaster for good reason, which forced some corrections in python to fix it. Just because it's fine now does not mean it was always fine.
- Now they just break stuff every release so we never relax.
- Those are unrelated.
- The biggest (and worst planned) change was module names. Your imports didn't work, forcing hacks like
    if sys.version_info.major == 2:
        import old
    else:
        import new
Or worse, people used try/except in their imports.
- yes. it was not a massive shift. it was barely worth the effort.
- The Python devs didn’t want to make huge changes because they were worried Python 3 would end up taking forever like Perl 6. Instead they went to the other extreme and broke everyone’s code for trivial reasons and minimal benefit, which meant no-one wanted to upgrade.
Even the main driver for Python 3, the bytes-Unicode split, has unfortunately turned out to be sub-optimal. Python essentially bet on UTF-32 (with space-saving optimisations), while everyone else has chosen UTF-8.
- > Python essentially bet on UTF-32 (with space-saving optimisations)
How so? Python3 strings are unicode and all the encoding/decoding functions default to utf-8. In practice this means all the python I write is utf-8 compatible unicode and I don't ever have to think about it.
- UTF-32 allows for constant time character accesses, which means that mystr[i] isn't O(n). Most other languages can only provide constant time access for code units.
- UTF-32 allows for constant time access to code points. Neither UTF-8 nor UTF-16 can do the same (there are 0x110000, roughly 2 to the power of 20, valid code points, though not all are in use).
While most characters might be encodable as a single code point, Python does not normalize strings, so there is no guarantee that even relatively normal characters are actually stored as single code points.
Try this in Python:
    s = "a\u0308"
    print(s)
    print(s[0])
You will see:
    ä
    a
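A small extension of that example using the standard `unicodedata` module: indexing and `len` work on code points, and only explicit normalization (NFC here) composes "a" plus the combining diaeresis into the single code point U+00E4. Since Python never normalizes on its own, the two spellings compare unequal.

```python
import unicodedata

s = "a\u0308"                     # "a" followed by COMBINING DIAERESIS
print(len(s), s[0] == "a")        # 2 True: indexing is by code point

nfc = unicodedata.normalize("NFC", s)
print(len(nfc), nfc == "\u00e4")  # 1 True: composed into U+00E4

# No implicit normalization, so the two spellings are different strings.
print(s == nfc)                   # False
```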
- Internally Python holds a string as an array of uint32. A utf-8 representation is created on demand from it (and cached). So pansa2 is basically correct [1].
IMO, while this may not be optimal, it's far better than the more arcane choice made by other systems. For example, due to reasons only Microsoft can understand, Windows is stuck with UTF-16.
[1] Actually it's more intelligent. For example, Python automatically uses uint8 instead of uint32 for ASCII strings.
- There is no caching of a "utf-8 representation". You may check for example:
    >>> x = '日本語'*100000000
    >>> import time
    >>> t = time.time(); y = x.encode(); time.time() - t # takes nontrivial time
    >>> t = time.time(); y = x.encode(); time.time() - t # not cached; not any faster
Generally, the only reason this would happen implicitly is for I/O; actual operations on the string operate directly on the internal representation.
Python uses either 8, 16 or 32 bits per character according to the maximum code point found in the string; uint8 is thus used for all strings representable in Latin-1, not just "ASCII". (It does have other optimizations for ASCII strings.)
The reason for Windows being stuck with UTF-16 is quite easy to understand: backwards compatibility. Those APIs were introduced before there were supplementary Unicode planes, such that "UTF-16" could be equated with UCS-2; then the surrogate-pair logic was bolted on top of that. Basically the same thing that happened in Java.
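The 8/16/32-bit choice can be observed from Python code. This sketch is CPython-specific: it relies on `sys.getsizeof` of a freshly built string growing by exactly one character width when one more character of the same kind is appended. It shows that "é" (Latin-1, not ASCII) still gets 1 byte per character.

```python
import sys


def bytes_per_char(ch):
    # Grow a string by one character of the same kind and measure the difference.
    # CPython-specific: relies on the compact string layout reported by sys.getsizeof.
    return sys.getsizeof(ch * 101) - sys.getsizeof(ch * 100)


for ch in ["a", "\u00e9", "\u65e5", "\U0001F600"]:
    print(f"U+{ord(ch):04X}: {bytes_per_char(ch)} byte(s) per character")
```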
- > There is no caching of a "utf-8 representation".
No, there certainly is. This is documented in the official API documentation:
> UTF-8 representation is created on demand and cached in the Unicode object.
https://docs.python.org/3/c-api/unicode.html#unicode-objects
In particular, Python's Unicode object (PyUnicodeObject) contains a field named utf8. This field is populated when PyUnicode_AsUTF8AndSize() is first called and reused thereafter. You can check the exact code I'm talking about here:
https://github.com/python/cpython/blob/main/Objects/unicodeo...
Is it clear enough?
- The C API may provide for it, but I'm not seeing a way to access that from Python. This sort of thing is provided for people writing C extensions who need to interface to other C code.
(And the code search seems to be broken; it can't find me the definition of `unicode_fill_utf8` although I'm sure it's obvious enough.)
- Read first paragraph here https://devblogs.microsoft.com/oldnewthing/20190830-00/?p=10...
- > all the encoding/decoding functions default to utf-8
Languages that use UTF-8 natively don't need those functions at all. And the ones in Python aren't trivial - see, for example, `surrogateescape`.
As the sibling comment says, the only benefit of all this encoding/decoding is that it allows strings to support constant-time indexing of code points, which isn't something that's commonly needed.
- They absolutely do, because random byte strings are not valid UTF-8. Safe Rust requires validating bytes when converting to strings because of this.
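A sketch of both points using only built-in codecs: strict UTF-8 decoding rejects invalid bytes, while the `surrogateescape` error handler mentioned above maps each undecodable byte to a lone surrogate, so encoding with the same handler restores the original bytes exactly.

```python
raw = b"caf\xc3\xa9 \xff"  # valid UTF-8 for "café ", then a stray 0xFF byte

try:
    raw.decode("utf-8")  # strict decoding rejects bytes that are not valid UTF-8
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape smuggles each undecodable byte through as a lone surrogate...
text = raw.decode("utf-8", "surrogateescape")
print(repr(text))  # 'café \udcff'

# ...so encoding with the same handler restores the original bytes exactly.
roundtrip = text.encode("utf-8", "surrogateescape")
print(roundtrip == raw)  # True
```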
- > Python essentially bet on UTF-32 (with space-saving optimisations), while everyone else has chosen UTF-8.
It did nothing of the sort. UTF-8 is the default source file encoding and has been the target for many APIs. It likely would have been the default for all I/O stuff if we lived in a world where Windows had functioning Unicode in the terminal the whole time and didn't base all its internal APIs on UTF-16.
I assume you're referring to the internal representation of strings. Describing it as "UTF-32 with space-saving optimizations" is missing the point, and also a contradiction in terms. Yes, it is a system that uses the same number of bytes per character within a given string (and chooses that width according to the string contents). This makes random access possible. Doing anything else would have broken historical expectations about string slicing. There are good arguments that one shouldn't write code like that anyway, but it's hard to identify anything "sub-optimal" about the result except that strings like "I'm learning 日本語" use more memory than they might be able to get away with. (But there are other strings, like "ℍℯℓ℗", that can use a 2-byte width while the UTF-8 encoding would add 3 bytes per character.)
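A back-of-the-envelope comparison for the two example strings, counting character data only (object headers ignored). `internal_width` is a hypothetical helper that models the rule described in this thread: one width per string, chosen by its largest code point.

```python
def internal_width(s):
    # Model of the per-string width rule: 1, 2 or 4 bytes per character,
    # chosen by the largest code point in the string.
    top = max(map(ord, s), default=0)
    return 1 if top < 0x100 else 2 if top < 0x10000 else 4


for s in ["I'm learning 日本語", "ℍℯℓ℗"]:
    internal = internal_width(s) * len(s)
    utf8 = len(s.encode("utf-8"))
    print(f"{s!r}: internal ~{internal} bytes, UTF-8 {utf8} bytes")
```

The mixed ASCII/kanji string needs about 32 bytes internally versus 22 in UTF-8, while "ℍℯℓ℗" needs 8 versus 12, which is the trade-off the comment describes.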
- Ironically Perl 5 managed to do the bytes-Unicode split with a feature gate, no giant major version change.
- I'm curious if the JIT developers could mention any Python features that prevent promising JIT optimizations. An earlier Ken Jin blog post [1] mentions how __del__ complicates reference counting optimization.
There is a story that Python is harder to optimize than, say, Typescript, with Python flexibility and the C API getting mentioned. Maybe, if the list of troublesome Python features was out there, programmers could know to avoid those features with the promise of activating the JIT when it can prove the feature is not in use. This could provide a way out of the current Python hard-to-JIT trap. It's just a gist of an idea, but certainly an interesting first step would be to hear from the JIT people which Python features they find troublesome.
[1] https://fidget-spinner.github.io/posts/faster-jit-plan.html
- It's interesting you mention __del__ because Javascript not only doesn't have destructors, but for security reasons (that are above my pay grade) the spec _explicitly prohibits_ implementations from allowing visibility into garbage collection state, meaning that code cannot have any visibility into deallocations.
I think __del__ is tricky though. In theory __del__ is not meant to be reliable. In practice CPython reliably calls it cuz it reference counts. So people know about it and use it (though I've only really seen it used for best effort cleanup checks)
In a world where more people were using PyPy we could have pressure from that perspective to avoid leaning into it. And that would also generate more pressure to implement code that is performant in "any" system.
- > In practice CPython reliably calls it cuz it reference counts ... In a world where more people were using PyPy we could have pressure from that perspective to avoid leaning into it
A big part of the problem is that much of the power of the Python ecosystem comes specifically from extensions/bindings written in languages with manual (C) or RAII/ref-counted (C++, Rust) memory management, and having predictable Python-level cleanup behavior can be pretty necessary to making cleanup behavior in bound C/C++/Rust objects work. Breaking this behavior or causing too much of a performance hit is basically a non-starter for a lot of Python users, even if doing so would improve the performance of "pure" Python programs.
- That cleanup can be explicit when needed by using context managers. Mixing resource handling with object lifetime is a bad design choice
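A sketch contrasting the two cleanup styles discussed in this subthread. `Handle` is a hypothetical wrapper for some native resource. Its `__del__` runs immediately on `del` in CPython because of reference counting, but the language does not guarantee that timing on other implementations, whereas the context manager closes the resource at the end of the `with` block regardless of how garbage collection works.

```python
events = []


class Handle:
    """Hypothetical wrapper around some native resource."""

    def close(self):
        if not getattr(self, "closed", False):
            self.closed = True
            events.append("closed")

    # Implicit cleanup: CPython calls this as soon as the refcount hits zero,
    # but the language does not guarantee when (or whether) it runs.
    def __del__(self):
        self.close()

    # Explicit cleanup: runs at the end of the `with` block on any implementation.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False


h = Handle()
del h                   # in CPython, "closed" is recorded right here
after_del = list(events)

with Handle() as h2:
    pass                # "closed" is recorded when the block exits, independent of GC
```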
- > That cleanup can be explicit when needed by using context managers.
It certainly can be, but if a large part of the Python code you are writing involves native objects exposed through bindings then using context managers everywhere results in an incredible mess.
> Mixing resource handling with object lifetime is a bad design choice
It is a choice made successfully by a number of other high-performance languages/runtimes. Unfortunately for Python-the-language, so much of the utility of Python-the-ecosystem depends on components written in those languages (unlike, for example, JVM or CLR languages where the runtime is usually fast enough to require a fairly small portion of non-managed code).
- Tell that to the C++ guys...
- > code cannot have any visibility into deallocations
Doesn't FinalizationRegistry let you do exactly that?
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
- That link itself calls out that conformant implementations can’t be relied on to call callbacks.
> A conforming JavaScript implementation, even one that does garbage collection, is not required to call cleanup callbacks. When and whether it does so is entirely down to the implementation of the JavaScript engine. When a registered object is reclaimed, any cleanup callbacks for it may be called then, or some time later, or not at all. It's likely that major implementations will call cleanup callbacks at some point during execution, but those calls may be substantially after the related object was reclaimed. Furthermore, if there is an object registered in two registries, there is no guarantee that the two callbacks are called next to each other — one may be called and the other never called, or the other may be called much later. There are also situations where even implementations that normally call cleanup callbacks are unlikely to call them:
- It's supported in all of the major engines. And you also can't rely on the garbage collector to run at a predictable time (or at all!), so the engine never calling finalizers is functionally the same as the garbage collector being unusual.
- The only (other) visible effect of GC not running is memory exhaustion. WeakRef/FinalizationGroup not getting triggered can have lots of script-visible effects, so can be much much worse. I wouldn't describe that as "functionally the same".
- Oh! While this one does mention that you don't have visibility, this + weak refs seem to change the game
I remember a couple of years ago (well probably around 2021) reading about GC exposure concerns and seeing some line in some TC39 doc like "users should not have visibility into collection" but if we've shipped weakrefs sounds like we're not thinking about that anymore
- We still try to limit any additional exposure as much as possible, and WR/FG are specced to keep the visibility as coarse as possible. (Collections won't be visible until the current script execution finishes, though async adds a lot more places where that can happen.)
A proposal to add new ways of observing garbage collection will still be shot down immediately without a damn good justification.
- > meaning that code cannot have any visibility into deallocations.
This is more pedantry than a serious question. JavaScript has WeakRef; sure, it'd be cumbersome and inefficient because you'd need to manually make and poll each thing you wanted to observe, but could it not be said that it does provide a view on deallocations?
- Yes, WeakRef and FinalizationGroup both make GC visible (the latter removes the need to poll in your example). So not pedantic at all. They were eventually added after much reluctance from the language designers and implementers, partly because they can lead to code being broken by (valid & correct) engine optimizations, which is a big no-no on the web. But some things simply cannot be implemented without them.
Note that 90% of the uses for them actually shouldn't be using them, usually for subtle reasons. It's always a big cause for debate.
- Huh, I could imagine that as a set of Ruff rules:
> Using str.frobnicate prevents TurboJit on line 63
- The biggest thing is BigInt by default. It makes every integer operation require an overflow check.
- JS (when using ints, which v8 does) is the same in this respect.
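Because Python's `int` is arbitrary precision, a JIT that wants to use machine-word arithmetic has to guard every operation. A minimal sketch of that guard, written in Python (which cannot actually overflow); the int64 bounds model an assumed 64-bit register, and `fits_int64` / `add_with_overflow_check` are hypothetical names.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def fits_int64(value: int) -> bool:
    return INT64_MIN <= value <= INT64_MAX


def add_with_overflow_check(a: int, b: int) -> tuple[int, bool]:
    """Return (a + b, used_fast_path).

    Models what a JIT must do for `a + b` on Python ints: use a 64-bit add
    when operands and result fit, otherwise bail out to the bigint path.
    """
    if fits_int64(a) and fits_int64(b):
        result = a + b  # stands in for a single machine add instruction
        if fits_int64(result):  # stands in for checking the CPU overflow flag
            return result, True
    # Slow path: arbitrary-precision addition (Python's normal int semantics).
    return a + b, False


print(add_with_overflow_check(1, 2))          # (3, True)
print(add_with_overflow_check(INT64_MAX, 1))  # (9223372036854775808, False)
```

The check itself is cheap, but it sits on every integer operation and forces the compiled code to keep a slow path around.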
- > However, I misunderstood and came up with an even more extreme version: instead of tracing versions of normal instructions, I had only one instruction responsible for tracing, and all instructions in the second table point to that. Yes I know this part is confusing, I’ll hopefully try to explain better one day. This turned out to be a really really good choice. I found that the initial dual table approach was so much slower due to a doubling of the size of the interpreter, causing huge compiled code bloat, and naturally a slowdown.
> By using only a single instruction and two tables, we only increase the interpreter by a size of 1 instruction, and also keep the base interpreter ultra fast. I affectionally call this mechanism dual dispatch.
I really do hope they'll write that better explanation one day because this sounds pretty intriguing all on its own.
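One possible reading of the quoted "dual dispatch" idea, as a toy stack machine in Python for readability. This is a sketch under that assumption, not CPython's implementation (whose interpreter is generated C); the opcodes, handlers and `VM` class are hypothetical. The base table holds the normal handlers. The second table maps every opcode to one shared tracing handler, which records the instruction and then runs the real handler, so enabling tracing adds a single handler instead of a traced copy of every instruction.

```python
def op_push(vm, op, arg):
    vm.stack.append(arg)


def op_add(vm, op, arg):
    b = vm.stack.pop()
    a = vm.stack.pop()
    vm.stack.append(a + b)


# Table 1: the normal, fast handlers.
BASE_TABLE = {"PUSH": op_push, "ADD": op_add}


def op_trace(vm, op, arg):
    # The single tracing instruction: record, then run the real handler.
    vm.trace.append((op, arg))
    BASE_TABLE[op](vm, op, arg)


# Table 2: every opcode points at the same tracing handler.
TRACE_TABLE = {op: op_trace for op in BASE_TABLE}


class VM:
    def __init__(self):
        self.stack = []
        self.trace = []          # recorded (op, arg) pairs while tracing
        self.table = BASE_TABLE  # switching modes = switching tables

    def run(self, code):
        for op, arg in code:
            self.table[op](self, op, arg)  # one indirect dispatch per instruction
        return self.stack[-1]
```

Usage: `VM().run([("PUSH", 2), ("PUSH", 3), ("ADD", None)])` returns 5 with no tracing; assigning `vm.table = TRACE_TABLE` flips the same VM into tracing mode without touching the base handlers.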
- > We don’t have proper free-threading support yet, but we’re aiming for that in 3.15/3.16. The JIT is now back on track.
I recently read an interview about implementing free-threading and getting modifications through the ecosystem to really enable it: https://alexalejandre.com/programming/interview-with-ngoldba...
He said he hopes the free-threaded build will be the only one in "3.16 or 3.17". I wonder whether that applies to the JIT too, and how the JIT and the interpreter interact.
- I continue to believe that free-threading hurts performance more than it helps and Python should abandon it.
Having to have thread safe code all over the place just for the 1% of users who need to have multi-threading in Python and can't use subinterpreters for some reason is nuts.
- > Having to have thread safe code all over the place just for the 1% of users who need to have multi-threading in Python and can't use subinterpreters for some reason is nuts.
Way more than 1% of the community, particularly of the community actively developing Python, wants free-threaded. The problem here is that the Python community consists of several different groups:
1. Basically pure Python code with no threading
2. Basically pure Python with appropriate thread safety
3. Basically pure Python code with already broken threaded code, just getting lucky for now
4. Mixed Python and C/C++/Rust code, with appropriate threading behavior in the C or C++ components
5. Mixed Python and C or C++ code, with C and C++ components depending on GIL behavior
Group 1 sees slightly reduced performance. Groups 2 and 4 get a major win with free-threaded Python, being able to use threading through their interfaces to C/C++/Rust components. Group 3 is already writing buggy code and will probably see worse consequences from their existing bugs. Group 5 will have to either avoid threading in their Python code or rewrite their C/C++ components.
Right now, a big portion of the Python language developer base consists of Groups 2 and 4. Group 5 is basically perceived as holding Python-the-language and Python-the-implementations back.
- Where is the major win? Sorry but I just don't see the use case for free-threading.
Native code can already be multi-threaded so if you are using Python to drive parallelized native code, there's no win there. If your Python code is the bottleneck, well then you could have subinterpreters with shared buffers and locks. If you really need to have shared objects, do you actually need to mutate them from multiple interpreters? If not, what about exploring language support for frozen objects or proxies?
The only thing that free threading gives you is concurrent mutations to Python objects, which is like, whatever. In all my years of writing Python I have never once found myself thinking "I wish I could mutate the same object from two different threads".
- > Native code can already be multi-threaded so if you are using Python to drive parallelized native code, there's no win there.
When using something like boost::python or pybind11 to expose your native API in Python, it is not uncommon to have situations where the native API is extensible via inheritance or callbacks (which are easy to represent in these binding tools). Today with the GIL you are effectively forced to choose between exposing the native API parallelism or exposing the native API extensibility; e.g. you can expose a method that performs parallel evaluation of some inputs, OR you can expose a user-provided callback to be run on the output of each evaluation, but you cannot evaluate those inputs and run a user-provided callback in parallel.
The "dumbest" form of this is with logging; people want to redirect whatever logging the native code may perform through whatever they are using for logging in Python, and that essentially creates a Python callback on every native logging call that currently requires a GIL acquire/release.
Could some of this be addressed with various Python-specific workarounds/tools? Probably. But doing so is probably also going to tie the native code much more tightly to problematic/weird Pythonisms (in many cases, the native library in question is an entirely standalone project).
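In pure Python terms, the "parallel evaluation plus user callback" pattern looks roughly like the sketch below. The thread pool stands in for the native library's own worker threads, and `evaluate_all` is a hypothetical name. With the GIL, every call into the Python callback must hold the GIL, so callbacks from different workers serialize; on a free-threaded build they need not.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_all(inputs, evaluate, on_result, max_workers=4):
    """Run `evaluate` on each input in a thread pool and invoke the
    user-provided `on_result(input, result)` callback on the worker thread."""

    def work(x):
        result = evaluate(x)
        on_result(x, result)  # user callback, e.g. a logging hook
        return result

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(work, inputs))  # results in input order


seen = []
print(evaluate_all([1, 2, 3, 4], lambda x: x * x, lambda x, r: seen.append((x, r))))
# [1, 4, 9, 16]
```

The code is identical on both builds; what changes is whether the callbacks can actually overlap.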
> The only thing that free threading gives you is concurrent mutations to Python objects, which is like, whatever.
The big benefit is that you get concurrency without the overhead of multi-process. Shared memory is always going to be faster than having to serialize for inter-process communication (let alone that not all Python objects are easily serializable).
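The serialization point is easy to check: `multiprocessing` pickles objects to pass them between processes, while threads share them directly. A minimal sketch with a hypothetical `is_picklable` helper showing that ordinary data round-trips but common objects such as locks and lambdas do not.

```python
import pickle
import threading


def is_picklable(obj) -> bool:
    """Return True if obj survives a pickle round-trip (what multiprocessing
    needs to ship it to another process); threads skip this step entirely."""
    try:
        pickle.loads(pickle.dumps(obj))
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


print(is_picklable({"weights": [0.1, 0.2]}))  # True
print(is_picklable(threading.Lock()))         # False
print(is_picklable(lambda x: x))              # False
```

Even when an object does pickle, the round-trip copies it, which is the overhead that shared memory between threads avoids.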
- Maybe they could have two versions of the interpreter, one that’s thread-safe and one that’s optimised for single-threading?
Microsoft used to do this for their C runtime library.
- PHP does this as well. Most distributions ship PHP without thread safety, but it's seeing more use now that FrankenPHP uses it. Speaking of which, it would be nice if PHP's JIT got a little love: it's never eked out more than marginal gains in heavily-numeric code.
- That's exactly what we have now, and it looks like the Python devs want a single unified build at some point.
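The two current builds can be told apart at runtime. A minimal sketch, assuming Python 3.13+ for `sys._is_gil_enabled()` (underscore-prefixed, so treat it as provisional) and the `Py_GIL_DISABLED` build config variable; `gil_status` is a hypothetical name.

```python
import sys
import sysconfig


def gil_status() -> dict:
    """Report whether this interpreter is a free-threaded build and whether
    the GIL is currently enabled (a free-threaded build can re-enable it)."""
    # Missing or 0 on the default build; 1 on free-threaded builds.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists on 3.13+; older versions always have a GIL.
    is_gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)
    return {"free_threaded_build": free_threaded_build,
            "gil_enabled": bool(is_gil_enabled())}


print(gil_status())
```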
- Pure Python code always needed mutexes for thread safety with or without ol' GIL. I thought the difficulty with removing the GIL instead had to do with C extensions that rely on it.
- This is accurate and the parent commenter here seems to be echoing a common misconception. Either they are confused or they need to elaborate more to demonstrate that they have a valid complaint.
For instance, this would have been a valid complaint:
"Users who don't need free threading will now suffer a performance penalty for their single-threaded code."
That is true. But if you are currently using multiple threads, code that was correct before will still be correct in the free threaded build, and code that was incorrect before will still be incorrect.
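The classic shared counter shows why: it is correct on both the default and the free-threaded build because of the lock, not because of the GIL. Without the lock, `counter += 1` is a read-modify-write that other threads can interleave with, which is the "just getting lucky" case. A minimal sketch:

```python
import threading

counter = 0
counter_lock = threading.Lock()


def increment(n: int) -> None:
    global counter
    for _ in range(n):
        # `counter += 1` is not atomic, GIL or no GIL, so guard it.
        with counter_lock:
            counter += 1


threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```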
- I don't want to go too heavy on the negatives, but what's nuts is Python going for trust-the-programmer style multithreading. The risk is that extension modules could cause a lot of crashes.
- My understanding is that many extension modules are already written to take advantage of multithreading by releasing the GIL when calling into C code. This allows true concurrency in the extension, and also invites all the hazards of multithreading. I wonder how many bugs will be uncovered in such extensions by the free threaded builds, but it seems like the “nuts” choice actually happened a long time ago.
- I also wonder how many people actually need free-threading. And I wonder how useful it will be, when you can already use the ABI to call multi-threaded code.
I think the GIL provides Python with a great guarantee. To be honest, I would probably prefer single-thread performance improvements over multithreading in Python.
Anyway, if I need performance, Python would probably not be my first choice.
- Doesn't PyPy already have a JIT compiler? Why aren't we using that?
- As far as I know, PyPy doesn't support all CPython extensions, so pure Python code will very likely run fine, but for anything else most bets are off. I believe PyPy also only supports up to 3.11?
- PyPy isn't CPython.
A lot of Python code still leans on CPython internals, C extensions, debuggers, or odd platform behavior, so PyPy works until some dependency or tool turns that gap into a support problem.
The JIT helps on hot loops, but for mixed workloads the warmup cost and compatibility tax are enough to keep most teams on the interpreter their deps target first.
- Why shouldn't the reference implementation get a JIT? Some other implementations already having one is no reason not to. That'd be like skipping list comprehensions because they already exist in CPython.
- Because the same people who made a big deal about supporting PyPy and PEP 399 when it was fashionable to do so are now told by their corporations that PyPy does not matter. CPython only moves with what is currently fashionable, employer-mandated and profitable.
- PyPy is limited to maintenance mode due to a lack of funding/contributors. In the past, I think a few contributors or funding is what helped push "minor" PyPy versions. It's too bad PyPy couldn't take the federal funding the PSF threw away.
- > It's too bad PyPy couldn't take the federal funding the PSF threw away.
The PSF is primarily a political advocacy organisation, so it wouldn't make sense for them to use the money for Python.
- Because PyPy seems to be defunct. It hasn't been updated for quite a while.
See https://github.com/numpy/numpy/issues/30416 for example. It's not being updated for compatibility with new versions of Python.
- PyPy's devs disagree: https://news.ycombinator.com/item?id=47293415
- It supports at best Python 3.11 code, right?
So it’s not unmaintained, no. But the project is currently under-resourced to keep up with the latest Python spec.
- That is not the same thing at all, and not what he said.
- It is exactly what I'm referring to. I didn't say there aren't still people around. But they're far enough behind CPython that folks like NumPy are dropping support. Unless they get a substantial injection of new people and new energy, they're likely to continue falling behind.
- > I didn't say there aren't still people around.
You said it was defunct, which would mean there aren't still people working on it.
- Not what you wrote.
Also, CPython 3.10 is not EOL yet, so library authors won't be using anything from 3.11 anyway.
- Great to see this going. Python also deserves a JIT, and given that only a few people bother with PyPy or GraalPy, shipping one in CPython is the only way to get fewer "rewrite it in XYZ" moves.
Kudos to those involved in making it happen.
- Thanks for all the amazing work! I have a noob question: wouldn't this get the funding back? Or would that not be the preferable way to continue (as opposed to being purely volunteer-driven)?
It's a big deal to get a project to a state where volunteers are spun up, actively breaking down tasks and getting work done, no? And it's a Python JIT, something I (like most application developers) know next to nothing about, which tells you how difficult this must have been.
- > Wouldn't this get the funding back?
The funding was Microsoft employing most of the team. They were laid off (or at least, moved onto different projects), apparently because they weren't working on AI.
- With Python being the main language for AI, isn't it even more important for it to be performant? I kinda don't get Microsoft's reasoning; maybe they're just tight on money.
- I don’t think Python is the main language of AI.
- Python is pretty big as glue in the AI ecosystem, as far as I can tell. It also seems to be most agents' 'preferred' language to write code in when you don't specify anything.
(The latter probably has more to do with the preferences they're given in the reinforcement learning phase than anything technical, though.)
- It looks like ARM picked up plenty of those folk and pays them to continue this work.
- I always wanted this for Python but now that machines write code instead of humans I feel like languages like Python will not be needed as much anymore. They're made for humans, not machines. If a machine is going to do the dirty work I want it to produce something lean, fast, and strictly verified.
- > now that machines write code instead of humans
That is not remotely the case for anyone who produces quality work.
- Look again.
If you care about quality you absolutely can guide a machine to produce that for you without writing a single line of code yourself.
And I expect the amount of guidance needed will continue to drop.
- We got daguerreotypes, and then photographic film, and then digital cameras, along with image editing software, and now AI image generation systems; yet there are still people who go out and apply oil paints to a canvas with natural hair brushes. I'm not willing to lose that.
- Pretty much my thoughts the other day... now that Codex does the writing, maybe I can finally switch to Go for the web backend stuff without being annoyed by some of its archaisms and gain significant execution performance, while still having a relatively easy to read language.
- You ask a machine to write your code and you still care about being easy to read?
In my experience the people who care the most about code readability tend to be the people most opinionated on having the right abstractions, which are historically not available in Go.
- > You ask a machine to write your code and you still care about being easy to read?
I just happen to read what the machine writes, which is a way to both learn and inspect. So yes, I care about the code being relatively (and I stress relatively) easy to read. Go is ok there.
- I don't think people mind reading Go as much as they mind writing it.
- Nah, all the `if err != nil` is just so much noise that it obscures the real logic. And for the longest time Go didn't have generics for writing map/filter/reduce over slices, forcing people to use loops where the intention is less clear.
- Ideally, errors shouldn't be returned as-is but wrapped with context. If that context doesn't matter to you, you can have your editor fold the `if` blocks instead, which helps a lot.
- I've shifted as much of my Python as I can to Go, for code I'm not writing by hand. It's just faster and the compiler catches more errors: win-win.
- AI, write me that sqlalchemy clone in <lang>
- Over 100% speedup sounds like "the code compiled before you asked the compiler to start working".
`from future import time_travel`
- If the speed of a car increases by 100%, does that mean it arrives at its destination before it left? No, it just means it takes 50% of the time it would have otherwise.
But I do agree that it would be a bit clearer to talk in terms of time taken rather than speedup % i.e. instead of "20% slowdown to over 100% speedup" it's clearer to say "takes between 50% and 125% of the original time". (Especially since people very often say things like "3 times faster", which technically means 4 times as fast, when they should say "3 times as fast"; "takes 1/3 of the time" is unambiguous.)
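The conversion is a one-liner once you fix what the percentage measures. A minimal sketch, assuming (as this comment does) that "speedup %" is measured on speed, so time scales as its reciprocal; `time_fraction` is a hypothetical name, and `Fraction` keeps the results exact.

```python
from fractions import Fraction


def time_fraction(speedup_percent: int) -> Fraction:
    """Fraction of the original run time left after a given % speedup.

    A speedup of s% makes the new speed (1 + s/100) times the old one,
    so the time taken is 1 / (1 + s/100) = 100 / (100 + s).
    """
    return Fraction(100, 100 + speedup_percent)


print(time_fraction(100))  # 1/2: a "100% speedup" halves the time
print(time_fraction(-20))  # 5/4: a "20% slowdown" takes 125% of the time
print(time_fraction(200))  # 1/3: "3 times as fast"
print(time_fraction(300))  # 1/4: literally "300% faster", i.e. 4 times as fast
```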
- What is wrong with the Python code base that makes this so much harder to implement than seemingly all other code bases? Ruby, PHP, JS. They all seemed to add JITs in significantly less time. A Python JIT has been asked for for like 2 decades at this point.
- The Python C API leaks its guts. Too much of the internal representation was made available to extensions, and now basically any change would be guaranteed to break backwards compatibility with something.
- Ooh, this makes sense. It's as if Linux had "don't break userspace" AND a whole bunch of other purely internal APIs you also can't refactor.
- It's a shame that Python 2->3 transition was so painful, because Python could use a few more clean breaks with the past.
This would be a potential case for a new major version number.
- On the other hand, taking backwards compatibility so seriously is a big part of the massive success of Python
- >> Python 2->3 transition
> taking backwards compatibility so seriously
Python’s backward compatibility story still isn’t great compared to things like the Go 1.x compatibility promise, and languages with formal specs like JS and C.
The Python devs still make breaking changes, they’ve just learned not to update the major version number when they do so.
- Indeed, Python's version format looks like semver, but it's just aesthetics: they remove stuff in most (every?) minor version. Just yesterday I wasted hours trying to figure out a bug before realizing my colleague hadn't read the patch notes.
- I would argue that the libraries, and specifically NumPy, are the reason Python is still in the picture today.
It will be interesting to see, moving forward, what languages survive. A 15% perf increase seems nice, until you realize that you get a 10x increase porting to Rust (and the AI does it for you).
Maybe library use/popularity is somewhat related to backwards compatibility.
Disclaimer: I teach Python for a living.
- Python is a language with really good libraries for different domains, like web (Django/Flask) and AI (NumPy, PyTorch and more). There's the whole ecosystem for scripting, and it's already installed on most Linux distros and on Macs. For GUIs, it has really good bindings for the major frameworks, Qt and GTK.
- And PyTorch, and Pandas, and, and…
- Built on and/or inspired by NumPy...
- > you get a 10x increase porting to Rust (and the AI does it for you)
So you keep reading/writing Python and push a button to get binary executables through whatever hoops are best today?
(I haven't seen the "fits your brain" tagline in the recent past ...)
- Python does not take backwards compatibility seriously. 2 to 3 is a big compatibility break. But things like `map(None, seq1, seq2)` also broke; deliberate compatibility breaks like that are motivated by nothing more than aesthetic purity.
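For anyone porting old code: Python 2's `map(None, seq1, seq2)` zipped its arguments and padded the shorter ones with `None`, and `map(None, seq)` simply returned `list(seq)`. In Python 3, `map(None, ...)` raises a `TypeError` once iterated, and `itertools.zip_longest` is the usual replacement. A minimal shim, with the hypothetical name `py2_map_none`:

```python
from itertools import zip_longest


def py2_map_none(*seqs):
    """Emulate Python 2's map(None, ...), which Python 3 removed."""
    if len(seqs) == 1:
        return list(seqs[0])  # map(None, seq) was just list(seq)
    return list(zip_longest(*seqs))  # shorter sequences padded with None


print(py2_map_none([1, 2, 3], "ab"))  # [(1, 'a'), (2, 'b'), (3, None)]
```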
- Python does not take backwards compatibility very seriously at all. Take a look at all the deprecated APIs.
I would say it's probably worth it to clean up all the junk that Python has accumulated... But it's definitely not very high up the list of languages in terms of backwards compatibility. In fact, I'm struggling to think of other languages that are worse. TypeScript, probably? Certainly Go, C++ and Rust are significantly better.
- They don't just deprecate APIs, they remove them completely to make sure you really stop using them.
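Since removals are usually preceded by a `DeprecationWarning`, one common defence is to turn those warnings into errors in tests, so deprecated calls fail before a release removes them. A minimal sketch using the standard `warnings` module; `old_api` and `call_with_deprecations_as_errors` are hypothetical names.

```python
import warnings


def old_api():
    """Hypothetical library function slated for removal."""
    warnings.warn("old_api() is deprecated; use new_api()",
                  DeprecationWarning, stacklevel=2)
    return 42


def call_with_deprecations_as_errors(func):
    """Run func with DeprecationWarning escalated to an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        return func()


try:
    call_with_deprecations_as_errors(old_api)
except DeprecationWarning as exc:
    print("caught:", exc)  # caught: old_api() is deprecated; use new_api()
```

The same effect is available without code changes by running `python -W error::DeprecationWarning`.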
- For what it’s worth, Ruby’s JIT took several different implementations, definitely struggled with Rails compatibility, and literally used some people’s PhD research. It wasn’t a trivial affair.
- I can't really talk about Ruby. But PHP is much more static, the surface of things you have to care about at runtime is like an order of magnitude smaller, and there was already OPcache as a starting point. And speaking of JITs, V8's is one of the most sophisticated and complicated ever built. There haven't been nearly enough man-hours and funding put into CPython to make it a fair comparison.
- Some languages are much harder to compile well to machine code. Some big factors (for any languages) are things like: lack of static types and high "type uncertainty", other dynamic language features, established inefficient extension interfaces that have to be maintained, unusual threading models...
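"Type uncertainty" in practice means the compiled code must guard on the types it specialized for and fall back when they change. A toy model of that guard in Python; `SpecializingAdd` is a hypothetical name, and this only illustrates the idea (CPython 3.11+ does something in this spirit in its specializing adaptive interpreter, PEP 659, at the bytecode level).

```python
import operator


class SpecializingAdd:
    """Remember the operand types of the first call, take a specialized path
    while a type guard holds, and fall back to generic dispatch otherwise."""

    def __init__(self):
        self.guard = None    # (type_a, type_b) seen on the first call
        self.fast = None     # specialized implementation for that pair
        self.slow_calls = 0  # how often the guard failed

    def __call__(self, a, b):
        ta, tb = type(a), type(b)
        if self.guard is None and ta is tb:
            # Specialize: for two operands of the same exact type, call that
            # type's __add__ directly instead of the full dynamic dispatch.
            self.guard, self.fast = (ta, tb), ta.__add__
        if self.guard == (ta, tb):
            return self.fast(a, b)
        # Guard failed: take the generic, fully dynamic path.
        self.slow_calls += 1
        return operator.add(a, b)


add = SpecializingAdd()
print(add(1, 2), add("a", "b"), add.slow_calls)  # 3 ab 1
```

The more types actually flow through a given `+`, the more often the guard fails, which is why dynamic code with high type uncertainty is hard to compile well.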
- That makes sense if you're comparing with Java or C#, but not Ruby, which is way more dynamic than Python.
The more likely reason is that there simply hasn't been that big a push for it. Ruby was dog slow before the JIT and Rails was very popular, so there was a lot of demand and room for improvement. PHP was the primary language used by Facebook for a long time, and they had deep pockets. JS powers the web, so there's a huge incentive for companies like Google to make it faster. Python never really had that same level of investment, at least from a performance standpoint.
To your point, though, the C API has made certain types of optimizations extremely difficult, as the PyPy team has figured out.
- Google, Dropbox, and Microsoft, from what I can recall, all tried to make Python fast, so I don’t buy the “hasn’t seen a huge amount of investment” argument. For a long time Guido was opposed to any changes, and that ossified the ecosystem.
But the main problem was actually that PyPy was never adopted as “the JIT” mechanism. That would have made a huge difference a long time ago and made sure the two evolved in lockstep.
- Microsoft is the one the TFA refers to cryptically when it says "the Faster CPython team lost its main sponsor in 2025".
AFAIK it was not driven by anything on the tech side. It was simply unlucky timing, the project getting caught in the middle of Microsoft's heavy-handed push to cut everything. So much so that the people who were hired by MS to work on this found out they had been laid off in the middle of a conference where they were giving talks on it.
- > Python never really had that same level of investment, at least from a performance standpoint.
Or lack of incentive?
A lot of big Python projects that do machine learning and data processing offload the heavy data processing from pure Python code to libraries like NumPy and pandas, which use C API bindings to run native code.
- The simplest JIT just generates the machine code instructions that the interpreter loop would execute anyway. It’s not an extremely difficult thing, but it also doesn’t give you much benefit.
A worthwhile JIT is a fully optimizing compiler, and that is the hard part. Language semantics are much less important - dynamic languages aren’t particularly harder here, but the performance roof is obviously just much lower.
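A toy illustration of that distinction, in Python. The "JIT" below stitches together the snippet each opcode handler would run and compiles the result once (`exec` stands in for emitting machine code). It removes per-instruction dispatch but keeps every redundant push and pop, which is exactly what an optimizing compiler would eliminate. The opcodes and function names are hypothetical, and this is a deliberately simplified model, not CPython's JIT.

```python
# Bytecode for a tiny stack machine computing (x + 1) * 2.
PROGRAM = [("LOAD_X", None), ("PUSH", 1), ("ADD", None), ("PUSH", 2), ("MUL", None)]

# Source snippet that each opcode's handler executes.
TEMPLATES = {
    "LOAD_X": "stack.append(x)",
    "PUSH": "stack.append({arg!r})",
    "ADD": "b = stack.pop(); a = stack.pop(); stack.append(a + b)",
    "MUL": "b = stack.pop(); a = stack.pop(); stack.append(a * b)",
}


def interpret(program, x):
    """Interpreter loop: decode and dispatch every instruction, every time."""
    stack = []
    for op, arg in program:
        if op == "LOAD_X":
            stack.append(x)
        elif op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b = stack.pop(); a = stack.pop(); stack.append(a + b)
        elif op == "MUL":
            b = stack.pop(); a = stack.pop(); stack.append(a * b)
    return stack.pop()


def naive_jit(program):
    """Paste the same per-opcode snippets into one straight-line function.
    Dispatch disappears; none of the stack traffic is optimized away."""
    lines = ["def compiled(x):", "    stack = []"]
    for op, arg in program:
        lines.append("    " + TEMPLATES[op].format(arg=arg))
    lines.append("    return stack.pop()")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["compiled"]


print(interpret(PROGRAM, 5), naive_jit(PROGRAM)(5))  # 12 12
```

An optimizing compiler would reduce `compiled` to the equivalent of `return (x + 1) * 2`, and then, given type information, to a couple of machine instructions.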
- I think Python people just approached the problem differently: they made working with C and other languages better, wrote Python bindings, and offloaded the performance-critical code to those libraries, e.g. NumPy.
- For better or for worse, they have been very consistent over the years that they don't want to degrade existing performance. It's why the GIL existed for so long.
- I thought PHP hadn't shipped its JIT yet (as in, it's behind a config that's disabled by default).
- PHP 8 shipped with JIT on by default unless I'm mistaken.
- https://www.php.net/manual/en/opcache.configuration.php says it's off by default as of PHP 8.4 (and before that it was technically on but effectively off due to other config settings).
- Are you forgetting about PyPy, which has existed for almost 2 decades at this point?
- That's a completely separate codebase that purposefully breaks backwards compatibility in specific areas to achieve their goals. That's not the same as having a first-class JIT in CPython, the actual Python implementation that ~everyone uses.
- Definitely agree that it’s better to have JIT in the mainline Python, but it’s not like there weren’t options if you needed higher performance before.
Including simply implementing the slow parts in C, which is what the high-performance machine learning ecosystem in Python does.
- PHP and JS had huge tech companies pouring resources into making them fast.
- Money.
- (what are blueberry, ripley, jones and prometheus?)
- Yes, the graphs are incomprehensible because those are not defined in the article. They turn out to be different physical machines with different architectures: https://doesjitgobrrr.com/about
blueberry (aarch64): Raspberry Pi 5, 8GB RAM, 256GB SSD; OS: Debian GNU/Linux 12 (bookworm); owner: Savannah Ostrowski
ripley (x86_64): Intel i5-8400 @ 2.80GHz, 8GB RAM, 500GB SSD; OS: Ubuntu 24.04; owner: Savannah Ostrowski
jones (aarch64): Apple M3 Pro, 18GB RAM, 512GB SSD; OS: macOS; owner: Savannah Ostrowski
prometheus (x86_64): AMD Ryzen 5 3600X @ 3.80GHz, 16GB RAM; OS: Windows 11 Pro; owner: Savannah Ostrowski
- The names of the benchmark runners. https://doesjitgobrrr.com/about
- So the biggest gains so far are on Windows 11 Pro (x86_64), ~20%? Is that because Windows was a bad baseline (prometheus)? x86_64/Linux (ripley) doesn't seem to have improved nearly as dramatically, ~5%. I'm just surprised the OS has that much of an effect, and I wonder how much of it can be attributed to the JIT versus other OS issues.
- It's hard to say whether it's Windows related since the two x86_64 machines don't just run different OSes, they also have different processors, from different manufacturers. I don't know whether an AMD Ryzen 5 3600X versus Intel i5-8400 have dramatically different features, but unlike a generic static binary for x86_64, a JIT could in principle exploit features specific to a given manufacturer.
- The immediate question has been answered, but what about the names? The latter three are obvious references to the Alien universe, but what relationship does blueberry have to them?
- I assume Blueberry is a nod to the machine being a Raspberry Pi.
- Sorry, but the graphs are completely unreadable. There are four code names, one for each of the lines. Which is the JIT and which is CPython?
- They are all JIT on different architectures, measured relative to CPython. https://doesjitgobrrr.com/about: blueberry is aarch64 Raspberry Pi, ripley is x86_64 Intel, jones is aarch64 M3 Pro, prometheus is x86_64 AMD.
- Thanks
- Reference counting is not a strict requirement for Python. Certainly not accurate counting.