PyPy 1.7 - widening the sweet spot
We're pleased to announce the 1.7 release of PyPy. As has become a habit, this release brings a lot of bugfixes and performance improvements over the 1.6 release. However, unlike the previous releases, the focus has been on widening the "sweet spot" of PyPy. That is, the classes of Python code that PyPy can greatly speed up should be much broader with this release. You can download the 1.7 release here:
https://pypy.org/download.html
What is PyPy?
PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7. It's fast (PyPy 1.7 and CPython 2.7.1 performance comparison) due to its integrated tracing JIT compiler.
This release supports x86 machines running Linux 32/64, Mac OS X 32/64 or Windows 32. Work on Windows 64 is ongoing, but it is not yet natively supported.
The main topic of this release is widening the range of code which PyPy can greatly speed up. On average on our benchmark suite, PyPy 1.7 is around 30% faster than PyPy 1.6 and up to 20 times faster on some benchmarks.
Highlights
- Numerous performance improvements. Too many Python constructs should now run faster to list them all here.
- Bugfixes and compatibility fixes with CPython.
- Windows fixes.
- PyPy now comes with stackless features enabled by default. However, any loop using stackless features will interrupt the JIT for now, so there is no real performance improvement for stackless-based programs yet. Contact pypy-dev for information on how to help remove this restriction.
- The NumPy effort in PyPy was renamed numpypy. To try it, simply write:
      import numpypy as numpy
  at the beginning of your program. There has been huge progress on NumPy in PyPy since 1.6, the main feature being the implementation of dtypes.
- The JSON encoder (but not the decoder) has been replaced with a new one. It is written in pure Python, but is known to outperform CPython's C extension by up to 2 times in some cases. It's about 20 times faster than the one we had in 1.6.
- The memory footprint of some of our RPython modules has been drastically improved. This should benefit any application using, for example, cryptography, like tornado.
- There was some progress in exposing even more of the CPython C API via cpyext.
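If you want to check the JSON encoder claim on your own data, a small timing script is enough. The following is only a minimal sketch: the payload shape, its size, and the helper name time_encode are arbitrary choices made for illustration and are not part of our benchmark suite. Run the same file under both interpreters and compare the numbers.

```python
import json
import timeit

# Hypothetical payload; the shape and size are arbitrary choices for illustration.
payload = {
    "name": "pypy",
    "versions": [1.5, 1.6, 1.7],
    "rows": [{"id": i, "label": "item-%d" % i} for i in range(1000)],
}


def time_encode(obj, repeats=20):
    """Return the average number of seconds per json.dumps call."""
    total = timeit.timeit(lambda: json.dumps(obj), number=repeats)
    return total / repeats


if __name__ == "__main__":
    print("json.dumps: %.6f s per call" % time_encode(payload))
```

Results will depend heavily on the payload, and a JIT needs some warm-up, so a larger repeats value gives a fairer picture on PyPy.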
Things that didn't make it, expect in 1.8 soon
Some ongoing work didn't make it into this release, but is probably worth mentioning here. This is what you should probably expect in 1.8 some time soon:
- Specialized list implementation. There is a branch that implements lists of integers/floats/strings as compactly as array.array. This should drastically improve the performance and memory footprint of some applications.
- The NumPy effort is progressing, with multi-dimensional arrays coming soon.
- There are two brand new JIT assembler backends, notably for the PowerPC and ARM processors.
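To get a feeling for what "as compactly as array.array" means for the specialized lists mentioned above, the sketch below compares a plain list of boxed integers with an array.array holding the same values. It measures CPython objects with sys.getsizeof and does not show how the PyPy branch is implemented. The helper names boxed_size and compact_size are made up for this example, and the boxed figure is approximate.

```python
import sys
from array import array


def boxed_size(values):
    """Approximate bytes used by a plain list plus each element object."""
    items = list(values)
    return sys.getsizeof(items) + sum(sys.getsizeof(v) for v in items)


def compact_size(values, typecode="l"):
    """Bytes used by an array.array holding the same values unboxed."""
    return sys.getsizeof(array(typecode, values))


# Arbitrary sample data: 10000 distinct integers.
values = range(1000, 11000)

if __name__ == "__main__":
    print("list of boxed ints: %d bytes" % boxed_size(values))
    print("array.array:        %d bytes" % compact_size(values))
```

The gap between the two numbers is roughly the per-element object overhead that a specialized list could avoid.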
Fundraising
It's worth mentioning that we're running fundraising campaigns for the NumPy effort in PyPy and for Python 3 in PyPy. If you want to see either of those happen faster, we urge you to donate to the numpy proposal or the py3k proposal. If you want PyPy to progress but trust us with the general direction, you can always donate to the general pot.
Cheers,
Maciej Fijałkowski, Armin Rigo and the entire PyPy team
Comments
Could you put a link to some sort of NEWS file, a list of issue tracker tickets, or at least the relevant span of the revision control tool so that I could browse what sorts of changes have gone into trunk since 1.6?
"PyPy now comes with stackless features enabled by default"
Could you please tell a bit more about it? Is it just sort of internal optimizations, something under the hood? Or does it mean tail recursion optimization? Or cooperative multitasking with greenlets? What's the API for stackless features?
Is it so hard to wait until you have a Windows build before announcing a release?
Or not telling in the release that the Windows binary is available?
@Zooko
hg log -rrelease-1.6:release-1.7
I am getting a segmentation fault.
So if I want to run PyPy on my code with numpy I have to replace in each file "import numpy" by "import numpypy", "from numpy import ..." by "from numpypy import ...". And each time I want to switch between PyPy and CPython, I have to search and replace all those occurrences backward. Well done...
Thank you for all your work, it's nice to see how far you have come in so little time! Keep raising the bar.
@D: Please take it the easy way and add "sys.modules['numpy'] = numpypy" at the start of your program.
@⚛ report a bug to bugs.pypy.org
@D it's gonna stay like this until it's finished. The problem is that most programs won't run out of the box anyway as of now, because of some missing functionality. We'll probably rename it back once it's finished.
@D: all you need is to create a file "numpy.py" that contains "from numpypy import *". (The real reason we did this temporary renaming is because numpy developers asked us to.)
More likely, though, you are probably going to hit some unimplemented feature anyway, as our numpy(py) is still incomplete.
Re: numpypy. The standard in the bad old days with three different and subtly incompatible array libraries was "try: import ...; except: ..."
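The suggestions in this thread (the "try: import ...; except: ..." fallback and the sys.modules alias) can be combined into two small helpers. This is only a sketch of the general technique: import_first and alias_module are hypothetical names, not part of PyPy or NumPy, and whether your program then runs still depends on which features numpypy implements.

```python
import importlib
import sys


def import_first(candidates):
    """Return the first module in `candidates` that imports successfully."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of %r could be imported" % (candidates,))


def alias_module(alias, module):
    """Register `module` under `alias` so a later `import alias` finds it."""
    sys.modules[alias] = module
    return module


# Intended use at the very start of a program (left commented out here,
# because it needs numpy or numpypy to be installed):
#
#     numpy = import_first(["numpy", "numpypy"])
#     alias_module("numpy", numpy)
```

After the alias is registered, the other files in the program can keep their "import numpy" lines unchanged on both interpreters.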
@Maciej: I am *not* going to submit a bug report, on purpose. When developing software for the masses, there are always two sets of users. One set comprises the users who report bugs, the other set comprises the users who are experiencing issues but do not report bugs.
The ideal state would be that there are no bugs, but this is only theoretical of course.
As an experiment, I have decided not to tell you any information about the segmentation fault. Nothing. Absolutely nothing.
The question is what measures are you going to take to solve this PyPy issue.
Good luck ...
@⚛ we're going to do nothing with that. Most probably you're using a CPython C extension or some illegal ctypes invocation or older version of jinja that did that or something... Besides, there is absolutely no point in trying to fix a bug that no one can potentially provide any information for.
Cheers,
fijal
@Maciej:
PyPy 1.6 worked OK (but it was slower than CPython).
"we're going to do nothing with that."
OK
"Most probably you're using a CPython C extension or some illegal ctypes invocation or older version of jinja that did that or something..."
I don't think so. GDB says that the EIP register stops at an address which does not seem to belong to the PyPy executable nor to any dynamically loaded library. This leads me to the conclusion that the issue is in the x86 code generated by PyPy.
"Besides, there is absolutely no point in trying to fix a bug that no one can potentially provide any information for."
I am not saying you have to fix it. I am just saying that PyPy 1.7 generates code that segfaults.
Does PyPy employ partial verification when generating x86 code?
@Flower
"As an experiment, I have decided not to tell you any information about the segmentation fault. Nothing. Absolutely nothing."
So you want to conduct an experiment into 'How to help out an open source project by withholding crucial information'? And I thought the ideas of my PhD-advisor were bad ...
The point he, she, or it is making is that PyPy should contain a theorem prover to verify the code it generates so it is possible to prove mathematically that it never generates bad code—and that anything else is beneath the contempt of a serious computer scientist. If you need information about a segfault in order to debug it, you obviously have not thought it through thoroughly enough.
@Jorgen and @Maciej:
Well, I previously wrote here that "The question is what measures are you (=the PyPy team) going to take to solve this PyPy issue."
This sentence of mine contained the additional information that: I believe that it is a PyPy issue.
Maciej then wrote: "Most probably you're using a CPython C extension or ... that did that or something". This means he was trying to put the blame on others (C extensions or whatever) rather than admitting that it might be an issue attributable to PyPy and PyPy alone.
Then you (Jorgen) wrote "So you want to conduct an experiment into 'How to help out an open source project by withholding crucial information'?". And that is exactly what I intend to do: to help the PyPy project by withholding crucial information.
It will work.
@Damian:
"... PyPy should contain a theorem prover to verify the code it generates so it is possible to prove mathematically that it never generates bad code"
I believe such a thing is impossible.
It's possible if you let the verifier reject legal code. It's probably not realistic though, RPython (or is that the JIT-annotation language?) would have to be designed to be verifiable for whatever property you want to verify.
@⚛: you're sitting in your own corner of the world thinking that we will try hard to figure out which segfault you could possibly mean, and that it will help the PyPy project :-) I've heard many misconceptions of how Open Source works, but I've never heard this one.
How it really works is: you think you have a genuine segfault and want to report it, in which case you file a bug to https://bugs.pypy.org, and maybe we have to discuss more to figure out why, for example, it appears on your machine and not ours, or which configuration you need to reproduce it; sometimes it can take efforts on both parties to even reproduce the problem.
You are free to not play this game, but then just like Maciej said, you will be fully ignored. Even if it's a real bug, it's likely that over time someone else will report or fix it. I'm not trying to force you to "reveal" it to us; feel free to ignore me. I'm just explaining how I believe Open Source works.
The difference for us is small, because a real bug will be seen and reported by others too. The difference for you is whether you would like to contribute and get our thanks, or don't care about it.
The pypy team "could" solve it. But it would be a massive waste of time, and of course the chances are that they are unable to because of problems in your setup. I most certainly hope no open source team really spends their time on such ghost hunts.
Somewhat off the topic of this post, but I'm wondering what the special optimization of string lists would be. I can see obvious benefits to storing ints/floats directly in the list rather than as boxed numbers, but not so much for strings since they have to be stored using an indirection anyway.
@Winston:
astutely observed (as always). There are two points to string lists:
1) PyPy's strings have one extra indirection, i.e. the data is not stored in the string box. This is due to RPython restrictions. With string lists, one indirection can be removed.
2) If the JIT knows that the full list stores only strings, it can actually generate better code, because it does not need to check the type of the item that was just read out of the list.
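Point 2 can be illustrated with a toy model. The class below is not how PyPy implements its lists and is not what the JIT emits; StrategyList and its methods are made-up names. It only shows the idea that a type check done once on write lets every later read skip the per-item check.

```python
class StrategyList(object):
    """Toy list that remembers whether every item is a str."""

    def __init__(self, items=()):
        self._items = []
        self._all_str = True  # vacuously true for an empty list
        for item in items:
            self.append(item)

    def append(self, item):
        # The type check happens once, when the item is written...
        if self._all_str and not isinstance(item, str):
            self._all_str = False  # fall back to the generic case
        self._items.append(item)

    def total_length(self):
        """Sum of the lengths of all str items in the list."""
        if self._all_str:
            # ...so the fast path reads items without checking their type.
            return sum(len(s) for s in self._items)
        # Generic path: every item must be checked when it is read.
        return sum(len(s) for s in self._items if isinstance(s, str))
```

For example, StrategyList(["ab", "cde"]).total_length() takes the fast path, while appending a single integer switches the list to the generic path for good.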