Python Finalizers Semantics, Part 1
Python's garbage collection semantics are very much historically grown and implementation-driven. Samuele Pedroni therefore likes to call it the "'there is no such thing as too much chocolate' approach to GC semantics" :-). In this two-part series of posts I am going to talk about the semantics of finalization (__del__ methods) in CPython and PyPy.
The current behaviour is mostly a consequence of the fact that CPython uses reference counting for garbage collection. The first consequence is that if several objects die at the same time, their finalizers are called in a so-called topological order, a property that some GCs guarantee deliberately and that CPython offers by chance. It ensures that when a __del__ method runs, none of the object's attributes have had their own __del__ called yet. A simple example:
    class B(object):
        def __init__(self, logfile):
            self.logfile = logfile

        def __del__(self):
            self.logfile.write("done doing stuff")

    b = B(file("logfile.txt", "w"))
If the instance of B dies now, both it and the logfile are dead. Both will get their __del__ methods called, and it's important that the file's __del__ is called second, because otherwise the __del__ of B would try to write to a closed file.
The correct ordering happens completely automatically if you use reference counting: setting b to None decrefs the old value of b. This reduces the reference count of the B instance to 0, so its finalizer is called. After the __del__ has finished, the object is freed and everything it points to is decrefed as well, which reduces the reference count of the file to 0 and calls its __del__ too, closing the file.
The behaviour of PyPy's semispace and generational GCs wasn't very nice so far: they just called the finalizers in an essentially random order. Last week Armin came up with a somewhat complicated algorithm that solves this by emulating CPython's finalization order, which we subsequently implemented. So PyPy now does what you expect! The Boehm GC does a topological ordering by default, so it wasn't a problem there.
A small twist on the above is when there is a cycle of objects involving finalizers: in this case a topological ordering is not possible, so CPython refuses to guess the finalization order and puts such cycles into gc.garbage. This would be very hard for PyPy to do, since our GC implementations are essentially independent of the Python interpreter; after all, the same GCs work for our other interpreters too. Therefore we decided to break such a cycle at an arbitrary place, which doesn't sound too insane. The insane thing is for a Python program to create a cycle of objects with finalizers and then depend on the order in which those finalizers are called. Don't do that :-) (After all, CPython wouldn't even call the finalizers in this case.)
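Here is a small sketch of that behaviour on CPython 2.x (matching the era of this post):

    import gc

    class Node(object):
        def __del__(self):
            print "finalizing", self

    # Two objects with finalizers that form a reference cycle.
    a = Node()
    b = Node()
    a.other = b
    b.other = a
    del a, b

    # Reference counting alone cannot reclaim the cycle.  The cyclic
    # collector finds it, but refuses to guess a finalization order:
    gc.collect()
    print gc.garbage  # the two Node instances end up here, un-finalized

On PyPy, by contrast, the cycle is broken at an arbitrary place and both finalizers do get called, in some order.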
PyPy presence at various conferences in the near future
Hello! I will have the pleasure of presenting PyPy at various conferences in the near future. They are (in chronological order):
- Studencki Festiwal Informatyczny (Students' Festival of Computer Science) in Krakow, POLAND, 6-8 March 2008. I think this might only be interesting for Polish people (website, in Polish).
- PyCon, Chicago, IL, USA, 14-17 March 2008. There should also be a PyPy sprint afterwards, including a newbie-friendly tutorial; everybody is welcome to join us! (Provided that I get the US visa, which seems to be a non-trivial issue for a Polish citizen.)
- RuPy, Poznan, POLAND, 13-14 April 2008 (website). This is a small but very friendly Ruby and Python conference. Last year was amazing; I can strongly recommend going there (Poznan is only 2h by train from Berlin and also has its own airport).
Cheers,
fijal
Hey - I'll be at both the Polish conferences talking about IronPython. I hope you will be talking in English!
Look forward to meeting up with you.
Michael Foord
Cheers Michael. Looking forward to seeing you!
At RuPy, definitely. At SFI it depends on the organizers (I'll try to, especially as I have no one to help me with slides in Polish :)
Buildbots and Better Platform Support
Over the last few days we improved the platform support of PyPy's Python interpreter. Jean-Paul Calderone has been tirelessly working for some time now on setting up a buildbot for translating and testing PyPy. So far the basic mechanisms are working, and the buildbot is running on various machines, including some that Michael Schneider (bigdog) lets us use: one of them a Windows machine, another a 64-bit Linux machine (lots of thanks to those two, you are awesome!).
What is still missing is a nice way to visualize the test results to quickly see which tests have started failing on which platforms. There is a prototype already, which still needs some tweaking.
The availability of these machines has triggered some much-needed bug-fixing in PyPy to make our Python interpreter work better on Windows and on 64-bit Linux. Maciek and Michael Schneider worked on this quite a bit last week, with the result that PyPy now supports many more extension modules on Windows and 64-bit Linux. Now that we have the buildbot, the hope is that this support won't disappear again soon :-).
PyPy Keyboard Heatmap
Today I saw the keyboard heatmap generator on the Blended Technologies blog. I threw all the PyPy code at it to see whether the heatmap looks any different from that of normal Python code. It doesn't:
So now the excuse "I can't contribute to PyPy because it needs all those special PyPy-keys" isn't working anymore :-).
RPython can be faster than C
(Yes, C as in the language, not c as in the speed of light.) I recently looked at the Great Computer Language Shootout for some benchmarks and to make some speed comparisons. I took this benchmark, modified it to be RPythonic enough, and compared speeds. The code is here (the only change from the Python version was to create a class instead of a tuple, so this version is actually more OO). Also, the benchmark is very likely flawed, because it favours better GCs :).
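The linked code is not reproduced here, but to give a rough idea of the change involved: assuming the benchmark is the shootout's binary-trees test (an assumption based on the GC-benchmark remark in the comments below, not a statement from the post), the tuple-to-class rewrite might look something like this sketch. All names here are illustrative, not the measured code.

    # Hypothetical sketch of the class-based rewrite (illustrative only).
    class TreeNode(object):
        # Explicit fields instead of a (left, right, item) tuple.
        def __init__(self, left, right, item):
            self.left = left
            self.right = right
            self.item = item

    def make_tree(item, depth):
        # Allocates a complete binary tree; this is what stresses the GC.
        if depth == 0:
            return TreeNode(None, None, item)
        item2 = 2 * item
        return TreeNode(make_tree(item2 - 1, depth - 1),
                        make_tree(item2, depth - 1),
                        item)

    def check_tree(node):
        # Walks the tree so the allocations cannot be optimized away.
        if node.left is None:
            return node.item
        return node.item + check_tree(node.left) - check_tree(node.right)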
So, here we go:
| Language | Time of run (for N=14) |
| --- | --- |
| Python version running on Python 2.5.1 (distribution) | 25.5s |
| Python version running on PyPy with generational GC | 45.5s |
| Python version with Psyco | 20s |
| RPython version translated to C using PyPy's generational GC | 0.42s |
| Haskell version compiled with GHC 6.6.1 | 1.6s |
| C version compiled with gcc 4.1.2 -O3 -fomit-frame-pointer | 0.6s |
Also worth noticing is that when using Psyco with the original version (with tuples), it is very fast (2s).
So, PyPy's Python interpreter is about 80% slower than CPython on this benchmark (not too horrible), and RPython is 40% faster than gcc here. Cool. The result is mostly due to our GC, which also shows that manual memory management can be slower than garbage collection in some situations. Please note that this result does not mean that RPython is meant for you: it requires a completely different mindset than the one used to program in Python. Don't say you weren't warned! :-)
"It requires a completely different mindset than the one used to program in Python."
Can you elaborate? "RPython for Python programmers" would be an excellent addition to the docs or this blog. :)
I agree with Jonathan. There are many Python programmers who would *love* to be able to write Python extensions with RPython.
I know that this is already possible, but there are two issues:
* Lack of documentation on programming with RPython (I realise that this is a moving target)
* Last I heard, the refcounting implementation made RPython extensions inefficient
If these two issues were resolved (or mostly resolved) then a lot more people might start using the PyPy toolchain.
Aside from my grousing, it looks like PyPy is becoming more impressive by the day. Congratulations.
As of today you can't write CPython extensions in RPython.
Why not ask the Great Computer Language Shootout to include RPython as one of their benchmarked languages? This could be good and free advertising for the PyPy project.
Hi Michael,
Leonardo is correct: the extension compiler was removed from SVN in November. We had many discussions about this step, but eventually it turned out to be necessary for many reasons. The extcompiler was never that useful in the first place, because the produced extensions weren't fast (one of the reasons being the bad refcounting indeed).
The other reasons were that the extcompiler was impossible to maintain and was actually preventing progress, because it kept code alive that we wanted to get rid of.
So at the moment you cannot use PyPy any more to produce CPython extensions, only standalone programs.
It's completely possible that the extcompiler will be reborn in the future, but at the moment our priorities are really to make PyPy a good Python and not do tons of things on the side.
Cheers,
Carl Friedrich
I would also say that nowadays it's already possible to write extension modules in RPython... but just for PyPy, not for CPython :-).
Jokes aside, if someone is really interested in writing part of their application in RPython (despite our warnings :-)), targeting PyPy could be an interesting alternative, as long as you don't need external libraries and the speed gain is more than what you lose in other areas where PyPy is actually slower.
I think a lot of people are interested in using RPython for performance reasons. But almost nobody will leave CPython at the moment, because extension modules are not working.
At the moment I wouldn't leave CPython, since everything I am doing is heavily based on SciPy. So my only options are (a) to wait for PyPy to be able to compile extensions for CPython, or (b) for PyPy to make use of CPython extensions.
As long as neither of those happens, I probably will not use RPython for serious projects. :/
Why would that be a flaw? Note: this is an adaptation of a benchmark for testing GC
Leonardo Santagada said "Why not ask the Great Computer Language Shootout to include RPython ..."
FAQ: Why don't you include language X?
> Why would that be a flaw? Note: this is an adaptation of a benchmark for testing GC
I know, but I realised it after posting :) (We even have the original somewhere around, to compare GCs.) Also, honestly, a lot of the Python versions rely on libraries written in C, hence it took me a while to find one that is "pure enough".
> Once the RPython was translated to C by PyPy how did you compile the C?
With the very same options as the bare gcc version: -O3 -fomit-frame-pointer.
More objects != OO.
Perhaps you meant to say that it more closely reflects the domain?
No, I don't know how I ended up on this blog post.
PyPy.NET goes Windows Forms
After having spent the last few days on understanding PyPy's JIT, today I went back to hacking on the clr module. As a result, it is now possible to import and use external assemblies from pypy-cli, including Windows Forms.
Here is a screenshot of the result you get by typing the following at the pypy-cli interactive prompt:
    >>>> import clr
    >>>> clr.AddReferenceByPartialName("System.Windows.Forms")
    >>>> clr.AddReferenceByPartialName("System.Drawing")
    >>>> from System.Windows.Forms import Application, Form, Label
    >>>> from System.Drawing import Point
    >>>>
    >>>> frm = Form()
    >>>> frm.Text = "The first pypy-cli Windows Forms app ever"
    >>>> lbl = Label()
    >>>> lbl.Text = "Hello World!"
    >>>> lbl.AutoSize = True
    >>>> lbl.Location = Point(100, 100)
    >>>> frm.Controls.Add(lbl)
    >>>> Application.Run(frm)
Unfortunately at the moment you can't do much more than this, because we still lack support for delegates, so it's not possible to handle events. Still, it's a step in the right direction :-).
Improve .NET Integration
A while ago Amit Regmi, a student from Canada, started working on the clr module improvements branch as a university project.
During the sprint Carl Friedrich, Paul and I worked more on it and brought it to a mergeable state.
It adds a lot of new features to the clr module, which is the module that allows integration between pypy-cli (aka PyPy.NET) and the surrounding .NET environment:
- full support for generic classes;
- a new importer hook, allowing things like from System import Math and so on (a sketch of the general mechanism follows this list);
- .NET classes that implement IEnumerator are treated as Python iterators; e.g. it is possible to iterate over them with a for loop.
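As an aside, here is a rough sketch of how such an importer hook can work at the Python level, using a PEP 302 meta-path hook. This is purely illustrative and is not the actual clr module code:

    import sys
    import types

    class CLRImporter(object):
        # Toy meta-path hook: claims any module in the System namespace.
        def find_module(self, fullname, path=None):
            if fullname == "System" or fullname.startswith("System."):
                return self
            return None

        def load_module(self, fullname):
            mod = sys.modules.setdefault(fullname, types.ModuleType(fullname))
            mod.__loader__ = self
            # The real hook would look up the .NET namespace here and
            # expose its classes and nested namespaces as attributes.
            return mod

    sys.meta_path.append(CLRImporter())

With such a hook installed, a plain statement like from System import Math can be intercepted and served from the .NET world.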
This is an example of a pypy-cli session:
    >>>> from System import Math
    >>>> Math.Abs(-42)
    42
    >>>> from System.Collections.Generic import List
    >>>> mylist = List[int]()
    >>>> mylist.Add(42)
    >>>> mylist.Add(43)
    >>>> mylist.Add("foo")
    Traceback (most recent call last):
      File "<console>", line 1, in <interactive>
    TypeError: No overloads for Add could match
    >>>> mylist[0]
    42
    >>>> for item in mylist:
    ....     print item
    ....
    42
    43
This is still to be considered an alpha version: there are a few known bugs and probably a lot of unknown ones :-), so don't expect it to work on every occasion. Still, it's a considerable step towards the real world :-).
Crashing Other People's Compilers
Over the years PyPy has (ab?)used various pieces of external software for different purposes, and we've discovered bugs in nearly all of them, mostly by pushing them to their limits. For example, many compilers are not happy with 200MB of source in one file: the Microsoft C compiler has a limit of 65536 lines of code per file, and the CLI was raising "System.InvalidProgramException: Method pypy.runtime.Constants:.cctor () is too complex.", where "too complex" probably means "too long". Just for fun, today we collected all the projects we could think of in which we found bugs:
So one could say that PyPy is really just the most expensive debugging tool ever :-).
You know, the one piece of external software we most depend on is one we haven't found bugs in: gcc (at least, I can't remember any problems). That's pretty impressive.
I think you can probably add gdb to the list though.
Can you, maybe, give a few examples of what you did, linking to items about them if you wrote about it?
I'd be interested in knowing which projects were the most receptive to the bug reports.
Hi Brett,
I think the project most receptive to bug reports is LLVM, where bugs that we find are usually fixed within a small number of days. I think in general open source projects react quite well, as you would expect. A negative example is Graphviz, which still segfaults despite us producing a patch that fixes the problem.
Microsoft proves to be completely unapproachable: it seems you have to pay them if you want to report a bug (it should be the other way round, of course :-)).
@Carl:
Thanks for the info, Carl. I have been contemplating trying to rely on them for compiling Python for testing purposes, especially with clang coming along (although I am waiting for them to address a bug I found =). Good to know they are responsive.
And yes, it is really unfortunate that Microsoft doesn't make reporting bugs easy, but I guess no one wants to deal with the number of reports they would most likely get. =)
Regarding Microsoft bug reports, there's Connect, and I've had some degree of success with it.
Leysin Winter Sport Sprint Started
The Leysin sprint started yesterday morning in the usual location. The view is spectacular (see photo) and the weather is mostly sunny. The following people are sprinting:
- Maciej Fijalkowski
- Armin Rigo
- Toby Watson
- Paul deGrandis
- Antonio Cuni
- Carl Friedrich Bolz
We started working on various features and performance improvements for the high level backends (JVM and .NET) and on implementing ctypes for PyPy. Later this week we plan to spend a few days on the JIT, because Anto and I both need to get into it for our respective university projects.
For those curious about what is going on: SVN commits
Great work, guys! :)
Finding GC roots: using LLVM or parsing assembler files from GCC
PyPy contains a framework for writing custom Garbage Collectors, and a few simple GCs have been written in this framework. A common issue with all these GCs is how to find all the stack roots, i.e. all the pointers to live GC-managed objects currently stored in local variables, in all the callers of the current function. The current solution is to maintain a custom shadow stack of roots, where all functions push and pop copies of their local variables of type "GC pointer". Clearly this is an overhead. Can we remove it?
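To make the overhead concrete, here is a toy Python model of the shadow-stack technique (illustrative only; the real shadow stack is generated C code, not Python):

    # A side stack holding copies of all local variables of type
    # "GC pointer" (toy model).
    gc_shadow_stack = []

    def find_roots():
        # The GC enumerates all live roots by walking the side stack,
        # with no knowledge of the machine stack layout.
        return list(gc_shadow_stack)

    def some_function(gc_obj):
        # Push on entry: this bookkeeping on every call is the overhead.
        gc_shadow_stack.append(gc_obj)
        try:
            pass  # ... function body; a collection may happen anywhere here ...
        finally:
            gc_shadow_stack.pop()  # pop on exit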
LLVM has recently grown some support for this. By emitting markers in the LLVM source and with the help of a bit of custom C++ code, we can generate stack maps for the functions compiled by LLVM. Then, with 100% non-portable code in our framework GC's root-finding algorithm, we can walk the machine stack and locate where in each stack frame LLVM stores the GC pointers. (Yes, I mean non-portable: LLVM offers no help for doing that. Maybe it will at some point, though so far I didn't manage to explain to the people working on this in LLVM why it is an issue...) I've tried that approach in the llvmgcroot branch. Compared to the manually-managed shadow stack, this gives speed improvements which are, very roughly, on the order of 5%.
Note that this prevents some optimizations in LLVM, because it forces LLVM to allocate all local variables of type "GC pointer" on the stack; it cannot keep them in registers, and it must assume that they can be changed more or less at any time (as they can be with moving GCs). Can we do better?
Actually, yes. We can even do better in the C backend, using a GCC hack. GCC has this nice extension:
asm("bla", constrains);This is meant to generate assembler instructions directly from C. Internally, GCC considers the whole asm() as a single regular instruction of its intermediate language; the constrains are expressed in the same way as the constrains for all the prebuilt intermediate language instructions. They express things like input and output operands of the instruction, whether they can live in memory or in registers, whether the whole instruction has side-effects, etc. The nice thing about asm() is that it doesn't kill any optimization whatsoever in GCC - it's your job to make sure that you use the correct constrains.
So what I've tried in the asmgcroot branch is to use asm() as markers. In this branch, the C backend produces code like this after each function call, for each local variable containing a live GC pointer:
asm("/* GCROOT %0 */" : "=g"(localvar) : "0"(localvar) : "memory");
This causes GCC to emit the following line in the assembler file it generates:
    /* GCROOT register-or-memory-containing-localvar */
I won't go into the details of the asm() line above; the constraints are just enough to make sure that GCC doesn't optimize too much, but they don't prevent most optimizations from occurring. For example, the localvar can be in a register.
The assembler will just ignore that line; it is a comment. But what we can do is write our own tool that parses the assembler files. This tool locates the /* GCROOT */ comments and follows where the register or memory location named in each comment comes from (to do this it must follow the control flow and data flow of the function). This allows it to build a stack map: for each call instruction, it knows exactly which registers and stack frame locations contain a live GC pointer. The stack map is then emitted in an extra assembler file that we link with the rest. As with LLVM above, the stack map is used at run-time by non-portable code in our GC's stack root tracker.
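To give a flavour of what such a tool does, here is a vastly simplified Python sketch. The real parser must follow the control flow and data flow of each function as described above, which this toy does not attempt:

    import re

    # Matches lines such as:  /* GCROOT %ebx */  (toy version).
    GCROOT_RE = re.compile(r"/\* GCROOT (.+?) \*/")

    def scan_assembler(lines):
        # Map each call site (line index) to the GCROOT locations that
        # the compiler emitted after it.
        stack_map = {}
        current_call = None
        for i, line in enumerate(lines):
            stripped = line.strip()
            if stripped.startswith("call"):
                current_call = i
                stack_map[current_call] = []
                continue
            match = GCROOT_RE.search(stripped)
            if match and current_call is not None:
                # The register or stack slot named in the comment holds
                # a live GC pointer across the preceding call.
                stack_map[current_call].append(match.group(1))
        return stack_map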
Yes, that's rather insane. But at least, we don't need to modify the assembler file - just read it. If GCC is too clever in its optimizations, the custom parser will get lost and complain cleanly; but I think that it is relatively safe in the sense that GCC optimizations should not be able to make the custom parser produce wrong results.
The branch is not merged because it's probably too insane to merge (not to mention, it's probably not portable to non-GCC compilers, and it is completely platform-specific). Still, it gives good results, better than the pure LLVM approach: on the order of 10% to 25% speed-ups for pypy-c.
Obviously it depends on the compiler, but the basic idea is that the natural place to support this is in the compiler itself. For example, instead of parsing the assembler produced by GCC, it would probably be possible to extend GCC to cleanly generate stack maps. (This is basically what I tried to do with LLVM, which gives a plug-in API to do that.)
After a bit of googling, GCC doesn't seem to support Objective-C 2.0 yet. Moreover, the current Objective-C run-time library simply uses the conservative Boehm collector.
ObjC 2 does not use Boehm. The collecting thread suspends other threads and conservatively scans their stacks. It picks up values in registers by querying the kernel for the suspended thread state. It depends heavily on Mach.
llvm-gcc fully supports inline asm, so you could use the same hack you use with GCC with your llvm backend.
Also, you might be interested in https://llvm.org/PR1917, which proposes a method of identifying GC pointers that doesn't disable most optimizations.
The link to the "somewhat complicated algorithm" is a bit broken, but you can still get to it at the web archive.
Thanks, link updated.