Call for donations - PyPy to support NumPy!
UPDATE (May 2014):
Thanks to our donors, we have raised 80% of the total so far. Work on this topic has been happening, and continues to happen, within the budget – even if not within the timeline described below. We have simply not found enough time to work on it as much as we wanted, and thus did not consume the money as quickly as predicted. The ratio “progress / $ used” so far corresponds roughly to what we expected. The document below is the original call for proposal, and we still accept donations for this topic. See the latest status reports on our blog for updates. There is also an automatically generated coverage dashboard showing what parts of NumPy are already usable.
This is a proposal to provide a fully compatible working NumPy implementation for PyPy. This has long been a very commonly requested feature for PyPy as well as a worthy goal given that PyPy performs extremely well on numeric workloads.
We have already had some success providing a very basic NumPy implementation. However, we believe that raising funds can significantly speed up its development by paying people to work on it full time.
Below you'll find the work plan and the associated fundraising targets we need to make things happen. Once we reach the necessary target for each stage, we will start contracting developers. Contracts and money are managed by the non-profit Software Freedom Conservancy of which the PyPy project is a member. The current elected representatives are Carl Friedrich Bolz, Holger Krekel and Jacob Hallen and they will – in close collaboration with Conservancy and the core developers – select the best developers for implementing NumPy among well known PyPy contributors.
Should we not receive enough donations to complete all stages by 1st March 2012 at the latest, we will try our best to make PyPy support NumPy anyway. However, we reserve the right to shift any unused funds to other PyPy activities when that date is reached. Of course, since the Conservancy is a 501(c)(3) charitable organization incorporated in NY, USA, all funds will, regardless of their use, be spent in a way that benefits the general public, the advancement of Open Source and Free Software, and in particular the PyPy community and the PyPy codebase.
Note: For donations higher than $1,000, we can arrange for an invoice and a different payment method to avoid the high PayPal fees. Please contact pypy at sfconservancy.org if you want to know details on how to donate via other means.
What is NumPy?
NumPy is a framework for doing numerical calculations in Python. It has become the de facto standard for any kind of computation that involves n-dimensional arrays. Please consult the NumPy website for more details.
Why does NumPy on PyPy make sense?
NumPy on PyPy makes sense for a couple of reasons. Firstly, it is by far the most requested feature for PyPy. Secondly, PyPy already performs well on numerical loads. Bringing NumPy into the equation is therefore a reasonable next step, as it is a very convenient and popular tool for this kind of work. The resulting implementation should move Python in the scientific world from being merely a “glue” language to being the main implementation language for many people in the scientific and numeric communities. This will benefit current users of NumPy as well as people who have so far had to cope with lower-level languages like C or Fortran for the sake of speed.
The current implementation of NumPy on PyPy is reasonably fast: it ranges from roughly the same speed, to 2-5x faster for stacked large array operations, to 100-300x faster for accessing NumPy array elements one by one. The exact speed depends very much on how NumPy is used, but the target is to be within an order of magnitude of handwritten C. To achieve this, we would need to teach our JIT backends how to use modern vector instructions, like SSE or AVX. Hence, we split the proposal into two parts. The first part covers compatibility, with a reasonable approach to keeping the current speed achievements. The second part is about teaching the JIT how to vectorize certain operations, which should make PyPy's NumPy a very competitive tool compared to other available solutions for numerical computations, like MATLAB or C++ array libraries.
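To make the two usage styles mentioned above concrete, here is a minimal sketch in standard NumPy. The function names are hypothetical and chosen only for illustration. The first function chains whole-array operations, and the second reads array elements one by one in a Python loop. The sketch shows only that the two styles compute the same result. It makes no claim about timings, which depend on the interpreter and the workload.

```python
import numpy as np


def scaled_sum_vectorized(a, b, k):
    """Whole-array style: each operation works on entire arrays at once."""
    return (a + b * k).sum()


def scaled_sum_elementwise(a, b, k):
    """Element-by-element style: a Python loop indexing one item at a time."""
    total = 0.0
    for i in range(len(a)):
        total += a[i] + b[i] * k
    return total


# Small illustrative inputs (hypothetical data, not a benchmark).
a = np.arange(1000, dtype=np.float64)
b = np.ones(1000, dtype=np.float64)

vec = scaled_sum_vectorized(a, b, 2.0)
loop = scaled_sum_elementwise(a, b, 2.0)
```

Both calls return the same value. The element-by-element loop is the kind of code where a tracing JIT has the most interpreter overhead to remove, while the whole-array form spends most of its time inside the array operations themselves.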
About estimates and costs
For each step, we estimated the time that it would take to complete for an experienced developer who is already familiar with the PyPy codebase. From this number, the money is calculated considering an hourly rate of $60, and a 5% general donation which goes to the Software Freedom Conservancy itself, the non-profit organization of which the PyPy project is a member and which manages all the issues related to donations, payments, and tax-exempt status.
We split the proposal into two parts. We plan to implement them in the order given below, starting each one once we raise the corresponding funding target:
The first part covers the core NumPy Python API. We'll implement all NumPy APIs that are officially documented, and we'll pass all of NumPy's tests that cover documented APIs and are not implementation details. Specifically, we don't plan to:
- implement NumPy's C API
- implement other scientific libraries, like SciPy, matplotlib or biopython
Estimated costs: USD$30,000. Estimated duration: 3 months.
The second part will cover significant speed improvements in the JIT that would make numeric computations faster. This includes, but is not necessarily limited to:
- writing a set of benchmarks covering various use cases
- teaching the JIT backend (or multiple backends) how to deal with vector operations, like SSE
- experimenting with automatic parallelization using multiple threads, akin to numexpr
- improving the JIT register allocator, which will make a difference especially for tight loops
As with all speed improvements, it is relatively hard to predict exactly how well this will turn out; however, we expect the results to be within an order of magnitude of the handwritten C equivalent.
Estimated costs: USD$30,000. Estimated duration: 3 months.
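As a rough sanity check on these estimates, the stated figures can be turned back into developer hours. The sketch below assumes that the 5% Conservancy donation is added on top of the developer's fee. The text does not say whether it is added on top or taken out of the total, so the result is approximate. The function and constant names are hypothetical.

```python
HOURLY_RATE_USD = 60        # hourly rate stated in the proposal
CONSERVANCY_SHARE = 0.05    # 5% general donation (assumed to be added on top)
STAGE_COST_USD = 30_000     # estimated cost of each part
STAGE_MONTHS = 3            # estimated duration of each part


def implied_hours(total_usd, rate=HOURLY_RATE_USD, share=CONSERVANCY_SHARE):
    """Developer hours funded by total_usd under the assumption above."""
    developer_budget = total_usd / (1 + share)
    return developer_budget / rate


hours = implied_hours(STAGE_COST_USD)
hours_per_month = hours / STAGE_MONTHS
print(round(hours), round(hours_per_month))  # prints: 476 159
```

Under this assumption each part funds about 476 hours, or roughly 159 hours per month, which is in line with one developer working close to full time for three months.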
Benefits of This Work to the Python Community and the General Public
Python has become one of the most popular dynamic programming languages in the world. Web developers, educators, and scientific programmers alike all value Python because Python code is often more readable and because Python often increases programmer productivity.
Traditionally, languages like Python ran more slowly than static, compiled languages; Python developers chose to sacrifice execution speed for ease of programming. The PyPy project created a substantially improved Python language implementation, including a fast Just-in-time (JIT) compiler. The increased execution speed that PyPy provides has attracted many users, who now find their Python code runs up to four times faster under PyPy than under the reference implementation written in C.
Meanwhile, adoption of Python is already underway among researchers and developers who work specifically on computing that requires fast numeric operations. NumPy support in PyPy will open up Python to those developers and researchers who want the ease of programming that Python provides, the speed of PyPy, and the speedups for numerical work that NumPy can provide.
PyPy's developers make all PyPy software available to the public without charge, under PyPy's Open Source copyright license, the permissive MIT License. PyPy's license assures that PyPy is equally available to everyone freely on terms that allow both non-commercial and commercial activity. This license allows for academics, for-profit software developers, volunteers and enthusiasts alike to collaborate together to make a better Python implementation for everyone.
NumPy support for PyPy will be licensed similarly, and therefore NumPy support can directly help researchers and developers who seek to do numeric computing but want an easier programming language to use than Fortran or C, which are typically used for these applications. Being licensed freely to the general public means that opportunities to use, improve and learn about how NumPy itself works will be generally available to everyone.