docs: make README more approachable (python#41)
* docs: make README more approachable

* Add mermaid diagram, add evaluation

Co-Authored-By: Jules <57632293+JuliaPoo@users.noreply.github.com>

---------

Co-authored-by: Jules <57632293+JuliaPoo@users.noreply.github.com>
Fidget-Spinner and JuliaPoo authored Jun 22, 2023
1 parent 426daf1 commit 2ce2883
Showing 2 changed files with 112 additions and 28 deletions.
140 changes: 112 additions & 28 deletions README.md
@@ -2,33 +2,121 @@

A WIP Lazy Basic Block Versioning + (eventually) Copy and Patch JIT Interpreter for CPython.

Python is a widely-used programming language. CPython is its reference implementation. Due to Python’s dynamic type semantics, CPython is generally unable to execute Python programs as fast as it potentially could with static type semantics.
## The case for our project

Last semester, while taking CS4215, we made progress on implementing a technique for removing type checks and other overheads associated with dynamic languages known as [Lazy Basic Block Versioning (LBBV)](https://arxiv.org/abs/1411.0352) in CPython. This work will be referred to as pyLBBV. More details can be found in our [technical report](https://github.com/pylbbv/pylbbv/blob/pylbbv/report/CPython_Tier_2_LBBV_Report_For_Repo.pdf). For an introduction to pyLBBV, refer to our [presentation](https://docs.google.com/presentation/d/e/2PACX-1vQ9eUaAdAgU0uFbEkyBbptcLZ4dpdRP-Smg1V499eogiwlWa61EMYVZfNEXg0xNaQvlmdNIn_07HItn/pub?start=false&loop=false&delayms=60000).
Python is a widely-used programming language. As a Python user, I want CPython code to execute quicker.

CPython is Python's reference implementation. Due to Python’s dynamic type semantics, CPython is generally unable to execute Python programs as fast as it potentially could with static type semantics.

## The solution

We fork CPython to implement and experiment with features that will make it faster. Our fork is called pyLBBVAndPatch.

### Features

This section will be laden with some CS programming language terminology. We will do our best to keep that to a minimum and explain our features as simply as possible.

#### Pre-Orbital

Before Orbital, while taking CS4215 in the previous semester, we had already achieved the following:

- Lazy basic block versioning interpreter
- Basic type propagation
- Type check removal
- Basic tests
- Comprehensive documentation

In short, [lazy basic block versioning](https://arxiv.org/abs/1411.0352) is a technique for collecting type information about executing code. A *basic block* is "a straight-line code sequence with no branches in except to the entry and no branches out except at the exit" ([retrieved from Wikipedia, 21/6/2023](https://en.wikipedia.org/wiki/Basic_block)). The code you write usually consists of multiple basic blocks. Lazy basic block versioning means we only generate code at runtime, block by block, as we observe the types. This allows us to collect runtime type information at basic block boundaries and optimize our basic blocks as we generate them.
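
To make the idea concrete, here is a minimal toy sketch in Python. It is not how pyLBBV is implemented (the real logic lives in C inside the interpreter); the `LazyBlock` class, its `versions` table and the lambdas standing in for generated code are all hypothetical names chosen for illustration. It only shows the "lazy" and "versioning" parts: a version of a block is generated the first time a given set of operand types is observed, and reused afterwards.

```python
from typing import Callable, Dict, Tuple

# Hypothetical toy model: a "type context" is the tuple of operand types
# observed on entry to a basic block.
TypeContext = Tuple[type, ...]


class LazyBlock:
    """One basic block whose specialised versions are generated on demand."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.versions: Dict[TypeContext, Callable[..., object]] = {}

    def _generate(self, ctx: TypeContext) -> Callable[..., object]:
        # Stand-in for real code generation: pick an implementation
        # specialised for the observed types.
        if ctx == (int, int):
            return lambda a, b: int.__add__(a, b)
        if ctx == (float, float):
            return lambda a, b: float.__add__(a, b)
        return lambda a, b: a + b  # generic fallback

    def run(self, *args: object) -> object:
        ctx = tuple(type(a) for a in args)
        version = self.versions.get(ctx)
        if version is None:
            # Lazy: generate a version only when this context is first seen.
            version = self._generate(ctx)
            self.versions[ctx] = version
        return version(*args)


block = LazyBlock("add")
block.run(1, 2)       # first (int, int) context: generates a version
block.run(3, 4)       # reuses the (int, int) version
block.run(1.5, 2.5)   # first (float, float) context: generates another
print(len(block.versions))  # 2
```

The design point the sketch tries to capture is that no version exists until the corresponding types are actually seen at runtime, so work is only spent on the type combinations a program really uses.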

Type propagation means we can take the type information gathered from a single basic block and propagate it down to the next basic block. Thus, over time, we accumulate more and more type information.

Type check removal means removing type checks in dynamic Python. E.g. if you have `a + b`, the fast path in Python has to check whether the operands are `int`, `str` or `float`, and if all those checks fail, fall back to a generic `+` function. These type checks incur overhead at runtime. With type information, if we already know the types, then we don't need any type checks at all! This means we can eliminate them.
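
As a rough illustration, the following Python sketch models the difference between a checked and an unchecked addition. The function names `generic_add` and `int_add_unchecked` are hypothetical and only mimic the shape of the checks; CPython performs them in C inside the interpreter loop.

```python
def generic_add(a, b):
    """Models a dynamically typed `a + b`: check the types, then dispatch."""
    if type(a) is int and type(b) is int:
        return int.__add__(a, b)
    if type(a) is float and type(b) is float:
        return float.__add__(a, b)
    if type(a) is str and type(b) is str:
        return str.__add__(a, b)
    return a + b  # generic slow path


def int_add_unchecked(a, b):
    """Specialised version with the checks removed.

    Only valid when both operands are already known to be `int`,
    e.g. from type information gathered in an earlier basic block.
    """
    return int.__add__(a, b)


print(generic_add(1, 2))        # 3
print(int_add_unchecked(1, 2))  # 3
```

Both calls produce the same result, but the second skips every `type(...)` comparison. That saving is what type check removal aims for whenever the types are already known.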

We had a rudimentary test script and Doxygen documentation for all our functions to follow common SWE practices.

#### Orbital

This Orbital, we intend to refine pyLBBV. These include but are not limited to:
- General refactoring
- Bug fixing
- Better unboxing + support unboxing of other types
- More type specialised code
- [X] General refactoring
- [X] Bug fixing
- [X] A more advanced type propagator.
- [X] A more comprehensive test suite with Continuous Integration testing.
- [ ] A copy and patch JIT compiler.

Furthermore, we intend to further pyLBBV by integrating a [Copy and Patch JIT](https://arxiv.org/abs/2011.13127) (using code written externally by Brandt Bucher) on top of the type specialisation pyLBBV provides. The culmination of these efforts will allow further optimisations to be implemented. We hope that this effort can allow Python to be as fast as a statically typed language. Our work here will be made publicly available so that it will benefit CPython and its users, and we plan to collaborate closely with the developers of CPython in the course of this project.
A JIT (Just-in-Time) compiler is a program that generates native machine code at runtime. [Copy and Patch](https://arxiv.org/abs/2011.13127) is a fast compilation technique developed rather recently. The general idea is that compilation normally requires multiple steps, which makes it slow (recall how many steps your SICP meta-circular evaluator needs to execute JS)! Copy and patch makes compilation faster by skipping all the intermediate steps and just creating "templates" for the final code. These "templates" are called *stencils* and they contain *holes*, i.e. missing values. All you have to do for compilation now is to copy a template and patch in the holes, which makes it very fast!
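
The copy and patch steps can be sketched in a few lines of Python. This is a toy under stated assumptions, not the stencil format used by the actual JIT: the `HOLE` marker, the `STENCIL` bytes and the `copy_and_patch` helper are hypothetical, and the resulting bytes are only printed, never executed.

```python
import struct

# Hypothetical 4-byte placeholder that marks where the hole is.
HOLE = b"\xDE\xAD\xBE\xEF"

# A toy stencil: x86 `mov eax, imm32` (opcode 0xB8) with the immediate
# left as a hole, followed by `ret` (0xC3).
STENCIL = b"\xB8" + HOLE + b"\xC3"
HOLE_OFFSET = STENCIL.index(HOLE)


def copy_and_patch(stencil: bytes, hole_offset: int, value: int) -> bytes:
    """Copy the stencil, then patch a 32-bit little-endian value into its hole."""
    code = bytearray(stencil)                         # copy
    struct.pack_into("<i", code, hole_offset, value)  # patch
    return bytes(code)


code = copy_and_patch(STENCIL, HOLE_OFFSET, 42)
print(code.hex())  # b82a000000c3
```

Note that "compiling" here is just one memory copy and one write of the missing value, with no parsing or instruction selection at runtime. The stencil itself is prepared ahead of time.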

The main copy and patch JIT compiler is written by Brandt Bucher, and we plan on integrating his work with ours. However, as such a compiler is not designed for use with basic block versioning, we will be handwriting x64 assembly code to get things working!

Our work here will be made publicly available so that it will benefit CPython and its users, and we plan to collaborate closely with the developers of CPython in the course of this project.

Because Python is a huge language, pyLBBVAndPatch intends to support and optimise only a subset of Python. Specifically, pyLBBVAndPatch focuses on integer and float arithmetic. We believe this scope is sufficient as an exploration of the LBBV + Copy and Patch JIT approach for CPython.

# Project Plan
##### General refactoring

We did a major refactor of our code generation machinery. This makes the code easier to reason about.


- Fix bugs and refactor hot-patches in pyLBBV
- Implement interprocedural type propagation
- Implement typed object versioning
- Implement unboxing of integers, floats and other primitive types
- Implement Copy and Patch JIT (Just In Time) Compilation
##### Bug fixing

## Immediate Goals
- We managed to support recursive functions in Python!
- We are now able to build ourselves using the standard Python build suite. This is a huge accomplishment because it requires supporting a lot of Python code.
- We have fixed enough bugs that we can now run complex recursive Python functions like `help(1)`.

Refer to [the issues page](https://github.com/pylbbv/pylbbv/issues).

# Changelog over CS4215
##### A more advanced type propagator

- We fixed bugs in how one type context (i.e. snapshot) is deemed to be compatible with another type context.
- We now support collecting negative type information (e.g. that a variable is not an `int` or not a `float`). This allows for better type check elimination.
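
A minimal sketch of what negative type information could look like, assuming a simplified per-variable record. The `TypeInfo` class and its methods are hypothetical names for illustration and do not mirror the data structures in `tier2.c`.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class TypeInfo:
    """What is known about one variable in a type context (toy model)."""

    known: Optional[type] = None                       # positive information
    excluded: Set[type] = field(default_factory=set)   # negative information

    def guard_passed(self, t: type) -> None:
        # The guard "is this a t?" succeeded, so the exact type is known.
        self.known = t
        self.excluded.clear()

    def guard_failed(self, t: type) -> None:
        # The guard failed: we only learn that the variable is *not* a t.
        if self.known is None:
            self.excluded.add(t)

    def needs_check(self, t: type) -> bool:
        """Return True if a runtime check for `t` is still required."""
        if self.known is not None:
            return False  # outcome is already determined by the known type
        return t not in self.excluded  # a failed guard need not be repeated


info = TypeInfo()
info.guard_failed(int)
print(info.needs_check(int))    # False: already known not to be an int
print(info.needs_check(float))  # True: nothing known about float yet
```

In this model a failed guard is not wasted: recording that a variable is not an `int` lets later blocks skip the same `int` check, which is the extra elimination the bullet above describes.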


##### SWE dev best practices and CI testing

- We have added both feature tests and regression tests to our test script in [tier2_test.py](./tier2_test.py).
- We now have continuous integration. We build our project and run tests using GitHub Actions for Windows 64-bit, on every pull request and commit to the repository!
![image](./orbital/CI.png)
- All PRs require review and an approval before merging is allowed. All tests in CI must also pass. This follows standard best practices.

## Architecture Diagram and Design

```mermaid
sequenceDiagram
autonumber
participant CPython Compiler
participant Tier 0
participant Tier 1
box rgba(66,120,99,0.1) Our Project
participant Tier 2
participant Type Propagator
end
CPython Compiler ->> Tier 0: Emits code for <br> Tier 0 to execute
loop
Tier 0 ->> Tier 1: Individual instructions <br>profile the data they <br> receive and overwrite <br> themselves with a more efficient <br> instruction.
Tier 1 ->> Tier 0: If optimisation is <br> invalid, de-optimise <br> back to Tier 0
end
Tier 1 ->> Tier 2: Code executed more <br> than 63 times and <br> Tier 1 instructions <br> present, triggers the <br> Tier 2 interpreter
loop Until exit scope executed
loop until Tier 2 encounters type-specialised tier 1 instruction
Note over Tier 2: Tier 2 copies Tier 1 <br> instructions into a <br> buffer to be executed <br> according to runtime <br> type info
Tier 2 ->> Type Propagator: Requests type propagator
Type Propagator ->> Tier 2: Type propagator <br> updates runtime type <br> info based on <br>newly emitted code
end
Note over Tier 2: Emits a typeguard <br> and executes Tier 2 code <br> until typeguard is hit.
Tier 2 ->> Type Propagator: Requests type propagator
Type Propagator ->> Tier 2: Type propagator updates <br> runtime type info <br> based on branch taken
Note over Tier 2: Emits type specialised <br> branch according to <br> runtime type info
end
```

## What's left for our project

- The Copy and Patch JIT compiler.

## Evaluation and benchmarks

We will run the [bm_nbody.py](./bm_nbody.py) and [bm_float_unboxed.py](./bm_float_unboxed.py) scripts to gather results. For now, we expect no performance improvement, as we have yet to implement the copy and patch JIT compiler.

## Changelog over CS4215

* Refactor: Typeprop codegen by @JuliaPoo in https://github.com/pylbbv/pylbbv/pull/1
* Refactored type propagation codegen to more explicitly align with the formalism in our [technical report (Appendix)](https://github.com/pylbbv/pylbbv/blob/pylbbv/report/CPython_Tier_2_LBBV_Report_For_Repo.pdf) and remove duplicated logic
@@ -38,35 +126,27 @@ Refer to [the issues page](https://github.com/pylbbv/pylbbv/issues).
* Perf: Improved typeprop by switching overwrite -> set by @JuliaPoo in https://github.com/pylbbv/pylbbv/pull/6
* Stricter type propagation reduces type information loss

# Build instructions
## Build instructions

You should follow the official CPython build instructions for your platform.
https://devguide.python.org/getting-started/setup-building/

We have one major difference - you must have a pre-existing Python installation.
Preferably Python 3.9 or higher. On macOS/Unix systems, that Python installation
*must* be located at `python3`.

The main reason for this limitation is that Python is used to bootstrap the compilation
of Python. However, since our interpreter is unable to run a large part of the Python
language, our interpreter cannot be used as a bootstrap Python.

During the build process, errors may be printed, and the build process may error. However,
the final Python executable should still be generated.

# Where are files located? Where is documentation?
## Where are files located? Where is documentation?

The majority of the changes and functionality are in `Python/tier2.c` where Doxygen documentation
is written alongside the code, and in `Tools/cases_generator/` which contains the DSL implementation.

# Running tests
## Running tests

We've written simple tests of the main functionalities.
Unfortunately, we did not have time to write comprehensive tests, and it doesn't seem worth it either way given the experimental nature of this project.

After building, run `python tier2_test.py` or `python.bat tier2_test.py` (on Windows) in the repository's root folder.

# Debugging output
## Debugging output

In `tier2.c`, two flags can be set to print debug messages:
```c
// (preceding lines of this code block are collapsed in the diff view: @@ -76,3 +156,7 @@)
// Prints typeprop debug messages
#define TYPEPROP_DEBUG 0
```
## Addendum
More details can be found in our [technical report](https://github.com/pylbbv/pylbbv/blob/pylbbv/report/CPython_Tier_2_LBBV_Report_For_Repo.pdf). For an introduction to pyLBBV, refer to our [presentation](https://docs.google.com/presentation/d/e/2PACX-1vQ9eUaAdAgU0uFbEkyBbptcLZ4dpdRP-Smg1V499eogiwlWa61EMYVZfNEXg0xNaQvlmdNIn_07HItn/pub?start=false&loop=false&delayms=60000).
Binary file added orbital/CI.png
