Reading Python Documentation

If you're taking CS 41, we still have a few weeks left in the course, which will cover the Python Standard Library and Third-Party Libraries in more detail and culminate in the final project presentations (get hype!! 🥳🚀). As far as course notes are concerned, though, we wanted to leave you with a tutorial on how to teach yourself things within the Python ecosystem.

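Here, we'll walk through reading documentation from the Python standard library, because most third-party documentation follows a similar format. The standard library reference at https://docs.python.org/3/library/ groups modules into categories; a few of those categories and modules are: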

- Built-in Functions
- Built-in Types
  - Sequence Types — list, tuple, range
  - Text Sequence Type — str
- Built-in Exceptions
- Text Processing Services
  - string — Common string operations
  - re — Regular expression operations
- Data Types
  - datetime — Basic date and time types
  - heapq — Heap queue algorithm
  - pprint — Data pretty printer
- File and Directory Access
  - pathlib — Object-oriented filesystem paths
- Data Persistence
  - pickle — Python object serialization
- Concurrent Execution
  - threading — Thread-based parallelism
  - multiprocessing — Process-based parallelism
- Internet Data Handling
  - json — JSON encoder and decoder

At the top level of the list above are categories like "Data Persistence." Each category contains the standard library modules on that topic. For example, under "Data Persistence" is the `pickle` module, which converts most Python objects into bytes that you can save to your computer's hard drive and load back later.
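As a minimal sketch of that workflow (the `scores` dictionary and the `scores.pickle` file name are made up for illustration, and a temporary directory stands in for a real location on disk), `pickle.dump` writes an object to a file opened in binary mode and `pickle.load` reads it back:

```python
import os
import pickle
import tempfile

# Illustrative data; any picklable object would work here.
scores = {"alice": [90, 85], "bob": [72, 88]}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "scores.pickle")  # hypothetical file name

    # pickle.dump writes the pickled bytes to a file opened in binary mode ("wb").
    with open(path, "wb") as f:
        pickle.dump(scores, f)

    # pickle.load reads them back from a file opened in binary mode ("rb").
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == scores)  # an equal (but separate) dictionary
```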

The `pickle` Module

Let's take a closer look at the pickle module, whose documentation is hosted at https://docs.python.org/3/library/pickle.html.

If you're reading about a module for the first time, start with the introduction, which usually describes the module at a high level. In this case, the introduction is:

The pickle module implements binary protocols for serializing and de-serializing a Python object structure. "Pickling" is the process whereby a Python object hierarchy is converted into a byte stream, and "unpickling" is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy. Pickling (and unpickling) is alternatively known as "serialization", "marshalling," or "flattening"; however, to avoid confusion, the terms used here are "pickling" and "unpickling".
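To make "object hierarchy" and "byte stream" concrete, here's a small sketch using `pickle.dumps` and `pickle.loads` (the nested list is an arbitrary example). Note that an object referenced twice in the hierarchy comes back as one shared object, not two copies:

```python
import pickle

shared = {"name": "config"}
hierarchy = [shared, shared, (1, 2.5, "three")]  # arbitrary nested structure

data = pickle.dumps(hierarchy)  # pickling: object hierarchy -> byte stream
restored = pickle.loads(data)   # unpickling: byte stream -> object hierarchy

print(type(data))                  # the byte stream is a bytes object
print(restored == hierarchy)       # same contents as the original
print(restored[0] is restored[1])  # the shared reference is preserved
```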

Then, you should glance at the table of contents for this module, which lives in the left-hand collapsible bar. In this case, it looks like this:

- pickle — Python object serialization
  - Relationship to other Python modules
    - Comparison with marshal
    - Comparison with json
  - Data stream format
  - Module Interface
  - What can be pickled and unpickled?
  - Pickling Class Instances
    - Persistence of External Objects
    - Dispatch Tables
    - Handling Stateful Objects
  - Custom Reduction for Types, Functions, and Other Objects
  - Out-of-band Buffers
    - Provider API
    - Consumer API
    - Example
  - Restricting Globals
  - Performance
  - Examples

Many of these sections are specific to pickle, which is a fairly sophisticated module. Let's jump to the "Module Interface" section, which describes the module's exports in the standard format.
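The same exports can be listed from the interpreter, which is a handy cross-check while reading. This sketch assumes `pickle.__all__` reflects the module's public interface (the usual convention for standard library modules); the split into classes and functions via `inspect.isclass` is our own, not the docs':

```python
import inspect
import pickle

# pickle.__all__ names the module's public exports; "Module Interface"
# documents these same names.
classes = [n for n in pickle.__all__ if inspect.isclass(getattr(pickle, n))]
functions = [
    n for n in pickle.__all__
    if callable(getattr(pickle, n)) and n not in classes
]

print("classes:", classes)
print("functions:", functions)
```

In an interactive session, `help(pickle.dumps)` prints that function's docstring, which usually mirrors its entry on the web page.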

This section (and most Python documentation) is organized by indentation. The module's exports are aligned with the left edge of the page, and deeper indentation levels nest descriptions under them. For example, this is the documentation for pickle.dumps:

pickle.dumps(obj, protocol=None, *, fix_imports=True, buffer_callback=None)
    Return the pickled representation of the object obj as a bytes object, instead of writing it to a file.
    Arguments protocol, fix_imports and buffer_callback have the same meaning as in the Pickler constructor.

    Changed in version 3.8: The buffer_callback argument was added.
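The entry defers the meaning of `protocol` to the `Pickler` constructor. As a quick experiment (the dictionary is arbitrary), you can pickle one object with several protocols and check that each result loads back:

```python
import pickle

obj = {"nums": [1, 2, 3], "flag": True}  # arbitrary example object

default_bytes = pickle.dumps(obj)            # protocol=None -> pickle.DEFAULT_PROTOCOL
ascii_bytes = pickle.dumps(obj, protocol=0)  # protocol 0: the original "human-readable" format
newest_bytes = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)  # newest protocol this Python supports

# pickle.loads detects the protocol from the data, so each one round-trips.
for data in (default_bytes, ascii_bytes, newest_bytes):
    print(len(data), pickle.loads(data) == obj)

print("default:", pickle.DEFAULT_PROTOCOL, "highest:", pickle.HIGHEST_PROTOCOL)
```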

The name of the function is shown in bold, and the full signature shows which arguments are required or optional and which can be passed positionally or only by keyword. In the example above, obj is the only required argument, and everything after the bare `*` (fix_imports and buffer_callback) must be passed by keyword.
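You can check your reading of a signature against the interpreter with `inspect.signature`. This sketch assumes a CPython version where `pickle.dumps` exposes signature metadata and has the `buffer_callback` parameter (3.8 or later):

```python
import inspect
import pickle

sig = inspect.signature(pickle.dumps)
for name, param in sig.parameters.items():
    required = param.default is inspect.Parameter.empty
    print(f"{name:16} {param.kind.name:22} required={required}")

# Passing fix_imports by keyword works...
pickle.dumps([1, 2], 2, fix_imports=False)

# ...but passing it positionally fails, because it comes after the bare `*`.
try:
    pickle.dumps([1, 2], 2, False)
    rejected = False
except TypeError as exc:
    rejected = True
    print("TypeError:", exc)
```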

Then, there's an indented description of the function which describes what the function does. This description will refer to parameters in italics and should explicitly say when the function returns something and what the type of that object will be.
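For example, the `pickle.dumps` entry promises a `bytes` object, while the neighboring `pickle.dump` entry (which writes to a file instead) describes no return value. A short sketch to confirm both, using an in-memory `io.BytesIO` as the file:

```python
import io
import pickle

obj = ["a", "b"]  # arbitrary example object

# dumps documents its return value: the pickled representation as bytes.
result = pickle.dumps(obj)
print(type(result).__name__)

# dump writes into a binary file-like object and returns None.
buffer = io.BytesIO()
returned = pickle.dump(obj, buffer)
print(returned)
print(pickle.loads(buffer.getvalue()) == obj)  # the written bytes load back
```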

Finally, for objects that have their own attributes (like classes and instances), the documentation will typically display these with additional levels of indentation. As a template:

library.ClassName(...parameters...)
    A description of the class and parameters, at a high level.

    method_name(...parameters...)
        A description of the method and its parameters.

If a class has an attribute that has its own attributes, the indentation can continue further.
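In the `pickle` docs, `pickle.Pickler(file, protocol=None, ...)` follows this template, with its `dump(obj)` method nested one level deeper; `pickle.Unpickler` and its `load()` method work the same way. A minimal sketch using both, with `io.BytesIO` standing in for a real file:

```python
import io
import pickle

buffer = io.BytesIO()  # any binary file-like object works

# The class from the docs, then one of its nested methods.
pickler = pickle.Pickler(buffer, protocol=pickle.HIGHEST_PROTOCOL)
pickler.dump({"x": 1})
pickler.dump([2, 3])  # one Pickler can write several objects in sequence

buffer.seek(0)  # rewind before reading
unpickler = pickle.Unpickler(buffer)
first = unpickler.load()   # objects come back in the order they were dumped
second = unpickler.load()
print(first, second)
```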

List of Useful Third Party Packages

Below, we've compiled a list of third-party packages that we've found useful, along with a brief description of each and the link to the documentation. Hopefully you'll find these packages helpful in your Python projects!

Numerical Computing, Machine Learning

- numpy (documentation) - numpy provides numerical computing tools: an n-dimensional array object, along with (fast) linear-algebra operations, statistical operations, and random simulation on numpy arrays.
- scipy (documentation) - scipy also provides numerical computing tools; specifically, it contains function implementations for numerical integration, interpolation, optimization, linear algebra, and statistics.
- matplotlib (documentation) - matplotlib provides tools for easily creating data plots in Python.
- tensorflow (documentation) - tensorflow is Google's open-source machine learning package. It contains tools to easily create, train, and test machine learning models.
- pytorch (documentation) - pytorch is another open-source machine learning package, primarily designed for deep learning. Like tensorflow, it provides tools to create, train, and test machine learning models.
- scikit-learn (documentation) - scikit-learn is (yet another) machine learning package. It provides out-of-the-box implementations of classical machine learning models (KNN, SVM, random forest, etc.) as well as a multi-layer perceptron for regression and classification tasks.
- keras (documentation) - keras is a deep learning package built on top of tensorflow that makes it easier to design and train deep learning models.
- nltk (documentation) - nltk is a natural language processing package, which provides access to lexicons and corpora, as well as libraries for classification, tokenization, parsing, and other tasks.
- cvxpy (documentation) - cvxpy is the Boyd Lab's convex optimization package: you describe a convex optimization problem in Python, and cvxpy solves it using standard optimization algorithms.
- pandas (documentation) - pandas is a data manipulation library. Through its DataFrame class, it allows for the easy processing, reading/writing, and transformation of various types of data.

Python & The Web

- django (documentation) - django is an industrial-strength framework in Python for building web applications.
- beautifulsoup (documentation) - BeautifulSoup is an HTML-parsing library in Python for web scraping.

Cryptography

- pyca/cryptography (documentation) - pyca/cryptography is a package which provides an interface to cryptographic function implementations, such as symmetric ciphers, message digests, and key derivation functions.

Game Programming

- pygame (documentation) - pygame is a collection of modules which enable developers to write video games in Python.

With love, 🦄s, and 🐘s by the CS41 Staff
