Add tooling for recording and replaying PyPI interactions #609

Closed
wants to merge 6 commits into from
3 changes: 3 additions & 0 deletions .gitignore
@@ -16,3 +16,6 @@ cache-*

# python tmp files
__pycache__

scripts/offlinepi/mitmproxy-ca-cert.pem
scripts/offlinepi/responses.dat
52 changes: 45 additions & 7 deletions Cargo.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -60,7 +60,7 @@ rayon = { version = "1.8.0" }
# For correct IO error handling: https://github.com/cargo-bins/reflink-copy/pull/51
reflink-copy = { git = "https://github.com/cargo-bins/reflink-copy", rev = "7dffdccc4d4152cdc0a460b3ba8e77dd84ad74df" }
regex = { version = "1.10.2" }
-reqwest = { version = "0.11.22", default-features = false, features = ["json", "gzip", "brotli", "stream", "rustls-tls"] }
+reqwest = { version = "0.11.22", default-features = false, features = ["json", "gzip", "brotli", "stream", "rustls-tls-native-roots"] }
@zanieb (Member, Author), Dec 11, 2023

I believe this is required because I registered the cert with my system.

rustls-tls: Enables TLS functionality provided by rustls. Equivalent to rustls-tls-webpki-roots.
rustls-tls-webpki-roots: Enables TLS functionality provided by rustls, while using root certificates from the webpki-roots crate.
rustls-tls-native-roots: Enables TLS functionality provided by rustls, while using root certificates from the rustls-native-certs crate.
source

Hm...

reqwest-middleware = { version = "0.2.4" }
reqwest-retry = { version = "0.3.0" }
rfc2047-decoder = { version = "1.0.1" }
3 changes: 3 additions & 0 deletions crates/puffin-client/Cargo.toml
@@ -35,3 +35,6 @@ url = { workspace = true }
[dev-dependencies]
anyhow = { workspace = true }
tokio = { workspace = true, features = ["fs", "macros"] }

[features]
puffin-test-custom-ca-cert = []
13 changes: 12 additions & 1 deletion crates/puffin-client/src/registry_client.rs
@@ -74,11 +74,22 @@ impl RegistryClientBuilder {

     pub fn build(self) -> RegistryClient {
         let client_raw = {
-            let client_core = ClientBuilder::new()
+            let mut client_core = ClientBuilder::new()
                 .user_agent("puffin")
                 .pool_max_idle_per_host(20)
                 .timeout(std::time::Duration::from_secs(60 * 5));

+            if cfg!(feature = "puffin-test-custom-ca-cert") {
+                if let Some(cert) = std::env::var_os("PUFFIN_TEST_CA_CERT_PEM") {
+                    client_core = client_core.add_root_certificate(
+                        reqwest::Certificate::from_pem(
+                            &fs_err::read(cert).expect("No PUFFIN_TEST_CA_CERT_PEM"),
+                        )
+                        .expect("Invalid certificate"),
+                    )
+                }
+            }
+
             client_core.build().expect("Fail to build HTTP client.")
         };

58 changes: 58 additions & 0 deletions scripts/offlinepi/README.md
@@ -0,0 +1,58 @@
# offlinepi

Utilities for managing an offline version of PyPI.

## Installation

Installation requires `mitmproxy`. Because unreleased changes are required, installing from GitHub is recommended:

```
pip install git+https://github.com/mitmproxy/mitmproxy@1fcd0335d59c301d73d1b1ef676ecafcf520ab79
```

## Usage

Record PyPI responses during a command:

```
./offlinepi record <command>
```

Replay PyPI responses during a command:

```
./offlinepi replay <command>
```

### Example

Record server interactions during Puffin's tests:

```
./offlinepi record cargo test --features pypi -- --test-threads=1
```

**Note**: Recording tests without parallelism is helpful for reliable replays.

Then, run it again using replayed responses:

```
./offlinepi replay cargo test --features pypi
```

## TLS Certificates

In order to record HTTPS requests, the certificate generated by mitmproxy must be installed.
See [the mitmproxy certificate documentation](https://docs.mitmproxy.org/stable/concepts-certificates/) for details.
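
As a rough sketch of how the certificate might be handed to the client under test: the snippet below looks for a mitmproxy CA certificate and, if one is found, exports its path as `PUFFIN_TEST_CA_CERT_PEM`. The helper name `find_ca_cert` and the candidate paths are assumptions for illustration. `~/.mitmproxy` is mitmproxy's default configuration directory, and the variable is only read when the client is built with the `puffin-test-custom-ca-cert` feature.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of offlinepi): print the first candidate
# path that exists as a regular file, or fail if none does.
find_ca_cert() {
    local candidate
    for candidate in "$@"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Candidate locations are assumptions: a copy kept next to the scripts,
# then mitmproxy's default configuration directory.
if cert=$(find_ca_cert \
    "scripts/offlinepi/mitmproxy-ca-cert.pem" \
    "${HOME:-}/.mitmproxy/mitmproxy-ca-cert.pem"); then
    export PUFFIN_TEST_CA_CERT_PEM="$cert"
fi
```

If no certificate is found, the variable is left unset and the client falls back to its normal root certificates.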

## Implementation

[mitmproxy](https://mitmproxy.org/) is used to record and replay responses.

The proxy is temporarily created for the execution of the provided command.

The command _must_ respect the `HTTP_PROXY` and `HTTPS_PROXY` environment variables.

Response recording is limited to `pypi.org` and `files.pythonhosted.org`.

Responses are written to `responses.har` in the `offlinepi` project root.
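
To illustrate what respecting the proxy variables means in practice, here is a minimal sketch that runs a single command with both variables set only for that command. The function name `run_with_proxy` is hypothetical, and using one URL for both variables is an assumption made for brevity.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run an external command with HTTP_PROXY and
# HTTPS_PROXY set for that command only. `env` keeps the variables out
# of the calling shell's environment.
run_with_proxy() {
    local proxy_url="$1"
    shift
    env HTTP_PROXY="$proxy_url" HTTPS_PROXY="$proxy_url" "$@"
}

# The child process sees the proxy settings.
run_with_proxy http://localhost:8080 sh -c 'printf "%s\n" "$HTTPS_PROXY"'
```

A command that ignores these variables would bypass the proxy, so nothing would be recorded or replayed for it.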
50 changes: 50 additions & 0 deletions scripts/offlinepi/offlinepi
@@ -0,0 +1,50 @@
#!/usr/bin/env bash
#
# Run a command, recording or replaying interaction with the PyPI server.
#
# Usage:
#
# offlinepi <record|replay> <command>
#

projectroot=$(realpath "$(dirname "$0")")
responsefile=$projectroot/responses.har

mode=$1
shift

if [ -z "$mode" ]; then
    echo 'A mode must be provided e.g. `offlinepi record ...`'
    exit 1
fi

if [[ "${mode}" != @(record|replay) ]]; then
    echo "Invalid mode \"$mode\"; expected either \"record\" or \"replay\"."
    exit 1
fi

if "$projectroot/offlinepi-healthcheck"; then
    echo "Proxy is already running at localhost:8080"
    echo "Aborted!"
    exit 1
fi

echo "Starting proxy server to $mode responses..."
"$projectroot/offlinepi-$mode" "$responsefile" &
PROXY_PID=$!

if ! "$projectroot/offlinepi-wait" "$PROXY_PID"; then
    echo "Server failed to start!"
    echo "Aborted!"
    "$projectroot/offlinepi-stop" "$PROXY_PID"
    exit 1
fi

export HTTP_PROXY=http://localhost:8080
export HTTPS_PROXY=https://localhost:8080

echo "Running provided command..."
"$@"

echo "Stopping proxy server..."
$projectroot/offlinepi-stop $PROXY_PID
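
Note that the script as written ends with the exit status of `offlinepi-stop`, so a failing command (for example, a failed test run) is not reflected in the script's exit code. A minimal sketch of one way to preserve it follows. `run_and_cleanup` and `stop_proxy` are hypothetical names, and `stop_proxy` is only a stand-in for the real stop step.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, always run the cleanup step
# afterwards, and return the command's own exit status.
run_and_cleanup() {
    local cleanup="$1"
    shift
    local status=0
    "$@" || status=$?
    "$cleanup"
    return "$status"
}

# Stand-in for `offlinepi-stop <pid>` (assumption for illustration).
stop_proxy() {
    echo "Stopping proxy server..."
}

status=0
run_and_cleanup stop_proxy sh -c 'exit 3' || status=$?
echo "command exited with status $status"
```

In the real script, the final line would then be `exit "$status"` so callers such as CI can detect failures.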
12 changes: 12 additions & 0 deletions scripts/offlinepi/offlinepi-healthcheck
@@ -0,0 +1,12 @@
#!/usr/bin/env sh
#
# Checks if the proxy is running.
#
# Usage:
#
# offlinepi-healthcheck

exec curl --output /dev/null --silent --head --fail --proxy 127.0.0.1:8080 http://mitm.it

# TODO(zanieb): We could consider looking at the response to determine if a _different_ proxy is being used.
# TODO(zanieb): This could take a configurable host and port
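
One possible shape for the configurable host and port mentioned in the TODO, sketched under the assumption that they come from environment variables. `OFFLINEPI_HOST` and `OFFLINEPI_PORT` are hypothetical names that do not exist in this PR; the defaults match the address hard-coded above.

```shell
#!/usr/bin/env bash
# Hypothetical: build the proxy address from optional environment
# variables, falling back to the values currently hard-coded.
proxy_address() {
    printf '%s:%s\n' "${OFFLINEPI_HOST:-127.0.0.1}" "${OFFLINEPI_PORT:-8080}"
}

# Same curl invocation as the healthcheck, with the address factored out.
healthcheck() {
    curl --output /dev/null --silent --head --fail \
        --proxy "$(proxy_address)" http://mitm.it
}

echo "Healthcheck would target $(proxy_address)"
```

The main script would need to use the same address when exporting `HTTP_PROXY` and `HTTPS_PROXY`.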
36 changes: 36 additions & 0 deletions scripts/offlinepi/offlinepi-record
@@ -0,0 +1,36 @@
#!/usr/bin/env bash
#
# Start a proxy that records client-server interactions to a file.
#
# Usage:
#
# offlinepi-record <path>

path=$1
shift

if [ -z "$path" ]; then
    echo 'A recording path must be provided.'
    exit 1
fi

if [ -n "$*" ]; then
    echo "Unexpected extra arguments: $*"
    exit 1
fi

# N.B. Additional options must be added _before_ the filter string
exec mitmdump \
    --set stream_large_bodies=1000m \
    --set hardump="$path" \
    "~d pypi.org|files.pythonhosted.org|mitm.it"

# stream_large_bodies: must be set to a large value or large responses will not be recorded,
#                      resulting in unexpected file endings during replays
# hardump: we use a HAR file instead of the binary format (-w <path>) so the output is
#          human readable
# ~d: only interactions with package index domains should be recorded
#     we also allow `mitm.it` so healthchecks succeed when replaying

# Helpful notes for development
# --flow-detail <0-4> can be used to adjust the amount of information displayed about traffic
30 changes: 30 additions & 0 deletions scripts/offlinepi/offlinepi-replay
@@ -0,0 +1,30 @@
#!/usr/bin/env bash
#
# Start a proxy that replays server responses from a recording.
# Unknown responses will result in a 500.
# Each response can only be replayed once or it will be treated as unknown.
#
# Usage:
#
# offlinepi-replay <path>

path=$1
shift

if [ -z "$path" ]; then
    echo 'A recording path must be provided.'
    exit 1
fi

if [ -n "$*" ]; then
    echo "Unexpected extra arguments: $*"
    exit 1
fi

exec mitmdump --server-replay "$path" \
    --flow-detail 3 \
    --server-replay-extra 500 \
    --set connection_strategy=lazy

# server-replay-extra: configures behavior when a response is unknown.
# connection_strategy: lazy is required to replay offline
24 changes: 24 additions & 0 deletions scripts/offlinepi/offlinepi-stop
@@ -0,0 +1,24 @@
#!/usr/bin/env sh
#
# Stops the proxy at the given PID.
#
# Usage:
#
# offlinepi-stop <pid>

pid=$1
shift

if [ -z "$pid" ]; then
    echo 'A PID must be provided.'
    exit 1
fi

if [ -n "$*" ]; then
    echo "Unexpected extra arguments: $*"
    exit 1
fi

kill "$pid" 2> /dev/null
wait "$pid" 2> /dev/null
echo "Done!"
32 changes: 32 additions & 0 deletions scripts/offlinepi/offlinepi-wait
@@ -0,0 +1,32 @@
#!/usr/bin/env bash
#
# Waits for the proxy to be ready.
#
# Usage:
#
# offlinepi-wait <pid>

projectroot=$(realpath "$(dirname "$0")")
healthcheck="$projectroot/offlinepi-healthcheck"

pid=$1
shift

if [ -z "$pid" ]; then
    echo 'A PID must be provided.'
    exit 1
fi

if [ -n "$*" ]; then
    echo "Unexpected extra arguments: $*"
    exit 1
fi


# Wait until the server is ready
until $healthcheck; do
    if ! kill -0 "$pid" 2> /dev/null; then
        exit 1
    fi
    sleep 1
done
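
The loop above polls indefinitely for as long as the proxy process stays alive. A minimal sketch of a bounded variant follows, assuming a caller-chosen attempt limit and delay. The name `wait_for` is hypothetical and not part of this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a probe command up to a fixed number of
# attempts, sleeping between attempts. Returns 0 as soon as the probe
# succeeds and 1 if every attempt fails.
wait_for() {
    local attempts="$1"
    local delay="$2"
    shift 2
    local i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example: up to 3 attempts with no delay; `true` stands in for the
# healthcheck here.
if wait_for 3 0 true; then
    echo "ready"
fi
```

In `offlinepi-wait`, the probe could be a small function that runs the healthcheck and also fails fast when `kill -0 "$pid"` reports that the process has exited.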