Data Aggregator Rewrite #753

Merged: 152 commits, merged Feb 22, 2021
Changes from 66 commits

Commits (152)
f252410
First steps in the rewrite
Sep 29, 2020
0ba7de7
Fixed import paths
Sep 29, 2020
d65af1e
One giant refactor
Sep 29, 2020
0a18a19
Merge branch 'master' into DataAggregatorRefactor
Sep 30, 2020
3fb70b8
Fixing tests
Sep 30, 2020
53b55cd
Adding mypy
Sep 30, 2020
ec4c7df
Removed mypy from pre-commit workflow
Sep 30, 2020
7288783
First draft on DataAggregator
Oct 1, 2020
f694254
Wrote a DataAggregator that starts and shuts down
Oct 1, 2020
b7e9b1d
Created tests and added more empty types
Oct 2, 2020
8eaf7ce
Got demo.py working
Oct 2, 2020
d633f95
Created sql_provider
Oct 2, 2020
71ae8b0
Cleaned up imports in TaskManager
Oct 9, 2020
6e11e1b
Added async
Oct 9, 2020
07db4ab
Fixed minor bugs
Oct 9, 2020
f3857ac
First steps at porting arrow
Oct 9, 2020
ee5d5b6
Introduced TableName and different Task handling
Oct 9, 2020
9e06a5e
Added more failing tests
Oct 9, 2020
2737ec0
First first completes others don't
Oct 13, 2020
c63ae82
It works
Oct 13, 2020
8177ed4
Started working on arrow_provider
Oct 14, 2020
50e1539
Implemented ArrowProvider
Oct 16, 2020
9e1d67e
Added logger fixture
Oct 16, 2020
fe2e977
Fixed test_storage_controller
Oct 16, 2020
157ee24
Fixing OpenWPMTest.visit()
Oct 16, 2020
52c4ff6
Moved test/storage_providers to test/storage
Oct 16, 2020
1098727
Fixing up tests
Oct 19, 2020
d7e7268
Moved automation to openwpm
Nov 16, 2020
86fa88c
Merge branch 'master' into DataAggregatorRefactor
Nov 16, 2020
ce5e901
Readded datadir to .gitignore
Nov 16, 2020
0631848
Ran repin.sh
Nov 16, 2020
3d2d720
Fixed formatting
Nov 16, 2020
d167846
Let's see if this works
Nov 16, 2020
7f1597f
Fixed imports
Nov 16, 2020
5de0822
Got arrow_memory_provider working
Nov 16, 2020
ae718dc
Merge branch 'master' into DataAggregatorRefactor
Nov 25, 2020
12a60a0
Starting to rewrite tests
Nov 25, 2020
84bff66
Setting up fixtures
Nov 25, 2020
4eb5c23
Attempting to fix all the tests
Nov 25, 2020
9b03e30
Still fixing tests
Nov 25, 2020
95bfcd5
Broken content saving
Nov 25, 2020
1b2f162
Added node
Nov 25, 2020
f01756b
Fixed screenshot tests
Nov 26, 2020
c5dfcd6
Fixing more tests
Nov 27, 2020
9d635d3
Fixed tests
Nov 27, 2020
ceb1d98
Implemented local_storage.py
Nov 27, 2020
11fb99f
Cleaned up flush_cache
Nov 27, 2020
17835b4
Fixing more tests
Nov 27, 2020
cc9ed52
Wrote test for LocalArrowProvider
Nov 27, 2020
0098181
Introduced tests for local_storage_provider.py
Nov 27, 2020
bf4f92c
Asserting test dir is empty
Nov 30, 2020
5c0a1e1
Creating subfolder for different aggregators
Dec 4, 2020
5981463
New depencies and init()
Dec 4, 2020
ba56b34
Everything is terribly broken
Dec 4, 2020
74ae07c
Figured out finalize_visit_id
Dec 7, 2020
6068c69
Running two event loops kinda works???
Dec 7, 2020
17a22d3
Rearming the event
Dec 8, 2020
3389d00
Introduced mypy
Dec 8, 2020
7343c88
Downgraded black in pre-commit
Dec 8, 2020
babd962
Modifying the database directly
Dec 8, 2020
6f9a06d
Merge branch 'master' into DataAggregatorRefactor
Dec 8, 2020
b3d28a0
Fixed formatting
Dec 11, 2020
791d865
Made mypy a lil stricter
Dec 11, 2020
66e8caa
Fixing docs and config printing
Dec 11, 2020
70963bd
Realising I've been using the wrong with
Dec 11, 2020
9862bf7
Trying to figure arrow_storage
Dec 11, 2020
4a036fa
Moving lock initialization in in_memory_storage
Dec 11, 2020
57d8ba9
Fixing tests
Dec 11, 2020
67d3070
Fixing up tests and adding more typechecking
Dec 11, 2020
de00f94
Fixed num_browsers in test_cache_hits_recorded
Dec 11, 2020
4291ddb
Parametrized unstructured
Dec 11, 2020
fa1c52f
String fix
Dec 11, 2020
9aed882
Added failing test
Dec 15, 2020
ef0ba1e
New test
Dec 23, 2020
1b14cbd
Review changes with Steven
Dec 23, 2020
8eb6ef0
Fixed repin.sh and test_arrow_cache
Jan 8, 2021
51d510f
Merge branch 'master' into DataAggregatorRefactor
Jan 15, 2021
24fc5d2
Minor change
Jan 15, 2021
0096007
Fixed prune-environment.py
Jan 15, 2021
962af53
Removing references to DataAggregator
Jan 15, 2021
902e4ed
Fixed test_seed_persistance
Jan 15, 2021
25cd9cf
More paths
Jan 15, 2021
dcb9a6a
Fixed test display shutdown
Jan 18, 2021
e91aba7
Made cache test more robust
Jan 18, 2021
e4c9bb8
Update crawler.py
Jan 18, 2021
247a69c
Slimming down ManagerParams
Jan 18, 2021
41e59ad
Fixing more tests
Jan 18, 2021
c9e52ee
Merge remote-tracking branch 'origin/DataAggregatorRefactor' into Dat…
Jan 18, 2021
7acb624
Update test/storage/test_storage_controller.py
Jan 19, 2021
db0d27f
Purging references to DataAggregator
Jan 22, 2021
abe4a01
Reverted changes to .travis.yml
Jan 22, 2021
2223d0c
Merge remote-tracking branch 'origin/DataAggregatorRefactor' into Dat…
Jan 22, 2021
d7400d2
Demo.py saves locally again
Jan 22, 2021
645240b
Readjusting test paths
Jan 22, 2021
ecb87f0
Expanded comment on initialize to reference #846
Jan 22, 2021
8629538
Made token optional in finalize_visit_id
Jan 22, 2021
9983362
Simplified test paramtetrization
Jan 22, 2021
105c73b
Fixed callback semantics change
Jan 22, 2021
f5a0abd
Removed test_parse_http_stack_trace_str
Jan 22, 2021
e6175db
Added DataSocket
Jan 22, 2021
173de3a
WIP need to fix path encoding
Jan 22, 2021
501dc5c
Fixed path encoding
Jan 25, 2021
6bd5575
Added task and crawl to schema
Jan 25, 2021
e5395d4
Merge branch 'master' into DataAggregatorRefactor
Jan 29, 2021
eeceaa3
Fixed paths in GitHub actions
Jan 29, 2021
22a822b
Merge branch 'master' into DataAggregatorRefactor
Jan 29, 2021
d7db8ca
Refactored completion handling
Jan 29, 2021
d5733db
Fix tests
Jan 29, 2021
6ee9972
Trying to fix tests on CI
Feb 1, 2021
89635c2
Removed redundant setting of tag
Feb 1, 2021
d4a391d
Removing references to S3
Feb 1, 2021
ffbb346
Purging more DataAggregator references
Feb 1, 2021
379af2d
Craking up logging to figure out test failure
Feb 1, 2021
e5c897b
Moved test_values into a fixture
Feb 1, 2021
8520be6
Fixing GcpUnstructuredProvider
Feb 1, 2021
6527179
Fixed paths for future crawls
Feb 1, 2021
1ca2739
Renamed sqllite to official sqlite
Feb 3, 2021
7cf48a1
Restored demo.py
Feb 3, 2021
29d3e27
Update openwpm/commands/profile_commands.py
Feb 3, 2021
5b5f229
Restored previous behaviour of DumpProfileCommand
Feb 3, 2021
73f0850
Removed leftovers
Feb 3, 2021
a4a75ff
Cleaned up comments
Feb 3, 2021
41f6656
Expanded lock check
Feb 3, 2021
9046d0d
Fixed more stuff
Feb 3, 2021
0ec3353
More comment updates
Feb 3, 2021
c1a6038
Update openwpm/socket_interface.py
Feb 3, 2021
ae25bfa
Removed outdated comment
Feb 4, 2021
4e89806
Using config_encoder
Feb 4, 2021
669a40f
Merge remote-tracking branch 'origin/DataAggregatorRefactor' into Dat…
Feb 4, 2021
85a4c2d
Renamed tar_location to tar_path
Feb 4, 2021
4c03174
Removed references to database_name in docs
Feb 5, 2021
f565507
Cleanup
Feb 5, 2021
2553a09
Moved screenshot_path and source_dump_path to ManagerParamsInternal
Feb 5, 2021
fceaee0
Fixed imports
Feb 12, 2021
1846d25
Fixing up comments
Feb 12, 2021
dfdc34d
Fixing up comments
Feb 12, 2021
b922774
More docs
Feb 15, 2021
a7bcbb8
Merge branch 'master' into DataAggregatorRefactor
Feb 15, 2021
55f6cdb
updated dependencies
Feb 16, 2021
65c6eda
Fixed test_task_manager
Feb 17, 2021
9dbeb7e
Merge branch 'master' into DataAggregatorRefactor
Feb 17, 2021
048546d
Reupgraded to python 3.9.1
Feb 17, 2021
59484de
Restoring crawl_reference in mp_logger
Feb 22, 2021
937c8fe
Removed unused imports
Feb 22, 2021
1c819f7
Apply suggestions from code review
Feb 22, 2021
2cbb801
Cleaned up socket handling
Feb 22, 2021
2dd339c
Fixed TaskManager.__exit__
Feb 22, 2021
74762cb
Merge remote-tracking branch 'origin/DataAggregatorRefactor' into Dat…
Feb 22, 2021
a555c14
Moved validation code into config.py
Feb 22, 2021
4f6aed1
Removed comment
Feb 22, 2021
eb08c13
Removed comment
Feb 22, 2021
4820b2c
Removed comment
Feb 22, 2021
98 changes: 98 additions & 0 deletions .github/workflows/run-tests.yaml
@@ -0,0 +1,98 @@
# This workflow will run all tests as well as pre-commit

name: Tests and linting
on:
push:
branches:
- master
pull_request:
schedule:
- cron: '0 0 */2 * *'

jobs:
pre-commit:
runs-on: ubuntu-latest
steps:
# All of these steps are just setup
- uses: actions/checkout@v2
- name: Setting MINICONDA_PATH
run: echo "MINICONDA_PATH=$HOME/miniconda" >> $GITHUB_ENV
- name: Setting OPENWPM_CONDA_PATH
run: echo "OPENWPM_CONDA_PATH=$MINICONDA_PATH/envs/openwpm" >> $GITHUB_ENV
# If the environment.yaml hasn't changed we just reuse the entire conda install
- id: cache
uses: actions/cache@v2
env:
cache-name: conda-cache
with:
path: ${{ env.MINICONDA_PATH }}
key: ${{ env.cache-name }}-${{ hashFiles('environment.yaml') }}

- name: Install conda
if: ${{ steps.cache.outputs.cache-hit != 'true' }}
run: $GITHUB_WORKSPACE/scripts/install-miniconda.sh

- run: echo "$MINICONDA_PATH/bin" >> $GITHUB_PATH

- name: Install.sh (cache miss)
if: ${{ steps.cache.outputs.cache-hit != 'true' }}
run: $GITHUB_WORKSPACE/install.sh
- name: Install.sh (cache hit)
if: ${{ steps.cache.outputs.cache-hit == 'true' }}
run: $GITHUB_WORKSPACE/install.sh --skip-create
- run: echo "$OPENWPM_CONDA_PATH/bin" >> $GITHUB_PATH
# Now we have a working OpenWPM environment

- run: pre-commit run --all
tests:
runs-on: ubuntu-latest
strategy:
matrix:
test-groups: ["test/test_[a-e]*", "test/test_[f-h]*", "test/test_[i-r,t-z]*", "test/test_[s]*", "test/storage/*"]
fail-fast: false
steps:
# All of these steps are just setup, maybe we should wrap them in an action
- uses: actions/checkout@v2
- name: Cache node modules
uses: actions/cache@v2
env:
cache-name: cache-node-modules
with:
# npm cache files are stored in `~/.npm` on Linux/macOS
path: ~/.npm
key: ${{ env.cache-name }}-${{ hashFiles('**/package-lock.json') }}
# Setting env variables that depend on $HOME
- name: Setting MINICONDA_PATH
run: echo "MINICONDA_PATH=$HOME/miniconda" >> $GITHUB_ENV
- name: Setting OPENWPM_CONDA_PATH
run: echo "OPENWPM_CONDA_PATH=$MINICONDA_PATH/envs/openwpm" >> $GITHUB_ENV

# If the environment.yaml hasn't changed we just reuse the entire conda install
- id: conda-cache
uses: actions/cache@v2
env:
cache-name: conda-cache
with:
path: ${{ env.MINICONDA_PATH }}
key: ${{ env.cache-name }}-${{ hashFiles('environment.yaml') }}

- name: Install conda
if: ${{ steps.conda-cache.outputs.cache-hit != 'true' }}
run: $GITHUB_WORKSPACE/scripts/install-miniconda.sh

- run: echo "$MINICONDA_PATH/bin" >> $GITHUB_PATH

- name: Install.sh (cache miss)
if: ${{ steps.conda-cache.outputs.cache-hit != 'true' }}
run: $GITHUB_WORKSPACE/install.sh
- name: Install.sh (cache hit)
if: ${{ steps.conda-cache.outputs.cache-hit == 'true' }}
run: $GITHUB_WORKSPACE/install.sh --skip-create

- run: echo "$OPENWPM_CONDA_PATH/bin" >> $GITHUB_PATH
# Now we have a working OpenWPM environment

- run: ./scripts/ci.sh
env:
DISPLAY: ":99.0"
TESTS: ${{ matrix.test-groups }}
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -4,7 +4,7 @@ repos:
hooks:
- id: isort
- repo: https://github.com/psf/black
rev: 19.10b0
rev: 20.8b1
hooks:
- id: black
language_version: python3
46 changes: 0 additions & 46 deletions .travis.yml

This file was deleted.

1 change: 0 additions & 1 deletion CONTRIBUTING.md
@@ -85,7 +85,6 @@ OpenWPM's tests are built on [pytest](https://docs.pytest.org/en/latest/). Execu
in the test directory to run all tests:

$ conda activate openwpm
$ cd test
$ py.test -vv

See the [pytest docs](https://docs.pytest.org/en/latest/) for more information on selecting
114 changes: 63 additions & 51 deletions README.md
@@ -1,6 +1,6 @@

OpenWPM
[![Build Status](https://travis-ci.org/mozilla/OpenWPM.svg?branch=master)](https://travis-ci.org/mozilla/OpenWPM)
[![Build Status](https://github.com/mozilla/openwpm/workflows/Tests%20and%20linting/badge.svg?branch=master)](https://github.com/mozilla/openwpm/actions?query=branch%3Amaster)
[![OpenWPM Matrix Channel](https://img.shields.io/matrix/OpenWPM:mozilla.org?label=Join%20us%20on%20matrix&server_fqdn=mozilla.modular.im)](https://matrix.to/#/#OpenWPM:mozilla.org?via=mozilla.org) <!-- omit in toc -->
=======

@@ -12,25 +12,25 @@ the instrumentation section below for more details.

Table of Contents <!-- omit in toc -->
------------------
* [Installation](#installation)
* [Pre-requisites](#pre-requisites)
* [Install](#install)
* [Mac OSX](#mac-osx)
* [Quick Start](#quick-start)
* [Troubleshooting](#troubleshooting)
* [Advice for Measurement Researchers](#advice-for-measurement-researchers)
* [Developer instructions](#developer-instructions)
* [Instrumentation and Configuration](#instrumentation-and-configuration)
* [Persistence Types](#persistence-types)
* [Local Databases](#local-databases)
* [Parquet on Amazon S3](#parquet-on-amazon-s3)
* [Docker Deployment for OpenWPM](#docker-deployment-for-openwpm)
* [Building the Docker Container](#building-the-docker-container)
* [Running Measurements from inside the Container](#running-measurements-from-inside-the-container)
* [MacOS GUI applications in Docker](#macos-gui-applications-in-docker)
* [Citation](#citation)
* [License](#license)

- [Installation](#installation)
- [Pre-requisites](#pre-requisites)
- [Install](#install)
- [Mac OSX](#mac-osx)
- [Quick Start](#quick-start)
- [Troubleshooting](#troubleshooting)
- [Advice for Measurement Researchers](#advice-for-measurement-researchers)
- [Developer instructions](#developer-instructions)
- [Instrumentation and Configuration](#instrumentation-and-configuration)
- [Storage](#storage)
- [Local Storage](#local-storage)
- [Remote storage](#remote-storage)
- [Docker Deployment for OpenWPM](#docker-deployment-for-openwpm)
- [Building the Docker Container](#building-the-docker-container)
- [Running Measurements from inside the Container](#running-measurements-from-inside-the-container)
- [MacOS GUI applications in Docker](#macos-gui-applications-in-docker)
- [Citation](#citation)
- [License](#license)

Installation
------------
@@ -82,8 +82,8 @@ Quick Start

Once installed, it is very easy to run a quick test of OpenWPM. Check out
`demo.py` for an example. This will use the default settings specified in
`openwpm/default_manager_params.json` and
`openwpm/default_browser_params.json`, with the exception of the changes
`openwpm/config.py::ManagerParams` and
`openwpm/config.py::BrowserParams`, with the exception of the changes
specified in `demo.py`.
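
For orientation, a minimal crawl in the spirit of `demo.py` might look like the sketch below; the exact `TaskManager` signature and the import paths are assumptions based on this rewrite, so treat `demo.py` on this branch as authoritative.

```python
from pathlib import Path

from openwpm.command_sequence import CommandSequence
from openwpm.commands.browser_commands import GetCommand
from openwpm.config import BrowserParams, ManagerParams
from openwpm.storage.sql_provider import SQLiteStorageProvider  # assumed import path
from openwpm.task_manager import TaskManager

# One headless browser, writing structured data to a local SQLite database.
manager_params = ManagerParams(num_browsers=1)
manager_params.data_directory = Path("./datadir")
manager_params.log_directory = Path("./datadir")
browser_params = [BrowserParams(display_mode="headless")]

with TaskManager(
    manager_params,
    browser_params,
    SQLiteStorageProvider(Path("./datadir/crawl-data.sqlite")),
    None,  # no unstructured storage provider in this sketch
) as manager:
    sequence = CommandSequence("https://example.com", reset=True)
    sequence.append_command(GetCommand(url="https://example.com", sleep=3), timeout=60)
    manager.execute_command_sequence(sequence)
```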

More information on the instrumentation and configuration parameters is given
@@ -178,40 +178,52 @@ If you want to contribute to OpenWPM have a look at our [CONTRIBUTING.md](./CONT

Instrumentation and Configuration
-------------------------------

OpenWPM provides a breadth of configuration options which can be found
in [Configuration.md](docs/Configuration.md).
More detail on the output is available [below](#storage).

Persistence Types
Storage
------------

#### Local Databases
By default OpenWPM saves all data locally on disk in a variety of formats.
Most of the instrumentation saves to a SQLite database specified
by `manager_params.database_name` in the main output directory. Response
bodies are saved in a LevelDB database named `content.ldb`, and are keyed by
the hash of the content. In addition, the browser commands that dump page
source and save screenshots save them in the `sources` and `screenshots`
subdirectories of the main output directory. The SQLite schema
specified by: `openwpm/DataAggregator/schema.sql`. You can specify additional tables
inline by sending a `create_table` message to the data aggregator.

#### Parquet on Amazon S3
As an option, OpenWPM can save data directly to an Amazon S3 bucket as a
Parquet Dataset. This is currently experimental and hasn't been thoroughly
tested. Screenshots, and page source saving is not currently supported and
will still be stored in local databases and directories. To enable S3
saving specify the following configuration parameters in `manager_params`:
* Persistence Type: `manager_params.output_format = 's3'`
* S3 bucket name: `manager_params.s3_bucket = 'openwpm-test-crawl'`
* Directory within S3 bucket: `manager_params.s3_directory = '2018-09-09_test-crawl-new'`

In order to save to S3 you must have valid access credentials stored in
`~/.aws`. We do not currently allow you to specify an alternate storage
location.

**NOTE:** The schemas should be kept in sync with the exception of
output-specific columns (e.g., `instance_id` in the S3 output). You can compare
OpenWPM distinguishes between two types of data: structured and unstructured.
Structured data is all data captured by the instrumentation or emitted by the platform.
Generally speaking, all data you download is unstructured data.

For each of these data classes we offer a variety of storage providers, and you are encouraged
to implement your own should the provided backends not be enough for you.

We have an outstanding issue to enable saving content generated by commands, such as
screenshots and page dumps, to unstructured storage (see [#232](https://github.com/mozilla/OpenWPM/issues/232)).
For now, they are saved to `manager_params.data_directory`.

### Local Storage

For storing structured data locally we offer two StorageProviders (see the sketch below):

- The SQLiteStorageProvider, which writes all data into a SQLite database
  - This is the recommended approach for getting started, as the data is easily explorable
- The LocalArrowProvider, which stores the data in Parquet files
  - This method integrates well with NumPy/Pandas
  - It might be harder to process ad hoc

For storing unstructured data locally we also offer two solutions:

- The LevelDBProvider, which stores all data in a LevelDB database
  - This is the recommended approach
- The LocalGzipProvider, which gzips and stores the files individually on disk
  - Please note that file systems usually don't like thousands of files in one folder
  - Use with care, or only for single-site visits
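
As a rough illustration of how these might be wired together; the class names and import paths are assumptions based on this rewrite and may not match this branch exactly:

```python
from pathlib import Path

# Assumed import paths; check openwpm/storage/ on this branch for the actual layout.
from openwpm.storage.leveldb import LevelDbProvider
from openwpm.storage.local_storage import LocalArrowProvider

data_dir = Path("./datadir")

# Structured records go to a local Parquet dataset ...
structured = LocalArrowProvider(data_dir / "parquet")
# ... and unstructured blobs (e.g. response bodies) go to a LevelDB database.
unstructured = LevelDbProvider(data_dir / "content.ldb")

# Both providers are then passed to the TaskManager, as in the Quick Start sketch above:
#   TaskManager(manager_params, browser_params, structured, unstructured)
```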

### Remote storage

When running in the cloud, saving records to disk is not a reasonable thing to do.
So we offer remote StorageProviders for S3 (see [#823](https://github.com/mozilla/OpenWPM/issues/823)) and GCP.
Currently, all remote StorageProviders write to the respective object storage service (S3/GCS).
The structured providers use the Parquet format.
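
As a sketch, a GCS-backed crawl might configure its providers as follows; this mirrors `crawler.py` further down in this PR, and the `GcsStructuredProvider` arguments and the import path are partly assumed:

```python
# Assumed import path for the GCS providers used in crawler.py.
from openwpm.storage.gcp import GcsStructuredProvider, GcsUnstructuredProvider

CRAWL_DIRECTORY = "2021-02-22_example-crawl"  # hypothetical crawl name
GCS_BUCKET = "openwpm-crawls"
GCP_PROJECT = "my-gcp-project"                # hypothetical project id
AUTH_TOKEN = "cloud"

# Structured data is written as a Parquet dataset under the crawl directory ...
structured = GcsStructuredProvider(
    project=GCP_PROJECT,
    bucket_name=GCS_BUCKET,
    base_path=CRAWL_DIRECTORY,
    token=AUTH_TOKEN,
)
# ... and unstructured blobs go to a separate prefix of the same bucket.
unstructured = GcsUnstructuredProvider(
    project=GCP_PROJECT,
    bucket_name=GCS_BUCKET,
    base_path=CRAWL_DIRECTORY + "/data",
    token=AUTH_TOKEN,
)
# Both providers are passed to the TaskManager just like the local ones.
```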

**NOTE:** The Parquet and SQL schemas should be kept in sync except for
output-specific columns (e.g., `instance_id` in the Parquet output). You can compare
the two schemas by running
`diff -y openwpm/DataAggregator/schema.sql openwpm/DataAggregator/parquet_schema.py`.

@@ -238,7 +250,7 @@ Docker service.
__Step 2:__ to build the image, run the following command from a terminal
within the root OpenWPM directory:

```
```bash
docker build -f Dockerfile -t openwpm .
```

@@ -253,7 +265,7 @@ X-server. You can do this by running: `xhost +local:docker`

Then you can run the demo script using:

```
```bash
mkdir -p docker-volume && docker run -v $PWD/docker-volume:/opt/Desktop \
-e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix --shm-size=2g \
-it openwpm python3 /opt/OpenWPM/demo.py
14 changes: 7 additions & 7 deletions crawler.py
@@ -31,7 +31,7 @@
# Storage Provider Params
CRAWL_DIRECTORY = os.getenv("CRAWL_DIRECTORY", "crawl-data")
GCS_BUCKET = os.getenv("GCS_BUCKET", "openwpm-crawls")
GCP_PROJECT = os.getenv("GCP_PROJECT", "senglehardt-openwpm-test-1")
GCP_PROJECT = os.getenv("GCP_PROJECT", "")
AUTH_TOKEN = os.getenv("GCP_AUTH_TOKEN", "cloud")

# Browser Params
@@ -87,9 +87,6 @@
# Manager configuration
manager_params.data_directory = Path("~/Desktop/") / CRAWL_DIRECTORY
manager_params.log_directory = Path("~/Desktop/") / CRAWL_DIRECTORY
manager_params.output_format = "s3"
manager_params.s3_bucket = GCS_BUCKET
manager_params.s3_directory = CRAWL_DIRECTORY

structured = GcsStructuredProvider(
project=GCP_PROJECT,
@@ -100,7 +97,7 @@
unstructured = GcsUnstructuredProvider(
project=GCP_PROJECT,
bucket_name=GCS_BUCKET,
base_path=CRAWL_DIRECTORY,
base_path=CRAWL_DIRECTORY + "/data",
token=AUTH_TOKEN,
)
# Instantiates the measurement platform
@@ -119,7 +116,7 @@
with sentry_sdk.configure_scope() as scope:
# tags generate breakdown charts and search filters
scope.set_tag("CRAWL_DIRECTORY", CRAWL_DIRECTORY)
scope.set_tag("S3_BUCKET", GCS_BUCKET)
scope.set_tag("GCS_BUCKET", GCS_BUCKET)
scope.set_tag("DISPLAY_MODE", DISPLAY_MODE)
scope.set_tag("HTTP_INSTRUMENT", HTTP_INSTRUMENT)
scope.set_tag("COOKIE_INSTRUMENT", COOKIE_INSTRUMENT)
@@ -136,7 +133,10 @@
if PREFS:
scope.set_context("PREFS", json.loads(PREFS))
scope.set_context(
"crawl_config", {"REDIS_QUEUE_NAME": REDIS_QUEUE_NAME,},
"crawl_config",
{
"REDIS_QUEUE_NAME": REDIS_QUEUE_NAME,
},
)
# Send a sentry error message (temporarily - to easily be able
# to compare error frequencies to crawl worker instance count)