RxRx19a is the first component of the RxRx19 dataset series released by Recursion, sharing data from a high-dimensional human cellular assay for COVID-19-associated disease. RxRx19a models active SARS-CoV-2 infection in both human renal cortical epithelial (HRCE) cells and Vero cells. For more information about RxRx19a, please visit RxRx.ai and the associated preprints, "Identification of potential treatments for COVID-19 through artificial intelligence-enabled phenomic analysis of human cells infected with SARS-CoV-2" and "Functional immune mapping with deep-learning enabled phenomics applied to immunomodulatory and COVID-19 drug discovery".
RxRx19a is part of a larger set of Recursion datasets that can be found at RxRx.ai and on GitHub. For questions about this dataset and others please email info@rxrx.ai.
The metadata can be found in `metadata.csv` and downloaded from here. The schema of the metadata is as follows:
Attribute | Description |
---|---|
site_id | Unique identifier of a given site |
well_id | Unique identifier of a given well |
cell_type | Cell type tested |
experiment | Experiment identifier |
plate | Plate number within the experiment |
well | Location on the plate |
site | Indication of the location in the well where image was taken (1, 2, 3 or 4) |
disease_condition | The disease condition tested in the well (mock, irradiated or viral) |
treatment | Compound tested in the well |
treatment_conc | Compound concentration tested (in uM) |
SMILES | Formula of tested compound (as CXSMILES/ChemAxon Extended SMILES) |
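As a minimal sketch of working with this schema, the snippet below reads a metadata file with Python's standard `csv` module and counts sites per disease condition. It assumes the file has a header row using the attribute names from the table above and that wells without a compound leave `treatment_conc` empty; the sample rows, including the identifiers and the compound, are invented for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Illustrative stand-in for metadata.csv: the header uses the schema's
# attribute names, but every value below is invented for this sketch.
SAMPLE_METADATA = """\
site_id,well_id,cell_type,experiment,plate,well,site,disease_condition,treatment,treatment_conc,SMILES
example-site-1,example-well-1,HRCE,HRCE-1,1,AA02,1,mock,,,
example-site-2,example-well-1,HRCE,HRCE-1,1,AA02,2,mock,,,
example-site-3,example-well-2,HRCE,HRCE-1,1,AA03,1,viral,compound-x,0.3,CCO
"""


def load_metadata(handle):
    """Read metadata rows, converting the numeric columns to numbers."""
    rows = []
    for row in csv.DictReader(handle):
        row["plate"] = int(row["plate"])
        row["site"] = int(row["site"])
        # Assumption: wells without a compound leave the concentration empty.
        conc = row["treatment_conc"]
        row["treatment_conc"] = float(conc) if conc else None
        rows.append(row)
    return rows


rows = load_metadata(io.StringIO(SAMPLE_METADATA))
sites_per_condition = Counter(row["disease_condition"] for row in rows)
print(sites_per_condition)
```

To use it on the real file, pass `open("metadata.csv", newline="")` in place of the `io.StringIO` object.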
The images are found in `images/*` and can be downloaded from here (n.b. this is 445GB).
The image data are 1024x1024 8-bit `png` files. The image paths, such as `HRCE-1/Plate1/AA02_s2_w3.png`, can be read as:
- Experiment Name: Cell type and experiment number (HRCE experiment 1)
- Plate Number (1)
- Well location on plate (column AA, row 2)
- Site (2)
- Channel (3)
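The path layout above can be split into its parts with a regular expression. This is only a sketch: the `ImagePath` type and `parse_image_path` function are hypothetical helpers, and the pattern is inferred from the single example path, so it may need adjusting if other paths deviate from it.

```python
import re
from typing import NamedTuple


class ImagePath(NamedTuple):
    experiment: str  # e.g. "HRCE-1": cell type and experiment number
    plate: int
    well: str        # e.g. "AA02"
    site: int
    channel: int


# Pattern inferred from the example "HRCE-1/Plate1/AA02_s2_w3.png".
_IMAGE_PATH = re.compile(
    r"^(?P<experiment>[^/]+)/Plate(?P<plate>\d+)/"
    r"(?P<well>[A-Z]+\d+)_s(?P<site>\d+)_w(?P<channel>\d+)\.png$"
)


def parse_image_path(path):
    """Split an image path into experiment, plate, well, site and channel."""
    match = _IMAGE_PATH.match(path)
    if match is None:
        raise ValueError(f"unrecognized image path: {path!r}")
    return ImagePath(
        experiment=match["experiment"],
        plate=int(match["plate"]),
        well=match["well"],
        site=int(match["site"]),
        channel=int(match["channel"]),
    )


parsed = parse_image_path("HRCE-1/Plate1/AA02_s2_w3.png")
print(parsed)
```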
All five channels (`w1` - `w5`) make up a single image of a given site. Each channel images a single cellular stain:
channel | stain |
---|---|
w1 | Hoechst 33342 (nucleus) |
w2 | Concanavalin A (membrane glycoproteins) |
w3 | Phalloidin (Actin) |
w4 | Syto14 (RNA) |
w5 | Wheat germ agglutinin (Golgi) |
Physical resolution: 0.65 micron/pixel.
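A short sketch of how the channel and resolution information might be used: it lists the five channel files that together form one site image and derives the physical width of an image from the stated pixel size. The `channel_paths` helper and the constant names are hypothetical, and the paths are relative to the images directory, following the example path given earlier.

```python
# Stain imaged in each channel, as listed in the channel table.
STAINS = {
    1: "Hoechst 33342 (nucleus)",
    2: "Concanavalin A (membrane glycoproteins)",
    3: "Phalloidin (Actin)",
    4: "Syto14 (RNA)",
    5: "Wheat germ agglutinin (Golgi)",
}

IMAGE_SIZE_PX = 1024      # images are 1024x1024 pixels
MICRONS_PER_PIXEL = 0.65  # stated physical resolution


def channel_paths(experiment, plate, well, site):
    """Return the five per-channel image paths (w1 to w5) for one site."""
    return [
        f"{experiment}/Plate{plate}/{well}_s{site}_w{channel}.png"
        for channel in sorted(STAINS)
    ]


# Side length of one image in microns (pixels times microns per pixel).
FIELD_OF_VIEW_UM = IMAGE_SIZE_PX * MICRONS_PER_PIXEL

for channel, path in enumerate(channel_paths("HRCE-1", 1, "AA02", 2), start=1):
    print(path, "->", STAINS[channel])
print(f"field of view: about {FIELD_OF_VIEW_UM:.1f} microns per side")
```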
The deep learning embeddings can be found in `embeddings.csv` and downloaded from here (n.b. this is 1.4GB).
Each row in the CSV has a `site_id` as described in the metadata schema. The remaining 1024 columns are the embedding for that respective site.
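Because the embeddings file is large, it may be convenient to stream it row by row rather than load it at once. The sketch below does this with the standard `csv` module. It assumes that `site_id` is the first column and that the file starts with a header row; neither is stated above, so check the actual file before relying on it. The `iter_embeddings` function and the synthetic sample are hypothetical.

```python
import csv
import io

EMBEDDING_DIM = 1024  # number of embedding columns per site


def iter_embeddings(handle, has_header=True):
    """Yield (site_id, embedding) pairs, one CSV row at a time.

    Assumptions: site_id is the first column and, when has_header is True,
    the first row is a header to skip.
    """
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)
    for row in reader:
        site_id, values = row[0], row[1:]
        if len(values) != EMBEDDING_DIM:
            raise ValueError(
                f"{site_id}: expected {EMBEDDING_DIM} values, got {len(values)}"
            )
        yield site_id, [float(value) for value in values]


# Synthetic two-row file for demonstration; identifiers and values are invented.
header = ",".join(["site_id"] + [f"feature_{i}" for i in range(EMBEDDING_DIM)])
row_a = ",".join(["example-site-1"] + ["0.5"] * EMBEDDING_DIM)
row_b = ",".join(["example-site-2"] + ["-1.25"] * EMBEDDING_DIM)
sample = io.StringIO("\n".join([header, row_a, row_b]) + "\n")

embeddings = dict(iter_embeddings(sample))
print(len(embeddings), len(embeddings["example-site-1"]))
```

Keying the result by `site_id` makes it straightforward to join each embedding to its metadata row.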
- April 2020: initial release
- August 2020: updated to correct metadata errors in compound mapping and add SMILES to metadata
This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.