Packer 2019, Taylor 2019, Cao 2019 data wrangle 2020-03-30
Munfred released this 03 Apr 22:53
VAE trained on the full data with scVI v0.6.1 (also works with v0.6.3)
New data wrangle that uses the Packer cell_plot_type labels as cell_type
The anndata file wormcells-data-2020-03-30.h5ad is provided with the following entries:
AnnData object with n_obs × n_vars = 191138 × 22761
    obs: 'barcode', 'cell_subtype', 'cell_type', 'embryo_time', 'embryo_time_bin', 'experiment', 'lineage', 'numi', 'passed_qc', 'raw_embryo_time', 'raw_embryo_time_bin', 'size_factor', 'study', 'time_point', 'tissue_type'
    var: 'gene_name', 'gene_description'
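The obs names appear to follow a row-study pattern with a zero-based row index per study. This is an inference from names such as 0-cao and 35986-cao, not something the release documents. Under that assumption, the last obs name of each study (35986-cao, 89700-packer and 65449-taylor in this file) gives that study's cell count, and the three counts should add up to n_obs. The arithmetic is sketched below. The helper name cells_per_study is hypothetical.

```python
def cells_per_study(last_obs_names):
    """Infer each study's cell count from the obs name of its last cell.

    Hypothetical helper; assumes obs names look like '<zero-based row>-<study>'.
    """
    counts = {}
    for name in last_obs_names:
        # The study is whatever follows the last '-'; the row index precedes it
        row, study = name.rsplit("-", 1)
        counts[study] = int(row) + 1
    return counts


counts = cells_per_study(["35986-cao", "89700-packer", "65449-taylor"])
print(counts)                # {'cao': 35987, 'packer': 89701, 'taylor': 65450}
print(sum(counts.values()))  # 191138, the n_obs reported above
```

On the loaded object, adata.obs['study'].value_counts() can confirm these per-study counts directly, without relying on the naming pattern.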
The first and last entries of the data for each study can be displayed with this snippet:
import anndata
import pandas as pd

# Load the full dataset (anndata.read is a deprecated alias of read_h5ad)
adata = anndata.read_h5ad('wormcells-data-2020-03-30.h5ad')

# Collect the first and last cell of each study, transposed so that
# each cell becomes one column of the combined table
columns = []
for study in ['cao', 'packer', 'taylor']:
    obs = adata.obs[adata.obs['study'] == study]
    columns += [obs.head(1).T, obs.tail(1).T]
pd.concat(columns, axis=1)  # rendered automatically as the last expression in a notebook
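The same first-and-last selection can also be written with a pandas groupby, which avoids listing the study names by hand. Below is a minimal sketch. The helper name first_and_last_per_group and the toy table are hypothetical. It keeps cells in their original obs order, so on this file it should match the snippet above, assuming each study's cells are stored contiguously.

```python
import pandas as pd


def first_and_last_per_group(obs, key="study"):
    """Return the first and last row of each group in obs, transposed.

    Hypothetical helper: rows keep their original order, so each group's
    first cell precedes its last one, and cells become columns after .T.
    """
    # observed=True skips empty categories if the key column is categorical
    grouped = obs.groupby(key, sort=False, observed=True)
    keep = grouped.head(1).index.union(grouped.tail(1).index)
    return obs[obs.index.isin(keep)].T


# Toy obs table reusing the naming pattern of this release
toy = pd.DataFrame(
    {"study": ["cao", "cao", "cao", "packer", "packer", "taylor"]},
    index=["0-cao", "1-cao", "2-cao", "0-packer", "1-packer", "0-taylor"],
)
print(list(first_and_last_per_group(toy).columns))
# ['0-cao', '2-cao', '0-packer', '1-packer', '0-taylor']
```

On the real data this would be called as first_and_last_per_group(adata.obs). A study with a single cell yields one column rather than two, because its first and last cell coincide.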
The output looks like the table below. Note that the display is transposed for convenience: the entries in the first column are the obs fields, and the column headers are the anndata obs names of the individual cells.
                     0-cao                            35986-cao                0-packer                  89700-packer          0-taylor                 65449-taylor
barcode              A01_A02_AACTACCGAC               B02_B42_TTCTACGCCA       AAACCTGAGACAATAC-300.1.1  TGGGCGTTCAGGCCCA-b02  acr2_AAACCCAAGATCGCTT-1  u3_TTTGTCATCTTCGGTC-1
cell_subtype         nan                              nan                      BWM_head_row_1            nan                   nan                      nan
cell_type            hyp_4_to_7_bin_3_around_L2_molt  Intestine_far_posterior  BWM_head_row_1            nan                   Unknown_NT               VB
embryo_time          NaN                              NaN                      380                       265                   NaN                      NaN
embryo_time_bin      nan                              nan                      330-390                   210-270               nan                      nan
experiment           L2_experiment_1                  L2_experiment_2          Waterston_300_minutes     Murray_b02            acr-2                    unc-3
lineage              nan                              nan                      MSxpappp                  nan                   nan                      nan
numi                 NaN                              NaN                      1630                      1132                  NaN                      NaN
passed_qc            nan                              nan                      True                      True                  nan                      nan
raw_embryo_time      NaN                              NaN                      360                       260                   NaN                      NaN
raw_embryo_time_bin  nan                              nan                      330-390                   210-270               nan                      nan
size_factor          NaN                              NaN                      1.02319                   0.70682               NaN                      NaN
study                cao                              cao                      packer                    packer                taylor                   taylor
time_point           nan                              nan                      300_minutes               mixed                 nan                      nan
tissue_type          nan                              nan                      Body_wall_muscle          nan                   Neuron                   Neuron
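In the cells shown, most obs fields are filled in for only one study. For example, lineage and embryo_time appear only for Packer cells. The sketch below summarizes this per study. It assumes missing values may be stored either as float NaN or as the literal string 'nan', since the display above shows both spellings. The helper name field_coverage is hypothetical, and the toy frame copies a few values from the table.

```python
import numpy as np
import pandas as pd


def field_coverage(obs, key="study"):
    """Fraction of cells in each group with a non-missing value per obs field.

    Hypothetical helper. Both float NaN and the literal string 'nan' are
    treated as missing: each becomes 'nan' after conversion to str.
    """
    present = obs.drop(columns=key).astype(str).ne("nan")
    # Convert to 0/1 so the group mean is the fraction of cells with a value
    return present.astype(float).groupby(obs[key]).mean()


# Toy obs table with values copied from the six cells shown above
toy = pd.DataFrame(
    {
        "study": ["cao", "cao", "packer", "packer", "taylor", "taylor"],
        "lineage": ["nan", "nan", "MSxpappp", "nan", "nan", "nan"],
        "embryo_time": [np.nan, np.nan, 380, 265, np.nan, np.nan],
        "tissue_type": ["nan", "nan", "Body_wall_muscle", "nan", "Neuron", "Neuron"],
    }
)
print(field_coverage(toy))
```

On the full data, field_coverage(adata.obs) would give one row per study and one column per obs field. This makes it easy to see which fields are meaningful when comparing cells across studies.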
In the variables (adata.var), the gene annotations include WormBase short gene descriptions. For example, the first 5 entries look like this:
gene_id gene_name gene_description
0 WBGene00000001 aap-1 Exhibits protein kinase binding activity. Involved in dauer larval development; determination of adult lifespan; and insulin receptor signaling pathway. Localizes to the phosphatidylinositol 3-kinase complex. Human ortholog(s) of this gene implicated in several diseases, including astroblastoma; carcinoma (multiple); endometrial cancer (multiple); primary immunodeficiency disease (multiple); and type 2 diabetes mellitus. Is expressed in intestine and neurons. Orthologous to several human genes including PIK3R3 (phosphoinositide-3-kinase regulatory subunit 3).
1 WBGene00000002 aat-1 Contributes to L-amino acid transmembrane transporter activity. Involved in amino acid transmembrane transport. Localizes to the amino acid transport complex. Is expressed in several structures, including excretory system; gonadal sheath cell; nervous system; pharynx; and rectal gland cell. Orthologous to several human genes including SLC7A8 (solute carrier family 7 member 8).
2 WBGene00000003 aat-2 Predicted to have L-amino acid transmembrane transporter activity. Predicted to be involved in amino acid transmembrane transport. Predicted to localize to the integral component of membrane. Human ortholog(s) of this gene implicated in lysinuric protein intolerance. Orthologous to several human genes including SLC7A7 (solute carrier family 7 member 7).
3 WBGene00000004 aat-3 Contributes to L-amino acid transmembrane transporter activity. Involved in amino acid transmembrane transport. Localizes to the amino acid transport complex. Orthologous to human SLC7A5 (solute carrier family 7 member 5) and SLC7A8 (solute carrier family 7 member 8).
4 WBGene00000005 aat-4 Predicted to have L-amino acid transmembrane transporter activity. Predicted to be involved in amino acid transmembrane transport. Predicted to localize to the integral component of membrane. Human ortholog(s) of this gene implicated in lysinuric protein intolerance. Orthologous to human SLC7A6 (solute carrier family 7 member 6) and SLC7A7 (solute carrier family 7 member 7).
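Because every gene carries a free-text WormBase description, simple keyword searches over adata.var are possible. Below is a minimal sketch that assumes, as the listing suggests, that gene IDs form the var index and that gene_name and gene_description are columns. The helper name genes_mentioning is hypothetical. The toy table keeps only the last sentence of each of the five descriptions above.

```python
import pandas as pd


def genes_mentioning(var, term):
    """Return names of genes whose WormBase description contains term.

    Hypothetical helper; the match is a case-insensitive substring search.
    """
    hits = var["gene_description"].str.contains(term, case=False, regex=False, na=False)
    return var.loc[hits, "gene_name"].tolist()


# Toy var table: the five genes listed above, keeping only the last
# sentence of each WormBase description
var = pd.DataFrame(
    {
        "gene_name": ["aap-1", "aat-1", "aat-2", "aat-3", "aat-4"],
        "gene_description": [
            "Orthologous to several human genes including PIK3R3 (phosphoinositide-3-kinase regulatory subunit 3).",
            "Orthologous to several human genes including SLC7A8 (solute carrier family 7 member 8).",
            "Orthologous to several human genes including SLC7A7 (solute carrier family 7 member 7).",
            "Orthologous to human SLC7A5 (solute carrier family 7 member 5) and SLC7A8 (solute carrier family 7 member 8).",
            "Orthologous to human SLC7A6 (solute carrier family 7 member 6) and SLC7A7 (solute carrier family 7 member 7).",
        ],
    },
    index=pd.Index(
        ["WBGene00000001", "WBGene00000002", "WBGene00000003", "WBGene00000004", "WBGene00000005"],
        name="gene_id",
    ),
)
print(genes_mentioning(var, "slc7a8"))  # ['aat-1', 'aat-3']
```

With the real file, the same call would be genes_mentioning(adata.var, "dauer"), for instance, to list genes whose description mentions dauer development.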