
Packer 2019 Taylor2019 Cao2019 data wangle 2020-03-30

@Munfred Munfred released this 03 Apr 22:53
fc107a0

VAE trained on full data with scVI v0.6.1 (works on v0.6.3)
New data wrangle with Packer labels, now including cell_plot_type as cell_type

The wormcells-data-2020-03-30.h5ad anndata file is provided with the following entries:

AnnData object with n_obs × n_vars = 191138 × 22761 
    obs: 'barcode', 'cell_subtype', 'cell_type', 'embryo_time', 'embryo_time_bin', 'experiment', 'lineage', 'numi', 'passed_qc', 'raw_embryo_time', 'raw_embryo_time_bin', 'size_factor', 'study', 'time_point', 'tissue_type'
    var: 'gene_name', 'gene_description'

The first and last entries of the data for each study can be printed with this snippet:

import anndata
import pandas as pd

# Load the released file. read_h5ad is used here because anndata.read
# is a deprecated alias in newer anndata releases.
adata = anndata.read_h5ad('wormcells-data-2020-03-30.h5ad')

# Take the first and last obs row of each study and transpose them,
# so that each selected cell becomes one column of the display.
obs = adata.obs
pd.concat([obs[obs['study'] == study].iloc[[0, -1]].T
           for study in ('cao', 'packer', 'taylor')],
          axis=1)

The output looks as below. Note that the display is transposed for convenience: the entries in the first column are the anndata obs column names, and the column headers are the obs index values of the selected cells.

	0-cao	35986-cao	0-packer	89700-packer	0-taylor	65449-taylor
barcode	A01_A02_AACTACCGAC	B02_B42_TTCTACGCCA	AAACCTGAGACAATAC-300.1.1	TGGGCGTTCAGGCCCA-b02	acr2_AAACCCAAGATCGCTT-1	u3_TTTGTCATCTTCGGTC-1
cell_subtype	nan	nan	BWM_head_row_1	nan	nan	nan
cell_type	hyp_4_to_7_bin_3_around_L2_molt	Intestine_far_posterior	BWM_head_row_1	nan	Unknown_NT	VB
embryo_time	NaN	NaN	380	265	NaN	NaN
embryo_time_bin	nan	nan	330-390	210-270	nan	nan
experiment	L2_experiment_1	L2_experiment_2	Waterston_300_minutes	Murray_b02	acr-2	unc-3
lineage	nan	nan	MSxpappp	nan	nan	nan
numi	NaN	NaN	1630	1132	NaN	NaN
passed_qc	nan	nan	True	True	nan	nan
raw_embryo_time	NaN	NaN	360	260	NaN	NaN
raw_embryo_time_bin	nan	nan	330-390	210-270	nan	nan
size_factor	NaN	NaN	1.02319	0.70682	NaN	NaN
study	cao	cao	packer	packer	taylor	taylor
time_point	nan	nan	300_minutes	mixed	nan	nan
tissue_type	nan	nan	Body_wall_muscle	nan	Neuron	Neuron
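
The table above suggests that several obs fields are only populated for some studies. The sketch below shows one way to get the same first/last view per study with a groupby, and to summarize which fields are filled in for each study. It uses a small toy DataFrame as a stand-in for adata.obs (three columns, six rows copied from the table above); the helper names first_last_per_study and populated_fraction are hypothetical and not part of the release.

```python
import pandas as pd

# Toy stand-in for adata.obs: a hypothetical subset of columns, with
# values copied from the six cells shown in the table above.
obs = pd.DataFrame(
    {
        "study": ["cao", "cao", "packer", "packer", "taylor", "taylor"],
        "experiment": ["L2_experiment_1", "L2_experiment_2",
                       "Waterston_300_minutes", "Murray_b02",
                       "acr-2", "unc-3"],
        "numi": [None, None, 1630.0, 1132.0, None, None],
    },
    index=["0-cao", "35986-cao", "0-packer", "89700-packer",
           "0-taylor", "65449-taylor"],
)


def first_last_per_study(obs):
    """Return the first and last row of every study, transposed."""
    pieces = [group.iloc[[0, -1]]
              for _, group in obs.groupby("study", sort=False)]
    return pd.concat(pieces).T


def populated_fraction(obs):
    """Fraction of non-missing values per obs column, for each study."""
    fields = obs.drop(columns="study")
    return fields.notna().groupby(obs["study"]).mean()


view = first_last_per_study(obs)
filled = populated_fraction(obs)
print(view)
print(filled)
```

With the real file, passing adata.obs instead of the toy frame should work the same way, assuming the 'study' column is present as listed above.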

In the variables (var), gene annotations include WormBase short gene descriptions. For example, the first 5 entries look like:

	gene_id	gene_name	gene_description
0	WBGene00000001	aap-1	Exhibits protein kinase binding activity. Involved in dauer larval development; determination of adult lifespan; and insulin receptor signaling pathway. Localizes to the phosphatidylinositol 3-kinase complex. Human ortholog(s) of this gene implicated in several diseases, including astroblastoma; carcinoma (multiple); endometrial cancer (multiple); primary immunodeficiency disease (multiple); and type 2 diabetes mellitus. Is expressed in intestine and neurons. Orthologous to several human genes including PIK3R3 (phosphoinositide-3-kinase regulatory subunit 3).
1	WBGene00000002	aat-1	Contributes to L-amino acid transmembrane transporter activity. Involved in amino acid transmembrane transport. Localizes to the amino acid transport complex. Is expressed in several structures, including excretory system; gonadal sheath cell; nervous system; pharynx; and rectal gland cell. Orthologous to several human genes including SLC7A8 (solute carrier family 7 member 8).
2	WBGene00000003	aat-2	Predicted to have L-amino acid transmembrane transporter activity. Predicted to be involved in amino acid transmembrane transport. Predicted to localize to the integral component of membrane. Human ortholog(s) of this gene implicated in lysinuric protein intolerance. Orthologous to several human genes including SLC7A7 (solute carrier family 7 member 7).
3	WBGene00000004	aat-3	Contributes to L-amino acid transmembrane transporter activity. Involved in amino acid transmembrane transport. Localizes to the amino acid transport complex. Orthologous to human SLC7A5 (solute carrier family 7 member 5) and SLC7A8 (solute carrier family 7 member 8).
4	WBGene00000005	aat-4	Predicted to have L-amino acid transmembrane transporter activity. Predicted to be involved in amino acid transmembrane transport. Predicted to localize to the integral component of membrane. Human ortholog(s) of this gene implicated in lysinuric protein intolerance. Orthologous to human SLC7A6 (solute carrier family 7 member 6) and SLC7A7 (solute carrier family 7 member 7).
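
Because the descriptions are free text, one simple way to use them is a keyword search. The sketch below is a minimal illustration on a toy table with the same three columns as shown above; the descriptions are truncated to their first sentence, and the helper name find_genes is hypothetical. With the real file, the gene table would come from adata.var, whose exact layout (for example, whether gene_id is the index) is an assumption to check first.

```python
import pandas as pd

# Toy stand-in for the gene annotation table. Descriptions are truncated
# to the first sentence of the entries shown above.
var = pd.DataFrame(
    {
        "gene_id": ["WBGene00000001", "WBGene00000002", "WBGene00000003"],
        "gene_name": ["aap-1", "aat-1", "aat-2"],
        "gene_description": [
            "Exhibits protein kinase binding activity.",
            "Contributes to L-amino acid transmembrane transporter activity.",
            "Predicted to have L-amino acid transmembrane transporter activity.",
        ],
    }
)


def find_genes(var, keyword):
    """Return gene_id and gene_name for descriptions containing keyword.

    The match is a case-insensitive plain substring search (no regex).
    Genes with a missing description are treated as non-matching.
    """
    descriptions = var["gene_description"].str.lower()
    mask = descriptions.str.contains(keyword.lower(), regex=False, na=False)
    return var.loc[mask, ["gene_id", "gene_name"]]


hits = find_genes(var, "Amino Acid")
print(hits)
```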