onekp

The 1000 Plants initiative (1KP) provides transcriptome sequences for over 1000 plants from diverse lineages. onekp allows researchers in plant genomics and transcriptomics to access this dataset through a simple R interface. The metadata for each transcriptome project is scraped from the 1KP project website and includes the species, tissue, and research group for each sequence sample. onekp uses taxizedb, a local database version of the taxize package, to filter the metadata by taxonomic group (entered as either a taxon name or an NCBI taxonomy ID). The raw nucleotide or translated peptide sequences can then be downloaded for the full or filtered table of transcriptome projects.

Alternatives to `onekp`

The data may also be accessed directly through CyVerse (previously iPlant). CyVerse distributes data efficiently using the iRODS data system. This approach is preferable for high-throughput use cases or where iRODS is already in use. Further, accessing data straight from the source at CyVerse is more stable than scraping it from the project website. However, the onekp R package is generally easier to use (it does not depend on iRODS or the CyVerse API) and offers filtering by species, clade, and 1KP code.

Contact info

1KP staff

Gane Ka-Shu Wong - Principal investigator
Michael Deyholos - Alberta co-investigator
Yong Zhang - Shenzhen co-investigator
Eric Carpenter - Database manager

R package maintainer

Zebulun Arendsee

Installation

onekp is on CRAN, but the CRAN release is currently somewhat out of date, so for now it is better to install from GitHub.

library(devtools)
install_github('ropensci/onekp')

Examples

Retrieve the protein and gene transcript FASTA files for two 1KP transcriptomes:

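# scrape the 1KP metadata table
onekp <- retrieve_onekp()
# keep only the transcriptomes with these 1KP codes
seqs <- filter_by_code(onekp, c('URDJ', 'ROAP'))
# download the peptide and nucleotide FASTA files into the given directories
download_peptides(seqs, 'oneKP/pep')
download_nucleotides(seqs, 'oneKP/nuc')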

This will create the following directory tree:

oneKP
 ├── nuc 
 │   ├── ROAP.fna
 │   └── URDJ.fna
 └── pep
     ├── ROAP.faa
     └── URDJ.faa
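
To confirm that the downloads produced the expected files, the directory can be inspected with base R. The following is a minimal sketch, assuming the layout shown above (one FASTA file per 1KP code, with `.faa` for peptides and `.fna` for nucleotides). The helpers `list_fasta` and `codes_in` are hypothetical names introduced here for illustration and are not part of onekp.

```r
# Hypothetical helper (not part of onekp): list the files in `dir`
# whose names end in the given extension, sorted alphabetically.
list_fasta <- function(dir, ext) {
  pattern <- paste0("\\.", ext, "$")
  sort(list.files(dir, pattern = pattern, full.names = FALSE))
}

# Hypothetical helper (not part of onekp): recover the 1KP codes from the
# file names, assuming each file is named <code>.<ext> as in the tree above.
codes_in <- function(dir, ext) {
  tools::file_path_sans_ext(list_fasta(dir, ext))
}

# Usage after the download calls above (paths taken from the example):
# codes_in('oneKP/pep', 'faa')
# codes_in('oneKP/nuc', 'fna')
```

Comparing the returned codes against the codes passed to `filter_by_code` is one simple way to spot a missing or failed download.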

onekp can also filter by species name, NCBI taxon ID, or clade.

# filter by species name
filter_by_species(onekp, 'Pinus radiata')

# filter by species NCBI taxon ID
filter_by_species(onekp, 3347)

# filter by clade scientific name (get all data for the Brassicaceae family)
filter_by_clade(onekp, 'Brassicaceae')

# filter by clade NCBI taxon ID
filter_by_clade(onekp, 3700)

So to get the protein and nucleotide sequences for all species in Brassicaceae:

onekp <- retrieve_onekp()
seqs <- filter_by_clade(onekp, 'Brassicaceae')
download_peptides(seqs, 'oneKP/pep')
download_nucleotides(seqs, 'oneKP/nuc')

Funding

Development of this R package was supported by the National Science Foundation under Grant No. IOS 1546858.

Contributing

We welcome any contributions!

By participating in this project you agree to abide by the terms outlined in the Contributor Code of Conduct.
