Data Release

All data captured by the SeqCode Registry is intended for immediate or eventual public release. However, entries have one of three levels of visibility:

  1. Private: The data and metadata associated with name and register list entries in draft status (under preparation) are visible only to the submitters themselves and the Registry curator team. However, the name itself might still be listed on some pages, e.g., on the genome page or among the children of the parent taxon (if the name of the parent taxon is public). Private pages are marked with a crossed-out eye icon under the page title (top-left).
  2. Unlisted: Register lists in submitted, endorsed, or notified status are not listed in the Registry, so no public page in the portal links to them. However, anyone with the direct link to the register list (i.e., the SeqCode Accession) can view the data and metadata in the list. This allows authors to include the SeqCode Accession of prepared lists in their manuscript submissions for peer review without making the entry findable before publication. Note that in most cases the names will appear in the register list together with their rank, status, and nomenclatural type, but all other data will remain inaccessible unless the names themselves are public.
  3. Public: The data and metadata associated with name and register list entries in validated status (validly published), as well as names with the status of automated discovery, are fully accessible to all portal users, including unregistered visitors. All other entries in the system are also public, including genome and publication pages. All public data in the SeqCode Registry is available in the portal under the terms of the CC BY 4.0 license (Creative Commons Attribution version 4.0).
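
The three levels above can be summarized as a small lookup, sketched here in Python. This is an illustration only, not the Registry's actual implementation: the names `Visibility`, `REGISTER_LIST_VISIBILITY`, and `can_view`, as well as the lowercase status strings, are hypothetical and chosen for this example. It covers register lists only, since the text does not spell out the visibility of every name status.

```python
from enum import Enum


class Visibility(Enum):
    PRIVATE = "private"    # submitters and curators only
    UNLISTED = "unlisted"  # viewable by direct link, not listed in the portal
    PUBLIC = "public"      # viewable by everyone


# Hypothetical status labels for register lists, following the text above.
REGISTER_LIST_VISIBILITY = {
    "draft": Visibility.PRIVATE,
    "submitted": Visibility.UNLISTED,
    "endorsed": Visibility.UNLISTED,
    "notified": Visibility.UNLISTED,
    "validated": Visibility.PUBLIC,
}


def can_view(status, is_submitter_or_curator=False, has_direct_link=False):
    """Return True if a visitor may view a register list in this status."""
    visibility = REGISTER_LIST_VISIBILITY[status]
    if visibility is Visibility.PUBLIC:
        return True
    if is_submitter_or_curator:
        # Submitters and curators can see their entries at every level.
        return True
    if visibility is Visibility.UNLISTED:
        # Anyone holding the SeqCode Accession link can view the list.
        return has_direct_link
    return False  # private entries are hidden from everyone else
```

For instance, `can_view("submitted", has_direct_link=True)` models a peer reviewer opening the accession link from a manuscript, while `can_view("draft")` models an anonymous visitor who cannot see an entry still under preparation.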

Data sources and repositories

The SeqCode Registry is not a primary repository for genomic data, and genome sequences are not stored directly. Genome sequences must be available in INSDC before an entry can be created, and the Registry only stores metadata associated with these genomes. In addition, data for some entries, including publications, authors, and some names, is retrieved directly from third-party providers, and links are included in the corresponding pages when appropriate. Of note:

  • Some data from publications and authors might be automatically retrieved through the APIs of CrossRef, DataCite, ORCID, and ROR
  • Some data and metadata from genomes might be automatically retrieved through the APIs of NCBI, EBI, and MiGA
  • Some data from names might be automatically retrieved through the APIs of ITIS, GBIF, IRMNG, CoL, LPSN, NCBI, and GTDB