JervenBolleman/Glycan.rdf

#Standard RDF representation for glycans

The following document describes a standard RDF representation for glycan-related data.

- Subject
- Classes
- Properties
- Sequence information and identifier
- Resource properties
- Images
- Compositions
- Biological source
- Publication references
- Experimental data
- Glyco relationship RDF
- Monosaccharide information
- Other Glycan information
- Glycan Function
- Description of GlycoProtein
- RDFsized glyco-DBs available for SPARQL search -1
- RDFsized glyco-DBs available for SPARQL search -2
- other RDFs necessary for SPARQL examples
- References and Links

#Subject

Each document consists of one or several glycans, which in the XML version are described using Description tags, each specifying the subject of the RDF triples in its about attribute.

#Classes

Each class name (e.g. cyclic_glycan) in the following tables will translate in RDF to:

@prefix glyco: <http://purl.jp/bio/12/#> .

##Table 1.

| owl:class | rdfs:label | rdfs:subClassOf | rdfs:comment |
|---|---|---|---|
| cyclic_glycan | Cyclic glycan | glycan | |
| repeat_unit | Repeating glycan structure | glycan | |
| biological_repeat_unit | | repeatingGlycan | |

##Table 2.

| owl:class | rdfs:label | rdfs:subClassOf | rdfs:comment |
|---|---|---|---|
| sequence | biomolecule sequence | glycan | |
| glycosequence | glycan sequence | sequence | |
| glycoconjugate_sequence | glycoconjugate sequence | sequence | |
| aglycon | the aglycon portion of a glycoconjugate | | can be used in combination with other rdf:types from other ontologies |

##Table 3.

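| owl:class | rdfs:label | rdfs:subClassOf | rdfs:comment |
|---|---|---|---|
| synthetic | synthetic glycan | glycan | |
| modified | modified glycan | glycan | including degradation products |
| chemical_synthetic | | synthetic | |
| enzymatical_synthetic | | synthetic | |
| chemoenzymatical_synthetic | | synthetic | |
| chemical_modified | | modified | |
| enzymatical_modified | | modified | |
| modeled | | | |
| natural | natural glycan | glycan | |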

##Table 4.

| owl:class | rdfs:label | rdfs:subClassOf | rdfs:comment |
|---|---|---|---|
| database | a database | | |
| glycan_database | a glycan database | database | http://purl.jp/bio/12/core/database.rdf |
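
As a minimal sketch of how these class names might appear as `rdf:type` objects, the following Turtle types one glycan with a class from Table 1 and one from Table 3. The subject URI under `http://example.org/` is hypothetical, and the `glyco:` namespace is assumed to be the one declared for classes above; this draft uses several namespace variants.

```turtle
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix glyco: <http://purl.jp/bio/12/#> .   # namespace as declared for classes above (assumed)

# Hypothetical subject URI, for illustration only
<http://example.org/glycan/1>
    rdf:type glyco:repeat_unit ,   # Table 1: repeating glycan structure
             glyco:natural .       # Table 3: natural glycan
```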

#Properties

Each property name (e.g. sequence_glyde, link_to_ccsd) in the following tables (left column) will translate in RDF to:

@prefix glyco: <http://purl.jp/bio/12/> .
<subjectURI>
    glyco:sequence_glyde "…" ;
    glyco:link_to_ccsd <http://www.genome.jp/dbget-bin/www_bget?carbbank+6915> .

#Sequence information and identifier

The following table describes the properties that identify a glycan and link it to its sequence information. At least one of the sequence formats is mandatory.

##Table 4.

| predicate | data type | comment |
|---|---|---|
| rdf:type | class | glycan class (N-glycan, O-glycan, etc.) from GlycO or some other glyco-ontology (glyco:repeatUnit, glyco:biologicalRepeatUnit, glyco:cyclicGlycan) |
| dcterms:identifier | xsd:string | entry ID in other resource |
| foaf:name | xsd:string | trivial name (e.g. sialyl-lewis-x, lactosamine) |
| has_glycosequence | Resource | glycan sequence information; points to an object of rdf:type glyco:sequence |
| has_glycoconjugate_sequence | Resource | sequence information; points to an object of rdf:type glyco:glycoconjugate_sequence |
| has_aglycon | Resource | in case of a glycoconjugate, defines the aglycon |
| has_repeat_count | Resource | |

##Table 5. has_aglycon node predicates

| predicate | data type | comment |
|---|---|---|
| rdf:type | class | ChEBI (http://www.ebi.ac.uk/chebi/downloadsForward.do) |
| foaf:name | xsd:string | trivial name |
| has_reference | Resource | see Reference |
| attachment_position | xsd:integer | atom number in aglycon |
| linkage | xsd:integer | carbon number in glycan |

has_repeat_count is applicable only if the structure encoded in the sequence is a repeating unit.

##Table 6. has_repeat_count node

| predicate | data type | comment |
|---|---|---|
| repeat_attribute | repeat (min, max, exact, average, unknown) | part of has_repeat_count resource |
| repeat_count | xsd:integer | value of repeat_attribute in resource (use if it is known) |
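
A minimal sketch of a has_repeat_count node, assuming the `glyco:` namespace used for properties in this document. The subject URI is hypothetical, the counts are illustrative values only, and the draft does not say whether repeat_attribute values are literals or resources, so plain literals are assumed here.

```turtle
@prefix glyco: <http://purl.jp/bio/12/> .

# Hypothetical glycan whose sequence encodes a repeating unit
<http://example.org/glycan/2>
    a glyco:repeat_unit ;
    glyco:has_repeat_count
        [ glyco:repeat_attribute "min" ;   # assumed to be a plain literal
          glyco:repeat_count 3             # illustrative value, parsed as xsd:integer
        ] ,
        [ glyco:repeat_attribute "max" ;
          glyco:repeat_count 10            # illustrative value
        ] .
```

Using one blank node per attribute keeps each count paired with the attribute it qualifies.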

##Table 7. sequence formats property of the glyco:sequence resource

| predicate | data type | comment |
|---|---|---|
| rdf:type | owl:class | sequence |
| in_glycoct | xsd:string | Plain literal string with the carbohydrate sequence in GlycoCT condensed format. The GlycoCT sequence format is described in reference 18436199. |
| in_KCF | xsd:string | String with the carbohydrate sequence in KCF format. |
| in_GlydeII | xsd:string | String with the carbohydrate sequence in Glyde-II format. |
| in_linearCode | xsd:string | String with the carbohydrate sequence in LinearCode® format. |
| in_linucs | xsd:string | String with the carbohydrate sequence in LINUCS format. |
| in_IUPAC_condensed | xsd:string | String with the carbohydrate sequence in IUPAC nomenclature. http://www.chem.qmul.ac.uk/iupac/ http://www.chem.qmul.ac.uk/iupac/misc/glycp.html http://www.chem.qmul.ac.uk/iupac/misc/glylp.html |
| in_IUPAC_short | | |
| in_IUPAC_extended | | |
| in_SweetDB | xsd:string | Multiline string with the carbohydrate sequence in Sweet-DB pseudographics. |
| in_CSDB | xsd:string | String with the sequence of residues in CSDB linear code. http://csdb.glycoscience.ru/bacterial/core/help.php?db=bacterial&topic=rules |

@prefix glycoSequence:  <http://purl.jp/bio/12/glycanSequence/0.1/> .
@prefix glyco:   <http://purl.jp/bio/12/> .
@prefix foaf:   <http://xmlns.com/foaf/0.1/> .
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
# The chebi: prefix is not declared in this draft; chebi:lipid is a placeholder.

<http://www.glycome-db.org/rdf/2015>
      glyco:has_glycosequence    <http://www.glycome-db.org/rdf/2015#UUID> ;
      glyco:has_glycoconjugate_sequence <http://www.glycome-db.org/rdf/2015#UUID2> ;
      glyco:has_aglycon
       [ rdf:type  glyco:aglycon ;
          rdf:type chebi:lipid ;
          rdfs:seeAlso <> ;  # link to database (e.g. ChEBI, PubChem)
          foaf:name "..." ;    # free text
          glyco:attachment_position 3 ;	# atom number in aglycon
          glyco:linkage 1		# carbon number in glycan
       ] ;
      rdf:type  glyco:GlycanClass ;
      glyco:relation <http://...>	# back-reference to relationship RDF
.
<http://www.glycome-db.org/rdf/2015#UUID>    # UUID is a random literal to make blank nodes available from SPARQL
       rdf:type	glyco:glycosequence ;
       glyco:in_linucs              "[][?-D-GlcNAc]{[(3+1)][a-L-Fucp]{}[(4+1)][b-D-Galp]{[(2+1)][a-L-Fucp]{}}}" ;
       glyco:in_iupac              "..." .
<http://www.glycome-db.org/rdf/2015#UUID2>
      rdf:type	glyco:glycoconjugate_sequence ;
      glyco:in_linucs             "[XXX][?-D-GlcNAc]{[(3+1)][a-L-Fucp]{}[(4+1)][b-D-Galp]{[(2+1)][a-L-Fucp]{}}}" .

#Resource properties

The following table contains information about links to other non-RDF resources describing the same carbohydrate.
All reference databases must be listed in http://purl.jp/bio/12/database.rdf (currently under construction). A list of glycan databases is kept at https://docs.google.com/document/d/1xy3N7Njsm0EO9fjjp3GC-wuDG3TPcUGQnGPdltp0YpI/edit

##Table 8. Describing a glycan database

| predicate | data type | comment |
|---|---|---|
| is_glycan_database | Resource URI | Range is the glyco:glycan_database class. |
| dcterms:identifier | xsd:string | Entry ID in other resource. |
| rdfs:seeAlso | Resource URI | Reference to another resource; it can be used as the subject for further annotation (resource_name and resource_id). |
| owl:sameAs | Resource URI | Reference to another RDF description of exactly this carbohydrate, provided by a different resource, which may contain complementary information. |

@prefix glyco:   <http://purl.jp/bio/12/> .
@prefix glycodb:   <http://purl.jp/bio/12/database/> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcterms:   <http://purl.org/dc/terms/> .

<http://www.glycome-db.org/rdf/2015>
     rdfs:seeAlso <http://csdb.glycoscience.ru/bacterial/core/search_id.php?id_list=0538> ;
     rdfs:seeAlso <http://csdb.glycoscience.ru/bacterial/core/search_id.php?id_list=23173> ;
     rdfs:seeAlso <http://www.genome.jp/dbget-bin/www_bget?carbbank+14848> ;
     rdfs:seeAlso <http://www.ebi.ac.uk/eurocarb/show_glycan.action?glycanSequenceId=2711> ;
     rdfs:seeAlso <http://www.glycosciences.de/sweetdb/start.php?action=explore_linucsid&linucsid=2611&show=1#struct%0A%20%20%09%09> ;
     rdfs:seeAlso _:1 .

<http://csdb.glycoscience.ru/bacterial/core/search_id.php?id_list=0538>
    glyco:is_glycan_database glycodb:bcsdb ;
    dcterms:identifier "0538" .
<http://csdb.glycoscience.ru/bacterial/core/search_id.php?id_list=23173>
    glyco:is_glycan_database <http://purl.jp/bio/12/database/bcsdb> ; dcterms:identifier "23173" .
<http://csdb.glycoscience.ru/bacterial/core/search_id.php?id_list=6920>
    glyco:is_glycan_database <http://purl.jp/bio/12/database/bcsdb> ; dcterms:identifier "6920" .
<http://www.genome.jp/dbget-bin/www_bget?carbbank+14848>
    glyco:is_glycan_database <http://purl.jp/bio/12/database/ccsd> ; dcterms:identifier "14848" .
<http://www.ebi.ac.uk/eurocarb/show_glycan.action?glycanSequenceId=2711>
    glyco:is_glycan_database <http://purl.jp/bio/12/database/eurocarb-dbi> ; dcterms:identifier "2711" .
<http://www.glycosciences.de/sweetdb/start.php?action=explore_linucsid&linucsid=2611&show=1#struct%0A%20%20%09%09>
    glyco:is_glycan_database <http://purl.jp/bio/12/database/glycosciences.de> ; dcterms:identifier "2611" .
# Blank node for an entry in a database that has no resolvable entry URL
_:1  glyco:is_glycan_database <someUUID> ; dcterms:identifier "123" .

<someUUID>
    glyco:url_template "http://foo.bar.com/someglycan?id=%s" ;
    glyco:abbreviation "foobar" ;
    glyco:category "glycan structure database" ;
    a glyco:glycan_database ;
    rdfs:label "FooBarDB" .

###List of existing glycan database URIs

These are URLs of databases that may be referenced. `[?id?]` is a placeholder for the database-internal ID.

| Database name | Description | URL template |
|---|---|---|
| ccsd | Web link to a CarbBank entry about this carbohydrate. | http://www.genome.jp/dbget-bin/www_bget?carbbank+[?id?] |
| glycomedb | Web link to a GlycomeDB entry about this carbohydrate. | http://www.glycome-db.org/database/showStructure.action?glycomeId=[?id?] |
| jcggdb | Web link to a JCGGDB entry about this carbohydrate. | http://jcggdb.jp/idb/jcggdb/[?id?] |
| kegg_glycan | Web link to a KEGG Glycan entry about this carbohydrate. | http://www.genome.jp/dbget-bin/www_bget?gl:[?id?] |
| cfg | Web link to a CFG entry about this carbohydrate. | http://www.functionalglycomics.org/glycomics/CarbohydrateServlet?pageType=view&view=view&operationType=view&carbId=[?id?]&sideMenu=no%0A%20%20%09%09 |
| glyaffinity | Web link to a GlyAffinity entry about this carbohydrate. | http://worm.mpi-cbg.de/affinity/structure.action?ID=[?id?] |
| glycobase_lille | Web link to a GlycoBase (Lille) entry about this carbohydrate. | http://glycobase.univ-lille1.fr/base/view_mol.php?id=[?id?] |
| glycosciences.de | Web link to a GLYCOSCIENCES.de entry about this carbohydrate. | http://www.glycosciences.de/sweetdb/start.php?action=explore_linucsid&linucsid=[?id?]&show=1#struct%0A%20%20%09%09 |
| PDB | Web link to a PDB entry about this carbohydrate. | http://www.rcsb.org/pdb/explore/explore.do?structureId=[?id?] |
| unicarbkb | Web link to a UniCarbKB entry about this carbohydrate. | http://unicarbkb.org/structure/:[?id?] |
| bcsdb | Web link to a BCSDB entry about this carbohydrate. | http://csdb.glycoscience.ru/bacterial/core/search_id.php?id_list=[?id?] |
| glyco | | |
| glycobase_dublin | (Registration is required) | |
| unicarbdb | Web link to a UniCarb-DB entry about this carbohydrate. | http://unicarb-db.biomedicine.gu.se/unicarbdb/show_glycan.action?glycanSequenceId=[?id?] |
| eurocarbdb-ebi | Web link to a EuroCarbDB (EBI) entry about this carbohydrate. | http://www.ebi.ac.uk/eurocarb/show_glycan.action?glycanSequenceId=[?id?] |
| eurocarbdb-nibrt | | |
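
As a sketch of how one row of this list could be expressed with the predicates from the FooBarDB example, the following describes the ccsd entry. The database URI is the one used in the Resource properties example; the label and the use of `%s` in place of `[?id?]` are assumptions, and nothing here is confirmed to match the actual content of database.rdf.

```turtle
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix glyco: <http://purl.jp/bio/12/> .

<http://purl.jp/bio/12/database/ccsd>
    a glyco:glycan_database ;
    rdfs:label "CarbBank" ;        # assumed label
    glyco:abbreviation "ccsd" ;
    # %s stands in for the [?id?] placeholder of the URL template column
    glyco:url_template "http://www.genome.jp/dbget-bin/www_bget?carbbank+%s" .
```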

List of glycan databases: https://docs.google.com/document/d/1xy3N7Njsm0EO9fjjp3GC-wuDG3TPcUGQnGPdltp0YpI/edit

#Images

The following table contains information about links to graphical representations of the carbohydrate.

##Table 9. Describing an image representation of a Glycan.

| predicate | data type | comment |
|---|---|---|
| has_image | Resource | URL of image |
| dc:format | xsd:string | The file format of the image (image/svg+xml, image/png, image/gif, …). |
| symbol_format | xsd:string or Resource | URL to explanation of symbol? The display style of the glycan (cfg, uoxf, atoms). |

Turtle example:

@prefix glycoSequence:  <http://purl.jp/bio/12/glycanSequence/0.1/> .
@prefix glyco:   <http://www.glycome-db.org/rdf/2012/glyco/1.0#> .
@prefix dc:      <http://purl.org/dc/elements/1.1/> .

<http://www.glycome-db.org/rdf/3018>
      glyco:has_image <http://www.glycome-db.org/http-services/getImage.action?suspress=yes&id=3018> .
<http://www.glycome-db.org/http-services/getImage.action?suspress=yes&id=3018>
              dc:format       "image/png" ;
              glyco:symbol_format "cfg" .

#Compositions

| predicate | data type | comment |
|---|---|---|
| has_component | Resource | Reference to another subject in the document describing a part of the composition in detail. |
| has_cardinality | xsd:integer | Number of occurrences of an element (e.g. a monosaccharide) in the subject. This information can be missing if the cardinality cannot be defined (e.g. repeat units with unknown or under-defined repeats). Missing for non-stoichiometric residues. |
| has_cardinality_per_repeat | xsd:integer | Number of occurrences of an element (e.g. a monosaccharide) in the repeat unit. Applicable to repeat units only. Missing for non-stoichiometric residues. |
| has_monosaccharide | Resource | Reference to an RDF resource describing the monosaccharide. |

Example structure:

Turtle example:

@prefix glycoSequence:  <http://purl.jp/bio/12/glycanSequence/0.1/> .
@prefix glyco:   <http://www.glycome-db.org/rdf/2012/glyco/1.0#> .

# The monosaccharide references are kept as string literals: they contain "|",
# which would have to be percent-encoded before they could be written as IRIs.
<http://www.glycome-db.org/rdf/2015>
      glyco:has_component
              [ glyco:has_cardinality      1 ;				# this is only applicable to oligomers
                glyco:has_monosaccharide
                        "http://www.monosaccharidedb.org/query_monosaccharide_by_name.action?scheme=msdb&name=x-dglc-HEX-x:x||(2d:1)n-acetyl"
              ] ;
      glyco:has_component
              [ glyco:has_cardinality      2 ;
                glyco:has_monosaccharide
                        "http://www.monosaccharidedb.org/query_monosaccharide_by_name.action?scheme=msdb&name=a-lgal-HEX-1:5|6:d"
              ] ;
      glyco:has_component
              [ glyco:has_cardinality      1 ;
                glyco:has_monosaccharide
                        "http://www.monosaccharidedb.org/query_monosaccharide_by_name.action?scheme=msdb&name=b-dgal-HEX-1:5"
              ] .

Biological source

see Source RDF in: https://docs.google.com/document/d/1rLJWha_5oXWGgPq8VhytTzJk1grkizVYtWQ7XXOBoXk/edit

Publication references

see Publication RDF in: https://docs.google.com/document/d/1rLJWha_5oXWGgPq8VhytTzJk1grkizVYtWQ7XXOBoXk/edit

Experimental data

see Evidence RDF in: https://docs.google.com/document/d/1rLJWha_5oXWGgPq8VhytTzJk1grkizVYtWQ7XXOBoXk/edit

Glyco relationship RDF

Glyco relationship RDF links glycan structure, biological source, publication and experimental data.

https://docs.google.com/document/d/1rLJWha_5oXWGgPq8VhytTzJk1grkizVYtWQ7XXOBoXk/edit

Monosaccharide information

Thomas: The following table contains the RDF properties for single monosaccharides.

| predicate | data type | comment |
|---|---|---|
| name | xsd:string | MsDB name of the monosaccharide |
| has_basetype | Resource | Reference to another RDF resource with URI describing the monosaccharide basetype. |
| has_substituent | Resource | Reference to another RDF resource with URI. The substituent is linked to the basetype in this monosaccharide. |
| average_MW | xsd:double | Numeric literal with decimal, calculated from the monosaccharide composition with average atomic weights. |
| monoisotopic_MW | xsd:double | Numeric literal with decimal, calculated from the monosaccharide composition with the atomic weights of the monoisotopes. |
| has_linking_position | xsd:integer | The monosaccharide can be linked to other residues via a standard glycosidic linkage at the given backbone position. |
| has_alias_name | Resource? [scheme:String; name:String; external_substituent[name; position; linkage_type]; is_primary] | Name of the monosaccharide in a given notation scheme (external substituents only apply to specific residues / schemes). |

The following table contains the RDF properties for monosaccharide basetypes.

| predicate | data type | comment |
|---|---|---|
| size | xsd:integer | Number of backbone carbon atoms. |
| average_MW | xsd:double | Numeric literal with decimal, calculated from the monosaccharide composition with average atomic weights. |
| monoisotopic_MW | xsd:double | Numeric literal with decimal, calculated from the monosaccharide composition with the atomic weights of the monoisotopes. |
| anomeric | Resource | Anomeric state of the basetype: alpha, beta, none, unknown (URI for concept to be provided). |
| configuration | Resource | Absolute configuration of the basetype: D, L, unknown, none (URI for concept to be provided). |
| ring_start | xsd:integer | Position of the first carbon involved in ring closure. |
| ring_end | xsd:integer | Position of the second carbon involved in ring closure. |
| stereocode | xsd:string | Stereocode describing the backbone stereochemistry. |
| ext_stereocode | xsd:string | Extended stereocode of the basetype. |
| has_composition | | |
| has_core_modification | Resource | Reference to another RDF resource describing a core modification that is present in this basetype. |
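
A minimal sketch of a monosaccharide with its basetype, assuming the `glyco:` namespace used for properties in this document. All URIs under `http://example.org/` are hypothetical, including the concept URIs for the anomeric state and configuration, which this draft leaves to be provided. Only the MSDB-style name `b-dgal-HEX-1:5` comes from the Compositions example, and the basetype values are read off that name.

```turtle
@prefix glyco: <http://purl.jp/bio/12/> .

# Hypothetical monosaccharide resource; the name is the MSDB-style name from the Compositions example
<http://example.org/monosaccharide/1>
    glyco:name "b-dgal-HEX-1:5" ;
    glyco:has_basetype <http://example.org/basetype/1> .

# Hypothetical basetype resource with values taken from the name above
<http://example.org/basetype/1>
    glyco:size 6 ;                                          # HEX: six backbone carbons
    glyco:anomeric <http://example.org/concept/beta> ;      # placeholder concept URI
    glyco:configuration <http://example.org/concept/D> ;    # placeholder concept URI
    glyco:ring_start 1 ;                                    # ring closure 1:5
    glyco:ring_end 5 .
```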

Other Glycan information

| predicate | data type | comment |
|---|---|---|
| has_standard_MW | xsd:double | Numeric literal with decimal, calculated from the monosaccharide composition with standard atomic weights ( http://physics.nist.gov/cgi-bin/Compositions/stand_alone.pl?ele=&ascii=html&isotype=some ). |
| has_monoisotopic_MW | xsd:double | Numeric literal with decimal, calculated from the monosaccharide composition with the atomic weights of the monoisotopes. |
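
As an illustration of the two mass properties, the sketch below uses a disaccharide of composition C12H22O11 (for example lactose). The subject URI is hypothetical and the `glyco:` namespace is assumed; the masses are rounded values computed from that elemental composition.

```turtle
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix glyco: <http://purl.jp/bio/12/> .

# Hypothetical URI for a C12H22O11 disaccharide such as lactose
<http://example.org/glycan/3>
    glyco:has_standard_MW     "342.30"^^xsd:double ;     # standard atomic weights, rounded
    glyco:has_monoisotopic_MW "342.1162"^^xsd:double .   # monoisotopic masses, rounded
```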

Glycan Function

| predicate | data type | comment |
|---|---|---|
| has_motif | Resource | Reference to another RDF resource with URI. The object is a structurally defined motif, e.g. for the Neo-lacto motif http://jcggdb.jp/idb/motif?id=JCGG-MOTIF3009.rdf. Inverse of "contained_in". The motif should have sequence, composition, image ... |
| has_epitope | Resource | Reference to another RDF resource with URI. The subject is a structural motif with biological relevance; subclass of has_motif. |
| contained_in | Resource | Reference to another RDF resource with URI. The subject is a structurally defined motif, and the object is a glycan structure or motif. |
| has_affinity_to | Resource | Subject is a material or organism which has affinity to the object. |
| degraded_by | Resource | Subject is an enzyme which degrades the object. |
| generated_by | Resource | Subject is an enzyme which synthesizes the object. |
| degraded_from | Resource | Subject is a precursor of the term "degraded_by". |
| generated_from | Resource | Subject is a precursor of the term "generated_by". |

Turtle example:

@prefix glycoSequence:  <http://purl.jp/bio/12/glycanSequence/0.1/> .
@prefix glyco:   <http://www.glycome-db.org/rdf/2012/glyco/1.0#> .

<http://www.glycome-db.org/rdf/2015>
...
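
A minimal sketch of how two of the function predicates might be written, following the subject and object roles given in the table. The glycan URI is hypothetical and the `glyco:` namespace is assumed; the motif URI is the Neo-lacto example quoted in the has_motif row. Whether any particular glycan actually contains this motif is not asserted here.

```turtle
@prefix glyco: <http://purl.jp/bio/12/> .

# Hypothetical glycan; the object is a structurally defined motif
<http://example.org/glycan/4>
    glyco:has_motif <http://jcggdb.jp/idb/motif?id=JCGG-MOTIF3009.rdf> .

# Inverse direction: the subject is the motif, the object the glycan containing it
<http://jcggdb.jp/idb/motif?id=JCGG-MOTIF3009.rdf>
    glyco:contained_in <http://example.org/glycan/4> .
```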

#Namespace or Vocabularies for Describing GlycoProtein

@prefix uniprotcore: <http://purl.uniprot.org/core/> .

https://docs.google.com/spreadsheet/ccc?key=0Ajw2OqykvyGLdG9FOHN4ZENYZmpVVkhCYmZFc3dwX3c#gid=2

| predicate | data type | comment |
|---|---|---|
| rdf:type | Resource | Description |
| has_core_protein | uniprotcore:Protein | UniProt identifier for the glycoprotein entry. |
| amino_acid_length | | From the relevant UniProt page, or stated if sequenced/synthesised. |
| amino_acid_sequence | | |
| versionOf | | |
| has_glycosylated_amino_acid_residue | Resource | Use a UUID if no URI is available. |
| position_of_amino_acid | xsd:int | Property of "has_glycosylated_amino_acid_residue": the amino acid position. |
| amino_acid_type | Resource | Link to the definition of the amino acid residue. Property of "has_glycosylated_amino_acid_residue": the type of amino acid residue, usually N or S/T. |
| modification_type | Resource | Property of "has_glycosylated_amino_acid_residue". The object refers to the list below. |
| has_structure | Resource | URI of a structure associated with the glycoprotein (the predicate can occur multiple times). |
| evidence | | |
| contributor | | |

<http://unicarbkb.org/protein/Q8TAX7> a glyco:glycosylation_annotation ;
    glyco:has_uniprot <http://uniprot.org/Q8TAX7> ;
    glyco:has_glycosylated_amino_acid_residue "" ;   # left empty in this draft
    # A protein can contain many structures, which can be linked to a site via a blank node.
    glyco:has_structure <http://unicarbkb.org/structure/5989> .
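
The blank-node linking mentioned in the example can be sketched as follows. This assumes the `glyco:` namespace used for properties in this document and takes the predicate names from the table above; the protein, amino acid and structure URIs are hypothetical, and the position is an illustrative value rather than real annotation data.

```turtle
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix glyco: <http://purl.jp/bio/12/> .

# Hypothetical glycoprotein with one glycosylation site held in a blank node
<http://example.org/protein/1>
    a glyco:glycosylation_annotation ;
    glyco:has_glycosylated_amino_acid_residue
        [ glyco:position_of_amino_acid "42"^^xsd:int ;                # illustrative position
          glyco:amino_acid_type <http://example.org/aminoacid/Asn> ;  # placeholder definition URI
          glyco:has_structure <http://example.org/structure/1>        # structure found at this site
        ] .
```

Attaching has_structure inside the blank node ties each structure to its site, which a protein-level has_structure triple cannot express.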

About

Glycan database standardisation effort (superseded draft see GlycoRDF)
