#Standard RDF representation for glycans
The following document describes a standard RDF representation for glycan-related data.

- Subject
- Classes
- Properties
- Sequence information and identifier
- Resource properties
- Images
- Compositions
- Biological source
- Publication references
- Experimental data
- Glyco relationship RDF
- Monosaccharide information
- Other Glycan information
- Glycan Function
- Description of GlycoProtein
- RDFsized glyco-DBs available for SPARQL search -1
- RDFsized glyco-DBs available for SPARQL search -2
- other RDFs necessary for SPARQL examples
- References and Links

#Subject
Each document consists of one or several glycans, which in the XML version are described using Description tags, each specifying the subject of the RDF triples in the about attribute.

#Classes
Each class name in the following tables (left column) will translate in RDF to:

    @prefix glyco: <http://purl.jp/bio/12/#> .

##Table 1.
owl:class | rdfs:label | rdfs:subClassOf | rdfs:comment |
---|---|---|---|
cyclic_glycan | Cyclic glycan | glycan | |
repeat_unit | Repeating glycan structure | glycan | |
biological_repeat_unit | | repeatingGlycan | |
##Table 2.
owl:class | rdfs:label | rdfs:subClassOf | rdfs:comment |
---|---|---|---|
sequence | biomolecule sequence | glycan | |
glycosequence | glycan sequence | sequence | |
glycoconjugate_sequence | glycoconjugate sequence | sequence | |
aglycon | the aglycon portion of a glycoconjugate | | can be used in combination with other rdf:types from other ontologies. |
##Table 3.
owl:class | rdfs:label | rdfs:subClassOf | rdfs:comment |
---|---|---|---|
synthetic | synthetic glycan | glycan | |
modified | modified glycan | glycan | including degradation products |
chemical_synthetic | | synthetic | |
enzymatical_synthetic | | synthetic | |
chemoenzymatical_synthetic | | synthetic | |
chemical_modified | | modified | |
enzymatical_modified | | modified | |
modeled | | | |
natural | natural glycan | glycan | |
##Table 4.
owl:class | rdfs:label | rdfs:subClassOf | rdfs:comment |
---|---|---|---|
database | a database | | |
glycan_database | a glycan database | database | http://purl.jp/bio/12/core/database.rdf |
#Properties
Each property name (e.g. sequence_glyde, link_to_ccsd) in the following tables (left column) will translate in RDF to:

    @prefix glyco: <http://purl.jp/bio/12/> .
    <subjectURI>
        glyco:sequence_glyde "…" ;
        glyco:link_to_ccsd <http://www.genome.jp/dbget-bin/www_bget?carbbank+6915> .

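
The translation from a property name to a triple can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `expand` and `to_turtle`, the `("iri", value)` tuple convention, and the `example.org` subject are hypothetical and not part of the specification; only the namespace comes from the prefix declaration above.

```python
# Minimal sketch: expand property names with the glyco namespace and
# serialize one subject as Turtle. All helper names are hypothetical.
GLYCO_NS = "http://purl.jp/bio/12/"  # property namespace from the @prefix line


def expand(name: str) -> str:
    """Return the full IRI for a property name such as 'sequence_glyde'."""
    return GLYCO_NS + name


def to_turtle(subject: str, properties: dict) -> str:
    """Serialize one subject.

    Plain values become string literals; IRI objects are passed as
    ("iri", value) tuples (an assumed convention for this sketch).
    """
    lines = []
    for name, obj in properties.items():
        if isinstance(obj, tuple) and obj[0] == "iri":
            rendered = "<" + obj[1] + ">"
        else:
            escaped = str(obj).replace("\\", "\\\\").replace('"', '\\"')
            rendered = '"' + escaped + '"'
        lines.append("    <" + expand(name) + "> " + rendered)
    return "<" + subject + ">\n" + " ;\n".join(lines) + " ."


print(to_turtle(
    "http://example.org/glycan/1",  # hypothetical subject URI
    {
        "sequence_glyde": "RES",  # placeholder literal, not a real sequence
        "link_to_ccsd": ("iri", "http://www.genome.jp/dbget-bin/www_bget?carbbank+6915"),
    },
))
```

Writing full IRIs instead of prefixed names keeps the sketch independent of any prefix declaration; a real serializer would normally emit the `glyco:` prefix.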
#Sequence information and identifier
The following table describes properties that have simple data types as objects (e.g. strings, integers). At least one of the sequence formats is mandatory.
##Table 4.
predicate | data type | comment |
---|---|---|
rdf:type | class | glycan class (N-glycan, O-glycan, etc.), from GlycO or some glyco-ontology (glyco:repeatUnit, glyco:biologicalRepeatUnit, glyco:cyclicGlycan) |
dcterms:identifier | xsd:string | entry ID in other resource |
foaf:name | xsd:string | Trivial name (e.g. sialyl-lewis-x, lactosamine) |
has_glycosequence | Resource | glycan sequence information; points to an object of rdf:type glyco:sequence |
has_glycoconjugate_sequence | Resource | sequence information; points to an object of rdf:type glyco:glycoconjugate_sequence |
has_aglycon | Resource | in case of a glycoconjugate, defines the aglycon |
has_repeat_count | Resource | |
##Table 5. has_aglycon node predicates
predicate | data type | comment |
---|---|---|
rdf:type | class | ChEBI (http://www.ebi.ac.uk/chebi/downloadsForward.do) |
foaf:name | xsd:string | trivial name |
has_reference | Resource | see Reference |
attachment_position | xsd:integer | atom number in aglycon |
linkage | xsd:integer | carbon number in glycan |
has_repeat_count is applicable only if the structure encoded in the sequence is a repeating unit.
##Table 6. has_repeat_count node
predicate | data type | comment |
---|---|---|
repeat_attribute | repeat | (min, max, exact, average, unknown) part of has_repeat_count resource |
repeat_count | xsd:integer | value of repeat_attribute in resource (use if it is known) |
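
As a rough illustration of the has_repeat_count node, the Python sketch below checks the attribute against the values listed in the table and emits a blank-node fragment. The function name `repeat_count_node` is hypothetical, and writing repeat_attribute as a plain string literal is an assumption, since the table only names its data type as "repeat".

```python
# Minimal sketch: validate and serialize a has_repeat_count node.
# Assumption: repeat_attribute is written as a plain string literal.
ALLOWED_ATTRIBUTES = {"min", "max", "exact", "average", "unknown"}


def repeat_count_node(attribute: str, count=None) -> str:
    """Return a Turtle blank-node fragment for a has_repeat_count object."""
    if attribute not in ALLOWED_ATTRIBUTES:
        raise ValueError("unsupported repeat_attribute: " + repr(attribute))
    parts = ['glyco:repeat_attribute "' + attribute + '"']
    if count is not None:  # repeat_count is only written when it is known
        if isinstance(count, bool) or not isinstance(count, int) or count < 0:
            raise ValueError("repeat_count must be a non-negative integer")
        parts.append("glyco:repeat_count " + str(count))
    return "[ " + " ; ".join(parts) + " ]"


print("glyco:has_repeat_count " + repeat_count_node("exact", 3) + " .")
```

Leaving out repeat_count when the value is unknown follows the "use if it is known" note in the table.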
##Table 7. sequence formats property of the glyco:sequence resource
predicate | data type | comment |
---|---|---|
rdf:type | owl:class | sequence |
in_glycoct | xsd:string | Plain literal string with the carbohydrate sequence in GlycoCT condensed format. The GlycoCT sequence format is described in reference 18436199. |
in_KCF | xsd:string | String with the carbohydrate sequence in KCF format. |
in_GlydeII | xsd:string | String with the carbohydrate sequence in Glyde-II format. |
in_linearCode | xsd:string | String with the carbohydrate sequence in LinearCode® format. |
in_linucs | xsd:string | String with the carbohydrate sequence in LINUCS format. |
in_IUPAC_condensed | xsd:string | String with carbohydrate sequence in IUPAC nomenclature. http://www.chem.qmul.ac.uk/iupac/ http://www.chem.qmul.ac.uk/iupac/misc/glycp.html http://www.chem.qmul.ac.uk/iupac/misc/glylp.html |
in_IUPAC_short | ||
in_IUPAC_extended | ||
in_SweetDB | xsd:string | Multiline string with carbohydrate sequence in Sweet-DB pseudographics. |
in_CSDB | xsd:string | String with sequence of residues in CSDB linear code http://csdb.glycoscience.ru/bacterial/core/help.php?db=bacterial&topic=rules |
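
The rule that at least one sequence format is mandatory could be checked before serialization. The Python sketch below is a hedged illustration: the record layout (a dict keyed by predicate name) and the function names are assumptions, while the predicate names are taken from the table above.

```python
# Minimal sketch: check that a sequence record carries at least one format.
# The dict-based record layout is an assumption made for this example.
SEQUENCE_FORMATS = (
    "in_glycoct", "in_KCF", "in_GlydeII", "in_linearCode", "in_linucs",
    "in_IUPAC_condensed", "in_IUPAC_short", "in_IUPAC_extended",
    "in_SweetDB", "in_CSDB",
)


def present_formats(record: dict) -> list:
    """Return the sequence-format predicates that have a non-empty value."""
    return [
        name for name in SEQUENCE_FORMATS
        if str(record.get(name) or "").strip()
    ]


def has_mandatory_sequence(record: dict) -> bool:
    """True if at least one sequence format is present in the record."""
    return bool(present_formats(record))


record = {"in_linucs": "[][?-D-GlcNAc]{}", "in_KCF": ""}
print(present_formats(record), has_mandatory_sequence(record))
```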

    @prefix glycoSequence: <http://purl.jp/bio/12/glycanSequence/0.1/> .
    @prefix glyco: <http://purl.jp/bio/12/> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    # chebi: must also be declared with the ChEBI namespace in use

    <http://www.glycome-db.org/rdf/2015>
        glyco:has_glycosequence <http://www.glycome-db.org/rdf/2015#UUID> ;
        glyco:has_glycoconjugate_sequence <http://www.glycome-db.org/rdf/2015#UUID2> ;
        glyco:has_aglycon
            [ rdf:type glyco:aglycon ;
              rdf:type chebi:lipid ;
              rdfs:seeAlso <> ;                # link to database (e.g. ChEBI, PubChem)
              foaf:name "..." ;                # free text
              glyco:attachment_position 3 ;    # atom number in aglycon
              glyco:linkage 1                  # carbon number in glycan
            ] ;
        rdf:type glyco:GlycanClass ;
        glyco:relation <http://...> .          # back-reference to relationship RDF

    <http://www.glycome-db.org/rdf/2015#UUID>  # UUID is a random literal to make blank nodes available from SPARQL
        rdf:type glyco:glycosequence ;
        glyco:in_linucs "[][?-D-GlcNAc]{[(3+1)][a-L-Fucp]{}[(4+1)][b-D-Galp]{[(2+1)][a-L-Fucp]{}}}" ;
        glyco:in_iupac ".." .

    <http://www.glycome-db.org/rdf/2015#UUID2>
        rdf:type glyco:glycoconjugate_sequence ;
        glyco:in_linucs "[XXX][?-D-GlcNAc]{[(3+1)][a-L-Fucp]{}[(4+1)][b-D-Galp]{[(2+1)][a-L-Fucp]{}}}" .

#Resource properties
The following table contains information about links to other non-RDF resources describing the same carbohydrate.
All reference databases must be listed in http://purl.jp/bio/12/database.rdf (currently under construction).
Database of glycan databases: https://docs.google.com/document/d/1xy3N7Njsm0EO9fjjp3GC-wuDG3TPcUGQnGPdltp0YpI/edit
##Table 8. Describing a glycan database
predicate | data type | comment |
---|---|---|
is_glycan_database | Resource | URI (range is the glyco:glycan_database class) |
dcterms:identifier | xsd:string | entry ID in other resource |
rdfs:seeAlso | Resource | URI reference to another resource; it can be used as the subject for further annotation (resource_name and resource_id) |
owl:sameAs | Resource | URI reference to another RDF description of exactly this carbohydrate, provided by a different resource, which may contain complementary information |

    @prefix glyco: <http://purl.jp/bio/12/> .
    @prefix glycodb: <http://purl.jp/bio/12/database/> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix dcterms: <http://purl.org/dc/terms/> .

    <http://www.glycome-db.org/rdf/2015>
        rdfs:seeAlso <http://csdb.glycoscience.ru/bacterial/core/search_id.php?id_list=0538> ;
        rdfs:seeAlso <http://csdb.glycoscience.ru/bacterial/core/search_id.php?id_list=23173> ;
        rdfs:seeAlso <http://www.genome.jp/dbget-bin/www_bget?carbbank+14848> ;
        rdfs:seeAlso <http://www.ebi.ac.uk/eurocarb/show_glycan.action?glycanSequenceId=2711> ;
        rdfs:seeAlso <http://www.glycosciences.de/sweetdb/start.php?action=explore_linucsid&linucsid=2611&show=1#struct> ;
        rdfs:seeAlso _:1 .

    <http://csdb.glycoscience.ru/bacterial/core/search_id.php?id_list=0538>
        glyco:is_glycan_database glycodb:bcsdb ;
        dcterms:identifier "0538" .

    <http://csdb.glycoscience.ru/bacterial/core/search_id.php?id_list=23173>
        glyco:is_glycan_database <http://purl.jp/bio/12/database/bcsdb> ; dcterms:identifier "23173" .

    <http://csdb.glycoscience.ru/bacterial/core/search_id.php?id_list=6920>
        glyco:is_glycan_database <http://purl.jp/bio/12/database/bcsdb> ; dcterms:identifier "6920" .

    <http://www.genome.jp/dbget-bin/www_bget?carbbank+14848>
        glyco:is_glycan_database <http://purl.jp/bio/12/database/ccsd> ; dcterms:identifier "14848" .

    <http://www.ebi.ac.uk/eurocarb/show_glycan.action?glycanSequenceId=2711>
        glyco:is_glycan_database <http://purl.jp/bio/12/database/eurocarbdb-ebi> ; dcterms:identifier "2711" .

    <http://www.glycosciences.de/sweetdb/start.php?action=explore_linucsid&linucsid=2611&show=1#struct>
        glyco:is_glycan_database <http://purl.jp/bio/12/database/glycosciences.de> ; dcterms:identifier "2611" .

    _:1 glyco:is_glycan_database <someUUID> ; dcterms:identifier "123" .

    <someUUID>
        glyco:url_template "http://foo.bar.com/someglycan?id=%s" ;
        glyco:abbreviation "foobar" ;
        glyco:category "glycan structure database" ;
        a glyco:glycan_database ;
        rdfs:label "FooBarDB" .

###List of existing glycan database URIs
These are URLs of databases that may be referenced. [?id?] is a placeholder for the database-internal ID.
Database name | Description | URL template |
---|---|---|
ccsd | Web link to a CarbBank entry about this carbohydrate. | http://www.genome.jp/dbget-bin/www_bget?carbbank+[?id?] |
glycomedb | Web link to a GlycomeDB entry about this carbohydrate. | http://www.glycome-db.org/database/showStructure.action?glycomeId=[?id?] |
jcggdb | Web link to a JCGGDB entry about this carbohydrate. | http://jcggdb.jp/idb/jcggdb/[?id?] |
kegg_glycan | Web link to a KEGG Glycan entry about this carbohydrate. | http://www.genome.jp/dbget-bin/www_bget?gl:[?id?] |
cfg | Web link to a CFG entry about this carbohydrate. | http://www.functionalglycomics.org/glycomics/CarbohydrateServlet?pageType=view&view=view&operationType=view&carbId=[?id?]&sideMenu=no |
glyaffinity | Web link to a GlyAffinity entry about this carbohydrate. | http://worm.mpi-cbg.de/affinity/structure.action?ID=[?id?] |
glycobase_lille | Web link to a GlycoBase(Lille) entry about this carbohydrate. | http://glycobase.univ-lille1.fr/base/view_mol.php?id=[?id?] |
glycosciences.de | Web link to a GLYCOSCIENCES.de entry about this carbohydrate. | http://www.glycosciences.de/sweetdb/start.php?action=explore_linucsid&linucsid=[?id?]&show=1#struct |
PDB | Web link to a PDB entry about this carbohydrate. | http://www.rcsb.org/pdb/explore/explore.do?structureId=[?id?] |
unicarbkb | Web link to a UniCarbKB entry about this carbohydrate. | http://unicarbkb.org/structure/:[?id?] |
bcsdb | Web link to a BCSDB entry about this carbohydrate. | http://csdb.glycoscience.ru/bacterial/core/search_id.php?id_list=[?id?] |
glyco | ||
glycobase_dublin | (Registration is required) | |
unicarbdb | Web link to a UniCarb-DB entry about this carbohydrate. | http://unicarb-db.biomedicine.gu.se/unicarbdb/show_glycan.action?glycanSequenceId=[?id?] |
eurocarbdb-ebi | Web link to a EuroCarbDB (EBI) entry about this carbohydrate. | http://www.ebi.ac.uk/eurocarb/show_glycan.action?glycanSequenceId=[?id?] |
eurocarbdb-nibrt |
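
Filling in the [?id?] placeholder of a URL template is a plain string substitution. The Python sketch below shows one way to do it; the `entry_url` helper and the choice to percent-encode the ID are assumptions for illustration, and only three of the templates from the table are copied into the lookup.

```python
# Minimal sketch: build an entry URL from a template with the [?id?] placeholder.
from urllib.parse import quote

PLACEHOLDER = "[?id?]"

# A few templates copied from the table above; the dict itself is illustrative.
URL_TEMPLATES = {
    "ccsd": "http://www.genome.jp/dbget-bin/www_bget?carbbank+[?id?]",
    "kegg_glycan": "http://www.genome.jp/dbget-bin/www_bget?gl:[?id?]",
    "bcsdb": "http://csdb.glycoscience.ru/bacterial/core/search_id.php?id_list=[?id?]",
}


def entry_url(database: str, entry_id) -> str:
    """Replace the placeholder with the (percent-encoded) database-internal ID."""
    template = URL_TEMPLATES[database]
    if PLACEHOLDER not in template:
        raise ValueError("template for " + database + " has no " + PLACEHOLDER)
    return template.replace(PLACEHOLDER, quote(str(entry_id), safe=""))


print(entry_url("ccsd", 6915))
print(entry_url("bcsdb", "0538"))
```

IDs are passed through as strings so that leading zeros, as in the BCSDB identifier 0538, are preserved.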
#Images
The following table contains information about links to graphical representations of the carbohydrate.
##Table 9. Describing an image representation of a Glycan.
predicate | data type | comment |
---|---|---|
has_image | Resource | URL of image |
dc:format | xsd:string | The file format of the image. (image/svg+xml, image/png, image/gif, …) |
symbol_format | xsd:string or Resource | URL to explanation of symbol? The display style of the glycan. (cfg, uoxf, atoms) |
Turtle example:

    @prefix glycoSequence: <http://purl.jp/bio/12/glycanSequence/0.1/> .
    @prefix glyco: <http://www.glycome-db.org/rdf/2012/glyco/1.0#> .
    @prefix dc: <http://purl.org/dc/elements/1.1/> .

    <http://www.glycome-db.org/rdf/3018>
        glyco:has_image <http://www.glycome-db.org/http-services/getImage.action?suspress=yes&id=3018> .

    <http://www.glycome-db.org/http-services/getImage.action?suspress=yes&id=3018>
        dc:format "image/png" ;
        glyco:symbol_format "cfg" .

#Compositions
predicate | data type | comment |
---|---|---|
has_component | Resource | Reference to another subject in the document describing a part of the composition in detail. |
has_cardinality | xsd:integer | Number of occurrences of an element (e.g. a monosaccharide) in the subject. This information can be missing in case the cardinality cannot be defined (e.g. repeat units with unknown or under-defined repeats). Missing for non-stoichiometrical residues. |
has_cardinality_per_repeat | xsd:integer | Number of occurrences of an element (e.g. a monosaccharide) in the repeat unit. Applicable to repeatUnits only. Missing for non-stoichiometrical residues. |
has_monosaccharide | Resource | Reference to an RDF resource describing the monosaccharide. |
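
To illustrate how has_component, has_cardinality and has_monosaccharide fit together, the Python sketch below tallies a list of residues and emits one component node per distinct monosaccharide. The function name and the `example.org` IRIs are hypothetical, and the output is a fragment that assumes the `glyco:` prefix is declared elsewhere.

```python
# Minimal sketch: derive has_component nodes from a flat list of residues.
# The residue IRIs used here are made-up placeholders.
from collections import Counter


def composition_turtle(subject: str, residues: list) -> str:
    """Emit one has_component blank node per distinct monosaccharide IRI."""
    if not residues:
        raise ValueError("at least one residue is required")
    counts = Counter(residues)
    blocks = []
    for monosaccharide, n in sorted(counts.items()):
        blocks.append(
            "    glyco:has_component [\n"
            "        glyco:has_cardinality " + str(n) + " ;\n"
            "        glyco:has_monosaccharide <" + monosaccharide + ">\n"
            "    ]"
        )
    return "<" + subject + ">\n" + " ;\n".join(blocks) + " ."


print(composition_turtle(
    "http://example.org/glycan/1",
    [
        "http://example.org/ms/Fuc",
        "http://example.org/ms/Gal",
        "http://example.org/ms/Fuc",
    ],
))
```

The cardinality is written as an integer, matching the xsd:integer data type in the table. For repeat units with undefined repeats the has_cardinality line would be left out, which this sketch does not cover.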
Example structure:
Turtle example:

    @prefix glycoSequence: <http://purl.jp/bio/12/glycanSequence/0.1/> .
    @prefix glyco: <http://www.glycome-db.org/rdf/2012/glyco/1.0#> .

    # '|' is not allowed inside a Turtle IRI, so the monosaccharide URLs are kept as string literals here
    <http://www.glycome-db.org/rdf/2015>
        glyco:has_component
            [ glyco:has_cardinality 1 ;    # this is only applicable to oligomers
              glyco:has_monosaccharide
                  "http://www.monosaccharidedb.org/query_monosaccharide_by_name.action?scheme=msdb&name=x-dglc-HEX-x:x||(2d:1)n-acetyl"
            ] ;
        glyco:has_component
            [ glyco:has_cardinality 2 ;
              glyco:has_monosaccharide
                  "http://www.monosaccharidedb.org/query_monosaccharide_by_name.action?scheme=msdb&name=a-lgal-HEX-1:5|6:d"
            ] ;
        glyco:has_component
            [ glyco:has_cardinality 1 ;
              glyco:has_monosaccharide
                  "http://www.monosaccharidedb.org/query_monosaccharide_by_name.action?scheme=msdb&name=b-dgal-HEX-1:5"
            ] .

#Biological source
See Source RDF in: https://docs.google.com/document/d/1rLJWha_5oXWGgPq8VhytTzJk1grkizVYtWQ7XXOBoXk/edit
#Publication references
See Publication RDF in: https://docs.google.com/document/d/1rLJWha_5oXWGgPq8VhytTzJk1grkizVYtWQ7XXOBoXk/edit
#Experimental data
See Evidence RDF in: https://docs.google.com/document/d/1rLJWha_5oXWGgPq8VhytTzJk1grkizVYtWQ7XXOBoXk/edit
#Glyco relationship RDF
Glyco relationship RDF links glycan structure, biological source, publication and experimental data.
https://docs.google.com/document/d/1rLJWha_5oXWGgPq8VhytTzJk1grkizVYtWQ7XXOBoXk/edit
#Monosaccharide information
The following table contains the RDF properties for single monosaccharides.
predicate | data type | comment |
---|---|---|
name | xsd:string | MsDB name of the monosaccharide |
has_basetype | Resource | reference to another RDF resource with URI describing the monosaccharide basetype |
has_substituent | Resource | reference to another RDF resource with URI. The substituent is linked to the basetype in this monosaccharide. |
average_MW | xsd:double | literal numeric with decimal, calculated from monosaccharide composition with average atomic weight |
monoisotopic_MW | xsd:double | literal numeric with decimal, calculated from monosaccharide composition with atomic weight of monoisotope. |
has_linking_position | xsd:integer | monosaccharide can be linked to other residues via standard glycosidic linkage at the given backbone position |
has_alias_name | Resource? | [scheme:String; name:String; external_substituent[name; position; linkage_type]; is_primary] name of the monosaccharide in a given notation scheme (external substituents only apply to specific residues / schemes) |

The following table contains the RDF properties for monosaccharide basetypes.

predicate | data type | comment |
---|---|---|
size | xsd:integer | number of backbone carbon atoms |
average_MW | xsd:double | literal numeric with decimal, calculated from monosaccharide composition with average atomic weight |
monoisotopic_MW | xsd:double | literal numeric with decimal, calculated from monosaccharide composition with atomic weight of monoisotope. |
anomeric | Resource | alpha, beta, none, unknown (URI for concept to be provided). Anomeric state of the basetype |
configuration | Resource | D, L, unknown, none (URI for concept to be provided). Absolute configuration of the basetype |
ring_start | xsd:integer | position of first carbon involved in ring closure |
ring_end | xsd:integer | position of second carbon involved in ring closure |
stereocode | xsd:string | Stereocode describing the backbone stereochemistry |
ext_stereocode | xsd:string | Extended stereocode of the basetype |
has_composition | ||
has_core_modification | Resource | reference to another RDF describing a core modification that is present in this basetype |
predicate | data type | comment |
---|---|---|
has_standard_MW | xsd:double | literal numeric with decimal, calculated from monosaccharide composition with standard atomic weight ( http://physics.nist.gov/cgi-bin/Compositions/stand_alone.pl?ele=&ascii=html&isotype=some ) |
has_monoisotopic_MW | xsd:double | literal numeric with decimal, calculated from monosaccharide composition with atomic weight of monoisotope. |
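
The difference between the two molecular-weight properties comes down to which atomic mass table is summed over the elemental composition. The Python sketch below illustrates this for four elements; the mass values are rounded figures quoted from memory for illustration, and authoritative values should be taken from the NIST table linked above. The function and dictionary names are hypothetical.

```python
# Minimal sketch: average vs. monoisotopic mass from an elemental composition.
# Mass values are rounded illustrations; use the NIST table for real data.
AVERAGE_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}
MONOISOTOPIC_MASS = {
    "C": 12.0,             # carbon-12
    "H": 1.00782503207,    # hydrogen-1
    "N": 14.0030740048,    # nitrogen-14
    "O": 15.99491461956,   # oxygen-16
}


def molecular_weight(composition: dict, mass_table: dict) -> float:
    """Sum element mass times atom count over the composition."""
    return sum(mass_table[element] * count for element, count in composition.items())


hexose = {"C": 6, "H": 12, "O": 6}  # e.g. free glucose, C6H12O6
print(round(molecular_weight(hexose, AVERAGE_MASS), 3))
print(round(molecular_weight(hexose, MONOISOTOPIC_MASS), 4))
```

Either result would be stored as an xsd:double literal, as the tables specify.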
predicate | data type | comment |
---|---|---|
has_motif | Resource | reference to another RDF resource with URI. The object is a structurally defined motif, e.g. for the Neo-lacto motif http://jcggdb.jp/idb/motif?id=JCGG-MOTIF3009.rdf. Inverse of "contained_in". Should have sequence, composition, image ... |
has_epitope | Resource | reference to another RDF resource with URI. The subject is a structural motif with biological relevance; subproperty of has_motif. |
contained_in | Resource | reference to another RDF resource with URI. The subject is a structurally defined motif, and the object is a glycan structure or motif. |
has_affinity_to | Resource | Subject is a material or organism that has affinity to the object. |
degraded_by | Resource | Subject is an enzyme that degrades the object. |
generated_by | Resource | Subject is an enzyme that synthesizes the object. |
degraded_from | Resource | Subject is a precursor of the term "degraded_by" |
generated_from | Resource | Subject is a precursor of the term "generated_by" |
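
Because has_motif is described as the inverse of contained_in, one direction can be derived from the other. The Python sketch below shows this on plain tuples; the triple representation, the function name and the short identifiers are assumptions for illustration, not a prescribed data model.

```python
# Minimal sketch: materialize the inverse of has_motif / contained_in.
# Triples are modeled as (subject, predicate, object) tuples for simplicity.
INVERSE = {"has_motif": "contained_in", "contained_in": "has_motif"}


def add_inverse_links(triples: set) -> set:
    """Return the input triples plus the inverse of every invertible one."""
    result = set(triples)
    for subject, predicate, obj in triples:
        if predicate in INVERSE:
            result.add((obj, INVERSE[predicate], subject))
    return result


data = {("glycan:G1", "has_motif", "motif:M1")}  # hypothetical identifiers
for triple in sorted(add_inverse_links(data)):
    print(triple)
```

In a triple store the same effect could be obtained with an owl:inverseOf declaration and a reasoner; the explicit loop only makes the relationship visible.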
Turtle example:
@prefix glycoSequence: <http://purl.jp/bio/12/glycanSequence/0.1/> .
@prefix glyco: <http://www.glycome-db.org/rdf/2012/glyco/1.0#> .
<http://www.glycome-db.org/rdf/2015>
...
#Namespace or Vocabularies for Describing GlycoProtein
@prefix uniprotcore: <http://purl.uniprot.org/core/> .
https://docs.google.com/spreadsheet/ccc?key=0Ajw2OqykvyGLdG9FOHN4ZENYZmpVVkhCYmZFc3dwX3c#gid=2
predicate | data type | comment |
---|---|---|
rdf:type | Resource | Description |
has_core_protein | uniprotcore:Protein | Uniprot identifier for glycoprotein entry |
amino_acid_length | | From the relevant UniProt page, or stated if sequenced/synthesised |
amino_acid_sequence | ||
versionOf | ||
has_glycosylated_amino_acid_residue | Resource | use UUID if URI not available |
position_of_amino_acid | xsd:int | property of "has_glycosylated_amino_acid_residue"; amino acid position |
amino_acid_type | Resource | Link to definition of amino acid residue. Property of "has_glycosylated_amino_acid_residue"; type of amino acid residue, usually N or S/T |
modification_type | Resource | property of "has_glycosylated_amino_acid_residue". The object is referenced to the list below. |
has_structure | Resource | URI of structure associated with glycoprotein (predicate can occur multiple times) |
evidence | ||
contributor |

    <http://unicarbkb.org/protein/Q8TAX7> a glyco:glycosylation_annotation ;
        glyco:has_uniprot <http://uniprot.org/Q8TAX7> ;
        glyco:has_glycosylated_amino_acid_residue "" ;
        glyco:has_structure <http://unicarbkb.org/structure/5989> .   # may contain many structures, which can be linked to a site via a blank node