GlyCosmos Glycans

The glycan list is under construction, but you can search glycan structures registered in GlyTouCan.

Text Search
Please select an input format.
Please enter glycans.

GlycoCT is an encoding scheme for carbohydrate sequences based on a connection table approach. It adopts IUPAC rules to generate a consistent, machine-readable nomenclature and uses a block concept to describe features such as repeating units. It has two variants, a condensed format and an XML format. The condensed format allows glycan structures to be identified uniquely in a compact manner. Monosaccharide names follow the pattern a-bccc-DDD-e:f|g:h, where a is the anomeric configuration (one of a, b, o, x), b is the configurational symbol (one of d, l, x), ccc is the three-letter code for the monosaccharide as listed in Table 1.1, DDD is the base type or superclass indicating the number of consecutive carbon atoms (such as HEX, PEN, NON), e and f are the carbon numbers involved in closing the ring, g is the position of the modifier, and h is the type of modifier. For a, b, e, f and g, an x can be used to specify an unknown value. bccc and g:h may be repeated if necessary. Substituents of monosaccharides are treated as separate residues attached to the base residue. The residue type is indicated by one of the following codes immediately after the residue number: b=basetype, s=substituent, r=repeating unit, a=alternative unit. The substituents handled by GlycoCT are listed in Table 1.2. Like the KCF format, GlycoCT specifies the residues in a RES section and the linkages in a LIN section.
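
The naming pattern described above can be checked mechanically. The following Python sketch is illustrative only and is not part of GlycoCT or GlyCosmos. The names `BaseType` and `parse_basetype` are hypothetical, and the regular expression assumes three-letter superclass codes, at least one bccc unit, and modifier types written in lowercase letters.

```python
import re
from typing import NamedTuple

class BaseType(NamedTuple):
    anomer: str       # a, b, o or x
    stems: list       # [(configuration, three-letter code), ...]
    superclass: str   # e.g. HEX, PEN, NON
    ring: tuple       # (ring start, ring end); "x" means unknown
    modifiers: list   # [(position, modifier type), ...]

# Assumed grammar for a-bccc-DDD-e:f|g:h (bccc and g:h may repeat).
_NAME = re.compile(
    r"^(?P<anomer>[abox])"
    r"(?P<stems>(?:-[dlx][a-z]{3})+)"
    r"-(?P<superclass>[A-Z]{3})"
    r"-(?P<ring_start>\d+|x):(?P<ring_end>\d+|x)"
    r"(?P<mods>(?:\|(?:\d+|x):[a-z]+)*)$"
)

def parse_basetype(name: str) -> BaseType:
    """Split a GlycoCT monosaccharide name into its labelled parts."""
    m = _NAME.match(name)
    if m is None:
        raise ValueError(f"not a recognised basetype name: {name!r}")
    # "-dgro-dgal" -> [("d", "gro"), ("d", "gal")]
    stems = [(s[0], s[1:]) for s in m["stems"].strip("-").split("-")]
    # "|6:d" -> [("6", "d")]
    mods = [tuple(p.split(":")) for p in m["mods"].split("|") if p]
    return BaseType(m["anomer"], stems, m["superclass"],
                    (m["ring_start"], m["ring_end"]), mods)

print(parse_basetype("a-lgal-HEX-1:5|6:d"))
```

For the name `a-lgal-HEX-1:5|6:d`, the sketch reports anomer a, one stem (l, gal), superclass HEX, ring closure 1:5, and a single modifier d at position 6.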

List of monosaccharides and their three-letter codes used in GlycoCT.

Monosaccharide name Three-letter code Superclass
Allose ALL HEX
Altrose ALT HEX
Arabinose ARA PEN
Erythrose ERY TET
Galactose GAL HEX
Glucose GLC HEX
Glyceraldehyde GRO TRI
Gulose GUL HEX
Idose IDO HEX
Lyxose LYX PEN
Mannose MAN HEX
Ribose RIB PEN
Talose TAL HEX
Threose TRE TET
Xylose XYL PEN

List of substituents used in GlycoCT.

acetyl amidino amino
anhydro bromo chloro
diphospho epoxy ethanolamine
ethyl fluoro formyl
glycolyl hydroxymethyl imino
iodo lactone methyl
N-acetyl N-alanine N-amidino
N-dimethyl N-formyl N-glycolyl
N-methyl N-methyl-carbomoyl N-succinate
N-sulfate N-triflouroacetyl nitrate
phosphate phospho-choline phospho-ethanolamine
pyrophosphate pyruvate succinate
sulfate thio triphosphate
(r)-1-hydroxyethyl (r)-carboxyethyl (r)-carboxymethyl
(r)-lactate (r)-pyruvate (s)-1-hydroxyethyl
(s)-carboxyethyl (s)-carboxymethyl (s)-lactate
(s)-pyruvate (x)-lactate (x)-pyruvate

Example of GlycoCT format:


    RES
    1b:b-dglc-HEX-1:5
    2b:b-dgal-HEX-1:5
    3b:b-dglc-HEX-1:5
    4s:n-acetyl
    5b:b-dgal-HEX-1:5
    6b:a-lgal-HEX-1:5|6:d
    LIN
    1:1o(4+1)2d
    2:2o(3+1)3d
    3:3d(2+1)4n
    4:3o(3+1)5d
    5:5o(2+1)6d
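
The RES and LIN sections of the example above can be read with a few lines of Python. This is a minimal sketch, not a reference GlycoCT parser. `parse_glycoct` is a hypothetical name, and the linkage pattern assumes the layout seen in the example: linkage number, parent residue and its linkage type, parent and child positions in parentheses, then child residue and its linkage type, with a single integer per position.

```python
import re

# Residue number, residue type (b/s/r/a), then the residue description.
RES_LINE = re.compile(r"^(\d+)([bsra]):(.+)$")
# Linkage number : parent, type, (parent pos + child pos), child, type.
LIN_LINE = re.compile(
    r"^(\d+):(\d+)([a-z])\((-?\d+)\+(-?\d+)\)(\d+)([a-z])$")

def parse_glycoct(text):
    """Return (residues, linkages) from condensed GlycoCT text."""
    residues, linkages = {}, []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line in ("RES", "LIN"):
            section = line
        elif section == "RES":
            m = RES_LINE.match(line)
            if m is None:
                raise ValueError(f"bad RES line: {line!r}")
            residues[int(m[1])] = (m[2], m[3])
        elif section == "LIN":
            m = LIN_LINE.match(line)
            if m is None:
                raise ValueError(f"bad LIN line: {line!r}")
            linkages.append({
                "parent": int(m[2]), "parent_pos": int(m[4]),
                "child": int(m[6]), "child_pos": int(m[5]),
            })
        else:
            raise ValueError(f"line outside RES/LIN: {line!r}")
    return residues, linkages

EXAMPLE = """\
RES
1b:b-dglc-HEX-1:5
2b:b-dgal-HEX-1:5
3b:b-dglc-HEX-1:5
4s:n-acetyl
5b:b-dgal-HEX-1:5
6b:a-lgal-HEX-1:5|6:d
LIN
1:1o(4+1)2d
2:2o(3+1)3d
3:3d(2+1)4n
4:3o(3+1)5d
5:5o(2+1)6d
"""

residues, linkages = parse_glycoct(EXAMPLE)
print(len(residues), "residues,", len(linkages), "linkages")
```

Keeping residues in a dictionary keyed by residue number makes it easy to follow a linkage from parent to child, for example from residue 3 to its n-acetyl substituent, residue 4.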
    

IUPAC suggests an extended IUPAC form in which structures are written across multiple lines. This format was originally used by CarbBank and is therefore sometimes called the CarbBank format. Monosaccharides are represented as in the IUPAC format: each monosaccharide residue is preceded by the anomeric descriptor and the configuration symbol, and the ring size is indicated by an italic f or p. If any of α/β, D/L, or f/p is omitted, that structural detail is assumed to be unknown. Branches are written on a second line, or in brackets on the same line. This format may substitute a and b for α and β, respectively. Arrows (→) may also be replaced by hyphens (-), and up (↑) and down (↓) arrows by bars (|).

Example of CarbBank format: The N-glycan core structure represented in CarbBank (extended IUPAC) format.


    a-D-Manp-(1-6)+
                  |
             b-D-Manp-(1-4)-b-D-GlcpNAc-(1-4)-a-D-GlcpNAc
                  |
    a-D-Manp-(1-3)+
    

In the search box, however, enter the structure on a single line, as follows:


    α-D-Manp-(1→3)[α-D-Manp-(1→6)]-β-D-Manp-(1→4)-β-D-GlcpNAc-(1→4)-β-D-GlcpNAc-(1→
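
The character substitutions permitted by the extended IUPAC format (a and b for α and β, hyphens for arrows, bars for vertical arrows) can be applied with a translation table. The Python sketch below is illustrative; `to_ascii_iupac` is a hypothetical helper, and whether the search form accepts the ASCII variant is not stated here.

```python
# Map the Greek letters and arrows to their permitted ASCII substitutes.
_ASCII_TABLE = str.maketrans({
    "α": "a",
    "β": "b",
    "→": "-",
    "↑": "|",
    "↓": "|",
})

def to_ascii_iupac(structure: str) -> str:
    """Return the structure with Greek letters and arrows replaced."""
    return structure.translate(_ASCII_TABLE)

core = ("α-D-Manp-(1→3)[α-D-Manp-(1→6)]-β-D-Manp-(1→4)"
        "-β-D-GlcpNAc-(1→4)-β-D-GlcpNAc-(1→")
print(to_ascii_iupac(core))
```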
    

Linear Code® is a carbohydrate format that uses a single-letter nomenclature for monosaccharides and includes a condensed description of the glycosidic linkages. Each monosaccharide is represented by its common structure, and deviations from that structure are indicated by specific symbols, as follows (Banin et al. (2002)). Stereoisomers (D or L) differing from the common isomer are indicated by an apostrophe ('). Monosaccharides whose ring size (furanose or pyranose) differs from the common form are indicated by a caret (^). Monosaccharides differing in both respects are indicated by a tilde (~).

List of common modifications as used in the Linear Code® format.

Modification Type Linear Code®
amino Q
ethanolaminephosphate PE
inositol IN
methyl ME
N-acetyl N
O-acetyl T
phosphate P
phosphocholine PC
pyruvate PYR
sulfate S
sulfide SH
2-aminoethylphosphonic acid EP

Example of Linear Code®:


    GNb2(Ab4GNb4)Ma3(Ab4GNb2(Fa3(Ab4)GNb6)Ma6)Mb4GNb4GN
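
To make the structure of the example above easier to see, the string can be split into residue tokens and branch parentheses. The Python sketch below rests on assumptions inferred from this one example rather than from the Linear Code® definition: each residue is an uppercase code, optionally followed by one of the markers ', ^ or ~, and then by an anomer letter (a, b or ?) and an acceptor position; branches are enclosed in parentheses. Modification codes are not handled. `tokenize_linear_code` is a hypothetical name.

```python
import re

# A token is a parenthesis or a residue: uppercase code, optional marker
# (' ^ ~), optional linkage (anomer a/b/? followed by one digit).
TOKEN = re.compile(r"[()]|[A-Z]+['^~]?(?:[ab?]\d)?")

def tokenize_linear_code(code: str) -> list:
    """Split a Linear Code string into residue and parenthesis tokens."""
    tokens = TOKEN.findall(code)
    # Any character the pattern skipped means the input is outside
    # the assumed grammar.
    if "".join(tokens) != code:
        raise ValueError("unrecognised characters in input")
    depth = 0
    for tok in tokens:
        depth += (tok == "(") - (tok == ")")
        if depth < 0:
            raise ValueError("closing parenthesis without an opening one")
    if depth != 0:
        raise ValueError("unclosed parenthesis")
    return tokens

tokens = tokenize_linear_code(
    "GNb2(Ab4GNb4)Ma3(Ab4GNb2(Fa3(Ab4)GNb6)Ma6)Mb4GNb4GN")
residue_tokens = [t for t in tokens if t not in "()"]
print(len(residue_tokens), "residues:", residue_tokens)
```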
    

The KEGG Chemical Function (KCF) format for representing glycan structures was originally used to represent chemical structures (thus the name) in KEGG. KCF uses a graph notation in which nodes are monosaccharides and edges are glycosidic linkages. Thus, to represent a glycan, at least three sections are required: ENTRY, NODE, and EDGE, followed by three slashes '///' at the end.

  • The ENTRY section consists of one line and may specify a name for the structure followed by the keyword Glycan.
  • The NODE section consists of several lines. The first line contains the number of monosaccharides or aglycon entities, and the following lines consist of the details of these entities numbered consecutively. For each entity line, the name and x- and y-coordinates (to draw on a 2D plane) must be specified.
  • Similarly, the EDGE section consists of several lines, the first line containing the number of bonds (usually one less than the number of NODEs), followed by the details of the bond information. The format for the bond information is as follows:
      <edge#> <donor node#>:<anomeric configuration (a or b)><donor carbon#> <acceptor node#>:<acceptor carbon#>

Example of KCF format: The N-glycan core structure represented in KCF format.


    ENTRY     XYZ          Glycan
    NODE      5
              1     GlcNAc     15.0     7.0
              2     GlcNAc      8.0     7.0
              3     Man         1.0     7.0
              4     Man        -6.0    12.0
              5     Man        -6.0     2.0
    EDGE      4
              1     2:b1       1:4
              2     3:b1       2:4
              3     5:a1       3:3
              4     4:a1       3:6
    ///
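
The three sections can be read into a small graph structure. The Python sketch below is a minimal illustration based only on the layout described above; `parse_kcf` is a hypothetical name, and the patterns assume that every edge gives an anomeric configuration and both carbon numbers, as in the example.

```python
import re

DONOR = re.compile(r"^(\d+):([ab])(\d+)$")   # node : anomer + carbon
ACCEPTOR = re.compile(r"^(\d+):(\d+)$")      # node : carbon

def parse_kcf(text):
    """Return (entry, nodes, edges) for one KCF glycan record."""
    entry, nodes, edges = None, {}, []
    declared, section = {}, None
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue
        if fields[0] == "///":          # end of record
            break
        if fields[0] in ("ENTRY", "NODE", "EDGE"):
            section = fields[0]
            if section == "ENTRY":
                entry = fields[1:]      # optional name, then "Glycan"
            else:
                declared[section] = int(fields[1])
        elif section == "NODE":
            num, name, x, y = fields
            nodes[int(num)] = (name, float(x), float(y))
        elif section == "EDGE":
            _, donor, acceptor = fields
            d, a = DONOR.match(donor), ACCEPTOR.match(acceptor)
            if d is None or a is None:
                raise ValueError(f"bad EDGE line: {raw!r}")
            edges.append({
                "donor": int(d[1]), "anomer": d[2],
                "donor_carbon": int(d[3]),
                "acceptor": int(a[1]), "acceptor_carbon": int(a[2]),
            })
        else:
            raise ValueError(f"unexpected line: {raw!r}")
    # The first line of NODE and EDGE declares how many entries follow.
    if declared.get("NODE") != len(nodes) or declared.get("EDGE") != len(edges):
        raise ValueError("declared counts do not match the entries")
    return entry, nodes, edges

KCF = """\
ENTRY     XYZ          Glycan
NODE      5
          1     GlcNAc     15.0     7.0
          2     GlcNAc      8.0     7.0
          3     Man         1.0     7.0
          4     Man        -6.0    12.0
          5     Man        -6.0     2.0
EDGE      4
          1     2:b1       1:4
          2     3:b1       2:4
          3     5:a1       3:3
          4     4:a1       3:6
///
"""

entry, nodes, edges = parse_kcf(KCF)
print(entry, len(nodes), len(edges))
```

Checking the declared counts against the parsed entries catches truncated records, which is the main reason the sketch keeps the count lines instead of discarding them.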
    

Web3 Unique Representation of Carbohydrate Structures (WURCS) is a linear notation for representing carbohydrates for the Semantic Web.

Example of WURCS format:


    WURCS=2.0/3,5,4/[a2122h-1b_1-5_2*NCC/3=O][a1122h-1b_1-5][a1122h-1a_1-5]/1-1-2-3-3/a4-b1_b4-c1_c3-d1_c6-e1
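
A WURCS string cannot simply be split on every slash, because the bracketed residue descriptions may themselves contain one (as in `2*NCC/3=O` above). The Python sketch below separates the sections with that in mind. It is illustrative only: `split_wurcs` is a hypothetical name, and reading the three counts as unique residues, residues, and linkages is an assumption based on the WURCS 2.0 layout, not something stated on this page. Counts carrying extra symbols are not handled.

```python
import re

def split_wurcs(wurcs: str) -> dict:
    """Split a WURCS 2.0 string into its main sections."""
    prefix, rest = wurcs.split("/", 1)
    if not prefix.startswith("WURCS="):
        raise ValueError("missing WURCS= prefix")
    version = prefix[len("WURCS="):]

    counts_text, rest = rest.split("/", 1)
    counts = tuple(int(c) for c in counts_text.split(","))

    # Everything up to the last "]" is the list of unique residues.
    end = rest.rindex("]")
    unique_residues = re.findall(r"\[([^\]]*)\]", rest[:end + 1])

    # After the closing bracket: "/<residue sequence>/<linkages>".
    sequence_text, linkage_text = rest[end + 2:].split("/", 1)
    sequence = [int(i) for i in sequence_text.split("-")]
    linkages = linkage_text.split("_") if linkage_text else []

    return {
        "version": version,
        "counts": counts,
        "unique_residues": unique_residues,
        "sequence": sequence,
        "linkages": linkages,
    }

parts = split_wurcs(
    "WURCS=2.0/3,5,4/[a2122h-1b_1-5_2*NCC/3=O][a1122h-1b_1-5]"
    "[a1122h-1a_1-5]/1-1-2-3-3/a4-b1_b4-c1_c3-d1_c6-e1")
for key, value in parts.items():
    print(key, value)
```

For the example, the section lengths (3 unique residues, 5 sequence entries, 4 linkages) agree with the counts 3,5,4 in the header, which is consistent with the assumed reading of those counts.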
    



Copyright © 2019    GlyCosmos Portal v1.1.0