The DG-CST database is a collection of conserved sequence elements identified by a systematic genomic sequence comparison between a set of human genes involved in the pathogenesis of genetic disorders and their murine counterparts. Human and mouse genomic sequences were compared with BLASTZ, an independent implementation of the Gapped BLAST algorithm specifically designed for aligning two long genomic sequences (http://bio.cse.psu.edu). Aligned sequences longer than 100 bp with identity above 70% were selected as CSTs (conserved sequence tags) and imported into the database. CSTs are extensively annotated with respect to exon/intron structure and other biological parameters.
CST counterparts in other species were identified by scanning the genomes of those species with BLAST and selecting hits on the basis of homology and colinearity.
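The selection rule described above (length over 100 bp and identity above 70%) can be illustrated with a minimal Python sketch. The `Alignment` record, its field names, and the `select_csts` helper are hypothetical and are not part of BLASTZ or of the DG-CST pipeline; the sketch assumes each aligned block has already been parsed into coordinates on the human sequence plus a percent identity.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Thresholds taken from the description above.
MIN_LENGTH = 100      # bp; a CST must be strictly longer than this
MIN_IDENTITY = 70.0   # percent; a CST must be strictly above this


@dataclass
class Alignment:
    """Hypothetical record for one human-mouse aligned block."""
    human_start: int   # 0-based, inclusive
    human_end: int     # exclusive
    identity: float    # percent identity, 0-100

    @property
    def length(self) -> int:
        return self.human_end - self.human_start


def select_csts(alignments: Iterable[Alignment]) -> List[Alignment]:
    """Keep only the blocks that pass both the length and identity cutoffs."""
    return [
        a for a in alignments
        if a.length > MIN_LENGTH and a.identity > MIN_IDENTITY
    ]
```

Under these assumptions, a 250 bp block at 82% identity would be kept, while a 250 bp block at 65% identity or an 80 bp block at 95% identity would be discarded.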

DG-CST REFERENCES
DG-CST (Disease Gene Conserved Sequence Tags), a database of human-mouse conserved elements associated to disease genes. Nucleic Acids Research, Vol. 33
Access CST data by disease gene or chromosomal localization:
Gene. Search CSTs by the analysed genes and their features, such as gene symbol and description according to the ENSEMBL or LocusLink gene definition, tissue localization, alternative splicing and others.
Chromosomal localization. Use this to search CSTs by chromosomal coordinates, distances from analysed and closest genes and coding regions, etc. You can, for example, identify intergenic CSTs located in the region proximal to the 5' end of disease genes.
Graphic browser. A tool for graphic visualization of CSTs on the chromosome and within the gene context. CSTs may be color coded on the basis of alternative features such as percent identity, GC content, CPS value, direct and inverted repeats, etc.
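As an illustration of the chromosomal localization search described above, the following Python sketch picks out CSTs that lie entirely within a window upstream of the 5' end of a gene. It is not the database's own query logic: the `Feature` record, the `upstream_csts` helper, and the 5000 bp default window are all assumptions made for the example, and the sketch does not check whether a CST overlaps some other gene, which a true intergenic filter would need to do.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Feature:
    """Hypothetical genomic interval (a gene or a CST)."""
    chrom: str
    start: int          # 0-based, inclusive
    end: int            # exclusive
    strand: str = "+"   # "+" or "-"; only meaningful for genes


def upstream_csts(gene: Feature, csts: Iterable[Feature],
                  window: int = 5000) -> List[Feature]:
    """Return CSTs fully contained in the window upstream of the gene's 5' end."""
    if gene.strand == "+":
        # The 5' end is the gene start; upstream means lower coordinates.
        lo, hi = max(0, gene.start - window), gene.start
    else:
        # On the minus strand the 5' end is the gene end; upstream means higher coordinates.
        lo, hi = gene.end, gene.end + window
    return [
        c for c in csts
        if c.chrom == gene.chrom and c.start >= lo and c.end <= hi
    ]
```

Note that the upstream direction depends on the strand: for a minus-strand gene the window extends past the gene's highest coordinate rather than before its lowest.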
Access CST data by sequence features:
DNA features. About 50% of CSTs do not correspond to previously known exon sequences. At least some of them are expected to be transcriptional regulatory regions or elements involved in chromosome structure and function.
RNA features. CSTs may be searched for the presence of conserved RNA secondary structures as determined by a prediction program. A CST with a conserved RNA secondary structure is expected to be transcribed and active as RNA.
Coding regions. About 30% of CSTs contain coding sequences according to ENSEMBL exon definitions, but a sizeable fraction of the remaining 70% retains significant coding potential when analysed with prediction software. These include novel exons belonging to known genes or to novel genes.
Search CST by ID

Search gene by symbol or keyword

View all analysed genes
Search by sequence

Conserved Sequence Tags Database