PUP

Evolution and Variation of the SARS-CoV Genome

Genomics, Proteomics & Bioinformatics
Volume 1, Issue 3, August 2003, Pages 216-225

Jianfei Hu, Jing Wang, Jing Xu, Wei Li, Yujun Han, Yan Li, Jia Ji, Jia Ye, Zhao Xu, Zizhang Zhang, Wei Wei, Songgang Li, Jun Wang, Jian Wang, Jun Yu, Huanming Yang.

Abstract

Knowledge of the evolution of pathogens is of great medical and biological significance for the prevention, diagnosis, and therapy of infectious diseases. To understand the origin and evolution of the SARS-CoV (severe acute respiratory syndrome-associated coronavirus), we collected the complete genome sequences of all viruses available in GenBank and compared them with the SARS-CoV. Genomic signature analysis demonstrates that all coronaviruses except the SARS-CoV have TGTT as their most abundant tetranucleotide. A detailed analysis of forty-two complete SARS-CoV genome sequences revealed the existence of two distinct genotypes and showed that these isolates could be classified into four groups. Our manual analysis of the BLASTN results demonstrates that the HE (hemagglutinin-esterase) gene exists in the SARS-CoV, although many mutations have made it difficult to recognize.
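The tetranucleotide part of the genomic signature analysis described in this abstract can be illustrated with a minimal sketch. It is not the authors' program: the function names `tetranucleotide_counts` and `most_abundant` are hypothetical, only one strand is counted, windows containing ambiguous bases are skipped by assumption, and the input is a toy string rather than a real genome.

```python
from collections import Counter


def tetranucleotide_counts(seq, k=4):
    """Count overlapping k-mers (k=4 by default) on the given strand.

    Assumption: windows containing anything other than A, C, G, T
    (for example N) are skipped rather than counted.
    """
    seq = seq.upper()
    valid = set("ACGT")
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= valid:
            counts[kmer] += 1
    return counts


def most_abundant(seq, k=4):
    """Return (k-mer, count) for the most frequent k-mer, or None if there is none."""
    counts = tetranucleotide_counts(seq, k)
    if not counts:
        return None
    return counts.most_common(1)[0]


if __name__ == "__main__":
    toy = "TGTTTGTTACGT"  # toy sequence, not real viral data
    print(most_abundant(toy))
```

Applied to a complete genome sequence read from a FASTA file, the same counter would give the tetranucleotide frequency profile that such a comparison between viruses relies on.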

Key words

SARS, SARS-CoV, motif frequency profile, genomic signature, Chaos Game Representation, PUP

Genome Organization of the SARS-CoV

Genomics, Proteomics & Bioinformatics
Volume 1, Issue 3, August 2003, Pages 226-235

Jing Xu, Jianfei Hu, Jing Wang, Yujun Han, Yongwu Hu, Jie Wen, Yan Li, Jia Ji, Jia Ye, Zizhang Zhang, Wei Wei, Songgang Li, Jun Wang, Jian Wang, Jun Yu, Huanming Yang.

Abstract

Annotation of the genome sequence of the SARS-CoV (severe acute respiratory syndrome-associated coronavirus) is indispensable for understanding its evolution and pathogenesis. We have performed a full annotation of the SARS-CoV genome sequences using annotation programs that are publicly available or that we developed ourselves. In total, 21 open reading frames (ORFs) of genes or putative uncharacterized proteins (PUPs) were predicted. Seven PUPs had not been reported previously, and two of them were predicted to contain transmembrane regions. Eight ORFs partially overlapped with or were embedded in those of known genes, revealing that the SARS-CoV genome is small and compact, with overlapping coding regions. The most striking discovery is that one ORF is located on the minus strand. We have also annotated the non-coding regions and identified the transcription regulating sequences (TRS) in the intergenic regions. The analysis of the TRS supports the minus-strand extending transcription mechanism of coronaviruses. The SNP analysis of different isolates reveals that sequence mutations do not affect the ORF prediction results.
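To make the idea of predicting ORFs on both the plus and the minus strand concrete, here is a minimal sketch of a naive six-frame scan. It is not the annotation pipeline used by the authors: the names `reverse_complement` and `find_orfs` and the `min_codons` threshold are hypothetical, an ORF is assumed to run from an ATG to the first in-frame stop codon, and coordinates are reported on the scanned strand (0-based, end-exclusive).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=2):
    """Naive ORF scan over three frames on each strand.

    Returns tuples (strand, start, end, orf_sequence), where start and end
    are 0-based, end-exclusive positions on the strand that was scanned.
    min_codons counts the codons before the stop codon, ATG included.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # open an ORF at the first ATG seen
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None  # close the ORF and look for the next ATG
    return orfs


if __name__ == "__main__":
    # Toy sequence whose only ORF lies on the minus strand.
    for orf in find_orfs("CTATTTCAT"):
        print(orf)
```

Because all frames of both strands are scanned independently, ORFs that overlap or are embedded in one another are reported separately, which is the kind of compact arrangement the abstract describes. A real annotation would use a much larger minimum length and additional evidence.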

Key words

SARS-CoV, genome annotation, transcription, ORF, PUP, TRS