ZCURVE_CoV: a new system to recognize protein coding genes in coronavirus genomes, and its applications in analyzing SARS-CoV genomes

Biochemical and Biophysical Research Communications
Volume 307 (2003), Pages 382–388

Ling-Ling Chen, Hong-Yu Ou, Ren Zhang, and Chun-Ting Zhanga.

Abstract

A new system to recognize protein coding genes in the coronavirus genomes, specially suitable for the SARS-CoV genomes, has been proposed in this paper. Compared with some existing systems, the new program package has the merits of simplicity, high accuracy, reliability, and quickness. The system ZCURVE_CoV has been run for each of the 11 newly sequenced SARS-CoV genomes. Consequently, six genomes not annotated previously have been annotated, and some problems of previous annotations in the remaining five genomes have been pointed out and discussed. In addition to the polyprotein chain ORFs 1a and 1b and the four genes coding for the major structural proteins, spike (S), small envelop (E), membrane (M), and nuleocaspid (N), respectively, ZCURVE_CoV also predicts 5–6 putative proteins in length between 39 and 274 amino acids with unknown functions. Some single nucleotide mutations within these putative coding sequences have been detected and their biological implications are discussed. A web service is provided, by which a user can obtain the annotated result immediately by pasting the SARS-CoV genome sequences into the input window on the web site (http://tubic.tju.edu.cn/sars/). The software ZCURVE_CoV can also be downloaded freely from the web address mentioned above and run in computers under the platforms of Windows or Linux.  2003 Elsevier Inc. All rights reserved.

Keywords

Coronavirus, Severe acute respiratory syndrome, SARS-CoV, Genome, Gene-finding, Mutation