
databases

An artificial intelligence approach to DNA sequence feature recognition

Richard J. Mural, J. Ralph Einstein, Xiaojun Guan, Reinhold C. Mann and Edward C. Uberbacher

The ultimate goal of the Human Genome Project is to extract the biologically relevant information recorded in the estimated 100,000 genes encoded by the 3 × 10⁹ bases of the human genome. This requires the development of reliable computer-based methods capable of analysing the vast amounts of DNA-sequence data generated and correctly identifying the genes within them. Such tools may save time and labour by simplifying, for example, the screening of cDNA libraries. They may also facilitate the localization of human disease genes by identifying candidate genes in promising regions of anonymous DNA sequence.

The application of machine learning technology in DNA-sequence analysis is a relatively recent development. Although computer-based recognition of features in DNA sequences can be difficult, we know this should be possible since organisms recognize and utilize the … has made use of two basic approaches to sequence-pattern recognition: signal …

… with exon-recognition programs is the precise identification of the exon–intron interfaces and the exact ‘start’ and ‘stop’ points of exons. Although by no means an easy task, programs to tackle these problems are being developed. When achieved, such ‘gene-assembly’ programs will be able not only to identify the exons, but also to assemble a model of a gene from its component parts and predict the encoded proteins. The low false-positive rate in feature recognition that is characteristic of the CRM is also important in designing these expert-system gene-assembly programs because of the combinatorial nature of the analysis: every spurious candidate exon increases the number of alternative gene models that must be evaluated.
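To make the combinatorial point concrete, the following is a minimal Python sketch; it is our illustration, not the authors' gene-assembly method. It assumes, purely for illustration, that each candidate exon is a (start, end) interval and that any left-to-right chain of non-overlapping candidates counts as one possible gene model. Real assembly would also check reading frames and splice-site compatibility. The function name `count_gene_models` and all coordinates are hypothetical.

```python
from typing import List, Tuple

# Hypothetical candidate exon: (start, end), 0-based, inclusive coordinates.
Exon = Tuple[int, int]


def count_gene_models(candidates: List[Exon]) -> int:
    """Count non-empty, left-to-right chains of non-overlapping candidate exons.

    Each chain stands for one possible gene model. This simplified sketch
    ignores reading frames, splice-site compatibility and exon scores.
    """
    exons = sorted(candidates)
    # chains_ending[i] = number of chains whose last exon is exons[i]
    chains_ending: List[int] = []
    for i, (start_i, _end_i) in enumerate(exons):
        total = 1  # the chain consisting of exons[i] alone
        for j in range(i):
            if exons[j][1] < start_i:  # exons[j] ends before exons[i] begins
                total += chains_ending[j]
        chains_ending.append(total)
    return sum(chains_ending)


if __name__ == "__main__":
    true_exons = [(0, 99), (200, 299), (400, 499)]
    false_hits = [(150, 180), (320, 350), (550, 600)]  # hypothetical false positives
    print(count_gene_models(true_exons))               # 7
    print(count_gene_models(true_exons + false_hits))  # 63
```

When all n candidates are mutually compatible, as in this example, the count is 2ⁿ − 1, so three spurious hits raise the number of candidate models from 7 to 63. A recognizer with few false positives keeps this search space, and hence the assembly step, tractable.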

“It might be a junk sequence, but at least you can dance to it!”

Acknowledgements
This work was sponsored by the Office of Health and Environmental Research, US Department of Energy, under contract DE-AC05-84OR21400 with Martin Marietta Energy Systems, Inc., and we are particularly grateful for support from the Exploratory Studies Program of the Oak Ridge National Laboratory.


TIBTECH JAN/FEB 1992 (VOL 10)
