© 1992 Oxford University Press

Nucleic Acids Research, Vol. 20, Supplement 2009-2011

Molecular Probe Database: a database on synthetic oligonucleotides

Ottavia Aresu1, Barbara Parodi1, Paolo Romano2, Massimo Romani1, Giovanna Angelini1, Assunta Manniello1, Beatrice Iannotta1, Gabriella Rondanina1, Tiziana Ruzzon1 and Leonardo Santi1,2

1Istituto Nazionale per la Ricerca sul Cancro, viale Benedetto XV 10, 16132 Genova and 2Istituto di Oncologia Clinica e Sperimentale, Università degli Studi, viale Benedetto XV 10, 16132 Genova, Italy

ABSTRACT The Molecular Probe Data Base (MPDB) is designed to collect and make available on-line information on synthetic oligonucleotides. This paper briefly describes the purpose of MPDB, its content and structure, forms and mode of data distribution, and a series of additional services available to scientists using MPDB.

INTRODUCTION The swift development of methods for the rapid and efficient synthesis of oligonucleotides, together with the advent of molecular cloning techniques, has had a profound effect on many biological studies. Synthetic DNA has become an indispensable tool in biology laboratories for a wide range of applications, such as molecular diagnosis of genetic and infectious diseases, bone marrow transplantation, genetic polymorphism studies, oncogene identification, paternity testing and forensic medicine. As a consequence, not only molecular biologists from universities and industry, but also clinicians are potentially interested in acquiring information on the nucleotide sequence, potential applications, technical features and availability of tested synthetic oligonucleotides. An easily accessible, up-to-date database, available on-line, can therefore make it easier to choose the most appropriate sequence. In this framework, the Molecular Probe Data Base has been developed within the Interlab Project (1), a joint University-Industry project funded by the Italian Ministry for University and Scientific and Technological Research as part of the improvement of Italian research infrastructures. Additional services of biomedical interest are provided by this project, in particular access to databases on animal cell lines (Cell Line Data Base) (2) and HLA typed B lymphoblastoid cell lines (B Line Data Base) (3), as well as electronic mail and news.

MPDB CONTENTS AND STRUCTURE Data on oligonucleotides with sequences of up to 100 nucleotides can be recorded in MPDB. As a first step, data were collected from research laboratories within the National Institute for Cancer Research of Genoa; written questionnaires were later sent to the main academic and industrial molecular biology laboratories. Further data are being taken from the literature.

Information on oligonucleotide identification, target gene, origin, technical aspects, applications and availability is included. Table 1 shows the data structure of MPDB.

Identification is based on the name of the oligonucleotide, as assigned by the data submitter, and on the nucleotide sequence. When the sequence refers to a coding region, the starting point of the corresponding peptide can be indicated. The amino acid sequence, expressed in the standard three-letter abbreviations (4), is then generated automatically by the computer.

Information on more than one target gene can be specified for a single oligonucleotide. Each gene is described by its name and code (taken from HGM10 (5) in the case of human genes), map location, EMBL/GenBank sequence accession number (6,7) and, for polymorphic genes, recognized allelic variants. Names of allelic variants are derived, when possible, from the nomenclatures of international workshops. This is the case, for example, for HLA genes (8).

Technical data include melting temperature and NaCl molarity. When this information is not provided by the data submitter, values are automatically estimated by the system, according to Sambrook et al. (9). Origin data include bibliographic references and details on the laboratory that either submitted the data or synthesized the oligonucleotide. More than one application can be recorded for each oligonucleotide. Applications are taken from a vocabulary of controlled terms that includes the most frequent research, clinical and industrial applications. Finally, data on availability include the oligonucleotide code in catalogues and the addresses of the distributors.

Data recorded in MPDB are structured in a fairly complex framework, designed according to relational database theory. Its main purposes are: i) to optimize searches (both in speed and accuracy), ii) to reduce disk usage (avoiding repetition of data), iii) to improve the user interface, and iv) to minimize errors (orthography, punctuation, etc.). In this context, a table has been built for each logical entity involved in MPDB, i.e. oligonucleotides, genes and laboratories, and for those attributes that can assume a particular value only

Table 1. Basic data set and MPDB data structure

Field name                          Data type

Identification
  Name                              free text
  Length                            integer@
  Nucleotide sequence               intl coding
  Starting nucleotide               integer
  Aminoacid sequence                intl coding@
  Bibliography                      free text

Target gene
  Gene                              encoded
  Species                           encoded
  Chrom. localization               intl coding
  Polymorphic                       encoded
  Notes on polymorphism             free text
  Localization within the gene      free text
  Recognized specificities          encoded

Technical features and availability
  NaCl molarity                     integer
  Melting temperature               integer@
  Applications                      encoded
  Laboratory                        encoded
  Notes                             free text
  Distributor                       encoded
  Distr. catalogue code             free text

[Further columns: mnem. code§, insert data set*, query key, multi record^. Their per-field 'yes' entries cannot be assigned to individual rows in this copy.]

§ Availability of a mnemonic code
* Mandatory fields
^ More than a definition per oligonucleotide can be inserted
@ Data automatically determined by the system on the basis of the nucleotide sequence
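The fields marked @ in Table 1 are derived from the nucleotide sequence. As a minimal sketch of how such values could be computed (not the MPDB implementation), the Python below derives the length, the three-letter amino acid sequence from a given starting nucleotide, and a melting temperature estimate. The function names are hypothetical and the standard genetic code is assumed. The melting temperature formula is one widely used empirical estimate; the text states only that values follow Sambrook et al. (9), so the choice of formula and the expression of the salt concentration in moles per litre are assumptions.

```python
import math
from itertools import product

# Standard genetic code, built in TCAG order (assumed; not stated in the paper).
BASES = "TCAG"
AMINO_1 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_1)}

# One-letter to three-letter abbreviations; "Ter" marks a stop codon.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def translate(sequence, start=1):
    """Translate complete codons, beginning at the 1-based position `start`."""
    seq = sequence.upper()
    codons = (seq[i:i + 3] for i in range(start - 1, len(seq) - 2, 3))
    return "".join(THREE_LETTER[CODON_TABLE[codon]] for codon in codons)


def estimate_tm(sequence, na_molar):
    """Empirical estimate (assumed formula):
    Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 600/N
    """
    seq = sequence.upper()
    n = len(seq)
    gc_percent = 100.0 * (seq.count("G") + seq.count("C")) / n
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 600.0 / n


if __name__ == "__main__":
    oligo = "GAATTACCTTTTCCAGGGA"  # sequence shown in Fig. 1
    print(len(oligo))
    print(translate(oligo, start=2))
    print(round(estimate_tm(oligo, na_molar=0.9), 1))
```

For the sequence shown in Fig. 1, `translate(oligo, start=2)` returns `AsnTyrLeuPheGlnGly`, the peptide listed in that report. With an assumed 0.9 M NaCl, `estimate_tm` gives about 66.4, close to the value of 66 reported in Fig. 1, although this does not establish which formula or salt concentration MPDB used.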

Molecular Probe Data Base        June 1991        Interlab Project

oligonucleotides detailed description

DPB DB10        IST-IG51
5' GAATTACCTTTTCCAGGGA 3'
AsnTyrLeuPheGlnGly

Target gene: HLA-DPB1 - Major histocompatibility complex, class II, DP beta 1
Human
Chromosome localization: 6p21.3
19 allelic variants in the second exon
Oligo localization: Second exon, nucleotides 15-34
Recognized specificities: DPB-7 - DPB1*0201 - DPB1*0202 - DPB1*0401 - DPB1*0402 - DPB1*0501 - DPB1*0801 - DPB1*1901

Melting temperature: 66 C
Applications: B cell surface molecules - Bone marrow transplantation - Genetic polymorphism - HLA Class II - Forensic medicine - Immune system
Data received from: IMMGE - Istituto Nazionale per la Ricerca sul Cancro - Genova
Bibliography: J. Immunol. 1988, 141:4024

Fig. 1. Report of the oligonucleotide detailed description. Mnemonic codes are used instead of the description for the laboratory.

within a set of predefined values, e.g. species, applications and allelic variants of a gene. In the latter case, for each item of the tables (also referred to as controlled vocabularies) a mnemonic code is provided as well, to facilitate and speed up searches and insertion of data. Vocabularies can only be updated by the database administrator, but end users and data submitters can suggest extensions on the basis of their experience. The tables of interest can be browsed by choosing the corresponding option in the repertories menu.

For the development of the database we have used 700 oligonucleotides, which cover a number of different applications. At present MPDB contains mainly information on highly polymorphic human sequences, with particular reference to the Major Histocompatibility Complex (MHC), chromosome-specific PCR amplimers and oligonucleotides useful in the diagnosis of genetic diseases. About 100 viral sequences, which can be employed for the diagnosis of human infectious diseases, have also been included. The database is constantly updated and we plan to add a minimum of 2000 oligonucleotides per year. However, we feel that the ultimate goal of this database could be the establishment of an international network between laboratories to make information rapidly available on different areas, with special reference to the Human Genome Project, animal diseases and plant biology. In this context this database could be an important step toward the construction of the 'Common Language' for the physical mapping of the human genome (10).
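The relational layout described above (one table per logical entity, plus controlled vocabularies keyed by mnemonic codes) can be illustrated with a small sketch. The paper does not give the actual Oracle schema, so every table and column name below is hypothetical, and SQLite is used in place of Oracle only to keep the example self-contained. The laboratory code IMMGE and the gene data come from Fig. 1; the application codes BMT and FOR are invented for illustration.

```python
import sqlite3

# Hypothetical schema: entity tables, controlled vocabularies keyed by a
# mnemonic code, and link tables so that one oligonucleotide can carry
# several target genes and several applications.
SCHEMA = """
CREATE TABLE laboratory (
    mnemonic    TEXT PRIMARY KEY,
    description TEXT NOT NULL
);
CREATE TABLE application (
    mnemonic    TEXT PRIMARY KEY,
    description TEXT NOT NULL
);
CREATE TABLE gene (
    code         TEXT PRIMARY KEY,
    name         TEXT NOT NULL,
    species      TEXT NOT NULL,
    map_location TEXT
);
CREATE TABLE oligonucleotide (
    oligo_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    sequence TEXT NOT NULL CHECK (length(sequence) <= 100),
    lab      TEXT REFERENCES laboratory(mnemonic)
);
CREATE TABLE oligo_gene (
    oligo_id  INTEGER REFERENCES oligonucleotide(oligo_id),
    gene_code TEXT REFERENCES gene(code),
    PRIMARY KEY (oligo_id, gene_code)
);
CREATE TABLE oligo_application (
    oligo_id INTEGER REFERENCES oligonucleotide(oligo_id),
    app      TEXT REFERENCES application(mnemonic),
    PRIMARY KEY (oligo_id, app)
);
"""


def build_demo_db():
    """Create an in-memory database holding one example record."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce vocabulary membership
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO laboratory VALUES (?, ?)",
        ("IMMGE", "Istituto Nazionale per la Ricerca sul Cancro, Genova"),
    )
    conn.executemany(
        "INSERT INTO application VALUES (?, ?)",
        [("BMT", "Bone marrow transplantation"), ("FOR", "Forensic medicine")],
    )
    conn.execute(
        "INSERT INTO gene VALUES (?, ?, ?, ?)",
        ("HLA-DPB1", "Major histocompatibility complex, class II, DP beta 1",
         "Human", "6p21.3"),
    )
    conn.execute(
        "INSERT INTO oligonucleotide VALUES (1, ?, ?, ?)",
        ("demo-oligo", "GAATTACCTTTTCCAGGGA", "IMMGE"),
    )
    conn.execute("INSERT INTO oligo_gene VALUES (1, 'HLA-DPB1')")
    conn.executemany(
        "INSERT INTO oligo_application VALUES (1, ?)", [("BMT",), ("FOR",)]
    )
    return conn


def oligos_for_application(conn, app_mnemonic):
    """Return oligonucleotide names recorded under an application code."""
    rows = conn.execute(
        "SELECT o.name FROM oligonucleotide o "
        "JOIN oligo_application oa ON oa.oligo_id = o.oligo_id "
        "WHERE oa.app = ? ORDER BY o.name",
        (app_mnemonic,),
    )
    return [row[0] for row in rows]
```

Storing each vocabulary term once and referring to it by its mnemonic code is what avoids repeated text and spelling variants; with foreign keys enabled, an application code that is not in the vocabulary is rejected at insertion time.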

USER INTERFACE Particular care has been devoted to the design of the user interface, resulting in easy-to-understand screens and useful on-line help. Standard search and display procedures have been prepared, and the results of queries can be downloaded.

Searches The applications developed allow the user to carry out searches on the basis of: i) the name and/or distribution code of the oligonucleotides, ii) their nucleotide sequence, iii) their target gene (including the corresponding chromosome localization and specificities), and iv) their applications (see also Table 1). Mnemonic codes can be used as an alternative to the complete definition of the corresponding item. Queries are made even easier by wildcard characters, which can be used within the search text as substitutes for either a single character or any sequence of characters.
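As an illustration of the two kinds of wildcard described above, the sketch below uses the SQL `LIKE` operator, in which `_` matches a single character and `%` matches any sequence of characters. Which wildcard symbols the MPDB screens actually accept is not stated, so this mapping is an assumption; the table layout and the oligonucleotide names are hypothetical, and SQLite stands in for Oracle.

```python
import sqlite3


def make_demo_db(names):
    """Build an in-memory table holding only oligonucleotide names."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE oligonucleotide (name TEXT)")
    conn.executemany(
        "INSERT INTO oligonucleotide VALUES (?)", [(name,) for name in names]
    )
    return conn


def search_by_name(conn, pattern):
    """Return names matching a LIKE pattern ('_' = one char, '%' = any run)."""
    rows = conn.execute(
        "SELECT name FROM oligonucleotide WHERE name LIKE ? ORDER BY name",
        (pattern,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = make_demo_db(["DB10", "DB11", "DB120", "DQA3"])
    print(search_by_name(conn, "DB1_"))  # single-character wildcard
    print(search_by_name(conn, "DB%"))   # any-sequence wildcard
```

Passing the pattern as a bound parameter rather than concatenating it into the SQL text keeps user input from being interpreted as SQL.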

Reports Standard layouts for searching and displaying results have been prepared, in a format suitable for transfer to a personal computer. Thus, at the end of a search, the user can ask for a report of all the information of interest. Data can be either 'downloaded' by capturing them while they are displayed, or requested via electronic mail. Four different kinds of report can be obtained from MPDB: i) reports on all the characteristics of one specific oligonucleotide (Fig. 1), ii) reports on relevant characteristics of all oligonucleotides, iii) lists of oligonucleotide names, indexed by some relevant characteristics, and iv) lists of items included in controlled vocabularies. Oligonucleotides can be indexed by application, by species and target gene, or by target gene and recognized specificities. Lists of controlled terms are available for species, genes and specificities, applications, research laboratories and distributors.
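The third kind of report, a list of oligonucleotide names indexed by a chosen characteristic, amounts to a simple grouping step. The sketch below shows one way to build such an index in Python; the record layout, field names and sample records are hypothetical and are not taken from MPDB.

```python
from collections import defaultdict


def index_by(records, field):
    """Group oligonucleotide names under each value of `field`.

    A field may hold one value (a string) or several (a list), since an
    oligonucleotide can have more than one application or target gene.
    """
    index = defaultdict(list)
    for record in records:
        values = record[field]
        if isinstance(values, str):
            values = [values]
        for value in values:
            index[value].append(record["name"])
    # Sort both the index terms and the names listed under each term.
    return {term: sorted(names) for term, names in sorted(index.items())}


if __name__ == "__main__":
    records = [
        {"name": "oligo-B", "species": "Human",
         "applications": ["Forensic medicine", "Genetic polymorphism"]},
        {"name": "oligo-A", "species": "Human",
         "applications": ["Forensic medicine"]},
    ]
    for term, names in index_by(records, "applications").items():
        print(term + ": " + ", ".join(names))
```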

TECHNICAL DESCRIPTION MPDB has been developed with the relational database management system Oracle, which is available on many computers (from personal computers to mainframes) and uses the standard query language SQL (Structured Query Language). The Unix operating system has been adopted as well. These choices give the system good modularity and portability. A medium-power microcomputer, allowing for 16 concurrent users, is used for the development stage. MPDB is hosted both in the Interlab Project Data Processing Center (IPPC), which is located at the National Institute for Cancer Research, and in the Ansaldo Data Processing Center (ADPC). IPPC can be reached via Internet, while ADPC can only be reached through the Public Packet Switching Data Network (PSDN, ITAPAC in Italy), which is internationally available. In both cases, DEC VT100 terminal emulation software should be used in order to make full use of the Oracle functions. There is no particular limitation on the hardware used by end users. To access MPDB, login accounts must be requested from the Interlab Project User Service. All users recognized by MPDB through the login/password procedure will be directly prompted by its applications and, depending on their rights, will be able to perform appropriate tasks.

ACKNOWLEDGEMENTS This work was supported by a grant from the Italian Ministry for University and Scientific and Technological Research (DM 02081988).

CONTACT Enquiries about MPDB should be addressed to:

Interlab Project User Service
Servizio Tecnologie Biomediche
Istituto Nazionale per la Ricerca sul Cancro
viale Benedetto XV 10, 16132 Genova, ITALY
tel. +39(10)3534511/2/3/4/5
fax: +39(10)352888/352999
E-mail: [email protected] (also 130.251.201.2)
Telecom Gold: 75:DBI0503

Questionnaires can be requested from the above address, to which data should also be sent. If many oligonucleotides are to be submitted, a DOS diskette containing the relevant data in ASCII format can also be submitted. Please ask for specific instructions.

REFERENCES
1. Parodi, B., Romano, P., Aresu, O., Manniello, A., Vitiello, E., Iannotta, B., Ruzzon, T. and Santi, L. (1990) Chemistry today, 8, 23-25.
2. Aresu, O., Iannotta, B., Parodi, B., Romano, P. and Ruzzon, T. (1991) In Spier, R.E., Griffiths, J.B., Meignier, B. (eds.), Production of biologicals from animal cells in culture. Butterworth-Heinemann Ltd, pp. 49-51.
3. Aresu, O., Iannotta, B., Maschio, C., Ottazzi, G., Parodi, B., Romano, P., Ruzzon, T. and Ferrara, G.B. (1990) Proceedings of the 4th European Histocompatibility Conference, Council of Europe Eds., p. 74.
4. Dayhoff, M.O. (ed) (1978) Atlas of protein sequence and structure, vol 5., suppl. 3, 26.
5. New Haven Human Gene Mapping Library-Chromosome Plots, (1989) Howard Hughes Medical Institute and Yale University, number 5, HGM 10.
6. Cameron, G.N. (1988) Nucleic Acids Res., 16, 1865-1867.
7. Bilofsky, H.S., Burks, C. (1988) Nucleic Acids Res., 16, 1861-1863.
8. Bodmer, J.G., Marsh, S.G.E., Albert, E.D., Bodmer, W.F., Dupont, B., Erlich, H.A., Mach, B., Mayr, W.R., Parham, P., Sasazuki, T., Schreuder, G.M.Th., Strominger, J.L., Svejgaard, A., Terasaki, P.I. (1991) Tissue Antigens, 37, 97-104.
9. Sambrook, J., Fritsch, E.F. and Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual. Book 2, 2nd Edition. Cold Spring Harbor Laboratory Press.
10. Olson, M., Hood, L., Cantor, C., Botstein, D. (1989) Science, 245, 1434-1435.
