Pediatr Nephrol DOI 10.1007/s00467-014-2994-1

REVIEW

The Human Variome Project: ensuring the quality of DNA variant databases in inherited renal disease

Judy Savige & Raymond Dalgleish & Richard GH Cotton & Johan T den Dunnen & Finlay Macrae & Sue Povey

Received: 17 August 2014 / Revised: 9 October 2014 / Accepted: 15 October 2014
© IPNA 2014

Abstract A recent review identified 60 common inherited renal diseases caused by DNA variants in 132 different genes. These diseases can be diagnosed with DNA sequencing, but each gene probably also has a thousand normal variants. Many more normal variants have been characterised by individual laboratories than are reported in the literature or found in publicly accessible collections. At present, testing laboratories must assess each novel change they identify for pathogenicity, even when this has been done elsewhere previously, and the distinction between normal and disease-associated variants is particularly an issue with the recent surge in exomic sequencing and gene discovery projects. The Human Variome Project recommends the establishment of gene-specific DNA variant databases to facilitate the sharing of DNA variants and decisions about likely disease causation. Databases improve diagnostic accuracy and testing efficiency, and reduce costs. They also help with genotype–phenotype correlations and predictive algorithms. The Human Variome Project advocates databases that use standardised descriptions, are up-to-date, include clinical information and are freely available. Currently, the genes affected in the most common inherited renal diseases correspond to 350 different variant databases, many of which are incomplete or have insufficient clinical details for genotype–phenotype correlations. Assistance is needed from nephrologists to maximise the usefulness of these databases for the diagnosis and management of inherited renal disease.

Keywords DNA variant database · Genetic variant · Inherited disease · Phenotype · Pathogenicity

J. Savige · F. Macrae
The University of Melbourne, Melbourne Health, Melbourne, Australia

R. Dalgleish
Department of Genetics, University of Leicester, Leicester, UK

R. G. Cotton
Human Variome Project, The University of Melbourne, Melbourne, Australia

J. T. den Dunnen
Human and Clinical Genetics, Leiden University Medical Center, Leiden, The Netherlands

F. Macrae
Colorectal Medicine and Genetics, The Royal Melbourne Hospital, Parkville, Australia

S. Povey
Research Department of Genetics, Evolution and Environment, University College London, London, UK

J. Savige (*)
Department of Medicine, Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC 3050, Australia
e-mail: [email protected]

Introduction

A recent review identified 60 inherited renal diseases caused by DNA variants in more than 130 different genes [1], and many further genes have been identified since that report. These inherited diseases can be diagnosed with sequencing, but human genes each have up to one thousand normal variants, and many more continue to be identified by the “1,000 Genomes Project” [2] and by the use of exomic sequencing for clinical diagnostics [3]. Thus, genetic testing laboratories not only detect DNA variants, but must also assess them for pathogenicity. This usually requires a search of the scientific literature and on-line collections for a previous report, but assessment is also based on the location of the variant within the gene, the nature of the affected residue, whether it is conserved in different species and, sometimes, on further testing of other family members or on functional analyses. Diagnostic laboratories typically use several programs that calculate the likely consequences of any change and that provide links to variant scoring algorithms (SIFT, PolyPhen-2, SpliceSite Finder etc.), to databases for normal and pathogenic variants (LOVD locus-specific databases, HGMD Professional, Cafe Variome, dbSNP), to OMIM and to references in PubMed.

The pathogenicity assessment of an unpublished variant takes about an hour of a scientist’s time, and laboratory protocol requires the results to be confirmed independently. This represents a substantial workload for a busy department, and the work may be replicated many times throughout the world for the same unreported variant. This approach results in unnecessary delays, equivocation and potentially errors in reporting, and in further expense for the patient, health provider, insurer and community. For some genes, especially for missense variants, the assessment is even more complicated. For example, in bowel cancer, the Variant Interpretation Committee of the International Society for Gastrointestinal Hereditary Tumours (InSiGHT) meets every 2 months by teleconference to assign pathogenicity to variants against a hierarchy of criteria.

Most service laboratories do not routinely submit their mutations to a database, and journals rarely publish small series of DNA changes. Furthermore, privately owned laboratories are often reluctant to share their variants and the corresponding assessments of pathogenicity, which they consider their intellectual property. Thus, ten times more variants are estimated to be known to individual laboratories than are found in the literature or publicly accessible collections [4].

The Human Variome Project

The Human Variome Project has as its mission “the sharing of all genetic variants to the widest possible community in gene- or disease-specific databases that are up-to-date, accurately assessed for pathogenicity, and that incorporate clinical features” [5]. It is often considered the successor to the Human Genome Project, and its members are volunteer physicians, geneticists, laboratory scientists and researchers who aim to prevent the duplication involved in assessing pathogenicity by encouraging laboratories to share their variants and decisions about disease causation in databases. However, variant databases also have many other applications:

• To confirm the diagnosis of an inherited disease when a previously described pathogenic variant is detected
• To increase the accuracy and speed of genetic testing through access to expertly curated databases of variants annotated with a transparent process of pathogenicity assignment. To thereby reduce the costs of testing for the laboratory, and for the patient, health service, insurer and community
• To increase the number of known variants and hence improve genotype–phenotype correlations. To enable the development of algorithms that predict clinical features from mutation characteristics
• To improve our understanding of the structure and function of a protein through analysis of the effects of mutations on phenotype
• To improve our understanding of the prevalence of an inherited disease and, hence, its economic cost

The Human Variome Project has developed recommendations for the establishment and maintenance of DNA variant databases (locus-specific databases, LSDB), and practical advice on how to do this [6]. Members of the Human Variome Project have developed a prototype request form for genetic tests with prompts for disease-specific clinical features, and for consent to making test results available to other family members and for database inclusion. Members at the Leiden University Medical Center (LUMC) have developed a “database-in-a-box” system (Leiden Open Variation Database, LOVD), which can be downloaded, customised and hosted on a website, and is user-friendly and free [7, 8]. Typically collected data include the DNA and corresponding protein change, method of detection, ethnicity, clinical associations (including age at onset of renal failure), assessment of pathogenicity or prevalence in normal individuals, previous descriptions and contact details of the submitting laboratory. Some fields may be accessible only to registered users. LUMC also hosts LOVD databases on its server, and provides website management, security, technical support and access to software [9] that checks variant descriptions for compliance with the Human Genome Variation Society recommendations [10]. LOVD installations managed by LUMC and by external sites currently host databases with more than two million unique variants from 163,046 individuals (http://www.lovd.nl/3.0/home). More than 95 % of all gene variant databases world-wide and 350 different curators use the LOVD system, and these databases have been cited in more than 1,000 scientific publications.

The Human Variome Project has outlined the following steps in establishing and maintaining a gene variant database:

1. Search for an existing database, for example, at http://genesymbol.lovd.nl/ (where “genesymbol” is the HGNC Gene Symbol for the gene of interest [11]), http://www.hgvs.org/dblist/glsdb.html or http://www.gen2phen.org/data/lsdbs, and confirm that any database is active, up to date, includes clinical data and is curated by individuals with a special interest in the gene or disease (“experts”).
2. Where a database already exists, collaborate with its curator; where it is inactive, offer to curate it. Choose a joint curator with complementary skills so that together you describe a variant accurately, and can decide on its likely pathogenicity based on both genetic and clinical features [12].
3. Decide whether to use the LOVD database with the LUMC server and management, a different database and server, or a hybrid (for example, the LOVD database hosted on your own institution’s server). The server must be secure, and only the database curators and manager should be able to modify database content [6]. LUMC is funded by the European Commission to support LOVD development, infrastructure, software, server, backup, management and security, without cost to the curators or users.
4. Describe variants with reference to the Locus Reference Genomic (LRG) (http://www.ncbi.nlm.nih.gov/pubmed/24285302) [13] or GenBank RefSeq [14] sequence using standardised nomenclature [10]. Check variant descriptions (for example, amino acid alterations) for accuracy using tools such as Mutalyzer [9] on the LUMC server.
5. Include the gene name and symbol used by the HUGO Gene Nomenclature Committee [11] (with further updates at http://www.genenames.org/guidelines.html) on the database webpage as well as links to sites with further relevant clinical or structural protein information (OMIM, UniProt, UCSC and Ensembl viewers etc.). Share data with these central repositories. For example, variants included in the LUMC databases can be viewed using the UCSC site.
6. Include on the webpage the database policy, a disclaimer to limit liability, and instructions for database citation [15, 16].
7. Decide on data fields. Many curators already have their own spreadsheets, and data fields mainly vary regarding clinical features. Although adding more fields increases the usefulness of the database, more fields are also harder to populate fully [4]. Describe each variant at the DNA level and its predicted effect on the corresponding mRNA and protein using the appropriate reference sequences [6].
8. Include both pathogenic and normal variants in the database, and include each variant every time it is found in a different family. Include any functional evidence of pathogenicity.
9. Ensure that clinical information is accurate and complete. Use standardised descriptions of clinical features (The Human Phenotype Ontology: http://www.humanphenotype-ontology.org/) to facilitate data retrieval and analysis [17]. De-identify patient data. Fields with potentially identifiable data should be accessible only to professionals. An independent Ethics Oversight Committee may help with monitoring and advice on these issues [18].
10. Develop strategies to increase the number of variants in the database: distribute a spreadsheet with relevant data fields to potential collaborators; offer to visit their laboratories to access their in-house spreadsheets; provide co-authorship of “mutation update” manuscripts to those who submit variants; and search meeting abstracts, published manuscripts, and the dbSNP, exomic sequencing and 1,000 Genomes Project websites. Develop the process of microattribution to formally acknowledge submission of unpublishable packages of data.
11. Devise criteria for pathogenicity, include them on the website and score all variants accordingly. In general, the pathogenicity of any variant should be confirmed by at least two different means. Segregation with disease alone is insufficient. Guidelines for predicting the pathogenicity of missense mutations already exist for some diseases [19].
12. Provide free access, possibly with greater access for health professionals than for members of the public.
13. Review and update the database regularly.

The Human Variome Project encourages experts working on a gene to collaborate on mutation reviews that include their laboratories’ unpublished variants and to undertake genotype–phenotype analyses. Most curators focus on one or a few related genes, and can personally encourage colleagues working in the field to contribute their variants to a database. There are also more sophisticated approaches for sharing variants, such as Café Variome (http://www.cafevariome.org/), which provides an automated interface between a laboratory’s variant storage system and a remote database via a data clearing house. The potential benefits for accuracy and efficiency mean that the submission of variants to a database could be mandated for laboratory accreditation. Other incentives for contribution to databases include “microattribution” [20, 21], where the laboratory and individual responsible for each submission are acknowledged. Finally, some journals already require, as a condition of publication, the prior submission of all variants to a database.

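
The data fields and the pathogenicity rule in the steps above can be sketched in code. What follows is a minimal illustration under stated assumptions, not the LOVD schema: the class `VariantEntry`, its field names, the evidence labels and the example variant are all hypothetical, chosen only to mirror the recommendations that each family report is stored separately and that pathogenicity needs at least two different lines of evidence.

```python
from dataclasses import dataclass, field

# Hypothetical evidence label; a real database would publish its own criteria.
SEGREGATION = "segregation"


@dataclass
class VariantEntry:
    """One report of a variant in one family (illustrative fields only)."""
    gene: str            # HGNC gene symbol
    dna_change: str      # description at the DNA level
    protein_change: str  # predicted effect on the protein
    family_id: str       # de-identified family code
    evidence: set = field(default_factory=set)  # distinct lines of evidence


def meets_two_evidence_rule(entry: VariantEntry) -> bool:
    """True if pathogenicity is supported by at least two different means.

    Segregation with disease is a single line of evidence, so an entry
    supported by segregation alone never passes.
    """
    return len(entry.evidence) >= 2


def families_per_variant(entries):
    """Count the distinct families reporting each (gene, DNA change) pair."""
    families = {}
    for e in entries:
        families.setdefault((e.gene, e.dna_change), set()).add(e.family_id)
    return {key: len(ids) for key, ids in families.items()}


# Made-up example data, for illustration only.
reports = [
    VariantEntry("GENE1", "c.100G>A", "p.(Gly34Ser)", "FAM-001", {SEGREGATION}),
    VariantEntry("GENE1", "c.100G>A", "p.(Gly34Ser)", "FAM-002",
                 {SEGREGATION, "functional_assay"}),
]
print([meets_two_evidence_rule(r) for r in reports])
print(families_per_variant(reports))
```

Storing one entry per family, rather than one per variant, is what allows the family count to be derived later; a database that keeps only the first report loses that information.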
There are many different systems, but LOVD requires minimal resources and is an easy database for non-experts to establish. However, most laboratories simply want to access a database to check the pathogenicity of variants rather than to establish one. The Human Gene Mutation Database (HGMD), hosted by Cardiff University, has both “public” and “professional” sites (http://www.hgmd.cf.ac.uk/docs/). Access to the professional site includes advanced search options and current data, but incurs a charge. Data on the public site are at least 3 years behind data available in the literature. Recently, the US National Center for Biotechnology Information (NCBI) established the ClinVar database (http://www.ncbi.nlm.nih.gov/clinvar/), which has the advantages of automated registry of newly discovered normal variants, and US government funding, which means that it has free access and is sustainable. There are also other web-based collections of DNA variants (dbSNP, UniProt, etc.), but these do not encourage the submission of unpublished variants from diagnostic laboratories, have limited clinical information, and are more likely to have annotation or transcription errors [22, 23].

DNA variant databases for inherited renal diseases

The DNA variant databases for inherited renal diseases are compared in Table 1. The 60 inherited renal diseases and 130 affected genes described earlier correspond to 350 DNA variant databases [1]. Duplication of these databases is common, few are curated by experts or include clinical features, nearly half have fewer than 10 variants, only half have been updated in the previous 12 months, and some impose a charge to access their data [24].

Probably the most common monogenic inherited renal diseases are autosomal dominant polycystic kidney disease, Alport syndrome and tuberous sclerosis. The corresponding databases also have the largest numbers of variants [24]. (Two authors of this manuscript, JS and SP, curate the COL4A5 and the TSC1 and TSC2 databases respectively.) All three diseases have databases that were established by interested clinicians and that incorporate clinical features, and the information in them has been used to advance our knowledge of these conditions, especially genotype–phenotype correlations.

The autosomal dominant polycystic kidney disease mutation database (PKDB) is a repository for all pathogenic and normal variants in the PKD1 and PKD2 genes [25]. Funds for establishment and maintenance have been provided by the PKD Foundation. This database does not use the LOVD system or server, and includes the location of the variant, amino acid change, interpretation as “pathogenic”, “neutral” or “indeterminate”, age at onset of renal failure, segregation with disease within the family, other possible pathogenic variants in the family, references for published variants, and further comments. A score is provided for each variant to reflect pathogenicity. The site carries a disclaimer that the database should be used for research only because variant pathogenicity is often unproven.

There are two major databases for X-linked Alport syndrome mutations (Table 1) [26]. The ARUP database is curated by a commercial not-for-profit testing company and is updated quarterly [27]. It is not based on LOVD, and does not include clinical detail. The other major Alport database uses the LOVD system, includes databases for autosomal recessive Alport syndrome and Alport syndrome with leiomyomatosis, and incorporates clinical data, including age at onset of renal failure, hearing loss and ocular features. The LOVD Alport database has been funded in part by the US and Australian Alport Foundations, but otherwise depends on volunteers. Both databases are free, and they share variants collaboratively so that diagnostic laboratories are not disadvantaged in accessing one rather than the other. Why not simply combine the databases? Because both were developed by independent groups, who put in a lot of effort and who take pride in their achievement, and because, for the time being, this arrangement works.

The tuberous sclerosis database uses the LOVD system and includes variants in both TSC1 and TSC2. Again, its funding is from a patient support group. The curators update these databases monthly and include functional assessments of pathogenicity [28].

Which other renal diseases would benefit most from databases? The answer is those diseases that are reasonably common, but difficult to diagnose genetically, such as the inherited complement diseases (haemolytic uraemic syndrome, dense deposit disease, C3 nephropathy etc.).

Further goals and challenges

The further goals and challenges for the Human Variome Project are as follows:

• Encouraging a single database for each gene.
• Ensuring the collection of all variants, especially those from busy and commercial testing laboratories. This may be facilitated by an automated interface between laboratory storage of variants and a centralised database.
• Handling the massive amounts of data from “next-generation” and exomic sequencing.
• Ensuring that variants are collected from people of diverse ethnicities.
• Developing databases for clinically relevant syndromes, such as haematuria, nephrotic syndrome, CAKUT, etc.
• Resolving privacy and confidentiality issues in different countries and cultures with diverse legal and ethical considerations. Moving towards “open consent”, where donors acknowledge that researchers will try to ensure, but cannot guarantee, privacy.
• Extending databases to include mitochondrial and somatic variants.
• Having a rating system for databases that indicates their reliability, and how well they adhere to the Human Variome Project recommendations and standards.
• Ensuring the financial sustainability of databases.
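One of the goals listed above is a rating system for databases. The text does not define one, so the following is only a hypothetical sketch: it scores a database by how many of the four qualities advocated in the abstract (standardised descriptions, up-to-date, clinical information, free availability) it meets. The criterion keys and the function `rate_database` are assumptions made for illustration.

```python
# Hypothetical criterion keys, taken from the four qualities named in the abstract.
HVP_CRITERIA = (
    "standardised_descriptions",
    "up_to_date",
    "clinical_information",
    "freely_available",
)


def rate_database(features: dict) -> int:
    """Return how many of the four criteria a database meets (0 to 4).

    Missing keys are treated as unmet, so an undocumented database scores low.
    """
    return sum(1 for criterion in HVP_CRITERIA if features.get(criterion, False))


# Made-up example: a free, standardised database without clinical detail.
example = {
    "standardised_descriptions": True,
    "up_to_date": True,
    "clinical_information": False,
    "freely_available": True,
}
print(rate_database(example))
```

An unweighted count is the simplest possible choice; a real rating scheme would probably weight expert curation and completeness as well.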

It is important that, as far as possible, there is a single, complete, current and accurate database for each gene affected in inherited renal disease. Databases currently focus primarily on monogenic renal diseases rather than on syndromes or risk factors for the more common complex diseases. However, they require better integration based on clinical syndromes due to variants in multiple genes, and better coordination to identify gaps and duplication.

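Table 1 Comparison of different DNA variant databases for inherited kidney disease

HGMD
For diagnostic laboratories
Content: Includes most disease-related genes
Source of variants: Published variants
Expert curation/accuracy: Professional data entry
Completeness: Includes most published pathogenic variants
Updated regularly: Yes
Includes detailed clinical information: No
Actively includes normal variants too: No
Freely available: Free registration for access to variants published more than 3 years previously; subscription for access to more recent variants on professional site
Sustainable: Charges for access to recently described variants
Commonly used in gene discovery pipelines to confirm pathogenicity: Not typically
For curators and submitters of variants
Server, technical support and backup provided to guest curators: Not applicable

ClinVar (NCBI)
For diagnostic laboratories
Content: Includes most disease-related genes
Source of variants: Published variants and normal variants from “1,000 genomes” and “GOESP” projects
Expert curation/accuracy: Professional data entry
Completeness: Includes most published pathogenic and normal variants; includes multiple reports of same pathogenic variants in different families
Updated regularly: Yes, but depends on volunteer curators
Includes detailed clinical information: Some clinical information, but generally not detailed
Actively includes normal variants too: Yes
Freely available: Yes
Sustainable: Funded by US government
Commonly used in gene discovery pipelines to confirm pathogenicity: Yes
For curators and submitters of variants
Server, technical support and backup provided to guest curators: Yes

Locus-specific databases (LSDB), e.g. LOVD format
For diagnostic laboratories
Content: Includes most disease-related genes, but some databases are empty
Source of variants: Published and unpublished variants
Expert curation/accuracy: Gene/disease expert volunteer
Completeness: Potentially includes most published and unpublished pathogenic variants; includes multiple reports of same pathogenic variants in different families
Updated regularly: Variable
Includes detailed clinical information: Recommended by HVP, but not necessarily
Actively includes normal variants too: Recommended by HVP, but not necessarily
Freely available: Typically, yes
Sustainable: European Union support for development of LOVD system and GEN2PHEN databases
Commonly used in gene discovery pipelines to confirm pathogenicity: Yes
For curators and submitters of variants
Server, technical support and backup provided to guest curators: Yes

Country-specific nodes, e.g. Korean, Chinese
For diagnostic laboratories
Content: Typically genes affected in common diseases and for which there is a testing facility in that country
Source of variants: Often unpublished variants from diagnostic laboratories in that country. Useful data about variants in different ethnicities
Expert curation/accuracy: Professional data entry
Completeness: Includes small cohorts of novel pathogenic and normal variants from different ethnicities; includes multiple reports of same pathogenic variants in different families
Updated regularly: Variable
Includes detailed clinical information: Variable
Actively includes normal variants too: Often
Freely available: Typically, yes
Sustainable: Often funded by governments
Commonly used in gene discovery pipelines to confirm pathogenicity: No
For curators and submitters of variants
Server, technical support and backup provided to guest curators: Not applicable

Other (often research group or patient support group driven)
For diagnostic laboratories
Content: Usually only a few different genes that result in related diseases
Source of variants: Published and unpublished variants, often from diagnostic and research labs with an interest in a certain disease
Expert curation/accuracy: Gene/disease expert volunteer
Completeness: Potentially most pathogenic variants, but only for a few diseases; includes multiple reports of same pathogenic variants in different families
Updated regularly: Variable
Includes detailed clinical information: Variable
Actively includes normal variants too: Variable
Freely available: Typically, yes
Sustainable: Often funded by university departments or patient support groups
Commonly used in gene discovery pipelines to confirm pathogenicity: No
For curators and submitters of variants
Server, technical support and backup provided to guest curators: Yes, typically by host institution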


1,275 (http://www.LOVD.nl/ TSC1) 240 (http://proteomics. bio21.unimelb.edu.au/ lovd/genes/PKHD1)

125b (http://www.LOVD.nl/ PKD1) 1,951 (http://www.LOVD.nl/ COL4A5

Yes, in some LOVD applications Curator can use Mutalyzer tool to confirm variant description

Yes

Locus-specific databases (LSDB), e.g. LOVD format

4 (KMD) (http://kmd.nih.go.kr)

18 (KMD) (http://kmd.nih.go.kr)

2 (KMD) (http://kmd.nih.go.kr)

Not stated

Variable

Generally the aim

Country-specific nodes, e.g. Korean, Chinese

748 (http:// humgen.rwth-aachen.de)

675 (ARUP) (http:// www.arup.utah.edu/ database/ALPORT/ ALPORT_welcome.php)

2,322 (http://pkdb.mayo.edu/)

Typically confirmed by curator

Variable

Often, yes

Other (often research group or patient support group driven)

a The number of variants in brackets is for the professional database, which charges for access

b Some LOVD databases include multiple examples of the same variant if it is found in different families

c Many of the variants in ClinVar do not have pathogenicity information, but instead have links back to the more detailed LOVD entries

LOVD Leiden Open Variation Database, NCBI National Center for Biotechnology Information, KMD Korean Mutation Database, HGMD Human Gene Mutation Database, ADPKD autosomal dominant polycystic kidney disease, TSC tuberous sclerosis, ARUP Associated Regional and University Pathologists Laboratory

384 (393)

220 (295) (http://www.hgmd.org/)

639c (www.ncbi.nlm.nih.gov/ clinvar/) 114 (ignoring large deletions) (www.ncbi.nlm.nih.gov/ clinvar/)

Tuberous sclerosis (TSC1)

Autosomal recessive polycystic kidney disease (PKHD1)

525 (755) (http:// www.hgmd.org/)

516 (980)a (http://www.hgmd.org/)

References provided

References provided

HGMD

665 (www.ncbi.nlm.nih.gov/ clinvar/)

43 (www.ncbi.nlm.nih.gov/ clinvar/)

References provided or source stated as “1,000 genomes” etc.

Yes; data are transferred directly from some US laboratories Yes

ClinVar (NCBI)

X-linked Alport syndrome (COL4A5)

Number of variants ADPKD (PKD1)

Microattribution where the submitter is acknowledged Checking of variant description

Laboratories can submit their own variants

DNA variant database

Table 1 (continued)


Errors occur where databases are not expertly curated, and the major sources of error are from human processing and the description of clinical features [22]. The quality and paucity of clinical data are common weaknesses, but most databases include links to publications or contact details for the submitting laboratory. Expert curators are more likely to accurately assess submitted clinical features for significance. Further new terminology is required to systematically describe clinical features (“ontologies”) and, to this end, the Human Phenotype Ontology (HPO: http://www.humanphenotype-ontology.org/) [17] has been developed to facilitate data retrieval and analysis. Next-generation sequencing is taking on a role in clinical diagnostic testing [29] and yielding enormous numbers of normal and pathogenic variants. The 1,000 Genomes Project has identified large numbers of normal variants, but even more exomic variants are found, and sometimes ignored, by diagnostic laboratories. Country-specific databases, for example, those in Korea (KMD; http://kmd.cdc.go.kr) will help to increase the spectrum of normal variants in different ethnicities. It remains important for variants identified incidentally in a population without a corresponding phenotype to be recorded as normal in databases because of the critical importance of normal variants in the analysis of exomic sequencing results. Nevertheless, professional bodies caution against overreliance on databases to interpret the significance of a variant [30]. Databases usually indicate that they are “for research purposes only” and have a disclaimer regarding errors in clinical interpretation. 
In contrast, InSiGHT specifically promotes its databases for clinical interpretation purposes, and has incorporated a legal entity to protect against medicolegal claims as a result of misclassification leading to adverse health outcomes, for example, where a variant is misclassified as benign, surveillance ceased and an advanced cancer presents later. These caveats do not detract from a database’s value, and commercially available assessment tools invariably provide links to such databases.

So, should only variants found in accredited testing laboratories be included in a database? This seems unreasonable since confirmation in an accredited testing laboratory is not required for publication. However, it is desirable for the development of International Organization for Standardization (ISO) certified standards for databases [6]. Variants are commonly misreported, and journal editors and referees should take steps to improve the accuracy of reporting. For example, the journal Human Mutation scrutinises all sequence variants for compliance with reporting standards. This practice could be incorporated into “Uniform Requirements for Manuscripts Submitted to Biomedical Journals” (http://www.icmje.org/).

Privacy and confidentiality remain ongoing concerns for variant databases, especially in relation to the sharing of data across nations and ethnic groups, with diverse legal and ethical requirements. Although variants are connected to patients in a de-identified manner, this is not necessarily anonymous, but allows the data to be re-connected, usually by the submitting laboratory. Patient support groups indicate that most patients are happy for their genetic information to be used to lessen the hardship of their own or other families with the same condition.

Variant databases could be even more useful. Some databases only record the first published account of a disease-causing variant, and not the subsequent reports that indicate disease prevalence and possible phenotypic variability. It is beyond the remit of databases to include detailed up-to-date information on disease treatment, but many have links to relevant websites.
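The distinction drawn above between de-identified and anonymous data can be illustrated with a keyed pseudonym. This is a sketch of one common technique (an HMAC of the local patient identifier), not a method described by the authors: the function names, the secret and the identifiers are hypothetical. The database sees only the code, while the submitting laboratory, which holds the secret and its own patient list, can re-connect a code to a patient.

```python
import hashlib
import hmac


def pseudonym(patient_id: str, lab_secret: bytes) -> str:
    """Derive a stable code for a patient from a secret held by the laboratory.

    The code is shortened to 16 hex characters for readability only.
    """
    digest = hmac.new(lab_secret, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def build_relink_table(patient_ids, lab_secret: bytes) -> dict:
    """Map each code back to its local identifier.

    Only the submitting laboratory can build this table, because only it
    knows both the secret and the local patient identifiers.
    """
    return {pseudonym(pid, lab_secret): pid for pid in patient_ids}


# Made-up identifiers and secret, for illustration only.
secret = b"kept-by-the-submitting-laboratory"
local_ids = ["MRN-0001", "MRN-0002"]

code = pseudonym("MRN-0001", secret)       # what the database stores
relink = build_relink_table(local_ids, secret)
print(relink[code])                        # the laboratory can re-identify
```

Any real deployment would also need key management and ethics oversight, which are outside the scope of this sketch.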

What renal physicians can do

Previously, for most nephrologists, their interaction with a diagnostic genetic laboratory has been limited to completing a request form and communicating the test results to patients and their families. Now, they are critical in ensuring the success of the Human Variome Project through support for disease- and gene-specific databases.

Nephrologists can encourage patients to consent to the submission of their variants to a database. They can help patients understand that accurate, complete and current data improve our understanding of their disease, its course and complications, and that a database benefits themselves, their families, and others who are also affected. Providing complete and accurate clinical information with any request for a genetic test is important since this may be transferred directly to a database [31]. Physicians can also update their patients’ data throughout the course of their illness and, sometimes, encourage the patients to do so themselves. They can explain to patients that clinical information submitted to a database is de-identified, but that they can be “re-identified” and kept informed when new and clinically relevant information is discovered.

Nephrologists are also likely to lead in the establishment and curation of variant databases for inherited renal diseases, using the steps outlined above [4, 7]. They may collaborate with colleagues who also work in the field to produce “updates” or “super-reviews” that increase the number of variants in the public domain and facilitate novel genotype–phenotype correlations and potentially the development of algorithms that predict clinical features based on variant characteristics. Some may take as an example the databases for syndromes such as spastic paraparesis and sensory neuropathies that the Neurogenetics Consortium is developing [32]. Physician–curators are also likely to help assess clinical features and variant pathogenicity. Syndromes in inherited renal disease may include inherited haematuria, nephrotic syndrome or congenital anomalies of the kidneys and urinary tract (CAKUT). InSiGHT represents a model for the involvement of clinician experts in a variant database. This society aims not only to encourage research in gastrointestinal tumour syndromes, but also to educate health care professionals in molecular genetics and assist institutions in establishing or maintaining a registry, as well as in the discussion of controversial areas (www.insight-group.org/).

How much time, effort and cost are required to establish a new database? It depends on the gene, the number of variants, and the work required to ascertain likely pathogenicity and to correct errors in submitted data. Nevertheless, curation takes time and resources. For example, standardising and checking 771 variants for the PKD1 database occupied 170 h of curator time [22]. However, an empty database is provided on the LUMC server, and the curator needs only to decide on the data to collect, which will be mainly disease-specific phenotype characteristics. Then, a literature review will be required to collect all published information, and testing laboratories can be contacted with the request to share unpublished data. This may amount to 1–3 months of full-time work and, thereafter, an hour to a day a week for cataloguing new submissions [4]. It is important to keep in mind, though, the many benefits of a database for patients, physicians and diagnostic laboratories. The rewards for physicians include the increased understanding of genotype–phenotype correlations, as well as the citations, webpage access figures, and further collaborations with other laboratories working in the field. The cost of establishing a database can be measured in terms of staff time only, because LOVD databases are hosted without charge at LUMC. While hosting a database locally may allow greater flexibility in customisation, it incurs the costs of hardware purchase, technical expertise and maintenance.

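
As a rough check on the curation figures quoted above, the PKD1 example (771 variants standardised and checked in 170 h) works out to a little over 13 minutes per variant. The helper below is a hypothetical convenience function written for this illustration; only the two input numbers come from the text.

```python
def minutes_per_variant(total_hours: float, n_variants: int) -> float:
    """Average curation time per variant, in minutes."""
    if n_variants <= 0:
        raise ValueError("n_variants must be positive")
    return total_hours * 60 / n_variants


# Figures quoted in the text for the PKD1 database.
print(round(minutes_per_variant(170, 771), 1))
```

This average covers standardising and checking existing entries; it is not directly comparable with the hour quoted earlier for a full pathogenicity assessment of an unpublished variant.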
Nephrologists often play many roles other than patient care. As manuscript reviewers and journal editors, they can request that all variant descriptions adhere to the Human Variome Project/Human Genome Variation Society guidelines and that variants be submitted to a database as a prerequisite of manuscript publication. Those with leadership roles in genetic testing laboratories can support the automated transfer of variants to locus-specific databases. As members of the teams accrediting laboratories for national pathology boards, they can encourage the submission of variants to a database as a requirement of accreditation. As hospital administrators or members of Health Service Executive or Finance Committees, they can support funding for the establishment and maintenance of databases because of their contribution to increased diagnostic accuracy and cost-effectiveness. As members of Human Research Ethics Committees, they can encourage researchers to submit new variants to a database using arguments about the ethics of resource utilisation.
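To make the request for standardised variant descriptions concrete, the sketch below shows a deliberately simplified format check for one kind of description, a coding-DNA substitution written against a versioned transcript reference. The pattern, the function name and the example strings are assumptions for illustration only; the pattern covers a small fraction of the nomenclature and does not verify a description against the reference sequence, which is the job of a dedicated checker such as Mutalyzer [9].

```python
import re

# Simplified pattern for a coding-DNA substitution, e.g. "NM_000000.0:c.123A>G".
# Assumption: only RefSeq-style transcript accessions (NM_ plus version) and
# plain exonic positions are accepted; full descriptions cover far more cases.
SIMPLE_SUBSTITUTION = re.compile(r"NM_\d+\.\d+:c\.\d+[ACGT]>[ACGT]")


def looks_like_simple_substitution(description: str) -> bool:
    """Return True if the string matches the simplified pattern above."""
    return SIMPLE_SUBSTITUTION.fullmatch(description.strip()) is not None


# Hypothetical descriptions, not real variants.
examples = ["NM_000000.0:c.123A>G", "c.123A>G", "NM_000000.0:c.123a>g"]
for text in examples:
    print(text, looks_like_simple_substitution(text))
```

Only the first example passes: the second lacks a reference sequence and the third uses lower-case nucleotides, both of which this simplified pattern rejects.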

Sustainability of databases remains a concern, and again physicians can help, by advocating, together with patients, for funding support from governments, health services, insurers and patient support groups. Variant databases have some overlap with disease registries, which have clearly demonstrated their utility. How many more DNA variants are there and for how much longer will funding be required? By now, many of the genes responsible for the most clinically significant inherited diseases have been identified. The next 5 years may see identification of the most common variants for frequently affected genes. While mutations continue to occur de novo, sufficient variants should nevertheless be known to derive clinically useful genotype–phenotype correlations. After that, curators may choose to hand over their databases to government-funded institutions, such as the NCBI and EBI [33]. In the meantime, interested nephrologists can help with the curation of databases, by increasing the contribution of normal and pathogenic variants, and by using databases to correlate DNA variants with clinical phenotypes.

Acknowledgements The authors of this manuscript are all members of the Human Variome Project, but have no financial disclosures or conflicts of interest to declare. The research leading to these results has received funding from the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreement number 200754—the GEN2PHEN project. This funding source played no role in the production of this manuscript. This work was completed while JS was a Visiting Academic within the Research Department of Genetics, Evolution and the Environment at University College London.

References

1. Hildebrandt F (2010) Genetic kidney diseases. Lancet 375:1287–1295
2. Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, Hurles ME, McVean GA (2010) A map of human genome variation from population-scale sequencing. Nature 467:1061–1073
3. Yang Y, Muzny DM, Reid JG, Bainbridge MN, Willis A, Ward PA, Braxton A, Beuten J, Xia F, Niu Z, Hardison M, Person R, Bekheirnia MR, Leduc MS, Kirby A, Pham P, Scull J, Wang M, Ding Y, Plon SE, Lupski JR, Beaudet AL, Gibbs RA, Eng CM (2013) Clinical whole-exome sequencing for the diagnosis of mendelian disorders. N Engl J Med 369:1502–1511
4. Celli J, Dalgleish R, Vihinen M, Taschner PE, den Dunnen JT (2012) Curating gene variant databases (LSDBs): toward a universal standard. Hum Mutat 33:291–297
5. Cotton RG, Auerbach AD, Beckmann JS, Blumenfeld OO, Brookes AJ, Brown AF, Carrera P, Cox DW, Gottlieb B, Greenblatt MS, Hilbert P, Lehvaslaiho H, Liang P, Marsh S, Nebert DW, Povey S, Rossetti S, Scriver CR, Summar M, Tolan DR, Verma IC, Vihinen M, den Dunnen JT (2008) Recommendations for locus-specific databases and their curation. Hum Mutat 29:2–5
6. Vihinen M, den Dunnen JT, Dalgleish R, Cotton RG (2012) Guidelines for establishing locus specific databases. Hum Mutat 33:298–305
7. Fokkema IF, Taschner PE, Schaafsma GC, Celli J, Laros JF, den Dunnen JT (2011) LOVD v. 2.0: the next generation in gene variant databases. Hum Mutat 32:557–563
8. Fokkema IF, den Dunnen JT, Taschner PE (2005) LOVD: easy creation of a locus-specific sequence variation database using an “LSDB-in-a-box” approach. Hum Mutat 26:63–68
9. Wildeman M, van Ophuizen E, den Dunnen JT, Taschner PE (2008) Improving sequence variant descriptions in mutation databases and literature using the Mutalyzer sequence variation nomenclature checker. Hum Mutat 29:6–13
10. Den Dunnen JT, Antonarakis SE (2000) Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion. Hum Mutat 15:7–12
11. Wain HM, Bruford EA, Lovering RC, Lush MJ, Wright MW, Povey S (2002) Guidelines for human gene nomenclature. Genomics 79:464–470
12. Smith CI, Vihinen M (1996) Immunodeficiency mutation databases—a new research tool. Immunol Today 17:495–496
13. Dalgleish R, Flicek P, Cunningham F, Astashyn A, Tully RE, Proctor G, Chen Y, McLaren WM, Larsson P, Vaughan BW, Beroud C, Dobson G, Lehvaslaiho H, Taschner PE, den Dunnen JT, Devereau A, Birney E, Brookes AJ, Maglott DR (2010) Locus reference genomic sequences: an improved basis for describing human DNA variants. Genome Med 2:24
14. Benson DA, Karsch-Mizrachi I, Clark K, Lipman DJ, Ostell J, Sayers EW (2012) GenBank. Nucleic Acids Res 40:D48–53
15. Claustres M, Horaitis O, Vanevski M, Cotton RG (2002) Time for a unified system of mutation description and reporting: a review of locus-specific mutation databases. Genome Res 12:680–688
16. Mitropoulou C, Webb AJ, Mitropoulos K, Brookes AJ, Patrinos GP (2010) Locus-specific database domain and data content analysis: evolution and content maturation toward clinical use. Hum Mutat 31:1109–1116
17. Robinson PN, Mundlos S (2010) The human phenotype ontology. Clin Genet 77:525–534
18. Povey S, Al Aqeel AI, Cambon-Thomsen A, Dalgleish R, den Dunnen JT, Firth HV, Greenblatt MS, Barash CI, Parker M, Patrinos GP, Savige J, Sobrido MJ, Winship I, Cotton RG (2010) Practical guidelines addressing ethical issues pertaining to the curation of human locus-specific variation databases (LSDBs). Hum Mutat 31:1179–1184
19. Tavtigian SV, Greenblatt MS, Goldgar DE, Boffetta P (2008) Assessing pathogenicity: overview of results from the IARC unclassified genetic variants working group. Hum Mutat 29:1261–1264
20. Thorisson GA (2009) Accreditation and attribution in data sharing. Nat Biotechnol 27:984–985
21. Giardine B, Borg J, Higgs DR, Peterson KR, Philipsen S, Maglott D, Singleton BK, Anstee DJ, Basak AN, Clark B, Costa FC, Faustino P, Fedosyuk H, Felice AE, Francina A, Galanello R, Gallivan MV, Georgitsi M, Gibbons RJ, Giordano PC, Harteveld CL, Hoyer JD, Jarvis M, Joly P, Kanavakis E, Kollia P, Menzel S, Miller W, Moradkhani K, Old J, Papachatzopoulou A, Papadakis MN, Papadopoulos P, Pavlovic S, Perseu L, Radmilovic M, Riemer C, Satta S, Schrijver I, Stojiljkovic M, Thein SL, Traeger-Synodinos J, Tully R, Wada T, Waye JS, Wiemann C, Zukic B, Chui DH, Wajcman H, Hardison RC, Patrinos GP (2011) Systematic documentation and analysis of human genetic variation in hemoglobinopathies using the microattribution approach. Nat Genet 43:295–301
22. Gout AM, Ravine D, Harris PC, Rossetti S, Peters D, Breuning M, Henske EP, Koizumi A, Inoue S, Shimizu Y, Thongnoppakhun W, Yenchitsomanus PT, Deltas C, Sandford R, Torra R, Turco AE, Jeffery S, Fontes M, Somlo S, Furu LM, Smulders YM, Mercier B, Ferec C, Burtey S, Pei Y, Kalaydjieva L, Bogdanova N, McCluskey M, Geon LJ, Wouters CH, Reiterova J, Stekrova J, San Millan JL, Aguiari G, Del Senno L (2007) Analysis of published PKD1 gene sequence variants. Nat Genet 39:427–428
23. Murphy JA, Barrantes-Reynolds R, Kocherlakota R, Bond JP, Greenblatt MS (2004) The CDKN2A database: Integrating allelic variants with evolution, structure, function, and disease association. Hum Mutat 24:296–304
24. Savige J, Dagher H, Povey S (2014) Mutation databases for inherited renal disease: are they complete, accurate, clinically relevant, and freely available? Hum Mutat 35:791–793
25. Gout AM, Martin NC, Brown AF, Ravine D (2007) PKDB: Polycystic Kidney Disease Mutation Database—a gene variant database for autosomal dominant polycystic kidney disease. Hum Mutat 28:654–659
26. International Alport Mutation Consortium, Savige J, Ars E, Cotton RG, Crockett D, Dagher H, Deltas C, Ding J, Flinter F, Pont-Kingdon G, Smaoui N, Torra R, Storey H (2014) DNA variant databases improve test accuracy and phenotype prediction in Alport syndrome. Pediatr Nephrol 29:971–977
27. Crockett DK, Pont-Kingdon G, Gedge F, Sumner K, Seamons R, Lyon E (2010) The Alport syndrome COL4A5 variant database. Hum Mutat 31:E1652–1657
28. Hoogeveen-Westerveld M, Wentink M, van den Heuvel D, Mozaffari M, Ekong R, Povey S, den Dunnen JT, Metcalfe K, Vallee S, Krueger S, Bergoffen J, Shashi V, Elmslie F, Kwiatkowski D, Sampson J, Vidales C, Dzarir J, Garcia-Planells J, Dies K, Maat-Kievit A, van den Ouweland A, Halley D, Nellist M (2011) Functional assessment of variants in the TSC1 and TSC2 genes identified in individuals with tuberous sclerosis complex. Hum Mutat 32:424–435
29. ACMG Board of Directors (2012) Points to consider in the clinical application of genomic sequencing. Genet Med 14:759–761
30. Maddalena A, Bale S, Das S, Grody W, Richards S (2005) Technical standards and guidelines: molecular genetic testing for ultra-rare disorders. Genet Med 7:571–583
31. Cotton RG, Auerbach AD, Brown AF, Carrera P, Christodoulou J, Claustres M, Compton J, Cox DW, De Baere E, den Dunnen JT, Greenblatt M, Fujiwara M, Hilbert P, Jani A, Lehvaslaiho H, Nebert DW, Verma I, Vihinen M (2007) A structured simple form for ordering genetic tests is needed to ensure coupling of clinical detail (phenotype) with DNA variants (genotype) to ensure utility in publication and databases. Hum Mutat 28:931–932
32. Haworth A, Bertram L, Carrera P, Elson JL, Braastad CD, Cox DW, Cruts M, den Dunnen JT, Farrer MJ, Fink JK, Hamed SA, Houlden H, Johnson DR, Nuytemans K, Palau F, Rayan DL, Robinson PN, Salas A, Schule B, Sweeney MG, Woods MO, Amigo J, Cotton RG, Sobrido MJ (2011) Call for participation in the neurogenetics consortium within the Human Variome Project. Neurogenetics 12:169–173
33. Den Dunnen JT, Sijmons RH, Andersen PS, Vihinen M, Beckmann JS, Rossetti S, Talbot CC Jr, Hardison RC, Povey S, Cotton RG (2009) Sharing data between LSDBs and central repositories. Hum Mutat 30:493–495
