Original Article

Whole Exome Sequencing Implicates an INO80D Mutation in a Syndrome of Aortic Hypoplasia, Premature Atherosclerosis, and Arterial Stiffness

Khader Shameer, PhD; Eric W. Klee, PhD; Angela K. Dalenberg, BS; Iftikhar J. Kullo, MD

Background: Massively parallel, high-throughput sequencing technology is helping to generate new insights into the genetic basis of human diseases. We used whole exome sequencing to identify the mutation underlying a syndrome affecting 2 siblings with aortic hypoplasia, calcific atherosclerosis, systolic hypertension, and premature cataract.

Methods and Results: Exonic regions were captured and sequenced using a next-generation sequencing platform to generate 100-base paired-end reads. A computational genomic data analysis pipeline was used to perform quality control, align reads to a reference genome, and identify genetic variants; findings were confirmed using a different exome analysis pipeline. The 2 siblings were homozygous for a rare missense mutation (Ser818Cys) in INO80D, a subunit of the human INO80 chromatin remodeling complex. Homozygosity mapping and Sanger sequencing confirmed that the mutation is located in one of the runs of homozygosity on chromosome 2. INO80D encodes a key subunit of the human INO80 complex, a multiprotein complex involved in DNA binding, chromatin modification, organization of chromosome structure, and ATP-dependent nucleosome sliding. By introducing a new disulfide bond in the protein product and also disrupting the composition of low-complexity regions, the Ser818Cys mutation may affect INO80D function, protein–protein interactions, and chromatin remodeling.

Conclusions: Our findings suggest a link between the Ser818Cys mutation in INO80D, a subunit of the human INO80 chromatin remodeling complex, and accelerated arterial aging. (Circ Cardiovasc Genet. 2014;7:607-614.)
Key Words: aortic hypoplasia ◼ arterial stiffness ◼ atherosclerosis ◼ chromatin assembly and disassembly ◼ genetics ◼ INO80D protein, human

Massively parallel sequencing is a powerful tool to identify the genetic basis of heritable diseases.1 Whole exome sequencing (WES) is a cost-efficient strategy to study the ≈1% of the human genome that is protein coding and has been used with success to identify genes for rare diseases that cluster in families.2,3 Consequently, WES is increasingly being used to investigate the genetic causes of hitherto uncharacterized inherited disorders, and several disease genes have been identified using this technology.4

Clinical Perspective on p 614

In this report, we describe an application of WES to identify the genetic basis of a syndrome affecting 2 non-Hispanic white siblings who were referred to the Early Atherosclerosis Clinic of the Mayo Clinic. The affected siblings had aortic hypoplasia, diffuse arterial calcification, calcific aortic valve stenosis, systolic hypertension, and wide pulse pressure indicative of arterial stiffness, peripheral arterial disease, premature cataract, and excessive dental root mobility. These clinical features were not present in the parents and 2 other siblings. To identify the mutation underlying this syndrome, we performed high-density genotyping, runs of homozygosity analysis, and WES in the 2 affected siblings, as well as WES in 1 of the 2 unaffected siblings and the parents. The Institutional Review Board of the Mayo Clinic approved the study and the family members provided informed consent.

Methods

Clinical Phenotyping

There was no history of consanguinity between the parents of affected individuals. Clinical features of the affected siblings (L6 and L7), an unaffected sibling (L10), and the parents (L8 and L9) are summarized in the Table. The male sibling presented with systolic hypertension at the age of 17 years and the female sibling presented with systolic hypertension in her early 20s and cataract at the age of 30 years. Both were subsequently noted to have a hypoplastic aorta with extensive calcification, features that were not present in the parents or the other 2 siblings, thereby suggesting a recessive mode of inheritance (Figure 1). The male sibling died at the age of 42 years of a myocardial infarction after aortic valve replacement surgery.

Received May 23, 2013; accepted June 27, 2014.
From the Division of Cardiovascular Diseases (K.S., A.K.D., I.J.K.) and Biomedical Statistics and Informatics, Departments of Health Sciences Research and Laboratory Medicine and Pathology (E.W.K.), Mayo Clinic, Rochester, MN. The current address for Dr Shameer is Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY.
The Data Supplement is available at http://circgenetics.ahajournals.org/lookup/suppl/doi:10.1161/CIRCGENETICS.113.000233/-/DC1.
Correspondence to Iftikhar J. Kullo, MD, Mayo Clinic, 200 1st St SW, Rochester, MN 55905. E-mail [email protected]
© 2014 American Heart Association, Inc.
Circ Cardiovasc Genet is available at http://circgenetics.ahajournals.org

DOI: 10.1161/CIRCGENETICS.113.000233

Downloaded from http://circgenetics.ahajournals.org/ at Tulane University on January 29, 2015 607

Table. Clinical Features of the 5 Members of the Affected Family

Feature | L6 (Affected Son) | L7 (Affected Daughter) | L10 (Unaffected Daughter) | L8 (Father) | L9 (Mother)
Sex | Men | Women | Women | Men | Women
Vital status | Died at the age of 41 y | Alive, age 42 y | Alive, age 38 y | Alive, age 70 y | Alive, age 66 y
Aorta | Hypoplasia of the ascending (18 mm) and descending (14 mm) aorta with calcification | Hypoplasia of ascending (17 mm) and descending (12 mm) aorta with calcification | ...† | Normal dimensions on echo | Normal dimensions on echo
Valve disease | Severe calcific aortic valve stenosis | Mild calcific aortic valve stenosis | None | None | None
CAD | Mild coronary atherosclerosis on coronary angiogram | Coronary calcification on CT scan (score >90th percentile); but no significant stenosis on coronary angiogram at age 40 | No coronary calcification on CT scan at age 27 | CABG at the age of 52 y | No history of CABG
PAD | Aorto-iliac bypass graft at the age of 32 y for claudication | Mild claudication | No history of PAD | No history of PAD | No history of PAD
Carotid ultrasound | Moderate to severe plaque bilaterally | Mild plaque in both carotid bulbs | Normal | Mild plaque | ...†
Hypertension | Systolic hypertension with wide pulse pressure, onset at the age of 17 y | Systolic hypertension with wide pulse pressure, onset in 20s | Normotensive | Hypertensive | Normotensive
Hyperlipidemia | Yes | Yes | None | Yes | Yes
Smoking | None | None | None | None | None
Type 2 diabetes mellitus | No | No | No | Yes | No
BMI, kg/m² | 23 | 22 | 22 | 32 | 24
Cataract | Cataract surgery at the age of 32 y | Cataract surgery at the age of 34 y | None | None | None
Teeth | Short roots with excessive mobility of the teeth | Short roots with excessive mobility of the teeth; extraction of all teeth at the age of 42 y | No history of dental problems | No history of dental problems | No history of dental problems

BMI indicates body mass index; CABG, coronary artery bypass graft surgery; CAD, coronary artery disease; CT, computed tomography; and PAD, peripheral arterial disease.
†A clinical phenotype was not observed or the test was not performed.

High-Density Genotyping and Homozygosity Mapping

Homozygous segments are likely to carry disease mutations in inbred and outbred families, and homozygosity mapping is a means of localizing deleterious variants associated with recessive traits.5–7 Several recent exome analyses included runs of homozygosity–based filtering as a strategy for identifying causal variants in familial studies.8–13 Homozygosity mapping was performed in the 2 affected siblings, who had been genotyped on the Illumina Human-660w Quad-version 1 platform as part of a genome-wide association study of peripheral arterial disease.14 A total of 476 395 single nucleotide polymorphisms (SNPs) passed the following quality control criteria: SNP call rate >98%, minor allele frequency >0.05, and Hardy–Weinberg equilibrium P value [...] TGT) in the 2 parents and 3 siblings by Sanger sequencing. Primers for amplifying the genomic regions around these 2 mutation sites are listed in Table IV in the Data Supplement. Output from Sanger sequencing was visualized and mutations confirmed using 4Peaks software (Figure 3).
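As a rough illustration of the homozygosity-mapping idea described in this section, the sketch below scans two genotype vectors for stretches of consecutive SNPs at which both siblings are homozygous for the same allele. It is a simplified toy, not the procedure used in the study: the function name, the two-letter genotype encoding, and the `min_snps` threshold are all assumptions made for the example, and dedicated tools typically tolerate occasional heterozygous or missing calls within a run, which this sketch does not.

```python
from typing import List, Tuple


def shared_homozygous_runs(
    genotypes_a: List[str],
    genotypes_b: List[str],
    min_snps: int,
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of runs where both samples are
    homozygous for the same allele at every consecutive SNP.

    Genotypes are hypothetical two-letter calls such as "AA" or "AG",
    ordered by chromosomal position; anything else counts as missing.
    """
    runs = []
    run_start = None
    n = min(len(genotypes_a), len(genotypes_b))
    for i in range(n):
        ga, gb = genotypes_a[i], genotypes_b[i]
        # Shared homozygous call: identical genotype, both alleles equal.
        shared = (
            ga == gb and len(ga) == 2 and ga[0] == ga[1] and ga[0] in "ACGT"
        )
        if shared:
            if run_start is None:
                run_start = i
        else:
            # A heterozygous, discordant, or missing call ends the run.
            if run_start is not None and i - run_start >= min_snps:
                runs.append((run_start, i - 1))
            run_start = None
    # Close a run that extends to the last SNP.
    if run_start is not None and n - run_start >= min_snps:
        runs.append((run_start, n - 1))
    return runs


# Toy data: six SNPs in two siblings (not real genotypes).
sibling_1 = ["AA", "GG", "CC", "AG", "TT", "TT"]
sibling_2 = ["AA", "GG", "CC", "AA", "TT", "TT"]
toy_runs = shared_homozygous_runs(sibling_1, sibling_2, min_snps=2)
print(toy_runs)
```

With real array data, candidate variants from exome sequencing would then be kept only if they fall inside one of the shared runs.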

Frequency and Pathogenicity of the Missense Mutation in INO80D

The novel missense mutation in INO80D was not present in the dbSNP and 1000 Genomes Project databases. In the Exome Variant Server (http://evs.gs.washington.edu/EVS/), the frequency of mutant allele A was 0.28% in 1085 sequenced European Americans. With an allelic frequency of A at 0.28%, the expected frequency of genotype AA is ≈8×10⁻⁶ (ie, 8 AA homozygotes per million). A common method to address pathogenicity is to assume that purifying selection constrains evolutionary divergence at phenotypically important nucleotides and amino acids. Multiple sequence alignments of INO80D indicated conserved orthologs present in chimpanzee, dog, cow, mouse, rat, chicken, and zebrafish (Figure 4A). The effects of the missense mutation (Ser818Cys, NP_060229) in INO80D were assessed by conservation (phastCons score of 1.0 and GERP score of 4.46) and difference in amino acid property (Grantham score of 112), suggesting that the mutated site is an evolutionarily conserved amino acid. This missense mutation was predicted to be probably damaging by PolyPhen-2,18 with a score of 0.990 (sensitivity, 0.71; specificity, 0.96).
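The expected homozygote frequency quoted in this section follows from Hardy–Weinberg proportions, under which the frequency of a homozygous genotype is the square of the allele frequency. The short sketch below reproduces that arithmetic from the 0.28% allele frequency given in the text; the function and variable names are illustrative only.

```python
def expected_homozygote_frequency(allele_frequency: float) -> float:
    """Expected homozygote frequency under Hardy-Weinberg proportions (q squared)."""
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele_frequency must lie between 0 and 1")
    return allele_frequency ** 2


# Mutant allele frequency of 0.28%, as reported in the text.
mutant_allele_frequency = 0.0028
homozygote_frequency = expected_homozygote_frequency(mutant_allele_frequency)
homozygotes_per_million = homozygote_frequency * 1_000_000

print(f"Expected AA frequency: {homozygote_frequency:.2e}")
print(f"Expected AA homozygotes per million: {homozygotes_per_million:.1f}")
```

The result, about 7.8 per million, rounds to the figure of 8 per million given in the text.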

Discussion

In the present study, we combined homozygosity mapping and WES to identify the genetic basis of a syndrome observed in 2 siblings, with features of accelerated aging, including aortic hypoplasia, calcification and stiffness, early onset atherosclerotic vascular disease, systolic hypertension with wide pulse pressure, premature cataract, and excessive dental root mobility. A missense mutation (Ser818Cys) in INO80D was identified using integrated filtering of WES data and confirmed by an alternate exome analysis pipeline; the mutation mapped to one of the homozygous regions on chromosome 2 shared by the 2 affected siblings.


Figure 3.  Sanger sequencing data visualized using 4Peaks software for c.2857T>A, NM_017759; AGT>TGT (Ser818Cys) mutation in an unaffected (top, AGT(S)) and affected individual (bottom, TGT(C)).
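The codon change named in the Figure 3 caption (AGT to TGT) can be checked against the standard genetic code, in which AGT encodes serine and TGT encodes cysteine. The sketch below is a minimal illustration using only those two codons; the dictionary and function names are hypothetical and not part of any analysis described in the article.

```python
# Excerpt of the standard genetic code: only the two codons named in the
# Figure 3 caption are included in this illustrative lookup.
CODON_TO_AMINO_ACID = {"AGT": "Ser", "TGT": "Cys"}


def describe_codon_change(ref_codon: str, alt_codon: str, residue: int) -> str:
    """Label a single-codon substitution as synonymous or missense."""
    ref_aa = CODON_TO_AMINO_ACID[ref_codon]
    alt_aa = CODON_TO_AMINO_ACID[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "missense"
    return f"{ref_aa}{residue}{alt_aa} ({kind})"


change = describe_codon_change("AGT", "TGT", 818)
print(change)
```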

The INO80 complex is a chromatin remodeler involved in cardiovascular development and physiology.19 INO80D is a metazoan-specific subunit of the human INO80 complex that plays a crucial role in transcription and in DNA replication and repair.20,21 In humans, the INO80 complex21–24 is an ATP-dependent chromatin remodeling complex composed of 3 modules that assemble with 3 distinct domains of the hIno80 ATPase.20 Sequence annotations compiled in the Pfam database suggest that INO80D encodes 2 copies of the zf-C3Hc3H domain, which functions as a DNA-binding domain of chromatin remodeling proteins and helicases (http://pfam.xfam.org/protein/Q53TQ3). The INO80D protein (UniProt identifier Q53TQ3) has 2 isoforms of 878 (Q53TQ3-1) and 1027 (Q53TQ3-2) residues, produced by alternative splicing. The mutation site (c.2857T>A, NM_017759) was mapped to residue 818 of the canonical transcripts, indicating that the mutation would have a functional effect irrespective of the expressed isoform. The mutation results in the replacement of serine with cysteine and may lead to an altered steric effect during protein–protein interactions.20,21 To assess the functional significance of the mutation site, we integrated protein sequence, structure, and network-based interactome analyses (Figure 4; in the Data Supplement). Comparative analysis of secondary structure and solvent accessibility revealed that secondary structure conformation is not altered, but solvent accessibility is changed (Figure 4B). The mutation site (Ser818Cys) was observed to be part of a low-complexity region (LCR), LCR-7, in both isoforms.

LCR-7 was present in the wild-type Q53TQ3-1 and Q53TQ3-2 sequences but absent in the mutated sequences, indicating that the mutation results in the loss of LCR-7 in both isoforms (Figure 4C). Loss of LCR-7 could affect protein–protein interactions or other functions mediated by LCRs. As INO80D interacts with several proteins and is a key component of the human INO80 complex, which mediates several important cellular processes,20,21 loss of an LCR may have significant functional effects. In addition to LCR composition, we used protein secondary structure data to understand the effect of the mutation. Using secondary structure information derived from protein sequence and cysteine residue frequencies, the disulfide-bond connectivity prediction program DiANNA25 suggested that a new disulfide bond may be generated between the 456th amino acid (VKGEQCANKAL) and the mutated 818th amino acid (RQQYSCDHSHS) with a score of 0.999. Therefore, Ser818Cys may induce a new disulfide bond that could lead to misfolding and thereby affect INO80D function.26 A major role of INO80D is in forming a hetero-oligomeric protein complex by interacting with other members of the hINO80 complex, and therefore a minor change in folding could have a significant effect on protein binding or protein–DNA interactions (Figure 4D; in the Data Supplement).20,23,27,28 Although we did not perform functional studies in animal models, published reports pertaining to the INO80 complex and its subunits point to an important role for


Figure 4. Sequence features of INO80D protein: (A) conservation of the S818C mutation site and flanking region across 8 vertebrate species; (B) secondary structure and solvent accessibility score of the mutation site (Ser818Cys) and flanking residues of wild-type and mutated sequences; (C) composition of low-complexity regions (LCRs) in wild-type isoforms and Ser818Cys mutant sequences; (D) protein–protein interaction network mediated by INO80D: violet edges indicate experimentally validated protein–protein interactions and purple edges indicate homologous protein–protein interactions observed in model organisms. ACTL6A indicates actin-like 6A (INO80K); ACTR8, ARP8 actin-related protein 8 homolog (yeast) (INO80N); MCRS1, microspherule protein 1 (INO80Q); NFRKB, nuclear factor related to kappaB binding protein (INO80G); RUVBL2, RuvB-like AAA ATPase 2 (INO80J); TFPT, TCF3 (E2A) fusion partner (in childhood leukemia) (INO80F); UCHL5, ubiquitin carboxyl-terminal hydrolase L5 (INO80R); and YY1, YY1 transcription factor.

INO80D in cardiovascular and aging traits.29–35 Based on a gene expression evolution database (http://bgee.unil.ch), we found that INO80D is expressed in the aorta (adult), heart (fetus and adult), coronary arteries (adult), and myocardium (child and adult). In the infant mouse, Ino80d is expressed in the aorta and aortic valve. We compiled microRNA (miRNA) expression and associated phenotype data from publicly available databases36,37 and noted that several miRNAs that target INO80D (Data Supplement) are differentially expressed in the setting of diverse phenotypes observed in the affected individuals (Data Supplement). Of 27 experimentally validated miRNAs that target INO80D, 10 are implicated in ≥1 of the phenotypes (vascular disease, hypertension, coronary artery disease, cataract, and increased dental root mobility [periodontitis]) observed in the affected individuals, and 12 miRNAs were implicated in related phenotypes, including heart failure, myocardial infarction, aging, and cardiac hypertrophy. Morpholino knockdown of pontin (ortholog of human RuvB [Escherichia coli homolog]-like 1 [RUVBL1], member of INO80 complex [INO80 complex subunit H]) in zebrafish causes cardiac hyperplasia, whereas an insertional mutation of reptin (ortholog of human RuvB [E coli homolog]–like 2 [RUVBL2], member of INO80 complex [INO80 complex subunit J]) that activates the ATPase activity of reptin stimulates cardiomyocyte proliferation in zebrafish.38 Mutation screening in Drosophila melanogaster suggests that the dINO80 complex regulates multiple genes through epigenetic mechanisms.39

Chromatin Remodeling Complexes and Accelerated Aging Syndromes

Chromatin remodeling is a basic biological mechanism that is crucial for transcription, gene regulation, DNA replication, DNA repair, and nucleosome remodeling. Chromatin remodeling complexes are classified into 4 major families with varying numbers of protein subunits as follows: SWItch/Sucrose NonFermentable (SWI/SNF) family (8–14 subunits), imitation SWI family (2–4 subunits), chromodomain helicase DNA-binding (CHD) family (1–10 subunits), and INO80 family (>10 subunits).40 The INO80 complex and its subunits20 are involved in maintaining genomic integrity,41 transcription,24 DNA replication,42,43 damage tolerance, and double-strand break repair mechanisms.33,44–50 The INO80D subunit is involved in chromosome segregation34 and also interacts with telomere,51 histone core, actin, and actin-related proteins.52–54 The molecular and genetic basis of several accelerated aging syndromes has been attributed to DNA damage and genome instability, but specific mechanistic roles are unclear.55 Several syndromes with clinical features of accelerated aging are associated with mutations in chromatin remodeling complexes (listed in Table V in the Data Supplement). Key clinical phenotypes (arterial calcification, early-onset systolic hypertension, wide pulse pressure suggestive of arterial stiffness, atherosclerosis, cataract, and increased dental mobility) present in the affected siblings in our study are consistent with accelerated aging. Although no syndromic associations are known for human or other mammalian INO80 complexes,20,24,40 further analysis of the INO80 complex and its subunits could provide additional insights.

In summary, using WES, we identified the genetic basis of a syndrome with aortic hypoplasia and calcification, early-onset systolic hypertension with wide pulse pressure suggestive of arterial stiffness, atherosclerosis, cataract, and increased dental mobility. We used a strategy of homozygosity mapping using high-density genotyping data, as well as WES for hypothesis-free disease gene mapping, in the 2 affected siblings. A missense mutation (Ser818Cys) in INO80D was identified as the likely cause of the syndrome, using 2 different analytic pipelines. In-depth functional annotation approaches were used to assess the effect of the mutation on INO80D function. Additional studies in model organisms will be helpful in further delineating the role of the INO80 complex in chromatin remodeling and aging.

Acknowledgments

We acknowledge the Advanced Genomic Technology Center and Microarray Shared Resource at Mayo Clinic for technical assistance.

Sources of Funding

Dr Kullo is supported by grant U01HG006379 from the National Human Genome Research Institute and a Marriot Award for Individualized Medicine.

Disclosures

None.

References

1. Nielsen R, Paul JS, Albrechtsen A, Song YS. Genotype and SNP calling from next-generation sequencing data. Nat Rev Genet. 2011;12:443–451.
2. Ng SB, Buckingham KJ, Lee C, Bigham AW, Tabor HK, Dent KM, et al. Exome sequencing identifies the cause of a mendelian disorder. Nat Genet. 2010;42:30–35.
3. Tennessen JA, Bigham AW, O’Connor TD, Fu W, Kenny EE, Gravel S, et al; Broad GO; Seattle GO; NHLBI Exome Sequencing Project. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science. 2012;337:64–69.
4. Gilissen C, Hoischen A, Brunner HG, Veltman JA. Unlocking Mendelian disease using exome sequencing. Genome Biol. 2011;12:228.
5. Lander ES, Botstein D. Homozygosity mapping: a way to map human recessive traits with the DNA of inbred children. Science. 1987;236:1567–1570.
6. Clark AG. The size distribution of homozygous segments in the human genome. Am J Hum Genet. 1999;65:1489–1492.
7. Szpiech ZA, Xu J, Pemberton TJ, Peng W, Zöllner S, Rosenberg NA, et al. Long runs of homozygosity are enriched for deleterious variation. Am J Hum Genet. 2013;93:90–102.
8. Takata A, Kato M, Nakamura M, Yoshikawa T, Kanba S, Sano A, et al. Exome sequencing identifies a novel missense variant in RRM2B associated with autosomal recessive progressive external ophthalmoplegia. Genome Biol. 2011;12:R92.
9. Chahrour MH, Yu TW, Lim ET, Ataman B, Coulter ME, Hill RS, et al; ARRA Autism Sequencing Collaboration. Whole-exome sequencing and homozygosity analysis implicate depolarization-regulated neuronal genes in autism. PLoS Genet. 2012;8:e1002635.
10. Shamseldin HE, Elfaki M, Alkuraya FS. Exome sequencing reveals a novel Fanconi group defined by XRCC2 mutation. J Med Genet. 2012;49:184–186.
11. Walsh T, Shahin H, Elkan-Miller T, Lee MK, Thornton AM, Roeb W, et al. Whole exome sequencing and homozygosity mapping identify mutation in the cell polarity protein GPSM2 as the cause of nonsyndromic hearing loss DFNB82. Am J Hum Genet. 2010;87:90–94.
12. Erlich Y, Edvardson S, Hodges E, Zenvirt S, Thekkat P, Shaag A, et al. Exome sequencing and disease-network analysis of a single family implicate a mutation in KIF1A in hereditary spastic paraparesis. Genome Res. 2011;21:658–664.
13. Abu-Safieh L, Alrashed M, Anazi S, Alkuraya H, Khan AO, Al-Owain M, et al. Autozygome-guided exome sequencing in retinal dystrophy patients reveals pathogenetic mutations and novel candidate disease genes. Genome Res. 2013;23:236–247.
14. Kullo IJ, Fan J, Pathak J, Savova GK, Ali Z, Chute CG. Leveraging informatics for genetic studies: use of the electronic medical record to enable a genome-wide association study of peripheral arterial disease. J Am Med Inform Assoc. 2010;17:568–574.
15. Jouni H, Shameer K, Asmann YW, Hazin R, de Andrade M, Kullo IJ. Clinical correlates of autosomal chromosomal abnormalities in an electronic medical record-linked Genome-wide association study: a case series. Journal of Investigative Medicine: the Official Publication of the American Federation for Clinical Research. 2013;1.
16. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–575.
17. Thorvaldsdottir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Briefings in Bioinformatics. 2013;14:178–92.
18. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, et al. A method and server for predicting damaging missense mutations. Nat Methods. 2010;7:248–249.
19. Han P, Hang CT, Yang J, Chang CP. Chromatin remodeling in cardiovascular development and physiology. Circ Res. 2011;108:378–96.
20. Chen L, Cai Y, Jin J, Florens L, Swanson SK, Washburn MP, et al. Subunit organization of the human INO80 chromatin remodeling complex: an evolutionarily conserved core complex catalyzes ATP-dependent nucleosome remodeling. J Biol Chem. 2011;286:11283–11289.
21. Jin J, Cai Y, Yao T, Gottschalk AJ, Florens L, Swanson SK, et al. A mammalian chromatin remodeling complex with similarities to the yeast INO80 complex. J Biol Chem. 2005;280:41207–41212.
22. Morrison AJ, Shen X. Chromatin remodelling beyond transcription: the INO80 and SWR1 complexes. Nat Rev Mol Cell Biol. 2009;10:373–384.
23. Bao Y, Shen X. INO80 subfamily of chromatin remodeling complexes. Mutat Res. 2007;618:18–29.
24. Conaway RC, Conaway JW. The INO80 chromatin remodeling complex in transcription, replication and repair. Trends Biochem Sci. 2009;34:71–77.
25. Ferrè F, Clote P. DiANNA: a web server for disulfide connectivity prediction. Nucleic Acids Res. 2005;33(Web Server issue):W230–W232.
26. Wedemeyer WJ, Welker E, Narayan M, Scheraga HA. Disulfide bonds and protein folding. Biochemistry. 2000;39:4207–4216.
27. Shen X. Preparation and analysis of the INO80 complex. Methods Enzymol. 2004;377:401–412.
28. Fass D. Disulfide bonding in protein biophysics. Annu Rev Biophys. 2012;41:63–79.
29. Hur SK, Park EJ, Han JE, Kim YA, Kim JD, Kang D, et al. Roles of human INO80 chromatin remodeling enzyme in DNA replication and chromosome segregation suppress genome instability. Cell Mol Life Sci. 2010;67:2283–2296.
30. Park EJ, Hur SK, Kwon J. Human INO80 chromatin-remodelling complex contributes to DNA double-strand break repair via the expression of Rad54B and XRCC3 genes. Biochem J. 2010;431:179–187.
31. Watanabe S, Peterson CL. The INO80 family of chromatin-remodeling enzymes: regulators of histone variant dynamics. Cold Spring Harb Symp Quant Biol. 2010;75:35–42.
32. Han P, Hang CT, Yang J, Chang CP. Chromatin remodeling in cardiovascular development and physiology. Circ Res. 2011;108:378–396.
33. Chambers AL, Downs JA. The RSC and INO80 chromatin-remodeling complexes in DNA double-strand break repair. Prog Mol Biol Transl Sci. 2012;110:229–261.
34. Chambers AL, Ormerod G, Durley SC, Sing TL, Brown GW, Kent NA, et al. The INO80 chromatin remodeling complex prevents polyploidy and maintains normal chromatin structure at centromeres. Genes Dev. 2012;26:2590–2603.
35. Chang CP, Bruneau BG. Epigenetics and cardiovascular development. Annu Rev Physiol. 2012;74:41–68.
36. Kozomara A, Griffiths-Jones S. miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res. 2011;39(Database issue):D152–D157.
37. Lu M, Zhang Q, Deng M, Miao J, Guo Y, Gao W, et al. An analysis of human microRNA and disease associations. PLoS One. 2008;3:e3420.
38. Rottbauer W, Saurin AJ, Lickert H, Shen X, Burns CG, Wo ZG, et al. Reptin and pontin antagonistically regulate heart growth in zebrafish embryos. Cell. 2002;111:661–672.
39. Bhatia S, Pawar H, Dasari V, Mishra RK, Chandrashekaran S, Brahmachari V. Chromatin remodeling protein INO80 has a role in regulation of homeotic gene expression in Drosophila. Genes Cells. 2010;15:725–735.
40. Clapier CR, Cairns BR. The biology of chromatin remodeling complexes. Annu Rev Biochem. 2009;78:273–304.
41. Bakshi R, Mehta AK, Sharma R, Maiti S, Pasha S, Brahmachari V. Characterization of a human SWI2/SNF2 like protein hINO80: demonstration of catalytic and DNA binding activity. Biochem Biophys Res Commun. 2006;339:313–320.
42. Cai Y, Jin J, Yao T, Gottschalk AJ, Swanson SK, Wu S, et al. YY1 functions with INO80 to activate transcription. Nat Struct Mol Biol. 2007;14:872–874.
43. Falbo KB, Shen X. Function of the INO80 chromatin remodeling complex in DNA replication. Front Biosci. 2012;17:970–975.
44. Cairns BR. Around the world of DNA damage INO80 days. Cell. 2004;119:733–735.
45. Kawashima S, Ogiwara H, Tada S, Harata M, Wintersberger U, Enomoto T, et al. The INO80 complex is required for damage-induced recombination. Biochem Biophys Res Commun. 2007;355:835–841.
46. Morrison AJ, Kim JA, Person MD, Highland J, Xiao J, Wehr TS, et al. Mec1/Tel1 phosphorylation of the INO80 chromatin remodeling complex influences DNA damage checkpoint responses. Cell. 2007;130:499–511.
47. Czaja W, Bespalov VA, Hinz JM, Smerdon MJ. Proficient repair in chromatin remodeling defective ino80 mutants of Saccharomyces cerevisiae highlights replication defects as the main contributor to DNA damage sensitivity. DNA Repair (Amst). 2010;9:976–984.
48. Kashiwaba S, Kitahashi K, Watanabe T, Onoda F, Ohtsu M, Murakami Y. The mammalian INO80 complex is recruited to DNA damage sites in an ARP8 dependent manner. Biochem Biophys Res Commun. 2010;402:619–625.
49. Sarkar S, Kiely R, McHugh PJ. The Ino80 chromatin-remodeling complex restores chromatin structure during UV DNA damage repair. J Cell Biol. 2010;191:1061–1068.
50. Kato D, Waki M, Umezawa M, Aoki Y, Utsugi T, Ohtsu M, et al. Phosphorylation of human INO80 is involved in DNA damage tolerance. Biochem Biophys Res Commun. 2012;417:433–438.
51. Yu EY, Steinberg-Neifach O, Dandjinou AT, Kang F, Morrison AJ, Shen X, et al. Regulation of telomere structure and functions by subunits of the INO80 chromatin remodeling complex. Mol Cell Biol. 2007;27:5639–5649.
52. Saravanan M, Wuerges J, Bose D, McCormack EA, Cook NJ, Zhang X, et al. Interactions between the nucleosome histone core and Arp8 in the INO80 chromatin remodeling complex. Proc Natl Acad Sci U S A. 2012;109:20883–20888.
53. Charles GM, Chen C, Shih SC, Collins SR, Beltrao P, Zhang X, et al. Site-specific acetylation mark on an essential chromatin-remodeling complex promotes resistance to replication stress. Proc Natl Acad Sci U S A. 2011;108:10620–10625.
54. van Attikum H, Fritsch O, Gasser SM. Distinct roles for SWR1 and INO80 chromatin remodeling complexes at chromosomal double-strand breaks. EMBO J. 2007;26:4113–4125.
55. Dreesen O, Stewart CL. Accelerated aging syndromes, are they relevant to normal human aging? Aging. 2011;3:889–895.

CLINICAL PERSPECTIVE

Genome sequencing is increasingly being used to delineate the genetic basis of rare diseases that cluster in families. We performed whole exome sequencing in a family with 2 siblings with aortic hypoplasia, premature atherosclerosis, arterial stiffness, and cataract. Using 2 complementary exome sequence analysis pipelines and filtering methods, we identified a mutation (Ser818Cys) in INO80D as the causal variant. INO80D is a subunit of the human INO80 chromatin remodeling complex and plays a key role in chromatin-dependent functions, including transcription, DNA repair, and DNA replication. Genetic variations in chromatin remodeling complexes have been previously implicated in cardiovascular development, accelerated aging, and developmental syndromes. Our findings suggest that the Ser818Cys mutation in the INO80D subunit of the human INO80 chromatin remodeling complex leads to a syndrome of accelerated arterial aging manifesting as aortic hypoplasia, early atherosclerosis, and arterial stiffness.


SUPPLEMENTAL  MATERIAL                                                  

 

1

Library  preparation,  whole  exome  capture  and  sequencing   Paired-­‐end  indexed  libraries  were  prepared  following  the  manufacturer’s  (Agilent   Technologies,  Santa  Clara,  CA)  protocol.    Briefly,  target  DNA  (3ηg  in  120  ul  TE  buffer)  was   fragmented  using  a  Covaris  E210  sonicator  (Covaris  Inc,  Woburn,  Mass.,)  using  a  duty  cycle  of  10%,   intensity  5,  cycles  200,  time  360  seconds,  resulting  in  double-­‐stranded  DNA  fragments  with  blunt  or   sticky  ends  with  a  fragment  size  between  150-­‐200  bp.  The  ends  were  repaired  and  phosphorylated   using  Klenow,  T4  polymerase,  and  T4  polynucleotide  kinase,  after  which  “A”  base  is  added  to  the  3’   ends  of  double-­‐stranded  DNA  using  Klenow  exo-­‐  (3’  to  5’  exo  minus).    Paired  end  Index  DNA   adaptors  (Agilent  Technologies,  Santa  Clara,  CA)  with  a  single  “T”  base  overhang  at  the  3’  end  were   ligated  and  resulting  constructs  were  purified  using  AMPure  SPRI  beads  from  Agencourt.  The   adapter-­‐modified  DNA  fragments  were  enriched  by  4  cycles  of  PCR  using  InPE  1.0  forward  and   SureSelect  Pre-­‐Capture  Indexing  reverse  (Agilent  Technologies,  Santa  Clara,  Ca)  primers.  The   concentration  and  size  distribution  of  the  libraries  were  determined  on  an  Agilent  Bioanalyzer  DNA   1000  chip  (Agilent  Technologies,  Santa  Clara,  Ca).   Exome  capture  was  carried  out  using  the  protocol  for  Agilent’s  SureSelect  Human  All  Exon   50MB  kit  (Agilent  Technologies,  Santa  Clara,  Ca).  
This kit encompasses coding exons annotated by the GENCODE project (www.sanger.ac.uk/gencode/) as well as the consensus coding sequence (CCDS, www.ncbi.nlm.nih.gov/CCDS/) and RefSeq (www.ncbi.nlm.nih.gov/refseq/) databases, and incorporates exomic regions and non-coding RNAs from the miRBase (v.13) and Rfam databases to provide a capture size of approximately 50 Mb. A total of 500 ng of the prepped library was incubated for 24 hours at 65 °C with the whole exon biotinylated RNA capture baits supplied in the kit. The captured DNA:RNA hybrids were recovered using Dynabeads MyOne Streptavidin T1 from Dynal (Invitrogen, Carlsbad, CA). DNA was eluted from the beads and purified using Ampure XP beads from Agencourt (Beckman Coulter, Brea, CA). The purified capture products were then amplified using the SureSelect Post-Capture Indexing forward and Index PCR reverse primers (Agilent) for 12 cycles. Libraries were validated and quantified on the Agilent Bioanalyzer (Agilent Technologies, Santa Clara, CA).

For individuals L6-L9, libraries were loaded onto paired-end flow cells at concentrations of 4-5 pM to generate cluster densities of 300,000-500,000/mm² following Illumina's standard protocol using the Illumina cBot and HiSeq Paired end cluster kit version 1 (Illumina, San Diego, CA). The flow cells were sequenced as 101 X 2 paired-end reads on an Illumina HiSeq 2000 using TruSeq SBS sequencing kit version 1 and HiSeq data collection version 1.1.37.0 software. Base calling was performed using Illumina's RTA version 1.7.45.0.

For individual L10, libraries were loaded onto paired-end flow cells at concentrations of 7.5 pM to generate cluster densities of 500,000-600,000/mm² following Illumina's standard protocol using the Illumina cBot and HiSeq Paired end cluster kit version 3. The flow cells were sequenced as 101 X 2 paired-end reads on an Illumina HiSeq 2000 using TruSeq SBS sequencing kit version 3 and HiSeq data collection version 1.4.8 software. Base calling was performed using Illumina's RTA version 1.12.4.2.

Genotype calling and variant filtration

The technical challenge of the modern era of genomic medicine and personalized exome analytics lies in the effective use of a combination of tools to find software-agnostic, highly concordant, high-quality genetic variants underlying complex, familial diseases[1]. To address this challenge, we used two computational genomic data analysis pipelines and two complementary variant filtering methods. Both pipelines included analysis modules for quality control, sequence alignment (two different aligners: BWA and Novoalign), base quality score recalibration, variant calling, and complementary variant filtering.
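The idea of software-agnostic, concordant variants can be illustrated with a minimal sketch that intersects the call sets of two pipelines. This is not the authors' code: the `Variant` record, the function name, and the toy coordinates are assumptions made purely for illustration.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not a structure from the study)."""
    chrom: str
    pos: int  # 1-based position
    ref: str
    alt: str


def concordant_calls(calls_a, calls_b):
    """Return variants called by both pipelines and their share of the union."""
    set_a, set_b = set(calls_a), set(calls_b)
    shared = set_a & set_b
    union = set_a | set_b
    rate = len(shared) / len(union) if union else 0.0
    return shared, rate


# Toy call sets with made-up coordinates, standing in for the output of a
# BWA-based pipeline and a Novoalign-based pipeline.
bwa_calls = [
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 250, "C", "T"),
    Variant("chr2", 500, "T", "A"),
]
novoalign_calls = [
    Variant("chr1", 100, "A", "G"),
    Variant("chr2", 500, "T", "A"),
    Variant("chr3", 42, "G", "C"),
]

shared, rate = concordant_calls(bwa_calls, novoalign_calls)
print(sorted(shared), rate)
```

Matching on chromosome, position, reference allele, and alternative allele is the simplest possible definition of concordance; a real comparison would also need to normalize indel representation before intersecting.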

Illumina fastq files were converted to Sanger fastq files using the MAQ software (http://maq.sourceforge.net/). We used the FASTX-toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) for preprocessing short-read fastq files. The preprocessing steps included clipping sequencing primers/adapter sequences, trimming sequences based on the quality scores, and filtering artifacts and low quality sequences. We used FastQC (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/) to perform QC on raw and QC-filtered sequence data. We aligned sequence reads to the reference genome GRCh37/hg19 from the 1000 Genomes project using the Burrows-Wheeler Aligner (BWA) software. We used a Perl script (cmpfastq.pl [http://compbio.brc.iop.kcl.ac.uk/software/cmpfastq.php]) to identify paired and unpaired reads. BWA was then used to align single-end and paired-end data separately. The generated SAM files were merged using PICARD (http://picard.sourceforge.net/index.shtml) to generate sorted BAM files. The BAM files were indexed using SAMtools (http://samtools.sourceforge.net/).

We used the Genome Analysis Toolkit[2] (GATK v1.0.5777, http://www.broadinstitute.org/gsa/wiki/index.php/The_Genome_Analysis_Toolkit) for post-alignment processing of BAM files, including local realignment around insertions or deletions (indels), removal of duplicates, and base quality score recalibration. Sequence aligners are unable to perfectly map reads containing indels, which produces alignment artifacts and false positive SNPs. Multiple steps were employed in the realignment process, i.e., determining (small) suspicious intervals that are likely in need of realignment, running the realigner over those intervals, and fixing the mate pairs of realigned reads. Realigned, fixed, and sorted BAM files were generated for each sample. Duplicate reads were located and removed using PICARD tools (MarkDuplicates). Finally, we corrected for variation in quality with machine cycle and sequence context by analyzing the co-variation among several features of the base (i.e., reported quality score, the position within the read, the preceding and current nucleotide observed by the sequencing machine, probability of mismatching the reference genome, and known SNPs taken into account). The recalibrated quality scores are more accurate.[2]

Analysis of depth of coverage in the final BAM files indicated that approximately 80% of the exomic regions were covered more than 8 times in the five patients (--minMappingQuality=10 and --minBaseQuality=20). We then used GATK 'UnifiedGenotyper' for multiple-sample calling to generate raw variants.
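The coverage statistic quoted above (fraction of target bases covered more than 8 times, counting only reads with mapping quality of at least 10 and bases with quality of at least 20) can be sketched in a few lines. This is a simplified illustration, not the GATK implementation: the read representation (ungapped reads given as start, mapping quality, and per-base qualities) and both function names are assumptions.

```python
from collections import Counter


def per_base_depth(reads, min_mapq=10, min_baseq=20):
    """Count usable bases per position.

    reads: iterable of (start, mapq, base_quals), where start is 0-based and
    the read is assumed to align without gaps (a simplification).
    """
    depth = Counter()
    for start, mapq, base_quals in reads:
        if mapq < min_mapq:
            continue  # whole read is ignored, as with a mapping-quality cut-off
        for offset, qual in enumerate(base_quals):
            if qual >= min_baseq:
                depth[start + offset] += 1
    return depth


def breadth_of_coverage(depth, targets, threshold=8):
    """Fraction of target bases with depth strictly greater than threshold.

    targets: list of half-open (start, end) intervals, e.g. exon coordinates.
    """
    total = covered = 0
    for start, end in targets:
        for pos in range(start, end):
            total += 1
            if depth.get(pos, 0) > threshold:
                covered += 1
    return covered / total if total else 0.0


# Toy example: nine good reads cover positions 0-4 of a 10-base target;
# a tenth read is discarded because its mapping quality is below 10.
toy_reads = [(0, 60, [30] * 5) for _ in range(9)] + [(5, 5, [30] * 5)]
toy_depth = per_base_depth(toy_reads)
toy_breadth = breadth_of_coverage(toy_depth, targets=[(0, 10)])
print(toy_breadth)
```

In the toy example, half of the target reaches a depth of 9 and the other half has no usable coverage, so the breadth at the ">8" threshold is 0.5.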
GATK applies a Bayesian algorithm for variant discovery and genotyping that simultaneously estimates the probability that two alleles, A (the reference allele) and B (the alternative allele), are segregating in a sample of N individuals and the likelihoods of the AA, AB and BB genotypes for each individual sample.[2] If the genotype for an individual could not be assigned based on the genotype likelihood model, an unknown genotype 'N' was assigned.

To generate analysis-ready variants, GATK 'VariantFiltration' was used to annotate suspicious calls in variant call format (VCF) files based on the filters they failed. Raw SNP calls were filtered using empirically derived cut-offs for the following GATK filter expressions: --filterExpression "QUAL−0.10 || SNPcluster || InDel" --filterName "StandardFilters" --filterExpression "DP= 4 && ((MQ0 / (1.0 * DP)) > 0.1)" --filterName "Hard2Validate", where DP is the sequencing depth at the SNP position; QD is the QUAL/DP ratio at the SNP position; HRun is the maximal length of the homopolymer run; SB is the strand bias at the SNP position; SNPcluster is 3 SNPs within 10 bp of each other; InDel is SNP calls around the raw InDel calls; and MQ0 is the number of mapping-quality zero reads at the position. The resulting VCF file was annotated using the SeattleSeq Annotation Server and filtered based on variant quality and localization of variants overlapping the ROH regions.

In the confirmatory pipeline, the fastq files were processed using the GenomeGPS pipeline v2.0. Briefly, the Illumina paired-end reads were aligned to the hg19 reference genome using Novoalign (http://novocraft.com), followed by sorting and marking of duplicate reads using Picard. Local realignment of INDELs and base quality score recalibration were then performed using the Genome Analysis Toolkit (GATK). Single nucleotide variants (SNVs) and insertions/deletions (INDELs) were called across all of the samples simultaneously using GATK's UnifiedGenotyper with variant quality score recalibration. Variants were annotated using a custom annotation workflow and filtered using VAAST v1.04 and knowledge-based gene lists relevant to the phenotype observed.

Genotype calling and variant filtration

Results from the primary pipeline were filtered using an annotation database following strict criteria and localization of variants in ROH regions shared by the family (Table S2).

Results from the confirmatory pipeline were filtered using two different methods. The first variant filtering method focused on annotation-driven filtering followed by localization of a variant in an ROH region shared by the pedigree. The second filtering method followed a knowledge-based approach to look for rare variants in the known genes associated with the leading phenotypes, followed by probabilistic disease variant identification using VAAST. The analysis was carried out to identify high-confidence variants in the affected individuals using a pipeline that uses a different short-read aligner. Because two siblings were affected, we ruled out the possibility that the mutation arose de novo, as this is extremely unlikely.
We also ruled out the possibility of uniparental disomy (UPD) or a single copy deletion occurring in both siblings, as this is also extremely unlikely.

(i) Method 1: Single homozygous alternative mutation at a position where both parents are heterozygous

Variants were processed using VAAST v1.04 (3) configured to fit a recessive mode of inheritance in a trio-mode assessment. Variant calls from the two affected siblings were intersected and input as the "proband". The resulting statistically significant candidate variants were filtered to exclude any findings positive in the unaffected sibling. The program was configured with the following options:

-m lrt -o output_trio_pnt_i_02_21_2013 -pnt c --mp1 8 --less_ram --fast_gp --gp 1e10 --significance 2.4e-6 --codon_bias -iht r --locus_heterogeneity n --trio

The VAAST program was configured to run in a recessive mode with no locus heterogeneity. Given the limitations of the current version of the software, we ran it with the affected siblings' genotype data intersected and compared against the parents. A post-analysis filter was then applied to remove potential candidates that were found in the unaffected daughter.

(ii) Method 2: Compound heterozygous variant where each parent carries only one variant

Variant calls were filtered for positions that were homozygous alternative in the affected siblings, heterozygous in the parents, and either heterozygous or homozygous reference in the unaffected sibling. Results were then filtered to select non-synonymous mutations with an MAF < 0.1 in the ESP6500 datasets and all HapMap and 1000 Genomes population datasets. Results from the two methods were combined and manually reviewed for relevance to the disease state using GLAD4U (http://bioinfo.vanderbilt.edu/glad4u/). Two gene lists were created, one with eight targets using the specific term "aortic hypoplasia-atherosclerosis syndrome" and a second with 196 targets using the generic term "atherosclerosis".

Initial results were processed to find Method (i) variants. Results were filtered to restrict the unaffected sibling to being either heterozygous or homozygous reference. Additionally, variants were filtered for non-synonymous changes. The variant counts after initial filtering on the basis of MAF (< 0.1) were as follows: 595 variants were homozygous alternative in both affected siblings and heterozygous in both parents. Of these, 387 variants were either homozygous reference or heterozygous in the unaffected sister. Of these, 68 variants encoded non-synonymous changes. The results of MAF-based variant filtering were then compared against the candidate gene lists, and a common variant in APOB (chr2:21250914 G>A; A618V) was observed.
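The segregation and frequency filters described above (homozygous alternative in both affected siblings, heterozygous in both parents, heterozygous or homozygous reference in the unaffected sibling, non-synonymous, MAF < 0.1 in every population dataset) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the record layout, sample labels, function names, and toy variants are all hypothetical.

```python
HOM_REF, HET, HOM_ALT = "0/0", "0/1", "1/1"  # VCF-style genotype strings


def segregates_as_recessive(genotypes):
    """True if genotypes match the recessive pattern described in the text."""
    return (
        genotypes["affected_sib_1"] == HOM_ALT
        and genotypes["affected_sib_2"] == HOM_ALT
        and genotypes["father"] == HET
        and genotypes["mother"] == HET
        and genotypes["unaffected_sib"] in (HOM_REF, HET)
    )


def is_rare(population_mafs, max_maf=0.1):
    """True if MAF is below the cut-off in every dataset that reports it.

    A variant absent from all datasets (empty dict) is treated as rare.
    """
    return all(maf < max_maf for maf in population_mafs.values())


def filter_candidates(variants, max_maf=0.1):
    """Apply segregation, non-synonymous, and frequency filters in turn."""
    return [
        v for v in variants
        if segregates_as_recessive(v["genotypes"])
        and v["nonsynonymous"]
        and is_rare(v["mafs"], max_maf)
    ]


def family(unaffected):
    """Build a toy genotype dict; only the unaffected sibling varies."""
    return {
        "affected_sib_1": HOM_ALT, "affected_sib_2": HOM_ALT,
        "father": HET, "mother": HET, "unaffected_sib": unaffected,
    }


# Toy records with made-up frequencies (not data from the study).
toy_variants = [
    {"name": "toy_variant_1", "genotypes": family(HET),
     "nonsynonymous": True, "mafs": {"ESP6500": 0.001, "1000G": 0.002}},
    {"name": "toy_variant_2", "genotypes": family(HOM_ALT),  # fails segregation
     "nonsynonymous": True, "mafs": {"ESP6500": 0.001}},
    {"name": "toy_variant_3", "genotypes": family(HOM_REF),  # too common
     "nonsynonymous": True, "mafs": {"ESP6500": 0.25, "1000G": 0.01}},
]

kept = [v["name"] for v in filter_candidates(toy_variants)]
print(kept)
```

Requiring the cut-off to hold in every dataset, rather than in any one of them, mirrors the wording "in all population datasets" and is the more conservative choice.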
Four of the 68 variants had population frequencies < 0.1 in all population datasets queried. These include two deleterious and damaging variants in INO80D (Chr2:206869724 T>A) and NPIPA5 (Chr16:15463612 G>A), and two tolerated and benign variants in SLC35B1 (Chr17:47783663 C>T) and SALL3 (Chr18:76754549 T>C).

Sanger sequencing

To validate the presence of the mutation we used Sanger sequencing. The FASTA sequences of two genes of interest (that satisfied strict filtering of the discovery and validation pipelines, VAAST analyses and ROH filtering results) were obtained using NCBI nucleotide search (http://www.ncbi.nlm.nih.gov/nuccore) to design the primers. These FASTA sequences were used as queries in NCBI Primer-BLAST (http://www.ncbi.nlm.nih.gov/tools/primer-blast/index.cgi?LINK_LOC=BlastHome). Primer pairs were selected based on primer length (18-30 bp), GC content, theoretical melting temperature (Tm = 59-60°C), and product size. BLAST was used to check the specificity of primers. PAGE-purified oligos (Integrated DNA Technologies, IA, USA) were used for real-time PCR. Primers used for resequencing mutation sites in INO80D and TMPRSS11E are provided (Table S4). Sanger sequencing for loci mapping was performed using Big Dye terminator chemistries on an ABI 3730xl (Life Technologies; Carlsbad, CA) sequencer.

Sequence, structure and functional annotation of INO80D

Conserved domains and motifs in INO80D:

We performed a comparative sequence analysis of the wild-type isoforms and the derived sequence with the Ser818Cys mutation. Two wild-type isoform sequences and the mutated sequences were used to assess the secondary structure, solvent accessibility and distribution of LCRs. Secondary structure and solvent accessibility were predicted from the sequence using SABLE [3], and LCRs were characterized using the SEG program integrated in the SMART database[4].

We also characterized conserved functional domains in INO80D using sequence-based protein domain searches: protein sequences (Q53TQ3-1 and Q53TQ3-2) were scanned against the Pfam database. INO80D encodes two copies of the zf-C3Hc3H domain in both isoforms. Seven low-complexity regions (LCRs) were found in the sequence of Q53TQ3-1 and nine LCR segments were found in Q53TQ3-2 using the SEG [5] program integrated in SMART[4, 6]. LCRs are tandem sequence repeats in the protein universe and were often excluded in the past prior to detailed sequence analysis. For example, sequence search algorithms like BLAST mask off sequence with low compositional complexity. Recent studies on LCR function suggest that protein sequences with LCRs have several important functional roles. LCRs are common in protein sequence space and are observed in diverse proteins. Proteins with LCRs have a higher number of first-degree interaction partners than proteins without LCRs (Wilcoxon-Mann-Whitney test; P30%)[8]. A remote homolog (16%) of the INO80D sequence was found in PDB identifier 2VZ9, A chain (structure of mammalian fatty acid synthase) [9]. This could be a further pointer that INO80D may encode a novel fold or a fold similar to fatty acid synthase (Figure S1(a)).
As 2VZ9 is not yet incorporated in the Structural Classification of Proteins (SCOP) database [10], an objective fold recognition approach could not be used to detect additional structural relationships.

Interactome of INO80D:

To understand the functional context of INO80D from a network perspective, the first-degree interactome of INO80D was obtained from IntAct, a database of experimentally characterized protein-protein interactions, and visualized using Cytoscape (Figure 3(d)). Twenty interactions originating from spoke-expanded co-complexes were reported in IntAct. This indicates that the INO80D gene product is involved in multiple protein-protein interactions, a hallmark feature of proteins with LCRs, in addition to being an important subunit of the INO80 complex.

Phylogenetic analysis of INO80D:

INO80D is a component of the human INO80 complex, which has multiple functions including chromatin remodeling. INO80D is a non-conserved subunit in human, yeast and Drosophila. The exact evolutionary lineage of INO80D is unclear. To infer the functional role of INO80D from its homologs, we performed a detailed phylogenetic analysis. The sequence of the longest isoform was used for homology search using PSI-BLAST, and a phylogenetic tree was constructed using Phylip v3.6 (http://evolution.genetics.washington.edu/phylip.html) and visualized using iTOL [11]. The PSI-BLAST search [12] (E-value: 0.05) was performed against the non-redundant database (nr) with sequences from GenBank CDS translations, PDB, SwissProt, PIR and PRF. From the first iteration, 146 sequences were obtained. These 146 sequences were aligned using Clustal-Omega [13]. Bootstrapping of the output from Clustal-Omega was performed using seqboot (1000 iterations). The 'protdist' program was used to derive the pairwise distances between the 146 sequences. Phylogenetic trees were derived from the 'protdist' output using the 'neighbor' program (neighbor-joining tree method). Consensus trees with bootstrap values were derived from the 'neighbor' output using the 'consense' program. Nodes of the INO80D phylogenetic tree that co-clustered with the query sequence (Q53TQ3-2) indicate that INO80D is conserved exclusively in higher eukaryotes and that the functions of the co-clustered proteins are largely unknown (Figure S1(b)). This indicates that INO80D is a metazoan-specific protein, and it may have a recent evolutionary history.

MicroRNAs targeting INO80D:

MicroRNA (miRNA) molecules have an established role in regulating genes involved in cardiovascular and aging phenotypes [14-17] via translational repression pathways [18]. To understand whether any known miRNAs implicated in cardiovascular or aging phenotypes target INO80D, we compiled literature reports and miRNA expression data. To perform this analysis we retrieved all miRNAs targeting the UTR region of INO80D. A list of putative miRNAs that could target INO80D was identified by a TargetScan search [19-21] using a library of regulatory targets of mammalian and vertebrate miRNAs. A list of 27 miRNAs was retrieved, and clinical phenotypes associated with these miRNAs were obtained from the Human MiRNA & Disease Database (HMDD). We noted that several miRNAs implicated in cardiovascular and aging phenotypes target INO80D (see Supplementary Table S6), suggesting a regulatory perturbation of INO80D in the setting of various disease phenotypes.

Disease or quantitative traits associated with INO80 complex subunits:

We compiled results from published genome-wide association studies to understand the genetic role of different subunits of the INO80 complex. Subunits of the INO80 complex and their phylogenetic similarity, compiled from protein databases, are provided (Figure S1(c)). Published GWAS reports suggest that subunits of the INO80 complex are associated with phenotypes such as extreme obesity, heart rate and capecitabine sensitivity (Table S7).

 


Table S1. Application, results and inferences from various tools employed for functional analysis of the Ser818Cys mutation in INO80D

Application | Tool | Result | Inference
Prediction of conserved domains and motifs | SMART | No conserved domains predicted, LCRs are predicted | INO80D encodes multiple LCRs and the mutation site is part of LCR-7
Prediction of conserved domains and motifs | Pfam | Encodes 2 copies of potential DNA-binding domain "zf-C3Hc3H" | Presence of zf-C3Hc3H indicates its functional role in mediating protein-DNA binding and related functional mechanisms
Prediction of unassigned region | PURE | No distant domain association predicted | No known domains could be assigned to INO80D using PURE
Homology modeling (template search) | ModWeb/ModBase | Template search identified remote homolog (16%), A chain of structure of mammalian fatty acid synthase, PDB ID: 2VZ9, with no SCOP classification | INO80D may encode a novel fold or a fold similar to fatty acid synthase
Homology modeling | ModWeb/ModBase | No single structural homologs were found with sequence similarity above the twilight zone (>30%) | Homology models derived using low-similarity (< 30%) templates are not ideal for structure analysis
Phylogenetic analysis | PSI-BLAST, Phylip v3.6, iTOL | A phylogenetic tree was derived using protein homologs (Figure S1) | Tree depicts that INO80D is conserved exclusively in higher eukaryotes and the functions of the co-clustered proteins are largely unknown
Protein-protein interaction analysis | IntAct | First-degree interactome of INO80D was obtained | 20 interactions originating from spoke-expanded co-complexes were reported in IntAct, indicating that INO80D is involved in multiple protein-protein interactions, a feature of proteins with LCRs
Secondary structure and solvent accessibility | SABLE | Secondary structure and solvent accessibility predicted using the sequence of INO80D isoforms | Comparative analysis of wild-type and mutant sequence revealed changes in solvent accessibility due to mutation

     

                                             

 


Table S2: Homozygous regions identified using Runs of Homozygosity (ROH) routine in PLINK
