Original Article
Whole Exome Sequencing Implicates an INO80D Mutation in a Syndrome of Aortic Hypoplasia, Premature Atherosclerosis, and Arterial Stiffness
Khader Shameer, PhD; Eric W. Klee, PhD; Angela K. Dalenberg, BS; Iftikhar J. Kullo, MD
Background: Massively parallel, high-throughput sequencing technology is helping to generate new insights into the genetic basis of human diseases. We used whole exome sequencing to identify the mutation underlying a syndrome affecting 2 siblings with aortic hypoplasia, calcific atherosclerosis, systolic hypertension, and premature cataract.
Methods and Results: Exonic regions were captured and sequenced using a next-generation sequencing platform to generate 100-base paired-end reads. A computational genomic data analysis pipeline was used to perform quality control, align reads to a reference genome, and identify genetic variants; findings were confirmed using a different exome analysis pipeline. The 2 siblings were homozygous for a rare missense mutation (Ser818Cys) in INO80D, a subunit of the human INO80 chromatin remodeling complex. Homozygosity mapping and Sanger sequencing confirmed that the mutation is located in one of the runs of homozygosity on chromosome 2. INO80D encodes a key subunit of the human INO80 complex, a multiprotein complex involved in DNA binding, chromatin modification, organization of chromosome structure, and ATP-dependent nucleosome sliding. By introducing a new disulfide bond in the protein product and also disrupting the composition of low-complexity regions, the Ser818Cys mutation may affect INO80D function, protein–protein interactions, and chromatin remodeling.
Conclusions: Our findings suggest a link between the Ser818Cys mutation in INO80D, a subunit of the human INO80 chromatin remodeling complex, and accelerated arterial aging.
(Circ Cardiovasc Genet. 2014;7:607-614.)
Key Words: aortic hypoplasia ◼ arterial stiffness ◼ atherosclerosis ◼ chromatin assembly and disassembly ◼ genetics ◼ INO80D protein, human
Massively parallel sequencing is a powerful tool to identify the genetic basis of heritable diseases.1 Whole exome sequencing (WES) is a cost-efficient strategy to study the ≈1% of the human genome that is protein coding and has been used with success to identify genes for rare diseases that cluster in families.2,3 Consequently, WES is increasingly being used to investigate the genetic causes of hitherto uncharacterized inherited disorders, and several disease genes have been identified using this technology.4
In this report, we describe an application of WES to identify the genetic basis of a syndrome affecting 2 non-Hispanic white siblings who were referred to the Early Atherosclerosis Clinic of the Mayo Clinic. The affected siblings had aortic hypoplasia, diffuse arterial calcification, calcific aortic valve stenosis, systolic hypertension, and wide pulse pressure indicative of arterial stiffness, peripheral arterial disease, premature cataract, and excessive dental root mobility. These clinical features were not present in the parents and 2 other siblings. To identify the mutation underlying this syndrome, we performed high-density genotyping, runs of homozygosity analysis, and WES in the 2 affected siblings, as well as WES in 1 of the 2 unaffected siblings and the parents. The Institutional Review Board of the Mayo Clinic approved the study and the family members provided informed consent.
Methods
Clinical Phenotyping
There was no history of consanguinity between the parents of affected individuals. Clinical features of the affected siblings (L6 and L7), an unaffected sibling (L10), and the parents (L8 and L9) are summarized in the Table. The male sibling presented with systolic hypertension at the age of 17 years and the female sibling presented with systolic hypertension in her early 20s and cataract at the age of 30 years. Both were subsequently noted to have a hypoplastic aorta with extensive calcification, features that were not present in the parents or the other 2 siblings, thereby suggesting a recessive mode of inheritance (Figure 1). The male sibling died at the age of 42 years of a myocardial infarction after aortic valve replacement surgery.
Received May 23, 2013; accepted June 27, 2014.
From the Division of Cardiovascular Diseases (K.S., A.K.D., I.J.K.) and Biomedical Statistics and Informatics, Departments of Health Sciences Research and Laboratory Medicine and Pathology (E.W.K.), Mayo Clinic, Rochester, MN. The current address for Dr Shameer is Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY.
The Data Supplement is available at http://circgenetics.ahajournals.org/lookup/suppl/doi:10.1161/CIRCGENETICS.113.000233/-/DC1.
Correspondence to Iftikhar J. Kullo, MD, Mayo Clinic, 200 1st St SW, Rochester, MN 55905.
© 2014 American Heart Association, Inc.
Circ Cardiovasc Genet is available at http://circgenetics.ahajournals.org
DOI: 10.1161/CIRCGENETICS.113.000233
Downloaded from http://circgenetics.ahajournals.org/ at Tulane University on January 29, 2015 607
Circ Cardiovasc Genet October 2014

Table. Clinical Features of the 5 Members of the Affected Family

| Feature | L6 (Affected Son) | L7 (Affected Daughter) | L10 (Unaffected Daughter) | L8 (Father) | L9 (Mother) |
| --- | --- | --- | --- | --- | --- |
| Sex | Men | Women | Women | Men | Women |
| Vital status | Died at the age of 41 y | Alive, age 42 y | Alive, age 38 y | Alive, age 70 y | Alive, age 66 y |
| Aorta | Hypoplasia of the ascending (18 mm) and descending (14 mm) aorta with calcification | Hypoplasia of ascending (17 mm) and descending (12 mm) aorta with calcification | ...† | Normal dimensions on echo | Normal dimensions on echo |
| Valve disease | Severe calcific aortic valve stenosis | Mild calcific aortic valve stenosis | None | None | None |
| CAD | Mild coronary atherosclerosis on coronary angiogram | Coronary calcification on CT scan (score >90th percentile); but no significant stenosis on coronary angiogram at age 40 | No coronary calcification on CT scan at age 27 | CABG at the age of 52 y | No history of CABG |
| PAD | Aorto-iliac bypass graft at the age of 32 y for claudication | Mild claudication | No history of PAD | No history of PAD | No history of PAD |
| Carotid ultrasound | Moderate to severe plaque bilaterally | Mild plaque in both carotid bulbs | Normal | Mild plaque | ...† |
| Hypertension | Systolic hypertension with wide pulse pressure, onset at the age of 17 y | Systolic hypertension with wide pulse pressure, onset in 20s | Normotensive | Hypertensive | Normotensive |
| Hyperlipidemia | Yes | Yes | None | Yes | Yes |
| Smoking | None | None | None | None | None |
| Type 2 diabetes mellitus | No | No | No | Yes | No |
| BMI, kg/m² | 23 | 22 | 22 | 32 | 24 |
| Cataract | Cataract surgery at the age of 32 y | Cataract surgery at the age of 34 y | None | None | None |
| Teeth | Short roots with excessive mobility of the teeth | Short roots with excessive mobility of the teeth; extraction of all teeth at the age of 42 y | No history of dental problems | No history of dental problems | No history of dental problems |

BMI indicates body mass index; CABG, coronary artery bypass graft surgery; CAD, coronary artery disease; CT, computed tomography; and PAD, peripheral arterial disease.
†A clinical phenotype was not observed or the test was not performed.
High-Density Genotyping and Homozygosity Mapping
Homozygous segments are likely to carry disease mutations in inbred and outbred families, and homozygosity mapping is a means of localizing deleterious variants associated with recessive traits.5–7 Several recent exome analyses included runs of homozygosity–based filtering as a strategy for identifying causal variants in familial studies.8–13 Homozygosity mapping was performed in the 2 affected siblings who had been genotyped on the Illumina Human-660w Quad-version 1 platform as part of a genome-wide association study of peripheral arterial disease.14 A total of 476 395 single nucleotide polymorphisms (SNPs) passed the following quality control criteria: SNP call rate >98%, minor allele frequency >0.05, and Hardy–Weinberg equilibrium P value [...] TGT) in the 2 parents and 3 siblings by Sanger sequencing. Primers for amplifying the genomic regions around these 2 mutation sites are listed in Table IV in the Data Supplement. Output from Sanger sequencing was visualized and mutations confirmed using 4Peaks software (Figure 3).
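The idea behind homozygosity mapping in 2 affected siblings can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the genotype encoding (two-letter strings such as "AA"), the function names, and the `min_snps` threshold are all assumptions made for illustration, and real analyses also consider physical distance, SNP density, and genotyping error.

```python
from typing import List, Tuple


def is_homozygous(genotype: str) -> bool:
    """True for a called homozygous genotype such as 'AA' (hypothetical encoding)."""
    return len(genotype) == 2 and genotype[0] == genotype[1] and genotype[0] in "ACGT"


def shared_homozygous_runs(
    sib_a: List[str], sib_b: List[str], min_snps: int = 3
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) SNP indices of runs in which both
    siblings are homozygous for the same allele at consecutive SNPs."""
    flags = [is_homozygous(a) and a == b for a, b in zip(sib_a, sib_b)]
    runs: List[Tuple[int, int]] = []
    start = None
    # A trailing False acts as a sentinel that closes a run ending at the last SNP.
    for i, flag in enumerate(flags + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    return runs


# Toy genotypes ordered by chromosomal position (made-up data).
sibling_1 = ["AA", "AG", "CC", "TT", "GG", "AA", "CT", "GG"]
sibling_2 = ["AA", "AG", "CC", "TT", "GG", "AG", "CT", "GG"]
print(shared_homozygous_runs(sibling_1, sibling_2))
```

Candidate recessive variants would then be restricted to those falling inside the shared runs.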
Frequency and Pathogenicity of the Missense Mutation in INO80D
The novel missense mutation in INO80D was not present in dbSNP or the 1000 Genomes Project databases. In the Exome Variant Server (http://evs.gs.washington.edu/EVS/), the frequency of mutant allele A was 0.28% in 1085 sequenced European Americans. With an allelic frequency of A at 0.28%, the expected frequency of genotype AA is ≈8×10⁻⁶ (ie, 8 AA homozygotes per million). A common method to address pathogenicity is to assume that purifying selection constrains evolutionary divergence at phenotypically important nucleotides and amino acids. Multiple sequence alignments of INO80D indicated conserved orthologs present in chimpanzee, dog, cow, mouse, rat, chicken, and zebrafish (Figure 4A). The effects of the missense mutation (Ser818Cys, NP_060229) in INO80D were measured by conservation (phastCons score of 1.0 and GERP score of 4.46) and difference in amino acid property (Grantham score of 112), suggesting that the mutated site is an evolutionarily conserved amino acid. This missense mutation was predicted to be probably damaging by PolyPhen-2,18 with a score of 0.990 (sensitivity, 0.71; specificity, 0.96).
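The expected homozygote frequency quoted in this section follows from Hardy–Weinberg proportions (genotype frequency = allele frequency squared). A short Python check of that arithmetic is sketched below; the function name is illustrative, and the calculation assumes random mating with no population substructure.

```python
def expected_homozygote_frequency(allele_freq: float) -> float:
    """Expected frequency of the homozygous genotype under Hardy-Weinberg proportions."""
    return allele_freq ** 2


q = 0.0028  # mutant allele frequency of 0.28% reported from the Exome Variant Server
freq_aa = expected_homozygote_frequency(q)
per_million = freq_aa * 1_000_000

print(f"Expected AA frequency: {freq_aa:.2e}")
print(f"Expected AA homozygotes per million: {per_million:.1f}")
```

The result, about 7.8 per million, rounds to the ≈8 per million stated in the text.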
Discussion
In the present study, we combined homozygosity mapping and WES to identify the genetic basis of a syndrome observed in 2 siblings, with features of accelerated aging, including aortic hypoplasia, calcification and stiffness, early onset atherosclerotic vascular disease, systolic hypertension with wide pulse pressure, premature cataract, and excessive dental root mobility. A missense mutation (Ser818Cys) in INO80D was identified using integrated filtering of WES data and confirmed by an alternative exome analysis pipeline; the mutation mapped to one of the homozygous regions on chromosome 2 shared by the 2 affected siblings.
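The recessive model implied by this pedigree (2 affected siblings, unaffected parents, and an unaffected sibling) can be expressed as a simple segregation filter. The Python sketch below is illustrative only: the exact filtering criteria used in the study are not reproduced here, genotypes are assumed to be encoded as the number of alternate alleles (0, 1, or 2), and the variant identifiers are hypothetical.

```python
from typing import Dict, List

AFFECTED = ("L6", "L7")   # affected siblings
PARENTS = ("L8", "L9")    # unaffected parents, expected to be carriers
UNAFFECTED = ("L10",)     # unaffected sibling who underwent WES


def fits_recessive_model(genotypes: Dict[str, int]) -> bool:
    """Genotypes are alternate-allele counts (assumed encoding: 0, 1, or 2)."""
    return (
        all(genotypes[s] == 2 for s in AFFECTED)
        and all(genotypes[s] == 1 for s in PARENTS)
        and all(genotypes[s] < 2 for s in UNAFFECTED)
    )


def filter_variants(variants: List[dict]) -> List[str]:
    """Return identifiers of variants that segregate with the recessive model."""
    return [v["id"] for v in variants if fits_recessive_model(v["genotypes"])]


# Hypothetical variants for illustration; not data from the study.
example_variants = [
    {"id": "varA", "genotypes": {"L6": 2, "L7": 2, "L8": 1, "L9": 1, "L10": 1}},
    {"id": "varB", "genotypes": {"L6": 2, "L7": 1, "L8": 1, "L9": 1, "L10": 0}},
    {"id": "varC", "genotypes": {"L6": 2, "L7": 2, "L8": 1, "L9": 2, "L10": 1}},
]
print(filter_variants(example_variants))
```

In practice such a filter would be combined with allele frequency and functional annotation filters, and with the homozygous regions shared by the affected siblings.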
Shameer et al An INO80D Mutation Implicated in Arterial Aging 611
Figure 3. Sanger sequencing data visualized using 4Peaks software for c.2857T>A, NM_017759; AGT>TGT (Ser818Cys) mutation in an unaffected (top, AGT(S)) and affected individual (bottom, TGT(C)).
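The codon change shown in Figure 3 (AGT>TGT) can be checked against the standard genetic code with a few lines of Python. This is a minimal sketch: the table is built from the conventional TCAG ordering of the standard code, and the function name is illustrative.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_codon(codon: str) -> str:
    """Return the one-letter amino acid code for a DNA codon ('*' marks a stop)."""
    return CODON_TABLE[codon.upper()]


wild_type_codon = "AGT"
mutant_codon = "TGT"
print(translate_codon(wild_type_codon), "->", translate_codon(mutant_codon))  # S -> C
```

A single-base change at the first codon position is therefore sufficient to replace serine with cysteine.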
The INO80 complex is a chromatin remodeler involved in cardiovascular development and physiology.19 INO80D is a metazoan-specific subunit of the human INO80 complex that plays a crucial role in transcription and in DNA replication and repair.20,21 In humans, the INO80 complex21–24 is an ATP-dependent chromatin remodeling complex composed of 3 modules that assemble with 3 distinct domains of the hIno80 ATPase.20 Sequence annotations compiled in the Pfam database suggest that INO80D encodes 2 copies of the zf-C3Hc3H domain, which functions as a DNA-binding domain of chromatin remodeling proteins and helicases (http://pfam.xfam.org/protein/Q53TQ3). The INO80D protein (UniProt identifier Q53TQ3) has 2 isoforms of length 878 (Q53TQ3-1) and 1027 (Q53TQ3-2) residues, produced by alternative splicing. The mutation site (c.2857T>A, NM_017759) was mapped to residue number 818 of canonical transcripts, indicating that irrespective of the expressed isoform, the mutation would have a functional effect. The mutation results in the replacement of serine with cysteine and may lead to an altered steric effect during protein–protein interactions.20,21
To assess the functional significance of the mutation site, we integrated protein sequence, structure, and network-based interactome analyses (Figure 4; in the Data Supplement). Comparative analysis of secondary structure and solvent accessibility revealed that secondary structure conformation is not altered, but solvent accessibility is changed (Figure 4B). The mutation site (Ser818Cys) was observed to be part of a low-complexity region (LCR), LCR-7, in both isoforms.
LCR-7 was present in wild-type Q53TQ3-1 and Q53TQ3-2 but absent from the mutated sequences, indicating that the mutation results in the loss of LCR-7 in both isoforms (Figure 4C). Loss of LCR-7 could affect protein–protein interactions or other functions mediated by LCRs. As INO80D interacts with several proteins and is a key component of the human INO80 complex, which mediates several important cellular processes,20,21 loss of the LCR may have significant functional effects. In addition to LCR composition, we used protein secondary structure data to understand the effect of the mutation. Using secondary structure information derived from protein sequence and cysteine residue frequencies, the disulfide bond connectivity prediction program DiANNA25 suggested that a new disulfide bond may be generated between the 456th amino acid (VKGEQCANKAL) and the 818th mutated amino acid (RQQYSCDHSHS) with a score of 0.999. Therefore, Ser818Cys may induce a new disulfide bond that could lead to misfolding and thereby affect INO80D function.26 A major role of INO80D is in forming a hetero-oligomeric protein complex by interacting with other members of the hINO80 complex, and therefore a minor change in folding could have a significant effect on protein binding or protein–DNA interactions (Figure 4D; in the Data Supplement).20,23,27,28
Although we did not perform functional studies in animal models, published reports pertaining to the INO80 complex and its subunits point to an important role for INO80D in cardiovascular and aging traits.29–35 Based on a gene expression evolution database (http://bgee.unil.ch), we found that INO80D is expressed in the aorta (adult), heart (fetus and adult), coronary arteries (adult), and myocardium (child and adult). In the infant mouse, Ino80d is expressed in aorta and aortic valve. We compiled microRNA (miRNA) expression and associated phenotype data from publicly available databases36,37 and noted that several miRNAs that target INO80D (Data Supplement) are differentially expressed in the setting of diverse phenotypes observed in the affected individuals (Data Supplement). Of 27 experimentally validated miRNAs that target INO80D, 10 are implicated in ≥1 phenotypes (vascular disease, hypertension, coronary artery disease, cataract, and increased dental root mobility [periodontitis]) observed in the affected individuals, and 12 miRNAs were implicated in related phenotypes, including heart failure, myocardial infarction, aging, and cardiac hypertrophy. Morpholino knockdown of pontin (ortholog of human RuvB [Escherichia coli homolog]-like 1 [RUVBL1], member of INO80 complex [INO80 complex subunit H]) in zebrafish causes cardiac hyperplasia, whereas an insertional mutation of reptin (ortholog of human RuvB [E coli homolog]–like 2 [RUVBL2], member of INO80 complex [INO80 complex subunit J]) that activates the ATPase activity of reptin stimulates cardiomyocyte proliferation in zebrafish.38 Mutation screening in Drosophila melanogaster suggests that the dINO80 complex regulates multiple genes through epigenetic mechanisms.39

Figure 4. Sequence features of INO80D protein: (A) conservation of S818C mutation and flanking region across 8 vertebrate species; (B) secondary structure, solvent accessibility score of mutation site (Ser818Cys), and flanking residues of wild-type and mutated sequences; (C) composition of low-complexity regions (LCRs) in wild-type isoforms and Ser818Cys mutant sequences; (D) protein–protein interaction network mediated by INO80D: violet edges indicate experimentally validated protein–protein interaction and purple edges indicate homologous protein–protein interactions observed in model organisms. ACTL6A indicates actin-like 6A (INO80K); ACTR8, ARP8 actin-related protein 8 homolog (yeast) (INO80N); MCRS1, microspherule protein 1 (INO80Q); NFRKB, nuclear factor related to kappaB binding protein (INO80G); RUVBL2, RuvB-like AAA ATPase 2 (INO80J); TFPT, TCF3 (E2A) fusion partner (in childhood leukemia) (INO80F); UCHL5, ubiquitin carboxyl-terminal hydrolase L5 (INO80R); and YY1, YY1 transcription factor.
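The low-complexity region argument made earlier in the Discussion can be made concrete with a compositional measure. The Python sketch below computes Shannon entropy over the 11-residue window quoted in the text; it is not the LCR detection method used in the study (the tool is not named in this section), and the wild-type window is an assumption obtained by restoring serine at the central position of the quoted mutant peptide.

```python
import math
from collections import Counter


def shannon_entropy(sequence: str) -> float:
    """Shannon entropy (bits per residue) of the amino acid composition."""
    counts = Counter(sequence)
    total = len(sequence)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Mutant window quoted in the text; the central residue is position 818.
MUTANT_WINDOW = "RQQYSCDHSHS"
# Assumption: the wild-type window differs only by serine at the central position.
WILD_TYPE_WINDOW = MUTANT_WINDOW[:5] + "S" + MUTANT_WINDOW[6:]

wild_type_entropy = shannon_entropy(WILD_TYPE_WINDOW)
mutant_entropy = shannon_entropy(MUTANT_WINDOW)
print(f"wild type: {wild_type_entropy:.3f} bits, mutant: {mutant_entropy:.3f} bits")
```

Lower entropy means a more repetitive composition. Replacing one of the serines with cysteine raises the entropy of this window, which is the direction one would expect if the substitution pushes the segment above a low-complexity threshold; window-based tools apply such thresholds over longer stretches, so this single window is only an illustration.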
Chromatin Remodeling Complexes and Accelerated Aging Syndromes
Chromatin remodeling is a basic biological mechanism that is crucial for transcription, gene regulation, DNA replication, DNA repair, and nucleosome remodeling. Chromatin remodeling complexes are classified into 4 major families with varying number of protein subunits as follows: SWItch/Sucrose NonFermentable (SWI/SNF) family (8–14 subunits), imitation SWI family (2–4 subunits), chromodomain helicase DNA-binding (CHD) family (1–10 subunits), and INO80 family (>10 subunits).40 The INO80 complex and its subunits20 are involved in maintaining genomic integrity,41 transcription,24 DNA replication,42,43 damage tolerance, and double-strand break repair mechanisms.33,44–50 The INO80D subunit is involved in chromosome segregation34 and also interacts with telomeres,51 the histone core, actin, and actin-related proteins.52–54 The molecular and genetic basis of several accelerated aging syndromes has been attributed to DNA damage and genome instability, but specific mechanistic roles are unclear.55 Several syndromes with clinical features of accelerated aging are associated with mutations in chromatin remodeling complexes (listed in Table V in the Data Supplement). Key clinical phenotypes present in the affected siblings in our study (arterial calcification, early-onset systolic hypertension, wide pulse pressure suggestive of arterial stiffness, atherosclerosis, cataract, and increased dental mobility) are consistent with accelerated aging. Although no syndromic associations are known for human or other mammalian INO80 complexes,20,24,40 further analysis of the INO80 complex and its subunits could provide additional insights.
In summary, using WES, we identified the genetic basis of a syndrome with aortic hypoplasia and calcification, early-onset systolic hypertension with wide pulse pressure suggestive of arterial stiffness, atherosclerosis, cataract, and increased dental mobility. We used a strategy of homozygosity mapping using high-density genotyping data, as well as WES for hypothesis-free disease gene mapping, in the 2 affected siblings. A missense mutation (Ser818Cys) in INO80D was identified as the likely cause of the syndrome, using 2 different analytic pipelines. In-depth functional annotation approaches were used to assess the effect of the mutation on INO80D function. Additional studies in model organisms will be helpful in further delineating the role of the INO80 complex in chromatin remodeling and aging.
Acknowledgments
We acknowledge Advanced Genomic Technology Center and Microarray Shared Resource at Mayo Clinic for technical assistance.
Sources of Funding
Dr Kullo is supported by grant U01HG006379 from the National Human Genome Research Institute and a Marriot Award for Individualized Medicine.
Disclosures
None.
References
1. Nielsen R, Paul JS, Albrechtsen A, Song YS. Genotype and SNP calling from next-generation sequencing data. Nat Rev Genet. 2011;12:443–451.
2. Ng SB, Buckingham KJ, Lee C, Bigham AW, Tabor HK, Dent KM, et al. Exome sequencing identifies the cause of a mendelian disorder. Nat Genet. 2010;42:30–35.
3. Tennessen JA, Bigham AW, O’Connor TD, Fu W, Kenny EE, Gravel S, et al; Broad GO; Seattle GO; NHLBI Exome Sequencing Project. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science. 2012;337:64–69.
4. Gilissen C, Hoischen A, Brunner HG, Veltman JA. Unlocking Mendelian disease using exome sequencing. Genome Biol. 2011;12:228.
5. Lander ES, Botstein D. Homozygosity mapping: a way to map human recessive traits with the DNA of inbred children. Science. 1987;236:1567–1570.
6. Clark AG. The size distribution of homozygous segments in the human genome. Am J Hum Genet. 1999;65:1489–1492.
7. Szpiech ZA, Xu J, Pemberton TJ, Peng W, Zöllner S, Rosenberg NA, et al. Long runs of homozygosity are enriched for deleterious variation. Am J Hum Genet. 2013;93:90–102.
8. Takata A, Kato M, Nakamura M, Yoshikawa T, Kanba S, Sano A, et al. Exome sequencing identifies a novel missense variant in RRM2B associated with autosomal recessive progressive external ophthalmoplegia. Genome Biol. 2011;12:R92.
9. Chahrour MH, Yu TW, Lim ET, Ataman B, Coulter ME, Hill RS, et al; ARRA Autism Sequencing Collaboration. Whole-exome sequencing and homozygosity analysis implicate depolarization-regulated neuronal genes in autism. PLoS Genet. 2012;8:e1002635.
10. Shamseldin HE, Elfaki M, Alkuraya FS. Exome sequencing reveals a novel Fanconi group defined by XRCC2 mutation. J Med Genet. 2012;49:184–186.
11. Walsh T, Shahin H, Elkan-Miller T, Lee MK, Thornton AM, Roeb W, et al. Whole exome sequencing and homozygosity mapping identify mutation in the cell polarity protein GPSM2 as the cause of nonsyndromic hearing loss DFNB82. Am J Hum Genet. 2010;87:90–94.
12. Erlich Y, Edvardson S, Hodges E, Zenvirt S, Thekkat P, Shaag A, et al. Exome sequencing and disease-network analysis of a single family implicate a mutation in KIF1A in hereditary spastic paraparesis. Genome Res. 2011;21:658–664.
13. Abu-Safieh L, Alrashed M, Anazi S, Alkuraya H, Khan AO, Al-Owain M, et al. Autozygome-guided exome sequencing in retinal dystrophy patients reveals pathogenetic mutations and novel candidate disease genes. Genome Res. 2013;23:236–247.
14. Kullo IJ, Fan J, Pathak J, Savova GK, Ali Z, Chute CG. Leveraging informatics for genetic studies: use of the electronic medical record to enable a genome-wide association study of peripheral arterial disease. J Am Med Inform Assoc. 2010;17:568–574.
15. Jouni H, Shameer K, Asmann YW, Hazin R, de Andrade M, Kullo IJ. Clinical correlates of autosomal chromosomal abnormalities in an electronic medical record-linked genome-wide association study: a case series. Journal of Investigative Medicine: the Official Publication of the American Federation for Clinical Research. 2013;1.
16. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–575.
17. Thorvaldsdottir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Briefings in Bioinformatics. 2013;14:178–192.
18. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, et al. A method and server for predicting damaging missense mutations. Nat Methods. 2010;7:248–249.
19. Han P, Hang CT, Yang J, Chang CP. Chromatin remodeling in cardiovascular development and physiology. Circ Res. 2011;108:378–396.
20. Chen L, Cai Y, Jin J, Florens L, Swanson SK, Washburn MP, et al. Subunit organization of the human INO80 chromatin remodeling complex: an evolutionarily conserved core complex catalyzes ATP-dependent nucleosome remodeling. J Biol Chem. 2011;286:11283–11289.
21. Jin J, Cai Y, Yao T, Gottschalk AJ, Florens L, Swanson SK, et al. A mammalian chromatin remodeling complex with similarities to the yeast INO80 complex. J Biol Chem. 2005;280:41207–41212.
22. Morrison AJ, Shen X. Chromatin remodelling beyond transcription: the INO80 and SWR1 complexes. Nat Rev Mol Cell Biol. 2009;10:373–384.
23. Bao Y, Shen X. INO80 subfamily of chromatin remodeling complexes. Mutat Res. 2007;618:18–29.
24. Conaway RC, Conaway JW. The INO80 chromatin remodeling complex in transcription, replication and repair. Trends Biochem Sci. 2009;34:71–77.
25. Ferrè F, Clote P. DiANNA: a web server for disulfide connectivity prediction. Nucleic Acids Res. 2005;33(Web Server issue):W230–W232.
26. Wedemeyer WJ, Welker E, Narayan M, Scheraga HA. Disulfide bonds and protein folding. Biochemistry. 2000;39:4207–4216.
27. Shen X. Preparation and analysis of the INO80 complex. Methods Enzymol. 2004;377:401–412.
28. Fass D. Disulfide bonding in protein biophysics. Annu Rev Biophys. 2012;41:63–79.
29. Hur SK, Park EJ, Han JE, Kim YA, Kim JD, Kang D, et al. Roles of human INO80 chromatin remodeling enzyme in DNA replication and chromosome segregation suppress genome instability. Cell Mol Life Sci. 2010;67:2283–2296.
30. Park EJ, Hur SK, Kwon J. Human INO80 chromatin-remodelling complex contributes to DNA double-strand break repair via the expression of Rad54B and XRCC3 genes. Biochem J. 2010;431:179–187.
31. Watanabe S, Peterson CL. The INO80 family of chromatin-remodeling enzymes: regulators of histone variant dynamics. Cold Spring Harb Symp Quant Biol. 2010;75:35–42.
32. Han P, Hang CT, Yang J, Chang CP. Chromatin remodeling in cardiovascular development and physiology. Circ Res. 2011;108:378–396.
33. Chambers AL, Downs JA. The RSC and INO80 chromatin-remodeling complexes in DNA double-strand break repair. Prog Mol Biol Transl Sci. 2012;110:229–261.
34. Chambers AL, Ormerod G, Durley SC, Sing TL, Brown GW, Kent NA, et al. The INO80 chromatin remodeling complex prevents polyploidy and maintains normal chromatin structure at centromeres. Genes Dev. 2012;26:2590–2603.
35. Chang CP, Bruneau BG. Epigenetics and cardiovascular development. Annu Rev Physiol. 2012;74:41–68.
36. Kozomara A, Griffiths-Jones S. miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res. 2011;39(Database issue):D152–D157.
37. Lu M, Zhang Q, Deng M, Miao J, Guo Y, Gao W, et al. An analysis of human microRNA and disease associations. PLoS One. 2008;3:e3420.
38. Rottbauer W, Saurin AJ, Lickert H, Shen X, Burns CG, Wo ZG, et al. Reptin and pontin antagonistically regulate heart growth in zebrafish embryos. Cell. 2002;111:661–672.
39. Bhatia S, Pawar H, Dasari V, Mishra RK, Chandrashekaran S, Brahmachari V. Chromatin remodeling protein INO80 has a role in regulation of homeotic gene expression in Drosophila. Genes Cells. 2010;15:725–735.
40. Clapier CR, Cairns BR. The biology of chromatin remodeling complexes. Annu Rev Biochem. 2009;78:273–304.
41. Bakshi R, Mehta AK, Sharma R, Maiti S, Pasha S, Brahmachari V. Characterization of a human SWI2/SNF2 like protein hINO80: demonstration of catalytic and DNA binding activity. Biochem Biophys Res Commun. 2006;339:313–320.
42. Cai Y, Jin J, Yao T, Gottschalk AJ, Swanson SK, Wu S, et al. YY1 functions with INO80 to activate transcription. Nat Struct Mol Biol. 2007;14:872–874.
43. Falbo KB, Shen X. Function of the INO80 chromatin remodeling complex in DNA replication. Front Biosci. 2012;17:970–975.
44. Cairns BR. Around the world of DNA damage INO80 days. Cell. 2004;119:733–735.
45. Kawashima S, Ogiwara H, Tada S, Harata M, Wintersberger U, Enomoto T, et al. The INO80 complex is required for damage-induced recombination. Biochem Biophys Res Commun. 2007;355:835–841.
46. Morrison AJ, Kim JA, Person MD, Highland J, Xiao J, Wehr TS, et al. Mec1/Tel1 phosphorylation of the INO80 chromatin remodeling complex influences DNA damage checkpoint responses. Cell. 2007;130:499–511.
47. Czaja W, Bespalov VA, Hinz JM, Smerdon MJ. Proficient repair in chromatin remodeling defective ino80 mutants of Saccharomyces cerevisiae highlights replication defects as the main contributor to DNA damage sensitivity. DNA Repair (Amst). 2010;9:976–984.
48. Kashiwaba S, Kitahashi K, Watanabe T, Onoda F, Ohtsu M, Murakami Y. The mammalian INO80 complex is recruited to DNA damage sites in an ARP8 dependent manner. Biochem Biophys Res Commun. 2010;402:619–625.
49. Sarkar S, Kiely R, McHugh PJ. The Ino80 chromatin-remodeling complex restores chromatin structure during UV DNA damage repair. J Cell Biol. 2010;191:1061–1068.
50. Kato D, Waki M, Umezawa M, Aoki Y, Utsugi T, Ohtsu M, et al. Phosphorylation of human INO80 is involved in DNA damage tolerance. Biochem Biophys Res Commun. 2012;417:433–438.
51. Yu EY, Steinberg-Neifach O, Dandjinou AT, Kang F, Morrison AJ, Shen X, et al. Regulation of telomere structure and functions by subunits of the INO80 chromatin remodeling complex. Mol Cell Biol. 2007;27:5639–5649.
52. Saravanan M, Wuerges J, Bose D, McCormack EA, Cook NJ, Zhang X, et al. Interactions between the nucleosome histone core and Arp8 in the INO80 chromatin remodeling complex. Proc Natl Acad Sci U S A. 2012;109:20883–20888.
53. Charles GM, Chen C, Shih SC, Collins SR, Beltrao P, Zhang X, et al. Site-specific acetylation mark on an essential chromatin-remodeling complex promotes resistance to replication stress. Proc Natl Acad Sci U S A. 2011;108:10620–10625.
54. van Attikum H, Fritsch O, Gasser SM. Distinct roles for SWR1 and INO80 chromatin remodeling complexes at chromosomal double-strand breaks. EMBO J. 2007;26:4113–4125.
55. Dreesen O, Stewart CL. Accelerated aging syndromes, are they relevant to normal human aging? Aging. 2011;3:889–895.
CLINICAL PERSPECTIVE
Genome sequencing is increasingly being used to delineate the genetic basis of rare diseases that cluster in families. We performed whole exome sequencing in a family with 2 siblings with aortic hypoplasia, premature atherosclerosis, arterial stiffness, and cataract. Using 2 complementary exome sequence analysis pipelines and filtering methods, we identified a mutation (Ser818Cys) in INO80D as the causal variant. INO80D is a subunit of the human INO80 chromatin remodeling complex and plays a key role in chromatin-dependent functions, including transcription, DNA repair, and DNA replication. Genetic variations in chromatin remodeling complexes have been previously implicated in cardiovascular development, accelerated aging, and developmental syndromes. Our findings suggest that the Ser818Cys mutation in the INO80D subunit of the human INO80 chromatin remodeling complex leads to a syndrome of accelerated arterial aging manifesting as aortic hypoplasia, early atherosclerosis, and arterial stiffness.
SUPPLEMENTAL MATERIAL
Library preparation, whole exome capture and sequencing Paired-‐end indexed libraries were prepared following the manufacturer’s (Agilent Technologies, Santa Clara, CA) protocol. Briefly, target DNA (3ηg in 120 ul TE buffer) was fragmented using a Covaris E210 sonicator (Covaris Inc, Woburn, Mass.,) using a duty cycle of 10%, intensity 5, cycles 200, time 360 seconds, resulting in double-‐stranded DNA fragments with blunt or sticky ends with a fragment size between 150-‐200 bp. The ends were repaired and phosphorylated using Klenow, T4 polymerase, and T4 polynucleotide kinase, after which “A” base is added to the 3’ ends of double-‐stranded DNA using Klenow exo-‐ (3’ to 5’ exo minus). Paired end Index DNA adaptors (Agilent Technologies, Santa Clara, CA) with a single “T” base overhang at the 3’ end were ligated and resulting constructs were purified using AMPure SPRI beads from Agencourt. The adapter-‐modified DNA fragments were enriched by 4 cycles of PCR using InPE 1.0 forward and SureSelect Pre-‐Capture Indexing reverse (Agilent Technologies, Santa Clara, Ca) primers. The concentration and size distribution of the libraries were determined on an Agilent Bioanalyzer DNA 1000 chip (Agilent Technologies, Santa Clara, Ca). Exome capture was carried out using the protocol for Agilent’s SureSelect Human All Exon 50MB kit (Agilent Technologies, Santa Clara, Ca). This kit encompasses coding exons annotated by the GENCODE project (www.sanger.ac.uk/gencode/) as well as consensus coding sequence (CCDS, www.ncbi.nlm.nih.gov/CCDS/) and RefSeq (www.ncbi.nlm.nih.gov/refseq/) databases and incorporates exomic regions and non-‐coding RNAs from miRBase (v.13) and Rfam databases to provide a capture size of approximately 50 Mb. 500 ng of the prepped library was incubated for 24 hours at 65 °C with whole exon biotinylated RNA capture baits supplied in the kit. The captured DNA:RNA hybrids were recovered using Dynabeads MyOne Streptavidin T1 from Dynal (Invitrogen, Carlsbad, CA). 
DNA was eluted from the beads and purified using AMPure XP beads from Agencourt (Beckman Coulter, Brea, CA). The purified capture products were then amplified using the SureSelect Post-Capture Indexing forward and Index PCR reverse primers (Agilent) for 12 cycles. Libraries were validated and quantified on the Agilent Bioanalyzer (Agilent Technologies, Santa Clara, CA). For individuals L6-L9, libraries were loaded onto paired-end flow cells at concentrations of 4-5 pM to generate cluster densities of 300,000-500,000/mm2 following Illumina’s standard protocol using the Illumina cBot and HiSeq Paired End Cluster Kit version 1 (Illumina, San Diego, CA). The flow cells were sequenced as 101 X 2 paired-end reads on an Illumina HiSeq 2000 using
TruSeq SBS sequencing kit version 1 and HiSeq data collection version 1.1.37.0 software. Base calling was performed using Illumina’s RTA version 1.7.45.0. For individual L10, libraries were loaded onto paired-end flow cells at concentrations of 7.5 pM to generate cluster densities of 500,000-600,000/mm2 following Illumina’s standard protocol using the Illumina cBot and HiSeq Paired End Cluster Kit version 3. The flow cells were sequenced as 101 X 2 paired-end reads on an Illumina HiSeq 2000 using TruSeq SBS sequencing kit version 3 and HiSeq data collection version 1.4.8 software. Base calling was performed using Illumina’s RTA version 1.12.4.2.
Genotype calling and variant filtration
The technical challenge of the modern era of genomic medicine and personalized exome analytics is the effective use of a combination of tools to find software-agnostic, highly concordant, high-quality genetic variants underlying complex, familial diseases[1]. To address this challenge, we used two computational genomic data analysis pipelines and two complementary variant filtering methods. Both pipelines included analysis modules for quality control, sequence alignment (using two different aligners: BWA and Novoalign), base quality score recalibration, variant calling, and complementary variant filtering.
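The idea of retaining only variants on which both pipelines agree can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Variant` tuple, the function names, and the choice of an intersection-over-union concordance measure are assumptions made for illustration, and the variant records are assumed to have been parsed from each pipeline's VCF output beforehand.

```python
from typing import Iterable, NamedTuple, Set


class Variant(NamedTuple):
    """Hypothetical minimal variant key: position plus alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def concordant_variants(primary: Iterable[Variant],
                        confirmatory: Iterable[Variant]) -> Set[Variant]:
    """Return the variants reported by both pipelines."""
    return set(primary) & set(confirmatory)


def concordance_rate(primary: Iterable[Variant],
                     confirmatory: Iterable[Variant]) -> float:
    """Fraction of all distinct calls that both pipelines share.

    This intersection-over-union measure is an assumption of the sketch;
    other concordance definitions are equally reasonable.
    """
    a, b = set(primary), set(confirmatory)
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

Keying on chromosome, position, and both alleles means that two calls at the same site with different alternative alleles are treated as discordant, which is the conservative choice for a confirmation step.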
Illumina fastq files were converted to Sanger fastq files using the MAQ software
(http://maq.sourceforge.net/). We used the FASTX-toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) for preprocessing short-read fastq files. The preprocessing steps included clipping sequencing primers/adapter sequences, trimming sequences based on quality scores, and filtering artifacts and low-quality sequences. We used FastQC (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/) to perform QC on raw and QC-filtered sequence data. We aligned sequence reads to the reference genome GRCh37/hg19 from the 1000 Genomes Project using the Burrows-Wheeler Aligner (BWA). We used a Perl script (cmpfastq.pl [http://compbio.brc.iop.kcl.ac.uk/software/cmpfastq.php]) to identify paired and unpaired reads. BWA was then used to align single-end and paired-end data separately. The generated SAM files were merged using Picard (http://picard.sourceforge.net/index.shtml) to generate sorted BAM files. The BAM files were indexed using SAMtools (http://samtools.sourceforge.net/). We used the Genome Analysis Toolkit[2] (GATK v1.0.5777, http://www.broadinstitute.org/gsa/wiki/index.php/The_Genome_Analysis_Toolkit) for post-alignment processing of BAM files, including local realignment around insertions or deletions
(indels), removal of duplicates, and base quality score recalibration. Because sequence aligners are unable to perfectly map reads containing indels, alignment artifacts and false-positive SNPs can arise around them. Multiple steps were employed in the realignment process: determining (small) suspicious intervals that are likely in need of realignment, running the realigner over those intervals, and fixing the mate pairs of realigned reads. Realigned, fixed, and sorted BAM files were generated for each sample. Duplicate reads were located and removed using Picard tools (MarkDuplicates). Finally, we corrected for variation in quality with machine cycle and sequence context by analyzing the covariation among several features of the base (i.e., reported quality score, the position within the read, the preceding and current nucleotide observed by the sequencing machine, probability of mismatching the reference genome, and known SNPs taken into account). The recalibrated quality scores are more accurate.[2] Analysis of depth of coverage in the final BAM files indicated that approximately 80% of the exomic regions were covered >8 times in the five patients (--minMappingQuality=10 and --minBaseQuality=20). We then used the GATK ‘UnifiedGenotyper’ for multiple-sample calling to generate raw variants. GATK applies a Bayesian algorithm for variant discovery and genotyping that simultaneously estimates the probability that two alleles, A (the reference allele) and B (the alternative allele), are segregating in a sample of N individuals and the likelihoods of each of the AA, AB and BB genotypes for each individual sample.[2] If the genotype for an individual could not be assigned based on the genotype likelihood model, an unknown genotype ‘N’ was assigned. To generate analysis-ready variants, the GATK ‘VariantFiltration’ tool was used to annotate suspicious calls in variant call format (VCF) files that failed the given filters.
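The per-sample genotype likelihoods for AA, AB and BB mentioned above can be illustrated with a simplified Python sketch. This is a textbook diploid model under stated assumptions, not GATK's actual implementation: each observed base is assumed to come from either chromosome with probability 0.5, sequencing errors are assumed independent and evenly spread over the three other bases, and no population prior is applied. The function names, the reuse of a base-quality cut-off of 20, and the rule for returning ‘N’ are all hypothetical choices for this sketch.

```python
import math
from typing import Dict, List, Tuple


def phred_to_error(qual: int) -> float:
    """Convert a Phred-scaled base quality to an error probability."""
    return 10 ** (-qual / 10)


def genotype_log_likelihoods(pileup: List[Tuple[str, int]],
                             ref: str, alt: str) -> Dict[str, float]:
    """log10 P(observed bases | genotype) for AA, AB and BB.

    pileup holds (base, base_quality) pairs covering one site.
    """
    genotypes = {"AA": (ref, ref), "AB": (ref, alt), "BB": (alt, alt)}
    result = {}
    for name, (allele1, allele2) in genotypes.items():
        log_lik = 0.0
        for base, qual in pileup:
            err = phred_to_error(qual)
            # Probability of seeing this base given each underlying allele.
            p1 = 1 - err if base == allele1 else err / 3
            p2 = 1 - err if base == allele2 else err / 3
            # Either chromosome is sampled with probability 0.5.
            log_lik += math.log10(0.5 * p1 + 0.5 * p2)
        result[name] = log_lik
    return result


def call_genotype(pileup: List[Tuple[str, int]], ref: str, alt: str,
                  min_base_quality: int = 20) -> str:
    """Pick the most likely genotype, or 'N' if no usable bases remain.

    Returning 'N' only when every base is filtered out is an assumption
    of this sketch, not the rule used by the original pipeline.
    """
    usable = [(b, q) for b, q in pileup if q >= min_base_quality]
    if not usable:
        return "N"
    likelihoods = genotype_log_likelihoods(usable, ref, alt)
    return max(likelihoods, key=likelihoods.get)
```

A real caller combines these likelihoods with allele-frequency priors across all N samples; the sketch only covers the per-individual likelihood term.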
Raw SNP calls were filtered using empirically derived cut-offs for the following GATK filter expressions: --filterExpression “QUAL−0.10 || SNPcluster || InDel” --filterName “StandardFilters” --filterExpression “DP= 4 && ((MQ0 / (1.0 * DP)) > 0.1)” --filterName “Hard2Validate”, where DP is the sequencing depth at the SNP position; QD is the QUAL/DP ratio at the SNP position; HRun is the maximal length of the homopolymer run; SB is the strand bias at the SNP position; SNPcluster denotes 3 SNPs within 10 bp of each other; InDel denotes SNP calls around the raw indel calls; and MQ0 is the number of mapping-quality-zero reads at the position. The resulting VCF file was annotated using the SeattleSeq Annotation server and filtered based on variant quality and on localization of variants overlapping the ROH regions. In the confirmatory pipeline, the fastq files were processed using the GenomeGPS pipeline v2.0. Briefly, the Illumina paired-end reads were aligned to the hg19 reference genome using Novoalign
(http://novocraft.com) followed by sorting and marking of duplicate reads using Picard. Local realignment of indels and base quality score recalibration were then performed using the Genome Analysis Toolkit (GATK). Single nucleotide variants (SNVs) and insertions/deletions (indels) were called across all samples simultaneously using GATK's UnifiedGenotyper with variant quality score recalibration. Variants were annotated using a custom annotation workflow and filtered using VAAST v1.04 and knowledge-based gene lists relevant to the observed phenotype.
Genotype calling and variant filtration
Results from the primary pipeline were filtered using annotation databases following strict criteria and the localization of variants in ROH regions shared by the family (Table S2). Results from the confirmatory pipeline were filtered using two different methods. The first method used annotation-driven filtering followed by selection of variants located in ROH regions shared by the pedigree. The second method followed a knowledge-based approach, looking for rare variants in genes known to be associated with the leading phenotypes, followed by probabilistic disease variant identification using VAAST. This analysis was carried out to identify high-confidence variants in the affected individuals using a pipeline that uses a different short-read aligner. Because two siblings were affected, we ruled out the possibility that the mutation arose de novo, as such a possibility is extremely unlikely. We also ruled out uniparental disomy (UPD) or a single-copy deletion occurring in both siblings, as this is also extremely unlikely.
(i) Method 1: Single homozygous alternative mutation at a position where both parents are heterozygous
Variants were processed using VAAST v1.04 (3) configured to fit a recessive mode of inheritance in a trio-mode assessment.
Variant calls from the two affected siblings were intersected and input as the “proband”. The resulting statistically significant candidate variants were filtered to exclude any findings positive in the unaffected sibling. The program was configured with the following options: -m lrt -o output_trio_pnt_i_02_21_2013 -pnt c --mp1 8 --less_ram --fast_gp --gp 1e10 --significance 2.4e-6 --codon_bias -iht r --locus_heterogeneity n --trio. The VAAST program was configured to run in a recessive mode with no locus heterogeneity. Given the limitations of the current version of the software, we ran it with the affected siblings’ genotype
data intersected and compared against the parents. A post-analysis filter was then applied to remove potential candidates that were found in the unaffected daughter.
(ii) Method 2: Compound heterozygous variant where each parent carries only one variant
Variant calls were filtered for positions that were homozygous alternative in the affected siblings, heterozygous in the parents, and either heterozygous or homozygous reference in the unaffected sibling. Results were then filtered to select non-synonymous mutations with an MAF < 0.1 in the ESP6500 datasets and all HapMap and 1000 Genomes population datasets. Results from the two methods were combined and manually reviewed for relevance to the disease state using GLAD4U (http://bioinfo.vanderbilt.edu/glad4u/). Two gene lists were created: one with eight targets using the specific term “aortic hypoplasia-atherosclerosis syndrome”, and a second with 196 targets using the generic term “atherosclerosis”. Initial results were processed to find Method (i) variants. Results were filtered to restrict the unaffected sibling to being either heterozygous or homozygous reference. Additionally, variants were filtered for non-synonymous changes. Filtering the variant list on the basis of MAF (< 0.1) yielded 595 variants for which both affected siblings were homozygous alternative and both parents were heterozygous. Of these, 387 variants were either homozygous reference or heterozygous in the unaffected sister. Of these, 68 variants encoded non-synonymous changes. The MAF-filtered variants were then compared against the candidate gene lists, and a common variant in APOB (chr2:21250914 G>A; A618V) was observed. Four of the 68 variants had population frequencies < 0.1 in all population datasets queried.
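The segregation and frequency filters described above can be sketched in Python as follows. This is an illustrative sketch rather than the code used in the study: the sample names (`sib1`, `father`, and so on), the dictionary layout of a variant record, and the function names are hypothetical, and genotypes are assumed to be encoded as VCF-style strings for biallelic sites.

```python
from typing import Dict, Iterable, List

# VCF-style genotype strings for a biallelic site.
HOM_REF, HET, HOM_ALT = "0/0", "0/1", "1/1"


def fits_recessive_model(genotypes: Dict[str, str],
                         affected: Iterable[str],
                         parents: Iterable[str],
                         unaffected: Iterable[str]) -> bool:
    """Affected are homozygous alternative, parents heterozygous,
    and unaffected siblings heterozygous or homozygous reference.
    Missing or unknown genotypes fail the test."""
    return (
        all(genotypes.get(s) == HOM_ALT for s in affected)
        and all(genotypes.get(s) == HET for s in parents)
        and all(genotypes.get(s) in (HOM_REF, HET) for s in unaffected)
    )


def filter_candidates(variants: Iterable[dict],
                      affected: Iterable[str],
                      parents: Iterable[str],
                      unaffected: Iterable[str],
                      max_maf: float = 0.1) -> List[dict]:
    """Keep non-synonymous variants that segregate recessively and have
    MAF below max_maf in every population dataset queried."""
    affected, parents, unaffected = list(affected), list(parents), list(unaffected)
    kept = []
    for variant in variants:
        if not fits_recessive_model(variant["genotypes"],
                                    affected, parents, unaffected):
            continue
        if not variant["nonsynonymous"]:
            continue
        if any(maf >= max_maf for maf in variant["population_mafs"].values()):
            continue
        kept.append(variant)
    return kept
```

Requiring the frequency threshold to hold in every dataset, rather than in any one of them, mirrors the "all population datasets queried" wording of the text.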
These four variants comprise two deleterious and damaging variants, in INO80D (Chr2:206869724 T>A) and NPIPA5 (Chr16:15463612 G>A), and two tolerated and benign variants, in SLC35B1 (Chr17:47783663 C>T) and SALL3 (Chr18:76754549 T>C).
Sanger sequencing
To validate the presence of the mutation we used Sanger sequencing. The FASTA sequences of the two genes of interest (those that satisfied the strict filtering of the discovery and validation pipelines, the VAAST analyses, and the ROH filtering) were obtained using NCBI nucleotide search (http://www.ncbi.nlm.nih.gov/nuccore) to design the primers. Each FASTA sequence was used as a query in NCBI Primer-BLAST (http://www.ncbi.nlm.nih.gov/tools/primer-blast/index.cgi?LINK_LOC=BlastHome). Primer pairs were selected based on primer length (18-30
bp), GC content, theoretical melting temperature (Tm = 59-60°C), and product size. BLAST was used to check the specificity of the primers. PAGE-purified oligos (Integrated DNA Technologies, IA, USA) were used for real-time PCR. The primers used for resequencing mutation sites in INO80D and TMPRS11E are provided (Table S4). Sanger sequencing for loci mapping was performed using BigDye terminator chemistries on an ABI 3730xl sequencer (Life Technologies; Carlsbad, CA).
Sequence, structure and functional annotation of INO80D
Conserved domains and motifs in INO80D:
We performed a comparative sequence analysis of the wild-type isoforms and the derived sequences carrying the Ser818Cys mutation. Two wild-type isoform sequences and the mutated sequences were used to assess secondary structure, solvent accessibility, and the distribution of LCRs. Secondary structure and solvent accessibility were predicted from the sequence using SABBLE [3], and LCRs were characterized using the SEG program integrated in the SMART database[4]. We also characterized conserved functional domains in INO80D using sequence-based protein domain searches: the protein sequences (Q53TQ3-1 and Q53TQ3-2) were scanned against the Pfam database. INO80D encodes two copies of the zf-C3Hc3H domain in both isoforms. Seven low-complexity regions (LCRs) were found in the sequence of Q53TQ3-1 and nine LCR segments were found in Q53TQ3-2 using the SEG [5] program integrated in SMART[4, 6]. LCRs are tandem sequence repeats in the protein universe and were often excluded in the past prior to detailed sequence analysis. For example, sequence search algorithms like BLAST mask off sequence with low compositional complexity. Recent studies on LCR function suggest that protein sequences with LCRs have several important functional roles. LCRs are common in protein sequence space and are observed in diverse proteins. Proteins with LCRs have a higher number of first-degree interaction partners when compared to proteins without LCRs (Wilcoxon-Mann-Whitney test; P30%)[8]. A remote homolog (16%) of the INO80D sequence was found in PDB identifier 2VZ9, A chain (structure of mammalian fatty acid synthase) [9]. This could be a further pointer that INO80D may encode a novel fold or a fold similar to fatty acid synthase (Figure S1(a)). As 2VZ9 is not yet incorporated in the Structural Classification of Proteins (SCOP) database [10], an objective fold recognition approach to detect additional structural relationships was not possible.
Interactome of INO80D:
To understand the functional context of INO80D from a network perspective, the first-degree interactome of INO80D was obtained from IntAct, a database of experimentally characterized protein-protein interactions, and visualized using Cytoscape (Figure 3(d)). Twenty interactions originating from spoke-expanded co-complexes were reported in IntAct. This indicates that the INO80D gene product is involved in multiple protein-protein interactions, a hallmark feature of proteins with LCRs, in addition to being an important subunit of the INO80 complex.
Phylogenetic analysis of INO80D:
INO80D is a component of the human INO80 complex, which has multiple functions including chromatin remodeling. INO80D is a non-conserved subunit in human, yeast and Drosophila. The exact evolutionary lineage of INO80D is unclear. To understand the functional role of INO80D from its homologs, we performed a detailed phylogenetic analysis. The sequence of the longest isoform was used for a homology search using PSI-BLAST, and a phylogenetic tree was constructed using Phylip v3.6 (http://evolution.genetics.washington.edu/phylip.html) and visualized using iTOL [11]. The PSI-BLAST search [12] (E-value: 0.05) was performed against the non-redundant database (nr) with sequences from GenBank CDS translations, PDB, SwissProt, PIR and PRF. The first iteration yielded 146 sequences, which were aligned using Clustal-Omega [13]. Bootstrapping of the output from Clustal-Omega was performed using seqboot (1000 iterations). The ‘protdist’ program was used to derive the pairwise distances between the 146 sequences. Phylogenetic trees were derived from
‘protdist’ output using the ‘neighbor’ program (neighbor-joining method). Consensus trees with bootstrap values were derived from the ‘neighbor’ output using the ‘consense’ program. The nodes of the INO80D phylogenetic tree that co-clustered with the query sequence (Q53TQ3-2) indicate that INO80D is conserved exclusively in higher eukaryotes and that the functions of the co-clustered proteins are largely unknown (Figure S1(b)). This indicates that INO80D is a metazoan-specific protein that may have a recent evolutionary history.
MicroRNAs targeting INO80D:
MicroRNA (miRNA) molecules have an established role in regulating genes involved in cardiovascular and aging phenotypes [14-17] via translational repression pathways [18]. To understand whether any known miRNAs implicated in cardiovascular or aging phenotypes target INO80D, we compiled literature reports and miRNA expression data. To perform this analysis we retrieved all miRNAs targeting the UTR of INO80D. A list of putative miRNAs that could target INO80D was identified by a TargetScan search [19-21] using a library of regulatory targets of mammalian and vertebrate miRNAs. A list of 27 miRNAs was retrieved, and clinical phenotypes associated with these miRNAs were obtained from the Human MiRNA & Disease Database (HMDD). We noted that several miRNAs implicated in cardiovascular and aging phenotypes target INO80D (see Supplementary Table S6), suggesting a regulatory perturbation of INO80D in the setting of various disease phenotypes.
Disease or quantitative traits associated with INO80 complex subunits:
We compiled results from published genome-wide association studies to understand the genetic role of the different subunits of the INO80 complex. The subunits of the INO80 complex and their phylogenetic similarity, compiled from protein databases, are provided (Figure S1(c)). Published GWAS reports suggest that subunits of the INO80 complex are associated with phenotypes such as extreme obesity, heart rate and capecitabine sensitivity (Table S7).
Table S1. Application, results and inferences from various tools employed for functional analysis of Ser818Cys mutation on INO80D

| Application | Tool | Result | Inference |
|---|---|---|---|
| Prediction of conserved domains and motifs | SMART | No conserved domains predicted, LCRs are predicted | INO80D encodes multiple LCRs and the mutation site is part of LCR-7 |
| Prediction of conserved domains and motifs | Pfam | Encodes 2 copies of potential DNA-binding domain “zf-C3Hc3H” | Presence of zf-C3Hc3H indicates its functional role in mediating protein-DNA binding and related functional mechanisms |
| Prediction of unassigned region | PURE | No distant domain association predicted | No known domains could be assigned to INO80D using PURE |
| Homology modeling (template search) | ModWeb/ModBase | Template search identified remote homolog (16%) A chain of structure of mammalian fatty acid synthase, PDB ID: 2VZ9 with no SCOP classification | INO80D may encode a novel fold or a fold similar to fatty acid synthase |
| Homology modeling | ModWeb/ModBase | No single structural homologs were found with sequence similarity above the twilight zone (>30%). | Homology model derived using low-similarity (< 30%) templates are not ideal for structure analysis |
| Phylogenetic analysis | PSI-BLAST, Phylip v3.6, iTOL | A phylogenetic tree was derived using protein homologs. (Figure S1) | Tree depicts that INO80D is conserved exclusively in higher eukaryotes and the functions of the co-clustered proteins are largely unknown |
| Protein-protein interaction analysis | IntAct | First-degree interactome of INO80D was obtained | 20 interactions originating from spoke expanded co-complexes were reported in IntAct indicating that INO80D is involved in multiple protein-protein interactions, a feature of proteins with LCRs |
| Secondary structure and solvent accessibility | SABBLE | Secondary structure and solvent accessibility predicted using the sequence of INO80D isoforms | Comparative analysis of wild-type and mutant sequence revealed changes in solvent accessibility due to mutation |
Table S2: Homozygous regions identified using Runs of Homozygosity (ROH) routine in PLINK
[Table body not recoverable from the source.]