ORIGINAL RESEARCH

Extensive functional analyses of RHD splice site variants: insights into the potential role of splicing in the physiology of Rh

Yann Fichou,1,2 Pierre Gehannin,1,2 Manon Corre,1,2 Alice Le Guern,1,2 Cédric Le Maréchal,1,2,3,4 Gérald Le Gac,1,2,3,4 and Claude Férec1,2,3,4

BACKGROUND: Among more than 300 mutated alleles identified so far within the RHD gene, almost 40 are assumed to alter cellular splicing and therefore may have a direct effect on Rh phenotype at both the quantitative and the qualitative levels. Functional data are, however, mostly unavailable to assess the direct involvement of splicing defects in the underlying physiology.

STUDY DESIGN AND METHODS: We generated plasmid constructs to carry out an exhaustive investigation of 38 RHD variants located within or in the vicinity of exon–intron junctions by a minigene splicing assay, further characterized the transcript structures by sequencing, and identified cryptic sites activated by the genetic defect. Bioinformatics predictions were carried out in parallel and compared with the functional data.

RESULTS: For the first time we demonstrate that a product including the full-length Exon 9 is transcribed in the presence of the c.1227G>A substitution frequently carried by Asians with DEL phenotype, and we confirm that splicing is altered in the RHD*weak D Type 2 allele, a rare variant most commonly found in Caucasians.

CONCLUSION: Overall, we 1) show significant correlation between functional analyses, bioinformatics predictions, and phenotypes, when available, especially for variants in close proximity to the consensus splice sites; 2) classify the variations as splicing or nonsplicing variants; and 3) provide functional data to further improve bioinformatics splicing tools. Conversely, assessment of seven silent exonic variants was mainly inconclusive.

From the 1Institut National de la Santé et de la Recherche Médicale (Inserm), UMR1078; the 2Etablissement Français du Sang (EFS)–Région Bretagne; the 3Laboratoire de Génétique Moléculaire et d’Histocompatibilité, Centre Hospitalier Régional Universitaire (CHRU), Hôpital Morvan; and the 4Faculté de Médecine et des Sciences de la Santé, Université de Bretagne Occidentale, Brest, France.
Address reprint requests to: Yann Fichou, PhD, Etablissement Français du Sang (EFS)–Région Bretagne, Inserm UMR1078, 46 rue Félix Le Dantec, CS 51819, 29218 Brest Cedex, France; e-mail: [email protected]
This work was supported by the Association Recherche et Transfusion (ART; Contract 63-2012); the Etablissement Français du Sang (EFS)–Région Bretagne; and the Institut National de la Santé et de la Recherche Médicale (Inserm), France.
Received for publication November 7, 2014; revision received January 8, 2015; and accepted February 10, 2015.
doi:10.1111/trf.13083
© 2015 AABB

Regulation of gene expression is a complex process that involves different mechanisms to convert the genetic code into a functional protein. Gene transcription is functionally coupled in space and time to a series of posttranscriptional modifications.1 RNA splicing is a dynamic process, evolutionarily conserved in eukaryotes, that defines the nuclear mechanism by which a premessenger RNA molecule is processed by a macromolecular, ribonucleoprotein functional unit, namely, the spliceosome, into an intron-free, mature messenger RNA (mRNA) that is further exported to the cytoplasm for translation into a polypeptide by the ribosomal machinery. This complex process is very finely regulated by cis-regulatory elements within both exons and introns, including most importantly the canonical dinucleotide donor and acceptor sequences, typically GT and AG, respectively, at the 5′- and 3′-ends of each intron. Any alteration in either of the latter sequences may thus potentially

TRANSFUSION 2015;00;00–00 Volume 00, February 2015

FICHOU ET AL.

TABLE 1. RHD gene variants in the vicinity of the consensus splice sites

| Exon | Nucleotide variant | Amino acid change (position*) | D phenotype† | Reference |
|---|---|---|---|---|
| 2 | c.150T>C | Synonymous | Multiple | Rhesus Base |
| 2 | c.334A>G | p.S112G (TM4) | Weakened | [22] |
| 2 | c.335G>C | p.S112T (TM4) | DEL | Rhesus Base |
| 2 | c.335+1G>A | NA | Negative | [23] |
| 3 | c.336-2A>G | NA | DEL | [24] |
| 3 | c.336-2delA | NA | DEL, partial | Rhesus Base |
| 3 | c.336-1G>A | NA | Negative | [15] |
| 3 | c.485A>G | p.N162S (EC3) | Partial | [25] |
| 3 | c.486+1G>A | NA | DEL | [26] |
| 3 | c.486+2T>A | NA | DEL | [27] |
| 3 | c.486+3G>C | NA | Weakened | [28] |
| 3 | c.486+5G>A | NA | Weakened | [29] |
| 4 | c.634G>C | p.G212R (TM7) | DEL | [27] |
| 4 | c.634G>T | p.G212C (TM7) | Weak | [30] |
| 4 | c.634+1G>A | NA | ND | [31] |
| 4 | c.634+2T>A | NA | ND | [31] |
| 4 | c.634+5G>A | NA | Weakened | [28] |
| 4 | c.634+5G>T | NA | Weakened | [28] |
| 5 | c.635-2A>G | NA | ND | [32] |
| 5 | c.635G>T | p.G212V (TM7) | Negative | [26] |
| 5 | c.636C>T | Synonymous | Positive | [33] |
| 5 | c.800A>T | p.K267M (TM9) | Multiple | Rhesus Base |
| 5 | c.801+1G>A | NA | Negative | Rhesus Base |
| 6 | c.938C>T | p.P313L (TM10) | Weak, partial | [34] |
| 6 | c.939G>A | Synonymous | ND | [20] |
| 6 | c.939+2T>A | NA | Negative | [35] |
| 7 | c.940-4A>C | NA | Weakened | Rhesus Base |
| 7 | c.941G>T | p.G314V (TM10) | Negative | [36] |
| 7 | c.1073T>C | p.M358T (EC6) | Partial | [37] |
| 7 | c.1073+2T>C | NA | Negative | Rhesus Base |
| 8 | c.1074-2A>C | NA | Negative | [24] |
| 8 | c.1152A>C | Synonymous | Weak | Rhesus Base |
| 8 | c.1153+1G>A | NA | Negative | [26] |
| 9 | c.1154-31T>C | NA | ND | ND |
| 9 | c.1154-8T>A | NA | Weak | [29] |
| 9 | c.1154G>C | p.G385A (TM12) | Weak | [38] |
| 9 | c.1226A>T | p.K409M (IC7) | Weak | [39] |
| 9 | c.1227G>A | Synonymous | DEL | [26] |

Splicing variant‡: – + + + + + + + + + + + + + + + + + + + – – + – + ++ – – + + + + + – + + + +

Maintenance of full-length exon inclusion§: +++ + – – – – – ++ – – + – ++ ++ – – – ++ – ++ +++ +++ – +++ + +++ +++ +++ – – + – +++ + ++ ++ +

* In accordance with Flegel40 and Silvy et al.41
† “Multiple” indicates that the variant is found in several alleles resulting in different phenotypes.
‡ Variants that disrupt, alter, or modify splicing (+) or not (–) compared to what was observed with the respective wild-type construct are defined as “splicing variants.”
§ Maintenance of the full-length exon of interest at a similar (+++), moderate (++), low (+), or null (–) proportion compared with the wild-type construct.
TM = transmembrane domain; EC = extracellular loop; IC = intracellular domain; NA = not applicable; ND = not documented.

alter the cellular splicing of the related intron by disrupting trans-acting factor binding to a constitutive splice site and/or activating a cryptic splice site in the vicinity of the mutant, finally resulting in a quantitative and/or qualitative defect of the mRNA product. Such mutants have been shown to be involved in numerous genetic disorders, as well as tumorigenesis.2-4 Overall, approximately 10% of hereditary human diseases are assumed to be directly linked to splice site variants.3 More than 300 variants have been reported so far within RHD (Rhesus Base v2.0: www.uni-ulm.de/~fwagner/RH/RB2/; ISBT Allele Terminology, http://www.isbtweb.org/working-parties/red-cell-immunogenetics-and-blood-group-terminology/blood-group-terminology/blood-group-allele-terminology/; Blood Group Antigen Gene Mutation Database, http://www.ncbi.nlm.nih.gov/projects/gv/rbc/xslcgi.fcgi?cmd=bgmut/systems), the gene encoding the transmembrane RhD protein at the surface of red blood cells (RBCs).5-7 Alleles carrying these variations constitute the molecular bases for the expression of various phenotypes that are characterized by a quantitative and/or qualitative defect of the D antigen as investigated by serologic analyses.8 Although several bioinformatics tools have been developed to gain insights in silico into the defect resulting from a genetic variation,9-12 very few functional data are available to confirm these predictions in the RHD gene. The silent c.1227G>A transition (i.e., p.K409K), which

FUNCTIONAL TEST OF RHD SPLICING VARIANTS

Fig. 1. Functional analysis of variants affecting RHD Exon 9 by minigene splicing assay in 293T/17 cells. RT-PCR (A) and sequencing profiles of the products with skipping of Exon 9 (B) and inclusion of full-length Exon 9 (C). Arrows indicate the relative positions of forward and reverse primers; asterisk indicates the location of potential full-length RT-PCR product; arrowheads in the sequencing profiles indicate the position of the variant of interest. NT = not transfected; pSP = pSplicePOLR2G vector; WT = wild-type construct; bp = base pairs.

substitutes the last nucleotide position of RHD Exon 9, is more frequently found in people originating from Eastern Asia who express a DEL phenotype.13-18 Functional studies carried out both with total RNA extracted from whole blood samples and with a minigene splicing assay in eukaryotic cell models clearly showed that the c.1227A genotype dramatically alters cellular splicing, resulting in aberrant mRNA molecules.19 Using a similar minigene approach, we reported that the c.335G>T and the c.939G>A variants, which respectively affect the ultimate nucleotide of Exons 2 and 6, are splice site variants. These variations are thought to induce a decrease in the biosynthesis of full-length “normal” mRNA, which is further translated into a smaller amount of “normal” protein than expected in wild-type conditions, finally contributing to a potential weak D phenotype.20 These experiments highlighted the quantitative impact of splice site variants in Rh physiology. Because 1) many variants of the RHD gene are private and/or 2) fresh blood samples cannot be drawn for RNA extraction most of the time for various reasons, we took advantage of our reporter assay20,21 to exhaustively investigate RHD variants in the vicinity of both donor and acceptor splice sites. Functional data were compared with bioinformatics prediction and phenotype when available.

MATERIALS AND METHODS

RHD variants
A first set of single-base substitution variants of interest was selected on the basis of their position relative to the consensus, constitutive splice sites within RHD Exons 2 to 9 (i.e., first, second, penultimate, and final positions of both exons and introns), accounting for 30 variants (Table 1). This set includes the already described c.939G>A20 and c.1227G>A19 synonymous substitutions, respectively, in Exons 6 and 9, which serve as positive controls for the minigene splicing assay. Two other sets of variants, including other intronic, single-base substitutions in the vicinity of the consensus splice sites (n = 8), and exonic, synonymous variants (n = 7), were also selected for functional analysis.
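The positional selection criterion described above can be illustrated with a short Python sketch. It is not part of the original workflow: the names `SpliceVariant`, `parse_variant`, and `in_canonical_dinucleotide` are hypothetical, and the regular expression only covers the simple substitution and single-base deletion names that appear in Table 1 (for example, c.486+5G>A or c.336-2delA).

```python
import re
from typing import NamedTuple, Optional


class SpliceVariant(NamedTuple):
    """Hypothetical container for a variant name such as c.486+5G>A."""
    exon_pos: int        # coding position of the nearest exonic nucleotide
    offset: int          # 0 = exonic; +n = n bases into the downstream intron; -n = upstream intron
    ref: Optional[str]   # reference (or deleted) nucleotide
    alt: Optional[str]   # substituted nucleotide; None for a deletion


# Only the simple forms used in Table 1 are handled (assumption).
_PATTERN = re.compile(r"^c\.(\d+)([+-]\d+)?(?:([ACGT])>([ACGT])|del([ACGT]))$")


def parse_variant(name: str) -> SpliceVariant:
    """Split a variant name into exonic position, intronic offset, and bases."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unsupported variant name: {name}")
    pos, offset, ref, alt, deleted = match.groups()
    return SpliceVariant(int(pos), int(offset) if offset else 0, ref or deleted, alt)


def in_canonical_dinucleotide(variant: SpliceVariant) -> bool:
    """True when the variant hits the first or last two bases of an intron."""
    return abs(variant.offset) in (1, 2)
```

For instance, `parse_variant("c.336-2delA")` gives an offset of -2 and is flagged as lying in the acceptor dinucleotide, whereas `parse_variant("c.1227G>A")` has an offset of 0 (exonic) and is not.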

Bioinformatics prediction of splice variant effect
Predictions were carried out with computer software (Splicing Module of Alamut v2.4, Interactive Biosoftware, Rouen, France) that integrates five splicing prediction tools: Splice Site Finder-like, MaxEntScan,11 NNSPLICE,9 GeneSplicer,10 and Human Splicing Finder,12 with default parameters, and results were visualized in the Alamut splicing window.

Fig. 2. Functional analysis of RHD variants involving Exon 2 (A, F), Exon 3 (B, G), Exon 4 (C, H), Exon 5 (C), Exon 6 (D, E), Exon 7 (E, F), Exon 8 (E, I), and Exon 9 (I) by minigene splicing assay in 293T/17 cells. NT = not transfected; pSP = pSplicePOLR2G vector; WT = wild-type construct; bp = base pairs.

DNA samples
Among the more than 2000 blood samples referred to our laboratory by the French blood centers (Etablissement Français du Sang) for routine molecular genotyping of the RHD gene, four DNA samples were selected: a “wild-type,” reference RHD sample for wild-type, control plasmid constructs; two DNA samples carrying respectively the c.486+5G>A and c.1154-8T>A variants previously characterized at the laboratory;29 and an RHD*weak D Type 2 sample, an allele that is commonly found in Caucasians carrying D variants.38

Plasmid constructs, site-directed mutagenesis, and minigene splicing assay
Recombinant plasmid vectors (n = 8) containing the wild-type RHD inserts, which consist of Exons 2 to 9 with flanking domains of their respective introns, as well as those including the c.486+5G>A variant, the c.1154-8T>A variant, and Exon 9 of the RHD*weak D Type 2 allele, were constructed as reported before (Appendix S1, available as supporting information in the online version of this paper).21 Plasmid constructs were all checked by Sanger sequencing in conditions previously described20 and output data were visualized with Sequencher 5.2 (Gene Codes Corporation, Ann Arbor, MI). The minigene splicing assay is derived from what was previously published (Appendix S1).21

RESULTS

Validation of the model: the example of the Asian DEL c.1227G>A variant
All plasmid constructs were transfected in 293T/17 cells. Reverse transcription–polymerase chain reaction (RT-PCR) products were loaded on an agarose gel, and profiles obtained with the variants were compared to those of their respective wild-type construct. All experiments were carried out at least in triplicate. Although our method previously showed proof of concept,20 confirming the relevance of the test by comparing our results with those reported before appeared to be mandatory. Particular attention was therefore first paid to the c.1227G>A variant, which was shown to induce skipping of Exon 9 from the mature transcript by a similar minigene splicing assay strategy.19 In our model an Exon 9–excluded product identical to that obtained with the empty pSplicePOLR2G vector was observed and characterized (Fig. 1A). This result indicates that RHD Exon 9 carrying this variant is skipped from the mature transcript (Fig. 1B), as reported by Liu and colleagues.19 Interestingly, although not visible on the gel (Fig. 1A, asterisk), we subcloned a potential product with a size similar to that obtained with the wild-type allele. Sequencing analysis confirmed inclusion of the whole variant Exon 9 (Fig. 1C), suggesting for the first time that a very small amount of full-length RhD protein may be biosynthesized in subjects carrying the RHD c.1227G>A allele.

Fig. 3. Schematic representation of the cryptic splice sites activated by variants in the vicinity of the constitutive Intron 2 donor (A), Intron 2 acceptor (B), Intron 3 donor (C), Intron 4 donor (D), Intron 6 donor (E), and Intron 7 acceptor (F) splice sites, as demonstrated by functional analyses. Uppercase letters = exonic sequences; lowercase letters = intronic sequences; constitutive splice sites are bold underlined; activated cryptic splice sites are italics bold underlined; arrowheads indicate the position of the variant of interest; numbering above the sequences defines the number of base pairs (bp) added to (+) or removed from (–) the RT-PCR product relative to the product observed with the respective wild-type constructs.

Functional analysis of the RHD variants
Table 1 summarizes the results of the functional analysis by minigene splicing assay. In line with the study of the aforementioned variant, we investigated additional variants related to RHD Exon 9. Both the missense c.1154G>C and the intronic c.1154-31T>C substitutions, which are associated with the RHD*weak D Type 2 allele, were analyzed independently. As reported previously,42 we clearly demonstrated that only the missense c.1154G>C variation has a functional impact on splicing (Fig. 1A), suggesting that this variant is sufficient for the weak D phenotype associated with the RHD*weak D Type 2 allele. Similar results were obtained with c.1226A>T (Fig. 1A) and c.1154-8T>A (Fig. 2I), which both induced significant exclusion of Exon 9 (Fig. 1B). Of the other variants that were investigated, 19 showed predominant, if not total, exclusion of the full-length exon, resulting in alternative transcript products and/or skipping of the exon of interest. These include c.334A>G, c.335G>C, and c.335+1G>A in Exon 2 (Fig. 2A and Fig. S2A); c.336-2A>G, c.336-2delA, c.336-1G>A, c.486+1G>A, c.486+2T>A, and c.486+5G>A in Exon 3 (Figs. 2B and 2G; Figs. S2B and S2C); c.634+1G>A, c.634+2T>A, c.634+5G>A, and c.634+5G>T in Exon 4, and c.635-2A>G and c.801+1G>A in Exon 5 (Fig. 2C; Fig. S2D); and c.939+2T>A in Exon 6, c.1073+2T>C in Exon 7, and c.1074-2A>C and c.1153+1G>A in Exon 8 (Fig. 2E; Figs. S2F and S2H). Of note, a cryptic site upstream of the constitutive acceptor site was activated in the presence of c.336-2delA, resulting in an in-frame insertion of a 27-bp intronic


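TABLE 2. Bioinformatics predictions of splicing effect by Alamut v2.4

| Exon | Nucleotide variant | 5′-donor (D)/3′-acceptor (A) | SSF | MES | NNS | GS | HSF | Group* |
|---|---|---|---|---|---|---|---|---|
| 2 | Wild-type | A | — | 2.8 | — | 1.9 | 81.0 | NA |
| 2 | c.150T>C | A | — | 2.5 | — | — | 80.2 | 3 |
| 2 | Wild-type | D | 72.4 | 8.3 | 0.8 | 4.2 | 76.9 | NA |
| 2 | c.334A>G | D | — | 4.9 | — | 0.5 | 72.1 | 2 |
| 2 | c.335G>C | D | — | — | — | — | 65.9 | 2 |
| 2 | c.335+1G>A | D | — | — | — | — | — | 1 |
| 3 | Wild-type | A | 79.1 | 9.4 | 0.8 | 9.5 | 90.4 | NA |
| 3 | c.336-2A>G | A | — | — | — | — | — | 1 |
| 3 | c.336-2delA | A | — | — | — | — | — | 1 |
| 3 | c.336-1G>A | A | — | — | — | — | — | 1 |
| 3 | Wild-type | D | 82.0 | 9.5 | 1.0 | 8.0 | 86.9 | NA |
| 3 | c.485A>G | D | 73.1 | 8.2 | 1.0 | 7.1 | 82.0 | 3 |
| 3 | c.486+1G>A | D | — | — | — | — | — | 1 |
| 3 | c.486+2T>A | D | — | — | — | — | — | 1 |
| 3 | c.486+3G>C | D | 76.5 | 5.6 | 0.5 | 2.1 | 77.2 | 2 |
| 3 | c.486+5G>A | D | — | — | — | — | 74.7 | 2 |
| 4 | Wild-type | D | 81.9 | 9.1 | 1.0 | 5.0 | 91.2 | NA |
| 4 | c.634G>C | D | — | 5.0 | 0.7 | 1.8 | 80.7 | 2 |
| 4 | c.634G>T | D | — | 4.5 | 0.5 | 0.9 | 80.4 | 2 |
| 4 | c.634+1G>A | D | — | — | — | — | — | 1 |
| 4 | c.634+2T>A | D | — | — | — | — | — | 1 |
| 4 | c.634+5G>A | D | — | 4.2 | 0.6 | 2.2 | 79.1 | 2 |
| 4 | c.634+5G>T | D | — | 5.7 | 0.5 | 1.4 | 78.9 | 2 |
| 5 | Wild-type | A | 85.1 | 9.6 | 0.8 | 4.1 | 90.8 | NA |
| 5 | c.635-2A>G | A | — | — | — | — | — | 1 |
| 5 | c.635G>T | A | 78.9 | 8.0 | — | 2.5 | 86.5 | 2 |
| 5 | c.636C>T | A | — | 10.8 | 0.9 | 5.5 | 91.6 | 3 |
| 5 | Wild-type | D | 89.6 | 9.6 | 1.0 | 12.5 | 95.7 | NA |
| 5 | c.800A>T | D | 81.0 | 8.3 | 1.0 | 8.8 | 91.0 | 3 |
| 5 | c.801+1G>A | D | — | — | — | — | — | 1 |
| 6 | Wild-type | D | 85.8 | 10.1 | 1.0 | 7.1 | 92.9 | NA |
| 6 | c.938C>T | D | 86.2 | 9.4 | 1.0 | 6.2 | 92.9 | 3 |
| 6 | c.939G>A | D | 73.6 | 5.9 | 0.5 | 3.7 | 82.3 | 2 |
| 6 | c.939+2T>A | D | — | — | — | — | — | 1 |
| 7 | Wild-type | A | 80.8 | 7.0 | 0.7 | 4.9 | 89.8 | NA |
| 7 | c.940-4A>C | A | 80.8 | 7.0 | 0.6 | 4.3 | 90.3 | 3 |
| 7 | c.941G>T | A | 80.8 | 8.3 | 0.9 | 6.7 | 90.4 | 3 |
| 7 | Wild-type | D | 72.9 | 2.9 | — | — | 79.6 | NA |
| 7 | c.1073T>C | D | 72.2 | 4.5 | — | — | 79.5 | 3 |
| 7 | c.1073+2T>C | D | — | — | — | — | — | 1 |
| 8 | Wild-type | A | 89.7 | 12.3 | 1.0 | 8.8 | 83.0 | NA |
| 8 | c.1074-2A>C | A | — | — | — | — | — | 1 |
| 8 | Wild-type | D | 90.1 | 8.9 | 0.9 | 5.8 | 95.0 | NA |
| 8 | c.1152A>C | D | 81.2 | 7.5 | 0.8 | 4.3 | 90.2 | 2 |
| 8 | c.1153+1G>A | D | — | — | — | — | — | 1 |
| 9 | Wild-type | A | 76.2 | 6.2 | — | — | 83.2 | NA |
| 9 | c.1154-31T>C | A | 76.2 | 6.2 | — | — | 83.2 | 3 |
| 9 | c.1154-8T>A | A | 71.0 | 4.7 | — | — | 81.2 | 3 |
| 9 | c.1154G>C | A | 70.5 | 4.2 | — | — | 79.1 | 3 |
| 9 | Wild-type | D | 94.4 | 10.6 | 1.0 | — | 96.7 | NA |
| 9 | c.1226A>T | D | 85.9 | 9.5 | 0.9 | — | 92.0 | 3 |
| 9 | c.1227G>A | D | 82.2 | 6.9 | 0.5 | — | 86.1 | 2 |

* See “DNA samples” under Materials and Methods for explanations.
SSF = SpliceSiteFinder-like (range, 70-100); MES = MaxEntScan (range, 0-12 for donor and 0-16 for acceptor); NNS = NNSPLICE (range, 0.4-1.0); GS = GeneSplicer (range, 0-15); HSF = Human Splicing Finder (range, 60-100). — = out of range; NA = not applicable.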

sequence within the mature transcript (Fig. S2B). It is also worth indicating that all variants located within the consensus dinucleotide splice sites strongly affect cellular splicing, as experimentally shown. A moderate to significant deleterious splicing effect was observed with variants c.486+3G>C in Exon 3 (Fig. 2G); c.634G>C and c.634G>T in Exon 4, and c.635G>T in Exon 5 (Fig. 2C; Figs. S2D and S2E); c.939G>A in Exon 6 (Fig. 2D; Fig. S2F), as reported previously;20 and c.1152A>C in Exon 8 (Fig. 2E; Fig. S2H). Some of the variants included in these two latter categories resulted in the activation of cryptic splice sites that were characterized by sequencing (Fig. S2) and can be visualized in a genomic context (Fig. 3).


Fig. 4. Prediction of 5′-donor splice sites in the vicinity of the RHD Exon 4/Intron 4 junction by Alamut v2.4 in the wild-type sequence (top) and in the presence of c.634+1G>A (middle) and c.634G>T (bottom). Six positions with various scores, indicated on the left of or above the vertical bars, are predicted to be recognized as donor splice sites, including the constitutional site. Splice sites that were found to be active by functional analysis are surrounded. Scores are within the same ranges as those indicated in Table 2. Numbering below the sequences defines the number of base pairs (bp) included within the transcript products from variant constructs relative to the wild-type product including full-length Exon 4. id = identical to wild-type. See the text for explanations.

Finally, no disruption was observed with the following eight variants: c.150T>C in Exon 2 (Fig. 2A; Fig. S2A); c.485A>G in Exon 3 (Fig. 2B; Fig. S2C); c.636C>T and c.800A>T in Exon 5 (Fig. 2C; Fig. S2E); c.938C>T in Exon 6 (Fig. 2D; Fig. S2F); and c.940-4A>C, c.941G>T, and c.1073T>C in Exon 7 (Figs. 2E and 2F; Fig. S2G).

Bioinformatics predictions of splice site disruption by RHD variants and correlation with functional studies
In a second step of the study, the effect of all variants on their respective consensus splice site was estimated by the splicing module of Alamut (Table 2), which integrates five splice site tools. Variations were further classified in three groups depending on the degree of the predicted impact: Group 1 defines variants that are predicted to abolish the consensus splice site by all five tools; Group 2 includes variants with two to five tools predicting a significant reduction in scores; and variants with minor changes in scores, if any, are in Group 3.

Group 1 includes the 14 variants that are strictly involved either in the donor or in the acceptor dinucleotide sequence of the consensus splice sites. The functional analyses (Fig. 2) and sequencing data (Fig. S2), showing that no product including the unaltered full-length exon could be characterized in our approach, are in clear accordance with predictions (Table 2). To illustrate this observation, two products including the full-length Exon 4 with either 11 or 16 additional base pairs, corresponding to intronic sequences, were characterized with variant c.634+1G>A as mentioned (Fig. S2D). Both alternative splice sites activated here were clearly identified by bioinformatics prediction (Fig. 4, middle panel). The tight correlation between sequencing data and prediction of alternative splice sites was validated in all other variants of Group 1. Importantly, these 14 variants result in either a negative or DEL phenotype when reported, suggesting the critical impact of the splicing defect in the major quantitative phenotype defect observed in this subset.

Twelve variants, located within either an exon or an intron, are in Group 2 (Table 2). A significant albeit moderate effect on splicing was predicted for six variations (i.e., c.486+3G>C, c.634G>C, c.634G>T, c.635G>T, c.939G>A, and c.1152A>C), as confirmed by functional analyses. Inclusion of the full-length exon, as well as activation of cryptic splice sites, was functionally confirmed in all cases (Figs. 2 and 3; Fig. S2). The example of c.634G>T conveniently illustrates the competition between the constitutional and cryptic splice sites. Indeed, reduction in strength of the consensus dinucleotide donor site by the single-base substitution (Table 1) is predicted to result in the activation of two alternative splice sites at +11 and +16 bp, as indicated by higher scores (Fig. 4, bottom panel). But unlike c.634+1G>A, this variant does not completely abolish the constitutive site (Fig. 4). The other six variations (i.e., c.334A>G, c.335G>C, c.486+5G>A, c.634+5G>A, c.634+5G>T, and c.1227G>A) displayed very predominant exclusion of the respective full-length exon (Fig. 2). These results do not exactly correlate with those expected from the bioinformatics predictions. It is particularly worthwhile to underline that both the functional test and the phenotype, which indicate a severe impact of the c.1227G>A variant, are in obvious discrepancy with bioinformatics predictions, which suggest a moderate effect on splicing. Similar observations may be made for the c.335G>C variant, although it cannot be ruled out that the amino acid substitution may also play a role in the related phenotype. While a moderate effect was predicted with the other four variants (i.e., c.334A>G, c.486+5G>A, c.634+5G>A, and c.634+5G>T), a strong impairment of the mechanism was functionally demonstrated, which is also in clear discrepancy with the reported weak phenotype.

A total of 12 variants are in Group 3 (Table 2) including, as expected, those that did not display any functional defect (i.e., c.150T>C, c.485A>G, c.636C>T, c.800A>T, c.938C>T, c.940-4A>C, c.941G>T, c.1073T>C, and c.1154-31T>C). Unexpectedly considering bioinformatics predictions, three variants (c.1154-8T>A, c.1154G>C, and c.1226A>T) exhibited moderate splicing disruption (Figs. 1 and 2).

Taken together, these data suggest that bioinformatics predictions correlate well with functional analyses at both extremes of the functional spectrum (i.e., total and null disruption), and that the broad range of intermediate situations needs to be tested in vitro.
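The three-group classification used in this section can be sketched in Python, taking the five scores reported in Table 2 as input. This is only an illustration under stated assumptions: the text does not give a numeric definition of a “significant reduction,” so the 15% relative-decrease cut-off (`REDUCTION_THRESHOLD`) is hypothetical, as are the function name `classify` and the rule that Group 1 corresponds to all five variant scores being out of range.

```python
from typing import Optional, Sequence

Score = Optional[float]  # None stands for "out of range" (the dash in Table 2)

# Hypothetical cut-off: the paper does not define "significant reduction" numerically.
REDUCTION_THRESHOLD = 0.15


def classify(wild_type: Sequence[Score], variant: Sequence[Score],
             threshold: float = REDUCTION_THRESHOLD) -> int:
    """Assign a prediction group (1, 2, or 3) from SSF, MES, NNS, GS, HSF scores."""
    # Group 1 (assumed rule): no tool recognizes the site any more.
    if all(score is None for score in variant):
        return 1
    reduced = 0
    for wt, var in zip(wild_type, variant):
        if wt is None:
            continue  # tool did not score the wild-type site, so no drop can be measured
        if var is None or (wt - var) / wt >= threshold:
            reduced += 1
    # Group 2: at least two tools report a marked drop; otherwise Group 3.
    return 2 if reduced >= 2 else 3


# Scores copied from Table 2 (order: SSF, MES, NNS, GS, HSF).
EXON2_DONOR_WT = (72.4, 8.3, 0.8, 4.2, 76.9)
EXON9_DONOR_WT = (94.4, 10.6, 1.0, None, 96.7)

group_c335_plus1 = classify(EXON2_DONOR_WT, (None, None, None, None, None))
group_c1227 = classify(EXON9_DONOR_WT, (82.2, 6.9, 0.5, None, 86.1))
group_c1226 = classify(EXON9_DONOR_WT, (85.9, 9.5, 0.9, None, 92.0))
```

With this assumed threshold the three examples fall into Groups 1, 2, and 3, respectively, in agreement with Table 2; a different cut-off could shift borderline variants between Groups 2 and 3.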

Functional analysis of exonic silent variants
We next paid attention to another class of variants: silent substitutions in exonic regions located outside the consensus splice sites. Seven variants were selected on this basis: c.165C>T in Exon 2, c.357T>C in Exon 3, c.519C>T and c.576C>T in Exon 4, c.960G>A in Exon 7, c.1110C>T in Exon 8, and c.1170T>C in Exon 9. They all have been reported to result in a weak D phenotype, except c.357T>C, for which no serologic data have been documented. Because 1) these variants do not modify the amino acid sequence of the RhD protein and 2) our minigene assay proved to be relevant, we sought to test whether cellular splicing may be altered by these exonic variations. In 293T/17 cells, both c.960G>A (Fig. 2F) and c.1110C>T (Fig. 2I) appear to slightly enhance exclusion of Exons 7 and 8, respectively. No difference between the wild-type and mutant constructs could be observed for the other variants (Figs. 2F-2I). In the K-562 leukemia cell model, activation of exon exclusion was confirmed for c.960G>A, but not for c.1110C>T (Fig. S3), suggesting that the results may depend on the cell type used. Although an additional product above the major band appeared to be amplified with c.519C>T and c.576C>T (Fig. S3), it could not be characterized by sequencing. Overall, our data suggest that the c.960G>A variant slightly affects splicing by excluding Exon 7, at least partially, from the mature transcript, which may potentially contribute to the expression of a weak D phenotype.

DISCUSSION

The RHD c.1227G>A allele may generate a full-length mRNA resulting potentially in the translation of a “normal” RhD protein
Characterization of transcripts resulting from the RHD c.1227G>A allele has already been reported by two independent studies.19,43 They both concluded that Exon 9 carrying the variant, as well as other exons potentially, is systematically skipped from the mature transcript. The model proposed so far has suggested that a protein with an altered C-terminal tail is produced, including all epitopes encoded by Exons 1 to 8 that are exposed at the surface of RBCs.40 Our result, which demonstrates the inclusion of a full-length Exon 9 within a mature transcript (Fig. 1), challenges this assumption, but may nevertheless be in accordance with the major quantitative defect associated with this variant. Indeed, although Exon 9 is skipped in most transcripts, a minute fraction may retain it within the mature transcript, which would be further translated into a wild-type RhD protein that migrates to the plasma membrane. RBCs would thus express a very low amount of “normal” D antigen exposed at their surface, resulting in a DEL phenotype. A reason why the other studies did not identify any full-length transcript may result from the combination of two critical factors: the extremely limited amount of biosynthesized variant mRNA and the low sensitivity of the visualization method (i.e., ethidium bromide staining of agarose gel). Although we also carried out RT-PCR and agarose gels, it appeared worthwhile to make sure that no product under the resolution limit was produced at the expected size. It is important to note that the “invisible” product could not be directly sequenced, but had to be cloned before characterization, allowing the identification of minute amounts of RT-PCR product. Additional studies carried out with mRNA extracted from freshly drawn blood samples will be required to elucidate the relevance of our findings in vivo.

This result also highlights an important feature in conducting such a study: only major transcribed products can be visualized on the gel, implying that (many) others may also coexist and be of physiologic significance. Fluorescent PCR may help to identify minor, alternative products by enhancing the detection sensitivity by capillary electrophoresis.

Alteration of dinucleotide consensus splice sites dramatically impairs splicing of RHD exons
Thirteen single-nucleotide substitutions, as well as a single-base deletion variant, were directly involved within either the dinucleotide consensus 5′-donor (GT) or 3′-acceptor (AG) sequence. Those that have been characterized confer either a D– or DEL phenotype (Table 1), indicating an obvious quantitative defect and confirming, as expected, that alteration of the canonical dinucleotide sequences within the RHD gene is highly deleterious to the recognition of exons for proper splicing. The c.336-2delA variant is particularly interesting. This variant is categorized as partial DEL, implying that 1) a protein is expressed, 2) at a very low level of expression, and 3) with altered epitope(s). Functional analysis revealed two products (Fig. 2G), including one with a skipped exon, potentially contributing to the quantitative defect, and the other with an in-frame, intronic 27-bp sequence at the 5′-end of Exon 3. One may speculate that a mature RHD transcript containing this latter sequence might be translated into a protein with an additional 9-amino-acid sequence (i.e., Arg-Pro-Gly-Ser-Pro-Ser-Leu-Pro-Arg) between Ser112 and Ile113. These residues are supposed to locate within the fourth transmembrane domain of the native protein40,41 and more specifically very close to the second extracellular loop. It cannot be ruled out that such an insertion may alter the structure of this loop, thereby modifying the antigenic properties of the protein and contributing to the partial D phenotype associated with this variant. Additional studies will need to be carried out to validate this assumption.
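The reading-frame reasoning behind the 27-bp insertion can be checked with a few lines of Python. The helper `frame_effect` is a hypothetical name introduced here for illustration; it only tests whether a change in transcript length is a multiple of three and says nothing about the actual residues, which require the inserted sequence itself.

```python
def frame_effect(delta_bp: int) -> str:
    """Describe how a change in transcript length (in bp) affects the reading frame."""
    if delta_bp == 0:
        return "no change"
    if delta_bp % 3 == 0:
        codons = abs(delta_bp) // 3
        kind = "insertion" if delta_bp > 0 else "deletion"
        return f"in-frame {kind} of {codons} codon(s)"
    # Any length change that is not a multiple of 3 shifts the downstream frame.
    return "frameshift"


# 27 bp retained with c.336-2delA; 11 and 16 bp retained with c.634+1G>A (Results).
effects = {delta: frame_effect(delta) for delta in (27, 11, 16)}
```

Here the 27-bp insertion corresponds to 9 additional codons, consistent with the 9-amino-acid sequence proposed above, whereas the +11 and +16 bp products reported in Results for c.634+1G>A would both shift the reading frame.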

A correlation between exon skipping and D antigen expression On the basis of the functional studies, predicting the global quantitative expression of RhD protein appears to be possible in most cases. Indeed major skipping of the exon of interest and/or production of alternative products are associated with both negative and DEL phenotypes (e.g., c.801 1 1G>A and c.1227G>A, respectively), while mild to moderate exon exclusion is found in variants with weak/weakened D expression (e.g., c.1152A>C). The amount of D antigen is therefore negatively linked to the disruption of exon splicing. In accordance with previously reported results obtained by cDNA analysis,42 we confirm functionally that the missense c.1154G>C transversion (p.G385A), which characterizes the RHD*weak D Type 2 allele predominantly found in Caucasians,38 is a splicing variant that

induces both significant skipping of the whole Exon 9 and inclusion of the full-length exon including the missense nucleotide (Fig. 1). Additionally to the potential deleterious effect of the missense p.G385A substitution within the translated protein, reduction in the expression of a fulllength transcript may then contribute to the weak D phenotype associated to this variant, suggesting an addictive functional effect. This feature is also illustrated by the c.1226A>T (p.K409M) variant at the end of the same exon, which confers a weak D phenotype. By contrast the missense c.938C>T (p.P313L) variant, which results in a weak D phenotype, did not show any functional impact on splicing (Fig. 2D), suggesting that the quantitative defect is only due to the protein carrying the p.P313L amino acid substitution. Similarly the partial phenotypes observed with c.485A>G (p.N162S) and c.1073T>C (p.M358T) result from modifications within the protein sequence of residues exposed at the surface of RBCs. These are typical cases of “protein defect” as opposed to the “splicing defect” primarily studied in this work. Although expression of D antigen and functional analysis by minigene splicing assay were found to match most of the time, correlation is however challenged with some variations. The missense c.635G>T (p.G212V) and c.941G>T (p.G314V) variants are typical examples illustrating that feature. While our functional study shows that skipping of the respective exon of interest is moderate for the former (Fig. 2C) to null for the latter (Fig. 2E), thus indicating that a full-length, missense protein is supposed to be biosynthesized, they both have been reported to confer a D– phenotype (Table 1). One may speculate that the p.G212V and p.G314V substitutions, respectively, which locate within transmembrane domains, may dramatically alter the structure of the protein and completely impair its integration within the membrane, then finally resulting in a negative phenotype. 
On the basis of our findings, we also propose that the c.941G>T variant, which appears in the “missense (splice site affected)” section of the Rhesus Base v2.0, should be removed from this list (www.uni-ulm.de/~fwagner/RH/RB2/M_RHDMissense(splicesiteaffected).htm).

The result of the functional study of c.486+5G>A is somewhat surprising. This variant appears to promote total exclusion of Exon 3 from the transcript (Fig. 2G), which is not compatible with the expected weakened D phenotype. At least two scenarios may be hypothesized: either the reported phenotyping results are erroneous and our functional results are correct, or the phenotyping is correct and our minigene splicing assay does not reflect the in vivo conditions for this variant, because some tissue-specific factors, absent from our model, are necessary to mimic physiologic splicing.

The cellular mechanism of splicing disruption was not investigated because it was not the purpose of this work. It is nevertheless reasonable to assume that some, if not all, variants studied here disrupt binding sites for the splicing machinery, which in turn alters the molecular process. Taken together, our functional data have helped to classify 38 variations as splicing or nonsplicing variants.
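The residue numbers quoted for the missense variants in this section can be cross-checked against their coding positions. The following is a minimal sketch, assuming standard HGVS coding numbering in which c.1 is the first nucleotide of the start codon; the helper names `codon_of` and `check` are illustrative and not part of the study.

```python
import re

def codon_of(c_pos):
    """Map a coding-DNA position to (codon number, position within codon 1-3)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1

# Missense variants and residue numbers exactly as quoted in the text.
MISSENSE = {
    "c.1154G>C": 385,  # p.G385A
    "c.1226A>T": 409,  # p.K409M
    "c.938C>T": 313,   # p.P313L
    "c.485A>G": 162,   # p.N162S
    "c.1073T>C": 358,  # p.M358T
    "c.635G>T": 212,   # p.G212V
    "c.941G>T": 314,   # p.G314V
}

def check(variants):
    """Return (variant, codon, position in codon, matches quoted residue)."""
    rows = []
    for hgvs, residue in variants.items():
        pos = int(re.match(r"c\.(\d+)[ACGT]>[ACGT]$", hgvs).group(1))
        codon, offset = codon_of(pos)
        rows.append((hgvs, codon, offset, codon == residue))
    return rows

for row in check(MISSENSE):
    print(row)
```

Under this numbering, every variant listed falls on the second position of the codon whose number is quoted in the text, which is consistent with each being reported as an amino acid substitution.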

Relative quantification of splicing disruption by bioinformatics prediction remains difficult in splicing variants with mild to moderate effect

One feature of bioinformatics prediction with Alamut v2.4 is clear from our study: prediction of a major deleterious effect for a variant within the 5′-donor and 3′-acceptor dinucleotide consensus sequences is accurate, as illustrated by the perfect correlation between variants classified as “Group 1” (Table 2) and the functional studies. Another benefit of these prediction tools is their capacity to identify alternative splice sites. Indeed, all cryptic sites functionally identified by sequencing can be linked to bioinformatics predictions in our study.

Bioinformatics predictions were also relevant in Group 2, except for c.1227G>A. Although only a moderate reduction of scores is predicted for the consensus donor splice site affected by the variant (Table 2), we and others have shown that Exon 9 is almost totally skipped, consistent with a major splicing effect resulting in a DEL phenotype.19 Similarly, predictions of a very mild effect for c.1154-8T>A, c.1154G>C, and c.1226A>T, classified in Group 3 (Table 2), do not necessarily reflect what is observed in the functional analysis (Fig. 1). Overall, on the basis of our results, bioinformatics prediction tends to underestimate the negative effect of splicing variants, especially those exhibiting mild to moderate deleterious activity. This work may thus help improve algorithms by providing the functional data needed to build high-accuracy splicing prediction tools. Because discrepancies remain between bioinformatics prediction and functional studies, functional analyses must remain the gold standard to conclude that a variant/mutation of interest is actually a splicing variation.
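The distinction drawn above between variants inside the consensus dinucleotides and variants in their vicinity can be read directly from the variant name. The sketch below is an illustration only, assuming standard HGVS notation (for example c.801+1G>A); it is neither the Alamut algorithm nor the grouping used in Table 2, and the function name `splice_position` is hypothetical.

```python
import re

# Intronic substitutions carry an offset from the nearest exonic nucleotide:
# "+" counts into the intron from the donor side, "-" from the acceptor side.
HGVS_INTRONIC = re.compile(r"^c\.(\d+)([+-])(\d+)([ACGT])>([ACGT])$")

def splice_position(hgvs):
    """Describe where a substitution lies relative to the nearest splice site."""
    match = HGVS_INTRONIC.match(hgvs)
    if match is None:
        return "exonic"  # no intronic offset, e.g. c.1227G>A
    sign, offset = match.group(2), int(match.group(3))
    site = "donor" if sign == "+" else "acceptor"
    # Offsets 1 and 2 are the canonical GT (donor) or AG (acceptor) dinucleotide.
    kind = "canonical dinucleotide" if offset <= 2 else "flanking intronic"
    return f"{site}, {kind} ({sign}{offset})"

for name in ["c.801+1G>A", "c.486+5G>A", "c.1154-8T>A", "c.1227G>A"]:
    print(name, "->", splice_position(name))
```

A purely positional label of this kind cannot quantify the effect of exonic or flanking variants such as c.1227G>A or c.486+5G>A, which is where the functional assay adds information.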

Is our minigene splicing assay appropriate to investigate splicing of silent exonic variants?

Finally, we took advantage of our model to study, in two eukaryotic cell lines, seven silent exonic variants that have been suspected to result in a weak D phenotype, except c.357T>C, which has not been documented. We could show that c.960G>A seems to slightly promote Exon 7 exclusion, but it is currently unknown whether this is sufficient to explain the related phenotype, and our findings were mainly inconclusive for this class of variants. Several hypotheses may, however, be formulated.

First, trans-acting factors and/or cis-acting regulatory elements from distal intronic regions required to promote exon skipping may be absent in the two cell lines used in the functional analysis.

Second, the c.1227G>A variant, which has already been shown to disrupt splicing in another model, can be considered a positive control of deleterious splicing effect for validating our assay on variants at exon-intron junctions within the RHD gene.19 To our knowledge, no silent exonic variant of the RHD gene has been proven to alter cellular splicing so far. A positive control is therefore not available to investigate this kind of variant, which obviously challenges our model for that purpose.

Third, the phenotype associated with some of the silent variations may also be uncertain. This hypothesis is illustrated by the c.165C>T variant, which has been independently reported twice: with weakened D expression in China (GenBank Accession Number DQ309583) and with apparently normal expression in Austria (FN545815). It remains to be confirmed unambiguously whether this variant alters D antigen expression.

It is well known that exons contain short stretches of nucleotides identified as splicing regulatory elements, which positively (exonic splicing enhancers: ESEs) or negatively (exonic splicing silencers: ESSs) regulate the mechanism by interacting with protein partners. Although several bioinformatics tools have been developed to identify exonic regulatory elements,12,44,45 predicting the effect of exonic mutations on splicing activity has remained a difficult task. A study published a few years ago reported the functional characterization of all RNA 6-mer sequences (i.e., 4096 individual 6-mers) on splicing.46 The authors concluded that not only is the primary sequence of the hexamer critical to splicing regulation, but its proximal environment, including the nature of the flanking nucleotides, the relative location of the 6-mer within the exon, and the size of the exon, also plays a key role. Synergy between splicing motifs should also be taken into consideration.
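The figure of 4096 hexamers, and the reason a single exonic substitution is hard to interpret, can both be made concrete. The following is a minimal sketch: the toy sequence is arbitrary and is not an RHD sequence, and the function names are illustrative.

```python
from itertools import product

BASES = "ACGU"

def all_kmers(k):
    """Enumerate every RNA k-mer over the four bases."""
    return ["".join(p) for p in product(BASES, repeat=k)]

def kmers_overlapping(seq, index, k=6):
    """Return the k-mers of seq that contain the 0-based position index."""
    first_start = max(0, index - k + 1)
    last_start = min(index, len(seq) - k)
    return [seq[s:s + k] for s in range(first_start, last_start + 1)]

# 4 bases at each of 6 positions gives 4**6 distinct hexamers.
print(len(all_kmers(6)))

# Arbitrary toy sequence (assumption for illustration only).
toy = "GGAUCCAGGUAACGUUAGC"
print(kmers_overlapping(toy, 9))
```

Away from the ends of the sequence, one substituted nucleotide sits in six overlapping hexamers, so a single variant can simultaneously alter several candidate enhancer or silencer motifs.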
Furthermore, the effects of both chromatin and RNA secondary structures, which could not be assessed in our model, are particularly important to the regulation of splicing. Overall, it remains to be determined whether the seven silent exonic variants studied here impair splicing.

Recently, de Coulgeans and colleagues47 elegantly demonstrated that expressing variant alleles of the ART4 gene with synonymous polymorphisms affecting either a single leucine (Leu208) or tyrosine (Tyr126) codon yielded quantitative variations of the antigen in vitro. Their results suggest that such polymorphisms may modulate Dombrock blood group expression in vivo through codon-usage bias, which is involved in the general process of gene expression regulation. Given that four of the seven synonymous variants studied here involve either a leucine (c.960G>A, c.1110C>T, and c.1170T>C) or a tyrosine (c.519C>T) codon, the alternative hypothesis of posttranscriptional regulation also remains to be tested in our context.
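That the four variants named above are compatible with silent changes in leucine or tyrosine codons can be checked against the standard genetic code. The sketch below assumes standard HGVS coding numbering; because the actual RHD codons are not given here, it enumerates every candidate codon for the stated amino acid instead of asserting which one is present. All names are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Variant -> amino acid of the affected codon, as stated in the text.
SILENT_VARIANTS = {
    "c.960G>A": "L",
    "c.1110C>T": "L",
    "c.1170T>C": "L",
    "c.519C>T": "Y",
}

def third_base_swaps(amino, ref, alt):
    """Pair each codon for amino that ends in ref with its alt counterpart."""
    return [(codon, codon[:2] + alt)
            for codon, aa in CODON_TABLE.items()
            if aa == amino and codon[2] == ref]

def summarize(variants):
    report = {}
    for hgvs, amino in variants.items():
        position = int(hgvs[2:-3])      # digits between "c." and e.g. "G>A"
        ref, alt = hgvs[-3], hgvs[-1]
        swaps = third_base_swaps(amino, ref, alt)
        report[hgvs] = {
            "third_codon_position": position % 3 == 0,
            "candidate_codon_changes": swaps,
            "all_synonymous": all(CODON_TABLE[new] == amino for _, new in swaps),
        }
    return report

for hgvs, info in summarize(SILENT_VARIANTS).items():
    print(hgvs, info)
```

All four positions are multiples of three, i.e., third codon positions, and each listed substitution keeps the amino acid unchanged for every candidate codon, so the variants differ only in codon choice, which is the property relevant to the codon-usage hypothesis.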


CONFLICT OF INTEREST

The authors have disclosed no conflicts of interest.

REFERENCES

1. Schroeder SC, Schwer B, Shuman S, et al. Dynamic association of capping enzymes with transcribing RNA polymerase II. Genes Dev 2000;14:2435-40.
2. Srebrow A, Kornblihtt AR. The connection between splicing and cancer. J Cell Sci 2006;119:2635-41.
3. Wang GS, Cooper TA. Splicing in disease: disruption of the splicing code and the decoding machinery. Nat Rev Genet 2007;8:749-61.
4. Padgett RA. New connections between splicing and human disease. Trends Genet 2012;28:147-54.
5. Avent ND, Ridgwell K, Tanner MJ, et al. cDNA cloning of a 30 kDa erythrocyte membrane protein associated with Rh (Rhesus)-blood-group-antigen expression. Biochem J 1990;271:821-5.
6. Cherif-Zahar B, Bloy C, Le Van Kim C, et al. Molecular cloning and protein structure of a human blood group Rh polypeptide. Proc Natl Acad Sci U S A 1990;87:6243-7.
7. Le van Kim C, Mouro I, Chérif-Zahar B, et al. Molecular cloning and primary structure of the human blood group RhD polypeptide. Proc Natl Acad Sci U S A 1992;89:10925-9.
8. Daniels G. Variants of RhD—current testing and clinical consequences. Br J Haematol 2013;161:461-70.
9. Reese MG, Eeckman FH, Kulp D, et al. Improved splice site detection in Genie. J Comput Biol 1997;4:311-23.
10. Pertea M, Lin X, Salzberg SL. GeneSplicer: a new computational method for splice site prediction. Nucleic Acids Res 2001;29:1185-90.
11. Yeo G, Burge CB. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J Comput Biol 2004;11:377-94.
12. Desmet FO, Hamroun D, Lalande M, et al. Human Splicing Finder: an online bioinformatics tool to predict splicing signals. Nucleic Acids Res 2009;37:e67.
13. Shao CP, Maas JH, Su YQ, et al. Molecular background of Rh D-positive, D-negative, D(el) and weak D phenotypes in Chinese. Vox Sang 2002;83:156-61.
14. Chen JC, Lin TM, Chen YL, et al. RHD 1227A is an important genetic marker for RhD(el) individuals. Am J Clin Pathol 2004;122:193-8.
15. Kim JY, Kim SY, Kim CA, et al. Molecular characterization of D- Korean persons: development of a diagnostic strategy. Transfusion 2005;45:345-52.
16. Yang YF, Wang YH, Chen JC, et al. Prevalence of RHD 1227A and hybrid Rhesus box in the general Chinese population. Transl Res 2007;149:31-6.
17. Li Q, Hou L, Guo ZH, et al. Molecular basis of the RHD gene in blood donors with DEL phenotypes in Shanghai. Vox Sang 2009;97:139-46.
18. Chen Q, Li M, Li M, et al. Molecular basis of weak D and DEL in Han population in Anhui Province, China. Chin Med J 2012;125:3251-5.
19. Liu HC, Eng HL, Yang YF, et al. Aberrant RNA splicing in RHD 7-9 exons of DEL individuals in Taiwan: a mechanism study. Biochim Biophys Acta 2010;1800:565-73.
20. Fichou Y, Le Maréchal C, Jamet D, et al. Establishment of a medium-throughput approach for the genotyping of RHD variants and report of nine novel rare alleles. Transfusion 2013;53:1821-28.
21. Callebaut I, Joubrel R, Pissard S, et al. Comprehensive functional annotation of 18 missense mutations found in suspected hemochromatosis type 4 patients. Hum Mol Genet 2014;23:4479-90.
22. Vege S, Copeland TR, Nickle PA, et al. RHD exon consensus splice-site changes, 334A>G and 1228T>G associated with weak D expression. Transfusion 2009;49:118-9A.
23. Ye LY, Guo ZH, Li Q, et al. Molecular and family analyses revealed two novel RHD alleles in a survey of a Chinese RhD-negative population. Vox Sang 2007;92:242-6.
24. Krog GR, Clausen FB, Berkowicz A, et al. Is current serologic RhD typing of blood donors sufficient for avoiding immunization of recipients? Transfusion 2011;51:2278-85.
25. Etheridge W, Tilley L, Poole J, et al. Two novel D genes of the Rh blood group system producing D variant phenotypes. Transfus Med 2006;16 Suppl s1:21-2.
26. Wagner FF, Frohmajer A, Flegel WA. RHD positive haplotypes in D negative Europeans. BMC Genet 2001;2:10.
27. Flegel WA, von Zabern I, Wagner FF. Six years’ experience performing RHD genotyping to confirm D- red blood cell units in Germany for preventing anti-D immunizations. Transfusion 2009;49:465-71.
28. Ye L, He Y, Gao H, et al. Weak D phenotypes caused by intronic mutations in the RHD gene: four novel weak D alleles identified in the Chinese population. Transfusion 2013;53:1829-33.
29. Le Maréchal C, Guerry C, Benech C, et al. Identification of 12 novel RHD alleles in western France by denaturing high-performance liquid chromatography analysis. Transfusion 2007;47:858-63.
30. Kamesaki T, Iwamoto S, Kumada M, et al. Molecular characterization of weak D phenotypes by site-directed mutagenesis and expression of mutant Rh-green fluorescence protein fusions in K562 cells. Vox Sang 2001;81:254-8.
31. Wagner FF, Mardt I, Bittner R, et al. RHD PCR of blood donors in Northern Germany: use of adsorption/elution to determine D antigen status. Vox Sang 2012;103:15.
32. Silvy M, Simon S, Gouvitsos J, et al. Weak D and DEL alleles detected by routine SNaPshot genotyping: identification of four novel RHD alleles. Transfusion 2011;51:401-11.
33. Chen Q, Flegel WA. Random survey for RHD alleles among D+ European persons. Transfusion 2005;45:1183-91.
34. Müller TH, Wagner FF, Trockenbacher A, et al. PCR screening for common weak D types shows different distributions in three Central European populations. Transfusion 2001;41:45-52.
35. Scott SA, Nagl L, Tilley L, et al. The RHD(1227G>A) DEL-associated allele is the most prevalent DEL allele in Australian D- blood donors with C+ and/or E+ phenotypes. Transfusion 2014;54:2931-40.
36. Okuda H, Kawano M, Iwamoto S, et al. The RHD gene is highly detectable in RhD-negative Japanese donors. J Clin Invest 1997;100:373-9.
37. Körmöczi GF, Legler TJ, Daniels GL, et al. Molecular and serologic characterization of DWI, a novel “high-grade” partial D. Transfusion 2004;44:575-80.
38. Wagner FF, Gassner C, Müller TH, et al. Molecular basis of weak D phenotypes. Blood 1999;93:385-93.
39. Denomme GA, Wagner FF, Fernandes BJ, et al. Partial D, weak D types, and novel RHD alleles among 33,864 multiethnic patients: implications for anti-D alloimmunization and prevention. Transfusion 2005;45:1554-60.
40. Flegel WA. Molecular genetics and clinical applications for RH. Transfus Apher Sci 2011;44:81-91.
41. Silvy M, Chapel-Fernandes S, Callebault I, et al. Characterization of novel RHD alleles: relationship between phenotype, genotype, and trimeric architecture. Transfusion 2012;52:2020-9.
42. Vege S, Whorley T, Haspel RL, et al. The weak D type 2 mutation 1154G>C is associated with exon skipping [abstract]. Transfusion 2007;47:160A.
43. Shao CP, Xiong W, Zhou YY. Multiple isoforms excluding the normal RhD mRNA detected in Rh blood group Del phenotype with RHD 1227A allele. Transfus Apher Sci 2006;34:145-52.
44. Fairbrother WG, Yeh RF, Sharp PA, et al. Predictive identification of exonic splicing enhancers in human genes. Science 2002;297:1007-13.
45. Cartegni L, Wang J, Zhu Z, et al. ESEfinder: a web resource to identify exonic splicing enhancers. Nucleic Acids Res 2003;31:3568-71.
46. Ke S, Shang S, Kalachikov SM, et al. Quantitative evaluation of all hexamers as exonic splicing elements. Genome Res 2011;21:1360-74.
47. de Coulgeans CD, Silvy M, Halverson G, et al. Synonymous nucleotide polymorphisms influence Dombrock blood group protein expression in K562 cells. Br J Haematol 2014;164:131-41.


SUPPORTING INFORMATION

Additional Supporting Information may be found in the online version of this article at the publisher’s Web site:

Fig. S1. Schematic representation of the strategy used to generate the c.1154G>C construct. (A) Two products were first amplified by PCRs 1 and 2 by using wild-type and RHD*weak D type 2 genomic DNAs as templates, respectively, resulting in the generation of partially overlapping wild-type and variant sequences. PCR products 1 and 2 were then diluted to 1/100th and mixed together for PCR 3. Arrows with extensions: PCR primers for exon 9 amplification (Table S1, available as supporting information in the online version of this paper); arrows: complementary primers 5′-CTGTTTAAATGCATAATTTAATGTTAAAAG-3′ in PCR 1, and 5′-CTTTTAACATTAAATTATGCATTTAAACAG-3′ in PCR 2; nucleotides to incorporate are bold red underlined; nucleotides to remove are bold black underlined. PCR conditions are those defined in the manuscript. (B) Control of PCR amplifications by agarose gel electrophoresis; bp: base pairs.

Fig. S2. Sequencing profiles of the RT-PCR products including full-length and alternative products of the respective exons (A-H). Arrowheads in the sequencing profiles indicate the position of the variant of interest; bp: base pairs.

Fig. S3. Functional analysis of silent exonic RHD variants by minigene splicing assay in K-562 cells. NT = not transfected; pSP = pSplicePOLR2G vector; WT = wild-type construct; bp = base pairs.

Table S1. Primer sequences for PCR amplification of RHD exons and flanking intronic regions.

Table S2. Primer sequences for site-directed mutagenesis.
