Journal of Bioinformatics and Computational Biology Vol. 12, No. 5 (2014) 1450023 (21 pages) © Imperial College Press DOI: 10.1142/S0219720014500231

Influenza A HA's conserved epitopes and broadly neutralizing antibodies: A prediction method

J. Bioinform. Comput. Biol. 2014.12. Downloaded from www.worldscientific.com by FLINDERS UNIVERSITY LIBRARY on 01/17/15. For personal use only.

Jing Ren*,§, John Ellis†,¶ and Jinyan Li‡,||

*Advanced Analytics Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, P. O. Box 123 Broadway, NSW 2007, Australia

†School of Medical and Molecular Biosciences, University of Technology Sydney, P. O. Box 123 Broadway, NSW 2007, Australia

‡Advanced Analytics Institute and Centre for Health Technologies, Faculty of Engineering and Information Technology, University of Technology Sydney, P. O. Box 123 Broadway, NSW 2007, Australia

§[email protected] ¶[email protected] ||[email protected]

Received 27 July 2014
Revised 7 August 2014
Accepted 7 August 2014
Published 11 September 2014

A conserved epitope is an epitope retained by multiple strains of influenza as the key target of a broadly neutralizing antibody. Identification of conserved epitopes is of strong interest to help design broad-spectrum vaccines against influenza. Conservation score measures the evolutionary conservation of an amino acid position in a protein based on the phylogenetic relationships observed amongst homologous sequences. Here, the Average Amino Acid Conservation Score (AAACS) is proposed as a method to identify HA's conserved epitopes. Our analysis shows that there is a clear distinction between conserved epitopes and nonconserved epitopes in terms of AAACS. This method also provides an excellent classification performance on an independent dataset. In contrast, alignment-based comparison methods do not work well for this problem, because conserved epitopes to the same broadly neutralizing antibody are usually not identical or similar. Location-based methods are not successful either, because conserved epitopes are located at both the less-conserved globular head (HA1) and the more-conserved stem (HA2). As a case study, two conserved epitopes on HA are predicted for the influenza A virus H7N9: one should match the broadly neutralizing antibodies CR9114 or FI6v3, while the other is new and requires validation by wet-lab experiments.

Keywords: Conservation score; average amino acid conservation score; conserved epitope; broadly neutralizing antibody; structure alignment; location-based method; H7N9.


1. Introduction

Of all influenza viruses, humans are most susceptible to influenza A. The hemagglutinin protein (HA) is a primary surface antigen of the influenza A virus. It binds to sialic acid-containing receptors on host cells as the primary event leading to infection. Antibodies can block this infection process either by binding to the globular head of HA, which inhibits the attachment of HA to the surface cells of the host, or by binding to the stem of HA, which prevents the membrane fusion of the virus with the host surface cells. The binding site on the HA antigen is a B-cell epitope, which can be recognized by a specific B lymphocyte to stimulate the production of antibodies.

Mutations have led to a great deal of diversity and variability amongst influenza A viruses. In 2013, it was reported that there are at least 18 mutated HAs, which fall into two groups according to their antigenicity. Group 1 contains H1, H2, H5, H6, H8, H9, H11, H12, H13, H16, H17 and H18; Group 2 contains H3, H4, H7, H10, H14 and H15.[1,2] The antigenicity of these mutated HAs varies considerably, which makes the effective selection and development of epitope-based vaccines very difficult. As common epitopes are retained by different HA variants, they provide a potential target for effective broad-spectrum vaccines; hence the identification of those conserved epitopes is of wide interest.

The recent discovery of broadly neutralizing antibodies indirectly confirms the existence and importance of conserved epitopes.[1,3–14] Table 1 lists these broadly neutralizing antibodies. Some of them (such as FI6v3) can neutralize multiple strains of influenza in both Group 1 and Group 2. That is, their corresponding epitopes are conserved across groups. Some broadly neutralizing antibodies only neutralize viruses from the same group; for example, CR6261 neutralizes viruses from Group 1. In contrast, some antibodies neutralize only viruses of the same strain; for example, CH65 neutralizes only H1N1 viruses, and its epitopes are considered strain-specific epitopes (nonconserved epitopes). Hence the relationships amongst conserved epitopes and broadly neutralizing antibodies are complicated.

Table 1. Broadly neutralizing antibodies.

Antibody | Antigens | Group | Epitope location | Function
HB36.3 [3] | H1, H5 | 1 | The stem region | IMF
HB80.4 [4] | H1, H2, H5, H6, H9, H12, H13, H16 | 1 | The stem region | IMF
C179 [5] | H1, H2, H5, H6, H9 | 1 | In the middle of the stem region | IMF
CR6261 [6] | H1, H2, H5, H6, H8, H9, H11, H13, H16 | 1 | At the interface of HA1 and HA2 | IMF
F10 [1] | H1, H2, H5, H6, H8, H9, H11, H13, H16 | 1 | At the interface of HA1 and HA2 | IMF
A06 [7] | H1N1 and H5N1 | 1 | The A-helix epitope on the HA2 subunit | IMF
CR8020 [8] | H3, H4, H7, H10, H14, H15 | 2 | At the base of the stem region | IMF
CR8043 [9] | H3, H10 | 2 | At the base of the stem region | IMF
FI6v3 [10] | H1, H2, H5, H6, H8, H9, H13, H3, H4, H7, H10 | 1&2 | A shallow groove on the F sub-domain of the HAs | IMF
CR9114 [11] | H1, H3, H5, H7, H9 & B | 1&2 | The stem region | IMF
39.29 [12] | H1, H2, H3 | 1&2 | A groove adjacent to HA2 helix A | IMF
C05 [13] | H1, H2, H3, H9, H12 | 1&2 | The HA1 globular head domain | IVA
S139/1 [14] | H1, H2, H3, H5, H9, H13 | 1&2 | The globular head adjacent to the receptor-binding domain | IVA

Note: IMF: Inhibits membrane fusion. IVA: Inhibits virus attachment by directly competing with sialic acids.

Identification of conserved epitopes by biological experiments is costly, as it involves the search for and determination of broadly neutralizing antibodies. Identification of conserved epitopes by computational methods is also a difficult problem. First, conserved epitopes to the same broadly neutralizing antibody are not identical and sometimes differ substantially. Alignment-based comparison methods are thus not effective. Second, conserved epitopes are located at both the less-conserved globular head (HA1) of HA and the more-conserved stem (HA2) of HA. Location-based methods are not successful either.

Evolutionary propensities can be used to characterize the degree of conservation of each residue in a protein. The widely used evolutionary propensity, the Position-Specific Scoring Matrix (PSSM), does not provide strong classification power to identify conserved epitopes. On the other hand, a conservation score can measure the degree of evolutionary conservation at each amino acid position for a family of homologues. It is usually calculated based on the correlation between the target protein and its homologues.[15] In fact, the conservation score has an inverse relationship with the site's evolution rate: rapidly evolving positions are variable while slowly evolving positions are conserved.[16] It is more relevant for the identification of conserved epitopes than PSSM.

In this work, the Average Amino Acid Conservation Score (AAACS) is proposed to identify conserved epitopes. The data analysis shows that there is a clear distinction between conserved epitopes and nonconserved epitopes in terms of their AAACS.
The method also provides an excellent classification performance on an independent dataset containing both conserved and nonconserved epitopes. The utility of this AAACS-based method is discussed in the context of investigating new epitopes. A case study shows the use of the AAACS methodology: two conserved epitopes were predicted for influenza A virus H7N9. One should match the broadly neutralizing antibodies CR9114 or FI6v3, while the other is new. In general, when a new epitope is recognized by biological experiments or computational methods in the first strain of a virus, our method can easily predict whether it is a conserved epitope or not. If it is predicted as a conserved epitope, its corresponding antibody is probably a broadly neutralizing antibody, and neutralization experiments are then worth conducting.

2. Materials and Methods

2.1. Datasets

Using the structure data released by the Protein Data Bank (PDB) as of 14 Jan 2014, three datasets of influenza A virus were constructed for this study.

First, all quaternary structures that contain influenza A HA epitopes were collected by the following steps.

• All available influenza A HA complexes (quaternary structures) were retrieved from PDB. Only those structures whose molecule type is protein and whose experimental method is X-ray crystallography were retained. Eligible quaternary structures are listed in Table 2.
• Redundancy was removed. Symmetric units were removed, keeping only four chains: an HA1 chain and an HA2 chain of the antigen HA, and a light chain and a heavy chain of the antibody. These four chains must be paired. Incomplete structures (containing only one chain from the HA) were also retained.
• Epitopes were computationally determined. If the Euclidean distance between two heavy atoms of two residues (one from the antigen and one from the antibody) is within 4 Å, the corresponding residue on the antigen side is considered an epitope residue.[17,18]

Second, a refined dataset was built, which consists of only those quaternary structures with qualified tertiary structures and all four chains. This involved:

• Removal of incomplete structures from the first dataset.
• Tertiary structures were obtained by sequence alignment. The quaternary structures were aligned with tertiary structures through the BLAST algorithm provided by PDB. A quaternary structure was discarded if its best sequence similarity was less than 95%.
• Epitopes were mapped onto tertiary structures by structure alignment. Structure alignment (by the jCE algorithm) was carried out for the quaternary structures and their paired tertiary structures. Those quaternary structures whose epitopes could not be completely aligned were removed. Table 2 shows this alignment outcome.
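The 4 Å contact rule used in the epitope determination step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `epitope_residues`, the atom representation (a residue identifier paired with heavy-atom coordinates) and the toy coordinates are all assumptions made for the example; parsing a real PDB file is left out.

```python
from math import dist

CUTOFF = 4.0  # distance cut-off in angstroms, as stated in the text


def epitope_residues(antigen_atoms, antibody_atoms, cutoff=CUTOFF):
    """Return antigen residues with a heavy atom within `cutoff` of any antibody heavy atom.

    Each atom is a pair (residue_id, (x, y, z)); the residue_id format is hypothetical.
    """
    hits = set()
    for res_id, xyz in antigen_atoms:
        if res_id in hits:
            continue  # residue already known to be in contact
        if any(dist(xyz, ab_xyz) <= cutoff for _, ab_xyz in antibody_atoms):
            hits.add(res_id)
    return sorted(hits)


# Toy coordinates (hypothetical): one antigen atom is 3 A from the antibody, one is 7 A away.
antigen = [(("A", 38), (0.0, 0.0, 0.0)), (("A", 40), (10.0, 0.0, 0.0))]
antibody = [(("H", 100), (3.0, 0.0, 0.0))]
contacts = epitope_residues(antigen, antibody)
```

The all-pairs loop is quadratic in the number of atoms, which is acceptable for a single antigen-antibody interface; a spatial index would be the usual choice for bulk processing.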

This refined dataset (called E DS) contains 21 epitopes extracted from those quaternary structures with available tertiary structures in PDB. The remaining quaternary structures (with annotations) form another dataset (named T DS) and contain 11 epitopes. E DS is used as training data for generating the AAACS model in order to identify conserved epitopes. T DS is used as an independent test dataset to verify the classification performance of AAACS. Another dataset (ER DS) is a nonredundant epitope residue dataset made up of 241 nonredundant epitope residues extracted from the above tertiary structures. ER DS is used for the identification of conserved epitope residues.

The calculation of AAACS does not depend on tertiary structure. Epitopes were mapped onto tertiary structures mainly to construct the nonredundant ER DS. When two quaternary structures have an identical antigen and their epitopes partially overlap, redundant residues can only be removed through the common tertiary structure. The other reason is that the mapping serves as an arbitrary criterion for dividing the quaternary structure dataset into the training dataset (E DS) and the independent test dataset (T DS).

Table 2. Dataset and mapping relationship.

QStru(a) | Antigen(b) | Type(c) | Antibody(d) | BNA(e) | Epitope(f) | TStru(g) | HA1(h) | HA2(i) | Alignment(j)

Conserved epitopes
3R2X | A/Brevig Mission/1/1918 | H1N1 | HB36.3 | Y [3] | HA1&HA2 | 4GXX | 0.95 | 1 | Y
4EEF | A/Brevig Mission/1/1918 | H1N1 | F-HB80.4 | Y [4] | HA1&HA2 | 4GXX | 0.95 | 1 | Y
3GBN | A/Brevig Mission/1/1918 | H1N1 | Fab CR6261 | Y [6] | HA1&HA2 | 4GXX | 0.95 | 1 | Y
3ZTN | A/California/04/2009 | H1N1 | FI6V3 | Y [10] | HA1&HA2 | 3LZG | 1 | 1 | Y (number changes)
4HLZ | A/Japan/305/1957 | H2N2 | Fab C179 | Y [1] | HA1&HA2 | 3KU3 | 1 | 1 | Y
4FQR | A/Hong Kong/1/1968 | H3N2 | C05 | Y [13] | HA1 | 4FNK | 1 | 1 | Y
4FQY | A/Hong Kong/1/1968 | H3N2 | CR9114 | Y [11] | HA1&HA2 | 4FNK | 1 | 1 | Y
3SDY | A/Hong Kong/1/1968 | H3N2 | CR8020 | Y [8] | HA1&HA2 | 4FNK | 1 | 1 | Y
4NM8 | A/Hong Kong/1/1968 | H3N2 | CR8043 | Y [9] | HA1&HA2 | 4FNK | 1 | 1 | Y
3ZTJ | A/Aichi/2/1968 | H3N2 | FI6V3 | Y [10] | HA1&HA2 | 2YPG | 1 | 1 | Y
3GBM | A/Viet Nam/1203/2004 | H5N1 | Fab CR6261 | Y [6] | HA1&HA2 | 2FK0 | 1 | 1 | Y
3FKU | A/Viet Nam/1203/2004 | H5N1 | F10 | Y [1,19] | HA1&HA2 | 2FK0 | 1 | 1 | Y (number changes in chain A)
4FQI | A/Viet Nam/1203/2004 | H5N1 | CR9114 | Y [11] | HA1&HA2 | 2FK0 | 1 | 1 | Y
4FQV | A/Netherlands/219/2003 | H7N7 | CR9114 | Y [11] | HA1&HA2 | 4DJ6 | 0.95 | 1 | Y (number changes in chain A)

Nonconserved epitopes
3LZF | A/South Carolina/1/1918 | H1N1 | Fab 2D1 | N [20] | HA1 | 4GXX | 0.95 | 1 | Y
4HF5 | A/Japan/305+/1957 | H2N2 | Fab 8F8 | N [21] | HA1 | 3KU3 | 1 | 1 | Y
1KEN | A/X-31 | H3N2 | No name | N [22] | HA1 | 2YPG | 1 | 1 | Y
1EO8 | A/X-31 | H3N2 | Fab BH151 | N [23] | HA1 | 2YPG | 1 | 1 | Y
1QFU | A/X-31 | H3N2 | HC45 | N [24] | HA1 | 2YPG | 1 | 1 | Y
4MHH | A/Viet Nam/1203/2004 | H5N1 | H5M9 kappa | N–H5 [25] | HA1 | 2FK0 | 1 | 1 | Y (number changes)
4MHJ | A/goose/Guangdong/1/1996 | H5N1 | H5M9 kappa | N–H5 [25] | HA1 | 4MHI | 1 | 1 | Y

Table 2. (Continued)

QStru(a) | Antigen(b) | Type(c) | Antibody(d) | BNA(e) | Epitope(f) | TStru(g) | HA1(h) | HA2(i) | Alignment(j)

One chain
4M5Z | A/California/04/2009 | H1N1 | Fab 5J8 | N–H1 [26] | HA1 | 3UYX | 0.95 | non | Y (number changes)
4KVN | A/Perth/16/2009 | H3N2 | Fab 39.29 | Y [12] | HA0 | 2YP8 | HA0 | 0.95 | Y (number totally changes)

Epitope not mapped
4GXU | A/Brevig Mission/1/1918 | H1N1 | 1F1 | NA [27] | HA1 | 4GXX | 0.95 | 1 | N (2 residues cannot be aligned)
4HFU | A/Japan/305+/1957 | H2N2 | Fab 8M2 | N [21] | HA1 | 3KU3 | 0.95 | 1 | N (3 residues cannot be aligned)
4HG4 | A/Japan/305+/1957 | H2N2 | Fab 2G1 | N [21] | HA1 | 3KU3 | 0.95 | 1 | N (1 residue cannot be aligned)
4GMS | A/Victoria/3/1975 | H3N2 | Fab S139/1 | Y [28] | HA1 | 4FNK | 0.9 | 0.95 | N (4 residues cannot be aligned)

No tertiary structures
3SM5 | A/Solomon Islands/3/2006 | H1N1 | CH65 | N–H1 [29] | HA1 | - | 0.7 | - | -
4HKX | A/Solomon Islands/3/2006 | H1N1 | CH67 | N–H1 [30] | HA1 | - | 0.7 | - | -
2VIR | A/X-31 | H3N2 | No name | N [31] | HA1 | - | 0.3 | - | -
2VIS | A/X-31 | H3N2 | No name | N [31] | HA1 | - | 0.3 | - | -
2VIT | A/X-31 | H3N2 | No name | N [31] | HA1 | - | 0.3 | - | -
4FP8 | A/Hong Kong/1/1968 | H3N2 | C05 | Y [13] | HA1 | - | 0.3 | - | -
1FRG | A/reassortant/X-47 | H3N2 | Fab IGG2A 26/9 | NA [32] | HA1 | - | Too short | - | -

(a) QStru: Quaternary structures obtained from PDB. (b) Antigen: The antigen of the quaternary structure in column QStru. (c) Type: Type of the antigen in column Antigen. (d) Antibody: The antibody of the quaternary structure in column QStru. (e) BNA: Whether the antibody in column Antibody is a broadly neutralizing antibody, as determined from the existing literature. (f) Epitope: The epitope position. (g) TStru: The corresponding tertiary structure selected by sequence alignment. (h) HA1: The sequence similarity of HA1 between the quaternary structure in column QStru and the tertiary structure in column TStru. (i) HA2: The sequence similarity of HA2. (j) Alignment: Whether the epitope on the quaternary structure is the same as that on the tertiary structure according to structure alignment.

2.2. Definition of conserved epitopes

Few studies define conserved epitopes explicitly. This study uses the following definition. If an antibody is reported in the literature to neutralize multiple influenza viruses from different strains (e.g. H1 and H2), i.e. it is a broadly neutralizing antibody, the corresponding epitopes are all considered conserved epitopes. If not, the epitopes are considered nonconserved or strain-specific epitopes. For example, H5M9 Kappa has been reported to neutralize different H5N1 viruses only within the same strain,[25] so the corresponding epitopes are considered nonconserved.

By this definition, the E DS dataset is further divided into a conserved epitope dataset (14 conserved epitopes) and a nonconserved epitope dataset (seven nonconserved epitopes). The T DS dataset is annotated in Sec. 3.1.2, and the details are shown in Table 2. The ER DS is divided into a conserved epitope residue dataset (159 nonredundant residues) and a nonconserved epitope residue dataset (82 nonredundant residues). The number of antibodies covered by these datasets is smaller than that listed in Table 1 because many of the complexes are not available in PDB or have not been resolved experimentally by X-ray crystallography.

2.3. AAACS and APSSM

Two evolutionary propensities were investigated and compared for the identification of conserved epitopes. One is the amino acid conservation score. A smaller conservation score represents higher conservation of a residue. In this study, conservation scores are computed using the web server ConSurf,[16,33,34] where all of the Multiple Sequence Alignment (MSA) parameters are left at their defaults except that the protein database is changed to SWISS-PROT. The AAACS is the arithmetic average of the conservation scores of the residues in an epitope.

The other evolutionary propensity is the PSSM. The PSSM is obtained by PSI-BLAST.[35] The process is iterated three times, with default parameters in the first iteration and a stricter e-value (1.0e-4) in the remaining two iterations. The Average PSSM (APSSM), a 20-dimensional vector, is the arithmetic average of the PSSM rows, taken dimension by dimension, over the residues of an epitope.

2.4. Epitope similarity

Two epitopes were compared through a structure alignment-based method. First, the viruses containing the two epitopes were aligned by structure alignment (jCE algorithm); then, according to the alignment results, the epitope similarity was calculated as described below. A similarity measurement for two epitopes is defined by Eq. (1):

$$\mathrm{Similarity}(\mathrm{Ep1}, \mathrm{Ep2}) = \frac{\mathrm{Match}(\mathrm{Ep1}', \mathrm{Ep2}')}{\mathrm{Max}(\mathrm{Ep1}', \mathrm{Ep2}')} \qquad (1)$$

Table 3. Similarity of two epitopes to the broadly neutralizing antibody CR6261.

Ep1: 3GBN/4GXX (Chain | Rnum | Rname | ER?) || Ep2: 3GBM/2FK0 (Chain | Rnum | Rname | ER?) || Identical
A | 38 | HIS | Y || A | 38 | HIS | Y || √
A | 40 | VAL | Y || A | 40 | GLN | Y ||
A | 41 | ASN | Y || A | 41 | ASP | Y ||
A | 42 | LEU | Y || A | 42 | ILE | Y ||
A | 291 | SER | Y || A | 291 | SER | Y || √
A | 292 | LEU | Y || A | 292 | MET | Y ||
A | 293 | PRO | N || A | 293 | PRO | Y || √
A | 318 | THR | N || A | 318 | THR | Y || √
B | 19 | ASP | Y || B | 19 | ASP | Y || √
B | 20 | GLY | Y || B | 20 | GLY | Y || √
B | 21 | TRP | Y || B | 21 | TRP | Y || √
B | 38 | GLN | Y || B | 38 | LYS | Y ||
B | 41 | THR | Y || B | 41 | THR | Y || √
B | 42 | GLN | Y || B | 42 | GLN | Y || √
B | 45 | ILE | Y || B | 45 | ILE | Y || √
B | 46 | ASP | Y || B | 46 | ASP | Y || √
B | 49 | THR | Y || B | 49 | THR | Y || √
B | 52 | VAL | Y || B | 52 | VAL | Y || √
B | 53 | ASN | Y || B | 53 | ASN | Y || √
B | 56 | ILE | Y || B | 56 | ILE | N || √

Statistics: There are a maximum of 20 residues for the two extended epitopes, of which 15 are similar.
Note: √ stands for aligned residues. ER? indicates whether the residue is an epitope residue (Y) or an extended residue (N).
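The statistics of Table 3 can be reproduced with a short sketch of the similarity measure of Eq. (1). The sketch assumes that a pair of aligned residues counts as matched only when the residue names are identical, which is how the table arrives at 15 of 20; the row layout and the function name `epitope_similarity` are illustrative choices, not part of the original method description.

```python
# Rows of Table 3: (chain, residue number, name in 3GBN, ER? in 3GBN, name in 3GBM, ER? in 3GBM)
TABLE3 = [
    ("A", 38, "HIS", "Y", "HIS", "Y"), ("A", 40, "VAL", "Y", "GLN", "Y"),
    ("A", 41, "ASN", "Y", "ASP", "Y"), ("A", 42, "LEU", "Y", "ILE", "Y"),
    ("A", 291, "SER", "Y", "SER", "Y"), ("A", 292, "LEU", "Y", "MET", "Y"),
    ("A", 293, "PRO", "N", "PRO", "Y"), ("A", 318, "THR", "N", "THR", "Y"),
    ("B", 19, "ASP", "Y", "ASP", "Y"), ("B", 20, "GLY", "Y", "GLY", "Y"),
    ("B", 21, "TRP", "Y", "TRP", "Y"), ("B", 38, "GLN", "Y", "LYS", "Y"),
    ("B", 41, "THR", "Y", "THR", "Y"), ("B", 42, "GLN", "Y", "GLN", "Y"),
    ("B", 45, "ILE", "Y", "ILE", "Y"), ("B", 46, "ASP", "Y", "ASP", "Y"),
    ("B", 49, "THR", "Y", "THR", "Y"), ("B", 52, "VAL", "Y", "VAL", "Y"),
    ("B", 53, "ASN", "Y", "ASN", "Y"), ("B", 56, "ILE", "Y", "ILE", "N"),
]


def epitope_similarity(rows):
    """Similarity of two epitopes given their aligned residue rows.

    A position belongs to the extended epitopes if it is an epitope residue on
    either side. With a complete alignment both extended epitopes have the same
    size, so the Max term of Eq. (1) is simply that size.
    """
    extended = [r for r in rows if r[3] == "Y" or r[5] == "Y"]
    matched = sum(1 for r in extended if r[2] == r[4])  # identical residue names
    return matched / len(extended)


similarity = epitope_similarity(TABLE3)  # 15 matched pairs out of 20 extended positions
```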

Here, Match(Ep1′, Ep2′) represents the number of paired epitope residues obtained through a sequence alignment of the two viruses, while Max(Ep1′, Ep2′) stands for the larger of the numbers of residues in the two extended epitopes. Ep′ is an extension of an epitope: for a given epitope residue on one side (e.g. Ep1), if the corresponding residue on the other side is not included in Ep2, then Ep2 is extended to include it, and vice versa. An example is given in Table 3.

2.5. Prediction of conserved epitopes and their broadly neutralizing antibodies

An epitope can be determined either through quaternary structures or by computational methods. Given an epitope, its AAACS can be calculated and compared with a cut-off threshold to determine whether the epitope is conserved or strain-specific. If the epitope is predicted as conserved, its corresponding antibody is probably a broadly neutralizing antibody. The potential broadly neutralizing antibodies can be investigated using an antibody database (e.g. the antibodies in Table 1). The flowchart of the whole process is illustrated in Fig. 1. The main outcome of this study is a real-life application: epitopes predicted as conserved represent new vaccine targets requiring experimental validation.
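The prediction step can be illustrated with a minimal sketch: the AAACS is the arithmetic mean of the per-residue conservation scores of an epitope, and the epitope is labelled by comparing that mean with a cut-off. The function names and the per-residue scores below are hypothetical; the cut-off of 0.307 is the decision-tree threshold reported later in Sec. 3.1.2, passed in explicitly because other thresholds are also discussed there.

```python
from statistics import mean


def aaacs(residue_scores):
    """Average Amino Acid Conservation Score: mean of per-residue conservation scores."""
    return mean(residue_scores)


def predict_conservation(aaacs_value, cutoff):
    """Label an epitope as conserved when its AAACS does not exceed the cut-off.

    Lower conservation scores mean higher conservation, so small AAACS values
    indicate conserved epitopes.
    """
    return "conserved" if aaacs_value <= cutoff else "nonconserved"


# Hypothetical per-residue scores for a three-residue epitope (illustrative only).
example_scores = [-0.5, 0.1, -0.2]
example_aaacs = aaacs(example_scores)
example_label = predict_conservation(example_aaacs, cutoff=0.307)
```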

Fig. 1. Identification of conserved epitopes and broadly neutralizing antibodies by AAACS.

3. Results and Discussion

The results are presented in three parts. The first part covers the classification performance of the conservation score-based method on the training data and on the independent test dataset. The second part examines the performance of other methods to demonstrate why PSSM, the alignment-based method and the location-based method are not effective in identifying conserved epitopes. The third part shows the prediction results on an influenza A virus H7N9 for identifying the conserved epitopes and their broadly neutralizing antibodies.

3.1. Prediction performance on conserved epitopes

3.1.1. Assessment on the training dataset E DS

The AAACS of every epitope in E DS is listed in Table 4. The AAACS of the nonconserved epitopes is higher, ranging from 0.498 to 1.355. In contrast, the AAACS of the conserved epitopes is lower, ranging from −0.416 to 0.307. These two ranges do not overlap: there is a clear AAACS distinction between the conserved epitopes and the strain-specific epitopes in the training dataset.

To evaluate the performance of the AAACS for identifying conserved epitopes by machine learning methods, leave-one-out cross-validation of the J48 decision tree algorithm (Weka 3.7) is applied. As shown in Table 5, the algorithm achieves a Matthews correlation coefficient (MCC) of 0.901: only one conserved epitope (extracted from 4FQR) is wrongly classified as nonconserved. This performance is significantly better than when the tree algorithm runs on the integrated

Table 4. AAACS of epitopes in E DS.

Nonconserved | AAACS
3LZF | 0.818
4HF5 | 0.776
1KEN | 0.498
1EO8 | 0.908
1QFU | 1.001
4MHH | 0.672
4MHJ | 1.355
Range | ≥ 0.498

Conserved | AAACS
3R2X | −0.243
4EEF | −0.359
3GBN | −0.370
3ZTN | −0.326
4HLZ | −0.209
4FQR | 0.307
4FQY | −0.379
3SDY | −0.020
4NM8 | 0.161
3ZTJ | −0.285
3GBM | −0.416
3FKU | −0.218
4FQI | −0.322
4FQV | −0.266
Range | ≤ 0.307

Table 5. Identification of conserved epitopes by two evolutionary features in E DS. Matthews correlation coefficient (MCC) and F-Measure are commonly used evaluation metrics on imbalanced datasets.

Features | MCC | F-Measure
AAACS | 0.901 | 0.953
APSSM | −0.395 | 0.400
AAACS and APSSM | 0.901 | 0.953
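The figures in Table 5 can be checked from confusion counts. The sketch below assumes, from the surrounding text, that with AAACS one of the 14 conserved epitopes is misclassified and none of the seven nonconserved ones, and that with APSSM all seven nonconserved epitopes and five conserved ones are misclassified. It also assumes that the reported F-Measure is a class-size-weighted average of the per-class F1 scores; that reading is an assumption, although it reproduces the reported values. Under these assumptions the MCC for APSSM comes out negative, with the magnitude given in the table.

```python
from math import sqrt


def mcc(tp, fn, fp, tn):
    """Matthews correlation coefficient from confusion counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def f1(tp, fp, fn):
    """F1 score of one class; 0.0 when the class is never predicted or present."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_f_measure(tp, fn, fp, tn):
    """Average of the two per-class F1 scores, weighted by class size."""
    n_pos, n_neg = tp + fn, tn + fp
    f_pos = f1(tp, fp, fn)  # "conserved" treated as the positive class
    f_neg = f1(tn, fn, fp)  # roles swapped for the "nonconserved" class
    return (n_pos * f_pos + n_neg * f_neg) / (n_pos + n_neg)


# Confusion counts inferred from the text (conserved = positive class).
aaacs_counts = dict(tp=13, fn=1, fp=0, tn=7)
apssm_counts = dict(tp=9, fn=5, fp=7, tn=0)

aaacs_mcc = round(mcc(**aaacs_counts), 3)
aaacs_f = round(weighted_f_measure(**aaacs_counts), 3)
apssm_mcc = round(mcc(**apssm_counts), 3)
apssm_f = round(weighted_f_measure(**apssm_counts), 3)
```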

evolutionary feature APSSM. In fact, the APSSM feature wrongly classifies all nonconserved epitopes as conserved and five conserved epitopes as strain-specific. The classification performance does not improve when the two features AAACS and APSSM are combined by the decision tree. This indicates that APSSM does not contribute to the classification performance.

3.1.2. Prediction performance on the independent test dataset T DS

The AAACS of every epitope in T DS is listed in the 6th column of Table 6. Whether or not an epitope in T DS can interact with a broadly neutralizing antibody was determined by a literature search. The annotations (Y or N) are listed in the 7th column of Table 6. A simple decision tree model is trained on the training dataset E DS and then tested on T DS to determine whether the independent test epitopes are predicted as conserved or not. The prediction results are shown in the last column of Table 6. The rule in the decision tree is that any epitope with an AAACS above 0.307 is identified as nonconserved, while all others are identified as conserved. Under this rule, one conserved epitope (extracted from 4GMS) in T DS

Table 6. Prediction results by a decision tree model on T DS.

PDB ID | Antigen | Type | Antibody | Chain | AAACS | BNA | Predict
4HG4 | A/Japan/305+/1957 | H2N2 | 2G1 | HA1 | 0.893 | N [21] | N
3SM5 | A/Solomon Islands/3/2006 | H1N1 | CH65 | HA1 | 0.654 | N [29] | N
4HKX | A/Solomon Islands/3/2006 | H1N1 | CH67 | HA1 | 0.608 | N [30] | N
4HFU | A/Japan/305+/1957 | H2N2 | 8M2 | HA1 | 0.568 | N [21] | N
2VIR | A/X-31 | H3N2 | NA | HA1 | 0.435 | N [31] | N
4M5Z | A/California/04/2009 | H1N1 | 5J8 | HA1 | 0.434 | N [26] | N
2VIT | A/X-31 | H3N2 | NA | HA1 | 0.385 | N [31] | N
2VIS | A/X-31 | H3N2 | NA | HA1 | 0.373 | N [31] | N
4GMS | A/Victoria/3/1975 | H3N2 | S139/1 | HA1 | 0.354 | Y [28] | N
4FP8 | A/Hong Kong/1/1968 | H3N2 | C05 | HA1 | 0.220 | Y [13] | Y
4KVN | A/Perth/16/2009 | H3N2 | Fab 39.29 | HA0 | −0.509 | Y [12] | Y

BNA: broadly neutralizing antibody

is wrongly classified as nonconserved. The misclassification is attributed to the decision tree algorithm, as it arbitrarily selects the boundary AAACS of the conserved epitopes as the cut-off threshold for classification. An alternative method for selecting a proper threshold is to use a probabilistic model. We fit two normal curves to the AAACS distributions of the conserved epitopes and the nonconserved epitopes in E DS. For each test epitope, we calculate the probability of it belonging to each phenotype, and the phenotype with the larger probability is assigned to it. The prediction results are shown in Table 7. As can be seen, the epitope extracted from 4GMS is still wrongly predicted. This is probably due to the limited data. Looking at all the scores shown in Tables 4 and 6, there is still a clear separation between all the conserved epitopes and the nonconserved epitopes. The determination of new epitopes may help further improve the classification performance.

3.1.3. Prediction by a 3-interval scoring approach

Figure 2 displays the AAACS of each epitope in both the training dataset E DS and the test dataset T DS. This figure suggests a novel way of prediction with three

Table 7. Prediction results by a probabilistic model on T DS.

PDB ID | AAACS | Conserved probability | Nonconserved probability | BNA | Predict
4HG4 | 0.893 | 1.27E-07 | 5.47E-01 | N | N
3SM5 | 0.654 | 2.70E-05 | 2.23E-01 | N | N
4HKX | 0.608 | 6.60E-05 | 1.76E-01 | N | N
4HFU | 0.568 | 1.38E-04 | 1.40E-01 | N | N
2VIR | 0.435 | 1.29E-03 | 5.83E-02 | N | N
4M5Z | 0.434 | 1.31E-03 | 5.79E-02 | N | N
2VIT | 0.385 | 2.71E-03 | 3.98E-02 | N | N
2VIS | 0.373 | 3.21E-03 | 3.61E-02 | N | N
4GMS | 0.354 | 4.19E-03 | 3.09E-02 | Y | N
4FP8 | 0.22 | 2.22E-02 | 9.11E-03 | Y | Y
4KVN | −0.509 | 9.19E-01 | 2.26E-07 | Y | Y

BNA: broadly neutralizing antibody
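The probabilistic alternative of Sec. 3.1.2 can be sketched as two class-conditional normal distributions whose densities are compared for each test epitope. This is only one plausible reading: the text does not state how the reported probabilities are normalised, so the sketch compares plain densities and is not expected to reproduce the numbers in Table 7. The training values below are illustrative placeholders, not the values of Table 4.

```python
from statistics import NormalDist


def fit_models(conserved_scores, nonconserved_scores):
    """Fit one normal distribution per class from training AAACS values."""
    return (NormalDist.from_samples(conserved_scores),
            NormalDist.from_samples(nonconserved_scores))


def classify(aaacs_value, conserved_model, nonconserved_model):
    """Assign the class whose fitted density is larger at the given AAACS."""
    p_cons = conserved_model.pdf(aaacs_value)
    p_non = nonconserved_model.pdf(aaacs_value)
    label = "conserved" if p_cons > p_non else "nonconserved"
    return label, p_cons, p_non


# Illustrative training values (hypothetical, chosen only to show the mechanics).
toy_conserved = [-0.4, -0.3, -0.2, 0.0, 0.3]
toy_nonconserved = [0.5, 0.7, 0.8, 1.0, 1.3]
cons_model, non_model = fit_models(toy_conserved, toy_nonconserved)

label_low, _, _ = classify(-0.25, cons_model, non_model)
label_high, _, _ = classify(0.9, cons_model, non_model)
```

Because the two fitted curves generally have different variances, the implied decision boundary is not the midpoint of the class means, which is why this model can disagree with a fixed cut-off.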

[Figure 2: AAACS (horizontal axis, from −0.6 to 1.4) of each epitope in the training and testing datasets.]

Fig. 2. Conservation score of the epitopes in E DS and T DS. The dotted lines divide the axis into three regions: "conserved region", "unknown region", and "nonconserved region" (from left to right). The left dotted line is the border of the "conserved region", and the right dotted line is the border of the "nonconserved region". The vertical solid line stands for the middle line of the above two borders: two nonconserved epitopes (2VIS and 2VIT) in the test data are wrongly predicted as conserved epitopes by the middle line.

AAACS regions. The first is a "conserved region": if the AAACS of an epitope is less than or equal to 0.307 (the maximum AAACS of the conserved epitopes in the training data E DS), then the epitope is assigned to the "conserved region", where all epitopes are believed to be conserved. The second is a "nonconserved region": if the AAACS of an epitope is greater than or equal to 0.498 (the minimum AAACS of the nonconserved epitopes in the training data E DS), then the epitope is assigned to the "nonconserved region", where all epitopes are predicted as nonconserved. The third is an "unknown region": epitopes with an AAACS between 0.307 and 0.498 are assigned to the "unknown region". Because there is no training data in this region, it cannot be predicted whether a test epitope falling in this region is conserved or not.

By this three-interval approach, the six epitopes (see Table 6) in the test dataset T DS that fall into the "conserved region" or the "nonconserved region" receive predicted labels, while the five epitopes falling into the "unknown region" remain undecided. The labeled test epitopes in the "conserved region" and the "nonconserved region" are added to enrich the training data, and new strategies can then be applied to identify conserved epitopes in the "unknown region". A simple method is to calculate the centroids of the conserved and nonconserved epitopes. The centroid of the nonconserved epitopes is 0.796, and that of the conserved epitopes is −0.202. The middle point (0.297) of the two centroids is then used as a cut-off threshold to classify the five epitopes in the "unknown region": four epitopes are correctly predicted, while only one conserved epitope (extracted from 4GMS) is wrongly predicted as nonconserved. The probabilistic model mentioned above

Table 8. Prediction results by probabilistic model for the cases in the "unknown region".

PDB ID | AAACS | Conserved probability | Nonconserved probability | BNA | Predict
2VIR | 0.435 | 4.06E-03 | 6.88E-02 | N | N
4M5Z | 0.434 | 4.11E-03 | 6.82E-02 | N | N
2VIT | 0.385 | 7.36E-03 | 4.54E-02 | N | N
2VIS | 0.373 | 8.44E-03 | 4.09E-02 | N | N
4GMS | 0.354 | 1.04E-02 | 3.45E-02 | Y | N

BNA: broadly neutralizing antibody
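The three-interval approach and the centroid cut-off of Sec. 3.1.3 can be sketched together. The borders 0.307 and 0.498 and the test AAACS values come from the text and Table 6; 4KVN is entered as −0.509 on the assumption that its score is negative, consistent with its conserved prediction, and the conserved centroid is taken as −0.202, the value that yields the reported midpoint of 0.297. Function and variable names are illustrative.

```python
CONSERVED_BORDER = 0.307     # maximum AAACS of conserved training epitopes
NONCONSERVED_BORDER = 0.498  # minimum AAACS of nonconserved training epitopes


def region(aaacs_value):
    """Assign an AAACS value to one of the three regions."""
    if aaacs_value <= CONSERVED_BORDER:
        return "conserved"
    if aaacs_value >= NONCONSERVED_BORDER:
        return "nonconserved"
    return "unknown"


# Test epitopes: PDB ID -> (AAACS, annotated as conserved?)
T_DS = {
    "4HG4": (0.893, False), "3SM5": (0.654, False), "4HKX": (0.608, False),
    "4HFU": (0.568, False), "2VIR": (0.435, False), "4M5Z": (0.434, False),
    "2VIT": (0.385, False), "2VIS": (0.373, False), "4GMS": (0.354, True),
    "4FP8": (0.220, True), "4KVN": (-0.509, True),  # sign of 4KVN assumed
}

regions = {pdb: region(score) for pdb, (score, _) in T_DS.items()}
unknown = sorted(pdb for pdb, r in regions.items() if r == "unknown")

# Midpoint of the two class centroids, used as cut-off inside the unknown region.
midpoint = (0.796 + -0.202) / 2
centroid_pred = {pdb: T_DS[pdb][0] <= midpoint for pdb in unknown}  # True = conserved
errors = [pdb for pdb in unknown if centroid_pred[pdb] != T_DS[pdb][1]]
```

Running the sketch leaves five epitopes in the "unknown region" and, with the centroid midpoint, misclassifies only 4GMS, in line with the text.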

can also be applied to this enriched dataset. The prediction result is reported in Table 8, showing that the same conserved epitope (the one extracted from 4GMS) is also recognized as nonconserved.

For the epitopes in the test dataset T DS, all epitopes falling into the "conserved region" or the "nonconserved region" are correctly predicted, confirming that AAACS is an effective criterion for distinguishing conserved epitopes from nonconserved epitopes. Furthermore, the prediction methods misclassify only one epitope in T DS. Moreover, the conserved and nonconserved regions still do not overlap when all existing structure data are considered (i.e. "conserved region": (−∞, 0.354], "nonconserved region": [0.373, +∞), and "unknown region": (0.354, 0.373)). The borders of the "unknown region" will probably be refined and narrowed further as new samples are discovered, which should improve prediction accuracy.

3.2. Alignment-based comparison method does not work for identifying conserved epitopes

As introduced, a broadly neutralizing antibody can neutralize viruses from different strains. A direct and intuitive way to identify conserved epitopes is to compare epitopes from different strains. However, the alignment-based comparison method (as described in Sec. 2.4) is unsuitable for the following reasons.

First, the conserved epitopes to the same broadly neutralizing antibody are not necessarily identical. For example, CR6261 neutralizes both A/Brevig Mission/1/1918 H1N1 (3GBN) and A/Viet Nam/1203/2004 H5N1 (3GBM), but the two epitopes differ from each other at many positions. The epitope on H1N1 and that on H5N1 consist of 18 and 19 residues, respectively. Five pairs of residues between them are mutated, and another three residues are unique to one of the two epitopes. Only 12 residues are identical (Table 9).

Second, conserved epitopes to the same broadly neutralizing antibody are not even similar. As different broadly neutralizing antibodies can neutralize different ranges of viruses, the similarity between different pairs of conserved epitopes varies greatly: if the two conserved epitopes belong to more distantly related viruses, their similarity tends to be even lower. Some epitope similarity results are shown in Table 10.

Third, some pairs of epitopes are quite similar, but they are not conserved epitopes binding to the same broadly neutralizing antibody. For example, a single

Table 9. Two conserved epitopes of CR6261.

3GBN                               3GBM
Chain  Residue no.  Residue name   Chain  Residue no.  Residue name   Compare
A      38           HIS            A      38           HIS            Identical
A      40           VAL            A      40           GLN            Mutated
A      41           ASN            A      41           ASP            Mutated
A      42           LEU            A      42           ILE            Mutated
A      291          SER            A      291          SER            Identical
A      292          LEU            A      292          MET            Mutated
-      -            -              A      293          PRO            Unique
-      -            -              A      318          THR            Unique
B      19           ASP            B      19           ASP            Identical
B      20           GLY            B      20           GLY            Identical
B      21           TRP            B      21           TRP            Identical
B      38           GLN            B      38           LYS            Mutated
B      41           THR            B      41           THR            Identical
B      42           GLN            B      42           GLN            Identical
B      45           ILE            B      45           ILE            Identical
B      46           ASP            B      46           ASP            Identical
B      49           THR            B      49           THR            Identical
B      52           VAL            B      52           VAL            Identical
B      53           ASN            B      53           ASN            Identical
B      56           ILE            -      -            -              Unique
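The residue-by-residue comparison summarized in Table 9 can be sketched in a few lines of Python. This is only an illustration: residues are paired by chain and residue number, which is a simplifying assumption made here (the alignment procedure of Sec. 2.4 is not reproduced), and the names `EPITOPE_3GBN`, `EPITOPE_3GBM`, and `compare_epitopes` are hypothetical. The residue data are transcribed from Table 9.

```python
from collections import Counter
from typing import Dict, Tuple

Position = Tuple[str, int]  # (chain, residue number)

# Epitope residues transcribed from Table 9 (CR6261 complexes).
EPITOPE_3GBN: Dict[Position, str] = {
    ("A", 38): "HIS", ("A", 40): "VAL", ("A", 41): "ASN", ("A", 42): "LEU",
    ("A", 291): "SER", ("A", 292): "LEU",
    ("B", 19): "ASP", ("B", 20): "GLY", ("B", 21): "TRP", ("B", 38): "GLN",
    ("B", 41): "THR", ("B", 42): "GLN", ("B", 45): "ILE", ("B", 46): "ASP",
    ("B", 49): "THR", ("B", 52): "VAL", ("B", 53): "ASN", ("B", 56): "ILE",
}
EPITOPE_3GBM: Dict[Position, str] = {
    ("A", 38): "HIS", ("A", 40): "GLN", ("A", 41): "ASP", ("A", 42): "ILE",
    ("A", 291): "SER", ("A", 292): "MET", ("A", 293): "PRO", ("A", 318): "THR",
    ("B", 19): "ASP", ("B", 20): "GLY", ("B", 21): "TRP", ("B", 38): "LYS",
    ("B", 41): "THR", ("B", 42): "GLN", ("B", 45): "ILE", ("B", 46): "ASP",
    ("B", 49): "THR", ("B", 52): "VAL", ("B", 53): "ASN",
}


def compare_epitopes(first: Dict[Position, str],
                     second: Dict[Position, str]) -> Dict[Position, str]:
    """Label every position as Identical, Mutated, or Unique.

    Assumption: two residues correspond when they share the same chain
    and residue number; this is a stand-in for a real alignment.
    """
    labels: Dict[Position, str] = {}
    for position in sorted(set(first) | set(second)):
        if position in first and position in second:
            same = first[position] == second[position]
            labels[position] = "Identical" if same else "Mutated"
        else:
            # Present in only one of the two epitopes.
            labels[position] = "Unique"
    return labels


labels = compare_epitopes(EPITOPE_3GBN, EPITOPE_3GBM)
counts = Counter(labels.values())
print(dict(counts))
```

With the Table 9 data, the tallies agree with the text: 12 identical positions, 5 mutated positions, and 3 unique residues.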

Table 10. Epitope similarity. Each row shows two epitopes to the same antibody.

PDB ID      Group  Antigen1  Antigen2  Antibody  Similarity (%)
3ZTN&3ZTJ   1&2    H1N1      H3N2      FI6V3     68.42
4FQI&4FQY   1&2    H5N1      H3N2      CR9114    54.17
4FQI&4FQV   1&2    H5N1      H7N7      CR9114    54.55
4FQY&4FQV   2      H3N2      H7N7      CR9114    78.95
3GBN&3GBM   1      H1N1      H5N1      CR6261    75.00

mutation can lead to immune escape: a point mutation at residue 89 in HA2 prevents the 2009 H1N1 virus from being neutralized by the sera of persons immunized with the A/Solomon Islands/3/2006 and A/Brisbane/59/2007 H1N1 seasonal influenza vaccines.36 On the other hand, in other cases substitution of epitope residues does not affect the binding activity.1 Therefore, it is impractical to identify conserved epitopes simply by comparing multiple epitopes and looking for similar ones.

3.3. Hard to distinguish conserved or nonconserved epitope residues

Conserved epitopes are believed to be closely related to virus evolution, and thus their epitope residues may differ from nonconserved epitope residues in terms of evolutionary propensities. However, our analysis presented below shows that

Table 11. Identification of conserved epitope residues in ER DS by evolutionary features.

Features                            MCC     F-measure
Conservation score                  0.409   0.730
PSSM                                0.156   0.627
Conservation score and PSSM         0.316   0.696
Conservation score and PSSM(Ser)    0.416   0.736

Note: PSSM(Ser) and Conservation score were selected by the BestFirst algorithm as implemented in Weka 3.7. PSSM(Ser) stands for the Ser dimension of the PSSM vector.
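For reference, the two measures reported in Table 11 can be computed from confusion-matrix counts as sketched below. The formulas are the standard definitions of the Matthews correlation coefficient (MCC) and of the F-measure (F1) for the positive class. Whether the F-measure in Table 11 is the per-class value or a weighted average is not stated in this section, so the per-class form is an assumption. The function names and the counts in the usage lines are invented for illustration and are not taken from the study.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Common convention: return 0 when any marginal total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denominator


def f_measure(tp: int, fp: int, fn: int) -> float:
    """F1 score of the positive class (harmonic mean of precision and recall)."""
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        return 0.0
    return 2 * tp / denominator


# Hypothetical counts, for illustration only.
print(round(mcc(tp=40, tn=30, fp=20, fn=10), 3))
print(round(f_measure(tp=40, fp=20, fn=10), 3))
```

MCC ranges from -1 to 1, with 0 corresponding to a prediction no better than chance, which is why an MCC near 0.4 is read in the text as only a moderate separation.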

there is no clear separation between conserved epitope residues and nonconserved epitope residues by machine learning methods. A 10-fold cross-validation was conducted using the decision tree algorithm J48 (Weka 3.7) on the residue dataset ER DS to classify conserved and nonconserved epitope residues. Two evolutionary propensities of the residues, conservation score and PSSM, were used as features. The classification performance is presented in Table 11. Neither a single evolutionary propensity nor the combination of the two features distinguishes the conserved epitope residues well. The best MCC, obtained by combining the evolutionary features conservation score and PSSM(Ser), is only 0.416. This indicates that the residue-based approach to identifying conserved epitopes is not accurate, possibly because not all residues in conserved epitopes are conserved. Meanwhile, the conservation score was found to be better than PSSM for characterizing conserved epitope residues. Using the conservation score alone achieves an MCC of 0.409, which is competitive with the performance achieved by the best combination of features. This observation supports the idea that the AAACS of an epitope, which aggregates conservation over the entire epitope, is likely to be an excellent feature for describing conserved epitopes.

3.4. Locations of HA epitopes

It is commonly known that HA2 is more conserved than HA1.37 If an epitope extends to HA2 or is contained in HA2, it is more likely to be a conserved epitope. If an epitope is located on HA1, it is more likely to be a nonconserved epitope. For the data collected for this study, the epitopes extending to HA2 or contained in HA2 are all conserved epitopes. However, there are three conserved epitopes (those from 4FQR, 4GMS, and 4FP8) located entirely on HA1, where all of the nonconserved epitopes are located. In fact, Fig. 3 shows that some of these HA1-located conserved epitopes (e.g. 4FQR) are very near the area of a nonconserved epitope (e.g. 1KEN). This implies that location alone is not sufficient to identify conserved epitopes. As the AAACS values of these HA1-located conserved epitopes are significantly lower than those of the other epitopes on HA1, using AAACS to


Fig. 3. The locations of epitopes on H3N2. All H3N2 epitopes are mapped onto 4FNK. HA1 is colored in green, and HA2 is colored in cyan. All conserved epitopes are shown in warm colors: 4FQR red, 4FQY magenta, 3SDY orange, and 4NM8 pink. All nonconserved epitopes are shown in cold colors: 1EO8 blue, 1KEN purple-blue, and 1QFU marine. Some epitopes overlap, such as 4FQR and 1KEN, 1EO8 and 1QFU, and 3SDY and 4NM8. The figure shows that conserved epitopes (e.g. the 4FQR epitope) and nonconserved epitopes (e.g. the 1KEN epitope) cannot be accurately distinguished by location.

identify conserved epitopes can correct the mistakes made by the location-based approach. Using AAACS to identify conserved epitopes is also more scalable, and it would be applicable to other viruses in addition to influenza A.

3.5. Conserved epitopes of A/Shanghai/02/2013 (H7N9): A prediction result

In 2013, an H7N9 outbreak in China led to 132 human infections and 37 deaths during the spring, according to the WHO report on H7N9 of 31 May 2013. A/Shanghai/02/2013 (H7N9) was one of the major H7N9 viruses that caused this outbreak. There is no quaternary structure related to A/Shanghai/02/2013 (H7N9) in PDB. As tertiary structures of this virus have been determined, one of them (4N5J) was used as input to a B-cell epitope prediction algorithm (CeePre retrained on the E DS), which predicted two epitopes for the HA of this virus.38 These two predicted epitopes are displayed in yellow and red in Fig. 4. The red epitope consists of 22 residues on chain HA1 and chain HA2, while the yellow epitope


Fig. 4. Two predicted epitopes for A/Shanghai/02/2013 (H7N9). The epitopes, colored in red and yellow, are predicted by an extended version of our unpublished predictor CeePre (ISMB2014). Two isolated epitope residues are removed. HA1 is colored in green, while HA2 is colored in cyan.

is made up of 10 residues on chain HA2. The AAACS of the red epitope is 0.227, and that of the yellow one is 0.244. Both scores fall into the "conserved region" of Fig. 2. Therefore, both are likely to be conserved epitopes. Structure alignments (by jCE) were performed between these two epitopes and the existing epitopes in the current dataset (E DS and T DS) in order to find potential antibodies. The red epitope has a high degree of similarity with the epitopes extracted from 4FQV (corresponding antibody CR9114), 4FQY (CR9114), 4FQI (CR9114), 3ZTJ (FI6V3), 3ZTN (FI6V3), 3GBN (CR6261), 3GBM (CR6261), 3FKU (F10), 4EEF (F-HB80.4), and 3R2X (HB36.3). Of these antibodies, only CR9114 and FI6V3 are reported to neutralize influenza A viruses from Group 2. We therefore infer that CR9114 and FI6V3 are the most probable antibodies for this predicted epitope. The yellow epitope is novel and quite different from any epitope currently known. Therefore, its corresponding antibody cannot be inferred by this method, and it requires validation by wet-lab experiments.

4. Conclusion

In this work, a simple and very effective classification method is proposed to distinguish between conserved epitopes and nonconserved epitopes of the influenza A surface antigen HA. The novelty of this method is the use of the AAACS of an epitope as


a criterion for the classification. This method performs better than PSSM, the alignment-based method, or the location-based method. Prediction results are also presented for the epitopes of H7N9, an influenza A virus that recently spread in China. Two epitopes are predicted on the HA of H7N9, and both are predicted to be conserved. One of the conserved epitopes matches the broadly neutralizing antibody CR9114 or FI6V3. The AAACS scoring method presented here can be easily extended to the study of other families of antigens.
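The decision rule at the core of the method can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: AAACS is taken to be the arithmetic mean of the per-residue conservation scores of an epitope, as its name suggests, and the two cut-offs are the borders of the "conserved region" and "nonconserved region" reported in Sec. 3.1. The names `aaacs` and `classify_aaacs` are hypothetical.

```python
from statistics import mean
from typing import Sequence

# Region borders quoted in Sec. 3.1 for the existing structure data.
CONSERVED_MAX = 0.354      # "conserved region": scores up to and including 0.354
NONCONSERVED_MIN = 0.373   # "nonconserved region": scores of 0.373 and above


def aaacs(residue_scores: Sequence[float]) -> float:
    """Average the conservation scores of all residues in one epitope.

    Assumption: a plain arithmetic mean over the epitope residues.
    """
    if not residue_scores:
        raise ValueError("an epitope must contain at least one residue")
    return mean(residue_scores)


def classify_aaacs(score: float) -> str:
    """Map an AAACS value to conserved, nonconserved, or unknown."""
    if score <= CONSERVED_MAX:
        return "conserved"
    if score >= NONCONSERVED_MIN:
        return "nonconserved"
    # Scores strictly between the two borders fall into the "unknown region".
    return "unknown"


# AAACS values reported in Sec. 3.5 for the two predicted H7N9 epitopes.
for name, score in [("red", 0.227), ("yellow", 0.244)]:
    print(name, classify_aaacs(score))
```

Both reported H7N9 scores lie below 0.354, so the rule labels both epitopes as conserved, in line with Sec. 3.5. Keeping an explicit "unknown" outcome mirrors the gap between the two regions and avoids forcing a label on borderline scores.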


Acknowledgments

This research work was partially supported by a UTS 2013 Early Career Research Grant, an ARC Discovery Project (DP130102124), and the China Scholarship Council. We thank Dr. Q. Liu for his suggestions.

References

1. Dreyfus C, Ekiert DC, Wilson IA, Structure of a classical broadly neutralizing stem antibody in complex with a pandemic H2 influenza virus hemagglutinin, J Virol 87(12):7149–7154, 2013.
2. Tong S, Zhu X, Li Y, Shi M, Zhang J, Bourgeois M, Yang H, Chen X, Recuenco S, Gomez J, New world bats harbor diverse influenza A viruses, PLoS Pathogens 9(10):e1003657, 2013.
3. Fleishman SJ, Whitehead TA, Ekiert DC, Dreyfus C, Corn JE, Strauch E-M, Wilson IA, Baker D, Computational design of proteins targeting the conserved stem region of influenza hemagglutinin, Science 332(6031):816–821, 2011.
4. Whitehead TA, Chevalier A, Song Y, Dreyfus C, Fleishman SJ, Mattos CD, Myers CA, Kamisetty H, Blair P, Wilson IA, Optimization of affinity, specificity and function of designed influenza inhibitors using deep sequencing, Nat Biotechnol 30(6):543–548, 2012.
5. Okuno Y, Matsumoto KI, Isegawa Y, Ueda S, Protection against the mouse-adapted A/FM/1/47 strain of influenza A virus in mice by a monoclonal antibody with cross-neutralizing activity among H1 and H2 strains, J Virol 68(1):517–520, 1994.
6. Ekiert DC, Bhabha G, Elsliger MA, Friesen RHE, Jongeneelen M, Throsby M, Goudsmit J, Wilson IA, Antibody recognition of a highly conserved influenza virus epitope, Science 324(5924):246–251, 2009.
7. Kashyap AK, Steel J, Rubrum A, Estelles A, Briante R, Ilyushina NA, Xu L, Swale RE, Faynboym AM, Foreman PK, Protection from the 2009 H1N1 pandemic influenza by an antibody from combinatorial survivor-based libraries, PLoS Pathogens 6(7):e1000990, 2010.
8. Ekiert DC, Friesen RHE, Bhabha G, Kwaks T, Jongeneelen M, Yu W, Ophorst C, Cox F, Korse HJWM, Brandenburg B, A highly conserved neutralizing epitope on group 2 influenza A viruses, Science 333(6044):843–850, 2011.
9. Friesen RHE, Lee PS, Stoop EJM, Hoffman RMB, Ekiert DC, Bhabha G, Yu W, Juraszek J, Koudstaal W, Jongeneelen M, A common solution to group 2 influenza virus neutralization, Proc Nat Acad Sci 111(1):445–450, 2014.
10. Corti D, Voss J, Gamblin SJ, Codoni G, Macagno A, Jarrossay D, Vachieri SG, Pinna D, Minola A, Vanzetta F, A neutralizing antibody selected from plasma cells that binds to group 1 and group 2 influenza A hemagglutinins, Science 333(6044):850–856, 2011.


11. Dreyfus C, Laursen NS, Kwaks T, Zuijdgeest D, Khayat R, Ekiert DC, Lee JH, Metlagel Z, Bujny MV, Jongeneelen M, Highly conserved protective epitopes on influenza B viruses, Science 337(6100):1343, 2012.
12. Nakamura G, Chai N, Park S, Chiang N, Lin Z, Chiu H, Fong R, Yan D, Kim J, Zhang J, An in vivo human-plasmablast enrichment technique allows rapid identification of therapeutic influenza A antibodies, Cell Host Microbe 14(1):93–103, 2013.
13. Ekiert DC, Kashyap AK, Steel J, Rubrum A, Bhabha G, Khayat R, Lee JH, Dillon MA, O'Neil RE, Faynboym AM, Cross-neutralization of influenza A viruses mediated by a single antibody loop, Nature 489(7417):526–532, 2012.
14. Yoshida R, Igarashi M, Ozaki H, Kishida N, Tomabechi D, Kida H, Ito K, Takada A, Cross-protective potential of a novel monoclonal antibody directed against antigenic site B of the hemagglutinin of influenza A viruses, PLoS Pathogens 5(3):e1000350, 2009.
15. Pupko T, Bell RE, Mayrose I, Glaser F, Ben-Tal N, Rate4site: An algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues, Bioinformatics 18(suppl 1):S71–S77, 2002.
16. Glaser F, Pupko T, Paz I, Bell RE, Bechor-Shental D, Martz E, Ben-Tal N, Consurf: Identification of functional regions in proteins by surface-mapping of phylogenetic information, Bioinformatics 19(1):163–164, 2003.
17. Ponomarenko J, Bui H-H, Li W, Fusseder N, Bourne PE, Sette A, Peters B, Ellipro: A new structure-based tool for the prediction of antibody epitopes, BMC Bioinformatics 9(1):514, 2008.
18. Stave JW, Lindpaintner K, Antibody and antigen contact residues define epitope and paratope size and structure, J Immunol 191(3):1428–1435, 2013.
19. Sui J, Hwang WC, Perez S, Wei G, Aird D, Chen L-M, Santelli E, Stec B, Cadwell G, Ali M, Structural and functional bases for broad-spectrum neutralization of avian and human influenza A viruses, Nat Struct Molec Biol 16(3):265–273, 2009.
20. Xu R, Ekiert DC, Krause JC, Hai R, Crowe JE, Wilson IA, Structural basis of preexisting immunity to the 2009 H1N1 pandemic influenza virus, Science 328(5976):357–360, 2010.
21. Xu R, Krause JC, McBride R, Paulson JC, Crowe JE Jr, Wilson IA, A recurring motif for antibody recognition of the receptor-binding site of influenza hemagglutinin, Nat Struct Molec Biol 20(3):363–370, 2013.
22. Barbey-Martin C, Gigant B, Bizebard T, Calder LJ, Wharton SA, Skehel JJ, Knossow M, An antibody that prevents the hemagglutinin low pH fusogenic transition, Virology 294(1):70–74, 2002.
23. Fleury D, Daniels RS, Skehel JJ, Knossow M, Bizebard T, Structural evidence for recognition of a single epitope by two distinct antibodies, PROTEINS: Struct Funct Bioinform 40(4):572–578, 2000.
24. Fleury D, Barrère B, Bizebard T, Daniels RS, Skehel JJ, Knossow M, A complex of influenza hemagglutinin with a neutralizing antibody that binds outside the virus receptor binding site, Nat Struct Molec Biol 6(6):530–534, 1999.
25. Zhu X, Guo YH, Jiang T, Wang YD, Chan KH, Li XF, Yu W, McBride R, Paulson JC, Yuen KY, Qin CF, Che XY, Wilson IA, A unique and conserved neutralization epitope in H5N1 influenza viruses identified by a murine antibody against the A/Goose/Guangdong/1/96 hemagglutinin, J Virol 87(23):12619–12635, 2013.
26. Hong M, Lee PS, Hoffman RMB, Zhu X, Krause JC, Laursen NS, Yoon S-I, Song L, Tussey L, Crowe JE, Antibody recognition of the pandemic H1N1 influenza virus hemagglutinin receptor binding site, J Virol 87(22):12471–12480, 2013.


27. Tsibane T, Ekiert DC, Krause JC, Martinez O, Crowe JE Jr, Wilson IA, Basler CF, Influenza human monoclonal antibody 1F1 interacts with three major antigenic sites and residues mediating human receptor specificity in H1N1 viruses, PLoS Pathogens 8(12):e1003067, 2012.
28. Lee PS, Yoshida R, Ekiert DC, Sakai N, Suzuki Y, Takada A, Wilson IA, Heterosubtypic antibody recognition of the influenza virus hemagglutinin receptor binding site enhanced by avidity, Proc Nat Acad Sci 109(42):17040–17045, 2012.
29. Whittle JRR, Zhang R, Khurana S, King LR, Manischewitz J, Golding H, Dormitzer PR, Haynes BF, Walter EB, Moody MA, Broadly neutralizing human antibody that recognizes the receptor-binding pocket of influenza virus hemagglutinin, Proc Nat Acad Sci 108(34):14216–14221, 2011.
30. Schmidt AG, Xu H, Khan AR, O'Donnell T, Khurana S, King LR, Manischewitz J, Golding H, Suphaphiphat P, Carfi A, Preconfiguration of the antigen-binding site during affinity maturation of a broadly neutralizing influenza virus antibody, Proc Nat Acad Sci 110(1):264–269, 2013.
31. Fleury D, Wharton SA, Skehel JJ, Knossow M, Bizebard T, Antigen distortion allows influenza virus to escape neutralization, Nat Struct Molec Biol 5(2):119–123, 1998.
32. Churchill MEA, Stura EA, Pinilla C, Appel JR, Houghten RA, Kono DH, Balderas RS, Fieser GG, Schulze-Gahmen U, Wilson IA, Crystal structure of a peptide complex of anti-influenza peptide antibody Fab 26/9: Comparison of two different antibodies bound to the same peptide antigen, J Molec Biol 241(4):534–556, 1994.
33. Landau M, Mayrose I, Rosenberg Y, Glaser F, Martz E, Pupko T, Ben-Tal N, Consurf 2005: The projection of evolutionary conservation scores of residues on protein structures, Nucleic Acids Res 33(suppl 2):W299–W302, 2005.
34. Ashkenazy H, Erez E, Martz E, Pupko T, Ben-Tal N, Consurf 2010: Calculating evolutionary conservation in sequence and structure of proteins and nucleic acids, Nucleic Acids Res 38(suppl 2):W529–W533, 2010.
35. McWilliam H, Li W, Uludag M, Squizzato S, Park YM, Buso N, Cowley AP, Lopez R, Analysis tool web services from the EMBL-EBI, Nucleic Acids Res 41(W1):W597–W600, 2013.
36. Wang W, Anderson CM, De Feo CJ, Zhuang M, Yang H, Vassell R, Xie H, Ye Z, Scott D, Weiss CD, Cross-neutralizing antibodies to pandemic 2009 H1N1 and recent seasonal H1N1 influenza A strains influenced by a mutation in hemagglutinin subunit 2, PLoS Pathogens 7(6):e1002081, 2011.
37. Bommakanti G, Citron MP, Hepler RW, Callahan C, Heidecker GJ, Najar TA, Lu X, Joyce JG, Shiver JW, Casimiro DR, Design of an HA2-based Escherichia coli expressed influenza immunogen that protects mice from pathogenic challenge, Proc Nat Acad Sci 107(31):13701–13706, 2010.
38. Ren J, Liu Q, Ellis J, Li J, Tertiary structure-based prediction of conformational B-cell epitopes through B factors, Bioinformatics 30(12):i264–i273, 2014.


Jing Ren is a Ph.D. candidate at the Advanced Analytics Institute, Faculty of Engineering and IT, University of Technology Sydney. She received her Master's degree in Engineering from the National University of Defense Technology (NUDT, China) and her Bachelor's degree in Science from Jilin University (China). Her current research is on conserved B-cell epitope prediction, and her research interests focus on structural bioinformatics, data mining, and computer architecture.

John Ellis is Professor of Molecular Biology at the University of Technology Sydney with 30 years of experience in molecular parasitology. His current research interests are mainly focused on the development of vaccines and diagnostics for parasitic diseases. Recent progress has included development of "Vacceed", which is a pipeline for the application of reverse vaccinology to eukaryotic pathogens. His experimental research work includes transcriptomics, the study of differential gene expression, and the use of animal models to evaluate vaccines.

Jinyan Li is an Associate Professor and core member at the Advanced Analytics Institute and Centre for Health Technologies, Faculty of Engineering and IT, University of Technology Sydney, Australia. His research is focused on fundamental data mining algorithms, machine learning, gene expression data analysis, structural bioinformatics, and information theory. He is known for the notion of emerging patterns in data mining and for the "double water exclusion" hypothesis in bioinformatics. Jinyan obtained his Ph.D. from the University of Melbourne, his Master's degree in Engineering from Hebei University of Technology, and his Bachelor's degree in Science from the National University of Defense Technology.
