
Genomic Resolution of Outbreak-Associated Legionella pneumophila Serogroup 1 Isolates from New York State

Brian H. Raphael (a), Deborah J. Baker (b), Elizabeth Nazarian (b), Pascal Lapierre (b), Dianna Bopp (b), Natalia A. Kozak-Muiznieks (a), Shatavia S. Morrison (a), Claressa E. Lucas (a), Jeffrey W. Mercante (a), Kimberlee A. Musser (b), Jonas M. Winchell (a)

(a) Respiratory Diseases Branch, Centers for Disease Control and Prevention, Atlanta, Georgia, USA; (b) Wadsworth Center, New York State Department of Health, Albany, New York, USA

ABSTRACT

A total of 30 Legionella pneumophila serogroup 1 isolates representing 10 separate legionellosis laboratory investigations (“outbreaks”) that occurred in New York State between 2004 and 2012 were selected for evaluation of whole-genome sequencing (WGS) approaches for molecular subtyping of this organism. Clinical and environmental isolates were available for each outbreak and were initially examined by pulsed-field gel electrophoresis (PFGE). Sequence-based typing alleles were extracted from WGS data yielding complete sequence types (ST) for isolates representing 8 out of the 10 outbreaks evaluated in this study. Isolates from separate outbreaks sharing the same ST also contained the fewest differences in core genome single nucleotide polymorphisms (SNPs) and the greatest proportion of identical allele sequences in a whole-genome multilocus sequence typing (wgMLST) scheme. Both core SNP and wgMLST analyses distinguished isolates from separate outbreaks, including those from two outbreaks sharing indistinguishable PFGE profiles. Isolates from a hospital-associated outbreak spanning multiple years shared indistinguishable PFGE profiles but displayed differences in their genome sequences, suggesting the presence of multiple environmental sources. Finally, the rtx gene demonstrated differences in the repeat region sequence among ST1 isolates from different outbreaks, suggesting that variation in this gene may be useful for targeted molecular subtyping approaches for L. pneumophila. This study demonstrates the utility of various genome sequence analysis approaches for L. pneumophila for environmental source attribution studies while furthering the understanding of Legionella ecology.

IMPORTANCE

We demonstrate that whole-genome sequencing helps to improve resolution of Legionella pneumophila isolated during laboratory investigations of legionellosis compared to traditional subtyping methods. These data can be important in confirming the environmental sources of legionellosis outbreaks. Moreover, we evaluated various methods to analyze genome sequence data to help resolve outbreak-related isolates.

Legionella pneumophila causes severe, sometimes fatal, pneumonia known as Legionnaires’ disease (LD) in susceptible individuals. LD is transmitted to humans through man-made systems that disperse aerosols of contaminated water. These systems can include those using potable water (PW), such as showerheads and spas, or nonpotable systems such as cooling towers (CTs) associated with building air conditioning units or decorative fountains (1). During LD outbreak investigations, identifying environmental sources is a high priority, and L. pneumophila isolates from likely sources may be compared with those isolated from clinical specimens (e.g., sputum) to confirm a suspected source and focus remediation efforts. Active bacterial core (ABC) surveillance for legionellosis at 10 sites within the United States revealed an incidence rate of 1.3 cases per 100,000 population (2). Similarly, surveillance in 29 European countries revealed 1.20 cases of legionellosis per 100,000 inhabitants in 2010 (3). There are 15 known serogroups of L. pneumophila, and isolates belonging to serogroup 1 are most frequently obtained from LD cases (4). In addition, monoclonal antibody (MAb) panels can be used to further subtype L. pneumophila serogroup 1 isolates (5, 6). In particular, MAb 2 from the “Joly” MAb panel (5) recognizes an epitope associated with lipopolysaccharide in L. pneumophila and is reactive with >94% of the isolates from legionellosis outbreaks investigated by CDC (7). Since serological characterization of L. pneumophila isolates


offers low discriminatory power among strains, DNA-based subtyping methods are often employed in outbreak investigations. Pulsed-field gel electrophoresis (PFGE) and sequence-based typing (SBT) are two common approaches to the molecular subtyping of L. pneumophila (8). PFGE profiles can be compared by identifying differences in banding patterns of various isolates (9), while SBT uses DNA sequence data derived from a combination of seven alleles (flaA, pilE, asd, mip, mompS, proA, and neuA) to produce a sequence type (ST) using an international database of allele sequences (8). Due to its ease of standardization, SBT is the most widely used typing method for L. pneumophila.

Received 2 February 2016
Accepted 4 April 2016
Accepted manuscript posted online 8 April 2016
Citation Raphael BH, Baker DJ, Nazarian E, Lapierre P, Bopp D, Kozak-Muiznieks NA, Morrison SS, Lucas CE, Mercante JW, Musser KA, Winchell JM. 2016. Genomic resolution of outbreak-associated Legionella pneumophila serogroup 1 isolates from New York State. Appl Environ Microbiol 82:3582–3590. doi:10.1128/AEM.00362-16.
Editor: D. W. Schaffner, Rutgers, The State University of New Jersey
Address correspondence to Jonas M. Winchell, [email protected].
Supplemental material for this article may be found at http://dx.doi.org/10.1128/AEM.00362-16.
Copyright © 2016, American Society for Microbiology. All Rights Reserved.

Applied and Environmental Microbiology

June 2016 Volume 82 Number 12

Resolving L. pneumophila Isolates Using WGS

L. pneumophila isolates from LD outbreak investigations have been examined using whole-genome sequencing (WGS) to confirm suspected environmental sources, including cooling towers and hot water supplies in hospital-associated outbreaks (10, 11). Genome sequence comparison of several patient and environmental isolates in a large community-associated LD outbreak occurring in Quebec City, Canada, helped identify a specific cooling tower implicated as the source of the outbreak (12). These results also revealed that isolates associated with a previous LD outbreak in the same city were distinct from the current outbreak isolates. Examination of isolates from different outbreaks occurring in Alcoy, Spain, over several years using WGS revealed that much of the genetic diversity observed among isolates resulted from specific recombination events (13).

During 2011 to 2013, New York State (NYS) had the highest incidence of legionellosis (4.0 cases per 100,000 population) among ABC sites examined in the United States (2). The large number of LD investigations performed by the NYS Department of Health along with an extensive PFGE database of L. pneumophila isolates provided a unique opportunity to examine the ability of WGS to differentiate L. pneumophila isolates from multiple outbreaks and to compare the resolution of WGS approaches to PFGE.

In this study, L. pneumophila serogroup 1 isolates from clinical and environmental sources representing 10 separate LD investigations were examined. WGS generally clustered isolates associated with the same investigation and differentiated those from separate investigations. In some cases, WGS analysis demonstrated improved strain differentiation compared to that of PFGE or SBT, supporting an important role for WGS in laboratory investigations of LD outbreaks.

MATERIALS AND METHODS

Bacterial isolates used in this study. The L. pneumophila isolates used in this study are shown in Table 1. Isolates were recovered from separate legionellosis investigations where at least one clinical isolate and one environmental isolate (from a presumptive source) were available for analysis. Although some of the investigations examined in this study involved only a single individual, we refer to these investigations collectively as “outbreaks” for ease of discussion. All isolates were grown on buffered charcoal-yeast extract (BCYE) medium, and identifications were confirmed utilizing an updated multiplex real-time PCR assay that detects the Legionella genus, L. pneumophila serogroups 1 to 15, and specifically L. pneumophila serogroup 1 (14, 15).

PFGE. DNA was prepared from Legionella isolates harvested from BCYE plates after 24 h and digested with 50 U of SfiI (Roche, NEB). PFGE conditions consisted of 6 V with an initial switch time of 7 s and a final switch time of 70 s. The resulting PFGE profiles were interpreted using the criteria described by Tenover et al. (9).

Genome sequencing. Genomic DNA was extracted using the Epicentre MasterPure Complete DNA and RNA purification kit as described in the manufacturer’s protocol with minor modifications. DNA was quantitated using the Qubit dsDNA BR assay system (Thermo Fisher). Sequencing libraries were prepared using the Nextera XT DNA sample preparation kit, and paired-end sequences (2 × 250 bp) were generated using the Illumina MiSeq System.

Bioinformatic analysis. Sequence reads were assembled into draft continuous sequences (contigs) using Velvet (v. 1.2.10) with a k-mer size selected using VelvetOptimiser.pl (v. 2.2.5) (16). The number of sequence reads and genome assembly statistics are provided in Table S1 in the supplemental material. Allele sequences representing the SBT scheme according to the ESCMID Study Group for Legionella Infections (ESGLI) (17, 18) were extracted in silico from draft genome sequences using BLAST. Briefly, blastn was performed using each genome sequence as query and a separate BLAST database representing known alleles (for each SBT allele) as subject (19). For comparative analysis, a core single nucleotide polymorphism (SNP) matrix was generated using kSNP (v. 3.0) with a k-mer size of 21 (20).

Whole-genome multilocus sequence typing (wgMLST) analysis was performed using BioNumerics v. 7.5 (Applied Maths, Belgium). L. pneumophila genome sequences representing various serogroups available in the NCBI database were used to develop a database of 4,472 reference alleles. Allele sequences sharing ≥97.5% sequence similarity with the reference allele were called and numbered using a BLAST-based method among the assembled genomes of isolates NY1 to NY30, resulting in an allelic profile for each isolate. Pairwise comparisons of called alleles in each sequence were used to generate a similarity matrix for unweighted pair group method using average linkages (UPGMA) analysis. Additional comparative analysis was performed with BLAST Ring Image Generator (BRIG), which compares segments of a reference genome with selected draft genomes using blastn (21).

Nucleotide sequence accession numbers. Newly determined assembled genome sequences were deposited at DDBJ/EMBL/GenBank under the accession numbers shown in Table 1.
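The pairwise allelic-profile comparison and UPGMA clustering described for the wgMLST analysis can be illustrated with a small sketch. This is not the BioNumerics implementation: the function names (`allelic_similarity`, `upgma`), the choice to ignore loci that are uncalled in either isolate, the tie-breaking order, and the toy profiles are all assumptions made for illustration.

```python
from itertools import combinations

def allelic_similarity(a, b):
    """Percent of loci called in both profiles that carry the same allele number.

    Assumption: loci missing or set to None in either profile are ignored.
    """
    shared = [l for l in a if a[l] is not None and b.get(l) is not None]
    if not shared:
        return 0.0
    same = sum(1 for l in shared if a[l] == b[l])
    return 100.0 * same / len(shared)

def upgma(profiles):
    """Cluster isolates by average linkage on (100 - similarity).

    Returns the tree as nested tuples of isolate names.
    """
    sim = {frozenset(p): allelic_similarity(profiles[p[0]], profiles[p[1]])
           for p in combinations(profiles, 2)}
    members = {name: (name,) for name in profiles}  # cluster label -> isolates
    trees = {name: name for name in profiles}       # cluster label -> subtree

    def dist(c1, c2):
        # UPGMA: mean distance over all isolate pairs spanning the two clusters
        pairs = [(m, n) for m in members[c1] for n in members[c2]]
        return sum(100.0 - sim[frozenset(p)] for p in pairs) / len(pairs)

    while len(members) > 1:
        c1, c2 = min(combinations(sorted(members), 2), key=lambda p: dist(*p))
        merged = c1 + "+" + c2
        members[merged] = members.pop(c1) + members.pop(c2)
        trees[merged] = (trees.pop(c1), trees.pop(c2))
    return next(iter(trees.values()))

# Hypothetical allelic profiles (locus -> allele number); not data from this study.
toy_profiles = {
    "iso_A1": {"locus1": 1, "locus2": 5, "locus3": 2, "locus4": 7},
    "iso_A2": {"locus1": 1, "locus2": 5, "locus3": 2, "locus4": 7},
    "iso_B1": {"locus1": 1, "locus2": 6, "locus3": 3, "locus4": None},
}
print(upgma(toy_profiles))
```

Because the linkage is recomputed from the original pairwise similarities at every merge, each isolate contributes equally to a cluster's distance, which is the "unweighted" property of UPGMA. A real scheme with thousands of loci would likely use an optimized library rather than this quadratic loop.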

RESULTS

Description of LD outbreaks. Isolates from 10 separate legionellosis investigations (“outbreaks”) occurring throughout New York State (Fig. 1) between 2004 and 2012 were selected for WGS analysis in this study. A total of 4 outbreaks were associated with long-term-care facilities (LTCF), 3 outbreaks were hospital associated, and the remaining 3 outbreaks occurred in a hotel, an apartment complex, and a private residence, respectively (Table 1). For each outbreak, isolates from clinical specimens and presumptive environmental sources were available for subtyping analysis. Isolates from cooling tower (CT) water were tested in 3 of the outbreaks, isolates from potable water (PW) sources were examined in 3 outbreaks, and isolates from both CT water and PW were examined in 2 outbreaks. In two outbreaks (outbreaks 7 and 8), isolates from samples taken directly from water heaters were tested. All 30 isolates were confirmed as L. pneumophila serogroup 1 by real-time PCR.

PFGE analysis. In all 10 outbreaks, the PFGE profiles of isolates from patients and at least one environmental source were indistinguishable (Table 1; Fig. 2). In 3 outbreaks (outbreaks 2, 3, and 10), multiple isolates from different patients had indistinguishable PFGE profiles. In 2 out of the 10 outbreaks examined (outbreaks 2 and 6), the PFGE profiles of isolates from different environmental sources within an outbreak were indistinguishable from each other. Slight changes in banding patterns were observed among environmental isolates within outbreaks 1 and 9 (Fig. 2). Interestingly, the PFGE profiles among isolates associated with two separate outbreaks occurring 4 to 6 years apart in the same city (outbreaks 6 and 7) were indistinguishable.

SBT from genome sequences. In order to further characterize these isolates, genome sequences were obtained and draft assemblies were generated. SBT allele sequences from isolates associated with 8 of the 10 outbreaks were extracted from the assembled genomes (Table 1). Sequence types for isolates associated with outbreaks 2 and 5 could not be determined due to the presence of multiple mompS paralogs. In the remaining outbreaks, each isolate associated with a particular outbreak shared the same ST. Among the isolates examined in this study, ST1 was associated with 3 outbreaks (outbreaks 1, 3, and 4) and ST36 was associated with 2 outbreaks (outbreaks 6 and 7). ST1204, ST94, and ST731 were associated with outbreaks 8, 9, and 10, respectively.
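The in silico SBT step (blastn hits against known alleles, then a lookup of the seven-allele profile) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes standard 12-column blastn tabular output (`-outfmt 6`), subject IDs of the hypothetical form `locus_number` (e.g., `mompS_14`), a caller-supplied table of allele lengths, and a caller-supplied profile-to-ST table. The rule that zero or multiple exact matches at any locus yields "ND" is an assumption modeled on the mompS paralog problem reported above.

```python
SBT_LOCI = ("flaA", "pilE", "asd", "mip", "mompS", "proA", "neuA")

def call_alleles(blast_rows, allele_lengths):
    """Collect exact, full-length allele matches from blastn tabular rows.

    Columns follow -outfmt 6: qseqid, sseqid, pident, length, ...
    Returns a dict mapping locus -> set of allele numbers found.
    """
    calls = {}
    for row in blast_rows:
        fields = row.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        sseqid, pident, length = fields[1], float(fields[2]), int(fields[3])
        # Keep only hits that are 100% identical over the whole allele
        if pident == 100.0 and length == allele_lengths.get(sseqid):
            locus, number = sseqid.rsplit("_", 1)
            calls.setdefault(locus, set()).add(int(number))
    return calls

def assign_st(calls, st_table):
    """Return the ST for a complete, unambiguous profile, else 'ND' or 'novel'."""
    profile = []
    for locus in SBT_LOCI:
        hits = calls.get(locus, set())
        if len(hits) != 1:
            return "ND"  # allele missing, or several copies with different alleles
        profile.append(next(iter(hits)))
    return st_table.get(tuple(profile), "novel")
```

A caller would pass the lines of a BLAST result file to `call_alleles` and a profile table exported from the SBT database to `assign_st`. Resolving which mompS copy is the typing target would need additional logic that this sketch does not attempt.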


Raphael et al.

TABLE 1 L. pneumophila serogroup 1 isolates examined in this study

| Outbreak | Isolate | Location (a) | Source (b) | Yr of isolation | PFGE profile (c) | ST (d) | GenBank accession no. |
|---|---|---|---|---|---|---|---|
| 1 | NY1 | LTCF K | CT | 2004 | LpnS13015 | 1 | LOLB00000000 |
| 1 | NY2 | LTCF K | Sputum | 2004 | LpnS13015 | 1 | LOLE00000000 |
| 1 | NY3 | LTCF K | PW | 2004 | LpnS13016 | 1 | LOLF00000000 |
| 1 | NY4 | LTCF K | PW | 2004 | LpnS13017 | 1 | LOLG00000000 |
| 2 | NY5 | Hospital S | CT | 2005 | LpnS13018 | ND | LOLH00000000 |
| 2 | NY6 | Hospital S | Sputum, patient 1 | 2005 | LpnS13018 | ND | LOLI00000000 |
| 2 | NY7 | Hospital S | Sputum, patient 2 | 2008 | LpnS13018 | ND | LOLJ00000000 |
| 2 | NY8 | Hospital S | CT | 2008 | LpnS13018 | ND | LOLK00000000 |
| 3 | NY9 | Hospital R | Sputum, patient 1 | 2004 | LpnS13041 | 1 | LOLL00000000 |
| 3 | NY10 | Hospital R | Lung, patient 2 | 2005 | LpnS13041 | 1 | LOLM00000000 |
| 3 | NY11 | Hospital R | Sputum, patient 3 | 2006 | LpnS13041 | 1 | LOLN00000000 |
| 3 | NY12 | Hospital R | PW | 2006 | LpnS13041 | 1 | LOLO00000000 |
| 4 | NY13 | LTCF E | Sputum | 2009 | LpnS13042 | 1 | LOLP00000000 |
| 4 | NY14 | LTCF E | PW | 2009 | LpnS13042 | 1 | LOLQ00000000 |
| 5 | NY15 | LTCF O | Sputum | 2006 | LpnS13053 | ND | LOLR00000000 |
| 5 | NY16 | LTCF O | CT | 2006 | LpnS13053 | ND | LOLS00000000 |
| 6 | NY17 | Hospital A | Hot water tank | 2006 | LpnS13066 | 36 | LOLT00000000 |
| 6 | NY18 | Hospital A | Tissue | 2007 | LpnS13066 | 36 | LOLU00000000 |
| 6 | NY19 | Hospital A | PW | 2007 | LpnS13066 | 36 | LOLV00000000 |
| 6 | NY20 | Hospital A | CT | 2007 | LpnS13066 | 36 | LOLW00000000 |
| 7 | NY21 | Hotel B | Tracheal aspirate | 2011 | LpnS13066 | 36 | LOLX00000000 |
| 7 | NY22 | Hotel B | Water heater | 2012 | LpnS13066 | 36 | LOLY00000000 |
| 8 | NY23 | Residence H | Sputum | 2007 | LpnS13072 | 1204 | LOLZ00000000 |
| 8 | NY24 | Residence H | Water heater | 2007 | LpnS13072 | 1204 | LOMA00000000 |
| 9 | NY25 | Apartment complex F | Sputum | 2008 | LpnS13096 | 94 | LOMB00000000 |
| 9 | NY26 | Apartment complex F | PW | 2008 | LpnS13096 | 94 | LOMC00000000 |
| 9 | NY27 | Apartment complex F | PW | 2008 | LpnS13097 | 94 | LOMD00000000 |
| 10 | NY28 | LTCF R | Clinical, patient 1 | 2011 | LpnS13203 | 731 | LOME00000000 |
| 10 | NY29 | LTCF R | CT | 2012 | LpnS13203 | 731 | LOMF00000000 |
| 10 | NY30 | LTCF R | Clinical, patient 2 | 2012 | LpnS13203 | 731 | LOMG00000000 |

(a) LTCF, long-term-care facility (e.g., nursing home).
(b) Multiple clinical isolates within the same outbreak are from different patients. CT, water from cooling tower; PW, potable water; clinical, isolate recovered from unknown clinical specimen.
(c) NYS-specific PFGE pattern designation.
(d) SBT alleles extracted from genome sequence data. ND, ST could not be determined because one or more alleles could not be resolved.
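As a small illustration of how Table 1 can be queried, the sketch below groups outbreaks by a shared subtyping result. It uses one representative isolate per outbreak for a subset of the rows above; the tuple layout and the function name `outbreaks_sharing` are assumptions made for this example.

```python
from collections import defaultdict

# (outbreak, isolate, PFGE profile, ST) copied from a subset of Table 1 rows
ROWS = [
    (1, "NY1", "LpnS13015", "1"),
    (3, "NY9", "LpnS13041", "1"),
    (4, "NY13", "LpnS13042", "1"),
    (6, "NY17", "LpnS13066", "36"),
    (7, "NY21", "LpnS13066", "36"),
    (8, "NY23", "LpnS13072", "1204"),
]

def outbreaks_sharing(rows, column):
    """Map each value in the given column to the outbreaks that share it.

    Only values seen in more than one outbreak are returned.
    """
    groups = defaultdict(set)
    for row in rows:
        groups[row[column]].add(row[0])
    return {key: sorted(obs) for key, obs in groups.items() if len(obs) > 1}

print(outbreaks_sharing(ROWS, 2))  # PFGE profiles found in more than one outbreak
print(outbreaks_sharing(ROWS, 3))  # STs found in more than one outbreak
```

For these rows, PFGE groups only outbreaks 6 and 7 together, while SBT additionally groups outbreaks 1, 3, and 4 under ST1, which is why finer-resolution genome comparisons are needed to separate them.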

SNP analysis. Analysis using kSNP was used to further resolve the genome sequences. In most cases, isolates associated with specific outbreaks differed by <5 core SNPs and formed outbreak-specific clades (Fig. 3; see Table S2 in the supplemental material). The greatest diversity was found among isolates associated with outbreak 3, which differed by 6 to 418 core SNPs. Within outbreak 2, isolate NY5 differed from other outbreak-associated isolates by 43 core SNPs. ST1 and ST36 isolates formed ST-specific clades, but outbreak-associated isolates could still be distinguished in most cases. For example, SNP analysis revealed that ST36 isolates associated with outbreaks 6 and 7 that also shared identical PFGE profiles differed by >350 SNPs (Fig. 3; see Table S2 in the supplemental material). While outbreak 1 isolates (occurring in LTCF K) were shown to be closely related, isolates from outbreak 3 (occurring in hospital R and collected over multiple years) were more diverse, suggesting that the clinical isolates from individuals in hospital R were associated with different sources. One clinical isolate (NY9) from outbreak 3 was obtained from an individual who resided in the county where both LTCF K and hospital R were located. The genome sequence of NY9 shared fewer core genome SNPs with other clinical and environmental isolates from LTCF K (3 or 4 SNPs) than with other isolates from hospital R (244 to 418 SNPs) (Fig. 3; see Table S2 in the supplemental material). Interestingly, LTCF K and hospital R are located in close proximity within the same city.

wgMLST analysis. The ability of a wgMLST method to distinguish isolates from separate outbreaks was compared to the resolution of outbreak isolates using kSNP. In the wgMLST approach, specific allele sequences (based on comparison to a reference database of 4,472 alleles) were called and identified by a unique number for each genome analyzed. UPGMA analysis was based on a similarity matrix of pairwise comparisons of the allelic profiles. Similar to the case for the kSNP analysis, outbreak-related isolates (9 out of 10 outbreaks) generally clustered together by wgMLST analysis, sharing >98% similar alleles (Fig. 4). Not unexpectedly, isolates associated with outbreak 3 were more divergent. The clinical and environmental isolates, NY11 and NY12, were most closely associated, while isolate NY9 clustered with isolates from outbreak 1. Clinical isolate NY10 was an outlier, suggesting that this individual may have acquired LD from a source not represented among the isolates examined in this study. Similar to the case for the kSNP analysis, ST1 and ST36 isolates clustered closely, sharing ~93% and ~95% similar alleles, respectively.

FIG 1 Locations of selected New York State legionellosis outbreaks. The locations of outbreaks examined in this study are indicated as red circles and labeled by the locations given in Table 1.

Comparison of ST1 and ST36 outbreak strains. In order to investigate the differences between isolates sharing the same ST, genome sequences of ST1 and ST36 isolates were compared using BLAST to the reference sequence for either L. pneumophila strain Paris (22) or Philadelphia (23), respectively. Among ST1 isolates, sequence variation compared to the Paris strain was observed in three regions (Fig. 5A). Separately, another three regions demonstrated sequence variation compared to the Philadelphia strain among ST36 strains (Fig. 5C). Only a single ST1 clinical isolate (NY11) demonstrated the presence of the ~132-kb plasmid (pLPP) present in L. pneumophila strain Paris (Fig. 5B). Interestingly, the hospital R environmental isolate (NY12) that was most closely related to NY11 does not appear to contain this plasmid, suggesting that loss of this element may be a frequent event. A region containing the rtx gene was absent among outbreak 1 (and NY9) isolates compared to L. pneumophila strain Paris (Fig. 5A).
Interestingly, closer analysis of the assembled sequences of these isolates demonstrated sequence similarity with regions flanking the repeat region present in the rtx gene (Fig. 5D). Contigs of these assembled sequences appeared to end in nearly identical locations relative to the rtx gene of L. pneumophila strain Paris, suggesting the presence of a different repeating sequence (which can often present challenges for accurate genome assembly). Indeed, BLAST analysis revealed that the repeating sequence in the rtx gene of L. pneumophila strain HL06041035 (GenBank accession no. FQ958211.1) shared ~98 to 99% sequence identity with regions of the genome sequences of isolates NY1 to NY4 and NY9 (data not shown).

Sequence variations located at ca. kbp 2700 compared to L. pneumophila strain Paris among two potable water isolates (NY3 and NY4) in LTCF K (outbreak 1) were observed (Fig. 5A). These data help differentiate the potable water isolates from clinical isolates NY2 and NY9 (originally grouped with outbreak 3). Similar to the case for the clinical isolates, environmental isolate NY1 also does not contain the sequence variations observed in the potable water isolates, suggesting that the cooling tower at LTCF K was likely the environmental source. Finally, two regions (located at ca. kbp 200 and 1800) demonstrating nucleotide sequence differences relative to L. pneumophila strain Philadelphia were observed among outbreak 6 isolates but not among those isolates associated with outbreak 7, providing further evidence that these two outbreaks were caused by different strains (Fig. 5C).

FIG 2 PFGE profiles of L. pneumophila isolates. PFGE profiles of DNA digested with SfiI are shown and labeled with isolate name, source, and outbreak number as shown in Table 1. Sources are identified as follows: clinical (C), cooling tower (CT), potable water (PW), and other environmental source (E).

DISCUSSION

During outbreak investigations of LD, molecular subtyping can provide supporting evidence for epidemiologically associated environmental sources of Legionella. Since more than 90% of LD cases in the United States occur sporadically (24), environmental sources of the organism are often not identified. In New York State, PFGE has been routinely used to compare clinical and environmental Legionella isolates when available. More recently, a subset of these isolates was examined by WGS. The objective of this study was to determine if WGS approaches could provide additional subtyping information compared to PFGE for L. pneumophila serogroup 1 isolates. Moreover, we examined various bioinformatic approaches for their ability to resolve strains associated with different legionellosis outbreaks.

FIG 3 Core SNP analysis. A maximum-parsimony tree is shown, including reference sequences based on a 24,424-core-SNP matrix. Insets show subtrees of ST1 and ST36 isolates. NCBI accession numbers of reference sequences are as follows: Paris, NC_006368.1, NC_006365.1; HL06041035, NC_018140.1; Alcoy, NC_014125.1; Corby, NC_009494.2; Lorraine, NC_018139.1, NC_018141.1; Lens, NC_006369.1, NC_006366.1; Thunder Bay, CP003730.1; ATCC 43290, NC_016811.1; LPE509, NC_020521.1, NC_020522.1; and Philadelphia, NC_002942.5.


CDC and others currently use the ESGLI SBT protocol for routine typing of L. pneumophila isolates. SBT is performed by amplification of a set of seven alleles and Sanger sequencing of the resulting products (17, 18). With the availability of WGS data for isolates examined in this study, we extracted these allele sequences from the genome sequence data in order to determine the ST of an isolate. An international SBT database allows for comparison of metadata (such as origin of isolation) from a strain of interest with those from other previously isolated strains sharing the same ST.


FIG 4 wgMLST analysis. A UPGMA dendrogram depicting the similarity of wgMLST allelic profiles of outbreak-associated isolates (NY1 to NY30) is shown.

For example, isolates associated with outbreak 10 in this study were identified as ST731. This ST was also associated with separate clinical isolates collected in 2009 from New York State. Isolates associated with outbreak 8 were identified as ST1204, and the only other isolate found in the SBT database sharing this ST was a single environmental isolate from Japan. Our finding extends the known geographical distribution of ST1204 isolates.

Similar to other reports utilizing WGS approaches for L. pneumophila subtyping, an ST could not be assigned to some isolates due to the presence of multiple copies of the mompS allele (25). In these cases, either sequence assemblies failed to cover the entire allele or multiple alleles were identified within the genome sequence. Notably, for the two outbreaks where a complete SBT profile could not be obtained, partial profiles (i.e., six out of seven alleles) of outbreak-associated isolates matched (data not shown). More sophisticated bioinformatic or targeted sequencing methods aimed at identifying the genomic locations of such paralogs are clearly needed.

In this study, core SNPs were identified using kSNP software. Although recombination events could affect phylogenetic analysis, we found that removal of potential recombination from a core gene alignment did not significantly alter the topology of the resulting SNP tree compared to that produced using kSNP (see Fig. S1 in the supplemental material). Moreover, inclusion of potential recombination may be useful for accurate clustering of closely related outbreak-associated isolates.

kSNP and wgMLST resolved outbreak-related strains equally well. With the kSNP approach, a new analysis generating a SNP matrix must be performed each time a new isolate is added to the data set. Since wgMLST relies on the detection of a predetermined set of alleles, new isolates can be added to the data set without the need for reanalysis. Although the wgMLST scheme described in this study requires use of specific software and analysis methods, a centralized wgMLST allele database (such as that used in this study) can serve as a way to ensure reproducible subtyping results among multiple laboratories. A similar core genome-based MLST scheme, termed cgMLST, has been recently described for typing L. pneumophila strains using a 1,521-allele database based on core genes present in 17 L. pneumophila genomes (25). This method was used to resolve 12 outbreak-related strains implicated in separate LD outbreaks (25). Since wgMLST uses a larger allele database that is not restricted to core genes, wgMLST is likely to provide higher strain resolution than cgMLST approaches; however, wgMLST allelic profiles may be too diverse to generate specific sequence types useful for a standardized nomenclature. Nonetheless, differences in the level of wgMLST allele identity demonstrated utility in distinguishing outbreak-related isolates, including those sharing very similar genome sequences, such as those from ST1 and ST36 strains.

FIG 5 Genetic analysis of ST1 and ST36 isolates. (A and B) Draft genome sequences of ST1 isolates were compared to the L. pneumophila strain Paris chromosome (A) or the Paris strain plasmid pLPP (B) using BRIG. (C) Genome sequences of ST36 isolates were compared to the L. pneumophila strain Philadelphia chromosome. Regions of similarity (>50% nucleotide similarity) between the draft genome sequences and the indicated reference sequences are shaded using colors corresponding to separate outbreaks: outbreak 1, red; outbreak 3, blue; outbreak 4, green; outbreak 6, purple; and outbreak 7, light blue. For each outbreak, isolates are represented in the order shown in Table 1, starting with the innermost circle. Regions of interest lacking sequence similarity in panels A and C among some or all of the draft sequences compared are designated with black boxes. (D) Alignment of the rtx gene in L. pneumophila strain Paris with draft sequences of isolates NY1, NY2, NY3, NY4, and NY9. The rtx repeat region is illustrated with vertical black lines, and regions of similarity between the Paris rtx gene and draft genome sequences are shaded in light red.

Among the 10 selected legionellosis outbreaks examined in this study, genomic analysis revealed insights not appreciated with other techniques. Isolates associated with 2 outbreaks (outbreaks 6 and 7) occurring in the same city but at different locations and times were indistinguishable by PFGE but were shown by WGS methods to be associated with two closely related but distinct strains. In addition, WGS analysis revealed that individuals with legionellosis from outbreak 3 occurring in hospital R were likely to have acquired infection from exposure to different sources, including one patient who resided in the county where LTCF K was located and whose clinical isolate was closely related to other isolates from LTCF K. Further comparative analysis of the genome sequences of isolates from LTCF K demonstrated that an isolate from the cooling tower rather than potable water isolates was more closely associated with clinical isolates from outbreak 1 and isolate NY9 (initially considered to be associated with outbreak 3 in hospital R), suggesting that the cooling tower from LTCF K was the source of infection. Interestingly, the PFGE profiles of clinical isolate NY2 and cooling tower isolate NY1 were indistinguishable but shared banding patterns slightly different from those of the potable water isolates NY3 and NY4.
These data suggest that high-resolution analysis of isolates collected during legionellosis outbreaks may be required during particularly complex investigations where multiple highly related environmental isolates are obtained.

The rtx gene encodes a large protein with several repeating units and has been demonstrated to play a role in the pathogenicity of L. pneumophila (26). In L. pneumophila, this gene undergoes recombination, resulting in differences in the sequence and copy number of the repeat region among different strains (22, 26, 27, 28). Comparison of ST1 outbreak-related isolates NY1 to NY4 and NY9 with L. pneumophila strain Paris revealed a lack of sequence similarity within the repeat region of rtx. Further analysis showed that these isolates shared sequence similarity with the rtx repeat region of L. pneumophila strain HL06041035. Interestingly, L. pneumophila strains HL06041035 and Philadelphia share the same rtx repeat unit sequence but differ in the number of repeats (27). The draft genome sequences generated in this study do not span the repeat regions of rtx; hence, it is not possible to determine the repeat copy numbers of isolates NY1 to NY4 and NY9. Nonetheless, the strain variability with respect to the rtx gene in L. pneumophila suggests that targeted sequencing of this gene may provide useful subtyping information.

This study demonstrates that high-resolution genome sequence analysis of L. pneumophila not only may support epidemiological associations among clinical and environmental isolates collected during outbreak investigations but also may reveal relationships among strains not previously appreciated. Accurate confirmation of environmental sources associated with legionellosis outbreaks not only is important for focusing remediation efforts but also may provide lessons for prevention of future outbreaks. Finally, identification of strain-variable genomic regions of similar L. pneumophila isolates may be useful in targeted resequencing methods and development of metagenomic analysis methods that could substantially reduce the length of time needed to generate subtyping results in an outbreak setting.

ACKNOWLEDGMENTS

We thank Heather Carleton and Brenda Brown (Enteric Diseases Laboratory Branch, CDC) for technical assistance with BioNumerics v. 7.5 software and Hannes Pouseele (Applied Maths) for development of the wgMLST database. We thank the Wadsworth Center Applied Genomics Technologies Core and the Wadsworth Center Bacteriology Laboratory staff for contributing to this work.

This study was supported, in part, by funds made available through the Office of Advanced Molecular Detection (CDC). The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.

REFERENCES

1. Mercante JW, Winchell JM. 2015. Current and emerging Legionella diagnostics for laboratory and outbreak investigations. Clin Microbiol Rev 28:95–133. http://dx.doi.org/10.1128/CMR.00029-14.
2. CDC. 2015. Active bacterial core surveillance for legionellosis—United States, 2011-2013. Morb Mortal Wkly Rep 64:1190–1193. http://dx.doi.org/10.15585/mmwr.mm6442a2.
3. Beauté J, Zucs P, de Jong B, on behalf of the European Legionnaires’ Disease Surveillance Network. 2013. Legionnaires’ disease in Europe, 2009-2010. Euro Surveill 18(10):pii=20417. http://www.eurosurveillance.org/ViewArticle.aspx?ArticleId=20417.
4. Gomez-Valero L, Rusniok C, Buchrieser C. 2009. Legionella pneumophila: population genetics, phylogeny and genomics. Infect Genet Evol 9:727–739. http://dx.doi.org/10.1016/j.meegid.2009.05.004.
5. Joly JR, McKinney RM, Tobin JO, Bibb WF, Watkins ID, Ramsay D. 1986. Development of a standardized subgrouping scheme for Legionella pneumophila serogroup 1 using monoclonal antibodies. J Clin Microbiol 23:768–771.
6. Helbig JH, Kurtz JB, Pastoris MC, Pelaz C, Lück PC. 1997. Antigenic lipopolysaccharide components of Legionella pneumophila recognized by monoclonal antibodies: possibilities and limitations for division of the species into serogroups. J Clin Microbiol 35:2841–2845.
7. Kozak NA, Benson RF, Brown E, Alexander NT, Taylor TH, Jr, Shelton BG, Fields BS. 2009. Distribution of lag-1 alleles and sequence-based types among Legionella pneumophila serogroup 1 clinical and environmental isolates in the United States. J Clin Microbiol 47:2525–2535. http://dx.doi.org/10.1128/JCM.02410-08.
8. Lück C, Fry NK, Helbig JH, Jarraud S, Harrison TG. 2013. Typing methods for Legionella. Methods Mol Biol 954:119–148. http://dx.doi.org/10.1007/978-1-62703-161-5_6.
9. Tenover FC, Arbeit RD, Goering RV, Mickelsen PA, Murray BE, Persing DH, Swaminathan B. 1995. Interpreting chromosomal DNA restriction patterns produced by pulsed-field gel electrophoresis: criteria for bacterial strain typing. J Clin Microbiol 33:2233–2239.
10. Reuter S, Harrison TG, Köser CU, Ellington MJ, Smith GP, Parkhill J, Peacock SJ, Bentley SD, Török ME. 2013. A pilot study of rapid whole-genome sequencing for the investigation of a Legionella outbreak. BMJ Open 3(1):e002175. http://dx.doi.org/10.1136/bmjopen-2012-002175.
11. Graham RM, Doyle CJ, Jennison AV. 2014. Real-time investigation of a Legionella pneumophila outbreak using whole genome sequencing. Epidemiol Infect 42:2347–2351.
12. Lévesque S, Plante PL, Mendis N, Cantin P, Marchand G, Charest H, Raymond F, Huot C, Goupil-Sormany I, Desbiens F, Faucher SP, Corbeil J, Tremblay C. 2014. Genomic characterization of a large outbreak of Legionella pneumophila serogroup 1 strains in Quebec City, 2012. PLoS One 9:e103852. http://dx.doi.org/10.1371/journal.pone.0103852.
13. Sánchez-Busó L, Comas I, Jorques G, González-Candelas F. 2014. Recombination drives genome evolution in outbreak-related Legionella pneumophila isolates. Nat Genet 46:1205–1211. http://dx.doi.org/10.1038/ng.3114.
14. Nazarian EJ, Bopp DJ, Saylors A, Limberger RJ, Musser KA. 2008. Design and implementation of a protocol for the detection of Legionella in clinical and environmental samples. Diagn Microbiol Infect Dis 62:125–132. http://dx.doi.org/10.1016/j.diagmicrobio.2008.05.004.
15. Merault N, Rusniok C, Jarraud S, Gomez-Valero L, Cazalet C, Marin M, Brachet E, Aegerter P, Gaillard L, Etienne J, Hermann L, DELPHI-I Study Group, Lawrence C, Buchrieser C. 2011. Specific real-time PCR for simultaneous detection and identification of Legionella pneumophila serogroup 1 in water and clinical samples. Appl Environ Microbiol 77:1708–1717. http://dx.doi.org/10.1128/AEM.02261-10.
16. Zerbino DR, Birney E. 2008. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18:821–829. http://dx.doi.org/10.1101/gr.074492.107.
17. Gaia V, Fry NK, Afshar B, Lück PC, Meugnier H, Etienne J, Peduzzi R, Harrison TG. 2005. Consensus sequence-based scheme for epidemiological typing of clinical and environmental isolates of Legionella pneumophila. J Clin Microbiol 43:2047–2052. http://dx.doi.org/10.1128/JCM.43.5.2047-2052.2005.
18. Ratzow S, Gaia V, Helbig JH, Fry NK, Lück PC. 2007. Addition of neuA, the gene encoding N-acylneuraminate cytidylyl transferase, increases the discriminatory ability of the consensus sequence-based scheme for typing Legionella pneumophila serogroup 1 strains. J Clin Microbiol 45:1965–1968. http://dx.doi.org/10.1128/JCM.00261-07.
19. NCBI. 2008. BLAST command line applications user manual, updated 6 October 2014. National Center for Biotechnology Information, Bethesda, MD. http://www.ncbi.nlm.nih.gov/books/NBK279696/.
20. Gardner SN, Slezak T, Hall BG. 2015. kSNP3.0: SNP detection and phylogenetic analysis of genomes without genome alignment or reference genome. Bioinformatics 31:2877–2878. http://dx.doi.org/10.1093/bioinformatics/btv271.
21. Alikhan NF, Petty NK, Ben Zakour NL, Beatson SA. 2011. BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons. BMC Genomics 12:402. http://dx.doi.org/10.1186/1471-2164-12-402.
22. Cazalet C, Rusniok C, Brüggemann H, Zidane N, Magnier A, Ma L, Tichit M, Jarraud S, Bouchier C, Vandenesch F, Kunst F, Etienne J, Glaser P, Buchrieser C. 2004. Evidence in the Legionella pneumophila genome for exploitation of host cell functions and high genome plasticity. Nat Genet 36:1165–1173. http://dx.doi.org/10.1038/ng1447.
23. Chien M, Morozova I, Shi S, Sheng H, Chen J, Gomez SM, Asamani G, Hill K, Nuara J, Feder M, Rineer J, Greenberg JJ, Steshenko V, Park SH, Zhao B, Teplitskaya E, Edwards JR, Pampou S, Georghiou A, Chou IC, Iannuccilli W, Ulz ME, Kim DH, Geringer-Sameth A, Goldsberry C, Morozov P, Fischer SG, Segal G, Qu X, Rzhetsky A, Zhang P, Cayanis E, De Jong PJ, Ju J, Kalachikov S, Shuman HA, Russo JJ. 2004. The genomic sequence of the accidental pathogen Legionella pneumophila. Science 305:1966–1968. http://dx.doi.org/10.1126/science.1099776.
24. CDC. 2011. Legionellosis—United States, 2000-2009. Morb Mortal Wkly Rep 60:1083–1086.
25. Moran-Gilad J, Prior K, Yakunin E, Harrison TG, Underwood A, Lazarovitch T, Valinsky L, Luck C, Krux F, Agmon V, Grotto I, Harmsen D. 2015. Design and application of a core genome multilocus sequence typing scheme for investigation of Legionnaires’ disease incidents. Euro Surveill 20(28):pii=21186. http://www.eurosurveillance.org/ViewArticle.aspx?ArticleId=21186.
26. D’Auria G, Jiménez N, Peris-Bondia F, Pelaz C, Latorre A, Moya A. 2008. Virulence factor rtx in Legionella pneumophila, evidence suggesting it is a modular multifunctional protein. BMC Genomics 9:14. http://dx.doi.org/10.1186/1471-2164-9-14.
27. Gomez-Valero L, Rusniok C, Jarraud S, Vacherie B, Rouy Z, Barbe V, Medigue C, Etienne J, Buchrieser C. 2011. Extensive recombination events and horizontal gene transfer shaped the Legionella pneumophila genomes. BMC Genomics 12:536. http://dx.doi.org/10.1186/1471-2164-12-536.
28. Cazalet C, Jarraud S, Ghavi-Helm Y, Kunst F, Glaser P, Etienne J, Buchrieser C. 2008. Multigenome analysis identifies a worldwide distributed epidemic Legionella pneumophila clone that emerged within a highly diverse species. Genome Res 18:431–441. http://dx.doi.org/10.1101/gr.7229808.

Applied and Environmental Microbiology

June 2016 Volume 82 Number 12
