Original Article

Quality Assessment Program for EuroFlow Protocols: Summary Results of Four-Year (2010–2013) Quality Assurance Rounds

Tomas Kalina,1* Juan Flores-Montero,2 Quentin Lecrevisse,2 Carlos E. Pedreira,3 Vincent H. J. van der Velden,4 Michaela Novakova,1 Ester Mejstrikova,1 Ondrej Hrusak,1 Sebastian Böttcher,5 Dennis Karsch,5 Łukasz Sędek,6 Amelie Trinquand,7 Nancy Boeckx,8 Joana Caetano,9 Vahid Asnafi,7 Paulo Lucio,9 Margarida Lima,10 Ana Helena Santos,10 Paola Bonaccorso,11 Alita J. van der Sluijs-Gelling,12 Anton W. Langerak,4 Marta Martin-Ayuso,13 Tomasz Szczepański,6 Jacques J. M. van Dongen,4† Alberto Orfao2†

1

CLIP - Childhood Leukemia Investigation Prague and Department of Pediatric Hematology and Oncology, 2nd Faculty of Medicine, Charles University Prague and University Hospital Motol, Prague, Czech Republic

2

Cancer Research Center (IBMCC-CSIC/ USAL-IBSAL), Servicio General de Citometria (NUCLEUS) and Department of Medicine, University of Salamanca, Salamanca, Spain

3

Systems & Computing Department (PESC), COPPE - Engineering Graduate Program, Federal University of Rio de Janeiro, Rio de Janeiro (UFRJ), Brazil

4

Department of Immunology, Erasmus MC University Medical Center, Rotterdam, Netherlands

5

Second Department of Medicine, University Hospital of Schleswig-Holstein, Campus Kiel, Kiel, Germany

6

Department of Pediatric Hematology and Oncology, Zabrze Medical University of Silesia, Katowice, Poland

7

Université Paris Descartes Sorbonne Cité, Institut Necker-Enfants Malades (INEM), Institut national de recherche médicale (INSERM) U1151, and Laboratory of Onco-Hematology, Assistance Publique-Hôpitaux de Paris (AP-HP), Hôpital Necker Enfants-Malades, Paris, France

Abstract Flow cytometric immunophenotyping has become essential for accurate diagnosis, classification, and disease monitoring in hemato-oncology. The EuroFlow Consortium has established a fully standardized “all-in-one” pipeline consisting of standardized instrument settings, reagent panels, sample preparation protocols, and software for data analysis and disease classification. For its reproducible implementation, parallel development of a quality assurance (QA) program was required. Here, we report on the results of four consecutive annual rounds of the novel external QA EuroFlow program. The novel QA scheme aimed at monitoring the whole flow cytometric analysis process (cytometer setting, sample preparation, acquisition, and analysis) by reading the median fluorescence intensities (MedFI) of defined lymphocyte subsets. Each QA participant applied the predefined reagent panel to blood cells of local healthy donors. A uniform gating strategy was applied to define lymphocyte subsets and to read MedFI values per marker. The MedFI values were compared with reference data, and deviations from reference values were quantified using performance score metrics. In four annual QA rounds, we analyzed 123 blood samples from local healthy donors on 14 different instruments in 11 laboratories from nine European countries. The immunophenotype of defined cellular subsets appeared sufficiently standardized to permit unified (software) data analysis. The coefficient of variation of MedFI was repeatedly below 30% for 7 of 11 markers, and the average MedFI in each QA round ranged from 86 to 125% of the overall median. Calculation of performance scores was instrumental in pinpointing standardization failures and their causes. Overall, the new EuroFlow QA system allowed, for the first time, quantification of the technical variation that is introduced in the measurement of fluorescence intensities in a multicentric setting over an extended period of time. EuroFlow QA is a proficiency test specific for laboratories that use standardized EuroFlow protocols. It may be used to complement, but not replace, established proficiency tests. © 2014 International Society for Advancement of Cytometry

Key terms: leukemia; lymphoma; flow cytometry; quality assessment; immunophenotyping; EuroFlow

INTRODUCTION

FLOW cytometric immunophenotyping is a powerful tool for the diagnosis and classification of hematologic malignancies. Recent developments in instrumentation and fluorochromes allow for routine usage of 8- to 10-color antibody panels. However,

Cytometry Part A 87A: 145–156, 2015

8

Department of Laboratory Medicine, University Hospitals Leuven and Department of Oncology, Catholic University Leuven, Leuven, Belgium

9

Hemato-Oncology Laboratory, Department of Hematology, Portuguese Institute of Oncology (IPOLFG), Lisbon, Portugal

10

Laboratory of Flow Cytometry, Department of Hematology, Hospital de Santo Antonio, Centro Hospitalar do Porto, Porto, Portugal

11

M.Tettamanti Research Center, Pediatric Clinic University of Milan Bicocca, Monza, Italy

12

DCOG - Dutch Childhood Oncology Group, Leyweg 299, The Hague, Netherlands

13

R & D Department, Cytognos SL, Salamanca, Spain.

Grant sponsor: ERA-NET PRIOMEDCHILD, Grant numbers: 40–4180098-027 (to Ł.S.; T.S.; V.H.J.vd.V.; J.J.Mv.D.) Grant sponsor: Red de Cáncer (ISCIII-RTICC RD06/0020/0035 and RD12/036/048-FEDER, Instituto de Salud Carlos III, Ministerio de Economía y Competitividad, Madrid, Spain) and Spanish Network of Cancer Research Centers (ISCIII RTICC-RD06/0020/0035-FEDER and RD12/0036/0048-FEDER), FIS 08/90881 from the ‘Fondo de Investigación Sanitaria’, Ministerio de Ciencia e Innovación (Madrid, Spain), Grant numbers: GR37 EDU/894/2009; SA016-A-09 Grant sponsor: Consejería de Educación, Junta de Castilla y León (Valladolid, Spain), Grant numbers: PIB2010BZ-00565; Dirección General de Cooperación Internacional y Relaciones Institucionales, Secretaría de Estado de Investigación, Ministerio de Ciencia e Innovación (Madrid, Spain) (to J.F.M., Q.L., A.O.)

Received 16 May 2014; Revised 28 August 2014; Accepted 6 October 2014

Additional Supporting Information may be found in the online version of this article.

Grant sponsor: European Commission, Grant number: LSB-CT-2006-018708 (to EuroFlow Consortium)

*Correspondence to: Tomas Kalina; Department of Pediatric Hematology and Oncology, 2nd Faculty of Medicine, Charles University Prague, V Uvalu 84 150 06 Praha 5, Czech Republic. E-mail: [email protected]

Grant sponsor: International Society of Advancement of Cytometry (“ISAC Scholar”; to T.K.) Grant sponsor: Ministry of Health of the Czech Republic, Grant numbers: NT/12425 and NT/14534 (to T.K.; O.H.; E.M.) Grant sponsor: GAUK 802214 (to M.N.) Grant sponsor: CNPq-Brazilian National Research Council (Brasília, Brazil), Grant number: Produtividade em Pesquisa: 305081/2011-0 and Universal: 472499/2012 (to C.E.P.) Grant sponsor: FAPERJ-Fundação de Amparo à Pesquisa do Rio de Janeiro (Rio de Janeiro, Brazil), Grant number: (Cientista do Nosso Estado: E26/102.946/2011-CNE) (to C.E.P.)

optimal multicolor antibody panels provide a steadily growing wealth of information about single cells, resulting in highly complex data sets. This has in turn increased the complexity of antibody panel design, optimization and validation, and data analysis (1). Of note, the amount of information generated with the more complex multicolor panels far exceeds the capacity of individual flow cytometry experts for fast and detailed understanding of the acquired data. Such extensive and complex data sets have therefore demanded the development of new data analysis algorithms and strategies. Such novel approaches for data analysis must be applicable in multicenter settings, which require high levels of intra- and interlaboratory standardization to control for variability associated with, e.g., the instrument, sample preparation, data analysis, and data interpretation in each laboratory (2). In addition, collaborative multicenter efforts such as the one undertaken by the EuroFlow Consortium (2,3) also allow fast evaluation of the performance of antibody panels in a large number of prospectively acquired samples, but again, high levels of standardization across the centers are mandatory.

The EuroFlow Consortium has addressed the standardization problem by developing procedures for instrument setup that include PMT setting to target fluorescence intensity of hard-dyed beads (Rainbow particles) and usage of a predefined set of compensation tubes (2). A complete set of eight-color antibody panels and sample preparation procedures was optimized, including reagent characteristics (fluorochrome tag, clone, titer,

† These two authors have equally contributed to this work, as well as to the supervision and coordination of the EuroFlow program, including the QA Workpackage.

Published online 23 October 2014 in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/cyto.a.22581 © 2014 International Society for Advancement of Cytometry

manufacturer) (4). The complete process is linked to software tools allowing direct comparison of expression patterns for the diagnosis and classification of hematological malignancies. These software tools enable large-scale data analysis and direct comparison of individual cases against reference data sets composed of specific disease categories (5–8). Direct antigen expression comparisons in data files collected in parallel in multiple centers created a need for a quality assurance scheme to ensure that the standardization level is maintained in all centers. To address this need, EuroFlow developed an innovative and easy-to-implement external QA program. Thus far, external QA procedures in diagnostic flow cytometry have mainly focused on two endpoints: (1) the quantification of subsets of cells, e.g., CD4+ T-cells (9,10) and CD34+ precursor cells (11–13), and (2) the overall interpretation of flow cytometry data, e.g., as used for the diagnosis and classification of leukemia (14,15). In contrast, the aim of the EuroFlow QA program is to evaluate the application of the technical procedures as defined by the standardized EuroFlow protocols via monitoring of the median fluorescence intensity (MedFI) of antigens expressed at relatively stable levels on specific subsets of peripheral blood (PB) lymphocytes. The MedFI values were used as readout, because they are typically indirectly used in current practice for semiquantitative interpretation of leukemia/lymphoma immunophenotypic patterns. At the same time,

Quality Assessment Scheme for EuroFlow Protocols

MedFI values reflect the end product of standardized preanalytical processes in individual laboratories. Previous studies have shown relatively stable expression of multiple surface molecules on PB lymphocytes with low variation among different healthy individuals. The CD45 antigen, for example, was reported to possess an antibody binding capacity of 201,000 with minimal variation (CV 2.5%) (16), and the CD4 molecule has an antibody binding capacity of 48,400 with a CV of 3.6% (17). In another study, the CVs for CD3, CD4, CD5, CD8, CD19, CD20, and CD45 expression on normal lymphocytes were between 7 and 33% within a single laboratory and using the same instrument (18).

We used the MedFI of specific molecules expressed on defined PB lymphocyte subsets as endpoints in the EuroFlow external QA, because the fluorescence intensity for select antigens is the key criterion for the diagnostic algorithms proposed and used by EuroFlow. Combined assessment of the performance and variability associated with instrument settings [such as photomultiplier tube (PMT) voltages and fluorescence compensation values] and sample preparation protocols (such as immunostaining procedures with predefined reagent concentrations and incubation times) is achieved through analysis of locally drawn PB samples from random healthy donors, thereby overcoming the need for centralized preparation and shipment of stabilized PB samples.

Here, we report the results of four consecutive annual rounds of the external QA EuroFlow program performed in up to 11 different laboratories from nine European countries. The QA program can be used as a benchmark for standardization of fluorescence intensity measurements.
It also allows for the first time an estimation of the contribution of technical variation to total variation of fluorescence intensity measurements, thus guiding interpretation of biological differences based on fluorescence intensity readouts, as used for leukemia/lymphoma immunophenotyping.

METHODS

QA Rounds

QA rounds started in 2010 and have been repeated once per year for a total of four rounds: 2010, 2011, 2012, and 2013. Between 9 and 11 laboratories from nine different European countries participated in each round. No reagents or samples were centrally sent to other laboratories, unless specified. The QA procedures were designed to mimic as closely as possible routine sample preparation and data acquisition, as used for local diagnostic samples. For each QA round, PB samples of three different healthy donors were drawn locally at each laboratory. Healthy donors provided consent according to the Declaration of Helsinki. Rainbow eight-peak beads (Spherotech, Lake Forest, IL) were acquired alongside the QA tubes to provide evidence of each instrument’s performance, as stated in the EuroFlow recommendations for the monitoring of instruments (2). An illustrative example of a report table for a sample included in a QA round is shown in Supporting Information Figure 1.

Flow Cytometry

Flow cytometers equipped with three lasers and eight PMT detectors (FACS Canto II or LSR II, BD Biosciences, San

Jose, CA) were set according to the EuroFlow instrument setup Standard Operating Protocol (SOP) (2). The optics of the LSR II differed from those of the FACS Canto II in the collection of Pacific Orange (PacO): band pass filter (BP) 537/26 vs. 510/50; PE: BP 576/26 vs. 585/42; and PerCP–Cyanin5.5 (PerCPCy5.5): BP 695/40 vs. long pass filter 670LP. In brief, the voltage of each PMT was set so that the seventh peak (descending order by intensities) of Rainbow eight-peak beads (Spherotech; see www.euroflow.org) reached a uniform target value. Compensation was calculated using Diva 6 software (BD Biosciences) according to a defined set of single stained compensation tubes, as detailed elsewhere (2).

Composition of the EuroFlow QA Tube

The Lymphocytosis Screening Tube-QA (LST-QA) antibody mixture was designed to parallel the LST tube (4) with a few modifications: TCRγδ PE-Cy7 was omitted and CD38 APC-H7 was replaced by CD81 APC-H7 to achieve relatively homogeneous levels of expression for at least one marker in each channel. By default, the following reagents were used in each laboratory: CD4 (clone RPA-T4) Pacific Blue (PacB) and CD20 (clone 2H7) PacB from BioLegend (San Diego, CA); CD5 (clone L17F12) PerCPCy5.5, CD3 (clone SK7) APC, and CD81 (clone JS-81) APC-Hilite7 (APC-H7) from BD Biosciences; CD8 (clone UCHT-4) FITC, Ig Lambda (IgL) polyclonal FITC, CD56 (clone C5.9) PE, and Ig Kappa (IgK) polyclonal PE antibody mixture purchased from Cytognos SL (Salamanca, Spain); CD19 (clone J3.119) PE-Cyanin7 (PECy7) from Immunotech (Marseille, France); and CD45 (clone HI30) PacO from Invitrogen (Carlsbad, CA).
Although the pattern of expression of IgL, IgK, and CD56 on normal PB B lymphocytes is more heterogeneous and variable than that of the other markers, these three LST markers were retained in the LST-QA combination because the staining profile obtained for them provides valuable information about specific steps in the sample preparation protocol (e.g., the washing steps) and the sensitivity for relatively low marker expression levels. For the 2013 QA round, a lyophilized eight-color premixed antibody combination prepared by Cytognos SL (CYT-LST-QA) was also used in parallel to the above in-house antibody combination (Supporting Information Fig. 2). This lyophilized tube consists of carefully selected antibodies, which have been tested previously by the EuroFlow Consortium for equivalency to the in-house EuroFlow QA tube. Specimen staining and erythrocyte lysis (FACS Lysing solution, BD Biosciences) were performed according to the EuroFlow sample preparation SOP (reference (2) and www.euroflow.org). The samples were measured within 1 hour after immunostaining, using the local flow cytometers.

Data Analysis

All FCS data files were analyzed locally using the Infinicyt software (Cytognos SL) and a predefined gating strategy, as incorporated in the Infinicyt QA profile, together with reference screenshots centrally provided to each laboratory. In brief, PB lymphocytes were gated using the forward scatter, side scatter (FSC/SSC), and CD45pos parameters; within lymphocytes, B-cells were gated as CD19pos and CD20pos events,


Figure 1. Illustrative example of the gating strategy used with the LST-QA tube. Lymphocytes were gated based on their forward scatter/side scatter (FSC/SSC) and CD45 properties (grey dots; panels A.1 & A.2), followed by gating of B-cells on the CD19 vs. CD3 and CD19 vs. CD20 dot plots (red dots in panels B.1 & B.2). Subsequently, single B-cell subpopulations expressing surface membrane immunoglobulins were gated on the IgK vs. IgL dot plot (IgLpos B-cells correspond to orange dots and IgKpos B-cells are depicted as red dots; panel B.3). In turn, NK-cells were gated as those events included in the lymphocyte gate which were CD19- and CD3- (green dots; panel C.1); CD56bright NK-cells were further discriminated from other NK-cells upon gating on the CD45 vs. CD56 dot plot (panel C.2). Finally, T-cells were gated on the CD19 vs. CD3 dot plot (dark blue dots in panel D.1) and their major subsets (CD4pos and CD8pos T-cells) were further gated on the CD4 vs. CD8 dot plot (CD4pos T-cells are depicted as light blue events and CD8pos T-cells as dark blue events; panel D.2). The expression patterns of CD81 on B-cells and of CD5 on T-cells are shown in panels B.4 and D.3, respectively. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]
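The gating hierarchy described in this caption can be thought of as a sequence of Boolean masks followed by a median readout per subset. The following Python sketch is only an illustration under strong assumptions: events are taken to be already restricted to the FSC/SSC and CD45 lymphocyte gate, and the function names, channel keys, and fixed cut-off values are hypothetical. The actual QA profile relies on gates drawn on dot plots in Infinicyt, not on fixed thresholds.

```python
import numpy as np

# Hypothetical positivity cut-offs (arbitrary units); real gates are drawn
# manually on dot plots, so fixed thresholds are a simplification.
POS = 1000.0
CD56_BRIGHT = 10000.0


def gate_subsets(ev):
    """Return Boolean masks for the LST-QA lymphocyte subsets.

    ev maps channel names to arrays of intensities for events already inside
    the lymphocyte gate. Channels shared by two markers in the LST-QA tube:
    PacB = CD4/CD20, FITC = CD8/IgL, PE = CD56/IgK.
    """
    cd3 = ev["APC"] > POS
    cd19 = ev["PECy7"] > POS
    pacb = ev["PacB"] > POS
    fitc = ev["FITC"] > POS
    pe = ev["PE"] > POS
    t_cells = cd3 & ~cd19            # CD3pos
    b_cells = cd19 & pacb            # CD19pos and CD20pos
    return {
        "T": t_cells,
        "B": b_cells,
        "CD4 T": t_cells & pacb & ~fitc,
        "CD8 T": t_cells & fitc & ~pacb,
        "IgK B": b_cells & pe & ~fitc,
        "IgL B": b_cells & fitc & ~pe,
        "CD56bright NK": ~cd3 & ~cd19 & (ev["PE"] > CD56_BRIGHT),
    }


def medfi(ev, mask, channel):
    """Median fluorescence intensity of one channel within one gated subset."""
    return float(np.median(ev[channel][mask]))


# Five made-up events: CD4 T, CD8 T, IgK B, IgL B, CD56bright NK.
events = {
    "APC":   np.array([30000.0, 32000.0, 50.0, 60.0, 40.0]),
    "PECy7": np.array([50.0, 60.0, 14000.0, 15000.0, 40.0]),
    "PacB":  np.array([8000.0, 100.0, 22000.0, 21000.0, 90.0]),
    "FITC":  np.array([100.0, 15000.0, 100.0, 11000.0, 70.0]),
    "PE":    np.array([50.0, 50.0, 9000.0, 80.0, 18000.0]),
}
gates = gate_subsets(events)
cd3_on_t_cells = medfi(events, gates["T"], "APC")
```

Because several markers share a detector in this tube, the lineage gate (T, B, or NK) has to be applied before a shared channel can be read as a specific marker, which is why the subset masks are built on top of the CD3 and CD19 masks.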

T-cells were defined as CD3pos events and NK-cells as CD3 and CD19 double-negative events with strong CD56 expression (CD56bright). T-cells were further gated into CD4+ and CD8+ T-cells and B-cells were divided into IgKpos and IgLpos B-cells (Fig. 1). For each of the above-defined lymphocyte subsets, specific MedFI values were recorded in a predefined table (Supporting Information Fig. 1). Locally analyzed data were used for P-score calculation and results are presented in Figures 3A and 4 (with the exception of CD56 PE “central gate”). In parallel, the analyzed data files were saved as “.cyt” files (containing the FCS file and all analysis-related metadata) and sent for central evaluation by TK at the CLIP-Cytometry Laboratory (Prague, Czech Republic). Central evaluation was performed on “merged” samples (data files are combined into one metafile for analysis and graphical presentation) (2) using the Infinicyt software after repeating the complete gating strategy. Centrally analyzed data are presented in Figures 1, 2, 3B, and 5. Subsequently, all subsets of gated cells were displayed for each fluorescence parameter (Fig. 2 and Supporting Information Fig. 3) and MedFI values were recorded.

Statistical Methods

The coefficients of variation (CV) of MedFIs reported for all FCS files were calculated as: CV = (Standard Deviation)/(Average of MedFI) × 100. To identify data abnormalities, we adapted the “Performance score” (P-score) calculation used by Lysak (11) and built into a national external quality assessment system (SEKK, Pardubice, Czech Republic, sekk.cz). The adaptation consisted of logarithmic transformation of values and adjusting the maximal allowed difference to the actual distribution of the data:

$$\text{P-score} = \frac{\log_{10}\mathrm{MedFI} - \log_{10}\mathrm{qaMedFI}}{D_{\max}}$$

where qaMedFI is the median of all MedFI values from QA rounds 2010–2013 (Table 1) and Dmax is the maximal allowed


Figure 2. Summary of the QA round performed in 2013. The MedFI values (circles) obtained for each of the corresponding (gated) PB lymphocyte subsets from data files obtained during the QA round performed in 2013 are shown. Color codes represent IgLpos B-cells (orange), IgKpos B-cells (red), CD56bright NK-cells (green), and CD4pos and CD8pos T-cells (light blue and dark blue, respectively); grey arrows point to outliers (P-score < -1 or P-score > 1; compare to Figure 4). MedFI values are well preserved in all files for all markers, except for the immunoglobulin light chains (IgK, IgL) on B-cells. Compensation inaccuracies can be found in the following fluorochrome-associated stainings: PacB (files K1 are undercompensated), FITC (files E1 are overcompensated), PE (files I1 and K1 are overcompensated), and APC (files K1 are undercompensated). Similar plots depicting QA 2010, 2011, and 2012 can be found in Supporting Information Figure 3. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]


Figure 3. Rapid outlier identification with P-score. Panel A: Graph showing P-score trends for CD3, IgK, CD81 across samples in all participating laboratories in QA 2013 using lyophilized reagent mixture (laboratory, instrument and year coded on y-axis, P-score on x-axis, zero at dashed line, outliers highlighted with black background). Panel B: Fluorescence intensity of CD3 (blue), IgK (red), CD81 (orange) shown for outliers identified with P-score. Left part shows all cells as dots with median fluorescence as circle in relationship to all other samples (same order as in A), while right part shows a histogram of outlier (full arrow) versus reference close to P-score 0 (dashed arrow). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

difference calculated as the 95th percentile of all absolute values of the difference of actual MedFI from qaMedFI. To follow less stable values (IgL FITC and IgK PE; CV repeatedly >60%), the 90th percentile was used instead (Table 2). The setting of the Dmax value was validated by data variability obtained during the 2010–2013 QA rounds, so that all outliers were identified and manually

tracked and the most likely cause of such variability was determined. To illustrate the impact of variability on the graphical interpretation of the data, MedFI_span was calculated by subtracting the logarithmically transformed fifth percentile MedFI from the logarithmically transformed 95th percentile MedFI. Silhouette graphs (19) were constructed using the MATLAB software program (Mathworks, Natick, MA).
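The statistics defined in this section (CV, P-score with Dmax, and MedFI_span) can be sketched in a few lines of Python. This is an illustrative reconstruction from the definitions above, not the code used in the study or in the Infinicyt QA module. Several details are assumptions: the sample standard deviation (ddof=1) is used for the CV, NumPy's default linear interpolation is used for percentiles, all MedFI values are assumed to be positive so that the logarithm is defined, and the simulated MedFI values are made up.

```python
import numpy as np


def cv_percent(medfi):
    """CV = standard deviation / average x 100 (sample SD is an assumption)."""
    medfi = np.asarray(medfi, dtype=float)
    return float(np.std(medfi, ddof=1) / np.mean(medfi) * 100.0)


def p_scores(medfi, percentile=95.0):
    """Return (P-scores, qaMedFI, Dmax) for one marker/subset pair.

    qaMedFI is the median of all MedFI values; Dmax is the given percentile
    of the absolute log10 differences from qaMedFI (95th by default, 90th
    for the less stable IgL and IgK stainings).
    """
    medfi = np.asarray(medfi, dtype=float)
    qa_medfi = float(np.median(medfi))
    log_diff = np.log10(medfi) - np.log10(qa_medfi)
    d_max = float(np.percentile(np.abs(log_diff), percentile))
    return log_diff / d_max, qa_medfi, d_max


def medfi_span(medfi):
    """log10(95th percentile MedFI) - log10(5th percentile MedFI), in decades."""
    p5, p95 = np.percentile(np.asarray(medfi, dtype=float), [5, 95])
    return float(np.log10(p95) - np.log10(p5))


# Simulated MedFI values for 138 files (hypothetical, log-normal around 10,000).
rng = np.random.default_rng(0)
medfi = 10000.0 * 10.0 ** rng.normal(0.0, 0.1, 138)

scores, qa_medfi, d_max = p_scores(medfi)
outliers = np.abs(scores) > 1.0     # P-score < -1 or P-score > 1
```

Because Dmax is derived from the same data it is applied to, roughly 5% (or 10%) of the values are flagged by construction, even when no true failure is present. The flagged files are therefore candidates for manual review rather than proven errors.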


Figure 4. Summary graph results of all QA rounds. Summary graph of P-score values out of range for the acceptable variation (marked as black or grey squares) shown for every individual laboratory (three files in a column) obtained through the whole study period (4 years). Note that black squares repeated in vertical duplicates or triplicates (same lab, same year, different sample) point to issues in sample preparation procedure or instrument setup, while horizontal doublets in labs A and B (same year = same sample, different instrument) point to sample-related issues. While 13 failures (9.4%) were recorded for each of IgK and IgL, for all other markers only 6 (4.3%) failures were recorded in total per marker (which corresponds to the Dmax setting). Letter codes indicate participating laboratories; “Lyo mix” denotes results obtained with the lyophilized reagent mixture (note that in Lyo mix, CD45 OC515 and CD81 APCC750 were used). Issues tracked back with QA: grey squares, “local” gating of CD56bright NK-cells; X, laboratory B used the wrong reagent volume; laboratory G used an LSRII instrument with an orange shifted optical filter; ## denotes missing values, when no CD56bright NK-cells could be found in the sample.
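The failure percentages quoted in this caption can be checked with simple arithmetic. The short Python sketch below assumes that the denominator is the sum of the per-round file counts given in the Table 1 column headers (30, 33, 33, 15, and 27), which matches the 138 tested P-score values per marker reported in the Results; the variable and function names are illustrative only.

```python
# Files per QA round, taken from the Table 1 column headers.
files_per_round = {"2010": 30, "2011": 33, "2012": 33, "2013": 15, "2013 mix": 27}
n_values = sum(files_per_round.values())


def failure_rate(n_failures, n_total):
    """Percentage of out-of-range P-score values, rounded to one decimal."""
    return round(100.0 * n_failures / n_total, 1)


rate_light_chains = failure_rate(13, n_values)   # IgK or IgL (90th percentile Dmax)
rate_other = failure_rate(6, n_values)           # all other markers (95th percentile Dmax)

# Number of values expected beyond a 90th or 95th percentile cut-off.
expected_beyond_90th = 0.10 * n_values
expected_beyond_95th = 0.05 * n_values
```

The observed 13 and 6 failures are close to the 13.8 and 6.9 values expected beyond a 90th and 95th percentile cut-off, which is what the caption means by failures corresponding to the Dmax setting.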

RESULTS

QA of Fluorescence Intensity Patterns Obtained at Individual Laboratories via Local Data Analysis

All submitted FCS data files were first evaluated for fluorescence intensity patterns obtained for individual FCS data files in each laboratory. For this purpose, we displayed all gated PB lymphocyte subsets versus the file numbers for all fluorescence channels, to visually inspect the potential occurrence of any abnormal staining patterns in the gated lymphocyte subsets. The data presented in Figure 2 and in Supporting Information Figure 3 demonstrate that standardized flow cytometry can provide uniform data. However, inconsistent MedFI values among individual laboratories could be easily identified for several reagents. Symmetrical shifts in triplicate samples from an individual laboratory suggested the occurrence of systematic problems related to that particular laboratory, while individual outliers would typically depict individual sample/donor-related issues. In those samples where we observed symmetrical shifts in triplicates (see D1 and I1 data files in CD3 APC graph, Fig. 2),

we recommended revision of instrument setup, Rainbow bead lot, reagent, and titration. Individual variations are notable in the staining for the B-cell surface IgK and IgL, albeit a degree of symmetrical shift in triplicates could be observed as well. Otherwise, the MedFI values were consistent across different donors, instruments, and laboratories, which confirmed the previously published data on interlaboratory reproducibility of the stainings proposed for EuroFlow panels (2), showing the utility of the MedFI values as readout for QA. At the same time, QA of MedFI was also useful for the identification of compensation problems. When MedFI values of antigen “negative” subsets were evaluated, suboptimal compensation settings were identified (e.g., files E1 in Fig. 2, where the overcompensated light blue CD4+ T-cells showed negative MedFI values for the CD8 FITC parameter). Review of the fluorescence compensation procedure was recommended to affected laboratories.

Rapid QA Evaluation of Individual Data Files

To provide feedback evaluation on each data file to each laboratory, we asked individual laboratories to analyze their


Figure 5. Clustering of MedFI values for defined subsets of lymphocytes identified with the LST-QA antibody combination in all laboratories for the four yearly QA rounds. Cluster formation is displayed using principal component analysis (composition of each principal component is listed below each axis) of manually gated PB CD4+ T-cells (CD4, light blue), CD8+ T-cells (CD8, dark blue), CD56bright NK-cells (NK, green), IgKpos B-cells (IgK, red), and IgLpos B-cells (IgL, orange) (panel A) from 123 healthy donors (all displayed simultaneously). The Silhouette graph provides a measure of how tightly all the data within a given group/cell subset are grouped. Each gated lymphocyte subset from each measured data file is represented by a line (y axis in panel B) and lines are grouped to particular clusters. A Silhouette value approaching 1 represents an ideal match to the corresponding cluster, a value at zero represents lymphocyte subsets located between clusters, and values below zero represent lymphocyte subsets located closer to an inappropriate cluster. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]
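The interpretation of Silhouette values given in this caption follows from the standard definition s = (b - a) / max(a, b), where a is the mean distance of a point to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. The study used MATLAB for this step; the NumPy sketch below is an independent illustration that assumes plain Euclidean distance (the MATLAB function may use a different default metric), and the example coordinates and labels are made up.

```python
import numpy as np


def silhouette_values(points, labels):
    """Silhouette value per point, using Euclidean distance (an assumption)."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distance matrix.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    clusters = np.unique(labels)
    values = np.zeros(len(points))
    for i in range(len(points)):
        same = labels == labels[i]
        n_same = int(same.sum())
        if n_same == 1:
            continue  # singleton cluster: value left at 0 by convention
        # Mean distance to the other members of the own cluster
        # (the distance of a point to itself is 0, so it drops out of the sum).
        a = dist[i, same].sum() / (n_same - 1)
        # Smallest mean distance to the members of any other cluster.
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        values[i] = (b - a) / denom if denom > 0 else 0.0
    return values


# Two tight hypothetical clusters plus one "B" point lying inside the "T" cluster.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0], [0.0, 0.5]]
labels = ["T", "T", "B", "B", "B"]
sil = silhouette_values(points, labels)
```

In this toy example the last point receives a negative value because it lies closer to the "T" cluster than to its own "B" cluster, which is the situation the caption describes as a subset located closer to an inappropriate cluster.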

own data using a predefined gating template and to report MedFI values for given subsets of cells in a pre-established table format (Supporting Information Fig. 1). This QA step was also intended to evaluate the understanding of the analytical software. Individual MedFI values were then compared with median values for all MedFI values obtained in all QA rounds (2010–2013). The P-score was calculated for individual predefined parameters for each sample, to offer fast identification of outliers from the acceptable variation. The results of the QA 2013 are illustrated in Figure 3. The P-score calculation allowed us to set the stringency of the acceptable variation for each antigen/cell population pair (see Methods section). By definition, the P-score is set to highlight MedFI values performing poorer than 95% (or 90%) of seemingly correct data files. In the 2013 QA round, we implemented the P-score calculation directly into the Infinicyt software (QA module). The analysis of FCS data files based on the software template algorithm allowed for direct comparison of manual analysis versus software-guided analysis as a benchmark (Supporting Information Fig. 4). This QA module is therefore also meant to be used for self-testing by individual laboratories or even individual technicians or operators, independently of the EuroFlow organized multicenter QA rounds.

Overall Trends in QA

The four QA rounds performed from 2010 to 2013 demonstrated that the average MedFI values obtained for the selected antigens expressed on well-defined subsets of PB lymphocytes used for the QA program were well maintained over time among donors across all laboratories (average of MedFI

ranged from 86 to 125% of qaMedFI for the seven best markers over QA rounds, Table 1 and Supporting Information Fig. 3). We observed that the fluctuation around the median (distance from the fifth percentile to the 95th percentile of MedFI) spanned less than 0.5 decade for CD4, CD20, CD45, and CD8; between 0.5 and 0.6 decade for CD5, CD19, CD3, and CD81; and between 0.7 and 1.4 decade for IgL, IgK, and CD56 (“MedFI span” in Table 2). In total, CVs of MedFI repeatedly below 30% were obtained for 7 of 11 markers evaluated in all QA rounds (Table 2). Of those with higher variability, the variability of IgK and IgL staining was caused by a combination of factors: (i) sample preparation, mainly incomplete wash of plasma antibodies (shifts of triplicates, Fig. 2 and Supporting Information Fig. 3); (ii) larger variability of light chain molecule expression per B-cell (dots within one donor span one decade, Supporting Information Fig. 3); (iii) donor-related variability (Fig. 4); and (iv) the use of polyclonal reagents. Variable expression levels of CD81 on B-lymphocytes present in normal PB samples (Figs. 1 and 3B) and differences between the initial custom conjugate (year 2010) and the catalog reagent could contribute to the higher CV of CD81 staining in the first year of testing. When CD56bright NK-cells were reevaluated for source of variability, lower MedFI values leading to QA failure [because of total CD56pos gating, as opposed to CD3neg19neg45pos56bright gating (green, Fig. 1, panels C.1 and C.2)] were observed in reports generated locally, as compared with central evaluation (Fig. 4), thus pointing to the importance of proper application of the gating strategy. In addition, once the out-of-range P-score values were studied from all QA rounds of all samples across all laboratories and QA years (Fig. 4), we could infer that the majority of

out-of-range values which emerged in the same laboratory for the three donors tested locally typically pointed to variations in the experimental procedure, rather than to interdonor variability. Although 13 failures (9.4%) were recorded for each of IgK and IgL, all other markers showed six failures (4.3%) out of 138 tested QA P-score values per marker. Of note, no clear trend toward either improvement or relaxation of the standardization effort over the years was identified in any of the participating laboratories, for any of the QA rounds (Fig. 4, Tables 1 and 2). In addition, we were able to track issues related to wrong reagent titer and different optical filter usage (Fig. 4). For the last QA round (2013), commercial lyophilized premixed reagents were used in parallel, which would be expected to decrease the previously detected variability by removing differences in manual pipetting within and between individual laboratories. In this QA round, the premixed reagent solutions were compared side-by-side with traditional pipetting of individual reagents in five different laboratories. Indeed, the total number of P-score outliers decreased from 11 with single reagents to 4 with the premixed reagent sample set. Of note, the majority of outliers in the single reagent group (n = 8) occurred in one particular laboratory and resulted in lower signal of CD4, IgL, and IgK stainings (Supporting Information Fig. 2).

Analytical Resolution of Different Subsets of PB Lymphocytes in the Multidimensional Antibody Staining Space

The ultimate goal of the immunophenotyping standardization efforts described here was to ensure that, provided highly similar and reproducible results are obtained, all major subsets of lymphocytes will also be appropriately discriminated from each other in the n-dimensional flow cytometry space regardless of donor, instrument, laboratory, and time of evaluation. Accordingly, the projections of all individual well-defined lymphocyte subsets into separate clusters can be shown for each of the four QA rounds and for all years combined using an APS view [principal component 1 vs. principal component 2 space; (Fig. 5A)]. To better evaluate the coherence and potential overlaps between the different clusters of all subsets, and the resolution obtained among them, a Silhouette graph (19) was constructed (Fig. 5B). As depicted in Figure 5, all PB lymphocyte subsets, except for the IgKpos B-cells,

Table 1. MedFI drifts during QA period (average of MedFI over test years)

MARKER         CELL SUBSET            2010 (N = 30)  2011 (N = 33)  2012 (N = 33)  2013 (N = 15)  2013 MIX (N = 27)  QAMEDFI  DMAX
CD20 PacB      on B-cells             27,553         21,954         19,775         22,352         20,250             22,200   0.28
CD4 PacB       on CD4pos T-cells      8,956          8,616          7,777          6,747          7,293              7,807    0.16
CD45 PacO      on T-cells             6,042          5,855          4,997          6,573          5,475              5,287    0.24
CD8 FITC       on CD8pos T-cells      14,464         15,846         15,653         15,727         14,538             15,137   0.29
IgL FITC       on IgLpos B-cells      21,686         12,372         10,423         9,306          11,833             11,824   0.69
CD56 PE        on CD56bright NK-cells 18,246         16,285         17,501         17,261         17,745             18,064   0.52
IgK PE         on IgKpos B-cells      13,939         22,937         18,768         7,114          6,647              9,685    0.77
CD5 PerCP5-5   on T-cells             10,303         8,998          10,222         10,116         9,759              9,795    0.36
CD19 PECy7     on B-cells             13,399         13,702         14,633         14,963         15,807             14,785   0.40
CD3 APC        on T-cells             31,664         34,980         36,622         42,125         31,940             33,647   0.31
CD81 APCH7     on B-cells             2,692          1,921          2,222          2,542          2,931              2,256    0.32

Table 2. MedFI variation during QA period

Coefficient of MedFI variation over the test years

| Marker | Cell subset | 2010 (N = 30) | 2011 (N = 33) | 2012 (N = 33) | 2013 (N = 15) | 2013 MIX (N = 27) | MedFI span (N = 123) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CD20 PacB | on B-cells | 32.6% | 25.8% | 25.6% | 29.3% | 21.6% | 0.42 |
| CD4 PacB | on CD4pos T-cells | 19.3% | 15.3% | 16.9% | 18.6% | 10.9% | 0.27 |
| CD45 PacO | on T-cells | 28.8% | 37.3% | 19.1% | 28.4% | 11.5% | 0.31 |
| CD8 FITC | on CD8pos T-cells | 15.5% | 14.3% | 39.2% | 31.2% | 23.2% | 0.43 |
| IgL FITC | on IgLpos B-cells | 79.7% | 189.6% | 62.1% | 71.2% | 43.8% | 1.23 |
| CD56 PE | on CD56bright NK-cells | 52.5% | 45.9% | 47.5% | 32.6% | 33.9% | 0.74 |
| IgK PE | on IgKpos B-cells | 82.1% | 108.7% | 100.8% | 94.6% | 91.2% | 1.42 |
| CD5 PerCP5-5 | on T-cells | 23.1% | 34.8% | 43.8% | 31.8% | 23.5% | 0.54 |
| CD19 PECy7 | on B-cells | 32.4% | 24.5% | 26.6% | 14.0% | 16.0% | 0.51 |
| CD3 APC | on T-cells | 29.5% | 38.5% | 40.9% | 22.5% | 35.7% | 0.54 |
| CD81 APCH7 | on B-cells | 52.3% | 39.0% | 34.0% | 30.6% | 34.0% | 0.55 |


formed well-defined clusters, which could be perfectly discriminated from all other cell subsets identified, including their nearest neighbors. Good identification of subset clusters is evidence of the quality of standardization (e.g., CD4+ T-cells will be correctly defined as CD4+ T-cells in any laboratory at any time). However, the Silhouette graph showed a poor separation of IgKpos B-cells from neighboring IgLpos B-cells. This was due to the low fluorescence intensity of IgK staining on IgKpos B-cells in several samples (Fig. 2 and Supporting Information Fig. 3).
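The Silhouette value (19) behind such a graph can be sketched in a few lines. This is a minimal illustration only, not the software used in the study: the coordinates are hypothetical (PC1, PC2) positions rather than QA data, Euclidean distance is assumed, and the function names are illustrative. For each event, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to any other cluster, and the Silhouette value is (b - a) / max(a, b).

```python
import math
from collections import defaultdict


def silhouette_values(points, labels):
    """Return the silhouette value s(i) = (b - a) / max(a, b) for every point.

    a: mean distance from point i to the other members of its own cluster.
    b: smallest mean distance from point i to the members of another cluster.
    Assumes at least two distinct labels and Euclidean distance.
    """
    indices_by_label = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_label[label].append(index)

    values = []
    for i, label in enumerate(labels):
        own = [j for j in indices_by_label[label] if j != i]
        if not own:
            # A cluster with a single member gets s(i) = 0 by convention.
            values.append(0.0)
            continue
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in indices_by_label.items()
            if other != label
        )
        denominator = max(a, b)
        values.append((b - a) / denominator if denominator > 0 else 0.0)
    return values


def mean_silhouette_by_label(points, labels):
    """Average the per-point silhouette values within each cluster."""
    grouped = defaultdict(list)
    for label, value in zip(labels, silhouette_values(points, labels)):
        grouped[label].append(value)
    return {label: sum(v) / len(v) for label, v in grouped.items()}


# Hypothetical (PC1, PC2) coordinates, NOT data from the QA rounds:
# one well-separated subset and two subsets that overlap each other.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
    (10.5, 10.5), (11.0, 11.0), (11.5, 10.5),
]
labels = (
    ["T-cells"] * 4
    + ["IgLpos B-cells"] * 3
    + ["IgKpos B-cells"] * 3
)

for label, score in mean_silhouette_by_label(points, labels).items():
    print(f"{label}: {score:.2f}")
```

Values close to 1 indicate a compact, well-separated cluster, whereas values near or below 0 indicate overlap with a neighboring cluster, which is the pattern described above for IgKpos and IgLpos B-cells.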

DISCUSSION

Usage of the eight-color EuroFlow antibody panels has proven to be a powerful and robust immunophenotyping procedure for diagnostic screening and classification of the most relevant WHO-defined hematological malignancies (2–4). In addition to conventional expert-based data analysis and interpretation, the EuroFlow database-guided software algorithms (1,5) can also be used for the analysis of data from FCS files. The latter approach can ultimately be used for software-guided classification of either individual cells to specific cell populations represented in the database, or specific populations of malignant/aberrant cells to given WHO disease categories (4,5). However, to provide robust and fully comparable multicenter results, sample preparation and instrument setup and data acquisition must be performed accurately and in a standardized manner (20). For this purpose, clear and robust standard operating protocols (SOPs), trained personnel, and QA tools are essential. Most diagnostic categories of WHO-defined hematological malignancies and cell population definitions in the “Flow diagnostic essential (FDE) code” (21) rely on two- or three-class distinction of positivity (positive vs. negative or strong vs. weak vs. negative). Interpretation is made based on (relative) expression levels, which shall thus be accurate, objectively assessed and reproducible. When expression level is the output, then monitoring MedFI can control the quality of the process. The rationale of CD4 positive T-cell enumeration or CD34 positive cell enumeration is different; here, the absolute count and relative percentage of a given cell population within a sample is the final diagnostic output that needs to be monitored.
If the intensity of the fluorescence signals associated with individual markers, rather than the percentage of a given cell subpopulation, is the output, locally drawn healthy volunteer PB samples can be used in parallel (2) instead of a stabilized sample from the same donor distributed to all laboratories participating in a QA round. In fact, we have already shown that the variability of MedFI values of specific PB subsets is not greater when local donors are used compared with a stabilized centrally processed and distributed sample (2). These results are in line with the well-established stable expression of such markers in normal lymphocyte subsets in healthy volunteers in single center studies (16–18). Here, we confirm in an interlaboratory setting and over a 4-year period that the interindividual variability of MedFI values can be compared with previously published data by Bikoue et al. (18), showing CV of 11% for CD19 (compared with 14% in QA 2013), CV of 12% for CD3 (23% in QA 2013), CV of 20% for CD8 (23% in QA 2013) or CV of 29% for CD45 (12% in QA 2013). Values below 30% were reached repeatedly for CD20, CD4, CD45, CD8, CD5, CD19, and CD3 (Table 2), whereas CD56 and CD81 were slightly more variable (CV 30–52%) and would offer room for improvement. In fact, the difference between samples at the lower end of MedFI (fifth percentile) and the upper end of MedFI (95th percentile) is around half a decade on the flow cytometry scattergram. The deviation of the average MedFI for individual markers and years (Table 1) ranged from 86 to 125% of the qaMedFI, which in fact combines the interdonor variation, sample-preparation induced variation and the allowed variation of instrument setup (±15% from Target MedFI settings according to the EuroFlow SOP). Thus, we considered the above-named antigens to be acceptable as endpoints for longitudinal comparisons. Although assessment of IgK and IgL on B-cells is not sufficiently stable (CV >60% annually) to serve as a sensitive general MedFI QA readout, it is still an important QA parameter for monitoring the staining method. Usage of an additional reagent in the same fluorochrome channel (e.g., CD8 and CD56) backs up these two markers in the EuroFlow LST-QA tube to provide a general MedFI QA readout for these fluorescence channels. Altogether, we tested a series of 123 individual healthy donors from 11 laboratories in nine different European countries, over a 4-year period. Based on our data, it may be concluded that long-term standardization is feasible with expected MedFI CVs for stable markers below 30%. The QA approach presented here has taken advantage of the fully standardized environment of the EuroFlow approach implemented in the participating laboratories.
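The 86 to 125% range quoted above can be checked against the yearly averages and qaMedFI values listed in Table 1. The sketch below does this for the seven markers whose CV repeatedly stayed below 30%; restricting the check to these markers is an assumption made here for illustration, and the function and variable names are not part of any EuroFlow software.

```python
# Yearly average MedFI per marker, transcribed from Table 1
# (columns: 2010, 2011, 2012, 2013, 2013 MIX).
YEARLY_MEDFI = {
    "CD20": [27553, 21954, 19775, 22352, 20250],
    "CD4": [8956, 8616, 7777, 6747, 7293],
    "CD45": [6042, 5855, 4997, 6573, 5475],
    "CD8": [14464, 15846, 15653, 15727, 14538],
    "CD5": [10303, 8998, 10222, 10116, 9759],
    "CD19": [13399, 13702, 14633, 14963, 15807],
    "CD3": [31664, 34980, 36622, 42125, 31940],
}

# Overall reference value (qaMedFI) per marker, also from Table 1.
QA_MEDFI = {
    "CD20": 22200,
    "CD4": 7807,
    "CD45": 5287,
    "CD8": 15137,
    "CD5": 9795,
    "CD19": 14785,
    "CD3": 33647,
}


def percent_of_reference(value, reference):
    """Express a MedFI value as a percentage of its reference value."""
    return 100.0 * value / reference


def deviation_range(yearly, reference):
    """Return the lowest and highest yearly average, as % of qaMedFI."""
    ratios = [
        percent_of_reference(value, reference[marker])
        for marker, values in yearly.items()
        for value in values
    ]
    return min(ratios), max(ratios)


low, high = deviation_range(YEARLY_MEDFI, QA_MEDFI)
print(f"{low:.0f}% to {high:.0f}% of qaMedFI")  # prints "86% to 125% of qaMedFI"
```

With these inputs the extremes are CD4 in 2013 (lowest) and CD3 in 2013 (highest). The less stable markers (CD81, IgK, IgL) are left out of this check.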
Thus, many preanalytical and post-analytical variables that are tested by the SIHONSCORE (15) did not need to be addressed as a subject of the described QA. Similarly, computer-assisted data analysis algorithms constructed and/or proposed should limit the interpretation failures (4,5). Of note, the EuroFlow QA was designed to be applied to one of the most common entries of the EuroFlow diagnostic algorithm: evaluation of PB samples using the Lymphocyte Screening Tube (LST) (4). Thereby, the LST-QA tube closely mimics the actual diagnostic procedure. Whenever a measurement failed the P-score evaluation, the entire set of the implemented EuroFlow SOPs was reviewed locally and identified errors were discussed at EuroFlow meetings. It should be noted that the EuroFlow QA program evaluates a general flow cytometry process rather than a single antibody panel or single diagnostic sample, as performed in other leukemia immunophenotyping QA programs, e.g., UK NEQAS (14) or The College of American Pathologists. In the UK NEQAS testing, it is assumed that whenever a participant can correctly classify the QA sample, the involved laboratory would also correctly classify any leukemia sample. In contrast, the EuroFlow QA assumes that if a participant can execute the EuroFlow SOP correctly (and can accurately assess the lineage markers on normal cells), the involved laboratory would be able to accurately identify these markers on malignant cells as well. Consequently, the EuroFlow QA program evaluates the complete technical procedure including data analysis and thereby complements the UK NEQAS and other comparable QA programs (9–15), but does not replace them. The MedFI values measured as endpoints of the EuroFlow QA are composite measures of biological variables (e.g., number of antibody molecules bound per cell), staining variables (e.g., properties of the antibody reagent used, pipetting and incubation time) and instrument-related variables. Because the inter-donor biological variation of marker expression is relatively low and because the instrument setup is monitored using well-defined and identical standards (Rainbow beads), most variations observed will potentially point to staining-related variables and data analysis-related issues. Discussions held during the evaluation of the EuroFlow QA rounds (during regular meetings of the group) revealed several sources of suboptimal results. They included the reagent volume used, reagent swap, insufficient washing of the plasma immunoglobulins and inconsistent gating, but most frequently, improper fluorescence compensation. Surprisingly, no instrument-related issues other than compensation settings were identified (Fig. 2 and Supporting Information Fig. 3). It should be stressed that, although some QA test results were out of the acceptable range, none of the results would have led to misclassification of a lymphocyte subset and, therefore, to failure in recognizing the target cell population. Discussion of outlying results, their potential causes and critical points of the SOP was successful in either identifying the need for additional training (new participants were given a chance to train in the laboratories of the founding members) or clarifying the issues and improving the labs’ performance.
Potential limitations of the described EuroFlow QA test include the fact that only the LST tube is used. Consequently, in theory, pipetting of other tubes or panels could still be suboptimal, as it was not specifically verified. This may be particularly relevant for those antibody combinations aimed at simultaneous detection of cell surface and intracellular (e.g., cytoplasmic) markers, where the absence of signal may be due to sample preparation steps not evaluated here, e.g. inappropriate permeabilization of the cells. However, such technical steps can be easily incorporated into future EuroFlow QA programs, if needed. Recently, a QA module has been developed and embedded in the Infinicyt software and an LST-QA reagent mix has also been manufactured by Cytognos SL. This reagent mix was evaluated in the 2013 QA round and proven to perform similarly well among the participating laboratories, avoiding pipetting errors and thereby the need for repeated sample staining. Furthermore, the software QA module can be used to test proficiency as a self-testing tool or a local QA tool (inter-personal QA, time-course). Remarkably, the results of the QA rounds performed using the LST-QA lyophilized mix in parallel to the in-house single reagent panels of the individual participating laboratories were only modestly better. This may be due to the already extensive training and experience of the laboratory personnel, who have been performing the stainings for more than 7 years. Altogether, these results suggest that any individual laboratory which has well-trained and experienced personnel may choose the QA settings that best match their daily work (e.g. usage of individual reagents versus reagent mixes) for a closer evaluation of their routine practice. Based on the experience gained and the results obtained, the EuroFlow Consortium has decided to open the QA rounds to any individual laboratory which might be using, or plans to adopt, the EuroFlow approach for immunophenotyping of hematological malignancies. In addition, the EuroFlow Consortium will support these QA rounds through the EuroFlow educational meetings and workshops (announced via the EuroFlow website www.euroflow.org). However, it should be noted that the purpose of the EuroFlow QA is to ensure that technical performance permits comparison of data files with the EuroFlow database; consequently, the EuroFlow QA does not replace established proficiency tests of leukemia diagnostics.

ACKNOWLEDGMENTS

We wish to acknowledge Marieke Comans-Bitter for her lasting organizational support of the EuroFlow Consortium. We are thankful to our technical staff for their continuous striving for high precision, namely Daniel Thürner for his help in thoroughly testing and commenting on the technical procedures. Cytognos SL, Salamanca developed the LST-QA reagent mix and the Infinicyt QA module (Rafael Fluxa and Carina Cabrita).

LITERATURE CITED

1. Pedreira CE, Costa ES, Lecrevisse Q, van Dongen JJM, Orfao A. Overview of clinical flow cytometry data analysis: Recent advances and future challenges. Trends Biotechnol 2013;31:415–425.
2. Kalina T, Flores-Montero J, van der Velden VHJ, et al. EuroFlow standardization of flow cytometer instrument settings and immunophenotyping protocols. Leukemia 2012;26:1986–2010.
3. Van Dongen JJM, Orfao A. EuroFlow: Resetting leukemia and lymphoma immunophenotyping. Basis for companion diagnostics and personalized medicine. Leukemia 2012;26:1899–1907.
4. Van Dongen JJM, Lhermitte L, Böttcher S, et al. EuroFlow antibody panels for standardized n-dimensional flow cytometric immunophenotyping of normal, reactive and malignant leukocytes. Leukemia 2012;26:1908–1975.
5. Costa ES, Pedreira CE, Barrena S, et al. Automated pattern-guided principal component analysis vs expert-based immunophenotypic classification of B-cell chronic lymphoproliferative disorders: A step forward in the standardization of clinical immunophenotyping. Leukemia 2010;24:1927–1933.
6. Pedreira CE, Costa ES, Almeida J, Fernandez C, Quijano S, Flores J, Barrena S, Lecrevisse Q, Van Dongen JJM, Orfao A. A probabilistic approach for the evaluation of minimal residual disease by multiparameter flow cytometry in leukemic B-cell chronic lymphoproliferative disorders. Cytometry A 2008;73A:1141–1150.
7. Pedreira CE, Costa ES, Barrena S, Lecrevisse Q, Almeida J, van Dongen JJM, Orfao A. Generation of flow cytometry data files with a potentially infinite number of dimensions. Cytometry A 2008;73A:834–846.
8. Da Costa ES, Peres RT, Almeida J, Lecrevisse Q, Arroyo ME, Teodósio C, Pedreira CE, van Dongen JJM, Orfao A. Harmonization of light scatter and fluorescence flow cytometry profiles obtained after staining peripheral blood leucocytes for cell surface-only versus intracellular antigens with the Fix & Perm reagent. Cytometry B Clin Cytom 2010;78B:11–20.
9. Whitby L, Granger V, Storie I, Goodfellow K, Sawle A, Reilly JT, Barnett D. Quality control of CD4+ T-lymphocyte enumeration: Results from the last 9 years of the United Kingdom National External Quality Assessment Scheme for Immune Monitoring (1993–2001). Cytometry 2002;50:102–110.
10. Homburger HA, Rosenstock W, Paxton H, Paton ML, Landay AL. Assessment of interlaboratory variability of immunophenotyping. Results of the College of American Pathologists Flow Cytometry Survey. Ann N Y Acad Sci 1993;677:43–49.
11. Lysak D, Kalina T, Martínek J, Pikalova Z, Vokurkova D, Jaresova M, Marinov I, Ondrejkova A, Spaček M, Stehlíkova O. Interlaboratory variability of CD34+ stem cell enumeration. A pilot study to national external quality assessment within the Czech Republic. Int J Lab Hematol 2010;32:e229–e236.
12. Levering WHBM, Preijers FWMB, van Wieringen WN, Kraan J, van Beers WAM, Sintnicolaas K, van Rhenen DJ, Gratama JW. Flow cytometric CD34+ stem cell enumeration: Lessons from nine years’ external quality assessment within the Benelux countries. Cytometry B Clin Cytom 2007;72B:178–188.
13. Barnett D, Granger V, Whitby L, Storie I, Reilly JT. Absolute CD4+ T-lymphocyte and CD34+ stem cell counts by single-platform flow cytometry: The way forward. Br J Haematol 1999;106:1059–1062.
14. Reilly JT, Barnett D. UK NEQAS for leucocyte immunophenotyping: The first 10 years. J Clin Pathol 2001;54:508–511.
15. Kluin-Nelemans J, Van Wering E, Van Der Schoot C, Adriaansen H, Van’T Veer M, Van Dongen J, Gratama J. SIHONSCORE: A scoring system for external quality control of leukaemia/lymphoma immunophenotyping measuring all analytical phases of laboratory performance. Br J Haematol 2001;112:337–343.
16. Bikoue A, Janossy G, Barnett D. Stabilised cellular immuno-fluorescence assay: CD45 expression as a calibration standard for human leukocytes. J Immunol Methods 2002;266:19–32.
17. Davis KA, Abrams B, Iyer SB, Hoffman RA, Bishop JE. Determination of CD4 antigen density on cells: Role of antibody valency, avidity, clones, and conjugation. Cytometry 1998;33:197–205.
18. Bikoue A, George F, Poncelet P, Mutin M, Janossy G, Sampol J. Quantitative analysis of leukocyte membrane antigen expression: Normal adult values. Cytometry 1996;26:137–147.
19. Rousseeuw PJ. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 1987;20:53–65.
20. Stuchly J, Kalina T. Analyses of large flow cytometry datasets. Cytometry A 2014;85A:203–205.
21. Hrusak O, Basso G, Ratei R, Gaipa G, Luria D, Mejstríkova E, Karawajew L, Buldini B, Rozenthal E, Bourquin JP, Kalina T, Sartor M, Dworzak MN. Flow diagnostics essential code: A simple and brief format for the summary of leukemia phenotyping. Cytometry B Clin Cytom 2014;86B:288–291.
