A Crowd-Based Process and Tool for HTS Hit Triage.

DOI: 10.1002/minf.201200154

Zhengwei Peng,*[a] Paul Gillespie,[b] Martin Weisel,[b] Sung-Sau So,[b] W. Venus So,[a] Rama Kondru,[b] Arjun Narayanan,[b] and Johannes C. Hermann[b]

Keywords: Cheminformatics · Drug design · Medicinal chemistry

Given a large set of confirmed HTS hits generated by a new drug discovery project, it is necessary to prioritize and select a smaller set for the down-stream steps of lead identification and lead optimization. The quality of such a hit selection is critically connected to the effectiveness and potentially the ultimate success of a drug discovery project. In this report we present an HTS hit triage process based on a wisdom of crowds approach. Participants provide votes and comments on suggested compounds from the confirmed HTS hits via a web voting tool in an independent and decentralized way. We discuss insights gained and lessons learned from applying this process and the associated tool to several drug discovery projects. Finally, this web voting tool could easily be modified and adapted for any crowd-based decision making on compound selections.

High-throughput screening (HTS) of a large corporate compound collection (~10^5 to 10^6 compounds) is one of the well-established strategies to generate hits against a new drug target.[1–3] The initial hits are first confirmed, either by rescreening at the same screening concentration or in more elaborate bio-assays such as measuring dose-response curves.[4] For those confirmed HTS hits (~10^2 to 10^3), additional analysis is required to further prioritize and select a manageable number (~10^1) of hits for the down-stream hit-to-lead step, which involves the actual design and synthesis of new compounds by medicinal chemists. Figure 1 depicts this HTS hit triage process as a filtering process involving careful decision making. The quality of this decision is paramount, as one wants to achieve the following objectives: a) to retain as many as possible of the high-value hits that have a good chance of progressing successfully through the lead identification and optimization process; b) to remove as many as possible of the low-value hits that have a lower chance of further progression.
This would ensure that precious hit-to-lead resources are not spent on low-value hits but are allocated to genuine high-value hits. Given the high stakes involved, it is not surprising that multiple strategies for HTS hit triage have been utilized in drug discovery projects. One strategy is to assign the full responsibility for hit triage to one or a few individuals. The advantages of this approach are the smaller amount of resources required and a degree of quality assurance from the committed individual(s), who will have a strong sense of ownership of and responsibility for the hits selected. As shown by Lajiness et al. and Kutchukian et al., the main drawback of this approach is that the quality of the HTS hit triage is heavily influenced by the biases of the few people involved.[5–6] The major conclusion from their studies is that chemists are not very consistent, either individually or with peers, in rejecting undesirable compounds or selecting desirable ones embedded inside a larger set of molecules, a fact that would contribute to significant variability in the outcome of HTS hit triage. To avoid this pitfall, Lajiness et al. recommended a team-based approach to ensure the robustness of such a decision making process.[5]

In recent years, increasing evidence has shown that for many types of decision making, the collective wisdom of the crowds consistently outperforms the decisions of individuals.[7–8] By utilizing a team-based approach, 11 experts with diverse scientific backgrounds and organizational affiliations evaluated 64 chemical probes and offered medium and high confidence on 48 out of the 64 probes suggested as a part of the NIH Molecular Libraries and Imaging Initiative.[9] However, those authors did not mention any specific tools either developed or utilized for the experts to make their collective decisions on those chemical probes. At Johnson & Johnson, a large-scale library enrichment effort was successfully guided by a crowd-based voting process involving 145 chemists from six research sites and five therapeutic areas.[10] A voting module was implemented within a well-established in-house cheminformatics desktop tool. This module provided the voters with an intuitive way to select attractive molecules to be purchased and added to their corporate compound collection.

Mol. Inf. 2013, 32, 337 – 345
In this article, we report, for the first time, the application of a wisdom of crowds approach to HTS hit triage. Our experience from more than ten years of this approach supports the published results described above. In addition, our process combines the advantages of both the crowd-based and the committed-individual strategies. The process and its associated web voting tool aim to meet the essential requirements for effective decision making: a) diversity in opinion; b) independent assessment; c) decentralization; and d) proper vote aggregation. Moreover, this HTS hit triage process and the voting tool also need to be highly streamlined, efficient, and user-friendly to ensure rapid adoption and effective use by the targeted voter community. The design of the process and its associated web voting tool is described below, alongside summaries of actual voting results from four projects that illustrate some of the insights we have gained in applying this HTS hit triage process.

[a] Z. Peng, W. V. So, pRED Informatics, Hoffmann-La Roche Inc., 340 Kingsland Street, Nutley, New Jersey 07110, USA; *e-mail: [email protected]
[b] P. Gillespie, M. Weisel, S.-S. So, R. Kondru, A. Narayanan, J. C. Hermann, pRED Discovery Chemistry, Hoffmann-La Roche Inc., 340 Kingsland Street, Nutley, New Jersey 07110, USA

© 2013 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim

Figure 1. The HTS hit triage process presented in this report. Data gathering, analysis, and initial hit selection are done by the triage leaders. Crowd votes and comments are gathered via a web tool. See the main text for details.

Overview of Hit Selection Workflow. The crowd-based HTS hit triage we have implemented is depicted in Figure 1. After HTS hit confirmation, the HTS hit triage process typically starts with about 10^3 confirmed hits. Typically a computational chemist and a medicinal chemist team up to become the HTS triage leaders (committed individuals) for a new drug discovery project. This team decides which key data and additional annotations are needed for group decisions. This data set typically includes potency against the chosen protein target, ligand efficiency,[11] selectivity data, and basic physicochemical descriptors such as molecular weight (MW), cLogP, and polar surface area (PSA). Furthermore, each compound is annotated with historic bioassay data against other protein targets. The historic HTS data are further abstracted into drug target classes to simplify the overview (e.g. kinases, GPCRs, proteases, etc.; see Figure 2 a). Data from low-throughput assays, typically Kd or IC50 values, are also included when available.

Figure 2. Screen shots from the web voting tool as seen by a voter. Proprietary molecular structures from a real project have been replaced by public structures downloaded from PubChem (http://pubchem.ncbi.nlm.nih.gov/), and proprietary assay names have been changed to generic names. a) Compounds are displayed in the main web voting page, two records per row. The key properties pertinent for decision making are also presented to voters. Numerical data fields are configured for conditional coloring to aid data interpretation. Users can click on the cluster drill-down web link to show the cluster viewer (Figure 2 b). Voters use the “Vote” combo box and the “Comment” text field to vote and provide additional comments as they see fit. Voters are encouraged to save intermediate voting results using the “Save votes” button, and the time stamp of the last vote saved is displayed below the “Save votes” button to remind users to save often as a guard against any potential data loss. The abbreviation LTS stands for low-throughput screening data, such as an IC50 measurement. b) The cluster drill-down view for Cluster 2 displayed in Figure 2 a. For members of Cluster 2 with IC50 confirmation data, pertinent data are displayed. For those cluster members that have no confirmation assay data, only the % inhibition data from the primary HTS results are displayed. LigEff_PCT_INHIB stands for ligand efficiency based on the primary % inhibition HTS data. Its detailed definition is beyond the scope of this report.

Importantly, the triage leaders perform clustering and a preliminary selection to ensure that the number of compounds recommended to the voting group is within a reasonable range (from 50 to 150). Typically a cluster is represented by only a single compound selected according to project-specific criteria such as potency, ligand efficiency,[11] or selectivity. For clustering we use a method based on Maximum Common Structure (MCS)[12–13] and Maximum Overlapping Set (MOS),[14] although conceptually the hit triage process depicted in Figure 1 is independent of the clustering method used. Singleton hits from the cluster analysis are usually not selected for voting but rather passed on, for consideration, to the project team formed later.

Once all of the recommended compounds are annotated with assay data and computed properties, up to 130 chemists from different Roche sites worldwide (the crowd) are invited by the triage leaders to provide their independent assessments of each compound using a web voting tool. In order to make data review and voting as straightforward as possible, voters are presented with the molecular structure and key properties of each compound in a simple card layout (see Figure 2 a). To facilitate better decision making, each compound also has a hyperlink that lets the voter view the molecular structures and assay data for other members of the same cluster and examine the SAR (Structure Activity Relationship) within that cluster (see Figure 2 b). There are three options for a vote (+1 for acceptance, 0 for neutral, and -1 for rejection). Optional comments on particular compounds or clusters are collected through free text entries for each cluster. As seen in Figure 2 a, the vote field is set by default to “None” until a vote is recorded for that compound. A “0” or neutral vote indicates that a voter has actually reviewed the data associated with a compound but does not have a strong positive or negative opinion on that compound. After the voting period is over, the triage leaders summarize and analyze the voting results provided by the crowd via the web voting tool.
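The representative-per-cluster selection described above can be illustrated with a minimal Python sketch. It assumes that cluster assignments already exist (the MCS/MOS clustering itself is not reproduced here) and that ligand efficiency is the project-specific ranking criterion; the `Hit` record, its field names, and the compound identifiers are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    compound_id: str          # hypothetical identifier
    cluster_id: int           # assumed to come from a prior clustering step
    ligand_efficiency: float  # assumed ranking criterion for this sketch


def pick_cluster_representatives(hits):
    """Return one representative per multi-member cluster, keyed by cluster id."""
    clusters = defaultdict(list)
    for hit in hits:
        clusters[hit.cluster_id].append(hit)

    representatives = {}
    for cluster_id, members in clusters.items():
        if len(members) < 2:
            # Singletons are not put to the vote in the process described above.
            continue
        representatives[cluster_id] = max(
            members, key=lambda h: h.ligand_efficiency
        )
    return representatives


hits = [
    Hit("cpd-001", 1, 0.31),
    Hit("cpd-002", 1, 0.42),
    Hit("cpd-003", 2, 0.28),
    Hit("cpd-004", 2, 0.25),
    Hit("cpd-005", 3, 0.50),  # singleton cluster
]
representatives = pick_cluster_representatives(hits)
for cluster_id, hit in sorted(representatives.items()):
    print(cluster_id, hit.compound_id)
```

A project that ranks by potency or selectivity instead would only need a different `key` function; the grouping and the singleton rule stay the same.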
The crowd votes (+1, 0, -1) for each individual compound are added together to give the “net vote” for that compound. Positive votes (+1) increase the “net vote” score, while negative votes (-1) lower it. A “None” vote does not contribute to the “net vote”. As a consequence, the most promising compounds accumulate high net votes, while undesirable ones receive negative net votes. The triage leaders then focus on the top compounds with high net votes, review the comments, and gather additional information (e.g., ease of synthesis, molecular stability, structural novelty, etc.). Finally, the triage leaders organize a meeting for the voters, present the summarized results for the compounds with higher net votes, discuss and resolve any issues raised in the comments with the help of the individual comment authors, and then finalize the prioritized selection of hits. The outcome of this HTS hit triage process is a data package with many annotations in the form of key assay data, computed properties, group votes, and comments. This data package is then passed to the drug discovery project team to start the down-stream hit-to-lead process.
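The net-vote aggregation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' Pipeline Pilot protocol; the function names, voter names, and compound identifiers are hypothetical, and `None` stands for the default “None” entry of a compound that a voter did not vote on.

```python
from typing import Dict, List, Optional, Tuple

Vote = Optional[int]  # +1, 0, -1, or None when no vote was recorded


def net_votes(votes_by_voter: Dict[str, Dict[str, Vote]]) -> Dict[str, int]:
    """Sum the +1 / 0 / -1 votes per compound; None does not contribute."""
    totals: Dict[str, int] = {}
    for votes in votes_by_voter.values():
        for compound_id, vote in votes.items():
            totals.setdefault(compound_id, 0)
            if vote is None:
                continue
            if vote not in (-1, 0, 1):
                raise ValueError(f"unexpected vote {vote!r} for {compound_id}")
            totals[compound_id] += vote
    return totals


def rank_by_net_vote(totals: Dict[str, int]) -> List[Tuple[str, int]]:
    """Highest net vote first; ties are broken by compound id."""
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


votes_by_voter = {
    "voter_a": {"A": 1, "B": -1, "C": 0},
    "voter_b": {"A": 1, "B": -1, "C": None},
    "voter_c": {"A": 0, "B": 1, "C": None},
}
totals = net_votes(votes_by_voter)
ranking = rank_by_net_vote(totals)
print(ranking)
```

Note that a neutral vote and a missing vote both leave the net vote unchanged; they differ only in what they say about whether the voter reviewed the compound, which is why a summary would normally also report the number of votes of each kind.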


Previous authors have suggested four criteria as required for intelligent and effective decision making through the wisdom of crowds: a) diversity in opinion; b) independence; c) decentralization; and d) proper aggregation.[7,8,10] The larger group of voting chemists (~10^2) represents sufficient diversity of opinions. Since all voters are given the same information on which to vote and are not influenced by anyone before or during the voting period, the HTS hit triage process depicted in Figure 1 also satisfies the criteria of independence and decentralization.

Implementation Detail. The triage leaders utilize various ways to gather bioassay information from corporate databases, compute ADMET descriptors, and perform cluster analysis. Properties such as ligand efficiency, lipophilic ligand efficiency, cLogP, and PSA are calculated in a series of Accelrys Pipeline Pilot (http://accelrys.com/products/pipeline-pilot/) protocols. Roche databases are queried for historical data on each compound. These include not only activity data but also physicochemical data when available (e.g. solubility). The preselection of compounds to be voted on is done by plotting various properties within clusters in order to identify the least promising ones (e.g. too hydrophobic, not ligand efficient, etc.). Furthermore, some clusters are ruled out because of obvious undesired substructures. This does introduce a level of personal bias from the triage leaders; however, most clusters are removed because of undesired properties, which minimizes the impact of personal bias on the pre-selection of compounds. After preparation of the data, a set of SD[15] files is passed on to the web voting tool. The first SD file contains molecular structures and data fields for all compounds selected for the group vote (as seen in Figure 2 a). Then, for each compound, there is an SD file that contains the molecular structures and data fields of the other members of the same cluster (as seen in Figure 2 b).
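As a rough illustration of the efficiency metrics mentioned above, the sketch below uses definitions that are common in the medicinal chemistry literature: ligand efficiency approximated as 1.37 × pIC50 divided by the heavy-atom count (kcal/mol per heavy atom near room temperature), and lipophilic ligand efficiency as pIC50 minus cLogP. These are assumptions made for this example; the definitions used in the authors' protocols (see their reference [11]) may differ, and the function names and input values are hypothetical.

```python
import math


def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


def ligand_efficiency(pic50: float, heavy_atoms: int) -> float:
    """Assumed definition: LE ~ 1.37 * pIC50 / number of heavy atoms."""
    if heavy_atoms <= 0:
        raise ValueError("heavy atom count must be positive")
    return 1.37 * pic50 / heavy_atoms


def lipophilic_ligand_efficiency(pic50: float, clogp: float) -> float:
    """Assumed definition: LLE = pIC50 - cLogP."""
    return pic50 - clogp


# Hypothetical hit: IC50 = 100 nM, 20 heavy atoms, cLogP = 3.0
pic50 = pic50_from_ic50_nm(100.0)
le = ligand_efficiency(pic50, 20)
lle = lipophilic_ligand_efficiency(pic50, 3.0)
print(round(pic50, 2), round(le, 4), round(lle, 2))
```

Both metrics normalize potency, by size and by lipophilicity respectively, which is what makes them useful for flagging clusters that are "not ligand efficient" or "too hydrophobic" during preselection.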
The web voting tool is also implemented as an Accelrys Pipeline Pilot web application. The votes and comments for all compounds to be voted on are stored on the server side of the web voting tool as tab-separated text files, one line per compound and one file per voter. A voter-specific text file is created on the server side the first time a voter starts using the web voting tool and is then updated each time the voter clicks on the “Save votes” button on the main page. If a voter ends a voting session and re-launches the web voting tool later, the votes and comments captured in the previous session are fully restored and displayed on the main page so that the voter can continue to vote. Two additional Pipeline Pilot protocols were implemented and are used by the triage leaders to summarize the vote results and prepare a report to facilitate the group meeting for final decision making. One protocol summarizes the votes and comments from all vote files. The second protocol generates a Microsoft PowerPoint slide deck with molecular structures and the summarized vote data, with one compound per page.

There are also a few noteworthy usability features of the web voting tool. Client-side JavaScript running inside the web browser reminds a voter to save his or her votes before closing a vote session. A time stamp is displayed on the main vote page to indicate the last time the votes were saved (see Figure 2 a). This time stamp serves as a handy reminder for the voter to save his or her votes frequently as an effective guard against possible data loss upon any unexpected exit from the voting session. More information related to the implementation of the web voting tool can be found in the supplemental material. The modular design of the Pipeline Pilot protocols allows easy customization of many aspects of the web voting tool (data fields to be displayed or ignored, labeling, font type, font size, text alignment, etc.). Conditional coloring of data fields based on the value of the data element is also enabled (see Figure 2 a). It is up to the triage leaders to decide how to customize the web voting tool to ensure that the most relevant factors are presented in a simple yet effective layout for voters to make informed decisions efficiently.

Large variations in individual votes. The major benefit of crowd-based decision making is its robustness when compared with the decisions made by any individual. We chose the following visualization to highlight this point. For Project 1 we assume that the top 10 % (8 compounds) of the voted hits according to the net votes are accepted by the group and the bottom 10 % (8 compounds) are rejected by the group, while the remaining 63 compounds are assigned to be neutral (no clear decision by the group). We then use two descriptors to assess the deviation of an individual’s choice from the group choice by counting a) how many group-accepted compounds are not accepted by that voter; and b) how many group-rejected compounds are not rejected by the same voter.
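The per-voter vote files and the two discrepancy descriptors defined above can be sketched together in Python. The column layout of the tab-separated file (compound id, vote, comment) is an assumption, since the report states only that there is one line per compound and one file per voter; all function names, voter names, and data values are hypothetical.

```python
import csv
import io
from typing import Dict, Optional, Set, Tuple

Votes = Dict[str, Optional[int]]


def parse_vote_file(text: str) -> Votes:
    """Parse one voter's file; assumed columns: compound id, vote, comment."""
    votes: Votes = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        compound_id, raw_vote = row[0], row[1]
        votes[compound_id] = None if raw_vote == "None" else int(raw_vote)
    return votes


def group_choice(votes_by_voter: Dict[str, Votes], n_top: int) -> Tuple[Set[str], Set[str]]:
    """Return (group-accepted, group-rejected) as the n_top best and worst net votes."""
    totals: Dict[str, int] = {}
    for votes in votes_by_voter.values():
        for compound_id, vote in votes.items():
            totals[compound_id] = totals.get(compound_id, 0) + (vote or 0)
    ranked = sorted(totals, key=lambda cid: (-totals[cid], cid))
    return set(ranked[:n_top]), set(ranked[-n_top:])


def discrepancy(votes: Votes, accepted: Set[str], rejected: Set[str]) -> Tuple[int, int]:
    """Count a) group-accepted compounds the voter did not accept and
    b) group-rejected compounds the voter did not reject."""
    missed_accepts = sum(1 for cid in accepted if votes.get(cid) != 1)
    missed_rejects = sum(1 for cid in rejected if votes.get(cid) != -1)
    return missed_accepts, missed_rejects


files = {
    "voter_1": "A\t+1\tgood potency\nB\t0\t\nC\t-1\ttoo hydrophobic\nD\tNone\t\n",
    "voter_2": "A\t+1\t\nB\t+1\t\nC\t-1\t\nD\t-1\t\n",
    "voter_3": "A\t0\t\nB\t-1\t\nC\t+1\t\nD\t-1\t\n",
}
votes_by_voter = {name: parse_vote_file(text) for name, text in files.items()}
accepted, rejected = group_choice(votes_by_voter, n_top=1)
deviations = {
    name: discrepancy(votes, accepted, rejected)
    for name, votes in votes_by_voter.items()
}
print(deviations)
```

For Project 1 the same calculation would use `n_top=8`, so that each voter maps to a point between (0, 0) and (8, 8).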
Each descriptor can vary from 0 (perfect agreement) to 8 (total disagreement). Figure 3 displays the discrepancy between each individual’s choices and the group choice for Project 1. Only one voter’s choice came close to the group choice, missing only one compound accepted by the group. Overall, a wide variation was observed among the votes from individual chemists. This echoes the findings reported by Lajiness, Kutchukian, and their co-workers.[5–6]

Figure 3. Great variations in the selections of individuals are seen when compared with the group selection. (0,0) means perfect agreement between an individual’s selection and the group selection, while (8,8) represents a total miss by an individual when compared with the group selection.

Importance of the neutral votes. In their publication, Lajiness and co-workers suggested that a neutral vote should be enabled to allow voters to deal with molecules for which they may not have sufficient information to either accept or reject.[5] Our experience with the actual voting data confirms their conclusion. Figure 4 shows the voting summaries from four HTS hit triage campaigns. In this case, the voters for all four projects are from the same medicinal chemistry department. For the first three projects, the HTS hits are all from the corporate compound set, which contains molecules that are more lead-like and drug-like. The hits for the fourth project, on the other hand, are fragment-like molecules from a fragment screening library. It is easy to see from Figure 4 that chemists are far more likely to be assertive when it comes to lead-like or drug-like molecules than fragment-like molecules, as they gave more +1 or -1 votes in the first three vote campaigns than in the fourth one. Without the neutral votes, the difficulty chemists have in evaluating fragment-like hits would not be so evident.

Voter comments capture valuable and diverse collective knowledge for sharing. In addition to the numerical votes of “+1”, “0”, and “-1”, the textual comments also capture valuable and diverse knowledge from the group, unmatched by that of any individual from the same group. Table 1 highlights this point using comments for the top hit from Project 1, which was accepted by most chemists, and the bottom hit, which was rejected by most chemists. Even after casting “+1” votes for the top hit, many voters still voiced their concerns about this hit and suggested ways to address those concerns (e.g., a potential toxicity issue related to the electron-rich aniline fragment inside the hit, and a potential false positive issue, as the hit is very simple and the SAR observed among its cluster neighbors might be too abrupt to be real). For the bottom hit rejected by most chemists, the comments allowed a much more comprehensive summary of what was wrong with this hit.

Value of making it simple for the voters. Before the current tool was rolled out, voting and commenting were done by exchanging Microsoft Excel files by email. Even though this process achieved the same benefits of seeking crowd


wisdom, the web voting tool significantly streamlines the workflow and makes it much easier for both the voters and the triage leaders to conduct the vote. Voter feedback after the rollout of the web voting tool confirmed the positive impact of this improvement.

Figure 4. Display of vote summaries in a three-dimensional descriptor space. The x-axis is the total number of “-1” votes received by a compound divided by the total number of voters; the y-axis is the total number of “+1” votes divided by the total number of voters; and the z-axis is the total number of neutral (“0”) votes divided by the total number of voters. Each compound is represented by a single point on a single plane defined by the equation x + y + z = 1. Compounds closest to the top corner received the most neutral votes, compounds near the bottom left corner received the most negative (“-1”) votes, and compounds near the bottom right corner received the most positive (“+1”) votes. a) Vote summary from Project 1, with 79 compounds voted on by 41 chemists. b) Vote summary from Project 2, with 84 compounds voted on by 48 chemists. c) Vote summary from Project 3, with 86 compounds voted on by 39 chemists. d) Vote summary from Project 4, with 132 compounds voted on by 17 chemists. The hits for Project 4 are small fragment-like molecules, unlike the lead-like and drug-like hits from the other three projects.

Role of the triage leaders. The triage leaders for a given HTS hit triage campaign play a very important role. Using their combined experience and skills, they are fully empowered to decide what data will be used for hit analysis and voting. They also have the technical know-how to gather the required data associated with the key decision factors (e.g. selectivity profile, ligand efficiency,[11] ratio between potency and toxicity, etc.) pertinent to the given drug discovery project. Finally, they perform the initial filtering (decision making) and select 50–150 compounds for the larger group to vote on. The triage leaders have to be mindful of the known pitfall of decisions made by one or a few individuals during this initial filtering step.[5–6] During


the voting period, the fruits of their labor are fully leveraged by many other voters to make informed decisions efficiently. We believe that this division of labor between triage leaders and voters strikes a reasonable balance between efficiency and independence.

Table 1. Sample voter comments provide additional insights into the diversity and richness of the collective knowledge only available from the crowd. LE is the abbreviation for ligand efficiency; SAR for structure activity relationship; and tox for toxicity.

For the top hit from Project 1:
[Voter_1] Good starting point with a lot of SAR known. Would be good if you could get away from the electron-rich aniline without compromising the tox. ratio.
[Voter_3] The structure might be too simple to be sub-micromolar in this assay. Fishy.
[Voter_4] Good potency. Reasonable ligand and lipophilic efficiency. Easy to do hit expansion because of simple chemistry. Concern: potential quinone precursor.
[Voter_5] This motif of two aryl or hetero-aryl groups separated by an amide appears frequently. Good ligand efficiency and potency. SAR may be contained in large cluster. Then again it may be too common and give selectivity issues.
[Voter_6] pre-arranged conformation and pretty potent – can teach you about the pharmacophore.
[Voter_7] good activity, selectivity and LigEff
[Voter_8] SAR falls fast
[Voter_9] Rapid SAR development likely
[Voter_10] Good potency. Could be worth a few analogs.
[Voter_11] SAR could be steep. In cluster data, the most related compounds lost potency when methoxy group was changed to methyl or chloro.

For the bottom hit from Project 1:
[Voter_1] low LE
[Voter_2] Narrow IC50/CC50 window, might not be real.
[Voter_3] Too hydrophobic for a starting point.
[Voter_4] The last three compounds with IC50 are from the previous vendor collaboration and are kinase inhibitors. The ligand efficiency for most of the compounds is low, probably not worth to follow up
[Voter_5] Looks just like kinase inhibitors. Stay away from kinases.
[Voter_6] SAR falls off fast
[Voter_7] Again similar to cluster 2. Diaryl amides…
[Voter_8] pre-arranged conformation

More effective reuse of valuable knowledge from previous hit-to-lead investment. Over the years, a drug discovery organization will typically have worked on many selected HTS hits against many protein targets. Most of these hit series are terminated along their progression paths from initial hits to potential drug candidates because of various undesirable experimental outcomes (e.g., low synthetic feasibility, low stability, or undesirable ADMET outcomes). The time and cost associated with generating a body of knowledge about the context and underlying reasons for the termination of lead series are very significant, as actual compound synthesis and experimental assay measurements are required. Therefore it is imperative for a drug discovery organization to collect and reuse this body of valuable knowledge to guide future decision making, for example in HTS triage and selection. The challenge is that a significant portion of the knowledge is not explicit but tacit.[16] By its very nature, such tacit knowledge is accumulated by individual chemists through their experience of working on specific lead series in discovery projects. Because of the vastness of chemical space, the tacit knowledge accumulated by each individual chemist through actual project experience is necessarily limited to certain areas of expertise and may have only a small overlap with the knowledge of another chemist who has worked on different lead series for other discovery projects. This may be the main reason behind the low consensus observed when chemists are asked to select desirable or reject undesirable compounds from a larger set.[5–6] The only way to ensure that the accumulated tacit knowledge of a drug discovery organization is reused is to engage as many individual chemists as possible in an efficient manner. We have analyzed the comments provided by voters for the HTS hits from three projects and extracted comments that specifically state that knowledge from previous projects on related hits was used (Table 2). For compounds 1.1 and 3.4, voters reported that similar compounds were selected for hit-to-lead development in past projects and later proved experimentally to be unstable. For compounds 1.2, 2.3, and 3.2, voters reported that similar compounds were also selected for hit-to-lead but that it was difficult to achieve the desired ADME objectives. This type of voter comment triggers further data gathering and analysis by the triage leaders and project chemists, ensuring that low-value hits with a low chance of success are not followed up experimentally.

Given a large set of confirmed HTS hits, it is necessary to prioritize and select a smaller set for down-stream lead development and lead optimization. The quality of such decision making is critically connected to the effectiveness and even the ultimate success of the drug discovery project. Given such high stakes, hit selections made by individual chemists or even a small group are not robust enough for this purpose.[5–6] In this report we present an HTS hit triage process which combines a) the expertise of two designated triage leaders (a medicinal chemist and a computational chemist) to gather pertinent key data for judging a compound’s quality and to make the initial selection of 50–150 compounds; and b) the wisdom of crowds to vote and comment on those 50–150 suggested compounds via


a web voting tool in an independent and decentralized way. The experience we have gathered through the application of this HTS hit triage process in multiple projects has demonstrated that a) choices made by individual voters do vary significantly, as expected, with respect to the crowd choice; b) the neutral vote is very useful for cases where voters do not have sufficient information to either accept or reject a compound; c) voter comments contain a rich and diverse set of collective knowledge unmatched by that of any individual; and d) the web voting tool is an effective way to engage many chemists, draw out the valuable tacit knowledge they have accumulated through actual project experience, and use it for more informed decisions in HTS hit triage. Finally, we believe that the web voting tool could easily be modified for any crowd-based decision making involving compound selections (compounds to be used as chemical probes,[9] compounds to be purchased,[10] HTS hits to be followed up (this work), virtual compounds to be prioritized and synthesized, etc.).

Table 2. As reflected in voter comments, valuable knowledge gained from previous hit-to-lead efforts on other discovery projects is collected and reused to guide hit triage for new projects. Here all explicit discovery project names are replaced by the generic symbol XXX, and all names of chemists are replaced by xxxx.

Project ID | Compound ID | Sample voter comment
1 | 1.1 | The XXX team worked on compounds like this and found them to be unstable (they turn a dark color on exposure to air).
1 | 1.2 | In the XXX project it was hard to find compounds in this series with a good balance of solubility, permeability, and clearance. I think that makes this a poor choice of starting point.
1 | 1.3 | We made some compounds with a urea attached to a good leaving group for the XXX project. They are acylating agents. Selectivity issues.
2 | 2.1 | Compounds like this were not stable in the XXX project (ask xxxx for details)
2 | 2.2 | Previous experience with stability of 3-amino-2-pyridones was not good (ask xxxx). But maybe the sulfonamides are more stable. Good lipophilic and ligand efficiency. Pyridine carries a concern about CYP inhibition, and is also probably responsible for the kinase activity.
2 | 2.3 | There was a Palo Alto XXX project that had a semicarbazone starting point like this. I think that they were never able to get rid of the semicarbazone. I would not want to have this substructure in the series I work on.
2 | 2.4 | from XXX project. Poor solubility, otherwise good properties. Chloro-pyridine is rather unreactive. Easy to avoid NK receptor affinity. Contact: xxxx
2 | 2.5 | Concern about potential GSH liability. In my experience in other projects (e.g., XXX, XXX), structures like this lose activity when you reduce the alpha, beta unsaturation.
2 | 2.6 | xxxx worked on this series for the XXX project. Optimization was difficult because the amine is very unreactive, so the chemistry space was limited. The single-point data for analogues suggests that this is not a promising series for XXX.
3 | 3.1 | These compounds undergo photocyclization. In the XXX project, the cyclized products had greater activity.
3 | 3.2 | This looks like a compound from the XXX project. I think that solubility was an issue for the series. Selectivity is not good.
3 | 3.3 | We worked on related (but fully aromatic compounds) in the XXX hit set and they had poor solubility.
3 | 3.4 | We followed up on some related compounds for the XXX project. The compounds were prepared by a CRO and by the time they reached us, they had decomposed. Please check the purity and identity of the hit.

Supporting Information

For illustrative purposes only, a sample dataset and a few simplified Pipeline Pilot scripts related to the web voting tool are available as supplemental material for download. Please consult the short user guide included in the same download package for more details.

Acknowledgements

The authors would like to thank Dr. Zhang Li and the Accelrys support staff for their valuable technical assistance in web tool development using Pipeline Pilot; Drs. Frank Yan, Tim Owens, Matthew Lucas, Nicholas Huby, and Andreas Kuglstatter for their interest and valuable feedback; Dr. Anne Song and Mr. Jeremy Gore for work related to data integration; and Dr. Paul Whitehead for his support throughout this project.

References

[1] J. R. Broach, J. Thorner, Nature 1996, 384 (Suppl.), 14–16.
[2] K. H. Bleicher, H.-J. Boehm, K. Muller, A. Alanine, Nat. Rev. Drug Discov. 2003, 2, 369–378.
[3] R. A. Goodnow Jr, Drug Discov. Today: Technol. 2006, 3, 367–375.
[4] G. M. Keseru, G. M. Makara, Drug Discov. Today 2006, 11, 741–748.
[5] M. S. Lajiness, G. M. Maggiora, V. Shanmugasundaram, J. Med. Chem. 2004, 47, 4891–4896.
[6] P. S. Kutchukian, N. Y. Vasilyeva, J. Xu, M. K. Lindvall, M. P. Dillon, M. Glick, J. D. Coley, N. Brooijmans, PLoS ONE 2012, 7(11), e48476. doi:10.1371/journal.pone.0048476
[7] J. Surowiecki, The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations, Little Brown & Company, New York, 2004.
[8] F. Khatib, F. DiMaio, Foldit Contenders Group, Foldit Void Crushers Group, S. Cooper, M. Kazmierczyk, M. Gilski, S. Krzywda, H. Zabranska, I. Pichova, J. Thompson, Z. Popović, M. Jaskolski, D. Baker, Nat. Struct. Mol. Biol. 2011, 18, 1175–1177.
[9] T. I. Oprea, C. G. Bologa, S. Boyer, R. F. Curpan, R. C. Glen, A. L. Hopkins, C. A. Lipinski, G. R. Marshall, Y. C. Martin, L. Ostopovici-Halip, G. Rishton, O. Ursu, R. J. Vaz, C. Waller, H. Waldmann, L. A. Sklar, Nat. Chem. Biol. 2009, 5, 441–447.
[10] M. D. Hack, D. N. Rassokhin, C. Buyck, M. Seierstad, A. Skalkin, P. ten Holte, T. K. Jones, T. Mirzadegan, D. K. Agrafiotis, J. Chem. Inf. Model. 2011, 51, 3275–3286.
[11] A. L. Hopkins, C. R. Groom, A. Alex, Drug Discov. Today 2004, 9, 430–431.
[12] H. Bunke, Proc. Int. Conf. Vision Interface, 2000, Montreal, Quebec, Canada, 2000, pp. 82–88.
[13] D. Conte, P. Foggia, C. Sansone, M. Vento, Inter. J. Pattern Recogn. Artif. Intell. 2004, 18, 265–298.
[14] M. Stahl, H. Mauser, M. Tsui, N. R. Taylor, J. Med. Chem. 2005, 48, 4358–4366.
[15] A. Dalby, J. G. Nourse, W. D. Hounshell, A. K. I. Gushurst, D. L. Grier, B. A. Leland, J. Laufer, J. Chem. Inf. Comp. Sci. 1992, 32, 244–255.
[16] M. Polanyi, The Tacit Dimension, University of Chicago Press, Chicago, 1966, p. 4.

Mol. Inf. 2013, 32, 337–345


Received: November 30, 2012; Accepted: March 2, 2013; Published online: April 9, 2013

www.molinf.com
