Secondary triage classification using an ensemble random forest technique.

Technology and Health Care 23 (2015) 419–428 DOI 10.3233/THC-150907 IOS Press

Dhifaf Azeez a, K.B. Gan b,∗, M.A. Mohd Ali b and M.S. Ismail c

a Department of Control and Systems Engineering, University of Technology, Baghdad, Iraq
b Department of Electrical, Electronic and Systems Engineering, Universiti Kebangsaan Malaysia, Malaysia
c Department of Emergency Medicine, Universiti Kebangsaan Malaysia Medical Centre, Kuala Lumpur, Malaysia

Received 10 September 2014
Accepted 13 February 2015

Abstract.
BACKGROUND: Triage of patients in the emergency department is a complex task based on several uncertainties and ambiguous information. Triage must be implemented within two to five minutes to avoid potential fatality and increased waiting time.
OBJECTIVE: An intelligent triage system has been proposed for use in a triage environment to reduce human error.
METHODS: This system was developed based on the objective primary triage scale (OPTS) that is currently used in the Universiti Kebangsaan Malaysia Medical Centre. Both primary and secondary triage models are required to develop this system. The primary triage model has been reported previously; this work focused on secondary triage modelling using an ensemble random forest technique. A randomized resampling method was proposed to correct the class imbalance in the data prior to model development.
RESULTS: The results showed that 300% resampling gave a low out-of-bag error of 0.02, compared to 0.37 without preprocessing. This model has a sensitivity and specificity of 0.98 and 0.89, respectively, for the unseen data.
CONCLUSION: With this combination, the random forest reduces the variance and the randomized resampling reduces the bias, leading to the reduced out-of-bag error.

Keywords: Decision support system, emergency department, random forest, randomized resampling

∗ Corresponding author: K.B. Gan, Department of Electrical, Electronic and Systems Engineering, Universiti Kebangsaan Malaysia, Malaysia. Tel.: +603 8921 7149; Fax: +603 89118359; E-mail: [email protected].

© 2015 – IOS Press and the authors. All rights reserved. 0928-7329/15/$35.00

1. Introduction

Triage is an essential procedure in the Emergency Department (ED), aiming to categorize patients into the most appropriate treatment location. Triage categorizes the patient using limited information such as general appearance, medical history and vital signs. There are many international scales, such as the Australian, American, British and Canadian triage scales, to help the ED specialist categorize patients. The scales provide general rules for common cases to reduce errors. In addition, there are many localized triage systems, including the Taiwan Triage Scale [1], the Cape Triage Scale, the Geneva Emergency Triage Scale [2] and the Objective Primary Triage Scale (OPTS). OPTS is a locally
developed system by the Emergency Department of the Universiti Kebangsaan Malaysia Medical Centre (UKMMC), based on the Emergency Severity Index (ESI) [3] for triage acuity assessment. The output of these scales varies from three to five categories, ranging from resuscitation to non-emergency.

Manual triage decision-making is always complex and depends on the officer’s expert judgment and experience, the patients’ clinical history and the availability of resources. Moreover, the ED physician must quickly and accurately identify those patients who require more attention, so that the physician can intervene before the patients collapse while not overburdening the surgeon with non-emergent problems. Waiting time and ED care time are reduced when ED nurse staffing is within state-mandated levels, after controlling for ED census and patient acuity [4], and are increased with lower staffing levels. The three independent factors affecting the time of patients’ stays in the ED are the number of elective surgical admissions, the number of ED admissions, and hospital occupancy [5]. The length of stay increased from 1.08 to 14.27 minutes based on shift and additional patient numbers [5]. Other factors with significant effects on the time of treatment are admission hour, day of the week, patient volume, patient characteristics, hospital characteristics and area characteristics [6].

However, the above systems still require a clinical specialist to categorize patients into the appropriate triage level based on the triage scales. These manual systems, although simple, do not take advantage of the power of modern computing facilities, which leads to inconsistency and errors. Recently, various mobile and handheld devices have been adapted to different triage standards to assist ED physicians. Michalowski et al.
[7] developed the Mobile Emergency Triage (MET) system for pediatric emergency services using a mobile device. It uses rough set theory and fuzzy measures to extract rules from an incomplete data set. The overall mean accuracy of the MET system was slightly lower than, but not significantly different from, the accuracy of the ED physicians (70.2% for physicians vs 67.2% for MET) [8]. ITriage [9] is another mobile triage system, which has a mean accuracy of 67% with a standard deviation of 29%, whereas the use of the physiological list has a mean accuracy of 53% and a standard deviation of 23%. The disadvantages of mobile devices are their limited memory and their need for an internet connection to the data server.

Sadeghi et al. [10] developed a decision support system for emergency triage using a Bayesian network, for which data were extracted from the paper records of 90 patients with non-traumatic abdominal pain as their chief complaint. This system has a higher sensitivity than the physician (90% versus 64%) but a lower specificity (25% versus 48%). An expert system for abnormal diagnosis in emergency triage was developed by Lin et al. [11] using cluster analysis (Ward’s method and k-means) and decision-tree methods. This system depends on saved data and was not tested for accuracy on new input data; it also used limited data and did not take the patient’s description of the condition into account. Lin et al. [12] used cluster analysis and rough set theory as data mining tools to extract rules from emergency department data. This system followed the patients until they left the hospital, classifying them into five levels with an accuracy of 0.937. Chapman et al. [13] compared the ability of the chief complaint and of fifty-five clinical features from the ED to support automated syndromic surveillance systems.
They used the random forest method to classify the results and showed that the result improved compared with the use of the chief complaint alone.

Random forest is a classification or regression method developed by Breiman [14] and has been applied widely in the medical field [15–17]. It is an ensemble technique that uses the decision tree classifier as its base learner. Random forest is an enhancement of random tree classification, which considers a random subset of features at each node. A random forest is constructed by generating an ensemble of decision trees based on random vectors that are independent and identically distributed, and the trees are
then built from a set of features [18]. Additionally, random forest gives better performance with rare classes [19–21].

Model development using unbalanced data could reduce the accuracy of the system. Many methods are used to improve the classification accuracy for unbalanced data. Boosting [22], bagging [23] and randomization [24] have been implemented to construct ensemble models that improve the accuracy of a weak learner and enhance the classifier itself. The external approach works on data rebalancing and on cost-sensitive learning [21]. Improving the data and improving the classifier are further approaches to improving the accuracy. Ambroise and McLachlan [25] suggested that bias correction can be assessed using either an external resampling (bootstrap) or a cross-validation technique.

The main objective of this project is to develop an intelligent triage system to minimize the need for expert intervention in the ED. This system is intended to reduce waiting time during busy periods in the ED, so that admission of patients to hospital for observation and assessment becomes faster. Two classification models, namely the primary and secondary triage models, are required to achieve this objective. The primary triage model has been reported previously using an Adaptive Neuro-Fuzzy Inference System (ANFIS) and an Artificial Neural Network (ANN) model. The results showed that the accuracy on the training data was 99% for the ANN model and 96% for the ANFIS model [26]. For unseen data, the accuracy was 96.7% for the ANN model and 94% for the ANFIS model. The ANN model thus performed better than the ANFIS model on both training and unseen data, indicating better generalization.

This paper focuses on secondary triage modelling using the random forest technique. Randomized resampling was used as a pre-processing step to improve the accuracy on the unbalanced data and reduce the out-of-bag error of the developed classification model.
Because random forest works by reducing the variance without changing the bias, the combination of random forest and randomized resampling can reduce both the variance and the bias. As a result, the out-of-bag (OOB) error will be reduced.

2. Materials and methods

2.1. Triage data extraction and description

This is a retrospective study, for which 1912 records were obtained from OPTS and reviewed by the experts in the ED, UKMMC. The exclusion criteria were patients under 12 years old and patients who had no vital signs or who died at the triage counter or at the resuscitation stage. This study was approved by the research ethics committee of UKMMC. The data were divided into two sets: one was used to build the random forest and measure the OOB error, and the other was used as unseen data to measure the generalization of the random forest model.

The inputs to the model were the known cases, their chief complaints (for which the numeric codes are described below) and patient medical history, such as heart disease, hypertension, diabetes, epilepsy, cerebrovascular accident, asthma, chronic obstructive pulmonary disease, chronic kidney disease, hepatitis, gastritis, benign prostate hyperplasia, allergy, migraine and cancer. All the above variables were coded as numeric values of either zero or one. Zero represents a history of the disease and one represents the absence of that disease history. Vital signs (heart rate, blood pressure, temperature, respiratory rate, oxygen saturation) were the other features used as numeric inputs for the secondary triage model. The patient’s vital signs were usually measured by the medical officer.

One of the input features of this model is a free-text field for the chief complaint. In this work, the chief complaint was coded into numeric values according to the Canadian Emergency Department Information System (CEDIS) [27]. However, some common terms for chief complaint that are used in UKMMC are not listed in CEDIS. Therefore, custom numbers were created and added during model development. Table 1 shows the description and coded number for each chief complaint. The numeric values of the coded number for each chief complaint varied between 3 and 856. Some chief complaints that arise from different causes were grouped under the primary chief complaint; for example, there are two numeric codes for chest pain/discomfort: three for chest pain due to a cardiac cause, and four for chest pain associated with non-cardiac causes. Therefore, if the chest pain was accompanied by cardiac signs and symptoms (chest pain, sweating, difficulty breathing, palpitation, giddiness, hand numbness, nausea, vomiting), it was assigned the code for a cardiac cause (three). If there was chest pain alone, with no signs or symptoms of a cardiac cause, it was treated as non-cardiac chest pain and coded as four.

Table 1
Chief complaints and their numeric codes

Chief complaint                      Numeric code
Chest discomfort/pain                3, 4
Palpitation                          5
Weakness                             7
Collapsed/Fainted/Syncope            8
Limb/Extremity complaints            10, 555, 410, 554, 557
Motor vehicle accident (MVA)         13
Falls                                14
Dizziness/Giddiness                  15
Assault                              16
Industrial/Machinery injury          17
Nasal problems                       151
Shortness of breath                  651
GI bleeding                          260
Allergic reaction                    657
Burns & Scalds                       705
Fever                                852
Diabetic problems                    853
Abdominal discomfort/pain            251
Local infections & Abscesses         709
GI bleeding                          260
Back pain                            551
Headache                             404
Eye problems                         503, 505
Urinary problems                     307, 302, 306
Fits/Seizures                        405
Allergic reaction                    657
Ear problems                         51
Facial complaints                    107
Sports injury                        18
Scrotal/Testicular pain & swell      305
Abnormal behavior                    402
Bites & Stings                       701
Rectal complaints                    258
Overdose & Poisoning                 752
Nausea/vomiting                      257

Table 2
Description of the triage output and its occurrence

No.  Description of the triage output                     Coded class number  Occurrence
1    Resus                                                Zero                12
2    Emergency cases which are unstable                   One                 64
3    Emergency cases which are stable                     Two                 77
4    Urgent cases that need to be seen on the same day    Three               733
5    Non-emergent cases                                   Four                533
6    Patients with age above 65 years old                 Five                217
7    Push to redbox, if need further investigation        Six                 276

The secondary triage classes are shown in Table 2 with their numerical codes. The output of the secondary triage is one of these classes: resuscitation (Zero), emergency cases which are unstable (One), emergency cases which are stable (Two), urgent cases that need to be seen on the same day (Three), non-emergent cases (Four), patients aged above 65 years (Five) and push to Redbox if further investigation is needed (Six). Normally, elderly patients (> 65 years old) are triaged separately because they often lack clear complaints and have more frequent hospital admissions, increased resource utilization and a higher rate of adverse health outcomes [28]. In addition, care is negatively affected by the discomfort of emergency physicians when evaluating elderly patients [29]. Table 2 clearly
showed that the output classes in the data were unbalanced. Some classes rarely occur in real triage practice. The smallest class was label zero (12 cases), and the largest was label three (733 cases).

2.2. Random forest model

The random forest model was developed using the Waikato Environment for Knowledge Analysis (WEKA) [30]. WEKA is a data mining and machine learning tool developed by the Department of Computer Science, University of Waikato, New Zealand. This software has many features, including pre-processing, classification, regression and clustering algorithms. It also includes many statistical measurements and detailed per-class accuracy for the evaluation of results. The triage data were preprocessed using an unsupervised resample filter in WEKA. This preprocessing step was used to overcome the data imbalance problem. Resampling in WEKA creates a stratified subsample of the given dataset. The overall class distributions are approximately retained within the sample.

Breiman recommends choosing the highest possible number of trees and using the square root of the number of features. The training set is then constructed randomly from the selected features. For feature selection, there is a variety of techniques that implement subset selection, such as genetic algorithms [31], Scatter Search [32] and many others [33]. Therefore, the two parameters to set for the random forest are the number of trees, which was set to 700, and the number of randomly selected features, which was set to five, following Breiman’s suggestion. A new bootstrap training subset is used to train each tree, and no cost-complexity pruning is required. Each split is generated by randomly choosing a subset of predictors and determining the best split using only these predictors, and the resulting tree is saved.
The splitting in each tree stops when all the leaves of the tree have the same output label [34]. Finally, a majority vote is taken over all of the tree outputs. Squared error loss is used as an optimization method for decreasing test error through lowering prediction variance while leaving bias unchanged. The out-of-bag error was used in this work as the measure for selecting the best resampling ratio of the data [35]. The OOB error was calculated and compared across different resampling ratios as an estimator of the generalization accuracy to select a near-optimal sampling ratio. The original dataset was up-sampled from 150% to 500% of its original size in steps of 50 percentage points, using resampling with replacement. This means that an instance that has already been drawn remains available and can be drawn again. The original and resampled data were used to build different random forest models.

2.3. Model evaluation

As an internal validation for this model, approximately 36.8% of the data were used to calculate the OOB error for assessment of the model’s performance. Each tree was trained on a different set of vectors generated by the bootstrap procedure. Bootstrap is a statistical method based on sampling with replacement. The random forest is mainly a combination of bagging with random selection [23], which makes it more diverse than a single method. The bootstrap is used to generate the training set for each tree of the random forest by sampling the data set of n samples randomly with replacement. Because sampling is done with replacement, each training set will contain some repeated elements, while some elements of the original dataset are not picked [30]. On any single draw, a given element is picked with probability 1/n and missed with probability (1 − 1/n). The probability that it is missed on all n draws, and can therefore be used as a test instance, is given by Eq. (1).

\[ \left(1 - \frac{1}{n}\right)^{n} \approx e^{-1} \approx 0.368 \tag{1} \]
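The limit in Eq. (1) can be checked numerically. The following Python sketch is illustrative only and is not part of the WEKA workflow used in this study. The function names are hypothetical, and n = 1846 is assumed from the number of training samples listed in Table 3 for the model without resampling. The sketch evaluates (1 − 1/n)^n directly and compares it with a small simulation of bootstrap draws.

```python
import math
import random


def oob_fraction_exact(n: int) -> float:
    """Probability that one given instance is never drawn in n draws with replacement."""
    return (1.0 - 1.0 / n) ** n


def oob_fraction_simulated(n: int, trials: int = 50, seed: int = 0) -> float:
    """Average share of instances left out of a bootstrap sample of size n."""
    rng = random.Random(seed)
    left_out = 0
    for _ in range(trials):
        # Indices that appear at least once in this bootstrap sample.
        drawn = {rng.randrange(n) for _ in range(n)}
        left_out += n - len(drawn)
    return left_out / (n * trials)


if __name__ == "__main__":
    n = 1846  # assumed: original training-set size from Table 3
    print("exact     :", oob_fraction_exact(n))
    print("limit 1/e :", math.exp(-1))
    print("simulated :", oob_fraction_simulated(n))
```

Both the closed-form value and the simulated share should lie close to 1/e, which is why roughly one third of the instances are available as out-of-bag test cases for each tree.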


Fig. 1. Trees optimization of the secondary triage data to build a random forest model.

From Eq. (1), we can see that about 36.8% of the items in the dataset are not picked, while the rest (63.2%) make up the training items. This procedure implements an internal evaluation of the random forest model and eliminates the need for further evaluation such as cross-validation. The unpicked items are used for internal evaluation through the out-of-bag error.

The accuracy of the models is another model evaluation measure. Accuracy was calculated by dividing the number of correctly classified instances by the total number of samples. The reference output in this study is the triage output assigned by the medical officers at UKMMC. The accuracy was calculated for 66 unseen instances to measure the ability of the model to predict new patterns of the input variables. Furthermore, the confusion matrix of the unseen data for the selected model was computed, and the specificity and sensitivity were then calculated as follows:

\[ \text{Specificity} = \frac{\text{TN}}{\text{TN} + \text{FP}} \tag{2} \]

\[ \text{Sensitivity} = \frac{\text{TP}}{\text{TP} + \text{FN}} \tag{3} \]

Here, TP is the number of true positives, FP the number of false positives, TN the number of true negatives and FN the number of false negatives. A specificity of 100% means that the test recognizes all actual negatives, so that no negative case is misclassified as positive. A positive result in a high-specificity test is therefore used to confirm membership of the class. Conversely, a sensitivity of 100% means that the test recognizes all actual positives. Thus, in contrast to a high-specificity test, a negative result in a high-sensitivity test is used to rule out the disease [36].
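In a multi-class problem, Eqs (2) and (3) are applied to one class at a time, with the class of interest treated as positive and all other classes treated as negative. A minimal Python sketch of this one-vs-rest calculation follows. It is an illustration rather than the procedure used in WEKA. The function names are hypothetical, the counts are those reported in Table 4, and the sketch assumes that rows are predicted classes and columns are actual classes.

```python
# Class labels in the order used by Table 4.
CLASSES = ["Five", "Seven", "Three", "Four", "One", "Two", "Six"]

# Counts from Table 4.
# ASSUMPTION: each row is a predicted class, each column an actual class.
MATRIX = [
    [9, 1, 0, 0, 0, 0, 0],
    [0, 6, 0, 0, 0, 0, 0],
    [0, 0, 17, 0, 0, 0, 0],
    [0, 0, 1, 7, 0, 0, 0],
    [0, 2, 0, 0, 9, 0, 0],
    [0, 1, 0, 0, 0, 8, 0],
    [1, 1, 0, 0, 0, 0, 3],
]


def one_vs_rest_counts(matrix, k):
    """Return (TP, FP, FN, TN) for class index k, treating all other classes as negative."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fp = sum(matrix[k]) - tp                  # predicted as k, actually another class
    fn = sum(row[k] for row in matrix) - tp   # actually k, predicted as another class
    tn = total - tp - fp - fn
    return tp, fp, fn, tn


def specificity(tn, fp):
    """Eq. (2): share of actual negatives that are recognized as negative."""
    return tn / (tn + fp)


def sensitivity(tp, fn):
    """Eq. (3): share of actual positives that are recognized as positive."""
    return tp / (tp + fn)


if __name__ == "__main__":
    for k, name in enumerate(CLASSES):
        tp, fp, fn, tn = one_vs_rest_counts(MATRIX, k)
        print(f"{name:>5}: sensitivity={sensitivity(tp, fn):.3f} "
              f"specificity={specificity(tn, fp):.3f}")
```

Under this reading of the matrix, class Seven gives TP = 6, FP = 0, FN = 5 and TN = 55, which matches the corresponding entries of Table 5. Overall figures depend on how the per-class values are averaged, which the text does not specify.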


Table 3
Comparative evaluation of models

Model no.  Method                                    Number of training samples  % OOB   % Accuracy for the unseen data
Model 1    Random forest                             1846                        0.373   86.360
Model 2    Random forest with 150% data resampling   2769                        0.093   84.848
Model 3    Random forest with 200% data resampling   3692                        0.056   86.363
Model 4    Random forest with 250% data resampling   4615                        0.0342  84.848
Model 5    Random forest with 300% data resampling   5538                        0.020   87.878
Model 6    Random forest with 350% data resampling   6461                        0.011   87.878
Model 7    Random forest with 400% data resampling   7384                        0.007   86.363
Model 8    Random forest with 450% data resampling   8307                        0.004   86.363
Model 9    Random forest with 500% data resampling   9230                        0.002   86.363

Fig. 2. Number of samples versus OOB.
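The training-set sizes in Table 3 follow directly from applying the resampling percentages to the 1846 original training samples. The Python sketch below is illustrative only. It uses plain uniform sampling with replacement as a stand-in for the WEKA resample filter, whose exact behavior and options are not reproduced here, and the function names are hypothetical.

```python
import random


def target_sizes(n, start=150, stop=500, step=50):
    """Map each resampling percentage to the resulting number of samples."""
    return {pct: n * pct // 100 for pct in range(start, stop + 1, step)}


def resample_with_replacement(rows, percent, seed=0):
    """Draw len(rows) * percent / 100 rows uniformly at random, with replacement.

    ASSUMPTION: uniform draws over all rows; this is a simplification and
    not a reimplementation of the filter used in the study.
    """
    rng = random.Random(seed)
    size = len(rows) * percent // 100
    return rng.choices(rows, k=size)


if __name__ == "__main__":
    original = list(range(1846))  # placeholder row identifiers
    for pct, size in target_sizes(len(original)).items():
        sample = resample_with_replacement(original, pct)
        # Rows can repeat, so the number of distinct rows is below the sample size.
        print(pct, size, len(sample), len(set(sample)))
```

Because rows are drawn with replacement, the resampled sets contain many repeated instances, which is relevant when interpreting how quickly the OOB error falls as the resampling percentage grows.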

3. Results and discussion

A random forest of 700 trees was constructed using 5 random features as the parameters for the model. Figure 1 shows the OOB error versus the number of trees; from the figure, 700 trees give the lowest OOB error, 0.372, and the models were thus built from 700 trees. Several models were built: one without data resampling and the others with resampling percentages from 150% to 500% of the original data in steps of 50 percentage points. For all random forest models, with and without resampling, an accuracy of 100% was achieved on the training data set. The OOB error is shown in Table 3 along with the accuracy for the unseen data. The OOB error of the random forest without resampling is 0.372, while the models with pre-processed data have lower OOB errors depending on the resampling ratio, as shown in Fig. 2. In this figure, the first point represents the OOB error for the random forest model without data pre-processing, and the others represent the OOB errors for the random forest models built with different resampling ratios.

From the table and the figures, we can summarize that increasing the resampling ratio reduces the OOB error, with lower or equal accuracy for the unseen data. The data resampled to five times the original size give the lowest OOB error, 0.002. Only with 300% resampling of the data was a higher accuracy (87.878%) achieved for the unseen data. The model with 300% resampling was chosen for integration with the primary triage model to construct the final triage model, as a balance between high accuracy and low OOB error.

The confusion matrix for the unseen data for the selected model is given in Table 4. For each class, TP is the number of cases correctly assigned to that class, FP is the number of cases incorrectly assigned to that class, FN is the number of cases of that class that were assigned to another class, and TN is the number of cases of other classes that were not assigned to that class. These values are tabulated in Table 5 for classes five, seven, three, four, one, two and six, respectively. The sensitivity and specificity of the model were calculated from this table; they are 0.98 and 0.89, respectively.

Table 4
Confusion matrix for secondary random forest model for the unseen data

                     Actual classes
Predicted classes    Five  Seven  Three  Four  One  Two  Six
Five                    9      1      0     0    0    0    0
Seven                   0      6      0     0    0    0    0
Three                   0      0     17     0    0    0    0
Four                    0      0      1     7    0    0    0
One                     0      2      0     0    9    0    0
Two                     0      1      0     0    0    8    0
Six                     1      1      0     0    0    0    3

Table 5
TP, FP, FN and TN for secondary random forest model for the unseen data

      Five  Seven  Three  Four  One  Two  Six
TP       9      6     17     7    9    8    3
FP       1      0      0     1    2    1    2
FN       1      5      1     0    0    0    0
TN      56     55     47    57   54   56   61

4. Conclusion

In this paper, an ensemble model using a random forest strategy has been used on unbalanced triage data to build a secondary triage model. To further reduce the OOB error, external randomized resampling has been implemented. Randomized resampling reduced the bias, while the random forest method works to reduce the variance without changing the bias. This reduced the OOB error of the random forest model. Accuracy and OOB error have been used as the evaluation criteria for choosing the best model for secondary triage prediction.
The results presented here show that increasing the resampling ratio of the data reduced the OOB error. Model 5 (300% resampling) was the best model for secondary triage, with an accuracy of 87.87% and an OOB error of 0.020. The sensitivity and specificity for the unseen data are high, with values of 0.98 and 0.89, respectively.


Acknowledgments

The authors would like to thank the Ministry of Education, Malaysia and Universiti Kebangsaan Malaysia for sponsoring this work under the Research University Grant: ERGS/1/2013/TK02/UKM/02/2 and UKM-GUP-2011-352.

References

[1] Chi CH, Huang CM. Comparison of the Emergency Severity Index (ESI) and the Taiwan Triage System in predicting resource utilization. J Formos Med Assoc. 2006; 105(8): 617-625.
[2] Bruijns SR, Wallis LA, Burch VC. A prospective evaluation of the Cape triage score in the emergency department of an urban public hospital in South Africa. Emergency Medicine Journal. 2008; 25(7): 398-402.
[3] Gonzalez J, Soltero R. Emergency Severity Index (ESI) triage algorithm: Trends after implementation in the emergency department. Bol Asoc Med P R. 2009; 101(3): 7-10.
[4] Chan TC, Killeen JP, Vilke GM, Marshall JB, Castillo EM. Effect of mandated nurse-patient ratios on patient wait time and care time in the emergency department. Acad Emerg Med. 2010; 17(5): 545-552.
[5] Rathlev NK, Chessare J, Olshaker J, Obendorfer D, Mehta SD, Rothenhaus T, Crespo S, Magauran B, Davidson K, Shemin R, Lewis K, Becker JM, Fisher L, Guy L, Cooper A, Litvak E. Time series analysis of variables associated with daily mean emergency department length of stay. Ann Emerg Med. 2007; 49(3): 265-271.
[6] Karaca Z, Wong HS, Mutter RL. Duration of patients’ visits to the hospital emergency department. BMC Emerg Med. 2012; 12: 15.
[7] Michalowski W, Rubin S, Slowinski R, Wilk S. Mobile clinical support system for pediatric emergencies. Decis Support Syst. 2003; 36(2): 161-176.
[8] Michalowski W, Slowinski R, Wilk S, Farion KJ, Pike J, Rubin S. Design and development of a mobile system for supporting emergency triage. Methods Inf Med. 2005; 44(1): 14-24.
[9] Padmanabhan N, Burstein F, Churilov L, Wassertheil J, Hornblower B, Parker N. A Mobile Emergency Triage Decision Support System Evaluation. In: System Sciences, 2006. HICSS ’06. Proceedings of the 39th Annual Hawaii International Conference on, 04-07 Jan. 2006; p. 96b-96b.
[10] Sadeghi S, Barzi A, Sadeghi N, King B. A Bayesian model for triage decision support. International Journal of Medical Informatics. 2006; 75(5): 403-411.
[11] Lin WT, Wang S-T, Chiang T-C, Shi Y-X, Chen W-Y, Chen H-M. Abnormal diagnosis of Emergency Department triage explored with data mining technology: An Emergency Department at a Medical Center in Taiwan taken as an example. Expert Systems with Applications. 2010; 37(4): 2733-2741.
[12] Lin WT, Wu YC, Zheng JS, Chen MY. Analysis by data mining in the emergency medicine triage database at a Taiwanese regional hospital. Expert Systems with Applications. 2011; 38(9): 11078-11084.
[13] Chapman W, Dowling J, Cooper GF, Hauskrecht M, Valko M. A Comparison of Chief Complaints and Emergency Department Reports for Identifying Patients with Acute Lower Respiratory Syndrome. In: Conference of the International Society for Disease Surveillance, 19-20/10/2006. 2006.
[14] Breiman L. Random Forests. Machine Learning. 2001; 45(1): 5-32.
[15] Jamal S, Periwal V, Scaria V. Predictive modeling of anti-malarial molecules inhibiting apicoplast formation. BMC Bioinformatics. 2013; 14(1): 55.
[16] Chen CW, Lin J, Chu YW. iStable: Off-the-shelf predictor integration for predicting protein stability changes. BMC Bioinformatics. 2013; 14 Suppl 2: S5.
[17] Stanislawski J, Kotulska M, Unold O. Machine learning methods can replace 3D profile method in classification of amyloidogenic hexapeptides. BMC Bioinformatics. 2013; 14: 21.
[18] Kuncheva LI. Combining Pattern Classifiers: Methods and Algorithms. Wiley-Interscience, Hoboken, New Jersey, 2004.
[19] Rong Y, Yan L, Rong J, Hauptmann A. On predicting rare classes with SVM ensembles in scene classification. In: Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP ’03). 2003 IEEE International Conference on, 6-10 April 2003. 2003; p. III-21-24 vol.23.
[20] Joshi MV, Agarwal RC, Kumar V. Predicting rare classes: can boosting make any weak learner strong? Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Edmonton, Alberta, Canada, 2002.
[21] Galar M, Fernandez A, Barrenechea E, Bustince H, Herrera F. A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches. Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on. 2012; 42(4): 463-484.
[22] Freund Y. Boosting a Weak Learning Algorithm by Majority. Information and Computation. 1995; 121(2): 256-285.
[23] Breiman L. Bagging predictors. Machine Learning. 1996; 24: 123-140.
[24] Dietterich T. An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization. Machine Learning. 2000; 40(2): 139-157.
[25] Ambroise C, McLachlan GJ. Selection Bias in Gene Extraction on the Basis of Microarray Gene-Expression Data. Proceedings of the National Academy of Sciences. 2002; 99(10): 6562-6566.
[26] Azeez D, Ali MAM, Gan KB, Saiboon I. Comparison of adaptive neuro-fuzzy inference system and artificial neutral networks model to categorize patients in the emergency department. SpringerPlus. 2013; 2(1): 1-10.
[27] Grafstein E, Bullard MJ, Warren D, Unger B. Revision of the Canadian Emergency Department Information System (CEDIS) Presenting Complaint List version 1.1. CJEM. 2008; 10(2): 151-173.
[28] Aminzadeh F, Dalziel WB. Older adults in the emergency department: A systematic review of patterns of use, adverse outcomes, and effectiveness of interventions. Annals of Emergency Medicine. 2002; 39: 238-247.
[29] Rutschmann OT, Chevalley T, Zumwald C, Luthy C, Vermeulen B, Sarasin FP. Pitfalls in the emergency department triage of frail elderly patients without specific complaints. Swiss Med Wkly. 2005; 135: 145-150.
[30] Witten I, Frank E, Hall M. Data Mining: Practical Machine Learning Tools and Techniques, Third Edition (The Morgan Kaufmann Series in Data Management Systems). Morgan Kaufmann, USA, 2011.
[31] Leardi R. Application of a genetic algorithm to feature selection under full validation conditions and to outlier detection. Journal of Chemometrics. 1994; 8(1): 65-79.
[32] García López F, García Torres M, Melián Batista B, Moreno Pérez J, Moreno-Vega M. Solving feature subset selection problem by a Parallel Scatter Search. European Journal of Operational Research. 2006; 169(2): 477-489.
[33] Liu Y, Li X, Wu Z. The feature subset selection algorithm. J of Electron (China). 2003; 20(1): 57-61.
[34] Lempitsky V, Verhoek M, Noble JA, Blake A. Random Forest Classification for Automatic Delineation of Myocardium in Real-Time 3D Echocardiography. In: Ayache N, Delingette H, Sermesant M (eds) Functional Imaging and Modeling of the Heart, vol 5528. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2009, pp. 447-456.
[35] Martínez-Muñoz G, Suárez A. Out-of-bag estimation of the optimal sample size in bagging. Pattern Recognition. 2010; 43(1): 143-152.
[36] Gosztolya G, Bánhalmi A, Tóth L. Using One-Class Classification Techniques in the Anti-phoneme Problem. In: Araujo H, Mendonça A, Pinho A, Torres M (eds) Pattern Recognition and Image Analysis, vol 5524. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2009, pp. 433-440.
