Mining echocardiography workflows for disease discriminative patterns.

Ritwik Kumar, PhD, Tanveer Syeda-Mahmood, PhD, David Beymer, PhD, Colin Compas, PhD, Karen Brannon, MS
IBM Research - Almaden, San Jose, CA

Abstract

Objective: To give medical practitioners quick diagnostic insights into echocardiograms by analyzing only the echocardiogram workflows (defined as the sequence of modalities examined).

Methods: We define a dictionary of workflows, called subflows, which are commonly encountered in echocardiography workflows but are mutually exclusive. We represent each workflow as a mixture of dictionary subflows and learn discriminative models for various cardiac diseases using Support Vector Machines. Using these discriminative models, we can predict occurrences of diseases for any previously unseen echocardiogram workflow.

Results: Working with a corpus of 2300 echocardiogram workflows, we build a dictionary of 172 subflows. Using the associated expert-created reports, we identify the ground-truth diagnoses. We then build discriminative models for 7 different cardiac diseases. Using just the workflow as input, these models can predict diseases with over 75% accuracy on average.

Conclusions: By mining a collection of echocardiography workflows, we are able, for the first time, to predict diseases without looking at the image contents.

Introduction

Sixty years after its invention [1], echocardiography remains a critical tool in the hands of cardiologists for the diagnosis and treatment of a multitude of cardiac diseases. Its ubiquity can be attributed to its relatively noninvasive nature, ease of use, low cost, and the array of useful clinical information it provides about the heart's structure, motion and blood flow. Echocardiography examines the heart with ultrasound waves. By capturing reflected ultrasound, it can construct detailed 2D or 3D images of the heart as well as characterize the blood flow therein (Doppler). In the course of an exam, an expert sonographer switches among various modalities (3D Video, 2D Video, M-mode, CW-Doppler, PW-Doppler and their hybrids), systematically examining the heart walls, valves and blood flow from various viewpoints and making critical measurements along the way. The American College of Cardiology, the American Heart Association, and the American Society of Echocardiography, among others, have provided detailed guidelines [2, 11, 12, 13, 14, 15] and protocols in this regard. In addition to prior knowledge about the patient and these guidelines, findings during the course of an exam also guide the sonographer in deciding which modality to look at next. Typical echocardiogram modalities are depicted in Figure 1.

A radiologist or a cardiologist typically examines the echocardiogram conducted by the sonographer post hoc for the final diagnosis. This stage is arduous (a typical echocardiogram can generate up to 3000 image frames and text) and expensive (since specialists' time is costly). Clearly, this is an opportunity for medical informatics to reduce both the cost and the effort. Towards this end, automatic intelligent summarization [3] of such exams has been attempted, in which key frames from different views and modalities are selected. Though useful, such summarizations do not provide any diagnostic insight to the doctors. Methods [4] that do provide diagnostic insights tend to tackle the problem per modality, are computationally expensive and time consuming, and do not take the holistic approach to diagnosis (fusing information across modalities) recommended in the guidelines [2].

In this paper, we examine the hypothesis that, given a large enough collection of echocardiogram workflows (each defined as the sequence of echocardiogram modalities in the order of their examination), patterns indicative of various diseases can be identified. This novel approach is midway between using the echocardiogram workflow for mere summarization and using a single modality for diagnostic insight. Furthermore, we take a completely data-driven approach and thus do not bias the system to look for specific rule-based patterns.


Starting with a collection of workflows, we build a dictionary of subflows, each defined as a subsequence of a workflow, which are used as a basis to represent workflows. The subflows are selected based on both their frequency of occurrence and their mutual exclusiveness. These subflows can be thought of as a way to extract features from a workflow. Once all the workflows have been transformed into the feature space, we use the labels obtained from expert-created reports to train one Support Vector Machine (SVM) [24] per disease, so that it can discriminate between diseased and control cases. Given a new, unseen workflow, we extract the same features and use the SVMs to predict the presence or absence of various diseases.

The rest of the paper is organized as follows. We begin with a brief survey of existing work on workflows in general, radiology workflows and echocardiography workflows. We then describe the corpus of data we use, its properties and a few preprocessing steps. Next, we describe the echocardiogram workflows and their representation using a subflow dictionary. This is followed by a description of discriminative model building using SVMs. Finally, we present empirical results and concluding remarks.

Figure 1. Typical modalities encountered in an echocardiogram. (a) 2D Video (used to study heart structure) (b) CW Doppler (used to study blood flow through heart valves) (c) PW Doppler (used to study blood flow in a localized region) (d) Color Doppler (used to study blood flow in context of the heart structure) (e) M Mode (used to study local structure movements over time) (f) Text (used to record or lookup measurements).

Related Work

The general topic of workflow analysis has been of interest to researchers for a long time. There is a large body of literature that looks at workflow mining as a business process optimization problem [5]. This work is largely focused on identifying deviations from a standard workflow or discovering a workflow from a collection of noisy and incomplete activity logs [6, 7]. In the medical field, given the high costs and complexity, there has been an increased focus in the policy sphere on standardizing workflows and automatically recording them for applications like auditing [8]. There have also been some attempts to automatically analyze workflows in order to standardize processes such as surgeries and to use them for teaching [9]. Automatic mining of workflows from radiology departments in particular has been used to assess the quality of recorded data and deviations from standard workflows [10].


In the domain of echocardiography, with the advent of digital imaging, there has been increased focus on providing guidelines for workflows [11, 12, 13, 14, 15], especially in the context of diagnosing diseases. Note that most of the time these guidelines do not dictate a specific order in which modalities must be examined; rather, they describe a set of modalities that should be used in tandem to make certain measurements. To the best of our knowledge, automatic mining of echocardiography workflows has not been explored thus far. Automatic processing of complete echocardiogram studies has been limited to video summarization [3, 18], though at times workflow information has been leveraged to accomplish it [3]. There are various methods that try to automatically detect diseases and compute clinical measurements, but they are often focused on a single modality [4, 16, 19, 22].

Within this landscape of workflow and echocardiography literature, this paper presents a novel approach to predicting diagnostic information by mining a collection of echocardiogram workflows. Not only do we explore a novel hypothesis, we also present a novel method, dictionary-based workflow classification, to accomplish the stated goals.

Echocardiogram Data Corpus

The corpus of data we work with is composed of 2300 echocardiogram exams and their associated expert-created reports. The echocardiogram data was provided to us in the form of a collection of video frames. These frames belong to video segments, called runs, each of which belongs to one of the modalities depicted in Figure 1. To determine the workflow, we need to determine the modality that each run belongs to. In the absence of the associated DICOM headers, and given the large number of runs, we used an automatic method to assign a label to each run. Note that in certain instances, a run was composed solely of a single 2D Video frame.
By manually analyzing the runs, we determined certain characteristic image templates for each modality. These templates are shown in Figure 2. For each of these templates, we also determined the likely locations of their appearance in various modalities. For each frame, using 2D cross-correlation, we obtained a matching score for each of the templates shown in Figure 2. Finally, using a rule-based system (the rules were built by manual analysis of the corpus), we classified each run as one of the following: (a) 2D: a run with a single 2D frame; (b) 2DV: a 2D video run; (c) Text; (d) PW: a PW Doppler run; (e) CW: a CW Doppler run; (f) Color: a color Doppler run; (g) MM: an M-Mode run; and (h) Und: an undefined run, a catch-all label for runs that could not be assigned to any other class. Please note these abbreviations; we use them to denote modalities in the rest of this manuscript.
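The template-matching step can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: it assumes a normalized variant of 2D cross-correlation, frames and templates stored as nested lists of gray levels, and a hypothetical single-threshold rule (`threshold=0.8`) standing in for the manually built rule set, which the text does not spell out.

```python
import math

def ncc(patch, template):
    """Normalized cross-correlation between two equally sized 2D lists."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0:  # flat patch or flat template carries no usable signal
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom

def best_match(frame, template):
    """Slide the template over the frame; return (best score, (row, col))."""
    th, tw = len(template), len(template[0])
    best = (-1.0, (0, 0))
    for r in range(len(frame) - th + 1):
        for c in range(len(frame[0]) - tw + 1):
            patch = [row[c:c + tw] for row in frame[r:r + th]]
            score = ncc(patch, template)
            if score > best[0]:
                best = (score, (r, c))
    return best

def classify_frame(frame, templates, threshold=0.8):
    """Hypothetical rule: take the best-scoring template, else fall back to 'Und'."""
    scores = {label: best_match(frame, t)[0] for label, t in templates.items()}
    label = max(scores, key=scores.get)
    return label if scores[label] >= threshold else "Und"
```

In practice the rules would also use the expected template locations mentioned above; the sketch ignores location and keeps only the matching score.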

Figure 2. Image templates used to identify the modality a run belongs to.

There were a total of 143,786 runs identified in the corpus; the distribution of the associated labels is presented in Figure 3(a). 2D Video runs make up over 40% of the runs, while undefined cases make up less than 0.001% of the total.

Obtaining disease labels for echocardiograms also involved automatic processing. The reports associated with the echocardiograms in our corpus were created by experts but are composed of unstructured text. Using a dictionary-based approach within the Apache Unstructured Information Management Architecture (UIMA) [23], we obtained a list of diagnoses from each report. From the list of diseases, we identified the seven most frequent ones (TR: Tricuspid Regurgitation; MR: Mitral Regurgitation; LVH: Left Ventricular Hypertrophy; AS: Aortic Stenosis; AR: Aortic Regurgitation; AF: Atrial Fibrillation; Lev: Levocardia). These diseases and their frequency of occurrence (number of reports) are presented in Figure 3(b). For each disease, the remaining echocardiograms (from the total of 2300) were considered the control cases. The dictionary-based approach ensured that there were no false positives in our list of diagnoses.

Figure 3. (a) Distribution of various modalities in our echocardiogram data corpus. (b) Distribution of disease labels automatically identified using the associated expert-created reports. TR: Tricuspid Regurgitation; MR: Mitral Regurgitation; LVH: Left Ventricular Hypertrophy; AS: Aortic Stenosis; AR: Aortic Regurgitation; AF: Atrial Fibrillation; Lev: Levocardia.

Echocardiogram Workflow

As mentioned before, during the course of an exam a sonographer switches among various echocardiogram modalities while exploring the heart. The order of modalities explored is guided by both the standard guidelines and the immediate findings. Consider the following example: guidelines for the assessment of Aortic Stenosis dictate that CW Doppler be used to investigate the jet velocity. This requires the use of 2D Video to position the beam over the aortic valve before CW Doppler can be used. It is also recommended that Color Doppler be used to avoid recording eccentric regurgitant jets from the mitral valve, which may cause erroneous CW Doppler readings [12]. These recommendations translate to most sonographers using approximately similar sequences of modalities to investigate Aortic Stenosis. Furthermore, if evidence of Aortic Stenosis is detected, the sonographer is likely to repeat the pattern to confirm the finding.

Motivated by this, we define an echocardiography workflow, for the purposes of this paper, as the temporally ordered sequence of modalities examined in a given echocardiogram exam. A typical echocardiogram exam in our collection is about 65 modalities long, while the maximum length observed is 211 modalities.

Note that echocardiogram guidelines sometimes define the workflow down to an even more granular level of specificity: the viewpoint. Viewpoint refers to the combination of position and angle at which the ultrasound probe is placed on the patient. At times the viewpoint data associated with a run is available in the DICOM header. Since our corpus lacks DICOM headers, we did not have direct access to this information. That said, there are automatic methods [17, 20, 21] for determining the viewpoint in a given echocardiogram frame (with associated inaccuracies), but for the sake of simplicity, we define our workflow simply in terms of the modalities explored.

Here is an example of a typical echocardiogram workflow: {2DV, 2DV, 2DV, 2DV, 2DV, Color, 2DV, Color, Color, 2DV, Color, CW, Color, CW, 2DV, 2DV, MM, 2D, 2DV, 2D, 2D, TEXT, Color, CW, Color, PW, CW, Color, PW, PW, MM, 2DV, 2D}. We denote each workflow by wi and the set of all workflows by W = {w1, w2, … , w2300}. As is evident, representing a workflow with an explicit listing of all the modalities can be cumbersome to read. Thus, in Figure 4 we present a state flow diagram representation of the echocardiogram workflow. A description of how to read such diagrams is provided in the caption.


Figure 4. State transition diagram for a workflow given as {2DV, 2DV, 2DV, 2DV, 2DV, Color, 2DV, Color, Color, 2DV, Color, CW, Color, CW, 2DV, 2DV, MM, 2D, 2DV, 2D, 2D, TEXT, Color, CW, Color, PW, CW, Color, PW, PW, MM, 2DV, 2D}. The rectangular boxes represent the different modalities and the links between them represent transitions. The start state is indicated with a darker rectangle. The links are color coded to represent their directions. As shown in the legend box, a link from a node to a destination node listed to its right is colored gray, while a link in the reverse direction is colored red. The opacity of a link indicates its frequency in the workflow. The links are also labeled with the transition number. For example, in the workflow above, the first four transitions are all from 2DV to 2DV, indicated by a yellow loop on 2DV. Transition number five, 2DV to Color, is gray, while transition number six, back from Color to 2DV, is red.
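The numbered transitions described in the caption can be recovered directly from the modality sequence. The following is a minimal sketch, assuming a workflow is stored as a Python list of modality labels; the function names `transitions` and `transition_counts` are illustrative and not part of the original system.

```python
from collections import Counter

# The example workflow from the text, as an ordered list of modality labels.
workflow = [
    "2DV", "2DV", "2DV", "2DV", "2DV", "Color", "2DV", "Color", "Color",
    "2DV", "Color", "CW", "Color", "CW", "2DV", "2DV", "MM", "2D", "2DV",
    "2D", "2D", "TEXT", "Color", "CW", "Color", "PW", "CW", "Color", "PW",
    "PW", "MM", "2DV", "2D",
]

def transitions(workflow):
    """Ordered (source, destination) pairs; index k holds transition k + 1."""
    return list(zip(workflow, workflow[1:]))

def transition_counts(workflow):
    """How often each directed transition occurs (drives link opacity)."""
    return Counter(transitions(workflow))

steps = transitions(workflow)
counts = transition_counts(workflow)
```

Here `steps[4]` is the fifth transition (2DV to Color) and `steps[5]` the sixth (Color to 2DV), matching the caption.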

Echocardiogram Workflow Representation

Automatically discovering patterns in data requires a definition of similarity or distance between any pair of entities. This allows us to compare given entities and look for patterns that are common among similar and dissimilar entities. We approach this problem by first preparing a common basis for the representation of workflows. Without a common basis, comparing workflows is a challenge since they can differ in length and composition.

Dictionary Construction: We begin by defining a dictionary of salient subflows. We call the dictionary elements subflows because of the way they are derived and because they are typically shorter than a workflow. The objective is to assemble a set of subflows, which we will eventually use to represent workflows, that is both descriptive and distinctive. We require the subflows to be descriptive so that the representation of other workflows is not too erroneous, and we require them to be distinctive so that we do not encode redundant information in our basis dictionary.

First, we analyze all possible pairs of workflows in our collection and derive the longest common subsequence (LCS) [25] of each pair. Note that an LCS does not have to be contiguously present in the parent workflows. As an example, if w1 = {2DV, 2DV, Color, CW, PW, MM, 2DV, 2D} and w2 = {Color, CW, Color, CW, 2DV, 2DV}, then LCS(w1, w2) = {Color, CW, 2DV}. From our corpus, we derived over 4 million subflows using this pairwise analysis. To reduce the number of elements in this large set, we next retain only the 10,000 most frequently occurring subflows. This step should retain only the most descriptive subflows. Next, we remove subflows that are fewer than 3 modalities long, so as not to admit subflows into our dictionary that would be found in almost all of the workflows. Finally, we carry out a pairwise analysis of the retained subflows, and if a set of subflows {s1, s2, …, sk} is found such that s1 ⊆ s2 ⊆ … ⊆ sk, where si ⊆ sj denotes LCS(si, sj) = si, we retain only sk. This is done because any information captured by s1 through sk-1 is present in sk, so discarding them further reduces the redundancy in our dictionary. After these steps, we are left with a collection of only 172 subflows that are sufficiently descriptive and distinctive. We represent the dictionary as D = {e1, e2, …, e172}, where the ei are the dictionary elements. The top five elements of the dictionary are presented in Figure 5.
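The construction steps above can be sketched in a few lines of Python. This is an illustrative reading of the procedure, not the authors' code: the function names are hypothetical, the LCS is computed with the textbook dynamic program rather than any particular algorithm from [25], and the quadratic pairwise loop is written for clarity rather than for a corpus of 2300 workflows.

```python
from collections import Counter
from itertools import combinations

def lcs(a, b):
    """Longest common subsequence of two sequences, returned as a tuple."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    out, i, j = [], 0, 0
    while i < n and j < m:  # walk the table to recover one optimal subsequence
        if a[i] == b[j]:
            out.append(a[i])
            i, j = i + 1, j + 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return tuple(out)

def is_subsumed(short, longer):
    """True when `short` is a strictly shorter subsequence of `longer`."""
    return len(short) < len(longer) and lcs(short, longer) == tuple(short)

def build_dictionary(workflows, top_k=10000, min_len=3):
    """Pairwise LCS, keep the top_k most frequent, drop short and subsumed ones."""
    counts = Counter(lcs(a, b) for a, b in combinations(workflows, 2))
    frequent = [s for s, _ in counts.most_common(top_k)]
    candidates = [s for s in frequent if len(s) >= min_len]
    return [s for s in candidates
            if not any(is_subsumed(s, t) for t in candidates)]

w1 = ["2DV", "2DV", "Color", "CW", "PW", "MM", "2DV", "2D"]
w2 = ["Color", "CW", "Color", "CW", "2DV", "2DV"]
example = lcs(w1, w2)
```

When several subsequences share the maximal length, `lcs` returns one of them; the text does not say how such ties were resolved in the original system.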

Figure 5. Top five elements of the subflow dictionary. Note that the top entry is simply a loop between the 2D Video and Color Doppler modalities. Also apparent is the fact that the 2D Video modality forms an anchor point from which transitions to most other modalities take place. Transitions between CW Doppler and Color Doppler are also frequent, as one would expect from the guidelines. Zoom in on the digital copy to see details.

Workflow Representation: Once the subflow dictionary has been created, we define a method to represent each workflow, no matter its composition or length, in terms of dictionary elements. We accomplish this by comparing a given workflow wi against each of the dictionary elements and recording a matching score based on the length of their LCS. The score is defined as pij = length(LCS(wi, ej)) / min(length(wi), length(ej)). This translates each workflow wi into a 172-dimensional vector pi., where each dimension shows the amount of overlap the workflow has with the corresponding dictionary element. Once workflows are in the 172-dimensional subflow space, any of various vector distances (e.g. cosine distance, Euclidean distance) can be used to compute distances between them. Thus the distance between two workflows wi and wj is defined as DDictionary(wi, wj) = d(pi., pj.), where d is any vector space distance.
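As a concrete illustration of the score pij and the resulting distance, here is a minimal sketch. It assumes non-empty workflows, a toy two-element dictionary invented for the example, and cosine distance as the choice of d; all function names are hypothetical.

```python
import math

def lcs_length(a, b):
    """Length of the longest common subsequence (row-by-row dynamic program)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def featurize(workflow, dictionary):
    """p_i. : one normalized LCS score per dictionary element."""
    return [lcs_length(workflow, e) / min(len(workflow), len(e))
            for e in dictionary]

def cosine_distance(p, q):
    dot = sum(x * y for x, y in zip(p, q))
    norm = math.sqrt(sum(x * x for x in p)) * math.sqrt(sum(y * y for y in q))
    return 1.0 - dot / norm if norm else 1.0

def dictionary_distance(wi, wj, dictionary, d=cosine_distance):
    """D_Dictionary(wi, wj) = d(p_i., p_j.)."""
    return d(featurize(wi, dictionary), featurize(wj, dictionary))

# Toy dictionary (an assumption for illustration; the real one has 172 entries).
toy_dictionary = [("2DV", "Color", "2DV"), ("Color", "CW", "Color")]
w = ["2DV", "2DV", "Color", "CW", "PW", "MM", "2DV", "2D"]
p = featurize(w, toy_dictionary)
```

For this workflow the first element is fully contained (score 1), while only two of the three modalities of the second element can be matched in order (score 2/3).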

Discriminative Disease Model

Using our novel representation of workflows, we seek to build models that can help us identify possible diagnoses. For this we call upon the Support Vector Machine (SVM) framework. An SVM is a classifier that, when used with two classes, tries to find a boundary in the data space such that the two classes are separated by the maximum possible margin. Given training data composed of a set of vectors and their labels, the SVM tries to find the parameters of this maximum-margin boundary. The boundary can be thought of as a hyperplane (a high-dimensional analogue of a line), which is characterized by a set of weights (α). Since the objective of this work is to predict the occurrence of each of the seven cardiac diseases identified before (see Figure 3), we build one SVM model for each of the diseases.
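To make the one-model-per-disease setup concrete, the sketch below trains a linear soft-margin classifier by full-batch sub-gradient descent on the hinge loss. This is a deliberately simple stand-in, not the SVM solver used in the paper, which the text does not name; the hyperparameters (`lam`, `lr`, `epochs`) and the function names are assumptions made for illustration.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Sub-gradient descent on lam/2 * |w|^2 + mean hinge loss.
    X: list of feature vectors; y: labels, +1 (diseased) or -1 (control)."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wk for wk in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wk * xk for wk, xk in zip(w, xi)) + b)
            if margin < 1:  # point is misclassified or inside the margin
                for k in range(dim):
                    grad_w[k] -= yi * xi[k] / n
                grad_b -= yi / n
        w = [wk - lr * gk for wk, gk in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(model, x):
    """Sign of the decision function w . x + b."""
    w, b = model
    return 1 if sum(wk * xk for wk, xk in zip(w, x)) + b >= 0 else -1

def train_per_disease(features, labels_by_disease):
    """One independent binary model per disease label."""
    return {disease: train_linear_svm(features, labels)
            for disease, labels in labels_by_disease.items()}
```

With the dictionary representation, each row of `features` would be a 172-dimensional vector pi., and the learned weight vector plays the role of α: dimensions with large absolute weight correspond to the subflows that matter most for that disease.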


Figure 6. SVM model parameters for the seven selected diseases (panels: TR, ASC, MR, AR, LVH, AF, Lev). A large absolute value along a dimension indicates high importance of the associated dictionary subflow in the discriminative model. TR: Tricuspid Regurgitation; MR: Mitral Regurgitation; LVH: Left Ventricular Hypertrophy; AS: Aortic Stenosis; AR: Aortic Regurgitation; AF: Atrial Fibrillation; Lev: Levocardia. Zoom in on the digital copy to see details.

In Figure 6, we present the SVM parameters (α) that were learned for the seven diseases; α is a 172-dimensional vector. A large absolute value along a dimension of α indicates the high importance of that particular dimension (and the associated dictionary element) in the disease discriminative model. In Figure 7 we present the three most important dictionary elements (subflows) for each of the seven diseases. Even within this small set, one can spot subflows that recur; for instance, the subflow ranked #3 for TR appears four times in this collection of twenty-one subflows.

Figure 7. Top three highest weighted subflows for each of the disease models (panels: TR, MR, LVH, AS, AR, AF, Lev). TR: Tricuspid Regurgitation; MR: Mitral Regurgitation; LVH: Left Ventricular Hypertrophy; AS: Aortic Stenosis; AR: Aortic Regurgitation; AF: Atrial Fibrillation; Lev: Levocardia. Zoom in on the digital copy to see details.


Results

For each disease and its associated control group, we conducted several blind experiments to test the workflow mining framework described above. In each experiment, we used 95% of the data to train the model and the remaining 5% to test it. We repeated these experiments 10 times and report the average accuracy in Table 1. In the section labeled “Subflow Dictionary”, the numbers under the column “Control” represent the percentage of test workflows labeled by experts as disease free that were correctly classified by our system. The numbers under the column “Diseased” show the same percentage for the cases labeled by experts as diseased. The “Mean” column reports the average of the “Diseased” and “Control” columns.

In addition to experiments on the proposed method, we also tested a baseline method for comparison. These results are presented in Table 1 in the section labeled “Longest Common Subflow”. Instead of defining the distance between workflows via a dictionary, here we simply use a score based on the overlap between two workflows. Under this definition, the distance between workflows wi and wj is given as DLCS(wi, wj) = length(LCS(wi, wj)) / min(length(wi), length(wj)). For the binary classification task, “Longest Common Subflow” performs hardly better than chance; recall that even a random classifier should be able to attain 50% accuracy. In comparison, the proposed dictionary-based method performs significantly better.

The application we are targeting is to quickly provide doctors with a summary of possible diseases in a given echocardiogram exam, so as to help them in their task of diagnosing the patient. Since our system can, on average, predict diseased cases with 77% accuracy, it is a critical value addition when used in conjunction with a traditional video summarization system. For context, an automatic method for disease detection that works with 2D Video and requires a substantial amount of image processing attains an average of 80% accuracy [16].
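The evaluation protocol can be sketched as follows. The text states the 95%/5% split and the 10 repetitions but not how the splits were drawn, so random shuffling with a fixed seed is an assumption here, as are the function names; `fit` and `predict` are placeholders for any training and prediction routine.

```python
import random

def per_class_accuracy(y_true, y_pred):
    """Return (control %, diseased %, mean %); labels are +1 diseased, -1 control."""
    def acc(label):
        idx = [i for i, t in enumerate(y_true) if t == label]
        if not idx:
            return float("nan")  # class absent from this test split
        return 100.0 * sum(y_pred[i] == label for i in idx) / len(idx)
    control, diseased = acc(-1), acc(+1)
    return control, diseased, (control + diseased) / 2.0

def random_split(n_items, test_fraction=0.05, rng=None):
    """Shuffle indices and hold out a fraction for testing."""
    rng = rng or random.Random(0)
    order = list(range(n_items))
    rng.shuffle(order)
    n_test = max(1, round(n_items * test_fraction))
    return order[n_test:], order[:n_test]

def repeated_holdout(X, y, fit, predict, repeats=10, seed=0):
    """Run the train/test experiment `repeats` times; one result tuple per run."""
    rng = random.Random(seed)
    results = []
    for _ in range(repeats):
        train, test = random_split(len(X), rng=rng)
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = [predict(model, X[i]) for i in test]
        results.append(per_class_accuracy([y[i] for i in test], preds))
    return results
```

Reporting the control and diseased accuracies separately, as Table 1 does, keeps a classifier that always predicts the majority class from looking better than it is.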

Table 1. Disease prediction accuracy percentage. Results from the proposed method “Subflow Dictionary” and a baseline method “Longest Common Subflow” are presented. In a given row, Control number is the accuracy of the systems on cases without the disease, Diseased number is the accuracy on cases with disease and Mean is the average of the previous two accuracy numbers. Higher the number, better the system. The proposed method can predict disease with over 75% accuracy on average.

                           Subflow Dictionary             Longest Common Subflow
Disease                    Control  Diseased  Mean        Control  Diseased  Mean
Tricuspid Regurgitation    66 %     74 %      70.0 %      59 %     61 %      60.0 %
Mitral Regurgitation       65 %     74 %      69.5 %      56 %     57 %      56.5 %
LV Hypertrophy             61 %     74 %      67.5 %      56 %     58 %      57.0 %
Aortic Sclerosis           62 %     71 %      66.5 %      54 %     52 %      53.0 %
Aortic Regurgitation       66 %     74 %      70.0 %      56 %     50 %      53.0 %
Atrial Fibrillation        67 %     78 %      74.0 %      53 %     53 %      53.0 %
Levocardia                 95 %     94 %      95.5 %      70 %     71 %      70.5 %
Overall                    68.9 %   77.0 %    72.9 %      57.7 %   57.4 %    57.6 %

Discussion and Conclusion

Echocardiogram workflows are known to be important, as is evident from the various guidelines that present modality-viewpoint sequence protocols for investigating various diseases. But never before has echocardiogram workflow data been explored for unknown and unseen patterns that are indicative of diseases. In this paper we present one of the first attempts to exploit these hidden patterns to obtain quick diagnostic insights into echocardiograms without looking into the image content. In addition to predicting the presence of diseases, our method can also produce evidence to support its opinion. For instance, if a workflow is classified as having Aortic Stenosis, we could use as evidence the highest-ranking subflows in the disease discriminative model for Aortic Stenosis (Figure 6) that are also present in the given workflow. Thus, the presented disease models are not mere black-box classifiers.


We find the current results encouraging, but at the same time our findings suggest various other tantalizing directions for further investigation. For instance, the subflows that are often used in diagnosing a large number of diseases could possibly be made part of the standard practice guidelines. If appropriate labels are available, one could also use the proposed mechanism to obtain insight into how different subflows correlate with age, gender or other physical conditions of the patients. Workflow descriptions obtained from sonographer trainees could be used to detect subflow patterns that are commonly used in error, so that they can be explicitly corrected. If viewpoint information is available, we could describe the workflow in more detail and could possibly obtain higher accuracy rates. From the data mining point of view, in the current work we have treated each disease independently. Looking at correlations between workflows across multiple diseases could further improve our disease detection accuracy. Also, the current dictionary-based representation of the workflows could be used to carry out unsupervised statistical analysis on the space of all workflows. This could possibly allow us to visualize workflows in 2D or 3D space.

It is important not to look at workflow-based disease prediction as an alternative to actual medical image analysis. Workflow-based prediction provides information complementary to what is present in the images and can be used to improve the performance of more detailed disease prediction systems. To conclude, we would like to reiterate that in this paper we have presented stories from our first few sorties into the as yet unexplored universe of echocardiography workflow mining. We have built models that can predict diseases with reasonably high accuracy without looking into an echocardiogram’s image content.

References

1. Siddharth Singh and Abha Goyal. The Origin of Echocardiography. Tex Heart Inst J. 2007; 34(4): 431–438.
2. http://www.asecho.org/i4a/pages/index.cfm?pageID=4475
3. Shahram Ebadollahi, Shih-Fu Chang, Henry Wu. Echocardiogram Video Summarization. Proceedings of SPIE Vol. 4325 (2001).
4. David Beymer, Tanveer Syeda-Mahmood, Arnon Amir, Fei Wang, Scott Adelman. Automatic Estimation of Left Ventricular Dysfunction from Echocardiogram Videos. MMBIA 2009.
5. W.M.P. van der Aalst, B.F. van Dongen, J. Herbst, L. Maruster, G. Schimm, A.J.M.M. Weijters. Workflow mining: A survey of issues and approaches. Data & Knowledge Engineering 47 (2003) 237–267.
6. Silva, Ricardo, Jiji Zhang, and James G. Shanahan. "Probabilistic workflow mining." Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining. ACM, 2005.
7. Sun, Weixiang, et al. "Incremental Workflow Mining with Optional Patterns and Its Application to Production Printing Process." International Journal of Intelligent Control and Systems 12.1 (2007): 45-55.
8. Reiner, Bruce. "New strategies for medical data mining, part 3: automated workflow analysis and optimization." Journal of the American College of Radiology: JACR 8.2 (2011): 132.
9. Blum, Tobias, et al. "Workflow mining for visualization and analysis of surgeries." International journal of computer assisted radiology and surgery 3.5 (2008): 379-386.
10. Lang, M., Kirpekar, N., Burkle, T., Laumann, S., & Prokosch, H. (2007). Results from data mining in a radiology department: the relevance of data quality. Studies in health technology and informatics, 129(1), 576.
11. IHE Cardiology Technical Committee, 3D/4D Echocardiography Workflow, 2011.
12. Baumgartner, H., Hung, J., Bermejo, J., Chambers, J. B., Evangelista, A., Griffin, B. P., & Quiñones, M. (2009). Echocardiographic assessment of valve stenosis: EAE/ASE recommendations for clinical practice. European Journal of Echocardiography, 10(1), 1-25.
13. Grayburn, M. D., Kraft, C. D., Nihoyannopoulos, P., Quinones, M. A., Rakowski, H., & William, J. (2003). Recommendations for evaluation of the severity of native valvular regurgitation with two-dimensional and Doppler echocardiography. J Am Soc Echocardiography, 16, 777-802.
14. Bonow, Robert O., Blase Carabello, Antonio C. de Leon Jr, L. Henry Edmunds Jr, Bradley J. Fedderly, Michael D. Freed, William H. Gaasch et al. "Guidelines for the management of patients with valvular heart disease: executive summary: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines (Committee on Management of Patients with Valvular Heart Disease)." Circulation 98, no. 18 (1998): 1949-1984.
15. Cheitlin, Melvin D., William F. Armstrong, Gerard P. Aurigemma, George A. Beller, Fredrick Z. Bierman, Jack L. Davis, Pamela S. Douglas et al. "ACC/AHA/ASE 2003 guideline update for the clinical application of echocardiography: summary article: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines (ACC/AHA/ASE Committee to Update the 1997 Guidelines for the Clinical Application of Echocardiography)." Journal of the American College of Cardiology 42, no. 5 (2003): 954-970.
16. Kumar, R., Wang, F., Beymer, D., & Syeda-Mahmood, T. (2010, June). Cardiac disease detection from echocardiogram using edge filtered scale-invariant motion features. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2010 IEEE Computer Society Conference on (pp. 162-169). IEEE.
17. Kumar, R., Wang, F., Beymer, D., & Syeda-Mahmood, T. (2009, June). Echocardiogram view classification using edge filtered scale-invariant motion features. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on (pp. 723-730). IEEE.
18. Ebadollahi, S., Chang, S. F., & Wu, H. (2002). Echocardiogram videos: summarization, temporal segmentation and browsing. In Image Processing. 2002. Proceedings. 2002 International Conference on (Vol. 1, pp. I-613). IEEE.
19. Kühl, Harald P., Marcus Schreckenberg, Dierk Rulands, Markus Katoh, Wolfgang Schäfer, Georg Schummers, Arno Bücker, Peter Hanrath, and Andreas Franke. "High-resolution transthoracic real-time three-dimensional echocardiography: quantitation of cardiac volumes and function using semi-automatic border detection and comparison with cardiac magnetic resonance imaging." Journal of the American College of Cardiology 43, no. 11 (2004): 2083-2090.
20. Qian, Y., Wang, L., Wang, C., & Gao, X. (2013). The Synergy of 3D SIFT and Sparse Codes for Classification of Viewpoints from Echocardiogram Videos. In Medical Content-Based Retrieval for Clinical Decision Support (pp. 68-79). Springer Berlin Heidelberg.
21. S. Zhou, J. Park, B. Georgescu, J. Simopoulos, J. Otsuki, and D. Comaniciu. Image-based multiclass boosting and echocardiographic view classification. In CVPR, pages 1559–1565, 2006.
22. D. Beymer and T. Syeda-Mahmood. Cardiac disease recognition in echocardiograms using spatio-temporal statistical models. In IEEE Computer Society Workshop on Mathematical Methods in Biomedical Image Analysis (MMBIA), pages 1–8, 2008.
23. Ferrucci, D., & Lally, A. (2004). UIMA: an architectural approach to unstructured information processing in the corporate research environment. Natural Language Engineering, 10(3-4), 327-348.
24. Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine learning, 20(3), 273-297.
25. Hirschberg, D. S. (1977). Algorithms for the longest common subsequence problem. Journal of the ACM (JACM), 24(4), 664-675.
