Technical Report

Development of a Personalized Training System Using the Lung Image Database Consortium and Image Database Resource Initiative Database

Hongli Lin, PhD, Weisheng Wang, PhD, Jiawei Luo, PhD, Xuedong Yang, PhD

Rationale and Objectives: The aim of this study was to develop a personalized training system using the Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI) database. Two capabilities are critical for an efficient training system: collecting, annotating, and marking a large number of appropriate computed tomography (CT) scans, and dynamically selecting suitable training cases based on the performance levels of trainees and the characteristics of cases.

Materials and Methods: A novel approach is proposed to develop a personalized radiology training system for the interpretation of lung nodules in CT scans using the LIDC/IDRI database. The system provides a Content-Boosted Collaborative Filtering (CBCF) algorithm that predicts the difficulty level of each case for each trainee when selecting suitable cases to meet individual needs, and a diagnostic simulation tool that enables trainees to analyze and diagnose lung nodules with the help of an image processing tool and a nodule retrieval tool.

Results: Preliminary evaluation of the system shows that a personalized training system for the interpretation of lung nodules is needed and is useful for enhancing the professional skills of trainees.

Conclusions: Developing personalized training systems using the LIDC/IDRI database is a feasible solution to the challenges of constructing a specific training program in terms of cost and training efficiency.

Key Words: Personalized radiology education; LIDC/IDRI; Content-Boosted Collaborative Filtering; lung nodule.

©AUR, 2014

Acad Radiol 2014; 21:1614–1622. From the Key Laboratory for Embedded and Network Computing of Hunan Province, School of Information Science and Engineering, Hunan University, 410082 Changsha, China (H.L., W.W., J.L.) and Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada (X.Y.). Received January 11, 2014; accepted July 21, 2014. Address correspondence to: H.L. e-mail: [email protected] ©AUR, 2014 http://dx.doi.org/10.1016/j.acra.2014.07.020

Lung cancer has been the deadliest cancer for both men and women over the last two decades (1). For example, in 2013, lung cancer was expected to account for 26% of all female cancer deaths and 28% of all male cancer deaths in America, according to the 2013 Cancer Statistics published by the American Cancer Society (1). A number of studies suggest that early detection and diagnosis is the most promising means of increasing the survival rate of patients (2). Therefore, considerable research has been undertaken to improve the detection and diagnostic accuracy of radiologists. A variety of computer-aided detection (CAD) systems were developed to assist radiologists in detecting lung nodules in computed tomography (CT) scans (3–6), and a number of computer-aided diagnosis (CADx) approaches were proposed to assist radiologists in distinguishing malignant nodules from benign ones (7–9). The CAD systems being developed for lung cancer detection and classification require training and evaluation based on CT images. It is also useful to assess the performance of different CAD systems developed by different research groups and to verify their potential clinical utility. The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI) have established a reference database, called the LIDC/IDRI database, that is publicly available to the medical imaging research community. This initiative was sponsored by the National Cancer Institute, further advanced by the Foundation for the National Institutes of Health, and accompanied by the Food and Drug Administration through active participation (10–13). In addition to supporting the assessment of the performance of various CAD methods, the LIDC/IDRI database has also inspired other research with a variety of applications (14–17). For example, Michael et al. (14) created the Content-Based

Academic Radiology, Vol 21, No 12, December 2014

Image Retrieval (CBIR) framework to retrieve images of similar nodules among CT images of pulmonary nodules from the collection provided by the LIDC. They compared three feature extraction methods: 1) Haralick co-occurrence, 2) Gabor filters, and 3) Markov random field. The results showed that the Gabor and Markov descriptors perform better than the Haralick co-occurrence method at retrieving similar nodules. To bridge the semantic gap between radiologists’ ratings and image features, Dasovich et al. (15) investigated the relationship between semantic and content-based similarity using LIDC. They developed a conceptual-based similarity model derived from content-based similarity to improve CBIR. The potential value of these resources is still under active exploration.

Adequate training is essential for radiologists to improve their diagnostic accuracy (18,19). With the widespread use of the Internet in education, computer-aided radiology training is becoming an effective approach because it overcomes the limitations of time, location, and personnel associated with traditional radiology training methods (20,21). A number of Internet services already exist from organizations such as the Radiological Society of North America, MedPix™, MyPACS, and SonoWorld, which provide convenient tools and a massive amount of training resources. Several e-learning or training systems have been developed for training medical students or radiology residents in the interpretation of mammograms, dental images, and other imaging modalities (22,23). For example, Guliato et al. (22) designed a teaching system named Interpretation and Diagnosis of Mammograms for training students in the interpretation of mammograms and diagnosis of breast cancer. Wu et al. (23) proposed a Web-based training method for the interpretation of dental images.
These services and training systems allow trainees to access a wider range of cases than those typically seen in everyday practice. However, maximizing the potential of these resources is a challenge. Currently, most available services suffer from a common limitation: they follow the one-size-fits-all static training paradigm, presenting the same content to all trainees despite differing individual needs. Although these services provide tools that enable trainees to retrieve cases in which they are interested, the retrieval tools passively select cases according to the keywords entered by trainees. In other words, the tools have no capability to recognize the knowledge and experience level of an individual and, thus, are not able to intelligently choose suitable cases that match his or her performance level. We feel that the development of personalized radiology training systems, which can intelligently choose appropriate cases for each trainee based on individual performance level, deserves more attention from researchers because such systems make training more effective and time-efficient.

Interpretation of radiographic imaging is a process of decision making, which requires broad knowledge and in-depth understanding of the underlying pathology and consequent radiographic manifestations. The science of radiologic expertise has always been a major focus for


researchers all around the world; decision making, the representation of expert knowledge, and perceptual processes are the research focuses of this area (21). Hence, the study results in this domain furnish some conclusions for radiologic training. Personalized radiology education in the interpretation of radiologic imaging has been gaining increased attention from researchers around the world. Researchers are exploring approaches for the development of personalized radiology education systems for the interpretation of radiographic imaging and have made some progress in this area (24–26). The user model approach, the collaborative filtering (CF) method, and the content-boosted CF (CBCF) algorithm were proposed to predict the difficulty levels of unseen cases for a given trainee in personalized radiology education.

Very few researchers focus on developing specific training systems for the interpretation of lung CT scans, in terms of detection and classification of lung nodules. As with other specialized radiology training systems, we believe that medical students and radiology residents specializing in lung imaging would benefit from such a training system. It would allow trainees to be exposed to a larger set of training cases and to be guided by tools through simulated diagnosis to improve their performance levels. Furthermore, appropriate feedback can be obtained in a timely fashion.

However, there are several obstacles to developing such a specific training system. One is the collection of a large number of suitable training cases. For an efficient training system, a case repository containing enough typical cases is critical. However, collecting a sufficient number of appropriate cases is a laborious and time-consuming process.
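As a rough illustration of the content-boosted CF idea referred to above, the following Python sketch first fills each trainee's unrated cases with a content-based estimate and then applies neighbor-based CF to the resulting dense matrix. It is a simplified sketch and not the algorithm used in this system: the case feature vectors, the difficulty rating scale, the inverse-distance content predictor, and all function names are assumptions made for illustration.

```python
import math


def content_fill(features, ratings):
    """Fill missing ratings (None) with a content-based estimate.

    Assumption: the estimate is the inverse-distance-weighted mean of the
    trainee's own ratings, with distance measured between case feature vectors.
    Each trainee is assumed to have rated at least one case.
    """
    pseudo = []
    for row in ratings:
        rated = [k for k, r in enumerate(row) if r is not None]
        filled = []
        for j, r in enumerate(row):
            if r is not None:
                filled.append(float(r))  # keep observed ratings unchanged
                continue
            weights = [1.0 / (1e-9 + math.dist(features[j], features[k])) for k in rated]
            filled.append(sum(w * row[k] for w, k in zip(weights, rated)) / sum(weights))
        pseudo.append(filled)
    return pseudo


def pearson(a, b):
    """Pearson correlation of two equal-length rating vectors (0.0 if undefined)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0


def cbcf_predict(features, ratings, trainee, case):
    """Predict the difficulty of `case` for `trainee` from the filled matrix."""
    pseudo = content_fill(features, ratings)
    means = [sum(row) / len(row) for row in pseudo]
    num = den = 0.0
    for other, row in enumerate(pseudo):
        if other == trainee:
            continue
        sim = pearson(pseudo[trainee], row)
        num += sim * (row[case] - means[other])
        den += abs(sim)
    if den == 0.0:
        return pseudo[trainee][case]  # no usable neighbors: fall back to content estimate
    return means[trainee] + num / den


# Hypothetical data: 4 cases described by 2 features, 3 trainees, difficulty 1-5.
features = [[3.0, 1.0], [10.0, 4.0], [25.0, 5.0], [6.0, 2.0]]
ratings = [[1, 3, None, 2], [1, None, 5, 2], [2, 4, 5, None]]
print(cbcf_predict(features, ratings, trainee=0, case=2))
```

Filling the matrix before computing similarities is what lets the CF step compare trainees who have few cases in common, which is the main motivation for boosting CF with content information.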
Once a large number of suitable training cases have been collected, marking and annotating these cases as instructions, references, or evaluation criteria (called the “gold standard”) for the training process becomes an issue. Usually, a panel of expert radiologists is invited to assess each scan by marking the boundary of findings and providing their assessment of the probability of malignancy and the appearance of lesions as annotation. Data deidentification is another tedious and time-consuming task. These difficulties may be part of the reason why only a small number of training systems are available today for the interpretation of lung nodules.

Another obstacle is improving training efficiency. Most of the available radiology training systems follow the one-size-fits-all static training paradigm, providing the same training cases to all trainees regardless of their individual needs. This situation often leads to lower training efficiency. Hence, a personalized training paradigm is necessary to enhance training efficiency.

In this article, we propose a specialized personalized training system for training medical students and radiology residents to detect and diagnose lung nodules based on the LIDC/IDRI database. Because this database is the result of the collective efforts of a consortium of professional institutions, the problems of collecting suitable training scans and annotating each scan have been well addressed; therefore, it improves the time efficiency of developing our training system. Also, a CBCF algorithm


was devised as part of our system to predict the difficulty level of each scan for each trainee when intelligently choosing optimal cases for that individual. This improves the training efficiency.

In this article, we focus on 1) the creation of a training database containing selected cases from the LIDC/IDRI database; 2) devising a prediction algorithm that can estimate the level of difficulty of a case for an individual, thus providing personalized training; and 3) the design and implementation of an integrated personalized training system.

The remainder of the article is organized as follows. The Methods section describes the three core components of our training system: 1) the training database based on the LIDC/IDRI database; 2) the CBCF prediction algorithm for personalized training; and 3) the training system. The Results and System Evaluation section presents the results of the training system and the evaluation method. The Conclusions section summarizes our work.

METHODS

Well-documented training cases, convenient tools, and an accurate prediction algorithm are the core components of an effective personalized training system. These components are described in detail below.

Data

A large repository of CT scans is critical to a training system. The LIDC/IDRI database contains 1018 cases, each of which includes images from a clinical thoracic CT scan and an associated Extensible Markup Language (XML) file that records the results of a two-phase image annotation process performed by four experienced thoracic radiologists. In the image annotation process, a comprehensive set of written instructions was available to each radiologist. These instructions evolved from a consensus-based process conducted over numerous telephone conferences and meetings of the 12-member LIDC Steering Committee (13). The 1018 CT scans were captured by scanners from several different manufacturers, including GE, Philips, Siemens, and Toshiba. The tube peak potential energies used for image acquisition were 120, 130, 135, or 140 kV. Tube current ranged from 40 to 627 mA. Slice thicknesses were 0.6, 0.75, 0.9, 1.0, 1.25, 1.5, 2.0, 2.5, 3.0, 4.0, or 5.0 mm. The reconstruction interval ranged from 0.45 to 5.0 mm. The in-plane pixel size ranged from 0.461 to 0.977 mm (13). The annotation data were stored in 1018 XML files. Four experienced radiologists were required to independently review each scan using a computer interface and to mark lesions that they identified as 1) “nodule ≥ 3 mm”, 2) “nodule
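
The acquisition parameters reported above can be turned into a simple consistency check when importing scans into a training database. The following Python sketch is illustrative only: the `ScanParams` record, its field names, and the `out_of_range_fields` helper are hypothetical and are not part of the LIDC/IDRI database or of the system described here; only the numeric ranges come from the text.

```python
from dataclasses import dataclass

# Ranges as reported in the text for the 1018 LIDC/IDRI scans.
KVP_VALUES = {120, 130, 135, 140}
SLICE_THICKNESS_MM = {0.6, 0.75, 0.9, 1.0, 1.25, 1.5, 2.0, 2.5, 3.0, 4.0, 5.0}


@dataclass(frozen=True)
class ScanParams:
    """Hypothetical record of acquisition parameters for one CT scan."""
    kvp: int
    tube_current_ma: float
    slice_thickness_mm: float
    reconstruction_interval_mm: float
    pixel_size_mm: float


def out_of_range_fields(p):
    """Return the names of fields that fall outside the reported ranges."""
    checks = {
        "kvp": p.kvp in KVP_VALUES,
        "tube_current_ma": 40 <= p.tube_current_ma <= 627,
        "slice_thickness_mm": p.slice_thickness_mm in SLICE_THICKNESS_MM,
        "reconstruction_interval_mm": 0.45 <= p.reconstruction_interval_mm <= 5.0,
        "pixel_size_mm": 0.461 <= p.pixel_size_mm <= 0.977,
    }
    return [name for name, ok in checks.items() if not ok]


# Example: one plausible record and one with two out-of-range values.
good = ScanParams(120, 300.0, 1.25, 1.25, 0.7)
bad = ScanParams(100, 30.0, 1.25, 1.25, 0.7)
print(out_of_range_fields(good), out_of_range_fields(bad))
```

A check of this kind only flags records that disagree with the published description of the database; it says nothing about the suitability of a scan as a training case.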
