Performance appraisal of online MEDLINE access routes.

CJ Walker, KA McKibbon, RB Haynes, ME Johnston
Health Information Research Unit, Dept. of Clinical Epidemiology & Biostatistics, McMaster University, 1200 Main St W, Hamilton, Ont, Canada L8N 3Z5
(416)525-9140 x3133, FAX 416-546-0401, E-MAIL WALKERC@McMASTER

ABSTRACT
Objective: To compare the performance and cost of 11 online MEDLINE systems with MEDLINE at Elhill. Design: Comparative study. Systems: Eleven online daytime systems commercially available in North America offering the MEDLINE database. Measures: Number of relevant citations, number of irrelevant citations, proportion of searches producing no relevant citations and cost per relevant citation were analyzed for each system. Relevance and cost for each system were compared with direct searching of MEDLINE through NLM for librarian and clinician search strategies for 18 clinical questions. The citations retrieved by both strategies were pooled and rated for relevance on a 7-point scale. Results: Numbers of relevant and irrelevant citations and cost per relevant citation were higher for clinician searches than librarian searches, reflecting the higher total number of citations retrieved by the clinician approaches. A lower proportion of clinician searches produced no relevant citations than librarian searches. Conclusions: Eleven daytime MEDLINE systems performed similarly in terms of retrieval and cost within similar searching groups. Clinicians, however, tended to capture larger overall retrievals, resulting in higher numbers of relevant and irrelevant citations than librarians.

INTRODUCTION
Institutions and individuals are faced with a bewildering array of choices and claims when selecting a means of accessing MEDLINE. MEDLINE is produced by the National Library of Medicine (NLM) but is marketed by many vendors in various styles to appeal to specific audiences such as clinicians, researchers or librarians. In 1985 a study was undertaken to test all the available routes to MEDLINE (14 at the time) and compare them for the quantity and quality of retrieved citations, and the cost, time and ease of use of each system.[1] The routes included the products of vendors of MEDLINE (NLM, DIALOG, BRS, BRS After Dark, BRS Colleague, Knowledge Index and PaperChase) and gateway software products through which to search MEDLINE (SciMate, SearchMaster and InSearch). Six standardized searches were run through each route and it was found that accessing MEDLINE directly or through SearchMaster software offered the best combination of low cost and high yield.

Since the study was done there have been many changes in software, features and pricing policies for MEDLINE searching. Some front-end software is no longer marketed and new software has been introduced. NLM has produced and distributed GRATEFUL MED, which has been used by our group in studies evaluating end-user searching of MEDLINE.[2] Furthermore, numerous CD ROM versions of MEDLINE are now marketed, a searching technology that was not available at the time of the comparison study. The NLM sponsored 21 institutions to evaluate the characteristics and use of MEDLINE on CD ROM in 1988 but there were few simultaneous comparisons with online searching routes, and no attempts to provide simultaneous comparisons among all systems. Furthermore, the relevance of retrieval was not considered in most evaluations.[3]

The study reported here was undertaken for updating purposes and to address specific criticisms from some of the vendors whose products were tested in the previous study. Grievances included the subjectiveness of an 'ease of use' measure, belief that the search questions were too simple for a rigorous comparison of the routes, and the fact that librarians rather than end users performed the searches used in the comparison.[4] The main investigation addressed two questions.

1. Compared with direct searching on MEDLINE at Elhill, what are the relative performances (in terms of recall and currency) and costs (in terms of time and expense) for 13 online MEDLINE systems and 14 CD ROM versions of MEDLINE?
2. How do the searches of a) inexperienced end users, b) librarians who are experienced in MEDLINE searching, and c) vendor representatives compare in performance and efficiency on a given vendor's system?

This paper reports the costs and relevance of retrieval of the daytime online MEDLINE systems for librarian and clinician approaches to searching. All systems were compared with direct access to MEDLINE at the NLM

0195-4210/92/$5.00 © 1993 AMIA, Inc.


conducted by clinicians with only basic instruction in MEDLINE use. Searches performed at the midpoint of the study were chosen to allow participants to gain familiarity with the search software and process. Searches were examined for potential inclusion according to the following criteria: 1) the search must be prompted by a patient problem and address a problem related to that patient, 2) the question must be complete and understandable, 3) the six departments of Family Medicine, Medicine, Pediatrics, Psychiatry, Obstetrics and Gynecology, and Surgery must be represented by three searches each, one to represent each of the three areas of therapy, diagnosis and prognosis, and 4) no more than two questions may originate from the same searcher. From the pool of searches meeting these criteria, two searches were chosen randomly for trial runs and 18 for formal testing. A minimum of 13 searches was required to detect a difference in precision of 10% between any one system and direct searching through

because this system fared best in the first comparison study and its database is the source for all the other systems. We used clinical end-user searches for testing each system because they provided both authentic clinical questions and end-user search approaches. Vendor representatives' searches were included to verify that the study librarians were searching the systems properly and to best advantage.

MEDLINE Systems
All online systems and CD ROM products offering access to MEDLINE which were commercially available in Canada and the United States were identified and subscriptions were purchased. Clinical subsets and monthly and quarterly updates provided by some of the CD ROM companies were also acquired. Thirteen online systems and 14 CD ROM systems were compared. Online systems studied were NLM-direct, GRATEFUL MED for the PC, GRATEFUL MED for the Macintosh, BRS, BRS Colleague, DIALOG, DIALOG Medical Connection, Pro-Search DIALOG, Pro-Search BRS, PaperChase, Data-Star, and the nighttime systems, BRS After Dark and Knowledge Index. The CD ROM products were Compact Cambridge, CD Plus, SilverPlatter, EBSCO CD ROM, Bibliomed and a test version of Bibliomed Professional, Dialog OnDisc, and Aries Knowledge Finder. Each type of subscription (complete, subset, monthly or quarterly) was tested as a separate system. The results presented here describe the performances of the 11 online systems offered during daytime hours. The data analyst was blinded to system identity and we are presenting these preliminary data in this forum to obtain assistance in interpreting the results in an unbiased manner. Interpretations of the differences amongst systems would certainly be biased should the product names be revealed; therefore we require that the identity of the systems remain coded.

NLM.

Search Construction
The 18 clinician search strategies formed the end-user part of the performance appraisal. The systems were also tested by librarians experienced in MEDLINE searching. Search strategies were constructed for each of the 18 search questions by the senior study librarian. The librarian was provided with the original search question, the patient sex and gender, if available, and the clinical department from which the search originated. She was blinded to the identity of the clinician, and the original search strategy. The strategies were constructed using MeSH and a medical dictionary. The librarian was allowed to sign into MEDLINE using NLM-direct mode up to two times to verify that her search formulation was reasonable. She was not allowed to consult with health care workers or other librarians with regard to search construction. The librarian's approach was to provide a search that produced at least a few citations to studies with high clinical relevance, while minimizing the number of irrelevant references. Exhaustive searches were not pursued as the search questions required clinical information regarding patient problems. Following the construction of the librarian strategies the clinician search strategies were extracted from the 18 original searches. If the clinician's original search comprised more than one attempt, the last attempt that retrieved citations was selected. If no citations were retrieved for any attempt, the last attempt was selected. The strategies were checked for completeness and accuracy. The

Search Selection
The questions used for the test searches were drawn from those performed during a randomized controlled trial of MEDLINE access in hospital wards and clinics.[5] Participants could perform searches on any of 16 computers located in wards and clinics in the hospital. They were required to state the question prompting each search in a gateway program before they gained access to GRATEFUL MED for MEDLINE searches. The questions from searches performed by the control group clinicians at the midpoint of the study (i.e., after their fifth search) formed the pool from which search questions were selected for this study. Control group searches were chosen as they were considered to be representative of those


vendor representative was scheduled to come to McMaster. The searches were performed on an IBM PS/2 Model 55 computer except for Aries Knowledge Finder and GRATEFUL MED for the Macintosh, which were run on a Macintosh SE/30. Sign-on for the online systems was automated with TELIX, a telecommunications program, using a 2400 baud external modem. The CD ROM discs were mounted on a double-drive Hitachi 3600 CD player. Searching began on Monday (provided the vendor representative was not coming that day) at 10:00 am and continued until 4:00 pm. The librarians recorded online time and cost for each search and any problems encountered, such as telecommunications failures or line noise. A separate comparison was done for backfile searching. If the clinician strategy included searching the backfiles, the same years were searched on the online systems.

clinician searches were all performed originally with GRATEFUL MED version 4.0, and a standardized approach to interpreting the entry of search terms was adopted. For instance, a word entered on a GRATEFUL MED subject line and not chosen from the MeSH screen was searched as both a MeSH term and as a title or abstract word in all systems, because this is how GRATEFUL MED would translate the entry for the ELHILL software that runs MEDLINE at NLM. Also, GRATEFUL MED permits the entry of terms that are cross-referenced to MeSH terms, and searches these as the MeSH term. Therefore, if the MeSH cross-reference MENTAL DEFICIENCY was entered by the original searcher, MENTAL RETARDATION was searched by the NLM computer, and would be the term used in clarifying the clinician strategy.

Search Translation
The online and CD ROM systems were grouped by vendor and randomly allocated to the two study librarians so that each librarian was responsible for learning and searching half the systems in the comparison. A specification sheet based on Hewison's CD ROM evaluation checklist was completed for each system.[6] The librarian and clinician strategies, in a generic form, were translated into the searching protocol for each system, beginning with the two trial run questions followed by the 18 librarian and 18 clinician strategies. System familiarization, specification sheet completion, test searches and translation were done one system at a time and in a randomized order. For systems with both command and menu modes, the clinician strategies were translated in the menu mode and the librarian strategies in the command mode, those being the most likely searching interfaces used by the two groups.[7] Translations were made as close to the defined strategy as each system would allow. However, exceptions were made if systems did not accommodate some NLM search protocol features. In cases for which explosions were not possible, all indented terms under the broader MeSH term were OR'd. If major emphasis applied to MeSH terms was not allowed, the term was searched without such emphasis. Systems not permitting the use of bald subheadings were searched with attached subheadings. If the subheading was not applicable to the MeSH term it was searched as a textword, and, if possible, as a MeSH term.
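The fallback rules for explosions and major emphasis can be sketched in code. The following is a minimal illustration only, not the study's actual procedure: the `SystemFeatures` flags, the `MESH_CHILDREN` tree fragment, and the `EXPLODE` and `*` query notation are all hypothetical and do not reproduce any vendor's syntax. It also assumes that the broader term itself is OR'd together with its indented terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemFeatures:
    """Hypothetical capability flags for one MEDLINE access route."""
    supports_explosion: bool
    supports_major_emphasis: bool


# Hypothetical fragment of a MeSH tree: broader term -> indented terms.
MESH_CHILDREN = {
    "HYPERTENSION": ["HYPERTENSION, MALIGNANT", "HYPERTENSION, RENAL"],
}


def translate_term(term, explode, major, features, tree=MESH_CHILDREN):
    """Translate one generic MeSH term into an illustrative query string."""
    # Major emphasis is dropped when the system cannot express it.
    marker = "*" if (major and features.supports_major_emphasis) else ""
    if not explode:
        return f"{marker}{term}"
    if features.supports_explosion:
        return f"EXPLODE {marker}{term}"
    # No explosion available: OR the broader term with its indented terms
    # (including the broader term is an assumption of this sketch).
    terms = [term] + tree.get(term, [])
    return "(" + " OR ".join(f"{marker}{t}" for t in terms) + ")"
```

On a system with neither feature, an exploded, major-emphasis request for HYPERTENSION becomes a plain OR of the term and its indented terms, which mirrors the exception handling described above.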

Data Collection and Analysis
Data were collected using standardized data forms and downloaded files of the online and CD ROM interaction for each search. Data from the forms were keyed into Paradox (ver. 3.5) tables. Citation unique identifiers were extracted from search output files using purpose-built programs, then imported into Paradox tables. The search question and citations retrieved by each end-user and librarian search from all 25 systems were pooled, placed in random order, and presented to a clinician with expertise in the topic of the search. Blinded to which systems retrieved which citations, the clinician rated each citation for relevance on a 7-point scale,[2] and all citations rating 5, 6 or 7 (i.e., directly relevant to the clinical question) were treated as relevant for calculation of absolute numbers of relevant and irrelevant citations and recall. Data from the citations captured by the vendor representative searches were included in a separate analysis comparing librarian and vendor searches. Technical specifications and update rates for each system were collected and the systems were assessed for time, cost and performance compared with NLM-direct MEDLINE searching. Specific outcome measures were number of relevant citations, number of irrelevant citations, proportion of searches producing no relevant citations, recall, cost per search, cost per relevant citation, time per search, time per relevant citation and online or search processing time per search.
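Several of these outcome measures follow directly from the relevance ratings, and a short sketch may make the definitions concrete. This is an illustration under stated assumptions, not the study's Paradox procedure: the function names and input layout are hypothetical, and cost per relevant citation is taken as total cost over all questions divided by total relevant citations, following the footnote to Table 4. Recall is left out because the text does not define its denominator.

```python
RELEVANCE_CUTOFF = 5  # ratings of 5, 6 or 7 on the 7-point scale count as relevant


def count_relevant(retrieved, ratings):
    """Return (relevant, irrelevant) counts for one search's citation IDs."""
    relevant = sum(1 for uid in retrieved if ratings[uid] >= RELEVANCE_CUTOFF)
    return relevant, len(retrieved) - relevant


def system_summary(searches, ratings):
    """Aggregate one system's searches.

    searches: list of (retrieved_ids, cost) pairs, one per search question.
    ratings:  dict mapping citation ID -> relevance rating (1 to 7).
    """
    counts = [count_relevant(ids, ratings) for ids, _ in searches]
    total_relevant = sum(rel for rel, _ in counts)
    total_cost = sum(cost for _, cost in searches)
    n = len(searches)
    return {
        "mean_relevant": total_relevant / n,
        "mean_irrelevant": sum(irr for _, irr in counts) / n,
        "proportion_no_relevant": sum(1 for rel, _ in counts if rel == 0) / n,
        # Total cost over all questions divided by total relevant citations.
        "cost_per_relevant": total_cost / total_relevant if total_relevant else None,
    }
```

Because the ratio uses totals rather than per-search averages, a search that retrieves no relevant citations still adds its cost to the numerator.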

Running Searches
Searches were conducted from January 1991 to October 1991. Searches for each system were run by the study librarians during the week that the system's

RESULTS

Reported here are the performance results for the 11 daytime online systems. Table 1 gives the number of relevant and irrelevant citations for each system from the librarian strategies and the clinician strategies averaged across the 18 search questions. Systems are coded and ranked in order of highest number of relevant citations, with the reference system, NLM-direct, labeled 01.

Table 1 Relevance of Citations for Online Systems, 18 Search Strategies (means and standard deviations)

Librarian Strategies
System   Relevant              Irrelevant
11       5.5 (11.60)           9.0 (23.14)
27       2.3 (2.80)            2.9 (3.67)
12       2.3 (3.55)            2.9 (3.69)
07       2.1 (2.78)            2.6 (3.71)
08       2.1 (2.78)            2.6 (3.71)
01       2.1 (2.74)            2.7 (3.75)
03       2.1 (2.78)            2.7 (3.75)
05       2.1 (2.97)            2.7 (3.79)
09       2.1 (2.78)            2.7 (3.75)
04       2.1 (SD illegible)    4.2 (6.81)
02       1.8 (2.68)            2.6 (3.81)

Clinician Strategies
System   Relevant
11       7.4 (17.10)
02       7.1 (16.38)
27       7.0 (17.75)
07       6.8 (16.61)
09       6.7 (16.63)
12       6.6 (15.96)
03       6.6 (16.04)
01       6.6 (16.29)
04       6.6 (16.06)
08       6.4 (14.80)
05       5.0 (14.05)
[Irrelevant column for clinician strategies: row assignments are not recoverable from the source text. System 11 has the lowest mean, about 38; the other ten means lie between 54.1 and 64.9.]

A higher number of relevant references was found by the clinician searches, but this was accompanied by an even greater number of irrelevant citations. One notable exception was system 11, which, though it retrieved the highest number of irrelevant citations for librarian searches, retrieved the lowest number for clinician searches. Of additional interest, the system that produced the fewest relevant citations for librarian searches produced the second greatest number for clinician searches. The differences between means for each system compared with NLM-direct searching (System 01) for number of relevant and number of irrelevant citations by librarian searches are given in Table 2.

Table 2 Comparison of Systems with NLM-direct, Librarian Strategies: Relevant and Irrelevant Citations

Relevant Citations (minimum significant difference = 2.967)
System Comparison   Difference Between Means   95% Confidence Interval
11 - 01              3.389                      0.422 to 6.356
27 - 01              0.222                     -2.745 to 3.189
12 - 01              0.167                     -2.800 to 3.134
02 - 01             -0.278                     -3.245 to 2.689

Irrelevant Citations (minimum significant difference = 5.874)
System Comparison   Difference Between Means   95% Confidence Interval
11 - 01              6.278                      0.404 to 12.152
04 - 01              1.444                     -4.429 to 7.318
[Row assignments for the remaining comparisons are not recoverable from the source text. Their legible differences lie between -0.167 and 0.167, each with an interval spanning zero.]

For librarian searches, only system 11 behaved differently from the others: it retrieved substantially more citations that were relevant, but also more that were irrelevant. Table 3 gives the proportion of searches producing no relevant citations for each system by librarian and clinician strategies. Two search questions from the clinician searches with very large retrievals were prorated for the initial 100 citations captured.

Table 3 Proportion of Searches Producing No Relevant Citations
[Columns, given separately for librarian strategies and clinician strategies: System; Number of Searches Producing No Relevant Citations (18 total/system); Proportion; Max. No. of Relevant Citations on Any Search*. The table body is not recoverable from the source text.]
* Prorated for search questions 14 and 2

Clinician searches not only retrieved more relevant citations per search than librarian searches, they had a lower proportion of searches retrieving no relevant citations. Systems were evaluated for search cost and cost per relevant citation. Table 4 gives cost outcome measures for the librarian strategies and clinician strategies averaged across the 18 search questions. Systems are ranked in order of increasing cost per relevant citation.

Table 4 Search Costs for Online Systems, 18 Search Strategies
[Columns, given separately for librarian strategies and clinician strategies: System; Cost/search; Cost/relevant citation* ($). Row assignments are not recoverable from the source text. Legible cost per relevant citation values run from $0.97 to $2.09 for librarian strategies and up to $4.07 for clinician strategies.]
* Cost totalled over 18 questions/total relevant citations over 18 questions

The higher cost per relevant citation of the clinician strategies reflects the larger total retrievals for those searches. The range of costs was two-fold from lowest to highest for both librarian and clinician searches.
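The intervals in Table 2 appear to be the difference between means plus or minus the minimum significant difference. That relationship is inferred here from the printed figures and is not stated in the text, so the sketch below is only a consistency check under that assumption, with hypothetical function names. It reproduces one reported pair: a difference of 6.278 with a minimum significant difference of 5.874 and an interval of 0.404 to 12.152.

```python
def confidence_interval(difference, msd):
    """Interval taken as difference +/- minimum significant difference.

    Assumption: this form is inferred from the reported figures.
    """
    return (round(difference - msd, 3), round(difference + msd, 3))


def differs_from_reference(difference, msd):
    """True when the interval excludes zero."""
    low, high = confidence_interval(difference, msd)
    return low > 0 or high < 0


print(confidence_interval(6.278, 5.874))     # reported interval: 0.404 to 12.152
print(differs_from_reference(6.278, 5.874))  # interval excludes zero
print(differs_from_reference(0.000, 2.967))  # interval spans zero
```

Under this reading, a system differs from NLM-direct only when the absolute difference between means exceeds the minimum significant difference.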

ACKNOWLEDGMENTS The study was supported by the National Library of Medicine (R01 LM04696-03). Dr. Haynes has a career award from the National Health Research and Development Program in Canada.

DISCUSSION
Comparison of different access routes to the MEDLINE database requires the recognition and control of many variables. The information in the database remains the same but many other features vary across systems. Testing of system performance using standardized searches from librarian and clinician user groups compensates for many areas of potential bias.

For retrieval, only one system behaved markedly differently from NLM-direct. System 11 retrieved more relevant and more irrelevant citations than NLM-direct when tested by librarian searches, and more relevant and fewer irrelevant citations when tested by clinician searches. The cost per relevant citation for this system was towards the lower end of the spectrum. Although system performance was similar for the remaining systems, cost per relevant citation varied substantially. Most systems were slightly less expensive than NLM-direct, with systems 04 and 05 being consistently more expensive.

Clinician strategies found more relevant citations than librarian strategies but also more irrelevant citations. Librarian searches took the approach that a few key citations of superior methodologic quality were of more use to a clinician treating a patient than a large download requiring substantial sorting to find the relevant articles.

With these preliminary data we seek unbiased feedback from colleagues on interpreting the differences amongst systems observed in a complex set of measurements. Our interest is the clinical use of MEDLINE and, while clinical searches were run through the systems, actual use of the systems in the clinical setting has not been assessed here. Such a study is highly desirable but prohibited by the large number of systems being compared. Nevertheless, insight is gained regarding the style of clinician searches and how their strategies fare when run on various MEDLINE access routes. Such information should be useful to libraries and institutions (e.g. hospitals) planning end-user information access by identifying those products providing optimal combinations of performance and efficiency. This subset of systems performing best should be subjected to controlled trials of access by end users to determine whether there are any important differences in end-user performance.

BIBLIOGRAPHY
1. Haynes RB, McKibbon KA, Walker CJ, Mousseau J, Baker LM, Fitzgerald D, Guyatt G, Norman G. Computer searching of the medical literature: an evaluation of MEDLINE searching systems. Ann Intern Med 1985;103:812-816.
2. Haynes RB, McKibbon KA, Walker CJ, Ryan N, Fitzgerald D, Ramsden M. Online access to MEDLINE in clinical settings: a study of use and usefulness. Ann Intern Med 1990;112:78-84.
3. MEDLINE on CD-ROM: National Library of Medicine evaluation forum, Bethesda, Maryland, September 23, 1988. Editors, Rose Marie Woodsmall, Becky Lyon-Hartmann, Elliot R Siegel. Medford, NJ: Learned Information, 1989.
4. Ryan P, Winand D, Bleich HL, Slack WV, Porter D. Computer searching of the medical literature [letter]. Ann Intern Med 1987;106:168-169.
5. McKibbon KA, Haynes RB, Johnston M, Walker CJ. A study to enhance clinical end-user MEDLINE search skills: design and baseline findings. Proceedings of the 15th Annual Symposium on Computer Applications in Medical Care, November 1991.
6. Hewison NS. Evaluating CD-ROM versions of the MEDLINE database: a checklist. Bull Med Libr Assoc 1989;77:332-6.
7. Wallingford KT, Humphreys BL, Selinger NE, Siegel ER. Bibliographic retrieval: a survey of individual users of MEDLINE. MD Computing 1990;7:166-171.

