Tree testing of hierarchical menu structures for health applications.

YJBIN 2132


4 March 2014, Journal of Biomedical Informatics xxx (2014) xxx–xxx


Journal of Biomedical Informatics, journal homepage: www.elsevier.com/locate/yjbin


Thai Le a,⇑, Shomir Chaudhuri a, Jane Chung b, Hilaire J. Thompson b, George Demiris a,b

a Biomedical Informatics and Medical Education, School of Medicine, University of Washington, WA, USA
b Biobehavioral Nursing and Health Systems, School of Nursing, University of Washington, WA, USA

Article info

Article history: Received 3 November 2013; Accepted 12 February 2014; Available online xxxx

Keywords: User–computer interface; Usability methods; Information system evaluation

Abstract

To address the need for greater evidence-based evaluation of Health Information Technology (HIT) systems, we introduce a method of usability testing termed tree testing. In a tree test, participants are presented with an abstract hierarchical tree of the system taxonomy and asked to navigate through the tree in completing representative tasks. We apply tree testing to a commercially available health application, demonstrating a use case and providing a comparison with more traditional in-person usability testing methods. Online tree tests (N = 54) and in-person usability tests (N = 15) were conducted from August to September 2013. Tree testing provided a method to quantitatively evaluate the information structure of a system using various navigational metrics including completion time, task accuracy, and path length. The results of the analyses compared favorably to the results seen from the traditional usability test. Tree testing provides a flexible, evidence-based approach for researchers to evaluate the information structure of HITs. In addition, remote tree testing provides a quick, flexible, and high-volume method of acquiring feedback in a structured format that allows for quantitative comparisons. With the diverse nature and often large quantities of health information available, addressing issues of terminology and concept classifications during the early development process of a health information system will improve navigation through the system and save future resources. Tree testing is a usability method that can be used to quickly and easily assess the information hierarchy of health information systems. Published by Elsevier Inc.


1. Introduction

Health Information Technologies (HIT) are highly complex systems, with interfaces that incorporate multiple menu labels navigating across both broad and deep structures. The user interface (UI) also contributes to navigation challenges through dense displays, cluttered layouts, icons, and popups. We demonstrate the use of tree testing as a method to evaluate an abstraction of the navigation structure, focusing only on the menu labels and their organization within the navigational hierarchy. In a tree test, users navigate a disembodied representation of the UI, consisting of only menu labels and their sublevels, while trying to complete representative tasks. This approach isolates the conceptual component of the navigation structure from the UI, though it is recognized that elements of the UI also contribute towards navigation within the HIT system.

2. Background

2.1. Evidence-based evaluation of Health Information Technology


Health Information Technologies (HIT) range in breadth, including applications such as electronic health records (EHR), electronic prescription systems (eRX), and computerized provider order entry systems (CPOE). These systems are often highly complex and have significant potential to improve patient care and organizational efficiency. However, serious challenges in the design and construction of HIT systems have created interfaces that are often frustrating for health care providers to use. This has led to a call for a stronger evidence base behind the development of HIT systems [1,2]. The need to address such concerns becomes even more significant as examples of failed systems come to light, highlighting high costs of implementation and operation, and negative impact on patients [3–5]. Koppel identifies a need to apply greater scientific rigor to the development of HIT with a focus on navigation, usability, information graphics, and ethnographic evaluation [1]. The lack of a strong evidence-based approach towards HIT design can often be attributed to the complexity of health systems, the limited time and resources within clinical practice, and the difficulty of applying randomized controlled trials to pervasive health systems [1].

⇑ Corresponding author. Address: Department of Biomedical Informatics and Medical Education, University of Washington, Box 350847, Seattle, WA 98195-7240, USA. E-mail address: [email protected] (T. Le).
http://dx.doi.org/10.1016/j.jbi.2014.02.011
1532-0464/Published by Elsevier Inc.

Please cite this article in press as: Le T et al. Tree testing of hierarchical menu structures for health applications. J Biomed Inform (2014), http://dx.doi.org/10.1016/j.jbi.2014.02.011

2.2. Terminology and classification in HIT


The breadth of information available within HIT systems presents a challenge in defining appropriate terminology. Chute refers to terminology as a naming system that applies controlled terms to reference formal concepts organized by a classification schema [6]. Terminology in the health sciences is a well-established area of work, with examples including the International Classification of Diseases (ICD), the Systematized Nomenclature of Medicine (SNOMED), and Logical Observation Identifiers Names and Codes (LOINC) [7–9]. Defining appropriate terminology and concept classification is essential to creating effective information architectures for HIT systems. In this research, we are interested in evaluating the existing terminology defined within a commercially available multimedia health interface, focusing on how it supports navigation for the user.


2.3. Tree testing to assess hierarchical navigation


To address the need for greater evidence-based evaluation of HIT systems applicable throughout the development cycle, we introduce a method of usability testing grounded in human factors engineering. Termed tree testing, this technique focuses on evaluating information architecture, often within the context of website design [10]. In this study, we extend tree testing to health applications, demonstrating a use case and providing a comparison with more traditional usability testing methods. In a tree test, users are presented with an abstraction of the information architecture through a set of menu and submenu navigations. By focusing only on this abstraction, contextual and aesthetic content is separated from the organization of information. Users are then given a set of tasks and asked to navigate through the menu structure to find an appropriate location to answer each task. The menu structure often consists of a collapsible tree. Quantitative metrics derived from tree testing include completion time, navigation path length, and task completion accuracy. With well-chosen tasks representative of the breadth of the information architecture, tree testing can be a valuable tool to evaluate a system’s taxonomy against the user’s expectations. Matching a user’s mental model with the internal system architecture is important for reducing inefficiencies and errors [11,12].

Spencer introduced tree testing as a user research technique out of the need to systematically evaluate hierarchy structures, separate from interactions due to the interface [13,14]. Initial variants of tree testing involved paper index cards with menu label headings. A moderator would manipulate submenu interactions to simulate navigation within the hierarchy. Since then, remote tree testing tools (Treejack, UserZoom, Plainframe) have emerged to allow for larger-scale testing [15–17].
Tree testing is often used within the context of website architecture, with a primary focus on findability and label organization [10]. It has yet to be extended to HIT systems. We address this gap by describing the development and implementation of a web-based tree test to evaluate the taxonomy of a commercially available multimedia health and wellness tool. The multimedia tool combines health-specific content with entertainment and games, with the ultimate aim of reducing anxiety and improving cognitive and overall well-being for older adults. The interface for the multimedia tool consists of pictorial icons with menu labels describing content to engage, entertain, and interact with participants. The menus lead to further sub-categorizations or to applications that users can launch. These applications include games, movies, news media, music, exercise videos, and arts and crafts. However, given the breadth of applications available, questions arose as to the navigability of the system. This was evaluated through tree testing of the information hierarchy.

We present a quantitative analysis of results from the tree test and also develop a network path visualization of aggregate navigational patterns for each task. In addition, we provide a comparison of tree testing as a user research method with more traditional scenario-based methods where users are asked to think aloud while navigating through the live version of the multimedia tool. The objective of this research is to present tree testing as a usability testing method, focusing on implementation and analysis as opposed to providing usability recommendations on a specific multimedia tool.


3. Materials and methods


3.1. Development of tree test tool


We created a web-based implementation of a tree test using a combination of HTML, JavaScript, and PHP. We used collapsible accordion menu structures to simulate navigation across the tree such that once a node is clicked, its siblings are collapsed and hidden from view. Alternatively, if the node is already hidden, clicking it will expand its siblings. This creates a linear path of navigation, preventing users from jumping across branches within a tree unless they navigate up to a parent node and expand its children. The tree test tool takes as input a formatted data file to build the tree structure, along with a file representing the tasks presented for tree testing. As output, for each task, the tree test generates a file storing each node clicked and the time taken to click the node from either the start of the task or the most recently clicked node. We hosted the web-based tree test on a university-provided server for public access. The customizable tree test tool, along with full documentation, is available at http://staff.washington.edu/tle23/TreeTest.zip.
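As a rough illustration of the input and output described above, the following Python sketch parses an indented list of menu labels into a tree and logs each click with the time elapsed since the previous click (or since the start of the task). The original tool was written in HTML, JavaScript, and PHP; the indented input format, the names `parse_tree` and `ClickLogger`, and the log layout are assumptions made for illustration and are not taken from the published tool.

```python
import time


def parse_tree(lines, indent=2):
    """Build a nested dict {label: children} from indented menu labels.

    Assumed (hypothetical) format: one label per line, with children
    indented by `indent` spaces relative to their parent.
    """
    root = {}
    stack = [(-1, root)]  # (depth, children dict) for the current ancestors
    for raw in lines:
        if not raw.strip():
            continue
        depth = (len(raw) - len(raw.lstrip(" "))) // indent
        # Pop back up until the top of the stack is this label's parent.
        while stack[-1][0] >= depth:
            stack.pop()
        children = {}
        stack[-1][1][raw.strip()] = children
        stack.append((depth, children))
    return root


class ClickLogger:
    """Record (task, label, seconds since previous click or task start)."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.rows = []
        self._task = None
        self._last = None

    def start_task(self, task_id):
        self._task = task_id
        self._last = self.clock()

    def click(self, label):
        now = self.clock()
        self.rows.append((self._task, label, now - self._last))
        self._last = now


# A small fragment of the menu, using labels that appear in Table 1.
menu = parse_tree([
    "Relax",
    "  Music",
    "    Karaoke",
    "Stay Connected",
    "  Skype",
])
```

Passing the clock as a parameter keeps the logger easy to test with fixed timestamps; a browser-based tool would instead read the time from the client at each click event.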


3.2. Design of tree test


We transposed the underlying menu taxonomy of the multimedia health tool to the tree test. This resulted in a tree with 70 nodes overall, of which 53 were leaves and 17 were branches. The depth of the tree varied from two to four levels. Given the voluntary nature of the survey, we limited the number of tasks to seven scenarios. These were chosen to span the tree while also highlighting paths across a range of difficulty, as identified through preliminary discussion with the research team (Table 1). An example of the tree structure is provided for Task 2 (Fig. 1). As part of the implementation of the tree test, participants were exposed to an initial demonstration tree and a set of three trial tasks. We used a different taxonomy for the demonstration run than for the true trial. The demonstration consisted of scenarios asking participants to find different food items (such as lasagna or tomato soup) within a categorization of food groups. Since a moderator was not present for these web-based sessions, we wanted to use the trials as an opportunity to familiarize participants with the collapsible menu structure and task goals. The instructions were:


“Please navigate through the tree to find the appropriate location for the item. There are no right or wrong answers and you can navigate back up levels of the tree. Please choose the best fit under ‘I’d find it here’. If there are no appropriate locations, select ‘No Category Found’. The task is: (scenario).”


Table 1
Set of scenario tasks for the tree test along with the appropriate navigation path based on the multimedia health application tool.

Scenario | Correct path
1. Your friend recommended a hilarious show to watch from the 1960s. Where would you find an application to watch this show? (Classic Comedy) | Entertain → Classic TV → Comedy → I’d find it here
2. You need an instructional video on how to make lasagna. Where would you find an application to help you do this? (Cooking Video) | Reminisce → Life Experiences → Cooking → I’d find it here
3. Your friend told you about this hilarious clip of a dog jumping on a trampoline. Where would you find this? (Funny Clip) | Get Silly → Funny Animals → I’d find it here
4. It’s been a long day from school (or work) and you want to kick back with friends and sing some Karaoke. Where would you find an application to help you do this? (Karaoke) | Relax → Music → Karaoke → I’d find it here
5. For summer break you are planning a trip to Rome. Having never been there before, you would like to do some research. Where would you find an application to help prepare for your trip? (Rome Trip) | Entertain → Travel Videos → Rome → I’d find it here
6. You are on a long plane ride and would like to play solitaire to pass the time. Where would you find an application for this? (Solitaire) | Engage → Games → Simple Strategy → I’d find it here
7. You are traveling abroad and plan to talk to your parents today over Video Chat. Where would you find an application to help you do this? (Video Chat) | Stay Connected → Skype → I’d find it here
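The correct paths in Table 1 can be used to score a recorded click path. The sketch below is one possible scoring rule and not necessarily the one the authors used: it treats a trial as correct when the last two clicks match the end of the correct path, and it counts the clicks beyond the shortest path for correct trials. The function name, the example click paths, and the assumption that menu labels are unique within the tree are all hypothetical.

```python
# Correct paths transcribed from Table 1 (two of the seven tasks shown).
CORRECT_PATHS = {
    "Karaoke": ["Relax", "Music", "Karaoke", "I'd find it here"],
    "Video Chat": ["Stay Connected", "Skype", "I'd find it here"],
}


def score_trial(task, clicks):
    """Return (is_correct, extra_clicks) for one recorded click path.

    Assumption: a trial counts as correct when its last two clicks are the
    correct final menu label followed by "I'd find it here". This relies on
    menu labels being unique within the tree.
    extra_clicks is None for incorrect trials.
    """
    target = CORRECT_PATHS[task]
    is_correct = clicks[-2:] == target[-2:]
    extra_clicks = len(clicks) - len(target) if is_correct else None
    return is_correct, extra_clicks


# Hypothetical participant who tried "Entertain" first, then found Karaoke.
detour = ["Entertain", "Relax", "Music", "Karaoke", "I'd find it here"]
result = score_trial("Karaoke", detour)

# Hypothetical participant who gave up on the Video Chat task.
gave_up = score_trial("Video Chat", ["Engage", "No Category Found"])
```

The exact clicks logged while backtracking depend on how the accordion menu records collapsing a branch, so the click counts here are illustrative only.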


Upon completion of the demonstration trials, participants were presented with the seven tasks from the true experimental session in randomized order. For each task, participants navigated through the taxonomy until they reached the end of the tree. If participants believed the menu label corresponded to an appropriate location to address the scenario, they would select “I’d find it here” before moving on to the next task. At any time during the navigation process, an option of “No Category Found” was available if participants believed no appropriate category within the taxonomy fit the scenario.


3.3. In-person usability testing


We provide a comparison of the web-based tree test with more traditional in-person usability tests. During the in-person sessions, we asked participants to think aloud as they navigated to complete tasks using the actual multimedia interface. We used the same tasks as the online tree test, randomized at the start of each session. A moderator helped explain study procedures while a note taker recorded paths taken during navigation. We also recorded the sessions for transcription and analysis. Participants were distinct from those in the online tree test. Participation was voluntary and lasted between 15 and 30 min. The university institutional review board approved all human subject procedures of the study.


3.4. Participant recruitment


We applied a mixed methods approach towards recruitment that involved fliers, email list subscriptions, and snowball sampling. All participation was voluntary and restricted to English speakers. We used convenience sampling across a primarily, though not exclusively, university population. Fliers and emails contained a link to the online tree test along with contact information for in-person usability testing. We restricted participation to only one of either online or in-person usability testing by verifying an email identifier.


3.5. Data analysis


For the online survey, we calculated metrics of completion time, task accuracy, and path length. We used descriptive statistics including mean, standard deviations (SD), and confidence intervals (CI) to summarize the data. However, these metrics are primarily aggregate measures of task performance indicating which tasks are easier or more efficient to complete.
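The descriptive statistics named above can be sketched in a few lines of Python. The authors performed their analyses in R and do not state how the confidence intervals for means were constructed; the normal-approximation interval used here, the function name `summarize`, and the example times are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def summarize(values, confidence=0.95):
    """Return the mean, sample SD, and a normal-approximation CI for the mean."""
    m = mean(values)
    sd = stdev(values)
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sd / sqrt(len(values))
    return {"mean": m, "sd": sd, "ci": (m - half_width, m + half_width)}


# Hypothetical completion times in seconds for one task.
times = [20.0, 25.0, 30.0, 22.0, 28.0]
summary = summarize(times)
```

For small samples, a t-based interval would be wider than this normal approximation; which of the two the source used is not stated.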


Fig. 1. Example navigation of the menu structure in order to complete Task 2 (finding a cooking application).



For greater insight into the paths taken, we developed a network path visualization grounded in social network analysis [18]. Each node within the network represents a menu label, sized by the number of times that label was selected across all trials of a given task. A directed edge between two nodes exists if participants selected a given label and then selected a following label as part of the navigation path. This edge is weighted by the number of occurrences across all trials for the task and can be selectively filtered for visualization. The resulting network path visualization provides a gross overview of the paths taken towards completing a task.

For the in-person usability tests, we recorded navigation paths and used them to calculate task accuracy along with relative path length. We chose not to record time, primarily because the think-aloud process and moderator-prompted discussions would bias time to completion. We assessed joint dependence of task type and task accuracy with testing method (remote tree test or in-person) using Fisher’s Exact Test. We applied a one-way ANOVA to test if mean relative path lengths differed by task type and testing method. In addition, we provide a gross overview of the qualitative think-aloud results to complement the results from the tree testing. All statistical analyses and visualizations were completed with R version 2.15.2 statistical software [19].
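The node and edge counts behind the network path visualization described in this section can be sketched as follows. This is a minimal Python illustration of the aggregation step only, under the assumption that each trial is stored as an ordered list of clicked labels; the function name, the `min_count` filter parameter, and the example paths are hypothetical, and the authors' own implementation was in R.

```python
from collections import Counter


def build_network(paths, min_count=1):
    """Aggregate the click paths for one task into node and edge counts.

    Each path is the ordered list of labels one participant clicked.
    Nodes count how often a label was selected; directed edges count how
    often one label was followed immediately by another.
    """
    nodes = Counter()
    edges = Counter()
    for path in paths:
        nodes.update(path)
        edges.update(zip(path, path[1:]))
    # Keep only nodes and edges seen at least `min_count` times.
    kept_nodes = {n: c for n, c in nodes.items() if c >= min_count}
    kept_edges = {
        e: c for e, c in edges.items()
        if c >= min_count and e[0] in kept_nodes and e[1] in kept_nodes
    }
    return kept_nodes, kept_edges


# Hypothetical trials for the Karaoke task: three direct, one off-path.
paths = [
    ["Relax", "Music", "Karaoke", "I'd find it here"],
    ["Relax", "Music", "Karaoke", "I'd find it here"],
    ["Relax", "Music", "Karaoke", "I'd find it here"],
    ["Engage", "Games", "Sound Games", "I'd find it here"],
]
nodes, edges = build_network(paths, min_count=3)
```

The resulting dictionaries can be handed to any graph-drawing library, with node size mapped to the node count and edge width mapped to the edge weight.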


4. Results


We conducted the survey and in-person usability tests from August to September 2013. Over this period, 54 participants responded to the remote survey and 15 participants were involved with in-person usability testing.


4.1. Tree test metrics


We found a broad distribution of accuracy rates across the seven tasks (Fig. 2). This was consistent with the design of the scenarios, which were intended to sample across the spectrum of difficulty. The most accurate tasks were Video Chat (94.4% success, CI: 83.7–98.6%) and Funny Clip (83.3% success, CI: 70.2–91.6%). These two tasks also involved the shortest paths, requiring participants to navigate across two levels of the tree before finding the appropriate location. However, when comparing tasks with the same path length, there were still statistically significant differences. For example, both Karaoke (59.3% success, CI: 45.1–72.1%) and Classic Comedy (53.7% success, CI: 39.7–67.2%) were statistically different from the least accurate tasks of Cooking Video (3.70% success, CI: 0.644–13.8%) and Rome Trip (25.9% success, CI: 15.4–39.9%). These scenarios all involved clicking four total nodes. Ranked from least to most accurate, the tasks are: Cooking Video → Rome Trip → Solitaire → Classic Comedy → Karaoke → Funny Clip → Video Chat.

For task completion times and path length analysis, we filtered out times that were unusually long (above 300 s). We also filtered out paths that were greater than 15 clicks (this also applied to the in-person results). Those who completed the tasks correctly took on average 64.3 s (CI: 53.7–74.9 s), while those who completed the tasks incorrectly took 72.4 s (CI: 61.7–83.2 s). This difference was not statistically significant (p = 0.29). Within each task, there was also no statistically significant difference between correct and incorrect completion times (Fig. 3). The quickest tasks to complete correctly were Video Chat (24.6 s, CI: 19.3–30.0 s) and Funny Clip (32.1 s, CI: 20.4–43.8 s). Overall, from longest correct completion time to shortest, we found a relationship closely matching that of accuracy rate: Cooking Video, Rome Trip, Solitaire, Karaoke, Classic Comedy, Funny Clip, and Video Chat.

We defined relative path length as the number of additional steps taken by the user in order to arrive at a correct solution compared to the optimal shortest path. This provides a measure of path directness. The most direct paths involved Video Chat, Funny Clip, and Classic Comedy, taking on average 0.2–0.6 additional clicks (Fig. 4). There was a strong separation of this group of tasks from Karaoke, Rome Trip, and Solitaire, which required an additional 1.9–3.2 clicks. The Cooking Video task had only one correct response, so conclusions related to that task are limited. These findings align with completion time and accuracy rate.
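The source does not state how the confidence intervals for the accuracy rates were computed. As one plausible reconstruction, the Python sketch below uses the Wilson score interval with continuity correction, which is the default for a single proportion in R's `prop.test`; this choice of method and the function name are assumptions. With 51 of 54 correct responses (94.4%, the Video Chat task), it gives an interval close to the reported 83.7–98.6%.

```python
from math import sqrt
from statistics import NormalDist


def wilson_cc_interval(successes, n, confidence=0.95):
    """Wilson score interval with continuity correction for a proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    q = 1 - p
    denom = 2 * (n + z * z)
    lower = (2 * n * p + z * z - 1
             - z * sqrt(z * z - 2 - 1 / n + 4 * p * (n * q + 1))) / denom
    upper = (2 * n * p + z * z + 1
             + z * sqrt(z * z + 2 - 1 / n + 4 * p * (n * q - 1))) / denom
    # By convention the bounds are pinned at 0 and 1 for the extreme cases.
    if successes == 0:
        lower = 0.0
    if successes == n:
        upper = 1.0
    return max(0.0, lower), min(1.0, upper)


# Video Chat: 51 of 54 participants correct (94.4%).
video_chat = wilson_cc_interval(51, 54)

# Funny Clip: 45 of 54 participants correct (83.3%).
funny_clip = wilson_cc_interval(45, 54)
```

Because the interval is asymmetric around the observed proportion, it stays within 0 and 1 even for tasks with very high or very low accuracy, such as Cooking Video.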


4.2. Network path visualizations


To better understand differences in performance across tree tasks, we developed network path visualizations. The visualizations provide a technique to represent the breadth of path decisions made by participants for each task. In the visualization, the size of each node represents the number of instances across all trials that the node was selected during a task. A directed edge between nodes (A → B) represents a navigation path from node A to node B during the navigation task, weighted by the number of occurrences. We filtered the network to remove nodes and edges with fewer than three occurrences. The visualizations are demonstrated with two tasks: Karaoke and Solitaire (Fig. 5).

For the Karaoke task, participants were asked to find an application to sing Karaoke (Relax → Music → Karaoke → I’d find it here). From the visualization, there are three primary clusters. One represents the correct pathway, selected by approximately 65% of participants. Once on the path, there was limited deviation, indicating that subsequent levels provided clear choices for the participant. However, there were two other clusters leading to incorrect choices, represented by {Get Silly, Entertain, No Category Found} and {Engage, Games, Sound Games}. These indicate potential points of confusion for participants.

For the Solitaire task (Engage → Games → Simple Strategy → I’d find it here), there was confusion at the top level, with participants selecting between Relax, Entertain, and Engage. In addition, within the Engage pathway, participants were unclear where solitaire might be found, looking amongst Matching Games, Touch Games, and Puzzles along with the correct location of Simple Strategy. In contrast to the prior example, where the correct path is straight, this task shows divergence at the sublevels.

Fig. 2. Proportion of all respondents who navigated to an appropriate location to answer each scenario with 95% CI.

Fig. 3. Mean completion time for each task partitioned by incorrect/correct responses.


4.3. Comparison with in-person usability testing


We did not find evidence to reject the joint independence of usability method (remote tree test and in-person sessions) with task type and accuracy through Fisher’s Exact Test (N = 54, p = 0.5). This indicates that the relationship between task type and task accuracy is not impacted by the usability method employed. However, an ANOVA examining independence of mean relative path length with task type and usability method did have statistical significance (p
