Journal of Experimental Psychology: Applied 2015, Vol. 21, No. 1, 73–88

© 2014 American Psychological Association 1076-898X/15/$12.00 http://dx.doi.org/10.1037/xap0000035

Effects of Cues on Target Search Behavior

Assaf Botzer
Ben-Gurion University of the Negev and Ariel University

Joachim Meyer
Ben-Gurion University of the Negev and Tel Aviv University

Avinoam Borowsky, Ido Gdalyahu, and Yoav Ben Shalom
Ben-Gurion University of the Negev


Cues in visual scanning tasks can improve decision accuracy, and they may also affect task performance strategies. We tested the effects of cues on the performance of binary classifications, on the screen scanning procedure participants employed, and on the reported effort in a simulated quality control task. Participants had to decide whether each item in a 5 × 5 matrix of items was intact or faulty. In half the experimental blocks, decisions could only be based on the visual properties of the items. In the other half, participants also saw imperfect binary cues and could use them to classify the items as faulty or intact. We used eye tracking to study scan patterns and fixation durations on items. Decision performance improved with cues, and cues affected the scanning of items, with participants mainly scanning cued items and tending to scan them longer. Participants stated that cues reduced their effort when cues were highly valid. We conclude that strategic choices to focus on suspected areas determined the screen scanning procedure, the amount of effort invested in single decisions, and the accuracy of the decisions. We therefore suggest using likelihood ratio cues to help optimize the scanning procedure.

Keywords: binary cues, target detection, mental effort, decision support, random walk

This article was published Online First December 1, 2014. Assaf Botzer, Department of Industrial Engineering and Management, Ben-Gurion University of the Negev, and Department of Industrial Engineering, Ariel University; Joachim Meyer, Department of Industrial Engineering and Management, Ben-Gurion University of the Negev, and Department of Industrial Engineering, Tel Aviv University; Avinoam Borowsky, Ido Gdalyahu, and Yoav Ben Shalom, Department of Industrial Engineering and Management, Ben-Gurion University of the Negev. This research was funded by the Israel Science Foundation (Grant 670/09) to Joachim Meyer. It was conducted as part of Assaf Botzer's postdoctoral fellowship at Ben-Gurion University of the Negev. Correspondence concerning this article should be addressed to Assaf Botzer, Department of Industrial Engineering, Ariel University, Ariel 44837, Israel. E-mail: [email protected]

Information is necessary input for decision making, as it reduces the uncertainty about the expected outcomes of choosing among different alternatives. The availability of information may also change the strategies decision makers use to address a decision problem. Decision makers' use of information in a decision and their strategy choices depend on properties of the information, the decision problem, and the individual decision maker (e.g., Fisher, Coury, Tengs, & Duffy, 1989; Fisher & Tan, 1989; Meyer, 2004). Our study examines the decision and strategic aspects of information use in binary classifications aided by binary cues.

Binary cueing systems are widely used to assist in decisions, for instance, alerting people about potential hazards (e.g., as alarms, alerts, or decision support). The system indications are based on information from sensors measuring the value of a monitored variable (Lehto, Papastavrou, & Giffen, 1998). When a value crosses a predefined threshold, a perceptible cue is issued, usually to prompt an action or to draw attention to an event. Examples of such aids are smoke alarms, collision warning systems in cars or airplanes, intelligent target aiding, and alerts in automated production. Technological and environmental constraints usually limit the sensors' ability to distinguish between different states, so their indications are almost never entirely correct. Operators usually need to evaluate the information from the system and combine it with additional available information to make binary categorization decisions (e.g., decide whether or not danger or a malfunction exists).

Numerous studies have shown that people can benefit from such information, even though they tend to give nonoptimal weights to the information from the different sources, assigning too little or too much weight to cues (e.g., Bliss, 2003; Dixon & Wickens, 2006; Dixon, Wickens, & McCarley, 2007; Maltz & Meyer, 2001; Mosier & Skitka, 1996; Mosier, Skitka, Heers, & Burdick, 1998; Skitka, Mosier, & Burdick, 1999). For example, Wickens and Dixon (2007) concluded from an analysis of the literature that to have any benefit above unaided human performance, binary recommendations need to have at least 70% reliability. Below this value, performance with the aid may be worse than performance without it, because individuals may overrely on cues. Even when cues help to improve performance, users' aided decisions may still be worse than or equal to the performance of the cueing system alone (Botzer, Meyer, & Parmet, 2013; Goh, Wiegmann, & Madhavan, 2005; Meyer, Wiczorek, & Günzler, 2014). This is due to people overrelying on their own judgment and assigning too little weight to cue indications. The incorrect weighting of cues may depend on the number of concurrently performed tasks, on previous experiences with the system, on the nature of the mistakes the system makes, and even on previous mental schemas regarding automation.




For instance, Wickens and Dixon (2007) found that across studies in which the cue-aided task was not the primary task, the weight assigned to cues was not significantly correlated with the reliability of the cues. Thus, decision makers supported their performance of primary tasks by depending on cues for the cue-aided task. In other cases, however, decision makers do not overdepend on cues but may even completely ignore them. This occurs when cues have too often been wrong in the past (Parasuraman & Riley, 1997; Sorkin, 1988). Finally, users may underrely on cues after only a few automation mistakes, if in these cases the correct "answer" was obvious (Madhavan, Wiegmann, & Lacson, 2006). According to Dzindolet, Pierce, Beck, Dawe, and Anderson (2001), individuals may hold a "perfect automation schema" that is partly shattered by automation failures, leading to a sharp drop in trust. The effect of cues on individuals' decisions is therefore schema dependent, experience dependent, reliability dependent, and, of course, task dependent.

In this study we investigate the effect of cues on users' decisions in a simulated quality control task. In each trial we presented 25 items simultaneously in a 5 × 5 array (see Figure 1). Decisions could be made based on patterns of dark and light areas in the items that were probabilistically related to the items' state (i.e., intact or faulty). The task resembles the visual inspection of printed circuit boards, where darker areas could be dust particles or just the result of shading due to uneven grains in a silicon wafer (e.g., Yoda, Ohuchi, Taniguchi, & Ejiri, 1988).

In a previous study with the same experimental task (Botzer et al., 2013), we explored the effect of binary cues on the effort individuals invested in the task. We found that cues improved decision accuracy. Reported effort in the task was lower with the cues than without them only when cues had high validity. The findings may indicate that users strive for a certain self-defined level of performance and are willing to invest effort within a certain range to achieve this level of performance. When cues are highly valid, users may realize they can achieve a satisfactory level of performance even when they invest less effort in the task. They may, therefore, decide to invest less effort in processing information from sources other than the cues.

Figure 1. A sample screen in the experiment. See the online article for the color version of this figure.

Such a decision is congruent with the view of effort as a costly resource that is selectively allocated (e.g., Navon & Gopher, 1979; Robert & Hockey, 1997). Although there was no time limit for trials in the experiment, participants may still have limited the time they spent on the experimental task, rather than the effort. Whether participants tried to save effort, time, or both, there are two not mutually exclusive ways in which cues may affect user performance: (a) cues may affect the visual scanning procedure users employ when they scan the screen, and (b) cues may affect the amount of information users collect before each decision.

Effects of Cues on Visual Scanning

There are two types of visual detection tasks. In some tasks a single item appears, and it has to be classified into one of two or more categories. In other tasks multiple items appear on the screen, and specific target items have to be detected among the displayed items. Examples of such tasks are the analysis of aerial photos for certain objects or the analysis of medical images for abnormalities. Cues in these tasks can point to the location of a possible target, such as a tumor in mammography (e.g., Oliver et al., 2010). They may thus affect the inspection of items or areas in the viewed image. For instance, a person may scan all cued items, scan the uncued items only very quickly, and may even decide to scan just a subset of them.

This notion is supported by the results of a series of studies by Yeh and colleagues (Yeh, Merlo, Wickens, & Brandenburg, 2003; Yeh & Wickens, 2001; Yeh, Wickens, & Seagull, 1999). Participants performed a simulated target detection task on images of natural terrains. Their tendency to miss a target that was not cued (i.e., a nuclear weapon) was greater when it appeared on the same screen with another target that was cued (e.g., a tank) than when it appeared on the same screen with another target that was not cued. The researchers concluded that cues may lead to "cognitive tunneling," namely that they capture attention, so that attention is withdrawn from other stimuli or domains. Maltz and Shinar (2003) asked participants to detect military vehicles in blurred images of natural terrains. The experimental conditions differed in the cue reliabilities. Participants detected fewer targets when aided by a system that falsely cued, on average, three areas per image, compared to participants with more reliable systems (0 or 1 false cues per image). Participants falsely identified targets in cued areas, and they failed to find actual targets in areas that were not cued, because they did not inspect them carefully or even ignored some of them entirely.

The lower likelihood of finding additional targets after locating other targets in an image is a known phenomenon in radiology, where it was named "satisfaction of search" (SOS; e.g., Berbaum et al., 1998; Berbaum et al., 1990; Samuel, Kundel, Nodine, & Toto, 1995). Fleck, Samei, and Mitroff (2010) demonstrated SOS outside radiology, when targets were letters. They showed that participants who experienced larger proportions of single-target trials were more likely to stop the search on trials with two targets before they found the second target. Apparently, these participants, compared to participants who saw a smaller proportion of single-target trials, judged the presence of a second target to be less likely. Also, SOS existed even when there was no time limit for a trial.



Thus, SOS is probably not the result of immediate time pressure but rather is related to the expected likelihood of finding targets. Binary cues, then, may facilitate SOS in target search tasks, because users may first inspect a large proportion of the cued areas, and with valid cues, they will indeed find targets in at least part of these areas. After finding targets in the cued areas, they may decide not to scan some of the uncued areas, as they expect targets there to be less likely. Kundel and La Follette (1972) already showed that when expert radiologists inspect X-ray plates, they examine first, and more closely, those areas in which they most expect to find tumors. Binary cues may have the same role that knowledge has in target detection tasks, namely, they direct attention to suspected areas. In both Botzer and colleagues (2013) and Maltz and Shinar (2003), participants tended to mark cued areas as targets before marking uncued areas as targets.

Berbaum and colleagues (1996) conducted a study that has some relevance for target detection with multiple unknown possible targets. Participants searched abdominal radiographs for extraintestinal abnormalities. Some regions of the radiographs were imaged with contrast material (making it easier to detect abnormalities) and some without contrast material. Participants spent less time gazing at noncontrast areas, causing them to miss abnormalities. Contrast areas were inspected first, and the abnormalities found there satisfied searchers, so that other areas were scanned less. When searching for targets with the aid of binary cues, viewers may thus tend to inspect cued areas before they inspect uncued areas, and they may choose to inspect only some of the uncued areas.

One possible rule for deciding how many uncued areas to inspect before terminating the search may be "probability matching." With this rule, viewers inspect items until the proportion of items identified as targets matches their estimated proportion of targets among all items. Bliss, Gilson, and Deaton (1995) suggested that people use probability matching to determine their responses to alarms, because response probabilities corresponded with the probability of a true alarm. This tendency may be especially strong when items appear in arrays and the probability for faults may, to some extent, be suggested by the ratio of cued to uncued items on screens. When inspecting arrays of items with the aid of cues, the choice not to inspect part of the uncued areas reduces the amount of processed information and the time and effort invested in the task. Yet, if the automatic detection algorithm is not 100% correct, this scanning procedure is bound to result in missed targets. It is therefore important to investigate whether and how cues affect the scanning of screens in target search.
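To make the probability-matching stopping rule concrete, the following sketch simulates an observer who inspects cued items first and stops scanning once the proportion of items marked as targets reaches an assumed prior. It is our illustration, not a model from the studies cited; the item counts, cue probabilities, and the accuracy of single-item judgments are arbitrary assumptions.

```python
import random

def scan_screen(n_items=25, p_fault=0.25, p_cue_given_fault=0.8,
                p_cue_given_intact=0.2, seed=0):
    """Illustrative probability-matching stopping rule: inspect cued items
    first, then uncued items, and stop once the proportion of items marked
    as faulty reaches the assumed prior probability of a fault."""
    rng = random.Random(seed)
    items = []
    for _ in range(n_items):
        faulty = rng.random() < p_fault
        cued = rng.random() < (p_cue_given_fault if faulty else p_cue_given_intact)
        items.append((faulty, cued))

    # Cued items are inspected before uncued items.
    order = sorted(range(n_items), key=lambda i: not items[i][1])

    marked, inspected = set(), []
    for i in order:
        if len(marked) / n_items >= p_fault:   # probability matching: stop the search here
            break
        inspected.append(i)
        faulty, _cued = items[i]
        # Toy single-item judgment: imperfect perception of the item's true state.
        perceived_faulty = faulty if rng.random() < 0.85 else not faulty
        if perceived_faulty:
            marked.add(i)
    return inspected, marked

inspected, marked = scan_screen()
print(f"inspected {len(inspected)} of 25 items, marked {len(marked)} as faulty")
```

With such a rule, part of the uncued items is never inspected, which is exactly the behavior the scanning analyses below try to detect.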

Cues as Information for Item Classification

The second way in which cues can affect categorization tasks is by changing the prior probability that an item is a target. With valid cues, fewer additional indications are needed to classify an item as a target when it is cued as a target than when it is not cued. Data-driven binary classification processes are often modeled with random-walk models (e.g., Edwards, 1965; Ratcliff & Rouder, 1998; Townsend & Ashby, 1983; Vandekerckhove, Tuerlinckx, & Lee, 2011), where data are collected until enough evidence accumulates to choose one alternative or the other (e.g., an item is classified as intact or faulty). The process can be conceived as a walk between two boundaries, or thresholds, that continues until one of the thresholds is crossed. The estimated prior probability for one state of the world or the other (e.g., the prior confidence that an item is faulty before the inspection starts) affects the random-walk process. The greater the likelihood of one state of the world, the closer to the threshold that corresponds to that state the "walk" should start (Wolfe & Van Wert, 2010), and thus, the faster the walk should end by crossing the boundary corresponding to that state of the world, and the longer it should take to cross the threshold for the other state of the world (Edwards, 1965; Palmer, Huk, & Shadlen, 2005; Wolfe & Van Wert, 2010).

In a simulated baggage screening task, Wolfe and Van Wert (2010) found partial support for their prediction from a random-walk model in a target search task. When target (guns and knives) prevalence was higher, decisions to terminate a search stating "target absent" were slower, as predicted by a random-walk model. However, in contrast to the prediction from a random-walk model, detection times for targets were not faster when the prevalence of targets was higher, although searchers tended more to declare "target." Wolfe, Horowitz, and Kenner (2005) reported similar effects of target prevalence on the time to state "target absent."

Indications from a cueing system change the prior probability for one state of the world or the other (Botzer, Meyer, Bak, & Parmet, 2010; Getty, Swets, Pickett, & Gonthier, 1995; Robinson & Sorkin, 1985). Thus, according to a random-walk model, if decision makers trust cues, they should need less time to decide according to the cues and more time to decide against the cue indication. In Maltz (2005), participants tracked a tank and at the same time had to detect other tanks when those entered the frame. Their decisions that the cued area was indeed a tank were faster with the higher validity cues than with the lower validity cues. This finding corresponds with the prediction from a random-walk model that greater trust should facilitate decisions that correspond with recommendations. Maltz (2005) did not evaluate the time to override a cue recommendation (i.e., to declare "no tank" for a cued area).

We thus analyze the effect of cues on the performance of target detection tasks at two levels, as shown in Table 1.


Table 1
Conceptual Framework of the Effects of Binary Cues

Level of analysis   Process (level of effect)            Outcome (level of effect)
Single decision     Data-driven binary classification    Durations of binary classifications; proportions of correct and incorrect classifications
Screenwide          Screen scanning procedure            Proportion of areas inspected; proportions of correct and incorrect classifications; screen completion durations




(a) At the single decision level, when analyzing each of a series of decisions on its own, cues can provide information for decisions and help improve decision accuracy (e.g., Bliss, 2003; Dixon & Wickens, 2006; Dixon, Wickens, & McCarley, 2007). In addition, cues may also affect the amount of data collected before each decision, and as a consequence, they may also affect the time and effort invested in making that decision. (b) At the screenwide level, cues may affect the scanning of the screen when a number of stimuli appear on the same screen. Observers may decide to focus on cued areas, because targets are more likely there, and, therefore, they may be able to detect a reasonable number of targets without scanning the entire screen. This may shorten task completion times (Botzer, Meyer, & Parmet, 2013; Maltz & Shinar, 2003) and lower the effort invested in the task (Botzer, Meyer, & Parmet, 2013).

Estes (1986) suggested that the categorization of stimuli, based on multiple sources of information, involves three stages: (a) evaluation of each source of information; (b) integration, that is, the combination of representations derived from evaluation; and (c) decision, that is, the mapping of the integration to a response. The two possible effects of the cue (as an information source that is combined with additional information, and as an indication that directs attention and scanning processes) may affect two different stages in the categorization of stimuli. The evaluation of each source of information, or rather the decision whether to evaluate a source of information and how much attention to devote to it, is reflected in the effects of cues on the screen scanning procedure. The effect of cues on the integration stage is reflected in the weight assigned to the cues in the categorization decisions. These two effects are eventually manifested in the responses to different stimuli.

We used eye activity measures to investigate the effects of cues. Only a few studies have used eye tracking to study search patterns with cues (e.g., Liao, Granada, & Johnson, 2005; Lorigo et al., 2008). In these studies, areas that were not highlighted were inspected less often. This was expected, because participants could end the search after they found the single target or the two targets that existed in an image. We found no studies on the effect of cueing on target search tasks with unknown numbers of targets. Eye tracking data can reveal which items users inspect first and whether they indeed choose to inspect only part of the uncued areas. Eye activity data can also provide indications of the effect of cues on the time invested in making single decisions. As we described before, a random-walk model predicts that decisions should be faster if they follow cue recommendations and slower if they override these recommendations.

In addition to the analysis of the effects of cues on participants' scanning procedure and on the durations of single decisions, it is also informative to analyze the weights assigned to the cues in decisions, relative to a quantitative benchmark. To do so, we used signal detection theory (SDT) measures, as used by Maltz and Meyer (2001); Meyer (2001); Parasuraman, Hancock, and Olofinboba (1997); and others. In terms of SDT, the binary cueing system and the user are two detectors, and decisions result from their joint performance (Sorkin & Woods, 1985). Each detector has a certain sensitivity (d′) to distinguish between two possible states (e.g., whether a product is faulty or intact). Both detectors receive information and evaluate it relative to a response threshold, which determines their output given a certain input. The binary cueing system has a preset response threshold, and its output serves as an additional input for the human detector. According to the contingent criterion model (Robinson & Sorkin, 1985), the response thresholds (β) the human detectors use with the cueing system reflect the weight they assign to the cue in their decisions.

To the extent that cues are valid, the probability for a signal (e.g., a fault in a product) is larger when the system issues a cue for a failure and smaller when there is no cue. Therefore, threshold settings should change as a function of the system output, that is, βCue < βNoCue, as the probability for a signal increases or decreases, respectively. For nonvalid cueing systems the probability for a signal or noise does not change with the different cues, and therefore, as the system output has no predictive value, the same threshold should be used regardless of the system output, that is, βCue = β = βNoCue, where β is the threshold when there is no cueing system at all. The differential adjustment of the response criteria to the output of the cueing system is a measure of the individual's response to the cues and of their weight in decisions. When an individual uses the same response criterion regardless of the cue, she or he obviously ignores the cue. The use of different criteria indicates that the individual responds to the cue, and the larger the difference between the settings, the stronger the response.

Meyer (2001, 2004) identified two responses to cues. The first is compliance, which is the tendency to act in accordance with a cue that implies a problem ("signal" in SDT terms), for example, to discard a product when it is cued as faulty. The second is reliance, which is the tendency to rely on the binary cueing system when it indicates that no problem exists ("noise" in SDT terms), for example, to approve a product when it was not cued as faulty. One can assess the degree of compliance and reliance by comparing individuals' response thresholds when they use a binary cueing system to their response thresholds when they make their decisions without such an aid. Several studies have shown that different variables affect the two responses differentially. For instance, false alarms were found to affect both reliance and compliance, while missed detections only affected reliance (Dixon & Wickens, 2006; Dixon, Wickens, & McCarley, 2007; Meyer, Wiczorek, & Günzler, 2014; Rice & McCarley, 2011).

In the current study, we used SDT to estimate the weight users assigned to cues in their decisions, and we compared these weights to the normative optimal values according to this model. We also used SDT to evaluate the contribution of binary cues to performance in the quality control task. As we mentioned before, we investigated the underlying decision-making processes, namely the scanning procedure and the time for single decisions, using eye activity measures. SDT measures are to some extent "indifferent" to these processes. With respect to single decisions, we defined two new terms to quantify the trust in cues: (a) Countercompliance is the increase in the decision time when stating that an item is intact although a cue indicated that it contained a target, compared to the time to decide that an item is intact without cues; (b) counter-reliance is the increase in the decision time when stating that an item contains a target although the cueing system marked it as intact, compared to the time to decide that an item contains a target without cues. As trust in cues is a function of the sensitivity (i.e., validity) of the cues (e.g., Botzer et al., 2013; Maltz, 2005; Meyer, 2001, 2004), we used cues from medium sensitivity and from high sensitivity systems.
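The countercompliance and counter-reliance predictions can be illustrated with a toy random-walk simulation. This is our sketch, not a model fitted in any of the studies cited; the drift, noise, boundary, and the amount by which a cue shifts the starting point are arbitrary assumptions. Shifting the starting point toward the "faulty" boundary makes agreeing decisions faster and overriding decisions slower.

```python
import random

def decision_steps(true_drift, start, bound=30, seed=None):
    """Random walk between -bound ('intact') and +bound ('faulty').
    Returns the chosen label and the number of steps (a stand-in for time)."""
    rng = random.Random(seed)
    x, steps = start, 0
    while abs(x) < bound:
        x += true_drift + rng.gauss(0, 3)   # noisy evidence samples
        steps += 1
    return ("faulty" if x >= bound else "intact"), steps

def mean_steps(true_drift, start, label, n=2000):
    times = []
    for _ in range(n):
        lab, t = decision_steps(true_drift, start)
        if lab == label:
            times.append(t)
    return sum(times) / max(len(times), 1)

# A cue for "faulty" shifts the starting point toward the "faulty" boundary.
cued_start, uncued_start = 10, 0   # arbitrary illustrative values
print("faulty decision, item cued faulty:", mean_steps(+1, cued_start, "faulty"))
print("faulty decision, no cue:          ", mean_steps(+1, uncued_start, "faulty"))
print("intact decision, item cued faulty:", mean_steps(-1, cued_start, "intact"))  # countercompliance
print("intact decision, no cue:          ", mean_steps(-1, uncued_start, "intact"))
```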
We expected the sensitivity of cues to affect the degree of compliance, reliance, countercompliance, and counter-reliance, and to also affect the screen scanning procedure. Fisher and Tan (1989) have shown that the tendency to scan highlighted areas before other areas is greater when the highlighting is more valid.



When users do not greatly trust the highlighting, they may prefer a sequential scanning of possible targets, even when color is used for highlighting, although color is a basic feature of a stimulus and may readily draw attention. Further, Fleetwood and Byrne (2006) showed that users prefer to examine icons nearest to their current point of gaze, and MacGregor and Lee (1987) maintained that much of the data on response times in target search tasks can be accommodated by a systematic, rather than a random, visual search model. Thus, with lower sensitivity cues, the tendency to inspect cued items in a sequence should be weaker, because the basic tendency to inspect items according to their spatial location may be dominant.

Our hypotheses in this study are as follows:

Hypothesis 1: Task performance will improve with binary cues and improve more with the more valid cues.

Hypothesis 2a: Users will comply with cues, and more strongly with the more valid cues.

Hypothesis 2b: Users will rely on cues, and more strongly on the more valid cues.

Hypothesis 3: Users will inspect a large proportion of the cued items before they inspect uncued ones, especially when cues are more valid.

Hypothesis 4: Users will inspect a larger proportion of cued items than uncued items in blocks that contain cues, and a smaller proportion of items in blocks with cues than in blocks without cues, especially with more valid cues.

Hypothesis 5: Users will show countercompliance and counter-reliance, and these will be stronger with more valid cues.

Hypothesis 6: Single decisions about items will be faster with cues than without them when the final decision corresponds with the binary cue recommendation. This effect will be stronger with the more valid cues.


Hypothesis 7: Task completion times will be shorter in blocks with binary cues than in blocks without binary cues, especially with more valid cues.

Hypothesis 8: Subjectively experienced effort will be lower with binary cues and will decrease more with more valid cues.

Method

Participants

Participants were 26 students (26 to 31 years old, with a mean age of 27 years and a standard deviation of 1.74 years; 22 were women) who received 50 New Israeli Shekels (ILS; ~US$14) for their participation. We informed them that each point they gained in the experiment was a raffle ticket in a lottery for a 200 ILS (~US$57) cash prize. The reward program was designed to motivate participants to invest effort in the task even when they believed their initial task performance was low, because the probability of winning the prize always increased with the number of points gained. All participants had uncorrected Snellen visual acuity of 6/9 (20/30) or better, normal color vision, and normal contrast sensitivity.

Task and Procedure

The experimental task was similar to the binary categorization and detection task used in Botzer et al. (2013). Participants identified defective items in matrices of 25 items, with each item being a 5 × 5 matrix of black and white cells (see Figure 1 for an example of the experimental screen). White squares were locations in which an item was tested and found intact. In our experiment there were 13 white squares in each item. An item is damaged if it contains a continuous 2 × 2 field that is not intact. In each matrix, only a randomly sampled subset of the squares is tested, and these squares appear white if they are intact. Participants had to decide whether items were faulty, based on this partial information. A decision problem exists because squares are black either when they are faulty or when they have not been tested. Thus, a continuous 2 × 2 black field may also exist in an intact item. Figure 2 shows an example of intact and damaged items.

If all squares are tested, one can classify an item as faulty or intact with certainty. If only a small number of squares are tested, the distinction between faulty and intact items is only possible when dark squares do not form a continuous 2 × 2 area. In all other cases, the probability for a fault is a function of the characteristics of the item. In Botzer et al. (2013) we show that the probability for a fault in an item is a function of the number of possible fault areas (2 × 2 black squares), the number of white cells, and the prior probability for a fault.

Even though this is an artificial and abstract task, it resembles in its essential properties real-life tasks in which people have to locate targets in images. A radiologist, for example, may contemplate whether a dense group of cells is a tumor, because healthy cells may sometimes be just as dense (Lai, Li, & Biscof, 1989). Examples of similar tasks are the scanning of bitewing radiographs for dental lesions (e.g., Araki, Matsuda, Seki, & Okano, 2010; Mobley & Goldstein, 1978), aerial photos for targets, printed circuit boards for faults (e.g., Yoda et al., 1988), or images from an agricultural robotic system for picking melons (see Bechar, Meyer, & Edan, 2009, for a detailed discussion of the latter application).

Figure 2. Sample of an intact item (upper left) and a faulty item (upper right) when all squares are tested and a sample of an intact item (lower left) and an item in which a failure can exist (lower right) when only part of the squares are tested. See the online article for the color version of this figure.
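For concreteness, the following sketch generates an item of the kind described above: a 5 × 5 grid of cells, 13 of which are revealed as tested, with black cells that are either damaged or untested. The way a fault is placed within a faulty item and the prior probability used here are our assumptions for illustration, not the exact generator used in the experiment.

```python
import random

SIZE, N_TESTED = 5, 13   # 5 x 5 cells per item, 13 cells revealed as tested

def make_item(faulty, rng):
    """Return a SIZE x SIZE grid of cell states (True = damaged).
    Assumption: a faulty item has one randomly placed damaged 2 x 2 block."""
    damaged = [[False] * SIZE for _ in range(SIZE)]
    if faulty:
        r, c = rng.randrange(SIZE - 1), rng.randrange(SIZE - 1)
        for dr in (0, 1):
            for dc in (0, 1):
                damaged[r + dr][c + dc] = True
    return damaged

def render(damaged, rng):
    """Reveal N_TESTED randomly sampled cells: white ('W') if tested and intact,
    black ('B') if tested and damaged or not tested at all."""
    tested = set(rng.sample(range(SIZE * SIZE), N_TESTED))
    return [['W' if (r * SIZE + c) in tested and not damaged[r][c] else 'B'
             for c in range(SIZE)] for r in range(SIZE)]

def has_black_2x2(grid):
    """A necessary (but not sufficient) sign of a fault: a continuous 2 x 2 black field."""
    return any(all(grid[r + dr][c + dc] == 'B' for dr in (0, 1) for dc in (0, 1))
               for r in range(SIZE - 1) for c in range(SIZE - 1))

rng = random.Random(1)
item = render(make_item(faulty=rng.random() < 0.25, rng=rng), rng)
print("\n".join(" ".join(row) for row in item))
print("possible fault:", has_black_2x2(item))
```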




Even though our task does not allow us to accurately predict performance levels or the pace of learning in actual target search tasks, it can provide predictions about the effects cues and their properties may have on such tasks.

The experimental task has several major advantages for the study of categorization decisions, and for the study of the effects of cues on scanning procedures in particular. (a) There are 25 stimuli on a screen, so relatively few screens are needed to obtain a sufficiently large number of items to compute stable estimates of hit and false alarm rates. (b) No previous experience participants may have is relevant for this task, and the task is relatively insensitive to basic sensory abilities such as visual acuity or contrast sensitivity. Thus, participants can be considered homogeneous in terms of their experience, and one can safely assume that they have the sensory abilities required to collect the information needed for their decisions. (c) One can easily define areas of interest for fixation analyses, as the matrix of items neatly divides the screen into squares of equal size. (d) The probability for a fault in an item is a function of the number of possible fault positions in the item (see Botzer et al., 2013, for the computation of the probabilities). Participants can use the visible information and compare it to a response threshold.

Participants performed the experiment individually. The experimenter gave them the instructions and read them aloud, referring to figures of the experimental stimuli and screens. Participants also performed two familiarization trials to ensure they understood the task. We provide full details of the instructions and the familiarization in Botzer et al. (2013).

Participants decided whether to approve production for each item in a 5 × 5 matrix of items, based on the pattern of 13 randomly sampled intact squares in each item. They right-clicked on an item they identified as faulty, causing a blue frame to appear around the selected item. They could change their decision by clicking on a selected item again. A trial screen ended when participants clicked the "Submit" button. There was no time limit to complete a screen. Cues, in the form of black frames around items, indicated possibly faulty items in half the experimental blocks.

Participants received 150 points for correctly identifying a fault (a "hit") and lost 50 points for failing to detect one (a "miss"). A "correct rejection" was neither rewarded nor punished, and a "false alarm" was punished by 100 points. The experimenter explained these outcomes and how the number of points gained in the task was related to the chance of winning the 200 ILS raffle. This payoff matrix was intended to encourage participants to search for faulty items, while discouraging them from excessively classifying items as faulty.

The experimental conditions differed in the sensitivity of the cueing system (its d′Cue). The more sensitive cueing system (d′Cue = 3) was more likely to mark faulty items and less likely to mark intact items than the less sensitive cueing system (d′Cue = 1.7). The threshold value for both cueing systems was ln β = 0 (see Table 2 for the conditional probabilities for the different values of d′Cue).
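Under the equal-variance Gaussian SDT model, the conditional probabilities of the two cueing systems (reported in Table 2 below) follow from d′Cue, the threshold ln β = 0, and the prior probability of a fault (ps = .25, given below). The following sketch is our reconstruction; small differences from the published values reflect rounding.

```python
from statistics import NormalDist

Phi = NormalDist().cdf   # standard normal cumulative distribution function

def cue_probabilities(d_prime, p_signal=0.25):
    """Equal-variance Gaussian detector with ln(beta) = 0, i.e., the criterion
    lies midway between the noise and signal distributions."""
    p_cue_given_target = Phi(d_prime / 2)       # hit rate of the cueing system
    p_nocue_given_notarget = Phi(d_prime / 2)   # correct rejection rate
    p_cue = (p_signal * p_cue_given_target
             + (1 - p_signal) * (1 - p_nocue_given_notarget))
    # Bayes's theorem for the posterior probabilities
    p_target_given_cue = p_signal * p_cue_given_target / p_cue
    p_notarget_given_nocue = ((1 - p_signal) * p_nocue_given_notarget
                              / (p_signal * (1 - p_cue_given_target)
                                 + (1 - p_signal) * p_nocue_given_notarget))
    return (p_cue_given_target, p_nocue_given_notarget,
            p_target_given_cue, p_notarget_given_nocue)

for d in (1.7, 3.0):
    print(d, [round(p, 2) for p in cue_probabilities(d)])
# Prints values close to those in Table 2; differences of about .01 reflect rounding.
```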
Table 2
The Conditional Probabilities Associated With the Cues From the d′ = 1.7 and the d′ = 3 Systems

Sensitivity (d′Cue)   pCue|Target   pNoCue|NoTarget   pTarget|Cue   pNoTarget|NoCue
1.7                   0.80          0.80              0.57          0.92
3                     0.93          0.93              0.81          0.97

The probabilities pCue|Target and pNoCue|NoTarget were independent of the prior probability, ps, while the probabilities pTarget|Cue and pNoTarget|NoCue depended on the prior probability for a fault, which was ps = .25. One can compute the latter probabilities from the former ones and ps by using Bayes's theorem. These probabilities are, in fact, the percentage of correct cue indications when an item is cued and when an item is not cued, respectively. We did not inform participants about the sensitivity of the cues in their experimental condition, but we told them that cues were not perfect and that they did not have to follow the cue recommendations.

In all experimental blocks, participants saw a feedback screen after they submitted their decisions (see Figure 3). The screen was similar to the screen they had just completed, except that the continuous 2 × 2 field in each faulty item was painted red. Thereby, participants could learn about their correct and wrong answers and assess the accuracy of the cueing system when it was active. In the left part of the feedback screen, participants received a summary of the frequency of each outcome and the corresponding payoff. The total score for the screen and for the experiment appeared below and above the written summary, respectively.

Each experimental block consisted of 12 screens, followed by an on-screen NASA-Task Load Index (TLX) questionnaire (Hart & Staveland, 1988) to assess participants' mental workload. The paired-comparison part of the original NASA-TLX procedure was omitted, as suggested by Byers, Bittner, and Hill (1989) and Nygren (1991). Upon completion of the questionnaire, there was a 1-min rest period before the next block began.

Apparatus and Design

We conducted the experiment on PCs with 17-in. monitors, using experimental software programmed in C# .Net. The four experimental conditions were combinations of the sensitivity of the binary cueing system (d′Cue = 1.7 or 3) and the blocks in which the binary cueing system was active (either in Blocks 1 and 3 or in Blocks 2 and 4). We randomly assigned 13 participants to each of the two d′Cue levels. Participants received feedback after each decision, so we expected that they could assess the sensitivity of the system, even though they did not use other systems. This expectation is supported by previous studies with sensitivity as a between-subjects variable (e.g., Botzer et al., 2013; Maltz, 2005; Meyer, 2001). The status of the binary system (active or not active) and the half of the experiment (Blocks 1 and 2 in the first half and Blocks 3 and 4 in the second half) were within-subjects variables. For the blocks with and without cues we computed mean d′eq values, mean cutoff settings, and mean times to complete blocks, and we used these means as dependent variables in the experiment.

We recorded eye movements with a pan/tilt control eye-tracking system (Model D6, Applied Science Laboratories, Bedford, Massachusetts), sampling the visual gaze at 60 Hz, with a nominal accuracy of 1°. We computed fixations in ILAB (Gitelman, 2002), using the dispersion methodology, with a minimum fixation duration of 100 ms, a dispersion of 1 visual degree considered a fixation, and the maximum consecutive sample loss set to infinity (the default setting).



Figure 3. The feedback screen. See the online article for the color version of this figure.

Based on the eye movement data, we computed the estimated fixation time on single items, the order in which items were scanned, the proportion of transitions between adjacent items, and the proportion of items inspected. We used these measures to study users' screen scanning procedure.
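As an illustration of the dispersion methodology, a generic dispersion-threshold (I-DT) fixation filter with the parameters reported above might look as follows. This is not the ILAB implementation; treating 1° as the maximum allowed dispersion within a fixation is our assumption.

```python
def _dispersion(xs, ys):
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def idt_fixations(t, x, y, max_disp=1.0, min_dur=0.100):
    """Dispersion-threshold fixation detection (I-DT).
    t: sample times in seconds; x, y: gaze position in degrees of visual angle."""
    fixations, i, n = [], 0, len(t)
    while i < n:
        j = i
        while j < n and t[j] - t[i] < min_dur:   # window spanning the minimum duration
            j += 1
        if j >= n:
            break
        if _dispersion(x[i:j + 1], y[i:j + 1]) <= max_disp:
            # grow the window while the samples stay within the dispersion threshold
            while j + 1 < n and _dispersion(x[i:j + 2], y[i:j + 2]) <= max_disp:
                j += 1
            fixations.append({"onset": t[i],
                              "duration": t[j] - t[i],
                              "x": sum(x[i:j + 1]) / (j - i + 1),
                              "y": sum(y[i:j + 1]) / (j - i + 1)})
            i = j + 1
        else:
            i += 1
    return fixations

# Example: 60-Hz samples that stay within 1 degree for ~0.5 s yield one fixation.
ts = [k / 60 for k in range(30)]
print(idt_fixations(ts, [10.0] * 30, [5.0] * 30))
```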

Results

We analyzed responses with repeated measures ANOVAs, applying Greenhouse–Geisser corrections for violations of sphericity for within-subject effects. In the analyses of the eye-tracking data, we had to account for missing values, a common problem in eye-tracking procedures, where short periods of failure to capture the participant's eyes occur (Borowsky, Shinar, & Parmet, 2008). To obtain accurate estimates of our parameters, we computed a linear mixed model (LMM) analysis with a random intercept on the relevant dependent variables. We computed effect sizes for LMM findings using the computational method developed by Xu (2003).

Our experimental design included order as a control variable, so that half of the participants first completed a block without cues and the other half a block with cues. Such counterbalancing allowed us to separate the effects of the cues from the effects of time on task on dependent variables such as d′ and score. We report here the results of analyses that do not include the order variable as an independent variable. We first analyzed the results with this variable and found that its effects never qualified other findings. We therefore chose not to include it in our ANOVA and LMM models. We divided the experiment into two halves (first half: Blocks 1 and 2; second half: Blocks 3 and 4) and used the half as a within-subjects variable.


Sensitivity (d′eq)

We computed a three-way ANOVA (Sensitivity × Status × Half) on the d′ equivalent (d′eq), which is the overall performance measure in the quality control task (i.e., the d′ based on the overall proportions of hits and false alarms in an experimental block). Note the difference between d′eq, which is a measure of the quality of performance in a block, and d′Cue, which indicates the sensitivity of the cueing system. Decision performance was better in blocks with cues (M = 2.04, SD = .82) than in blocks without cues (M = 1.03, SD = .41), F(1, 24) = 102.72, MSe = .25, p < .001, ηp² = .81. This improvement was larger when cues were more valid, F(1, 24) = 21.63, MSe = .25, p < .001, ηp² = .47 (see Figure 4). Thus, Hypothesis 1 was supported.

The mean d′eq values in blocks without cues were approximately d′eq = 1, so participants could perform the task and did not mark items arbitrarily. Participants' performance with both cueing systems (i.e., d′eq = 1.51 and d′eq = 2.56; black bars in Figure 4) was lower than the sensitivity of the cueing systems themselves (i.e., d′Cue = 1.7 and 3). In other words, if participants had just followed the cues and had not employed any judgment of their own, performance in the task would have been at least as good or even better (see dashed lines relative to error bars in Figure 4).

The complementary analysis of participants' mean scores revealed a pattern similar to that of the analysis of d′eq. Decision performance improved with cues, as the mean score was greater in blocks with cues (M = 4415.385, SD = 3525.92) than in blocks without cues (M = -383.654, SD = 1,947.31), F(1, 24) = 4.882, MSe = 7,418,257.21, p < .001, ηp² = .77. This improvement was larger when cues were more valid, F(1, 24) = 6.804, MSe = 7,418,257.21, p < .02, ηp² = .22.

Cutoff Settings

To assess participants' responses to the binary cues we analyzed their cutoff settings, designated as C in SDT. This is the location of the cutoff (or criterion) in a signal detection model, assuming equal-variance Gaussian (normal) distributions for signals and noise. It is defined as

C = -.5(ZH + ZFA),    (1)

where ZH and ZFA are the inverses of the normal distribution function for pHit (the probability of correctly detecting a signal, given that there was one) and pFA (the probability of stating that there was a signal when there was none, out of all nonsignal events), respectively.

Figure 4. Performance in d′eq units when cues were available (black bars) and when cues were unavailable (gray bars), for the different cue sensitivities (d′Cue). Error bars are 95% confidence intervals (0.324 for black bars, 0.164 for gray bars). Dashed lines represent d′Cue.




Macmillan and Creelman (1990) compared various measures of the response criterion in signal detection studies and concluded that a measure of criterion location, such as C, has advantages in empirical work over measures of the criterion such as β. With lower values of C, a user tends more to report a signal, or to discard a product in our experiment.

We computed participants' mean tendency, C, to discard items in blocks where they were not aided by binary cues. In blocks where binary cues were available, we computed the mean value of C for items cued as faulty (CCue) and the mean value of C for items the system did not cue as faulty (CUncued). If participants comply with the system, they should discard cued items more than they discard items in blocks where cues are unavailable. Therefore, their mean CCue value should be lower than their mean value of C (CCue < C). If participants rely on the system, they should discard uncued items less than they discard items in blocks where cues are not presented. Therefore, their mean CUncued value should be higher than their mean value of C (CUncued > C).

Compliance. Participants complied with the cues, as they showed a greater tendency to discard cued items (M = -.44, SD = .57) than to discard items in blocks without cues (M = .18, SD = .49), F(1, 24) = 69.05, MSe = .15, p < .001, ηp² = .74. Still, participants' compliance was much weaker than predicted from the normative model, in which CCue should be -1.14 and -2.16 for the d′Cue = 1.7 and 3 cues, respectively. Threshold settings that are higher than low normative thresholds or lower than high normative thresholds are a robust phenomenon in signal detection experiments (e.g., Chi & Drury, 1998; Craig & Colquhoun, 1977; Fox & Haslegrave, 1969; Mobley & Goldstein, 1978) and are known as "sluggish beta" (Wickens, 1992).

In contrast to previous findings (Maltz & Meyer, 2001; Meyer, 2001), compliance was not stronger with the more sensitive cues. Participants in the d′Cue = 3 group tended more to discard items (M = -.35, SD = .94) than participants in the d′Cue = 1.7 group (M = .08, SD = .77), whether cues were available or not, F(1, 24) = 7.58, MSe = .42, p < .05, ηp² = .24. In other words, participants in the d′ = 3 group chose lower criteria (CCue and C) than participants in the d′ = 1.7 group. However, the difference between CCue and C was not greater in the d′ = 3 than in the d′ = 1.7 group, and, therefore, compliance was not stronger with the more sensitive cues. Participants in the d′ = 3 group probably performed the task better than participants in the d′ = 1.7 group, even though they did not comply more with the cues, because the d′ = 3 system was correct more often than the d′ = 1.7 system (pTarget|Cue = 0.81 and pTarget|Cue = 0.57, respectively), and because participants who used this system discarded a larger proportion of cued items. In conclusion, Hypothesis 2a was only partly supported. Participants complied with the cues but did not comply more strongly with the more sensitive cues.

Reliance. Participants relied on the cues, as they showed a lower tendency to discard uncued items (M = 1.10, SD = .69) than to discard items in blocks without cues (M = .18, SD = .49), F(1, 24) = 7.58, MSe = .63, p < .001, ηp² = .59. Also, Figure 5 reveals stronger reliance on the higher sensitivity than on the lower sensitivity cues in the second half of the experiment, F(1, 24) = 16.95, MSe = .03, p < .001, ηp² = .41, for the three-way interaction Sensitivity × Status × Half. This interaction corresponds with previous findings, as it suggests that over time decision makers adjusted their responses to the sensitivity of the cues (Botzer et al., 2013; Maltz & Meyer, 2001) and relied more strongly on the more sensitive cues (Maltz & Meyer, 2001; Meyer, 2001). This interaction may also account for the better performance with the more sensitive cues. As with compliance, the cutoffs participants chose were much lower than those the normative model would prescribe (the normative settings were CUncued = 2.08 and 2.95 for cues from the d′Cue = 1.7 and 3 systems, respectively). Thus, participants' responses to uncued items also reflected the sluggish beta phenomenon. In conclusion, Hypothesis 2b was partly supported. Participants relied on the cues, yet stronger reliance on the more sensitive cues was only evident in the second half of the experiment.
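For reference, the sensitivity and criterion measures used above can be computed from hit and false alarm proportions as in the following sketch. The proportions are made-up illustrative numbers, not data from the experiment.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf   # inverse of the standard normal distribution function

def d_prime(p_hit, p_fa):
    return z(p_hit) - z(p_fa)

def criterion_c(p_hit, p_fa):          # Equation 1: C = -.5(ZH + ZFA)
    return -0.5 * (z(p_hit) + z(p_fa))

# Hypothetical proportions for one participant:
# items cued faulty, items not cued, and items in blocks without cues.
c_cue    = criterion_c(p_hit=0.90, p_fa=0.35)
c_uncued = criterion_c(p_hit=0.55, p_fa=0.05)
c_nocue  = criterion_c(p_hit=0.70, p_fa=0.15)

print("C_Cue < C indicates compliance:   ", round(c_cue, 2), "<", round(c_nocue, 2))
print("C_Uncued > C indicates reliance:  ", round(c_uncued, 2), ">", round(c_nocue, 2))
print("d'_eq from overall proportions:   ", round(d_prime(0.75, 0.20), 2))
```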

Analyses of Users' Scanning Procedure

Order of scanning. According to Hypothesis 3, users will inspect a large proportion of the cued items before they inspect uncued ones, especially when cues are more sensitive. Knowing the conditional probabilities in Table 2 and the prior probability for a fault (p = .25), one can compute a mean number of 7.12 and 8.75 cued items for a screen with the d′ = 3 and d′ = 1.7 cues, respectively.

Figure 5. Reliance on the d′Cue = 1.7 and 3 cues in the first half (left panel) and in the second half (right panel) of the experiment. Error bars are 95% confidence intervals (left panel: 0.413 for black bars, 0.244 for gray bars; right panel: 0.367 for black bars, 0.292 for gray bars).



Because, on average, it took a bit more than a second to classify an item when it was cued faulty (see black bars in Figure 8), one can conclude that it would have taken, on average, approximately 10 s to classify all cued items in a sequence from the onset of each screen. In that case, most, or even all, transitions between items in the first 10 s from the onset of each screen should have been between cued items. On the other hand, if viewers inspected cued and uncued items intermittently, most transitions in the first 10 s of the screen should not have been between cued items. Thus, the ratio of consecutive inspections of cued items to the total number of transitions between items, in the first 10 s of each screen, can be used as a measure of the tendency to scan cued items in a sequence. Essentially, the estimated ratio should approach 1 as the tendency to inspect cued items in a sequence increases.

We defined each item on the screen as an area of interest, and we assigned consecutive numbers from 1 to 25 to the areas according to their location on the screen, from left to right (i.e., the top left item was 1, and the bottom right item was 25). We could identify consecutive fixations on cued items from the ordinal number assigned to each fixation and from knowing which items were cued. For example, if Fixations 6 and 7 were on two different cued items, we could count one instance of a fixation on a cued item immediately followed by a fixation on another cued item. For each screen we computed the number of consecutive fixations on cued items and the total number of transitions between items in each 5-s period since the appearance of a screen. We used these data to compute the proportion of transitions between cued items out of all transitions between items, cued or uncued.

The proportion of consecutive transitions between cued items in the first 10 s since the appearance of a screen was .289 (SE = .038), which was clearly lower than 1 but also larger than what would have been expected if transitions were random. As can be seen in Figure 6, the proportion of consecutive fixations on cued items was initially high and then decreased and remained relatively stable. We conducted an LMM analysis with a random intercept on the proportion of consecutive fixations on cued items.

Figure 6. Proportion of consecutive fixations on cued items as a function of time from the onset of an experimental screen. p5s > p30s, according to a Bonferroni correction. * p < .01.


The random effect was the participants, and the fixed effects were the system sensitivity and the half of the experiment (Blocks 1 and 2 vs. Blocks 3 and 4). We included the second-order interactions between the fixed effects in the model. There were no significant effects or interactions of the fixed effects on the proportion of consecutive fixations on cued items. Thus, in contrast to Hypothesis 3, the more sensitive cues did not increase participants' tendency to inspect cued items consecutively.

Inspecting adjacent items. As mentioned in the introduction, viewers sometimes prefer to scan a sequence of adjacent areas. This may have been the reason why participants in the experiment did not show a strong tendency to inspect a series of cued items. We analyzed the proportion of transitions between adjacent items to learn more about participants' scanning procedure. To test whether users preferred to inspect adjacent items, we used the ordinal number assigned to each fixation and the known spatial positions of the items. Items were considered adjacent if they shared a border but were not diagonal to each other. Figure 7 shows an example of a scan pattern in an experimental screen, with circles representing fixations and the lines between circles representing transitions between fixations. The black arrows mark a transition between adjacent items and the white arrows mark a transition between separated items.

We computed the overall number of transitions between items and the number of transitions between adjacent items in each block. The proportion of adjacent transitions out of all transitions was our estimate of the probability that users switch their fixations between adjacent items. We analyzed this variable with an LMM analysis with a random intercept for the participants and with the system status, system sensitivity, and the half of the experiment as fixed effects. We included all second- and third-order interactions between the fixed effects in the model. Participants were more likely to fixate on adjacent items in blocks without cues (M = .69, SE = .01) than in blocks with cues (M = .62, SE = .12). Thus, with cues, participants were apparently more likely to forgo consecutive fixations on adjacent items, probably in favor of fixations on cued items, even when these were not adjacent, F(1, 96) = 15.51, p < .001, Ω² = 0.12. However, even with cues, a large proportion of transitions were between adjacent items. Further, the difference between the means in blocks with and without cues was only .07. Because some of the cued items were also adjacent, the differences in the tendency to switch between adjacent items with and without cues cannot fully reveal how strongly cues affected the scanning procedure. Still, the strong preference to scan adjacent items we found here is consistent with the low proportion of consecutive fixations on cued items we found in the previous analysis (.289). There were no other significant effects in the analysis, so, again, the sensitivity of the cues did not affect the scanning procedure.

Proportion of inspected items. To learn whether participants tended to fixate more on cued items than on uncued items, we conducted a binary logistic regression analysis on the probability of fixating on an item. The first independent variable was "system recommendation," with three levels: no recommendation (i.e., in blocks without cues), recommended faulty, and recommended intact. We encoded the variable through two dummy variables: not recommended (yes, no) and recommended faulty (yes, no). The other independent variables were the sensitivity of the binary cueing system and the half of the experiment.




Figure 7. Scan pattern with a transition between adjacent items (black arrow) and a transition between separated items (white arrow). See the online article for the color version of this figure.

Table 3 presents the results of the analysis. As none of the interactions was significant, we chose not to report their coefficients, to facilitate readability. In accordance with Hypothesis 4, we found that participants fixated on a significantly greater proportion of cued items (M = .80, SE = .023) than uncued items (M = .61, SE = .04). As a result, the overall proportion of items they inspected in blocks with cues (M = .66, SE = .034) was lower than the overall proportion in blocks without cues (M = .79, SE = .02). A logistic regression analysis with system status (active or not active) as an independent variable, instead of system recommendation, showed that the difference between the overall proportions of inspected items in blocks with and without cues was significant (p < .001). Contrary to Hypothesis 4, the sensitivity of the cueing system had no effect on the proportion of items participants fixated on. Participants fixated on a smaller proportion of items in the second half of the experiment (M = .71, SE = .027) than in the first (M = .76, SE = .023). Possibly participants were tired in the second half and tried to save effort by inspecting fewer items.

We suggested that if participants inspect only a subset of the items, they may decide when to stop their search based on probability matching. In our experiment, this means stopping the search for faults when the proportion of marked items matches participants' estimated proportion of faulty items on a screen. To test this possibility, we computed the proportion of items participants marked as faulty in blocks with cues. This proportion (p = .29) was quite close to the actual proportion of faults in the task (p = .25), even though we did not inform participants about this proportion, so as not to encourage them to probability match. The proportion of marked items when cues were unavailable (p = .34) was greater than p = .25 and significantly higher than the proportion when cues were available, F(1, 24) = 4.79, MSe = .01, p < .05, ηp² = .048. Thus, when cues were available, participants may have tried to probability match rather than to make sequential decisions regarding the items on a screen. In line with our analysis of the cutoff settings, we found that in blocks without cues participants in the d′Cue = 3 group marked a greater proportion of items (M = .36, SD = .10) than participants in the d′Cue = 1.7 group (M = .277, SD = .10), F(1, 24) = 6.52, MSe = .02, p < .05, ηp² = .21.
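As a purely illustrative sketch of the probability-matching account, and not a model fitted in this study, the code below implements a stopping rule in which search ends once the proportion of items marked as faulty reaches the searcher's estimated fault rate. All values, including the assumption about how often an inspected item turns out to be faulty, are invented for the example.

```python
# Sketch of a probability-matching stopping rule (illustrative assumptions only).

def stop_search(n_marked: int, n_items: int = 25, estimated_fault_rate: float = 0.25) -> bool:
    """Stop once the proportion of marked items reaches the estimated fault rate."""
    return n_marked / n_items >= estimated_fault_rate

marks = 0
inspected = 0
for item in range(25):
    if stop_search(marks):
        break
    inspected += 1
    if item % 3 == 0:  # assume every third inspected item turns out to be faulty
        marks += 1

# With these assumptions the search stops with 7 of 25 items marked (.28) and several
# items never inspected, loosely mirroring the marked proportion (.29) reported above.
print(inspected, "items inspected,", marks, "marked as faulty before stopping")
```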

Estimated Fixation Time on Single Items

We computed for each participant the cumulative fixation durations on the items he or she fixated upon and analyzed them with an LMM with participants as a random effect and the system sensitivity (low or high), system recommendation (not recommended, in blocks without cues; recommended intact; and recommended faulty), decision (the item was classified as intact or faulty), and the half (first or second) as fixed effects. We included all second-, third-, and fourth-order interactions between the fixed effects in the model. The two-way interaction between the system recommendation and the decision was significant, F(2, 21,378) = 74.56, p < .001, Ω² = .007, as was the three-way interaction between the system recommendation, the decision, and the cue sensitivity, F(2, 29,378) = 14.175, p < .001, Ω² = .001, shown in Figure 8. Fixations on items indicated as faulty that were eventually classified as intact (the black bar in the left set of bars in each panel) were clearly longer. This difference in fixation times is evidence for countercompliance. It existed with both higher and lower validity cues, but it was stronger when cues were more valid, F(1, 1,096) = 12.18, p < .001 (for the comparison between the black bar in the left group of bars in the left panel vs. the black bar in the left group of bars in the right panel).
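A minimal sketch of the kind of mixed model described above is shown below, using statsmodels. The data frame, file name, and column names are hypothetical, and this is not the authors' analysis script.

```python
# Sketch: linear mixed model for cumulative fixation durations (hypothetical column names).
import pandas as pd
import statsmodels.formula.api as smf

# df is assumed to have one row per inspected item, with columns:
#   duration        cumulative fixation duration on the item
#   recommendation  'none', 'intact', or 'faulty'
#   decision        'intact' or 'faulty'
#   sensitivity     'low' or 'high'
#   half            'first' or 'second'
#   participant     participant identifier (random intercept)
df = pd.read_csv("fixation_durations.csv")  # assumed file

model = smf.mixedlm(
    "duration ~ C(sensitivity) * C(recommendation) * C(decision) * C(half)",
    data=df,
    groups=df["participant"],  # random intercept per participant
)
result = model.fit()
print(result.summary())
```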

Figure 8. Average fixation durations as a function of the cue recommendation about the items and users' decision for the d′Cue = 1.7 group (left panel) and for the d′Cue = 3 group (right panel). * p < .05. ** p < .01. *** p < .001, with Bonferroni corrections. Error bars are 95% confidence intervals.


Table 3
Coefficients and Standard Errors of Predictors of the Probability to Fixate on an Item

Parameters                 B         SE
Half (first)               .453**    .16
Sensitivity (d′ = 1.7)     -.08      .36
Not recommended            1.06***   .20
Recommended faulty         .97***    .29

** p < .01.  *** p < .001.
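For concreteness, the sketch below shows a binary logistic regression of the kind that underlies Table 3, using the dummy coding described above. The data frame, file, and column names are hypothetical, and the authors' exact model specification may differ.

```python
# Sketch: logistic regression on the probability to fixate on an item
# (hypothetical column names; not the authors' analysis script).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("item_fixations.csv")  # assumed file, one row per presented item

# Dummy-code the three-level "system recommendation" factor; the reference
# category is "recommended intact".
df["not_recommended"] = (df["recommendation"] == "none").astype(int)
df["recommended_faulty"] = (df["recommendation"] == "faulty").astype(int)

model = smf.logit(
    "fixated ~ C(half) + C(sensitivity) + not_recommended + recommended_faulty",
    data=df,
)
print(model.fit().summary())
```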

We found counter-reliance only with the d′Cue = 1.7 cues: the average fixation duration before classifying items as faulty was longer when these items were recommended intact (left panel, white bar in the right set of bars) than when items were classified as faulty without cues (left panel, gray bar in the right set of bars). Thus, Hypothesis 5 was supported with respect to countercompliance and only partly supported with respect to counter-reliance.

We also expected shorter fixation durations for decisions that agreed with the system recommendations. This was only evident for decisions that uncued items were indeed intact, and only with the d′Cue = 3 cues (right panel, white bar is shorter than the gray bar in the left group of bars). Decisions that cued items were indeed faulty were no faster with the d′Cue = 3 cues than without cues at all (right panel, black and gray bars in the right set of bars are approximately the same height). Moreover, decisions that cued items were faulty were even longer with the d′Cue = 1.7 cues than without cues at all (left panel, black bar is higher than the gray bar in the right set of bars). Thus, although confidence about the status of cued items should have been greater, decisions were not faster and were sometimes even slower than without recommendations at all. It seems, then, contrary to Hypothesis 6, that fixation durations on items that were classified as faulty were not shorter for cued items than for items in blocks without cues.


Time

To test Hypothesis 7, we computed a three-way ANOVA (Sensitivity × Status × Half) on the average time to complete a block. Participants completed a block faster when cues were available (M = 330.73 s, SD = 152.94) than when cues were unavailable (M = 377.66 s, SD = 141.35), F(1, 24) = 5.65, MSe = 10,116.40, p < .05, ηp² = .19. It seems, then, that participants processed less additional information when cues were available. Cue validity had no significant effect on the time to complete a block. Thus, Hypothesis 7 was only partly supported.

Effort

To test Hypothesis 8, we computed a three-way ANOVA (Sensitivity × Status × Half) on the effort scale of the NASA-TLX questionnaire. Participants reported investing less effort in the task when cues were available (M = 4.19, SD = 2.06) than when they were unavailable (M = 4.65, SD = 2.07), F(1, 24) = 4.99, MSe = 1.10, p < .05, ηp² = .17. As can be seen in Figure 9, participants with d′Cue = 1.7 cues reported similar effort with and without the cues, whereas for participants using the d′Cue = 3 cues, the differences in reported effort with and without cues were clearly larger. Yet, the interaction between the status of the system and its sensitivity was only marginally significant (p = .075), and, therefore, Hypothesis 8 was only partly supported. We believe, however, that with a larger sample Hypothesis 8 would have been fully supported, as in Botzer et al. (2013). A follow-up analysis indeed revealed that cue availability reduced the experienced effort with the more sensitive cues, F(1, 12) = 5.82, MSe = 1.59, p < .05, ηp² = .32. This result corresponds with the finding that certain decisions were faster with the d′Cue = 3 cues, as is evident in the right panel of Figure 8, where the white bar is shorter than the gray bar in the left group of bars.

Figure 9. Reported effort when cues were available (black bars) and when cues were unavailable (gray bars) with d′Cue = 1.7 and 3. Error bars are 95% confidence intervals (0.969 for black bars, 0.889 for gray bars).

Discussion

We investigated the effect of binary cues on decision processes at the single-item and at the screenwide levels and the effect of cues on performance and reported effort when items in a simulated quality control task were presented in arrays. Previous studies on visual search (e.g., Yeh et al., 2003; Yeh & Wickens, 2001; Yeh, Wickens, & Seagull, 1999) showed that targets are more often missed in uncued than in cued areas, yet it was hard to determine whether uncued areas were skipped or merely inspected less thoroughly. Our analysis of the proportion of fixations on cued and uncued areas indicates that viewers chose not to inspect a large proportion of the uncued areas (approximately 40%). The literature on highlighting suggested that with credible highlighting, highlighted areas will be inspected first (e.g., Fisher & Tan, 1989; Lorigo et al., 2008). However, in these studies only one or two targets could exist, whereas in our study the number of targets was greater. Participants showed some tendency to scan cued items in a sequence, but they also tended to inspect adjacent items.


Thus, the possible number of targets may affect the scanning procedure with cues. Finally, cue validity had no effect on the proportion of items inspected or on the tendency to inspect a sequence of cued items.

The time course of consecutive fixations on cued items shown in Figure 6 reveals that viewers often began their search with cued items. However, it also reveals that users showed only a small tendency to inspect cued items in a sequence, whereas other analyses demonstrated the tendency to inspect adjacent items. Still, viewers did not inspect 40% of the uncued items. It seems, then, that viewers began their search with cued items, continued their search to adjacent items, and then again halted the adjacent inspection in favor of cued items, and so forth. Thus, it seems that participants inspected cued items and the perimeters around them. We do not know how participants decided to terminate their adjacent inspection and switch to cued items, but we believe that a candidate mechanism may be similar to Chun and Wolfe's (1996) timing hypothesis. It states that when people look for a target, they have a notion of how long it should take to find it. As the search goes on and no target is found, they become more certain that no target exists. Wolfe, Horowitz, and Kenner (2005) refer to the internal time limit for the search as the "quitting threshold." It is possible, then, that when searchers look for an unknown number of targets with the aid of binary cues, they may scan adjacent items because this requires less effort than scanning cued items all over the screen. After a certain time passes, searchers estimate how many targets they should have found, based on their experience and on the estimated proportion of targets on screens. When a certain internal time limit is exceeded, viewers decide to abandon the sequential (adjacent) search and switch to cued items, as these are more likely to be faulty. The search ends when the viewers believe they have found the correct proportion of targets on the screen. The scanning procedure we describe corresponds with our finding that the proportion of items classified as faulty in blocks with cues was near the actual proportion of faulty items. Furthermore, such a procedure may also account for the lack of an effect of the cue sensitivity on the proportion of fixations and on the tendency to scan cued items consecutively. If cues designate areas that are more likely to contain targets, they can serve as starting points for the sequential inspection of adjacent items.

In conclusion, with respect to the effect of cues on the scanning procedure, our findings suggest that cues affect the proportion of items inspected and the sequence of inspection. The smaller proportion of items inspected in blocks with cues, compared to blocks without cues, may explain the reduced effort participants reported when they used cues and the shorter time needed to complete blocks with cues, findings we also reported in Botzer et al. (2013).

In line with the predictions of a random-walk model, decisions that uncued items were intact were faster with cues from the d′Cue = 3 system than without cues. Such an effect was not found for the d′Cue = 1.7 system, possibly because it was not sensitive enough to shift the starting point of the walk sufficiently to shorten the process significantly. It seems, then, that when cues are sensitive enough they may reduce the amount of information collected for single decisions.
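As a purely illustrative sketch, and not a model fitted in this study, the simulation below conveys the random-walk intuition: moving the starting point toward the "intact" boundary, as a credible "intact" cue would, reduces the average number of evidence samples needed before a boundary is reached. All parameter values are arbitrary assumptions.

```python
# Toy random-walk simulation: a cue that shifts the starting point toward a boundary
# shortens decisions, on average. All parameters are illustrative assumptions.
import random

def walk_steps(start, drift, boundary, rng):
    """Steps until the accumulated evidence reaches either boundary (+/- boundary)."""
    x, steps = start, 0
    while abs(x) < boundary:
        x += drift + rng.gauss(0.0, 1.0)  # one noisy evidence sample per step
        steps += 1
    return steps

def mean_steps(start, drift=0.05, boundary=10.0, n=2000, seed=42):
    rng = random.Random(seed)
    return sum(walk_steps(start, drift, boundary, rng) for _ in range(n)) / n

# An uncued item: the walk starts midway between the 'faulty' and 'intact' boundaries.
# A credible 'intact' cue: the walk starts closer to the 'intact' boundary.
print("start at 0 (no cue):        ", round(mean_steps(0.0)), "steps on average")
print("start at +5 ('intact' cue): ", round(mean_steps(5.0)), "steps on average")
```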
The shorter duration of decisions with the d′Cue = 3 system than with the d′Cue = 1.7 system also corresponded with the lower effort participants reported when d′Cue = 3. In contrast to our prediction, durations for decisions that cued items were faulty were not shorter in blocks with cues, and they were even longer with d′Cue = 1.7 cues than in blocks without cues.

Thus, as we hypothesized, cues affected the durations of single decisions. However, for certain decisions the effects were more complicated than we predicted. It appears that the decision of how much information to collect for single classifications was also determined by the decision of how to scan the entire screen. For instance, the decision not to inspect a large proportion of the uncued areas could have been coupled with a decision to invest more effort in inspecting the cued areas. The relationship between single decisions and screenwide scanning procedures should be further explored in future studies.

We also assessed participants' joint performance with the cues and the weight they assigned to cues and compared it to the optimal weight according to SDT. Our findings showed that participants underrelied on and undercomplied with the cues. Also, in contrast to previous findings (e.g., Botzer et al., 2013; Meyer, 2001), reliance and compliance were not stronger with the more valid cues. Finally, participants' performance with the systems was lower than or equivalent to the expected performance of the systems alone, with no human in the loop. Botzer et al. (2013) and Goh, Wiegmann, and Madhavan (2005) report similar findings, implying that individuals too often ignore correct cue recommendations. One possible explanation for this finding may be that individuals lack the computational capabilities to adequately integrate information from cues and other available information, and thus they may assign too little weight to cues (Maltz & Meyer, 2001; Meyer, 2001; Shurtleff, 1991). Alternatively, individuals may feel that they need to respond "creatively," and they therefore refrain from submitting similar decisions in a sequence (Shanks, Tunney, & McCarthy, 2002; Wickens, 1992). In our experiment, for example, most uncued items were expected to be intact, but participants could have decided not to submit the same response time and again. Finally, participants in Dzindolet, Pierce, Beck, and Dawe (2002) justified their underreliance on cues in that they felt a "moral obligation" to rely on themselves.

The lack of an effect of cue validity on reliance and compliance may be due to two limitations of our study. First, some items contained no 2 × 2 dark areas and were, therefore, certainly not faulty. This is not an ecological limitation, because images in real life can also have areas that clearly contain no targets (e.g., Oliver et al., 2010). However, because the overall tendency to report items as faulty, designated as C, is computed across entire blocks of screens, having items that can easily be classified as intact should result in lower overall tendencies to report faults and, thus, in higher values of C. Such a bias could have made it harder to detect finer grained differences in responses to cues. Second, only 24 individuals participated in the experiment. The small sample size, compared to Botzer et al. (2013), may explain why we did not fully replicate the findings regarding reliance and compliance. Still, our most important findings with measures from SDT, namely that participants undercomplied with and underrelied on cues and that individuals' performance with cues was lower than the expected performance of cues alone, are in line with the previous findings we reported.

Our study had a number of additional limitations.
One of them is the relatively high proportion of faults. Therefore, our findings may apply to cases such as conflict alerts in air traffic control (e.g., Wickens et al., 2009) or computer assistance in interpreting dental radiograms (e.g., Araki et al., 2010).


In other cases the probability of faults may be extremely small (for instance, 1/1,000), and P(Target | Cue) may be as low as 0.01 (Parasuraman, 2000). This may cause operators to ignore alerts (Parasuraman & Riley, 1997; Sorkin, 1988), the equivalent of ignoring visual cues in target search tasks. In our payoff matrix, the difference between the outcome values for hits and misses (200 points) was greater than the difference between the outcome values for false alarms and correct rejections (100 points). This characterizes many task domains (e.g., quality control, air traffic control), but often the consequences of missing negative events may be much larger than represented by the values we chose. In such cases, we expect users to be more careful and to scan images more thoroughly. Finally, in this study we had only two cueing-system sensitivities and only one threshold setting for the systems (ln β = 0). We therefore explored only a few correct and incorrect proportions of system recommendations. All the cues we used led to similar screen scanning procedures. It is not clear what range of cue sensitivities and threshold settings is associated with similar scanning procedures, and what the scanning procedure would be outside that range.
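For readers less familiar with the SDT quantities used throughout (d′, the criterion C, and β), the sketch below computes them from hit and false-alarm rates under the standard equal-variance Gaussian model, and also derives the hit and false-alarm rates implied by a cue threshold of ln β = 0 and the payoff-optimal β implied by the fault rate and payoff differences given in the text. This is a generic illustration, not the authors' analysis code.

```python
# Sketch: standard equal-variance SDT measures and the payoff-optimal criterion.
from statistics import NormalDist

z = NormalDist().inv_cdf    # inverse of the standard normal CDF
phi = NormalDist().cdf      # standard normal CDF

def sdt_measures(hit_rate: float, fa_rate: float) -> dict:
    d_prime = z(hit_rate) - z(fa_rate)
    c = -0.5 * (z(hit_rate) + z(fa_rate))   # criterion C; positive values are conservative
    ln_beta = d_prime * c                   # log likelihood-ratio criterion
    return {"d_prime": d_prime, "C": c, "ln_beta": ln_beta}

# A cueing system with ln(beta) = 0 places its criterion midway between the
# 'intact' and 'faulty' distributions, so its hit and false-alarm rates follow from d':
for d in (1.7, 3.0):
    print(f"d'={d}: hit rate={phi(d / 2):.3f}, false-alarm rate={phi(-d / 2):.3f}")

# Recovering the measures from those rates gives back d' of about 3 and ln(beta) of about 0.
print(sdt_measures(hit_rate=0.933, fa_rate=0.067))

# Payoff-optimal beta: (P(intact) / P(faulty)) times the payoff difference between
# correct rejections and false alarms, divided by the payoff difference between hits
# and misses. Using the values given in the text (fault rate .25, differences 100 and 200):
beta_opt = (0.75 / 0.25) * (100 / 200)
print("optimal beta =", beta_opt)  # 1.5, i.e., a slightly conservative criterion is optimal
```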

Conclusions

The results indicate that when cues are used to locate possible targets in an assembly of stimuli, their role in the detection task may differ from the role cues usually have when single stimuli are presented. Although the two situations may seem equivalent, our study shows that they do not necessarily involve the same underlying cognitive processes. In single decisions, cues mainly affect the threshold settings for the classification, in SDT terms. As we suggested, they should also change the starting point of a random walk. In the classification of items in arrays, cues seem to change both the visual scanning procedure and the parameters that determine the duration of the random walk. The former effect may be largely unaffected by cue sensitivity, at least within a certain range of cue sensitivities. The latter effect seems to be determined by the sensitivity of the cues, but it may also be determined by the screen scanning procedure.

At a practical level, we show again that aided user decisions may lead to worse performance than would have been obtained by simply basing the decision on the cue alone. Two findings we introduced may account for the lower performance of human–cue teams. First, instead of inspecting all uncued items, users tended to use a strategy resembling probability matching to decide how many items to classify as faulty. Second, users' compliance with the cues and reliance on them were far from optimal, suggesting that when they had inspected items, they often chose to override cue recommendations and were eventually wrong. These findings have strong implications for the design of computer-aided detection systems and training programs in settings where multiple targets may exist (e.g., radiology, quality control for defects in silicon wafers, detection of military targets). Designers and practitioners who are aware of the tendency of individuals to use probability matching, instead of inspecting all areas, and who are also aware of the tendency to override correct system recommendations may consider a number of countermeasures.


First, they may opt for likelihood ratio cueing (e.g., Gupta, Bisantz, & Singh, 2002; Sorkin, Kantowitz, & Kantowitz, 1988; Wiczorek & Manzey, 2014), which classifies areas according to the likelihood of targets instead of providing only binary indications. With such cueing, users may inspect and tend to mark the areas designated as having the highest likelihood for targets (e.g., highlighted with a red frame), and they will search carefully for additional targets in areas with a lower likelihood for targets (e.g., highlighted with an orange frame). Areas with the lowest likelihood for targets will not be cued at all and will therefore hardly be inspected. This way, if users indeed choose not to inspect a subset of the uncued areas, they can at least choose this subset from the group of items with the lowest likelihood for targets. Thus, likelihood ratio cueing may improve user–automation team performance by directing attention to areas with a higher likelihood for targets, by facilitating acceptance of correct recommendations, and, possibly, by lowering the effort invested in the classification task. Human–automation teams with likelihood ratio cueing have already been found to perform tasks better than human–automation teams with binary cueing systems (e.g., Gupta, Bisantz, & Singh, 2002; Sorkin, Kantowitz, & Kantowitz, 1988; Wiczorek & Manzey, 2014).

Second, designers may consider presenting small numbers of subareas at a time, instead of presenting an entire screen image. This way, detection tasks with multiple targets may become more similar to detection tasks with sequences of single decisions, where inspection is less likely to end after a certain number of targets has been found. In other words, decisions about subareas would depend less on decisions about other subareas. This countermeasure, however, is probably more feasible for static than for dynamic images. Further, it should be employed more carefully in tasks where a comparison between subareas is necessary to set a reference point for evaluating the magnitude of effects. For example, the magnitude of shading on a number of subareas of a silicon wafer may help in assessing how dark a dark subarea on the wafer really is (Yoda et al., 1988).

Third, professionals who perform target detection tasks with the aid of detection algorithms and cues should be informed of the tendency to terminate searches before all uncued areas are inspected. Previous studies have indeed shown that awareness of heuristics and decision biases may reduce their occurrence (e.g., Kassirer & Kopelman, 1991; Wegwarth, Gaissmaier, & Gigerenzer, 2009). Last, training programs may be developed to reduce probability matching. Shanks and colleagues (2002) have demonstrated that with certain kinds of feedback on performance, combined with extensive training, probability matching may even disappear completely for some individuals.

Cues are integral parts of decision support, and their use will probably increase as image analysis systems advance and provide users with indications pointing to areas that might be of interest. The benefits such systems provide depend on users' use of the cues and on the additional visual information they receive. The prediction of the effects of such systems on performance needs to be based on empirical findings that describe the combined effects of cues on visual scanning and on the classification of individual areas.
A better understanding of these processes can help us design systems that will lead to improved task performance and lower workload for the operator.


References Araki, K., Matsuda, Y., Seki, K., & Okano, T. (2010). Effect of computer assistance on observer performance of approximal caries diagnosis using intraoral digital radiography. Clinical Oral Investigations, 14, 319 –325. http://dx.doi.org/10.1007/s00784-009-0307-z Bechar, A., Meyer, J., & Edan, Y. (2009). An objective function to evaluate performance of Human-Robot Systems for target recognition tasks. IEEE Transactions on Systems, Man, and Cybernetics, 39, 611– 620. http://dx.doi.org/10.1109/TSMCC.2009.2020174 Berbaum, K. S., Franken, E. A., Jr., Dorfman, D. D., Miller, E. M., Caldwell, R. T., Kuehn, D. M., & Berbaum, M. L. (1998). Role of faulty visual search in the satisfaction of search effect in chest radiography. Academic Radiology, 5, 9 –19. http://dx.doi.org/10.1016/S10766332(98)80006-8 Berbaum, K. S., Franken, E. A., Jr., Dorfman, D. D., Miller, E. M., Krupinski, E. A., Kreinbring, K., . . . Lu, C. H. (1996). Cause of satisfaction of search effects in contrast studies of the abdomen. Academic Radiology, 3, 815– 826. http://dx.doi.org/10.1016/S10766332(96)80271-6 Berbaum, K. S., Franken, E. A., Jr., Dorfman, D. D., Rooholamini, S. A., Kathol, M. H., Barloon, T. J., . . . Montgomery, W. J. (1990). Satisfaction of search in diagnostic radiology. Investigative Radiology, 25, 133–139. http://dx.doi.org/10.1097/00004424-199002000-00006 Bliss, J. P. (2003). Investigation of alarm-related accidents and incidents in aviation. The International Journal of Aviation Psychology, 13, 249 – 268. http://dx.doi.org/10.1207/S15327108IJAP1303_04 Bliss, J. P., Gilson, R. D., & Deaton, J. E. (1995). Human probability matching behaviour in response to alarms of varying reliability. Ergonomics, 38, 2300 –2312. http://dx.doi.org/10.1080/00140139508925269 Borowsky, A., Shinar, D., & Parmet, Y. (2008). The relation between driving experience and recognition of road signs relative to their locations. Human Factors, 50, 173–182. http://dx.doi.org/10.1518/ 001872008X288330 Botzer, A., Meyer, J., Bak, P., & Parmet, Y. (2010). User settings of cue thresholds for binary categorization decisions. Journal of Experimental Psychology: Applied, 16, 1–15. http://dx.doi.org/10.1037/a0018758 Botzer, A., Meyer, J., & Parmet, Y. (2013). Mental effort in binary categorization aided by binary cues. Journal of Experimental Psychology: Applied, 19, 39 –54. http://dx.doi.org/10.1037/a0031625 Byers, J. C., Bittner, A. C., & Hill, S. G. (1989). Traditional and raw task load index (TLX) correlations: Are paired comparisons necessary? In A. Mital (Ed.), Advances in Industrial Ergonomics and Safety, I (pp. 481– 485). London, UK: Taylor & Francis. Chi, C. F., & Drury, C. G. (1998). Do people choose optimal response criterion in an inspection task. IIE Transactions, 30, 257–266. http://dx .doi.org/10.1080/07408179808966456 Chun, M. M., & Wolfe, J. M. (1996). Just say no: How are visual searches terminated when there is no target present? Cognitive Psychology, 30, 39 –78. http://dx.doi.org/10.1006/cogp.1996.0002 Craig, A., & Colquhoun, W. P. (1977). Vigilance effects in complex inspection. In R. R. Mackie (Ed.), Vigilance-theory: Operational performance and physiological correlates (pp. 239 –262). New York, NY: Plenum Press. http://dx.doi.org/10.1007/978-1-4684-2529-1_14 Dixon, S. R., & Wickens, C. D. (2006). Automation reliability in unmanned aerial vehicle control: A reliance-compliance model of automation dependence in high workload. Human Factors, 48, 474 – 486. http:// dx.doi.org/10.1518/001872006778606822 Dixon, S. 
R., Wickens, C. D., & McCarley, J. S. (2007). On the independence of compliance and reliance: Are automation false alarms worse than misses? Human Factors, 49, 564 –572. http://dx.doi.org/10.1518/ 001872007X215656

Dzindolet, M. T., Pierce, L. G., Beck, H. P., & Dawe, L. A. (2002). The perceived utility of human and automated aids in a visual detection task. Human Factors, 44, 79 –94. http://dx.doi.org/10.1518/001872 0024494856 Dzindolet, M. T., Pierce, L. G., Beck, H. P., Dawe, L. A., & Anderson, B. W. (2001). Predicting misuse and disuse of combat identification systems. Military Psychology, 13, 147–164. http://dx.doi.org/10.1207/ S15327876MP1303_2 Edwards, W. (1965). Optimal strategies for seeking information: Models for statistics, choice reaction times, and human information processing. Journal of Mathematical Psychology, 2, 312–329. http://dx.doi.org/ 10.1016/0022-2496(65)90007-6 Estes, W. K. (1986). Array models for category learning. Cognitive Psychology, 18, 500 –549. http://dx.doi.org/10.1016/0010-0285(86)90008-3 Fisher, D. L., Coury, B. G., Tengs, T. O., & Duffy, S. A. (1989). Minimizing the time to search visual displays: The role of highlighting. Human Factors, 31, 167–182. Fisher, D. L., & Tan, K. C. (1989). Visual displays: The highlighting paradox. Human Factors, 31, 17–30. Fleck, M. S., Samei, E., & Mitroff, S. R. (2010). Generalized “satisfaction of search”: Adverse influences on dual-target search accuracy. Journal of Experimental Psychology: Applied, 16, 60 –71. http://dx.doi.org/ 10.1037/a0018629 Fleetwood, M. D., & Byrne, M. D. (2006). Modeling the visual search of displays: A revised ACT-R model of icon search based on eye-tracking data. Human-Computer Interaction, 21, 153–197. http://dx.doi.org/ 10.1207/s15327051hci2102_1 Fox, J. G., & Haslegrave, C. H. (1969). Industrial inspection efficiency and the probability of a defect occurring. Ergonomics, 12, 713–721. http:// dx.doi.org/10.1080/00140136908931088 Getty, D. J., Swets, J. A., Pickett, R. M., & Gonthier, D. (1995). System operator response to warnings of danger: A laboratory investigation of the effects of the predictive value of a warning on human response time. Journal of Experimental Psychology: Applied, 1, 19 –33. http://dx.doi .org/10.1037/1076-898X.1.1.19 Gitelman, D. R. (2002). ILAB: A program for postexperimental eye movement analysis. Behavior Research Methods, Instruments & Computers, 34, 605– 612. http://dx.doi.org/10.3758/BF03195488 Goh, J., Wiegmann, D. A., & Madhavan, P. (2005). Effects of automation failure in a luggage screening task: A comparison between direct and indirect cueing. Proceedings of the Human Factors and Ergonomics Society 49th Annual Meeting (pp. 492– 496). Orlando, FL: Human Factors and Ergonomics Society. Gupta, N., Bisantz, A. M., & Singh, T. (2002). The effects of adverse condition warning system characteristics on driver performance: An investigation of alarm signal type and threshold level. Behaviour & Information Technology, 21, 235–248. http://dx.doi.org/10.1080/ 0144929021000013473 Hart, S. G., & Staveland, L. E. (1988). Development of the NASA-TLX (Task Load Index): Results of empirical and theoretical research. In P. A. Hancock & N. Meshkati (Eds.), Human mental workload (pp. 139 –183). Amsterdam, The Netherlands: North-Holland. http://dx.doi .org/10.1016/S0166-4115(08)62386-9 Kassirer, J. P., & Kopelman, R. I. (1991). Learning clinical reasoning. Baltimore, MD: Williams and Wilkins. Kundel, H. L., & La Follette, P. S., Jr. (1972). Visual search patterns and experience with radiological images. Radiology, 103, 523–528. http:// dx.doi.org/10.1148/103.3.523 Lai, S. M., Li, X., & Biscof, W. F. (1989). On techniques for detecting circumscribed masses in mammograms. 
IEEE Transactions on Medical Imaging, 8, 377–386. http://dx.doi.org/10.1109/42.41491 Lehto, M. R., Papastavrou, J. D., & Giffen, W. J. (1998). An empirical study of adaptive warnings: Human- versus computer-adjusted warning thresholds. International Journal of Cognitive Ergonomics, 2, 19 –33.


EFFECTS OF CUES ON TARGET SEARCH BEHAVIOR Liao, M. J., Granada, S., & Johnson, W. W. (2005). The influence of brightness highlighting on eye movements within a cockpit display of traffic information. Proceedings of the Human Factors and Ergonomics Society 49th Annual Meeting (pp. 1644 –1648). Orlando, FL: Human Factors and Ergonomics Society. Lorigo, L., Haridasan, M., Brynjarsdóttir, H., Xia, L., Joachims, T., Gay, G., . . . Pan, B. (2008). Eye tracking and online search: Lessons learned and challenges ahead. Journal of the American Society for Information Science and Technology, 59, 1041–1052. http://dx.doi.org/10.1002/asi .20794 MacGregor, J., & Lee, E. (1987). Menu search: Random or systematic? International Journal of Man–Machine Studies, 26, 627– 631. http://dx .doi.org/10.1016/S0020-7373(87)80075-5 Macmillan, N. A., & Creelman, C. D. (1990). Response bias: Characteristics of detection theory, threshold theory, and “nonparametric” indexes. Psychological Bulletin, 107, 401– 413. http://dx.doi.org/10.1037/ 0033-2909.107.3.401 Madhavan, P., Wiegmann, D. A., & Lacson, F. C. (2006). Automation failures on tasks easily performed by operators undermine trust in automated aids. Human Factors, 48, 241–256. http://dx.doi.org/10.1518/ 001872006777724408 Maltz, M. (2005). Modeling the efficacy of automated aids in target acquisition under conditions of heavy workload. Optical Engineering, 44, 086201. http://dx.doi.org/10.1117/1.2011107 Maltz, M., & Meyer, J. (2001). Use of warnings in an attentionally demanding detection task. Human Factors, 43, 217–226. http://dx.doi .org/10.1518/001872001775900931 Maltz, M., & Shinar, D. (2003). New alternative methods of analyzing human behavior in cued target acquisition. Human Factors, 45, 281– 295. http://dx.doi.org/10.1518/hfes.45.2.281.27239 Meyer, J. (2001). Effects of warning validity and proximity on responses to warnings. Human Factors, 43, 563–572. http://dx.doi.org/10.1518/ 001872001775870395 Meyer, J. (2004). Conceptual issues in the study of dynamic hazard warnings. Human Factors, 46, 196 –204. http://dx.doi.org/10.1518/hfes .46.2.196.37335 Meyer, J., Wiczorek, R., & Günzler, T. (2014). Measures of reliance and compliance in aided visual scanning. Human Factors, 56, 840 – 849. http://dx.doi.org/10.1177/0018720813512865 Mobley, W. H., & Goldstein, I. L. (1978). The effects of payoff on the visual processing of dental radiographs. Human Factors, 20, 385–390. Mosier, K. L., & Skitka, L. J. (1996). Human decision makers and automated decision aids: Made for each other? In R. Parasuraman & M. Mouloua (Eds.), Automation and human performance: Theory and application (pp. 201–220). Mahwah, NJ: Erlbaum, Inc. Mosier, K. L., Skitka, L. J., Heers, S., & Burdick, M. (1998). Automation bias: Decision making and performance in high-tech cockpits. The International Journal of Aviation Psychology, 8, 47– 63. http://dx.doi .org/10.1207/s15327108ijap0801_3 Navon, D., & Gopher, D. (1979). On the economy of the human processing system. Psychological Review, 86, 214 –255. http://dx.doi.org/10.1037/ 0033-295X.86.3.214 Nygren, T. E. (1991). Psychometric properties of subjective workload measurement techniques: Implications for their use in the assessment of perceived mental workload. Human Factors, 33, 17–33. Oliver, A., Freixenet, J., Martí, J., Pérez, E., Pont, J., Denton, E. R. E., & Zwiggelaar, R. (2010). A review of automatic mass detection and segmentation in mammographic images. Medical Image Analysis, 14, 87–110. 
http://dx.doi.org/10.1016/j.media.2009.12.005 Palmer, J., Huk, A. C., & Shadlen, M. N. (2005). The effect of stimulus strength on the speed and accuracy of a perceptual decision. Journal of Vision, 5, 376 – 404. http://dx.doi.org/10.1167/5.5.1


Parasuraman, R. (2000). Designing automation for human use: Empirical studies and quantitative models. Ergonomics, 43, 931–951. http://dx.doi .org/10.1080/001401300409125 Parasuraman, R., Hancock, P. A., & Olofinboba, O. (1997). Alarm effectiveness in driver-centred collision-warning systems. Ergonomics, 40, 390 –399. http://dx.doi.org/10.1080/001401397188224 Parasuraman, R., & Riley, V. A. (1997). Humans and automation: Use, misuse, disuse, abuse. Human Factors, 39, 230 –253. http://dx.doi.org/ 10.1518/001872097778543886 Ratcliff, R., & Rouder, J. N. (1998). Modeling response times for twochoice decisions. Psychological Science, 9, 347–356. http://dx.doi.org/ 10.1111/1467-9280.00067 Rice, S., & McCarley, J. S. (2011). Effects of response bias and judgment framing on operator use of an automated aid in a target detection task. Journal of Experimental Psychology: Applied, 17, 320 –331. http://dx .doi.org/10.1037/a0024243 Robert, G., & Hockey, J. (1997). Compensatory control in the regulation of human performance under stress and high workload: A cognitiveenergetical framework. Biological Psychology, 45, 73–93. http://dx.doi .org/10.1016/S0301-0511(96)05223-4 Robinson, D. E., & Sorkin, R. D. (1985). A contingent criterion model of computer assisted detection. In R. E. Eberts & C. G. Eberts (Eds.), Trends in ergonomics/human factors II (pp. 75– 82). Amsterdam, The Netherlands: North-Holland. Samuel, S., Kundel, H. L., Nodine, C. F., & Toto, L. C. (1995). Mechanism of satisfaction of search: Eye position recordings in the reading of chest radiographs. Radiology, 194, 895–902. http://dx.doi.org/10.1148/ radiology.194.3.7862998 Shanks, D. R., Tunney, R. J., & McCarthy, J. D. (2002). A re-examination of probability matching and rational choice. Journal of Behavioral Decision Making, 15, 233–250. http://dx.doi.org/10.1002/bdm.413 Shurtleff, M. S. (1991). Effects of specificity of probability information on human performance in a signal detection task. Ergonomics, 34, 469 – 486. http://dx.doi.org/10.1080/00140139108967330 Skitka, L. J., Mosier, K. L., & Burdick, M. (1999). Does automation bias decision making? International Journal of Human-Computer Studies, 51, 991–1006. http://dx.doi.org/10.1006/ijhc.1999.0252 Sorkin, R. D. (1988). Why are people turning off our alarms? The Journal of the Acoustical Society of America, 84, 1107–1108. http://dx.doi.org/ 10.1121/1.397232 Sorkin, R. D., Kantowitz, B. H., & Kantowitz, S. C. (1988). Likelihood alarm displays. Human Factors, 30, 445– 459. Sorkin, R. D., & Woods, D. D. (1985). Systems with human monitors: A signal detection analysis. Human-Computer Interaction, 1, 49 –75. http://dx.doi.org/10.1207/s15327051hci0101_2 Townsend, J. T., & Ashby, F. G. (1983). Stochastic modeling of elementary psychological processes. Cambridge, UK: Cambridge University Press. Vandekerckhove, J., Tuerlinckx, F., & Lee, M. D. (2011). Hierarchical diffusion models for two-choice response times. Psychological Methods, 16, 44 – 62. http://dx.doi.org/10.1037/a0021765 Wegwarth, O., Gaissmaier, W., & Gigerenzer, G. (2009). Smart strategies for doctors and doctors-in-training: Heuristics in medicine. Medical Education, 43, 721–728. http://dx.doi.org/10.1111/j.1365-2923.2009 .03359.x Wickens, C. D. (1992). Engineering psychology and human performance (2nd ed.). New York, NY: Harper Collins. Wickens, C. D., & Dixon, S. R. (2007). The benefits of imperfect diagnostic automation: A synthesis of the literature. Theoretical Issues in Ergonomics Science, 8, 201–212. 
http://dx.doi.org/10.1080/ 14639220500370105 Wickens, C. D., Rice, S., Keller, D., Hutchins, S., Hughes, J., & Clayton, K. (2009). False alerts in air traffic control conflict alerting system: Is there a “cry wolf” effect? Human Factors, 51, 446 – 462. http://dx.doi .org/10.1177/0018720809344720



Wiczorek, R., & Manzey, D. (2014). Supporting attention allocation in multitask environments: Effects of likelihood alarm systems on trust, behavior, and performance. Human Factors. Advance online publication. http://dx.doi.org/10.1177/0018720814528534 Wolfe, J. M., Horowitz, T. S., & Kenner, N. M. (2005). Cognitive psychology: Rare items often missed in visual searches. Nature, 435, 439 – 440. http://dx.doi.org/10.1038/435439a Wolfe, J. M., & Van Wert, M. J. (2010). Varying target prevalence reveals two dissociable decision criteria in visual search. Current Biology, 20, 121–124. http://dx.doi.org/10.1016/j.cub.2009.11.066 Xu, R. (2003). Measuring explained variation in linear mixed effects models. Statistics in Medicine, 22, 3527–3541. http://dx.doi.org/ 10.1002/sim.1572 Yeh, M., Merlo, J. L., Wickens, C. D., & Brandenburg, D. L. (2003). Head up versus head down: The costs of imprecision, unreliability, and visual clutter on cue effectiveness for display signaling. Human Factors, 45, 390 – 407. http://dx.doi.org/10.1518/hfes.45.3.390.27249

Yeh, M., & Wickens, C. D. (2001). Display signaling in augmented reality: Effects of cue reliability and image realism on attention allocation and trust calibration. Human Factors, 43, 355–365. http://dx.doi.org/ 10.1518/001872001775898269 Yeh, M., Wickens, C. D., & Seagull, F. J. (1999). Target cuing in visual search: The effects of conformality and display location on the allocation of visual attention. Human Factors, 41, 524 –542. http://dx.doi.org/ 10.1518/001872099779656752 Yoda, H., Ohuchi, Y., Taniguchi, Y., & Ejiri, M. (1988). An automatic wafer inspection system using pipelined image processing techniques. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10, 4 –16. http://dx.doi.org/10.1109/34.3863

Received November 3, 2013
Revision received September 24, 2014
Accepted September 28, 2014
