
Ergonomics, 2014 Vol. 57, No. 6, 844–855, http://dx.doi.org/10.1080/00140139.2014.899631

Predicting and interpreting identification errors in military vehicle training using multidimensional scaling

Corey J. Bohil^a*, Nicholas A. Higgins^a and Joseph R. Keebler^b

^a Department of Psychology, University of Central Florida, Orlando, FL, USA; ^b Department of Psychology, Wichita State University, Wichita, KS, USA

*Corresponding author. Email: [email protected]


(Received 23 September 2013; accepted 21 February 2014)

We compared methods for predicting and understanding the source of confusion errors during military vehicle identification training. Participants completed training to identify main battle tanks. They also completed card-sorting and similarity-rating tasks to express their mental representation of resemblance across the set of training items. We expected participants to selectively attend to a subset of vehicle features during these tasks, and we hypothesised that we could predict identification confusion errors based on the outcomes of the card-sort and similarity-rating tasks. Based on card-sorting results, we were able to predict about 45% of observed identification confusions. Based on multidimensional scaling of the similarity-rating data, we could predict more than 80% of identification confusions. These methods also enabled us to infer the dimensions receiving significant attention from each participant. This understanding of mental representation may be crucial in creating personalised training that directs attention to features that are critical for accurate identification.

Practitioner Summary: Participants completed military vehicle identification training and testing, along with card-sorting and similarity-rating tasks. The data enabled us to predict up to 84% of identification confusion errors and to understand the mental representation underlying these errors. These methods have potential to improve training and reduce identification errors leading to fratricide.

Keywords: vehicle identification; training; incidental learning; multidimensional scaling

Introduction

Friendly fire accidents (also known as fratricide or blue-on-blue incidents) often result from mistaken identification of military vehicles by individual soldiers (Regan 1995). Such errors disrupt the planning of appropriate responses to rapidly changing field conditions, often with life-threatening results (Briggs and Goldberg 1995; Keebler et al. 2010). Research suggests that novices can overreact based on the appearance of a prominent feature such as tank treads or turrets (Biederman and Shiffrar 1987) and that, depending on the extent of feature overlap, vehicles can be easily confused with one another (O'Kane et al. 1997). Accurate combat identification likely requires (1) attention to multiple features and (2) that features with little predictive value for a vehicle's identity are not overly attended to.

A common vehicle identification-training method is repeated study of images – often line drawings – of armoured vehicles until their identities are memorised (Keebler, Jentsch, and Hudson 2011). Although simple and cost effective, this approach has shortcomings. There is no interactivity to promote active engagement in learning, and training is known to be more effective when learners are deeply engaged in the training system (Kirkpatrick 1975; Malone 1981; Keebler, Jentsch, and Hudson 2011). Also, given the complexity of the training items (armoured vehicles with many features), there is no way to know which features are most prominent in the learner's memory.

Object identification requires learning a 1:1 mapping of stimulus features to response labels (see Figure 1a). However, given the large number of features comprising an object as complex as a vehicle, there could be any number of feature subsets upon which a learner fixates. Furthermore, attending to features of training examples that are idiosyncratic – but not predictive beyond the training set – can lead to learning that is difficult to reverse (Biederman and Shiffrar 1987). However, given the task of memorising several training items, focusing on a subset of outward physical features (even if unconsciously) may be effective and efficient. Over time, experienced learners will develop a deeper, knowledge-based understanding of the critical features for each vehicle type (i.e. other thought processes over and above perception and memory play a role). But within the operational environment, most military personnel making identification judgements may be relatively inexperienced. For example, upon encountering an armoured vehicle in the field, infantry must make rapid decisions to engage or retreat, and must communicate what they see to forces elsewhere. These identification judgements are often made in poor viewing conditions, including distance, rain or dust, and occlusion by other objects, as well as under time pressure


Figure 1. Representative items from the training set. (a) Identification: 1:1 mapping of stimuli to responses. (b) Classification: Many stimuli treated as equivalent.

and the duress of battle (Keebler et al. 2007). Improved decision-making in the field could result from identification training that is tailored to help learners focus attention on the most diagnostic features and avoid attending to less informative features (Biederman and Shiffrar 1987; Keebler et al. 2008; Keebler, Jentsch, and Hudson 2011).

A well-known finding from the literature on classification learning is that when stimuli are complex, learners often focus on a subset of stimulus features (Biederman and Shiffrar 1987; Yamauchi and Markman 1998). During identification training, learners may incidentally (i.e. unintentionally) form a mental representation of vehicle classes in addition to memorising individual items (e.g. Kemler Nelson 1984; Smith 2008; Folstein, Gauthier, and Palmeri 2010). While progressing through a training set, learners are likely to notice recurring features across items (see Figure 1b). For example, some tanks have treads that are covered by armour while many do not; some vehicles have various peripheral devices mounted to their surface while others do not, and so on. Critically, this 'many to one mapping' mental representation may occur despite the fact that training feedback does not suggest any sort of classification scheme across sets of training objects, but rather supports only individual item identification.

The current research explores the possibility that participants develop mental classification schemes incidentally during identification training, and the possibility that we can (1) determine what their mental representations are like and (2) predict their identification errors based on this knowledge. Participants completed an identification-training task in which they learned to identify a set of armoured military vehicles (main battle tanks). During identification training, participants received information about the unique identity of each individual training item. The training information did not serve to reinforce their attention to any particular subset of stimulus dimensions or class of vehicles based on common features. If participants notice recurring differences and similarities across training items and form associations between these items


(even implicitly) then this could be considered a type of 'unsupervised' category formation (i.e. training feedback does not reinforce category formation). We included both a passive 'observational' identification-training condition that is akin to learning with a set of training images and a more actively engaging 'feedback' training condition, during which participants had to guess the identity of each item followed by corrective feedback. Our goal was not necessarily to contrast these training conditions, but rather to evaluate our ability to predict identification errors under a variety of training methods.

Participants also completed two tasks designed to assess their mental representation of the training set. They completed a card-sorting task – both before and after the identification-training task – in which they placed each training item into whatever piles made sense to them. Card-sorting is a widely used task for gaining insight into mental classification schemes and elicitation of knowledge structures (Edwards et al. 2006). After completing these tasks, participants also completed a similarity-rating task in which they compared the similarity of each possible pair of training images. These ratings facilitate multidimensional scaling (MDS) analysis (details below), which we used to gain another measure of mental representation of the stimulus set. Both tasks provide information regarding the participant's focus of attention when examining members of the training set.

Our main goal was to infer something about the mental representations that develop during training and to examine how well we can predict identification confusion errors (i.e. confusing one training item for another) using that information. We hypothesise that learners notice subsets of features during identification training and that attention to these features contributes to their confusions (e.g. tanks with similar treads may be more confusable if this feature is the focus of attention). If this is the case, then it may be possible to infer the dimensions that are receiving the most attention from a learner and potentially predict some confusion errors during the training process. Knowledge of this underlying mental representation of the training set could be used to create adaptive training that focuses attention on the most important dimensions and reduces identification errors. After presenting the results of our comparison between several prediction methods, we consider the possibility of using the methods explored here for developing an adaptive system for identification training.

Method

Participants

Undergraduate students from the University of Central Florida (n = 38) voluntarily participated in the experiment for course credit. Data from two participants were removed for failure to complete the tasks as instructed, so our analyses are based on the remaining 36 participants.

Stimuli

The stimuli used across the experimental tasks were line drawings of armoured military vehicles, selected from 54 study cards included in Graphic Training Aid 17-02-013: Armoured Vehicle Recognition, January 1997. Figure 1 shows some of the images used in the study. A total of 32 drawings were selected from the set of study cards. Twenty-two were drawings of main battle tanks, including 2S1-M1974, AMX-30, Centurion, Challenger, Chieftain, Leopard-II, M1-Abrams, M48A5, M60A3, T-62, T-64, T-72, AMX-13, ASU-85, Leopard-1A2, Jagdpanzer-Kanone, Jagdpanzer-SK105, Leopard-1A4, T-80, BMP-2, M109, T-54/55. A subset of 12 vehicles was chosen for the identification-training task (the first 12 in the list above), while the remaining 10 were used for comparison in the identification test, card-sort and similarity-rating phases of the study. These vehicles were chosen because they were all of the same type (i.e. main battle tanks) and because they appeared (to the researchers) to be highly similar to each other. An additional 10 'non-similar' drawings from the set of cards were also selected, including AMX-10P, BMD, BMP, Airborne, Jaguar, M2IFV, M551A1 Sheridan, Marder, Type531, ZSU-57-2. These were selected because they were different in appearance from the main battle tanks (e.g. many were not main battle tanks, although all were armoured military vehicles that were potentially confusable with other items in the study). These 10 items appeared along with the items above during the identification test. Images were roughly 2 inches tall and 3 inches wide when presented on paper cards (during the card-sort task), and roughly 4 × 6 inches when presented on the computer screen (during the identification and similarity-rating tasks). All vehicles appeared at roughly the same oblique angle or rotation, and we controlled for image size and extraneous details in the images (name information, size comparison with a human).

Procedure

Each participant completed several tasks. First was a card-sorting task; second, they were trained and tested on identification; and third, they conducted another card-sorting task to assess changes due to identification training. We refer


to the sorting tasks as pre- and post-test sorting tasks since they came before and after the identification task. Finally, all participants finished the session by completing a similarity-rating task. Task details are as follows.


Card-sorting task

Participants were presented with a stack of 22 cards (the 'similar' cards described above), each with a single vehicle image on one side. They were instructed to spread the cards onto a table and then sort them into piles. They were instructed that the piles could be organised however they wished as long as the piles made sense to them. No other direction was given about how to sort the vehicles. If the participant asked questions during the sorting task, they received only encouragement to sort the vehicles however they felt made sense. They were also instructed that they should sort into at least two piles, but could have as many piles as they wished beyond two. They were informed that they would have to provide a name for each pile after the sort was completed, and that they would be asked to explain the basis for sorting the images as they did and what rationale they used for naming the piles. Finally, for each vehicle in a sort pile, they were asked to rate the 'goodness' of that vehicle for the pile. These goodness ratings were based on a three-point scale (1 = fair, 2 = good, 3 = perfect). This provided a metric of representativeness for the pile into which each vehicle image was placed. The sorting task was completed at the beginning of the experimental session, and again following the identification test phase. We assumed that the post-ID sort results would reflect knowledge gained from experience with the images (i.e. during the ID training and test phases). Each sorting task (pre- and post-test) took about 10 minutes to complete.

Identification training and testing tasks

There were two identification-training conditions. In both training conditions, participants viewed a series of 12 vehicle images – one at a time – on a computer screen. These 12 vehicles also appeared in the sorting-task set. In the 'observational' training condition, on each trial the participant would view a tank image, along with the name of that tank. They could view the image for as long as they wished, and they pressed a key to move on to the next image. The set of 12 images was presented in 10 training blocks (with order randomised for each block), for a total of 120 training trials. In the 'feedback' condition, the participant was presented on each trial with one of the 12 vehicles. Instead of a single label, all 12 vehicle labels were presented on the screen, along with an instruction to 'press the corresponding keyboard key to guess the name of the tank'. After pressing a response key, the screen was cleared, followed by 'correct' or 'incorrect – that was a [correct label of tank]'. The feedback remained on the screen for two seconds. The 12 training tanks were randomly presented in each training block. The training continued until the participant provided the correct category label for all 12 tanks in a row (a sketch of this stopping rule appears below). It is typical in identification-training studies with feedback to continue training until an accuracy-based criterion is reached (e.g. 100% accuracy one or two times through the training set). We followed this convention for the feedback training condition. In the observational condition, we selected the number of training cycles based on a guess as to the number of training trials that would likely be needed to reach criterion for participants in the feedback condition. This was done to keep the number of training observations roughly equal across conditions. As shown in the results section, the number of training trials completed in both conditions was similar.
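As an illustration only, the following Python sketch simulates the feedback-training stopping rule, reading the criterion as 12 consecutive correct responses. The functions `simulated_response` and `feedback_training` and the fixed accuracy parameter are our inventions, standing in for the real trial routine (image display, key press, feedback); the paper does not publish its experiment code.

```python
import random

def simulated_response(tank, accuracy=0.8):
    """Stand-in for a real trial (image display, key press, feedback);
    this simulated participant is simply correct with fixed probability."""
    return random.random() < accuracy

def feedback_training(tanks, respond=simulated_response):
    """Present randomised blocks of all 12 tanks until the criterion is
    met: correct labels for all 12 in a row (our reading of the rule)."""
    streak = trials = 0
    while streak < len(tanks):
        random.shuffle(tanks)              # new random order each block
        for tank in tanks:
            trials += 1
            streak = streak + 1 if respond(tank) else 0
            if streak == len(tanks):
                break
    return trials

tanks = ["2S1-M1974", "AMX-30", "Centurion", "Challenger", "Chieftain",
         "Leopard-II", "M1-Abrams", "M48A5", "M60A3", "T-62", "T-64", "T-72"]
print(feedback_training(tanks), "trials to criterion")
```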
The observational and feedback training conditions were presented between subjects. No participant completed more than a single training condition. Following training, each participant (in both training conditions) completed an identification test. In random order, the 12 training items, along with 10 additional similar and 10 non-similar (described above) tank images were presented on the computer screen, one image per trial. On a trial, the test item was presented for 1 second before disappearing. Then the list of 12 item labels from the training phase appeared, along with an additional label that said 'other', which participants could press if they felt that they had not seen the image during training. The participant could take as long as they wished to select their response. No accuracy feedback was provided during this identification test. The identification training and testing portion of the experiment took approximately 30 minutes to complete. Following the identification training and test phases, each participant again completed the card-sort task (described above).

Similarity-rating task

All participants finished the session by completing a similarity-rating task. On each trial of the similarity-rating task, two tanks were presented side by side on the computer screen, along with the instructions 'How similar are these?' Under these instructions was a rating scale with numbers from 1 ('low similarity') to 5 ('high similarity'). The participant's task was to examine the two images and press the corresponding number key to rate their similarity. After a one-second inter-trial interval, the next randomly selected pair of tanks was presented. The presented tanks included the 12 ID task training items,


along with the 10 similar items that also appeared in the card-sorting and ID training tasks. All pairwise comparisons were rated for these 22 stimuli (i.e. 22 × 21/2 = 231 unordered pairs), resulting in 231 similarity-rating trials. This portion of the experiment took approximately 30 minutes to complete.

Results

We begin by reviewing the outcome of the identification training and test phases. After that we evaluate our ability to predict confusion errors committed during the identification test based on the results of the card-sorting and similarity-rating tasks. The similarity-rating data for each participant were submitted to an MDS analysis that gave a spatial representation of confusability for the items. We also examine methods for inferring which stimulus dimensions tended to be the focus of attention: one based on the card-sort results and one based on interpreting the dimensions of the MDS solution. Finally, we compare all methods on the basis of signal detection analysis, which provides a quantitative measure of ability to predict confusions.


Identification confusion errors

During the identification test, participants were presented with the 12 training items, along with 20 additional tank images to serve as lures (the 10 similar and 10 non-similar items described above). The lures were included to make the identification task more difficult. Although participants completed, on average, slightly more training trials in the feedback condition, this was not a statistically reliable difference. In the observational condition, all participants completed 120 training trials. In the feedback condition, the median number of trials completed to reach criterion was also 120 trials (M = 135 trials, SD = 54), t(30) = 1.08, p = 0.291.

Participants committed more identification errors (including 'other' responses) after observational training (M = 8.24, SD = 4.72) than after feedback training (M = 4.63, SD = 2.49), t(34) = 2.91, p = 0.006. Excluding the 'other' response, the difference in ID confusion rates (i.e. the rate of confusing one training item for another training item) was smaller. Based only on the 12 training stimuli, participants averaged 2.94 (SD = 2.61) confusions after observational training and 2.84 (SD = 2.19) confusions after feedback training, with no significant difference between the two conditions, t(34) = 0.12, p = 0.902. It follows that more 'other' responses were made after observational training (M = 5.29, SD = 3.29) than after feedback training (M = 1.79, SD = 1.23), t(34) = 4.32, p < 0.001. Although the difference in error rates seems to disappear after removing the 'other' responses, it is important to note that there were far fewer errors overall in the feedback training condition (total errors across all participants = 88) than in the observational condition (total errors across participants = 140), while 'other' responses accounted for 34 (39%) and 90 (64%) errors, respectively, in these conditions. Clearly, there was greater uncertainty about item identity after observational training. The 'other' responses could merely indicate a lack of confidence in the identification response, rather than certainty that the item was not seen in training. As a result, in the rest of our analyses we limit attention to identification confusions based only on the 12 training items. Our primary interest is in errors of commission (i.e. misidentifying one training item as another training item), because these types of errors are responsible for incidences of fratricide.

Predicting ID confusions from card-sort results

We analysed the card-sort piles with respect to their ability to predict ID confusions using the following method. When two tanks were placed into the same card-sort pile, we interpreted this to mean that the participant considered these items to be more similar to each other than to items in other piles. Therefore, if two items appeared in the same pile and those items were indeed confused during the identification task, we considered that observed error to be predicted by the card-sort results. For example, if the participant confused a T-64 with a T-72 during the identification task, and these tanks appeared together in one of the participant's sort piles, then this confusion was predicted by the sorting task. We compared predictions based on training condition, and also compared the pre- and post-task sorts using a 2 × 2 mixed-factor analysis of variance (ANOVA) with training as a between-participant factor and pre-post sort as a within-participant factor.
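Before turning to those results, here is a minimal Python sketch of the prediction rule itself. This is our illustration, not the original analysis code (which is not published); the pile contents and confusion pairs are invented.

```python
from itertools import combinations

# Hypothetical card-sort piles and observed identification confusions
# for one participant (tank names from the training set; groupings invented).
piles = [
    ["T-62", "T-64", "T-72"],                  # e.g. a 'rounded body' pile
    ["M1-Abrams", "Leopard-II", "M60A3"],      # e.g. an 'angular body' pile
    ["Centurion", "Chieftain", "Challenger"],
]
observed_confusions = {frozenset({"T-64", "T-72"}),
                       frozenset({"Centurion", "M1-Abrams"})}

# Rule: any two items placed in the same pile are predicted to be confusable.
predicted = {frozenset(pair)
             for pile in piles
             for pair in combinations(pile, 2)}

# A predicted pair that was actually confused counts as a successful prediction.
hits = observed_confusions & predicted
print(f"Predicted {len(hits)} of {len(observed_confusions)} observed confusions")
```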
We could predict a higher proportion of ID confusions after feedback training (M = 0.54, SD = 0.35) than after observational training (M = 0.33, SD = 0.33; collapsed over pre- and post-task sorts); this difference fell just short of statistical significance, F(1,27) = 4.12, p = 0.052, partial η² = 0.132. There was no reliable difference in predictions based on pre- (M = 0.42, SD = 0.35) and post-task sorts (M = 0.45, SD = 0.36), F(1,27) = 0.10, p = 0.752, η² = 0.004, and no interaction between training condition and change from pre- to post-task sort, F(1,27) = 0.03, p = 0.87, η² = 0.001. It appears that the difference between feedback and observational training card-sort predictions largely existed even before training occurred. This conclusion is supported by the non-significant difference in ability to predict confusions based on pre- and post-test card-sorts. The point we wish to make with this analysis is that card-sort results do a reasonable job of predicting ID confusions: the card-sorts predict about 45% of confusion errors collapsed over training conditions and pre- and post-task sorts. Because the main goal of this study was to predict identification confusion errors during training, rather than to champion one training style over another, we limit our consideration of differences between training conditions in our remaining analyses.

Before evaluating other prediction methods, we must examine the content of the card-sorting results. In addition to error prediction, our goal is to interpret the psychological dimensions underlying the confusability of training items (i.e. we wish to determine the basis for judging items as similar enough to be confusable). We summarise here the outcome of the post-training card-sorts. This interpretation will be contrasted with another approach to error interpretation in a later section.

In the (post-training) card-sort task, participants averaged around four piles each (M = 4.37, SD = 2.28), with slightly more than five items per pile (M = 5.43, SD = 3.05). Despite idiosyncrasies in participants' descriptions of their sort piles, we found substantial consistency within and across participants in terms of the features they described as guiding their sorts. We were able to organise their sort labels into a relatively small set of approximately six categories. These included features pertaining to the following (from most common to least): body style (e.g. shape, size), small auxiliary guns (e.g. presence, absence, number), antennas (e.g. presence, absence, size), main turret (e.g. size, shape), attached auxiliary equipment (e.g. presence, absence) and number of wheels. For each participant, we counted the number of unique feature categories described. For example, in some cases a participant's sort piles each corresponded to a different feature (e.g. body style, wheels). In most cases, though, several sort piles were based on different facets of the same dimension (e.g. rounded body, sectioned body and angular body). When this was the case, we treated these as use of a single feature category (e.g. body features). Using this process, we counted 62 total feature categories across participants. Most prevalent among these were body features (appearing for 16 participants), small attached guns (13 cases) and antennas (11 cases), then main turret features (8), attached equipment (8), number of wheels (5) and finally one miscellaneous feature (based on 'modern' appearance). These results suggest that a relatively small set of prominent tank features formed the basis for organising training items in the minds of participants.

Predicting ID confusions from similarity ratings

After completing the identification-training and card-sorting tasks, participants provided pairwise-similarity ratings for all the training items and the 10 similar lure items (items were rated on a five-point similarity scale). Similarity ratings should provide another way to understand the mental representation of the training items, and could provide a means for predicting confusion errors.
We submitted the similarity ratings to MDS analysis, which is a data reduction technique that places each item into an N-dimensional space based on psychological proximity (Borg and Groenen 2005). If two items are perceived as highly similar, they should be located close together in the MDS space; if perceived as non-similar, they will be located far apart. We used the ALSCAL method to derive a two-dimensional MDS space for each individual participant (Borg and Groenen 2005). (The number of dimensions, N, can be fixed arbitrarily or determined based on model fit.) Because we sought to both predict ID confusion errors and deduce the primary dimensions underlying these errors, we limited the space to two dimensions for ease of interpretation. Better predictive performance would likely result from allowing the MDS procedure to determine the optimal number of dimensions to describe each participant's data, although it might be more challenging to interpret the dimensions. Overall, however, the two-dimensional MDS solutions provided a reasonable fit to the similarity-rating data. Average fit (Young's S-Stress) was 0.174 (SD = 0.05), and the average proportion of variance accounted for was high (R² = 0.83).

A separate MDS analysis was carried out for each participant. This individual-level analysis is critical since we are interested in finding methods with potential to adaptively improve training for each learner, and because each learner's focus of attention is idiosyncratic. Figure 2a displays the MDS result for a representative participant (chosen at random). All 22 tanks are plotted in the space, and those that are close together had higher pairwise-similarity ratings than those far apart.

Based on the MDS solution for each participant, we examined two methods for predicting identification confusions. The first approach is based on our ability to assign psychological interpretation to the dimensions of a participant's MDS space. This approach, which we refer to as the MDS-features method, is described in the next section. A second approach, which we refer to as the MDS-distance method, is based on predicting confusions from inter-item distance in the MDS space (e.g. the distance between points in the top panel of Figure 2). This approach will be considered after we evaluate the MDS-features method.
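As a rough illustration of this pipeline: the authors used ALSCAL (an SPSS procedure), whereas the sketch below substitutes the SMACOF metric MDS algorithm from scikit-learn, and the similarity matrix is invented; it shows only the general shape of the analysis, not the original computation.

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical similarity matrix for five of the 22 tanks, on the
# paper's 1 (low) to 5 (high) rating scale (symmetric, self-similarity 5).
tanks = ["T-62", "T-64", "T-72", "M1-Abrams", "Leopard-II"]
sim = np.array([
    [5, 4, 4, 1, 2],
    [4, 5, 5, 1, 1],
    [4, 5, 5, 2, 1],
    [1, 1, 2, 5, 4],
    [2, 1, 1, 4, 5],
], dtype=float)

# Convert similarities to dissimilarities: the most similar pairs (5)
# map to distance 0, the least similar (1) to distance 4.
dissim = sim.max() - sim

# Two-dimensional metric MDS (SMACOF here, in place of the paper's ALSCAL).
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissim)   # one (x, y) point per tank

for name, (x, y) in zip(tanks, coords):
    print(f"{name:12s} x={x:+.2f} y={y:+.2f}")
```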


Figure 2. (a) Spatial representation between training items produced by multidimensional scaling for a representative participant. (b) The MDS placement of the training items summarised in panel a for the same participant. Coordinate axes are labelled with dimensional interpretation produced by regressing apparent feature ratings onto MDS x, y coordinates for the items. See text for details.

Predictions based on MDS-features

Apparent feature ratings. In order to interpret the MDS dimensions, the researchers rated (prior to the study) each of the tanks in terms of their visually apparent features. For each of the tank images used, two raters assessed the following features: sectioned body (1 = no, 5 = yes), number of wheels (number of visible wheels), degree to which armour covered the wheels (1 = none to 5 = a lot), armour smoothness (1 = smooth to 5 = rough), presence of attached auxiliary equipment (1 = little to 5 = a lot), presence of small auxiliary guns (1 = no, 5 = yes), size of main turret (1 = small to 5 = very large) and presence of antenna (1 = no, 5 = yes). In addition, various other dimensions were rated (on 1-5 scales as well) to rule out the influence of nuisance features, including angle of the vehicle depicted in the image, size of


the vehicle image on the card and assumed actual vehicle size, and presence/absence of a human for size perspective. None of these dimensions appeared to contribute to the results we describe next, so we will not consider them further. Because this method was exploratory, we were satisfied with the level of agreement between the feature ratings of the two raters. There was a high correlation between ratings, r(142) = 0.725, p < 0.001, and no significant difference between the feature-rating responses, t(286) = 0.84, p = 0.401.

Next, we derived dimensional interpretations for each participant's MDS space by regressing the apparent feature ratings for the set of tanks onto the x, y coordinates for each tank in the MDS space. This allowed us to determine which tank features contributed most to the perceived similarity of the tanks as indicated by their proximity in the MDS space. Features with the smallest p-values in the regression analysis contributed most to the x, y coordinates for each item. This method has been utilised in a variety of studies to provide psychological interpretation to latent stimulus-space dimensions (e.g. Kruskal and Wish 1978; Markman and Makin 1998).

Figure 2b shows the outcome of this analysis for one representative participant (i.e. the same data-set as in Figure 2a). Regression indicated that for this participant, the appearance of a 'sectioned body' and the presence of 'antenna' contributed most strongly to the configuration of points in the MDS space (reflecting this participant's similarity ratings). Each of the vehicles displayed to the participant in the similarity-rating task is shown for illustration. The vehicles on the left side of the space tended to be main battle tanks consisting of a sectioned body with an armoured and tracked chassis and a rotating turret supporting a large main weapon. The vehicles to the right tended to have a more unified body style in which there is no differentiation between chassis and turret. As for the vertical dimension, it is clear that the tanks near the bottom of the MDS space tended to have antennas, while those at the top did not. Clearly, the presence or absence of an antenna is not a highly informative indicator of tank model, and a mental representation that focuses attention on this dimension might contribute to hazardous identification confusions among novice decision-makers.

This approach to interpreting the dimensions underlying each participant's mental representation of tank similarity enables us to predict identification confusion errors. For each of the two most prominent dimensions for a given participant (e.g. sectioned body and antennas in our example), we examined the apparent feature ratings (described above) for all possible pairings of tanks from the ID training task. If the ratings on at least one of the two dimensions matched for a pair of tanks, or differed in rated value by no more than one, then we predicted a confusion error for that pair. For example, if two items each rated a four on 'number of wheels', or if one item rated a one and the other a two on the dimension 'sectioned body', then we would predict a confusion error. A pair had to be closely matched on only one of the two MDS dimensions. Although this algorithm was a rather arbitrary preliminary attempt, it substantially increased our predictive power for ID confusions over the card-sort method summarised above.
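A minimal sketch of these two steps, again with invented numbers: the regression here uses ordinary least squares via NumPy (the paper does not say which regression routine was used), we report R² where the authors used regression p-values, and `predict_confusion` implements the 'ratings differ by no more than one' rule just described.

```python
import numpy as np

# Hypothetical inputs: 2-D MDS coordinates for five tanks and the
# researchers' 1-5 'apparent feature' ratings on two candidate features.
coords = np.array([[-1.2, 0.8], [-1.0, -0.9], [-0.8, -1.1],
                   [1.3, 0.7], [1.1, -0.4]])
feature_ratings = {
    "sectioned body": np.array([5, 5, 4, 1, 2]),
    "antenna":        np.array([1, 5, 5, 1, 4]),
}

# Step 1: regress each feature rating onto the (x, y) coordinates.
# Features that fit the configuration well are taken as the dimensions
# guiding this participant's similarity judgements.
X = np.column_stack([coords, np.ones(len(coords))])   # add an intercept
for name, ratings in feature_ratings.items():
    _, residuals, *_ = np.linalg.lstsq(X, ratings, rcond=None)
    ss_total = ((ratings - ratings.mean()) ** 2).sum()
    r2 = 1 - residuals[0] / ss_total if residuals.size else 1.0
    print(f"{name:15s} R^2 = {r2:.2f}")

# Step 2: for the two most prominent features, predict a confusion for
# any pair of tanks whose ratings on at least one feature differ by <= 1.
def predict_confusion(i, j, top_features=("sectioned body", "antenna")):
    return any(abs(feature_ratings[f][i] - feature_ratings[f][j]) <= 1
               for f in top_features)

print(predict_confusion(0, 1))   # True: both rated 5 on 'sectioned body'
```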
Based on this MDS-features analysis, we could predict a higher proportion of confusions (M = 0.85, SD = 0.26) than based on the card-sort method (M = 0.45, SD = 0.36), t(56) = 4.83, p < 0.001. We further evaluate these two methods below.

Correlation between card-sort and MDS-features interpretations

There was substantial overlap between the features identified by the MDS-features method and the card-sort method. The MDS-features method identified about 58 feature categories totalled across participants. The most prevalent were features pertaining to antennas (15 cases), number of wheels (11), small attached guns (9), armoured wheels (7), sectioned body (7), armour smoothness (5) and attached equipment (4). For each participant we compared the number of feature categories in common across card-sort piles and MDS-features results. For the 28 participants who committed ID confusion errors, prediction features overlapped for at least one feature category in 50% of cases (i.e. for 14 participants, the card-sort and MDS-features methods identified at least one of the same features as important to the participant's sorting or similarity-rating decisions). There was a fair amount of overlap between the predictive features produced by the card-sort and MDS-features methods, but they were often not in agreement. This leaves us with an important question: which method are we to favour for inferring psychological dimensions? The next section details another method that may inform this decision.

The psychological interpretation provided by the MDS-features method is valuable for understanding – and potentially influencing – the outcome of identification training (i.e. for understanding the source of confusion errors). However, predictions do not necessarily have to be linked to a psychological interpretation. Better predictive performance should be possible based only on inter-point distances in the MDS space, which reflect the contribution of any number of underlying psychological dimensions. In the next section, we consider an MDS-distance-based predictor of confusion errors (which does not rely on dimensional interpretation), and compare its performance with the MDS-features and card-sort methods. In addition, so far we have considered only the ability to predict observed confusions ('hits' in the language of signal detection theory). In the next section, we use signal detection analysis to compare all three prediction methods, taking into account both hit and false alarm rates.


MDS-distance predictions and signal detection analysis

Using the MDS representation of training item similarity, we can use inter-point distance as a method for creating new sorting criteria for the items (i.e. for sorting more-confusable from less-confusable items). Given an MDS representation of inter-item similarity (i.e. inter-point distance), we can predict the confusability of two items based on their closeness in space. For each participant, we evaluated a range of distance criteria and made confusion error predictions. Then, for each participant, we selected the inter-point distance value that resulted in the best predictive performance (as determined by d′, the signal detection measure of discriminability).

For example, for the same participant as in Figure 2, the distance between points in the space is the Euclidean distance computed from the x, y coordinates of each point. For this participant, inter-point distances ranged between 0.482 and 4.328. We tried grouping items in the MDS space (i.e. treating them as confusable in a manner akin to card-sorting) using several distance criterion values (0.5, 1, 1.5, ..., 4.5, 5). For example, when 0.5 was the criterion, all pairs of items with an inter-point distance of 0.5 or less were sorted together and considered confusable. All pairs of items with larger inter-point distances were not considered confusable. We repeated this process for each of the distance criterion values in order to find the one that best predicted ID confusion errors. For the participant in our example, an inter-point distance criterion of two led to the best prediction performance (as defined below). The best inter-point distance criterion varied by participant.

We computed signal detection indices of predictive performance as follows. A 'signal' was defined as an observed ID confusion error (i.e. an actual confusion made by the participant during the ID task). All other item pairs were considered 'noise'. In other words, observed confusions were treated as a signal that we tried to predict using our MDS-distance-based detector. Other potential (but not observed) confusions were treated as noise since these could be predicted as 'signal' by our algorithm (i.e. they could result in false alarms). A 'hit' was defined as a signal that was predicted by our algorithm, and a false alarm was an ID confusion that was predicted by our algorithm but not actually committed by the participant during the ID task.

To facilitate comparison, we also used this method to compute signal detection measures for the card-sort and MDS-features procedures. For the card-sort method, items appearing together in a pile predicted a confusion error (a 'hit' if the predicted error also happened to be an observed confusion in the data; a 'false alarm' otherwise). For the MDS-features method, pairs with at least one of two dimensions close to matching on apparent feature ratings (differing by no more than one; described above) predicted a confusion error (these, too, were classified as hits or false alarms). All signal detection analyses were carried out at the individual participant level, and the aggregate results presented below are based on individual-level outcomes.
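A compact sketch of this criterion sweep and the signal detection scoring, with simulated inputs (random coordinates and invented confusion pairs). Clipping the rates away from 0 and 1 before the z-transform is our addition to keep d′ finite and is not described in the paper.

```python
import numpy as np
from itertools import combinations
from scipy.stats import norm

# Simulated inputs for one participant: 2-D MDS coordinates for the
# 12 training items and the observed confusion pairs (by item index).
rng = np.random.default_rng(0)
coords = rng.uniform(-2, 2, size=(12, 2))
observed = {frozenset({0, 1}), frozenset({2, 5}), frozenset({3, 7})}

pairs = [frozenset(p) for p in combinations(range(12), 2)]  # 66 item pairs

def d_prime(criterion):
    """Predict 'confusable' for pairs closer than the criterion, then
    score hits and false alarms against the observed confusions."""
    predicted = set()
    for p in pairs:
        i, j = sorted(p)
        if np.linalg.norm(coords[i] - coords[j]) <= criterion:
            predicted.add(p)
    noise = set(pairs) - observed            # pairs never actually confused
    hit_rate = len(predicted & observed) / len(observed)
    fa_rate = len(predicted & noise) / len(noise)
    # Clip rates away from 0/1 so the z-transform stays finite.
    hit_rate, fa_rate = np.clip([hit_rate, fa_rate], 0.01, 0.99)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)   # d' = z(H) - z(FA)

# Sweep the criterion values used in the paper; keep the best performer.
criteria = np.arange(0.5, 5.01, 0.5)
best = max(criteria, key=d_prime)
print(f"best criterion = {best:.1f}, d' = {d_prime(best):.2f}")
```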
To increase the reliability of each method's predictive power, we based these comparisons on participants who committed at least three confusion errors in the identification task. Table 1 shows the d′, hit rate and false alarm rates for each method. There was an advantage for the MDS-based approaches over the card-sort method in terms of discriminability (d′) and hit rate. Average d′ was significantly higher for the MDS-distance method than for the MDS-features method, t(18) = 3.09, p = 0.003, and higher than for the card-sorting method, t(18) = 3.498, p = 0.001. Although d′ was higher for MDS-features than for card-sort, this difference was not statistically significant, t(18) = 0.93, p = 0.363. Both MDS-distance and MDS-features had significantly higher hit rates than card-sort, t(18) = 4.29, p < 0.001 and t(18) = 4.49, p < 0.001, respectively. MDS-distance and MDS-features hit rates did not significantly differ from each other, t(18) = 0.245, p = 0.405. On the other hand, the card-sort method produced the lowest false alarm rates. MDS-features resulted in significantly more false alarms than card-sort, t(18) = 8.43, p < 0.001, as did MDS-distance, t(18) = 4.32, p < 0.001. The MDS-distance method provided an intermediate level of false alarms: significantly lower than those from the MDS-features approach, t(18) = 1.76, p = 0.048, but still higher than those from the card-sort approach.

Table 1. Summary of signal detection analysis on prediction of ID confusions by method. Values are means with standard deviations in parentheses.

Method          d′              Hit rate        False alarm rate
Card-sort       0.556 (1.541)   0.450 (0.274)   0.270 (0.163)
MDS-features    0.974 (1.174)   0.819 (0.231)   0.771 (0.185)
MDS-distance    1.816 (0.672)   0.836 (0.302)   0.624 (0.353)


In summary, the MDS-based approaches can lead to substantial gains in hit rate, but this trades off with an increase in false alarm rate. However, the MDS-distance method provides some relief from this problem relative to the MDS-features approach. The overall level of discriminability, d′, which takes both hit and false alarm rates into account, clearly favours the MDS-distance method over the others.


Correlation between prediction methods

Another way to examine the relationship between prediction methods is through their correlations with each other. We computed pairwise correlations between the methods on measures of d′, hit rate and false alarm rate. None of the correlations reached statistical significance (p > 0.05 in all cases), but the trends were nevertheless consistent with what we might expect based on the results described above. Based on d′, MDS-features and MDS-distance were closer to each other than to card-sorts. There was a weak correlation between MDS-features and MDS-distance, r(17) = 0.27, but little correlation between MDS-features and card-sorts, r(17) = 0.02, or between MDS-distance and card-sorts, r(17) = 0.17. Similarly, for hit rates, the strongest correlation was between MDS-distance and MDS-features, r(17) = 0.39, with virtually no correlation between MDS-features and card-sorts, r(17) = 0.002, or between MDS-distance and card-sorts, r(17) = 0.08. For false alarm rates, however, the relationship between the methods was more equivocal. There were weak correlations between MDS-features and MDS-distance, r(17) = 0.20, and between MDS-distance and card-sorts, r(17) = 0.21, and very little relationship between MDS-features and card-sorts, r(17) = 0.11. These relationships suggest that the strongest predictive relationships are between the MDS-based measures on d′ and hit rate.

Discussion

In this study, we compared methods for predicting identification confusion errors and for understanding the mental representation underlying these confusions. Each participant completed a series of tasks: a pre-training card-sort, identification training and testing, a post-training card-sort and finally a pairwise-similarity-rating task. The card-sort and similarity-rating tasks provide separate indicators of participants' mental representation of the similarities and differences between items in the identification-training stimulus set. Our hypothesis was that participants focus on a subset of features during training and that we can use this tendency to predict and explain confusion errors. In doing so, we may discover a basis for directing learners' attention towards critical stimulus features and away from superfluous details of training items.

The ID training task was designed to mimic features of self-paced study with a set of training images (e.g. a deck of training cards). We compared two training conditions: a passive 'observational' training condition in which participants studied tanks and their labels together, and a more actively engaging 'feedback' training condition in which identification attempts were followed by corrective feedback. We observed some differences between training types, including a lower error rate and greater correspondence of confusions with card-sort categories in the feedback condition. Although these differences are interesting and might warrant further study, our primary focus was on understanding and predicting confusion errors rather than comparing training methods.

Predicting ID confusions

There was clear evidence that subsets of features were prominent in the minds of participants. Attention was often given to important features such as body style and weapons. In many cases, however, participants were influenced by the presence of antennas or other peripheral attachments, which may not provide a reliable guide to identification of armoured vehicles in operational environments.
We found that the card-sort method, which is straightforward to implement, accounted well for identification errors, predicting about 45% of observed confusions. However, the MDS-based methods proved to be much more sensitive. The MDS-features method, which combined the spatial representation of similarity judgements with regression-based interpretation of the most prominent psychological dimensions, predicted about 82% of observed confusions. And the MDS-distance method, which omits any psychological interpretation, predicted about 84% of confusions. On the other hand, both MDS methods predict a higher false alarm rate than the card-sort method (although this problem was less severe in the MDS-distance case). If the goal is to root out ID confusions, the MDS approaches seem to be worth exploring in more detail.

It is important to point out that the methods applied here are preliminary, and that better predictive performance could be achieved with the MDS-based methods by allowing higher dimensional representations (we limited our MDS solutions to two dimensions in this study to simplify interpretation). Also, the MDS-distance method could be improved slightly by using a parameter optimisation algorithm to find the most predictive distance criterion for each participant. In order to gain a clearer picture of the trade-off between hit and false alarm rates for each method, we would likely need a study with many


more repetitions of the identification test trials (to produce more identification confusions). This would make the identification task a more sensitive measure of mental representation. It is also known that small training sets can lead to different learning strategies than large training sets (Rouder and Ratcliff 2004). Nevertheless, our goal here was exploratory: we were able to demonstrate the feasibility of interpreting psychological representation and predicting identification errors in a variety of ways.

Another prediction approach worth exploring would be to apply models from the literature on classification learning (Pothos and Wills 2011). One thing we have not explored in the analyses reported here is whether participants unintentionally develop mental categories (in addition to memorising individual identities) for the training items based on similarity, or whether they simply apply rules along a small subset of dimensions. If mental organisation of training items corresponds to application of simple dimensional rules (e.g. tanks with armour-covered treads), then learners may memorise fewer features of the training items than if their representation is based on memories for a set of features for each training exemplar. Determining their actual strategy would likely require fitting computational models to the data.


Future research

There are of course additional avenues for future research. For example, eye tracking has been used in identification-training studies to answer questions similar to our own about the focus of attention during learning (e.g. Lee et al. 2013). It would be valuable to see whether eye tracking results corroborate our behaviour-based conclusions regarding the features that drive performance. Another important question pertains to the contribution of individual differences. Attention to certain details may be predictable from personal preferences, personality type or experiences that learners have had; the current work did not assess these variables. Furthermore, it will be important to understand any differences between the learning performance of novices (as evaluated here) and those who have already received some form of training in vehicle identification. Another important consideration is the possibility that old/new recognition, rather than identification based on memorising sets of features, influences what learners remember and respond to. Some errors might be due more to a vague sense of recognition than to reasoning based on attention to specific characteristics.

Application to adaptive training

Both the card-sort and the MDS-based methods could provide the basis for future adaptive training systems. However, although card-sort data are simple to collect, their analysis and interpretation are much more subjective than the similarity-rating-based MDS methods, and the card-sort method might be difficult to implement in a computer-mediated training system. The similarity-based methods, on the other hand, lend themselves much more readily to automation. It is easy to envision an adaptive training system that teaches and tests item identification (tanks or otherwise) along with a system for collecting similarity ratings for the purpose of tailoring subsequent training sessions to optimise attention allocation. Such an approach makes sense given the ubiquity of hand-held computing devices (e.g. cell phones). Mobile devices can increase the realism of training stimuli and allow interactive learning, in addition to real-time adaptive capabilities based on the methods reported here. The system could improve training by directing attention away from features that are unreliable or unimportant and towards features that are critical for accurate vehicle identification.

Furthermore, such an evaluation system would easily integrate into efforts at determining the ideal form factor for training items. For example, ongoing research investigates differences between static images, movies and 3D interactive virtual models for training (Keebler, Jentsch, and Hudson 2011; Keebler, Jentsch, and Schuster 2013). Mobile device training systems such as the Army's ROC-V (Recognition of Combat Vehicles) training programme allow users many options for studying and testing on trained vehicles (Night Vision and Electronic Sensors Directorate 2013). By incorporating a pairwise comparison or sorting task like those evaluated here, and analysing the results using our MDS-based approach, the system could be tuned to (1) shorten training by focusing learner attention on routinely confused items and (2) ameliorate potential errors by emphasising dimensions that are critical for accurate identification. Finally, future research will need to put such an adaptive training method to the test with respect to real training outcomes.
The system would need to model learner performance in real time, and compare performance with more traditional methods.

References

Biederman, I., and M. M. Shiffrar. 1987. "Sexing Day-Old Chicks: A Case Study and Expert Systems Analysis of a Difficult Perceptual-Learning Task." Journal of Experimental Psychology: Learning, Memory, and Cognition 13: 640–645.
Borg, I., and P. Groenen. 2005. Modern Multidimensional Scaling: Theory and Applications. 2nd ed. New York: Springer-Verlag.



Briggs, R. W., and J. H. Goldberg. 1995. "Battlefield Recognition of Armored Vehicles." Human Factors 37 (3): 596–610.
Edwards, P. J., F. Sainfort, T. Kongnakorn, and J. A. Jacko. 2006. "Methods of Evaluating Outcomes." In Handbook of Human Factors and Ergonomics, 3rd ed., 1150–1187. Hoboken, NJ: Wiley.
Folstein, J. R., I. Gauthier, and T. J. Palmeri. 2010. "Mere Exposure Alters Category Learning of Novel Objects." Frontiers in Psychology 1: 40.
Keebler, J. R., M. Harper-Sciarini, M. Curtis, D. Schuster, F. Jentsch, and M. Bell-Carroll. 2007. "Effects of 2-Dimensional and 3-Dimensional Media Exposure Training on a Tank Recognition Task." Proceedings of the 51st Annual Meeting of the Human Factors and Ergonomics Society, Baltimore.
Keebler, J. R., F. Jentsch, and I. Hudson. 2011. "Developing an Effective Combat Identification Training." Proceedings of the 55th Annual Meeting of the Human Factors and Ergonomics Society, Las Vegas.
Keebler, J. R., F. Jentsch, and D. Schuster. 2013. "The Effects of Video Game Experience and Active Stereoscopy on Performance in Combat Identification Tasks." Submitted for publication.
Keebler, J. R., L. Sciarini, T. Fincannon, F. Jentsch, and D. Nicholson. 2008. "Effects of Training Modality on Target Identification in a Virtual Tank Recognition Task." Proceedings of the 52nd Annual Meeting of the Human Factors and Ergonomics Society, New York.
Keebler, J. R., L. Sciarini, T. Fincannon, F. Jentsch, and D. Nicholson. 2010. "A Cognitive Basis for Vehicle Misidentification." In Human Factors Issues in Combat Identification, edited by D. H. Andrews, R. P. Herz, and M. B. Wolf, 113–128. Burlington, VT: Ashgate.
Kemler Nelson, D. G. 1984. "The Effect of Intention on What Concepts Are Acquired." Journal of Verbal Learning and Verbal Behavior 23 (6): 734–759.
Kirkpatrick, D. L., ed. 1975. "Techniques for Evaluating Training Programs." In Evaluating Training Programs. Alexandria, VA: ASTD.
Kruskal, J. B., and M. Wish. 1978. "Multidimensional Scaling." Sage University Paper Series on Quantitative Applications in the Social Sciences, 07-011. Beverly Hills: Sage.
Lee, C., E. Middleton, D. Mirman, S. Kalenine, and L. J. Buxbaum. 2013. "Incidental and Context-Responsive Activation of Structure- and Function-Based Action Features during Object Identification." Journal of Experimental Psychology: Human Perception and Performance 39 (1): 257–270.
Malone, T. W. 1981. "Toward a Theory of Intrinsically Motivating Instruction." Cognitive Science 4: 333–369.
Markman, A. B., and V. S. Makin. 1998. "Referential Communication and Category Acquisition." Journal of Experimental Psychology: General 127 (4): 331–354.
Night Vision and Electronic Sensors Directorate. 2013. Army ROC-V (Version 1.0) [Mobile application software]. https://play.google.com/store/apps/details?id=gov.usa.rocv
O'Kane, B. L., I. Biederman, E. E. Cooper, and B. Nystrom. 1997. "An Account of Object Identification Confusions." Journal of Experimental Psychology: Applied 3 (1): 21–41.
Pothos, E. M., and A. J. Wills, eds. 2011. Formal Approaches in Categorization. Cambridge: Cambridge University Press.
Regan, G. 1995. Blue on Blue: A History of Friendly Fire. New York: Avon Books.
Rouder, J. N., and R. Ratcliff. 2004. "Comparing Categorization Models." Journal of Experimental Psychology: General 133: 63–82.
Smith, E. E. 2008. "The Case for Implicit Category Learning." Cognitive, Behavioral, and Affective Neuroscience 8 (1): 3–16.
Yamauchi, T., and A. B. Markman. 1998. "Category Learning by Inference and Classification." Journal of Memory and Language 39: 124–148.
