Applied Ergonomics xxx (2014) 1–11


Validating the Strategies Analysis Diagram: Assessing the reliability and validity of a formative method

Miranda Cornelissen a,b,*, Roderick McClure b, Paul M. Salmon c, Neville A. Stanton d

a Griffith Aviation, Griffith University, Nathan Campus, 170 Kessels Road, Nathan, QLD 4111, Australia
b Monash Injury Research Institute, Monash University, Building 70, Clayton Campus, Wellington Road, Clayton, VIC 3800, Australia
c University of the Sunshine Coast Accident Research (USCAR), Faculty of Arts and Business, University of the Sunshine Coast, Locked Bag 4, Maroochydore DC, QLD 4558, Australia
d Civil, Maritime, Environmental Engineering and Science Unit, Faculty of Engineering and the Environment, University of Southampton, Highfield, Southampton SO17 1BJ, United Kingdom

Article history: Received 27 May 2013; accepted 7 April 2014; available online xxx

Abstract

The Strategies Analysis Diagram (SAD) is a recently developed method to model the range of possible strategies available for activities in complex sociotechnical systems. Previous applications have shown that the method can effectively identify a comprehensive range of strategies available to humans performing activity within a particular system. A recurring criticism of Ergonomics methods, however, is that substantive evidence regarding their performance is lacking, and such evaluations are necessary if a method is to be widely used by other practitioners. This article presents an evaluation of the criterion-referenced validity and test-retest reliability of the SAD method when used by novice analysts. The findings show that individual analyst performance was average; however, pooling the individual analyst outputs into a group model increased the reliability and validity of the method. It is concluded that the SAD method's reliability and validity can be assured through the use of a structured process in which analysts first construct an individual model, followed by either another analyst pooling the individual results or a group process pooling the individual models into an agreed group model.

© 2014 Elsevier Ltd and The Ergonomics Society. All rights reserved.

Keywords: Strategies Analysis Diagram; Cognitive Work Analysis; Validation

1. Introduction

Cognitive Work Analysis (CWA; Rasmussen et al., 1994; Vicente, 1999) is an Ergonomics method used to describe and evaluate complex sociotechnical systems. CWA has been applied in various complex sociotechnical domains such as defence (Burns et al., 2004), aviation (Ahlstrom, 2005), road transport (Cornelissen et al., 2013), and process control (Vicente, 1999). CWA is one of few formative methods within the discipline of Ergonomics. Formative methods aim to describe the potential ways in which a system can operate, rather than how a system should operate (normative methods) or actually operates (descriptive methods) (Vicente, 1999). Formative methods are an important category of methods, as they are used to assist the design of adaptive and flexible systems and tools that can cope with non-routine situations.

* Corresponding author. Tel.: +61 7 3735 5276. E-mail addresses: [email protected] (M. Cornelissen), [email protected] (R. McClure), [email protected] (P.M. Salmon), [email protected] (N.A. Stanton).

Non-routine situations, driven by emergence, are a feature of complex sociotechnical systems. Such systems exist in a highly changeable and demanding environment, in which adaptive capacity is essential (Vicente, 1999; Woods, 1988). Contemporary safety related concepts such as resilience (cf. Hollnagel, 2006) and performance variability (cf. Hollnagel, 2002, 2004) further underline the importance of methods that take the adaptability of complex sociotechnical systems into account.

Despite the popularity of CWA, reliability and validity analyses of formative methods remain largely absent from the Ergonomics literature. One of the few efforts to establish the validity of CWA was conducted by Burns et al. (2004), who performed a qualitative post hoc comparison of two independently conducted Work Domain Analyses (WDA), the first phase of CWA, on similar defence systems. More formal validation studies, including validation of the later phases of the CWA framework, are currently lacking. A recurring criticism aimed at the Ergonomics discipline is that some of its methods are only used by their developers, and that methods are often chosen by practitioners based on familiarity and ease of use rather than on reliability and validity evidence (Stanton et al., 2013). Many practitioners are unaware of whether the methods they use are reliable and valid (Stanton and Young, 1998).

http://dx.doi.org/10.1016/j.apergo.2014.04.010
0003-6870/© 2014 Elsevier Ltd and The Ergonomics Society. All rights reserved.


Fig. 1. Strategies analysis diagram.

Validation studies can benefit Ergonomics methods by providing clear evaluation and empirical evidence of their performance and value (Stanton and Young, 1999, 2003). Further, a key requisite of Ergonomics methods is that they are usable by non-experts and achieve acceptable levels of performance when used by other analysts (Stanton and Young, 1999, 2003). Without this, uptake of a method by Ergonomics practitioners, designers and engineers may be limited. The aim of the study reported in this article was to provide a more formal reliability and validity analysis of CWA. The study was conducted using non-expert analysts, to evaluate CWA's level of performance when used by other analysts and to support uptake of the method by Ergonomics practitioners.

1.1. Cognitive Work Analysis

The CWA framework comprises five phases (Vicente, 1999), each modelling a different set of constraints. First, WDA models the system constraints by describing what the system is trying to achieve and how, and with what, it achieves its purpose. Second, Control Task Analysis models situational constraints and decision-making requirements. Third, Strategies Analysis models the ways in which activities within the system can be carried out. Fourth, Social Organisation and Cooperation Analysis models the communication and coordination demands imposed by organisational constraints. Fifth, Worker Competencies Analysis describes the skills, rules and knowledge required for the activities possible within the system. To date, applications have focussed on the development and application of the first two phases: WDA and Control Task Analysis. The third phase, Strategies Analysis, is useful for providing insight into the different response options that enable a system's adaptive capacity. This phase has traditionally not been as well developed

nor applied as often as the earlier phases. Recently, however, Strategies Analysis has regained attention (Cornelissen et al., 2012, 2013; Hassall and Sanderson, 2012; Hilliard et al., 2008). In particular, Cornelissen et al. (2013) developed and applied a structured method, the Strategies Analysis Diagram (SAD), to model the strategies available in complex sociotechnical systems. Initial evaluations of this method have gathered evidence of its effectiveness in identifying possible strategies (Cornelissen et al., 2012, 2013); however, the method would benefit from more formal evaluations. This paper is a response to this requirement.

1.1.1. Strategies Analysis Diagram

The SAD method models how activities can potentially be executed within a system's constraints. It also models criteria for when or why work will be executed in a certain way. The SAD builds upon the first phase of the CWA framework by adding verbs and criteria to the constraints identified. This allows further specification of the courses of action possible within the system's constraints, as well as of the criteria influencing the employment of those courses of action. The SAD (see Fig. 1) is a networked hierarchical diagram using means-ends links to represent 'how' and 'why' relationships between the different levels of the diagram. Links upwards explain why a certain object or function is there, whereas links downwards explain how a system works to achieve its purpose or execute its functions. The levels transferred from the WDA are illustrated in light grey and the SAD-specific levels in dark grey. The lower half of the SAD models how activities can potentially be executed. Bottom-up, the diagram describes verbs that capture potential interactions with or manipulations of objects in the system (e.g. follow), physical objects present in the system (e.g. lane markings), object related processes afforded by the physical objects (e.g. display information) and purpose related functions describing
activities that need to be carried out for the system to achieve its purpose (e.g. determine path). The top half of the diagram describes why certain ways of working, or courses of action, are seen within the system. Top-down, the diagram describes the functional purpose, the reason the system exists (e.g. support negotiation of right hand turns by road users); the values and priority measures evaluating the effectiveness of the system and driving behaviour (e.g. safety); and the criteria describing when certain courses of action at the lower end of the diagram are valid or likely to be chosen (e.g. high traffic volume). Together, nodes from the different levels provide analysts with a syntax for strategy definition. For example, bottom-up, a strategy could be defined as 'assess' 'road users' 'show behaviour' when 'avoiding conflict with other road users': assess whether the 'road user is unfriendly', for 'safety' purposes and to 'support negotiation of right hand turns by road users'.
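To make this syntax concrete, a single strategy can be viewed as a path through the levels of the diagram. The following sketch is illustrative only; it is not part of the SAD method itself, and the field names are simply a rendering of the levels described above:

```python
# Illustrative sketch: one SAD strategy represented as a record holding
# one node from each level of the diagram, using the example from the text.
from dataclasses import dataclass

@dataclass
class Strategy:
    verb: str                      # e.g. 'assess'
    physical_object: str           # e.g. 'road users'
    object_related_process: str    # e.g. 'show behaviour'
    purpose_related_function: str  # e.g. 'avoid conflict with other road users'
    criterion: str                 # e.g. 'road user is unfriendly'
    value_priority_measure: str    # e.g. 'safety'
    functional_purpose: str        # e.g. 'support negotiation of right hand turns'

example = Strategy(
    "assess", "road users", "show behaviour",
    "avoid conflict with other road users",
    "road user is unfriendly", "safety",
    "support negotiation of right hand turns by road users",
)
```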
1.2. Evaluating the Strategies Analysis Diagram

Initial evaluations of the SAD method (Cornelissen et al., 2012; Cornelissen et al., 2013) suggest that, when used by its developers, it can effectively identify a comprehensive range of strategies available to humans performing activity within a particular system. The next critical step is to evaluate the method's usefulness more formally and to ensure that the SAD can be applied by other practitioners. The method's reliability and validity when used by analysts new to the SAD should therefore be established. The aim of the study reported in this article was to evaluate the performance of the SAD method when used by analysts not already skilled in the CWA framework. Specifically, the study aimed to evaluate the SAD in terms of its reliability and validity when used by novice analysts to identify the range of strategies available to different road user groups at intersections. This also provides a broader reliability and validity evaluation of formative methods.

1.2.1. Reliability and validity analysis

Evaluating the method's criterion-referenced validity entails ensuring that the method allows analysts to produce a SAD that contains accurate content and little irrelevant content. Evaluating its test-retest reliability entails ensuring that the method produces a similar model when used by the same analyst for the same system more than once.

1.2.1.1. Assessing validity. Studies assessing the validity of Ergonomics methods have been reported in the literature (Baber and Stanton, 1996; Stanton et al., 2009; Stanton and Young, 2003). Many of those have focussed on human reliability and error prediction methods (Baysari et al., 2011; Kirwan et al., 1997; Stanton and Young, 2003). In those studies, the validity of methods was assessed by comparing a method's results (e.g. errors predicted) against actual observations (e.g. errors observed). Since the CWA framework is formative, modelling possible behaviour including behaviour beyond that currently prescribed or actually seen within a system, a comparison of the method's results with actual observations would not provide conclusive evidence about the validity of the method. Alternatively, CWA results could be compared to results obtained with a similar but validated method. However, CWA is unique in that it is a constraints-based approach and one of few formative methods, and substantial validation studies of such methods remain absent from the literature to date. Therefore, no validated method is available against which the SAD could be compared.

Expert models are used when other validated standards are not available (Gordon et al., 2005). Expert models in the context of CWA are models developed by expert analysts who are highly experienced in applying the methods from the CWA framework.


It is worth noting that expert CWA models are not normative models representing the expert's knowledge of a system, but rather well developed formative CWA analyses conducted by expert analysts. The assumption is that the expert model is the most accurate standard, and expert judgment is highly valuable in circumstances in which uncertainty exists and behaviour has yet to occur, is noisy or is complex (Bolger and Wright, 1992). In the absence of other objective standards, a model developed by expert analysts under no time constraints is likely to be the best standard against which to compare other analysts' CWA results.

Once the standard against which the novice analysts' results will be assessed is established, measures to assess the quality of the novice results have to be determined. Quantitative methods for comparing expert results with novice results (or predicted with actual outcomes) are often based on signal detection theory, used to calculate the sensitivity of the method under analysis (Baber and Stanton, 1994; Stanton et al., 2009; Stanton and Young, 2003). Signal detection theory sorts the method's outputs into hits, misses, false alarms and correct rejections. Hits represent items identified by novice analysts that were also identified by the expert analysts. Misses represent items that were identified by the expert analysts but not by the novice analysts. False alarms are items identified by the novice analysts but not the expert analysts. Correct rejections are items identified by neither the expert nor the novice analysts.

Signal detection theory metrics are commonly used to assess the reliability and validity of Ergonomics methods such as human error prediction (Stanton et al., 2009); however, not all of the metrics usefully apply to formative methods. Since formative methods describe what could be possible within a particular system, the correct rejections category is problematic: the correct rejections are unknown and possibly infinite. Theoretically, the total number of items that could be included is anything in the world as we know it, as formative methods are not constrained by what is currently happening or what should be happening. Any artificial number representing this pool would be essentially infinite, and it is hard to know which items expert and novice analysts actively rejected from it. While others (Stanton and Baber, 2002; Stanton and Stevenage, 1998) have been able to argue for a theoretical maximum based on a set number of tasks and error categories provided by a taxonomy, such a theoretical maximum would be artificially inflated when used for formative methods. It is therefore argued that measures using correct rejections are not suitable for assessing the validity of the SAD method.

Measures involving hits, misses or false alarms can be used for the evaluation of CWA. Such measures include the hit rate (hits divided by the sum of hits and misses), which allows comparison of the items identified by novice analysts with the items identified by the expert analysts. The false alarm rate cannot be used here, as its calculation requires correct rejections. To still account for the number of false alarms a novice analyst identifies, a measure often used in clinical or recall studies can be applied: the positive predictive value (Descatha et al., 2009; Lindegård Andersson and Ekman, 2008).
The predictive value (hits divided by the sum of hits and false alarms) reflects the proportion of all items identified by novice analysts that were also identified by the expert analysts.
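By way of illustration, both measures can be computed over sets of identified items. The sketch below is illustrative only, with invented item names; it is not the analysis code used in the study:

```python
# Hit rate and positive predictive value for a novice SAD output scored
# against an expert model (hypothetical items for illustration only).
expert = {"assess", "follow", "lane markings", "high traffic volume"}
novice = {"assess", "follow", "traffic lights", "wet conditions"}

hits = len(novice & expert)          # identified by both novice and expert
misses = len(expert - novice)        # in the expert model, missed by the novice
false_alarms = len(novice - expert)  # identified by the novice only

hit_rate = hits / (hits + misses)                # coverage of the expert model
predictive_value = hits / (hits + false_alarms)  # accuracy of the novice output

print(f"hit rate = {hit_rate:.2f}, predictive value = {predictive_value:.2f}")
```

Note that correct rejections appear nowhere in either calculation, which is what makes these two measures usable for a formative method.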


Fig. 2. Individual and multi-analyst approach to assessing reliability and validity.

1.2.1.2. Assessing reliability. The reliability of Ergonomics methods is often assessed using a test-retest paradigm (Baysari et al., 2011). Measures include percentage agreement (Baber and Stanton, 1996; Baysari et al., 2011) and Pearson's correlation (Harris et al., 2005; Stanton and Young, 2003). As discussed above, the formative nature of CWA poses challenges for traditional Ergonomics reliability and validity measures. To avoid overcomplicating the analysis, percentage agreement was applied here to evaluate test-retest reliability. This measure compares the items a novice analyst identified at two different times.
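Over item sets, one plausible formulation of this measure is the proportion of all items identified at either application that were identified at both; the following sketch (our illustration, with invented items) makes the calculation explicit:

```python
# Test-retest percentage agreement over item sets: items identified at
# both applications divided by all items identified at either application.
t1 = {"assess", "follow", "wait", "high traffic volume"}  # first application
t2 = {"assess", "follow", "signal", "red light"}          # second application

agreement = len(t1 & t2) / len(t1 | t2)
print(f"agreement = {agreement:.2f}")  # 2 shared items of 6 in total -> 0.33
```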

1.2.2. Multi-analyst approach

CWA is a resource intensive method. The formative nature of the method requires participants to go beyond what they currently know, which is a challenging activity. Further, the analysis concerns a complex system, of which analysts may understand one part in great detail while being unaware of gaps in their knowledge of other parts. A multi-analyst approach, using more than one analyst to conduct the analysis, is therefore practicable for a method such as CWA, both to decrease the resources required and to compensate for the shortfalls of individual analysts (Stanton et al., 2009). Other validation studies have pooled individual results (Harris et al., 2005) or proposed multiple analyst approaches (Stanton et al., 2009). These approaches are suggested to increase the validity and comprehensiveness of a method's results. Unfortunately, one of the main weaknesses of a multi-analyst approach is an increase in the false alarm rate (i.e. items identified that are not accurate) (Stanton et al., 2009). To ensure that the benefits of a multi-analyst approach outweigh the cost of introducing false alarms, it is worth exploring strategies to reduce false alarms and ensure the quality of the output. For example, if a method is accurate, relevant content should be identified more consistently than irrelevant content. A practical solution for a multi-analyst approach to CWA would therefore be to include items only if more than one novice analyst generated them, or if multiple analysts agreed on their relevance in a group discussion. The present validity analysis therefore includes both an unadjusted multi-analyst model (collating all individual items) and an adjusted pooled multi-analyst model (collating all individual items into one model, but eliminating all items identified by only one novice analyst) to evaluate the value of a multi-analyst approach to the SAD; a sketch of the two pooling variants is given below.
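The code is illustrative only, with hypothetical items, and reflects no more of the study's procedure than the stated rule of eliminating single-analyst items:

```python
# Unadjusted pooled model: the union of all individual models.
# Adjusted pooled model: only items generated by at least two analysts.
from collections import Counter

individual_models = [
    {"assess", "follow", "wait"},   # analyst 1
    {"assess", "signal", "wait"},   # analyst 2
    {"assess", "overtake"},         # analyst 3
]

counts = Counter(item for model in individual_models for item in model)

pooled = set(counts)                                       # unadjusted
adjusted = {item for item, n in counts.items() if n >= 2}  # adjusted

print(sorted(pooled))    # ['assess', 'follow', 'overtake', 'signal', 'wait']
print(sorted(adjusted))  # ['assess', 'wait']
```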

Fig. 2 summarises the approach taken to evaluate the SAD method. Validity was measured by quantifying the hit rate and predictive value of participants' results. Reliability was measured using a test-retest paradigm. Results were analysed for individual analysts as well as for a pooled multi-analyst approach. There were a number of hypotheses. If the SAD method is valid, novice analysts' models should be similar to the expert analyst's model. It was expected, however, that novice analysts' models would fail to produce complete coverage of the expert analyst's model within the constraints of the study, given the use of novice analysts, the time constraints of a reliability and validity study, and the semi-structured formative approach of the SAD. It was further expected that a multi-analyst approach, and especially an adjusted multi-analyst model, would improve results and better resemble the expert analyst's model. If the SAD method can be used reliably to build a SAD and elicit strategies, novice analysts' models should be similar over time.

2. Method

2.1. Participants

Seventeen transportation safety professionals aged between 27 and 61 (M = 38, SD = 11) took part in the study (see Tables 1 and 2). While the participants were all professionals with an interest in Human Factors, only four had applied CWA before, once or twice and never in full (i.e. using all five phases). None of the participants had used the SAD. Therefore, all participants were considered novices in the SAD method. Participants were recruited through professional association and university newsletters and were compensated for their time and expenses. Prior to commencing the study, ethics approval was granted by the Monash Human Ethics Committee. Participants were divided into two groups based on their availability for the workshops; the two groups, however, followed the same process under the same time constraints for all activities.

2.2. SAD analysis task

The study focused on drivers and cyclists turning right at urban signalised intersections in Melbourne, Australia. In Australia, road users travel on the left hand side of the road. A right hand turn performed on the main road involves crossing the intersection and continuing on the intersecting road. Cyclists can also travel on the footpath, which involves travelling across the intersection using the available pedestrian crossings to the far right corner as viewed from their approach (see Fig. 3). Right hand turns at urban signalised intersections were chosen because the task is relatively complex, with ample opportunity for performance variability, and therefore ideal for evaluating the SAD. At the same time, it defines a reasonable boundary for the system under analysis, as suggested by Naikar (2005).

2.3. Materials

Training material comprised descriptions and flowcharts of how to conduct an analysis using the WDA and SAD methods. Background information, available throughout the entire exercise, consisted of de-identified quotes obtained from drivers and cyclists. These quotes comprised thought processes verbalised while making right hand turns and post hoc explanations of the cognitive processes underlying decision making at the time; a more detailed description can be found in Cornelissen et al. (2012, 2013). Participants were furthermore provided with a WDA of the intersection 'system'. An extract of the WDA is provided in Fig. 4.

2.3.1. Building the Strategies Analysis Diagram

Exercise material for building the SAD consisted of matrices in which participants could first write possible verbs and criteria in columns on the left hand side and subsequently link them to, for example, the physical objects already provided in the top row. Matrices were used rather than hand-drawn diagrams to make it easier for participants to keep track of their process and for the analysts to analyse the diagrams afterwards.


Table 2
Participant background.

                      M     SD    Min    Max
Age                   38    11    27     61
Years experience      14    11    4.5    40

2.3.2. Eliciting strategies from the Strategies Analysis Diagram

After participants had built their own SAD, they were provided with a SAD prepared for them (see Fig. 1) from which to elicit strategies. This was done to ensure that all participants elicited strategies from the same diagram and that elicitation was not influenced by how participants performed in building the SAD. Participants used SAD flowcharts to write down the strategies elicited. These flowcharts contained empty nodes from each level of the SAD for participants to fill out. From left to right, the flowcharts contained nodes for verbs, physical objects, object related processes, criteria, and values and priority measures.

Table 1
Participant background (n = 17).

                            n
Gender
  Female                    8
  Male                      9
Educational background
  Human Factors            10
  Engineering               7
Employment background
  Academia                  5
  Government agency         8
  Industry                  4

2.4. Design

All participants underwent the same training and exercises. Workshop one (T1) was used to assess the validity of participants' results against the expert analyst's results. In a second workshop (T2), a month after the first, participants completed the exercise again. Participants' results at T1 were compared to their results at T2 to assess test-retest reliability.

2.5. Procedure

Participants each attended two workshops.

2.5.1. Workshop 1: data for criterion-referenced validity and test-retest reliability

In workshop one, upon completion of a demographic questionnaire and informed consent form, participants received training in CWA, and specifically in the SAD. The training consisted of an introduction to the method as well as a walkthrough of the method using road transport examples. Once familiar with the method, participants completed a training exercise in which they applied the method to a relatively simple system, the Apple iPod. During the training exercise participants could ask questions to ensure they fully understood the method before starting the main exercise. In the training exercise, participants were given an hour to identify verbs and criteria and link them to the neighbouring levels in the SAD, and 45 min to elicit strategies from the SAD. Following a break, participants started the main exercise. The exercise was explained to the participants and a step-by-step walkthrough of the materials provided was conducted.

Fig. 3. Example on-road and pedestrian crossing route for right hand turns at Australian intersections.


Fig. 4. Work Domain Analysis extract.

Participants were once more reminded of the formative nature of the method and urged to identify possible verbs and criteria, not only those currently seen in the intersection system. Participants were given an hour and 45 min to build a SAD. After a short break, participants were provided with a SAD prepared for them and were asked to elicit strategies from this diagram. Participants elicited strategies for a simple and a complex scenario (see Table 3). Participants were given 30 min to elicit strategies for the simple scenario and an hour for the complex scenario.

2.5.2. Workshop 2: data for test-retest reliability

A month after the first workshop, participants returned for a second workshop in which they received refresher training familiarising them again with the method. After the refresher training, participants started the main exercise, which was the same exercise as in workshop one. Once finished, participants filled out a feedback form, were thanked for their participation and were reimbursed for their expenses.

2.6. Data analysis

The expert model was developed by one analyst and reviewed by two further analysts, all with extensive experience in the CWA framework and the SAD method. The validity of the SAD was assessed by comparing the verbs, criteria and strategies in each participant's model (novice analyst) with those in the expert model (expert analyst). Measures of validity included hit rate and predictive value. The analysis is depicted in Fig. 5. The validity of the method using a multi-analyst approach was analysed by pooling the individual participants' analyses into a group model. That is, the raw data were combined into one group model, and the verbs, criteria and strategies identified in this group model were compared to the expert analyst's model. In addition, an adjusted multi-analyst model was tested to evaluate whether a practical solution could be found to counter the increase in false alarms in a multi-analyst approach. In the adjusted multi-analyst model, items were eliminated from the group model if only one novice analyst from the overall group of 17 analysts generated them.

The output of this approach is referred to in the text as the adjusted pooled model. The reliability of the SAD was assessed by comparing participants' results at two different times, using a test-retest paradigm. The verbs, criteria and strategies identified by participants at T1 were compared with those identified at T2. To assess the reliability of a multi-analyst approach, the items identified by all 17 novice analysts at T1 were combined into one group model and compared to the group model of items identified at T2. In the adjusted multi-analyst model, items identified by only one of the 17 novice analysts were eliminated from the group model.

3. Results

3.1. Criterion-referenced validity

3.1.1. Building the Strategies Analysis Diagram

3.1.1.1. Individual analyst approach. All participants defined verbs and criteria. However, not all participants linked the nodes in the SAD, while some connected all nodes with each other. Therefore, the definition of verbs and criteria was analysed, but the linking of nodes was not analysed further.

Table 3
Scenarios for strategy elicitation.

Simple scenario: The road user (driver or cyclist) has entered the right hand turning lane. There is no traffic in front. The road user wants to position him/herself at the traffic lights, aiming to activate the traffic light sensor embedded in the tarmac. What strategies could the road user apply to position him or herself at the traffic light sensor?

Complex scenario: The road user (cyclist) has entered the right hand turning lane. Traffic has banked up in the turning lane. The cyclist is approaching the stopped traffic and is deciding whether to filter to the front of the traffic queue. What strategies could the road user apply to decide whether to filter to the front?


Fig. 5. Example analysis evaluating the accuracy of a novice model compared to the expert model: identifying verbs and criteria.

It was found that the accuracy of the verbs and criteria identified by individual participants, produced within the time constraints set, was average when compared to the verbs and criteria identified by the expert analysts (see Fig. 6). Only 21–24% of the verbs and criteria described in the expert model were identified by the novice analysts.

3.1.1.2. Multi-analyst approach. The novices' group model, produced by pooling individual verbs and criteria into one model, improved the accuracy of the verbs and criteria identified when compared with the expert model (see Fig. 6). The pooled results resemble the expert model well, identifying 81–88% of the verbs and criteria defined in the expert model. Unfortunately, pooling participants' results into a group model also pools error data and therefore increases the number of false alarms. The predictive value of the pooled model is consequently low (29–33%).

3.1.1.3. Adjusted pooled model. The rise in false alarms was countered in the adjusted pooled model by removing items identified by only one novice analyst. This substantially reduced the irrelevant verbs and criteria while removing only a small number of verbs and criteria that were in fact relevant, improving the predictive value of the pooled model to 48% and 66%, respectively (see Fig. 6).

Fig. 6. Individual (Mdn and IQR) and pooled results for verbs and criteria.

3.1.2. Eliciting strategies

Next, participants' ability to accurately elicit strategies from the SAD was assessed. This involved evaluating whether the novice analysts' courses of action (comprising verbs, physical objects and object related processes) were similar to the courses of action identified by the expert analyst.

3.1.2.1. Individual analyst approach. Novices' individual results for eliciting courses of action did not achieve high levels of accuracy (see Fig. 7). The novice analysts identified only 8–13% of the strategies identified in the expert model.

3.1.2.2. Multi-analyst approach. Pooling individual participants' courses of action into a group model improved the results (see Fig. 7). As a group, the novice analysts identified 57–58% of the strategies in the expert model.

3.1.2.3. Adjusted pooled model. Again, strategies identified by only one participant were removed, resulting in an adjusted pooled model. This adjustment reduced the accuracy of the group model, with the group now identifying only 32–43% of the strategies in the expert model (see Fig. 7). Participants elicited a great variety of courses of action, and therefore many relevant strategies were removed. Adjusting the pooled model did, however, increase the predictive value: the chances that an identified course of action was relevant were greater for the adjusted than for the unadjusted pooled model.

Fig. 7. Individual (Mdn and IQR) and pooled results for eliciting strategies.

Next, participants' ability to identify accurate criteria and values and priority measures for the courses of action identified was assessed. Criteria and values and priority measures were only assessed for each participant's hits, as there are naturally no corresponding criteria and values and priority measures in the expert model for false alarms. The number of accurate courses of action (hits) identified per participant varied, as did the number of criteria and values and priority measures per strategy in the expert model. Counts of criteria and values and priority measures identified do not take this relativity into account; therefore only percentages, and no counts, are reported in this section.

3.1.2.4. Individual analyst approach. Participants were better at identifying relevant values and priority measures than criteria for both scenarios (see Fig. 8). This finding is not surprising, as the elicitation of values and priority measures was much more constrained (6 values and priority measures versus 44 criteria) and the chance of poor accuracy was therefore lower. There were large differences between participants, with some identifying as little as 0–5% of the relevant criteria and others 40–50%.

3.1.2.5. Multi-analyst approach. Pooling the individual participants' criteria and values and priority measures into a group model was not conducted for this part of the analysis, as participants identified different courses of action; the criteria and values and priority measures therefore relate to different courses of action and cannot be pooled.

Fig. 8. Individual (Mdn and IQR) and pooled results for matching criteria and values and priority measures with strategies.

3.2. Test-retest reliability

Participants applied the SAD method at two different times, a month apart. Assessing test-retest reliability involved evaluating whether results were consistent over time, namely whether participants identified the same content at both times. Interpretation scores for reliability measures of formative methods, or for instances where correct rejections do not exist or cannot be estimated, are absent from the literature to date. Therefore, the interpretation commonly applied to Kappa is the closest available guidance and is used to interpret the results here.
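For orientation, the benchmark bands often associated with Kappa (e.g. Landis and Koch, 1977) can be expressed as a simple mapping; the sketch below is illustrative only:

```python
# Qualitative labels for agreement scores, following the bands commonly
# used for Kappa: <=.20 slight, .21-.40 fair, .41-.60 moderate,
# .61-.80 substantial, .81-1.00 almost perfect.
def interpret(score: float) -> str:
    if score <= 0.20:
        return "slight"
    if score <= 0.40:
        return "fair"
    if score <= 0.60:
        return "moderate"
    if score <= 0.80:
        return "substantial"
    return "almost perfect"

print(interpret(0.25))  # 'fair'     (cf. individual verbs and criteria, .23-.28)
print(interpret(0.52))  # 'moderate' (cf. adjusted pooled model, .52-.53)
```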

3.2.1. Building the Strategies Analysis Diagram

3.2.1.1. Individual analyst approach. The reliability of individual results was found to be fair for identifying verbs and criteria (see Fig. 9). Between .23 and .28 of the content was consistently identified at both applications (T1 and T2).

3.2.1.2. Multi-analyst approach. The pooled results, combining all individual models into a group model at T1 and into a group model at T2, showed fair to moderate agreement. As a group, between .37 and .41 of the verbs and criteria were consistently identified at both applications.

3.2.1.3. Adjusted pooled model. Eliminating items identified by only one of the 17 analysts improved the test-retest reliability. The consistency of the adjusted group model over time was between .52 and .53.

3.2.2. Eliciting strategies from the Strategies Analysis Diagram

3.2.2.1. Individual analyst approach. Reliability for eliciting strategies was lower; there was only slight agreement for individuals eliciting strategies (see Fig. 9). Between .11 and .13 of the strategies elicited at T1 were also elicited at T2.

3.2.2.2. Multi-analyst approach. Pooling the individual models into a group model at T1 and a group model at T2 increased the test-retest reliability of eliciting strategies. As a group, between .25 and .26 of the strategies elicited were consistently identified at T1 and T2.

3.2.2.3. Adjusted pooled model. Eliminating strategies identified by only one of the 17 analysts improved the test-retest reliability further. About .39 of the strategies in the adjusted group model were identified at both T1 and T2.

Fig. 9. Individual (Mdn and IQR) and pooled results for reliability scores.

4. Discussion

The aim of this article was to evaluate the criterion-referenced validity and test-retest reliability of the SAD method. The study represents a first of its kind evaluation of the reliability and validity of the SAD, and one of the first such evaluations for formative methods. The study therefore adds to the knowledge base surrounding the reliability and validity of Ergonomics methods (Stanton and Young, 1999, 2003). The analysis showed that individual results, within the constraints of the current exercise, were only moderate, especially when compared to other reliability and validity studies (see Table 4). Group models, produced by pooling individual analyst output into one model, were more valuable than the individual models on their own. This suggests that the SAD method's reliability and validity can be optimised through a structured process whereby individual analysts first conduct an analysis, after which the individual outputs are pooled by an independent analyst or the individual analysts are brought together to integrate their analyses and produce an agreed group result.

This finding is not surprising, since the CWA framework has typically been applied as a group process involving teams of analysts (Birrell et al., 2011) or with analyses being reviewed and adapted by other experts (Jenkins et al., 2008); however, it is an important finding as it highlights the requirement for the SAD to be applied in a similar group manner. When combined with other positive pooled analyst study findings (Stanton et al., 2009), it adds further weight to the notion that the Ergonomics design and evaluation process should follow a group process involving multiple analysts, regardless of the methods used. Further, based on the evidence from other reliability and validity studies, it is likely that such approaches will prove beneficial for applications of Ergonomics methods generally, including task analysis, error prediction and analysis, and accident analysis methods.

The results demonstrated variability between participants. Whether such differences could be explained by experience in CWA, academic or industry background, field of study (e.g. whether Human Factors professionals and Engineers performed differently), differences between groups, age or years of experience was investigated; however, no valid explanation could be found. An observation made during the workshops, however, was that Human Factors professionals and Engineering professionals approached the exercise differently. Human Factors professionals attempted to understand the method and experienced blocks in generating verbs, criteria and strategies. Engineering professionals appeared to trust the method and started generating verbs, criteria and strategies with less scrutiny of what was generated. It was found that Engineers generated more content and relatively more false alarms, whereas Human Factors professionals generated less content, but the content they generated was more accurate. These differences were not statistically significant but might explain some of the variation observed between individuals. The performance of analysts with different discipline skill sets when using Ergonomics methods is therefore proposed as a pertinent line of further research.

Table 4 summarises the reliability and validity scores of other Ergonomics methods that have been tested and reported within the published peer reviewed literature. While these studies were conducted using structured human error prediction methods with a limited number of categories for participants to choose from, they allow a relative evaluation of the scores obtained in the study presented in this article. The human error prediction studies did not report predictive value scores, and therefore only hit rates are compared. The comparison confirms that the individual novice analysts' scores using the SAD are below average when compared with the results obtained in other validation studies. This is not surprising, however, when one delves deeper into the structure of formal error prediction methods. For example, the Systematic Human Error Reduction and Prediction Approach (SHERPA; Embrey, 1986) bounds the analysis: only a certain number of error modes can be selected for each task step, depending on its behavioural classification (e.g. as an Action or a Checking behaviour). This limits the extent to which analysts can achieve poor accuracy scores. The SAD, on the other hand, gives the analyst far greater scope to identify different verbs, criteria and so on.

The pooled multi-analyst hit rate scores obtained for building the SAD in this study compare well with those of the human error prediction methods, especially considering the semi-structured approach of the SAD. The hit rates for the multi-analyst approach are a little lower than the hit rates reported in the human error studies. Reliability scores were considerably lower than those of the human error prediction methods. Given the difference in the nature of the methods, the validity scores of the SAD, when using a multi-analyst approach, are promising. However, lessons were identified where the method can be improved.

Table 4
Validity and reliability scores reported for other Ergonomics methods.

Method         Expert/novices    Validity         Reference
TAFEI          Expert analysts   67%              Baber and Stanton, 1996
               Novices           48%              Stanton and Baber, 2002
SHERPA         Experts           80%              Baber and Stanton, 1996
               Novices           75%              Stanton and Stevenage, 1998
               Novices           68%; ~67–70%     Stanton and Baber, 2002; Stanton et al., 2009
               Pooled; novices   90%              Harris et al., 2005
HAZOP          Novices           ~60–68%          Stanton et al., 2009
HEIST          Novices           ~75–80%          Stanton et al., 2009
HET            Novices           88–89%           Stanton et al., 2009
HFIT           Novices           38%              Gordon et al., 2005
TRACEr-Rail    Novices           62%              Baysari et al., 2011
TRACEr-RAV     Novices           54%              Baysari et al., 2011
Multi-method   Novices           ~71–79%          Stanton et al., 2009

Method         Expert/novices    Reliability      Reference
TAFEI          Expert            .9               Baber and Stanton, 1996
               Novices           .79              Stanton and Baber, 2002
SHERPA         Expert            .9               Baber and Stanton, 1996
               Novices           .73              Stanton and Baber, 2002
               Novices           .7               Harris et al., 2005

There are a number of reasons why the SAD method performed poorly when individual analyst outputs were examined. Participants provided feedback that they found it hard to work with someone else's output. Participants were provided with a finished WDA when building the SAD, and with a finished SAD when eliciting strategies. It is apparent that not being involved in the development of the WDA and the SAD made it difficult for participants to understand the language used and subsequently to define the correct terms. Involving analysts throughout the model development process, spending considerable time explaining what each term means, and providing clearer guidance on how to define verbs and criteria all seem logical approaches that could be beneficial and produce better outcomes.

The language issue might explain why there was variety in the items identified over time. Participants reported that they were impressed by the expert model. Participants might therefore have remembered the items that they did not identify themselves (misses) better than the ones that they did identify (hits), and made sure to come up with these additional verbs and criteria, for example, in the second workshop. Also, while most Human Factors task analysis methods have a structured process that starts at the same step every time (e.g. Hierarchical Task Analysis; Stanton, 2006), participants were free to start at any point in the SAD, and were even encouraged to use strategies to formatively identify new verbs and criteria. While this is beneficial for innovation, it does not induce high levels of consistency.

The evaluation of participants' results and their sorting into categories of hits, misses and false alarms was conducted in a strict manner. If items were worded very differently (e.g. the verb 'accelerate' referring to the actual interaction of 'press' (pedal), or 'slow' versus 'release' (pedal)), or an incorrect level of detail was provided (e.g. the criterion 'weather conditions' versus 'wet conditions'), such items were counted as false alarms or misses. While such an evaluation is appropriate for a reliability or validity study, a real world application of pooling results would filter such false alarms out or reword them to fit the purpose of the application. The use of novices and the strict assessment protocol therefore represents a worst case scenario. Stanton and Stevenage (1998), for example, used novices to apply human error identification methods in which the theoretical maximum of options was much more constrained than in the current method, and found reliability scores between .4 and .6. They regarded those scores as positive and attributed them to the lack of experience of their participants, since other papers using the same methods reported much higher performance with experts. The reliability and validity results of the SAD are therefore expected to improve when more experienced analysts use the method. Stanton and Young (2003) also point out that the scope of the analysis influences the results and that, with a wider scope, it is harder to obtain favourable reliability and validity statistics.
It has been argued that novices perform better with a simple system under analysis (Stanton and Stevenage, 1998; Stanton and Young, 2003), while for a complex system experts outperform novices (Baber and Stanton, 1996; Stanton and Young, 2003). The SAD analysis was performed on a complex system, and the formative nature of the method provides a wide scope of analysis.

The current study presented a first of its kind evaluation of the reliability and validity of a formative method, the SAD. While the measures used were found to be the most appropriate measures available in the literature to date, further research should address


whether more appropriate methods could be developed for reliability and validity evaluations of formative methods. Moreover, establishing the reliability and validity of Ergonomics methods is a key endeavour for all within the discipline, particularly to support the increased involvement of Ergonomics methods and practitioners in the design and evaluation of safety critical systems. Continued discussion and investigation in this area is therefore encouraged. Future directions should focus on providing more detailed guidance on the SAD process and on how to define verbs, criteria and strategies, similar to the guidance provided on conducting a WDA by Naikar et al. (2005). While in the current study the pooling exercise was conducted by a separate analyst, future directions include exploring the difference between such an approach and allowing the individuals who built their own models to pool them into a group model through a structured group process.

Acknowledgement

Miranda Cornelissen's contribution to this article was conducted as part of her PhD candidature. This was funded by a Monash Graduate Scholarship and a Monash International Postgraduate Research Scholarship. Paul Salmon's contribution to this article was funded through his Australian National Health and Medical Research Council post doctoral fellowship.

References

Ahlstrom, U., 2005. Work domain analysis for air traffic controller weather displays. J. Saf. Res. 36, 159–169.
Baber, C., Stanton, N.A., 1994. Task analysis for error identification: a methodology for designing error-tolerant consumer products. Ergonomics 37 (11), 1923–1941.
Baber, C., Stanton, N.A., 1996. Human error identification techniques applied to public technology: predictions compared with observed use. Appl. Ergon. 27 (2), 119–131.
Baysari, M.T., Caponecchia, C., McIntosh, A.S., 2011. A reliability and usability study of TRACEr-RAV: the technique for the retrospective analysis of cognitive errors – for rail, Australian version. Appl. Ergon. 42 (6), 852–859.
Birrell, S.A., Young, M.S., Jenkins, D.P., Stanton, N.A., 2011. Cognitive Work Analysis for safe and efficient driving. Theor. Issues Ergon. Sci., 1–20.
Bolger, F., Wright, G., 1992. Reliability and validity in expert judgment. In: Wright, G., Bolger, F. (Eds.), Expertise and Decision Support. Plenum Press, New York, pp. 47–76.
Burns, C.M., Bisantz, A.M., Roth, E.M., 2004. Lessons from a comparison of work domain models: representational choices and their implications. Hum. Factors 46 (4), 711–727.
Cornelissen, M., Salmon, P.M., Jenkins, D.P., Lenné, M.G., 2012. A structured approach to the strategies analysis phase of cognitive work analysis. Theor. Issues Ergon. Sci., 1–19.
Cornelissen, M., Salmon, P.M., McClure, R., Stanton, N.A., 2013. Using cognitive work analysis and the strategies analysis diagram to understand variability in road user behaviour at intersections. Ergonomics, 1–17.
Descatha, A., Roquelaure, Y., Caroly, S., Evanoff, B., Cyr, D., Mariel, J., Leclerc, A., 2009. Self-administered questionnaire and direct observation by checklist: comparing two methods for physical exposure surveillance in a highly repetitive tasks plant. Appl. Ergon. 40 (2), 194–198.


Embrey, D.E., 1986. SHERPA: a systematic human error reduction and prediction approach. In: Paper presented at the International Meeting of Advances in Nuclear Power Systems, Knoxville, Tennessee.
Gordon, R., Flin, R., Mearns, K., 2005. Designing and evaluating a human factors investigation tool (HFIT) for accident analysis. Saf. Sci. 43 (3), 147–171.
Harris, D., Stanton, N.A., Marshall, A., Young, M.S., Demagalski, J., Salmon, P., 2005. Using SHERPA to predict design-induced error on the flight deck. Aerosp. Sci. Technol. 9 (6), 525–532.
Hassall, M.E., Sanderson, P.M., 2012. A formative approach to the strategies analysis phase of cognitive work analysis. Theor. Issues Ergon. Sci., 1–47. http://dx.doi.org/10.1080/1463922x.2012.725781.
Hilliard, A., Thompson, L., Ngo, C., 2008. Demonstrating CWA strategies analysis: a case study of municipal winter maintenance. Proc. Hum. Factors Ergon. Soc. Annu. Meet. 52 (4), 262–266.
Hollnagel, E., 2002. Understanding accidents – from root causes to performance variability. In: Paper presented at the IEEE 7th Human Factors Meeting, Scottsdale, Arizona.
Hollnagel, E., 2004. Barriers and Accident Prevention. Ashgate Publishing, Aldershot.
Hollnagel, E., 2006. Resilience: the challenge of the unstable. In: Hollnagel, E., Woods, D.D., Leveson, N. (Eds.), Resilience Engineering: Concepts and Precepts. Ashgate, Aldershot, United Kingdom.
Jenkins, D.P., Stanton, N.A., Salmon, P.M., Walker, G.H., Young, M.S., 2008. Using cognitive work analysis to explore activity allocation within military domains. Ergonomics 51 (6), 798–815.
Kirwan, B., Kennedy, R., Taylor-Adams, S., Lambert, B., 1997. The validation of three Human Reliability Quantification techniques – THERP, HEART and JHEDI: part II – results of validation exercise. Appl. Ergon. 28 (1), 17–25.
Lindegård Andersson, A., Ekman, A., 2008. Reply to the short communication paper by T.P. Hutchinson regarding "Concordance between VDU-users' ratings of comfort and perceived exertion with experts' observations of workplace layout and working postures", Applied Ergonomics (2005) 36, 319–325. Appl. Ergon. 39 (1), 133–134.
Naikar, N., 2005. A methodology for work domain analysis, the first phase of cognitive work analysis. In: Paper presented at the Human Factors and Ergonomics Society 49th Annual Meeting, Orlando, Florida.
Naikar, N., Hopcroft, R., Moylan, A., 2005. Work Domain Analysis: Theoretical Concepts and Methodology. DSTO, Fishermans Bend.
Rasmussen, J., Pejtersen, A.M., Goodstein, L.P., 1994. Cognitive Systems Engineering. Wiley, New York.
Stanton, N.A., 2006. Hierarchical task analysis: developments, applications, and extensions. Appl. Ergon. 37 (1), 55–79.
Stanton, N.A., Baber, C., 2002. Error by design: methods for predicting device usability. Des. Stud. 23 (4), 363–384.
Stanton, N.A., Salmon, P.M., Harris, D., Marshall, A., Demagalski, J., Young, M.S., Dekker, S.W.A., 2009. Predicting pilot error: testing a new methodology and a multi-methods and analysts approach. Appl. Ergon. 40 (3), 464–471.
Stanton, N.A., Salmon, P.M., Rafferty, L., Walker, G.H., Baber, C., Jenkins, D.P., 2013. Human Factors Methods: A Practical Guide for Engineering and Design, second ed. Ashgate Publishing, Aldershot, United Kingdom.
Stanton, N.A., Stevenage, S.V., 1998. Learning to predict human error: issues of acceptability, reliability and validity. Ergonomics 41 (11), 1737–1756.
Stanton, N.A., Young, M.S., 1998. Is utility in the mind of the beholder? A study of ergonomics methods. Appl. Ergon. 29 (1), 41–54.
Stanton, N.A., Young, M.S., 1999. What price ergonomics? Nature 399, 197–198.
Stanton, N.A., Young, M.S., 2003. Giving ergonomics away? The application of ergonomics methods by novices. Appl. Ergon. 34 (5), 479–490.
Vicente, K.J., 1999. Cognitive Work Analysis: Toward Safe, Productive and Healthy Computer-based Work. Lawrence Erlbaum Associates, Mahwah, New Jersey.
Woods, D.D., 1988. Coping with complexity: the psychology of human behaviour in complex systems. In: Goodstein, L.P., Andersen, H.P., Olsen, S.E. (Eds.), Tasks, Errors, and Mental Models. Taylor & Francis, London, pp. 128–148.
