
Ergonomics, 2015 Vol. 58, No. 6, 897–908, http://dx.doi.org/10.1080/00140139.2014.997301

Human factors assessment of conflict resolution aid reliability and time pressure in future air traffic control

Fitri Trapsilawati(a), Xingda Qu(b)*, Chris D. Wickens(c) and Chun-Hsien Chen(a)

(a) School of Mechanical and Aerospace Engineering, Nanyang Technological University, Singapore; (b) Institute of Human Factors and Ergonomics, College of Mechatronics and Control Engineering, Shenzhen University, Shenzhen, China; (c) Alion Science Corporation, Boulder, Colorado, USA

*Corresponding author. Email: [email protected]

© 2015 Taylor & Francis


(Received 1 September 2014; accepted 2 December 2014)

Though it has been reported that air traffic controllers' (ATCos') performance improves with the aid of a conflict resolution aid (CRA), the effects of imperfect CRA reliability are so far unknown. The main objective of this study was to examine the effects of imperfect automation on conflict resolution. Twelve students with ATC knowledge were instructed to complete ATC tasks in four conditions: reliable CRA, unreliable CRA with high time pressure, unreliable CRA with low time pressure, and manual. Participants were able to resolve the designated conflicts more accurately and faster in the reliable than in the unreliable CRA conditions. When comparing the unreliable CRA and manual conditions, the unreliable CRA led to better conflict resolution performance and higher situation awareness. Surprisingly, high time pressure triggered better conflict resolution performance than the low time pressure condition. The findings from the present study highlight the importance of the CRA in future ATC operations.

Practitioner Summary: A conflict resolution aid (CRA) is a proposed automated decision aid in air traffic control (ATC). The present study found that a CRA was able to improve air traffic controllers' performance even when it was not perfectly reliable. These findings highlight the importance of the CRA in future ATC operations.

Keywords: air traffic control; conflict resolution aid; automation reliability; time pressure; human factors assessment

1. Introduction

Air traffic controllers (ATCos) play an essential role in the current air traffic management system. Every aircraft is directed to its destination through integrated communication with ground-based ATCos so as to maintain separation among aircraft. ATCos are required to observe their controlled sectors, to provide clearances, to detect potential conflicts and to instruct pilots to resolve conflicts. A major challenge to air traffic control (ATC) is the projected continuous increase in air traffic density (Airbus 2013; ICAO 2010; Sheridan 2006). As a result, ATCos have to deal with more aircraft in their daily duty, and their tasks become more cognitively demanding, which could compromise their performance and air traffic safety in the near future.

Automation has been suggested as a solution for minimising ATCos' cognitive (mental) workload and improving their performance (Endsley and Kaber 1999; Wickens et al. 1998). Under ideal conditions, perfectly reliable automation could improve ATCos' performance. For instance, in a study by Rovira and Parasuraman (2010), ATCos could detect up to 28% more conflicts with the help of a conflict detection aid, and could also detect conflicts on average 78 seconds earlier. In reality, however, automation cannot be perfectly reliable because of the inherent uncertainty of prediction. Unreliability can lead to two major problems (Parasuraman and Riley 1997). First, ATCos may be reluctant to rely upon automation because they assume it is unreliable. Researchers from the Programme for Harmonised Air Traffic Management Research in EUROCONTROL (PHARE) found that ATCos underutilised the automation tools owing to the tools' presumed lack of reliability and accuracy (Kelly et al. 2003). The second problem is that ATCos may become too dependent on the automation. In the short run, this can lead to a loss of situation awareness (SA); in long-term operation, it may lead to ATCo deskilling (Wickens et al. 2013). Both factors can slow responses when an automation failure occurs.

Although imperfectly reliable automation is undesirable in some respects, some studies in the context of conflict detection automation still reported that unreliable automation could benefit ATCos. In Metzger and Parasuraman (2005), 2 failures were provided out of 17 conflict detection trials, leaving the ATCos with 88% overall conflict detection aid reliability. The results indicated that the automation could help ATCos detect 90% of the conflicts. This was well above the 81% detection rate in the manual condition.


Employing data from Metzger and Parasuraman, as well as others, in a meta-analysis, Wickens and Dixon (2007) suggested that people perform detection tasks better with automation than manually so long as the automation's reliability is above roughly 70%.

Though extensive research has been done in the context of conflict detection automation, little attention has been paid to conflict resolution automation. The conflict resolution aid (CRA) is a proposed automated decision aid used in air traffic management to help ATCos resolve air traffic conflicts. The CRA can be configured to advise ATCos of a wide variety of manoeuvres and to avoid loss of separation (LOS) between aircraft. Prevot et al. (2012) found that ATCos' performance improved with a CRA relative to the manual condition. Westin, Borst, and Hilburn (2013) also found that ATCos tended to rely upon the CRA, which could improve performance especially in a higher density environment. In both studies, the CRA was perfectly reliable. It is important to realise, however, that automated conflict resolution, like conflict detection, can also be unreliable, although in contrast to conflict detection, where misses and false alerts are easy to categorise as automation failures, the issue of what constitutes 'wrong' advice (an automated conflict resolution failure) is harder to specify unambiguously. No empirical data are available regarding the effects of automation reliability on ATCos' performance with automated CRAs. In fact, reliability and its effects on trust are arguably the most critical factors in any automated system (Lee and See 2004). In addition, time pressure imposed on ATCos in an unreliable condition may affect ATCos' performance owing to the speed-accuracy trade-off. Therefore, there is also a need to investigate how ATCos utilise automation aids when they are exposed to different time pressure conditions.

The present study aimed to evaluate the CRA from a human factors perspective by investigating the effects of CRA reliability and time pressure. To do so, we examined several human factors measures (Langan-Fox, Sankey, and Canty 2009), including conflict resolution performance, mental workload, trust, dependence and SA, in a manual condition and three automated conditions. The latter set consisted of one 100% reliable condition and two 80% reliable conditions, and the latter two in turn were examined under low and high time pressure. It was hypothesised (H1) that participants' performance and workload would improve with the support of the CRA as compared to manually performing the conflict avoidance tasks. Furthermore, perfectly reliable automation would improve performance and workload compared to unreliable automation (H2). However, unreliable automation would still improve performance and reduce workload compared to the manual condition (H3). Since more automation is associated with less SA (Onnasch et al. 2014), it was hypothesised that participants' SA would be lower under the automated conditions than the manual condition; in addition, SA was predicted to be higher in the unreliable condition than in the reliable condition, as unreliability should foster deeper processing of the raw data (H4). We also assessed participants' trust ratings and their dependence on the CRA, because automation use (or dependence) is a behavioural manifestation of trust (Wickens et al. 2013).
We hypothesised (H5) that trust would degrade in the unreliable conditions compared to the reliable condition and that, therefore, participants would be less likely to utilise the automation in the unreliable conditions. In terms of time pressure, it was hypothesised that performance would decrease and workload would increase when participants had a shorter response time in the unreliable condition (H6). In Dao et al. (2009), no speed-accuracy trade-off was observed for SA. Also, working in a low time pressure environment might lead to out-of-the-loop unfamiliarity (OOTLUF) due to complacency towards automation. Given these reasons, we hypothesised that participants' SA would be higher with increased time pressure (H7).

2. Methods

2.1. Participants

Twelve college students (10 males and 2 females) aged 23-27 years (mean = 25.08 years, SD = 1.38 years) participated in the present study. The participants had been educated in Aerospace Engineering and Aeronautical Engineering at local institutions. An interview was conducted before the study to make sure that all participants were familiar with the procedures and terms used in current ATC practice. Informed consent, approved by the NTU IRB, was provided by each participant.

2.2. Apparatus

Two sets of computers were installed for an ATCo position and a pseudo-pilot position, respectively. A desktop ATC simulator, ATCSimulator®2, was used in the experiment. The simulator consisted of a radar display and electronic flight progress strips, as shown on the left monitor in Figure 1. The radar showed a 60-mile radius sector. An ATC-Sector Design Kit (ATC-SDK) was used to generate the traffic scenarios in a generic airspace. The conflicts in each scenario were generated by manipulating the flight times, speeds and locations of aircraft.


Figure 1. Air traffic control simulator setup.

The CRA provided advisories to participants. An example of these advisories is shown in Figure 2. This example advised the pilot of aircraft SWA356 to descend to and maintain 10,000 feet while the pilot of aircraft N130LM was asked to maintain its current course. The list of possible abbreviations in the advisories is provided in Appendix 1. The CRA was based on the principle proposed by Erzberger (2006) for the Resolution Aircraft and Maneuver Selector (RAMS), since regular patterns of conflict resolution manoeuvres appear to be empirically verified (Rantanen and Wickens 2012). The CRA adopted the altitude-first resolver principle, in which a vertical manoeuvre is preferred over lateral and speed manoeuvres. The timing data of each conflict were input into the CRA, and the CRA advisory appeared two minutes prior to any detected conflict (i.e. LOS). The conflict resolution advisories were translated into macros containing resolution manoeuvres in the radar simulation. Participants used a headset with a built-in microphone to communicate with a pseudo-pilot. A mouse was provided for interaction between participants and the on-screen CRA control.
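To make the altitude-first preference concrete, the sketch below illustrates one way such a resolver could order its manoeuvre choices and format an advisory using the abbreviations in Appendix 1. It is only an illustration under assumed thresholds, field names and callsign states; it is not Erzberger's (2006) RAMS algorithm nor the CRA implementation actually used in the experiment.

```python
# Illustrative altitude-first resolver sketch (assumed logic, not the study's CRA).
from dataclasses import dataclass

CEILING_FT = 41000   # assumed sector ceiling
FLOOR_FT = 2000      # assumed sector floor

@dataclass
class Aircraft:
    callsign: str
    altitude_ft: int
    heading_deg: float

def resolve_conflict(own: Aircraft, intruder: Aircraft,
                     vertical_step_ft: int = 1000,
                     heading_step_deg: float = 20.0) -> str:
    """Return one advisory string for `own`, trying a vertical manoeuvre first."""
    # 1. Vertical resolution: step 1000 ft further away from the intruder's altitude.
    if own.altitude_ft >= intruder.altitude_ft and own.altitude_ft + vertical_step_ft <= CEILING_FT:
        return f"{own.callsign} CM {(own.altitude_ft + vertical_step_ft) // 100:03d}"  # climb and maintain (flight level)
    if own.altitude_ft < intruder.altitude_ft and own.altitude_ft - vertical_step_ft >= FLOOR_FT:
        return f"{own.callsign} DM {(own.altitude_ft - vertical_step_ft) // 100:03d}"  # descend and maintain (flight level)
    # 2. Lateral resolution as a fall-back: turn by a fixed heading change.
    return f"{own.callsign} FH {(own.heading_deg + heading_step_deg) % 360:.0f}"
    # 3. Speed manoeuvres (IS/RS) would be the last resort and are omitted here.

# Hypothetical usage with two aircraft in conflict:
print(resolve_conflict(Aircraft("SWA356", 11000, 270.0), Aircraft("N130LM", 12000, 90.0)))
```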

2.3. Experimental design and procedure

A within-subjects design was adopted. In order to assess the role of CRA reliability and time pressure in ATC, four testing conditions were examined in the experiment:

Condition R: reliable CRA
Condition UAH: unreliable and high time pressure
Condition UAL: unreliable and low time pressure
Condition M: manual (no CRA)

In order to remove any carry-over effects, a balanced Latin square was used to order these testing conditions across participants. The order of the conditions is shown in Table 1. Before the formal experiment, all participants took part in an instruction and training session, in which they were given the chance to become familiar with ATC protocols and operations and with the use of the CRA. During the training session, participants were told that the automation accounts for the factors in the complex airspace, including the flight paths of all surrounding aircraft, the route structures and the flow constraints. Participants were also informed that the conflict resolution automation involves iteration loops, and that this may be one of the sources of unreliability.

Figure 2. Conflict resolution aid.

Table 1. The order of experiment conditions.

Participant no.   Testing conditions (in order of session)
1                 R     UAH   UAL   M
2                 UAH   M     R     UAL
3                 UAL   R     M     UAH
4                 M     UAL   UAH   R
5                 R     UAH   UAL   M
6                 UAH   M     R     UAL
7                 UAL   R     M     UAH
8                 M     UAL   UAH   R
9                 R     UAH   UAL   M
10                UAH   M     R     UAL
11                UAL   R     M     UAH
12                M     UAL   UAH   R
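For readers who wish to reproduce the counterbalancing, the sketch below (a hypothetical script, not the authors' own) constructs a digram-balanced Latin square of the Williams type for four conditions and assigns the twelve participants to its rows cyclically. The square in Table 1 is of this kind, up to row ordering and condition labelling.

```python
# Minimal sketch: Williams-type balanced Latin square for an even number of conditions.
def balanced_latin_square(conditions):
    """Each condition appears once per serial position, and each ordered pair of
    conditions follows one another equally often across rows (valid for even n)."""
    n = len(conditions)
    rows = []
    for r in range(n):
        row, left, right = [], r, r + 1
        for i in range(n):
            if i % 2 == 0:
                row.append(conditions[left % n]); left -= 1
            else:
                row.append(conditions[right % n]); right += 1
        rows.append(row)
    return rows

conditions = ["R", "UAH", "UAL", "M"]
square = balanced_latin_square(conditions)
for participant in range(1, 13):                    # 12 participants cycle through 4 orders
    order = square[(participant - 1) % len(square)]
    print(f"Participant {participant:2d}: {' -> '.join(order)}")
```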

In the formal experiment, there were four 1-hour ATC scenarios corresponding to the four testing conditions. Each scenario involved five potential conflicts. In each scenario, participants were instructed to communicate with the pseudo-pilots to issue appropriate altitudes, to accept all aircraft that entered their sector, to hand off aircraft that left their sector and to issue the correct radio frequency changes. The four scenarios were completed in four different experimental sessions with at least a one-day interval between them. In order to simulate future air traffic situations, the sector density for each scenario was 30 aircraft, which is double the current air traffic density. The experimenter acted as the pseudo-pilot during the experiment.

In the three CRA conditions, there was a conflict resolution advisory for each conflict, and participants were free to either accept or reject the advisory by clicking the respective button. If participants agreed with the advisory, the word 'agree' would appear at the pseudo-pilot position and the pseudo-pilot would directly execute the resolution manoeuvres by sending the preset macros to the radar simulator. Otherwise, participants had to give their own resolution manoeuvring advisory to the pseudo-pilot. In the reliable condition, the CRA provided correct advisories for all the potential conflicts. In the unreliable and high time pressure condition, for a random one of the five designated conflicts, the manoeuvring advisory helped resolve the primary conflict but led to a conflict with another aircraft (a secondary conflict) in 100 seconds.

ATC operations primarily involve tactical actions which are affected by 'time pressure' (Hoffman, Mukherjee, and Vossen 2010). The selected time interval between the CRA resolution of the primary conflict and the occurrence of the secondary conflict in the high time pressure condition was determined as the sum of the activation time of an onboard traffic alert and collision avoidance system (TCAS) (approximately 45 seconds) (Thomas, Wickens, and Rantanen 2003), the time necessary for an ATCo to give a manoeuvring instruction and receive a pilot response (approximately 42 seconds) (Cardosi and Boole 1991; Cardosi et al. 1992), and some time allowance (approximately 10 seconds). It should be noted that once a TCAS is activated, ATCos cannot interfere, since the TCAS will give expedited manoeuvres to avoid the conflict. Thus, setting a secondary conflict 100 seconds after resolution of a primary conflict is based on the minimum time available for ATCos to resolve a conflict in the current system, which certainly imposes time pressure on them. In the unreliable and low time pressure condition, the manoeuvring advisory elicited a secondary conflict in four minutes.

It should also be noted that in the unreliable conditions, if participants rejected the advisory given by the CRA and instructed their own appropriate resolution manoeuvre, the secondary conflict could be avoided. However, if participants accepted the advisory for the conflict where a failure had been set, the advisory would resolve the primary conflict but trigger a secondary conflict. This is an automation failure that we set intentionally when scripting the scenarios, and it is how we define an unreliable CRA. We chose this form of automation failure, rather than a 'failure' in the resolution of the primary conflict, because of the ambiguity in precisely defining the latter. An example of such a failure is given in Figure 3, in which the CRA correctly advised the resolution of the primary conflict between SNP567 and N80866, but the advisory led to a secondary conflict between N80866 and FRL789 in 100 seconds. In the manual condition, participants were asked to resolve the potential conflicts by providing their own resolution manoeuvring instructions.
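The scheduling of the single failure trial and of the two secondary-conflict lead times can be summarised in a short sketch. The function and field names below are assumptions made for illustration; they do not reproduce the actual scenario scripts used with the ATC-SDK.

```python
# Illustrative sketch of the unreliable-scenario design described above (assumed names).
import random

TCAS_ACTIVATION_S = 45      # TCAS activation time (approximately 45 s)
ATCO_INSTRUCTION_S = 42     # ATCo instruction until pilot response (approximately 42 s)
ALLOWANCE_S = 10            # additional time allowance (approximately 10 s)

# Sum is roughly 97 s; the study rounded the high time pressure lead time to 100 s.
HIGH_PRESSURE_LEAD_S = TCAS_ACTIVATION_S + ATCO_INSTRUCTION_S + ALLOWANCE_S
LOW_PRESSURE_LEAD_S = 4 * 60  # four minutes in the low time pressure condition

def build_unreliable_scenario(n_conflicts=5, high_time_pressure=True, seed=None):
    """Flag exactly one of the designated conflicts as a CRA failure trial: accepting
    that advisory resolves the primary conflict but schedules a secondary conflict
    after the chosen lead time."""
    rng = random.Random(seed)
    failure_index = rng.randrange(n_conflicts)
    lead = 100 if high_time_pressure else LOW_PRESSURE_LEAD_S
    return [{"conflict": i + 1,
             "cra_failure": i == failure_index,
             "secondary_conflict_after_s": lead if i == failure_index else None}
            for i in range(n_conflicts)]

print(f"High time pressure lead time: approx {HIGH_PRESSURE_LEAD_S} s (set to 100 s)")
print(build_unreliable_scenario(high_time_pressure=False, seed=1))
```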


Figure 3. An example of the unreliable and high time pressure condition.

2.4. Dependent measures

2.4.1. Task performance measures

Task performance measures included the percentage of resolved conflicts, conflict resolution time and procedural errors. The percentage of resolved conflicts is operationally defined as the percentage of designated conflicts for which no LOS occurred. The separation standards for the experiment were 3 nautical miles laterally and 1000 feet vertically (Nolan 1999). Conflict resolution time is defined as the interval between the CRA onset and the moment that the manoeuvring advisory was given to the pilots by participants. Errors in performing the ATC tasks (i.e. improper altitude clearances, inappropriate hand-offs, inappropriate runway assignments and incorrect radio frequency changes) were counted. The ATCSimulator®2 is equipped with a system that tracks ATC task procedures: whenever participants did not follow the appropriate procedure, for instance by giving a wrong radio frequency or a wrong exit altitude, the system listed the violation in a reporting file.
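A minimal sketch of the scoring logic implied by these definitions is given below; the function names and the planar geometry are assumptions for illustration, not the simulator's internal code.

```python
# Sketch of the loss-of-separation (LOS) check and the resolved-conflict percentage.
import math

LATERAL_MIN_NM = 3.0      # lateral separation standard (nautical miles)
VERTICAL_MIN_FT = 1000.0  # vertical separation standard (feet)

def loss_of_separation(x1_nm, y1_nm, alt1_ft, x2_nm, y2_nm, alt2_ft):
    """Return True if an aircraft pair violates both separation minima simultaneously."""
    lateral_nm = math.hypot(x1_nm - x2_nm, y1_nm - y2_nm)
    vertical_ft = abs(alt1_ft - alt2_ft)
    return lateral_nm < LATERAL_MIN_NM and vertical_ft < VERTICAL_MIN_FT

def percentage_resolved(designated_conflicts):
    """designated_conflicts: list of booleans, True if that conflict ended in a LOS."""
    resolved = sum(1 for had_los in designated_conflicts if not had_los)
    return 100.0 * resolved / len(designated_conflicts)

# Example: one LOS among five designated conflicts gives 80% resolved.
print(percentage_resolved([False, False, True, False, False]))
```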

2.4.2. Mental workload and trust

Subjective ratings of mental workload and trust in the CRA were obtained using the NASA TLX rating scale (Hart and Staveland 1988) and the Likert-type rating scale (ranging from 0 to 7) developed by Jian, Bisantz, and Drury (2000). Both questionnaires were administered after each testing condition.

2.4.3. Aid utilisation rate

Aid utilisation rate data, reflecting the ATCos' dependence upon the CRA, were obtained from participants' acceptance of the advisories. During the simulation, two buttons (i.e. agree and reject) appeared following each advisory. If participants opted to agree, the word 'agree' appeared on the pseudo-pilot screen and the pseudo-pilot sent a preset macro for resolving the respective conflict. If participants pressed the reject button, they were asked to provide their own resolution manually to the pseudo-pilot using voice transmission. Their responses to all advisories were recorded. The ratio of the number of agreements with the CRA to the total number of advisories was used to reflect participants' dependence upon the CRA.

2.4.4. Situation awareness

SA data were collected during the entire experiment by giving SA probes in two steps. First, a ready prompt was shown, informing participants that a probe question was ready to be presented (see Figure 4). Examples of the probe questions, adapted from the SA information requirements for en-route ATC (Endsley 1994, 2000), are provided in Appendix 2. If participants felt capable of answering the probe question without affecting their ATC tasks, they could proceed to the question by pressing an 'answer' button; the question appeared immediately after the button was pressed. The ready prompt appeared every five minutes. The SA display timed out after one minute if participants did not respond to either the ready prompt or the probe question.

Figure 4. An example of probe question.

Four measures accounted for participants' SA: (1) percentage of timeouts (non-responded questions); (2) ready response latency, defined as the latency between the onset of the ready prompt and the moment participants indicated that they were ready to answer; (3) probe response latency, defined as the latency between the presentation of the question and participants' response; and (4) percentage of correct responses to the SA probes.
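These four measures can be computed from a simple per-probe log, as in the following sketch. The log format and the decision to score correctness over answered probes are assumptions for illustration; the authors' actual scoring scripts are not described in the paper.

```python
# Sketch of SA probe scoring from an assumed per-probe log (times in seconds).
def sa_measures(probes):
    """probes: list of dicts with keys 'ready_onset', 'answer_pressed',
    'responded_at' and 'correct'; None values indicate a one-minute timeout."""
    n = len(probes)
    answered = [p for p in probes
                if p["answer_pressed"] is not None and p["responded_at"] is not None]
    ready_latency = [p["answer_pressed"] - p["ready_onset"] for p in answered]
    probe_latency = [p["responded_at"] - p["answer_pressed"] for p in answered]
    return {
        "pct_timeouts": 100.0 * (n - len(answered)) / n,
        "ready_response_latency_s": sum(ready_latency) / len(ready_latency),
        "probe_response_latency_s": sum(probe_latency) / len(probe_latency),
        "pct_correct": 100.0 * sum(p["correct"] for p in answered) / len(answered),
    }

example_log = [
    {"ready_onset": 300.0, "answer_pressed": 308.0, "responded_at": 319.0, "correct": True},
    {"ready_onset": 600.0, "answer_pressed": None,  "responded_at": None,  "correct": False},
]
print(sa_measures(example_log))
```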

2.5. Statistical analysis

The conflict resolution time, trust ratings and aid utilisation rate were only collected in the three CRA conditions. These data were therefore analysed using a repeated-measures ANOVA with one 3-level independent variable and two planned orthogonal contrasts: (1) reliable versus unreliable conditions; (2) unreliable conditions with high versus low time pressure to resolve a secondary conflict. The remaining dependent variables (i.e. percentage of resolved conflicts, procedural errors, mental workload and SA measures) were collected in all four scenarios. These data were analysed using a repeated-measures ANOVA with the CRA condition as a four-level independent variable and three planned orthogonal contrasts: (1) the three automated versus the manual condition; (2) the reliable versus the two unreliable conditions; (3) the unreliable conditions with high versus low time pressure to resolve a secondary conflict, applied to the respective dependent measures as appropriate. A separate contrast between the manual and unreliable conditions was also conducted. The level of significance (α) was set at 0.05 for all analyses.

In addition, a targeted analysis was performed exclusively on the conflict resolution time for the first conflict following the first automation failure in each of the two unreliable automation blocks. This analysis was conducted separately for the first unreliable block (whether low or high time pressure) and the second unreliable block, in order to examine the 'first failure effect' in automation, which has been found to produce large decrements in performance (Yeh et al. 2003). Finally, a test of the significance of the difference between proportions was performed on the aid utilisation rate following an automation failure and following a correct automation. This analysis was done to reveal any loss of utilisation that might have occurred as a consequence, reflecting a loss of trust.
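One common way to test planned orthogonal contrasts in a repeated-measures design is sketched below: each contrast is reduced to one score per participant and tested against zero. The data values are placeholders, not the study's data, and the degrees of freedom reported in the Results suggest the authors used a pooled ANOVA error term instead; the omnibus repeated-measures ANOVA itself would typically be run with a dedicated routine such as statsmodels' AnovaRM.

```python
# Sketch of planned within-subject contrasts via per-participant contrast scores.
import numpy as np
from scipy import stats

# rows = participants, columns = conditions in the order [R, UAH, UAL, M]
resolved_pct = np.array([
    [100, 100,  80, 60],
    [100,  80,  80, 40],
    # ... one row per participant (illustrative values only)
])

contrasts = {
    "automated vs manual":       np.array([ 1,  1,  1, -3]),
    "reliable vs unreliable":    np.array([ 2, -1, -1,  0]),
    "high vs low time pressure": np.array([ 0,  1, -1,  0]),
}

for name, weights in contrasts.items():
    scores = resolved_pct @ weights              # one contrast score per participant
    res = stats.ttest_1samp(scores, 0.0)         # test the contrast against zero
    print(f"{name}: t = {res.statistic:.2f}, p = {res.pvalue:.3f}")
```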

3. Results

3.1. Task performance measures

There was a significant effect of CRA reliability on the percentage of resolved conflicts (Table 2). Figure 5 shows the comparison across the four conditions. The contrast analyses revealed that more conflicts were accurately resolved under the automated conditions (M = 88.33%, SE = 2.67%) than the manual condition (M = 55%, SE = 7.80%; F(1, 33) = 40.59, p < 0.001). Conflict resolution performance was also better under the reliable condition (M = 98.30%, SE = 1.70%) than the unreliable conditions (M = 83.40%, SE = 3.15%; F(1, 33) = 7.31, p = 0.011). Surprisingly, participants in the unreliable with high time pressure condition (M = 90.00%, SE = 3.00%) could resolve conflicts more accurately than in the unreliable with low time pressure condition (M = 76.70%, SE = 3.30%; F(1, 33) = 4.33, p = 0.045).

Table 2. Dependent measures under various CRA conditions.

Measure                                         R              UAH            UAL            M              p         F
Percentage of resolved conflicts                98.30 (1.70)   90.00 (3.00)   76.70 (3.30)   55.00 (7.80)   <0.001*   17.4
Conflict resolution time (seconds)              46.75 (6.49)   74.25 (11.02)  66.92 (9.91)   –              0.030*    4.15
Procedural errors (number)                      2.50 (0.54)    2.08 (0.34)    2.33 (0.41)    3.17 (0.44)    0.288     1.31
Trust (ratings)                                 6.29 (0.14)    6.08 (0.17)    5.92 (0.20)    –              0.078**   2.84
Workload (NASA TLX)                             7.40 (0.37)    7.17 (0.34)    7.15 (0.34)    7.41 (0.37)    0.396     1.02
Aid utilisation rate (%)                        93.33 (3.76)   93.33 (2.84)   93.33 (5.12)   –              1.000     0.00
Percentage of timeouts                          13.50 (6.00)   16.30 (4.50)   21.70 (6.30)   15.10 (3.80)   0.497     0.81
Ready response latency (seconds)                8.00 (0.98)    9.50 (1.82)    9.67 (1.78)    9.08 (1.59)    0.832     0.29
Probe response latency (seconds)                11.42 (1.59)   11.42 (1.35)   11.42 (1.46)   14.17 (1.45)   0.083**   2.43
Percentage of correct responses to SA probes    79.80 (2.90)   84.80 (2.90)   85.30 (4.00)   69.30 (6.40)   0.037*    3.17

Note: All values are given as mean (SE). R = reliable CRA; UAH = unreliable and high time pressure; UAL = unreliable and low time pressure; M = manual. *Significant at α = 0.05; **Significant at α = 0.1 (marginally significant).

A final contrast between the unreliable conditions and the manual condition showed that more conflicts were resolved accurately under the unreliable conditions (M = 83.40%, SE = 3.15%) than under the manual condition (M = 55%, SE = 7.80%; F(1, 22) = 18.98, p < 0.001), verifying that ATCos' performance was better even under unreliable automated conditions than when performing the tasks manually.

A significant effect of CRA reliability on the conflict resolution time was observed (Table 2). Conflicts were resolved faster under the reliable condition (M = 46.75 seconds, SE = 6.49 seconds) than the unreliable conditions (M = 70.58 seconds, SE = 10.47 seconds; F(1, 22) = 7.75, p = 0.011). There was no difference in conflict resolution time between the unreliable conditions with high versus low time pressure. No difference was observed in the conflict resolution time following the first failure (M = 103.42, SE = 26.30) and the second failure (M = 104.25, SE = 24.45) encounters (F(1, 11) = 0.02, p = 0.984); hence, the data from the two blocks could be collapsed. Importantly, there was a significant delay in conflict resolution time after participants experienced automation failures (M = 103.84) compared to that in the reliable condition (M = 46.75; F(1, 11) = 24.01, p < 0.001). No significant effects of CRA reliability on the procedural errors were observed (Table 2).

Figure 5. Percentage of resolved conflict as a function of automation reliability conditions. Solid dots indicate means, and error bars indicate 1 SE.


3.2. Subjective ratings of mental workload and trust

Participants rated their workload as quite high (M = 7.28, SE = 0.36) compared to the maximum possible value of 10. However, no significant difference in mental workload was found between the four conditions (Table 2). The trust ratings ranged from 1 to 7 with an average of 6.02 (SE = 0.17), indicating fairly high trust in the CRA. The effect of CRA reliability on trust ratings was significant, with the one-tailed planned contrast consistent with the hypothesis (Table 2). Participants rated their trust higher under the reliable condition (M = 6.29, SE = 0.14) than under the unreliable conditions (M = 6.00, SE = 0.19; F(1, 22) = 4.71, p = 0.041). The orthogonal contrast showed no significant effect of time pressure.

3.3. Aid utilisation rate

There was no significant effect of CRA reliability on the aid utilisation rate (Table 2). Participants tended to depend upon the aid to resolve most of the conflicts (M = 93.33%, SE = 3.91%) regardless of the level of CRA reliability. However, the likelihood of rejecting the CRA on the trial following a CRA failure was significantly greater than following a correct CRA (F(1, 11) = 30.58, p < 0.001), showing less dependence after experiencing automation failures.


3.4. Situation awareness

The effect of CRA reliability on probe response latency was marginally significant (Table 2). The contrast showed that participants responded to the SA probe questions faster under the automated conditions (M = 11.42 seconds, SE = 1.47 seconds) than under the manual condition (M = 14.17 seconds, SE = 1.45 seconds; F(1, 33) = 7.29, p = 0.011). No marked differences in probe response latency were observed across the automated conditions. The contrast analysis also revealed that participants responded to the SA probes faster under the unreliable conditions (M = 11.42 seconds, SE = 1.41 seconds) than under the manual condition (M = 14.17 seconds, SE = 1.45 seconds; F(1, 22) = 5.49, p = 0.029).

A significant effect of CRA reliability on the percentage of correct responses to the SA probes was found (Table 2). The contrast analysis revealed that response accuracy was higher under the automated conditions (M = 83.30%, SE = 3.27%) than the manual condition (M = 69.30%, SE = 6.40%; F(1, 33) = 8.43, p = 0.007). It was also found that unreliable automation (M = 85.05%, SE = 2.30%) led to higher response accuracy than the manual condition (M = 69.30%, SE = 6.40%; F(1, 22) = 7.67, p = 0.011) (Figure 6). There were no other significant results.

4. Discussion

4.1. Effects of CRA reliability

The primary purpose of the experiment was to examine the benefits or costs of imperfectly reliable conflict resolution automation, compared to manual performance, within the framework of automation trust, SA and dependence.

Figure 6. Percentage of correct response to the SA probes as a function of automation reliability conditions. Solid dots indicate means, and error bars indicate 1 SE.



Imperfect ATC conflict detection automation has been evaluated previously (Metzger and Parasuraman 2005) and was found to help overall performance relative to manual operation. We noted that that level of reliability (88%) was above the 72% threshold at which Wickens and Dixon (2007) observed automation to still benefit human performance. In the present study, in contrast to Metzger and Parasuraman (2005), conflict resolution automation was evaluated, and the level of reliability for imperfect automation was set at 80% (one automation failure out of the five designated conflicts), which also exceeded the threshold of assistance (Wickens and Dixon 2007; Xu, Wickens, and Rantanen 2007). While Prevot et al. (2012) did establish the benefits of perfect conflict resolution automation, they did not examine the effects of imperfections in its reliability.

Most prominently, our results showed that perfectly reliable conflict resolution automation clearly helped, improving resolution rates above manual performance from 55% to nearly 100%. These findings supported H1 and H2. However, even imperfectly reliable automation substantially increased resolution performance, to 83% (perhaps coincidentally approximating the reliability rate of 80%), demonstrating that H3 was supported. It is noteworthy that this performance assistance is consistent with the fact that the reliability was above the Wickens and Dixon (2007) threshold. Importantly, we observed that while imperfection of the CRA did degrade performance somewhat from the fully reliable condition, in both speed and accuracy, it also improved SA in a way that partially offset the costs of that imperfection. This showed that H4 was upheld. Presumably, becoming slightly sceptical or mistrustful of the automation in these imperfect blocks, participants paid closer attention to the raw data depicting possible secondary conflicts, hence increasing the accuracy of their responses to the SA probes (Figure 6); at the same time, they could still benefit greatly from the automation's perfect record in recommending resolutions of the primary conflicts.

A second important finding was the partial dissociation between trust and dependence (Wickens et al. 2013, Chapter 12); H5 was partially upheld. Dependence, as assessed by utilisation rate, was extremely high (93.33%) and did not decline when the CRA was unreliable; trust, although below the scale maximum overall, did decline significantly with unreliability. Thus, participants noticed the automation failures and adjusted their SA and trust accordingly, but still depended or relied heavily upon the automation, presumably because the high-workload task environment allowed them little alternative (Wickens et al. 2013).

In the above analysis, performance, reliability and use/dependence were averaged across an entire block. Of course, this average masks conflict-by-conflict variations in these measures (the dynamics of trust and dependence), the most important of which is the response following the initial failure experienced by each participant. Because of the counterbalancing design, this initial failure occurred in a different trial block for different participants. Did realising that a previously perfect CRA had now failed lead to a temporary drop in trust and dependence? Because trust was only measured at the end of a block, we could not assess this in terms of trust ratings. However, trust degradation can be characterised by more attention paid to the raw data, which induces longer response times (Wickens, Conejo, and Gempler 1999). Following automation failures in the present study, participants' trust was indeed found to degrade, as indicated by the significantly longer conflict resolution time on the first conflict trial after a failure trial. We also examined the likelihood of ignoring the automation following an automation failure as compared to that following a correct automation. The results revealed that participants showed less dependence after automation failure encounters than after correct trials: aid utilisation was reduced to 83.33% on the first trial following the first failure and reduced further to 66.67% following the second failure. These findings suggest that the amplified effects of automation failures following the first failure, the so-called 'first failure effect' (Yeh et al. 2003), were manifest in the current data for both trust and dependence. However, in the present study, while participants' trust and dependence did drop temporarily, they recovered rapidly after participants experienced a correct trial (even one whose advisory they had rejected). This phenomenon might also be attributable to high workload, such that participants opted to rely on the CRA to preserve their cognitive resources for other tasks (Wickens and Dixon 2007). Participants might also have depended on the CRA upon initial contact (i.e. their first experience of using such an aid) because they were not yet familiar with the system (Merritt and Ilgen 2008). Since conflict resolution requires additional processing resources for predicting the consequences of an intervention (Eyferth, Niessen, and Spaeth 2003), the above results show that the CRA could help highlight those consequences and minimise the demand on processing resources.

In the present study, an incorrect resolution with respect to surrounding traffic might turn the 'black swan' event of a primary conflict into the 'grey swan' event of a secondary conflict (Wickens et al. 2009). That is, a primary conflict that may be difficult for participants to detect can successfully be resolved by the CRA, leaving participants with a 'somewhat unexpected' secondary conflict. Participants would then be better able to resolve secondary conflicts because they had some initial knowledge of the possible conflicts. This reveals the importance of initial conflict resolution reliability. Based on these findings, we argue that automation for conflict resolution will be useful in future ATC as long as it is reliable in the first conflict resolution iteration.

The aid utilisation rate was above 93%, no matter whether the CRA was reliable or not. We speculate that such extensive use of automation could degrade ATCos' manual resolution skills in the long term, because ATCos might no longer rely on their mental library of resolution manoeuvres, in order to preserve their cognitive capability for handling more aircraft, as projected in the near future.


High dependence on the CRA and degraded manual resolution skills may cause ATCos to be 'out of the loop' if an automation failure occurs (Wickens et al. 2013).


4.2. Effects of time pressure

It was surprising that participants with more time to resolve the secondary conflict did not perform better than those having less time to respond. This finding showed that H6 was not upheld. It may be due to the nature of ATC practice, which involves tactical actions (Hoffman, Mukherjee, and Vossen 2010): a secondary conflict triggered within four minutes may represent a truly unexpected event, because participants may have switched their attention away from that particular possibility. In contrast, the high time pressure condition represents a situation in current ATC practice where a traffic alert typically appears (Thomas, Wickens, and Rantanen 2003); thus, participants may have performed better because they were more familiar with it.

Furthermore, time pressure did not affect participants' mental workload. 'Workload insensitivity' (Parasuraman and Hancock 2001) might have occurred in the conflict resolution task, with ATCos adapting to the unreliability so that workload remained unchanged regardless of the time pressure. There were no significant effects of time pressure on trust and dependence, in line with the findings of Metzger and Parasuraman (2005). Participants rated their trust in the CRA relatively high regardless of the time pressure condition, and the high trust ratings appeared to be manifested in the high dependence on the CRA across all automated conditions.

Time pressure did not affect SA in the present study, indicating that H7 was not supported. These findings conflict with Gibson et al. (1997) and Endsley and Garland (2000), who indicated that time pressure was perceived as one of the critical components in the situation assessment phase and affected 45% of pilots' SA errors. However, some researchers have also reported that time pressure and awareness levels were not related to each other (Rodgers, Mogford, and Strauch 2000). The absence of a time pressure effect on SA might arise because both 100 seconds and 4 minutes are within the tactical time range (i.e. less than 5 minutes) (Thomas, Wickens, and Rantanen 2003), in which participants might expect a secondary conflict because of CRA unreliability. A similar finding was reported by Metzger and Parasuraman (2005), who observed no variation in conflict detection between two- and four-minute ranges before a secondary conflict occurred.

4.3. Limitations

There are some limitations in the present study. First, the simulated air traffic environment may not completely reflect the real situation; some factors, such as weather, were not taken into account. Second, since a within-subjects design was adopted, learning effects might exist. Several measures were taken to minimise their adverse influence: in particular, the testing conditions were counterbalanced between participants, and different testing conditions were not presented to the same participant in the same experimental session. Third, ATCos were not used as participants in this study, although participants were somewhat familiar with ATC procedures through their prior training in aerospace and aeronautical engineering. Hence, the results might be slightly biased towards positive responses, meaning that participants may rely on the CRA more than ATCos would. However, Rantanen and Nunes (2005) empirically verified that the performance patterns of experts and novices in conflict avoidance were similar. Novices also tended to respond conventionally, in accordance with the conflict definition (Stankovic, Raufaste, and Averty 2008). This study should therefore provide a conservative estimate of ATC practice in using automation for conflict resolution.

4.4. Practical implications

According to previous research, automation unreliability in a critical environment like ATC would have greatly detrimental effects. However, the present study found that automation could benefit ATC operations even if it is not perfectly reliable. This finding suggests that designers should build safety layers into automation for conflict resolution. Specifically, designers should place high priority on providing reliable automation in the first resolution iteration. This is the implication of the present study, in which we defined unreliability as a secondary conflict triggered by an imperfect resolution advisory: the results indicated that this form of unreliability does not lead to adverse effects. In other words, as long as an advisory can resolve the primary conflict, ATCos should be able to safely override the CRA even if the advisory leads to a secondary conflict. The successful advisory for the primary conflict represents reliable automation in the first resolution iteration.

The present study also found that ATCos performed better when exposed to the unreliable and high time pressure condition than in the manual condition, implying that time pressure did not raise safety concerns in conflict resolution practice. This suggests, in addition to the first implication, that designers should provide a time allowance for secondary conflict resolution similar to the current time allowance in conflict detection. By doing so, the inherent uncertainty of prediction can also be narrowed (Hwang, Hwang, and Tomlin 2003). Moreover, comprehensive raw data illustrating a thorough picture of the environment (Bahner, Hüper, and Manzey 2008) and automation feedback (Wickens et al. 2013) remain essential for ATCos to review 'what the automation is doing' (Norman 1990) and to override it in a safe manner.


Disclosure statement

No potential conflict of interest was reported by the authors.


References

Airbus. 2013. Future Journey 2013-2032, Airbus Global Market Forecast Report. http://www.airbus.com/company/market/forecast/?eID=dam_frontend_push&docID=33752
Bahner, J. E., A.-D. Hüper, and D. Manzey. 2008. "Misuse of Automated Decision Aids: Complacency, Automation Bias and the Impact of Training Experience." International Journal of Human-Computer Studies 66 (9): 688-699.
Cardosi, Kim M., and Pamela W. Boole. 1991. "Analysis of Pilot Response Time to Time-Critical Air Traffic Control Calls." Federal Aviation Administration final report. http://www.dtic.mil/dtic/tr/fulltext/u2/a242527.pdf
Cardosi, Kim M., Judith Buerki-Cohen, Pamela W. Boole, Jennifer Hourihan, Peter Mengert, and Robert Disario. 1992. "Controller Response to Conflict Resolution Advisory." Federal Aviation Administration final report. http://oai.dtic.mil/oai/oai?verb=getRecord&metadataPrefix=html&identifier=ADA261177
Dao, A. Q. V., S. L. Brandt, V. Battiste, K. P. L. Vu, T. Strybel, and W. W. Johnson. 2009. "The Impact of Automation Assisted Aircraft Separation on Situation Awareness." In Human Interface and the Management of Information: Information and Interaction, edited by G. Salvendy and M. J. Smith, 738-747. Berlin Heidelberg: Springer.
Endsley, M. R. 1994. "Situation Awareness Information Requirements for En Route Air Traffic Control." Federal Aviation Administration final report. http://www.dtic.mil/dtic/tr/fulltext/u2/a289649.pdf
Endsley, M. R. 2000. "Direct Measurement of Situation Awareness: Validity and Use of SAGAT." In Situation Awareness Analysis and Measurement, edited by M. R. Endsley and D. J. Garland. Mahwah, NJ: Erlbaum.
Endsley, M. R., and D. J. Garland. 2000. "Pilot Situation Awareness Training in General Aviation." In Proceedings of the Human Factors and Ergonomics Society 44th Annual Meeting, 357-360. San Diego: Human Factors and Ergonomics Society.
Endsley, M. R., and D. B. Kaber. 1999. "Level of Automation Effects on Performance, Situation Awareness and Workload in a Dynamic Control Task." Ergonomics 42 (3): 462-492.
Erzberger, H. 2006. "Automated Conflict Resolution for Air Traffic Control." In Proceedings of the 25th International Congress of the Aeronautical Sciences, 1-28. Hamburg: International Council of the Aeronautical Sciences.
Eyferth, K., C. Niessen, and O. Spaeth. 2003. "A Model of Air Traffic Controllers' Conflict Detection and Conflict Resolution." Aerospace Science and Technology 7: 409-416.
Gibson, J., J. Orasanu, E. Villeda, and T. E. Nygren. 1997. "Loss of Situation Awareness: Causes and Consequences." In Proceedings of the Eighth International Symposium on Aviation Psychology, edited by R. S. Jensen and L. A. Rakowan, 1417-1421. Columbus: The Ohio State University.
Hart, S. G., and L. E. Staveland. 1988. "Development of NASA TLX (Task Load Index): Results of Empirical and Theoretical Research." In Human Mental Workload, edited by P. Hancock and N. Meshkati, 139-183. Amsterdam: Elsevier.
Hoffman, R., A. Mukherjee, and T. W. Vossen. 2010. "Air Traffic Management." In Wiley Encyclopedia of Operations Research and Management Science, edited by J. J. Cochran, 1-12. New York: Wiley.
Hwang, I., J. Hwang, and C. Tomlin. 2003. "Flight-Mode-Based Aircraft Conflict Detection Using a Residual-Mean Interacting Multiple Model Algorithm." In Proceedings of the AIAA Guidance, Navigation, and Control Conference (AIAA 2003-5340). Austin, TX: The American Institute of Aeronautics and Astronautics.
International Civil Aviation Organization. 2010. "Intensifying Asia-Pacific Collaboration to Address Efficiency and Safety." Asia Pacific Regional Report. http://www.icao.int/publications/journalsreports/2010/ICAO_APAC-Regional-Report.pdf
Jian, J.-Y., A. M. Bisantz, and C. G. Drury. 2000. "Foundations for an Empirically Determined Scale of Trust in Automated Systems." International Journal of Cognitive Ergonomics 4: 53-71.
Kelly, C., M. Boardman, P. Goillau, and E. Jeannot. 2003. "Guidelines for Trust in Future ATM Systems: A Literature Review." European Air Traffic Management Programme (Vol. 1.0): European Organisation for the Safety of Air Navigation.
Langan-Fox, J., M. J. Sankey, and J. M. Canty. 2009. "Human Factors Measurement for Future Air Traffic Control Systems." Human Factors: The Journal of the Human Factors and Ergonomics Society 51 (5): 595-637.
Lee, J. D., and K. A. See. 2004. "Trust in Automation: Designing for Appropriate Reliance." Human Factors 46: 50-80.
Merritt, S. M., and D. R. Ilgen. 2008. "Not All Trust Is Created Equal: Dispositional and History-Based Trust in Human-Automation Interactions." Human Factors: The Journal of the Human Factors and Ergonomics Society 50 (2): 194-210.
Metzger, U., and R. Parasuraman. 2005. "Automation in Future Air Traffic Management: Effects of Decision Aid Reliability on Controller Performance and Mental Workload." Human Factors: The Journal of the Human Factors and Ergonomics Society 47 (1): 35-49.
Nolan, M. S. 1999. The Fundamentals of Air Traffic Control. New York: Cengage Learning.
Norman, D. A. 1990. "The Problem with Automation: Inappropriate Feedback and Interaction, Not Overautomation." Philosophical Transactions of the Royal Society (London) B237: 585-593.
Onnasch, L., C. D. Wickens, H. Li, and H. Manzey. 2014. "Human Performance Consequences of Stages and Levels of Automation: An Integrated Meta-Analysis." Human Factors: The Journal of the Human Factors and Ergonomics Society 56 (3): 476-488.
Parasuraman, R., and P. A. Hancock. 2001. "Adaptive Control of Mental Workload." In Stress, Workload, and Fatigue, edited by P. A. Hancock and P. A. Desmond, 305-320. Mahwah, NJ: Erlbaum.
Parasuraman, R., and V. Riley. 1997. "Humans and Automation: Use, Misuse, Disuse, Abuse." Human Factors: The Journal of the Human Factors and Ergonomics Society 39 (2): 230-253.


Prevot, T., J. R. Homola, L. H. Martin, J. S. Mercer, and C. D. Cabrall. 2012. "Toward Automated Air Traffic Control – Investigating a Fundamental Paradigm Shift in Human/Systems Interaction." International Journal of Human-Computer Interaction 28: 77-98.
Rantanen, E. M., and A. Nunes. 2005. "Hierarchical Conflict Detection in Air Traffic Control." The International Journal of Aviation Psychology 15: 339-362.
Rantanen, E. M., and C. D. Wickens. 2012. "Conflict Resolution Maneuvers in Air Traffic Control: Investigation of Operational Data." The International Journal of Aviation Psychology 22: 266-281.
Rodgers, M. D., R. H. Mogford, and B. Strauch. 2000. "Post-Hoc Assessment of Situation Awareness in Air Traffic Control Incidents and Major Accidents." In Situation Awareness Analysis and Measurement, edited by M. R. Endsley and D. J. Garland, 64-100. Mahwah, NJ: Erlbaum.
Rovira, E., and R. Parasuraman. 2010. "Transitioning to Future Air Traffic Management: Effects of Imperfect Automation on Controller Attention and Performance." Human Factors: The Journal of the Human Factors and Ergonomics Society 52 (3): 411-425.
Sheridan, T. B. 2006. "Next Generation Air Transportation Systems: Human-Automation Interaction and Organizational Risks." In Proceedings of the 2nd Symposium on Resilience Engineering, 272-282. Antibes, France.
Stankovic, S., É. Raufaste, and P. Averty. 2008. "Determinants of Conflict Detection: A Model of Risk Judgments in Air Traffic Control." Human Factors: The Journal of the Human Factors and Ergonomics Society 50 (1): 121-134.
Thomas, L. C., C. D. Wickens, and E. M. Rantanen. 2003. "Imperfect Automation in Aviation Traffic Alerts: A Review of Conflict Detection Algorithms and Their Implications for Human Factors Research." In Proceedings of the Human Factors and Ergonomics Society 47th Annual Meeting, 344-348. Denver, CO: Human Factors and Ergonomics Society.
Westin, C. A., C. Borst, and B. Hilburn. 2013. "Mismatches between Automation and Human Strategies: An Investigation into Future Air Traffic Management Decision Aiding." In Proceedings of the 17th International Symposium on Aviation Psychology, Dayton, OH.
Wickens, C. D., R. Conejo, and K. Gempler. 1999. "Unreliable Automated Attention Cueing for Air-Ground Targeting and Traffic Maneuvering." In Proceedings of the Human Factors and Ergonomics Society 43rd Annual Meeting, 21-25. Houston: Human Factors and Ergonomics Society.
Wickens, C. D., and S. R. Dixon. 2007. "The Benefits of Imperfect Diagnostic Automation: A Synthesis of the Literature." Theoretical Issues in Ergonomics Science 8: 201-212.
Wickens, C. D., J. G. Hollands, S. Banbury, and R. Parasuraman. 2013. Engineering Psychology and Human Performance. 4th ed. New Jersey: Pearson Education.
Wickens, C. D., B. L. Hooey, B. F. Gore, A. Sebok, and C. S. Koenicke. 2009. "Identifying Black Swans in NextGen: Predicting Human Performance in Off-Nominal Conditions." Human Factors: The Journal of the Human Factors and Ergonomics Society 51 (5): 638-651.
Wickens, C. D., A. S. Mavor, R. Parasuraman, and J. P. McGee, eds. 1998. The Future of Air Traffic Control: Human Operators and Automation. Washington, DC: National Academy Press.
Xu, X., C. D. Wickens, and E. M. Rantanen. 2007. "Effects of Conflict Alerting System Reliability and Task Difficulty on Pilots' Conflict Detection with Cockpit Display of Traffic Information." Ergonomics 50 (1): 112-130.
Yeh, M., J. L. Merlo, C. D. Wickens, and D. L. Brandenburg. 2003. "Head Up Versus Head Down: The Costs of Imprecision, Unreliability, and Visual Clutter on Cue Effectiveness for Display Signaling." Human Factors: The Journal of the Human Factors and Ergonomics Society 45 (3): 390-407.

Appendix 1: List of manoeuvring abbreviations and their units

Notation   Abbreviation of        Unit
CM         Climb and maintain     Flight level
DM         Descend and maintain   Flight level
FH         Fly heading            Degree
IS         Increase speed         Knots
RS         Reduce speed           Knots

Appendix 2: Situation awareness probe questions

Level 1
1. What is aircraft A's speed?
2. What is the direction of departure for aircraft B?
3. What is the altitude clearance for aircraft C?

Level 2
1. How many aircraft are flying southbound?
2. Which aircraft has the lower altitude?
3. Is the difference in heading between aircraft D and aircraft E more than 90°?

Level 3
1. Which aircraft must be handed off to another sector within the next 2 minutes?
2. Which pairs of aircraft will lose separation if they stay on their current courses?
3. Which aircraft will need a new clearance to achieve landing requirements?
