Downloaded from http://qualitysafety.bmj.com/ on June 14, 2015 - Published by group.bmj.com

ORIGINAL RESEARCH

The Surgical Safety Checklist and Teamwork Coaching Tools: a study of inter-rater reliability

Lyen C Huang,1,2 Dante Conley,3,4 Stu Lipsitz,5 Christopher C Wright,6 Thomas W Diller,6 Lizabeth Edmondson,1 William R Berry,1 Sara J Singer7,8,9

▸ Additional material is published online only. To view please visit the journal online (http://dx.doi.org/10.1136/bmjqs-2013-002446).

For numbered affiliations see end of article.

Correspondence to Dr Sara J Singer, Department of Health Policy and Management, Harvard School of Public Health, 677 Huntington Avenue, Kresge Building 3, Room 317, Boston, MA 02115, USA; [email protected]

Received 30 August 2013
Revised 11 December 2013
Accepted 8 January 2014
Published Online First 4 February 2014

To cite: Huang LC, Conley D, Lipsitz S, et al. BMJ Qual Saf 2014;23:639–650. doi:10.1136/bmjqs-2013-002446

ABSTRACT

Objective To assess the inter-rater reliability (IRR) of two novel observation tools for measuring surgical safety checklist performance and teamwork.

Background Surgical safety checklists can promote adherence to standards of care and improve teamwork in the operating room. Their use has been associated with reductions in mortality and other postoperative complications. However, checklist effectiveness depends on how well they are performed.

Methods Authors from the Safe Surgery 2015 initiative developed a pair of novel observation tools through literature review, expert consultation and end-user testing. In one South Carolina hospital participating in the initiative, two observers jointly attended 50 surgical cases and independently rated surgical teams using both tools. We used descriptive statistics to measure checklist performance and teamwork at the hospital. We assessed IRR by measuring percent agreement, Cohen's κ, and weighted κ scores.

Results The overall percent agreement and κ between the two observers were 93% and 0.74 (95% CI 0.66 to 0.79), respectively, for the Checklist Coaching Tool and 86% and 0.84 (95% CI 0.77 to 0.90) for the Surgical Teamwork Tool. Percent agreement for individual sections of both tools was 79% or higher. Additionally, κ scores for six of eight sections on the Checklist Coaching Tool and for two of five domains on the Surgical Teamwork Tool achieved the desired 0.7 threshold. However, teamwork scores were high and variation was limited. There were no significant changes in the percent agreement or κ scores between the first 10 and last 10 cases observed.

Conclusions Both tools demonstrated substantial IRR and required limited training to use. These instruments may be used to observe checklist performance and teamwork in the operating room. However, further refinement and calibration of observer expectations, particularly in rating teamwork, could improve the utility of the tools.

INTRODUCTION

Surgical safety checklists promote adherence to standards of care in surgery. When used to enhance communication, they can improve teamwork and prevent communication failures in the operating room.1–4 Their use has been associated with reductions in mortality and other postoperative complications.5–7 However, the effectiveness of surgical safety checklists depends on how well surgical team members perform them.8 For example, surgical team members may skip items on the checklist or treat it as a box-ticking exercise.6 8–11 Research suggests that poorly implemented checklists can have adverse effects on team function.12 Previous experience and research suggest that these challenges can be overcome with methodical implementation programmes.13

The Safe Surgery 2015 initiative has spent the last 3 years collaborating with a diverse group of 67 hospitals in South Carolina with the goal of implementing surgical safety checklists in a way that promotes adherence and teamwork. Individual and team-based coaching on checklist performance has been endorsed by experts (including those who developed the WHO Surgical Safety Checklist) as an essential practice in achieving this goal.14 The Safe Surgery 2015 initiative encourages participating hospitals to select and train clinical staff from their ORs to serve as coaches. To provide a framework for coaching surgical teams, Safe Surgery 2015 investigators developed a pair of observational tools that could be distributed widely and used by the coaches with limited training to assess and guide discussion about checklist performance and surgical teamwork: to assess not only whether the surgical checklist is being used, but also how it is being used. These tools were also designed to measure the impact of the Safe Surgery 2015 initiative. At the time, existing tools for measuring surgical checklist performance failed to address all three stopping points recommended by the WHO Surgical Safety Checklist (preanaesthesia processes of care, preincision briefing, and postoperative debriefing), important communication features, surgical team member buy-in for using checklists, and expected impacts of checklist use (such as reducing the number of times the circulating nurse needs to leave the room to find instruments or equipment). Similarly, tools for observing surgical teamwork were not tailored to aspects of teamwork that checklists might affect.

The goal of the present study was to pilot test the two tools using observers like those expected to use them in a 'real world' setting (ie, not extensively trained and not working in a simulation lab) and to determine their inter-rater reliability (IRR) under these circumstances. Measuring IRR is a critical step in assessing whether we have developed tools that are sufficiently clear and easy to use that results will be consistent regardless of observer biases or training.

METHODS

Development of the coaching tools

We developed a pair of tools to measure checklist performance and teamwork in the operating room (see online appendix for a sample South Carolina checklist template as well as the Checklist and Teamwork Coaching Tools). We created two tools rather than a single unified tool to allow hospitals to focus on specific areas of improvement and to avoid common method bias in future inter-tool analysis. The Surgical Safety Checklist Coaching Tool measures the key behaviours and processes contained on the surgical safety checklist template developed in collaboration with participating South Carolina hospitals. The majority of the items on the tool document the extent to which surgical teams performed the key teamwork and communication elements of the checklist. It was designed this way to reinforce the concept of the checklist as a process for improving teamwork and communication rather than a series of tasks to be checked off. We also included items that assess whether surgical team members follow checklist best practices (eg, reading all checklist items aloud, without reliance on memory) and exhibit appropriate behaviours (eg, 'Buy-In', see table 1) while performing the checklist. In order to measure the effect of checklist performance in the observed cases, we also incorporated items measuring operating room efficiency, the avoidance of errors, and adherence to existing surgical standards of care (eg, antibiotic re-dosing for operations >2 h in duration). For certain processes that do not apply in every case (eg, antibiotic prophylaxis, compression boots), we allowed observers to indicate that they were not applicable.

The companion Surgical Teamwork Tool measures teamwork among surgical team members in the operating room. In developing the tool, we started with the conceptual model already in use by the Safe Surgery 2015 monitoring programme. Members of the Safe Surgery 2015 team derived the conceptual model from previous models of teamwork,15 an understanding of intended checklist effects, and experience observing checklist use. Specifically, we defined five measurable domains of teamwork considered particularly applicable to the operating room: clinical leadership, team communication, assertiveness, coordination and respect. We focus on aspects of teamwork, such as coordination, rather than hallmarks of high reliability,16 such as situational awareness, both to concentrate on the central construct and because coordination applies specifically to teams while situational awareness can apply to individuals. We separately consider aspects of teamwork that others have consolidated, such as clinical leadership and assertiveness, to distinguish the behaviours of those with authority from those without it. We also include elements, like respect, that, while often absent from surgical teamwork tools, are prominent in the teamwork literature more generally15 and often problematic in surgical teams.17 Definitions and examples of the behaviours measured in each of the five domains are shown in table 1.

Safe Surgery 2015 team members (WB, DC, LE, SS) generated potential items for the Surgical Teamwork Tool, with reference to previous teamwork observation tools and climate assessments, including the Teamwork in Multidisciplinary Care Teams tool,18 the Oxford NOTECHS System,19 the High Reliability Surgical Teamwork tool by Thomas et al for Kaiser Permanente (unpublished), the case-based version of the Safety Attitudes Questionnaire ('ORBAT'),21 the Behaviour Marker Risk Index,20 and the Observational Teamwork Assessment for Surgery ('OTAS') tool.22 The tools were further refined in consultation with experts in teamwork and medical simulation from the Center for Medical Simulation. We developed the 19 items that best measured behaviours within the five teamwork domains. All but one of the items on the tool describe what is considered an optimal teamwork behaviour, for example, 'Discussions took place in a calm, learning-oriented fashion.' These items use a 5-point frequency scale, where 1 indicates the behaviour never occurred, 2 indicates the behaviour occurred


Table 1 Selected sections and domains from the Checklist Performance and Surgical Teamwork Tools

Buy-in
  Definition: Acceptance and commitment by individuals to the performance of the checklist.
  Examples of excellent behaviours: Team members stop all other activities and conversation and appear to be interested while the checklist is performed.
  Examples of poor behaviours: Team members continue other activities or conversation while performing the checklist, or exhibit poor buy-in behaviour (eg, non-participation, speed reading, or rolling eyes).

Clinical leadership
  Definition: Exerting control or playing a decision-making role in a patient's clinical care. Any member of the team may demonstrate clinical leadership in the course of a surgical procedure.
  Examples of excellent behaviours: Surgical team members share clinical responsibilities while managing a patient's hypotension. Team members are open to suggestions from every member of the team. Physicians actively participate in patient care prior to incision, and maintain a positive tone throughout the operation.
  Examples of poor behaviours: Surgical team members ignore requests from other team members when managing a patient's condition. Staff members do not offer others the opportunity to provide suggestions, or actively discourage suggestions. Physicians are not present prior to incision or, if present, are engaged in unrelated activities. Physicians set a negative tone for the case by disregarding the opinions of other team members.

Communication
  Definition: Team communication refers to the way information is shared among surgical team members.
  Examples of excellent behaviours: Verbal communication is easy to understand and clearly directed at individuals. Team members' names are frequently used to ensure that the correct person receives the information. Key information is shared as it becomes available. Both speakers and recipients make visual or spoken efforts to confirm that important information was received.
  Examples of poor behaviours: Verbal communication is inaudible or mumbled, or other team members need to clarify requests. Key information is not shared among team members in a timely fashion. No attempt is made by speakers or receivers to confirm that important information was received.

Coordination
  Definition: Coordination refers to team members working together to accomplish technical tasks.
  Examples of excellent behaviours: Team members are eager to help one another, and plan for patient care together. Team members pay close attention to the operation and are able to engage in a discussion of its progress. When the patient's condition changes, the team works together to adapt to the situation. Interactions occur in a coordinated and cooperative fashion.
  Examples of poor behaviours: The actions of team members are completely disconnected. Team members do not pay attention, or fail to update each other on the progress of the case. Plans are not updated or discussed when the patient's condition changes. Emergence from anaesthesia is not well timed with the end of the surgical portion of the operation.

Respect
  Definition: Respect refers to the ways in which team members treat or show regard for one another.
  Examples of excellent behaviours: Team members are called by their names, even if it requires asking for their names again. No one is ever referred to by their role. When there is a problem or confusion, team members provide instruction as needed. Team members are apologetic when errors are discovered and appear genuinely interested in learning from their error. When an error occurs, it is addressed in a calm and respectful fashion, without accusations or condescending remarks.
  Examples of poor behaviours: Team members are called by their roles instead of by name. When mistakes occur, no attempt is made to use the error as an occasion for learning or to encourage team members to speak up with questions or concerns in the future. Instead, errors are pointed out with condescension or raised voices. Team members respond angrily or rudely when an error or mistake is pointed out.

Assertiveness
  Definition: Assertiveness refers to team members' willingness to speak up in order to communicate or ask for help.
  Examples of excellent behaviours: Team members make certain that their concerns are heard and understood by other team members. Team members help each other when they are busy.
  Examples of poor behaviours: When team members are ignored, they do not follow up and ensure that their concerns are addressed. Team members who appear to be extremely overloaded and busy fail to ask for help.

about 25% of the time, 3 corresponds with a behaviour that occurred about half the time, 4 with a behaviour occurring about 75% of the time, and 5 indicates that the behaviour always occurred. By estimating the proportion of instances in which the optimal behaviour occurred in the case, a rater assesses the potentially varying quality of teamwork among surgical teams. An 'N/A' option was provided for four items that referenced behaviours unlikely to occur in every case. The last item on the tool asked for an overall rating of surgical teamwork during the procedure on a scale of 1–5, with 1 indicating poor surgical teamwork and 5 indicating excellent surgical teamwork.

Common to both tools is a section capturing case demographics (patient age and gender, surgeon's specialty, and procedure performed) and observer information (age, gender, role and tenure). This information enables users to match observations from the two tools for the same case in order to examine associations between checklist performance and teamwork. We also included more detailed case characteristic information, such as case duration (measured from time of incision to surgical end time), whether the case was urgent/emergent or delayed, and patient disposition, in order to study the relationships between these characteristics, checklist performance and teamwork.

Design and case sample

We conducted a prospective observational study. Two nurse observers from the study hospital (both of whom had experience in quality management and observational data collection) used the two coaching tools to rate checklist performance and teamwork in the operating room. Their training on use of the instruments resembled the training that we believe could reasonably be offered to personnel in any healthcare facility. First, the two observers reviewed the tools and accompanying written instructions. Then, each observer completed a supplementary web-based training course on the use of the Surgical Teamwork Tool (http://safesurgery.teamtraining.sgizmo.com/s3/). For each of the five teamwork domains included in the teamwork instrument, the web-based training provides a definition, lists the set of related items, and shows two short video vignettes. The video vignettes depict scenarios carefully designed to demonstrate positive and negative forms of each behaviour that the observers will use the teamwork instrument to judge. A short quiz follows each vignette to assess the user's ability to distinguish positive and negative teamwork behaviours. The training provides automated feedback on users' responses and allows the opportunity for review. The web-based training is self-paced and generally takes about 15 min to complete.

After this preliminary training was complete, the two observers together trialled the coaching tools in a single case, without involvement of investigators. We followed this first observation with a conference call allowing the observers to debrief with the primary investigators (LH, SS). During this call, we discussed discrepancies in observer ratings and provided an opportunity for the observers to ask questions about the instruments. Observers requested clarification regarding subjective measures, for example, what constituted 'significant' disruption and 'repeatedly' leaving the OR. The remaining cases were then completed without additional discussion with the research team. Additionally, the observers did not discuss their ratings with each other during the study period.

The onsite project coordinator randomly selected 50 surgical cases for observation. Selection criteria included an expected case duration between 30 min and 2 h. We expected the minimum duration to provide adequate time for observers to evaluate teamwork during the procedure, and the maximum duration to allow the observers to see both the briefing and debriefing portions of the same case. Additionally, selected cases were elective. This criterion was designed to minimise disruption and distraction associated with the presence of observers that might affect patient care. We also excluded cases involving study investigators to reduce the potential for observer bias. Study investigators sent a letter to all surgical personnel prior to the start of the study offering the opportunity to decline to participate; none did so. The observers attended cases over a 3-month period from November 2012 to January 2013. We obtained ethical approval for the study from institutional review boards at the Greenville Health System and the Harvard School of Public Health.

Data collection and statistical analysis

The project coordinator at the study hospital sent electronic copies of completed paper-based observation forms to Safe Surgery 2015 team members every 2–3 weeks during the data collection period. Investigators entered these data for analysis, checking the accuracy of data entry by reviewing data that seemed inconsistent or unusual and double-checking 10% of all data entered. We performed all analyses using SAS V.9.3 (SAS Institute, Cary, North Carolina, USA).

To begin our analysis, we calculated descriptive statistics for both tools. To assess IRR, we calculated the percent absolute agreement and Cohen's κ score for each section of the Checklist Coaching Tool and each domain of the Surgical Teamwork Tool. κ is considered more robust than simple percent agreement because it accounts for agreement occurring by chance. For questions using Likert scales, we used a weighted κ score, which assigns partial credit for near, but not exact, agreement. For Likert scale questions that also included an N/A option, the N/A was assigned a value of 6 in order to maintain the same continuum as the scale. For the Checklist Coaching Tool, we calculated an overall κ coefficient as an average of the κ coefficients for the individual sections weighted by the number of items in each section.23 We generated an overall κ coefficient for the Surgical Teamwork Tool by calculating an average of the κ coefficients across all 19 items. The 95% CIs for the κ coefficients were calculated using a jackknife technique.24 We considered κ coefficients statistically significant if the 95% CI excluded 0 and the p value was less than or equal to 0.05.
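As an illustration of the agreement statistics described above (a sketch, not the authors' SAS code), percent agreement, Cohen's κ, linearly weighted κ, and an item-count-weighted overall κ can be computed as follows. Ratings are assumed to be small integers (eg, 1–5, with N/A coded as 6, as in the text); the observer data shown are hypothetical.

```python
from collections import Counter

def percent_agreement(r1, r2):
    """Proportion of cases on which the two observers agree exactly."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2, weighted=False):
    """Unweighted or linearly weighted Cohen's kappa for two raters.

    Linear weights give partial credit for near, but not exact,
    agreement, as the text describes for Likert-scale items.
    """
    cats = sorted(set(r1) | set(r2))
    n = len(r1)
    m1, m2 = Counter(r1), Counter(r2)
    span = (max(cats) - min(cats)) or 1
    def dis(a, b):  # disagreement weight for one pair of ratings
        return abs(a - b) / span if weighted else float(a != b)
    observed = sum(dis(a, b) for a, b in zip(r1, r2)) / n
    expected = sum(dis(a, b) * m1[a] * m2[b]
                   for a in cats for b in cats) / n ** 2
    return 1 - observed / expected

def overall_kappa(section_kappas, items_per_section):
    """Average of section kappas, weighted by items per section."""
    total = sum(items_per_section)
    return sum(k * n for k, n in zip(section_kappas, items_per_section)) / total

# Hypothetical ratings from two observers over eight cases
o1 = [5, 5, 4, 3, 5, 4, 5, 2]
o2 = [5, 4, 4, 3, 5, 5, 5, 2]
print(percent_agreement(o1, o2))                      # 0.75
print(round(cohens_kappa(o1, o2), 3))                 # 0.619
print(round(cohens_kappa(o1, o2, weighted=True), 3))  # 0.771
```

Note that when both observers use a single category almost exclusively, `expected` approaches zero and κ becomes unstable, which is why the authors omitted very-high-prevalence sections from the overall κ calculations.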
In sections where there was a very high prevalence of the same answer (ie, an item was always rated a 5 by both observers), we did not estimate κ coefficients because the probability of selecting the answer by chance was so high that the κ coefficient is considered an inappropriate measure of reliability.25 These sections were omitted from the overall κ calculations. We interpreted the κ coefficients using Landis and Koch's scale: below 0 'poor', 0–0.20 'slight', 0.21–0.40 'fair', 0.41–0.60 'moderate', 0.61–0.80 'substantial', and above 0.80 'almost perfect'.26
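A sketch of the jackknife CI and the Landis and Koch interpretation might look as follows (an unweighted κ is used for brevity, and the authors' SAS implementation may differ in detail; the observer ratings are hypothetical):

```python
import math
from collections import Counter

def kappa(r1, r2):
    """Unweighted Cohen's kappa for two raters."""
    n = len(r1)
    cats = set(r1) | set(r2)
    m1, m2 = Counter(r1), Counter(r2)
    po = sum(a == b for a, b in zip(r1, r2)) / n   # observed agreement
    pe = sum(m1[c] * m2[c] for c in cats) / n ** 2  # chance agreement
    return (po - pe) / (1 - pe)

def jackknife_ci(r1, r2, z=1.96):
    """Approximate 95% CI for kappa via the leave-one-case-out jackknife."""
    n = len(r1)
    full = kappa(r1, r2)
    # Recompute kappa with each case left out in turn.
    loo = [kappa(r1[:i] + r1[i + 1:], r2[:i] + r2[i + 1:]) for i in range(n)]
    mean = sum(loo) / n
    se = math.sqrt((n - 1) / n * sum((k - mean) ** 2 for k in loo))
    return full - z * se, full + z * se

def landis_koch(k):
    """Landis and Koch (1977) interpretation bands for kappa."""
    bands = [(0.0, 'poor'), (0.20, 'slight'), (0.40, 'fair'),
             (0.60, 'moderate'), (0.80, 'substantial')]
    for cutoff, label in bands:
        if k <= cutoff:
            return label
    return 'almost perfect'

o1 = [5, 5, 4, 3, 5, 4, 5, 2]
o2 = [5, 4, 4, 3, 5, 5, 5, 2]
lo, hi = jackknife_ci(o1, o2)
print(landis_koch(kappa(o1, o2)))  # substantial
```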


For this study, we set our threshold for considering the tools sufficiently reliable for widespread use at 0.70, which is the midpoint of the 'substantial' category.27 To evaluate the possibility of an experience effect, that is, that ratings would change with experience in using the tools, we compared the percent agreement and κ scores of the first 10 cases (excluding the first case, which was observed prior to the debriefing call with investigators) and the last 10 cases.
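The experience-effect comparison can be sketched as follows; the `cases` list of per-case (observer 1, observer 2) ratings is hypothetical, not the study's data:

```python
def pct_agree(pairs):
    """Percent agreement over a list of (observer1, observer2) ratings."""
    return sum(a == b for a, b in pairs) / len(pairs)

# Hypothetical per-case overall ratings, in chronological order
cases = [(5, 5), (4, 5), (5, 5), (3, 3), (5, 4), (5, 5), (4, 4), (5, 5),
         (2, 2), (5, 5), (4, 4), (5, 5), (5, 5), (3, 4), (5, 5), (5, 5),
         (4, 4), (5, 5), (5, 5), (5, 5)]

# Compare agreement in the first 10 and last 10 observed cases.
early, late = cases[:10], cases[-10:]
print(pct_agree(early), pct_agree(late))  # 0.8 0.9
```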

RESULTS

Case characteristics

Both observers attended all 50 cases to which they were jointly assigned. Information about case characteristics is presented in table 2. The median age of the patients in the cases observed was 38.5 years (IQR 14.0–56.0), and the median case duration was 43.5 min (IQR 34.0–72.0). A significant non-clinical disruption, as judged by the observers, occurred in only one case. However, there were significant delays of greater than 30 min in 5 (10%) of the cases. In 6 (12%) of the cases, the patients were admitted to the hospital postoperatively (rather than discharged home). The most commonly observed specialty was general surgery (36% of all cases), followed by gynaecology (18%).

Table 2 Observed case characteristics (n=50)

  Patient median age, years (IQR): 38.5 (14.0–56.0)
  Patient gender female, n (%): 19 (38)
  Case median duration in minutes (IQR): 43.5 (34.0–72.0)
  Cases with significant non-clinical disruption, n (%): 1 (2)*
  Case delayed >30 min, n (%): 5 (10) or 4 (8)†
  Disposition to inpatient (vs outpatient) facility, n (%): 6 (12)
  Surgeon specialties, n (%): cardiothoracic 1 (2); ENT 3 (6); general surgery 18 (36); gynaecology 9 (18); neurosurgery 1 (2); orthopedics 5 (10); pediatric surgery 4 (8); pediatric ENT 1 (2); pediatric urology 1 (2); plastics 1 (2); surgical oncology 3 (6); urogynecology 1 (2); urology 2 (4)

  *Each observer reported one case with a significant non-clinical disruption, but they differed on the case reported.
  †In one case, observers differed on whether the case was delayed >30 min.

Checklist performance and teamwork in the operating room

The observation tools provide a snapshot of checklist performance and surgical teamwork at Greenville Memorial Hospital at the time cases were observed (table 3). Compliance with the Joint Commission's Surgical Care Improvement Project (SCIP) measures was very high according to both observers, with 40 teams providing antibiotics within 1 h of incision in the 41 cases that called for prophylactic administration. Additionally, in the 36 cases where compression boots were not contraindicated, all the teams provided them. For appropriate placement of warmers in cases with an expected duration of more than 1 h, both observers identified high rates of compliance but differed in their assessment of case duration. One observer reported that teams placed warmers in 38 of 38 cases where warmers were required. The other observer reported that teams complied in 41 of 42 applicable cases.

Compliance with performing the briefing and debriefing portions of the checklist was less consistent. According to both observers, team members introduced themselves by name and role, or had done so earlier in the day, in all 50 cases; the first observer noted introductions during the briefing in 44 of the 50 cases and the second observer in 43. Surgeons discussed the operative plan less than half the time (in only 22 of the 50 cases according to the first observer, and in 20 cases according to the second observer). The surgeon stated the expected duration of the procedure in 31 or 33 of the 50 cases according to the first and second observers, respectively. By contrast, nurses discussed sterility, equipment and other concerns more frequently (in 45 of 50 cases according to the first observer and 41 cases according to the second observer). There was more disagreement regarding how often the anaesthesia provider discussed the anaesthesia plan: the first observer reported a briefing by an anaesthesia provider in 43 of 50 cases while the second observer reported it in just 32 cases. Most teams did not perform the checklist as intended, by reading every item aloud without reliance on memory. The two observers reported that the checklist was performed properly in only 22 or 23 of the cases, respectively. For the debriefings, teams discussed specimen labelling in 25 cases according to both observers; however, the first observer identified 34 cases with specimens while the second noted 33. Teams discussed equipment and other problems in 46 of 50 or 44 of 49 cases according to the two observers, respectively. Finally, teams discussed key concerns for patient recovery and postoperative management in 40 of 49 cases according to the first observer (41 of 50 according to the second observer). Buy-in to the checklist process among the surgical team members was uniformly rated as high, with mean buy-in scores of 4.78–4.88 among the different professional roles.

A notable proportion of cases experienced equipment issues. The two observers reported that in 16 or 17 cases, respectively, nurses had to leave the OR repeatedly to find instruments or equipment. Equipment was available and functioning throughout the case in only 23 or 24 of the 50 cases according to the first and second observers, respectively. Both observers also noted that antibiotic re-dosing was not discussed in the one case where an expected case duration of longer than 2 h warranted such discussion.

With regard to teamwork in the operating room, the observers uniformly rated cases highly (table 4). The mean overall teamwork rating was 4.74 (SD 0.49) according to the first observer and 4.98 (SD 0.14) according to the second observer. Scores ranged from 3 to 5 on the 5-point scale. The reverse-scored item 'Team members referred to each other by role instead of name' (Q13) was rated the highest by both observers (mean 4.98, SD 0.14 by observer 1; mean 5.00, SD 0.00 by observer 2), indicating team members almost never referred to others by role. The teamwork item 'Verbal communication among team members was easy to understand' (Q5) was rated the lowest by the observers (mean 4.54, SD 0.58 by observer 1; mean 4.62, SD 0.57 by observer 2). When items were aggregated by teamwork domain, assertiveness was, on average, the highest rated domain (mean 4.85, SD 0.16 by observer 1; mean 4.97, SD 0.01 by observer 2), while communication was rated the lowest (mean 4.84, SD 0.20 by observer 1; mean 4.83, SD 0.15 by observer 2).

Table 3 Checklist performance in the 50 observed cases
(Counts are given as O1/O2, ie, first observer/second observer)

Processes of care (Yes, w/o prompting | Yes, prompted by checklist | No | N/A)
  Q1. Was an antibiotic given within 1 h of incision? 40/24 | 0/18 | 1/1 | 9/7
  Q2. Were compression boots placed (mechanical deep vein thrombosis prophylaxis)? 36/36 | 0/0 | 0/0 | 14/14
  Q3. Was a warmer placed (for case >1 h)? 37/40 | 1/1 | 0/1 | 12/8

Briefing (Yes | No)
  Q4. Which of the following individuals participated in confirming the patient's identity, procedure or operative site before incision?
    Circulating nurse 50/49 | 0/1
    Anaesthesia provider 50/49 | 0/1
    Surgeon 50/49 | 0/1
    Surgical tech 49/49 | 1/1
  Q5. Did team members introduce themselves by name and role (eg, 'Lynn, the anaesthesiologist.')? 44/43 | 6/7
  Q5a. If no, was this team established (ie, introductions performed earlier the same day)? 6/5 | 6/5
  Q6. Before incision, did the surgeon discuss the operative plan? 22/20 | 28/30
  Q7. Before incision, did the surgeon state the expected duration of the procedure? 31/33 | 19/17
  Q8. Before incision, did the surgeon communicate the expected blood loss (EBL)? 30/31 | 20/19
  Q9. Before incision, did the nurse discuss sterility, equipment, or any other concerns? 45/41 | 5/9
  Q10. Before incision, did the anaesthesia provider discuss the anaesthesia plan (including airway or other concerns)? 43/32 | 6/18
  Q11. Were all checklist items read aloud, without reliance on memory? 22/23 | 28/27

Debriefing (Yes | No | N/A)
  Q12. Before the patient left the OR, did the team discuss specimen labelling (eg, labels/patient name read aloud)? 25/25 | 9/8 | 16/17
  Q13. Before the patient left the OR, did the team discuss equipment or other problems that arose? 46/44 | 4/5 | 0/1
  Q14. Before the patient left the OR, did the team discuss key concerns for patient recovery and postoperative management? 40/41 | 9/9 | 0/0

Buy-In (Yes | No)
  Q15. Which of the following individuals actively participated in discussing checklist items?
    Circulating nurse 50/49 | 0/1
    Anaesthesia provider 50/50 | 0/0
    Surgeon 50/50 | 0/0
    Surgical tech 50/48 | 0/1
  Q16–19. For questions 16–19 rate checklist buy-in using the descriptions below ('1' represents poor buy-in; '5' represents excellent buy-in); mean (SD):
    Nurse 4.88 (0.39) / 4.84 (0.47)
    Anaesthesiologist 4.80 (0.64) / 4.82 (0.48)
    Surgeon 4.78 (0.46) / 4.78 (0.51)
    Surgical tech 4.82 (0.44) / 4.84 (0.42)

Additional data
  Q20. Did the circulating nurse leave the OR repeatedly to find instruments or equipment? Yes 16/17 | No 33/32
  Q21. Were instruments and equipment available and functioning throughout the case? Yes 23/24 | No 27/26
  Q22. Was a potential error or omission averted by the checklist? Yes 0/0 | No 50/50
  Q23. If there is significant EBL, was a type and cross sent or blood products available? Yes 0/0 | No 0/0 | N/A 50/50
  Q24. If there is significant EBL, was adequate intravenous access discussed and obtained? Yes 0/0 | No 0/0 | N/A 50/50

Inter-rater reliability

For the Checklist Coaching Tool, the overall percent agreement was 93%, and the overall κ coefficient was 0.74 (95% CI 0.66 to 0.82) (table 5). Percent agreement within sections ranged from 83% for the SCIP measures and surgical team member buy-in to 100% for the surgical best practices section. κ coefficients ranged from 0.44 for buy-in to 0.94 for adherence to the Joint Commission Timeout. Within the SCIP measures, percent agreement for antibiotics being given within 1 h (Q1) was 58% when only unprompted administration was considered proper performance, but increased to 92% when the responses for 'Yes, w/o prompting' and 'Yes, prompted by the checklist' were combined. There was no change in the percent agreement for compression boot use (Q2) or warmer use (Q3) when the 'Yes' responses were combined. The overall percent agreement, though, increased to 95% when the 'Yes' responses were combined.

The overall percent agreement for the Surgical Teamwork Tool was 86% and the κ score was 0.84 (95% CI 0.77 to 0.90). The assertiveness domain had the lowest κ score, at 0.63 (95% CI 0.45 to 0.82), and the lowest percent agreement, at 79%. The respect domain had the highest κ score, at 0.92 (95% CI 0.84 to 1.00), and the highest percent agreement, at 92%. The κ score for the


Table 4 Surgical teamwork in the 50 observed cases

Values are mean (SD) for observer 1 (O1) and observer 2 (O2), excluding N/A responses; (R) denotes a reverse-scored question.

Clinical leadership [O1 4.83 (0.05); O2 4.85 (0.02)]
Q1. Clinical leadership was shared among disciplines in response to changes in the patient's condition or issues that arose during the operation [O1 4.88 (0.32); O2 4.86 (0.64)]
Q2. Physicians were open to suggestions from nurses [O1 4.78 (0.42); O2 4.81 (0.46)]
Q3. Physicians were present and actively participating in patient care prior to skin incision [O1 4.86 (0.40); O2 4.86 (0.50)]
Q4. Physicians maintained a positive tone throughout the operation [O1 4.80 (0.45); O2 4.86 (0.40)]

Communication [O1 4.84 (0.20); O2 4.83 (0.15)]
Q5. Verbal communication among team members was easy to understand (eg, clearly articulated and spoken at an adequate volume) [O1 4.54 (0.58); O2 4.62 (0.57)]
Q6. Team members shared key information as it became available [O1 4.92 (0.28); O2 4.96 (0.20)]
Q7. Speakers made a visual or spoken effort to confirm that important information was received [O1 4.96 (0.20); O2 4.88 (0.33)]
Q8. Recipients made a visual or spoken effort to confirm that they understood the information communicated [O1 4.94 (0.24); O2 4.86 (0.35)]

Coordination [O1 4.89 (0.06); O2 4.89 (0.05)]
Q9. Team members appeared eager to help one another [O1 4.94 (0.31); O2 4.94 (0.31)]
Q10. Team members from different disciplines discussed the patient's condition and the progress of the operation [O1 4.88 (0.39); O2 4.84 (0.43)]
Q11. Plans for patient care were adapted as needed [O1 4.94 (0.25); O2 4.87 (0.34)]
Q12. Clinical tasks were well coordinated among team members [O1 4.82 (0.48); O2 4.92 (0.34)]

Respect [O1 4.86 (0.09); O2 4.87 (0.12)]
Q13. Team members referred to each other by role instead of name (eg, 'Nurse' instead of 'Dana'). (R) [O1 4.98 (0.14); O2 5.00 (0.00)]
Q14. Discussions took place in a calm, learning-oriented fashion [O1 4.90 (0.36); O2 4.96 (0.20)]
Q15. Team members reacted appropriately when their potential errors or mistakes were pointed out [O1 4.79 (0.54); O2 4.79 (0.42)]
Q16. Potential errors or mistakes were pointed out without raised voices or condescending remarks [O1 4.79 (0.54); O2 4.75 (0.55)]

Assertiveness [O1 4.85 (0.16); O2 4.97 (0.01)]
Q17. Team members made certain that their concerns were understood by other team members [O1 4.88 (0.34); O2 4.85 (0.36)]
Q18. Team members appeared to struggle and did not ask one another for help. (R) [O1 4.93 (0.38); O2 4.93 (0.37)]

Overall teamwork
Q19. Please rate surgical teamwork during this procedure [O1 4.74 (0.49); O2 4.98 (0.14)]

*Excluding N/A responses; (R), Reverse scored question; O1, observer 1; O2, observer 2.
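The mean and SD columns above exclude N/A responses. A small sketch of that summarisation follows; the ratings are hypothetical, and since the paper does not state whether the sample or population SD was used, the sample SD is assumed here.

```python
from statistics import mean, stdev

def summarize(ratings):
    """Mean and sample SD of 1-5 teamwork ratings, excluding N/A (None) responses,
    mirroring the 'excluding N/A responses' footnote of Table 4."""
    scored = [r for r in ratings if r is not None]
    return mean(scored), stdev(scored)

# Hypothetical ratings for one item across five cases (None = N/A)
m, s = summarize([5, 5, 4, None, 5])
print(round(m, 2), round(s, 2))  # 4.75 0.5
```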

Table 5 Surgical Safety Checklist Coaching Tool inter-rater reliability and percent agreement (n=50 cases)

Each section is reported as percent agreement and κ (95% CI) for the first 10 cases*, the last 10 cases and all 50 cases overall.

Overall: first 10, 92%, κ 0.60 (0.27 to 0.93); last 10, 94%, κ 0.83 (0.54 to 1.11); overall, 93%, κ 0.74 (0.66 to 0.82)
Case characteristics: first 10, 95%, κ 0.00 (−0.99 to 0.99); last 10, 98%, κ 0.88 (0.59 to 0.99); overall, 97%, κ 0.86 (CI lower bound 0.70)
SCIP measures (Q1–3): first 10, 90%, κ 0.78 (0.57 to 0.99); last 10, 77%, κ 0.55 (0.42 to 0.67); overall, 83%†, κ 0.63† (CI lower bound 0.52)
Joint Commission Timeout (Q4): first 10, 92%, κ 0.78 (0.39 to 0.99); last 10, 98%, κ 0.94 (0.83 to 0.99); overall, 98%, κ 0.94 (CI lower bound 0.84)
Briefing (Q5–11): first 10, 90%, κ 0.79 (0.63 to 0.94); last 10, 90%, κ 0.78 (0.62 to 0.94); overall, 88%, κ 0.73 (CI lower bound 0.64)
Debrief (Q12–14): first 10, 97%, κ 0.92 (0.75 to 0.99); last 10, 100%, κ 0.99 (0.99 to 0.99); overall, 94%, κ 0.87 (CI lower bound 0.78)
Active participation (Q15): first 10, 100%, κ –‡; last 10, 98%, κ –‡; overall, 99%, κ –‡
Surgical team member buy-in (Q16–19): first 10, 80%, κ 0.63 (0.19 to 0.99); last 10, 98%, κ 0.88 (−0.62 to 0.99); overall, 83%, κ 0.44 (CI lower bound 0.09)
Checklist outcomes (Q20–22): first 10, 87%, κ 0.52 (0.09 to 0.95); last 10, 93%, κ 0.86 (−0.68 to 0.99); overall, 94%
Surgical best practices (Q23–25): first 10, 100%, κ –‡; last 10, 100%, κ –; overall, 100%
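The abstract also reports weighted κ scores. For ordinal items such as the 1–5 buy-in ratings, a linearly weighted κ penalises disagreements in proportion to their distance on the scale, so a 4 vs 5 split counts less against agreement than a 1 vs 5 split. A minimal sketch, not the authors' implementation, with invented ratings:

```python
def linear_weighted_kappa(r1, r2, categories):
    """Linearly weighted Cohen's kappa for ordinal ratings.

    Disagreements are weighted by their distance on the scale,
    w(i, j) = |i - j| / (k - 1), and kappa = 1 - observed/expected
    weighted disagreement."""
    k, n = len(categories), len(r1)
    idx = {c: i for i, c in enumerate(categories)}
    # observed joint rating proportions
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        obs[idx[a]][idx[b]] += 1 / n
    # marginal proportions for each observer
    p1 = [sum(row) for row in obs]
    p2 = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    w = lambda i, j: abs(i - j) / (k - 1)
    observed = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))
    expected = sum(w(i, j) * p1[i] * p2[j] for i in range(k) for j in range(k))
    return 1 - observed / expected

# Hypothetical 1-5 buy-in ratings from the two observers over five cases
o1 = [5, 4, 5, 3, 5]
o2 = [5, 5, 5, 3, 4]
print(round(linear_weighted_kappa(o1, o2, [1, 2, 3, 4, 5]), 3))  # 0.5
```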
