practice

Photographic assessment of burn size and depth: reliability and validity
Objective: The aim of this study was to examine the reliability and validity of using photographs of burns to assess both burn size and depth.
Method: Fifty randomly selected photographs taken on day 0–1 post burn were assessed by seven burn experts and eight referring physicians. Inter-rater reliability in both groups (experts vs. referrers) was calculated. The validity of burn size assessment was calculated using live assessment as the gold standard, and that of burn depth using clinical assessment in combination with laser Doppler imaging as the gold standard. The validity of the photographically assessed indication for surgery was calculated using laser Doppler imaging and actual treatment as the gold standard. Finally, agreement in referral indication was calculated.
Results: Using photographs, burn size could be assessed reliably and validly by experts (ICCs of 0.83 and 0.87), but not by referrers (ICCs of 0.68 and 0.78). Photographic assessment of burn depth was neither reliable nor valid, with ICCs of 0.38 and 0.28, respectively, for experts and 0.24 and 0.13 for referrers. The indication for surgery could not be assessed validly either. Agreement between assessors regarding referral indication was low.
Conclusion: Burn size, but not burn depth, can be assessed reliably and validly by experts using photographs of the burn wound. We recommend exploring other forms of telemedicine, such as live interactive video, to investigate whether these lead to improved burn depth assessment where clinical assessment is not possible.
Declaration of interest: There were no external sources of funding for this study. The authors have no conflicts of interest to declare with regard to the manuscript or its content.


The initial assessment of burn size and depth is important. Firstly, accurate assessment of burn size, expressed as the percentage of total body surface area (TBSA) burned, determines whether transfer to a burn centre is necessary, as well as the need for, and amount of, initial intravenous fluid resuscitation. Secondly, early assessment of burn depth determines the optimal wound treatment, whether surgical or conservative. The initial assessment of a burn wound, however, is often difficult, and its accuracy depends on the experience of the assessor.1,2 Burn size is clinically assessed using the Lund and Browder chart, the rule of nines and/or the hand rule.3–5 Inexperienced assessors often estimate burn size incorrectly, which leads to significant errors in fluid resuscitation and to costly over-triage.6–10 Burn depth is also assessed clinically; the diagnosis relies on an evaluation of several wound features, such as appearance, capillary refill and pain sensibility.1 Inaccurate burn depth assessment can lead to a suboptimal choice of topical treatment, and to unnecessary surgery or an unnecessary delay in surgery.7,11–15 Because the assessment of burn size and depth is primarily a visual skill8 and the number of burn experts is limited, telemedicine is an attractive diagnostic tool to investigate in burn care.

Telemedicine has numerous definitions, all relating to the practice of medicine at a distance. Its use in burn care has been explored in the past decade with three basic systems: live interactive video, store-and-forward images/photographs, and telephone.8,16 Telemedicine in burn care aims to improve burn diagnosis and to support treatment choices (including referral). The focus of this study was on the diagnostic abilities of telemedicine in burn care. As with all diagnostics, reliability and validity are important.17 Reliability is the degree to which the assessment by telemedicine is free from measurement error.18 Validity is the degree to which an instrument truly measures the construct(s) it purports to measure.18 Several studies have already investigated the reliability and validity of burn assessment with different telemedicine systems.11,19–22 However, to our knowledge, the reliability of photographic assessment of burn size has not yet been studied. Two studies analysed the criterion validity of photographic and video burn size assessment, respectively, and found a high correlation between telemedicine assessment and live assessment by burn experts.11,19 Regarding photographic burn depth assessment, Boccara et al. reported on the inter-rater reliability,

© 2014 MA Healthcare Ltd

M.J. Hop,1–3 MD; C.M. Moues,3 MD, PhD; K. Bogomolova,1 BSc; M.K. Nieuwenhuis,4 PhD; I.M.M.H. Oen,5 MD; E. Middelkoop,2,6 PhD; R.S. Breederveld,7,8 MD, PhD; M.E. van Baar,1 PhD
1 Association of Dutch Burn Centres, Maasstad Hospital, Rotterdam, the Netherlands; 2 Department of Plastic, Reconstructive and Hand Surgery, MOVE Research Institute, VU University Medical Centre, Amsterdam, the Netherlands; 3 Department of Plastic and Reconstructive Surgery, Medical Centre Leeuwarden, the Netherlands; 4 Association of Dutch Burn Centres, Martini Hospital, Groningen, the Netherlands; 5 Burn Centre, Maasstad Hospital, Rotterdam, the Netherlands; (affiliations continued below)

Key words: burn; wound diagnostics; telemedicine; photographs

Journal of Wound Care, Vol 23, No 3, March 2014

Journal of Wound Care. Downloaded from magonlinelibrary.com by 138.025.078.025 on July 22, 2016. For personal use only. No other uses without permission. . All rights reserved.

where 75% agreement was found between three experienced surgeons.20 The criterion validity of photographic burn depth assessment has been reported in several studies. Two studies found poor to good agreement (κ ranging from 0.33 to 0.60) between photographic and live assessment by an experienced physician, with the highest agreement among experienced burn surgeons.21,22 Two other studies assessed burn depth on a two-point scale: one demonstrated an accuracy of 94% for photographic compared with live assessment,11 and the other, with three experienced observers, a sensitivity of 77% and a specificity of 75% for photographic assessment compared with initial live diagnosis and actual treatment.20 Despite the knowledge gained so far on diagnosing burn wounds using telemedicine, the reliability of burn size and burn depth assessment by telemedicine has not been studied conclusively. Furthermore, the number of observers was often small; inter-observer bias can be reduced by increasing the number of observers. The gold standard in all studies that calculated the validity of burn depth assessment was also suboptimal: the accuracy of live assessment of burn depth is limited, even among experienced assessors (50–75%).1,12,13 None of the studies used live assessment in combination with laser Doppler imaging as the gold standard, although this combination has proven to be a far more accurate predictor of burn depth/wound healing time.23 The evidence on the reliability and validity of telemedicine in burn diagnosis is therefore limited. Hence, the aim of this study was to examine the reliability and validity of photographic assessment of burn size and burn depth by both burn centre experts and referring physicians, in order to determine whether this form of telemedicine can be used as a diagnostic tool to support therapeutic decisions.
Reliability and validity were determined by calculating intraclass correlation coefficients (ICCs), where an ICC of ≥0.70 is considered acceptable and an ICC of >0.80 is considered good.24 The hypotheses were: 1) the inter-rater reliability and validity of photographic burn size assessment by burn experts are high (ICC >0.80) and higher than in referring physicians; 2) the inter-rater reliability and validity of photographic burn depth assessment by burn experts are acceptable (ICC ≥0.70) and higher than in referring physicians.
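To make these ICC criteria concrete, the sketch below computes a single-measure ICC from a two-way ANOVA in plain Python. The study does not state which ICC model was used, so this assumes the two-way random-effects, absolute-agreement form (Shrout and Fleiss ICC(2,1)). The standard error of measurement (SEM) is taken here as the square root of the rater plus residual variance, one common agreement-based definition; this too is an assumption, not the authors' stated method. The names `icc_2_1`, `classify` and `TABLE2` are hypothetical. The usage loop applies the stated thresholds to the point estimates reported in Table 2.

```python
import math
from statistics import fmean


def icc_2_1(ratings: list[list[float]]) -> tuple[float, float]:
    """Single-measure ICC(2,1) and an agreement-based SEM from a two-way ANOVA.

    ratings[i][j] is the score given to photograph i by rater j.
    Assumes a complete table: every rater scored every photograph.
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 photographs and >= 2 raters")

    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]                       # per photograph
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]  # per rater

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between photographs
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    # Residual sum of squares; clamp tiny negative values caused by rounding.
    ss_error = max(ss_total - ss_rows - ss_cols, 0.0)

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    icc = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    # Agreement SEM: rater variance (truncated at 0) plus residual variance.
    var_raters = max((ms_cols - ms_error) / n, 0.0)
    sem = math.sqrt(var_raters + ms_error)
    return icc, sem


def classify(icc: float) -> str:
    """Apply the thresholds stated in the text: >0.80 good, >=0.70 acceptable."""
    if icc > 0.80:
        return "good"
    if icc >= 0.70:
        return "acceptable"
    return "not acceptable"


# Point estimates from Table 2, classified against the thresholds.
TABLE2 = {
    "burn size, experts, reliability": 0.83,
    "burn size, experts, validity": 0.87,
    "burn size, referrers, reliability": 0.68,
    "burn size, referrers, validity": 0.78,
    "burn depth, experts, reliability": 0.38,
    "burn depth, experts, validity": 0.28,
    "burn depth, referrers, reliability": 0.26,
    "burn depth, referrers, validity": 0.13,
}
for label, value in TABLE2.items():
    print(f"{label}: ICC {value:.2f} -> {classify(value)}")
```

Because hypothesis 1 sets a stricter bar for burn size (ICC >0.80), a value such as 0.78 is "acceptable" by the general threshold but still falls short of that hypothesis.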

Methods
Study design and population
Photographs taken on day 0 or 1 post-burn were used in this study. They were randomly selected from a recent trial,25 using a random number table,26 and represented a variety of burn patients with a TBSA ≤20%. Ethical approval was obtained to permit the use of the photographs for this study (protocol T2013/13). A total of 50 cases were selected, in accordance with previous clinimetric studies.24 As in daily practice, photographs of lesser quality and of non-debrided wounds were also included. Each case was presented in printed form and included at least one photograph of the burn and basic clinical information (Fig 1). All observers received a standard explanation of the assessment of burn depth and size, in which burn depth was categorised as superficial partial thickness, deep partial thickness or full thickness. Superficial partial thickness burns have the potential to heal within 14 days, whereas deep partial thickness burns do not heal within 21 days, and full thickness burns do not heal spontaneously at all.27 Burn size was defined as the percentage of TBSA burned, using the Lund and Browder chart.4 The assessors in this study were 7 burn experts (burn physicians/surgeons with >5 years of experience) from the Dutch Burn Centres and 8 referring physicians (2 emergency room physicians, 2 plastic surgeons and 4 plastic surgery/surgery residents) from one general hospital. The assessors scored (Fig 1):
• burn size
• burn depth of a specific (marked) area
• indication for surgery of the marked area
• indication for referral of the patient to a burn centre.
The gold standard used for burn size was the first live assessment of percentage TBSA burned by one experienced burn physician (5 different burn experts in total), using the Lund and Browder chart,

6 Association of Dutch Burn Centres, Red Cross Hospital, Beverwijk, the Netherlands; 7 Department of Surgery/Burn Centre, Red Cross Hospital, Beverwijk, the Netherlands; 8 Department of Surgery, Leiden University Medical Centre, Leiden, the Netherlands. Email: baarm@maasstadziekenhuis.nl

Fig 1. Patient 32: left forearm and hand
Post burn day: 1; Female; Age: 21; Aetiology: fire
Assessment
___% TBSA of complete wound
BURN DEPTH of marked area (choose one): ☐ superficial partial thickness ☐ deep partial thickness ☐ full thickness
INDICATION FOR SURGERY of marked area? ☐ yes ☐ no
INDICATION FOR REFERRAL of patient? ☐ yes ☐ no


Characteristic | n=50 (%)
Male/female | 26/24 (52/48)
Age (years), mean (SD; range) | 29 (24; 0–87)
  0–4 | 11 (22)
  5–17 | 7 (14)
  18–40 | 16 (32)
  41–60 | 11 (22)
  61–80 | 4 (8)
  >80 | 1 (2)
% TBSA burned (live assessment), mean (SD; range) | 3.2 (2.47; 0.5–10.0)
  0.5–1.0 | 12 (24)
  >1–2.0 | 12 (24)
  >2–3.0 | 5 (10)
  >3–4.0 | 8 (16)
  >4–5.0 | 6 (12)
  >5–10.0 | 7 (14)
Aetiology
  Fire/flame | 19 (38)
  Scald | 17 (34)
  Oil | 11 (22)
  Contact | 2 (4)
  Steam | 1 (2)
Location
  Head/neck | 3 (6)
  Trunk | 5 (10)
  Arm | 24 (48)
  Hand | 11 (22)
  Leg/foot | 7 (14)
Depth according to clinical assessment + LDI
  Healing potential within 14 days | 26 (52)
  Healing potential 14–21 days | 1 (2)
  Healing potential >21 days | 23 (46)
Depth according to biopsy*
  Deep partial thickness | 9 (18)
  Full thickness | 8 (16)
  No biopsy | 33 (66)
*Biopsy results were available for surgically treated patients only.

Table 2. Burn size (% TBSA burned) and burn depth: reliability and validity
Assessor group | Inter-rater reliability, ICC (95% CI) | Validity, ICC (95% CI)
Burn size (gold standard: live assessment by burn experts at first presentation)
  Burn centre experts (n=7) | 0.83 (0.76–0.89) | 0.87 (0.84–0.89)
  Referring physicians (n=8) | 0.68 (0.56–0.78) | 0.78 (0.74–0.82)
Burn depth (gold standard: clin-LDI)
  Burn centre experts (n=7) | 0.38 (0.25–0.53) | 0.28 (0.18–0.38)
  Referring physicians (n=8) | 0.26 (0.16–0.39) | 0.13 (0.03–0.22)
Missing values: reliability of burn size, experts 2/50, referrers 2/50; validity of burn size, experts 2/350, referrers 3/400; reliability of burn depth, experts 4/50 (one expert had 16 missing items and was therefore excluded from the reliability analysis), referrers 2/50; validity of burn depth, experts 20/350, referrers 4/400.

the rule of nines and/or the palm rule.3–5 The gold standard used for burn depth was live assessment by an experienced burn centre physician in combination with laser Doppler imaging (between 48 hours and 5 days post-burn; moorLDI2-BI, Moor Instruments). Laser Doppler imaging in combination with clinical assessment (clin-LDI) is the most accurate predictor of wound healing time, with an accuracy of ≥95%.1,23 Possible LDI results are a healing potential (HP) of less than 14 days, an HP of 14–21 days and an HP of more than 21 days. Furthermore, biopsy results, available from surgically treated burn wounds in our original trial, were used to create a 3-point scale that matched the photographic assessment: for burns with an HP of more than 21 days, the biopsy results were used to differentiate between deep partial thickness and full thickness wounds. The gold standard used for the indication for surgery was the clin-LDI result, dichotomised as HP ≤21 days and HP >21 days. Additionally, the final treatment choice (surgery/no surgery) was used as a comparator, because not all specialists consider an HP of >21 days as the cut-off point, and the size of the wound and the overall patient status also influence the treatment choice.

Statistical analysis
The inter-observer reliability of photographic assessment of burn size and depth was measured within the expert group and within the referrer group by calculating ICCs for single assessments with a two-way ANOVA. For burn size, the standard error of measurement (SEM) was also calculated. To analyse criterion validity in the expert and referrer groups, the results of photographic burn assessment were compared with the gold standards by calculating ICCs. The Bland–Altman method was used to compare burn size assessment by the observers with the gold standard. Positive predictive values, sensitivity and specificity of burn depth assessment were calculated. To analyse the validity of treatment choices in the expert and referrer groups, the photographic assessment of treatment choice (surgery/no surgery) was analysed by calculating sensitivity and specificity. Agreement on the photographically made referral choice within the expert and referrer groups was measured by calculating ICCs.

Results
Patient and injury characteristics
Fifty-two percent of the included patients were male, with a mean age of 29 years (range 0–87). The body regions most frequently burned were the arms and hands (48% and 22%, respectively). The main causes of the burns were fire/flame and scalds (38% and 34%, respectively). The mean percentage TBSA burned, according to live assessment at presentation, was 3.2 (range 0.5–10.0). According to the clinical assessment in combination with laser Doppler imaging, 52% of the photographed burns had an HP of <14 days, 2% an HP of 14–21 days and 46% an HP of >21 days (Table 1).


Table 1. Patient and injury characteristics, including gold standards


The inter-rater reliability of photographic burn size assessment by the 7 burn experts was good: ICC 0.83 (95% confidence interval (CI) 0.76–0.89), SEM 0.93% TBSA. The inter-rater reliability of photographic burn size assessment by the 8 referring physicians was not acceptable: ICC 0.68 (95% CI 0.56–0.78), SEM 1.24% TBSA. The validity of photographic burn size assessment, compared with live assessment by a burn expert, was good for the burn experts, with an ICC of 0.87 (95% CI 0.84–0.89), and significantly higher than for the referring physicians, whose ICC was 0.78 (95% CI 0.74–0.82) (Table 2). As shown in Fig 2, both under- and overestimation of burn size occurred, and estimation errors were larger among referrers. In burn experts, but not in referrers, a trend was seen between years of experience and the validity of burn size assessment (Fig 3).
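The Bland–Altman comparison summarised in Fig 2 plots, for each photograph, the difference between the observer's and the gold-standard %TBSA against their mean, with horizontal lines at the mean difference (bias) and at the limits of agreement. A minimal sketch, assuming one paired estimate per photograph and ±2 SD limits as stated in the Fig 2 legend (1.96 SD is the other common convention). The function name `bland_altman` and the five %TBSA pairs are hypothetical illustrations, not study data.

```python
from statistics import fmean, stdev


def bland_altman(
    observer: list[float], gold: list[float], k: float = 2.0
) -> tuple[list[tuple[float, float]], float, tuple[float, float]]:
    """Return (mean, difference) points, the bias and the limits of agreement.

    Differences are observer minus gold standard. k=2.0 matches the
    '±2 SD' limits in the Fig 2 legend.
    """
    if len(observer) != len(gold) or len(observer) < 2:
        raise ValueError("need at least two paired assessments")
    diffs = [o - g for o, g in zip(observer, gold)]
    means = [(o + g) / 2 for o, g in zip(observer, gold)]
    bias = fmean(diffs)
    sd = stdev(diffs)  # sample SD of the differences
    return list(zip(means, diffs)), bias, (bias - k * sd, bias + k * sd)


# Hypothetical %TBSA estimates for five photographs (not study data).
observer = [2.0, 4.5, 1.0, 6.0, 3.0]
gold = [1.5, 4.0, 1.0, 7.0, 2.5]
points, bias, (lower, upper) = bland_altman(observer, gold)
print(f"bias {bias:+.2f}% TBSA, limits of agreement {lower:.2f} to {upper:.2f}")
```

A positive bias indicates that the observer overestimates %TBSA on average; points outside the limits flag individual photographs with large estimation errors.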

Fig 2B. Bland–Altman plot of the difference in TBSA (observer minus gold standard) against the mean TBSA per photograph – Referrers. [Plot: x-axis, average TBSA burned (%), 0–10; y-axis, observer minus gold standard TBSA (%), −10 to 10.] The horizontal lines indicate the mean difference and the upper and lower limits of agreement (±2 SD).

The photographic assessment of burn depth was not reliable. A low ICC was found between experts (0.38 [95% CI 0.25–0.53]), with an even lower ICC between referrers (0.24 [95% CI 0.14–0.37]). The validity of burn depth assessment compared with clin-LDI was low, with an ICC of 0.28 (95% CI 0.18–0.38) for experts and 0.13 (95% CI 0.03–0.22) for referrers (Tables 2 and 3). The positive predictive value of a photographic diagnosis of deep partial or full thickness burn was 55.4% for experts and 51.9% for referrers. In burn experts, the sensitivity of photographic recognition of deep partial thickness and full thickness burns (when the clin-LDI HP was >21 days) was 83.8% (95% CI 79.8–87.7), and the specificity (photographic recognition of superficial dermal burns when the clin-LDI HP was ≤21 days) was 40.9% (95% CI 35.6–46.2). In referrers, sensitivity was 66.8% (95% CI 62.2–71.5) and specificity 46.2% (95% CI 41.3–51.1). In other words, in both groups the most common error in photographic assessment was overestimation of burn depth. Referrers also underestimated the depth of many wounds (Table 3). No relation between years of experience and validity was seen. Because the low ICCs for burn depth validity might have resulted from unequal 3-point scales in the photographic assessment and the gold standard (superficial partial thickness, deep partial thickness and full thickness vs LDI HP <14 days, HP 14–21 days and HP >21 days), we aimed to create equal scales by combining two gold standards, clin-LDI and biopsy results, into one new gold standard. The validity of photographic assessment compared with the combined gold standard remained low, with an ICC of 0.37 (95% CI 0.25–0.54) in experts and an ICC of 0.19 (95% CI 0.09–0.29) in referrers.
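The burn depth sensitivity, specificity and positive predictive values above follow directly from the cross-tabulated counts in Table 3 once both scales are dichotomised: a photographic rating of deep partial or full thickness counts as "positive", and a clin-LDI HP of >21 days as the reference "positive". The sketch below recomputes the point estimates under that reading; the variable and function names are illustrative, and confidence intervals are not reproduced because the method used for them is not stated.

```python
from dataclasses import dataclass

# Counts from Table 3: rows = photographic rating, columns = clin-LDI
# healing potential (HP <14 days, HP 14-21 days, HP >21 days).
EXPERTS = {
    "superficial partial thickness": (69, 3, 25),
    "deep partial thickness": (88, 4, 94),
    "full thickness": (12, 0, 35),
}
REFERRERS = {
    "superficial partial thickness": (93, 5, 61),
    "deep partial thickness": (86, 3, 87),
    "full thickness": (25, 0, 36),
}
DEEP = ("deep partial thickness", "full thickness")  # photographic "positive"
SUPERFICIAL = "superficial partial thickness"        # photographic "negative"


@dataclass
class DiagnosticSummary:
    sensitivity: float  # percent
    specificity: float  # percent
    ppv: float          # percent


def summarise(table: dict[str, tuple[int, int, int]]) -> DiagnosticSummary:
    """Dichotomise at HP >21 days and compute sensitivity, specificity and PPV."""
    tp = sum(table[r][2] for r in DEEP)                # rated deep, HP >21
    fp = sum(table[r][0] + table[r][1] for r in DEEP)  # rated deep, HP <=21
    fn = table[SUPERFICIAL][2]                         # rated superficial, HP >21
    tn = table[SUPERFICIAL][0] + table[SUPERFICIAL][1] # rated superficial, HP <=21
    return DiagnosticSummary(
        sensitivity=100 * tp / (tp + fn),
        specificity=100 * tn / (tn + fp),
        ppv=100 * tp / (tp + fp),
    )


for name, table in (("experts", EXPERTS), ("referrers", REFERRERS)):
    s = summarise(table)
    print(f"{name}: sensitivity {s.sensitivity:.1f}%, "
          f"specificity {s.specificity:.1f}%, PPV {s.ppv:.1f}%")
```

Run as written, it gives sensitivities of 83.8% and 66.8%, specificities of 40.9% and 46.2% and PPVs of 55.4% and 51.9% for experts and referrers, respectively, matching the values reported above. The large false-positive counts (rated deep, HP ≤21 days) are what drive the low specificity, i.e. the overestimation of burn depth.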

Treatment choice
The sensitivity of photographic assessment of the indication for surgery in experts, compared with clin-LDI, was 49.7% (95% CI 44.2–55.2, n=73/147), and the specificity was 82.4% (95% CI 78.2–86.5, n=140/170). In referring physicians, sensitivity (39.9% [95% CI 35.1–44.7, n=73/183]) and specificity (76.2% [95% CI 72.0–80.4, n=163/214]) were even lower. All wounds with a clin-LDI prediction of ≤21 days (n=37) healed without surgery. Of the wounds with a predicted healing potential of >21 days (n=23), 20 were treated surgically and 3 healed without surgery. Using the actual treatment choice as the reference test, sensitivity and specificity were 52.0% (95% CI 46.5–57.5) and 80.5% (95% CI 76.2–84.9), respectively, in burn experts, and 42.1% (95% CI 37.3–47.0) and 76.1% (95% CI 71.0–80.2), respectively, in referrers.

Fig 2A. Bland–Altman plot of the difference in TBSA (observer minus gold standard) against the mean TBSA per photograph – Experts. [Plot: x-axis, average TBSA burned (%), 0–10; y-axis, TBSA burn centre specialist minus TBSA gold standard (%), −10 to 10.]

Table 3. Photographic burn depth assessment (rows) against clin-LDI healing potential (columns)
Photographic assessment | HP <14 days | HP 14–21 days | HP >21 days | Total
Experts (n=7)
  Superficial partial thickness | 69 | 3 | 25 | 97
  Deep partial thickness | 88 | 4 | 94 | 186
  Full thickness | 12 | 0 | 35 | 47
  Total | 169 | 7 | 154 | 330*
Referrers (n=8)
  Superficial partial thickness | 93 | 5 | 61 | 159
  Deep partial thickness | 86 | 3 | 87 | 176
  Full thickness | 25 | 0 | 36 | 61
  Total | 204 | 8 | 184 | 396**
*In total 20 observations missing. **In total 4 observations missing. HP, healing potential in days.

Fig 3. Validity (ICCs) of photographic burn size assessment in experts, per period of experience (>5–10, >10–20 and >20–35 years; 2, 2 and 3 observers per group, respectively). [Plot: y-axis, intraclass correlation coefficient; x-axis, years of experience.]

Discussion

This was the first study into the clinimetrics of telemedicine in burn care with multiple observers and an accurate gold standard for the assessment of burn depth. We showed that photographic assessment by burn experts was a reliable and valid method to assess burn size. Contrary to our hypothesis, the reliability and validity of photographic burn depth assessment were not acceptable. Likewise, the need for surgery could not be assessed validly using photographs. Experts more often observed an indication for referral than referrers, but the agreement within both the expert and the referrer groups was low.

Similar to a previous study on the photographic assessment of burn size,11 experts in our study could make a valid and reliable assessment of burn size, which potentially improves referral and resuscitation choices. Whereas the previous study included only one observer, several burn experts (n=7) participated in the current study, which strengthens our conclusions. An additional observation in our study was the trend between years of burn care experience and validity in burn experts. Jones et al. reported similar findings on photographic assessment of burn depth by assessors of different experience levels: poor validity was found in junior nurses and good validity in plastic surgeons.22 It is important to realise that years of experience play a role in the use of telemedicine.

For a reliable and valid assessment of burn depth, visual observation of a photograph alone is inadequate. Our results were not in accordance with some previously presented positive results of photographic burn depth assessment.20–22 Although the results for experts were better than for referrers, neither reliability nor validity was acceptable in either group. In accordance with Boccara et al., most errors in our study were due to overestimation of burn depth.20 These results resemble studies on 'live' clinical assessment of burn depth, in which the most common error was also overestimation.1,12,13 Since the optimal method of early burn depth assessment has not yet been established, several gold standards were used in our study. Ours was the first telemedicine study to use clinical assessment in combination with LDI results as a gold standard. Possibly, the earlier positive results of photographic burn depth assessment were caused by the use of live assessment by a burn expert as the gold standard at presen-
