Evaluating Emergency Physicians: Data Envelopment Analysis Approach

Javier Fiallos, MSc(1); Ken Farion, MD(2); Wojtek Michalowski, PhD(1); Jonathan Patrick, PhD(1)

(1) Telfer School of Management, University of Ottawa, Ottawa, ON; (2) Children's Hospital of Eastern Ontario, Ottawa, ON

Abstract

The purpose of this research is to develop an evaluation tool to assess the performance of Emergency Physicians according to such criteria as resource utilization, patient throughput and quality of care. The evaluation is conducted using a mathematical programming model known as Data Envelopment Analysis (DEA). Use of this model does not require the subjective assignment of weights to each criterion, a feature typical of methodologies that rely on composite scores. The DEA model presented in this paper was developed using a hypothetical data set describing a representative set of Emergency Physician profiles. The solution to the model relates the performance of each Emergency Physician to that of the others and to a benchmark. We discuss how such an evaluation tool can be used in practice.

Introduction

The performance and effectiveness of an Emergency Department (ED) has a direct impact both on the quality of patient care and on the efficiency of resource utilization. As the ED is often the entry point to the healthcare system and the point at which stress in the system is most clearly demonstrated through excessive wait times, its operation is highly scrutinized. This has motivated a number of initiatives focused on improving patient flow and quality of care in the ED1,2. While most of these initiatives aim to achieve more efficient workflow processes, needs-based staffing or improved operations, they often underestimate the importance of the performance of care providers (physicians and nurses). This is an important omission, as many ED performance measures are to some extent influenced by how well Emergency Physicians (EPs) function. The importance of accurate measures of EP (and, more generally, physician) performance is highlighted by the common uses of such measures, including identification of areas for improved medical practice, promotion of continuous professional development, and dissemination of identified best practices3.

Physicians' performance is multi-faceted and requires considering a number of heterogeneous factors5. Limiting an assessment to a single criterion skews performance towards the selected measure, often at the expense of other candidate measures. For example, an exclusive focus on reducing the rate of returns to the ED (commonly used as a measure of quality) might motivate EPs to over-treat patients, resulting in a higher cost per patient and a reduced throughput for the department. Developing a multi-criteria evaluation framework presents a considerable challenge, not only when selecting the type and number of performance measures to include but also when assigning a weight to each measure in order to capture its relative importance. Some methodologies assume equal weights, while others assign different weights in an attempt to give the measures deemed most important a higher impact on the final score. The main shortcoming of any weighting scheme is the subjectivity involved in the weight development process6. A goal of this research was to develop an evaluation tool that limits such subjectivity.

Considering the broad scope of possible evaluation frameworks, we focus solely on measures related to the clinical competency of EPs and use criteria such as patient outcomes, timeliness of care, throughput of patients, and the efficient use of resources. The proposed evaluation is carried out using a quantitative mathematical programming model that belongs to the family of Data Envelopment Analysis (DEA) models. DEA models assess how "efficient" each EP is in relation to the other physicians in the sample under consideration. The solution to the model produces a set of scores that characterize the "efficiency" (or lack of it) of each individual EP.

The paper is organized as follows: the next section presents a brief review of the literature on physician performance evaluation. This is followed by a description of the mathematical model that forms the basis for the evaluation tool. A case study is used to illustrate the utility of the model in evaluating the performance of EPs. The paper concludes with a discussion.


Related Work

A number of the evaluations of physician performance reported in the literature focus on measures for assessing the quality of care in relation to selected therapies7,8, while others deal with measures required to jointly evaluate medical and behavioral competences9. The majority of evaluation tools construct a global score by calculating the average of a series of responses to a questionnaire10, by analyzing deviations from threshold values11, or by categorizing Likert-scale responses into satisfactory and unsatisfactory categories12. In some cases, calculating a global score involves subjective weights that capture the relative importance of each measure8. To reduce the bias associated with the development of weights, some authors have proposed the use of DEA models. For example, a number of simple DEA models were used for the assessment of primary care physicians13,14,15. These models use resource utilization measures such as visits to the ED, hospitalizations, laboratory tests, radiology tests and medications, throughput measures such as the number of treated patients, and measures related to cost containment. None of these models included measures related to the quality of care or the timeliness of care. The DEA model proposed here specifically addresses this shortcoming by using measures of resource utilization, timeliness of care, throughput of patients and quality of patient outcomes. The use of these four criteria captures the multi-faceted nature of the EPs' work and, to the best of our knowledge, represents the first attempt at developing a comprehensive tool for evaluating EPs.

Methods

DEA is a mathematical programming model developed to evaluate entities (called Decision Making Units, or DMUs) such as firms, hospitals, police stations, bank branches, stores, and sports teams. DEA applies a benchmarking approach in which the performance of each DMU is compared to the performance of the benchmark DMUs that represent best practice amongst their peers. These best-practice DMUs are also called the efficient DMUs16, and they define the efficient frontier that establishes a benchmark. In DEA terminology, the performance of a DMU is characterized by the relation between the inputs (normally the resources) used by a DMU to complete a task and the outputs of this task (normally measures of performance). For DMUs that are evaluated as inefficient, the solution of a DEA model provides information about the improvements in inputs, outputs or both that would transform the DMU in question into an efficient one.

Solving a DEA model produces a score characterizing the performance of each DMU. The value of this score provides a relative comparison of the DMU to benchmark performance, and its exact interpretation depends on the type of DEA model used. Determining the score requires calculating relative weights for the inputs and outputs. Unlike other methods, in DEA models these weights are calculated automatically by solving a linear programming model16. The entire process is encapsulated in the structure of the DEA model and does not require manual intervention (other than calculating the model's parameters). At the end, efficient DMUs have scores of one while inefficient ones have a score in the [0,1) interval. In layman's terms, the DEA model assigns each DMU the best possible score (by calculating an optimal set of weights), so that the DMU in question achieves its maximal performance.
To graphically illustrate the basic concepts behind DEA, we use a simple example with three DMUs representing three EPs. We consider two inputs (number of ordered laboratory tests and number of requested specialist consults) and one output (number of patients seen), and assume for simplicity that all EPs see the same number of patients (have the same value for the output), so their evaluation may be based solely on a comparison of the inputs. As Figure 1 shows, EP "D" requests the lowest number of consults, while EP "E" orders the lowest number of laboratory tests. These two EPs are considered efficient (they define benchmark performance) according to the DEA model because each operates with the minimum use of resources for at least one of the inputs. Their performance defines an efficient frontier (dotted line in Figure 1) that "envelopes" the performance of EP "F", who is inefficient since for him/her the utilization of each of the inputs can be improved.


Figure 1. Graphical illustration of the DEA model (one output, two inputs).
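To make these mechanics concrete, the following is a minimal sketch (assuming NumPy and SciPy are available) of a basic input-oriented, constant-returns-to-scale DEA model in multiplier form, applied to invented values for EPs "D", "E" and "F". This is a simpler model than the SBM-SWA variant used later in the paper, and the data values and the ccr_score helper are illustrative assumptions only.

```python
# Minimal CCR (input-oriented, multiplier form) DEA sketch -- an illustration
# of how DEA weights are found by linear programming, not the SBM-SWA model
# used in the paper. All data are invented.
import numpy as np
from scipy.optimize import linprog

def ccr_score(X, Y, o, eps=1e-6):
    """Efficiency score of DMU o. X: (n, m) inputs; Y: (n, s) outputs."""
    n, m = X.shape
    s = Y.shape[1]
    # Variables: output weights u (s entries), then input weights v (m).
    c = np.concatenate([-Y[o], np.zeros(m)])     # maximize u.y_o
    A_ub = np.hstack([Y, -X])                    # u.y_j - v.x_j <= 0, all j
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.zeros(s), X[o]])[None, :]  # v.x_o = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(eps, None)] * (s + m))
    return -res.fun

# Two inputs (lab tests, consults), one output (patients seen, equal for all).
X = np.array([[6.0, 2.0],    # EP "D": fewest consults
              [2.0, 6.0],    # EP "E": fewest lab tests
              [5.0, 5.0]])   # EP "F": dominated by a mix of D and E
Y = np.array([[10.0], [10.0], [10.0]])
for o, name in enumerate("DEF"):
    print(name, round(ccr_score(X, Y, o), 3))    # D 1.0, E 1.0, F 0.8
```

The optimal weights are chosen separately for each DMU, which is exactly the "best possible score" property described above: F remains below one because no admissible weighting makes it look as good as D and E.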

Solving the DEA model for these three EPs will provide information regarding the reduction of the input values (number of ordered tests, number of specialist consults) required for EP "F" to become efficient. The arrows originating from point F in Figure 1 represent two improvement opportunities available to F. The horizontal arrow represents improvement by reducing the number of requested specialist consults, and the vertical arrow represents improvement by reducing the number of ordered laboratory tests. Simultaneous improvement of both inputs is also a possibility, so long as it is sufficient to move "F" onto the efficient frontier.

There are a number of DEA models that can be used to evaluate the performance of DMUs. The taxonomy of these models is based on how the achievement of efficiency is calculated. If the model considers only reductions in the inputs, it is called an input-oriented DEA model. If it considers only increases in the outputs, it is called an output-oriented DEA model. Finally, if it considers changes in inputs and outputs simultaneously, it is called a non-oriented DEA model. The formulation of a DEA model may also vary depending on whether returns to scale are assumed to be constant or variable. In our estimation, a DEA model is an appropriate evaluation tool for EPs because it meets the following basic requirements:

• It can simultaneously consider multiple performance measures (inputs and outputs).

• It automatically determines the values of the weights (thereby avoiding bias and subjectivity).

• It compares inefficient DMUs with a benchmark.

One of the common criticisms of any DEA model is that a DMU may be deemed efficient (have a score equal to one) because the model assigns a high weight to an input or output where the DMU performs very well while attaching a low weight to those inputs/outputs where the DMU's performance is poor. Such excessive weight flexibility may produce an "efficient" DMU that is extremely good according to a single or limited set of inputs/outputs and a poor performer according to the rest. Such cases generally contradict expert opinion and weaken the validity of the model. Typically, asymmetric assignment of weights is addressed through the addition of constraints that control the weights' values. However, determining these constraints is difficult and often requires subjective judgment. To avoid subjectivity while limiting flexibility in calculating the weights, the DEA model described here promotes weight symmetry by automatically penalizing DMUs with significantly asymmetric weights, and therefore helps to avoid the situation described above17. A non-oriented DEA model known as the Slacks-Based Measure (SBM) model18 meets the basic requirements for EP performance evaluation. It assumes that the DMUs have control over inputs and outputs, which allows results to be translated into flexible improvement goals that may include reductions in the inputs as well as increases in the outputs.
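For reference, our transcription of the standard SBM formulation from Tone18 (not reproduced verbatim from this paper) is as follows, where $x_o$ and $y_o$ are the input and output vectors of the DMU under evaluation, $X$ and $Y$ are the data matrices, $\lambda$ is the intensity vector, and $s^-$, $s^+$ are the input excesses and output shortfalls:

$$\rho^{*} \;=\; \min_{\lambda,\,s^{-},\,s^{+}} \; \frac{1 - \tfrac{1}{m}\sum_{i=1}^{m} s^{-}_{i}/x_{io}}{1 + \tfrac{1}{s}\sum_{r=1}^{s} s^{+}_{r}/y_{ro}} \quad \text{s.t.} \quad x_{o} = X\lambda + s^{-},\;\; y_{o} = Y\lambda - s^{+},\;\; \lambda,\, s^{-},\, s^{+} \geq 0,$$

with the additional constraint $\sum_{j}\lambda_{j} = 1$ under the variable-returns-to-scale assumption adopted below. A DMU is SBM-efficient ($\rho^{*} = 1$) exactly when it has no input excess and no output shortfall.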


For the purposes of this research, the model is further specialized by implementing the variable-returns-to-scale assumption and by including a penalizing mechanism that promotes Symmetric Weight Assignment (SWA). This extended model, referred to as the SBM-SWA model, was used to create an evaluation tool for assessing the performance of the EPs. The structure of the model and an explanation of its components are given in the Appendix.

Case Study

We tested the SBM-SWA model by using it to assess the performance of hypothetical EPs working in the ED of a mid-size hospital. We assumed that the ED sees approximately 70,000 patient visits per year and is staffed by 20 full-time-equivalent EPs. We also assumed that the work patterns of all the EPs are similar and that they work shifts throughout the entire year, ensuring that each of them manages a similar patient mix and that each EP can manage every complaint. Of all possible reasons for ED visits, we assume the EPs are evaluated only on presentations that can be managed by any staff member. In this context it is expected that the EPs defining the benchmark are able to manage patients more effectively, with better patient outcomes, using fewer resources than the inefficient ones. The first step in developing the SBM-SWA model requires defining the performance measures. The limiting factors that influence the selection of these measures are outlined below:

• Data: in the ED setting, the availability of patient-specific and EP-specific data defines the type and granularity of measures that can be used.

• Expected variation in EPs' performance: only performance measures that sufficiently discriminate between the physicians should be included.

• Degree of control the EPs have over the measures: there are situations where a physician's performance depends to a large extent on factors outside his/her control, and thus measures over which an EP's level of control is low should be excluded.

When using DEA, it is important to establish a proper number of DMUs in relation to the number of inputs and outputs. It is recommended that this number be equal to or greater than max{m·s, 3(m + s)}, where m is the number of inputs and s is the number of outputs13. Since 20 EPs populate the hypothetical ED, up to three inputs and three outputs can be used in this case study (with m = s = 3, max{9, 18} = 18 ≤ 20). These inputs and outputs are calculated for each individual EP from aggregated yearly data, and each EP is evaluated on the same set of presenting complaints. The subscript "j" associated with each measure below refers to the j-th EP.

Input measures:

Average Encounter Time per Patient Visit (EP_TIME_PATj): this measure represents the amount of time the EP spends with a patient and is associated with effective patient management. EP_TIME_PATj is calculated as the number of minutes between a patient's first medical assessment and the final disposition decision. It is a sensible proxy for the length of stay of a patient in the ED (one of the useful and reliable measures of ED performance)16. It is important to note that, within the encounter time, the time spent waiting for test results and similar delays is not controlled by the EP and may influence the value of EP_TIME_PATj. However, it is reasonable to assume that all EPs are equally affected by these external factors.

Average Number of Laboratory Tests per Patient Visit (LAB_PATj): this measure captures practice variations in diagnosing a patient, as EPs may diagnose patients from the same DRG while requesting a different number of tests. LAB_PATj is a proxy for resources used, and it is expected that efficient EPs will arrive at a correct diagnosis with fewer tests.

Average Number of Radiology Orders per Patient Visit (RAD_PATj): this measure is similar to LAB_PATj. Radiology orders were distinguished from laboratory tests because of their differing cost.

Output measures:

Rate of Non-Return Patient Visits within 72 hours (RNR72j): a return visit to the ED (for the same presentation) within 72 hours of discharge is considered a reliable measure of patient care. According to a survey of ED medical directors of accredited Canadian programs19, the RNR72j was ranked as one of the most useful indicators of EP performance. It is an output measure because it can be considered a proxy for the quality of care provided by the EP. Because the SBM-SWA model considers higher values of outputs as better, the rate of non-return visits (more being better) is used instead of the commonly used rate of return visits. The value of RNR72j is calculated as the total number of non-return patient visits for EP j divided by the number of patients seen by the same EP.

Average Number of Patient Visits per Worked Hour (PAT_WHj): the volume of patients that each EP sees in a given time interval is a measure of throughput and captures the productivity of each physician. While the value of PAT_WHj is clearly influenced by how busy the ED is, we argue that it is reasonable to assume that all EPs' performance is similarly affected by the variability in congestion.
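To make the measure definitions concrete, the sketch below shows how the five per-EP values could be derived from yearly aggregates. The function and field names are illustrative assumptions, not taken from the paper or from any actual ED information system, and the example numbers are made up.

```python
# Illustrative sketch only: computes the three inputs and two outputs for one
# EP from assumed yearly aggregates. All names and values are hypothetical.
def ep_measures(encounter_minutes_total, visits, lab_orders, rad_orders,
                returns_within_72h, worked_hours):
    return {
        "EP_TIME_PAT": encounter_minutes_total / visits,       # Input 1
        "LAB_PAT": lab_orders / visits,                        # Input 2
        "RAD_PAT": rad_orders / visits,                        # Input 3
        "RNR72": (visits - returns_within_72h) / visits,       # Output 1
        "PAT_WH": visits / worked_hours,                       # Output 2
    }

# Example with made-up numbers for a single EP over one year:
print(ep_measures(encounter_minutes_total=331_500, visits=3_500,
                  lab_orders=2_800, rad_orders=1_400,
                  returns_within_72h=105, worked_hours=1_600))
# -> EP_TIME_PAT ~94.7 min, LAB_PAT 0.8, RAD_PAT 0.4,
#    RNR72 0.97, PAT_WH ~2.19
```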

Data

The data describing the 20 EPs were developed with the help of an expert EP to reflect different physician profiles. Table 1 summarizes the data by showing the rank each EP occupies according to each input and output measure, considered independently (e.g., EP1 is the best-performing EP when RNR72j is considered (rank 1) and the worst-performing when RAD_PATj is considered (rank 20)). In terms of performance, EP6, EP13 and EP20 are overall good performers. Conversely, EP2, EP4 and EP18 are low-performing physicians. However, there are a number of EPs whose performance cannot be easily categorized, and EP1, EP3 and EP8 belong to this group. Note, for example, that EP1 is the best-performing physician according to the RNR72j measure and at the same time the worst performer according to the RAD_PATj measure. Determining the overall performance of such an EP is not a trivial matter and in practice often involves some subjective assessment. We show that this subjectivity can be limited by using the SBM-SWA model.

Table 1. Hypothetical inputs and outputs for the sample of physicians

Physicians   EP_TIME_PATj   LAB_PATj    RAD_PATj    RNR72j      PAT_WHj
             (Input 1)      (Input 2)   (Input 3)   (Output 1)  (Output 2)
EP1          8              6           20          1           13
EP2          19             20          19          9           5
EP3          9              2           6           20          20
EP4          15             17          13          8           17
EP5          11             3           2           18          16
EP6          2              5           1           19          3
EP7          16             15          8           17          4
EP8          6              12          3           11          15
EP9          13             13          12          13          2
EP10         3              4           18          12          19
EP11         4              1           11          7           12
EP12         12             10          5           6           14
EP13         7              9           16          3           1
EP14         14             16          9           4           11
EP15         5              8           15          15          10
EP16         20             18          14          2           18
EP17         17             19          10          5           9
EP18         18             14          17          14          6
EP19         10             11          4           10          7
EP20         1              7           7           16          8


Results

Table 2 summarizes the results of running the SBM-SWA model on the data describing the 20 EPs. It lists all inefficient EPs, whose performance scores are less than one. The efficient EPs (with scores equal to one) are EP6, EP13 and EP20, as expected.

Table 2. Inefficient EPs identified by the SBM-SWA model

Physicians   SBM-SWA score
EP1          0.899
EP2          0.621
EP3          0.745
EP4          0.625
EP5          0.855
EP7          0.721
EP8          0.776
EP9          0.802
EP10         0.733
EP11         0.985
EP12         0.768
EP14         0.719
EP15         0.813
EP16         0.591
EP17         0.691
EP18         0.679
EP19         0.808

The "difficult" cases represented by EP1, EP3 and EP8 were identified by the SBM-SWA model as inefficient (scores of 0.899, 0.745 and 0.776, respectively). These scores can, to some extent, be interpreted as "measuring" how far each EP is from achieving efficiency, with EP1 being the "closest" and thus arguably the best performer in this group. However, knowledge of the score alone is not sufficient to determine what needs to be improved for an EP to become efficient. Such information can be obtained by solving a reformulated SBM-SWA model known as the dual model (this reformulation is straightforward and is a standard procedure for this class of linear programming models). The results obtained for these three EPs are presented in Table 3. Looking at the first row, for EP1, we interpret the results as follows: if the Average Encounter Time per Patient Visit (EP_TIME_PAT1) is reduced by 2.387 minutes, the Average Number of Radiology Orders per Patient Visit (RAD_PAT1) is lowered by 0.037, and the Average Number of Patient Visits per Worked Hour (PAT_WH1) is increased by 0.338, EP1 becomes an efficient physician. Of course, these values should not be considered in an absolute sense, but rather in relation to the other values in Table 3. Thus, it is possible to hypothesize that, where EP_TIME_PATj is concerned, change will be most difficult for EP3 and easiest for EP1, while the differences are less pronounced for the PAT_WHj measure.
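Translating these excesses and shortfalls into concrete targets is simple arithmetic: the target for each input is the observed value minus the excess, and for each output it is the observed value plus the shortfall. In the sketch below, the observed values are invented for illustration (the paper reports only ranks); the excess and shortfall figures are EP1's row from Table 3.

```python
# Hypothetical illustration: turning Table 3 slacks into improvement targets.
observed  = {"EP_TIME_PAT": 95.0, "RAD_PAT": 0.45, "PAT_WH": 2.1}  # assumed
excess    = {"EP_TIME_PAT": 2.387, "RAD_PAT": 0.037}               # Table 3
shortfall = {"PAT_WH": 0.338}                                      # Table 3

targets = {k: observed[k] - excess.get(k, 0.0) + shortfall.get(k, 0.0)
           for k in observed}
# Given the assumed observed values, EP1 becomes efficient at ~92.6 min per
# visit, ~0.41 radiology orders per visit, and ~2.44 patients per worked hour.
```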


Table 3. Improvement opportunities for EPs 1, 3 and 8

             Excess          Excess      Excess      Shortfall   Shortfall
             (Input 1)       (Input 2)   (Input 3)   (Output 1)  (Output 2)
             EP_TIME_PATj    LAB_PATj    RAD_PATj    RNR72j      PAT_WHj
EP1          2.387           0.000       0.037       0.000       0.338
EP3          13.221          0.000       0.025       0.027       0.838
EP8          7.410           0.171       0.004       0.000       0.645

Discussion

In this paper we demonstrated that the SBM-SWA variant of the DEA model provides valuable insights for assessing the performance of EPs. The assessment conducted using hypothetical data representative of a medium-size ED allows us to state that:

• We were able to consider heterogeneous performance measures in order to arrive at a categorization of the EPs as efficient or inefficient.

• All evaluations were conducted without the need to subjectively assign relative weights to each individual input and output.

• We were able to control (by limiting excessive weight flexibility) for unwanted compensatory behavior, where poor performance on one measure is compensated for by very good performance on another.

• We were able not only to identify which EPs are inefficient but also to provide guidance regarding the relative weak points of each inefficient EP.

An evaluation tool based on the SBM-SWA model can be used to design personalized improvement plans for each EP. Information about the excessive use of inputs and inadequate outputs can be used to define performance improvement goals (and priorities) for each EP and can become part of continuous improvement initiatives. Since there is no need to define a priori weights for each performance measure, no bias is introduced into the evaluation. This enhances the validity of the evaluation tool and increases the chances of a positive response from underperforming EPs.

Acknowledgement

Research described in this paper was partially funded by the IBM Centre for Business Analytics and Performance at the Telfer School of Management, University of Ottawa.

References

1. Eitel DR, Rudkin SE, Malvehy MA, Killeen JP, Pines JM. Improving service quality by understanding emergency department flow: A white paper and position statement prepared for the American Academy of Emergency Medicine. J Emerg Med. 2010;38(1):70-9.
2. Kelly AM, Bryant M, Cox L, Jolley D. Improving emergency department efficiency by patient streaming to outcomes-based teams. Aust Health Rev. 2007;31(1):16-21.
3. Holmboe ES. Assessment of the practicing physician: Challenges and opportunities. Journal of Continuing Education in the Health Professions. 2008;28(1):4-10.
4. Wharam F, Frank M, Rosland A, Paasche-Orlow M, Farber N, Sinsky C, Rucker L, Rask K, Barry M, Figaro K. "Pay for performance" as a quality improvement tool: perceptions and policy recommendations of physicians and program leaders. Quality Management in Health Care. 2011;20(3):234-245.
5. Dubinsky I, Jennings K, Greengarten M, Brans A. 360-degree physician performance assessment. Healthcare Quarterly. 2010;13(2):71-76.
6. Coste J, Fermanian J, Venot A. Methodological and statistical problems in the construction of composite measurement scales: A survey of six medical and epidemiological journals. Stat Med. 1995;14(4):331-45.
7. Glickman SW, Schulman KA, Peterson ED, Hocker MB, Cairns CB. Evidence-based perspectives on pay for performance and quality of patient care and outcomes in emergency medicine. Annals of Emergency Medicine. 2008;51(5):622-631.
8. Hess BJ, Weng W, Lynn LA, Holmboe ES, Lipner RS. Setting a fair performance standard for physicians' quality of patient care. Journal of General Internal Medicine. 2011;26(5):467-473.
9. Hall W, Violato C, Lewkonia R, Lockyer J, Fidler H, Toews J, Jennett P, Donoff M, Moores D. Assessment of physician performance in Alberta: The physician achievement review. CMAJ. 1999;161(1):52-57.
10. Smith C, Varkey A, Evans A, Reilly B. Evaluating the performance of inpatient attending physicians: A new instrument for today's teaching hospitals. Journal of General Internal Medicine. 2004;19(7):766-771.
11. Weber RS, Lewis CM, Eastman SD, Hanna EY, Akiwumi O, Hessel AC, Lai S, Kian L, Kupferman M, Roberts D. Quality and performance indicators in an academic department of head and neck surgery. Archives of Otolaryngology - Head and Neck Surgery. 2010;136(12):1212-1218.
12. Goulet F, Jacques A, Gagnon R, Bourbeau D, Laberge D, Melanson J, Ménard C, Racette P, Rivest R. Performance assessment: Family physicians in Montreal meet the mark! Canadian Family Physician. 2002;48:1337-1344.
13. Ozcan YA. Physician benchmarking: Measuring variation in practice behavior in treatment of otitis media. Health Care Management Science. 1998;1(1):5-17.
14. Chilingerian JA, Sherman HD. Benchmarking physician practice patterns with DEA: A multi-stage approach for cost containment. Annals of Operations Research. 1996;67:83-116.
15. Collier DA, Collier CE, Kelly TM. Benchmarking physician performance, part 1. Journal of Medical Practice Management. 2006;21(4):185-189.
16. Cooper WW, Seiford LM, Tone K. Introduction to Data Envelopment Analysis and Its Uses. Springer, 2006.
17. Dimitrov S, Sutton W. Promoting symmetric weight selection in Data Envelopment Analysis: A penalty function approach. European Journal of Operational Research. 2010;200(1):281-288.
18. Tone K. A slacks-based measure of efficiency in Data Envelopment Analysis. European Journal of Operational Research. 2001;130(3):498-509.
19. Hung GR, Chalut D. A consensus-established set of important indicators of pediatric emergency department performance. Pediatric Emergency Care. 2008;24(1):9-15.


Appendix

SBM-SWA Model Used in the Study

The structure of the model is as follows:

$$\max \quad \xi - \beta \sum_{l \neq k} z^{-}_{lk} - \beta \sum_{l \neq k} z^{+}_{lk} \qquad (1)$$

subject to:

$$\xi + v x_{o} - u y_{o} - \pi_{o} = 1 \qquad (2)$$

$$-vX + uY + \pi_{o} \leq 0 \qquad (3)$$

$$v \geq \frac{1}{m}\left[1/x_{o}\right] \qquad (4)$$

$$u \geq \frac{\xi}{s}\left[1/y_{o}\right] \qquad (5)$$

$$v_{l} - v_{k} \leq z^{-}_{lk} \quad \text{for } l \neq k \qquad (6)$$

$$v_{k} - v_{l} \leq z^{-}_{lk} \quad \text{for } l \neq k \qquad (7)$$

$$u_{l} - u_{k} \leq z^{+}_{lk} \quad \text{for } l \neq k \qquad (8)$$

$$u_{k} - u_{l} \leq z^{+}_{lk} \quad \text{for } l \neq k \qquad (9)$$

The above model solves for the variables v, u, z-, z+, ξ and π. The variables v and u are the weights calculated for each of the m inputs (represented by x) and the s outputs (represented by y), respectively. The variable π is used to implement the assumption of variable returns to scale. The objective function (1) maximizes the score while penalizing asymmetric weight assignments through the terms βz- and βz+. The symmetry scaling factor β can be used to further reduce the score when asymmetric weights are present. Constraints (2) and (3) put bounds on the variables and ensure that ξ always has a value less than or equal to one. Constraints (4) and (5) ensure that the v and u variables are positive. Constraints (6) and (7) ensure that the variable z- captures the absolute value of the difference between each pair of input weights, while (8) and (9) serve the same purpose for each pair of output weights.
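Because constraint (5) is linear in ξ and u, the whole model is a linear program. The sketch below is our reading of the Appendix model (assuming SciPy's linprog and an analyst-chosen β), not the authors' implementation; π is handled as a free variable, and the smoke-test data are invented.

```python
# Hedged sketch of the SBM-SWA multiplier model as a linear program.
import numpy as np
from scipy.optimize import linprog

def sbm_swa_score(X, Y, o, beta=0.1):
    """SBM-SWA score of DMU o. X: (n, m) inputs; Y: (n, s) outputs;
    beta is the symmetry scaling factor from objective (1)."""
    n, m = X.shape
    s = Y.shape[1]
    nzm, nzs = m * (m - 1), s * (s - 1)   # one z per ordered pair l != k
    nvar = 2 + m + s + nzm + nzs          # layout: [xi, pi, v, u, z-, z+]
    c = np.zeros(nvar)
    c[0] = -1.0                           # maximize xi ...
    c[2 + m + s:] = beta                  # ... minus beta * sum of all z
    # (2): xi + v.x_o - u.y_o - pi = 1
    A_eq = np.zeros((1, nvar))
    A_eq[0, 0], A_eq[0, 1] = 1.0, -1.0
    A_eq[0, 2:2 + m] = X[o]
    A_eq[0, 2 + m:2 + m + s] = -Y[o]
    rows = []
    # (3): -v.x_j + u.y_j + pi <= 0 for every DMU j
    for j in range(n):
        r = np.zeros(nvar)
        r[1] = 1.0
        r[2:2 + m] = -X[j]
        r[2 + m:2 + m + s] = Y[j]
        rows.append(r)
    # (5): u_q >= (xi/s)(1/y_oq), rewritten as xi/(s*y_oq) - u_q <= 0
    for q in range(s):
        r = np.zeros(nvar)
        r[0] = 1.0 / (s * Y[o, q])
        r[2 + m + q] = -1.0
        rows.append(r)
    # (6)-(9): v_l - v_k <= z-_lk and v_k - v_l <= z-_lk (same for u, z+)
    def pair_constraints(w_off, w_len, z_off):
        idx = 0
        for l in range(w_len):
            for k in range(w_len):
                if l == k:
                    continue
                for sgn in (1.0, -1.0):
                    r = np.zeros(nvar)
                    r[w_off + l], r[w_off + k] = sgn, -sgn
                    r[z_off + idx] = -1.0
                    rows.append(r)
                idx += 1
    pair_constraints(2, m, 2 + m + s)             # input weights -> z-
    pair_constraints(2 + m, s, 2 + m + s + nzm)   # output weights -> z+
    # Bounds: xi and pi free (variable returns to scale); (4) floors v.
    bounds = ([(None, None), (None, None)]
              + [(1.0 / (m * X[o, i]), None) for i in range(m)]
              + [(0.0, None)] * (s + nzm + nzs))
    res = linprog(c, A_ub=np.array(rows), b_ub=np.zeros(len(rows)),
                  A_eq=A_eq, b_eq=np.ones(1), bounds=bounds)
    return -res.fun   # objective (1): xi minus the symmetry penalty

# Smoke test on made-up data (3 DMUs, 2 inputs, 1 output):
X = np.array([[6.0, 2.0], [2.0, 6.0], [5.0, 5.0]])
Y = np.array([[10.0], [10.0], [10.0]])
print([round(sbm_swa_score(X, Y, o), 3) for o in range(3)])
```

Note that each DMU requires a separate solve, since $x_o$ and $y_o$ appear in the constraints; an actual evaluation of the 20 EPs would simply loop over the DMUs.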

