JOURNAL OF SURGICAL RESEARCH 51, 186-191 (1991)
Resident Evaluation of Surgical Faculty¹

ANTHONY J. TORTOLANI, M.D.,*,² DONALD A. RISUCCI, PH.D.,* AND ROBERT J. ROSATI, PH.D.†

*Department of Surgery, North Shore University Hospital, Cornell University Medical College, and †Department of Quality Assurance, North Shore University Hospital, Manhasset, New York 11030

Submitted for publication November 20, 1990

¹ Presented at the Annual Meeting of the Association for Academic Surgery, Houston, TX, November 14-17, 1990.
² To whom reprint requests should be addressed at Department of Surgery, North Shore University Hospital, 300 Community Drive, Manhasset, NY 11030.

This study examines the association between resident evaluations of surgical faculty and measures of teaching activity and practice characteristics. Twenty-three of 33 general surgery residents anonymously and voluntarily rated 62 faculty surgeons in 10 areas of performance. This was repeated 1 year later with 24 of 28 residents and 64 surgeons. Ratings were reliable (test-retest on overall mean rating, r = 0.91, P < 0.01). Factor analysis of ratings by residents in postgraduate year (PGY) 5 extracted a judgment factor and an interpersonal skills factor. Ratings by other PGYs were unidimensional. Faculty subgroups that performed more than two major procedures per week at the hospital (P < 0.01), attended the weekly Mortality and Morbidity conference more than once per month (P < 0.05), and/or published one or more research reports (P < 0.05) received higher ratings than those who did not. Among general surgeons (n = 35), ratings of interpersonal skill by PGY 5 residents correlated inversely with complication rate (r = -0.41, P < 0.06), and overall ratings by PGYs 3 and 4 correlated inversely with mean patient length of stay (LOS; r = -0.44, P < 0.05). These remained significant after statistical adjustment for case mix differences and differences in the amount of contact with residents. Resident ratings were unrelated to mean patient age, the mean number of diagnoses per patient, and the mean number of procedures per patient. Surgeons who published one or more research reports in 1 year performed a greater number of major procedures (128 ± 79 vs 75 ± 55; P < 0.01) than those who had not published. Results suggest that excellence in patient care and activity in teaching and research, although not necessarily related, characterize the superior surgical educator. © 1991 Academic Press, Inc.

INTRODUCTION

Current trends in the health care delivery system heighten the importance of physician evaluation, particularly at institutions which educate medical students and residents [1]. Evaluation in these settings can help ensure the reinforcement of excellence, decrease educational inadequacies, and identify areas of fiscal inefficiency. A growing number of institutions evaluate physicians by reviewing practice patterns derived from data collected in medical records and/or quality assurance/utilization review departments. These efforts aim at defining objective indicators of the quality and efficiency of clinical care and typically focus on measures such as length of stay (LOS), mortality, and complication rates. In academic medical centers, where the education of future professionals must be considered side-by-side with the efficiency and quality of care, physicians are expected to be excellent educators and supervisors in addition to being efficient providers of quality health care. Thus, it seems essential to evaluate these skills in physicians and to understand their relation to indicators of the quality and efficiency of the clinical care they provide.

Among the methods available for evaluation of educators and supervisors, ratings by students and subordinates are used and studied extensively in a variety of organizational settings [2, 3], including medical schools [4, 5] and hospitals [6, 7]. Ratings can provide quantification of a subjective level of assessment that cannot be obtained solely from test scores or reviews of educational background and experience. In surgery, where a central component of the educational process involves close supervision of the novice by the expert, subjective assessments have inherent importance. However, very few studies have used resident ratings to evaluate attending faculty surgeons. The relation of such subjective measures to objective measures of teaching activity and practice patterns has not been studied. The current study hypothesizes that resident evaluations of faculty surgeons are related to the amount of teaching activity and to characteristics of the practice patterns in which the faculty members engage. It is further proposed that integration of data such as these can be useful in academic decision making, feedback, and faculty development.

METHOD


Subjects

Sixty-two faculty surgeons were evaluated in June 1988. They were distributed by specialty as follows: general (n = 33), colon-rectal (n = 3), urologic (n = 8), pediatric (n = 5), cardiothoracic (n = 6), vascular (n = 4), oncologic (n = 1), and those affiliated with a health maintenance organization (HMO, n = 2). Among these, there were four full-time general surgeons, four full-time cardiothoracic surgeons, and two full-time vascular surgeons. The remaining voluntary surgeons were active in the general surgery residency program. The same 62 faculty members were evaluated again in June 1989, as were two additional general surgeons on the voluntary staff.

Rating Procedures

In June 1988, 23 of 33 general surgery residents volunteered to anonymously rate the faculty surgeons in the same 10 areas of performance on which faculty surgeons have traditionally evaluated the residents in our program: technical ability, basic science knowledge, clinical knowledge, judgment, peer relations, patient relations, reliability, industry, personal appearance, and reaction to pressure. A form was used which listed the names of each surgeon in a separate row on the left side of the page and the 10 areas of evaluation across the top of the page. This resulted in a 62 × 10 matrix into which ratings were to be entered. Each resident rated each surgeon on a 1 to 5 scale as follows: (1) poor, (2) fair, (3) good, (4) above average, and (5) excellent. Residents entered a zero (missing value) to indicate when they could not evaluate the surgeon. The form also asked the resident to indicate his/her postgraduate year (PGY). The distribution of raters by PGY was as follows: PGY 1 (n = 9), PGY 2 (n = 6), PGY 3 (n = 2), PGY 4 (n = 3), and PGY 5 (n = 3). Approximately 3 months later, each faculty member received an individualized report which listed all the ratings he/she received and the means and standard deviations of each item across all 62 surgeons. This procedure was repeated 1 year later, when 24 of 28 residents completed the rating form. They were distributed by PGY as follows: PGY 1 (n = 11), PGY 2 (n = 3), PGY 3 (n = 6), PGY 4 (n = 1), and PGY 5 (n = 3).

Measures of Teaching Activity

Teaching activity in the 1988-1989 academic year was measured by recording, for each faculty member, the number of (a) major operative procedures performed with residents, (b) Grand Rounds conferences attended, (c) Mortality and Morbidity conferences (M&M) attended, and (d) research reports published in the scientific literature in 1989. Descriptive statistics concerning teaching activity are presented in Table 1.
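For readers who wish to assemble comparable teaching-activity counts, the per-surgeon tally can be scripted once operative logs, conference sign-in records, and publication lists exist in tabular form. The sketch below is illustrative only; the file names, column names, and pandas-based approach are assumptions of ours, not part of the original study, which compiled these counts from departmental records.

```python
# Illustrative sketch: tally teaching-activity measures per faculty member.
# File and column names are hypothetical; the study compiled these counts
# from departmental operative logs, conference attendance, and publication lists.
import pandas as pd

operations = pd.read_csv("operative_log_1988_89.csv")      # columns: surgeon_id, date, resident_present
grand_rounds = pd.read_csv("grand_rounds_attendance.csv")  # columns: surgeon_id, conference_date
mm_conf = pd.read_csv("mm_attendance.csv")                 # columns: surgeon_id, conference_date
papers = pd.read_csv("publications_1989.csv")              # columns: surgeon_id, citation

teaching = pd.DataFrame({
    "n_major_operations": operations[operations["resident_present"]]
        .groupby("surgeon_id").size(),
    "n_grand_rounds": grand_rounds.groupby("surgeon_id").size(),
    "n_mm_conferences": mm_conf.groupby("surgeon_id").size(),
    "n_publications": papers.groupby("surgeon_id").size(),
}).fillna(0).astype(int)

print(teaching.describe())  # means, SDs, and ranges comparable to Table 1
```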

Measures of Practice Characteristics

Practice characteristics for each faculty member during the 1988-1989 academic year included (a) mean patient age, (b) mean number of diagnoses per patient, (c) mean number of procedures per patient, (d) mean patient LOS, (e) the percentage of the surgeon's patients who experienced complications that were presented at M&M, and (f) full-time vs voluntary faculty status. Descriptive statistics concerning practice characteristics are presented in Table 1.

Data Analysis

The data analyses focused on psychometric characteristics of the rating scales, including stability and factor structure, and on the interrelationships among resident ratings, measures of teaching activity, and practice patterns. Twelve of the 62 faculty surgeons received ratings of zero (cannot evaluate) from 50% or more of the residents in each of the 10 evaluation areas. All of these surgeons were voluntary, low-volume general surgeons and were eliminated from all analyses. For each surgeon, a mean was computed of ratings in each of the 10 areas by PGYs 1 and 2 and by PGYs 3 and 4. The ratings by the PGY 5 residents were also averaged for each of the 10 areas. A mean rating was then computed across the 10 areas as rated by each of the three PGY groups. This resulted in one mean rating for each surgeon from PGYs 1 and 2 combined (PGY 1/2), PGYs 3 and 4 combined (PGY 3/4), and all PGY 5s (PGY 5). The ratings were averaged within these PGY categories because of two considerations. First, since the residency program includes both a categorical (5-year) and a preliminary (1- or 2-year) component, and since the majority of residents in PGYs 1 and 2 are in the preliminary component, it seemed important to study this group separately from the remaining groups. Second, since there are considerable differences between PGY 5 (chief residency year) and previous years in demands and responsibilities, as well as judgment and knowledge acquisition, we decided to analyze ratings by chief residents separately. Further, this distinction among the three groups of raters was supported by visual inspection of rating distributions across PGY levels and provided the advantage of representing at least three raters in each group. A final overall mean rating was calculated for each surgeon as the mean of PGY 1/2, PGY 3/4, and PGY 5.
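The aggregation described above is straightforward to reproduce. The sketch below is a minimal illustration, assuming the ratings are stored in a long-format table with one row per resident-surgeon-item rating; the variable names and the pandas implementation are ours, not the authors'.

```python
# Illustrative sketch of the rating aggregation: zeros ("cannot evaluate")
# are treated as missing, ratings are averaged within PGY groups, and an
# overall mean is formed from the three group means. Column names are hypothetical.
import numpy as np
import pandas as pd

ratings = pd.read_csv("resident_ratings_1988.csv")        # columns: surgeon_id, pgy, item, rating
ratings["rating"] = ratings["rating"].replace(0, np.nan)  # 0 = cannot evaluate

def pgy_group(pgy):
    if pgy <= 2:
        return "PGY 1/2"
    if pgy <= 4:
        return "PGY 3/4"
    return "PGY 5"

ratings["group"] = ratings["pgy"].apply(pgy_group)

# Mean per surgeon, group, and item, then mean across the 10 items per group.
per_item = ratings.groupby(["surgeon_id", "group", "item"])["rating"].mean()
per_group = per_item.groupby(level=["surgeon_id", "group"]).mean().unstack("group")

# Overall mean rating = mean of the three group means.
per_group["overall"] = per_group[["PGY 1/2", "PGY 3/4", "PGY 5"]].mean(axis=1)
print(per_group.head())
```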

RESULTS

Rating Scale Characteristics

The stability of the ratings was estimated by calculating the Pearson product-moment correlation (r) between ratings from June 1988 and from June 1989. These correlations ranged from r = 0.56 to r = 0.90 on individual items and were all significant at P < 0.01. The correlation between the overall mean rating in 1988 and 1989 was r = 0.91, P < 0.001. Stability was also evaluated by comparing means in each of the 10 areas across years. None of the means differed by more than 0.10 on the 1 to 5 scale, and none of the differences was statistically significant, as determined by repeated measures multivariate analysis of variance.
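As a point of reference, the test-retest estimate reported here is an ordinary Pearson correlation between the two administrations. A minimal sketch follows, assuming the 1988 and 1989 overall mean ratings have already been computed per surgeon (as in the aggregation sketch above); the data layout is hypothetical.

```python
# Illustrative test-retest reliability check: Pearson r between the overall
# mean rating each surgeon received in June 1988 and in June 1989.
# The DataFrame layout is an assumption for illustration.
import pandas as pd
from scipy.stats import pearsonr

overall = pd.read_csv("overall_mean_ratings.csv")  # columns: surgeon_id, overall_1988, overall_1989
paired = overall.dropna(subset=["overall_1988", "overall_1989"])

r, p_value = pearsonr(paired["overall_1988"], paired["overall_1989"])
print(f"test-retest r = {r:.2f}, P = {p_value:.4f}")  # reported value: r = 0.91, P < 0.001
```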

The factor structure of the ratings was examined by principal components analysis of each of the three sets of mean ratings on the 10 evaluation items. Only the PGY 5 ratings resulted in a solution with two factors. Ratings by PGY 1 and 2 and by PGY 3 and 4 residents could be explained by one general factor. Analysis of ratings by PGY 5 residents resulted in two discernible factors, which were labeled (i) Judgment (42.8% explained variance) and (ii) Interpersonal Skills (35.3% explained variance), based on interpretation of the factor pattern matrix. A score for Judgment was computed for each surgeon as the mean rating on the items technical ability, basic science knowledge, clinical knowledge, judgment, and reaction to pressure. A score for Interpersonal Skills was computed as the mean on the items peer relations, patient relations, and personal appearance. Each of these items had a factor loading of at least 0.70 on the factor to which it was assigned and a loading not greater than 0.50 on any other factor.

Analyses of ratings in relation to teaching activity and practice characteristic variables are presented in the following sections. These analyses are based on the ratings PGY 1/2, PGY 3/4, and the Judgment and Interpersonal Skills factors from PGY 5 ratings.
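The factor extraction can be approximated with an off-the-shelf principal components routine. The sketch below is a simplified, hypothetical version of such an analysis (unrotated components via scikit-learn); the original analysis interpreted a factor pattern matrix, so exact loadings would depend on the rotation and software used.

```python
# Illustrative principal components analysis of the 10 evaluation items,
# here applied to the PGY 5 mean ratings. Data layout and variable names
# are assumptions; loadings and rotation details differ from the original analysis.
import pandas as pd
from sklearn.decomposition import PCA

items = ["technical ability", "basic science knowledge", "clinical knowledge",
         "judgment", "peer relations", "patient relations", "reliability",
         "industry", "personal appearance", "reaction to pressure"]

pgy5 = pd.read_csv("pgy5_mean_ratings.csv")   # one row per surgeon, one column per item
X = pgy5[items].dropna()

pca = PCA(n_components=2)
pca.fit(X)

print("explained variance (%):", (100 * pca.explained_variance_ratio_).round(1))
weights = pd.DataFrame(pca.components_.T, index=items, columns=["factor_1", "factor_2"])
print(weights.round(2))   # items weighting heavily on one component define the scale
```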

TABLE 1
Descriptive Statistics for Entire Sample in Academic Year 1988-89

Variable                                   Mean    Standard deviation    Range
Teaching activities
  Number of major operations               94.0          68.9            8-280
  Number of grand rounds attended          18.0          12.2            0-35
  Number of M & M conferences attended     15.2          17.1            0-43
  Number of publications                    0.9           1.8            0-11
Practice characteristics
  Average patient age                      53.3          16.0            5.6-71.6
  Average number of diagnoses/pt.           4.1           1.2            2.1-7.3
  Average number of procedures/pt.          3.2           1.3            1.2-7.8
  Average length of stay/pt.                9.0           2.3            2.0-34.5

Teaching Activity and Resident Ratings

The distribution of each of the teaching activity variables was positively skewed and bimodal. For analytic purposes, each variable was dichotomized at a point on the frequency histogram where visual inspection suggested that two subgroups may be distinguished. For the number of major procedures, the cutoff point chosen was 112 operations, which equals approximately two major procedures per week.

For attendance at both M&M and Grand Rounds conferences, the cutoff point was 12 conferences, or approximately one conference per month. Since the mode of the distribution of the number of research publications was zero, two groups were formed: surgeons with zero publications and those with one or more publications. Differences in mean ratings across the faculty subgroups defined by each of these variables were significant and are summarized in Table 2.

TABLE 2
T Tests: Teaching Activity Group Differences on Resident Ratings

                                   Overall PGY 1 & 2    Overall PGY 3 & 4    PGY 5 Judgment    PGY 5 Interpersonal
Groups                        N    Mean     SD          Mean     SD          Mean    SD        Mean    SD

Research activities
  0 publications             33    3.7*     0.6         3.9      0.6         3.3     0.6       3.4     0.6
  1+ publications            19    4.1      0.6         3.9      0.6         3.7     0.7       3.4     0.7

M & M attendance
  12 or less/year            30    3.7*     0.6         3.9      0.6         3.3     0.6       3.3     0.6
  More than 12/year          22    4.1      0.5         4.2      0.6         3.5     0.8       3.5     0.7

Grand Rounds attendance
  12 or less/year            15    3.6      0.7         3.8      0.7         3.3     0.6       3.2     0.4
  More than 12/year          37    3.9      0.5         4.1      0.6         3.4     0.7       3.4     0.7

Major operations
  112 or less/year           37    3.7**    0.6         3.9**    0.6         3.2**   0.7       3.3     0.7
  More than 112/year         15    4.2      0.4         4.4      0.4         3.9     0.6       3.5     0.4

* P < 0.05. ** P < 0.01.
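The subgroup comparisons in Table 2 are two-sample t tests on dichotomized teaching-activity variables. A hedged sketch of that style of analysis is shown below; the cutoffs follow the text, but the data frame, column names, and use of Welch's correction are assumptions made for illustration.

```python
# Illustrative sketch: dichotomize teaching-activity measures at the cutoffs
# described in the text and compare mean resident ratings with t tests.
# Variable names are hypothetical; Welch's correction is a choice made here.
import pandas as pd
from scipy.stats import ttest_ind

df = pd.read_csv("faculty_summary.csv")  # one row per surgeon: ratings + teaching counts

groupings = {
    "n_major_operations": 112,   # > 112/year ~ two major procedures per week
    "n_mm_conferences": 12,      # > 12/year  ~ one conference per month
    "n_grand_rounds": 12,
    "n_publications": 0,         # 0 vs one or more publications
}

for var, cutoff in groupings.items():
    low = df.loc[df[var] <= cutoff, "overall_pgy12"]
    high = df.loc[df[var] > cutoff, "overall_pgy12"]
    t, p = ttest_ind(high, low, equal_var=False)   # Welch t test
    print(f"{var:20s}  mean(low)={low.mean():.2f}  mean(high)={high.mean():.2f}  "
          f"t={t:.2f}  P={p:.3f}")
```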

Practice Characteristics and Resident Ratings

Analyses of the relation between practice characteristics and resident ratings were limited to data from the general surgeons in the faculty sample. This subsample was selected because of its size (n = 35), the problems imposed by the large variability in patient characteristics among surgeons in the various specialties, and the small number of surgeons available to represent each specialty. Scatterplots were examined and correlations were computed between each of the practice characteristic variables and the ratings PGY 1/2, PGY 3/4, and Judgment and Interpersonal Skills. These correlations are presented in Table 3. Statistically significant correlations were obtained between PGY 1/2 and PGY 3/4 with LOS and between interpersonal skills ratings and complication rate.

In order to determine whether the associations of ratings with LOS and complication rates were mediated by case mix differences among surgeons, partial correlation analysis was used. Some of the correlations between ratings and complication rate were significant even after partialing out the number of procedures per patient, as were some of the correlations between ratings and LOS after partialing out the expected LOS (ELOS; based on statewide analyses [8] for each diagnosis-related group). Table 4 presents the results of the partial correlation analyses. Ratings were unrelated to the average age of patients, the average number of diagnoses per patient, and the average number of procedures per patient.

In order to determine whether the associations of ratings with LOS and complication rates were mediated by the amount of contact with residents, the four measures associated with contact with residents (total number of operations with resident involvement, attendance at Grand Rounds, attendance at M&M, and number of research publications) were partialed out of the analyses. With these four variables partialed out, LOS correlated negatively with all four rating measures (highest r = -0.47, P < 0.05 with PGY 3/4), as did the total number of complications (highest r = -0.45, P < 0.05 with PGY 5 ratings of interpersonal skills) and the total number of deaths due to complications (highest r = -0.48, P < 0.05 with PGY 5 ratings of interpersonal skills).

TABLE 3
Correlations between Ratings and General Surgeons' Practice Characteristics

Ratings              Average LOS    Average patient age (yrs.)    Average no. of diagnoses/pt.    Average no. of procedures/pt.    Complication rate
Overall PGY 1 & 2      -0.32*           -0.22                         -0.03                            0.03                          -0.26
Overall PGY 3 & 4      -0.44**          -0.20                         -0.13                           -0.12                          -0.23
Judgment               -0.08            -0.03                         -0.05                            0.05                          -0.17
Interpersonal          -0.20             0.02                         -0.18                           -0.26                          -0.46**

* P < 0.05. ** P < 0.01.

TABLE 4
Partial Correlations of General Surgeons' Ratings with LOS and Complication Rate

                       Adjusted for patient characteristics       Adjusted for amount of contact with residents
Ratings                Average LOS(a)    Complication rate(b)     Average LOS    Complication rate
Overall PGY 1/2          -0.32              -0.27                   -0.19            -0.28
Overall PGY 3/4          -0.44*             -0.14                   -0.47*           -0.28
Judgment                 -0.11              -0.18                   -0.07            -0.16
Interpersonal            -0.16              -0.38                   -0.28            -0.45*

(a) Partialing out the effects of expected length of stay.
(b) Partialing out the effects of average procedures/patient.
* P < 0.05.
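Partial correlations of this kind can be computed by correlating the residuals of each variable after regressing out the covariate. The sketch below illustrates the ELOS adjustment in this spirit; it is a generic first-order partial correlation with hypothetical variable names, not necessarily the authors' exact procedure.

```python
# Illustrative first-order partial correlation: the association between a
# rating and mean LOS after removing the linear effect of expected LOS (ELOS).
# Variable and file names are hypothetical.
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

def partial_corr(x, y, covariate):
    """Correlate the residuals of x and y after regressing each on the covariate."""
    design = np.column_stack([np.ones_like(covariate), covariate])
    resid_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    resid_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return pearsonr(resid_x, resid_y)

df = pd.read_csv("general_surgeons.csv")  # columns: overall_pgy34, mean_los, expected_los
r, p = partial_corr(df["overall_pgy34"].to_numpy(dtype=float),
                    df["mean_los"].to_numpy(dtype=float),
                    df["expected_los"].to_numpy(dtype=float))
print(f"partial r = {r:.2f}, P = {p:.3f}")   # compare Table 4, adjusted for ELOS
```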

Practice Characteristics and Teaching Activity

The association between teaching activity and practice characteristics was evaluated through cross-tabulation analyses and t tests. Among full-time faculty members, 70% (n = 7) published at least one research report in 1989, compared with 29% (n = 12) of the voluntary staff (P < 0.05). Faculty members who had published research in 1989 performed significantly more major procedures (M ± SD; 127.6 ± 78.5 vs 74.8 ± 55.2; P < 0.01) and also had a higher number of diagnoses per patient (4.7 ± 1.3 vs 3.9 ± 1.0; P < 0.05) and a higher number of procedures per patient (4.0 ± 1.6 vs 2.8 ± 0.9) than those who had not published.
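The cross-tabulation of faculty status against publication activity amounts to a 2 × 2 contingency-table test. The sketch below shows one common way to run such a test; the counts are reconstructed from the percentages and group sizes reported above, and the choice of Fisher's exact test (rather than whatever test the authors used) is our own illustrative assumption.

```python
# Illustrative 2 x 2 cross-tabulation: faculty status vs publication of at
# least one research report in 1989. Counts are reconstructed from the
# percentages in the text (7 of 10 full-time, about 12 of 42 voluntary);
# the use of Fisher's exact test here is an assumption, not the authors' stated method.
from scipy.stats import fisher_exact

#                 published   not published
table = [[7, 3],    # full-time faculty (70% of 10)
         [12, 30]]  # voluntary faculty (about 29% of 42)

odds_ratio, p_value = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.2f}, P = {p_value:.3f}")
```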

DISCUSSION

The results of this study support the hypothesis that resident evaluations of surgical faculty are directly associated with the amount of teaching and research activity in which faculty members engage and with practice characteristics which are frequently regarded as indicators of the quality and efficiency of patient care. The apparent association among these variables has implications for the recruitment and evaluation of faculty surgeons. The data suggest that excellence in patient care, teaching, and research tend to co-occur in superior surgical educators. This implies that in the recruitment of surgical faculty for a general surgery residency program, a wise strategy would include evaluation of candidates with respect to the amount of teaching, research, and surgery in which they engage. Further, the results imply that the ongoing evaluation of surgical faculty should integrate resident evaluations with assessments of teaching activity, research productivity, and the quality and efficiency of patient care. The integration of these data in feedback to faculty surgeons offers the advantage of combining subjective and objective data on which to base behavioral change [9]. Further, this process is likely to be perceived positively by residents [10] and to avoid some of the problems associated with interview methods [11].

Analyses of the psychometric properties of the resident ratings demonstrate that mean ratings of surgical faculty by general surgery residents, grouped according to PGY, are highly reliable and efficient. Further, these analyses identified two factors which underlie ratings by chief residents; one factor includes variables related to judgment, knowledge, and technical ability, while a second factor includes variables related to interpersonal skill. It is significant to note that these factors differed in their relation to the teaching and practice variables studied. Judgment, not interpersonal skill, was rated significantly higher in the group of surgeons who performed a relatively large number of major procedures. However, interpersonal skill, not judgment, was correlated with complication rate.

It may be hypothesized from these results that perceptions by chief residents of a faculty member's judgment and technical skill are influenced primarily by the amount to which the faculty member operates with and directly teaches the residents, whereas perceptions by chief residents of a faculty member's interpersonal skills may be influenced primarily by factors associated with the quality of patient care. Viewed from a different perspective, comparable ratings of interpersonal skill and judgment in an individual surgeon may be associated with a desirable balance between operative volume and patient care.

In integrating the results of this study, it is important to consider its limitations. First, all resident raters were volunteers. The extent to which ratings would have differed if all residents had volunteered or were required to complete the rating form is unknown. Second, the relatively small sample size, combined with the unequal representation of surgical specialties, limits the power of the study and the generalizability of the results. Third, the study defined teaching activity and practice characteristics in terms of a limited subset of measures. For example, the measurement of research productivity did not include presentations at national and regional conferences, nor did it distinguish between basic science and clinical research or include any evaluation of the quality of the research. Lastly, it could be argued that LOS and complication rates are only superficial measures of the quality and efficiency of patient care. Our use of statewide data to control for case mix differences partially compensates for this problem as it relates to LOS. Nevertheless, in the absence of more sophisticated measures, our use of these indicators seemed important and justifiable, if only because they are in such common use at virtually all teaching hospitals in the country.

The associations between resident ratings and the teaching and practice variables differed across PGYs. For example, the difference in mean rating between faculty members who published one or more papers and those who did not was significant only for PGY 1 and 2 raters. These discrepancies are difficult to explain; however, they may be related to differences in the number of raters in the three PGY groups.

The apparent reliability and validity of resident evaluations in this study suggest that the continued development of methods for measuring resident perceptions of the surgical faculty should be supported. This may include rater education to reduce error and bias, experimentation with different rating formats, and evaluation of the use of ratings in faculty recruitment and systematic feedback to surgical faculty.

REFERENCES

1. Gewertz, B. L. Association for Surgical Education, Presidential Address: The surgical educator, an endangered species? Surgery 97(6): 728, 1985.
2. Saal, F. E., Downey, R. G., and Lahey, M. A. Rating the ratings: Assessing the psychometric quality of rating data. Psychological Bulletin 88(2): 413, 1980.
3. Borman, W. C. Evaluating performance effectiveness on the job: How can we generate more accurate ratings? In J. S. Lloyd (Ed.), Evaluation of Noncognitive Skills and Clinical Performance. Chicago: American Board of Medical Specialties, 1982. Pp. 179-193.
4. Postuma, R. Evaluation of surgical faculty and surgical undergraduate teaching. Can. J. Surg. 28(4): 351, 1985.
5. Irby, D. M., Gillmore, G. M., and Ramsey, P. G. Factors affecting ratings of clinical teachers by medical students and residents. J. Med. Educ. 62: 1, 1987.
6. Risucci, D. A., Tortolani, A. J., and Ward, R. J. Ratings of surgical residents by self, supervisors and peers. Surg. Gynecol. Obstet. 169: 519, 1989.
7. Downing, S. M., English, D. C., and Dean, R. E. Resident ratings of surgical faculty: Improved teaching effectiveness through feedback. Am. Surg. 49: 329, 1983.
8. Leiken, A. M., Sexton, T. R., Nolan, A., Mitchell, A., Dlugacz, Y., and Rosati, R. J. Targeting and reducing excessive hospital stays. Unpublished manuscript, State University of New York at Stony Brook, 1990.
9. Ende, J. Feedback in clinical medical education. JAMA 250(6): 777, 1983.
10. Anwar, R. A. H., Bosk, C., and Greenberg, G. Resident evaluation. J. Surg. Res. 30: 27, 1981.
11. Dean, K., Anderson, W., and Dean, R. A low-threat, high compliance faculty evaluation system. Focus Surg. Educ. 9(4): 13, 1989.
