Annals of the Royal College of Surgeons of England (1992) vol. 74, 80-84

The challenge of evaluating surgical procedures

Gordon M Stirrat MD FRCOG
Professor of Obstetrics and Gynaecology

John Farndon MD FRCS
Professor of Surgery

Stephen C Farrow MD MRCP
Senior Lecturer, Health Care Evaluation Unit

Nuala Dwyer MB BChir MRCOG
Research Registrar

Departments of Obstetrics and Gynaecology, Epidemiology and Public Health Medicine and Surgery, University of Bristol

Key words: Surgical procedures; Evaluation; Randomised controlled trials

All new interventions and procedures must be properly assessed in comparison to the currently accepted method(s). It is unethical not to do so. The optimum method is by Randomised Controlled Trial (RCT). This is ideally suited to the testing of drugs because the trial can usually be double blind and placebo controlled. RCTs are less commonly used for the evaluation of new surgical techniques. There are valid and invalid reasons for this and these are discussed.

The medical profession and its allied industries continually try to improve ways of preventing or treating ill-health. This activity is laudable given that many therapeutic interventions and procedures may be only partially or temporarily effective, have unwanted side-effects and are more costly than we would wish. New remedies of proven efficacy should be welcomed, but certain absolute principles should govern their introduction. The foremost of these is that all new interventions must be properly compared with the currently accepted method(s). This is not an optional extra; it is a fundamental ethical imperative.

The first RCT, inspired by Bradford Hill and conducted by the Medical Research Council (MRC), was reported in 1948 (1) and evaluated streptomycin in the treatment of pulmonary tuberculosis. This was soon followed by a placebo-controlled, double-blind trial in which the MRC examined the use of antihistamines in the treatment of the common cold (2). During the next 30 years Cochrane (3) became the standard bearer of this methodology and more recently Dudley (4) has argued that properly constructed controlled trials are necessary for the advancement of clinical knowledge.

The question has recently been thrown into sharper relief by the increasing advocacy of endoscopic surgery by a variety of enthusiasts (5). In gynaecological surgery, for example, endoscopic techniques for endometrial ablation are being advocated as a safer and cheaper option than hysterectomy (6-10). Pleas for proper evaluation before widespread introduction (11) have been questioned (12) because it 'stands to reason' that the new techniques will be 'better'. Reason is an insufficient arbiter and better evidence of benefit is necessary. In a directly analogous context, 'open versus transurethral resection of the prostate', Roos et al. (13) and Wennberg (14) demonstrated that transurethral prostatectomy was less effective than open surgery in overcoming urinary obstruction, and raised the possibility that the endoscopic procedure may result in higher long-term mortality. They recommended that the two operations be subjected to a prospective clinical study. This example of a belated call emphasises the need for proper assessment of all new surgical techniques.

Correspondence to: Professor G M Stirrat, Department of Obstetrics and Gynaecology, Bristol Maternity Hospital, Southwell Street, Bristol BS2 8EG

Evaluation of therapeutic interventions

A Lancet leading article (15) discussed 'a hierarchy in the cogency of evidence that may be adduced in defence of medical interventions'. The levels of this hierarchy have been amplified and ranked by Iain Chalmers (16) as follows:

1 Clinical impressions. These have no place in the context of proper evaluation. 'Impressions about the quantifiable are no substitute for measurement' (17).

2 Case reports and case series without controls. These can seldom be the basis for evaluating effects of interventions because they are prone to selection bias and the rate of false inferences may be as high as 50% (18). Their main role is to stimulate a properly controlled study.

3 Case studies with controls. The controls can be 'historical', in which case incorrect conclusions may be drawn in between 40% and 60% of studies (19). Some useful information may be obtained from studies using matched concurrent controls, but these can never be conclusive. They may, however, be appropriate in the evaluation of: conditions with very high mortality; rare diseases in which a large series cannot be attained; some diseases with paired organs where the patient acts as control; and certain conditions producing repetitive emergencies.

4 Prospective randomised controlled trial (RCT). This is the only method by which unknown selection biases can be controlled, provided the trial is properly designed with specific objectives and is of sufficient size to prevent false-positive (α error) and false-negative (β error) results. RCTs are ideally suited to the testing of new drugs because the trial can usually be double-blind and placebo controlled, which is the gold standard for proper evaluation. Objective testing of this rigour is required of all new pharmaceutical products before the Committee on Safety of Medicines (CSM) in the UK and the Food and Drug Administration (FDA) in the USA allow their release for therapeutic use.
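The requirement that a trial be 'of sufficient size' to limit α and β errors can be made concrete with a standard sample-size approximation. The sketch below is illustrative only and is not taken from the article: it assumes a binary outcome compared between two equal arms with a two-sided test, uses the usual normal approximation, and the function name and the example event rates are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def patients_per_arm(p_control, p_new, alpha=0.05, power=0.80):
    """Approximate number of patients needed in each arm to detect a
    difference between two event rates (normal approximation, two-sided).

    alpha is the accepted false-positive rate; 1 - power is the accepted
    false-negative rate (beta).
    """
    if p_control == p_new:
        raise ValueError("the two event rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_new * (1 - p_new)
    delta = p_control - p_new  # size of the treatment effect sought
    return ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)


# Hypothetical example: complication rate of 30% with the standard
# operation, 20% hoped for with the new one.
print(patients_per_arm(0.30, 0.20))
```

With these assumed rates the approximation asks for roughly 290 patients per arm, and halving the difference sought roughly quadruples the number, which is one reason small surgical trials so often give inconclusive results.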

Characteristics of RCTs

Patients in the comparison group are considered concurrently. Unknown characteristics which may have a confounding effect on the outcome are likely to be distributed equally between the two groups. The RCT theoretically distributes both known and unknown variables equally between the two groups, although it is possible that, by chance, the distribution will be unequal. It is normal, therefore, to record variables which may influence outcome and to check that they are equally distributed. Comparisons based on the randomisation of individuals are preferable in principle, even if practical considerations may sometimes lead to the adoption of other units of randomisation. Ideally the first person treated should be treated within the context of an RCT (20) although, in practice, this is not often achieved.

RCTs are still open to bias if not properly conducted and most of the criticisms (21) relate to the 'frailty of human nature' among the medical practitioners involved rather than to the method itself. The principal biases relate to the doctor (as practitioner and observer), the patient (as subject), compliance (of either of the above) and cointervention. The blinding of doctor and patient is difficult to achieve, but it should be possible to limit observer bias when taking measurements and assessing outcomes. Compliance bias can occur, for example, if a surgical treatment in one arm is accompanied by a drug as an intrinsic part of the intervention. If patients in one arm comply to a different extent from those in another, valid inferences cannot be drawn about the effectiveness of the therapy. Cointervention bias may occur if, for example, patients in one arm of a study spontaneously adopt a form of treatment which is part of another arm. This could occur within an RCT of a surgical procedure with which a particular dietary regimen is associated if that diet is also used by some patients in the non-surgical arm.
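The practice of recording prognostic variables and checking that randomisation has distributed them evenly can be sketched in a few lines. This is a minimal illustration rather than a trial-grade procedure: it assumes simple (unrestricted) randomisation to two arms and a single numeric baseline variable, and the function names, arm labels and the 'age' data are all hypothetical.

```python
import random
from statistics import mean


def allocate(patient_ids, seed=None):
    """Simple randomisation: each patient has an equal chance of either arm."""
    rng = random.Random(seed)
    return {pid: rng.choice(["new", "standard"]) for pid in patient_ids}


def baseline_summary(allocation, covariate):
    """Return (number of patients, mean of the covariate) for each arm."""
    groups = {"new": [], "standard": []}
    for pid, arm in allocation.items():
        groups[arm].append(covariate[pid])
    return {
        arm: (len(values), mean(values) if values else None)
        for arm, values in groups.items()
    }


# Hypothetical cohort of 200 patients with made-up ages.
ids = list(range(200))
ages = {pid: 40 + pid % 30 for pid in ids}
print(baseline_summary(allocate(ids, seed=1), ages))
```

A marked difference between the arms in such a summary does not invalidate the trial, but it signals a chance imbalance that should be reported and allowed for in the analysis.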

Evaluation of surgical procedures

Historically, the introduction of new surgical procedures resulted from the initiative of individual surgeons who frequently designed a new piece of equipment or developed a new technique. This was usually tested by comparing the results with historical controls. There has been resistance to the wider use of RCTs for assessment of surgical procedures (20) despite strong advocacy by such proponents as Baum (22,23). Salzman (24) reviewed the place of the RCT in surgical practice over 40 years and noted that by 1980 only 10-20% of health technologies had been subjected to this form of evaluation. Baum (22) and Salzman (24) provide examples of surgical practice where less rigorously evaluated procedures have been found to be ineffective. These included gastric freezing for bleeding peptic ulcer, carotid body denervation for bronchial asthma, prophylactic portacaval shunt to prevent oesophageal variceal bleeding, nephropexy for visceroptosis, removal of the chronically inflamed appendix and periarterial sympathectomy.

Randomisation in surgical practice

RCTs are less frequently used in the evaluation of surgical procedures for several (at least partially valid) reasons:

1 The only true placebo is a sham operation, which is unethical. Surgical trials cannot, therefore, be fully placebo controlled.

2 Blinding of the procedure can never be achieved for the operator and only sometimes for the assessor and patient.

3 The trial usually involves comparison of an established method with a new technique. Surgical experience, a strong variable determining outcome, is inevitably going to be greater for the standard method and therefore not directly comparable with that for the new technique, particularly during any learning curve. Further bias accrues in that new surgical techniques are usually introduced by enthusiastic and skilled surgeons whereas the standard technique is practised by all surgeons in that specialty. To counteract this bias Rudicel and Esdaile (25) suggested that randomisation should centre on the surgeon (who is allowed to practise the procedure at which he or she is accomplished) rather than on the patient. Uncontrolled patient variables then cloud the issue and it has been claimed that the surgeon 'becomes nothing other than a technologist' (15). In practice there is little enthusiasm for surgeon-centred randomisation. Trials of sufficient size in which the experience of the participating surgeons is carefully recorded should allow some estimate to be made of the cost to the patient of an inexperienced operator. This is a very important issue highlighted by the Large Bowel Cancer Study of Fielding (26) and the Confidential Enquiry into Perioperative Deaths (CEPOD) in the United Kingdom (27).

4 Many new surgical techniques are introduced to clinical care much earlier than is the case for new drugs, which are more readily developed and initially tested outwith clinical practice. The early introduction of surgical innovations must, however, be monitored very carefully. T C Chalmers (28) has argued strongly for 'randomisation of the first patient'. How is the first patient to be defined? It surely cannot be the very first patient on whom the surgeon carries out the new procedure. Should it be mandatory for any other surgeon who wishes to test the technique to do so only within the context of an RCT in which the first patient is randomised? This seems impractical. It can be argued that, since there is a learning curve for each new operation, randomisation of 'the first patient' to a new procedure introduces selective bias against it. On the other hand, it would provide some estimate of the cost of an inexperienced operator. In reality, however, no matter how strong the theoretical arguments might be for randomisation of 'the first patient', this is impossible in practice. The investigator must be allowed to exercise judgement about the earliest possible point at which an RCT can begin.

5 Trials of all but the most minor surgical procedures can only take place in hospital and are, therefore, constrained by operating theatre availability. Surgical trials are not inherently more difficult than others. Recruitment is slower than for many drug trials and financial support is frequently limited, so that it is more difficult to mount trials of sufficient size to produce clear results. The combined analysis of the results of several RCTs (meta-analysis) will often allow meaningful conclusions to be drawn which would not be possible if each were to be considered individually.

6 Multicentre trials are made more difficult for surgical procedures by the different level of skill of the surgeons and anaesthetists in the various centres. This can be eliminated by stratified randomisation within each centre.

7 Patients may be unwilling to be randomised because they perceive one procedure to be preferable to another; one example is the very poor recruitment to the Cancer Research Campaign trial of conservative surgery vs mastectomy for small breast carcinomas.

8 Appropriate end-points may require many years of follow-up, for example recurrence and/or death rates for treatments of many cancers.
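Stratified randomisation within each centre is commonly implemented with permuted blocks, so that the arms stay balanced inside every stratum as recruitment proceeds. The following is a minimal sketch under stated assumptions, not a description of any trial cited here: two arms, a fixed block size of four, and a stratum label such as the centre name; the function name and labels are hypothetical.

```python
import random


def stratified_block_allocator(block_size=4, arms=("new", "standard"), seed=None):
    """Return a function that allocates the next patient in a given stratum.

    Each stratum (for example a centre, or a centre and surgeon grade) draws
    from its own shuffled block, so the arms are balanced within the stratum
    after every completed block.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    pending = {}  # stratum -> allocations left in the current block

    def allocate(stratum):
        block = pending.get(stratum)
        if not block:  # no block yet, or the current block is used up
            block = list(arms) * (block_size // len(arms))
            rng.shuffle(block)
            pending[stratum] = block
        return block.pop()

    return allocate


allocate = stratified_block_allocator(seed=42)
for centre in ["Centre A", "Centre A", "Centre B", "Centre A", "Centre A"]:
    print(centre, allocate(centre))
```

Balancing in this way ensures that each centre contributes similar numbers to both arms, so differences in skill between centres do not fall unevenly on one procedure. In a real trial the allocation list would be held centrally and concealed from the recruiting clinicians.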

There are many less valid reasons for lack of enthusiasm for RCTs, for example:

1 It is part of the surgeons' ethos to believe in themselves and their surgical skills (25). They tend to assume that new methods developed by them are superior.

2 Rudicel and Esdaile (25) suggest that having to confess uncertainty by wishing to undertake a trial undermines the patient's confidence in the surgeon. They believe that uncertainty is antithetical to surgical training. This is one of the most insidious of all objections to RCTs and must be resisted. The patient-doctor relationship is totally dependent on each being truthful with the other.

3 RCTs are said to be 'too difficult', and because the ideal trial cannot be mounted nothing is done at all.

4 Language used in the context of RCTs is often pejorative, e.g. it is claimed that patients are being asked to 'sacrifice (some organ) on the altar of science' (12).

5 A surgeon gains personal kudos and sometimes private practice from developing a new operation. It may therefore not be in the surgeon's best interests to have it exposed to testing. A distinguished Canadian surgeon has stated that 'there is little doubt that threatened loss of income is a major factor in inhibiting such studies' (29).

6 It is difficult to get agreement on the protocol, particularly if the entry criteria are very rigid. The possibility of room for flexibility of entry criteria has recently been highlighted by Byar et al. (30) in relation to trials for the treatment of AIDS.

7 Collaborating doctors are often prepared to allow inclusion of their patients only on the condition that they are allocated to the arm of the doctor's preference.

Alternative strategies?

Surgeons feel far more comfortable with 'long-term follow-up' as a method of evaluation of surgical techniques, because this is a traditional method of assessment of patient care. The value of long-term follow-up and audit of surgical techniques should not be minimised. It must, however, follow and not replace prior objective assessment. Wennberg (14) has recently argued that, under some circumstances, a preference trial is better than an RCT because the importance of patients' rational decision making raises questions about randomisation. A preference trial is 'the systematic follow-up of patient cohorts where treatment assignments are made according to informed patient choice rather than by randomisation'. Such trials can, however, only complement and not replace RCTs, because it is only by the latter that the true advantages and disadvantages of a procedure can be determined without the confounding bias of patient differences. An RCT will provide the evidence to allow the patient to make an informed choice.

Table I. Criteria for an RCT

Absolute
1 A valid method for random allocation must be used.
2 The size of the trial must be large enough to avoid false-positive (α error) and false-negative (β error) results.
3 The size of the treatment effect sought (δ) and the number of patients required in each arm of the study to demonstrate it must be considered beforehand.
4 The protocol should not be altered during the trial without proper justification and clear documentation.
5 Ethics Committee approval must be gained.
6 Fully informed consent from patients must be obtained before randomisation.
7 Results must be analysed on the basis of 'intention to treat'. Other analyses can be included as long as this criterion is met.

Relative
1 Every attempt should be made to construct it as a double-blind placebo-controlled trial, but this is not always possible.
2 The 'first patient' should be randomised.
3 An independent data monitoring committee can be useful for large or controversial trials.
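Absolute criterion 7, analysis by 'intention to treat', means that each patient is counted in the arm to which he or she was randomised, whatever treatment was eventually received. The sketch below contrasts this with an 'as treated' analysis. It is an illustration only: the record layout, field names and the four patient records are invented for the example and do not come from any trial.

```python
from collections import defaultdict


def success_rates(records, group_key):
    """Proportion of successful outcomes, grouping patients by group_key.

    group_key="assigned" gives the intention-to-treat analysis;
    group_key="received" gives an as-treated analysis.
    """
    totals = defaultdict(lambda: [0, 0])  # arm -> [successes, patients]
    for record in records:
        arm_totals = totals[record[group_key]]
        arm_totals[0] += record["success"]
        arm_totals[1] += 1
    return {arm: s / n for arm, (s, n) in totals.items()}


# Invented records: the second patient was randomised to the new
# operation but was converted to the standard one.
records = [
    {"assigned": "new", "received": "new", "success": 1},
    {"assigned": "new", "received": "standard", "success": 0},
    {"assigned": "standard", "received": "standard", "success": 1},
    {"assigned": "standard", "received": "standard", "success": 0},
]

print("intention to treat:", success_rates(records, "assigned"))
print("as treated:        ", success_rates(records, "received"))
```

In this made-up example the as-treated figures flatter the new operation because its failure is moved into the standard arm. Grouping by the randomised allocation preserves the comparability that randomisation created, which is why other analyses may only supplement it.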


Conclusions

1 Randomised controlled trials are the best method for assessing all new interventions, including surgical techniques. Successful identification of benefit or harm is possible if the differences are large. RCTs are necessary if confounding biases are to be avoided when, as is usual, moderate differences are expected.

2 The ideal double-blind, placebo-controlled trial cannot easily be applied to surgical comparisons.

3 The design of RCTs for surgical evaluation must be as close as possible to the optimal. The design and quality of the trial are often enhanced by input from epidemiologists and/or statisticians. Deviations from the ideal must be clearly justified and must not prejudice the trial. Table I shows suggested absolute and relative criteria for a valid RCT.

4 Local ethics committees must be involved in reviewing the design. The involvement of patient interest groups related to the subject under study is recommended.

5 Meta-analysis to combine the results of several similar RCTs can be useful (16), but cannot redeem badly designed trials.
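One common way of combining the results of several similar RCTs, as in conclusion 5, is fixed-effect inverse-variance pooling of the log odds ratios. The sketch below shows that one technique under stated assumptions; the article does not specify a method, the function name is hypothetical, and the trial counts in the example are invented.

```python
from math import exp, log, sqrt


def pooled_odds_ratio(trials):
    """Fixed-effect (inverse-variance) pooled odds ratio with a 95% interval.

    trials: list of (events_new, n_new, events_standard, n_standard).
    Every cell of each 2x2 table must be non-zero; zero cells would need a
    continuity correction, which this sketch does not attempt.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for events_new, n_new, events_std, n_std in trials:
        a, b = events_new, n_new - events_new
        c, d = events_std, n_std - events_std
        log_or = log((a * d) / (b * c))
        variance = 1 / a + 1 / b + 1 / c + 1 / d
        weighted_sum += log_or / variance
        weight_total += 1 / variance
    pooled = weighted_sum / weight_total
    se = sqrt(1 / weight_total)
    return exp(pooled), exp(pooled - 1.96 * se), exp(pooled + 1.96 * se)


# Invented counts for three small trials of the same comparison.
trials = [(10, 100, 20, 100), (8, 60, 12, 60), (15, 150, 25, 150)]
print(pooled_odds_ratio(trials))
```

Pooling narrows the confidence interval relative to any single trial, but the arithmetic is indifferent to how the trials were run, which is the sense in which meta-analysis cannot redeem badly designed trials.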


We wish to acknowledge the assistance of Drs Iain Chalmers, Jean Golding and Klim McPherson in the preparation of this manuscript.

References

1 Medical Research Council. Streptomycin treatment of pulmonary tuberculosis. Br Med J 1948;2:769-82.
2 Medical Research Council. Clinical trials of antihistaminic drugs in the prevention and treatment of the common cold. Br Med J 1950;2:425-9.
3 Cochrane AL. Effectiveness and Efficiency: Random Reflections on Health Services. Oxford: Nuffield Provincial Hospitals Trust, 1972.
4 Dudley HAF. The controlled clinical trial and the advance of reliable knowledge: an outsider looks in. Br Med J 1983;287:957-60.
5 Wastell C. Laparoscopic cholecystectomy. Br Med J 1991;302:303-4.
6 Vancaille TG. Electrocoagulation with the ball-end resectoscope. Obstet Gynecol 1989;74:425-7.
7 Davies WAR, Pollard W, Basterfield P. Reduction of menstrual blood loss with endometrial cryosurgery. J Obstet Gynaecol 1985;6:117-19.
8 Davies J. Hysteroscopic endometrial ablation with the neodymium-YAG laser. Br J Obstet Gynaecol 1989;96:928-32.
9 Phipps JH, Lewis BV, Roberts T et al. Treatment of functional menorrhagia by radiofrequency-induced thermal endometrial ablation. Lancet 1990;335:374-6.
10 Magos AL, Baumann R, Turnbull AC. Transcervical resection of the endometrium in women with menorrhagia. Br Med J 1989;298:1209-12.
11 Stirrat GM, Dwyer N, Browning J. Planned trial of transcervical resection of the endometrium versus hysterectomy. Br J Obstet Gynaecol 1990;97:459.
12 Anonymous. Positive discrimination for surgery? Lancet 1990;336:151.
13 Roos NP, Wennberg JE, Malenka DJ et al. Mortality and reoperation after open and transurethral resection of the prostate for benign prostatic hyperplasia. N Engl J Med 1989;320:1120-4.
14 Wennberg JE. What is outcomes research? In: Gelijns AC ed. Medical Innovation at the Crossroads, Vol. 1, Modern Methods of Clinical Investigation. Washington DC: National Academy Press, 1990:33-46.
15 Anonymous. The epistemology of surgery. Lancet 1986;1:656-7.
16 Chalmers I. Evaluating the effects of care during pregnancy and childbirth. In: Chalmers I, Enkin M and Keirse MJNC eds. Effective Care in Pregnancy and Childbirth. Oxford: Oxford University Press, 1990:3-38.
17 McCormick JS. The personal doctor (1975). J R Coll Gen Pract 1976;26:750-3.
18 Venning GR. The validity of anecdotal reports of suspected adverse reactions: the problem of false alarms. Br Med J 1982;284:249-52.
19 Sacks H, Chalmers TC, Smith H. Randomised versus historical controls for clinical trials. Am J Med 1982;22:233-40.
20 Chalmers TC. Randomization of the first patient. Med Clin North Am 1975;59:1035-8.
21 Byer A. The practical and ethical defects of surgical randomised prospective trials. J Med Ethics 1983;9:90-3.
22 Baum M. Scientific empiricisms and clinical medicine: a discussion paper. J R Soc Med 1981;74:504-9.
23 Baum M. Commentary on the practical and ethical defects of surgical randomised prospective trials. J Med Ethics 1983;9:92-3.
24 Salzman EW. Is surgery worthwhile? Arch Surg 1985;120:771-6.
25 Rudicel S, Esdaile J. The randomised clinical trial in orthopaedics: obligation or option? J Bone Joint Surg 1985;67A:1284-93.
26 Fielding LP, Phillips RKS, Fry JS, Hitlinger R. The prediction of outcome after curative resection for large bowel cancer. Lancet 1986;2:904-7.
27 Buck N, Devlin HB, Lunn JN. The Report of a Confidential Enquiry into Perioperative Deaths. London: Nuffield Provincial Hospitals Trust, The King's Fund, 1989.
28 Chalmers TC. Randomized clinical trials in surgery. In: Varco RL, Delaney JP eds. Controversy in Surgery. Philadelphia: W B Saunders, 1976:3-11.
29 Warren ED. Quality control in surgical research; importance to the patient. Can J Surg 1988;31:81-5.
30 Byar DP, Schoenfeld DA, Green SB et al. Design considerations for AIDS trials. N Engl J Med 1990;323:1343-8.

Received 6 June 1991

Assessor's comment

The authors make a cogent set of arguments for applying statistical criteria to the question of whether a new surgical treatment should be introduced for any particular surgical condition. They set out clearly the reasons why we should undertake clinical trials of new treatments and also the difficulties inherent in such trials in the field of surgery. As I see it, the chief problem is to allow for the skill of the surgical operator, which varies both from operator to operator, and in an individual operator with time on the learning curve. There are also problems with rare diseases, with respect to which the investigator has to steer between the Scylla of small numbers and the Charybdis of trying to combine results from different centres.

One technique that the authors do not mention but which can help in these circumstances is that of sequential analysis (1) which, despite having been developed in 1952, does not seem to have been much used in clinical work. The two treatments are applied, one to each of a pair of patients who are chosen to be as similar as possible in all respects other than the treatment. The treatment outcome, whether it favours treatment 1 or treatment 2 or neither, is plotted on graph paper and the line obtained is followed until it reaches the significance level for treatment 1 being better than 2, or vice versa, or for neither being better than the other. The expertise of the surgeon can be allowed for as one of the factors in selecting patient pairs. This greatly simplifies the statistical assessment: if the first eight pairs all showed a superiority for treatment 1 against treatment 2, then treatment 1 is better than treatment 2 (P < 0.05). Of course that raises another problem: the definition of a meaningful and reliable assessment of outcome.
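The arithmetic behind the eight-pair example can be checked with an ordinary sign test on the pairs that show a preference. The sketch below is a simplification offered for illustration: it treats the number of pairs as fixed in advance and so ignores the adjusted stopping boundaries that a true sequential design uses to allow for repeated looks at the data. The function name is hypothetical.

```python
from math import comb


def sign_test_p(favour_1, favour_2):
    """Two-sided sign test P value for matched pairs.

    Pairs showing no preference are left out before calling this function.
    Under the null hypothesis each remaining pair is equally likely to
    favour either treatment.
    """
    n = favour_1 + favour_2
    k = max(favour_1, favour_2)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


print(sign_test_p(8, 0))  # eight pairs, all favouring treatment 1
```

Eight pairs out of eight in one direction give a two-sided P of 2 × (1/2)^8, about 0.008, which is below the conventional 0.05 threshold. With seven out of eight the same calculation gives about 0.07, which shows how sensitive so small a series is to a single discordant pair.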

MICHAEL HOBSLEY TD FRCS
Professor of Surgery
Middlesex Hospital
London

Reference

1 Armitage P. Sequential Medical Trials. Oxford: Blackwell, 1960.
