Gen Thorac Cardiovasc Surg (2013) 61:685–693 DOI 10.1007/s11748-013-0334-1

REVIEW ARTICLE

Cardiac surgical patients are not the same. But who knows that: the patient, the cardiologist or the surgeon?

Haralabos Parissis • Bassel Al-Alao

Received: 5 June 2013 / Accepted: 14 October 2013 / Published online: 29 October 2013 © The Japanese Association for Thoracic Surgery 2013

Abstract

Nowadays, outcome measures are published following risk adjustment; mortality rates are adjusted for risk to give a picture of the performance of a surgeon or a unit. The risk factors associated with in-hospital death are likely to reflect concurrent disease-specific variables. Factors associated with increased resource utilisation (as measured by length of stay) reflect serious co-morbid illness. The various models are effective at predicting population behaviour, but are not necessarily suited to predicting individual outcomes. Further work needs to be done both to explain the differences in risk factors seen between the various risk stratification models and to determine which models are best suited to clinical practice. This document reviews the current evidence and the risk stratification tools employed in cardiac surgery practice.

Keywords Risk stratifications · EuroSCORE · STS score · Parsonnet score

Introduction

Risk stratification is a process that considers the pathophysiology of the disease in question, co-morbid illnesses and the treatment employed. It is an indispensable part of medical practice and is based on simple, measurable variables that describe the particular characteristics of each patient.

H. Parissis  B. Al-Alao (&) Department of Cardiothoracic Surgery, Royal Victoria Hospital, Grosvenor Road, Belfast BT12 6BA, UK e-mail: [email protected] H. Parissis e-mail: [email protected]

Scoring systems that derive from risk models are able to predict outcomes. The idea that “practice makes perfect” has obvious face validity, yet the relationship between surgical volume and outcome is not simple [1]. Moreover, risk stratification models do not take into account that surgeon volume is a proxy for individual human factors such as technical skill and quality of decision-making. The Leapfrog Group standards for CABG were based on volume standards coupled with risk-adjusted mortality rates [2]. However, the same group stated that procedure volume is not considered if hospitals have audited systems in place for assessing risk-adjusted mortality and if the mortality in those institutions is below the 25th percentile. In fact, mortality differed by only 0.4 % between hospitals above and below the Leapfrog volume standard of 450 CABGs per year (2.9 vs 3.3 %, respectively) [3]. The authors went on to conclude that the potential benefit of implementing the initiatives regarding CABG was mainly attributable to the addition of the risk-adjusted mortality standard; approximately 1/8th of patients could be saved with implementation of the numerical volume standard (>450 cases yearly) alone for CABG. Yet a recent study [4] suggests that surgeon volume may be more important than hospital volume for most high-risk procedures, which raises questions about subspecialisation. Mortality, on the other hand, is the most attractive outcome measure because of its ease of measurement, particularly in administrative databases. Yet mortality is subject to the limitations of any outcome measure and, additionally, to the limitation that it is a rare event for some procedures in modern surgical practice, which complicates analysis.

Objectively identifying high-risk patients is a delicate and timely issue, especially now that surgical practice is “under the microscope” and the patient can use the Internet “to assess” operative results and individual performance. With the development of medical statistics and the expansion of the Internet, the present risk is that the results of surgical practice are announced publicly without reference to risk adjustment. This has a negative impact on the care that the patient receives, since attending physicians may be less willing to accept high-risk cases. The purpose of this article is not to provide a lecture on the foundations of risk adjustment models and their fundamental statistics, but to highlight some of the central ideas behind the development of those systems and to make the general reader aware of the difficulties encountered in implementing, interpreting and applying those models.

Background

By grouping patients according to the severity of their illness, risk stratification aims to predict outcomes from a given intervention. It provides a numerical expression that attempts to classify mortality or morbidity following a procedure by means of an objective score. Co-morbidities, on the other hand, are coexisting diagnoses that are not linked to the principal surgical diagnosis, but may alter the outcome of a medical procedure.

The concept of risk stratification was conceived in the mid-19th century. Nightingale [5], in an attempt to criticise the level of treatment provided in several London hospitals, observed that mortality on its own cannot be an accurate reflection of surgical practice, since some patients are medically compromised and can be expected to have a different outcome. She carried her observations to the next level by suggesting that simple measures such as improved sanitation and less crowded wards may alter adverse outcomes. Codman [6], a surgeon from Boston, was one of the first to be occupied with surgical outcome analysis and scrutiny of results. His interest was triggered when he started investigating the high complication rate during anaesthesia; in the early 1900s, it was medical students who delivered anaesthesia. Codman linked the outcome of an illness to care and viewed outcome analysis as one step closer to improvement of patient care.

This article is an overall review of the principles behind the development of risk stratification systems; moreover, it aims to address how, inevitably, those stratification systems suffer from endogenous confounding bias. To start with, we would nevertheless like to take the opportunity to “touch upon” some fundamental principles from the “surgeon’s eye” that could provide an insight into the topic at the junior doctor’s level.

Brief update on statistical tools

The first tool needed to develop risk stratification models is the database of the “in study” population. Obviously, complete and accurate data collection for the examined sample is of paramount importance. For example, the use of in-hospital death as an endpoint could be dubious, because in-hospital death under-represents actual postoperative death. Hospitals with greater volume are at greatest risk of under-reporting postoperative deaths, because they are more likely to be higher-acuity, tertiary care facilities with more rapid discharge of patients to lower-acuity facilities. Sample size also affects the statistical power of a given study; false negative results occur with small sample sizes. In other words, sample size directly affects the type II error, and this should be stated plainly to readers. Broadly speaking, reducing bias and confounding in an experimental design requires an understanding of the principles of using control groups, propensity matching or randomisation. It is worth noting that as many as 30 % of journal articles may contain errors in the use of simple statistics [7]. A common mistake is misuse of the t test, which should compare means between only 2 groups. When dealing with data, the distinction between continuous, categorical and discrete variables becomes essential. Categorical variables, either nominal (example: alive/dead) or ordinal (example: lung cancer stages), are often presented and analysed in contingency tables using Fisher’s exact test or Pearson’s Chi square test.

Sensitivity and specificity

Sensitivity is the number of true positive results divided by the number of all patients suffering from the illness. Sensitivity shows the probability that the test is positive when the patient is sick. The more sensitive a test, the lower the number of false negative results.
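As a rough illustration of this definition, and of the matching definition of specificity given in the next paragraph, the sketch below computes both from the four cells of a 2×2 table. The function name and the counts are hypothetical, chosen only to make the arithmetic visible; they are not taken from any study cited here.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from the four cells of a 2x2 table.

    tp: sick patients with a positive test (true positives)
    fn: sick patients with a negative test (false negatives)
    tn: healthy patients with a negative test (true negatives)
    fp: healthy patients with a positive test (false positives)
    """
    sick = tp + fn
    healthy = tn + fp
    if sick == 0 or healthy == 0:
        raise ValueError("need at least one sick and one healthy patient")
    return tp / sick, tn / healthy


# Hypothetical counts: 100 sick and 200 healthy patients.
sens, spec = sensitivity_specificity(tp=90, fn=10, tn=160, fp=40)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

With these made-up counts, 10 of 100 sick patients are missed and 40 of 200 healthy patients are wrongly flagged, which is what the two fractions summarise.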
Increased sensitivity of a test rules out the diagnosis (SNOUT, SeNsitivity rules OUT). High sensitivity is necessary for screening tests.

Specificity is the number of true negative results divided by the number of all healthy patients. Specificity shows the probability that the test is negative when the patient is not sick. The more specific the test, the lower the number of false positive results. High specificity of a test rules the diagnosis in (SPIN, SPecificity rules IN). High specificity is necessary when the test is diagnostic.

We mentioned that the higher the sensitivity of a test, the lower the number of false negative results. For example, in lung cancer, the PET scan is sensitive in 94 % of the cases and specific in 82 % of the cases [8] with regard to mediastinal lymph nodes. So, when the PET scan is negative for mediastinal lymph nodes, its high sensitivity (and consequently the small number of false negatives) means that no further investigation of the mediastinum is recommended. In other words, “high sensitivity rules out the diagnosis”. When the PET scan is positive for mediastinal lymph nodes, its lower specificity (and consequently the higher number of false positives) means that further investigation of the mediastinum is recommended. That is to say, only “high specificity rules the diagnosis in”.

The Pareto principle: the “vital few” or the “80/20”

The Pareto principle states that “20 % of a group is responsible for 80 % of the results”. For example, a small proportion of patients consumes the largest share of blood products (example: multi-injured patients) or of disposable gloves (example: AIDS patients). So, in a complex procedure, only a few factors are responsible for the majority of the results. After a CABG, for example, 20–25 % of the patients suffer from some kind of complication and 10 % suffer from severe complications [9].

CUSUM chart

In its simple form, it is a chart with postoperative deaths on the vertical axis and operations on the horizontal one. For each death, the curve moves up one unit on the vertical axis. When no fatal complications occur, the curve moves one unit horizontally.
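The construction just described can be sketched in a few lines. This is a minimal illustration, assuming that every operation advances the curve one unit horizontally and that a death additionally raises it by one unit; the function name and the outcome sequence are hypothetical and do not reproduce any published series.

```python
from itertools import accumulate


def cusum_deaths(outcomes):
    """Return the points of a simple (unadjusted) CUSUM curve.

    outcomes: iterable of 0 (survived) or 1 (died), in operative order.
    Each point is (operation_number, cumulative_deaths).
    """
    totals = accumulate(int(bool(o)) for o in outcomes)
    return [(op, deaths) for op, deaths in enumerate(totals, start=1)]


# Hypothetical series of 20 consecutive operations (1 = death).
series = [0] * 10 + [1] + [0] * 5 + [1, 0, 1, 1]
points = cusum_deaths(series)

for op, deaths in points:
    # A crude text rendering: one '#' per cumulative death.
    print(f"{op:3d} {'#' * deaths}")
```

A run of deaths shows up as a steep rise in the curve, which is the visual “alarm” that the chart is meant to provide.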
Basically, it is a warning alarm for detecting the direct mortality rates of a surgeon and his or her practice [10]. It is a real-time monitor of surgical practice. An example is the publication by de Leval et al. [11], who used a CUSUM chart to describe their surgical experience with the arterial switch operation. Although only one patient died in the first 52 operations, 7 patients died in the following 16 operations. The “warning alarm” (CUSUM) led the surgical team to change their technique, resulting in a positive final outcome. Graphical tools have therefore become important in cardiac surgery: they present results in a visually intuitive manner, so that deviation from expected outcomes can be monitored and the likelihood that it has occurred by chance can be judged [12, 13]. Nevertheless, no risk adjustment is perfect, because it cannot take into account all the confounding factors. Finally, funnel charts are used when, for example, one wishes to evaluate the performance of individuals; the funnel is plotted using upper and lower control limits, with an alert limit of 3 standard deviations (99.8 % confidence limit) [14].
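The control limits of such a funnel can be sketched as follows. This is only an illustration under stated assumptions: the limits use the normal approximation to the binomial distribution around a fixed target mortality rate, and the function names, the target rate and the case numbers are hypothetical rather than taken from reference [14].

```python
import math


def funnel_limits(target_rate, n_cases, z=3.0):
    """Lower and upper control limits for an observed mortality proportion.

    Assumption: normal approximation to the binomial, i.e.
    target_rate +/- z * sqrt(target_rate * (1 - target_rate) / n_cases),
    clipped to the interval [0, 1].
    """
    if n_cases <= 0 or not 0.0 < target_rate < 1.0:
        raise ValueError("need n_cases > 0 and 0 < target_rate < 1")
    se = math.sqrt(target_rate * (1.0 - target_rate) / n_cases)
    return max(0.0, target_rate - z * se), min(1.0, target_rate + z * se)


def outside_funnel(deaths, n_cases, target_rate, z=3.0):
    """True if the observed mortality falls outside the control limits."""
    lower, upper = funnel_limits(target_rate, n_cases, z)
    observed = deaths / n_cases
    return observed < lower or observed > upper


# Hypothetical example: target mortality 3 %, two surgeons with 200 cases each.
print(funnel_limits(0.03, 200))
print(outside_funnel(deaths=8, n_cases=200, target_rate=0.03))
print(outside_funnel(deaths=14, n_cases=200, target_rate=0.03))
```

Because the standard error shrinks as the number of cases grows, the limits narrow with volume, which gives the plot its funnel shape.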

Common features of risk stratification systems

The main outcomes of interest to the surgeon who treats an ischemic patient are mortality, morbidity and resource utilisation. Mortality is easy to define and measure; morbidity, in the form of postoperative complications, is more difficult to define and scrutinise. Finally, resource utilisation becomes of interest when health-related funds must be spent in a beneficial, cost-effective way. Almost all risk stratification systems aim at developing a score that, by taking into account peri-procedural variables, attempts to calculate mortality rates. An exception was a “Canadian” system [15] which, as well as mortality following cardiac surgery, sought to predict ICU and postoperative length of stay; the authors considered only a six-variable risk index including age, sex, left ventricular function, type of surgery, urgency of surgery and repeat operation. Furthermore, only one system, the Acute Physiology, Age, Chronic Health Evaluation (APACHE III) [16], computes a mortality score independent of the patient diagnosis, mainly taking into account major medical and surgical disease categories, acute physiologic abnormalities, age, pre-existing functional limitations, major co-morbidities and treatment location immediately prior to ICU admission.

Risk factors associated with in-hospital death are likely to reflect disease-specific variables. For example, mortality risk after CABG is associated with ischemic pathophysiology, such as status of left ventricular function, preoperative myocardial infarction, haemodynamic instability before the operation, and emergency, redo or complex procedures [17]. Risk factors associated with increased resource utilisation (such as length of ICU and hospital stay) reflect severe co-morbid illness, such as peripheral vascular disease, renal dysfunction and chronic lung disease [18].
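To make the idea of “a score computed from peri-procedural variables” concrete, the sketch below shows the two forms that appear later in this review: an additive score (points are summed) and a logistic model (coefficients are summed and converted to a probability). All variable names, points, coefficients and the intercept are hypothetical and are NOT those of EuroSCORE, Parsonnet, STS or any other published system.

```python
import math

# Hypothetical weights, for illustration only.
ADDITIVE_POINTS = {"age_over_70": 2, "poor_lv_function": 3,
                   "emergency": 4, "redo": 3}
LOGISTIC_COEFFS = {"age_over_70": 0.6, "poor_lv_function": 0.9,
                   "emergency": 1.2, "redo": 0.9}
LOGISTIC_INTERCEPT = -4.5


def additive_score(patient):
    """Sum the points of every risk factor present in the patient."""
    return sum(pts for name, pts in ADDITIVE_POINTS.items()
               if patient.get(name))


def logistic_risk(patient):
    """Predicted probability of death: 1 / (1 + exp(-(b0 + sum of b_i)))."""
    linear = LOGISTIC_INTERCEPT + sum(
        coef for name, coef in LOGISTIC_COEFFS.items() if patient.get(name))
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical patient: emergency re-operation, no other risk factor.
patient = {"emergency": True, "redo": True}
print(additive_score(patient))
print(round(logistic_risk(patient), 3))
```

The additive form is easy to compute at the bedside, whereas the logistic form yields a probability directly; which of the two is better calibrated is an empirical question for each dataset.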

Calculating co-morbidities

Co-morbidities are coexisting diseases that are not directly related to the principal surgical diagnosis, but have an impact on the outcome of the operation. The prevalence of co-morbid illnesses in patients with cardiac disease has been well documented by Stewart et al. [19]; among patients with myocardial infarction, 26 % also have diabetes, 6 % have obstructive pulmonary disease and 12 % suffer from gastrointestinal problems. Many attempts have been made to assess co-morbidities by means of co-morbidity indices. The purpose of systems like that of Goldman et al. [20] is to predict morbidity based on preoperative associated illnesses; the authors identified nine independent variables: preoperative third heart sound or jugular venous distension; myocardial infarction in the preceding 6 months; more than five premature ventricular contractions per minute documented at any time before operation; rhythm other than sinus or presence of premature atrial contractions on preoperative electrocardiogram; age over 70 years; intraperitoneal, intra-thoracic or aortic operation; emergency operation; important valvular aortic stenosis; and poor general medical condition. The problem with these “indices” is that the databases used to compute them are retrospective and inaccurate, with low predictive value.

Scoring systems

Characteristics

The various stratification systems aim to predict risk after coronary revascularization, that is, they attempt to develop an additive score that computes postoperative mortality. Most assessment systems take into account patient-related factors in the form of demographic variables such as age, gender and BMI; they also take into consideration cardiac disease-specific variables such as NYHA class, history of myocardial infarction, low ejection fraction, left ventricular aneurysm or ischemic ventricular septal defect (VSD). The majority of the assessment systems also take into account the urgency of the procedure, haemodynamic stability and support, as well as history of co-morbid disease such as obesity, history of peripheral arteriopathy, anaemia, diabetes, renal failure, history of stroke, history of smoking and obstructive pulmonary disease. The severity and stage of the co-morbid disease are graded by defined formulas that differ between risk systems, depending on how each defines a co-morbid disease. Most systems also take into account the complexity of a procedure, such as re-operation or complex surgery (two or more operations, operation on the ascending aorta or arch, operation for a VSD). Unfortunately, there is no system that takes into account the severity of the coronary artery disease and how much difficulty that poses to the operator. Furthermore, the “human factor” is not taken into consideration in any of the known risk models.

The predictive value of different variables is expressed as an odds ratio (OR) with confidence intervals. It is obvious that each parameter carries a different statistical weight in predicting mortality and morbidity. An emergency operation due to on-going ischemia and cardiovascular failure predicts postoperative mortality. More specifically, the VA risk stratification system [17] showed that the parameters referred to as “disease-specific diagnosis” (example: heart failure, concurrent valve disease, re-operation, low ejection fraction, use of intra-aortic balloon pump and endocarditis) have an impact on mortality. In other words, the risk factors that can predict mortality in cardiac surgical patients are possibly associated with principal disease-specific variables (ischemia/myocardial condition). The risk factors that can predict the length of the hospital stay and resource utilisation are possibly associated with the concurrent disease-specific variables [20].

An assessment of the risk factors that predict mortality in patients undergoing surgery for valve disease is difficult because several variables inter-correlate: there are numerous types of valves, the left ventricular “response remodelling” to valvular heart disease can be difficult to rate, and there are multiple valve-related events that are sometimes impossible to capture. However, as per Miller et al. [21], the variables that mainly affect mortality during valve operations seem to be functional status, the state of the left ventricle, concurrent coronary disease, pulmonary hypertension, small aortic root, preoperative renal failure and emergency operation.
Regarding the various risk stratification models throughout the literature, the number of independent variables taken into account varies from 5 to 25 [22]. This reflects the uncertainty and ambiguity involved in creating a model that accurately predicts outcomes. A benchmark study reporting robust and accurate complication rates after CABG was published by Shroyer et al. [23]; in a database of 500000 patients, the rates were as follows: stroke (1.63 %), renal failure (3.53 %), prolonged ventilation (5.96 %), sternal infection (0.63 %) and re-operation within the first 24 h (7.17 %).

Scoring systems: how do they perform?

There are significant differences between risk stratification scores; these relate to the score design, the number and characteristics of the initial patient population on which score development is based, and the derived variables to be tested and validated. Score validity is assessed by calculating the area under the receiver operating characteristic (ROC) curve. The ROC value tests the sensitivity and specificity of the predictive power achieved. Also, for every variable tested, the OR is calculated to investigate the predictive relevance of risk factors. In general, for all scoring systems, the predictive values for morbidity are lower than the predictive values for mortality.

Comparing six scoring systems [Parsonnet, Cleveland Clinic, French, Euro, Pons, and Ontario Province Risk (OPR) score] in heart surgery, Geissler and colleagues [24] found that the EuroSCORE had the best predictive value for mortality. Likewise, Nilsson et al. [25] looked at the performance of 19 risk scores (STS excluded) applied to a population of 6222 cardiac procedures (70 % CABG). The discriminatory power (area under the ROC curve) for 30-day mortality and 1-year mortality due to cardiovascular deaths was highest for the EuroSCORE algorithms (0.84 and 0.77), followed by the Cleveland Clinic (0.82 and 0.76) and the Magovern (0.82 and 0.76) scoring systems. None of the other risk algorithms had a significantly better discriminatory power (larger ROC area) than these four systems. The unique feature of this project is that the Swedish registry rigorously follows up all patients for up to 1 year and captures the cardiovascular deaths.

The same group [26] compared EuroSCORE to the STS in a paper published in 2004; they applied the scoring systems to a cohort of 4497 patients operated on for CABG between 1996 and 2001. The 30-day mortality was 1.89 %. The area under the ROC curve was 0.84 for EuroSCORE and 0.71 for STS; therefore, the EuroSCORE appeared to have better discriminatory power. In their sophisticated analysis of the value of EuroSCORE in a single centre, Sergeant et al. [27] concluded that they needed a scoring system capable of outperforming the weather forecast, not merely matching it at an ROC value of 0.83. Indeed, the EuroSCORE group attempted to enhance the ROC value; they included adverse intra-operative events in their formula [28] and found that the ROC value for patient mortality increased to 0.87.
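The area under the ROC curve quoted throughout this section can be read as the probability that a randomly chosen patient who died had been given a higher score than a randomly chosen survivor, with ties counting as one half. The sketch below computes it by direct pairwise comparison; the function name and the small data set are hypothetical, and the quadratic loop is meant for clarity rather than for large registries.

```python
def roc_auc(scores, died):
    """Area under the ROC curve (c-statistic) by pairwise comparison.

    scores: predicted risk score for each patient
    died:   matching outcomes, truthy if the patient died
    """
    dead = [s for s, d in zip(scores, died) if d]
    alive = [s for s, d in zip(scores, died) if not d]
    if not dead or not alive:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for s_dead in dead:
        for s_alive in alive:
            if s_dead > s_alive:
                concordant += 1.0   # the death was ranked higher: concordant
            elif s_dead == s_alive:
                concordant += 0.5   # tie: counts as half
    return concordant / (len(dead) * len(alive))


# Hypothetical scores for six patients, three of whom died.
scores = [2, 3, 5, 8, 9, 12]
died = [0, 0, 1, 0, 1, 1]
print(round(roc_auc(scores, died), 3))
```

A value of 0.5 corresponds to chance and 1.0 to perfect separation of deaths from survivors; the value says nothing about whether the predicted probabilities themselves are correct.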

Brief overview of the scoring systems

Parsonnet and EuroSCORE

The Parsonnet score [29] was published in 1989 and was used as a pilot in 7 New York centres. It was the first simple, verified, documented and cumulative scoring system for the prediction of the risk involved in a heart operation. It has been widely used in the US, but its weak point is that it allows “room” for subjective scoring (“unclear” variables). Two of the variables (catastrophic states and other rare circumstances) were fairly arbitrary and could significantly influence the score. The Parsonnet risk model over-predicts mortality in most situations, partly because elements of the system are subjective, but mainly because cardiac surgery itself has improved substantially in the 10+ years since Parsonnet was first introduced.

The EuroSCORE [30] was introduced in 1999 as a simple 16-variable system. It is similar to the North American Parsonnet score, but the sample is pan-European and it is based on a large database. The variables were collected in 128 cardiac surgical units across 6 European countries, each with vastly different patient epidemiology. Although EuroSCORE predicts patient outcome far better than Parsonnet, it over-predicts mortality in low-risk cases and underestimates mortality in those at highest risk. It is believed that this is because insufficient weighting has been ascribed to the most serious situations, as there were too few of these in the initial cohort of patients that generated the reference dataset. EuroSCORE models retain very good discrimination, with an area under the ROC curve of 0.7896 for the logistic model and 0.7894 for the additive model. The current risk-adjusted mortality ratio (RAMR = observed/predicted) is 0.67 for the additive model and 0.53 for the logistic model. Thus, the original EuroSCORE is no longer appropriately calibrated; these figures indicate a substantial net improvement in cardiac surgical outcomes since 1995, with risk-adjusted mortality falling by nearly half.

STS model

The STS has a database consisting of 2 million cardiac procedures since 1997.
In 2009 the performance of the model for isolated CABG cases [31] was published; 774881 isolated CABG procedures were performed on adult patients between 1 January 2002 and 31 December 2006 at 819 US centres. The c-index for the mortality model was 0.812.

As regards valve surgery [32], the STS model studied adult patients who underwent one of three types of valve surgery: isolated aortic valve replacement (AVR) (n = 67292), isolated mitral valve replacement (MVR) (n = 21229), or isolated mitral valve repair (n = 21238). The population was divided into a 60 % development sample and a 40 % validation sample. The composite mortality plus morbidity model contained 24 covariates. Discrimination as measured by the c-index was 0.799 for mortality.

The final study population of the STS model [33] consisted of 101661 procedures, including AVR plus CABG, MVR plus CABG, or mitral valve repair (MVRepair) plus CABG, performed between 1 January 2002 and 31 December 2006. The c-index for mortality prediction for the overall valve plus CABG population was 0.75.

Australian model

The AusSCORE [34] is the Australian model for predicting mortality following CABG. It was developed in 2009 from 11823 patients treated over a 4-year period from 2001 to 2005 in Australia. Factors selected as independent predictors in the preoperative, isolated coronary bypass AusSCORE model were as follows: age, New York Heart Association class, ejection fraction estimate, urgency of procedure, previous cardiac surgery, hypercholesterolemia (lipid-lowering treatment), peripheral vascular disease, and cardiogenic shock. The area under the ROC curve was 0.834.

A model for predicting mortality following AVR was published from the Australian database in 2011 [35]. Between July 2001 and June 2008, a total of 3544 AVR procedures were performed. Early mortality was 4.15 %. The AVR-score contained the following predictors: age, New York Heart Association class, left main disease, infective endocarditis, cerebrovascular disease, renal dysfunction, previous cardiac surgery and estimated ejection fraction. The final model (AVR-score) obtained an average area under the ROC curve of 0.78.

British risk model

A British risk model that predicts mortality following AVR was published in 2007 [36]. It studied 4550 consecutive patients undergoing aortic valve replacement between 1 April 1997 and 31 March 2004 at four hospitals. Overall mortality was 4.6 %.
The variables taken into account were age; renal dysfunction: 1.2307, p < 0.001; NYHA class IV: 0.5782, p = 0.003; hypertension: 0.4203, p = 0.006; atrial fibrillation: 0.604, p = 0.002; ejection fraction <30 %: 0.571, p = 0.012; previous cardiac surgery: 0.9193, p < 0.001; [10] non-elective surgery: 0.5735, p < 0.001; cardiogenic shock: 1.1291, p = 0.009; and concomitant CABG: 0.6436, p < 0.001. The area under the ROC curve was 0.78.

New York cardiac surgical registry

A study by Hannan et al. [37] was published in 2006 using a predictive model consisting of 10 variables. It is interesting to examine some of the definitions: haemodynamic state was rated as stable, unstable or shock; EF as <20, 20–29 or 30–39 %; and pre-procedural MI as <6 h, first day or 1–20 days. Octogenarian age, shock state and MI <6 h scored the most. Interestingly, an extensively calcified ascending aorta scored 2 points. The risk factors with the highest OR were previous MI <6 h (OR 7.22, 95 % CI 3.81–13.67), shock (OR 5.85, 95 % CI 3.05–11.24), and renal failure requiring dialysis (OR 5.58, 95 % CI 3.62–8.61). The logistic regression model had a very good C-statistic of 0.823. The same group [38] went on to develop a scoring system to predict mortality following cardiac valve surgery, again with acceptable predictive ability and C-statistics.

Morbidity model

The term CABDEAL is a representation of the risk factors that make up the model (C, creatinine; A, age; B, body mass index; D, diabetes mellitus; E, emergency; A, abnormal electrocardiogram; and L, lung function). The CABDEAL model [39], although of limited overall applicability, has been shown to perform well in the prediction of morbidity when compared with other models.

EuroSCORE II

The EuroSCORE is widely used, but its predictive power and reliability are declining due to changes in cardiac surgery case mix and outcomes in recent years. It therefore became apparent that the predictive ability of the score might be increased by changes that meet current needs. The EuroSCORE II [40] was developed from a dedicated website collecting prospective risk and outcome data on 22381 consecutive patients undergoing major cardiac surgery in 154 hospitals in 43 countries (many of which were outside Europe) over a 12-week period (May–July 2010). Moreover, the authors omitted the additive score. In order to enhance the performance of the model in valvular surgery, almost equal numbers of CABG and valve procedures (around 10000 cases each) were incorporated in the score calculation.
Most of the variables used in the old EuroSCORE were re-explored and their impact on mortality readjusted according to current practice outcomes; it was found that the impact of certain variables on predicted mortality is reduced. It was also noticed from the analysis of the variables that CCS class 4, insulin-dependent diabetes, reduced mobility and creatinine clearance are better predictors of mortality. Low BMI, serum albumin, BNP and troponin release were considered, but did not eventually “make it” into the model. Recent MI as well as PA pressure and LV function were reclassified. Furthermore, weight was given to the number of procedures as well as to surgery on the thoracic aorta, whereas ischemic VSD was removed due to limited data. Finally, the EuroSCORE II slightly under-predicted mortality when tested back on the “in-study” population. However, the discriminatory ability of the new model was found to be better than that of EuroSCORE I.

One of the major concerns with EuroSCORE II is that the primary outcome was mortality at the base hospital. Current guidelines and clinical trial practices mandate mortality assessment at 30 and up to 90 days. The number of centres that provided 30- or 90-day mortality was disappointing, as per Sergeant et al. [41]. There may be a sampling time error [42] due to the seasonal selection of the database; nevertheless, we ought to congratulate the EuroSCORE group on their effort to update a very successful surgical tool.
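Under- or over-prediction of this kind is a matter of calibration rather than discrimination, and it can be summarised with the observed/predicted ratio that this review quotes earlier for the original EuroSCORE. The following is a minimal sketch, assuming that a predicted probability of death is available for every patient; the function name and the five-patient example are hypothetical.

```python
def observed_expected_ratio(predicted_risks, died):
    """Observed deaths divided by the sum of predicted probabilities.

    predicted_risks: model probability of death for each patient (0..1)
    died:            matching outcomes, truthy if the patient died
    """
    predicted_risks = list(predicted_risks)
    died = list(died)
    if len(predicted_risks) != len(died):
        raise ValueError("one outcome is needed for every prediction")
    expected = sum(predicted_risks)
    if expected <= 0:
        raise ValueError("expected number of deaths must be positive")
    observed = sum(1 for d in died if d)
    return observed / expected


# Hypothetical cohort: the model expects 0.5 deaths, 1 death is observed.
risks = [0.02, 0.05, 0.10, 0.03, 0.30]
outcomes = [0, 0, 0, 0, 1]
print(round(observed_expected_ratio(risks, outcomes), 2))
```

A ratio above 1 means that the model under-predicts mortality in that cohort, and a ratio below 1 means that it over-predicts.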

Weaknesses of the scoring systems Up to today, it is impossible to recommend a stratification system as being the most suited. The Parsonnet is based on a database, where variables reflect the US population and for some people, this population differs from the Europeans. Moreover, in the Parsonnet score, the ‘‘freedom’’ of definition of some variables (example: the ‘‘destructive’’ situations and the ‘‘rarest’’ co-morbidities) increases the subjective factor and thus alters the ultimate score. In general, it has been shown that the Parsonnet score over-calculates risk. Moreover, if a patient is [80 years old, then the expected mortality is set to be 31 %, which is quite higher than the observed. Moreover, the patient with an ejection fraction \30 %—the scoring system gives 4 points,—so mortality is estimated to be 1–2 %, an estimated number that is quite lower than the observed. In EuroSCORE the absence of variables such as insulin dependent diabetes, hypertension and smoking history, is surprising. On the contrary, the presence of non-cardiovascular disease has been taken into account. Moreover, a disadvantage is that the anatomy and state of the coronary arteries are not taken into consideration (small vessels, chronic obstruction with ‘recanalization’, endomyocardial LAD branch, diffuse distal disease, non-graftable vessels). The total EuroSCORE possibly under-calculates the risk in high-risk patients. On the contrary it over-calculates the risk in medium and low-risk patients. Moreover, a patient with a poor LV and an ejection fraction \30 % is scored with 3 points; the amount of reversible myocardium is not taken into account, which I think biases the overall result. Likewise under the

circumstances ischemic mitral disease is not taken into consideration separately. Frailty is another variable that is not taken into account by any scoring system, although it is becoming increasingly important nowadays, especially now that a substantial percentage of patients are above 80 years old. The weakness of scoring systems could also be attributed to the fact that a large number of risk factors in most models are highly correlated. It is important to recognise correlation between predictor variables, as the additional risk contribution of certain variables can in part be explained by the effect of other variables; such potential interactions may therefore overestimate risk in certain categories of patients. Another source of bias is the preoperative “over-scoring” of patients, as underlined in the discussion between Dr. Frater and Dr. Nashef [30]: Dr. Frater: Do you know what happened in New York when we included as a risk factor the intravenous use of nitroglycerin in the preoperative stage? Dr. Nashef: All patients were suddenly on nitroglycerin. Intuitively, the larger the database on which a risk scoring system is developed, the more risk factors acquire statistical significance. Even though it seems obvious that ideal models should include multiple risk factors, whether more factors yield better prediction remains unclear. For example, the STS risk stratification model includes multiple risk factors, whereas the Toronto risk-adjustment scheme includes only 5 factors, yet the diagnostic value of these models is similar. The predictive accuracy of different risk models may be influenced by numerous factors, such as differences in variable definitions, management of incomplete data fields, surgical procedure selection criteria, and geographical differences in patient risk factors. The prevalence of risk factors in patients referred for heart surgery may also change over time.
Furthermore, the majority of the data used to generate, validate, and compare risk stratification systems for cardiac surgery predate many of the newer surgical techniques, such as off-pump surgery; nevertheless, as per Parolari et al. [43], the EuroSCORE algorithms performed similarly, with comparable performance in on-pump and off-pump CABG; both risk models, however, significantly overestimated mortality. Structural weaknesses of scoring systems may also be reflected in human errors related to the interpretation of a variable’s definition or to sometimes conflicting information in the patient’s chart. One method to improve the development of a risk model could be to include more patients with higher risk scores, as suggested by Wyse et al. [44]. Another method would be to include as many variables as possible [45]; however, to make such a hypothetical model work, one would have to balance the efficiency of collecting the

data, accuracy, comprehensiveness and user-friendliness. A further method to improve risk prediction scoring could be to use more complex statistics to build a risk model, in contrast to most risk algorithms, which are based on logistic regression analysis with a priori assumptions of linear relationships. In addition, the quality of the coronary target vessels is not addressed by any of the known systems; we speculate that the SYNTAX score may have to be incorporated into a future risk stratification model. Finally, one should mention the comprehensive criticism of the risk models published in a recent editorial by Sergeant et al. [41]. It is worth stating here that most of the risk systems calculate 30-day mortality, whereas, for example, in patients undergoing heart valve replacement [46], 30-day mortality accounts for approximately 56–57 % of the deaths observed at 1 year post-procedure.
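The “a priori assumption of linear relationships” in logistic regression based algorithms can be written out explicitly. As a sketch in generic notation (the symbols $p$ for the predicted probability of death, $x_i$ for the $k$ risk factors and $\beta_i$ for their fitted coefficients are our own and do not refer to any specific published score), such a model has the form:

```latex
\[
\operatorname{logit}(p) \;=\; \ln\frac{p}{1-p} \;=\; \beta_0 + \sum_{i=1}^{k} \beta_i x_i ,
\qquad
p \;=\; \frac{1}{1 + \exp\!\left(-\beta_0 - \sum_{i=1}^{k} \beta_i x_i\right)} .
\]
```

Each risk factor $x_i$ shifts the log-odds of death by a fixed amount $\beta_i$, regardless of the values of the other factors. Non-linear effects and interactions between factors are captured only if additional terms are explicitly added to the sum, which is the limitation that more complex statistical methods would aim to address.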

Conclusion There is an argument that the discrimination provided by the predictive models is barely better than clinical judgment and, furthermore, that many high-risk cardiac surgical patients require a prolonged stay in the intensive care unit, something that is not predicted by those models. Given that surgical outcomes are published on the Internet, “candidate” surgical patients who visit the websites of various clinics will obviously prefer, without being able to analyse the figures further, those clinics with the “best” statistical results. Hence, the lower the risk of the cases, the “better” the statistical result, and the more preferred the clinic becomes. This has been observed in New York State, where a hospital had been rejecting high-risk patients, aiming to improve its “outcomes”. There are several benefits to risk stratifying patients: (a) better understanding and definition of the basic demographics of patients undergoing cardiac surgery; (b) capturing the evolving and changing population requiring cardiac surgery; (c) unit benchmarking and development of effective quality improvement programs; (d) assessing the performance of a hospital for its case mix compared with the expected performance of a reference group of providers for a similar case mix; (e) promoting awareness and rationalisation of resource utilisation; and (f) using data to leverage healthcare providers to affect policy. Cardiac surgical registries will soon attempt to track patients for longer follow-up periods after discharge to identify surgery-related deaths over more extended periods following surgery, thereby improving the monitoring and prediction of patient outcomes. Wu et al. [47] developed a model to predict long-term mortality following CABG. The overall 7-year mortality was 24.2 %, with renal failure

requiring dialysis associated with the highest relative risk of death (hazard ratio = 5.53). Finally, when properly conducted, the measurement of risk-adjusted surgical outcomes should improve quality control and decision-making and allow early identification of poor performance.

Conflict of interest The authors have declared that no conflict of interest exists.

References 1. Christian CK, Gustafson ML, Betensky RA, Daley J, Zinner MJ. The volume–outcome relationship: don’t believe everything you see. World J Surg. 2005;29(10):1241–4. 2. Milstein A, Galvin RS, Delbanco SF, Salber P, Buck CR Jr. Improving the safety of health care: the leapfrog initiative. Eff Clin Pract. 2000;3(6):313–6. 3. Birkmeyer JD, Dimick JB. Potential benefits of the new leapfrog standards: effect of process and outcomes measures. Surgery. 2004;135(6):569–75. 4. Birkmeyer JD, Stukel TA, Siewers AE, Goodney PP, Wennberg DE, Lucas FL. Surgeon volume and operative mortality in the United States. N Engl J Med. 2003;349(22):2117–27. 5. Cohen IB. Florence Nightingale. Sci Am. 1984;250(3):128–37. 6. Codman EA. A study in hospital efficiency: as demonstrated by the case report of the first five years of a private hospital. Oakbrook Terrace: Joint Commission on Accreditation of Healthcare Organizations; 1996. 7. Ferraris VA, Ferraris SP. Assessing the medical literature: let the buyer beware. Ann Thorac Surg. 2003;76(1):4–11. 8. Cerfolio RJ, Ojha B, Bryant AS, Bass CS, Bartalucci AA, Mountz JM. The role of FDG-PET scan in staging patients with non small cell carcinoma. Ann Thorac Surg. 2003;76(3):861–6. 9. Berwick DM, Godfrey AB, Roessner J. Curing health care: new strategies for quality improvement. San Francisco: Jossey-Bass; 1990. 10. Rogers CA, Reeves BC, Caputo M, Ganesh JS, Bonser RS, Angelini GD. Control chart methods for monitoring cardiac surgical performance and their interpretation. J Thorac Cardiovasc Surg. 2004;128(6):811–9. 11. de Leval MR, Francois K, Bull C, Brawn W, Spiegelhalter D. Analysis of a cluster of surgical failures. Application to a series of neonatal arterial switch operations. J Thorac Cardiovasc Surg. 1994;107(3):914–23 (discussion 23–4). 12. Sherlaw-Johnson C, Gallivan S, Treasure T, Nashef SA. Computer tools to assist the monitoring of outcomes in surgery. Eur J Cardiothorac Surg. 2004;26(5):1032–6. 13. 
Treasure T, Gallivan S, Sherlaw-Johnson C. Monitoring cardiac surgical performance: a commentary. J Thorac Cardiovasc Surg. 2004;128(6):823–5. 14. Noyez L. Control charts, Cusum techniques and funnel plots. A review of methods for monitoring performance in healthcare. Interact Cardiovasc Thorac Surg. 2009;9(3):494–9. 15. Tu JV, Jaglal SB, Naylor CD. Multicenter validation of a risk index for mortality, intensive care unit stay, and overall hospital length of stay after cardiac surgery. Steering Committee of the Provincial Adult Cardiac Care Network of Ontario. Circulation. 1995;91(3):677–84. 16. Knaus WA, Wagner DP, Draper EA, Zimmerman JE, Bergner M, Bastos PG, et al. The APACHE III prognostic system. Risk prediction of hospital mortality for critically ill hospitalized adults. Chest. 1991;100(6):1619–36.

17. Hammermeister KE, Burchfiel C, Johnson R, Grover FL. Identification of patients at greatest risk for developing major complications at cardiac surgery. Circulation. 1990;82(5 suppl):IV 380–9. 18. Ferraris VA, Ferraris SP. Risk factors for postoperative morbidity. J Thorac Cardiovasc Surg. 1996;111(4):731–8 (discussion 8–41). 19. Stewart AL, Greenfield S, Hays RD, Wells K, Rogers WH, Berry SD, et al. Functional status and well-being of patients with chronic conditions. Results from the Medical Outcomes Study. JAMA. 1989;262(7):907–13. 20. Goldman L, Caldera DL, Nussbaum SR, Southwick FS, Krogstad D, Murray B, et al. Multifactorial index of cardiac risk in noncardiac surgical procedures. N Engl J Med. 1977;297(16):845–50. 21. Miller CC 3rd, Safi HJ, Winnerkvist A, Baldwin JC. Actual versus actuarial analysis for cardiac valve complications: the problem of competing risks. Curr Opin Cardiol. 1999;14(2):79–83. 22. Grunkemeier GL, Zerr KJ, Jin R. Cardiac surgery report cards: making the grade. Ann Thorac Surg. 2001;72(6):1845–8. 23. Shroyer AL, Coombs LP, Peterson ED, Eiken MC, DeLong ER, Chen A, et al. The Society of Thoracic Surgeons: 30-day operative mortality and morbidity risk models. Ann Thorac Surg. 2003;75(6):1856–64 (discussion 64–5). 24. Geissler HJ, Holzl P, Marohl S, Kuhn-Regnier F, Mehlhorn U, Sudkamp M, et al. Risk stratification in heart surgery: comparison of six score systems. Eur J Cardiothorac Surg. 2000;17(4):400–6. 25. Nilsson J, Algotsson L, Hoglund P, Luhrs C, Brandt J. Comparison of 19 pre-operative risk stratification models in open-heart surgery. Eur Heart J. 2006;27(7):867–74. 26. Nilsson J, Algotsson L, Hoglund P, Luhrs C, Brandt J. Early mortality in coronary bypass surgery: the EuroSCORE versus The Society of Thoracic Surgeons risk algorithm. Ann Thorac Surg. 2004;77(4):1235–9 (discussion 1239–40). 27. Sergeant P, de Worm E, Meyns B. 
Single centre, single domain validation of the EuroSCORE on a consecutive sample of primary and repeat CABG. Eur J Cardiothorac Surg. 2001;20(6):1176–82. 28. Stoica SC, Sharples LD, Ahmed I, Roques F, Large SR, Nashef SA. Preoperative risk prediction and intraoperative events in cardiac surgery. Eur J Cardiothorac Surg. 2002;21(1):41–6. 29. Parsonnet V, Dean D, Bernstein AD. A method of uniform stratification of risk for evaluating the results of surgery in acquired adult heart disease. Circulation. 1989;79(6 Pt 2):I3–12. 30. Roques F, Nashef SA, Michel P, Gauducheau E, de Vincentiis C, Baudet E, et al. Risk factors and outcome in European cardiac surgery: analysis of the EuroSCORE multinational database of 19030 patients. Eur J Cardiothorac Surg. 1999;15(6):816–22 (discussion 22–3). 31. Shahian DM, O’Brien SM, Filardo G, Ferraris VA, Haan CK, Rich JB, et al. The Society of Thoracic Surgeons 2008 cardiac surgery risk models: part 1—coronary artery bypass grafting surgery. Ann Thorac Surg. 2009;88(1 suppl):S2–22. 32. O’Brien SM, Shahian DM, Filardo G, Ferraris VA, Haan CK, Rich JB, et al. The Society of Thoracic Surgeons 2008 cardiac surgery risk models: part 2—isolated valve surgery. Ann Thorac Surg. 2009;88(1 suppl):S23–42.

33. Shahian DM, O’Brien SM, Filardo G, Ferraris VA, Haan CK, Rich JB, et al. The Society of Thoracic Surgeons 2008 cardiac surgery risk models: part 3—valve plus coronary artery bypass grafting surgery. Ann Thorac Surg. 2009;88(1 suppl):S43–62. 34. Reid C, Billah B, Dinh D, Smith J, Skillington P, Yii M, et al. An Australian risk prediction model for 30-day mortality after isolated coronary artery bypass: the AusSCORE. J Thorac Cardiovasc Surg. 2009;138(4):904–10. 35. Ariyaratne TV, Billah B, Yap CH, Dinh D, Smith JA, Shardey GC, et al. An Australian risk prediction model for determining early mortality following aortic valve replacement. Eur J Cardiothorac Surg. 2011;39(6):815–21. 36. Kuduvalli M, Grayson AD, Au J, Grotte G, Bridgewater B, Fabri BM. A multi-centre additive and logistic risk model for in-hospital mortality following aortic valve replacement. Eur J Cardiothorac Surg. 2007;31(4):607–13. 37. Hannan EL, Wu C, Bennett EV, Carlson RE, Culliford AT, Gold JP, et al. Risk stratification of in-hospital mortality for coronary artery bypass graft surgery. J Am Coll Cardiol. 2006;47(3):661–8. 38. Hannan EL, Wu C, Bennett EV, Carlson RE, Culliford AT, Gold JP, et al. Risk index for predicting in-hospital mortality for cardiac valve surgery. Ann Thorac Surg. 2007;83(3):921–9. 39. Kurki TS, Jarvinen O, Kataja MJ, Laurikka J, Tarkka M. Performance of three preoperative risk indices; CABDEAL, EuroSCORE and Cleveland models in a prospective coronary bypass database. Eur J Cardiothorac Surg. 2002;21(3):406–10. 40. Nashef SA, Roques F, Sharples LD, Nilsson J, Smith C, Goldstone AR, et al. EuroSCORE II. Eur J Cardiothorac Surg. 2012;41(4):734–44 (discussion 44–5). 41. Sergeant P, Meuris B, Pettinari M. EuroSCORE II, illum qui est gravitates magni observe. Eur J Cardiothorac Surg. 2012;41(4):729–31. 42. Poullis M, Fabri B, Pullan M, Chalmers J. Sampling time error in EuroSCORE II. Interact Cardiovasc Thorac Surg. 2012;14(5):640–1. 43. 
Parolari A, Pesce LL, Trezzi M, Loardi C, Kassem S, Brambillasca C, et al. Performance of EuroSCORE in CABG and off-pump coronary artery bypass grafting: single institution experience and meta-analysis. Eur Heart J. 2009;30(3):297–304. 44. Wyse RK, Taylor KM. Using the STS and multinational cardiac surgical databases to establish risk-adjusted benchmarks for clinical outcomes. Heart Surg Forum. 2002;5(3):258–64. 45. Head SJ, Osnabrugge RL, Howell NJ, Freemantle N, Bridgewater B, Pagano D, et al. A systematic review of risk prediction in adult cardiac surgery: considerations for future model development. Eur J Cardiothorac Surg. 2013;43(5):e121–9. 46. Edwards MB, Taylor KM. Is 30-day mortality an adequate outcome statistic for patients considering heart valve replacement? Ann Thorac Surg. 2003;76(2):482–5 (discussion 6). 47. Wu C, Camacho FT, Wechsler AS, Lahey S, Culliford AT, Jordan D, et al. Risk score for predicting long-term mortality after coronary artery bypass graft surgery. Circulation. 2012;125(20):2423–30.
