Metab Brain Dis
DOI 10.1007/s11011-014-9613-5

SHORT COMMUNICATION

On improving human clinical trials to the level of animal ischemic stroke studies

Donald G. Stein

Received: 24 June 2014 / Accepted: 18 August 2014
© Springer Science+Business Media New York 2014

Abstract
This is a response to Jickling and Sharp's paper discussing the near-complete failure of clinical trials for stroke interventions. While they propose a paradigm shift in the way preclinical research is conducted, I propose that it is clinical trial design that needs an overhaul. Clinical trials could be designed to reduce variance, prevent data entry errors, and encompass less ambitious enrollment criteria limited to fewer centers which have demonstrated expertise in the treatment of stroke (and TBI). Statistical fundamentalism is another soluble problem: clinical trial designs tend to address what is primarily statistically significant as opposed to what is medically significant. Adaptive Design is an alternative to current protocols that needs urgent consideration if we are to get through the Valley of Death between bench and bedside. Maybe it is time to change the clinical trial paradigm to adopt the precise modeling used in good laboratory research rather than asking scientists to give up well-established procedures for producing reliable and reproducible results.

Keywords Stroke . Clinical trial design . Preclinical research . Traumatic brain injury

D. G. Stein (*)
Department of Emergency Medicine, Emory University, Atlanta, GA 30322, USA
e-mail: [email protected]

In an excellent recent article in this journal, Drs. Glen C. Jickling and Frank R. Sharp ("Improving the translation of animal ischemic stroke studies to humans," published online Feb. 15, 2014) point out that "despite testing more than 1,026 therapeutic strategies in models of ischemic stroke and 114 therapies in human ischemic stroke, only one agent, tissue plasminogen activator, has successfully been translated to clinical practice as a treatment for acute stroke." I would add that there have also been more than 60 Phase III clinical trials for the treatment of traumatic brain injury (TBI), and all of them have failed as well. This is an abysmal overall success rate of five thousandths of 1 %. And this "success" is due entirely to tPA, the only drug that worked at all, and one which helps only a small percentage of stroke patients! This performance rate approaching zero has cost billions of dollars and millions of hours of labor, and has generated thousands of scientific publications debating whether a given agent ought to be tested in clinical trials in the first place.

Yes, as Jickling and Sharp note, we need consistent, clear guidelines governing how much independent verification is required before a new drug can go into clinical testing. Some guidelines are already in practice. Most pre-clinical stroke researchers already follow the Stroke Therapy Academic Industry Roundtable (STAIR) recommendations first promulgated in 1999: provide dose–response data; define the time-window for treatment; conduct reproducible, blinded studies of new drugs; measure histological and functional short- and long-term outcomes; do rodent studies and follow them with gyrencephalic species (the hardest to do); and use several stroke models, permanent and transient, in males and females.

But realistically, can the authors explain how "1,026 therapeutic strategies all have failed in models of stroke"? By chance alone one could expect that at least 50 would succeed. Something is clearly not right, but given the weight of all the pre-clinical experimentation, I question whether the problems lie entirely with faulty preclinical animal stroke or TBI studies.
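To make the arithmetic behind that chance expectation explicit (a simplification, assuming each of the 1,026 strategies were in fact ineffective and each were evaluated with a single hypothesis test at the conventional threshold of α = 0.05), the expected number of false-positive "successes" would be

\[
E[\text{false positives}] = N\alpha = 1026 \times 0.05 \approx 51.
\]

That not even chance-level successes survive in the translational record is precisely what makes the uniform failure so puzzling.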

From their perspective as clinicians with many years of clinical trial experience, Jickling and Sharp make a number of very plausible recommendations as to how to move beyond the "valley of death" that has been the fate of almost every drug ever tested for stroke or TBI. The problem, as they see it, lies in the fact that pre-clinical research fails to model the real world of medical trials with humans. They point out, for example, that stroke (like TBI, I would add) is a very heterogeneous disease, with widely varying infarct size and type, and that different mechanisms may underlie the different kinds of stroke. In contrast, in animal studies, injuries are well controlled and for the most part highly localized and reproducible. Among other good points, the authors note that the behavioral outcomes used in laboratory studies do not reflect the behavioral outcomes used in clinical trials, which are usually rather blunt, easy-to-administer, low-cost, dichotomous measures such as the Modified Rankin Scale for stroke and the Glasgow Outcome Scale-Extended for TBI.

From Jickling and Sharp's point of view, the clinical trial failures point to the fact that the precise, well-controlled, usually highly quantitative anatomical and behavioral measures and biomarkers used in animal studies do not reflect the vagaries and difficulties of clinical trial pressures and designs. They propose, in effect, that laboratory investigators should use pre-clinical stroke models that are variable and inconsistent; that they should spread laboratory testing of injured animals over a large number of centers (with, I surmise, varying degrees of scientific expertise in performing stroke surgeries); and that they should use, as in clinical trials, unsophisticated and blunt behavioral outcome measures that may not really reflect underlying short- or long-term functional deficits, all in order to mirror the high degree of variability and error (heterogeneity) that often accompanies clinical trials. From the perspective of a bench scientist like me, their recommendations could be read as a guarantee of failure at the preclinical level to match the failures of clinical trials over the last 3–4 decades, in the hope that something better might emerge from the intentional chaos this approach could produce.

I suggest an alternative approach: clinicians should consider designing their clinical trials to reduce variance, prevent data entry errors, and adopt less ambitious enrollment criteria. For example, instead of enrolling patients with multiple different kinds of brain injury, it would be possible to limit enrollment to blunt frontal or temporal cortex damage. For stroke, why not limit enrollment to patients with large vessel atherosclerosis, or cardioembolic stroke? Trials could also be limited to fewer centers that have demonstrated clear, substantial expertise in the treatment of stroke (and TBI). Think how much money and time would be saved if a smaller number of centers enrolled fewer patients with well-defined stroke (or TBI) parameters similar to those used in animal studies, instead of the other way around. Admittedly, such trials might take more time to complete because of lower patient enrollments and fewer centers, but the more concise data would be easier to interpret and more likely to reveal consistent and substantial treatment effects that are not masked by high levels of variability and error, and it might be feasible to run more such smaller, cheaper, more focused trials.
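A standard sample-size calculation, sketched here with the textbook two-arm comparison of means (not specific to any particular trial design), shows why this trade-off can favor the smaller, homogeneous trial. The number of patients needed per arm to detect a treatment effect δ, given outcome standard deviation σ, two-sided significance level α, and power 1 − β, is approximately

\[
n \approx \frac{2\sigma^{2}\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}}{\delta^{2}},
\]

so required enrollment grows with the square of the outcome variability: halving σ by enrolling a well-defined, homogeneous patient population cuts the required sample size by a factor of four.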

Because so many factors can influence the outcome of brain injuries in humans, it is critically important, and I believe entirely feasible, to collect much more precise, quantitative, objective functional outcome measures than the current standard. Asking a brain-injured patient (or a caregiver) to report on her "quality of life" as the primary measure of her functional status cannot possibly yield an objective result. Detailed, more quantitative and objective neuropsychological testing will cost more and take more time, but in the long run it could prove far cheaper than repeated failures to find effects across a broader spectrum of cases.

The FDA plays a very important role in deciding what outcome measures should be used in clinical trials for stroke and TBI. But if the agency's requirements are based on obsolete standards and methods, the whole purpose of the trial is defeated before it even starts. As the techies say, "garbage in, garbage out." Even the best biostatistical designs cannot fix this problem. From the many meetings on TBI that I have attended, better definitions and outcome measures are clearly major concerns for the field. There may not be an easy solution to this problem, but it is not impossible to resolve. Recently, Tate et al. (2013) identified more than 700 tests that are available "for all the major areas of function pertinent to TBI" (p. 739). Perhaps some (or even many) of these tests could be adapted to quantitative evaluations of stroke patients. The World Health Organization International ICF Checklist (2003) is a comprehensive assessment instrument that could also be considered for further evaluation.

Another glaring problem in clinical trial design is what I think of as statistical fundamentalism: to what extent do clinical trial designs address what is medically significant as opposed to what is primarily statistically significant? Phase III clinical trials are usually terminated either because treatment-related serious adverse events are found, or because the differences between the treatment group and the control group do not approach statistical significance, usually set at a probability of no more than 5 % of being in error when rejecting the null hypothesis of no difference between groups. In other words, loosely speaking, one has to be 95 % confident that a drug works.

Obviously very strict standards must be set in preclinical research using animals to assure that, at the very least, this 95 % (or better) confidence of finding an effect is maintained in multiple peer-reviewed studies before any drug can be advanced to clinical trial. However, if the bar is set equally high in Phase III trials, what might be medically significant is rejected. Are FDA requirements so strict that nothing can pass? Retrospective analyses of groups of studies might provide more confidence in a medically significant effect if a given proportion of those trials approached a level of significance that could be accepted by consensus, especially if the drugs in question had no serious adverse effects and there were no alternative treatments. Here is an example. Imagine a patient with a very severe stroke. If you had a drug with no history of serious adverse effects in Phase II trials but you were only 90 % (p ≤ 0.1) confident that it would work, would you really want to withhold it?
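A simple calculation, with hypothetical numbers and assuming the trials are independent, illustrates why such a consensus criterion carries real evidential weight. If a drug were truly ineffective, each trial would reach p ≤ 0.1 with probability 0.1, so the chance that, say, at least three of four independent trials would all approach significance at that level is

\[
P = \binom{4}{3}(0.1)^{3}(0.9) + (0.1)^{4} \approx 0.004,
\]

far stronger evidence than the 5 % threshold applied to any single trial, even though no individual trial "passed."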
