G Model NSR-3719; No. of Pages 7

ARTICLE IN PRESS Neuroscience Research xxx (2014) xxx–xxx

Contents lists available at ScienceDirect

Neuroscience Research journal homepage: www.elsevier.com/locate/neures

Review article

Changing requirements and resulting needs for IT-infrastructure for longitudinal research in the neurosciences Karoline Buckow, Matthias Quade, Otto Rienhoff, Sara Y. Nussbeck ∗ University Medical Center Göttingen, Department of Medical Informatics, Robert-Koch-Str. 40, 37075 Göttingen, Germany

Article info

Article history:
Received 10 February 2014
Received in revised form 6 August 2014
Accepted 13 August 2014
Available online xxx

Keywords:
IT-infrastructure
Metadata
Identity management
Data quality
Neuroscience
Infrastructure methodology

Abstract

The observation of growing “difficulties” in IT-infrastructures in neuroscience research in recent years led to a search for reasons and to an analysis of how this phenomenon is reflected in the scientific literature. The observation was analyzed systematically by means of a retrospective analysis of nine multicenter research projects in the neurosciences and a literature review. Results show that the rise in complexity stems mainly from two sources: (1) a growing need for information on the quality and context of research data (metadata) and (2) long-term requirements for handling the consent and the identities/pseudonyms of study participants and biomaterials in accordance with legal requirements. The combination of these two aspects with very long study durations and data evaluation periods makes up the subjectively perceived “difficulties”. A direct consequence of this result is that big multicenter trials are becoming part of integrated research data environments and no longer stand alone. This drives up the resource needs for IT-infrastructure in neuroscience research. In contrast to these findings, literature on this development is scarce and the problem is probably underestimated.

© 2014 Elsevier Ireland Ltd and the Japan Neuroscience Society. All rights reserved.

Contents

1. Introduction
2. Materials and methods
   2.1. IT-requirements for longitudinal and large cohort studies
   2.2. International literature review
   2.3. Description of the latest application design
3. Results
   3.1. IT-requirements for longitudinal and large cohort studies
   3.2. International literature review
   3.3. Description of the latest application design
4. Discussion
   4.1. IT requirements for longitudinal and large cohort studies
   4.2. International literature review
   4.3. Description of the latest application design
5. Conclusion and recommendations for future neuroscience projects
Conflicts of interest
Acknowledgements
References

∗ Corresponding author. Tel.: +49 551396984. E-mail address: [email protected] (S.Y. Nussbeck). http://dx.doi.org/10.1016/j.neures.2014.08.005 0168-0102/© 2014 Elsevier Ireland Ltd and the Japan Neuroscience Society. All rights reserved.

Please cite this article in press as: Buckow, K., et al., Changing requirements and resulting needs for IT-infrastructure for longitudinal research in the neurosciences. Neurosci. Res. (2014), http://dx.doi.org/10.1016/j.neures.2014.08.005


1. Introduction

According to an estimate by the World Health Organization and the US National Institute of Ageing, by 2016 the world population will, for the first time in history, include more people aged 65+ than people younger than five years (WHO and US National Institute of Ageing, 2011). With this demographic shift, the prevalence of neurodegenerative diseases, such as Alzheimer’s and Parkinson’s, will increase over the next decades. By 2050, an estimated 100 million people worldwide are expected to suffer from Alzheimer’s (Coleman and Barrow, 2012). This will amount to a predicted annual cost of 1.9 trillion US$. According to a study of the European Brain Council, 127 million Europeans were affected by brain disorders in 2004, which cost the health systems approximately 400 billion Euros (Di Luca, 2011). To boost neuroscience research, the European Union increased its funding budget in the 7th framework program for brain research from 10% to 20% (Di Luca, 2011).

This paper addresses the growing IT challenges arising from the new research approaches and changing methods that have been applied in the neurosciences over the last ten to 15 years and that continue to evolve. With regard to the existing and upcoming difficulties, it describes relevant measures that help research projects to produce high-quality research data and to provide sustainable databases that comply with the applicable regulations.

But what distinguishes neurologic disorders from cardiologic or oncologic diseases? And how do these particularities affect the planning, setup, and maintenance of the corresponding IT-infrastructures? One key aspect is the heterogeneity and complexity of symptoms affecting the patients’ mental well-being, cognition, motor function, self-perception or self-conception. Thus, the measurement and evaluation of effects resulting from neurologic diseases cover a broad range of instruments and methods. Even conditions like sexual perceptions can by now be observed through the analysis of physiological data.

Due to this heterogeneity (Balk et al., 2014), holistic approaches are chosen to investigate the etiology of diseases, to assess the morbidity and severity of clinical symptoms, and to find biomarkers for prediction and therapy. All this takes place against the background of changing perceptions of disease definitions and nosologic readjustments, which are brought into focus and stimulated by research approaches dealing with personalized medicine (Murray et al., 2011). Furthermore, this holistic approach affects the use and the applicability of biomarkers, which will be based on a whole panel of different measurements and platforms (Filiou and Turck, 2011). For example, genomics approaches like genome-wide association studies (GWAS¹) are no longer sufficient for identifying biomarkers in complex psychiatric diseases. These approaches must be supported by further research covering epigenomics, phenomics, environmental factors, and neurobiological approaches (Schulze, 2010). This results in more complex study designs and in the subdivision of patients into smaller treatment groups, which, in order to reach statistical significance, require higher numbers of patients to be included in one study.

To make symptoms and conditions measurable and comparable, a broad range of techniques and instruments is used, and the corresponding results and interpretations need to be handled adequately. Neurologic symptoms are often measured using specific questionnaires to be answered by the patient (such as the Multiple Sclerosis Impact Scale (Riazi, 2002)) or scores assessed by physicians (like the Expanded Disability Status Scale (Kurtzke, 1983)).

A hallmark of psychiatric research is big cohorts with several thousand patients and controls, which can only be achieved through the collaboration of many centers (Anderson-Schmidt et al., 2013; Sullivan, 2010). This applies also, and especially, to

¹ GWAS: genome-wide association studies.

neuropathology. Ethically, the human brain holds an exceptional position (Shen, 2013), and with the decreasing number of autopsies (Kretzschmar, 2009) the number of brain donations is also decreasing. Therefore, large consortia are being built (Bell et al., 2008; Sheedy et al., 2008) to study the brains of affected people in the best possible way.

The difficulty of measuring “soft” clinical symptoms described above is a major challenge in multicenter projects (interobserver variability). The same applies to MRI investigations and to the processing of biomaterial using different devices from various vendors.

Longitudinal studies are required, and are already being performed (Demiroglu et al., 2012; Warner et al., 2008), since patients need to be followed up for many years to observe disease progression, therapy effects, or changes in quality of life. In addition, many mental disorders and degenerative diseases, such as Multiple Sclerosis or Huntington’s disease, have an early onset. Chronic diseases affect patients for years and quite often result in an early termination of employment, which places an additional burden on the health system and economy of a country (DeVol et al., 2007).

A big issue in longitudinal studies is that new knowledge is discovered, and further developments result from it, during the study runtime. Hence, the diagnostic clarification grid may change over time, as may instruments or definitions. For example, the diagnostic criteria for Multiple Sclerosis have been revised twice in the last decade (Polman et al., 2011).

How can all these challenges be addressed when building a data infrastructure for clinical neuroscience research today? This paper provides an insight into the state of the art of IT-infrastructure in neuroscientific research projects and an outlook on the upcoming challenges that need to be addressed.

2. Materials and methods

To identify challenges, requirements and solutions for IT-infrastructure for longitudinal research in the neurosciences, a mixed approach was chosen: reviewing the literature and analyzing the experiences of neuroscience research projects of the last ten to 15 years, from 2000 until today.

2.1. IT-requirements for longitudinal and large cohort studies

Neuroscience is one of two key research topics of the University Medical Center Göttingen (UMG²). To identify the IT-requirements for longitudinal research in the neurosciences, national as well as international neuroscience projects of the Department of Medical Informatics were evaluated following Shortliffe’s concept of experimental research in medical informatics (Shortliffe, 1983). This means that implementations of solutions are understood as experiments that test which positive and negative outcomes a solution produces. The subsequent design is then a re-design that reflects the results and itself becomes an experiment again. This results in a continuous improvement of approaches, IT-components, and methods.

Since 2001 the Department of Medical Informatics has designed, re-designed, operated, and evaluated IT-infrastructures for nine national and international neuroscience projects (see Table 1). These projects are perceived as experimental designs for the given research purpose, as they contain different types of data for many studies. To identify IT-requirements, the research projects and respective IT-components were compared regarding the following questions: (1) How did volume and types of variables in the

² UMG: University Medical Center Göttingen.


Table 1
Overview of neuroscientific research projects supported by the Department of Medical Informatics. Similar infrastructures have been developed and operated for cardiologic and oncologic projects. Interestingly, continuous electrophysiological measurements like EEGs were never requested.

Project | IT-components | Years
National multiple sclerosis (MSᵃ) registry | Phenotype database | Since 2001
National network (NWᵇ) dementia | Phenotype database, image database, biomaterial database | 2002–2009; reactivated 2014
NW CJD | Phenotype database, image database | 2003–2006
NW MS | Phenotype database, image database, biomaterial database, identity management | Since 2009
Treatment study of dementia | Phenotype data, image data | Since 2011
European MS registry | Phenotype database, data merging tool | Since 2011
Clinical research group psychosis | Phenotype database, biomaterial database, identity management, study participant management system, data merging tool | Since 2011
National psychiatry cohort | Phenotype database, image database, biomaterial database, identity management | Since 2012
Psychiatric forensic study | Video recorded eye movements, phenotype database | Designed 2013

ᵃ MS: Multiple Sclerosis.
ᵇ NW: National network.

databases change over the considered period? (2) How did the complexity of data systems develop? (3) Which new IT-requirements arose from adapting to the needs and challenges of neuroscience research? (4) Do the answers to the aforementioned questions imply a change in research setup?

2.2. International literature review

To put the identified IT-requirements into perspective, a literature search was performed addressing (a) IT-infrastructure for neuroscience in general and (b) literature focusing on the identified IT-requirements. The search used the advanced search of PubMed as well as that of the Web of Science in order to cover medical as well as informatics journals. In line with the focus of this paper, the search was limited to the publication years 2000–2014. Medical and IT-terms were used in combination to focus on papers describing the use of IT-infrastructure or of specific IT-components in neuroscience research projects. The search terms representing the medical context were “neuroscience” and the main neurological disorders, e.g. dementia, multiple sclerosis, epilepsy. These medical search terms were combined with IT-terms addressing IT-infrastructure in general as well as the identified IT-requirements.
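The combination of medical and IT search terms described in Section 2.2 can be illustrated with a minimal sketch. It assumes PubMed's field-tag syntax, in which `[dp]` restricts the publication date; Web of Science uses a different syntax. The medical terms are those named in the text, whereas the IT-terms and the helper names `or_group` and `build_query` are hypothetical and do not reproduce the search strings actually used.

```python
def or_group(terms):
    """Quote each term and join the terms into one parenthesised OR group."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(medical_terms, it_terms, first_year, last_year):
    """Combine medical and IT terms with AND and restrict the publication years."""
    date_filter = f"{first_year}:{last_year}[dp]"
    return " AND ".join([or_group(medical_terms), or_group(it_terms), date_filter])


# Medical terms named in the text; the IT terms are illustrative assumptions.
medical_terms = ["neuroscience", "dementia", "multiple sclerosis", "epilepsy"]
it_terms = ["IT infrastructure", "metadata", "identity management", "data quality"]

query = build_query(medical_terms, it_terms, 2000, 2014)
print(query)
```

Keeping the two term lists separate makes it easy to rerun the same medical context against a single IT-requirement, which corresponds to the two-part search (a) and (b) described above.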

2.3. Description of the latest application design

As a consequence of the aforementioned IT-requirements resulting from the analysis of the neuroscientific research projects listed in Table 1, the current design of the IT-infrastructure as implemented by the Department of Medical Informatics of the UMG is described. Specific IT-components that address upcoming challenges are pointed out. Cross references to results from the literature review are included.

3. Results

3.1. IT-requirements for longitudinal and large cohort studies

Looking in more detail at (1) data types and data volume, (2) complexity of data systems, (3) methodological questions, and (4) research setup in neuroscience studies, the following requirements were identified:

(1) Data types and volumes: During the last ten years, the number of research variables and the amount of corresponding data have not increased very much, leading to a perception among many researchers that few changes have occurred during the last decade. However, this is not true for two types of data, which are still seldom included in the study concepts: metadata on the quality of data and raw data from biomarker measurements, e.g. DNA sequence data (Fig. 1). The volume of both data types is increasing: metadata slowly but steadily, as it is a very tricky issue, and raw data exponentially, due to more and more imaging data stemming from higher resolutions, motion images, and functional markers in images. Data from biomaterial analyses grow exponentially because error rates are falling, measurements are becoming cheaper, and more biomarkers are identified every day.

(2) Complexity: The complexity stems from a growing gap between the original measurement and the captured data used as variables in studies. In the last two decades many different calculations were performed and algorithms applied to the measurements, which on the one hand have produced impressive new variables, especially in imaging. On the other hand, the value of these variables depends heavily on how data measurement, data collection and data processing are defined, programmed, and executed by the machinery, the individual lab assistant, or the human interpreter (Fig. 1). This leads to difficult situations regarding reliability and validity. Because of a lack of money and because this issue complicates research, this

Fig. 1. Increasing IT-infrastructure requirements: Schematic representation of the need to advance IT-infrastructure for neuroscience research based on data volume and data complexity in neuroscience studies. Starting with rather limited numbers of research data in single data entry systems, it is nowadays necessary to manage several different specialized IT-systems for different types of data in an interoperable way distributed over many centers. The volume of data grows exponentially, because of the vast amounts of images and biomarkers. This leads to complexity of IT-systems requiring big specialized groups of professionals with advanced data management skills and a long-term financial management of data centers. As studies often last many years, adherence to rules has to be documented in performance data. All of this needs long-term archiving and possible reactivation for new research questions at a later point of time.


problem is mostly hidden, but it has become a source of uncertainty that compromises the validity of study results.

(3) IT-requirements arising from neuroscience research challenges: Analyzing the neurological projects from Table 1, three main peculiarities were identified as typical for research in neuroscience. These are the needs for (A) a holistic research approach, resulting from the variety of symptoms and characteristics of neurologic disorders, (B) longitudinal studies, due to the chronic courses of some major neurologic diseases, and (C) large numbers of patients, e.g. because of the increasing relevance of personalized medicine approaches.

As stated above, a holistic analysis of the patient is required since research in neuroscience needs to consider a wide range of data resulting from clinical observations, biomaterial analyses, imaging technologies, or patients’ reporting. This variety of data types needs a complement from the ethical point of view: the informed consent should be designed in a modular way in order to empower the patient to opt in to a study or cohort in general but also to opt out of specific data types or procedures, e.g. whole genome analysis. The acquisition, storage, management, and provision of these different types of data require a broad range of very specialized IT-components.

The need for longitudinal cohorts results, for example, from the chronic or protracted courses of some major neurologic diseases. Following up study participants over a long period of time implies that these research projects have to deal with challenges like changing names, places of residence, responsible physicians, and legal guardians, as well as the participation of the patient in additional studies. To avoid loss to follow-up resulting from these challenges, the identity management must work with robust algorithms and high precision to securely track the identifying information and link it to non-speaking identifiers, i.e. identifiers that carry no information about the person.
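The modular informed consent described above can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not the consent model of any project in Table 1: the class names, the module names (`imaging`, `whole_genome_analysis`, `biomaterial`), and the rule that an undecided module counts as not permitted are all hypothetical choices.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ConsentModule:
    """One separately decidable part of the consent (hypothetical structure)."""
    name: str
    granted: bool
    decided_on: date


@dataclass
class Consent:
    """General opt-in to the study plus per-module opt-in or opt-out decisions."""
    participant_pseudonym: str
    general_participation: bool
    modules: dict = field(default_factory=dict)

    def decide(self, name, granted, decided_on):
        # A later decision for the same module replaces the earlier one.
        self.modules[name] = ConsentModule(name, granted, decided_on)

    def permits(self, name):
        # Assumption: without general participation, or without an explicit
        # decision for the module, the data type must not be used.
        if not self.general_participation:
            return False
        module = self.modules.get(name)
        return module is not None and module.granted


consent = Consent("PSN-0001", general_participation=True)
consent.decide("imaging", True, date(2014, 1, 15))
consent.decide("whole_genome_analysis", False, date(2014, 1, 15))

print(consent.permits("imaging"))                # True
print(consent.permits("whole_genome_analysis"))  # False
print(consent.permits("biomaterial"))            # False (never decided)
```

Storing the decision date with each module is one way to keep consent traceable over a study runtime of many years; a production system would additionally need versioning of consent forms and withdrawal handling.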
Over years, maybe even decades, of study runtime, specific circumstances may also change, such as updates of diagnostic criteria or the development of new therapies in the medical context, as well as advancements, extensions or renewal of software and hardware in the IT context. These changes in turn give rise to data migration challenges, changes in processes, and a need to train personnel. This implies the demand for a flexible and sustainable IT-infrastructure as well as for continuous support during the complete life-cycle of the infrastructure, including detailed documentation of IT-systems, processes, and decisions made.

Large numbers of patients are needed for sufficient statistical power, due to new methods in study design or because of the need to take various collectives into account. Two options exist for including large numbers of patients or study participants: recruiting patients in multiple centers or including patients from multiple (existing) cohorts. The former requires a decision between a central IT-infrastructure, providing a defined set of software tools for all participating centers (see also Demiroglu et al., 2012), and a decentralized IT-infrastructure, integrating data from the respective local software tools available at the participating centers. The latter requires appropriate handling of heterogeneous study contexts and data qualities as well as a mapping of data items (Flachenecker et al., 2014). Either way, there is a need to control the various approaches and workflows (standard operating procedures, SOPs) that the centers or cohorts use to collect digital and analog data or to process information. Ideally, these are documented in the so-called metadata, i.e. data about data (National Information Standards Organization (U.S.), 2004). In addition, the understanding or use of project-relevant terms might be heterogeneous. For both ways of including large numbers of study participants, an appropriate identity management should be considered that manages the identification of one study participant in different centers or in different cohorts.
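The mapping of data items across cohorts, together with metadata on how each value was collected, can be sketched as follows. This is a minimal sketch under assumptions: the cohort names, the local item names, the mapping table `ITEM_MAP`, and the provenance fields are hypothetical and are not taken from the projects in Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    """Metadata ("data about data") describing how a value was collected."""
    cohort: str
    center: str
    instrument: str
    sop_version: str


@dataclass(frozen=True)
class Observation:
    """A harmonized data item that keeps its provenance attached."""
    item: str
    value: object
    provenance: Provenance


# Hypothetical mapping from (cohort, local item name) to a common item name.
ITEM_MAP = {
    ("cohort_a", "edss_total"): "edss",
    ("cohort_b", "EDSS_score"): "edss",
}


def harmonize(cohort, record, provenance):
    """Map local items to common names; report items that have no mapping."""
    observations, unmapped = [], []
    for local_name, value in record.items():
        target = ITEM_MAP.get((cohort, local_name))
        if target is None:
            unmapped.append(local_name)
        else:
            observations.append(Observation(target, value, provenance))
    return observations, unmapped


provenance = Provenance("cohort_b", "center_07", "physician-assessed score", "SOP v2.1")
observations, unmapped = harmonize(
    "cohort_b", {"EDSS_score": 3.5, "smoker": "no"}, provenance
)
print(observations[0].item, observations[0].value)  # edss 3.5
print(unmapped)                                     # ['smoker']
```

Returning the unmapped items explicitly, rather than silently dropping them, makes gaps in the item mapping visible when cohorts are merged.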

The challenges described above result in several requirements that highlight the need for an adequate IT-infrastructure. First, an appropriate IT-tool for the identity management is needed which allows a secure but user-friendly way to handle pseudonyms across centers or cohorts and over several years, maybe even decades. Second, methods and tools must be deployed to manage and control the heterogeneity and quality of data among centers, cohorts, and studies. Third, to ensure sustainability and flexibility, the IT-infrastructure needs professional support for the project’s whole life-cycle and beyond. Fourth, processing of digital and analog data, especially in the interplay of various centers, requires a robust infrastructure of logistics, databases, and interfaces between different software products.

(4) Research setup: At the turn of the century most studies had a defined recruitment period; cases were assembled according to a study protocol and then statistically analyzed. This approach started to disappear at the end of the first decade: more and more data have been documented in such a form that they may be used later on for related research questions, after the originally planned time span or in comparison with data from other research institutions (provided the consent of the patient/study participant allows this). This leads to an even more important role for metadata, because methods of data generation vary over time and the context of data included in comprehensive analyses potentially varies.

3.2. International literature review

IT-infrastructure for neurosciences in general: A literature search in the Web of Science for the years 2000 to 2014 yielded 65 results using the keywords “neuroscience infrastructure” in paper topics. Of these 65 findings, the majority (42) addressed topics like education or medical findings.
Of the remaining 23 articles, almost equal numbers focused on imaging (seven), simulations and models (five), and IT-infrastructure, databases, and IT-tools (six). A detailed analysis of the six papers on IT-infrastructure, databases, and IT-tools published between 2001 and 2013 revealed a strong neuroimaging background, focusing on the management, provenance and analysis of large neuroimaging data (Dinov et al., 2010) or on storage techniques using data warehouse methods (Gee et al., 2010). Two older publications discussed non-curated databases for neuronal morphology data (Cannon et al., 2002) as well as local storage and web-indexing of content for image data (Gorin et al., 2001). All image research approaches in neuroscience are part of the larger emerging field of bioimage informatics (Peng, 2008). One of the articles identified was the editorial of the Neuroinformatics Journal, which describes the very heterogeneous landscape of neuroscience research and states that IT plays a major role as vast amounts of data are being generated (Ascoli et al., 2003). Finally, there was only one paper that focused on a different topic inherent to longitudinal studies: the assignment of data generated by diverse instruments to the correct study participant IDs and study visits (Rohlfing et al., 2013).

The literature search in PubMed focusing on the IT-requirements, as described before, resulted in 64 articles. Most of the papers (41 of 64) addressed the topic of data quality, but more from the medical or statistical point of view than from an IT-perspective. No articles were found addressing identity management. Several authors are aware that data quality and data heterogeneity are an issue when performing big trials or studies (Carmichael et al., 2012; Keefe and Harvey, 2008; Lillquist, 2004); nevertheless, only one paper described IT-based approaches to face these challenges (Nesbitt et al., 2013). In this paper, Nesbitt et al. describe the IT-infrastructure of a research network and point out the main components of the implemented IT-infrastructure. This includes a central data repository for phenotypic data, a project data warehouse providing interfaces for data


analysis, a collaboration platform supporting document management and structured communication, EEG/MRI file storage and review management, as well as electronic data collection tools for participant management, study documentation, or specimen tracking. Two other papers at least described the authors’ intention to set up an appropriate IT-infrastructure (López-Pousa et al., 2006; Pugliatti et al., 2012).

In summary, the literature on IT-infrastructure in general as well as on specific IT-challenges in the neurosciences is very scarce in comparison to the thousands of studies in the field. Challenges and requirements were addressed in many articles, whereas proper solutions were missing.

3.3. Description of the latest application design

Since 2001 the Department of Medical Informatics at the University Medical Center Göttingen in Germany has run and developed IT-infrastructure for networked medical research. In the area of neuroscience research, several national and international projects (Dangl et al., 2010; Demiroglu et al., 2012; Helbing et al., 2010; Pugliatti et al., 2012; Stroet et al., 2010) have been supported over the years; the main components of the latest application design are described in the following.

To support the research process from feasibility estimation through recruitment and data capture to data analysis, publication and long-term archiving, several IT-components are necessary. Due to data protection regulations and data handling issues, clinical, biomaterial, and image data are stored in separate dedicated databases. This minimizes the risk of re-identification of study participants (Helbing et al., 2010; Pommerening et al., 2008). To strengthen data privacy even further, different identifiers are used for one patient or study participant in the different databases. A key tool of such an infrastructure is what we call identity management.
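The use of a different identifier per database can be illustrated with a minimal sketch. The text does not state how the pseudonyms are generated; the keyed hashing (HMAC-SHA256) used here is a swapped-in technique chosen for illustration, and a lookup table of randomly generated identifiers would be an equally plausible design. The database names, demo keys, and the function name `pseudonym` are hypothetical.

```python
import hashlib
import hmac

# Hypothetical per-database secrets. Real keys would be generated randomly and
# kept only inside the identity management, never hard-coded like this.
DATABASE_KEYS = {
    "phenotype": b"demo-key-phenotype",
    "image": b"demo-key-image",
    "biomaterial": b"demo-key-biomaterial",
}


def pseudonym(master_id: str, database: str, length: int = 12) -> str:
    """Derive a database-specific pseudonym from the master identifier."""
    key = DATABASE_KEYS[database]
    digest = hmac.new(key, master_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length].upper()


master_id = "MPI-000123"  # identifier known only to the identity management
ids = {db: pseudonym(master_id, db) for db in DATABASE_KEYS}
for database, value in ids.items():
    print(database, value)
```

Because each database uses its own key, the same participant receives unrelated identifiers in the phenotype, image, and biomaterial databases, and records can only be linked by the component that holds the keys. Truncating the digest to 12 characters keeps the identifiers short but raises the collision risk in very large cohorts, so a real system would need a collision check.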
The identity management (ID management³) acts as a master patient index and uses phonetic word matching algorithms to link patients’ identities even when they are reported from different centers or physicians and with different spellings. For example, it can identify that John Doe, male, born 20.06.1958 in Göttingen and John Dough, male, born 20.06.1958 in Göttingen are most probably the same person. This is especially important for longitudinal studies, when study participants move or change their names. Moreover, studies enrolling large numbers of participants need to ensure that IDs are correctly assigned, as entire study outcomes might otherwise be affected. An appropriate ID management raises data quality, in particular within a multicenter research environment. Data quality analyses can only be performed, and the originally formulated research questions can only be answered, using an ID management as a link between the databases storing clinical, biomaterial, and image data. In summary, a centralized ID management is essential to guarantee the correct record linkage of all captured data, especially over a long period of time and in large cohorts.

A study participant management system (SPMS⁴) is especially helpful when the study participant has to be contacted several times in longitudinal studies. It manages data about study visits and the contact history with a study participant and thereby reduces the risk of loss to follow-up. Additionally, it can manage the information on given consents (Schwanke et al., 2013). The SPMS is tightly connected to the ID management and can be used to generate IDs for a study participant in several databases (Supplementary Fig. 2 of Demiroglu et al., 2012).
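The phonetic matching of identities can be illustrated with the John Doe / John Dough example from the text. The sketch below is a toy under stated assumptions: `phonetic_key` is a deliberately crude, hypothetical key (it is neither Soundex nor Cologne phonetics nor the algorithm used by the ID management described here), and the equal weighting of the five attributes in `match_score` is likewise an assumption.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    first_name: str
    last_name: str
    sex: str
    birth_date: str   # ISO format, e.g. "1958-06-20"
    birth_place: str


def phonetic_key(name: str) -> str:
    """Toy phonetic key: keep the first letter, drop 'gh', vowels and h/w/y."""
    s = re.sub(r"[^A-Z]", "", name.upper()).replace("GH", "")
    if not s:
        return ""
    key = s[0] + re.sub(r"[AEIOUYHW]", "", s[1:])
    return re.sub(r"(.)\1+", r"\1", key)  # collapse doubled letters


def match_score(a: Person, b: Person) -> float:
    """Fraction of the five attributes that agree (names compared phonetically)."""
    checks = [
        phonetic_key(a.first_name) == phonetic_key(b.first_name),
        phonetic_key(a.last_name) == phonetic_key(b.last_name),
        a.sex == b.sex,
        a.birth_date == b.birth_date,
        a.birth_place.casefold() == b.birth_place.casefold(),
    ]
    return sum(checks) / len(checks)


doe = Person("John", "Doe", "male", "1958-06-20", "Göttingen")
dough = Person("John", "Dough", "male", "1958-06-20", "Göttingen")
other = Person("Maria", "Schmidt", "female", "1970-01-02", "Kassel")

print(match_score(doe, dough))  # 1.0
print(match_score(doe, other))  # 0.0
```

A key this coarse also maps unrelated names onto each other (for instance, "Jane" and "John" share the key "JN"), which is why the name comparison is combined with sex, date of birth, and place of birth, and why borderline scores in a real system would typically be routed to manual review.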

³ ID management: Identity management.
⁴ SPMS: Study participant management system.


Phenotypic, image, and biomaterial data are the most common data types in neuroscience research. Additional data types, e.g. video files captured in behavioral research, most probably require further specialized databases and management software. Depending on the type of study, one or the other database will be more important, as the following examples show.

In psychiatric studies, many different scales and questionnaires are typically involved. Thus, up to several thousand items are documented per participant per visit in a dedicated study or phenotype database. Additionally, related metadata need to be collected and managed to track heterogeneity if data collection is performed at different sites or if data are merged from various cohorts or trials (Flachenecker et al., 2014).

Studies focusing on the degeneration of the brain, e.g. in multiple sclerosis, prion diseases, or Alzheimer’s disease, require a highly sophisticated biomaterial management system and database to document in detail the quality and storage locations of the several hundred samples generated from each brain under cryo- and paraffin-embedded conditions.

Because brain autopsies can only be performed after the death of patients and require a separate informed consent, many neuroscientific studies involve a variety of imaging procedures to gather valuable information during the lifetime of patients. Thus, a dedicated image database to manage and annotate images, to document their quality, and to add relevant metadata is essential.

For all IT-components, an audit trail that traces each and every step of data manipulation is implemented. For electronic data capture in clinical trials, an audit trail is necessary to comply with Good Clinical Practice (GCP⁵) guidelines.

4. Discussion

The analysis of the neuroscience studies revealed some trends that are only partially reflected in the literature review.

4.1.
IT requirements for longitudinal and large cohort studies

Data types and volumes: Clinical researchers and medical informatics specialists perceive the increase in data volume differently. The reason may be that the explosion of biomarker and image data is only slowly reaching the statistical analysis level, owing to a lack of methods and to traditional study designs. The same is true for the growing amount of metadata describing the quality of biomarker data and research data as well as of biomaterials. The increasing focus of strategic data analysis methodology on analyzing metadata before the data themselves is part of new data evaluation strategies linked to buzzwords such as “big data” or “smart data”.

Complexity: The growing complexity of data requires a focus on the corresponding methodology, which is currently not reflected in the training of clinical researchers. Individuals with a mixture of clinical and methodological know-how, the strength of clinical research in the decades between 1960 and 1990, are hard to find nowadays. As more and more researchers become aware of the severity of this problem of study validity, more and more raw data are stored and the production process of data (or biomaterial) is mapped into metadata descriptions. Thus, as data become more complex, software systems become more complex as well, e.g. in order to manage these very diverse metadata.

IT-requirements arising from neuroscience research challenges: Although the IT-requirements for holistic approaches and for longitudinal and large cohorts were identified and first solutions were outlined, more research is required in this area. This is especially true in the light of upcoming big data

⁵ GCP: Good Clinical Practice.

Please cite this article in press as: Buckow, K., et al., Changing requirements and resulting needs for IT-infrastructure for longitudinal research in the neurosciences. Neurosci. Res. (2014), http://dx.doi.org/10.1016/j.neures.2014.08.005


studies based on advances in the omics area and still-dropping analysis costs. For the sake of standardized results, the percentage of formalized descriptions should be increased, which calls for standardized terminology systems (Elkin, 2012).

Research setup: The key requirement is that IT-infrastructures have to be planned, designed, operated, maintained, and archived for decades and no longer for a given time span of a couple of years. This implies the implementation of a comprehensive metadata management that tracks the origin of data and provides relevant context and quality information. The analysis of the neuroscience projects supports the literature analysis: the growing importance of a complex interoperable IT-based research infrastructure is underestimated in many research designs and could be optimized (Akil et al., 2011). Study infrastructures are still set up to serve one limited purpose: that of the specific research project. This reflects the study-oriented funding systems, although research itself is understood as a sequence of studies, intellectually interlinked in secondary and tertiary analyses of separate studies leading to meta-studies and reviews, or as a basic research question leading to translational research and later on to a clinical trial. The time span for these processes usually covers a decade.

4.2. International literature review

The areas of research identified in the international literature review covering 2000 to 2014 are in accordance with a retrospective analysis of articles submitted to the journal Neuroinformatics (Schutter et al., 2012). Hardly any papers were published on databases or IT-infrastructure in the neurosciences. In Neuroinformatics they still account for only 3–8% of all articles published (Schutter et al., 2012), indicating at least a small increase within the last ten years. This scarcity might be due to the lack of appreciation for infrastructure research.

4.3.
Description of the latest application design

As described in the results section, setting up the IT-infrastructure for longitudinal neuroscience research involving large numbers of participants is by no means easy, and the effort should not be underestimated. The key to a successful infrastructure lies in a professional ID-management and in overcoming data heterogeneity through the systematic use of metadata. These issues apply not only to neuroscience research IT-infrastructure but to all areas of biomedical research.

5. Conclusion and recommendations for future neuroscience projects

In addition to the data types described above, we expect that more sources of data will have to be included in research projects within the upcoming years, such as omics data, high-resolution images from digital pathology, as well as data from medical registries or from health care information systems. The relevance of these different data types has been increasing: the more we learn about the common etiology of many different diseases, the more fixed nosological boundaries begin to disappear. Hence, interdisciplinary teams are needed to face new methodological challenges. Methods and tools to store, manage, and process the above-mentioned data are already being developed and will be evaluated and improved to provide a standardized IT-infrastructure and robust routines for research projects. The broadening of data sources implies an even higher need for appropriate metadata management and reliable ID-management techniques, both of which facilitate comprehensive analyses across the specialized databases.

Besides this look ahead, we have several recommendations for future neuroscience research projects in the areas of funding, governance,

training of researchers, and the value system of research. Following these recommendations will probably improve the efficiency of research.

Research could be further improved by developing and maintaining multicenter interoperable study infrastructures, which can easily be adapted to the needs of many studies and operated at reasonable costs for different research groups. Here, study centers (biostatistics) and infrastructure centers (biomedical informatics) for neuroscience research should join forces, as it takes many years to build up such infrastructures and to staff them with differently trained specialists. The infrastructure centers need two streams of funding: on the one hand, basic funding for the long-term operation and, on the other hand, a project-related part that comes from researchers who want to make efficient use of the infrastructure. The second component also guarantees that the infrastructure centers remain service-oriented and improve and adapt their infrastructure to customer needs. Competitive funding is also necessary to foster the infrastructure centers, their equipment, and their scientific methodology. These centers usually include faculties for biomedical informatics and statistics as well as psychosocial research.

Governance of research projects: During the last decade it has been observed that none of the governing bodies, i.e. the management boards of major research projects, included methodologists such as biomedical informaticians, statisticians, or psychosocial researchers; rather, they assembled an isolated mix of basic biological researchers and clinicians. The results indicate that, depending on the complexity of the infrastructure processes, a specialist in these issues should be part of the study management team or board. This would reduce the underestimation of potentials as well as of resources (time, staff, and technical means) in study design, maintenance, and evaluation.
Training of researchers: Many software providers market their tools with the promise that they enable researchers to perform powerful operations on their own. However, it has to be weighed up what researchers can do themselves, based on their methodological training, and what should be delegated or outsourced to a specialist. It requires a change of mindset to move from carrying out one study to produce a few publications to sharing research in a stream of data from several studies and enhancing knowledge. To convey the basics but also the complexity of biomedical informatics, the International Medical Informatics Association recommended 40 h of medical informatics training for undergraduate medical education (Mantas et al., 2010). This might help researchers to make the above-described decision based on career paths, the size of a research project, and the funding available.

Value system of research: During the last decade, research has been more and more driven by publications and impact points in increasingly specialized journals. Careers can no longer be built on the overarching strategic advancement of knowledge, and especially not on research methodologies including IT-infrastructures. Under the current value system, young researchers believe that they can plan, manage, and evaluate their “own” studies within easy, independent groups that do not include “difficult” biomedical informaticians, statisticians, or other methodologists. The current value system, with its low recognition of infrastructure methodology, has become a danger for strategic neuroscience research itself (Kommission Zukunft der Informationsinfrastruktur, 2011). This is true for many clinical fields of research, but especially for the complex neuroscience domains, and the value system should therefore be changed.

Conflicts of interest

The authors declare that there are neither actual nor potential conflicts of interest. All authors have approved the final article.


Acknowledgements

This publication is based, among others, on the following projects:

1. The EUReMS project, which has received (i) co-funding from the European Union, in the framework of the Second Health Programme 2008–2013, Priority Area: 3.3.2 Promote health – Promote healthier ways of life and reduce major diseases and injuries – Action: 3.3.2.7 Prevention of major and chronic diseases and rare diseases and (ii) funding from the following sponsors: Almirall, Bayer Pharma AG, Biogen Idec, ECTRIMS, GSK, F. Hoffmann La Roche, Genzyme, Medtronic Foundation, Merck Serono, Coloplast, Novartis, TEVA. The sole responsibility lies with the author and the Executive Agency for Health and Consumers is not responsible for any use that may be made of the information contained therein.
2. The Competence Network Multiple Sclerosis, which was funded by the German Federal Ministry of Education and Research, funding code 01GI1304B.
3. The clinical research group (KFO) 241 ‘Genotype-phenotype relationships and neurobiology of the longitudinal course of psychosis’, funded by the Deutsche Forschungsgemeinschaft (DFG) with grant number SCHU 1603/5-1.

References

Akil, H., Martone, M.E., Essen, D.C.V., 2011. Challenges and opportunities in mining neuroscience data. Science 331, 708–712.
Anderson-Schmidt, H., Adler, L., Aly, C., Anghelescu, I.-G., Bauer, M., Baumgärtner, J., Becker, J., Bianco, R., Becker, T., Bitter, C., et al., 2013. The “DGPPN-Cohort”: a national collaboration initiative by the German Association for Psychiatry and Psychotherapy (DGPPN) for establishing a large-scale cohort of psychiatric patients. Eur. Arch. Psychiatry Clin. Neurosci. 263, 695–701.
Ascoli, G.A., De Schutter, E., Kennedy, D.N., 2003. An information science infrastructure for neuroscience. Neuroinformatics 1, 1–2.
Balk, L., Tewarie, P., Killestein, J., Polman, C., Uitdehaag, B., Petzold, A., 2014. Disease course heterogeneity and OCT in multiple sclerosis. Mult. Scler., http://dx.doi.org/10.1177/1352458513518626.
Bell, J.E., Alafuzoff, I., Al-Sarraj, S., Arzberger, T., Bogdanovic, N., Budka, H., Dexter, D.T., Falkai, P., Ferrer, I., Gelpi, E., et al., 2008. Management of a twenty-first century brain bank: experience in the BrainNet Europe consortium. Acta Neuropathol. 115, 497–507.
Cannon, R.C., Howell, F.W., Goddard, N.H., De Schutter, E., 2002. Non-curated distributed databases for experimental data and models in neuroscience. Network 13, 415–428.
Carmichael, D.W., Vulliemoz, S., Rodionov, R., Thornton, J.S., McEvoy, A.W., Lemieux, L., 2012. Simultaneous intracranial EEG-fMRI in humans: protocol considerations and data quality. Neuroimage 63, 301–309.
Coleman, P.J., Barrow, J.C., 2012. Challenges and opportunities in neuroscience research. ChemMedChem 7, 339–341.
Dangl, A., Demiroglu, S.Y., Gaedcke, J., Helbing, K., Jo, P., Rakebrandt, F., Rienhoff, O., Sax, U., 2010. The IT-infrastructure of a biobank for an academic medical center. Stud. Health Technol. Inform. 160, 1334–1338.
Demiroglu, S.Y., Skrowny, D., Quade, M., Schwanke, J., Budde, M., Gullatz, V., Reich-Erkelenz, D., Jakob, J.J., Falkai, P., Rienhoff, O., et al., 2012. Managing sensitive phenotypic data and biomaterial in large-scale collaborative psychiatric genetic research projects: practical considerations. Mol. Psychiatry 17, 1180–1185.
DeVol, R., Bedroussian, A., Charuworn, A., Chatterjee, A., Kim, I.K., Kim, S., Klowden, K., 2007. An unhealthy America: the economic burden of chronic disease. Milken Institute, http://www.milkeninstitute.org/pdf/chronic_disease_report.pdf
Dinov, I., Lozev, K., Petrosyan, P., Liu, Z., Eggert, P., Pierce, J., Zamanyan, A., Chakrapani, S., Van Horn, J., Parker, D.S., et al., 2010. Neuroimaging study designs, computational analyses and data provenance using the LONI pipeline. PLoS ONE 5, http://dx.doi.org/10.1371/journal.pone.0013070.
Elkin, P.L., 2012. Terminology and Terminological Systems. Springer, London/New York, ISBN 9781447128168 1447128168.
Filiou, M.D., Turck, C.W., 2011. General overview: biomarkers in neuroscience research. Int. Rev. Neurobiol. 101, 1–17.
Flachenecker, P., Buckow, K., Pugliatti, M., Kes, V.B., Battaglia, M.A., Boyko, A., Confavreux, C., Ellenberger, D., Eskic, D., Ford, D., et al., 2014. Multiple sclerosis registries in Europe – results of a systematic survey. Mult. Scler., http://dx.doi.org/10.1177/1352458514528760 [Epub ahead of print].
Gee, T., Kenny, S., Price, C.J., Seghier, M.L., Small, S.L., Leff, A.P., Pacurar, A., Strother, S.C., 2010. Data warehousing methods and processing infrastructure for brain recovery research. Arch. Ital. Biol. 148, 207–217.


Gorin, F., Hogarth, M., Gertz, M., 2001. The challenges and rewards of integrating diverse neuroscience information. Neurosci. Rev. J. Bringing Neurobiol. Neurol. Psychiatry 7, 18–27.
Helbing, K., Demiroglu, S.Y., Rakebrandt, F., Pommerening, K., Rienhoff, O., Sax, U., 2010. A data protection scheme for medical research networks. Review after five years of operation. Methods Inf. Med. 49, 601–607.
Keefe, R.S.E., Harvey, P.D., 2008. Implementation considerations for multisite clinical trials with cognitive neuroscience tasks. Schizophr. Bull. 34, 656–663.
Kommission Zukunft der Informationsinfrastruktur, 2011. Gesamtkonzept für die Informationsinfrastruktur in Deutschland. http://www.leibniz-gemeinschaft.de/fileadmin/user_upload/downloads/Infrastruktur/KII_Gesamtkonzept.pdf
Kretzschmar, H., 2009. Brain banking: opportunities, challenges and meaning for the future. Nat. Rev. Neurosci. 10, 70–78.
Kurtzke, J.F., 1983. Rating neurologic impairment in multiple sclerosis: an expanded disability status scale (EDSS). Neurology 33, 1444–1452.
Lillquist, P.P., 2004. Challenges in surveillance of dementias in New York State. Prev. Chronic Dis. 1, A08.
López-Pousa, S., Garre-Olmo, J., Monserrat-Vila, S., Boada-Rovira, M., Tárraga-Mestre, L., Aguilar-Barberà, M., Lozano-Fernández de Pinedo, R., Lorenzo-Ferrer, J., 2006. A proposal for a clinical registry of dementias. Rev. Neurol. 43, 32–38.
Di Luca, M., 2011. European neuroscience research: the road ahead... Eur. J. Neurosci. 33, 767.
Mantas, J., Ammenwerth, E., Demiris, G., Hasman, A., Haux, R., Hersh, W., Hovenga, E., Lun, K.C., Marin, H., Martin-Sanchez, F., Wright, G., IMIA Recommendations on Education Task Force, 2010. Recommendations of the International Medical Informatics Association (IMIA) on Education in Biomedical and Health Informatics. Methods Inf. Med. 49, 105–120.
Murray, M.E., Graff-Radford, N.R., Ross, O.A., Petersen, R.C., Duara, R., Dickson, D.W., 2011.
Neuropathologically defined subtypes of Alzheimer’s disease with distinct clinical characteristics: a retrospective study. Lancet Neurol. 10, 785–796.
National Information Standards Organization (U.S.), 2004. Understanding Metadata. NISO Press, Bethesda MD, ISBN 9781880124628 1880124629.
Nesbitt, G., McKenna, K., Mays, V., Carpenter, A., Miller, K., Williams, M., Investigators, E.P.G.P., 2013. The Epilepsy Phenome/Genome Project (EPGP) informatics platform. Int. J. Med. Inf. 82, 248–259.
Peng, H., 2008. Bioimage informatics: a new area of engineering biology. Bioinformatics 24, 1827–1836.
Polman, C.H., Reingold, S.C., Banwell, B., Clanet, M., Cohen, J.A., Filippi, M., Fujihara, K., Havrdova, E., Hutchinson, M., Kappos, L., et al., 2011. Diagnostic criteria for multiple sclerosis: 2010 revisions to the McDonald criteria. Ann. Neurol. 69, 292–302.
Pommerening, K., Sax, U., Müller, T., Speer, R., Ganslandt, T., Drepper, J., Semler, S.C., 2008. Integrating eHealth and medical research: the TMF data protection scheme. In: eHealth: Combining Health Telematics, Telemedicine, Biomedical Engineering and Bioinformatics to the Edge. Aka, Berlin, pp. 5–10.
Pugliatti, M., Eskic, D., Mikolcić, T., Pitschnau-Michel, D., Myhr, K.-M., Sastre-Garriga, J., Otero, S., Wieczynska, L., Torje, C., Holloway, E., et al., 2012. Assess, compare and enhance the status of Persons with Multiple Sclerosis (MS) in Europe: a European Register for MS. Acta Neurol. Scand. Suppl. 195, 24–30.
Riazi, A., 2002. Multiple Sclerosis Impact Scale (MSIS-29): reliability and validity in hospital based samples. J. Neurol. Neurosurg. Psychiatry 73, 701–704.
Rohlfing, T., Cummins, K., Henthorn, T., Chu, W., Nichols, B.N., 2013. N-CANDA data integration: anatomy of an asynchronous infrastructure for multi-site, multi-instrument longitudinal data capture. JAMIA, http://dx.doi.org/10.1136/amiajnl-2013-002367.
Schulze, T.G., 2010.
Genetic research into bipolar disorder: the need for a research framework that integrates sophisticated molecular biology and clinically informed phenotype characterization. Psychiatr. Clin. N. Am. 33, 67–82.
Schutter, E.D., Ascoli, G.A., Kennedy, D.N., 2012. Ten years of neuroinformatics. Neuroinformatics 10, 329–330.
Schwanke, J., Rienhoff, O., Schulze, T.G., Nussbeck, S.Y., 2013. Suitability of customer relationship management systems for the management of study participants in biomedical research. Methods Inf. Med. 52, 340–350.
Sheedy, D., Garrick, T., Dedova, I., Hunt, C., Miller, R., Sundqvist, N., Harper, C., 2008. An Australian Brain Bank: a critical investment with a high return! Cell Tissue Bank 9, 205–216.
Shen, H., 2013. US brain project puts focus on ethics. Nature 500, 261–262.
Shortliffe, E.H., 1983. The science of biomedical computing. In: Pages, J., Levy, A., Grémy, F., Anderson, J. (Eds.), Meeting the Challenge: Informatics and Medical Education. North Holland, Amsterdam, pp. 1–10.
Stroet, A., Buckow, K., Stürner, K.H., Gold, R., Antony, G., 2010. Das Datenmodell des Kompetenznetzes Multiple Sklerose: Aufbau und Umsetzung eines gemeinsamen Datenmodells für alle Studien und Register des Kompetenznetzes Multiple Sklerose. In: Duesberg, F. (Ed.), E-Health 2011. Informationstechnologien Und Telematik Im Gesundheitswesen. Medical Future Verl., Solingen, pp. 113–122.
Sullivan, P.F., 2010. The psychiatric GWAS consortium: big science comes to psychiatry. Neuron 68, 182–186.
Warner, V., Wickramaratne, P., Weissman, M.M., 2008. The role of fear and anxiety in the familial risk for major depression: a three-generation study. Psychol. Med. 38, 1543–1556.
WHO, and US National Institute of Ageing, 2011. Global health and ageing. http://who.int/ageing/publications/global_health/en/
