Accepted Manuscript

How do we make it easy and rewarding for researchers to share their data? – a publisher’s perspective

Dr. Hylke Koers, Head of Content Innovation

PII: S0895-4356(15)00325-X

DOI: 10.1016/j.jclinepi.2015.06.016

Reference: JCE 8929

To appear in: Journal of Clinical Epidemiology

Received Date: 5 February 2015

Revised Date: 15 June 2015

Accepted Date: 15 June 2015

Please cite this article as: Koers H, How do we make it easy and rewarding for researchers to share their data? – a publisher’s perspective, Journal of Clinical Epidemiology (2015), doi: 10.1016/j.jclinepi.2015.06.016. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT

How do we make it easy and rewarding for researchers to share their data? – a publisher’s perspective (A response to “Promoting greater transparency and accountability in clinical and behavioural research by routinely disclosing data and statistical commands” by Robert West)


Dr. Hylke Koers, Head of Content Innovation, Elsevier


For the sake of “variance and dissent” it might have been fitting to deliberately take a more adversarial point of view. However, playing devil’s advocate here would do an injustice to the importance of data transparency and data sharing, and to the thoughtful article by Robert West [1]. In fact, I found the article insightful and balanced. It clearly outlines the need for, and the promise of, more transparency and better data-sharing practices, and it also rightly points out some of the challenges and risks that must be addressed along the way.


What this commentary aims to add, from the perspective of a global Science, Technology and Medicine (STM) publisher, is two-fold. First, I will attempt to place some of the discussion in a broader context, i.e. one less specific to the health sciences. Second, I would like to add a dimension that appears somewhat underrepresented in West’s paper: technology, and the solutions it offers to enable researchers to better communicate (about) their data and methods. In this part I will mostly draw on experience at Elsevier¹ to give concrete examples of solutions that enable researchers to better store, share, use, and discover data. (In what follows I will often write “data” rather than “data and methods”, but the argument usually extends to methods.)

At a high level, the opportunities are tremendous, and the case for more transparency and more data sharing seems clear. The potential benefits are cited often and in unison: better reproducibility of reported scientific results, greater efficiency of the scientific enterprise as a whole, and entirely new research possibilities that open up when a wealth of data becomes available. To this list I would like to add a benefit that is heard less often, but that seems highly important in driving a cultural change in how researchers deal with data. Publishing data can give researchers a wider set of opportunities to communicate their work, create impact, and receive proper recognition and credit. If we accept that publishing data is important to advance research, then we should also enable those responsible for data gathering, analysis, organization, and sharing to pursue and receive academic credit. This could happen, for example, through data citation or through publication outlets designed to disseminate data. This is particularly important for those colleagues in the lab whose time is dedicated to data gathering and analysis, but who often do not make it onto a publication’s author list. Today, those researchers receive little or no academic credit for work that is of great benefit to the progress of science and medicine. In addition, publishing data could provide an incentive to publish “negative” results. Such results often do not find a place in the traditional literature, even though they can be very valuable.

¹ For the sake of disclosure, it should be noted that the author (Hylke Koers) has been with Elsevier since 2009, most recently as Head of Content Innovation. In that capacity he is responsible for developing and managing a range of solutions to better support research data, including some of the projects mentioned in the text.

At the other end of the spectrum, there are some obvious and generally respected reasons for not sharing data. As West also notes, patient privacy is an important concern, especially in the medical domain. In addition, some data could be dangerous in the hands of people with malicious intent, and some data are simply too large or complex to share in any practical sense.


Between these extremes lies an area where things are not so clear-cut. On the one hand, legitimate concerns discourage individual researchers from sharing their data (see also [2]). On the other hand, the community as a whole could benefit if the data were shared. I will discuss two key barriers here.


The first barrier to individual researchers sharing their data is the expected amount of work. The data must be prepared and cast in a format that enables re-use (by humans and machines), and they need the metadata, annotations, and descriptions that make them understandable. This is not a trivial concern, as the utility of data depends largely on those attributes. A data file without any description or context will be very hard to make sense of when it is made available, let alone 20 years later. Proper data management is neither easy nor cheap. If we want researchers to do it well, they will need to invest a significant amount of time. This poses a clear challenge for research funders, and for research institutes in their training programs.


Another barrier that should be acknowledged is that researchers often see their data as a treasure trove (or “dowry”, as Christine Borgman eloquently puts it [3]). That treasure trove will help them write the articles which, in the current system of academic recognition, they direly need to build a name and a career. Until data output counts more fully in tenure decisions and grant proposals, this argument cannot simply be swept aside with an appeal to the common good. Requiring researchers to give away this competitive advantage overnight would create an uneven playing field and could disrupt a generation of academic careers. The move to more transparency and more data sharing therefore has to be a cultural change that proceeds at the right pace for each research community. We already see today that some communities are much further ahead than others in this regard. Genomics and the earth sciences, for example, already have several well-established data repositories that serve as the go-to places to share and find data.


Apart from these barriers, there are also some thorny questions. For one, the notion of a “data owner” that West refers to is not uncontested. This is particularly true for raw data, i.e. data that are the direct result of observation or experimentation without analysis or other intellectual input. It is often argued that such material is in the public domain, with no possibility of copyright or ownership in a legal sense.² Wherever original creative value is added, for example through further analysis or aggregation, the situation is different. However, the details are usually very domain-specific and often not so clear-cut. It is therefore an open question whether a “data owner” who has special rights to “publish findings beyond [a] limited quality control process” (from the original article by West [1]) would be practically feasible. Another question is whether such a notion is actually desirable, as a requirement to go back to the original author could also hamper progress. To draw a parallel with work published in articles, researchers are not necessarily expected to contact the author(s) of any work that they wish to build on.

² Legislation varies per country. In the US, section 102(b) of the Copyright Act of 1976 stipulates that “In no case does copyright protection for an original work of authorship extend to any idea, procedure, process, system, method of operation, concept, principle, or discovery, regardless of the form in which it is described, explained, illustrated, or embodied in such work.” This can be interpreted to mean that “there can be no copyright in facts”; see Feist Pubs., Inc. v. Rural Tel. Svc. Co., Inc. (https://supreme.justia.com/cases/federal/us/499/340/case.html).


Another question is, surprisingly, not asked very often: what does it actually mean to “share data” or to be “more transparent”? Referring back to [1], when is disclosure enough? When is actual sharing needed for research to be truly reproducible and re-usable, and where exactly is the line between the two? What level of metadata, annotation, and description may be expected from individual researchers? Which sharing solutions are deemed acceptable, and how do we support long-term storage? These are all good questions that will need answers to really carry the agenda forward, and most likely different answers in different domains.


So, in the face of these challenges, how do we move forward? There is no immediate and complete answer. What is clear, however, is that structural changes in attitudes and practices will need to develop from within the research communities themselves. It is equally clear that policies, incentives, and technology need to be aligned to make it both easy and rewarding for researchers to share their data. Technology has a special role to play in paving the way, both by removing barriers and by delivering rewards and benefits.


Publishers can make a substantial contribution here. Earlier in this commentary, I wrote about how technology can enable researchers to better communicate “(about) their data and methods”. The parentheses were intentional, as they carry a specific meaning. In the days of paper publishing, the data and methods underpinning an article usually could not be communicated in full, because they simply did not fit the ink-on-paper format. Communication was therefore mostly about data and methods. It typically took the form of a brief description or another, necessarily limited, representation of the actual thing. In the digital world things are different. The actual data or methodology can often be captured and passed from one researcher to another, allowing researchers to communicate their actual data and methods.


It is here, in extending the format of the article, that publishers can make significant contributions towards increasing transparency and accountability in research. Most online publishing platforms support supplementary material, which gives authors a generic option to make available any data related to their article (see also [2]). In addition, several publishers provide facilities to link articles with data hosted in data repositories, and encourage their authors to use them. At Elsevier, the “Article of the Future” program [4] aims to take this development to the next level. Its goals are to move beyond the traditional print-based article format, to embrace new forms of digital research output, and to create a new article format that allows researchers to communicate their research more fully, including data and methods. The program includes several provisions to make data available to other researchers in a findable and comprehensible way:

(i) in-article data visualization tools [5, 6];

(ii) a data linking program [7, 8] that links published articles with data stored in domain-specific data repositories;

(iii) specialized journals such as Data in Brief [9], MethodsX [10] and SoftwareX [11], which publish so-called “microarticles”: brief publications that focus on a particular kind of research output, such as a data set, a method, or a piece of computer code.

A recent project of particular interest in the context of this article is Elsevier’s Open Data pilot [12, 13]. It gives authors the option to make raw research data available on ScienceDirect under an open access Creative Commons Attribution Unported (CC BY) license. These examples demonstrate how a publisher can help create easy-to-use and rewarding solutions for researchers to make their data available. Importantly, the choice is left to the researcher, but the publisher can encourage and support its authors in using the most appropriate solution.

Although the solutions described in the previous paragraph are, by themselves, specific to researchers publishing with Elsevier (or reading articles published by Elsevier), they also build upon, and help develop, appropriate standards of a more universal nature. Such standards are essential to create awareness and drive cultural change. In this regard, I would like to end this article by mentioning some important community efforts that are helping to develop a global infrastructure for research data through technology and standards. One such effort is the Joint Declaration of Data Citation Principles [14, 15], which promotes the proper data citation practices that are essential for academic recognition. Another is the Research Data Alliance [16], which hosts a large number of interest and working groups, including groups on research data publication. In such groups, different stakeholders (researchers, data centers, publishers, funders, and research institutes) come together constructively to push the envelope. These groups are essential in setting the pace of change towards more sharing of data and methods and more transparency. Ultimately, they point towards a higher-value, more effective form of scientific communication.

References

[1] Robert West. Promoting greater transparency and accountability in clinical and behavioural research by routinely disclosing data and statistical commands. Journal of Clinical Epidemiology, 2015.
[2] Alice Meadows. To Share or not to Share? That is the (Research Data) Question… The Scholarly Kitchen, 2014, http://scholarlykitchen.sspnet.org/2014/11/11/to-share-or-not-to-share-that-is-the-research-data-question/
[3] Christine Borgman. Plenary talk at the 4th Research Data Alliance Plenary, Amsterdam, 23 September 2014.
[4] IJsbrand Jan Aalbersberg. The Article of the Future. ElsevierConnect, 2012, http://www.elsevier.com/connect/the-article-of-the-future
[5] http://www.elsevier.com/about/content-innovation
[6] Hylke Koers, Ann Gabriel, and Rebecca Capone. Executable papers in computer science go live on ScienceDirect. ElsevierConnect, 2013, http://www.elsevier.com/connect/executable-papers-in-computer-science-go-live-on-sciencedirect
[7] Harald Boersma. Bringing data to life with data linking. ElsevierConnect, 2013, http://www.elsevier.com/connect/bringing-data-to-life-with-data-linking
[8] http://www.elsevier.com/databaselinking
[9] Paige Shaklee. New data journal lets researchers share their data open access. ElsevierConnect, 2014, http://www.elsevier.com/connect/new-data-journal-lets-researchers-share-their-data-open-access
[10] Andrea Hoogenkamp and Irene Kanter-Schlifke. In new MethodsX journal, authors can show the methods behind their research. ElsevierConnect, 2014, http://www.elsevier.com/connect/in-new-methodsx-journal-authors-can-show-the-methods-behind-their-research
[11] Elsevier Announces the Launch of SoftwareX. Press release, http://www.elsevier.com/about/press-releases/research-and-journals/elsevier-announces-the-launch-of-softwarex
[12] Hylke Koers and Rachel Martin. Open science needs open minds. ElsevierConnect, 2014, http://www.elsevier.com/connect/open-science-needs-open-minds
[13] http://www.elsevier.com/about/research-data/open-data
[14] Hylke Koers. Building a global infrastructure for research data. ElsevierConnect, 2014, http://www.elsevier.com/connect/building-a-global-infrastructure-for-research-data
[15] https://www.force11.org/datacitation
[16] https://rd-alliance.org/
