Publication of research data
Research data should be made publicly available unless there are legal reasons to the contrary. Not only do most funding bodies and scientific organisations now expect this, but it is also in your personal interest.
- Published research data are an independent, citable and re-usable scientific achievement.
- Publications whose associated data are publicly available tend to be cited more often.
- Research data published in suitable repositories are kept available for a long time and your own work is thus secured.
- You meet the requirements of research funders.
- You enable other researchers to work with high-quality data.
- Within the framework of Open Science, new findings and scientific collaborations can be promoted.
Nevertheless, in some cases it may be advisable not to publish research data, or publication may not be permitted. This is the case, for example, if data is protected by copyright, subject to confidentiality or contains personal information. It is advisable to think through such legal issues during project planning.
In a research data repository
A repository is a storage location for digital objects that can be accessible to a public or restricted group of users. Publication in a repository offers the advantage that research data can also be found independently of text publications via appropriate search services. Research data repositories can be divided into the following types:
Subject-specific repositories have been established for many disciplines in recent years. If there is a subject-specific repository in your discipline, we recommend that you publish your research data there. Because these repositories are well known in their subject community, your data will be found more easily and will be more visible. Subject-specific repositories often offer additional services such as content quality checks. Furthermore, they usually use subject-specific metadata standards and offer technical support for the more specialised data formats used in the subject.
Humanities and Cultural Studies
DARIAH-DE Repository (Digital Research Infrastructure for the Arts and Humanities)
IANUS Research Data Centre Archaeology and Classical Studies
TextGrid Repository: Long-term archive for research data in the humanities
arthistoricum.net@heiDATA of the specialised information service Art - Photography - Design
Propylaeum@heiDATA of the specialised information service Classics
Forschungsdatenzentren (FDZ) of the VerbundFDB (German Network of Educational Research Data) for research data from educational research
Social and economic sciences
Publication in a generic repository is a good idea if no subject-specific repository exists for your subject yet. Generic repositories are focused on data types and formats from a wide range of subject areas. As a rule, they do not use subject-specific but general metadata standards, which means that they cannot map subject-specific properties of research data in depth. In order to ensure the discoverability, reusability and interoperability of data for a specific subject community, it is better to publish research data in a subject-specific repository.
Generic Repositories (in selection)
Zenodo: In addition to research data, Zenodo can also be used to store publications, software, presentations, videos and other types of resources. The repository is maintained by the OpenAIRE consortium and CERN.
Open Science Framework (OSF): OSF is both a free network for research materials and an open source project management tool.
Selecting a suitable repository
To select a suitable repository, we recommend searching via re3data (Registry of Research Data Repositories), which provides an overview of existing research data repositories.
The consortia of the National Research Data Infrastructure (NFDI) also offer overviews of corresponding subject-specific repositories and data archives.
The selection of a repository suitable for your research data should be based on the practices of your subject or the requirements of funding institutions. It also depends on whether your data are to be stored for a specific period of time or archived for the long term. If there are no specifications, subject-specific repositories are recommended.
Quality features for selecting a suitable repository
- Allocation of persistent identifiers: Are persistent identifiers assigned to datasets (e.g. DOI) and authors (e.g. ORCID)?
- Metadata: Is it possible to use subject-specific metadata standards?
- Download and export options: Does the repository offer different export options?
- Description or documentation: Can the context of origin of the data be recorded in a text field?
- Access options: Is it possible to specify different types of access or an embargo period?
- Licences: Can licences for subsequent use be selected?
- Overview/preview of the data set: Are viewers integrated to display a file preview?
- Versioning: Is it possible to create versions of a data set?
- Registration and editing: Are registration and login simple? Do authors have the possibility to edit their data set even after it has been filed?
- Findability by search engines: Can the repository be found by search engines through indexing?
More detailed information on the individual quality features can be found in the Fact Sheet: Research Data Repositories by the Kompetenznetzwerk Forschungsdatenmanagement an Thüringer Hochschulen.
As a data paper
There are special data journals, usually discipline-specific, that publish articles (data papers) on research data. The detailed description of data sets in a data paper is intended to facilitate their reuse and increase their visibility. The aim of a data journal is to provide rapid access to quality-assured data sets.
As a supplement to a publication
Many scientific journals offer the possibility of depositing research data as supplementary material with an article. Text and data are then in the same place, but the extent to which this form of data publication complies with the FAIR principles depends on the implementation in the respective journal or publisher.
Preparing research data for publication
Not all research data created in the course of a project is worth publishing or archiving. The checklist "Five steps to decide what data to keep" from the Digital Curation Centre (DCC) can help you select the research data to be published or archived. It is based on the following five steps for data selection:
- Consider potential reuse purposes - what aims could the data meet?
- Check for indications that the data must be kept, considering legal or policy compliance risks.
- Identify which data should be kept because it may have long-term value.
- Weigh up the costs - which data management costs have already been incurred and therefore contribute to its value, and how much more is planned and affordable? Where will the funds to pay these costs come from? Considering these questions will give you the cost element of your data appraisal and should help identify any need for external advice, e.g., on how to deal with any shortfall in the budget.
- Complete your data appraisal - this will list what data must, should or could be kept to fulfil potential reuse purposes. The appraisal should also summarise any actions needed to prepare the data for deposit, or the justification for not keeping it.
File names and folder structure should be consistent and without special characters.
Multiple files should be stored in hierarchically structured folders and, if possible, be in open file formats.
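The naming advice above can be checked automatically before deposit. The following is a minimal sketch, not a prescribed tool: it assumes that "without special characters" means ASCII letters, digits, hyphen, underscore and dot only, and that paths are given relative to the project folder. The function names and example paths are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Assumption: only these characters count as "safe" in file and folder names.
SAFE_NAME = re.compile(r"^[A-Za-z0-9._-]+$")


def find_problem_paths(paths):
    """Return the relative paths in which any folder or file name
    contains characters outside the assumed safe set."""
    problems = []
    for path in paths:
        parts = PurePosixPath(path).parts
        if any(not SAFE_NAME.match(part) for part in parts):
            problems.append(path)
    return problems


def suggest_name(name):
    """Replace each run of unsafe characters with a single underscore."""
    return re.sub(r"[^A-Za-z0-9._-]+", "_", name)


# Hypothetical example paths, for illustration only.
example_paths = [
    "project/raw/2023-05-01_survey_v01.csv",
    "project/raw/Umfrage März (final).csv",
]
print(find_problem_paths(example_paths))
print(suggest_name("Umfrage März (final).csv"))
```

The suggested name is only a starting point; renaming by hand usually gives more readable results than mechanical replacement.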
It is also advisable to check your data with regard to formal criteria such as date and number formats, value scales, naming conventions for fields and variables, abbreviations used, etc.
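A formal check of this kind might look as follows. This is a sketch under stated assumptions: the conventions chosen here (dates as ISO 8601 YYYY-MM-DD, numbers with a decimal point) are examples, not requirements taken from this guide, and the column names and sample values are hypothetical.

```python
import csv
import io
import re
from datetime import date

# Assumed convention: dates are written as YYYY-MM-DD.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_iso_date(value):
    """True if value has the form YYYY-MM-DD and is a real calendar date."""
    if not ISO_DATE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def is_decimal_point_number(value):
    """True if value parses as a number written with a decimal point."""
    try:
        float(value)
    except ValueError:
        return False
    return True


def check_rows(rows, date_field, number_field):
    """Return (row_number, field, value) for every value that fails a check."""
    issues = []
    for row_number, row in enumerate(rows, start=1):
        if not is_iso_date(row[date_field]):
            issues.append((row_number, date_field, row[date_field]))
        if not is_decimal_point_number(row[number_field]):
            issues.append((row_number, number_field, row[number_field]))
    return issues


# Hypothetical sample data with one wrong date format and one decimal comma.
sample = io.StringIO(
    "measured_on,temperature_c\n"
    "2023-05-01,21.5\n"
    "01.05.2023,21.7\n"
    '2023-05-03,"21,9"\n'
)
issues = check_rows(csv.DictReader(sample), "measured_on", "temperature_c")
for issue in issues:
    print(issue)
```

Row numbers count data rows only, so the header line is not included.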
All data that you select for publication should be documented comprehensively so that third parties can understand and interpret it on their own. A README file published together with the data is suitable for this, for example. Metadata is required for publication in a repository and should be precise in order to increase the findability of your data (see Project execution, Documentation and Metadata).
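A README can be generated from a small set of descriptive fields so that it stays consistent across data sets. The sketch below is one possible layout, not a standard: the field names, the structure and all example values are hypothetical and should be adapted to the conventions of your discipline or repository.

```python
def render_readme(info):
    """Build a plain-text README from a dict of descriptive fields.

    The fields used here are an assumed minimal set, not a metadata standard.
    """
    title = info["title"]
    lines = [title, "=" * len(title), ""]
    for label, key in [
        ("Creators", "creators"),
        ("Contact", "contact"),
        ("Collected", "collected"),
        ("Methods", "methods"),
        ("Licence", "licence"),
    ]:
        lines.append(f"{label}: {info[key]}")
    lines.append("")
    lines.append("Files:")
    for name, description in info["files"].items():
        lines.append(f"  {name}: {description}")
    return "\n".join(lines) + "\n"


# Hypothetical example values, for illustration only.
example = {
    "title": "Example survey dataset",
    "creators": "Jane Doe",
    "contact": "jane.doe@example.org",
    "collected": "2023-05-01 to 2023-06-30",
    "methods": "Online questionnaire, see codebook.pdf",
    "licence": "CC BY 4.0",
    "files": {
        "survey.csv": "Responses, one row per participant",
        "codebook.pdf": "Variable names, value scales and abbreviations",
    },
}
readme = render_readme(example)
print(readme)
```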
Clarifying legal aspects and choosing a licence
Furthermore, you should clarify existing copyrights and exploitation rights to the data in advance, obtain any necessary consents and anonymise personal data. You should also consider under which licence you would like to make your data available.
Good scientific practice requires research data to be stored for at least ten years in order to meet the requirement of verifiability and traceability. However, archiving beyond this period is recommended.
As a rule, archiving is independent of whether all or part of the data is also accessible as a publication. Not all files and file versions have to be archived as a matter of principle. Just as with a publication, it is necessary to select which data should be retained in the long term.
Archiving means ensuring the long-term usability of the data over an undefined period of time. Preserving authenticity, integrity, accessibility and comprehensibility plays an essential role. This requires technical infrastructure, organisational measures, workflows and standards. Merely storing data on your own computer or an external storage medium does not constitute archiving!
In the context of purely physical storage, so-called bitstream preservation, data is preserved in the state in which it was delivered. However, since operating systems, software and file formats change continuously, data can quickly become inaccessible and unusable with this strategy.
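Whether stored files are still in the state in which they were delivered can be verified with checksums. This technique is not named in the text above; it is a common way to monitor integrity and is shown here only as a minimal sketch. The function names are hypothetical, and the choice of SHA-256 is an assumption.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file below root (as a relative path) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(root, manifest):
    """List files from the manifest that are now missing or have changed."""
    current = build_manifest(root)
    return sorted(name for name in manifest if current.get(name) != manifest[name])


# Demonstration in a temporary folder with a hypothetical data file.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "data.csv"
    data_file.write_text("id,value\n1,2\n", encoding="utf-8")
    manifest = build_manifest(tmp)
    unchanged = changed_files(tmp, manifest)

    data_file.write_text("id,value\n1,3\n", encoding="utf-8")
    changed = changed_files(tmp, manifest)

print(unchanged, changed)
```

A check like this only detects that bits have changed. It does not address the format and software obsolescence described above.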
Migration and emulation are suitable strategies for keeping data reproducible and interpretable in the long term. They preserve the information, not the digital objects themselves. Comprehensive contextual information on collection methods and on the hardware and software used, together with a detailed description in metadata, can greatly facilitate future use. Furthermore, the data must not be inseparably tied to a particular data carrier or readout device, so that it can be migrated to other systems and carriers. Proprietary file formats complicate this process.
Depending on the volume, format or sensitivity of the research data, different options are available for archiving. For example, the data can be stored at your own institution, placed in a repository or handed over to a research data centre.