Project execution
Utilizing RDM in ongoing research ensures that research data are handled in a well-planned, structured manner in day-to-day academic work. This includes organizing and managing your data, having a storage and backup strategy, and documenting the research data.
Organizing research data
-
Folder structures
Plan a filing structure that is as simple as possible to adhere to and is also clear and unambiguous, including for future reference. All files should be easy to find and uniquely named. Two tried and true guidelines observed by many are to limit the folder hierarchy to a maximum of four levels and to keep not many more than ten elements per folder.
Depending on the research project and the nature of your data, different approaches may be advisable, such as folder structures based on data collection methods, data types, processing steps, individuals, locations, time categories, etc. The folder structure should be primarily designed around the workflows and routines characteristic of your project. If navigating through your folders is not intuitive or takes up significant time out of your day, you should rethink the system you have in place. In doing so, be sure to coordinate closely with your staff.
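To apply the rules of thumb above to an existing project, a short script can walk the folder tree and flag folders that are nested too deeply or hold too many items. The following is a minimal Python sketch, not an official tool; it assumes that levels are counted below the project root folder and treats the limits of four levels and ten items as adjustable defaults (the function name `check_folder_tree` is hypothetical).

```python
from pathlib import Path

# Rules of thumb from the text; treat them as adjustable defaults.
MAX_DEPTH = 4   # folder levels below the project root
MAX_ITEMS = 10  # files and subfolders per folder


def check_folder_tree(root: Path, max_depth: int = MAX_DEPTH,
                      max_items: int = MAX_ITEMS) -> list[str]:
    """Return warnings for folders that are nested too deeply or hold too many items."""
    warnings = []
    folders = [root] + [p for p in root.rglob("*") if p.is_dir()]
    for folder in folders:
        rel = folder.relative_to(root)
        depth = len(rel.parts)  # the root itself has depth 0
        if depth > max_depth:
            warnings.append(f"{rel}: {depth} levels deep (limit {max_depth})")
        n_items = sum(1 for _ in folder.iterdir())
        if n_items > max_items:
            warnings.append(f"{rel}: {n_items} items (limit {max_items})")
    return warnings
```

For example, `check_folder_tree(Path("my_project"))` returns a list of warnings, which stays empty as long as the structure remains within the limits.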
-
File names
The same applies to file naming. File names should be unique, indicate the content and status of the file and facilitate sorting. Make sure that file names are not excessively long, and use hyphens, underscores or a mix of upper and lower case letters to separate their elements. Do not use spaces, periods (apart from the one preceding the file extension) or special characters in file names. For example, the file name MS_Sample17_Clean_19-11-06 could indicate that the file is sample no. 17 from Manfred Schmidt, cleaned on November 6, 2019. Naming conventions should be strictly observed and changed only after consulting your collaborators.
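A naming convention is easiest to observe when names are generated and checked automatically rather than typed by hand. The Python sketch below encodes a hypothetical pattern modeled on the example MS_Sample17_Clean_19-11-06 (initials, sample number, processing status, date as YY-MM-DD); the pattern, the function names and the assumption that all dates fall in the years 2000 to 2099 are illustrative only. It works on the file name without its extension, which is added separately.

```python
import re
from datetime import date

# Hypothetical pattern modeled on MS_Sample17_Clean_19-11-06:
# <initials>_Sample<number>_<status>_<YY-MM-DD>, no spaces, periods or special characters
NAME_PATTERN = re.compile(
    r"(?P<initials>[A-Z]{2,3})_Sample(?P<sample>\d+)"
    r"_(?P<status>[A-Za-z]+)_(?P<date>\d{2}-\d{2}-\d{2})"
)


def build_name(initials: str, sample: int, status: str, day: date) -> str:
    """Assemble a file name (without extension) and reject anything off-pattern."""
    name = f"{initials}_Sample{sample}_{status}_{day:%y-%m-%d}"
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"name violates the convention: {name!r}")
    return name


def parse_name(stem: str) -> dict:
    """Split a conforming file name (without extension) into its components."""
    match = NAME_PATTERN.fullmatch(stem)
    if match is None:
        raise ValueError(f"not a conforming file name: {stem!r}")
    parts = match.groupdict()
    parts["sample"] = int(parts["sample"])
    # Assumption: two-digit years refer to 2000-2099.
    parts["date"] = date.fromisoformat("20" + parts["date"])
    return parts
```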
-
For further practical information on this subject see:
- UK Data Service: Organising https://ukdataservice.ac.uk/learning-hub/research-data-management/format-your-data/organising/
- Verbund Forschungsdaten Bildung: Dateien benennen und organisieren (Naming and organizing files) https://www.forschungsdaten-bildung.de/datei-benennung#Dateiorganisation-Ordnerstruktur
- RDM support from Leibniz University Hannover: Slides for the advanced course "Datenorganisation und Projektablage" (Data organization and project filing) https://www.fdm.uni-hannover.de/fileadmin/fdm/Dokumente/Schulungsunterlagen/Schulungsunterlagen_FDM_VertiefungDatenorganisation_Folien.pdf
- Recker, Jonas; Brislinger, Evelyn (2019): Dateiorganisation in empirischen Forschungsprojekten (File organization in empirical research projects). In: Uwe Jensen, Sebastian Netscher and Katrin Weller (eds.): Forschungsdatenmanagement sozialwissenschaftlicher Umfragedaten (Research data management for social science survey data). Opladen, Berlin, Toronto: Verlag Barbara Budrich, pp. 81–95 https://doi.org/10.3224/84742233.06
- Wageningen University & Research: Organising files and folders https://www.wur.nl/en/Value-Creation-Cooperation/WDCC/Data-Management-WDCC/Doing/Organising-files-and-folders.htm
- Briney, Kristin (2020): File Naming Convention Worksheet https://resolver.caltech.edu/CaltechAUTHORS:20200601-161923247
- Santaguida (2010): Folder and File Naming Convention – 10 Rules for Best Practice https://www.exadox.com/files/pdf/en/Folder-File-Naming-Convention-10Rules-Best-Practice.pdf
-
Version control
When you change files, it is often a good idea to keep earlier versions at hand and to adhere to a versioning scheme. For example, you can add a sequential version number to the file name (see above) or save version information within the file (in the header, for example). Document version jumps and the associated work steps in your data documentation (see below). Especially in projects involving several individuals, you should define what constitutes a version jump and how versions are designated and documented.
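If version numbers are kept in the file name, a small helper can determine the next free number so that earlier versions are never overwritten. This Python sketch assumes a hypothetical two-digit suffix such as `_v03` directly before a single file extension (e.g. MS_Sample17_Clean_v03.csv); adapt the pattern to whatever scheme your project defines.

```python
import re
from pathlib import Path

# Hypothetical scheme: "_v" plus a zero-padded number right before the extension.
VERSION_RE = re.compile(r"_v(\d{2,})$")


def next_version_path(folder: Path, stem: str, suffix: str) -> Path:
    """Return the path for the next version of `stem` in `folder`."""
    existing = []
    for path in folder.glob(f"{stem}_v*{suffix}"):
        match = VERSION_RE.search(path.stem)
        # Only count files whose name is exactly <stem>_vNN<suffix>.
        if match and path.stem[: match.start()] == stem and path.suffix == suffix:
            existing.append(int(match.group(1)))
    # Highest existing version + 1, or 1 if there is none yet.
    return folder / f"{stem}_v{max(existing, default=0) + 1:02d}{suffix}"
```

Zero-padding the number (v01 rather than v1) keeps versions in the right order when files are sorted by name.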
-
For further information on version control see:
- IANUS research data center: Versionskontrolle (Version control) https://ianus-fdz.de/versionskontrolle
- UK Data Service: Versioning https://www.ukdataservice.ac.uk/manage-data/format/versioning
-
File formats
File formats should be chosen deliberately. File formats are either proprietary or open. Proprietary formats can have disadvantages because they are often developed for specific processing software and may not be easily migratable into other environments for either technical or legal reasons. If the software is no longer maintained or your license expires, in the worst case you may lose access to your data. This impedes data interoperability, i.e. usability in differing technical contexts, and is a major problem for long-term archiving. How can we ensure, to the extent possible, that our data will still be readable in ten, twenty or even fifty years from now?
We recommend saving data in open formats with widely used, recognized specifications, or in the formats typically used in your academic field.
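For tabular data, one widely used open format is CSV, a plain-text format that practically any software can read. As a minimal sketch using only Python's standard `csv` module, the following writes records to a UTF-8 encoded CSV file and reads them back; the function names and sample records are hypothetical. Since CSV stores no data types or units, these belong in your data documentation.

```python
import csv
from pathlib import Path


def export_csv(records: list[dict], path: Path) -> None:
    """Write records (dicts with identical keys) to a UTF-8 CSV file with a header row."""
    # newline="" is what the csv module documentation advises for file objects.
    with path.open("w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(records[0]))
        writer.writeheader()
        writer.writerows(records)


def import_csv(path: Path) -> list[dict]:
    """Read the file back; CSV returns every value as a string."""
    with path.open(encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f))
```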
-
Details on preferred file formats and further information can be found here:
- Data Archiving and Network Services: File formats https://dans.knaw.nl/en/about/services/easy/information-about-depositing-data/before-depositing/file-formats
- UK Data Service: Recommended formats https://www.ukdataservice.ac.uk/manage-data/format/recommended-formats
- Forschungsdaten.info: Formate erhalten (Preserving formats) https://www.forschungsdaten.info/themen/bewahren-und-nachnutzen/formate-erhalten/
- Handout: How do I make my spreadsheet FAIR? https://www.forschungsdaten.uni-bonn.de/de/files/handout-how-do-i-make-my-spreadsheet-fair/at_download/file
Contact us at any time for advice on any data organization questions you may have.
Storage and security
-
Storage and backup strategies
Losing research data that have not been backed up can be frustrating: work has to be redone, publications are jeopardized, and in a worst-case scenario an entire research project can be torpedoed. Accidents can never be ruled out, such as spilling a hot drink on your laptop, leaving your bag with your USB stick in it on the subway or overwriting the latest version of a file. However, a well-thought-out storage and backup strategy will in most cases minimize the damage.
Where and in what format you should store and edit your data depends chiefly on your research data and work routines. In general, you should avoid relying on individual devices and external data carriers. Cloud storage services are often recommended because they synchronize data automatically across devices and users. Before using such services, however, you should carefully review their terms of use, the encryption technologies employed, server locations and other factors. Many projects involve additional considerations, such as controlling access to highly sensitive data, handling large data volumes or managing access rights and roles for staff.
Backup strategies can also vary from project to project. What data do you want to back up, and at what intervals? What data volume is concerned? How many restore points should be created, and retained for how long? As a rule of thumb, you should keep at least three copies of your data on at least two different storage media, with at least one copy saved at a separate location (i.e. a different fire compartment). Doing so keeps you well-prepared to overcome most accidents and incidents.
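The rule of thumb above (often called the 3-2-1 rule) can be written down as a simple checklist. The following Python sketch is illustrative: the `Copy` record and its free-text `medium` and `location` labels are hypothetical, the working copy counts as one of the three copies, and a second distinct location stands in for the separate fire compartment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of a data set (hypothetical record for illustration)."""
    medium: str    # kind of storage medium, e.g. "laptop SSD", "external HDD", "cloud"
    location: str  # physical site, e.g. "office", "home", "computing centre"


def check_3_2_1(copies: list[Copy]) -> list[str]:
    """Return the criteria of the 3-2-1 rule that are not met (empty list = all met)."""
    problems = []
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies, at least 3 recommended")
    if len({c.medium for c in copies}) < 2:
        problems.append("all copies on the same kind of storage medium")
    if len({c.location for c in copies}) < 2:
        problems.append("no copy at a separate location")
    return problems
```

For example, a laptop plus a USB stick kept in the same office fails two criteria: there are only two copies, and neither is stored at a separate location.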
-
Data storage solutions at the University of Bonn
Faculty and graduate students of the University of Bonn have access to a range of services offered by University IT:
- Personal data storage on Sciebo, the Campuscloud, open to all universities in North Rhine-Westphalia (30GB standard package):
- Sciebo project boxes (also usable collaboratively, up to 2TB): https://www.sciebo.de/anleitung/pbox.html
- Personal storage on the Research Data Infrastructure (FDI) (100GB standard package): https://www.hrz.uni-bonn.de/en/all-services/data-storage-fileservices/research-data-infrastructure-fdi
- Project storage on the Research Data Infrastructure (FDI) (also usable collaboratively, 50TB standard package): https://www.hrz.uni-bonn.de/en/all-services/data-storage-fileservices/research-data-infrastructure-fdi
- Transferring large files: Gigamove, provided by RWTH Aachen University (file size up to 100GB): https://www.hrz.uni-bonn.de/en/all-services/data-storage-fileservices/gigamove
- uniVM - Virtual Machines for Institutes (CentOS, Ubuntu, Debian, Windows Server), approximate monthly costs: € 1.20 per CPU; € 0.48 per GB RAM; € 0.04 per GB SSD; € 2.52 license fee for Windows, if applicable: https://www.hrz.uni-bonn.de/en/all-services/serverhosting-housing/virtual-machines-for-institutes
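The listed rates make it easy to estimate what a virtual machine will cost per month. The Python sketch below simply multiplies each resource by its rate; it uses the approximate figures above, which may have changed, and assumes that the Windows license fee is a flat amount per VM and month. Check the current uniVM price list before budgeting.

```python
from decimal import Decimal

# Approximate monthly uniVM rates in EUR as listed above (may have changed).
RATE_PER_CPU = Decimal("1.20")
RATE_PER_GB_RAM = Decimal("0.48")
RATE_PER_GB_SSD = Decimal("0.04")
WINDOWS_FEE = Decimal("2.52")  # assumption: flat fee per VM and month


def monthly_cost(cpus: int, ram_gb: int, ssd_gb: int, windows: bool = False) -> Decimal:
    """Estimate the monthly cost of one VM; Decimal avoids binary rounding errors."""
    cost = cpus * RATE_PER_CPU + ram_gb * RATE_PER_GB_RAM + ssd_gb * RATE_PER_GB_SSD
    if windows:
        cost += WINDOWS_FEE
    return cost


# 4 CPUs (4.80) + 16 GB RAM (7.68) + 100 GB SSD (4.00)
print(monthly_cost(4, 16, 100))                # 16.48
print(monthly_cost(4, 16, 100, windows=True))  # 19.00
```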
See also the following for further information:
- CESSDA Training Resources: Backup https://www.cessda.eu/Training/Training-Resources/Library/Data-Management-Expert-Guide/4.-Store/Backup
- CESSDA Training Resources: Security https://www.cessda.eu/Training/Training-Resources/Library/Data-Management-Expert-Guide/4.-Store/Security
- UK Data Service: Store your data https://ukdataservice.ac.uk/learning-hub/research-data-management/#store-your-data
- Venkatamaran & Moura (2020): Raw data, backup and versioning: What you need to know to preserve your research data https://doi.org/10.5281/zenodo.4041556
Documenting research data
Documenting work steps is a cornerstone of good scientific practice: it is essential for ensuring that research results are transparent and reproducible, and it makes it easier for interested third parties (and the researchers themselves, after the fact) to understand the methodology employed and to recreate it if necessary. Structured data documentation is recommended if digital data are a core element of your research work. In the worst case, undocumented data can become worthless because their informational value can no longer be determined.
Documentation of your data may include the following:
Data collection
- For what research project and what research questions were the data generated?
- When, where and by whom was the data gathered?
- What methods and procedures were employed and what measurement instruments were used, if any?
Data structure
- What is the data content (interviews, temperature measurements, stock prices, text codes, lab samples, etc.)?
- What is the data basis and range (e.g. ratio relative to the population, type of sampling)?
- Scope of data (number of “cases” or “events”, description of characteristics and variables collected)
- Explanations of codes, classifications, variable names, numbering, etc.
- Description of the software environment (operating system, software used, versions)
- Information on folder structures, file names, version control and formats (see above)
Data processing
- Quality assurance and data cleansing measures
- Anonymization and pseudonymization methods, as required
- Conversion, formatting, normalization, other processing
- Evaluation (analysis steps and methods)
- Preparation and visualization techniques
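Documentation that follows these headings can also be kept in machine-readable form alongside the data. The Python sketch below defines a hypothetical `DataDocumentation` record with one field group per heading, fills in the software environment automatically from the running interpreter, and serializes everything to JSON so it can be stored next to the data file. The field selection is illustrative and does not replace a readme or a formal metadata standard.

```python
import json
import platform
from dataclasses import asdict, dataclass, field


@dataclass
class DataDocumentation:
    """Hypothetical minimal documentation record, one field group per heading above."""
    # Data collection
    project: str
    research_question: str
    collected_by: str
    collected_when: str
    collected_where: str
    methods: str
    # Data structure
    content: str
    variables: dict[str, str] = field(default_factory=dict)  # name -> meaning, unit, codes
    # Data processing
    processing_steps: list[str] = field(default_factory=list)
    # Software environment, filled in automatically from the running interpreter
    environment: dict[str, str] = field(default_factory=lambda: {
        "os": platform.platform(),
        "python": platform.python_version(),
    })

    def to_json(self) -> str:
        """Serialize the record, e.g. for a JSON file stored next to the data."""
        return json.dumps(asdict(self), indent=2, ensure_ascii=False)
```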
There are a number of practical approaches to implementing documentation. Freely formulated documentation in text editors or word processors is always an option, but many prominent software packages also offer built-in documentation features, such as description fields for individual data records. Depending on the scope and character of your project, it may be better to work with project-wide documentation or with different documentation approaches for individual files or file groups. The documentation technology best suited for your purposes also depends heavily on your academic field. In the laboratory sciences, for example, Electronic Lab Notebooks, which have been developed specifically for documenting laboratory activities, are increasingly in use.
Plan the scope and implementation of your data documentation in advance. To ensure that your data can be reused, it is often advisable to follow the requirements of relevant metadata standards (see section on Metadata Tagging under Publishing Research Data) and the criteria for selecting suitable repositories.
For basic data documentation, we offer a readme template that you are welcome to use for your datasets.
Contact us with any questions you may have concerning data documentation.
See the following for further information:
- Wageningen University & Research: Data Documentation https://www.wur.nl/en/Value-Creation-Cooperation/WDCC/Data-Management-WDCC/Doing/Data-Documentation.htm
- ZBW, GESIS, RatSWD: Datensätze dokumentieren (Documenting datasets) https://auffinden-zitieren-dokumentieren.de/dokumentieren/a-daten-dokumentieren/
- Forschungsdaten-Bildung: Webinar – Dokumentation & Metadaten (Documentation & metadata) https://www.youtube.com/watch?v=YMJOhxvlmL0
- UK Data Service: Document your data https://ukdataservice.ac.uk/learning-hub/research-data-management/#document-your-data
- Forschungsdaten.info: Datendokumentation (Data documentation) https://www.forschungsdaten.info/themen/beschreiben-und-dokumentieren/datendokumentation/
Image sources:
Attic: Bill Kasman 2014 & Scott Arneman 2009
File Names: Randall Munroe
File Formats: Bezjak et al. (2018): Open Science Training Handbook
Backup: modified from a photo by Kaboompics from Pexels.com
Dead Chef: Auke Herrema
Folders Icon: Bharat from the Noun Project
Backup Icon: ProSymbols from the Noun Project
Documentation Icon: Juicy Fish from the Noun Project