Naming and Structure of the Research Data (#1)

This letter guides readers on how to efficiently name and organize research data, stressing the significance of consistency. It covers naming conventions and data structuring for seamless collaboration.

Introduction

Today we will look at the most fundamental and important aspects of working with data – how to structure and name your research files for easy retrieval and use. At the beginning of a research project, it’s easy to believe that you'll remember all file names and locations. However, as your research progresses, you may accumulate multiple files in various formats, versions, and sources. Searching for a data file becomes challenging if it is stored or named incorrectly, wasting valuable time.

Naming

Consistency throughout the naming process is paramount. Adopting a sensible file naming strategy and applying it consistently will provide an audit trail for the development of your data. This will help prevent confusion when working on files, particularly when working collaboratively with others, and ensure data files are not accidentally overwritten or deleted. Therefore, understanding and adhering to naming conventions is a key practice for efficient data management.

The following non-exhaustive list includes a number of common elements that should be considered when developing a file naming strategy:

Description of the content
Project number
Name of creator (a good idea would be to introduce a common abbreviation)
Name of research team/department associated with the data (if several departments are involved in the work)
Date of creation; Publication date
Version number
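
To show how such elements can be combined, here is a minimal Python sketch. The function name `build_file_name`, the order of the elements, and the underscore separator are assumptions made for this illustration, not a prescribed standard; adapt them to whatever your team agrees on.

```python
from datetime import date


def build_file_name(project, description, creator, created, version, extension):
    """Join the naming elements with underscores in one fixed order.

    The order used here (project, description, version, date, creator)
    is only one possible convention.
    """
    parts = [
        project,
        description,
        f"v{version:02d}",    # leading zero keeps versions sortable
        created.isoformat(),  # date written as YYYY-MM-DD
        creator,
    ]
    return "_".join(parts) + "." + extension


name = build_file_name("P01", "Sequencing-A1", "KG", date(2023, 8, 18), 1, "gz")
print(name)  # P01_Sequencing-A1_v01_2023-08-18_KG.gz
```

Generating names with one shared function, rather than typing them by hand, is one way to keep the element order identical for everyone in the project.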

Other useful tips:

Use underscores or hyphens to separate name elements (special characters, full stops, and spaces are parsed differently on different systems)
Avoid special characters (§ $ & % . : , ; ! + {} ? ß ö ü ä)
Keep names to 40 characters or fewer
Use English
Write dates as specified in ISO 8601 (YYYY-MM-DD)
Use leading zeros for sequential numbering (e.g. 01, 02, 10) so that files sort in the correct order
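
These tips can also be checked automatically. The following Python sketch is one possible way to do so; the function name `check_file_name` is hypothetical, and it assumes that the 40-character limit includes the file extension and that only unaccented letters, digits, underscores, and hyphens are allowed before the extension.

```python
import re
from datetime import date
from pathlib import PurePath

# Assumption: only these characters are allowed before the extension.
ALLOWED_STEM = re.compile(r"[A-Za-z0-9_-]+")
# Tokens that look like an ISO 8601 date (YYYY-MM-DD).
DATE_LIKE = re.compile(r"\d{4}-\d{2}-\d{2}")


def check_file_name(file_name, max_length=40):
    """Return a list of problems found in file_name (empty list = no problems)."""
    problems = []
    stem = PurePath(file_name).stem  # name without the last extension

    if len(file_name) > max_length:
        problems.append(f"longer than {max_length} characters")

    if not ALLOWED_STEM.fullmatch(stem):
        problems.append("contains characters other than A-Z, a-z, 0-9, '_' and '-'")

    # Every date-like token must be a real calendar date.
    for token in DATE_LIKE.findall(stem):
        try:
            date.fromisoformat(token)
        except ValueError:
            problems.append(f"'{token}' is not a valid YYYY-MM-DD date")

    return problems


for candidate in ["P01_Sequencing-A1_v01_2023-08-18_KG.gz", "Messung für Projekt.xlsx"]:
    print(candidate, "->", check_file_name(candidate) or "OK")
```

The check is deliberately simple: it reports problems but does not change any file, so it can be run safely on an existing project folder.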

If you want to bring your existing files and documents up to the same standard, you can use free file-renaming software; plenty of tools are available for every operating system.
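
If you prefer a script to a dedicated tool, batch renaming can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`normalise_stem`, `plan_renames`, `apply_renames`) and the transliteration table are invented for this example. It first produces a plan you can review and renames nothing until you explicitly apply that plan.

```python
import re
import unicodedata
from pathlib import Path

# Assumed ASCII spellings for the German characters named in the tips.
TRANSLITERATION = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
                   "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"}


def normalise_stem(stem):
    """Turn a file name (without extension) into a plain ASCII name."""
    # Compose characters first so that the table below can match them.
    stem = unicodedata.normalize("NFC", stem)
    for char, replacement in TRANSLITERATION.items():
        stem = stem.replace(char, replacement)
    # Drop remaining accents, e.g. "é" becomes "e".
    stem = unicodedata.normalize("NFKD", stem).encode("ascii", "ignore").decode("ascii")
    # Any run of other characters (spaces, dots, symbols) becomes one underscore.
    stem = re.sub(r"[^A-Za-z0-9_-]+", "_", stem)
    return stem.strip("_")


def plan_renames(folder):
    """Return (old_path, new_path) pairs for files whose names would change."""
    plan = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            new_name = normalise_stem(path.stem) + path.suffix
            if new_name != path.name:
                plan.append((path, path.with_name(new_name)))
    return plan


def apply_renames(plan):
    """Carry out a reviewed plan; never overwrite an existing file."""
    for old_path, new_path in plan:
        if new_path.exists():
            print(f"skipped, target already exists: {new_path.name}")
            continue
        old_path.rename(new_path)


print(normalise_stem("Größe Messung (final)"))  # Groesse_Messung_final
```

Separating the plan from its execution lets you inspect the proposed names before anything on disk changes, which matters because two different old names can map to the same new name.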

Data structure

Once you have an overview of your project files and have decided how to name them, think about how to organize the files into folders. Use the most important attribute for the top level, and then create nested folders according to the relevance of the remaining categories or attributes.

Important rules:

Avoid overlapping categories and folder redundancy
Find a balance: individual folders should not grow too large, and the structure should not become too deep
Follow file naming guidelines for folders as well

Folders can be set up by:

project, experiment,…
time (year, month, day)
data type (e.g. documents, scripts, figures,…)

Example of a possible folder structure:

01_Project-01-Example/
├── 01_Organisation/
│   ├── 01_Meta_information/
│   │   ├── Project-abstract
│   │   └── ...
│   ├── 02_Time_table/
│   │   ├── Project-01-Example_Time_table.xlsx
│   │   └── ...
│   ├── 03_Finances/
│   │   ├── Project-01-Example_Finances.xlsx
│   │   ├── 01_Grants
│   │   └── ...
│   └── 04_Other
├── 02_Raw_data/
│   ├── 01_Readme.txt
│   ├── 02_Internal/
│   │   └── 01_Experiment_A/
│   │       ├── P01-01-Example_Metagenome-Sequencing-A1_v01_2023-08-18_KG.gz
│   │       ├── P01-01-Example_Metagenome-Sequencing-A2_v01_2023-08-19_KG.gz
│   │       ├── ...
│   │       └── <AND/OR_Coscine-PID_URL_https://coscine.rwth-aachen.de/>
│   └── 03_External/
│       ├── 01_ENA
│       ├── 02_NCBI
│       ├── 03_DDBJ
│       └── 04_Zenodo
├── 03_Analysis/
│   ├── 01_Readme.txt
│   ├── 02_Experiment_A/
│   │   ├── 01_Figures
│   │   └── ...
│   └── 03_Pipeline_C/
│       ├── 01_Code/
│       │   ├── 01_Component_loads()
│       │   ├── 02_Component_dumps()
│       │   └── ...
│       ├── 02_Pipeline/
│       │   ├── 01_load_data/
│       │   │   ├── .tmp
│       │   │   ├── .store
│       │   │   └── .out
│       │   └── ...
│       └── 03_Output/
│           └── ...
├── 04_Writing/
│   ├── 01_Readme.txt/
│   │   └── Content:Use-Zotero_URL_https://www.zotero.org/groups/
│   ├── 02_Abstract
│   ├── 03_Text
│   ├── 04_Figures
│   ├── 05_Data
│   └── 06_...
└── 05_Archive/
    ├── 01_Organisation
    ├── 02_Raw_data
    └── 03_...

Readme.txt files can be used to describe projects, folders, and files.
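
A folder skeleton like this can be created by a short script, so that every new project starts from the same layout. The Python sketch below is one possible approach, not a fixed recipe: the function name `create_project`, the shortened selection of subfolders, and the Readme template text are assumptions made for illustration.

```python
from pathlib import Path

# Top-level folders from the example structure; the subfolder lists are
# shortened here and should be adapted to your own project.
SKELETON = {
    "01_Organisation": ["01_Meta_information", "02_Time_table", "03_Finances", "04_Other"],
    "02_Raw_data": ["02_Internal", "03_External"],
    "03_Analysis": [],
    "04_Writing": ["02_Abstract", "03_Text", "04_Figures", "05_Data"],
    "05_Archive": [],
}

# Folders that receive a Readme file describing their contents.
README_FOLDERS = ["02_Raw_data", "03_Analysis", "04_Writing"]


def create_project(root, project_name):
    """Create the folder skeleton under root and return the project path."""
    project = Path(root) / project_name
    for top, subfolders in SKELETON.items():
        (project / top).mkdir(parents=True, exist_ok=True)
        for sub in subfolders:
            (project / top / sub).mkdir(exist_ok=True)

    for top in README_FOLDERS:
        readme = project / top / "01_Readme.txt"
        if not readme.exists():  # never overwrite an existing description
            readme.write_text(
                f"Project: {project_name}\nFolder: {top}\nContents:\n",
                encoding="utf-8",
            )
    return project
```

Calling `create_project(".", "01_Project-01-Example")` would create the skeleton in the current directory. Because existing folders and Readme files are left untouched, the function can be run again later without damaging a project that is already in use.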

Disclaimer

I hope this was an interesting read. If you have comments, remarks, or suggestions about other RDM-related topics for the next newsletters, please let me know by sending me an email at dukkart@itc.rwth-aachen.de.

Image designed by stories / Freepik
