Version control - data

Content

Below you'll find the objectives and materials for each of the topics we'll discuss during this session.

Motivation

Research data management is a core component of good scientific practice. It can make your work more reproducible and transparent, and also easier, and version control for data can be one part of it. This session introduces DataLad, a data management and data publication tool built on top of version control systems.

If you answer "yes" to any of the following questions, this session will be interesting for you:

Have you ever worked through a directory like this?

A directory filled with almost identical files that differ only by slight variations in their names and version suffixes, which get increasingly chaotic

Does this metaphor fit a paper of yours?

A beautiful mermaid fascinates a man who cannot see that her hidden back is a monster: a metaphor for papers (the beautiful mermaid) and their project directories (the hidden horror)

Have you ever looked like this while trying to figure out how a colleague's script (or an old script of your own) is supposed to work?

A researcher pulling her hair over futile attempts to understand and run something on a computer

Do you find yourself wondering how to share or publish the data and results of your recent project?

An explorer carrying a large data box to a sign pointing at a data repository

Objectives

  • Understand why we should version control not only code and other small files, but also data and software

  • Understand the advantages of distributed version control for data

  • Get first-hand usage experience with DataLad, and master the following DataLad concepts:

    • Create and consume datasets

    • Perform version control on arbitrarily sized digital objects

    • Link components of a data analysis (code, data, software) together

    • Run and rerun computationally reproducible data analyses
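As a rough orientation, most of the DataLad concepts listed above can be sketched with DataLad's Python API. This is a minimal sketch rather than the session's own material. It assumes that DataLad and git-annex are installed and that a Git identity is configured. The function name `demo_workflow`, the directory names, the file names, and the `wc -l` command are all hypothetical and chosen only for illustration. Linking datasets as subdatasets is not shown here.

```python
from pathlib import Path


def demo_workflow(base_dir):
    """Create a dataset, save a file, run and rerun a command, then clone it.

    Illustrative sketch only: all paths and the shell command are made up.
    """
    # Imported lazily so this module still loads where DataLad is not installed.
    import datalad.api as dl

    base = Path(base_dir)

    # Create a new, empty dataset.
    ds = dl.create(path=str(base / "my-analysis"))

    # Add a file (of any size) and record it in the dataset's history.
    (Path(ds.path) / "data.txt").write_text("1\n2\n3\n")
    ds.save(message="Add raw data")

    # Run a command and record the command, its inputs, and its outputs.
    ds.run(
        cmd="wc -l data.txt > line_count.txt",
        message="Count lines in raw data",
        inputs=["data.txt"],
        outputs=["line_count.txt"],
    )

    # Re-execute the most recently recorded command.
    ds.rerun()

    # Consume the dataset elsewhere by cloning it.
    clone = dl.clone(source=ds.path, path=str(base / "my-analysis-clone"))
    return ds, clone
```

Calling `demo_workflow` with a scratch directory (for example one created by `tempfile.mkdtemp()`) should leave two datasets there: the original, whose history contains the saved file and the recorded command, and a clone of it.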

Materials

You can find the slides here, or download them directly from Zenodo.

Questions you should be able to answer based on this lecture

Optional reading/further materials

If you want to learn more about DataLad or research data management in general, there are several major resources:

Additionally, you can find an overview of recorded workshops and past tutorials at github.com/datalad/tutorials.