Do

What are you waiting for?
Just Do it—Embrace reproducibility from the get-go

Our computational environments and infrastructure are growing more powerful, more heterogeneous, and more complicated. Reproducibility can be improved through practices and tools that 1) enable efficient management of data and scripts while capturing the history and details of changes and 2) facilitate executing analyses in a fashion that supports inspection, comparison, validation, modification, and re-execution.
To be most effective, such solutions should consider all stages of the scientific process, from planning the experiment and collecting data to publishing the findings. We aim to assist the neuroimaging community in improving the reproducibility of its results by Doing

  • data conversion with minimal manual effort while establishing data provenance from the moment data are collected
  • collection and analysis of data provenance throughout different steps of the scientific process
  • (re)execution of (previous) analyses in a scalable and flexible manner by taking advantage of both existing infrastructure and new technologies.

We strongly believe in the benefits of modular design, collaboration, and re-use. That is why, instead of developing a single monolithic platform to "solve all the problems", we reuse and contribute to existing projects as much as possible. To that end, within the scope of this project, we actively maintain relevant software within NeuroDebian so that ReproNim and other projects can benefit from this turnkey platform. We also provided official Debian packaging for the Singularity "scientific containerization" platform to make research more flexible and reproducible. With our ongoing software and platform development, we aim to provide a collection of tools that are useful on their own, even if you choose not to use the full suite of products.

We are currently focused on the following projects. You can help us by trying them out and starting to use them, by contributing, or by sharing your ideas on how to improve them. We need your feedback (positive or negative) to make our projects most beneficial for your research.

ReproIn

ReproIn helps you collect and prepare neuroimaging data in an efficient and automated way, so that you immediately benefit from the established community standard Brain Imaging Data Structure (BIDS) and from DataLad, a distributed system for data management and distribution. To achieve that, we

  • formalize a specification and procedures for MRI scanning to allow for flexible, automated, and unambiguous conversion of all acquired MR data from original DICOM files to BIDS dataset(s)
  • contribute to the development of the Heuristic DICOM Converter (HeuDiConv), which provides the base implementation platform (a minimal heuristic sketch follows this list)
  • integrate with "lab notebook" suites such as BrainVerse
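
For illustration, here is a minimal sketch of the kind of heuristic module that HeuDiConv consumes, assuming ReproIn-style protocol names entered at the scanner console (the specific series selectors here are hypothetical):

    def create_key(template, outtype=('nii.gz',), annotation_classes=None):
        # Standard helper found in HeuDiConv heuristics: pairs a BIDS path
        # template with the desired output type(s).
        if not template:
            raise ValueError('Template must be a valid format string')
        return template, outtype, annotation_classes

    def infotodict(seqinfo):
        # Map each acquired DICOM series to a BIDS target based on the
        # protocol name entered at the scanner console.
        t1w = create_key('sub-{subject}/anat/sub-{subject}_T1w')
        rest = create_key('sub-{subject}/func/sub-{subject}_task-rest_bold')
        info = {t1w: [], rest: []}
        for s in seqinfo:
            if 'anat-T1w' in s.protocol_name:
                info[t1w].append(s.series_id)
            elif 'func-bold_task-rest' in s.protocol_name:
                info[rest].append(s.series_id)
        return info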

Automatically converting all collected data to BIDS, while retaining all original DICOMs as well as a clear association between raw and converted data, makes it possible to

  • validate and assess the quality of the data acquisition using the BIDS-validator and MRIQC, detecting possible problems with the design or acquisition at the earliest stages of a project and avoiding many common mistakes that stem from manual data conversion and curation (see the sketch after this list)
  • automatically reconvert all collected data if relevant defects are detected in the conversion backend utility, if additional metadata stored in the DICOMs turns out to be necessary, or simply to confirm the reproducibility of the data itself
  • start data analysis immediately after collection using any of the BIDS-aware applications
  • facilitate data sharing (private or public) and publication (data descriptor papers), since all data has a standardized file layout and annotation and is almost completely ready for sharing from the moment it is converted.
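
As a rough sketch of what this can look like in practice (the paths, subject label, and output location are placeholders), one HeuDiConv invocation converts a subject's DICOMs into a BIDS/DataLad dataset, which can then be checked with the BIDS-validator:

    import subprocess

    # Convert one subject's DICOMs to BIDS using the built-in ReproIn heuristic.
    subprocess.run([
        'heudiconv',
        '-d', '/incoming/{subject}/*/*.dcm',  # where the raw DICOMs land
        '-s', '01',                           # subject label
        '-f', 'reproin',                      # HeuDiConv's built-in ReproIn heuristic
        '-c', 'dcm2niix',                     # converter backend
        '-b',                                 # produce a BIDS layout
        '-o', '/data/bids',                   # output dataset
        '--datalad',                          # track everything with DataLad
    ], check=True)

    # Catch acquisition and conversion problems as early as possible.
    subprocess.run(['bids-validator', '/data/bids'], check=True)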

Visit reproin.repronim.org for more information.

YODA Principles

YODA (YODA’s organigram on data analysis) outlines an approach for using version control systems such as Git, git-annex, and DataLad in a modular fashion to cover the entire lifespan of a research project with reliable, unambiguous tracking and orchestration of all digital products of a study (e.g., inputs, code, outputs). Modularization facilitates the independent re-use of parts (e.g., the same data used across multiple studies, or versions of a software library used repeatedly) in a manner that scales to the dataset sizes found in cutting-edge high-resolution neuroimaging research.

Within the scope of the ReproIn project, having all data collected as DataLad datasets makes it possible to

  • start changing and enriching the BIDS dataset with information not available during scanning (condition onsets, dataset metadata, etc.), while allowing for subsequent updates and merges with new data coming from the scanner using standard Git mechanisms (see the sketch after this list)
  • distribute data for curation and processing across the available infrastructure, relying on git-annex to maintain information about data availability and on Git to provide clear information about dataset version(s)
  • annotate sensitive data files with metadata to automatically restrict their public sharing (e.g. original DICOMs, non-defaced anatomicals).
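
A minimal sketch of such a modular setup, using DataLad's Python API (the paths and source URL are hypothetical):

    import datalad.api as dl

    # Create a study dataset pre-configured with the YODA layout procedure.
    dl.create(path='mystudy', cfg_proc='yoda')

    # Install the raw BIDS data as a reusable subdataset, so the same data
    # can back multiple studies without duplication.
    dl.clone(source='https://example.com/bids-raw',
             path='mystudy/inputs/rawdata', dataset='mystudy')

    # Later enrichment (e.g., condition onsets) is just another versioned commit.
    dl.save(dataset='mystudy', message='Add condition onsets for sub-01')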

We also extend and contribute new functionality to DataLad to facilitate VCS-enabled provenance tracing of execution and results:
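
For example, DataLad's run command records a command, its inputs and outputs, and the resulting changes directly in the Git history, so that the result can later be re-executed verbatim with rerun. A minimal sketch using the Python API (the script and paths are placeholders):

    import datalad.api as dl

    # Execute a command and record it, together with its inputs and outputs,
    # as a commit in the dataset's history ("datalad rerun" can replay it).
    dl.run(
        cmd='python code/preproc.py {inputs} {outputs}',
        inputs=['inputs/rawdata/sub-01'],
        outputs=['derivatives/sub-01'],
        message='Preprocess sub-01',
        dataset='mystudy',
    )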

You can adhere to the YODA principles by making better use of version control in your research projects. Visit and contribute to our training materials and the YODA template repository.

ReproMan

The Neuroimaging Computation Environments Manager (ReproMan) is being developed to help researchers track and manage the computational resources available to them and to use those resources in a reproducible and scalable way. We aim for ReproMan to

  • collect information about the resources used—such as software packages (Debian, Conda, pip), VCS repositories (Git, SVN), and containers (Docker, Singularity)—possibly enriching already-collected provenance information (e.g., obtained from BrainVerse, ReproZip, or workflow engines). We emphasize collecting exhaustive information, so that it is sufficient to later re-instantiate the necessary components automatically, given just a specification.
  • analyze the collected information to answer questions like the following (a toy illustration follows this list)
    • "Am I using the same software as in the original analysis?"
    • "What has changed?"
    • "Does a given environment satisfy the necessary requirements?"
  • (when feasible) adjust any given computational environment to satisfy the specified requirements, or simply populate a new container (Docker, Singularity, etc.) that does
  • provide a uniform interface to a variety of computational environments (containers, cloud, etc.) for interactive sessions and for scheduling computational tasks, while allowing the computational resources used by a task to be traced.
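
As a toy illustration (purely hypothetical data structures, not ReproMan's actual API), the kind of comparison behind "What has changed?" amounts to diffing two environment specifications that map package names to versions:

    # Hypothetical environment specifications: package name -> version.
    original = {'fsl': '5.0.9', 'python': '3.6.3', 'numpy': '1.13.1'}
    current = {'fsl': '6.0.1', 'python': '3.6.3', 'scipy': '1.1.0'}

    changed = {pkg: (original[pkg], current[pkg])
               for pkg in original.keys() & current.keys()
               if original[pkg] != current[pkg]}
    removed = original.keys() - current.keys()
    added = current.keys() - original.keys()

    print('changed:', changed)  # {'fsl': ('5.0.9', '6.0.1')}
    print('removed:', removed)  # {'numpy'}
    print('added:', added)      # {'scipy'}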

Visit github.com/ReproNim/reproman for more information.

Our Do Team

Robert Buccigrossi

Yaroslav Halchenko

Christian Haselgrove

Kyle Meyer

Matt Travers