Research Ideas and Outcomes :
Software Management Plan
Corresponding author: Horea-Ioan Ioanas (ioanas@biomed.ee.ethz.ch)
Received: 03 Feb 2017 | Published: 07 Feb 2017
© 2017 Horea-Ioan Ioanas, Bechara Saab, Markus Rudin
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Ioanas H, Saab B, Rudin M (2017) Gentoo Linux for Neuroscience - a replicable, flexible, scalable, rolling-release environment that provides direct access to development software. Research Ideas and Outcomes 3: e12095. https://doi.org/10.3897/rio.3.e12095
Gentoo is a GNU/Linux metadistribution designed to maximize and simplify user control of the software environment. All determinants of a Gentoo environment are recorded in a small number of plain-text configuration files, from which the software make-up of the system can be reconstructed entirely. As such, Gentoo constitutes a replicable and transparent software infrastructure - as mandated by research valuing reproducibility. Of equal scientific interest is the flexibility of Gentoo's package management. All software is distributed in a rolling-release fashion, giving the user full control over which versions (including live versions and branches/tags from version control) of which programs to install, and with which compilation options. All of the above is accompanied by automatic, version-aware dependency resolution, which also tracks static library linking and prompts for rebuilds as necessary.
We believe Gentoo is excellently suited to address many of the challenges in neuroscience software management, including system replicability, system documentation, data analysis reproducibility, fine-grained dependency management, easy control over compilation options, and seamless access to cutting-edge software releases.
We have made a substantial set of neuroimaging and data analysis packages - including their entire dependency stacks - available for any system using Gentoo's Package Management Standard. Neuroscientific software now usable under Gentoo includes but is not limited to:
Herein we describe the implementation and current capabilities of this environment, as well as its ability to accelerate and improve research.
gentoo, portage, linux, software management, repository, dependency management, dependency resolution, neuroscience, flexible, scalable, rolling-release, live software, versioning, source-based, gnu, gnu/linux
Neuroscientific data analysis commonly relies on a multitude of software packages, which many scientists still resort to managing manually. Across the scientific community, manual software management is a major cause of effort duplication, resource waste (
The Gentoo Linux metadistribution*
In addition to a package-version-aware dependency graph, Portage provides USE flags - parameters which can be used to specify how packages should be built. This fine-grained control is useful for reducing disk space footprint and memory usage, but can also - among many other things - allow administrators to select whether a package is built with static libraries or not. As package version differences and library linking are a leading factor impeding data analysis reproducibility (
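As a rough illustration of how per-package USE flag choices can be recorded in plain text, the sketch below parses lines written in the style of Portage's package.use files: a package atom followed by flag names, where a leading "-" disables a flag. The helper name parse_package_use and the flag names in the example are hypothetical and chosen only for illustration; this is not Portage's own parser and it ignores version operators and other atom syntax.

```python
def parse_package_use(text):
    """Parse package.use-style lines into {atom: {flag: enabled}}.

    Simplified sketch: comments start with '#', each remaining line is
    'category/package flag1 -flag2 ...'.
    """
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        atom, *flags = line.split()
        entry = settings.setdefault(atom, {})
        for flag in flags:
            if flag.startswith("-"):
                entry[flag[1:]] = False  # '-flag' disables the flag
            else:
                entry[flag] = True
    return settings


# Illustrative entries only; the flag names are assumptions, not taken
# from the actual ebuilds.
EXAMPLE = """
# per-package build options
sci-libs/itk python -examples
sci-biology/fsl -doc
"""

settings = parse_package_use(EXAMPLE)
for atom, flags in settings.items():
    print(atom, flags)
```

Because such files are plain text, they can be kept under version control alongside analysis code, which is what makes the build options of an environment straightforward to document and replicate.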
Furthermore, the Gentoo Prefix project allows users of any GNU/Linux distribution - and even of some non-Linux operating systems - to set up a Portage software environment in userspace. This is especially relevant for researchers who use high-performance computing environments where they are not granted administrative rights (
We tackle the advanced software management needs faced by neuroscience by leveraging the manifold capabilities of the Gentoo metadistribution and the Portage package manager. This task materializes chiefly in writing ebuilds for the most popular neuroscientific packages and their dependencies, integrating these into the Gentoo ebuild repositing model, and testing the resulting environment in present research scenarios.
Ebuilds are reposited in directory trees called repositories, which can be enabled by the addition of a simple text file defining a small number of parameters (such as name, location, and priority) to the package manager configuration directory. In addition to the main Gentoo repository, containing just under 20,000 packages, a number of other repositories enjoy official status, and their users can rely on support from the entire Gentoo community. Of these, we distribute neuroscience ebuilds via the Gentoo Science overlay (
We have contributed ebuilds for about 40 neuroscientifically relevant software packages to the Gentoo Science overlay, and we continue to maintain them. This set encompasses highly specialized software, as well as a few more general scientific packages, and a number of dependencies not previously available for Gentoo. The ebuilds for dependencies not directly related to scientific applications are scheduled for migration to the main Gentoo repository.
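To make the repository mechanism more concrete, the following minimal sketch renders the kind of small text file that enables a repository, with a section name plus location and priority parameters, using Python's standard configparser module. The repository name "science", the path, and the priority value are assumptions made for illustration, and the helper render_repo_conf is hypothetical; a real setup would typically also define synchronization settings.

```python
import configparser
import io


def render_repo_conf(name, location, priority):
    """Render an INI-style repository definition as a string."""
    parser = configparser.ConfigParser()
    parser[name] = {
        "location": location,
        "priority": str(priority),
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Assumed, illustrative values; adjust to the local system.
conf_text = render_repo_conf(
    name="science",
    location="/var/db/repos/science",
    priority=50,
)
print(conf_text)

# Reading the text back shows that the file fully determines the settings.
check = configparser.ConfigParser()
check.read_string(conf_text)
```

Generating the file from a script is optional; the point is that the whole repository definition amounts to a few key-value pairs that can be written by hand and tracked in version control.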
Our contributed ebuild set (Table
The list of packages written in order to facilitate automated neuroscientific software management on Gentoo platforms. It should be noted that very many packages with only incidental use for neuroscience (e.g. scikit-learn (
dev-python/imageio
dev-python/tqdm
dev-python/moviepy
dev-python/matrix2latex
dev-python/prov
dev-python/pydotplus
dev-python/pymvpa
dev-vcs/datalad
dev-tex/pythontex
media-libs/avbin-bin
sci-biology/afni
sci-biology/ants
sci-biology/bru2nii
sci-biology/dipy
sci-biology/fsl
sci-biology/dcmstack
sci-biology/mne-python
sci-biology/nilearn
sci-biology/nistats
sci-biology/nitime
sci-biology/nireg
sci-biology/psychopy
sci-biology/pybrain
sci-biology/pysurfer
sci-biology/spm
sci-libs/itk
sci-libs/nibabel
sci-libs/nipype
sci-libs/nipy
sci-libs/nipy-data
sci-libs/nipy-templates
sci-libs/pydicom
sci-libs/scikits_image
sci-libs/vxl
sci-mathematics/mdp
sci-visualization/mricrogl
sci-visualization/mricron
sci-visualization/surf-ice
Dependency graphs with hierarchical edge bundling, depicting packages as vertices and dependency relationships as edges. The graphs are seeded by the ~40 packages which we maintain and have contributed to the Portage environment primarily for neuroscience use. Graph (a) covers the set's entire non-optional dependency stack, and totals ~550 packages. Graph (b) covers the set's entire dependency stack, including all optional dependencies, and totals ~3500 packages. The seed packages and their dependency relationships are highlighted in green. Dependencies provided by the Gentoo Science repository and their dependency relationships are colored purple. Dependencies provided by the main Gentoo repository and their dependency relationships are colored purple-tinted gray. The graph shows a tight clustering of neuroscientific Python packages, indicating the infrastructure cohesiveness and application diversity of scientific Python. The graph shows that Portage neuroscience packages make use of ~20 lower-level packages from Gentoo Science - illustrating the benefit of integrating scientific software management across disciplines. It is also notable that this graph includes deep Haskell and TeX dependency stacks - which are pulled in by DataLad (
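The distinction between the two graphs described above (non-optional dependencies only, versus all dependencies including optional ones) can be sketched as a breadth-first traversal from a set of seed packages. The dependency table below is a hypothetical toy example and is not derived from the actual ebuild metadata; the function name dependency_stack is likewise an assumption for illustration.

```python
from collections import deque

# Hypothetical toy data: package -> (required deps, optional deps).
DEPS = {
    "sci-libs/nipype": (["sci-libs/nibabel"], ["sci-biology/fsl"]),
    "sci-libs/nibabel": (["dev-python/numpy"], ["sci-libs/pydicom"]),
    "sci-biology/fsl": ([], []),
    "sci-libs/pydicom": ([], []),
    "dev-python/numpy": ([], []),
}


def dependency_stack(seeds, include_optional=False):
    """Return the seeds plus everything reachable through dependencies."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        package = queue.popleft()
        required, optional = DEPS.get(package, ([], []))
        candidates = required + (optional if include_optional else [])
        for dependency in candidates:
            if dependency not in seen:
                seen.add(dependency)
                queue.append(dependency)
    return seen


required_only = dependency_stack(["sci-libs/nipype"])
with_optional = dependency_stack(["sci-libs/nipype"], include_optional=True)
print(len(required_only), len(with_optional))
```

In this toy table the optional traversal reaches strictly more packages than the required-only one, mirroring on a small scale the difference in size between graphs (a) and (b).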
Neuroscientific research and teaching were performed on Gentoo platforms using our ebuilds, on at least 4 physical machines and over 100 virtual machines, by at least 40 students and researchers from at least 4 academic institutions. The testing process demonstrated the usability of our software management solution, and illustrated areas which could most benefit from improvement, notably the ease of distribution for base Gentoo systems.
We have made a comprehensive set of neuroscientific software packages available for the wide family of Gentoo distributions and derivatives. Via Gentoo Prefix, these neuroscientific software packages are, in fact, also accessible to users of many other operating systems.
Having demonstrated the feasibility of Gentoo for neuroscientific research, we seek to further improve the system by resolving outstanding packaging issues and by compiling a detailed overview of the easiest ways to obtain a base Gentoo distribution, tailored to popular research usage scenarios.
We thank the Gentoo community and developers, in particular Ted Rodgers, Andrew Savchenko, and Benda Xu. We thank the Gentoo Science Project members, in particular Justin Lercher and François Bissey. We thank the brainhack organizers and attendees, and the NeuroDebian and DataLad developer, Yaroslav Halchenko.
Institute for Biomedical Engineering, ETH and University of Zürich
As a metadistribution, Gentoo consists of a collection of tools allowing users to create their own distributions - of which many have emerged and some have gained significant popularity in their own right: Sabayon, Calculate Linux, and Kogaion - just to name a few. These are distinct from Gentoo derivatives, for which Funtoo and ChromeOS would be better examples. (All of these platforms, however, can benefit from neuroscience software management solutions designed for Gentoo.)