The Connectome File Format

Stephan Gerhard (Signal Processing Laboratory 5, Ecole Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland)

Introduction

Arguably, the integration of data across spatio-temporal scales and modalities, together with the integration of knowledge across conceptual domains by means of ontologies, poses serious sociological and technical challenges to the field of neuroinformatics. A coherent, synthetic theory of the nervous system's development, structure and function can only emerge if these challenges are met and researchers agree on standards for the exchange of knowledge, data and source code. Only such standards will provide an integrative methodology for collaborative learning and enable experimental reproducibility.

Method

To tackle these challenges, the “Connectome File Format” (CFF) was developed. Each individual data item (termed a “connectome object”) needs flexible metadata annotation, which led to the adoption and integration of two proposed standards: the Open Metadata Markup Language (odML) [1] and the Dublin Core Metadata Initiative Terms (DCMI) [2]. The DCMI terms allow annotation of general metadata attributes for resources in the semantic web. The CFF reuses fields such as title, creator, publisher, creation and modification date, license, references and description. The odML scheme, in turn, allows annotation of structured properties, organized in sections and subsections, for each connectome object. By clearly separating structure from vocabulary, odML enables metadata vocabularies for various neuroscientific domains to be built flexibly from the bottom up.
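The two annotation layers can be illustrated with a minimal sketch using only the Python standard library. The DCMI Terms namespace URI is the published one; the element names "metadata", "section" and "property" and all example values are hypothetical and only mimic the idea of sectioned properties, not the actual odML or CFF schema.

```python
import xml.etree.ElementTree as ET

DCTERMS = "http://purl.org/dc/terms/"  # DCMI Metadata Terms namespace
ET.register_namespace("dcterms", DCTERMS)


def build_metadata(dc_fields, sections):
    """Return an XML element holding DCMI terms and nested property sections.

    "metadata", "section" and "property" are hypothetical element names,
    chosen for illustration only.
    """
    root = ET.Element("metadata")
    # General, resource-level attributes expressed as DCMI terms.
    for term, value in dc_fields.items():
        ET.SubElement(root, f"{{{DCTERMS}}}{term}").text = value
    # Domain-specific properties grouped into named sections.
    for name, properties in sections.items():
        section = ET.SubElement(root, "section", name=name)
        for key, value in properties.items():
            ET.SubElement(section, "property", name=key).text = str(value)
    return root


meta = build_metadata(
    {"title": "Example network", "creator": "A. Researcher", "license": "CC-BY"},
    {"acquisition": {"field_strength_tesla": 3, "b_value": 1000}},
)
print(ET.tostring(meta, encoding="unicode"))
```

The general fields stay fixed across objects, while the section contents can grow with the vocabulary of each domain.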

The CFF is a container format that “connects” and builds on top of established data formats in the neuroimaging community. A connectome file consists of one XML file (the “meta.cml” file) and the data files, in their various data formats, stored at paths relative to this main XML file. The XML file is an instance of the “Connectome Markup Language” XML Schema. It contains the metadata of the connectome file and the description of each connectome object: its unique name, its data format, a reference to the original data file, its measurement modality and its metadata. Importantly, although the supported connectome objects and data formats are predefined, the description of the measurement modality is left open and can be a unique identifier from a standardized ontology. A connectome file is not confined to subject-related datasets; it can also store group-related datasets with many subjects, providing a coherent way to organize the datasets of a study.
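The layout described above, one meta.cml file that references data files by relative path, can be sketched as follows. This is an illustration under stated assumptions: the element name "object", the attribute names and the example values are hypothetical and are not taken from the Connectome Markup Language schema.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def write_container(root_dir, objects):
    """Write data files plus a meta.cml that references them by relative path.

    Element and attribute names are hypothetical, not the real CML schema.
    """
    root_dir = Path(root_dir)
    cml = ET.Element("connectome")
    for obj in objects:
        data_path = root_dir / obj["src"]
        data_path.parent.mkdir(parents=True, exist_ok=True)
        data_path.write_bytes(obj["payload"])
        ET.SubElement(
            cml, "object",
            name=obj["name"], fileformat=obj["fileformat"],
            modality=obj["modality"], src=obj["src"],
        )
    meta_file = root_dir / "meta.cml"
    ET.ElementTree(cml).write(meta_file, encoding="utf-8", xml_declaration=True)
    return meta_file


def resolve_objects(meta_file):
    """Map each object name to its data file, resolved against meta.cml."""
    meta_file = Path(meta_file)
    tree = ET.parse(meta_file)
    return {
        el.get("name"): meta_file.parent / el.get("src")
        for el in tree.getroot().iter("object")
    }


with tempfile.TemporaryDirectory() as tmp:
    meta = write_container(tmp, [
        {"name": "subject01_network", "fileformat": "GraphML",
         "modality": "diffusion MRI", "src": "Network/subject01.graphml",
         "payload": b"<graphml/>"},
    ])
    paths = resolve_objects(meta)
    payload = paths["subject01_network"].read_bytes()
    relative = paths["subject01_network"].relative_to(tmp)
    print(relative.as_posix(), payload)
```

Because every reference is relative to meta.cml, the whole directory can be moved or archived without rewriting the XML.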

In practice, defining yet another format is not enough to achieve user adoption. The format needs to be accompanied by a library that enables reading, writing and manipulation of connectome files. For this purpose, the Connectome File Format Library “cfflib” [3] was implemented in Python. The library exposes an easily comprehensible conceptual view of the underlying object model, which is defined by the Connectome Markup Language. With cfflib, it is easy to add and remove connectome objects, update their metadata fields, load datasets into memory on demand, manipulate them and store them again. Implementing the library in Python makes it possible to support the numerous data formats for which open source reading and writing libraries are available.
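The idea of loading datasets into memory only on demand can be sketched in plain Python. The classes below are hypothetical and do not reproduce the cfflib API; JSON serves as a stand-in for a real neuroimaging data format.

```python
import json
import tempfile
from pathlib import Path


class LazyObject:
    """Hypothetical connectome object that reads its file only on request."""

    def __init__(self, name, path, loader):
        self.name = name
        self.path = Path(path)
        self.metadata = {}
        self._loader = loader  # format-specific reader, supplied by the caller
        self._data = None

    @property
    def loaded(self):
        return self._data is not None

    def load(self):
        # Read the file the first time only; later calls reuse the cached data.
        if self._data is None:
            self._data = self._loader(self.path)
        return self._data


class Container:
    """Hypothetical in-memory view of a connectome file."""

    def __init__(self):
        self._objects = {}

    def add(self, obj):
        self._objects[obj.name] = obj

    def get(self, name):
        return self._objects[name]

    def remove(self, name):
        del self._objects[name]

    def names(self):
        return list(self._objects)


with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "timeseries.json"
    data_file.write_text(json.dumps([0.1, 0.4, 0.9]))

    container = Container()
    container.add(
        LazyObject("ts01", data_file, lambda p: json.loads(p.read_text()))
    )
    obj = container.get("ts01")
    obj.metadata["description"] = "toy timeseries"  # no file access needed
    was_loaded_before = obj.loaded
    values = obj.load()  # the file is read here
    container.remove("ts01")
```

Passing the reader in as a function is one way to keep the container independent of any particular data format.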

Result

The CFF was used for structured storage of connectome mapping results [4], and to interface with the Connectome Viewer for subsequent analysis and visualization [5]. It makes it possible to bundle diverse datasets such as volumes, surfaces, fibers, networks, timeseries and behavioural data within the same container. Furthermore, entities can be related to each other, for example network nodes to surface patches, volumetric regions of interest or timeseries, and network edges to fiber tracks.

The Extensible Neuroimaging Archive Toolkit (XNAT) [6] is widely used as a neuroimaging data sharing platform. The cfflib contains a push/pull mechanism based on PyXNAT [7] to make connectome files easily available on XNAT instances. Because a connectome file represents a self-describing and self-contained chunk of data, the CFF will facilitate conversion to MapReduce-based neuroscience data sharing and exploration frameworks [8]. Meanwhile, a public, curated data repository with exemplary connectome files is available on GitHub [9].

Conclusion

Through standardization in the neuroinformatics field, a unified neuroscience infrastructure as envisioned in the introduction is more likely to emerge. The Connectome File Format is a first pragmatic proposal of how integration could be achieved across structural and functional data, across multi-modal datasets, between neuroinformatics infrastructure developers and users, and between ontology curators and data producers. User acceptance is facilitated by a bottom-up approach to minimal but flexibly extensible metadata annotation, and by the readily usable library cfflib for data manipulation.

Further development is required for integration of micro- and macro-scale measurement modalities, proper time domain support, as well as export options for neurocomputational modelling tools. The various INCF task forces provide the forum for these ongoing discussions.


References
1. Open Metadata Markup Language, http://www.g-node.org/projects/odml
2. DCMI Metadata Terms, http://dublincore.org/documents/dcmi-terms/
3. Connectome File Format Library, http://cmtk.org/cfflib/
4. Connectome Mapping Toolkit, http://cmtk.org/
5. Connectome Viewer, http://www.connectomeviewer.org/
6. Extensible Neuroimaging Archive Toolkit, http://www.xnat.org/
7. PyXNAT, http://packages.python.org/pyxnat/
8. MapReduce, http://en.wikipedia.org/wiki/MapReduce
9. CFFdata repository, http://github.com/LTS5/cffdata/

Preferred presentation format: Poster
Topic: General neuroinformatics
