Accessing and scripting Neuroimaging XNAT databases with PyXnat
Yannick Schwartz (Neurospin-I2BM-CEA), Alexis Barbot (Neurospin-I2BM-CEA), Vincent Frouin (Neurospin-I2BM-CEA), Benjamin Thyreau (Neurospin-I2BM-CEA), Gael Varoquaux (Neurospin-INRIA), Bertrand Thirion (Neurospin-INRIA), Jean-Baptiste Poline (Neurospin-I2BM-CEA)
Introduction
An increasing number of large international projects are generating large
amounts of neuroimaging and associated data, such as behavioural, clinical or
genetic data, which require databases and management systems. The time
researchers spend managing and querying the data grows with the size and
complexity of these databases, making data analysis more cumbersome. To automate
data management and processing tasks, it is crucial to be able to script access
to a database.
Here we introduce PyXNAT, a Python module that interacts with the Extensible
Neuroimaging Archive Toolkit (XNAT) through native Python calls across multiple
operating systems.
Methods
We built a Python communication library on top of the XNAT database system.
XNAT (Marcus, 2004) is an open source software platform designed to manage
neuroimaging and associated data. It helps organize and access data from
small studies or from large datasets with multiple modalities. We chose the Python
language, which enjoys growing success in the neuroimaging community (Koetter,
2008), as an alternative or a complement to other analysis tools.
The most common tool for working with a large database
in neuroimaging is XNAT. XNAT provides a web interface to select part of the
data such as a sub-population with specific characteristics through a search utility
and then download the relevant data locally. Interacting with XNAT databases is
therefore most often done using the web interface. However, databases may store
many variables, and it can be challenging to pick out the right data to download
from any graphical interface. Additionally, the processing of large databases
generally has to be scripted and distributed across processors. Once the data
are downloaded, the file system (FS) is used to aggregate the transferred
data and to annotate them in a consistent and meaningful manner through specific
paths and file names. This step amounts to manually converting the database
into a local FS-based store that lacks advanced search capabilities and has to be
synchronized, again manually, with the database.
We designed a communication library that gives direct access to the XNAT
server and handles data management tasks such as keeping the local data up to
date. Processing scripts that access a central database are easier to share and
to re-use. The vocabulary used to describe the data is defined at the database
level, which means that it is shared and understood by a group of users.
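As an illustration of what keeping the local data up to date can involve, the following is a minimal sketch and not PyXNAT's actual implementation. It assumes that the server can report a digest for each file; the function names, the choice of SHA-256 and the shape of the `remote` mapping are all hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def digest_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a local file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_refresh(remote: Dict[str, str], local_dir: Path) -> List[str]:
    """List the files that are missing locally or differ from the server.

    `remote` maps a file name to the digest reported by the server
    (hypothetical format, assumed for this sketch).
    """
    stale = []
    for name, remote_digest in sorted(remote.items()):
        local_path = local_dir / name
        if not local_path.is_file() or digest_of(local_path) != remote_digest:
            stale.append(name)
    return stale
```

Only the files returned by `files_to_refresh` would need to be downloaded again, which avoids synchronizing the whole local store by hand.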
Results
We have implemented a Python module called PyXNAT on top of the XNAT REST (Representational State Transfer) API to communicate with XNAT. It is an open-source project available for download at http://pypi.python.org/pypi/pyxnat and
documented at http://packages.python.org/pyxnat.
The XNAT REST API uniquely identifies the data with URIs (Uniform Resource Identifiers)
and uses HTTP for transfer. As a separate feature, the XNAT search engine is
also accessible through REST since it can receive an XML document describing a
query at a specific URI. Wrapping the REST API in Python makes it possible to
unify the two functionalities so that a list of variables and a list of files
for a subset of the database are retrievable under consistent semantics. It
also introduces new features, such as caching and introspection mechanisms to
solve performance issues and help users navigate XNAT.
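The two ideas above, URI-based identification and caching, can be sketched as follows. This is a minimal illustration under stated assumptions, not PyXNAT's code: the project/subject/experiment path layout is a simplified view of the hierarchy, and `resource_uri`, `CachedClient` and the injected `fetch` callable are hypothetical names introduced for this example.

```python
from typing import Callable, Dict, Optional
from urllib.parse import quote


def resource_uri(project: str, subject: Optional[str] = None,
                 experiment: Optional[str] = None) -> str:
    """Build a hierarchical REST path such as /projects/P/subjects/S."""
    parts = ["projects", project]
    if subject is not None:
        parts += ["subjects", subject]
        if experiment is not None:
            parts += ["experiments", experiment]
    elif experiment is not None:
        raise ValueError("an experiment can only be addressed under a subject")
    # Percent-encode each segment so identifiers cannot break the path.
    return "/" + "/".join(quote(part, safe="") for part in parts)


class CachedClient:
    """Serve repeated requests for the same URI from an in-memory cache."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        # `fetch` performs the actual HTTP GET; it is injected so that the
        # sketch stays independent of any particular HTTP library.
        self._fetch = fetch
        self._cache: Dict[str, bytes] = {}

    def get(self, uri: str) -> bytes:
        if uri not in self._cache:
            self._cache[uri] = self._fetch(uri)
        return self._cache[uri]
```

Because each resource has a unique URI, the URI itself is a natural cache key: a second `get` on the same path returns the stored bytes without contacting the server again. A real cache would also need an invalidation policy, which is left out here.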
PyXNAT connects programs to an XNAT server. As an example, NiPyPE (Ghosh, 2010)
is a Python module that interfaces with existing neuroimaging software such as
SPM, FSL or FreeSurfer. It can also distribute jobs over clusters, which makes
it very efficient for processing large amounts of data. Its data connection
method was originally FS-based, but it now accesses an XNAT server through
PyXNAT. PyXNAT and NiPyPE are being used jointly to run analyses on the IMAGEN
European project, which aims to study addiction risk factors in a large cohort
of more than 2000 14-year-old adolescents.
Conclusions
PyXNAT provides access to XNAT from the Python environment. It can be used both as
an interactive command line interface and as a back-end communication library.
We see PyXNAT as a major step towards easier processing of datasets stored on
XNAT servers. Other projects may use the NiPyPE/PyXNAT combination in the
future, such as the International Neuroimaging Data-sharing Initiative (INDI),
a project within the 1000 Functional Connectomes Project.
Acknowledgement
This work is partly funded by the IMAGEN project from the European Community’s Sixth Framework Programme (LSHM-CT-2007-037286). This abstract reflects
only the authors’ views and the Community is not liable for any use that may be
made of the information contained therein.
References
Marcus, D. et al.,
XNAT: A software framework for managing neuroimaging laboratory data. In
Proceedings of the 11th annual meeting of the organization for human brain
mapping, Toronto, Canada. 12th–16th June. Neuroimage.
Koetter, R. et al.,
2008. Python in neuroscience. Front. Neuroinformatics.
Ghosh, S. et al.,
2010. Nipype: Opensource platform for unified and replicable interaction with
existing neuroimaging tools. In 16th Annual Meeting of the Organization for
Human Brain Mapping.