Accessing and scripting Neuroimaging XNAT databases with PyXnat
Yannick Schwartz (Neurospin-I2BM-CEA), Alexis Barbot (Neurospin-I2BM-CEA), Vincent Frouin (Neurospin-I2BM-CEA), Benjamin Thyreau (Neurospin-I2BM-CEA), Gael Varoquaux (Neurospin-INRIA), Bertrand Thirion (Neurospin-INRIA), Jean-Baptiste Poline (Neurospin-I2BM-CEA)
An increasing number of large international projects are generating large amounts of neuroimaging and associated data (behavioural, clinical or genetic), which require databases and management systems. The time researchers spend managing and querying the data grows with the size and complexity of these databases, making data analysis more cumbersome. To automate data management and processing tasks, it is crucial to be able to script access to a database.
We introduce here PyXNAT, a Python module that interacts with The Extensible Neuroimaging Archive Toolkit (XNAT) through native Python calls across multiple operating systems.
We built a Python communication library on top of the XNAT database system. XNAT (Marcus, 2004) is an open source software platform designed to manage neuroimaging and associated data. It helps organize and access data from small studies or from large datasets with multiple modalities. We chose the Python language, which enjoys growing success in the neuroimaging community (Koetter, 2008), as an alternative or a complement to other analysis tools.
The most common tool for working with a large database in neuroimaging is XNAT. XNAT provides a web interface with a search utility to select part of the data, such as a sub-population with specific characteristics, and then download the relevant data locally. Interacting with XNAT databases is therefore most often done through the web interface. However, databases may store many variables, and it can be challenging to pick out the right data to download from a graphical interface. Additionally, the processing of large databases generally has to be scripted and distributed across processors. Once the data are downloaded, they are aggregated on the local file system (FS) and annotated in a consistent and meaningful manner through specific paths and file names. This step amounts to manually converting the database into a local FS-based store that lacks advanced search capabilities and has to be synchronized, again manually, with the database.
We designed a communication library that gives direct access to the XNAT server and handles data management tasks such as keeping the local data up to date. Processing scripts that access a central database are easier to share and to re-use. The vocabulary used to describe the data is defined at the database level, which means that it is shared and understood by a whole group of users.
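As an illustration of what keeping the local data up to date can involve, the following is a minimal sketch and not PyXNAT code. It assumes the server can report a digest for each file; the function names `sha256_of` and `stale_files`, the choice of SHA-256, and the shape of the `remote_digests` mapping are all hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so that large images are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stale_files(remote_digests: Dict[str, str], local_dir: Path) -> List[str]:
    """Return the file names whose local copy is missing or out of date.

    `remote_digests` maps a file name to the digest reported by the
    server (an assumed input, not an actual XNAT response format).
    """
    stale = []
    for name, remote_digest in sorted(remote_digests.items()):
        local = local_dir / name
        if not local.is_file() or sha256_of(local) != remote_digest:
            stale.append(name)
    return stale
```

Only the files returned by `stale_files` would need to be downloaded again, which avoids re-transferring data that has not changed on the server.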
We have implemented a Python module called PyXNAT on top of the XNAT REST (Representational State Transfer) API to communicate with XNAT. It is an open-source project available for download at http://pypi.python.org/pypi/pyxnat and documented at http://packages.python.org/pyxnat.
The XNAT REST API uniquely identifies the data with URIs (Uniform Resource Identifiers) and uses HTTP for transfer. As a separate feature, the XNAT search engine is also accessible through REST, since it accepts an XML document describing a query at a specific URI. Wrapping the REST API in Python makes it possible to unify the two functionalities, so that a list of variables and a list of files for a subset of the database can be retrieved under consistent semantics. The wrapper also introduces new features, such as caching and introspection mechanisms, to address performance issues and help users navigate XNAT.
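The two ideas in this paragraph, hierarchical URIs and caching, can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not PyXNAT's implementation: `build_uri`, `CachedClient`, and the project/subject/experiment level names are hypothetical, and the fetch function is injected so that the example does not depend on a live server.

```python
from typing import Callable, Dict
from urllib.parse import quote

# Assumed hierarchy levels, ordered from the top of the tree downwards.
LEVELS = ("projects", "subjects", "experiments")


def build_uri(base: str, **ids: str) -> str:
    """Build a resource URI such as <base>/projects/P/subjects/S."""
    parts = [base.rstrip("/")]
    used = 0
    for level in LEVELS:
        if level not in ids:
            break  # stop at the deepest level that was provided
        parts += [level, quote(ids[level], safe="")]
        used += 1
    if used != len(ids):
        raise ValueError("identifiers must form an unbroken path from 'projects'")
    return "/".join(parts)


class CachedClient:
    """Wrap a fetch function so that each URI is retrieved at most once."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, bytes] = {}

    def get(self, uri: str) -> bytes:
        if uri not in self._cache:
            self._cache[uri] = self._fetch(uri)  # only hit the server on a miss
        return self._cache[uri]
```

Because every resource has a unique URI, the URI itself is a natural cache key: repeated requests for the same resource can be served locally instead of going back over HTTP. A real cache would also need an invalidation policy, which this sketch leaves out.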
PyXNAT connects programs to an XNAT server. As an example, Nipype is a Python module that interfaces with existing neuroimaging software such as SPM, FSL and FreeSurfer. It can also distribute jobs over clusters, which makes it efficient for processing large amounts of data. Its data connection method was originally FS-based, but it now accesses an XNAT server through PyXNAT. PyXNAT and Nipype are being used jointly to run analyses for the IMAGEN European project, which aims to study risk factors for addiction in a large cohort of more than 2000 14-year-old adolescents.
PyXNAT enables access to XNAT from the Python environment. It can be used both as an interactive command line interface and as a back-end communication library. We see PyXNAT as a major step towards easier processing of datasets stored in XNAT servers. Other projects may use the Nipype/PyXNAT combination in the future, such as the International Neuroimaging Data-sharing Initiative (INDI), a project within the 1000 Functional Connectomes Project.
This work is partly funded by the IMAGEN project from the European Community’s Sixth Framework Programme (LSHM-CT-2007-037286). This abstract reflects only the authors’ views and the Community is not liable for any use that may be made of the information contained therein.
Marcus, D. et al., XNAT: A software framework for managing neuroimaging laboratory data. In Proceedings of the 11th Annual Meeting of the Organization for Human Brain Mapping, Toronto, Canada, 12th–16th June. Neuroimage.
Koetter, R. et al., 2008. Python in neuroscience. Front. Neuroinformatics.
Ghosh, S. et al., 2010. Nipype: Opensource platform for unified and replicable interaction with existing neuroimaging tools. In 16th Annual Meeting of the Organization for Human Brain Mapping.