Partial Least Squares-Correspondence Analysis (PLS-CA): A New Method to Analyze Common Patterns in Measures of Cognition and Genetics

Derek Beaton (The University of Texas at Dallas), Hervé Abdi (The University of Texas at Dallas)

With the introduction of large-scale studies (such as the Alzheimer's Disease Neuroimaging Initiative; ADNI), researchers in neuroscience and genetics now routinely collect and share a wide variety of very large data sets (e.g., neuroimaging, genomes, cognitive measures) whose analysis requires suitable statistical methods. In general, the structure and variety of these data sets preclude the use of standard hypothesis-driven or multivariate techniques (such as multiple regression or discriminant analysis).
 
Interestingly, most of the data in genetics-based cognition research, such as questionnaires and single nucleotide polymorphisms (SNPs), are nominal variables structured in blocks or tables (e.g., memory tasks for behavior; chromosomes for a genome). Nominal variables can be analyzed with multivariate techniques such as Correspondence Analysis (CA) and Discriminant Correspondence Analysis (DiCA). CA-based methods are to qualitative data what principal components-based methods are to quantitative data.
 
However, CA and DiCA are each designed to analyze a single data set. To reveal the information common to two data sets (e.g., cognitive measures and genetics), we adapted the well-known Partial Least Squares Correlation (PLSC) method, which is suited to quantitative measurements, so that it can be used with nominal data. This new approach, called PLS-CA, integrates PLSC and CA in a common framework. PLS-CA reveals patterns of co-occurrence of SNPs and behavioral measures, that is, connectivity between genetics and behavior. Furthermore, we extend the PLS-CA framework to predict group membership (e.g., clinical group). This new technique, called Discriminant PLS-CA, reveals factors common to two nominal data tables while accounting for a priori discriminant information about the observations. Discriminant PLS-CA relates causal data (i.e., clinical groups) to relational data (i.e., genetics and cognition), in a manner similar to Multi-block PLSC.
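
To make the idea of combining PLSC and CA concrete, the following is a minimal sketch, not the authors' implementation. It assumes that both tables are disjunctively (one-hot) coded, that their cross-product plays the role of a contingency table, and that CA is then applied to that table through a singular value decomposition of standardized residuals. The function names `disjunctive` and `pls_ca_sketch` are hypothetical and introduced only for illustration.

```python
import numpy as np


def disjunctive(codes):
    """One-hot code an (n_obs, n_vars) integer array, one block of columns per variable."""
    blocks = []
    for j in range(codes.shape[1]):
        levels = np.unique(codes[:, j])
        # Compare each observation to every observed level of variable j.
        blocks.append((codes[:, [j]] == levels[None, :]).astype(float))
    return np.hstack(blocks)


def pls_ca_sketch(X, Y):
    """CA of the cross-product of two disjunctive tables (assumed reading of PLS-CA)."""
    R = X.T @ Y                      # co-occurrence counts between levels of X and Y
    Z = R / R.sum()                  # correspondence matrix (relative frequencies)
    r = Z.sum(axis=1)                # row masses (levels of X)
    c = Z.sum(axis=0)                # column masses (levels of Y)
    expected = np.outer(r, c)        # frequencies expected under independence
    S = (Z - expected) / np.sqrt(expected)   # standardized residuals
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    row_scores = (U * s) / np.sqrt(r)[:, None]     # factor scores for levels of X
    col_scores = (Vt.T * s) / np.sqrt(c)[:, None]  # factor scores for levels of Y
    return s, row_scores, col_scores
```

In this sketch, the squared singular values sum to the total inertia of the cross-product table (its chi-square statistic divided by its grand total), and levels of the two tables that co-occur more often than expected receive factor scores of the same sign on a given dimension.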
 
PLS-CA and Discriminant PLS-CA include non-parametric inference and cross-validation techniques such as permutation tests, jackknifing, and bootstrapping.
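
As an illustration of the permutation-test idea, here is a hedged sketch rather than the procedure used in this work. It assumes two disjunctively coded tables with no empty columns, uses the total inertia of their cross-product as the test statistic, and builds a null distribution by shuffling the rows of one table. The names `total_inertia` and `permutation_p_value`, and the defaults `n_perm=999` and `seed=0`, are hypothetical choices.

```python
import numpy as np


def total_inertia(X, Y):
    """Chi-square of the cross-product table X'Y divided by its grand total."""
    R = X.T @ Y
    Z = R / R.sum()
    expected = np.outer(Z.sum(axis=1), Z.sum(axis=0))
    return float((((Z - expected) ** 2) / expected).sum())


def permutation_p_value(X, Y, n_perm=999, seed=0):
    """Permutation p-value for the association between two disjunctive tables."""
    rng = np.random.default_rng(seed)
    observed = total_inertia(X, Y)
    # Shuffling the rows of Y breaks the pairing between the two tables
    # while keeping each table's own margins intact.
    null = np.array([
        total_inertia(X, Y[rng.permutation(len(Y))]) for _ in range(n_perm)
    ])
    # Count the observed value as one member of the null distribution.
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, p_value
```

Bootstrapping would resample observations with replacement instead of shuffling them; it is omitted here to keep the sketch short.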
 
We illustrate the PLS-CA framework (and its discriminant extension) with data from the ADNI clinical and genetic data sets. Our goal is to relate genomic (SNPs) and behavioral data. Results show patterns of genetic variations associated with both healthy and dysfunctional cognitive traits, as well as genetic variations for clinical groups.

Preferred presentation format: Poster
Topic: Genomics and genetics
