Human-Based Computing for Bioinformatics

We develop Phylo, a human-based computing framework for improving multiple sequence alignments.

We abstract the multiple alignment problem into a game in which the goal is to align words made of colored pieces instead of letters representing the genetic code (A, C, G, T). The sequences are displayed on a grid where you can only move them horizontally, and your goal is to create columns of the same color. You will find more information on how to play on the Phylo website.
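The idea of rewarding same-colored columns can be sketched in a few lines of Python. This is only an illustration: letters stand in for colors, and the pair-counting score and the helper names `column_matches` and `shift_right` are assumptions made for this example, not the scoring actually used by Phylo.

```python
from collections import Counter

GAP = "-"  # empty grid cell


def column_matches(rows):
    """Count, column by column, the pairs of rows that show the same piece.

    Hypothetical score for illustration only; gaps never count as matches.
    """
    width = max(len(row) for row in rows)
    padded = [row.ljust(width, GAP) for row in rows]
    total = 0
    for column in zip(*padded):
        counts = Counter(piece for piece in column if piece != GAP)
        # n identical pieces in a column form n * (n - 1) / 2 matching pairs
        total += sum(n * (n - 1) // 2 for n in counts.values())
    return total


def shift_right(row, offset):
    """Move a row horizontally by inserting empty cells on its left."""
    return GAP * offset + row


rows = ["ACGT", "CGT", "ACG"]
before = column_matches(rows)
after = column_matches([rows[0], shift_right(rows[1], 1), rows[2]])
print(before, after)
```

Sliding the second row one cell to the right lines up its pieces with the other two rows, so the score increases. That is the kind of horizontal move a player makes on the grid.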

Our goal is to provide an interface that (i) entertains our users and (ii) improves the quality of the data used by biologists. If you want to know more about the motivations of this work, you can check the coverage we received from CBC, Wired, Discovery Channel, MSNBC, CNET and the Montreal Gazette.

Ensemble Prediction of Protein Structures

We develop a framework for modeling and predicting ensembles of protein structures. Our algorithms are implemented in a suite of tools named partiFold.

partiFold aims to quickly (within a couple of minutes) compute accurate three-dimensional structure predictions of long polypeptides (several hundred residues) without using any template. More importantly, it is designed to provide a realistic picture of the folding landscape by computing macroscopic behaviors of the ensemble of folds from microscopic properties of residues. Instead of focusing the analysis on a single minimum folding energy structure, it uses statistical mechanics techniques to compute properties of the ensemble of structures found at equilibrium.
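To make the ensemble view concrete, the following Python sketch computes Boltzmann-weighted contact probabilities over a tiny, hand-written set of structures. Everything in it is an assumption made for illustration: the toy structures, their energies, the thermal energy value and the function names are hypothetical, and explicit enumeration of structures is used here only because the example is small. It is not a description of how partiFold itself is implemented.

```python
import math

RT = 0.592  # approximate thermal energy in kcal/mol near 298 K (assumed units)


def boltzmann_probabilities(energies, rt=RT):
    """Return the equilibrium probability of each state given its energy."""
    e_min = min(energies)  # shift energies for numerical stability
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    partition_function = sum(weights)
    return [w / partition_function for w in weights]


def contact_probabilities(structures, energies, rt=RT):
    """Sum the probabilities of all structures that contain each contact."""
    probabilities = boltzmann_probabilities(energies, rt)
    result = {}
    for contacts, p in zip(structures, probabilities):
        for pair in contacts:
            result[pair] = result.get(pair, 0.0) + p
    return result


# Toy ensemble: each structure is a set of (i, j) residue contacts.
structures = [
    {(1, 8), (2, 7)},
    {(1, 8), (3, 6)},
    set(),  # fully unfolded state, no contacts
]
energies = [-2.0, -1.5, 0.0]  # hypothetical energies in kcal/mol

probs = boltzmann_probabilities(energies)
contacts = contact_probabilities(structures, energies)
for pair in sorted(contacts):
    print(pair, round(contacts[pair], 3))
```

The contact (1, 8) appears in two of the three structures, so its probability is the sum of their weights. It can therefore be more probable than any single structure, which is information that a single minimum-energy prediction does not expose.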

We illustrate below the broad range of predictions provided by partiFold. These include, but are not limited to, (a) matrices of inter-residue contact probabilities, and (b) per-residue flexibility profiles of β-strands.

(a) Inter-residue contact probability matrix.
(b) Per-residue flexibility profile.

Mutational Analysis of Ribonucleic Acids

We develop a computational framework named RNAmutants, which aims to analyze the relationship between RNA sequences and structures (a.k.a. sequence-structure maps).

By allowing a simultaneous exploration of the complete structure and mutation landscape in polynomial time and space, RNAmutants generalizes classical RNA secondary structure prediction algorithms. We illustrate this concept in the figure below. We map all sequences to all their potential secondary structures. The input sequence is at the center, and the concentric rings represent the k-mutant neighborhoods (i.e. sequences with k mutations).

While classical folding algorithms are limited to exploring the conformational landscape of the input sequence (i.e. the central node in the figure), RNAmutants makes it possible to analyze how the conformational landscape changes under any mutation of the input sequence (i.e. the concentric rings).
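The k-mutant neighborhood itself is easy to define in code. The Python sketch below enumerates, by brute force, every sequence that differs from an input at exactly k positions. The function names and the short example sequence are hypothetical, and this naive enumeration is not the algorithm used by RNAmutants; it only shows what each concentric ring contains.

```python
from itertools import combinations, product
from math import comb

ALPHABET = "ACGU"  # RNA nucleotides


def k_mutants(seq, k):
    """Yield every sequence that differs from seq at exactly k positions."""
    for positions in combinations(range(len(seq)), k):
        # At each chosen position, try the three other nucleotides.
        choices = [[b for b in ALPHABET if b != seq[i]] for i in positions]
        for replacement in product(*choices):
            mutant = list(seq)
            for i, base in zip(positions, replacement):
                mutant[i] = base
            yield "".join(mutant)


def neighborhood_size(n, k):
    """Number of k-mutants of a sequence of length n: C(n, k) * 3^k."""
    return comb(n, k) * 3 ** k


seq = "GGAC"  # hypothetical input sequence
ring1 = list(k_mutants(seq, 1))
ring2 = list(k_mutants(seq, 2))
print(len(ring1), len(ring2))
```

Because the ring size C(n, k) * 3^k grows very quickly with both n and k, folding each mutant separately soon becomes impractical, which is why exploring the whole landscape in polynomial time and space matters.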

RNAmutants enables a range of applications, including (i) the analysis of the thermodynamic stability of secondary structure elements, (ii) the prediction of deleterious mutations and (iii) the engineering of new RNA molecules.

For instance, we illustrate below how RNAmutants can be used to predict the mutational robustness of a structure with respect to its sequence. In the Hepatitis C virus cis-acting replication element below, blue labels mark positions that can be mutated without disrupting the structure, while red ones are very sensitive. Green, yellow and orange are intermediate cases. The stability of each base pair is indicated by the intensity of the bond.