Arnaud Droit Laboratory


Where omics technologies meet computer science and artificial intelligence

Welcome to ADlab

Computational biology platform of the research center of CHU de Québec-Université Laval.

ADlab focuses on developing tools for the analysis of massive omics data, including genomics, transcriptomics, proteomics and metabolomics. Its work provides a better understanding of the complex biological mechanisms underlying various diseases and biological phenomena. The team develops approaches to identify multi-omics signatures using multivariate, data-driven methods such as machine learning and PLS-DA, and knowledge-based methods such as interaction networks and module detection. Other projects focus on managing large biological datasets with interactive graphical layers, to make the data easier to exploit. The lab is also involved in medium- to large-scale collaborative projects with international teams in academia and the private sector, focusing on breast cancer, prostate cancer, neurodevelopmental diseases and pediatrics, in order to discover new therapeutic targets. The laboratory's expertise is now evolving toward strategies for identifying drugs of interest through drug repositioning, including the prediction of drug synergies and of secondary effects and toxicity.

ADlab expertise


We perform large-scale studies to decipher the role of proteins and identify organisms through proteomics

Cancer research

We identify causes and develop strategies for the diagnosis and prognosis of breast and urological cancers

Big biological data

We manage terabytes of biological data of many types: genomics, proteomics, etc.

R package development

We develop various R packages to help process biological data

Semantic web

We host Bio2RDF, which provides the largest network of Linked Data for the Life Sciences

Genomics and transcriptomics

We can exploit data from various sequencing experiments (RNAseq, miRseq, WGS, WES)

Machine learning and Deep learning

We create programs using machine learning and deep learning algorithms to classify biological data

ADlab projects

Biomarker discovery by machine learning

Machine learning

We use machine learning approaches in several projects to analyze omics data. We also develop machine learning tools that help identify biomarker signatures from disease-derived omics datasets. Omics datasets are generally highly unbalanced: features largely outnumber samples, and patients are unequally distributed among measured outcomes. The data are also often heterogeneous (e.g. cancer data), of diverse types (e.g. categorical, numerical), and sparse. Specific machine learning strategies therefore have to be developed to adapt to these characteristics of omics data.
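One common way to handle the unequal distribution of patients among outcomes is to weight each class inversely to its frequency during training. The sketch below is only an illustration of that general idea, not a description of ADlab's tools; the function name and the example labels are hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive larger weights, so that errors on them
    count more when a classifier is trained.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical outcome labels for 10 patients: 8 controls and 2 cases.
labels = ["control"] * 8 + ["case"] * 2
weights = balanced_class_weights(labels)
print(weights)
```

With these example labels, the minority class ("case") receives a weight four times larger than the majority class.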

Prediction of drug toxicity

Although knowing a drug’s mechanism of action (MOA) is important for its success in clinical trials and for understanding its potential side effects, it is not a requirement for Food and Drug Administration (FDA) approval. As a result, many drugs on the market are administered without their precise mechanism of action being known. The current challenge is to predict whether a drug presents a toxicity warning and how it interacts with its environment, and thereby to better characterize its mechanisms through multi-layer omics network analysis.



Over the past decade, the main strategy for genome-wide mapping of chromatin modifications, histone marks and DNA-protein interactions has been ChIP followed by microarray analysis (ChIP-chip). Recent improvements in the efficiency, quality and cost of genome-wide sequencing have prompted biologists to abandon microarrays in favor of next-generation sequencing, a method referred to as ChIP-Seq. However, functional annotation of noncoding sequences, which account for more than 95% of the genome, remains difficult because of the lack of statistical and computational biology methods and tools for agnostically interrogating epigenomic changes in humans.

The main goal of our research program is to build new computational tools to comprehensively characterize and functionally annotate the human epigenome. This research program builds on the power of next-generation sequencing (NGS) coupled with chromatin immunoprecipitation (ChIP), an approach called ChIP-Seq, to detect epigenetic variations at an unprecedented level of resolution.
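To give a rough idea of what detecting enrichment in ChIP-Seq data involves, the sketch below bins read start positions into fixed-size windows and flags windows with unusually many reads. It is a deliberately naive illustration under simplifying assumptions (a single chromosome, no control sample, no statistical model); the function name, window size and threshold are hypothetical and do not describe the lab's tools.

```python
from collections import Counter

def enriched_windows(read_starts, window_size, min_fold):
    """Return the start coordinates of windows whose read count is at
    least min_fold times the mean count over all occupied windows."""
    if not read_starts:
        return []
    # Count reads per window, indexing windows by integer division.
    counts = Counter(pos // window_size for pos in read_starts)
    mean_count = sum(counts.values()) / len(counts)
    return sorted(
        window * window_size
        for window, count in counts.items()
        if count >= min_fold * mean_count
    )

# Hypothetical read start positions on one chromosome.
read_starts = [1105, 1110, 1120, 1130, 1150, 1160, 420, 2910, 3305]
peaks = enriched_windows(read_starts, window_size=200, min_fold=2.0)
print(peaks)
```

In this toy example, six of the nine reads fall in the window starting at position 1000, which is the only one flagged. Real analyses typically compare against a control sample and use a statistical model of background read density.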


Development of proteomics tools

Our team aims to bring the power and flexibility of the R/Bioconductor statistical platform to mass spectrometry-based proteomics. The Bioconductor platform is a repository of software, data and annotation packages based on the R statistical language. This platform makes it possible to quickly build new analytical pipelines by seamlessly connecting various tools for data manipulation, statistical analysis, annotation and visualisation. The Bioconductor platform also facilitates the deployment of a pipeline on HPC servers or on cloud computing services.

Two packages are currently being developed on this platform: rTANDEM and shinyTANDEM. rTANDEM is the first protein identification algorithm implemented in R. It includes the tandem algorithm as well as many associated scoring functions, such as the k-score, hrk-score and PTMTreeSearch-score. The package also provides converter functions for quick conversion between R objects and XML files.
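The idea behind such converter functions can be sketched as a round trip between an in-memory parameter set and an XML parameter file. The example below is written in Python rather than R and is not rTANDEM's API. It assumes the X!Tandem-style layout in which each parameter is a note element with type "input" inside a bioml root; the function names and the parameter values are hypothetical.

```python
import xml.etree.ElementTree as ET

def params_to_xml(params):
    """Serialize a dict of parameters to an X!Tandem-style XML string."""
    root = ET.Element("bioml")
    for label, value in params.items():
        note = ET.SubElement(root, "note", {"type": "input", "label": label})
        note.text = str(value)
    return ET.tostring(root, encoding="unicode")

def xml_to_params(xml_text):
    """Parse an X!Tandem-style XML string back into a dict of parameters."""
    root = ET.fromstring(xml_text)
    return {
        note.get("label"): note.text
        for note in root.iter("note")
        if note.get("type") == "input"
    }

# Hypothetical parameter values, for illustration only.
params = {
    "spectrum, path": "sample.mgf",
    "spectrum, fragment monoisotopic mass error": "0.4",
}
xml_text = params_to_xml(params)
restored = xml_to_params(xml_text)
print(xml_text)
```

Keeping the values as strings makes the round trip lossless: parsing the generated XML returns the original dictionary.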

Personalised Risk Stratification for Prevention and Early Detection of Breast Cancer (collaboration)

Each year, over 22,000 Canadian women are diagnosed with breast cancer, a disease that will claim the lives of 5,000 of them. The routine screening program currently in place is more accessible to women over the age of 50. However, one in five women diagnosed with breast cancer is under the age of 50.
The project aims to develop a decision-making support tool that will help extend the benefits of the current screening program to those women most at risk for breast cancer.

Through involvement with the largest international consortium on the study of breast cancer, the project will help broaden existing knowledge in order to provide better risk stratification tools, fine tune intervention strategies and offer the population more effective tools.


The project is an online biohub portal that combines Elasticsearch, a fast search engine designed to manage very large amounts of data, and Siren, a web visualization plugin that creates relational links between biological databases. This solution enables biologists to extract meaningful information from available biological research data repositories. It also removes boundaries by solving compatibility issues between resources (i.e. different data types, separation into specialized repositories) and performs complex searches on many resources simultaneously.
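Searching many resources simultaneously maps naturally onto Elasticsearch, which accepts a comma-separated list of index names in the search path. The sketch below only builds such a request (path and JSON body) without sending it. The index names, field names and helper function are hypothetical and are not taken from the portal itself.

```python
import json

def build_multi_index_search(indices, text, fields):
    """Build the path and JSON body of one search spanning several indices."""
    # Elasticsearch accepts comma-separated index names in the URL path.
    path = "/" + ",".join(indices) + "/_search"
    # A multi_match query looks for the same text in several fields.
    body = {"query": {"multi_match": {"query": text, "fields": fields}}}
    return path, json.dumps(body)

# Hypothetical indices and fields, for illustration only.
path, body = build_multi_index_search(
    indices=["genes", "proteins"],
    text="BRCA1",
    fields=["name", "description"],
)
print(path)
print(body)
```

The resulting path and body could then be sent as a single HTTP request to an Elasticsearch server, so that one query covers several repositories at once.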

Academic partners

Centre de recherche en données massives

Plateforme de protéomique CHU de Québec - Centre de génomique de Québec


Plateforme de séquençage et génotypage des génomes - CHU de Québec
