The Toulouse Genotoul bioinformatics platform, in collaboration with the Genotoul Biostatistics platform, the Sigenae team and the MIAT unit, organizes a 3.5-day training course for bioinformaticians and biologists who want to learn sequence analysis. It focuses on (protein-coding) gene expression analysis using reads produced by ‘RNA-Seq’. This training session is designed to introduce sequences from ‘NGS’ (Next Generation Sequencing), particularly Illumina platforms (HiSeq). You will discover the standard file formats, learn about the usual biases of this type of data and run different kinds of analyses, such as spliced alignment on a reference genome, novel gene and transcript discovery, and expression quantification of coding genes and transcripts. Finally, you will be able to extract the differentially expressed genes.



This training focuses on practice. It consists of modules with a large variety of exercises, listed below (PROVISIONAL SCHEDULE):

  • Reminders (Day 1: 13:00 to 15:00): Reminders about Linux command lines and job submission on a cluster. Reminders about essential file formats.
  • Introduction (Day 1: 15:15 to 16:30): What will my experimental design be? What is gene expression? What kinds of technology can be used to monitor gene expression? What do the reads produced by NGS platforms (Illumina) using the RNA-Seq protocol look like? What are the known biases of these sequences? Presentation of the dataset for the practical exercises.
  • Sequence quality (Day 1: 16:30 to 17:00).
  • Sequence cleaning (Day 2: 09:00 to 10:30).
  • Spliced alignment of reads on a reference genome; visualizing alignments and splice sites using IGV (Integrative Genomics Viewer) (Day 2: 10:45 to 12:45).
  • Raw count vs. abundance estimate (Day 2: 14:00 to 15:30).
  • Discovering novel genes and transcripts, Part 1 (Day 2: 15:30 to 16:45).
  • Discovering novel genes and transcripts, Part 2 (Day 3: 09:00 to 09:45).
  • Comparison of models, visualization and results of gene expression quantification, and conclusions (Day 3: 09:45 to 11:45).
  • Statistics: Exploratory analysis of count data (Day 3: 13:30 to 16:30).
  • Statistics: Normalization and differential expression analysis (Day 4: 09:00 to 17:00).


The session will take place in the room ‘salle de formation’ at the INRA center of Toulouse-Auzeville.

Prerequisites: ability to use a Unix environment (see Unix training) and basic knowledge of R. Training materials, including the slides, will be available for download from our website before the session. A Unix command reference leaflet will also be provided. Only this leaflet will be available during the session.



