RNAseq bioinfo/biostats

Introduction

This page contains the material (files, links, etc.) used during the RNA-Seq course given by the MIAT unit and the Bioinfo Genotoul platform.

It also contains some Genotoul scripts for biostatistics.

Slides and exercises
  • Command line slides and exercises see
  • Galaxy slides and exercises go
  • Training data are available from the links above, under the data directory.

A lot of information about RNA-Seq statistical analysis is available here

Biostatistics scripts on Genotoul

More information about the scripts presented on this page is available here: genoweb.toulouse.inra.fr/~formation/LigneCmd/RNAseq/doc/RScriptsDocumentation.pdf

Input data

Format your count table as follows (tab-separated, one row per gene, one column per sample):

gene_id    untreated1    untreated2    untreated3    untreated4    treated1    treated2    treated3 
FBgn0000003    0    0    0    0    0    0    1 
FBgn0000008    92    161    76    70    140    88    70 
FBgn0000014    5    1    0    0    4    0    0 
FBgn0000015    0    2    1    2    1    0    0 
FBgn0000017    4664    8714    3564    3150    6205    3072    3334 
FBgn0000018    583    761    245    310    722    299    308 
FBgn0000022    0    1    0    0    0    0    0 
FBgn0000024    10    11    3    3    10    7    5 
FBgn0000028    0    1    0    0    0    1    1 
FBgn0000032    1446    1713    615    672    1698    696    757 
FBgn0000036    2    1    0    0    1    0    1 
FBgn0000037    15    25    9    5    20    14    17
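A malformed table (spaces instead of tabs, a missing value, a non-integer count) is easier to catch before running the scripts. The following is a minimal sketch of such a check, not part of the Genotoul scripts: the function name check_count_table and the demonstration file are assumptions made for illustration, and the demonstration rows are a subset of the example above.

```shell
#!/usr/bin/env bash
# check_count_table FILE (hypothetical helper): verify that FILE is
# tab-separated, that every row has as many fields as the header, and that
# every count is a non-negative integer. Returns non-zero on any problem.
check_count_table() {
  awk -F'\t' '
    NR == 1 { ncol = NF; next }
    NF != ncol {
      printf "line %d: %d fields, expected %d\n", NR, NF, ncol
      bad = 1
    }
    {
      for (i = 2; i <= NF; i++)
        if ($i !~ /^[0-9]+$/) {
          printf "line %d, column %d: not an integer count: %s\n", NR, i, $i
          bad = 1
        }
    }
    END { exit bad + 0 }
  ' "$1"
}

# Demonstration file built from a few columns of the example table.
demo=$(mktemp)
printf 'gene_id\tuntreated1\tuntreated2\ttreated1\n' >  "$demo"
printf 'FBgn0000003\t0\t0\t0\n'                      >> "$demo"
printf 'FBgn0000008\t92\t161\t140\n'                 >> "$demo"

if check_count_table "$demo"; then
  echo "count table looks well formed"
fi
```

On your own data, call check_count_table count_table.tsv; any reported line points at the row to fix.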

Test data are available here: http://www.nathalievilla.org/doc/gz/RNAseq_data.tar.gz

wget http://www.nathalievilla.org/doc/gz/RNAseq_data.tar.gz

tar -xvzf RNAseq_data.tar.gz

Set the R environment variable for the cluster
  • Connect to a compute node:
    srun -c 4 --pty bash
  • Create your personal R library directory and set the environment variable:
    mkdir -p ~/work/Rlib
    export R_LIBS="$HOME/work/Rlib"
  • Load the R module:
    module load system/R-3.5.1
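To confirm that the library directory is in place before running the scripts, the steps above can be wrapped in a short check. This is only a sketch: the RLIB_DIR variable is a hypothetical convenience introduced here, and the Rscript call is skipped when R is not on the PATH (for example, before the module is loaded).

```shell
#!/usr/bin/env bash
# RLIB_DIR is a hypothetical variable used only in this sketch; it defaults
# to the directory used on this page.
RLIB_DIR="${RLIB_DIR:-$HOME/work/Rlib}"

# Create the directory if needed, then point R at it.
mkdir -p "$RLIB_DIR"
export R_LIBS="$RLIB_DIR"

# If R is available, print the library paths it will actually search.
if command -v Rscript >/dev/null 2>&1; then
  Rscript -e 'print(.libPaths())'
else
  echo "Rscript not found: load the R module first"
fi
```

R only keeps library directories that exist when it starts, which is why the directory is created before R is launched.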
Run normalization
  • Get help:
    Rscript /usr/local/bioinfo/Scripts/bin/Normalization.R
  • Run:
    Rscript /usr/local/bioinfo/Scripts/bin/Normalization.R -f count_table.tsv -o ./normalization
  • List result directory:
    ls ./normalization
  • Download the images and PDF files to your local machine to view them (scp needs -r to copy a directory):
    scp -r user@genologin.toulouse.inra.fr:~/work/normalization .
  • Select the normalization for which the boxplots and density plots are best aligned and the libraries are well separated in the PCA.
Run differential expression detection
  • Get help:
    Rscript /usr/local/bioinfo/Scripts/bin/DEG.R
  • Run the script (with the initial count matrix and the normalization info file):
    Rscript /usr/local/bioinfo/Scripts/bin/DEG.R -f count_table.tsv -n ./normalization/RLE_info.txt -o DEG --pool1 untreated1,untreated2,untreated3,untreated4 --pool2=treated1,treated2,treated3 --filter TRUE --alpha 0.05 --correct BH --MAplots TRUE
  • Download the results to your computer:
    scp -r user@genotoul.toulouse.inra.fr:~/work/DEG .
  • Your differentially expressed genes are available in:
    DEG/resDEG.csv
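The --pool1 and --pool2 values in the DEG.R command above are typed by hand, which is error-prone with many samples. As a sketch, they can be derived from the header of the count table, assuming that sample names start with the condition name, as in the example table. The helper name samples_with_prefix is hypothetical and not part of the Genotoul scripts.

```shell
#!/usr/bin/env bash
# samples_with_prefix FILE PREFIX (hypothetical helper): print, as a
# comma-separated list, the header columns of a tab-separated FILE whose
# names start with PREFIX.
samples_with_prefix() {
  head -n 1 "$1" | tr '\t' '\n' |
    awk -v p="$2" 'index($0, p) == 1' | paste -sd, -
}

# Demonstration header using sample names from the example table.
demo=$(mktemp)
printf 'gene_id\tuntreated1\tuntreated2\ttreated1\ttreated2\n' > "$demo"

pool1=$(samples_with_prefix "$demo" untreated)
pool2=$(samples_with_prefix "$demo" treated)
printf 'pool1: %s\n' "$pool1"
printf 'pool2: %s\n' "$pool2"
```

Matching on the start of the name keeps "untreated1" out of the "treated" pool; the resulting lists can then be passed to --pool1 and --pool2.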
Perform GO enrichment