RNAseq bioinfo/biostats - genotoul-bioinfo

Introduction

This page contains the material (files, links,…) used during the RNA-seq course given by the MIAT unit and the Bioinfo Genotoul platform.

It also contains some Genotoul scripts for biostatistics.

Slides and exercises

Command line slides and exercises see
Galaxy slides and exercises go
Training data are available at the links above, in the data directory.

Much more information about the statistical analysis of RNA-seq data is available here

Biostatistics scripts on Genotoul

More information about the scripts presented on this page is available here: genoweb.toulouse.inra.fr/~formation/LigneCmd/RNAseq/doc/RScriptsDocumentation.pdf

Input data

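Format your count table as follows: tab-separated, with a header line giving the sample names, one row per gene, the gene identifier in the first column and the raw counts of each sample in the following columns.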

gene_id    untreated1    untreated2    untreated3    untreated4    treated1    treated2    treated3 
FBgn0000003    0    0    0    0    0    0    1 
FBgn0000008    92    161    76    70    140    88    70 
FBgn0000014    5    1    0    0    4    0    0 
FBgn0000015    0    2    1    2    1    0    0 
FBgn0000017    4664    8714    3564    3150    6205    3072    3334 
FBgn0000018    583    761    245    310    722    299    308 
FBgn0000022    0    1    0    0    0    0    0 
FBgn0000024    10    11    3    3    10    7    5 
FBgn0000028    0    1    0    0    0    1    1 
FBgn0000032    1446    1713    615    672    1698    696    757 
FBgn0000036    2    1    0    0    1    0    1 
FBgn0000037    15    25    9    5    20    14    17
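A minimal sketch for checking a count table before running the scripts, assuming the format above (tab separator, one header line, gene IDs in column 1, integer counts elsewhere). The helper name check_count_table and the file example_counts.tsv are hypothetical; this check is not part of the Genotoul scripts.

```shell
#!/usr/bin/env bash
# check_count_table FILE (hypothetical helper, not a Genotoul script)
# Checks a tab-separated count table:
#   - the header has at least 2 tab-separated columns (gene_id + samples)
#   - every line has as many fields as the header
#   - every count (columns 2..NF of data lines) is a non-negative integer
# Prints one message per problem and returns non-zero if any was found.
check_count_table() {
  awk -F '\t' '
    NR == 1 {
      ncol = NF
      if (NF < 2) { print "header: fewer than 2 tab-separated columns"; bad = 1 }
      next
    }
    NF != ncol {
      printf "line %d: %d fields, expected %d\n", NR, NF, ncol
      bad = 1
      next
    }
    {
      for (i = 2; i <= NF; i++)
        if ($i !~ /^[0-9]+$/) {
          printf "line %d, column %d: \"%s\" is not a raw count\n", NR, i, $i
          bad = 1
        }
    }
    END { exit bad }
  ' "$1"
}

# Usage on a small table written with printf (\t is a tab).
printf 'gene_id\tuntreated1\ttreated1\nFBgn0000008\t92\t140\nFBgn0000014\t5\t4\n' > example_counts.tsv
check_count_table example_counts.tsv && echo "count table looks OK"
```

Copying the table from a web page often turns tabs into spaces; the header test catches that case, and the integer test flags decimal (already normalized) values or stray trailing spaces.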

Test data are available here: http://www.nathalievialaneix.eu/doc/gz/RNAseq_data.tar.gz

wget http://www.nathalievialaneix.eu/doc/gz/RNAseq_data.tar.gz

tar -xvzf RNAseq_data.tar.gz

Set up the R environment on the cluster

Open an interactive session on a compute node (here with 4 CPUs):
srun -c 4 --pty bash
Create a personal R library directory and point R_LIBS to it:
mkdir -p ~/work/Rlib
export R_LIBS="$HOME/work/Rlib"
Load the R module:
module load system/R-3.5.1
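With export R_LIBS="~/work/Rlib", bash stores the tilde literally, because it only performs tilde expansion on unquoted words; expanding it is then left to whatever program reads R_LIBS. A minimal sketch of the difference, assuming a bash shell; writing $HOME makes the path explicit. Creating the directory before starting R also matters, since R only keeps library directories that exist at startup.

```shell
#!/usr/bin/env bash
# Tilde expansion only applies to unquoted words: inside double quotes
# "~" is kept as a literal character.
quoted="~/work/Rlib"
unquoted=~/work/Rlib          # unquoted: expanded by bash
with_home="$HOME/work/Rlib"   # $HOME is expanded even inside double quotes

echo "quoted:    $quoted"     # prints ~/work/Rlib
echo "unquoted:  $unquoted"   # prints the absolute path
echo "with_home: $with_home"  # prints the same absolute path

# Create the personal library (-p: no error if it already exists) and
# export R_LIBS as an absolute path.
mkdir -p "$with_home"
export R_LIBS="$with_home"
```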

Run normalization

Get help:
Rscript /usr/local/bioinfo/Scripts/bin/Normalization.R
Run:
Rscript /usr/local/bioinfo/Scripts/bin/Normalization.R -f count_table.tsv -o ./normalization
List the result directory:
ls ./normalization
Download the images and PDF files to view them (run this on your local machine; -r copies the whole directory):
scp -r user@genologin.toulouse.inra.fr:~/work/normalization .
Choose the normalization for which the boxplots and density plots are best aligned across samples and the libraries are well separated in the PCA.
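Normalization compensates, among other things, for differences in sequencing depth between libraries. As a quick sanity check independent of the Genotoul script, a small awk helper (library_sizes, a hypothetical name) can print the raw library size of each sample; very different totals are part of what the normalization has to correct.

```shell
#!/usr/bin/env bash
# library_sizes FILE (hypothetical helper): prints, for each sample column of
# a tab-separated count table with a header line, the sample name and the
# sum of its counts (its raw library size).
library_sizes() {
  awk -F '\t' '
    NR == 1 { n = NF; for (i = 2; i <= NF; i++) name[i] = $i; next }
    { for (i = 2; i <= n; i++) total[i] += $i }
    END { for (i = 2; i <= n; i++) printf "%s\t%.0f\n", name[i], total[i] }
  ' "$1"
}

# Usage on three genes of the example table (three of its seven samples).
printf 'gene_id\tuntreated1\tuntreated2\ttreated1\n' > lib_example.tsv
printf 'FBgn0000008\t92\t161\t140\n' >> lib_example.tsv
printf 'FBgn0000017\t4664\t8714\t6205\n' >> lib_example.tsv
printf 'FBgn0000018\t583\t761\t722\n' >> lib_example.tsv
library_sizes lib_example.tsv
# prints (tab-separated):
# untreated1  5339
# untreated2  9636
# treated1    7067
```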

Run differential expression detection

Get help:
Rscript /usr/local/bioinfo/Scripts/bin/DEG.R
Run the script with the initial count matrix and the normalization info file:
Rscript /usr/local/bioinfo/Scripts/bin/DEG.R -f count_table.tsv -n ./normalization/RLE_info.txt -o DEG --pool1 untreated1,untreated2,untreated3,untreated4 --pool2 treated1,treated2,treated3 --filter TRUE --alpha 0.05 --correct BH --MAplots TRUE
Download the results to your computer (run this on your local machine):
scp -r user@genologin.toulouse.inra.fr:~/work/DEG .
The differentially expressed genes are listed in:
DEG/resDEG.csv
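The --pool1 and --pool2 values in the DEG.R command are comma-separated sample names taken from the header of the count table. When samples follow a naming scheme, these lists can be built from the header instead of typed by hand. A minimal sketch, assuming each condition name is a prefix of its sample names; the helper samples_with_prefix and the file header_example.tsv are hypothetical, not part of the Genotoul scripts. Matching on the prefix rather than a substring keeps "treated" from also selecting untreated1.

```shell
#!/usr/bin/env bash
# samples_with_prefix FILE PREFIX (hypothetical helper): prints the header
# columns 2..NF of a tab-separated table whose name starts with PREFIX,
# joined by commas.
samples_with_prefix() {
  head -n 1 "$1" | awk -F '\t' -v prefix="$2" '
    {
      sep = ""
      for (i = 2; i <= NF; i++)
        if (index($i, prefix) == 1) { printf "%s%s", sep, $i; sep = "," }
      print ""
    }'
}

# Usage with the header of the example count table.
printf 'gene_id\tuntreated1\tuntreated2\tuntreated3\tuntreated4\ttreated1\ttreated2\ttreated3\n' > header_example.tsv
pool1=$(samples_with_prefix header_example.tsv untreated)
pool2=$(samples_with_prefix header_example.tsv treated)
echo "--pool1 $pool1 --pool2 $pool2"
# prints: --pool1 untreated1,untreated2,untreated3,untreated4 --pool2 treated1,treated2,treated3
```

The two variables can then be passed to DEG.R as --pool1 "$pool1" --pool2 "$pool2".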

Perform GO enrichment

Get help:
Rscript /usr/local/bioinfo/Scripts/bin/GOEnrichment.R
If you are working with the example matrix (count_table.tsv, which uses FlyBase gene IDs), download the GO annotations from FlyBase (ftp://ftp.flybase.net/releases/current/precomputed_files/go/gene_association.fb.gz):
wget ftp://ftp.flybase.net/releases/current/precomputed_files/go/gene_association.fb.gz
gunzip gene_association.fb.gz
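Convert the file to the expected two-column format by removing the comment lines (starting with '!') and keeping column 2 (gene ID) and column 5 (GO ID):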
grep -v '^!' gene_association.fb | cut -f 2,5 > fb.go
Run the GO enrichment on DEG/resDEG.csv:
Rscript /usr/local/bioinfo/Scripts/bin/GOEnrichment.R -f fb.go --fileFormat twoColumns -i DEG/resDEG.csv -o GOEnrichment -a classic -t fisher
Download the results to your computer (run this on your local machine):
scp -r user@genologin.toulouse.inra.fr:~/work/GOEnrichment .
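The grep/cut conversion of the GO annotation file can be checked on a few lines before running it on the full FlyBase file. In the GAF format, column 2 is the database object ID (here the FlyBase gene ID) and column 5 the GO ID. A minimal sketch; the helper gaf_to_two_columns and the annotation lines below are made up for illustration (truncated to 7 columns, placeholder GO IDs that are not real annotations of these genes).

```shell
#!/usr/bin/env bash
# gaf_to_two_columns FILE (hypothetical helper): same conversion as above.
#   - drop comment/header lines, which start with '!'
#   - keep column 2 (DB object ID, e.g. FBgn...) and column 5 (GO ID)
gaf_to_two_columns() {
  grep -v '^!' "$1" | cut -f 2,5
}

# Made-up, truncated GAF lines; column 4 (qualifier) is left empty.
printf '!gaf-version: 2.2\n' > example.gaf
printf 'FB\tFBgn0000008\tsym1\t\tGO:0000001\tFB:ref1\tIEA\n' >> example.gaf
printf 'FB\tFBgn0000014\tsym2\t\tGO:0000002\tFB:ref2\tIEA\n' >> example.gaf

gaf_to_two_columns example.gaf > example.go
cat example.go
# prints (tab-separated):
# FBgn0000008  GO:0000001
# FBgn0000014  GO:0000002
```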