Introduction
This page contains the material (files, links,…) used during the RNA-Seq course given by the MIAT Unit and Bioinfo Genotoul platform.
It also contains some Genotoul scripts for biostatistics.
Slides and exercises
- Command line slides and exercises see
- Galaxy slides and exercises go
- Training data are available through the links above, under the data directory.
More information about RNA-Seq statistical analysis is available here
Biostatistics scripts on genotoul
More information about the scripts presented on this page is available here: genoweb.toulouse.inra.fr/~formation/LigneCmd/RNAseq/doc/RScriptsDocumentation.pdf
Input data
Format your count table as follows (tab-separated, one header line, one gene per line):
gene_id	untreated1	untreated2	untreated3	untreated4	treated1	treated2	treated3
FBgn0000003	0	0	0	0	0	0	1
FBgn0000008	92	161	76	70	140	88	70
FBgn0000014	5	1	0	0	4	0	0
FBgn0000015	0	2	1	2	1	0	0
FBgn0000017	4664	8714	3564	3150	6205	3072	3334
FBgn0000018	583	761	245	310	722	299	308
FBgn0000022	0	1	0	0	0	0	0
FBgn0000024	10	11	3	3	10	7	5
FBgn0000028	0	1	0	0	0	1	1
FBgn0000032	1446	1713	615	672	1698	696	757
FBgn0000036	2	1	0	0	1	0	1
FBgn0000037	15	25	9	5	20	14	17
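A malformed table (spaces instead of tabs, a missing value, a decimal number) is easy to miss by eye. The sketch below is one possible sanity check to run before the scripts; `check_count_table` and the demo file name are hypothetical and not part of the Genotoul scripts. It assumes counts must be non-negative integers in every column after the first.

```shell
# Hypothetical helper (not part of the Genotoul scripts): checks that a count
# table is tab-separated, has the same number of fields on every line, and
# holds only non-negative integers after the first column.
check_count_table() {
  awk -F'\t' '
    NR == 1 {
      ncol = NF
      if (ncol < 2) { print "header has fewer than 2 tab-separated columns"; bad = 1; exit }
      next
    }
    NF != ncol { printf "line %d: %d fields, expected %d\n", NR, NF, ncol; bad = 1 }
    {
      for (i = 2; i <= NF; i++)
        if ($i !~ /^[0-9]+$/) { printf "line %d, column %d: not a count: %s\n", NR, i, $i; bad = 1 }
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}

# Small demo table built from two columns of the example above.
printf 'gene_id\tuntreated1\ttreated1\nFBgn0000003\t0\t0\nFBgn0000008\t92\t140\n' > format_demo.tsv
check_count_table format_demo.tsv && echo "count table looks OK"
```

On your own data, call it as `check_count_table count_table.tsv`; any problem is reported with its line number.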
Test data are available here: http://www.nathalievialaneix.eu/doc/gz/RNAseq_data.tar.gz
wget http://www.nathalievialaneix.eu/doc/gz/RNAseq_data.tar.gz
tar -xvzf RNAseq_data.tar.gz
Set the R environment variable on the cluster
- Connect to a compute node:
srun -c 4 --pty bash
- Set your environment variable and create the library directory:
export R_LIBS="~/work/Rlib"
mkdir -p ~/work/Rlib
- Load the R module:
module load system/R-3.5.1
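The environment setup above can be wrapped so that it is safe to run at the start of every session. This is only a sketch: `setup_r_libs` and the `RLIB_DIR` override are hypothetical names introduced here, and the default path is the one used on this page.

```shell
# Hypothetical helper: create a personal R library directory (if needed),
# check that it is writable, and point R_LIBS at it.
setup_r_libs() {
  lib_dir="$1"
  mkdir -p "$lib_dir" || return 1          # -p: no error if it already exists
  [ -w "$lib_dir" ] || { echo "not writable: $lib_dir" >&2; return 1; }
  export R_LIBS="$lib_dir"
}

# Default is the directory used on this page; RLIB_DIR is an illustrative override.
setup_r_libs "${RLIB_DIR:-$HOME/work/Rlib}" && echo "R_LIBS=$R_LIBS"

# If the R module is loaded, R's library search path should now list that directory.
if command -v Rscript >/dev/null 2>&1; then
  Rscript -e '.libPaths()'
fi
```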
Run normalization
- Get help:
Rscript /usr/local/bioinfo/Scripts/bin/Normalization.R
- Run:
Rscript /usr/local/bioinfo/Scripts/bin/Normalization.R -f count_table.tsv -o ./normalization
- List result directory:
ls ./normalization
- Download the images and PDF files to your local machine to view them (run this on your own computer; -r copies the whole directory):
scp -r user@genologin.toulouse.inra.fr:~/work/normalization .
- Select the normalization method for which the boxplots and density plots are best aligned across libraries and for which the libraries are well separated in the PCA.
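When reading the boxplots, it can help to know how much the raw library sizes differ, since differences in sequencing depth are one of the things normalization is meant to correct. The sketch below sums each sample column of the count table. `library_sizes` is a hypothetical helper, not something Normalization.R provides, and the demo table reuses three genes and two columns from the example above.

```shell
# Hypothetical helper: print the total read count (library size) of each
# sample column of a tab-separated count table that has one header line.
library_sizes() {
  awk -F'\t' '
    NR == 1 { n = NF; for (i = 2; i <= n; i++) name[i] = $i; next }
    { for (i = 2; i <= n; i++) total[i] += $i }
    END { for (i = 2; i <= n; i++) printf "%s\t%d\n", name[i], total[i] }
  ' "$1"
}

# Demo: three genes, two samples, values taken from the example table.
printf 'gene_id\tuntreated1\ttreated1\n' > sizes_demo.tsv
printf 'FBgn0000008\t92\t140\nFBgn0000017\t4664\t6205\nFBgn0000018\t583\t722\n' >> sizes_demo.tsv
library_sizes sizes_demo.tsv
# prints:
# untreated1	5339
# treated1	7067
```

On the real data, `library_sizes count_table.tsv` gives one line per library.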
Run differential expression detection
- Get help:
Rscript /usr/local/bioinfo/Scripts/bin/DEG.R
- Run the script (with the initial count matrix and the normalization info file):
Rscript /usr/local/bioinfo/Scripts/bin/DEG.R -f count_table.tsv -n ./normalization/RLE_info.txt -o DEG --pool1 untreated1,untreated2,untreated3,untreated4 --pool2=treated1,treated2,treated3 --filter TRUE --alpha 0.05 --correct BH --MAplots TRUE
- Download the results to your computer (-r copies the whole directory):
scp -r user@genologin.toulouse.inra.fr:~/work/DEG .
- Your differentially expressed genes are available in:
DEG/resDEG.csv
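The --pool1 and --pool2 lists are typed by hand, so a misspelled sample name is an easy mistake. Assuming the pool names must match the column names in the header of the count table (which the example command suggests, but which is not verified here), a check along these lines can be run first. `check_pool` is a hypothetical helper, not part of DEG.R.

```shell
# Hypothetical helper: report every name of a comma-separated pool list that
# is not a column name in the header of a tab-separated count table.
check_pool() {
  table="$1"; pool="$2"; status=0
  header="$(head -n 1 "$table")"
  for sample in $(printf '%s' "$pool" | tr ',' ' '); do
    # One column name per line, then an exact, fixed-string, whole-line match.
    if ! printf '%s\n' "$header" | tr '\t' '\n' | grep -qxF -- "$sample"; then
      echo "sample not found in header: $sample" >&2
      status=1
    fi
  done
  return "$status"
}

# Demo header with three samples.
printf 'gene_id\tuntreated1\tuntreated2\ttreated1\n' > pool_demo.tsv
check_pool pool_demo.tsv untreated1,untreated2 && echo "pool1 OK"
check_pool pool_demo.tsv treated1,treated9 || echo "pool2 contains unknown samples"
```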
Perform GO enrichment
- Get help:
Rscript /usr/local/bioinfo/Scripts/bin/GOEnrichment.R
- If you are working with the example matrix (count_table.tsv, which uses FlyBase gene IDs), download the GO annotations from FlyBase: ftp://ftp.flybase.net/releases/current/precomputed_files/go/gene_association.fb.gz
wget ftp://ftp.flybase.net/releases/current/precomputed_files/go/gene_association.fb.gz
gunzip gene_association.fb.gz
- Generate the expected two-column format (gene ID, GO term):
grep -v '^!' gene_association.fb | cut -f 2,5 > fb.go
- Run the GO enrichment on resDEG.csv:
Rscript /usr/local/bioinfo/Scripts/bin/GOEnrichment.R -f fb.go --fileFormat twoColumns -i DEG/resDEG.csv -o GOEnrichment -a classic -t fisher
- Download the results to your computer (-r copies the whole directory):
scp -r user@genologin.toulouse.inra.fr:~/work/GOEnrichment .
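To see what the grep and cut step does before running it on the full FlyBase file, it can be tried on a tiny stand-in. The lines below are synthetic: they only mimic the tab-separated annotation layout in which column 2 is the gene ID and column 5 is the GO ID, and the symbols, references and GO IDs are placeholders rather than real FlyBase annotations.

```shell
# Synthetic annotation lines for illustration only (placeholder values).
{
  printf '!comment lines start with an exclamation mark\n'
  printf 'FB\tFBgn0000008\tsymA\t\tGO:0000001\tREF:1\tIEA\t\tP\n'
  printf 'FB\tFBgn0000017\tsymB\t\tGO:0000002\tREF:2\tIEA\t\tF\n'
} > demo_association.fb

# Same filter as on this page: drop comment lines, keep columns 2 and 5.
grep -v '^!' demo_association.fb | cut -f 2,5 > demo.go
cat demo.go
# prints:
# FBgn0000008	GO:0000001
# FBgn0000017	GO:0000002
```

The real fb.go should have the same shape: one gene ID and one GO ID per line, separated by a tab.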