RNAseq bioinfo/biostats

Introduction

This page contains the material (files, links, …) used during the RNA-Seq course given by the Sigenae team and the Bioinfo Genotoul platform.

It also contains some Genotoul scripts for biostatistics.

Slides and exercises
  • Command-line slides and exercises
  • Galaxy slides and exercises
  • Training data

More information on RNA-seq statistical analysis is available here.

Biostatistics scripts on genotoul

More information about the scripts presented on this page is available here: genoweb.toulouse.inra.fr/~formation/LigneCmd/RNAseq/doc/ScriptsDocumentation.pdf

Input data

Format your count table as follows (tab-separated, one row per gene and one column per sample, with the gene IDs in the first column):

gene_id    untreated1    untreated2    untreated3    untreated4    treated1    treated2    treated3 
FBgn0000003    0    0    0    0    0    0    1 
FBgn0000008    92    161    76    70    140    88    70 
FBgn0000014    5    1    0    0    4    0    0 
FBgn0000015    0    2    1    2    1    0    0 
FBgn0000017    4664    8714    3564    3150    6205    3072    3334 
FBgn0000018    583    761    245    310    722    299    308 
FBgn0000022    0    1    0    0    0    0    0 
FBgn0000024    10    11    3    3    10    7    5 
FBgn0000028    0    1    0    0    0    1    1 
FBgn0000032    1446    1713    615    672    1698    696    757 
FBgn0000036    2    1    0    0    1    0    1 
FBgn0000037    15    25    9    5    20    14    17
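Before running the scripts, you can check that your table really is tab-separated and that every row has as many fields as the header. The function below is a minimal awk sketch; its name (check_count_table) is hypothetical, and it assumes, as in the example above, that the first column holds gene IDs and every other column holds raw, non-negative integer counts.

```shell
# check_count_table FILE  (hypothetical helper, bash)
# Returns 0 if FILE is tab-separated, every row has the same number of fields
# as the header, and every count is a non-negative integer; otherwise prints
# the offending lines and returns 1.
check_count_table() {
    awk -F'\t' '
        NR == 1 {
            ncol = NF
            if (ncol < 2) {
                print "header: fewer than 2 tab-separated columns"
                bad = 1
                exit
            }
            next
        }
        NF != ncol {
            printf "line %d: %d fields, expected %d\n", NR, NF, ncol
            bad = 1
            next
        }
        {
            # every column after the gene ID must be a raw integer count
            for (i = 2; i <= NF; i++) {
                if ($i !~ /^[0-9]+$/) {
                    printf "line %d, column %d: \"%s\" is not a non-negative integer\n", NR, i, $i
                    bad = 1
                }
            }
        }
        END {
            if (NR == 0) {
                print "empty file"
                bad = 1
            }
            exit bad
        }
    ' "$1"
}
```

For example, check_count_table count_table.tsv && echo "format OK". Space-separated columns, missing values, decimal numbers, and Windows line endings (a trailing carriage return on the last count) are all reported.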

Test data are available here: http://www.nathalievilla.org/doc/gz/RNAseq_data.tar.gz

Set the R environment variable on the cluster
  • Open an interactive session on a cluster node:
    qrsh
  • Create a personal R library directory and point R_LIBS to it:
    mkdir -p ~/work/Rlib
    export R_LIBS="$HOME/work/Rlib"
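The export above only lasts for the current shell session. One way to make it permanent is to append it to your shell start-up file (for example ~/.bashrc, assuming bash is your shell on the cluster). The function below is a hypothetical helper, persist_r_libs, that adds the line only once, so running it several times does not create duplicates.

```shell
# persist_r_libs RCFILE LIBDIR  (hypothetical helper, bash)
# Creates LIBDIR if needed and appends 'export R_LIBS="LIBDIR"' to RCFILE,
# unless that exact line is already present.
persist_r_libs() {
    local rcfile=$1 libdir=$2
    local line="export R_LIBS=\"$libdir\""
    mkdir -p "$libdir"
    touch "$rcfile"
    # -x: match the whole line; -F: fixed string, so "$" and "." are literal
    if ! grep -qxF "$line" "$rcfile"; then
        printf '%s\n' "$line" >> "$rcfile"
    fi
}
```

For example, persist_r_libs ~/.bashrc "$HOME/work/Rlib", then open a new session (or run source ~/.bashrc) for it to take effect.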
Run normalization
  • Get help:
    /usr/local/bioinfo/src/R/R-3.2.2/bin/Rscript /usr/local/bioinfo/Scripts/bin/Normalization.R
  • Run:
    /usr/local/bioinfo/src/R/R-3.2.2/bin/Rscript /usr/local/bioinfo/Scripts/bin/Normalization.R -f count_table.tsv -o ./normalization
  • List result directory:
    ls ./normalization
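  • Download the result directory to view the plots (run this on your local machine; the path assumes you ran the script from ~/work):
    scp -r user@genotoul.toulouse.inra.fr:~/work/normalization .
  • Choose the normalization for which the boxplots and density plots are best aligned and the libraries are well separated in the PCA.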
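Every step on this page calls the same Rscript binary and script directory. If you prefer not to retype the full paths, you could wrap them in a shell function such as the sketch below. The names run_genotoul_script, RSCRIPT and SCRIPTS_DIR are hypothetical; the default paths are the ones used in the commands on this page.

```shell
# Default paths, copied from the commands on this page.
# Set RSCRIPT or SCRIPTS_DIR beforehand to use other locations.
RSCRIPT=${RSCRIPT:-/usr/local/bioinfo/src/R/R-3.2.2/bin/Rscript}
SCRIPTS_DIR=${SCRIPTS_DIR:-/usr/local/bioinfo/Scripts/bin}

# run_genotoul_script NAME [ARGS...]  (hypothetical helper, bash)
# Runs $SCRIPTS_DIR/NAME.R with $RSCRIPT, passing ARGS through unchanged.
# The variables are read at call time, so they can be changed after definition.
run_genotoul_script() {
    local name=$1
    shift
    "$RSCRIPT" "$SCRIPTS_DIR/$name.R" "$@"
}
```

With this defined, run_genotoul_script Normalization would print the help of Normalization.R, and run_genotoul_script Normalization -f count_table.tsv -o ./normalization would run the normalization step exactly as above. Quoting "$@" keeps each option as a separate argument, so comma-separated sample lists are passed intact.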
Run differential expression detection
  • Get help:
    /usr/local/bioinfo/src/R/R-3.2.2/bin/Rscript /usr/local/bioinfo/Scripts/bin/DEG.R
  • Run the script with the raw count matrix and the info file of the normalization you selected (RLE in this example):
    /usr/local/bioinfo/src/R/R-3.2.2/bin/Rscript /usr/local/bioinfo/Scripts/bin/DEG.R -f count_table.tsv -n ./normalization/RLE_info.txt -o DEG \
      --pool1 untreated1,untreated2,untreated3,untreated4 --pool2 treated1,treated2,treated3 \
      --filter TRUE --alpha 0.05 --correct BH --MAplots TRUE
  • Download the results to your computer (run this on your local machine):
    scp -r user@genotoul.toulouse.inra.fr:~/work/DEG .
  • The differentially expressed genes are listed in:
    DEG/resDEG.csv
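To get a quick count of the genes in resDEG.csv that pass the --alpha threshold, you can filter on the adjusted p-value column. The layout of resDEG.csv is not described on this page, so the sketch below assumes a header line and takes the field separator and the name of the adjusted p-value column as arguments; check the header first (for example with head -n 1 DEG/resDEG.csv). The function name count_significant is hypothetical.

```shell
# count_significant FILE SEP COLUMN ALPHA  (hypothetical helper, bash)
# Prints the number of data rows whose value in COLUMN (found by header name)
# is strictly below ALPHA. Empty and "NA" values are skipped.
# Returns 2 if COLUMN is not in the header.
count_significant() {
    local file=$1 sep=$2 column=$3 alpha=$4
    awk -F"$sep" -v col="$column" -v alpha="$alpha" '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                name = $i
                gsub(/"/, "", name)   # tolerate quoted header fields
                if (name == col) idx = i
            }
            if (!idx) {
                print "column not found: " col > "/dev/stderr"
                exit 2
            }
            next
        }
        $idx != "" && $idx != "NA" && $idx + 0 < alpha + 0 { n++ }
        END { if (idx) print n + 0 }
    ' "$file"
}
```

For example, count_significant DEG/resDEG.csv , padj 0.05 if the file is comma-separated and the adjusted p-value column is called padj; both are assumptions, so adjust them to the actual header.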
Perform GO enrichment
  • Get help:
    /usr/local/bioinfo/src/R/R-3.2.2/bin/Rscript /usr/local/bioinfo/Scripts/bin/GOEnrichment.R
  • If you are working with the example matrix (count_table.tsv, FlyBase gene IDs), download the GO annotations from FlyBase: ftp://ftp.flybase.net/releases/current/precomputed_files/go/gene_association.fb.gz
  • Convert them to the expected two-column format (gene ID, GO term):
    gunzip gene_association.fb.gz
    grep -v '^!' gene_association.fb | cut -f 2,5 > fb.go
  • Run the GO enrichment on resDEG.csv:
    /usr/local/bioinfo/src/R/R-3.2.2/bin/Rscript /usr/local/bioinfo/Scripts/bin/GOEnrichment.R -f fb.go --fileFormat twoColumns \
      -i DEG/resDEG.csv -o GOEnrichment -a classic -t fisher
  • Download the results to your computer (run this on your local machine):
    scp -r user@genotoul.toulouse.inra.fr:~/work/GOEnrichment .
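The grep/cut pipeline above keeps every annotation line. That includes duplicate gene/GO pairs (the same term can be supported by several evidence codes or references) and annotations whose qualifier (column 4 of the GO annotation file format) contains NOT, which state that a gene is not associated with the term. A possible refinement is sketched below as a hypothetical function, gaf_to_two_columns. This page does not say whether GOEnrichment.R is affected by duplicates or NOT annotations, so treat it as an optional variant rather than a required step.

```shell
# gaf_to_two_columns GAF_FILE  (hypothetical helper, bash)
# Prints unique "gene_id<TAB>GO_id" pairs (columns 2 and 5) from a GAF file:
#   - skips comment/header lines starting with "!"
#   - skips short or empty lines
#   - skips annotations whose qualifier (column 4) contains the token NOT
gaf_to_two_columns() {
    awk -F'\t' -v OFS='\t' '
        /^!/ { next }
        NF < 5 { next }
        {
            # qualifiers are pipe-separated, e.g. "NOT|enables"
            n = split($4, quals, "|")
            for (i = 1; i <= n; i++)
                if (quals[i] == "NOT") next
            print $2, $5
        }
    ' "$1" | sort -u
}
```

For example, gaf_to_two_columns gene_association.fb > fb.go would replace the grep/cut step.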