NGS data

To publish results based on short reads, you must also make the underlying data publicly available.

  • Genomic data can be submitted to the Sequence Read Archive (SRA, formerly the Short Read Archive).
The European mirror of SRA is the ENA; see the video in the paragraph "Submitting public access data using SRA Webin". Follow these steps:
    - Create your account.
    - Fill in all the required information.
    - Compute the md5 of each raw file, or ask the bioinformatics platform if your data are in NG6. An md5 is a checksum string computed from the content of a file; it changes if the content changes. On Windows use WinMD5; on Linux run md5sum /path/to/file in a terminal.
    - Transfer your data when requested, or ask the bioinformatics platform if your data are in NG6.
To download data from SRA, given a run ID (e.g. SRR1045854), type:
  • Create the ncbi directory: mkdir ~/work/ncbi
  • Create a link to it: ln -s ~/work/ncbi ~/ncbi
  • Download the data: prefetch SRR1045854
    Your data will be available in ~/work/ncbi/public/sra/SRR1045854.sra
  • Convert the .sra file to FASTQ: fastq-dump ~/work/ncbi/public/sra/SRR1045854.sra
  • After conversion, remove .sra file to save disk space.
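The SRA steps above can be chained in one function. This is a sketch under assumptions, not a reference implementation: it assumes prefetch and fastq-dump (sra-tools) are on the PATH and that prefetch stores files under ~/work/ncbi/public/sra as described above. The function name sra_to_fastq and the RUN_CMD variable are hypothetical; setting RUN_CMD=echo prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: download one SRA run, convert it to FASTQ, delete the .sra file.
# Usage: sra_to_fastq RUN_ID
sra_to_fastq() {
    run_id=$1
    ncbi_dir="$HOME/work/ncbi"
    sra_file="$ncbi_dir/public/sra/$run_id.sra"
    # Hypothetical dry-run hook: RUN_CMD=echo prints each command.
    run=${RUN_CMD:-}

    $run mkdir -p "$ncbi_dir" || return 1
    # Create the ~/ncbi link only if nothing exists at that path yet.
    if [ ! -e "$HOME/ncbi" ]; then
        $run ln -s "$ncbi_dir" "$HOME/ncbi" || return 1
    fi
    $run prefetch "$run_id" || return 1
    # fastq-dump writes the FASTQ file into the current directory.
    $run fastq-dump "$sra_file" || return 1
    # Remove the .sra file only after a successful conversion.
    $run rm "$sra_file"
}
```

Calling sra_to_fastq SRR1045854 runs the whole sequence. Each step stops the function on failure, so the .sra file is never deleted if the conversion did not succeed.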
Prefer downloading files from ENA when possible: the transfer protocol is much faster and the files are already in FASTQ format.
  • Find your dataset at https://www.ebi.ac.uk/ena/. Here is an example
  • Find the FASTQ URL in the column "FASTQ files (FTP)"; in the example it is: ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR104/003/SRR1045853/SRR1045853.fastq.gz
  • On the Genotoul cluster, type: ascpwrap.pl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR104/003/SRR1045853/SRR1045853.fastq.gz .
    The final "." is the destination (the current directory).
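The example URL follows a regular pattern, so it can usually be built from the run ID. The sketch below rests on two assumptions that are not stated on this page: the ENA FTP layout (first six characters of the accession, then a zero-padded subdirectory made from the digits after the ninth character) and a single-end run with one file named RUN_ID.fastq.gz. Paired-end runs use different file names, so always check the result against the "FASTQ files (FTP)" column. The function name ena_fastq_url is hypothetical.

```shell
#!/bin/sh
# Sketch only: build the ENA FASTQ URL for a single-end run accession.
# Usage: ena_fastq_url RUN_ID
ena_fastq_url() {
    acc=$1
    prefix=$(printf '%s' "$acc" | cut -c1-6)
    # Assumed layout: the subdirectory depends on the accession length.
    case ${#acc} in
        9)  sub="" ;;
        10) sub="00$(printf '%s' "$acc" | cut -c10)/" ;;
        11) sub="0$(printf '%s' "$acc" | cut -c10-11)/" ;;
        *)  echo "unsupported accession: $acc" >&2
            return 1 ;;
    esac
    printf 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/%s/%s%s/%s.fastq.gz\n' \
        "$prefix" "$sub" "$acc" "$acc"
}
```

With the run from the example, ena_fastq_url SRR1045853 prints the same URL as the one listed above, which can then be passed to ascpwrap.pl.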