NgsPipelinesOnBeocat

Pipelines to assemble de novo transcriptomes and genomes and test for differential expression

View the Project on GitHub i5K-KINBRE-script-share/transcriptome-and-genome-assembly


About


The K-INBRE Bioinformatics Core has created easy-to-use pipelines for analyzing paired-end Illumina reads from several common NGS experiments.

All pipelines come with sample datasets and tutorials. Each pipeline takes you from the raw data you receive from your sequencing facility to a finished analysis using Beocat https://www.cis.ksu.edu/beocat (the largest compute cluster in Kansas, which is free to all researchers in the K-INBRE network).

No prior command-line experience is needed to use these scripts, but a free Beocat account is.

All of the scripts you need to use these pipelines, as well as the sample dataset, will be copied to your Beocat directory as you follow the instructions in the links below. Type or paste the text in each beige code block into your terminal as you follow along. If you are not used to the command line, practicing with real data is one of the best ways to learn.

If you would like a quick primer on basic Linux commands, try these 10-minute lessons from Software Carpentry http://software-carpentry.org/v4/shell/index.html. To start using Beocat and the terminal, go to https://github.com/i5K-KINBRE-script-share/FAQ/blob/master/UsingBeocat.md. To learn how to download files from Beocat, go to https://github.com/i5K-KINBRE-script-share/FAQ/blob/master/BeocatEditingTransferingFiles.md.

Genome assembly


Go to:

https://github.com/i5K-KINBRE-script-share/transcriptome-and-genome-assembly/blob/master/KSU_bioinfo_lab/AssembleG/AssembleG_LAB.md

The script "AssembleG.pl" organizes your working directory and writes scripts to clean your reads with Prinseq http://prinseq.sourceforge.net/manual.html, assemble a de novo genome from your own data (or from the Staphylococcus aureus sample data) with ABySS https://github.com/bcgsc/abyss#abyss, and summarize your assembly metrics.
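In the pipeline, read cleaning is done by Prinseq. To show what quality trimming means in principle, here is a minimal Python sketch (not Prinseq's implementation) that trims low-quality bases from the 3' end of a single read and discards the read if too little is left. The Phred+33 quality encoding, the quality threshold of 20 and the minimum length of 50 are illustrative assumptions, not the settings AssembleG.pl uses; the function names are hypothetical.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq: str, quality: str, min_q: int = 20) -> tuple[str, str]:
    """Drop bases from the 3' end while their quality is below min_q."""
    scores = phred_scores(quality)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


def clean_read(seq: str, quality: str, min_q: int = 20, min_len: int = 50):
    """Return the trimmed (seq, quality) pair, or None if the read is now too short."""
    seq, quality = trim_3prime(seq, quality, min_q)
    return (seq, quality) if len(seq) >= min_len else None
```

For example, `clean_read("ACGTACGT", "IIIII###", min_len=4)` trims the three low-quality `#` bases and keeps `("ACGTA", "IIIII")`. With paired-end reads, a real cleaner must also keep the two mates in sync when one of them is discarded, which this single-read sketch does not attempt.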

Transcriptome assembly


Go to:

https://github.com/i5K-KINBRE-script-share/transcriptome-and-genome-assembly/blob/master/KSU_bioinfo_lab/AssembleT/AssembleT_LAB.md

The script "AssembleT.pl" organizes your working directory and writes scripts to clean your reads with Prinseq http://prinseq.sourceforge.net/manual.html, assemble a de novo transcriptome from your own data (or from the human breast cancer cell line sample data) with Oases https://www.ebi.ac.uk/~zerbino/oases/, and summarize your assembly metrics.
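The exact metrics AssembleT.pl reports are described in the lab itself. As an assumption, this sketch computes three values that assembly summaries commonly include: the number of contigs, the total assembled length and N50 (the contig length such that contigs of that length or longer cover at least half of the total assembly). The function names are hypothetical.

```python
def read_fasta_lengths(lines):
    """Yield the length of each sequence in FASTA-formatted lines."""
    length = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(lengths):
    """Length of the contig at which the running total reaches half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def assembly_summary(lengths):
    lengths = list(lengths)
    return {"contigs": len(lengths), "total_bp": sum(lengths), "n50": n50(lengths)}
```

Usage, with a hypothetical file name: `with open("contigs.fa") as handle: print(assembly_summary(read_fasta_lengths(handle)))`.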

RNA-Seq reference based differential expression


Go to:

https://github.com/i5K-KINBRE-script-share/RNA-Seq-annotation-and-comparison/blob/master/KSU_bioinfo_lab/RNA-SeqAlign2Ref/RNA-SeqAlign2Ref_LAB.md

The script writes scripts and qsubs to generate count summaries for Illumina paired-end reads after mapping them against a reference genome. The script 1) converts Illumina headers if the -c parameter is used, 2) cleans raw reads using Prinseq http://prinseq.sourceforge.net/manual.html, 3) indexes the reference genome for mapping, 4) aligns reads to the genome with TopHat2 (read more about TopHat2 at http://tophat.cbcb.umd.edu/manual.html) and assembles expressed genes and transcripts with Cufflinks2, and 5) merges these assemblies with Cuffmerge and estimates differential expression with Cuffdiff2 (see http://bioinformaticsk-state.blogspot.com/2013/04/cuffdiff-2-and-isoform-abundance.html).
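Cufflinks reports expression as FPKM (fragments per kilobase of transcript per million mapped fragments), and Cuffdiff reports the change between two conditions as a log2 fold change. The sketch below shows the arithmetic behind both numbers in simplified form. It is an assumption-laden illustration: Cufflinks uses an effective transcript length and a statistical model rather than this plain formula, the handling of zero expression is this sketch's own convention, and Cuffdiff2's significance testing is not reproduced here. The function names are hypothetical.

```python
import math


def fpkm(fragments: int, transcript_bp: int, total_mapped_fragments: int) -> float:
    """Simplified FPKM: fragments * 1e9 / (transcript length in bp * total mapped fragments)."""
    return fragments * 1e9 / (transcript_bp * total_mapped_fragments)


def log2_fold_change(value_1: float, value_2: float) -> float:
    """log2(value_2 / value_1); positive means higher expression in condition 2.

    Zero in one condition gives +/-inf and zero in both gives nan
    (a convention chosen for this sketch).
    """
    if value_1 == 0 and value_2 == 0:
        return math.nan
    if value_1 == 0:
        return math.inf
    if value_2 == 0:
        return -math.inf
    return math.log2(value_2 / value_1)
```

For example, 100 fragments on a 1,000 bp transcript in a library of 1,000,000 mapped fragments gives an FPKM of 100, and a gene going from FPKM 2 to FPKM 8 has a log2 fold change of 2.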

RNA-Seq differential expression using a de novo transcriptome


Go to: https://github.com/i5K-KINBRE-script-share/RNA-Seq-annotation-and-comparison/blob/master/KSU_bioinfo_lab/RNA-SeqAlign/RNA-SeqAlignLAB.md

The script writes scripts and qsubs to generate count summaries for Illumina paired-end reads after mapping them against a de novo transcriptome. The script 1) converts Illumina headers if the "-c" parameter is used, 2) cleans raw reads using Prinseq http://prinseq.sourceforge.net/manual.html, 3) creates a filtered transcriptome FASTA file with putative transcripts shorter than 200 bp removed and then indexes this transcriptome for mapping, 4) maps reads to the length-filtered de novo transcriptome using Bowtie2 in its default best-mapping mode (read more about Bowtie2 at http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml), and 5) generates count summaries as a tab-separated table in which the first row lists the sample IDs, the first column lists the contig names, and the remaining values are read counts per sample (see https://github.com/i5K-KINBRE-script-share/RNA-Seq-annotation-and-comparison/tree/master/KSU_bioinfo_lab/Count_reads_denovo for details on how reads are summarized).
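Two of the steps above are simple enough to sketch directly: dropping putative transcripts shorter than 200 bp, and laying out per-sample read counts as a tab-separated table with sample IDs in the first row and contig names in the first column. This is a minimal illustration, not the pipeline's own code. The function names are hypothetical, the input is assumed to be already-parsed (name, sequence) pairs and a nested dictionary of counts, and the "contig" label in the top-left cell is an assumption, since the layout described above does not say what that cell holds.

```python
def filter_by_length(records, min_len=200):
    """Keep (name, sequence) records whose sequence is at least min_len bp long."""
    return [(name, seq) for name, seq in records if len(seq) >= min_len]


def format_count_table(counts, sample_ids):
    """Return a tab-separated table from counts[contig][sample_id] = read count.

    First row: sample ids (after an assumed "contig" label).
    Each later row: a contig name followed by its count in each sample;
    a contig with no reads in a sample gets 0.
    """
    lines = ["\t".join(["contig", *sample_ids])]
    for contig in sorted(counts):
        per_sample = counts[contig]
        row = [contig, *(str(per_sample.get(s, 0)) for s in sample_ids)]
        lines.append("\t".join(row))
    return "\n".join(lines) + "\n"
```

For example, `format_count_table({"c2": {"s1": 3}, "c1": {"s1": 1, "s2": 4}}, ["s1", "s2"])` yields a header row `contig, s1, s2` followed by rows `c1, 1, 4` and `c2, 3, 0`, all tab-separated.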

Blastx annotation of a transcriptome


Go to:

https://github.com/i5K-KINBRE-script-share/RNA-Seq-annotation-and-comparison/blob/master/KSU_bioinfo_lab/Blastx/Blastx_LAB.md

The NCBI blastx search tool translates a nucleotide query in all six reading frames and compares these six translations against a protein database http://www.ncbi.nlm.nih.gov/books/NBK1763/. The script "Blastx.pl" organizes the working directory and writes scripts to BLAST putative transcripts from a de novo transcriptome FASTA file against the nr protein database.
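blastx performs the six-frame translation internally, so you never need to do it yourself. To make "all six frames" concrete, here is a minimal Python sketch that translates a nucleotide sequence in the three forward frames and the three frames of its reverse complement using the standard genetic code (NCBI translation table 1). The frame labels and function names are this sketch's own; numbering conventions for the reverse frames differ between tools.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(seq):
    """Translate complete codons; '*' marks stops, 'X' marks ambiguous codons."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq):
    """Return translations of the three forward and three reverse-complement frames."""
    seq = seq.upper()
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(reverse_complement[offset:])
    return frames
```

For example, `six_frame_translation("ATGGCC")` gives `MA` for frame +1 and `GH` for frame -1 (the reverse complement is `GGCCAT`). blastx then searches each of these protein translations against the database.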

Report issues or bugs using the repository issue tracker


Go to: https://github.com/i5K-KINBRE-script-share/FAQ/blob/master/ReportIssues.md

image by Krzysztof Szkurlatowski; 12frames.eu