RNASeq analysis pipline
00-Using the conda environment
conda create --name rnaseq
conda activate rnaseq
conda install -c bioconda trim-galore
Prepare test.fq.gz
(PE) files for test, we sample top 20000 lines(5k reads) for furthor analysis
zcat Tcell001/01_Rawdata/CD8_A2_R1.fastq.gz | head -n 20000 > test/Test_sample2_R1_fq
zcat Tcell001/01_Rawdata/CD8_A2_R2.fastq.gz | head -n 20000 > test/Test_sample2_R2_fq
01-Construction of the reference genome
We can download the human or mouse genome and gtf files from Genecode,and you certainly have other ways, such as NCBI, etc. I prefer to download from this site.
hisat2
uses enhancer bwa
algorithm.The input file must be genome.fa
format
# hisat2
nohup hisat2-build -p 8 GRCm38.p6.genome.fa hisat2/mm10 &
02-Quality control of raw sequencing reads
fastqc -t 10 -o /01_Rawdata/fastqc test1.fq.gz
03-Removal of adapters and low quality reads
RNAseq Workflow
Here I provide a hallworkflow using Linux.
gtf=/home/yulab/nino/Reference/RNAseq/mmu
core=8
# conda activate rnaseq
# Run the project dir
# pro_dir=/home/yulab/nino/RNAseq/Project/Tcell001
pro_dir=`pwd`
mkdir ${pro_dir}/02_trim_galore ${pro_dir}/03_align ${pro_dir}/04_featurecounts ${pro_dir}/05_exp
#############################################################
echo "Now Extract sample_info.txt"
sample=$(awk 'NR>2{print line}{line=$0} END{print line}' ${pro_dir}/sample_info.txt | awk '{print $1}')
R1=`ls ${pro_dir}/01_Rawdata/*R1*`
R2=`ls ${pro_dir}/01_Rawdata/*R2*`
#############################################################
echo "Run fastqc"
begin_time=`date`
mkdir ${pro_dir}/01_Rawdata/fastqc
ls ${pro_dir}/01_Rawdata/*fastq.gz | while read ID
do echo ${ID}
fastqc -t ${core} -o ${pro_dir}/01_Rawdata/fastqc ${ID}
sleep 5s
done
end_time=`date`
echo "Fastqc begin time:"
echo ${begin_time}
echo "Fastqc end time:"
echo ${end_time}
QA
- when constructing hisat2 index, warning: Encountered reference sequence with only gaps