Git Product home page Git Product logo

pslibs's Introduction

Transcriptomic Analysis of Pi-S treatment libraries

Here is the documentation of my methods for processing the Pi-S treatment libraries, my scripts and locations of the data

Read locations

# Raw reads
cd /home/lucy/R/Eutrema/PS/PS/reads

# Trimmed reads
cd /home/lucy/R/Eutrema/PS/PS/trimmed

# Read counts
cd /home/lucy/R/Eutrema/PS/QoRTs

Attempt to find novel transcripts

cd /home/lucy/R/Eutrema/PS/StringTIe

There were many novel transcripts almost all of which overlap with already annotated transcripts and differ by a could of base-pairs. In fact, some of the annotated DROUGHT transcripts also overlap with the Thhalvs, eventually one should look through the annotation file and remove the overlapping DROUGHT transcripts.

How does results() work in DESeq2

Why don't I get the same DEGs as A vs. B when I switch the reference level to B vs. A?

results() command extracts the log2FoldChange comparison values from the DESeqDataSet object generated from DESeq().

Match between two patterns (Perl)

perl -pe 

Custom crema instructions

Before running crema, you have to filter out ribosomal reads!.

(Or don't filter, Elizabeth says that some ribosomal reads can also be lncRNA depending on how they were processed)

(In the spirit of keeping everything, might not even filter FPKMs less than 1 because lncRNA that are related to stress seems to have very low expression in general)

#!/bin/bash

Output generated from /home/lucy/R/Eutrema/PS/scripts/PS_pairwise.R

NAME_DIR=/home/lucy/R/Eutrema/PS/names
CREMA=/home/lucy/bin/crema
PROC=3
cd $NAME_DIR
ls -1 */*/*/names > names

Activate Conda environment that has everything already downloaded. Check crema GitHub for software requirements.

conda activate renv

Use batch processing to generate multi fasta

#go on the cluster
#ssh lucy@114
while read line; do
    xargs samtools faidx /2/scratch/lucy/misc/2transcripts.fa < $line > $line.fa
done < names

#exit the cluster
#exit

Run CPAT, make sure it's in an environment with Python 2 installed

conda deactivate

conda activate cpat

parallel -j $PROC cpat.py -g {}.fa -o {}.cpat \
    -x $CREMA/cpat_models/ath_hexamer.txt \
    -d $CREMA/cpat_models/ath_logit.RData :::: names

conda deactivate

Run Diamond BLASTX

parallel -j $PROC diamond blastx -d swissprot.dmnd -q {}.fa -o {}.diamond \
    -e 0.001 -k 5 --matrix BLOSUM62 --gapopen 11 --gapextend 1 --more-sensitive \
    -f 6 qseqid pident length qframe qstart qend sstart send evalue bitscore :::: names

Run the lncRNA prediction tool

conda activate renv
parallel -j $PROC python3 $CREMA/bin/predict.py -f {}.fa -c {}.cpat -d {}.diamond :::: names

Weighted gene co-expression network analysis

With R package WGCNA (no longer supported), however still a good diagnostic analysis to explore data.

See: K-means clustering paper

The above package is also deprecated and their recommended alternative seems to be optimized for animal tissue types.

Script

Looking for QTLs

See: Arabidopsis QTL mapping paper

See if these QTLs (in supp file 4 of the paper), particularly the PHO and PUE group are found in Eutrema Yukon and if their expression levels remark anything interesting.

  • Use Phytozome/Phytomine Python API to get homolog genes and sequences (really powerful tool)
  • R script calls the Python script which queries Phytozome through the Phytomine API
  • For genes with no match in Arabidopsis, extract the transcript sequence and run BLAST in Eutrema transcriptome
  • Try to annotate those transcripts
  • Then we determine the expression levels of these transcripts in the RNA-Seq libraries

Three common genes mapped in these two QTL groups

intersect(PHO, PUE)

  [1] "AT1G24020" "AT1G47970" "AT1G47980"

QTL groupings

The number of matches may exceed mismatches because one Arabidopsis gene may match to multiple Eutrema genes.

QTL Group Arabidopsis Eutrema matched Eutrema BLAST No match
PHO 25 18 3 5
PUE 44 15 5 4
SUL 26 22 3 3
IP6 18 17 1 1

PUE - Phosphate use efficiency; PHO - Leaf phosphate content; SUL - Leaf sulfate content; IP6 - Leaf phytate content;

pslibs's People

Contributors

lucyzhangzhang avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.