eonurk / cinar

A differential and enrichment analysis pipeline for bulk ATAC-seq (and RNA-seq)

Home Page: https://eonurk.github.io/cinaR/

R 100.00%

atac-seq differential-analysis enrichment-analysis gene-sets

cinar's Introduction

cinaR

Overview

cinaR is a single wrapper function for end-to-end computational analyses of bulk ATAC-seq (or RNA-seq) profiles. Starting from a consensus peak file, it outputs differentially accessible peaks and enrichment results, and offers various configurable visualization options. For more details, please see the preprint.

Installation

# CRAN mirror
install.packages("cinaR")

Development version

To get bug fixes and new features from the development version:

# install.packages("devtools")
devtools::install_github("eonurk/cinaR")

Known Installation Issues

Sometimes Bioconductor-related packages are not installed automatically,
so you may need to install them manually:

# install.packages("BiocManager")
BiocManager::install(c(
    "ChIPseeker", "DESeq2", "edgeR", "fgsea", "GenomicRanges", "limma",
    "preprocessCore", "sva", "TxDb.Hsapiens.UCSC.hg38.knownGene",
    "TxDb.Hsapiens.UCSC.hg19.knownGene", "TxDb.Mmusculus.UCSC.mm10.knownGene"
))

Usage

library(cinaR)
#> Checking for required Bioconductor packages...
#> All required Bioconductor packages are already installed.

# Create the contrast vector that defines the groups to be compared.
contrasts <- c("B6", "B6", "B6", "B6", "B6", "NZO", "NZO", "NZO", "NZO", "NZO", "NZO",
               "B6", "B6", "B6", "B6", "B6", "NZO", "NZO", "NZO", "NZO", "NZO", "NZO")

# If the reference genome is not set, hg38 will be used!
results <- cinaR(bed, contrasts, reference.genome = "mm10")
#> >> Experiment type: ATAC-Seq
#> >> Matrix is filtered!
#> 
#> >> preparing features information...      2024-05-22 12:38:01 
#> >> identifying nearest features...        2024-05-22 12:38:02 
#> >> calculating distance from peak to TSS...   2024-05-22 12:38:02 
#> >> assigning genomic annotation...        2024-05-22 12:38:02 
#> >> assigning chromosome lengths           2024-05-22 12:38:11 
#> >> done...                    2024-05-22 12:38:11
#> >> Method: edgeR
#>  FDR:0.05& abs(logFC)<0
#> >> Estimating dispersion...
#> >> Fitting GLM...
#> >> DA peaks are found!
#> >> No `geneset` is specified so immune modules (Chaussabel, 2008) will be used!
#> >> enrichment.method` is not selected. Hyper-geometric p-value (HPEA) will be used!
#> >> Mice gene symbols are converted to human symbols!
#> >> Enrichment results are ready...
#> >> Done!

pca_plot(results, contrasts, show.names = FALSE)
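
The contrast vector in the usage example repeats the same labels many times. As a minimal sketch, the same vector can be built with base R's `rep()`, which makes the group sizes explicit. The name `contrasts_rep` is only illustrative and is not part of cinaR.

```r
# The contrast vector exactly as written in the usage example.
contrasts <- c("B6", "B6", "B6", "B6", "B6", "NZO", "NZO", "NZO", "NZO", "NZO", "NZO",
               "B6", "B6", "B6", "B6", "B6", "NZO", "NZO", "NZO", "NZO", "NZO", "NZO")

# The same labels built with rep(): 5 B6, 6 NZO, 5 B6, 6 NZO.
contrasts_rep <- rep(c("B6", "NZO", "B6", "NZO"), times = c(5, 6, 5, 6))

# Both vectors are element-wise identical.
identical(contrasts, contrasts_rep)

# Quick sanity check of the group sizes.
table(contrasts_rep)
```

Writing the vector this way makes it easier to check group sizes against the sample columns of the consensus matrix.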

For more details, please visit our site: https://eonurk.github.io/cinaR/

Citation

@article {Karakaslar2021.03.05.434143,
    author = {Karakaslar, E Onur and Ucar, Duygu},
    title = {cinaR: A comprehensive R package for the differential analyses and 
    functional interpretation of ATAC-seq data},
    year = {2021},
    doi = {10.1101/2021.03.05.434143},
    publisher = {Cold Spring Harbor Laboratory},
    URL = {https://www.biorxiv.org/content/early/2021/03/08/2021.03.05.434143.1},
    journal = {bioRxiv}
}

Contribution

You can send pull requests to make your contributions.

License

GNU General Public License v3.0

cinar's Issues

Sort the comparison orders according to the order of contrasts

If you have:

contrasts <- c("Severe", "Mild", "Healthy")

then the comparisons should become Severe_Mild, Severe_Healthy, and Mild_Healthy.

Hint: take the unique values of the contrasts.
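
The hint can be sketched in base R as follows. This is only an illustration of the idea, not cinaR's actual implementation; the variable names `groups` and `comparisons` are hypothetical. `unique()` keeps the order of first appearance, and `combn()` enumerates pairs in that same order.

```r
# Contrast labels, possibly repeated across samples.
contrasts <- c("Severe", "Mild", "Healthy", "Mild", "Severe")

# unique() preserves the order in which each label first appears.
groups <- unique(contrasts)

# combn() generates all pairs following the order of `groups`;
# paste(..., collapse = "_") turns each pair into a comparison name.
comparisons <- combn(groups, 2, FUN = paste, collapse = "_")

print(comparisons)
```

With the labels above, the comparison names come out as Severe_Mild, Severe_Healthy, Mild_Healthy, matching the order requested in this issue.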

Error in .get_data_frame_col_as_numeric(df, granges_cols[["start"]]) : some values in the "Start" column cannot be turned into numeric values

Hello,

I am attempting to run cinaR with the code below. I have 20 samples: 2 replicates of each of 10 conditions.
results <- cinaR(consensus_matrix, contrasts, reference.genome = "mm10", additional.covariates = c(rep("H", 5), rep("L", 5), rep("H", 5), rep("L", 5)), batch.correction = T, batch.information = c(rep(0, 10), rep(1,10)))

I have given this as contrasts:
contrasts<- c("H0", "H2", "H24", "H48", "H72", "L0", "L2", "L24", "L48", "L72","H0", "H2", "H24", "H48", "H72", "L0", "L2", "L24", "L48", "L72")

head(consensus_matrix) shows the following (only the first 3 samples are shown):

  chr   start     end Hi_0_REP1.mLb.clN.sorted.bam Hi_2_REP1.mLb.clN.sorted.bam Hi_24_REP1.mLb.clN.sorted.bam
1 chr1 3008684 3009119                            8                           12                             1
2 chr1 3012311 3012785                            8                            3                             1
3 chr1 3037464 3037989                            7                           20                             0
4 chr1 3046437 3046652                            6                            4                             0
5 chr1 3049581 3049922                            8                            4                             0
6 chr1 3053849 3054004                            0                           10                             0

dim(consensus_matrix)
[1] 312759 23

The traceback of the code shows:

Error in .get_data_frame_col_as_numeric(df, granges_cols[["start"]]) : some values in the "Start" column cannot be turned into numeric values
8. stop(wmsg("some values in the ", "\"", names(df)[[col]], "\" ", "column cannot be turned into numeric values"))
7. .get_data_frame_col_as_numeric(df, granges_cols[["start"]])
6. makeGRangesFromDataFrame(from, keep.extra.columns = TRUE)
5. asMethod(object)
4. as(seqnames, "GRanges")
3. GenomicRanges::GRanges(bed)
2. annotatePeaks(cp.filtered, reference.genome = reference.genome, show.annotation.pie = show.annotation.pie, verbose = verbose)
1. cinaR(consensus_matrix, contrasts, reference.genome = "mm10", additional.covariates = c(rep("H", 5), rep("L", 5), rep("H", 5), rep("L", 5)), batch.correction = T, batch.information = c(rep(0, 10), rep(1, 10)))

I've checked that the start and end columns contain only numeric values and no NAs. I'd appreciate any advice.

Thank you
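
One way to double-check a coordinate column is to look for values that do not survive conversion to numeric. The sketch below uses a hypothetical helper, `find_non_numeric()`, and toy data; it assumes the offending values are already present in the input data frame, which may not be the case if they are introduced inside the pipeline.

```r
# Return the row indices whose values cannot be parsed as numbers.
# Note: genuine NA entries are reported as well.
find_non_numeric <- function(x) {
  parsed <- suppressWarnings(as.numeric(as.character(x)))
  which(is.na(parsed))
}

# Toy data: the second start coordinate contains thousands separators.
toy <- data.frame(
  chr   = c("chr1", "chr1", "chr1"),
  start = c("3008684", "3,012,311", "3037464"),
  end   = c(3009119, 3012785, 3037989),
  stringsAsFactors = FALSE
)

# A character or factor column is itself a warning sign.
class(toy$start)

# Indices of problematic rows in each coordinate column.
find_non_numeric(toy$start)
find_non_numeric(toy$end)
```

If both calls return an empty result on the real matrix, the non-numeric values are probably created later, and inspecting the intermediate object passed to `GenomicRanges::GRanges()` would be the next step.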

Verbose boolean

if(verbose){
    # print message
}

Note: don't forget ChIPseeker.
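
A minimal sketch of such a flag, assuming progress messages are emitted through a small helper. `log_step()` and `run_steps()` are hypothetical names, not cinaR functions. The note about ChIPseeker presumably means the flag should also be forwarded to the annotation step, which prints its own progress lines.

```r
# Print a progress message only when verbose is TRUE.
log_step <- function(msg, verbose = TRUE) {
  if (isTRUE(verbose)) message(msg)
  invisible(NULL)
}

# Toy pipeline that threads the same flag through every step.
run_steps <- function(verbose = TRUE) {
  log_step(">> Matrix is filtered!", verbose)
  log_step(">> Done!", verbose)
  invisible(TRUE)
}

run_steps(verbose = TRUE)   # prints both messages
run_steps(verbose = FALSE)  # prints nothing
```

Using `message()` rather than `print()` keeps progress output on stderr, so callers can also silence it with `suppressMessages()`.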

Differential Analyses - final.matrix

Hello there,
cinaR is a wonderful tool for analyzing ATAC-seq data.
I was wondering what the format of final.matrix (annotated consensus peaks) is.
I am trying to run differential analyses on my custom peak matrix with custom annotation,
and I want to run only the differential analyses rather than the whole cinaR pipeline.
I would much appreciate any information you can provide.

Implement null checks for enrichment of mm10 modules

If there are no DA peaks, it throws an error; fix it!
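
A null check of the kind the title asks for might look like the sketch below. It assumes the enrichment step receives a data frame of DA peaks that can be NULL or have zero rows; `has_da_peaks()` and `safe_enrichment()` are hypothetical helpers, and `enrich_fn` stands in for whatever function performs the enrichment.

```r
# TRUE only for a data frame with at least one row.
has_da_peaks <- function(da_peaks) {
  !is.null(da_peaks) && is.data.frame(da_peaks) && nrow(da_peaks) > 0
}

# Run the enrichment only when there is something to enrich.
safe_enrichment <- function(da_peaks, enrich_fn) {
  if (!has_da_peaks(da_peaks)) {
    message(">> No DA peaks found, skipping enrichment.")
    return(NULL)
  }
  enrich_fn(da_peaks)
}

# Toy usage: nrow() stands in for a real enrichment function.
safe_enrichment(NULL, nrow)                      # NULL, with a message
safe_enrichment(data.frame(), nrow)              # NULL, with a message
safe_enrichment(data.frame(gene = "A"), nrow)    # runs enrich_fn
```

Returning NULL instead of stopping lets the rest of the pipeline continue for comparisons that do have DA peaks.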

fgsea error for run_enrichment

When run_enrichment is run with GSEA as the enrichment method, the following error is given:

Error in fgsea::fgsea(pathways = geneset, stats = genes, eps = 0, minSize = 15, : unused argument (eps = 0)

The new version of fgsea::fgsea does not have eps as an option.
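
One defensive pattern for this kind of version mismatch is to pass an optional argument only when the installed function can accept it. The sketch below uses base R's `formals()` and `do.call()`. `accepts_arg()` and `call_with_supported()` are hypothetical helpers, and `old_api()` and `new_api()` are toy stand-ins rather than real fgsea functions.

```r
# TRUE if `fn` can take an argument named `arg`, by name or via `...`.
accepts_arg <- function(fn, arg) {
  arg_names <- names(formals(fn))
  arg %in% arg_names || "..." %in% arg_names
}

# Call `fn` with the required arguments plus the optional ones it accepts.
call_with_supported <- function(fn, required, optional = list()) {
  keep <- vapply(names(optional), function(a) accepts_arg(fn, a), logical(1))
  do.call(fn, c(required, optional[keep]))
}

# Toy stand-ins for two versions of an API, one without `eps`.
old_api <- function(pathways, stats, minSize = 1) "ran without eps"
new_api <- function(pathways, stats, minSize = 1, eps = 1e-10) {
  paste("ran with eps =", eps)
}

required <- list(pathways = list(), stats = numeric())
call_with_supported(old_api, required, list(eps = 0))
call_with_supported(new_api, required, list(eps = 0))
```

In the package itself, the same idea would be applied to `fgsea::fgsea` so that `eps = 0` is dropped when the installed version rejects it; declaring a minimum fgsea version in the package dependencies is an alternative.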