
lncpipereporter's Introduction

LncPipeReporter


An R package for automatically aggregating and summarizing lncRNA analysis results.

Overview

Most bioinformatics tools, including aligners such as STAR, TopHat and HISAT2, generate log files by default. LncPipe, a recent Nextflow-based lncRNA sequencing data analysis pipeline, additionally produces a file containing basic lncRNA features.

This project is a part of LncPipe (but can also be used on its own) and is responsible for automatically generating HTML reports with interactive plots from the pipeline output. It contains several plotting functions as well as analysis scripts that perform comparison analysis and, when experimental design information is available, differential expression analysis. We expect this tool to help users understand the underlying mechanisms of known and novel lncRNAs in their experiments.

Gallery

GIF animations were recorded using phw/peek.

Interactive plots generated by LncPipeReporter are implemented via plotly and support arbitrary scaling and filtering, with hover tags that show the real values.

There are also interactive tables showing the first 80 rows of the data.frame/data.table. They can be exported in several formats and support searching, filtering and ordering.

The user-adjusted plots can always be saved as static figures and placed temporarily in your manuscript for peer review. When it is time to publish, you may use the publication-quality versions instead.

Features

  • Common result files of lncRNA sequencing data analysis pipelines are well supported. The package is designed to handle several types of files.

  • Files can be anywhere. Users can simply put all upstream analysis result files in one folder (even alongside other files). They are discovered recursively in that folder and its subdirectories.

  • File types are guessed. Users never need to designate file types explicitly, or pass a file containing a list of names as a parameter, when using LncPipeReporter.

  • Flexible use. Users can supply any type or number of files at a time: for instance, more than one STAR log file, both STAR and HISAT2 log files, or no alignment log files at all.

  • More themes available. Users can apply a series of attractive themes provided by ggsci. See Parameters for details.

  • Multiple differential expression analysis methods are supported. Currently, users can choose edgeR, DESeq2 or NOISeq as the differential expression analysis tool.

  • High-resolution static figures and detailed results in CSV are provided. Users get publication-ready figures in TIFF format (300 ppi, LZW compression) and PDF format (editable in AI, etc.). LncPipeReporter also always writes analysis result tables (comma-separated; can be opened and edited in MS Excel, etc.). For details, see Results.
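The recursive file lookup mentioned in the list above can be illustrated with base R alone. This is only a sketch of the general idea, not the package's actual implementation; the helper name find_result_files and the demo directory layout are made up for illustration.

```r
# Hypothetical helper (not part of LncPipeReporter): list every file under
# an input directory, descending into all subdirectories.
find_result_files <- function(input_dir) {
  list.files(input_dir, recursive = TRUE, full.names = TRUE)
}

# Build a throwaway directory tree to demonstrate the lookup
demo_dir <- file.path(tempdir(), "lncpipe_demo")
dir.create(file.path(demo_dir, "star", "run1"),
           recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_dir, "N1037.log"))
file.create(file.path(demo_dir, "star", "run1", "LWS2.Log.final.out"))

# Both files are found, regardless of how deeply they are nested
found <- find_result_files(demo_dir)
print(basename(found))
```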

Installation

LncPipeReporter currently supports only Unix-like operating systems, because it contains several Perl 5 one-liners for parsing multiple log files. I'll use pure R code instead in the future to make it a cross-platform package.

The main report is built from R Markdown (v2) documents, so you must install pandoc first:

For Arch Linux:

$ sudo pacman -S pandoc

For other operating systems or Linux distributions, see pandoc's official documentation.

You can't build from source in Microsoft R Open earlier than v3.4.2, due to a bug in it.

Some packages need a Fortran compiler to build, so install one first (for example, on Debian/Ubuntu):

$ sudo apt-get install gfortran

Run in R session:

install.packages("devtools")
devtools::install_github("bioinformatist/LncPipeReporter")

If you run into any problem during installation, please refer to the FAQ.

How to use

Caution: although users never need to specify file types, the sample name must be the first part of the file name, where both . and _ count as delimiters. For example, the sample names of LWS2.Log.final.out and N1037.log are taken to be LWS2 and N1037.
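The naming rule can be checked on your own files before running the reporter. The helper below, get_sample_name, is a hypothetical name and only mimics the rule as described here (keep everything before the first . or _). It is not the package's own parser.

```r
# Hypothetical helper: strip the directory, then drop everything from the
# first "." or "_" onwards, leaving the sample name.
get_sample_name <- function(path) {
  sub("[._].*$", "", basename(path))
}

sample_names <- get_sample_name(c("LWS2.Log.final.out", "results/N1037.log"))
print(sample_names)  # returns c("LWS2", "N1037")
```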

If you use DESeq2 or NOISeq as the differential expression analysis tool, the order of sample names in the experimental design information file must match the order of the columns in the expression matrix.
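One way to verify the ordering beforehand is sketched below with toy data. The column names sample and condition are assumptions for illustration only. The actual layout of the design file may differ.

```r
# Toy expression matrix: rows are genes, columns are samples
expr <- matrix(1:6, nrow = 2,
               dimnames = list(c("lnc1", "lnc2"), c("S1", "S2", "S3")))

# Toy design table listed in a different order (column names are hypothetical)
design <- data.frame(sample = c("S3", "S1", "S2"),
                     condition = c("tumor", "normal", "normal"),
                     stringsAsFactors = FALSE)

# Fail early if the two sets of sample names differ
stopifnot(setequal(design$sample, colnames(expr)))

# Reorder the design rows so they follow the matrix columns
design <- design[match(colnames(expr), design$sample), ]
print(design)
```

After the reordering, design$sample and colnames(expr) are identical, which is the condition stated above.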

We highly recommend using the Chrome web browser to view reports produced by LncPipeReporter.

Try the simplest run with default parameters

library(LncPipeReporter)
run_reporter()

Specify parameter values through the user interface

library(LncPipeReporter)
# Do NOT use T as shorthand for TRUE
run_reporter(ask = TRUE)

Call with user-defined parameter values

library(LncPipeReporter)
run_reporter(input = system.file(file.path("extdata", "demo_results"), package = "LncPipeReporter"),
             output = 'reporter.html',
             theme = 'npg',
             cdf.percent = 10,
             max.lncrna.len = 10000,
             min.expressed.sample = 50,
             ask = FALSE)

Call from shell scripts or the command line (Nextflow, etc.)

Pass the parameters with their values in the R call:

$ Rscript -e "library(LncPipeReporter); run_reporter(input = '.', ...)"

Here ... stands for the other arguments. Use single quotes inside the R expression, because the whole expression is already wrapped in double quotes for the shell.

Parameters with their names and default values are listed below:

Parameters

Name                  Default value         Description
input                 extdata/demo_results  Absolute path of the input directory (results of upstream analysis)
output                ~/reporter.html       Index file name (in HTML format)
output_dir            ~/LncPipeReports      Output directory (holds all results and dependencies)
de.method             'edger'               Differential expression analysis method: 'edger' (default), 'noiseq' or 'deseq2'
theme                 npg                   Journal palette supplied by ggsci, applied to all plots
cdf.percent           10%                   Percentage of values to display when calculating coding potential
max.lncrna.len        10000                 Maximum length of lncRNAs to display when calculating distribution
min.expressed.sample  50%                   Minimal percentage of expressed samples
ask                   FALSE                 Whether to set parameters through a graphical user interface in the browser

For details and examples, type help(run_reporter) or ?run_reporter in an R session to read the documentation.

Results

By default, LncPipeReporter creates a directory named LncPipeReports in your $HOME (you can choose another location) that holds all results as well as their dependencies, so always move or copy the whole folder. The contents of the output directory look like this:

LncPipeReports/
├── figures
│   ├── CDF.pdf
│   ├── CDF.tiff
│   ├── compare_density.pdf
│   ├── compare_density.tiff
│   ├── compare_violin.pdf
│   ├── compare_violin.tiff
│   ├── HISAT2.pdf
│   ├── HISAT2.tiff
│   ├── lncRNA_length_distribution.pdf
│   ├── lncRNA_length_distribution.tiff
│   ├── lncRNA_length_distribution_with_type.pdf
│   ├── lncRNA_length_distribution_with_type.tiff
│   ├── pca.pdf
│   ├── pca.tiff
│   ├── STAR.pdf
│   ├── STAR.tiff
│   ├── TopHat2.pdf
│   ├── TopHat2.tiff
│   ├── vocano.pdf
│   └── vocano.tiff
├── libs
│   ├── bootstrap-3.3.5
│   ├── crosstalk-1.0.0
│   ├── datatables-binding-0.2
│   ├── dt-core-1.10.12
│   ├── dt-ext-buttons-1.10.12
│   ├── dt-plugin-searchhighlight-1.10.12
│   ├── htmlwidgets-0.9
│   ├── ionicons-2.0.1
│   ├── jquery-1.12.4
│   ├── jszip-1.10.12
│   ├── pdfmake-1.10.12
│   ├── plotly-binding-4.7.1.9000
│   ├── plotlyjs-1.31.2.9000
│   ├── stickytableheaders-0.1.19
│   └── typedarray-0.1
├── reporter.html
└── tables
    ├── DE.csv
    ├── HISAT2.csv
    ├── STAR.csv
    └── TopHat2.csv

18 directories, 25 files

This tree shows the output when differential expression analysis is performed with edgeR. Results from the other tools may differ slightly.
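Because reporter.html depends on the libs, figures and tables directories next to it, relocating a report means copying the whole directory. A minimal base R sketch follows. The helper name copy_report_dir and the demo paths are hypothetical, and the demo builds a small stand-in directory rather than a real report.

```r
# Hypothetical helper: copy a whole report directory into dest_parent,
# keeping its internal structure intact.
copy_report_dir <- function(report_dir, dest_parent) {
  dir.create(dest_parent, recursive = TRUE, showWarnings = FALSE)
  file.copy(report_dir, dest_parent, recursive = TRUE)
}

# Stand-in for a real output directory
src <- file.path(tempdir(), "LncPipeReports")
dir.create(file.path(src, "libs"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(src, "reporter.html"))
file.create(file.path(src, "libs", "placeholder.js"))

dest <- file.path(tempdir(), "backup")
ok <- copy_report_dir(src, dest)

# The report and its dependencies now also live under backup/LncPipeReports
print(list.files(file.path(dest, "LncPipeReports"), recursive = TRUE))
```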

FAQ

If devtools::install_github() raises the error Installation failed: Problem with the SSL CA cert (path? access rights?), try:

install.packages(c("curl", "httr"))

During installation you may hit configuration errors caused by missing libraries:

------------------------- ANTICONF ERROR ---------------------------
Configuration failed because libcurl was not found. Try installing:
 * deb: libcurl4-openssl-dev (Debian, Ubuntu, etc)
 * rpm: libcurl-devel (Fedora, CentOS, RHEL)
 * csw: libcurl_dev (Solaris)
If libcurl is already installed, check that 'pkg-config' is in your
PATH and PKG_CONFIG_PATH contains a libcurl.pc file. If pkg-config
is unavailable you can set INCLUDE_DIR and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
--------------------------------------------------------------------

Just follow the instructions to satisfy the dependencies. For instance, you can run sudo apt-get install libcurl4-openssl-dev on Ubuntu to fix the problem above.

LncPipeReporter uses the Bioconductor package edgeR to perform differential expression analysis, so if you are prompted with 'BiocInstaller' must be installed to install Bioconductor packages., please choose 1 (Yes). If you then see Installation failed: cannot open the connection to 'https://bioconductor.org/biocLite.R', run source('http://bioconductor.org/biocLite.R') and try the installation commands above again.

If resolving some dependencies from GitHub fails with Connection timed out after 100001 milliseconds, please wait a few minutes and try again.
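On recent Bioconductor releases (3.8 and later), biocLite.R has been replaced by the BiocManager package, so the Bioconductor dependencies can probably be installed as sketched below. The wrapper name install_de_backends is hypothetical, and the package list is an assumption based on the three methods named in this README. The call is left commented out because it needs network access.

```r
# Hypothetical wrapper: install the Bioconductor packages behind the three
# differential expression methods mentioned in this README.
install_de_backends <- function(pkgs = c("edgeR", "DESeq2", "NOISeq")) {
  # BiocManager itself comes from CRAN
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(pkgs)
}

# Uncomment to run (requires an internet connection):
# install_de_backends()
```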

License

This package is free and open source software, licensed under GPL v3.0.

lncpipereporter's People

Contributors

bioinformatist, likelet, sateeshperi


lncpipereporter's Issues

‘LncPipeReporter’ install error

Warning message:
package ‘LncPipeReporter’ is not available (for R version 3.4.2)

Does LncPipeReporter not support R version 3.4.2? I can't install it on CentOS.

Error in .rowNamesDF<-(x, value = value) : duplicate 'row.names' are not allowed

Hi, this problem occurs when I run the test data in LncPipe. How can I solve it? Thanks in advance.

processing file: ./edger.Rmd
Quitting from lines 9-57 (./edger.Rmd)
Quitting from lines 185-194 (./edger.Rmd)
Error in .rowNamesDF<-(x, value = value) :
duplicate 'row.names' are not allowed
Calls: run_reporter ... row.names<- -> row.names<-.data.frame -> .rowNamesDF<-
In addition: Warning messages:
1: In eval(expr, envir, enclos) : NAs introduced by coercion
2: non-unique values when setting 'row.names':
Execution halted

Install error on Ubuntu Linux system

Hi, I get the following error:

Installing 116 packages: acepack, annotate, AnnotationDbi, base64enc, BH, Biobase, BiocGenerics, BiocParallel, bit, bit64, bitops, blob, caTools, checkmate, colorspace, cowplot, crosstalk, data.table, DBI, DelayedArray, dendextend, DEoptimR, DESeq2, diptest, dplyr, DT, edgeR, evaluate, fansi, flexdashboard, flexmix, foreach, formatR, Formula, fpc, futile.logger, futile.options, gclus, gdata, genefilter, geneplotter, GenomeInfoDb, GenomeInfoDbData, GenomicRanges, ggplot2, ggsci, gplots, gridExtra, gtable, gtools, heatmaply, hexbin, highr, Hmisc, htmlTable, htmltools, htmlwidgets, httpuv, IRanges, iterators, kernlab, knitr, labeling, lambda.r, later, latticeExtra, lazyeval, limma, locfit, markdown, matrixStats, mclust, modeltools, munsell, mvtnorm, NOISeq, pillar, pkgconfig, plogr, plotly, plyr, prabclus, promises, qap, RColorBrewer, RcppArmadillo, RCurl, registry, reshape2, rmarkdown, robustbase, RSQLite, S4Vectors, scales, seriation, shiny, snow, sourcetools, stringi, stringr, SummarizedExperiment, tibble, tidyr, tidyselect, tinytex, trimcluster, TSP, utf8, viridis, viridisLite, webshot, xfun, XML, xtable, XVector, zlibbioc
Error in if (type == "binary") { : argument is of length zero
Calls: ... with_rprofile_user -> with_envvar -> force -> force -> i.p
In addition: Warning message:
In is.na(remote_deps$package) :
is.na() applied to non-(list or vector) of type 'NULL'
Execution halted

I found a discussion in which developers mentioned that the cause is that pkgtype is not defined. Is there any clue how I could fix it?

Thank you!

Run_LncPipeReporter gives .onLoad failed in loadNamespace() for 'shiny', details:

Hi, what might be the reason for this error? Thanks in advance.

Command error:

processing file: reporter.Rmd

processing file: ./lncRNA.Rmd
Quitting from lines 27-38 (./lncRNA.Rmd)
Quitting from lines 41-43 (./lncRNA.Rmd)
Error: .onLoad failed in loadNamespace() for 'shiny', details:
call: NULL
error: invalid version specification '1,5'
Execution halted

Reporter

Hi,
I'm running LncPipe, and an error always shows up when the LncPipeReporter process is running.

The error message is:
processing file: reporter.Rmd
Quitting from lines 51-99 (reporter.Rmd)
Error in [.data.table(x, , 2) : Item 1 of j is 2 which is outside the column number range [1,ncol=1]
Calls: run_report ... FUN -> determine type -> paste -> [ -> [.data.table
Execution halted

I'm using Docker and the latest Nextflow and LncPipe versions. I'm not a developer, could someone help me?
