derfinder's Issues

Make Docker container for reproducing this analysis

I'm working on making a Docker container with the correct dependency/R versions so that this script is reproducible without the user having to install archived packages. This should resolve all issues opened recently by @brianhigh. I will be sure to close this issue when the container is done, though please note that it might be a few weeks. Thanks!

Error in countReads.py

I have tried running it multiple times and this is the error I am receiving.

Traceback (most recent call last):
  File "countReads.py", line 144, in <module>
    countReadlets(options.file, options.output, options.kmer, options.chrom, stranded)
TypeError: countReadlets() takes exactly 4 arguments (5 given)

It seems that 'strand' is missing from the def of countReadlets(). I also noticed that in the last line of the code every argument has the form options.argument except stranded:

countReadlets(options.file, options.output, options.kmer, options.chrom, stranded)
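
For anyone hitting the same traceback, here is a minimal sketch of the mismatch. It is not the actual countReads.py code, which is not shown in this thread. The function bodies, the argument values, and the name and default of the fifth parameter are all assumptions made for illustration. It only shows that a function defined with four parameters rejects a five-argument call, and that adding a parameter to the definition is one way to make the call succeed. It is written for Python 3, where the TypeError wording differs from the Python 2 message above.

```python
def count_readlets_old(file, output, kmer, chrom):
    """Stand-in for a four-parameter countReadlets(); the body is a placeholder."""
    return (file, output, kmer, chrom)


def count_readlets_fixed(file, output, kmer, chrom, stranded=False):
    """Same stand-in with a hypothetical fifth parameter for strandedness."""
    return (file, output, kmer, chrom, stranded)


def try_call(func, *args):
    """Call func and return either its result or the TypeError message."""
    try:
        return func(*args)
    except TypeError as err:
        return "TypeError: {}".format(err)


# Hypothetical argument values; the real ones come from the option parser.
args = ("reads.sam", "out.txt", 100, "chrY", True)

old_result = try_call(count_readlets_old, *args)      # fails: 5 arguments given
fixed_result = try_call(count_readlets_fixed, *args)  # succeeds

print(old_result)
print(fixed_result)
```

Whether the right fix is to add the parameter to the definition or to drop stranded from the call depends on what the script is meant to do with strand information, which only the author can confirm.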

package RSQLite.extfuns was removed from CRAN

As with Issue #9, the RSQLite.extfuns package has also been removed from CRAN. An archived version is available and can be installed with:

library(devtools)
install_url("http://cran.r-project.org/src/contrib/Archive/RSQLite.extfuns/RSQLite.extfuns_0.0.1.tar.gz")

makeDb error when running analysis_code.R

I got this makeDb error when running analysis_code.R:

> makeDb(dbfile = dbfile, tablename = tablename, textfile = textfile, cutoff = 5)
Error in if (.allows_extensions(db)) { : 
  missing value where TRUE/FALSE needed
Error in !dbPreExists : invalid argument type

Perhaps an update to a package since your code was developed has created an incompatibility. Any ideas for a solution? Perhaps ...

library(checkpoint)
checkpoint("2014-10-08")

... as suggested on the sqldf homepage ... though that did not work for me.

exons length error from stopifnot() call in getFlags.R

When trying to reproduce results in the 2013 paper, from this code in analysis_code.R ...

# get the flags:
exons = getAnnotation("hg19","knownGene")
myflags = getFlags(regions = regions.merged.y, exons, "chrY", pctcut = 0.8)

... I am getting this fatal error ...

Error: length(unique(exons$chr)) == 1 is not TRUE

Some tests:

> length(unique(subset(x=exons, seqnames == "chrY", c(chr))))
 [1] 1
> length(unique(unlist(subset(x=exons, seqnames == "chrY", c(chr)))))
 [1] 101
> str(unique(unlist(subset(x=exons, seqnames == "chrY", c(chr)))))
 chr [1:101] "100101116" "100101120" "100132596" "100133941" "100289150" ...
> summary(subset(x=exons, seqnames == "chrY", c(chr)))
     chr           
 Length:1190       
 Class :character  
 Mode  :character  
> head(subset(x=exons, seqnames == "chrY"))
    gene       chr seqnames   start     end width strand
392   85 100101116     chrY 6258442 6258716   275      +
393   85 100101116     chrY 6262141 6262300   160      +
394   85 100101116     chrY 6269164 6269272   109      +
395   85 100101116     chrY 6271629 6271766   138      +
396   85 100101116     chrY 6279348 6279605   258      +
397   85 100101116     chrY 9590765 9591022   258      -
> tail(subset(x=exons, seqnames == "chrY"))
        gene  chr seqnames    start      end width strand
258853 22527 9189     chrY  2354455  2358810  4356      -
258854 22527 9189     chrY  2354455  2358813  4359      -
258855 22527 9189     chrY  2368352  2368580   229      -
258856 22527 9189     chrY  2368858  2369015   158      -
263606 22923 9426     chrY 20137667 20139627  1961      +
263607 22923 9426     chrY 19990140 19992100  1961      -

See: getFlags.R ...

stopifnot(length(unique(exons$chr))==1,
length(unique(regions$chr)) == 1,
unique(exons$chr) == unique(regions$chr))

This is my sessionInfo():

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C         LC_TIME=C            LC_COLLATE=C        
 [5] LC_MONETARY=C        LC_MESSAGES=C        LC_PAPER=C           LC_NAME=C           
 [9] LC_ADDRESS=C         LC_TELEPHONE=C       LC_MEASUREMENT=C     LC_IDENTIFICATION=C 

attached base packages:
 [1] splines   grid      parallel  stats4    stats     graphics  grDevices utils     datasets 
[10] methods   base     

other attached packages:
 [1] rtracklayer_1.26.2     GenomicFeatures_1.18.3 AnnotationDbi_1.28.1  
 [4] derfinder_1.0.2        locfdr_1.1-7           devtools_1.7.0        
 [7] HiddenMarkov_1.8-1     Genominator_1.20.0     GenomeGraphs_1.26.0   
[10] DESeq_1.18.0           lattice_0.20-29        locfit_1.5-9.1        
[13] Biobase_2.26.0         edgeR_3.8.5            limma_3.22.4          
[16] biomaRt_2.22.0         Rsamtools_1.18.2       Biostrings_2.34.1     
[19] XVector_0.6.0          GenomicRanges_1.18.4   GenomeInfoDb_1.2.4    
[22] IRanges_2.0.1          S4Vectors_0.4.0        BiocGenerics_0.12.1   
[25] BiocInstaller_1.16.1   sqldf_0.4-10           gsubfn_0.6-6          
[28] proto_0.3-10           RSQLite_1.0.0          DBI_0.3.1             

loaded via a namespace (and not attached):
 [1] BBmisc_1.9              BatchJobs_1.5           BiocParallel_1.0.3     
 [4] GenomicAlignments_1.2.1 RColorBrewer_1.1-2      RCurl_1.95-4.5         
 [7] XML_3.98-1.1            annotate_1.44.0         base64enc_0.1-2        
[10] bitops_1.0-6            brew_1.0-6              checkmate_1.5.1        
[13] chron_2.3-45            codetools_0.2-10        digest_0.6.8           
[16] fail_1.2                foreach_1.4.2           genefilter_1.48.1      
[19] geneplotter_1.44.0      httr_0.6.1              iterators_1.0.7        
[22] sendmailR_1.2-1         stringr_0.6.2           survival_2.37-7        
[25] tcltk_3.1.2             tools_3.1.2             xtable_1.7-4           
[28] zlibbioc_1.12.0

Where is tophatY-updated?

Hi! I am trying to run your R script analysis_code.R to reproduce your research paper results and when the command...

makeDb(dbfile = dbfile, tablename = tablename, textfile = textfile, cutoff = 5)

...is executed I get this error:

Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'tophatY-updated': No such file or directory

I do not see this file tophatY-updated in your repository. Where can I get this file? Can you add it to your repository? Thanks!

fix dependencies

The packages "BSgenome.Hsapiens.UCSC.hg19" and "BSgenome.Mmusculus.UCSC.mm10" are REALLY SUPER SUPER slow to install and aren't always needed. Is there a way to install only the main packages by default and load those bulky ones only if the user needs them (i.e. doesn't already have an annotation)?

Also, fix README to include all dependencies (not just the "Depends") and while you're at it, fix Depends vs. Imports.

package locfdr was removed from CRAN

In your README.md, you mention locfdr in the installation section. Since locfdr has been removed from CRAN, you might want to use an alternative that is still supported. Some suggestions may be found in this Stack Overflow thread, where the packages twilight and fdrtool were suggested and some examples were given. Otherwise an archived package for locfdr must be used, as in...

library(devtools)
install_url("http://cran.r-project.org/src/contrib/Archive/locfdr/locfdr_1.1-7.tar.gz")

Regions are not detected as expressed

I encountered a strange result while analysing RNA-seq data with derfinder. My goal is to find expressed regions in a number of conditions, which derfinder broadly does. The issue is that the reported regions seem to be smaller than what would be expected by visual inspection:

[Image: derfinder_regions]

As you can see, there is quite a large expressed region of which only a fraction is detected. This doesn't seem to be an issue of low coverage because, to the left, there is an area with similarly high coverage which is not reported as expressed. This is one example, but I found several of these in my data.

The question now is whether this is a setting that needs to be changed, or whether this is to be expected.

The analysis was run with the following settings:

fullCov <- fullCoverage(
      files = files,
      chrs = chrom,
      cutoff = 0,
      L = read_length,
      verbose = TRUE,
      totalMapped = total_mapped,
      filter = "one",
      mc.cores = nproc
)

regionMat <- regionMatrix(
   fullCov,
   cutoff = min_cov,
   L = read_length,
   maxClusterGap = 3000L,
   returnBP = TRUE,
   verbose = TRUE,
   filter = "one",
   targetSize = targetSize,
   mc.cores = nproc
   )

The input files are BAM files, the read length is 75, and the cutoff is 5. Session info can be found here.

makeTranscriptDbFromUCSC() returns "'data' must be of a vector type" error

It looks like changes in UCSC have introduced a problem with running this beta derfinder with R 2.15.3.

In the sample code analysis_code.R:

exons = getAnnotation("hg19","knownGene")
Warning message:
In .local(.Object, ...) : NAs introduced by coercion
Error in genome(ucscCart(x)) : 
  error in evaluating the argument 'x' in selecting a method for function 'genome': Error in matrix(unlist(pairs), nrow = 2) : 
  'data' must be of a vector type

Error in getAnnotation("hg19", "knownGene") : 
  Problem accessing requested UCSC annotation - likely there is a problem with genome or tablename arguments. Use ucscGenomes() to see acceptable genomes; use supportedTables(genome) to see acceptable tablenames for your genome.

A test:

a <- makeTranscriptDbFromUCSC(genome = "hg19", tablename = "knownGene")
Warning message:
In .local(.Object, ...) : NAs introduced by coercion
Error in genome(ucscCart(x)) : 
  error in evaluating the argument 'x' in selecting a method for function 'genome': Error in matrix(unlist(pairs), nrow = 2) : 
  'data' must be of a vector type
