valr's Introduction

valr

valr provides tools to read and manipulate genome intervals and signals, similar to the BEDtools suite.

Installation

The latest stable version can be installed from CRAN:

install.packages('valr')

The latest development version can be installed from GitHub:

# install.packages("devtools")
devtools::install_github('rnabioco/valr')

valr Example

Functions in valr have similar names to their BEDtools counterparts, and so will be familiar to users coming from the BEDtools suite. Unlike other tools that wrap BEDtools and write temporary files to disk, valr tools run natively in memory. Similar to pybedtools, valr has a terse syntax:

library(valr)
library(dplyr)

snps <- read_bed(valr_example("hg19.snps147.chr22.bed.gz"))
genes <- read_bed(valr_example("genes.hg19.chr22.bed.gz"))

# find snps in intergenic regions
intergenic <- bed_subtract(snps, genes)
# find distance from intergenic snps to nearest gene
nearby <- bed_closest(intergenic, genes)

nearby |>
  select(starts_with("name"), .overlap, .dist) |>
  filter(abs(.dist) < 5000)
#> # A tibble: 1,047 × 4
#>    name.x      name.y   .overlap .dist
#>    <chr>       <chr>       <int> <int>
#>  1 rs530458610 P704P           0  2579
#>  2 rs2261631   P704P           0  -268
#>  3 rs570770556 POTEH           0  -913
#>  4 rs538163832 POTEH           0  -953
#>  5 rs190224195 POTEH           0 -1399
#>  6 rs2379966   DQ571479        0  4750
#>  7 rs142687051 DQ571479        0  3558
#>  8 rs528403095 DQ571479        0  3309
#>  9 rs555126291 DQ571479        0  2745
#> 10 rs5747567   DQ571479        0 -1778
#> # ℹ 1,037 more rows

valr's People

Contributors

agillen, jayhesselberth, jimhester, kriemo, mackie90125, millerh1, raysinensis, sheridar

valr's Issues

pass groups through bed_map

from the README:

x <- tss %>%
  ...
  group_by(win_id)

bed_map(x, y, ...) %>%
  group_by(win_id.x) %>%
  ...

Consider passing groups within bed_map so that the second group_by wouldn't be required. Might be surprising as the groups coming out would be suffixed.

bed_merge implementations

library(dplyr)

bed_tbl <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr1", 100,    500,
  "chr1", 300,    400,
  "chr1", 399,    401,
  "chr1", 200,    600,
  "chr1", 800,   1000,
  "chr2", 100,    200,
  "chr2", 150,    250,
  "chr3", 500,   1000
)

bed_tbl %>% group_by(chrom) %>%
  mutate(.overlap = lead(start) - end,
         .overlap = ifelse(.overlap < 0, -1, 1),
         .rank = dense_rank(.overlap))
#> Source: local data frame [8 x 5]
#> Groups: chrom [3]
#> 
#>   chrom start   end .overlap .rank
#>   (chr) (dbl) (dbl)    (dbl) (int)
#> 1  chr1   100   500       -1     1
#> 2  chr1   300   400       -1     1
#> 3  chr1   399   401       -1     1
#> 4  chr1   200   600        1     2
#> 5  chr1   800  1000       NA    NA
#> 6  chr2   100   200       -1     1
#> 7  chr2   150   250       NA    NA
#> 8  chr3   500  1000       NA    NA

Need to:

  • identify runs of 1s (maybe a purrr function for this)
  • assign min(start), max(end) to those groups
  • impl overlap constraint
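The first two steps above can be sketched in base R. This is a minimal sketch under stated assumptions, not the package's implementation: `merge_intervals` is a hypothetical name, book-ended intervals (start equal to a previous end) are treated as overlapping, and the overlap constraint is not handled. Sorting by start and tracking the running maximum of `end` identifies the runs without needing `lead()`.

```r
merge_intervals <- function(df) {
  # sort so that a single pass per chromosome can detect runs
  df <- df[order(df$chrom, df$start, df$end), ]
  pieces <- lapply(split(df, df$chrom), function(d) {
    # largest end seen so far within this chromosome
    run_end <- cummax(d$end)
    # a new run begins when a start lies beyond every earlier end
    new_run <- c(TRUE, d$start[-1] > run_end[-nrow(d)])
    run_id <- cumsum(new_run)
    # assign min(start), max(end) to each run
    data.frame(
      chrom = d$chrom[1],
      start = as.numeric(tapply(d$start, run_id, min)),
      end = as.numeric(tapply(d$end, run_id, max)),
      stringsAsFactors = FALSE
    )
  })
  merged <- do.call(rbind, pieces)
  rownames(merged) <- NULL
  merged
}

# same intervals as in the example above
bed_df <- data.frame(
  chrom = c("chr1", "chr1", "chr1", "chr1", "chr1", "chr2", "chr2", "chr3"),
  start = c(100, 300, 399, 200, 800, 100, 150, 500),
  end = c(500, 400, 401, 600, 1000, 200, 250, 1000),
  stringsAsFactors = FALSE
)

merged <- merge_intervals(bed_df)
```

With this input the four overlapping chr1 intervals collapse into a single 100-600 interval, and the two chr2 intervals collapse into 100-250.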

unnecessary join warning in bed_window()

Somewhere in bed_window one of the functions is generating a chrom column as a <fctr> when it should be a <chr>. This causes the join warning below. Minor.

library(valr)

genome <- read_genome(system.file('extdata', 'hg19.chrom.sizes.gz', package = 'valr'))

x <- bed_random(genome, n = 100)
y <- bed_random(genome, n = 100)

# a few intersections
bed_intersect(x, y)
#> Source: local data frame [0 x 6]
#> 
#> Variables not shown: chrom <fctr>, start.x <int>, end.x <int>, start.y
#>   <int>, end.y <int>, .overlap <int>.

# can be expanded by casting a wider net
bed_window(x, y, genome, both = 1e6)
#> Warning in left_join_impl(x, y, by$x, by$y, suffix$x, suffix$y): joining
#> character vector and factor, coercing into character vector
#> Source: local data frame [6 x 6]
#> 
#>   chrom   start.x     end.x   start.y     end.y .overlap
#>   <chr>     <int>     <int>     <int>     <int>    <int>
#> 1  chr1 178630052 178631052 179274609 179275609     1000
#> 2 chr16  19722181  19723181  19845923  19846923     1000
#> 3 chr16  79075460  79076460  78546496  78547496     1000
#> 4 chr22  38638987  38639987  38556828  38557828     1000
#> 5  chr3  95547777  95548777  95802366  95803366     1000
#> 6  chr8 102562319 102563319 102338405 102339405     1000

bed_map dot spec / pass colname suffixes to bed_intersect

The bed_intersect call in bed_map yields new columns with .y suffixes, causing surprising behavior:

bed_tbl <- tibble::frame_data(
~chrom, ~start, ~end,
"chr1", 100, 250,
"chr2", 250, 500)

signal_tbl <- tibble::frame_data(
~chrom, ~start, ~end, ~value,
"chr1", 100, 250, 10,
"chr1", 150, 250, 20,
"chr2", 250, 500, 500)

bed_map(bed_tbl, signal_tbl, sum = sum(value))
#> Error: object 'value' not found
bed_map(bed_tbl, signal_tbl, sum = sum(value.y))
#> Source: local data frame [2 x 4]
#> 
#>   chrom start   end   sum
#>   <chr> <dbl> <dbl> <dbl>
#> 1  chr1   100   250    30
#> 2  chr2   250   500   500

The way around is to enable passing of suffix values to bed_intersect:

bed_intersect(x, y, suffix.y = '')

C stack usage error bed_flank()

I ran into another issue debugging bed_flank(), when using 1e6 random intervals. I'll work on both #59 and this error.

genome <- read_genome("inst/extdata/genome.txt.gz") 
x <- bed_random(genome)
bed_flank(x, genome, both = 100)
#> Error: C stack usage  34924379 is too close to the limit

use start for merge_id

Testing revealed that cluster IDs are out of order because .merge_id is lexicographically ordered by dense_rank.

It seems OK to just propagate the start value for merged intervals, as it will likely always be combined with group_by(chrom), yielding a unique combination per chrom.

Don't expose Rcpp methods

It's not a good idea to have the *_impl methods exposed in the API. If someone passes e.g. an ungrouped dataframe, it will crash RStudio.

library(valr)
# this function is available
closest_impl

The Rcpp methods in dplyr are not exposed, but I don't understand why.

library(dplyr)
# this function is not
select_impl

One guess is that the Rcpp methods are hidden by the S3 objects on the dplyr side.

closest implementation

This is the bedtools2 closest algorithm: https://github.com/arq5x/bedtools2/blob/master/src/utils/NewChromsweep/CloseSweep.cpp

bed_closest is probably more easily implemented with an interval tree than with a sweep algorithm, as the sweep case needs a huge number of checks to confirm the relative ordering of intervals.

Need to think about how to cache these trees (per session?) if possible; they are somewhat expensive to build and could be reused.

EKG has a minimal cxx interval tree implementation https://github.com/ekg/intervaltree/blob/master/IntervalTree.h
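For comparison, the nearest-interval lookup can be sketched in base R without a tree. This swaps in a simpler technique (binary search with `findInterval`) and only works under assumptions that an interval tree would not need: a single chromosome, and y intervals that are sorted by start and do not overlap each other. `closest_sorted` is a hypothetical name, and the sign convention (negative when y lies to the left of x) is an assumption of this sketch.

```r
closest_sorted <- function(x, y) {
  # assumptions: one chromosome, y sorted by start and non-overlapping
  stopifnot(nrow(y) > 0, !is.unsorted(y$start),
            all(y$start[-1] >= y$end[-nrow(y)]))

  signed_dist <- function(qs, qe, j) {
    if (j < 1 || j > nrow(y)) return(NA_real_)     # no neighbor on this side
    if (y$end[j] <= qs) return(-(qs - y$end[j]))   # y lies to the left
    if (y$start[j] >= qe) return(y$start[j] - qe)  # y lies to the right
    0                                              # intervals overlap
  }

  # index of the last y interval whose start is <= each x start
  left <- findInterval(x$start, y$start)

  res <- lapply(seq_len(nrow(x)), function(i) {
    # only the left neighbor and the next interval can be nearest
    cand <- c(left[i], left[i] + 1)
    d <- vapply(cand, function(j) signed_dist(x$start[i], x$end[i], j),
                numeric(1))
    best <- which.min(abs(d))
    data.frame(start.x = x$start[i], end.x = x$end[i],
               start.y = y$start[cand[best]], end.y = y$end[cand[best]],
               .dist = d[best])
  })
  do.call(rbind, res)
}

y <- data.frame(start = c(100, 500, 900), end = c(200, 600, 1000))
x <- data.frame(start = c(250, 650, 10, 150), end = c(300, 880, 20, 550))

near <- closest_sorted(x, y)
```

Overlapping y intervals break the two-candidate shortcut, which is one reason a tree or a sweep is needed in the general case.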

clean up function scopes before release

Anything from dplyr should be used without qualification, i.e. mutate, not dplyr::mutate.

Anything else should be qualified with a package name, e.g. functions from purrr, broom, etc.

build fails

installing to /private/var/folders/_1/4wg2xbj12_dft9p3kq4005n40000gn/T/RtmpvDL3Vf/devtools_install_beea4498c931/valr/libs
* DONE (valr)
Error in dyn.load(dllfile) : 
  unable to load shared object '/Users/jayhesselberth/devel/valr/src/valr.so':
  dlopen(/Users/jayhesselberth/devel/valr/src/valr.so, 6): Symbol not found: __ZN5dplyr23DataFrameSubsetVisitorsC1ERKN4Rcpp14DataFrame_ImplINS1_15PreserveStorageEEERKNS1_6VectorILi16ES3_EE
  Referenced from: /Users/jayhesselberth/devel/valr/src/valr.so
  Expected in: flat namespace
 in /Users/jayhesselberth/devel/valr/src/valr.so
Calls: suppressPackageStartupMessages ... <Anonymous> -> load_all -> load_dll -> library.dynam2 -> dyn.load
> devtools::session_info()
Session info ------------------------------------------------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.3.0 (2016-05-03)
 system   x86_64, darwin15.4.0        
 ui       RStudio (0.99.1172)         
 language (EN)                        
 collate  en_US.UTF-8                 
 tz       America/Denver              
 date     2016-05-12                  

Packages ----------------------------------------------------------------------------------------------------------------------------------------------
 package   * version date       source        
 curl        0.9.7   2016-04-10 CRAN (R 3.3.0)
 devtools  * 1.11.1  2016-04-21 CRAN (R 3.3.0)
 digest      0.6.9   2016-01-08 CRAN (R 3.3.0)
 git2r       0.15.0  2016-05-11 CRAN (R 3.3.0)
 htmltools   0.3.5   2016-03-21 CRAN (R 3.3.0)
 httr        1.1.0   2016-01-28 CRAN (R 3.3.0)
 jsonlite    0.9.20  2016-05-10 CRAN (R 3.3.0)
 knitr       1.13    2016-05-09 CRAN (R 3.3.0)
 memoise     1.0.0   2016-01-29 CRAN (R 3.3.0)
 R6          2.1.2   2016-01-26 CRAN (R 3.3.0)
 Rcpp        0.12.4  2016-03-26 CRAN (R 3.3.0)
 rmarkdown   0.9.6   2016-05-01 CRAN (R 3.3.0)
 withr       1.0.1   2016-02-04 CRAN (R 3.3.0)
 yaml        2.1.13  2014-06-12 CRAN (R 3.3.0)

Shiny demo

Develop a demo of valr in shiny.

  • Feature aggregation (e.g., ChIP-seq signal around TSSs).
  • Interval summaries (e.g., by chrom, selectable in DT)
  • Heatmaps of correlations / jaccards in d3heatmap
  • Use flexdashboard to tie it all together

build error

The build fails with the CRAN version of dplyr. The current workaround is to install an alternate version of dplyr from jayhesselberth/dplyr, but this extra functionality should somehow be included in the valr package so that users don't have to overwrite their version of dplyr.

> install.packages("Projects/valr", repos = NULL, type = "source")
Installing package into ‘/Users/dpastling/Library/R/3.3/library’
(as ‘lib’ is unspecified)
* installing *source* package ‘valr’ ...
** libs
clang++ -std=c++11 -I/Library/Frameworks/R.framework/Resources/include -DNDEBUG -I../inst/include -I/usr/local/include -I/usr/local/include/freetype2 -I/opt/X11/include -I"/Users/dpastling/Library/R/3.3/library/Rcpp/include" -I"/Users/dpastling/Library/R/3.3/library/BH/include" -I"/Users/dpastling/Library/R/3.3/library/dplyr/include"   -fPIC  -Wall -mtune=core2 -g -O2 -c RcppExports.cpp -o RcppExports.o
In file included from RcppExports.cpp:4:
In file included from ./../inst/include/valr.h:5:
In file included from /Users/dpastling/Library/R/3.3/library/dplyr/include/dplyr.h:120:
/Users/dpastling/Library/R/3.3/library/dplyr/include/dplyr/GroupedDataFrame.h:58:25: error: use of undeclared identifier 'build_index_cpp'
                data_ = build_index_cpp( data_) ;
                        ^
In file included from RcppExports.cpp:4:
In file included from ./../inst/include/valr.h:5:
In file included from /Users/dpastling/Library/R/3.3/library/dplyr/include/dplyr.h:149:
/Users/dpastling/Library/R/3.3/library/dplyr/include/dplyr/Collecter.h:228:42: error: use of undeclared identifier 'get_time_classes'
            Parent::data.attr("class") = get_time_classes() ;
                                         ^
/Users/dpastling/Library/R/3.3/library/dplyr/include/dplyr/Collecter.h:255:37: error: use of undeclared identifier 'get_time_classes'
            return collapse<STRSXP>(get_time_classes()) ;
                                    ^
/Users/dpastling/Library/R/3.3/library/dplyr/include/dplyr/Collecter.h:365:54: error: use of undeclared identifier 'get_date_classes'
                return new TypedCollecter<INTSXP>(n, get_date_classes()) ;
                                                     ^
/Users/dpastling/Library/R/3.3/library/dplyr/include/dplyr/Collecter.h:371:55: error: use of undeclared identifier 'get_date_classes'
                return new TypedCollecter<REALSXP>(n, get_date_classes()) ;
                                                      ^
/Users/dpastling/Library/R/3.3/library/dplyr/include/dplyr/Collecter.h:396:54: error: use of undeclared identifier 'get_date_classes'
                return new TypedCollecter<INTSXP>(n, get_date_classes() ) ;
                                                     ^
/Users/dpastling/Library/R/3.3/library/dplyr/include/dplyr/Collecter.h:404:55: error: use of undeclared identifier 'get_date_classes'
                return new TypedCollecter<REALSXP>(n, get_date_classes() ) ;
                                                      ^
7 errors generated.
make: *** [RcppExports.o] Error 1
ERROR: compilation failed for package ‘valr’
* removing ‘/Users/dpastling/Library/R/3.3/library/valr’
Warning message:
In install.packages("Projects/valr", repos = NULL, type = "source") :
  installation of package ‘Projects/valr’ had non-zero exit status


> devtools::session_info()
Session info ------------------------------------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.3.0 (2016-05-03)
 system   x86_64, darwin13.4.0        
 ui       AQUA                        
 language (EN)                        
 collate  en_US.UTF-8                 
 tz       America/Denver              
 date     2016-05-14                  

Packages ----------------------------------------------------------------------------------------------------------------------------------
 package  * version date       source        
 devtools * 1.11.1  2016-04-21 CRAN (R 3.3.0)
 digest     0.6.9   2016-01-08 CRAN (R 3.3.0)
 memoise    1.0.0   2016-01-29 CRAN (R 3.3.0)
 withr      1.0.1   2016-02-04 CRAN (R 3.3.0)

support VCF inputs

Deal with naked chroms (i.e. 1 in VCF vs chr1 in BED).

read_vcf <- function(vcf) {
  res <- read_tsv(vcf, col_names = c('chrom', ...)) %>%
    mutate(.chrom = str_c('chr', chrom))

  attr(res, "is_vcf") <- TRUE
  # return the table; otherwise the function returns the assigned value TRUE
  res
}

#downstream methods can compare `chrom == .chrom` if `is_vcf == TRUE`
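The prefixing step can be made safe for inputs that already carry a chr prefix. A minimal sketch in base R, where `normalize_chrom` is a hypothetical helper and not part of valr; it only adds the prefix when it is missing and does not handle other naming differences (for example MT versus chrM).

```r
normalize_chrom <- function(chrom) {
  chrom <- as.character(chrom)
  # naked chroms are those without a leading "chr"
  bare <- !grepl("^chr", chrom)
  chrom[bare] <- paste0("chr", chrom[bare])
  chrom
}

normalized <- normalize_chrom(c("1", "chr2", "X", "chrM"))
```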

NSE for bed_map

Think about how to implement arguments / NSE for bed_map:

bed_map(sum(signal_colname))
bed_map(max(signal_colname))
bed_map(count(signal_colname))
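One way to think about the argument handling is to capture the summary expressions unevaluated and evaluate them once per interval. The following is a base R sketch only, not the bed_map implementation: `summarize_signal` is a hypothetical name, and the input is assumed to be an already-intersected table with one row per x/y overlap.

```r
summarize_signal <- function(tbl, ...) {
  # capture the named expressions in ... without evaluating them
  exprs <- eval(substitute(alist(...)))
  stopifnot(length(exprs) > 0, !is.null(names(exprs)), all(names(exprs) != ""))
  env <- parent.frame()

  # one group per distinct interval, kept in order of first appearance
  key <- paste(tbl$chrom, tbl$start, tbl$end, sep = "\r")
  groups <- split(seq_len(nrow(tbl)), factor(key, levels = unique(key)))

  pieces <- lapply(groups, function(idx) {
    d <- tbl[idx, , drop = FALSE]
    # column names in d take precedence over variables in the caller's env
    vals <- lapply(exprs, function(e) eval(e, d, env))
    cbind(d[1, c("chrom", "start", "end")], as.data.frame(vals))
  })
  res <- do.call(rbind, pieces)
  rownames(res) <- NULL
  res
}

hits <- data.frame(
  chrom = c("chr1", "chr1", "chr2"),
  start = c(100, 100, 250),
  end = c(250, 250, 500),
  value = c(10, 20, 500),
  stringsAsFactors = FALSE
)

res <- summarize_signal(hits, sum = sum(value), max = max(value))
```

Each expression must return a single value per interval for the `cbind` step to work.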

bed_intersect output not compatible with downstream dplyr functions

Jay, thanks for giving me the opportunity to contribute to this project. I noticed that the new implementation of bed_intersect() (85174fa) does not append suffixes (.x or .y) to the name, score, or strand columns. This is problematic when piping the output to dplyr for further analysis, as it results in an error. I don't know Cxx well enough yet, otherwise I would suggest a fix. Thanks!

x <- tibble::frame_data(
  ~chrom, ~start, ~end, ~name, ~score, ~strand,
  "chr1", 500,    1000, '.',   '.',     '+',
  "chr1", 1000,   1500, '.',   '.',     '-',
  "chr2", 1000,   1200, '.',   '.',     '-'
)

y <- tibble::frame_data(
  ~chrom, ~start, ~end, ~name, ~score, ~strand,
  "chr1", 400,    450, '.',   '.',     '+',
  "chr1", 1000,   1200, '.',   '.',     '-',
  "chr1", 1100,    1500, '.',   '.',     '+',
  "chr2", 1300,   1500, '.',   '.',     '-'
)


bed_intersect(x, y) 

Source: local data frame [3 x 12]

  chrom start.x end.x  name score strand start.y end.y  name score strand .overlap
  <chr>   <dbl> <dbl> <chr> <chr>  <chr>   <dbl> <dbl> <chr> <chr>  <chr>    <int>
1  chr1     500  1000     .     .      +    1000  1200     .     .      -        0
2  chr1    1000  1500     .     .      -    1000  1200     .     .      -      200
3  chr1    1000  1500     .     .      -    1100  1500     .     .      +      400

bed_intersect(x,y) %>% select(start.y)
Error: found duplicated column name: name, score, strand

Passing grouping variables to bed_map

Need a way to pass grouping variables to bed_map, e.g., window intervals from makewindows:

tss_intervals %>%
  bed_flank(size = 1000) %>%
  bed_makewindows(win_size = 50, win_id = 'num') %>%
  bed_map(chip_signal, groups = .win_id, sum = sum(value))

bed_map <- function(bed_tbl, signal_tbl, groups, ...) {
  # `...` may appear only once in the formals; the pipeline result is returned
  bed_intersect(bed_tbl, signal_tbl, suffix_y = '') %>%
    group_by_(groups) %>%
    summarize_(.dots = lazyeval::lazy_dots(...)) %>%
    rename(start = start.x, end = end.x)
}

Need a more flexible way to specify the default grouping by chrom, start.x, end.x. Maybe have a unique ID for each intersection and group_by that?

Pure R impl of complement

library(dplyr)

genome <- tibble::tribble(
  ~chrom, ~size,
  "chr1", 100000,
  "chr2", 200000
)

bed_tbl <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr1", 50,     250,
  "chr1", 500,    1000,
  "chr2", 1,      1000,
  "chr2", 2000,   5000
)

lags <- bed_tbl %>% group_by(chrom) %>% mutate(.prev_end = lag(end))

first <- lags %>% filter(is.na(.prev_end) & start > 1) %>%
  mutate(.start = 1, .end = start) %>%
  mutate(start = .start, end = .end) %>%
  select(-.prev_end, -.start, -.end)

internal <- lags %>%
  filter(!is.na(.prev_end)) %>%
  mutate(.start = .prev_end, .end = start) %>%
  mutate(start = .start, end = .end) %>%
  select(-.prev_end, -.start, -.end)

final <- lags %>%
  summarize(max.end = max(end)) %>%
  left_join(genome, by='chrom') %>%
  filter(size != max.end) %>%
  mutate(start = max.end, end = size) %>%
  select(-size, -max.end)
#> Joining by: "chrom"

compl <- bind_rows(list(first, internal, final)) %>% arrange(chrom, start)

compl
#> Source: local data frame [5 x 3]
#> 
#>   chrom start   end
#>   (chr) (dbl) (dbl)
#> 1  chr1     1 5e+01
#> 2  chr1   250 5e+02
#> 3  chr1  1000 1e+05
#> 4  chr2  1000 2e+03
#> 5  chr2  5000 2e+05

bed_map tests

test_that("ops on y columns work on original names (#14)")

package name ideas

Suggestions for names.

Ideally package names are pronounceable. Rbedtools doesn't exactly roll off the tongue.

bedr ("better") is taken.

  • valr - manipulates intervals
  • the alternative, rinter, sounds like "winter" with a speech impediment
  • bred - mixing R and bed tools.
  • berd - uhh ...

bed_intersect inverse returns 0 rows

df.1 <- frame_data(
  ~chrom, ~start, ~end,
  "A",   1,      100,
  "B",   50,     150
)

df.2 <- frame_data(
  ~chrom, ~start, ~end,
  "A",    1,       25,
  "B",    45,      85
)

intersection <- bed_intersect(df.1, df.2)
intersection
#> Source: local data frame [2 x 6]
#> 
#>   chrom start.x end.x start.y end.y .overlap
#>   <chr>   <dbl> <dbl>   <dbl> <dbl>    <int>
#> 1     A       1   100       1    25       24
#> 2     B      50   150      45    85       35

reverse <- bed_intersect(df.1, df.2, invert = TRUE)
reverse
#> Source: local data frame [0 x 3]
#> Groups: chrom [2]
#> 
#> Variables not shown: chrom <chr>, start <dbl>, end <dbl>.

glyph function

Make this pretty and then use it throughout the vignettes. It would be helpful to add x, y, and .fun labels to the plot.

library(ggplot2)
library(dplyr)
library(tibble)
library(valr)

bed_glyph <- function(x, y, .fun, ...) {

  x <- mutate(x, bin = 3)
  y <- mutate(y, bin = 2)

  res <- eval(.fun(x, y, ...)) %>% mutate(bin = 1)

  comb <- bind_rows(x, y, res) 

  ggplot(comb) + 
    geom_rect(aes(xmin = start, xmax = end,
                  ymin = bin, ymax = bin + 0.9,
                  fill= bin)) + theme_bw()

}

x <- tribble(
  ~chrom, ~start, ~end,
  'chr1',      1,      100
)

y <- tribble(
  ~chrom, ~start, ~end,
  'chr1',      50,     75
)

bed_glyph(x, y, bed_subtract)

duplicate intervals in bed_intersect

x <- tibble::frame_data(
~chrom, ~start, ~end,
"chr1", 100,    500,
"chr1", 175,    200
)

y <- tibble::frame_data(
~chrom, ~start, ~end,
"chr1", 150,    400,
"chr1", 151,    401
)

bed_intersect(x, y)
#> Source: local data frame [6 x 6]
#> 
#>   chrom start.x end.x start.y end.y .overlap
#>   <chr>   <dbl> <dbl>   <dbl> <dbl>    <int>
#> 1  chr1     100   500     150   400      250
#> 2  chr1     100   500     151   401      250
#> 3  chr1     175   200     150   400       25
#> 4  chr1     175   200     151   401       25
#> 5  chr1     175   200     150   400       25
#> 6  chr1     175   200     151   401       25

error in bed_flank vignette example

Error: Duplicate identifiers for rows (1284808, 1498582), (3284808, 3498582), (1386858, 1732629), (3386858, 3732629), (1136478, 1735205), (3136478, 3735205), (1256268, 1422468), (3256268, 3422468), (1298131, 1653327), (3298131, 3653327), (1628578, 1849474), (3628578, 3849474), (284808, 498582), (2284808, 2498582), (386858, 732629), (2386858, 2732629), (136478, 735205), (2136478, 2735205), (256268, 422468), (2256268, 2422468), (298131, 653327), (2298131, 2653327), (628578, 849474), (2628578, 2849474)
8. stop("Duplicate identifiers for rows ", paste(str, collapse = ", "), call. = FALSE)
7. spread_.data.frame(data, key_col, value_col, fill = fill, convert = convert, drop = drop, sep = sep)
6. NextMethod()
5. as_data_frame(NextMethod())
4. spread_.tbl_df(data, key_col, value_col, fill = fill, convert = convert, drop = drop, sep = sep)
3. spread_(data, key_col, value_col, fill = fill, convert = convert, drop = drop, sep = sep)
2. tidyr::spread(res, key, value) at bed_flank.r#121
1. bed_flank(x, genome, both = 100)

differing numbers of chromosomes prevents intersections (bed_intersect)

Intersections are not reported by bed_intersect when x and y contain different numbers of chromosomes.

x <- tibble::frame_data(
  ~chrom, ~start, ~end,
  "chr2", 100,    500
)

y <- tibble::frame_data(
  ~chrom, ~start, ~end,
  "chr1", 10,     20,
  "chr2", 100,    500
)

bed_intersect(x, y)
#> # A tibble: 0 x 6
#> # ... with 6 variables: chrom <chr>, start.x <dbl>, end.x <dbl>,
#> #   start.y <dbl>, end.y <dbl>, .overlap <int>
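As a reference for the expected result, a naive intersection can be sketched in base R by pairing intervals that share a chromosome. This is a hypothetical helper (`naive_intersect`), not valr's Rcpp implementation; it is quadratic per chromosome and, as an assumption of the sketch, does not report book-ended intervals.

```r
naive_intersect <- function(x, y) {
  # all x/y pairs that share a chromosome; chroms found in only one
  # table simply contribute no pairs
  pairs <- merge(x, y, by = "chrom", suffixes = c(".x", ".y"))
  # half-open intervals overlap when each starts before the other ends
  hits <- pairs[pairs$start.x < pairs$end.y & pairs$start.y < pairs$end.x, ]
  hits$.overlap <- pmin(hits$end.x, hits$end.y) - pmax(hits$start.x, hits$start.y)
  rownames(hits) <- NULL
  hits
}

x <- data.frame(chrom = "chr2", start = 100, end = 500,
                stringsAsFactors = FALSE)
y <- data.frame(chrom = c("chr1", "chr2"), start = c(10, 100),
                end = c(20, 500), stringsAsFactors = FALSE)

hits <- naive_intersect(x, y)
```

For the inputs in this report the sketch yields one chr2 row with an overlap of 400, which is the result bed_intersect would be expected to return.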

make column name parameters on the Rcpp side

There are many places where chrom, start and end are hard-coded on the Rcpp side. Could make these into parameters that could be passed.

OTOH, this doesn't have to change and we just enforce the existence of these names on the R side.
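Enforcing the names on the R side could look like the following sketch. `check_interval_cols` is a hypothetical helper, not an existing valr function; it would run before any table is handed to the compiled code.

```r
check_interval_cols <- function(tbl, required = c("chrom", "start", "end")) {
  missing_cols <- setdiff(required, names(tbl))
  if (length(missing_cols) > 0) {
    stop("missing required column(s): ",
         paste(missing_cols, collapse = ", "), call. = FALSE)
  }
  # return the input invisibly so the check can sit inside a pipeline
  invisible(tbl)
}

good <- data.frame(chrom = "chr1", start = 1, end = 10)
bad <- data.frame(chr = "chr1", start = 1, end = 10)

ok <- check_interval_cols(good)
msg <- tryCatch(check_interval_cols(bad),
                error = function(e) conditionMessage(e))
```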

build fails with forked version of dplyr

The build fails with the forked version of dplyr (jayhesselberth/dplyr).

> devtools::install_github('jayhesselberth/dplyr')
> devtools::install_github('eddelbuettel/BH')
> devtools::install_github('jayhesselberth/valr')

...

g++ -std=c++0x \
    -I/vol4/home/astlingd/R/lib64/R/include \
    -DNDEBUG \
    -I../inst/include \
    -I/usr/local/include \
    -I"/vol4/home/astlingd/R/lib64/R/library/Rcpp/include" \
    -I"/vol4/home/astlingd/R/lib64/R/library/BH/include" \
    -I"/vol4/home/astlingd/R/lib64/R/library/dplyr/include" \
    -fpic  -g -O2 -c RcppExports.cpp -o RcppExports.o
In file included from ../inst/include/valr.h:6,
                 from RcppExports.cpp:4:
../inst/include/IntervalTree.h: In constructor ‘IntervalTree<T, K>::IntervalTree()’:
../inst/include/IntervalTree.h:63: error: ‘nullptr’ was not declared in this scope
../inst/include/IntervalTree.h: In copy constructor ‘IntervalTree<T, K>::IntervalTree(const IntervalTree<T, K>&)’:
../inst/include/IntervalTree.h:76: error: ‘nullptr’ was not declared in this scope
../inst/include/IntervalTree.h: In member function ‘IntervalTree<T, K>& IntervalTree<T, K>::operator=(const IntervalTree<T, K>&)’:
../inst/include/IntervalTree.h:87: error: ‘nullptr’ was not declared in this scope
../inst/include/IntervalTree.h: In constructor ‘IntervalTree<T, K>::IntervalTree(std::vector<Interval<T, K>, std::allocator<Interval<T, K> > >&, size_t, size_t, K, K, size_t)’:
../inst/include/IntervalTree.h:101: error: ‘nullptr’ was not declared in this scope
../inst/include/IntervalTree.h:140: error: parse error in template argument list
make: *** [RcppExports.o] Error 1
ERROR: compilation failed for package ‘valr’
* removing ‘/vol4/home/astlingd/R/lib64/R/library/valr’
Error: Command failed (1)


> devtools::session_info()
Session info -------------------------------------------------------------------
 setting  value                       
 version  R version 3.2.3 (2015-12-10)
 system   x86_64, linux-gnu           
 ui       X11                         
 language (EN)                        
 collate  en_US.UTF-8                 
 tz       <NA>                        
 date     2016-05-14                  

Packages -----------------------------------------------------------------------
 package  * version date       source        
 curl       0.9.7   2016-04-10 CRAN (R 3.2.3)
 devtools * 1.11.1  2016-04-21 CRAN (R 3.2.3)
 digest     0.6.9   2016-01-08 CRAN (R 3.2.2)
 git2r      0.14.0  2016-03-13 CRAN (R 3.2.3)
 httr       1.1.0   2016-01-28 CRAN (R 3.2.3)
 memoise    1.0.0   2016-01-29 CRAN (R 3.2.3)
 R6         2.1.2   2016-01-26 CRAN (R 3.2.3)
 withr      1.0.1   2016-02-04 CRAN (R 3.2.3)

makewindows ids

Pretty sure the win_id param is not needed.

# name
bed_makewindows(x, genome, win_size = 10) %>% group_by(name)
# num
bed_makewindows(x, genome, win_size = 10) %>% group_by(win_id)
# namenum
bed_makewindows(x, genome, win_size = 10) %>% group_by(name, win_id)

implement bed_shuffle

Implement this on the Rcpp side. Interval trees are needed for -incl and -excl. Also look at the recent roulette sampling strategy in bedtools.

Strategy:

  1. Identify new starts within the incl and excl bounds.
  2. Calculate original sizes from passed intervals and add to the random starts.
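The two steps can be sketched in base R for the simplest case. This is a rough sketch under assumptions, not the planned Rcpp implementation: `shuffle_intervals` is a hypothetical name, each interval stays on its original chromosome, the only bounds are the chromosome sizes, and the incl and excl constraints are not handled.

```r
shuffle_intervals <- function(x, genome) {
  # original sizes are preserved
  sizes <- x$end - x$start
  chrom_size <- genome$size[match(x$chrom, genome$chrom)]
  stopifnot(!anyNA(chrom_size), all(sizes <= chrom_size))

  # latest start that still keeps the interval inside its chromosome
  max_start <- chrom_size - sizes
  new_start <- floor(runif(nrow(x), min = 0, max = max_start + 1))
  new_start <- pmin(new_start, max_start)

  data.frame(chrom = x$chrom, start = new_start, end = new_start + sizes,
             stringsAsFactors = FALSE)
}

genome <- data.frame(chrom = c("chr1", "chr2"), size = c(1000, 500),
                     stringsAsFactors = FALSE)
x <- data.frame(chrom = c("chr1", "chr1", "chr2"),
                start = c(0, 100, 50), end = c(10, 400, 75),
                stringsAsFactors = FALSE)

set.seed(1)
shuffled <- shuffle_intervals(x, genome)
```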
