bioconductor / genomicranges

Representation and manipulation of genomic intervals

Home Page: https://bioconductor.org/packages/GenomicRanges

R 97.95% C 1.70% TeX 0.35%

genomicranges's Introduction

GenomicRanges is an R/Bioconductor package for representing and manipulating genomic intervals.

See https://bioconductor.org/packages/GenomicRanges for more information including how to install the release version of the package (please refrain from installing directly from GitHub).

genomicranges's People

alexandremkuhn lwaldron ltla balwierz aelmas biolchen kayla-morrell jverploegen ditag matthieurouland vjcitn hsadia538 neurogenomics seven1112233 sonali8434 raziafrooz namhoangubs

genomicranges's Issues

Making seqinfo<- easier to use with new levels

This works:

library(GenomicRanges)
len <- c(chrB=10, chrA=20)
x <- GRanges("chrA:1-10", seqinfo=Seqinfo(seqnames=names(len), seqlengths=len))

This doesn't:

x <- GRanges("chrA:1-10")
seqinfo(x) <- Seqinfo(seqnames=names(len), seqlengths=len)
## Error in GenomeInfoDb:::makeNewSeqnames(x, new2old = new2old, seqlevels(value)) :
##   when 'new2old' is NULL, the first elements in the
##   supplied 'seqlevels' must be identical to 'seqlevels(x)'

What gives? Everything is named so there should be no problem defining a mapping automatically.

So currently I have to take a weird and unintuitive detour to do what I need to do:

seqlevels(x) <- names(len)
seqlengths(x) <- len
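A one-step alternative, assuming the installed GenomeInfoDb supports the `new2old` argument of `seqinfo<-` (it is documented there, but worth checking for your version): `new2old` gives, for each new seqlevel, the index of the old seqlevel it corresponds to (NA for a brand-new level), and `match()` can build that mapping from the names.

```r
library(GenomicRanges)

len <- c(chrB = 10, chrA = 20)
x <- GRanges("chrA:1-10")

# For each new seqlevel, the index of the matching old seqlevel (NA = new level)
new2old <- match(names(len), seqlevels(x))
seqinfo(x, new2old = new2old) <- Seqinfo(seqnames = names(len), seqlengths = len)

seqinfo(x)
```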

GenomicRanges::findKNN masks BiocNeighbors::findKNN

Or vice versa. It's not a disaster, as people can always disambiguate with ::. But we have an opportunity to avoid the problem altogether with a different name, given that GenomicRanges::findKNN was only exported in this devel cycle. findKNearest() would fit thematically with the names of other functions, e.g., nearest(), distanceToNearest().

annoying warning when `as.data.frame(..., stringsAsFactors=F)`-ing on GRanges object

require(GenomicRanges)
g1 <- GRanges(data.frame(seqnames="Chr1",
                         strand="+",
                         start=c(2,2,2,4,5,6),
                         end=c(24,25,25,25,25,21),
                         m1=c("a","a","a","b","b","b"),
                         m2=c("A","B","A","B","A","B"),
                         m3=c(1,2,3,4,5,6), stringsAsFactors=F))
as.data.frame(g1, stringsAsFactors=F)
##   seqnames start end width strand m1 m2 m3
## 1     Chr1     2  24    23      +  a  A  1
## 2     Chr1     2  25    24      +  a  B  2
## 3     Chr1     2  25    24      +  a  A  3
## 4     Chr1     4  25    22      +  b  B  4
## 5     Chr1     5  25    21      +  b  A  5
## 6     Chr1     6  21    16      +  b  B  6
## Warning message:
## In as.data.frame(mcols(x), ...) : Arguments in '...' ignored

The warning is annoying, and in any case the default behaviour is already stringsAsFactors=FALSE.
So why not suppress it when stringsAsFactors=FALSE is passed?
For stringsAsFactors=TRUE, the warning is useful to know about.

no convenient SimpleGRangesList to CompressedGRangesList as function

Hi Hervé, @hpages

Shouldn't there be an as() method for coercing a SimpleGRangesList to a
CompressedGRangesList?

I get an error when using this code:

suppressPackageStartupMessages({
    library(GenomicRanges)
})
example(GRangesList, echo = FALSE)

grl <- as(list(gr1, gr2), "GRangesList")
GRangesList(grl)
#> Error in as(objects[[1L]], "CompressedGRangesList", strict = FALSE): no method or default for coercing "SimpleGRangesList" to "CompressedGRangesList"

sessionInfo()
#> R version 3.6.1 Patched (2019-10-04 r77258)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.3 LTS
#> 
#> Matrix products: default
#> BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
#> [8] methods   base     
#> 
#> other attached packages:
#> [1] GenomicRanges_1.37.16 GenomeInfoDb_1.21.2   IRanges_2.19.16      
#> [4] S4Vectors_0.23.25     BiocGenerics_0.31.6  
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.2             digest_0.6.21          bitops_1.0-6          
#>  [4] magrittr_1.5           evaluate_0.14          highr_0.8             
#>  [7] zlibbioc_1.31.0        rlang_0.4.0            stringi_1.4.3         
#> [10] XVector_0.25.0         rmarkdown_1.16         tools_3.6.1           
#> [13] stringr_1.4.0          RCurl_1.95-4.12        xfun_0.10             
#> [16] yaml_2.2.0             compiler_3.6.1         htmltools_0.4.0       
#> [19] knitr_1.25             GenomeInfoDbData_1.2.1

Created on 2019-10-09 by the reprex package (v0.3.0)

The simple solution would be to use as(list(gr1, gr2), "CompressedGRangesList")
but I don't think many end users are aware of this distinction.

This is in reference to Bioconductor/RaggedExperiment#24.

Thanks,
Marcel
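For readers hitting this, a minimal sketch of two ways to end up with a CompressedGRangesList directly (the toy ranges are illustrative): the GRangesList() constructor compresses by default, and the coercion from a plain list mentioned above also works.

```r
library(GenomicRanges)

gr1 <- GRanges("chr1", IRanges(c(1, 20), width = 5))
gr2 <- GRanges("chr2", IRanges(100, width = 10))

# The constructor returns a CompressedGRangesList unless compress = FALSE
grl_a <- GRangesList(gr1, gr2)

# Coercing a plain list straight to the compressed class
grl_b <- as(list(gr1, gr2), "CompressedGRangesList")

class(grl_a)
class(grl_b)
```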

Errors attempting GenomicRanges:::findKNN()

Goal: given a set of genomic positions, return the k nearest genes with the corresponding distance (TSS to genomic position). I'm attempting to use the unexported, exploratory function findKNN(), as suggested by @lawremi.

Input:

>positions
GRanges object with 295644 ranges and 1 metadata column:
           seqnames            ranges strand |      E123
              <Rle>         <IRanges>  <Rle> | <numeric>
       [1]    chr19             1-200      * |         0
       [2]    chr19           201-400      * |         0
       [3]    chr19           401-600      * |         0
       [4]    chr19           601-800      * |         0
       [5]    chr19          801-1000      * |         0
       ...      ...               ...    ... .       ...
  [295640]    chr19 59127801-59128000      * |         0
  [295641]    chr19 59128001-59128200      * |         0
  [295642]    chr19 59128201-59128400      * |         0
  [295643]    chr19 59128401-59128600      * |         0
  [295644]    chr19 59128601-59128800      * |         0
  -------
  seqinfo: 23 sequences from an unspecified genome

The positions RDS object can be downloaded here. It includes the positions for 263 cell types, but the example above only includes one.

>tss
GRanges object with 18436 ranges and 1 metadata column:
          seqnames    ranges strand |   gene_name
             <Rle> <IRanges>  <Rle> | <character>
    OR4F5     chr1     69091      + |       OR4F5
   SAMD11     chr1    860260      + |      SAMD11
   KLHL17     chr1    895967      + |      KLHL17
  PLEKHN1     chr1    901877      + |     PLEKHN1
    ISG15     chr1    948803      + |       ISG15
      ...      ...       ...    ... .         ...
  MTCP1NB     chrX 154299637      - |     MTCP1NB
    MTCP1     chrX 154376212      - |       MTCP1
   RAB39B     chrX 154493874      - |      RAB39B
    CLIC2     chrX 154563966      - |       CLIC2
    TMLHE     chrX 154899605      - |       TMLHE
  -------
  seqinfo: 23 sequences from hg19 genome

The tss RDS object can be downloaded here.

Error:

>GenomicRanges:::findKNN(positions, tss, k = 2) # get the 2 nearest genes for every location in positions with distance 

Error in pc(extractList(start(starts), pwindows) - end(query), end(query) -  : 
  all the objects to combine must have the same length

However, if I swap the query and subject, the error disappears, but I'm left with a list of empty elements, one per gene:

> GenomicRanges:::findKNN(tss, positions, k = 2)
IntegerList of length 18436
[[1]] integer(0)
[[2]] integer(0)
[[3]] integer(0)
[[4]] integer(0)
[[5]] integer(0)
[[6]] integer(0)
[[7]] integer(0)
[[8]] integer(0)
[[9]] integer(0)
[[10]] integer(0)
...
<18426 more elements>

Relevant Bioconductor threads:

Any help would be much appreciated!

Session info

> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.3

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] doParallel_1.0.15    iterators_1.0.12     foreach_1.4.8        caret_6.0-85         lattice_0.20-38      GenomicRanges_1.38.0 GenomeInfoDb_1.22.0  IRanges_2.20.2       S4Vectors_0.24.3     BiocGenerics_0.32.0  forcats_0.4.0       
[12] stringr_1.4.0        dplyr_0.8.4          purrr_0.3.3          readr_1.3.1          tidyr_1.0.2          tibble_2.1.3         ggplot2_3.2.1        tidyverse_1.3.0     

loaded via a namespace (and not attached):
 [1] httr_1.4.1             jsonlite_1.6.1         splines_3.6.2          prodlim_2019.11.13     modelr_0.1.5           assertthat_0.2.1       GenomeInfoDbData_1.2.2 cellranger_1.1.0       yaml_2.2.1             ipred_0.9-9           
[11] pillar_1.4.3           backports_1.1.5        glue_1.3.1             pROC_1.16.1            XVector_0.26.0         rvest_0.3.5            colorspace_1.4-1       recipes_0.1.9          Matrix_1.2-18          plyr_1.8.5            
[21] timeDate_3043.102      pkgconfig_2.0.3        broom_0.5.4            haven_2.2.0            zlibbioc_1.32.0        scales_1.1.0           gower_0.2.1            lava_1.6.6             generics_0.0.2         pacman_0.5.1          
[31] withr_2.1.2            nnet_7.3-12            lazyeval_0.2.2         cli_2.0.1              survival_3.1-8         magrittr_1.5           crayon_1.3.4           readxl_1.3.1           fs_1.3.1               fansi_0.4.1           
[41] nlme_3.1-144           MASS_7.3-51.5          xml2_1.2.2             class_7.3-15           data.table_1.12.8      tools_3.6.2            hms_0.5.3              lifecycle_0.1.0        munsell_0.5.0          reprex_0.3.0          
[51] compiler_3.6.2         rlang_0.4.4            grid_3.6.2             RCurl_1.98-1.1         rstudioapi_0.11        bitops_1.0-6           ModelMetrics_1.2.2.1   gtable_0.3.0           codetools_0.2-16       DBI_1.1.0             
[61] reshape2_1.4.3         R6_2.4.1               lubridate_1.7.4        stringi_1.4.6          Rcpp_1.0.3             vctrs_0.2.3            rpart_4.1-15           dbplyr_1.4.2           tidyselect_1.0.0
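Until findKNN() stabilises, one workaround for small k is to collect candidate genes within a fixed window using findOverlaps(maxgap=) and rank them by distance(). This is a sketch, not the package's method: `knn_within()` is a hypothetical helper, the 1 Mb default window is an arbitrary assumption, queries with fewer than k genes inside the window simply get fewer rows, and ties are broken by subject order.

```r
library(GenomicRanges)

# Hypothetical helper: up to k nearest subject ranges per query, within maxgap bp
knn_within <- function(query, subject, k = 2L, maxgap = 1e6L) {
  hits <- findOverlaps(query, subject, maxgap = maxgap, ignore.strand = TRUE)
  d <- distance(query[queryHits(hits)], subject[subjectHits(hits)],
                ignore.strand = TRUE)
  res <- data.frame(query = queryHits(hits), subject = subjectHits(hits),
                    distance = d)
  # Sort by query, then by distance, and rank candidates within each query
  res <- res[order(res$query, res$distance), ]
  rnk <- ave(res$distance, res$query, FUN = seq_along)
  res[rnk <= k, , drop = FALSE]
}
```

With the objects above, the call would be `knn_within(positions, tss, k = 2)`.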

findOverlaps with type="equal" and a GRangesList

When I use findOverlaps with type="equal" and a GRangesList I get an error:

> findOverlaps(gr, grIntrons.24, type = "equal")
Error in match.arg(type) : 
  'arg' should be one of “any”, “start”, “end”, “within”

(in this case gr is a GRanges and grIntrons.24 is a GRangesList).

This is made even more confusing by the generic

> findOverlaps
standardGeneric for "findOverlaps" defined from package "IRanges"

function (query, subject, maxgap = -1L, minoverlap = 0L, type = c("any", 
    "start", "end", "within", "equal"), select = c("all", "first", 
    "last", "arbitrary"), ...) 
standardGeneric("findOverlaps")
<bytecode: 0x7f8a47c5aea8>
<environment: 0x7f8a48e3a1e0>
Methods may be defined for arguments: query, subject
Use  showMethods("findOverlaps")  for currently available ones.

which strongly suggests type="equal" is valid.
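Until this is resolved, one way to get "equal" matches against the individual ranges inside a GRangesList is to unlist it, query the flat GRanges, and map the hits back to list elements. This assumes "equal" is meant per inner range rather than against each list element as a whole; the toy objects are made up.

```r
library(GenomicRanges)

gr <- GRanges("chr1", IRanges(c(1, 50), c(10, 60)))
grl <- GRangesList(
  a = GRanges("chr1", IRanges(c(1, 20), c(10, 30))),
  b = GRanges("chr1", IRanges(50, 60))
)

flat <- unlist(grl, use.names = FALSE)
group <- rep(seq_along(grl), lengths(grl))  # list element of each flat range

hits <- findOverlaps(gr, flat, type = "equal")
res <- data.frame(query = queryHits(hits),
                  list_element = group[subjectHits(hits)])
res
```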

inner and outer mcols for GRangesFactor

It would be desirable for GRangesFactor to consistently transmit the information in the mcols of its levels when providing GRanges-like semantics. In particular:

example(GRangesFactor)
levels(grf1)$whee <- 1:6

# It would be nice to see 'whee' mentioned somewhere.
show(grf1)

# Needs ignore.mcols=TRUE to get 'whee', but then it loses 'ID'.
unfactor(grf1) 

# Doesn't know any better, so just returns NULL.
grf1$whee

One of the appeals of using a GRangesFactor in the first place is that we can stuff loads of content into the mcols of the levels without inflating the overall object. While setting is easy with mcols(levels(x))<-, we should consider ways of making it easy to get back the "expanded" level metadata without needing the wordy levels(x)$whee[as.integer(x)]. For example:

show could just list the metadata fields in the levels, even if it doesn't show them.
unfactor's default could try to include both level and element-wise metadata (i.e., mcols(x)), throwing a warning if the latter overrides the former by name.
$ could try to get the expanded level metadata if name does not exist in the element-wise metadata.

This whole situation is similar to the inner/outer mcols retrieval issue for GRLs. It would probably be desirable to be consistent across these two classes - though arguably, GRLs are not a great example because the inner mcols are shown but so hard to get (what a tease!).

Feature request: make intra- and inter- range transformations work on GPos objects

See https://support.bioconductor.org/p/113391/

Enhancement: Promoters Calculated Don't Overlap with Other Genes

I am trying to reproduce a published study by reimplementing the algorithm. The algorithm takes 200000 bases, the average length of a Topologically Associated Domain (TAD), upstream of the TSS and downstream of the TES. GenomicRanges has a function named promoters. It would be nice to have a setting such as nonOverlapping = TRUE that would extend the promoter region upstream until it reaches another gene's exon in the GRangesList object or 200000 bases, whichever comes first.

Some fixes I would like

Two fixes I would like implemented.

First, GenomicRanges::sort() for GRangesList should have an ignore.strand argument (just like the one for GRanges objects), so that you can sort following either the GRanges convention (minus-strand ranges with the highest start first in the group) or the BED convention (minus-strand ranges with the highest start last in the group).

Second, GenomicRanges::sort() for GRangesList is also waaay too slow. In my pipeline, with a GRangesList of 76k transcripts, sorting the output GRangesList alone takes 99% of the total run time, which is not good.

This example sorts a million GRanges groups in under 1 second by ordering with data.table (DT[order(...)]):

library(data.table)
library(GenomicRanges)
DT <- as.data.table(grl)
asgrl <- makeGRangesListFromDataFrame(
    DT[order(group, start)], split.field = "group",
    names.field = "group_name", keep.extra.columns = TRUE)
names(asgrl) <- names(grl)

Of course, this is probably more fragile, and the example only handles "+" strands; for "-" strands, start must be ordered decreasingly (e.g. order(group, -start)).
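A sketch of the same idea using only GenomicRanges: unlist once, order by (element, start) with a single vectorised order() call, and relist. `sort_within()` and its `decreasing_minus` flag are hypothetical names; with `decreasing_minus = TRUE`, minus-strand ranges are ordered by decreasing start, which covers the "-" strand case above. It ignores seqnames, so it assumes each element sits on a single chromosome.

```r
library(GenomicRanges)

# Hypothetical helper: sort the ranges inside each element of a GRangesList
sort_within <- function(grl, decreasing_minus = TRUE) {
  flat <- unlist(grl, use.names = FALSE)
  group <- rep(seq_along(grl), lengths(grl))
  s <- start(flat)
  if (decreasing_minus) {
    # Negate start on the minus strand so order() puts the highest start first
    s <- ifelse(as.character(strand(flat)) == "-", -s, s)
  }
  o <- order(group, s)  # group is the primary key, so element sizes are preserved
  out <- relist(flat[o], grl)
  names(out) <- names(grl)
  mcols(out) <- mcols(grl)
  out
}
```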

Bug in follow and precede

In my code, I want to get the row of the y range that comes after the x range by using the code below. But when I run the code, I get the row of the y range that comes before the x range. Precede gives the opposite.

library(GenomicRanges)

y <- data.frame('seqnames' = c('chr11', 'chr11'), 'start' = c(62789197, 62789292), 'end' = c(62789210, 62789426), strand = c('+', '+'))

x <- data.frame('seqnames' = 'chr11', 'start' = 62789211, 'end' = 62789271, strand = '+')

x <- GenomicRanges::makeGRangesFromDataFrame(x, ignore.strand = FALSE)
y <- GenomicRanges::makeGRangesFromDataFrame(y, ignore.strand = FALSE)

# Reduce x to its end position and y to its start positions
x <- GenomicRanges::GRanges(seqnames(x), IRanges(end(x),   end(x)),   strand = strand(x))
y <- GenomicRanges::GRanges(seqnames(y), IRanges(start(y), start(y)), strand = strand(y))

IRanges::follow(x, y, select = 'last', ignore.strand = FALSE)
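Judging from the GenomicRanges help page for these functions (see ?precede), the names describe where x sits, not where the subject sits: precede(x, y) returns the y range that x precedes, i.e. the one downstream of x, while follow(x, y) returns the y range that x follows, i.e. the one upstream of x. A minimal check with the positions from the report, on the + strand:

```r
library(GenomicRanges)

# Single-base positions from the report: the end of x and the starts of y
x <- GRanges("chr11", IRanges(62789271, width = 1), strand = "+")
y <- GRanges("chr11", IRanges(c(62789197, 62789292), width = 1), strand = "+")

precede(x, y)  # 2: x precedes y[2], so y[2] is downstream of x
follow(x, y)   # 1: x follows y[1], so y[1] is upstream of x
```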

Unit tests succeed but for the wrong reason

The test at https://github.com/Bioconductor/GenomicRanges/blob/master/inst/unitTests/test_GRangesList-class.R#L103, and the one immediately following it, only pass because the code errors out for an unrelated reason: it tries to assign to an expression (GRangesList()) rather than to an existing object.

> start(GRangesList()) = NULL
Error in start(GRangesList()) = NULL : 
  invalid (NULL) left side of assignment

Probably the intention was

> grl = GRangesList()
> start(grl) = NULL
Error in `start<-`(`*tmp*`, value = NULL) : 
  replacement 'value' is not an IntegerList with the same elementNROWS as 'x'
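A sketch of the check as it was probably intended, in the RUnit style used by the package's inst/unitTests; it relies only on the error shown above now being raised by start<- itself.

```r
library(GenomicRanges)
library(RUnit)

grl <- GRangesList()
# The error now comes from `start<-` rejecting NULL, not from R's assignment rules
checkException(start(grl) <- NULL, silent = TRUE)
```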

Error in mapToTranscripts

Hi,
When I use the function mapToTranscripts, I get an error:

> mapToTranscripts(x, GR)
Error in .normargSeqlevels(seqnames) :
  supplied 'seqlevels' cannot contain NAs or empty strings ("")

However, table(is.na(seqlevels(GR))) and table(seqlevels(GR) == "") indicate that there are no NAs or empty strings in the seqlevels.
Traceback:

12: stop(errmsg)
11: .normargSeqlevels(seqnames)
10: Seqinfo(names(seqlengths), seqlengths)
9: new_GRanges("GRanges", seqnames = seqnames, ranges = ranges,
       strand = strand, mcols = mcols, seqlengths = seqlengths,
       seqinfo = seqinfo)
8: GRanges(names(transcripts)[transcriptsHits], xrange, strand(flat)[txHits],
       df, seqlengths = seqlengths)
7: .mapToTranscripts(x, transcripts, hits, ignore.strand, intronJunctions)
6: .local(x, transcripts, ...)
5: mapToTranscripts(x, grl, ignore.strand)
4: mapToTranscripts(x, grl, ignore.strand)
3: .local(x, transcripts, ...)
2: mapToTranscripts(x, GR)
1: mapToTranscripts(x, GR)

Supplementary information

> str(GR)
Formal class 'GRanges' [package "GenomicRanges"] with 7 slots
  ..@ seqnames       :Formal class 'Rle' [package "S4Vectors"] with 4 slots
  .. .. ..@ values         : Factor w/ 440 levels "NC_000001.11",..: 1 2 3 4 5 6 7 8 9 10 ...
  .. .. ..@ lengths        : int [1:435] 16474 12488 10350 7126 7287 8488 7809 6504 7222 7837 ...
  .. .. ..@ elementMetadata: NULL
  .. .. ..@ metadata       : list()
  ..@ ranges         :Formal class 'IRanges' [package "IRanges"] with 6 slots
  .. .. ..@ start          : int [1:178581] 11874 29926 30366 30438 69091 131068 182388 200442 487055 722046 ...
  .. .. ..@ width          : int [1:178581] 2536 1370 138 21 918 3769 2491 3542 3765 3777 ...
  .. .. ..@ NAMES          : chr [1:178581] "NR_046018.2" "XR_001737835.1" "NR_036051.1" "rna-MIR1302-2" ...
  .. .. ..@ elementType    : chr "ANY"
  .. .. ..@ elementMetadata: NULL
  .. .. ..@ metadata       : list()
  ..@ strand         :Formal class 'Rle' [package "S4Vectors"] with 4 slots
  .. .. ..@ values         : Factor w/ 3 levels "+","-","*": 1 2 1 2 1 2 1 2 1 2 ...
  .. .. ..@ lengths        : int [1:715] 8550 7924 6458 6030 5145 5205 3683 3443 3703 3584 ...
  .. .. ..@ elementMetadata: NULL
  .. .. ..@ metadata       : list()
  ..@ seqinfo        :Formal class 'Seqinfo' [package "GenomeInfoDb"] with 4 slots
  .. .. ..@ seqnames   : chr [1:440] "NC_000001.11" "NC_000002.12" "NC_000003.12" "NC_000004.12" ...
  .. .. ..@ seqlengths : int [1:440] NA NA NA NA NA NA NA NA NA NA ...
  .. .. ..@ is_circular: logi [1:440] NA NA NA NA NA NA ...
  .. .. ..@ genome     : chr [1:440] NA NA NA NA ...
  ..@ elementMetadata:Formal class 'DataFrame' [package "S4Vectors"] with 6 slots
  .. .. ..@ rownames       : NULL
  .. .. ..@ nrows          : int 178581
  .. .. ..@ listData       :List of 2
  .. .. .. ..$ tx_id  : int [1:178581] 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. ..$ tx_name: chr [1:178581] "NR_046018.2" "XR_001737835.1" "NR_036051.1" "rna-MIR1302-2" ...
  .. .. ..@ elementType    : chr "ANY"
  .. .. ..@ elementMetadata: NULL
  .. .. ..@ metadata       : list()
  ..@ elementType    : chr "ANY"
  ..@ metadata       :List of 1
  .. ..$ genomeInfo:List of 15
  .. .. ..$ Db type                                 : chr "TxDb"
  .. .. ..$ Supporting package                      : chr "GenomicFeatures"
  .. .. ..$ Data source                             : chr "/home/weir/RNAedit/human_test/reference/GCF_000001405.38_GRCh38.p12_genomic.gff"
  .. .. ..$ Organism                                : chr NA
  .. .. ..$ Taxonomy ID                             : chr NA
  .. .. ..$ miRBase build ID                        : chr NA
  .. .. ..$ Genome                                  : chr NA
  .. .. ..$ transcript_nrow                         : chr "178581"
  .. .. ..$ exon_nrow                               : chr "1945509"
  .. .. ..$ cds_nrow                                : chr "1460272"
  .. .. ..$ Db created by                           : chr "GenomicFeatures package from Bioconductor"
  .. .. ..$ Creation time                           : chr "2019-06-17 22:46:48 +0800 (Mon, 17 Jun 2019)"
  .. .. ..$ GenomicFeatures version at creation time: chr "1.34.8"
  .. .. ..$ RSQLite version at creation time        : chr "2.1.1"
  .. .. ..$ DBSCHEMAVERSION                         : chr "1.2"
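One thing worth checking, offered as a guess rather than a diagnosis: frame 8 of the traceback builds the output seqnames from names(transcripts), and Seqinfo() in frame 10 is then called on those names, not on seqlevels(GR). NAs or empty strings in names(GR) could therefore trigger the same error. `bad_name_index()` is a hypothetical helper to locate them:

```r
library(GenomicRanges)

# Hypothetical helper: positions of NA or empty names (unusable as seqnames)
bad_name_index <- function(x) {
  nms <- names(x)
  if (is.null(nms)) return(seq_along(x))  # no names at all
  which(is.na(nms) | nms == "")
}

# Toy example with one empty name
gr <- GRanges("chr1", IRanges(c(1, 100, 200), width = 10))
names(gr) <- c("tx1", "", "tx3")
bad_name_index(gr)  # 2
```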

propagating names from RangesList to GRanges

Should this give "foo" or NULL (as currently):

names(GRanges(IRangesList(chr1=IRanges(1, 10, names="foo"))))

Feature request: make mcols on a GRangesList return a SplitDataFrameList

I currently cannot access the metadata columns of GRangesList without some form of custom "witchcraft"

library(TxDb.Scerevisiae.UCSC.sacCer3.sgdGene)
txdb <- TxDb.Scerevisiae.UCSC.sacCer3.sgdGene
grl<- GenomicFeatures::cdsBy(txdb, by = "gene")
mcols(grl)
#> DataFrame with 6534 rows and 0 columns
mcols(grl[[1]])
#> DataFrame with 2 rows and 2 columns
#>      cds_id    cds_name
#>   <integer> <character>
#> 1      6992          NA
#> 2      6993          NA
mcols(grl[,"cds_id"])
#> DataFrame with 6534 rows and 0 columns

To access the columns, I use this function:

.get_column_GRangesList <- function(grl,column){
    relist(mcols(grl@unlistData)[,column],grl@partitioning)
}

Is there a specific reason this feature works the way it does right now? Did I miss a specific way of accessing the metadata?

Thanks for any help and advice
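The same helper can be written without touching the unlistData and partitioning slots directly, using only unlist() and relist(). `get_inner_mcol()` is a hypothetical name, and the toy cds_id values below are made up.

```r
library(GenomicRanges)

# Hypothetical helper: one inner metadata column, returned as a list parallel to grl
get_inner_mcol <- function(grl, column) {
  flat <- unlist(grl, use.names = FALSE)
  relist(mcols(flat)[[column]], grl)
}

grl <- GRangesList(
  g1 = GRanges("chrI", IRanges(c(1, 50), width = 10), cds_id = c(6992L, 6993L)),
  g2 = GRanges("chrII", IRanges(5, width = 10), cds_id = 7001L)
)
res <- get_inner_mcol(grl, "cds_id")
res
```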

Export GenomicRanges object as gtf file

Hello,
I'm relatively new to working with GenomicRanges, so this question may be naive, but I couldn't find any documentation elsewhere.

I was wondering if it is possible to export a GenomicRanges (GRanges) object as a GTF (Gene Transfer Format) file.

For my current analysis, it is important for me to identify exons that overlap between different genes in my reference annotation and exclude them. I did this by the following call:

no.overlappers <- exonicParts(eqcab_db, linked.to.single.gene.only=TRUE)

Now that I have dropped all of the exons that were linked to more than 1 gene, I'd like to export this file as a gtf for my analysis. Is this possible?

Thank you, and let me know if anything needs further clarification. I look forward to your comments!
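The usual route is rtracklayer's export(), which can write GTF. A minimal sketch with a toy GRanges, assuming the metadata columns already follow GTF conventions (a `type` column plus `gene_id` and `transcript_id`). exonicParts() output may carry different and list-valued columns, which would need renaming or flattening first; that step is not shown.

```r
library(GenomicRanges)
library(rtracklayer)

# Toy exons; type, gene_id and transcript_id become GTF fields
gr <- GRanges("chr1", IRanges(c(100, 500), width = 50), strand = "+",
              type = c("exon", "exon"), gene_id = c("g1", "g1"),
              transcript_id = c("t1", "t1"))

out <- tempfile(fileext = ".gtf")
export(gr, out, format = "gtf")
gtf_lines <- readLines(out)
gtf_lines
```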

pc Performance Slow for Two GRangesList

A previous question on the support site has a comment on an answer from the person asking the question:

Thank you Mike and Michael for your responses. Michael's solution works in sub-second time. mapply didn't return after a few minutes on real input so I stopped it.

The solution involves the use of pc. When I run it with the current version of GenomicRanges and R 4.0.1, it takes almost 10 minutes (timing below). Could this be a performance regression since 2017? Each list has about 20,000 elements, similar to the original question.

> length(upstreamCisRegions)
[1] 23215
> length(downstreamCisRegions)
[1] 23215
> system.time(allCisRegions <- pc(upstreamCisRegions, downstreamCisRegions))
   user  system elapsed 
567.914   9.354 578.292
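For two lists of equal length, one hypothetical alternative to benchmark against pc() is to concatenate the unlisted ranges once and regroup them with a single split(). `pc_flat()` is an illustrative name; it assumes both lists have the same length and compatible metadata columns, and whether it is faster than pc() in a given release is not established here.

```r
library(GenomicRanges)

# Hypothetical helper: element-wise concatenation of two GRangesLists of equal length
pc_flat <- function(a, b) {
  stopifnot(length(a) == length(b))
  ua <- unlist(a, use.names = FALSE)
  ub <- unlist(b, use.names = FALSE)
  flat <- c(ua, ub)
  group <- c(rep(seq_along(a), lengths(a)), rep(seq_along(b), lengths(b)))
  src <- rep(1:2, c(length(ua), length(ub)))  # keep a's ranges before b's
  o <- order(group, src)
  out <- split(flat[o], factor(group[o], levels = seq_along(a)))
  names(out) <- names(a)
  out
}
```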

coercion from character vector to GRanges does not report the faulty string

In case of a long character vector x, it can be hard for the user to identify the faulty string(s) when as(x, "GRanges") fails. It would be nice if the error message were more informative, e.g. by reporting the index and value of the first faulty element in x.
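In the meantime, a slow but simple way to locate the offending strings is to attempt the coercion element by element. `bad_range_strings()` is a hypothetical helper; it performs one coercion per string, so it is meant for debugging rather than routine use.

```r
library(GenomicRanges)

# Hypothetical helper: indices of strings that as(x, "GRanges") cannot parse
bad_range_strings <- function(x) {
  fails <- vapply(x, function(s) {
    inherits(tryCatch(as(s, "GRanges"), error = function(e) e), "error")
  }, logical(1), USE.NAMES = FALSE)
  which(fails)
}

x <- c("chr1:1-10", "chr1:oops", "chr2:5-20")
bad_range_strings(x)
```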

Add GRangesFactor class

A specialization of Factor objects where the levels can be any GRanges object or derivative.

As discussed here: Bioconductor/Contributions#1114

'update' methods need exporting?

@hpages This support site question seems to be due to 'update' methods not being exported / documented in GenomicRanges (and elsewhere?). This seems to fix the immediate problem.

GenomicRanges master$ git diff
diff --git a/NAMESPACE b/NAMESPACE
index 3fbb6ff..6d9880c 100644
--- a/NAMESPACE
+++ b/NAMESPACE
@@ -73,7 +73,7 @@ exportMethods(
     merge,
 
     ## Generics defined in the stats4 package:
-    summary,
+    summary, update,
 
     ## Generics defined in the BiocGenerics package:
     duplicated, match,

For reference: the reproducible example

seqinfo_hg19 = GenomeInfoDb::Seqinfo(genome = 'hg19')

se <- SummarizedExperiment::SummarizedExperiment(
  rowRanges = GenomicRanges::GRanges(c("chr1", "chr2", "chr1"), 
                                     IRanges::IRanges(1:3, width = 1)))

GenomeInfoDb::seqinfo(se) = seqinfo_hg19[GenomeInfoDb::seqlevelsInUse(se)]
#> Error in methods::slot(object, name): no slot of name "call" for this object of class "GRanges"

makeGRangesFromDataFrame check for missing cells before sending to IRanges

First, thanks for this awesome package!

The error message when there is a missing cell in the data.frame is a bit cryptic (and I am not sure many users know what solve_user_SEW0 means). makeGRangesFromDataFrame should check for missing cells and throw an informative error up front, so that users do not have to debug at the level of IRanges.

Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE = "IRanges") : 
  In range 36: at least two out of 'start', 'end', and 'width', must
  be supplied.

I can create a PR if needed :)
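Until such a check exists in the package, a caller-side guard is straightforward. `check_complete_ranges()` is a hypothetical helper; its default column names are among the ones makeGRangesFromDataFrame() recognises, so adjust them if your data frame uses others.

```r
library(GenomicRanges)

# Hypothetical helper: fail early, naming the rows with missing coordinates
check_complete_ranges <- function(df, cols = c("seqnames", "start", "end")) {
  bad <- which(!complete.cases(df[, cols, drop = FALSE]))
  if (length(bad) > 0L) {
    stop("missing values in columns ", paste(cols, collapse = ", "),
         " at row(s): ", paste(bad, collapse = ", "))
  }
  invisible(df)
}

df <- data.frame(seqnames = "chr1", start = c(1, 5), end = c(10, 20))
gr <- makeGRangesFromDataFrame(check_complete_ranges(df))
```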

Feature request: GPos() constructor can make a GPos object from an IPos object

See https://support.bioconductor.org/p/113394/

numeric meta column collapsed for connecting ranges when converting from Rle to GRanges

library(BSgenome.Scerevisiae.UCSC.sacCer2)
set.seed(55)

## 2) As a metadata column on a disjoint GRanges object
## ----------------------------------------------------

gr2 <- GRanges(c("chrI:5",
                 "chrI:6",
                 "chrI:291-377",
                 "chrV:51-60"),
               score=c(0.4,0.4, -10, 2.2),
               id=letters[1:4],
               seqinfo=seqinfo(Scerevisiae))
gr2

bindAsGRanges(mcolAsRleList(gr2, "score"))

Session info:

> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
[1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8 LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
[7] LC_PAPER=en_AU.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] grid parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] BSgenome.Scerevisiae.UCSC.sacCer2_1.4.0 BSgenome_1.52.0 rtracklayer_1.44.0 Biostrings_2.52.0
[5] XVector_0.24.0 ggplot2_3.3.0 chromoMap_0.2 dplyr_0.8.3
[9] Gviz_1.28.3 GenomicRanges_1.36.0 GenomeInfoDb_1.20.0 IRanges_2.18.1
[13] S4Vectors_0.22.0 BiocGenerics_0.30.0

loaded via a namespace (and not attached):
[1] ProtGenerics_1.16.0 bitops_1.0-6 matrixStats_0.54.0 bit64_0.9-7 RColorBrewer_1.1-2 progress_1.2.2
[7] httr_1.4.1 rstan_2.19.2 tools_3.6.0 backports_1.1.4 R6_2.4.0 rpart_4.1-15
[13] Hmisc_4.2-0 DBI_1.0.0 lazyeval_0.2.2 colorspace_1.4-1 nnet_7.3-12 withr_2.1.2
[19] processx_3.3.1 tidyselect_0.2.5 gridExtra_2.3 prettyunits_1.0.2 bit_1.1-14 curl_4.1
[25] compiler_3.6.0 cli_1.1.0 Biobase_2.44.0 htmlTable_1.13.1 DelayedArray_0.10.0 labeling_0.3
[31] scales_1.0.0 checkmate_1.9.3 callr_3.2.0 StanHeaders_2.19.0 stringr_1.4.0 digest_0.6.21
[37] Rsamtools_2.0.0 foreign_0.8-70 base64enc_0.1-3 dichromat_2.0-0 pkgconfig_2.0.3 htmltools_0.3.6
[43] ensembldb_2.8.0 htmlwidgets_1.3 rlang_0.4.0 rstudioapi_0.10 RSQLite_2.1.1 BiocParallel_1.18.1
[49] acepack_1.4.1 inline_0.3.15 VariantAnnotation_1.30.1 RCurl_1.95-4.12 magrittr_1.5 GenomeInfoDbData_1.2.1
[55] Formula_1.2-3 loo_2.1.0 Matrix_1.2-17 Rcpp_1.0.2 munsell_0.5.0 stringi_1.4.3
[61] yaml_2.2.0 SummarizedExperiment_1.14.0 zlibbioc_1.30.0 pkgbuild_1.0.3 blob_1.1.1 ggrepel_0.8.1
[67] crayon_1.3.4 lattice_0.20-38 splines_3.6.0 GenomicFeatures_1.36.1 hms_0.4.2 ps_1.3.0
[73] knitr_1.23 pillar_1.4.2 biomaRt_2.40.0 XML_3.98-1.20 glue_1.3.1 packrat_0.5.0
[79] biovizBase_1.32.0 latticeExtra_0.6-28 data.table_1.12.2 gtable_0.3.0 purrr_0.3.3 assertthat_0.2.1
[85] xfun_0.7 AnnotationFilter_1.8.0 survival_2.43-3 tibble_2.1.3 GenomicAlignments_1.20.0 AnnotationDbi_1.46.0
[91] memoise_1.1.0 cluster_2.0.8

Error in getListElement(x, i, ...)

I created a GRanges object with the following commands, but printing it fails.

> library(GenomicRanges)
> gr=GRanges(seqnames=c("chr1","chr2","chr2"),
           ranges=IRanges(start=c(50,150,200),
                          end=c(100,200,300)),
           strand=c("+","-","-")
)
> gr
GRanges object with 3 ranges and 0 metadata columns:
Error in getListElement(x, i, ...) : 
  IRanges objects don't support [[, as.list(), lapply(), or unlist() at
  the moment

Here is the information of my R session.

> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
[1] C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] GenomicRanges_1.38.0 GenomeInfoDb_1.22.1  IRanges_2.20.1      
[4] S4Vectors_0.24.4     BiocGenerics_0.32.0 

loaded via a namespace (and not attached):
[1] zlibbioc_1.32.0        compiler_3.6.0         tools_3.6.0           
[4] XVector_0.26.0         GenomeInfoDbData_1.2.2 RCurl_1.95-4.12       
[7] bitops_1.0-6

Could anyone please tell me how to fix this problem? Thanks a lot!

coverage() with small weight values returns non-zero values over empty positions

I've noticed that when using coverage() over IRanges/GRanges objects using the weight argument, very small but non-zero values can be returned for empty positions.

> gr <- GRanges("chr1", IRanges(1:3, 2:4))
> seqinfo(gr) <- Seqinfo(seqnames = "chr1", seqlengths = 1e3)
> gr
GRanges object with 3 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1       1-2      *
  [2]     chr1       2-3      *
  [3]     chr1       3-4      *
  -------
  seqinfo: 1 sequence from an unspecified genome

> coverage(gr)$chr1
integer-Rle of length 1000 with 4 runs
  Lengths:   1   2   1 996
  Values :   1   2   1   0

> coverage(gr, weight = c(0.01, 0.02, 0.03))$chr1
numeric-Rle of length 1000 with 5 runs
  Lengths:                     1                     1                     1                     1                   996
  Values :                  0.01                  0.03                  0.05                  0.03 -3.46944695195361e-18

Thanks

> sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.4

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GenomicRanges_1.41.1 GenomeInfoDb_1.25.0  IRanges_2.23.4       S4Vectors_0.27.5    
[5] BiocGenerics_0.35.2 

loaded via a namespace (and not attached):
 [1] rstudioapi_0.11        XVector_0.29.0         zlibbioc_1.35.0        R6_2.4.1              
 [5] fansi_0.4.1            tools_4.0.0            pkgbuild_1.0.8         cli_2.0.2             
 [9] withr_2.2.0            remotes_2.1.1          assertthat_0.2.1       rprojroot_1.3-2       
[13] crayon_1.3.4           processx_3.4.2         GenomeInfoDbData_1.2.3 callr_3.4.3           
[17] bitops_1.0-6           ps_1.3.3               RCurl_1.98-1.2         curl_4.3              
[21] glue_1.4.1             compiler_4.0.0         backports_1.1.7        prettyunits_1.1.1

[Edit: not unique to GRanges objects]
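The tiny negative value (-3.47e-18) in the weighted coverage above looks like floating-point residue from adding and subtracting decimal weights. The base R sketch below rebuilds that coverage under one assumption: that weighted coverage is a running sum of +weight at each range start and -weight just after each range end. This is not taken from the package's code. The sketch shows how such residue can arise, and how `zapsmall()` can clean it up afterwards.

```r
# Hypothetical reconstruction (not the package's internals): weighted coverage
# of ranges 1-2, 2-3 and 3-4 with weights 0.01, 0.02 and 0.03, computed as a
# running sum of +weight at each start and -weight just after each end.
starts  <- c(1, 2, 3)
ends    <- c(2, 3, 4)
weights <- c(0.01, 0.02, 0.03)

delta <- numeric(max(ends) + 1)
for (i in seq_along(starts)) {
  delta[starts[i]]   <- delta[starts[i]] + weights[i]
  delta[ends[i] + 1] <- delta[ends[i] + 1] - weights[i]
}
cov <- cumsum(delta)  # coverage at positions 1..5; position 5 should be 0

print(cov, digits = 17)  # exposes any rounding residue at position 5
print(zapsmall(cov))     # rounds values negligible relative to max(abs(cov))
```

Whether the residue is exactly zero depends on the order of the floating-point operations. Rounding with `zapsmall()` afterwards gives the expected 0 either way.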

Document and export get_out_of_bound_index

Hello,

The internal function get_out_of_bound_index is handy for getting the indices of out-of-bound ranges. I think people would use it often if it were documented and exported.

Cheers!
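Until something like `get_out_of_bound_index` is exported, the check can be approximated. The base R sketch below assumes "out of bound" means a range that starts before position 1 or ends past its sequence length. Ranges on circular sequences, or on sequences of unknown (NA) length, are never flagged. This definition is modelled on the wording of GenomicRanges' out-of-bound warning. The helper `out_of_bound_index()` is hypothetical and is not the internal function's code.

```r
# Hypothetical helper: indices of ranges that fall outside their sequence.
# Assumption: ranges on circular or NA-length sequences are never flagged.
out_of_bound_index <- function(seqnames, start, end, seqlengths,
                               is_circular = setNames(rep(FALSE, length(seqlengths)),
                                                      names(seqlengths))) {
  sn   <- as.character(seqnames)
  len  <- seqlengths[sn]    # length of each range's sequence
  circ <- is_circular[sn]   # circularity flag of each range's sequence
  bad  <- !circ & !is.na(len) & (start < 1 | end > len)
  unname(which(bad))
}

seqlengths <- c(chr1 = 1000, chr2 = 500, chrM = NA)
idx <- out_of_bound_index(seqnames = c("chr1", "chr1", "chr2", "chrM"),
                          start    = c(1, 995, 0, 10),
                          end      = c(10, 1005, 20, 99999),
                          seqlengths = seqlengths)
print(idx)
```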

findOverlaps is reporting overlaps that are not actual overlaps

findOverlaps is reporting overlaps that are not actual overlaps.

Command
ov <- findOverlaps(ag.gr, bam.alignments.df.unique.gr, ignore.strand=TRUE, type="within", select="all")

Input files:
ag.gr:

> ag.gr
GRanges object with 6452 ranges and 0 metadata columns:
                     seqnames              ranges strand
                        <Rle>           <IRanges>  <Rle>
   chr10:5004878_A/G    chr10     5004877-5004878      +
   chr10:7702074_A/G    chr10     7702073-7702074      +
  chr10:17723270_A/G    chr10   17723269-17723270      +
  chr10:18094537_A/G    chr10   18094536-18094537      +
  chr10:19610109_A/G    chr10   19610108-19610109      +
                 ...      ...                 ...    ...
  chrX:168654413_T/C     chrX 168654412-168654413      -
  chrX:168654419_T/C     chrX 168654418-168654419      -
  chrX:169311696_T/C     chrX 169311695-169311696      -
  chrX:169311856_T/C     chrX 169311855-169311856      -
  chrX:169312549_T/C     chrX 169312548-169312549      -
  -------
  seqinfo: 66 sequences from an unspecified genome; no seqlengths

bam.alignments.df.unique.gr:

> bam.alignments.df.unique.gr
GRanges object with 6249 ranges and 0 metadata columns:
                                       seqnames              ranges strand
                                          <Rle>           <IRanges>  <Rle>
  e1c99f26-4253-43e7-a0db-3432e11887aa     chr1     4857839-4897633      +
  024fe75b-ddec-44bc-8aa0-43b0405c5372     chr1     4878013-4897892      -
  f32172b2-87cf-4bb7-b48c-aa57f800cf98     chr1     4878024-4897906      -
  f3c04f57-5951-413c-bce9-7f96b0d27726     chr1     4878075-4897614      +
  7a96e313-e4b0-4e73-8bc3-faa6edc2cfc7     chr1     4886801-4897908      -
                                   ...      ...                 ...    ...
  9e583772-5059-4f15-968d-1239bf3776f9     chr1 191546524-191556508      +
  5c69f539-4438-4024-b404-dc8acbea26bb     chr1 191548367-191561539      -
  3597e52b-9540-4bd8-952d-d25c5b35d922     chr1 191552466-191556859      -
  215df717-96c3-4929-a2c6-fdd95b21344a     chr1 191552989-191575528      -
  e230985c-4fa8-423f-b64c-54fe7d955d9f     chr1 191556304-191556791      -
  -------
  seqinfo: 66 sequences from an unspecified genome; no seqlengths

Output after using queryHits and subjectHits to index back into the original data frames:

                                       qwidth   start     end width njunc
e1c99f26-4253-43e7-a0db-3432e11887aa.1   2408 4857839 4897633 39795     9
024fe75b-ddec-44bc-8aa0-43b0405c5372.1   2389 4878013 4897892 19880     7
f32172b2-87cf-4bb7-b48c-aa57f800cf98.1   2437 4878024 4897906 19883     7
f3c04f57-5951-413c-bce9-7f96b0d27726.1   2036 4878075 4897614 19540     7
7a96e313-e4b0-4e73-8bc3-faa6edc2cfc7.1   2293 4886801 4897908 11108     6
3e96c6cf-d3d2-42fb-9a59-8a57c1938e32.1   1920 4889467 4897628  8162     5
                                       ag.seqnames ag.start distance.to.snp
e1c99f26-4253-43e7-a0db-3432e11887aa.1         chr1   5124337          266499
024fe75b-ddec-44bc-8aa0-43b0405c5372.1         chr1   5124337          246325
f32172b2-87cf-4bb7-b48c-aa57f800cf98.1         chr1   5124337          246314
f3c04f57-5951-413c-bce9-7f96b0d27726.1         chr1   5124337          246263
7a96e313-e4b0-4e73-8bc3-faa6edc2cfc7.1         chr1   5124337          237537
3e96c6cf-d3d2-42fb-9a59-8a57c1938e32.1         chr1   5124337          234871

You can see that the read starts at 4857839 and ends at 4897633, yet findOverlaps reports that our site at 5124337 lies within this range. This is not true, because the site is actually outside the range. Would you have any idea how to fix this?

Thanks.
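One way to narrow this down is to recompute the "within" condition for every reported hit. Because the call passes ag.gr as the query, queryHits(ov) indexes ag.gr and subjectHits(ov) indexes the reads. For `type = "within"`, a query range counts as within a subject range when both are on the same chromosome, the query starts at or after the subject start, and it ends at or before the subject end.

The base R sketch below applies that test to plain coordinate vectors. The helper name `hits_are_within()` is hypothetical. The SNP range is assumed to be 2 bp wide, starting at the reported ag.start, like the other ag.gr ranges. If a pair built from the hits fails this test, the hits may have been used to index the wrong table; that is a possibility to check, not a confirmed diagnosis.

```r
# Hypothetical check: does each (query, subject) pair satisfy type = "within"?
hits_are_within <- function(q_chr, q_start, q_end, s_chr, s_start, s_end) {
  q_chr == s_chr & q_start >= s_start & q_end <= s_end
}

# The read and SNP site quoted above (SNP assumed to span 5124337-5124338).
hits_are_within(q_chr = "chr1", q_start = 5124337, q_end = 5124338,
                s_chr = "chr1", s_start = 4857839, s_end = 4897633)
```

With real objects, the six vectors would come from `seqnames()`, `start()` and `end()` of ag.gr indexed by `queryHits(ov)`, and of the reads indexed by `subjectHits(ov)`.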

GRangesFactor should inherit from GenomicRanges

as discussed in issue #25

problems with AtomicList in mcols in Rdevel

Happens only with R-devel:

> library(GenomicRanges)
> library(S4Vectors)
> library(IRanges)
> gr <- GRanges("chr1", IRanges(1:5, width=10))
> fl <- FactorList(lapply(1:5, FUN=function(x) sample(LETTERS,x)))
> fl
FactorList of length 5
[[1]] W
[[2]] P Q
[[3]] B V Y
[[4]] V M N Y
[[5]] T E K B O
> gr$fl <- fl
> gr
GRanges object with 5 ranges and 1 metadata column:
      seqnames    ranges strand |           fl
         <Rle> <IRanges>  <Rle> | <FactorList>
  [1]     chr1      1-10      * |             
  [2]     chr1      2-11      * |             
  [3]     chr1      3-12      * |             
  [4]     chr1      4-13      * |             
  [5]     chr1      5-14      * |             
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> gr$fl
FactorList of length 5
Error in RangeNSBS(x, start = start, end = end, width = width) : 
  the specified range is out-of-bounds

This works fine:

> DataFrame(fl=fl)
DataFrame with 5 rows and 1 column
            fl
  <FactorList>
1            X
2          I,E
3        T,F,R
4    I,V,J,...
5    Y,N,F,...

This also works:

> fl <- FactorList(lapply(1:5, FUN=function(x) sample(LETTERS,x)), compress=FALSE)
> mcols(gr) <- NULL
> gr$fl <- fl
> gr
GRanges object with 5 ranges and 1 metadata column:
      seqnames    ranges strand |           fl
         <Rle> <IRanges>  <Rle> | <FactorList>
  [1]     chr1      1-10      * |            M
  [2]     chr1      2-11      * |          G,K
  [3]     chr1      3-12      * |        B,Y,Z
  [4]     chr1      4-13      * |    G,A,L,...
  [5]     chr1      5-14      * |    C,O,X,...

This suggests that it is related to compression. However, further down the line it seems to be compressed again automatically, and I get errors like:

Error in validObject(result) : 
  invalid class "CompressedFactorList" object: 
    improper partitioning

> sessionInfo()
R Under development (unstable) (2021-04-08 r80148)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS:   /home/pigerm/applications/R-devel/lib/libRblas.so
LAPACK: /home/pigerm/applications/R-devel/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=de_CH.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_CH.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=de_CH.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=de_CH.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] GenomicRanges_1.43.4 GenomeInfoDb_1.27.10 IRanges_2.25.7      
[4] S4Vectors_0.29.15    BiocGenerics_0.37.1 

loaded via a namespace (and not attached):
[1] zlibbioc_1.37.0        compiler_4.1.0         tools_4.1.0           
[4] XVector_0.31.1         GenomeInfoDbData_1.2.4 RCurl_1.98-1.3        
[7] bitops_1.0-6

GRangesList fails with named arguments

Calling:

GRangesList(a=GRanges())

fails with

Error in `rownames<-`(`*tmp*`, value = "a") : invalid rownames length

I'm pretty sure this was working before, and indeed it works fine with BioC-release versions of all packages. Traceback suggests it may be related to the new use.names=TRUE setting for mcols(), as discussed in Bioconductor/SummarizedExperiment#2.

R version 3.5.0 Patched (2018-04-30 r74679)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS

Matrix products: default
BLAS: /home/cri.camres.org/lun01/Software/R/R-3-5-branch/lib/libRblas.so
LAPACK: /home/cri.camres.org/lun01/Software/R/R-3-5-branch/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] GenomicRanges_1.33.6 GenomeInfoDb_1.17.1  IRanges_2.15.13     
[4] S4Vectors_0.19.12    BiocGenerics_0.27.0 

loaded via a namespace (and not attached):
[1] zlibbioc_1.27.0        compiler_3.5.0         XVector_0.21.1        
[4] GenomeInfoDbData_1.1.0 RCurl_1.95-4.10        bitops_1.0-6

Sorting a SortedByQueryHits produces a Hits

Running the following code on the current GenomicRanges devel version (1.31.3):

library(GenomicRanges)
example(GenomicRanges, echo=FALSE)
olap <- findOverlaps(gr, gr)
class(sort(olap))

... gives a Hits object, while previously in the release (1.30.0) it gave a SortedByQueryHits object. Is this change intentional? I am curious because my InteractionSet package defines its own findOverlaps method, and I would like to know what the correct return class should be.

Can't create GRanges Object with more than 11 peaks

I updated GenomicRanges to 1.40.0, and now I can't create objects with more than 11 peaks. There was no such problem with 1.38.0.

suppressWarnings(suppressPackageStartupMessages(library(GenomicRanges)))
bed <- structure(list(
  Chr   = c("chr5", "chr17", "chr8", "chr4", "chr1", "chr3",
            "chr12", "chr12", "chr3", "chr9", "chr7", "chr11"),
  Start = c("24841478", "8162955", "40577584", "145277698", "180808752",
            "88732151", "72461561", "69816610", "135579997", "83868473",
            "80131463", "106556431"),
  End   = c("24845196", "8164380", "40578029", "145278483", "180815472",
            "88732652", "72462310", "69818185", "135580952", "83868703",
            "80131945", "106557146")),
  row.names = c(NA, 12L), class = "data.frame")
colnames(bed) <- c("Chr", "Start", "End")

GRanges(head(bed, n = 11))
#> GRanges object with 11 ranges and 0 metadata columns:
#>        seqnames              ranges strand
#>           <Rle>           <IRanges>  <Rle>
#>    [1]     chr5   24841478-24845196      *
#>    [2]    chr17     8162955-8164380      *
#>    [3]     chr8   40577584-40578029      *
#>    [4]     chr4 145277698-145278483      *
#>    [5]     chr1 180808752-180815472      *
#>    [6]     chr3   88732151-88732652      *
#>    [7]    chr12   72461561-72462310      *
#>    [8]    chr12   69816610-69818185      *
#>    [9]     chr3 135579997-135580952      *
#>   [10]     chr9   83868473-83868703      *
#>   [11]     chr7   80131463-80131945      *
#>   -------
#>   seqinfo: 9 sequences from an unspecified genome; no seqlengths
GRanges(head(bed, n = 12))
#> GRanges object with 12 ranges and 0 metadata columns:
#> Error in dimnames(x) <- dn: length of 'dimnames' [1] not equal to array extent

Created on 2020-12-30 by the reprex package (v0.3.0)

Any help would be appreciated,
Onur

sessionInfo()
#> R version 4.0.2 (2020-06-22)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Mojave 10.14.6
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] compiler_4.0.2  magrittr_2.0.1  tools_4.0.2     htmltools_0.5.0
#>  [5] yaml_2.2.1      stringi_1.5.3   rmarkdown_2.6   highr_0.8      
#>  [9] knitr_1.30      stringr_1.4.0   xfun_0.19       digest_0.6.27  
#> [13] rlang_0.4.10    evaluate_0.14


IRanges promoters changes coming to GenomicRanges?

Hi Hervé,

Could I please request this change is ported to GenomicRanges:

promoters() args 'upstream' and 'downstream' now can be integer vectors parallel to 'x' (for consistency with the other intra range transformations) (Bioconductor/IRanges@2fb997f)

Thanks,
Pete
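For reference, the per-range arithmetic implied by vectorized `upstream`/`downstream` arguments can be sketched in base R. The sketch follows the promoters() convention:

- For "+" and "*" ranges, the promoter spans `start - upstream` to `start + downstream - 1`.
- For "-" ranges, it spans `end - downstream + 1` to `end + upstream`.

Both arguments are recycled along the ranges, with defaults of 2000 and 200. The helper `promoter_bounds()` is hypothetical and is not the GenomicRanges implementation.

```r
# Hypothetical helper: promoter coordinates with per-range upstream/downstream.
promoter_bounds <- function(start, end, strand, upstream = 2000L, downstream = 200L) {
  n <- length(start)
  upstream   <- rep_len(upstream, n)    # recycle so each range gets its own value
  downstream <- rep_len(downstream, n)
  minus <- strand == "-"
  new_start <- ifelse(minus, end - downstream + 1L, start - upstream)
  new_end   <- ifelse(minus, end + upstream, start + downstream - 1L)
  data.frame(start = new_start, end = new_end, strand = strand)
}

p <- promoter_bounds(start = c(1000, 5000), end = c(1500, 6000),
                     strand = c("+", "-"),
                     upstream = c(100, 300), downstream = c(10, 20))
print(p)
```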

GPos - Error in getClass(x) : “UnstitchedGPos” is not a defined class

We have been getting an error in the vignette of our DominoEffect package for a few days now, and it seems to be due to the GPos function.

Here is the printout if I replicate the error:

R version 3.6.0 (2019-04-26) -- "Planting of a Tree"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(DominoEffect)
> data("SnpData", package = "DominoEffect")
> chr_info <- paste("chr", SnpData$Chr_name,":", 
+                   SnpData$Position_on_chr, "-", 
+                   SnpData$Position_on_chr, sep = "")
> head(chr_info)
[1] "chr1:150917624-150917624" "chr1:150936280-150936280" "chr1:169823521-169823521" "chr1:169823718-169823718" "chr1:169823790-169823790"
[6] "chr1:17256626-17256626"  
> snp_data <- GenomicRanges::GPos(chr_info, stitch = FALSE)
Error in getClass(x) : “UnstitchedGPos” is not a defined class
> traceback()
6: stop(gettextf("%s is not a defined class", dQuote(Class)), domain = NA)
5: getClass(x)
4: getSlots(x_class)
3: S4Vectors:::normarg_mcols(mcols, Class, ans_len)
2: new_GRanges(Class, seqnames = seqnames, ranges = pos, strand = strand, 
       mcols = mcols, seqinfo = seqinfo)
1: GenomicRanges::GPos(chr_info, stitch = FALSE)
> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] zlibbioc_1.31.0        compiler_3.6.0         IRanges_2.19.10        XVector_0.25.0         parallel_3.6.0        
 [6] tools_3.6.0            GenomicRanges_1.37.12  GenomeInfoDbData_1.2.1 RCurl_1.95-4.12        S4Vectors_0.23.13     
[11] BiocGenerics_0.31.4    GenomeInfoDb_1.21.1    bitops_1.0-6           stats4_3.6.0

Now if I load the GenomicRanges package, the error does not happen anymore:

> library(GenomicRanges)
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
    get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
    Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply, union, unique, unsplit, which, which.max,
    which.min

Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:base’:

    expand.grid

Loading required package: IRanges

Attaching package: ‘IRanges’

The following object is masked from ‘package:grDevices’:

    windows

Loading required package: GenomeInfoDb
> snp_data <- GenomicRanges::GPos(chr_info, stitch = FALSE)
> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GenomicRanges_1.37.12 GenomeInfoDb_1.21.1   IRanges_2.19.10       S4Vectors_0.23.13     BiocGenerics_0.31.4  

loaded via a namespace (and not attached):
[1] zlibbioc_1.31.0        compiler_3.6.0         XVector_0.25.0         tools_3.6.0            GenomeInfoDbData_1.2.1 RCurl_1.95-4.12       
[7] bitops_1.0-6

I am not sure how to interpret this. It seems that something GPos relies on is unavailable when I call GenomicRanges::GPos directly, but is found once the GenomicRanges package has been attached. To avoid the error when building our vignette, I guess I would need to import this functionality into our package as well. So far, I import the following functions:

importFrom(GenomicRanges, GPos, mcols, pos)

Any idea why the error happens and how we can avoid getting the error in the future in our Vignette when we call the GPos function? Thanks for your help! If something is unclear please feel free to contact me.

Easy access to inner metadata of a GRangesList

I was playing around with the grl GRangesList object from example(GRangesList), and it wasn't clear to me how to access the inner metadata. The best I could find was:

relist(unlist(grl)$score, grl)

... which does what I want (extracting the metadata while it is still grouped by the grl) but was not particularly obvious to me. I'm surprised that there is no inner.mcols function, or an inner=TRUE argument to mcols, or something more convenient, given that accessing inner metadata seems like a fairly common operation. Of course, it's entirely possible that I've just missed it.
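The unlist/extract/relist idiom can be illustrated with ordinary R lists. In the base R sketch below, a list of data frames stands in for a GRangesList, and each data frame's columns stand in for the inner metadata. `inner_column()` is a hypothetical helper, not an existing function. With a real GRangesList, the `relist(unlist(grl)$score, grl)` call above does the same job.

```r
# Hypothetical helper: pull one inner column out of a list of data frames,
# keeping the grouping (base R analogue of unlist -> extract -> relist).
inner_column <- function(lst, col) {
  flat  <- do.call(rbind, unname(lst))                        # "unlist"
  sizes <- vapply(lst, nrow, integer(1))                      # rows per element
  grp   <- factor(rep(names(lst), sizes), levels = names(lst))
  split(flat[[col]], grp)                                     # "relist"
}

lst <- list(a = data.frame(score = c(1, 2)), b = data.frame(score = 3))
r <- inner_column(lst, "score")
print(r)
```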

plan for dealing with loss of KEGG.db?

KEGG.db is still in the DESCRIPTION file and used in vignettes. Anyone planning on
revising package? Ideally we would introduce values derived from KEGGREST. If there
is no current plan I am willing to give this a shot in a PR.

Making it easier to convert from IGV coordinates

IGV gives coordinates like this:

"chr5:37,614,351-37,614,537:+"

It would be nice if the GRanges string constructor accepted "," in the string.

Now this fails:

GRanges("chr5:37,614,351-37,614,537:+") #  <- There are commas in here 

#This works: 
GRanges(gsub(pattern = "," , x = "chr5:37,614,351-37,614,537:+", replacement = ""))

Could this be implemented in Bioc 3.9 or 3.10, so that copying coordinates from IGV becomes a one-liner instead of requiring the commas to be removed manually? Or is there a way this could harm the function?
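Until the constructor accepts thousands separators, the comma stripping can be wrapped in a small helper. `gsub()` is needed rather than `sub()`, because `sub()` replaces only the first match and would leave the other commas in place. The helper name is hypothetical. Its result would then be passed to `GRanges()`.

```r
# Hypothetical helper: turn an IGV locus such as "chr5:37,614,351-37,614,537:+"
# into a string the GRanges() character constructor already understands.
igv_to_locus <- function(x) {
  gsub(",", "", x, fixed = TRUE)  # drop every thousands separator
}

igv_to_locus("chr5:37,614,351-37,614,537:+")
```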

GRanges gives 'Error in getListElement' when creating objects

Hi, I am using R/3.6.1 and GenomicRanges/1.38.0, and since yesterday, for some reason, I am no longer able to create a GRanges object. I have tried removing and reinstalling the package, but it still shows the same error.

x1 <- "chr2:56-125"
as(x1, "GRanges")
GRanges object with 1 range and 0 metadata columns:
Error in getListElement(x, i, ...) : IRanges objects don't support [[, as.list(), lapply(), or unlist() at the moment

gr0 <- GRanges(Rle(c("chr2", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), IRanges(1:10, width=10:1))
gr0
GRanges object with 10 ranges and 0 metadata columns:
Error in getListElement(x, i, ...) : IRanges objects don't support [[, as.list(), lapply(), or unlist() at the moment

Support distance() / nearest() for circular chromosomes

Opening an issue for the request to implement distance() / nearest() on circular chromosomes:

https://support.bioconductor.org/p/111087/

I'll start by writing unit tests for distance() that demonstrate the expected behavior and will post here for discussion. Others who are interested, feel free to do the same.
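As a possible starting point, the sketch below computes a distance on a circular sequence as the smaller of two gaps: the linear gap, and the gap going around the origin. It makes these assumptions:

- Ranges are 1-based and closed.
- Strand is ignored.
- Overlapping or adjacent ranges are at distance 0, following the IRanges distance() convention.
- Ranges that themselves wrap around the origin are not handled.

`circular_distance()` is a hypothetical helper, not a proposed API.

```r
# Hypothetical helper: distance between ranges x and y on a circular sequence.
circular_distance <- function(x_start, x_end, y_start, y_end, seqlength) {
  # Bases strictly between the ranges on the linear axis (0 if they touch/overlap).
  linear <- pmax(0L, pmax(x_start, y_start) - pmin(x_end, y_end) - 1L)
  # Bases between the rightmost end and the origin, plus origin to leftmost start.
  around <- (seqlength - pmax(x_end, y_end)) + (pmin(x_start, y_start) - 1L)
  pmin(linear, around)
}

circular_distance(x_start = c(10, 1, 1), x_end = c(20, 5, 5),
                  y_start = c(30, 95, 90), y_end = c(40, 100, 95),
                  seqlength = 100)
```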

overlaps with character arguments as seqlevels?

I was recently in the situation where I needed to write some code that would detect whether a GRanges or GRangesList contained elements on a particular chromosome. Well, no problem, I'll just look at the seqnames:

example(GenomicRanges, echo=FALSE)
as.logical(seqnames(gr) %in% "chr1")
##  [1] FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE

So far so good. But then I realized that the same approach would not work properly for GRangesLists:

set.seed(10)
grl <- split(gr, sample(3, length(gr), replace=TRUE))
seqnames(grl) %in% "chr1"
## RleList of length 3
## $`1`
## logical-Rle of length 1 with 1 run
##   Lengths:     1
##   Values : FALSE
## 
## $`2`
## logical-Rle of length 2 with 2 runs
##   Lengths:     1     1
##   Values : FALSE  TRUE
## 
## $`3`
## logical-Rle of length 7 with 3 runs
##   Lengths:     2     1     4
##   Values : FALSE  TRUE FALSE

Which breaks the GRanges* abstraction that I was hoping to use. As such, I need to write GRanges and GRangesList-specific code to check whether the entries contain any intervals in my desired chromosome - not great.

However, it occurred to me that an elegant solution would be to repurpose overlapsAny(), which always returns a logical vector. To wit, the following gives me the desired result for both objects:

chr1 <- GRanges("chr1:1-1000")
overlapsAny(gr, chr1)
##  [1] FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE
overlapsAny(grl, chr1)
## [1] FALSE  TRUE  TRUE

The above is not quite perfect as it still requires us to construct chr1, which requires knowledge of the range of entries on chromosome 1. A user-friendlier version of the above would allow us to just do:

overlapsAny(gr, "chr1")
overlapsAny(grl, "chr1")

To achieve the same effect. This would simply require new methods for GRanges(List),character, with the understanding that all character arguments are interpreted as seqlevels by the GenomicRanges overlap infrastructure.
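The GRanges/GRangesList mismatch comes down to one missing reduction step. In the list case, the per-range logical has to be collapsed with any() within each element. The base R sketch below mimics this with a plain list of seqname vectors standing in for seqnames(grl). The seqnames are made up, chosen only to reproduce the TRUE/FALSE pattern shown above. `has_seqlevel()` is a hypothetical helper.

```r
# Hypothetical helper: does each element contain at least one range on 'seqlevel'?
has_seqlevel <- function(seqnames_list, seqlevel) {
  vapply(seqnames_list, function(s) any(s %in% seqlevel), logical(1))
}

# Made-up stand-in for seqnames(grl), matching the run patterns shown above.
seqnames_list <- list(`1` = "chr2",
                      `2` = c("chr2", "chr1"),
                      `3` = c("chr2", "chr2", "chr1", "chr3", "chr3", "chr3", "chr3"))
has_seqlevel(seqnames_list, "chr1")
```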

strange code for GenomicRanges:::compatibleStrand

Hi. I noticed some strange code in GenomicRanges (v1.38.0) for the
function GenomicRanges:::compatibleStrand. I believe that when
building GenomicRanges in R, the following happens:

1. R processes strand-utils.R, which defines an S4 generic for
   compatibleStrand stored in GenomicRanges:::compatibleStrand, and
   defines several S4 methods.
2. R processes findOverlaps-methods.R (this is processed after
   strand-utils.R, according to the "Collate" field in
   GenomicRanges/DESCRIPTION). This file defines a regular function
   stored in GenomicRanges:::compatibleStrand, overwriting the generic
   function defined in strand-utils.R.

Thus, this package defines several methods for compatibleStrand that
are never used, since there is no S4 generic function for it. Was
this intentional? It could be that the generic and methods are just
stale code that should be removed.
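The overwrite described above can be reproduced in a plain R session. When a later assignment binds an ordinary function to the name of an S4 generic, calls through that name go to the ordinary function, and the registered methods are never dispatched. The function name `compatibleStrandDemo` and its method are made up for this illustration. Inside the package, the same thing would happen across source files in Collate order.

```r
library(methods)

# Step 1 (like strand-utils.R): define a generic and a method for it.
invisible(setGeneric("compatibleStrandDemo",
                     function(a, b) standardGeneric("compatibleStrandDemo")))
invisible(setMethod("compatibleStrandDemo", c("character", "character"),
                    function(a, b) a == b | a == "*" | b == "*"))
before <- compatibleStrandDemo("+", "*")   # dispatches to the S4 method

# Step 2 (like findOverlaps-methods.R): a plain function reuses the name.
compatibleStrandDemo <- function(a, b) "plain function"
after <- compatibleStrandDemo("+", "*")    # the S4 method is no longer reached

print(list(before = before, after = after))
```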

GRanges Error: "Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extent"

I'm trying to create a GRanges object from a data frame. Here is my data frame:

> head(cna[,1:3],15)
   chrom loc.start   loc.end
1      1         1  60278618
2      1  60278619  60280056
3      1  60280057 122661020
4      1 122661021 124971287
5      1 124971288 125073244
6      1 125073245 125116434
7      1 125116435 234775098
8      1 234775099 234802730
9      1 234802731 248956422
10     2         1  89349774
11     2  89349775  94340332
12     2  94340333 242193529
13     3         1  90718825
14     3  90718826  91226754
15     3  91226755 188755202

I can make a GRanges object with 11 rows:

>     i = 1
>     j = 11
>     gr <- GRanges(seqnames=Rle(cna$chrom[i:j]),
+                   ranges=IRanges(cna$loc.start[i:j],
+                                  cna$loc.end[i:j]),
+                   strand="*") # Turn into GRanges object
>     gr
GRanges object with 11 ranges and 0 metadata columns:
       seqnames              ranges strand
          <Rle>           <IRanges>  <Rle>
   [1]        1          1-60278618      *
   [2]        1   60278619-60280056      *
   [3]        1  60280057-122661020      *
   [4]        1 122661021-124971287      *
   [5]        1 124971288-125073244      *
   [6]        1 125073245-125116434      *
   [7]        1 125116435-234775098      *
   [8]        1 234775099-234802730      *
   [9]        1 234802731-248956422      *
  [10]        2          1-89349774      *
  [11]        2   89349775-94340332      *
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

but once I add another row, up to 12, it throws an error:

>     j = 12
>     gr <- GRanges(seqnames=Rle(cna$chrom[i:j]),
+                   ranges=IRanges(cna$loc.start[i:j],
+                                  cna$loc.end[i:j]),
+                   strand="*") # Turn into GRanges object
>     gr
GRanges object with 12 ranges and 0 metadata columns:
Error in dimnames(x) <- dn : 
  length of 'dimnames' [1] not equal to array extent

I'm only showing a small portion of the larger problem: I want to turn the first 3 columns of cna into a GRanges object, but I found that these row errors were happening.

Cannot unset strict.strand in intersect

When I attempt to find the intersect of a set of stranded ranges and a set of unstranded ranges, ranges with strand "*" do not match ranges with strand "+" or "-".

library(GenomicRanges)

a <- GRanges(
	paste0('chr',as.character(1:10)),
	IRanges(
		rep(1,10),
		rep(100,10)
	),
	rep('+',10)
)

b <- GRanges(
	paste0('chr',as.character(c(1:10,1:10))),
	IRanges(
		c(rep(1,10),rep(90,10)),
		c(rep(10,10),rep(110,10))
	),
	rep('*',20)
)

intersect(a,b)

gives the unexpected output

GRanges object with 0 ranges and 0 metadata columns:
   seqnames    ranges strand
      <Rle> <IRanges>  <Rle>
  -------
  seqinfo: 10 sequences from an unspecified genome; no seqlengths

When ignore.strand=T, it gives the expected output.

> intersect(a,b,ignore.strand=T)

GRanges object with 20 ranges and 0 metadata columns:
       seqnames    ranges strand
          <Rle> <IRanges>  <Rle>
   [1]     chr1      1-10      *
   [2]     chr1    90-100      *
   [3]     chr2      1-10      *
   [4]     chr2    90-100      *
   [5]     chr3      1-10      *
   ...      ...       ...    ...
  [16]     chr8    90-100      *
  [17]     chr9      1-10      *
  [18]     chr9    90-100      *
  [19]    chr10      1-10      *
  [20]    chr10    90-100      *
  -------
  seqinfo: 10 sequences from an unspecified genome; no seqlengths

This appears to be an issue with strict.strand defaulting to TRUE rather than FALSE (as indicated in the documentation for GenomicRanges::intersect), but when I attempt to unset it, the argument isn't recognized.

> intersect(a,b,strict.strand=F)

Error in .local(x, y, ...) : unused argument (strict.strand = FALSE)

> sessionInfo()

R version 4.0.4 (2021-02-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Pop!_OS 21.04

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.13.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] GenomicRanges_1.40.0 GenomeInfoDb_1.24.2  IRanges_2.22.2
[4] S4Vectors_0.26.1     BiocGenerics_0.34.0  nvimcom_0.9-82

loaded via a namespace (and not attached):
[1] zlibbioc_1.34.0        compiler_4.0.4         XVector_0.28.0
[4] tools_4.0.4            GenomeInfoDbData_1.2.3 RCurl_1.98-1.5
[7] bitops_1.0-7
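For comparison, here is the non-strict rule the reporter expected. Under it, "*" is compatible with both "+" and "-", while "+" and "-" are incompatible with each other. This is how findOverlaps() treats strand when ignore.strand is FALSE. It fits in a one-line predicate. `strand_compatible()` is a hypothetical helper, not a GenomicRanges function.

```r
# Hypothetical predicate: "*" matches any strand; "+" and "-" only match themselves.
strand_compatible <- function(a, b) {
  a == b | a == "*" | b == "*"
}

strand_compatible(c("+", "+", "-", "*"), c("*", "-", "-", "+"))
```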

gpos/GPos equivalent for granges/GRanges

Currently, calling granges on a GPos object returns a GRanges object. It would be nice if granges returned the corresponding GPos object instead, or if a separate gpos function were added for the GPos class.

Inefficiency of GPos("chr1:1-4e8") with GenomicRanges 1.37.7

Takes a long time and uses a lot of memory:

library(GenomicRanges)

system.time(gpos <- GPos("chr1:1-4e8"))
#    user  system elapsed 
#   7.342   3.723  11.070 

gc()
#           used (Mb) gc trigger   (Mb)   max used    (Mb)
# Ncells 2170379  116    4387842  234.4    2640964   141.1
# Vcells 3800652   29 1157310983 8829.6 1403787890 10710.1

gpos is stitched so construction should be very fast and use very little memory!
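The expectation rests on how a stitched representation works. It only needs to store the runs of consecutive positions, here a single run (chr1:1-4e8), and any individual position can be computed on demand. Storing all 4e8 positions as a plain integer vector would instead take about 1.6 GB (4 bytes each). The base R sketch below shows the on-demand lookup. `stitched_pos()` and the run-based layout are assumptions for illustration, not the StitchedGPos internals.

```r
# Hypothetical lookup: i-th position in a set of stitched runs, without
# materializing every position.
stitched_pos <- function(run_start, run_width, i) {
  offsets <- cumsum(c(0, run_width[-length(run_width)]))  # positions before each run
  k <- findInterval(i - 1, offsets)                        # run that holds index i
  run_start[k] + (i - 1 - offsets[k])
}

stitched_pos(run_start = 1, run_width = 4e8, i = c(1, 2e8, 4e8))
stitched_pos(run_start = c(1, 100), run_width = c(3, 2), i = 1:5)
```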

import error

Hello, I have installed GenomicRanges version 1.38 on my Linux server, but when I load it, I get the following error:

library(GenomicRanges)
Loading required package: BiocGenerics
Error in value[3L] :
Package ‘BiocGenerics’ version 0.30.0 cannot be unloaded:
Error in unloadNamespace(package) : namespace ‘BiocGenerics’ is imported by ‘GenomicRanges’, ‘XVector’, ‘S4Vectors’, ‘GenomeInfoDb’, ‘IRanges’ so cannot be unloaded

How can I fix this problem? I have installed the newest version of BiocGenerics (0.32) from Bioconductor. Thanks.

gaps and GRangesList

gaps() does not work on a GRangesList:

> gaps(grIntrons.24)
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘gaps’ for signature ‘"CompressedGRangesList"’

I would expect it to work :) Running endoapply(grIntrons.24, gaps) takes forever.
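One common meaning of gaps for a GRangesList is the uncovered stretch between consecutive ranges inside each element, such as the introns between exons. These gaps can be computed for all elements at once instead of looping with endoapply(). The base R sketch below makes these assumptions:

- Ranges are sorted by element and then by start.
- Ranges are non-overlapping within each element, as after reduce().
- All ranges of an element are on one chromosome and strand.

`inner_gaps()` is hypothetical. Unlike gaps() on a GRanges, it ignores the flanks out to the sequence ends.

```r
# Hypothetical vectorized helper: gaps between consecutive ranges of the same group.
inner_gaps <- function(group, start, end) {
  n <- length(start)
  if (n < 2) return(data.frame(group = group[0], start = start[0], end = end[0]))
  gap_start <- end[-n] + 1
  gap_end   <- start[-1] - 1
  keep <- group[-n] == group[-1] & gap_start <= gap_end  # same group, non-empty gap
  data.frame(group = group[-1][keep], start = gap_start[keep], end = gap_end[keep])
}

g <- inner_gaps(group = c(1, 1, 1, 2, 2),
                start = c(1, 20, 50, 5, 30),
                end   = c(10, 30, 60, 10, 40))
print(g)
```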

Could we directly save the metadata when we use the function makeTxDbFromGFF?

Dear Bioconductor team,

Recently, I used the function makeTxDbFromGFF to generate a TxDb from the gff3 file downloaded from Gencode. I also noticed that the additional information in the Gencode gff3 file is very useful, e.g. transcript_support_level, gene_type, and so on. I wondered whether it is possible to keep all this information when converting the gff3 file to a TxDb object. Thank you.

Best regards,
You Zhou

GPos doesn't survive round-trip conversion to data.frame

suppressPackageStartupMessages(library(GenomicRanges))

# Errors
gpos <- GPos(c("chr1", "chr2"), pos = c(3, 10))
as(as.data.frame(gpos), "GPos")
#> Error in .try_to_coerce_to_GRanges_first(from, "UnstitchedGPos"): object to coerce to UnstitchedGPos couldn't be coerced to GRanges first

# Workaround by first coercing to GRanges and then coercing result back to GPos
as(as(as.data.frame(as(gpos, "GRanges")), "GRanges"), "GPos")
#> UnstitchedGPos object with 2 positions and 0 metadata columns:
#>       seqnames       pos strand
#>          <Rle> <integer>  <Rle>
#>   [1]     chr1         3      *
#>   [2]     chr2        10      *
#>   -------
#>   seqinfo: 2 sequences from an unspecified genome; no seqlengths

Created on 2020-08-24 by the reprex package (v0.3.0)

Session info

devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.0.2 (2020-06-22)
#>  os       macOS Catalina 10.15.5      
#>  system   x86_64, darwin17.0          
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_AU.UTF-8                 
#>  ctype    en_AU.UTF-8                 
#>  tz       Australia/Melbourne         
#>  date     2020-08-24                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package          * version  date       lib source        
#>  assertthat         0.2.1    2019-03-21 [1] CRAN (R 4.0.0)
#>  backports          1.1.8    2020-06-17 [1] CRAN (R 4.0.0)
#>  BiocGenerics     * 0.35.4   2020-06-04 [1] Bioconductor  
#>  bitops             1.0-6    2013-08-17 [1] CRAN (R 4.0.0)
#>  callr              3.4.3    2020-03-28 [1] CRAN (R 4.0.0)
#>  cli                2.0.2    2020-02-28 [1] CRAN (R 4.0.0)
#>  crayon             1.3.4    2017-09-16 [1] CRAN (R 4.0.0)
#>  desc               1.2.0    2018-05-01 [1] CRAN (R 4.0.0)
#>  devtools           2.3.1    2020-07-21 [1] CRAN (R 4.0.2)
#>  digest             0.6.25   2020-02-23 [1] CRAN (R 4.0.0)
#>  ellipsis           0.3.1    2020-05-15 [1] CRAN (R 4.0.0)
#>  evaluate           0.14     2019-05-28 [1] CRAN (R 4.0.0)
#>  fansi              0.4.1    2020-01-08 [1] CRAN (R 4.0.0)
#>  fs                 1.5.0    2020-07-31 [1] CRAN (R 4.0.2)
#>  GenomeInfoDb     * 1.25.10  2020-08-06 [1] Bioconductor  
#>  GenomeInfoDbData   1.2.3    2020-04-30 [1] Bioconductor  
#>  GenomicRanges    * 1.41.6   2020-08-12 [1] Bioconductor  
#>  glue               1.4.1    2020-05-13 [1] CRAN (R 4.0.0)
#>  highr              0.8      2019-03-20 [1] CRAN (R 4.0.0)
#>  htmltools          0.5.0    2020-06-16 [1] CRAN (R 4.0.0)
#>  IRanges          * 2.23.10  2020-06-13 [1] Bioconductor  
#>  knitr              1.29     2020-06-23 [1] CRAN (R 4.0.0)
#>  magrittr           1.5      2014-11-22 [1] CRAN (R 4.0.0)
#>  memoise            1.1.0    2017-04-21 [1] CRAN (R 4.0.0)
#>  pkgbuild           1.1.0    2020-07-13 [1] CRAN (R 4.0.0)
#>  pkgload            1.1.0    2020-05-29 [1] CRAN (R 4.0.0)
#>  prettyunits        1.1.1    2020-01-24 [1] CRAN (R 4.0.0)
#>  processx           3.4.3    2020-07-05 [1] CRAN (R 4.0.1)
#>  ps                 1.3.4    2020-08-11 [1] CRAN (R 4.0.2)
#>  R6                 2.4.1    2019-11-12 [1] CRAN (R 4.0.0)
#>  RCurl              1.98-1.2 2020-04-18 [1] CRAN (R 4.0.0)
#>  remotes            2.2.0    2020-07-21 [1] CRAN (R 4.0.2)
#>  rlang              0.4.7    2020-07-09 [1] CRAN (R 4.0.0)
#>  rmarkdown          2.3      2020-06-18 [1] CRAN (R 4.0.0)
#>  rprojroot          1.3-2    2018-01-03 [1] CRAN (R 4.0.0)
#>  S4Vectors        * 0.27.12  2020-06-09 [1] Bioconductor  
#>  sessioninfo        1.1.1    2018-11-05 [1] CRAN (R 4.0.0)
#>  stringi            1.4.6    2020-02-17 [1] CRAN (R 4.0.0)
#>  stringr            1.4.0    2019-02-10 [1] CRAN (R 4.0.0)
#>  testthat           2.3.2    2020-03-02 [1] CRAN (R 4.0.0)
#>  usethis            1.6.1    2020-04-29 [1] CRAN (R 4.0.0)
#>  withr              2.2.0    2020-04-20 [1] CRAN (R 4.0.0)
#>  xfun               0.16     2020-07-24 [1] CRAN (R 4.0.2)
#>  XVector            0.29.3   2020-06-25 [1] Bioconductor  
#>  yaml               2.2.1    2020-02-01 [1] CRAN (R 4.0.0)
#>  zlibbioc           1.35.0   2020-05-14 [1] Bioconductor  
#> 
#> [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library

Context: I'm looking to convert GPos objects to data.frame objects before writing to disk in an HDF5 file so that I can make use of the existing write/read data.frame functionality in rhdf5.
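Until the direct coercion works, another workaround that skips the GRanges detour is to rebuild the GPos from the data.frame columns with the GPos() constructor, called the same way as in the reprex above. This is a minimal sketch, assuming the data.frame has the columns "seqnames", "pos" and "strand" that as.data.frame() produces for a GPos; the helper df_to_GPos() is hypothetical and not part of GenomicRanges, and any other columns are simply treated as metadata columns.

```r
suppressPackageStartupMessages(library(GenomicRanges))

# Hypothetical helper (not part of GenomicRanges): rebuild a GPos from the
# data.frame returned by as.data.frame(<GPos>). Assumes the columns are
# named "seqnames", "pos" and "strand"; remaining columns become mcols.
df_to_GPos <- function(df) {
    core <- c("seqnames", "pos", "strand")
    ans <- GPos(df$seqnames, pos = df$pos, strand = df$strand)
    extra <- setdiff(colnames(df), core)
    if (length(extra) > 0L)
        mcols(ans) <- DataFrame(df[extra], check.names = FALSE)
    ans
}

gpos <- GPos(c("chr1", "chr2"), pos = c(3, 10))
df <- as.data.frame(gpos)   # e.g. what would be written to / read from HDF5
gpos2 <- df_to_GPos(df)
gpos2
```

Because df$seqnames comes back as a factor, its levels carry the original seqlevels order into the rebuilt object; seqlengths and genome are not stored in the data.frame, so they would still need to be restored separately.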

Speed difference depending on how you add mcols to a GRanges

# grl is some large GRangesList
gr <- unlist(grl, use.names = FALSE)

# Case 1: add a metadata column with $<-
system.time(gr$names <- "1")
#   user  system elapsed
# 19.139   0.014  19.152

# Case 2: replace the metadata columns with mcols<-
temp <- gr
gr <- unlist(grl, use.names = FALSE)
system.time(mcols(x = gr) <- DataFrame(row.names = names(gr), names = rep("1", length(gr))))
#  user  system elapsed
# 0.024   0.000   0.025

identical(temp, gr)
# [1] TRUE

This is roughly an 800x speedup in case 2. I think this slows down a lot of Bioconductor packages, since they often use the first syntax.

Or is there a specific reason for this?
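The snippet above depends on an unspecified grl. A self-contained way to check whether the difference reproduces on a given installation is sketched below with synthetic data. The sizes (n_elt, n_per) are arbitrary assumptions, and no particular timing is implied; the point is only to compare the two assignment styles on the same object.

```r
suppressPackageStartupMessages(library(GenomicRanges))

# Synthetic stand-in for the large GRangesList in the report (assumed sizes:
# n_elt list elements of n_per ranges each); adjust to match real data.
n_elt <- 1000L
n_per <- 100L
flat <- GRanges("chr1", IRanges(seq_len(n_elt * n_per), width = 1L))
grl <- split(flat, rep(seq_len(n_elt), each = n_per))
gr <- unlist(grl, use.names = FALSE)

# Case 1: add one metadata column with $<- (the value "1" is recycled)
gr1 <- gr
t1 <- system.time(gr1$names <- "1")

# Case 2: replace all metadata columns at once with mcols<-
gr2 <- gr
t2 <- system.time(mcols(gr2) <- DataFrame(names = rep("1", length(gr2))))

# Elapsed seconds for each approach
print(c(dollar = t1[["elapsed"]], mcols = t2[["elapsed"]]))
```

Both approaches should produce the same "names" column, so any large gap in elapsed time points at the assignment path rather than at the data.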

sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] GenomicFeatures_1.33.2 GenomicRanges_1.33.13 IRanges_2.15.17

...