johncassil / stringr.plus

License: Other

R 100.00%

stringr.plus's Issues

can't install your packages

Thank you for your excellent work, but I have run into a problem installing the package.

R version 4.3.0 (2023-04-21 ucrt) -- "Already Tomorrow"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

remotes::install_github("johncassil/stringr.plus")
Downloading GitHub repo johncassil/stringr.plus@HEAD
── R CMD build ─────────────────────────────────────────────
✔ checking for file 'C:\Users\yongfengshi\AppData\Local\Temp\Rtmp8gU6Pt\remotesc879e13c09\johncassil-stringr.plus-111503d/DESCRIPTION' (742ms)
─ preparing 'stringr.plus':
✔ checking DESCRIPTION meta-information ...
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
Omitted 'LazyData' from DESCRIPTION
─ building 'stringr.plus_0.1.2.tar.gz'

Installing package into 'C:/Users/yongfengshi/AppData/Local/R/win-library/4.3'
(as 'lib' is unspecified)
'\Mac\Home\Documents'
CMD.EXE was started with the above path as the current directory.
UNC paths are not supported. Defaulting to Windows directory.
* installing *source* package 'stringr.plus' ...
** using staged installation
** R
** byte-compile and prepare package for lazy loading
ERROR: lazy loading failed for package 'stringr.plus'
* removing 'C:/Users/yongfengshi/AppData/Local/R/win-library/4.3/stringr.plus'
Warning message:
In i.p(...) :
  installation of package 'C:/Users/YONGFE~1/AppData/Local/Temp/Rtmp8gU6Pt/filec84c7c4650/stringr.plus_0.1.2.tar.gz' had non-zero exit status


New function idea: str_context

Hi there,

A while back I wrote a function to return a window of a given size around a pattern. This is helpful for understanding the context of a detected string (e.g. how is this string used in my data, especially for very long blocks of text) as well as detecting false positives.

For example:
str_context(string = "In a hole in the ground there lived a hobbit.", pattern = "ground", window_size = 6)
would return "...n the ground there..."

There would also be a parameter for how many matches to return.

Does this sound like an interesting/useful function for your package? If so I'd be happy to submit as a Hacktoberfest PR. Thanks!
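For illustration, here is a minimal sketch of how such a function might look. It is not the proposer's actual implementation: the `n_matches` argument name, the default window size, and the restriction to a single input string are all assumptions made for this example.

```r
library(stringr)

# Hypothetical sketch of the proposed str_context(); `n_matches` is an
# assumed name for the "how many matches to return" parameter.
str_context <- function(string, pattern, window_size = 10, n_matches = Inf) {
  stopifnot(length(string) == 1)

  # One row per match, with columns "start" and "end"
  locs <- str_locate_all(string, pattern)[[1]]
  if (nrow(locs) == 0) return(character(0))

  # Keep at most n_matches matches
  locs <- locs[seq_len(min(nrow(locs), n_matches)), , drop = FALSE]

  # Widen each match by window_size, clamped to the string boundaries
  starts <- pmax(unname(locs[, "start"]) - window_size, 1)
  ends <- pmin(unname(locs[, "end"]) + window_size, str_length(string))

  paste0("...", str_sub(string, starts, ends), "...")
}

str_context("In a hole in the ground there lived a hobbit.", "ground", window_size = 6)
#> [1] "...n the ground there..."
```

This sketch always adds the "..." markers, even when the window reaches the start or end of the string; a real implementation might want to drop them in that case.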

Installation information in README

Hi,

Thanks for your great package! It has spared me a lot of googling about regex.

I had a little trouble installing the package at first, because I was using a slightly older version of devtools which expected the default branch to be called 'master' rather than 'main':

devtools::install_github("johncassil/stringr.plus")
Error: Failed to install 'unknown package' from GitHub:
  HTTP error 404.
  No commit found for the ref master

  Did you spell the repo owner (`johncassil`) and repo name (`stringr.plus`) correctly?
  - If spelling is correct, check that you have the required permissions to access the repo.

It's not a problem in the latest release of devtools, but for someone unknowingly using an older version, it's a very minor headache.

Would it be possible to update the installation information in the README to

devtools::install_github("johncassil/stringr.plus", ref = "main")

to prevent anyone else with a similarly old version from having the same issue?

Thanks!

Alternative implementation: lookahead/lookbehind

Have you considered leveraging the power of lookahead/lookbehind? It looks like it might be easier to maintain.

str_before <- function(string, pattern, n = NULL) {
  n_str <- ifelse(is.null(n), "*?", glue::glue("{<n>}", .open = "<", .close = ">"))
  new_pattern <- glue::glue(".{n_str}(?={pattern})")
  stringr::str_extract(string, new_pattern)
}
str_before("www.carfax.com/vehicle/3GCPKTE77DG348900", "/")
#> [1] "www.carfax.com"
str_before("www.carfax.com/vehicle/3GCPKTE77DG348900", ".com")
#> [1] "www.carfax"
str_before("www.carfax.com/vehicle/3GCPKTE77DG348900", ".com", 6)
#> [1] "carfax"

str_after <- function(string, pattern, n = NULL) {
  n_str <- ifelse(is.null(n), "*", glue::glue("{<n>}", .open = "<", .close = ">"))
  new_pattern <- glue::glue("(?<={pattern}).{n_str}")
  stringr::str_extract(string, new_pattern)
}
str_after("www.carfax.com/vehicle/3GCPKTE77DG348900", "/")
#> [1] "vehicle/3GCPKTE77DG348900"
str_after("www.carfax.com/vehicle/3GCPKTE77DG348900", "vehicle/")
#> [1] "3GCPKTE77DG348900"
str_after("www.carfax.com/vehicle/3GCPKTE77DG348900", "vehicle/", n = 6)
#> [1] "3GCPKT"

Created on 2020-08-14 by the reprex package (v0.3.0)

~~Also, have you considered submitting such a proposal to {stringr} itself?~~ I see it's not something {stringr} is interested in (at least in 2018): tidyverse/stringr#222

I use this pattern all the time, so I'm interested in it.

Vectorization weirdness

I would like someone to check all functions in the package and make sure that they work when applied to vectors, per @abidawson.

I did a quick test and was unsure why I got these results:

test_vector <- c('www.carfax.com/vehicle/3GCPKTE77DG348900', 'www.carfax.com/vehicle/3GCPKTE77DG348900', 'www.carfax.com/vehicle/3GCPKTE77DG348900', 'www.carfax.com/vehicle/3GCPKTE77DG348900')

stringr.plus::str_extract_before('www.carfax.com/vehicle/3GCPKTE77DG348900', "vehicle")
[1] "www.carfax.com/"
stringr.plus::str_extract_after('www.carfax.com/vehicle/3GCPKTE77DG348900', "vehicle")
[1] "/3GCPKTE77DG348900"
stringr.plus::str_extract_before(test_vector, "vehicle")
[1] "www.carfax.com/" "www.carfax.com/" "www.carfax.com/" "www.carfax.com/"
stringr.plus::str_extract_after(test_vector, "vehicle")
[1] "ehicle/3GCPKTE77DG348900" "ehicle/3GCPKTE77DG348900" "ehicle/3GCPKTE77DG348900"
[4] "ehicle/3GCPKTE77DG348900"
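One way to check a function systematically is to compare its result on a whole vector with the results of calling it on each element separately. The sketch below assumes the function under test returns one string per input element. `is_vectorized()` and `after()` are hypothetical names used only for this example, and `after()` is a lookbehind stand-in, not the package's `str_extract_after()`.

```r
library(stringr)

# Hypothetical checker: TRUE when calling `f` on the whole vector gives the
# same result as calling it on each element separately.
is_vectorized <- function(f, x, ...) {
  whole <- f(x, ...)
  each <- vapply(x, function(el) f(el, ...), character(1), USE.NAMES = FALSE)
  identical(unname(whole), each)
}

# Stand-in for the function under test: everything after the first match
after <- function(string, pattern) {
  str_extract(string, paste0("(?<=", pattern, ").*"))
}

test_vector <- c("www.carfax.com/vehicle/3GCPKTE77DG348900",
                 "www.example.com/vehicle/ABC123")

is_vectorized(after, test_vector, "vehicle")
#> [1] TRUE
```

Using inputs that differ from one another, as above, makes it easier to spot results that were computed from the first element only.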

Add which = "first"/"last" arg to before/after functions

The way these functions are set up might lead to some confusing behaviour:

library(stringr.plus)
url <- 'www.carfax.com/vehicle/3GCPKTE77DG348900'

str_extract_before(string = url, pattern = '/')
#> [1] "www.carfax.com"

What if we wanted everything before the last "/"?
Likewise:

str_extract_after(string = url, pattern = '/')
#> [1] "vehicle/3GCPKTE77DG348900"

What if we wanted everything after the last "/"?

By default, these functions find the first location of the pattern and use that. We could add a "which" argument that accepts "first" and "last", defaulting to "first", for finer-grained selection. It could also take a number and extract the text before/after the nth occurrence of a pattern (for cases when you know there are seven slashes/underscores and you want the data after the 5th).
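A rough sketch of how the proposed argument might behave is below. The function names `pick_match()`, `str_after_which()`, and `str_before_which()` are hypothetical and not part of stringr.plus, and the sketch assumes a single input string rather than a vector.

```r
library(stringr)

# Hypothetical helper: row index of the match selected by `which`
# ("first", "last", or a number), or NA if there is no such match.
pick_match <- function(n_matches, which) {
  i <- if (identical(which, "first")) {
    1L
  } else if (identical(which, "last")) {
    n_matches
  } else {
    as.integer(which)
  }
  if (is.na(i) || i < 1 || i > n_matches) NA_integer_ else i
}

# Text after the selected match of `pattern`
str_after_which <- function(string, pattern, which = "first") {
  stopifnot(length(string) == 1)
  locs <- str_locate_all(string, pattern)[[1]]
  i <- pick_match(nrow(locs), which)
  if (is.na(i)) return(NA_character_)
  str_sub(string, start = unname(locs[i, "end"]) + 1L)
}

# Text before the selected match of `pattern`
str_before_which <- function(string, pattern, which = "first") {
  stopifnot(length(string) == 1)
  locs <- str_locate_all(string, pattern)[[1]]
  i <- pick_match(nrow(locs), which)
  if (is.na(i)) return(NA_character_)
  str_sub(string, end = unname(locs[i, "start"]) - 1L)
}

url <- "www.carfax.com/vehicle/3GCPKTE77DG348900"
str_before_which(url, "/", which = "last")
#> [1] "www.carfax.com/vehicle"
str_after_which(url, "/", which = "last")
#> [1] "3GCPKTE77DG348900"
```

Returning `NA` when the requested occurrence does not exist is one possible design choice; raising an error would be another.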

add str_extract_context examples to the readme

We need to add a couple of good examples to the readme so users understand how this works.
