johncassil / stringr.plus
License: Other
I would like someone to check all functions in the package and make sure that they work when applied to vectors, per @abidawson
I did a quick test and was unsure why I got these results:
`> test_vector <- c('www.carfax.com/vehicle/3GCPKTE77DG348900', 'www.carfax.com/vehicle/3GCPKTE77DG348900', 'www.carfax.com/vehicle/3GCPKTE77DG348900', 'www.carfax.com/vehicle/3GCPKTE77DG348900')
> stringr.plus::str_extract_before('www.carfax.com/vehicle/3GCPKTE77DG348900', "vehicle")
[1] "www.carfax.com/"
> stringr.plus::str_extract_after('www.carfax.com/vehicle/3GCPKTE77DG348900', "vehicle")
[1] "/3GCPKTE77DG348900"
> stringr.plus::str_extract_before(test_vector, "vehicle")
[1] "www.carfax.com/" "www.carfax.com/" "www.carfax.com/" "www.carfax.com/"
> stringr.plus::str_extract_after(test_vector, "vehicle")
[1] "ehicle/3GCPKTE77DG348900" "ehicle/3GCPKTE77DG348900" "ehicle/3GCPKTE77DG348900"
[4] "ehicle/3GCPKTE77DG348900"`
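One way to run this kind of check systematically is to compare a function's output on a whole vector with its output element by element. The following is only a sketch: `is_vectorised()` and `broken()` are hypothetical names introduced here for illustration, and the helper assumes the function under test returns one string per input element.

```r
# Hypothetical helper, not part of stringr.plus: returns TRUE when calling
# f on the whole vector gives the same result as calling it per element.
is_vectorised <- function(f, x, ...) {
  whole <- f(x, ...)
  piecewise <- vapply(x, function(el) f(el, ...), character(1),
                      USE.NAMES = FALSE)
  identical(as.character(whole), piecewise)
}

# A deliberately broken function for demonstration: it computes the start
# position from the first element only and reuses it for every element.
broken <- function(s) substring(s, nchar(s[1]))

is_vectorised(toupper, c("ab", "abcd"))
is_vectorised(broken, c("ab", "abcd"))

# With the package installed, the same check could be applied to its
# functions, for example:
# is_vectorised(stringr.plus::str_extract_after, test_vector, "vehicle")
```

Using inputs of different lengths and with the pattern at different positions makes the check more likely to expose position-dependent bugs.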
Thank you for your excellent work, but I have run into some problems.
R version 4.3.0 (2023-04-21 ucrt) -- "Already Tomorrow"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)
Hi,
Thanks for your great package! It has spared me a lot of googling about regex.
I had a little trouble installing the package at first, because I was using a slightly older version of devtools which expected the default branch to be called 'master' rather than 'main':
devtools::install_github("johncassil/stringr.plus")
Error: Failed to install 'unknown package' from GitHub:
HTTP error 404.
No commit found for the ref master
Did you spell the repo owner (`johncassil`) and repo name (`stringr.plus`) correctly?
- If spelling is correct, check that you have the required permissions to access the repo.
It's not a problem in the latest release of devtools, but for someone unknowingly using an older version, it's a very minor headache.
Would it be possible to update the installation information in the README to
devtools::install_github("johncassil/stringr.plus", ref = "main")
to prevent anyone else with a similarly old version from having the same issue?
Thanks!
Hi there,
A while back I wrote a function to return a window of a given size around a pattern. This is helpful for understanding the context of a detected string (e.g. how is this string used in my data, especially for very long blocks of text) as well as detecting false positives.
For example:
str_context(string = "In a hole in the ground there lived a hobbit.", pattern = "ground", window_size = 6)
would return "...n the ground there..."
There would also be a parameter for how many matches to return.
Does this sound like an interesting/useful function for your package? If so I'd be happy to submit as a Hacktoberfest PR. Thanks!
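A rough base-R sketch of what such a function might look like is given below. It is not the proposer's implementation: the `max_matches` parameter name, the literal (non-regex) matching, and the choice to always add the leading and trailing "..." are assumptions made for illustration.

```r
# Hypothetical sketch of the proposed str_context(); not part of stringr.plus.
# Assumes a single input string and a literal (non-regex) pattern.
str_context <- function(string, pattern, window_size = 10, max_matches = 1) {
  stopifnot(is.character(string), length(string) == 1)
  # Locate every literal occurrence of the pattern.
  hits <- gregexpr(pattern, string, fixed = TRUE)[[1]]
  if (hits[1] == -1) return(character(0))
  starts <- as.integer(hits)
  ends <- starts + attr(hits, "match.length") - 1L
  # Keep at most max_matches occurrences, in order of appearance.
  keep <- seq_len(min(length(starts), max_matches))
  # Clamp each window to the bounds of the string.
  from <- pmax(starts[keep] - window_size, 1L)
  to <- pmin(ends[keep] + window_size, nchar(string))
  paste0("...", substring(string, from, to), "...")
}

str_context(string = "In a hole in the ground there lived a hobbit.",
            pattern = "ground", window_size = 6)
#> [1] "...n the ground there..."
```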
We need to add a couple of good examples to the readme so users understand how this works.
The way these functions are set up might lead to some confusing behaviour:
library(stringr.plus)
url <- 'www.carfax.com/vehicle/3GCPKTE77DG348900'
str_extract_before(string = url, pattern = '/')
#> [1] "www.carfax.com"
What if we wanted everything before the last "/"?
Likewise:
str_extract_after(string = url, pattern = '/')
#> [1] "vehicle/3GCPKTE77DG348900"
What if we wanted everything after the last "/"?
The default is to find the first location of the pattern and use that, but we could add a "which" argument that accepts "first" and "last", defaulting to "first", for finer-grained selection. It could also take a number and extract the text before/after the nth occurrence of a pattern (for cases when you know there are seven slashes/underscores and you want the data after the 5th).
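The `which` idea above could be sketched in base R roughly as follows. This is only an illustration under stated assumptions: `extract_around()` and its `side` argument are hypothetical names, the pattern is matched literally rather than as a regex, and this is not how stringr.plus is implemented.

```r
# Hypothetical sketch of a `which` argument; not the stringr.plus API.
# `which` may be "first", "last", or a number n (the nth occurrence).
extract_around <- function(string, pattern, which = "first",
                           side = c("before", "after")) {
  side <- match.arg(side)
  vapply(string, function(s) {
    if (is.na(s)) return(NA_character_)
    # Positions of all literal occurrences of the pattern in this element.
    hits <- gregexpr(pattern, s, fixed = TRUE)[[1]]
    if (hits[1] == -1) return(NA_character_)
    idx <- if (identical(which, "first")) {
      1L
    } else if (identical(which, "last")) {
      length(hits)
    } else {
      as.integer(which)
    }
    # A missing or out-of-range occurrence yields NA.
    if (is.na(idx) || idx < 1L || idx > length(hits)) return(NA_character_)
    if (side == "before") {
      substring(s, 1L, hits[idx] - 1L)
    } else {
      substring(s, hits[idx] + attr(hits, "match.length")[idx])
    }
  }, character(1), USE.NAMES = FALSE)
}

url <- 'www.carfax.com/vehicle/3GCPKTE77DG348900'
extract_around(url, "/", which = "last", side = "before")
#> [1] "www.carfax.com/vehicle"
extract_around(url, "/", which = "last", side = "after")
#> [1] "3GCPKTE77DG348900"
```

Because each element is processed separately, the sketch also returns one result per element when given a vector.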
Have you considered leveraging the power of lookahead/lookbehind? It looks like it might be easier to maintain.
str_before <- function(string, pattern, n = NULL) {
  n_str <- ifelse(is.null(n), "*?", glue::glue("{<n>}", .open = "<", .close = ">"))
  new_pattern <- glue::glue(".{n_str}(?={pattern})")
  stringr::str_extract(string, new_pattern)
}
str_before("www.carfax.com/vehicle/3GCPKTE77DG348900", "/")
#> [1] "www.carfax.com"
str_before("www.carfax.com/vehicle/3GCPKTE77DG348900", ".com")
#> [1] "www.carfax"
str_before("www.carfax.com/vehicle/3GCPKTE77DG348900", ".com", 6)
#> [1] "carfax"
str_after <- function(string, pattern, n = NULL) {
  n_str <- ifelse(is.null(n), "*", glue::glue("{<n>}", .open = "<", .close = ">"))
  new_pattern <- glue::glue("(?<={pattern}).{n_str}")
  stringr::str_extract(string, new_pattern)
}
str_after("www.carfax.com/vehicle/3GCPKTE77DG348900", "/")
#> [1] "vehicle/3GCPKTE77DG348900"
str_after("www.carfax.com/vehicle/3GCPKTE77DG348900", "vehicle/")
#> [1] "3GCPKTE77DG348900"
str_after("www.carfax.com/vehicle/3GCPKTE77DG348900", "vehicle/", n = 6)
#> [1] "3GCPKT"
Created on 2020-08-14 by the reprex package (v0.3.0)
Also, have you considered submitting such a proposal to {stringr} itself? I see it's not something {stringr} is interested in (at least in 2018): tidyverse/stringr#222
I use this pattern all the time, so I'm interested in it.
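One caveat with the lookahead/lookbehind helpers above is that the pattern is interpolated as a regular expression, so the "." in ".com" matches any character. A possible variation, sketched here in base R with `perl = TRUE`, wraps the pattern in `\Q...\E` so it is treated literally. The function names are hypothetical, and the sketch assumes the pattern does not itself contain `\E`.

```r
# Hypothetical helpers; names are illustrative and not part of any package.
# Wrap the pattern in \Q...\E so regex metacharacters are matched literally.
quote_literal <- function(pattern) paste0("\\Q", pattern, "\\E")

# Return the first match of rx in each element, or NA when there is none.
extract_with <- function(string, rx) {
  m <- regexpr(rx, string, perl = TRUE)
  hit <- !is.na(m) & m != -1L
  out <- rep(NA_character_, length(string))
  out[hit] <- regmatches(string, m)
  out
}

# Everything before the first literal occurrence of the pattern (lookahead).
before_literal <- function(string, pattern) {
  extract_with(string, paste0("^.*?(?=", quote_literal(pattern), ")"))
}

# Everything after the first literal occurrence of the pattern (lookbehind).
after_literal <- function(string, pattern) {
  extract_with(string, paste0("(?<=", quote_literal(pattern), ").*"))
}

before_literal("www.carfax.com/vehicle/3GCPKTE77DG348900", ".com")
#> [1] "www.carfax"
after_literal("www.carfax.com/vehicle/3GCPKTE77DG348900", "vehicle/")
#> [1] "3GCPKTE77DG348900"
before_literal("wwwXcom.com", ".com")
#> [1] "wwwXcom"
```

In the last call an unescaped ".com" would also match "Xcom" and stop earlier, which is the behaviour the quoting is meant to avoid.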