StringSplitter

NAME

StringSplitter - String#split on steroids

INSTALLATION

gem "string_splitter"

SYNOPSIS

require "string_splitter"

ss = StringSplitter.new

Same as String#split

ss.split("foo bar baz")
ss.split("foo bar baz", " ")
ss.split("foo bar baz", /\s+/)
# => ["foo", "bar", "baz"]

ss.split("foo", "")
ss.split("foo", //)
# => ["f", "o", "o"]

ss.split("", "...")
ss.split("", /.../)
# => []

Split at the first delimiter

ss.split("foo:bar:baz:quux", ":", at: 1)
ss.split("foo:bar:baz:quux", ":", select: 1)
# => ["foo", "bar:baz:quux"]

Split at the last delimiter

ss.split("foo:bar:baz:quux", ":", at: -1)
# => ["foo:bar:baz", "quux"]

Split at multiple delimiter positions

ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1..3, -1])
# => ["1", "2", "3", "4:5:6:7:8", "9"]

Split at all but the first and last delimiters

ss.split("1:2:3:4:5:6", ":", except: [1, -1])
ss.split("1:2:3:4:5:6", ":", reject: [1, -1])
# => ["1:2", "3", "4", "5:6"]

Split from the right

ss.rsplit("1:2:3:4:5:6:7:8:9", ":", at: [1..3, -1])
# => ["1", "2:3:4:5:6", "7", "8", "9"]

Split with negative, descending, and infinite ranges

ss.split("1:2:3:4:5:6:7:8:9", ":", at: ..-3)
# => ["1", "2", "3", "4", "5", "6", "7:8:9"]

ss.split("1:2:3:4:5:6:7:8:9", ":", at: 4...)
# => ["1:2:3:4", "5", "6", "7", "8:9"]

ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1, 5..3, -2..])
# => ["1", "2:3", "4", "5", "6:7", "8", "9"]

Full control via a block

result = ss.split("1:2:3:4:5:6:7:8", ":") do |split|
  split.pos % 2 == 0
end
# => ["1:2", "3:4", "5:6", "7:8"]

string = "banana".chars.sort.join # "aaabnn"

ss.split(string, "") do |split|
  split.rhs != split.lhs
end
# => ["aaa", "b", "nn"]

DESCRIPTION

Many languages have built-in split functions/methods for strings. They behave similarly (notwithstanding the occasional surprise), and handle a few common cases, e.g.:

  • limiting the number of splits
  • including the separator(s) in the results
  • removing (some) empty fields

But, because the API is squeezed into two overloaded parameters (the delimiter and the limit), achieving the desired results can be tricky. For instance, while String#split removes empty trailing fields (by default), it provides no way to remove all empty fields. Likewise, the cramped API means there's no way to, e.g., combine a limit (positive integer) with the option to preserve empty fields (negative integer), or use backreferences in a delimiter pattern without including its captured subexpressions in the result.
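
The following plain-Ruby snippet (standard library only, no gem required) illustrates these limitations of String#split. The sample strings and variable names are made up for illustration.

```ruby
# Limit omitted (or 0): trailing empty fields are dropped, leading ones are kept.
default_fields = ":foo::bar::".split(":")
# => ["", "foo", "", "bar"]

# Negative limit: every field is kept, including the trailing empty ones.
all_fields = ":foo::bar::".split(":", -1)
# => ["", "foo", "", "bar", "", ""]

# Positive limit: caps the number of fields and leaves the remainder unsplit.
limited = "foo:bar:baz:quux".split(":", 2)
# => ["foo", "bar:baz:quux"]

# Removing all empty fields needs a separate filtering step.
non_empty = ":foo::bar::".split(":").reject(&:empty?)
# => ["foo", "bar"]

# A group that exists only to support a backreference still ends up in the result.
with_capture = "a--b==c".split(/([-=])\1/)
# => ["a", "-", "b", "=", "c"]
```

Since the limit is a single integer, the "cap the number of fields" and "keep empty fields" behaviours cannot be requested together.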

If split were being written from scratch, without the baggage of its legacy API, it's possible that some of these options would be made explicit rather than overloading the parameters. Indeed, some implementations already do this, e.g. Crystal:

":foo:bar:baz:".split(":", remove_empty: false)
# => ["", "foo", "bar", "baz", ""]

":foo:bar:baz:".split(":", remove_empty: true)
# => ["foo", "bar", "baz"]

StringSplitter takes this one step further by moving the configuration out of the method altogether and delegating the strategy — i.e. which splits should be accepted or rejected — to a block:

ss = StringSplitter.new

ss.split("foo:bar:baz", ":") { |split| split.index == 0 }
# => ["foo", "bar:baz"]

ss.split("foo:bar:baz:quux", ":") do |split|
  split.position == 1 || split.position == 3
end
# => ["foo", "bar:baz", "quux"]

As a shortcut, the common case of splitting (or not splitting) at one or more positions is supported by dedicated options:

ss.split("foo:bar:baz:quux", ":", select: [1, -1])
# => ["foo", "bar:baz", "quux"]

ss.split("foo:bar:baz:quux", ":", reject: [1, -1])
# => ["foo:bar", "baz:quux"]
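
To make the idea concrete, here is a minimal sketch of the accept/reject approach in plain Ruby. It is not StringSplitter's implementation: split_where is a hypothetical helper which only supports plain string delimiters (other than a single space, which String#split treats specially), and it passes the block a 1-based position and the delimiter count rather than a split object.

```ruby
# Hypothetical helper (not part of StringSplitter): split `string` on the
# string `delimiter`, but only at the delimiters the block accepts.
# The block receives the delimiter's 1-based position and the delimiter count.
def split_where(string, delimiter)
  fields = string.split(delimiter, -1) # -1 keeps trailing empty fields
  return [] if fields.empty?

  count = fields.length - 1 # number of delimiters found
  result = [fields.first.dup]

  fields.drop(1).each_with_index do |field, index|
    if yield(index + 1, count)
      result << field.dup               # accepted: start a new field
    else
      result.last << delimiter << field # rejected: glue the pieces back together
    end
  end

  result
end

# Split at the first and last delimiters only.
first_and_last = split_where("foo:bar:baz:quux", ":") do |position, count|
  position == 1 || position == count
end
# => ["foo", "bar:baz", "quux"]

# Split everywhere except the first and last delimiters.
middle_only = split_where("foo:bar:baz:quux", ":") do |position, count|
  position != 1 && position != count
end
# => ["foo:bar", "baz:quux"]
```

The sketch splits at every delimiter first and then rejoins the rejected ones, which is why it cannot handle regex delimiters: the matched text is no longer available when rejoining.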

WHY?

I wanted to split semi-structured output into fields without having to resort to a regex or a full-blown parser.

As an example, the nominally unstructured output of many Unix commands is often formatted in a way that's tantalizingly close to being machine-readable, apart from a few pesky exceptions, e.g.:

$ ls -l

-rw-r--r-- 1 user users   87 Jun 18 18:16 CHANGELOG.md
-rw-r--r-- 1 user users  254 Jun 19 21:21 Gemfile
drwxr-xr-x 3 user users 4096 Jun 19 22:56 lib
-rw-r--r-- 1 user users 8952 Jun 18 18:16 LICENSE.md
-rw-r--r-- 1 user users 3134 Jun 19 22:59 README.md

These lines can almost be parsed into an array of fields by splitting them on whitespace. The exception is the date (columns 6-8), i.e.:

line = "-rw-r--r-- 1 user users   87 Jun 18 18:16 CHANGELOG.md"
line.split

gives:

["-rw-r--r--", "1", "user", "users", "87", "Jun", "18", "18:16", "CHANGELOG.md"]

instead of:

["-rw-r--r--", "1", "user", "users", "87", "Jun 18 18:16", "CHANGELOG.md"]

One way to work around this is to parse the whole line, e.g.:

line.match(/^(\S+) \s+ (\d+) \s+ (\S+) \s+ (\S+) \s+ (\d+) \s+ (\S+ \s+ \d+ \s+ \S+) \s+ (.+)$/x)

But that requires us to specify everything. What we really want is a version of split which allows us to veto splitting for the 6th and 7th delimiters (and to stop after the 8th delimiter), i.e. control over which splits are accepted, rather than being restricted to the single, baked-in strategy provided by the limit parameter.
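
Both limitations can be seen in plain Ruby (standard library only). The pattern above does recover the fields, but only by spelling out every column, while the limit parameter can stop splitting early but cannot skip delimiters in the middle of the line. The variable names are chosen for illustration.

```ruby
line = "-rw-r--r-- 1 user users   87 Jun 18 18:16 CHANGELOG.md"

# Parsing the whole line works, but every column has to be described.
pattern = /^(\S+) \s+ (\d+) \s+ (\S+) \s+ (\S+) \s+ (\d+) \s+ (\S+ \s+ \d+ \s+ \S+) \s+ (.+)$/x
parsed = line.match(pattern).captures
# => ["-rw-r--r--", "1", "user", "users", "87", "Jun 18 18:16", "CHANGELOG.md"]

# A limit keeps the date intact, but only by leaving everything after it unsplit.
limited = line.split(" ", 6)
# => ["-rw-r--r--", "1", "user", "users", "87", "Jun 18 18:16 CHANGELOG.md"]
```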

By providing a simple way to accept or reject each split, StringSplitter makes cases like this easy to handle, either via a block:

ss.split(line) do |split|
  case split.position when 1..5, 8 then true end
end
# => ["-rw-r--r--", "1", "user", "users", "87", "Jun 18 18:16", "CHANGELOG.md"]

Or via its option shortcut:

ss.split(line, at: [1..5, 8])
# => ["-rw-r--r--", "1", "user", "users", "87", "Jun 18 18:16", "CHANGELOG.md"]

CAVEATS

Differences from String#split

Unlike String#split, StringSplitter doesn't trim the string before splitting if the delimiter is omitted or a single space, e.g.:

" foo bar baz ".split          # => ["foo", "bar", "baz"]
" foo bar baz ".split(" ")     # => ["foo", "bar", "baz"]

ss.split(" foo bar baz ")      # => ["", "foo", "bar", "baz", ""]
ss.split(" foo bar baz ", " ") # => ["", "foo", "bar", "baz", ""]
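
For comparison, the untrimmed result is what plain String#split returns for a regex delimiter with a negative limit, and stripping the string first restores the trimmed fields. The snippet below uses only String#split and String#strip; whether to strip a string before passing it to ss.split is left to the caller.

```ruby
padded = " foo bar baz "

# A regex delimiter with a negative limit keeps the empty outer fields.
untrimmed = padded.split(/\s+/, -1)
# => ["", "foo", "bar", "baz", ""]

# Stripping the surrounding whitespace first removes them.
trimmed = padded.strip.split(/\s+/, -1)
# => ["foo", "bar", "baz"]
```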

String#split omits the nil values of unmatched optional captures:

"foo:bar:baz".scan(/(:)|(-)/)  # => [[":", nil], [":", nil]]
"foo:bar:baz".split(/(:)|(-)/) # => ["foo", ":", "bar", ":", "baz"]

StringSplitter preserves them (when include_captures is true, as it is by default), though they can be omitted from spread captures by passing :compact as the value of the spread_captures option:

s1 = StringSplitter.new(spread_captures: true)
s2 = StringSplitter.new(spread_captures: false)
s3 = StringSplitter.new(spread_captures: :compact)

s1.split("foo:bar:baz", /(:)|(-)/) # => ["foo", ":", nil, "bar", ":", nil, "baz"]
s2.split("foo:bar:baz", /(:)|(-)/) # => ["foo", [":", nil], "bar", [":", nil], "baz"]
s3.split("foo:bar:baz", /(:)|(-)/) # => ["foo", ":", "bar", ":", "baz"]

COMPATIBILITY

StringSplitter is tested and supported on all versions of Ruby supported by the ruby-core team, i.e., currently, Ruby 2.5 and above.

VERSION

0.7.3

SEE ALSO

Gems

  • rsplit - a reverse-split implementation (only works with string delimiters)

AUTHOR

chocolateboy

COPYRIGHT AND LICENSE

Copyright © 2018-2020 by chocolateboy.

This is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0.

string_splitter's Issues

Trailing empty fields

StringSplitter#split (and #rsplit) produces a trailing empty field if the separator is empty, e.g.:

ss.split("foo", "")   # => ["f", "o", "o", ""]
ss.split("foo", //)   # => ["f", "o", "o", ""]
ss.split("foo", /()/) # => ["f", "", "o", "", "o", "", ""]

This is because String#split includes them in its results when called with a negative limit:

"foo".split("", -1)   # => ["f", "o", "o", ""]
"foo".split(//, -1)   # => ["f", "o", "o", ""]
"foo".split(/()/, -1) # => ["f", "", "o", "", "o", "", ""]

We can omit them by passing a limit of 0 (the default), but this produces the wrong results when the string ends with non-empty delimiters, e.g.:

# right
"foo:bar:baz:".split(":", -1)    # => ["foo", "bar", "baz", ""]
"foo:bar:baz::".split(/:/, -1)   # => ["foo", "bar", "baz", "", ""]
"foo:bar:baz::".split(/(:)/, -1) # => ["foo", ":", "bar", ":", "baz", ":", "", ":", ""]

# wrong
"foo:bar:baz:".split(":", 0)    # => ["foo", "bar", "baz"]
"foo:bar:baz::".split(":", 0)   # => ["foo", "bar", "baz"]
"foo:bar:baz::".split(/:/, 0)   # => ["foo", "bar", "baz"]
"foo:bar:baz::".split(/(:)/, 0) # => ["foo", ":", "bar", ":", "baz", ":", "", ":"]

remove_empty removes non-empty delimiters

StringSplitter: 1.1.1
Ruby: ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-linux]
OS: Linux (Arch)

The remove_empty option removes empty fields, as expected, but it also appears to be removing non-empty delimiters.

Test

ss = StringSplitter.new(remove_empty: true, include_captures: true)
ss.split("::", ":")

Expected

[":", ":"]

Actual

[]
