Reversible string split

Image from @DasSurma

Status

Champion(s): Luca Casonato

Author(s): Luca Casonato

Stage: 1

Motivation

The string split method in JavaScript behaves unlike the string split methods in nearly all other languages. In JavaScript, split(sep, N) is essentially a regular split, but with the output array truncated to the first N values. JavaScript treats N as both the number of splits and the number of returned items.

In most other languages a splitN instead splits the original string into N items, including a remainder. They thus split N-1 times. The last item in the returned array contains the "remainder" of the string.

# Perl

print join("\n", split(/\|/, 'a|b|c|d|e|f', 2));

# a
# b|c|d|e|f
<!-- PHP -->
<?php

print join("\n", explode("|", "a|b|c|d|e|f", 2));

# a
# b|c|d|e|f
# Ruby

print 'a|b|c|d|e|f'.split('|', 2)

# ["a", "b|c|d|e|f"]
// Go

package main

import (
  "fmt"
  "strings"
)

func main() {
  fmt.Printf("%#v", strings.SplitN("a|b|c|d|e|f", "|", 2))
}

// []string{"a", "b|c|d|e|f"}
// Rust

fn main() {
  let v = "a|b|c|d|e|f".splitn(2, "|").collect::<Vec<_>>();
  println!("{:?}", v);
}

// ["a", "b|c|d|e|f"]
// Java

class Playground {
  public static void main(String[] args) {
    String s = new String("a|b|c|d|e|f");
    for(String val : s.split("\\|", 2)) {
      System.out.println(val);
    }
  }
}

// a
// b|c|d|e|f
# Python

print('a|b|c|d|e|f'.split('|', 2))

# ['a', 'b', 'c|d|e|f']
// JavaScript

console.log("a|b|c|d|e|f".split("|", 2));

// ["a", "b"]

The first six of the eight languages agree here. They take N to mean "the number of items returned", with the remainder as the last item in the returned array. This means they actually split N-1 times.

Python also agrees that the remainder should be returned as the last item in the array, but it disagrees with the rest about what N means: Python splits N times and returns N+1 items.

JavaScript diverges from the pack completely though: it splits N times and returns N items, but does not return a remainder at all. It is the only language shown here to do so.

Even though Python and the other languages are slightly different from each other, all their algorithms have a common feature that JavaScript is missing: their splits are reversible. This means that if you split a string into N items, you can join them back together without losing any information.

Reversible splits have the property that for any string V, any separator S, and any positive integer N, the following holds:

join(S, V.split(S, N)) == V;
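
As a quick check of this property against the current method, the snippet below joins the result of today's split(S, N) back together. The names V, S and N mirror the formula; the sample values are the ones used in the language comparison and are only illustrative.

```javascript
// Names V, S and N follow the formula above; the values are sample inputs.
const V = "a|b|c|d|e|f";
const S = "|";
const N = 2;

// Today's split truncates the result to N items and drops the remainder.
const parts = V.split(S, N);
const rejoined = parts.join(S);

console.log(parts); // ["a", "b"]
console.log(rejoined); // "a|b"
console.log(rejoined === V); // false: the split was not reversible
```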

This reversibility allows string splits to be used for some very useful tasks where the current split method does not work. The most common use case is prefix splits:

Prefix splits

Many formats out there are character delimited. It is useful to be able to easily split a string at those predefined "split points" into two parts. For example, the INI file format uses the = character to separate keys from values, and the \n character to separate key-value pairs from each other.

key = value
other_key = 'value contains an = sign'

With the current "split" in JavaScript, parsing this is not as obvious as with the "more popular" splitting algorithm:

// Current JavaScript
const ini = Deno.readTextFileSync("./test.ini");
const entries = ini.split("\n").map((line) => {
  const [key, ...rest] = line.split("=");
  return [key, rest.join("=")];
});

// Splitting as other languages do (using the proposed splitn)
const ini = Deno.readTextFileSync("./test.ini");
const entries = ini.split("\n").map((line) => line.splitn("=", 2));

Note: I am aware this could be made more efficient with a different "parser". That is not the point. The point is to make the obvious thing easy.

This behaviour is not just relevant for the INI file format, but also for things like HTTP/1.1 headers, key-value pairs in Cookie headers, and many more.
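
To make the header case concrete, here is a minimal sketch in current JavaScript. The helper name splitOnce and the sample header line are assumptions made for illustration; they are not part of the proposal. The sketch shows what a prefix split has to do by hand today, next to what split(":", 2) returns.

```javascript
// Hypothetical helper: split `line` at the first occurrence of `sep` only.
function splitOnce(line, sep) {
  const idx = line.indexOf(sep);
  if (idx === -1) return [line]; // no separator: the whole line is one item
  return [line.slice(0, idx), line.slice(idx + sep.length)];
}

// Sample header whose value itself contains the separator.
const header = "Location: https://example.com:8080/path";

// The current split truncates, so most of the value is lost.
const truncated = header.split(":", 2);
console.log(truncated); // ["Location", " https"]

// A prefix split keeps the remainder intact.
const [name, value] = splitOnce(header, ":");
console.log(name); // "Location"
console.log(value.trim()); // "https://example.com:8080/path"
```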

Proposal

The proposal is to add reversible string split support to JavaScript. It adds a String.prototype.splitn method that splits the input string at most N-1 times, returning at most N substrings. The last item contains the remainder of the string.

console.log("a|b|c|d|e|f".splitn("|", 2));
// ["a", "b|c|d|e|f"]

The naming is taken from Rust and Go.
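
The described behaviour can be approximated in userland today. The following is a rough sketch written as a standalone function rather than a prototype method. It assumes a non-empty string separator and a positive integer N; how the real method would treat regular expressions, an empty separator, or other values of N is not specified here, so the sketch simply rejects them.

```javascript
// Sketch only: a standalone approximation of the proposed splitn.
// Assumes `sep` is a non-empty string and `n` is a positive integer.
function splitn(str, sep, n) {
  if (typeof sep !== "string" || sep === "") {
    throw new TypeError("this sketch only supports non-empty string separators");
  }
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("this sketch requires n to be a positive integer");
  }

  const parts = [];
  let start = 0;
  // Split at most n - 1 times.
  while (parts.length < n - 1) {
    const idx = str.indexOf(sep, start);
    if (idx === -1) break; // no more separators
    parts.push(str.slice(start, idx));
    start = idx + sep.length;
  }
  // The last item is the remainder of the string.
  parts.push(str.slice(start));
  return parts;
}

console.log(splitn("a|b|c|d|e|f", "|", 2)); // ["a", "b|c|d|e|f"]
console.log(splitn("a|b|c|d|e|f", "|", 2).join("|")); // "a|b|c|d|e|f"
```

Because the remainder is always kept, joining the result with the separator gives back the original string for every N the sketch accepts.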

Q&A

Could this be an extra option for the split method?

Yes. This could also be an option in a new options bag for split. Example:

console.log("a|b|c|d|e|f".split("|", { n: 2 }));
// or
console.log("a|b|c|d|e|f".split("|", 2, true));
// or
console.log("a|b|c|d|e|f".split("|", { n: 2, remainder: true }));
// or
console.log("a|b|c|d|e|f".split("|", 2, { remainder: true }));

The first may be confusing to users though, as it is not obvious that split("|", 2) and split("|", { n: 2 }) return different values. These kinds of overloads exist on the web platform (e.g. addEventListener), but there the form you use does not impact behaviour.

The second makes the difference from the existing call more visible, but it is not obvious what the true value in the third argument means.

The third option is pretty clear, but is also the most verbose. The verbosity may make it cumbersome to use.

The fourth option is probably the "cleanest". However, because the extra argument is ignored in current engines, code using it would look like it is supported there, whereas in fact it is not: the argument is just being ignored.
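
For reference, this is what the four call shapes evaluate to in engines that do not implement any of them. The results follow from how the existing split handles its arguments: an extra third argument is ignored, and an object passed as the limit is converted to a number (NaN, which becomes 0). Treat this as an illustration of current behaviour, not as part of the proposal.

```javascript
const input = "a|b|c|d|e|f";

// Forms 1 and 3: the options bag lands in the limit position.
// It converts to NaN, which is treated as a limit of 0.
const form1 = input.split("|", { n: 2 });
const form3 = input.split("|", { n: 2, remainder: true });

// Forms 2 and 4: the third argument is ignored, so this is a plain split("|", 2).
const form2 = input.split("|", 2, true);
const form4 = input.split("|", 2, { remainder: true });

console.log(form1); // []
console.log(form3); // []
console.log(form2); // ["a", "b"]
console.log(form4); // ["a", "b"]
```

None of the forms throws in such engines, so feature detection would be needed before relying on any of them.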

Which of the four proposed options is ultimately used should be up to the committee as a whole. I don't have a strong preference among them (although I prefer the splitn option).

I like the current behaviour of split!

No worries! It isn't going away. The new splitn function is meant to simplify the use cases described above. You can continue to use split as it exists now.
