
fundu's Introduction

Configurable, precise and fast Rust string parser to a Duration



Overview

fundu provides a flexible and fast parser to convert Rust strings into a Duration. fundu parses into its own Duration but provides methods to convert into std::time::Duration, chrono::Duration and time::Duration. If not stated otherwise, this README describes the main fundu package. Some examples of valid input strings with the standard feature:

  • "1.41"
  • "42"
  • "2e-8", "2e+8" (or likewise "2.0e8")
  • ".5" or likewise "0.5"
  • "3." or likewise "3.0"
  • "inf", "+inf", "infinity", "+infinity"
  • "1w" (1 week) or likewise "7d", "168h", "10080m", "604800s", ...

and with the custom (or base) feature, assuming some defined custom time units s, secs, minutes, day, days, year, years, century and the time keyword yesterday:

  • "1.41minutes" or likewise "1.41 minutes" if allow_delimiter is set
  • "years" or likewise "1 years", "1years" if number_is_optional is set
  • "42 secs ago" or likewise "-42 secs" if allow_ago and allow_negative are set
  • "9e-3s", "9e3s" (or likewise "9.0e+3s")
  • "yesterday" or likewise "-1day", "-1days" if allow_negative is set
  • "9 century" or likewise "900 years"
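To make the custom examples above concrete, here is a small std-only sketch that mimics them with an identifier table. It is not fundu's implementation. The names `UNITS`, `KEYWORDS` and `parse_custom` are hypothetical, and the sketch makes these assumptions:

- only whole numbers are accepted;
- whitespace between number and unit is optional;
- a missing number means `1`;
- `ago` negates the result.

It returns signed seconds.

```rust
/// Hypothetical identifier table: (identifier, seconds per unit).
/// A year is approximated as a Julian year of 365.25 days.
const UNITS: &[(&str, i64)] = &[
    ("s", 1),
    ("secs", 1),
    ("minutes", 60),
    ("day", 86_400),
    ("days", 86_400),
    ("year", 31_557_600),
    ("years", 31_557_600),
    ("century", 3_155_760_000),
];

/// Hypothetical time keywords: they stand alone without a number.
const KEYWORDS: &[(&str, i64)] = &[("yesterday", -86_400)];

/// Parses inputs like "42 secs ago", "-42 secs", "years" or "yesterday"
/// into signed seconds. Returns None for anything it does not understand.
fn parse_custom(input: &str) -> Option<i64> {
    let input = input.trim();
    if let Some(&(_, secs)) = KEYWORDS.iter().find(|(k, _)| *k == input) {
        return Some(secs);
    }
    // Assumption: an `ago` suffix, separated by whitespace, negates the duration.
    let (body, ago) = match input.strip_suffix("ago") {
        Some(rest) if rest.ends_with(char::is_whitespace) => (rest.trim_end(), true),
        _ => (input, false),
    };
    // An optional leading sign.
    let (body, negative) = match body.strip_prefix('-') {
        Some(rest) => (rest, true),
        None => (body.strip_prefix('+').unwrap_or(body), false),
    };
    // Split the leading digits from the time unit identifier.
    let split = body.find(|c: char| !c.is_ascii_digit()).unwrap_or(body.len());
    let (digits, unit) = body.split_at(split);
    // The number is optional: a missing number means `1`.
    let number: i64 = if digits.is_empty() { 1 } else { digits.parse().ok()? };
    let &(_, per_unit) = UNITS.iter().find(|(id, _)| *id == unit.trim_start())?;
    let secs = number.checked_mul(per_unit)?;
    // A sign and `ago` cancel each other out in this sketch.
    Some(if negative != ago { -secs } else { secs })
}
```

With this table, `parse_custom("9 century")` and `parse_custom("900 years")` both yield 28,401,840,000 seconds.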

For more examples of the custom feature see the Customization section. Summary of features provided by this crate:

  • Precision: There are no floating point calculations and the input is parsed exactly as it is. So, what you put in is what you get out within the range of a Duration.
  • Performance: The parser is blazingly fast (see Benchmarks)
  • Customization: TimeUnits, the number format and other aspects are easily configurable (see Customization)
  • Sound limits: The duration saturates at Duration::MAX if the input number is larger than that maximum or if the input string is positive infinity.
  • Negative Durations: The parser can be configured to parse negative durations. fundu's Duration can represent negative durations and also implements TryFrom for chrono::Duration and time::Duration if the corresponding feature is activated.
  • Error handling: The error messages try to be informative on their own but can also be easily adjusted (see also Examples)
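To illustrate the precision and saturation points, here is a minimal std-only sketch. It is not fundu's actual code, and `parse_seconds` is a hypothetical name. It parses a plain decimal number of seconds such as "1.41" or ".5" using integer arithmetic only, and saturates at `std::time::Duration::MAX`. Exponents, signs and `inf` are left out for brevity.

```rust
use std::time::Duration;

/// Parses "<digits>[.<digits>]" seconds using integer arithmetic only.
/// Fractional digits beyond nanosecond precision (9 digits) are truncated,
/// and whole seconds that overflow `u64` saturate at `Duration::MAX`.
fn parse_seconds(input: &str) -> Option<Duration> {
    let (whole, fraction) = input.split_once('.').unwrap_or((input, ""));
    // At least one digit is required, and only ASCII digits are allowed.
    if whole.is_empty() && fraction.is_empty() {
        return None;
    }
    if !whole.bytes().chain(fraction.bytes()).all(|b| b.is_ascii_digit()) {
        return None;
    }

    // Accumulate whole seconds; on overflow saturate at Duration::MAX.
    let mut secs: u64 = 0;
    for b in whole.bytes() {
        match secs.checked_mul(10).and_then(|s| s.checked_add(u64::from(b - b'0'))) {
            Some(s) => secs = s,
            None => return Some(Duration::MAX),
        }
    }

    // Interpret up to 9 fractional digits as nanoseconds, e.g. "41" -> 410_000_000.
    let mut nanos: u32 = 0;
    for i in 0..9 {
        let digit = fraction.as_bytes().get(i).map_or(0, |b| u32::from(b - b'0'));
        nanos = nanos * 10 + digit;
    }
    Some(Duration::new(secs, nanos))
}
```

Because no `f64` is involved, "1.41" becomes exactly 1 s and 410,000,000 ns. There is no nearest binary approximation.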

fundu aims for good performance and being a lightweight crate. It is built purely on top of the Rust standard library, and no additional dependencies are required in the standard configuration. By default, the accepted number format is the scientific floating point format, compatible with f64::from_str. However, the number format and other aspects can be customized, up to formats like systemd time spans or GNU relative times. There are two dedicated, simple-to-use fundu side projects:

  • fundu-systemd for a fully compatible systemd time span parser
  • fundu-gnu for a fully compatible GNU relative time parser.

See also the Examples section and the examples folder.

For further details see the Documentation!

Installation

Add this to Cargo.toml for fundu with the standard feature.

[dependencies]
fundu = "2.0.0"

fundu is split into three main features: standard (providing DurationParser and parse_duration), custom (providing the CustomDurationParser) and base (for a more basic approach to the core parser). The standard feature is described here in detail; the custom feature adds fully customizable identifiers for time units. Most of the time only one of the parsers is needed. For example, to include only the CustomDurationParser, add the following to Cargo.toml:

[dependencies]
fundu = { version = "2.0.0", default-features = false, features = ["custom"] }

Activating the chrono or time feature provides a TryFrom and SaturatingInto implementation for chrono::Duration or time::Duration. Converting to/from std::time::Duration is supported without the need of an additional feature.
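chrono and time are optional dependencies, so the following std-only sketch only mimics the idea behind such conversions. It uses a hypothetical signed type; `SignedDuration`, `NegativeDurationError` and `saturating_into_std` are illustrative names, not fundu's API.

- A fallible `TryFrom` rejects negative values, because `std::time::Duration` cannot represent them.
- A saturating conversion clamps negative values to zero instead.

```rust
use std::convert::TryFrom;
use std::time::Duration;

/// Hypothetical signed duration: a magnitude plus a sign flag.
#[derive(Debug, Clone, Copy, PartialEq)]
struct SignedDuration {
    is_negative: bool,
    magnitude: Duration,
}

#[derive(Debug, PartialEq)]
struct NegativeDurationError;

impl TryFrom<SignedDuration> for Duration {
    type Error = NegativeDurationError;

    /// Negative values are an error; a negative zero is treated as zero.
    fn try_from(d: SignedDuration) -> Result<Self, Self::Error> {
        if d.is_negative && !d.magnitude.is_zero() {
            Err(NegativeDurationError)
        } else {
            Ok(d.magnitude)
        }
    }
}

impl SignedDuration {
    /// Saturating variant: negative values clamp to Duration::ZERO.
    fn saturating_into_std(self) -> Duration {
        Duration::try_from(self).unwrap_or(Duration::ZERO)
    }
}
```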

Activating the serde feature allows some structs and enums to be serialized or deserialized with serde.

Examples

If only the default configuration is required once, the parse_duration function can be used. Note that parse_duration returns a std::time::Duration, in contrast to the parse method of the other parsers, which returns a fundu::Duration.

use std::time::Duration;

use fundu::parse_duration;

let input = "1.0e2s";
assert_eq!(parse_duration(input).unwrap(), Duration::new(100, 0));

When a customization of the accepted TimeUnits is required, DurationParser::with_time_units can be used. To accept all available time units, use DurationParser::with_all_time_units:

use fundu::{Duration, DurationParser};

let input = "3m";
assert_eq!(
    DurationParser::with_all_time_units().parse(input).unwrap(),
    Duration::positive(180, 0)
);

When no time units are configured, seconds are assumed.

use fundu::{Duration, DurationParser};

let input = "1.0e2";
assert_eq!(
    DurationParser::without_time_units().parse(input).unwrap(),
    Duration::positive(100, 0)
);

However, the following will return an error because y (Years) is not a default time unit:

use fundu::DurationParser;

let input = "3y";
assert!(DurationParser::new().parse(input).is_err());

The parser is reusable and the set of time units is fully customizable:

use fundu::TimeUnit::*;
use fundu::{Duration, DurationParser};

let parser = DurationParser::with_time_units(&[NanoSecond, Minute, Hour]);

assert_eq!(parser.parse("9e3ns").unwrap(), Duration::positive(0, 9000));
assert_eq!(parser.parse("10m").unwrap(), Duration::positive(600, 0));
assert_eq!(parser.parse("1.1h").unwrap(), Duration::positive(3960, 0));
assert_eq!(parser.parse("7").unwrap(), Duration::positive(7, 0));

Setting the default time unit (used if no time unit is given in the input string) to something other than seconds is also easily possible:

use fundu::TimeUnit::*;
use fundu::{Duration, DurationParser};

assert_eq!(
    DurationParser::without_time_units()
        .default_unit(MilliSecond)
        .parse("1000")
        .unwrap(),
    Duration::positive(1, 0)
);

The identifiers for time units can be fully customized with any number of valid utf-8 sequences if the custom feature is activated:

use fundu::TimeUnit::*;
use fundu::{CustomTimeUnit, CustomDurationParser, Duration};

let parser = CustomDurationParser::with_time_units(&[
    CustomTimeUnit::with_default(MilliSecond, &["χιλιοστό του δευτερολέπτου"]),
    CustomTimeUnit::with_default(Second, &["s", "secs"]),
    CustomTimeUnit::with_default(Hour, &["⏳"]),
]);

assert_eq!(parser.parse(".3χιλιοστό του δευτερολέπτου"), Ok(Duration::positive(0, 300_000)));
assert_eq!(parser.parse("1e3secs"), Ok(Duration::positive(1000, 0)));
assert_eq!(parser.parse("1.1⏳"), Ok(Duration::positive(3960, 0)));

The custom feature can be used to customize a lot more. See the documentation of the exported items of the custom feature (like CustomTimeUnit, TimeKeyword) for more information.

Also, fundu tries to give informative error messages:

use fundu::DurationParser;

assert_eq!(
    DurationParser::without_time_units()
        .parse("1y")
        .unwrap_err()
        .to_string(),
    "Time unit error: No time units allowed but found: 'y' at column 1"
);

The number format can be easily adjusted to your needs. The following example makes the number optional, allows some ASCII whitespace between the number and the time unit, and restricts the number format to whole numbers without a fractional part or an exponent. Also note that the DurationParserBuilder can build a DurationParser at compile time in const context:

use fundu::TimeUnit::*;
use fundu::{Duration, DurationParser, ParseError};

const PARSER: DurationParser = DurationParser::builder()
    .time_units(&[NanoSecond])
    .allow_time_unit_delimiter()
    .number_is_optional()
    .disable_fraction()
    .disable_exponent()
    .build();

assert_eq!(PARSER.parse("ns").unwrap(), Duration::positive(0, 1));
assert_eq!(
    PARSER.parse("1000\t\n\r ns").unwrap(),
    Duration::positive(0, 1000)
);

assert_eq!(
    PARSER.parse("1.0ns").unwrap_err(),
    ParseError::Syntax(1, "No fraction allowed".to_string())
);
assert_eq!(
    PARSER.parse("1e9ns").unwrap_err(),
    ParseError::Syntax(1, "No exponent allowed".to_string())
);

It's also possible to parse multiple durations at once with parse_multiple. The different durations can be separated by whitespace and an optional conjunction (here: and). If the delimiter is not encountered, a number or sign character can also indicate a new duration.

use fundu::{Duration, DurationParser};

let parser = DurationParser::builder()
    .default_time_units()
    .parse_multiple(Some(&["and"]))
    .build();

assert_eq!(
    parser.parse("1.5h 2e+2ns"),
    Ok(Duration::positive(5400, 200))
);
assert_eq!(
    parser.parse("55s500ms"),
    Ok(Duration::positive(55, 500_000_000))
);
assert_eq!(parser.parse("1\t1"), Ok(Duration::positive(2, 0)));
assert_eq!(
    parser.parse("1.   .1"),
    Ok(Duration::positive(1, 100_000_000))
);
assert_eq!(parser.parse("2h"), Ok(Duration::positive(2 * 60 * 60, 0)));
assert_eq!(
    parser.parse("300ms20s 5d"),
    Ok(Duration::positive(5 * 60 * 60 * 24 + 20, 300_000_000))
);
assert_eq!(
    parser.parse("300.0ms and 5d"),
    Ok(Duration::positive(5 * 60 * 60 * 24, 300_000_000))
);

See also the examples folder for common recipes and integration with other crates. Run an example with:

cargo run --example $FILE_NAME_WITHOUT_FILETYPE_SUFFIX

for example, the systemd time span parser example:

# For some of the examples, help is available. To pass arguments to the example itself, separate
# the arguments for cargo from those for the example with `--`
$ cargo run --example systemd --features custom --no-default-features -- --help
...

# To actually run the example execute
$ cargo run --example systemd --features custom --no-default-features '300ms20s 5day'
Original: 300ms20s 5day
      μs: 432020300000
   Human: 5d 20s 300ms

Time units

Second is the default time unit. It is applied when no time unit is encountered in the input string, unless specified otherwise, for example with DurationParser::default_unit. The table below gives an overview of the available time units, their default identifiers and whether they are included in the default set. If a custom set of time units is required, DurationParser::with_time_units can be used.

TimeUnit      Default identifier   Calculation     Included by default
Nanosecond    ns                   1e-9s           yes
Microsecond   Ms                   1e-6s           yes
Millisecond   ms                   1e-3s           yes
Second        s                    SI definition   yes
Minute        m                    60s             yes
Hour          h                    60m             yes
Day           d                    24h             yes
Week          w                    7d              yes
Month         M                    Year / 12       no
Year          y                    365.25d         no

Note that Months and Years are not included in the default set of time units. The current implementation calculates Months and Years approximately in seconds. If they are included in the final configuration, a Julian year of 365.25 days is used, and a Month is a twelfth of that year (see table above).
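The calculations in the time unit table work out to whole numbers of seconds. The following sketch spells out the arithmetic with plain constants; the constant names are chosen for illustration and are not fundu items. A Julian year of 365.25 days is 31,557,600 s, and a month as one twelfth of that is 2,629,800 s.

```rust
/// Seconds per time unit, following the calculations in the time unit table.
const MINUTE: u64 = 60;
const HOUR: u64 = 60 * MINUTE;
const DAY: u64 = 24 * HOUR;
const WEEK: u64 = 7 * DAY;
/// Julian year: 365.25 days, i.e. 365 days plus a quarter day (6 hours).
const YEAR: u64 = 365 * DAY + DAY / 4;
/// Month: one twelfth of a Julian year (divides evenly).
const MONTH: u64 = YEAR / 12;
```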

With the CustomDurationParser from the custom feature, the identifiers for time units can be fully customized.

Customization

Unlike other crates, fundu does not try to establish a standard for time units and their identifiers or a specific number format. Many of these aspects can be adjusted when initializing or building the parser. Here's a non-exhaustive example of possible customizations of the number format:

use fundu::TimeUnit::*;
use fundu::{Duration, DurationParser, ParseError};

let parser = DurationParser::builder()
    // Use a custom set of time units. For demonstration purposes just NanoSecond
    .time_units(&[NanoSecond])
    // Allow some whitespace characters as delimiter between the number and the time unit
    .allow_time_unit_delimiter()
    // Makes the number optional. If no number was encountered `1` is assumed
    .number_is_optional()
    // Disable parsing the fractional part of the number => 1.0 will return an error
    .disable_fraction()
    // Disable parsing the exponent => 1e0 will return an error
    .disable_exponent()
    // Finally, build a reusable DurationParser
    .build();

// Some valid input
assert_eq!(parser.parse("ns").unwrap(), Duration::positive(0, 1));
assert_eq!(
    parser.parse("1000\t\n\r ns").unwrap(),
    Duration::positive(0, 1000)
);

// Some invalid input
assert_eq!(
    parser.parse("1.0ns").unwrap_err(),
    ParseError::Syntax(1, "No fraction allowed".to_string())
);
assert_eq!(
    parser.parse("1e9ns").unwrap_err(),
    ParseError::Syntax(1, "No exponent allowed".to_string())
);

Here's an example for fully-customizable time units which uses the CustomDurationParser from the custom feature:

use fundu::TimeUnit::*;
use fundu::{CustomDurationParser, CustomTimeUnit, Duration, Multiplier, TimeKeyword};

// Let's define a custom time unit `fortnight` which is worth 2 weeks. Note that
// `CustomTimeUnit`s and `TimeKeyword`s can be created in `const` context, i.e. at compile time:
const FORTNIGHT: CustomTimeUnit = CustomTimeUnit::new(
    Week,
    &["f", "fortnight", "fortnights"],
    Some(Multiplier(2, 0)),
);

let parser = CustomDurationParser::builder()
    .time_units(&[
        CustomTimeUnit::with_default(Second, &["s", "secs", "seconds"]),
        CustomTimeUnit::with_default(Minute, &["min"]),
        CustomTimeUnit::with_default(Hour, &["ώρα"]),
        FORTNIGHT,
    ])
    // Additionally, define `tomorrow`, a keyword of time which is worth `1 day` in the future.
    // In contrast to a `CustomTimeUnit`, a `TimeKeyword` doesn't accept a number in front of it 
    // in the source string.
    .keyword(TimeKeyword::new(Day, &["tomorrow"], Some(Multiplier(1, 0))))
    .build();

assert_eq!(
    parser.parse("42e-1ώρα").unwrap(),
    Duration::positive(15120, 0)
);
assert_eq!(
    parser.parse("tomorrow").unwrap(),
    Duration::positive(60 * 60 * 24, 0)
);
assert_eq!(
    parser.parse("1fortnight").unwrap(),
    Duration::positive(60 * 60 * 24 * 7 * 2, 0)
);

Benchmarks

To run the benchmarks on your machine, clone the repository:

git clone https://github.com/fundu-rs/fundu.git
cd fundu

and then run all benchmarks with:

cargo bench --all-features

The iai-callgrind (feature = with-iai) and flamegraph (feature = with-flamegraph) benchmarks can only be run on Unix. Use the --features option of cargo to run the benchmarks for specific features:

cargo bench --features standard,custom

The above won't run the flamegraph and iai-callgrind benchmarks.

Benchmarks can be further filtered, for example with:

cargo bench --bench benchmarks_standard
cargo bench --bench benchmarks_standard -- 'parsing speed'
cargo bench --features custom --no-default-features --bench benchmarks_custom

For more information, see the help with:

cargo bench --help # The cargo help for bench
cargo bench --bench benchmarks_standard -- --help # The criterion help

To get a rough idea about the parsing times, here are the average parsing times of some inputs (quad core 3000 MHz, 8 GB DDR3, Linux):

Input                                          Avg. parsing time
1                                              38.705 ns
123456789.123456789                            57.974 ns
format!("{0}.{0}e-1022", "1".repeat(1022))     421.56 ns
1s                                             55.755 ns
1ns                                            59.842 ns
1y                                             57.760 ns

Contributing

Contributions are always welcome! Either start with an issue that already exists or open a new issue where we can discuss everything so that no effort is wasted. Do not hesitate to ask questions!

Projects using fundu

License

MIT license (LICENSE or http://opensource.org/licenses/MIT)


fundu's Issues

Parse a single duration from the input string and return the remaining string together with the duration

The parse_with_remainder method parses a single duration from the input string and then returns the rest of the input.

For example: the input is 1ms 3s and the parse_multiple option is set with the space character as delimiter. With the usual parse method, this would result in a Duration of 3 seconds and 1_000_000 nanoseconds, because all parsed durations are summed up. The parse_with_remainder method would only parse 1ms, returning the Duration of 1_000_000 nanoseconds together with the rest of the input string, 3s, after consuming the delimiter. If the parse_multiple option is not set, the remainder of the input string would simply be an empty string.

If needed, this method allows a more flexible handling of the resulting Durations, strings and possible ParseErrors.
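The proposed behaviour can be sketched with std only. `parse_first` below is a hypothetical stand-in for `parse_with_remainder` and does not reflect its actual signature. It assumes the following:

- whole numbers only;
- the units `ns`, `ms` and `s`;
- whitespace as the delimiter between durations.

It parses only the first duration and returns the rest of the input after consuming the delimiter.

```rust
use std::time::Duration;

/// Hypothetical stand-in for `parse_with_remainder`: parses the first
/// "<digits><unit>" duration of `input` and returns it together with the
/// rest of the input after skipping a whitespace delimiter.
fn parse_first(input: &str) -> Option<(Duration, &str)> {
    // Split off the leading whole number.
    let digits_end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    let number: u64 = input[..digits_end].parse().ok()?;
    let rest = &input[digits_end..];

    // Supported identifiers and how to turn the number into a Duration.
    let units: [(&str, fn(u64) -> Duration); 3] = [
        ("ns", Duration::from_nanos),
        ("ms", Duration::from_millis),
        ("s", Duration::from_secs),
    ];
    for &(id, to_duration) in units.iter() {
        if let Some(remainder) = rest.strip_prefix(id) {
            // Consume the delimiter; the remainder starts at the next duration.
            return Some((to_duration(number), remainder.trim_start()));
        }
    }
    None
}
```

Calling `parse_first("1ms 3s")` yields 1 ms and the remainder `"3s"`. Calling it again on the remainder yields 3 s and an empty string. The caller decides whether to sum the durations or handle them individually.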

A sign character might be interpreted as 1 second when `parse_multiple` and `number_is_optional` are set

This misinterpretation happens if the configuration option parse_multiple is set together with number_is_optional and a sign surrounded by delimiters is present. For example, "1s + 1s" results in Duration::positive(3, 0) instead of the expected Duration::positive(2, 0).

The expected solution is to return an error. As soon as #28 is implemented, the input "1s + 1s" would result in a duration worth 2 seconds.

When parsing multiple durations also allow the sign characters (`+`,`-`) as separator in addition to digits

Parsing multiple durations with the parse_multiple option recognizes the next duration if it is delimited by the delimiter defined with the parse_multiple function, but also if a digit is encountered. For example, 10ms20s is identified as two durations (which are then summed up into one) because of the digit after ms, and 10ms 20s because of the delimiter (here a single space character). It would be a nice addition if a sign also indicated a new duration, as in 10ms-20s or 10ms+20s.
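The proposed boundary rule can be sketched with std only. `split_durations` is a hypothetical helper that only splits the input and leaves the parsing of the pieces out. A new duration starts in these cases:

- at whitespace;
- at a digit that follows a time unit character;
- with `signs_as_separator` enabled, at a `+` or `-` that follows a time unit character.

The sketch assumes numbers without exponents, because the `e` in `2e8` would otherwise be mistaken for a time unit. That ambiguity is one reason the real parser needs more context than this simple rule.

```rust
/// Splits `input` into duration substrings (assumes no exponents in numbers).
fn split_durations(input: &str, signs_as_separator: bool) -> Vec<&str> {
    let mut parts = Vec::new();
    // Byte index where the current duration started, if any.
    let mut start: Option<usize> = None;
    // The previous non-whitespace character of the current duration.
    let mut prev: Option<char> = None;

    for (i, c) in input.char_indices() {
        if c.is_whitespace() {
            // Whitespace always ends the current duration.
            if let Some(s) = start.take() {
                parts.push(&input[s..i]);
            }
            prev = None;
            continue;
        }
        // After a time unit character, a digit (and optionally a sign)
        // starts a new duration.
        let boundary = match prev {
            Some(p) if p.is_alphabetic() => {
                c.is_ascii_digit() || (signs_as_separator && (c == '+' || c == '-'))
            }
            _ => false,
        };
        if boundary {
            if let Some(s) = start.take() {
                parts.push(&input[s..i]);
            }
        }
        if start.is_none() {
            start = Some(i);
        }
        prev = Some(c);
    }
    if let Some(s) = start {
        parts.push(&input[s..]);
    }
    parts
}
```

With `signs_as_separator` set to `false`, "10ms-20s" stays a single (invalid) duration. With it set to `true`, the input splits into "10ms" and "-20s".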

Combine the different delimiters

Currently it is possible to specify different delimiters for some configuration options:

  • allow_delimiter, a possible delimiter between the number and the time unit
  • allow_ago, a possible delimiter between the time unit and the ago keyword
  • sign_delimiter, a possible delimiter between the sign and the number
  • delimiter_multiple, a possible delimiter between multiple durations when parsing multiple durations

These different delimiter options grew over time and introduce unnecessary complexity. The goal is to 1. unify the first three delimiters into an inner delimiter, which can occur within a duration, and 2. keep the last delimiter as an outer delimiter, which can occur between durations when parsing multiple durations.

The inner delimiter can occur here:

(sign)(inner_delimiter)(number)(inner_delimiter)(time_unit)(inner_delimiter)(ago) = duration

The outer delimiter can occur here:

(duration)(outer_delimiter)(conjunction)(outer_delimiter)(duration)...

The advantages are:

  • Having a default delimiter for the inner and outer delimiter options. The default would be the Rust definition of whitespace
  • Simpler usage. The above options don't need an argument with a delimiter anymore. Instead there are setters to be able to overwrite the default delimiters if needed
  • Better optimisation possibilities and performance
  • Reducing complexity in the core parser

`fundu-gnu`: "fuzzy" parsing of years and months

GNU parses years and months fuzzily. Instead of using static approximate values like year = 365.25 days, years and months are calculated based on a starting point, like a given date or now.
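To show the difference, here is a std-only sketch. The helpers are hypothetical and not part of fundu-gnu. In the fuzzy approach, the length of "1 year" or "1 month" depends on the starting point, so it comes from the Gregorian calendar rather than from the static values of 31,557,600 s and 2,629,800 s. For simplicity, a year is assumed to start on January 1st and a month on its first day.

```rust
/// Gregorian leap year rule.
fn is_leap_year(year: i32) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

/// Number of days in `month` (1 = January) of `year`.
fn days_in_month(year: i32, month: u32) -> u32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 if is_leap_year(year) => 29,
        2 => 28,
        _ => panic!("invalid month: {}", month),
    }
}

/// Seconds in "1 month" starting on the first day of the given month.
fn fuzzy_month_secs(year: i32, month: u32) -> u64 {
    u64::from(days_in_month(year, month)) * 86_400
}

/// Seconds in "1 year" starting on January 1st of `year`.
fn fuzzy_year_secs(year: i32) -> u64 {
    if is_leap_year(year) { 366 * 86_400 } else { 365 * 86_400 }
}
```

A fuzzy year is therefore either 31,536,000 s or 31,622,400 s, while the static Julian year always lies in between.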

`CustomDurationParser` cannot be fully built as `const` at compile time

This is a limitation of the CustomDurationParser which is difficult to work around if you want to provide the convenience for which the custom feature is intended and at the same time parse the time units as fast as possible.

A solution is to make the basic Parser from the parse module a part of the public api and let the user choose to implement TimeUnitsLike. This will be done in a separate feature that the user can choose to enable or not. Implementing TimeUnitsLike is not difficult, but a bit more involved than using the parser from the custom feature, needs some examples and additional documentation. However, the advantages of such a solution are that the parser can be created as const at compile time, removing the rest of the non-const initialization time from the parsing time, slightly speeding up the parsing of the time units, and also reducing the binary size a bit.
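The idea can be sketched with a hypothetical trait. `TimeUnitLookup`, `StaticUnits`, `STATIC_UNITS` and `parse_whole` are illustrative names, and the real `TimeUnitsLike` trait in fundu's parse module may have a different signature. The user implements a lookup over a static table. Because the table is a `const`, the whole configuration exists at compile time and nothing has to be initialized before parsing.

```rust
/// Hypothetical stand-in for a user-implementable lookup of time units.
trait TimeUnitLookup {
    /// Returns the length of one unit in nanoseconds for a known identifier.
    fn get(&self, identifier: &str) -> Option<u64>;
}

/// A lookup backed by a static slice, so it can be created in `const` context.
struct StaticUnits {
    units: &'static [(&'static str, u64)],
}

impl TimeUnitLookup for StaticUnits {
    fn get(&self, identifier: &str) -> Option<u64> {
        self.units
            .iter()
            .find(|(id, _)| *id == identifier)
            .map(|&(_, nanos)| nanos)
    }
}

/// Created at compile time: no initialization has to run before parsing.
const STATIC_UNITS: StaticUnits = StaticUnits {
    units: &[
        ("ns", 1),
        ("s", 1_000_000_000),
        ("fortnight", 1_209_600_000_000_000),
    ],
};

/// A minimal parser for "<digits><identifier>" that works with any lookup.
fn parse_whole<T: TimeUnitLookup>(units: &T, input: &str) -> Option<u64> {
    let split = input.find(|c: char| !c.is_ascii_digit())?;
    let number: u64 = input[..split].parse().ok()?;
    number.checked_mul(units.get(&input[split..])?)
}
```

A linear search is used here for clarity. A real implementation could use any lookup strategy, as long as it can be built in `const` context.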

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.


Detected dependencies

cargo
Cargo.toml
  • chrono 0.4.24
  • clap 4
  • criterion 0.5
  • iai-callgrind 0.10
  • inferno 0.11
  • pprof 0.13
  • rstest 0.18
  • rstest_reuse 0.6
  • serde 1
  • serde_test 1
  • time 0.3.1
fundu-core/Cargo.toml
fundu-gnu/Cargo.toml
fundu-gnu/fuzz/Cargo.toml
  • libfuzzer-sys 0.4
  • regex 1.8
  • lazy_static 1.4
fundu-systemd/Cargo.toml
fundu/Cargo.toml
fundu/fuzz/Cargo.toml
  • libfuzzer-sys 0.4
  • arbitrary 1.3
  • regex 1.8
github-actions
.github/workflows/cicd.yml
  • actions/checkout v4
  • EmbarkStudios/cargo-deny-action v1
  • actions/checkout v4
  • Swatinem/rust-cache v2
  • actions/checkout v4
  • actions/checkout v4
  • Swatinem/rust-cache v2
  • actions/checkout v4
  • Swatinem/rust-cache v2
  • actions/checkout v4
  • Swatinem/rust-cache v2
  • actions/checkout v4
  • Swatinem/rust-cache v2
  • codecov/codecov-action v4
  • actions/checkout v4
  • Swatinem/rust-cache v2
  • codecov/codecov-action v4
  • actions/checkout v4
  • Swatinem/rust-cache v2
  • dawidd6/action-download-artifact v3
  • actions/upload-artifact v4
  • actions/checkout v4
  • Swatinem/rust-cache v2
.github/workflows/publish-fundu-core.yml
  • actions/checkout v4
.github/workflows/publish-fundu-gnu.yml
  • actions/checkout v4
.github/workflows/publish-fundu-systemd.yml
  • actions/checkout v4
.github/workflows/publish-fundu.yml
  • actions/checkout v4


Restructure the project and split the fundu package into a core package and the actual library

Splitting the package serves the purpose that parts of the core parser can be adjusted for more specific usages if needed. The intention is to be able to use the core parser directly in other fundu packages instead of only having access to the public API. It also keeps the public API of the main fundu package clean and focused on structs etc. that are actually useful for an end user.

The final structure of the project will roughly look like this:

/fundu-core/
/fundu/
/other packages .../
Cargo.toml (virtual workspace)
README.md
...
