lexical

High-performance numeric conversion routines for use in a no_std environment. This does not depend on any standard library features, nor a system allocator.

Similar Projects

If you want a minimal, stable, and compile-time friendly version of lexical's float-parsing algorithm, see minimal-lexical. If you want a minimal, performant float parser, recent versions of the Rust standard library should be sufficient.

Getting Started

Add lexical to your Cargo.toml:

[dependencies]
lexical = "^6.0"

And get started using lexical:

// Number to string.
use lexical_core::BUFFER_SIZE;
let mut buffer = [b'0'; BUFFER_SIZE];
lexical_core::write(3.0, &mut buffer);   // "3.0", floats always have a fraction suffix.
lexical_core::write(3, &mut buffer);     // "3"

// String to number. The input must be a slice of bytes.
let i: i32 = lexical_core::parse(b"3")?;      // Ok(3), auto-type deduction.
let f: f32 = lexical_core::parse(b"3.5")?;    // Ok(3.5)
let d: f64 = lexical_core::parse(b"3.5")?;    // Ok(3.5), error-checking parse.
let d: f64 = lexical_core::parse(b"3a")?;     // Err(Error(_)), failed to parse.

In order to use lexical in generic code, the trait bounds FromLexical (for parse) and ToLexical (for to_string) are provided.

/// Parse a number from a string, multiply it by `multiplier`, and
/// serialize the result back to a string.
fn mul_2<T>(value: &str, multiplier: T)
    -> Result<String, lexical_core::Error>
where
    T: lexical_core::ToLexical
        + lexical_core::FromLexical
        + std::ops::Mul<Output = T>,
{
    let value: T = lexical_core::parse(value.as_bytes())?;
    let mut buffer = [b'0'; lexical_core::BUFFER_SIZE];
    let bytes = lexical_core::write(value * multiplier, &mut buffer);
    Ok(std::str::from_utf8(bytes).unwrap().to_owned())
}

Partial/Complete Parsers

Lexical has both partial and complete parsers. The complete parsers require that the entire input is consumed, returning an error if trailing characters remain, while the partial parsers consume as many characters as possible, returning both the parsed value and the number of bytes processed. On failure, lexical returns an error indicating both the error type and the index in the buffer at which it occurred.

Complete Parsers

// This will return Err(Error::InvalidDigit(3)), indicating that
// the first invalid character occurred at index 3 of the input
// string (the space character).
let x: i32 = lexical_core::parse(b"123 456")?;

Partial Parsers

// This will return Ok((123, 3)), indicating that 3 digits were successfully
// parsed, and that the returned value is `123`.
let (x, count): (i32, usize) = lexical_core::parse_partial(b"123 456")?;

no_std

lexical-core does not depend on a standard library, nor a system allocator. To use lexical-core in a no_std environment, add the following to Cargo.toml:

[dependencies.lexical-core]
version = "0.8.5"
default-features = false
# Can select only desired parsing/writing features.
features = ["write-integers", "write-floats", "parse-integers", "parse-floats"]

And get started using lexical:

// A constant for the maximum number of bytes a formatter will write.
use lexical_core::BUFFER_SIZE;
let mut buffer = [b'0'; BUFFER_SIZE];

// Number to string. The underlying buffer must be a slice of bytes.
let bytes = lexical_core::write(3.0, &mut buffer);
assert_eq!(bytes, b"3.0");
let bytes = lexical_core::write(3i32, &mut buffer);
assert_eq!(bytes, b"3");

// String to number. The input must be a slice of bytes.
let i: i32 = lexical_core::parse(b"3")?;      // Ok(3), auto-type deduction.
let f: f32 = lexical_core::parse(b"3.5")?;    // Ok(3.5)
let d: f64 = lexical_core::parse(b"3.5")?;    // Ok(3.5), error checking parse.
let d: f64 = lexical_core::parse(b"3a")?;     // Err(Error(_)), failed to parse.

Features

Lexical feature-gates each numeric conversion routine, resulting in faster compile times when only a subset of conversions is needed. These features can be enabled/disabled for both lexical-core (which does not require a system allocator) and lexical. By default, all conversions are enabled.

  • parse-floats:   Enable string-to-float conversions.
  • parse-integers:   Enable string-to-integer conversions.
  • write-floats:   Enable float-to-string conversions.
  • write-integers:   Enable integer-to-string conversions.

Lexical is highly customizable, and contains numerous other optional features:

  • std:   Enable use of the Rust standard library (enabled by default).
  • power-of-two:   Enable conversions to and from non-decimal strings.
    With power-of-two enabled, the radixes {2, 4, 8, 10, 16, 32} are valid; otherwise, only 10 is valid. This enables common conversions to/from hexadecimal integers and floats, without requiring the large pre-computed tables needed for other radixes.
  • radix:   Allow conversions to and from non-decimal strings.
    With radix enabled, any radix from 2 to 36 (inclusive) is valid, otherwise, only 10 is valid.
  • format:   Customize acceptable number formats for number parsing and writing.
    With format enabled, the number format is dictated through bitflags and masks packed into a u128. These dictate the valid syntax of parsed and written numbers, including enabling digit separators, requiring integer or fraction digits, and toggling case-sensitive exponent characters.
  • compact:   Optimize for binary size at the expense of performance.
    This minimizes the use of pre-computed tables, producing significantly smaller binaries.
  • safe:   Require all array indexing to be bounds-checked.
    This is effectively a no-op for the number parsers, since they already use safe indexing except where the lack of bounds checking can be trivially shown to be correct. The number writers frequently use unsafe indexing, since we can easily over-estimate the number of digits in the output from the fixed-length input.
  • f16:   Add support for numeric conversions to-and-from 16-bit floats.
    Adds f16, a half-precision IEEE-754 floating-point type, and bf16, the Brain Float 16 type, and numeric conversions to-and-from these floats. Note that since these are storage formats, and therefore do not have native arithmetic operations, all conversions are done using an intermediate f32.

To ensure safety when bounds checking is disabled, we extensively fuzz all the numeric conversion routines. See the Safety section below for more information.

Lexical also places a heavy focus on minimizing code bloat, with algorithms optimized for both performance and size. The defaults favor performance; opting in to the compact feature trades performance for significantly smaller binaries by minimizing the use of pre-computed tables and other optimizations.

Customization

WARNING: If changing the number of significant digits written, disabling the use of exponent notation, or changing exponent notation thresholds, BUFFER_SIZE may be insufficient to hold the resulting output. WriteOptions::buffer_size will provide a correct upper bound on the number of bytes written. If a buffer of insufficient length is provided, lexical-core will panic.

Every language has competing specifications for valid numerical input, meaning a number parser for Rust will incorrectly accept or reject input for different programming or data languages. For example:

// Valid in Rust strings.
// Not valid in JSON.
let f: f64 = lexical_core::parse(b"3.e7")?;  // 3e7

// Let's only accept JSON floats.
const JSON: u128 = lexical_core::format::JSON;
let options = lexical_core::ParseFloatOptions::new();
let f: f64 = lexical_core::parse_with_options::<_, JSON>(b"3.0e7", &options)?; // 3e7
let f: f64 = lexical_core::parse_with_options::<_, JSON>(b"3.e7", &options)?;  // Errors!

Due to the high variability in number syntax across programming and data languages, we provide two different APIs to simplify converting numbers with different syntax requirements.

  • Number Format API (feature-gated via format or power-of-two).
    This is a packed struct containing flags that specify compile-time syntax rules for number parsing or writing. This includes features such as the radix of the numeric string, digit separators, case-sensitive exponent characters, optional base prefixes/suffixes, and more.
  • Options API.
    This contains run-time rules for parsing and writing numbers. This includes exponent break points, rounding modes, the exponent and decimal point characters, and the string representation of NaN and Infinity.

A limited subset of functionality is documented in examples below, however, the complete specification can be found in the API reference documentation.

Number Format API

The number format class provides numerous flags to specify number syntax when parsing or writing. When the power-of-two feature is enabled, additional flags are added:

  • The radix for the significant digits (default 10).
  • The radix for the exponent base (default 10).
  • The radix for the exponent digits (default 10).

When the format feature is enabled, numerous other syntax and digit separator flags are enabled, including:

  • A digit separator character, to group digits for increased legibility.
  • Whether leading, trailing, internal, and consecutive digit separators are allowed.
  • Toggling required float components, such as digits before the decimal point.
  • Toggling whether special floats are allowed or are case-sensitive.

Many pre-defined constants therefore exist to simplify common use-cases, including:

  • JSON, XML, TOML, YAML, SQLite, and many more.
  • Rust, Python, C#, FORTRAN, COBOL literals and strings, and many more.

An example of building a custom number format is as follows:

const FORMAT: u128 = lexical_core::NumberFormatBuilder::new()
    // Disable exponent notation.
    .no_exponent_notation(true)
    // Disable all special numbers, such as NaN and Inf.
    .no_special(true)
    .build();

// Due to use in a `const fn`, we can't panic or expect users to unwrap invalid
// formats, so it's up to the caller to verify the format. If an invalid format
// is provided to a parser or writer, the function will error or panic, respectively.
debug_assert!(lexical_core::format_is_valid::<FORMAT>());
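
A hedged usage sketch of the format above, assuming the lexical-core 0.8 signature of parse_with_options (the exact turbofish here is an assumption, not taken from this README):

let options = lexical_core::ParseFloatOptions::new();
// Plain decimal digits still parse.
let ok = lexical_core::parse_with_options::<f64, FORMAT>(b"12.5", &options);  // Ok(12.5)
// Exponent notation and special values are now rejected.
let e1 = lexical_core::parse_with_options::<f64, FORMAT>(b"1e5", &options);   // Err(..)
let e2 = lexical_core::parse_with_options::<f64, FORMAT>(b"NaN", &options);   // Err(..)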

Options API

The options API allows customizing number parsing and writing at run-time, such as specifying the maximum number of significant digits, exponent characters, and more.

An example of building a custom options struct is as follows:

use std::num;

let options = lexical_core::WriteFloatOptions::builder()
    // Only write up to 5 significant digits, i.e., `1.23456` becomes `1.2345`.
    .max_significant_digits(num::NonZeroUsize::new(5))
    // Never write less than 5 significant digits, `1.1` becomes `1.1000`.
    .min_significant_digits(num::NonZeroUsize::new(5))
    // Trim the trailing `.0` from integral float strings.
    .trim_floats(true)
    // Use a European-style decimal point.
    .decimal_point(b',')
    // Panic if we try to write NaN as a string.
    .nan_string(None)
    // Write infinity as "Infinity".
    .inf_string(Some(b"Infinity"))
    .build()
    .unwrap();
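
A hedged sketch of using these options; write_with_options and the STANDARD format constant are assumptions based on the lexical-core 0.8 API, not shown elsewhere in this README:

use lexical_core::format::STANDARD;

let mut buffer = [b'0'; lexical_core::BUFFER_SIZE];
let bytes = lexical_core::write_with_options::<f64, STANDARD>(1.5, &mut buffer, &options);
// With the options above, this should yield b"1,5000": padded to five
// significant digits, using ',' as the decimal point.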

Documentation

Lexical's API reference can be found on docs.rs, as can lexical-core's. Detailed descriptions of the algorithms used, as well as how lexical handles digit separators and implements big-integer arithmetic, are documented in the repository.

Validation

Float-Parsing

Float parsing is difficult to do correctly, and major bugs have been found in implementations ranging from libstdc++'s strtod to Python's. In order to validate the accuracy of lexical, we employ the following external tests:

  1. Hrvoje Abraham's strtod test cases.
  2. Rust's test-float-parse unittests.
  3. Testbase's stress tests for converting from decimal to binary.
  4. Nigel Tao's tests extracted from test suites for FreeType, Google's double-conversion library, and IBM's IEEE-754R compliance test, as well as numerous other curated examples.
  5. Various difficult cases reported on blogs.

Although lexical may contain bugs leading to rounding error, it is tested against a comprehensive suite of random-data and near-halfway representations, and should be fast and correct for the vast majority of use-cases.

Metrics

Various benchmarks, binary sizes, and compile times are summarized below:

Build Timings

The compile-times when building with all numeric conversions enabled. For a more fine-tuned breakdown, see build timings.

Binary Size

The binary sizes of stripped binaries compiled at optimization level "2". For a more fine-tuned breakdown, see binary sizes.

Benchmarks -- Parse Integer

A benchmark on randomly-generated integers uniformly distributed over the entire range. For a more fine-tuned breakdown, see benchmarks.

Benchmarks -- Parse Float

A benchmark on parsing floats from various real-world data sets. For a more fine-tuned breakdown, see benchmarks.

Benchmarks -- Write Integer

A benchmark on writing random integers uniformly distributed over the entire range. For a more fine-tuned breakdown, see benchmarks.

Benchmarks -- Write Float

A benchmark on writing floats generated via a random-number generator and parsed from a JSON document. For a more fine-tuned breakdown, see benchmarks.

Safety

Due to the use of memory-unsafe code in the integer and float writers, we extensively fuzz our float writers and parsers. The fuzz harnesses can be found under fuzz, and they are run continuously. So far, we've parsed and written over 72 billion floats.

Due to the simple logic of the integer writers and the lack of memory-unsafe code in the integer parsers, we fuzz both minimally and test them with edge cases, which has revealed no memory-safety issues to date.

Platform Support

lexical-core is tested on a wide variety of platforms, including big- and little-endian systems, to ensure portable code. Supported architectures include:

  • x86_64 Linux, Windows, macOS, Android, iOS, FreeBSD, and NetBSD.
  • x86 Linux, macOS, Android, iOS, and FreeBSD.
  • aarch64 (ARMv8-A) Linux, Android, and iOS.
  • armv7 (ARMv7-A) Linux, Android, and iOS.
  • arm (ARMv6) Linux, and Android.
  • mips (MIPS) Linux.
  • mipsel (MIPS LE) Linux.
  • mips64 (MIPS64 BE) Linux.
  • mips64el (MIPS64 LE) Linux.
  • powerpc (PowerPC) Linux.
  • powerpc64 (PPC64) Linux.
  • powerpc64le (PPC64LE) Linux.
  • s390x (IBM Z) Linux.

lexical-core should also work on a wide variety of other architectures and ISAs. If you have any issue compiling lexical-core on any architecture, please file a bug report.

Versioning and Version Support

Version Support

The currently supported versions are:

  • v0.8.x
  • v0.7.x (Maintenance)
  • v0.6.x (Maintenance)

Rustc Compatibility

  • v0.8.x supports Rustc 1.51+, including stable, beta, and nightly.
  • v0.7.x supports Rustc 1.37+, including stable, beta, and nightly.
  • v0.6.x supports Rustc 1.24+, including stable, beta, and nightly.

Please report any errors compiling a supported lexical-core version on a compatible Rustc version.

Versioning

lexical uses semantic versioning. Removing support for Rustc versions newer than the latest stable Debian or Ubuntu version is considered an incompatible API change, requiring a major version change.

Changelog

All changes are documented in CHANGELOG.

License

Lexical is dual licensed under the Apache 2.0 license as well as the MIT license. See the LICENSE.md file for full license details.

Contributing

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in lexical by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions. Contributing to the repository means abiding by the code of conduct.

For the process on how to contribute to lexical, see the development quick-start guide.

rust-lexical's People

Contributors

5225225, alexhuszagh, astro, bl-ue, eclipseo, gelbpunkt, ignatenkobrain, jawadcode, jonas-schievink, jonhoo, josfemova, luro02, mdrach, myrrlyn, razrfalcon, xiphoseer, zackpierce

rust-lexical's Issues

~25% perf hit from version 2.0

I just saw you released a 2.0 (nice!) so I bumped the version I use in a pet project and it has caused a ~25% slowdown in parsing floats.

Are you aware of this already? With all the benchmarks here I figure you would be, but the only note about 2.0 I can find is that you use minimal unsafe, which is looking like a dubious trade on my end. Can you enlighten me?

parse api for char at time, incremental FromLexical

In cases like #24, where you wish to parse a subset of the formats supported by lexical, it would perhaps be nice if there were an API which returned an interim result that could be fed back into the parse function along with the next character.

Then when parsing from e.g. a JSON float string representation, you could avoid malformed string representations, and convert to float in a single pass over the input.

lexical::try_parse for floats parsing appears dependent on compiler flags

Hi,

First off, I just want to thank you for the work you've put into this crate to create a faster parser from string to uint, int, and floating types.

I'm currently writing a crate that I hope will eventually act as a faster version of numpy's loadtxt and genfromtxt but for Rust. The main logic can be found in the macro I wrote for the various different conversions as seen here. The only lines I had to change to incorporate your crate can be found at L172-188. The commented out lines are what used to be contained in the map function based on the standard library's conversion for most primitive types.

The current tests I have fail for only the float cases. You can view them here. However, they pass for all of integer and unsigned integer tests. Now if I more or less copy exactly what's in the macro and run it outside of a macro it works. Below is an example:

let mut results = Vec::<f32>::new();
let line_split_vec = vec!["1", "2", "3"];
results.extend(
    line_split_vec.iter().map(|x| {
        lexical::try_parse::<f32, _>(x.trim()).unwrap()
    })
);
println!("{:?}", results);

I've also checked the type and values being fed into the lexical::try_parse, and they are the same between my above example and the macro which fails.

edit:

I just ran it using the debug build and that works...

So, I've been fiddling around with the compiler options a bit, and it appears that the compiler flags being passed in determine whether or not it will run. I'll need to look a bit more into what differs between my basic cargo options and the flags used for release.

Implement std::error::Error trait for lexical::Error

Problem

rust-lexical does not implement the std::error::Error trait for lexical::Error.

That makes error handling harder; in my case, the integration with anyhow. anyhow is a very helpful error handling library, but it requires errors to implement the std::error::Error trait. lexical does not do that, which is why I'm going to have to write boilerplate code to make it work correctly.

Solution

Implement the std::error::Error trait for lexical::Error.
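
A minimal sketch of the requested change (hypothetical, not code from lexical). Inside lexical itself, where Error is the crate's own type, and assuming Error already implements Debug and Display (required by the trait), the impl can be empty:

// In lexical's error module (sketch):
#[cfg(feature = "std")]
impl std::error::Error for Error {}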

Prerequisites

None, as far as I can tell. I'm a Rust beginner, but I think an implementation of std::error::Error should require nothing beyond just a couple lines of code.

Alternatives

Alternative: not do anything.

  • Advantage: less work
  • Disadvantage: crate is harder to work with and integrate with other crates

[FEATURE] NumberFormat to disallow leading zeroes

Problem

Some languages disallow leading zeros in integers (but not floats). For example, a Python REPL:

>>> 012
  File "<stdin>", line 1
SyntaxError: leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers
>>> -012
  File "<stdin>", line 1
SyntaxError: leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers
>>> 012.0
12.0

JavaScript/JSON has the same behavior.

Solution

I don't see an existing format that applies. If a new format is added, it would pass this test:

diff --git a/lexical-core/src/atoi/api.rs b/lexical-core/src/atoi/api.rs
index 022214f..1196fd8 100644
--- a/lexical-core/src/atoi/api.rs
+++ b/lexical-core/src/atoi/api.rs
@@ -348,6 +348,14 @@ mod tests {
         assert!(i32::from_lexical_format(b"31_", format).is_err());
     }

+    #[test]
+    #[cfg(feature = "format")]
+    fn i32_leading_zero() {
+        let format = NumberFormat::INTEGER_NO_LEADING_ZERO;
+        assert!(i32::from_lexical_format(b"012", format).is_err());
+        assert!(i32::from_lexical_format(b"-012", format).is_err());
+    }
+
     #[cfg(feature = "std")]
     proptest! {
         #[test]

This could be applied to at least NumberFormat::PYTHON_LITERAL and NumberFormat::JSON.

[BUG] Unable to use lexical-core in stable no_std environment

Description

Include lexical-core as a dependency in Cargo.toml, and turn off std:

lexical-core = {version="0.7.4", default-features=false }

Build gives an error:

error[E0554]: `#![feature]` may not be used on the stable release channel
   --> /Users/todd/.cargo/registry/src/github.com-1ecc6299db9ec823/lexical-core-0.7.4/src/lib.rs:133:35
    |
133 | #![cfg_attr(not(feature = "std"), feature(core_intrinsics))]

Memory safety problem in `insert_many()`

Hello, we recently reported a buffer overflow bug in SmallVec::insert_many() servo/rust-smallvec#252.

This crate contains a slightly older copy of insert_many() that has the same vulnerability.

/// Insert multiple elements at position `index`.
///
/// Shifts all elements before index to the back of the iterator.
/// It uses size hints to try to minimize the number of moves,
/// however, it does not rely on them. We cannot internally allocate, so
/// if we overstep the lower_size_bound, we have to do extensive
/// moves to shift each item back incrementally.
///
/// This implementation is adapted from [`smallvec`], which has a battle-tested
/// implementation that has been revised for at least a security advisory
/// warning. Smallvec is similarly licensed under an MIT/Apache dual license.
///
/// [`smallvec`]: https://github.com/servo/rust-smallvec

Please update the code when the fix is published.

Thanks!

[FEATURE] Make no_std stable by default

Problem

Currently, lexical-core only allows no_std on stable if the libm feature is enabled. Since libm is very small, well-maintained, and fast to compile, this should be the default.

Solution

Remove core::intrinsics and replace them with libm.

[FEATURE] Turn as much as possible of `NumberFormat` into `const fn`

Problem

This is how I currently define my number format at comptime, because const fn is not supported:

const FORMI_LITERAL: NumberFormat =
    NumberFormat::from_bits_truncate(0
        | ((b'_' as u64) << 56) // digit_separator_to_flags
        | 0x00000000_00000007   // REQUIRED_DIGITS
        | 0x00000000_00000100   // NO_EXPONENT_WITHOUT_FRACTION
        | 0x00000000_00000200   // NO_SPECIAL
        | 0x00000111_00000000   // INTERNAL_DIGIT_SEPARATOR
    );

I.e. a bunch of magic numbers that may break on any version jump.

Solution

Well, use const fn. =) Or provide alternative functions using const fn. Where checks are needed, a few different approaches are possible, each with ups and downs:

  • Make the functions unsafe fn with_×××_unchecked and panic!() once panicking in const fns is a thing.
  • Use the const assert trick to check invariants, i.e. index a const array with a bool; if it's true, you index out of bounds. The fetched value must be used, iirc. (See the sketch after this list.)
  • Use a crate for static assertions.
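
A sketch of the "const assert" trick from the second bullet (the macro name is illustrative; the static_assertions crate provides a maintained version of the same idea):

macro_rules! const_assert {
    ($cond:expr) => {
        // `0 - 1` underflows a usize in a const context when the
        // condition is false, which fails compilation.
        const _: [(); 0 - !($cond) as usize] = [];
    };
}

const_assert!(2 + 2 == 4); // compiles
// const_assert!(2 + 2 == 5); // fails to compile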

Prerequisites

  • lexical version : 0.7.*
  • lexical compilation features used: correct, radix, format

Alternatives

Provide a single const fn that returns true if a format spec is valid, otherwise false. Users of lexical-core can then themselves do the static assertion. However, this still makes constructing a format at comptime »magic«, as opposed to using const fn methods describing what it is one wants.

The repository is HUGE (500 MB), consider cleaning it up (remove lexical-core/target)

The repository is almost 500 megabytes due to including the lexical-core/target directory.

git allows you to clean this up using things like filter-branch, etc.

It will change the commit ids of the commits after the one where you accidentally added the lexical-core/target folder though, so contributors with old histories will get conflicts, but it's still largely worth it IMHO.

Tracking issue: atof for exact float parsing (hopefully for serde_json)

This issue tracks the implementation of an atof function that could be used by serde_json. It is motivated by the parsing issues discussed in serde-rs/json#536.

@Alexhuszagh has provided background detail in #28. In particular, their lexical library has lots of testing, which provides a great foundation on which to build a customized atof function.

The direction that currently seems viable is to add a streaming atof function to lexical-core. It would operate on one byte at a time. This ought to allow serde_json to correctly parse JSON floats at high speed.

[FEATURE] The Schubfach algorithm

Problem

While the Ryu algorithm shows fine average throughput for arbitrary numbers, it does a lot of rounding iterations for numbers with a small mantissa (in serialized representation) and has no version for bases other than decimal.

Solution

Use the Schubfach algorithm to avoid rounding loops.

It will minimize tail latency and increase throughput if the input has a lot of numbers with a small mantissa.

Also, the Schubfach algorithm can be implemented for other bases (except 3, 12, 24, 48, etc.).

Additional Context

"The Schubfach way to render doubles" by Raffaello Giulietti
Java variant for decimal representation from the author of the algorithm
Scala variant for decimal representation from the jsoniter-scala library

[FEATURE] Add support for `f128::f128` and `half::{ f16, bf16 }`

Problem

Currently, lexical-core can only parse f32 and f64, but, especially for designers of programming languages, supporting more number formats than Rust does would be nice.

Solution

Offer a feature-gated default impl for f128 using the f128 crate and f16, bf16 from the half crate.

Prerequisites

  • lexical version : 0.7.*
  • lexical compilation features used: format, correct, radix

Alternatives

Don't see any beyond »let's not«.

Additional Context

Rust has u128, as having it for e.g. crypto is convenient, despite no mainstream CPU having 128-bit integer arithmetic and registers. f16 is very often used, e.g. in GPU code; bf16 specifically in neural network code; and f128 also finds some use here and there.

There's also 8-bit floats, though not IEEE-standardised, and there's IEEE 754 binary256. However, I know of no handy softfloat crates for these.

As lexical-core aims to be a proglang-agnostic number parser, i.e. not tied to Rust formats and types, I see no reason to completely restrict oneself to just the built-in Rust machine types.

[FEATURE] CORE write() formatting control

Problem

I'm trying to implement a protocol which does not accept the scientific format in all places. It would be useful to control whether the decimal output is written in normal or scientific format.

Having some degree of control over the number of significant digits would also be nice. Rounding the number to the desired precision beforehand doesn't help if the rounded value isn't representable (for example, 1.2f32 -> 1.2000000476837158).

Solution

An extra write function which can take formatting hints, possibly write_format(n, format, significant_digits, bytes), where:

  • n - Value to be written
  • format - An enum for desired format
  • significant_digits - A usize of maximum number of significant digits, 0 could mean "Don't care".
  • bytes - Output buffer
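
A sketch of the proposed signature (entirely hypothetical; no such function exists in lexical):

pub enum FloatFormat {
    Standard,
    Scientific,
}

/// Write `n` into `bytes` with the given hints, returning the number of
/// bytes written; `significant_digits == 0` means "don't care".
pub fn write_format(n: f64, format: FloatFormat, significant_digits: usize, bytes: &mut [u8]) -> usize {
    let _ = (n, format, significant_digits, bytes);
    todo!("proposed API; see the bullet points above")
}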

Prerequisites

If applicable to the feature request, here are a few things you should provide to help me understand the issue:

  • Rust version : rustc 1.39.0-nightly (4295eea90 2019-08-30)
  • lexical version : 0.6.2
  • lexical compilation features used: lexical-core: features=["radix"], default-features=false

Alternatives

  • String manipulation - expensive and buggy; needs to check the result to figure out how to cut it up.

parse_float with 16 radix failed

Description

use lexical::parse_with_options;

const FIXTURE: &str = "1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111";

fn main() {
    let f: f64 = parse_with_options::<_, _, { lexical::NumberFormatBuilder::from_radix(16) }>(
        FIXTURE,
        &lexical::parse_float_options::STANDARD,
    )
    .expect("parse float failed");
    println!("{}", f);
}

Prerequisites

Here are a few things you should provide to help me understand the issue:

  • Rust version: 1.56.0
  • lexical version: 6.0.1
  • lexical compilation features used: features = ["power-of-two", "parse-floats"]

Test case

use lexical::parse_with_options;

const FIXTURE: &str = "11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111";

fn main() {
    let f: f64 = parse_with_options::<_, _, { lexical::NumberFormatBuilder::from_radix(16) }>(
        FIXTURE,
        &lexical::parse_float_options::STANDARD,
    )
    .expect("parse float failed");
    println!("{}", f);
}

Additional Context

expected to see inf,
but got error: thread 'main' panicked at 'parse float failed: InvalidPunctuation', src\main.rs:10:6

[FEATURE] Ignore/check base prefixes when using `format` *and* `radix`

Problem

With format and radix enabled, I can e.g. parse hexadecimal floats using my proglang-specific syntax. However, that syntax includes base prefixes, in this case 0x. To make matters worse, base prefixes usually appear between the sign and the integer digits of a number literal.

Solution

0b, 0o, 0, 0d, 0x, as well as upper-case variants, are all common base prefixes in programming languages with 0b for base-2, 0o for base-8, 0 — as in leading zero — as a terrible way of saying 0o, 0d as an optional base-10 for the sake of symmetry, and 0x for base-16 numbers. I suggest ignoring leading 0 to mean base-8. That's just terrible, a source of countless bugs, and should thus be up to the user to work around.

While pretty much any radix is possible, I suggest only handling these four bases. I don't know of any common radices for others.

The following extensions to the format bit-packed config should be made:

  • 4 bits to indicate whether base prefixes are allowed for 0b, 0o, 0d, 0x.
  • 4 bits to indicate whether base prefixes are optional. In most proglangs that would only be true for 0d.
  • 4 bits to indicate whether base prefixes are case-insensitive. Often is the case, but not always.
  • 4 bytes for the base-indicating characters. For base-10 this character would be b'd'. If only upper-case was allowed in a format, this would be b'D'. Leading 0 is implied. If the format of the current radix has optional base-indicators, then all leading zeros behave normally.

This leaves 2 bytes and 4 bits reserved when using a u128 or a second u64 for the format settings.

Prerequisites

  • lexical version : 0.7.*
  • lexical compilation features used: correct, format, radix

Alternatives

Currently I check the sign myself, memorise it, then skip sign and base prefix to radix-parse the number literal I got. This abuses the fact that flipping the sign of a float is a lossless operation. However, it's annoying and unergonomic.

An alternative design to support more radix prefixes would be to take a function pointer or something that maps base to a base-indicating u8 ASCII-char.

It's also worth noting that there are languages with base postfixes, like 03h in Intel x86 assembly. Should these be supported as well?

[BUG] lexical-core 0.6 pins cfg-if 0.1.9, which causes problems on newer stable.

Description

lexical-core 0.6 pins cfg-if 0.1.9, which causes downstream problems.
Users may be stuck on lexical-core 0.6 for a while, since nom 5 requires it, and moving to newer nom versions is a notoriously slow process.

Prerequisites

  • Rust version : rustc 1.43.1 (8d69840ab 2020-05-04)
  • lexical-core version : 0.6.3

Test case

[dependencies]
lexical-core = "0.6" # or indirectly through nom = "5"
cfg-if = "0.1"

fn foo() {
    cfg_if::cfg_if! {
        if #[cfg(unix)] {
            fn bar() {}
            let tm = ();
        }
    }
}

$ cargo +stable check
    Checking foo v0.1.0 (/home/jon/dev/tmp/foo)
error: expected an item keyword
 --> src/main.rs:5:13
  |
5 |             let tm = ();
  |             ^^^

error: aborting due to previous error

error: could not compile `foo`.
$ cargo +beta check
    Checking foo v0.1.0 (/home/jon/dev/tmp/foo)
error: expected an item keyword
  --> src/main.rs:5:13
   |
5  |             let tm = ();
   |             ^^^
   |
  ::: /home/jon/.cargo/registry/src/github.com-1ecc6299db9ec823/cfg-if-0.1.9/src/lib.rs:41:40
   |
41 |         if #[cfg($($meta:meta),*)] { $($it:item)* }
   |                                        -------- while parsing argument for this `item` macro fragment

error: aborting due to previous error

error: could not compile `foo`.

Additional Context

So, the problem here is that cargo does a search of the dependency tree, sees that lexical-core requires exactly cfg-if 0.1.9, and so any crate in the tree that depends on cfg-if 0.1 then gets that version (since cargo only builds one version of a crate per semver-compatible range).

As I understand it, the decision to pin cfg-if 0.1.9 was made in fefe818 to support older Rust versions. Unfortunately that now means that newer Rust versions are not supported. It seems more important to support new Rust versions than old ones, so I suggest that the pinning should be undone.

There is also a note in that PR saying:

Update cfg-if to "0.1.10" when we support only Rustc >= 1.32.0.

Don't know if that applies now?

See also rust-bakery/nom#1115 (comment)

Parse floats using fast path?

Problem

In my application, over 25% of execution time is spent inside lexical::parse_lossy. Mind you, lexical's implementation is far better than the stdlib implementation and its speed is simply fantastic.

A little bit more speed can't hurt though, so I was looking at the implementation details, where it is stated that parse_lossy tries multiple parsing implementations: first the fast path, then the moderate path. Is it possible to make lexical return the fast path's result directly?

To give context; this is an excerpt of the floats that need to be parsed:

-0.018477, -0.018464, -0.018458, -0.014031, -0.014018, -0.014011, -0.000648, 0.008092, 
0.000111, -0.009875, 0.012704, 0.012185, 0.011334, 0.011927, 0.012284, 0.010097,
0.012951, 0.001517, -0.005452, 0.015123, -0.004884, -0.007977, 0.019697, 0.010684

They're all in between -1.0 and +1.0, and only the first few 4-5 floating point digits matter.

Solution

It could be implemented using a new parse function, maybe lexical::parse_lossier?
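
For context, here is a hedged sketch of the classic Clinger-style fast path (not lexical's actual code): when the mantissa fits exactly in the float's significand and the power of ten is exactly representable, a single multiplication or division rounds correctly just once:

fn fast_path(mantissa: u64, exp10: i32) -> Option<f64> {
    // Powers of ten up to 10^22 are exactly representable in an f64.
    const POW10: [f64; 23] = [
        1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9, 1e10, 1e11,
        1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22,
    ];
    // The mantissa must fit in f64's 53-bit significand, and the
    // exponent must index an exact power of ten.
    if mantissa >= (1u64 << 53) || !(-22..=22).contains(&exp10) {
        return None;
    }
    let m = mantissa as f64;
    Some(if exp10 >= 0 {
        m * POW10[exp10 as usize]
    } else {
        m / POW10[(-exp10) as usize]
    })
}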

This f64 value round-trips with 1 ULP error

Compile the following with lexical-core (I tried 0.4.0 and 0.4.2) with the default options:

extern crate lexical_core;

fn main() {
    let problematic: f64 = 7.689539722041643e164;
    let as_str: String = format!("{:?}", problematic); // or other formats, see below
    println!("{}", as_str);
    let lcresult = lexical_core::atof64_slice(as_str.as_bytes());
    println!("{:?}", lcresult);
    let parse_result: f64 = as_str.parse().unwrap();
    println!("{:?}", parse_result);
}

Output (I've removed most of the zeroes to make it more readable):

768953972204164300 … 0.0
768953972204164200 … 0.0
768953972204164300 … 0.0

Note the lexical-core result (middle) is different from the input value and from the String::parse result, by 1 ULP. I got the same error using lexical::parse (1.2.2) instead of lexical_core::atof64_slice. OTOH, I got the expected result if I replace the format in the marked line with {} (which omits the ".0") or with {:e} (which outputs e164 like the input literal, instead of many zeroes). The input value has no special significance to me: I found it by testing my code (which calls into this function) using the proptest strategy proptest::num::f64::NORMAL.

[BUG] Formats with `digit_separators` can't parse float numbers

Description

Trying to define a custom format (a format containing digit separators), I couldn't get my number format to parse the string "42.0". After a while, I noticed that the provided formats which also contain digit separators can't parse the same string either. See the test case.

Prerequisites

  • Rust version: rustc 1.54.0 (a178d0322 2021-07-26)
  • lexical version: 6.0.0
  • lexical compilation features used: format

Test case

fn main() {
  const RUST: u128 = lexical::format::RUST_LITERAL;
  const JSON: u128 = lexical::format::JSON;
  const CXX: u128 = lexical::format::CXX17_LITERAL;

  let o = lexical::ParseFloatOptions::new();
  
  println!("{:?}", lexical::parse_with_options::<f64, _, JSON>("42.0", &o));
  
  // RUST_LITERAL
  println!("{:?}", lexical::parse_with_options::<f64, _, RUST>("42.0", &o));
  println!("{:?}", lexical::parse_with_options::<f64, _, RUST>("4_2.0", &o));
  
  // CXX17_LITERAL
  println!("{:?}", lexical::parse_with_options::<f64, _, CXX>("42.0", &o));
  println!("{:?}", lexical::parse_with_options::<f64, _, CXX>("4'2.0", &o));
}

I would expect all five println invocations to print Ok(42.0). But in the actual output, only the first one is able to parse the number.

Ok(42.0)
Err(EmptyInteger(2))
Err(EmptyInteger(3))
Err(EmptyMantissa(4))
Err(EmptyMantissa(5))

Additional Context

When I copy the RUST_LITERAL and CXX17_LITERAL definitions to my main function and comment out the digit_separator, the simple case can be parsed correctly:

  pub const CXX_NOSEP: u128 = lexical::NumberFormatBuilder::new()
//    .digit_separator(std::num::NonZeroU8::new(b'\''))
    .case_sensitive_special(true)
    .internal_digit_separator(true)
    .build();
  println!("{:?}", lexical::parse_with_options::<f64, _, CXX_NOSEP>("42.0", &o));

  pub const RUST_NO_SEP: u128 = lexical::NumberFormatBuilder::new()
//    .digit_separator(std::num::NonZeroU8::new(b'_'))
    .required_digits(true)
    .no_positive_mantissa_sign(true)
    .no_special(true)
    .internal_digit_separator(true)
    .trailing_digit_separator(true)
    .consecutive_digit_separator(true)
    .build();
  println!("{:?}", lexical::parse_with_options::<f64, _, RUST_NO_SEP>("42.0", &o));

prints

Ok(42.0)
Ok(42.0)

[FEATURE] Add Support for ISA Integers/Numbers

Problem

Currently, there are integer literals for ISAs (Instruction Set Architectures) like Intel x86 that support literal numbers for interrupt instructions, etc, as well as numerous other places. For example, for x86, we have the following reference specification.

Solution

First, we should add flags to NumberFormat to ensure all these numbers can be parsed. Specific flags, such as for base prefixes and postfixes (integer only?) should be added.

Next, we should add support for the numerical constants supported by popular ISAs, which could include:

  • x86/x86_64
  • ARMv6, ARMv7-A, ARMv8-A
  • MIPS/MIPS64EL/MIPS64
  • PowerPC/PPC64/PPC64EL
  • s390x (IBM Z)
  • RISC-V
  • Any others deemed important

I don't know any specifics for ISAs other than x86, so help is greatly appreciated. Do different ISAs have any differences from x86? Is there any difference between AT&T and Intel syntax? (I don't believe so.) I'm looking for a series of new flags to add to NumberFormat, and then pre-defined constants, so I can encompass all these possible variants.

Compile error in lexical-core 4 when building nom 5.0.0-beta2

Not sure how to get around this.

Cargo.lock

[[package]]
name = "nom"
version = "5.0.0-beta2"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
 "lexical-core 0.4.0 (registry+https://github.com/rust-lang/crates.io-index)",
 "memchr 2.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
 "regex 1.1.7 (registry+https://github.com/rust-lang/crates.io-index)",
 "version_check 0.1.5 (registry+https://github.com/rust-lang/crates.io-index)",
]

Error log

error[E0412]: cannot find type `ChunksExact` in module `slice`
   --> C:\Users\Chris\.cargo\registry\src\github.com-1ecc6299db9ec823\lexical-core-0.4.0\src\util\sequence.rs:439:51
    |
439 |     fn chunks_exact(&self, size: usize) -> slice::ChunksExact<T> {
    |                                                   ^^^^^^^^^^^ not found in `slice`

error[E0412]: cannot find type `ChunksExactMut` in module `slice`
   --> C:\Users\Chris\.cargo\registry\src\github.com-1ecc6299db9ec823\lexical-core-0.4.0\src\util\sequence.rs:445:59
    |
445 |     fn chunks_exact_mut(&mut self, size: usize) -> slice::ChunksExactMut<T> {
    |                                                           ^^^^^^^^^^^^^^ not found in `slice`

error[E0412]: cannot find type `RChunks` in module `slice`
   --> C:\Users\Chris\.cargo\registry\src\github.com-1ecc6299db9ec823\lexical-core-0.4.0\src\util\sequence.rs:535:46
    |
535 |     fn rchunks(&self, size: usize) -> slice::RChunks<T> {
    |                                              ^^^^^^^ did you mean `Chunks`?

error[E0412]: cannot find type `RChunksMut` in module `slice`
   --> C:\Users\Chris\.cargo\registry\src\github.com-1ecc6299db9ec823\lexical-core-0.4.0\src\util\sequence.rs:541:54
    |
541 |     fn rchunks_mut(&mut self, size: usize) -> slice::RChunksMut<T> {
    |                                                      ^^^^^^^^^^ did you mean `ChunksMut`?

error[E0412]: cannot find type `RChunksExact` in module `slice`
   --> C:\Users\Chris\.cargo\registry\src\github.com-1ecc6299db9ec823\lexical-core-0.4.0\src\util\sequence.rs:549:52
    |
549 |     fn rchunks_exact(&self, size: usize) -> slice::RChunksExact<T> {
    |                                                    ^^^^^^^^^^^^ not found in `slice`

error[E0412]: cannot find type `RChunksExactMut` in module `slice`
   --> C:\Users\Chris\.cargo\registry\src\github.com-1ecc6299db9ec823\lexical-core-0.4.0\src\util\sequence.rs:555:60
    |
555 |     fn rchunks_exact_mut(&mut self, size: usize) -> slice::RChunksExactMut<T> {
    |                                                            ^^^^^^^^^^^^^^^ not found in `slice`

error[E0309]: the parameter type `T` may not live long enough
   --> C:\Users\Chris\.cargo\registry\src\github.com-1ecc6299db9ec823\lexical-core-0.4.0\src\util\sequence.rs:137:5
    |
136 | pub struct ReverseView<'a, T> {
    |                            - help: consider adding an explicit lifetime bound `T: 'a`...
137 |     inner: &'a [T],
    |     ^^^^^^^^^^^^^^
    |
note: ...so that the reference type `&'a [T]` does not outlive the data it points at
   --> C:\Users\Chris\.cargo\registry\src\github.com-1ecc6299db9ec823\lexical-core-0.4.0\src\util\sequence.rs:137:5
    |
137 |     inner: &'a [T],
    |     ^^^^^^^^^^^^^^

error[E0309]: the parameter type `T` may not live long enough
   --> C:\Users\Chris\.cargo\registry\src\github.com-1ecc6299db9ec823\lexical-core-0.4.0\src\util\sequence.rs:153:5
    |
152 | pub struct ReverseViewMut<'a, T> {
    |                               - help: consider adding an explicit lifetime bound `T: 'a`...
153 |     inner: &'a mut [T],
    |     ^^^^^^^^^^^^^^^^^^
    |
note: ...so that the reference type `&'a mut [T]` does not outlive the data it points at
   --> C:\Users\Chris\.cargo\registry\src\github.com-1ecc6299db9ec823\lexical-core-0.4.0\src\util\sequence.rs:153:5
    |
153 |     inner: &'a mut [T],
    |     ^^^^^^^^^^^^^^^^^^

error: aborting due to 8 previous errors

Some errors occurred: E0309, E0412.
For more information about an error, try `rustc --explain E0309`.
error: Could not compile `lexical-core`.
warning: build failed, waiting for other jobs to finish...
error: build failed

Parsing of '.'

Rust will parse . as Err(ParseFloatError { kind: Invalid }), while rust-lexical will parse it as 0.0. Not sure which one is the "correct" one.

Consider replacing stackvector with arrayvec

Hey, that's me again bugging you about arrayvec :)

It looks like the two crates are pretty close in the end, and I wonder if it makes sense for rust-lexical to switch to the latter? arrayvec has seen much more usage in the ecosystem, and, because unsafe code is involved, it seems like it makes sense to minimize duplication?

Then there's the problem that ArrayVec lacks an insert_many method, but I wonder if adding a splice method would help with that?

(The reason why I am asking about this is that I've noticed that rust-analyzer transitively, via nom, depends on stackvector, while it already has arrayvec among its deps.)

Add ignored byte feature

Akin to the configurable exponent character etc., I'd like to be able to tell lexical_core that all b'_' are to be ignored.

I'm using this crate to parse floating-point literals in a toy programming language that allows arbitrary _ separators after the first digit, both before and after the `.`. Currently, I have to allocate memory just to throw away the _ from otherwise valid input.

Test case:

b"+4_2_.3_4_e+7_7_" should successfully parse as +42.34e77 if the to-be-ignored byte is set to b'_'.

Prior art:

  • Rust's float literals allow for _ separators.
  • Ruby's as well.
  • In C++14, ' is a valid digit separator.

Questions:

  • Are valid digits allowed? Who would even do that?
  • Dot should not be allowed, I'd say?
  • How would one treat the case of not having a to-be-ignored byte? Would that noticeably slow things down?
  • Would it make sense to allow having multiple to-be-ignored bytes? Maybe even going so far as to add a filter callback? That'll for sure be slower.
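
For reference, a hedged sketch of how this later became expressible with the 0.8 format API (mirroring the builder calls quoted in the digit-separator bug report elsewhere in this list):

const FORMAT: u128 = lexical_core::NumberFormatBuilder::new()
    // Ignore b'_' between digits, after digits, and in consecutive runs.
    .digit_separator(std::num::NonZeroU8::new(b'_'))
    .internal_digit_separator(true)
    .trailing_digit_separator(true)
    .consecutive_digit_separator(true)
    .build();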

[FEATURE] Add support for numbers with different radices in different components.

Problem

By default, we assume the radix is the same for the entire number. That is, the radix for the mantissa digits, the exponent base, and the radix for the exponent digits are all the same.

Solution

Provide in ParseFloatOptions 2 additional fields:

  • exponent_radix, the radix for the exponent digit encoding
  • exponent_base, the numerical base for the exponent

These should both be limited to valid radices as well.

Additional Context

C++ hexadecimal float literals, and hexadecimal float representations demonstrate this issue:

// 0xa.b is 10.6875 in hexadecimal notation.
// p introduces an exponent with base 2, written in decimal digits.
//   The exponent is never optional for literals.
//   The exponent is optional for strings.
// So the literal equals 10.6875 * 2^10 = 10944.0.
const double x = 0xa.bp10;

[BUG] Compilation Fails on Latest Nightly

Description

When building the crate with the latest nightly, the compilation fails with 27 errors.

Prerequisites

  • Rust version : rustc 1.51.0-nightly (d4e3570db 2021-02-01)
  • lexical version: 0.7.4

All of them are E0308 and E0277

[FEATURE] Refactor available features

Problem

It is currently impossible to do a lot of things, without private forks of lexical-core.

  1. Cannot have optionally-trimmed floats (i.e., "12.0" and "12").
  2. Cannot use correct (slow) and incorrect (fast) parsers at the same time.
  3. Heavy reliance on global state (#45).

There are also numerous features that are rarely used (including some undocumented ones), and have dubious utility:

  1. table (should be the default, since correct depends on it).
  2. unchecked_index (introduces security risks if enabled, and has no tangible performance benefits).
  3. libm (should be enabled by default, see #61).
  4. noinline (debugging tools, no longer used).
  5. format (should be enabled by default, with fast-path algorithms to avoid overhead).

Remove needless uses of `unsafe`

There's a lot of unsafe code in lexical_core. A lot of it appears to deal with pointers, where you have a start and end pointer pair, which could just as well be a slice and be completely safe with acceptable performance cost.

Unable to build (rename-dependency and lifetime errors)

Hi! Your work looks interesting and I'm interested in applying some of the concepts elsewhere, so I thought I'd have a play around.

Unfortunately, I'm new to Rust and struggling to use this. I tried following your instructions and hit a couple of problems. I'm on Ubuntu 18.04/amd64, and I started off with the system-supplied Rust (1.30.0).

I started a new project with cargo new, added lexical to the Cargo.toml, and threw this into main.rs:

extern crate lexical;

fn main() {
    let f: f32 = lexical::parse("12.34567");
    println!("Hello, world! {}", f);
}

This gave me rename-dependency as an error:

$ cargo install lexical
    Updating crates.io index
 Downloading lexical v2.0.0                                                                                                                                                 
error: failed to parse manifest at `/home/pwaller/.cargo/registry/src/github.com-1ecc6299db9ec823/lexical-2.0.0/Cargo.toml`                                                 

Caused by:
  feature `rename-dependency` is required

consider adding `cargo-features = ["rename-dependency"]` to the manifest

I found various suggestions on the internet:

  • Inserting cargo-features = ["rename-dependency"] at the top of my Cargo.toml. This did not appear to help.
  • Update to rust 1.32.0.

Unfortunately, the latter suggestion led to this new error:

~/.local/rust/bin/cargo install lexical
    Updating crates.io index
  Installing lexical v2.0.0
   Compiling void v1.0.2
   Compiling ryu v0.2.7
   Compiling static_assertions v0.2.5
   Compiling cfg-if v0.1.6
   Compiling unreachable v1.0.0
   Compiling stackvector v1.0.2
   Compiling lexical-core v0.3.1
error[E0309]: the parameter type `T` may not live long enough
   --> /home/pwaller/.cargo/registry/src/github.com-1ecc6299db9ec823/lexical-core-0.3.1/src/util/veclike.rs:440:5
    |
439 | pub struct ReverseView<'a, T> {
    |                            - help: consider adding an explicit lifetime bound `T: 'a`...
440 |     inner: &'a [T],
    |     ^^^^^^^^^^^^^^
    |
note: ...so that the reference type `&'a [T]` does not outlive the data it points at
   --> /home/pwaller/.cargo/registry/src/github.com-1ecc6299db9ec823/lexical-core-0.3.1/src/util/veclike.rs:440:5
    |
440 |     inner: &'a [T],
    |     ^^^^^^^^^^^^^^

error[E0309]: the parameter type `T` may not live long enough
   --> /home/pwaller/.cargo/registry/src/github.com-1ecc6299db9ec823/lexical-core-0.3.1/src/util/veclike.rs:456:5
    |
455 | pub struct ReverseViewMut<'a, T> {
    |                               - help: consider adding an explicit lifetime bound `T: 'a`...
456 |     inner: &'a mut [T],
    |     ^^^^^^^^^^^^^^^^^^
    |
note: ...so that the reference type `&'a mut [T]` does not outlive the data it points at
   --> /home/pwaller/.cargo/registry/src/github.com-1ecc6299db9ec823/lexical-core-0.3.1/src/util/veclike.rs:456:5
    |
456 |     inner: &'a mut [T],
    |     ^^^^^^^^^^^^^^^^^^

error: aborting due to 2 previous errors

For more information about this error, try `rustc --explain E0309`.
error: failed to compile `lexical v2.0.0`, intermediate artifacts can be found at `/tmp/cargo-installhhZln5`

Caused by:
  Could not compile `lexical-core`.

To learn more, run the command again with --verbose.

Thanks in advance for any help.

[BUG] Wrong limb width when cross compiling

The following code in the lexical-core build script branches on the value of cfg(target_arch). That cfg refers to the target of the code currently being compiled by rustc; in the case of a build script, that's the host architecture.

let limb_width_64 = cfg!(any(
    target_arch = "aarch64",
    target_arch = "mips64",
    target_arch = "powerpc64",
    target_arch = "x86_64"
));
if limb_width_64 {
    println!("cargo:rustc-cfg=limb_width_64");
} else {
    println!("cargo:rustc-cfg=limb_width_32");
}

Cargo provides a separate env variable to build scripts, called CARGO_CFG_TARGET_ARCH, to determine the target arch of the library as opposed to the target arch of the build script.
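
A sketch of the suggested fix (illustrative, not an actual patch from the repository), reading that env variable from the build script:

// build.rs
fn main() {
    // CARGO_CFG_TARGET_ARCH describes the library's target, even when the
    // build script itself is compiled for, and runs on, the host.
    let arch = std::env::var("CARGO_CFG_TARGET_ARCH").unwrap_or_default();
    let limb_width_64 = matches!(
        arch.as_str(),
        "aarch64" | "mips64" | "powerpc64" | "x86_64"
    );
    if limb_width_64 {
        println!("cargo:rustc-cfg=limb_width_64");
    } else {
        println!("cargo:rustc-cfg=limb_width_32");
    }
}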

To reproduce, put this in build.rs and run cargo check --target wasm32-unknown-unknown and see which error is triggered.

#[cfg(target_arch = "x86_64")]
compile_error!("target_arch = x86_64");
#[cfg(target_arch = "wasm32")]
compile_error!("target_arch = wasm32");

[BUG] Fix a bug, and improve comments on Dragonbox

  • Rust version: N/A
  • lexical version: 0.8.1
  • lexical compilation features used: N/A

Description

Here are some comments in the code I find misleading to the readers.

  1. // These are much more efficient log routines than the ones
    I'm not sure what you mean here. The ones provided by dragonbox (if you mean the reference implementation) eventually boil down to code mostly identical to what you wrote. There are small differences in the details, but they should not really make any difference in performance.
  2. /// Calculate `(x * log10(2) - log10(4)) / 3` quickly.
    This one is not correct. What's being computed is floor(x * log10(2) - log10(4/3)); please refer to the paper (Section 5.4).
  3. // floor( (fc-1/2) * 2^e ) = 1.175'494'28... * 10^-38
    and similar lines. There should be no floor on the LHSs (the RHSs are not integers). This was my mistake, and I corrected it in my repo recently.

[FEATURE] Add a format flag to allow parsing with , as separator

Problem

Dutch floats are formatted like so: 101.123,456, where . is the thousands separator and , marks the start of the fraction.

Afaik, it is not currently possible to add a format flag to allow parsing Dutch floats.

Some way to configure the parser to allow that would be great.
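
A hedged sketch of a partial solution with the existing Options API (the builder method is assumed to mirror the WriteFloatOptions example earlier in this README; the . thousands separator would still need digit-separator support from the format feature):

let options = lexical_core::ParseFloatOptions::builder()
    // Use ',' as the decimal point, Dutch-style.
    .decimal_point(b',')
    .build()
    .unwrap();
let f: f64 = lexical_core::parse_with_options::<_, { lexical_core::format::STANDARD }>(b"101123,456", &options)?;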

try_parse fails to parse signed integer minimum values

All of the following test cases fail with an Overflow error:

    assert_eq!(i8::MIN, lexical::try_parse(i8::MIN.to_string()).unwrap());
    assert_eq!(i16::MIN, lexical::try_parse(i16::MIN.to_string()).unwrap());
    assert_eq!(i32::MIN, lexical::try_parse(i32::MIN.to_string()).unwrap());
    assert_eq!(i64::MIN, lexical::try_parse(i64::MIN.to_string()).unwrap());
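
This is the classic sign-magnitude pitfall: |i64::MIN| is one greater than i64::MAX, so accumulating the magnitude as a positive value overflows before the sign is applied. A minimal sketch of the usual fix (not lexical's actual code) accumulates negatively instead:

fn parse_i8(digits: &[u8], negative: bool) -> Option<i8> {
    let mut value: i8 = 0;
    for &b in digits {
        let digit = (b - b'0') as i8;
        // Accumulate toward i8::MIN, so that "-128" never overflows.
        value = value.checked_mul(10)?.checked_sub(digit)?;
    }
    if negative { Some(value) } else { value.checked_neg() }
}

assert_eq!(parse_i8(b"128", true), Some(i8::MIN));
assert_eq!(parse_i8(b"128", false), None); // 128 > i8::MAX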

[FEATURE] Remove global state

Problem

Currently, some settings, like the expected exponent character, are global state. This can be anything from an inconvenience to a tricky issue for projects parsing multiple languages.

Solution

The format feature of lexical-core added integer-packed settings which you have to pass to all format parsing functions. I suggest doing a similar thing for everything else, but by passing in a struct reference. Integer packing works for things like the exponent characters, but fails for e.g. set_inf_string. This is still C-API-friendly; C libs are just forced to put a strlen next to their *const c_char.

Prerequisites

  • lexical version : 0.7.*
  • lexical compilation features used: format, radix, correct

Alternatives

Uhh… not doing any of this? Maybe packing said strings into something like staticvec::StaticString inlined into the struct? But I don't think that gives any benefit. Another idea would be to always enable format and include format's bit-packed settings into that one struct.

Undefined symbols ld error from nom

Not sure if this is an issue with lexical-core, nom, elastic-rs/elastic (where I am using nom), or even std/core or the compiler, but I thought I would start here...

I am getting undefined symbols ld errors for various symbols in std/core when I enable the lexical feature of nom.

See elastic-rs/elastic/pull/389 for a little more background + error logs and this or this Travis build.

I have reproduced it on macOS 10.14 & 10.15 and Ubuntu 19.04 (and for the sake of completeness; various Linux via Docker) with rustc 1.38.0 (625451e37 2019-09-23), 1.39.0-beta.6 (224f0bc90 2019-10-15), 1.40.0-nightly (4a8c5b20c 2019-10-23)—and a few other nightlies—and when cross compiling to x86_64-unknown-linux-musl from macOS and Linux hosts.

We don't actually use the lexical feature of nom in elastic-rs/elastic, so I disabled it and everything is building fine, but I thought I would open this issue in case someone else runs into it.

I can also upload our current Cargo.lock if that would help...

x86_64-pc-windows-msvc compile error

I've encountered the following compile error on a vs2017-win2016 VM (in Azure Pipelines) using x86_64-pc-windows-msvc, rustc 1.39.0-nightly (97e58c0d3 2019-09-20). I see this project has a CI run on x86_64-pc-windows-gnu (not msvc) that passes.

error[E0061]: this function takes 1 parameter but 0 parameters were supplied
   --> C:\Users\VssAdministrator\.cargo\registry\src\github.com-1ecc6299db9ec823\lexical-core-0.6.1\src\util\num.rs:961:44
    |
961 |         float_method_msvc!(self, f32, f64, powf, powf32, n as f32)
    |                                            ^^^^ expected 1 parameter

[BUG] Enabling `compact` breaks `no_std`

Description

The current version (v0.8.2) of lexical-core claims to be no_std (when default features are disabled), and makes no mention of needing std in the context of the compact feature. However, enabling that feature halts compilation, because std is required by lexical-util, whose std feature gets enabled somewhere in the dependency tree.

   Compiling lexical-util v0.8.1
error[E0463]: can't find crate for `std`
  |
  = note: the `thumbv6m-none-eabi` target may not support the standard library
  = note: `std` is required by `lexical_util` because it does not declare `#![no_std]`

Prerequisites


  • Rust version: 1.53
  • lexical-core version: 0.8.2
  • lexical compilation features used: "compact"

Test case

# Cargo.toml:
[package]
name = "foo"
version = "0.1.0"
authors = ["becominginsane <[email protected]>"]
edition = "2018"

[dependencies]
lexical-core = { version = "0.8", features = ["compact"], default-features = false }

// src/main.rs:
fn main() {
    println!("you won't compile me");
}
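This looks like Cargo feature unification at work: features are additive, so if any edge in the dependency graph enables lexical-util/std, every user of lexical-util gets std. A hypothetical sketch of the feature wiring that avoids it (illustrative only, not lexical's actual manifest):

    # lexical-core/Cargo.toml (sketch): each feature must forward only
    # the matching non-std feature of lexical-util.
    [features]
    default = ["std"]
    std = ["lexical-util/std"]
    compact = ["lexical-util/compact"]   # must not also imply "std"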

[BUG] Remove lexical-test/target from build tree.

Description

The Git repository is currently too large because compiled targets from lexical-test were added to the tree. This affects clone times dramatically. It can be fixed by running the following commands on each branch.

git filter-branch --tree-filter "rm -rf lexical-test/target" --prune-empty HEAD
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
git commit -m "Removing lexical-test/target from git history."
git gc
git push --force
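For what it's worth, the same cleanup can be done in one step with the newer git-filter-repo tool, assuming it is installed (git filter-branch is deprecated upstream):

    git filter-repo --invert-paths --path lexical-test/target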

[OTHER] Yank versions incompatible with Rust 1.53.0

Prerequisites


  • Rust version (rustc -V): 1.53.0
  • lexical version: 0.7.4
  • lexical compilation features used:

Description


Refs #55, rust-lang/rust#85667

It would be nice if the versions that are incompatible with Rust 1.53.0 could be yanked. While yanking doesn't force people to update to the fixed versions, it does help as tools like cargo-audit will now warn that you're using a yanked version and should upgrade.

Additional Context


At rust-lang/rust#85667 (comment), @Mark-Simulacrum said "I think yanking is likely not the right step to take at this time." I wonder if they still think that now that 1.53.0 is stable.

Parse success on floats missing a trailing digit, e.g. "1."

In testing a project that uses lexical-core against https://github.com/nst/JSONTestSuite, I see that floats such as 1., 0.e1, and 2.e+3 parse successfully, but the suite expects them to fail.

This patch shows the behavior I think should apply. What do you think?

diff --git a/lexical-core/src/atof/api.rs b/lexical-core/src/atof/api.rs
index 9bb0688..a95d682 100644
--- a/lexical-core/src/atof/api.rs
+++ b/lexical-core/src/atof/api.rs
@@ -270,6 +270,9 @@ mod tests {
         assert_eq!(Err((ErrorCode::EmptyFraction, 0).into()), f32::from_lexical(b"e-1"));
         assert_eq!(Err((ErrorCode::Empty, 1).into()), f32::from_lexical(b"+"));
         assert_eq!(Err((ErrorCode::Empty, 1).into()), f32::from_lexical(b"-"));
+        assert_eq!(Err((ErrorCode::EmptyFraction, 2).into()), f32::from_lexical(b"1."));
+        assert_eq!(Err((ErrorCode::EmptyFraction, 2).into()), f32::from_lexical(b"0.e1"));
+        assert_eq!(Err((ErrorCode::EmptyFraction, 2).into()), f32::from_lexical(b"2.e+3"));

         // Bug fix for Issue #8
         assert_eq!(Ok(5.002868148396374), f32::from_lexical(b"5.002868148396374"));
@@ -399,6 +402,9 @@ mod tests {
         assert_eq!(Err((ErrorCode::EmptyFraction, 1).into()), f64::from_lexical(b"-."));
         assert_eq!(Err((ErrorCode::Empty, 1).into()), f64::from_lexical(b"+"));
         assert_eq!(Err((ErrorCode::Empty, 1).into()), f64::from_lexical(b"-"));
+        assert_eq!(Err((ErrorCode::EmptyFraction, 2).into()), f64::from_lexical(b"1."));
+        assert_eq!(Err((ErrorCode::EmptyFraction, 2).into()), f64::from_lexical(b"0.e1"));
+        assert_eq!(Err((ErrorCode::EmptyFraction, 2).into()), f64::from_lexical(b"2.e+3"));

         // Bug fix for Issue #8
         assert_eq!(Ok(5.002868148396374), f64::from_lexical(b"5.002868148396374"));

Trailing dot

println!("{}", 1.0.to_string()); // 1

let mut buf = [0u8, 64];
lexical_core::ftoa::f64toa_slice(v, 10, &mut buf);
println!("{}", std::str::from_utf8(buf).unwrap()); // 1.

Is this an intended behavior?
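On the writing side, lexical-core 0.8+ appears to expose a trim_floats option that drops the trailing fraction entirely; a sketch, assuming WriteFloatOptions and write_with_options behave as documented:

    use lexical_core::WriteFloatOptions;

    const FORMAT: u128 = lexical_core::format::STANDARD;
    let options = WriteFloatOptions::builder()
        .trim_floats(true)
        .build()
        .unwrap();
    let mut buffer = [0u8; lexical_core::BUFFER_SIZE];
    let written = lexical_core::write_with_options::<f64, FORMAT>(1.0, &mut buffer, &options);
    assert_eq!(written, &b"1"[..]); // "1" rather than "1.0"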

module `traits` is private

Hello,

With lexical 1.5 the following code was fine:

pub fn deserialize<'de, T, D>(deserializer: D) -> Result<T, D::Error>
where
    T: lexical::traits::Aton,
    D: Deserializer<'de>,
{
    lexical::try_parse::<T, _>(String::deserialize(deserializer)?).map_err(de::Error::custom)
}

Now it says:

error[E0603]: module traits is private

It looks like the trait Aton has been renamed to FromBytes, but that doesn't help either, since the traits module is private and the trait cannot be referenced from outside the crate.
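Under the current public API, the same helper can be written against the exported FromLexical bound rather than the private traits module; a sketch, assuming serde and lexical-core 0.8:

    use serde::de::{self, Deserialize, Deserializer};

    pub fn deserialize<'de, T, D>(deserializer: D) -> Result<T, D::Error>
    where
        T: lexical_core::FromLexical,
        D: Deserializer<'de>,
    {
        let s = String::deserialize(deserializer)?;
        lexical_core::parse(s.as_bytes()).map_err(de::Error::custom)
    }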

Doctesting fails without std

Errors:

error: duplicate lang item in crate `lexical_core`: `panic_impl`.
  |
  = note: first defined in crate `std`.
error: duplicate lang item in crate `lexical_core`: `eh_personality`.
  |
  = note: first defined in crate `panic_unwind`.
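The usual cause here is that a library crate defines the panic_handler / eh_personality lang items itself, which collides with std whenever the doctest runner (which always links std) pulls the crate in. A common workaround sketch, gating the lang item behind an opt-in feature (the feature name is hypothetical):

    // Only a final embedded binary should opt in; doctests, which always
    // link std, then never see a duplicate `panic_impl` lang item.
    #[cfg(feature = "lang-items")] // hypothetical feature name
    #[panic_handler]
    fn panic(_info: &core::panic::PanicInfo) -> ! {
        loop {}
    }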
