Git Product home page Git Product logo

Comments (17)

cowtowncoder avatar cowtowncoder commented on June 21, 2024 1

int/long is sort of less relevant as we implement optimized decoding ourselves and do not rely on JDK (for cases that matter -- parsing int/long from String is not on critical path).
And wrapping char[] in CharSequence is unlikely to have much if any performance benefits: might be slower than constructing Strings.

As to adding methods... I think performance numbers would be good. Can be stand-along micro-benchmark, no need to be integrated via parser (altho obv integrated one would be the ultimate thing).
It is too easy to try to add speculative improvements that have no real effect.

The reason I am slightly skeptical is that the real heavy part for double and float is decimal-to-binary decoding of actual numeric value; it's much more expensive than int/long f.ex.

from jackson-core.

pjfanning avatar pjfanning commented on June 21, 2024 1

You can use https://github.com/wrandelshofer/FastDoubleParser directly - It has support for Char Array, Char Sequence and String.

We can update NumberInput too but I'd prefer if that was done with perf tests that prove that it improves Jackson performance.

I'd prefer not to delay the 2.17 release so any Jackson changes in this area are likely not to be released until 2.18 or beyond.

from jackson-core.

cowtowncoder avatar cowtowncoder commented on June 21, 2024 1

One thing I may have missed -- thank you @pjfanning for sort of hinting at -- NumberInput is primarily meant to be used by Jackson itself. It is not meant as utility for code outside Jackson.
So improvements need to be something that JsonParser implementations use.

And yeah, definitely none of changes would go in 2.17 but 2.18 or later.

from jackson-core.

pjfanning avatar pjfanning commented on June 21, 2024 1

I understand there is nuance in assuming "public" means "public API". Looking through some related comments though, rallyhealth/weePickle#117 (comment) @pjfanning seems to suggest that NumberInput is part of the "public API"?

I'm very familiar with https://github.com/wrandelshofer/FastDoubleParser (we use it as part of https://github.com/deephaven/deephaven-csv), it's great :D. Good point though, I can use it directly in the meantime.

The methods that we have in NumberInput - we can't stop you using them. We are not going to add new methods to NumberInput unless they are useful for the internal Jackson code.

from jackson-core.

devinrsmith avatar devinrsmith commented on June 21, 2024 1

(Sorry if you saw my previous comment which I deleted, it was incorrect.)

I think there's an opportunity to also improve the performance of JsonParser#getDoubleValue, which seems like it always goes through String instead of char[]? Testing out an array of doubles (as opposed to an array of double strings), the performance matches up with my slower getText results from above.

Essentially, I'm in a position where the JsonParser API lets me parse double strings faster than doubles, which is an odd position to be in. (Unless I'm mistaken and there is a way for me to get char[] from VALUE_NUMBER_FLOAT contexts.)

from jackson-core.

cowtowncoder avatar cowtowncoder commented on June 21, 2024

Might make sense, esp. if performance benchmark shows meaningful improvement from avoided allocations (or rather, gc time). Challenges:

  1. JDK implementation only takes String (which is probably why there's no char[] taking alternative), so might end up being no-win for default case
  2. Number of permutations wrt fast/JDK parser (number of methods we have with overloads)

Then again, if and when we are moving to default to FastDoubleParser, it could be a win. Would def want to know something about improvement tho.

Anyway, +1 for experimentation.

from jackson-core.

devinrsmith avatar devinrsmith commented on June 21, 2024

Yeah. At least in the case of int/long, the JDK has CharSequence parsing support:

static int parseStringAsInt(JsonParser parser) throws IOException {
    if (parser.hasTextCharacters()) {
        final int len = parser.getTextLength();
        final CharSequence cs = CharBuffer.wrap(parser.getTextCharacters(), parser.getTextOffset(), len);
        return Integer.parseInt(cs, 0, len, 10);
    } else {
        return Integer.parseInt(parser.getText());
    }
}

A bit surprising the JDK is missing similar methods for parseFloat / parseDouble.

Would a PR adding these new methods to NumberInput be welcome, or would you need to see performance numbers first? I'm not planning to have anything in the library call them; although, I suspect there may be opportunities to improve JsonParser#getValueAsDouble and related in these regards.

from jackson-core.

cowtowncoder avatar cowtowncoder commented on June 21, 2024

Oh btw, forgot to say -- given that you do have high-performance streaming use case, this is probably quite relevant. So just to make sure: I am not against improvements like this. I just want to emphasize verifying improvements; in this case showing for your actual use case (replica thereof) would be absolutely awesome.

I actually have -- interestingly enough -- need for fast float parsing for Vector DB use case. So this is something I could quite likely benefit from as well, now that I think about it :)

from jackson-core.

devinrsmith avatar devinrsmith commented on June 21, 2024

I understand there is nuance in assuming "public" means "public API". Looking through some related comments though, rallyhealth/weePickle#117 (comment) @pjfanning seems to suggest that NumberInput is part of the "public API"?

I'm very familiar with https://github.com/wrandelshofer/FastDoubleParser (we use it as part of https://github.com/deephaven/deephaven-csv), it's great :D. Good point though, I can use it directly in the meantime.

from jackson-core.

cowtowncoder avatar cowtowncoder commented on June 21, 2024

Exactly, what @pjfanning said: primary user is Jackson itself, and anything that benefits it is potentially good idea. Making it more useful for other use cases is not a priority and could be considered counter-productive (due to added code to maintain, or to limit possibility of changes to API, implementation).

So, I don't consider NumberInput to be part of Public API.

And even more importantly: embedded/shaded FastDoubleParser is definitely not to be used by anyone directly (but acceptable via NumberInput I guess).

from jackson-core.

devinrsmith avatar devinrsmith commented on June 21, 2024

Thanks for the additional context.

Ultimately, I think I'll be making a case for additional methods on JsonParser, something like:

/**
 * Assumes {@link JsonToken#VALUE_STRING}, parses the string as a <b>float</b>.
 */
public float getStringAsFloat() throws IOException;

// ...
public double getStringAsDouble() throws IOException;

// ...
public BigInteger getStringAsBigInteger() throws IOException;

// ...
public BigDecimal getStringAsBigDecimal() throws IOException;

I would assume that getValueAsFloat/getValueAsDouble ID_STRING cases would delegate to respective impl.

Edit:
Probably getStringAsInt and getStringAsLong as well.

from jackson-core.

cowtowncoder avatar cowtowncoder commented on June 21, 2024

There are already getValueAsXxx() for int, long, double, fwtw. So naming should be followed.
Other than that I am quite ambivalent on whether supporting such use is good or not (from bigger picture perspective). But can definitely be discussed as part of proposal.

from jackson-core.

devinrsmith avatar devinrsmith commented on June 21, 2024

I've got some of my own quantitative numbers motivating this. I'm essentially seeing an uplift from 14mm doubles / second -> 20mm doubles / second when using ch.randelshofer.fastdoubleparser.JavaDoubleParser from getText to getTextCharacters, and an order of magnitude less GC (count, total reclaimed, total GC time). This is parsing through an array of a billion strings that are Math.random() doubles. Java 21, -Xmx1g -XX:CompileCommand=inline,java/lang/String.charAt (of course, this is on my machine with some overhead from our engine, and only representative of this particularly setup, etc):

getText

getTextCharacters

from jackson-core.

cowtowncoder avatar cowtowncoder commented on June 21, 2024

@devinrsmith I am 100% in favor of ensuring that getDoubleValue() on JsonToken.VALUE_NUMBER_FLOAT is optimal. So if and when we are missing method in NumberInput to support that, I'd be in favor of adding a method or methods as necessary.

(for 2.18 branch once we get there)

from jackson-core.

cowtowncoder avatar cowtowncoder commented on June 21, 2024

Ok. This is awkward: I was assuming getDoubleValue() and getFloatValue() would invoke efficient methods in FastDoubleParser (assuming fast processing enabled). But due to some refactoring in 2.16 or 2.15, this is no longer the case: I think deferred processing was applied too widely and as a result a String is constructed proactively and used from thereon. I don't think that is necessary: fast method should be called from ParserBase._parseSlowFloat() via TextBuffer _textBuffer methods contentsAsFloat() and contentsAsDouble().
(and looks like TextBuffer.contentsAsFloat() needs to be rewritten to actually use buffered content, if any).

This should be fixed, but timeline is such that it's too late for 2.17 -- after rc1 I think there's too much risk for edge case handling to go awry (deferred number handling was added for a reason and although I hope unit test coverage is sufficient I am not confident).
So I'll mark this as 2.18 thing to look into.

from jackson-core.

cowtowncoder avatar cowtowncoder commented on June 21, 2024

With #1230 now merged, we could tackle this issue.

But looking at ParserBase, logic is rather muddy: deferral of decoding increase has essentially lead ALL FP access to go via intermediate String _numberString which is now accurate but sub-optimal.

Conceptually, there shouldn't be a problem with more eager access: if and when JsonParser.getDoubleValue() is called, for example, value can now be materialized as double -- we do have other accessors that do not force evaluation, specifically JsonParser.getNumberValueDeferred(), used for buffering.
But the trick is untangle methods the way they are done currently...

from jackson-core.

cowtowncoder avatar cowtowncoder commented on June 21, 2024

Re-created as #1284 to address performance issue; closing this one since I do not think new methods are needed, just better implementations of existing ones.

from jackson-core.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.