Comments (7)
The Mison paper claims to extract "top-level fields". I don't know what a top-level field is... but I am pretty sure it is a bit like my idea of extracting tabular data out of a JSON document.
from simdjson.
We'd like some level of abstraction so we don't work directly on the tape... maybe a zero-cost abstraction.
from simdjson.
Looks like people are happy to just measure the "parsing time": https://github.com/chadaustin/sajson/blob/master/benchmark/benchmark.cpp
That's not entirely satisfying, though we can play this game as well...
from simdjson.
The 'parsing time' is not entirely satisfactory as a benchmark outcome in situations where we want to do something like the Mison use case. However, I'm not entirely convinced at this stage that this partial parsing use case is well-defined. The fundamental flaw of papers analysing such queries is that they tend towards the "ask yourself a question and answer it" school of thought. Parsing everything down to the level that the zero-cost abstraction you describe can traverse it is a known outcome.
It might be a bit churlish, but one approach is to try to achieve a full parse faster than Mison (or similar systems) can answer queries; this isn't a sustainable position to hold in the long run, but might be briefly amusing.
I agree that working directly with the tapes is awkward, but once number and string parsing are properly in place I think you'll find that post-stage 4 they are largely a friendly data structure fundamentally, even if they cosmetically resemble the droppings of an ape.
They don't contain 'luxury' pointers (i.e. 'up' pointers) but I think it's possible to do most traversals on them. I think a shallow layer on top of them (the "zero-cost abstraction" idea) should do the trick.
from simdjson.
> Parsing everything down to the level that the zero-cost abstraction you describe can traverse it is a known outcome.
Fair enough.
> I agree that working directly with the tapes is awkward, but once number and string parsing are properly in place I think you'll find that post-stage 4 they are largely a friendly data structure fundamentally, even if they cosmetically resemble the droppings of an ape.
I agree that it is a nice data structure for this problem. My thinking is more that we need some convenience functions so that the queries don't unnecessarily depend on the underlying data structure. I imagine that something like 20 or 30 lines of code would be enough to have a decent wrapper so that a programmer with little knowledge of the tapes can be productive.
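To make the wrapper idea concrete, here is a rough sketch of what such a thin layer might look like. Everything in it is an assumption for illustration: the tape layout is a simplified, hypothetical one (not simdjson's actual format), and `TapeEntry`, `Tape`, `Cursor`, `find_key` and `make_example_tape` are made-up names. The one design point it tries to show is that if each opening brace stores the index just past its matching close, a subtree can be skipped without any 'up' pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified tape entry (NOT simdjson's real layout).
// type: '{' '}' '[' ']' for scopes, '"' for a string, 'l' for an integer.
// payload: for '{' and '[', the index one past the matching close;
//          for '"', an index into Tape::strings; for 'l', the integer value.
struct TapeEntry {
  char type;
  std::uint64_t payload;
};

struct Tape {
  std::vector<TapeEntry> entries;
  std::vector<std::string> strings;
};

// Thin cursor: a tape pointer plus a position, nothing else.
class Cursor {
public:
  Cursor(const Tape &tape, std::size_t pos) : tape_(&tape), pos_(pos) {}

  char type() const { return tape_->entries[pos_].type; }
  bool is_object() const { return type() == '{'; }

  const std::string &get_string() const {
    assert(type() == '"');
    return tape_->strings[static_cast<std::size_t>(tape_->entries[pos_].payload)];
  }

  std::int64_t get_integer() const {
    assert(type() == 'l');
    return static_cast<std::int64_t>(tape_->entries[pos_].payload);
  }

  // Index of the entry that follows this value; scopes are skipped in O(1).
  std::size_t next() const {
    if (type() == '{' || type() == '[') {
      return static_cast<std::size_t>(tape_->entries[pos_].payload);
    }
    return pos_ + 1;
  }

  // Linear search over the keys of an object; on success `out` points at the value.
  bool find_key(const std::string &key, Cursor &out) const {
    assert(is_object());
    std::size_t i = pos_ + 1;
    while (tape_->entries[i].type != '}') {
      Cursor name(*tape_, i);
      Cursor value(*tape_, i + 1);
      if (name.get_string() == key) {
        out = value;
        return true;
      }
      i = value.next();  // jump over the whole value, nested or not
    }
    return false;
  }

private:
  const Tape *tape_;
  std::size_t pos_;
};

// Hand-built tape for {"user":{"id":7},"text":"hi"}.
Tape make_example_tape() {
  Tape t;
  t.strings = {"user", "id", "text", "hi"};
  t.entries = {
      {'{', 9}, {'"', 0}, {'{', 6}, {'"', 1}, {'l', 7},
      {'}', 2}, {'"', 2}, {'"', 3}, {'}', 0},
  };
  return t;
}
```

With this, a caller would write something like `root.find_key("user", user)` followed by `user.find_key("id", id)` and never touch the entries directly. If the tape layout changes, only the cursor has to change.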
from simdjson.
Completely agreed. Definitely well worth doing. I had deferred thinking about this because the structure of the tapes has been constantly changing, but it's worth getting it done. Especially because in my experience it's only when you actually try to write real code that you discover that it's harder than you thought!
from simdjson.
The following has been implemented:
SELECT DISTINCT "user.id" FROM tweets;
We now have a prototypical API so I am going to close this.
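For readers who want to see what the query above means operationally, here is a minimal sketch. It does not use the simdjson API and is not the implementation referred to in this comment. It runs over a toy in-memory tree, and `Value`, `make_integer`, `make_object`, `follow_path`, `select_distinct` and `example_tweets` are all hypothetical names. The assumption is that `user.id` is a dotted path into each tweet object and that ids are integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy JSON value: either an integer or an object (parallel keys/children).
struct Value {
  bool is_integer = false;
  std::int64_t integer = 0;
  std::vector<std::string> keys;
  std::vector<Value> children;
};

Value make_integer(std::int64_t v) {
  Value x;
  x.is_integer = true;
  x.integer = v;
  return x;
}

Value make_object(std::vector<std::pair<std::string, Value>> fields) {
  Value x;
  for (auto &f : fields) {
    x.keys.push_back(f.first);
    x.children.push_back(std::move(f.second));
  }
  return x;
}

// Look up one key in an object; nullptr if absent.
const Value *find(const Value &obj, const std::string &key) {
  for (std::size_t i = 0; i < obj.keys.size(); ++i) {
    if (obj.keys[i] == key) return &obj.children[i];
  }
  return nullptr;
}

// Follow a dotted path such as "user.id"; nullptr if any step is missing.
const Value *follow_path(const Value &root, const std::string &path) {
  assert(!path.empty());
  const Value *cur = &root;
  std::size_t start = 0;
  while (cur != nullptr) {
    std::size_t dot = path.find('.', start);
    std::size_t len = (dot == std::string::npos) ? std::string::npos : dot - start;
    cur = find(*cur, path.substr(start, len));
    if (dot == std::string::npos) break;
    start = dot + 1;
  }
  return cur;
}

// SELECT DISTINCT <path> FROM rows, restricted to integer results.
std::set<std::int64_t> select_distinct(const std::vector<Value> &rows,
                                       const std::string &path) {
  std::set<std::int64_t> out;
  for (const Value &row : rows) {
    const Value *v = follow_path(row, path);
    if (v != nullptr && v->is_integer) out.insert(v->integer);
  }
  return out;
}

// Four made-up tweets: two share a user, one has no "user" field at all.
std::vector<Value> example_tweets() {
  return {
      make_object({{"id", make_integer(100)},
                   {"user", make_object({{"id", make_integer(1)}})}}),
      make_object({{"id", make_integer(101)},
                   {"user", make_object({{"id", make_integer(2)}})}}),
      make_object({{"id", make_integer(102)},
                   {"user", make_object({{"id", make_integer(1)}})}}),
      make_object({{"id", make_integer(103)}}),
  };
}
```

Calling `select_distinct(example_tweets(), "user.id")` yields the set of user ids, ignoring both the duplicate user and the tweet-level `id` fields. Rows where the path does not resolve are skipped rather than treated as errors, which is one possible choice among several.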
from simdjson.