geofflangdale / simdcsv
A fast SIMD parser for CSV files
License: Apache License 2.0
Terminology: Selecting a subset of columns is called a projection.
Since extracting the indexes is expensive, you may want to extract only some of them and never commit the unneeded ones to memory. Say you want the 4th index and the 10th one, then the 14th and the 20th, and so forth. Skipping an index is cheap: it takes a single instruction (blsr).
So, conceptually, getting just the indexes you need is easy. Practically, there is a software engineering issue in getting the whole thing to work... but it can be done.
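The skipping idea above can be sketched roughly as follows. This is a minimal illustration, not code from simdcsv: the function name select_indexes, its parameters, and the assumption that a 64-bit mask has one set bit per separator in a 64-byte block are all hypothetical. It also assumes a GCC or Clang compiler (for __builtin_ctzll) and that wanted is sorted in strictly ascending order. The expression mask & (mask - 1) clears the lowest set bit, which is the operation blsr performs; compilers typically emit blsr for it when BMI1 is enabled.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of simdcsv).
// mask:   one set bit per separator found in a 64-byte block (assumption).
// base:   byte offset of that block within the input.
// wanted: 0-based ordinals of the separators to keep, strictly ascending.
// Returns the byte positions of the wanted separators only.
std::vector<uint32_t> select_indexes(uint64_t mask, uint32_t base,
                                     const std::vector<uint32_t>& wanted) {
    std::vector<uint32_t> out;
    std::size_t w = 0;     // next entry of `wanted` still to be satisfied
    uint32_t ordinal = 0;  // ordinal of the lowest set bit remaining in `mask`
    while (mask != 0 && w < wanted.size()) {
        if (ordinal == wanted[w]) {
            // Only a wanted index pays for the bit scan and the store.
            out.push_back(base + static_cast<uint32_t>(__builtin_ctzll(mask)));
            ++w;
        }
        mask &= mask - 1;  // clear the lowest set bit (what blsr does)
        ++ordinal;
    }
    return out;
}
```

An unwanted index costs only the bit-clearing step; the bit scan and the write to memory happen for the selected indexes alone. A real projection would also have to carry the running ordinal across blocks and rows, which this sketch leaves out.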
I have two different instances, and both CPUs support AVX2. I expected a shared library built on one instance to work on the other, but that is not the case. I narrowed the issue down to the -march=native flag: after replacing it with -mavx2 -mpclmul, the library works on both instances. Just a note for anyone who runs into the same issue.
It happens when parsing any CSV file, especially when the file contains only one line.
Any chance you could switch to a more permissive license like MIT?
Thanks for your hard work on this!
I've proposed a PR (#10) that allows compiling with MSVC.
However, the performance is awful (0.65 GB/s instead of 11 GB/s on the same machine, with MSVC Clang).
It would be great if anybody could look into that.