hashdeps / arrow-rs

This project forked from apache/arrow-rs

HASH uses Apache Arrow within hEngine for in-memory columnar data representation and zero-copy reads

Home Page: https://hash.ai/platform/engine

License: Apache License 2.0

Shell 2.82% Ruby 2.80% Batchfile 0.30% Python 8.73% Dockerfile 0.57% Jinja 0.20% R 0.01% Makefile 0.09% Rust 84.48%
rust arrow engine apache-arrow

arrow-rs's Introduction

Native Rust implementation of Apache Arrow

Welcome to the implementation of Arrow, the popular in-memory columnar format, in Rust.

This part of the Arrow project is divided into 5 main components:

Crate          Description                                                          Documentation
Arrow          Core functionality (memory layout, arrays, low level computations)   (README)
Parquet        Parquet support                                                      (README)
Arrow-flight   Arrow data between processes                                         (README)
DataFusion     In-memory query engine with SQL support                              (README)
Ballista       Distributed query execution                                          (README)

Independently, they support a vast array of functionality for in-memory computations.

Together, they allow users to write an SQL query or a DataFrame (using the datafusion crate), run it against a Parquet file (using the parquet crate), evaluate it in-memory using Arrow's columnar format (using the arrow crate), and send the results to another process (using the arrow-flight crate).

Generally speaking, the arrow crate offers functionality to develop code that uses Arrow arrays, and datafusion offers most operations typically found in SQL, with the notable exceptions of:

  • join
  • window functions

There are too many features to enumerate here, but some notable mentions:

  • Arrow implements all formats in the specification except certain dictionaries
  • Arrow supports SIMD for some of its vertical operations
  • DataFusion supports async execution
  • DataFusion supports user-defined functions, aggregates, and whole execution nodes

You can find more details about each crate in their respective READMEs.
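To make the "in-memory columnar format" mentioned above more concrete, here is a minimal sketch of a nullable 32-bit integer column: one contiguous value buffer plus a validity bitmap in which bit i (least-significant bit first) marks slot i as non-null. The type `Int32Column` and its methods are hypothetical names chosen for illustration; this is not the arrow crate's API, and it ignores details such as buffer alignment and padding.

```rust
/// Hypothetical, simplified nullable Int32 column (NOT the arrow crate's API).
pub struct Int32Column {
    /// Contiguous value buffer; slots that are null hold a placeholder 0.
    values: Vec<i32>,
    /// Validity bitmap: bit `i` (LSB first within each byte) is 1 if slot `i` is non-null.
    validity: Vec<u8>,
    /// Number of logical slots in the column.
    len: usize,
}

impl Int32Column {
    /// Build a column from a slice of optional values.
    pub fn from_options(items: &[Option<i32>]) -> Self {
        let mut values = Vec::with_capacity(items.len());
        let mut validity = vec![0u8; (items.len() + 7) / 8];
        for (i, item) in items.iter().enumerate() {
            match item {
                Some(v) => {
                    values.push(*v);
                    validity[i / 8] |= 1u8 << (i % 8);
                }
                None => values.push(0),
            }
        }
        Self { values, validity, len: items.len() }
    }

    /// Number of slots, including nulls.
    pub fn len(&self) -> usize {
        self.len
    }

    /// True if slot `i` exists and is non-null.
    pub fn is_valid(&self, i: usize) -> bool {
        i < self.len && (self.validity[i / 8] & (1u8 << (i % 8))) != 0
    }

    /// Value at slot `i`, or `None` if the slot is null or out of range.
    pub fn get(&self, i: usize) -> Option<i32> {
        if self.is_valid(i) { Some(self.values[i]) } else { None }
    }

    /// Number of null slots.
    pub fn null_count(&self) -> usize {
        (0..self.len).filter(|&i| !self.is_valid(i)).count()
    }

    /// Sum of all non-null values, widened to i64.
    pub fn sum(&self) -> i64 {
        (0..self.len)
            .filter(|&i| self.is_valid(i))
            .map(|i| self.values[i] as i64)
            .sum()
    }
}
```

Keeping values and validity in separate flat buffers is what lets a whole column be scanned or shared without per-element pointer chasing; the real crates build considerably more machinery on the same idea.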

Arrow Rust Community

We use the official ASF Slack for informal discussions and coordination. This is a great place to meet other contributors and get guidance on where to contribute. Join us in the arrow-rust channel.

We use ASF JIRA as the system of record for new features and bug fixes and this plays a critical role in the release process.

For design discussions we generally collaborate on Google documents and file a JIRA linking to the document.

There is also a bi-weekly Rust-specific sync call for the Arrow Rust community. This is hosted on Google Meet at https://meet.google.com/ctp-yujs-aee on alternate Wednesdays at 09:00 US/Pacific (12:00 US/Eastern). During US daylight saving time this corresponds to 16:00 UTC; at other times it is 17:00 UTC.

Developer's guide to Arrow Rust

How to compile

This is a standard cargo project with workspaces. To build it, you need to have rust and cargo installed:

cd /rust && cargo build

You can also use rust's official docker image:

docker run --rm -v $(pwd)/rust:/rust -it rust /bin/bash -c "cd /rust && cargo build"

The command above assumes that you are in the root directory of the project, not in the same directory as this README.md.

You can also compile specific workspaces:

cd /rust/arrow && cargo build

Git Submodules

Before running tests and examples, it is necessary to set up the local development environment.

The tests rely on test data that is contained in git submodules.

To pull down this data run the following:

git submodule update --init

This populates data in two git submodules: parquet-testing and testing.

By default, cargo test will look for these directories at their standard location. The following environment variables can be used to override the location:

# Optionally specify a different location for test data
export PARQUET_TEST_DATA=$(cd ../parquet-testing/data; pwd)
export ARROW_TEST_DATA=$(cd ../testing/data; pwd)
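The override described above can be pictured as "use the environment variable if it is set, otherwise fall back to the standard location". The sketch below illustrates that lookup using only the standard library. The functions `resolve_dir` and `test_data_dir` are hypothetical helpers written for this example, and the default path passed in is an assumption; the crates' actual test utilities may behave differently.

```rust
use std::env;
use std::path::PathBuf;

/// Hypothetical helper: pick the override when present and non-empty,
/// otherwise fall back to the default directory.
pub fn resolve_dir(override_dir: Option<String>, default_dir: &str) -> PathBuf {
    match override_dir {
        Some(dir) if !dir.trim().is_empty() => PathBuf::from(dir),
        _ => PathBuf::from(default_dir),
    }
}

/// Hypothetical helper: read the override from an environment variable
/// such as ARROW_TEST_DATA or PARQUET_TEST_DATA.
pub fn test_data_dir(var_name: &str, default_dir: &str) -> PathBuf {
    // `env::var` returns Err when the variable is unset or not valid Unicode;
    // both cases fall back to the default here.
    resolve_dir(env::var(var_name).ok(), default_dir)
}
```

For example, `test_data_dir("ARROW_TEST_DATA", "../testing/data")` would return the exported path when the variable is set and the default otherwise.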

From here on, this is a pure Rust project and cargo can be used to run tests, benchmarks, docs and examples as usual.

Running the tests

Run tests using the Rust standard cargo test command:

# run all tests.
cargo test


# run only tests for the arrow crate
cargo test -p arrow

Code Formatting

Our CI uses rustfmt to check code formatting. Before submitting a PR be sure to run the following and check for lint issues:

cargo +stable fmt --all -- --check

Clippy Lints

We recommend using clippy for checking lints during development. While we do not yet enforce clippy checks, we recommend not introducing new clippy errors or warnings.

Run the following to check for clippy lints:

cargo clippy

If you use Visual Studio Code with the rust-analyzer plugin, you can enable clippy to run each time you save a file. See https://users.rust-lang.org/t/how-to-use-clippy-in-vs-code-with-rust-analyzer/41881.

One of the concerns with clippy is that it often produces a lot of false positives, or that some recommendations may hurt readability. We do not have a policy of which lints are ignored, but if you disagree with a clippy lint, you may disable the lint and briefly justify it.

Search for allow(clippy:: in the codebase to identify lints that are ignored/allowed. We currently prefer ignoring lints on the lowest unit possible.

  • If you are introducing a line that returns a lint warning or error, you may disable the lint on that line.
  • If you have several lints on a function or module, you may disable the lint on the function or module.
  • If a lint is pervasive across multiple modules, you may disable it at the crate level.
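As an illustration of the scoping rules above, the following sketch disables a lint at statement, function, and module level, each with a brief justification. The functions and the module are hypothetical examples rather than code from this repository, and the lints named are only examples of real clippy lints that one might choose to allow.

```rust
/// Statement scope: the narrowest place to allow a lint.
pub fn sum_pairs(a: &[i32], b: &[i32]) -> i32 {
    let n = a.len().min(b.len());
    let mut total = 0;
    // Justification: two slices are indexed in lockstep, and the
    // index-based loop is kept deliberately for readability.
    #[allow(clippy::needless_range_loop)]
    for i in 0..n {
        total += a[i] * b[i];
    }
    total
}

/// Function scope: the attribute covers the whole function.
// Justification: the signature mirrors an external interface (assumed here).
#[allow(clippy::too_many_arguments)]
pub fn weighted_sum(a: i32, b: i32, c: i32, d: i32, e: i32, f: i32, g: i32, h: i32) -> i32 {
    a + 2 * b + 3 * c + 4 * d + 5 * e + 6 * f + 7 * g + 8 * h
}

/// Module scope: an inner attribute applies to everything in the module.
pub mod kernels {
    // Justification: short names follow the usual mathematical notation.
    #![allow(clippy::many_single_char_names)]

    pub fn dot3(a: [i64; 3], b: [i64; 3]) -> i64 {
        let x = a[0] * b[0];
        let y = a[1] * b[1];
        let z = a[2] * b[2];
        x + y + z
    }
}

// Crate scope: the same inner-attribute form, `#![allow(clippy::...)]`,
// placed at the very top of lib.rs or main.rs, applies to the whole crate.
```

Searching for allow(clippy:: would find all three attributes, and the adjacent comments record why each lint was disabled.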

Git Pre-Commit Hook

We can use a git pre-commit hook to automate various pre-commit checks and formatting.

Suppose you are in the root directory of the project.

First check if the file already exists:

ls -l .git/hooks/pre-commit

If the file already exists, check the link target or the file content before changing anything, to avoid overwriting it by mistake. If it does not exist, create a symbolic link from .git/hooks/pre-commit to pre-commit.sh:

ln -s ../../rust/pre-commit.sh .git/hooks/pre-commit

If you occasionally want to commit without running the checks, run git commit with --no-verify:

git commit --no-verify -m "... commit message ..."
