feast-hive's Introduction

Feast Hive Support

Sorry, I have been very busy with other projects recently. I plan to come back and continue improving this one in about a month (counting from Apr 15th). Please create an issue if you run into any problems.

Hive is not included in the current Feast roadmap; this project aims to add Hive support for the Offline Store.
For more details, see this Feast issue.

The public releases have passed all integration tests. Please create an issue if you encounter any problems.

Change Logs

  • DONE [v0.1.1] First workable version released.
  • DONE [v0.1.2] Allow a custom Hive conf when connecting to HiveServer2
  • DONE [v0.14.0] Support Feast 0.14.x
  • DONE [v0.17.0] Support Feast 0.17.0
  • TODO Uploading entity_df currently uses INSERT INTO, which is somewhat inefficient. The next version will add extra parameters so that users who can provide an HDFS address can upload to HDFS instead.

Quickstart

Install feast

pip install feast

Install feast-hive

  • Install the stable version:
pip install feast-hive
  • Install the development version (not stable):
pip install git+https://github.com/baineng/feast-hive.git

Create a feature repository

feast init feature_repo
cd feature_repo

Edit feature_store.yaml

Set the offline_store type to feast_hive.HiveOfflineStore:

project: ...
registry: ...
provider: local
offline_store:
    type: feast_hive.HiveOfflineStore
    host: localhost
    port: 10000        # optional, default is `10000`
    database: default  # optional, default is `default`
    hive_conf:         # optional, hive conf overlay
      hive.join.cache.size: 14797
      hive.exec.max.dynamic.partitions: 779
    ... # other parameters
online_store:
    ...
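
Before running any Feast command against this configuration, it can help to confirm that something is listening at the configured host and port. The following is a minimal sketch using only the Python standard library; the helper name `hiveserver2_reachable` is hypothetical and not part of feast-hive, and a successful TCP connection only shows that the port is open, not that authentication or the chosen database will work.

```python
import socket


def hiveserver2_reachable(host: str = "localhost", port: int = 10000,
                          timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    The defaults mirror the host and port shown in feature_store.yaml.
    This is an illustrative helper, not a feast-hive API.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Calling `hiveserver2_reachable("localhost", 10000)` returns False when the connection is refused or times out, which is usually a sign that the host or port in feature_store.yaml needs adjusting.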

Create Hive Table

  1. Upload data/driver_stats.parquet to HDFS
hdfs dfs -copyFromLocal ./data/driver_stats.parquet /tmp/
  2. Create Hive Table
CREATE TABLE driver_stats (
    event_timestamp   bigint,
    driver_id         bigint,
    conv_rate         float,
    acc_rate          float,
    avg_daily_trips   int,
    created           bigint
)
STORED AS PARQUET;
  3. Load data into the table
LOAD DATA INPATH '/tmp/driver_stats.parquet' INTO TABLE driver_stats;

Edit example.py

# This is an example feature definition file

from google.protobuf.duration_pb2 import Duration

from feast import Entity, Feature, FeatureView, ValueType
from feast_hive import HiveSource

# Read data from a Hive table.
# Here we use a query so that the original parquet data can be reused,
# but you can replace it with your own table or query.
driver_hourly_stats = HiveSource(
    # table='driver_stats',
    query = """
    SELECT Timestamp(cast(event_timestamp / 1000000 as bigint)) AS event_timestamp, 
           driver_id, conv_rate, acc_rate, avg_daily_trips, 
           Timestamp(cast(created / 1000000 as bigint)) AS created 
    FROM driver_stats
    """,
    event_timestamp_column="event_timestamp",
    created_timestamp_column="created",
)

# Define an entity for the driver.
driver = Entity(name="driver_id", value_type=ValueType.INT64, description="driver id")

# Define FeatureView
driver_hourly_stats_view = FeatureView(
    name="driver_hourly_stats",
    entities=["driver_id"],
    ttl=Duration(seconds=86400 * 1),
    features=[
        Feature(name="conv_rate", dtype=ValueType.FLOAT),
        Feature(name="acc_rate", dtype=ValueType.FLOAT),
        Feature(name="avg_daily_trips", dtype=ValueType.INT64),
    ],
    online=True,
    batch_source=driver_hourly_stats,
    tags={},
)
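
The query above divides the bigint columns by 1000000 before converting them to timestamps, which suggests that the parquet file stores them as microseconds since the Unix epoch. A minimal Python sketch of the same arithmetic follows, assuming that interpretation and assuming UTC; the function name `micros_to_timestamp` is hypothetical, and the time zone Hive applies to the result depends on your Hive setup.

```python
from datetime import datetime, timezone

MICROS_PER_SECOND = 1_000_000


def micros_to_timestamp(micros: int) -> datetime:
    """Convert microseconds since the Unix epoch to a UTC datetime.

    Integer division drops the sub-second part, mirroring the
    cast(... / 1000000 as bigint) step in the query.
    """
    seconds = micros // MICROS_PER_SECOND
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# 1_600_000_000 seconds after the epoch is 2020-09-13 12:26:40 UTC.
print(micros_to_timestamp(1_600_000_000_000_000).isoformat())
```

If your own table already stores proper timestamp columns, this conversion is unnecessary and the `table=` argument can be used directly.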

Apply the feature definitions

feast apply

Generating training data and so on

The remaining steps are the same as in the Feast Quickstart.
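
For orientation, retrieving training data might look like the sketch below. It assumes a Feast release in the 0.14 to 0.17 range, where `FeatureStore.get_historical_features` accepts `entity_df` and `features`, and it assumes the feature view defined in example.py has been applied. The helper names `feature_refs` and `fetch_training_data`, the driver ids, and the timestamps are all hypothetical placeholders.

```python
from datetime import datetime
from typing import List


def feature_refs(view: str, names: List[str]) -> List[str]:
    """Build "<feature_view>:<feature>" reference strings."""
    return [f"{view}:{name}" for name in names]


def fetch_training_data(repo_path: str = "."):
    # Imported lazily so feature_refs() can be used without Feast installed.
    import pandas as pd
    from feast import FeatureStore

    # Hypothetical entity rows; use ids and timestamps present in your table.
    entity_df = pd.DataFrame(
        {
            "driver_id": [1001, 1002],
            "event_timestamp": [
                datetime(2021, 4, 12, 10, 59, 42),
                datetime(2021, 4, 12, 8, 12, 10),
            ],
        }
    )

    store = FeatureStore(repo_path=repo_path)
    job = store.get_historical_features(
        entity_df=entity_df,
        features=feature_refs(
            "driver_hourly_stats",
            ["conv_rate", "acc_rate", "avg_daily_trips"],
        ),
    )
    # Runs the point-in-time join in the offline store and returns a DataFrame.
    return job.to_df()
```

Running `fetch_training_data()` from inside the feature repository requires a reachable HiveServer2, because the entity rows are uploaded to Hive before the join.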

Developing and Testing

Developing

git clone https://github.com/baineng/feast-hive.git
cd feast-hive
# creating virtual env ...
pip install -e ".[dev]"

# before commit
make format
make lint

Testing

pip install -e ".[test]"
pytest -n 6 --host=localhost --port=10000 --database=default

feast-hive's People

Contributors

achals, bennfocus, diana-jinhyeon, felix-neko, felixwang9817, kyungwan-nam
