Feast is the leading open source feature store that automates the last mile in your production ML data pipelines. It allows data teams to serve features consistently for offline training and online inference.
- Offline Store: The offline store persists batch data that has been ingested into Feast. This data is used for producing training datasets. Feast does not manage the offline store directly but runs queries against it.
- Online Store: The online store is a database that stores only the latest feature values for each entity.
- Feast Registry: An object store (GCS, S3) based registry used to persist feature definitions that are registered with the feature store.
- SDK: Manages version-controlled feature definitions, materializes (loads) features into the online store, builds and retrieves training datasets, and retrieves online features.
- Feast UI: A web-based UI for browsing the feature definitions registered in the feature store.
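The online store's contract described above (keep only the latest feature values per entity) can be sketched in plain Python. This is a toy illustration of the idea, not the Feast API; the entity key and feature names are hypothetical.

```python
# Toy sketch of the online store idea: for each entity key, keep only
# the most recent feature values. Not the Feast API; names are made up.
from datetime import datetime

online_store = {}  # entity key -> (event_timestamp, feature values)

def write_features(entity_key, event_ts, features):
    """Upsert: overwrite only if this event is newer than what is stored."""
    current = online_store.get(entity_key)
    if current is None or event_ts > current[0]:
        online_store[entity_key] = (event_ts, features)

write_features("driver_1001", datetime(2024, 1, 1), {"conv_rate": 0.50})
write_features("driver_1001", datetime(2024, 1, 2), {"conv_rate": 0.75})   # newer wins
write_features("driver_1001", datetime(2023, 12, 31), {"conv_rate": 0.10})  # stale, ignored

print(online_store["driver_1001"][1])  # {'conv_rate': 0.75}
```

This last-write-wins-by-timestamp behavior is why the online store stays small and fast: it holds one row per entity, regardless of how much history the offline store accumulates.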
Initialize a minimal Feast repository named dev:
feast init -m dev
STEP 1: Create the data folder under dev/feature_repo
mkdir dev/feature_repo/data
STEP 2: Configure feature_store.yaml
# Initial Configuration File
project: feature_store
# Path to the registry [ object store (GCS, S3) ] where feature definitions will be stored by Feast.
registry: /path/to/registry.db
# Environment where data is stored.
provider: local
# The online store is a database that stores only the latest feature values for low latency inference.
online_store:
  path: /path/to/online_store.db
# The offline store persists batch data that has been ingested into Feast. This data is used for producing training datasets. For feature retrieval and materialization, Feast does not manage the offline store directly, but runs queries against it.
# offline_store:
# type: redshift
# cluster_id: [SET YOUR CLUSTER ID]
# region: us-west-2
# user: admin
# database: dev
# s3_staging_location: [SET YOUR BUCKET]
# iam_role: [SET YOUR ARN]
entity_key_serialization_version: 2
# Updated Configuration File
project: dev
registry: data/registry.db
provider: local
online_store:
  type: sqlite
  path: data/online_store.db
entity_key_serialization_version: 2
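A quick way to sanity-check the updated configuration above is to parse it and confirm the expected keys are present. This sketch assumes PyYAML is installed (pip install pyyaml); it only checks the file's structure, it does not validate it against Feast itself.

```python
# Parse the updated feature_store.yaml and check its structure.
# Assumes PyYAML is available; this is not Feast's own validation.
import yaml

config_text = """
project: dev
registry: data/registry.db
provider: local
online_store:
  type: sqlite
  path: data/online_store.db
entity_key_serialization_version: 2
"""

config = yaml.safe_load(config_text)
assert config["project"] == "dev"
assert config["online_store"]["type"] == "sqlite"
print("config keys:", sorted(config))
```

Once the file is in place, running `feast apply` from the repository root registers the feature definitions against this configuration.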
- Define the source of the features
- Define the entity for the feature schema
- Define the feature schema [ uses the entity and source defined above ]
- Define the feature service
- Define the entity DataFrame [ contains the target, entity keys, and event timestamps ]
- Get the historical features using the entity DataFrame [ feature retrieval ]
- Save the data, which will be used for model training
- Materialization [ loads the latest feature values into the online store for real-time inference ]
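The historical-retrieval step above is essentially a point-in-time join: for each row of the entity DataFrame, Feast picks the latest feature value known at or before that row's timestamp, avoiding label leakage. A minimal pandas sketch of that join (the driver_id and conv_rate columns are hypothetical, not from this repo):

```python
# Sketch of the point-in-time join behind historical feature retrieval:
# for each entity row, take the latest feature value at or before the
# entity timestamp. Pure pandas; column names are hypothetical.
import pandas as pd

feature_data = pd.DataFrame({
    "driver_id": [1001, 1001, 1002],
    "event_timestamp": pd.to_datetime(["2024-01-01", "2024-01-03", "2024-01-02"]),
    "conv_rate": [0.5, 0.7, 0.3],
})

entity_df = pd.DataFrame({
    "driver_id": [1001, 1002],
    "event_timestamp": pd.to_datetime(["2024-01-02", "2024-01-02"]),
    "trip_completed": [1, 0],  # the training target
})

training_df = pd.merge_asof(
    entity_df.sort_values("event_timestamp"),
    feature_data.sort_values("event_timestamp"),
    on="event_timestamp",
    by="driver_id",
    direction="backward",  # only use values known at that point in time
)
print(training_df)
```

Note that driver 1001's feature row from 2024-01-03 is ignored even though it exists in the source, because it lies after the entity timestamp; this is the leakage protection that motivates retrieving training data through the feature store rather than joining raw tables naively.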
- Getting started with Feast, an open source feature store running on AWS Managed Services: Blog Link