Git Product home page Git Product logo

alloydb-pgvector's Introduction

alloydb-pgvector

Use Case: Vector Search for Patents Data (using Embeddings for Text Data)

User wants to search for a specific text in patent literature stored in AlloyDB, meaning-driven, not just keyword-driven. The app data is stored in AlloyDB, uses pgvector extension for embeddings and Cosine Similarity for Vector Search. The app is deployed as a Cloud Function endpoint.

What is AlloyDB ?

AlloyDB is a fully managed, PostgreSQL-compatible database service that offers enterprise-grade performance, availability, and scale. AlloyDB offers a number of features that make it a good choice for demanding transactional and analytical workloads, including:

High performance:

AlloyDB is designed to be highly performant for both transactional and analytical workloads. It uses a number of techniques to achieve this, including a shared-nothing architecture, columnar storage, and vectorized execution.

High availability:

AlloyDB offers a 99.99% uptime SLA, inclusive of maintenance. It uses a number of techniques to achieve this, including synchronous replication, automatic failover, and hot backups.

Scale:

AlloyDB can scale to meet the needs of even the most demanding applications. It can be scaled up to support up to 128 vCPUs and 1 TB of memory per instance, and it can be scaled out to support up to 100 instances per cluster.

Security:

AlloyDB is designed with security in mind. It uses a number of features to protect your data, including encryption, auditing, and access control.

Compliance:

AlloyDB is designed to meet the needs of organisations with demanding compliance requirements. It supports a number of compliance standards, including HIPAA, PCI DSS, and SOC 2.

What is Cloud Function ?

Cloud Function is a lightweight compute solution for developers to create single-purpose, stand-alone functions that respond to Cloud events without needing to manage a server or runtime environment.

Demo Steps

  1. Create the AlloyDB cluster, instance, database and table

CREATE TABLE patents_data ( id VARCHAR(25), type VARCHAR(25), number VARCHAR(20), country VARCHAR(2), date VARCHAR(20), abstract VARCHAR(300000), title VARCHAR(100000), kind VARCHAR(5), num_claims BIGINT, filename VARCHAR(100), withdrawn BIGINT) ;

  1. Create Extensions:

    CREATE EXTENSION vector;

    CREATE EXTENSION google_ml_integration;

  2. Grant Permission:

GRANT EXECUTE ON FUNCTION embedding TO postgres;

  1. Alter the table to add a Vector column for storing the Embeddings:

    alter table patents_data ADD column abstract_embeddings vector(768);

  2. Insert DML:

    Refer to the file "insert_scripts.sql" in this repo

  3. Grant Vertex AI User ROLE to the AlloyDB service account PRINCIPAL:

    You can do this from IAM on Google Cloud Console

  4. Test if embedding function works on a test string:

SELECT embedding( 'textembedding-gecko@001', 'AlloyDB is a managed, cloud-hosted SQL database service.');

  1. Update the abstract_embeddings Vector field:

    update patents_data set abstract_embeddings = embedding( 'textembedding-gecko@001', abstract);

  2. Query to check if Cosine Similarity works (Vector Search by Text):

SELECT * FROM patents_data ORDER BY abstract_embeddings <=> embedding('textembedding-gecko@001', 'A new Natural Language Processing related Machine Learning Model')::vector LIMIT 10;

  1. Create Cloud Functions, deploy and test it on the web!

alloydb-pgvector's People

Contributors

abiramisukumaran avatar

Stargazers

 avatar  avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.