s2t2 / openai-embeddings-2023

Classifying users on social media, using text embeddings from OpenAI and others

Home Page: https://s2t2.github.io/openai-embeddings-2023/

Languages: HTML 99.75%, Jupyter Notebook 0.24%, Python 0.01%
Topics: classification, dimensionality-reduction, machine-learning, twitter-dataset, bot-classification, sentiment-analysis, toxicity-classification

openai-embeddings-2023's Introduction

openai-embeddings-2023

OpenAI Text Embeddings for User Classification in Social Networks

Setup

Virtual Environment

Create and/or activate virtual environment:

conda create -n openai-env python=3.10
conda activate openai-env

Install package dependencies:

pip install -r requirements.txt

OpenAI API

Obtain an OpenAI API Key (to be set as the OPENAI_API_KEY environment variable). We initially fetched embeddings from the OpenAI API via the notebooks, but the service code has since been re-implemented in this repo, in case you want to experiment with obtaining your own embeddings.
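
For reference, here is a minimal sketch of fetching an embedding directly with the openai Python package (v1+); the model name and example text are illustrative, and the repo's own service code lives in app.openai_service:

# a minimal sketch, assuming the openai package (v1+) is installed
# and OPENAI_API_KEY is set in the environment
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.embeddings.create(
    model="text-embedding-ada-002",
    input=["example tweet text to embed"],
)
embedding = response.data[0].embedding  # a list of 1536 floats for ada-002
print(len(embedding))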

Users Sample

Obtain a copy of the "botometer_sample_openai_tweet_embeddings_20230724.csv.gz" CSV file, and store it in the "data/text-embedding-ada-002" directory in this repo. This file was generated by the notebooks, and is excluded from version control because it contains user identifiers.
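
A minimal sketch of loading this file with pandas (the exact columns are not documented here, so inspect them after loading):

# a minimal sketch, assuming pandas is installed
import pandas as pd

csv_filepath = "data/text-embedding-ada-002/botometer_sample_openai_tweet_embeddings_20230724.csv.gz"
df = pd.read_csv(csv_filepath, compression="gzip")
print(df.shape)
print(df.columns.tolist())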

Cloud Storage

We are saving trained models to Google Cloud Storage. You will need to create a project on Google Cloud and enable the Cloud Storage API. Then create a service account, download its JSON credentials file, and store it in the root directory of this repo as "google-credentials.json". This file is excluded from version control.

From the Cloud Storage console, create a new bucket, and note its name (to be set as the BUCKET_NAME environment variable).
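
A hedged sketch of saving and loading a trained model via the bucket, using the google-cloud-storage package (the blob path and local filenames are illustrative assumptions):

# a minimal sketch, assuming google-cloud-storage and joblib are installed, and
# GOOGLE_APPLICATION_CREDENTIALS and BUCKET_NAME are set in the environment
import os
from google.cloud import storage

client = storage.Client()  # authenticates via GOOGLE_APPLICATION_CREDENTIALS
bucket = client.bucket(os.getenv("BUCKET_NAME"))

# upload a locally saved model (illustrative path)
blob = bucket.blob("models/logistic_regression/model.joblib")
blob.upload_from_filename("results/model.joblib")

# ... and later, download it back
blob.download_to_filename("downloaded_model.joblib")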

Environment Variables

Create a local ".env" file and add contents like the following:

# this is the ".env" file...

OPENAI_API_KEY="sk__________"

GOOGLE_APPLICATION_CREDENTIALS="/path/to/openai-embeddings-2023/google-credentials.json"
BUCKET_NAME="my-bucket"

DATASET_ADDRESS="my_project.my_dataset"
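
The scripts presumably read these values at runtime; here is a minimal sketch of loading them with python-dotenv (an assumption about how the config is loaded, not necessarily the repo's exact approach):

# a minimal sketch, assuming python-dotenv is installed
import os
from dotenv import load_dotenv

load_dotenv()  # reads the local ".env" file

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
BUCKET_NAME = os.getenv("BUCKET_NAME", "my-bucket")
DATASET_ADDRESS = os.getenv("DATASET_ADDRESS", "my_project.my_dataset")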

Usage

OpenAI Service

Fetch some example embeddings from the OpenAI API:

python -m app.openai_service

Embeddings per User (v1)

Demonstrate ability to load the dataset:

python -m app.dataset

Perform machine learning and other analyses on the data:

OpenAI Embeddings:

Word2Vec Embeddings:

Embeddings per Tweet (v1)

OpenAI Embeddings:

Testing

pytest --disable-warnings

openai-embeddings-2023's People

Contributors: s2t2

openai-embeddings-2023's Issues

Toxicity and News Quality

Perform classification on additional targets: binarized versions of the toxicity and news quality scores.
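
A hedged sketch of what the binarization might look like (the column names and the 0.5 threshold are illustrative assumptions, not the project's chosen cutoffs):

# a minimal sketch, assuming a pandas DataFrame with continuous score columns;
# column names and thresholds are illustrative assumptions
import pandas as pd

def binarize(scores: pd.Series, threshold: float = 0.5) -> pd.Series:
    """Convert continuous scores to 0/1 labels at a given cutoff."""
    return (scores >= threshold).astype(int)

# df = pd.read_csv("data/users_sample.csv")  # hypothetical file with "avg_toxicity" and "avg_news_quality" columns
# df["is_toxic"] = binarize(df["avg_toxicity"])
# df["is_high_quality_news"] = binarize(df["avg_news_quality"])  # label direction depends on the score's scale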

Single Results File

Loop through all of the classification results JSON files (including the reduced-dimensionality classification results) and generate a single CSV file of the results (see the sketch after the column list below).

The CSV results file should have the following columns:

  • Dataset (OpenAI Embeddings, PCA-2, PCA-3, TSNE-2, TSNE-3, UMAP-2, UMAP-3, etc.)
  • Classification Target (Bot Status, Opinion Community, Toxicity, Fourway Label, etc.)
  • Classification Method (Logistic Regression, Random Forest, XGBoost)
  • Accuracy Score
  • ROC AUC Score
  • F1 Macro Score
  • F1 Weighted Score
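
A hedged sketch of the aggregation script (the results directory layout and JSON key names are assumptions; adapt them to the actual files):

# a minimal sketch, assuming results are stored as JSON files under a "results/" directory
# and that each file contains the listed metrics; paths and key names are assumptions
import json
from glob import glob
import pandas as pd

records = []
for json_filepath in glob("results/**/*.json", recursive=True):
    with open(json_filepath) as f:
        result = json.load(f)
    records.append({
        "dataset": result.get("dataset"),    # e.g. "OpenAI Embeddings", "PCA-2", "UMAP-3"
        "target": result.get("target"),      # e.g. "Bot Status", "Opinion Community"
        "method": result.get("method"),      # e.g. "Logistic Regression", "XGBoost"
        "accuracy": result.get("accuracy"),
        "roc_auc": result.get("roc_auc"),
        "f1_macro": result.get("f1_macro"),
        "f1_weighted": result.get("f1_weighted"),
    })

pd.DataFrame(records).to_csv("results/all_results.csv", index=False)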

Applying Trained Models to other Datasets

Goal: use the models we trained on the OpenAI text embeddings to perform user classification on other datasets of political discussion on Twitter.

Datasets: use the combined Election 2020 + Transition 2021 dataset in the shared environment ("election_2020_transition_2021_combined").

Models: stored on Google Cloud Storage (see the example notebook for how to load them).

Success criteria: store the scores back in Google BigQuery (in new tables in that combined dataset).

Process / Steps:

  1. Assemble a table of user timeline tweet texts, with a row per user and all of their tweet texts concatenated together. See the notebooks for an example approach (FYI: consider all of notebooks 1-3 and try to consolidate them into a single process as applicable). Whether we take only the unique tweets for each user or leave duplicates in remains to be seen; first let's query the dataset to see how common this is (for each user, how many times do they repeat a given tweet verbatim?). We will likely use all of a user's tweets (hopefully we sufficiently de-duped during the collection process, but we should check for duplicates before moving on). So now we have a table with user_id and tweet_texts columns. Note: this table will only have a sample of each user's tweets (for example, max 50 or maybe 100 tweets per user, selected at random). For the script, let's parameterize the tweet limit as an environment variable, perhaps called TWEETS_MAX or TWEETS_LIMIT, and let's also consider including this number in the name of the BigQuery table used to store the embeddings (like "openai_user_timeline_embeddings_max_50" or "openai_user_timeline_embeddings_max_100", using different tables for different limits). Let's start with 50 only for now, as it matches the approach we used when training the models. Note that for all subsequent tables derived from this data, we might want to include "_max_50" in the table names as well, to allow us to differentiate later.
  2. For each user in that new table, we will loop through them (maybe in batches) and obtain OpenAI text embeddings for each user's texts. Let's use the same ada text embedding model that we used when training the models, and leverage the existing code to obtain the embeddings. Specifically, when fetching the embeddings, we will "fetch in dynamic batches" to get around API limits. Let's store the embeddings in a separate table, in a way that lets us know which embeddings belong to which users (still a row per user). When we save the embeddings, let's save all ~1500 values into a single "embeddings" column with an array datatype (we may need to run BigQuery migrations first to set up that table structure). The script for obtaining embeddings should only attempt to obtain embeddings for users we haven't already processed, so at the top of the script we may want to first fetch only the list of users that don't currently have records in the embeddings table (which might require joining the user timeline texts table to the embeddings table). For this script, let's use an environment variable called something like USERS_LIMIT, for testing and developing with small batches of 5-10 users at a time.
  3. Demonstrate the ability to load the trained models from cloud storage. We can use the existing storage service. NOTE: Logistic Regression may be the most reliably loadable (there may be issues with the random forest models; we need to revisit how they were saved).
  4. Let's use the text embeddings as inputs to perform classification for each task (bot detection, opinion classification, toxicity, news quality). We may have separate tables for each task, or a column denoting which type of scores are being stored. We should also keep track of which model was used to produce the scores. We'll wind up with a table (or tables) of scores and probabilities for each user for each classification task (see the sketch after this list):
    + Bot Status
    + Opinion Community
    + Lang Toxicity
    + News Quality
    + Fourway label (multiclass bot status x opinion community)
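
A hedged sketch of steps 3 and 4, loading a trained model from cloud storage and scoring stored embeddings (the blob path, table names, and column names are illustrative assumptions; models are assumed to have been saved with joblib):

# a minimal sketch, assuming google-cloud-storage, google-cloud-bigquery,
# joblib, numpy, and scikit-learn are installed; all names are illustrative
import os
import joblib
import numpy as np
from google.cloud import bigquery, storage

# step 3: load a trained model (logistic regression) from cloud storage
storage_client = storage.Client()
bucket = storage_client.bucket(os.getenv("BUCKET_NAME"))
bucket.blob("models/bot_status/logistic_regression/model.joblib").download_to_filename("model.joblib")
model = joblib.load("model.joblib")

# step 4: fetch stored embeddings and score them
bq = bigquery.Client()
rows = bq.query("""
    SELECT user_id, embeddings
    FROM `my_project.election_2020_transition_2021_combined.openai_user_timeline_embeddings_max_50`
""").result()

user_ids, embeddings = [], []
for row in rows:
    user_ids.append(row["user_id"])
    embeddings.append(row["embeddings"])

X = np.array(embeddings)
preds = model.predict(X)
probas = model.predict_proba(X)[:, 1]  # positive-class probability (binary tasks)
# ... then write user_ids / preds / probas back to a new scores table in BigQuery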

Build from Docs Branch

Right now there are a lot of HTML files in the repo, and these files are linked to via the GitHub Pages deployment. However, they are increasing the repo size, so instead we would like to build GitHub Pages from a docs branch: keep the HTML files on the docs branch and ignore them on the main branch. Basically, we still want to see the HTML files on the website, just without them cluttering up the repo.

Election 2020 - More Embeddings Models

For next steps in the research, we would like to fetch embeddings using OpenAI's two newer models: text-embedding-3-small and text-embedding-3-large.

It would be great if we could revise our current approach with the election 2020 data, to use whichever model has been set as the MODEL_ID.

This means we need to operationalize the migration queries through the bq service, instead of running them manually via the README. We will need to construct the queries according to which model has been set as the MODEL_ID.

We can use a dictionary mapping of model IDs to corresponding table names (see the sketch below). Let's use different tables for the different models, because some models' embeddings have more dimensions than others, and there aren't that many models.
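
A hedged sketch of such a mapping and a parameterized migration query (the table names are illustrative assumptions; 1536 / 1536 / 3072 are the published embedding dimensions for these models):

# a minimal sketch, assuming google-cloud-bigquery is installed; in practice the
# migration would run through the existing bq service rather than manually
import os
from google.cloud import bigquery

EMBEDDINGS_TABLES = {
    "text-embedding-ada-002": {"table": "openai_embeddings_ada_002", "dims": 1536},
    "text-embedding-3-small": {"table": "openai_embeddings_3_small", "dims": 1536},
    "text-embedding-3-large": {"table": "openai_embeddings_3_large", "dims": 3072},
}

MODEL_ID = os.getenv("MODEL_ID", "text-embedding-3-small")
DATASET_ADDRESS = os.getenv("DATASET_ADDRESS", "my_project.my_dataset")

table_name = EMBEDDINGS_TABLES[MODEL_ID]["table"]
migration_sql = f"""
    CREATE TABLE IF NOT EXISTS `{DATASET_ADDRESS}.{table_name}` (
        user_id STRING,
        embeddings ARRAY<FLOAT64>
    )
"""
bigquery.Client().query(migration_sql).result()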
