mintplex-labs / vector-admin

The universal tool suite for vector database management. Manage Pinecone, Chroma, Qdrant, Weaviate and more vector databases with ease.

Home Page: https://vectoradmin.com/

License: MIT License

JavaScript 44.68% Shell 0.19% Dockerfile 0.40% Python 1.30% HTML 0.17% TypeScript 53.05% CSS 0.21%

ai aitools chroma database-management document-retrieval embeddings langchain llms pinecone vector-data-management

vector-admin's Introduction

The universal UI and tool suite for managing vector databases at scale.

Docs | Hosted Instance

Quick! Can you tell me exactly what information is embedded in your Pinecone or Chroma vector database? I bet you can't. While those teams focus on building the underlying architecture, we made it easier for you to manage vector data without the headaches and API calls.

We call it VectorAdmin and we want to be the best universal GUI for vector database management.


Product Overview

VectorAdmin aims to be a full-stack application that gives you total control over your otherwise unwieldy vector data, whether you embed it via an API or with tools like LangChain, which don't show you what you just saved into your database.

VectorAdmin is a fully capable multi-user product that you can run locally via Docker or host remotely, and it can manage multiple vector databases at once.

VectorAdmin is more than a single tool. It is a suite of tools that makes interacting with and understanding vectorized text easy, without compromising on the controls you would expect from a traditional database management system.

Some cool features of VectorAdmin

Multi-user instance support and oversight
Atomically view, update, and delete singular text chunks of embeddings.
Copy entire documents or even whole namespaces and embeddings without paying to re-embed.
Upload & embed new documents directly into the vector database.
Migrate an entire existing vector database to another type or instance. (still in progress)
Manage multiple concurrent vector databases at once.
Permission data and access to data
100% Cloud deployment ready.
Automated regression tests that run as namespaces or collections are updated with new documents to ensure response quality. (still in progress)
Full API, JavaScript, and Python standalone clients and LangChain integration. (still in progress)
Extremely efficient cost-saving measures for managing very large documents. You'll never pay to embed a massive document or transcript more than once.

Technical Overview

This monorepo consists of the following main sections:

document-processor: Flask app to digest, parse, and embed documents easily.
frontend: A viteJS + React frontend that you can run to easily create and manage all your content.
backend: A nodeJS + express server to handle all the interactions and do all the vectorDB management.
workers: An InngestJS instance to handle long-running processes and background tasks for snappy performance.
docker: Run this entire architecture in a single command as a Docker instance (recommended).

Requirements

yarn and node on your machine
python 3.9+ for running scripts in document-processor/.
access to an OpenAI API key if planning to update embeddings or upload new documents.
a Pinecone.io free account or a running ChromaDB instance.

How to get started (Docker - simple setup)

Get up and running in minutes with Docker

How to get started (Development environment)

The below instructions will not work on Windows.

yarn dev:setup from the project root directory.
cd document-processor && python3.9 -m virtualenv v-env && source v-env/bin/activate && pip install -r requirements.txt

In separate terminal windows from project root:

yarn prisma:setup to create DB migration and client and then run yarn dev:server
yarn dev:frontend
yarn dev:workers
cd document-processor && flask run --host '0.0.0.0' --port 8888

On first boot and visiting of the homepage, you will be automatically redirected to create your primary admin account, organization, and database connection.

Contributing

create issue
create PR with branch name format of <issue number>-<short name>
yee haw let's merge

Telemetry

VectorAdmin by Mintplex Labs Inc contains a telemetry feature that collects anonymous usage information.

Why?

We use this information to help us understand how VectorAdmin is used, to help us prioritize work on new features and bug fixes, and to help us improve VectorAdmin's performance and stability.

Opting out

Set DISABLE_TELEMETRY in your server or docker .env settings to "true" to opt out of telemetry.

DISABLE_TELEMETRY="true"

What do you explicitly track?

We will only track usage details that help us make product and roadmap decisions, specifically:

Server is started or booted up.
Version of your installation.
Type of job when executed.

You can verify these claims by finding all locations Telemetry.sendTelemetry is called. Additionally these events are written to the output log so you can also see the specific data which was sent - if enabled. No IP or other identifying information is collected. The Telemetry provider is PostHog - an open-source telemetry collection service.

vector-admin's Issues

Weaviate Integration

Add full VectorAdmin Support for Weaviate

Create/Find workspace

We currently have protections so that a new workspace name you type in won't collide with an existing namespace or collection, but this input should also function as a typeahead for collections or namespaces not yet imported.

If the user selects a suggestion, we should confirm that they want to pull in that specific workspace and then import it, as opposed to only creating new ones.

Right now the only way to target and pull in a workspace is to sync the entire DB. Once the workspace exists then we can sync just the workspace going forward.

What embeddings model is used for the uploaded documents?

I saw the project on YouTube. It seems excellent.

I was wondering. When uploading a new document, what is the embeddings model used to convert it?

Is it possible to choose a custom one? If so, how can it be done?

Thank you.

Don't allow special chars in org name

connect #91

Some organizations may want to include special chars in their organization name. We should try to either prevent this from occurring or encode the slug of the URL to prevent this from being an issue.
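
Both options mentioned above can be sketched in a few lines. This is only an illustration, not the project's code: `orgSlug` and `orgPath` are hypothetical helper names, and the `/dashboard/` prefix is an assumption about how the frontend routes to an organization.

```javascript
// Hypothetical helper: turn an organization name into a URL-safe slug.
function orgSlug(name) {
  return name
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse every run of other characters into one dash
    .replace(/^-+|-+$/g, "");    // drop leading and trailing dashes
}

// Alternative: keep the name as typed and percent-encode it for the path segment.
// The "/dashboard/" prefix is an assumption made for this sketch.
function orgPath(name) {
  return "/dashboard/" + encodeURIComponent(name);
}
```

A name made up only of characters outside a-z and 0-9 would produce an empty slug here, so a real implementation would likely need a fallback such as a generated ID.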

Opening document details shows a blank page (Docker)

Hi, I tried to open cached documents after syncing with Pinecone:
Document
857 Vectors
18.8 MiB Vector Cache (MB)

When I click to open it, I see some flashes and then quickly a blank page (Chrome).
No error appears in the Docker logs and no error appears in the Google Chrome console.

Document view crashes the page when there is no text content to display

Viewing a document that was uploaded with no text crashes the page. I uploaded a document without a text field in the metadata; when the table view renders that document, the page flashes and crashes. The console error is that .length is being called on null. My suspicion is that the content preview pill checks length regardless of whether the property is present when rendering the row component in the document view.
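
A guard of the following shape would avoid calling `.length` on a missing value. It is a minimal sketch under the assumption that the preview reads `metadata.text`; `previewText` and the `"(no text)"` placeholder are hypothetical and not taken from the codebase.

```javascript
// Hypothetical preview helper; `metadata.text` is the field named in the report above.
function previewText(metadata, maxLength = 40) {
  // Fall back to an empty string when metadata or its text field is missing or not a string.
  const text = typeof metadata?.text === "string" ? metadata.text : "";
  if (text.length === 0) return "(no text)";
  return text.length > maxLength ? text.slice(0, maxLength) + "..." : text;
}
```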

Infinite scroll on sidebar for collections

Many users have >100 workspaces. In this case, we should use infinite scroll plus a simple name-based search field in the sidebar. This way we can improve the initial page load by querying only the top 20 workspaces and lazy loading the rest.

The search bar should only appear if there are more than LIMIT workspaces, where limit is the page size of workspaces.
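
The paging and search rules above could look roughly like this. The sketch works on an in-memory array for brevity, whereas a real implementation would presumably push the limit and filter into the database query; `workspacePage`, `showSearch`, and the `name` field are assumed names.

```javascript
const PAGE_SIZE = 20; // assumed page size, taken from the "top 20" figure above

// Return one page of workspaces, optionally filtered by a case-insensitive name match.
function workspacePage(workspaces, page, search = "") {
  const needle = search.trim().toLowerCase();
  const matches = needle
    ? workspaces.filter((w) => w.name.toLowerCase().includes(needle))
    : workspaces;
  const start = page * PAGE_SIZE; // pages are zero-based in this sketch
  return {
    items: matches.slice(start, start + PAGE_SIZE),
    hasMore: start + PAGE_SIZE < matches.length,
  };
}

// The search field is only shown when the workspaces do not fit on one page.
function showSearch(totalWorkspaces) {
  return totalWorkspaces > PAGE_SIZE;
}
```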

Azure Embedding model support

Like how we use OpenAI for embedding vector search and new documents, we should enable the connection of Azure services to this as well for those who opt to use that service.

This should be readily extensible to support other embedding models.

Running yarn dev:setup returns ERROR: no such file or directory: dev:setup

I see devSetup.js but can't run yarn dev:setup.js.

Any idea why this is happening?

Support milvus

Hi Team.
Amazing work. Is there a plan to support milvus and zilliz cloud?

Sqlite support

Do you plan to support SQLite instead of PostgreSQL?
I think it would be convenient for small installations.

Drop source loading on document view

On the document view we attempt to download the vector-cache file and retain it on the client side so we can view the text, metadata, and other assets.

However, when the vector-cache is sufficiently large (like 100MB+) the client will fail to parse such a large JSON body, not to mention that it also takes forever to load - if it does.

We can dramatically improve the UX here by dropping the support of trying to read arbitrary-size JSON files since we can just read the metadata exactly from the vector database given that we are using pagination now.

Connected Chroma, chroma/sync-workspace fails

Installed in Docker and connected to a local Docker Chroma instance. I can see my workspaces. I clicked on a workspace and then 'Sync Workspace Data'. This queues a job that fails.

Job response:
{
  "canRetry": true,
  "message": "Job failed with error",
  "error": "Cannot read properties of null (reading 'title')",
  "details": {}
}

Chroma 0.4.4

VectorAdmin UI V2

Implement new designs for VectorAdmin.

https://www.figma.com/file/HqOoKd5AALhX34eJajIPln/%5BVA%5D-UI-Overhaul?type=design&node-id=191%3A46023&mode=dev

Qdrant Support

Add Full Vector Admin support for QDrant.tech

We could not log you in - contact an admin. [002] Failed to authenticate (First time login)

Following this https://github.com/Mintplex-Labs/vector-admin/blob/master/docker/DOCKER.md#how-to-install

and then using the default SYS email and password, I get this error and am unable to log in.
I tried the credentials in docker/.env and backend/.env; neither worked.
Logs:

2023-09-07 14:42:35 Warning: connect.session() MemoryStore is not
2023-09-07 14:42:35 designed for a production environment, as it will leak
2023-09-07 14:42:35 memory, and will not scale past a single process.
2023-09-07 14:42:35 Warning: connect.session() MemoryStore is not
2023-09-07 14:42:35 designed for a production environment, as it will leak
2023-09-07 14:42:35 memory, and will not scale past a single process.
2023-09-07 14:42:34 SELECT * FROM system_settings WHERE label = 'telemetry_id'
2023-09-07 14:42:34 [TELEMETRY SENT] { event: 'server_boot', properties: { commit: '--' } }
2023-09-07 14:42:34 SELECT COUNT(*) as count FROM users WHERE role = "admin"
2023-09-07 14:42:34 SELECT * FROM users WHERE email = '[email protected]' AND role = 'root'
2023-09-07 14:42:34 INSERT INTO users (email, password, role) VALUES ('[email protected]', '$2b$10$8QuMQU8wveRP2.CNVlbT7eMynhAPu0NvIAc.6Ak6tdOGNjmKEesla', 'root')
2023-09-07 14:42:34 Root user created with credentials
2023-09-07 14:42:34 Example app listening on port 3001
2023-09-07 14:42:34 SELECT * FROM system_settings WHERE label = 'debug_username'
2023-09-07 14:42:35 setupFunctions success! 200
2023-09-07 14:42:35 Background workers listening on port 3355

File name encoding garbled

Garbled characters appear when uploading a file whose name is Chinese.

Can't have multiple vectordb connectors to same technology provider in same workspace

I have a personal Qdrant account where I run experiments and, when an import/experiment is working, I want to promote it to the final/production Qdrant account. I was hoping vector-admin could help me here. Unfortunately it only allowed me to add 1 Qdrant credential. I would like to enter N host:key pairs for Qdrant. In fact, the same goes for any of the supported providers (2 Pinecones, 3 Qdrants, etc.).

Pagination on documents

It is possible and reasonable that a single document can have 10k+ vectors in it (large pdfs) and this will dramatically slow the page load on document loading.

We should paginate data for the loading of the document so that loading times are not extreme for large documents in workspaces

Can't connect to Chroma

I have a Chroma instance in AWS that is currently open without authorization. But every time I try to connect using the IP address, it says "Could not connect to Chroma instance with those credentials.", even if the API Header & Key fields are empty.

Support for OpenID Authentication

Hello, great tool.
It would be great to have OpenID implemented for authentication. What do you think?

One more piece of feedback about the cloning action: do you think it would make sense to clone and offer the option to start a new DB instance based on a user-provided script, for example?

Flowise document import not finding a name

When documents are imported via Flowise, all documents come through named as blob, which results in massive documents that are all merged into a single document. These documents should be split according to their document title or be given an entirely new document name.

How to install on Digital Ocean Droplet via Docker

Went through the docs and found installation instructions for docker on windows but not docker on digital ocean droplet (linux - ubuntu).

Got any tips on how to install it on digital ocean?

Found your tool from flowise discord.

Also are you the people behind mintplex for Solana?

Support for cohere embeddings instead of openAi?

Can you add support for cohere please?

The table `public.system_settings` does not exist in the current database , when following How to get started (Development environment)

Following the steps in https://github.com/Mintplex-Labs/vector-admin#how-to-get-started-development-environment , when I try the step

```
yarn dev:server
```

I get the error:

```
The table `public.system_settings` does not exist in the current database.
    at wn.handleRequestError (/Users/mqm/git/oc/vector-admin/backend/node_modules/@prisma/client/runtime/library.js:123:6730)
    at wn.handleAndLogRequestError (/Users/mqm/git/oc/vector-admin/backend/node_modules/@prisma/client/runtime/library.js:123:6119)
    at wn.request (/Users/mqm/git/oc/vector-admin/backend/node_modules/@prisma/client/runtime/library.js:123:5839)
    at async l (/Users/mqm/git/oc/vector-admin/backend/node_modules/@prisma/client/runtime/library.js:128:9763)
    at async Object.updateSettings (/Users/mqm/git/oc/vector-admin/backend/models/systemSettings.js:45:25)
    at async saveDebug (/Users/mqm/git/oc/vector-admin/backend/utils/debug/index.js:19:3) {
  code: 'P2021',
  clientVersion: '5.3.1',
  meta: { table: 'public.system_settings' }
}

```


Shouldn't the tables be created on first run?

Support Chroma with Authentication

When using Chroma with authentication via AWS API Gateway or otherwise, we should allow the user to set the header and the key so the instance can be accessed when auth is required.

Document Embedding Search

When viewing a document the user should be able to find specific embedding(s) that match some given query. We should support the same searching functions that exist in the workspace document searching.

Vectors without a namespace

VectorAdmin is not pulling data from Pinecone for vectors without a namespace. Is that a known issue or am I missing something?

Request for Redis Vector Store support

Hey, @timothycarambat great work on this project.

I would like to request support for the Redis Vector Store. LangChain supports it: https://js.langchain.com/docs/modules/data_connection/vectorstores/integrations/redis/

I will check through the codebase hopefully I can create a PR for this.

Workspace Search on Sidebar and Workspace Table

As a user with many workspaces, I can already infinite scroll the workspaces available to me. However, to find a workspace of specific interest I used to Ctrl+F the page and try to find it. Since infinite scroll was implemented I can no longer use this method.

An automatic "search filter" should be present on the side bar below the workspace header that I can type in and the desired (or no result) will appear once done typing.

Likewise, a "search" filter function should be present on the Workspaces table on the Org page. It should function the same.

Dev notes:
The workspaces on the side bar and in the table are logically linked. Updating workspaces on the sidebar will impact the results in the table. The table should probably not auto-filter.

Some other considerations are a "Search" and "Infinite" scroll mode. When the search input is non-null/empty we hide infinite scroll and show only search results. Same for the table. This way the original workspace state does not change constantly.

The sidebar component will be busy adding this so I suggest extracting the Search/Infinite scroll to a new component or subcomponent of the Sidebar. Adding this all into the Sidebar file will be a mess.

Organizations names editable

Hi @timcarambat, do you think AV organization names could be editable? It seems that once created they cannot be changed, but I think this should be a bit more flexible if possible, at least for the visual label, while keeping the internal ID reference. Many thanks.

Text splitter preview

When uploading a document via the UI we should allow the user to optionally specify "Advanced Options" which enables them to set the parameters and type of splitting.

We should also integrate https://langchain-text-splitter.streamlit.app/ as a separate page and tool for the user to use to determine which option is best used.

Fallback or no special options is the current default that is used app-wide.

Cannot login after first boot with Docker

For some reason I can't log in with the SYS_EMAIL and SYS_PASSWORD I provided in the .env file. Also, all values in there are surrounded by semicolons; do I need to remove all of those? My .env file looks like this now:

SERVER_PORT=3100
# STORAGE_DIR="./backend/storage"

JWT_SECRET=iewuhdcw8ry2903741c9eoidjso
SYS_EMAIL=[email protected]
SYS_PASSWORD=132894aoisjdasdj

UID=1000
GID=1000

INNGEST_EVENT_KEY=background_workers
INNGEST_SIGNING_KEY=aosidjcm98we7c938uewoiejcfso
INNGEST_LANDING_PAGE=true

You're missing the INNGEST_SIGNING_KEY parameter when serving your functions. This will not work in production

Should this be a concern?

{"level":"warn","caller":"devserver","time":"2023-12-15T14:26:25-03:00","message":"You're missing the INNGEST_SIGNING_KEY parameter when serving your functions.  This will not work in production."}

Running locally on my laptop.

Adding a new workspace and document

When I go to create a workspace, one isn't created unless I upload a document. When I create a workspace and add a document, the "job" flags an error but then the workspace and document are added anyway.

yarn/docker issue

The following error is yielded when running yarn dev:setup

I'm using macOS 12.3.1

the nearest package directory (../vector-admin/frontend) doesn't seem to be part of the project declared in (../vector-admin)

and when trying to run the 'docker-compose' command, it naturally yields

failed to solve: failed to compute cache key: failed to calculate checksum of ref moby::g7wxfdpmli2jkowdsir0lph1k: "/frontend/yarn.lock": not found

Workspace Search input

At the workspace level, we should enable a document to be found by the following search params:
exact text search - reads JSON files for exact text match
semantic search - uses OpenAI
metadata key - requires specific format of [key]:[value]
vector id

Searching should execute on user's click and will render results in the document viewer to show the document(s) which match the search.
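
As an illustration of the metadata search format, the sketch below splits a term on its first colon. It assumes the brackets in [key]:[value] are placeholders rather than literal characters; `parseMetadataQuery` is a hypothetical name.

```javascript
// Parse a metadata search term of the form key:value.
// Returns null when the term does not match that format.
function parseMetadataQuery(term) {
  const idx = term.indexOf(":");
  if (idx <= 0) return null; // no separator, or nothing before it
  const key = term.slice(0, idx).trim();
  const value = term.slice(idx + 1).trim(); // later colons stay inside the value
  if (!key || !value) return null;
  return { key, value };
}
```

Splitting on the first colon only means that values such as URLs survive intact, at the cost of not supporting keys that contain a colon.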

Document Metadata field editor.

On each document, we should allow the editing of metadata and optionally enable the editing of chunk metadata.

So at the document level, we can apply or remove key/value pairs and on the chunk level be allowed to do the same.

This will not cover multi-selection. It is either the entire document or a single chunk.

uploading a PDF file only 1 vector

I tried to upload a PDF file that consists of 20+ pages, but it seems that only the title page was uploaded. There is only 1 vector present after uploading.

Embedders, one issue to rule them all

Requests to 3rd party integrations: #2 #69
Lack of guidance for new users: #61

After starting the quest to avoid OpenAI embeddings, which use a "magic number" of 1536 dimensions for their resulting vectors, we see that most on-premises models (or competitors) use other vector lengths for their own reasons.

This creates one of the best-known headaches for developers. So, this issue is here to:

Remind users that the embedding process differs from the "chat" process, and there is no standard yet. To make matters worse, most vector DBs are not yet ready for these non-standard lengths.
Remind developers that this information should be shown everywhere: metadata, popups, labels, whatever.
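
One way to surface the mismatch early is to compare the vector length against the dimension the index was created with before writing anything. This is a sketch only; `assertDimension` is a hypothetical helper, and how the expected dimension is obtained depends on the vector database.

```javascript
// Hypothetical guard: refuse to write a vector whose length differs from the index dimension.
function assertDimension(vector, expectedDimension) {
  if (!Array.isArray(vector)) {
    throw new TypeError("vector must be an array of numbers");
  }
  if (vector.length !== expectedDimension) {
    throw new RangeError(
      `embedding has ${vector.length} dimensions but the index expects ${expectedDimension}`
    );
  }
  return vector;
}
```

For example, passing a 384-element vector with an expected dimension of 1536 throws a RangeError whose message names both numbers, which is the kind of information the issue asks to be shown to users.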

my 2 cents

Pagination on document page

When there is a large document, we should be able to split the JSON files into chunks and paginate them so that we can quickly see parts of the document and reduce loading times for large documents.

implement more vector database apps

Will you implement support for other vector database apps, like Milvus, or Zilliz, which would be the same as Milvus?

Windows support?

In your README, it says the setup procedure will not work on Windows. Does that mean this only works on Mac OS for now?

Can't access the UI from a remote box

Nice work. Comes up fine...

I am running using docker-compose up -d --build

What's the way to access the UI from a remote box?
Something in the frontend always tries to authenticate to 127.0.0.1:3001/api/

I see a VITE_API_BASE variable but I am not sure where to set this (and what to rebuild (for docker) after setting it)

Fix README setup instructions

There are a few commands referencing the document-collector when it should be document-processor.

yarn dev:setup produces "vector-admin/backend/node_modules/@prisma/engines: Command failed."

Hello!

I'm attempting to setup vector-admin on an Ubuntu VM running on WSL 2.

This is the Ubuntu release information:

Distributor ID: Ubuntu 
Description: Ubuntu 22.04.3 LTS 
Release: 22.04 
Codename: jammy

I installed the most recent version of yarn with this command:

sudo apt install yarn

This also installed nodejs.

So far I've had two problems with the very first command, yarn dev:setup:

The first was the error:

yarn run v1.22.19 error [email protected]: The engine "node" is incompatible with this module. Expected version ">=18".  Got "12.22.9" error Commands cannot run with an incompatible environment. i

I was able to deal with that by issuing this command:

yarn config set ignore-engines  true

The installer ran, but experienced this error toward the end of the installation:

error /mnt/c/Development/Projects/Python/vectoradmin/vector-admin/backend/node_modules/@prisma/engines: Command failed.
 Exit code: 1 
Command: node scripts/postinstall.js 
Arguments: 
Directory: /mnt/c/Development/Projects/Python/vectoradmin/vector-admin/backend/node_modules/@prisma/engines 
Output: 
/mnt/c/Development/Projects/Python/vectoradmin/vector-admin/backend/node_modules/@prisma/engines/dist/scripts/postinstall.js:22549
  return this[INTERNAL2].defaultPort ?? (this.protocol === "https:" ? 443 : 80); 

SyntaxError: Unexpected token '?'
  at wrapSafe (internal/modules/cjs/loader.js:915:16)
  at Module._compile (internal/modules/cjs/loader.js:963:27) 
at Object.Module._extensions..js (internal/modules/cjs/loader.js:1027:10)
  at Module.load (internal/modules/cjs/loader.js:863:32) 
at Function.Module._load (internal/modules/cjs/loader.js:708:14)
  at Module.require (internal/modules/cjs/loader.js:887:19)
  at require (internal/modules/cjs/helpers.js:74:18)
  at Object. (/mnt/c/Development/Projects/Python/vectoradmin/vector-admin/backend/node_modules/@prisma/engi

Any ideas?  

Most cordially,
Jack Stewart
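
For context on the log above: the first error reports that Node 12.22.9 was found where ">=18" is expected, and the `??` operator in the failing Prisma script is only understood by Node 14 and later, so skipping the engine check leads to the SyntaxError. A quick check of the running version might look like the sketch below; `meetsMinimumNode` is a hypothetical helper.

```javascript
// Return true when a Node.js version string such as "12.22.9" meets a minimum major version.
function meetsMinimumNode(version, minimumMajor = 18) {
  const major = Number.parseInt(version.replace(/^v/, "").split(".")[0], 10);
  return Number.isInteger(major) && major >= minimumMajor;
}
```

Calling `meetsMinimumNode(process.versions.node)` in the environment that runs yarn shows whether that Node.js installation satisfies the requirement.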

Is there any Docs on Running this on GCP or Railway?

Would be cool if we had more docs on other ways to run this app in the cloud. Thank you :)

Organization "[email protected]" causes 404

I am adding multiple organizations as a means to connect to multiple different vector db technologies.

On my 2nd Org I defined the name to be: [email protected]

After it is created and the left menu is populated, if you click on it you will get a 404:

This localhost page can’t be found
No webpage was found for the web address: http://localhost:3000/dashboard/[email protected]
HTTP ERROR 404

BTW, how can I delete the Org?

Pinecone support for Vectors without namespace

I have some Pinecone vector indexes without a namespace, and VectorAdmin can't find the database:
{
  "message": "No Pinecone namespace found for vectors-without-namespace - nothing to do."
}

Paginate Documents

For users with many many documents, we should limit this container to only showcase some LIMIT of documents which will then require pagination via a query param in the URL.

Currently the view is limited to 100 documents flat without pagination.
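
Reading the page from the URL and turning it into a query window could be sketched as follows. The `page` parameter name and the `pageWindow` helper are assumptions; the limit of 100 simply reuses the current flat cap.

```javascript
const LIMIT = 100; // assumed page size, taken from the current flat cap mentioned above

// Read ?page=N from a URL and convert it to offset/limit values for a query.
function pageWindow(url) {
  const raw = new URL(url).searchParams.get("page");
  const parsed = Number.parseInt(raw ?? "1", 10);
  // Missing, non-numeric, zero, or negative values all fall back to the first page.
  const page = Number.isInteger(parsed) && parsed > 0 ? parsed : 1;
  return { page, limit: LIMIT, offset: (page - 1) * LIMIT };
}
```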