
gbnc's Introduction

Naïve Infrastructure for a GB&C project

Warning This is a prototype for development only. No security considerations have been made. All services run as root!

Getting started

Locally

To build and run the container locally with hot reload on Python files:

DOCKER_BUILDKIT=1 docker build . -t gbnc
docker run  \
  --env HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN \
  --volume "$(pwd)/gswikichat":/workspace/gswikichat \
  --volume gbnc_cache:/root/.cache \
  --publish 8000:8000 \
  --rm \
  --interactive \
  --tty \
  --name gbnc \
  gbnc

Point your browser to http://localhost:8000/ and use the frontend.

Runpod.io

The container works on runpod.io GPU instances. A template is available here.

Local development

Backend

python -m venv .venv
. ./.venv/bin/activate
pip install -r requirements.txt

Frontend

cd frontend
yarn install
yarn dev

What's in the box

Docker container

A single container runs all the components; there is no separation of services, to keep things simple. It is based on NVIDIA CUDA containers in order to support GPU acceleration. Small models also work on laptop CPUs (tested on an i7-1260P).

Ollama inference

The container runs Ollama for LLM inference. It will probably not scale to multiple concurrent users when run as a service, but it is sufficient for testing.

Phi2 LLM

The Microsoft Phi2 2.7B model runs by default, locally via Ollama. It can be switched with the MODEL Docker build arg.
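For illustration, a build that swaps in a different model might look like the following; the model tag is an example, not a value documented in this README:

```shell
# Hypothetical: override the default model at build time via the MODEL
# build arg. "mistral" is an illustrative Ollama model tag.
DOCKER_BUILDKIT=1 docker build . -t gbnc --build-arg MODEL=mistral
```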

Haystack RAG Framework

The Haystack RAG framework is used to implement Retrieval Augmented Generation on a minimal test dataset.

API

A FastAPI server runs in the container. It exposes an API that receives a question from the frontend, runs the Haystack RAG pipeline, and returns the response.
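As an illustration only, a query from the command line might look like the sketch below; the route and parameter name are assumptions, not documented in this repo:

```shell
# Hypothetical request against the locally running API; the endpoint path
# and query parameter are illustrative assumptions.
curl "http://localhost:8000/api/question?query=Who+maintains+the+wiki%3F"
```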

Frontend

A minimal frontend lets the user input a question and renders the response from the system.

gbnc's People

Contributors

andrewtavis, exowanderer, rti


gbnc's Issues

EN/DE dataset


Issue

To really test the prototype, we need an English dataset too. Ideally, its data should mostly overlap with the German one, just like in a wiki where most of the content is translated into both languages (though some may be missing).

Add in loading animation to UI

@rti and I just discussed that the final thing needed for the UI is a loading animation for while the model is running. As of now, the field where the response is displayed is just blank, with no indication that work is being done.

Suggestion for this:
We display the Wikimedia icon, as is done in the response, with a placeholder text rectangle next to it. These two can then blink slowly while the model runs. The condition for this would be that an API request has been made but no response is present yet.

Eval full pipeline


Issue

I think it would be interesting to evaluate the performance of the pipeline at different stages.

  • How good is the retrieval?
    • How do different embedding models perform in comparison?
  • What is the best number of context passages to feed the model?
  • Which model answers questions best?
    • Picks up the actual facts from the context
    • Least hallucinations
    • Best phrasing

For the last GB&C, Silvan and I implemented something very simple but conceptually similar for the askwikidata prototype:
https://github.com/rti/askwikidata/blob/main/eval.py

There are also frameworks such as Ragas that might help: https://docs.ragas.io/en/latest/getstarted/evaluation.html#metrics
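The kind of simple check described above can be sketched as follows; the dataset and the `ask` callable are illustrative assumptions, not part of this repo:

```python
# Minimal sketch of an end-to-end eval loop, in the spirit of the
# askwikidata eval script linked above.
def exact_match_rate(qa_pairs, ask):
    """Fraction of questions whose answer contains the expected string."""
    hits = sum(1 for question, expected in qa_pairs
               if expected.lower() in ask(question).lower())
    return hits / len(qa_pairs)

# Usage with a stub "model" standing in for the real RAG pipeline.
qa = [("Capital of France?", "Paris"), ("Largest planet?", "Jupiter")]
print(exact_match_rate(qa, lambda q: "Paris" if "France" in q else "Saturn"))
```

A real run would replace the stub with a call into the pipeline's question endpoint; more nuanced metrics (faithfulness, context precision) are what Ragas provides.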

Suggestion: We should change the repo and project name


Issue

I suggest that we change the GitHub repo name and project name to PrivateWikiSearchRAG or private_wiki_search_RAG (which can be shortened to pwsr) instead of gbnc. Please provide any other suggestions for the name; we can discuss or vote on them here.

Justification for changing the name
We started this project as a GB&C small project, but it has now evolved into a deployable prototype RAG for almost any private wiki-text focused dataset.

I think the name should reflect this upgrade in scope and functionality, as well as the larger community that could take advantage of our efforts.

Embedding generation runs on CPU only


Behavior

When generating embeddings, only the CPU is used; no GPU acceleration is leveraged.
This makes embedding generation for our full example data take about 18 hours on 16 cores.
Typically, GPU acceleration can be activated by providing a device="cuda" parameter, which should speed up embedding generation considerably.
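A minimal sketch of that device parameter, assuming the embeddings come from a sentence-transformers model (the model name below is illustrative, not the one this repo uses):

```python
def pick_device():
    """Prefer CUDA when available; fall back to CPU."""
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        return "cpu"

def make_embedder(model_name="sentence-transformers/all-MiniLM-L6-v2"):
    # sentence-transformers accepts a `device` argument; passing "cuda"
    # moves encoding to the GPU.
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(model_name, device=pick_device())
```

Whether this applies directly depends on which embedder the pipeline instantiates; the fix is to thread such a device argument through to it.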

Operating System

Linux; our container on a runpod.io host with an NVIDIA 3090

How to handle languages


Issue

We want to support English and German to start with.

How should the system behave?

  • The UI switches languages based on browser language already ✅
  • Should the LLM always answer in the UI language?
  • What if the user asks in a different language?
  • How does the LLM perform in translating?
    • Thanks to the multilingual embeddings, we should already retrieve information in both languages transparently
    • Can the LLM get a context in one language and still respond correctly in another language?
  • Should we somehow restrict everything to a single language at a time? How?

Updated Issue template forms

@andrewtavis said: Let me know if we want to do issue template forms for this. I've made loads of those and I'd be happy to add a few in for bugs, feature requests and the like :)

Originally posted by @andrewtavis in #11 (comment)

Yes, please. I copy-pasted the Issue and PR templates from a GitHub tutorial. Please do what you wish with them.

Unacceptable performance on excellent_articles full


Behavior

When booting up with the full excellent articles dataset, the application takes ages to start. After the embeddings have been generated with GPU acceleration (an NVIDIA RTX 3090 takes ~10 minutes to embed the full excellent articles set), the application hangs with one CPU core at 100% and the GPU idle. Could the cause be storing the embedding cache as JSON?

Operating System

runpod host with rtx 3090

[Bug] Docker build creates yarn error with esbuild

In the main branch, the automated yarn installation and build through docker build flags the following error:

=> ERROR [17/18] RUN cd frontend && yarn install && yarn build
11.17 error /workspace/frontend/node_modules/esbuild: Command failed.
11.17 Error: Command failed: /usr/bin/node /workspace/frontend/node_modules/esbuild/bin/esbuild --version
11.17 <ref *1> Error: spawnSync /workspace/frontend/node_modules/@esbuild/linux-x64/bin/esbuild ETXTBSY
11.17 error: [Circular *1],

ERROR: failed to solve: process "/bin/sh -c cd frontend && yarn install && yarn build" did not complete successfully: exit code: 1

Here is the full error traceback from docker build

 => [16/18] RUN npm install -g yarn                                                                                                                                                                           6.9s 
 => ERROR [17/18] RUN cd frontend && yarn install && yarn build                                                                                                                                              11.3s 
------                                                                                                                                                                                                             
 > [17/18] RUN cd frontend && yarn install && yarn build:                                                                                                                                                          
0.770 yarn install v1.22.21                                                                                                                                                                                        
0.824 [1/4] Resolving packages...                                                                                                                                                                                  
0.917 [2/4] Fetching packages...                                                                                                                                                                                   
0.923 warning Pattern ["string-width@^4.1.0"] is trying to unpack in the same destination "/usr/local/share/.cache/yarn/v6/npm-string-width-cjs-4.2.3-269c7117d27b05ad2e536830a8ec895ef9c6d010-integrity/node_modules/string-width-cjs" as pattern ["string-width-cjs@npm:string-width@^4.2.0"]. This could result in non-deterministic behavior, skipping.
7.783 [3/4] Linking dependencies...
11.05 [4/4] Building fresh packages...
11.17 error /workspace/frontend/node_modules/esbuild: Command failed.
11.17 Exit code: 1
11.17 Command: node install.js
11.17 Arguments: 
11.17 Directory: /workspace/frontend/node_modules/esbuild
11.17 Output:
11.17 node:internal/errors:932
11.17   const err = new Error(message);
11.17               ^
11.17 
11.17 Error: Command failed: /usr/bin/node /workspace/frontend/node_modules/esbuild/bin/esbuild --version
11.17 node:child_process:929
11.17     throw err;
11.17     ^
11.17 
11.17 <ref *1> Error: spawnSync /workspace/frontend/node_modules/@esbuild/linux-x64/bin/esbuild ETXTBSY
11.17     at Object.spawnSync (node:internal/child_process:1124:20)
11.17     at spawnSync (node:child_process:876:24)
11.17     at Object.execFileSync (node:child_process:919:15)
11.17     at Object.<anonymous> (/workspace/frontend/node_modules/esbuild/bin/esbuild:221:28)
11.17     at Module._compile (node:internal/modules/cjs/loader:1376:14)
11.17     at Module._extensions..js (node:internal/modules/cjs/loader:1435:10)
11.17     at Module.load (node:internal/modules/cjs/loader:1207:32)
11.17     at Module._load (node:internal/modules/cjs/loader:1023:12)
11.17     at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:135:12)
11.17     at node:internal/main/run_main_module:28:49 {
11.17   errno: -26,
11.17   code: 'ETXTBSY',
11.17   syscall: 'spawnSync /workspace/frontend/node_modules/@esbuild/linux-x64/bin/esbuild',
11.17   path: '/workspace/frontend/node_modules/@esbuild/linux-x64/bin/esbuild',
11.17   spawnargs: [ '--version' ],
11.17   error: [Circular *1],
11.17   status: null,
11.17   signal: null,
11.17   output: null,
11.17   pid: 0,
11.17   stdout: null,
11.17   stderr: null
11.17 }
11.17 
11.17 Node.js v20.11.0
11.17 
11.17     at checkExecSyncError (node:child_process:890:11)
11.17     at Object.execFileSync (node:child_process:926:15)
11.17     at validateBinaryVersion (/workspace/frontend/node_modules/esbuild/install.js:99:28)
11.17     at /workspace/frontend/node_modules/esbuild/install.js:284:5 {
11.17   status: 1,
11.17   signal: null,
11.17   output: [
11.17     null,
11.17     Buffer(0) [Uint8Array] [],
11.17     Buffer(1157) [Uint8Array] [
11.17       110, 111, 100, 101,  58,  99, 104, 105, 108, 100,  95, 112,
11.17       114, 111,  99, 101, 115, 115,  58,  57,  50,  57,  10,  32,
11.17        32,  32,  32, 116, 104, 114, 111, 119,  32, 101, 114, 114,
11.17        59,  10,  32,  32,  32,  32,  94,  10,  10,  60, 114, 101,
11.17       102,  32,  42,  49,  62,  32,  69, 114, 114, 111, 114,  58,
11.17        32, 115, 112,  97, 119, 110,  83, 121, 110,  99,  32,  47,
11.17       119, 111, 114, 107, 115, 112,  97,  99, 101,  47, 102, 114,
11.17       111, 110, 116, 101, 110, 100,  47, 110, 111, 100, 101,  95,
11.17       109, 111, 100, 117,
11.17       ... 1057 more items
11.17     ]
11.17   ],
11.17   pid: 153,
11.17   stdout: Buffer(0) [Uint8Array] [],
11.17   stderr: Buffer(1157) [Uint8Array] [
11.17     110, 111, 100, 101,  58,  99, 104, 105, 108, 100,  95, 112,
11.17     114, 111,  99, 101, 115, 115,  58,  57,  50,  57,  10,  32,
11.17      32,  32,  32, 116, 104, 114, 111, 119,  32, 101, 114, 114,
11.17      59,  10,  32,  32,  32,  32,  94,  10,  10,  60, 114, 101,
11.17     102,  32,  42,  49,  62,  32,  69, 114, 114, 111, 114,  58,
11.17      32, 115, 112,  97, 119, 110,  83, 121, 110,  99,  32,  47,
11.17     119, 111, 114, 107, 115, 112,  97,  99, 101,  47, 102, 114,
11.17     111, 110, 116, 101, 110, 100,  47, 110, 111, 100, 101,  95,
11.17     109, 111, 100, 117,
11.17     ... 1057 more items
11.17   ]
11.17 }
11.17 
11.17 Node.js v20.11.0
11.17 info Visit https://yarnpkg.com/en/docs/cli/install for documentation about this command.
------
Dockerfile:61
--------------------
  59 |     
  60 |     # Install frontend dependencies and build it for production (into the frontend/dist folder)
  61 | >>> RUN cd frontend && yarn install && yarn build
  62 |     
  63 |     # Container start script
--------------------
ERROR: failed to solve: process "/bin/sh -c cd frontend && yarn install && yarn build" did not complete successfully: exit code: 1

[Core Structure Change]: branch:frontend is a failed attempt to docker-compose the frontend with the backend

[Core Structure Change]

Following the prescription in Developing a Single Page App with FastAPI and Vue.js, I created a docker-compose setup with services/backend and services/frontend.

This was the nominal solution to Issue #3, but the work is incomplete.

If anyone knows better how to compose the Vue 3 frontend with the FastAPI+Haystack backend, please check it out.

Note that the error is very likely in services/frontend/Dockerfile or in how the frontend volumes are configured in the docker-compose.yml file.

[Bug]: [EmbeddingDE] docker run produces ImportError: 'send_event' & 'haystack.telemetry'

In Branch: feature/EmbeddingDE

After introducing the new FAISSDocumentStore (for SQLite) and the jina.ai embedding for DE+EN embeddings, docker run began to flag the error:

ImportError: cannot import name 'send_event' from 'haystack.telemetry'

This error started early in the process, but I could not discover how or why.

Following the guidance from deepset-ai/haystack#6652, I began to turn off farm-haystack features and/or haystack-ai in the Dockerfile.

Later, I attempted to forcibly install Python 3.8 or Python 3.9, both suggested by the Haystack issue #6652 above.

These solutions may still work, but my implementation did not fix the error.

Support overwriting model via environment


Description

We can already set the model using the OLLAMA_MODEL_NAME environment variable.
However, if we change the model string at docker run time, the backend fails, because the model was never pulled.

This ticket is about fixing that by pulling the model in start.sh as well, not only in the Dockerfile.

This will also allow us to use small models by default on dev systems while using a different model in production.
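A minimal sketch of what the start.sh change could look like; the default model tag is an illustrative assumption, not the repo's actual default:

```shell
# Hypothetical sketch for start.sh: resolve the model name from the
# environment (with a fallback default) and pull it before the backend
# starts, so an OLLAMA_MODEL_NAME override at `docker run` time works
# even if that model was not pulled during the image build.
MODEL="${OLLAMA_MODEL_NAME:-phi}"
echo "Using Ollama model: $MODEL"
# `ollama pull` is a no-op when the model is already present locally.
if command -v ollama >/dev/null 2>&1; then
  ollama pull "$MODEL"
fi
```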

Contribution

No response
