
gbnc's Introduction

Naïve Infrastructure for a GB&C project

Warning This is a prototype for development only. No security considerations have been made. All services run as root!

Getting started

Locally

To build and run the container locally with hot reload on Python files:

DOCKER_BUILDKIT=1 docker build . -t gbnc
docker run  \
  --env HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN \
  --volume "$(pwd)/gswikichat":/workspace/gswikichat \
  --volume gbnc_cache:/root/.cache \
  --publish 8000:8000 \
  --rm \
  --interactive \
  --tty \
  --name gbnc \
  gbnc

Point your browser to http://localhost:8000/ and use the frontend.

Runpod.io

The container works on runpod.io GPU instances. A template is available here.

Local development

Backend

python -m venv .venv
. ./.venv/bin/activate
pip install -r requirements.txt

Frontend

cd frontend
yarn install
yarn dev

What's in the box

Docker container

A single container runs all the components; there is no separation of services, to keep things simple. It is based on NVIDIA CUDA containers in order to support GPU acceleration. Small models also work on laptop CPUs (tested on an i7-1260P).

Ollama inference

The container runs Ollama for LLM inference. It will probably not scale to multiple concurrent users when run as a service, but it is sufficient for testing.

Phi2 LLM

The Microsoft Phi2 2.7B model runs by default, locally via Ollama. It can be switched with the MODEL Docker build arg.
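For illustration, a build that swaps in a different model might look like the following; the model tag is an example, not a value documented in this README:

```shell
# Hypothetical: override the default model at build time via the MODEL
# build arg. "mistral" is an illustrative Ollama model tag.
DOCKER_BUILDKIT=1 docker build . -t gbnc --build-arg MODEL=mistral
```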

Haystack RAG Framework

The Haystack RAG framework is used to implement Retrieval Augmented Generation on a minimal test dataset.

API

A FastAPI server runs in the container. It exposes an API that receives a question from the frontend, runs the Haystack RAG pipeline, and returns the response.
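As an illustration only, a query from the command line might look like the sketch below; the route and parameter name are assumptions, not documented in this repo:

```shell
# Hypothetical request against the locally running API; the endpoint path
# and query parameter are illustrative assumptions.
curl "http://localhost:8000/api/question?query=Who+maintains+the+wiki%3F"
```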

Frontend

A minimal frontend lets the user input a question and renders the response from the system.

gbnc's People

Contributors

andrewtavis, exowanderer, rti


gbnc's Issues

EN/DE dataset


Issue

To really test the prototype, we need an English dataset too. Ideally, its data should mostly overlap with the German one, just like in a wiki where most of the content is translated into both languages (though some may be missing).

Add in loading animation to UI

@rti and I just discussed that the final thing needed for the UI is a loading animation for while the model is running. As of now, the field where the response is displayed is just blank, with no indication that work is being done.

Suggestion for this:
We display the Wikimedia icon, as is done in the response, with a placeholder text rectangle next to it. These two can then blink slowly while the model runs. The condition for this would be that an API request has been made but no response is present yet.

Eval full pipeline


Issue

I think it would be interesting to evaluate the performance of the pipeline at different stages.

  • How good is the retrieval?
    • How do different embedding models perform in comparison?
  • What is the best number of context passages to feed the model?
  • Which model answers questions best?
    • Picks up the actual facts from the context
    • Least hallucinations
    • Best phrasing

For the last GB&C, Silvan and I implemented something very simple but conceptually similar for the askwikidata prototype:
https://github.com/rti/askwikidata/blob/main/eval.py

There are also frameworks such as Ragas that might help: https://docs.ragas.io/en/latest/getstarted/evaluation.html#metrics
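The kind of simple check described above can be sketched as follows; the dataset and the `ask` callable are illustrative assumptions, not part of this repo:

```python
# Minimal sketch of an end-to-end eval loop, in the spirit of the
# askwikidata eval script linked above.
def exact_match_rate(qa_pairs, ask):
    """Fraction of questions whose answer contains the expected string."""
    hits = sum(1 for question, expected in qa_pairs
               if expected.lower() in ask(question).lower())
    return hits / len(qa_pairs)

# Usage with a stub "model" standing in for the real RAG pipeline.
qa = [("Capital of France?", "Paris"), ("Largest planet?", "Jupiter")]
print(exact_match_rate(qa, lambda q: "Paris" if "France" in q else "Saturn"))
```

A real run would replace the stub with a call into the pipeline's question endpoint; more nuanced metrics (faithfulness, context precision) are what Ragas provides.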

Suggestion: We should change the repo and project name


Issue

I suggest that we change the GitHub repo name and project name to PrivateWikiSearchRAG or private_wiki_search_RAG (which can be shortened to pwsr) instead of gbnc. Please provide any other suggestions for the name; we can discuss or vote on them here.

Justification for changing the name
We started this project as a GB&C small project, but it has now evolved into a deployable prototype RAG for almost any private wiki-text focused dataset.

I think the name should reflect this upgrade in scope and functionality, as well as the larger community that could take advantage of our efforts.

Embedding generation runs on CPU only


Behavior

When generating embeddings, only the CPU is used; no GPU acceleration is leveraged.
This makes embedding generation for our full example data take about 18 hours on 16 cores.
Typically, GPU acceleration can be activated by providing a device="cuda" parameter, which should speed up embedding generation considerably.
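A minimal sketch of that device parameter, assuming the embeddings come from a sentence-transformers model (the model name below is illustrative, not the one this repo uses):

```python
def pick_device():
    """Prefer CUDA when available; fall back to CPU."""
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        return "cpu"

def make_embedder(model_name="sentence-transformers/all-MiniLM-L6-v2"):
    # sentence-transformers accepts a `device` argument; passing "cuda"
    # moves encoding to the GPU.
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(model_name, device=pick_device())
```

Whether this applies directly depends on which embedder the pipeline instantiates; the fix is to thread such a device argument through to it.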

Operating System

Linux; our container on a runpod.io host with an NVIDIA 3090

How to handle languages


Issue

We want to support English and German to start with.

How should the system behave?

  • The UI switches languages based on browser language already ✅
  • Should the LLM always answer in the UI language?
  • What if the user asks in a different language?
  • How does the LLM perform in translating?
    • Thanks to the multilingual embeddings, we should already retrieve information in both languages transparently
    • Can the LLM get a context in one language and still respond correctly in another language?
  • Should we somehow restrict everything to a single language at a time? How?

Updated Issue template forms

@andrewtavis said: Let me know if we want to do issue template forms for this. I've made loads of those and I'd be happy to add a few in for bugs, feature requests and the like :)

Originally posted by @andrewtavis in #11 (comment)

Yes, please. I copy-pasted the Issue and PR templates from a GitHub tutorial. Please do what you wish with them.

Unacceptable performance on excellent_articles full


Behavior

When booting up with the full excellent articles dataset, the application takes ages to start. After the embeddings have been generated with GPU acceleration (an NVIDIA RTX 3090 takes ~10 minutes to embed the full excellent articles set), the application hangs with one CPU core at 100% and the GPU idle. Could the cause be storing the embedding cache as JSON?

Operating System

runpod host with rtx 3090

[Bug] Docker build creates yarn error with esbuild

In the main branch, the automated yarn installation and build through docker build flags the following error:

=> ERROR [17/18] RUN cd frontend && yarn install && yarn build
11.17 error /workspace/frontend/node_modules/esbuild: Command failed.
11.17 Error: Command failed: /usr/bin/node /workspace/frontend/node_modules/esbuild/bin/esbuild --version
11.17 <ref *1> Error: spawnSync /workspace/frontend/node_modules/@esbuild/linux-x64/bin/esbuild ETXTBSY
11.17 error: [Circular *1],

ERROR: failed to solve: process "/bin/sh -c cd frontend && yarn install && yarn build" did not complete successfully: exit code: 1

Here is the full error traceback from docker build

 => [16/18] RUN npm install -g yarn                                                                                                                                                                           6.9s 
 => ERROR [17/18] RUN cd frontend && yarn install && yarn build                                                                                                                                              11.3s 
------                                                                                                                                                                                                             
 > [17/18] RUN cd frontend && yarn install && yarn build:                                                                                                                                                          
0.770 yarn install v1.22.21                                                                                                                                                                                        
0.824 [1/4] Resolving packages...                                                                                                                                                                                  
0.917 [2/4] Fetching packages...                                                                                                                                                                                   
0.923 warning Pattern ["string-width@^4.1.0"] is trying to unpack in the same destination "/usr/local/share/.cache/yarn/v6/npm-string-width-cjs-4.2.3-269c7117d27b05ad2e536830a8ec895ef9c6d010-integrity/node_modules/string-width-cjs" as pattern ["string-width-cjs@npm:string-width@^4.2.0"]. This could result in non-deterministic behavior, skipping.
7.783 [3/4] Linking dependencies...
11.05 [4/4] Building fresh packages...
11.17 error /workspace/frontend/node_modules/esbuild: Command failed.
11.17 Exit code: 1
11.17 Command: node install.js
11.17 Arguments: 
11.17 Directory: /workspace/frontend/node_modules/esbuild
11.17 Output:
11.17 node:internal/errors:932
11.17   const err = new Error(message);
11.17               ^
11.17 
11.17 Error: Command failed: /usr/bin/node /workspace/frontend/node_modules/esbuild/bin/esbuild --version
11.17 node:child_process:929
11.17     throw err;
11.17     ^
11.17 
11.17 <ref *1> Error: spawnSync /workspace/frontend/node_modules/@esbuild/linux-x64/bin/esbuild ETXTBSY
11.17     at Object.spawnSync (node:internal/child_process:1124:20)
11.17     at spawnSync (node:child_process:876:24)
11.17     at Object.execFileSync (node:child_process:919:15)
11.17     at Object.<anonymous> (/workspace/frontend/node_modules/esbuild/bin/esbuild:221:28)
11.17     at Module._compile (node:internal/modules/cjs/loader:1376:14)
11.17     at Module._extensions..js (node:internal/modules/cjs/loader:1435:10)
11.17     at Module.load (node:internal/modules/cjs/loader:1207:32)
11.17     at Module._load (node:internal/modules/cjs/loader:1023:12)
11.17     at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:135:12)
11.17     at node:internal/main/run_main_module:28:49 {
11.17   errno: -26,
11.17   code: 'ETXTBSY',
11.17   syscall: 'spawnSync /workspace/frontend/node_modules/@esbuild/linux-x64/bin/esbuild',
11.17   path: '/workspace/frontend/node_modules/@esbuild/linux-x64/bin/esbuild',
11.17   spawnargs: [ '--version' ],
11.17   error: [Circular *1],
11.17   status: null,
11.17   signal: null,
11.17   output: null,
11.17   pid: 0,
11.17   stdout: null,
11.17   stderr: null
11.17 }
11.17 
11.17 Node.js v20.11.0
11.17 
11.17     at checkExecSyncError (node:child_process:890:11)
11.17     at Object.execFileSync (node:child_process:926:15)
11.17     at validateBinaryVersion (/workspace/frontend/node_modules/esbuild/install.js:99:28)
11.17     at /workspace/frontend/node_modules/esbuild/install.js:284:5 {
11.17   status: 1,
11.17   signal: null,
11.17   output: [
11.17     null,
11.17     Buffer(0) [Uint8Array] [],
11.17     Buffer(1157) [Uint8Array] [
11.17       110, 111, 100, 101,  58,  99, 104, 105, 108, 100,  95, 112,
11.17       114, 111,  99, 101, 115, 115,  58,  57,  50,  57,  10,  32,
11.17        32,  32,  32, 116, 104, 114, 111, 119,  32, 101, 114, 114,
11.17        59,  10,  32,  32,  32,  32,  94,  10,  10,  60, 114, 101,
11.17       102,  32,  42,  49,  62,  32,  69, 114, 114, 111, 114,  58,
11.17        32, 115, 112,  97, 119, 110,  83, 121, 110,  99,  32,  47,
11.17       119, 111, 114, 107, 115, 112,  97,  99, 101,  47, 102, 114,
11.17       111, 110, 116, 101, 110, 100,  47, 110, 111, 100, 101,  95,
11.17       109, 111, 100, 117,
11.17       ... 1057 more items
11.17     ]
11.17   ],
11.17   pid: 153,
11.17   stdout: Buffer(0) [Uint8Array] [],
11.17   stderr: Buffer(1157) [Uint8Array] [
11.17     110, 111, 100, 101,  58,  99, 104, 105, 108, 100,  95, 112,
11.17     114, 111,  99, 101, 115, 115,  58,  57,  50,  57,  10,  32,
11.17      32,  32,  32, 116, 104, 114, 111, 119,  32, 101, 114, 114,
11.17      59,  10,  32,  32,  32,  32,  94,  10,  10,  60, 114, 101,
11.17     102,  32,  42,  49,  62,  32,  69, 114, 114, 111, 114,  58,
11.17      32, 115, 112,  97, 119, 110,  83, 121, 110,  99,  32,  47,
11.17     119, 111, 114, 107, 115, 112,  97,  99, 101,  47, 102, 114,
11.17     111, 110, 116, 101, 110, 100,  47, 110, 111, 100, 101,  95,
11.17     109, 111, 100, 117,
11.17     ... 1057 more items
11.17   ]
11.17 }
11.17 
11.17 Node.js v20.11.0
11.17 info Visit https://yarnpkg.com/en/docs/cli/install for documentation about this command.
------
Dockerfile:61
--------------------
  59 |     
  60 |     # Install frontend dependencies and build it for production (into the frontend/dist folder)
  61 | >>> RUN cd frontend && yarn install && yarn build
  62 |     
  63 |     # Container start script
--------------------
ERROR: failed to solve: process "/bin/sh -c cd frontend && yarn install && yarn build" did not complete successfully: exit code: 1

[Core Structure Change]: branch:frontend is a failed attempt to docker-compose the frontend with the backend

[Core Structure Change]

Following the prescription in Developing a Single Page App with FastAPI and Vue.js, I created a docker-compose setup with services/backend and services/frontend.

This was the nominal solution to Issue #3, but the work is incomplete.

If anyone knows better how to compose the Vue 3 frontend with the FastAPI+Haystack backend, please check it out.

Note that the error is very likely in services/frontend/Dockerfile or in how the frontend volumes are configured in the docker-compose.yml file.

[Bug]: [EmbeddingDE] docker run produces ImportError: 'send_event' & 'haystack.telemetry'

In Branch: feature/EmbeddingDE

After introducing the new FAISSDocumentStore (for SQLite) and the jina.ai embedding for DE+EN embeddings, docker run began to flag the error:

ImportError: cannot import name 'send_event' from 'haystack.telemetry'

This error started early in the process, but I could not discover how or why.

Following the guidance from deepset-ai/haystack#6652, I began to turn off farm-haystack features and/or haystack-ai in the Dockerfile.

Later, I attempted to forcibly install Python 3.8 or Python 3.9, both suggested by the Haystack issue #6652 above.

These solutions may still work, but my implementation did not fix the error.

Support overwriting model via environment


Description

We can already set the model using the OLLAMA_MODEL_NAME environment variable.
However, if we change the model string at docker run time, the backend fails, because the model was never pulled.

This ticket is about fixing that by pulling the model in start.sh as well, not only in the Dockerfile.

This will also allow us to use small models by default on dev systems while using a different model in production.
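A minimal sketch of what the start.sh change could look like; the default model tag is an illustrative assumption, not the repo's actual default:

```shell
# Hypothetical sketch for start.sh: resolve the model name from the
# environment (with a fallback default) and pull it before the backend
# starts, so an OLLAMA_MODEL_NAME override at `docker run` time works
# even if that model was not pulled during the image build.
MODEL="${OLLAMA_MODEL_NAME:-phi}"
echo "Using Ollama model: $MODEL"
# `ollama pull` is a no-op when the model is already present locally.
if command -v ollama >/dev/null 2>&1; then
  ollama pull "$MODEL"
fi
```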

Contribution

No response
