
docsgpt's Introduction

DocsGPT 🦖

Open-Source Documentation Assistant

DocsGPT is a cutting-edge open-source solution that streamlines finding information in project documentation. By integrating powerful GPT models, it lets developers easily ask questions about a project and receive accurate answers.

Say goodbye to time-consuming manual searches, and let DocsGPT help you quickly find the information you need. Try it out and see how it revolutionizes your project documentation experience. Contribute to its development and be a part of the future of AI-powered assistance.


Production Support / Help for Companies:

We're eager to provide personalized assistance when deploying your DocsGPT to a live environment.

(Video demo of DocsGPT)

Roadmap

You can find our roadmap here. Please don't hesitate to contribute or create issues; it helps us improve DocsGPT!

Our Open-Source Models Optimized for DocsGPT:

Name               | Base Model  | Requirements (or similar)
Docsgpt-7b-mistral | Mistral-7b  | 1x A10G GPU
Docsgpt-14b        | llama-2-14b | 2x A10 GPUs
Docsgpt-40b-falcon | falcon-40b  | 8x A10G GPUs

If you don't have enough resources to run a model, you can use bitsandbytes to quantize it.
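
For example, here is a minimal sketch of 4-bit quantization through the Hugging Face transformers integration of bitsandbytes; the model id and the exact settings are illustrative assumptions, not the project's documented procedure:

    # Load a DocsGPT model with 4-bit quantized weights to cut GPU memory use.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,                     # store weights in 4 bits
        bnb_4bit_compute_dtype=torch.float16,  # run matmuls in fp16
    )

    model_name = "Arc53/docsgpt-7b-mistral"  # assumed Hugging Face repo id
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=quant_config,
        device_map="auto",  # spread layers across available devices
    )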

Features

(Image: overview of DocsGPT's six main features)


Project Structure

  • Application - Flask app (main application).

  • Extensions - Chrome extension.

  • Scripts - Script that creates the similarity-search index for other libraries.

  • Frontend - Frontend uses Vite and React.

QuickStart

Note

Make sure you have Docker installed

On macOS or Linux, run:

./setup.sh

It will install all the dependencies and let you choose whether to download the local model, use OpenAI, or use our LLM API.

Otherwise, refer to this guide:

  1. Clone the repository: git clone https://github.com/arc53/DocsGPT.git

  2. Create a .env file in the repository root and set the environment variables, including VITE_API_STREAMING set to true or false depending on whether you want streaming answers. It should look like this inside:

    LLM_NAME=[docsgpt or openai or others] 
    VITE_API_STREAMING=true
    API_KEY=[if LLM_NAME is openai]
    

    See optional environment variables in the /.env-template and /application/.env_sample files.

  3. Run ./run-with-docker-compose.sh.

  4. Navigate to http://localhost:5173/.

To stop, just press Ctrl + C.

Development Environments

Spin up Mongo and Redis

For development, only two of the containers from docker-compose.yaml are used (all services except Redis and Mongo are removed). See docker-compose-dev.yaml.

Run

docker compose -f docker-compose-dev.yaml build
docker compose -f docker-compose-dev.yaml up -d

Run the Backend

Note

Make sure you have Python 3.10 or 3.11 installed.

  1. Export the required environment variables or prepare a .env file in the project folder:

     (Check out application/core/settings.py if you want to see more config options.)

  2. (optional) Create a Python virtual environment: you can follow the Python official documentation for virtual environments.

     a) On macOS and Linux

     python -m venv venv
     . venv/bin/activate

     b) On Windows

     python -m venv venv
     venv/Scripts/activate

  3. Download the embedding model and save it in the model/ folder: you can use the script below, or download it manually from here, unzip it, and save it in the model/ folder. (A quick load check is sketched after this list.)

     wget https://d3dg1063dc54p9.cloudfront.net/models/embeddings/mpnet-base-v2.zip
     unzip mpnet-base-v2.zip -d model
     rm mpnet-base-v2.zip

  4. Install dependencies for the backend:

     pip install -r application/requirements.txt

  5. Run the app with flask --app application/app.py run --host=0.0.0.0 --port=7091.

  6. Start the worker with celery -A application.app.celery worker -l INFO.
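
To verify step 3, here is a minimal sketch that loads the unpacked model and embeds one string; it assumes the archive extracts a sentence-transformers model directly into model/ (the exact folder layout depends on the zip):

    # Load the local embedding model and embed one string as a smoke test.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("model/")  # path from the unzip step above
    vec = model.encode("hello docs")
    print(len(vec))  # mpnet-base-v2 embeddings are 768-dimensional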

Start Frontend

Note

Make sure you have Node version 16 or higher.

  1. Navigate to the /frontend folder.

  2. Install the required packages husky and vite (skip if already installed):

     npm install husky -g
     npm install vite -g

  3. Install dependencies by running npm install --include=dev.

  4. Run the app with npm run dev.

Contributing

Please refer to the CONTRIBUTING.md file for information about how to get involved. We welcome issues, questions, and pull requests.

Code Of Conduct

We as members, contributors, and leaders, pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation. Please refer to the CODE_OF_CONDUCT.md file for more information about contributing.

Many Thanks To Our Contributors ⚡

Contributors

License

The source code license is MIT, as described in the LICENSE file.

Built with 🐦🔗 LangChain


docsgpt's Issues

openai.error.InvalidRequestError when using custom documentation

I followed the instructions to train on my own documents, but the generated index files were named "index.faiss" and "index.pkl" instead of "docs.index" and "faiss_store.pkl". After copying the generated files to the application directory and running app.py, I encountered the following error message when asking questions:

Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 2548, in call
return self.wsgi_app(environ, start_response)
File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 2528, in wsgi_app
response = self.handle_exception(e)
File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 2525, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1822, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1820, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1796, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "/Users/liam.mazy/Developer/DocsGPT/application/app.py", line 129, in api_answer
result = chain({"query": question})
File "/usr/local/lib/python3.10/site-packages/langchain/chains/base.py", line 142, in call
raise e
File "/usr/local/lib/python3.10/site-packages/langchain/chains/base.py", line 139, in call
outputs = self._call(inputs)
File "/usr/local/lib/python3.10/site-packages/langchain/chains/vector_db_qa/base.py", line 153, in _call
answer, _ = self.combine_documents_chain.combine_docs(docs, question=question)
File "/usr/local/lib/python3.10/site-packages/langchain/chains/combine_documents/map_reduce.py", line 139, in combine_docs
results = self.llm_chain.apply(
File "/usr/local/lib/python3.10/site-packages/langchain/chains/llm.py", line 91, in apply
response = self.generate(input_list)
File "/usr/local/lib/python3.10/site-packages/langchain/chains/llm.py", line 59, in generate
response = self.llm.generate(prompts, stop=stop)
File "/usr/local/lib/python3.10/site-packages/langchain/llms/base.py", line 128, in generate
raise e
File "/usr/local/lib/python3.10/site-packages/langchain/llms/base.py", line 125, in generate
output = self._generate(prompts, stop=stop)
File "/usr/local/lib/python3.10/site-packages/langchain/llms/openai.py", line 217, in _generate
response = self.completion_with_retry(prompt=_prompts, **params)
File "/usr/local/lib/python3.10/site-packages/langchain/llms/openai.py", line 178, in completion_with_retry
return _completion_with_retry(**kwargs)
File "/usr/local/lib/python3.10/site-packages/tenacity/init.py", line 289, in wrapped_f
return self(f, *args, **kw)
File "/usr/local/lib/python3.10/site-packages/tenacity/init.py", line 379, in call
do = self.iter(retry_state=retry_state)
File "/usr/local/lib/python3.10/site-packages/tenacity/init.py", line 314, in iter
return fut.result()
File "/usr/local/Cellar/[email protected]/3.10.9/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/usr/local/Cellar/[email protected]/3.10.9/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/usr/local/lib/python3.10/site-packages/tenacity/init.py", line 382, in call
result = fn(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/langchain/llms/openai.py", line 176, in _completion_with_retry
return self.client.create(**kwargs)
File "/usr/local/lib/python3.10/site-packages/openai/api_resources/completion.py", line 25, in create
return super().create(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create
response, _, api_key = requestor.request(
File "/usr/local/lib/python3.10/site-packages/openai/api_requestor.py", line 227, in request
resp, got_stream = self._interpret_response(result, stream)
File "/usr/local/lib/python3.10/site-packages/openai/api_requestor.py", line 620, in _interpret_response
self._interpret_response_line(
File "/usr/local/lib/python3.10/site-packages/openai/api_requestor.py", line 680, in _interpret_response_line
raise self.handle_error_response(
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 8745 tokens (8489 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.

It seems that the application is not recognizing my custom index file. I'm not sure if I missed any steps or if there is a problem with the instructions.
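
For anyone hitting the file-name mismatch: a sketch of the two naming conventions, assuming a recent LangChain. save_local/load_local work with index.faiss plus index.pkl in a folder, while the older scripts wrote docs.index and faiss_store.pkl by hand:

    # Newer LangChain convention: one folder holding index.faiss + index.pkl.
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import FAISS

    store = FAISS.from_texts(["some documentation text"], OpenAIEmbeddings())
    store.save_local("outputs")  # writes outputs/index.faiss and outputs/index.pkl
    store = FAISS.load_local("outputs", OpenAIEmbeddings())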

Getting rate limited by OpenAI on ingestion

Getting the following error when ingesting my documentation:

openai.error.RateLimitError: Rate limit reached for default-global-with-image-limits in organization org-Wge... on requests per min. Limit: 60.000000 / min. Current: 70.000000 / min. Contact [email protected] if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing to add a payment method.

Additional context:

Number of Tokens = 538,902

I can't find any way to specify a backoff. I have a paid account, and apparently the rate limit is raised over time. However, I would like to be able to specify a rate limit.
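
One possible client-side workaround, sketched with tenacity (which the traceback in the first issue shows LangChain already uses); the batch function and its retry settings are hypothetical:

    # Retry embedding calls with exponential backoff when OpenAI rate-limits us.
    import openai
    from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

    @retry(
        retry=retry_if_exception_type(openai.error.RateLimitError),
        wait=wait_exponential(multiplier=1, min=4, max=60),  # 4s, 8s, ... capped at 60s
        stop=stop_after_attempt(8),                          # give up after 8 tries
    )
    def embed_batch(store, texts, metadatas):
        store.add_texts(texts, metadatas=metadatas)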

List index out of range when executing ingest_rst.py

After adding a file to /scripts/inputs I get the following error when running ingest_rst.py. Any ideas?

Traceback (most recent call last):
  File "/Users/ivan/Documents/dev/DocsGPT/scripts/ingest_rst.py", line 85, in <module>
    get_user_permission()
  File "/Users/ivan/Documents/dev/DocsGPT/scripts/ingest_rst.py", line 38, in get_user_permission
    call_openai_api()
  File "/Users/ivan/Documents/dev/DocsGPT/scripts/ingest_rst.py", line 21, in call_openai_api
    store = FAISS.from_texts(docs, OpenAIEmbeddings(), metadatas=metadatas)
  File "/Users/ivan/.local/lib/python3.8/site-packages/langchain/vectorstores/faiss.py", line 192, in from_texts
    index = faiss.IndexFlatL2(len(embeddings[0]))
IndexError: list index out of range
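
A hedged guess at the cause: if no input files match the extensions the ingest script expects, docs ends up empty, so embeddings is an empty list and embeddings[0] raises IndexError. A guard just before the FAISS.from_texts call in ingest_rst.py would make that failure explicit:

    # Fail fast with a clear message instead of an IndexError deep inside FAISS.
    if not docs:
        raise SystemExit("No documents were parsed from scripts/inputs; "
                         "check which file extensions the ingest script expects.")
    store = FAISS.from_texts(docs, OpenAIEmbeddings(), metadatas=metadatas)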

Multiple Chats for Different Source Documentation

As a user of DocsGPT, I would like the ability to have multiple chats for different source documentation, so that the app can remember what was discussed earlier in a chat and provide better subsequent answers. This would improve the user experience by giving the app more context about what the user is talking about, much like how separate chats work in ChatGPT.

Proposed Solution:

  • Implement a feature that allows the user to create multiple chats for different source documentation.
  • Store the history of each chat so that the app can provide better subsequent answers.
  • Update the UI/UX to make it easy for the user to switch between different chats and source documentation.

Support for remote info stores like websites, Confluence, SharePoint, etc.

Or must all files exist locally?

In the real world of large enterprises, there's a Confluence server, a Jira server, and a SharePoint server that typically reside in a data center or as SaaS cloud services, plus some on-prem custom HTML, Excel files, etc.

It would be great to have a ChatGPT-style ingest/search across these remote systems. How feasible is this?
Thanks

Extract out common modal functionalities to a generic modal component

Both of the existing modal components have a few things in common:
1] They both overlay the existing page, and both have a similar CSS design (rounded corners, shadows).
2] Both share some behavior as well: both go away when "save" is clicked. For now only one of them has a cancellation feature, but as we add more modals this too can become common functionality.

These things should be moved to a generic component. This modal would accept:
1] a save click handler as a prop,
2] an isCancellable boolean prop to decide whether to enable the cancellation feature,
3] the actual view to render, as a render prop.

Code parsing

Make sure we can ingest code well

JavaScript
Java
Python

Support for other LLMs via LangChain

Show how to switch in documentation

  • Test on different embeddings and a different LLM (e.g. Hugging Face embeddings and a Cohere LLM)
  • Make sure different indexes can be used
  • Add all old vectorstores to the new format in DocsHUB
  • Create documentation that shows how to switch between LLMs
  • Handle different embeddings in the frontend
  • Add an embeddings path on DocsHUB (make it backwards compatible)

Stream of warnings about chunk size longer than the specified 1500

Ingestion of a large text file emits a stream of warnings similar to the following:

Created a chunk of size 2799, which is longer than the specified 1500

I'm not sure what this means (what the implications are) or what I should do about it.

I read the comment in the source code that says:

# We do this due to the context limits of the LLMs.

Does this mean that uploading this data set will be a waste of money - that it won't work?
Is there something that I can do to mitigate this?

A more helpful message - maybe even with a link to more info - would be great.
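
For context, a sketch of where the warning likely comes from, assuming LangChain's CharacterTextSplitter is in use: it only splits on its separator, so any separator-free span longer than chunk_size becomes a single oversized chunk. RecursiveCharacterTextSplitter falls back through smaller separators and avoids the warning:

    # Compare the two splitters on the same text.
    from langchain.text_splitter import (CharacterTextSplitter,
                                         RecursiveCharacterTextSplitter)

    long_text = "one paragraph with no blank lines " * 200  # well over 1500 chars

    # Warns: "Created a chunk of size N, which is longer than the specified 1500".
    chunks = CharacterTextSplitter(separator="\n\n", chunk_size=1500,
                                   chunk_overlap=0).split_text(long_text)

    # Falls back through "\n\n", "\n", " ", "" so chunks respect the limit.
    chunks = RecursiveCharacterTextSplitter(chunk_size=1500,
                                            chunk_overlap=100).split_text(long_text)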

Windows issue with loading pickle

Discussed in #5

Originally posted by ClayPippenger February 5, 2023
I was not able to get the sample working on Windows due to this error:

(screenshot: the pickle loading error)

Most of what I found searching for this error was unrelated. The way I was able to fix it was by editing a related library:

(screenshot: the code change in the library)

I just commented out the three highlighted lines and saved the file. Obviously, this is not an elegant solution, but just to get people working, I thought this may be adequate for now.

This is the test code working now:

(screenshot: the test code running)

Hope this helps someone or improves the setup documentation.

Cite sources

Add a sources section to every reply by the AI in the UI.
The AI already generates sources; you only need to add them to the frontend. The required data is already in the fetch request.

Encapsulate the decision of which view to render (mobile or desktop) within the component.

As per the current code, we maintain the view data as state in the parent component and pass it down to children via props, which can get out of hand pretty quickly as soon as many components have separate views for mobile and desktop.

Instead, we could let each component make its own independent decision about whether and when to switch between mobile and desktop views by encapsulating the decision within the component.

Since this logic is shared across multiple components, it can be moved into a custom hook that multiple components can then use.

`KeyError: 'source'` when using a custom doc

Hi! I followed this guide to train the AI with my own documentation, which is a .md file.

I changed the .rst extensions I saw in ingest_rst.py and ingest_rst_sphinx.py to .md, and that successfully generated the docs.index and faiss_store.pkl files, which I moved to the root of /application.

However, when using the Default option in the dropdown of the UI, I get a 500 response that says KeyError: 'source'.
Am I missing something?
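
A hedged sketch of the likely cause: the app reads doc.metadata["source"] for each retrieved chunk, so every ingested text needs a "source" entry in its metadata (where exactly the app performs this lookup is an assumption here):

    # Attach a "source" metadata entry to every chunk before indexing.
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import FAISS

    docs = ["chunk one of my markdown docs", "chunk two"]
    metadatas = [{"source": "docs/my-file.md"} for _ in docs]  # one per chunk
    store = FAISS.from_texts(docs, OpenAIEmbeddings(), metadatas=metadatas)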

Error training on new docs

Getting this error out of the gate when trying to train on a single new .rst file. Any thoughts?

Traceback (most recent call last):
  File "/home/pshriwise/soft/doc-gpt/docsgpt/scripts/ingest_rst.py", line 36, in <module>
    store = FAISS.from_texts(docs, OpenAIEmbeddings(), metadatas=metadatas)
  File "/home/pshriwise/.pyenv/versions/3.9.1/lib/python3.9/site-packages/langchain/vectorstores/faiss.py", line 192, in from_texts
    index = faiss.IndexFlatL2(len(embeddings[0]))
IndexError: list index out of range

Develop a community website to centralize pre-trained docs

Hi everyone,

Just landed on this project, and I can already see the huge impact it could have on software development!

The idea would be to develop a community website where we could search for stores to download and import directly into DocsGPT. This would save people from paying fees when documentation for a given version has already been ingested by somebody else.

Another idea would be to integrate this community list directly within the app. Or DocsGPT could be hosted online directly.

In-conversation memory

This should be done in two parts:

  1. Pass the history with each API request.
  2. Add a history section to the prompt template, limited to the last two questions and answers (a minimal sketch follows below).
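
A minimal sketch of part 2, assuming a plain string prompt template; all names and formatting here are illustrative:

    # Build a prompt that includes at most the last two Q/A pairs of history.
    def build_prompt(question, history, context):
        recent = history[-2:]  # cap the window at two question/answer pairs
        history_block = "\n".join(f"Q: {q}\nA: {a}" for q, a in recent)
        return (f"Context:\n{context}\n\n"
                f"Previous conversation:\n{history_block}\n\n"
                f"Question: {question}\nAnswer:")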

Improve modal language

The modal text should be updated to remove acronyms that non-developers might not understand.

Increase training stability and avoid timeouts caused by the OpenAI API rate limit

  • For users without pro accounts, requests to the OpenAI API frequently time out due to the calling rate limit.
  • We could add a retry policy to increase stability:

    import pickle
    import time

    import faiss
    from tqdm import tqdm

    c1 = 0  # how many documents have been embedded so far
    for i in tqdm(docs, desc="Embedding 🦖", unit="docs", total=len(docs),
                  bar_format='{l_bar}{bar}| Time Left: {remaining}'):
        try:
            store.add_texts([i.page_content], metadatas=[i.metadata])
        except Exception as e:
            print(e)
            print("Error on ", i)
            print("Saving progress")
            print(f"stopped at {c1} out of {len(docs)}")
            # Persist the index and the store so progress survives a crash.
            faiss.write_index(store.index, "docs.index")
            store_index_bak = store.index
            store.index = None  # the raw FAISS index is not picklable
            with open("faiss_store.pkl", "wb") as f:
                pickle.dump(store, f)
            print("Sleeping for 60 seconds and trying again")
            time.sleep(60)
            store.index = store_index_bak  # restore the index and retry once
            store.add_texts([i.page_content], metadatas=[i.metadata])
        c1 += 1

OpenAPI3 (Swagger) parsing

OpenAPI 3 (Swagger) parsing of the JSON file would be extremely helpful, as such docs are commonly published.
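
A rough sketch of how such a spec could be flattened into text chunks for ingestion; the field names follow the OpenAPI 3 specification, everything else is illustrative:

    # Turn each operation in an OpenAPI 3 JSON file into one ingestible line.
    import json

    with open("openapi.json") as f:
        spec = json.load(f)

    chunks = []
    for path, methods in spec.get("paths", {}).items():
        for method, op in methods.items():
            if isinstance(op, dict):  # skip path-level keys like "parameters"
                chunks.append(f"{method.upper()} {path}: {op.get('summary', '')}")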

HTML parsing

In /scripts/parser, add HTML parsing; you can use LangChain's module.
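
For instance, a sketch using one of LangChain's HTML loaders; BSHTMLLoader needs beautifulsoup4 installed, and the input path here is hypothetical:

    # Parse an HTML file into LangChain Documents (text plus metadata).
    from langchain.document_loaders import BSHTMLLoader

    docs = BSHTMLLoader("scripts/inputs/page.html").load()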
