tsinghuadatabasegroup / db-gpt Goto Github PK

View Code? Open in Web Editor NEW

438.0 9.0 60.0 365.21 MB

An LLM Based Diagnosis System (https://arxiv.org/pdf/2312.01454.pdf)

Home Page: http://dbgpt.dbmind.cn/

License: Apache License 2.0

Python 79.29% Shell 0.14% JavaScript 6.02% HTML 0.15% Vue 11.08% CSS 0.15% SCSS 2.99% TypeScript 0.17% Dockerfile 0.01%

dba diagnosis tuning database

db-gpt's Introduction

LLM As Database Administrator

Demo • QuickStart • Alerts And Anomalies • Knowledge And Tools • Dockers • FAQ • Community • Citation • Contributors

👫 Join Us on WeChat! 🏆 Top 100 Open Project! 🌟 VLDB 2024!

【English | 中文】

🦾 Build your personal database administrator (D-Bot)🧑‍💻, which is good at solving database problems by reading documents, using various tools, writing analysis reports! Undergoing An Upgrade!

🗺 Online Demo

After launching the local service (adopting frontend and configs from Chatchat), you can easily import documents into the knowledge base, utilize the knowledge base for well-founded Q&A and diagnosis analysis of abnormal alarms.

With the user feedback function 🔗, you can (1) send feedbacks to make D-Bot follow and refine the intermediate diagnosis results, and (2) edit the diagnosis result by clicking the “Edit” button. D-Bot can accumulate refinement patterns from the user feedbacks (stored in vector database) and adaptively align to user's diagnosis preference.

On the online website (http://dbgpt.dbmind.cn), you can browse all historical diagnosis results, used metrics, and detailed diagnosis processes.

Old Version 1: [Gradio for Diag Game] (no langchain)

Old Version 2: [Vue for Report Replay] (no langchain)

📰 Updates

This project is evolving with new features 👫👫
Don't forget to star ⭐ and watch 👀 to stay up to date :)

🕹 QuickStart

1. Environment Setup

1.1 backend setup

First, ensure that your machine has Python (>= 3.10) installed.

$ python --version
Python 3.10.12

Next, create a virtual environment and install the dependencies for the project within it.

# Clone the repository
$ git clone https://github.com/TsinghuaDatabaseGroup/DB-GPT.git

# Enter the directory
$ cd DB-GPT

# Install all dependencies
$ pip3 install -r requirements.txt 
$ pip3 install -r requirements_api.txt # If only running the API, you can just install the API dependencies, please use requirements_api.txt

# Default dependencies include the basic runtime environment (Chroma-DB vector library). If you want to use other vector libraries, please uncomment the respective dependencies in requirements.txt before installation.

If fail to install google-colab, try conda install -c conda-forge google-colab

PostgreSQL v12 (We have developed and tested based on PostgreSQL v12, we do not guarantee compatibility with other versions of PostgreSQL)

Ensure your database supports remote connections (link)

Moreover, install extensions like pg_stat_statements (track frequent queries), pg_hint_plan (optimize physical operators), and hypopg (create hypothetical indexes).

Note pg_stat_statements accumulates query statistics over time. Therefore, you need to regularly clear the statistics: 1) to discard all statistics, execute "SELECT pg_stat_statements_reset();"; 2) to discard statistics for a specific query, execute "SELECT pg_stat_statements_reset(userid, dbid, queryid);".
(optional) If you need to run this project locally or in an offline environment, you first need to download the required models to your local machine and then correctly adapt some configurations.

Download the model parameters of Sentence Trasformer

Create a new directory ./multiagents/localized_llms/sentence_embedding/

Place the downloaded sentence-transformer.zip in the ./multiagents/localized_llms/sentence_embedding/ directory; unzip the archive.

Download LLM and embedding models from HuggingFace.

To download models, first install Git LFS, then run

$ git lfs install
$ git clone https://huggingface.co/moka-ai/m3e-base
$ git clone https://huggingface.co/Qwen/Qwen-1_8B-Chat

Adapt the model configuration to the download model paths, e.g.,

EMBEDDING_MODEL = "m3e-base"
LLM_MODELS = ["Qwen-1_8B-Chat"]
MODEL_PATH = {
    "embed_model": {
        "m3e-base": "m3e-base", # Download path of embedding model.
    },

    "llm_model": {
        "Qwen-1_8B-Chat": "Qwen-1_8B-Chat", # Download path of LLM.
    },
}

Download and config localized LLMs.

1.2 frontend setup

Ensure that your machine has Node (>= 18.15.0)

$ node -v
v18.15.0

Install pnpm and dependencies

cd webui
# pnpm address https://pnpm.io/zh/motivation
# install dependency(Recommend use pnpm)
# you can  use "npm -g i pnpm" to install pnpm 
pnpm install

2. Initialize Knowledge Base and Configuration Files

Copy the configuration files

$ python copy_config_example.py
# The generated configuration files are in the configs/ directory
# basic_config.py is the basic configuration file, no modification needed
# diagnose_config.py is the diagnostic configuration file, needs to be modified according to your environment.
# kb_config.py is the knowledge base configuration file, you can modify DEFAULT_VS_TYPE to specify the storage vector library of the knowledge base, or modify related paths.
# model_config.py is the model configuration file, you can modify LLM_MODELS to specify the model used, the current model configuration is mainly for knowledge base search, diagnostic related models are still hardcoded in the code, they will be unified here later.
# prompt_config.py is the prompt configuration file, mainly for LLM dialogue and knowledge base prompts.
# server_config.py is the server configuration file, mainly for server port numbers, etc.

!!! Attention, please modify the following configurations before initializing the knowledge base, otherwise, it may cause the database initialization to fail.

model_config.py

# EMBEDDING_MODEL   Vectorization model, if choosing a local model, it needs to be downloaded to the root directory as required.
# LLM_MODELS        LLM, if choosing a local model, it needs to be downloaded to the root directory as required.
# ONLINE_LLM_MODEL  If using an online model, you need to modify the configuration.

server_config.py

# WEBUI_SERVER.api_base_url   Pay attention to this parameter, if deploying the project on a server, then you need to modify the configuration.

In diagnose_config.py, we set config.yaml as the default LLM expert configuration file.

DIAGNOSTIC_CONFIG_FILE = "config.yaml"

To enable interactive diagnosis refinement with user feedbacks, you can set

DIAGNOSTIC_CONFIG_FILE = "config_feedback.yaml"

To enable diagnosis in Chinese with Qwen, you can set

DIAGNOSTIC_CONFIG_FILE = "config_qwen.yaml"

Initialize the knowledge base

$ python init_database.py --recreate-vs

3. One-click Start

Start the project with the following commands

$ python startup.py -a

4. Launch Interface Examples

If started correctly, you will see the following interface

FastAPI Docs Interface

Web UI Launch Interface Examples:

Web UI Knowledge Base Management Page：

Web UI Conversation Interface:

Web UI UI Diagnostic Page：

👩🏻‍⚕️ Anomaly Diagnosis

1. Prerequisites

Save time by trying out the docker deployment.

(optional) Enable slow query log in PostgreSQL (link)

(1) For "systemctl restart postgresql", the service name can be different (e.g., postgresql-12.service);

(2) Use absolute log path name like "log_directory = '/var/lib/pgsql/12/data/log'";

(3) Set "log_line_prefix = '%m [%p] [%d]'" in postgresql.conf (to record the database names of different queries).
(optional) Prometheus

Check prometheus.md for detailed installation guides.

2. Test typical cases

We put multiple test cases under the test_case folder. You can select a case file on the front-end page for diagnosis or use the command line.

python3 run_diagnose.py --anomaly_file ./test_cases/testing_cases_5.json --config_file config.yaml

🎩 Alerts And Anomalies

Alert Management

Check out how to deploy prometheus and alertmanager in prometheus_service_docker.

You can also choose to quickly put your hands on by using our docker (docker deployment)

Anomaly Simulation

Script-Triggered Anomalies

We provide scripts that trigger typical anomalies (anomalies directory) using highly concurrent operations (e.g., inserts, deletes, updates) in combination with specific test benches.

Single Root Cause Anomalies:

Execute the following command to trigger a single type of anomaly with customized parameters:

python anomaly_trigger/main.py --anomaly MISSING_INDEXES --threads 100 --ncolumn 20 --colsize 100 --nrow 20000

Parameters:

--anomaly: Specifies the type of anomaly to trigger.
--threads: Sets the number of concurrent clients.
--ncolumn: Defines the number of columns.
--colsize: Determines the size of each column (in bytes).
--nrow: Indicates the number of rows.

Multiple Root Cause Anomalies:

To trigger anomalies caused by multiple factors, use the following command:

python anomaly_trigger/multi_anomalies.py

Modify the script as needed to simulate different types of anomalies.

Root Cause	Description	Potential Alerts
INSERT_LARGE_DATA	Long execution time for large data insert
FETCH_LARGE_DATA	Long execution time for large data fetch
REDUNDANT_INDEX	Unnecessary and redundant indexes in tables
VACUUM	Unused space caused by data modifications
POOR_JOIN_PERFORMANCE	Poor performance of join operators
CORRELATED_SUBQUERY	Non-promotable subqueries in SQL statements	,
LOCK_CONTENTION	Lock contention issues
CPU_CONTENTION	Severe CPU resource contention
IO_CONTENTION	IO resource contention affecting SQL performance
COMMIT_CONTENTION	Highly concurrent commits affecting SQL execution
SMALL_MEMORY_ALLOC	Too small allocated memory space

Check detailed use cases at http://dbgpt.dbmind.cn.

Manually Designed Anomalies

Click to check 29 typical anomalies together with expert analysis (supported by the DBMind team)

📎 Customize Knowledge And Tools

1. Knowledge Extraction

(Basic version by Zui Chen)

(1) If you only need simple document splitting, you can directly use the document import function in the "Knowledge Base Management Page".

(2) We require the document itself to have chapter format information, and currently only support the docx format.

Step 1. Configure the ROOT_DIR_NAME path in ./doc2knowledge/doc_to_section.py and store all docx format documents in ROOT_DIR_NAME.

Step 2. Configure OPENAI_KEY.

export OPENAI_API_KEY=XXXXX

Step 3. Split the document into separate chapter files by chapter index.

cd doc2knowledge/
python doc_to_section.py

Step 4. Modify parameters in the doc2knowledge.py script and run the script:

python doc2knowledge.py

Step 5. With the extracted knowledge, you can visualize their clustering results:

python knowledge_clustering.py

2. Tool Preparation

Tool APIs (for optimization)

Module Functions

index_selection (equipped) heuristic algorithm

query_rewrite (equipped) 45 rules

physical_hint (equipped) 15 parameters

For functions within [query_rewrite, physical_hint], you can use api_test.py script to verify the effectiveness.

If the function actually works, append it to the api.py of corresponding module.

Module	Functions
index_selection (equipped)	heuristic algorithm
query_rewrite (equipped)	45 rules
physical_hint (equipped)	15 parameters

Index Advisor Tool

We utilize db2advis heuristic algorithm to recommend indexes for given workloads. The function api is optimize_index_selection.

🐳 Docker Start

You can use docker for a quick and safe use of the monitoring platform and database.

1. Install Docker and Docker-Compose

Refer to tutorials (e.g., on CentOS) for installing Docker and Docoker-Compose.

2. Start service

We use docker-compose to build and manage multiple dockers for metric monitoring (prometheus), alert (alertmanager), database (postgres_db), and alert recording (python_app).

cd prometheus_service_docker
docker-compose  -p prometheus_service  -f docker-compose.yml up --build

Next time starting the prometheus_service, you can directly execute "docker-compose -p prometheus_service -f docker-compose.yml up" without building the dockers.

3. Run anomaly files and generate new alerts

Configure the settings in anomaly_trigger/utils/database.py (e.g., replace "host" with the IP address of the server) and execute an anomaly generation command, like:

cd anomaly_trigger
python3 main.py --anomaly MISSING_INDEXES --threads 100 --ncolumn 20 --colsize 100 --nrow 20000

You may need to modify the arugment values like "--threads 100" if no alert is recorded after execution.

After receiving a request sent to http://127.0.0.1:8023/alert from prometheus_service, the alert summary will be recorded in prometheus_and_db_docker/alert_history.txt, like:

This way, you can use the alert marked as `resolved' as a new anomaly (under the ./diagnostic_files directory) for diagnosis by d-bot.

💁 FAQ

🤨 The '.sh' script command cannot be executed on windows system.

Switch the shell to *git bash* or use *git bash* to execute the '.sh' script.

🤨 "No module named 'xxx'" on windows system.

This error is caused by issues with the Python runtime environment path. You need to perform the following steps:

Step 1: Check Environment Variables.

You must configure the "Scripts" in the environment variables.

Step 2: Check IDE Settings.

For VS Code, download the Python extension for code. For PyCharm, specify the Python version for the current project.

⏱ Todo

~~Project cleaning~~
~~Support more anomalies~~
~~Support more knowledge sources~~
~~Query log option (potential to take up disk space and we need to consider it carefully)~~
~~Add more communication mechanisms~~
~~Prometheus-as-a-Service~~
Localized model that reaches D-bot(gpt4)'s capability
Support other databases (e.g., mysql/redis)

👫 Community

🤗 Relevant Projects

https://github.com/OpenBMB/AgentVerse

https://github.com/Vonng/pigsty

https://github.com/UKPLab/sentence-transformers

https://github.com/chatchat-space/Langchain-Chatchat

https://github.com/shreyashankar/spade-experiments

📒 Citation

Feel free to cite us (paper link) if you like this project.

@misc{zhou2023llm4diag,
      title={D-Bot: Database Diagnosis System using Large Language Models}, 
      author={Xuanhe Zhou, Guoliang Li, Zhaoyan Sun, Zhiyuan Liu, Weize Chen, Jianming Wu, Jiesi Liu, Ruohang Feng, Guoyang Zeng},
      year={2023},
      eprint={2312.01454},
      archivePrefix={arXiv},
      primaryClass={cs.DB}
}

@misc{zhou2023dbgpt,
      title={DB-GPT: Large Language Model Meets Database}, 
      author={Xuanhe Zhou, Zhaoyan Sun, Guoliang Li},
      year={2023},
      archivePrefix={Data Science and Engineering},
}

📧 Contributors

Other Collaborators: Wei Zhou, Kunyi Li.

We thank all the contributors to this project. Do not hesitate if you would like to get involved or contribute!

Contact Information

👏🏻Welcome to our wechat group!

db-gpt's People

Contributors

Stargazers

Watchers

db-gpt's Issues

pip install jq fails on windows

操作系统：

  Windows 11 家庭中文版
  版本：23H2
  操作系统版本：22631.3447

问题描述：

  尝试使用 pip3 install -r requirements.txt 安装依赖时，安装 jq 包失败。

部分日志如下：

  Using cached pyasn1-0.6.0-py2.py3-none-any.whl (85 kB)
  Using cached pycocotools-2.0.7-cp311-cp311-win_amd64.whl (85 kB)
  Downloading pypdfium2-4.30.0-py3-none-win_amd64.whl (2.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.9/2.9 MB 1.7 MB/s eta 0:00:00
  Using cached timm-0.9.16-py3-none-any.whl (2.2 MB)
  Using cached portalocker-2.8.2-py3-none-any.whl (17 kB)
  Building wheels for collected packages: jq
    Building wheel for jq (pyproject.toml) ... error
    error: subprocess-exited-with-error
  
    × Building wheel for jq (pyproject.toml) did not run successfully.
    │ exit code: 1
    ╰─> [5 lines of output]
        running bdist_wheel
        running build
        running build_ext
        Executing: ./configure CFLAGS=-fPIC --prefix=C:\Users\JeanRiver\AppData\Local\Temp\pip-install-_i5dnfje\jq_68d45937a68f47918473e8d79f9a93ba\_deps\build\onig-install-6.9.8
        error: [WinError 2] 系统找不到指定的文件。
        [end of output]
  
    note: This error originates from a subprocess, and is likely not a problem with pip.
    ERROR: Failed building wheel for jq
  Failed to build jq
  ERROR: Could not build wheels for jq, which is required to install pyproject.toml-based projects

python 3.8装不了scipy==1.11.1 SentenceTransformer默认初始化选cuda导致算相似度的时候由于设备不同弹错

希望能fix一下
1.在readme里指定一下python的版本
2.在初始化embedding model的时候指定device='cpu'

Support for loading in lower precision

i have a small gpu, can i load the 13b llama in 4bit? Is there support for this?

import sys; print('Python %s on %s' % (sys.version, sys.platform))
/root/miniconda3/envs/D-Bot/bin/python /root/.pycharm_helpers/pydev/pydevd.py --multiprocess --qt-support=auto --client localhost --port 37745 --file /home/workspace/YYG/FromS/DB-GPT/main.py
Connected to pydev debugger (build 232.8660.197)
系统启动java的jvm虚拟环境成功
12/14/2023 19:25:09 - ERROR - root - obtain_historical_queries_statistics Fails!
12/14/2023 19:25:09 - WARNING - root - Unused arguments: {'model': 'diag-llama'}
12/14/2023 19:25:09 - WARNING - root - Unused arguments: {'model': 'diag-llama'}
12/14/2023 19:25:09 - WARNING - root - Unused arguments: {'model': 'diag-llama'}
12/14/2023 19:25:09 - WARNING - root - Unused arguments: {'model': 'diag-llama'}
12/14/2023 19:25:09 - WARNING - root - Unused arguments: {'model': 'diag-llama'}
12/14/2023 19:25:09 - WARNING - root - Unused arguments: {'model': 'diag-llama'}
Report Initialization!
0%| | 0/1====================== Initialization ======================
rank : 0
local_rank : 0
world_size : 1
local_size : 1
master : localhost:10010
device : 0
cpus : [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1
3, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 2
4, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 3
5, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 4
6, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 5
7, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 6
8, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 7
9, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 9
0, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100,
101, 102, 103, 104, 105, 106, 107, 108, 109,
110, 111, 112, 113, 114, 115, 116, 117, 118
, 119, 120, 121, 122, 123, 124, 125, 126, 12
7, 128, 129, 130, 131, 132, 133, 134, 135, 1
36, 137, 138, 139, 140, 141, 142, 143, 144,
145, 146, 147, 148, 149, 150, 151, 152, 153,
154, 155, 156, 157, 158, 159, 160, 161, 162
, 163, 164, 165, 166, 167, 168, 169, 170, 17
1, 172, 173, 174, 175]
/root/miniconda3/envs/D-Bot/lib/python3.10/site-packages/bmtrain/synchronize.py:15: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
nccl.allReduce(barrier.storage(), barrier.storage(), 'sum', config['comm'])
args.load is not None, start to load checkpoints /home/workspace/YYG/YYG/D-Bot/DiagLlama/DiagLlama.pt
[INFO][2023-12-14 19:25:39][jeeves-hpc-gpu00][inference.py:33:105510] - load model in 21.73s
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in huggingface/transformers#24565
[INFO][2023-12-14 19:25:40][jeeves-hpc-gpu00][inference.py:38:105510] - load tokenizer in 1.27s
finish loading
100%|██████████████████████████████████████████████████████████████████████| 1/1
Role Assignment!
100%|██████████████████████████████████████████████████████████████████████| 1/1
12/14/2023 19:26:07 - INFO - sentence_transformers.SentenceTransformer - Load pretrained SentenceTransformer: ./localized_llms/sentence_embedding/sentence-transformer/
12/14/2023 19:26:10 - INFO - sentence_transformers.SentenceTransformer - Use pytorch device: cuda
Batches: 100%|████████████████████████████████████| 1/1 [00:00<00:00, 1.46it/s]
CpuExpert Diagnosis!

Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Reflecting ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Reflexion: Reflection: From the previous steps, it's clear that CPU usage was indeed abnormal. Upon diagnosing using match_diagnose_knowledge tool, we observed some potential root causes like high disk I/O and increased number of processes running simultaneously. Key indicators such as node_ins_stdload1[ins=] = 1.75 > 100% were identified which point towards an anomaly in the system. Understanding this knowledge allows us to better diagnose and find solutions for similar issues in the future. However, more analysis is needed to pinpoint the exact cause of the high CPU usage. Moving forward, we should delve deeper into analyzing the provided information before jumping to conclusions too quickly.
Voting ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
12/14/2023 19:41:20 - INFO - sentence_transformers.SentenceTransformer - Load pretrained SentenceTransformer: ./localized_llms/sentence_embedding/sentence-transformer/
12/14/2023 19:41:22 - INFO - sentence_transformers.SentenceTransformer - Use pytorch device: cuda
Batches: 100%|████████████████████████████████████| 1/1 [00:00<00:00, 80.18it/s]
MemoryExpert Diagnosis!
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Reflecting ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
Reflexion: Reflection: From the previous attempt, it's clear that sudden surges in memory usage can cause significant performance degradation. The specific query insert into table1 select generate_series... was identified as potentially problematic due to its large scale data insertion within a short timeframe. This type of operations is resource-intensive and could significantly impact system performance. Understanding this knowledge allows us to better diagnose and find solutions for such issues in the future.
Voting ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
============= Finish the initial diagnosis =============
Cross Review!
Report Generation!
100%|██████████████████████████████████████████████████████████████████████| 1/1
============diag end time==========: 2327.0771877765656
12/14/2023 20:04:01 - ERROR - root - obtain_historical_queries_statistics Fails!
12/14/2023 20:04:01 - WARNING - root - Unused arguments: {'model': 'diag-llama'}
12/14/2023 20:04:01 - WARNING - root - Unused arguments: {'model': 'diag-llama'}
12/14/2023 20:04:01 - WARNING - root - Unused arguments: {'model': 'diag-llama'}
12/14/2023 20:04:01 - WARNING - root - Unused arguments: {'model': 'diag-llama'}
12/14/2023 20:04:01 - WARNING - root - Unused arguments: {'model': 'diag-llama'}
12/14/2023 20:04:01 - WARNING - root - Unused arguments: {'model': 'diag-llama'}
Report Initialization!
100%|██████████████████████████████████████████████████████████████████████| 1/1
Role Assignment!
100%|██████████████████████████████████████████████████████████████████████| 1/1
12/14/2023 20:04:26 - INFO - sentence_transformers.SentenceTransformer - Load pretrained SentenceTransformer: ./localized_llms/sentence_embedding/sentence-transformer/
12/14/2023 20:04:27 - INFO - sentence_transformers.SentenceTransformer - Use pytorch device: cuda
Batches: 100%|████████████████████████████████████| 1/1 [00:00<00:00, 33.23it/s]
CpuExpert Diagnosis!
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
12/14/2023 20:04:46 - INFO - sentence_transformers.SentenceTransformer - Load pretrained SentenceTransformer: ./localized_llms/sentence_embedding/sentence-transformer/
12/14/2023 20:04:47 - INFO - sentence_transformers.SentenceTransformer - Use pytorch device: cuda
Batches: 100%|████████████████████████████████████| 1/1 [00:00<00:00, 38.65it/s]
MemoryExpert Diagnosis!
Analyzing with tools ...
100%|██████████████████████████████████████████████████████████████████████| 1/1
python-BaseException
Traceback (most recent call last):
File "/root/miniconda3/envs/D-Bot/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/root/miniconda3/envs/D-Bot/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/home/workspace/YYG/FromS/DB-GPT/main.py", line 14, in main
report, records = await multi_agents.run(args)
File "/home/workspace/YYG/FromS/DB-GPT/multiagents/multiagents.py", line 65, in run
report, records = await self.environment.step(args)
File "/home/workspace/YYG/FromS/DB-GPT/multiagents/environments/dba.py", line 252, in step
report = await self.decision_making(selected_experts, None, previous_plan, advice) # plans: the list of diagnosis messages
File "/home/workspace/YYG/FromS/DB-GPT/multiagents/environments/dba.py", line 312, in decision_making
initial_diags = await self.decision_maker.astep(
File "/home/workspace/YYG/FromS/DB-GPT/multiagents/environments/decision_maker/vertical.py", line 33, in astep
results = await asyncio.gather(
File "/home/workspace/YYG/FromS/DB-GPT/multiagents/agents/solver.py", line 200, in step
result_node, top_abnormal_metric_values = chain.start(simulation_count=1,epsilon_new_node=0.3,choice_count=1,vote_candidates=2,vote_count=1,single_chain_max_step=24)
File "/home/workspace/YYG/FromS/DB-GPT/multiagents/reasoning_algorithms/tree_of_thought/UCT_vote_function.py", line 187, in start
end_node, top_abnormal_metric_values = self.default_policy(now_node,this_simulation,single_chain_max_step)
File "/home/workspace/YYG/FromS/DB-GPT/multiagents/reasoning_algorithms/tree_of_thought/UCT_vote_function.py", line 579, in default_policy
result = temp_node.env.tool.call_function(parsed_response.tool, **parameters)
File "/home/workspace/YYG/FromS/DB-GPT/multiagents/tools/api_retrieval.py", line 30, in call_function
return func(*args, **kwargs)
File "/home/workspace/YYG/FromS/DB-GPT/multiagents/tools/metric_monitor/api.py", line 54, in whether_is_abnormal_metric
with open(f"./alert_results/{current_diag_time}/{metric_name}.html", "w") as f:
FileNotFoundError: [Errno 2] No such file or directory: './alert_results/2023-12-14-19:24:58/cpu_usage.html'
a = open(f"./alert_results/{current_diag_time}/{metric_name}.html", "w")
Python 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.18.1 -- An enhanced Interactive Python. Type '?' for help.
PyDev console: using IPython 8.18.1
Traceback (most recent call last):
File "/root/miniconda3/envs/D-Bot/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3550, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 1, in
a = open(f"./alert_results/{current_diag_time}/{metric_name}.html", "w")
File "/root/miniconda3/envs/D-Bot/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 310, in _modified_open
return io_open(file, *args, **kwargs)
FileNotFoundError: [Errno 2] No such file or directory: './alert_results/2023-12-14-19:24:58/cpu_usage.html'
a = open(f"/home/workspace/YYG/FromS/DB-GPT/alert_results/alert_results/{current_diag_time}/{metric_name}.html", "w")
Traceback (most recent call last):
File "/root/miniconda3/envs/D-Bot/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3550, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 1, in
a = open(f"/home/workspace/YYG/FromS/DB-GPT/alert_results/alert_results/{current_diag_time}/{metric_name}.html", "w")
File "/root/miniconda3/envs/D-Bot/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 310, in _modified_open
return io_open(file, *args, **kwargs)
FileNotFoundError: [Errno 2] No such file or directory: '/home/workspace/YYG/FromS/DB-GPT/alert_results/alert_results/2023-12-14-19:24:58/cpu_usage.html'

API

The project has API support. Is there any documentation on how to use it

/bin/sh: 1: mvn: not found

import sys; print('Python %s on %s' % (sys.version, sys.platform))
/home/hw/miniconda3/envs/D-Bot/bin/python /home/hw/.pycharm_helpers/pydev/pydevd.py --multiprocess --qt-support=auto --client localhost --port 39485 --file /home/hw/YYG/D-Bot/DB-GPT/main.py
Connected to pydev debugger (build 232.8660.197)

/bin/sh: 1: mvn: not found, 这里 readme 里面是不是缺 maven 的安装信息

Clarification on UCT(v) Implementation in Tree Search Algorithm

I found the concept of utilizing the UCT score intriguing. Specifically, I am curious about how the UCT(v) function is implemented in the actual code base.

In the paper, you describe the function as UCT(v) = w(v)/n(v) + C・√ln(N)/n(v), and I am trying to locate the exact part of the code that reflects this formula. Could you confirm if the logic around line 326 in /multiagents/reasoning_algorithms/tree_of_thought/UCT_vote_function.py is where this UCT computation takes place?

If not, could you please guide me to the right section in the code? Understanding the practical application of UCT(v) will greatly assist me in comprehending the overall tree search strategy you have proposed.

Thank you for your time and assistance.

How do you get the diagnosis knowledge?

Dear authors,
I really appreciate your efforts in contributing to this wonderful repo. I have some questions about the acquisition of diagnosis knowledge.

I see you have a file diagnosis_code.txt which contains codes to test different root cause. I'm wondering how do you obtain the code? Is it curated during your own DB maintanence experience or obtained from online resources?
And also is it depend on specific software/DB that you want to diagnose, as some metric names can be different?
Do you have any suggestions for me to obtain such knowledge base? I'm thinking about apply your software to our service.

Thanks so much for your consideration!

Invalid file path involved for Win10

The path of file is not allowed to contain the following characters in Win10, which makes the clone command failed.

\ / : * ? " < > |

i.e., knowledge_json/knowledge_from_document/raw

where can I obtain the dataset used in the experiment

About the experience detection

Thank you for this great work!
I have a question about the Sec. 4 in "LLM as DBA" paper. It says the experience format is ["name", "content", "metrics", "steps"], and is extracted from a document. However, in the repo I didn't see the code for this part. Instead, I found "knowledge_from_code" which extrat knowledge from diagnose codes, and in the format of ["cause_name", "desc", "metrics"]. Am I missing something here?

support config for private openai proxy

网络环境问题，希望能够以配置形式支持个人搭建的openai代理地址，如http://xxxxx:xx/api/openai/v1
另外，apikey的设置，是否也考虑以统一配置文件的形式支持，当前环境变量的配置方法，由于调试环境的不同、不太友好。

得到了一个空的前端页面信息

这里跑完了一个例子后页面依然是空的

running batch_main.py error

process_num = 4
output_answer_file = "batch_testing_answers.jsonl"
result_log_prefix = "./alert_results/logs/diagnosis_results/"
log_dir_name = result_log_prefix + "2023-11-09-12-10-11"
reports_log_dir_name = log_dir_name + "/reports"

this var above, us not find in local project

readme里写的是pg12或更高版本，但pg15 pg_stat_statements表里没有total_time ; 改成total_exec_time了吧？

https://github.com/TsinghuaDatabaseGroup/DB-GPT/blob/34df9b6d9699851eecf2e6df8b98d792717c2039/utils/database.py#L335C28-L335C28

Where are the codes and data for tool matching?

As titled, thanks!

how to justify if different experts contribute to the result?

from the code, the llm behinds all experts are the same(GPT-4). Have you fine-tuned each experts with different knowledge?
how about if we provide the specific domain knowledges(IO/CPU/mem) to an agent? as GPT-4 is a good enough llm to handle such knowledge.

Support for local llm

Does this work only with gpt4? Is there support for local models like mistral?

please help

thanks

sh run_demo.sh error

SQL Execution Fatal!!
ERROR:root:obtain_historical_queries_statistics Fails!

需要安装 protobuf 包

/home/hw/miniconda3/envs/D-Bot/lib/python3.10/site-packages/bmtrain/synchronize.py:14: UserWarning: The torch.cuda.DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=, device='cuda') to create tensors. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:83.)
barrier = torch.cuda.FloatTensor([1])
/home/hw/miniconda3/envs/D-Bot/lib/python3.10/site-packages/bmtrain/synchronize.py:15: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
nccl.allReduce(barrier.storage(), barrier.storage(), 'sum', config['comm'])
args.load is not None, start to load checkpoints /home/hw/YYG/D-Bot/DiagLlama/DiagLlama.pt
[INFO][2023-12-12 19:39:19][jeeves-hpc-gpu00][inference.py:69:1135371] - load model in 30.93s
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in huggingface/transformers#24565
0%| | 0/1
Traceback (most recent call last):
File "/home/hw/miniconda3/envs/D-Bot/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/home/hw/miniconda3/envs/D-Bot/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/home/hw/YYG/D-Bot/DB-GPT/main.py", line 14, in main
report, records = await multi_agents.run(args)
File "/home/hw/YYG/D-Bot/DB-GPT/multiagents/multiagents.py", line 65, in run
report, records = await self.environment.step(args)
File "/home/hw/YYG/D-Bot/DB-GPT/multiagents/environments/dba.py", line 172, in step
self.reporter.initialize_report()
File "/home/hw/YYG/D-Bot/DB-GPT/multiagents/agents/reporter.py", line 62, in initialize_report
anomaly_desc = self.llm.parse()
File "/home/hw/YYG/D-Bot/DB-GPT/multiagents/llms/diag_llama.py", line 83, in parse
output = llama_inference.inference(new_messages, max_in_len=self.args.max_in_len, max_length=self.args.max_length, beam_size=self.args.beam_size)
File "/home/hw/YYG/D-Bot/DB-GPT/diagllama/inference.py", line 222, in inference
self.tokenizer, self.model = setup_model(self.args)
File "/home/hw/YYG/D-Bot/DB-GPT/diagllama/inference.py", line 72, in setup_model
python-BaseException
tokenizer = get_tokenizer(args)
File "/home/hw/YYG/D-Bot/DB-GPT/diagllama/inference.py", line 35, in get_tokenizer
tokenizer = AutoTokenizer.from_pretrained(args.vocab,
File "/home/hw/miniconda3/envs/D-Bot/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 768, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
File "/home/hw/miniconda3/envs/D-Bot/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2024, in from_pretrained
return cls._from_pretrained(
File "/home/hw/miniconda3/envs/D-Bot/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2256, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/home/hw/miniconda3/envs/D-Bot/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 124, in init
super().init(
File "/home/hw/miniconda3/envs/D-Bot/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 114, in init
fast_tokenizer = convert_slow_tokenizer(slow_tokenizer)
File "/home/hw/miniconda3/envs/D-Bot/lib/python3.10/site-packages/transformers/convert_slow_tokenizer.py", line 1344, in convert_slow_tokenizer
return converter_class(transformer_tokenizer).converted()
File "/home/hw/miniconda3/envs/D-Bot/lib/python3.10/site-packages/transformers/convert_slow_tokenizer.py", line 464, in init
model_pb2 = import_protobuf()
File "/home/hw/miniconda3/envs/D-Bot/lib/python3.10/site-packages/transformers/convert_slow_tokenizer.py", line 37, in import_protobuf
if version.parse(google.protobuf.version) < version.parse("4.0.0"):
AttributeError: '_jpype._JPackage' object has no attribute 'version'

API通信遇到错误：peer closed connection without sending complete message body

==============================Data chat Configuration==============================
操作系统：Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.35.
python版本：3.10.14 (main, Mar 21 2024, 16:24:04) [GCC 11.2.0]
项目版本：v0.0.1
langchain版本：0.0.344. fastchat版本：0.2.36

当前使用的分词器：ChineseRecursiveTextSplitter
当前启动的LLM模型：['qwen_18b'] @ cuda
{'device': 'cuda',
'host': '0.0.0.0',
'infer_turbo': False,
'model_path': '/mnt/g/chatglm/db-gpt-qh/model/Qwen-1_8B-Chat',
'model_path_exists': True,
'port': 20002}
当前Embbedings模型： text-embedding-ada-002 @ cuda

服务端运行信息：
OpenAI API Server: http://127.0.0.1:20000/v1
DB-GPT API Server: http://127.0.0.1:7861
DB-GPT WEBUI Server: http://0.0.0.0:8501
==============================DB-GPT Configuration==============================

You can now view your Streamlit app in your browser.

URL: http://0.0.0.0:8501

2024-03-28 15:48:03,012 - _client.py[line:1026] - INFO: HTTP Request: POST http://127.0.0.1:20001/list_models "HTTP/1.1 200 OK"
INFO: 127.0.0.1:59642 - "POST /llm_model/list_running_models HTTP/1.1" 200 OK
2024-03-28 15:48:03,020 - _client.py[line:1026] - INFO: HTTP Request: POST http://127.0.0.1:7861/llm_model/list_running_models "HTTP/1.1 200 OK"
2024-03-28 15:48:03,159 - _client.py[line:1026] - INFO: HTTP Request: POST http://127.0.0.1:20001/list_models "HTTP/1.1 200 OK"
INFO: 127.0.0.1:59642 - "POST /llm_model/list_running_models HTTP/1.1" 200 OK
2024-03-28 15:48:03,167 - _client.py[line:1026] - INFO: HTTP Request: POST http://127.0.0.1:7861/llm_model/list_running_models "HTTP/1.1 200 OK"
INFO: 127.0.0.1:59642 - "POST /llm_model/list_config_models HTTP/1.1" 200 OK
2024-03-28 15:48:03,174 - _client.py[line:1026] - INFO: HTTP Request: POST http://127.0.0.1:7861/llm_model/list_config_models "HTTP/1.1 200 OK"
2024-03-28 15:48:07,116 - _client.py[line:1026] - INFO: HTTP Request: POST http://127.0.0.1:20001/list_models "HTTP/1.1 200 OK"
INFO: 127.0.0.1:37774 - "POST /llm_model/list_running_models HTTP/1.1" 200 OK
2024-03-28 15:48:07,126 - _client.py[line:1026] - INFO: HTTP Request: POST http://127.0.0.1:7861/llm_model/list_running_models "HTTP/1.1 200 OK"
2024-03-28 15:48:07,162 - _client.py[line:1026] - INFO: HTTP Request: POST http://127.0.0.1:20001/list_models "HTTP/1.1 200 OK"
INFO: 127.0.0.1:37774 - "POST /llm_model/list_running_models HTTP/1.1" 200 OK
2024-03-28 15:48:07,168 - _client.py[line:1026] - INFO: HTTP Request: POST http://127.0.0.1:7861/llm_model/list_running_models "HTTP/1.1 200 OK"
INFO: 127.0.0.1:37774 - "POST /llm_model/list_config_models HTTP/1.1" 200 OK
2024-03-28 15:48:07,176 - _client.py[line:1026] - INFO: HTTP Request: POST http://127.0.0.1:7861/llm_model/list_config_models "HTTP/1.1 200 OK"
INFO: 127.0.0.1:37774 - "POST /chat/chat HTTP/1.1" 200 OK
2024-03-28 15:48:07,212 - _client.py[line:1026] - INFO: HTTP Request: POST http://127.0.0.1:7861/chat/chat "HTTP/1.1 200 OK"
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1965, in _exec_single_context
self.dialect.do_execute(
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 921, in do_execute
cursor.execute(statement, parameters)
sqlite3.OperationalError: no such table: message

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in call
return await self.app(scope, receive, send)
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in call
await super().call(scope, receive, send)
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/starlette/applications.py", line 122, in call
await self.middleware_stack(scope, receive, send)
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in call
raise exc
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in call
await self.app(scope, receive, _send)
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/starlette/middleware/cors.py", line 83, in call
await self.app(scope, receive, send)
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in call
raise exc
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in call
await self.app(scope, receive, sender)
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/starlette/routing.py", line 718, in call
await route.handle(scope, receive, send)
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
await self.app(scope, receive, send)
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/starlette/routing.py", line 69, in app
await response(scope, receive, send)
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/starlette/responses.py", line 270, in call
async with anyio.create_task_group() as task_group:
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 597, in aexit
raise exceptions[0]
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/starlette/responses.py", line 273, in wrap
await func()
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/starlette/responses.py", line 262, in stream_response
async for chunk in self.body_iterator:
File "/mnt/g/chatglm/DB-GPT-qh/server/chat/chat.py", line 44, in chat_iterator
message_id = add_message_to_db(chat_type="llm_chat", query=query, conversation_id=conversation_id)
File "/mnt/g/chatglm/DB-GPT-qh/server/db/session.py", line 26, in wrapper
result = f(session, *args, **kwargs)
File "/mnt/g/chatglm/DB-GPT-qh/server/db/repository/message_repository.py", line 19, in add_message_to_db
session.commit()
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 1920, in commit
trans.commit(_to_root=True)
File "", line 2, in commit
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/sqlalchemy/orm/state_changes.py", line 139, in _go
ret_value = fn(self, *arg, **kw)
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 1236, in commit
self._prepare_impl()
File "", line 2, in _prepare_impl
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/sqlalchemy/orm/state_changes.py", line 139, in _go
ret_value = fn(self, *arg, **kw)
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 1211, in _prepare_impl
self.session.flush()
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 4163, in flush
self._flush(objects)
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 4298, in _flush
with util.safe_reraise():
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 147, in exit
raise exc_value.with_traceback(exc_tb)
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 4259, in _flush
flush_context.execute()
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/sqlalchemy/orm/unitofwork.py", line 466, in execute
rec.execute(self)
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/sqlalchemy/orm/unitofwork.py", line 642, in execute
util.preloaded.orm_persistence.save_obj(
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/sqlalchemy/orm/persistence.py", line 93, in save_obj
_emit_insert_statements(
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/sqlalchemy/orm/persistence.py", line 1226, in _emit_insert_statements
result = connection.execute(
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1412, in execute
return meth(
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 483, in _execute_on_connection
return connection._execute_clauseelement(
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1635, in _execute_clauseelement
ret = self._execute_context(
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1844, in _execute_context
return self._exec_single_context(
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1984, in _exec_single_context
self._handle_dbapi_exception(
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2339, in _handle_dbapi_exception
raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1965, in _exec_single_context
self.dialect.do_execute(
File "/root/anaconda3/envs/dbgpts/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 921, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such table: message
[SQL: INSERT INTO message (id, conversation_id, chat_type, "query", response, meta_data, feedback_score, feedback_reason, create_time) VALUES (?, ?, ?, ?, ?, ?, ?, ?, CURRENT_TIMESTAMP) RETURNING create_time]
[parameters: ('0f9509e43907405b94374347e22029c6', 'b6ee1b1811e34f35a8819ba5541b4eff', 'llm_chat', 'hello', '', '{}', -1, '')]
(Background on this error at: https://sqlalche.me/e/20/e3q8)
2024-03-28 15:48:07,238 - utils.py[line:187] - ERROR: RemoteProtocolError: API通信遇到错误：peer closed connection without sending complete message body (incomplete chunked read)

running with baichung2 asks for openai error

im using the baichung version of the project, but it throws an openai api key error

How is workload_sqls obtained in main.py

Hello,

I am trying to understand about the workload_sqls that is being fed into main.py. Could you explain to me how these SQLs are being derived?

Is the information being extracted from the pg_stat_statements view?

support other online models？

想问个问题是否支持其他线上模型啊，比如zhipuai，通义千问等在线模型，在diagnosis这个模块中

[bug] 启动后无法对话

模型配置

EMBEDDING_MODEL = "bge-m3e"
LLM_MODELS = ["chatglm3-6b"]
MODEL_PATH = {
"embed_model": {
"bge-m3e": "/home/wwt/repo/aigc/models/embedding/BAAI_bge-m3", # Download path of embedding model.
},

"llm_model": { 
    "chatglm3-6b": "/home/wwt/repo/aigc/models/glm/THUDM_chatglm3-6b"
},

}

初始化知识库和配置文件

docker start mysql
docker start postgre12
python copy_config_example.py

前端安装

cd webui
pnpm install

启动

python startup.py -a # 模型也看加载了，但是报如图错误，也无法对话，这是为啥呀。

RuntimeError: main thread is not in main loop

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1

Analyzing with tools ...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1
Reflecting ...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1
Reflexion: Reflection: From the previous steps, it's clear that CPU usage was indeed abnormal. Upon diagnosing using match_diagnose_knowledge tool, we observed some potential root causes like high disk I/O and increased number of processes running simultaneously. Key indicators such as node_ins_stdload1[ins=] = 1.75 > 100% were identified which point towards an anomaly in the system. Understanding this knowledge allows us to better diagnose and find solutions for similar issues in the future. However, more analysis is needed to pinpoint the exact cause of the high CPU usage. Moving forward, we should delve deeper into analyzing the provided information before jumping to conclusions too quickly.
Voting ...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1
12/14/2023 21:46:04 - INFO - sentence_transformers.SentenceTransformer - Load pretrained SentenceTransformer: ./localized_llms/sentence_embedding/sentence-transformer/
12/14/2023 21:46:05 - INFO - sentence_transformers.SentenceTransformer - Use pytorch device: cuda
Batches: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 101.90it/s]

MemoryExpert Diagnosis!

Analyzing with tools ...
0%| | 0/1Exception ignored in: <function Image.del at 0x7ff83a08e200>
Traceback (most recent call last):
File "/root/miniconda3/envs/D-Bot/lib/python3.10/tkinter/init.py", line 4056, in del
self.tk.call('image', 'delete', self.name)
RuntimeError: main thread is not in main loop
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1
Analyzing with tools ...
0%| | 0/1Exception ignored in: <function Variable.del at 0x7ff83a212a70>
Traceback (most recent call last):
File "/root/miniconda3/envs/D-Bot/lib/python3.10/tkinter/init.py", line 388, in del
if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.del at 0x7ff83a212a70>
Traceback (most recent call last):
File "/root/miniconda3/envs/D-Bot/lib/python3.10/tkinter/init.py", line 388, in del
if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.del at 0x7ff83a212a70>
Traceback (most recent call last):
File "/root/miniconda3/envs/D-Bot/lib/python3.10/tkinter/init.py", line 388, in del
if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.del at 0x7ff83a212a70>
Traceback (most recent call last):
File "/root/miniconda3/envs/D-Bot/lib/python3.10/tkinter/init.py", line 388, in del
if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Tcl_AsyncDelete: async handler deleted by the wrong thread
[1] 110342 IOT instruction (core dumped) python main.py

support for mysql database

如题，能够支持MySQL场景

Video walkthrough for setup with baichung model

I think itll be super helpful if you can showcase a video walkthrough on how to setup for baichung model and the database and prometheus configs

Thanks so much!

The information in Supported Anomalies is duplicated.

there are two FETCH_LARGE_DATA in the Supported Anomalies Table.

怎么配置llm

请问下，如果不使用llm，multiagents/agent_conf/config.yaml 该怎么配置
llm: llm_type: model:
目前的case是没法跑通的
ValueError: None is not registered. Please register with the .register("None") method provided in LLMRegistry registry

TypeError When Using Diagnosis function.

run python startup.py -a and operate in webpage. The normal chat and knowledge chat is ok.
after upload the project's test file of test_cases/xx.json，click diagnosis，for a while, the error occur as following and。

`
ConfigurationExpert Diagnosis!

Analyzing with tools ...
0%| | 0/12024-02-03 21:35:18,444 - _client.py[line:1027] - INFO: HTTP Request: POST http://127.0.0.1:8000/v1/chat/completions "HTTP/1.1 200 OK"
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1
Traceback (most recent call last):
File "/home/hhw/Downloads/aigc/proj/gpt/TsinghuaDatabaseGroup_DB-GPT/dbgpt0203/run_diagnose.py", line 90, in
asyncio.run(main(args))
File "/home/hhw/anaconda3/envs/dbgpt/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/home/hhw/anaconda3/envs/dbgpt/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/home/hhw/Downloads/aigc/proj/gpt/TsinghuaDatabaseGroup_DB-GPT/dbgpt0203/run_diagnose.py", line 31, in main
report, records = await multi_agents.run(args)
File "/home/hhw/Downloads/aigc/proj/gpt/TsinghuaDatabaseGroup_DB-GPT/dbgpt0203/multiagents/multiagents.py", line 138, in run
report, records = await self.environment.step(args)
File "/home/hhw/Downloads/aigc/proj/gpt/TsinghuaDatabaseGroup_DB-GPT/dbgpt0203/multiagents/environments/dba.py", line 278, in step
report = await self.decision_making(selected_experts, None, previous_plan, advice) # plans: the list of diagnosis messages
File "/home/hhw/Downloads/aigc/proj/gpt/TsinghuaDatabaseGroup_DB-GPT/dbgpt0203/multiagents/environments/dba.py", line 358, in decision_making
initial_diags = await self.decision_maker.astep(
File "/home/hhw/Downloads/aigc/proj/gpt/TsinghuaDatabaseGroup_DB-GPT/dbgpt0203/multiagents/environments/decision_maker/vertical.py", line 30, in astep
results = await asyncio.gather(
File "/home/hhw/Downloads/aigc/proj/gpt/TsinghuaDatabaseGroup_DB-GPT/dbgpt0203/multiagents/agents/solver.py", line 140, in step
result_node, top_abnormal_metric_values = chain.start(
File "/home/hhw/Downloads/aigc/proj/gpt/TsinghuaDatabaseGroup_DB-GPT/dbgpt0203/multiagents/reasoning_algorithms/tree_of_thought/UCT_vote_function.py", line 184, in start
end_node, top_abnormal_metric_values = self.default_policy(now_node,this_simulation,single_chain_max_step)
File "/home/hhw/Downloads/aigc/proj/gpt/TsinghuaDatabaseGroup_DB-GPT/dbgpt0203/multiagents/reasoning_algorithms/tree_of_thought/UCT_vote_function.py", line 576, in default_policy
result = temp_node.env.tool.call_function(parsed_response.tool, **parameters)
File "/home/hhw/Downloads/aigc/proj/gpt/TsinghuaDatabaseGroup_DB-GPT/dbgpt0203/multiagents/tools/api_retrieval.py", line 30, in call_function
return func(*args, **kwargs)
File "/home/hhw/Downloads/aigc/proj/gpt/TsinghuaDatabaseGroup_DB-GPT/dbgpt0203/multiagents/tools/index_advisor/api.py", line 33, in optimize_index_selection
for query_template in workload_statistics:
TypeError: 'int' object is not iterable
`

by the way, backend's gpu service is started by fastchat project, and i run chatglm3-6b in fastchat's openai-api for diagnosis function with a cuda of 24G.

how to add new db-specified rules for the query rewrite tool?

hi,

I saw the query rewrite use the LearnedRewrite.jar to do the rewrite job.
In the readme, LearnedRewrite.jar is derived from calcite,
may i know where is the repo of LearnedRewrite.jar?
If i want to add some new rules, how can i add the some db-specified rules?

Can i setup without prometheus?

Im having trouble installing greenplum , can i just bypass and setup it up without prometheus?

tool_config.yaml

What does the remote_directory in the tool_config.yaml refer to