Evaluation-AI: An LLM for Assessing Government Programs

The Evaluation-AI repository is dedicated to building a comprehensive collection of program evaluation reports and leveraging state-of-the-art techniques in large language models (LLMs) to create a powerful tool for assessing government programs. Our mission is to develop a model that can effectively understand and analyze policy frameworks using retrieval-augmented generation (RAG) and a hybrid search strategy. We are committed to transparency and open-source fairness measurement to ensure the reliability and equity of our models.

To run a Streamlit app built on a local LLM specialized in program evaluation critiques, see the app code at https://github.com/casualcomputer/evaluation-ai-pro. The objectives and backend design of this repository are outlined below.

Objectives

Repository of Program Evaluation Reports: Curate and maintain a rich collection of program evaluation reports, serving as a valuable resource for researchers and practitioners.
Advanced LLM Training: Utilize the latest advancements in LLMs to train a model that comprehensively understands policy frameworks and program evaluation contexts.
Retrieval-Augmented Generation (RAG): Employ a hybrid search strategy combining traditional retrieval methods with generative capabilities to enhance information extraction and comprehension.
Contextual Application: Develop and tailor the tool for various tasks within the program evaluation context, ensuring its relevance and effectiveness in real-world applications.
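
The hybrid search idea in the RAG objective can be illustrated with a minimal sketch. It assumes "hybrid" means blending a keyword-overlap score with a vector-similarity score, which is the common reading of the term; the repository's actual retriever is not shown in this README. The function names, the `alpha` weight, and the use of term counts as a stand-in for real embeddings are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately simple tokenizer."""
    return text.lower().split()


def keyword_score(query, doc):
    """Fraction of distinct query tokens that appear in the document."""
    q = set(tokenize(query))
    d = set(tokenize(doc))
    return len(q & d) / len(q) if q else 0.0


def cosine_score(query, doc):
    """Cosine similarity of term-count vectors (stand-in for embeddings)."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    dot = sum(q[t] * d[t] for t in q)
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    d_norm = math.sqrt(sum(v * v for v in d.values()))
    return dot / (q_norm * d_norm) if q_norm and d_norm else 0.0


def hybrid_search(query, docs, alpha=0.5, top_k=3):
    """Rank docs by alpha * keyword score + (1 - alpha) * vector score."""
    scored = [
        (alpha * keyword_score(query, doc)
         + (1 - alpha) * cosine_score(query, doc), doc)
        for doc in docs
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

In a real system the term-count vectors would likely be replaced by embeddings from a sentence-encoder model, and the keyword score by a ranking function such as BM25; the weighted blend stays the same.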

Design:

Backend pipelines:

Scrape and clean reports
RAG for document retrieval and summarization:
    a. Retrieval
    b. Summarization
    c. LLM validation/critique
Human-in-the-loop for user feedback
Model evaluation and online/batch learning
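
As a rough sketch of how the three RAG stages listed above might be chained, the function below passes a question through retrieval, summarization, and critique in order. It is not the repository's implementation: `run_rag_pipeline` and its three callable parameters are hypothetical names, and each stage is supplied by the caller so that any retriever or LLM client can be plugged in.

```python
def run_rag_pipeline(question, retrieve, summarize, critique, top_k=3):
    """Chain the three RAG stages; each stage is a caller-supplied callable.

    retrieve(question, top_k)               -> list of passages
    summarize(question, passages)           -> summary text
    critique(question, passages, summary)   -> validation/critique text
    """
    passages = retrieve(question, top_k)             # a. retrieval
    summary = summarize(question, passages)          # b. summarization
    review = critique(question, passages, summary)   # c. LLM validation/critique
    return {"passages": passages, "summary": summary, "critique": review}
```

Keeping the stages as plain callables makes it straightforward to log each intermediate result, which is what a human-in-the-loop feedback step would need to attach user ratings to.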

Frontend:

Chat interface

Usage

Clone the repository

git clone https://github.com/casualcomputer/evaluation-ai.git

Navigate to the script location

Pass -h to any script to see its help message.

cd evaluation-ai/src/data/
python 00_load_raw_data.py
python 01_extract_text.py

Data Sources

Source Name	Source Link	Number of Extracted Reports
ESDC	Link	177
CRA	Link	196
Health Canada	Link	129
Natural Resources Canada	Link	119

Naming Convention

Reports are named as <department acronym>_<id>_<title>.<extension>
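
A small sketch of building and parsing names that follow this convention is given below. The helper names are hypothetical, and two details are assumptions not stated above: the title is slugged to lowercase words joined by hyphens, and neither the acronym nor the id contains an underscore.

```python
import re
from pathlib import Path


def build_report_name(acronym, report_id, title, extension):
    """Return '<acronym>_<id>_<title>.<extension>' with a hyphen-slugged title."""
    # Assumption: collapse any run of non-alphanumerics into a single hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", title).strip("-").lower()
    return f"{acronym}_{report_id}_{slug}.{extension}"


def parse_report_name(filename):
    """Split a report file name back into its four parts."""
    path = Path(filename)
    # Split on the first two underscores only, so the title is kept whole.
    acronym, report_id, title = path.stem.split("_", 2)
    return {
        "acronym": acronym,
        "id": report_id,
        "title": title,
        "extension": path.suffix.lstrip("."),
    }
```

For example, `build_report_name("ESDC", 12, "Evaluation of the Youth Program", "pdf")` yields `ESDC_12_evaluation-of-the-youth-program.pdf`, and `parse_report_name` recovers the four fields from that string.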
