agentops-node's Introduction

AgentOps BETA 🕵️

AI agents suck. We're fixing that.

Build your next agent with evals, observability, and replay analytics. AgentOps is the toolkit for evaluating and developing robust and reliable AI agents.

License: MIT

Quick Start

Install AgentOps:

npm install agentops

Add AgentOps to your code. Check out an example.

import OpenAI from "openai";
import { Client } from 'agentops';

const openai = new OpenAI();                        // Add your API key here or in the .env

const agentops = new Client({
    apiKey: "<Insert AgentOps API Key>",            // Add your API key here or in the .env
    tags: ["abc", "success"],                       // Optionally add tags to your run
    patchApi: [openai]                              // Record LLM calls automatically (Only OpenAI is currently supported)
});

// agentops.patchApi(openai)                        // Alternatively, you can patch API calls later

// Sample OpenAI call (recorded automatically because `openai` was passed to patchApi)
async function chat() {
    const completion = await openai.chat.completions.create({
        messages: [{ "role": "system", "content": "You are a helpful assistant." },
        { "role": "user", "content": "Who won the world series in 2020?" },
        { "role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020." },
        { "role": "user", "content": "Where was it played?" }],
        model: "gpt-3.5-turbo",
    });

    return completion
}

// Sample other function
function original(x: string) {
    console.log(x);
    return 5;
}

// You can track other functions by wrapping the function.
const wrapped = agentops.wrap(original);
wrapped("hello");


chat().then(() => {
    agentops.endSession("Success"); // Make sure you end your session when your agent is done.
});
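
To make the idea of tracking a function by wrapping it more concrete, here is a minimal, self-contained sketch of how a call-recording wrapper can work in general. It is not the agentops implementation of `wrap`; the names `trackCalls`, `CallRecord`, and `callLog` are hypothetical and exist only for this illustration, and the in-memory array stands in for whatever backend a real tracker would report to.

```typescript
// Hypothetical sketch: NOT the agentops implementation of `wrap`.
// It only illustrates the general technique of recording calls via a wrapper.

interface CallRecord {
    name: string;        // name of the wrapped function
    args: unknown[];     // arguments passed to the call
    result?: unknown;    // return value, if the call succeeded
    error?: string;      // error message, if the call threw
    durationMs: number;  // wall-clock time spent in the call
}

// In-memory log standing in for a real reporting backend (assumption).
const callLog: CallRecord[] = [];

// Returns a function with the same signature as `fn` that records every call.
function trackCalls<A extends unknown[], R>(fn: (...args: A) => R): (...args: A) => R {
    return (...args: A): R => {
        const start = Date.now();
        try {
            const result = fn(...args);
            callLog.push({ name: fn.name, args, result, durationMs: Date.now() - start });
            return result;
        } catch (err) {
            callLog.push({ name: fn.name, args, error: String(err), durationMs: Date.now() - start });
            throw err; // re-throw so the caller still sees the original failure
        }
    };
}

// Same sample function as in the Quick Start.
function original(x: string): number {
    console.log(x);
    return 5;
}

const tracked = trackCalls(original);
const value = tracked("hello");
```

The wrapper keeps the original signature and return value, so callers do not need to change. For an async function, this sketch would record the returned promise rather than the resolved value; a fuller version would await it before logging.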

Time travel debugging 🔮

(coming soon!)

Agent Arena 🥊

(coming soon!)

Evaluations Roadmap 🧭

| Platform | Dashboard | Evals |
| --- | --- | --- |
| ✅ Python SDK | ✅ Multi-session and Cross-session metrics | 🚧 Evaluation playground + leaderboard |
| 🚧 Evaluation builder API | ✅ Custom event tag tracking | 🔜 Agent scorecards |
| ✅ Javascript/Typescript SDK | 🚧 Session replays | 🔜 Custom eval metrics |

Debugging Roadmap 🧭

| Performance testing | Environments | LAA (LLM augmented agents) specific tests | Reasoning and execution testing |
| --- | --- | --- | --- |
| ✅ Event latency analysis | 🔜 Non-stationary environment testing | 🔜 LLM non-deterministic function detection | 🚧 Infinite loops and recursive thought detection |
| ✅ Agent workflow execution pricing | 🔜 Multi-modal environments | 🔜 Token limit overflow flags | 🔜 Faulty reasoning detection |
| 🔜 Success validators (external) | 🔜 Execution containers | 🔜 Context limit overflow flags | 🔜 Generative code validators |
| 🔜 Agent controllers/skill tests | 🔜 Honeypot and prompt injection evaluation | 🔜 API bill tracking | 🔜 Error breakpoint analysis |
| 🔜 Information context constraint testing | 🔜 Anti-agent roadblocks (e.g. Captchas) | | |
| 🔜 Regression testing | | | |

Why AgentOps? 🤔

Our mission is to make sure your agents are ready for production.

Agent developers often work with little to no visibility into agent testing performance. This means their agents never leave the lab. We're changing that.

AgentOps is the easiest way to evaluate, grade, and test agents. Is there a feature you'd like to see AgentOps cover? Just raise it in the issues tab, and we'll work on adding it to the roadmap.

agentops-node's People

Contributors

siyangqiu, areibman
