AI-Network-Troubleshooting-PoC

Docker Version

This demo is built to showcase how AI might assist you in troubleshooting network issues.

The components used by this demo are:

  • Virtual IOS-XE devices running ISIS.
    • The CML DevNet sandbox was used to build the lab.
    • Sandbox DevBox VM (10.10.20.50, developer/C1sco12345).
  • ncpeek. A Python NETCONF client used by Telegraf.
  • TIG stack with Docker 20.10+ 🐳
    • Telegraf uses ncpeek to pull telemetry data from network devices.
    • Grafana fires a webhook when an alarm is detected. 🚨
  • FastAPI.
    • Hosts the LLM.
    • Interacts with PyATS & Webex.
  • PyATS. Provides a framework to interact with network devices. 🛠️
  • Webex_bot. Used to interact with the LLM. 🤖
  • OpenAI LLM. 🧠
    • gpt-4-turbo-preview was used. 🚀

🎬 Demo

For this demo, one alarm was created:

if avgNeighbors(30sec) < avgNeighbors(30min) : send Alarm

When the average number of ISIS neighbors over the last 30 seconds is lower than the average over the last 30 minutes, the alarm triggers a webhook for the LLM.

This signals that an ISIS neighbor that was stable during the last 30 minutes was lost. Because the rule compares averages rather than a fixed count, it works with any number of ISIS neighbors.
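The alarm condition can be sketched in plain Python. This is only an illustration of the comparison, not the Grafana rule itself. The function name, the sample format (timestamp in seconds, neighbor count) and the 10-second sampling interval are assumptions made for the example.

```python
from statistics import mean


def should_alarm(samples, now, short_window=30, long_window=1800):
    """Return True when the short-window average is below the long-window average.

    samples: list of (timestamp_seconds, neighbor_count) tuples (hypothetical shape).
    """
    short = [count for ts, count in samples if now - ts <= short_window]
    long_ = [count for ts, count in samples if now - ts <= long_window]
    if not short or not long_:
        return False  # not enough data to compare
    return mean(short) < mean(long_)


# Three stable neighbors for 30 minutes, sampled every 10 seconds (assumed interval).
stable = [(t, 3) for t in range(0, 1800, 10)]
# One neighbor is lost at t=1800.
after_loss = stable + [(1800, 2), (1810, 2), (1820, 2)]

print(should_alarm(stable, now=1790))
print(should_alarm(after_loss, now=1820))
```

With a stable neighbor count both averages are equal, so nothing fires. After a loss, the 30-second average drops faster than the 30-minute average, regardless of how many neighbors the device started with.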

🛠️ Prepare Demo

🔑 Environment variables

Environment variables are injected through the Makefile at the root of the project.

📌 Mandatory variables

Important

For the demo to work, you must set the following environment variables.

OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>
WEBEX_TEAMS_ACCESS_TOKEN=<YOUR_TEAM_ACCESS_TOKEN>
WEBEX_APPROVED_USERS_MAIL=<MAILS_OF_USERS_APPROVED_SEPARATED_BY_COMMAS>
WEBEX_USERNAME=<YOUR_WEBEX_USERNAME>
WEBEX_ROOM_ID=<THE_WEBEX_ROOM_ID>
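A quick pre-flight check can catch a missing variable before the containers start. The sketch below is not part of the project. The helper names are hypothetical and only the variable names come from the list above.

```python
import os

# Variable names taken from the mandatory list above.
MANDATORY_VARS = (
    "OPENAI_API_KEY",
    "WEBEX_TEAMS_ACCESS_TOKEN",
    "WEBEX_APPROVED_USERS_MAIL",
    "WEBEX_USERNAME",
    "WEBEX_ROOM_ID",
)


def missing_vars(env=None):
    """Return the mandatory variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in MANDATORY_VARS if not env.get(name, "").strip()]


def approved_users(env=None):
    """Split the comma-separated approved mails into a clean list."""
    env = os.environ if env is None else env
    raw = env.get("WEBEX_APPROVED_USERS_MAIL", "")
    return [mail.strip() for mail in raw.split(",") if mail.strip()]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All mandatory variables are set.")
```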

Note

The Webex variables are only needed if you interact with the LLM through Webex. Without Webex, however, you need to modify the Python code accordingly.

If you prefer to use another client, you need to:

📝 Webex considerations

To get your Webex token, go to https://developer.webex.com/docs/bots and create a bot.

The easiest way to get the WEBEX_ROOM_ID is to open a room with your bot in the Webex app. Once you have your room, you can get the WEBEX_ROOM_ID by calling the List Rooms API with the token you created earlier.
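As a rough sketch, the room lookup could be done with the Python standard library. It assumes the public Webex endpoint https://webexapis.com/v1/rooms with a Bearer token, and a response shaped as an "items" list whose entries carry "id" and "title". The function names are hypothetical and not part of this project.

```python
import json
import urllib.request

WEBEX_ROOMS_URL = "https://webexapis.com/v1/rooms"


def build_list_rooms_request(token):
    """Build the GET request for the List Rooms API, authenticated as the bot."""
    return urllib.request.Request(
        WEBEX_ROOMS_URL,
        headers={"Authorization": f"Bearer {token}"},
    )


def rooms_by_title(payload):
    """Map room title -> room id from a List Rooms response body."""
    return {room["title"]: room["id"] for room in payload.get("items", [])}


def list_rooms(token):
    """Call the API and return {title: id}. Requires network access."""
    request = build_list_rooms_request(token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return rooms_by_title(json.load(response))


# Example (needs a valid bot token):
# import os
# print(list_rooms(os.environ["WEBEX_TEAMS_ACCESS_TOKEN"]))
```

Pick the id of the room you opened with the bot and use it as WEBEX_ROOM_ID.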

📌 Optional Variables

For testing, you can use the GRAFANA_WEB_HOOK env var to send webhooks to another site, such as https://webhook.site/

If you have access to smith.langchain.com (recommended for viewing LLM operations), add your project ID and API key.

GRAFANA_WEB_HOOK=<WEB_HOOK_URL>
LANGCHAIN_PROJECT=<YOUR_LANGCHAIN_PROJECT_ID>
LANGCHAIN_API_KEY=<YOUR_LANGCHAIN_API_KEY>
LANGCHAIN_TRACING_V2=true
LANGCHAIN_ENDPOINT=https://api.smith.langchain.com

.env.local file

The .env.local file is used to define all variables used by the containers.

In a production environment, this file should be kept out of version control using the .gitignore file.

🚀 Start the topology

This demo uses a CML instance from the Cisco DevNet sandbox. You can also use a dedicated CML instance or an NSO sandbox. 🏖️

After acquiring your sandbox, stop the default topology and wipe it out. 🧹

Then, import the topology file used for this demo and start the lab.

📦 TIG Stack

The TIG stack requires Docker and IP reachability to the CML instance. For this demo, I used the sandbox VM 10.10.20.50.

The first time, build the TIG stack:

make build-tig

On subsequent runs, you only need to start the containers:

make run-tig

🚦 Verifying Telemetry on Telegraf, InfluxDB, Grafana

Telegraf

  • Logs: On 10.10.20.50, run docker exec -it telegraf bash, then tail -F /tmp/telegraf-grpc.log.

InfluxDB

Grafana

🏁 Starting the LLM

The llm_agent directory contains the entry point for the application, the app file.

The llm container runs on the sandbox VM 10.10.20.50.

make run-llm

🎮 Running the Demo

[Image: network topology]

The demo involves shutting down one interface, causing an ISIS failure, and allowing the LLM to diagnose the issue and implement a fix.

In the images below, GigabitEthernet5 was shut down on cat8000-v0, causing it to lose its ISIS adjacency with cat8000-v2.

You can watch the recorded demo here

Note

The recording was done as a backup demo. It doesn't have audio or instructions.

On Grafana, you can observe the ISIS count decreasing and triggering an alarm.

[Images: Grafana alarm 1, Grafana alarm 2]

Next, you will receive a Webex notification from Grafana, and the LLM will receive the webhook. The webhook triggers the LLM to start investigating the issue and how to resolve it.
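To illustrate the webhook step, the sketch below turns a Grafana alert payload into a prompt for the LLM. It assumes the general shape of Grafana's webhook body (an "alerts" list whose entries have "status", "labels" and "annotations"). The function names, the prompt wording and the sample payload are hypothetical, and this is not the project's actual handler.

```python
def firing_alerts(payload):
    """Return only the alerts that are currently firing."""
    return [a for a in payload.get("alerts", []) if a.get("status") == "firing"]


def build_prompt(payload):
    """Summarize firing alerts as a prompt, or return None if nothing is firing."""
    lines = []
    for alert in firing_alerts(payload):
        name = alert.get("labels", {}).get("alertname", "unknown alert")
        summary = alert.get("annotations", {}).get("summary", "no summary")
        lines.append(f"- {name}: {summary}")
    if not lines:
        return None
    return "Investigate these network alarms:\n" + "\n".join(lines)


# Hypothetical payload, trimmed to the fields used above.
sample = {
    "status": "firing",
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "isis-neighbors", "device": "cat8000-v0"},
            "annotations": {"summary": "ISIS neighbor count dropped"},
        },
        {"status": "resolved", "labels": {"alertname": "old-alarm"}},
    ],
}

print(build_prompt(sample))
```

Returning None for payloads with no firing alerts avoids spending tokens on resolved notifications.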

[Images: LLM thinking 1, LLM thinking 2, LLM thinking 3, LLM thinking 4]

📝 Notes

  • Tokens can run out quickly with NETCONF, so it is important to filter what is sent to the AI.
  • Repeated alarms are suppressed by Grafana. This is controlled by the Grafana policy file.
    • If you are testing continuously, run make run-tig to destroy and recreate the TIG containers.
    • This isn't an ideal scenario, but a proper solution wasn't found within the given time.
  • From time to time, the answers from the LLM are lost and not sent to Webex. You can find them in the terminal output.
  • This is the third iteration of this exercise. The first one was at Cisco Live Amsterdam 2024.
    • The main differences are the use of makefile, docker compose and the refactoring of the llm agent code for a better separation of concerns.
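The first note, about filtering what is sent to the AI, can be illustrated with a small sketch. It assumes the NETCONF reply has already been parsed into nested dicts and lists. The function name, the sample data and the chosen fields are hypothetical, and character count is only a rough proxy for tokens.

```python
import json


def keep_fields(data, wanted):
    """Recursively keep only dict keys listed in `wanted`, dropping empty branches."""
    if isinstance(data, dict):
        result = {}
        for key, value in data.items():
            if key in wanted:
                result[key] = value
            else:
                nested = keep_fields(value, wanted)
                if nested not in (None, {}, []):
                    result[key] = nested
        return result
    if isinstance(data, list):
        items = [keep_fields(item, wanted) for item in data]
        return [item for item in items if item not in (None, {}, [])]
    return None  # a scalar that is not under a wanted key is dropped


# Hypothetical parsed reply with more detail than the LLM needs.
reply = {
    "interface": [
        {
            "name": "GigabitEthernet5",
            "oper-status": "down",
            "description": "to cat8000-v2",
            "statistics": {"in-octets": 123, "out-octets": 456},
        }
    ]
}

filtered = keep_fields(reply, {"name", "oper-status"})
print(filtered)
print(len(json.dumps(reply)), "->", len(json.dumps(filtered)), "characters")
```

Filtering before building the prompt keeps the structure of the reply but sends only the fields relevant to the alarm.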
