mipt-hack-23's Introduction

MyCityGuideBot

tg: @MyCityGuideBot

A perfect assistant for urban adventurers and culture enthusiasts

Features:

  • Responds with audio to voice messages
  • Generates a description for an uploaded picture
  • /story - Discover an intriguing fact about a random city
  • Supports 2 languages: English & Russian

Demo slide deck is available via link

Architecture

architecture

Sequence diagrams

Voice flow

voice sequence diagram

Image flow

image sequence diagram

Prerequisites

  1. poetry

    curl -sSL https://install.python-poetry.org | python3 -
  2. ngrok

    brew install ngrok
  3. pre-commit

    brew install pre-commit
    pre-commit install
  4. gcloud

    brew install --cask google-cloud-sdk

Running jupyterlab

poetry install
poetry run jupyter lab

Running bot locally

  1. Make sure you have access to the Google project mipt-hack-01 (contact tg: @fancyeagle if not)

    gcloud auth application-default login
  2. Register a bot for testing with BotFather and save its Telegram token as an environment variable:

    export TELEGRAM_TOKEN=some_value
  3. If you have not yet created a virtual environment with poetry, create it:

    poetry install
  4. Run the following commands in a terminal to start the bot service locally:

    export FLASK_APP=bot_service.app:app
    export FLASK_RUN_PORT=5005
    export FLASK_ENV=development
    export PROJECT_ID=mipt-hack-01
    poetry run flask run --reload
  5. Create a tunnel for the local HTTP server from the previous step using ngrok, making it accessible from the public internet:

    ngrok http 127.0.0.1:5005
  6. Register the webhook with the Telegram API using the sh/webhook.sh script, so that Telegram forwards each bot request to the local service:

    sh sh/webhook.sh set <PUBLIC_URL> <TELEGRAM_TOKEN>
    • PUBLIC_URL - the URL generated by ngrok in step 5, e.g.: https://fce9-2a02-a447-94fc-1-50f5-8a25-6ab-eecc.ngrok.io
    • TELEGRAM_TOKEN - the token generated by BotFather in step 2
  7. After you have finished testing, disconnect your bot with the following command:

    sh sh/webhook.sh delete <TELEGRAM_TOKEN>
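
For orientation, webhook registration in step 6 comes down to a call to the Telegram Bot API `setWebhook` method, and step 7 to `deleteWebhook`. The sketch below only builds the request URLs. It is not the content of sh/webhook.sh, the function names are hypothetical, and it assumes the webhook points at the root of the ngrok URL.

```python
from urllib.parse import urlencode

TELEGRAM_API = "https://api.telegram.org"


def set_webhook_url(token: str, public_url: str) -> str:
    """Build the Bot API URL that registers public_url as the bot's webhook."""
    # The target URL is passed as a query parameter, so it must be URL-encoded.
    query = urlencode({"url": public_url})
    return f"{TELEGRAM_API}/bot{token}/setWebhook?{query}"


def delete_webhook_url(token: str) -> str:
    """Build the Bot API URL that removes the bot's webhook."""
    return f"{TELEGRAM_API}/bot{token}/deleteWebhook"
```

Requesting either URL (for example with curl) performs the corresponding action for the bot that owns the token.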

Running BDD tests

To ensure the model accurately recognizes artwork, a set of Behavior-Driven Development (BDD) tests has been created. These tests verify that the model's responses correctly match the paintings and landmarks present in the photo.

Before running the tests, download data.zip and extract it to the tests/bdd folder.

Running all tests:

poetry run pytest tests/bdd

Running tests for a single category:

poetry run pytest tests/bdd -k "paintings"
poetry run pytest tests/bdd -k "landmarks"

A known failure case: the model identified the painting "The Stroll at Giverny" by Claude Monet as "Woman and Children in a Field" by Berthe Morisot. Such misidentifications may stem from the model's training and from the inherent difficulty of artwork recognition.

mipt-hack-23's People

Contributors

bushuevatatiana, itallix, newuser454, zagir-dg

mipt-hack-23's Issues

Dev: Implement the image flow

Background: A user can upload an image to the bot. The bot should reply with text from the Gemini Pro model.

  • Indicate that typing is in progress (bot status area)
  • If the user provides a prompt, send that prompt to the model
  • If the user doesn't provide a prompt, implement prompt generation as decided in #2
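
The two prompt cases above can be sketched as a small selection function. This is a minimal illustration, not the bot's actual code: `choose_image_prompt` and `DEFAULT_IMAGE_PROMPT` are hypothetical names, and the default text is only a placeholder until the prompt strategy is decided. In Telegram, the text sent along with a photo arrives as the message caption.

```python
from typing import Optional

# Placeholder default; the real wording depends on the prompt decision (assumption).
DEFAULT_IMAGE_PROMPT = "Tell me something about this thing"


def choose_image_prompt(caption: Optional[str]) -> str:
    """Return the user's caption if present, otherwise the default prompt."""
    if caption and caption.strip():
        return caption.strip()
    return DEFAULT_IMAGE_PROMPT
```

Treating a whitespace-only caption as missing keeps the model from receiving an empty prompt.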

Dev: Implement the voice flow

Background: A user can record a voice message and send it to the bot. The bot should reply with an audio recording.

  • Indicate that voice recording is in progress (bot status area)
  • Transcribe the user's audio with the Speech-to-Text API
  • Send the text request to the Gemini Pro model
  • Generate audio from the model response with the Text-to-Speech API and send it back to the user

*Optional. Discuss with the team whether to moderate the input, i.e. if a request is not related to city exploration, the bot can reply with "I apologize, but that's beyond my capabilities at the moment" (check with the tech writer for the best possible response).
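
The voice flow is a three-stage pipeline, which can be sketched independently of the Google client libraries. In the sketch below, `handle_voice` and its three callable parameters are hypothetical. In the real service they would wrap Speech-to-Text, Gemini Pro and Text-to-Speech calls, which are not shown here.

```python
from typing import Callable


def handle_voice(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    ask_model: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Run the voice flow: audio in, transcript, model answer, audio out."""
    text = transcribe(audio)    # speech-to-text stage
    answer = ask_model(text)    # language model stage
    return synthesize(answer)   # text-to-speech stage
```

Passing the stages in as callables lets the flow be exercised with simple stubs, without credentials or network access.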

QA: Implement automated tests to verify Gemini Pro Vision replies

  • add tests for images with a few categories of artwork, e.g. paintings, sculptures (what else can be included?)
  • add tests for images with city objects, e.g. the Colosseum (Rome), the Forbidden City (Beijing), etc.
  • discuss with the team and determine additional scenarios that can be included
  • cover 5 different examples in each category
  • parametrise tests within each category

Consider adopting the pytest-bdd BDD test framework and designing tests as features, e.g.:

Feature: Vision Pro can detect artwork
    A model can correctly identify the artwork on the image

    Scenario: Can detect paintings
        Given The painting "nightwatch.jpg"
        When Gemini Pro Vision triggered
        Then The model response should contain painting name "The Night Watch"
        And The model response should contain author "Rembrandt"

    Scenario: Can detect sculptures
        Given The sculpture "kiss.jpg"
        When Gemini Pro Vision triggered
        Then The model response should contain sculpture name "The Kiss"
        And The model response should contain author "Rodin"
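
The "should contain" checks in these scenarios need a matching rule. One possible rule is sketched below as a plain helper that a `Then` step could call. `response_mentions` is a hypothetical name, and matching on case-folded, whitespace-normalised substrings is an assumption, not the project's agreed behaviour.

```python
def _normalise(text: str) -> str:
    """Case-fold and collapse runs of whitespace to single spaces."""
    return " ".join(text.casefold().split())


def response_mentions(response: str, expected: str) -> bool:
    """Return True if expected occurs in response, ignoring case and spacing."""
    return _normalise(expected) in _normalise(response)
```

A looser rule like this avoids failing a test only because the model wrote "REMBRANDT" or added an extra space.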

ML: Decide the best way to ask Gemini Pro Vision model about an uploaded image when no prompt is provided

Background: Users are not required to provide any prompt when uploading an image. This raises an issue about how to best ask the model about the image without any context.

  • Option 1: Use a generic prompt, e.g. "Tell me something about this thing"
  • Option 2*: Use a more specific prompt that has knowledge of what is in the image, i.e. painting, sculpture or any other artwork

AC: Run some experiments and decide which option to choose.

  • Implementation will be defined in Dev story

Dev: randomise `/story` command

Background: The current prompt doesn't give different answers. Let's pick a city from a list of cities using Python's random.choice and send it in the prompt to the model.

  • List can be hardcoded (use some 30 different famous cities)
  • Or can be retrieved with some public API (check for possible options)
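
A minimal sketch of the hardcoded-list variant follows. The city list is shortened and illustrative, the prompt wording is a placeholder, and `build_story_prompt` is a hypothetical name. The optional `rng` parameter is there only so the choice can be made reproducible in tests.

```python
import random
from typing import Optional, Sequence

# Illustrative subset; the issue suggests around 30 famous cities.
CITIES = ("Rome", "Beijing", "Paris", "Amsterdam", "Moscow")


def build_story_prompt(
    cities: Sequence[str] = CITIES,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick a random city and embed it in the /story prompt."""
    picker = rng if rng is not None else random
    city = picker.choice(cities)
    return f"Tell me an intriguing fact about {city}."
```

Calling `build_story_prompt()` with no arguments uses the module-level random generator, so repeated /story requests can land on different cities.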

TW: Define the bot metadata

Background: Telegram-specific metadata should be provided for our new bot, MyCityGuideBot.

The following properties should be filled in:

  • About - will be shown on bot settings page
  • Description -
  • Description picture -
  • Botpic - circle bot icon shown in telegram interface

Acceptance criteria:

  • Choose 3-5 logo options (use existing pictures or generate something unique with a model, e.g. Kandinsky, Midjourney)
  • Create a poll in the team chat to vote
  • Create a new static folder in the repo and add the image file logo.jpg with the most votes
  • Add a metadata.txt file formatted in the following way:
About: <text>
Description: <text>
Description picture: <text>

Check telegram docs for more details on metadata

Dev: Implement `/story` command

Implement a new /story command that returns a fun fact about a random city:

  • Support English and Russian
  • The prompt language should be chosen based on the user's Telegram profile settings
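
Telegram exposes the user's profile language as an optional `language_code` field (an IETF language tag such as `ru` or `en-US`). A minimal sketch of mapping it to one of the two supported languages is shown below. `prompt_language` is a hypothetical name, and falling back to English for a missing or unsupported code is an assumption.

```python
from typing import Optional

SUPPORTED_LANGUAGES = ("en", "ru")
FALLBACK_LANGUAGE = "en"  # assumption: English when the code is missing or unsupported


def prompt_language(language_code: Optional[str]) -> str:
    """Map a Telegram language_code to a supported prompt language."""
    if not language_code:
        return FALLBACK_LANGUAGE
    # Keep only the primary subtag, e.g. "ru-RU" becomes "ru".
    primary = language_code.split("-")[0].lower()
    return primary if primary in SUPPORTED_LANGUAGES else FALLBACK_LANGUAGE
```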
