OdiaGenAI_Interns_2023

About

This repo is used by the OdiaGenAI interns selected for Generative AI and LLM projects.

How to use this repo?

# Install pre-commit, then set up its git hooks in this repo
pip install pre-commit
pre-commit install

Please refer to the guidelines in Instructions_and_Guidelines.

Interns

  • Nipun Balachandran Nair (Amrita Vishwa Vidyapeetham, Bangalore)
  • Shubhendra Kushwaha (ITER, Bhubaneswar)
  • Nihar Ranjan Samal (Silicon Institute of Technology, Bhubaneswar)
  • Madhusmita Mohanty (MITS School of Biotechnology, Bhubaneswar)
  • Parul Agarwal (Institute of Mathematics and Applications, Bhubaneswar)
  • Debasish Dhal (NISER, Bhubaneswar)
  • Subham Pradhan (Silicon Institute of Technology, Bhubaneswar)
  • Adit Sharma (Jaypee Institute of Information Technology, Noida)
  • Aisha Asif (KIIT, Bhubaneswar)
  • Prosper Abel Mgimwa (KIIT, Bhubaneswar)
  • Muhammed Abdur Rahmaan Kamaldeen (KIIT, Bhubaneswar)
  • Sai Harish (Raghu Engineering College, Visakhapatnam)
  • Sai Snehith K (Amrita Vishwa Vidyapeetham, Bangalore)
  • Samirit Saha (Amrita Vishwa Vidyapeetham, Bangalore)
  • Pragyan Prusty (ITER, Bhubaneswar)
  • Sk Shahid (Silicon Institute of Technology, Bhubaneswar)

odiagenai_interns_2023's Issues

Argilla for Validation Interface in Colab

Hi, this is Samirit Saha. Here is a Python file with my work so far on a validation interface built with the Argilla UI and Python on Google Colab. So far I can open the Argilla UI on Google Colab and download the all_combined_odia_171k dataset from Hugging Face onto Colab. The code is divided into two sections: one with the code that works so far, and one with the code that doesn't.

Inability to store extracted data in a .txt file

Currently, the code cannot save the extracted data. After the extraction step runs, the extracted content is not written to any file, such as a .txt file.
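
As a starting point for this issue, here is a minimal sketch of writing extracted text to a UTF-8 .txt file. The function name `save_extracted_text`, its parameters, and the one-chunk-per-line layout are assumptions for illustration and do not come from the existing code.

```python
from pathlib import Path


def save_extracted_text(chunks, out_path):
    """Write each non-empty chunk on its own line and return how many were written.

    `chunks` is assumed to be an iterable of strings (or None) produced by the
    extraction step; this helper is hypothetical, not part of the repo.
    """
    path = Path(out_path)
    # Create the output folder if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    # Drop None and whitespace-only entries, trim the rest.
    cleaned = [c.strip() for c in chunks if c and c.strip()]
    # UTF-8 keeps Odia and other Indic scripts intact.
    text = "\n".join(cleaned) + ("\n" if cleaned else "")
    path.write_text(text, encoding="utf-8")
    return len(cleaned)
```

Passing `encoding="utf-8"` explicitly matters here because the platform default encoding may not be able to represent Odia text.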

Translation (T1_ST2)

Build a translation model for translating from English to Indic languages.

  1. Extract parallel (paired) datasets from Hugging Face and other sources, and push the data to the central repo.
  2. Build an English-to-XX translation model using an LLM (R&D on T5, NLLB, and others).
  3. XX is Odia or any other Indic language.
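
The data step in item 1 could be sketched as below: clean a list of (English, XX) sentence pairs and serialise them as JSON Lines before pushing them to the central repo. The record keys `en` and `xx`, the function names, and the JSONL layout are assumptions for illustration; the task description does not specify a storage format.

```python
import json


def clean_pairs(pairs):
    """Trim whitespace, drop pairs with an empty side, and remove duplicates.

    Duplicates are detected case-insensitively on the English side only.
    """
    seen = set()
    cleaned = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue  # skip incomplete pairs
        key = (src.lower(), tgt)
        if key in seen:
            continue  # skip repeated pairs
        seen.add(key)
        cleaned.append({"en": src, "xx": tgt})
    return cleaned


def to_jsonl(records):
    """Serialise records one JSON object per line, keeping non-ASCII text readable."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
```

`ensure_ascii=False` keeps Odia text as readable characters in the output instead of `\u` escapes.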

Initial code for webapp

I have included the source files for our web scraping web application using Streamlit and Python. These files contain all the necessary code to scrape data from websites and present it in a user-friendly and interactive manner.

Data Preparation (T1)

  1. T1_ST1 - Translate the instruction set from English to XX using the existing translation app available in the OdiaGenAI GitHub.
  2. T1_ST3 - Validate the instruction set (please refer to https://drive.google.com/drive/u/0/folders/16GUxTEbcvE-RL-JNrTOtGliaPBDZawso).
  3. T1_ST4 - Generate instructions manually in Odia, keeping the local context in mind.
  4. T1_ST6 - Prepare an evaluation test set in Odia/Indic languages for evaluating pre-trained model performance.

Unable to retrieve data from sitemaps.

When the tool attempts to extract data from sitemaps, it fails to fetch any information and returns empty results.
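
The cause of the empty results is not stated in this issue. One common cause worth ruling out is the XML namespace that standard sitemaps declare: a lookup for a bare `loc` tag matches nothing unless the namespace is supplied. The sketch below assumes the sitemap XML has already been downloaded as a string; `extract_sitemap_urls` is a hypothetical helper, not a function from the web app.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_sitemap_urls(xml_text):
    """Return every <loc> value found in a sitemap or sitemap index document."""
    root = ET.fromstring(xml_text)
    # Namespaced lookup; a plain ".//loc" finds nothing in a standard sitemap.
    locs = root.findall(".//sm:loc", SITEMAP_NS)
    if not locs:
        # Fallback for non-standard sitemaps that omit the namespace.
        locs = root.findall(".//loc")
    return [loc.text.strip() for loc in locs if loc.text and loc.text.strip()]


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/a</loc></url>
  <url><loc>https://example.com/b</loc></url>
</urlset>"""

print(extract_sitemap_urls(sample))
# ['https://example.com/a', 'https://example.com/b']
```

For a sitemap index file, the same function returns the URLs of the nested sitemaps, which would then need to be fetched and parsed in turn.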
