IASA NLP Course

Setup Working Environment

Prerequisites

Setup environment

Poetry (Recommended)

  1. Install Poetry by following the full Poetry installation guide.
    • Important: Check that it works by running poetry --version
  2. Run this command to keep the .venv folder inside your project: poetry config virtualenvs.in-project true
  3. poetry shell
    • Important: If you have conda and two environments end up activated at once, run conda deactivate
  4. poetry install --no-root

To activate the environment on later uses, run the command below. Important: you must be inside the project directory.

poetry shell
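
To confirm that poetry shell picked up the in-project environment from step 2, a small check along these lines may help. It is only a sketch: the function name uses_project_venv is hypothetical, and it assumes the environment lives in a .venv folder at the project root, which is what virtualenvs.in-project true is expected to produce.

```python
import sys
from pathlib import Path


def uses_project_venv(prefix: str, project_root: str) -> bool:
    """Return True if `prefix` points at <project_root>/.venv."""
    # Assumption: the in-project environment is named ".venv".
    expected = Path(project_root).resolve() / ".venv"
    return Path(prefix).resolve() == expected


if __name__ == "__main__":
    # sys.prefix is the root directory of the active Python environment.
    print(uses_project_venv(sys.prefix, "."))
```

Run it from the project root after poetry shell. If it prints False, another environment (for example a conda one) is probably still active.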

Conda

If you have CUDA

conda env create -f environment_gpu.yaml

Otherwise

conda env create -f environment.yaml

To activate the environment

conda activate iasa_nlp_env
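
If you are unsure which of the two environment files applies to your machine, the sketch below prints a suggested command. It rests on a loud assumption: finding the nvidia-smi tool on PATH is treated as a rough hint that an NVIDIA driver is installed, which is not proof of a working CUDA setup. The function name pick_environment_file is hypothetical.

```python
import shutil


def pick_environment_file(gpu_tool_found: bool) -> str:
    """Map the (heuristic) GPU hint to one of the two environment files."""
    return "environment_gpu.yaml" if gpu_tool_found else "environment.yaml"


if __name__ == "__main__":
    # Assumption: nvidia-smi on PATH suggests an NVIDIA driver is present.
    found = shutil.which("nvidia-smi") is not None
    print(f"conda env create -f {pick_environment_file(found)}")
```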

Start Jupyter

You may use any port

jupyter lab --port 7766
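
Since any port works, it can be handy to check whether the one you picked is already taken. The following is a minimal sketch, assuming Jupyter will listen on the local loopback address. The helper port_is_free is a hypothetical name, and the check only tells you whether something currently accepts connections on that port.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing accepts connections on host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds, i.e. the port is taken.
        return sock.connect_ex((host, port)) != 0


if __name__ == "__main__":
    print(port_is_free(7766))
```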

Content

  1. Structure and structural elements of an ML problem statement. Formalizing business problems. Main tasks and methods in Natural Language Processing - Volodymyr
  2. Machine representation of natural languages. Classical and neural vectorization algorithms. Classical ML approaches in NLP. - Vladyslav
  3. Key metrics in NLP (natural language processing). Building the evaluation of approaches and models in NLP: validation. - Anton
  4. Approaches using RNN/GRU/LSTM architectures. - Volodymyr
  5. Approaches using the Transformer architecture. - Anton
  6. Generative tasks: machine translation, text summarization, conditional and unconditional text generation, an overview of the GPT architecture - Vladyslav
  7. The clustering task. The topic modeling task. - Volodymyr
  8. MLOps: model deployment. - Anton

Use Kaggle or Colab for computations

Kaggle

  1. Create a Kaggle account
  2. Create a Notebook
  3. Explore the docs and find out how to:
    • Add a Kaggle dataset to the notebook
    • Turn on the GPU

Colab

  1. Create a Notebook in Colab
  2. Enable the GPU
  3. Add a Kaggle dataset to Colab - https://www.geeksforgeeks.org/how-to-import-kaggle-datasets-directly-into-google-colab/

Data

  • For most lectures you will need datasets from Kaggle. Prepare them in advance:
    • CommonLit - Evaluate Student Summaries dataset API command: kaggle competitions download -c commonlit-evaluate-student-summaries
    • Natural Language Processing with Disaster Tweets dataset API command: kaggle competitions download -c nlp-getting-started
    • Mantis Analytics Location Detection dataset API command: kaggle datasets download -d vladimirsydor/mantis-analytics-location-detection
    • Dataset for Topic Modelling: https://drive.google.com/drive/folders/1jwh225T0DIEN4A1wMZ8-dVJX-2Tsovqf?usp=sharing
  • We recommend creating a data folder in the course root directory and putting all datasets there, so that you end up with the following structure:
data/
    nlp_getting_started/
        train.csv
        test.csv
        ...
    ...
Lecture_1/
...
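
The layout above can be produced by unpacking each downloaded archive into its own subfolder of data/. The sketch below is one way to do that, under two assumptions: the download is a .zip file named after the competition or dataset (for example nlp-getting-started.zip), and the folder name is obtained by replacing hyphens with underscores to match the tree shown above. The function name extract_dataset is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_dataset(archive: Path, data_root: Path) -> Path:
    """Unpack `archive` into data_root/<archive name, hyphens -> underscores>."""
    # Assumption: "nlp-getting-started.zip" should land in "nlp_getting_started/".
    target = data_root / archive.stem.replace("-", "_")
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return target


if __name__ == "__main__":
    # Unpack every .zip found in the current directory into ./data.
    for zip_path in sorted(Path(".").glob("*.zip")):
        print(extract_dataset(zip_path, Path("data")))
```

Run it from the course root directory after downloading the archives there.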

How to use Kaggle datasets

  1. Create a Kaggle account
  2. Proceed with Installation & Authentication
  3. Don't forget to join the competition and accept its rules on the Kaggle website.
  4. Download the dataset with the API command
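
Before running the download commands, it may be worth checking that the authentication step actually left credentials in place. The sketch below assumes the two usual locations for the Kaggle API: the KAGGLE_USERNAME and KAGGLE_KEY environment variables, or a kaggle.json file in the .kaggle folder of your home directory. Your setup may differ (for example a custom config directory), and the function name kaggle_credentials_found is hypothetical.

```python
import os
from pathlib import Path


def kaggle_credentials_found(home: Path, env: dict) -> bool:
    """Return True if Kaggle credentials appear to be configured."""
    # Option 1 (assumed): credentials passed through environment variables.
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return True
    # Option 2 (assumed): API token file saved as ~/.kaggle/kaggle.json.
    return (home / ".kaggle" / "kaggle.json").is_file()


if __name__ == "__main__":
    print(kaggle_credentials_found(Path.home(), dict(os.environ)))
```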

Contributors

vsydorskyy, prometheusua, antonbazdyrev, vyelisieievv, nochnoyritzar
