Encode NLP workshop 2023

A practical introduction to machine learning and natural language processing on papyrus data.

This training activity covers how to explore papyrus data using natural language processing (NLP) and machine learning techniques, and how to build simple machine learning models that classify papyrus characteristics. It takes the participant through every step, from downloading and preparing a dataset to training a classification model. The activity runs in Google Colab, using Python scripts prepared in advance that participants can modify to achieve the desired outcomes. Basic experience with the Python programming language is recommended.

A workshop created by André Walsøe for the Encode Workshop AI and Ancient Writing Cultures, Bologna, 23rd-27th January 2023.

Table of contents

Part 1:

  • Introduction to machine learning and NLP: Slides
  • Setup of Google Colab to run the workshop material (15 minutes)
    1. Open notebook: Open In Colab
    2. Click "Connect" in the upper right corner.
    3. Log in to your Google account.
    4. Click "Copy to Drive" in the upper left corner. The notebook will then be copied to your Google Drive.
    5. Click "Connect" in the upper right corner to connect to a computing instance.
  • Download dataset and install libraries

Part 2 Data Exploration and introduction to NLP techniques

  • 1 Data exploration
    1. Basic data exploration and filtering with Pandas
    2. Application of filtering techniques based on data exploration findings
    3. Hands-on-exercise: Data exploration and filtering
  • 2 Introduction to NLP techniques
    1. Lowercasing text
    2. Tokenization
    3. Stopword removal
    4. Vectorization (count and tf-idf)
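
The four NLP steps listed above can be sketched with the Python standard library alone. This is a minimal illustration, not the workshop notebook's code: the example texts, the tiny stopword list, and the helper names (`preprocess`, `count_vector`, `tfidf_vector`) are all assumptions made for the sketch, and the tf-idf weight shown is the plain `count * log(N / df)` form, whereas libraries often apply smoothed or normalized variants.

```python
import math
import re
from collections import Counter

# Hypothetical example texts; the workshop dataset is not reproduced here.
texts = [
    "The Letter concerns a sale of land",
    "A receipt for the sale of wheat",
    "Private letter to a brother",
]

# A tiny illustrative stopword list (an assumption, not the workshop's list).
STOPWORDS = {"the", "a", "of", "for", "to"}


def preprocess(text):
    """Lowercase the text, tokenize it, and remove stopwords."""
    # \w+ matches runs of Unicode word characters, so it also works
    # for non-Latin scripts such as Greek.
    tokens = re.findall(r"\w+", text.lower())
    return [tok for tok in tokens if tok not in STOPWORDS]


docs = [preprocess(text) for text in texts]

# The vocabulary fixes the order of the vector dimensions.
vocab = sorted({tok for doc in docs for tok in doc})


def count_vector(doc):
    """Count vectorization: one raw term count per vocabulary entry."""
    counts = Counter(doc)
    return [counts[term] for term in vocab]


def tfidf_vector(doc):
    """Unsmoothed tf-idf: term count times log(N / document frequency)."""
    counts = Counter(doc)
    vector = []
    for term in vocab:
        doc_freq = sum(1 for other in docs if term in other)
        vector.append(counts[term] * math.log(len(docs) / doc_freq))
    return vector


count_matrix = [count_vector(doc) for doc in docs]
tfidf_matrix = [tfidf_vector(doc) for doc in docs]

print(vocab)
print(count_matrix[0])
```

In the tf-idf vectors, a term that occurs in several documents (here "letter" and "sale") receives a lower weight than a term found in only one, which is the motivation for preferring tf-idf over raw counts.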

Part 3 Building a text classification model

  • Building a text classification model
    1. Choose what to classify and which input data to use
    2. Split data into training and test sets
    3. Transform/vectorize data
    4. Training a logistic regression model
    5. Test and evaluate metrics
    6. Deploy model with Gradio
    7. Hands-on exercise: Reflect on possible use cases for these techniques
  • Wrap-up
  • Resources for learning more.
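
Steps 2 to 5 of the model-building list above (split, vectorize, train, evaluate) can be sketched end to end in plain Python. This is an illustrative sketch under stated assumptions, not the workshop's implementation: the toy texts and labels are invented, the split simply holds out every fourth example, and logistic regression is trained with a hand-written stochastic gradient descent loop so that the mechanics are visible. In practice a machine learning library would normally handle these steps.

```python
import math
from collections import Counter

# Hypothetical toy data (text, label): 1 = private letter, 0 = receipt.
data = [
    ("greetings brother send news", 1),
    ("greetings sister write soon", 1),
    ("dear father greetings health", 1),
    ("send news dear mother", 1),
    ("received payment wheat tax", 0),
    ("paid tax drachmas received", 0),
    ("payment drachmas wheat measured", 0),
    ("received rent paid drachmas", 0),
]

# Step 2: hold out every fourth example as a small, deterministic test set.
test = data[3::4]
train = [example for i, example in enumerate(data) if i % 4 != 3]

# Step 3: build the vocabulary from the training data only, then count-vectorize.
vocab = sorted({tok for text, _ in train for tok in text.lower().split()})
index = {tok: i for i, tok in enumerate(vocab)}


def vectorize(text):
    """Turn a text into a count vector; words unseen in training are ignored."""
    vector = [0.0] * len(vocab)
    for tok, count in Counter(text.lower().split()).items():
        if tok in index:
            vector[index[tok]] = float(count)
    return vector


def predict_proba(weights, bias, vector):
    """Probability of label 1 under the logistic (sigmoid) model."""
    z = bias + sum(w * x for w, x in zip(weights, vector))
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(examples, lr=0.1, epochs=200):
    """Step 4: fit weights with stochastic gradient descent on the log loss."""
    weights, bias = [0.0] * len(vocab), 0.0
    for _ in range(epochs):
        for text, label in examples:
            vector = vectorize(text)
            error = predict_proba(weights, bias, vector) - label
            weights = [w - lr * error * x for w, x in zip(weights, vector)]
            bias -= lr * error
    return weights, bias


weights, bias = train_logreg(train)


def predict_label(text):
    """Classify a text as 1 (letter) or 0 (receipt) at a 0.5 threshold."""
    return int(predict_proba(weights, bias, vectorize(text)) >= 0.5)


# Step 5: evaluate accuracy on the held-out test set.
correct = sum(predict_label(text) == label for text, label in test)
accuracy = correct / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

A text-in, label-out function such as the hypothetical `predict_label` is the kind of callable that a demo tool like Gradio could wrap for step 6. On real data, accuracy alone can be misleading when classes are imbalanced, so per-class metrics such as precision and recall are worth inspecting as well.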

Workshop agenda

Session 1 (45 min):

  • Introduction
  • Setup of Google Colab
  • Introduction to Data exploration

Session 2 (45 min):

  • Hands-on task 1: Data Exploration
  • Introduction to basic NLP techniques
  • How to build a text classifier

Session 3 (45 min):

  • How to build a text classifier (continuation)
  • Brainstorming and discussion: How can ML and NLP be used in my field?
  • Wrap up
