
spam_classification_with_bert's Introduction

CA4023_Assignment_2

This is the repository for the second assignment in CA4023 - Natural Language Technologies. The assignment has two parts: part one improves a sentiment analysis system (using two approaches), and part two fine-tunes BERT on a spam classification task. The repository contains one folder per part, and each folder contains a Jupyter notebook with all of the code.

Part 1 - Sentiment analysis system improvements

This section takes a baseline sentiment analysis system built on logistic regression and attempts to improve on it using two approaches:

  • Stop word removal
  • Named entity recognition
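
As a rough illustration of the first approach, the sketch below filters stop words out of a review before it would be passed to a classifier. It is a minimal sketch, not the notebook's code: the `STOP_WORDS` set, the `remove_stop_words` helper and the sample review are all hypothetical, and whitespace tokenisation is assumed for simplicity.

```python
# Hypothetical, deliberately tiny stop word list for illustration only;
# the list actually used in the notebook is not shown in this README.
STOP_WORDS = {"a", "an", "the", "is", "was", "of", "and", "to", "in", "it", "this"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split on whitespace, and drop tokens in stop_words."""
    tokens = text.lower().split()
    return [token for token in tokens if token not in stop_words]


# Hypothetical sample review.
review = "This film was a waste of time and it is not funny"
filtered = remove_stop_words(review)
print(filtered)
```

Note that "not" is left out of the hypothetical stop word list on purpose: negation words often carry sentiment, so removing them could hurt a sentiment classifier rather than help it.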

Part 2 - Spam Classification using BERT

This section classifies SMS messages as spam or not spam. It uses BERT, released by Google in 2018 (Devlin et al. 2018). BERT is a neural network model pre-trained on two tasks: predicting a masked token in a sentence, and next sentence prediction, which predicts whether the second of two sections of text follows the first in the same document. The model is fine-tuned for this task using Google Colab and an SMS spam dataset available from Hugging Face.
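
To make the two pre-training tasks more concrete, the sketch below shows roughly how their inputs are laid out. It is a simplified illustration under stated assumptions, not code from the notebook: `build_pair_input` and `mask_token` are hypothetical helpers, tokens are plain words rather than WordPiece sub-words, and the masked position is chosen by hand, whereas BERT masks a random subset of tokens.

```python
def build_pair_input(tokens_a, tokens_b):
    """Join two token lists in BERT's sentence-pair layout.

    Returns the combined tokens and segment ids (0 for the first
    section and its special tokens, 1 for the second section).
    """
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


def mask_token(tokens, position):
    """Replace one token with [MASK]; return the masked copy and the hidden token."""
    masked = list(tokens)
    target = masked[position]
    masked[position] = "[MASK]"
    return masked, target


# Hypothetical example text, split into two sections.
tokens, segment_ids = build_pair_input(["win", "a", "free", "prize"], ["reply", "now"])
masked_tokens, target = mask_token(tokens, 3)

print(masked_tokens)
print(segment_ids)
print(target)
```

During pre-training the model would be asked to recover `target` from `masked_tokens`, and to judge from the `[CLS]` position whether the second section really follows the first. Fine-tuning for spam classification reuses the same `[CLS]` position with a spam / not-spam label instead.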

Use of this code

To clone this repository, run the following commands:

git clone https://gitlab.computing.dcu.ie/dockreg2/ca4023_assignment_2.git
cd ca4023_assignment_2
