AIST

This repository contains materials for the AIST conference.

Data

Both datasets have been manually anonymized to comply with VK's privacy policy. They were also normalized slightly differently from the initial versions used in this study.

msg_with_emoji.csv is a small dataset (~10k samples) of annotated messages that contain emoji.

train_data.csv is a relatively large dataset (~54k samples). Only half of it is uploaded here; the full version can be uploaded later.

Model

model.joblib is a trained logistic regression model for predicting emotions in Russian text messages.

vect.joblib is a TF-IDF vectorizer for the aforementioned model.
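As a rough illustration, the two files could be combined as follows: the vectorizer turns raw messages into TF-IDF features and the model classifies them. This is only a sketch, not the repository's own code. The helper name `predict_emotions` is hypothetical, the set of emotion labels is whatever the saved model was trained on, and any text normalization applied before training is not reproduced here. It assumes both files load as scikit-learn objects with a compatible scikit-learn version installed.

```python
import joblib


def predict_emotions(texts, model_path="model.joblib", vect_path="vect.joblib"):
    """Classify raw text messages with the saved vectorizer and model.

    Hypothetical helper: assumes vect.joblib exposes transform() and
    model.joblib exposes predict(), as scikit-learn estimators do.
    """
    vectorizer = joblib.load(vect_path)
    model = joblib.load(model_path)
    features = vectorizer.transform(texts)  # sparse TF-IDF matrix, one row per message
    return model.predict(features)          # one predicted label per message
```

Calling `predict_emotions(["some message"])` from the repository root would return an array with one label per input message.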

Parser

Parser.py is a high-level parser for text messages requested from VK and Telegram.

It requires the path to the folder with messages and recursively iterates through all files.

The VK parser reads *.html files, and the TG (Telegram) parser reads *.json files.
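The recursive file lookup described above might look roughly like this. It is a minimal sketch using only the standard library, not the code in Parser.py. The names `EXTENSIONS` and `find_message_files` are hypothetical, and the mapping from `mtype` to file extension is taken from the description rather than from the implementation.

```python
from pathlib import Path

# Assumed mapping from message type to file extension (from the description above).
EXTENSIONS = {"vk": ".html", "tg": ".json"}


def find_message_files(root, mtype):
    """Recursively collect files under root whose extension matches mtype."""
    suffix = EXTENSIONS[mtype]  # raises KeyError for an unknown mtype
    # rglob walks the folder and all of its subfolders.
    return sorted(p for p in Path(root).rglob("*" + suffix) if p.is_file())
```

Looking up the extension first keeps the directory walk identical for both sources, so only the per-file parsing step would differ between VK and Telegram.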

Usage

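from Parser import Parser

# 'PATH' is the folder with messages; mtype selects the source.
# Create the parser with one of the two calls below.
parser = Parser('PATH', mtype='tg')  # parse TG messages
parser = Parser('PATH', mtype='vk')  # parse VK messages
df = parser.parse()  # returns a DataFrame with text messages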
