
musicner's Introduction

Recognizing Musical Entities in User-generated Content

We present a novel method for detecting musical entities in user-generated content, modelling linguistic features with statistical models and extracting contextual information from a radio schedule. We analyzed tweets related to a classical music radio station, integrating its schedule to connect users' messages to the tracks that were broadcast.

This repository contains code to reproduce the results of our arXiv paper.

Reference:

Lorenzo Porcaro, Horacio Saggion (2019). Recognizing Musical Entities in User-generated Content. Paper presented at the International Conference on Computational Linguistics and Intelligent Text Processing (CICLing) 2019, University of La Rochelle, La Rochelle, 7-13 April.

Contact:

lorenzo.porcaro at gmail.com

Reproduce our results

Installation:

Create a Python 2.7 (sorry!) virtual environment and install the dependencies:

pip install -r src/requirements.txt

Update config file:

Update the file etc/config.yaml by inserting your consumer key, consumer secret, access token, and access secret from the Twitter API. More info about the API: https://developer.twitter.com/

Import data:

To receive the data for reproducing the experiment, please contact lorenzo.porcaro at gmail.com. Once received, go to the data README page for more info.

Pre-process data:

To pre-process the data, run:

python src/hydrate_tweet.py -i ../path/to/input/file.json

It reads the tweet IDs and related annotations from the input file and creates the following output files:

  1. INPUTFILE_entities.csv: list of entities annotated
  2. INPUTFILE_summary.csv: tweets summary information (creation date, raw text, etc)
  3. INPUTFILE_text_tkn.txt: tweet raw texts tokenized
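As a quick way to check that pre-processing produced everything, the expected output names can be derived from the input path. This is only a sketch: it assumes that INPUTFILE is the input file name without its extension and that the outputs are written next to the input file, neither of which is stated above. The helper `expected_outputs` is hypothetical and not part of the repository.

```python
import os


def expected_outputs(input_path):
    """Return the three output file names expected for a given input file."""
    # Assumption: INPUTFILE is the input path with its extension removed,
    # and outputs live in the same directory as the input.
    base, _ = os.path.splitext(input_path)
    return [
        base + "_entities.csv",   # list of annotated entities
        base + "_summary.csv",    # tweet summary information
        base + "_text_tkn.txt",   # tokenized raw tweet texts
    ]
```

For example, `expected_outputs("data/tweets.json")` lists the three paths to look for under `data/`.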

Extract features:

To extract the required features from the data, run:

python src/extract_features.py -i ../path/to/INPUTFILE_summary.csv -e ../path/to/INPUTFILE_entities.csv -o ../path/to/OUTPUTFILE_WEKA.csv -n ../path/to/OUTPUTFILE_biLSTM_CRF.csv

It extracts the features needed for the experiments from the input tweets. It takes INPUTFILE_summary.csv and INPUTFILE_entities.csv as input and creates two output files: one that can be used as input to WEKA, and one that can be used as input to the BiLSTM-CNN-CRF architecture implementation for sequence tagging.

Schedule matching:

To run the matching against the schedule, run:

python src/schedule_matcher.py -w work_tsl -c contr_tsl -t time_tsl -i ../path/to/UGC_INPUTFILE_summary.csv -s ../path/to/SCHEDULE_INPUTFILE_summary.csv

It searches for matches between entities annotated in the schedule and in user-generated tweets, and writes the results to a text file in CoNLL format. The input parameters are the input summary files and the thresholds:

  • time_tsl (int): time-distance threshold (in seconds) between schedule tweet and user-generated tweet
  • work_tsl (float): string similarity threshold for Musical Work entities
  • contr_tsl (float): string similarity threshold for Contributor entities
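To illustrate how the three thresholds could interact, here is a minimal sketch. It is not the logic of src/schedule_matcher.py: the similarity measure (Python's `difflib.SequenceMatcher`), the dictionary keys, the use of epoch seconds for timestamps, and the rule "close in time and at least one entity similar enough" are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1] (difflib as a stand-in)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_match(ugc, sched, work_tsl, contr_tsl, time_tsl):
    """Hypothetical rule: tweets are close in time and one entity is similar."""
    # Timestamps are assumed to be epoch seconds.
    if abs(ugc["time"] - sched["time"]) > time_tsl:
        return False
    work_ok = similarity(ugc["work"], sched["work"]) >= work_tsl
    contr_ok = similarity(ugc["contributor"], sched["contributor"]) >= contr_tsl
    return work_ok or contr_ok


schedule_tweet = {"time": 1000, "work": "Symphony No. 5",
                  "contributor": "Ludwig van Beethoven"}
user_tweet = {"time": 1300, "work": "symphony no. 5",
              "contributor": "Beethoven"}

# 300 seconds apart: accepted with a 600 s window, rejected with a 100 s one.
print(is_match(user_tweet, schedule_tweet, 0.8, 0.8, 600))
print(is_match(user_tweet, schedule_tweet, 0.8, 0.8, 100))
```

In this sketch, raising time_tsl widens the window around each schedule entry, while raising work_tsl or contr_tsl demands closer string agreement.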

The output file is written to results/schedule_matcher_%s_%s_%s.txt, where each %s in the file path is replaced by one of the threshold values used.

To evaluate the results obtained from the schedule matching, run:

src/conlleval < results/schedule_matcher_%s_%s_%s.txt > results/score.schedule_matcher_%s_%s_%s.txt
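For reference, conlleval reads one token per line, with the gold tag and the predicted tag as the last two whitespace-separated columns and a blank line between sentences. The sketch below builds text in that shape. The helper `to_conll` and the tag names (B-CONTR, B-WORK, I-WORK) are hypothetical, and whether src/schedule_matcher.py emits exactly these three columns is an assumption.

```python
def to_conll(sentences):
    """Serialize (token, gold_tag, predicted_tag) triples as CoNLL-style text."""
    lines = []
    for sentence in sentences:
        for token, gold, predicted in sentence:
            lines.append("%s %s %s" % (token, gold, predicted))
        lines.append("")  # a blank line marks the sentence boundary
    return "\n".join(lines)


example = [[
    ("Beethoven", "B-CONTR", "B-CONTR"),
    ("Symphony", "B-WORK", "B-WORK"),
    ("5", "I-WORK", "O"),  # a prediction error the scorer would count
]]
print(to_conll(example))
```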
