Blade-chest model

Overview

Blade-chest (BC) is software developed by Shuo Chen from the Department of Computer Science, Cornell University. It learns from matchup or comparison data to predict future outcomes. It can handle intransitivity, which many conventional methods do not cover. Please see our paper for more details. This program is granted free of charge for research and education purposes. However, you must obtain a license from the author to use it for commercial purposes. Since it is free, there is no warranty for it.

Build

A simple "make" will do. It creates a binary ../bin/BC, which is used for both training and testing.

Usage

Format of the game record data

The format is intuitive and mostly human-readable, as demonstrated by the files in the datasets folder. The first line is the total number of players, followed by lines that contain all the players' IDs (starting from 0) and names. Then there is one line for the total number of games. The rest are the game records. Each record can be prefixed with one of the tags "FOR_TRAINING", "FOR_VALIDATION" and "FOR_TESTING", indicating what that record is used for. Otherwise, when no record is prefixed, the records are randomly divided in a 5:2:3 ratio into training, validation and testing sets. In the latter case, you can use a different random seed to generate a different split. The main body of a record takes the form "a:b x:y", meaning a beats b x times and loses to b y times.
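
As an illustration of the record format described above, here is a minimal Python sketch that parses a single game record. It is not part of BC. The function name parse_record is hypothetical, and the sketch assumes that an optional tag is separated from the "a:b x:y" body by whitespace; check the files in the datasets folder for the exact layout.

```python
import re

# Tags that may prefix a record, as listed in this README.
TAGS = ("FOR_TRAINING", "FOR_VALIDATION", "FOR_TESTING")

# Body of a record: "a:b x:y" (a beats b x times and loses to b y times).
RECORD_RE = re.compile(r"^(\d+):(\d+)\s+(\d+):(\d+)$")


def parse_record(line):
    """Parse one game record into (tag, a, b, a_wins, b_wins).

    tag is None when the record carries no prefix.
    Assumption: the tag and the body are separated by whitespace.
    """
    parts = line.split()
    tag = None
    if parts and parts[0] in TAGS:
        tag = parts[0]
        parts = parts[1:]
    match = RECORD_RE.match(" ".join(parts))
    if match is None:
        raise ValueError(f"not a game record: {line!r}")
    a, b, a_wins, b_wins = (int(g) for g in match.groups())
    return tag, a, b, a_wins, b_wins


print(parse_record("FOR_TRAINING 3:7 2:1"))  # ('FOR_TRAINING', 3, 7, 2, 1)
print(parse_record("0:4 1:0"))               # (None, 0, 4, 1, 0)
```

The first example reads as "player 3 beat player 7 twice and lost once, and this record is used for training".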

Format of the matchup matrix data

This input data is optional. It is only needed for running the matrix reconstruction experiment in our paper. The format is intuitive as well, as shown in the example file datasets/sf4/sf4mat.txt. It is a matrix of integers in [1, 9], except that the diagonal elements are 0. Rows are separated by '\n' and columns by ' '.
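
The constraints above can be checked with a short Python sketch like the following. It is not part of BC: the function name parse_matchup_matrix is hypothetical, the sample matrix is made up for illustration, and the sketch additionally assumes the matrix is square.

```python
def parse_matchup_matrix(text):
    """Parse a matchup matrix and check the constraints stated above.

    Rows are separated by '\n' and columns by ' '. Off-diagonal entries
    must be integers in [1, 9] and diagonal entries must be 0.
    Assumption: the matrix is square (one row and one column per player).
    """
    rows = [
        [int(tok) for tok in line.split(" ") if tok]
        for line in text.split("\n")
        if line.strip()
    ]
    n = len(rows)
    for i, row in enumerate(rows):
        if len(row) != n:
            raise ValueError(f"row {i} has {len(row)} columns, expected {n}")
        for j, value in enumerate(row):
            if i == j and value != 0:
                raise ValueError(f"diagonal entry ({i}, {j}) must be 0")
            if i != j and not 1 <= value <= 9:
                raise ValueError(f"entry ({i}, {j}) must be in [1, 9]")
    return rows


# Made-up 3x3 example, not taken from any dataset.
sample = "0 7 3\n3 0 6\n7 4 0\n"
print(parse_matchup_matrix(sample))  # [[0, 7, 3], [3, 0, 6], [7, 4, 0]]
```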

Running the program

BC is used in the following format:

BC [options] data_file model_file

An example is (assuming you are under /bin):

./BC -d 5 -l 0.001 -M 2 ../datasets/tennis/tennis.txt mymodel.txt

Available options are:

-d						int               Dimensionality of the embedding (default 2)

-e						float             Error allowed for termination (default 1e-4)

-i						float             Learning rate (default 1e-2)

-l						float             Regularization coefficient (default 0.0)

-r						[0, 1]            Including the bias terms (1) or not (0) (default 1)

-S						int               The seed for random number generator to create different training, validation, and testing split (default 0)

-a						float             Adaptively increase the learning rate by this factor if the improvement of the training objective function is too small (default 1.1, not recommended to change if you run the code on our datasets)

-b						float             Adaptively decrease the learning rate by this factor if the training objective function deteriorates (default 2.0, not recommended to change if you run the code on our datasets)

-t						[0, 1, 2]         0: 2-norm regularizer on blade and chest vectors 
  						                  1: regularizer on the distances between blade and chest vectors (default)
  						                  2: sum of type 0 and type 1 as regularizer 

-M						[0, 2]            dist model (0) or inner model (2) (default 0. Note that 1 is another matchup function we experimented with, but it did not perform as well. You can try it if you like.)


-E						path              the matchup matrix data if you are running the reconstruction experiment (default '\0', meaning it is not used)  
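
If you want to launch BC from a script, for example to try several option values, the command line can be assembled as in the Python sketch below. The helper build_bc_command is hypothetical and not part of BC. It only builds the argument list in the "BC [options] data_file model_file" form shown above, and does not check that the flags are valid.

```python
def build_bc_command(data_file, model_file, binary="./BC", **options):
    """Build the argument list for 'BC [options] data_file model_file'.

    Each keyword argument becomes a '-flag value' pair, e.g. d=5 -> '-d 5'.
    The flags are passed through unchecked; see the option list above.
    """
    cmd = [binary]
    for flag, value in options.items():
        cmd += [f"-{flag}", str(value)]
    cmd += [data_file, model_file]
    return cmd


# Reproduces the example command from this README.
cmd = build_bc_command(
    "../datasets/tennis/tennis.txt", "mymodel.txt", d=5, l=0.001, M=2
)
print(" ".join(cmd))
# ./BC -d 5 -l 0.001 -M 2 ../datasets/tennis/tennis.txt mymodel.txt
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the bin directory once the binary has been built.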

Outputs

There are two outputs: the training log and validation/testing results, printed to stdout, and the model, stored in model_file.

The log is human-readable. It contains information for each training iteration, validation/test log-likelihood and accuracy, results for the naive baselines (as a bonus), and the reconstruction results if you use -E to provide a matchup matrix file. Note that you can run the Bradley-Terry model by simply using the "-d 0 -r 1" options.

The first four lines of the model file contain numplayers, d, rankon (whether bias terms are used) and modeltype (basically -M). They are followed by tvecs and hvecs, the chest vectors and blade vectors (we called them tail vectors and head vectors initially). At the end, there is one line of "ranks" if you turned -r on.
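
For reference, the standard Bradley-Terry model in its logistic form can be sketched in Python as follows. This is a textbook illustration, not BC's code. With "-d 0 -r 1" there are no blade or chest vectors, so only the per-player bias terms remain; the names gamma_a and gamma_b for those terms are hypothetical.

```python
import math


def bradley_terry_prob(gamma_a, gamma_b):
    """Probability that player a beats player b under Bradley-Terry.

    gamma_a and gamma_b are scalar strengths (hypothetical names for the
    per-player bias terms). The probability depends only on their difference.
    """
    return 1.0 / (1.0 + math.exp(-(gamma_a - gamma_b)))


print(bradley_terry_prob(0.0, 0.0))  # 0.5, equal strengths
print(bradley_terry_prob(1.0, 0.0) > 0.5)  # True, the stronger player is favored
```

Because each player is described by a single number, this model always yields a transitive ranking, which is the limitation the blade and chest vectors are meant to lift.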

Datasets

These datasets were collected and processed by Shuo Chen from multiple public sources on the internet. Every dataset used in our paper is under /datasets, except the peer grading ones, which we do not have the right to release. We do not own these data. Please cite each individual source if you use them for research or education purposes, and contact the source for any commercial use. Please see our paper for details on the sources.

Bug Report

Please contact the author if you spot any bug in the software.

References

If you use the software, please cite the following paper:

Shuo Chen, Thorsten Joachims. Modeling Intransitivity in Matchup and Comparison Data. The 9th ACM International Conference on Web Search and Data Mining (WSDM), 2016.
