openskill.js's Introduction

JavaScript implementation of the Weng-Lin rating system, as described at https://www.csie.ntu.edu.tw/~cjlin/papers/online_ranking/online_journal.pdf

Speed

Up to 20x faster than TrueSkill!

Model Speed (higher is better) Variance Samples
Openskill/bradleyTerryFull 62,643 ops/sec ±1.09% 91 runs sampled
Openskill/bradleyTerryPart 40,152 ops/sec ±0.73% 91 runs sampled
Openskill/thurstoneMostellerFull 59,336 ops/sec ±0.74% 93 runs sampled
Openskill/thurstoneMostellerPart 38,666 ops/sec ±1.21% 92 runs sampled
Openskill/plackettLuce 23,492 ops/sec ±0.26% 91 runs sampled
TrueSkill 2,962 ops/sec ±3.23% 82 runs sampled

See this post for more.

Installation

Add openskill to your list of dependencies in package.json:

npm install --save openskill

Usage

If you're writing ES6, you can use import; otherwise, use CommonJS's require:

import { rating, rate, ordinal } from 'openskill'

Ratings are kept as objects that represent a Gaussian curve, with a mu property for the mean and a sigma property for the spread (standard deviation). Create these with:

> const { rating } = require('openskill')
> const a1 = rating()
{ mu: 25, sigma: 8.333333333333334 }
> const a2 = rating({ mu: 32.444, sigma: 5.123 })
{ mu: 32.444, sigma: 5.123 }
> const b1 = rating({ mu: 43.381, sigma: 2.421 })
{ mu: 43.381, sigma: 2.421 }
> const b2 = rating({ mu: 25.188, sigma: 6.211 })
{ mu: 25.188, sigma: 6.211 }

If a1 and a2 are on a team, and they win against a team of b1 and b2, send this into rate:

> const { rate } = require('openskill')
> const [[x1, x2], [y1, y2]] = rate([[a1, a2], [b1, b2]])
[
  [
    { mu: 28.67..., sigma: 8.07...},
    { mu: 33.83..., sigma: 5.06...}
  ],
  [
    { mu: 43.07..., sigma: 2.42...},
    { mu: 23.15..., sigma: 6.14...}
  ]
]

Teams can be asymmetric, too! For example, a game like Axis and Allies can be 3 vs 2, and this can be modeled here.
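
As a quick illustration (not from the original examples), here is a minimal sketch of rating a 3 vs 2 match:

const { rating, rate } = require('openskill')

// three players on the winning team, two on the losing team
const winners = [rating(), rating(), rating()]
const losers = [rating(), rating()]

// with no rank or score option, the first-listed team is treated as the winner
const [newWinners, newLosers] = rate([winners, losers])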

Ranking

When displaying a rating, or sorting a list of ratings, you can use ordinal

> const { ordinal } = require('openskill')
> ordinal({ mu: 43.07, sigma: 2.42})
35.81

By default, this returns mu - 3*sigma, a conservative estimate for which there's a 99.7% likelihood that the player's true rating is higher. Because of this, in early games a player's ordinal rating will usually go up, and it can go up even if that player loses.
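
As a minimal sketch (not from the original examples), sorting a leaderboard by ordinal might look like this:

const { ordinal } = require('openskill')

const players = [
  { name: 'alice', rating: { mu: 28.67, sigma: 8.07 } },
  { name: 'bob', rating: { mu: 43.07, sigma: 2.42 } },
]

// highest ordinal first; a low-sigma veteran can outrank a higher-sigma newcomer
players.sort((a, b) => ordinal(b.rating) - ordinal(a.rating))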

Artificial Ranking

If your teams are listed in one order but your ranking is in a different order, for convenience you can specify a rank option, such as:

> const a1 = b1 = c1 = d1 = rating()
> const [[a2], [b2], [c2], [d2]] = rate([[a1], [b1], [c1], [d1]], {
    rank: [4, 1, 3, 2] // 🐌 🥇 🥉 🥈
  })
[
  [{ mu: 20.963, sigma: 8.084 }], // 🐌
  [{ mu: 27.795, sigma: 8.263 }], // 🥇
  [{ mu: 24.689, sigma: 8.084 }], // 🥉
  [{ mu: 26.553, sigma: 8.179 }], // 🥈
]

It's assumed that the lower ranks are better (wins), while higher ranks are worse (losses). You can provide a score instead, where lower is worse and higher is better. These can just be raw scores from the game, if you want.

Ties should have either equivalent rank or score.

> const a1 = b1 = c1 = d1 = rating()
> const [[a2], [b2], [c2], [d2]] = rate([[a1], [b1], [c1], [d1]], {
    score: [37, 19, 37, 42] // 🥈 🐌 🥈 🥇
  })
[
  [{ mu: 24.689, sigma: 8.179 }], // 🥈
  [{ mu: 22.826, sigma: 8.179 }], // 🐌
  [{ mu: 24.689, sigma: 8.179 }], // 🥈
  [{ mu: 27.795, sigma: 8.263 }], // 🥇
]

Predicting Winners

For a given match between any number of teams, you can use predictWin to find the relative odds that each of those teams will win.

> const { predictWin } = require('openskill')
> const a1 = rating()
> const a2 = rating({mu:33.564, sigma:1.123})
> const predictions = predictWin([[a1], [a2]])
[ 0.45110899943132493, 0.5488910005686751 ]
> predictions[0] + predictions[1]
1

Predicting Draws

Also for a given match, you can use predictDraw to get the relative chance that these teams will draw. The number returned should be treated as relative to other matches; in reality, the odds of an actual legal draw will also depend on the rules of the game.

> const { predictDraw } = require('openskill')
> const prediction = predictDraw([[a1], [a2]])
0.09025530533015186

This can be used similarly to how you might use quality in TrueSkill when optimizing a matchmaking system, or when optimizing a tournament tree structure for exciting finals and semi-finals, such as in the NCAA.
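
For instance, a hedged sketch (not from the original examples) of using predictDraw as a match-quality heuristic to pick the closest opponent:

const { rating, predictDraw } = require('openskill')

const player = rating({ mu: 30, sigma: 4 })
const candidates = [rating({ mu: 18, sigma: 7 }), rating({ mu: 29, sigma: 5 })]

// prefer the opponent whose match has the highest relative draw chance
const opponent = candidates.reduce((best, next) =>
  predictDraw([[player], [next]]) > predictDraw([[player], [best]]) ? next : best
)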

Alternative Models

By default, we use a Plackett-Luce model, which is probably good enough for most cases. When speed is an issue, the library runs faster with other models

import { bradleyTerryFull } from './models'
const [[a2], [b2]] = rate([[a1], [b1]], {
  model: bradleyTerryFull,
})
  • Bradley-Terry rating models follow a logistic distribution over a player's skill, similar to Glicko.
  • Thurstone-Mosteller rating models follow a Gaussian distribution, similar to TrueSkill. Gaussian CDF/PDF functions differ in implementation from system to system (they're all just Chebyshev approximations anyway). The accuracy of this model isn't usually as great either, but tuning it with an alternative gamma function can improve the accuracy if you really want to get into it.
  • Full pairing should give more accurate ratings than partial pairing; however, in high-k games (like a 100+ person marathon race), the Bradley-Terry and Thurstone-Mosteller models need to compute a joint probability that involves a (k-1)-dimensional integration, which is computationally expensive. Use partial pairing in this case, where players only change based on their neighbors (see the sketch after this list).
  • Plackett-Luce (default) is a generalized Bradley-Terry model for k ≥ 3 teams. It scales best.
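
A minimal sketch of switching to a partial-pairing model for a large free-for-all, assuming bradleyTerryPart can be imported from the same location as the bradleyTerryFull example above:

import { rating, rate } from 'openskill'
import { bradleyTerryPart } from './models'

// 100 solo competitors in a marathon-style race, listed in finish order
const competitors = Array.from({ length: 100 }, () => [rating()])
const results = rate(competitors, { model: bradleyTerryPart })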

Implementations in other languages

openskill.js's People

Contributors

brezinajn, jlaferri, kejadlen, philihp, renovate-bot, renovate[bot], spencerolson, vaschex

openskill.js's Issues

Plackett-Luce vs Bradley-Terry Full Pairs

Thanks a lot for the library, it seems super useful for matchmaking. I just have one question.

The Plackett-Luce model is the default model over the Bradley-Terry Full Pairs. Why? In which situations is one preferred over the other?

Help explaining ties

Hi!

Could someone please help me understand how ties work given the examples below?

const result = rate(
  [[rating()], [rating()], [rating()], [rating()]],
  {
    rank: [1, 2, 3, 2],
  }
)

//  result, ordinal: [
//    3.0056026990343057, //  1st
//    0.15179388828942209, //  2nd
//    -1.7115960929604057, //  3rd
//    0.15179388828942209 //  2nd
//  ]

So far so good, but...

const result = rate(
  [[rating()], [rating()], [rating()], [rating()], [rating()]],
  {
    rank: [1, 2, 3, 2, 4],
  }
)

//  result, ordinal: [
//    2.794996035203244, //  1st
//    0.44622977732347024, //  ranked 2nd but skill is 3rd
//    0.649325864695502, //  ranked 3rd but skill is 2nd
//    0.44622977732347024, //  ranked 2nd but skill is 3rd
//    -2.6840074686378337 //  4th
//  ]

It gets "worse"...

const result = rate(
  [[rating()], [rating()], [rating()], [rating()], [rating()], [rating()], [rating()], [rating()]],
  {
    rank: [1, 2, 3, 2, 4, 5, 6, 7],
  }
)

//  result, ordinal: [
//    2.3490991742487033, //  1st
//    0.7035551798587676, //  ranked 2nd but skill is 4th
//    1.5576843491759007, //  ranked 3rd but skill is 2nd
//    0.7035551798587676, //  ranked 2nd but skill is 4th
//    0.973568140809558, //  ranked 4th but skill is 3rd
//    0.18397380061958657, //  5th
//    -1.0333401668557407, //  6th
//    -3.668571550329389 //  7th
//  ]

Thanks!

Rating decay to eliminate camping

Is there any provision for rating decay? Fundamentally, the big problem with original TrueSkill is that someone can have a high ranking, end up on leaderboards, and then stop playing. This is called "camping", and is a common behavior pattern. People get a good game streak, their rank gets high, and then they are disincentivized to play ranked modes again.

I've been playing around with a function that increases the sigma value of players that haven't played a minimum of games in a recent time window, but this comes with side effects. Notably, if a player's sigma is increased through rating decay and then they win their next game, their mu value ends up higher than it would have been absent rating decay.

Do you have any thoughts on how to address rating decay? I would like some sort of function that decays the ranking of players that don't play actively.
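
For reference, a rough sketch of the sigma-inflation idea described above; the activity threshold, inflation factor, and cap below are arbitrary assumptions, not features of openskill:

const { rating } = require('openskill')

const DEFAULT_SIGMA = rating().sigma // 25 / 3

// hypothetical helper: widen an inactive player's uncertainty, capped at the default
const applyDecay = (player, gamesInRecentWindow) =>
  gamesInRecentWindow >= 5
    ? player
    : { ...player, sigma: Math.min(player.sigma * 1.2, DEFAULT_SIGMA) }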

Create Docsify site

The README for this is growing too big. I'd like to make a docs tutorial website, similar to http://hathora.dev. That should give more room for examples and answer common questions people have about configuration and tuning.

Ranking producing strange results

Example

import {rate, rating as openRating, ordinal } from 'openskill'

// 12 teams of 1
const teams = [[openRating()],[openRating()],[openRating()],[openRating()],[openRating()],[openRating()],[openRating()],[openRating()],[openRating()],[openRating()],[openRating()],[openRating()]]


// Lowest rank first
const rank = [1,2,2,4,5,5,7,8,9,9,11,12]

const all = rate(teams,{rank})

all.forEach(x=>console.log(ordinal(x[0])))

This outputs

1.9887943085056428
0.7351607181385624
0.7351607181385624
1.5932055407029928
0.2720169616753587
0.2720169616753587
1.0192331720170102
0.6235031893289182
-0.9496298411755326
-0.9496298411755326
-0.8953795301204437
-3.0470369446801193

Which is odd, because the two people tied at ranks 2-3 received a rating of 0.7351607181385624, but the person ranked 4th got a higher rating of 1.5932055407029928.

If I do the inverse (with score) I get a similar result.

const score = [12,11,11,9,8,8,6,4,3,2,1]
const open2 = rate(teams,{score})
open2.forEach(x=>console.log(ordinal(x[0])))

outputs

1.9887943085056428
0.7351607181385624
0.7351607181385624
1.5932055407029928
0.2720169616753587
0.2720169616753587
1.0192331720170102
0.6235031893289182
0.12619886610430697
-0.542803641476798
-1.5642761925161288
-3.7159336070758044

If I use thurstoneMostellerFull I get a more expected result.

const teams = [[openRating()],[openRating()],[openRating()],[openRating()],[openRating()],[openRating()],[openRating()],[openRating()],[openRating()],[openRating()],[openRating()],[openRating()]]
const rank = [1,2,2,4,5,5,7,8,9,9,11,12]
const open3 = rate(teams,{rank, model: thurstoneMostellerFull})
open3.forEach(x=>console.log(ordinal(x[0])))

outputs

71.00770829746648
58.39200967088472
58.39200967088472
45.776231044302946
33.16053241772118
33.16053241772118
20.54475379113941
12.13426137341823
-0.48143725316354136
-0.48143725316354136
-13.097215879745306
-21.50770829746648

Is there something I am missing? Could this be an issue with the Plackett-Luce model?

Thanks for your time!

Winning probability

I finally found some time to invest in openskill and I had a look into your benchmarks. Great results!
I have a question with respect to the accuracy:
You calculate the correctness solely based on the mu value (https://github.com/philihp/openskill.js/blob/master/benchmark/load.js#L67).
Sublee (the author of the Python TrueSkill implementation) suggests calculating the winning probability based on both mu and sigma: https://trueskill.org/#win-probability
Is there any reason behind this?
I am curious to hear your thoughts about this.

Rename variables to match paper

  • Refactor and rename variables in the src/models/ such that they match the variables defined in the PDF. This will make the code easier to maintain and perhaps help others port to other languages.
  • e.g. sigSqToCiq should probably be some variation of qEta, since it's related to ηq defined on page 287, §3.1.1
  • The unit tests should protect you from making any catastrophic changes.

feat: Support alternative ranking

Such that if 3 teams come in [ a, b, c ], but the actual ordering was c, a, b, the call would be like:

rank([a, b, c], { ranks: [2, 3, 1 ] })

Starting values for MU and Sigma

Hi,

First of all, a big thank you for this awesome repository.

I've recently tried to implement this into my ladders for my gaming app.
But I have a problem finding the sweet spot for the default mu and sigma values for new participants.

I want all new users to start with an ordinal of 1000 and to present it to users like an Elo rating.

My problem: after the first round is completed, the player who finished last loses far too much rating, and conversely the winner gains far too much, so it would take the loser multiple rounds to catch up.

So if I switch the results in round 2, the loser wins and the winner loses. The points gained for winning are about one third of the points previously lost.

FYI, the ladder is about 30 rounds, and each round has 15 users.

I understand that the result will initially fluctuate until the sigma value has stabilized. But shouldn't it fluctuate both ways?

I'm using the default model, and my current starting values are { mu: 1500, sigma: 167 }.
I've tried different models, but with no success.

Thanks in advance and have a nice day!

Go Port

I'm working on a Go port, and I have already implemented the Plackett-Luce model. If it's not asking too much, when I finish porting it, could you link it like the other ports?

Which model to use?

I'm currently using the TrueSkill JS port for my bot, which is used to organize games. Since there is no way to take team scores into account for ratings, I wanted to use openskill.js instead. Since the library offers different models, I wanted to ask which model suits my requirements best. Teams can have from one to ten players, and there can be up to ten teams.

Make Typescript a thing

  • This task entails renaming relevant files from .js to .ts
  • Add the necessary dependencies
  • Ensure that the library still compiles as expected. This should not be a breaking feature.

`rate` in case of a draw (1vs1)

Hi, this is an awesome package and really helpful. Quick question: in the case of a draw (1 vs 1), how should we use the rate function?

In trueskill, there is a rate_1vs1(r1, r2, drawn=True) function to rate the skills of the agents. Upon further inspection, drawn=True manipulates the ranks as follows:

    ranks = [0, 0 if drawn else 1]
    teams = env.rate([(rating1,), (rating2,)], ranks, min_delta=min_delta)
    return teams[0][0], teams[1][0]

Could I handle the draw situation (1 vs 1) by overriding ranks as done in trueskill? What if I have a draw situation for 2 vs 2?

Also, see

from openskill import Rating, rate
a1 = b1  = Rating()
print(rate([[a1], [b1]], rank=[0,0]))

from trueskill import Rating, rate_1vs1
a1 = b1  = Rating()
print(rate_1vs1(a1, b1, drawn=True))
[[Rating(mu=27.63523138347365, sigma=8.065506316323548)], [Rating(mu=22.36476861652635, sigma=8.065506316323548)]]
(trueskill.Rating(mu=25.000, sigma=6.458), trueskill.Rating(mu=25.000, sigma=6.458))

Thank you!

predict_win method appears to give incorrect values in the 2-player case

I've been experimenting with the prediction function, and I would expect to get equal probabilities for two teams of 1 player each, both with the same ratings. This isn't what I'm currently seeing.

i.e. PlackettLuce with two players in two teams, both with default ratings:

from openskill.models import PlackettLuce
model = PlackettLuce()
a = model.rating()
b = model.rating()
model.predict_win([[a], [b]])

[0.38146287993364575, 0.6185371200663543]

Additionally, when I input teams with differing ratings, the order in which I put them appears to affect the probability estimates that I see returned :(?

from openskill.models import PlackettLuce
model = PlackettLuce()
r = model.rating
[[a, b], [x, y]] = [[r(), r()], [r(), r()]]
[[a, b], [x, y]] = model.rate([[a, b], [x, y]])
[[a, x], [b, y]] = model.rate([[a, x], [b, y]])
model.predict_win([[a], [b]])

[0.49602898384231453, 0.5039710161576855]
model.predict_win([[b], [a]])
[0.27041217584012034, 0.7295878241598797]

Happy to be told I'm doing something silly...

Partial Play

Very nice work! I was reading the documentation and noticed that there is no support for abandoned and partial games (partial play).

Is it possible to implement some kind of partial play support (like in this trueskill implementation), or is openskill's math incompatible with it?

Thank you!

Ranking function?

There seems to be no rank() function as described in the README. Is this because it hasn't been implemented yet?

Different weights for games

Hey!

First of all, thanks a lot for the work put in, the library is awesome. I'm using it to rank players in TFT games (free-for-all 8-player games) and in general it's working very well, but I want to understand whether it's possible to give a slightly bigger weight to some games.

Example: in the competitive TFT circuit, there are regional tournaments and international ones. The former have higher stakes and technically could be "worth more" in terms of rating, like winning a Grand Slam vs. an ATP 250 in tennis. Is there any way I can set up this weight? I searched through the issues and found out about the "gamma" parameter, but I couldn't understand how to use it based on the docs.

Thanks in advance.

Methods like TrueSkill 2?

It could be interesting to have functions like TrueSkill 2, e.g. handling a player leaving the game, or attributing different scores to the players on a team, etc.

Support for partial play?

Hi there! Documentation for TrueSkill implementations describes (rather vaguely) how it can handle "partial play", when a player is only present for a certain proportion of the game, so their contribution is down-weighted. This kind of weighting is a central part of the project I'm working on at the moment. Is partial play weighting also possible within the OpenSkill framework, and if so, is it implemented somewhere in this library?

[Question] Python Port

I'm writing a Python port of this package, and I don't understand why this function results in 102.42:

expect(sumQ[1]).toBeCloseTo(102.42)

What exactly is this function supposed to return for 5 v 5?

[204.84378810598616, 204.84378810598616]

or

[204.84378810598616, 102.42189405299308]

If it's the latter, why and what does this function do?

Requirements for 1.0.0

Pretty satisfied with the interface and performance. I haven't seen any numerical abnormalities after benchmarking. Prior to a 1.0.0 release, I'd like to close:

  • #32 Support alternate ranking option
  • #31 Support tied rankings

I honestly can't think of anything that would expand feature scope and also add value, outside of being able to swap out CDF/Gaussian implementations, but no one has yet voiced that they want that. I am open to suggestions.

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Repository problems

These problems occurred while renovating this repository. View logs.

  • WARN: Using npm packages for Renovate presets is now deprecated. Please migrate to repository-based presets instead.

Awaiting Schedule

These updates are awaiting their schedule.

  • chore(deps): update typescript-eslint monorepo to v7.2.0 (@typescript-eslint/eslint-plugin, @typescript-eslint/parser)

Detected dependencies

github-actions
.github/workflows/tests.yml
  • actions/checkout v4
  • actions/setup-node v4
  • actions/checkout v4
  • actions/setup-node v4
  • actions/checkout v4
  • actions/setup-node v4
nodenv
.node-version
  • node 20.11.1
npm
package.json
  • @types/gaussian 1.2.2
  • @types/ramda 0.29.11
  • gaussian 1.3.0
  • ramda 0.29.1
  • sort-unwind 2.1.2
  • @babel/cli 7.23.9
  • @babel/core 7.24.0
  • @babel/preset-env 7.24.0
  • @babel/preset-typescript 7.23.3
  • @babel/register 7.23.7
  • @philihp/eslint-config 6.1.0
  • @philihp/prettier-config 1.0.0
  • @tsconfig/node20 20.1.2
  • @types/jest 29.5.12
  • @typescript-eslint/eslint-plugin 7.1.1
  • @typescript-eslint/parser 7.1.1
  • eslint 8.57.0
  • eslint-import-resolver-typescript 3.6.1
  • eslint-plugin-jest 27.9.0
  • husky 9.0.11
  • jest 29.7.0
  • typescript 5.4.2

RFC: perhaps simpler rating function interface?

I'm curious whether this might be a good idea. If this would be an easier interface, I'd love to hear from people. It could be added as an overload of the current behavior so that both work.

Proposal: If you pass in an array of strings or numbers (things that aren't rating objects), then it treats those as player IDs. For instance, Alice and Betty are on a team, and Cindy and Debby are on another team:

let ratings = {}
ratings = rate([['alice', 'betty'], ['cindy', 'debby']], { state: ratings })
// {
//   'alice': { mu: 26.964, sigma: 8.1776 },
//   'betty': { mu: 26.964, sigma: 8.1776 },
//   'cindy': { mu: 23.035, sigma: 8.1776 },
//   'debby': { mu: 23.035, sigma: 8.1776 }
// }
ratings = rate([['alice', 'cindy'], ['betty', 'debby']], { state: ratings })
// {
//   'alice': { mu: 28.888, sigma: 8.0257 },
//   'betty': { mu: 24.959, sigma: 8.0257 },
//   'cindy': { mu: 25.041, sigma: 8.0257 },
//   'debby': { mu: 21.112, sigma: 8.0257 }
// }
ratings = rate([['alice'],['betty']], { state: ratings })
// {
//   'alice': { mu: 30.663, sigma: 7.7960 },
//   'betty': { mu: 19.337, sigma: 7.7960 },
//   'cindy': { mu: 25.041, sigma: 8.0257 },
//   'debby': { mu: 21.112, sigma: 8.0257 }
// }

Thus instead of passing in rating objects, we'd pass in names of players or player IDs, and then we'd get back a new rating object. I believe strongly that rate should stay pure and not have the side effect of mutating the ratings contained in the object passed in.

I'm thinking this behavior would be triggered if typeof state === 'object' (or if it were a dict/hash in other languages).

Plackett-Luce question

Hello there, I wanted to use this algorithm in Ruby and started porting the code over, but I have a question about your implementation of Plackett-Luce.

const [omegaSum, deltaSum] = teamRatings
  .filter(([_qMu, _qSigmaSq, _qTeam, qRank]) => qRank <= iRank)
  .reduce(
    ([omega, delta], [_], q) => {
      const quotient = iMuOverCe / sumQ[q]
      return [
        omega + (i === q ? 1 - quotient : -quotient) / a[q],
        delta + (quotient * (1 - quotient)) / a[q],
      ]
    },
    [0, 0]
  )

There's a reduce after a filter, so wouldn't the q index be incorrect when indexing into sum_q and c, or am I misunderstanding the code here?
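
For context, a small self-contained illustration of the indexing behavior being asked about: the index parameter of reduce counts positions in the filtered array, not in the original array.

// the index q here is relative to the filtered array
const pairs = [10, 20, 30, 40]
  .filter((x) => x !== 20)
  .reduce((acc, x, q) => acc.concat([[x, q]]), [])
// pairs === [ [10, 0], [30, 1], [40, 2] ]  (30 has index 1 here, not 2)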
