
trueskill's Introduction

trueskill's People

Contributors

bernd-wechner avatar coldfix avatar evilpegasus avatar hugovk avatar rsimmons avatar sublee avatar therefromhere avatar


trueskill's Issues

Possible to give lower absolute weight to a given match?

I was wondering if it is possible to give a lower absolute weight to a given match. For example, if a match is typically played to 10 points but was only played to 6 and was still deemed "complete," how would I implement this with TrueSkill? I saw the section on partial play, but it looks like that is for when one player joins or leaves a match and doesn't play the whole duration. If I put 0.6 for all the weights parameters, the rating changes as if it were a 10-point match. Thanks!
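As far as I can tell the library has no built-in notion of per-match weight; the weights parameter only models partial play. One ad-hoc workaround (an assumption on my part, not a documented or statistically principled feature) is to compute the full update and then blend it with the pre-match rating:

# Sketch of a damped update. `weight` is a made-up knob: 1.0 applies the full
# TrueSkill update, 0.6 applies 60% of the change in mu and sigma.
from trueskill import Rating, rate_1vs1

def damped(old, new, weight):
    return Rating(mu=old.mu + weight * (new.mu - old.mu),
                  sigma=old.sigma + weight * (new.sigma - old.sigma))

winner, loser = Rating(), Rating()
new_winner, new_loser = rate_1vs1(winner, loser)
winner, loser = damped(winner, new_winner, 0.6), damped(loser, new_loser, 0.6)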

Players not able to play on differently sized, rebalanced teams?

Hi, I'm attempting to use this implementation to rank rowers. In this sport, oftentimes the same athletes can compete in lineups with 1, 2, 4, or 8 people, and will not always have the same teammates. I assumed I would be able to execute the following code with no issue:

#Womens U17 1x Heat 1
t1 = [trinitywi]
t2 = [annaliedu]
t3 = [summerma]
t4 = [sofiapa]
(trinitywi), (annaliedu), (summerma), (sofiapa) = rate([t1, t2, t3, t4], ranks =[3, 0, 2, 1])

#Womens U17 1x Heat 2
t1 = [selahki]
t2 = [lillydu]
t3 = [malloryst]
t4 = [samanthaca]
(selahki), (lillydu), (malloryst), (samanthaca) = rate([t1, t2, t3, t4], ranks =[0, 1, 2, 3])

#Womens U17 4x
t1 = [lillydu, selahki, tarasc, molliba]
t2 = [mauricapi, lilysp, sarahdu, mariasa]
t3 = [lindsibe, emmacr, noraga, lydiama]
t4 = [hannahed, victoriaal, elliean, charlottecr]
t5 = [arwenmc, oliviaye, emmaha, annawa]
(lillydu, selahki, tarasc, molliba), (mauricapi, lilysp, sarahdu, mariasa), (lindsibe, emmacr, noraga, lydiama), (hannahed, victoriaal, elliean, charlottecr), (arwenmc, oliviaye, emmaha, annawa) = rate([t1, t2, t3, t4, t5], ranks =[0, 2, 4, 3, 1])

However, this returns the following error:

Traceback (most recent call last):
  File "C:\Users\kees\Desktop\TrueSkill\Rowing.py", line 107, in <module>
    (lillydu, selahki, tarasc, molliba), (mauricapi, lilysp, sarahdu, mariasa), (lindsibe, emmacr, noraga, lydiama), (hannahed, victoriaal, elliean, charlottecr), (arwenmc, oliviaye, emmaha, annawa) = rate([t1, t2, t3, t4, t5], ranks =[0, 2, 4, 3, 1])
  File "C:\Users\kees\AppData\Roaming\Python\Python310\site-packages\trueskill\__init__.py", line 700, in rate
    return global_env().rate(rating_groups, ranks, weights, min_delta)
  File "C:\Users\kees\AppData\Roaming\Python\Python310\site-packages\trueskill\__init__.py", line 498, in rate
    layers = self.run_schedule(*args)
  File "C:\Users\kees\AppData\Roaming\Python\Python310\site-packages\trueskill\__init__.py", line 398, in run_schedule
    f.down()
  File "C:\Users\kees\AppData\Roaming\Python\Python310\site-packages\trueskill\factorgraph.py", line 102, in down
    sigma = math.sqrt(self.val.sigma ** 2 + self.dynamic ** 2)
AttributeError: 'tuple' object has no attribute 'sigma'

I would really appreciate insight into why I am thrown an error at the point at which athletes are being combined into larger boats. And if it is indeed because lineups are growing, is there a workaround way to do what I'm trying to do here without error? Thanks.
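A likely cause, from my reading of the traceback (not confirmed by the maintainer): rate() returns a list of tuples, and (trinitywi) on the left-hand side is not tuple unpacking, because the parentheses are redundant. Each single-sculler variable therefore ends up holding a whole 1-element tuple, and when those tuples are later placed inside the 4x lineups, trueskill fails where it expects a Rating and finds a tuple. A trailing comma makes each target a real one-element tuple pattern:

# trailing commas unpack the 1-element tuples that rate() returns
(trinitywi,), (annaliedu,), (summerma,), (sofiapa,) = rate([t1, t2, t3, t4], ranks=[3, 0, 2, 1])
(selahki,), (lillydu,), (malloryst,), (samanthaca,) = rate([t1, t2, t3, t4], ranks=[0, 1, 2, 3])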

cannot import name imap (Python 3.3.2)

Hi sublee,

Firstly, thanks for implementing a Python version of the Trueskill algorithm!

I'm unfortunately getting the following error when I try to import the trueskill library:

C:\test>python test.py
Traceback (most recent call last):
  File "test.py", line 1, in <module>
    import trueskill
  File "C:\Python33\lib\site-packages\trueskill-0.4.1-py3.3.egg\trueskill\__init__.py", line 12, in <module>
ImportError: cannot import name imap

Do you have any ideas as to what is wrong?

Thanks for your help!
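For anyone hitting this later: itertools.imap no longer exists in Python 3 (the built-in map is already lazy), and trueskill 0.4.1 imports it unconditionally, as the traceback shows. Upgrading to a newer release of the package should resolve it; if you need to patch an old copy yourself, the usual compatibility pattern looks like this:

try:
    from itertools import imap
except ImportError:  # Python 3: the built-in map is already an iterator
    imap = map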

In the rating_groups section, what is 'player.team'?

In the API docs there is a reference to rating_groups, with this code:

# calculate new ratings
rating_groups = [{p1: p1.rating, p2: p2.rating}, {p3: p3.rating}]
rated_rating_groups = env.rate(rating_groups, ranks=[0, 1])

# save new ratings
for player in [p1, p2, p3]:
    player.rating = rated_rating_groups[player.team][player]

Where does player.team come from? It seems pulled out of thin air...
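player.team is not something the library provides; the docs example assumes a player object you defined yourself, carrying its rating and the index of its group in rating_groups. A minimal sketch that makes the snippet self-contained (the Player class here is my own illustration):

from trueskill import Rating, rate

class Player:
    def __init__(self, name, team):
        self.name = name
        self.team = team          # index of this player's group in rating_groups
        self.rating = Rating()

p1, p2, p3 = Player('p1', 0), Player('p2', 0), Player('p3', 1)
rating_groups = [{p1: p1.rating, p2: p2.rating}, {p3: p3.rating}]
rated_rating_groups = rate(rating_groups, ranks=[0, 1])
for player in [p1, p2, p3]:
    player.rating = rated_rating_groups[player.team][player]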

Documentation convergence speed

Hi, I am writing my master's thesis and have a question about the convergence of TrueSkill in our setting. We have thousands of players with varying numbers of matches, and we only want to consider players whose real skill TrueSkill approximates "well". We use the players' mu values as data labels in a supervised learning task, so it is important that the labels are of good quality. Regarding this, I have several questions:

  1. What is the best measure to evaluate the convergence of TrueSkill?
  2. If the answer is the number of matches: we found a table in the documentation indicating that, for a 4:4 game, about 40 matches are needed for TrueSkill to converge. What is the source of this information (or how was this magic number calculated)?
  3. We thought that a player's sigma value could also indicate the quality of convergence. Below roughly which sigma threshold can we be reasonably sure the skill estimate is good?

Thanks in advance to everyone who contributes to this issue.
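On question 3, one pragmatic proxy (my own assumption, not an official convergence criterion) is to filter on sigma relative to its starting value, since sigma tends to shrink as evidence accumulates:

# hypothetical cut-off: tune it against held-out match predictions for your data
SIGMA_THRESHOLD = 1.0

# `ratings` is assumed to be a dict of player id -> trueskill.Rating
converged = {pid: r for pid, r in ratings.items() if r.sigma < SIGMA_THRESHOLD}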

Win probability?

I'm trying to figure out how to calculate the win probability from two TrueSkill ratings. You have the draw probability, via match quality, but not the win probability.

Any idea how to calculate it? I've been reading lots of material on TrueSkill, but it's a bit over my head.
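The formula that usually gets quoted for this (the same one that appears in a later issue below; it ignores the possibility of a draw) computes the probability that one team's performance exceeds the other's:

import itertools
import math
import trueskill

def win_probability(team1, team2, env=None):
    # P(team1 beats team2), ignoring draws
    env = env or trueskill.global_env()
    delta_mu = sum(r.mu for r in team1) - sum(r.mu for r in team2)
    sum_sigma = sum(r.sigma ** 2 for r in itertools.chain(team1, team2))
    size = len(team1) + len(team2)
    denom = math.sqrt(size * env.beta ** 2 + sum_sigma)
    return env.cdf(delta_mu / denom)

# example: a slightly stronger player against a default one
print(win_probability([trueskill.Rating(mu=28)], [trueskill.Rating()]))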

Ranks for games with scores like tennis/padel

Hey! I am currently using TrueSkill to rank padel players playing 2:2 games. After each game you get a score like "3:2" or similar. Should I use those scores as ranks? Or enter [1, 0] for the winner? I can't find any documentation/examples for ranks, so any recommendations would be valuable!
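For what it's worth, ranks in this library are finishing positions where lower is better, and the margin of victory is not used: a 3:2 win and a 3:0 win are entered identically. A small sketch:

from trueskill import Rating, rate

team_a = [Rating(), Rating()]   # the pair that won, whatever the score was
team_b = [Rating(), Rating()]
(new_a1, new_a2), (new_b1, new_b2) = rate([team_a, team_b], ranks=[0, 1])  # 0 = winner, 1 = loser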

Free For All Ties

Trying to use FFA for a leaderboard ranking. However, multiple players will have a score of 0. Is there a way for players that score equally to be set as ties?

Edit: solved
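For reference, since the issue was closed as solved: equal values in ranks are treated as a tie, so players with the same score can simply share a rank. A minimal example:

from trueskill import Rating, rate

teams = [(Rating(),) for _ in range(4)]
# one winner, and the three players who all scored 0 tie for second place
new_ratings = rate(teams, ranks=[0, 1, 1, 1])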

Proposing a PR to fix a few small typos

Issue Type

[x] Bug (Typo)

Steps to Replicate and Expected Behaviour

  • Examine trueskilltest.py and observe interable, however expect to see iterable.
  • Examine trueskill/__init__.py and observe couls, however expect to see could.

Notes

Semi-automated issue generated by
https://github.com/timgates42/meticulous/blob/master/docs/NOTE.md

To avoid wasting CI processing resources a branch with the fix has been
prepared but a pull request has not yet been created. A pull request fixing
the issue can be prepared from the link below, feel free to create it or
request @timgates42 create the PR. Alternatively if the fix is undesired please
close the issue with a small comment about the reasoning.

https://github.com/timgates42/trueskill/pull/new/bugfix_typos

Thanks.

Three or more players: rank probability

I think that if I want to calculate the win percentage for two players, I should look at the distribution of the performance difference, X − Y ~ N(μ₁ − μ₂, σ₁² + σ₂²), and that it is then sufficient to integrate this density over the interval [0, ∞).

But if the game contains three or more players, how should I predict the probability of a given ranking?

a = Rating(mu= m_a, sigma= s_a)
b = Rating(mu= m_b, sigma= s_b)
c = Rating(mu= m_c, sigma= s_c)

# I want to calculate the probability that rank c > b > a.
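A brute-force way to estimate such an ordering probability (a sketch under the usual model assumptions, i.e. each player's performance ~ N(mu, sigma² + beta²); not an exact closed form) is to sample performances and count how often the target order occurs:

import random
import trueskill

def rank_probability(ratings, target_order, n=100000, env=None):
    # target_order is a tuple of indices into `ratings`, best first, e.g. (2, 1, 0) for c > b > a
    env = env or trueskill.global_env()
    hits = 0
    for _ in range(n):
        perf = [random.gauss(r.mu, (r.sigma ** 2 + env.beta ** 2) ** 0.5) for r in ratings]
        order = tuple(sorted(range(len(ratings)), key=lambda i: -perf[i]))
        hits += order == tuple(target_order)
    return hits / n

# estimated probability that c > b > a
print(rank_probability([a, b, c], (2, 1, 0)))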

Increase probability in 2v1 of the 1 winning

I'm wondering if there's a way to increase the probability that the 1 manages to win against two other players, because currently the model thinks it is much harder than it actually is.

Large FFA produces unexpected mu values

Possibly related to #22

I ran multiple large FFAs. Some of the FFAs consist of large parts of the population while others are much smaller. I noticed that one player who only did a few of the smaller FFAs and performed relatively poorly had the largest mu of all the players while still maintaining a relatively small sigma. Does this appear to be an issue with my setup, this implementation of trueskill, or an issue with trueskill itself?

Here is my setup:
draw_probability = 0, mu = 25, sigma = mu / 3, beta = sigma / 4

I have bolded the matches where both players competed. Matches are listed in chronological order.

Player 1 (identified externally as the best player):
trueskill.Rating(mu=51.219, sigma=3.449) 1 / 979
trueskill.Rating(mu=40.846, sigma=1.768) 13 / 890
trueskill.Rating(mu=38.448, sigma=1.334) 18 / 727
trueskill.Rating(mu=38.392, sigma=1.132) 3 / 800
trueskill.Rating(mu=38.980, sigma=1.049) 1 / 711
trueskill.Rating(mu=39.408, sigma=0.988) 1 / 578
trueskill.Rating(mu=39.387, sigma=0.911) 2 / 503
trueskill.Rating(mu=39.664, sigma=0.874) 1 / 355
trueskill.Rating(mu=39.789, sigma=0.851) 1 / 687
trueskill.Rating(mu=39.919, sigma=0.852) 2 / 139
trueskill.Rating(mu=39.947, sigma=0.851) 18 / 132
trueskill.Rating(mu=39.382, sigma=0.848) 8 / 128
trueskill.Rating(mu=39.404, sigma=0.851) 2 / 129
trueskill.Rating(mu=40.144, sigma=0.851) 1 / 116
trueskill.Rating(mu=39.502, sigma=0.847) 8 / 115
trueskill.Rating(mu=39.386, sigma=0.849) 1 / 80
trueskill.Rating(mu=39.386, sigma=0.853) 1 / 122

trueskill.Rating(mu=38.502, sigma=0.789) 34 / 1817
trueskill.Rating(mu=37.862, sigma=0.739) 16 / 1629
trueskill.Rating(mu=37.462, sigma=0.698) 8 / 1354
trueskill.Rating(mu=37.562, sigma=0.686) 1 / 1418
trueskill.Rating(mu=37.714, sigma=0.672) 1 / 1304
trueskill.Rating(mu=37.354, sigma=0.642) 10 / 1081
trueskill.Rating(mu=37.001, sigma=0.617) 17 / 975
trueskill.Rating(mu=36.832, sigma=0.596) 4 / 919
trueskill.Rating(mu=36.538, sigma=0.577) 11 / 1237
trueskill.Rating(mu=38.168, sigma=0.579) 9 / 202
trueskill.Rating(mu=37.909, sigma=0.579) 112 / 194
trueskill.Rating(mu=38.314, sigma=0.580) 22 / 182
trueskill.Rating(mu=39.261, sigma=0.580) 10 / 177
trueskill.Rating(mu=38.636, sigma=0.579) 37 / 171
trueskill.Rating(mu=39.591, sigma=0.580) 16 / 166
trueskill.Rating(mu=39.939, sigma=0.582) 2 / 168
trueskill.Rating(mu=39.716, sigma=0.581) 37 / 186

Player 2 (the best player according to trueskill):
trueskill.Rating(mu=41.308, sigma=2.696) 134 / 139
trueskill.Rating(mu=76.557, sigma=1.677) 69 / 132
trueskill.Rating(mu=69.771, sigma=1.357) 115 / 128
trueskill.Rating(mu=72.300, sigma=1.146) 83 / 129
trueskill.Rating(mu=75.554, sigma=1.035) 95 / 116
trueskill.Rating(mu=78.606, sigma=0.942) 87 / 115
trueskill.Rating(mu=87.675, sigma=0.878) 5 / 80
trueskill.Rating(mu=88.466, sigma=0.814) 72 / 122

1 v all winning probabilities

I'd like to calculate the winning probabilities for a 1-vs-all game. Is there a function to derive probabilities that sum to 1?

How do I input my observed data?

The skills of the players are my latent variables; they are what I'm actually trying to estimate. The performance of each player is my observed variable.
My code initializes all the prior skills of my players, and I want to update their skills match by match given their scores. I have each player's performance in every match. I see I can only pass the list of skills to the rate method, and it will return the updated skills. But how do I add my data to the code so it knows who won and who lost? In other words, how do I initialize the performance variables?
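The observed outcome goes in through the ranks argument: you convert each match's scores into placements (0 = best) and let rate() do the update; the raw performance values themselves are never passed in. A minimal sketch, where skills is assumed to be a dict of player -> Rating and match_scores a dict of player -> observed score for one match:

from trueskill import rate

def update_from_match(skills, match_scores):
    players = list(match_scores)
    rating_groups = [(skills[p],) for p in players]        # one-player "teams"
    # competition ranking: a player's rank is how many players scored strictly higher
    ranks = [sum(match_scores[q] > match_scores[p] for q in players) for p in players]
    new_groups = rate(rating_groups, ranks=ranks)
    for p, (new_rating,) in zip(players, new_groups):
        skills[p] = new_rating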

Weightings not respected when supplied as dictionary

In this sample:

    # Multi Player example
    print("\nMultiplayer example")

    class Player(object):
        def __init__(self, name, rating, team):
            self.name = name
            self.rating = rating
            self.team = team

    p1 = Player('Player A', Rating(), 0)
    p2 = Player('Player B', Rating(), 0)
    p3 = Player('Player C', Rating(), 1)

    print(p1.rating, p2.rating, p3.rating)

    teams = [{p1: p1.rating, p2: p2.rating}, {p3: p3.rating}]
    ranks = [1, 2]
    weights = {(0, p1): 1, (0, p2): 1, (1, p3): 1}

    rated = trueskill.rate(teams, ranks, weights=weights)

    p1.rating = rated[p1.team][p1]
    p2.rating = rated[p2.team][p2]
    p3.rating = rated[p3.team][p3]

    print(p1.rating, p2.rating, p3.rating)

The result is:

    Multiplayer example
    trueskill.Rating(mu=25.000, sigma=8.333) trueskill.Rating(mu=25.000, sigma=8.333) trueskill.Rating(mu=25.000, sigma=8.333)
    trueskill.Rating(mu=25.604, sigma=8.075) trueskill.Rating(mu=25.604, sigma=8.075) trueskill.Rating(mu=24.396, sigma=8.075)

All the weights were 1. Now give p2 a weight of 0.5:

    weights = {(0, p1): 1, (0, p2): 0.5, (1, p3): 1}

The result is identical:

If the weights are supplied as a list of tuples instead:

    weights = [(1, 0.5), (1,)]    # for p1, p2, p3 respectively

Then the results reflect the weights:

    Multiplayer example
    trueskill.Rating(mu=25.000, sigma=8.333) trueskill.Rating(mu=25.000, sigma=8.333) trueskill.Rating(mu=25.000, sigma=8.333)
    trueskill.Rating(mu=26.764, sigma=7.685) trueskill.Rating(mu=25.882, sigma=8.176) trueskill.Rating(mu=23.236, sigma=7.685)

Does TrueSkill support playing on both teams?

Let's say I have player A and player B. In my games, I have team 1: [A, B] and team 2: [B, B]. Player B is an algorithm, so there's no need to worry about potential information leakage between team 1 and team 2; all instances of player B make decisions independently using the same decision-making process.

Does TrueSkill support this case, or is there any way to modify the algorithm to support it? What would be the correct behavior in this circumstance?

Would it be sufficient to just not update player B's rating in this case, and only update player A's?

ZeroDivisionError

Hi sublee,

I have encountered a problem with the calculation of some ratings (admittedly with some rather unusual parameter settings):

transform_ratings([Rating(mu=-323.263, sigma=2.965), Rating(mu=-48.441, sigma=2.190)], ranks=[0, 1])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/trueskill/__init__.py", line 602, in transform_ratings
    return _g().transform_ratings(rating_groups, ranks, min_delta)
  File "/usr/lib/python2.7/site-packages/trueskill/__init__.py", line 531, in transform_ratings
    return self.rate(rating_groups, ranks, min_delta=min_delta)
  File "/usr/lib/python2.7/site-packages/trueskill/__init__.py", line 389, in rate
    self.run_schedule(*args)
  File "/usr/lib/python2.7/site-packages/trueskill/__init__.py", line 309, in run_schedule
    delta = trunc_layer[0].up()
  File "/usr/lib/python2.7/site-packages/trueskill/factorgraph.py", line 193, in up
    v = self.v_func(*args)
  File "/usr/lib/python2.7/site-packages/trueskill/__init__.py", line 45, in v_win
    return pdf(x) / cdf(x)
ZeroDivisionError: float division by zero

Documentation on free-for-all

Hi there. Great library!

I was reading the documentation and I feel like there's very little information on free-for-all games, especially when it comes to non-zero sum games. I'm unsure how to rate players in a non-zero sum game like racing, for example. How do you take distance from winning into account, for example? Do I just input a list of the result, normalized to between 0-1? In any case, I feel this could be clearer in the docs.
Thanks!

Bug: Matrix.adjugate returns a wrong value when rows and columns number more than 2

A matrix multiplied by its own inverse should be the identity matrix. Here are two examples, one using trueskill.mathematics and one using numpy:

>> from trueskill.trueskill.mathematics import Matrix
>> import numpy as np
>> d = [[1, 2, 3], [6, 5, 10], [7, 8, 9]]
>> m = Matrix(d[:])
>> m.inverse() * m
Matrix([[4.2222222222222205, 3.1666666666666656, 4.777777777777777], [-0.6666666666666659, 4.440892098500626e-16, -1.3333333333333321], [0.11111111111111138, -0.16666666666666652, 0.8888888888888893]])
>> m = np.array(d[:])
>> np.linalg.inv(m) @ m
array([[ 1.00000000e+00,  1.11022302e-15,  1.85962357e-15],
       [-5.27355937e-16,  1.00000000e+00, -3.60822483e-16],
       [-2.22044605e-16, -2.22044605e-16,  1.00000000e+00]])

The reason for this bug is that the cofactor matrix is not transposed when forming the adjugate.

### trueskill\trueskill\mathematics.py

def adjugate(self):
        height, width = self.height, self.width
        if height != width:
            raise ValueError('Only square matrix can be adjugated')
        if height == 2:
            a, b = self[0][0], self[0][1]
            c, d = self[1][0], self[1][1]
            return type(self)([[d, -b], [-c, a]])
        src = {}
        for r in range(height):
            for c in range(width):
                sign = -1 if (r + c) % 2 else 1
                src[r, c] = self.minor(r, c).determinant() * sign
---     return type(self)(src, height, width)
+++     return type(self)(src, height, width).transpose()

Tiny documentation error with explanation of beta parameter

The documentation describes beta as "the distance which guarantees about 75.6% chance of winning". I think the correct percentage should be 76.025% (rounded however you wish). While the difference is trivial, it might confuse other people.

I was curious where the 75.6 magic number came from, so I derived what it should be, using the formula for computing win probability (mentioned in another issue). If you consider a match of two players, with the player sigmas and draw margins being 0, and the difference in rating means equal to beta, the win probability simplifies to cdf(1/sqrt(2)), which is about 0.76025.
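The quoted value is easy to check numerically (plain standard-library Python, no dependency on this package):

import math

def normal_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

print(normal_cdf(1 / math.sqrt(2)))   # ~0.76025, i.e. 76.025%, not 75.6%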

How to tell TrueSkill that some 1v1 matchups are worth "more" than others?

Story time:

  • Say that I run a tennis league with 20 players. Tennis is a 1v1 game.
  • Some league matches are played on clay courts and some league matches are played on grass courts.
  • Pretend that on clay courts, the ball always bounces predictably. On these courts, the better player almost always wins.
  • Pretend that on grass courts, the ball will sometimes bounce randomly (because the grass is slightly uneven). On these courts, a better player often loses due to randomness, but most of the time, the better player wins.
  • We can calculate that a better player has exactly a three times higher chance of winning on a clay court than on a grass court.

Now, I want to feed all of the league matches into this TrueSkill Python library to calculate a skill leaderboard. But if I just feed in all of the matches, it won't be accurate, because the worst player in the league happened to beat the best player in the league in a grass-court game. Is there a way to tell TrueSkill that one match carries less confidence than another?
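There is no per-match confidence parameter that I know of, but one way to express "grass results are noisier" (an assumption on my part, not an official recipe, and the beta value would need to be tuned rather than read off the 3x figure directly) is to rate grass matches in an environment with a larger beta. Ratings are just (mu, sigma) pairs, so the same Rating objects can be updated by either environment:

import trueskill

clay_env = trueskill.TrueSkill(beta=25 / 6)       # the default beta
grass_env = trueskill.TrueSkill(beta=2 * 25 / 6)  # hypothetical: performances twice as noisy

def rate_match(winner, loser, surface):
    env = clay_env if surface == 'clay' else grass_env
    (winner,), (loser,) = env.rate([(winner,), (loser,)], ranks=[0, 1])
    return winner, loser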

Calculating chance to win + draw probability

So I see that there is a method for calculating drawing chances, but what about winning chances between two ratings?

Also, I don't quite understand how to set the draw probability. Is it supposed to be set to 0.5? I think I have misunderstood what this value means, because in our game a draw is very unlikely, but I have a feeling this value is used to determine the middle point.

Thanks!

Very large free-for-alls produce negative and positive ratings outside the expected range

Colab notebook demonstrating issue

I made a free-for-all consisting of 20k default-initialized players. The top-ranking players in a simulated game had ratings of over six hundred with the default trueskill settings. The bottom ranking players had negative ratings. I had been under the impression that the default settings would generate ratings between zero and fifty. Is this a bug in the Python version of the code, or the algorithm itself?

Code, if you don't have access to colaboratory or lack a Google account:

# -*- coding: utf-8 -*-
"""TrueSkill Surprising Ratings

Automatically generated by Colaboratory.

Original file is located at
    https://colab.research.google.com/notebook#fileId=1OctL8znwKZUvthK5_rv7KKfpUK4oKmYv

Start by installing dependencies.
"""

!pip install trueskill

import trueskill as ts

ts.setup(backend='mpmath')

# Give us more precision.
import mpmath
mpmath.mp.dps = 25

"""Generate some test data."""

nplayers = 20000

players = [ts.Rating() for x in range(nplayers)]
teams = [(players[i],) for i in range(nplayers)]

import random
ranks = list(range(nplayers))
random.shuffle(ranks)  # shuffle in place; random.shuffle returns None

"""Do an update. This is an n-player-way free for all. We would expect the ratings to remain within 0-50, but we see that the top ratings end up being way over 50. This may explain the very low rankings for the lowest elements."""

new_ratings = ts.rate(teams, ranks=ranks)

sorted(new_ratings, key=lambda x: x[0].mu)[:10]

"""[(trueskill.Rating(mu=-5898.653, sigma=3.727),),
 (trueskill.Rating(mu=-5898.052, sigma=3.727),),
 (trueskill.Rating(mu=-5897.454, sigma=3.727),),
 (trueskill.Rating(mu=-5896.859, sigma=3.727),),
 (trueskill.Rating(mu=-5896.264, sigma=3.727),),
 (trueskill.Rating(mu=-5895.670, sigma=3.727),),
 (trueskill.Rating(mu=-5895.076, sigma=3.727),),
 (trueskill.Rating(mu=-5894.482, sigma=3.727),),
 (trueskill.Rating(mu=-5893.889, sigma=3.727),),
 (trueskill.Rating(mu=-5893.295, sigma=3.727),)]"""

sorted(new_ratings, key=lambda x: x[0].mu)[-10:]

"""[(trueskill.Rating(mu=5943.295, sigma=3.727),),
 (trueskill.Rating(mu=5943.889, sigma=3.727),),
 (trueskill.Rating(mu=5944.482, sigma=3.727),),
 (trueskill.Rating(mu=5945.076, sigma=3.727),),
 (trueskill.Rating(mu=5945.670, sigma=3.727),),
 (trueskill.Rating(mu=5946.264, sigma=3.727),),
 (trueskill.Rating(mu=5946.859, sigma=3.727),),
 (trueskill.Rating(mu=5947.454, sigma=3.727),),
 (trueskill.Rating(mu=5948.052, sigma=3.727),),
 (trueskill.Rating(mu=5948.653, sigma=3.727),)]"""

Win probability in a free-for-all of N players

Has anyone come up with an appropriate formula for calculating a win probability in a free-for-all match? I'm aware of the formula for a two-team matchup or a 1v1 matchup, but I haven't seen one for a free-for-all.

I've tried to devise my own formula by simply defining the win probability for a player as the average of all win probabilities in 1v1 matchups against all the other opponents. It seems to give reasonable results intuitively, but I'm not sure about the mathematical validity of this approach.

import itertools
import math
from itertools import combinations
from typing import List, Tuple, Dict

import trueskill
from trueskill import Rating


def win_probability(team1: List[Rating], team2: List[Rating]):
    delta_mu = sum(r.mu for r in team1) - sum(r.mu for r in team2)
    sum_sigma = sum(r.sigma ** 2 for r in itertools.chain(team1, team2))
    size = len(team1) + len(team2)
    denom = math.sqrt(size * (trueskill.BETA * trueskill.BETA) + sum_sigma)
    ts = trueskill.global_env()
    return ts.cdf(delta_mu / denom)


def win_probability_free_for_all(all_ratings: List[Rating]) -> List[float]:
    all_ratings_with_index: List[Tuple[int, Rating]] = [(i, rating) for i, rating in enumerate(all_ratings)]
    matchups: List[Tuple[Tuple[int, Rating], Tuple[int, Rating]]] = combinations(all_ratings_with_index, 2)
    total_probability: float = 0.0
    player_index_to_win_probabilities: Dict[int, List[float]] = {i: [] for i in range(len(all_ratings))}

    for rating_with_index_1, rating_with_index_2 in matchups:
        index1, rating1 = rating_with_index_1
        index2, rating2 = rating_with_index_2

        win_probability_1 = win_probability([rating1], [rating2])
        win_probability_2 = win_probability([rating2], [rating1])

        player_index_to_win_probabilities[index1].append(win_probability_1)
        player_index_to_win_probabilities[index2].append(win_probability_2)

        total_probability += win_probability_1 + win_probability_2

    win_probabilities = []

    # note: the loop variable must not shadow the result list, otherwise the
    # per-player lists in the dict get mutated and the wrong list is returned
    for index, probabilities in player_index_to_win_probabilities.items():
        win_probability_for_player = sum(probabilities) / total_probability
        win_probabilities.append(win_probability_for_player)

    return win_probabilities


assert win_probability_free_for_all([Rating(mu=25, sigma=50 / 3) for _ in range(4)]) == [
    0.24999999999999997,
    0.24999999999999997,
    0.24999999999999997,
    0.24999999999999998
]

assert win_probability_free_for_all(
    [Rating(mu=30, sigma=50 / 3)] + [Rating(mu=25, sigma=50 / 3) for _ in range(3)]) == [
       0.2907628713511193,
       0.23641237621629355,
       0.23641237621629355,
       0.23641237621629355
]

assert win_probability_free_for_all([Rating(mu=30, sigma=0.1)] + [Rating(mu=25, sigma=0.1) for _ in range(3)]) == [
    0.40093001957080043,
    0.19968999347639982,
    0.19968999347639982,
    0.19968999347639982
]

Would anyone more familiar with the mathematics be able to verify these results, or give some insight as to why it might be correct/incorrect? Many thanks.

float division by zero

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "trueskill/__init__.py", line 416, in transform_ratings
    return g().transform_ratings(rating_groups, ranks, min_delta)
  File "trueskill/__init__.py", line 341, in transform_ratings
    self.run_schedule(*layers, min_delta=min_delta)
  File "trueskill/__init__.py", line 288, in run_schedule
    teamdiff_layer[0].up(0)
  File "trueskill/factorgraph.py", line 140, in up
    return self.update(self.terms[index], vals, msgs, coeffs)
  File "trueskill/factorgraph.py", line 145, in update
    pi = 1. / sum(coeffs[x] ** 2 / divs[x].pi for x in xrange(size))
  File "trueskill/factorgraph.py", line 145, in <genexpr>
    pi = 1. / sum(coeffs[x] ** 2 / divs[x].pi for x in xrange(size))
ZeroDivisionError: float division by zero

Batch update step?

Hi, sorry to open an issue for a usage question, but I wasn't sure where else to ask. Is there any way to perform a batch update on a set of match results without chronological certainty? Say, after a tournament, A>B, B>C, and D>A, but we do not know the exact order in which these matches happened. Is there a way to update all their skills simultaneously or in parallel?

It seems to me that with the TrueSkill algorithm this cannot be done, but I am wondering if possibly there is a mathematical solution that I had not considered. Or will I have to turn to TrueSkill Through Time for this? Thanks in advance!

RuntimeWarning when using numpy

If numpy is installed in the environment, some inputs lead to a RuntimeWarning.

>>> r1, r2 = Rating(mu=105.247, sigma=0.439), Rating(mu=27.030, sigma=0.901)
>>> transform_ratings([(r1,), (r2,)])
trueskill/factorgraph.py:144: RuntimeWarning: divide by zero encountered in double_scalars
  pi = 1. / sum(coeffs[x] ** 2 / divs[x].pi for x in xrange(size))
[(Rating(mu=105.247, sigma=0.447),), (Rating(mu=27.030, sigma=0.905),)]

most accurate way to persist a rating to file or DB

I'm wondering what the most accurate and preferred way is to save a rating to a file or a DB.

Can I just grab the floats with rating.mu and rating.sigma?

I have also seen people persist the hex of a Python float: rating.mu.hex() / rating.sigma.hex(), and then later Rating(mu=float.fromhex(mu), sigma=float.fromhex(sigma)).

I have also seen the float stored as an integer ratio: (mu0, mu1) = rating.mu.as_integer_ratio(), (sigma0, sigma1) = rating.sigma.as_integer_ratio(), and then later Rating(mu=mu0/mu1, sigma=sigma0/sigma1).

I have also seen some people just format the float with '.60g' and persist that.

Do you have any advice?
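For what it's worth, mu and sigma are ordinary Python floats, so on Python 3 storing them as two double-precision columns, or as JSON numbers (which serialize via the shortest round-tripping repr), reproduces them exactly; the hex and integer-ratio tricks only matter if something in the storage path truncates decimal digits. A minimal JSON sketch:

import json
from trueskill import Rating

def dump_rating(rating):
    # both fields are plain floats, so JSON round-trips them exactly on Python 3
    return json.dumps({'mu': rating.mu, 'sigma': rating.sigma})

def load_rating(text):
    data = json.loads(text)
    return Rating(mu=data['mu'], sigma=data['sigma'])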

Results from 0.4.4 and 0.4.5 Don't Match

I have a project that uses this TrueSkill module to rate drivers in a racing series. I recently upgraded my version of TrueSkill via pip, and the new results from 0.4.5 do not match the old results from 0.4.4. I have attached two CSV files showing the output of my code, which includes the rating (Mu - 3*Sigma) as well as the Mu and Sigma values for each driver. Both data sets used exactly the same input results in the same order. I can provide any additional information you need to help figure this out.

TruckRatings.txt
NewTruckRatings.txt
iRacingTrueSkill.txt

Implement a rank prediction function

I'm interested in how well TrueSkill performs at predicting match outcomes.

That's best done by comparing an actual result with a prediction TrueSkill would make based on current skills.

I can see from the factor graph that player skill is first translated to performance by injecting an uncertainty of Beta, and that these player performances are summed to calculate a team performance. I'm shaky on my reading of the graph at this point and exactly what math is behind adding Beta and adding performances, but what I can't find mentioned in the paper I'm reading is what the most likely match outcome is given a set of players in teams with known skills.

I have a feeling this is a fairly simple function. If it were a game of n individual players I imagine the predicted ranking is just the players ranked by their skill means (Mu values). Well, that would be my supposition anyhow.

It gets trickier in the general case of teams: how do you calculate a ranking prediction for a game that has n players distributed among m teams? I imagine needing to estimate team skill on the basis of team members, and might infer, from the way performances are added to arrive at team performance, that skills can be added to arrive at team skill. But I'm not sure, and am still reading and getting my head around what it means to add two performances or skills (which are Gaussian variables).

It's clearly not just adding the Gaussians, nor can I expect the mean (expected value) to be the sum of means (as then the net performance or skill of a team would grow with the number of members). It may of course prove to be the mean of the means, or the weighted mean of means (taking partial play weightings into account), and that would not be surprising given how elegantly Gaussians pan out in so many ways. Still, I am hypothesizing, and I see value in the package including a function that performs the prediction in the general team-structure case (and perhaps in seeing it documented at trueskill.org).

Of course if anyone can provide any pointers that help me understand this I'm happy to try and nail it and implement it and PR it. But I'm floundering a little at the moment and so thought to drop a note here while I do.
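A sketch of the simplest version of this idea (my own reading of the factor graph, not an endorsed implementation): a team's performance is the weighted sum of its players' performances, and the expectation of a sum is the sum of the means, so the most likely ordering, ignoring the variances, is just the teams sorted by summed mu:

def predict_ranking(teams, weights=None):
    # teams: list of lists of Rating; returns team indices, predicted winner first
    if weights is None:
        weights = [[1.0] * len(team) for team in teams]
    strength = [sum(w * r.mu for w, r in zip(ws, team)) for ws, team in zip(weights, teams)]
    return sorted(range(len(teams)), key=lambda i: -strength[i])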

What's the workflow?

I'm trying to understand the workflow for this TrueSkill tool.

The way I imagine doing matchmaking is my master server will attempt to find players who are already in a lobby with ratings similar to mine, within some sort of offset. However, the first question here is, what is the fair range of offsets for skill level? +50/-50? Second, what type of data should I be adding to my database tables for this to be able to be saved?

Is the workflow like this:

  1. create match
  2. Find a winner (let the game play out)
  3. ?
  4. Submit results to db

Do you have an example workflow of how this could be used?

From reading the official TrueSkill documentation, it says that for a 4v4 two-team game you would need a minimum of 46 matches to determine a player's score with a high degree of belief. Where is the number of matches stored in your code?

Should my database store both the Sigma and Mu values of each player? The official calculation I found online for player skill is μ − k*σ where k = 3. I found this on the official Microsoft page.
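For reference, a common pattern (an assumption about typical usage, not something this library mandates) is to store mu and sigma per player and derive the displayed leaderboard value as the conservative estimate μ − 3σ, which is what expose() returns with the default environment; the library itself keeps no match count, a Rating only holds mu and sigma:

import trueskill

env = trueskill.TrueSkill()
rating = env.create_rating()

row = {
    'mu': rating.mu,
    'sigma': rating.sigma,
    'exposed': env.expose(rating),   # equals mu - 3*sigma with the default settings
}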

Estimating winner probability

Currently trueskill has a way to estimate the draw probability for a given match with the quality() function, but I would like a function for estimating the winner among a set of players. Is this possible to do?

For example, let's say I have three players with some ratings and they will play a three-player free-for-all. The program would give the probability of winning for each player:

Player 1: 68%
Player 2: 25%
Player 3:  7%

Partial Play

The TrueSkill Python module doesn't provide an interface for partial play, but the original TrueSkill™ system does. We should implement the logic and a good interface for partial play.

How to get every player's 'score' using this lib?

Sorry to ask such simple questions.
But how can I get the "score" of each player after a lot of matches?

I read that the expose function is meant for leaderboards.
leaderboard = sorted(ratings, key=env.expose, reverse=True)
Does it mean that the higher the rating exposure is, the better the player is?

And can I just use the μ directly?

Tied teams with same initial ratings have slightly different ratings after trueskill calc.

This was a subtle observation while testing a site I'm building. Here is some Python code to demonstrate it:

#!/usr/bin/python3
import trueskill
mu0 = 25.0
sigma0 = 8.333333333333334
beta = 4.166666666666667
delta = 0.0001
tau = 0.0833333333333333
p = 0.1
TS = trueskill.TrueSkill(mu=mu0, sigma=sigma0, beta=beta, tau=tau, draw_probability=p)
oldRGs = [{10: trueskill.Rating(mu=25.000, sigma=8.333), 11: trueskill.Rating(mu=25.000, sigma=8.333)}, {8: trueskill.Rating(mu=25.000, sigma=8.333), 3: trueskill.Rating(mu=25.000, sigma=8.333)}, {9: trueskill.Rating(mu=25.000, sigma=8.333), 6: trueskill.Rating(mu=25.000, sigma=8.333)}]
Weights = {(1, 8): 1.0, (1, 3): 1.0, (0, 10): 1.0, (2, 9): 1.0, (0, 11): 1.0, (2, 6): 1.0}
Ranking = [1, 1, 2]
newRGs = TS.rate(oldRGs, Ranking, Weights, delta)
print(newRGs)

When I run this it produces:

[{10: trueskill.Rating(mu=26.804, sigma=7.250), 11: trueskill.Rating(mu=26.804, sigma=7.250)}, {8: trueskill.Rating(mu=26.808, sigma=7.249), 3: trueskill.Rating(mu=26.808, sigma=7.249)}, {9: trueskill.Rating(mu=21.387, sigma=7.576), 6: trueskill.Rating(mu=21.387, sigma=7.576)}]

In summary:

3 teams compete.
2 teams tied for first place.
Players are identified by a number
Team 1 has players 10 and 11
Team 2 has players 8 and 3
Team 3 has players 9 and 6
The partial play weights are all 1 and all players start with trueskill.Rating(mu=25.000, sigma=8.333)

Given teams 1 and 2 tied, I expect the trueskill rating for the players 10, 11, 8 and 3 all to be updated identically. And yet, after running trueskill.rate we would have players 10 and 11 complaining that their rating is now:

trueskill.Rating(mu=26.804, sigma=7.250)

while players 8 and 3 with whom they tied now have:

trueskill.Rating(mu=26.808, sigma=7.249)

I expect there is a math precision issue at play. But the integrity issue remains, that tied teams expect identical trueskill updates if starting from the same skill.

I'm not sure that expectation holds when they have disparate initial skill ratings, so in practice this may never be noticed. But it does raise two questions:

  1. What causes it?
  2. What should be done about it?

Draw probability usage

I've noticed that the draw probability given in the environment doesn't seem to affect the output of quality, even though the documentation states that quality should give the draw probability. Does the given draw probability have any effect on the results of the model, and in what way? Is there a way to get a more accurate number? The output of quality seems greatly inflated.

How can I take a match's score into account?

Hey, I would like to know if it is possible to take a match's score into account using TrueSkill.

The idea would be that a match with a score of 3-2 won't lower the loser's rating as much as a 3-0 would.
Is this something that can be done in trueskill?

TrueSkill Through Time

It would be nice if this module implemented TrueSkill Through Time (TTT), which fixes a few issues in the original TrueSkill algorithm, namely:

  • Inference within a rating period depends on the random order selected within that rating period.
  • Inference only propagates forward in time: if A beat B before B went on to beat C, and C turns out to be highly ranked, that information never flows back to raise A's rating, even though it should.

Losers of a free-for-all are not affected equally

I would like to use the TrueSkill algorithm for modeling free-for-all games among N players, with one winner and N-1 losers. My assumption would be that, assuming the losers all have the same initial rating, the losers' ratings would all go down the same amount. However, this does not appear to be the case:

>>> num_players = 5
>>> players = [trueskill.Rating() for _ in range(num_players)]
>>> free_for_all = [(p,) for p in players]
>>> ranking_table = [0] + [1]*(num_players-1)
>>> trueskill.rate(free_for_all, ranks=ranking_table)
[(trueskill.Rating(mu=30.621, sigma=6.366),), (trueskill.Rating(mu=23.585, sigma=5.104),), (trueskill.Rating(mu=23.593, sigma=5.100),), (trueskill.Rating(mu=23.599, sigma=5.102),), (trueskill.Rating(mu=23.602, sigma=5.108),)]

Obviously the differences are slight, but still present. Do you know why this would be the case?

I'm using version 0.4.4 of the package on Python3, using the internal implementation.

Trueskill 2.0

The TrueSkill 2.0 paper came out a few months ago. It appears to take per-player in-game stats into account and includes quit penalties. Their claim is much higher predictive power. It doesn't, however, appear to come with any source code. Is it possible to infer the steps necessary to implement it?

Performance?

Hello!

Thank you for this wonderful piece of code. =)

Have you tried tuning the performance?

E.g. for 10,000 random matches (each of max. 5 teams of max. 10 members each) it consistently takes:

  • 79.41 s to calculate new ratings (so ~8 ms per parallelized/amortized rate()),
  • 16.52 s to calculate match qualities (~2 ms per parallel quality())

… on 4 cores (and 4 parallel python3 processes).

Transform expose() to a positive value

I have a problem with expose() going negative as soon as a new player loses their first game. Is there any simple arithmetic transformation that always keeps all ratings in the positive zone? What is the expected range of TrueSkill ratings?
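With the default environment, expose() is calibrated so a brand-new player sits at exactly 0, which is why an early loss pushes it negative. A purely cosmetic workaround (my own assumption, not a library feature) is to add a fixed display offset; nothing strictly bounds the ratings, so the offset just has to be larger than any deficit you expect to see in practice:

import trueskill

env = trueskill.TrueSkill()
DISPLAY_OFFSET = 50  # hypothetical constant, chosen for display only

def display_rating(rating):
    # expose() is mu - 3*sigma with the defaults; shift it for presentation
    return env.expose(rating) + DISPLAY_OFFSET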

Workable sample code

I'm new to this package, and I'd like a small sample of code I could run on Python 3.6.

Thank you
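A minimal end-to-end example (my own sketch against the documented API, not taken from the official docs) that should run on Python 3.6:

from trueskill import Rating, quality_1vs1, rate_1vs1

alice, bob = Rating(), Rating()              # both start at mu=25, sigma=25/3
print('match quality:', quality_1vs1(alice, bob))

alice, bob = rate_1vs1(alice, bob)           # alice beats bob
print('alice:', alice)
print('bob:', bob)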
