individualcalibration's Issues

From our discussion at ICML

Hi! This is the code that I mentioned sharing during ICML. I think the line here https://github.com/ShengjiaZhao/IndividualCalibration/blob/master/utils.py#L69 could be improved by first computing an individual calibration error for each instance.

Here is a small snippet of my testing loop, showing how I implemented it after reading your paper:

    import math
    import torch

    SAMPLES = 2000  # Monte Carlo draws per test instance
    b = 100         # number of calibration bins
    cal_err = 0.0   # accumulated calibration error over the test set

    for i, (x, y, _) in enumerate(args.test_set):
        x, y = x.to(args.device), y.to(args.device)

        # draw SAMPLES Monte Carlo predictions; after squeezing, shapes are (SAMPLES, batch)
        mus, logvars = model.mc(x, SAMPLES)
        mus = mus.squeeze()
        logvars = logvars.squeeze()

        # Gaussian CDF of each target under each predicted (mu, sigma)
        cdf = 0.5 * (
            1.0 + torch.erf((y - mus) / torch.exp(logvars / 2) / math.sqrt(2))
        )

        bins = torch.linspace(1.0 / b, 1.0, b).to(args.device)  # expectation of U
        cdf = cdf.repeat(b, 1, 1)  # repeat b times so every sample is tested at every bin
        # indexing with (..., None, None) acts like a double unsqueeze: bins becomes (b, 1, 1)
        cal = cdf <= bins[(...,) + (None,) * 2]  # which CDF values fall below each bin percentile
        cal = cal.sum(dim=1)
        cal = cal / float(SAMPLES)  # empirical fraction of samples below each bin percentile

        # at this point cal is a (b, num_instances) tensor of the empirical fraction of
        # CDF samples falling below each bin percentile; from it we can take the
        # calibration error of each instance to find the worst subset, and then also
        # the average calibration error of that worst subset, since we already have
        # the calibration error of each instance.

        # sum the per-bin calibration error over the instances in this batch
        cal_err += (torch.abs(cal - bins.unsqueeze(1))).sum(dim=1)

For my reimplementation I was not concerned with finding the worst subset, but you could do that right after the empirical-average step (the second-to-last non-comment line), as in the sketch below.
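
In case it helps, here is a minimal sketch of what I mean; it would sit right after the `cal = cal / float(SAMPLES)` line inside the loop above. The names (`per_instance_err`, `worst_frac`, `worst_idx`) and the 10% subset size are my own assumptions, not anything from your repo or the paper.

    # cal is (b, num_instances), bins is (b,); average |empirical - expected| over bins
    per_instance_err = (cal - bins.unsqueeze(1)).abs().mean(dim=0)  # (num_instances,)

    worst_frac = 0.1  # assumed subset fraction, purely for illustration
    k = max(1, int(worst_frac * per_instance_err.numel()))
    worst_err, worst_idx = per_instance_err.topk(k)  # worst-calibrated instances in this batch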

Also, if you did it this way, you could avoid taking only the last sample in this line https://github.com/ShengjiaZhao/IndividualCalibration/blob/master/utils.py#L73 and instead measure the mean calibration error of the worst subset, calculated over all samples.
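
Concretely, and continuing from the hypothetical `per_instance_err` / `worst_idx` above (I haven't checked exactly how this maps onto utils.py), that could look like:

    # mean calibration error of the worst subset, using per-instance errors that were
    # already aggregated over all SAMPLES Monte Carlo draws rather than only the last one
    worst_subset_err = per_instance_err[worst_idx].mean()
    print(f"worst {int(worst_frac * 100)}% subset calibration error: {worst_subset_err.item():.4f}")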

Thanks for answering my questions! Please let me know what you think
