individualcalibration's Issues

From our discussion at ICML

Hi! This is the code that I mentioned sharing during ICML. I think the line here https://github.com/ShengjiaZhao/IndividualCalibration/blob/master/utils.py#L69 could be improved by first computing an individual calibration error for each instance.

Here is a small snippet of my testing loop, showing how I implemented it after reading your paper:

    import math
    import torch

    SAMPLES = 2000  # Monte Carlo draws per test instance
    b = 100         # number of calibration bins
    cal_err = 0.0   # accumulated calibration error over the test set

    for i, (x, y, _) in enumerate(args.test_set):
        x, y = x.to(args.device), y.to(args.device)

        # draw SAMPLES Monte Carlo predictions; after squeezing, shapes are (SAMPLES, batch)
        mus, logvars = model.mc(x, SAMPLES)
        mus = mus.squeeze()
        logvars = logvars.squeeze()

        # Gaussian CDF of each target under each predicted (mu, sigma)
        cdf = 0.5 * (
            1.0 + torch.erf((y - mus) / torch.exp(logvars / 2) / math.sqrt(2))
        )

        bins = torch.linspace(1.0 / b, 1.0, b).to(args.device)  # expectation of U
        cdf = cdf.repeat(b, 1, 1)  # repeat b times so every sample is tested at every bin
        # indexing with (..., None, None) acts like a double unsqueeze: bins becomes (b, 1, 1)
        cal = cdf <= bins[(...,) + (None,) * 2]  # which CDF values fall below each bin percentile
        cal = cal.sum(dim=1)
        cal = cal / float(SAMPLES)  # empirical fraction of samples below each bin percentile

        # at this point cal is a (b, num_instances) tensor of the empirical fraction of
        # CDF samples falling below each bin percentile; from it we can take the
        # calibration error of each instance to find the worst subset, and then also
        # the average calibration error of that worst subset, since we already have
        # the calibration error of each instance.

        # sum the per-bin calibration error over the instances in this batch
        cal_err += (torch.abs(cal - bins.unsqueeze(1))).sum(dim=1)

For my reimplementation I was not concerned with finding the worst subset, but you could do that right after the empirical-average step (the second-to-last non-comment line), as in the sketch below.
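
In case it helps, here is a minimal sketch of what I mean; it would sit right after the `cal = cal / float(SAMPLES)` line inside the loop above. The names (`per_instance_err`, `worst_frac`, `worst_idx`) and the 10% subset size are my own assumptions, not anything from your repo or the paper.

    # cal is (b, num_instances), bins is (b,); average |empirical - expected| over bins
    per_instance_err = (cal - bins.unsqueeze(1)).abs().mean(dim=0)  # (num_instances,)

    worst_frac = 0.1  # assumed subset fraction, purely for illustration
    k = max(1, int(worst_frac * per_instance_err.numel()))
    worst_err, worst_idx = per_instance_err.topk(k)  # worst-calibrated instances in this batch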

Also, if you did it this way, you could avoid taking only the last sample in this line https://github.com/ShengjiaZhao/IndividualCalibration/blob/master/utils.py#L73 and instead measure the mean calibration error of the worst subset, calculated over all samples.
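
Concretely, and continuing from the hypothetical `per_instance_err` / `worst_idx` above (I haven't checked exactly how this maps onto utils.py), that could look like:

    # mean calibration error of the worst subset, using per-instance errors that were
    # already aggregated over all SAMPLES Monte Carlo draws rather than only the last one
    worst_subset_err = per_instance_err[worst_idx].mean()
    print(f"worst {int(worst_frac * 100)}% subset calibration error: {worst_subset_err.item():.4f}")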

Thanks for answering my questions! Please let me know what you think
