
omnifoldTop's People

Contributors: theo1701, wecassidy, zctao

omnifoldTop's Issues

Division by 0 if all data in histogram is in over/underflow bins

Minimum example:

>>> data = [10]
>>> bins = [0, 10]
>>> histogramming.calc_hist(data, bins)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/wcassidy/wd_of/python/histogramming.py", line 72, in calc_hist
    check_hist_flow(h)
  File "/home/wcassidy/wd_of/python/histogramming.py", line 44, in check_hist_flow
    if float(n_underflow/n_total) > threshold_underflow:
ZeroDivisionError: float division by zero

The relevant function is histogramming.check_hist_flow():

def check_hist_flow(h, threshold_underflow=0.01, threshold_overflow=0.01):
    n_underflow = h[hist.underflow]['value']
    n_overflow = h[hist.overflow]['value']
    n_total = h.sum()['value']

    if float(n_underflow/n_total) > threshold_underflow:
        logger.debug("Percentage of entries in the underflow bin: {}".format(float(n_underflow/n_total)))
        logger.warn("Number of entries in the underflow bin exceeds the threshold!")
        return False

    if float(n_overflow/n_total) > threshold_overflow:
        logger.debug("Percentage of entries in the overflow bin: {}".format(float(n_overflow/n_total)))
        logger.warn("Number of entries in the overflow bin exceeds the threshold!")
        return False

    return True

There's a ZeroDivisionError when all the data falls in the over/underflow bins, since hist.hist.Hist.sum doesn't include the flow bins unless it is passed flow=True. There are two potential fixes that I see:

  1. Pass flow=True to h.sum. This slightly changes how the overflow and underflow percentages are calculated by changing the denominator from "number of non-over/underflow datapoints" to "number of datapoints, including over and underflow".
  2. Check for the n_total == 0 case and handle it specially, preserving the current behaviour.

Any preferences for which way to go? I'd prefer option 1 unless it's important to maintain the current behaviour.
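For concreteness, here is a minimal sketch of option 2, with plain floats standing in for the histogram's counts. The helper name and the choice of return value when n_total == 0 are my assumptions, not settled behaviour; option 1 would instead amount to computing n_total with h.sum(flow=True), which makes the denominator zero only for a completely empty histogram.

```python
def check_flow_fractions(n_underflow, n_overflow, n_total,
                         threshold_underflow=0.01, threshold_overflow=0.01):
    # Option 2: handle the n_total == 0 case explicitly. If there are
    # entries but all of them sit in the flow bins, the check should
    # certainly fail; an entirely empty histogram passes trivially.
    if n_total == 0:
        return (n_underflow + n_overflow) == 0
    if n_underflow / n_total > threshold_underflow:
        return False
    if n_overflow / n_total > threshold_overflow:
        return False
    return True
```

With the minimum example above (one entry at the bin edge, so everything lands in overflow), this returns False instead of raising.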

Loss function usage

This shouldn't be an urgent problem because things are running fine, but I feel that it is worth looking into.

At model compilation we currently pass loss='binary_crossentropy'. According to the TensorFlow documentation (https://www.tensorflow.org/api_docs/python/tf/keras/Model#compile), when a string is given, Keras resolves it to the corresponding loss function.

If we look directly at the loss function named binary crossentropy (https://www.tensorflow.org/api_docs/python/tf/keras/metrics/binary_crossentropy), it does not appear to take sample_weight at all.

I'm not entirely sure whether this is related to any issue we have at the moment. I would expect that ignoring sample_weight entirely would lead to more catastrophic problems than misfits in a few regions, but this is definitely worth some testing and clarification in the future.
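To make the weighting behaviour concrete: my reading of the docs is that Keras applies sample_weight inside Model.fit, as a per-example multiplier on the otherwise-unweighted losses, rather than inside the loss function itself. A NumPy sketch of that reduction (the function name, clipping value, and the exact mean-over-batch reduction are assumptions, not verified against the training code):

```python
import numpy as np

def weighted_binary_crossentropy(y_true, y_pred, sample_weight=None, eps=1e-7):
    """Mean binary cross-entropy with optional per-example weights.

    Mirrors what fit() would do: compute the unweighted per-example
    losses first, then multiply by sample_weight before averaging.
    """
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    per_example = -(y_true * np.log(y_pred)
                    + (1 - y_true) * np.log(1 - y_pred))
    if sample_weight is None:
        return per_example.mean()
    return np.mean(per_example * sample_weight)
```

If this is right, then the loss function's lack of a sample_weight argument is harmless as long as the weights are passed to fit(); the thing to verify is that we actually do pass them there.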

Loss monitoring and plotting

A roadmap for the development of the loss monitor; the branch is here:
https://github.com/theo1701/omnifoldTop/tree/observeLoss

After examining how loss is currently handled, I think it is entirely possible to create a new loss plotter class. It would have a global instance recording the rerun number and iteration number, and would produce a loss histogram for every observable in each iteration (or perhaps at finer resolution, per training step).

It would then use the recorded data to make histograms: I'm thinking one per variable, per iteration, per rerun.
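As a starting point, the bookkeeping half of such a class could look like this. All names here are placeholders, and the plotting side is deliberately left out:

```python
from collections import defaultdict

class LossRecorder:
    """Sketch of the proposed global loss recorder.

    Keys each recorded loss by (rerun, iteration) so that one plot per
    observable, per iteration, per rerun can be produced afterwards.
    """

    def __init__(self):
        # (rerun, iteration) -> list of per-step losses
        self._losses = defaultdict(list)

    def record(self, rerun, iteration, step_loss):
        """Append one training-step loss under the given rerun/iteration."""
        self._losses[(rerun, iteration)].append(float(step_loss))

    def losses(self, rerun, iteration):
        """Return the recorded losses for plotting (empty list if none)."""
        return list(self._losses[(rerun, iteration)])
```

A single module-level instance of this would match the "global instance" idea above, with the plotting code iterating over the recorded keys at the end of a run.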
