
omnifoldTop's People

Contributors: theo1701, wecassidy, zctao

omnifoldTop's Issues

Division by 0 if all data in histogram is in over/underflow bins

Minimum example:

>>> data = [10]
>>> bins = [0, 10]
>>> histogramming.calc_hist(data, bins)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/wcassidy/wd_of/python/histogramming.py", line 72, in calc_hist
    check_hist_flow(h)
  File "/home/wcassidy/wd_of/python/histogramming.py", line 44, in check_hist_flow
    if float(n_underflow/n_total) > threshold_underflow:
ZeroDivisionError: float division by zero

The relevant function is histogramming.check_hist_flow():

def check_hist_flow(h, threshold_underflow=0.01, threshold_overflow=0.01):
    n_underflow = h[hist.underflow]['value']
    n_overflow = h[hist.overflow]['value']
    n_total = h.sum()['value']

    if float(n_underflow/n_total) > threshold_underflow:
        logger.debug("Percentage of entries in the underflow bin: {}".format(float(n_underflow/n_total)))
        logger.warn("Number of entries in the underflow bin exceeds the threshold!")
        return False

    if float(n_overflow/n_total) > threshold_overflow:
        logger.debug("Percentage of entries in the overflow bin: {}".format(float(n_overflow/n_total)))
        logger.warn("Number of entries in the overflow bin exceeds the threshold!")
        return False

    return True

There's a ZeroDivisionError when all the data falls in the over/underflow bins, since hist.hist.Hist.sum doesn't include the flow bins unless it is passed flow=True. There are two potential fixes that I see:

  1. Pass flow=True to h.sum. This slightly changes how the overflow and underflow percentages are calculated by changing the denominator from "number of non-over/underflow datapoints" to "number of datapoints, including over and underflow".
  2. Check for the n_total == 0 case and handle it specially, preserving the current behaviour.

Any preferences for which way to go? I'd prefer option 1 unless it's important to maintain the current behaviour.
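For concreteness, here is a minimal sketch of option 2, with plain floats standing in for the histogram's counts. The helper name and the choice of return value when n_total == 0 are my assumptions, not settled behaviour; option 1 would instead amount to computing n_total with h.sum(flow=True), which makes the denominator zero only for a completely empty histogram.

```python
def check_flow_fractions(n_underflow, n_overflow, n_total,
                         threshold_underflow=0.01, threshold_overflow=0.01):
    # Option 2: handle the n_total == 0 case explicitly. If there are
    # entries but all of them sit in the flow bins, the check should
    # certainly fail; an entirely empty histogram passes trivially.
    if n_total == 0:
        return (n_underflow + n_overflow) == 0
    if n_underflow / n_total > threshold_underflow:
        return False
    if n_overflow / n_total > threshold_overflow:
        return False
    return True
```

With the minimum example above (one entry at the bin edge, so everything lands in overflow), this returns False instead of raising.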

Loss function usage

This shouldn't be an urgent problem because things are running fine, but I feel that it is worth looking into.

At model compilation we currently pass loss='binary_crossentropy'. According to the TensorFlow documentation (https://www.tensorflow.org/api_docs/python/tf/keras/Model#compile), when a string is given, Keras resolves it to the corresponding loss function.

If we look directly at the loss function named binary crossentropy (https://www.tensorflow.org/api_docs/python/tf/keras/metrics/binary_crossentropy), it does not appear to take sample_weight at all.

I'm not entirely sure whether this is related to any issue we have at the moment. I would expect that ignoring sample_weight entirely would lead to more catastrophic problems than misfits in a few regions, but this is definitely worth some testing and clarification in the future.
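To make the weighting behaviour concrete: my reading of the docs is that Keras applies sample_weight inside Model.fit, as a per-example multiplier on the otherwise-unweighted losses, rather than inside the loss function itself. A NumPy sketch of that reduction (the function name, clipping value, and the exact mean-over-batch reduction are assumptions, not verified against the training code):

```python
import numpy as np

def weighted_binary_crossentropy(y_true, y_pred, sample_weight=None, eps=1e-7):
    """Mean binary cross-entropy with optional per-example weights.

    Mirrors what fit() would do: compute the unweighted per-example
    losses first, then multiply by sample_weight before averaging.
    """
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    per_example = -(y_true * np.log(y_pred)
                    + (1 - y_true) * np.log(1 - y_pred))
    if sample_weight is None:
        return per_example.mean()
    return np.mean(per_example * sample_weight)
```

If this is right, then the loss function's lack of a sample_weight argument is harmless as long as the weights are passed to fit(); the thing to verify is that we actually do pass them there.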

Loss monitoring and plotting

A roadmap for the development of the loss monitor; the branch is here:
https://github.com/theo1701/omnifoldTop/tree/observeLoss

After examining how loss is currently handled, I think it is entirely possible to create a new loss plotter class. It would have a global instance recording the rerun number and iteration number, and would produce a loss histogram for every observable in each iteration (or perhaps at finer resolution, per training step).

It would then use the recorded data to make histograms: I'm thinking one per variable, per iteration, per rerun.
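As a starting point, the bookkeeping half of such a class could look like this. All names here are placeholders, and the plotting side is deliberately left out:

```python
from collections import defaultdict

class LossRecorder:
    """Sketch of the proposed global loss recorder.

    Keys each recorded loss by (rerun, iteration) so that one plot per
    observable, per iteration, per rerun can be produced afterwards.
    """

    def __init__(self):
        # (rerun, iteration) -> list of per-step losses
        self._losses = defaultdict(list)

    def record(self, rerun, iteration, step_loss):
        """Append one training-step loss under the given rerun/iteration."""
        self._losses[(rerun, iteration)].append(float(step_loss))

    def losses(self, rerun, iteration):
        """Return the recorded losses for plotting (empty list if none)."""
        return list(self._losses[(rerun, iteration)])
```

A single module-level instance of this would match the "global instance" idea above, with the plotting code iterating over the recorded keys at the end of a run.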
