
Comments (2)

CloseChoice commented on June 8, 2024

Thanks for your effort on this! I think this is a good idea in principle.

My initial thought was that we could use nbsphinx, which supports executing notebooks out-of-the-box: notebooks are executed as part of the build if the outputs are empty. However, this has some problems:

1. Unintentional changes to docs might creep in without human review.

2. Compute time: training ML models, computing shap values.

3. Flaky builds. The build would fail as soon as any 3rd party lib used in the docs has a breaking change, even if the main shap test suite passes.

So your suggestion of checking the notebooks with a CI job seems like a good shout. That mitigates (1), as we wouldn't be overwriting the notebooks automatically.

To mitigate (2), an allowlist seems sensible. As a benchmark, we could aim to keep this new CI job running faster than the pytest suite on CI, so it's not a bottleneck.

So that leaves flaky builds (3). It's probably a good thing that the main pytest jobs use the very latest versions of dependencies, but the downside is flakiness. It's not trivial to pin dependencies with setuptools in a manner that works across multiple Python versions and OSs. We could perhaps try to pin the dependencies in the notebooks job, whilst leaving the main pytest dependencies open? Open to ideas on this one.

Thanks for your thoughtful feedback.
Regarding (3): if things break, that might also be useful information for us. I would suggest giving it a shot without pinning versions, and if we run into problems with failing pipelines we have the option to:

  1. merge anyway, as long as only the notebooks job failed. This is a temporary option if a fix needs to be deployed.
  2. deactivate the pipeline for some time and pin the versions.

I see where you are coming from, and it is certainly a valid concern, but I would try to keep the workload on these things minimal while we have no idea how bad the problem will be. Once we have more data on this, I'll be the first to get my hands dirty and implement a solution.
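
Option 1 amounts to treating the notebooks job as non-blocking. A minimal sketch of that rule, assuming the CI results are available as a mapping from job name to pass/fail; the job names and the helper `can_merge` are hypothetical, not part of the shap CI setup:

```python
# Hypothetical job name; any job listed here may fail without blocking a merge.
NON_BLOCKING_JOBS = {"notebooks"}


def can_merge(job_results, non_blocking=NON_BLOCKING_JOBS):
    """Return True when every failing job is in the non-blocking set.

    `job_results` maps a CI job name to True (passed) or False (failed).
    """
    failed = {name for name, passed in job_results.items() if not passed}
    return failed <= non_blocking
```

With this rule, a run where only `notebooks` fails is still mergeable, while a run where `pytest` fails is not.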

from shap.

connortann commented on June 8, 2024

Thanks for your effort on this! I think this is a good idea in principle.

My initial thought was that we could use nbsphinx, which supports executing notebooks out-of-the-box: notebooks are executed as part of the build if the outputs are empty. However, this has some problems:

  1. Unintentional changes to docs might creep in without human review.
  2. Compute time: training ML models, computing shap values.
  3. Flaky builds. The build would fail as soon as any 3rd party lib used in the docs has a breaking change, even if the main shap test suite passes.
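
For reference, the execute-on-build behaviour is controlled by the `nbsphinx_execute` setting in the Sphinx configuration. A sketch of a configuration that keeps execution out of the docs build entirely; the file location is an assumption about the project layout:

```python
# docs/conf.py (fragment). The path is an assumption about the project layout.
extensions = ["nbsphinx"]

# "auto" (nbsphinx's default) executes only notebooks that have no stored outputs;
# "always" re-executes every notebook; "never" renders the stored outputs as-is.
nbsphinx_execute = "never"
```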

So your suggestion of checking the notebooks with a CI job seems like a good shout. That mitigates (1), as we wouldn't be overwriting the notebooks automatically.

To mitigate (2), an allowlist seems sensible. As a benchmark, we could aim to keep this new CI job running faster than the pytest suite on CI, so it's not a bottleneck.
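
A minimal sketch of what an allowlist-driven check could look like, assuming the job shells out to `jupyter nbconvert --to notebook --execute`. The allowlist entries, the time budget, and the helper names are all hypothetical:

```python
import subprocess
import time
from fnmatch import fnmatch

# Hypothetical allowlist: only notebooks matching these glob patterns are executed.
# Note that fnmatch's "*" also matches "/" in path strings.
ALLOWLIST = [
    "notebooks/example_a.ipynb",
    "notebooks/fast/*.ipynb",
]


def select_notebooks(paths, allowlist=ALLOWLIST):
    """Return the allowlisted notebook paths, sorted for a stable run order."""
    return sorted(
        p for p in paths if any(fnmatch(p, pattern) for pattern in allowlist)
    )


def nbconvert_command(path, output_dir="executed", timeout_s=600):
    """Build the nbconvert call that executes one notebook.

    Writing to a separate output directory leaves the committed notebook untouched.
    """
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        f"--ExecutePreprocessor.timeout={timeout_s}",
        "--output-dir", output_dir,
        path,
    ]


def run_allowlisted(paths, budget_s=900.0):
    """Execute each allowlisted notebook; fail if the total time exceeds the budget."""
    start = time.perf_counter()
    for path in select_notebooks(paths):
        # check=True raises CalledProcessError if any cell in the notebook errors.
        subprocess.run(nbconvert_command(path), check=True)
    elapsed = time.perf_counter() - start
    if elapsed > budget_s:
        raise RuntimeError(f"notebook job took {elapsed:.0f}s, budget is {budget_s:.0f}s")
    return elapsed
```

The budget check is one way to enforce the "faster than the pytest suite" benchmark; the 900-second figure is a placeholder, not a measured value.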

So that leaves flaky builds (3). It's probably a good thing that the main pytest jobs use the very latest versions of dependencies, but the downside is flakiness. It's not trivial to pin dependencies with setuptools in a manner that works across multiple Python versions and OSs. We could perhaps try to pin the dependencies in the notebooks job, whilst leaving the main pytest dependencies open? Open to ideas on this one.
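
One way to pin only the notebooks job is a pip constraints file that the pytest jobs simply don't use. A rough sketch, assuming the pins are snapshotted from a known-good environment; the package list and helper names are hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical list of third-party packages the notebooks import.
NOTEBOOK_DEPS = ["numpy", "pandas", "scikit-learn", "xgboost"]


def installed_versions(names):
    """Map each installed package name to its version; skip packages that are absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            continue
    return found


def constraints_text(versions):
    """Render exact pins in pip's constraints format, one `name==version` per line."""
    return "".join(f"{name}=={ver}\n" for name, ver in sorted(versions.items()))
```

The notebooks job could then install with `pip install -c` pointing at the generated file, while the pytest jobs install without `-c` and keep tracking the latest releases.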

