summit-2023's People

Contributors

bsipocz

summit-2023's Issues

SPECs supporting infrastructure

  • how do we launch the SPEC process?
  • implementing SPEC supporting infrastructure (subtopics exist on their own, just as CI/packaging/etc.)
  • non-infrastructure SPECs

Package metrics & stats

From the Feb 27 meeting: "How do we collect metrics and package stats?"

Popularising "team-compass" repos

Several years ago the JupyterHub team created a team-compass repository. The goal of the repo is to provide a place for "team interaction, syncing, and handling meeting notes across the JupyterHub ecosystem".

The point of this issue is that I think more projects would benefit from having a team-compass. The outcome of working on this would be a SPEC and other material that explains the idea and gives advice, in order to popularise it.

We use the repo to talk about topics ranging from "reduced availability" and "should we use pre-commit for all repos?" to "coordinating Outreachy work" and "is self-merging your PRs OK?". But I think the best way to get a feeling for it is for people to browse the repo a bit.

Having a repo like this creates a place to talk about things that are more about how to work together than about doing the work itself (which is more appropriate to discuss in the actual project repo).

The structure has proven popular, and over the past few years most of the other parts of Project Jupyter have adopted one.

It would be interesting to hear what other projects do and find out if there is something worth recommending to the wider community.

Testing against downstream libraries

I was recently chatting with some folks developing R packages and learned that when they want to make a release available on CRAN, the package is installed in the environment and the test suites of all downstream packages are run to ensure that the new release does not come with an unexpected breaking change. While I think this is a bit too much (and it does indeed cause some friction and negativity), I'd like to explore options for doing the same with Python.

I can imagine nightly runs that install a defined set of downstream packages (not one or two, but more like twenty or forty) and run their tests against the current main. While this is already technically feasible, I am not aware of anyone doing it comprehensively, and it currently faces the issue of inconsistency across build and CI systems.

We could clone each repo and build from source, but then we need to know how each of the downstream packages gets built and prepare for that, which is non-trivial for a lot of spatial stuff depending on GDAL and friends. However, I don't think we need to test against each downstream package's main branch; testing against its latest release should be enough. But that comes with another issue: if we install the packages from PyPI, the tests are sometimes not part of the distribution, and in other cases will not run because some data they use are available only in the repo itself.

I don't have a clear idea how to tackle this, but I'd love to spend some time thinking about tooling that may enable it. It has happened numerous times that we found a regression only because the CI of a downstream package failed. This way, we would catch it ourselves.
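A minimal sketch of what such a nightly harness could look like (the package names are hypothetical placeholders, and this assumes each downstream package ships its tests in the distribution so that pytest's `--pyargs` can find them — which, as the issue notes, is not always the case):

```python
import subprocess
import sys

# Hypothetical list of downstream packages to exercise; a real job would
# read this from a shared config file.
DOWNSTREAM = ["geopandas", "momepy"]


def run_downstream_suites(packages, runner=None):
    """Run each downstream package's test suite; return {name: passed}.

    `runner` is injectable so the harness itself can be exercised
    without touching the network.
    """
    if runner is None:
        def runner(pkg):
            # Install the latest release from PyPI into the current env
            # (which already has the dev version of the upstream package).
            subprocess.run(
                [sys.executable, "-m", "pip", "install", pkg], check=True
            )
            # Run the installed package's tests via pytest's --pyargs.
            proc = subprocess.run(
                [sys.executable, "-m", "pytest", "--pyargs", pkg]
            )
            return proc.returncode == 0
    return {pkg: runner(pkg) for pkg in packages}


if __name__ == "__main__":
    results = run_downstream_suites(DOWNSTREAM)
    failed = [pkg for pkg, ok in results.items() if not ok]
    print("failures:", failed or "none")
```

A CI cron job could then fail (and notify maintainers) whenever the `failures` list is non-empty.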

Move the backport bot and rename it.

  • move the source code
  • move the app
  • give access to Heroku
  • update website with metrics
  • give access to the mail/bot account
  • make a new logo/page/name
  • access to keen.io?
  • make sure Steve Silvester still has access to everything

CI Infrastructure

Edit by bsipocz to hash out some ideas e.g. to include in CI templating, etc:

  • job that tests nightly wheels/dev versions; maybe a separate job that tests pre-releases
  • job that tests the oldest supported versions of declared dependencies
  • cron jobs: notifications are not working properly on GitHub; we should implement a small tool or workaround to get notifications to maintainers
  • filterwarnings to error out on warnings: expected warnings should be handled in CI, or ignored. Erroring out by default provides enormous help in noticing new deprecations while upstream dependencies are still in their dev cycle (assuming there is a job that uses the dev versions/nightly builds)
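The filterwarnings idea above can be expressed as pytest configuration; a minimal sketch (the ignored message and category below are placeholders for illustration, not real warnings):

```toml
# pyproject.toml -- pytest configuration
[tool.pytest.ini_options]
filterwarnings = [
    # promote all warnings to errors by default
    "error",
    # allow-list expected warnings explicitly (placeholder example)
    "ignore:this specific expected message:DeprecationWarning",
]
```

With this in place, a job running against nightly builds of upstream dependencies fails as soon as a new deprecation warning appears.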

Also, this could include documentation about the available CI tooling, though there is a risk here in keeping it up-to-date.

Reduce cross-project effort when shared dependencies change/break

EDIT: PR title changed to reflect the nature of the problem; initial PR description below is just one idea towards solving it

We could maintain forks of various key dependencies at the ecosystem level, and ecosystem projects could then depend on / install from there. Hopefully this results in fewer cases of "shoot, dependencyX is breaking our CIs, I don't have time to investigate right now so I'll just pin it" happening across many ecosystem packages. Examples of possible dependencies where this might make sense:

  • docutils
  • sphinx
  • sphinx-gallery
  • sphinxcontrib-bibtex
  • pybtex
  • pooch
  • (feel free to suggest others)

Some of these might be cases of "fork-and-freeze", with a predetermined schedule of when upstream changes are pulled in. Prior to such updates, a couple of ecosystem packages could temporarily switch to installing the upstream in a PR to see if any CIs break, and if they do, the update could be delayed until the breakage was resolved.

Others might need to be occasionally maintained at the ecosystem-fork level (here I'm thinking about the recent docutils release where node.findall replaced node.traverse and it had to be fixed in numpydoc, sphinxcontrib-bibtex, and a couple of other MNE-Python dependencies). Having ecosystem-level forks would mean that as soon as any of us noticed a problem, a patch could be made at the ecosystem-fork level and the fix would be available to the ecosystem immediately (i.e., without waiting for the patch to be accepted upstream).
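One way a project could opt in to an ecosystem fork is via a direct-URL requirement (valid PEP 508 syntax); the org, repo, and branch names below are hypothetical examples, not a real setup:

```
# requirements-docs.txt (org/repo/branch names are hypothetical)
# docutils comes from an ecosystem-maintained fork instead of PyPI:
docutils @ git+https://github.com/scientific-python-forks/docutils@ecosystem-stable
# everything else still comes from PyPI as usual:
sphinx
numpydoc
```

Switching a package back to upstream for a trial run (as described above) would then just mean temporarily dropping the `@ git+...` part in a PR.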

Questions:

  • where do we draw the line? It's not reasonable to fork every dependency of every package in the ecosystem

Package template cookiecutter

Currently there are many projects providing cookie cutters for Scientific Python projects. At the very least, it may be nice to make a list of the existing cookie cutters. Here are some other things to discuss:

  • is there anything not already covered by Scikit-HEP's or OpenAstronomy's?
  • are there major differences between the various cookie cutters?
  • if so, would it be possible to come to some agreement about what should be consistent and what should vary between the various recommended cookie cutters?

Things to learn about

It came up during multiple pre-summit meetings that it would be nice to have knowledge transfer sessions.

Please comment topic suggestions in this issue.

Ecosystem-level tutorial / documentation content

Splitting this off from the issue of tutorial infrastructure (#9). There are lots of things that don't need separate documentation for each package; things like

  • standard GitHub setup for contributors (fork, clone, add upstream, don't open PRs from main, etc)
  • explanation of CIs and how to view their logs
  • project management / workflows. This one is more aimed at maintainers, and has the goal of making it easier for contributors from other parts of the ecosystem to jump in and contribute to other packages without too much friction. Here I'm thinking of things like pre-commit setup, PR merge criteria, etc: probably there won't be general agreement on something like PR merge criteria across the ecosystem, but we could at least agree upon a standard place to write down what the PR merge policy is, so that a new contrib could easily find that information (CONTRIBUTING.md?) regardless of what package they happen to be contributing to.

Obviously there will be some degree of variability across packages in how each of these things is done, but that's an opportunity for figuring out which ones have a clear best practice (that could be codified in a SPEC?) and which are cases of "reasonable packages will disagree". For the latter, the ecosystem-level tutorial would either be more conceptual (individual package docs fill in the details) or simply not exist.

cc @lwasser who is planning tutorials about packaging I think

Originally posted by @drammock in #9 (comment)
