scientific-python / summit-2023

Work summit 2023
Hackmd link for the summit: https://hackmd.io/JL5slkxORA-q7VRN79v1sA
From the Feb 27 meeting: "How do we collect metrics and package stats?"
Team-compass repos.
Several years ago the JupyterHub team created a team-compass repository. The goal of the repo is to provide a place for "team interaction, syncing, and handling meeting notes across the JupyterHub ecosystem".
The point of this issue is that I think more projects would benefit from having a team-compass. The outcome of working on this would be a SPEC and other material that explains the idea, gives advice, etc., to popularise it.
We use the repo to talk about topics ranging from "reduced availability" and "should we use pre-commit for all repos?" to "coordinating Outreachy work" and "is self-merging your PRs OK?". But I think the best way to get a feeling for it is to browse the repo a bit.
A repo like this creates a space to talk about how to work together, as opposed to the work itself (which is more appropriately discussed in the actual project repos).
The structure has proven popular, and over the past few years most other parts of Project Jupyter have adopted it.
It would be interesting to hear what other projects do and find out if there is something worth recommending to the wider community.
Hackmd link for the summit: https://hackmd.io/YL5DNtsaSsS-1ZU3Pxkrxg
I was recently chatting with some folks developing R packages and learned that when they want to make a release available on CRAN, the package is installed in the environment and the test suites of all downstream packages are run to ensure that the new release does not come with an unexpected breaking change. While I think this is a bit too much (and it does indeed cause some friction and negativity), I'd like to explore options for doing the same with Python.
I can imagine nightly runs that install a defined set of downstream packages (not one or two, but more like twenty or forty) and run their tests against the current main. While this is already technically feasible, I am not aware of anyone doing it comprehensively, and it currently faces the issue of inconsistency across build and CI systems.
We could clone each repo and build from source, but then we need to know how each of the downstream packages gets built and prepare for that, which is non-trivial for a lot of spatial stuff depending on GDAL and friends. However, I don't think we need to test against their main branches; testing against the latest releases should be enough. But that comes with another issue: if we install the packages from PyPI, the tests are sometimes not part of the distribution, and in other cases will not run because some data they use are available only in the repo itself.
I don't have a clear idea how to tackle this, but I'd love to spend some time thinking about tooling that may enable it. It has happened numerous times that we found a regression because the CI of a downstream package failed; this way, we would catch it ourselves.
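A minimal sketch of what such a nightly job could look like (not an existing tool; the repository URL and downstream package names are placeholders, and it assumes the downstream tests are shipped in their distributions so that `pytest --pyargs` can find them):

```python
# downstream_check.py -- a sketch only, not an existing tool. The repository
# URL and package names below are placeholders.
import subprocess
import sys

# Hypothetical set of downstream packages to exercise; a real job would
# cover twenty to forty of them.
DOWNSTREAM = ["geopandas", "momepy", "libpysal"]


def run(*cmd: str) -> int:
    """Run a command via the current interpreter and return its exit code."""
    return subprocess.run([sys.executable, "-m", *cmd]).returncode


# Install the library under test from the current main branch.
run("pip", "install", "git+https://github.com/example-org/example-lib.git@main")

failures = []
for pkg in DOWNSTREAM:
    run("pip", "install", pkg)  # latest release from PyPI
    # --pyargs runs the test suite of the *installed* package, which only
    # works when the tests (and their data) are shipped in the distribution.
    if run("pytest", "--pyargs", pkg) != 0:
        failures.append(pkg)

print("downstream failures:", ", ".join(failures) or "none")
```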
Hackmd link for the summit: https://hackmd.io/UNwG2BjJSxOUJ0M1iWI-nQ
Edit by bsipocz to hash out some ideas, e.g. to include in CI templating, etc.:

- `filterwarnings` to error out on warnings: expected warnings should be handled in CI, or ignored. Erroring out by default provides enormous help in noticing new deprecations while upstream dependencies are still in their dev cycle (assuming there is a job that uses the dev versions/nightly builds). See the sketch after this list.
- This could also include documentation about the available CI tooling, though there is a risk involved here with keeping it up-to-date.
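To illustrate the first point, a minimal pytest sketch (the ignored warning message is made up) that turns all warnings into errors while allowlisting one expected deprecation:

```python
# test_example.py -- a minimal sketch of erroring out on warnings in pytest.
# The same filters can instead live in the `filterwarnings` ini option of the
# pytest configuration, which is what a CI template would most likely ship.
import pytest

pytestmark = [
    # Promote every unhandled warning to a test failure by default.
    pytest.mark.filterwarnings("error"),
    # Hypothetical expected deprecation that is explicitly allowed.
    pytest.mark.filterwarnings(
        "ignore:old_function is deprecated:DeprecationWarning"
    ),
]


def test_addition():
    # Under the "error" filter, any stray warning emitted here would fail.
    assert 1 + 1 == 2
```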
Raised by @MridulS at the Feb 27 prep-meeting.
EDIT: PR title changed to reflect the nature of the problem; initial PR description below is just one idea towards solving it
We could maintain forks of various key dependencies at the ecosystem level, and ecosystem projects could then depend on / install from there. Hopefully this results in fewer cases of "shoot, dependencyX is breaking our CIs, I don't have time to investigate right now so I'll just pin it" happening across many ecosystem packages. Examples of possible dependencies where this might make sense:
Some of these might be cases of "fork-and-freeze", with a predetermined schedule of when upstream changes are pulled in. Prior to such updates, a couple of ecosystem packages could temporarily switch to installing the upstream in a PR to see if any CIs break, and if they do, the update could be delayed until the breakage was resolved.
Others might need to be occasionally maintained at the ecosystem-fork level (here I'm thinking about the recent docutils release where `node.findall` replaced `node.traverse`, and it had to be fixed in numpydoc, sphinxcontrib-bibtex, and a couple of other MNE-Python dependencies). Having ecosystem-level forks would mean that as soon as any of us noticed a problem, a patch could be made at the ecosystem-fork level and the fix would be available to the ecosystem immediately (i.e., without waiting for the patch to be accepted upstream).
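As a sketch of the consumption side, assuming a hypothetical `scientific-python-forks` GitHub organization hosting the ecosystem forks, a CI job could install the pinned forks via PEP 508 direct references:

```python
# install_from_forks.py -- a sketch only. The "scientific-python-forks" org
# and the "frozen-2023.03" branch name are hypothetical; the branch encodes
# the "fork-and-freeze" state described above.
import subprocess
import sys

FORK_PINS = [
    "docutils @ git+https://github.com/scientific-python-forks/docutils@frozen-2023.03",
    "numpydoc @ git+https://github.com/scientific-python-forks/numpydoc@frozen-2023.03",
]

# An ecosystem package's CI would run this instead of a plain `pip install`,
# so the vetted forks are resolved rather than the upstream releases.
subprocess.check_call([sys.executable, "-m", "pip", "install", *FORK_PINS])
```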
Questions:
Currently there are many projects providing cookie cutters for Scientific Python projects. At the very least, it may be nice to make a list of the existing cookie cutters. Here are some other things to discuss:
Hackmd link for the summit: https://hackmd.io/MmbP4VTATyG129_U56xdJQ
It came up during multiple pre-summit meetings that it would be nice to have knowledge-transfer sessions.
Please comment topic suggestions in this issue.
Hackmd link for the summit: https://hackmd.io/0M1Yh7KwTnaXSsU14BiyQw
Splitting this off from the issue of tutorial infrastructure (#9). There are lots of things that don't need separate documentation for each package; things like:

- the basic git/GitHub workflow (forking, branching, syncing with upstream `main`, etc.)
- where to find contribution guidelines (`CONTRIBUTING.md`?)

These apply regardless of what package someone happens to be contributing to. Obviously there will be some degree of variability across packages in how each of these things are done, but that's an opportunity for figuring out which ones have a clear best practice (that could be codified in a SPEC?) and which are cases of "reasonable packages will disagree". For the latter cases, the ecosystem-level tutorial would either be more conceptual (with individual package docs filling in the details) or simply not exist.
cc @lwasser who is planning tutorials about packaging I think
Originally posted by @drammock in #9 (comment)