scientific-python / summit-2023

Work summit 2023
Hackmd link for the summit: https://hackmd.io/JL5slkxORA-q7VRN79v1sA
From the Feb 27 meeting: "How do we collect metrics and package stats?"
Team-compass repos.
Several years ago the JupyterHub team created a team-compass repository. The goal of the repo is to provide a place for "team interaction, syncing, and handling meeting notes across the JupyterHub ecosystem".
The point of this issue is that I think more projects would benefit from having a team-compass. The outcome of working on this would be a SPEC and other material that explains the idea, gives advice, etc., to popularise it.
We use the repo to talk about topics ranging from "reduced availability" and "should we use pre-commit for all repos?" to "coordinating Outreachy work" and "is self-merging your PRs OK?". But I think the best way to get a feeling for it is to browse the repo a bit.
A repo like this creates a space to talk about how to work together, as opposed to the work itself (which is more appropriately discussed in the actual project repos).
The structure has proven popular, and over the past few years most other parts of Project Jupyter have adopted it.
It would be interesting to hear what other projects do and find out if there is something worth recommending to the wider community.
Hackmd link for the summit: https://hackmd.io/YL5DNtsaSsS-1ZU3Pxkrxg
I was recently chatting with some folks developing R packages and learned that when they want to make a release available on CRAN, the package is installed in the environment and the test suites of all downstream packages are run to ensure that the new release does not come with an unexpected breaking change. While I think this is a bit too much (and it does indeed cause some friction and negativity), I'd like to explore options for doing the same with Python.
I can imagine nightly runs that install a defined set of downstream packages (not one or two, but more like twenty or forty) and run their tests against the current main. While this is already technically feasible, I am not aware of anyone doing it comprehensively, and it currently faces the issue of inconsistency across build and CI systems.
We could clone each repo and build from source, but then we need to know how each of the downstream packages gets built and prepare for that, which is non-trivial for a lot of spatial stuff depending on GDAL and friends. However, I don't think we need to test against their main branches; testing against the latest releases should be enough. But that comes with another issue: if we install the packages from PyPI, the tests are sometimes not part of the distribution, and in other cases will not run because some data they use are available only in the repo itself.
I don't have a clear idea how to tackle this, but I'd love to spend some time thinking about tooling that may enable it. It has happened numerous times that we found a regression because the CI of a downstream package failed; this way, we would catch it ourselves.
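A minimal sketch of what such a nightly job could look like (not an existing tool; the repository URL and downstream package names are placeholders, and it assumes the downstream tests are shipped in their distributions so that `pytest --pyargs` can find them):

```python
# downstream_check.py -- a sketch only, not an existing tool. The repository
# URL and package names below are placeholders.
import subprocess
import sys

# Hypothetical set of downstream packages to exercise; a real job would
# cover twenty to forty of them.
DOWNSTREAM = ["geopandas", "momepy", "libpysal"]


def run(*cmd: str) -> int:
    """Run a command via the current interpreter and return its exit code."""
    return subprocess.run([sys.executable, "-m", *cmd]).returncode


# Install the library under test from the current main branch.
run("pip", "install", "git+https://github.com/example-org/example-lib.git@main")

failures = []
for pkg in DOWNSTREAM:
    run("pip", "install", pkg)  # latest release from PyPI
    # --pyargs runs the test suite of the *installed* package, which only
    # works when the tests (and their data) are shipped in the distribution.
    if run("pytest", "--pyargs", pkg) != 0:
        failures.append(pkg)

print("downstream failures:", ", ".join(failures) or "none")
```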
Hackmd link for the summit: https://hackmd.io/UNwG2BjJSxOUJ0M1iWI-nQ
Edit by bsipocz to hash out some ideas, e.g. to include in CI templating, etc.:

- `filterwarnings` to error out on warnings: expected warnings should be handled in CI, or ignored. Erroring out by default provides enormous help in noticing new deprecations while upstream dependencies are still in their dev cycle (assuming there is a job that uses the dev versions/nightly builds). See the sketch after this list.
- This could also include documentation about the available CI tooling, though there is a risk involved here with keeping it up-to-date.
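To illustrate the first point, a minimal pytest sketch (the ignored warning message is made up) that turns all warnings into errors while allowlisting one expected deprecation:

```python
# test_example.py -- a minimal sketch of erroring out on warnings in pytest.
# The same filters can instead live in the `filterwarnings` ini option of the
# pytest configuration, which is what a CI template would most likely ship.
import pytest

pytestmark = [
    # Promote every unhandled warning to a test failure by default.
    pytest.mark.filterwarnings("error"),
    # Hypothetical expected deprecation that is explicitly allowed.
    pytest.mark.filterwarnings(
        "ignore:old_function is deprecated:DeprecationWarning"
    ),
]


def test_addition():
    # Under the "error" filter, any stray warning emitted here would fail.
    assert 1 + 1 == 2
```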
Raised by @MridulS at the Feb 27 prep-meeting.
EDIT: PR title changed to reflect the nature of the problem; initial PR description below is just one idea towards solving it
We could maintain forks of various key dependencies at the ecosystem level, and ecosystem projects could then depend on / install from there. Hopefully this results in fewer cases of "shoot, dependencyX is breaking our CIs, I don't have time to investigate right now so I'll just pin it" happening across many ecosystem packages. Examples of possible dependencies where this might make sense:
Some of these might be cases of "fork-and-freeze", with a predetermined schedule of when upstream changes are pulled in. Prior to such updates, a couple of ecosystem packages could temporarily switch to installing the upstream in a PR to see if any CIs break, and if they do, the update could be delayed until the breakage was resolved.
Others might need to be occasionally maintained at the ecosystem-fork level (here I'm thinking about the recent docutils release where `node.findall` replaced `node.traverse`, and it had to be fixed in numpydoc, sphinxcontrib-bibtex, and a couple of other MNE-Python dependencies). Having ecosystem-level forks would mean that as soon as any of us noticed a problem, a patch could be made at the ecosystem-fork level and the fix would be available to the ecosystem immediately (i.e., without waiting for the patch to be accepted upstream).
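As a sketch of the consumption side, assuming a hypothetical `scientific-python-forks` GitHub organization hosting the ecosystem forks, a CI job could install the pinned forks via PEP 508 direct references:

```python
# install_from_forks.py -- a sketch only. The "scientific-python-forks" org
# and the "frozen-2023.03" branch name are hypothetical; the branch encodes
# the "fork-and-freeze" state described above.
import subprocess
import sys

FORK_PINS = [
    "docutils @ git+https://github.com/scientific-python-forks/docutils@frozen-2023.03",
    "numpydoc @ git+https://github.com/scientific-python-forks/numpydoc@frozen-2023.03",
]

# An ecosystem package's CI would run this instead of a plain `pip install`,
# so the vetted forks are resolved rather than the upstream releases.
subprocess.check_call([sys.executable, "-m", "pip", "install", *FORK_PINS])
```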
Questions:
Currently there are many projects providing cookie cutters for Scientific Python projects. At the very least, it may be nice to make a list of the existing cookie cutters. Here are some other things to discuss:
Hackmd link for the summit: https://hackmd.io/MmbP4VTATyG129_U56xdJQ
It came up during multiple pre-summit meetings that it would be nice to have knowledge-transfer sessions.
Please comment topic suggestions in this issue.
Hackmd link for the summit: https://hackmd.io/0M1Yh7KwTnaXSsU14BiyQw
Splitting this off from the issue of tutorial infrastructure (#9). There are lots of things that don't need separate documentation for each package; things like:

- the basic git/GitHub workflow (forking, branching, syncing with upstream `main`, etc.)
- where to find contribution guidelines (`CONTRIBUTING.md`?)

These apply regardless of what package someone happens to be contributing to. Obviously there will be some degree of variability across packages in how each of these things are done, but that's an opportunity for figuring out which ones have a clear best practice (that could be codified in a SPEC?) and which are cases of "reasonable packages will disagree". For the latter cases, the ecosystem-level tutorial would either be more conceptual (with individual package docs filling in the details) or simply not exist.
cc @lwasser who is planning tutorials about packaging I think
Originally posted by @drammock in #9 (comment)