pyodide-pack's Introduction

pyodide-pack

Python package bundler and minifier for the web

Pyodide-pack aims to reduce the size and load time of Python applications running in the browser using different strategies:

  • Minification of the source code via AST rewrite
  • Transformation of the source code into a different format such as Python bytecode (.pyc files)
  • Dead code elimination, by removing unused Python modules (detected at runtime)

Each of these approaches has different tradeoffs, and they can be used separately or in combination.
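
To make the first strategy concrete, here is a minimal sketch of minification via AST rewrite, using only the standard library ast module. It is not pyodide-pack's actual implementation, and the function name strip_docstrings is hypothetical. Round-tripping source through ast.parse / ast.unparse already drops comments and extra whitespace; the sketch additionally removes docstrings, which can break code that reads __doc__ at runtime.

```python
import ast

# Node types whose first statement may be a docstring.
DOC_OWNERS = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def strip_docstrings(source: str) -> str:
    """Return `source` without comments and docstrings (hypothetical helper)."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, DOC_OWNERS):
            continue
        body = node.body
        if (
            body
            and isinstance(body[0], ast.Expr)
            and isinstance(body[0].value, ast.Constant)
            and isinstance(body[0].value.value, str)
        ):
            # Keep a `pass` if the docstring was the only statement.
            node.body = body[1:] or [ast.Pass()]
    # Comments are not part of the AST, so unparsing drops them as well.
    return ast.unparse(tree)
```

Since ast.unparse requires Python 3.9+, this fits within the Python 3.10+ requirement stated below.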

Install

Pyodide-pack requires Python 3.10+ and can be installed via pip:

pip install pyodide-pack

Optionally, to eliminate unused modules via runtime detection, Node.js needs to be installed together with Pyodide 0.24.0+:

npm install pyodide@">=0.24.0"

Usage

For Python wheel minification via AST rewrites, run:

pyodide minify <path_to_dir_with_py_files>

See the documentation at pyodide-pack.pyodide.org for more details.

License

Pyodide-pack uses the Mozilla Public License Version 2.0.

pyodide-pack's People

Contributors

pre-commit-ci[bot], rth

pyodide-pack's Issues

Random import failures at validation

Currently, re-running the bundling of the same application (even a pure Python one) occasionally fails, e.g. with a message that some of the included packages cannot be imported, and then works fine on the next run. This reminds me a bit of random Selenium test failures; however, this is on my laptop in Node, so insufficient RAM should not be an issue.

I have worked on improving the determinism of packing, but there are likely still some issues.

For instance, maybe checking that the produced build artifact is exactly identical between two runs could be a start.
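
A minimal sketch of such a check, assuming the bundle is a single file on disk and that two runs wrote it to two different paths. The helper names file_digest and same_artifact are hypothetical and not part of pyodide-pack.

```python
import hashlib
from pathlib import Path


def file_digest(path) -> str:
    """SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def same_artifact(path_a, path_b) -> bool:
    """True if the two build artifacts are byte-for-byte identical."""
    return file_digest(path_a) == file_digest(path_b)
```

A mismatch only says that the runs differ, not why; comparing the archive member lists of the two bundles would be a natural next step for narrowing it down.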

Rewrite package code to reduce inter-module dependencies

One reason this bundler cannot reduce sizes that much is the Python convention of using top-level imports.
With top-level imports, if we use any function from a module, everything that module imports has to be included, even functionality needed only by functions we never call. This is particularly problematic for dynamic libraries, which take a long time to load.

There are two ways around this:

  1. Manual fixes: we could make upstream fixes to reduce inter-module dependencies.
  2. Automatic code rewrites: one could imagine, under some conditions, rewriting all code (e.g. via AST) to move top-level imports into the functions / methods where they are used, perhaps starting with only .so libraries and/or imports used only a few times.
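
A rough sketch of what the automatic rewrite could look like, using the standard library ast module. This is only an illustration of the idea, not something pyodide-pack does today; localize_imports and its helpers are hypothetical names. The sketch is deliberately conservative: it only handles plain `import x` / `import x as y` statements, and leaves an import in place if the name is read at import time (module-level code, decorators, default values, annotations), if no top-level function uses it, or if any function rebinds the name. It does not try to detect imports that exist for their side effects or names re-exported to other modules, so it should not be applied blindly.

```python
import ast

FUNC_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef)


def _loaded_names(nodes):
    """Names read (Load context) anywhere inside the given AST nodes."""
    return {
        n.id
        for node in nodes
        for n in ast.walk(node)
        if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)
    }


def _rebinds(func, name):
    """True if `func` has a parameter or assignment target called `name`."""
    for n in ast.walk(func):
        if isinstance(n, ast.arg) and n.arg == name:
            return True
        if isinstance(n, ast.Name) and n.id == name and not isinstance(n.ctx, ast.Load):
            return True
    return False


def localize_imports(source: str) -> str:
    """Move eligible top-level imports into the functions that use them."""
    tree = ast.parse(source)
    funcs = [s for s in tree.body if isinstance(s, FUNC_TYPES)]
    # Only plain, non-dotted `import x` / `import x as y` are candidates.
    candidates = [
        s
        for s in tree.body
        if isinstance(s, ast.Import)
        and len(s.names) == 1
        and "." not in s.names[0].name
    ]
    # Everything evaluated at import time: module-level statements plus
    # function headers (decorators, defaults, annotations).
    eager = [s for s in tree.body if not isinstance(s, FUNC_TYPES)]
    for f in funcs:
        eager += f.decorator_list + [f.args] + ([f.returns] if f.returns else [])
    eager_names = _loaded_names(eager)

    for stmt in candidates:
        alias = stmt.names[0]
        bound = alias.asname or alias.name
        users = [f for f in funcs if bound in _loaded_names(f.body)]
        if not users or bound in eager_names or any(_rebinds(f, bound) for f in funcs):
            continue  # leave the import where it is
        tree.body.remove(stmt)
        for f in users:
            # Keep a leading docstring as the first statement.
            pos = 1 if ast.get_docstring(f, clean=False) is not None else 0
            f.body.insert(pos, stmt)
    return ast.unparse(tree)
```

With this rewrite the moved import runs on the first call of the function rather than at module import, so a module that is never called into would no longer pull in its dependency.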

Record a list of accessed symbols

I'm trying to record the list of symbols that are accessed in a given Pyodide application at runtime. What I would like to do is record the arguments of all calls to pyodide._module.__dlsym_js, which then ends up being called by the real dlsym from C in Emscripten. Together with pyodide._module.LDSO['loadedLibsByHandle'], that should tell which symbol is loaded from which .so file. The end goal is to not bundle .so files that are never accessed.

I have tried doing something like:

  let dlsymCalls = [];
  const dlsymOrig = pyodide._module.__dlsym_js;
  pyodide._module.__dlsym_js = function(handle, symbol, symbolIndex) {
    dlsymCalls.push({handle: handle, symbol: symbol, symbolIndex: symbolIndex});
    return dlsymOrig(handle, symbol, symbolIndex);
  };

and then after import scipy I would expect some values in that list. But I get nothing.

The same happens when doing it via apply:

  let dlsymCalls = [];  
  pyodide._module.__dlsym_js.apply = function(context, args) {
    dlsymCalls.push(args); 
    return this.original.apply(context, args); 
  };  

Do you see another way of doing this via monkeypatching without patching emscripten code?

Maybe a question for @hoodmane, since you were interested some time ago in Emscripten's dynamic linking code.

Improve dynamic libraries handling

Currently, we are

  1. intercepting dynamic library loading. FS.findObject happens to be called when loading dynamic libraries, but maybe that's not the best hook for it.
  2. writing a file with that list inside the created bundle
  3. calling loadDynlib on the dynamic libraries, while preserving the order in which they were initially loaded.

This can likely be improved. For instance, CLAPACK doesn't seem to be detected currently and needs to be included manually via the --include='*lapack*so' option.
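
The bookkeeping described above can be sketched as follows. This is an illustration only: build_dynlib_list is a hypothetical helper, the paths in the usage example are made up, and it assumes that --include patterns behave like shell-style globs matched with fnmatch, which may not be how pyodide-pack matches them.

```python
from fnmatch import fnmatchcase


def build_dynlib_list(observed, available, include_patterns=()):
    """Ordered list of dynamic libraries to ship and load.

    observed: library paths in the order they were loaded at runtime
    available: all library paths that could be bundled
    include_patterns: globs for libraries to add even if not observed
    """
    # dict.fromkeys drops duplicates while keeping first-seen order.
    ordered = list(dict.fromkeys(observed))
    seen = set(ordered)
    for path in available:
        if path in seen:
            continue
        if any(fnmatchcase(path, pat) for pat in include_patterns):
            ordered.append(path)
            seen.add(path)
    return ordered
```

For example, with made-up paths:

```python
libs = build_dynlib_list(
    observed=["a/libfoo.so", "b/_bar.so", "a/libfoo.so"],
    available=["a/libfoo.so", "b/_bar.so", "c/clapack_all.so", "c/other.so"],
    include_patterns=["*lapack*so"],
)
# libs == ["a/libfoo.so", "b/_bar.so", "c/clapack_all.so"]
```

Libraries added through a pattern are appended after the observed ones here, which is a guess: their correct position in the load order is not known from the runtime trace.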

Support stdlib bundling

We could extend this logic to the stdlib (which accounts for about half of the minimal Pyodide application). However, this would require:

  • including the stdlib as a .tar file instead of a binary .data file
  • making sure we can monkeypatch FS.open before any stdlib files are opened
  • having the ability to swap this file.

Find a better name

As a name, "pyodide-pack" works but is a bit boring, and it would be even worse if we start using subcommands (e.g. pyodide-pack minimize). Instead, it would be better to give it a shorter and nicer name.

Some (not very good) ideas:

  • pyopack: might be confused with the existing https://github.com/seemoo-lab/openwifipass/blob/main/OPACK.md
  • pypack: maybe too general, and the package already exists
  • wpack: (with 'w' for web and wasm) a bit difficult to read, and also feels like a name stolen from webpack
  • ypack: ("yp" -> "py") maybe, but the link with wasm is not obvious; the name is registered on PyPI but could probably be released
  • wapack: (wa as webassembly) maybe, but then it would have been preferable to have a Python reference
  • yopack: (from pyodide) doesn't sound very serious

If you have any other ideas, please comment.
