
python-wasm's Introduction

CPython on WASM

Build scripts and configuration for building CPython for Emscripten.

Check out Christian Heimes' talk about the effort at PyConDE: https://www.youtube.com/watch?v=oa2LllRZUlU

Building is fairly straightforward. First, install Emscripten. Then run the following commands:

# get the Python sources
./fetch-python.sh
# build Python for the machine we are building on, needed before cross compiling for emscripten
./build-python-build.sh
# build Python cross-compiling to emscripten
./build-python-emscripten-browser.sh

There will probably be errors, but that's just part of the fun of experimental platforms.

Assuming things compiled correctly, you can have emscripten serve the Python executable and then open http://localhost:8000/python.html in your browser:

./run-python-browser.sh

CLI input goes through a browser input modal, which is rather annoying. Also, to see any output you need to click Cancel on the modal.

Developing

Once you've built the Emscripten'd Python, you can rebuild it via

./clean-host.sh
./build-python-emscripten-browser.sh

which rebuilds Python targeting Emscripten and regenerates python.{html,wasm,js}.

Test build artifacts

You can also download builds from our CI workflow and test WASM builds locally.

Emscripten browser build

  • download and unzip the emscripten-browser-main.zip build artifact
  • run a local webserver in the same directory as python.html, e.g. python3 -m http.server
  • open http://localhost:8000/python.html
  • enter commands into the browser modal window and check the web developer console (F12) for output. You may need to hit "Cancel" on the modal after sending input for output to appear.

Emscripten NodeJS build

  • download and unzip the emscripten-node-main.zip build artifact
  • run node python.js (older Node.js versions may need the --experimental-wasm-bigint flag)

WASI

  • download and unzip the wasi-main.zip build artifact
  • install wasmtime
  • run wasmtime run --dir . -- python.wasm


python-wasm's Issues

Emscripten build with NODERAWFS=1

It might be helpful, both for tests and for working in a more familiar environment, to do an Emscripten build that uses the native filesystem. This is possible with NODERAWFS, which replaces the Emscripten in-memory filesystem. It would get us close to something like node python.js script.py.

File permissions don't quite work the same way, and there are some limitations to going through this layer, but it's pretty seamless for the filesystem access that Node does provide. This might let folks install the Emscripten-compiled version of Python alongside other Python builds like PyPy.

WASI support seems strategically more important, but this could be a useful foothold.

Use out-of-tree build

Awesome!

You can simplify building by using out-of-tree builds. This lets you share a checkout. You can also get the current host triple from config.guess.

git clone https://github.com/python/cpython.git

cd cpython
mkdir -p builddir/host
mkdir -p builddir/wasi

# if you use an existing checkout, you must run "make clean"

pushd builddir/host
../../configure -C
make -j$(nproc)
popd

pushd builddir/wasi
emconfigure ../../configure -C ... --build=$(../../config.guess)
popd

Lazy load infrequently used modules?

The build scripts now load all of the modules that haven't already been removed from a zip file. Instead, we could lazy-load modules, putting only the most frequently used ones in the browser cache. We are close to maxing that cache out: I think its limits are often 20-50 MB, and the data assets are just over 20 MB.
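One way to lazy-load is a custom meta path finder that only retrieves a module's source the first time it is imported. The sketch below assumes a hypothetical fetch_source(name) callback that returns source text or None; here it is backed by an in-memory dict, and how the browser would actually retrieve the source (for example a synchronous request) is left open. It only serves top-level, non-package modules.

```python
import importlib.abc
import importlib.util
import sys


class LazyFetchFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve top-level modules whose source is fetched on first import."""

    def __init__(self, fetch_source):
        # fetch_source is hypothetical: name -> source text, or None if unknown
        self._fetch_source = fetch_source
        self._pending = {}  # module name -> source fetched by find_spec

    def find_spec(self, fullname, path=None, target=None):
        if path is not None:
            return None  # submodules are not handled in this sketch
        source = self._fetch_source(fullname)
        if source is None:
            return None  # let the remaining finders try
        self._pending[fullname] = source
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        source = self._pending.pop(module.__name__)
        code = compile(source, f"<lazy:{module.__name__}>", "exec")
        exec(code, module.__dict__)


# Usage: the dict stands in for a remote store of rarely used modules.
REMOTE_SOURCES = {"rarely_used": "def answer():\n    return 42\n"}
sys.meta_path.append(LazyFetchFinder(REMOTE_SOURCES.get))

import rarely_used  # the source is "fetched" only at this point

print(rarely_used.answer())
```

Appending the finder to the end of sys.meta_path means anything already bundled is found by the standard finders first; only misses fall through to the lazy fetch.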

Disable unusable core extension module

3.11-dev / main has new code in its configure script to disable extension modules for a platform. I propose disabling, on Emscripten, the modules that don't make sense in a browser environment or are rarely used, e.g. terminal and UI-related modules such as curses, termios, or tkinter, as well as dbm, multiprocessing, and so on.

  • _curses _curses_panel
  • _dbm _gdbm
  • grp
  • _multiprocessing _posixshmem
  • nis
  • ossaudiodev
  • _posixsubprocess
  • resource
  • _scproxy
  • spwd
  • syslog
  • termios
  • _xxsubinterpreters
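To see which of these modules a given build still ships, one could run a short script under it. A sketch: it uses importlib.util.find_spec, which locates a module without importing it; the list simply copies the names above.

```python
import importlib.util

# Extension modules proposed for disabling in this issue.
PROPOSED_DISABLED = [
    "_curses", "_curses_panel", "_dbm", "_gdbm", "grp",
    "_multiprocessing", "_posixshmem", "nis", "ossaudiodev",
    "_posixsubprocess", "resource", "_scproxy", "spwd",
    "syslog", "termios", "_xxsubinterpreters",
]


def available_modules(names):
    """Map each module name to True if the interpreter can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


still_built = [name for name, found in available_modules(PROPOSED_DISABLED).items() if found]
print("still built:", still_built)
```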

build-python-build.sh and setuptools 60.5.0

On Ubuntu 20.04, setuptools 60.5.0 causes dynamically loaded libraries to fail to load (and therefore things like "import math" fail during the build).

The temporary workaround is to invoke the python build as:
SETUPTOOLS_USE_DISTUTILS=stdlib make -j

which I needed to do in build-python-build.sh

I suppose this is just a place / way to log this until the setuptools fix is released...

Demo executable examples in docs

If we can get code examples on docs.python.org to be executable in the browser then that might be quite the boon for this project, motivation for others to get involved, and a way to make sure things don't break. 😉

Investigate compiler and linker flags

  • LTO (link time optimization) may generate faster interpreter
  • -Os instead of -O3 / -O2 generates much smaller code. Let's check if it makes a noticeable performance impact
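To check whether -Os (or LTO) has a noticeable performance impact, one option is to run the same micro-benchmark under each build and compare the numbers. A minimal sketch using timeit; the workload is an arbitrary stand-in, and a suite such as pyperformance would give more representative results.

```python
import timeit


def workload():
    # Arbitrary interpreter-heavy mix: a loop, dict stores, str conversions.
    d = {}
    for i in range(1000):
        d[str(i)] = i * i
    return sum(d.values())


def best_time(func, number=200, repeat=5):
    """Best per-call time in seconds over several timing runs."""
    timings = timeit.repeat(func, number=number, repeat=repeat)
    return min(timings) / number


if __name__ == "__main__":
    print(f"{best_time(workload) * 1e6:.1f} us per call")
```

Taking the minimum of several repeats reduces noise from other activity in the browser or Node process.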

How to use the compiled WASI artifacts?

Hi,
I'm trying to figure out how to use the compiled WASI artifact. I tried the following, but it raised an exception:

$ wasmtime run --env PYTHONPATH=/builddir/wasi/$(cat pybuilddir.txt) --mapdir /::../../ --  python.wasm 
Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Python path configuration:
  PYTHONHOME = (not set)
  PYTHONPATH = '/builddir/wasi/build/lib.wasi-wasm32-3.11'
  program name = 'python.wasm'
  isolated = 0
  environment = 1
  user site = 1
  safe_path = 0
  import site = 1
  is in build tree = 0
  stdlib dir = '/usr/local/lib/python3.11'
  sys._base_executable = ''
  sys.base_prefix = '/usr/local'
  sys.base_exec_prefix = '/usr/local'
  sys.platlibdir = 'lib'
  sys.executable = ''
  sys.prefix = '/usr/local'
  sys.exec_prefix = '/usr/local'
  sys.path = [
    '/builddir/wasi/build/lib.wasi-wasm32-3.11',
    '/usr/local/lib/python311.zip',
    '/usr/local/lib/python3.11',
    '/usr/local/lib/python3.11/lib-dynload',
  ]
Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding
Python runtime state: core initialized
ModuleNotFoundError: No module named 'encodings'

Current thread 0x00000001 (most recent call first):
$ wasmtime run --env PYTHONPATH=/builddir/wasi/$(cat pybuilddir.txt) --mapdir /builddir/wasi/::. --  python.wasm
Exception ignored error evaluating path:
Traceback (most recent call last):
  File "<frozen getpath>", line 353, in <module>
OSError: [Errno 76] Capabilities insufficient
Fatal Python error: error evaluating path
Python runtime state: core initialized

Current thread 0x00000001 (most recent call first):
I also tried putting the "build" files in the same directory.
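In the first attempt, the path configuration dump shows Python looking for its stdlib under /usr/local/lib (python311.zip and python3.11) inside the guest, so encodings can only be found if those paths exist under the host directory mapped as /. A small host-side check, as a sketch; the function name and defaults are illustrative and taken from that dump.

```python
from pathlib import Path


def stdlib_candidates(guest_root, version="3.11", prefix="usr/local"):
    """Report which default stdlib locations exist under the host directory
    that is mapped as the guest's "/"."""
    lib = Path(guest_root) / prefix / "lib"
    nodot = version.replace(".", "")
    candidates = {
        f"python{nodot}.zip": lib / f"python{nodot}.zip",
        f"python{version}/encodings": lib / f"python{version}" / "encodings" / "__init__.py",
    }
    return {name: path.exists() for name, path in candidates.items()}


# Mirrors the first attempt, which mapped the guest "/" to ../../
print(stdlib_candidates("../../"))
```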

Add CI

Once we have fixed #9, we should probably add CI so that we don't break things :)

sysconfig info module is misnamed/not found

It seems that a _sysconfigdata__linux_x86_64-linux-gnu.py is generated under lib.wasm32-unknown-emscripten-3.11, which seems wrong?

Either way it isn't being included when running altinstall, so sysconfig.get_config_vars() doesn't work.
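sysconfig derives the data module's name from the running interpreter. The sketch below mirrors the naming scheme of CPython 3.11's private sysconfig._get_sysconfigdata_name() helper (which also honors a _PYTHON_SYSCONFIGDATA_NAME environment override). Plugging in the build machine's values (empty ABI flags, platform linux, multiarch x86_64-linux-gnu) reproduces the reported file name, which suggests the host's values leaked into the cross build.

```python
import sys


def sysconfigdata_name(abiflags: str, platform: str, multiarch: str) -> str:
    """Mirror the naming scheme sysconfig uses for its generated data module."""
    return f"_sysconfigdata_{abiflags}_{platform}_{multiarch}"


# Build-machine values reproduce the file name reported in this issue.
print(sysconfigdata_name("", "linux", "x86_64-linux-gnu"))
# _sysconfigdata__linux_x86_64-linux-gnu

# The name the running interpreter itself would look for.
print(sysconfigdata_name(
    getattr(sys, "abiflags", ""),
    sys.platform,
    getattr(sys.implementation, "_multiarch", ""),
))
```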

option to disable file_packager stage ?

Maybe the right place to ask is https://bugs.python.org/issue40280, but is there an option to disable the Emscripten file packaging stage (but not filesystem support)?
When using an external compressed filesystem, e.g. one of those BrowserFS provides, the build produces a bigger .data file than the minimum wanted.

I think --preload-file comes from here https://github.com/python/cpython/blob/c27a33132be101e246ae2584f1826477357138d6/Makefile.pre.in#L92 originating from https://github.com/python/cpython/blob/ffa505b580464d9d90c29e69bd4db8c52275280a/configure.ac#L1845

Ideally, the .data file should only contain importlib and the cold-start requirements, so that downloading the big stdlib chunk can be deferred or handled by a third party. See pyodide/pyodide#646 (comment)

Using \N after exiting and restarting results in a function signature mismatch

Series of steps which reveal the bug:

  1. Click Start REPL button
  2. Type print("\N{digit nine}") and hit Enter
  3. See that 9 printed out
  4. Type exit() and hit Enter (to exit the process)
  5. Click Start REPL button again
  6. Type print("\N{digit nine}") and hit Enter
  7. Watch the process hang and an error message print to the console (this is the bug)


Similar series of working steps (notice no \N{...} was used)

  1. Click Start REPL button
  2. Type print("9") and hit Enter
  3. See that 9 printed out
  4. Type exit() and hit Enter (to exit the process)
  5. Click Start REPL button again
  6. Type print("9") and hit Enter
  7. See that 9 printed out


Context

This might seem like a strange bug to care about, but it's causing my Python pastebin site (which now runs Python code in-browser) to fail on some code examples. Specifically the "Run in Browser" button works the first time it's pressed but hangs the second time it's pressed. Here's a page that demonstrates the issue.

Debugging Emscripten builds

It's tricky to debug emscripten builds. By default emcc strips all debug symbols and even function names from WASM files. I got some promising results with debug builds, source map, and Chromium DevTools.

  • Configure with --with-pydebug
  • Extend PY_LDFLAGS_NODIST var with -gsource-map --source-map-base http://localhost:8000/builddir/host/
  • emmake make -j8 python.html
  • Run python3 -m http.server from the cpython base directory (not the directory where python.html lives).
  • open http://localhost:8000/builddir/host/python.html
  • Use Chromium with the C/C++ DevTools Support (DWARF) extension and enable the "WebAssembly Debugging: Enable DWARF support" option.


Suggestion for installing third-party libraries

I'd like to provide a way to download arbitrary ZIP archives from PyPI and install them in the WASM environment.

One way would be to just hit the PyPI API to find and download the ZIP file for a package. The issue there is that running pip to actually install the package doesn't work (since pip is disabled).

It looks like pyodide addressed this problem by creating a micropip package.

Any recommendations for how to approach this without inventing or copying micropip? It could be that this isn't something core Python should support directly and this should be entirely left up to distributions like pyodide. But if there is (or could be) a simple way to install a Python package given a ZIP file, I'd love to know!
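Since a wheel is just a ZIP archive, a pure-Python wheel can in principle be "installed" by extracting it into a directory on sys.path. A minimal sketch, assuming the wheel's bytes were already downloaded (for example via the PyPI JSON API, not shown) and that the package has no compiled extensions; install_pure_wheel is a hypothetical helper, not pip or micropip.

```python
import importlib
import io
import sys
import zipfile
from pathlib import Path


def install_pure_wheel(wheel_bytes: bytes, target_dir: str) -> list:
    """Unpack a pure-Python wheel into target_dir and put it on sys.path.

    Sketch only: no dependency resolution, no scripts or .data handling,
    no compatibility-tag checks.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as wheel:
        names = wheel.namelist()
        wheel.extractall(target)  # extractall strips absolute paths and ".."
    if str(target) not in sys.path:
        sys.path.append(str(target))
    importlib.invalidate_caches()  # make the new files visible to importers
    return names
```

This is roughly the minimal core of what an installer does for pure-Python packages; anything with C extensions would additionally need wheels built for the WASM target.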

Get ctypes working

  • build libffi from https://github.com/hoodmane/libffi-emscripten
  • cp /opt/libffi-emscripten/lib/libffi.a /emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/pic/
  • cp /opt/libffi-emscripten/include/ffi* /emsdk/upstream/emscripten/cache/sysroot/include/
  • Build with ./build-python-emscripten-node.sh --enable-wasm-dynamic-linking
  • Add the following two lines to Modules/Setup.local:
    *shared*
    _ctypes _ctypes/_ctypes.c _ctypes/callbacks.c _ctypes/callproc.c _ctypes/stgdict.c _ctypes/cfield.c -lffi
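Once _ctypes builds, a quick smoke test along these lines could confirm that the module imports and that libffi can both call through a function pointer and run a Python callback. A sketch only: it deliberately avoids loading shared libraries, and whether callbacks work on wasm depends on the closure support of the libffi port.

```python
import ctypes

# C function-pointer type: int (*)(int, int)
BINOP = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, ctypes.c_int)


def smoke_test() -> int:
    """Wrap a Python callable as a C callback (a libffi closure) and call it."""
    add = BINOP(lambda a, b: a + b)
    return add(2, 3)


class Point(ctypes.Structure):
    # Structure layout is computed by _ctypes (cfield.c / stgdict.c).
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int)]


if __name__ == "__main__":
    print(smoke_test())
    print(ctypes.sizeof(Point))
```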

Known Emscripten issue to report upstream

A ticket to collect issues and bugs with Emscripten. We want to report them upstream eventually.

Build container with all dependencies

I have created and pushed a build container image with all build dependencies pre-installed and emcc ports pre-seeded. The container might be useful for others who'd like to play around with Emscripten and Python WASM. The image is pretty big and experimental, though it works for me most of the time.

$ cd cpython
$ podman run --rm -ti -v $(pwd):/python-wasm/cpython:Z quay.io/tiran/cpythonbuild:ubuntu-impish-wasm
# . /activate
# ./build-python-build.sh
...

On Docker-based systems, do

docker run --rm -ti -v $(pwd):/python-wasm/cpython quay.io/tiran/cpythonbuild:ubuntu-impish-wasm

Use stdlib zip bundle to reduce size of python.data

Fast hack:

mkdir -p builddep/wasi/usr/local/lib/
cd Lib
zip -0 -r ../builddep/wasi/usr/local/lib/python311.zip *.py asyncio concurrent email encodings collections html http importlib logging multiprocessing sqlite3 urllib wsgiref xml zoneinfo
emcc -o python.html Programs/python.o libpython3.11d.a -ldl -lm --preload-file usr/

This reduces python.data to about 9 MB. I have to use zip -0 because my build env does not have zlib for wasm in its sysroot. For a production build we want to create and include __pycache__ as well as compress the file.
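The same bundle can be built with Python's zipfile module, which makes the stored-versus-compressed choice explicit. A sketch; build_stdlib_zip is a hypothetical helper, and unlike zip -r it only collects .py files. Stored entries (the zip -0 equivalent) can be read by zipimport without zlib.

```python
import zipfile
from pathlib import Path


def build_stdlib_zip(lib_dir, zip_path, packages, compress=False):
    """Bundle the top-level *.py files plus the given packages from lib_dir.

    compress=False stores entries uncompressed (like zip -0), so the
    interpreter can import from the archive without zlib.
    """
    lib = Path(lib_dir)
    method = zipfile.ZIP_DEFLATED if compress else zipfile.ZIP_STORED
    with zipfile.ZipFile(zip_path, "w", compression=method) as zf:
        # Top-level modules, e.g. Lib/os.py
        for path in sorted(lib.glob("*.py")):
            zf.write(path, path.relative_to(lib).as_posix())
        # Whole package trees, e.g. "encodings", "importlib"
        for pkg in packages:
            for path in sorted((lib / pkg).rglob("*.py")):
                zf.write(path, path.relative_to(lib).as_posix())
```

For a production build with precompiled bytecode, zipfile.PyZipFile.writepy() can add .pyc files instead; the compiling interpreter must then be the same Python version as the target, since the bytecode magic number has to match.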

CI builds does not work: No module named 'zlib'

The CI build artifact does not work:

 Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding
 Python runtime state: core initialized
 Traceback (most recent call last):
   File "<frozen zipimport>", line 570, in _get_decompress_func
 ModuleNotFoundError: No module named 'zlib'
 
 During handling of the above exception, another exception occurred:

 Traceback (most recent call last):
   File "<frozen zipimport>", line 618, in _get_data
   File "<frozen zipimport>", line 573, in _get_decompress_func
 zipimport.ZipImportError: can't decompress data; zlib not available

 During handling of the above exception, another exception occurred:

 Traceback (most recent call last):
   File "<frozen zipimport>", line 196, in get_code
   File "<frozen zipimport>", line 752, in _get_module_code
   File "<frozen zipimport>", line 620, in _get_data
 zipimport.ZipImportError: can't decompress data; zlib not available

There seems to be a problem with the Modules/Setup.local symlink.
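To check whether an archive in the artifact can be imported without zlib, one can list its compressed members on the host. A sketch; the helper name and archive path are placeholders. zipimport reads stored (uncompressed) members directly, while compressed members need zlib at runtime.

```python
import zipfile


def entries_needing_zlib(zip_path):
    """Return archive members that are not stored uncompressed."""
    with zipfile.ZipFile(zip_path) as zf:
        return [info.filename for info in zf.infolist()
                if info.compress_type != zipfile.ZIP_STORED]


# e.g. entries_needing_zlib("usr/local/lib/python311.zip")
```

An empty result means the bundle should import even when the zlib module is missing.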

Remove need for `READELF=true`

When I looked at this a little while back, I believe there was only one use of READELF in the code base, in setup.py for curses. READELF could probably be made optional.

NODERAWFS: fstat() of unlinked file fails

Emscripten's fstat() fails with FileNotFoundError when the underlying file has been unlinked. The behavior breaks a handful of test cases.

import os

f = open("testfile", "w+")
f.write("test")
f.seek(0)
print(os.fstat(f.fileno()))
os.unlink(f.name)
print(f.read())
print(os.fstat(f.fileno()))
$ python test_fstat.py
os.stat_result(st_mode=33206, st_ino=10374816, st_dev=91, st_nlink=1, st_uid=0, st_gid=0, st_size=4, st_atime=1646767083, st_mtime=1646767083, st_ctime=1646767083)
test
os.stat_result(st_mode=33206, st_ino=10374816, st_dev=91, st_nlink=0, st_uid=0, st_gid=0, st_size=4, st_atime=1646767083, st_mtime=1646767083, st_ctime=1646767083)
$ ./run-python-node.sh fst.py 
os.stat_result(st_mode=33206, st_ino=10374816, st_dev=91, st_nlink=1, st_uid=0, st_gid=0, st_size=4, st_atime=1646765917, st_mtime=1646765917, st_ctime=1646765917)
test
Traceback (most recent call last):
  File "/python-wasm/fst.py", line 9, in <module>
    print(os.fstat(f.fileno()))
          ^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 44] No such file or directory
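Until the upstream bug is fixed, affected tests could be skipped on Emscripten. A sketch of such a test (the class and test names are illustrative); it assumes a POSIX host, where an open file can be unlinked and still fstat()-ed, as the native run above shows. Run it with python -m unittest.

```python
import os
import sys
import tempfile
import unittest


class FstatAfterUnlinkTest(unittest.TestCase):
    # Emscripten's NODERAWFS fstat() raises FileNotFoundError once the
    # underlying file has been unlinked, so skip the test there.
    @unittest.skipIf(sys.platform == "emscripten",
                     "fstat() of an unlinked file fails on Emscripten")
    def test_fstat_after_unlink(self):
        fd, path = tempfile.mkstemp()
        try:
            os.write(fd, b"test")
            os.unlink(path)
            st = os.fstat(fd)  # still valid: the descriptor keeps the inode alive
            self.assertEqual(st.st_size, 4)
            self.assertEqual(st.st_nlink, 0)
        finally:
            os.close(fd)
```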

Best practice to detect the Emscripten platform?

(Edited to keep only what seems best, or what matches the C macros.)
'emscripten' should be mentioned here: https://docs.python.org/fr/3.11/library/sys.html#sys.platform


Recommended one-liners for now:

The matching C tags would be __wasi__ and __EMSCRIPTEN__

if __import__('os').uname().machine.startswith("wasm"):
    print("detected webassembly")  # note: matches any wasm target, Emscripten included
    __wasi__ = True

if __import__('sys').platform == "emscripten":
    print("detected emscripten")
    __EMSCRIPTEN__ = True

ref: wasi https://reviews.llvm.org/rGc1eee1d659155b99fac63cbf72491c27106f0481
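A slightly stricter variant can distinguish the two targets through sys.platform alone, which CPython 3.11 sets to "emscripten" or "wasi" on those platforms (mirroring the __EMSCRIPTEN__ and __wasi__ C macros). A sketch; wasm_platform is a hypothetical helper name.

```python
import sys


def wasm_platform():
    """Return "emscripten", "wasi", or None when not running on WebAssembly."""
    if sys.platform in ("emscripten", "wasi"):
        return sys.platform
    return None


print(wasm_platform())
```

Unlike os.uname(), sys.platform is available on every platform, including Windows.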

Modify heap size or allow heap growth

When running

help('modules')

it causes Python to OOM (the heap is limited to 16MB). We should consider using either

  1. -s INITIAL_MEMORY=X for something like 20-30MB or
  2. just allow memory growth via -s ALLOW_MEMORY_GROWTH=1.

We should maybe also compile with -s ABORTING_MALLOC=0 which makes malloc() return NULL, which will give people a more familiar Python traceback rather than the following:

Uncaught RuntimeError: Aborted(Cannot enlarge memory arrays to size 16855040 bytes (OOM). Either (1) compile with  -s INITIAL_MEMORY=X  with X higher than the current value 16777216, (2) compile with  -s ALLOW_MEMORY_GROWTH=1  which allows increasing the size at runtime, or (3) if you want malloc to return NULL (0) instead of this abort, compile with  -s ABORTING_MALLOC=0 )
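WebAssembly memory is allocated in 64 KiB pages, and Emscripten expects INITIAL_MEMORY to be a multiple of that page size (the current 16777216 bytes is exactly 256 pages). A small sketch for turning a target size into a flag value; the helper name is hypothetical.

```python
WASM_PAGE_SIZE = 64 * 1024  # bytes per WebAssembly memory page


def initial_memory_flag(mebibytes: float) -> str:
    """Round a size in MiB up to whole wasm pages and format it as an
    emcc -s INITIAL_MEMORY=... setting."""
    size = int(mebibytes * 1024 * 1024)
    pages = -(-size // WASM_PAGE_SIZE)  # ceiling division
    return f"-s INITIAL_MEMORY={pages * WASM_PAGE_SIZE}"


print(initial_memory_flag(32))
```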

Investigate "asyncifying" the core interpreter loop

Emscripten has an asyncify pass which can allow synchronous C code to yield to the JS event loop and even block on an async JS call.

I expect asyncifying the core interpreter loop would probably be difficult, if not impossible, and I'm not sure if it would cause problems, but it could provide a way to have synchronous Python I/O in the async browser environment.

Fix deployment to github pages

(Mostly for @brettcannon, but it is good to track these things)

The job here: https://github.com/ethanhs/python-wasm/blob/main/.github/workflows/ci.yml#L104 does not run when a PR is merged or on pushes to the repo. Until I discovered this two days ago, the github pages branch hadn't been deploying for a long time (I think several weeks?).

Ideally the job should run when we:

  • merge a PR
  • push to the main branch

but not when:

  • someone opens a PR
  • someone pushes to a PR

python-wasm.org

Hi,

I don't know if there is any interest in python-wasm.org, but it was a domain I was renting until yesterday. I'm not using the name anymore, so I let it expire. I'm just letting the devs of this project know, in case anybody wants to rent it. There's no coordination with me necessary, since I didn't renew it.

Anyway, once one of you sees this, please close this issue. Sorry for any noise.

-- William (wstein @ gmail.com)

Non pure python libs

Hello, this is just a question.
I'm just starting to look at WASM, and I have a bunch of Python scripts I'd particularly like to run in browsers.
But I'd like to know whether there is a way to get python-wasm working with non-pure-Python libraries. I'm particularly thinking of ta-lib (the Python port, which still relies on the C library libta).
As far as I understand, pyodide can't do that, at least for now.

Thanks !

ModuleNotFoundError: No module named 'urllib'

I tried typing help() in https://repl.ethanhs.me/ and received this error:

Python 3.11.0a4+ (heads/main-dirty:c47c9e6, Jan 18 2022, 05:26:55) [Clang 14.0.0 (https://github.com/llvm/llvm-project f142c45f1e494f8dbdcc1bcf1412 on emscripten
Type "help", "copyright", "credits" or "license" for more information.
>>> help()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<frozen _sitebuiltins>", line 102, in __call__
  File "/build/cpython/builddir/emscripten-browser/../../Lib/pydoc.py", line 72, in <module>
ModuleNotFoundError: No module named 'urllib'
>>> 

Results of running the regression test suite on nodejs

After spending a lot of time going through all the tests, here is a summary of my findings towards running the regression test suite on Emscripten/nodejs:

The command line I used and exclude list I used are here: https://gist.github.com/ethanhs/656637de570cc2c239c8728c4aa1e731

The list of tests excluded also includes a brief description of each test's failure(s). A quick summary:

  • almost half of the tests either import subprocess directly (51 tests) or indirectly, e.g. via unittest.mock (97 tests)
  • despite pthreads being compiled in, I seem to be unable to create threads under Node.js (6 tests); I get EAGAIN
  • there are some locale issues which I presume are Emscripten bugs
  • it also requires a couple of small changes to CPython, to patch around an Emscripten bug and also to disable the faulthandler thread because it won't start and causes the test runner to crash after a successful run.
diff --git a/Lib/test/libregrtest/main.py b/Lib/test/libregrtest/main.py
index fc3c2b9692..0879bb6baa 100644
--- a/Lib/test/libregrtest/main.py
+++ b/Lib/test/libregrtest/main.py
@@ -659,7 +659,9 @@ def main(self, tests=None, **kwargs):
        except SystemExit as exc:
            # bpo-38203: Python can hang at exit in Py_Finalize(), especially
            # on threading._shutdown() call: put a timeout
-            faulthandler.dump_traceback_later(EXIT_TIMEOUT, exit=True)
+            # Emscripten has issues starting threads
+            if sys.platform != 'emscripten':
+                faulthandler.dump_traceback_later(EXIT_TIMEOUT, exit=True)

            sys.exit(exc.code)

diff --git a/Modules/socketmodule.c b/Modules/socketmodule.c
index 89e93c5818..dbd7d4dc63 100644
--- a/Modules/socketmodule.c
+++ b/Modules/socketmodule.c
@@ -5114,6 +5114,7 @@ sock_initobj(PyObject *self, PyObject *args, PyObject *kwds)
                                     &family, &type, &proto, &fdobj))
        return -1;

+
#ifdef MS_WINDOWS
    /* In this case, we don't use the family, type and proto args */
    if (fdobj == NULL || fdobj == Py_None)
@@ -7926,7 +7927,7 @@ PyInit__socket(void)
#ifdef  IPPROTO_VRRP
    PyModule_AddIntMacro(m, IPPROTO_VRRP);
#endif
-#ifdef  IPPROTO_SCTP
+#if defined(IPPROTO_SCTP) && !defined(__EMSCRIPTEN__)
    PyModule_AddIntMacro(m, IPPROTO_SCTP);
#endif
#ifdef  IPPROTO_BIP

Huge thanks to @tiran for helping figure out the IPPROTO_SCTP issue which was causing test_socket to crash the test runner!

With this, I think the steps towards adding a build bot for Emscripten/Node.js are:

  • Have python/cpython#30495 merged
  • Figure out why threads aren't starting - EDIT: It seems that WASM_STANDALONE doesn't work with pthread? strange...
  • Patch Python around the IPPROTO_SCTP for now
  • Set up build bot

Want to run things yourself? Here are the steps:

  1. Do a normal checkout of this repo, pull my above mentioned PR
  2. Apply the above patches
  3. Grab the tests.txt file in the above gist and put it in the top level of the checkout
  4. Run the command in the gist to run the test suite!

bzip2 breaks build

emscripten's bzip2 port breaks the build for me. I'm not entirely sure what is going on. It looks like the build archive includes the main function from mk251.c.

wasm-ld: error: duplicate symbol: main
>>> defined in Programs/python.o
>>> defined in /usr/share/emscripten/cache/sysroot/lib/wasm32-emscripten/libbz2.a(mk251.c.o)

I suggest disabling bzip2 for now and investigating later.

error happening when I try to install

Hello,

This is what happens when I try to install:


These are the commands that trigger the problem:

# get the Python sources
./fetch-python.sh
# build Python for the machine we are building on, needed before cross compiling for emscripten
./build-python-build.sh
# build Python cross-compiling to emscripten
./build-python-host-emscripten.sh

Thank you in advance for helping me get past these errors.

Regards.

Azaretdodo.
