I'm using subpar to generate an independent python executable with Bazel. Unfortunate

Par executables can't find data (subpar issue, open, 12 comments)

google commented on May 2, 2024

Par executables can't find data

from subpar.

Comments (12)

dsculptor commented on May 2, 2024

If you have a build rule like:

par_binary(
    name = "alpha",
    data = ["data.txt"],
    ...
)

# Then the following commands are available:
bazel run //path/to/alpha        # Same as py_binary.
bazel run //path/to/alpha.par    # This is subpar in action.

However, it turns out that only one of them can work:

Bazel recommends using its own runfiles library to access data files.
subpar recommends using pkgutil!

Can we have a definitive example of a Python project that uses a simple data file and works for both par_binary and py_binary?

jackhumphries commented on May 2, 2024

Hi,

I'm having a similar issue. I have this project tree:

project/BUILD
project/experiments/scripts/BUILD

In project/BUILD, I have a cc_binary called agent. In project/experiments/scripts/BUILD, I have a py_library with a data dependency (data = ["//:agent"]) and a par_binary that depends on that py_library.

I've been trying for two hours and I can't figure out how to access the agent binary from my Python code. Does anyone know what I should put in place of the question mark below? Also, is there a constraint that a par_binary can only have a data dependency on a target that is in its own directory? I was having issues creating a par_binary from Python files in a subdirectory, so maybe that is part of the issue here.

pkgutil.get_data("?", "agent")

Attempts:

pkgutil.get_data("", "agent") # Returns None
pkgutil.get_data("experiments", "agent") # Returns None
pkgutil.get_data("experiments.scripts", "agent") # Returns None

mattmoor commented on May 2, 2024

Can you use pkg_resources or pkgutil to access the file?

Here's an example where we use pkgutil to access a file within the PAR and extract it onto the filesystem so that things work as the bundled library expects.

mattmoor commented on May 2, 2024

Actually maybe it's pkg_resources that doesn't work well with PAR, so try pkgutil first :)

hwright commented on May 2, 2024

pkgutil works, thanks.

Is this in the documentation somewhere?

mattmoor commented on May 2, 2024

@duggelz is the authority on this repo. If not, I think we should track adding it with this issue.

duggelz commented on May 2, 2024

.par files are not intended (by me, at least) to extract all of their files to disk; that kind of defeats the point.

However, the waters are seriously muddied by the Bazel .zip for Windows which does always extract by default, and the various ways to create .par files inside Google that use magic command lines or environment variables to autoextract.

So, point 1:

Document how to access data files

Yes, I should do this. For reference it's like:

import pkgutil
dat = pkgutil.get_data('my.package.name', 'filename.ext')

This provides a file-like object, which is often good enough. When you really need an actual file, you need an API to materialize that file to disk. The internal Google API is terrible (I can say that because I wrote it) so we don't plan to open-source it. The pkg_resources module should be the preferred API, at least it's better, but there's some issues with proper metadata handling at present, and also there are some logistical issues with pkg_resources being part of setuptools rather than part of the Python standard library, the way pkgutil is.

"Feature Request: .par files should autoextract when you run them".

This is a valid feature request, but I'm biased by the fact that we're actively trying to move away from this inside Google, because the performance and disk usage implications have become quite severe. It's a balance between programmer ease of use, and performance/resource usage, and Google's position on that line is probably quite different than almost everyone else.

hwright commented on May 2, 2024

Slight correction: pkgutil.get_data returns a string (on Python 3 I believe it's actually bytes), not a file-like object.

I personally have less interest in 2, but see how others might.
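A minimal sketch of the two cases discussed above (an in-memory file-like object versus a real file on disk), built only on pkgutil.get_data. The helper names open_resource and materialize_resource are hypothetical and are not part of subpar; the package and file names passed in are whatever your own project uses.

```python
import io
import os
import pkgutil
import tempfile


def open_resource(package, name):
    """Return a binary file-like object over a packaged data file."""
    # get_data returns bytes on Python 3, or None if the package
    # cannot be located or its loader does not support get_data.
    data = pkgutil.get_data(package, name)
    if data is None:
        raise FileNotFoundError("%s not found via package %s" % (name, package))
    return io.BytesIO(data)


def materialize_resource(package, name):
    """Copy a packaged data file to a temporary file and return its path.

    The caller is responsible for deleting the file when done.
    """
    data = pkgutil.get_data(package, name)
    if data is None:
        raise FileNotFoundError("%s not found via package %s" % (name, package))
    # Keep the original extension in case the consumer inspects it.
    suffix = os.path.splitext(name)[1]
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as out:
        out.write(data)
    return path
```

open_resource covers the "often good enough" case; materialize_resource is for libraries that insist on a filesystem path, at the cost of the disk usage mentioned earlier in the thread.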

duggelz commented on May 2, 2024

A resource API is finally coming to the standard library in Python 3.7, and will be backported to 2.7 and 3.4-3.6. Hallelujah!

https://gitlab.com/python-devs/importlib_resources

Also, I'm leaning toward a "just extract everything to disk all the time" strategy for this tool, instead of the much more complicated heuristics used inside Google for their performance benefits. At the same time, we're investigating open-sourcing the real PAR file implementation used inside Google.
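For reference, a rough sketch of what the standard-library resource API looks like in use. It assumes Python 3.9 or later, where importlib.resources provides files() and as_file(); the helper names read_resource_bytes and with_resource_path are hypothetical. Whether this works from inside a .par depends on the par's importer supporting resource access, which this thread does not confirm.

```python
from importlib import resources  # files() and as_file() need Python 3.9+


def read_resource_bytes(package, name):
    """Read a packaged data file fully into memory as bytes."""
    return resources.files(package).joinpath(name).read_bytes()


def with_resource_path(package, name, callback):
    """Call callback(path) with a real filesystem path for the resource.

    as_file() hands back the existing path when the package is already on
    disk, and extracts to a temporary file (removed when the context exits)
    when it is not, for example when the package lives inside a zip.
    """
    resource = resources.files(package).joinpath(name)
    with resources.as_file(resource) as path:
        return callback(path)
```

The callback form keeps the temporary file's lifetime explicit: the path is only guaranteed to exist while the callback runs.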

hwright commented on May 2, 2024

@duggelz If it's coming in Python 3.7, that means we'll only have to wait 3-4 years before it makes it into the distroless base images which rules_docker uses. :)

depthwise commented on May 2, 2024

Could someone suggest how to deal with data deps provided by WORKSPACE? Basically, I'd like to embed a deep learning model and then read it with TFLite from inside. TFLite needs either a file, or byte representation of the model. The model does get embedded into PAR, but it's at the root level (if I unzip it), and therefore, it seems, pkgutil can't get to it.

The layout of the unpacked par is as follows:

tflite_models  __main__.py  subpar  <namespace name>

The models are inside tflite_models.

depthwise commented on May 2, 2024

Answering my own question after digging through the code some more:

pkgutil.get_data("__main__", "tflite_models/detect_float.tflite")

gets the data
