dsculptor commented on May 2, 2024

If you have a build rule like:

load("@subpar//:subpar.bzl", "par_binary")

par_binary(
    name = "alpha",
    data = ["data.txt"],
    # ...
)

Then the following commands are available:

bazel run //path/to:alpha        # Same as py_binary.
bazel run //path/to:alpha.par    # This is subpar in action.

However, it turns out that only one of them works:

  • Bazel recommends using its own runfiles library to access data files.
  • subpar recommends using pkgutil.

Could we have a definitive example of a Python project that uses a simple data file and works both as a .par and as a py_binary?
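
One approach worth trying, sketched here under the assumption that data.txt sits inside the same importable Python package as the code that reads it, is to go through pkgutil.get_data in both modes. It asks the package's loader for the bytes, so the same call can read from an ordinary directory (such as a runfiles tree) or from inside a zip archive via zipimport. The helper name read_resource is hypothetical.

```python
import pkgutil


def read_resource(package, resource):
    """Return the raw bytes of `resource`, located relative to `package`.

    pkgutil.get_data asks the package's loader for the data, so the same
    call can read from an ordinary directory or from inside a zip archive.
    """
    data = pkgutil.get_data(package, resource)
    if data is None:
        # get_data returns None when the loader cannot serve resources;
        # fail loudly instead of passing None on to the caller.
        raise FileNotFoundError(
            "cannot load %r from package %r" % (resource, package))
    return data
```

For the alpha example, a call would look like read_resource("path.to", "data.txt"), where "path.to" is a placeholder for whatever package name the code is imported under. Whether that name resolves the same way under bazel run and inside the .par depends on the workspace layout and is worth checking in both modes.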

jackhumphries commented on May 2, 2024

Hi,

I'm having a similar issue. I have this project tree:

project/BUILD
project/experiments/scripts/BUILD

In project/BUILD, I have a cc_binary called agent. In project/experiments/scripts/BUILD, I have a py_library with a data dependency (data = ["//:agent"]) and a par_binary that depends on that py_library.

I've been trying for two hours and I can't figure out how to access the agent binary from my Python code. Does anyone know what I should put in place of the question mark below? Also, is there a constraint that a par_binary can only have a data dependency on a target that is in its own directory? I was having issues creating a par_binary from Python files in a subdirectory, so maybe that is part of the issue here.

pkgutil.get_data("?", "agent")

Attempts:

  1. pkgutil.get_data("", "agent") # Returns None
  2. pkgutil.get_data("experiments", "agent") # Returns None
  3. pkgutil.get_data("experiments.scripts", "agent") # Returns None
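
Results like these are easier to interpret once you see how pkgutil.get_data locates a resource: in CPython it imports the package, takes the directory containing the package's __file__, and joins the "/"-separated resource name onto it, so the resource has to live under that package's directory. The sketch below reproduces only that path computation so it can be printed while debugging; resource_location is a hypothetical helper, not part of pkgutil.

```python
import importlib
import os


def resource_location(package, resource):
    """Return the path pkgutil.get_data would ask the loader for.

    Mirrors CPython's pkgutil.get_data: import the package, take the
    directory containing its __file__, then join the '/'-separated
    resource name onto it. Inside a zip archive, the result is a path
    that begins with the archive's own path.
    """
    module = importlib.import_module(package)
    base = os.path.dirname(module.__file__)
    return os.path.join(base, *resource.split("/"))
```

Printing resource_location("experiments.scripts", "agent") next to a listing of the unzipped .par should show whether the file actually sits where get_data is going to look.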

mattmoor commented on May 2, 2024

Can you use pkg_resources or pkgutil to access the file?

Here's an example where we use pkgutil to access a file within the PAR and extract it onto the filesystem so that things work as the bundled library expects.
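
The linked example isn't preserved here, but the general pattern can be sketched as follows: read the bytes with pkgutil.get_data, write them to a temporary file, and optionally mark that file executable. The executable bit matters for something like the agent binary above, which has to exist on disk before it can be run. The helper name and its executable parameter are hypothetical, and deleting the temporary file is left to the caller.

```python
import os
import pkgutil
import stat
import tempfile


def extract_resource(package, resource, executable=False):
    """Copy a packaged resource to a temporary file and return its path.

    The caller is responsible for deleting the file when done.
    """
    data = pkgutil.get_data(package, resource)
    if data is None:
        raise FileNotFoundError(
            "cannot load %r from package %r" % (resource, package))
    # mkstemp creates the file securely and returns an open descriptor.
    fd, path = tempfile.mkstemp(suffix="_" + os.path.basename(resource))
    with os.fdopen(fd, "wb") as out:
        out.write(data)
    if executable:
        # Add the owner-execute bit so the file can be launched directly.
        os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path
```

The returned path can then be handed to subprocess.run or to any library that insists on a real filename.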

mattmoor commented on May 2, 2024

Actually, maybe it's pkg_resources that doesn't work well with PAR files, so try pkgutil first :)

hwright commented on May 2, 2024

pkgutil works, thanks.

Is this in the documentation somewhere?

mattmoor commented on May 2, 2024

@duggelz is the authority on this repo. If it isn't documented yet, I think we should track adding it with this issue.

duggelz commented on May 2, 2024

.par files are not intended (by me, at least) to extract all of their files to disk; that kind of defeats the point.

However, the waters are seriously muddied by the Bazel .zip for Windows, which always extracts by default, and by the various ways of creating .par files inside Google that use magic command lines or environment variables to auto-extract.

So, point 1:

  1. Document how to access data files

Yes, I should do this. For reference, it looks like:

import pkgutil
dat = pkgutil.get_data('my.package.name', 'filename.ext')

This provides a file-like object, which is often good enough. When you really need an actual file, you need an API to materialize that file to disk. The internal Google API is terrible (I can say that because I wrote it), so we don't plan to open-source it. The pkg_resources module should be the preferred API (at least it's better), but there are some issues with proper metadata handling at present, and there are also logistical issues with pkg_resources being part of setuptools rather than part of the Python standard library, the way pkgutil is.

  2. "Feature Request: .par files should autoextract when you run them".

This is a valid feature request, but I'm biased by the fact that we're actively trying to move away from this inside Google, because the performance and disk-usage implications have become quite severe. It's a balance between programmer ease of use and performance/resource usage, and Google's position on that line is probably quite different from almost everyone else's.

hwright commented on May 2, 2024

Slight correction: pkgutil.get_data returns a string (on Python 3 it's actually bytes), not a file-like object.
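
Because the return value is bytes on Python 3 rather than a file object, code that expects a file-like object can wrap it in io.BytesIO, and text resources need an explicit decode. A small sketch, assuming UTF-8 text by default; both helper names are illustrative only.

```python
import io
import pkgutil


def open_resource(package, resource):
    """Return a binary file-like object over a packaged resource."""
    data = pkgutil.get_data(package, resource)
    if data is None:
        raise FileNotFoundError(
            "cannot load %r from package %r" % (resource, package))
    return io.BytesIO(data)


def read_text_resource(package, resource, encoding="utf-8"):
    """Return a packaged resource decoded as text (UTF-8 by default)."""
    return open_resource(package, resource).read().decode(encoding)
```

The BytesIO wrapper works with any API that calls read() or seek(), but not with one that needs an actual path on disk.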

I personally have less interest in point 2, but I can see how others might.

duggelz commented on May 2, 2024

A resource API is finally coming to the standard library in Python 3.7, and will be backported to 2.7 and 3.4-3.6. Hallelujah!

https://gitlab.com/python-devs/importlib_resources
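
In the standard library, this API is importlib.resources, and the importlib_resources project linked above provides it for older interpreters. A minimal sketch, assuming Python 3.9 or newer for files() and as_file(): files() reads a resource directly, and as_file() supplies a real filesystem path for the duration of a with block, extracting to a temporary file if the package isn't on the filesystem (for example, inside an archive). Both function names are hypothetical.

```python
import os
from importlib import resources


def read_bytes(package, resource):
    """Return the bytes of a resource inside `package` (Python 3.9+)."""
    return resources.files(package).joinpath(resource).read_bytes()


def with_real_path(package, resource, func):
    """Call func(path) with a filesystem path to the resource.

    as_file() may hand back a temporary copy, which is removed when the
    with block exits, so `path` must not be used after func returns.
    """
    ref = resources.files(package).joinpath(resource)
    with resources.as_file(ref) as path:
        return func(os.fspath(path))
```

For example, with_real_path("some.package", "model.bin", os.path.getsize) would work for any callable that needs a path rather than bytes (the package and file names here are placeholders).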

Also, I'm leaning toward a "just extract everything to disk all the time" strategy for this tool, instead of the much more complicated heuristics used inside Google for their performance benefits. At the same time, we're investigating open-sourcing the real PAR file implementation used inside Google.
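
For comparison, the simplest form of an "extract everything" strategy can be sketched with the standard zipfile module: unpack the whole archive into a temporary directory once, then put that directory on sys.path so ordinary filesystem paths work. This assumes the .par is a zip archive that zipfile can open. It only illustrates the idea and is not how subpar or Google's internal PAR implementation works; the function name is hypothetical and the directory is never cleaned up.

```python
import sys
import tempfile
import zipfile


def extract_archive(archive_path):
    """Unpack every file in a zip archive into a fresh temporary directory.

    The directory is prepended to sys.path so that later imports resolve
    against the extracted copy, and its path is returned so data files can
    be opened with ordinary file APIs. Nothing is deleted afterwards.
    """
    target = tempfile.mkdtemp(prefix="par_extract_")
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target)
    sys.path.insert(0, target)
    return target
```

The trade-off duggelz describes is visible here: every run pays for a full extraction and the disk space it uses, in exchange for code that can treat data files as plain files.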

hwright commented on May 2, 2024

@duggelz If it's coming in Python 3.7, that means we'll only have to wait 3-4 years before it makes it into the distroless base images which rules_docker uses. :)

depthwise commented on May 2, 2024

Could someone suggest how to deal with data deps provided by the WORKSPACE? Basically, I'd like to embed a deep learning model and then read it with TFLite from inside the PAR. TFLite needs either a file or a byte representation of the model. The model does get embedded into the PAR, but at the root level (when I unzip it), so it seems pkgutil can't get to it.

The layout of the unpacked par is as follows:

tflite_models  __main__.py  subpar  <namespace name>

The models are inside tflite_models.

depthwise commented on May 2, 2024

Answering my own question after digging through the code some more:

pkgutil.get_data("__main__", "tflite_models/detect_float.tflite")

retrieves the data.
