coffee_boat's Introduction

pyspark-deps

WIP PySpark dependency management.

Tests

Running tests will modify your current virtualenv’s packages. This software is WIP.

Using

Not in production, please! If you’re curious to see how it can be used, check out https://github.com/nteract/coffee_boat/blob/master/Coffee%2BBoat%2BSample.ipynb

This has been tested (although not thoroughly) with standalone mode and YARN. It explicitly does not currently support local mode, although we’d love your contribs.

For now, stay tuned or join us in hacking!

Alternatives

Build your own conda env, or use one of the commercial integrated solutions that have a nice UI for this.

coffee_boat's People

Contributors

bryancutler, holdenk, rgbkrk, sukrubezen, willingc

coffee_boat's Issues

Improve CI

Now that #26 is finished we should look at how we can improve our CI testing.

idk maybe YARN on CI?

Investigate distribute venv cleanup

In theory, most of what we ship goes through Spark's add-files mechanism, which Spark should clean up for us, but I'm less certain about the decompressed directory. We should investigate this.

Turn README into an RST

The current README is just a Markdown file. For publishing to PyPI we should generate an RST (Spark uses a package to do this).

explore conda environments

As was done in snakestagram, explore dependency management with conda.

du -hs snakestagram*
448M	snakestagram
100M	snakestagram-latest-darwin-x64.tar.gz

Uncompressed 448MB conda environment that is fully copyable, 100MB compressed.

Adding on tensorflow to this conda environment makes this 692MB uncompressed, and 145MB compressed.

In the words of @holdenk, "165MB to all the workers is totally fine and not a problem, I totally said this".
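The size comparison above can be reproduced for any environment directory with a few lines of Python. This is only a sketch: `dir_size_bytes` and `pack_env` are hypothetical helper names, not part of coffee_boat, and it assumes the environment directory is fully copyable as described in this issue. Note that it sums file sizes, so the numbers will differ slightly from `du`, which reports disk usage.

```python
import os
import tarfile


def dir_size_bytes(root):
    """Sum the sizes of all regular files under root (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def pack_env(env_dir, archive_path):
    """Write env_dir to a gzip-compressed tarball and return the archive size in bytes."""
    # Store entries under the directory's own name rather than its full path.
    arcname = os.path.basename(os.path.normpath(env_dir))
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(env_dir, arcname=arcname)
    return os.path.getsize(archive_path)
```

Calling `pack_env("snakestagram", "snakestagram.tar.gz")` and comparing the result with `dir_size_bytes("snakestagram")` gives a rough idea of how much data each worker would need to fetch.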

Handle local cleanup better

Right now we create a bunch of temp files but don't really clean them up. There is a flag that does part of this, but it needs to be tested, and we should check what else we might be missing (and whether we need to change how we download Miniconda).

UI prototyping

In the Alternatives section of the README, I noticed it said:

Build you own conda env, or use one of the commercial integrated solutions which has a nice UI for this.

Which makes me wonder how we could start this off, what people would expect to be able to do, as well as what our message protocol from frontend to kernel would be.

Support Spark on K8 better

Right now we do some terrible things with overriding the PYTHON_PATH, which is great and works in the general case. If the Spark+K8 folks end up integrating better first-party support, we should integrate with that for K8 deployments.

Enable basic local cleanup

The first version accidentally shipped without it. Note: there is a separate follow-up to do this "better" later.

Add PySpark dep back in

For now we leave it out since many providers won't support it right now. Long story, buy me a ☕.
