python-bonobo / bonobo
Extract Transform Load for Python 3.5+
Home Page: https://www.bonobo-project.org/
License: Apache License 2.0
Collect comments and questions from hackernews, reddit, issues and mails and compile them in a FAQ section.
Maybe an option that checks for requirements.txt and, if present, installs (or updates) dependencies prior to the run.
bonobo run --install blah/foo.py
Exception ignored in: <bonobo.ext.console.IOBuffer object at 0x104e1b978>
AttributeError: 'IOBuffer' object has no attribute 'flush'
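The message suggests that something calls flush() on an object standing in for a standard stream. As a minimal sketch, assuming a buffer-like stream replacement (the class below is hypothetical and is not bonobo's actual IOBuffer), a no-op flush() is enough to satisfy callers that expect a file-like object:

```python
import io
from contextlib import redirect_stdout


class BufferWithFlush:
    """Hypothetical stand-in for a stream replacement; not bonobo's IOBuffer."""

    def __init__(self):
        self._buffer = io.StringIO()

    def write(self, text):
        # Collect written text and return the character count, like a real stream.
        return self._buffer.write(text)

    def flush(self):
        # Nothing to push anywhere: the data already lives in memory.
        pass

    def getvalue(self):
        return self._buffer.getvalue()


buffer = BufferWithFlush()
with redirect_stdout(buffer):
    print("hello", flush=True)  # flush=True makes print() call buffer.flush()
captured = buffer.getvalue()
```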
Hi 👊
This is my first visit to this fine repo, but it seems you have been working hard to keep all dependencies updated so far.
Once you have closed this issue, I'll create separate pull requests for every update as soon as I find one.
That's it for now!
Happy merging! 🤖
$ bonobo run somedir
usage: bonobo run [-h] [--quiet] file
bonobo run: error: argument file: can't open 'somedir': [Errno 21] Is a directory: 'somedir'
Also, bonobo run -m .... should work.
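One possible direction, sketched with the standard library's runpy module: run_path accepts a directory that contains a __main__.py, and run_module covers the -m case. The run_target helper is a hypothetical name, and this is not a description of how bonobo run is implemented.

```python
import os
import runpy
import tempfile


def run_target(target, as_module=False):
    """Run a file, a directory containing __main__.py, or (as_module=True) a module name."""
    if as_module:
        return runpy.run_module(target, run_name="__main__")
    return runpy.run_path(target, run_name="__main__")


# Demonstration with a throwaway directory that contains a __main__.py.
with tempfile.TemporaryDirectory() as somedir:
    with open(os.path.join(somedir, "__main__.py"), "w", encoding="utf-8") as f:
        f.write("result = 6 * 7\n")
    namespace = run_target(somedir)
```

Both runpy functions return the globals of the executed code, so the caller can inspect what the script defined.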
For now, tutorial needs manual download like this:
curl https://raw.githubusercontent.com/python-bonobo/bonobo/master/bonobo/examples/datasets/coffeeshops.txt > `python -c 'import bonobo; print(bonobo.get_examples_path("datasets/coffeeshops.txt"))'`
This is not ideal. Although it's not desirable to bundle the datasets within the eggs/wheels (which would add unnecessary files to the packaging), there should be an easy way to download the datasets. GitHub does not allow git archive, so we should probably add a command that downloads the missing files.
bonobo examples download
??? Just suggesting, not thought about it.
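A rough sketch of what such a command could do internally, assuming the raw GitHub URL layout from the curl command above. The download_example function, its parameters, and the injectable fetch callable are all hypothetical names chosen for illustration:

```python
import os
import urllib.request

# Base URL taken from the curl command above.
BASE_URL = "https://raw.githubusercontent.com/python-bonobo/bonobo/master/bonobo/examples/"


def download_example(relative_path, examples_root, fetch=None):
    """Download one example file unless it already exists locally; return its path."""
    target = os.path.join(examples_root, *relative_path.split("/"))
    if os.path.exists(target):
        return target  # already present, nothing to do
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url) as response:
                return response.read()
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "wb") as f:
        f.write(fetch(BASE_URL + relative_path))
    return target
```

Passing fetch explicitly keeps the function testable without network access. In real use, examples_root would presumably come from bonobo.get_examples_path, as in the curl command.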
When trying to run one of the basic examples from the documentation, I hit the following stack trace:
Traceback (most recent call last):
File "test.py", line 1, in <module>
import bonobo
File "C:\Users\mcopeland\Envs\bonobo\lib\site-packages\bonobo\__init__.py" line 17, in <module>
from .core import __all__ as __all_core__
File "C:\Users\mcopeland\Envs\bonobo\lib\site-packages\bonobo\core\__init_.py", line 3, in <module>
from .bags import Bag, ErrorBag
File "C:\Users\mcopeland\Envs\bonobo\lib\site-packages\bonobo\core\bags.py" line 3, in <module>
from bonobo.util.tokens import Token
File "C:\Users\mcopeland\Envs\bonobo\lib\site-packages\bonobo\util\__init__.py", line 6, in <module>
import blessings
File "C:\Users\mcopeland\Envs\bonobo\lib\site-packages\blessings\__init__.py" line 7, in <module>
from fcntl import ioctl
ImportError: No module named 'fcntl'
It seems that the blessings library relies on fcntl, which provides file and I/O control on Unix and is not available on Windows. This leaves Windows users out of luck.
The following links have some tidbits of information:
http://stackoverflow.com/questions/1422368/fcntl-substitute-on-windows
cs01/gdbgui#18
This issue seems to plague many other projects (google "fcntl python windows" for a more thorough list of resources) and I haven't found any other examples of how this issue has been handled in a satisfactory manner.
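One common pattern, shown here only as a sketch, is to guard the Unix-only import and degrade to plain output when it is missing. The function name below is hypothetical, and this does not fix blessings itself, which performs the import internally:

```python
try:
    import fcntl  # available on Unix only
except ImportError:
    fcntl = None  # e.g. on Windows


def describe_output_mode():
    """Return 'fancy' when fcntl-based terminal control is usable, else 'plain'."""
    # Falling back here avoids failing at import time on platforms without fcntl.
    return "fancy" if fcntl is not None else "plain"


mode = describe_output_mode()
```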
There should be a way to define service dependencies for function-based transformations.
Maybe we can find a way to use the Service class as a decorator:
@Service('database.connection')
def query(db):
# ...
pass
Or something similar.
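To make the idea concrete, here is a minimal sketch of such a decorator. Everything in it is hypothetical: the Service class is written from scratch, and the services keyword argument stands in for whatever container the execution context would provide.

```python
import functools


class Service:
    """Hypothetical decorator: records service names and injects them at call time."""

    def __init__(self, *names):
        self.names = names

    def __call__(self, func):
        @functools.wraps(func)
        def wrapper(*args, services):
            # Resolve each declared name from the container and pass it first.
            resolved = [services[name] for name in self.names]
            return func(*resolved, *args)

        # Expose the declared dependencies so a runner could inspect them.
        wrapper.__services__ = self.names
        return wrapper


@Service('database.connection')
def query(db, row):
    return db, row


result = query('some row', services={'database.connection': 'fake-connection'})
```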
Let's move the hack-ish and outdated roadmap from the readme to https://www.bonobo-project.org/roadmap
Different references (contributing, for example) should link there.
A detail view of each roadmap "chapter" would be great too.
Cyclomatic complexity is too high in class Configurable. (11)
https://github.com/hartym/bonobo/blob/develop/bonobo/commands/run.py
(See the line containing raise RuntimeError('Cannot --install on a file (only available for dirs containing requirements.txt).').)
I'm not a professional Windows user, quite the contrary, so this may be a beginner's mistake.
It may be related to the way I installed git: I asked for "clone using windows encoding, commit using unix". I will try again with full unix settings.
PS C:\bb36\bonobo> pip install -e .
Obtaining file:///C:/bb36/bonobo
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\bb36\bonobo\setup.py", line 48, in <module>
long_description=read('README.rst'),
File "C:\bb36\bonobo\setup.py", line 15, in read
content = f.read().strip()
File "c:\users\ieuser\appdata\local\programs\python\python36\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 2: character maps to <undefined>
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in C:\bb36\bonobo\
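The traceback shows the file being decoded with cp1252, which indicates that open() used the Windows default encoding. A likely fix, assuming README.rst is stored as UTF-8 (an assumption, not confirmed above), is to pass the encoding explicitly in the read helper. The sketch below is not the project's actual setup.py:

```python
import os
import tempfile


def read(filename, encoding="utf-8"):
    """Read a text file with an explicit encoding instead of the platform default."""
    with open(filename, encoding=encoding) as f:
        return f.read().strip()


# Demonstration: a file with a non-ASCII character reads the same on any locale.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "README.rst")
    with open(path, "w", encoding="utf-8") as f:
        f.write("Bonobo \U0001F435\n")
    content = read(path)
```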
Look at other packages readme for inspiration
Why does rst readme show up bad on pypi? See how requests does it, there may be some transformations to make before release, or in setup.py.
https://docs.python.org/3/library/inspect.html
Pretty much all the tools needed to inspect python objects, and that could replace a bit of code.
CsvWriter takes dictionaries as its input. In some cases, it would make sense to have a zip() output or even a value tuple. Let's think about options to support all those cases.
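One option, sketched with the standard csv module rather than bonobo's CsvWriter, is to normalize each incoming row against a known header list. The write_rows function is a hypothetical name:

```python
import csv
import io


def write_rows(rows, headers, stream):
    """Write rows given either as dicts or as value sequences in header order."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(headers)
    for row in rows:
        if isinstance(row, dict):
            # Pick values by header name; missing keys become empty cells.
            values = [row.get(h, "") for h in headers]
        else:
            values = list(row)  # tuple or list, assumed to follow header order
        writer.writerow(values)


stream = io.StringIO()
write_rows([{"id": 1, "name": "a"}, (2, "b")], ["id", "name"], stream)
csv_text = stream.getvalue()
```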
Enhance bonobo version to show where the package is installed.
Two options (need choice):
If a transformation wants to define an argument parser, it will conflict because sys.argv is not "prepared" for children execution.
Example use case: OAuth with the Google APIs (see https://developers.google.com/sheets/api/quickstart/python), where "flags" are defined.
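One way to "prepare" the arguments, sketched with argparse alone, is for the runner to consume only its own options via parse_known_args and hand the rest to the child parser. The parser layouts and the child flag below are hypothetical stand-ins, not bonobo's actual command line:

```python
import argparse


def split_argv(argv):
    """Separate runner options from arguments meant for a transformation's parser."""
    runner = argparse.ArgumentParser(prog="runner", add_help=False)
    runner.add_argument("--quiet", action="store_true")
    runner.add_argument("file")
    # parse_known_args returns unrecognized arguments instead of exiting with an error.
    options, remaining = runner.parse_known_args(argv)
    return options, remaining


def parse_child_flags(remaining):
    # Hypothetical child parser, standing in for e.g. an OAuth helper's flags.
    child = argparse.ArgumentParser(prog="child")
    child.add_argument("--noauth_local_webserver", action="store_true")
    return child.parse_args(remaining)


options, remaining = split_argv(["--quiet", "job.py", "--noauth_local_webserver"])
flags = parse_child_flags(remaining)
```

The child parser is given an explicit list rather than reading sys.argv, which is what avoids the conflict.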
I am getting the following error with pip install bonobo.
Building wheels for collected packages: psutil
Running setup.py bdist_wheel for psutil ... error
Complete output from command /Users/akash/installations/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/t2/tg4jlfqn715g4p9_5yh1kzsc0000gn/T/pip-build-yw2o18bv/psutil/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /var/folders/t2/tg4jlfqn715g4p9_5yh1kzsc0000gn/T/tmpjdqqg2odpip-wheel- --python-tag cp36:
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-10.7-x86_64-3.6
creating build/lib.macosx-10.7-x86_64-3.6/psutil
copying psutil/__init__.py -> build/lib.macosx-10.7-x86_64-3.6/psutil
copying psutil/_common.py -> build/lib.macosx-10.7-x86_64-3.6/psutil
copying psutil/_compat.py -> build/lib.macosx-10.7-x86_64-3.6/psutil
copying psutil/_psbsd.py -> build/lib.macosx-10.7-x86_64-3.6/psutil
copying psutil/_pslinux.py -> build/lib.macosx-10.7-x86_64-3.6/psutil
copying psutil/_psosx.py -> build/lib.macosx-10.7-x86_64-3.6/psutil
copying psutil/_psposix.py -> build/lib.macosx-10.7-x86_64-3.6/psutil
copying psutil/_pssunos.py -> build/lib.macosx-10.7-x86_64-3.6/psutil
copying psutil/_pswindows.py -> build/lib.macosx-10.7-x86_64-3.6/psutil
creating build/lib.macosx-10.7-x86_64-3.6/psutil/tests
copying psutil/tests/__init__.py -> build/lib.macosx-10.7-x86_64-3.6/psutil/tests
copying psutil/tests/runner.py -> build/lib.macosx-10.7-x86_64-3.6/psutil/tests
copying psutil/tests/test_bsd.py -> build/lib.macosx-10.7-x86_64-3.6/psutil/tests
copying psutil/tests/test_linux.py -> build/lib.macosx-10.7-x86_64-3.6/psutil/tests
copying psutil/tests/test_memory_leaks.py -> build/lib.macosx-10.7-x86_64-3.6/psutil/tests
copying psutil/tests/test_misc.py -> build/lib.macosx-10.7-x86_64-3.6/psutil/tests
copying psutil/tests/test_osx.py -> build/lib.macosx-10.7-x86_64-3.6/psutil/tests
copying psutil/tests/test_posix.py -> build/lib.macosx-10.7-x86_64-3.6/psutil/tests
copying psutil/tests/test_process.py -> build/lib.macosx-10.7-x86_64-3.6/psutil/tests
copying psutil/tests/test_sunos.py -> build/lib.macosx-10.7-x86_64-3.6/psutil/tests
copying psutil/tests/test_system.py -> build/lib.macosx-10.7-x86_64-3.6/psutil/tests
copying psutil/tests/test_windows.py -> build/lib.macosx-10.7-x86_64-3.6/psutil/tests
running build_ext
building 'psutil._psutil_osx' extension
creating build/temp.macosx-10.7-x86_64-3.6
creating build/temp.macosx-10.7-x86_64-3.6/psutil
creating build/temp.macosx-10.7-x86_64-3.6/psutil/arch
creating build/temp.macosx-10.7-x86_64-3.6/psutil/arch/osx
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/akash/installations/anaconda3/include -arch x86_64 -I/Users/akash/installations/anaconda3/include -arch x86_64 -DPSUTIL_POSIX=1 -DPSUTIL_VERSION=501 -DPSUTIL_OSX=1 -I/Users/akash/installations/anaconda3/include/python3.6m -c psutil/_psutil_osx.c -o build/temp.macosx-10.7-x86_64-3.6/psutil/_psutil_osx.o
In file included from /usr/include/Availability.h:190:0,
from /usr/include/stdio.h:65,
from /Users/akash/installations/anaconda3/include/python3.6m/Python.h:25,
from psutil/_psutil_osx.c:9:
/System/Library/Frameworks/CoreFoundation.framework/Headers/CFDateFormatter.h:53:34: error: 'introduced' undeclared here (not in a function)
kCFISO8601DateFormatWithYear API_AVAILABLE(macosx(10.12), ios(10.0), watchos(3.0), tvos(10.0)) = (1UL << 0),
^
/System/Library/Frameworks/CoreFoundation.framework/Headers/CFURL.h:777:39: error: 'deprecated' undeclared here (not in a function)
const CFStringRef kCFURLLabelColorKey API_DEPRECATED("Use NSURLLabelColorKey", macosx(10.6, 10.12), ios(4.0, 10.0), watchos(2.0, 3.0), tvos(9.0, 10.0));
^
/System/Library/Frameworks/CoreFoundation.framework/Headers/CFURL.h:777:39: error: 'message' undeclared here (not in a function)
const CFStringRef kCFURLLabelColorKey API_DEPRECATED("Use NSURLLabelColorKey", macosx(10.6, 10.12), ios(4.0, 10.0), watchos(2.0, 3.0), tvos(9.0, 10.0));
^
error: command 'gcc' failed with exit status 1
----------------------------------------
Failed building wheel for psutil
Running setup.py clean for psutil
Failed to build psutil
Installing collected packages: psutil, blessings, requests, stevedore, bonobo
Found existing installation: psutil 5.1.3
DEPRECATION: Uninstalling a distutils installed project (psutil) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.
Uninstalling psutil-5.1.3:
Successfully uninstalled psutil-5.1.3
Running setup.py install for psutil ... error
Complete output from command /Users/akash/installations/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/t2/tg4jlfqn715g4p9_5yh1kzsc0000gn/T/pip-build-yw2o18bv/psutil/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /var/folders/t2/tg4jlfqn715g4p9_5yh1kzsc0000gn/T/pip-m1qlazv0-record/install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build/lib.macosx-10.7-x86_64-3.6
creating build/lib.macosx-10.7-x86_64-3.6/psutil
copying psutil/__init__.py -> build/lib.macosx-10.7-x86_64-3.6/psutil
copying psutil/_common.py -> build/lib.macosx-10.7-x86_64-3.6/psutil
copying psutil/_compat.py -> build/lib.macosx-10.7-x86_64-3.6/psutil
copying psutil/_psbsd.py -> build/lib.macosx-10.7-x86_64-3.6/psutil
copying psutil/_pslinux.py -> build/lib.macosx-10.7-x86_64-3.6/psutil
copying psutil/_psosx.py -> build/lib.macosx-10.7-x86_64-3.6/psutil
copying psutil/_psposix.py -> build/lib.macosx-10.7-x86_64-3.6/psutil
copying psutil/_pssunos.py -> build/lib.macosx-10.7-x86_64-3.6/psutil
copying psutil/_pswindows.py -> build/lib.macosx-10.7-x86_64-3.6/psutil
creating build/lib.macosx-10.7-x86_64-3.6/psutil/tests
copying psutil/tests/__init__.py -> build/lib.macosx-10.7-x86_64-3.6/psutil/tests
copying psutil/tests/runner.py -> build/lib.macosx-10.7-x86_64-3.6/psutil/tests
copying psutil/tests/test_bsd.py -> build/lib.macosx-10.7-x86_64-3.6/psutil/tests
copying psutil/tests/test_linux.py -> build/lib.macosx-10.7-x86_64-3.6/psutil/tests
copying psutil/tests/test_memory_leaks.py -> build/lib.macosx-10.7-x86_64-3.6/psutil/tests
copying psutil/tests/test_misc.py -> build/lib.macosx-10.7-x86_64-3.6/psutil/tests
copying psutil/tests/test_osx.py -> build/lib.macosx-10.7-x86_64-3.6/psutil/tests
copying psutil/tests/test_posix.py -> build/lib.macosx-10.7-x86_64-3.6/psutil/tests
copying psutil/tests/test_process.py -> build/lib.macosx-10.7-x86_64-3.6/psutil/tests
copying psutil/tests/test_sunos.py -> build/lib.macosx-10.7-x86_64-3.6/psutil/tests
copying psutil/tests/test_system.py -> build/lib.macosx-10.7-x86_64-3.6/psutil/tests
copying psutil/tests/test_windows.py -> build/lib.macosx-10.7-x86_64-3.6/psutil/tests
running build_ext
building 'psutil._psutil_osx' extension
creating build/temp.macosx-10.7-x86_64-3.6
creating build/temp.macosx-10.7-x86_64-3.6/psutil
creating build/temp.macosx-10.7-x86_64-3.6/psutil/arch
creating build/temp.macosx-10.7-x86_64-3.6/psutil/arch/osx
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/akash/installations/anaconda3/include -arch x86_64 -I/Users/akash/installations/anaconda3/include -arch x86_64 -DPSUTIL_POSIX=1 -DPSUTIL_VERSION=501 -DPSUTIL_OSX=1 -I/Users/akash/installations/anaconda3/include/python3.6m -c psutil/_psutil_osx.c -o build/temp.macosx-10.7-x86_64-3.6/psutil/_psutil_osx.o
In file included from /usr/include/Availability.h:190:0,
from /usr/include/stdio.h:65,
from /Users/akash/installations/anaconda3/include/python3.6m/Python.h:25,
from psutil/_psutil_osx.c:9:
/System/Library/Frameworks/CoreFoundation.framework/Headers/CFDateFormatter.h:53:34: error: 'introduced' undeclared here (not in a function)
kCFISO8601DateFormatWithYear API_AVAILABLE(macosx(10.12), ios(10.0), watchos(3.0), tvos(10.0)) = (1UL << 0),
^
/System/Library/Frameworks/CoreFoundation.framework/Headers/CFURL.h:777:39: error: 'deprecated' undeclared here (not in a function)
const CFStringRef kCFURLLabelColorKey API_DEPRECATED("Use NSURLLabelColorKey", macosx(10.6, 10.12), ios(4.0, 10.0), watchos(2.0, 3.0), tvos(9.0, 10.0));
^
/System/Library/Frameworks/CoreFoundation.framework/Headers/CFURL.h:777:39: error: 'message' undeclared here (not in a function)
const CFStringRef kCFURLLabelColorKey API_DEPRECATED("Use NSURLLabelColorKey", macosx(10.6, 10.12), ios(4.0, 10.0), watchos(2.0, 3.0), tvos(9.0, 10.0));
^
error: command 'gcc' failed with exit status 1
----------------------------------------
Rolling back uninstall of psutil
Command "/Users/akash/installations/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/t2/tg4jlfqn715g4p9_5yh1kzsc0000gn/T/pip-build-yw2o18bv/psutil/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /var/folders/t2/tg4jlfqn715g4p9_5yh1kzsc0000gn/T/pip-m1qlazv0-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/t2/tg4jlfqn715g4p9_5yh1kzsc0000gn/T/pip-build-yw2o18bv/psutil/
I have gcc 6.3 installed.
gcc (MacPorts gcc6 6.3.0_2) 6.3.0
Could contain :
It looks like most people care much more about knowledge of the animal kingdom than about what the actual tool does.
I should probably at least have a light background on that.
The standard terminal on an Ubuntu workstation has a dark background. Bonobo uses a dark green/grey for its statistics, so a lighter grey text color would be more readable. Maybe it is possible to find a color that is a good trade-off on all platforms. A lighter green/grey, maybe?
I looked for, but didn't see, a way to pass parameters to bonobo at runtime. Specifically, I need to use these at scale, and to do that I will need to pass additional arguments.
Example:
bonobo run csvsanitizer 2773 inventory.txt
Is there any way to do that? The closest thing I saw was possibly passing arguments as services.
Please note that though I used a command line example I would primarily be using the python api.
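From the Python side, one workaround that needs no framework support is to bind the parameters when building the steps, for example with functools.partial. This is only a sketch: sanitize and build_steps are hypothetical names, and the list of callables stands in for a graph.

```python
import functools


def sanitize(row, *, customer_id, source):
    """Hypothetical transformation that needs two runtime parameters."""
    return {"customer": customer_id, "source": source, "value": row.strip()}


def build_steps(customer_id, source):
    # Bind runtime parameters up front; each step then takes only the row.
    return [functools.partial(sanitize, customer_id=customer_id, source=source)]


steps = build_steps(2773, "inventory.txt")
output = [steps[0](row) for row in [" a ", "b\n"]]
```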
For now, two KeyboardInterrupts are needed to stop a graph execution.
Some fatal errors leave the execution in an infinite loop, for example when an extractor raises a FileNotFoundError.
Whatever happens, there should be an interruption of pipeline if the full graph cannot be configured, for example if there is a file reader/writer that cannot open file.
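The "fail before starting" behavior can be sketched independently of bonobo: acquire every resource first, and only begin processing once all of them opened. The run_pipeline function is a hypothetical name, and contextlib.ExitStack closes whatever was already opened if a later open fails:

```python
import contextlib
import os
import tempfile


def run_pipeline(paths, process):
    """Open every input first; only start processing if all of them opened."""
    with contextlib.ExitStack() as stack:
        # Any FileNotFoundError is raised here, before a single row is processed.
        files = [stack.enter_context(open(p, encoding="utf-8")) for p in paths]
        return [process(line) for f in files for line in f]


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.txt")
    with open(good, "w", encoding="utf-8") as f:
        f.write("a\nb\n")
    processed = []
    try:
        run_pipeline([good, os.path.join(tmp, "missing.txt")], processed.append)
        failed_fast = False
    except FileNotFoundError:
        failed_fast = True
```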
$ bonobo run bonobo/examples/datasets/fablabs.py
Traceback (most recent call last):
File "/Users/rd/.pyenv/versions/anaconda3-4.3.1/bin/bonobo", line 11, in <module>
load_entry_point('bonobo', 'console_scripts', 'bonobo')()
File "/Users/rd/Projects/Bonobo/bonobo/bonobo/commands/__init__.py", line 28, in entrypoint
commands[args.pop('command')](**args)
File "/Users/rd/Projects/Bonobo/bonobo/bonobo/commands/run.py", line 21, in execute
exec(code, context)
File "bonobo/examples/datasets/fablabs.py", line 60, in <module>
OpenDataSoftAPI(dataset=API_DATASET, netloc=API_NETLOC, timezone='Europe/Paris'),
File "/Users/rd/Projects/Bonobo/bonobo/bonobo/config.py", line 66, in __init__
len(extraneous), 's' if len(extraneous) > 1 else '', ', '.join(map(repr, sorted(extraneous)))
TypeError: OpenDataSoftAPI() got 1 unexpected option: 'timezone'.
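The traceback indicates that the example passes an option the class does not declare. The miniature below reproduces that kind of check. It is a hypothetical reconstruction built around the one expression visible in the traceback, not bonobo's config.py, and the declared option list is assumed:

```python
class Configurable:
    """Hypothetical miniature of an options-checking base class."""

    options = ()

    def __init__(self, **kwargs):
        # Anything not declared in `options` is rejected up front.
        extraneous = set(kwargs) - set(self.options)
        if extraneous:
            raise TypeError(
                '{}() got {} unexpected option{}: {}.'.format(
                    type(self).__name__,
                    len(extraneous),
                    's' if len(extraneous) > 1 else '',
                    ', '.join(map(repr, sorted(extraneous))),
                )
            )
        for name, value in kwargs.items():
            setattr(self, name, value)


class OpenDataSoftAPI(Configurable):
    options = ('dataset', 'netloc')  # assumed list; 'timezone' deliberately absent


try:
    OpenDataSoftAPI(dataset='fablabs', netloc='example.org', timezone='Europe/Paris')
    message = None
except TypeError as exc:
    message = str(exc)
```

Under this reading, either the example should stop passing timezone or the class should declare it as an option.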
Similar code found in 2 other locations (mass = 32)
For now, jupyter or interactive console environments are detected, and respective output plugins are automatically added.
This is great, but maybe you do not want that (running in tests, running in another software, etc.).
This is now implemented using environment variables.
Write a how-to for releases, based on http://rdc.li/r but with specific infos for bonobo.
Related to this, we need a process to move to minor version +1
Post-release QA
Can't print objects, can't print without a title (or it looks crappy), etc.
This will be rewritten from scratch. The idea is to print a bag instead of the first argument, which is a bit unrealistic. A one-argument Bag is a special case that should be taken care of, but it is only a special case.
Other tasks:
Although using pandas function names was the first idea, I'm not so much inclined to use lowercase names anymore because we have to implement those as classes anyway. So there are two options, either we drop pandas names completely (which sounds not so unacceptable), or make them available through the bonobo.compat.pandas module, which will just alias things.
Extract from file ...
Load to file ...
Still need to stabilize / freeze this API.
Is there any value in building the doc on travis/appveyor, knowing that readthedocs will do it anyway?
Context processors are too complicated.
There should be a simpler implementation that looks as much as possible like Options and Services.
Proposal / draft implementation: http://docs.bonobo-project.org/en/0.2/guide/services.html
Release checklist
The one-pager and the documentation have a lot of defects.
A few things were not correctly understood by people discovering bonobo:
To fix this, maybe
tbc.
We need to run the test suite on windows, at least to know potential problems like imports not working, etc.
AppVeyor looks like an option.
There is a problem with the release process. It is not very bad yet, but it may become so in the future.
We have bonobo[X] (here X is one of docker, sqlalchemy, selenium, etc) depend on bonobo-X.
But bonobo-X depends on bonobo, too.
So we have to find a way to freeze the dependency, because obviously it's hard to be in the future at release time.
Maybe we should force usage of bonobo[X] (which was the favored way while discussing with people) and make sure the freezing is correct this way: freeze the version of the dependency on bonobo-X in bonobo, which means releasing bonobo-X before bonobo while doing maintenance, and unfreeze bonobo in bonobo-X, which should only rely on a "stable" bonobo. Installing extensions directly won't be encouraged anyway.