snakebite's Introduction

Note: this project is inactive and has been archived.

Snakebite is a Python library that provides a pure Python HDFS client and a wrapper around Hadoop's minicluster. The client uses protobuf for communicating with the NameNode and comes in the form of a library and a command line interface. Currently, the snakebite client supports most actions that involve the NameNode, as well as reading data from DataNodes.

Note: all methods that read data from a DataNode are able to check the CRC during transfer, but this is disabled by default for performance reasons. This is the opposite of the stock Hadoop client's behaviour.
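As a rough illustration of what per-transfer CRC verification involves (this is a sketch, not snakebite's actual implementation, and the function names are hypothetical), a client can incrementally checksum each received chunk and compare the result against the checksum the sender reported:

```python
import zlib

def verify_crc(chunks, expected_crc):
    """Incrementally checksum transferred chunks and compare the result
    against the checksum reported by the sender (illustrative sketch)."""
    crc = 0
    for chunk in chunks:
        # zlib.crc32 accepts a running CRC, so chunks need not be joined.
        crc = zlib.crc32(chunk, crc)
    return (crc & 0xFFFFFFFF) == (expected_crc & 0xFFFFFFFF)

chunks = [b"block data 1", b"block data 2"]
expected = zlib.crc32(b"block data 1block data 2")
assert verify_crc(chunks, expected)
assert not verify_crc([b"corrupted"], expected)
```

The extra pass over every byte is why a client might leave such a check off by default.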

Snakebite requires Python 2 (Python 3 is not supported yet) and python-protobuf 2.4.1 or higher.

Snakebite 1.3.x has been tested mainly against Cloudera CDH4.1.3 (Hadoop 2.0.0) in production. Tests pass on Hortonworks HDP 2.0.3.22-alpha (protocol versions 7 and 8).

Snakebite 2.x has been tested on Hortonworks HDP 2.0 and CDH5 Beta, and ONLY supports Hadoop 2.2.0 and up (protocol version 9)!

Installing

Snakebite releases are available through PyPI at https://pypi.python.org/pypi/snakebite/

To install snakebite run:

pip install snakebite

To install snakebite 2.x with Kerberos/SASL support, make sure you can install python-krbV (https://fedorahosted.org/python-krbV/) and then run:

pip install "snakebite[kerberos]"

Since the older snakebite 1.3.x supports Hadoop 1.0 (instead of Hadoop 2), you might want to install that version by running:

pip install -I snakebite==1.3.x

Note that the 1.3 branch is unmaintained and doesn't include any of the fixes in the 2.x branch.

Documentation

More information and documentation can be found at https://snakebite.readthedocs.io/en/latest/

Development

Make sure to read about development here and about testing over here, hack away, and come back with a pull request <3

Travis CI status: Travis
Join the chat at https://gitter.im/spotify/snakebite

Copyright 2013-2016 Spotify AB

snakebite's People

Contributors

aarya123, aeroevan, anyman, bolkedebruin, davefnbuck, dterror-zz, gbadiali, gitter-badger, gpoulin, guhehehe, hammer, jkukul, julian, kawaa, ogrisel, perploug, phoet, pragmattica, ravwojdyla, ro-ket, tarrasch, zline


snakebite's Issues

Kerberos support

Hi,

I'd like to implement Kerberos support for snakebite. I've been looking through the documentation, but since I don't know the snakebite codebase that well, could you point me in the right direction: where should this additional auth be implemented?

Or perhaps there is already some effort at implementing this feature?

Snakebite ls .. produces unexpected output

hdfs dfs -ls ..
(does ls of my user directory's parent as expected)

snakebite ls ..
Request error: org.apache.hadoop.fs.InvalidPathException
Invalid path name Invalid file name: /user/jbx/..

It seems to be failing with the getFileInfo call:

> snakebite -D ls /user/jbx/..

...

methodName: "getFileInfo"
declaringClassProtocolName: "org.apache.hadoop.hdfs.protocol.ClientProtocol"
clientProtocolVersion: 1

DEBUG:snakebite.channel:Request:

src: "/user/jbx/.."

DEBUG:snakebite.channel:RPC message length: 106 (00 00 00 6a (len: 4))
DEBUG:snakebite.channel:Sending: 00 00 00 6a (len: 4)
DEBUG:snakebite.channel:Sending: 1a (len: 1)
DEBUG:snakebite.channel:Sending: 08 02 10 00 18 00 22 10 65 35 65 35 61 66 65 35 2d 64 38 62 37 2d 34 61 28 01 (len: 26)
DEBUG:snakebite.channel:Sending: 3f (len: 1)
DEBUG:snakebite.channel:Sending: 0a 0b 67 65 74 46 69 6c 65 49 6e 66 6f 12 2e 6f 72 67 2e 61 70 61 63 68 65 2e 68 61 64 6f 6f 70 2e 68 64 66 73 2e 70 72 6f 74 6f 63 6f 6c 2e 43 6c 69 65 6e 74 50 72 6f 74 6f 63 6f 6c 18 01 (len: 63)
DEBUG:snakebite.channel:Sending: 0e (len: 1)
DEBUG:snakebite.channel:Sending: 0a 0c 2f 75 73 65 72 2f 6a 62 78 2f 2e 2e (len: 14)
DEBUG:snakebite.channel:############## RECVING ##############
DEBUG:snakebite.channel:############## PARSING ##############
DEBUG:snakebite.channel:Payload class: <class 'snakebite.protobuf.ClientNamenodeProtocol_pb2.GetFileInfoResponseProto'>
DEBUG:snakebite.channel:Bytes read: 4, total: 4
DEBUG:snakebite.channel:Total response length: 1184
DEBUG:snakebite.channel:Bytes read: 4, total: 8
DEBUG:snakebite.channel:Delimited message length (pos 2): 1182
DEBUG:snakebite.channel:Rewinding pos 7 with 2 places
DEBUG:snakebite.channel:Reset buffer to pos 5
DEBUG:snakebite.channel:Bytes read: 1180, total: 1188
DEBUG:snakebite.channel:Delimited message bytes (1182): 08 00 10 01 18 09 22 29 6f 72 67 2e 61 70 61 63 68 65 2e 68 61 64 6f 6f 70 2e 66 73 2e 49 6e 76 61 6c 69 64 50 61 74 68 45 78 63 65 70 74 69 6f 6e 2a d4 08 49 6e 76 61 6c 69 64 20 70 61 74 68 20 6e 61 6d 65 20 49 6e 76 61 6c 69 64 20 66 69 6c 65 20 6e 61 6d 65 3a 20 2f 75 73 65 72 2f 6a 62 78 2f 2e 2e 0a 09 61 74 20 6f 72 67 2e 61 70 61 63 68 65 2e 68 61 64 6f 6f 70 2e 68 64 66 73 2e 73 65 72 76 65 72 2e 6e 61 6d 65 6e 6f 64 65 2e 46 53 4e 61 6d 65 73 79 73 74 65 6d 2e 67 65 74 46 69 6c 65 49 6e 66 6f 28 46 53 4e 61 6d 65 73 79 73 74 65 6d 2e 6a 61 76 61 3a 33 32 38 31 29 0a 09 61 74 20 6f 72 67 2e 61 70 61 63 68 65 2e 68 61 64 6f 6f 70 2e 68 64 66 73 2e 73 65 72 76 65 72 2e 6e 61 6d 65 6e 6f 64 65 2e 4e 61 6d 65 4e 6f 64 65 52 70 63 53 65 72 76 65 72 2e 67 65 74 46 69 6c 65 49 6e 66 6f 28 4e 61 6d 65 4e 6f 64 65 52 70 63 53 65 72 76 65 72 2e 6a 61 76 61 3a 37 34 39 29 0a 09 61 74 20 6f 72 67 2e 61 70 61 63 68 65 2e 68 61 64 6f 6f 70 2e 68 64 66 73 2e 70 72 6f 74 6f 63 6f 6c 50 42 2e 43 6c 69 65 6e 74 4e 61 6d 65 6e 6f 64 65 50 72 6f 74 6f 63 6f 6c 53 65 72 76 65 72 53 69 64 65 54 72 61 6e 73 6c 61 74 6f 72 50 42 2e 67 65 74 46 69 6c 65 49 6e 66 6f 28 43 6c 69 65 6e 74 4e 61 6d 65 6e 6f 64 65 50 72 6f 74 6f 63 6f 6c 53 65 72 76 65 72 53 69 64 65 54 72 61 6e 73 6c 61 74 6f 72 50 42 2e 6a 61 76 61 3a 36 39 32 29 0a 09 61 74 20 6f 72 67 2e 61 70 61 63 68 65 2e 68 61 64 6f 6f 70 2e 68 64 66 73 2e 70 72 6f 74 6f 63 6f 6c 2e 70 72 6f 74 6f 2e 43 6c 69 65 6e 74 4e 61 6d 65 6e 6f 64 65 50 72 6f 74 6f 63 6f 6c 50 72 6f 74 6f 73 24 43 6c 69 65 6e 74 4e 61 6d 65 6e 6f 64 65 50 72 6f 74 6f 63 6f 6c 24 32 2e 63 61 6c 6c 42 6c 6f 63 6b 69 6e 67 4d 65 74 68 6f 64 28 43 6c 69 65 6e 74 4e 61 6d 65 6e 6f 64 65 50 72 6f 74 6f 63 6f 6c 50 72 6f 74 6f 73 2e 6a 61 76 61 3a 35 39 36 32 38 29 0a 09 61 74 20 6f 72 67 2e 61 70 61 63 68 65 2e 68 61 64 6f 6f 70 2e 69 70 63 2e 50 72 6f 74 6f 62 75 66 52 70 63 45 6e 67 69 6e 65 24 53 65 72 
76 65 72 24 50 72 6f 74 6f 42 75 66 52 70 63 49 6e 76 6f 6b 65 72 2e 63 61 6c 6c 28 50 72 6f 74 6f 62 75 66 52 70 63 45 6e 67 69 6e 65 2e 6a 61 76 61 3a 35 38 35 29 0a 09 61 74 20 6f 72 67 2e 61 70 61 63 68 65 2e 68 61 64 6f 6f 70 2e 69 70 63 2e 52 50 43 24 53 65 72 76 65 72 2e 63 61 6c 6c 28 52 50 43 2e 6a 61 76 61 3a 39 32 38 29 0a 09 61 74 20 6f 72 67 2e 61 70 61 63 68 65 2e 68 61 64 6f 6f 70 2e 69 70 63 2e 53 65 72 76 65 72 24 48 61 6e 64 6c 65 72 24 31 2e 72 75 6e 28 53 65 72 76 65 72 2e 6a 61 76 61 3a 32 30 35 33 29 0a 09 61 74 20 6f 72 67 2e 61 70 61 63 68 65 2e 68 61 64 6f 6f 70 2e 69 70 63 2e 53 65 72 76 65 72 24 48 61 6e 64 6c 65 72 24 31 2e 72 75 6e 28 53 65 72 76 65 72 2e 6a 61 76 61 3a 32 30 34 39 29 0a 09 61 74 20 6a 61 76 61 2e 73 65 63 75 72 69 74 79 2e 41 63 63 65 73 73 43 6f 6e 74 72 6f 6c 6c 65 72 2e 64 6f 50 72 69 76 69 6c 65 67 65 64 28 4e 61 74 69 76 65 20 4d 65 74 68 6f 64 29 0a 09 61 74 20 6a 61 76 61 78 2e 73 65 63 75 72 69 74 79 2e 61 75 74 68 2e 53 75 62 6a 65 63 74 2e 64 6f 41 73 28 53 75 62 6a 65 63 74 2e 6a 61 76 61 3a 34 31 35 29 0a 09 61 74 20 6f 72 67 2e 61 70 61 63 68 65 2e 68 61 64 6f 6f 70 2e 73 65 63 75 72 69 74 79 2e 55 73 65 72 47 72 6f 75 70 49 6e 66 6f 72 6d 61 74 69 6f 6e 2e 64 6f 41 73 28 55 73 65 72 47 72 6f 75 70 49 6e 66 6f 72 6d 61 74 69 6f 6e 2e 6a 61 76 61 3a 31 34 39 31 29 0a 09 61 74 20 6f 72 67 2e 61 70 61 63 68 65 2e 68 61 64 6f 6f 70 2e 69 70 63 2e 53 65 72 76 65 72 24 48 61 6e 64 6c 65 72 2e 72 75 6e 28 53 65 72 76 65 72 2e 6a 61 76 61 3a 32 30 34 37 29 0a 30 01 3a 10 65 35 65 35 61 66 65 35 2d 64 38 62 37 2d 34 61 40 01 (len: 1182)
DEBUG:snakebite.channel:Header read 1184
DEBUG:snakebite.channel:RpcResponseHeaderProto:

callId: 0
status: ERROR
serverIpcVersionNum: 9
exceptionClassName: "org.apache.hadoop.fs.InvalidPathException"

Support umask

I would like to be able to set the umask for snakebite.

E.g. the hadoop equivalent is hadoop fs -D fs.permissions.umask-mode=002 -mkdir foobar

Ideally I could set it in my .snakebiterc, and having it available on the command line would be nice too.

What would be super neat is snakebite umask 002, which would set an environment variable that sets the umask for just that shell.

But just reading it from my config would be the best place to start.
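For reference, a umask works by clearing its bits from the default creation mode, so umask 002 turns 0777 into 0775. A minimal sketch of that arithmetic (illustrative only, not snakebite code):

```python
def apply_umask(default_mode, umask):
    """Clear the umask's permission bits from the default mode,
    the way HDFS (and POSIX) derive effective permissions."""
    return default_mode & ~umask

# umask 002 removes write for "other"; 022 also removes write for "group".
assert apply_umask(0o777, 0o002) == 0o775
assert apply_umask(0o777, 0o022) == 0o755
```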

Feature request: timeout

It would be nice to have a timeout flag, mainly for the auto-complete feature when the namenode is slow to respond. Tab completion isn't very useful if it takes more than, say, a second to get the completion, but currently the user has to cancel the completion with ctrl-C, which aborts the current command altogether.
What do you think?
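A client-side socket timeout is one way such a flag could work; a hypothetical sketch (the function name is illustrative, not snakebite API):

```python
import socket

def open_namenode_channel(host, port, timeout_seconds=1.0):
    """Create the RPC socket with a timeout so a slow namenode makes
    tab completion fail fast instead of hanging until ctrl-C."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # connect()/recv() on this socket now raise socket.timeout after
    # timeout_seconds instead of blocking indefinitely.
    sock.settimeout(timeout_seconds)
    return sock

sock = open_namenode_channel("namenodehost", 8020)
assert sock.gettimeout() == 1.0
sock.close()
```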

Add force flag to rm

The vanilla hdfs client has a force flag for the rm operation - it would be nice to add it to snakebite as well.

Not everything needs to be a generator

Making the commands return generators is really useful in many cases, but some commands make a lot more sense just returning one item. Good examples are atomic commands like mkdir, as well as a command like serverdefaults().

IMHO, in cases where only one output is expected, the command should return that thing instead of a generator of one item.
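A sketch of how such commands could unwrap their single result (the helper and the fake_mkdir stand-in are hypothetical, not snakebite API):

```python
def single(results):
    """Unwrap a one-item generator into its single result."""
    iterator = iter(results)
    value = next(iterator)
    # Guard against silently dropping extra results.
    try:
        next(iterator)
    except StopIteration:
        return value
    raise ValueError("expected exactly one result")

def fake_mkdir():
    # Stand-in for an atomic command that yields exactly one result dict.
    yield {"path": "/foo", "result": True}

assert single(fake_mkdir()) == {"path": "/foo", "result": True}
```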

broken pipe

I installed it using 'pip install snakebite' onto the Cloudera Demo VM (http://www.cloudera.com/content/support/en/downloads/download-components/download-products.html)

And I get a 'broken pipe' error. Any suggestions on how to fix it?

Reference:
'snakebite -D ls /'
DEBUG:snakebite.client:Trying to find path /
DEBUG:snakebite.channel:############## CONNECTING ##############
DEBUG:snakebite.channel:Sending: 68 72 70 63 (len: 4)
DEBUG:snakebite.channel:Sending: 09 (len: 1)
DEBUG:snakebite.channel:Sending: 00 (len: 1)
DEBUG:snakebite.channel:Sending: 00 (len: 1)
DEBUG:snakebite.channel:RpcRequestHeaderProto (len: 26):

rpcKind: RPC_PROTOCOL_BUFFER
rpcOp: RPC_FINAL_PACKET
callId: -3
clientId: "aad60a7f-0a44-4d"
retryCount: -1

DEBUG:snakebite.channel:RequestContext (len: 60):

userInfo {
realUser: "cloudera"
}
protocol: "org.apache.hadoop.hdfs.protocol.ClientProtocol"

DEBUG:snakebite.channel:Header length: 88 (00 00 00 58 (len: 4))
DEBUG:snakebite.channel:Sending: 00 00 00 58 (len: 4)
DEBUG:snakebite.channel:Sending: 1a (len: 1)
DEBUG:snakebite.channel:Closing socket
Traceback (most recent call last):
File "/usr/bin/snakebite", line 39, in
SnakebiteCli()
File "/usr/bin/snakebite", line 30, in init
clparser.execute()
File "/usr/lib/python2.6/site-packages/snakebite/commandlineparser.py", line 340, in execute
exitError(e)
File "/usr/lib/python2.6/site-packages/snakebite/commandlineparser.py", line 46, in exitError
raise error
socket.error: [Errno 32] Broken pipe

Snakebite on EMR?

This might be a basic question, but has anyone gotten snakebite working on AWS EMR?

[Feature request] Short Circuits Reads

This is a feature request to add the possibility to leverage HDFS Short Circuit Reads from snakebite:

Since HDFS-347 was fixed in Hadoop 2.1.0, it is possible for HDFS clients collocated with a DataNode to directly access pre-opened file descriptors for the blocks of HDFS files hosted by that DataNode, making it possible for the client to read raw data without any overhead or memory copies.

More details in this blog post:

http://blog.cloudera.com/blog/2013/08/how-improved-short-circuit-local-reads-bring-better-performance-and-security-to-hadoop/

This feature is even more interesting when combined with the new Centralized Cache Management of Hadoop 2.3.0, which makes it possible to warm up and lock the blocks of specific HDFS files in the underlying operating system's disk cache.

The two features combined make it possible to do full in-memory access of HDFS data directly from Python (optionally via Python's mmap module or numpy.memmap).

So this feature request is only about adding SCR support to snakebite.

Exposing cached block info from the NameNode via RPC, along with caching directive control in some new snakebite functions, would also be interesting, to fully leverage collocation of the HDFS client program with cached blocks. However, this can probably be implemented independently from the SCR feature.
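The zero-copy access short-circuit reads enable can be sketched with Python's mmap module: once a DataNode hands over a file descriptor for a local block file, the client can map it and slice bytes out directly. This is only an illustration against a temp file, not actual SCR protocol code:

```python
import mmap
import tempfile

def read_block_via_mmap(payload):
    """Stand-in for a short-circuit read: mmap a local 'block file'
    and read its bytes without an intermediate buffer copy."""
    with tempfile.NamedTemporaryFile() as block_file:
        block_file.write(payload)
        block_file.flush()
        with mmap.mmap(block_file.fileno(), 0, access=mmap.ACCESS_READ) as block:
            return bytes(block[:])

assert read_block_via_mmap(b"raw block bytes") == b"raw block bytes"
```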

Have client.py read hadoop config

Right now, Client is initialized with hostname, port and possibly a version number. Maybe it would be nice to have an "autoconfig" client that reads the hadoop config (if HADOOP_HOME is set), just like the command line interface does.
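Such an "autoconfig" client would essentially parse fs.defaultFS out of core-site.xml under HADOOP_HOME. A minimal sketch of that lookup, assuming a standard Hadoop configuration file layout (the function name is hypothetical):

```python
import xml.etree.ElementTree as ET

def read_default_fs(conf_xml):
    """Pull fs.defaultFS out of a core-site.xml document string."""
    root = ET.fromstring(conf_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == "fs.defaultFS":
            return prop.findtext("value")
    return None

core_site = """
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenodehost:8020</value>
  </property>
</configuration>
"""
assert read_default_fs(core_site) == "hdfs://namenodehost:8020"
```

A real implementation would first locate the file, e.g. under $HADOOP_HOME/etc/hadoop/ or $HADOOP_CONF_DIR, and split the host and port out of the URI.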

Support hdfs:// uris

It could be nice if snakebite ls supported parsing HDFS URIs like hdfs://namenodehost:54310/path/to/dir etc., like hadoop fs -ls does.
Right now it prepends the path with my user dir instead, i.e. it looks for /user/freider/hdfs://namenodehost: ... etc.
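The URI handling asked for here amounts to splitting the namenode endpoint off the path before resolving relative paths. A sketch using Python 3's urllib.parse for illustration (snakebite itself targets Python 2, where the module is called urlparse):

```python
from urllib.parse import urlparse

def split_hdfs_uri(path):
    """Split an hdfs:// URI into (host, port, path); plain paths keep
    the client's configured namenode (returned here as None, None)."""
    parsed = urlparse(path)
    if parsed.scheme == "hdfs":
        return parsed.hostname, parsed.port, parsed.path
    return None, None, path

assert split_hdfs_uri("hdfs://namenodehost:54310/path/to/dir") == \
    ("namenodehost", 54310, "/path/to/dir")
assert split_hdfs_uri("/user/freider") == (None, None, "/user/freider")
```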

Update Tox to Tox 1.8

Currently our tox.ini has a lot of duplication, including basepython and some environment variables. Unfortunately there's no better solution for that in 1.7.x.
One of the features in Tox 1.8 is "multi-dimensional configuration support" - sounds awesome indeed!
Documentation -> http://tox.readthedocs.org/en/latest/changelog.html#dev1
Currently there's a developer preview of version 1.8, and as far as I understand, multi-dimensional configuration should reduce code and increase the readability of tox.ini. So as soon as it's available, let's use it!

mkdir doesn't create parent directories

snakebite mkdir /foo/bar/ will not create bar if /foo doesn't exist
"reason: Parent directory doesn't exist:"

It doesn't work with --recurse either.
hadoop fs -mkdir (without flags) does it by default, so I think it would be nice if snakebite also supported it (with or without a flag)
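Creating missing parents comes down to walking the path from the root and creating each ancestor in order. A sketch of that enumeration (illustrative only; a real client would mkdir each entry, skipping ones that already exist):

```python
def ancestors(path):
    """List every ancestor of an absolute path, root-first,
    ending with the path itself."""
    parts = [p for p in path.split("/") if p]
    return ["/" + "/".join(parts[: i + 1]) for i in range(len(parts))]

assert ancestors("/foo/bar/baz") == ["/foo", "/foo/bar", "/foo/bar/baz"]
```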

Improve speed of CRC verification

Cleaning up documentation - this was in the TODO.
I want to remove the TODO section from the documentation and add a link to the open issues on GitHub.
There's no context to this issue - could you provide some, @wouterdebie? Thanks!

Use Fig to make testing totally isolated

With Tox we isolated Python environments and enhanced the testing process, which is cool, but tests still run on developers' machines, which have to provide a Java and Hadoop distribution. With Fig we should be able to abstract away the operating system using Docker and provide testing images, so that testing is totally isolated and the same everywhere (in a "perfect" world).

Add tox for local/automated testing

Makes local/automated testing simpler and also reduces duplication between local virtualenvs and travis setups.

Goals:

  • make local testing isolated by default
  • reduce duplication of logic - everything in tox setup
  • automate local testing (env variable etc)

Fix github pages generation script

There's scripts/create_gh_pages script - but it's:

  • not generic (git aliases ci, github remote name)
  • no error checking - use set -o errexit and set -o pipefail
  • super dangerous - one has to run the script from within the scripts directory, but the script does not enforce this; if you run it from the snakebite root directory, it will delete your whole snakebite directory - including .git. Yes, I did that ;(

Space in filename breaks snakebite ls autocomplete

If I have a file "foo bar" in my home dir and do snakebite ls , the homedir will not be auto-filled and I will be presented with all files in my home dir, as well as:
/user/freider/foo
and
hopp (just hopp, not /user/freider/hopp)

Snakebite count has unexpected behavior

The file size is listed after replication in snakebite count, but before replication using hdfs count.

snakebite du -s and hdfs du -s both list it before replication.

snakebite rmdir doesn't work

File "/usr/lib/python2.6/dist-packages/snakebite/client.py", line 410, in _handle_rmdir
if len(files) > 0:
TypeError: object of type 'generator' has no len()
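The fix is to stop calling len() on the generator and instead peek at (or materialize) it. A sketch of the peeking approach, with names that are illustrative rather than snakebite's actual internals:

```python
def has_entries(files):
    """True if the iterable yields at least one item, without needing
    len() - which generators don't support, as the traceback shows."""
    for _ in files:
        return True
    return False

def listing():
    # Stand-in for the generator _handle_rmdir receives.
    yield {"path": "/dir/child"}

assert has_entries(listing())
assert not has_entries(iter([]))
```

Note that peeking consumes the first item, so a real fix would either re-fetch the listing or use list(files) when the directory contents are needed afterwards.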

Client.text() throws inconsistent exception

Receiving an inconsistent exception, Failure to read block XXX...XXX, when using client.text([paths]).

When paths is a list of part files and I'm trying to generate the text, I receive an inconsistent exception where the block it's failing to read from changes, and sometimes generating the text just works.

Code snippet I'm trying to run:

part_files = []
for f in sb_conn.ls([path]):
    # FIXME: this is ugly, there must be a better way to do this
    if f['path'].split('/')[-1].startswith('part'):
        part_files.append(f['path'])

# THIS IS WHERE EXCEPTION GETS THROWN
for part in sb_conn.text(part_files):
    gz_tempfile.write(part)

My best guess is this has to do with an unstable DataXceiverChannel, but I could use some guidance.

Fix constant failures of travis ci

Travis is constantly failing on the test_cat_file_on_3_blocks test. Investigate why, and find a solution so that we can depend on travis again.

touchz doesn't work

$ snakebite touchz /user/jco/heyyo
Traceback (most recent call last):
File "/usr/bin/snakebite", line 345, in
cliclient.execute()
File "/usr/bin/snakebite", line 176, in execute
return Commands.methods[self.cmd]['method']
File "/usr/bin/snakebite", line 288, in touchz
for line in format_results(result, json_output=self.opts.json):
File "/usr/lib/python2.6/dist-packages/snakebite/formatter.py", line 134, in format_results
for r in results:
File "/usr/lib/python2.6/dist-packages/snakebite/client.py", line 438, in touchz
replication = defaults['replication']
TypeError: 'generator' object is unsubscriptable

Can't ls() a directory with a Unicode name containing files with Unicode filenames

If you have a directory called "/testdir", and it contains a file called "testfile", then this:

list(client.ls([u'/testdir']))

Gives this stacktrace:

File "<stdin>", line 1, in <module>
  File "/home/wsong/memsql-loader/venv/local/lib/python2.7/site-packages/snakebite/client.py", line 139, in ls
    recurse=recurse):
  File "/home/wsong/memsql-loader/venv/local/lib/python2.7/site-packages/snakebite/client.py", line 1094, in _find_items
    full_path = self._get_full_path(path, node)
  File "/home/wsong/memsql-loader/venv/local/lib/python2.7/site-packages/snakebite/client.py", line 941, in _get_full_path
    return os.path.join(path, node.path)
  File "/home/wsong/memsql-loader/venv/lib/python2.7/posixpath.py", line 80, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 1: ordinal not in range(128)

If you instead pass in a Python string to ls(), you get this:

list(client.ls(['/testdir']))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/wsong/memsql-loader/venv/local/lib/python2.7/site-packages/snakebite/client.py", line 139, in ls
    recurse=recurse):
  File "/home/wsong/memsql-loader/venv/local/lib/python2.7/site-packages/snakebite/client.py", line 1078, in _find_items
    fileinfo = self._get_file_info(path)
  File "/home/wsong/memsql-loader/venv/local/lib/python2.7/site-packages/snakebite/client.py", line 1206, in _get_file_info
    request.src = path
  File "/home/wsong/memsql-loader/venv/local/lib/python2.7/site-packages/google/protobuf/internal/python_message.py", line 471, in field_setter
    self._fields[field] = type_checker.CheckValue(new_value)
  File "/home/wsong/memsql-loader/venv/local/lib/python2.7/site-packages/google/protobuf/internal/type_checkers.py", line 166, in CheckValue
    (proposed_value))
ValueError: '/\xef\xbd\x94\xef\xbd\x85\xef\xbd\x93\xef\xbd\x94\xef\xbd\x84\xef\xbd\x89\xef\xbd\x92' has type bytes, but isn't in 7-bit ASCII encoding. Non-ASCII strings must be converted to unicode objects before being added.

The right fix is probably to do some type checking before you pass things into os.path.join().
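A sketch of that type check: normalize every component to unicode text before handing it to os.path.join. The helper names are illustrative; snakebite targets Python 2, where str is bytes, while on Python 3 (used here) str is already unicode:

```python
import os

def to_text(value, encoding="utf-8"):
    """Decode bytes to unicode text; pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

def join_paths(path, node_path):
    # Both components are guaranteed to be text, so joining a unicode
    # directory name with a utf-8 byte string no longer raises.
    return os.path.join(to_text(path), to_text(node_path))

# b"\xef\xbd\x94" is the utf-8 encoding of U+FF54 (fullwidth "t").
assert join_paths(u"/testdir", b"\xef\xbd\x94") == u"/testdir/\uff54"
```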

data_files not included in sdist

In py2.6 and still in 2.7 sdist may not include data_files from setup.py in source distribution which make some file unreachable during install process (including bash completion and license file).

@wouterdebie how do you produce tarballs available in pypi, I can see they have license and bash completion but I can't reproduce this via python setup.py sdist?

To solve the problem, let's add MANIFEST.in and add all required files including:

  • bash completion
  • license

Also since by default sdist includes all test/test* files, which at the moment includes two files (out of many test files), let's decided on either including all test files (via manifest) or not. I think we shouldn't include test files, what do you think @wouterdebie ?

Client.cat returns a generator that yields a generator that yields strings

Is this the intended behavior? The docs make it sound like Client.cat should return a generator that yields strings.

a = hdfs_client.cat(['/path/to/my/file'])
a.next()
# prints: <generator object _handle_cat at 0x10535c4b0>

while

a = hdfs_client.cat(['/path/to/my/file'])
a.next().next()
# prints data from the file
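If the nesting is intentional (one inner generator per input file), the documented behavior can still be recovered on the caller's side by flattening; cat() below is a stand-in for the client method, with made-up chunk values:

```python
from itertools import chain

def cat():
    # Mimics the reported behavior: a generator that yields one inner
    # generator of data chunks per input file.
    yield iter([b"line 1\n", b"line 2\n"])

flat = chain.from_iterable(cat())
assert list(flat) == [b"line 1\n", b"line 2\n"]
```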

client.text only accepts type str, not unicode

Modern Linux distributions allow unicode paths, but there is strict type checking in the Client.text method that accepts only str-type paths. This gives errors when you have utf-8 encoded paths.

It also gives errors when you are using helper classes for path manipulation like path.py, which subclasses 'unicode' (for the above reason).

Successful snakebite mv says OK then ERROR

I get this confusing output even though the move succeeds:

uldis@box:~$ sudo -u username snakebite mv /log/logname/2013-10-22.bak /log/logname/2013-10-22.bak_
OK: /log/logname/2013-10-22.bak
ERROR: /log/logname/2013-10-22.bak_ (reason: )

With -D:

uldis@box:~$ sudo -u username snakebite -D mv /log/logname/2013-10-21.bak /log/logname/2013-10-21.bak_
DEBUG:snakebite.client:Trying to find path /log/logname/2013-10-21.bak
DEBUG:snakebite.channel:RequestContext (len: 74):

userInfo {
  effectiveUser: "username"
}
protocol: "org.apache.hadoop.hdfs.protocol.ClientProtocol"

DEBUG:snakebite.channel:############## CONNECTING ##############
DEBUG:snakebite.channel:RpcPayloadHeader (len: 6):

rpcKind: RPC_PROTOCOL_BUFFER
rpcOp: RPC_FINAL_PAYLOAD
callId: 0

DEBUG:snakebite.channel:Protobuf message:

src: "/log/logname/2013-10-21.bak"

DEBUG:snakebite.channel:Protobuf message bytes (34): 0a 20 2f 6c 6f 67 2f 70 72 6f 64 75 63 74 2d 75 73 65 72 2f 32 30 31 33 2d 31 30 2d 32 31 2e 62 61 6b
DEBUG:snakebite.channel:RpcRequest (len: 99):

methodName: "getFileInfo"
request: "\n /log/logname/2013-10-21.bak"
declaringClassProtocolName: "org.apache.hadoop.hdfs.protocol.ClientProtocol"
clientProtocolVersion: 1

DEBUG:snakebite.channel:############## SENDING ##############
DEBUG:snakebite.channel:Header + payload len: 107
DEBUG:snakebite.channel:############## RECVING ##############
DEBUG:snakebite.channel:############## PARSING ##############
DEBUG:snakebite.channel:Payload class: <class 'snakebite.protobuf.ClientNamenodeProtocol_pb2.GetFileInfoResponseProto'>
DEBUG:snakebite.channel:Bytes read: 8, total: 8
DEBUG:snakebite.channel:Rewinding pos 7 with 8 places
DEBUG:snakebite.channel:Reset buffer to pos -1
DEBUG:snakebite.channel:---- Parsing header ----
DEBUG:snakebite.channel:Delimited message length (pos 1): 4
DEBUG:snakebite.channel:Rewinding pos 3 with 3 places
DEBUG:snakebite.channel:Reset buffer to pos 0
DEBUG:snakebite.channel:Delimited message bytes (4): 08 00 10 00
DEBUG:snakebite.channel:Response header:

callId: 0
status: SUCCESS

DEBUG:snakebite.channel:---- Parsing response ----
DEBUG:snakebite.channel:Bytes read: 1, total: 9
DEBUG:snakebite.channel:4 bytes delimited part length: 62
DEBUG:snakebite.channel:Bytes read: 62, total: 71
DEBUG:snakebite.channel:Response bytes (62): 0a 3c 08 01 12 00 18 00 22 03 08 ed 03 2a 16 73 70 6f 74 69 66 79 2d 61 6e 61 6c 79 74 69 63 73 2d 64 61 74 61 32 0a 73 75 70 65 72 67 72 6f 75 70 38 ed c5 d9 ed 9d 28 40 00 50 00 58 00
DEBUG:snakebite.channel:Response:

fs {
  fileType: IS_DIR
  path: ""
  length: 0
  permission {
    perm: 493
  }
  owner: "username"
  group: "supergroup"
  modification_time: 1382404219629
  access_time: 0
  block_replication: 0
  blocksize: 0
}

DEBUG:snakebite.client:Added /log/logname/2013-10-21.bak to to result set
DEBUG:snakebite.channel:RpcPayloadHeader (len: 6):

rpcKind: RPC_PROTOCOL_BUFFER
rpcOp: RPC_FINAL_PAYLOAD
callId: 1

DEBUG:snakebite.channel:Protobuf message:

src: "/log/logname/2013-10-21.bak"
dst: "/log/logname/2013-10-21.bak_"

DEBUG:snakebite.channel:Protobuf message bytes (69): 0a 20 2f 6c 6f 67 2f 70 72 6f 64 75 63 74 2d 75 73 65 72 2f 32 30 31 33 2d 31 30 2d 32 31 2e 62 61 6b 12 21 2f 6c 6f 67 2f 70 72 6f 64 75 63 74 2d 75 73 65 72 2f 32 30 31 33 2d 31 30 2d 32 31 2e 62 61 6b 5f
DEBUG:snakebite.channel:RpcRequest (len: 129):

methodName: "rename"
request: "\n /log/logname/2013-10-21.bak\022!/log/logname/2013-10-21.bak_"
declaringClassProtocolName: "org.apache.hadoop.hdfs.protocol.ClientProtocol"
clientProtocolVersion: 1

DEBUG:snakebite.channel:############## SENDING ##############
DEBUG:snakebite.channel:Header + payload len: 138
DEBUG:snakebite.channel:############## RECVING ##############
DEBUG:snakebite.channel:############## PARSING ##############
DEBUG:snakebite.channel:Payload class: <class 'snakebite.protobuf.ClientNamenodeProtocol_pb2.RenameResponseProto'>
DEBUG:snakebite.channel:Bytes read: 8, total: 8
DEBUG:snakebite.channel:Rewinding pos 7 with 8 places
DEBUG:snakebite.channel:Reset buffer to pos -1
DEBUG:snakebite.channel:---- Parsing header ----
DEBUG:snakebite.channel:Delimited message length (pos 1): 4
DEBUG:snakebite.channel:Rewinding pos 3 with 3 places
DEBUG:snakebite.channel:Reset buffer to pos 0
DEBUG:snakebite.channel:Delimited message bytes (4): 08 01 10 00
DEBUG:snakebite.channel:Response header:

callId: 1
status: SUCCESS

DEBUG:snakebite.channel:---- Parsing response ----
DEBUG:snakebite.channel:Bytes read: 1, total: 9
DEBUG:snakebite.channel:4 bytes delimited part length: 2
DEBUG:snakebite.channel:Bytes read: 2, total: 11
DEBUG:snakebite.channel:Response bytes (2): 08 01
DEBUG:snakebite.channel:Response:

result: true

OK: /log/logname/2013-10-21.bak
DEBUG:snakebite.client:Trying to find path /log/logname/2013-10-21.bak_
DEBUG:snakebite.channel:RpcPayloadHeader (len: 6):

rpcKind: RPC_PROTOCOL_BUFFER
rpcOp: RPC_FINAL_PAYLOAD
callId: 2

DEBUG:snakebite.channel:Protobuf message:

src: "/log/logname/2013-10-21.bak_"

DEBUG:snakebite.channel:Protobuf message bytes (35): 0a 21 2f 6c 6f 67 2f 70 72 6f 64 75 63 74 2d 75 73 65 72 2f 32 30 31 33 2d 31 30 2d 32 31 2e 62 61 6b 5f
DEBUG:snakebite.channel:RpcRequest (len: 100):

methodName: "getFileInfo"
request: "\n!/log/logname/2013-10-21.bak_"
declaringClassProtocolName: "org.apache.hadoop.hdfs.protocol.ClientProtocol"
clientProtocolVersion: 1

DEBUG:snakebite.channel:############## SENDING ##############
DEBUG:snakebite.channel:Header + payload len: 108
DEBUG:snakebite.channel:############## RECVING ##############
DEBUG:snakebite.channel:############## PARSING ##############
DEBUG:snakebite.channel:Payload class: <class 'snakebite.protobuf.ClientNamenodeProtocol_pb2.GetFileInfoResponseProto'>
DEBUG:snakebite.channel:Bytes read: 8, total: 8
DEBUG:snakebite.channel:Rewinding pos 7 with 8 places
DEBUG:snakebite.channel:Reset buffer to pos -1
DEBUG:snakebite.channel:---- Parsing header ----
DEBUG:snakebite.channel:Delimited message length (pos 1): 4
DEBUG:snakebite.channel:Rewinding pos 3 with 3 places
DEBUG:snakebite.channel:Reset buffer to pos 0
DEBUG:snakebite.channel:Delimited message bytes (4): 08 02 10 00
DEBUG:snakebite.channel:Response header:

callId: 2
status: SUCCESS

DEBUG:snakebite.channel:---- Parsing response ----
DEBUG:snakebite.channel:Bytes read: 1, total: 9
DEBUG:snakebite.channel:4 bytes delimited part length: 62
DEBUG:snakebite.channel:Bytes read: 62, total: 71
DEBUG:snakebite.channel:Response bytes (62): 0a 3c 08 01 12 00 18 00 22 03 08 ed 03 2a 16 73 70 6f 74 69 66 79 2d 61 6e 61 6c 79 74 69 63 73 2d 64 61 74 61 32 0a 73 75 70 65 72 67 72 6f 75 70 38 ed c5 d9 ed 9d 28 40 00 50 00 58 00
DEBUG:snakebite.channel:Response:

fs {
  fileType: IS_DIR
  path: ""
  length: 0
  permission {
    perm: 493
  }
  owner: "username"
  group: "supergroup"
  modification_time: 1382404219629
  access_time: 0
  block_replication: 0
  blocksize: 0
}

DEBUG:snakebite.client:Added /log/logname/2013-10-21.bak_ to to result set
DEBUG:snakebite.channel:RpcPayloadHeader (len: 6):

rpcKind: RPC_PROTOCOL_BUFFER
rpcOp: RPC_FINAL_PAYLOAD
callId: 3

DEBUG:snakebite.channel:Protobuf message:

src: "/log/logname/2013-10-21.bak_"
dst: "/log/logname/2013-10-21.bak_"

DEBUG:snakebite.channel:Protobuf message bytes (70): 0a 21 2f 6c 6f 67 2f 70 72 6f 64 75 63 74 2d 75 73 65 72 2f 32 30 31 33 2d 31 30 2d 32 31 2e 62 61 6b 5f 12 21 2f 6c 6f 67 2f 70 72 6f 64 75 63 74 2d 75 73 65 72 2f 32 30 31 33 2d 31 30 2d 32 31 2e 62 61 6b 5f
DEBUG:snakebite.channel:RpcRequest (len: 130):

methodName: "rename"
request: "\n!/log/logname/2013-10-21.bak_\022!/log/logname/2013-10-21.bak_"
declaringClassProtocolName: "org.apache.hadoop.hdfs.protocol.ClientProtocol"
clientProtocolVersion: 1

DEBUG:snakebite.channel:############## SENDING ##############
DEBUG:snakebite.channel:Header + payload len: 139
DEBUG:snakebite.channel:############## RECVING ##############
DEBUG:snakebite.channel:############## PARSING ##############
DEBUG:snakebite.channel:Payload class: <class 'snakebite.protobuf.ClientNamenodeProtocol_pb2.RenameResponseProto'>
DEBUG:snakebite.channel:Bytes read: 8, total: 8
DEBUG:snakebite.channel:Rewinding pos 7 with 8 places
DEBUG:snakebite.channel:Reset buffer to pos -1
DEBUG:snakebite.channel:---- Parsing header ----
DEBUG:snakebite.channel:Delimited message length (pos 1): 4
DEBUG:snakebite.channel:Rewinding pos 3 with 3 places
DEBUG:snakebite.channel:Reset buffer to pos 0
DEBUG:snakebite.channel:Delimited message bytes (4): 08 03 10 00
DEBUG:snakebite.channel:Response header:

callId: 3
status: SUCCESS

DEBUG:snakebite.channel:---- Parsing response ----
DEBUG:snakebite.channel:Bytes read: 1, total: 9
DEBUG:snakebite.channel:4 bytes delimited part length: 2
DEBUG:snakebite.channel:Bytes read: 2, total: 11
DEBUG:snakebite.channel:Response bytes (2): 08 00
DEBUG:snakebite.channel:Response:

result: false

ERROR: /log/logname/2013-10-21.bak_ (reason: )
uldis@box:~$ snakebite ls /log/logname
...
drwxr-xr-x   - username supergroup          0 2013-10-22 01:10 /log/logname/2013-10-21.bak_
...
uldis@box:~$ snakebite ls /log
...
drwxr-xr-x   - username supergroup                      0 2013-10-24 08:33 /log/logname
...
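For reference, the two response bytes `08 00` above decode by hand as follows. This is a sketch of the generic protobuf wire format, not snakebite code:

```python
# Decoding the RenameResponseProto bytes "08 00" by hand.
# A protobuf field key byte is (field_number << 3) | wire_type.
data = bytes([0x08, 0x00])
key = data[0]
field_number = key >> 3   # 1 -> the "result" field
wire_type = key & 0x07    # 0 -> varint
value = data[1]           # 0 -> boolean false: the rename failed
```

So the NameNode really did answer `result: false`; the CLI then reports an error with an empty reason.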

Error running snakebite

Hi, I've got the following problem with snakebite:

# bin/snakebite 
Traceback (most recent call last):
  File "bin/snakebite", line 24, in <module>
    from snakebite.client import Client
  File "/usr/lib/python2.6/site-packages/snakebite-1.0-py2.6.egg/snakebite/client.py", line 16, in <module>
    import snakebite.protobuf.ClientNamenodeProtocol_pb2 as client_proto
  File "/usr/lib/python2.6/site-packages/snakebite-1.0-py2.6.egg/snakebite/protobuf/ClientNamenodeProtocol_pb2.py", line 12, in <module>
    import hdfs_pb2
  File "/usr/lib/python2.6/site-packages/snakebite-1.0-py2.6.egg/snakebite/protobuf/hdfs_pb2.py", line 1687, in <module>
    DESCRIPTOR.message_types_by_name['ExtendedBlockProto'] = _EXTENDEDBLOCKPROTO
AttributeError: 'FileDescriptor' object has no attribute 'message_types_by_name'

I've found an issue with a similar problem: silverchris/Android-Munin-Node#1, but doing as stated there doesn't solve my problem.

I'm running CDH 4.1.2
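A likely cause for the `message_types_by_name` AttributeError is an old python-protobuf; snakebite requires 2.4.1 or higher. A quick way to compare a version string against that minimum (plain dotted-integer comparison, nothing snakebite-specific):

```python
# Only handles plain dotted integer versions like "2.4.1";
# pre-release suffixes such as "2.4.1rc1" are not supported.
def version_tuple(v):
    return tuple(int(p) for p in v.split('.')[:3])

MINIMUM = version_tuple('2.4.1')
```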

Order of arguments changes the result of operation(s).

Example for ls command:

rav@edge:~$ snakebite ls -d / /user /dfs
'/dfs': No such file or directory
rav@edge:~$ snakebite ls -d / /user /user/rav /dfs
Found 2 items
drwxr-xr-x   - hdfs       supergroup          0 2014-07-12 11:21 /
drwxrwxr-x   - hdfs       supergroup          0 2014-08-25 12:58 /user
'/dfs': No such file or directory
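The behaviour the report argues for can be sketched as non-short-circuiting path handling: process every argument and collect per-path errors instead of aborting at the first missing path. `list_dir` below is a hypothetical stand-in for the real lookup, not snakebite's API:

```python
# Process all paths, collecting errors rather than stopping early.
def ls_all(paths, list_dir):
    entries, errors = [], []
    for path in paths:
        try:
            entries.extend(list_dir(path))
        except IOError:
            errors.append("'%s': No such file or directory" % path)
    return entries, errors

# Fake lookup mirroring the session above: only /dfs is missing.
def fake_list_dir(path):
    if path == '/dfs':
        raise IOError(path)
    return [path]

entries, errors = ls_all(['/', '/user', '/dfs'], fake_list_dir)
```

With this shape, the same entries come back regardless of where the bad path sits in the argument list.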

Error handling (permission, common)

For some errors snakebite will print a nice error message, like:

  • DirectoryException ("No such file or directory")
  • FileNotFoundException ("No such file or directory")

Example:

rav@edge:~$ snakebite ls tesfds
'/user/rav/tesfds': No such file or directory

For others it will print:

"ERROR: %s (reason: %s)" % (r.get('path'), r.get('error', ''))

For permission errors it will actually print a stack trace:

rav@edge:~$ snakebite chmod 777 /lib
Request error: org.apache.hadoop.security.AccessControlException
Permission denied
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkOwner(FSPermissionChecker.java:180)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:170)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5193)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5175)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOwner(FSNamesystem.java:5131)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermissionInt(FSNamesystem.java:1375)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermission(FSNamesystem.java:1356)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setPermission(NameNodeRpcServer.java:528)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setPermission(ClientNamenodeProtocolServerSideTranslatorPB.java:348)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59576)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2053)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2047)

It would be nice to have common error handling.
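One possible shape for such common handling: map known Java exception classes to short messages and fall back to a single generic line instead of a raw stack trace. Only the exception names are taken from real output; the function itself is hypothetical, not snakebite's implementation:

```python
# Map server-side Java exception classes to friendly one-liners.
FRIENDLY = {
    'org.apache.hadoop.security.AccessControlException': 'Permission denied',
    'java.io.FileNotFoundException': 'No such file or directory',
}

def format_error(path, java_class, detail=''):
    # Unknown exceptions fall back to the detail message (or the
    # class name) rather than a full stack trace.
    reason = FRIENDLY.get(java_class, detail or java_class)
    return "%s: %s" % (path, reason)
```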

Unexpected end-group tag.

/root/portal/lib/python2.7/site-packages/google/protobuf/internal/python_message.pyc in MergeFromString(self, serialized)
    842             # The only reason _InternalParse would return early is if it
    843             # encountered an end-group tag.
--> 844             raise message_mod.DecodeError('Unexpected end-group tag.')
    845         except (IndexError, TypeError):
    846             # Now ord(buf[p:p+1]) == ord('') gets TypeError.

2.4.1 tag is missing

It would be really great if you could tag 2.4.1 so that we can download the tarball directly from github.

Requirements file for both setup and dev

Requirements are currently duplicated in both requirements.txt and setup.py. Reduce the duplication by having setup.py read requirements from the file, add a requirements-dev.txt for development/build/test requirements, and make sure it is included in the setup.py test/build tasks.

Snakebite touchz only cares about first file

This affects both the CLI (snakebite touchz path_1 path_2) and the internal implementation in snakebite/client.py:Client.touchz().

~ arash@lon4-edgenode-b5  
❯ snakebite mkdir arash-test
OK: /user/arash/arash-test

~ arash@lon4-edgenode-b5  
❯ snakebite ls arash-test
Found 0 items

~ arash@lon4-edgenode-b5  
❯ snakebite touchz arash-test/a arash-test/b
OK: /user/arash/arash-test/a

~ arash@lon4-edgenode-b5  
❯ snakebite ls arash-test
Found 1 items
-rw-r--r--   3 arash      hadoop              0 2014-12-03 17:33 /user/arash/arash-test/a

@ravwojdyla, I suppose this won't get fixed soon, right (unless I do it myself :))? I'll just make a workaround for myself. :)
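The fix presumably amounts to looping over every path instead of only the first. A hypothetical sketch, where `create_file` stands in for the real create RPC and is not snakebite's actual API:

```python
# touchz that creates every path given, not only paths[0].
def touchz(paths, create_file):
    results = []
    for path in paths:          # iterate over all arguments
        create_file(path)
        results.append({'path': path, 'result': True})
    return results

created = []
results = touchz(['arash-test/a', 'arash-test/b'], created.append)
```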
