douban / dpark
Python clone of Spark, a MapReduce-alike framework in Python
License: BSD 3-Clause "New" or "Revised" License
The system's default python2.7 is 2.7.2; I installed Python 2.7.5 separately and created a new virtualenv with it.
Running scripts with the absolute path `pwd`/venv/bin/python works fine in process mode,
but under Mesos it still invoked /usr/bin/env python2.7. Changing the first line of
venv/lib/python2.7/site-packages/DPark-0.1-py2.7.egg/dpark/bin/executor27.py
from #!/usr/bin/env python2.7
to the virtualenv interpreter's absolute path (the output of pwd followed by /venv/bin/python, since a shebang takes a literal path)
fixed it.
The content of /tmp/mesos/slaves/201301141050-1654703350-5050-14385-0/frameworks/201301141050-1654703350-5050-14385-0010/executors/default/runs/1/stderr was:
sh: /opt/python-2.7/lib/python2.7/site-packages/DPark-0.1-py2.7.egg/dpark/executor27.py: Permission denied
Solved after running:
chmod -v +x /opt/python-2.7/lib/python2.7/site-packages/DPark-0.1-py2.7.egg/dpark/executor*.py
On RHEL 7 (3.10.0-229.el7.x86_64) with Python 2.7.5, running python setup.py install fails with:
gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/usr/include/python2.7 -c dpark/portable_hash.c -o build/temp.linux-x86_64-2.7/dpark/portable_hash.o
dpark/portable_hash.c:8:22: fatal error: pyconfig.h: No such file or directory
#include "pyconfig.h"
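The missing pyconfig.h is Python's own build-configuration header; on RHEL it ships in the python-devel package (sudo yum install python-devel), so this error usually just means the development headers are not installed. A quick way to check where the compiler expects it (a sketch, works on any platform):

```python
import os
import sysconfig

# The C compiler looks for pyconfig.h in Python's include directory;
# if the python-devel / python-dev package is missing, it won't be there.
include_dir = sysconfig.get_paths()["include"]
print(include_dir)
print(os.path.exists(os.path.join(include_dir, "pyconfig.h")))
```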
While running, many lines like the following are printed:
DeprecationWarning: Call to deprecated function or method decompress (use lz4.block.decompress instead)
What is the cause?
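The warning text itself names the cause: dpark calls the deprecated top-level lz4 decompress, and newer python-lz4 releases moved it to lz4.block.decompress. Until the call site is updated, one stopgap (stdlib only, a sketch) is to silence just that warning:

```python
import warnings

# Suppress only the lz4 deprecation message; all other warnings still show.
warnings.filterwarnings(
    "ignore",
    message=r".*use lz4\.block\.decompress instead.*",
    category=DeprecationWarning,
)
```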
File "C:\Python27\lib\site-packages\Cython\Utils.py", line 22, in wrapper
  res = cache[args] = f(*args)
File "C:\Python27\lib\site-packages\Cython\Utils.py", line 111, in search_include_directories
  path = os.path.join(dir, dotted_filename)
File "C:\Python27\lib\ntpath.py", line 85, in join
  result_path = result_path + p_path
UnicodeDecodeError: 'ascii' codec can't decode byte 0xca in position 1: ordinal not in range(128)
building 'dpark.portable_hash' extension
creating build\temp.win32-2.7
creating build\temp.win32-2.7\Release
creating build\temp.win32-2.7\Release\dpark
C:\Program Files\Common Files\Microsoft\Visual C++ for Python\9.0\VC\Bin\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -IC:\Python27\include -IC:\Python27\PC /Tcdpark\portable_hash.c /Fobuild\temp.win32-2.7\Release\dpark\portable_hash.obj
portable_hash.c
dpark\portable_hash.c(1) : fatal error C1189: #error : Do not use this file, it is the result of a failed Cython compilation.
error: command 'C:\Program Files\Common Files\Microsoft\Visual C++ for Python\9.0\VC\Bin\cl.exe' failed with exit status 2
Running on Mesos, when the number of input files gets somewhat large, the error below is raised…
2014-12-24 20:02:30,120 [ERROR] [pymesos.process] error while call <function handle at 0x1566ed8> (tried 4 times)
Traceback (most recent call last):
File "/usr/lib64/python2.6/site-packages/pymesos-0.0.3-py2.6.egg/pymesos/process.py", line 69, in run_jobs
func(*args, **kw)
File "/usr/lib64/python2.6/site-packages/pymesos-0.0.3-py2.6.egg/pymesos/process.py", line 139, in handle
f(*args)
File "/usr/lib64/python2.6/site-packages/pymesos-0.0.3-py2.6.egg/pymesos/scheduler.py", line 93, in onResourceOffersMessage
self.sched.resourceOffers(self, list(offers))
File "/usr/lib64/python2.6/site-packages/DPark-0.1-py2.6-linux-x86_64.egg/dpark/schedule.py", line 448, in _
r = f(self, *a, **kw)
File "/usr/lib64/python2.6/site-packages/DPark-0.1-py2.6-linux-x86_64.egg/dpark/schedule.py", line 685, in resourceOffers
len(offers), sum(cpus), sum(mems), len(self.activeJobs))
TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'
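The TypeError points at sum(cpus) or sum(mems) receiving a None, i.e. some offer apparently comes back without that resource (my inference from the trace, not verified against the code). The symptom and an obvious guard are easy to sketch:

```python
# One offer missing its 'mem' resource would surface exactly like this
# (hypothetical values; the real ones come from the Mesos offers).
mems = [512, None, 256]

try:
    total = sum(mems)
except TypeError as e:
    print(e)  # unsupported operand type(s) for +: 'int' and 'NoneType'

# Guarded version: skip offers that reported no value.
total = sum(m for m in mems if m is not None)
print(total)  # 768
```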
It used to be /tmp/dpark;
now /dev/shm/ is preferred at startup.
So should the webroot be changed to /dev/shm/ as well?
error: can't copy 'dpark/portable_hash.c': doesn't exist or not a regular file
Looking at setup.py, it contains:
ext_modules = [Extension('dpark.portable_hash', ['dpark/portable_hash.pyx'])],
So why does it go looking for portable_hash.c when the .pyx file is what's specified? Strange.
Creating a context with 'process' will use the MultiProcessScheduler instead of LocalScheduler. However, this breaks rdd.fold() (and maybe others?). See below for a reduced test case:
>>> import dpark
>>> ctx = dpark.DparkContext('process')
>>> rdd = ctx.makeRDD([1, 2, 3, 4])
>>> rdd.fold(0, max)
2013-04-18 10:58:02,786 [INFO] [scheduler] Got a job with 2 tasks
2013-04-18 10:58:02,797 [ERROR] [scheduler] error in task <ResultTask(0) of <ParallelCollection 4>
Traceback (most recent call last):
File "dpark/schedule.py", line 343, in run_task
result = task.run(aid)
File "dpark/task.py", line 52, in run
return self.func(self.rdd.iterator(self.split))
File "dpark/rdd.py", line 253, in <lambda>
self.ctx.runJob(self, lambda x: reduce(f, x, copy(zero))),
TypeError: 'int' object is not callable
2013-04-18 10:58:02,801 [ERROR] [scheduler] error in task <ResultTask(1) of <ParallelCollection 4>
Traceback (most recent call last):
File "dpark/schedule.py", line 343, in run_task
result = task.run(aid)
File "dpark/task.py", line 52, in run
return self.func(self.rdd.iterator(self.split))
File "dpark/rdd.py", line 253, in <lambda>
self.ctx.runJob(self, lambda x: reduce(f, x, copy(zero))),
TypeError: 'int' object is not callable
2013-04-18 10:58:02,804 [INFO] [scheduler] Job finished in 0.0 seconds
2013-04-18 10:58:02,889 [ERROR] [scheduler] task <ResultTask(0) of <ParallelCollection 4> failed
2013-04-18 10:58:02,889 [ERROR] [scheduler] <OtherFailure exception:'int' object is not callable> <type 'instance'> exception:'int' object is not callable
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "dpark/rdd.py", line 254, in fold
zero)
File "dpark/context.py", line 202, in runJob
for it in self.scheduler.runJob(rdd, func, partitions, allowLocal):
File "dpark/schedule.py", line 330, in runJob
raise evt.reason
dpark.schedule.OtherFailure: <OtherFailure exception:'int' object is not callable>
And the correct behavior for the LocalScheduler version:
>>> import dpark
>>> ctx = dpark.DparkContext('local')
>>> rdd = ctx.makeRDD([1, 2, 3, 4])
>>> rdd.fold(0, max)
4
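The 'int' object is not callable error is exactly what reduce produces when the zero value ends up in the function slot, so the fold arguments apparently get swapped somewhere on the multiprocessing path (an inference from the trace, not a confirmed diagnosis). A minimal reproduction of just that symptom:

```python
from functools import reduce  # builtin in Python 2, functools in Python 3

zero, f = 0, max
try:
    # Arguments swapped: reduce tries to call the integer 0 as the function.
    reduce(zero, [1, 2, 3, 4], f)
except TypeError as e:
    print(e)  # 'int' object is not callable
```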
Here's the situation:
the mesos-slave process on one node died suddenly, then DPark started reporting 404 and just hung there…
Any suggestions for handling this case?
The small module below uploads a file and then counts the occurrences of each letter. When I serve it with gunicorn using the gevent worker, requests time out; switching the worker to sync works fine. Thanks!
@app.route('/upload', methods=['POST'])
def upload():
    if request.method == 'POST':
        file = request.files['file']
        if file:
            filename = secure_filename(file.filename)
            file.save(os.path.join(UPLOAD_FOLDER, filename))
            f = dpark.textFile(os.path.join(UPLOAD_FOLDER, filename))
            chs = f.flatMap(lambda x: x).filter(lambda x: x in string.ascii_letters).map(lambda x: (x, 1))
            wc = chs.reduceByKey(lambda x, y: x + y).collectAsMap()
            db.LogProc.save(wc)
    return redirect('/')
I have a pair RDD whose key is a named tuple. For example,
from collections import namedtuple
Sample = namedtuple('Sample', ['i'], verbose=False)
When I try to invoke groupByKey on the rdd, dpark responds with
Traceback (most recent call last):
File "demo.py", line 253, in <module>
sys.exit(main(sys.argv) or 0)
File "demo.py", line 247, in main
oneD(verbose)
File "demo.py", line 236, in oneD
resultRDD = inputRDD.flatMap(aFillIn.apply).reduceByKey(combineKeys).reduce(xxx)
File "/Users/jbw/anaconda/lib/python2.7/site-packages/DPark-0.1-py2.7-macosx-10.5-x86_64.egg/dpark/rdd.py", line 246, in reduce
return reduce(f, chain(self.ctx.runJob(self, reducePartition)))
File "/Users/jbw/anaconda/lib/python2.7/site-packages/DPark-0.1-py2.7-macosx-10.5-x86_64.egg/dpark/util.py", line 50, in chain
for v in it:
File "/Users/jbw/anaconda/lib/python2.7/site-packages/DPark-0.1-py2.7-macosx-10.5-x86_64.egg/dpark/context.py", line 261, in runJob
for it in self.scheduler.runJob(rdd, func, partitions, allowLocal):
File "/Users/jbw/anaconda/lib/python2.7/site-packages/DPark-0.1-py2.7-macosx-10.5-x86_64.egg/dpark/schedule.py", line 341, in runJob
raise Exception(reason.message)
Exception: exception:<class '__main__.SampleHash'> is unhashable by portable_hash
The method named portable_hash in the file dpark/portable_hash.pyx contains the dispatching of the hashing algorithm based on type:
cpdef int64_t portable_hash(obj) except -1:
    t = type(obj)
    if obj is None:
        return 1315925605
    if t is bytes:
        return string_hash(obj)
    elif t is unicode:
        return unicode_hash(obj)
    elif t is tuple:
        return tuple_hash(obj)
    elif t is int or t is long or t is float:
        return hash(obj)
    else:
        raise TypeError('%s is unhashable by portable_hash' % t)
As I read it, the current implementation has the limitation that all user-defined classes will raise TypeError. Is this the intended behavior in dpark? I don't see this limitation described in the Spark documentation. If it is intended, then it appears that keys in dpark can only be:
None
bytes
unicode
tuple
int
long
float
One possible way to allow user-defined classes would be to invoke hash when required, something like:
cpdef int64_t portable_hash(obj) except -1:
    t = type(obj)
    if obj is None:
        return 1315925605
    if t is bytes:
        return string_hash(obj)
    elif t is unicode:
        return unicode_hash(obj)
    elif t is tuple:
        return tuple_hash(obj)
    elif t is int or t is long or t is float:
        return hash(obj)
    else:
        return hash(obj)
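For what it's worth, that fallback would cover the namedtuple case directly, since a namedtuple is a tuple subclass and hashes exactly like the plain tuple of its values:

```python
from collections import namedtuple

Sample = namedtuple('Sample', ['i'])
s = Sample(i=3)

# tuple.__hash__ is inherited, so the key hashes like its value tuple.
print(hash(s) == hash((3,)))  # True
```

Note, though, that for arbitrary user classes the built-in hash falls back to object identity, which is not stable across processes; that may be exactly why portable_hash refuses them.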
Please let me know your thinking on this issue.
How do you install and deploy dpark on Mesos? And how does Mesos start Myexecutor to run the tasks?
This project looks promising, but could someone translate the docs and add an architectural overview?
I have a txt file to process. I use the -M option to set the memory limit (I've tried -M 400, which should mean 400 megabytes as far as I understood). But on a test run, the memory consumption jumps up to 3.2GB, which is really high. The script works as intended and the results are valid; the only issue is the memory consumption. I'm afraid that on larger files it will start swapping and simply choke.
I use no extra parameters, only -M, so I guess it should work with the defaults.
The memory consumption does not grow smoothly, so I don't think it's a garbage-collection issue.
It looks like DPark creates a heap or something like that, but how do I set a limit for it? As I wrote, I'm afraid it will eat all the memory. My test file is around 1GB now, but what if I want to process 20GB?
In general I really like DPark and don't want to switch to any other map-reduce framework (I've tried most of them, I guess). Please tell me how to predict and control the memory consumption. Thank you very much!
In test_job.py, line 54: unitest.main() should be unittest.main().
As someone who has been rejected by Douban twice, it feels great to finally have time to pick at bugs in a Douban project *3~
Well, this isn't really a bug either ╮(╯﹏╰)╭........
As titled, thanks~~
I increasingly find more and more cool GitHub projects on the web that look really interesting, and dpark is one of them. I see that you have re-written Spark in Python, which is cool because I am a Python fan and am also interested in Spark. But I find no mention as to why, and this puzzles me: what problems with the original Spark were you trying to solve? Did you find some big bug in the current implementation, or was this just a learning exercise, something done for fun (an impressive exercise, if so!)?
Basically it would be nice to have a paragraph in the README just saying why this project exists: what limitations it was trying to overcome, what performance there was to gain, etc.
Many thanks!
See this test case:
mike@EDEN:~/Development/dpark$ cat t.py
import dpark
rdd = dpark.makeRDD(range(9), 3).flatMap(lambda i: (i, i)).glom()
print list(map(list, rdd.collect()))
mike@EDEN:~/Development/dpark$ python t.py
[[0, 0, 1, 1, 2, 2], [3, 3, 4, 4, 5, 5], [6, 6, 7, 7, 8, 8]]
mike@EDEN:~/Development/dpark$ python t.py -m process
2013-04-26 17:24:30,239 [INFO] [scheduler] Got a job with 3 tasks
2013-04-26 17:24:30,249 [INFO] [scheduler] Job finished in 0.0 seconds
[[], [], []]
Take a look at Spark's implementation of glom, specifically line 11. They get the array from the iterator and make a new array. The equivalent of that in Python is list(). Take this patch:
diff --git a/dpark/rdd.py b/dpark/rdd.py
index c04a5e9..1efb3f8 100644
--- a/dpark/rdd.py
+++ b/dpark/rdd.py
@@ -511,7 +511,7 @@ class FilteredRDD(MappedRDD):
 class GlommedRDD(DerivedRDD):
     def compute(self, split):
-        yield self.prev.iterator(split)
+        yield list(self.prev.iterator(split))
 class MapPartitionsRDD(MappedRDD):
     def compute(self, split):
Once I apply the patch, here is the result:
mike@EDEN:~/Development/dpark$ python t.py -m process
2013-04-26 17:25:27,891 [INFO] [scheduler] Got a job with 3 tasks
2013-04-26 17:25:27,902 [INFO] [scheduler] Job finished in 0.0 seconds
[[0, 0, 1, 1, 2, 2], [3, 3, 4, 4, 5, 5], [6, 6, 7, 7, 8, 8]]
I believe this is caused by the pickling that happens when chained iterables are passed through the multiprocessing pool. If you have a better solution that doesn't involve pulling everything into memory, I'd be happy to hear it!
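That matches generator semantics: the raw iterator glom yields cannot go through pickle, while a materialized list can. A quick illustration (plain Python, independent of dpark):

```python
import pickle

def numbers():
    yield 1
    yield 2

# Generators are not picklable, so yielding the raw iterator from
# compute() cannot survive a multiprocessing round-trip.
try:
    pickle.dumps(numbers())
except TypeError as e:
    print(e)

# A list of the same values pickles fine.
data = pickle.loads(pickle.dumps(list(numbers())))
print(data)  # [1, 2]
```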
Are there any documents that tell people how to contribute?
AWS, RedHat 7.1:
Installed /usr/lib64/python2.7/site-packages/mesos.interface-0.23.0-py2.7.egg
error: Installed distribution mesos.interface 0.23.0 conflicts with requirement mesos.interface==0.22.0
Not sure whether this is a pymesos dependency problem:
"install_requires=['mesos.interface>=0.22.0,<0.22.1.2'],"
Couldn't find a setup script in /tmp/easy_install-QfQ_64/mesos.interface-1.0.1.linux-x86_64.tar.gz
#!/usr/bin/env python
import dpark
nums = dpark.parallelize(range(100), 4)
print nums.count()
n=0
while true; do
    echo $n
    n=`expr $n + 1`
    ../venv/bin/python test_hang.py -p 10
done
Concretely, it hangs after print nums.count() finishes and never exits.
It happens roughly once in 30 runs, though the frequency varies.
Environment:
OS: RHEL 6.2 x86_64
Python: 2.7.5
Latest DPark from GitHub
The old version (212e088) works fine here,
and running on Mesos is also fine.
The dependency modules such as pyzmq are the same versions (I copied site-packages/dpark over from the old virtualenv for testing, so everything except dpark is guaranteed identical).
For example: after a lot of intermediate computation I end up with a fairly large rdd, and later computations need to reuse it repeatedly. Any suggestions?
Both cache and snapshot raise errors.
For now I save the rdd with saveAsTextFile and load the file back when the later computations need it.
I create an rdd via textFile, where each line of the file is a document. I now need to tokenize, convert to vectors, and so on. How can I automatically attach to each line a line number that matches its line number in the text file?
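The usual trick (what Spark's zipWithIndex does internally) is two passes: first count the elements of each partition, then turn those counts into per-partition starting offsets and enumerate locally. I don't know whether dpark exposes this directly, so here is the idea in plain Python with partitions as lists (a sketch, not dpark API):

```python
# Partitions of a text file, in order (hypothetical data).
partitions = [["doc a", "doc b"], ["doc c"], ["doc d", "doc e"]]

# Pass 1: element count of each partition.
counts = [len(p) for p in partitions]

# Starting global line number of each partition.
offsets = [sum(counts[:i]) for i in range(len(counts))]

# Pass 2: enumerate within each partition, shifted by its offset.
numbered = [(off + i, line)
            for off, part in zip(offsets, partitions)
            for i, line in enumerate(part)]
print(numbered[:3])  # [(0, 'doc a'), (1, 'doc b'), (2, 'doc c')]
```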
On a 64-bit machine with Python 2.7,
reducing input data A raises the exception "Negative size passed to PyString_FromStringAndSize", but splitting A into two parts A1 and A2 and processing them separately works fine. Memory usage was about 31GB at the time, so I wonder whether the large data volume caused an overflow. Could this be related to portable_hash.c?
I didn't capture the stack trace at the time; I'll try to reproduce it later.
import sys
sys.path.append('../')
from dpark import DparkContext
dpark = DparkContext()
file = dpark.textFile("./tmp/words.txt")
words = file.flatMap(lambda x:x.split()).map(lambda x:(x,1))
wc = words.reduceByKey(lambda x,y:x+y).collectAsMap()
Traceback (most recent call last):
  File "wc_1.py", line 7, in <module>
    wc = words.reduceByKey(lambda x,y:x+y).collectAsMap()
  File "F:\install\acond\lib\site-packages\dpark-0.2.2-py2.7-win-amd64.egg\dpark\rdd.py", line 502, in collectAsMap
    for v in self.ctx.runJob(self, lambda x:list(x)):
  File "F:\install\acond\lib\site-packages\dpark-0.2.2-py2.7-win-amd64.egg\dpark\context.py", line 264, in runJob
    for it in self.scheduler.runJob(rdd, func, partitions, allowLocal):
  File "F:\install\acond\lib\site-packages\dpark-0.2.2-py2.7-win-amd64.egg\dpark\schedule.py", line 333, in runJob
    [l[-1] for l in stage.outputLocs])
  File "F:\install\acond\lib\site-packages\dpark-0.2.2-py2.7-win-amd64.egg\dpark\shuffle.py", line 353, in registerMapOutputs
    self.client.call(SetValueMessage('shuffle:%s' % shuffleId, locs))
  File "F:\install\acond\lib\site-packages\dpark-0.2.2-py2.7-win-amd64.egg\dpark\tracker.py", line 112, in call
    sock.connect(self.addr)
  File "zmq/backend/cython/socket.pyx", line 514, in zmq.backend.cython.socket.Socket.connect (zmq\backend\cython\socket.c:5376)
zmq.error.ZMQError: Invalid argument
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "F:\install\acond\lib\atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "F:\install\acond\lib\site-packages\dpark-0.2.2-py2.7-win-amd64.egg\dpark\context.py", line 281, in stop
    env.stop()
  File "F:\install\acond\lib\site-packages\dpark-0.2.2-py2.7-win-amd64.egg\dpark\env.py", line 92, in stop
    self.trackerServer.stop()
  File "F:\install\acond\lib\site-packages\dpark-0.2.2-py2.7-win-amd64.egg\dpark\tracker.py", line 49, in stop
    sock.connect(self.addr)
  File "zmq/backend/cython/socket.pyx", line 514, in zmq.backend.cython.socket.Socket.connect (zmq\backend\cython\socket.c:5376)
ZMQError: Invalid argument
Error in sys.exitfunc:
Traceback (most recent call last):
  File "F:\install\acond\lib\atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "F:\install\acond\lib\site-packages\dpark-0.2.2-py2.7-win-amd64.egg\dpark\context.py", line 281, in stop
    env.stop()
  File "F:\install\acond\lib\site-packages\dpark-0.2.2-py2.7-win-amd64.egg\dpark\env.py", line 92, in stop
    self.trackerServer.stop()
  File "F:\install\acond\lib\site-packages\dpark-0.2.2-py2.7-win-amd64.egg\dpark\tracker.py", line 49, in stop
    sock.connect(self.addr)
  File "zmq/backend/cython/socket.pyx", line 514, in zmq.backend.cython.socket.Socket.connect (zmq\backend\cython\socket.c:5376)
zmq.error.ZMQError: Invalid argument
Some documents for DPark already exist at https://github.com/jackfengji/test_pro/issues, but English documents are needed to popularize this project among English speakers.
I could translate some of the documents as a volunteer.
Environment: pymesos 0.1.6, mesosphere 1.1.0, dpark 0.3.5
File: executor.py
Function: reply_status
Problem: the slave_id attribute of the status object returned by mesos_pb2.TaskStatus() is empty.
def reply_status(driver, task_id, state, data=None):
    status = mesos_pb2.TaskStatus()
    status.task_id.MergeFrom(task_id)
    status.state = state
    status.timestamp = time.time()
    if data is not None:
        status.data = data
    driver.sendStatusUpdate(status)
Hi, does dpark support Python 3?
Results are below:
Traceback (most recent call last):
  File "test_dpark.py", line 20, in <module>
    from dpark import DparkContext
  File "/usr/lib64/python2.6/site-packages/DPark-0.3.5-py2.6-linux-x86_64.egg/dpark/__init__.py", line 1, in <module>
    from context import DparkContext, parser as optParser
  File "/usr/lib64/python2.6/site-packages/DPark-0.3.5-py2.6-linux-x86_64.egg/dpark/context.py", line 10, in <module>
    from dpark.schedule import LocalScheduler, MultiProcessScheduler, MesosScheduler
  File "/usr/lib64/python2.6/site-packages/DPark-0.3.5-py2.6-linux-x86_64.egg/dpark/schedule.py", line 15, in <module>
    import pymesos as mesos
  File "/usr/lib64/python2.6/site-packages/pymesos-0.1.6-py2.6.egg/pymesos/__init__.py", line 1, in <module>
    from .scheduler import MesosSchedulerDriver
  File "/usr/lib64/python2.6/site-packages/pymesos-0.1.6-py2.6.egg/pymesos/scheduler.py", line 18, in <module>
    from .process import UPID, Process, async
  File "/usr/lib64/python2.6/site-packages/pymesos-0.1.6-py2.6.egg/pymesos/process.py", line 141
    args = {f.name: getattr(msg, f.name) for f in msg.DESCRIPTOR.fields}
    ^
SyntaxError: invalid syntax
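The failing line is a dict comprehension, which Python only gained in 2.7; under Python 2.6 the pymesos module cannot even be parsed. If 2.6 support mattered, the 2.6-compatible spelling would be dict() over a generator expression (a hypothetical rewrite for illustration, not something pymesos shipped):

```python
# Python 2.7+ only:
#     args = {f.name: getattr(msg, f.name) for f in msg.DESCRIPTOR.fields}
# Python 2.6-compatible equivalent, shown with stand-in data:
fields = ["task_id", "state", "timestamp"]
args = dict((f, len(f)) for f in fields)
print(args["state"])  # 5
```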
Is there any deployment documentation for running dpark on Mesos using Docker?
I read some of the source, and it looks like runnable stages cannot run in parallel; only the tasks within a stage run in parallel. Is that really the case, or am I missing something?
$ sudo python setup.py install
running install
running bdist_egg
running egg_info
writing requirements to DPark.egg-info/requires.txt
writing DPark.egg-info/PKG-INFO
writing top-level names to DPark.egg-info/top_level.txt
writing dependency_links to DPark.egg-info/dependency_links.txt
reading manifest file 'DPark.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching 'dpark/porable_hash.pyx'
writing manifest file 'DPark.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
running build_ext
skipping 'dpark/portable_hash.c' Cython extension (up-to-date)
creating build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/dpark
copying build/lib.linux-x86_64-2.7/dpark/decorator.py -> build/bdist.linux-x86_64/egg/dpark
copying build/lib.linux-x86_64-2.7/dpark/hotcounter.py -> build/bdist.linux-x86_64/egg/dpark
copying build/lib.linux-x86_64-2.7/dpark/schedule.py -> build/bdist.linux-x86_64/egg/dpark
copying build/lib.linux-x86_64-2.7/dpark/__init__.py -> build/bdist.linux-x86_64/egg/dpark
copying build/lib.linux-x86_64-2.7/dpark/util.py -> build/bdist.linux-x86_64/egg/dpark
copying build/lib.linux-x86_64-2.7/dpark/rdd.py -> build/bdist.linux-x86_64/egg/dpark
copying build/lib.linux-x86_64-2.7/dpark/cache.py -> build/bdist.linux-x86_64/egg/dpark
copying build/lib.linux-x86_64-2.7/dpark/tracker.py -> build/bdist.linux-x86_64/egg/dpark
copying build/lib.linux-x86_64-2.7/dpark/bagel.py -> build/bdist.linux-x86_64/egg/dpark
copying build/lib.linux-x86_64-2.7/dpark/executor.py -> build/bdist.linux-x86_64/egg/dpark
copying build/lib.linux-x86_64-2.7/dpark/task.py -> build/bdist.linux-x86_64/egg/dpark
copying build/lib.linux-x86_64-2.7/dpark/bitindex.py -> build/bdist.linux-x86_64/egg/dpark
copying build/lib.linux-x86_64-2.7/dpark/table.py -> build/bdist.linux-x86_64/egg/dpark
creating build/bdist.linux-x86_64/egg/dpark/moosefs
copying build/lib.linux-x86_64-2.7/dpark/moosefs/__init__.py -> build/bdist.linux-x86_64/egg/dpark/moosefs
copying build/lib.linux-x86_64-2.7/dpark/moosefs/consts.py -> build/bdist.linux-x86_64/egg/dpark/moosefs
copying build/lib.linux-x86_64-2.7/dpark/moosefs/cs.py -> build/bdist.linux-x86_64/egg/dpark/moosefs
copying build/lib.linux-x86_64-2.7/dpark/moosefs/utils.py -> build/bdist.linux-x86_64/egg/dpark/moosefs
copying build/lib.linux-x86_64-2.7/dpark/moosefs/master.py -> build/bdist.linux-x86_64/egg/dpark/moosefs
copying build/lib.linux-x86_64-2.7/dpark/env.py -> build/bdist.linux-x86_64/egg/dpark
copying build/lib.linux-x86_64-2.7/dpark/broadcast.py -> build/bdist.linux-x86_64/egg/dpark
copying build/lib.linux-x86_64-2.7/dpark/serialize.py -> build/bdist.linux-x86_64/egg/dpark
copying build/lib.linux-x86_64-2.7/dpark/tabular.py -> build/bdist.linux-x86_64/egg/dpark
copying build/lib.linux-x86_64-2.7/dpark/mutable_dict.py -> build/bdist.linux-x86_64/egg/dpark
copying build/lib.linux-x86_64-2.7/dpark/context.py -> build/bdist.linux-x86_64/egg/dpark
copying build/lib.linux-x86_64-2.7/dpark/conf.py -> build/bdist.linux-x86_64/egg/dpark
copying build/lib.linux-x86_64-2.7/dpark/accumulator.py -> build/bdist.linux-x86_64/egg/dpark
copying build/lib.linux-x86_64-2.7/dpark/dstream.py -> build/bdist.linux-x86_64/egg/dpark
copying build/lib.linux-x86_64-2.7/dpark/hyperloglog.py -> build/bdist.linux-x86_64/egg/dpark
copying build/lib.linux-x86_64-2.7/dpark/portable_hash.so -> build/bdist.linux-x86_64/egg/dpark
copying build/lib.linux-x86_64-2.7/dpark/job.py -> build/bdist.linux-x86_64/egg/dpark
copying build/lib.linux-x86_64-2.7/dpark/dependency.py -> build/bdist.linux-x86_64/egg/dpark
copying build/lib.linux-x86_64-2.7/dpark/shuffle.py -> build/bdist.linux-x86_64/egg/dpark
copying build/lib.linux-x86_64-2.7/dpark/portable_hash.pyx -> build/bdist.linux-x86_64/egg/dpark
byte-compiling build/bdist.linux-x86_64/egg/dpark/decorator.py to decorator.pyc
byte-compiling build/bdist.linux-x86_64/egg/dpark/hotcounter.py to hotcounter.pyc
byte-compiling build/bdist.linux-x86_64/egg/dpark/schedule.py to schedule.pyc
byte-compiling build/bdist.linux-x86_64/egg/dpark/__init__.py to __init__.pyc
byte-compiling build/bdist.linux-x86_64/egg/dpark/util.py to util.pyc
byte-compiling build/bdist.linux-x86_64/egg/dpark/rdd.py to rdd.pyc
byte-compiling build/bdist.linux-x86_64/egg/dpark/cache.py to cache.pyc
byte-compiling build/bdist.linux-x86_64/egg/dpark/tracker.py to tracker.pyc
byte-compiling build/bdist.linux-x86_64/egg/dpark/bagel.py to bagel.pyc
byte-compiling build/bdist.linux-x86_64/egg/dpark/executor.py to executor.pyc
byte-compiling build/bdist.linux-x86_64/egg/dpark/task.py to task.pyc
byte-compiling build/bdist.linux-x86_64/egg/dpark/bitindex.py to bitindex.pyc
byte-compiling build/bdist.linux-x86_64/egg/dpark/table.py to table.pyc
byte-compiling build/bdist.linux-x86_64/egg/dpark/moosefs/__init__.py to __init__.pyc
byte-compiling build/bdist.linux-x86_64/egg/dpark/moosefs/consts.py to consts.pyc
byte-compiling build/bdist.linux-x86_64/egg/dpark/moosefs/cs.py to cs.pyc
byte-compiling build/bdist.linux-x86_64/egg/dpark/moosefs/utils.py to utils.pyc
byte-compiling build/bdist.linux-x86_64/egg/dpark/moosefs/master.py to master.pyc
byte-compiling build/bdist.linux-x86_64/egg/dpark/env.py to env.pyc
byte-compiling build/bdist.linux-x86_64/egg/dpark/broadcast.py to broadcast.pyc
byte-compiling build/bdist.linux-x86_64/egg/dpark/serialize.py to serialize.pyc
byte-compiling build/bdist.linux-x86_64/egg/dpark/tabular.py to tabular.pyc
byte-compiling build/bdist.linux-x86_64/egg/dpark/mutable_dict.py to mutable_dict.pyc
byte-compiling build/bdist.linux-x86_64/egg/dpark/context.py to context.pyc
byte-compiling build/bdist.linux-x86_64/egg/dpark/conf.py to conf.pyc
byte-compiling build/bdist.linux-x86_64/egg/dpark/accumulator.py to accumulator.pyc
byte-compiling build/bdist.linux-x86_64/egg/dpark/dstream.py to dstream.pyc
byte-compiling build/bdist.linux-x86_64/egg/dpark/hyperloglog.py to hyperloglog.pyc
byte-compiling build/bdist.linux-x86_64/egg/dpark/job.py to job.pyc
byte-compiling build/bdist.linux-x86_64/egg/dpark/dependency.py to dependency.pyc
byte-compiling build/bdist.linux-x86_64/egg/dpark/shuffle.py to shuffle.pyc
creating stub loader for dpark/portable_hash.so
byte-compiling build/bdist.linux-x86_64/egg/dpark/portable_hash.py to portable_hash.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
installing scripts to build/bdist.linux-x86_64/egg/EGG-INFO/scripts
running install_scripts
running build_scripts
creating build/bdist.linux-x86_64/egg/EGG-INFO/scripts
copying build/scripts-2.7/dquery -> build/bdist.linux-x86_64/egg/EGG-INFO/scripts
copying build/scripts-2.7/executor.py -> build/bdist.linux-x86_64/egg/EGG-INFO/scripts
copying build/scripts-2.7/scheduler.py -> build/bdist.linux-x86_64/egg/EGG-INFO/scripts
copying build/scripts-2.7/drun -> build/bdist.linux-x86_64/egg/EGG-INFO/scripts
copying build/scripts-2.7/mrun -> build/bdist.linux-x86_64/egg/EGG-INFO/scripts
copying build/scripts-2.7/dgrep -> build/bdist.linux-x86_64/egg/EGG-INFO/scripts
changing mode of build/bdist.linux-x86_64/egg/EGG-INFO/scripts/dquery to 755
changing mode of build/bdist.linux-x86_64/egg/EGG-INFO/scripts/executor.py to 755
changing mode of build/bdist.linux-x86_64/egg/EGG-INFO/scripts/scheduler.py to 755
changing mode of build/bdist.linux-x86_64/egg/EGG-INFO/scripts/drun to 775
changing mode of build/bdist.linux-x86_64/egg/EGG-INFO/scripts/mrun to 775
changing mode of build/bdist.linux-x86_64/egg/EGG-INFO/scripts/dgrep to 755
copying DPark.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying DPark.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying DPark.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying DPark.egg-info/not-zip-safe -> build/bdist.linux-x86_64/egg/EGG-INFO
copying DPark.egg-info/requires.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying DPark.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
writing build/bdist.linux-x86_64/egg/EGG-INFO/native_libs.txt
creating 'dist/DPark-0.3.2-py2.7-linux-x86_64.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing DPark-0.3.2-py2.7-linux-x86_64.egg
removing '/usr/lib64/python2.7/site-packages/DPark-0.3.2-py2.7-linux-x86_64.egg' (and everything under it)
creating /usr/lib64/python2.7/site-packages/DPark-0.3.2-py2.7-linux-x86_64.egg
Extracting DPark-0.3.2-py2.7-linux-x86_64.egg to /usr/lib64/python2.7/site-packages
DPark 0.3.2 is already the active version in easy-install.pth
Installing dquery script to /usr/bin
Installing executor.py script to /usr/bin
Installing scheduler.py script to /usr/bin
Installing drun script to /usr/bin
Installing mrun script to /usr/bin
Installing dgrep script to /usr/bin
Installed /usr/lib64/python2.7/site-packages/DPark-0.3.2-py2.7-linux-x86_64.egg
Processing dependencies for DPark==0.3.2
Searching for psutil
Reading https://pypi.python.org/simple/psutil/
Best match: psutil 4.3.0
Downloading https://pypi.python.org/packages/22/a8/6ab3f0b3b74a36104785808ec874d24203c6a511ffd2732dd215cf32d689/psutil-4.3.0.tar.gz#md5=ca97cf5f09c07b075a12a68b9d44a67d
Processing psutil-4.3.0.tar.gz
Writing /tmp/easy_install-T5wfHJ/psutil-4.3.0/setup.cfg
Running psutil-4.3.0/setup.py -q bdist_egg --dist-dir /tmp/easy_install-T5wfHJ/psutil-4.3.0/egg-dist-tmp-_tuYcX
Traceback (most recent call last):
File "setup.py", line 53, in <module>
'examples/dgrep',
File "/usr/lib64/python2.7/distutils/core.py", line 152, in setup
dist.run_commands()
File "/usr/lib64/python2.7/distutils/dist.py", line 953, in run_commands
self.run_command(cmd)
File "/usr/lib64/python2.7/distutils/dist.py", line 972, in run_command
cmd_obj.run()
File "build/bdist.linux-x86_64/egg/setuptools/command/install.py", line 67, in run
File "build/bdist.linux-x86_64/egg/setuptools/command/install.py", line 117, in do_egg_install
File "build/bdist.linux-x86_64/egg/setuptools/command/easy_install.py", line 380, in run
File "build/bdist.linux-x86_64/egg/setuptools/command/easy_install.py", line 610, in easy_install
File "build/bdist.linux-x86_64/egg/setuptools/command/easy_install.py", line 661, in install_item
File "build/bdist.linux-x86_64/egg/setuptools/command/easy_install.py", line 709, in process_distribution
File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 836, in resolve
File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 1081, in best_match
File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 1093, in obtain
File "build/bdist.linux-x86_64/egg/setuptools/command/easy_install.py", line 629, in easy_install
File "build/bdist.linux-x86_64/egg/setuptools/command/easy_install.py", line 659, in install_item
File "build/bdist.linux-x86_64/egg/setuptools/command/easy_install.py", line 842, in install_eggs
File "build/bdist.linux-x86_64/egg/setuptools/command/easy_install.py", line 1070, in build_and_install
File "build/bdist.linux-x86_64/egg/setuptools/command/easy_install.py", line 1056, in run_setup
File "build/bdist.linux-x86_64/egg/setuptools/sandbox.py", line 240, in run_setup
File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
self.gen.throw(type, value, traceback)
File "build/bdist.linux-x86_64/egg/setuptools/sandbox.py", line 193, in setup_context
File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
self.gen.throw(type, value, traceback)
File "build/bdist.linux-x86_64/egg/setuptools/sandbox.py", line 152, in save_modules
File "build/bdist.linux-x86_64/egg/setuptools/sandbox.py", line 126, in __exit__
File "build/bdist.linux-x86_64/egg/setuptools/sandbox.py", line 110, in dump
(the previous frame repeats dozens of times as dump recurses)
MemoryError
In the test code for countByValueAndWindow, I noticed it differs from Spark's test cases.
The format and meaning of the two outputs (DPark streaming and Spark streaming) are different, although I think DPark's behavior matches the documentation:
When called on a DStream of (K, V) pairs, returns a new DStream of (K, Long) pairs where the value of each key is its frequency within a sliding window. Like in reduceByKeyAndWindow, the number of reduce tasks is configurable through an optional argument.
Could someone please explain this difference a little bit?
Please also refer to this related question on stackoverflow.
Thanks!
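For reference, the documented semantics quoted above can be illustrated with a plain-Python sketch. The function name and the list-of-batches representation here are purely illustrative, not the DPark streaming API:

```python
from collections import Counter

# Illustrative sketch of countByValueAndWindow semantics: for each
# batch interval, count each value's frequency over the last `window`
# batches (names and representation are assumptions, not dpark's API).
def count_by_value_and_window(batches, window):
    results = []
    for end in range(1, len(batches) + 1):
        start = max(0, end - window)
        seen = [v for batch in batches[start:end] for v in batch]
        results.append(dict(Counter(seen)))
    return results

# count_by_value_and_window([['a'], ['a', 'b'], ['b']], window=2)
# -> [{'a': 1}, {'a': 2, 'b': 1}, {'a': 1, 'b': 2}]
```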
Running on uncompressed files works fine.
With gz or bz2 compressed input, most tasks finish much faster than with the uncompressed file — suspiciously fast — and then some tasks keep timing out:
[WARNING] [job ] re-submit task 118 for timeout 30.1, try 2
Update: I saw the comment in the gzip RDD; with pigz -i the job runs correctly, but is there anything to watch out for with bzip2?
Also, which compression format does Douban usually use?
Hello, have you considered releasing dpark.pymesos as a standalone package?
I have used it with great success, and now that mesos 0.20.0 is out it is possible to install the mesos interfaces from mesos.interface to avoid copy-pasting more code.
I like that pymesos doesn't require extra packages, compared to the https://github.com/wickman/pesos project.
Thanks
When I run the demo on my laptop, it hangs at the text search example. The exception after Ctrl-C is:
^Clogging
Traceback (most recent call last):
File "demo.py", line 16, in
print 'logging', log.count()
File "/home/xzl/hg/spark/dpark/dpark/rdd.py", line 368, in count
return sum(self.ctx.runJob(self, lambda x: sum(1 for i in x)))
File "/home/xzl/hg/spark/dpark/dpark/context.py", line 263, in runJob
for it in self.scheduler.runJob(rdd, func, partitions, allowLocal):
File "/home/xzl/hg/spark/dpark/dpark/schedule.py", line 292, in runJob
submitStage(finalStage)
File "/home/xzl/hg/spark/dpark/dpark/schedule.py", line 254, in submitStage
submitMissingTasks(stage)
File "/home/xzl/hg/spark/dpark/dpark/schedule.py", line 290, in submitMissingTasks
self.submitTasks(tasks)
File "/home/xzl/hg/spark/dpark/dpark/schedule.py", line 397, in submitTasks
_, reason, result, update = run_task(task, self.nextAttempId())
File "/home/xzl/hg/spark/dpark/dpark/schedule.py", line 376, in run_task
result = task.run(aid)
File "/home/xzl/hg/spark/dpark/dpark/task.py", line 52, in run
return self.func(self.rdd.iterator(self.split))
File "/home/xzl/hg/spark/dpark/dpark/rdd.py", line 368, in
return sum(self.ctx.runJob(self, lambda x: sum(1 for i in x)))
File "/home/xzl/hg/spark/dpark/dpark/rdd.py", line 368, in
return sum(self.ctx.runJob(self, lambda x: sum(1 for i in x)))
File "/home/xzl/hg/spark/dpark/dpark/util.py", line 111, in _
for r in result:
File "/home/xzl/hg/spark/dpark/dpark/cache.py", line 205, in getOrCompute
cachedVal = self.cache.get(key)
File "/home/xzl/hg/spark/dpark/dpark/cache.py", line 57, in get
locs = self.tracker.getCacheUri(rdd_id, index)
File "/home/xzl/hg/spark/dpark/dpark/cache.py", line 195, in getCacheUri
return self.client.call(GetValueMessage('cache:%s-%s' % (rdd_id, index)))
File "/home/xzl/hg/spark/dpark/dpark/tracker.py", line 116, in call
return sock.recv_pyobj()
File "/home/xzl/.virtualenvs/dpark/local/lib/python2.7/site-packages/zmq/sugar/socket.py", line 436, in recv_pyobj
s = self.recv(flags)
File "zmq/backend/cython/socket.pyx", line 674, in zmq.backend.cython.socket.Socket.recv (zmq/backend/cython/socket.c:6971)
File "zmq/backend/cython/socket.pyx", line 708, in zmq.backend.cython.socket.Socket.recv (zmq/backend/cython/socket.c:6763)
File "zmq/backend/cython/socket.pyx", line 145, in zmq.backend.cython.socket._recv_copy (zmq/backend/cython/socket.c:1931)
File "zmq/backend/cython/checkrc.pxd", line 12, in zmq.backend.cython.checkrc._check_rc (zmq/backend/cython/socket.c:7222)
KeyboardInterrupt
Also, I have to press Ctrl-C twice.
There is no such problem on my desktop machine.
Is this a zmq issue? From the traceback it looks like sock.recv_pyobj() hangs waiting for a message from the trackerServer.
Here is roughly what my job does:
map1 = map1.reduceByKey2.cache()
map2.reduceBykey2(map1)
map2.foreach
map1.foreach
During this run, adding cache() doesn't seem to make any difference.
The cache function in rdd.py also just does return self, with one line commented out:
'self.shouldCache = True'
Am I using it the wrong way, or am I misunderstanding what cache() does?
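For what it's worth, here is a minimal sketch of what an effective cache() is expected to do. This is not DPark's implementation, only the semantics suggested by the commented-out shouldCache flag: compute a dataset once, then reuse the materialized result on later actions.

```python
# Minimal sketch of cache() semantics (illustrative, not dpark's code):
# after cache(), the first action materializes the data, and later
# actions reuse it instead of recomputing.
class FakeRDD:
    def __init__(self, compute):
        self._compute = compute      # function that produces the data
        self._should_cache = False   # stands in for the shouldCache flag
        self._cached = None

    def cache(self):
        self._should_cache = True
        return self

    def collect(self):
        if self._cached is not None:
            return self._cached      # reuse the cached result
        data = self._compute()
        if self._should_cache:
            self._cached = data
        return data

calls = []
rdd = FakeRDD(lambda: (calls.append(1), [1, 2, 3])[1]).cache()
first = rdd.collect()
second = rdd.collect()
# with caching in effect, the compute function ran exactly once
```

A cache() that only does `return self` with the flag commented out would skip the reuse branch entirely, which matches the "no difference" behavior described above.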
The code is at https://gist.github.com/5062200
Running it raises:
TypeError: 'Database' object is not callable. If you meant to call the 'getnewargs' method on a 'Connection' object it is failing because no such method exists.
For now the only workaround I can think of is adding a proxy layer (e.g. nginx + lua exposing an HTTP interface) in front of MongoDB. Is there a better approach?
Requirement: for each UID, and for each WORD under each UID, compute pv and uv.
The code is at: https://gist.github.com/muxueqz/4748764
Expected output:
[(('uid', 'word'), (set(['0']), 1)), ('123', [set(['1', '3', '2']), 5]), (('123', 'home'), [set(['1', '2']), 3]),
(('123', 'work'), [set(['3', '2']), 2]), ('uid', (set(['0']), 1))]
Actual output:
[(('uid', 'word'), (set(['0']), 1)), ('123', [set(['1', '3', '2']), 5]), (('123', 'home'), [set(['1', '2']), 3]),
(('123', 'work'), [set(['1', '3', '2']), 2]), ('uid', (set(['0']), 1))]
Note the (('123', 'work') entry.
Is my usage wrong, or is this a bug?
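Without seeing the gist, one common cause of exactly this symptom is an in-place set.update() inside the combine function, which lets one key's uv set leak into another's when values are shared or cached. A pure-Python sketch of the intended (uv set, pv count) aggregation that avoids in-place mutation (all names here are illustrative):

```python
# Illustrative reduce over (key, (uv_set, pv_count)) pairs that builds
# new sets instead of mutating shared ones in place. In-place
# set.update() on a value shared between keys can produce the
# contaminated uv set shown in the actual output above.
def combine(a, b):
    return (a[0] | b[0], a[1] + b[1])  # fresh set union, summed count

records = [
    (('123', 'work'), ({'3'}, 1)),
    (('123', 'work'), ({'2'}, 1)),
    (('123', 'home'), ({'1'}, 1)),
]
result = {}
for key, value in records:
    result[key] = combine(result[key], value) if key in result else value
# result[('123', 'work')] -> ({'3', '2'}, 2)
```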
ImportError: Building module dpark.portable_hash failed: ["CompileError: command 'gcc' failed with exit status 1\n"]
After installing DPark on OS X, importing dpark raises the error above. How can I fix it?
import dpark
bcast_test = dpark.broadcast(1)
bcast_2 = dpark.broadcast(2)
This worked in earlier versions; the current version raises:
Traceback (most recent call last):
File "bcast_test.py", line 3, in <module>
bcast_2 = dpark.broadcast(2)
TypeError: 'module' object is not callable
Changing it to
import dpark
dpark_bcast = dpark.DparkContext()
bcast_test = dpark_bcast.broadcast(1)
bcast_2 = dpark_bcast.broadcast(2)
makes it work.
I set up mesos cluster on Amazon ec2 using mesos EC2-Scripts:
Then I run
python27 demo.py -m mesos://[email protected]:5050 -p 2
The program was halting there, it stuck at submitTasks() in schedule.py. Press Ctrl-C:
[ec2-user@ip-10-31-194-149 examples]$ python27 demo.py -m mesos://[email protected]:5050 -p 2
2013-05-20 12:30:43,786 [INFO] [scheduler] Got a job with 4 tasks
^CTraceback (most recent call last):
  File "demo.py", line 10, in <module>
    print nums.count()
  File "/home/ec2-user/dpark/dpark/rdd.py", line 271, in count
    return sum(self.ctx.runJob(self, lambda x: ilen(x)))
  File "/home/ec2-user/dpark/dpark/context.py", line 204, in runJob
    for it in self.scheduler.runJob(rdd, func, partitions, allowLocal):
  File "/home/ec2-user/dpark/dpark/schedule.py", line 269, in runJob
    submitStage(finalStage)
  File "/home/ec2-user/dpark/dpark/schedule.py", line 231, in submitStage
    submitMissingTasks(stage)
  File "/home/ec2-user/dpark/dpark/schedule.py", line 267, in submitMissingTasks
    self.submitTasks(tasks)
  File "/home/ec2-user/dpark/dpark/schedule.py", line 436, in _
    r = f(self, *a, **kw)
  File "/usr/lib64/python2.7/threading.py", line 154, in __exit__
    self.release()
  File "/usr/lib64/python2.7/threading.py", line 142, in release
    raise RuntimeError("cannot release un-acquired lock")
RuntimeError: cannot release un-acquired lock
In Mesos logs:
Log file created at: 2013/05/20 11:40:49
Running on machine: ip-10-31-194-149.ec2.internal
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0520 11:40:49.170337 2099 logging.cpp:70] Logging to /mnt/mesos-logs
I0520 11:40:49.172806 2099 main.cpp:95] Build: 2011-12-03 06:24:10 by root
I0520 11:40:49.172871 2099 main.cpp:96] Starting Mesos master
I0520 11:40:49.176777 2099 webui.cpp:81] Starting master web server on port 8080
I0520 11:40:49.176911 2101 master.cpp:264] Master started at mesos://[email protected]:5050
I0520 11:40:49.177106 2104 webui.cpp:47] Master web server thread started
I0520 11:40:49.177109 2101 master.cpp:279] Master ID: 201305201140-0
I0520 11:40:49.177775 2101 master.cpp:462] Elected as master!
I0520 11:40:49.191300 2104 webui.cpp:59] Loading webui/master/webui.py
I0520 11:40:54.348682 2101 master.cpp:814] Attempting to register slave 201305201140-0-0 at [email protected]:33513
I0520 11:40:54.349149 2101 master.cpp:1057] Master now considering a slave at ip-10-28-0-219.ec2.internal:33513 as active
I0520 11:40:54.349210 2101 master.cpp:1588] Adding slave 201305201140-0-0 at ip-10-28-0-219.ec2.internal with cpus=2; mem=677
I0520 11:40:54.349393 2101 simple_allocator.cpp:71] Added slave 201305201140-0-0 with cpus=2; mem=677
W0520 12:00:52.365500 2101 protobuf.hpp:260] Initialization errors: framework.executor
Question:
Does DPark require a specific mesos version?
Is there any relevant documentation for setting up DPark with Mesos?
Tested; it fails with a "job lost" error.
Could you document the supported environments? Installation fails on win64 with Python 3.5.
Installed /Library/Python/2.7/site-packages/mesos.interface-0.22.1.2-py2.7.egg
error: mesos.interface 0.22.1.2 is installed but mesos.interface==0.22.0 is required by set(['pymesos'])
Do I need to set an environment variable for pymesos? If so, how?
The closestCenter function in the kmeans.py example may contain an off-by-one error:
for i in range(1, len(centers)):
skips over the first center, because the centers are stored in zero-based lists.
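Whether this is actually a bug depends on how the loop seeds its running minimum: starting at index 1 is correct only when centers[0] initializes the best distance before the loop. A hedged sketch of that pattern (not the actual kmeans.py code):

```python
import math

# Sketch of a closest-center search that starts the loop at 1 WITHOUT
# skipping the first center: centers[0] seeds the running minimum, so
# index 0 wins whenever no later center is strictly closer.
def closest_center(point, centers):
    best_index = 0
    best_dist = math.dist(point, centers[0])
    for i in range(1, len(centers)):
        d = math.dist(point, centers[i])
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index
```

If the example instead initializes the best distance to infinity and still iterates from 1, then center 0 can indeed never be chosen, which would match the off-by-one report above.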
Installed /Library/Python/2.7/site-packages/DPark-0.1-py2.7-macosx-10.8-intel.egg
Processing dependencies for DPark==0.1
error: Installed distribution protobuf 3.0.0-alpha-1 conflicts with requirement protobuf>=2.5.0,<3
There is a version constraint on protobuf: from what I can tell, some file requires protobuf 3.0 or newer to be installed, but setup only allows versions between 2.5.0 and 3. I don't know how to change this constraint. Thanks.