Comments (7)
Consul is nice. It seems to handle service discovery and a key-value store, kind of like ZooKeeper.
The current "leader" component also allocates resources to each driver program. If we used Consul or something similar, resource allocation would still need its own separate instance.
One of the goals is to keep the number of components minimal and to put the logic where it belongs.
In the future, Glow will integrate with Consul, ZooKeeper, Mesos, YARN, etc. But for now, let's build the system the way it should be, instead of trying to fit into other projects' existing APIs.
from glow.
I have used ZooKeeper and Mesos before on other ML projects.
Consul is written in Go and is lightweight.
Please consider it going forward.
I am getting bogged down with reading data sources in ranges. For example, I have 4 TB of data on a SAN, so the whole cluster can see it as one data source.
Each compute node needs to request its own range out of the whole. You know what I mean?
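For the SAN case, one way to hand each compute node its own slice is to split the total byte size into contiguous ranges. This is a minimal Go sketch, not part of glow; `ByteRange` and `SplitRanges` are hypothetical names, and the 4 TiB / 8-node layout in `main` is only an illustration.

```go
package main

import "fmt"

// ByteRange is a half-open interval [Start, End) of byte offsets.
type ByteRange struct {
	Start, End int64
}

// SplitRanges divides totalSize bytes into at most n contiguous ranges whose
// sizes differ by at most one byte; the first totalSize%n ranges get the extra byte.
func SplitRanges(totalSize int64, n int) []ByteRange {
	if n <= 0 || totalSize <= 0 {
		return nil
	}
	base := totalSize / int64(n)
	rem := totalSize % int64(n)
	ranges := make([]ByteRange, 0, n)
	var start int64
	for i := 0; i < n; i++ {
		size := base
		if int64(i) < rem {
			size++
		}
		if size == 0 {
			break // more nodes than bytes: the remaining nodes get no range
		}
		ranges = append(ranges, ByteRange{Start: start, End: start + size})
		start += size
	}
	return ranges
}

func main() {
	const tb = int64(1) << 40 // 1 TiB
	// Hypothetical layout: 4 TiB of data spread over 8 compute nodes.
	for i, r := range SplitRanges(4*tb, 8) {
		fmt.Printf("node %d: bytes [%d, %d)\n", i, r.Start, r.End)
	}
}
```

Byte ranges will usually cut records in half, so each reader still needs a rule for lines that straddle a range boundary.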
Any thoughts?
Are you planning to build a driver for each data source type?
S3 is easy, because it just uses HTTP range requests.
SQL is easy too, because it's just pagination.
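A sketch of the pagination idea: split a row count into `LIMIT`/`OFFSET` pages, one per worker. `Page`, `Pages`, and the `events` table are hypothetical; the table and column names are formatted directly into the SQL, so they must be trusted identifiers, not user input.

```go
package main

import "fmt"

// Page describes one LIMIT/OFFSET slice of an ordered table.
type Page struct {
	Offset, Limit int
}

// Pages splits totalRows into consecutive pages of at most pageSize rows.
func Pages(totalRows, pageSize int) []Page {
	if totalRows <= 0 || pageSize <= 0 {
		return nil
	}
	var pages []Page
	for off := 0; off < totalRows; off += pageSize {
		limit := pageSize
		if off+limit > totalRows {
			limit = totalRows - off // last page may be shorter
		}
		pages = append(pages, Page{Offset: off, Limit: limit})
	}
	return pages
}

// Query renders p as a SQL statement. A stable ORDER BY is required so that
// pages neither overlap nor skip rows.
func (p Page) Query(table, orderBy string) string {
	return fmt.Sprintf("SELECT * FROM %s ORDER BY %s LIMIT %d OFFSET %d", table, orderBy, p.Limit, p.Offset)
}

func main() {
	for _, p := range Pages(2500, 1000) {
		fmt.Println(p.Query("events", "id"))
	}
}
```

On very large tables `OFFSET` gets slower as it grows, so keyset pagination (`WHERE id > last_seen ORDER BY id LIMIT n`) is a common alternative.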
One executor can divide the data into blocks and map to
(Reply sent by email on Wednesday, December 2, 2015, to joeblew99.)
(last email was sent by mistake)
To read a data source, you can give the data location to a mapper. That mapper can divide the location into data ranges where possible and partition those ranges; the following mapper can then fetch one range on each executor.
The adapters for data sources will vary a lot. I am thinking of putting them into an external package. Ideas and pull requests are welcome!
Chris
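If the adapters live in an external package, they might share a small interface. The sketch below is a hypothetical shape, not an existing glow API: `DataSource`, `Split`, and the toy in-memory `sliceSource` adapter are all assumptions.

```go
package main

import "fmt"

// Split identifies one independently readable piece of a data source.
// What it means (a byte range, a file block, a SQL page) is up to the adapter.
type Split struct {
	ID    int
	Start int64
	End   int64 // exclusive
}

// DataSource is a hypothetical adapter interface, not part of glow.
type DataSource interface {
	// Splits divides the source into at most n pieces.
	Splits(n int) ([]Split, error)
	// Read streams the records of one split into out. It does not close out.
	Read(s Split, out chan<- string) error
}

// sliceSource is a toy adapter over an in-memory slice of records.
type sliceSource struct{ records []string }

func (src sliceSource) Splits(n int) ([]Split, error) {
	if n <= 0 {
		return nil, fmt.Errorf("n must be positive, got %d", n)
	}
	total := int64(len(src.records))
	size := (total + int64(n) - 1) / int64(n) // ceiling division
	var splits []Split
	for start := int64(0); start < total; start += size {
		end := start + size
		if end > total {
			end = total
		}
		splits = append(splits, Split{ID: len(splits), Start: start, End: end})
	}
	return splits, nil
}

func (src sliceSource) Read(s Split, out chan<- string) error {
	for _, rec := range src.records[s.Start:s.End] {
		out <- rec
	}
	return nil
}

func main() {
	var src DataSource = sliceSource{records: []string{"a", "b", "c", "d", "e"}}
	splits, err := src.Splits(2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range splits {
		out := make(chan string)
		go func(s Split) {
			defer close(out) // the caller owns closing, per the interface contract
			if err := src.Read(s, out); err != nil {
				fmt.Println("error:", err)
			}
		}(s)
		for rec := range out {
			fmt.Printf("split %d: %s\n", s.ID, rec)
		}
	}
}
```

Keeping `Splits` separate from `Read` mirrors the two steps in the thread: one mapper plans the ranges, and the executors read them independently.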
OK, that makes sense.
I just need to find it in the code; the reflection abstraction still makes it a bit tough.
Any examples or links to code would be awesome. I am still playing with the current examples.
I am happy to contribute drivers.
For file-based sources, maybe SeaweedFS? It's your baby and would make a great file system for this.
For databases, maybe CockroachDB or a simple KV store.
CockroachDB used to have a KV API, but they removed it.
Personally, I would like to use a very simple DB store, because I want to run it on mobile. I already run other Go code on iOS and Android.
Again, links into the code would really help, so I can see where to start.
You should not need to write any reflection code.
Here is some pseudo-code for HDFS:
func AddHdfsFile(f *flow.Flow, hdfsLocation string) *flow.Dataset {
	// list the block files under hdfsLocation (some_func is a placeholder)
	blockList := some_func(hdfsLocation)
	// one mapper call per block location; each call streams its lines out
	return f.Slice(blockList).Map(func(blockLocation string, lines chan string) {
		// read data from blockLocation and feed it into the "lines" channel
	})
}
Added an HDFS example.