
Comments (4)

keisukefukuda commented on June 25, 2024

Hi @sharpstill,
Thank you for your request. We will consider it carefully, because this would be better as a Chainer feature than as a ChainerMN one. In addition, our resources are limited and, unfortunately, we don't have much experience with Hadoop/HDFS.

Thanks!

from chainermn.

machanic commented on June 25, 2024

@keisukefukuda Thank you very much. For experiments, HDFS can be set up on just two machines.
By the way, TensorFlow already supports HDFS, and Yahoo has released TensorFlowOnSpark.
Maybe this will help you.


kuenishi commented on June 25, 2024

I wonder what @sharpstill meant by supporting the HDFS file system. Even today it is possible to access data in HDFS directly: create your own Dataset subclass that holds a list of files or chunks of HDFS file(s), and read from them in each get_example() call. See our example on the local file system and Chainer's documentation on datasets. With these tools, your job can already read its contents directly from HDFS. Also see Wes McKinney's article comparing Python HDFS clients when choosing one.
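A minimal sketch of the Dataset subclass described above, assuming one example per file. `FileListDataset`, `make_hdfs_opener`, and the `open_fn`/`decode` parameters are hypothetical names chosen for illustration. The HDFS side assumes PyArrow's `pyarrow.fs.HadoopFileSystem` (PyArrow 2.0 or later, with libhdfs and a Hadoop client configured); any callable that returns a readable binary file works as `open_fn`, so the same class also runs against local files.

```python
try:
    from chainer.dataset import DatasetMixin
except ImportError:
    # Minimal stand-in so the sketch runs without Chainer installed;
    # Chainer's own DatasetMixin.__getitem__ also dispatches to get_example().
    class DatasetMixin:
        def __getitem__(self, index):
            if isinstance(index, slice):
                start, stop, step = index.indices(len(self))
                return [self.get_example(i) for i in range(start, stop, step)]
            return self.get_example(index)


class FileListDataset(DatasetMixin):
    """Hypothetical dataset: one example per file, read lazily on access."""

    def __init__(self, paths, open_fn, decode=lambda raw: raw):
        self._paths = list(paths)
        self._open_fn = open_fn  # e.g. an HDFS client's "open for reading"
        self._decode = decode    # turns raw bytes into an example (image, array, ...)

    def __len__(self):
        return len(self._paths)

    def get_example(self, i):
        # Only the file for example i is fetched, so the whole dataset never
        # has to fit in memory or be copied to local disk first.
        with self._open_fn(self._paths[i]) as f:
            return self._decode(f.read())


def make_hdfs_opener(host, port=8020):
    """Return an open_fn backed by PyArrow's HDFS client (assumes libhdfs is available)."""
    from pyarrow import fs  # imported lazily: only needed for real HDFS access
    hdfs = fs.HadoopFileSystem(host, port)
    return hdfs.open_input_stream
```

For example, `FileListDataset(paths, make_hdfs_opener("namenode-host"))` (the host name is a placeholder) can be handed to `chainer.iterators.SerialIterator` like any other dataset. Opening the file inside `get_example()` keeps each access independent, at the cost of one HDFS open per example.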

Further support, such as data-locality awareness and data access optimization, requires several design decisions, and we'll be working on it.


kuenishi commented on June 25, 2024

So we're testing some use cases on HDFS, but we still don't feel the need for HDFS-specific code, since we can write our own Dataset accessor like this, using libhdfs and PyArrow. I'm closing this issue for the time being, but feel free to reopen it for further discussion.
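As a rough illustration of reading chunks of a single HDFS file through PyArrow, here is a sketch that assumes the file consists of back-to-back records of a fixed byte size. `FixedRecordDataset` and `make_hdfs_seekable_opener` are hypothetical names; the HDFS opener relies on `pyarrow.fs.HadoopFileSystem.open_input_file`, which returns a seekable file (unlike `open_input_stream`), and assumes libhdfs and a Hadoop client are configured.

```python
import io

try:
    from chainer.dataset import DatasetMixin
except ImportError:
    # Minimal stand-in so the sketch runs without Chainer installed.
    class DatasetMixin:
        def __getitem__(self, index):
            if isinstance(index, slice):
                start, stop, step = index.indices(len(self))
                return [self.get_example(i) for i in range(start, stop, step)]
            return self.get_example(index)


class FixedRecordDataset(DatasetMixin):
    """Hypothetical dataset over one file holding fixed-size binary records."""

    def __init__(self, path, record_size, open_fn):
        self._path = path
        self._record_size = record_size
        self._open_fn = open_fn  # must return a *seekable* binary file
        with open_fn(path) as f:
            f.seek(0, io.SEEK_END)  # measure the file size by seeking to the end
            size = f.tell()
        if size % record_size:
            raise ValueError(f"{size} bytes is not a multiple of {record_size}")
        self._n = size // record_size

    def __len__(self):
        return self._n

    def get_example(self, i):
        if i < 0:
            i += self._n  # allow Python-style negative indices
        if not 0 <= i < self._n:
            raise IndexError(i)
        with self._open_fn(self._path) as f:
            f.seek(i * self._record_size)  # jump straight to record i
            return f.read(self._record_size)


def make_hdfs_seekable_opener(host, port=8020):
    """Return an open_fn yielding seekable HDFS files (assumes libhdfs is available)."""
    from pyarrow import fs  # imported lazily: only needed for real HDFS access
    hdfs = fs.HadoopFileSystem(host, port)
    return hdfs.open_input_file
```

Variable-length records would need an index of byte offsets in place of `i * record_size`; that choice is left open in this sketch.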
