Comments (8)

thomasg19930417 commented on June 25, 2024

I think this is caused by too many connections being created when the number of shuffle partitions is too large. The log also shows a large number of connection-creation entries (screenshot not preserved).

from incubator-celeborn.

thomasg19930417 commented on June 25, 2024

Reducing shuffle.partitions would solve this problem, but a complex job contains both large and small tasks. A value that is too small does not suit the large tasks.

pan3793 commented on June 25, 2024

Does increasing celeborn.network.timeout help? The default value is 240s.
https://celeborn.apache.org/docs/0.2.1-incubating/configuration/#network

Note: set it in the Spark configuration with the additional prefix spark., so the full key is spark.celeborn.network.timeout.
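As a minimal sketch of the prefix rule described above, the Python snippet below maps a Celeborn client key to its Spark-side name. The helper name `to_spark_key` and the value `480s` are assumptions made for illustration (the thread only states the 240s default); this is not part of Celeborn or Spark.

```python
def to_spark_key(celeborn_key: str) -> str:
    """Return the Spark-side name for a Celeborn client setting.

    Hypothetical helper: it only prepends the "spark." prefix
    mentioned in the note above, unless the key already has it.
    """
    if celeborn_key.startswith("spark."):
        return celeborn_key  # already prefixed, leave unchanged
    return "spark." + celeborn_key


# "480s" is an illustrative value (double the 240s default), not a recommendation.
spark_conf = {to_spark_key("celeborn.network.timeout"): "480s"}
print(spark_conf)
```

The resulting dictionary can then be passed to whatever mechanism the job already uses to set Spark configuration.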

wForget commented on June 25, 2024

Is it caused by not disabling spark.sql.adaptive.localShuffleReader.enabled?

waitinfuture commented on June 25, 2024

@wForget Right. @thomasg19930417, could you turn off spark.sql.adaptive.localShuffleReader.enabled and test again?
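One way to turn the setting off is to pass it as a `--conf` flag to spark-submit. The sketch below only renders such flags from a dictionary; `conf_flags` is a hypothetical helper name, and how the flags reach the actual launch command is left to the job's own tooling.

```python
def conf_flags(settings: dict) -> list:
    """Render settings as spark-submit style `--conf key=value` arguments.

    Hypothetical helper for illustration; it does not launch anything.
    """
    flags = []
    for key, value in settings.items():
        flags += ["--conf", f"{key}={value}"]
    return flags


# Explicitly disable the local shuffle reader, as suggested in this thread.
flags = conf_flags({"spark.sql.adaptive.localShuffleReader.enabled": "false"})
print(" ".join(flags))
```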

thomasg19930417 commented on June 25, 2024

After testing, I can confirm it is indeed caused by this parameter. Thank you for your replies, @pan3793 @wForget @waitinfuture.

waitinfuture commented on June 25, 2024

We did add a comment about this config; maybe we should highlight it more :)

thomasg19930417 commented on June 25, 2024

I had commented this parameter out by mistake; it may have been commented out automatically when the shell script was copied.
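A quick check for this kind of slip could look like the sketch below, which scans launch-script or config text and reports whether the key appears on an active line, only on `#`-commented lines, or not at all. The function name `setting_status` and the sample text are hypothetical and are not taken from the original script.

```python
KEY = "spark.sql.adaptive.localShuffleReader.enabled"


def setting_status(text: str, key: str = KEY) -> str:
    """Return 'active', 'commented', or 'missing' for `key` in `text`.

    A line counts as commented when its first non-blank character is '#'.
    """
    status = "missing"
    for line in text.splitlines():
        if key not in line:
            continue
        if line.lstrip().startswith("#"):
            status = "commented"  # seen, but only in a comment so far
        else:
            return "active"  # an uncommented occurrence wins
    return status


# Hypothetical sample: the setting is present but commented out.
sample = """# spark.sql.adaptive.localShuffleReader.enabled false
spark.celeborn.network.timeout 240s
"""
print(setting_status(sample))
```

The check is purely textual, so it only shows that the line is inactive in the file, not what value the running job ends up with.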
