Comments (14)

lgarithm commented on June 22, 2024

@zrss could you also share the config.json that you applied for each scaling step?

from kungfu.

lgarithm commented on June 22, 2024

What flags did you pass to kungfu-run? They are printed at the very beginning of the log, e.g.

$ kungfu-run -w -np 4 echo 2
[arg] [0]=kungfu-run
[arg] [1]=-w
[arg] [2]=-np
[arg] [3]=4
[arg] [4]=echo
[arg] [5]=2
[kf-env]: KUNGFU_GIT_URL=/Users/lg/code/mirrors/github.com/lsds/KungFu
[nic] [0] lo0 :: 127.0.0.1/8, ::1/128, fe80::1/64
[nic] [1] gif0 :: 
[nic] [2] stf0 :: 
[nic] [3] en0 :: 192.168.1.85/24
[nic] [4] en3 :: 
[nic] [5] en4 :: 
[nic] [6] en1 :: 
[nic] [7] en2 :: 
[nic] [8] bridge0 :: 
[nic] [9] p2p0 :: 
[nic] [10] awdl0 :: fe80::4c97:11ff:feab:ca51/64
[nic] [11] llw0 :: fe80::4c97:11ff:feab:ca51/64
[nic] [12] utun0 :: fe80::9d17:9272:6aa8:a8e8/64
[nic] [13] utun1 :: fe80::e340:7496:425b:77ee/64
[nic] [14] en5 :: fe80::aede:48ff:fe00:1122/64
[I] watching config server

I suspect the -init-version flag is not set correctly for the second scaling up.
According to the example

https://github.com/lsds/KungFu/blob/master/tests/go/cmd/kungfu-cluster-manager-example/kungfu-cluster-manager-example.go#L89

it should be set to -1 if the kungfu-run is not the first generation.

zrss commented on June 22, 2024

thanks for the reply ~

it should be set to -1 if the kungfu-run is not the first generation.

The kungfu-run parameters are identical across all of my scale up/down cases.

zrss commented on June 22, 2024

@lgarithm, can I draw this conclusion?

  1. bootstrap a new kungfu job with the default init-version (i.e. do not set it)
  2. always set init-version to -1 for a newly added kungfu-run

rankeey commented on June 22, 2024

@lgarithm can we just set init-version to -1, not only for the newly added kungfu-run but also for the first generation kungfu-run? It is hard for the cluster to distinguish whether a kungfu-run is the first generation.

lgarithm commented on June 22, 2024

@lgarithm , can i make this conclusion

  1. bootstrap a new kungfu job with default init-version (not set it)
  2. always set init-version to -1 for newly added kungfu-run

Yes, this is correct.

lgarithm commented on June 22, 2024

@lgarithm can we just set init-version to -1, not only for the newly added kungfu-run but also for the first generation kungfu-run? It is hard for the cluster to distinguish whether a kungfu-run is the first generation.

We can consider this as a future improvement, but currently I can't think of a clean way to do it.

lgarithm commented on June 22, 2024

@lgarithm can we just set init-version to -1, not only for the newly added kungfu-run but also for the first generation kungfu-run? It is hard for the cluster to distinguish whether a kungfu-run is the first generation.

If you can manually initialize the first generation kungfu-runs, then you can always set init-version to -1.

lgarithm commented on June 22, 2024

i.e. start the first generation kungfu-run with -init-version -1, then run this

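// notify waits for one runner to come up, then sends it the current stage.
var notify execution.PeerFunc = func(ctrl plan.PeerID) error {
	ctx, cancel := context.WithTimeout(context.TODO(), config.WaitRunnerTimeout)
	defer cancel()
	// Wait for the runner to respond, giving up when the timeout expires.
	n, err := p.router.Wait(ctx, ctrl)
	if err != nil {
		return err
	}
	if n > 0 {
		log.Warnf("%s is up after pinged %d times", ctrl, n+1)
	}
	// Deliver the stage to the runner as an "update" control message.
	return p.router.Send(ctrl.WithName("update"), stage.Encode(), connection.ConnControl, 0)
}
// Notify every runner in the cluster; exit if any notification fails.
if err := notify.Par(cluster.Runners); err != nil {
	utils.ExitErr(err)
}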

in your cluster manager.

zrss commented on June 22, 2024

@lgarithm thanks for the reply. I'd like to try it; currently it seems to be the only way for us.

To clarify:

In our current architecture, the cluster manager generates a host file (which only records the IPs of the containers), and we introduce a kungfu-mng process that converts the host file into KungFu's config.json.

kungfu-mng runs inside the container, and every container has the same metadata (including the bootstrap command, which is why we want to set init-version to a fixed value), because the only thing we can change through the cluster manager is the number of containers (i.e. the elastic feature of Volcano on K8S).

When we scale the kungfu-job up or down, the cluster manager updates the host file and starts the new container (or shuts one down).

So for now I can't think of a way to tell, from inside a container, whether it is newly added, unless the cluster manager tags newly added containers with some label (for example, by adding a SCALE_OUT env to them).

Alternatively, kungfu-mng can compare the containers in the host file with the -H argument of the kungfu-run bootstrap command:

  1. the number of containers (and their IPs) == -H: the first generation
  2. the number of containers (and their IPs) != -H: not the first generation

then

  1. the first generation: bootstrap kungfu-run with init-version=0
  2. not the first generation: bootstrap kungfu-run with init-version=-1
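
A minimal sketch of the comparison described above, in Go. The function names (sameHosts, initVersion) and the plain-slice representation of the host lists are hypothetical and chosen only for illustration; they are not part of KungFu or kungfu-mng, and parsing of the real -H value and host file is left out.

```go
package main

import (
	"fmt"
	"sort"
)

// sameHosts reports whether two host lists contain the same IPs,
// ignoring order. (Hypothetical helper, not a KungFu API.)
func sameHosts(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	// Sort copies so the caller's slices are left untouched.
	x := append([]string(nil), a...)
	y := append([]string(nil), b...)
	sort.Strings(x)
	sort.Strings(y)
	for i := range x {
		if x[i] != y[i] {
			return false
		}
	}
	return true
}

// initVersion applies the rule from the comment above:
// host file == -H list  -> first generation     -> init-version 0
// host file != -H list  -> not first generation -> init-version -1
func initVersion(bootstrapHosts, hostFile []string) int {
	if sameHosts(bootstrapHosts, hostFile) {
		return 0
	}
	return -1
}

func main() {
	bootstrap := []string{"10.0.0.1", "10.0.0.2"}          // hosts from -H (example values)
	scaled := []string{"10.0.0.1", "10.0.0.2", "10.0.0.3"} // host file after a scale-up

	fmt.Println(initVersion(bootstrap, bootstrap)) // 0
	fmt.Println(initVersion(bootstrap, scaled))    // -1
}
```

Note that this rule cannot tell a fresh job apart from one whose host file has been scaled back to exactly the bootstrap hosts.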

zrss commented on June 22, 2024

https://github.com/volcano-sh/volcano

lgarithm commented on June 22, 2024

What if the config.json is restored to its original state after two scaling operations?

  1. the number of container (and ip) == -H, the first generation
  2. the number of container (and ip) != -H, not the first generation

then

  1. the first generation, bootstrap kungfu-run by init-version=0
  2. not the first generation, bootstrap kungfu-run by init-version=-1

lgarithm commented on June 22, 2024

How about adding a version field to the config.json object?

zrss commented on June 22, 2024

What if the config.json is restored to its original state after two scaling operations?

  1. the number of container (and ip) == -H, the first generation
  2. the number of container (and ip) != -H, not the first generation

then

  1. the first generation, bootstrap kungfu-run by init-version=0
  2. not the first generation, bootstrap kungfu-run by init-version=-1

We (the platform) should require that the number of instances never drops below the default value when scaling down, which simplifies this scenario.

How about adding a version field to the config.json object?

Good idea. We can file a feature request with the cluster manager to add a version field to the host file. Generally speaking, version = version + 1 on every scale up/down.
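
The version idea could look roughly like the Go sketch below. The HostFile type, its JSON field names, and the mapping "version 0 means first generation" are assumptions made for illustration; the thread does not define the actual host file or config.json layout.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// HostFile is a hypothetical host file layout with the proposed version field.
type HostFile struct {
	Version int      `json:"version"`
	Hosts   []string `json:"hosts"`
}

// Rescale returns the host file for the next scaling step:
// the new host list with version = version + 1.
func (h HostFile) Rescale(hosts []string) HostFile {
	return HostFile{
		Version: h.Version + 1,
		Hosts:   append([]string(nil), hosts...),
	}
}

// InitVersion picks the init-version value from the version field alone
// (assumption: version 0 is the first generation, anything else is not).
func (h HostFile) InitVersion() int {
	if h.Version == 0 {
		return 0
	}
	return -1
}

func main() {
	first := HostFile{Hosts: []string{"10.0.0.1", "10.0.0.2"}}
	up := first.Rescale([]string{"10.0.0.1", "10.0.0.2", "10.0.0.3"})
	down := up.Rescale(first.Hosts) // same hosts as the first generation

	for _, h := range []HostFile{first, up, down} {
		bs, err := json.Marshal(h)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> init-version %d\n", bs, h.InitVersion())
	}
}
```

Because the decision depends only on the counter, a host list that returns to its original contents after two scaling operations is still treated as not the first generation.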
