Comments (11)
I've got the Matrix invite and messaged you, but I see no messages from you. Not sure if you're away or my Matrix server is unhappy.
from ceph-balancer.
Wohoo, I've fixed this in d92c2fc! Thanks for your perfect reproduction cluster state file @dimm0 😄
That's bad, and it means my CRUSH validation is somehow wrong. I guess that's also why #4 occurs and the sanity check fires, for the same reason: I've never tested the script on a cluster that has racks instead of just hosts.
It's really hard to debug without access to such a cluster; I should add a feature that collects all the JSON data so you can send it to me and I can test/fix it completely offline.
Also, for now, it would be helpful if you ran with maximum logging and sent me the output of the whole decision process for one move that turns out invalid, and for one move that is valid.
Okay, sorry, I should have read both your responses before replying :-).
Currently we use the cluster in a way where 1 rack = 1 host, but we are preparing to extend the cluster, which will result in multiple hosts within a single rack, so I can't really change the CRUSH topology right now.
However, I think I can send you all the necessary JSON outputs that are collected at the beginning of the script. I'd just prefer not to attach them publicly to the issue. Can we find a more private channel to deliver them, e.g. email or a PM somewhere?
I've now added a feature to export the state bundle :)
Please create it with ./placementoptimizer.py gather outputfile.xz
You can reach me on Matrix or via mail: jj -at- sft lol :)
Ideally upload the file somewhere and send me a link (mail is not suited for blobs...)
Same here: almost all moves are not allowed in a cluster with a datacenter failure domain.
Are you running the latest version?
Can you send me your state dump, as I described previously, please? Also, for a given cluster state dump, please send me generated moves that Ceph does not accept but which the balancer produced. Then I can tinker with the code and try to figure out the problem.
Yes, the most recent.
I had to make a couple of changes, since the JSON gives a wrong type for replicated pools (I'm using Rook; it might have done something weird in the config):
diff --git a/placementoptimizer.py b/placementoptimizer.py
index 91a37ae..25a64e6 100755
--- a/placementoptimizer.py
+++ b/placementoptimizer.py
@@ -320,17 +320,18 @@ for pool in CLUSTER_STATE["df_dump"]["pools"]:
 for pool in CLUSTER_STATE["pool_dump"]:
+
     id = pool["pool_id"]
     ec_profile = pool["erasure_code_profile"]
     pg_shard_size_avg = pools[id]["stored"] / pools[id]["pg_num"]
-    if ec_profile:
+    if ec_profile and ec_profiles.get(ec_profile):
         pg_shard_size_avg /= ec_profiles[ec_profile]["data_chunks"]
     pools[id].update({
         "erasure_code_profile": ec_profile,
-        "repl_type": "ec" if ec_profile else "repl",
+        "repl_type": "ec" if ec_profile and ec_profiles.get(ec_profile) else "repl",
         "pg_shard_size_avg": pg_shard_size_avg
     })
@@ -678,7 +679,7 @@ def get_pg_shardsize(pgid):
     shard_size += pg_stats['num_omap_bytes']
     ec_profile = pg_ec_profile(pgid)
-    if ec_profile:
+    if ec_profile and ec_profiles.get(ec_profile):
         shard_size /= ec_profiles[ec_profile]["data_chunks"]
     # omap is not supported on EC pools (yet)
     # when it is, check how the omap data is spread (replica or also ec?)
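The gist of the patch above: the pool dump apparently carries an erasure_code_profile name even for replicated pools, and when that name has no entry in ec_profiles, the unguarded ec_profiles[ec_profile] lookup raises KeyError. A minimal standalone sketch of the guarded-lookup pattern (the data here is invented to mimic the bug, it is not real cluster output):

```python
# Hypothetical cluster state: pool 2 is replicated but (perhaps via rook)
# still carries a profile name that has no matching profile definition.
ec_profiles = {"myec": {"data_chunks": 4}}

pool_dump = [
    {"pool_id": 1, "erasure_code_profile": "myec", "stored": 400, "pg_num": 4},
    {"pool_id": 2, "erasure_code_profile": "weird", "stored": 100, "pg_num": 4},
]

pools = {}
for pool in pool_dump:
    ec_profile = pool["erasure_code_profile"]
    shard_size = pool["stored"] / pool["pg_num"]
    # Guarded lookup: only treat the pool as EC when the profile name
    # actually resolves; ec_profiles[ec_profile] alone would KeyError.
    if ec_profile and ec_profiles.get(ec_profile):
        shard_size /= ec_profiles[ec_profile]["data_chunks"]
        repl_type = "ec"
    else:
        repl_type = "repl"
    pools[pool["pool_id"]] = {"repl_type": repl_type,
                              "pg_shard_size_avg": shard_size}

print(pools)
```

With the guard, pool 2 is simply classified as replicated instead of crashing the script.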
[root@k8s-haosu-14 ~]# ./placementoptimizer.py -v balance --max-pg-moves 10
[2022-03-15 17:13:26,040] gathering cluster state via ceph api...
[2022-03-15 17:13:31,988] running pg balancer
[2022-03-15 17:13:32,023] current OSD fill rate per crushclasses:
[2022-03-15 17:13:32,024] hdd: average=51.24%, median=50.80%, without_placement_constraints=58.60%
[2022-03-15 17:13:32,024] cluster variance for crushclasses:
[2022-03-15 17:13:32,024] hdd: 54.884
[2022-03-15 17:13:32,025] min osd.26 37.306%
[2022-03-15 17:13:32,025] max osd.34 68.550%
[2022-03-15 17:13:32,033] SAVE move 1.345 osd.34 => osd.26
[2022-03-15 17:13:32,033] props: size=81.0G remapped=False upmaps=0
[2022-03-15 17:13:32,033] => variance new=54.63124092496721 < 54.884087241088345=old
[2022-03-15 17:13:32,034] new min osd.19 37.488%
[2022-03-15 17:13:32,034] max osd.29 68.065%
[2022-03-15 17:13:32,034] new cluster variance:
[2022-03-15 17:13:32,034] hdd: 54.631
[2022-03-15 17:13:32,043] SAVE move 1.21b osd.29 => osd.19
[2022-03-15 17:13:32,043] props: size=81.0G remapped=False upmaps=0
[2022-03-15 17:13:32,043] => variance new=54.384077250915475 < 54.63124092496721=old
[2022-03-15 17:13:32,043] new min osd.170 38.010%
[2022-03-15 17:13:32,043] max osd.34 67.683%
[2022-03-15 17:13:32,043] new cluster variance:
[2022-03-15 17:13:32,043] hdd: 54.384
[2022-03-15 17:13:32,052] SAVE move 1.10c osd.34 => osd.170
[2022-03-15 17:13:32,052] props: size=80.9G remapped=False upmaps=0
[2022-03-15 17:13:32,053] => variance new=54.161715943167025 < 54.384077250915475=old
[2022-03-15 17:13:32,053] new min osd.26 38.170%
[2022-03-15 17:13:32,053] max osd.29 67.199%
[2022-03-15 17:13:32,053] new cluster variance:
[2022-03-15 17:13:32,053] hdd: 54.162
[2022-03-15 17:13:32,062] SAVE move 1.75 osd.29 => osd.26
[2022-03-15 17:13:32,062] props: size=80.2G remapped=False upmaps=0
[2022-03-15 17:13:32,062] => variance new=53.9295038278412 < 54.161715943167025=old
[2022-03-15 17:13:32,062] new min osd.133 38.245%
[2022-03-15 17:13:32,062] max osd.34 66.817%
[2022-03-15 17:13:32,062] new cluster variance:
[2022-03-15 17:13:32,062] hdd: 53.930
[2022-03-15 17:13:32,071] SAVE move 1.1a0 osd.34 => osd.133
[2022-03-15 17:13:32,071] props: size=80.4G remapped=False upmaps=0
[2022-03-15 17:13:32,071] => variance new=53.71749631647796 < 53.9295038278412=old
[2022-03-15 17:13:32,071] new min osd.122 38.325%
[2022-03-15 17:13:32,071] max osd.29 66.341%
[2022-03-15 17:13:32,071] new cluster variance:
[2022-03-15 17:13:32,071] hdd: 53.717
[2022-03-15 17:13:32,081] SAVE move 1.38f osd.29 => osd.122
[2022-03-15 17:13:32,081] props: size=80.2G remapped=False upmaps=0
[2022-03-15 17:13:32,082] => variance new=53.510389448720424 < 53.71749631647796=old
[2022-03-15 17:13:32,082] new min osd.19 38.353%
[2022-03-15 17:13:32,082] max osd.34 65.956%
[2022-03-15 17:13:32,082] new cluster variance:
[2022-03-15 17:13:32,082] hdd: 53.510
[2022-03-15 17:13:32,090] SAVE move 1.fd osd.34 => osd.19
[2022-03-15 17:13:32,091] props: size=80.0G remapped=False upmaps=0
[2022-03-15 17:13:32,091] => variance new=53.290570485934 < 53.510389448720424=old
[2022-03-15 17:13:32,091] new min osd.70 38.369%
[2022-03-15 17:13:32,091] max osd.29 65.484%
[2022-03-15 17:13:32,091] new cluster variance:
[2022-03-15 17:13:32,091] hdd: 53.291
[2022-03-15 17:13:32,100] SAVE move 1.1e8 osd.29 => osd.70
[2022-03-15 17:13:32,100] props: size=80.2G remapped=False upmaps=0
[2022-03-15 17:13:32,100] => variance new=53.09083806042146 < 53.290570485934=old
[2022-03-15 17:13:32,100] new min osd.130 38.645%
[2022-03-15 17:13:32,100] max osd.53 65.476%
[2022-03-15 17:13:32,100] new cluster variance:
[2022-03-15 17:13:32,100] hdd: 53.091
[2022-03-15 17:13:32,109] SAVE move 1.11e osd.53 => osd.130
[2022-03-15 17:13:32,109] props: size=80.5G remapped=False upmaps=0
[2022-03-15 17:13:32,109] => variance new=52.892366525271036 < 53.09083806042146=old
[2022-03-15 17:13:32,109] new min osd.170 38.730%
[2022-03-15 17:13:32,109] max osd.34 65.101%
[2022-03-15 17:13:32,109] new cluster variance:
[2022-03-15 17:13:32,109] hdd: 52.892
[2022-03-15 17:13:32,118] SAVE move 28.6de osd.34 => osd.170
[2022-03-15 17:13:32,118] props: size=48.1G remapped=False upmaps=0
[2022-03-15 17:13:32,118] => variance new=52.77434199993548 < 52.89236652527104=old
[2022-03-15 17:13:32,118] new min osd.140 38.759%
[2022-03-15 17:13:32,118] max osd.66 65.037%
[2022-03-15 17:13:32,118] new cluster variance:
[2022-03-15 17:13:32,118] hdd: 52.774
[2022-03-15 17:13:32,118] enough remaps found
[2022-03-15 17:13:32,118] --------------------------------------------------------------------------------
[2022-03-15 17:13:32,118] generated 10 remaps.
[2022-03-15 17:13:32,118] total movement size: 772.4G.
[2022-03-15 17:13:32,118] --------------------------------------------------------------------------------
[2022-03-15 17:13:32,119] old cluster variance per crushclass:
[2022-03-15 17:13:32,119] hdd: 54.884
[2022-03-15 17:13:32,119] old min osd.26 37.306%
[2022-03-15 17:13:32,119] old max osd.34 68.550%
[2022-03-15 17:13:32,119] --------------------------------------------------------------------------------
[2022-03-15 17:13:32,119] new min osd.140 38.759%
[2022-03-15 17:13:32,119] new max osd.66 65.037%
[2022-03-15 17:13:32,119] new cluster variance:
[2022-03-15 17:13:32,119] hdd: 52.774
[2022-03-15 17:13:32,119] --------------------------------------------------------------------------------
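The "cluster variance" figures in the log appear to be the plain variance of the per-OSD utilization percentages within a crush class, and each move is accepted only if it lowers that number (variance new < old). A rough illustration with invented fill rates, not this cluster's data:

```python
from statistics import pvariance

# Toy per-OSD fill rates (percent) for one crush class; numbers invented.
fill = {"osd.26": 37.3, "osd.34": 68.6, "osd.29": 68.1, "osd.19": 37.5}

variance_old = pvariance(fill.values())

# Simulate moving a PG's worth of data from the fullest OSD to the
# emptiest one: the spread shrinks, so the variance drops, which is
# exactly the "variance new < old" acceptance test in the log above.
fill_new = dict(fill)
fill_new["osd.34"] -= 5.0
fill_new["osd.26"] += 5.0
variance_new = pvariance(fill_new.values())

print(variance_new < variance_old)  # True
```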
ceph osd pg-upmap-items 1.345 34 26
ceph osd pg-upmap-items 1.21b 29 19
ceph osd pg-upmap-items 1.10c 34 170
ceph osd pg-upmap-items 1.75 29 26
ceph osd pg-upmap-items 1.1a0 34 133
ceph osd pg-upmap-items 1.38f 29 122
ceph osd pg-upmap-items 1.fd 34 19
ceph osd pg-upmap-items 1.1e8 29 70
ceph osd pg-upmap-items 1.11e 53 130
ceph osd pg-upmap-items 28.6de 34 170
1.345 20898 0 0 0 0 86954708992 296 27 3386 3386 active+clean 2022-03-14T05:26:58.567934+0000 1908585'13965079 1908585:39935681 [99,34,104] 99 [99,34,104]
I have the datacenter failure domain. The move 1.345 34->26 would move the PG from datacenter ucsb to datacenter ucsd, which is also the datacenter of osd.99 (in fact the same node, even).
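The bug in a nutshell: a candidate move must keep at most one replica per failure domain (here: datacenter), and 34->26 violates that because osd.26 shares a datacenter with osd.99, which already holds a replica. A sketch of the missing check (the osd-to-datacenter mapping is invented to match the description; this is not the balancer's actual code):

```python
from collections import Counter

# Hypothetical CRUSH locations matching the description:
# osd.26 sits in the same datacenter as osd.99; osd.40 is a
# made-up OSD in ucsb to show a valid alternative target.
osd_datacenter = {99: "ucsd", 34: "ucsb", 104: "ucsf", 26: "ucsd", 40: "ucsb"}

def move_keeps_failure_domains(acting, src, dst, loc):
    """True if replacing src with dst leaves at most one replica
    per failure domain (datacenter, in this cluster)."""
    new_acting = [dst if osd == src else osd for osd in acting]
    domains = Counter(loc[osd] for osd in new_acting)
    return max(domains.values()) <= 1

# PG 1.345 has acting set [99, 34, 104]; moving 34 -> 26 would put
# two replicas into datacenter ucsd, so it must be rejected.
print(move_keeps_failure_domains([99, 34, 104], 34, 26, osd_datacenter))  # False
```

Moving 34 to another OSD inside ucsb (like the hypothetical osd.40) would pass the same check.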
I'm @dimm:matrix.nrp-nautilus.io on matrix
Oh, I just read in another ticket that you only support hosts for now. I guess that's the issue then.
It seems that is the case, but I do want to fix these bugs so the script can balance arbitrary CRUSH trees :) I've contacted you on Matrix!