Comments (29)
Ideally, to mirror the behavior available in non-Rook Ceph, the ability to set both the global config default value and the per-pool pg_num for any pool Rook/Ceph will deploy.
from rook.
All of the rgw metadata pools are created with the settings from the CephObjectStore CR under the metadataPool settings. So I expect all of the pools being created can be controlled with CRs today.
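For illustration, a rough sketch of where those settings live in a CephObjectStore (the store name and values below are assumptions for this example, not taken from this issue):

apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: my-store              # hypothetical store name
  namespace: rook-ceph
spec:
  metadataPool:
    failureDomain: host
    replicated:
      size: 3
    parameters:               # applied to each rgw metadata pool this store creates
      pg_autoscale_mode: "on"
  dataPool:
    failureDomain: host
    replicated:
      size: 3
  gateway:
    port: 80
    instances: 1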
from rook.
The pool-specific settings can be specified today in the parameters of the CephBlockPool spec. Any setting that can be applied to a pool in the toolbox with a command such as ceph osd pool set <pool> <key> <value> can be specified. For example:
spec:
  parameters:
    pg_autoscale_mode: "on"
    bulk: "true"
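For completeness, the same parameters in a full CephBlockPool CR would look roughly like this (the pool name and replication settings are illustrative assumptions):

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool           # hypothetical pool name
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
  parameters:                 # free-form pool properties, applied like `ceph osd pool set <pool> <key> <value>`
    pg_autoscale_mode: "on"
    bulk: "true"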
Setting bulk: true seems to cause the autoscaler to immediately jump to 256 PGs, which seems rather high to consider for the default.
I believe this covers all cases you are suggesting, except the global setting osd_pool_default_pg_autoscale_mode. Since each pool can be configured to enable/disable the autoscaler, perhaps that global setting isn't necessary?
from rook.
Re osd_pool_default_pg_autoscale_mode: there are auto-created pools for which I don't think there are Rook-specific places to set it.
256 PGs is not excessive unless there are fewer than 3 OSDs fwiw, modulo the number of coresident pools.
from rook.
So do you want to turn off the autoscaler on the auto-created pools?
I think you can probably do it from the rook toolbox.
Or if there is any specific auto-created pool that is concerning, we can have an env variable setting for it.
from rook.
If one has to do common things from a shell by hand instead of having IaC, why have Rook at all?
from rook.
Having such a configuration is by design, to make Rook more productive to work with, and I don't see any reason to change what we have in internal design.
If it's a common problem, as I said, we can have a common env variable to configure it.
Btw, for the .mgr pool you can specify it here: https://github.com/rook/rook/blob/master/deploy/examples/cluster-test.yaml#L57, and similarly for the .rgw pool.
from rook.
I don't see any reason to change what we have in internal design.
I didn't ask for that. I asked for a feature whose lack degrades performance.
What about rgw.buckets.non-ec? rgw.buckets.index? rgw.otp? .rgw.root?
from rook.
Okay, I agree we need to have a setting. Do you prefer an individual spec field for each pre-defined pool, or a common spec to change the PGs of all predefined pools?
from rook.
We should be sure that's documented.
from rook.
We should be sure that's documented.
Are you suggesting to add the names of all the metadata pools in this comment, or perhaps in this doc? I was hoping the "metadata pools" could be self-explanatory to most users and that level of detail wouldn't be needed in the docs, but it certainly could be added if helpful.
from rook.
The pool names can vary by release -- we've seen that with RGW -- and with Rook we do have the case that a user might not predict them in advance. So for RGW at least, it would suffice to document something like: "all of the rgw metadata pools are created with the settings from the CephObjectStore CR under the metadataPool settings. So I expect all of the pools being created can be controlled with CRs today." In this context the index pool is separate I hope? Since it typically warrants individual planning.
from rook.
In this context the index pool is separate I hope? Since it typically warrants individual planning.
You're referring to the .rgw.root pool? That one does get some special treatment, but if you have multiple object stores, they should have the same metadataPool settings, or else the .rgw.root pool would attempt to apply the different settings. If there are multiple object stores, as of v1.14 this problem can be remedied with the shared pools for object stores, where .rgw.root can be explicitly configured.
from rook.
I believe the features you're requesting can be configured in Rook today.
Allow setting osd_pool_default_pg_autoscale_mode on or off
Ideally, to mirror the behavior available in non-Rook Ceph, the ability to set both the global config default value and the per-pool pg_num for any pool Rook/Ceph will deploy.
This is possible via Rook config options here (https://rook.io/docs/rook/latest-release/CRDs/Cluster/ceph-cluster-crd/#ceph-config), or by using the rook-config-override configmap.
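For example, both approaches might look roughly like this (the value "off" is purely illustrative; verify the exact field names against the linked docs):

# 1. Via the cephConfig section of the CephCluster CR
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephConfig:
    global:
      osd_pool_default_pg_autoscale_mode: "off"

# 2. Via the rook-config-override ConfigMap, whose contents are injected as ceph.conf
apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-config-override
  namespace: rook-ceph
data:
  config: |
    [global]
    osd_pool_default_pg_autoscale_mode = off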
Allow setting off, on, warn for each pool
Default to bulk mode to minimize impactful PG splitting later on
Travis did a much better job of explaining these points here: #14075 (comment)
As added notes, Ceph docs for these pool-focused params are here: https://docs.ceph.com/en/latest/rados/operations/pools/
And Rook docs about using parameters are documented here: https://rook.io/docs/rook/latest-release/CRDs/Block-Storage/ceph-block-pool-crd/#pool-settings
These configs should allow modifying pre-existing pools without the need to use the toolbox CLI, which as you mentioned is a non-ideal workflow in the context that Rook is supposed to be a desired-state system.
And it is a good point that it might help other Rook users to add some documentation sections that show how to configure pools with advanced features like these using the parameters section.
What about rgw.buckets.non-ec? rgw.buckets.index? rgw.otp? .rgw.root?
I think the best "answer" for these is the object store shared pools feature that was added recently, mentioned here: #14075 (comment)
We very much need the means to configure it. Notably, getting it to do what it's supposed to requires prognostication
For other pools like .mgr, you're also right that it requires oracular foresight to figure out how to make Ceph do things right before runtime. Unfortunately, that seems to be what it takes to do advanced stuff with Ceph. I'm not sure how much more Rook can do to help the situation without taking on obscene code burden.
from rook.
No, e.g. ceph-objectstore.rgw.buckets.index. Insufficient PGs in this pool significantly bottleneck RGW operations -- this is sadly quite common.
This is an area I'm curious to hear more about in the context of shared pools. Currently, I think we assume shared pools are broken into 2 categories:
- metadata (must be replicated, cannot be erasure coded)
- data (can be replica/ec)
But I wondered when we were implementing if there would be any users who need additional breakdowns. Perhaps this could also make sense:
- index (replica with count Y, fairly few pgs)
- metadata (replica with count X, more pgs) - all non-index metadata
- data (whatever you want)
Is this part of what you're expressing, @anthonyeleven ?
from rook.
And hopefully one day we can configure multiple (probably not more than a handful of) data (bucket) pools within a single objectstore, which would probably want to share a single index pool.
I guess a shared pool makes sense if someone has like dozens of bucket pools, especially given that Rook creates a CRUSH rule for each and every pool. I can't see that I personally would ever need to do that.
What is the scenario to have a single object store with multiple data pools? Perhaps creating separate object stores that have their own data pools meets the same requirements?
The rgw.index pool stores RGW S3 / Swift bucket indexes. With smaller objects and/or buckets with a lot of objects in them, this is often an RGW service's bottleneck. To work well, the index pool needs:
Thanks for the context on the index pool. Sounds like we need a separate option for its configuration from other metadata pools.
from rook.
Thanks for the background on separate data pools for the same store. Looks like we need to consider the storageclasses capability of rgw to allow placement targets to point to different data pools.
from rook.
This issue has delved into several different topics. @anthonyeleven would you mind opening new issues for the separate topics? We can keep this issue focused on the small doc clarification for pg management of rgw pools.
from rook.
already covers the storageclasses.
Thanks, I missed the connection there!
from rook.
The rgw.index pool stores RGW S3 / Swift bucket indexes. With smaller objects and/or buckets with a lot of objects in them, this is often an RGW service's bottleneck. To work well, the index pool needs:
- to be on SSDs
- preferably NVMe of course
- with a decent number of PGs, since both the OSD and the PG code have serializations that limit performance. On SATA SSDs I'd aim for a PG ratio of 200-250, for NVMe SSDs 300 easily. The pg_autoscaler unless forced will only do a fraction of these numbers.
- to be across a decent number of OSDs. 3 isn't a decent number. 12 is maybe a start. As a cluster grows so should the index pool, so OSD nodes that have 1-2 SSDs in them for the index pool scale well, and we use deviceclasses to segregate the OSDs if they aren't all TLC
- The SSDs don't have to be big, this is all omap data
Based on my interpretation of these requirements, I don't think they explicitly suggest that the index pool can't also provide storage for other metadata (in a shared pools case). In the interest of simplicity, I think it would make sense for users to configure the "metadata" pool with solid state and many PGs to give the index best performance, and then other metadata can also reap those benefits.
The only reason I can imagine right now that someone might want to separate index from "other metadata" is to save money by buying as few metadata NVMe drives as possible. But I also can't imagine the other metadata being a large enough share of the data for that to make much difference.
If that all is correct, then I think what we have today with shared pools can meet these needs. If not, then we can consider splitting index and non-index metadata pools (similar to OSD md and db).
And this is obviously separate from needing to develop and implement support for multiple RGW "s3 storage classes".
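As a rough sketch of how those index-pool requirements could be expressed declaratively (the device class, PG count, and replication below are assumptions for illustration, not recommendations from this thread), the metadata pool of a CephObjectStore could be pinned to solid-state OSDs with an explicit PG count:

spec:
  metadataPool:
    failureDomain: host
    deviceClass: nvme            # assumed device class of the SSD/NVMe OSDs backing the index
    replicated:
      size: 3
    parameters:
      pg_autoscale_mode: "off"   # or leave it on and rely on bulk / pg_num_min instead
      pg_num: "128"              # illustrative; size to the desired per-OSD PG ratio
      pgp_num: "128"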
from rook.
Trying to lump the minor RGW pools into the index pool would be a bad idea. I dunno what RADOS object names are used, but Ceph for sure will not be expecting that.
The shared pool feature does not "lump pools together" as it seems you are thinking. With shared pools, objects are separated by namespaces (instead of pools) to avoid name collisions. My assertion is that with shared pools, there is no need to separate index and "minor" pools because I can't find evidence of substantial benefit to doing so when both index and minor metadata can share the "index-optimized" pool easily.
from rook.
I don’t know how one would direct Ceph to do so.
That is the feature that is provided by shared pools: https://rook.io/docs/rook/v1.14/Storage-Configuration/Object-Storage-RGW/object-storage/#create-local-object-stores-with-shared-pools
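Roughly, that looks like the sketch below (the names are placeholders and the field names should be verified against the linked doc); note the CephBlockPool whose spec.name is set to .rgw.root, which is how that built-in pool gets explicit settings:

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: rgw-root                # Kubernetes object name
  namespace: rook-ceph
spec:
  name: .rgw.root               # overrides the Ceph pool name so the built-in pool is managed explicitly
  failureDomain: host
  replicated:
    size: 3
---
apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: my-store
  namespace: rook-ceph
spec:
  sharedPools:
    metadataPoolName: rgw-meta-pool   # a pre-created replicated CephBlockPool
    dataPoolName: rgw-data-pool       # a pre-created replicated or EC CephBlockPool
  gateway:
    port: 80
    instances: 1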
from rook.