Wednesday, 2023-12-20

jrosserNeilHanlon: thanks for the pointer to `rocky-release-kernel` - I think there's a missing build for aarch64 64k page size?08:31
jrosserhamburgler: was there anything specific you were interested in for the radosgw config?08:53
hamburglerjrosser: hey :), I was curious how the mappings were working from storage policy (swift) to placement target (ceph), as for some reason I was unable to see these in the drop-down menu in horizon, so I could not create a container that way. I ended up upgrading to 18.2.1 from 18.2.0 and it seemed to resolve the issue lol :D19:50
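For context, the swift storage policies radosgw exposes are just its placement targets; a minimal sketch of listing them and selecting one when creating a container, assuming a hypothetical placement target named "fast" and a configured swift CLI:

    # List the placement targets defined in the zonegroup; their names are what
    # radosgw exposes as Swift storage policies (and what horizon should show).
    radosgw-admin zonegroup placement list

    # Create a container pinned to a specific placement target by passing its
    # name as the Swift storage policy ("fast" is a made-up example).
    swift post demo-container --header "X-Storage-Policy: fast"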
hamburglerwas able to get a multi-site config setup running in lab, really quite neat :)19:51
jrosserhamburgler: interesting - is that with replication?19:53
hamburgleryes sir, so for demo purposes, two ceph azs, one is master, other is secondary, when i write to first az it gets replicated to second19:54
hamburglerlowered the garbage collection timers so it runs faster, to see that deletes sync as well 19:55
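A rough sketch of the kind of two-zone multi-site setup described above; realm, zonegroup, zone and endpoint names plus the sync credentials are all made up, and the real deployment will differ:

    # On the primary site: realm, master zonegroup and master zone.
    radosgw-admin realm create --rgw-realm=demo --default
    radosgw-admin zonegroup create --rgw-zonegroup=demo-zg \
        --endpoints=http://rgw-az1:8080 --master --default
    radosgw-admin zone create --rgw-zonegroup=demo-zg --rgw-zone=az1 \
        --endpoints=http://rgw-az1:8080 --master --default
    # System user whose keys the secondary zone uses for replication.
    radosgw-admin user create --uid=sync-user --display-name="sync user" --system
    radosgw-admin zone modify --rgw-zone=az1 \
        --access-key=<sync-access-key> --secret=<sync-secret>
    radosgw-admin period update --commit

    # On the secondary site: pull the realm and add the second zone.
    radosgw-admin realm pull --url=http://rgw-az1:8080 \
        --access-key=<sync-access-key> --secret=<sync-secret>
    radosgw-admin zone create --rgw-zonegroup=demo-zg --rgw-zone=az2 \
        --endpoints=http://rgw-az2:8080 \
        --access-key=<sync-access-key> --secret=<sync-secret>
    radosgw-admin period update --commit
    # Restart the radosgw daemons in both zones, then watch replication with:
    radosgw-admin sync status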
jrosserdid you ever investigate performance with lots of objects in a bucket19:55
jrosserdelete time for the bucket for example19:55
jrosser^ not related to multisite19:56
jrosserwe see a delete rate of about 500 objects/second which can make it extremely long to delete massive buckets19:58
hamburglernot yet with any measurable workload - but from observation it looks pretty quick with a single bucket, though that really doesn't mean much I suppose :D. if I delete an object in a bucket, I can see the pool drop in size (this must be a marker saying the object is to be deleted), then since I set rgw_gc_processor_period and rgw_gc_obj_min_wait to 60s, the objects within a bucket are gone within, I'd say, 2 minutes 19:59
hamburglereverything is removed19:59
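The GC settings mentioned above can be pushed out through the config database roughly like this (the 60s values are hamburgler's lab numbers, not a recommendation, and a radosgw restart may be needed for them to take effect); the bucket removal command is one way around the slow object-by-object delete path jrosser describes:

    # Run the RGW garbage collector more often and reclaim deleted objects
    # sooner (the defaults are 3600s and 7200s respectively).
    ceph config set client.rgw rgw_gc_processor_period 60
    ceph config set client.rgw rgw_gc_obj_min_wait 60

    # Server-side removal of a large bucket, purging its objects and skipping
    # the GC queue entirely (bucket name is a placeholder).
    radosgw-admin bucket rm --bucket=<bucket-name> --purge-objects --bypass-gc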
hamburglerdid you adjust the default delete io rate limit for objects in buckets?19:59
jrosseralmost certainly not19:59
hamburgleri haven't played with that yet myself19:59
jrossergenerally we've focussed on getting throughput up19:59
hamburglergotcha, what type of disks do you use? are your pools mapped to specific crush rules/drive types? I read the Red Hat docs a while ago and it looks like certain pools related to metadata should be mapped to SSD at a minimum, ideally NVMe - then data pools likely HDD with an SSD/NVMe journal/WAL20:01
jrosserall the rgw metadata is on an nvme pool20:01
jrosserbut then there are several PB of hdd for the default placement target20:02
hamburglerreplication or EC?20:02
jrosserand we made an extra placement target "fast" which is placed on the nvme20:02
jrosserall replicated, no EC20:02
jrosserinitially i didn't have enough chassis to sensibly do EC20:02
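Roughly how a layout like that (RGW metadata and a "fast" placement target on NVMe, bulk data on HDD) can be expressed, assuming device classes are populated and using made-up rule and pool names for the default zone:

    # Replicated CRUSH rules constrained to a device class.
    ceph osd crush rule create-replicated rgw-nvme default host nvme
    ceph osd crush rule create-replicated rgw-hdd default host hdd

    # Pin the RGW metadata/index pools to NVMe and the bulk data pool to HDD.
    ceph osd pool set default.rgw.meta crush_rule rgw-nvme
    ceph osd pool set default.rgw.buckets.index crush_rule rgw-nvme
    ceph osd pool set default.rgw.buckets.data crush_rule rgw-hdd

    # Add an extra "fast" placement target backed by NVMe pools (these pools
    # need creating with the nvme rule too, otherwise radosgw will create them
    # with the default rule on first use).
    radosgw-admin zonegroup placement add --rgw-zonegroup=default --placement-id=fast
    radosgw-admin zone placement add --rgw-zone=default --placement-id=fast \
        --data-pool=default.rgw.fast.data \
        --index-pool=default.rgw.fast.index \
        --data-extra-pool=default.rgw.fast.non-ec
    radosgw-admin period update --commit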
hamburgleryeah it looks like quite a few nodes are needed for that, I'm truly not sure if it is a benefit over replication as I've barely touched EC for anything20:03
hamburglerI would say since you have metadata on an nvme pool, that's probably not what's limiting you to 500 objects/second - wonder if it's the io rate, but again I haven't really played with it20:04
hamburgleris it different from web to cli?20:05
jrosseri don't think so20:05
jrosserwe did discuss it briefly and decided there must be some rate limit20:05
jrosserbut bigger problem is dealing with quota exceeded :)20:06
hamburglerhaha - is that an issue on a per tenant basis?20:06
jrosserthat returns an error code immediately from the radosgw and is absolutely not bound by the latency of doing any storage I/O20:07
jrosserso if the client is written naively, then that can badly hurt your radosgw node20:07
jrosserwe need to come up with some rate limiting for that case20:08
hamburglerahh you mean that if there are lots of objects being written it triggers quota exceeded?20:08
hamburglersorry if I am not following there 20:09
jrosseronce you exceed your quota and the client just retries over and over, you can DOS the radosgw pretty easily20:09
jrosserparticularly if the client is highly parallel20:09
hamburgleroh shoot that's good to know20:09
jrosserso there's two sides to that, the client needs to behave appropriately20:09
jrosserbut object store needs to be able to cope with a stupid client20:10
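On the "cope with a stupid client" side, radosgw itself has per-user quota knobs (the source of the QuotaExceeded responses) and, in recent releases, its own rate limiter; a hedged sketch with a made-up uid and numbers:

    # Per-user quota; radosgw answers over-quota writes with an immediate error.
    radosgw-admin quota set --quota-scope=user --uid=tenant1 \
        --max-objects=1000000 --max-size=1099511627776   # 1 TiB in bytes
    radosgw-admin quota enable --quota-scope=user --uid=tenant1

    # Per-user request rate limiting inside radosgw (available since Quincy).
    radosgw-admin ratelimit set --ratelimit-scope=user --uid=tenant1 \
        --max-read-ops=1024 --max-write-ops=256
    radosgw-admin ratelimit enable --ratelimit-scope=user --uid=tenant1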
hamburglerhmm could haproxy not handle a rate limit?20:11
hamburglervia source 20:11
jrosserit could20:11
jrosserbut we need to revisit as the haproxy on the front is L420:11
jrosserand SSL termination is done on a big bank of radosgw behind that20:11
jrosserso that needs to become a bit fancier architecture20:12
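Even with a pure L4 (mode tcp) haproxy in front, a per-source connection rate limit is possible without touching where TLS terminates - it only throttles connection churn, not requests reusing a kept-alive connection, which is presumably why the fancier L7 architecture comes up. A sketch with made-up names and thresholds:

    # Append an illustrative TCP-mode frontend stanza with a per-source-IP
    # connection rate limit (names, bind address and numbers are placeholders).
    cat >> /etc/haproxy/haproxy.cfg <<'EOF'
    frontend rgw_public
        mode tcp
        bind :443
        # Track connection rate per client IP over a 10s window.
        stick-table type ip size 1m expire 60s store conn_rate(10s)
        tcp-request connection track-sc0 src
        # Reject clients opening more than 200 connections per 10s.
        tcp-request connection reject if { sc_conn_rate(0) gt 200 }
        default_backend rgw_backends
    EOF
    systemctl reload haproxy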
hamburgleryeah absolutely, I'm not sure what we would do horizon size, but I imagine we will likely set the public endpoint through a different set of haproxy nodes - currently my lab is using the orchestrator to deploy keepalived/haproxy for ceph internal to openstack 20:13
hamburglerhorizon side*20:13
jrosserthat's what we do - a separate pair of haproxy for the public endpoint20:14
jrosserthe radosgw sit in a sort of dmz20:14
jrosserand then there's some magic which allows access to those radosgw directly from instances20:15
jrosserwithout having to go through a neutron router20:15
hamburgleroh interesting never thought of that use case!20:15
hamburgleris your object storage built on its own cluster, or on a dedicated root in the same cluster where the rbd pools sit?20:16
jrosserit's the same cluster for everything20:17
jrosserbut we had some particular requirements around object store throughput MB/s, rather than rbd iops20:17
hamburglergotcha, we have multiple roots for different tiers at the moment, was debating about throwing object storage on its own dedicated cluster but again $ :| lol20:18
jrosseri have opportunity to expand the object store significantly20:19
jrosserbut the disk chassis are 84 disks each20:19
jrosserwhich is a pretty wild number of OSDs per node20:19
hamburgleroh yeah :O, those are HDD nodes for data pools? 20:20
jrosseryes20:20
hamburglerwas gonna say, if that was NVMe or even SSD, those poor processors :D bottleneck for days20:20
hamburgler*with that many disks on one node :)20:24
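As a rough sanity check on why 84 OSDs in one chassis is "wild": at the default osd_memory_target of 4 GiB per OSD that is 84 * 4 GiB = 336 GiB of RAM just for OSD caches; lowering the target is one lever (value is in bytes, roughly 3 GiB here, purely illustrative):

    # Back-of-the-envelope: 84 OSDs * 4 GiB default osd_memory_target = 336 GiB.
    # Reduce the per-OSD memory target cluster-wide (3 GiB, as an example):
    ceph config set osd osd_memory_target 3221225472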
hamburgleractually pretty happy with 18.2.1 - not that I am much of a fan of dashboards but the multi-site and overview are a nice touch20:26
hamburglerjrosser: btw ty, appreciate the chat!20:30
jrosserheh no problem20:30
jrossertbh we've had a lot of trouble so far with the radosgw dashboard20:30
jrosserbuckets with loads of objects did not go well20:31
hamburglerI imagine it will be the same for me when not in a quiet lab env :D 20:32
NeilHanlonjrosser: not sure about the 64k kernel--but will check!20:40
jrosserNeilHanlon: that would be great20:41
