Friday, 2023-02-24

<cloudnull> noonedeadpunk sorry, I'm replying late. I'm out of the loop, but the strategy plugin may not actually be needed with modern Ansible. Much of the strategy plugin was trying to do forward lookups on tasks and omit them when conditions were not met, all without ever loading the task into the ansible-engine. We also were changing the connection plugin  01:43
<cloudnull> when dealing with specific scenarios to use paramiko or ssh. All of this is probably not really needed any more.  01:43
<cloudnull> I think the only thing that would actually need to be kept is the magic variable mapping  01:44
<cloudnull> https://github.com/openstack/openstack-ansible-plugins/blob/master/plugins/strategy/linear.py#L48  01:44
<cloudnull> but then again, I'm not really sure.  01:44
<cloudnull> ThiagoCMC I've seen that error before; the issue for me was that the Galera cluster health check with xinetd (I think OSA uses something else now) wasn't permitting the traffic from the HAProxy node. To fix it I had to change the allow address for the health check config; in general I would set it to something like 0.0.0.0/0 for testing, then lock  01:50
<cloudnull> it down once it was working again.  01:50
<cloudnull> https://github.com/openstack/openstack-ansible-galera_server/blob/master/defaults/main.yml#L82  01:51
<cloudnull> here's the implementation - https://github.com/openstack/openstack-ansible-galera_server/blob/8a8d29ea490fba6695e3356831846466f6991089/tasks/galera_server_post_install.yml#L60  01:52
<cloudnull> I guess that's using clustercheck these days.  01:53
<cloudnull> but same basic idea  01:53
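[For reference, a minimal sketch of the override cloudnull describes, assuming the galera_server role still exposes the allow-list as galera_monitoring_allowed_source (check the linked defaults/main.yml for the current variable name):]

    # /etc/openstack_deploy/user_variables.yml (sketch, not a verified default)
    # Temporarily allow health-check traffic from anywhere while debugging,
    # then lock it back down to the HAProxy/management networks once it works.
    galera_monitoring_allowed_source: "0.0.0.0/0"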
<prometheanfire> cloudnull: ohai  02:29
<noonedeadpunk> cloudnull: aha, thanks for explaining it a bit. The SSH connection plugin is still needed though, as we don't have ssh inside lxc and it's used to do lxc-attach from lxc_hosts  08:12
<noonedeadpunk> and indeed it looks like ansible-core 2.13 doesn't really need our strategy, but I definitely see a performance improvement with 2.11  08:12
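[A minimal sketch of how one could time the stock strategy against OSA's on ansible-core 2.13; the play and the all_containers group target are illustrative assumptions, not an OSA-provided playbook:]

    # strategy-compare.yml (hypothetical timing play)
    # 'strategy: linear' selects the stock Ansible strategy, so running this
    # play bypasses the custom plugin shipped in openstack-ansible-plugins.
    - name: Compare stock linear strategy with the OSA one
      hosts: all_containers          # assumed existing OSA inventory group
      gather_facts: false
      strategy: linear
      tasks:
        - name: No-op task used only to measure per-task strategy overhead
          ansible.builtin.debug:
            msg: "strategy comparison run"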
<jamesdenton> prometheanfire makes complete sense and demonstrates my lack of testing with LXC, only metal  13:44
<jamesdenton> So we may need to adjust that strategy, then  13:44
<Mohaa> "Host evacuation [for Masakari] requires shared storage and some method of fencing nodes, likely provided by Pacemaker/STONITH and access to the OOB management network. Given these requirements and an incomplete implementation within OpenStack-Ansible at this time, I’ll skip this demonstration." (jamesdenton's blog post, 20201)  14:12
<jamesdenton> hi  14:13
<Mohaa> (:  14:13
<Mohaa> Hi  14:13
<jamesdenton> I haven't looked at it again since then  14:14
<Mohaa> Is that feature not yet implemented in openstack-ansible?  14:14
<Mohaa> Ah, ok  14:15
<jamesdenton> Well, TBH I am not sure about any sort of pacemaker role in OSA, and I'm pretty sure we don't have any sort of reference arch for this  14:17
<Mohaa> Oops! Instance HA is necessary for a production environment!  14:26
<jamesdenton> depends on who you ask. pets vs cattle, etc.  14:30
<prometheanfire> jamesdenton: I'm willing to test patches for it :D  15:32
<spatel> what SSDs do you guys prefer for Ceph nowadays..  15:48
<spatel> Intel or Samsung, and what model?  15:48
<damiandabrowski> can't help you with the vendor, but recently I learned that enterprise grade SSDs may behave a lot better for ceph due to the way they handle fsyncs: https://forum.proxmox.com/threads/ceph-shared-ssd-for-db-wal.80165/post-397544  15:56
<damiandabrowski> so even though my Samsung 980 Pro's performance looks good on paper, they probably suck for ceph. I haven't had time for a real comparison though  15:57
<admin1> spatel.. intel .. it generally has very high endurance compared to samsungs  15:58
<spatel> Model?  16:00
<admin1> p* series  16:00
<spatel> damiandabrowski Samsung 980 Pro is a consumer SSD, correct?  16:00
<admin1> don't recall the exact model  16:00
<damiandabrowski> spatel: yeah  16:01
<spatel> They suck..! ask me.. I had to upgrade my whole cluster to PM883 4 years ago..  16:01
<spatel> consumer SSDs suck..  16:01
<spatel> admin1 thanks, I will try to find a p* model and see which one fits my pocket :)  16:02
<admin1> spatel, S4500 also has a lot of endurance  16:02
<spatel> Assuming we don't need a dedicated journal SSD with Intel SSDs  16:02
<admin1> if you are going to have a bigger cluster or a chance of growth, then having an NVMe journal might become a bottleneck  16:03
<spatel> https://www.amazon.com/Intel-SSDSC2KB019T701-S4500-Reseller-Single/dp/B074QSB52M/ref=sr_1_3?crid=1B4LWHP6M79LY&keywords=Intel+S4500+2TB&qid=1677254585&s=electronics&sprefix=intel+s4500+2tb%2Celectronics%2C79&sr=1-3&ufe=app_do%3Aamzn1.fos.304cacc1-b508-45fb-a37f-a2c47c48c32f  16:03
<damiandabrowski> admin1: ouch, so you generally do not recommend having NVMe WAL&DB for SSD OSDs?  16:04
<admin1> I do not if you are going to have 100s of SSDs, or say a public cloud where you need to add dozens per month for growth, for example  16:04
<spatel> Why do we need a dedicated journal for SSDs? why create a single point of failure  16:05
<spatel> I can understand with HDDs  16:05
<admin1> this is where *endurance* comes into play  16:05
<damiandabrowski> I guess performance. Ceph docs suggest putting the WAL on a faster device type, and NVMe performance is generally better than standard SSD  16:06
<damiandabrowski> but I'd love to see some benchmarks comparing colocated vs. external WAL for SSD OSDs  16:06
<damiandabrowski> and also what the optimal number of SSD OSDs per NVMe WAL is (I have 4 OSDs per NVMe WAL but not sure if it's the best ratio)  16:08
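[As a rough illustration of the colocated vs. external WAL/DB layouts being discussed, a ceph-ansible style sketch; the lvm_volumes keys are as remembered from osds.yml.sample and the LV/VG names are placeholders, so verify against the ceph-ansible release you actually use:]

    # group_vars/osds.yml (sketch only)
    # Each SSD-backed data LV points its RocksDB/WAL at an LV on a shared NVMe.
    # Omitting db/db_vg would give the colocated layout instead.
    lvm_volumes:
      - data: osd-data-1        # LV carved from the first SSD
        data_vg: vg-ssd-1
        db: osd-db-1            # DB/WAL LV on the shared NVMe
        db_vg: vg-nvme
      - data: osd-data-2        # LV carved from the second SSD
        data_vg: vg-ssd-2
        db: osd-db-2
        db_vg: vg-nvme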
<spatel> damiandabrowski you do use NVMe for the SSD journal?  16:09
<damiandabrowski> yeah  16:09
<spatel> How many SSDs do you have per server?  16:10
<spatel> I meant OSDs  16:10
<damiandabrowski> it's a small cluster, I have 3 storage servers, each one has: 6 HDD OSDs + 1 SSD journal and 4 SSD OSDs + 1 NVMe journal  16:13
<damiandabrowski> but honestly, I don't know if it's a good ratio or not :D cluster performance is quite bad, but probably due to consumer grade SSDs  16:13
<spatel> consumer grade SSDs are really really bad for Ceph (I went through that pain)  16:17
<spatel> Do you have dedicated MON nodes or are they running with OSDs? Because of budget I am running MON+OSD on the same node  16:18
<spatel> Bad bad idea.. but the plan is to split them once we have the money  16:18
<damiandabrowski> I'm running MONs on openstack controllers  16:18
<damiandabrowski> (in LXC containers)  16:20
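[For context, a sketch of how MONs commonly land in LXC on the controllers with OSA's ceph-ansible integration, assuming openstack_user_config.yml still uses a ceph-mon_hosts group; hostnames and IPs are placeholders:]

    # /etc/openstack_deploy/openstack_user_config.yml (fragment, sketch only)
    ceph-mon_hosts:            # MON containers are created on these hosts
      controller1:
        ip: 172.29.236.11
      controller2:
        ip: 172.29.236.12
      controller3:
        ip: 172.29.236.13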
<spatel> ohh  16:21
<damiandabrowski> and yeah.. probably I'll have to replace my samsungs sooner or later :|  16:21
<spatel> I am deploying with cephadm  16:21
<spatel> I did replace them with PM883 SSDs and am happy with the performance.  16:21
<damiandabrowski> well, probably you can create a ceph-mon group in OSA, let it prepare the containers and do the rest with cephadm  16:23
<spatel> cephadm uses docker/podman  16:23
<damiandabrowski> ahhh that's true, forgot about it :|  16:23
<spatel> haha!!  16:23
<spatel> what a mess..  16:23
<spatel> what number should we look at in this output of smartctl - https://paste.opendev.org/show/bB5XMOXki9y1Tt0igBdV/  16:24
<damiandabrowski> regarding containers for ceph: I love that title :D https://www.youtube.com/watch?v=pPZsN_urpqw  16:25
<damiandabrowski> regarding smartctl, I think most attributes are meaningful, but instead of manually reading smartctl output, I prefer to use smartd and just wait for emails :D  16:29
<prometheanfire> jamesdenton: as far as metal vs lxc, is metal becoming the more 'supported' option by OSA?  16:29
<damiandabrowski> https://github.com/stuvusIT/smartd - I used this role some time ago  16:30
<mgariepy> damiandabrowski, yeah I saw this one last year :D they almost killed ceph-ansible because of that tho.. :P  16:41
<damiandabrowski> mgariepy: oops, but at least with ceph-ansible you can choose whether you want a containerized deployment or not :D https://github.com/ceph/ceph-ansible/blob/main/group_vars/all.yml.sample#L583  16:44
<damiandabrowski> can't say the same about cephadm :|  16:44
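[The toggle being referenced appears to be ceph-ansible's containerized_deployment flag; the linked all.yml.sample line is authoritative for the default, this is just a sketch:]

    # group_vars/all.yml (ceph-ansible, sketch)
    # false = install the Ceph packages directly on the hosts,
    # true  = run the Ceph daemons in containers
    containerized_deployment: false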
<spatel> damiandabrowski I will look into smartd  16:47
<spatel> But how do I know how much life is left on my SSD here https://paste.opendev.org/show/bB5XMOXki9y1Tt0igBdV/  16:48
<jamesdenton> prometheanfire all of our deploys have been "metal" since Stein  16:50
<spatel> damiandabrowski what RAID controller card do you prefer for ceph?  16:55
<damiandabrowski> spatel: I think you can consider this disk as worn out, but if you want to save some money, it should be ok to keep it until you see some errors (like Uncorrectable_Error_Cnt)  17:01
<spatel> +1  17:02
<damiandabrowski> regarding the RAID controller, I just use some random HBA. AFAIK ceph does not recommend RAID controllers  17:02
<prometheanfire> noonedeadpunk: is metal considered more supported now than lxc, where dev effort / support is likely to be going? asking for a new install  17:02
<spatel> What would that random HBA be?  17:05
<spatel> Any model I should follow..  17:05
<spatel> is it possible to use RAID + JBOD on the same controller?  17:05
<spatel> Like the OS disks do RAID and the rest of the disks JBOD  17:05
<damiandabrowski> don't remember the exact model, something from broadcom  17:15
<damiandabrowski> I think it depends on the controller, a few years ago I was working with one that didn't support JBOD at all  17:15
<damiandabrowski> and I had to configure RAID0 separately on each disk xD  17:16
<damiandabrowski> or maybe I had to create RAID1 on each disk because RAID0 was not supported? don't remember, it was a long time ago  17:20
<spatel> That is what I have in my old ceph, all RAID0  17:30
<admin1> yeah .  17:30
<spatel> I never worked with JBOD so no idea what model and how good they are  17:30
<spatel> Do you enable RAID0 with Write-Back-Cache on the HBA controller?  17:31
<admin1> spatel, does the controller have its own battery unit  17:35
<spatel> admin1 yes, the controller has a battery  18:09
<spatel> small one :)  18:09
<noonedeadpunk> prometheanfire: I don't think it matters much, to be frank. But metal with ceph will be troublesome if both glance and cinder use it as a backend, so you will need to isolate some of them. And troublesome not during deploy but during further operations  18:18
<noonedeadpunk> prometheanfire: me personally, I'm still running LXC and not going to migrate off it  18:18
<prometheanfire> good to know about ceph  18:18
<noonedeadpunk> (as of today)  18:18
<noonedeadpunk> so it's more about your preference, to be frank  18:19
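[For anyone weighing the two, OSA can mix both per host; a sketch assuming the no_containers flag documented for openstack_user_config.yml is still available, with placeholder hostnames and IPs:]

    # /etc/openstack_deploy/openstack_user_config.yml (fragment, sketch only)
    shared-infra_hosts:
      infra1:
        ip: 172.29.236.11        # services run in LXC containers (default)
      infra2:
        ip: 172.29.236.12
        no_containers: true      # treat this host as "metal": services are
                                 # deployed directly on the host, no LXC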
<prometheanfire> ya, I like the containers, being able to blow things away is nice  18:19
<noonedeadpunk> Well, if you have ironic you can blow away controllers as well... But yeah, limiting impact might be good sometimes  18:19
<prometheanfire> ironic for the controllers? snake eating its own tail :P  18:36
<spatel> prometheanfire haha!! controller running ironic and ironic building the controller.. too much fun  19:17
<noonedeadpunk> it can be standalone, like bifrost for instance  20:04
<prometheanfire> true  20:18
<admin1> anyone tried trove?  20:23
