Tuesday, 2020-11-17

openstackgerritMerged openstack/kayobe stable/train: Fix filtering of network names set to an empty string  https://review.opendev.org/76133100:09
*** k_mouza has joined #openstack-kolla00:45
*** k_mouza has quit IRC00:49
*** kevko has joined #openstack-kolla00:56
*** zzzeek has quit IRC01:06
*** zzzeek has joined #openstack-kolla01:09
*** xinliang has joined #openstack-kolla01:15
*** xinliang has quit IRC01:31
openstackgerritMerged openstack/kolla-ansible stable/victoria: CI: add missing --fail argument to curl  https://review.opendev.org/76274502:38
*** skramaja has joined #openstack-kolla02:59
*** kevko has quit IRC03:05
*** vishalmanchanda has joined #openstack-kolla03:52
*** sri_ has quit IRC04:07
*** sri_ has joined #openstack-kolla04:11
*** vkmc has quit IRC04:16
*** vkmc has joined #openstack-kolla04:16
*** k_mouza has joined #openstack-kolla04:46
*** johnsom has quit IRC04:46
*** johnsom has joined #openstack-kolla04:49
*** k_mouza has quit IRC04:50
*** stackedsax has quit IRC05:13
*** stackedsax has joined #openstack-kolla05:13
*** johnsom has quit IRC05:18
*** johnsom has joined #openstack-kolla05:19
*** zzzeek has quit IRC05:25
*** zzzeek has joined #openstack-kolla05:27
*** evrardjp has quit IRC05:33
*** evrardjp has joined #openstack-kolla05:33
*** zzzeek has quit IRC05:50
*** zzzeek has joined #openstack-kolla05:51
*** JamesBenson has quit IRC05:53
*** zzzeek has quit IRC06:10
*** zzzeek has joined #openstack-kolla06:11
*** johnsom has quit IRC06:27
*** johnsom has joined #openstack-kolla06:27
*** gfidente|afk is now known as gfidente06:42
*** cah_link has joined #openstack-kolla06:47
*** zzzeek has quit IRC07:14
*** zzzeek has joined #openstack-kolla07:16
*** rm_work has quit IRC07:26
*** jbadiapa has joined #openstack-kolla07:27
*** rm_work has joined #openstack-kolla07:28
*** wuchunyang has joined #openstack-kolla07:35
*** zzzeek has quit IRC07:37
*** zzzeek has joined #openstack-kolla07:38
*** nikparasyr has joined #openstack-kolla07:46
*** rpittau|afk is now known as rpittau08:05
yoctozeptodcapone2004: one would have to switch to using role-based hostnames to have this working nicely08:10
mnasiadkamorning08:18
*** pescobar has quit IRC08:21
*** Fl1nt has joined #openstack-kolla08:22
Fl1ntGood morning everyone!08:22
Fl1ntsorry for the vanishing, but I've been busy at work :)08:23
mnasiadkawe are all busy at work08:25
*** zzzeek has quit IRC08:26
*** pescobar has joined #openstack-kolla08:27
*** bengates has joined #openstack-kolla08:28
*** zzzeek has joined #openstack-kolla08:30
*** zzzeek has quit IRC08:38
*** zzzeek has joined #openstack-kolla08:40
*** zzzeek has quit IRC08:48
Fl1ntyeah! fortunately, it's all good for the community as I'm working to setup an internal advocacy division about Openstack / Kolla / Ansible etc at work.08:49
Fl1ntBTW, quick question, when we activate external TLS, it create an appropriate automatic redirect for horizon but not for keystone and other services, is there a reason? Same way, when dealing with SSO using SAML2 iDP endpoint and especially ADFS08:51
Fl1ntthis one refuse to send back claims to non-tls endpoint08:51
*** zzzeek has joined #openstack-kolla08:51
Fl1ntmeaning that returnTo parameter can't be forged using non-tls by Apache08:51
*** mgoddard has joined #openstack-kolla08:51
Fl1ntmeaning we miss a SSLEngine on option on keystone Apache public virtualhost section08:52
Fl1ntI'm currently testing it, but if someone have any clue on this ^^08:52
*** bengates has quit IRC08:59
*** k_mouza has joined #openstack-kolla09:00
*** bengates has joined #openstack-kolla09:01
*** k_mouza has quit IRC09:05
*** kevko has joined #openstack-kolla09:18
*** sean-k-mooney has quit IRC09:27
*** sean-k-mooney has joined #openstack-kolla09:27
*** slunav has quit IRC09:28
*** mgoddard has quit IRC09:29
*** sluna has joined #openstack-kolla09:31
Fl1ntaaaah fuuuu** it's actually available on Train+ release.09:33
Fl1ntall right.09:33
openstackgerritMichal Nasiadka proposed openstack/kolla-ansible stable/ussuri: [baremetal]: Use $releasever in docker-ce repo  https://review.opendev.org/76297909:35
Fl1ntmnasiadka, why would you use the docker_yum_baseurl when you can just use the already variabilized docker-ce.repo file: https://download.docker.com/linux/centos/docker-ce.repo ?09:39
mnasiadkaFl1nt: feel free to propose a change to master, I'm not putting any more cycles into that ;)09:40
Fl1ntno, I mean, I'm looking for the reasoning behind that.09:41
mnasiadkaFl1nt: reasoning is so it's easy to backport09:41
Fl1ntjust curious pp09:41
Fl1ntok09:41
*** bengates has quit IRC09:46
*** bengates has joined #openstack-kolla09:47
Fl1ntStill working on SSO Using SAML2 within k-a this will be a big boy, the amount of configuration and logic is pretty large.09:49
Fl1ntand I've finished the cloudkitty role fix for Elasticsearch and prometheus, I just need time to push the patch.09:50
*** bengates has quit IRC09:51
*** bengates has joined #openstack-kolla09:54
*** zzzeek has quit IRC09:57
*** zzzeek has joined #openstack-kolla10:02
*** wuchunyang has quit IRC10:16
*** mgoddard has joined #openstack-kolla10:19
*** jpward has quit IRC10:20
openstackgerritMichal Nasiadka proposed openstack/kolla master: prometheus: Add OVN and OVS exporters  https://review.opendev.org/76298610:34
mgoddardFl1nt: Apache is always non-TLS, unless you are using backend TLS (Ussuri+)10:35
mgoddardHAproxy terminates TL10:35
mgoddardTLS10:35
mgoddardFl1nt: and horizon has a redirect because the port is different for HTTP10:36
mgoddardplus it's more of a user/browser facing service10:36
*** zzzeek has quit IRC10:38
*** zzzeek has joined #openstack-kolla10:40
Fl1ntactually, when dealing with ADFS (SAML 2.0) you need keystone tls backend as apache (mod_auth_mellon) create a returnTo URL which use non-tls otherwise, and using a non-tls returnTo is not working10:40
Fl1ntexemple:10:41
Fl1ntwhen your user call /auth/login using WebSSO it call for https://<fqdn>:5000/v3/auth/OS-FEDERATION/identity-providers/adfs/protocols/saml2/websso?origin=https://<fqdn>/auth/websso10:44
mnasiadkaFl1nt: tried just setting ServerName directive in wsgi config?10:47
Fl1ntyep doesn't work neither.10:47
mnasiadkaworks for me10:48
Fl1ntServerName <fqdn> at the virtualHost level just right upper the WSGI config?10:48
Fl1ntI'll check everything again.10:49
*** k_mouza has joined #openstack-kolla10:51
*** kwazar is now known as quasar_10:55
*** quasar_ is now known as quasar10:55
*** quasar is now known as quasar`10:56
Fl1ntok, so, just to clarify the issue.11:04
Fl1ntwhen using a normal kolla train branch, only enabling kolla_enable_external_tls11:05
Fl1ntat some point11:05
Fl1ntthe SP (keystone/mod_auth_mellon) craft the relayState URL11:05
Fl1ntthat URL should be a TLS endpoint using your now enabled public tls endpoint for keystone: https://<fqdn>:5000/v311:06
Fl1nthowever11:06
Fl1ntand it's were I'm being lost11:06
Fl1ntfor some reason11:06
Fl1ntapache isn't using our originURL=https://<fqdn>:5000/v3 value11:06
Fl1ntbut the non-tls equivalent11:07
Fl1ntI've try to add a http to https redirect at haproxy level adding a keystone_public_redirect section within keystone services dict for haproxy, it works, but it doesn't as it create a loop as the relayState is a parameter that is then redirected but create a new request with a non-tls parameters etc11:08
*** e0ne has joined #openstack-kolla11:08
Fl1ntso, for now, my conclusion is: Until apache virtualhost isn't using TLS (SSLEngine on) mod_auth_mellon context can't craft a TLS relayState url as it's location composition (VirtualHost context) isn't TLS based.11:10
openstackgerritMichal Nasiadka proposed openstack/kolla-ansible master: WIP: Add OVN and OVS exporter deployment  https://review.opendev.org/76299211:12
*** wuchunyang has joined #openstack-kolla11:15
*** zzzeek has quit IRC11:17
*** zzzeek has joined #openstack-kolla11:20
*** zzzeek has quit IRC11:27
*** zzzeek has joined #openstack-kolla11:28
*** stingrayza has quit IRC11:29
*** stingrayza has joined #openstack-kolla11:31
Fl1ntok, so there is someone else having the same issue: https://github.com/latchset/mod_auth_mellon/issues/27 and hopefully there is mellon diagnostics directive to activate more verbose log.11:35
openstackgerritMark Goddard proposed openstack/kolla master: WIP: CI: revert to public package mirrors after build  https://review.opendev.org/76192811:42
openstackgerritMark Goddard proposed openstack/kolla master: CI: add templated Dockerfiles to build logs  https://review.opendev.org/76299711:42
*** brinzhang0 has joined #openstack-kolla11:50
*** brinzhang_ has quit IRC11:53
*** brinzhang_ has joined #openstack-kolla11:55
*** brinzhang0 has quit IRC11:58
*** mgoddard has quit IRC12:01
openstackgerritMichal Nasiadka proposed openstack/kolla master: prometheus: Add OVN and OVS exporters  https://review.opendev.org/76298612:10
*** wuchunyang has quit IRC12:11
*** JamesBenson has joined #openstack-kolla12:12
*** mgoddard has joined #openstack-kolla12:15
*** jpward has joined #openstack-kolla12:20
*** Luzi has joined #openstack-kolla12:36
*** stingrayza has quit IRC12:37
*** zzzeek has quit IRC12:39
*** zzzeek has joined #openstack-kolla12:41
openstackgerritMichal Nasiadka proposed openstack/kolla master: prometheus: Add OVN exporter  https://review.opendev.org/76298612:45
Fl1ntok, I managed to re deploy our staging environment in order to validate my assertion.12:56
Fl1ntso12:56
Fl1ntour ADFS refuse to send claims to non-tls assertion consumer service URLs (postResponse) but in the meantime, if I do activate the kolla_enable_external_tls then the VIP:80 redirect to VIP:443 but then VIP:5000 doesn't provide TLS endpoint until you declare either haproxy to redirect keystone_external_redirect or keystone apache vhost TLS Backend.12:58
Fl1ntnow my question to you mnasiadka is, do you use a Train release or a Ussuri one?12:59
mnasiadkaUssuri12:59
Fl1ntTBN: CentOS provided mod_auth_mellon comes without mellon_diagnostic compiled so no chance ^^12:59
Fl1ntah ok, so hence why it works13:00
Fl1ntok and finally, did anyone already successfully managed to make keystone SAML2.0 federated authentication usin mod_mellon work on a Train release?13:01
*** dougsz has joined #openstack-kolla13:01
Fl1ntfrom my test, the missing part is all the backend tls works done with ussuri as when doing federation agains SAML2 endpoint (mainly ADFS) you need the whole communication channel to be TLS as your client (WebBrowser mostly) need to connect to keystone directly using TLS but keystone mod_mellon won't craft a TLS endpoint until your apache vhost is using TLS.13:04
Fl1ntmore accuratly, mod_mellon won't create an appropriate relayState URL13:05
Fl1ntand so lead you to this redirect loop nightmare13:06
Fl1ntbecause haproxy translate the request from HTTP to HTTPS but he request parameter continue to be a non-tls originURL/RelayState URL etc etc13:06
*** wuchunyang has joined #openstack-kolla13:07
Fl1ntmgoddard, until how many time is Train supposed to be supported on kolla?13:07
mgoddardFl1nt: it will enter extended maintenance in about 6 months13:08
yoctozeptoFl1nt: but we can support it in em stage as long as we care13:08
yoctozeptojust with no new official releases13:08
Fl1ntI need to evaluate if it worth the effort to either migrate to ussuri right now and so natively get TLS backend support, or if I should patch Train appropriately13:09
yoctozeptoI believe Train might be late because Ussuri breaks c7 compat13:09
yoctozeptoI would just go to Ussuri13:09
yoctozeptoyou would have to upgrade anyhow13:09
Fl1ntyeah, this is kind of an additional breakthrough but I want to migrate on C8 for prod release.13:09
yoctozeptomgoddard, mnasiadka: I have some cycles for upstream today and tomorrow - any priority stuff to look at?13:10
mgoddardyoctozepto: docker pull limits13:11
mgoddardyoctozepto: https://bugs.launchpad.net/kolla-ansible/+bug/190406213:11
openstackLaunchpad bug 1904062 in kolla-ansible wallaby "external ceph cinder volume config breaks volumes on ussuri upgrade" [High,Triaged]13:11
yoctozeptomgoddard: ack, have you talked to infra since then?13:11
mgoddardyoctozepto: not yet13:11
yoctozeptook, then I will handle this13:12
Fl1ntmgoddard, regarding ceph external bug, it's not recommended to fix the host actually and as mentioned there is a command to migrate volumes.13:13
Fl1ntbut I don't completely get the issue tbh it's not that clear what's the issue.13:14
Fl1ntoooh ok, I see, BTW, train doc about external ceph is broken.13:15
mnasiadkamgoddard: I can take the cinder bug, did an investigation yesterday.13:19
mgoddardthanks mnasiadka13:20
yoctozeptomgoddard: http://lists.openstack.org/pipermail/openstack-discuss/2020-November/018817.html13:30
yoctozeptothanks mnasiadka13:31
yoctozeptomgoddard: also pinged infra on irc (#opendev)13:31
yoctozeptolet's see and I will coordinate this13:31
yoctozeptoany other urgent matters?13:31
mgoddardnice, thanks yoctozepto13:31
openstackgerritMichal Nasiadka proposed openstack/kolla-ansible master: cinder: start using active-active for rbd  https://review.opendev.org/76301113:33
mgoddardyoctozepto: I don't think so. I guess just victoria stabilisation13:33
mnasiadkayeah, we could look at bugs targeted at victoria and just start closing them13:33
mnasiadkaI guess Kolla should be close to a first stable release for Victoria13:34
yoctozeptothat is what I wanted to do next, so we are aligned13:35
*** k_mouza has quit IRC13:41
openstackgerritMerged openstack/kolla master: Bump up openstack exporter to 1.2.0  https://review.opendev.org/76112313:45
*** dougsz has quit IRC13:48
*** dougsz has joined #openstack-kolla14:17
*** dougsz has quit IRC14:17
*** dougsz has joined #openstack-kolla14:18
*** Luzi has quit IRC14:18
openstackgerritMerged openstack/kayobe-config-dev master: Sync configs with kayobe @ 074024d63f9cb364ca16a7a7f0ac94d77ee9466b  https://review.opendev.org/76282614:19
*** k_mouza has joined #openstack-kolla14:24
*** k_mouza has quit IRC14:24
*** k_mouza has joined #openstack-kolla14:24
*** brinzhang_ has quit IRC14:26
mnasiadkayoctozepto: so in order to close a bug, I need to submit a feature with some distributed lock manager? :D14:41
mnasiadkaI see tripleo is using etcd14:41
yoctozeptomnasiadka: not sure about our etcd either, sorry14:41
yoctozeptomnasiadka: I mean its ha properties14:42
openstackgerritMark Goddard proposed openstack/kolla master: WIP: CI: revert to public package mirrors after build  https://review.opendev.org/76192814:42
openstackgerritMark Goddard proposed openstack/kolla master: Remove footer block from intermediate images  https://review.opendev.org/76302714:42
yoctozeptomnasiadka: redis generally works better because of etcd driver quirks14:42
mnasiadkawell, we would need to enforce coordination backend to anything when cinder backend is ceph14:42
yoctozeptobut really one needs to finally look at that lock mechanism14:42
yoctozeptoyeah, that too14:42
yoctozeptosounds bad14:42
yoctozeptothe previous generally worked14:42
yoctozeptoso maybe for now we should just keep it14:43
mnasiadkayoctozepto: it led to some duplications, because it was meant for active/passive14:45
yoctozeptomnasiadka: could you expand that?14:46
yoctozeptomnasiadka: I might want to know :D14:46
mnasiadkahttps://docs.openstack.org/cinder/latest/contributor/high_availability.html#cinder-volume14:47
mnasiadkacheck the attention :)14:47
yoctozeptomnasiadka: yeah, that's why we should use *backend_host* and I am pretty sure we always did14:50
yoctozeptolet me check my deployment14:51
yoctozeptomhm14:52
yoctozeptothough I think I set it14:53
*** TrevorV has joined #openstack-kolla14:53
yoctozeptomnasiadka: it only has some issues that mgoddard linked to: https://bugs.launchpad.net/cinder/+bug/183740314:55
openstackLaunchpad bug 1837403 in openstack-ansible trunk "CleanableInUse exceptions when doing large parallel operations (like snapshot creates)" [Undecided,New]14:55
yoctozepto"large number of parallel Cinder operations"14:55
yoctozeptoI certainly do not have it14:55
yoctozeptofrom cinder docs I can't tell why backend_host would be "hacky"14:56
mnasiadkaWell, we can just add it back, and work on coordination14:57
mnasiadkaIt won’t be worse...14:57
yoctozeptoexactly14:59
yoctozeptobut we did not have it15:00
yoctozeptoso it is a problem for those moving from internal to external15:00
yoctozeptoso I guess this is a general issue against external15:00
yoctozeptoand not essentially its refactoring15:00
mgoddardThe most common deployment option for Cinder-Volume is as Active-Passive. This requires a common storage backend, the same Cinder backend configuration in all nodes, having the backend_host set on the backend sections, and using a high-availability cluster resource manager like Pacemaker.15:00
yoctozeptoyeah, the pacemaker sounds scary15:01
mgoddardwe deploy active/active15:01
yoctozeptobut it seems to work nonetheless without it15:01
yoctozeptowe deploy whatever15:01
yoctozepto:D15:01
Fl1ntCan someone explain this issue? I'm having a hard time finding what's the problem.15:08
Fl1ntbecause I'm using an external CEPH cluster and everything is active-active from a cinder and nova standpoint.15:09
*** dougsz has quit IRC15:11
Fl1ntI'm currently transfering my whole kolla-config for our new deployment that include a fair amount of downstream patch, so I'll be able to give some examples and samples of how we did it if needed.15:11
*** cah_link has quit IRC15:12
*** cah_link has joined #openstack-kolla15:13
mnasiadkamgoddard, yoctozepto: so what - just add backend_host back as part of the bugfix, and work on cluster/coordination? or do we want to do it properly (according to cinder)15:13
yoctozeptomnasiadka: where does it say that we should use active/active?15:15
Fl1ntjust TBN: not using backend_host and having a working one.15:15
yoctozeptoit feels right to use it15:15
mnasiadkayoctozepto: well, we deploy active active (multiple cinder-volumes)15:15
yoctozeptoFl1nt: then using `cluster` perhaps?15:15
Fl1ntneither15:15
Fl1nthold on, I'm redacting/pasting15:15
mnasiadkayoctozepto: but the mechanism we use to present them as one, is a bit hacky :)15:15
yoctozeptoFl1nt: well, then your HA is not really HA15:15
yoctozeptomnasiadka: hmm, could be15:15
yoctozeptomnasiadka: I felt like it worked like active/passive anyhow15:16
yoctozeptobecause cinder claims to require coordination for active/active15:16
mnasiadkayoctozepto: I guess it did, we could just set active/backup in haproxy15:16
yoctozeptoand I'm not running one15:16
Fl1ntI swear it is, as all hosts uses all cinder-volume nodes, hold on for the screenshots and conf15:16
mnasiadkayoctozepto: as the bugfix15:16
yoctozeptomnasiadka: haproxy does not care about cinder-volume15:16
mnasiadkaah right15:16
yoctozepto:-)15:16
mnasiadkaso then we can just go back to old hacky somewhat working solution15:17
mnasiadkaand close the bug15:17
*** dougsz has joined #openstack-kolla15:17
mnasiadkabecause cluster+coordination seems like it needs a fair amount of testing15:18
openstackgerritMichal Nasiadka proposed openstack/kolla-ansible stable/ussuri: [baremetal]: Use $releasever in docker-ce repo  https://review.opendev.org/76297915:20
mgoddardmnasiadka, yoctozepto: let's find out from Cinder what the potential issues with backend_host & active/active are15:21
mgoddardmailing list time?15:21
yoctozeptomgoddard: ok, good call15:23
yoctozeptoI wonder as it works just fine for me at the moment :D15:24
yoctozepto(and really thought it somehow degraded itself to active/passive due to no coordination)15:24
yoctozeptoyou might want to mention that no coordination is configured15:24
*** ysirndjuro has joined #openstack-kolla15:24
mnasiadkayoctozepto: if you feel super bad to images created with just downloading a file via curl, so then we need to fix one third of images? :)15:27
yoctozeptomnasiadka: feel bad about increasing our debt :D15:31
mnasiadkayoctozepto: well, then we need some better approach - any proposals? ;)15:31
Fl1ntyoctozepto, ok, which config/screen do you need?15:32
Fl1ntI've retrieve cinder-volume.conf15:32
Fl1ntceph.conf15:32
Fl1ntand dash screen of the services and volumes distribution15:32
Fl1nthttp://paste.openstack.org/show/aM7Xsdim8oAHQXcfAUVY/ - cinder-volume.conf15:33
Fl1nthttp://paste.openstack.org/show/Tw0FnpLhdQN93EP1teSm/ - ceph.conf15:33
Fl1ntand here is the volumes distribution: https://imgur.com/a/2CKptab15:35
Fl1ntTBN: this is on a staging cluster but yet.15:35
Fl1ntsorry for those blackbox but I don't yet get a local distributed cluster to live show as I'm still waiting for some parts.15:36
mnasiadkayoctozepto: we can use ADD, but it will not cache those downloads anyway15:38
Fl1ntor you could use the packages from distributions, I've done it that way as we don't get access to internet.15:39
yoctozeptoFl1nt: kill one cinder-volume to discover that you can no longer manage volumes linked to it15:39
yoctozeptomnasiadka: I mean I would love to have this from real repos15:39
mnasiadkayoctozepto: real repos... you want people writing 300 lines of code go an extra mile and build rpms and debs? :)15:40
mnasiadkait would probably take more time than writing the app :)15:40
yoctozeptomnasiadka: yeas15:40
yoctozeptomake them feel the pain15:40
Fl1ntaaaaaaah THAT! ok, so the issue is about management. not distribution or availability.15:40
yoctozeptoFl1nt: well, availability during failure is impacted15:41
yoctozeptorunning vms are happy15:41
yoctozeptobut otherwise :-)15:41
mnasiadkayoctozepto: get back on the ground now ;)15:41
Fl1ntit's not, we already tested that, if you loose your storage controller you don't loose the VM workload and attachment, you just need to rebalance the volume managed.15:41
yoctozeptomnasiadka: yes, sir15:42
*** skramaja has quit IRC15:42
yoctozeptoFl1nt: yeah, it needs "rebalancing"15:42
yoctozeptoquirky but worky15:42
Fl1ntyoctozepto, right you're too quick to wrote :p but yet availability of the APIs calls aren't something that can't be managed real quick by migrating to another controller so what's the point? Except if your cluster only have one controller which is far from being serious.15:43
Fl1ntI mean, rebalancing data is already an ops daily task with swift so, it's not really a biggie to have to do it for a cinder-volume controller.15:44
openstackgerritMark Goddard proposed openstack/kolla master: WIP: CI: revert to public package mirrors after build  https://review.opendev.org/76192815:45
Fl1ntBTW, from cinder doc: Active-Active is not yet15:50
Fl1nt# supported.15:50
Fl1ntso I guess it kinda fix the issue.15:51
mnasiadkawhere is it stated?15:54
*** rouk has joined #openstack-kolla15:54
Fl1nthttps://docs.openstack.org/cinder/ussuri/configuration/block-storage/samples/cinder.conf.html <- at the cluster directive level15:54
Fl1nttho it seems it became available on victoria15:55
Fl1ntbut as neither the official administration or installation doc is actually refering to it and that only OOO seems to "use it" I would be extra careful introducing the feature.15:57
mgoddardwe are using active/active15:58
mgoddardalways have15:59
Fl1nton victoria?15:59
mgoddardthe question is around how we do it15:59
mgoddardif you are running ceph and more than one active cinder-volume, you have active/active15:59
Fl1ntstarting from which release?15:59
mgoddardactive/passive would require some fencing mechanism, such as pacemaker15:59
Fl1ntmgoddard, I think we're not talking about the same thing.16:00
mgoddardpossibly16:00
Fl1ntcinder-volume services are active/active in terms of requests, as long as there is one down and that your request is passing by the VIP you're safe16:01
Fl1ntBUT16:01
Fl1ntas yoctozepto noted it, if you've got a cinder-volume agent down, all volumes attached BY this agent and so referenced by it within the database, won't be manageable until you explicitly attach them to another up and running agent16:01
Fl1ntusing the openstack cli16:02
roukoh hey is this my bug being discussed?16:02
Fl1ntor maybe through horizon I don't tested it.16:02
Fl1ntrouk, depends ^^16:02
roukhttps://bugs.launchpad.net/kolla-ansible/+bug/190406216:02
openstackLaunchpad bug 1904062 in kolla-ansible wallaby "external ceph cinder volume config breaks volumes on ussuri upgrade" [High,In progress] - Assigned to Michal Nasiadka (mnasiadka)16:02
Fl1ntrouk, yep16:03
roukis the backend_host method from before no longer recommended? im a bit out of the loop.16:03
Fl1ntto be clear, rouk your problem is that when you lost a cinder-volume agent, you have to migrate them to another still running one before being able to manage your volume again right?16:04
roukthat, and that every pre-ussuri volume is on the old host, which goes away on upgrade forever.16:04
roukso every existing volume needs a manual migration to a random host.16:04
Fl1ntyes, using the openstack cli16:05
mgoddardrouk: the old ones are assigned to the old backend_host16:05
mgoddardwhen that gets removed, they need to be migrated to a real hostname16:05
roukmgoddard: that is correct, yeah.16:05
Fl1ntthe old backend_host vanished if I correctly understand rouk16:05
roukFl1nt: s/vanished/down/16:05
Fl1ntyes my point16:05
mgoddardrouk: did you actively remove backend_host?16:06
rouki moved to the new templates for external_ceph, which involved trusting kolla to not brick my previously-recommended config :p16:06
mgoddardright16:07
roukso right now im just overriding backend_host back in.16:07
*** wuchunyang has quit IRC16:07
*** k_mouza has quit IRC16:07
roukif theres a way to get individual host states without manual migrations, that would be cool though.16:07
Fl1ntso you basically just have to cinder migrate <volume> <host> with a for loop16:08
Fl1ntreplace cinder with the appropriate openstack cmd if you're using the wrapper16:08
roukyeah, which is <0 fun on routine maintenance, or random node deaths.16:08
Fl1ntrouk, no no no16:08
Fl1ntif your storage host die, and that it is part of a ceph cluster16:09
Fl1ntyour VMs don't lost the volume16:09
Fl1ntif your storage host die, that it is part of a ceph cluster AND hosting a cinder-volume agent16:09
roukif i take down a cinder-volume node for maint, and then someone, out of my hundreds of users, deletes a vm, and i didnt react instantly and migrate volumes as soon as the crash happened, i end up with a volume attached to a deleted instance.16:09
Fl1ntthen your "just" need to instruct openstack to delegate those volume management to another cindervolume agent16:09
roukthats how i noticed this issue.16:10
roukuser deleted vm, ended up with a stuck attached volume, cause cinder-volume didnt respond during the delete.16:10
Fl1ntyou then just have to put your volume in an available state get ride of the attachement and delete it or let it be available again16:10
Fl1ntthat's the way cinder/openstack works, it's not really a bug.16:11
roukwhich is manual intervention in hundreds of workflows which are often scripted and will require calls to like 20 people to get them to clean up.16:11
Fl1ntwell, welcome to openstack orphane resources no management ^^16:11
mgoddardFl1nt: well, that's the way it works if you don't use backend_host or clustering16:12
mgoddardFl1nt: but surely with backend_host or clustering, you don't need to do that?16:12
roukbackend_host just fixes this, and if clustering can do it too, while maintaining a record for each agent, that would be even better.16:12
Fl1ntyou can't use clustering until victoria from my understanding of the doc (but can be wrong) and using backend_host wont fix that specific orphaned issue.16:12
Fl1nts/issue/resource issue/16:13
roukmgoddard: correct, the odds of a stuck volume with backend_host is only if a node dies and is sent commands in that 1 minute period before its timed off rabbit, i think.16:13
roukit still happens, but its better than "till all volumes get migrated"16:13
roukpresumably clustering would fix that last possible case.16:14
mgoddardFl1nt: OSA switched to active/active in Stein: https://opendev.org/openstack/openstack-ansible-os_cinder/commit/918b9077c816be5fc056637301265e0be2f245ab16:14
mgoddard(after release)16:15
Fl1ntrouk, are you running on victoria? because until then, you can't get cluster (stated within the doc) and even on victoria there is a lack of documentation.16:15
Fl1ntmgoddard, yeah but it's not because they activate something that it necessarly work and it require pacemaker.16:15
roukFl1nt: nah, im slow because of ties to FWaaS that i need to convince said hundreds of people to fix their 0/0 public ip security groups before i can upgrade.16:16
mgoddardFl1nt: no, pacemaker is for active/passive16:16
roukussuri hitting prod for me friday.16:16
Fl1ntmgoddard, ok, noticed16:16
Fl1ntI can test cluster directive on staging but I doubt it will work like that out of the box.16:17
rouki can get victoria onto my PTE env and start testing it some time after the new year, sadly.16:18
roukso im kinda useless on testing clustering.16:18
Fl1ntmgoddard, how can they have cinder a/a cluster used with stein when the configuration from cinder and the doc state that it is not yet supported even on ussuri?16:18
Fl1nthttps://docs.openstack.org/cinder/ussuri/configuration/block-storage/samples/cinder.conf.html16:18
Fl1ntis there kind of a new "beta" phase with features on openstack like for kubernetes now ?16:19
mnasiadkawell, we need some statement from Cinder team, how it should be done in Victoria and before ;-)16:20
Fl1ntrouk, is your cluster hosting sensitive data? Do you have proper backup solution in place? Because if you don't, I would not advise you to use the cluster feature until victoria and the proper validation from cinde maintainers from the distribution list.16:20
Fl1ntmnasiadka, +1016:20
mgoddardFl1nt: https://docs.openstack.org/releasenotes/cinder/rocky.html16:21
roukFl1nt: im not doing anything in prod till i know its good. PTE is worthless to me, its big, but its designed to be nuked.16:21
roukfor now, im going to keep using backend_host till theres a better option.16:21
Fl1ntare you refering to this mark? "Added support for active-active replication to the RBD driver. This allows users to configure multiple volume backends that are all a member of the same cluster participating in replication."16:22
*** k_mouza has joined #openstack-kolla16:23
mgoddardFl1nt: yes16:24
Fl1nthum, this is so vague that I don't know if it's not refering to ceph capability itself and not really cinder-volume agent per see16:24
roukit doesnt make sense as a ceph statement... unless they mean setting up cross-cluster pool replication? but cinder doesnt do pool management, it expects pools to be there already, heh.16:26
Fl1ntrouk, you can have multiple backends from the same ceph cluster and then even have additional mirroring on rbd16:26
Fl1ntall in all, need to be tested and validated from cinder team.16:27
Fl1ntfor instance on our prod cluster, we have three different ceph backends, participating on the same cluster and cinder actually use those three different backends.16:28
mgoddardhere's the patch that added that reno: https://review.opendev.org/#/c/556658/16:29
patchbotpatch 556658 - cinder - RBD: add support for active/active replication (MERGED) - 7 patch sets16:29
mgoddardI think the word replication is a misnomer16:29
Fl1ntthank for the patch16:30
Fl1ntso16:30
Fl1ntit's a CEPH level feature, not a cinder-volume one.16:31
Fl1ntit use CEPH RBD mirroring16:31
*** cah_link has quit IRC16:32
Fl1nthum... actually, it not even that16:35
Fl1ntthey're cloning images16:36
roukso i have a completely tangental question, which ive tried to get neutron to answer a few times, but never got anywhere and havnt had time to persue it hard enough. since train upgrade, ive had issues with routers not getting routes, and when one fails over, its a dice roll if the target has its assigned routes. could it just be the fwaas plugin for l3-agent slowly rotting?16:36
roukre-adding routes magically fixes the problem, but it only started happening in train.16:37
Fl1ntNever had this issue sorry :(16:37
roukyeah its nasty, its in all my clusters, and nobody elses, and has no error, and no reproducable test case.16:38
roukmust be fwaas since im apparently the only user left.16:38
roukFl1nt: i agree, the code, and the commit message, and the merge request are uselessly vague, and they need to weigh in on the "right" solution.16:39
Fl1ntmgoddard, look at the _disable_replication function, it's a ceph function to mirror a flattened image (volume in ceph vocabulary)16:40
Fl1ntso basically everything replication related is based on this concept16:40
roukthen yeah, thats not helpful for this, sadly.16:41
roukmust be clustering for >V, backend_host for <V, but would be nice for their opinion on it.16:41
* Fl1nt reading the rbd driver... it's pretty interesting ^^16:42
roukits a lot shorter code than i expected, but everything with ceph is pretty smooth, so i guess not that unexpected.16:43
Fl1ntit's not even clear from the driver itself as everything named volume is coming from cinder library but then they're transforming volume back as image when dealing with ceph related block. it's confusing ^^16:43
roukyeah, needs a terminology fix, too many opinions on what something means.16:43
mgoddardFl1nt: I think that is unrelated. AFAICT, the cluster option, SUPPORTS_ACTIVE_ACTIVE driver flag etc relate to mapping of volumes to cinder-volume hosts16:44
mgoddardvolume replication is a ceph cluster concern16:44
Fl1ntactually, I would be able to tell you correctly if I download the cinder source as the IDE would then follow the appropriate link and not just be a guess ^^16:44
Fl1ntwith ceph there is to my knowledge, three different replication features, geo/rbd mirroring and OSDs of course, so it isn't that clear. the problem here is that the failover function fail if your volume isn't a replication enabled RBD image.16:47
Fl1ntwhich doesn't make sense16:48
Fl1ntas why would cinder-volume require to know about a image feature related in order to do the active/active?16:48
*** nikparasyr has left #openstack-kolla16:49
mgoddardhttps://specs.openstack.org/openstack/cinder-specs/specs/mitaka/cinder-volume-active-active-support.html16:52
*** bengates has quit IRC16:56
*** muhaha has joined #openstack-kolla16:57
Fl1ntmgoddard, thanks, gonna dive deeper on tomorrow as from the review patch I can't really tell what they're really doing there are function call from external module that I can't understand otherwise than with having the code.17:06
*** Fl1nt has quit IRC17:11
*** cah_link has joined #openstack-kolla17:12
*** cah_link has quit IRC17:15
*** e0ne has quit IRC17:22
*** kevko has quit IRC17:23
*** rpittau is now known as rpittau|afk17:27
*** k_mouza has quit IRC17:30
*** dougsz has quit IRC17:32
*** kevko has joined #openstack-kolla17:42
*** k_mouza has joined #openstack-kolla17:50
*** gfidente is now known as gfidente|afk17:59
*** mgoddard has quit IRC18:02
*** k_mouza has quit IRC18:10
*** k_mouza has joined #openstack-kolla18:19
*** k_mouza has quit IRC18:56
*** mgoddard has joined #openstack-kolla19:21
*** kevko has quit IRC19:44
*** k_mouza has joined #openstack-kolla19:57
*** k_mouza has quit IRC20:01
*** k_mouza has joined #openstack-kolla20:16
*** k_mouza has quit IRC20:21
*** mgoddard has quit IRC20:31
*** TrevorV has quit IRC20:50
*** gfidente|afk is now known as gfidente21:04
*** hjensas_ has joined #openstack-kolla21:06
*** hjensas has quit IRC21:10
*** jovial[m] has quit IRC21:41
*** muhaha has quit IRC21:47
*** jovial[m] has joined #openstack-kolla22:05
openstackgerritJames Kirsch proposed openstack/kolla master: Add LetsEncrypt images for cert request/renewal  https://review.opendev.org/74133922:31
*** hjensas__ has joined #openstack-kolla22:47
*** hjensas_ has quit IRC22:51
*** quasar` is now known as parallax23:03
*** parallax has left #openstack-kolla23:07
*** parallax has joined #openstack-kolla23:15
*** Arador has joined #openstack-kolla23:36
AradorHello, new here and new to Kolla, but I have been using Openstack for a while.  Is anyone here familiar with getting the Adjutant role to work?  I am getting an error that say "no filter named 'customise_fluentd'"23:40
*** mloza has quit IRC23:46

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!