mnasiadka | morning | 08:15 |
opendevreview | Michal Nasiadka proposed openstack/kolla stable/2023.2: toolbox: Bump ansible-core to 2.15 https://review.opendev.org/c/openstack/kolla/+/910148 | 08:28 |
mnasiadka | frickler, kevko: ^^ seems we forgot this in 2023.2 cycle... | 08:34 |
opendevreview | Michal Nasiadka proposed openstack/kolla master: WIP: Add support for rpm to repos.yaml https://review.opendev.org/c/openstack/kolla/+/909879 | 08:42 |
opendevreview | Michal Nasiadka proposed openstack/kolla-ansible master: CI: Break OVN cluster before reconfigure https://review.opendev.org/c/openstack/kolla-ansible/+/897935 | 08:48 |
frickler | mnasiadka: I'm not sure we can or should do this for a stable branch. let's see what testing says, but at least it'll need a reno? | 08:51 |
mnasiadka | frickler: now it's the minimum supported version; we usually had the maximum supported version - if it passes I assume we can create a bug and add a reno | 08:52 |
opendevreview | Michal Nasiadka proposed openstack/kolla master: WIP: Add support for rpm to repos.yaml https://review.opendev.org/c/openstack/kolla/+/909879 | 09:16 |
opendevreview | Michal Nasiadka proposed openstack/kolla stable/2023.2: toolbox: Improve retry loop for ansible-galaxy https://review.opendev.org/c/openstack/kolla/+/910181 | 09:45 |
opendevreview | Michal Nasiadka proposed openstack/kolla stable/2023.1: toolbox: Improve retry loop for ansible-galaxy https://review.opendev.org/c/openstack/kolla/+/910182 | 09:45 |
opendevreview | Michal Nasiadka proposed openstack/kolla stable/zed: toolbox: Improve retry loop for ansible-galaxy https://review.opendev.org/c/openstack/kolla/+/910183 | 09:45 |
opendevreview | Michal Nasiadka proposed openstack/kolla stable/zed: toolbox: Improve retry loop for ansible-galaxy https://review.opendev.org/c/openstack/kolla/+/910183 | 09:46 |
mnasiadka | seems the same thing happened on 2023.1 - some collections were not installed and we didn't fail the build (periodic) | 09:46 |
frickler | that sounds plausible | 09:48 |
SvenKieske | mhm | 09:50 |
frickler | oh, we didn't even retry in zed | 09:51 |
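(The retry-loop change discussed above amounts to something like the following sketch; the requirements file name, retry count and sleep interval are assumptions for illustration, not the actual patch.)

```
# Illustrative sketch only, not the actual patch: retry the install
# a few times and fail the image build if the collections are still
# missing, instead of silently continuing without them.
for attempt in 1 2 3 4 5; do
    ansible-galaxy collection install -r requirements.yml && break
    if [ "$attempt" -eq 5 ]; then
        echo "ansible-galaxy failed after $attempt attempts" >&2
        exit 1
    fi
    sleep 10
done
```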
SvenKieske | ah we didn't backport this? I remember it. | 09:51 |
SvenKieske | I should pay more attention to where a backport is needed, I guess. :) | 09:59 |
mnasiadka | well, a simple script getting the backport flag and backporting would be nice | 10:07 |
frickler | yeah, except that that patch also didn't have a backport flag set. so we need another script to check that. or a deputy that watches things ;) | 10:12 |
mnasiadka | well, we need us all checking if it needs backporting when we merge it :) | 10:13 |
mnasiadka | point taken | 10:13 |
SvenKieske | we could have a script that mandates an active decision on backporting, either -1 or +1, and forbids abstaining, no? so a basic gerrit rule for that? it adds more work though, but I guess that's the point? :) | 10:14 |
mnasiadka | that's also an option | 10:15 |
SvenKieske | so, basically before a patch is gated and gets Verified +2, the bot checks if the backport yes/no decision was made. | 10:16 |
mnasiadka | I think we can just have gerrit config that doesn't allow W+1 if backport flag is not one of -2 -1 +1 +2 | 10:18 |
SvenKieske | yeah, sounds good. if you nudge me in the right direction I can spin up a short patch | 10:18 |
SvenKieske | is that in project-config settings or where do I find this? | 10:18 |
opendevreview | Michal Nasiadka proposed openstack/kolla stable/zed: toolbox: Improve retry loop for ansible-galaxy https://review.opendev.org/c/openstack/kolla/+/910183 | 10:20 |
SvenKieske | I guess it would go somewhere in here? https://opendev.org/openstack/project-config/src/branch/master/gerrit/acls/openstack/kolla.config#L53 | 10:21 |
mnasiadka | probably, I'm not an expert on those weird acls, I just remember they need a tab (not 4 spaces instead of a tab) ;-) | 10:37 |
frickler | I don't think you can do this with gerrit, iiuc you'd have to renumber things from 0..+4 or so. we'd need some more complicated check job like the release team uses to verify PTL approvals, but I'd think that would be overdoing it. maybe a script that does these checks right before the monthly stable releases would work better | 10:42 |
opendevreview | Will Szumski proposed openstack/kayobe master: WIP: Add podman support https://review.opendev.org/c/openstack/kayobe/+/909686 | 10:47 |
SvenKieske | frickler: well I guess you could do this via the prolog rules that gerrit uses to evaluate the ACLs, you can add custom stuff there, but I guess that's too invasive, as it seems we do not use this feature yet. | 10:49 |
SvenKieske | mhm, seems that's deprecated as well and one should use submit requirements, but I don't see how this can be done with submit requirements just yet: https://gerrit-review.googlesource.com/Documentation/prolog-cookbook.html | 10:51 |
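(A hedged sketch of what such a submit requirement could look like in kolla.config; the label name Backport-Candidate and the exact predicate are assumptions to illustrate the idea, not a tested configuration.)

```
# Hypothetical sketch for gerrit/acls/openstack/kolla.config; the
# label name and predicate are assumptions, not a tested config.
# (Indentation in the real file must be a tab, per the note above.)
[submit-requirement "Backport-Decision"]
	description = An explicit backport yes/no vote must be recorded
	submittableIf = label:Backport-Candidate=ANY
```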
opendevreview | Michal Nasiadka proposed openstack/kolla master: WIP: Add support for rpm to repos.yaml https://review.opendev.org/c/openstack/kolla/+/909879 | 10:52 |
opendevreview | Matúš Jenča proposed openstack/kolla-ansible master: Add Redis as caching backend for Keystone https://review.opendev.org/c/openstack/kolla-ansible/+/909201 | 10:55 |
SvenKieske | mhm, thinking about it, I _guess_ I have a solution.. | 11:00 |
SvenKieske | mnasiadka, frickler: possibly buggy, so please evaluate :) https://review.opendev.org/c/openstack/project-config/+/910212 | 11:03 |
kevko | morning | 11:16 |
SvenKieske | o/ | 11:18 |
kevko | We were upgrading a quite big OpenStack ... and RabbitMQ failed to upgrade. | 11:20 |
kevko | and i am wondering why .... | 11:21 |
kevko | we've seen in the logs something like "wait for mnesia ... blabla ... 9 retries left ... fail" | 11:21 |
SvenKieske | that upgrade to zed? or what was it? | 11:21 |
kevko | it was xena - > yoga | 11:21 |
SvenKieske | was this with default config/upgraded from older versions, or a xena release? because I have seen old envs having no ha enabled at all, because of our conservative upgrade policies so far :) | 11:24 |
kevko | in the upgrade job there is a task which stops rabbitmq on all but the first node ... and right after this the first node is stopped ... as it's a huge cloud, I was wondering if this was the problem (while 2 nodes were still in the process of stopping, the first node was stopped during this ... and because of this the cluster was not stopped correctly) | 11:25 |
SvenKieske | mhm, taking one step back, what does "failed to upgrade" mean? the upgrade role did throw an error? or did the processes just not start successfully after the upgrade? | 11:25 |
mnasiadka | kevko: mnesia is getting deprecated - RMQ is moving to Raft - but that doesn't help now :) | 11:25 |
kevko | mnasiadka: I know .... we fixed it ...but I am very curious why it happened ... | 11:26 |
SvenKieske | kevko: I was under the impression our upgrade jobs should reboot the rmq nodes one after the other, and not two simultaneously? *looking at the upgrade job* | 11:26 |
kevko | can be my theory the right one ? | 11:26 |
SvenKieske | what is your theory? 2 nodes in process of stopping and first node stopped during this via our playbook? I need to look but afaik that should not happen | 11:27 |
kevko | SvenKieske: yeah, but it's only theory | 11:27 |
SvenKieske | maybe we have some edge case here for large envs, did you verify that the upgrade logic permits this? | 11:27 |
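(A minimal, hypothetical sketch of the ordering kevko describes; the task names and commands are invented, not the actual kolla-ansible tasks. The point is that if the first task returns before the secondaries have fully stopped, the first node can go down while they are still shutting down, so no node records a clean "last to stop" state.)

```
# Hypothetical task-list sketch of the described ordering (names
# invented, not the actual kolla-ansible tasks). If the first task
# does not wait for the secondaries to finish stopping, the first
# node can be stopped while they are still shutting down.
- name: Stop RabbitMQ on all nodes except the first
  command: docker exec rabbitmq rabbitmqctl stop_app
  when: inventory_hostname != groups['rabbitmq'][0]

- name: Stop RabbitMQ on the first node
  command: docker exec rabbitmq rabbitmqctl stop_app
  when: inventory_hostname == groups['rabbitmq'][0]
```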
kevko | SvenKieske: I really don't know ... we need to tell the rabbitmq node to force start | 11:29 |
kevko | rabbitmqctl force_start | 11:29 |
kevko | or how it is ... | 11:29 |
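(The command kevko is reaching for is presumably rabbitmqctl force_boot; a minimal recovery sketch, assuming the kolla container name:)

```
# Presumably force_boot is the command meant here: it marks the node
# to start even if it was not the last one to shut down, instead of
# waiting for mnesia tables held by peers. Container name assumed.
docker exec rabbitmq rabbitmqctl force_boot
docker restart rabbitmq
```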
SvenKieske | mhm interesting | 11:29 |
SvenKieske | have a meeting now, can take a look later maybe | 11:29 |
kevko | well, maybe it's only a yoga issue ... i don't know ... because i've already checked newer branches and the way it is upgraded is a little bit different | 11:30 |
mnasiadka | it had to be reworked due to Ansible fixing a bug that we apparently relied on in the maria/rmq roles :) | 11:31 |
kevko | mnasiadka: which bug ? | 11:36 |
mnasiadka | it's referenced in the commit message for sure | 11:41 |
mnasiadka | https://review.opendev.org/c/openstack/kolla-ansible/+/886485 | 11:42 |
mnasiadka | but it was in 2023.1 | 11:42 |
mnasiadka | not zed | 11:42 |
SvenKieske | last change in the zed release with respect to upgrades was afaik 3dfca54dfb3be04283e2c5d6800e27cc5a3d8776 which removed the ha-all policy when not required | 11:45 |
SvenKieske | the last change to the config.yml used there in zed was in fact by me, but I would rule that one out; that was 159925248375727ae67ef93a606b09a54e83ba2d (re-adding the rmq config for the clustering interface) | 11:47 |
SvenKieske | well if you have ipv6 that might be a problem, actually | 11:47 |
kevko | SvenKieske: well, we didn't remove the ha-all policy of course ... for now it's a no-go to do this ... | 11:48 |
SvenKieske | but by default afaik that changes nothing, it only provides the ability to configure the interface used by rabbitmq for clustering, which had been removed by accident | 11:48 |
SvenKieske | kevko: ok, so I guess you didn't hit a bug related to new code, at least from the few files I looked at. | 11:49 |
SvenKieske | might be the upgrade logic is not robust enough | 11:49 |
kevko | it's not :D | 11:49 |
SvenKieske | for smaller deployments container restarts are rather fast, because there are not that many queues and queues are mostly empty anyway. | 11:50 |
SvenKieske | so the fast restart might obscure a race condition | 11:50 |
kevko | SvenKieske: I know :D .... testing env upgraded without problems, LAB on customer side was OK .... production not :D | 11:50 |
SvenKieske | maybe write all your information up in a bug report? that would be a good starting point for the investigation (and I can feed it to my time tracker :D) | 11:50 |
opendevreview | Michal Nasiadka proposed openstack/kolla master: WIP: Add support for rpm to repos.yaml https://review.opendev.org/c/openstack/kolla/+/909879 | 11:51 |
kevko | SvenKieske: let me provide some logs ...i need to ask them to give me some .. | 11:52 |
opendevreview | Merged openstack/kolla master: Install ironic-inspector in bifrost https://review.opendev.org/c/openstack/kolla/+/909865 | 11:54 |
opendevreview | Michal Nasiadka proposed openstack/kolla master: Remove calls to libvirt repo https://review.opendev.org/c/openstack/kolla/+/910224 | 12:04 |
kevko | btw, xena centos libvirt to yoga centos libvirt can't live migrate even if the libvirt version is the same :D | 12:07 |
mnasiadka | and the reason is? ;-) | 12:08 |
kevko | because they patched something ... :D ..but broke the code | 12:08 |
kevko | 2024-02-23 14:40:11.574+0000: 967670: error : qemuProcessReportLogError:2051 : internal error: qemu unexpectedly closed the monitor: 2024-02-23T14:40:11.549441Z qemu-kvm: Missing section footer for 0000:00:01.3/piix4_pm | 12:09 |
kevko | 2024-02-23T14:40:11.550219Z qemu-kvm: load of migration failed: Invalid argument | 12:09 |
kevko | mnasiadka: can I find the patches and build my own version (for debuntu it is easy, as I can download the source for a package ... and debug ...) .. or is it closed? I don't have much experience with rpms | 12:11 |
mnasiadka | the spec used by c8s/c9s is here: https://git.centos.org/rpms/libvirt | 12:11 |
mnasiadka | you can build on your own | 12:11 |
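(A hedged sketch of the usual CentOS Stream rebuild workflow; the branch name and the centpkg/mock steps are assumptions, not verified instructions.)

```
# Hedged sketch; branch name and tooling are assumptions.
git clone https://git.centos.org/rpms/libvirt && cd libvirt
git checkout c8s                      # pick the branch matching your release
centpkg sources                       # fetch tarballs from the lookaside cache
# edit SPECS/libvirt.spec or its patches here, then:
rpmbuild --define "_topdir $PWD" -bs SPECS/libvirt.spec
mock -r centos-stream-8-x86_64 SRPMS/libvirt-*.src.rpm
```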
opendevreview | Michal Nasiadka proposed openstack/kolla master: WIP: Add support for rpm to repos.yaml https://review.opendev.org/c/openstack/kolla/+/909879 | 12:17 |
jheikkin | Does kayobe/kolla support other container backends than Docker? Does anyone have experience with running OpenStack services in systemd-nspawn containers? | 12:18 |
mnasiadka | Kolla supports Podman since 2023.2 release, Kayobe doesn't support podman yet | 12:18 |
mnasiadka | systemd-nspawn - I think OpenStack-Ansible supported that or planned to support that | 12:19 |
jheikkin | I saw that the nspawn containers git repo under openstack-ansible was archived about 3 years ago, though | 12:23 |
mnasiadka | so now there's your answer ;-) | 12:25 |
jheikkin | Thanks for the answers! | 12:45 |
opendevreview | Merged openstack/kolla stable/2023.1: toolbox: Improve retry loop for ansible-galaxy https://review.opendev.org/c/openstack/kolla/+/910182 | 12:54 |
opendevreview | Merged openstack/kolla stable/zed: toolbox: Improve retry loop for ansible-galaxy https://review.opendev.org/c/openstack/kolla/+/910183 | 12:54 |
opendevreview | Verification of a change to openstack/kayobe master failed: Make hooks environment-aware https://review.opendev.org/c/openstack/kayobe/+/904141 | 13:10 |
kevko | mnasiadka, SvenKieske: I was wondering if we can implement a rabbitmq blue-green upgrade ... federated rabbitmq | 13:30 |
mnasiadka | switching OpenStack applications to a second cluster? that's going to be fun | 13:44 |
SvenKieske | I actually also had the same idea some weeks ago, but yeah, it would be quite some work, but maybe also more robust. you could write some elaborate script to use the "shovel" plugin to migrate messages from queue to queue. afaik I even found an online example where some other deployment tooling did that | 14:06 |
SvenKieske | but that was written in dotNET IIRC :D | 14:07 |
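(A hypothetical example of the shovel idea: a dynamic shovel that drains one queue into another cluster and stops when the backlog is moved; the shovel name, queue names and URIs are all invented.)

```
# Requires the rabbitmq_shovel plugin; all names/URIs are invented.
# "src-delete-after": "queue-length" makes the shovel stop after
# moving the messages that were in the queue when it started.
rabbitmqctl set_parameter shovel migrate-notifications \
  '{"src-uri": "amqp://old-cluster", "src-queue": "notifications",
    "dest-uri": "amqp://new-cluster", "dest-queue": "notifications",
    "src-delete-after": "queue-length"}'
```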
rohit02 | @Team For Kolla Bobcat (2023.2) with external Ceph, is there any Ceph version dependency for cinder-volume? We are trying to integrate Ceph Nautilus and cinder-volume throws an exception | 15:18 |
kevko | SvenKieske: do you have a link ? | 16:13 |
SvenKieske | for the dotNet blue/green stuff? let me check my history.. | 16:13 |
SvenKieske | ah was not dotnet, but I guess the language doesn't really matter: https://github.com/Particular/NServiceBus.RabbitMQ/blob/master/src/NServiceBus.Transport.RabbitMQ.CommandLine/Commands/Queue/QueueMigrateCommand.cs | 16:14 |
SvenKieske | I stumbled upon it, because they have actual docs around queue migration: https://docs.particular.net/transports/rabbitmq/operations-scripting#queue-migrate-to-quorum | 16:15 |
SvenKieske | rereading it I'm not sure it's connected to blue-green deployments, just skimmed it | 16:16 |
greatgatsby__ | Hello. Can I get some guidance on the neutron_l3_agent_failover_delay value? If I use the `40 + 3n` method, where n is the number of routers, and I have 100 projects each with a router, then my delay would be 340? | 16:17 |
greatgatsby__ | the docs also mention to "time how long an outage lasts" but I'm not sure what this means | 16:18 |
SvenKieske | greatgatsby: well as the docs at https://docs.openstack.org/kolla-ansible/latest/reference/networking/neutron.html#l3-agent-high-availability say (it could maybe be worded more clearly), during an l3 agent restart the virtual routers need to be recreated; this can cause an outage, depending on the exact scenario | 16:22 |
SvenKieske | the delay is there to make the failover from agent 1 to agent 2 configurable; so kolla-ansible waits for that delay before restarting (iirc) | 16:23 |
SvenKieske | in theory that should allow all virtual routers to be migrated | 16:23 |
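(Applied to greatgatsby__'s numbers, the rule of thumb gives 40 + 3 * 100 = 340 seconds; a minimal globals.yml sketch:)

```
# globals.yml: 40 s base + 3 s per router, with ~100 routers
neutron_l3_agent_failover_delay: 340
```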
greatgatsby__ | SvenKieske: for sure, we actually implemented our own in yoga and are now backporting the solution from 2023.1. The only thing we're struggling with now is determining the value we should use for the failover delay. | 16:23 |
SvenKieske | however it might be difficult to calculate the exact number of seconds for the failover, because it depends on the recreation duration per router and the number of routers, which might not be static | 16:24 |
SvenKieske | greatgatsby: I believe I already linked you to a different solution in the past? this might have some advantages (no need to calculate the downtime): https://review.opendev.org/c/openstack/kolla-ansible/+/874769 | 16:25 |
SvenKieske | you might also be interested in this throttled restart rewrite of the l3 agent handler: https://review.opendev.org/c/openstack/kolla-ansible/+/904134 (also not merged) | 16:26 |
greatgatsby__ | SvenKieske: thanks for that link. Since that is still a WIP PR we've decided to go with the current solution in 2023.1, for better or worse. | 16:27 |
SvenKieske | yeah, I think we already talked about it. it's wip, but I personally happen to know that it runs in production :D | 16:27 |
SvenKieske | for upstream the code is a little bit unclean, no proper code module etc. | 16:28 |
SvenKieske | nevertheless, yeah, you need to time how long your environment needs to create a router and then extrapolate that | 16:28 |
greatgatsby__ | do you anticipate the 2023.1 solution changing to this for the next KA release? | 16:29 |
SvenKieske | mhm I don't know, mnasiadka: someone at your company wanted to replace the current solution with a proper solution for neutron_l3_agent graceful restarts by splitting the processes up into more containers | 16:30 |
SvenKieske | I don't know the status of that or when it is ready, or if it even started, maybe I'm also misremembering the company there, but someone wanted to work on it | 16:30 |
spatel | Kolla has a default DHCP-agents-per-network setting of 2, but when I created a network it's showing only a single agent running. am i missing something? | 16:44 |
spatel | https://paste.opendev.org/show/bvdPRCGlA4tUIFTUCgMh/ | 16:44 |
mnasiadka | SvenKieske: it's work in progress, we'll put more effort into this in coming weeks | 16:46 |
spatel | I didn't set enable_neutron_agent_ha | 16:46 |
greatgatsby__ | SvenKieske, mnasiadka: thanks for the feedback and dev effort | 16:48 |
spatel | Can I enable enable_neutron_agent_ha: "yes" and re-run deploy? | 16:52 |
spatel | is it going to make the dhcp-agent HA? | 16:52 |
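(What spatel ends up doing, as a globals.yml sketch; afaik this flag is what raises kolla-ansible's dhcp_agents_per_network above 1:)

```
# globals.yml; afaik this drives dhcp_agents_per_network up from 1
enable_neutron_agent_ha: "yes"
```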
spatel | Oh yes.. it did | 16:53 |
-opendevstatus- NOTICE: Gerrit on review.opendev.org will be restarted to perform a minor upgrade to the service. | 22:34 |