opendevreview | Merged openstack/kolla-ansible stable/2023.1: Add precheck for RabbitMQ quorum queues https://review.opendev.org/c/openstack/kolla-ansible/+/909967 | 02:47 |
---|---|---|
mnasiadka | kevko, frickler: Would be good to get new RMQ version in: https://review.opendev.org/c/openstack/kolla/+/911093 and https://review.opendev.org/c/openstack/kolla-ansible/+/911094 ;-) | 06:49 |
opendevreview | Michal Nasiadka proposed openstack/kolla master: WIP: Switch to Ubuntu 24.04 https://review.opendev.org/c/openstack/kolla/+/907589 | 06:54 |
opendevreview | Michal Nasiadka proposed openstack/kolla master: WIP: Add support for rpm to repos.yaml https://review.opendev.org/c/openstack/kolla/+/909879 | 06:55 |
opendevreview | Rafal Lewandowski proposed openstack/kayobe master: Add Redfish rules to Ironic and Bifrost introspection https://review.opendev.org/c/openstack/kayobe/+/902772 | 08:09 |
SvenKieske | o/ | 08:56 |
SvenKieske | kevko: I also like prechecks against my own stupidity :) | 08:57 |
kevko | mnasiadka: done | 09:03 |
opendevreview | Verification of a change to openstack/kolla-ansible master failed: rabbitmq: Add 3.12 feature flags (for upgrade to 3.13) https://review.opendev.org/c/openstack/kolla-ansible/+/911094 | 09:07 |
frickler | kevko: mnasiadka: ^^ pulled the brake on that one, not sure if we need to stop the kolla change, too (and the dependency might need to be the other way round?) | 09:09 |
opendevreview | Michal Arbet proposed openstack/kolla-ansible master: Fix images pull in ovs-dpdk role https://review.opendev.org/c/openstack/kolla-ansible/+/899626 | 09:11 |
kevko | frickler: hmmm, ah yeah, good catch ... do we have only two multinode jobs? | 09:17 |
frickler | kevko: no, we also have cephadm jobs, still waiting on results for those | 09:18 |
kevko | frickler: Oops, sorry, that's my mistake. I didn't even notice that the CI was running. I saw it was verified +1, and I also remember that I did that rebase yesterday, and it was okay. | 09:22 |
kevko | frickler: thanks | 09:22 |
mnasiadka | frickler: have no clue if that would error or not - if yes, then probably only on slurp jobs | 09:52 |
mnasiadka | and those two feature flags are in standard set, but I don't know if those are required | 09:53 |
mnasiadka | frickler: https://rabbitmq-website.pages.dev/docs/feature-flags#core-feature-flags - no, these two new are not required | 09:53 |
opendevreview | Michal Nasiadka proposed openstack/kolla master: WIP: Add support for rpm to repos.yaml https://review.opendev.org/c/openstack/kolla/+/909879 | 10:30 |
opendevreview | Martin Hiner proposed openstack/kolla-ansible master: Fix incorrect condition in kolla_container_facts https://review.opendev.org/c/openstack/kolla-ansible/+/912521 | 10:40 |
opendevreview | Verification of a change to openstack/kolla master failed: Bump rabbitmq to 3.13 https://review.opendev.org/c/openstack/kolla/+/911093 | 10:43 |
frickler | mnasiadka: rmq stopped but not starting again on upgrade ^^ I think I've seen a similar report earlier | 10:55 |
mnasiadka | that's interesting | 10:56 |
mnasiadka | recreate_or_restart_container should probably do some post-check | 10:56 |
mnasiadka | especially that we use systemd now | 11:00 |
opendevreview | Matúš Jenča proposed openstack/kolla-ansible master: Implement TLS for Redis https://review.opendev.org/c/openstack/kolla-ansible/+/909188 | 11:02 |
kevko | frickler: we were upgrading rabbitmq 3 weeks ago and rabbitmq didn't start | 11:15 |
kevko | frickler: we needed to force start | 11:15 |
kevko | 2024-02-23 21:56:32.297 [info] <0.274.0> Waiting for Mnesia tables for 60000 ms, 9 retries left | 11:21 |
kevko | 2024-02-23 21:57:32.298 [warning] <0.274.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit@controller0,rabbit@controller1,rabbit@controller2],[rabbit_user_permission,rabbit_semi_durable_route,rabbit_topic_trie_edge,rabbit_queue]} | 11:22 |
kevko | we needed to force boot the node .... | 11:23 |
kevko | my theory is that we are using pause_minority ... and clouds with a huge load, as we have, can fail the rabbitmq upgrade phase because of this ... i think we should switch to autoheal during the upgrade and, after the upgrade is done, restart again to set pause_minority back ... | 11:25 |
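For context, the partition handling strategy being discussed is set via `cluster_partition_handling` in rabbitmq.conf. An illustrative fragment only - kolla-ansible templates this file itself, so the real change would go through the role's templates:

```ini
# rabbitmq.conf - cluster partition handling strategies
# pause_minority: nodes in the minority partition pause themselves
cluster_partition_handling = pause_minority

# autoheal: the cluster picks a winning partition and restarts the losers
# cluster_partition_handling = autoheal
```

The "force start" mentioned above is typically `rabbitmqctl force_boot`, which tells a node to boot without waiting for its peers' Mnesia tables.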
mnasiadka | well, in CI RMQ is never stopped, but maybe we just need handling for waiting for container to be really stopped | 11:26 |
mnasiadka | (and started) | 11:26 |
kevko | mnasiadka: during upgrade ? | 11:29 |
mnasiadka | yes - see https://85e97ec41df2e8196f68-ad9677b8f3b079990c4951ac9cbbd797.ssl.cf1.rackcdn.com/911093/4/gate/kolla-ansible-debian-upgrade/2f81bbb/primary/logs/ansible/reconfigure-rabbitmq | 11:29 |
mnasiadka | wondering if it shows up again | 11:29 |
mnasiadka | (did a recheck) | 11:29 |
mnasiadka | and it's only Debian for now | 11:30 |
kevko | mnasiadka: this is not issue i've seen i think ... | 11:34 |
kevko | mnasiadka: and this is only one node rabbit | 11:34 |
mnasiadka | yup | 11:34 |
kevko | mnasiadka: what i've seen is this | 11:42 |
kevko | mnasiadka: https://paste.openstack.org/show/bgZwGqWjAtzgjOxICj3H/ | 11:42 |
kevko | mnasiadka: upgrade was from xena -> yoga .. and if I am correct, during the upgrade two nodes are stopped and then restarted ... and that's the thing: pause_minority will break the start of the node which lost two neighbours | 11:43 |
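The failure mode described here can be sketched as a toy model of the pause_minority rule (simplified - real RabbitMQ also tracks cluster membership and timing, but the majority arithmetic is the point):

```python
def pauses(cluster_size: int, visible_nodes: int) -> bool:
    """Under pause_minority, a node pauses itself unless it can see a
    strict majority of the cluster (counting itself)."""
    return visible_nodes <= cluster_size // 2

# 3-node cluster: a node that lost both neighbours sees only itself,
# so it pauses instead of booting
print(pauses(cluster_size=3, visible_nodes=1))  # pauses
print(pauses(cluster_size=3, visible_nodes=2))  # majority visible, stays up
```

This is why restarting two out of three nodes at once during an upgrade can leave the remaining node paused rather than serving.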
kevko | mnasiadka: i haven't investigated master yet | 11:44 |
kevko | because there is a different code | 11:44 |
kevko | anybody to approve https://review.opendev.org/c/openstack/kolla-ansible/+/899626 ? trivial and bug fixing | 12:04 |
mnasiadka | I'm still amazed that anybody uses it and it works | 12:10 |
kevko | mnasiadka: haha, yeah ... I was asked to push it a little bit as we have a customer who is really using it :) | 12:10 |
kevko | mnasiadka: and I've already verified onsite that it's working ... | 12:11 |
kevko | mnasiadka: thanks | 12:11 |
mnasiadka | still don't understand why it's a separate role - and we don't even publish those images I think | 12:11 |
mnasiadka | ah, we do for debian/ubuntu | 12:11 |
kevko | mnasiadka: if it is in kolla repo ..it's buildable - so it's supported :) | 12:15 |
opendevreview | Verification of a change to openstack/kolla master failed: Bump rabbitmq to 3.13 https://review.opendev.org/c/openstack/kolla/+/911093 | 12:39 |
mnasiadka | and failed again, nice | 12:57 |
mnasiadka | what is it with debian :) | 12:58 |
mnasiadka | hmm, weird, it's some old notification | 12:59 |
frickler | mnasiadka: the notification is repeated when the arm64 result comes in since that doesn't change the V-2 state | 13:08 |
mnasiadka | frickler: a bit misleading, but whatever | 13:09 |
mnasiadka | well, the bot got tired of my complaints | 13:09 |
opendevreview | Merged openstack/kolla-ansible stable/2023.1: Rework quorum queues precheck https://review.opendev.org/c/openstack/kolla-ansible/+/909968 | 13:14 |
opendevreview | Verification of a change to openstack/kolla-ansible master failed: Fix images pull in ovs-dpdk role https://review.opendev.org/c/openstack/kolla-ansible/+/899626 | 13:43 |
opendevreview | Michal Nasiadka proposed openstack/kolla master: WIP: Add support for rpm to repos.yaml https://review.opendev.org/c/openstack/kolla/+/909879 | 13:45 |
mnasiadka | frickler: I think after tooz pin the cephadm multinode jobs are so stable that I'm tempted to change them to voting and gate on them ;-) | 13:51 |
SvenKieske | that would actually be very nice if it worked. | 13:52 |
frickler | mnasiadka: yes, let's do that | 13:55 |
opendevreview | Michal Nasiadka proposed openstack/kolla-ansible master: CI: Change cephadm jobs to voting and add them to gating https://review.opendev.org/c/openstack/kolla-ansible/+/912585 | 13:59 |
mnasiadka | let's see | 13:59 |
SvenKieske | kevko: (anybody interested in rmq really) maybe have a look at these new rmq tuning parameters: https://review.opendev.org/c/openstack/kolla-ansible/+/900528 | 14:03 |
SvenKieske | looking at the multinode jobs: did I miss it or do we not have any multinode jobs on debian? | 14:05 |
SvenKieske | https://review.opendev.org/c/openstack/kolla-ansible/+/899626?tab=change-view-tab-header-zuul-results-summary fails also on debian upgrade in gate pipeline :( | 14:13 |
SvenKieske | the same rmq error as above | 14:15 |
mnasiadka | SvenKieske: not the same | 14:16 |
mnasiadka | in the rmq 3.13 case it was not existing container | 14:16 |
mnasiadka | here wait is timing out | 14:16 |
mnasiadka | wonder why only on Debian | 14:16 |
SvenKieske | ah right, sorry, don't know how I mixed that up | 14:17 |
mnasiadka | and it's deploy phase | 14:17 |
SvenKieske | yeah, it's right after restart of rmq container, about which you talked somewhere up there^^ | 14:18 |
SvenKieske | [11:55] <frickler> mnasiadka: rmq stopped but not starting again on upgrade ^^ I think I've seen a similar report earlier <- this one | 14:18 |
SvenKieske | ah, but that was multinode ipv6 ubuntu iiuc | 14:20 |
SvenKieske | but different error, yes. "container is not running" vs "Waiting for pid..." | 14:22 |
mnasiadka | SvenKieske: standard timeout is 10 seconds, from the log it seems rabbitmq came up in something like 15 seconds | 14:25 |
Lockesmith | I'm generating certificates using certbot. I have figured out what I need to piece together to get internal and external traffic working with tls, but I can't figure out exactly what I need to get backend tls working. Can I even use a certbot cert for that? | 14:29 |
opendevreview | Michal Nasiadka proposed openstack/kolla-ansible master: rabbitmq: bump wait timeout to 60 seconds https://review.opendev.org/c/openstack/kolla-ansible/+/912586 | 14:30 |
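The timeout being bumped here is essentially a bounded poll. A generic sketch of that pattern (names hypothetical, not kolla-ansible's actual code):

```python
import time

def wait_for(check, timeout=10.0, interval=0.5):
    """Poll check() until it returns True or timeout seconds elapse.
    With a 10 s timeout, a service that takes ~15 s to come up (as in
    the CI log discussed above) is always reported as failed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False
```

Bumping `timeout` to 60 gives slow nodes headroom without changing the happy path, since the loop returns as soon as the check passes.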
SvenKieske | Lockesmith: did you read https://docs.openstack.org/kolla-ansible/latest/admin/tls.html#back-end-tls-configuration ? is there anything missing there that leaves your question unanswered? | 14:46 |
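For reference, the TLS layers in kolla-ansible are toggled in globals.yml roughly like this (option names per the linked guide; defaults and certificate paths can differ between releases, so check the guide for yours):

```yaml
# globals.yml - TLS layers in kolla-ansible
kolla_enable_tls_external: "yes"   # TLS on the external VIP
kolla_enable_tls_internal: "yes"   # TLS on the internal VIP
kolla_enable_tls_backend: "yes"    # TLS between haproxy and the backends
```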
opendevreview | Michal Nasiadka proposed openstack/kolla-ansible master: rabbitmq: bump wait timeout to 60 seconds https://review.opendev.org/c/openstack/kolla-ansible/+/912586 | 14:48 |
Lockesmith | SvenKieske: I did. And everything looks good up until applying to keystone at which point the action `os_keystone_service` fails with a 503. | 15:17 |
Lockesmith | It fails at `TASK [service-ks-register : keystone | Creating services]` | 15:17 |
SvenKieske | which openstack version do you run and do you happen to build your own containers or do you consume a stable git branch, or do you install packages from pypi? | 15:18 |
mhiner | Hello, can you please review this small fix: https://review.opendev.org/c/openstack/kolla-ansible/+/912521 | 15:20 |
opendevreview | Merged openstack/kolla master: Bump rabbitmq to 3.13 https://review.opendev.org/c/openstack/kolla/+/911093 | 15:58 |
opendevreview | Merged openstack/kolla-ansible stable/2023.1: RabbitMQ: correct docs on Quorum Queue migrations https://review.opendev.org/c/openstack/kolla-ansible/+/909969 | 15:58 |
Lockesmith | SvenKieske: I install it from pypi and use the master branch | 16:14 |
SvenKieske | Lockesmith: can you provide the full trace of that task failure? e.g. via http://paste.openstack.org ? | 16:20 |
opendevreview | Matt Crees proposed openstack/kolla-ansible master: CI: Only migrate RMQ queues during SLURP https://review.opendev.org/c/openstack/kolla-ansible/+/909971 | 16:21 |
Lockesmith | SvenKieske: absolutely! https://paste.opendev.org/show/bZAZUOcTN2RnOjKegeOk/ | 16:44 |
SvenKieske | Lockesmith: well the error seems to be that your provided auth_url is either incorrect, or if it is correct, which I guess it is, there's a problem with https://openstack.coldforge.xyz:5000 (which is the default keystone auth port). it returns a HTTP 503 service unavailable | 16:47 |
SvenKieske | possibly the keystone logs have more information what has gone wrong there :) | 16:48 |
SvenKieske | so this error seems, at first sight, unrelated to your actual task, but it's crucial that authentication is working. you might also want to check your keystone backend, if there is anything externally configured, like LDAP, active directory, etc. | 16:51 |
Lockesmith | Sorry, my brain is all over the place. I'll grab those too for you. | 16:52 |
Lockesmith | SvenKieske: I'll paste the logs if you'd like still, but I cleared them and ran the reconfigure again and there's not an error in them nor the docker logs. | 17:53 |
SvenKieske | Lockesmith: if it works now it was most likely a spurious network error; might be worth tracking it down yourself, but I doubt it's a bug in the software, rather something in your setup/hardware. :) | 17:56 |
Lockesmith | SvenKieske: It's not working, I just can't find any error in the logs outside of the one I sent earlier. | 17:57 |
Lockesmith | Though I'm not really sure which servers to check other than keystone and haproxy | 17:57 |
Lockesmith | services* | 17:57 |
Lockesmith | Do I need to copy the letsencrypt ca into my containers? | 18:20 |
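On the CA question: the kolla-ansible TLS guide documents options for distributing a CA bundle into the containers, along these lines (illustrative fragment; the `openstack_cacert` path shown is the Debian/Ubuntu system bundle and may differ on your hosts):

```yaml
# globals.yml - trust a custom or Let's Encrypt CA inside the containers
kolla_copy_ca_into_containers: "yes"
openstack_cacert: "/etc/ssl/certs/ca-certificates.crt"
```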
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!