opendevreview | Merged openstack/kolla-ansible stable/2023.1: Add precheck for RabbitMQ quorum queues https://review.opendev.org/c/openstack/kolla-ansible/+/909967 | 02:47 |
---|---|---|
mnasiadka | kevko, frickler: Would be good to get new RMQ version in: https://review.opendev.org/c/openstack/kolla/+/911093 and https://review.opendev.org/c/openstack/kolla-ansible/+/911094 ;-) | 06:49 |
opendevreview | Michal Nasiadka proposed openstack/kolla master: WIP: Switch to Ubuntu 24.04 https://review.opendev.org/c/openstack/kolla/+/907589 | 06:54 |
opendevreview | Michal Nasiadka proposed openstack/kolla master: WIP: Add support for rpm to repos.yaml https://review.opendev.org/c/openstack/kolla/+/909879 | 06:55 |
opendevreview | Rafal Lewandowski proposed openstack/kayobe master: Add Redfish rules to Ironic and Bifrost introspection https://review.opendev.org/c/openstack/kayobe/+/902772 | 08:09 |
SvenKieske | o/ | 08:56 |
SvenKieske | kevko: I also like prechecks against my own stupidity :) | 08:57 |
kevko | mnasiadka: done | 09:03 |
opendevreview | Verification of a change to openstack/kolla-ansible master failed: rabbitmq: Add 3.12 feature flags (for upgrade to 3.13) https://review.opendev.org/c/openstack/kolla-ansible/+/911094 | 09:07 |
frickler | kevko: mnasiadka: ^^ pulled the brake on that one, not sure if we need to stop the kolla change, too (and the dependency might need to be the other way round?) | 09:09 |
opendevreview | Michal Arbet proposed openstack/kolla-ansible master: Fix images pull in ovs-dpdk role https://review.opendev.org/c/openstack/kolla-ansible/+/899626 | 09:11 |
kevko | frickler: hmmm, ah yeah, good catch ... do we have only two multinode jobs? | 09:17 |
frickler | kevko: no, we also have cephadm jobs, still waiting on results for those | 09:18 |
kevko | frickler: Oops, sorry, that's my mistake. I didn't even notice that the CI was running. I saw it was verified +1, and I also remember that I did that rebase yesterday, and it was okay. | 09:22 |
kevko | frickler: thanks | 09:22 |
mnasiadka | frickler: have no clue if that would error or not - if yes, then probably only on slurp jobs | 09:52 |
mnasiadka | and those two feature flags are in standard set, but I don't know if those are required | 09:53 |
mnasiadka | frickler: https://rabbitmq-website.pages.dev/docs/feature-flags#core-feature-flags - no, these two new are not required | 09:53 |
opendevreview | Michal Nasiadka proposed openstack/kolla master: WIP: Add support for rpm to repos.yaml https://review.opendev.org/c/openstack/kolla/+/909879 | 10:30 |
opendevreview | Martin Hiner proposed openstack/kolla-ansible master: Fix incorrect condition in kolla_container_facts https://review.opendev.org/c/openstack/kolla-ansible/+/912521 | 10:40 |
opendevreview | Verification of a change to openstack/kolla master failed: Bump rabbitmq to 3.13 https://review.opendev.org/c/openstack/kolla/+/911093 | 10:43 |
frickler | mnasiadka: rmq stopped but not starting again on upgrade ^^ I think I've seen a similar report earlier | 10:55 |
mnasiadka | that's interesting | 10:56 |
mnasiadka | recreate_or_restart_container should probably do some post-check | 10:56 |
mnasiadka | especially that we use systemd now | 11:00 |
opendevreview | Matúš Jenča proposed openstack/kolla-ansible master: Implement TLS for Redis https://review.opendev.org/c/openstack/kolla-ansible/+/909188 | 11:02 |
kevko | frickler: we were upgrading rabbitmq 3 weeks ago and rabbitmq didn't start | 11:15 |
kevko | frickler: we needed to force start | 11:15 |
kevko | 2024-02-23 21:56:32.297 [info] <0.274.0> Waiting for Mnesia tables for 60000 ms, 9 retries left | 11:21 |
kevko | 2024-02-23 21:57:32.298 [warning] <0.274.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit@controller0,rabbit@controller1,rabbit@controller2],[rabbit_user_permission,rabbit_semi_durable_route,rabbit_topic_trie_edge,rabbit_queue]} | 11:22 |
kevko | we needed to force boot the node .... | 11:23 |
kevko | my theory is that we are using pause_minority ... and clouds with a huge load, as we have, can fail the rabbitmq upgrade phase because of this ... i think we should switch to autoheal during the upgrade and, after the upgrade is done, restart again to set pause_minority back ... | 11:25 |
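For context, the partition handling strategy being discussed is set via `cluster_partition_handling` in rabbitmq.conf. An illustrative fragment only - kolla-ansible templates this file itself, so the real change would go through the role's templates:

```ini
# rabbitmq.conf - cluster partition handling strategies
# pause_minority: nodes in the minority partition pause themselves
cluster_partition_handling = pause_minority

# autoheal: the cluster picks a winning partition and restarts the losers
# cluster_partition_handling = autoheal
```

The "force start" mentioned above is typically `rabbitmqctl force_boot`, which tells a node to boot without waiting for its peers' Mnesia tables.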
mnasiadka | well, in CI RMQ is never stopped, but maybe we just need handling for waiting for container to be really stopped | 11:26 |
mnasiadka | (and started) | 11:26 |
kevko | mnasiadka: during upgrade ? | 11:29 |
mnasiadka | yes - see https://85e97ec41df2e8196f68-ad9677b8f3b079990c4951ac9cbbd797.ssl.cf1.rackcdn.com/911093/4/gate/kolla-ansible-debian-upgrade/2f81bbb/primary/logs/ansible/reconfigure-rabbitmq | 11:29 |
mnasiadka | wondering if it shows up again | 11:29 |
mnasiadka | (did a recheck) | 11:29 |
mnasiadka | and it's only Debian for now | 11:30 |
kevko | mnasiadka: this is not issue i've seen i think ... | 11:34 |
kevko | mnasiadka: and this is only one node rabbit | 11:34 |
mnasiadka | yup | 11:34 |
kevko | mnasiadka: what i've seen is this | 11:42 |
kevko | mnasiadka: https://paste.openstack.org/show/bgZwGqWjAtzgjOxICj3H/ | 11:42 |
kevko | mnasiadka: upgrade was from xena -> yoga .. and if I am correct, during the upgrade two nodes are stopped and then restarted ... and that's the thing: pause_minority will break the start of the node which lost two neighbours | 11:43 |
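The failure mode described here can be sketched as a toy model of the pause_minority rule (simplified - real RabbitMQ also tracks cluster membership and timing, but the majority arithmetic is the point):

```python
def pauses(cluster_size: int, visible_nodes: int) -> bool:
    """Under pause_minority, a node pauses itself unless it can see a
    strict majority of the cluster (counting itself)."""
    return visible_nodes <= cluster_size // 2

# 3-node cluster: a node that lost both neighbours sees only itself,
# so it pauses instead of booting
print(pauses(cluster_size=3, visible_nodes=1))  # pauses
print(pauses(cluster_size=3, visible_nodes=2))  # majority visible, stays up
```

This is why restarting two out of three nodes at once during an upgrade can leave the remaining node paused rather than serving.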
kevko | mnasiadka: i haven't investigated master yet | 11:44 |
kevko | because there is a different code | 11:44 |
kevko | anybody to approve https://review.opendev.org/c/openstack/kolla-ansible/+/899626 ? trivial and bug fixing | 12:04 |
mnasiadka | I'm still amazed that anybody uses it and it works | 12:10 |
kevko | mnasiadka: haha, yeah ... I was asked to push it a little bit as we have a customer who is really using it :) | 12:10 |
kevko | mnasiadka: and I've already verified onsite that it's working ... | 12:11 |
kevko | mnasiadka: thanks | 12:11 |
mnasiadka | still don't understand why it's a separate role - and we don't even publish those images I think | 12:11 |
mnasiadka | ah, we do for debian/ubuntu | 12:11 |
kevko | mnasiadka: if it is in kolla repo ..it's buildable - so it's supported :) | 12:15 |
opendevreview | Verification of a change to openstack/kolla master failed: Bump rabbitmq to 3.13 https://review.opendev.org/c/openstack/kolla/+/911093 | 12:39 |
mnasiadka | and failed again, nice | 12:57 |
mnasiadka | what is it with debian :) | 12:58 |
mnasiadka | hmm, weird, it's some old notification | 12:59 |
frickler | mnasiadka: the notification is repeated when the arm64 result comes in since that doesn't change the V-2 state | 13:08 |
mnasiadka | frickler: a bit misleading, but whatever | 13:09 |
mnasiadka | well, the bot got tired of my complaints | 13:09 |
opendevreview | Merged openstack/kolla-ansible stable/2023.1: Rework quorum queues precheck https://review.opendev.org/c/openstack/kolla-ansible/+/909968 | 13:14 |
opendevreview | Verification of a change to openstack/kolla-ansible master failed: Fix images pull in ovs-dpdk role https://review.opendev.org/c/openstack/kolla-ansible/+/899626 | 13:43 |
opendevreview | Michal Nasiadka proposed openstack/kolla master: WIP: Add support for rpm to repos.yaml https://review.opendev.org/c/openstack/kolla/+/909879 | 13:45 |
mnasiadka | frickler: I think after tooz pin the cephadm multinode jobs are so stable that I'm tempted to change them to voting and gate on them ;-) | 13:51 |
SvenKieske | that would actually be very nice if it worked. | 13:52 |
frickler | mnasiadka: yes, let's do that | 13:55 |
opendevreview | Michal Nasiadka proposed openstack/kolla-ansible master: CI: Change cephadm jobs to voting and add them to gating https://review.opendev.org/c/openstack/kolla-ansible/+/912585 | 13:59 |
mnasiadka | let's see | 13:59 |
SvenKieske | kevko: (anybody interested in rmq really) maybe have a look at these new rmq tuning parameters: https://review.opendev.org/c/openstack/kolla-ansible/+/900528 | 14:03 |
SvenKieske | looking at the multinode jobs: did I miss it or do we not have any multinode jobs on debian? | 14:05 |
SvenKieske | https://review.opendev.org/c/openstack/kolla-ansible/+/899626?tab=change-view-tab-header-zuul-results-summary fails also on debian upgrade in gate pipeline :( | 14:13 |
SvenKieske | the same rmq error as above | 14:15 |
mnasiadka | SvenKieske: not the same | 14:16 |
mnasiadka | in the rmq 3.13 case it was not existing container | 14:16 |
mnasiadka | here wait is timing out | 14:16 |
mnasiadka | wonder why only on Debian | 14:16 |
SvenKieske | ah right, sorry, don't know how I mixed that up | 14:17 |
mnasiadka | and it's deploy phase | 14:17 |
SvenKieske | yeah, it's right after restart of rmq container, about which you talked somewhere up there^^ | 14:18 |
SvenKieske | [11:55] <frickler> mnasiadka: rmq stopped but not starting again on upgrade ^^ I think I've seen a similar report earlier <- this one | 14:18 |
SvenKieske | ah, but that was multinode ipv6 ubuntu iiuc | 14:20 |
SvenKieske | but different error, yes. "container is not running" vs "Waiting for pid..." | 14:22 |
mnasiadka | SvenKieske: standard timeout is 10 seconds, from the log it seems rabbitmq came up in something like 15 seconds | 14:25 |
Lockesmith | I'm generating certificates using certbot. I have figured out what I need to piece together to get internal and external traffic working with tls, but I can't figure out exactly what I need to get backend tls working. Can I even use a certbot cert for that? | 14:29 |
opendevreview | Michal Nasiadka proposed openstack/kolla-ansible master: rabbitmq: bump wait timeout to 60 seconds https://review.opendev.org/c/openstack/kolla-ansible/+/912586 | 14:30 |
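The timeout being bumped here is essentially a bounded poll. A generic sketch of that pattern (names hypothetical, not kolla-ansible's actual code):

```python
import time

def wait_for(check, timeout=10.0, interval=0.5):
    """Poll check() until it returns True or timeout seconds elapse.
    With a 10 s timeout, a service that takes ~15 s to come up (as in
    the CI log discussed above) is always reported as failed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False
```

Bumping `timeout` to 60 gives slow nodes headroom without changing the happy path, since the loop returns as soon as the check passes.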
SvenKieske | Lockesmith: did you read https://docs.openstack.org/kolla-ansible/latest/admin/tls.html#back-end-tls-configuration ? is there anything missing there that leaves your question unanswered? | 14:46 |
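For reference, the TLS layers in kolla-ansible are toggled in globals.yml roughly like this (option names per the linked guide; defaults and certificate paths can differ between releases, so check the guide for yours):

```yaml
# globals.yml - TLS layers in kolla-ansible
kolla_enable_tls_external: "yes"   # TLS on the external VIP
kolla_enable_tls_internal: "yes"   # TLS on the internal VIP
kolla_enable_tls_backend: "yes"    # TLS between haproxy and the backends
```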
opendevreview | Michal Nasiadka proposed openstack/kolla-ansible master: rabbitmq: bump wait timeout to 60 seconds https://review.opendev.org/c/openstack/kolla-ansible/+/912586 | 14:48 |
Lockesmith | SvenKieske: I did. And everything looks good up until applying to keystone at which point the action `os_keystone_service` fails with a 503. | 15:17 |
Lockesmith | It fails at `TASK [service-ks-register : keystone | Creating services]` | 15:17 |
SvenKieske | which openstack version do you run and do you happen to build your own containers or do you consume a stable git branch, or do you install packages from pypi? | 15:18 |
mhiner | Hello, can you please review this small fix: https://review.opendev.org/c/openstack/kolla-ansible/+/912521 | 15:20 |
opendevreview | Merged openstack/kolla master: Bump rabbitmq to 3.13 https://review.opendev.org/c/openstack/kolla/+/911093 | 15:58 |
opendevreview | Merged openstack/kolla-ansible stable/2023.1: RabbitMQ: correct docs on Quorum Queue migrations https://review.opendev.org/c/openstack/kolla-ansible/+/909969 | 15:58 |
Lockesmith | SvenKieske: I install it from pypi and use the master branch | 16:14 |
SvenKieske | Lockesmith: can you provide the full trace of that task failure? e.g. via http://paste.openstack.org ? | 16:20 |
opendevreview | Matt Crees proposed openstack/kolla-ansible master: CI: Only migrate RMQ queues during SLURP https://review.opendev.org/c/openstack/kolla-ansible/+/909971 | 16:21 |
Lockesmith | SvenKieske: absolutely! https://paste.opendev.org/show/bZAZUOcTN2RnOjKegeOk/ | 16:44 |
SvenKieske | Lockesmith: well the error seems to be that your provided auth_url is either incorrect, or if it is correct, which I guess it is, there's a problem with https://openstack.coldforge.xyz:5000 (which is the default keystone auth port). it returns a HTTP 503 service unavailable | 16:47 |
SvenKieske | possibly the keystone logs have more information what has gone wrong there :) | 16:48 |
SvenKieske | so this error seems, at first sight, unrelated to your actual task, but it's crucial that authentication is working. you might also want to check your keystone backend, if there is anything externally configured, like LDAP, active directory, etc. | 16:51 |
Lockesmith | Sorry, my brain is all over the place. I'll grab those too for you. | 16:52 |
Lockesmith | SvenKieske: I'll paste the logs if you'd like still, but I cleared them and ran the reconfigure again and there's not an error in them nor the docker logs. | 17:53 |
SvenKieske | Lockesmith: if it works now it was most likely a spurious network error; might be worth tracking it down yourself, but I doubt it's a bug in the software, rather something in your setup/hardware. :) | 17:56 |
Lockesmith | SvenKieske: It's not working, I just can't find any error in the logs outside of the one I sent earlier. | 17:57 |
Lockesmith | Though I'm not really sure which servers to check other than keystone and haproxy | 17:57 |
Lockesmith | services* | 17:57 |
Lockesmith | Do I need to copy the letsencrypt ca into my containers? | 18:20 |
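On the CA question: the kolla-ansible TLS guide documents options for distributing a CA bundle into the containers, along these lines (illustrative fragment; the `openstack_cacert` path shown is the Debian/Ubuntu system bundle and may differ on your hosts):

```yaml
# globals.yml - trust a custom or Let's Encrypt CA inside the containers
kolla_copy_ca_into_containers: "yes"
openstack_cacert: "/etc/ssl/certs/ca-certificates.crt"
```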
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!