Tuesday, 2024-01-23

[03:37] <opendevreview> Merged openstack/kolla-ansible master: Bump ansible-lint version https://review.opendev.org/c/openstack/kolla-ansible/+/906249
[05:38] <opendevreview> Michal Arbet proposed openstack/kolla master: Rework horizon image to support local_settings.d https://review.opendev.org/c/openstack/kolla/+/906339
[06:49] <opendevreview> Michal Arbet proposed openstack/kolla-ansible master: [CI] Fix podman cross-dependency build https://review.opendev.org/c/openstack/kolla-ansible/+/906229
[06:50] <opendevreview> Michal Arbet proposed openstack/kolla-ansible master: Rework horizon role to support local_settings.d https://review.opendev.org/c/openstack/kolla-ansible/+/906347
[07:45] <opendevreview> Michal Arbet proposed openstack/kolla master: Rework horizon image to support local_settings.d https://review.opendev.org/c/openstack/kolla/+/906339
[07:45] <opendevreview> Michal Arbet proposed openstack/kolla master: Fix gnocchi and skyline after requirements change https://review.opendev.org/c/openstack/kolla/+/906349
[08:02] <opendevreview> Tadas proposed openstack/kolla master: add: cross-compile support https://review.opendev.org/c/openstack/kolla/+/889139
[08:03] <kevko> frickler: https://review.opendev.org/q/topic:%22requirements-change%22
[08:05] <kevko> frickler: and this is fix for depends-on gate fix we should merge asap also https://review.opendev.org/q/topic:%22depends-on-fix%22
[08:06] <frickler> sadly I cannot log in to gerrit currently, working on resolving that first
[08:06] <kevko> frickler: haha, new laptop or what ?
[08:08] <frickler> no, infra issue. if you are still logged in, try to keep it that way
[08:08] <kevko> frickler: oh, good advice, thanks
[08:09] <sorin-mihai> o/
[08:09] <kevko> sorin-mihai: \o
[08:11] <sorin-mihai> i have a multinode hci deployment, ceph-ansible pacific and kolla-ansible yoga (upgraded from xena). planning to upgrade to zed and then to antelope, must keep all things alive without downtime
[08:12] <kevko> sorin-mihai: read upgrade docs, use --check --diff
[08:12] <sorin-mihai> main issue i have is that during _any_ kolla-ansible deploy, _all_ neutron get killed the wrong way and incurs 10-15 minutes timeout on everything related to IPs of the instances
[08:13] <kevko> sorin-mihai: use limit, migrate resources to another node ?
[08:15] <sorin-mihai> i fear it might be related to the rabbit queues because that is around the time when i see neutron flipping, but never got to properly isolate. now i have high available queues, and hoping the antelope quorum queues would make this more bearable?
[08:15] <sorin-mihai> kevko, you mean run deploy multiple times, limit one controller at a time?
[08:15] <kevko> sorin-mihai: i think i have a patch in yoga for queues
[08:17] <kevko> sorin-mihai: no, yoga is already OK
[08:17] <kevko> sorin-mihai: rabbitmq and erlang is OK ?
[08:17] <sorin-mihai> afaik, yes, only during deploy things get flipped all at once
[08:18] <kevko> sorin-mihai: i think upgrade neutron server at once is OK
[08:18] <kevko> sorin-mihai: BUT, you have to care other neutron services
[08:18] <kevko> sorin-mihai: and it's not easy with kolla-ansible :P
[08:18] <sorin-mihai> tbh, i would prefer to upgrade this cluster towards antelope before it goes into a heavier use
[08:19] <kevko> sorin-mihai: I think you can manually edit some handlers to not restart neutrons (except neutron-server) ...and once you have control plane upgrade with database migrations ...
[08:20] <kevko> sorin-mihai: then you can migrate routers ..etc..etc to another agent (node) ...and empty node you can upgrade again with all containers ...
[08:20] <kevko> sorin-mihai: somewhere I've already seen some patch to address this issue ...
[08:21] <sorin-mihai> you mean... kolla-ansible does _not_ deploy/restart containers in the right order?
[08:22] <kevko> sorin-mihai: no, but for example routers can oscillate during restarts of agent
[08:22] <sorin-mihai> the need to address these things manually outside of kolla-ansible seems like a major bug to me
[08:22] <kevko> sorin-mihai: feel free to fix it :)
[08:22] <kevko> sorin-mihai: some time ago for example we fixed this https://review.opendev.org/c/openstack/kolla-ansible/+/904515
[08:41] <sorin-mihai> kevko, so that is in zed and should be fixing my actual problem?
[08:43] <sorin-mihai> though, it looks like it was related to DVR in https://bugs.launchpad.net/kolla-ansible/+bug/2009884, but i don't use DVR, i think
[08:53] <kevko> sorin-mihai: i am really not completely sure ..i am migrating routers away from agent which i am going to upgrade
[08:54] <sorin-mihai> that's new to me. like migrating instances before a host reboot?
[08:54] -opendevstatus- NOTICE: all new logins to https://review.opendev.org are currently failing. investigation is ongoing, please be patient
[08:56] <kevko> sorin-mihai: i don't think it's only DVR ... normally if you have a router ..you have l3 agents which create routers on the hosts right ? ... active - passive -passive depends on config
[08:56] <kevko> sorin-mihai: they know who is active because of keepalived between them right ?
[08:57] <kevko> sorin-mihai: what do you think ? where the keepalived processes run in kolla deployment ? yes - in l3 agent container
[08:58] <sorin-mihai> hm, then i must have been wrong for a few years already... i have 3 control nodes, 3 dhcp agents for each subnet, one on each control node
[08:59] <kevko> kevko: so If I have hundreds, or thousands of neutron routers ... I am really afraid what will happen if i start to restart node1, node2, node3 ...because there will be hundreds or thousands packets just going from one node to another node ...garp packets ..arp table changes ...VIP jumping from host to host ...
[08:59] <kevko> sorin-mihai: i am not saying that is your case ...but we have 120 computes ... 10 network nodes ...thousands of routers with replica 3
[09:00] <kevko> sorin-mihai: so we upgrade control plane and then we are upgrading l3 agents but it takes days ... because we are doing it always in batches ...
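Editor's sketch, not part of the conversation: the batching kevko describes can be illustrated in a few lines of Python. The node names and the batch size are hypothetical. How each batch is then handed to the deployment tool (for example as the host limit kevko mentions earlier) is left out on purpose.

```python
from typing import Iterator, List, Sequence


def batches(nodes: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` nodes."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(nodes), size):
        yield list(nodes[start:start + size])


# Hypothetical inventory: ten network nodes, the count kevko mentions.
network_nodes = [f"network{i:02d}" for i in range(1, 11)]

for group in batches(network_nodes, 3):
    # Each group would be upgraded in its own run, with checks in between.
    print(",".join(group))
```

A small batch size keeps the number of routers failing over at any one time low, at the cost of a longer overall upgrade, which matches the "it takes days" remark.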
[09:00] <kevko> sorin-mihai: it's about your processes and operation :)
[09:00] <kevko> we have 1500 k8s clusters inside
[09:01] <sorin-mihai> yeah, for me the effects were obvious, but too small cluster... 15 minutes downtime planned was ok-ish, with some blasting incurred
[09:01] <kevko> sorin-mihai: 4291 regular vms, 938 amphoras, 112 computes, 21660VCPUs 60TB ram
[09:03] <kevko> sorin-mihai: so, if you want to be totally safe without any downtime ..you can check how many and what routers are active on node1 for example ...
[09:03] <sorin-mihai> you mean compute node, yes?
[09:03] <kevko> sorin-mihai: you will remove it from agent and readd ...so you actually move active to the other one ..
[09:04] <kevko> sorin-mihai: no computes, network nodes .. where l3 agents are
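Editor's sketch, not part of the conversation: the check kevko suggests (which routers are active on a given network node, and where a standby copy lives) can be illustrated as plain bookkeeping. The snapshot layout and all names below are hypothetical. In a real deployment the HA states would be read from Neutron, and the actual remove and re-add step is not shown.

```python
from typing import Dict, List

# Hypothetical snapshot: router name -> {network node: HA state}.
RouterStates = Dict[str, Dict[str, str]]


def active_routers_on(node: str, routers: RouterStates) -> List[str]:
    """Routers whose active instance currently runs on `node`."""
    return sorted(
        name for name, hosts in routers.items() if hosts.get(node) == "active"
    )


def standby_hosts(router: str, node: str, routers: RouterStates) -> List[str]:
    """Other nodes holding a standby instance of `router`."""
    return sorted(
        host
        for host, state in routers[router].items()
        if host != node and state == "standby"
    )


snapshot: RouterStates = {
    "router-a": {"node1": "active", "node2": "standby", "node3": "standby"},
    "router-b": {"node1": "standby", "node2": "active", "node3": "standby"},
    "router-c": {"node1": "active", "node2": "standby"},
}

for name in active_routers_on("node1", snapshot):
    # These are the routers to move off node1 before upgrading it.
    print(name, "->", standby_hosts(name, "node1", snapshot))
```

Once the list for a node is empty, restarting the agents on that node should no longer trigger a failover for any router.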
[09:04] <sorin-mihai> that'll be the control nodes in my case
[09:07] <kevko> sorin-mihai: let me show you
[09:07] <sorin-mihai> but in that case, since for each subnet i have 3 dhcp agents, i am guessing i should be good to just use limit and do one control/network node at a time?
[09:12] <kevko> sorin-mihai: hmm, now it looks like it works as expected :D
[09:16] <kevko> sorin-mihai: but still ..after some time router ip will jump to another server
[09:17] <kevko> sorin-mihai: https://paste.openstack.org/show/bDiNHZinku4lWT4qk75F/
[09:17] <kevko> sorin-mihai: after some time ... ip just disappeared and jump to another server
[09:25] <sorin-mihai> yeah, seems like expected behavior, but still unclear for me why with 3 agents and the router with HA enabled still causing the timeout. and not just any random time, but sometimes even 10 minutes or more
[09:26] <kevko> sorin-mihai: that's probably bug ...
[09:26] <kevko> sorin-mihai: you have to debug
[11:25] <opendevreview> Michal Arbet proposed openstack/kolla-ansible master: Rework horizon role to support local_settings.d https://review.opendev.org/c/openstack/kolla-ansible/+/906347
[12:33] <opendevreview> Michal Arbet proposed openstack/kolla-ansible master: Rework horizon role to support local_settings.d https://review.opendev.org/c/openstack/kolla-ansible/+/906347
[13:45] <opendevreview> Verification of a change to openstack/kolla master failed: Fix gnocchi and skyline after requirements change https://review.opendev.org/c/openstack/kolla/+/906349
[13:49] <kevko> frickler: i fixed horizon ..but i can't make tests work again ..it's passing on deploy but failing on upgrade ..because there is previous failed horizon which is not working ...
[13:49] <kevko> frickler: :(
[13:50] <kevko> frickler: we probably need to merge fix with turned off tests and then turn on by a patch...
[13:50] <kevko> frickler: most simpler solution i think ...and revert if needed ..but i can confirm that fixed locally
[13:58] <frickler> hmm, but reqs for 2023.2 should not have been updated. which fix exactly are you talking about?
[14:00] <kevko> frickler: 906347,3 currently on zuul status page
[14:01] <kevko> frickler: so I don't know what is going on ..or maybe I don't understand consequences between jobs ...
[14:03] <kevko> frickler: I removed tons of comments from horizon k-a role and leave what kolla set and rework container ...so in future local_settings lands from a code ... clean solution ..it's working ..but tests for upgrade not working and don't know why
[14:12] <frickler> kevko: the trouble is that horizon hides the traceback from import local_settings.py and just shows "No local_settings file found." instead
[14:13] <frickler> I had overridden that in my local testing to see the actual issues
[14:13] <frickler> https://opendev.org/openstack/horizon/src/branch/master/openstack_dashboard/settings.py#L243-L246
[14:31] <kevko> frickler: i reworked it ... it has to work
[14:31] <opendevreview> Michal Arbet proposed openstack/kolla-ansible master: Rework horizon role to support local_settings.d https://review.opendev.org/c/openstack/kolla-ansible/+/906347
[14:32] <kevko> frickler: I have a feeling that we are not testing the previous version -> master ...but master -> current tested patch
[14:33] <kevko> frickler: and what the .... is this
[14:34] <kevko> michalarbet@pixla:~/ultimum/git/upstream/kolla-ansible$ egrep -ri 'previous_release:|openstack_previous_release_name' | grep 2023
[14:34] <kevko> zuul.d/base.yaml:      previous_release: "2023.2"
[14:34] <kevko> ansible/group_vars/all.yml:openstack_previous_release_name: "2023.1"
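Editor's sketch, not part of the conversation: the comparison kevko makes by eye on the grep output can be scripted. The two input lines are copied from his paste. The function name and regular expression are illustrative, and whether the two values are actually meant to match is not settled in the discussion.

```python
import re

# The two lines kevko pasted from his grep run.
GREP_OUTPUT = '''\
zuul.d/base.yaml:      previous_release: "2023.2"
ansible/group_vars/all.yml:openstack_previous_release_name: "2023.1"
'''

# Matches grep-style lines of the form: path:key: "value"
PATTERN = re.compile(
    r'^(?P<path>[^:]+):\s*'
    r'(?P<key>previous_release|openstack_previous_release_name):\s*'
    r'"(?P<value>[^"]+)"'
)


def previous_releases(text):
    """Map each file path to the previous-release value found in it."""
    found = {}
    for line in text.splitlines():
        match = PATTERN.match(line)
        if match:
            found[match["path"]] = match["value"]
    return found


releases = previous_releases(GREP_OUTPUT)
consistent = len(set(releases.values())) <= 1
print(releases, "consistent" if consistent else "MISMATCH")
```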
[14:35] <frickler> that looks like an ooportunity for improvement ;)
[14:35] <frickler> like my spelling today :-/
[14:39] <kevko> frickler: i fell like on university when i needed to push my code and wait for hours for output ... and you didn't know what are the input values :D ... so you needed to write some good logging ...output was limited to 1KB :D
[14:39] <kevko> *feel
[14:54] <opendevreview> Verification of a change to openstack/kolla master failed: Fix gnocchi and skyline after requirements change https://review.opendev.org/c/openstack/kolla/+/906349
[16:06] *** mmalchuk_ is now known as mmalchuk
[16:37] <opendevreview> Will Szumski proposed openstack/kayobe master: WIP: Remove docker devicemapper support https://review.opendev.org/c/openstack/kayobe/+/906386
[17:25] <opendevreview> Merged openstack/kolla master: Fix gnocchi and skyline after requirements change https://review.opendev.org/c/openstack/kolla/+/906349
[18:22] *** osmanlicilegi is now known as Guest16
[19:36] -opendevstatus- NOTICE: The Gerrit service on review.opendev.org will be offline momentarily for a restart, in order to attempt to restore OpenID login functionality
[20:03] -opendevstatus- NOTICE: OpenID logins for the Gerrit WebUI on review.opendev.org should be working normally again since the recent service restart
[21:24] <opendevreview> Michal Arbet proposed openstack/kolla-ansible master: [DNM] Just test temporarily https://review.opendev.org/c/openstack/kolla-ansible/+/906423
[21:54] <opendevreview> Pierre Riteau proposed openstack/kolla-ansible master: Remove outdated comments in dev mode docs https://review.opendev.org/c/openstack/kolla-ansible/+/906426
[22:01] <opendevreview> Michal Arbet proposed openstack/kolla-ansible master: [DNM] Just test temporarily https://review.opendev.org/c/openstack/kolla-ansible/+/906423
[22:28] <opendevreview> Michal Arbet proposed openstack/kolla-ansible master: [DNM] Just test temporarily https://review.opendev.org/c/openstack/kolla-ansible/+/906423
[23:14] <opendevreview> Michal Arbet proposed openstack/kolla-ansible master: [DNM] Just test temporarily https://review.opendev.org/c/openstack/kolla-ansible/+/906423
[23:52] <opendevreview> Michal Arbet proposed openstack/kolla-ansible master: [DNM] Just test temporarily https://review.opendev.org/c/openstack/kolla-ansible/+/906423
