Nick | Message | Time |
---|---|---|
opendevreview | Merged openstack/kolla-ansible master: Bump ansible-lint version https://review.opendev.org/c/openstack/kolla-ansible/+/906249 | 03:37 |
opendevreview | Michal Arbet proposed openstack/kolla master: Rework horizon image to support local_settings.d https://review.opendev.org/c/openstack/kolla/+/906339 | 05:38 |
opendevreview | Michal Arbet proposed openstack/kolla-ansible master: [CI] Fix podman cross-dependency build https://review.opendev.org/c/openstack/kolla-ansible/+/906229 | 06:49 |
opendevreview | Michal Arbet proposed openstack/kolla-ansible master: Rework horizon role to support local_settings.d https://review.opendev.org/c/openstack/kolla-ansible/+/906347 | 06:50 |
opendevreview | Michal Arbet proposed openstack/kolla master: Rework horizon image to support local_settings.d https://review.opendev.org/c/openstack/kolla/+/906339 | 07:45 |
opendevreview | Michal Arbet proposed openstack/kolla master: Fix gnocchi and skyline after requirements change https://review.opendev.org/c/openstack/kolla/+/906349 | 07:45 |
opendevreview | Tadas proposed openstack/kolla master: add: cross-compile support https://review.opendev.org/c/openstack/kolla/+/889139 | 08:02 |
kevko | frickler: https://review.opendev.org/q/topic:%22requirements-change%22 | 08:03 |
kevko | frickler: and this is fix for depends-on gate fix we should merge asap also https://review.opendev.org/q/topic:%22depends-on-fix%22 | 08:05 |
frickler | sadly I cannot log in to gerrit currently, working on resolving that first | 08:06 |
kevko | frickler: haha, new laptop or what ? | 08:06 |
frickler | no, infra issue. if you are still logged in, try to keep it that way | 08:08 |
kevko | frickler: oh, good advice, thanks | 08:08 |
sorin-mihai | o/ | 08:09 |
kevko | sorin-mihai: \o | 08:09 |
sorin-mihai | i have a multinode hci deployment, ceph-ansible pacific and kolla-ansible yoga (upgraded from xena). planning to upgrade to zed and then to antelope, must keep all things alive without downtime | 08:11 |
kevko | sorin-mihai: read upgrade docs, use --check --diff | 08:12 |
sorin-mihai | main issue i have is that during _any_ kolla-ansible deploy, _all_ neutron get killed the wrong way and incurs 10-15 minutes timeout on everything related to IPs of the instances | 08:12 |
kevko | sorin-mihai: use limit, migrate resources to another node ? | 08:13 |
sorin-mihai | i fear it might be related to the rabbit queues because that is around the time when i see neutron flipping, but never got to properly isolate. now i have highly available queues, and hoping the antelope quorum queues would make this more bearable? | 08:15 |
sorin-mihai | kevko, you mean run deploy multiple times, limit one controller at a time? | 08:15 |
kevko | sorin-mihai: i think i have a patch in yoga for queues | 08:15 |
kevko | sorin-mihai: no, yoga is already OK | 08:17 |
kevko | sorin-mihai: rabbitmq and erlang is OK ? | 08:17 |
sorin-mihai | afaik, yes, only during deploy things get flipped all at once | 08:17 |
kevko | sorin-mihai: i think upgrade neutron server at once is OK | 08:18 |
kevko | sorin-mihai: BUT, you have to care other neutron services | 08:18 |
kevko | sorin-mihai: and it's not easy with kolla-ansible :P | 08:18 |
sorin-mihai | tbh, i would prefer to upgrade this cluster towards antelope before it goes into a heavier use | 08:18 |
kevko | sorin-mihai: I think you can manually edit some handlers to not restart neutrons (except neutron-server) ...and once you have control plane upgrade with database migrations ... | 08:19 |
kevko | sorin-mihai: then you can migrate routers ..etc..etc to another agent (node) ...and empty node you can upgrade again with all containers ... | 08:20 |
kevko | sorin-mihai: somewhere I've already seen some patch to address this issue ... | 08:20 |
sorin-mihai | you mean... kolla-ansible does _not_ deploy/restart containers in the right order? | 08:21 |
kevko | sorin-mihai: no, but for example routers can oscillate during restarts of agent | 08:22 |
sorin-mihai | the need to address these things manually outside of kolla-ansible seems like a major bug to me | 08:22 |
kevko | sorin-mihai: feel free to fix it :) | 08:22 |
kevko | sorin-mihai: some time ago for example we fixed this https://review.opendev.org/c/openstack/kolla-ansible/+/904515 | 08:22 |
sorin-mihai | kevko, so that is in zed and should be fixing my actual problem? | 08:41 |
sorin-mihai | though, it looks like it was related to DVR in https://bugs.launchpad.net/kolla-ansible/+bug/2009884, but i don't use DVR, i think | 08:43 |
kevko | sorin-mihai: i am really not completely sure ..i am migrating routers away from agent which i am going to upgrade | 08:53 |
sorin-mihai | that's new to me. like migrating instances before a host reboot? | 08:54 |
-opendevstatus- NOTICE: all new logins to https://review.opendev.org are currently failing. investigation is ongoing, please be patient | 08:54 | |
kevko | sorin-mihai: i don't think it's only DVR ... normally if you have a router ..you have l3 agents which create routers on the hosts right ? ... active - passive -passive depends on config | 08:56 |
kevko | sorin-mihai: they know who is active because of keepalived between them right ? | 08:56 |
kevko | sorin-mihai: what do you think ? where the keepalived processes run in kolla deployment ? yes - in l3 agent container | 08:57 |
sorin-mihai | hm, then i must have been wrong for a few years already... i have 3 control nodes, 3 dhcp agents for each subnet, one on each control node | 08:58 |
kevko | kevko: so If I have hundreds, or thousands of neutron routers ... I am really afraid what will happen if i start to restart node1, node2, node3 ...because there will be hundreds or thousands packets just going from one node to another node ...garp packets ..arp table changes ...VIP jumping from host to host ... | 08:59 |
kevko | sorin-mihai: i am not saying that is your case ...but we have 120 computes ... 10 network nodes ...thousands of routers with replica 3 | 08:59 |
kevko | sorin-mihai: so we upgrade control plane and then we are upgrading l3 agents but it takes days ... because we are doing it always in batches ... | 09:00 |
kevko | sorin-mihai: it's about your processes and operation :) | 09:00 |
kevko | we have 1500 k8s clusters inside | 09:00 |
sorin-mihai | yeah, for me the effects were obvious, but too small cluster... 15 minutes downtime planned was ok-ish, with some blasting incurred | 09:01 |
kevko | sorin-mihai: 4291 regular vms, 938 amphoras, 112 computes, 21660VCPUs 60TB ram | 09:01 |
kevko | sorin-mihai: so, if you want to be totally safe without any downtime ..you can check how many and what routers are active on node1 for example ... | 09:03 |
sorin-mihai | you mean compute node, yes? | 09:03 |
kevko | sorin-mihai: you will remove it from agent and readd ...so you actually move active to the other one .. | 09:03 |
kevko | sorin-mihai: no computes, network nodes .. where l3 agents are | 09:04 |
sorin-mihai | that'll be the control nodes in my case | 09:04 |
kevko | sorin-mihai: let me show you | 09:07 |
sorin-mihai | but in that case, since for each subnet i have 3 dhcp agents, i am guessing i should be good to just use limit and do one control/network node at a time? | 09:07 |
kevko | sorin-mihai: hmm, now it looks like it works as expected :D | 09:12 |
kevko | sorin-mihai: but still ..after some time router ip will jump to another server | 09:16 |
kevko | sorin-mihai: https://paste.openstack.org/show/bDiNHZinku4lWT4qk75F/ | 09:17 |
kevko | sorin-mihai: after some time ... ip just disappeared and jumped to another server | 09:17 |
sorin-mihai | yeah, seems like expected behavior, but still unclear for me why with 3 agents and the router with HA enabled still causing the timeout. and not just any random time, but sometimes even 10 minutes or more | 09:25 |
kevko | sorin-mihai: that's probably bug ... | 09:26 |
kevko | sorin-mihai: you have to debug | 09:26 |
opendevreview | Michal Arbet proposed openstack/kolla-ansible master: Rework horizon role to support local_settings.d https://review.opendev.org/c/openstack/kolla-ansible/+/906347 | 11:25 |
opendevreview | Michal Arbet proposed openstack/kolla-ansible master: Rework horizon role to support local_settings.d https://review.opendev.org/c/openstack/kolla-ansible/+/906347 | 12:33 |
opendevreview | Verification of a change to openstack/kolla master failed: Fix gnocchi and skyline after requirements change https://review.opendev.org/c/openstack/kolla/+/906349 | 13:45 |
kevko | frickler: i fixed horizon ..but i can't make tests work again ..it's passing on deploy but failing on upgrade ..because there is previous failed horizon which is not working ... | 13:49 |
kevko | frickler: :( | 13:49 |
kevko | frickler: we probably need to merge fix with turned off tests and then turn on by a patch... | 13:50 |
kevko | frickler: most simpler solution i think ...and revert if needed ..but i can confirm that fixed locally | 13:50 |
frickler | hmm, but reqs for 2023.2 should not have been updated. which fix exactly are you talking about? | 13:58 |
kevko | frickler: 906347,3 currently on zuul status page | 14:00 |
kevko | frickler: so I don't know what is going on ..or maybe I don't understand consequences between jobs ... | 14:01 |
kevko | frickler: I removed tons of comments from horizon k-a role and leave what kolla set and rework container ...so in future local_settings lands from a code ... clean solution ..it's working ..but tests for upgrade not working and don't know why | 14:03 |
frickler | kevko: the trouble is that horizon hides the traceback from import local_settings.py and just shows "No local_settings file found." instead | 14:12 |
frickler | I had overridden that in my local testing to see the actual issues | 14:13 |
frickler | https://opendev.org/openstack/horizon/src/branch/master/openstack_dashboard/settings.py#L243-L246 | 14:13 |
kevko | frickler: i reworked it ... it has to work | 14:31 |
opendevreview | Michal Arbet proposed openstack/kolla-ansible master: Rework horizon role to support local_settings.d https://review.opendev.org/c/openstack/kolla-ansible/+/906347 | 14:31 |
kevko | frickler: I have a feeling that we are not testing the previous version -> master ...but master -> current tested patch | 14:32 |
kevko | frickler: and what the .... is this | 14:33 |
kevko | michalarbet@pixla:~/ultimum/git/upstream/kolla-ansible$ egrep -ri 'previous_release:|openstack_previous_release_name' | grep 2023 | 14:34 |
kevko | zuul.d/base.yaml: previous_release: "2023.2" | 14:34 |
kevko | ansible/group_vars/all.yml:openstack_previous_release_name: "2023.1" | 14:34 |
frickler | that looks like an ooportunity for improvement ;) | 14:35 |
frickler | like my spelling today :-/ | 14:35 |
kevko | frickler: i fell like on university when i needed to push my code and wait for hours for output ... and you didn't know what are the input values :D ... so you needed to write some good logging ...output was limited to 1KB :D | 14:39 |
kevko | *feel | 14:39 |
opendevreview | Verification of a change to openstack/kolla master failed: Fix gnocchi and skyline after requirements change https://review.opendev.org/c/openstack/kolla/+/906349 | 14:54 |
*** mmalchuk_ is now known as mmalchuk | 16:06 | |
opendevreview | Will Szumski proposed openstack/kayobe master: WIP: Remove docker devicemapper support https://review.opendev.org/c/openstack/kayobe/+/906386 | 16:37 |
opendevreview | Merged openstack/kolla master: Fix gnocchi and skyline after requirements change https://review.opendev.org/c/openstack/kolla/+/906349 | 17:25 |
*** osmanlicilegi is now known as Guest16 | 18:22 | |
-opendevstatus- NOTICE: The Gerrit service on review.opendev.org will be offline momentarily for a restart, in order to attempt to restore OpenID login functionality | 19:36 | |
-opendevstatus- NOTICE: OpenID logins for the Gerrit WebUI on review.opendev.org should be working normally again since the recent service restart | 20:03 | |
opendevreview | Michal Arbet proposed openstack/kolla-ansible master: [DNM] Just test temporarily https://review.opendev.org/c/openstack/kolla-ansible/+/906423 | 21:24 |
opendevreview | Pierre Riteau proposed openstack/kolla-ansible master: Remove outdated comments in dev mode docs https://review.opendev.org/c/openstack/kolla-ansible/+/906426 | 21:54 |
opendevreview | Michal Arbet proposed openstack/kolla-ansible master: [DNM] Just test temporarily https://review.opendev.org/c/openstack/kolla-ansible/+/906423 | 22:01 |
opendevreview | Michal Arbet proposed openstack/kolla-ansible master: [DNM] Just test temporarily https://review.opendev.org/c/openstack/kolla-ansible/+/906423 | 22:28 |
opendevreview | Michal Arbet proposed openstack/kolla-ansible master: [DNM] Just test temporarily https://review.opendev.org/c/openstack/kolla-ansible/+/906423 | 23:14 |
opendevreview | Michal Arbet proposed openstack/kolla-ansible master: [DNM] Just test temporarily https://review.opendev.org/c/openstack/kolla-ansible/+/906423 | 23:52 |
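
The router-draining approach kevko describes around 09:03 (check which routers are active on a network node, remove them from that L3 agent and re-add them so the active instance moves elsewhere, and work in batches) can be sketched roughly as below. This is only an illustrative planning helper, not anything from kolla-ansible or Neutron: the function names, the `ha_states` mapping, and the example router and node names are all hypothetical, and the code only builds a list of steps instead of calling any API. In a real deployment the steps would presumably map to `openstack network agent remove router --l3 <agent> <router>` and `openstack network agent add router --l3 <agent> <router>`, which should be checked against your client version.

```python
from typing import Dict, List, Tuple

# Hypothetical input shape: router name -> {agent/node name -> HA state}.
HaStates = Dict[str, Dict[str, str]]
Step = Tuple[str, str, str]  # (action, agent, router)


def routers_active_on(ha_states: HaStates, agent: str) -> List[str]:
    """Return the routers whose active HA instance sits on the given agent."""
    return sorted(
        router
        for router, states in ha_states.items()
        if states.get(agent) == "active"
    )


def drain_plan(ha_states: HaStates, agent: str, batch_size: int) -> List[List[Step]]:
    """Plan remove/re-add steps for active routers on an agent, in batches.

    Each router gets a "remove" step followed by an "add" step on the same
    agent, so that another agent takes over as active in between.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    routers = routers_active_on(ha_states, agent)
    plan: List[List[Step]] = []
    for start in range(0, len(routers), batch_size):
        batch: List[Step] = []
        for router in routers[start:start + batch_size]:
            batch.append(("remove", agent, router))
            batch.append(("add", agent, router))
        plan.append(batch)
    return plan


# Made-up example data: three HA routers spread over three network nodes.
EXAMPLE: HaStates = {
    "router-a": {"node1": "active", "node2": "standby", "node3": "standby"},
    "router-b": {"node1": "standby", "node2": "active", "node3": "standby"},
    "router-c": {"node1": "active", "node2": "standby", "node3": "standby"},
}

if __name__ == "__main__":
    for number, batch in enumerate(drain_plan(EXAMPLE, "node1", batch_size=1), 1):
        for action, agent, router in batch:
            print(f"batch {number}: {action} {router} on {agent}")
```

Keeping the plan separate from execution leaves room to pause between batches and confirm that a standby has actually taken over before touching the next set of routers, which matches the "always in batches" practice mentioned in the log.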