Friday, 2021-09-24

06:41 *** rpittau|afk is now known as rpittau
08:50 *** odyssey4me is now known as Guest847
11:56 <snadge> i have dhcp and dhcpv6 enabled on the wan interface, and now the ipv4 address times out after 1800 seconds
11:56 <snadge> it's like it won't renew.. but if i renew manually it does, pretty frustrating
11:57 <snadge> i could just set the ipv4 address statically.. as it is a static ip, but that's annoying, i want to know why it's doing that
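For anyone hitting the same thing: a quick way to inspect the lease timers and force a renewal by hand, assuming an ISC dhclient setup (the interface name eth0 and the lease file path are placeholders that vary by distro):

    # inspect the current lease and its renew/rebind/expire timers
    cat /var/lib/dhcp/dhclient.leases

    # release the ipv4 lease, then re-request it verbosely so the
    # DHCPREQUEST/DHCPACK exchange is visible
    dhclient -r eth0
    dhclient -v eth0

If the manual renew works but the automatic one never fires, comparing the lease's renew timer against what the server hands out is usually the next step.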
12:58 <spatel> noonedeadpunk yesterday i upgraded openstack V to W without any issue.
12:58 <spatel> it took 8 hours to complete the full upgrade
13:06 <mgariepy> nice spatel, what is your network stack? ovs? or still lxb?
13:06 <spatel> lxb
13:06 <mgariepy> how many hosts do you have?
13:06 <spatel> this is production, around 100
13:07 <mgariepy> cool.
13:07 <noonedeadpunk> great news!
13:07 <spatel> is it normal to take 8 hours?
13:07 <noonedeadpunk> against 100 hosts?
13:07 <spatel> yes
13:08 <spatel> i have one more environment which has 328 computes :)
13:08 <mgariepy> when i do an upgrade like that i tend to split the task over a couple of days.
13:08 <spatel> mgariepy is it ok if infra runs on wallaby and computes run on victoria?
13:08 <noonedeadpunk> me too
13:08 <noonedeadpunk> yes, totally
13:08 <spatel> i thought they don't work with mixed versions
13:09 <mgariepy> control plane first on day 1 (infra + keystone + other services) and then nova/neutron on day 2.
13:09 <spatel> oh!! damn it.. i didn't know that
13:09 <mgariepy> well, your upgrade was live, and for 8 hours you had a mix of the 2 releases! ;)
13:09 <spatel> next time i will split it out... it would be good to put a note in the official doc in case people are not aware :)
13:10 <mgariepy> you can also split that over 3 days if you want!
13:10 <noonedeadpunk> we split it over a week lol
13:10 <mgariepy> depending on how many hosts :P
13:10 <mgariepy> my cloud is kinda small so.. 2 days is enough haha
13:10 <noonedeadpunk> to give customers the exact time of when, and what exactly, can fail
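For context, a staged run like the one described above could look roughly like this with the stock OpenStack-Ansible playbooks (a sketch only; the playbook names are the standard ones, but the split across days is purely the operator's choice):

    # day 1: control plane (infrastructure services + keystone)
    openstack-ansible setup-hosts.yml
    openstack-ansible setup-infrastructure.yml
    openstack-ansible os-keystone-install.yml

    # day 2: data plane (nova/neutron across the computes)
    openstack-ansible os-nova-install.yml
    openstack-ansible os-neutron-install.yml

The all-in-one alternative is scripts/run-upgrade.sh, which is roughly what turns into the single 8-hour run discussed above.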
13:12 <spatel> i have noticed that when you upgrade/restart the OVS agent, the network takes a hit
13:12 <spatel> is that true?
13:12 <noonedeadpunk> yep, ovs restart does break networking on the compute
13:12 <spatel> so we have to take downtime, that sucks...
13:12 <noonedeadpunk> because eventually you shut down and re-create the bridges
13:13 <spatel> my big problem is that in the remote datacenter i am planning to use dpdk (because sriov doesn't support bonding), so i have to use ovs
13:14 <spatel> noonedeadpunk as you know i haven't upgraded ceph yet.. i am planning to test that in the lab and do it in production later
13:15 <spatel> noonedeadpunk i have a very stupid question: what if i lose the deployment node, how do i rebuild it in that case?
13:15 <noonedeadpunk> you need to back up /etc/openstack_deploy :)
13:26 <noonedeadpunk> or store it in git
13:26 <mgariepy> and push it somewhere.
13:26 <noonedeadpunk> not the best idea to store user_secrets in git though
13:27 <mgariepy> i encrypt the secrets and store them on a private git server.
13:29 <mgariepy> how do you guys do that, noonedeadpunk?
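One way to do it, sketched under the assumption of ansible-vault for the encryption (the remote URL is a placeholder):

    cd /etc/openstack_deploy

    # encrypt the secrets file in place before it ever touches git
    ansible-vault encrypt user_secrets.yml

    git init
    git add -A && git commit -m "backup of deployment configuration"
    git remote add origin git@git.example.com:ops/openstack_deploy.git
    git push -u origin master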
13:29 <jamesdenton> anyone here run into an issue with the dashboard, where after logging in you get a 504 after a while? Looking at nova-api, it's just spinning its wheels on /v2.1/os-simple-tenant-usage/
13:31 <jamesdenton> repeated (successful) requests for GET /v2.1/os-simple-tenant-usage/7a8df96a3c6a47118e60e57aa9ecff54 (project). strange
13:32 <mgariepy> is it only one node that is failing?
13:33 <jamesdenton> i see the same behavior across all controllers running that service
13:34 <jamesdenton> it's the Project -> Compute -> Overview page that's problematic, not all of horizon
13:34 <mgariepy> do you see the error in the nova logs?
13:36 <mgariepy> when i do an upgrade on horizon i usually destroy one of the containers and create a new one, just to have a quick fallback if the upgrade fails for mysterious reasons..
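In OpenStack-Ansible terms that recreate is a pair of the stock lxc container playbooks plus a reinstall (a sketch; the --limit target is a placeholder, taken from your own inventory):

    # destroy one horizon container, rebuild it, and redeploy horizon into it
    openstack-ansible lxc-containers-destroy.yml --limit infra1_horizon_container-abcd1234
    openstack-ansible lxc-containers-create.yml --limit infra1_horizon_container-abcd1234
    openstack-ansible os-horizon-install.yml --limit infra1_horizon_container-abcd1234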
13:36 <jamesdenton> https://pastebin.com/ndzhmk1B
13:37 <jamesdenton> horizon logs look clean, but something is making repeated requests to this url
13:37 <jamesdenton> which is related to that overview page
13:40 <mgariepy> do the requests all come from the same horizon container?
13:40 <mgariepy> or mostly**
13:42 <jamesdenton> ahh, good question, i'll have to check. it's "load balanced" but i need to make sure
13:43 <mgariepy> the connections to horizon are sticky per client IIRC
13:43 <jamesdenton> nova usage-list uses the same os-simple-tenant-usage calls, testing it now.
13:45 <mgariepy> your loadbalancer would log the balancing; in the nova api logs, if you list the last 1000 requests, are they coming mostly from the same host?
13:45 <jamesdenton> this is just a lab env
13:46 <mgariepy> lol ok and? take the last 100 calls then haha
13:46 <jamesdenton> :D
13:46 <mgariepy> you are the only client?
13:46 <jamesdenton> yes, i'm the only client. Problem persists across a reboot, too, so i wonder if there's something funky in the DB
13:47 <mgariepy> flush memcached
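Flushing memcached can be done straight from the shell, assuming the default bind address and port (adjust both to wherever your deployment runs memcached):

    # invalidate every cached entry on the local memcached instance
    echo 'flush_all' | nc -q 1 127.0.0.1 11211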
13:47 <jamesdenton> it happened after a recent upgrade, but i can't recall which
13:48 <mgariepy> is it a 3 ctrl node deployment?
13:48 <jamesdenton> yes
13:48 <mgariepy> block-migrate on hdds is soooo painful
13:51 <mgariepy> did you try to remove one ctrl node from the lb to see if it's only one node that is causing the issue?
14:02 <jamesdenton> yeah, i've done the usual stuff.
14:07 <mgariepy> you should try to start the weekend early haha
14:17 <Adri2000> hi... this backport https://review.opendev.org/c/openstack/openstack-ansible-os_swift/+/806210 would be happy with another +2 :)
14:22 <mgariepy> Adri2000, done.
14:23 <Adri2000> thank you mgariepy!
14:51 <noonedeadpunk> it would also be great to review https://review.opendev.org/q/topic:"osa%252Fgalera_pki"+(status:open) :)
15:17 <jamesdenton> mgariepy So, I've isolated this to some issue w/ nova api versioning. But maybe it's really something with haproxy, dunno. If I use microversion <= 2.39 it works, >= 2.40 fails
15:17 <jamesdenton> https://docs.openstack.org/nova/latest/reference/api-microversion-history.html
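Per that history, 2.40 is the microversion that added limit/marker pagination to the os-simple-tenant-usage endpoints, which lines up with the symptom. Pinning the microversion from the CLI is an easy way to reproduce the split (a sketch using the novaclient CLI mentioned earlier):

    # pre-pagination behaviour: usage comes back in one shot
    nova --os-compute-api-version 2.39 usage-list

    # 2.40+ pages through usage records with limit/marker; the failing path here
    nova --os-compute-api-version 2.40 usage-list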
15:35 <mgariepy> jamesdenton, a mismatch of the microversion between horizon/nova and other stuff?
15:37 <jamesdenton> i can't imagine so, microversion 2.40 is pretty old (older than this env, even). but i'm not really sure what the deal is. running wallaby now, and this issue started happening in this env during victoria, IIRC. this lab sees some abuse. at this point it's the principle of the thing - i need to fix it to keep myself sane
15:38 <jamesdenton> i might adjust the endpoints to bypass haproxy and see if it occurs directly against nova-api
15:42 <mgariepy> keep us in the loop
15:43 <mgariepy> i don't see why haproxy would cause an issue with a version like that.
15:44 <jamesdenton> I was thinking maybe a difference in the payload was causing an issue. but the same thing happens directly against nova api when i change the endpoint
15:58 <jamesdenton> mgariepy so, it looks like there was something about terminated instances that was causing an issue. The issue persisted even after deleting all instances from all projects. Had to run "nova-manage db archive_deleted_rows"
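For reference, with the flags that keep the archive running until everything soft-deleted has been moved (standard nova-manage options; in an OSA deployment this runs inside the nova-api container's venv):

    # move soft-deleted rows from the main tables into the shadow tables
    nova-manage db archive_deleted_rows --until-complete --verbose

    # optionally drop the archived rows for good afterwards
    nova-manage db purge --all --verbose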
15:59 <mgariepy> when did you install that system? is it an old install upgraded over the last 10 years?
15:59 <jamesdenton> Austin -> Wallaby
15:59 <jamesdenton> Not really, probably Queens/Rocky -> W
15:59 <mgariepy> LOL
15:59 <jamesdenton> but this issue crept up some time in the last few weeks
16:00 <mgariepy> did you run nova api with debug?
16:00 <mgariepy> how did you see the issue with the deleted instances?
16:01 <jamesdenton> oh yeah. in fact, i thought it was fixed but it's not. as soon as i created a new instance, the problem came back. I mentioned deleted instances because you could still see them returned in the payload
16:01 <mgariepy> db migration missing?
16:02 <mgariepy> if nova errors out on a db issue there should be a traceback somewhere, no?
16:03 <mgariepy> unless it's in try: except: pass haha
16:03 <jamesdenton> i would think so. the only thing i see in the logs is repeated attempts against os-simple-tenant-usage
16:35 <opendevreview> Merged openstack/openstack-ansible-os_swift stable/ussuri: Revert "split templates to work around configparser bug" https://review.opendev.org/c/openstack/openstack-ansible-os_swift/+/806210
