Wednesday, 2021-06-09

01:42 *** spatel has joined #openstack-lbaas
03:00 *** spatel has quit IRC
05:57 *** dulek has quit IRC
06:06 *** amotoki has joined #openstack-lbaas
06:45 *** osmanlic- has joined #openstack-lbaas
06:45 *** osmanlicilegi has quit IRC
07:01 *** osmanlic- has quit IRC
07:05 *** osmanlicilegi has joined #openstack-lbaas
07:16 *** andrewbonney has joined #openstack-lbaas
08:46 <opendevreview> mitya-eremeev-2 proposed openstack/octavia master: More logging during load balancer creation. https://review.opendev.org/c/openstack/octavia/+/792697
12:46 *** spatel has joined #openstack-lbaas
15:21 <opendevreview> Merged openstack/octavia-tempest-plugin master: New test: test_tcp_and_udp_traffic_on_same_port https://review.opendev.org/c/openstack/octavia-tempest-plugin/+/755050
16:00 <gthiemonge> #startmeeting Octavia
16:00 <opendevmeet> Meeting started Wed Jun  9 16:00:52 2021 UTC and is due to finish in 60 minutes. The chair is gthiemonge. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00 <opendevmeet> The meeting name has been set to 'octavia'
16:01 <johnsom> o/
16:01 <gthiemonge> Hi Folks
16:01 <haleyb> hi
16:01 <gthiemonge> #topic Announcements
16:02 <gthiemonge> ML2/OVN is now the default network backend in devstack
16:02 <gthiemonge> Our gates are back
16:02 <gthiemonge> IPv6 connectivity for the VIPs has been restored!
16:02 <rm_work> o/
16:03 <gthiemonge> I haven't seen any other issues with the gates
16:03 <gthiemonge> perhaps I need to recheck the open review for the multinode job
16:03 <johnsom> Yeah, I haven't been able to get a devstack to go with OVN. I had to revert to OVS.
16:03 <johnsom> I see there are still some open devstack issues around this
16:03 <haleyb> gthiemonge: fyi there is at least one follow-on devstack patch to fix IPv6 issues w/OVN
16:04 <haleyb> #link https://review.opendev.org/c/openstack/devstack/+/795371
16:04 <gthiemonge> yeah I have a working devstack env with OVN, it was a bit painful to get there
16:04 <johnsom> Any tips or docs pointers? It's been a few days since I tried.
16:05 <haleyb> tap your heels together and repeat "OVN is great" :p
16:05 <johnsom> I asked in the channel "Any idea why devstack dies looking for /var/run/openvswitch/ovnnb_db.sock" here, but didn't get a response
16:05 <johnsom> Ha, well, ok.
16:05 <gthiemonge> I can provide my local.conf file
16:06 <johnsom> I may just need to create a fresh VM to make the conversion... I don't know
16:06 <rm_work> yeah that'd be great, i have yet to get a successful devstack build to happen
16:06 <rm_work> a newer working local.conf would be A++
16:07 <gthiemonge> I'll send it after the meeting (I need to remove some secrets ;-))
16:07 <johnsom> Yeah, and a list of packages needed. I looked for a bindep but didn't find one.
16:07 <johnsom> +1
16:08 <haleyb> i'll check the local.conf in the OVN provider repo with the latest code too, it was working last week....
16:08 <gthiemonge> Well I didn't have any package issue (on centos 8 stream), but I had to downgrade a nova-related package to spawn VMs
16:10 <gthiemonge> local.conf:
16:10 <johnsom> I tried installing everything I could think of, but it didn't help, so yeah, it may not be package related.
16:10 <gthiemonge> #link http://paste.openstack.org/show/806497/
16:10 <johnsom> Thanks Greg
16:10 <gthiemonge> np
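For anyone following along, a minimal local.conf sketch along those lines (illustrative only, not the contents of the paste above; it assumes the standard Octavia devstack plugin service names and relies on ML2/OVN now being the devstack default, so no Neutron backend overrides are shown):

    [[local|localrc]]
    ADMIN_PASSWORD=secretadmin
    DATABASE_PASSWORD=$ADMIN_PASSWORD
    RABBIT_PASSWORD=$ADMIN_PASSWORD
    SERVICE_PASSWORD=$ADMIN_PASSWORD
    # Octavia via its devstack plugin: API, worker, health manager,
    # housekeeping and driver agent
    enable_plugin octavia https://opendev.org/openstack/octavia
    enable_service octavia o-api o-cw o-hm o-hk o-da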
16:11 <gthiemonge> in case of issues, ping haleyb ;-)
16:11 <johnsom> I tried...
16:12 <rm_work> I am semi-distracted because I am trying to track down how our API is failing oddly on a member create -- may ping here after the meeting to see if anyone has seen issues recently
16:12 *** xgerman has joined #openstack-lbaas
16:12 <haleyb> yes, or look in the ovn provider repo, it's different and builds from source
16:12 <haleyb> johnsom: i never saw the ping, sorry
16:12 <johnsom> No worries
16:14 <gthiemonge> Any other announcements?
16:14 <johnsom> We are going to retire #openstack-state-management
16:14 <johnsom> Please use #openstack-oslo for TaskFlow discussion going forward
16:14 <johnsom> Not that there has been much discussion in a while....
16:14 <gthiemonge> ack
16:15 <gthiemonge> thanks johnsom
16:15 <gthiemonge> #topic Brief progress reports / bugs needing review
16:16 <gthiemonge> I've been working on a weird behavior with amphorav2+persistence, where some tasks were executed twice (and concurrently)
16:17 <gthiemonge> For instance the octavia-worker created 2 VIP ports per amphora, and then it failed
16:17 <johnsom> I have mostly been on the review path recently.
16:17 <gthiemonge> (because the amphora cannot get 2 VIP ports)
16:17 <gthiemonge> I found out that the jobboard_expiration_time setting value (default 30sec) should always be greater than the duration of the longest task
16:17 <gthiemonge> My dev env was a bit loaded and the ComputeWait task took ~30sec
16:18 <gthiemonge> I detailed the issue in
16:18 <gthiemonge> #link https://storyboard.openstack.org/#!/story/2008956
16:18 <gthiemonge> (see the first comment)
16:18 <gthiemonge> No fix/code so far, but that was an interesting debugging session with amphorav2
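The relationship gthiemonge describes boils down to a config guideline: keep the jobboard claim expiry comfortably above the longest single task duration seen in the deployment. A hypothetical octavia.conf snippet (assuming the option lives under [task_flow]; 120 is just an example value):

    [task_flow]
    # Default is 30 seconds. In the environment above, ComputeWait alone
    # took ~30s, so the claim expired mid-task and a second controller
    # claimed the job, running the same tasks twice and concurrently.
    jobboard_expiration_time = 120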
16:21 <gthiemonge> And it would be great to get reviews on these python-octaviaclient patches:
16:22 <gthiemonge> #link https://review.opendev.org/c/openstack/python-octaviaclient/+/792122
16:22 <gthiemonge> #link https://review.opendev.org/c/openstack/python-octaviaclient/+/792688
16:22 <gthiemonge> I amended the 2nd patch, I don't know if I can review it :D
16:23 <johnsom> I think so. I'm the author.
16:23 <johnsom> Maybe we both do, but don't workflow? Grin
16:23 <gthiemonge> :D
16:23 <johnsom> Thanks for helping out with the test updates! I ran out of cycles I could give to that.
16:24 <johnsom> It looks like Ann reviewed, so just one more core needed
16:24 <gthiemonge> it passes the tests, but the change is not pretty
16:25 <johnsom> lol
16:26 <johnsom> Well, it will be a huge speed up for anyone with a large cloud.
16:27 <gthiemonge> #topic Open Discussion
16:27 <gthiemonge> any other topics today?
16:29 <gthiemonge> Ok!
16:29 <gthiemonge> Thanks Folks!
16:29 <johnsom> Thanks Greg
16:29 <gthiemonge> #endmeeting
16:29 <opendevmeet> Meeting ended Wed Jun  9 16:29:38 2021 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
16:29 <opendevmeet> Minutes:        http://eavesdrop.openstack.org/meetings/octavia/2021/octavia.2021-06-09-16.00.html
16:29 <opendevmeet> Minutes (text): http://eavesdrop.openstack.org/meetings/octavia/2021/octavia.2021-06-09-16.00.txt
16:29 <opendevmeet> Log:            http://eavesdrop.openstack.org/meetings/octavia/2021/octavia.2021-06-09-16.00.log.html
16:44 <rm_work> I am failing to see how this is even POSSIBLE: http://paste.openstack.org/show/806499/
16:47 <opendevreview> Vishal Manchanda proposed openstack/octavia-dashboard master: Drop horizon-nodejs10-jobs template https://review.opendev.org/c/openstack/octavia-dashboard/+/795594
16:47 <johnsom> It looks like it was in a rollback, so maybe the request failed before the member was created inside the DB transaction?
16:48 <johnsom> There must be another exception logged there.
16:49 <johnsom> Oh, nevermind
16:50 <johnsom> Hmm, maybe it was a DBDuplicateEntry that walked off the "if" conditional chain there????
16:51 <johnsom> https://github.com/openstack/octavia/blob/e1adb335c788dbd902bfd8a00f748da4abb8b06b/octavia/api/v2/controllers/member.py#L122
16:52 <johnsom> This seems fishy as well: https://github.com/openstack/octavia/blob/e1adb335c788dbd902bfd8a00f748da4abb8b06b/octavia/api/v2/controllers/member.py#L125
16:52 <johnsom> like it should be an "in" not "=="
17:00 <rm_work> hmm
17:01 <rm_work> but like... anything breaks in there and it raises... so it'd return like 5xx
17:01 <rm_work> right?
17:01 <rm_work> i was assuming it couldn't have failed there
17:02 <rm_work> the `return None` at the end isn't helpful, but I assume MUST be what happened?
17:02 <rm_work> oh, no, i see what you mean
17:02 <rm_work> that exception block doesn't have to raise/return
17:05 *** andrewbonney has quit IRC
17:05 <rm_work> yeah this looks like a bug
17:06 <rm_work> oh so it's checking to see if the set of columns IS ['id']
17:06 <rm_work> that is valid but ... weird
17:06 <rm_work> and i'm not sure if it's actually right
17:16 <johnsom> Yep. BTW, don't look at the patch author
17:16 <johnsom> I would vote to at least add a log line there, if not just remove that second conditional and make DBDuplicateEntry always raise something
17:17 <rm_work> hmm yeah, i can add a log there for debugging for now and see what happens
17:36 <rm_work> got another fun one
17:37 <rm_work> hmm though not sure exactly how to describe this
17:41 <johnsom> "A stray neutrino passed through the motherboard of our server....."
17:42 <rm_work> I THINK what happened
17:42 <rm_work> is that nova failed to spin up the second amphora during a create
17:42 <rm_work> but the first amp was up and reporting health?
17:42 <johnsom> "A lost manager leaned on the big-red-button in the datacenter....."
17:42 <rm_work> and then it started the rollback
17:42 <rm_work> and the amp went stale
17:43 <rm_work> and then the healthmanager tried to do a failover on that amp?
17:43 <rm_work> but it didn't have an entry in the DB because it was rolled back?
17:43 <rm_work> which led to:
17:44 <rm_work> 2021-06-09 16:59:11,295 ERROR [octavia.controller.worker.v1.controller_worker] /opt/openstack/venv/octavia/lib/python3.7/site-packages/octavia/controller/worker/v1/controller_worker.py:failover_amphora:871 Amphora failover for amphora 95780910-0248-4d5b-b236-57e44f9b8bc5 failed because there is no record of this amphora in the database. Check that the [house_keeping] amphora_expiry_age configuration setting is not too short. Skipping failover.
17:44 <rm_work> 2021-06-09 16:59:11,297 ERROR [octavia.controller.worker.v1.controller_worker] /opt/openstack/venv/octavia/lib/python3.7/site-packages/octavia/controller/worker/v1/controller_worker.py:failover_amphora:947 Amphora 95780910-0248-4d5b-b236-57e44f9b8bc5 failover exception: amphora 95780910-0248-4d5b-b236-57e44f9b8bc5 not found. Traceback (most recent call last): File "/opt/openstack/venv/octavia/lib/python3.7/site-packages/octavia/controller/worker/v1/controller_worker.py", line 873, in failover_amphora id=amphora_id) octavia.common.exceptions.NotFound: amphora 95780910-0248-4d5b-b236-57e44f9b8bc5 not found.
17:44 <rm_work> erk
17:44 <rm_work> I thought that was gonna paste nicely as two lines
17:44 <rm_work> http://paste.openstack.org/show/806503/
17:46 <johnsom> Well, I haven't grokked this all yet, but HM can't touch the amp if the Create flow is still in progress/reverting. The objects are locked. Also, HM doesn't know about the amp until it sends its first heartbeat (open bug).
17:47 <rm_work> right, so i THINK it actually did send its first heartbeat
17:47 <johnsom> Also, failover records in the worker log mean someone ran the amp failover command from the API
17:47 <rm_work> this is the HM log
17:48 <rm_work> and it happened right after a nova port binding failure that caused a create rollback
17:51 <rm_work> ANYWAY, for the FIRST error (the member creation):
17:52 <rm_work> got more info
17:52 <johnsom> Hmm, so, amp in the amphora health table, but not in the amphora table. So, maybe a missing delete from the amphora health table on a revert cleanup of a failed amp create? Still, if the port plug was the failure, it seems like the amp should be in ERROR and not deleted.
17:52 <rm_work> i think the port plug failed on amp2
17:52 <rm_work> and rolled back amp1
17:53 <rm_work> but anyway that's totally speculation
17:53 <rm_work> all i am going off is the error in that paste, and that it happened right after a port-bind fail
17:54 <rm_work> but ANYWAY, original issue, member create error: it's definitely a DB dupe
17:54 <rm_work> and that except block is broken
17:54 <rm_work> trying to get the full traceback out of the logs now
17:57 <rm_work> http://paste.openstack.org/show/806504/
17:58 <rm_work> so in this case, de.columns == ['member.uq_member_pool_id_address_protocol_port']
17:59 <rm_work> this seems... obvious?
17:59 <rm_work> like .... it failed the constraint. we know about this.
17:59 <rm_work> the second `if` should catch that
18:00 <rm_work> `set(constraint_list) == set(de.columns)`
18:00 <rm_work> did a sqlalchemy update change how it returns?
18:00 <rm_work> ['uq_member_pool_id_address_protocol_port'] == ['member.uq_member_pool_id_address_protocol_port']
18:00 <rm_work> it has `member.` now
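The comparison rm_work is quoting can be reproduced in plain Python with the values from the paste above, which shows why the handler falls through to `return None`:

    # Constraint name expected by the handler vs. what DBDuplicateEntry now reports.
    constraint_list = ['uq_member_pool_id_address_protocol_port']
    de_columns = ['member.uq_member_pool_id_address_protocol_port']

    print(set(constraint_list) == set(de_columns))  # False: nothing raises, handler returns None
    # Stripping the new table-name prefix would make it match again:
    print(set(constraint_list) == {c.rsplit('.', 1)[-1] for c in de_columns})  # True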
18:00 <johnsom> Maybe
18:01 <rm_work> was this always broken or did sqlalchemy change and break it...
18:01 <johnsom> There was a major sqla release a while back, but I would not dismiss that it has been broken a long time.
18:06 <rm_work> yeah...
18:06 <rm_work> so votes -- should I just update the constraint name here so it matches?
18:07 <rm_work> or should I figure out how to rewrite this whole section?
18:08 <johnsom> Is there a good reason to *not* have every duplicate here (other than the ID with its special message) raise??? It's during a create
18:09 <johnsom> I guess I would have to trace back all of the places that call _validate_create_member and check if there is a case where it should just be passive on a duplicate.
18:10 *** masterpe[m] has joined #openstack-lbaas
18:10 <johnsom> Face value, it seems it should always raise IMO
18:11 <rm_work> yes
18:11 <rm_work> leaking through to the `return None` seems always dumb
18:11 <rm_work> we were trying to return different errors depending on why it failed the insert tho
18:12 <rm_work> so I'm thinking this:
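A rough sketch of that direction, with hypothetical exception names rather than the actual Octavia classes (the real change is in the review announced below): any DBDuplicateEntry raised during the member create surfaces as an error, with only the duplicate-id case keeping its dedicated message.

    from oslo_db import exception as odb_exceptions


    class IDAlreadyExists(Exception):
        """Stand-in for the API exception raised when a supplied id already exists."""


    class DuplicateMemberEntry(Exception):
        """Stand-in for the API exception raised for a duplicate pool member."""


    def create_member(db_create, member):
        """Create a member, turning any DB duplicate into an API-level error."""
        try:
            return db_create(member)
        except odb_exceptions.DBDuplicateEntry as de:
            if ['id'] == de.columns:
                # Client-supplied id already exists: keep the special message.
                raise IDAlreadyExists() from de
            # Any other unique-constraint violation (e.g. the
            # pool_id/address/protocol_port constraint, however the backend
            # spells its name) is a duplicate member: always raise, never
            # fall through to a bare `return None`.
            raise DuplicateMemberEntry() from de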
18:15 <opendevreview> Adam Harwell proposed openstack/octavia master: Fix constraint check on dupe member create https://review.opendev.org/c/openstack/octavia/+/795637
18:22 <johnsom> you might have to put back the "return None" due to a pylint check that all paths have a return....
18:23 <johnsom> Otherwise, that was basically what I was thinking
18:33 <rm_work> no because now there's no way to escape
18:33 <rm_work> :D
18:33 <rm_work> so it can never get there
18:33 <johnsom> Ah, yeah, ha
18:33 <rm_work> that even existing should have been a red flag I think
18:37 <rm_work> still running local tests
18:39 <rm_work> hmmm UNRELATED lint errors:
18:39 <rm_work> octavia/amphorae/backends/utils/ip_advertisement.py:16:0: E0611: No name 'pack' in module 'struct' (no-name-in-module)
18:39 <rm_work> octavia/amphorae/backends/utils/ip_advertisement.py:17:0: E0611: No name 'unpack' in module 'struct' (no-name-in-module)
18:39 <rm_work> is that an osx issue?
18:40 <johnsom> I haven't seen that issue. That code hasn't changed since I wrote it over a year ago
18:41 <johnsom> Looks to still exist in python
18:41 <johnsom> https://docs.python.org/3/library/struct.html
18:42 <johnsom> https://dilbert.com/strip/1995-06-24
18:53 <rm_work> lol
18:53 <rm_work> pretty much all of dilbert still holds up :D
18:54 <rm_work> if only scott adams wasn't a total fuckwit
18:55 <rm_work> I still think about this every time I have any discussion of art: https://www.youtube.com/watch?v=u05S4ikHdNc
19:39 <rm_work> johnsom: do kinda wish I could update flavor-profiles without having to do UPDATE in the DB <_<
19:40 <johnsom> You are just going to change a flavor definition out from under deployed load balancers?
19:41 <rm_work> yes
19:42 <rm_work> because it doesn't matter
19:42 <rm_work> not everything affects existing LBs
19:42 <rm_work> I need to add vip_subnet_selection_tag
19:43 <rm_work> only affects creates
19:45 <rm_work> I still don't believe these should be immutable
20:07 <rm_work> hmmm, default `graceful_shutdown_timeout` of 60s might be a bit low T_T
20:07 <rm_work> if LB create just started, 60s is not very long
20:08 <rm_work> I realize it can be easily changed to match your cloud, but default values should probably be a bit on the conservative side :D
20:17 <johnsom> Where is that setting?
20:19 <johnsom> Hmm, wonder if that is used. Also, we set it to 300 in the devstack plugin, so, yeah, 60 seems a bit short. I mean a provision should be 30-45 seconds, but some clouds are..... not so good.
20:20 <johnsom> I can think of one I saw take up to 15 minutes as it was totally overloaded
20:20 <johnsom> VM boot time that is
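For reference, the override being discussed would look roughly like this in octavia.conf; graceful_shutdown_timeout is an oslo.service/cotyledon option registered under [DEFAULT], and 300 matches the devstack plugin value johnsom mentions:

    [DEFAULT]
    # Seconds the controller processes wait for in-flight work to finish on
    # shutdown before giving up; the library default of 60s can be short when
    # a load balancer create has only just started.
    graceful_shutdown_timeout = 300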
20:42 *** spatel has quit IRC
21:43 <rm_work> johnsom: pretty sure it's used -- we had it come up during a restart earlier, where it timed out and stuck something in PENDING
21:44 <rm_work> I don't see it used directly? but I guess maybe cotyledon uses it internally?
21:45 <johnsom> A search shows it's part of oslo service, which I thought we were not using for worker
21:45 <johnsom> I may be remembering that wrong though
21:47 <rm_work> oslo_config_glue.setup(sm, CONF, reload_method="mutate")
21:47 <rm_work> I think that prolly does the magic
21:47 <rm_work> or this
21:47 <rm_work> sm.add(consumer_v1.ConsumerService, workers=CONF.controller_worker.workers, args=(CONF,))
21:47 <rm_work> prolly that one
21:48 <rm_work> part of how it's a "drop in" replacement
21:48 <johnsom> Yeah, could be
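A minimal, simplified sketch of the wiring rm_work is quoting (assumed names, not the exact Octavia worker code): oslo_config_glue.setup() is what registers the service options, including graceful_shutdown_timeout, against the cotyledon ServiceManager.

    import cotyledon
    from cotyledon import oslo_config_glue
    from oslo_config import cfg

    CONF = cfg.CONF


    class ConsumerService(cotyledon.Service):
        """Simplified stand-in for the worker's RPC consumer service."""

        def __init__(self, worker_id, conf):
            super().__init__(worker_id)
            self.conf = conf

        def run(self):
            # The real service consumes RPC messages until asked to stop; on
            # shutdown, cotyledon gives the worker up to graceful_shutdown_timeout
            # seconds to exit cleanly.
            pass


    def main():
        sm = cotyledon.ServiceManager()
        # Glues oslo.config into cotyledon: registers [DEFAULT] options such as
        # graceful_shutdown_timeout and enables config reload via "mutate".
        oslo_config_glue.setup(sm, CONF, reload_method="mutate")
        sm.add(ConsumerService, workers=2, args=(CONF,))
        sm.run()


    if __name__ == "__main__":
        main()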
