*** spatel has joined #openstack-lbaas | 01:42 | |
*** spatel has quit IRC | 03:00 | |
*** dulek has quit IRC | 05:57 | |
*** amotoki has joined #openstack-lbaas | 06:06 | |
*** osmanlic- has joined #openstack-lbaas | 06:45 | |
*** osmanlicilegi has quit IRC | 06:45 | |
*** osmanlic- has quit IRC | 07:01 | |
*** osmanlicilegi has joined #openstack-lbaas | 07:05 | |
*** andrewbonney has joined #openstack-lbaas | 07:16 | |
opendevreview | mitya-eremeev-2 proposed openstack/octavia master: More logging during load balancer creation. https://review.opendev.org/c/openstack/octavia/+/792697 | 08:46 |
*** spatel has joined #openstack-lbaas | 12:46 | |
opendevreview | Merged openstack/octavia-tempest-plugin master: New test: test_tcp_and_udp_traffic_on_same_port https://review.opendev.org/c/openstack/octavia-tempest-plugin/+/755050 | 15:21 |
gthiemonge | #startmeeting Octavia | 16:00 |
opendevmeet | Meeting started Wed Jun 9 16:00:52 2021 UTC and is due to finish in 60 minutes. The chair is gthiemonge. Information about MeetBot at http://wiki.debian.org/MeetBot. | 16:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 16:00 |
opendevmeet | The meeting name has been set to 'octavia' | 16:00 |
johnsom | o/ | 16:01 |
gthiemonge | Hi Folks | 16:01 |
haleyb | hi | 16:01 |
gthiemonge | #topic Announcements | 16:01 |
gthiemonge | ML2/OVN is now the default network backend in devstack | 16:02 |
gthiemonge | Our gates are back | 16:02 |
gthiemonge | IPv6 connectivity for the VIPs has been restored! | 16:02 |
rm_work | o/ | 16:02 |
gthiemonge | I haven't seen any other issues with the gates | 16:03 |
gthiemonge | perhaps I need to recheck the open review for the multinode job | 16:03 |
johnsom | Yeah, I haven't been able to get a devstack to go with OVN. I had to revert to OVS. | 16:03 |
johnsom | I see there are still some open devstack issues around this | 16:03 |
haleyb | gthiemonge: fyi there is at least one follow-on devstack patch to fix IPv6 issues w/OVN | 16:03 |
haleyb | #link https://review.opendev.org/c/openstack/devstack/+/795371 | 16:04 |
gthiemonge | yeah I have a working devstack env with OVN, it was a bit painful to get there | 16:04 |
johnsom | Any tips or docs pointers? It's been a few days since I tried. | 16:04 |
haleyb | tap your heels together and repeat "OVN is great" :p | 16:05 |
johnsom | I asked in the channel "Any idea why devstack dies looking for /var/run/openvswitch/ovnnb_db.sock" here, but didn't get a response | 16:05 |
johnsom | Ha, well, ok. | 16:05 |
gthiemonge | I can provide my local.conf file | 16:05 |
johnsom | I may just need to create a fresh VM to make the conversion... I don't know | 16:06 |
rm_work | yeah that'd be great, i have yet to get a successful devstack build to happen | 16:06 |
rm_work | newer working local.conf would be A++ | 16:06 |
gthiemonge | I'll send it after the meeting (I need to remove some secrets ;-) | 16:07 |
johnsom | Yeah and a list of packages needed. I looked for a bindep but didn't find one. | 16:07 |
johnsom | +1 | 16:07 |
haleyb | i'll check the local.conf in the OVN provider repo with the latest code too, it was working last week.... | 16:08 |
gthiemonge | Well I didn't have any package issue (on centos 8 stream), but I had to downgrade a nova-related package to spawn VMs | 16:08 |
gthiemonge | local.conf: | 16:10 |
johnsom | I tried installing everything I could think of, but it didn't help, so yeah, it may not be package related. | 16:10 |
gthiemonge | #link http://paste.openstack.org/show/806497/ | 16:10 |
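As a rough illustration only (not gthiemonge's actual file, which is in the paste above), an OVN-backed devstack local.conf with the Octavia plugin could look something like the sketch below; the passwords are placeholders and the service list is the common amphora-driver setup:

    [[local|localrc]]
    ADMIN_PASSWORD=secretadmin
    DATABASE_PASSWORD=secretdatabase
    RABBIT_PASSWORD=secretrabbit
    SERVICE_PASSWORD=secretservice

    # ML2/OVN is now the devstack default, so these lines mostly spell out the defaults.
    Q_AGENT=ovn
    Q_ML2_PLUGIN_MECHANISM_DRIVERS=ovn,logger
    Q_ML2_PLUGIN_TYPE_DRIVERS=local,flat,vlan,geneve
    Q_ML2_TENANT_NETWORK_TYPE=geneve
    enable_service ovn-northd ovn-controller q-ovn-metadata-agent

    # Octavia with the amphora driver
    enable_plugin octavia https://opendev.org/openstack/octavia
    enable_service octavia o-api o-cw o-hm o-hk o-da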
johnsom | Thanks Greg | 16:10 |
gthiemonge | np | 16:10 |
gthiemonge | in case of issues, ping haleyb ;-) | 16:11 |
johnsom | I tried... | 16:11 |
rm_work | I am semi-distracted because I am trying to track down how our API is failing oddly on a member create -- may ping here after the meeting to see if anyone has seen issues recently | 16:12 |
*** xgerman has joined #openstack-lbaas | 16:12 | |
haleyb | yes, or look in the ovn provider repo, it's different and builds from source | 16:12 |
haleyb | johnsom: i never saw the ping, sorry | 16:12 |
johnsom | No worries | 16:12 |
gthiemonge | Any other announcements? | 16:14 |
johnsom | We are going to retire #openstack-state-management | 16:14 |
johnsom | Please use #openstack-oslo for TaskFlow discussion going forward | 16:14 |
johnsom | Not that there has been much discussion in a while.... | 16:14 |
gthiemonge | ack | 16:14 |
gthiemonge | thanks johnsom | 16:15 |
gthiemonge | #topic Brief progress reports / bugs needing review | 16:15 |
gthiemonge | I've been working on a weird behavior with amphorav2+persistence, some tasks were executed twice (and concurrently) | 16:16 |
gthiemonge | For instance the octavia-worker created 2 VIP ports per amphora, and then it failed | 16:17 |
johnsom | I have mostly been on the review path recently. | 16:17 |
gthiemonge | (because the amphora cannot get 2 VIP ports) | 16:17 |
gthiemonge | I found out that the jobboard_expiration_time setting value (default 30sec) should never be less than the duration of the longest task | 16:17 |
gthiemonge | My dev env was a bit loaded and the ComputeWait task took ~30sec | 16:17 |
gthiemonge | I detailed the issue in | 16:18 |
gthiemonge | #link https://storyboard.openstack.org/#!/story/2008956 | 16:18 |
gthiemonge | (see the first comment) | 16:18 |
gthiemonge | No fix/code so far, but that was an interesting debugging session with amphorav2 | 16:18 |
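For reference, the knobs involved live in octavia.conf under [task_flow]; the sketch below shows the relationship described above (the option names are real Octavia options, while the values and the connection string are only illustrative):

    [task_flow]
    # amphorav2 with the persistence/jobboard feature enabled
    jobboard_enabled = True
    persistence_connection = mysql+pymysql://octavia:password@127.0.0.1/octavia_persistence
    # The claim expiration (default 30) must stay above the duration of the
    # longest single task (e.g. ComputeWait on a loaded host); otherwise another
    # controller re-claims the job and runs tasks twice, as in story 2008956.
    jobboard_expiration_time = 100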
gthiemonge | And it would be great to get reviews on these python-octaviaclient patches: | 16:21 |
gthiemonge | #link https://review.opendev.org/c/openstack/python-octaviaclient/+/792122 | 16:22 |
gthiemonge | #link https://review.opendev.org/c/openstack/python-octaviaclient/+/792688 | 16:22 |
gthiemonge | I amended the 2nd patch, I don't know if I can review it :D | 16:22 |
johnsom | I think so. I'm the author. | 16:23 |
johnsom | Maybe we both do, but don't workflow? Grin | 16:23 |
gthiemonge | :D | 16:23 |
johnsom | Thanks for helping out with the test updates! I ran out of cycles I could give to that. | 16:23 |
johnsom | It looks like Ann reviewed, so just one more core needed | 16:24 |
gthiemonge | it passes the tests, but the change is not pretty | 16:24 |
johnsom | lol | 16:25 |
johnsom | Well, it will be a huge speed up for anyone with a large cloud. | 16:26 |
gthiemonge | #topic Open Discussion | 16:27 |
gthiemonge | any other topics today? | 16:27 |
gthiemonge | Ok! | 16:29 |
gthiemonge | Thanks Folks! | 16:29 |
johnsom | Thanks Greg | 16:29 |
gthiemonge | #endmeeting | 16:29 |
opendevmeet | Meeting ended Wed Jun 9 16:29:38 2021 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 16:29 |
opendevmeet | Minutes: http://eavesdrop.openstack.org/meetings/octavia/2021/octavia.2021-06-09-16.00.html | 16:29 |
opendevmeet | Minutes (text): http://eavesdrop.openstack.org/meetings/octavia/2021/octavia.2021-06-09-16.00.txt | 16:29 |
opendevmeet | Log: http://eavesdrop.openstack.org/meetings/octavia/2021/octavia.2021-06-09-16.00.log.html | 16:29 |
rm_work | I am failing to see how this is even POSSIBLE: http://paste.openstack.org/show/806499/ | 16:44 |
opendevreview | Vishal Manchanda proposed openstack/octavia-dashboard master: Drop horizon-nodejs10-jobs template https://review.opendev.org/c/openstack/octavia-dashboard/+/795594 | 16:47 |
johnsom | It looks like it was in a rollback, so maybe the request failed before the member was created inside the DB transaction? | 16:47 |
johnsom | There must be another exception logged there. | 16:48 |
johnsom | Oh, nevermind | 16:49 |
johnsom | Hmm, maybe it was a DBDuplicateEntry that walked off the "if" conditional chain there???? | 16:50 |
johnsom | https://github.com/openstack/octavia/blob/e1adb335c788dbd902bfd8a00f748da4abb8b06b/octavia/api/v2/controllers/member.py#L122 | 16:51 |
johnsom | This seems fishy as well: https://github.com/openstack/octavia/blob/e1adb335c788dbd902bfd8a00f748da4abb8b06b/octavia/api/v2/controllers/member.py#L125 | 16:52 |
johnsom | like it should be an "in" not "==" | 16:52 |
rm_work | hmm | 17:00 |
rm_work | but like... anything breaks in there and it raises... so it'd return like 5xx | 17:01 |
rm_work | right? | 17:01 |
rm_work | i was assuming it couldn't have failed there | 17:01 |
rm_work | the `return None` at the end isn't helpful, but I assume MUST be what happened? | 17:02 |
rm_work | oh, no, i see what you mean | 17:02 |
rm_work | that exception block doesn't have to raise/return | 17:02 |
*** andrewbonney has quit IRC | 17:05 | |
rm_work | yeah this looks like a bug | 17:05 |
rm_work | oh so it's checking to see if the set of columns IS ['id'] | 17:06 |
rm_work | that is valid but ... weird | 17:06 |
rm_work | and i'm not sure if it's actually right | 17:06 |
johnsom | Yep. BTW, don't look at the patch author | 17:16 |
johnsom | I would vote to at least add a log line there, if not just remove that second conditional and make DBduplicate always raise something | 17:16 |
rm_work | hmm yeah, i can add a log there for debugging for now and see what happens | 17:17 |
rm_work | got another fun one | 17:36 |
rm_work | hmm though not sure exactly how to describe this | 17:37 |
johnsom | "A stray neutrino passed through the motherboard of our server....." | 17:41 |
rm_work | I THINK what happened | 17:42 |
rm_work | is that nova failed to spin up the second amphora during a create | 17:42 |
rm_work | but the first amp was up and reporting health? | 17:42 |
johnsom | "A lost manager leaned on the big-red-button in the datacenter....." | 17:42 |
rm_work | and then it started the rollback | 17:42 |
rm_work | and the amp went stale | 17:42 |
rm_work | and then the healthmanager tried to do a failover on that amp? | 17:43 |
rm_work | but it didn't have an entry in the DB because it was rolled back? | 17:43 |
rm_work | which led to: | 17:43 |
rm_work | 2021-06-09 16:59:11,295 ERROR [octavia.controller.worker.v1.controller_worker] /opt/openstack/venv/octavia/lib/python3.7/site-packages/octavia/controller/worker/v1/controller_worker.py:failover_amphora:871 Amphora failover for amphora 95780910-0248-4d5b-b236-57e44f9b8bc5 failed because there is no record of this amphora in the database. Check that the [house_keeping] amphora_expiry_age configuration setting is not too short. Skipping failover. | 17:44 |
rm_work | 2021-06-09 16:59:11,297 ERROR [octavia.controller.worker.v1.controller_worker] /opt/openstack/venv/octavia/lib/python3.7/site-packages/octavia/controller/worker/v1/controller_worker.py:failover_amphora:947 Amphora 95780910-0248-4d5b-b236-57e44f9b8bc5 failover exception: amphora 95780910-0248-4d5b-b236-57e44f9b8bc5 not found. Traceback (most recent call last): File "/opt/openstack/venv/octavia/lib/python3.7/site-packages/octavia/controller/worker/v1/controller_worker.py", line 873, in failover_amphora id=amphora_id) octavia.common.exceptions.NotFound: amphora 95780910-0248-4d5b-b236-57e44f9b8bc5 not found. | 17:44 |
rm_work | http://paste.openstack.org/show/806503/ | 17:44 |
johnsom | Well, I haven't grokked this all yet, but HM can't touch the amp if the Create flow is still in progress/reverting. The objects are locked. Also, HM doesn't know about the amp until it sends its first heartbeat (open bug). | 17:46 |
rm_work | right, so i THINK it actually did send its first heartbeat | 17:47 |
johnsom | Also, failover records in the worker log mean someone ran the amp failover command from the API | 17:47 |
rm_work | this is HM log | 17:47 |
rm_work | and it happened right after a nova port binding failure that caused a create rollback | 17:48 |
rm_work | ANYWAY, for the FIRST error (the member creation): | 17:51 |
rm_work | got more info | 17:52 |
johnsom | Hmm, so, amp in amphora health table, but not in amphora table. So, maybe a missing delete from amphora health table on a revert cleanup of a failed amp create? Still, if the port plug was the failure, it seems like the amp should be in error and not deleted. | 17:52 |
rm_work | i think the port plug failed on amp2 | 17:52 |
rm_work | and rolled back amp1 | 17:52 |
rm_work | but anyway that's totally speculation | 17:53 |
rm_work | all i am going off is the error in that paste, and that it happened right after a port-bind fail | 17:53 |
rm_work | but ANYWAY, original issue, member create error: it's definitely a DB dupe | 17:54 |
rm_work | and that except block is broken | 17:54 |
rm_work | trying to get the full traceback out of logs now | 17:54 |
rm_work | http://paste.openstack.org/show/806504/ | 17:57 |
rm_work | so in this case, de.columns == ['member.uq_member_pool_id_address_protocol_port'] | 17:58 |
rm_work | this seems... obvious? | 17:59 |
rm_work | like .... it failed the constraint. we know about this. | 17:59 |
rm_work | the second `if` should catch that | 17:59 |
rm_work | `set(constraint_list) == set(de.columns)` | 18:00 |
rm_work | did a sqlalchemy update change how it returns? | 18:00 |
rm_work | ['uq_member_pool_id_address_protocol_port'] == ['member.uq_member_pool_id_address_protocol_port'] | 18:00 |
rm_work | it has `member.` now | 18:00 |
johnsom | Maybe | 18:00 |
rm_work | was this always broken or did sqlalchemy change and break it... | 18:01 |
johnsom | There was a major sqla release a while back, but I would dismiss that it has been broken a long time. | 18:01 |
johnsom | would->would not | 18:02 |
rm_work | yeah... | 18:06 |
rm_work | so votes -- should I just update the constraint name here so it matches? | 18:06 |
rm_work | or should I figure out how to rewrite this whole section? | 18:07 |
johnsom | Is there a good reason to *not* have every duplicate here (other than the ID with its special message) raise??? It's during a create | 18:08 |
johnsom | I guess I would have to trace back all of the places that call _validate_create_member and check if there is a case where it should just be passive on a duplicate. | 18:09 |
*** masterpe[m] has joined #openstack-lbaas | 18:10 | |
johnsom | Face value, it seems it should always raise IMO | 18:10 |
rm_work | yes | 18:11 |
rm_work | leaking through to the `return None` seems always dumb | 18:11 |
rm_work | we were trying to return different errors depending on why it failed the insert tho | 18:11 |
rm_work | so I'm thinking this: | 18:12 |
opendevreview | Adam Harwell proposed openstack/octavia master: Fix constraint check on dupe member create https://review.opendev.org/c/openstack/octavia/+/795637 | 18:15 |
johnsom | you might have to put back the "return None" due to a pylint check that all paths have a return.... | 18:22 |
johnsom | Otherwise, that was basically what I was thinking | 18:23 |
rm_work | no because now there's no way to escape | 18:33 |
rm_work | :D | 18:33 |
rm_work | so it can never get there | 18:33 |
johnsom | Ah,yeah, ha | 18:33 |
rm_work | that even existing should have been a red flag I think | 18:33 |
rm_work | still running local tests | 18:37 |
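As an illustration of the fix being discussed (this is neither the existing member.py code nor the content of review 795637; only the constraint name is taken from the paste above, and the return labels are placeholders for the exceptions the controller would raise):

    def classify_duplicate(de_columns):
        # Strip any "<table>." prefix so 'member.uq_member_...' matches 'uq_member_...'.
        reported = {c.split('.')[-1] for c in de_columns}
        if reported == {'id'}:
            return 'duplicate_id'          # special-cased "ID already exists" message
        if reported == {'uq_member_pool_id_address_protocol_port'}:
            return 'duplicate_member'      # the pool_id/address/protocol_port constraint
        # Any other duplicate is still an error; never fall through to "return None".
        return 'duplicate_member'

    # The mismatch rm_work hit: newer SQLAlchemy/oslo.db report the constraint
    # name with the table prefix attached.
    assert classify_duplicate(['member.uq_member_pool_id_address_protocol_port']) == 'duplicate_member'
    assert classify_duplicate(['id']) == 'duplicate_id'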
rm_work | hmmm UNRELATED lint errors: | 18:39 |
rm_work | octavia/amphorae/backends/utils/ip_advertisement.py:16:0: E0611: No name 'pack' in module 'struct' (no-name-in-module) | 18:39 |
rm_work | octavia/amphorae/backends/utils/ip_advertisement.py:17:0: E0611: No name 'unpack' in module 'struct' (no-name-in-module) | 18:39 |
rm_work | is that an osx issue? | 18:39 |
johnsom | I haven't seen that issue. That code hasn't changed since I wrote it over a year ago | 18:40 |
johnsom | Looks to still exist in python | 18:41 |
johnsom | https://docs.python.org/3/library/struct.html | 18:41 |
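Both import spellings work at runtime, which suggests the E0611 warnings on the C-backed struct module are an environment-specific pylint/astroid false positive rather than a code problem; a quick check:

    from struct import pack, unpack   # the style flagged by E0611 in ip_advertisement.py
    import struct                     # equivalent access that avoids the warning

    # Both resolve to the same C-implemented functions at runtime.
    assert pack('!H', 80) == struct.pack('!H', 80) == b'\x00\x50'
    assert unpack('!H', b'\x00\x50') == (80,)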
johnsom | https://dilbert.com/strip/1995-06-24 | 18:42 |
rm_work | lol | 18:53 |
rm_work | pretty much all of dilbert still holds up :D | 18:53 |
rm_work | if only scott adams wasn't a total fuckwit | 18:54 |
rm_work | I still think about this every time I have any discussion of art: https://www.youtube.com/watch?v=u05S4ikHdNc | 18:55 |
rm_work | johnsom: do kinda wish I could update flavor-profiles without having to do UPDATE in the DB <_< | 19:39 |
johnsom | You are just going to change a flavor definition out from under deployed load balancers? | 19:40 |
rm_work | yes | 19:41 |
rm_work | because it doesn't matter | 19:42 |
rm_work | not everything affects existing LBs | 19:42 |
rm_work | I need to add vip_subnet_selection_tag | 19:42 |
rm_work | only affects creates | 19:43 |
rm_work | I still don't believe these should be immutable | 19:45 |
rm_work | hmmm, default `graceful_shutdown_timeout` of 60s might be a bit low T_T | 20:07 |
rm_work | if LB create just started, 60s is not very long | 20:07 |
rm_work | I realize it can be easily changed to match your cloud, but default values should probably be a bit on the conservative side :D | 20:08 |
johnsom | Where is that setting? | 20:17 |
johnsom | Hmm, wonder if that is used. Also, we set it to 300 in devstack plugin, so, yeah, 60 seems a bit short. I mean a provision should be 30-45 seconds, but some clouds are..... not so good. | 20:19 |
johnsom | I can think of one I saw take up to 15 minutes as it was totally overloaded | 20:20 |
johnsom | VM boot time that is | 20:20 |
*** spatel has quit IRC | 20:42 | |
rm_work | johnsom: pretty sure it's used -- we had it come up during a restart earlier, where it timed out and stuck something in PENDING | 21:43 |
rm_work | I don't see it used directly? but I guess maybe cotyledon uses it internally? | 21:44 |
johnsom | A search shows it's part of oslo service, which I thought we were not using for worker | 21:45 |
johnsom | I may be remembering that wrong though | 21:45 |
rm_work | oslo_config_glue.setup(sm, CONF, reload_method="mutate") | 21:47 |
rm_work | I think that prolly does the magic | 21:47 |
rm_work | or this | 21:47 |
rm_work | sm.add(consumer_v1.ConsumerService, workers=CONF.controller_worker.workers, args=(CONF,)) | 21:47 |
rm_work | prolly that one | 21:47 |
rm_work | part of how it's a "drop in" replacement | 21:48 |
johnsom | Yeah, could be | 21:48 |
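To close the loop, here is a minimal sketch of the cotyledon wiring quoted above, assuming (as suspected in this discussion) that cotyledon's oslo_config_glue is what registers service options such as graceful_shutdown_timeout; the ConsumerService body and worker count are placeholders rather than Octavia's actual code:

    import time

    import cotyledon
    from cotyledon import oslo_config_glue
    from oslo_config import cfg

    CONF = cfg.CONF


    class ConsumerService(cotyledon.Service):
        def __init__(self, worker_id, conf):
            super().__init__(worker_id)
            self.conf = conf

        def run(self):
            # Stand-in for the real RPC-consuming loop.
            while True:
                time.sleep(1)


    def main():
        sm = cotyledon.ServiceManager()
        # Registers the shared service options (graceful_shutdown_timeout, etc.)
        # on CONF and enables config reload via "mutate" on SIGHUP.
        oslo_config_glue.setup(sm, CONF, reload_method="mutate")
        sm.add(ConsumerService, workers=2, args=(CONF,))
        sm.run()  # blocks; forks and supervises the worker processes


    if __name__ == "__main__":
        main()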