*** zzzeek has quit IRC | 00:13 | |
*** zzzeek has joined #openstack-lbaas | 00:15 | |
*** zzzeek has quit IRC | 00:20 | |
*** zzzeek has joined #openstack-lbaas | 00:21 | |
*** zzzeek has quit IRC | 00:33 | |
*** zzzeek has joined #openstack-lbaas | 00:35 | |
*** ianychoi__ is now known as ianychoi | 00:44 | |
sorrison | I've deleted all the amphorae and their ports manually and then attempted a failover, but I'm still getting the same error: `"[ALERT] 259/005926 (1009) : Proxy '2303926f-e0f0-4d64-a55a-91ac55e33bcb:0eff681a-2839-4327-9f1e-341d861edc54': unable to find local peer 'y-mDIaEIktlflHuOB0-4vAQzCfM' in peers section '9924384542d1415586494f01c823ab2a_peers'.\n[WARNING] 259/005926 (1009) : Removing incomplete section 'peers 9924384542d1415586494f01c823ab2a_peers' (no peer named 'y-mDIaEIktlflHuOB0-4vAQzCfM').\n[ALERT] 259/005926 (1009) : Fatal errors found in configuration.` | 01:01 |
*** sapd1 has joined #openstack-lbaas | 01:15 | |
*** zzzeek has quit IRC | 01:22 | |
*** zzzeek has joined #openstack-lbaas | 01:24 | |
*** zzzeek has quit IRC | 01:45 | |
*** zzzeek has joined #openstack-lbaas | 01:47 | |
johnsom | sorrison: I will look tomorrow, can’t now. Please let me know what version you are running. | 01:48 |
sorrison | ok thanks, no worries, I'm still tweaking it. I'm running ussuri with the failover patch. I had a look at what I'm missing from master but can't see anything related at first glance | 01:49 |
*** spatel has joined #openstack-lbaas | 02:33 | |
*** rcernin has quit IRC | 02:46 | |
*** rcernin has joined #openstack-lbaas | 02:56 | |
*** psachin has joined #openstack-lbaas | 02:59 | |
*** armax has quit IRC | 03:32 | |
*** spatel has quit IRC | 04:17 | |
*** gcheresh has joined #openstack-lbaas | 05:10 | |
*** rcernin has quit IRC | 05:19 | |
*** rcernin has joined #openstack-lbaas | 05:28 | |
*** rcernin has quit IRC | 05:28 | |
*** rcernin has joined #openstack-lbaas | 05:28 | |
*** AlexStaf has joined #openstack-lbaas | 05:47 | |
*** vishalmanchanda has joined #openstack-lbaas | 06:40 | |
*** ccamposr__ has joined #openstack-lbaas | 07:21 | |
*** ccamposr has quit IRC | 07:24 | |
*** ataraday_ has joined #openstack-lbaas | 07:43 | |
*** rcernin has quit IRC | 08:06 | |
*** TMM has quit IRC | 08:20 | |
*** TMM has joined #openstack-lbaas | 08:20 | |
rm_work | johnsom: found an interesting one. still working on tracing... but it looks like somehow an amp that WAS on an LB lost its LB linkage | 08:53 |
rm_work | LB is now deleted! | 08:53 |
rm_work | amp still exists, tho ports and everything are gone | 08:53 |
rm_work | loadbalancer_id is None | 08:53 |
rm_work | I found in logs where it used to have a loadbalancer_id... | 08:54 |
rm_work | the only thing I see after a clearly successful provision of the amp (with lb_id) is a failed cert rotate a month later that reverted the amp to ERROR (presumably the compute was gone?), but I'm not sure whether the lb_id was cleared before or after that. I looked through the rotate flow and can't see how it would clear the lb_id (even on a revert), so my guess is it happened before that | 08:55 |
rm_work | we have about 30 amps in this state, out of 155 | 08:56 |
rm_work | so something is causing the amps to lose their LB linkage pretty regularly O_o | 08:56 |
rm_work | all of these amps that are missing lb_id are in ERROR | 08:57 |
rm_work | ah nope, delete of LB happened after the failed cert rotate | 08:57 |
rm_work | never mind, the original delete happened about a week before that cert rotate failed. then housekeeping purged it afterwards. O_o | 09:09 |
rm_work | maybe housekeeping can delink? | 09:09 |
rm_work | if a LB is deleted, is it possible for it to have amp records linked still? and then housekeeping deletes the record fully for the LB but leaves the amps? | 09:10 |
dulek | johnsom: Just a thought regarding o-hm… Would it be possible to just disable it? I don't think we are using any healthchecks on the gate at the moment. | 11:16 |
cgoncalves | dulek, I'd strongly advise against disabling o-hm. it serves many purposes, including setting new LB operating status to ONLINE | 11:19 |
cgoncalves | dulek, if you have amphora spare pool disabled and your amps are short-lived ones (just CI testing), you could disable o-hk | 11:22 |
*** zzzeek has quit IRC | 11:27 | |
dulek | cgoncalves: Right, it's just in CI context. Okay, thanks for explanations! | 11:29 |
*** zzzeek has joined #openstack-lbaas | 11:30 | |
*** servagem has joined #openstack-lbaas | 11:58 | |
*** ramishra has quit IRC | 11:59 | |
*** ramishra has joined #openstack-lbaas | 12:21 | |
openstackgerrit | Merged openstack/octavia master: Add a release note about HAProxy 2.0 https://review.opendev.org/750531 | 12:24 |
*** vishalmanchanda has quit IRC | 13:04 | |
*** sapd1_x has joined #openstack-lbaas | 13:19 | |
openstackgerrit | Carlos Goncalves proposed openstack/octavia master: Fix AttributeError on TLS-enabled pool provisioning https://review.opendev.org/752239 | 13:38 |
*** TrevorV has joined #openstack-lbaas | 14:02 | |
*** ataraday_ has quit IRC | 14:11 | |
*** armax has joined #openstack-lbaas | 14:37 | |
*** AlexStaf has quit IRC | 14:50 | |
*** vishalmanchanda has joined #openstack-lbaas | 14:58 | |
johnsom | sorrison Any chance you can include the full worker log output for a load balancer failover attempt? | 15:15 |
*** psachin has quit IRC | 15:40 | |
*** ataraday_ has joined #openstack-lbaas | 15:58 | |
johnsom | #startmeeting Octavia | 16:01 |
openstack | Meeting started Wed Sep 16 16:01:08 2020 UTC and is due to finish in 60 minutes. The chair is johnsom. Information about MeetBot at http://wiki.debian.org/MeetBot. | 16:01 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 16:01 |
*** openstack changes topic to " (Meeting topic: Octavia)" | 16:01 | |
openstack | The meeting name has been set to 'octavia' | 16:01 |
ataraday_ | hi | 16:01 |
johnsom | Hi everyone (anyone?). I know a few folks are not able to make the meeting today. | 16:01 |
aannuusshhkkaa | hey | 16:01 |
johnsom | #topic Announcements | 16:02 |
*** openstack changes topic to "Announcements (Meeting topic: Octavia)" | 16:02 | |
johnsom | I will jump right in | 16:02 |
johnsom | We are in feature freeze for the Victoria release. | 16:02 |
johnsom | #link https://releases.openstack.org/victoria/schedule.html | 16:02 |
cgoncalves | Hi | 16:02 |
haleyb | o/ | 16:02 |
johnsom | We should not be merging any new features for Victoria at this time. | 16:02 |
johnsom | Our focus should be on getting bug fixes merged for the release candidate. | 16:03 |
johnsom | RC1 will be cut next week | 16:03 |
johnsom | Also an FYI, I will be giving an Octavia project update presentation at OpenInfra Days Turkey next week. | 16:04 |
johnsom | #link https://openinfradayturkey.com/schedule/ | 16:04 |
cgoncalves | Nice! Congrats and thanks! | 16:04 |
johnsom | They asked if I could present through the OpenStack speaker bureau and I accepted. | 16:04 |
johnsom | Finally, I wanted to mention that yesterday we (the infra team) found the cause of the missing pypa packages. | 16:05 |
johnsom | pypa had a mirror source server for the CDN that ran out of disk space in August and was not getting new packages. | 16:06 |
johnsom | That has been disabled and things should be back to normal today. | 16:06 |
johnsom | Recheck patches as needed. | 16:06 |
johnsom | Any other announcements this week? | 16:07 |
johnsom | Thank you all again for your hard work reviewing in the lead up to Victoria. We have made great progress. | 16:07 |
johnsom | #topic Brief progress reports / bugs needing review | 16:08 |
*** openstack changes topic to "Brief progress reports / bugs needing review (Meeting topic: Octavia)" | 16:08 | |
ataraday_ | I prepared a follow-up backport of the failover refactor for amphorav2 https://review.opendev.org/#/c/750632/ (since we merged the backport for v1) | 16:08 |
cgoncalves | I've just come back from a 2-week vacation, so not much from my side. Yesterday I started on ALPN and HTTP/2 support on the backend side. Patches are posted | 16:08 |
johnsom | I have been busy reviewing patches, created a couple of bug fixes for the new failover flow, and have been rebasing things. | 16:08 |
johnsom | ataraday_ Thank you. I lost track of that. | 16:09 |
cgoncalves | I found a bug where TLS-enabled pools fail to create. This may be worth prioritizing for RC1 | 16:09 |
cgoncalves | #link https://review.opendev.org/752239 | 16:10 |
johnsom | Yes, I will add it to the list | 16:10 |
ataraday_ | Question about amphora delete for amphorav2 (https://review.opendev.org/#/c/750081/): should we block it? Is it a feature or a follow-up change for a feature? :D | 16:10 |
johnsom | I really need to get back and create those tempest tests for backend re-encrypt | 16:10 |
gthiemonge | I'm working on the SCTP support in the amphora, and I'm not really happy with the way UDP is handled, so I'm thinking about refactoring a part of the UDP/keepalived support in the amphora driver | 16:10 |
johnsom | ataraday_ In my boot it is fixing a bug of the code being missing from the v2 driver. grin | 16:11 |
johnsom | But open for opinion | 16:11 |
johnsom | gthiemonge Feel free | 16:11 |
*** ccamposr has joined #openstack-lbaas | 16:11 | |
johnsom | boot->book | 16:11 |
ataraday_ | johnsom, this sounds right to me. Just wanted to check | 16:11 |
johnsom | A lot of good stuff going on. | 16:13 |
johnsom | #topic Priority bug reviews for Victoria | 16:13 |
*** openstack changes topic to "Priority bug reviews for Victoria (Meeting topic: Octavia)" | 16:13 | |
johnsom | Just a quick reminder, we are still working from the etherpad of patches that need to be reviewed for Victoria. | 16:13 |
johnsom | #link https://etherpad.opendev.org/p/octavia-priority-reviews | 16:13 |
johnsom | I will go through and update the list after the meeting. | 16:14 |
johnsom | I expect we will be doing the RC1 cut next week, after the meeting. | 16:14 |
*** ccamposr__ has quit IRC | 16:14 | |
johnsom | #topic Open Discussion | 16:14 |
*** openstack changes topic to "Open Discussion (Meeting topic: Octavia)" | 16:15 | |
johnsom | Any other topics this week? | 16:15 |
johnsom | PTL nominations open next week. Please consider running. I am happy to answer questions, etc. about the role. | 16:15 |
johnsom | Ok, if that is all today I will give you some review time back. grin | 16:17 |
johnsom | Thanks everyone! | 16:17 |
johnsom | #endmeeting | 16:17 |
*** openstack changes topic to "Discussions for OpenStack Octavia | Priority bug review list: https://etherpad.openstack.org/p/octavia-priority-reviews" | 16:17 | |
openstack | Meeting ended Wed Sep 16 16:17:27 2020 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 16:17 |
ataraday_ | thank you! | 16:17 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/octavia/2020/octavia.2020-09-16-16.01.html | 16:17 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/octavia/2020/octavia.2020-09-16-16.01.txt | 16:17 |
openstack | Log: http://eavesdrop.openstack.org/meetings/octavia/2020/octavia.2020-09-16-16.01.log.html | 16:17 |
*** AlexStaf has joined #openstack-lbaas | 16:39 | |
*** ataraday_ has quit IRC | 16:42 | |
*** AlexStaf has quit IRC | 16:48 | |
*** tobberydberg has quit IRC | 16:51 | |
*** tobberydberg_ has joined #openstack-lbaas | 16:51 | |
*** sapd1_x has quit IRC | 16:52 | |
*** vishalmanchanda has quit IRC | 18:28 | |
openstackgerrit | Merged openstack/octavia master: Add amphora delete support to amphorav2 driver https://review.opendev.org/750081 | 18:47 |
openstackgerrit | Merged openstack/octavia stable/ussuri: Update grenade job to run one smoke test https://review.opendev.org/751854 | 19:23 |
openstackgerrit | Michael Johnson proposed openstack/octavia stable/ussuri: Remove haproxy_check_script for UDP-only LBs https://review.opendev.org/751907 | 19:27 |
openstackgerrit | Michael Johnson proposed openstack/octavia stable/ussuri: Fix memory consumption issues with default connection_limit https://review.opendev.org/747651 | 19:27 |
*** gcheresh has quit IRC | 20:00 | |
*** ccamposr__ has joined #openstack-lbaas | 20:13 | |
*** ccamposr has quit IRC | 20:14 | |
haleyb | johnsom: sorry, i know i asked this question before, but do you remember which change fixed the 'insert-headers is not a valid option for a TCP protocol listener.' failure? we're re-enabling the API tests in the ovn provider and three of the listener tests trip over this | 20:47 |
johnsom | This one? https://review.opendev.org/#/c/744047/ | 20:51 |
*** sapd1 has quit IRC | 20:52 | |
*** sapd1_x has joined #openstack-lbaas | 20:52 | |
johnsom | haleyb Or do you mean this one? https://review.opendev.org/744805 | 20:52 |
haleyb | johnsom: i think the first one, but if the api tests require the second that would explain my failure | 20:53 |
haleyb | the tripleo job is failing on the scenario tests, which would be the first one i think though | 20:54 |
johnsom | haleyb The rework of the API test to remove the broken OVN stuff is the second patch. (The API version of the scenario test fix patch) | 20:54 |
haleyb | johnsom: ack, the first one should have helped with the tripleo failure then, so more digging needed, but thanks for the link, I couldn't find that one | 20:56 |
*** zzzeek has quit IRC | 21:05 | |
*** zzzeek has joined #openstack-lbaas | 21:07 | |
haleyb | johnsom: so the tripleo scenario job is somehow using an older version of octavia-tempest-plugin code, guess i'll be learning more about that tomorrow :-/ | 21:07 |
johnsom | Neat | 21:11 |
*** zzzeek has quit IRC | 21:22 | |
*** zzzeek has joined #openstack-lbaas | 21:25 | |
*** rcernin has joined #openstack-lbaas | 21:31 | |
*** rcernin has quit IRC | 21:36 | |
*** TrevorV has quit IRC | 21:57 | |
rm_work | johnsom: do you have ANY idea how amp records could *become disassociated* from an LB (and that LB be deleted)? | 22:06 |
rm_work | I guess I could see housekeeping doing that maybe? if it's possible for an amp record to exist in ERROR state on a DELETED LB, maybe on the cleanup it somehow is set to remove the linkage to avoid FK constraint? | 22:07 |
johnsom | rm_work Well, the sqlalchemy ORM magic may be set up to cascade delete or referentially delete. | 22:07 |
rm_work | if this were one or two i might ignore it but it's like 20% of our amp records | 22:07 |
rm_work | the fact that it is POSSIBLE worries me | 22:08 |
rm_work | I assume that amps got in a weird state and this happened due to the various errors we had around failover | 22:08 |
johnsom | Yeah, that is super odd. So you have amps, but no lb_ID on them? Not spares right? | 22:08 |
rm_work | but it really shouldn't be possible to null-out an amp record's LB-ID | 22:08 |
rm_work | yes | 22:08 |
rm_work | and looking at logs, they HAD lb_ids | 22:08 |
rm_work | but at some point they were cleared | 22:08 |
rm_work | the thing in common is that none of those LBs exist anymore | 22:09 |
rm_work | so something about the delete is actually ... clearing out the amp record's FK relation | 22:09 |
rm_work | rather than stopping the delete from happening because of an existing amp record (probably what SHOULD happen?) | 22:09 |
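A minimal, stdlib-only sketch of the mechanism rm_work suspects, using `sqlite3` as a stand-in for the real database and ORM. The schema is hypothetical: it keeps only the columns needed here, with names assumed to mirror the `amphora` and `load_balancer` tables. With foreign keys enforced, a bare `DELETE` of the load balancer is refused while an amphora row still references it. If the ORM first issues `UPDATE ... SET load_balancer_id = NULL` (SQLAlchemy's default when a parent is deleted and the relationship has no `delete` cascade), the delete succeeds and leaves an orphaned ERROR amphora. Whether Octavia's models actually take this path is an assumption, not something confirmed in the discussion.

```python
import sqlite3

# Hypothetical, minimal schema: only the columns needed to show the effect.
SCHEMA = """
CREATE TABLE load_balancer (id TEXT PRIMARY KEY);
CREATE TABLE amphora (
    id TEXT PRIMARY KEY,
    status TEXT NOT NULL,
    load_balancer_id TEXT REFERENCES load_balancer (id)
);
"""


def make_db():
    """Create an in-memory DB with one LB and one ERROR amphora linked to it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO load_balancer (id) VALUES ('lb1')")
    conn.execute(
        "INSERT INTO amphora (id, status, load_balancer_id) "
        "VALUES ('amp1', 'ERROR', 'lb1')"
    )
    conn.commit()
    return conn


def plain_delete(conn):
    """Delete the LB directly; return False if the FK constraint blocks it."""
    try:
        conn.execute("DELETE FROM load_balancer WHERE id = 'lb1'")
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()
        return False


def orm_style_delete(conn):
    """Mimic an ORM that de-associates children (FK set to NULL) before deleting the parent."""
    conn.execute(
        "UPDATE amphora SET load_balancer_id = NULL WHERE load_balancer_id = 'lb1'"
    )
    conn.execute("DELETE FROM load_balancer WHERE id = 'lb1'")
    conn.commit()


def find_orphaned_amps(conn):
    """List amphora rows left in ERROR with no load balancer linkage."""
    rows = conn.execute(
        "SELECT id FROM amphora WHERE load_balancer_id IS NULL AND status = 'ERROR'"
    )
    return [row[0] for row in rows]
```

On a fresh DB, `plain_delete` returns `False`. After `orm_style_delete`, the LB row is gone and `find_orphaned_amps` returns `['amp1']`, which matches the state described: ERROR amps with no lb_id and no surviving LB.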
*** rcernin has joined #openstack-lbaas | 22:09 | |
johnsom | This rings a bell, but I'm not sure which one | 22:11 |
rm_work | k just ... ruminate on that i guess :D | 22:12 |
rm_work | i'll keep trying to get newer code deployed with fixes for the failover issue, and the amp-delete function, then i will clean all those up, and then see if it happens again | 22:12 |
rm_work | also I keep meaning to work on an auditing script for octavia objects | 22:13 |
rm_work | so i can see if there are orphaned ports/compute/etc created in octavia projects | 22:13 |
rm_work | have you worked on anything like that? | 22:13 |
johnsom | Yeah, I will poke around and see if I can see something in the code | 22:13 |
rm_work | k, lower priority than getting that fix merged tho | 22:13 |
johnsom | No. The new failover will cleanup vip and vrrp ports, but the member ports we don't track in the DB (yet), so we could leak there. | 22:14 |
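A rough sketch of the audit rm_work describes: flag ports that neither appear in the Octavia database nor belong to a live amphora instance. The inputs are hypothetical. `ports` stands for whatever a Neutron port listing returns, reduced here to dicts with `id` and `device_id`. The two ID sets would come from the Octavia DB and a Nova server listing. Checking `device_id` against live compute IDs is meant to cover member ports, which johnsom notes are not tracked in the DB.

```python
from typing import Iterable, Mapping


def find_orphaned_ports(
    ports: Iterable[Mapping[str, str]],
    db_port_ids: set[str],
    live_compute_ids: set[str],
) -> list[str]:
    """Return IDs of ports that no DB record or live amphora instance accounts for.

    A port counts as accounted for if:
      * its ID is recorded in the Octavia DB (e.g. VIP or VRRP ports), or
      * its device_id is a compute instance that still exists (this covers
        member ports, which are attached to the amphora but not tracked in the DB).
    """
    orphans = []
    for port in ports:
        if port["id"] in db_port_ids:
            continue
        if port.get("device_id") in live_compute_ids:
            continue
        orphans.append(port["id"])
    return sorted(orphans)
```

Anything this returns is only a candidate for manual review, not something to delete automatically, since a port may be mid-creation when the listings are taken.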
*** rcernin has quit IRC | 22:18 | |
rm_work | I want to audit just because given the number of weird failures (and especially this weird disassociation) I don't 100% trust that and I would like to do a sweep and see if there's anything unexpected, just to make sure. | 22:22 |
rm_work | That will improve my trust level. | 22:22 |
rm_work | We're seeing an abnormally high port usage number for the LBs we have, so I've been asked to check it out. | 22:22 |
*** rcernin has joined #openstack-lbaas | 22:32 | |
*** rcernin has quit IRC | 22:33 | |
*** rcernin has joined #openstack-lbaas | 22:33 | |
johnsom | Well, something to note too, only LB failover cleans them up and only with the new failover patch. | 22:36 |
*** zzzeek has quit IRC | 22:38 | |
rm_work | hmm | 22:39 |
rm_work | yeah i need to write some script for this audit probably :/ | 22:39 |
*** zzzeek has joined #openstack-lbaas | 22:40 | |
johnsom | Otherwise if something went wrong and it didn't get deleted it would still be there. | 22:40 |
johnsom | rm_work I did a search and could not find a place we update the amphora record that would remove the lb_id. | 22:54 |
sorrison | rm_work: yeah we see this issue too | 22:55 |
rm_work | johnsom: yeah i couldn't, so my assumption is it's something with our FK / cascade settings in the ORM | 22:55 |
rm_work | and the decision it makes on a delete isn't obvious | 22:55 |
rm_work | there's nowhere we actually null the field out explicitly, for sure | 22:55 |
rm_work | sorrison: yours is also with deleted LBs? probably all post-housekeeping-purge I would guess? | 22:56 |
rm_work | that's my current #1 theory | 22:56 |
rm_work | housekeeping purge breaks the FK link by doing a null | 22:56 |
sorrison | Yeah I think that is what is going on | 22:57 |
johnsom | But why would there still be amps if the record qualifies for purge (DELETED)? | 22:57 |
* rm_work shrugs | 22:58 | |
rm_work | not sure | 22:58 |
johnsom | What status are the amps in? | 22:58 |
sorrison | the ones I see aren't in DELETED state in the DB | 22:58 |
rm_work | amp in ERROR still allows for a delete to complete somehow? | 22:58 |
*** zzzeek has quit IRC | 22:58 | |
sorrison | ERROR | 22:58 |
rm_work | all ERROR | 22:58 |
johnsom | And there are nova instances behind them, or just DB records? | 22:59 |
rm_work | just DB records AFAICT | 22:59 |
rm_work | seems the compute/ports are deleted on the ones I checked | 23:00 |
rm_work | but there's 30 and i'm not doing that by hand | 23:00 |
rm_work | thus, audit script | 23:00 |
*** zzzeek has joined #openstack-lbaas | 23:00 | |
johnsom | Ok, that is making slightly more sense, maybe | 23:00 |
sorrison | yeah the compute and ports are all gone, we've had around ~30 of them in the past month or so | 23:02 |
sorrison | We just mark them as deleted in the DB to clean it up for now | 23:03 |
sorrison | on a different note, johnsom: here are the worker logs for my broken LB that won't failover http://paste.openstack.org/show/797962/ | 23:08 |
johnsom | sorrison I sent you a message earlier. On the issue you are seeing, could you get me the full worker log for an LB failover attempt that failed? | 23:09 |
johnsom | lol | 23:09 |
sorrison | hehe, you make all better now :-) | 23:09 |
rm_work | rofl | 23:12 |
johnsom | sorrison Ok, that gives me what I need to figure it out. | 23:13 |
johnsom | Thanks | 23:13 |
sorrison | I have deleted the existing errored nova instance and ports and also purged the amphora db records | 23:14 |
johnsom | It's haproxy complaining, I will need to figure out why. | 23:15 |
*** zzzeek has quit IRC | 23:17 | |
*** zzzeek has joined #openstack-lbaas | 23:19 | |
sorrison | could it be something to do with the listener? | 23:20 |
sorrison | listener looks normal | 23:21 |
johnsom | Well, it's probably related to there being zero amphorae on the LB, but I'm not sure why. With zero other amp records, the peers section should contain only the amp itself. | 23:22 |
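HAProxy identifies its own entry in a `peers` section by the local peer name (the `-L` command-line option, or the hostname if `-L` is not given). When no `peer` line matches, it reports "unable to find local peer", and in sorrison's log that was followed by "Fatal errors found in configuration". The following minimal sketch covers the invariant johnsom describes: a rendered peers section that always contains the local amphora, even when no other amphora records exist. The function names, the default peer port of 1025 and the addresses are assumptions for illustration, not Octavia's actual template code.

```python
def render_peers_section(section, local_name, local_addr, others, port=1025):
    """Render a haproxy 'peers' section that always includes the local peer.

    `others` maps peer names to addresses for the remaining amphorae; it may be
    empty (for example when the other amphora records are gone).
    """
    entries = {local_name: local_addr}
    entries.update(others)
    lines = [f"peers {section}"]
    for name in sorted(entries):
        lines.append(f"    peer {name} {entries[name]}:{port}")
    return "\n".join(lines) + "\n"


def has_local_peer(config_text, section, local_name):
    """Check that `section` contains a 'peer' line named `local_name`.

    Assumes the conventional layout: section keywords start at column 0 and
    their contents are indented.
    """
    in_section = False
    for raw in config_text.splitlines():
        words = raw.split()
        if not words:
            continue
        if not raw[0].isspace():
            # A non-indented line starts a new section.
            in_section = words[:2] == ["peers", section]
            continue
        if in_section and words[0] == "peer" and len(words) > 1 and words[1] == local_name:
            return True
    return False
```

For example, `render_peers_section("9924384542d1415586494f01c823ab2a_peers", "y-mDIaEIktlflHuOB0-4vAQzCfM", "192.0.2.10", {})` yields a one-peer section that passes `has_local_peer`. A section listing only other amphorae would fail it, which is the condition HAProxy rejected.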
sorrison | I had the same error when there were 2 amphora there (one was deleted) | 23:33 |
sorrison | *nova instance was gone but both amps were in ERROR | 23:33 |
johnsom | Well, there are cases where it could happen but it should not revert | 23:34 |
*** zzzeek has quit IRC | 23:43 | |
*** zzzeek has joined #openstack-lbaas | 23:44 | |
*** zzzeek has quit IRC | 23:49 | |
*** zzzeek has joined #openstack-lbaas | 23:52 | |
*** armax has quit IRC | 23:52 | |
johnsom | Ugh, well, booting an LB and clearing the LB_ID from the amps doesn't trigger it. | 23:53 |
*** zzzeek has quit IRC | 23:57 | |
johnsom | sorrison Are you using a centos image? Which version? | 23:58 |
sorrison | na, using bionic | 23:58 |
*** zzzeek has joined #openstack-lbaas | 23:58 | |
johnsom | Custom haproxy in the image? | 23:58 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!