Wednesday, 2020-09-16

*** zzzeek has quit IRC00:13
*** zzzeek has joined #openstack-lbaas00:15
*** zzzeek has quit IRC00:20
*** zzzeek has joined #openstack-lbaas00:21
*** zzzeek has quit IRC00:33
*** zzzeek has joined #openstack-lbaas00:35
*** ianychoi__ is now known as ianychoi00:44
sorrisonI've deleted all the amphorae and their ports manually and then attempted a failover, but I'm still getting the same error: `"[ALERT] 259/005926 (1009) : Proxy '2303926f-e0f0-4d64-a55a-91ac55e33bcb:0eff681a-2839-4327-9f1e-341d861edc54': unable to find local peer 'y-mDIaEIktlflHuOB0-4vAQzCfM' in peers section '9924384542d1415586494f01c823ab2a_peers'.\n[WARNING] 259/005926 (1009) : Removing incomplete section 'peers 9924384542d1415586494f01c823ab2a_peers' (no peer named 'y-mDIaEIktlflHuOB0-4vAQzCfM').\n[ALERT] 259/005926 (1009) : Fatal errors found in configuration.`01:01
*** sapd1 has joined #openstack-lbaas01:15
*** zzzeek has quit IRC01:22
*** zzzeek has joined #openstack-lbaas01:24
*** zzzeek has quit IRC01:45
*** zzzeek has joined #openstack-lbaas01:47
johnsomsorrison: I will look tomorrow, can’t now. Please let me know what version you are running.01:48
sorrisonok thanks, no worries, I'm still tweaking it. I'm running ussuri with the failover patch. I had a look at what I'm missing from master but can't see anything related on first glance01:49
*** spatel has joined #openstack-lbaas02:33
*** rcernin has quit IRC02:46
*** rcernin has joined #openstack-lbaas02:56
*** psachin has joined #openstack-lbaas02:59
*** armax has quit IRC03:32
*** spatel has quit IRC04:17
*** gcheresh has joined #openstack-lbaas05:10
*** rcernin has quit IRC05:19
*** rcernin has joined #openstack-lbaas05:28
*** rcernin has quit IRC05:28
*** rcernin has joined #openstack-lbaas05:28
*** AlexStaf has joined #openstack-lbaas05:47
*** vishalmanchanda has joined #openstack-lbaas06:40
*** ccamposr__ has joined #openstack-lbaas07:21
*** ccamposr has quit IRC07:24
*** ataraday_ has joined #openstack-lbaas07:43
*** rcernin has quit IRC08:06
*** TMM has quit IRC08:20
*** TMM has joined #openstack-lbaas08:20
rm_workjohnsom: found an interesting one. still working on tracing... but it looks like somehow an amp that WAS on an LB lost its LB linkage08:53
rm_workLB is now deleted!08:53
rm_workamp still exists, tho ports and everything are gone08:53
rm_workloadbalancer_id is None08:53
rm_workI found in logs where it used to have a loadbalancer_id...08:54
rm_workthe only thing I see that happened after I clearly see a successful provision of the amp (with lb_id) is a month later a failed cert rotate that reverted the amp to error (presumably the compute was gone?) but not sure if the lb_id was cleared before or after. Looked through the rotate flow, can't see how it would clear the lb_id (even on a revert) so my guess is it happened before that08:55
rm_workwe have about 30 amps in this state, out of 15508:56
rm_workso something is causing the amps to lose their LB linkage pretty regularly O_o08:56
rm_workall of these amps that are missing lb_id are in ERROR08:57
rm_workah nope, delete of LB happened after the failed cert rotate08:57
rm_worknevermind, original delete happened like a week before that cert-rotate failed. then housekeeping deleted it after. O_o09:09
rm_workmaybe housekeeping can delink?09:09
rm_workif a LB is deleted, is it possible for it to have amp records linked still? and then housekeeping deletes the record fully for the LB but leaves the amps?09:10
dulekjohnsom: Just a thought regarding o-hm… Would it be possible to just disable it? I don't think we are using any healthchecks on the gate at the moment.11:16
cgoncalvesdulek, I'd strongly advise against disabling o-hm. it serves many purposes, including setting new LB operating status to ONLINE11:19
cgoncalvesdulek, if you have amphora spare pool disabled and your amps are short-lived ones (just CI testing), you could disable o-hk11:22
*** zzzeek has quit IRC11:27
dulekcgoncalves: Right, it's just in CI context. Okay, thanks for explanations!11:29
*** zzzeek has joined #openstack-lbaas11:30
*** servagem has joined #openstack-lbaas11:58
*** ramishra has quit IRC11:59
*** ramishra has joined #openstack-lbaas12:21
openstackgerritMerged openstack/octavia master: Add a release note about HAProxy 2.0  https://review.opendev.org/75053112:24
*** vishalmanchanda has quit IRC13:04
*** sapd1_x has joined #openstack-lbaas13:19
openstackgerritCarlos Goncalves proposed openstack/octavia master: Fix AttributeError on TLS-enabled pool provisioning  https://review.opendev.org/75223913:38
*** TrevorV has joined #openstack-lbaas14:02
*** ataraday_ has quit IRC14:11
*** armax has joined #openstack-lbaas14:37
*** AlexStaf has quit IRC14:50
*** vishalmanchanda has joined #openstack-lbaas14:58
johnsomsorrison Any chance you can include the full worker log output for a load balancer failover attempt?15:15
*** psachin has quit IRC15:40
*** ataraday_ has joined #openstack-lbaas15:58
johnsom#startmeeting Octavia16:01
openstackMeeting started Wed Sep 16 16:01:08 2020 UTC and is due to finish in 60 minutes.  The chair is johnsom. Information about MeetBot at http://wiki.debian.org/MeetBot.16:01
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.16:01
*** openstack changes topic to " (Meeting topic: Octavia)"16:01
openstackThe meeting name has been set to 'octavia'16:01
ataraday_hi16:01
johnsomHi everyone (anyone?). I know a few folks are not able to make the meeting today.16:01
aannuusshhkkaahey16:01
johnsom#topic Announcements16:02
*** openstack changes topic to "Announcements (Meeting topic: Octavia)"16:02
johnsomI will jump right in16:02
johnsomWe are in feature freeze for the Victoria release.16:02
johnsom#link https://releases.openstack.org/victoria/schedule.html16:02
cgoncalvesHi16:02
haleybo/16:02
johnsomWe should not be merging any new features for Victoria at this time.16:02
johnsomOur focus should be on getting bug fixes merged for the release candidate.16:03
johnsomRC1 will be cut next week16:03
johnsomAlso an FYI, I will be giving an Octavia project update presentation at OpenInfra Days Turkey next week.16:04
johnsom#link https://openinfradayturkey.com/schedule/16:04
cgoncalvesNice! Congrats and thanks!16:04
johnsomThey asked if I could present through the OpenStack speaker bureau and I accepted.16:04
johnsomFinally, I wanted to mention we (the infra team) found the problem with pypa packages being missing yesterday.16:05
johnsompypa had a mirror source server for the CDN that ran out of disk space in August and was not getting new packages.16:06
johnsomThat has been disabled and things should be back to normal today.16:06
johnsomRecheck patches as needed.16:06
johnsomAny other announcements this week?16:07
johnsomThank you all again for your hard work reviewing in the lead up to Victoria. We have made great progress.16:07
johnsom#topic Brief progress reports / bugs needing review16:08
*** openstack changes topic to "Brief progress reports / bugs needing review (Meeting topic: Octavia)"16:08
ataraday_I prepared a followup backport for failover refactor for amphorav2 https://review.opendev.org/#/c/750632/ (as we merged backport for v1)16:08
cgoncalvesI've just come back from 2-week vacation so not much from my side. I started yesterday ALPN and HTTP/2 support in the backend side. Patches are posted16:08
johnsomI have been busy reviewing patches, created a couple of bug fixes for the new failover flow, and have been rebasing things.16:08
johnsomataraday_ Thank you. I lost track of that.16:09
cgoncalvesI found a bug where TLS-enabled pools fail to create. This should maybe be prioritized for RC116:09
cgoncalves#link https://review.opendev.org/75223916:10
johnsomYes, I will add it to the list16:10
ataraday_Question about delete for amphorav2 https://review.opendev.org/#/c/750081/ - should we block it? Is it a feature or followup change for a feature :D16:10
johnsomI really need to get back and create those tempest tests for backend re-encrypt16:10
gthiemongeI'm working on the SCTP support in the amphora, and I'm not really happy with the way UDP is handled, so I'm thinking about refactoring a part of the UDP/keepalived support in the amphora driver16:10
johnsomataraday_ In my boot it is fixing a bug of the code being missing from the v2 driver. grin16:11
johnsomBut open for opinion16:11
johnsomgthiemonge Feel free16:11
*** ccamposr has joined #openstack-lbaas16:11
johnsomboot->book16:11
ataraday_johnsom, this sounds right for me. just wanted to check this out16:11
johnsomA lot of good stuff going on.16:13
johnsom#topic Priority bug reviews for Victoria16:13
*** openstack changes topic to "Priority bug reviews for Victoria (Meeting topic: Octavia)"16:13
johnsomJust a quick reminder, we are still working against the etherpad of patches that need reviewed for Victoria.16:13
johnsom#link https://etherpad.opendev.org/p/octavia-priority-reviews16:13
johnsomI will go through and update the list after the meeting.16:14
johnsomI expect next week, after the meeting we will be doing the RC1 cut.16:14
*** ccamposr__ has quit IRC16:14
johnsom#topic Open Discussion16:14
*** openstack changes topic to "Open Discussion (Meeting topic: Octavia)"16:15
johnsomAny other topics this week?16:15
johnsomPTL nominations open next week. Please consider running. I am happy to answer questions, etc. about the role.16:15
johnsomOk, if that is all today I will give you some review time back. grin16:17
johnsomThanks everyone!16:17
johnsom#endmeeting16:17
*** openstack changes topic to "Discussions for OpenStack Octavia | Priority bug review list: https://etherpad.openstack.org/p/octavia-priority-reviews"16:17
openstackMeeting ended Wed Sep 16 16:17:27 2020 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)16:17
ataraday_thank you!16:17
openstackMinutes:        http://eavesdrop.openstack.org/meetings/octavia/2020/octavia.2020-09-16-16.01.html16:17
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/octavia/2020/octavia.2020-09-16-16.01.txt16:17
openstackLog:            http://eavesdrop.openstack.org/meetings/octavia/2020/octavia.2020-09-16-16.01.log.html16:17
*** AlexStaf has joined #openstack-lbaas16:39
*** ataraday_ has quit IRC16:42
*** AlexStaf has quit IRC16:48
*** tobberydberg has quit IRC16:51
*** tobberydberg_ has joined #openstack-lbaas16:51
*** sapd1_x has quit IRC16:52
*** vishalmanchanda has quit IRC18:28
openstackgerritMerged openstack/octavia master: Add amphora delete support to amphorav2 driver  https://review.opendev.org/75008118:47
openstackgerritMerged openstack/octavia stable/ussuri: Update grenade job to run one smoke test  https://review.opendev.org/75185419:23
openstackgerritMichael Johnson proposed openstack/octavia stable/ussuri: Remove haproxy_check_script for UDP-only LBs  https://review.opendev.org/75190719:27
openstackgerritMichael Johnson proposed openstack/octavia stable/ussuri: Fix memory consumption issues with default connection_limit  https://review.opendev.org/74765119:27
*** gcheresh has quit IRC20:00
*** ccamposr__ has joined #openstack-lbaas20:13
*** ccamposr has quit IRC20:14
haleybjohnsom: sorry, i know i asked this question before, do you remember in which change you fixed the 'insert-headers is not a valid option for a TCP protocol listener.' failure?  we're re-enabling the API tests in the ovn provider and three of the listener tests trip over this20:47
johnsomThis one? https://review.opendev.org/#/c/744047/20:51
*** sapd1 has quit IRC20:52
*** sapd1_x has joined #openstack-lbaas20:52
johnsomhaleyb Or do you mean this one? https://review.opendev.org/74480520:52
haleybjohnsom: i think the first one, but if the api tests require the second that would explain my failure20:53
haleybthe tripleo job is failing on the scenario tests, which would be the first one i think though20:54
johnsomhaleyb The rework of the API test to remove the broken OVN stuff is the second patch. (The API version of the scenario test fix patch)20:54
haleybjohnsom: ack, the first one should have helped in the tripleo failure then, more digging, but thanks for the link couldn't find that one20:56
*** zzzeek has quit IRC21:05
*** zzzeek has joined #openstack-lbaas21:07
haleybjohnsom: so the tripleo scenario job is somehow using an older version of octavia-tempest-plugin code, guess i'll be learning more about that tomorrow :-/21:07
johnsomNeat21:11
*** zzzeek has quit IRC21:22
*** zzzeek has joined #openstack-lbaas21:25
*** rcernin has joined #openstack-lbaas21:31
*** rcernin has quit IRC21:36
*** TrevorV has quit IRC21:57
rm_workjohnsom: do you have ANY idea how amp records could *become disassociated* with an LB (and that LB be deleted)?22:06
rm_workI guess I could see housekeeping doing that maybe? if it's possible for an amp record to exist in ERROR state on a DELETED LB, maybe on the cleanup it somehow is set to remove the linkage to avoid FK constraint?22:07
johnsomrm_work Well, the sqlalchemy ORM magic may be set up to cascade delete or referentially delete.22:07
rm_workif this were one or two i might ignore it but it's like 20% of our amp records22:07
rm_workthe fact that it is POSSIBLE worries me22:08
rm_workI assume that amps got in a weird state and this happened due to the various errors we had around failover22:08
johnsomYeah, that is super odd. So you have amps, but no lb_ID on them? Not spares right?22:08
rm_workbut it really shouldn't be possible to null-out an amp record's LB-ID22:08
rm_workyes22:08
rm_workand looking at logs, they HAD lb_ids22:08
rm_workbut at some point they were cleared22:08
rm_workthe thing in common is that none of those LBs exist anymore22:09
rm_workso something about the delete is actually ... clearing out the amp record's FK relation22:09
rm_workrather than stopping the delete from happening because of an existing amp record (probably what SHOULD happen?)22:09
*** rcernin has joined #openstack-lbaas22:09
johnsomThis rings a bell, but I'm not sure which one22:11
rm_workk just ... ruminate on that i guess :D22:12
rm_worki'll keep trying to get newer code deployed with fixes for the failover issue, and the amp-delete function, then i will clean all those up, and then see if it happens again22:12
rm_workalso I keep meaning to work on an auditing script for octavia objects22:13
rm_workso i can see if there are orphaned ports/compute/etc created in octavia projects22:13
rm_workhave you worked on anything like that?22:13
johnsomYeah, I will poke around and see if I can see something in the code22:13
rm_workk, lower priority than getting that fix merged tho22:13
johnsomNo. The new failover will clean up vip and vrrp ports, but the member ports we don't track in the DB (yet), so we could leak there.22:14
*** rcernin has quit IRC22:18
rm_workI want to audit just because given the number of weird failures (and especially this weird disassociation) I don't 100% trust that and I would like to do a sweep and see if there's anything unexpected, just to make sure.22:22
rm_workThat will improve my trust level.22:22
rm_workWe're seeing an abnormally high port usage number for the LBs we have, so I've been asked to check it out.22:22
*** rcernin has joined #openstack-lbaas22:32
*** rcernin has quit IRC22:33
*** rcernin has joined #openstack-lbaas22:33
johnsomWell, something to note too, only LB failover cleans them up and only with the new failover patch.22:36
*** zzzeek has quit IRC22:38
rm_workhmm22:39
rm_workyeah i need to write some script for this audit probably :/22:39
*** zzzeek has joined #openstack-lbaas22:40
johnsomOtherwise if something went wrong and it didn't get deleted it would still be there.22:40
johnsomrm_work I did a search and could not find a place we update the amphora record that would remove the lb_id.22:54
sorrisonrm_work: yeah we see this issue too22:55
rm_workjohnsom: yeah i couldn't, so my assumption is it's something with our FK / cascade settings in the ORM22:55
rm_workand the decision it makes on a delete isn't obvious22:55
rm_workthere's nowhere we actually null the field out explicitly, for sure22:55
rm_worksorrison: yours is also with deleted LBs? probably all post-housekeeping-purge I would guess?22:56
rm_workthat's my current #1 theory22:56
rm_workhousekeeping purge breaks the FK link by doing a null22:56
sorrisonYeah I think that is what is going on22:57
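The theory above (a purge of the load balancer row nulling the amphora's foreign key instead of being blocked by it) can be reproduced in miniature. The following is only a sketch using SQLite from the Python standard library. The table and column names are hypothetical, and the `ON DELETE SET NULL` policy is an assumption made for illustration; nothing in this log confirms how the real Octavia schema or ORM relationship is configured.

```python
import sqlite3


def orphaned_amphorae_after_lb_purge():
    """Delete a load balancer row and return the amphora rows left behind."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    # Hypothetical schema: the SET NULL policy is the assumption under test.
    conn.executescript(
        """
        CREATE TABLE load_balancer (
            id TEXT PRIMARY KEY,
            provisioning_status TEXT
        );
        CREATE TABLE amphora (
            id TEXT PRIMARY KEY,
            status TEXT,
            load_balancer_id TEXT
                REFERENCES load_balancer(id) ON DELETE SET NULL
        );
        INSERT INTO load_balancer VALUES ('lb-1', 'DELETED');
        INSERT INTO amphora VALUES ('amp-1', 'ERROR', 'lb-1');
        """
    )
    # Stand-in for a housekeeping purge of the DELETED load balancer.
    conn.execute("DELETE FROM load_balancer WHERE id = 'lb-1'")
    rows = conn.execute(
        "SELECT id, status, load_balancer_id FROM amphora"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(orphaned_amphorae_after_lb_purge())
```

With a set-null policy the delete succeeds and the amphora row survives with an empty load balancer reference, which matches the symptom described. A restricting policy would instead reject the delete while the amphora row still exists.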
johnsomBut why would there still be amps if the record qualifies for purge (DELETED)22:57
* rm_work shrugs22:58
rm_worknot sure22:58
johnsomWhat status are the amps in?22:58
sorrisonthe ones I see aren't in DELETED state in the DB22:58
rm_workamp in ERROR still allows for a delete to complete somehow?22:58
*** zzzeek has quit IRC22:58
sorrisonERROR22:58
rm_workall ERROR22:58
johnsomAnd there are nova instances behind them, or just DB records?22:59
rm_workjust DB records AFAICT22:59
rm_workseems the compute/ports are deleted on the ones I checked23:00
rm_workbut there's 30 and i'm not doing that by hand23:00
rm_workthus, audit script23:00
*** zzzeek has joined #openstack-lbaas23:00
johnsomOk, that is making slightly more sense, maybe23:00
sorrisonyeah the compute and ports are all gone, we've had around ~30 of  them in the past month or so23:02
sorrisonWe just mark them as deleted in the DB to clean it up for now23:03
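A first pass at the audit discussed above might look like the sketch below. It works on plain dictionaries rather than on the Octavia database or API, so the record shape (`id`, `status`, `loadbalancer_id`) and the function name are assumptions for illustration. How the records and the set of existing load balancer IDs are fetched is left out on purpose. Filtering on ERROR is meant to skip spare amphorae, which legitimately have no load balancer.

```python
def find_orphaned_amphorae(amphorae, live_lb_ids, statuses=("ERROR",)):
    """Return IDs of amphora records that point at no existing load balancer.

    amphorae    -- iterable of dicts with 'id', 'status', 'loadbalancer_id'
    live_lb_ids -- set of load balancer IDs that still exist
    statuses    -- only records in one of these statuses are reported
    """
    orphans = []
    for amp in amphorae:
        lb_id = amp.get("loadbalancer_id")
        unlinked = lb_id is None or lb_id not in live_lb_ids
        if amp.get("status") in statuses and unlinked:
            orphans.append(amp["id"])
    return orphans


if __name__ == "__main__":
    # Made-up sample records, not data from this log.
    sample = [
        {"id": "amp-a", "status": "ERROR", "loadbalancer_id": None},
        {"id": "amp-b", "status": "ALLOCATED", "loadbalancer_id": "lb-1"},
        {"id": "amp-c", "status": "ERROR", "loadbalancer_id": "lb-gone"},
        {"id": "amp-d", "status": "READY", "loadbalancer_id": None},
    ]
    print(find_orphaned_amphorae(sample, {"lb-1"}))
```

The same comparison could be extended to ports and compute instances by checking each resource's owner against the set of known amphora IDs.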
sorrisonon a different note, johnsom: here are the worker logs for my broken LB that won't failover http://paste.openstack.org/show/797962/23:08
johnsomsorrison I sent you a message earlier. On the issue you are seeing, could you get me the full worker log for an LB failover attempt that failed?23:09
johnsomlol23:09
sorrisonhehe, you make all better now :-)23:09
rm_workrofl23:12
johnsomsorrison Ok, that gives me what I need to figure it out.23:13
johnsomThanks23:13
sorrisonI have deleted the existing errored nova instance and ports and also purged the amphora db records23:14
johnsomIt's haproxy complaining, I will need to figure out why.23:15
*** zzzeek has quit IRC23:17
*** zzzeek has joined #openstack-lbaas23:19
sorrisoncould it be something to do with the listener?23:20
sorrisonlistener looks normal23:21
johnsomWell, it's probably related to the zero amphora on lb, but I'm not sure why. It's the peer section that should only have itself in it if there were zero other amp records.23:22
sorrisonI had the same error when there were 2 amphora there (one was deleted)23:33
sorrison*nova instance was gone but both amps were in ERROR23:33
johnsomWell, there are cases where it could happen but it should not revert23:34
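The haproxy alert quoted earlier says the local peer name was not listed in the peers section. One way to check a rendered configuration for that condition is sketched below. The parser is deliberately minimal, the section keyword list is not exhaustive, and the sample configuration (including the peer name `some-other-peer` and its address) is invented; only the section name and local peer name are taken from the error message.

```python
# Non-exhaustive set of keywords that start a new haproxy section.
SECTION_KEYWORDS = {
    "global", "defaults", "frontend", "backend", "listen",
    "peers", "userlist", "resolvers", "mailers",
}


def peers_in_section(config_text, section_name):
    """Return the peer names declared in the named 'peers' section."""
    names = []
    in_section = False
    for line in config_text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0].startswith("#"):
            continue  # skip blank lines and comments
        if tokens[0] in SECTION_KEYWORDS:
            # Entering a new section: is it the peers section we want?
            in_section = (
                tokens[0] == "peers"
                and len(tokens) > 1
                and tokens[1] == section_name
            )
        elif in_section and tokens[0] == "peer" and len(tokens) > 1:
            names.append(tokens[1])
    return names


if __name__ == "__main__":
    # Invented configuration fragment for illustration only.
    sample = """
peers 9924384542d1415586494f01c823ab2a_peers
    peer some-other-peer 10.0.0.5:1025

frontend example
    bind 10.0.0.10:80
"""
    local_peer = "y-mDIaEIktlflHuOB0-4vAQzCfM"
    found = peers_in_section(sample, "9924384542d1415586494f01c823ab2a_peers")
    print(local_peer in found, found)
```

If the local peer name is absent from the returned list, haproxy would be expected to reject the configuration in the way the alert describes.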
*** zzzeek has quit IRC23:43
*** zzzeek has joined #openstack-lbaas23:44
*** zzzeek has quit IRC23:49
*** zzzeek has joined #openstack-lbaas23:52
*** armax has quit IRC23:52
johnsomUgh, well, booting an LB and clearing the LB_ID from the amps doesn't trigger it.23:53
*** zzzeek has quit IRC23:57
johnsomsorrison Are you using a centos image? Which version?23:58
sorrisonna, using bionic23:58
*** zzzeek has joined #openstack-lbaas23:58
johnsomCustom haproxy in the image?23:58
