Wednesday, 2019-04-03

*** abaindur has joined #openstack-lbaas01:06
*** abaindur has quit IRC01:09
*** abaindur has joined #openstack-lbaas01:09
*** cbrumm_ has joined #openstack-lbaas01:10
*** cbrumm has quit IRC01:12
*** kklimonda has quit IRC01:12
*** kklimonda has joined #openstack-lbaas01:12
*** hongbin has joined #openstack-lbaas01:41
*** hongbin has quit IRC03:00
openstackgerritMerged openstack/octavia master: Fix VIP plugging on CentOS-based amphorae  https://review.openstack.org/64928203:12
openstackgerritMerged openstack/octavia stable/queens: Add missing import octavia/opts.py  https://review.openstack.org/63739803:12
openstackgerritMerged openstack/octavia master: Fix diskimage-create tox, add ``build`` and ``test`` targets  https://review.openstack.org/63294803:12
*** psachin has joined #openstack-lbaas03:16
openstackgerritMerged openstack/octavia stable/queens: Ensure pool object contains the listener_id if passed  https://review.openstack.org/63740203:16
openstackgerritMerged openstack/octavia stable/rocky: Fix possible state machine hole in failover  https://review.openstack.org/63739903:16
openstackgerritMerged openstack/octavia stable/rocky: Add missing import octavia/opts.py  https://review.openstack.org/63739703:32
openstackgerritMerged openstack/octavia stable/queens: Slightly reorder member flows  https://review.openstack.org/64738103:36
*** ramishra has joined #openstack-lbaas04:17
johnsomcgoncalves: when you are back online (likely before me) feel free to start the stable releases if we have what we want merged. It looks like we got a lot of it today.04:45
*** vishalmanchanda has joined #openstack-lbaas05:39
*** lemko has joined #openstack-lbaas05:42
openstackgerritCarlos Goncalves proposed openstack/octavia stable/stein: Fix VIP plugging on CentOS-based amphorae  https://review.openstack.org/64950305:49
openstackgerritCarlos Goncalves proposed openstack/octavia stable/rocky: Fix VIP plugging on CentOS-based amphorae  https://review.openstack.org/64950405:49
openstackgerritCarlos Goncalves proposed openstack/octavia stable/queens: Fix VIP plugging on CentOS-based amphorae  https://review.openstack.org/64950505:49
cgoncalvesjohnsom, we need these ^ backports merged first. I wonder if we should also release Stein RC2, otherwise we'd be releasing Stein GA with broken spare pool05:50
cgoncalvesfinal RCs are due up to Friday05:53
openstackgerritAdit Sarfaty proposed openstack/octavia master: Fix catching driver exceptions  https://review.openstack.org/64885305:58
cgoncalvesoops! s/spare pool/vip plug on centos/06:02
*** ccamposr has joined #openstack-lbaas06:13
openstackgerritKobi Samoray proposed openstack/octavia master: Fix catching driver exceptions  https://review.openstack.org/64885306:25
*** gcheresh has joined #openstack-lbaas06:25
*** pcaruana has joined #openstack-lbaas06:36
*** pcaruana has quit IRC06:38
*** pcaruana has joined #openstack-lbaas06:38
*** rpittau|afk is now known as rpittau06:50
*** luksky has joined #openstack-lbaas06:50
openstackgerritMerged openstack/octavia-dashboard master: Drop nodejs4 jobs  https://review.openstack.org/64938007:30
*** velizarx has joined #openstack-lbaas07:31
openstackgerritpengyuesheng proposed openstack/neutron-lbaas-dashboard master: When using update_member_list method, need to pass pool_id  https://review.openstack.org/64951907:31
*** velizarx has quit IRC07:32
*** velizarx has joined #openstack-lbaas07:36
*** abaindur has quit IRC07:56
*** ramishra has quit IRC08:07
*** velizarx has quit IRC08:11
dulekcgoncalves: Hi! Thanks for yesterday's debugging. With ltomasbo's suggestions we're starting to think that the amp is getting into L2 mode for some reason, even though we don't provide subnet_id when creating a member.08:13
dulekcgoncalves: I checked my older env, and there the amp has a default route to a router connecting our two subnets.08:13
dulekcgoncalves: Any idea here? Do you even have some distinction between amp's L2 and L3 modes internally?08:13
ltomasbodulek, if memory serves, the net namespace inside the amphora was different08:18
ltomasboas it had 2 nics instead of one08:18
ltomasboone connected to the VIP subnet, and one to the member subnet08:18
dulekltomasbo: http://paste.openstack.org/show/748772/08:19
dulekltomasbo: That's my "old" netns setup.08:19
dulekltomasbo: But 10.1.0.168 is service subnet, I think.08:19
*** ramishra has joined #openstack-lbaas08:19
ltomasboyes, that one is right08:21
ltomasbodulek, ^^08:21
ltomasbothat was working, right?08:21
dulekltomasbo: Yup.08:21
ltomasboand that means L308:21
dulekltomasbo: Yep. I think something is forcing L2 mode on amp.08:22
ltomasbodulek, and do you have the pastebin for the non-working env?08:23
dulekltomasbo: Not really, I think I'll restack without my fix again.08:24
ltomasbodulek, ok! it would be great to confirm whether it is an L2 vs L3 setting problem08:28
dulekltomasbo: I'm running my patch with experimental queue, we have an L2 job there.08:29
dulekltomasbo: Most likely it'll pass and we'll know that something is forcing L2 mode for us.08:29
*** jiteka1 has quit IRC08:30
*** lemko has quit IRC08:31
*** velizarx has joined #openstack-lbaas08:37
*** velizarx has quit IRC08:40
*** velizarx has joined #openstack-lbaas08:42
*** luksky has quit IRC08:46
openstackgerritGregory Thiemonge proposed openstack/octavia master: Fix spare amphora check and creation  https://review.openstack.org/64938108:53
*** salmankhan has joined #openstack-lbaas08:56
*** ccamposr has quit IRC09:03
*** ccamposr has joined #openstack-lbaas09:04
*** celebdor has joined #openstack-lbaas09:07
*** luksky has joined #openstack-lbaas09:18
dulekltomasbo: That's the broken one: http://paste.openstack.org/show/748776/09:19
cgoncalvesdulek, ltomasbo: I'm not following what's the behavior you're suggesting that changed in the amp09:39
dulekcgoncalves: So this is how amphora-haproxy namespace is networking in older version: http://paste.openstack.org/show/748772/09:40
dulekcgoncalves: And you saw how it looks like now.09:40
dulekcgoncalves: Also in case of the older one, I have a default route set.09:40
cgoncalvesin your environment of yesterday, you had two networks each with its own subnet. both subnets, though, had the same range but again were on different networks09:41
dulekcgoncalves: I'm pretty sure 10.0.0.64/26 and 10.0.0.128/26 are totally separated, if that's what you mean.09:42
cgoncalvesyou created the member without passing in a subnet id, so the amp assumed it was on the same network hence not plugging a new interface (eth2) and configuring it09:42
dulekcgoncalves: Anyway I only had two networks when I spawned member with subnet_id.09:42
cgoncalvesuhm, right. ok. different subnet ranges. still my point stands09:42
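The two ranges dulek quotes can be checked for overlap with Python's standard `ipaddress` module. A minimal sketch; only the CIDRs come from the messages above:

```python
import ipaddress

# The two /26 ranges quoted in the conversation above.
subnet_a = ipaddress.ip_network("10.0.0.64/26")
subnet_b = ipaddress.ip_network("10.0.0.128/26")

# overlaps() is True only when the networks share at least one address.
print(subnet_a.overlaps(subnet_b))   # False
# First and last address of each range.
print(subnet_a[0], subnet_a[-1])     # 10.0.0.64 10.0.0.127
print(subnet_b[0], subnet_b[-1])     # 10.0.0.128 10.0.0.191
```

Non-overlapping ranges do not change cgoncalves' point: without a subnet_id, the amphora still treats the member as reachable from the VIP network.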
dulekcgoncalves: Yes, I agree. I just want to know what change made our plugin stop working and establish what's the expected behavior, so we can code it correctly.09:43
dulekIt's totally possible that we were abusing some bug/assumption that got fixed recently.09:43
dulekBut to decide I need to know.09:44
dulekcgoncalves: ltomasbo was mentioning L3 and L2 modes of amphora. Is that a thing on Octavia's side or has Kuryr just made it up from its internal stuff?09:45
*** yamamoto has quit IRC09:58
*** yamamoto has joined #openstack-lbaas10:01
*** yamamoto has quit IRC10:03
*** ltomasbo has quit IRC10:12
*** yamamoto has joined #openstack-lbaas10:22
cgoncalvesdulek, sorry, what are the L3 and L2 modes of the amphora? I know what L3 and L2 are but I don't get it in the context of the amphora10:22
dulekcgoncalves: Well, I assume this means - there's no such thing on Octavia's side and we made it up. :D10:23
dulekcgoncalves: Based on octaviamember_mode10:23
cgoncalvesright10:24
dulekcgoncalves: Based on [octavia_defaults]member_mode we were adding or not the subnet_id when creating a member.10:24
cgoncalvesonly thing you have to consider is you have to specify subnet on member create otherwise amp will assume vip network10:24
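A simplified model of the rule cgoncalves describes, plus a hypothetical helper that always carries the member's subnet. The function names and the network-level comparison are assumptions made for illustration, not Octavia's code:

```python
from typing import Optional


def plugs_member_interface(member_network_id: Optional[str],
                           vip_network_id: str) -> bool:
    """Simplified model (hypothetical, not Octavia code).

    member_network_id is None when the member was created without a
    subnet_id; the amphora then assumes the VIP network and plugs nothing.
    """
    if member_network_id is None:
        return False
    # Only a member on a different network needs an extra interface (e.g. eth2).
    return member_network_id != vip_network_id


def member_attrs(address: str, protocol_port: int, subnet_id: str) -> dict:
    """Hypothetical helper: build member-create attributes, always with subnet_id."""
    return {"address": address, "protocol_port": protocol_port,
            "subnet_id": subnet_id}
```

With openstacksdk, such a dict could be passed as `conn.load_balancer.create_member(pool, **attrs)`.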
dulekcgoncalves: So since the weekend, not adding it has stopped working. I see the first hit of that issue on Sunday.10:25
*** salmankhan has quit IRC10:26
dulekcgoncalves: Yeah, I assume that we did that L2-L3 distinction for a reason, but I don't know it and ltomasbo seems to be lunching.10:26
*** salmankhan has joined #openstack-lbaas10:26
dulekI'll try to dig the code to understand why we have that in the first place…10:26
rm_workdulek: johnsom did most of that refactoring I believe (some stuff did change in the networking code, some around the switch to bionic, some for bug fixes) so he might be the best one to ask. but, I would say that what you're describing does seem like expected behavior, you must have been abusing a bug before unknowingly, and the solution is really just to pass in the member subnet_id always :)10:32
dulekrm_work: Can you point to the johnsom commit? I haven't noticed anything relevant landing since Friday.10:33
* dulek looks again.10:33
rm_worknothing SUPER recently10:33
rm_workthe stuff I'm talking about was across the whole last cycle10:33
rm_workbut, we did just cut releases semi-recently for stein? what version are you running exactly? I think I missed that in the scrollback10:34
dulekrm_work: Okay, well, that thing broke our gates on Friday.10:34
dulekrm_work: It's on master gates so whatever's latest + upper-constraints for most libs.10:34
*** Dinesh_Bhor has quit IRC10:34
rm_workso you know for sure it was a change that happened like ... this weekend-ish?10:34
dulekrm_work: I'll try to find latest build that worked.10:35
rm_workIE, you are deploying/testing somewhat constantly and you have a pass and a fail in short succession?10:35
rm_workyeah that'd be good, would like a time window10:35
rm_workcgoncalves: is https://review.openstack.org/#/q/95a872fcd905c0f7c4f2b4cf63e93fa9770d13c1 one of those ones that's going to have to merge queens->rocky->stein because of grenade? or is something wonky happening on our stable gate tests again10:36
rm_worki think it's the latter <_<10:37
rm_workoh rofl you already did rechecks on those and i didn't see them heh10:37
*** ltomasbo has joined #openstack-lbaas10:39
openstackgerritAdit Sarfaty proposed openstack/octavia master: Fix catching driver exceptions  https://review.openstack.org/64885310:42
dulekrm_work: Okay, so it seems like it's between Friday 16:00 (probably UTC? Not sure what logstash.openstack.org uses) and first failed run is on Sunday, 12:00.10:46
rm_workHmm10:53
rm_workReally not that much would have changed on our side10:54
rm_workMaybe it's something funny like you were getting lucky sort ordering and some lib changed the way it sorts10:54
* rm_work shrugs10:55
*** ramishra has quit IRC10:56
dulekYeah, sounds possible. I'm testing without one Neutron commit, let's see…10:59
cgoncalvesrm_work, no, can be merged in any order. the grenade issue was fixed already11:04
*** ramishra has joined #openstack-lbaas11:04
rm_workK11:08
*** yamamoto has quit IRC11:15
dulekcgoncalves, rm_work: Do I need to set anything to test with CentOS amphora, or just switch the image we use (in the gates we download the nightly tarball)?11:17
cgoncalvesOCTAVIA_AMP_BASE_OS=centos11:18
cgoncalvesOCTAVIA_AMP_DISTRIBUTION_RELEASE_ID=711:18
cgoncalvesOCTAVIA_AMP_IMAGE_SIZE=311:18
cgoncalvesif you want to use nightly centos, http://tarballs.openstack.org/octavia/test-images/test-only-amphora-x64-haproxy-centos-7.qcow211:18
*** yamamoto has joined #openstack-lbaas11:20
dulekcgoncalves: Ah damn nightly won't have your fix as it was built around 6 AM and fix was merged after 7…11:21
rm_workT_T11:37
rm_workCan we trigger that? lol11:37
*** rcernin has quit IRC11:38
rm_workBut yeah, whatever image you use, there's nothing you need to configure anymore on the control plane side. They all just work.11:50
rm_workThat wasn't the case in... Maybe Pike? But we fixed it11:51
*** celebdor has quit IRC12:06
*** trown|outtypewww is now known as trown12:07
*** ramishra has quit IRC12:10
*** ramishra has joined #openstack-lbaas12:12
dulekrm_work, cgoncalves: Oh well, centos works.12:14
dulekI now think about cloud-init update in latest ubuntu-minimal.12:14
openstackgerritCarlos Goncalves proposed openstack/octavia master: Fix setting of VIP QoS policy  https://review.openstack.org/64581713:00
*** boden has joined #openstack-lbaas13:04
*** lemko has joined #openstack-lbaas13:10
rm_workOk, but you really should still include the subnet id for members :)13:22
*** ricolin has joined #openstack-lbaas13:30
*** Vorrtex has joined #openstack-lbaas13:40
*** oanson has quit IRC13:55
*** fnaval has joined #openstack-lbaas13:58
*** ramishra has quit IRC14:16
johnsomWe don’t have a L2 or L3 mode for the amps. It is always L3 and up.14:19
johnsomYour paste shows only the VIP network is plugged. Maybe you had a route between those networks before? Routes and gateways are both given to us by neutron, we just honor what was given to us. Host routes, gateways, etc.14:22
*** yamamoto has quit IRC14:26
*** yamamoto has joined #openstack-lbaas14:26
*** yamamoto has quit IRC14:26
*** yamamoto has joined #openstack-lbaas14:27
*** yamamoto has quit IRC14:27
*** yamamoto has joined #openstack-lbaas14:28
*** ramishra has joined #openstack-lbaas14:32
*** luksky has quit IRC14:47
*** celebdor has joined #openstack-lbaas14:52
*** ramishra has quit IRC14:57
*** gcheresh has quit IRC14:57
cgoncalvesjohnsom, do you agree to release stein RC2 and hold release of maintenance versions until https://review.openstack.org/#/c/649381/ and https://review.openstack.org/#/q/I56947e0d2bb207b59b0b3928efc96546d6410f43 are all merged?14:58
johnsomIt seems like we should try to get those two in rc2.15:00
cgoncalves+115:00
*** celebdor has quit IRC15:06
*** ccamposr has quit IRC15:17
*** gcheresh has joined #openstack-lbaas15:24
*** fnaval has quit IRC15:31
*** fnaval_ has joined #openstack-lbaas15:31
openstackgerritGregory Thiemonge proposed openstack/octavia master: Fix invalid query selector with list_ports  https://review.openstack.org/64938215:34
*** goldyfruit has joined #openstack-lbaas15:37
*** gcheresh has quit IRC15:44
cgoncalvesscenario: admin restarts all controllers at same time. when nodes are up again, LBs are failed over by HM if reboot is longer than heartbeat_timeout.15:47
cgoncalvesshould the periodic health check be started for the first time only after heartbeat_timeout?15:47
cgoncalvescurrently running immediately -- https://github.com/openstack/octavia/blob/372ff99a030e6b33dad11a35cb9d5c4058805c53/octavia/cmd/health_manager.py#L6315:48
cgoncalveshmm, there's L7615:48
johnsomIt sleeps a whole heartbeat timeout interval on startup.15:49
johnsom60 seconds by default.15:49
cgoncalvesright, that's L76.15:50
cgoncalvesmight not be good enough in some deployments, it seems. network might not be 100% up again.15:51
*** rpittau is now known as rpittau|afk15:57
*** velizarx has quit IRC16:00
dmelladodulek: o/16:02
dmelladohey johnsom cgoncalves16:02
dulekjohnsom: Sorry, missed your answers. Doesn't the fact that the CentOS amp works fine with our setup and only the newest Ubuntu one doesn't suggest that it's something else?16:02
dmelladothe sick and restless me comes back, beware if you have a baby16:03
* dmellado reading the backlog16:03
dmelladolarsks: o/16:03
dmelladodulek: told me that the issue could be related to cloud-init16:03
dmelladoand I was wondering if you could have a look as well16:03
larsksdmellado: howdy!16:03
dmelladolarsks: hey! just coming back from paternity leave and it's been tough so far xD16:04
dmelladodulek could you summarize the issue to larsks, pls?16:04
larsksdmellado: hope you had a fun paternity leave, though!16:04
dmelladolarsks yeah, I'll send you some pics of the kid!16:04
duleklarsks: Newest Ubuntu amphora doesn't have a default route pointing to the router inside amphora-haproxy netns.16:05
cgoncalvesdulek, you were running ubuntu amp yesterday. we root-caused it to be the SG of the member port not allowing ingress16:05
duleklarsks: That's why we have issues in kuryr-kubernetes where we depend on routing from the LB's.16:05
larsksdulek: I'm not sure why dmellado pointed you at me. dmellado??16:05
*** velizarx has joined #openstack-lbaas16:05
dmelladolarsks: there comes the hint16:06
dmelladothe only difference we noticed between ubuntu amphora and centos amphora16:06
dmelladowas the cloud init version16:06
larsksAh, I see.  I assume ubuntu is ahead of centos by a chunk.16:06
dulekdmellado: It's not the only one, but the only one that is suspicious to me.16:06
dmelladodulek: yep, let's call it the main one16:07
larsksAnd I know there has been a lot of churn in cloud-init's handling of network configuration.16:07
dulekcgoncalves: Yes, that was a valid workaround. The SG problem was there because now we needed to open traffic from the pod subnet (as by providing --subnet-id when creating the member the traffic comes from pod subnet).16:07
larsksBut I haven't followed it closely for a while.16:07
dulekcgoncalves: Previously it came from service subnet as it was routed through default route.16:07
dmelladolarsks: seems like ubuntu amphora is missing the default route16:07
larsksIs the amphora using dhcp to configure interfaces? Or is it getting a static config generated by cloud-init from network metadata?16:08
dmelladocgoncalves: dulek ^^16:08
cgoncalvesneither the amphora-haproxy netns nor its interfaces are configured by cloud-init16:08
larskscgoncalves: so, ruling out cloud-init as an issue?16:08
dulekcgoncalves: Ah, that's interesting, so I probably wasn't right with cloud-init upgrade.16:08
cgoncalveslarsks, IIRC the problem yes16:08
johnsomCan you check that the neutron subnet had a gateway?16:09
johnsomYeah, that is true, only the lb-mgmt-net is configured via cloud-init16:09
cgoncalvesthe VIP interface (eth1) in centos is configured via DHCP it seems16:10
*** velizarx has quit IRC16:10
johnsomdmellado Congratulations BTW16:10
dmelladojohnsom: thanks!16:10
*** trown is now known as trown|lunch16:10
cgoncalvesso it might be getting the gateway from there16:10
cgoncalveshttps://github.com/openstack/octavia/blob/fa6e02cc8d8191c042625a352ce56e72adecce6f/octavia/amphorae/backends/agent/api_server/templates/rh_plug_vip_ethX.conf.j2#L5016:11
johnsomIs it always DHCP or is it dependent on what neutron gives us?  I know for Ubuntu, we decide based on what the port and subnet has from neutron. If they give us the details, we honor them, otherwise it's DHCP16:11
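The rule johnsom states for Ubuntu amphorae (use Neutron's details when present, otherwise DHCP) can be sketched as a stanza renderer. This is a hypothetical helper emitting ifupdown syntax, not the template Octavia ships; note that a stanza written without a gateway yields no default route:

```python
from typing import Optional


def render_vip_stanza(iface: str, address: Optional[str] = None,
                      prefixlen: Optional[int] = None,
                      gateway: Optional[str] = None) -> str:
    """Hypothetical renderer: static config if Neutron gave details, else DHCP."""
    if address is None or prefixlen is None:
        return f"auto {iface}\niface {iface} inet dhcp\n"
    lines = [f"auto {iface}",
             f"iface {iface} inet static",
             f"    address {address}/{prefixlen}"]
    if gateway is not None:
        # The gateway line is what produces the default route.
        lines.append(f"    gateway {gateway}")
    return "\n".join(lines) + "\n"
```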
dulekjohnsom: Okay, that's a very solid clue to check this in the morning. At the moment my setup has CentOS amp which works as previously.16:13
dulekBut I can easily tell that the subnet we create loadbalancer in *has* a gateway_ip.16:14
dulekAnd it's exactly the one we should have there. ;)16:14
johnsomOk, yeah, so in a one-armed amphora (where the members were not provided a subnet-id), inside the netns you should have one interface and a default route set to the gateway neutron shows for the subnet16:15
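The expected one-armed state can be checked from `ip r` output taken inside the netns. A small parser, assuming the usual `default via <gw> dev <iface>` line format; the sample addresses are illustrative, not taken from the pastes:

```python
from typing import Optional


def default_gateway(ip_route_output: str) -> Optional[str]:
    """Return the next hop of the default route, or None if there is none."""
    for line in ip_route_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "default" and "via" in fields:
            i = fields.index("via")
            if i + 1 < len(fields):
                return fields[i + 1]
    return None


sample = ("default via 10.0.0.65 dev eth1\n"
          "10.0.0.64/26 dev eth1 proto kernel scope link src 10.0.0.70\n")
print(default_gateway(sample))  # 10.0.0.65
```

Comparing the result with the subnet's `gateway_ip` in Neutron shows whether the amphora honored it; `None` corresponds to the broken Ubuntu case.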
dmelladodulek: ok, I'll spin up a ubuntu based one so we can do A-B16:15
dulekjohnsom: That's definitely not what I see: http://paste.openstack.org/show/748776/16:15
dulekjohnsom: Wait, might that "unable to resolve host amphora-6a609b93-6f92-4057-893b-8ba9fd2da0cb" be related?16:16
dulekYeah, seems like I don't get this on CentOS…16:17
johnsomdulek No, we disable DNS16:17
johnsomSo, wait a second, that amp isn't up. Is that the backup amphora in an active/standby pair?16:17
dulekjohnsom: No, there's no pairing configured.16:18
dulekUnless the default changed recently?16:18
johnsomIts VIP interface doesn't have the second IP address it should have, so it's either not up yet, something else bad happened, or ....16:18
johnsomVIPs should always have two IPs in our current configuration.16:19
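One way to count the IPv4 addresses on the VIP interface is to parse one-line `ip -o -4 addr` output from inside the netns. A minimal sketch, assuming the second field is the device name; the sample lines are made up for illustration:

```python
from typing import List


def ipv4_addresses(ip_o_addr_output: str, iface: str) -> List[str]:
    """Collect CIDR addresses for iface from `ip -o -4 addr` style lines."""
    found = []
    for line in ip_o_addr_output.splitlines():
        fields = line.split()
        # Expected shape: "<idx>: <iface> inet <addr>/<len> ..."
        if len(fields) > 3 and fields[1] == iface and fields[2] == "inet":
            found.append(fields[3])
    return found


sample = ("1: lo    inet 127.0.0.1/8 scope host lo\n"
          "3: eth1    inet 10.0.0.70/26 brd 10.0.0.127 scope global eth1\n"
          "3: eth1    inet 10.0.0.80/26 brd 10.0.0.127 scope global secondary eth1:0\n")
addrs = ipv4_addresses(sample, "eth1")
print(len(addrs) >= 2)  # True: the port's own address plus the VIP
```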
dulekjohnsom: I have two IP's on my env with CentOS AMP.16:19
dulekAs I said - I don't have the broken one at the moment to poke. I can fire up one tomorrow.16:20
dulekjohnsom: What data should I gather?16:20
johnsomcan you pastebin cat /etc/netns/amphora-haproxy/network/interfaces.d/eth1.cfg?16:20
johnsomOk, that ubuntu pastebin, something is not right there.16:20
dulekjohnsom: I'll ping you tomorrow with that file, okay? I'll also make sure to have the env ready in case you have more questions.16:21
johnsomI would collect the interface config file (ignore the eth1 file, it's the eth1.cfg that is used), an ip a, ip link, ip r, ip rule all from inside the netns16:22
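The list johnsom asks for can be gathered in one pass. A sketch assuming it runs as root on the amphora itself; the helper names are hypothetical, and the config file is read directly rather than through `ip netns exec`:

```python
import subprocess
from typing import Dict, List

NETNS = "amphora-haproxy"
ETH1_CFG = "/etc/netns/amphora-haproxy/network/interfaces.d/eth1.cfg"


def diag_commands(netns: str = NETNS, cfg: str = ETH1_CFG) -> List[List[str]]:
    """Commands for the requested data: the eth1.cfg file, then ip output in the netns."""
    in_netns = [["ip", "a"], ["ip", "link"], ["ip", "r"], ["ip", "rule"]]
    return [["cat", cfg]] + [["ip", "netns", "exec", netns] + c for c in in_netns]


def collect(commands: List[List[str]]) -> Dict[str, str]:
    """Run each command and keep stdout plus stderr, keyed by the command line."""
    results = {}
    for cmd in commands:
        # PIPE/universal_newlines instead of capture_output keeps Python 3.6 working.
        proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                              universal_newlines=True)
        results[" ".join(cmd)] = proc.stdout + proc.stderr
    return results
```

Printing `collect(diag_commands())` gives one blob that can go straight into a paste.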
johnsomOk, sure.16:22
johnsomcgoncalves Want to chat about the network outage issue?16:23
cgoncalvesjohnsom, sure. I don't have logs, though, but it isn't the first case reporting this behavior16:24
johnsomYeah, so I'm just thinking out loud here....16:25
johnsomCurrently, if the HM is shutdown, on startup it waits the full HM timeout period before starting to do failovers. This is 60 seconds by default, which would give the amps six tries to report in.16:26
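The "six tries" follows from dividing the timeout by the heartbeat interval. A sketch assuming the `[health_manager]` defaults of `heartbeat_interval = 10` and `heartbeat_timeout = 60` seconds; the helper names are hypothetical:

```python
HEARTBEAT_INTERVAL = 10  # seconds, assumed [health_manager] heartbeat_interval default
HEARTBEAT_TIMEOUT = 60   # seconds, assumed [health_manager] heartbeat_timeout default


def heartbeat_chances(timeout: int = HEARTBEAT_TIMEOUT,
                      interval: int = HEARTBEAT_INTERVAL) -> int:
    """Heartbeats an amphora can send before it counts as stale."""
    return timeout // interval


def past_startup_grace(now: float, hm_started_at: float,
                       timeout: int = HEARTBEAT_TIMEOUT) -> bool:
    """Hypothetical check: allow failovers only after one full timeout since HM start."""
    return now - hm_started_at >= timeout


print(heartbeat_chances())  # 6
```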
cgoncalvesright16:26
johnsomIf neutron is still hosed up and the lb-mgmt-net is still down, we currently don't have a way to know that.16:27
johnsomI think the original thoughts were that if it is down, that is a valid failure situation and we should be attempting to recover.16:28
johnsomHowever, currently we only have one lb-mgmt-net, so it's just cycling on its own.16:28
cgoncalvesbelieve it or not, that was my suspicion. I asked beagles a few minutes ago if we could have a dependency between them16:28
*** cbrumm_ has quit IRC16:29
johnsomWell, I would also ask, "should we have a dependency with neutron?"16:29
dmelladothanks johnsom16:29
johnsomI have considered having a dead man's hand, which attempts a connection to the amphora-agent API before a failover.16:30
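The dead man's hand idea amounts to a TCP reachability probe before failing over. A minimal sketch; the port constant is an assumption about the amphora-agent's listening port, and the function is hypothetical:

```python
import socket

AMP_AGENT_PORT = 9443  # assumed amphora-agent listening port


def agent_reachable(host: str, port: int = AMP_AGENT_PORT,
                    timeout: float = 3.0) -> bool:
    """Hypothetical pre-failover probe: can a TCP connection be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable or timed out (socket.timeout is an OSError).
        return False
```

As johnsom says right after, a successful connect does not explain why heartbeats (and with them status and stats) stopped arriving, so such a probe could at most act as a tie-breaker.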
*** yamamoto has quit IRC16:30
johnsomHowever I'm not sure that is a good idea either, as if the heartbeat isn't coming through we aren't getting status, stats, etc.16:31
cgoncalvesagreed16:32
johnsomI'm guessing in your case, the failover ended up failing as neutron or nova was still down?  I.e. why is a failover a problem?16:33
johnsomOther than just churn, which we have rate limiting for already if that is a problem.16:33
cgoncalvesI wanted to confirm that from the logs16:33
*** psachin has quit IRC16:34
johnsomWe could try pinging the neutron gateway as a test of the lb-mgmt-net, but that wouldn't catch net splits.16:35
johnsomI have seen the net split problem where someone messed up an MLAG and *one* rack is only partially functional.16:36
cgoncalveschecking neutron agent status could be another option but I fear that would be too intrusive16:36
johnsomYeah, and how would we know what the *right* number of happy agents are?16:36
cgoncalvesand still doesn't guarantee flows are installed16:37
johnsomRight16:37
johnsomI also don't want to do in-band testing by connecting to the VIP, etc. as that means some controller is on tenant networks.16:37
cgoncalveslet's have octavia agents deployed in all compute nodes and attached to lb-mgmt-net! kidding16:38
johnsomThe good news I have had is if they are in act/stdby and this happens, the LB is still up and passing traffic even though it's in ERROR16:38
johnsomLol, yeah, isn't that neutron's job? lol16:38
johnsomWe can go down the path the appliances do, require a dedicated cable between them16:39
johnsomI remember the cisco one was a DB-9 proprietary serial cable. lol16:40
cgoncalveswell, you know, someone once upon a time added a data plane status extension16:40
cgoncalveshttps://developer.openstack.org/api-ref/network/v2/?expanded=show-network-details-detail#data-plane-status-extension16:40
johnsomhttps://usercontent.irccloud-cdn.com/file/UT4G3KuG/image.png16:42
cgoncalveslol16:42
johnsomThat was the actual cable...16:42
cgoncalvesdoesn't help much, the cable is broken16:42
* johnsom The room thinks, "whatever old timer"16:43
johnsomSo that extension, I'm sure everyone is deploying it right?16:43
cgoncalvessure......16:44
johnsomMy devstack, HM port shows "| data_plane_status       | None "16:44
johnsomSo, yeah, it's there, but useful????16:45
cgoncalvesit would be VERY useful if it was in fact being used, which is not the case16:45
johnsomNot sure I would trust it anyway....  net splits, are likely to impact it too16:46
cgoncalvesI know one large deployment that uses it. the folks that manage that cloud were the ones that asked me to implement it ;)16:47
johnsomSo it monitors OVS or something else and populates that?16:48
cgoncalvesthe field is set by a 3rd-party (monitoring) tool16:49
johnsomRight, I was just curious what it was monitoring16:49
cgoncalvesthe extension is just a parameter exposed via the API16:49
johnsomRight16:50
cgoncalveshttps://specs.openstack.org/openstack/neutron-specs/specs/backlog/ocata/port-data-plane-status.html16:50
*** cbrumm_ has joined #openstack-lbaas16:50
*** cbrumm_ has quit IRC16:52
johnsomSo maybe we should open an RFE to add code that looks at that?  maybe?16:53
johnsomStill not sure what our action would be other than trigger a failover16:53
cgoncalvesfailover is the right thing to do if we're sure lb-mgmt-net is up16:55
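If such an RFE went ahead, the check might resemble this sketch, which holds failovers only when an lb-mgmt-net port explicitly reports DOWN. Which ports to inspect and the function name are assumptions:

```python
from typing import Iterable, Optional


def mgmt_net_allows_failover(statuses: Iterable[Optional[str]]) -> bool:
    """Hypothetical gate on ports' data_plane_status values.

    The Neutron extension reports "ACTIVE", "DOWN", or None when nothing
    (such as an external monitoring tool) has set the field.
    """
    # Block failovers only on an explicit DOWN; None means "unknown".
    return all(status != "DOWN" for status in statuses)
```

Treating `None` as "go ahead" keeps today's behavior on clouds that, like johnsom's devstack, never populate the field.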
johnsomYeah, like at least some part of it16:55
*** luksky has joined #openstack-lbaas16:55
cgoncalvesmaybe neutron could set data_plane_status to DOWN if a given neutron L2 agent that installs flows for a given network is down16:57
cgoncalvesI don't like that, though. it would be a very ML2/OVS specific way16:58
*** cbrumm has joined #openstack-lbaas17:00
*** trown|lunch is now known as trown17:00
johnsomBTW, I have started work on adding unset to the client. I just need to figure out the tags situation and add a unit test. Then I can post an LB unset patch. The rest should be easy after that.17:03
cgoncalvesto unset a VIP policy ID one can do --vip-policy-id None. why would we need to add unset?17:05
johnsomCustomers expect it and don't read the comments17:06
cgoncalvesso, "--vip-policy-id unset" or "--no-vip-policy-id"?17:06
johnsomhttps://www.irccloud.com/pastebin/0waalZhZ/17:07
cgoncalvesah!17:07
johnsomLol, need to fix the comment17:07
cgoncalvesok, makes total sense17:07
*** jniesz has joined #openstack-lbaas17:11
cgoncalveswould it work not to fail over amps unless at least one heartbeat from any amp is received by the HM on start up?17:14
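cgoncalves' proposal can be sketched as a small gate object. The class is hypothetical and only models the rule "no failovers until any heartbeat has arrived since start-up":

```python
class FirstHeartbeatGate:
    """Hypothetical gate: suppress failovers until the health manager has
    received at least one heartbeat from any amphora since it started."""

    def __init__(self) -> None:
        self._heartbeat_seen = False

    def record_heartbeat(self) -> None:
        # Any amphora counts: one heartbeat shows part of the lb-mgmt-net works.
        self._heartbeat_seen = True

    def failovers_enabled(self) -> bool:
        return self._heartbeat_seen
```

The trade-off, echoing the earlier net-split concern, is that one healthy rack would unlock failovers for amphorae sitting behind a partitioned one.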
*** yamamoto has joined #openstack-lbaas17:22
*** Vorrtex has quit IRC17:23
*** yamamoto has quit IRC17:32
*** yamamoto has joined #openstack-lbaas17:33
*** Vorrtex has joined #openstack-lbaas17:37
colin-we have thoughts about this behavior that i just haven't captured yet johnsom cgoncalves17:46
colin-it's a high priority to articulate them for this community17:47
colin-have a deep concern over conditions that could lead to duplicate MAC addresses, and any HM/failover outcomes that might lead to it17:47
cgoncalvescolin-, I'd love to hear more from you :)17:48
colin-yeah i will get a story in that describes the scenario as specifically as possible and why the outcome is undesirable17:48
colin-it's been a tricky one to pin down17:48
colin->johnsom> BTW, I have started work on adding unset to the client.  <--- cool!17:49
*** velizarx has joined #openstack-lbaas17:50
*** yamamoto has quit IRC18:00
johnsomcgoncalves I have +2'd the stuff we are targeting for RC218:07
johnsomActually I think there is a problem with the spares fix18:14
johnsomYeah, ok, I confirmed that the current fix does not work.18:26
johnsomcgoncalves Is Greg still around or should I take a pass at this patch?18:26
johnsomWe kind of need to get that in today18:26
*** yamamoto has joined #openstack-lbaas18:37
*** ricolin has quit IRC18:38
cgoncalvesjohnsom, most likely offline. go for it as it is urgent18:45
johnsomOk18:45
*** yamamoto has quit IRC18:45
openstackgerritMerged openstack/octavia stable/queens: Fix VIP plugging on CentOS-based amphorae  https://review.openstack.org/64950518:47
xgermanSo much scrollback today...19:02
cgoncalvestl;dr: Earth keeps spinning19:03
*** salmankhan has quit IRC19:08
*** boden has quit IRC19:25
*** boden has joined #openstack-lbaas19:32
*** abaindur has joined #openstack-lbaas19:36
*** abaindur has quit IRC19:39
*** abaindur has joined #openstack-lbaas19:39
*** abaindur has quit IRC19:42
*** abaindur has joined #openstack-lbaas19:42
johnsom#startmeeting Octavia19:59
openstackMeeting started Wed Apr  3 19:59:14 2019 UTC and is due to finish in 60 minutes.  The chair is johnsom. Information about MeetBot at http://wiki.debian.org/MeetBot.19:59
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:59
*** openstack changes topic to " (Meeting topic: Octavia)"19:59
openstackThe meeting name has been set to 'octavia'19:59
cgoncalveshello19:59
johnsomHi folks19:59
johnsompoke rm_work our future PTL19:59
johnsom#topic Announcements20:00
*** openstack changes topic to "Announcements (Meeting topic: Octavia)"20:00
johnsomThis is the final RC week. I think we need an RC2 for octavia, so we will try to get those fixes in today and hopefully do the RC2 today as well.20:00
johnsomWe are also close to doing some stable branch releases.20:01
johnsomOther than that, I don't have any other announcements this week.  Anyone else?20:01
johnsom#topic Proposal to change meeting time (cgoncalves)20:02
*** openstack changes topic to "Proposal to change meeting time (cgoncalves) (Meeting topic: Octavia)"20:02
johnsomcgoncalves Do you want to talk to this?20:02
cgoncalvessure, thanks20:02
cgoncalvesso currently our weekly meetings are at 1pm PST, which means 10 pm CEST and 11 pm IST (Israel)20:03
johnsom2000 UTC20:04
cgoncalvesin Asia, it is in the middle of the night20:04
cgoncalvesI was wondering if we could have our meetings earlier to be more friendly to folks in EMEA and Asia20:04
cgoncalvesthanks for the correction20:05
johnsomYes as long as we get quorum with the change.20:05
cgoncalvesagreed20:05
johnsom(I was just adding to the conversation, grin)20:05
xgermansure, what time are you proposing?20:05
johnsomSo how this works, community process wise:20:06
xgermanalso we should make sure rm_work is available20:06
johnsom1. we propose some times/days20:06
cgoncalvesright20:06
johnsom2. I will create a doodle for those times/days20:06
johnsom3. We e-mail the openstack list with the details and the doodle.20:06
xgermans/I/rm_work/g20:06
johnsom4. We let that soak a week, then if we have quorum for a new time, I will go update all the places that need updating and we have a new time.20:07
cgoncalvessounds good to me20:08
johnsomQuestions/comments on the process?20:08
xgerman+1 (other than we should have rm_work own more of the process)20:08
xgermanI think it will stretch into when he takes over20:09
johnsomI would hope that rm_work would participate in the proposals.20:09
* johnsom wonders how many times we can ping him.... grin20:09
cgoncalvesperhaps current meeting time is also not much convenient for rm_work20:09
xgermanyeah, who knows which time zone he lives by nowadays20:09
johnsomSo I will start, 1600 UTC is a nice time for me20:09
cgoncalvestrue. it's 5 am in Japan20:10
xgermansince for the people here the time works we are the wrong ones to ask to begin with20:12
cgoncalveslet's throw more time options in to doodle. say 1500 UTC20:13
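For the doodle, the candidate slots convert as follows. A sketch using fixed UTC offsets valid for early April 2019 (US Pacific was already on PDT, UTC-7); the zone labels are chosen for illustration:

```python
from datetime import datetime, timedelta, timezone

# UTC offsets in effect during the first week of April 2019.
OFFSETS = {
    "US Pacific (PDT)": -7,
    "Central Europe (CEST)": 2,
    "Israel (IDT)": 3,
    "Japan (JST)": 9,
}
MEETING_DAY = datetime(2019, 4, 3, tzinfo=timezone.utc)  # a Wednesday


def local_times(utc_hour: int, day: datetime = MEETING_DAY) -> dict:
    """Map each zone label to the local weekday and time of utc_hour on day."""
    start = day.replace(hour=utc_hour)
    return {name: start.astimezone(timezone(timedelta(hours=h))).strftime("%a %H:%M")
            for name, h in OFFSETS.items()}


for hour in (15, 16, 20):  # proposals so far plus the current 2000 UTC slot
    print(f"{hour:02d}00 UTC", local_times(hour))
```

At the current 2000 UTC slot this gives 13:00 in the US Pacific zone, 22:00 CEST, 23:00 in Israel and 05:00 the next morning in Japan, matching the figures quoted in the meeting.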
johnsom+120:13
cgoncalveslet's also make sure we include the current meeting time20:13
johnsomOk, fair point20:14
xgerman+120:14
johnsomDo we have any particular day that we should propose or are no-go for folks?20:14
xgermanLet’s stick with Wednesday - Friday/Monday are funny in a lot of time zones20:15
cgoncalvesFridays are no go for Israelis20:15
johnsomI am guessing Friday, Saturday, Sunday are bad20:15
johnsomYeah, so Tue-Wed-Thur20:15
cgoncalvesTue-Thu20:15
xgerman+120:15
johnsomOk, any other proposed times for the doodle?20:16
cgoncalvescool. thank you, all!20:16
cgoncalvesjohnsom, there's an option in doodle that allows anyone to add new rows (= times), no?20:17
cgoncalvess/rows/columns/20:17
johnsomOk. I will get the process going.20:17
xgerman+120:17
johnsomyes I think so, you think we should leave it open?20:17
xgermanbut really rm_work...20:17
cgoncalveswhy not20:17
* johnsom notes it's been a year or two since I did this20:17
johnsomOk, will do20:18
cgoncalvesthanks20:18
johnsomIt just means folks need to check back to it in case new times are added20:18
johnsom#topic Brief progress reports / bugs needing review20:19
*** openstack changes topic to "Brief progress reports / bugs needing review (Meeting topic: Octavia)"20:19
johnsomI have worked on removing the last references to oslosphinx which is broken with sphinx 2, deprecated, and won't be fixed.20:20
johnsomFor the most part we have already done that, but there were two references we missed. You should not see any major changes in the docs/release notes20:20
johnsomI helped figure out a solution to our grenade issue.20:21
johnsomLots of reviews, etc.20:21
cgoncalvesjohnsom, thank you for your help troubleshooting and proposing a fix to the grenade issue. really appreciate it!20:22
johnsomCurrently I'm working on adding the "unset" option to our openstack client. This will make it clearer to users how to clear settings.20:22
johnsomI'm going to go through the main options first, then come back and do tags.20:22
xgermanI sent back my laptop and took a week of vacation… slowly getting my 2008 Mac onto 2019’s software20:23
johnsomI need to move a module out of neutron in OSC up to osc-lib so we can share the tags code.20:23
johnsomYeah, I am also running on "alternate" hardware now.  Seems to be working ok though.20:23
johnsomI also fixed a security related issue in the OSA role.20:24
johnsom#link https://review.openstack.org/64874420:24
johnsomxgerman You might want to do a quick review on that20:24
xgermanon it20:25
johnsomSo, that is my plan for the next few days, work on unset and then tags for the client.20:25
johnsomAlso, I will be travelling and not available much Sun-Wed.20:26
johnsomJust as a heads up20:26
cgoncalvesthere is a patch in master and stein that broke spare pools. Change https://review.openstack.org/#/c/649381/ will fix it. we need to backport it to stein and release Stein RC2 this week20:26
johnsomYeah, I am going to take a stab at that after lunch.20:26
cgoncalvesit would be nice if we could merge a tempest test for spare pool to prevent regressions like this in the future20:27
cgoncalves#link https://review.openstack.org/#/c/634988/20:27
johnsomI think we just need to do another migration and we can fix it that way20:27
cgoncalvesmigration? as in DB migration?20:28
johnsomYep. Good stuff. I had previously +2'd, will circle back20:28
cgoncalvespatch set, I take it20:28
johnsomYeah20:28
johnsomAny other updates this week?20:28
cgoncalvesI also addressed reviews on https://review.openstack.org/#/c/645817/20:28
cgoncalvesI have a customer waiting for it20:29
johnsom+2, you addressed my only issue with it.20:29
xgermanlooking20:30
cgoncalvesthank you, thank you!20:30
xgermanneed to see what happened to my patches...20:30
johnsom#topic Open Discussion20:31
*** openstack changes topic to "Open Discussion (Meeting topic: Octavia)"20:31
johnsomOk, other topics this week?20:31
cgoncalvessome folks were discussing here on the channel earlier today about an issue where health manager would trigger failover20:32
cgoncalveswhile the network was still being configured, i.e. flows, etc20:32
xgermanyeah, it can do that ;-)20:32
xgermanhow do we know that the network is configured20:32
cgoncalvesno resolution yet20:32
xgerman?20:32
cgoncalvesprecisely, that's the question20:33
johnsomReally? The HM honors the lock on the objects, so it should not be able to start a failover if another controller owns the resource20:33
johnsomOh, you mean the neutron networking....20:33
johnsomRight20:33
xgermanI was thinking neutron was pulling sh*t20:33
cgoncalvesyes20:33
cgoncalveswould it work not to fail over amps unless at least one heartbeat from any amp is received by the HM on start up?20:34
xgermanthat goes back to whether we should go into ERROR and tell the operator neutron is broken, or keep retrying20:34
xgermansee [1] http://blog.eichberger.de/posts/yolo_cloud/20:34
cgoncalveswe don't know if neutron is "broken". all we know is the HM hasn't received heartbeat within the heartbeat_timeout (60 seconds)20:35
johnsomI suspect the issue is around net splits where some hosts and racks are working, but others are not, so any heartbeat would likely have the same issue20:35
xgermancgoncalves: we had it not fail over stuff when there was no heartbeat which caused other problems20:35
xgerman(mainly amp doesn’t come up right and we never know)20:35
cgoncalvesxgerman, we still do not failover on newly created LBs20:35
xgermanI thought we fixed that a while back...20:36
johnsomI thought someone was looking at that again and proposed a fix as well. Not positive though.20:36
cgoncalvesmy understanding was that it was a feature/desired behavior, not a bug20:36
johnsomIt's a trickier problem than it seems on the surface20:36
*** gcheresh has joined #openstack-lbaas20:37
xgermanyeah, it’s either wait, or throw our hands up in the air20:37
cgoncalvesjust wanted to bring this up in case anyone had some thoughts20:38
cgoncalvesthis is affecting some customers on this side20:38
xgermanhow? OSP 13 is not HA...20:39
cgoncalvesI heard other people also facing same issue20:39
johnsomPersonally, if the amp isn't talking to us, it seems like it is the right answer to fail it over. The question is what to do after that, specifically if the neutron outage causes the failover to not be successful.  Right now we fail "safe", in that it's marked ERROR, but the secondary amp is still passing traffic.20:39
johnsomlol, ouch20:39
cgoncalveshaving an HA tempest job would go a long way toward having it supported in OSP20:40
johnsomWe could have a periodic that looks for ERROR LBs with an amp in ERROR and attempts an amp failover again. We would just need to figure out the right back off and make super sure we don't bother the other functional amp.20:41
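A rough sketch of the periodic retry idea above, assuming a task that runs on some interval and sees the load balancer status plus the status of each of its amphorae. Everything here is hypothetical, not Octavia code: `RetryState`, `backoff_delay`, `amps_due_for_retry`, and `record_attempt` are illustrative names, the `"ERROR"` status strings are assumed, and the 30 second base delay and one hour cap are arbitrary placeholders for "the right back off". The key constraint from the discussion is encoded by only ever returning amphorae that are themselves in ERROR, so the functional peer is left alone.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RetryState:
    """Hypothetical per-amphora bookkeeping for repeated failover attempts."""
    attempts: int = 0
    next_attempt_at: float = 0.0


def backoff_delay(attempts: int, base: float = 30.0, cap: float = 3600.0) -> float:
    # Exponential back-off: 30s, 60s, 120s, ... never more than `cap` seconds.
    return min(cap, base * (2 ** attempts))


def amps_due_for_retry(lb_status: str,
                       amp_statuses: Dict[str, str],
                       states: Dict[str, RetryState],
                       now: float) -> List[str]:
    """Return ids of amphorae a periodic task could try to fail over again.

    Only considered when the load balancer itself is in ERROR, and only
    amphorae that are in ERROR are returned, so a healthy peer that is
    still passing traffic is never selected.
    """
    if lb_status != "ERROR":
        return []
    due = []
    for amp_id, status in sorted(amp_statuses.items()):
        state = states.setdefault(amp_id, RetryState())
        if status == "ERROR" and now >= state.next_attempt_at:
            due.append(amp_id)
    return due


def record_attempt(state: RetryState, now: float) -> None:
    # Push the next allowed attempt further out after every try.
    state.next_attempt_at = now + backoff_delay(state.attempts)
    state.attempts += 1
```

Adding random jitter to `backoff_delay` would keep many controllers from retrying in lockstep after a shared outage; it is left out here to keep the behaviour deterministic.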
johnsomI think we have some work to do on the failover flows in general actually.20:42
xgermanmmh, I would leave that for the PTG…20:42
cgoncalvesI don't like that, to be honest. we would be killing an amphora that is actually up by failing over not once but twice20:42
johnsomYes, good topic for the PTG20:42
xgermanyep, agree on failover flows20:42
xgermanbeen beating that drum for a year now20:42
johnsomAdded a topic to the PTG etherpad20:43
johnsom#link https://etherpad.openstack.org/p/octavia-train-ptg20:44
cgoncalvesis there a reason you guys don't like my proposed approach?20:44
*** rcernin has joined #openstack-lbaas20:45
johnsomThe one heartbeat one or did I miss something?20:45
cgoncalves"would it work not to fail over amps unless at least one heartbeat from any amp is received by the HM on start up?"20:45
cgoncalveslet me know if I'm not being clear20:46
johnsomI responded: "I suspect the issue is around net splits where some hosts and racks are working, but others are not, so any heartbeat would likely have the same issue"20:46
cgoncalvesah, sorry. I read that but didn't read it as a reply to my message20:46
johnsomWe could also try to set some threshold, say if more than x amps are "down", pause failovers20:47
*** Vorrtex has quit IRC20:47
johnsomI guess now that we can mutate the config, this could be more feasible. It would give the operator a knob to turn.20:47
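A minimal sketch of how the two gating ideas discussed here could combine, assuming the health manager keeps a map of amphora id to last-heartbeat time (possibly preloaded from the database at start-up). This is a hypothetical illustration, not Octavia's health manager: `FailoverGate` and `max_stale_amps` are invented names standing in for cgoncalves's "wait for at least one heartbeat after start-up" proposal and johnsom's "pause failovers if more than x amps are down" threshold. The 60 second default mirrors the `heartbeat_timeout` value mentioned earlier in the meeting.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FailoverGate:
    """Hypothetical guard deciding which stale amphorae may be failed over."""
    heartbeat_timeout: float = 60.0   # seconds without a heartbeat => stale
    max_stale_amps: int = 5           # hypothetical operator knob
    last_seen: Dict[str, float] = field(default_factory=dict)
    first_heartbeat_at: Optional[float] = None

    def record_heartbeat(self, amp_id: str, now: float) -> None:
        self.last_seen[amp_id] = now
        if self.first_heartbeat_at is None:
            # First heartbeat since this health manager started.
            self.first_heartbeat_at = now

    def stale_amps(self, now: float) -> List[str]:
        return sorted(amp_id for amp_id, seen in self.last_seen.items()
                      if now - seen > self.heartbeat_timeout)

    def amps_to_fail_over(self, now: float) -> List[str]:
        if self.first_heartbeat_at is None:
            # Nothing heard since start-up: the network may still be coming up.
            return []
        stale = self.stale_amps(now)
        if len(stale) > self.max_stale_amps:
            # Too many at once: more likely a network problem than amp failures.
            return []
        return stale
```

As johnsom points out, during a partial net split some heartbeats still arrive, so the start-up gate alone would not hold back failovers in that case; the stale-count threshold is the part meant to cover it, and with mutable config `max_stale_amps` could be adjusted without a restart.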
cgoncalvesit would be expected that sometime "soon" the network would be back again so HMs would start receiving heartbeats20:48
johnsomWell, there is always the "rack got zapped by an evil mastermind's laser" scenario where it will not come back20:49
johnsomOr the scenario I saw once, where the host was just powered off for a day20:50
*** abaindur has quit IRC20:50
cgoncalveshmm...20:51
johnsomThat one was nice, it led to a bunch of zombie instances showing up again20:51
cgoncalveshealth manager kills them now, thanks to xgerman's patch20:52
johnsomStuff to think about. I captured a few on the etherpad, please add more!20:52
johnsomYeah, that is some of the background on that patch20:52
*** trown is now known as trown|outtypewww20:52
johnsomWe have about 5 minutes, were there other topics we needed to discuss?20:53
johnsomOk, just wanted to check. Sometimes we run out of time before we discuss everything.20:55
johnsomOh, BTW, the devstack patch still hasn't merged, so the barbican job is still going to fail.20:55
cgoncalveswhich devstack patch?20:56
johnsom#link https://review.openstack.org/64895120:56
cgoncalvesthanks20:58
johnsomOk, sounds like we are wrapping up.  Thanks folks!  Have a great week.20:58
johnsom#endmeeting20:58
*** openstack changes topic to "Discussions for OpenStack Octavia | Train PTG etherpad: https://etherpad.openstack.org/p/octavia-train-ptg"20:58
openstackMeeting ended Wed Apr  3 20:58:30 2019 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)20:58
openstackMinutes:        http://eavesdrop.openstack.org/meetings/octavia/2019/octavia.2019-04-03-19.59.html20:58
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/octavia/2019/octavia.2019-04-03-19.59.txt20:58
openstackLog:            http://eavesdrop.openstack.org/meetings/octavia/2019/octavia.2019-04-03-19.59.log.html20:58
*** gcheresh has quit IRC21:04
*** pcaruana has quit IRC21:06
openstackgerritMichael Johnson proposed openstack/octavia master: Fix spare amphora check and creation  https://review.openstack.org/64938121:14
cgoncalvesa db migration after all21:14
cgoncalvesmakes sense, I guess21:15
johnsomYeah, I think that is the best option to not end up with a race.21:15
*** abaindur has joined #openstack-lbaas21:18
*** boden has quit IRC21:39
*** rcernin has quit IRC21:42
*** velizarx has quit IRC21:43
*** goldyfruit has quit IRC21:46
*** fnaval_ has quit IRC22:26
openstackgerritMerged openstack/octavia master: Fix setting of VIP QoS policy  https://review.openstack.org/64581722:37
openstackgerritMichael Johnson proposed openstack/python-octaviaclient master: Adds "unset" action to the loadbalancer command  https://review.openstack.org/64975822:40
*** yamamoto has joined #openstack-lbaas22:43
*** abaindur has quit IRC22:44
*** yamamoto has quit IRC22:47
rm_workaugh for some reason this meeting isn't on my calendar anymore22:49
johnsomWell, the time might be changing....22:49
johnsomWe pinged you a few times22:49
rm_workstill getting through scrollback22:51
rm_work"cgoncalves:would it work not to fail over amps unless at least one heartbeat from any amp is received by the HM on start up?" actually sounds like a decently easy switch to flip too22:51
*** fnaval has joined #openstack-lbaas22:53
rm_workyeah22:57
rm_worki schedule my life around my calendar meeting times22:57
rm_workif it's not on my calendar, i probably won't make it22:57
rm_worklol22:57
rm_workjust fixing it now22:58
*** yamamoto has joined #openstack-lbaas23:00
*** abaindur has joined #openstack-lbaas23:07
johnsomDoodle poll: https://doodle.com/poll/9sxbzfhwirqiyqe823:11
rm_workthe times constantly change on me as i travel <_<23:12
johnsomE-mail sent23:26
*** abaindur has quit IRC23:48
*** abaindur has joined #openstack-lbaas23:49
openstackgerritMichael Johnson proposed openstack/octavia stable/stein: Fix spare amphora check and creation  https://review.openstack.org/64976623:50
johnsomCores: We would like to get these two fixes into Stein: https://review.openstack.org/#/c/649503/ and https://review.openstack.org/#/c/649766/23:50
johnsomWe need those landed by tomorrow23:51
*** abaindur has quit IRC23:53
