Friday, 2019-08-23

*** dtruong has quit IRC00:07
*** dtruong has joined #openstack-lbaas00:07
*** bzhao__ has joined #openstack-lbaas00:19
johnsomOk, done with my action items from this morning.  Signing off for the evening.00:28
*** sapd1_x has joined #openstack-lbaas01:30
*** goldyfruit has quit IRC03:01
*** psachin has joined #openstack-lbaas03:02
*** KeithMnemonic1 has quit IRC03:06
*** KeithMnemonic has joined #openstack-lbaas03:12
*** ramishra has joined #openstack-lbaas03:17
*** ricolin has quit IRC04:14
*** ricolin has joined #openstack-lbaas04:21
openstackgerritHidekazu Nakamura proposed openstack/octavia master: Add install guide for Ubuntu  https://review.opendev.org/67284205:00
*** ricolin has quit IRC05:02
*** ricolin has joined #openstack-lbaas05:03
*** takamatsu has joined #openstack-lbaas07:08
*** ricolin has quit IRC07:15
*** rcernin has quit IRC07:15
*** trident has quit IRC07:25
*** trident has joined #openstack-lbaas07:31
*** sapd1_ has joined #openstack-lbaas07:33
*** sapd1 has quit IRC07:37
*** rpittau|afk is now known as rpittau08:14
*** tkajinam has quit IRC08:19
*** ivve has joined #openstack-lbaas08:26
openstackgerritCarlos Goncalves proposed openstack/octavia stable/stein: worker: Re-add FailoverPreparationForAmphora  https://review.opendev.org/67818008:33
openstackgerritCarlos Goncalves proposed openstack/octavia stable/rocky: worker: Re-add FailoverPreparationForAmphora  https://review.opendev.org/67818108:33
openstackgerritCarlos Goncalves proposed openstack/octavia stable/queens: worker: Re-add FailoverPreparationForAmphora  https://review.opendev.org/67818208:34
*** sapd1_x has quit IRC08:47
*** salmankhan has joined #openstack-lbaas08:50
*** gcheresh has joined #openstack-lbaas09:03
*** ajay33 has joined #openstack-lbaas09:07
rm_worki'm going to be on during the later side of the day this time, flipping my schedule around09:42
rm_worksleep time now ;)09:42
*** gcheresh has quit IRC09:50
*** psachin has quit IRC10:04
*** psachin has joined #openstack-lbaas10:06
*** maciejjozefczyk has joined #openstack-lbaas10:06
*** maciejjozefczyk has quit IRC10:07
*** roukoswarf has quit IRC10:24
*** rouk has joined #openstack-lbaas10:24
openstackgerritAnn Taraday proposed openstack/octavia master: Convert pool flows to use dicts  https://review.opendev.org/66538110:29
openstackgerritAnn Taraday proposed openstack/octavia master: Transition amphora flows to dicts  https://review.opendev.org/66889810:29
openstackgerritAnn Taraday proposed openstack/octavia master: [WIP] Lb flows to dicts  https://review.opendev.org/67172510:29
openstackgerritAnn Taraday proposed openstack/octavia master: [WIP] Jobboard based controller  https://review.opendev.org/64740610:45
*** tesseract has joined #openstack-lbaas11:12
*** goldyfruit has joined #openstack-lbaas11:14
*** salmankhan has quit IRC11:30
*** salmankhan has joined #openstack-lbaas11:31
openstackgerritAnn Taraday proposed openstack/octavia master: [WIP] Lb flows to dicts  https://review.opendev.org/67172511:58
*** spatel has joined #openstack-lbaas12:43
*** spatel has quit IRC12:48
*** roukoswarf has joined #openstack-lbaas12:59
*** rouk has quit IRC13:00
*** spatel has joined #openstack-lbaas13:14
*** spatel has quit IRC13:17
openstackgerritAnn Taraday proposed openstack/octavia master: Transition amphora flows to dicts  https://review.opendev.org/66889813:23
openstackgerritAnn Taraday proposed openstack/octavia master: [WIP] Lb flows to dicts  https://review.opendev.org/67172513:23
openstackgerritAnn Taraday proposed openstack/octavia master: [WIP] Jobboard based controller  https://review.opendev.org/64740613:23
*** ivve has quit IRC13:24
*** psachin has quit IRC13:47
*** ajay33 has quit IRC13:52
*** ramishra has quit IRC14:01
*** salmankhan has quit IRC14:14
*** salmankhan has joined #openstack-lbaas14:26
*** salmankhan has quit IRC15:10
*** salmankhan has joined #openstack-lbaas15:13
*** ccamposr has quit IRC15:17
*** rpittau is now known as rpittau|afk15:32
*** Vorrtex has joined #openstack-lbaas15:51
*** salmankhan has quit IRC15:55
*** salmankhan has joined #openstack-lbaas16:04
*** tesseract has quit IRC16:11
*** Vorrtex has quit IRC16:29
*** Vorrtex has joined #openstack-lbaas16:39
*** gcheresh has joined #openstack-lbaas18:07
gregworkis there any way of clearing a load balancer in PENDING_UPDATE from a failed stack delete ?19:02
gregworki mean i think the stack delete failed because the delete on the loadbalancer got stuck in pending_update19:02
johnsomSo LBs don't get "stuck" unless you hard kill the controller that owned that resource. By owned, I mean it was actively working on the load balancer.19:03
johnsomYou may have very high retry timeouts, which means it could be 25+ minutes before it gives up, but they don't get stuck.19:04
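For reference, a hedged sketch of where those retry timeouts live — the option names below are from the [haproxy_amphora] section of a typical octavia.conf, and the path is an assumption for this deployment:

    # Show the retry settings that bound how long the controller keeps
    # working on an amphora before giving up and marking the resource ERROR
    grep -E 'connection_max_retries|connection_retry_interval' /etc/octavia/octavia.conf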
johnsomNow, if you hard killed the controller (kill -9, pulled the power, etc.) you can leave orphaned resources in a PENDING_* state.19:04
johnsomWe are actually working on solving that as well (it's the jobboard effort).19:05
gregworkso that didnt happen19:05
gregworkthese resources were stood up by openshift-ansible, and got stuck tearing that environment down via openstack stack delete openshift-cluster19:05
gregworkno servers have been abruptly turned off19:06
gregworkit just got stuck for some reason19:06
johnsomMaybe it did a kill -9 and not a graceful shutdown (kill -15)?19:06
*** bzhao__ has quit IRC19:06
johnsomAll of our code paths lead back to an unlocked state, either ERROR or ACTIVE.19:07
gregworkim not sure where any killing occurs, it was stood up by the stack create, and a standard stack delete occurred so i imagine its just running whatever OS::Octavia:Whatever calls19:07
johnsomAnyway, here is how to check what is going on and potentially fix it19:07
johnsomWhat is OS::Octavia?19:07
gregworki was generalizing on the heat template that setup octavia19:07
johnsomDid someone create an ansible module?19:07
gregworkall of this is being orchestrated by heat19:08
gregworkopenshift-ansible creates a heat stack that sets up the environment19:08
gregworkthen does a openstack stack create on that template19:08
johnsomAh, hmm. Maybe we should try to find that heat code and see if they are kill -9'ing the processes.19:08
*** ash2307 has joined #openstack-lbaas19:08
gregworkthe stack create completes without issue, its just when we do the teardown of that stack19:09
gregworkthat it breaks19:09
*** salmankhan has quit IRC19:09
johnsomAnyway, so first thing is to check the logs for all of the controller processes and make sure it's not retrying actions on the resource you want to unlock. Typically they log "warning" messages when they are still retrying an action (for example if nova is failing).19:09
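A minimal sketch of that log check, assuming systemd-managed services and journald (containerized TripleO deployments would use podman/docker logs instead); the LB UUID is a placeholder:

    LB_ID=<uuid-of-stuck-lb>
    # Look for retry warnings that mention the stuck load balancer on each controller
    for svc in octavia-worker octavia-health-manager octavia-housekeeping; do
        journalctl -u "$svc" --since "2 hours ago" | grep "$LB_ID" | grep -iE 'warn|retry'
    done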
johnsomYeah, I don't think we have seen that heat code, so I can't really speak to it.19:10
johnsomIf all of the instances of the Octavia controllers are idle and not actively working on the resource, then you can go into the octavia database in mysql and update the load balancer record to provisioning_status = 'ERROR'. Here is example SQL:19:11
johnsomupdate load_balancer set provisioning_status = 'ERROR' where id = '<UUID of LB>';19:12
johnsomThen use the openstack command to delete the resource "openstack loadbalancer delete --cascade <lb uuid or name>"19:12
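Pulled together, a minimal sketch of that recovery path — it assumes no controller still owns the resource (see the warning that follows), that mysql can be run without extra credentials, and the UUID is a placeholder:

    LB_ID=<uuid-of-stuck-lb>
    # Force the provisioning status to ERROR so the API will accept a delete
    mysql octavia -e "update load_balancer set provisioning_status = 'ERROR' where id = '${LB_ID}';"
    # Cascade delete removes the LB together with its listeners, pools and members
    openstack loadbalancer delete --cascade "$LB_ID"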
gregworkalright19:13
johnsomHowever, it is very important that you make sure no controller has active ownership of the resource. If it does, and this process is followed, you will potentially orphan resources in other services, get into failover/retry loops, and/or end up with duplicate ghost resources.19:13
gregworkjohnsom: zaneb in #heat says heat doesnt touch the octavia process19:14
gregworkin regards to if it -9 or -15's it19:14
johnsomOk, then I would carefully check the logs as one of the controllers probably still has ownership and is retrying something.19:15
johnsomThe controller logs would also reflect the unsafe shutdown as we log safe shutdowns, but if it was kill -9 there would be no shutdown messages, just a new startup.19:17
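A hedged way to spot that in the logs, again assuming journald; the exact message wording differs between releases, so treat the patterns as approximations:

    # A graceful stop logs a shutdown message; after a kill -9 there is
    # only the next startup message with no shutdown before it
    journalctl -u octavia-worker --since "1 day ago" | grep -iE 'shutdown|shutting down|starting|started'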
gregwork2019-08-23 15:01:36.602 26 INFO octavia.api.v2.controllers.load_balancer [req-16342311-7390-497f-9dd6-8f5afd493523 - 1d11e677a9e94ebca0dcea2b9ae9a7fe - default default] Invalid state PENDING_UPDATE of loadbalancer resource 99ea5fd5-eb1a-41e0-828a-7488acf6157719:23
gregworki see this guy on one of my controllers19:23
gregworkbefore that19:25
gregworkhttps://pastebin.com/SDmDmZYv19:25
gregworkwith traceback19:25
johnsomYeah, so that message says that something attempted to make a change to a load balancer while it was still in the PENDING_UPDATE state. The API will then return a 409 HTTP status code and the client should try again later (a 409 Conflict is a retryable condition in REST terms).19:25
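A client-side sketch of "try again later", assuming the openstack CLI is available: poll until the load balancer leaves the PENDING_* state before re-issuing the change.

    LB_ID=<uuid-of-lb>
    # Wait for the in-progress operation to finish (or give up) before retrying
    while openstack loadbalancer show "$LB_ID" -f value -c provisioning_status | grep -q '^PENDING_'; do
        sleep 10
    done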
johnsomYou are running queens? Which version?19:28
gregworkhttps://pastebin.com/vqwzxTqH19:28
gregworkthose appear to be the only other tracebacks19:29
gregworkip address already allocated in subnet19:30
gregworkim not sure how to tell if its in use19:30
gregworkis there something i can do on the controller19:31
gregworkwere kind of stuck until we can bring this stack down19:31
johnsomThe IP address in use should have just returned an error to the user saying the address was already in use on the subnet. That is a passive error.19:32
johnsomThat is a create call as well, not a delete.19:32
johnsomI'm looking at the code for your first traceback.19:33
gregworkgotcha19:33
johnsomgregwork Can you run a "openstack loadbalancer status show <lbid>" on that load balancer and paste it?19:37
gregworksure19:40
gregworkstats ?19:40
gregworki dont have status19:40
johnsomThis command: https://docs.openstack.org/python-octaviaclient/latest/cli/index.html#loadbalancer-status-show19:41
johnsomOh, the queens client was missing that.19:42
gregworkyeah :/19:42
johnsomWould you mind installing a newer version in a venv and running that command?  Queens supports it; the client just didn't have it for queens.19:43
gregworkthe site im in has limited internet access19:46
gregworkis there a neutron equivalent19:46
gregworkor non osc method19:46
johnsomNo, that is it for CLI. I could give you curl commands if you are really adventurous19:47
johnsomcurl -v -X GET -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Auth-Token: $current_token"  $test_API_ENDPOINT/v2.0/lbaas/loadbalancers/$test_lb_id/status19:48
johnsomexport test_API_ENDPOINT=$(openstack endpoint list --service load-balancer --interface public -f value -c URL)19:48
johnsomexport current_token=`curl -i -s -H "Content-Type: application/json"  -d '{"auth":{"identity":{"methods":["password"],"password":{"user":{"name":"admin","domain":{"id":"default"},"password":"password"}}},"scope":{"project":{"name":"admin","domain":{"id":"default"}}}}}'   http://$test_API_IP/identity/v3/auth/tokens | awk '/X-Subject-Token/ {print $2}' | tr -cd "[:print:]"`19:49
johnsomYour traceback has my interest. It implies that the load balancer had a listener on it that is missing from the database, which should not be possible given the database constraints. The status tree dump would give me the LB data model.19:52
gregworkhaving some difficulty with those curls .. getting a 404 talking to keystone20:03
*** ivve has joined #openstack-lbaas20:03
johnsomMaybe your cloud uses the old endpoint scheme?20:04
johnsomopenstack endpoint list | grep keystone should show the proper URL path20:05
johnsomIt might have :5000 on it if it's the old form20:05
gregworkidentity       | True    | public    | https://overcloud.idm.symrad.com:1300020:06
gregworkonly one accessible20:06
johnsomHa, ok so to get a token you would use:20:07
johnsomexport current_token=`curl -k -i -s -H "Content-Type: application/json"  -d '{"auth":{"identity":{"methods":["password"],"password":{"user":{"name":"admin","domain":{"id":"default"},"password":"password"}}},"scope":{"project":{"name":"admin","domain":{"id":"default"}}}}}' https://overcloud.idm.symrad.com:13000/v3/auth/tokens | awk '/X-Subject-Token/ {print $2}' | tr -cd "[:print:]"`20:07
johnsomyou might also be able to get one with "openstack token issue" if your client has that20:10
johnsomI keep forgetting about that command20:11
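Putting those pieces together, a hedged sketch of the whole status dump — it uses openstack token issue instead of the raw keystone call, assumes admin credentials are already loaded in the environment, and reuses the LB UUID from the earlier log message:

    export current_token=$(openstack token issue -f value -c id)
    export lb_endpoint=$(openstack endpoint list --service load-balancer \
        --interface public -f value -c URL)
    LB_ID=99ea5fd5-eb1a-41e0-828a-7488acf61577
    # Dump the load balancer status tree (listeners, pools, members)
    curl -s -k -H "Accept: application/json" -H "X-Auth-Token: $current_token" \
        "${lb_endpoint}/v2.0/lbaas/loadbalancers/${LB_ID}/status"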
gregwork{"statuses": {"loadbalancer": {"listeners": [{"pools": [{"members": [{"name": "", "provisioning_status": "ACTIVE", "address": "172.20.1.7", "protocol_port": 8443, "id": "a97f1b82-cc06-432d-8ffd-cdf04d5f77db", "operating_status": "NO_MONITOR"}, {"name": "", "provisioning_status": "ACTIVE", "address": "172.20.1.8", "protocol_port": 8443, "id": "50b00457-e267-4fb1-bef7-e7d4807955ac", "operating_status":20:12
gregwork"NO_MONITOR"}], "provisioning_status": "ACTIVE", "id": "dfdf6769-01a9-490b-af40-d4eff5f22c47", "operating_status": "ONLINE", "name": "openshift-ansible-openshift.idm.symrad.com-api-lb-pool"}], "provisioning_status": "ACTIVE", "id": "bb12eacf-39c6-4e03-b8b1-5c05f3155181", "operating_status": "ONLINE", "name": "openshift-ansible-openshift.idm.symrad.com-api-lb-listener"}], "provisioning_status": "PENDING_UPDATE",20:12
gregwork"id": "99ea5fd5-eb1a-41e0-828a-7488acf61577", "operating_status": "ONLINE", "name": "openshift-ansible-openshift.idm.symrad.com-api-lb"}}}20:12
gregworkdo we think it is safe to mysql delete ?20:13
gregworkthis is the only LB in the stack left20:13
gregworkwe are deleting everything, all the instances are gone20:13
gregworkthe network will be torn down20:13
gregworketc20:13
johnsomYeah, if you have checked the logs and it's not scrolling warnings, it's probably safe20:13
johnsomThank you for taking the time on this, I want to understand that traceback20:14
johnsomI do see how it didn't roll back correctly. That was fixed in Rocky as part of some other work. I should probably create a special backport patch for queens for that issue. The test and lock are outside the larger transaction in queens and aren't set to roll back as they should.20:15
johnsomThat is really strange. So it's a simple load balancer with just one listener.20:17
johnsomIf you still have the DB, it would be interesting to see if there is a listener here with id "bb12eacf-39c6-4e03-b8b1-5c05f3155181", I think it is, but....20:17
gregworkoctavia db name is octavia or octavia_api20:19
johnsomoctavia20:19
johnsommysql octavia20:19
johnsomshould open it20:19
gregworktrying to figure out how tripleo locked that down19:19
gregworkgot it20:20
gregworkok we're in the db, any last requests before we blow up the bad lb20:21
gregworkredhat is asking about the bug in rocky you mentioned.. could you provide the link for them20:21
gregworkthey are on my webex20:21
gregworkso just pasting it here would work20:21
johnsomselect * from listener where id = "bb12eacf-39c6-4e03-b8b1-5c05f3155181";20:22
johnsomBug in Rocky?20:22
gregworkfrom your earlier comment20:22
gregworkabout rollback not working?20:22
gregworkbtw the results of that query20:23
gregworkhttps://pastebin.com/E3kE9zyC20:23
johnsomThe roll back I mentioned is a bug in queens, not Rocky20:23
gregworkyeah do you have a reference to the bug i can pass them20:23
gregworka launchpad or something20:23
johnsomI don't have one. I don't think there is an open bug/story for that.20:23
johnsomYou can give them this link: https://github.com/openstack/octavia/blob/stable/queens/octavia/api/v2/controllers/member.py#L27320:24
* johnsom notes, it's just going to come back to me anyway....20:24
gregworkok gonna run that sql update20:27
johnsomSo, that is strange. So the root cause doesn't make any sense. Your traceback shows a DB query was made for that listener and it got back no results, which it should not be able to do due to the DB relations. Plus the ID it was querying should have come from the DB. I wonder if there is a sqlalchemy bug here.20:27
*** gcheresh has quit IRC20:35
gregworkhmmn, the load balancer cleaned up but i think there are some ports left over20:36
johnsomWell, going to ERROR and then deleting the LB should have cleaned up any ports that load balancer had. They might be from a different issue, or source.20:37
johnsomThe controller would have logged if neutron refused to delete a port when we asked it to.20:38
cgoncalvesgregwork: hi. FYI, both johnsom and I are Red Hatters :)20:44
*** Vorrtex has quit IRC20:47
gregworki figured20:48
gregworkjohnsom: possibly, i know there is a possibility of badness because we are going under the hood tho20:49
gregworkwe are getting 409 conflict errors trying to kill these ports20:49
gregworkthey are not attached to any host id which is odd20:49
gregworkand we just cleaned up the amphora20:49
gregworkthere are also no instances in this tenant on the networks these ports are attached to anymore20:50
gregworkso they are kind of phantoms20:50
johnsomWhat are the port names?20:50
johnsomThe only thing I can think of that would cause neutron to 409 the port delete is if there is a floating IP attached to the port.20:52
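A quick sketch for checking that before forcing anything — it assumes a reasonably recent python-openstackclient (the --port filter on floating ip list may not exist on older clients) and a placeholder port UUID:

    PORT_ID=<uuid-of-leftover-port>
    # See who (if anyone) still owns the port and whether a floating IP references it
    openstack port show "$PORT_ID" -c device_owner -c device_id -c fixed_ips
    openstack floating ip list --port "$PORT_ID"
    # If a floating IP is attached, disassociate or delete it first, then retry:
    # openstack port delete "$PORT_ID"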
johnsomThough I'm not a neutron port expert, so there may be other causes20:52
gregworkactually i think they are ports for openshift nodes that didnt get cleaned up20:53
johnsomAh, ok.20:54
openstackgerritMichael Johnson proposed openstack/octavia master: WIP: Generate PDF documentation  https://review.opendev.org/66724921:44
*** takamatsu has quit IRC21:55
*** rcernin has joined #openstack-lbaas22:11
*** KeithMnemonic has quit IRC22:13
*** ivve has quit IRC22:17
colin-this is O/T but wondered if anybody has tried out https://github.com/cilium/cilium to test benefits of bpf/xdp in software lbs?22:35
johnsomI think there was a group that did a XDP proof-of-concept and presented at a summit about it.22:43
johnsomhttps://www.youtube.com/watch?v=1oAsRzrwAAw22:44
colin-ah nice that's relevant thanks22:47
johnsomYeah, I went to that talk22:48
johnsomWe are doing something similar to that with the TCP flows. We use kernel splicing, so once it's established it is pretty much in and out at the kernel level22:52
johnsomJust we don't have to write the BPF ourselves22:53
johnsomhttp://www.linuxvirtualserver.org/software/tcpsp/index.html22:54
johnsomThough that is old info, it's in the main kernel now22:55
*** rcernin has quit IRC23:11
*** rcernin has joined #openstack-lbaas23:12
openstackgerritMichael Johnson proposed openstack/octavia master: WIP: Generate PDF documentation  https://review.opendev.org/66724923:18
