Monday, 2019-08-05

01:45 *** yamamoto has joined #openstack-lbaas
02:04 *** yamamoto has quit IRC
02:04 *** yamamoto has joined #openstack-lbaas
02:35 *** yamamoto has quit IRC
02:38 *** yamamoto has joined #openstack-lbaas
02:51 *** ricolin has joined #openstack-lbaas
03:29 *** abaindur has joined #openstack-lbaas
03:30 *** psachin has joined #openstack-lbaas
03:35 *** ricolin_ has joined #openstack-lbaas
03:38 *** ricolin has quit IRC
03:54 *** abaindur has quit IRC
04:04 *** ramishra has joined #openstack-lbaas
04:45 *** ramishra has quit IRC
04:45 *** ramishra has joined #openstack-lbaas
04:58 <openstackgerrit> Merged openstack/octavia-tempest-plugin master: Adds provider flavor capabilities API tests  https://review.opendev.org/631113
04:58 <openstackgerrit> Merged openstack/octavia-tempest-plugin master: Add a flavor to the load balancer CRUD scenarios  https://review.opendev.org/631353
04:59 <openstackgerrit> Merged openstack/octavia-tempest-plugin master: Add amphora update service client and API test  https://review.opendev.org/633295
04:59 <openstackgerrit> Merged openstack/octavia-tempest-plugin master: Add amphora failover API test  https://review.opendev.org/633614
05:13 *** vishalmanchanda has joined #openstack-lbaas
05:18 *** abaindur has joined #openstack-lbaas
05:35 *** abaindur has quit IRC
05:47 *** yamamoto has quit IRC
05:48 *** yamamoto has joined #openstack-lbaas
05:54 *** abaindur has joined #openstack-lbaas
06:05 *** abaindur has quit IRC
06:06 *** ramishra has quit IRC
06:06 *** ramishra has joined #openstack-lbaas
06:12 *** abaindur has joined #openstack-lbaas
06:31 *** abaindur has quit IRC
06:41 *** ricolin__ has joined #openstack-lbaas
06:41 *** ricolin__ is now known as ricolin
06:45 *** ltomasbo has left #openstack-lbaas
06:45 *** ricolin_ has quit IRC
06:46 *** ccamposr has joined #openstack-lbaas
07:02 *** maciejjozefczyk has joined #openstack-lbaas
07:03 *** maciejjozefczyk has quit IRC
07:04 *** rcernin has quit IRC
07:08 *** pcaruana has joined #openstack-lbaas
07:17 *** tesseract has joined #openstack-lbaas
07:33 *** maciejjozefczyk has joined #openstack-lbaas
07:51 *** rpittau|afk is now known as rpittau
07:59 <openstackgerrit> Ann Taraday proposed openstack/octavia master: Convert pool flows to use dicts  https://review.opendev.org/665381
07:59 <openstackgerrit> Ann Taraday proposed openstack/octavia master: Transition amphora flows to dicts  https://review.opendev.org/668898
07:59 <openstackgerrit> Ann Taraday proposed openstack/octavia master: [WIP] Lb flows to dicts  https://review.opendev.org/671725
07:59 <openstackgerrit> Ann Taraday proposed openstack/octavia master: [WIP] Jobboard based controller  https://review.opendev.org/647406
08:11 *** tkajinam has quit IRC
08:13 *** yamamoto has quit IRC
08:19 <openstackgerrit> Maciej Józefczyk proposed openstack/octavia master: Validate supported LB algorithm in Amphora provider drivers  https://review.opendev.org/672477
08:22 *** tesseract-RH has joined #openstack-lbaas
08:22 *** tesseract has quit IRC
08:23 <openstackgerrit> Maciej Józefczyk proposed openstack/octavia-tempest-plugin master: Specify used algorithm for tests  https://review.opendev.org/672264
08:23 <openstackgerrit> Maciej Józefczyk proposed openstack/octavia-tempest-plugin master: Add an option to reuse connections  https://review.opendev.org/672976
08:24 *** tesseract-RH has quit IRC
08:24 *** tesseract has joined #openstack-lbaas
08:54 *** yamamoto has joined #openstack-lbaas
09:09 *** yamamoto has quit IRC
09:21 *** ajay33 has joined #openstack-lbaas
09:42 *** lemko has joined #openstack-lbaas
09:43 <lemko> One of my load balancers and its single amphora failed because of some issue with the database. When I do "openstack loadbalancer failover <loadbalancer_id>", it creates another amphora, but it immediately goes into ERROR state. As a result, there are four amphorae for this load balancer, all in ERROR state. Any idea what I can do?
09:44 *** yamamoto has joined #openstack-lbaas
09:49 *** dasp has quit IRC
09:49 *** dasp has joined #openstack-lbaas
11:24 *** ramishra has quit IRC
11:26 *** ramishra has joined #openstack-lbaas
11:26 *** devfaz has quit IRC
11:30 *** devfaz has joined #openstack-lbaas
11:35 *** mkuf_ is now known as mkuf
11:53 *** yamamoto has quit IRC
12:07 <jrosser> johnsom: when we try to locally build an amphora from the stable/stein branch code, something run inside diskimage-create.sh does "Cloning from amphora-agent cache and applying ref master", so we get a master-version amphora on a Stein cloud, which doesn't work..... is there somewhere to specify the amphora branch to build, on top of checking out the stein branch of the octavia code?
12:09 <maciejjozefczyk> hey! looks like we have some gate trouble after the octavia-lib release: https://logs.opendev.org/77/672477/4/check/openstack-tox-py27/5af555f/job-output.txt.gz
12:09 <jrosser> johnsom: looks like things break because the API version is now changed on master here: https://github.com/openstack/octavia/commit/37799137a3f1f5ff6aa0f8809a141d4ea04cca75
12:13 *** yamamoto has joined #openstack-lbaas
12:24 *** yamamoto has quit IRC
12:46 *** ramishra has quit IRC
12:48 *** yamamoto has joined #openstack-lbaas
13:02 <johnsom> lemko You will need to look in the worker log to see why the controller is unable to build a replacement. The amphora in error state will get cleaned up by the housekeeping manager eventually. They should not have actual nova VMs behind them.
13:06 *** goldyfruit has joined #openstack-lbaas
13:06 <johnsom> jrosser https://github.com/openstack/octavia/blob/master/devstack/plugin.sh#L72
13:07 <johnsom> You can set two environment variables to override the version of the agent it pulls in. I thought we had that in the README file, but I don't see it.
13:07 <jrosser> johnsom: ah great, will take a look at those
13:09 <johnsom> jrosser So, as of this critical patch, you can run older amphora image versions with the current controllers, but you can't run a newer amphora image on an older controller. This is due to the API version change.
13:09 <jrosser> yes, that's what we see today
13:09 <jrosser> the amphora has been built with the 1.0 API and then things are all a bit broken
13:10 <johnsom> Yeah, you would need an updated controller set
13:10 <jrosser> so it should be possible to pin the repo back to stable/stein for the amphora build then?
13:10 <johnsom> Yes
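A minimal sketch of that pin, assuming the standard diskimage-builder per-element override variables (DIB_REPOLOCATION_* / DIB_REPOREF_*) that the Octavia devstack plugin uses for the amphora-agent element; verify the exact names against the plugin.sh link above before relying on them:

    # Build a Stein-compatible amphora image from the stable/stein branch
    export DIB_REPOLOCATION_amphora_agent=https://opendev.org/openstack/octavia
    export DIB_REPOREF_amphora_agent=stable/stein
    ./diskimage-create/diskimage-create.sh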
13:11 *** goldyfruit has quit IRC
13:11 <johnsom> maciejjozefczyk Yes, there has been a patch up for that issue: https://review.opendev.org/#/c/673687/
13:12 <maciejjozefczyk> johnsom, thanks
13:13 <jrosser> johnsom: so relatedly, from an openstack-ansible perspective I guess this is now only good for master? https://github.com/openstack/openstack-ansible-os_octavia/blob/master/defaults/main.yml#L245
13:14 <maciejjozefczyk> johnsom, did the same in our unit tests; I'm working on functionals to enable those foo listeners, the same way you proposed here: https://review.opendev.org/#/c/665029/21/octavia/tests/functional/api/drivers/driver_agent/test_driver_agent.py
13:15 <maciejjozefczyk> johnsom, but for now I'm stuck on a duplicate config option: http://paste.openstack.org/show/755513/
13:15 <johnsom> jrosser Yes, those are master-only images
13:25 <lemko> thanks johnsom, so the log is here: http://paste.openstack.org/show/755519/ about "PortNotFound: port not found (port id: 42a01db7-0b95-49fd-afd1-cbe96c713b4a)" and the operation is aborted after that. Any idea how to fix it?
13:27 <johnsom> The actual error may have occurred prior to that log snippet.
13:28 <johnsom> Usually the root issue is above that tree in the log
13:33 <lemko> Here it is, johnsom: http://paste.openstack.org/show/755520/
13:34 <lemko> Is "Instance could not be found" the issue?
13:36 <johnsom> No, that actually looks ok. It says it's assuming it's already been deleted and moving on.
13:36 <johnsom> It looks like this task "octavia.controller.worker.tasks.network_tasks.GetAmphoraeNetworkConfigs" is the one failing
13:37 <johnsom> So, it is failing when trying to look up the VIP port.
13:38 <lemko> So if the port is not found, what can be done?
13:39 <johnsom> Well, this is very timely to report, as I'm about to start work on fixing the failover flow. This is a good use case that needs to be fixed.
13:41 <johnsom> We can probably recover it if it is worth the effort vs. just deleting and re-building the LB. Let me know which you prefer
13:42 <johnsom> I have opened this story to track your situation
13:42 <johnsom> https://storyboard.openstack.org/#!/story/2006333
13:44 <lemko> If I can help the community, I would be happy ;)
13:44 <lemko> It is worth the effort for me, because it might happen again in the future and I would like to be able to fix it
13:44 <johnsom> Lol. Yeah, fixing the failover flow is the next work item on my list. It has some deficiencies when resources disappear under the load balancer.
13:45 <johnsom> Ok, give me a minute to get set up and I will work with you through the process
13:47 <johnsom> Ok, I need to restack a cloud for this. Give me ~10 minutes to get set up.
13:47 <lemko> ok, thanks a lot!
13:47 *** pcaruana has quit IRC
13:50 *** ajay33 has quit IRC
13:58 *** yamamoto has quit IRC
14:00 *** pcaruana has joined #openstack-lbaas
14:06 <johnsom> lemko Ok, ready to get started. Is the load balancer active/standby?
14:07 *** ramishra has joined #openstack-lbaas
14:07 <lemko> SINGLE
14:07 <lemko> Standalone, I mean
14:07 <johnsom> Ah, ok.
14:07 <johnsom> First up, let's have a look at what is in the DB for this amp.
14:08 <johnsom> connect to the octavia database (mysql octavia)
14:08 <johnsom> select * from amphora;
14:08 *** yamamoto has joined #openstack-lbaas
14:08 <johnsom> select * from amphora where load_balancer_id = '<id>';
14:08 <lemko> ok.
14:08 <johnsom> Actually, looking for just the one amp is probably easier.
14:11 <lemko> here it is: https://pastebin.com/0pdscaLL
14:11 <lemko> you can see 7 amphorae in error state
14:12 <johnsom> Ok, reading
14:13 <johnsom> Ok, good. Can you do an "openstack port show 5891db34-622b-40c5-88ad-a08919ad0da0"?
14:14 <johnsom> Let's see if the other port is present or not.
14:14 <lemko> the port is present
14:14 <lemko> https://pastebin.com/jve0WiGn
14:15 <johnsom> Ok, cool. So half the battle is done already.
14:16 <johnsom> Now "openstack port list | grep fa:16:3e:d0:8d:"
14:17 <lemko> | 5891db34-622b-40c5-88ad-a08919ad0da0 | octavia-lb-vrrp-153a6d53-a9cc-45f9-b189-5caf7f323609 | fa:16:3e:d0:8d:8c | ip_address='192.168.81.8', subnet_id='23e3aa95-62c9-4fdb-bb9e-8453afc6756e'   | DOWN   |
14:18 <johnsom> Ok, so the situation is that the base port is present, however neutron has lost its "allowed_address_pairs" port for some reason. It's still configured on the base port, but it doesn't actually exist in neutron.
14:19 <lemko> allowed_address_pairs should be the IP of the amphora responsible for it?
14:20 <johnsom> It should be the load balancer VIP address. When you add an allowed_address_pair to a port, neutron creates a "fake" port for it.
14:21 <lemko> should I delete it and add it again?
14:22 <lemko> or simply delete it?
14:22 <johnsom> Yeah, just a second, I will provide some commands to try
14:23 <johnsom> openstack port unset --allowed-address ip-address=192.168.81.6 5891db34-622b-40c5-88ad-a08919ad0da0
14:23 <johnsom> openstack port set --allowed-address ip-address=192.168.81.6,mac-address=fa:16:3e:d0:8d:8c 5891db34-622b-40c5-88ad-a08919ad0da0
14:23 <johnsom> Let's try that and see if neutron will let us do it
14:24 <lemko> Port does not contain allowed-address-pair {'ip_address': '192.168.81.6'}
14:24 <lemko> :)
14:25 <johnsom> Did both commands fail?
14:25 <lemko> yes.
14:25 <lemko> BadRequestException: 400: Client Error for url: http://neutron-server.openstack.svc.cluster.local:9696/v2.0/ports/5891db34-622b-40c5-88ad-a08919ad0da0, Request contains duplicate address pair: mac_address fa:16:3e:d0:8d:8c ip_address 192.168.81.6.
14:26 <johnsom> Sigh, ok, maybe we need to be more specific for neutron.
14:26 <lemko> for the second one
14:26 <lemko> ok.
14:26 <johnsom> let's try "openstack port unset --allowed-address ip-address=192.168.81.6,mac-address=fa:16:3e:d0:8d:8c 5891db34-622b-40c5-88ad-a08919ad0da0"
14:27 <lemko> good. yes
14:27 <lemko> should I do the second command and also specify the mac?
14:27 <johnsom> Ha, ok, let's run the second command again
14:27 <lemko> didn't read well, sorry
14:27 <lemko> ok, just ran it
14:28 <johnsom> No worries, I think we are doing fine.
14:28 <johnsom> Ok, so that passed this time?
14:28 <lemko> yes
14:28 <johnsom> Excellent! Now let's do this again: "openstack port list | grep fa:16:3e:d0:8d"
14:29 <lemko> | 5891db34-622b-40c5-88ad-a08919ad0da0 | octavia-lb-vrrp-153a6d53-a9cc-45f9-b189-5caf7f323609 | fa:16:3e:d0:8d:8c | ip_address='192.168.81.8', subnet_id='23e3aa95-62c9-4fdb-bb9e-8453afc6756e'   | DOWN   |
14:29 <johnsom> Just one result? I was hoping for two
14:29 <lemko> just one, yes.
14:30 <johnsom> how about if we do "openstack port list | grep 192.168.81.6"
14:31 <lemko> no result
14:31 *** yamamoto has quit IRC
14:32 <johnsom> Blah. Ok, so we are going to have to get more creative, as neutron isn't rebuilding the AAP port for us.
14:32 *** yamamoto has joined #openstack-lbaas
14:33 <johnsom> One second while I build a command for you
14:33 <lemko> Sad. What's AAP? amphora port?
14:35 <johnsom> Allowed_address_pairs (AAP)
14:35 <johnsom> It's a neutron term for the "fake" port they create when you add a secondary IP address on a neutron port.
14:36 <johnsom> Ok, let's give this a go: openstack port create --fixed-ip subnet=23e3aa95-62c9-4fdb-bb9e-8453afc6756e,ip-address=192.168.81.6 --disable --project 2a095cd2d94c4d888dd8e8edf8b851b3 temp-vip
14:37 *** yamamoto has quit IRC
14:38 <lemko> Ok, I created the port (I also added the --network field)
14:39 <johnsom> Ah, yes, that might be needed.
14:39 <johnsom> What is the ID of the port?
14:39 <lemko> 27bb002f-c515-403e-8093-52e0adec284d
14:40 <johnsom> Ok, back in the octavia database, let's do this to trick Octavia into working around the missing AAP port:
14:40 <johnsom> update amphora set ha_port_id = '27bb002f-c515-403e-8093-52e0adec284d' where load_balancer_id = '983c9b7c-6874-40a0-b626-d7694ddd93e6';
14:40 <johnsom> once that is done, let's do another load balancer failover command
14:41 <lemko> ok, it's going on
14:42 <johnsom> When it completes we will do a few more commands to see if we need to do any cleanup or not.
14:43 <lemko> ok, it already failed
14:43 <johnsom> Bummer, can you paste the worker log?
14:46 <lemko> http://paste.openstack.org/show/755525/
14:48 <lemko> + a tiny bit here: http://paste.openstack.org/show/755526/
14:50 <johnsom> Ah, I made a mistake and forgot to update the port ID in one more table.
14:50 <johnsom> In the DB we need to do this command as well:
14:50 <johnsom> update vip set port_id = '27bb002f-c515-403e-8093-52e0adec284d' where load_balancer_id = '983c9b7c-6874-40a0-b626-d7694ddd93e6';
14:51 <johnsom> Then we will try failover again
14:51 <lemko> ok
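For reference, the full workaround repoints two tables at the manually re-created port. As a generic sketch (the UUIDs in this session belong to this particular deployment; substitute your own):

    -- Point Octavia at the replacement VIP base port
    UPDATE amphora SET ha_port_id = '<new-port-id>' WHERE load_balancer_id = '<lb-id>';
    UPDATE vip SET port_id = '<new-port-id>' WHERE load_balancer_id = '<lb-id>';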
14:52 <lemko> it's now pending update
14:55 <lemko> still stuck in pending update
14:55 <johnsom> Give it some time
14:55 <lemko> I see from octavia-worker:
14:55 <lemko> 2019-08-05 14:54:59,164.164 17 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.: ConnectTimeout: HTTPSConnectionPool(host='10.22.0.6', port=9443): Max retries exceeded with url: /0.5/info (Caused by ConnectTimeoutError(<urllib3.connection.VerifiedHTTPSConnection object at 0x7f53e747a590>, 'Connection to 10.22.0.6 timed out. (connect timeout=10.0)'))
14:55 <johnsom> Yeah, that is normal. That is just saying that nova has not yet finished booting the VM
14:55 <johnsom> It will retry for a while, waiting for nova to finish the boot.
14:56 <johnsom> The nova status goes to Active as soon as the process launches, but that is not when the VM is actually booted, so we have to retry/wait to find when it actually comes up
14:57 <lemko> 10.22.0.6 is an amphora in `openstack loadbalancer list | grep 983c9b7c-6874-40a0-b626-d7694ddd93e6` which is in error state, and I don't see any VM with this IP in my tenant where the amphorae are
14:57 <johnsom> Typically that is about 30 seconds, but depending on your setup this could take up to 18 minutes.
14:58 <lemko> I am a bit confused, what will octavia try to do? create a VM and give it the IP 10.22.0.6?
14:58 <lemko> and why this IP, since it was already in the database when we started the procedure?
14:59 <johnsom> Well, assuming that log entry is from the load balancer we are working on (it may not be), it would have asked nova to boot a VM and nova gave it back 10.22.0.6 as the IP address nova assigned for the lb-mgmt-net
14:59 <johnsom> Nova can re-use IPs, which is fine, we account for that
15:01 <johnsom> So, when you look at our load balancer with "openstack loadbalancer show", is it marked in provisioning_status ERROR?
15:01 <johnsom> Not PENDING_UPDATE?
15:01 <lemko> PENDING_UPDATE.
15:01 <johnsom> Ok good, that is what we want to see
15:03 <johnsom> What do we have here? select * from amphora where load_balancer_id = '983c9b7c-6874-40a0-b626-d7694ddd93e6';
15:03 <lemko> BTW there's an ALLOCATED amphora with STANDALONE status for our LB, but it has a different IP
15:03 <lemko> I'll execute the select
15:04 <lemko> http://paste.openstack.org/show/755529/
15:05 <johnsom> Ok, that is good. That is what I would expect. Now, why it's still in pending update...  hmmm
15:06 <johnsom> Can you look in both the controller worker log and the health manager log to see if it's trying to contact "10.22.0.9"?
15:08 <lemko> No mention of it.
15:08 <lemko> only from house-keeping, 30 mins ago
15:08 <lemko> 2019-08-05 14:43:27,993.993 1 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.: ConnectionError: HTTPSConnectionPool(host='10.22.0.9', port=9443): Max retries exceeded with url: /0.5/info (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f3df53f91d0>: Failed to establish a new connection: [Errno 111] Connection refused',))
15:09 <lemko> when the amphora was booting and getting ready, it was full of logs like this, but that's the last one I sent you
15:10 <lemko> and octavia-worker is still trying to contact 10.22.0.6
15:11 <johnsom> Hmm. Have you tuned your startup timeouts or are you running with the defaults?
15:13 <johnsom> Do you have the spares pool enabled?
15:13 <lemko> Yes
15:13 <lemko> spares pool of 2
15:13 <lemko> here is the conf with the timeouts: http://paste.openstack.org/show/755530/
15:15 <lemko> full conf: http://paste.openstack.org/show/N3pHJTLKbVx5NIKVwTTw/
15:15 <johnsom> Yeah, ok, we are going to have to wait a while for the LB to go back to ERROR
15:16 <johnsom> You might consider the following settings:
15:16 <johnsom> [haproxy_amphora] connection_max_retries = 120
15:16 <johnsom> [haproxy_amphora] build_active_retries = 120
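As an octavia.conf snippet, that suggestion reads:

    [haproxy_amphora]
    connection_max_retries = 120
    build_active_retries = 120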
15:16 <lemko> now we're waiting for connection_max_retries to go away?
15:17 <johnsom> So, the spares pool brings an interesting twist to this. I wonder if there isn't a bug in the spares pool allocation on failover.
15:17 <johnsom> Yeah, we need the controller to release ownership of the load balancer. Basically we want it to come out of the PENDING_* state and either go ACTIVE or ERROR
15:18 <lemko> Is there a chance it can go to ACTIVE?
15:18 <johnsom> I'm going to look in the failover code path when spares pool is enabled.
15:18 <lemko> what would forcing it to ERROR do?
15:18 <johnsom> Bad things
15:19 *** yamamoto has joined #openstack-lbaas
15:20 <johnsom> When an object is in PENDING_*, a controller has ownership of the resource. If you force it out of PENDING, you are going to leave resources in use in the tenant and most likely make this repair situation much worse.
15:20 <lemko> OK. I did this a few times, and I now understand why it ended badly, with me having to completely purge octavia and reinstall it
15:20 <johnsom> Plus, at some point the controller will give up waiting on nova and will start making changes to whatever exists at that time.
15:21 <johnsom> So it could circle back and start deleting things even if we resolved the problem
15:21 <johnsom> Yeah, PENDING is an important state.
15:21 <johnsom> It's best if you tune those timeouts so Octavia doesn't retry so long on nova/neutron.
15:22 <johnsom> That way it will release the resource much faster.
15:22 <lemko> If I change it now, will it take effect now?
15:22 <johnsom> No
15:22 <lemko> Ok.
15:23 <johnsom> The long defaults are there for the zuul test instances that run in nested VMs without hardware acceleration. They can take up to 18 minutes to boot a VM.
15:23 <johnsom> It's an unfortunate set of defaults.
15:24 <johnsom> So, what I think we should do is:
15:24 <johnsom> 1. Wait for the controller to release the load balancer. Likely to ERROR
15:25 <johnsom> 2. Check the amphora table status. Likely all DELETED and ERROR, which would be fine.
15:25 *** yamamoto has quit IRC
15:26 <johnsom> 3. Purge the old amphora records from the amphora table (I think there might be a bug with spares pool). "DELETE from amphora where status = 'ERROR' or 'DELETED';" (see the corrected form below)
15:26 *** goldyfruit has joined #openstack-lbaas
15:26 <johnsom> 4. Trigger another failover.
15:27 <johnsom> Optionally, update those timeouts and restart the controllers. Please wait until the LB is out of PENDING, though.
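One caveat on the query in step 3: in MySQL, `status = 'ERROR' or 'DELETED'` parses as two conditions, and the bare string 'DELETED' coerces to false, so only the ERROR rows would actually match. A form that purges both states:

    DELETE FROM amphora WHERE status IN ('ERROR', 'DELETED');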
15:27 <openstackgerrit> Adam Harwell proposed openstack/octavia master: Fix a unit test for new octavia-lib  https://review.opendev.org/673687
15:28 <lemko> and what about the port we created previously?
15:29 <lemko> the temp-vip port
15:29 <johnsom> After we get the LB going again, we should grep the port list for the VIP IP. If we see the other AAP port with it, we will delete the temp-vip
15:35 <rm_work> ok, let's see if that works now
15:39 <johnsom> Hmm, I don't see anything in the failover flow with spares that could cause it to get the wrong amp for the DB.
15:44 <lemko> Thanks a lot for your help. I'll come back to you when the retry ends ;)
15:45 <johnsom> Ok, cool
15:49 <rm_work> yep cool, docs passed
15:49 <rm_work> so cgoncalves we could prolly merge https://review.opendev.org/#/c/673687/ now :D it'll fix our lower-constraints job, which I believe will otherwise continue to break
15:49 <rm_work> (technically waiting for tests to finish, but everything passed before besides docs)
15:58 <rm_work> johnsom: https://review.opendev.org/#/c/673172/
15:58 <rm_work> https://review.opendev.org/#/c/674087/ is waiting on that one to merge ^^
16:01 <rm_work> there's a LOT of patches up with one +2 from me that could use reviews from other cores ;)
16:04 *** tesseract has quit IRC
16:09 <openstackgerrit> Carlos Goncalves proposed openstack/octavia master: WIP: Switch Fedora-based amphora to fedora-minimal  https://review.opendev.org/673173
16:12 <rm_work> specifically this thingy https://review.opendev.org/#/c/645495/ could probably use a review, it's been waiting a while and seems trivial-ish to me
16:12 *** henriqueof has quit IRC
16:20 <rm_work> and https://review.opendev.org/#/c/661309/ will start another one
16:30 <rm_work> ugh, need to merge https://review.opendev.org/#/c/673687/ before anything in Octavia can merge, I think
16:31 <johnsom> Yes
16:32 <rm_work> can we just +A that as a gatefix?
16:32 <rm_work> I guess I'll wait at least until the checks finish
16:32 <rm_work> ok cool, cgoncalves got it :)
16:33 <johnsom> It seems like the octavia-lib patches should run some octavia tests... At least unit and functional
16:37 <rm_work> maybe yeah <_<
16:44 *** ramishra has quit IRC
16:57 *** ricolin has quit IRC
17:18 *** goldyfruit has quit IRC
17:18 *** goldyfruit_ has joined #openstack-lbaas
17:21 *** Vorrtex has joined #openstack-lbaas
17:21 <openstackgerrit> Michael Johnson proposed openstack/octavia master: Add Octavia tox "tips" jobs  https://review.opendev.org/674659
17:21 <johnsom> Let's see what that does. If it works we can add it to octavia-lib
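For reference, cross-project "tips" jobs are typically wired up in zuul by listing the library under required-projects, so the job runs against its master branch rather than the released package. A hypothetical sketch (the job and parent names here are illustrative, not necessarily what the patch above defines):

    - job:
        name: octavia-tox-functional-tips
        parent: openstack-tox-functional
        description: Run octavia functional tests against octavia-lib master.
        required-projects:
          - openstack/octavia-lib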
17:21 <openstackgerrit> Carlos Goncalves proposed openstack/octavia master: Install missing packages in nodepool instance  https://review.opendev.org/674259
17:21 <openstackgerrit> Carlos Goncalves proposed openstack/octavia master: WIP: Switch Fedora-based amphora to fedora-minimal  https://review.opendev.org/673173
17:26 <rm_work> tests take sooooo excruciatingly looooooong
17:28 *** psachin has quit IRC
17:42 *** rpittau is now known as rpittau|afk
17:51 <johnsom> Is it the functionals?
17:52 <johnsom> I am seeing this odd behavior where, if an /etc/octavia/octavia.conf exists and debug is True, the functional tests slow to a crawl
17:56 <openstackgerrit> Merged openstack/octavia-tempest-plugin master: Increase connection_max_retries to 480 secs on CentOS jobs  https://review.opendev.org/673172
18:14 <rm_work> hmmmmm
18:14 <rm_work> i should check
18:14 <rm_work> nope, don't have that
18:15 <rm_work> it's specifically these new data api ones
18:15 <rm_work> err, let me see, how do i get it to tell me the times again?
18:17 <rm_work> anywho, gonna rebase patches down the line
18:17 <openstackgerrit> Adam Harwell proposed openstack/octavia master: Lookup interfaces by MAC directly  https://review.opendev.org/673337
18:17 <openstackgerrit> Adam Harwell proposed openstack/octavia master: Fix L7 repository create methods  https://review.opendev.org/673154
18:17 <openstackgerrit> Adam Harwell proposed openstack/octavia master: Fix provider driver utils  https://review.opendev.org/673155
18:17 <openstackgerrit> Adam Harwell proposed openstack/octavia master: Add get method support to the driver-agent  https://review.opendev.org/665029
18:52 *** maciejjozefczyk has quit IRC
18:57 <rm_work> yeah, i'm thinking i can't actually get a functional suite run to complete
18:57 <rm_work> they hang on my machine
18:58 <johnsom> Really? Total hang or super slow reports?
19:15 *** gcheresh_ has joined #openstack-lbaas
19:21 *** gcheresh_ has quit IRC
19:21 <rm_work> total hang when i run it from the CLI with tox
19:22 <rm_work> they hang for about 3 minutes each when i run them individually
19:22 <rm_work> in pycharm
19:22 <rm_work> trying to figure out WHERE it's hanging
19:23 <rm_work> ah, i am 90% sure it's this:
19:23 <rm_work> self.status_listener_proc.join(60)
19:23 <rm_work> self.stats_listener_proc.join(60)
19:23 <rm_work> self.get_listener_proc.join(60)
19:23 <rm_work> that's the 3 minutes :D
19:23 <rm_work> they all time out one at a time
19:23 <rm_work> which means the real issue is that the exit_event that's set isn't actually working
19:24 <rm_work> they all just run `server.handle_request()`, which seems to be blocking
19:25 <rm_work> shouldn't it have a timeout set? what is the default for CONF.driver_agent.get_request_timeout?
19:26 <rm_work> 5 seconds? ... it's not respecting that, for sure
19:27 <rm_work> johnsom: ^^
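For context, Python's socketserver only applies a timeout to handle_request() when server.timeout is set; without it, the call blocks until a client connects, so a shutdown flag checked between requests is never observed. A minimal illustrative sketch of the pattern under discussion (not Octavia's actual listener code):

    import os
    import socketserver
    import threading

    exit_event = threading.Event()  # hypothetical shutdown flag, set elsewhere

    class EchoHandler(socketserver.BaseRequestHandler):
        def handle(self):
            # Echo a single request back to the client
            self.request.sendall(self.request.recv(1024))

    sock_path = '/tmp/example.sock'
    if os.path.exists(sock_path):
        os.remove(sock_path)

    with socketserver.UnixStreamServer(sock_path, EchoHandler) as server:
        # Without this, handle_request() blocks indefinitely and the
        # exit_event is only re-checked after a client connects.
        server.timeout = 5
        while not exit_event.is_set():
            server.handle_request()  # returns after 5s if no client connects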
19:30 <johnsom> Hmm, works great for me, those take about 9 seconds each and done
19:35 <johnsom> I thought I left those without a timeout...
19:35 <johnsom> I only force-killed the third-party processes
19:37 <johnsom> Yeah, where do you see the join(60)???
19:37 <rm_work> top of test_driver_agent.py
19:37 <rm_work> line 50 or so
19:37 <johnsom> Ah, the test
19:40 <johnsom> https://www.irccloud.com/pastebin/QAnZ5M2S/
19:41 <johnsom> johnsom@python23:/tmp/octavia$ tox -e functional -- octavia.tests.functional.api.drivers.driver_agent
19:42 <johnsom> Each test is right about 9 seconds
19:42 <rm_work> yeah, these are never dying
19:43 <rm_work> i wonder if the socket implementation is different / f'd on OSX
19:44 <rm_work> ahh hmm
19:44 <rm_work> in py27 they finish basically instantly
19:44 <rm_work> it's in py37 that they hang
19:44 <johnsom> Ah, let me try
19:45 <rm_work> (again, i usually test in py37)
19:45 <rm_work> or py36
19:45 <rm_work> T_T
19:46 <johnsom> I get the same on py36 (this 18.04 VM doesn't have 3.7 on it)
19:46 <johnsom> https://www.irccloud.com/pastebin/uHNqGYKm/
19:49 <rm_work> grrrr, something in cinder broke and killed our grenade run, so we have to recheck that patch and have it run through twice again
19:49 <rm_work> there goes another 5 hours
19:50 <johnsom> I thought I had cinder disabled. Is it just that it keeps cloning it?
19:50 <johnsom> I shut it down for this very reason...
19:50 <rm_work> it looked like it was installing/setting it up during the devstack run
19:50 <rm_work> again, in Grenade
19:50 <rm_work> so maybe not disabled there?
19:50 <johnsom> Yeah, ok.
19:51 <rm_work> setting up a functional-py36 env so i can test
19:51 <rm_work> to see if it's a 3.7-only issue
19:51 <rm_work> it could be <_<
19:55 <johnsom> well, installing "python3.7" on 18.04 ends up with strange results. The tests "pass" but it throws one of the "ascii" codec errors
19:55 <johnsom> No runtime info, nothing
19:58 <johnsom> Yeah, the built-in 3.7 support seems to just not function correctly...
19:59 <openstackgerrit> Anqi Li proposed openstack/octavia master: Implements notifications for octavia  https://review.opendev.org/674432
20:03 <rm_work> yes, 3.6 runs fine
20:03 <rm_work> <_<
20:04 <rm_work> only 3.7 hangs
20:04 <rm_work> so something is borked in 3.7 and we will have to figure it out at some point
20:10 <johnsom> Don't our gates run that? Seems like I need to figure that out sooner-ish
20:11 <johnsom> Ah, we only have 3.6 in there
20:17 <openstackgerrit> Adam Harwell proposed openstack/octavia master: Add get method support to the driver-agent  https://review.opendev.org/665029
20:17 <rm_work> well, that's a working version anyway
20:17 <rm_work> but yeah, uhh... something is causing the process to hang in 3.7, and it COULD be the socket lib, not sure
20:20 *** henriqueof has joined #openstack-lbaas
20:28 <openstackgerrit> Anqi Li proposed openstack/octavia master: Implements notifications for octavia  https://review.opendev.org/674432
20:32 <openstackgerrit> Adam Harwell proposed openstack/octavia master: Move to using octavia-lib constants  https://review.opendev.org/673712
20:36 <rm_work> eugh, rebases
20:37 <openstackgerrit> Adam Harwell proposed openstack/octavia master: Add long-running provider agent support  https://review.opendev.org/674140
20:37 <rm_work> ok, that chain is up to date
20:41 <openstackgerrit> Michael Johnson proposed openstack/octavia master: Add Octavia tox "tips" jobs  https://review.opendev.org/674659
20:54 *** lemko has quit IRC
21:03 *** vishalmanchanda has quit IRC
21:23 *** yamamoto has joined #openstack-lbaas
21:28 *** yamamoto has quit IRC
21:30 <openstackgerrit> Michael Johnson proposed openstack/octavia master: Add the DIB_REPO* variables to the README.rst  https://review.opendev.org/674701
21:30 <johnsom> Could have sworn we had that in the README, but I guess not.
22:06 *** rcernin has joined #openstack-lbaas
22:12 *** spatel has joined #openstack-lbaas
22:32 <rm_work> AUGH, IT FAILED AGAIN
22:32 <rm_work> WTB gatefix merging please, zuul
22:56 *** tkajinam has joined #openstack-lbaas
22:58 *** Vorrtex has quit IRC
23:00 *** henriqueof has quit IRC
23:01 * johnsom starts chanting towards zuul
23:33 *** spatel has quit IRC
23:34 <openstackgerrit> Merged openstack/octavia-tempest-plugin master: Support skipping APP_COOKIE and HTTP_COOKIE  https://review.opendev.org/645495
23:38 <rm_work> cgoncalves: so... centos is getting... slower? or what
23:39 <rm_work> i thought you showed graphs of the boot time getting way faster
23:40 <rm_work> now we're just getting TIMEOUTs on the centos gate job :(
23:50 <johnsom> Worst part is the centos gates have been that way for months

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!