Tuesday, 2019-03-19

00:11 *** sapd1_x has joined #openstack-lbaas
00:15 *** luksky has quit IRC
00:21 *** sapd1_x has quit IRC
00:51 *** abaindur has joined #openstack-lbaas
00:55 *** abaindur has quit IRC
01:02 *** abaindur has joined #openstack-lbaas
01:03 *** abaindur has quit IRC
01:03 *** abaindur has joined #openstack-lbaas
01:04 *** lemko has quit IRC
01:08 *** ricolin has joined #openstack-lbaas
01:12 *** hongbin has joined #openstack-lbaas
02:44 *** ricolin has quit IRC
03:52 *** sapd1_x has joined #openstack-lbaas
04:04 *** sapd1_x has quit IRC
04:12 *** hongbin has quit IRC
04:18 *** ramishra has joined #openstack-lbaas
05:07 *** vishalmanchanda has joined #openstack-lbaas
05:11 <vishalmanchanda> johnsom: Hi, I've had a query regarding your patch [1], may I know why we need both horizon-nodejs4-jobs and horizon-nodejs10-jobs in the octavia plugin?
05:11 <vishalmanchanda> [1] https://review.openstack.org/#/c/643630/
05:23 <johnsom> vishalmanchanda: Hi. So our old jobs used nodejs 4. However, that is very old now, so we want to test with the current Long Term Support (LTS) version of nodejs, version 10
05:23 *** ricolin has joined #openstack-lbaas
05:42 <vishalmanchanda> johnsom: yes, my question was related to keeping nodejs4. IIUC it's running on xenial and we've migrated to bionic, so why are we still keeping that job?
05:58 <johnsom> vishalmanchanda: We consider Stein the transition release where both xenial and bionic should be supported. In Train, nodejs 4 will not matter.
06:00 <vishalmanchanda> johnsom: ok. Thanks for the info.
06:15 *** lemko has joined #openstack-lbaas
06:19 *** ivve has joined #openstack-lbaas
06:43 *** ricolin has quit IRC
06:47 *** gcheresh has joined #openstack-lbaas
06:50 *** mkuf has joined #openstack-lbaas
07:13 <openstackgerrit> OpenStack Proposal Bot proposed openstack/octavia-dashboard master: Imported Translations from Zanata  https://review.openstack.org/644496
07:14 *** pcaruana has joined #openstack-lbaas
07:33 *** pcaruana has quit IRC
07:34 *** pcaruana has joined #openstack-lbaas
07:56 *** rpittau|afk is now known as rpittau
08:03 *** ramishra has quit IRC
08:06 *** ramishra has joined #openstack-lbaas
08:07 *** luksky has joined #openstack-lbaas
08:16 *** luksky has quit IRC
08:24 *** abaindur has quit IRC
08:58 *** ricolin has joined #openstack-lbaas
09:19 *** luksky has joined #openstack-lbaas
09:23 *** ramishra has quit IRC
09:35 *** ramishra_ has joined #openstack-lbaas
09:36 *** ramishra_ has quit IRC
09:36 *** ramishra_ has joined #openstack-lbaas
10:12 <openstackgerrit> Merged openstack/octavia-dashboard master: Imported Translations from Zanata  https://review.openstack.org/644496
10:24 <dulek> cgoncalves: Hi! Do I need to enable o-da even in Amphora jobs in Kuryr?
10:25 <cgoncalves> dulek, not currently, no
10:25 <dulek> cgoncalves: So just for OVN?
10:26 <cgoncalves> dulek, yes and any other 3rd party provider driver
10:26 <dulek> cgoncalves: Okay, nice, thanks!
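For reference, "enabling o-da" in a devstack-based job comes down to a couple of local.conf lines; a minimal sketch using the Octavia devstack plugin service names (the plugin URL and the OVN provider entry are illustrative assumptions, not taken from the Kuryr jobs discussed here):

    [[local|localrc]]
    enable_plugin octavia https://github.com/openstack/octavia
    # o-da is the driver agent that third-party providers such as OVN talk to
    enable_service octavia o-api o-cw o-hm o-hk o-da

    [[post-config|/etc/octavia/octavia.conf]]
    [api_settings]
    # provider list shown here is an assumption for illustration
    enabled_provider_drivers = amphora:The Octavia Amphora driver,ovn:OVN provider driver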
10:50 *** yamamoto has quit IRC
10:56 *** yamamoto has joined #openstack-lbaas
11:00 *** yamamoto has quit IRC
11:05 *** yamamoto has joined #openstack-lbaas
11:05 *** yamamoto has quit IRC
11:06 *** yamamoto has joined #openstack-lbaas
11:07 *** psachin has joined #openstack-lbaas
11:24 *** rcernin has quit IRC
11:54 *** psachin has quit IRC
12:07 *** luksky has quit IRC
12:16 <openstackgerrit> Margarita Shakhova proposed openstack/octavia master: Support create amphora instance from volume based.  https://review.openstack.org/570505
12:20 *** luksky has joined #openstack-lbaas
12:29 *** luksky has quit IRC
12:38 *** trown|outtypewww is now known as trown
12:42 *** luksky has joined #openstack-lbaas
13:09 *** yamamoto has quit IRC
13:12 <rm_work> johnsom: was the plan for even the amphora driver to go through o-da eventually?
13:15 <cgoncalves> rm_work, IIRC I read somewhere it is. I'd like it
13:22 <rm_work> It seems like it'd be "fair"
13:23 <rm_work> Otherwise we have a kind of... First class / second class provider arrangement
13:23 <rm_work> And multiple code paths
13:23 <rm_work> I guess that's probably planned as a follow up?
13:23 <cgoncalves> right
13:23 <cgoncalves> planned: most likely yes. assigned: unsure ;)
13:26 <rm_work> It's something I'd not mind doing but ffff no idea if I'm gonna have time for that much
13:27 <rm_work> We'll see
13:28 *** yamamoto has joined #openstack-lbaas
13:31 <rm_work> Do we have any patches we need for RC1?
13:31 <cgoncalves> https://etherpad.openstack.org/p/octavia-priority-reviews
14:13 *** sapd1_x has joined #openstack-lbaas
14:14 <rm_work> Will see if I can do some reviews today... :/
14:46 <openstackgerrit> Carlos Goncalves proposed openstack/octavia master: Add RHEL 8 amphora support  https://review.openstack.org/638581
14:49 *** yamamoto has quit IRC
14:50 *** fnaval has joined #openstack-lbaas
14:55 *** yamamoto has joined #openstack-lbaas
14:55 *** yamamoto has quit IRC
14:56 *** yamamoto has joined #openstack-lbaas
14:58 *** yamamoto has quit IRC
14:58 *** luksky has quit IRC
14:58 *** yamamoto has joined #openstack-lbaas
14:58 *** yamamoto has quit IRC
14:59 *** yamamoto has joined #openstack-lbaas
15:03 *** yamamoto has quit IRC
15:05 *** vishalmanchanda has quit IRC
15:12 <openstackgerrit> Margarita Shakhova proposed openstack/octavia master: Support create amphora instance from volume based.  https://review.openstack.org/570505
15:19 *** sapd1_x has quit IRC
15:23 *** roukoswarf has joined #openstack-lbaas
15:28 <openstackgerrit> Nir Magnezi proposed openstack/octavia master: Add RHEL 8 amphora support  https://review.openstack.org/638581
15:34 *** gcheresh has quit IRC
15:39 *** yamamoto has joined #openstack-lbaas
15:45 *** yamamoto has quit IRC
15:48 <openstackgerrit> Carlos Goncalves proposed openstack/octavia master: Add RHEL 8 amphora support  https://review.openstack.org/638581
15:51 *** luksky has joined #openstack-lbaas
15:55 <openstackgerrit> Nir Magnezi proposed openstack/octavia master: Add RHEL 8 amphora support  https://review.openstack.org/638581
16:13 *** trown is now known as trown|lunch
16:15 *** ramishra_ has quit IRC
16:31 *** ivve has quit IRC
16:34 <roukoswarf> johnsom: how crazy of an LB count has octavia/amphora been tested at?
16:35 <roukoswarf> having some downtime on adding listeners to an LB with like... 6 listeners, 6 pools, 2 members-ish each
16:35 <johnsom> roukoswarf What are you seeing? That seems pretty small
16:35 <xgerman> rm_work: ran our biggest cluster followed by eandersson
16:36 <johnsom> Well, there are others too.
16:36 <xgerman> :-)
16:36 <roukoswarf> a user is adding a new listener, and then every LB's provisioning status went to error, then went down, then eventually just the one that got a listener stayed down, but the other LBs came back
16:37 <roukoswarf> ah actually, the LBs never went down, but it did take the provisioning status of every listener in that LB to down
16:38 <roukoswarf> sorry, error, not down, same result
16:38 <johnsom> Yeah, we try to make sure they still pass traffic even if there is some kind of provisioning error.
16:38 <roukoswarf> so added a new listener, then every listener which previously worked is now down
16:38 <johnsom> What does your controller worker log show as to why it went to error?
16:40 <roukoswarf> which log?
16:40 <johnsom> The worker process log
16:40 <johnsom> o-cw if you are using devstack
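Where that worker log lives depends on the deployment; a short sketch (the devstack unit name is the usual one, the kolla-ansible path is an assumption):

    sudo journalctl -u devstack@o-cw.service -e                 # devstack: worker runs as a systemd unit
    sudo tail -f /var/log/kolla/octavia/octavia-worker.log      # kolla-ansible (path assumed)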
16:43 <roukoswarf> both of the amphoras are listed as down now... and one is completely not even an instance anymore.
16:43 <roukoswarf> still checking logs
16:44 <johnsom> While you look for logs, I will boot up an LB with that scenario and see what I get.
16:45 <roukoswarf> what influences the amphora status? master and backup both listed as down
16:45 <roukoswarf> never had an amphora blow up
16:45 <roukoswarf> is there a way to re-provision the amphoras?
16:45 <johnsom> Yeah, me either
16:46 <xgerman> the failover command will under certain conditions re-provision
16:46 <roukoswarf> both the active and backup died
16:47 <roukoswarf> but... the backup is totally deleted, not an instance anymore, and the active is "error", but the instance is up
16:48 <johnsom> Yeah, that is what should happen. It attempted to repair the backup, but it failed somehow (nova issue?), so it stopped.
16:48 <roukoswarf> so how do i give it a kick to see if it fails again?
16:48 <johnsom> Ok I have an LB with six listeners, each with a pool, each with two members.
16:49 <roukoswarf> all he did was add another listener, and it took the entire active-backup pair down
16:49 <johnsom> The failover commands are what you need.
16:49 <johnsom> Ah, active/standby. I'm running single. Let me test this, then I will restart with act/stdby.
16:50 <johnsom> Yeah, no issue with listener #7.
16:51 <roukoswarf> also, every member is an "external" member because this user is doing insanity with ip pairs
16:51 *** henriqueof2 has quit IRC
16:51 <roukoswarf> but that should be the same code path, no?
16:54 *** luksky has quit IRC
16:55 <johnsom> Correct, same code
16:55 <johnsom> Ok, I'm up to 10 listeners, each with a pool, each with two members. No errors.
16:55 <johnsom> I will start over with active/standby
16:56 <roukoswarf> health manager is failing out the LB due to: WARNING octavia.controller.healthmanager.health_drivers.update_db [-] Amphora 676aaa72-7c61-4c67-8b1b-4b9ed48dab0e health message reports 5 listeners when 6 expected
16:57 <roukoswarf> it seems; looking for why there's a mismatch
16:57 <johnsom> Yeah, that is just a warning and can happen during the provisioning process. Not an error
17:01 *** ricolin has quit IRC
17:04 <johnsom> Yeah, up to seven under active/standby as well. All is working as expected
17:07 *** luksky has joined #openstack-lbaas
17:07 <johnsom> Ok, up to 10
17:09 <johnsom> https://www.irccloud.com/pastebin/YR5nUy12/
17:10 <roukoswarf> http://paste.openstack.org/show/gKyWjSbM1ZuQqQDG4cfM/
17:10 <roukoswarf> these are the only errors we have that i can find, but they are not for the right LBs
17:10 <roukoswarf> the failed-out LBs are .50 and .35
17:10 *** rpittau is now known as rpittau|afk
17:11 <roukoswarf> no mention of either failed LB id either
17:12 <johnsom> What version of Octavia are you using in the image?
17:12 <roukoswarf> you mean the amphora?
17:13 <roukoswarf> i built everything on the git branch for rocky
17:13 <johnsom> Yeah, what version of Octavia did you use to create the image?
17:13 <roukoswarf> 3.0.2
17:14 <johnsom> This is the troubling error: Amphora agent returned unexpected result code 500 with response {u'http_code': 500, u'error': u"a bytes-like object is required, not 'str'"}
17:14 <roukoswarf> well, that's the worker/controller.
17:14 <johnsom> It implies there is something wrong with the image.
17:14 <roukoswarf> looks like a python2 vs python3 problem
17:15 <johnsom> Right. However we test with both. This is why I'm wondering what is up
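The quoted 500 is the classic Python 3 bytes-vs-str mismatch; a quick way to reproduce the same message outside Octavia (plain interpreter one-liners, nothing Octavia-specific):

    # Python 3 refuses a str write to a binary file handle with exactly this message
    python3 -c "open('/tmp/t', 'wb').write('x')"
    # TypeError: a bytes-like object is required, not 'str'
    # Python 2 accepts the same call, which is how py2-only testing can miss it
    python2 -c "open('/tmp/t', 'wb').write('x')"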
17:15 <roukoswarf> what's the easy way to check the octavia version in the built image?
17:16 <roukoswarf> controller is 3.0.2, but do i just check the agent's pip env?
17:16 <johnsom> Well, you can ssh into the instance and cd /opt/amphora-agent and do a "git log"
17:17 <johnsom> commit ae26cf9ebf7986148f9be2ddcc67cc185e88c7e0
17:17 <johnsom> There is also a way to manually query the amp agent, but that is harder to do because of the security
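A sketch of that check, assuming the default Ubuntu amphora image; the SSH key is whatever was configured for the deployment, shown here as a placeholder:

    ssh -i <amphora_ssh_key> ubuntu@<amphora_lb_network_ip>     # default login user on the Ubuntu image
    cd /opt/amphora-agent && git log -1 --format='%H'           # commit the agent code was built from
    pip3 show octavia | grep -i '^version'                      # package version, e.g. 3.1.0.dev166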
17:17 <roukoswarf> octavia==3.1.0.dev166 inside the image, quite ahead...
17:18 <roukoswarf> image is commits up to jan 9
17:19 <roukoswarf> so should we build an image and lock it to 3.0.2? is the mismatch the problem?
17:19 <johnsom> It should not be.
17:20 <roukoswarf> how do i get the images back up so i can stop blocking people while i troubleshoot this? how do i trigger a rebuild of master/backup?
17:21 <johnsom> So, if you can reproduce this, I can give you configurations that will cause it to not delete the failed instance so you can look at the logs. It basically disables all of the error handling code. I just can't reproduce this locally. I'm running python3 in the amp.
17:21 <johnsom> https://developer.openstack.org/api-ref/load-balancer/v2/index.html#failover-amphora
17:21 <johnsom> and
17:22 <roukoswarf> python3 is being used in the amphora, the controller is py2
17:22 <johnsom> https://developer.openstack.org/api-ref/load-balancer/v2/index.html#failover-a-load-balancer
17:22 <roukoswarf> it seems
17:22 <johnsom> That should not matter, we do everything over a JSON REST API
17:23 <roukoswarf> is there no openstack api command to wrap this rest api?
17:24 <johnsom> yes, you can use the openstack CLI
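The CLI equivalents of those two API calls (the amphora failover subcommand may require a newer python-octaviaclient than a stock Rocky install):

    openstack loadbalancer failover <LB_ID>                 # fail over the whole load balancer
    openstack loadbalancer amphora failover <AMPHORA_ID>    # fail over a single amphora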
17:24 <roukoswarf> failover errors out due to being in error state.
17:25 <johnsom> Ah, yeah, that bug was recently fixed.
17:26 <johnsom> This is the rocky backport: https://review.openstack.org/#/c/643005/
17:26 <roukoswarf> so... what do
17:27 <roukoswarf> amphoras are "immutable", and the LB can't failover
17:27 <roukoswarf> not sure where to find why these nodes are failed, those python errors aren't on the failed nodes.
17:28 <johnsom> yeah, all of the details are in the logs. The create listener error should be in the worker log, the failover error should be in the health manager log.
17:29 <johnsom> Umm, you can try setting the status to ACTIVE manually in the DB, then using the failover
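What that manual reset looks like, assuming the default 'octavia' database and table names; this bypasses the normal state machine, so use it with care:

    mysql octavia -e "UPDATE load_balancer SET provisioning_status='ACTIVE' WHERE id='<LB_ID>';"
    openstack loadbalancer failover <LB_ID>    # retry the failover once the LB is out of ERROR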
17:32 *** luksky has quit IRC
17:34 <roukoswarf> i'm hoping the next version (after rocky) won't blow up?
17:35 <johnsom> Well, rocky doesn't blow up for us
17:35 <roukoswarf> well i can't really make syntax errors myself unless i edited the code, which i didn't
17:36 <roukoswarf> it's a kolla-ansible deploy of openstack.
17:36 <johnsom> Yeah, that is what I don't understand, why you are getting those.
17:36 <roukoswarf> i could provide configs
17:36 <johnsom> Can you provide the commit ID from your amphora image?
17:36 <roukoswarf> if you wanted to lab them tog
17:37 <roukoswarf> well, the amphora is running on py3, which shouldn't get that error, it's the 3.0.2 controller that seems to have blown a gasket
17:38 <roukoswarf> 7675c40d3024f275666240e4c2ada44813d0e535 commit on amphora-agent
17:38 <johnsom> The error you pasted was clearly something wrong inside the amphora image.
17:38 <johnsom> Thanks
17:41 <johnsom> Ok, so that is a "master" branch amphora-agent.
17:41 <roukoswarf> yep
17:43 *** trown|lunch is now known as trown
17:44 <roukoswarf> i can build images to a specific commit/version, if that's my fix for my irregularity
17:45 <johnsom> Do you know the exact command line they used to create the listener?
17:46 <johnsom> Ah, looking through the rest of that log snippet, they were creating a pool.
17:47 <johnsom> So, the options they used for the pool would be helpful
17:49 <roukoswarf> they have just been poking around in horizon, i could try and dig for the settings
17:49 <roukoswarf> need to get the blown LB up first though
17:51 <roukoswarf> so i just set the LB to up in the db and then failover the LB and it should bring both the active and backup back?
17:54 <roukoswarf> just want to confirm before i do further damage, seems like an odd way to trigger a reprovision of a LB
17:59 <johnsom> Yes, it should cycle through the amphora and attempt a repair. There is a risk, using LB failover, that it may interrupt traffic that is likely still passing through the LB.
17:59 <roukoswarf> well everything is down currently
18:00 <johnsom> It's not passing traffic still?
18:00 <roukoswarf> adding a listener killed the entire stack
18:00 <roukoswarf> nope, no ports listening on the vip
18:00 <johnsom> Hmmm, that seems more like a failure outside of Octavia.
18:01 <johnsom> Well, then, you have nothing to make worse I guess.
18:01 <roukoswarf> well, other LBs are fine
18:01 <roukoswarf> the one LB that had a listener added blew up both amphoras and all listeners
18:07 *** ivve has joined #openstack-lbaas
18:16 <roukoswarf> so i set everything to online/active on the lb, then failed it over, and now there's a master and a standalone on the same vip...
18:17 <roukoswarf> logs show another 500 error
18:18 <roukoswarf> same state i was in before, so this specific LB setup crashes the amphora setup?
18:19 <roukoswarf> http://sprunge.us/DBBRRl
18:19 <johnsom> Can you ssh into one of the amphora and check the syslog/messages for an amphora-agent error
18:19 <johnsom> ?
18:19 <roukoswarf> sure
18:22 <roukoswarf> http://paste.openstack.org/show/6JQAWhzNSBYExhlVb571/
18:23 <roukoswarf> cannot fork?
18:25 <roukoswarf> only one haproxy is failing though, that i can tell.
18:35 <johnsom> Hmm, yeah, that is odd. Is there an error in the haproxy.log?
18:36 <roukoswarf> it's the fork issue... debugged it
18:36 <roukoswarf> haproxy will throw a fork error instead of ooming
18:36 <johnsom> Yeah, it throws that for a few reasons
18:37 <roukoswarf> strange though, cause i had 200mb of ram free, it wanted more.
18:37 <roukoswarf> even while running it, it still had 200mb of ram free... must have some kind of minfree, which is silly to throw a fork error on...
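A hedged checklist for the "cannot fork" symptom inside an amphora (generic commands; the haproxy log location depends on the image's rsyslog setup):

    free -m                                        # memory headroom when haproxy tries to fork
    sudo grep -iE 'cannot fork|out of memory' /var/log/syslog /var/log/haproxy.log
    sudo ip netns exec amphora-haproxy ss -lnt     # which listeners are actually bound in the VIP namespace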
18:38 <johnsom> How much RAM are you giving the amps? Are you doing a lot of TLS offload?
18:38 <roukoswarf> 0 tls offload, 1gb of ram
18:39 <roukoswarf> it's all tcp proxy, not even http
18:39 <johnsom> This is the ubuntu image right?
18:39 <roukoswarf> yes
18:39 <johnsom> Yeah, TCP doesn't use much
18:41 <johnsom> paste finally loaded, wondering why the str error isn't there from the amphora-agent
18:43 <roukoswarf> doubt the str error is related to this specific haproxy crash, but yeah
18:44 <johnsom> Well, if it caused one of the files to be written out wrong, it might
18:45 <roukoswarf> it was happening on LBs that never went down
18:45 <roukoswarf> but... maybe they are close on ram and failed intermittently?
18:45 <johnsom> FYI, here is my amp memory with one LB: KiB Mem :  1015812 total,   756360 free,   114248 used,   145204 buff/cache
19:10 *** abaindur has joined #openstack-lbaas
19:33 <roukoswarf> johnsom: if the health monitor is happy, and the member is happy, why would the listener still be error? just a delay in update?
19:33 <johnsom> Provisioning status error or operating status error?
19:34 <roukoswarf> provisioning, sorry.
19:34 <roukoswarf> does provisioning status need a manual kick on a failover rebuild?
19:35 <johnsom> provisioning status is the status of actions the controller was taking. Operating is the status as measured or observed.
19:35 <johnsom> A listener in prov status ERROR means the last time the controller attempted to interact with that listener, the process failed. For example, if neutron had a problem and the controller could not reach the listener.
19:36 <johnsom> A successful failover will clear the provisioning status errors.
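Both status types are visible from the CLI, which makes the distinction easier to watch during a failover:

    openstack loadbalancer show <LB_ID> -c provisioning_status -c operating_status
    openstack loadbalancer status show <LB_ID>     # full tree: listeners, pools, members, health monitors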
19:37 <roukoswarf> even if we enable/disable a health monitor, it flips the provisioning status to active, lol.
19:40 <johnsom> Yeah, because it did a round trip to the amp and pushed out another configuration. So if that succeeded, i.e. neutron is working again, it's back to Active.
19:40 <roukoswarf> yep, thanks for all the help
20:14 *** trown is now known as trown|outtypewww
20:18 *** yamamoto has joined #openstack-lbaas
20:22 *** yamamoto has quit IRC
20:54 *** luksky has joined #openstack-lbaas
22:01 *** abaindur has quit IRC
22:03 *** abaindur has joined #openstack-lbaas
22:08 *** pcaruana has quit IRC
22:16 *** celebdor has joined #openstack-lbaas
22:16 *** rcernin has joined #openstack-lbaas
23:12 *** fnaval has quit IRC
23:20 *** fnaval has joined #openstack-lbaas
23:21 *** luksky has quit IRC
23:22 <rm_work> xgerman: did you want to refactor https://review.openstack.org/#/c/613685/ on top of https://review.openstack.org/642914 which I just +W'd?
23:22 <rm_work> might still be able to make RC1... though not sure, we're at the line
23:23 <xgerman> ok, will refactor that one on top
23:23 <rm_work> i think it addresses the last concerns cgoncalves had
23:23 <johnsom> Wow, you are bold, +W'ing a flow change....
23:23 <johnsom> We just had to fix/revert one of those
23:23 <xgerman> that one is LOW impact
23:24 <johnsom> grin, ok
23:24 <johnsom> Ugh, tracking down a status bug my last patch uncovered.... Fun times, not
23:26 <johnsom> Summary: pool: PENDING_UPDATE, lb: ACTIVE, list: ACTIVE
23:27 *** abaindur has quit IRC
23:27 <rm_work> yeah it passed my head-parser
23:27 <rm_work> so i'm pretty sure it's good :D
23:28 <rm_work> hmm :/
23:28 <rm_work> somehow the pool is escaping our state machine?
23:28 <rm_work> BTW I'm starting to move patches that are not ready yet out of the "Merged by 19th" section into "next cycle"
23:28 <johnsom> Ok
23:29 <johnsom> Yeah, somewhere we aren't marking the pool back to active at the end of a flow.
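One hedged way to spot that symptom directly, assuming the default 'octavia' database names (a pool left behind in a PENDING_* state after its flow finished):

    mysql octavia -e "SELECT id, provisioning_status FROM pool WHERE provisioning_status LIKE 'PENDING%';"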
23:29 <rm_work> not ready yet == zuul -1 that is clearly not an intermittent issue
23:29 <rm_work> or heavyweight refactors that we just don't need in an RC
23:30 *** abaindur has joined #openstack-lbaas
23:31 <rm_work> you really think you'll make it in for RC1? lol
23:31 <johnsom> I think we need to get this lifecycle bug fix in.
23:31 <rm_work> i mean, it's not *done*, and gate time alone...
23:32 <johnsom> Thursday is the hard cut
23:32 <rm_work> what TIME thursday?
23:32 <johnsom> I should have it fixed up today
23:32 <johnsom> Well, the patch is fine, but we have another bug it exposed
23:32 <rm_work> lol ok, if you really want it in, I'll try :)
23:33 <johnsom> Ugh, that neutron-lbaas proxy API gate is blocking an A10 patch on neutron-lbaas.
23:34 <xgerman> ?
23:34 <johnsom> https://review.openstack.org/#/c/639571/
23:36 *** henriqueof has joined #openstack-lbaas
23:41 <xgerman> must have forgotten how to read logs. http://logs.openstack.org/71/639571/1/gate/neutron-lbaasv2-dsvm-api-proxy/a673ee2/logs/screen-n-api.txt.gz ends timestamp-wise way before the test runs
23:42 <xgerman> disregard
23:56 *** celebdor has quit IRC
