Tuesday, 2020-04-14

*** openstack has joined #openstack-lbaas09:56
*** ChanServ sets mode: +o openstack09:56
*** rpittau is now known as rpittau|bbl10:23
openstackgerritAdam Harwell proposed openstack/octavia master: Healthmanager opts aren't CLI-related  https://review.opendev.org/71992111:26
openstackgerritAdam Harwell proposed openstack/octavia master: Fix py3 amphora-agent cert-rotation type bug  https://review.opendev.org/71992211:29
*** sapd1 has quit IRC11:30
rm_work^^ that second one is pretty high priority IMO11:33
rm_workcgoncalves: you around? could use thoughts -- do we also not need to worry about py2 on amps anymore?11:37
rm_workif not, then i don't need any six / typechecking guard code around that, but if we do, then i do11:37
cgoncalvesrm_work, Stein is still a supported release version and is tested against both py2 and py311:48
rm_workok but this is master11:48
rm_worki'm referring specifically to patch 71992211:49
cgoncalvesrm_work, correct. no six in master. when backporting, please consider py2 too11:49
rm_workk11:50
rm_workgot it, you want to backport it with that fix, kk11:50
rm_worki'll do that once it merges11:50
rm_workBTW this sucks really hard, my amps just started exploding one by one as their certs came up for rotation11:51
rm_workand i think it has been happening before and i didn't even notice because of all the other things that caused amps to explode <_<11:53
rm_workbut we really need to fix this sqlalchemy issue because we can't merge anything11:54
rm_workalso cgoncalves i still don't understand https://review.opendev.org/#/c/717619/11:55
rm_worki posted another comment -- the whole design of the original local-cert-manager driver was to enable tempest testing like you're trying to do11:55
rm_workit should work fine?11:56
rm_workI just don't understand the need for *yet another* noop driver (the local driver was essentially designed to be a noop, it's not usable for anything besides testing)11:56
rm_workif we're not going to use the local driver for this, then we may as well delete and replace it with the noop one?11:57
rm_workbut it seems like a more robust option11:57
*** servagem has joined #openstack-lbaas11:58
cgoncalvesrm_work, I agree with replacing the local cert manager with the noop one.11:58
rm_workso you'd really rather have this (what seems to me to be) really limited noop option?11:59
cgoncalvesrm_work, the problem with the local cert manager is it requires pre-configuration prior to running tempest11:59
rm_workno?11:59
rm_worktempest can drop files in the tests11:59
rm_worka test can: write out a certfile and then use it in octavia within the same test12:00
cgoncalvestempest should test against the cloud from the outside (black box) so having the pre-req of having cert files in the cloud nodes isn't ideal12:00
cgoncalvesrm_work, from tempest how would you write out a cert file?12:00
rm_workah yeah i guess i am thinking mostly of gates where it's all on the same couple nodes... it could be complex with a multinode deployed cloud12:01
cgoncalvesI mean, yes, it is possible but it shouldn't require having internal perms to the cloud12:01
rm_workin gates it's easy... you use open()12:01
cgoncalvesright12:01
rm_workbut... in a deployed cloud... you'd need to have the noop driver enabled?12:01
rm_workwhich ... what is even the point of doing the test if the noop driver is what your cloud uses12:02
rm_workin a real cloud if you're running tempest you should be testing with real barbican certs, or else if TLS-Termination is disabled, you should skip those tests12:02
rm_workthis kind of driver really *is* only for gates12:04
cgoncalvesnoop cert manager still requires less, or actually no, pre-configuration than the local cert manager12:07
cgoncalvesthere are two sides of tempest tests: API tests and scenario tests. API tests test against an implementation of the API specification and only that12:08
rm_workyeah ok i guess it's fair to say it's simpler12:08
rm_workbut i still don't buy the "pre-configuration" argument12:08
rm_workboth can be set up live during the test, just for one of them the setup is calling open() and the other it's ... nothing :D12:09
*** vishalmanchanda has quit IRC12:11
cgoncalvesrm_work, you wouldn't be able to open() if you were to run tempest from outside the octavia controller nodes. with the noop you can12:11
rm_workyeah alright12:12
cgoncalvesyou'd also need to copy the cert files to all your nodes running the octavia API service12:12
rm_worki doubt anyone even uses that anyway12:12
rm_workagain, these have zero use outside of gates12:12
rm_workand gates have at most two nodes12:12
rm_workso...12:12
* rm_work shrugs12:12
*** tkajinam has quit IRC12:12
rm_workgo ahead and replace it if you want, i'm ambivalent12:14
cgoncalvesI'd maybe delete the local one on a follow-up patch. even though it is highly discouraged to be used outside testing envs, it probably requires a deprecation12:21
*** rpittau|bbl is now known as rpittau12:31
*** sapd1 has joined #openstack-lbaas12:33
*** vishalmanchanda has joined #openstack-lbaas13:15
*** TrevorV has joined #openstack-lbaas13:22
*** tkajinam has joined #openstack-lbaas13:33
*** ccamposr__ has joined #openstack-lbaas13:41
*** ccamposr has quit IRC13:43
openstackgerritAdam Harwell proposed openstack/octavia master: Fix py3 amphora-agent cert-rotation type bug  https://review.opendev.org/71992213:48
*** dougwig has quit IRC13:50
*** andrein has quit IRC13:50
*** dougwig has joined #openstack-lbaas13:50
*** rpittau has quit IRC13:50
*** andrein has joined #openstack-lbaas13:51
*** rpittau has joined #openstack-lbaas13:51
*** maciejjozefczyk_ is now known as maciejjozefczyk13:55
*** tkajinam has quit IRC14:17
*** tkajinam has joined #openstack-lbaas14:18
*** sapd1 has quit IRC14:24
*** dosaboy_ is now known as dosaboy15:04
*** JasonF is now known as JayF15:25
johnsomrm_work Any luck catching zeek?15:32
johnsomHmm, looks like we have a problem in the devstack plugin too, the act/stdby job is failing looking for redis15:39
rm_work:/15:49
rm_workno he hasn't responded15:49
johnsomSo this seems to be a problem: https://review.opendev.org/#/c/647406/106/octavia/controller/queue/v2/consumer.py15:55
johnsomIt is loading taskflow redis stuff no matter what.15:55
johnsomtaskflow isn't declaring redis as a requirement.15:55
*** gcheresh has joined #openstack-lbaas15:56
johnsomYeah, redis is in the setuptools "extras"15:56
rm_workhmm15:57
haleybjohnsom: maciej thought he was seeing issues with the octavia devstack plugin too wrt post-config, i haven't reproduced, something regarding generating certs, do you typically set OCTAVIA_USE_PREGENERATED_CERTS=True ?15:59
johnsomNo I don't16:00
johnsomI haven't heard of any issues generating certs16:00
johnsomI think we only set that for the multinode jobs, but I might be wrong.16:01
*** rpittau is now known as rpittau|afk16:02
haleybyeah, it was only in the multinode examples, i'll see if there's a diff in our conf files16:03
johnsomYeah, it's only there when there are multiple controllers so each doesn't build their own set of certs.16:04
haleybjohnsom: so one thing i have noticed is that if i run this line in plugin.sh in a testenv: "source create_dual_intermediate_CA.sh" the shell i'm in will die shortly afterwards just running a command, so there's something funky in that script.  Running as ./create_dual_intermediate_CA.sh is fine16:11
johnsomIt's a pretty straightforward script as I remember16:13
haleybmust be the semantics of source vs ./16:13
gthiemongehaleyb: johnsom: sourcing a script that uses set -e could be an issue, the shell will be closed on error16:15
johnsomYou could turn on set -x for the whole script if you want to see what it is doing during a run16:15
haleybgthiemonge: i was just googling that16:15
johnsomYeah, it is set up to fail on error as we would want the gates to fail early if something went wrong16:16
haleyblocally it runs fine, although it does complain about a file not existing16:16
johnsomYeah, openssl does that, it just creates the file automatically16:16
johnsomThat will not trigger an exit16:17
haleybjohnsom: it doesn't, but looking in the logs shortly after the plugin.sh code seems to stop and the next service is configured16:18
haleybso that set -e maybe is an issue16:19
johnsomThat needs to stay, it is important16:19
*** tkajinam has quit IRC16:20
haleybputting a set +e at the end helps16:20
johnsomSo some other plugin is failing is what you are saying?16:23
haleybjohnsom: i can't tell, but i don't think the octavia one truly finished16:24
johnsomrm_work For the redis issue.... Should we add the redis extra on taskflow in our requirements, just declare it as a requirement, or stop these conductors from starting if the driver isn't ampv2?16:26
johnsomWhat are your thoughts?16:26
rm_workhmm the latter sounds like it might be the most efficient...16:27
rm_workthe other two would impact other deployments16:27
*** gcheresh has quit IRC16:28
rm_workis there a taskflow[redis]?16:28
rm_workactually that might not be horrible... guessing those libs are rather small and that'd guarantee it works right for anyone wanting to switch over (which hopefully most will)16:29
johnsomThere is a taskflow[redis]16:30
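
A minimal sketch of the third option floated above (only start the jobboard conductors when the v2 driver is enabled and its backend library is importable). The function name, its parameters, and the way enabled drivers are passed in are hypothetical and are not taken from the Octavia code; only the "amphorav2" driver name and the "redis" module come from the discussion.

```python
import importlib.util


def missing_backend_modules(enabled_drivers, required=("redis",)):
    """Return the required modules that cannot be imported.

    Hypothetical helper: an empty list means either the v2 driver is not
    enabled (so nothing is needed) or every required module is present.
    """
    if "amphorav2" not in enabled_drivers:
        # The v1 driver does not use the jobboard, so nothing is required.
        return []
    return [
        name for name in required
        if importlib.util.find_spec(name) is None
    ]
```

A caller could skip starting the conductors, or log a clear error, whenever the returned list is non-empty, instead of failing on import at startup.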
*** psachin has quit IRC16:34
haleybjohnsom: i'll send a patch for the set +e after lunch, definitely seems like a problem16:38
johnsomI don't think we should change anything.16:38
haleybhttp://paste.openstack.org/show/792112/16:39
johnsomI would want to see a detailed story behind the change16:39
haleybjohnsom: it would just do a 'set +e' at the end of the script16:39
rm_workyeah sourcing a `set -e` script can have unintended effects16:39
rm_workIMO we shouldn't even source it16:40
rm_worki don't know why it is done that way16:40
haleybsome command after is returning non-zero which is causing a shell exit, POOF goes the plugin.sh that was running16:40
rm_workshould just run it...16:40
johnsomIt causes devstack to stop on failures and not just continue pretending things are fine. It's on everywhere as far as I know16:40
rm_workright, which normally might be fine in the following scripts16:40
rm_workerr16:40
rm_worknot EVERYWHERE16:40
rm_workit's up to individual scripts16:40
rm_workwe'd be overriding that16:41
haleybjohnsom: it's fine if the script sets it, but it should un-set it when done, the source is causing the parent to inherit it16:41
johnsomIf I remember it's sourced so it has access to the devstack variables, but we may not need that anymore, would have to look at the script again.16:41
johnsomhaleyb Correct, which should (did) also have it set16:41
rm_workerr, running with `./` should inherit vars from parent?16:43
rm_workor does it need to SET devstack vars?16:44
haleybjohnsom: plugin.sh doesn't do it, something else recently is just tickling something, but i was only seeing a 1/10 success rate getting things to work here friday16:45
johnsomI don't think we need anything passed in or exported now. When I re-wrote that a year ago I think I removed the need for any of that.16:45
haleybrm_work: it seems to run the same just as ./create...16:46
rm_workyeah i imagine it would16:46
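
The source-versus-execute difference discussed above can be reproduced with a short sketch. It assumes a POSIX `sh` is on the PATH; the helper script name and contents are made up for illustration and are not the real create_dual_intermediate_CA.sh.

```python
import os
import subprocess
import tempfile


def run_after(invocation):
    """Load a `set -e` helper via `invocation`, then run a failing command.

    Returns (exit status, stdout) of the parent shell.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "helper.sh")  # hypothetical stand-in script
        with open(path, "w") as f:
            f.write("set -e\ntrue\n")
        # "$1" is the helper path; `false` stands in for any later command
        # that returns non-zero in the parent shell.
        cmd = invocation + ' "$1"; false; echo survived'
        result = subprocess.run(
            ["sh", "-c", cmd, "sh", path],
            capture_output=True,
            text=True,
        )
    return result.returncode, result.stdout.strip()


sourced = run_after(".")     # like `source helper.sh`: errexit leaks into the parent
executed = run_after("sh")   # like `./helper.sh`: errexit stays in the child
```

When the helper is sourced, the parent shell inherits errexit and exits at `false` before reaching `echo`; when it is executed as a child process, the parent carries on.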
rm_workmeanwhile, *all of our gates are blocked* because of this sqlalchemy issue16:46
johnsomYeah, and the redis thing16:46
rm_workk16:47
* haleyb goes to lunch, will put up a review later16:47
rm_worki think i vote we add [redis]16:47
rm_workif that's all it needs16:47
johnsomOk, I wanted a second opinion as those extras are... a pain16:47
rm_work... are they? i didn't think so16:48
johnsomTechnically there is a zookeeper option too16:48
rm_worklet me look closer16:48
johnsomhttps://github.com/openstack/taskflow/blob/master/setup.cfg16:48
johnsomMy issue is the extras aren't vetted by G-R and people tend to bundle too much in them16:48
rm_workhmmmmmm16:49
rm_workthe redis one is JUST redis16:49
rm_worki guess it might be best to magically detect16:49
rm_worki'm just thinking about folks who try to turn that on16:49
johnsomyeah, not saying that is an issue here (though the DB one looks...)16:49
rm_workand don't realize they need to hack at the reqs16:49
rm_workjust doing a normal install of our package won't do it at that point16:50
johnsomhttps://github.com/openstack/octavia/blob/master/octavia/common/config.py#L47216:50
rm_workright16:51
openstackgerritMichael Johnson proposed openstack/octavia master: Add the "redis" extra for taskflow requirement  https://review.opendev.org/72003317:03
rm_workgonna have to combine that with a sqlalchemy fix17:03
rm_workoh unless the redis thing is only for a nonvoting gate?17:03
johnsomyeah, it will, just getting it up for comment17:04
*** vishalmanchanda has quit IRC17:04
johnsomrm_work I'm not sure about this "for each endpoint" either: https://review.opendev.org/#/c/647406/106/octavia/controller/queue/v2/consumer.py17:07
johnsomThat might be a bug as well.17:07
johnsomOk, I'm pivoting to look at if I can work around the sqlalchemy bug17:07
rm_workhmm17:09
rm_workyeah sending to multiple queues might not be good17:09
rm_workcould lead to double-processing?17:09
johnsomNo, I don't think that is the issue, I think it is just starting the number of conductors based on the queue endpoints, instead of like a taskflow worker setting for example17:10
rm_workwouldn't it need one conductor per endpoint?17:12
rm_worki dunno, this is the part of this patch i didn't follow so well17:12
johnsomhttps://docs.openstack.org/taskflow/latest/user/conductors.html17:13
*** maciejjozefczyk has quit IRC17:36
johnsomYeah, if I remove the "#            dbapi_connection.isolation_level = """ line from that sqlalchemy patch, the tests pass again17:44
johnsomThe concerning thing is the tests pass if I run just the DB functionals, so that is super odd.17:46
johnsomOk, so that patch changed the default isolation_level from None to "".17:47
openstackgerritBrian Haley proposed openstack/octavia master: Don't inherit enforcing bash errexit in devstack plugin  https://review.opendev.org/72004117:47
*** gcheresh has joined #openstack-lbaas18:14
rm_workjohnsom: yeah i was trying to figure out that last part -- why running just the DB functionals wouldn't replicate19:14
rm_workjohnsom: it means testing any change requires running the WHOLE suite19:15
rm_workand debugging that test becomes very difficult19:15
johnsomrm_work tox -e functional-py36 -- octavia.tests.functional.db.test_repositories.AllRepositoriesTest.test_create_load_balancer_tree\|octavia.tests.functional.api.v2.test_flavors19:15
rm_workok so just two will do it?19:15
johnsomThis is so strange....19:17
rm_workyes19:17
johnsomI mean, I can instantly fix it by removing the isolation = "" in sqlalchemy19:17
rm_workwe use an inmemory sqlite db for the functionals, right?19:18
rm_worknot a fileDB?19:18
johnsomNo, there are both. Most are in-memory, a few require a file19:18
rm_workhmm19:18
rm_workso, for file DBs, sqlite *cannot* handle concurrency, apparently19:18
rm_workjust because of the way it works19:18
rm_workAFAIU19:18
johnsomright19:18
rm_workso transactional isolation would obviously fail in that case19:19
rm_worki'm trying to figure out why my canary test *suddenly passes*19:19
rm_workit makes it seem like something was FIXED19:19
rm_workand makes me wonder if there's a bug in that other test19:19
rm_worklike, it was written around the bug that got fixed19:19
johnsomI'm ignoring that for now.19:20
johnsomI don't understand your statement about transactions not working on a file backed sqlite, but...19:21
johnsomI think the issue is around sqlalchemy not being thread safe, somewhere we are sharing a session or something.19:22
johnsomIt's very test order dependent19:22
rm_worksince we initialize two sessions19:23
rm_workand do things in separate transactions19:23
rm_workthat's what my test was checking19:23
johnsomThe thing is, the tree test that is failing, is bombing on the part that is all part of one session/transaction19:24
rm_workand we do the same thing in this tree test19:24
rm_workhmm19:25
rm_workerg well i have a meeting followed by sleep19:26
johnsomYeah, I need lunch19:26
*** tobberydberg_ has quit IRC20:17
*** tobberydberg has joined #openstack-lbaas20:22
*** maciejjozefczyk has joined #openstack-lbaas20:30
*** tobberydberg has quit IRC20:30
*** tobberydberg has joined #openstack-lbaas20:36
*** tobberydberg has quit IRC20:37
*** maciejjozefczyk has quit IRC20:39
*** tobberydberg has joined #openstack-lbaas20:42
*** tobberydberg has quit IRC20:43
johnsomWell, I got a sqlalchemy info level capture of the bug20:44
*** tobberydberg has joined #openstack-lbaas20:45
*** tobberydberg has quit IRC20:45
*** tobberydberg has joined #openstack-lbaas20:46
*** tobberydberg has quit IRC20:46
*** tobberydberg has joined #openstack-lbaas20:46
*** tobberydberg has quit IRC20:47
*** tobberydberg has joined #openstack-lbaas20:47
*** tobberydberg has quit IRC20:47
*** tobberydberg has joined #openstack-lbaas20:51
*** gcheresh has quit IRC20:53
*** tobberydberg has quit IRC20:55
*** KeithMnemonic has joined #openstack-lbaas21:08
*** KeithMnemonic has quit IRC21:16
*** KeithMnemonic has joined #openstack-lbaas21:17
johnsomYeah, ok, so before the new version it ran with sqlite autocommit and sqlalchemy non-autocommit. Now it is non-autocommit and non-autocommit.21:21
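
The difference between an `isolation_level` of None and "" can be seen with the standard-library `sqlite3` module alone. This is only a sketch of the driver-level behaviour; it does not involve SQLAlchemy or the Octavia test suite, and the table name is made up.

```python
import sqlite3


def in_transaction_after_insert(isolation_level):
    """Report whether an INSERT leaves the connection inside a transaction."""
    conn = sqlite3.connect(":memory:", isolation_level=isolation_level)
    try:
        conn.execute("CREATE TABLE lb (id INTEGER PRIMARY KEY, name TEXT)")
        conn.execute("INSERT INTO lb (name) VALUES ('lb1')")
        # None  -> driver autocommit: the INSERT is committed immediately.
        # ""    -> the driver issues an implicit BEGIN before the INSERT,
        #          so the row stays uncommitted until commit() is called.
        return conn.in_transaction
    finally:
        conn.close()


autocommit_mode = in_transaction_after_insert(None)
implicit_begin_mode = in_transaction_after_insert("")
```

With None the connection is not left inside a transaction after the INSERT, while with "" it is, which matches the "autocommit" versus "non-autocommit" distinction described above.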
lxkongjohnsom, rm_work could you take a look at the updated patch for the https://storyboard.openstack.org/#!/story/2007531 please? Do you think we could just submit a gerrit patch considering the security class Jeremy suggested?21:51
johnsomI saw you updated, but have not yet reviewed. Path forward would be to submit a patch, however sqlalchemy has broken our gates, so now might not be the best time.21:52
rm_workyeah i think this is a case of "don't let perfect be the enemy of good"21:54
rm_workthat solution is better than nothing, even though it still has some flaws21:54
rm_workand it's not an api-level change so we can always revert it once we can do the totally correct thing21:54
johnsomYeah, I agree, I'm just saying if it's posted now, it may sit for days21:54
rm_workyep21:55
lxkongjohnsom, rm_work, thanks for the suggestion, then I wait for gate issue solved?21:56
rm_workyeah21:57
johnsomYeah, I will post a comment when I have reviewed21:57
lxkongcool, please ping me or leave a comment in the story after that's done, thank you so much21:57
lxkongjohnsom, ack21:57
johnsomI think you addressed my only concern21:57
*** servagem has quit IRC22:13
lxkongjohnsom, rm_work, is redis or zookeeper a hard requirement for the master deployment now?22:24
johnsomlxkong No, not yet.22:24
johnsomIt is only needed if you use the amphorav2 driver at the moment.22:25
rm_workjohnsom: do i need to do anything besides `ifconfig lo up` to make local queries on an amp?22:31
johnsomno, just make sure you are inside the netns22:31
rm_workhmm22:31
rm_workyeah weirdness22:32
johnsomI might do ifup lo22:32
rm_worki can hit a member from the netns22:32
rm_workbut i can't hit it via the local IP22:32
rm_work(of the lb)22:32
rm_workipvsadm shows members up22:33
johnsomAh, UDP... that might be different22:33
rm_workah nm i can reach it, was using the vrrp ip i think22:33
rm_workthe HA IP doesn't show as up, but ipvsadm shows it in use22:33
rm_workand it does work :D22:33
johnsomThis DB stuff is bonkers.22:35
rm_worktesting failover right now22:35
rm_workseems like this LB stopped passing traffic22:35
johnsomOne call, I can see the LB with a select, a few calls later, same transaction, LB missing. Run the test again, LB doesn't disappear22:35
rm_worktrying to figure out what happened22:35
rm_workpre-failover and post-failover HA port looks VERY different22:39
rm_workwtf?22:39
rm_workoh nm i think i see why22:40
lxkong> It is only needed if you use the amphorav2 driver at the moment.22:45
lxkongjohnsom, but it's not possible to configure whether octavia-worker runs the v1 or v2 consumer. I updated Octavia for my devstack environment yesterday but it failed with http://dpaste.com/36KYZ9B. Then I had to install redis. So I suppose either redis or zookeeper needs to be installed for octavia-worker.22:45
*** born2bake has quit IRC22:46
*** tkajinam has joined #openstack-lbaas22:46
johnsomlxkong What is the output of "openstack loadbalancer provider list "22:50
lxkonghttps://www.irccloud.com/pastebin/d2DGDbVY/22:51
johnsomrm_work ^^^ Yeah, I think there is a bigger problem with the jobboard patch than that missing extra.22:52
johnsomYeah, v2 in the controller worker is enabled by default: https://github.com/openstack/octavia/blob/master/octavia/cmd/octavia_worker.py#L3822:54
johnsomI wonder why the non-v2 gates are passing22:54
rm_workI thought that was turned off22:54
rm_workAlso -- I just deployed that code and it seems to be working fine22:55
rm_workAh. Not using the devstack plugin though :D22:56
rm_workWell isn't that ok? It'll listen on a second queue that just never has anything22:57
rm_workRight?22:57
johnsomIt'22:58
johnsomIt is starting up the taskflow conductors which try to go out to redis22:58
lxkongor it would be good to config which version consumer is running22:59
johnsomOne issue I think is the devstack plugin is assigning instead of ==23:00
johnsomBut it still seems like these conductors are always going to be spun up, which... isn't what we intended.23:00
*** TrevorV has quit IRC23:06
*** dayou has joined #openstack-lbaas23:46
