*** openstack has joined #openstack-lbaas | 09:56 | |
*** ChanServ sets mode: +o openstack | 09:56 | |
*** rpittau is now known as rpittau|bbl | 10:23 | |
openstackgerrit | Adam Harwell proposed openstack/octavia master: Healthmanager opts aren't CLI-related https://review.opendev.org/719921 | 11:26 |
---|---|---|
openstackgerrit | Adam Harwell proposed openstack/octavia master: Fix py3 amphora-agent cert-rotation type bug https://review.opendev.org/719922 | 11:29 |
*** sapd1 has quit IRC | 11:30 | |
rm_work | ^^ that second one is pretty high priority IMO | 11:33 |
rm_work | cgoncalves: you around? could use thoughts -- do we also not need to worry about py2 on amps anymore? | 11:37 |
rm_work | if not, then i don't need any six / typechecking guard code around that, but if we do, then i do | 11:37 |
cgoncalves | rm_work, Stein is still a supported release version and is tested against both py2 and py3 | 11:48 |
rm_work | ok but this is master | 11:48 |
rm_work | i'm referring specifically to patch 719922 | 11:49 |
cgoncalves | rm_work, correct. no six in master. when backporting, please consider py2 too | 11:49 |
rm_work | k | 11:50 |
rm_work | got it, you want to backport it with that fix, kk | 11:50 |
rm_work | i'll do that once it merges | 11:50 |
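
Editor's sketch: the log does not show what patch 719922 changes, so the following is not its implementation. It only illustrates the kind of py3-only type guard rm_work is describing, one that needs no `six` on master. The helper name `ensure_bytes` and the UTF-8 default are assumptions made for illustration.

```python
def ensure_bytes(value, encoding="utf-8"):
    """Return value as bytes, accepting either str or bytes (py3 only).

    Hypothetical helper, not taken from Octavia.
    """
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(encoding)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


if __name__ == "__main__":
    print(ensure_bytes("pem data"))
    print(ensure_bytes(b"pem data"))
```

A backport to a branch that still runs py2 would need a different check, since `str` and `bytes` are the same type there, which is the point cgoncalves raises about Stein.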
rm_work | BTW this sucks really hard, my amps just started exploding one by one as their certs came up for rotation | 11:51 |
rm_work | and i think it has been happening before and i didn't even notice because of all the other things that caused amps to explode <_< | 11:53 |
rm_work | but we really need to fix this sqlalchemy issue because we can't merge anything | 11:54 |
rm_work | also cgoncalves i still don't understand https://review.opendev.org/#/c/717619/ | 11:55 |
rm_work | i posted another comment -- the whole design of the original local-cert-manager driver was to enable tempest testing like you're trying to do | 11:55 |
rm_work | it should work fine? | 11:56 |
rm_work | I just don't understand the need for *yet another* noop driver (the local driver was essentially designed to be a noop, it's not usable for anything besides testing) | 11:56 |
rm_work | if we're not going to use the local driver for this, then we may as well delete and replace it with the noop one? | 11:57 |
rm_work | but it seems like a more robust option | 11:57 |
*** servagem has joined #openstack-lbaas | 11:58 | |
cgoncalves | rm_work, I agree with replacing the local cert manager with the noop one. | 11:58 |
rm_work | so you'd really rather have this (what seems to me to be) really limited noop option? | 11:59 |
cgoncalves | rm_work, the problem with the local cert manager is it requires pre-configuration prior to running tempest | 11:59 |
rm_work | no? | 11:59 |
rm_work | tempest can drop files in the tests | 11:59 |
rm_work | a test can: write out a certfile and then use it in octavia within the same test | 12:00 |
cgoncalves | tempest should test against the cloud from the outside (black box) so having the pre-req of having cert files in the cloud nodes isn't ideal | 12:00 |
cgoncalves | rm_work, from tempest how would you write out a cert file? | 12:00 |
rm_work | ah yeah i guess i am thinking mostly of gates where it's all on the same couple nodes... it could be complex with a multinode deployed cloud | 12:01 |
cgoncalves | I mean, yes, it is possible but it shouldn't require having internal perms to the cloud | 12:01 |
rm_work | in gates it's easy... you use open() | 12:01 |
cgoncalves | right | 12:01 |
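
Editor's sketch of the "you use open()" approach, assuming the test process shares a filesystem with the Octavia services, which is the single-node gate case rm_work describes. The function name, file name and placeholder PEM text are hypothetical, and nothing here calls the real local cert manager.

```python
import os
import tempfile

# Placeholder text only; a real test would generate or load an actual PEM.
PLACEHOLDER_PEM = (
    "-----BEGIN CERTIFICATE-----\n"
    "(placeholder)\n"
    "-----END CERTIFICATE-----\n"
)


def write_cert_file(directory, name, pem_text):
    """Write pem_text to directory/name and return the full path."""
    path = os.path.join(directory, name)
    with open(path, "w") as cert_file:
        cert_file.write(pem_text)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cert_dir:
        print(write_cert_file(cert_dir, "server.pem", PLACEHOLDER_PEM))
```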
rm_work | but... in a deployed cloud... you'd need to have the noop driver enabled? | 12:01 |
rm_work | which ... what is even the point of doing the test if the noop driver is what your cloud uses | 12:02 |
rm_work | in a real cloud if you're running tempest you should be testing with real barbican certs, or else if TLS-Termination is disabled, you should skip those tests | 12:02 |
rm_work | this kind of driver really *is* only for gates | 12:04 |
cgoncalves | noop cert manager still requires less (or none, actually) pre-configuration than the local cert manager | 12:07 |
cgoncalves | there are two sides of tempest tests: API tests and scenario tests. API tests test against an implementation of the API specification and only that | 12:08 |
rm_work | yeah ok i guess it's fair to say it's simpler | 12:08 |
rm_work | but i still don't buy the "pre-configuration" argument | 12:08 |
rm_work | both can be set up live during the test, just for one of them the setup is calling open() and the other it's ... nothing :D | 12:09 |
*** vishalmanchanda has quit IRC | 12:11 | |
cgoncalves | rm_work, you wouldn't be able to open() if you were to run tempest from outside the octavia controller nodes. with the noop you can | 12:11 |
rm_work | yeah alright | 12:12 |
cgoncalves | you'd also need to copy the cert files to all your nodes running the octavia API service | 12:12 |
rm_work | i doubt anyone even uses that anyway | 12:12 |
rm_work | again, these have zero use outside of gates | 12:12 |
rm_work | and gates have at most two nodes | 12:12 |
rm_work | so... | 12:12 |
* rm_work shrugs | 12:12 | |
*** tkajinam has quit IRC | 12:12 | |
rm_work | go ahead and replace it if you want, i'm ambivalent | 12:14 |
cgoncalves | I'd maybe delete the local one on a follow-up patch. even though it is highly discouraged to be used outside testing envs, it probably requires a deprecation | 12:21 |
*** rpittau|bbl is now known as rpittau | 12:31 | |
*** sapd1 has joined #openstack-lbaas | 12:33 | |
*** vishalmanchanda has joined #openstack-lbaas | 13:15 | |
*** TrevorV has joined #openstack-lbaas | 13:22 | |
*** tkajinam has joined #openstack-lbaas | 13:33 | |
*** ccamposr__ has joined #openstack-lbaas | 13:41 | |
*** ccamposr has quit IRC | 13:43 | |
openstackgerrit | Adam Harwell proposed openstack/octavia master: Fix py3 amphora-agent cert-rotation type bug https://review.opendev.org/719922 | 13:48 |
*** dougwig has quit IRC | 13:50 | |
*** andrein has quit IRC | 13:50 | |
*** dougwig has joined #openstack-lbaas | 13:50 | |
*** rpittau has quit IRC | 13:50 | |
*** andrein has joined #openstack-lbaas | 13:51 | |
*** rpittau has joined #openstack-lbaas | 13:51 | |
*** maciejjozefczyk_ is now known as maciejjozefczyk | 13:55 | |
*** tkajinam has quit IRC | 14:17 | |
*** tkajinam has joined #openstack-lbaas | 14:18 | |
*** sapd1 has quit IRC | 14:24 | |
*** dosaboy_ is now known as dosaboy | 15:04 | |
*** JasonF is now known as JayF | 15:25 | |
johnsom | rm_work Any luck catching zeek? | 15:32 |
johnsom | Hmm, looks like we have a problem in the devstack plugin too, the act/stdby job is failing looking for redis | 15:39 |
rm_work | :/ | 15:49 |
rm_work | no he hasn't responded | 15:49 |
johnsom | So this seems to be a problem: https://review.opendev.org/#/c/647406/106/octavia/controller/queue/v2/consumer.py | 15:55 |
johnsom | It is loading taskflow redis stuff no matter what. | 15:55 |
johnsom | taskflow isn't declaring redis as a requirement. | 15:55 |
*** gcheresh has joined #openstack-lbaas | 15:56 | |
johnsom | Yeah, redis is in the setuptools "extras" | 15:56 |
rm_work | hmm | 15:57 |
haleyb | johnsom: maciej thought he was seeing issues with the octavia devstack plugin too wrt post-config, i haven't reproduced, something regarding generating certs, do you typically set OCTAVIA_USE_PREGENERATED_CERTS=True ? | 15:59 |
johnsom | No I don't | 16:00 |
johnsom | I haven't heard of any issues generating certs | 16:00 |
johnsom | I think we only set that for the multinode jobs, but I might be wrong. | 16:01 |
*** rpittau is now known as rpittau|afk | 16:02 | |
haleyb | yeah, it was only in the multinode examples, i'll see if there's a diff in our conf files | 16:03 |
johnsom | Yeah, it's only there when there are multiple controllers so each doesn't build their own set of certs. | 16:04 |
haleyb | johnsom: so one thing i have noticed is that if i run this line in plugin.sh in a testenv: "source create_dual_intermediate_CA.sh" the shell i'm in will die shortly afterwards just running a command, so there's something funky in that script. Running as ./create_dual_intermediate_CA.sh is fine | 16:11 |
johnsom | It's a pretty straight forward script as I remember | 16:13 |
haleyb | must be the semantics of source vs ./ | 16:13 |
gthiemonge | haleyb: johnsom: sourcing a script that uses set -e could be an issue, the shell will be closed on error | 16:15 |
johnsom | You could turn on set -x for the whole script if you want to see what it is doing during a run | 16:15 |
haleyb | gthiemonge: i was just googling that | 16:15 |
johnsom | Yeah, it is setup to fail on error as we would want the gates to fail early if something went wrong | 16:16 |
haleyb | locally it runs fine, although it does complain about a file not existing | 16:16 |
johnsom | Yeah, openssl does that, it just creates the file automatically | 16:16 |
johnsom | That will not trigger an exit | 16:17 |
haleyb | johnsom: it doesn't, but looking in the logs shortly after the plugin.sh code seems to stop and the next service is configured | 16:18 |
haleyb | so that set -e maybe is an issue | 16:19 |
johnsom | That needs to stay, it is important | 16:19 |
*** tkajinam has quit IRC | 16:20 | |
haleyb | putting a set +e at the end helps | 16:20 |
johnsom | So some other plugin is failing is what you are saying? | 16:23 |
haleyb | johnsom: i can't tell, but i don't think the octavia one truly finished | 16:24 |
johnsom | rm_work For the redis issue.... Should we add the redis extra on taskflow in our requirements, just declare it as a requirement, or stop these conductors from starting if the driver isn't ampv2? | 16:26 |
johnsom | What are your thoughts? | 16:26 |
rm_work | hmm the latter sounds like it might be the most efficient... | 16:27 |
rm_work | the other two would impact other deployments | 16:27 |
*** gcheresh has quit IRC | 16:28 | |
rm_work | is there a taskflow[redis]? | 16:28 |
rm_work | actually that might not be horrible... guessing those libs are rather small and that'd guarantee it works right for anyone wanting to switch over (which hopefully most will) | 16:29 |
johnsom | There is a taskflow[redis] | 16:30 |
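
Editor's sketch of the third option johnsom lists, only starting the v2 conductors when the amphorav2 driver is in use, combined with a check that the optional backend library can be imported. Both function names and the `enabled_drivers` argument are hypothetical; the real consumer code is not shown in the log.

```python
import importlib.util


def optional_dependency_available(module_name):
    """Return True if a top-level module can be imported, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def can_start_v2_consumer(enabled_drivers, backend_module="redis"):
    """Hypothetical guard: start v2 conductors only when they can work.

    enabled_drivers is assumed to be a list of provider driver names.
    """
    return ("amphorav2" in enabled_drivers
            and optional_dependency_available(backend_module))


if __name__ == "__main__":
    print(can_start_v2_consumer(["amphora", "octavia"]))
    print(can_start_v2_consumer(["amphora", "amphorav2"]))
```

The alternative rm_work leans toward, declaring `taskflow[redis]` in the requirements, avoids the runtime check at the cost of installing the extra for every deployment.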
*** psachin has quit IRC | 16:34 | |
haleyb | johnsom: i'll send a patch for the set +e after lunch, definitely seems like a problem | 16:38 |
johnsom | I don't think we should change anything. | 16:38 |
haleyb | http://paste.openstack.org/show/792112/ | 16:39 |
johnsom | I would want to see a detailed story behind the change | 16:39 |
haleyb | johnsom: it would just do a 'set +e' at the end of the script | 16:39 |
rm_work | yeah sourcing a `set -e` script can have unintended effects | 16:39 |
rm_work | IMO we shouldn't even source it | 16:40 |
rm_work | i don't know why it is done that way | 16:40 |
haleyb | some command after is returning non-zero which is causing a shell exit, POOF goes the plugin.sh that was running | 16:40 |
rm_work | should just run it... | 16:40 |
johnsom | It causes devstack to stop on failures and not just continue pretending things are fine. It's on everywhere as far as I know | 16:40 |
rm_work | right, which normally might be fine in the following scripts | 16:40 |
rm_work | err | 16:40 |
rm_work | not EVERYWHERE | 16:40 |
rm_work | it's up to individual scripts | 16:40 |
rm_work | we'd be overriding that | 16:41 |
haleyb | johnsom: it's fine if the script sets it, but it should un-set it when done, the source is causing the parent to inherit it | 16:41 |
johnsom | If I remember it's sourced so it has access to the devstack variables, but we may not need that anymore, would have to look at the script again. | 16:41 |
johnsom | haleyb Correct, which should (did) also have it set | 16:41 |
rm_work | err, running with `./` should inherit vars from parent? | 16:43 |
rm_work | or does it need to SET devstack vars? | 16:44 |
haleyb | johnsom: plugin.sh doesn't do it, something else recently is just tickling something, but i was only seeing a 1/10 success rate getting things to work here friday | 16:45 |
johnsom | I don't think we need anything passed in or exported now. When I re-wrote that a year ago I think I removed the need for any of that. | 16:45 |
haleyb | rm_work: it seems to run the same just as ./create... | 16:46 |
rm_work | yeah i imagine it would | 16:46 |
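
Editor's sketch reproducing the behaviour gthiemonge and haleyb describe: a sourced script that runs `set -e` leaves errexit enabled in the calling shell, so the next failing command kills the caller, while running the same script as a child process does not. It assumes a POSIX `sh` on the PATH and uses `.`, the POSIX spelling of `source`. The script contents are stand-ins, not create_dual_intermediate_CA.sh.

```python
import os
import subprocess
import tempfile


def run_parent(invoke):
    """Run a parent script that calls a `set -e` child via `invoke`.

    invoke is "." (source the child) or "sh" (run it as a child process).
    Returns (stdout, exit_code) of the parent shell.
    """
    with tempfile.TemporaryDirectory() as tmp:
        child = os.path.join(tmp, "child.sh")
        with open(child, "w") as script:
            script.write("set -e\ntrue\n")
        # After the child returns, the parent runs a failing command
        # and then tries to print a marker.
        parent = '{} "{}"\nfalse\necho survived\n'.format(invoke, child)
        result = subprocess.run(
            ["sh", "-c", parent],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
        return result.stdout.strip(), result.returncode


if __name__ == "__main__":
    print(run_parent("."))   # ('', 1): errexit leaked into the parent
    print(run_parent("sh"))  # ('survived', 0): parent unaffected
```

Either remedy raised in the channel removes the leak: ending the sourced script with `set +e`, or executing it with `./` instead of sourcing it.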
rm_work | meanwhile, *all of our gates are blocked* because of this sqlalchemy issue | 16:46 |
johnsom | Yeah, and the redis thing | 16:46 |
rm_work | k | 16:47 |
* haleyb goes to lunch, will put up a review later | 16:47 | |
rm_work | i think i vote we add [redis] | 16:47 |
rm_work | if that's all it needs | 16:47 |
johnsom | Ok, I wanted a second opinion as those extras are... a pain | 16:47 |
rm_work | ... are they? i didn't think so | 16:48 |
johnsom | Technically there is a zookeeper option too | 16:48 |
rm_work | let me look closer | 16:48 |
johnsom | https://github.com/openstack/taskflow/blob/master/setup.cfg | 16:48 |
johnsom | My issue is the extras aren't vetted by G-R and people tend to bundle too much in them | 16:48 |
rm_work | hmmmmmm | 16:49 |
rm_work | the redis one is JUST redis | 16:49 |
rm_work | i guess it might be best to magically detect | 16:49 |
rm_work | i'm just thinking about folks who try to turn that on | 16:49 |
johnsom | yeah, not saying that is an issue here (though the DB one looks...) | 16:49 |
rm_work | and don't realize they need to hack at the reqs | 16:49 |
rm_work | just doing a normal install of our package won't do it at that point | 16:50 |
johnsom | https://github.com/openstack/octavia/blob/master/octavia/common/config.py#L472 | 16:50 |
rm_work | right | 16:51 |
openstackgerrit | Michael Johnson proposed openstack/octavia master: Add the "redis" extra for taskflow requirement https://review.opendev.org/720033 | 17:03 |
rm_work | gonna have to combine that with a sqlalchemy fix | 17:03 |
rm_work | oh unless the redis thing is only for a nonvoting gate? | 17:03 |
johnsom | yeah, it will, just getting it up for comment | 17:04 |
*** vishalmanchanda has quit IRC | 17:04 | |
johnsom | rm_work I'm not sure about this "for each endpoint" either: https://review.opendev.org/#/c/647406/106/octavia/controller/queue/v2/consumer.py | 17:07 |
johnsom | That might be a bug as well. | 17:07 |
johnsom | Ok, I'm pivoting to look at if I can work around the sqlalchemy bug | 17:07 |
rm_work | hmm | 17:09 |
rm_work | yeah sending to multiple queues might not be good | 17:09 |
rm_work | could lead to double-processing? | 17:09 |
johnsom | No, I don't think that is the issue, I think it is just starting the number of conductors based on the queue endpoints, instead of like a taskflow worker setting for example | 17:10 |
rm_work | wouldn't it need one conductor per endpoint? | 17:12 |
rm_work | i dunno, this is the part of this patch i didn't follow so well | 17:12 |
johnsom | https://docs.openstack.org/taskflow/latest/user/conductors.html | 17:13 |
*** maciejjozefczyk has quit IRC | 17:36 | |
johnsom | Yeah, if I remove the "# dbapi_connection.isolation_level = """ line from that sqlalchemy patch, the tests pass again | 17:44 |
johnsom | The concerning thing is the tests pass if I run just the DB functionals, so that is super odd. | 17:46 |
johnsom | Ok, so that patch changed the default isolation_level from None to "". | 17:47 |
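
Editor's sketch of what the two `isolation_level` values mean in Python's standard `sqlite3` module, which is the driver underneath the in-memory test databases. It uses `sqlite3` directly rather than SQLAlchemy, so it shows only the driver-level difference, not the test failure itself. The table name is a placeholder.

```python
import sqlite3


def in_transaction_after_insert(isolation_level):
    """Report whether an INSERT leaves an open transaction on the connection."""
    conn = sqlite3.connect(":memory:", isolation_level=isolation_level)
    try:
        conn.execute("CREATE TABLE lb (id INTEGER)")
        conn.execute("INSERT INTO lb VALUES (1)")
        return conn.in_transaction
    finally:
        conn.close()


if __name__ == "__main__":
    # None: autocommit, the driver never issues BEGIN on its own.
    print(in_transaction_after_insert(None))  # False
    # "": the driver opens a transaction implicitly before the INSERT.
    print(in_transaction_after_insert(""))    # True
```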
openstackgerrit | Brian Haley proposed openstack/octavia master: Don't inherit enforcing bash errexit in devstack plugin https://review.opendev.org/720041 | 17:47 |
*** gcheresh has joined #openstack-lbaas | 18:14 | |
rm_work | johnsom: yeah i was trying to figure out that last part -- why running just the DB functionals wouldn't replicate | 19:14 |
rm_work | johnsom: it means testing any change requires running the WHOLE suite | 19:15 |
rm_work | and debugging that test becomes very difficult | 19:15 |
johnsom | rm_work tox -e functional-py36 -- octavia.tests.functional.db.test_repositories.AllRepositoriesTest.test_create_load_balancer_tree\|octavia.tests.functional.api.v2.test_flavors | 19:15 |
rm_work | ok so just two will do it? | 19:15 |
johnsom | This is so strange.... | 19:17 |
rm_work | yes | 19:17 |
johnsom | I mean, I can instantly fix it by removing the isolation = "" in sqlalchemy | 19:17 |
rm_work | we use an inmemory sqlite db for the functionals, right? | 19:18 |
rm_work | not a fileDB? | 19:18 |
johnsom | No, there are both. Most are in-memory, a few require a file | 19:18 |
rm_work | hmm | 19:18 |
rm_work | so, for file DBs, sqlite *cannot* handle concurrency, apparently | 19:18 |
rm_work | just because of the way it works | 19:18 |
rm_work | AFAIU | 19:18 |
johnsom | right | 19:18 |
rm_work | so transactional isolation would obviously fail in that case | 19:19 |
rm_work | i'm trying to figure out why my canary test *suddenly passes* | 19:19 |
rm_work | it makes it seem like something was FIXED | 19:19 |
rm_work | and makes me wonder if there's a bug in that other test | 19:19 |
rm_work | like, it was written around the bug that got fixed | 19:19 |
johnsom | I'm ignoring that for now. | 19:20 |
johnsom | I don't understand your statement about transactions not working on a file backed sqlite, but... | 19:21 |
johnsom | I think the issue is around sqlalchemy not being thread safe, somewhere we are sharing a session or something. | 19:22 |
johnsom | It's very test order dependent | 19:22 |
rm_work | since we initialize two sessions | 19:23 |
rm_work | and do things in separate transactions | 19:23 |
rm_work | that's what my test was checking | 19:23 |
johnsom | The thing is, the tree test that is failing, is bombing on the part that is all part of one session/transaction | 19:24 |
rm_work | and we do the same thing in this tree test | 19:24 |
rm_work | hmm | 19:25 |
rm_work | erg well i have a meeting followed by sleep | 19:26 |
johnsom | Yeah, I need lunch | 19:26 |
*** tobberydberg_ has quit IRC | 20:17 | |
*** tobberydberg has joined #openstack-lbaas | 20:22 | |
*** maciejjozefczyk has joined #openstack-lbaas | 20:30 | |
*** tobberydberg has quit IRC | 20:30 | |
*** tobberydberg has joined #openstack-lbaas | 20:36 | |
*** tobberydberg has quit IRC | 20:37 | |
*** maciejjozefczyk has quit IRC | 20:39 | |
*** tobberydberg has joined #openstack-lbaas | 20:42 | |
*** tobberydberg has quit IRC | 20:43 | |
johnsom | Well, I got a sqlalchemy info level capture of the bug | 20:44 |
*** tobberydberg has joined #openstack-lbaas | 20:45 | |
*** tobberydberg has quit IRC | 20:45 | |
*** tobberydberg has joined #openstack-lbaas | 20:46 | |
*** tobberydberg has quit IRC | 20:46 | |
*** tobberydberg has joined #openstack-lbaas | 20:46 | |
*** tobberydberg has quit IRC | 20:47 | |
*** tobberydberg has joined #openstack-lbaas | 20:47 | |
*** tobberydberg has quit IRC | 20:47 | |
*** tobberydberg has joined #openstack-lbaas | 20:51 | |
*** gcheresh has quit IRC | 20:53 | |
*** tobberydberg has quit IRC | 20:55 | |
*** KeithMnemonic has joined #openstack-lbaas | 21:08 | |
*** KeithMnemonic has quit IRC | 21:16 | |
*** KeithMnemonic has joined #openstack-lbaas | 21:17 | |
johnsom | Yeah, ok, so before the new version it ran with sqlite autocommit and sqlalchemy non-autocommit. Now it is non-autocommit and non-autocommit. | 21:21 |
lxkong | johnsom, rm_work could you take a look at the updated patch for the https://storyboard.openstack.org/#!/story/2007531 please? Do you think we could just submit a gerrit patch considering the security class Jeremy suggested? | 21:51 |
johnsom | I saw you updated, but have not yet reviewed. Path forward would be to submit a patch, however sqlalchemy has broke our gates, so now might not be the best time. | 21:52 |
rm_work | yeah i think this is a case of "don't let perfect be the enemy of good" | 21:54 |
rm_work | that solution is better than nothing, even though it still has some flaws | 21:54 |
rm_work | and it's not an api-level change so we can always revert it once we can do the totally correct thing | 21:54 |
johnsom | Yeah, I agree, I'm just saying if it's posted now, it may sit for days | 21:54 |
rm_work | yep | 21:55 |
lxkong | johnsom, rm_work, thanks for the suggestion, then I wait for gate issue solved? | 21:56 |
rm_work | yeah | 21:57 |
johnsom | Yeah, I will post a comment when I have reviewed | 21:57 |
lxkong | cool, please ping me or leave a comment in the story after that's done, thank you so much | 21:57 |
lxkong | johnsom, ack | 21:57 |
johnsom | I think you addressed my only concern | 21:57 |
*** servagem has quit IRC | 22:13 | |
lxkong | johnsom, rm_work, is redis or zookeeper a hard requirement for the master deployment now? | 22:24 |
johnsom | lxkong No, not yet. | 22:24 |
johnsom | It is only needed if you use the amphorav2 driver at the moment. | 22:25 |
rm_work | johnsom: do i need to do anything besides `ifconfig lo up` to make local queries on an amp? | 22:31 |
johnsom | no, just make sure you are inside the netns | 22:31 |
rm_work | hmm | 22:31 |
rm_work | yeah weirdness | 22:32 |
johnsom | I might do ifup lo | 22:32 |
rm_work | i can hit a member from the netns | 22:32 |
rm_work | but i can't hit it via the local IP | 22:32 |
rm_work | (of the lb) | 22:32 |
rm_work | ipvsadm shows members up | 22:33 |
johnsom | Ah, UDP... that might be different | 22:33 |
rm_work | ah nm i can reach it, was using the vrrp ip i think | 22:33 |
rm_work | the HA IP doesn't show as up, but ipvsadm shows it in use | 22:33 |
rm_work | and it does work :D | 22:33 |
johnsom | This DB stuff is bonkers. | 22:35 |
rm_work | testing failover right now | 22:35 |
rm_work | seems like this LB stopped passing traffic | 22:35 |
johnsom | One call, I can see the LB with a select, a few calls later, same transaction, LB missing. Run the test again, LB doesn't disappear | 22:35 |
rm_work | trying to figure out what happened | 22:35 |
rm_work | pre-failover and post-failover HA port looks VERY different | 22:39 |
rm_work | wtf? | 22:39 |
rm_work | oh nm i think i see why | 22:40 |
lxkong | > It is only needed if you use the amphorav2 driver at the moment. | 22:45 |
lxkong | johnsom, but it's not possible to config run v1 or v2 consumer for octavia-worker. I updated Octavia for my devstack environment yesterday but failed with http://dpaste.com/36KYZ9B. Then I have to install redis. So I suppose either redis or zookeeper needs to be installed for octavia-worker. | 22:45 |
*** born2bake has quit IRC | 22:46 | |
*** tkajinam has joined #openstack-lbaas | 22:46 | |
johnsom | lxkong What is the output of "openstack loadbalancer provider list " | 22:50 |
lxkong | https://www.irccloud.com/pastebin/d2DGDbVY/ | 22:51 |
johnsom | rm_work ^^^ Yeah, I think there is a bigger problem with the jobboard patch than that missing extra. | 22:52 |
johnsom | Yeah, v2 in the controller worker is enabled by default: https://github.com/openstack/octavia/blob/master/octavia/cmd/octavia_worker.py#L38 | 22:54 |
johnsom | I wonder why the non-v2 gates are passing | 22:54 |
rm_work | I thought that was turned off | 22:54 |
rm_work | Also -- I just deployed that code and it seems to be working fine | 22:55 |
rm_work | Ah. Not using the devstack plugin though :D | 22:56 |
rm_work | Well isn't that ok? It'll listen on a second queue that just never has anything | 22:57 |
rm_work | Right? | 22:57 |
johnsom | It' | 22:58 |
johnsom | It is starting up the taskflow conductors which try to go out to redis | 22:58 |
lxkong | or it would be good to config which version consumer is running | 22:59 |
johnsom | One issue I think is the devstack plugin is assigning instead of == | 23:00 |
johnsom | But it still seems like these conductors are always going to be spun up, which... isn't what we intended. | 23:00 |
*** TrevorV has quit IRC | 23:06 | |
*** dayou has joined #openstack-lbaas | 23:46 |