Tuesday, 2020-04-14

*** openstack has joined #openstack-lbaas		09:56
*** ChanServ sets mode: +o openstack		09:56
*** rpittau is now known as rpittau|bbl		10:23
openstackgerrit	Adam Harwell proposed openstack/octavia master: Healthmanager opts aren't CLI-related https://review.opendev.org/719921	11:26
openstackgerrit	Adam Harwell proposed openstack/octavia master: Fix py3 amphora-agent cert-rotation type bug https://review.opendev.org/719922	11:29
*** sapd1 has quit IRC		11:30
rm_work	^^ that second one is pretty high priority IMO	11:33
rm_work	cgoncalves: you around? could use thoughts -- do we also not need to worry about py2 on amps anymore?	11:37
rm_work	if not, then i don't need any six / typechecking guard code around that, but if we do, then i do	11:37
cgoncalves	rm_work, Stein is still a supported release version and is tested against both py2 and py3	11:48
rm_work	ok but this is master	11:48
rm_work	i'm referring specifically to patch 719922	11:49
cgoncalves	rm_work, correct. no six in master. when backporting, please consider py2 too	11:49
rm_work	k	11:50
rm_work	got it, you want to backport it with that fix, kk	11:50
rm_work	i'll do that once it merges	11:50
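A minimal py3-only sketch of the type guard being discussed. It assumes the rotated certificate can arrive as either `str` or `bytes` and has to be written to disk in binary mode. `ensure_bytes` and `write_pem` are hypothetical names, not the actual amphora-agent code in 719922. A Stein backport would also need the py2 handling cgoncalves mentions.

```python
def ensure_bytes(data, encoding="utf-8"):
    """Return data as bytes (hypothetical helper; py3-only, so no six)."""
    if isinstance(data, bytes):
        return data
    if isinstance(data, str):
        return data.encode(encoding)
    raise TypeError("expected str or bytes, got %s" % type(data).__name__)


def write_pem(path, pem):
    """Write a PEM payload in binary mode, whatever type it arrived as."""
    with open(path, "wb") as f:
        f.write(ensure_bytes(pem))
```

Normalizing at the write boundary keeps the type question in one place instead of scattering `isinstance` checks through the rotation handler.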
rm_work	BTW this sucks really hard, my amps just started exploding one by one as their certs came up for rotation	11:51
rm_work	and i think it has been happening before and i didn't even notice because of all the other things that caused amps to explode <_<	11:53
rm_work	but we really need to fix this sqlalchemy issue because we can't merge anything	11:54
rm_work	also cgoncalves i still don't understand https://review.opendev.org/#/c/717619/	11:55
rm_work	i posted another comment -- the whole design of the original local-cert-manager driver was to enable tempest testing like you're trying to do	11:55
rm_work	it should work fine?	11:56
rm_work	I just don't understand the need for yet another noop driver (the local driver was essentially designed to be a noop, it's not usable for anything besides testing)	11:56
rm_work	if we're not going to use the local driver for this, then we may as well delete and replace it with the noop one?	11:57
rm_work	but it seems like a more robust option	11:57
*** servagem has joined #openstack-lbaas		11:58
cgoncalves	rm_work, I agree with replacing the local cert manager with the noop one.	11:58
rm_work	so you'd really rather have this (what seems to me to be) really limited noop option?	11:59
cgoncalves	rm_work, the problem with the local cert manager is it requires pre-configuration prior to running tempest	11:59
rm_work	no?	11:59
rm_work	tempest can drop files in the tests	11:59
rm_work	a test can: write out a certfile and then use it in octavia within the same test	12:00
cgoncalves	tempest should test against the cloud from the outside (black box) so having the pre-req of having cert files in the cloud nodes isn't ideal	12:00
cgoncalves	rm_work, from tempest how would you write out a cert file?	12:00
rm_work	ah yeah i guess i am thinking mostly of gates where it's all on the same couple nodes... it could be complex with a multinode deployed cloud	12:01
cgoncalves	I mean, yes, it is possible but it shouldn't require having internal perms to the cloud	12:01
rm_work	in gates it's easy... you use open()	12:01
cgoncalves	right	12:01
rm_work	but... in a deployed cloud... you'd need to have the noop driver enabled?	12:01
rm_work	which ... what is even the point of doing the test if the noop driver is what your cloud uses	12:02
rm_work	in a real cloud if you're running tempest you should be testing with real barbican certs, or else if TLS-Termination is disabled, you should skip those tests	12:02
rm_work	this kind of driver really is only for gates	12:04
cgoncalves	the noop cert manager still requires less (actually, no) pre-configuration compared with the local cert manager	12:07
cgoncalves	there are two sides of tempest tests: API tests and scenario tests. API tests test against an implementation of the API specification and only that	12:08
rm_work	yeah ok i guess it's fair to say it's simpler	12:08
rm_work	but i still don't buy the "pre-configuration" argument	12:08
rm_work	both can be set up live during the test, just for one of them the setup is calling open() and the other it's ... nothing :D	12:09
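A sketch of the "just call open()" setup rm_work describes, for a gate where the test runner shares a filesystem with the octavia-api node. It assumes the local driver reads `<ref>.crt` and `<ref>.key` files from a storage directory. Check the `LocalCertManager` code and its `[certificates]` options before relying on that naming. `stage_local_cert` is a hypothetical helper. As cgoncalves points out, this cannot work when tempest runs from outside the controller nodes.

```python
import os
import uuid


def stage_local_cert(storage_path, certificate, private_key):
    """Write PEM files for a file-backed cert driver and return the ref.

    Hypothetical helper; the '<ref>.crt' / '<ref>.key' naming is an assumption.
    """
    cert_ref = str(uuid.uuid4())
    base = os.path.join(storage_path, cert_ref)
    with open(base + ".crt", "w") as f:
        f.write(certificate)
    # Create the key file owner-only (subject to the umask).
    fd = os.open(base + ".key", os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(private_key)
    return cert_ref
```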
*** vishalmanchanda has quit IRC		12:11
cgoncalves	rm_work, you wouldn't be able to open() if you were to run tempest from outside the octavia controller nodes. with the noop you can	12:11
rm_work	yeah alright	12:12
cgoncalves	you'd also need to copy the cert files to all your nodes running the octavia API service	12:12
rm_work	i doubt anyone even uses that anyway	12:12
rm_work	again, these have zero use outside of gates	12:12
rm_work	and gates have at most two nodes	12:12
rm_work	so...	12:12
* rm_work shrugs		12:12
*** tkajinam has quit IRC		12:12
rm_work	go ahead and replace it if you want, i'm ambivalent	12:14
cgoncalves	I'd maybe delete the local one in a follow-up patch. even though it is highly discouraged to use it outside testing envs, it probably requires a deprecation	12:21
*** rpittau|bbl is now known as rpittau		12:31
*** sapd1 has joined #openstack-lbaas		12:33
*** vishalmanchanda has joined #openstack-lbaas		13:15
*** TrevorV has joined #openstack-lbaas		13:22
*** tkajinam has joined #openstack-lbaas		13:33
*** ccamposr__ has joined #openstack-lbaas		13:41
*** ccamposr has quit IRC		13:43
openstackgerrit	Adam Harwell proposed openstack/octavia master: Fix py3 amphora-agent cert-rotation type bug https://review.opendev.org/719922	13:48
*** dougwig has quit IRC		13:50
*** andrein has quit IRC		13:50
*** dougwig has joined #openstack-lbaas		13:50
*** rpittau has quit IRC		13:50
*** andrein has joined #openstack-lbaas		13:51
*** rpittau has joined #openstack-lbaas		13:51
*** maciejjozefczyk_ is now known as maciejjozefczyk		13:55
*** tkajinam has quit IRC		14:17
*** tkajinam has joined #openstack-lbaas		14:18
*** sapd1 has quit IRC		14:24
*** dosaboy_ is now known as dosaboy		15:04
*** JasonF is now known as JayF		15:25
johnsom	rm_work Any luck catching zeek?	15:32
johnsom	Hmm, looks like we have a problem in the devstack plugin too, the act/stdby job is failing looking for redis	15:39
rm_work	:/	15:49
rm_work	no he hasn't responded	15:49
johnsom	So this seems to be a problem: https://review.opendev.org/#/c/647406/106/octavia/controller/queue/v2/consumer.py	15:55
johnsom	It is loading taskflow redis stuff no matter what.	15:55
johnsom	taskflow isn't declaring redis as a requirement.	15:55
*** gcheresh has joined #openstack-lbaas		15:56
johnsom	Yeah, redis is in the setuptools "extras"	15:56
rm_work	hmm	15:57
haleyb	johnsom: maciej thought he was seeing issues with the octavia devstack plugin too wrt post-config, i haven't reproduced, something regarding generating certs, do you typically set OCTAVIA_USE_PREGENERATED_CERTS=True ?	15:59
johnsom	No I don't	16:00
johnsom	I haven't heard of any issues generating certs	16:00
johnsom	I think we only set that for the multinode jobs, but I might be wrong.	16:01
*** rpittau is now known as rpittau|afk		16:02
haleyb	yeah, it was only in the multinode examples, i'll see if there's a diff in our conf files	16:03
johnsom	Yeah, it's only there when there are multiple controllers so each doesn't build their own set of certs.	16:04
haleyb	johnsom: so one thing i have noticed is that if i run this line in plugin.sh in a testenv: "source create_dual_intermediate_CA.sh" the shell i'm in will die shortly afterwards just running a command, so there's something funky in that script. Running as ./create_dual_intermediate_CA.sh is fine	16:11
johnsom	It's a pretty straightforward script as I remember	16:13
haleyb	must be the semantics of source vs ./	16:13
gthiemonge	haleyb: johnsom: sourcing a script that uses set -e could be an issue, the shell will be closed on error	16:15
johnsom	You could turn on set -x for the whole script if you want to see what it is doing during a run	16:15
haleyb	gthiemonge: i was just googling that	16:15
johnsom	Yeah, it is setup to fail on error as we would want the gates to fail early if something went wrong	16:16
haleyb	locally it runs fine, although it does complain about a file not existing	16:16
johnsom	Yeah, openssl does that, it just creates the file automatically	16:16
johnsom	That will not trigger an exit	16:17
haleyb	johnsom: it doesn't, but looking in the logs shortly after the plugin.sh code seems to stop and the next service is configured	16:18
haleyb	so that set -e maybe is an issue	16:19
johnsom	That needs to stay, it is important	16:19
*** tkajinam has quit IRC		16:20
haleyb	putting a set +e at the end helps	16:20
johnsom	So some other plugin is failing is what you are saying?	16:23
haleyb	johnsom: i can't tell, but i don't think the octavia one truly finished	16:24
johnsom	rm_work For the redis issue.... Should we add the redis extra on taskflow in our requirements, just declare it as a requirement, or stop these conductors from starting if the driver isn't ampv2?	16:26
johnsom	What are your thoughts?	16:26
rm_work	hmm the latter sounds like it might be the most efficient...	16:27
rm_work	the other two would impact other deployments	16:27
*** gcheresh has quit IRC		16:28
rm_work	is there a taskflow[redis]?	16:28
rm_work	actually that might not be horrible... guessing those libs are rather small and that'd guarantee it works right for anyone wanting to switch over (which hopefully most will)	16:29
johnsom	There is a taskflow[redis]	16:30
*** psachin has quit IRC		16:34
haleyb	johnsom: i'll send a patch for the set +e after lunch, definitely seems like a problem	16:38
johnsom	I don't think we should change anything.	16:38
haleyb	http://paste.openstack.org/show/792112/	16:39
johnsom	I would want to see a detailed story behind the change	16:39
haleyb	johnsom: it would just do a 'set +e' at the end of the script	16:39
rm_work	yeah sourcing a `set -e` script can have unintended effects	16:39
rm_work	IMO we shouldn't even source it	16:40
rm_work	i don't know why it is done that way	16:40
haleyb	some command after is returning non-zero which is causing a shell exit, POOF goes the plugin.sh that was running	16:40
rm_work	should just run it...	16:40
johnsom	It causes devstack to stop on failures and not just continue pretending things are fine. It's on everywhere as far as I know	16:40
rm_work	right, which normally might be fine in the following scripts	16:40
rm_work	err	16:40
rm_work	not EVERYWHERE	16:40
rm_work	it's up to individual scripts	16:40
rm_work	we'd be overriding that	16:41
haleyb	johnsom: it's fine if the script sets it, but it should un-set it when done, the source is causing the parent to inherit it	16:41
johnsom	If I remember it's sourced so it has access to the devstack variables, but we may not need that anymore, would have to look at the script again.	16:41
johnsom	haleyb Correct, which should (did) also have it set	16:41
rm_work	err, running with `./` should inherit vars from parent?	16:43
rm_work	or does it need to SET devstack vars?	16:44
haleyb	johnsom: plugin.sh doesn't do it, something else recently is just tickling something, but i was only seeing a 1/10 success rate getting things to work here on Friday	16:45
johnsom	I don't think we need anything passed in or exported now. When I re-wrote that a year ago I think I removed the need for any of that.	16:45
haleyb	rm_work: it seems to run the same just as ./create...	16:46
rm_work	yeah i imagine it would	16:46
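A small reproduction of the `source` vs `./` difference discussed above. It drives bash from Python, assumes `bash` is on the PATH, and uses `false` as a stand-in for any later plugin command that exits non-zero. A sourced script that runs `set -e` leaves errexit enabled in the calling shell, so that later failure kills the caller. Running the script in a child process, or ending it with `set +e` as haleyb proposes, keeps the caller alive. `run_parent` is a hypothetical helper, not part of devstack.

```python
import os
import shlex
import subprocess
import tempfile

ERREXIT_SCRIPT = "set -e\necho 'script ran'\n"


def run_parent(script_body, invoke):
    """Run a parent bash that invokes a script, then hits a failing command.

    invoke is "source" (same shell) or "bash" (child process).
    """
    fd, path = tempfile.mkstemp(suffix=".sh")
    with os.fdopen(fd, "w") as f:
        f.write(script_body)
    try:
        parent = f"{invoke} {shlex.quote(path)}\nfalse\necho 'parent continued'\n"
        result = subprocess.run(["bash", "-c", parent],
                                capture_output=True, text=True)
        return result.stdout
    finally:
        os.unlink(path)
```

Only the sourced variant without a trailing `set +e` loses the parent's `parent continued` line.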
rm_work	meanwhile, all of our gates are blocked because of this sqlalchemy issue	16:46
johnsom	Yeah, and the redis thing	16:46
rm_work	k	16:47
* haleyb goes to lunch, will put up a review later		16:47
rm_work	i think i vote we add [redis]	16:47
rm_work	if that's all it needs	16:47
johnsom	Ok, I wanted a second opinion as those extras are... a pain	16:47
rm_work	... are they? i didn't think so	16:48
johnsom	Technically there is a zookeeper option too	16:48
rm_work	let me look closer	16:48
johnsom	https://github.com/openstack/taskflow/blob/master/setup.cfg	16:48
johnsom	My issue is the extras aren't vetted by G-R and people tend to bundle too much in them	16:48
rm_work	hmmmmmm	16:49
rm_work	the redis one is JUST redis	16:49
rm_work	i guess it might be best to magically detect	16:49
rm_work	i'm just thinking about folks who try to turn that on	16:49
johnsom	yeah, not saying that is an issue here (though the DB one looks...)	16:49
rm_work	and don't realize they need to hack at the reqs	16:49
rm_work	just doing a normal install of our package won't do it at that point	16:50
johnsom	https://github.com/openstack/octavia/blob/master/octavia/common/config.py#L472	16:50
rm_work	right	16:51
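One way to "magically detect" the backend instead of (or in addition to) adding the extra: check that the backend library is importable before starting anything, and fail with an actionable message. The driver names and the driver-to-module mapping below are assumptions based on taskflow's `redis` and `zookeeper` extras (the latter pulls in `kazoo`). Verify them against Octavia's `config.py` and taskflow's `setup.cfg`. `check_jobboard_backend` is hypothetical.

```python
import importlib.util

# Assumed mapping: jobboard driver name -> module its taskflow extra installs.
BACKEND_MODULES = {
    "redis_taskflow_driver": "redis",
    "zookeeper_taskflow_driver": "kazoo",
}


def check_jobboard_backend(driver):
    """Raise early if the library backing the chosen jobboard is missing."""
    module = BACKEND_MODULES.get(driver)
    if module is None:
        raise ValueError(f"unknown jobboard backend driver: {driver!r}")
    if importlib.util.find_spec(module) is None:
        raise RuntimeError(
            f"{driver} requires the {module!r} package; install it "
            f"(for example via the matching taskflow extra)")
```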
openstackgerrit	Michael Johnson proposed openstack/octavia master: Add the "redis" extra for taskflow requirement https://review.opendev.org/720033	17:03
rm_work	gonna have to combine that with a sqlalchemy fix	17:03
rm_work	oh unless the redis thing is only for a nonvoting gate?	17:03
johnsom	yeah, it will, just getting it up for comment	17:04
*** vishalmanchanda has quit IRC		17:04
johnsom	rm_work I'm not sure about this "for each endpoint" either: https://review.opendev.org/#/c/647406/106/octavia/controller/queue/v2/consumer.py	17:07
johnsom	That might be a bug as well.	17:07
johnsom	Ok, I'm pivoting to look at if I can work around the sqlalchemy bug	17:07
rm_work	hmm	17:09
rm_work	yeah sending to multiple queues might not be good	17:09
rm_work	could lead to double-processing?	17:09
johnsom	No, I don't think that is the issue, I think it is just starting the number of conductors based on the queue endpoints, instead of like a taskflow worker setting for example	17:10
rm_work	wouldn't it need one conductor per endpoint?	17:12
rm_work	i dunno, this is the part of this patch i didn't follow so well	17:12
johnsom	https://docs.openstack.org/taskflow/latest/user/conductors.html	17:13
*** maciejjozefczyk has quit IRC		17:36
johnsom	Yeah, if I remove the "# dbapi_connection.isolation_level = """ line from that sqlalchemy patch, the tests pass again	17:44
johnsom	The concerning thing is the tests pass if I run just the DB functionals, so that is super odd.	17:46
johnsom	Ok, so that patch changed the default isolation_level from None to "".	17:47
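What `isolation_level` changes at the driver level, shown with the stdlib `sqlite3` module rather than SQLAlchemy, so it illustrates the setting but does not reproduce the test failure. With `None` the driver runs in SQLite autocommit mode. With `""` it opens an implicit transaction before DML, so the row stays invisible to a second connection until commit. `insert_visible_before_commit` is a hypothetical helper.

```python
import os
import sqlite3
import tempfile


def insert_visible_before_commit(isolation_level):
    """Insert on one connection; report whether a second one sees it pre-commit."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        setup = sqlite3.connect(path)
        setup.execute("CREATE TABLE lb (id TEXT)")
        setup.commit()
        setup.close()

        writer = sqlite3.connect(path, isolation_level=isolation_level)
        reader = sqlite3.connect(path)
        writer.execute("INSERT INTO lb VALUES ('lb-1')")
        # No writer.commit(): with None the INSERT is already autocommitted;
        # with "" it sits inside an implicit BEGIN.
        count = reader.execute("SELECT COUNT(*) FROM lb").fetchone()[0]
        reader.close()
        writer.close()  # closing without commit discards an open transaction
        return count == 1
    finally:
        os.unlink(path)
```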
openstackgerrit	Brian Haley proposed openstack/octavia master: Don't inherit enforcing bash errexit in devstack plugin https://review.opendev.org/720041	17:47
*** gcheresh has joined #openstack-lbaas		18:14
rm_work	johnsom: yeah i was trying to figure out that last part -- why running just the DB functionals wouldn't replicate	19:14
rm_work	johnsom: it means testing any change requires running the WHOLE suite	19:15
rm_work	and debugging that test becomes very difficult	19:15
johnsom	rm_work tox -e functional-py36 -- octavia.tests.functional.db.test_repositories.AllRepositoriesTest.test_create_load_balancer_tree\\|octavia.tests.functional.api.v2.test_flavors	19:15
rm_work	ok so just two will do it?	19:15
johnsom	This is so strange....	19:17
rm_work	yes	19:17
johnsom	I mean, I can instantly fix it by removing the isolation = "" in sqlalchemy	19:17
rm_work	we use an inmemory sqlite db for the functionals, right?	19:18
rm_work	not a fileDB?	19:18
johnsom	No, there are both. Most are in-memory, a few require a file	19:18
rm_work	hmm	19:18
rm_work	so, for file DBs, sqlite cannot handle concurrency, apparently	19:18
rm_work	just because of the way it works	19:18
rm_work	AFAIU	19:18
johnsom	right	19:18
rm_work	so transactional isolation would obviously fail in that case	19:19
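A hedged illustration of the file-backed SQLite limitation: SQLite permits many readers but only one writer at a time, so a second connection that tries to start a write transaction gets "database is locked". It uses the stdlib `sqlite3` module. Whether this explains the failing Octavia test is exactly what johnsom disputes next. `second_writer_blocked` is a hypothetical helper.

```python
import os
import sqlite3
import tempfile


def second_writer_blocked():
    """Return True if SQLite refuses a second writer while the first holds the lock."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    # isolation_level=None lets us issue BEGIN ourselves; timeout=0 makes a
    # lock conflict raise immediately instead of retrying.
    first = sqlite3.connect(path, timeout=0, isolation_level=None)
    second = sqlite3.connect(path, timeout=0, isolation_level=None)
    try:
        first.execute("CREATE TABLE lb (id TEXT)")
        first.execute("BEGIN IMMEDIATE")  # take the write (RESERVED) lock
        try:
            second.execute("BEGIN IMMEDIATE")
        except sqlite3.OperationalError as exc:
            return "locked" in str(exc)
        return False
    finally:
        first.close()
        second.close()
        os.unlink(path)
```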
rm_work	i'm trying to figure out why my canary test suddenly passes	19:19
rm_work	it makes it seem like something was FIXED	19:19
rm_work	and makes me wonder if there's a bug in that other test	19:19
rm_work	like, it was written around the bug that got fixed	19:19
johnsom	I'm ignoring that for now.	19:20
johnsom	I don't understand your statement about transactions not working on a file-backed sqlite, but...	19:21
johnsom	I think the issue is around sqlalchemy not being thread safe, somewhere we are sharing a session or something.	19:22
johnsom	It's very test order dependent	19:22
rm_work	since we initialize two sessions	19:23
rm_work	and do things in separate transactions	19:23
rm_work	that's what my test was checking	19:23
johnsom	The thing is, the tree test that is failing, is bombing on the part that is all part of one session/transaction	19:24
rm_work	and we do the same thing in this tree test	19:24
rm_work	hmm	19:25
rm_work	erg well i have a meeting followed by sleep	19:26
johnsom	Yeah, I need lunch	19:26
*** tobberydberg_ has quit IRC		20:17
*** tobberydberg has joined #openstack-lbaas		20:22
*** maciejjozefczyk has joined #openstack-lbaas		20:30
*** tobberydberg has quit IRC		20:30
*** tobberydberg has joined #openstack-lbaas		20:36
*** tobberydberg has quit IRC		20:37
*** maciejjozefczyk has quit IRC		20:39
*** tobberydberg has joined #openstack-lbaas		20:42
*** tobberydberg has quit IRC		20:43
johnsom	Well, I got a sqlalchemy info level capture of the bug	20:44
*** tobberydberg has joined #openstack-lbaas		20:45
*** tobberydberg has quit IRC		20:45
*** tobberydberg has joined #openstack-lbaas		20:46
*** tobberydberg has quit IRC		20:46
*** tobberydberg has joined #openstack-lbaas		20:46
*** tobberydberg has quit IRC		20:47
*** tobberydberg has joined #openstack-lbaas		20:47
*** tobberydberg has quit IRC		20:47
*** tobberydberg has joined #openstack-lbaas		20:51
*** gcheresh has quit IRC		20:53
*** tobberydberg has quit IRC		20:55
*** KeithMnemonic has joined #openstack-lbaas		21:08
*** KeithMnemonic has quit IRC		21:16
*** KeithMnemonic has joined #openstack-lbaas		21:17
johnsom	Yeah, ok, so before the new version it ran with sqlite autocommit and sqlalchemy non-autocommit. Now it is non-autocommit and non-autocommit.	21:21
lxkong	johnsom, rm_work could you take a look at the updated patch for https://storyboard.openstack.org/#!/story/2007531 please? Do you think we could just submit a gerrit patch considering the security class Jeremy suggested?	21:51
johnsom	I saw you updated, but have not yet reviewed. Path forward would be to submit a patch, however sqlalchemy has broken our gates, so now might not be the best time.	21:52
rm_work	yeah i think this is a case of "don't let perfect be the enemy of good"	21:54
rm_work	that solution is better than nothing, even though it still has some flaws	21:54
rm_work	and it's not an api-level change so we can always revert it once we can do the totally correct thing	21:54
johnsom	Yeah, I agree, I'm just saying if it's posted now, it may sit for days	21:54
rm_work	yep	21:55
lxkong	johnsom, rm_work, thanks for the suggestion, then I wait for gate issue solved?	21:56
rm_work	yeah	21:57
johnsom	Yeah, I will post a comment when I have reviewed	21:57
lxkong	cool, please ping me or leave a comment in the story after that's done, thank you so much	21:57
lxkong	johnsom, ack	21:57
johnsom	I think you addressed my only concern	21:57
*** servagem has quit IRC		22:13
lxkong	johnsom, rm_work, is redis or zookeeper a hard requirement for the master deployment now?	22:24
johnsom	lxkong No, not yet.	22:24
johnsom	It is only needed if you use the amphorav2 driver at the moment.	22:25
rm_work	johnsom: do i need to do anything besides `ifconfig lo up` to make local queries on an amp?	22:31
johnsom	no, just make sure you are inside the netns	22:31
rm_work	hmm	22:31
rm_work	yeah weirdness	22:32
johnsom	I might do ifup lo	22:32
rm_work	i can hit a member from the netns	22:32
rm_work	but i can't hit it via the local IP	22:32
rm_work	(of the lb)	22:32
rm_work	ipvsadm shows members up	22:33
johnsom	Ah, UDP... that might be different	22:33
rm_work	ah nm i can reach it, was using the vrrp ip i think	22:33
rm_work	the HA IP doesn't show as up, but ipvsadm shows it in use	22:33
rm_work	and it does work :D	22:33
johnsom	This DB stuff is bonkers.	22:35
rm_work	testing failover right now	22:35
rm_work	seems like this LB stopped passing traffic	22:35
johnsom	One call, I can see the LB with a select, a few calls later, same transaction, LB missing. Run the test again, LB doesn't disappear	22:35
rm_work	trying to figure out what happened	22:35
rm_work	pre-failover and post-failover HA port looks VERY different	22:39
rm_work	wtf?	22:39
rm_work	oh nm i think i see why	22:40
lxkong	> It is only needed if you use the amphorav2 driver at the moment.	22:45
lxkong	johnsom, but it's not possible to configure whether octavia-worker runs the v1 or the v2 consumer. I updated Octavia in my devstack environment yesterday but it failed with http://dpaste.com/36KYZ9B, so I had to install redis. So I suppose either redis or zookeeper needs to be installed for octavia-worker.	22:45
*** born2bake has quit IRC		22:46
*** tkajinam has joined #openstack-lbaas		22:46
johnsom	lxkong What is the output of "openstack loadbalancer provider list "	22:50
lxkong	https://www.irccloud.com/pastebin/d2DGDbVY/	22:51
johnsom	rm_work ^^^ Yeah, I think there is a bigger problem with the jobboard patch than that missing extra.	22:52
johnsom	Yeah, v2 in the controller worker is enabled by default: https://github.com/openstack/octavia/blob/master/octavia/cmd/octavia_worker.py#L38	22:54
johnsom	I wonder why the non-v2 gates are passing	22:54
rm_work	I thought that was turned off	22:54
rm_work	Also -- I just deployed that code and it seems to be working fine	22:55
rm_work	Ah. Not using the devstack plugin though :D	22:56
rm_work	Well isn't that ok? It'll listen on a second queue that just never has anything	22:57
rm_work	Right?	22:57
johnsom	It'	22:58
johnsom	It is starting up the taskflow conductors which try to go out to redis	22:58
lxkong	or it would be good to config which version consumer is running	22:59
johnsom	One issue I think is the devstack plugin is assigning instead of ==	23:00
johnsom	But it still seems like these conductors are always going to be spun up, which... isn't what we intended.	23:00
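A sketch of the option raised earlier: do not spin up the jobboard conductors unless the amphorav2 provider is actually enabled. It assumes the worker can see the enabled providers as a dict shaped like the `[api_settings] enabled_provider_drivers` option, and that `amphora`/`octavia` route to the v1 worker. Both are assumptions, and `consumers_to_start` is a hypothetical name, not the code in octavia_worker.py.

```python
# Assumed provider names for each worker flavor.
V1_PROVIDERS = {"amphora", "octavia"}
V2_PROVIDERS = {"amphorav2"}


def consumers_to_start(enabled_provider_drivers):
    """Return which queue consumers the worker should run.

    enabled_provider_drivers: {provider_name: description}, or None.
    Only "v2" starts taskflow conductors and so needs redis/zookeeper.
    """
    enabled = set(enabled_provider_drivers or {})
    consumers = []
    if enabled & V1_PROVIDERS:
        consumers.append("v1")
    if enabled & V2_PROVIDERS:
        consumers.append("v2")
    return consumers
```

With this shape, a deployment that never enables amphorav2 never touches the jobboard backend, which matches the intent johnsom describes.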
*** TrevorV has quit IRC		23:06
*** dayou has joined #openstack-lbaas		23:46