Wednesday, 2017-09-27

*** tongl has joined #openstack-lbaas		00:49
*** mugsie has quit IRC		00:51
*** fnaval has quit IRC		00:56
*** yamamoto has quit IRC		01:03
*** yamamoto_ has joined #openstack-lbaas		01:03
*** leyal has quit IRC		01:31
*** leyal has joined #openstack-lbaas		01:33
*** bzhao has joined #openstack-lbaas		01:39
*** bbzhao has joined #openstack-lbaas		01:48
*** leitan has joined #openstack-lbaas		02:06
openstackgerrit	Lingxian Kong proposed openstack/octavia-tempest-plugin master: [WIP] Create scenario tests for listeners https://review.openstack.org/492311	02:14
*** SumitNaiksatam has joined #openstack-lbaas		02:18
*** aojea has joined #openstack-lbaas		03:17
*** aojea has quit IRC		03:22
openstackgerrit	Lingxian Kong proposed openstack/octavia-tempest-plugin master: [WIP] Create scenario tests for listeners https://review.openstack.org/492311	03:28
*** Yipei has joined #openstack-lbaas		03:46
*** ianychoi_ has joined #openstack-lbaas		04:20
*** ianychoi has quit IRC		04:29
*** m-greene_ has quit IRC		04:32
*** m-greene_ has joined #openstack-lbaas		04:35
*** sanfern has joined #openstack-lbaas		04:38
*** belharar has joined #openstack-lbaas		04:40
*** armax has joined #openstack-lbaas		04:42
*** Alex_Staf has joined #openstack-lbaas		04:54
*** leitan has quit IRC		05:11
openstackgerrit	Rajat Sharma proposed openstack/octavia master: Replace 'manager' with 'os_primary' and 'os_adm' with 'os_admin' https://review.openstack.org/478399	05:22
*** aojea has joined #openstack-lbaas		05:42
*** gcheresh_ has joined #openstack-lbaas		05:44
*** ltomasbo has quit IRC		05:49
openstackgerrit	Pradeep Kumar Singh proposed openstack/octavia master: Add flavor, flavor_profile table and their APIs https://review.openstack.org/486499	06:12
*** armax has quit IRC		06:19
*** Alex_Staf has quit IRC		06:20
*** ltomasbo has joined #openstack-lbaas		06:21
openstackgerrit	Bar RH proposed openstack/octavia master: [WIP] Assign bind_host ip address per amphora https://review.openstack.org/505158	06:27
*** eezhova has joined #openstack-lbaas		06:36
*** rcernin has joined #openstack-lbaas		06:39
*** armax has joined #openstack-lbaas		06:41
*** yamamoto_ has quit IRC		06:41
*** yamamoto has joined #openstack-lbaas		06:45
*** armax has quit IRC		06:50
*** yamamoto_ has joined #openstack-lbaas		07:04
*** yamamoto has quit IRC		07:04
*** yamamoto_ has quit IRC		07:10
*** tongl has quit IRC		07:12
*** yamamoto has joined #openstack-lbaas		07:18
*** eezhova has quit IRC		07:19
*** tesseract has joined #openstack-lbaas		07:21
*** Alex_Staf has joined #openstack-lbaas		07:41
*** eezhova has joined #openstack-lbaas		07:42
*** eezhova_ has joined #openstack-lbaas		07:45
*** eezhova has quit IRC		07:47
*** Yipei has left #openstack-lbaas		08:40
*** chlong has quit IRC		08:49
openstackgerrit	Pradeep Kumar Singh proposed openstack/octavia master: Add flavor, flavor_profile table and their APIs https://review.openstack.org/486499	08:51
openstackgerrit	Bar RH proposed openstack/octavia master: [WIP] Assign bind_host ip address per amphora https://review.openstack.org/505158	09:00
*** yamamoto has quit IRC		09:19
*** salmankhan has joined #openstack-lbaas		09:23
*** numans has quit IRC		09:25
*** numans has joined #openstack-lbaas		09:28
openstackgerrit	Pradeep Kumar Singh proposed openstack/octavia master: Add flavor, flavor_profile table and their APIs https://review.openstack.org/486499	09:30
openstackgerrit	Bar RH proposed openstack/octavia master: [WIP] Assign bind_host ip address per amphora https://review.openstack.org/505158	09:55
*** yamamoto has joined #openstack-lbaas		10:20
*** eezhova__ has joined #openstack-lbaas		10:28
*** eezhova__ has quit IRC		10:28
*** salmankhan has quit IRC		10:29
*** eezhova_ has quit IRC		10:31
*** atoth has quit IRC		10:35
openstackgerrit	Lingxian Kong proposed openstack/octavia-tempest-plugin master: Create scenario tests for listeners https://review.openstack.org/492311	10:38
*** salmankhan has joined #openstack-lbaas		10:39
*** apuimedo_ has joined #openstack-lbaas		10:43
*** apuimedo has quit IRC		10:45
*** apuimedo_ is now known as apuimedo		10:45
*** sanfern has quit IRC		10:54
*** eezhova has joined #openstack-lbaas		11:20
*** strigazi has quit IRC		11:23
*** strigazi has joined #openstack-lbaas		11:24
*** pcaruana has joined #openstack-lbaas		11:27
*** atoth has joined #openstack-lbaas		11:29
*** sanfern has joined #openstack-lbaas		12:32
nmagnezi	o/	12:44
openstackgerrit	OpenStack Proposal Bot proposed openstack/neutron-lbaas master: Updated from global requirements https://review.openstack.org/506638	12:48
openstackgerrit	OpenStack Proposal Bot proposed openstack/neutron-lbaas-dashboard master: Updated from global requirements https://review.openstack.org/504660	12:48
*** leitan has joined #openstack-lbaas		12:56
*** belharar has quit IRC		12:57
*** belharar has joined #openstack-lbaas		12:58
*** chlong has joined #openstack-lbaas		13:29
*** chlong has quit IRC		13:31
*** belharar has quit IRC		13:34
*** Alex_Staf has quit IRC		13:36
*** rtjure has quit IRC		13:58
*** sanfern has quit IRC		13:59
*** sanfern has joined #openstack-lbaas		14:00
*** rtjure has joined #openstack-lbaas		14:03
*** belharar has joined #openstack-lbaas		14:08
*** yamamoto has quit IRC		14:10
*** ipsecguy_ has joined #openstack-lbaas		14:13
*** ipsecguy has quit IRC		14:14
*** yamamoto has joined #openstack-lbaas		14:15
*** yamamoto has quit IRC		14:20
*** tongl has joined #openstack-lbaas		14:27
johnsom	o/	14:28
*** tongl has quit IRC		14:30
openstackgerrit	Merged openstack/neutron-lbaas-dashboard master: Updated from global requirements https://review.openstack.org/504660	14:34
*** dayou has quit IRC		15:01
*** longkb_ has joined #openstack-lbaas		15:01
*** bbzhao has quit IRC		15:03
*** bbzhao has joined #openstack-lbaas		15:03
*** yamamoto has joined #openstack-lbaas		15:10
*** gcheresh_ has quit IRC		15:11
*** chlong has joined #openstack-lbaas		15:17
*** eezhova has quit IRC		15:18
xgerman_	o/ - not sure if I am able to make the meeting but feel free to summon me if needed ;-)	15:33
johnsom	Ok	15:35
*** tongl has joined #openstack-lbaas		15:37
*** rcernin has quit IRC		15:43
openstackgerrit	Bar RH proposed openstack/octavia master: [WIP] Assign bind_host ip address per amphora https://review.openstack.org/505158	15:49
johnsom	Hmmm, I think I just reproduced the "404" issue locally. I have a nova vm, but the interface isn't in the VM	15:54
nmagnezi	johnsom, in case I won't make it to the meeting, I voted in the poll (the best available option for me is to revert back to the old timing)	15:58
nmagnezi	johnsom, oh, and hi :)	15:58
johnsom	Ok, great, I was going to ping you to vote	15:59
nmagnezi	johnsom, btw as for my plugin.sh (and more) patch, for some reason when I stack with this patch it fails to spawn vms with: {u'message': u"Host 'rdocloud-devstack2' is not mapped to any cell", u'code': 400, u'created': u'2017-09-27T11:31:42Z'}	16:01
nmagnezi	so not sure if that's related, going to use a clean setup to restack	16:01
johnsom	Ok, yeah, that sounds like a nova setup issue, probably unrelated	16:02
nmagnezi	johnsom, btw in the story I asked you a question about that DVR comment you asked for	16:02
nmagnezi	yup. i think the same.	16:02
johnsom	Do you have a link to the story?	16:02
*** atoth has quit IRC		16:03
nmagnezi	yes, sec	16:03
johnsom	Thanks, trying to dig into why I have an instance with a missing network interface. Nova/neutron say it's there, but linux in the vm doesn't see it.	16:04
nmagnezi	johnsom, https://storyboard.openstack.org/#!/story/2001183#comment-17461	16:04
*** gcheresh_ has joined #openstack-lbaas		16:07
johnsom	nmagnezi Thanks, commented	16:08
nmagnezi	johnsom, thanks! was that issue resolved later? (I want to specify this in the comment)	16:09
johnsom	It was fixed in the Pike release	16:10
johnsom	Prior to Pike it has always been broken when using DVR	16:10
nmagnezi	ack. thanks!	16:13
nmagnezi	johnsom, btw fixed even the nova commands, so I want those bonus points.	16:13
johnsom	Cool!	16:14
johnsom	500+	16:14
*** gcheresh_ has quit IRC		16:22
*** tesseract has quit IRC		16:24
*** sshank has joined #openstack-lbaas		16:39
*** SumitNaiksatam has quit IRC		16:40
*** JudeC has joined #openstack-lbaas		16:53
*** yamamoto has quit IRC		16:55
*** dayou has joined #openstack-lbaas		16:57
*** longstaff has joined #openstack-lbaas		16:58
*** gans has joined #openstack-lbaas		16:59
*** rm_mobile has joined #openstack-lbaas		16:59
*** yamamoto has joined #openstack-lbaas		16:59
*** eezhova has joined #openstack-lbaas		17:01
*** longstaff has quit IRC		17:07
*** longstaff has joined #openstack-lbaas		17:10
*** pcaruana has quit IRC		17:14
*** yamamoto has quit IRC		17:31
*** yamamoto has joined #openstack-lbaas		17:31
*** gans has quit IRC		17:32
*** rm_mobile has quit IRC		17:49
*** sshank has quit IRC		17:54
*** jniesz has joined #openstack-lbaas		18:00
johnsom	jniesz Ok with next week or want to chat here?	18:02
rm_work	wait, i just read the flavor spec and it said flavors were immutable and only set at create time :P	18:02
rm_work	are we changing that?	18:03
johnsom	Yeah, that is the current stance	18:03
johnsom	I think the topic was a discussion about revisiting that.	18:03
jniesz	yes because the question is how to move from one flavor to another	18:04
johnsom	I think it's "possible", but I would like to see it working first... grin	18:04
jniesz	i agree that an lb created under a flavor is immutable	18:04
jniesz	but should be able to failover (deprovision / reprovision) lb to new flavor	18:04
*** longstaff has quit IRC		18:06
jniesz	for example if we update glance image of a flavor	18:06
jniesz	create a new flavor with a new glance image	18:06
jniesz	and want to move all lb's under the old flavor over to that new flavor	18:06
johnsom	It gets pretty strange if the provider is different across the flavors. I mean, I think it is possible, but definitely something I would want to tackle in the future when we have providers/drivers working	18:06
johnsom	Well, today, glance images are tagged, so by updating the tag to point to the new image and then using the failover API you can accomplish that without changing the flavor	18:08
jniesz	correct. Depending how we implement glance images in flavor that might be different	18:09
jniesz	if we have different glance images for different flavors	18:09
jniesz	flavor might point to glance image id	18:09
jniesz	or need to support multiple tags for different images	18:10
johnsom	We have deprecated pointing to image IDs I think...	18:10
jniesz	right now we just look for single tag	18:10
johnsom	I think we only support tags	18:10
johnsom	Right, you could setup a tag per flavor if you want to manage it that way.	18:11
johnsom	https://github.com/openstack/octavia/blob/master/octavia/common/config.py#L290	18:11
jniesz	yes, so amp_image_tag would have to move into flavor meta_info	18:12
jniesz	from the config	18:12
johnsom	Yes, that is definitely something that needs to happen	18:12
johnsom	I think there are a few config settings that need to move to flavor, like topology, tag, nova flavor, etc. Mostly we parked things in config because we didn't have flavors.	18:13
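The idea of moving settings such as the image tag out of the config file and into flavor metadata can be sketched as a lookup with a fallback. This is only an illustration: `resolve_setting`, the dictionary layout, and the example values are hypothetical and are not Octavia code; only the `amp_image_tag` option name comes from the discussion above.

```python
# Hypothetical config defaults; the values are placeholders for illustration.
CONFIG_DEFAULTS = {"amp_image_tag": "amphora", "topology": "SINGLE"}


def resolve_setting(name, flavor_meta, config=CONFIG_DEFAULTS):
    """Prefer the value stored on the flavor, else fall back to the config."""
    return flavor_meta.get(name, config[name])


# A flavor that pins its own image tag but inherits the topology.
gold_flavor = {"amp_image_tag": "amphora-gold"}
print(resolve_setting("amp_image_tag", gold_flavor))
print(resolve_setting("topology", gold_flavor))
```

With a tag per flavor, retagging a new Glance image and failing over would pick up the new image without changing the flavor itself.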
jniesz	yea, nova flavor is another useful one.	18:14
jniesz	migrate from one flavor to another (vertical scaling)	18:14
jniesz	it would be good to failover to make that happen similar to the way glance is handled with tags	18:14
johnsom	It's another interesting one. Nova team is starting to work on hot-plug vcpu/ram	18:15
*** pcaruana has joined #openstack-lbaas		18:15
*** salmankhan has quit IRC		18:15
openstackgerrit	Bar RH proposed openstack/octavia master: [WIP] Assign bind_host ip address per amphora https://review.openstack.org/505158	18:19
xgerman_	yeah, we can move all that to flavor ;-) We still support image-ids but they are a bit weird when using	18:19
rm_work	johnsom: harlowja says moving to jobboard requires us dropping oslo.messaging	18:20
xgerman_	Probably need to extend our failover mechanism and one flavor->another might be a user op…	18:20
harlowja	or at least parts of it...	18:20
harlowja	the models just aren't the same (in that there is no messaging, lol)	18:20
xgerman_	mmh, can we keep our queue between API and worker?	18:20
johnsom	hmmm, confused a bit by that	18:21
xgerman_	+1	18:21
johnsom	We use it in two places:	18:21
johnsom	1. API process to worker (prior to starting a task flow)	18:21
rm_work	xgerman_: specifically the queue between API and worker is what needs to stop using it	18:21
xgerman_	mmh	18:21
johnsom	2. Sending stats/status over to neutron (outside task flow)	18:21
rm_work	yeah that second one could continue	18:21
rm_work	except, neutron-lbaas is on fire	18:22
johnsom	Since API->CW is before we even launch an engine, how would it conflict? In reality I think the CW gets a lot smaller and the JB workers take on more of the stuff	18:23
harlowja	https://openstack.nimeyo.com/83061/openstack-dev-oslo-mistral-saga-process-than-where-from-here also for homework/reading	18:24
harlowja	quiz tomorrow	18:25
rm_work	johnsom: because jobboard requires ack-after-work and oslo does ack-before-work apparently	18:25
johnsom	Deal	18:25
*** belharar has quit IRC		18:25
xgerman_	so we need to run our own queue? Or does job board do that for us? Confused?	18:26
harlowja	so it prob would be useful for me to tell u what a jobboard is (at least the zookeeper one)	18:26
xgerman_	guess @johnsom will write a summary paper for us and present :-)	18:26
johnsom	rm_work Still don't see how that is a problem. It is a feature of this solution IMO	18:26
harlowja	depending on peoples zookeeper knowledge	18:26
harlowja	i need 20 minutes class time to do said description	18:26
xgerman_	I know the keeper —	18:26
harlowja	k	18:27
harlowja	https://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html#ch_zkDataModel (for those who don't know)	18:27
harlowja	watches, znodes, ephemeral nodes are important	18:27
johnsom	Here comes the fire hose....	18:27
harlowja	ha	18:27
harlowja	let me know when to fully firehose	18:28
harlowja	johnsom ack-before-work means u can lose messages if worker processing messages crashes right after acking	18:29
harlowja	so that's bad, especially in projects (mistral, octavia) that aren't retrying and ...	18:29
harlowja	(retrying from the sender side)	18:29
harlowja	(crash or software upgrade, or network blip or other)	18:30
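The difference between ack-before-work and ack-after-work described above can be shown with a toy in-memory queue. This is a simulation under stated assumptions, not oslo.messaging or taskflow code: `consume`, `WorkerCrash`, and the message name are all made up for the illustration.

```python
from collections import deque


class WorkerCrash(Exception):
    """Simulated worker death in the middle of handling a message."""


def consume(queue, handler, ack_before_work):
    """Take one message off the queue and process it.

    A message leaves the queue only when it is acknowledged.
    Returns True if the message was fully processed.
    """
    message = queue[0]
    if ack_before_work:
        queue.popleft()  # acked: the broker forgets the message now
    try:
        handler(message)
    except WorkerCrash:
        return False  # an un-acked message stays queued for a retry
    if not ack_before_work:
        queue.popleft()  # acked only after the work succeeded
    return True


def crashing_handler(message):
    raise WorkerCrash(message)


early = deque(["create-lb"])
consume(early, crashing_handler, ack_before_work=True)

late = deque(["create-lb"])
consume(late, crashing_handler, ack_before_work=False)

print(len(early), len(late))
```

After the simulated crash, the ack-before-work queue is empty (the message is lost unless the sender retries), while the ack-after-work queue still holds the message for another worker.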
johnsom	Right, this is our current situation.	18:31
xgerman_	yep, but can’t we make the queue “durable”	18:31
johnsom	I am envisioning.... (cue dreamland music) Our CW gets message off queue, fires up TF engine stuffs, ACKs queue as message has been handed off to a more durable system.	18:32
rm_work	that is what i thought too, but apparently not how it works :/	18:32
*** sshank has joined #openstack-lbaas		18:32
*** sshank has quit IRC		18:33
*** sshank has joined #openstack-lbaas		18:33
harlowja	not with oslo.messaging, lol	18:34
xgerman_	fix it!	18:34
harlowja	not likely	18:34
*** sshank has quit IRC		18:35
harlowja	oslo.messaging doesn't really expose the acking to users	18:35
harlowja	on purpose	18:35
johnsom	So even Our CW gets message off queue, auto ACKs, fires up TF engine is better than we have today and a pretty small failure window	18:35
xgerman_	+1	18:35
harlowja	that's already happening if u use oslo.messaging	18:36
rm_work	welcome to my conversation with harlowja as of like 15 minutes ago	18:36
xgerman_	nice — so we fire off taskboard earlier?	18:36
johnsom	So you are proposing to scrap the queue, launch the TF engine straight from the API provider driver	18:36
xgerman_	monoliths unite!	18:36
harlowja	different kind of queue (if u can call it that)	18:36
rm_work	I think we ... post a job to a board	18:36
harlowja	a taskflow job 'queue' is a directory in zookeeper	18:37
johnsom	The problem we have is after the message is off the queue and we start the TF is when the "hard stuff" happens	18:37
rm_work	instead of put a call on a queue	18:37
rm_work	it's basically the same kind of thing	18:37
harlowja	so when u post to a job board, it creates entry in that directory	18:37
harlowja	workers that are waiting for work are 'watching' that directory for entries being created	18:37
xgerman_	ok, zookeeper is the queue	18:37
xgerman_	got it… easy	18:37
rm_work	so yeah we'd just replace the places where we post to oslo.messaging, and instead post the job to zookeeper in a slightly different format	18:37
harlowja	then a worker (one of them) gets the 'entry' (via atomic blah blah)	18:37
harlowja	then worker starts processing workflow described in job	18:38
rm_work	and the workers look at zookeeper for jobs	18:38
rm_work	instead of reading from oslo.messaging	18:38
harlowja	if worker dies (at any point) it releases lock (ephemeral ...node)	18:38
xgerman_	yeah, then they lock and if they die it unlocks bla, bla	18:38
harlowja	then another worker can see this has happened and try to take that job over	18:38
harlowja	blah blah	18:38
johnsom	Where in this does the TF engine get started?	18:38
harlowja	post-job-claim	18:38
harlowja	the interesting bit is that if u have TF engines persisting task state to somewhere	18:38
harlowja	that on worker death the next worker can try to figure out where the last worker finished	18:39
johnsom	Yeah, that is the part we actually need	18:39
harlowja	and pick it up there	18:39
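The jobboard flow described above (post a job, one worker claims it, the claim is released if the worker dies, the next worker resumes from persisted progress) can be sketched with a small in-memory stand-in. This is not taskflow's jobboard API and uses no ZooKeeper; `JobBoard`, `run`, and the task names are hypothetical, and `worker_died` only mimics what an ephemeral node would do.

```python
class JobBoard:
    """In-memory stand-in for a job board (hypothetical, not taskflow's API)."""

    def __init__(self):
        self.jobs = {}      # job name -> list of task names to run
        self.owners = {}    # job name -> worker currently holding the claim
        self.progress = {}  # job name -> tasks already completed (persisted)

    def post(self, name, tasks):
        self.jobs[name] = list(tasks)
        self.progress[name] = []

    def claim(self, name, worker):
        """Claim a job; only one worker can own it at a time."""
        if name in self.owners:
            return False
        self.owners[name] = worker
        return True

    def worker_died(self, worker):
        """Mimic ephemeral-node cleanup: drop every claim the worker held."""
        for name in [n for n, w in self.owners.items() if w == worker]:
            del self.owners[name]


def run(board, name, worker, die_after=None):
    """Run the unfinished tasks of a job, optionally dying part way through."""
    if not board.claim(name, worker):
        return False
    done = board.progress[name]
    for task in board.jobs[name][len(done):]:
        if die_after is not None and len(done) >= die_after:
            board.worker_died(worker)
            return False
        done.append(task)
    del board.owners[name]  # finished: release the claim
    return True


board = JobBoard()
board.post("create-lb", ["allocate_vip", "boot_amphora", "plug_vip"])
first = run(board, "create-lb", "worker-1", die_after=1)
second = run(board, "create-lb", "worker-2")
print(first, second, board.progress["create-lb"])
```

Here worker-1 dies after one task, and worker-2 claims the same job and continues with the remaining two tasks instead of starting over.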
xgerman_	now we are talking ;-)	18:39
harlowja	of course, some projects don't give a shit about the persistence part	18:39
johnsom	sub-flow durability	18:39
harlowja	and just restart the whole damn thing (and just use the auto-worker transfer stuff)	18:39
harlowja	depends on if u can restart the subflows or if its just easier to start the whole thing over	18:39
harlowja	(and then have tasks themselves check things and do nothing...)	18:40
xgerman_	could work for us but would be wasteful (unaccounted resources…)	18:40
johnsom	Yeah, restart the whole thing is bad in most cases. We have this pesky VIP IP/port	18:40
harlowja	i'd expect a task could check something before doing work no?	18:40
harlowja	like check if VIP/IP/port already made, then do nothing	18:40
xgerman_	probably	18:40
harlowja	if not already made, do something...	18:40
harlowja	and repeat	18:40
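The "check before doing work" idea for restartable tasks might look like the sketch below. It is a minimal illustration with an in-memory port list; `ensure_vip_port` and the port dictionary layout are hypothetical and do not reflect the neutron API.

```python
def ensure_vip_port(existing_ports, vip_address):
    """Create the VIP port only if no port with that address exists yet.

    Returns the port and a flag telling whether it was created by this call.
    """
    for port in existing_ports:
        if port["address"] == vip_address:
            return port, False  # already made: do nothing
    port = {"id": "port-%d" % (len(existing_ports) + 1), "address": vip_address}
    existing_ports.append(port)
    return port, True


ports = []
_, created_first = ensure_vip_port(ports, "203.0.113.10")
_, created_again = ensure_vip_port(ports, "203.0.113.10")
print(created_first, created_again, len(ports))
```

Running the task twice leaves a single port, which is what makes restarting a whole flow tolerable. The cost is the lookup itself, which against a real cloud API may be expensive.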
xgerman_	but it gets more tough with VMs, etc.	18:40
harlowja	sure	18:41
harlowja	anyway, that's the idea	18:41
harlowja	hose done	18:41
harlowja	lol	18:41
*** gcheresh_ has joined #openstack-lbaas		18:41
johnsom	You have such faith in the OpenStack API capabilities.... Odds are high we would walk every port in the system.... grin	18:41
harlowja	i haven't (but could) transfer the same concepts to etcd	18:41
harlowja	i just haven't	18:41
harlowja	there is a limited redis driver for jobboard that is sorta similar (but not so good)	18:41
harlowja	since redis doesn't support the same concepts natively	18:41
harlowja	johnsom ha	18:42
harlowja	don't do dumb things :-P	18:42
xgerman_	why can’t we use a graphDB - I heard they are all the rage now	18:42
harlowja	ha	18:42
johnsom	I mean if we are going to restart the whole flow, isn't there something lighter weight than zookeeper, etc?	18:42
xgerman_	like a durable queue?	18:42
xgerman_	after all the keeper is not officially part of OpenStack whereas etcd is	18:43
harlowja	define light-weight	18:43
harlowja	lol	18:43
xgerman_	so if we can avoid the keeper would be goodness	18:43
johnsom	Yeah, relying on more external parts makes me ill	18:43
harlowja	meh, u decide	18:43
harlowja	u can hack all of this with db|rabbit|something else	18:43
harlowja	but i'd rather not	18:43
harlowja	with some polling threads and shit	18:44
*** sshank has joined #openstack-lbaas		18:44
harlowja	enjoy that, ha	18:44
harlowja	but ya, the jobboard stuff was before etcd got approved (3 months ago?) so ya...	18:44
harlowja	is what it is ,ha	18:44
xgerman_	so we should at least use etcd since that’s now officially part of the kit whereas zookeeper isn't	18:44
harlowja	go for it	18:44
harlowja	i could prob do it, but it might be a useful thing for someone here	18:44
harlowja	then u'll know wtf jobboards are better, haha	18:44
xgerman_	I guess we have our work cut out or wait for the K8 Octavia	18:44
harlowja	at least now u know the concepts	18:45
xgerman_	yeah, might also be a non-issue, e.g. terraform checks if the LB appeared and if not errors out — so if we lose the message the user will know and can run again	18:46
harlowja	sounds like shifting work to user	18:46
xgerman_	yep, it’s shitty	18:46
harlowja	ie, the user is your retry decorator, lol	18:46
harlowja	user-powered-decorators	18:46
johnsom	We could just tell rm_work to never interrupt a controller	18:47
xgerman_	:-)	18:47
johnsom	Get a bunker, UPS, generator, vault door	18:47
xgerman_	http://www.zerohedge.com/sites/default/files/images/user5/imageroot/2017/04/15/north-korea-missiles_0.png	18:48
xgerman_	just look that you build it outside those circles	18:48
johnsom	Geez, three pages in the first doc alone...	18:49
johnsom	I feel like harlowja gave us free candy (taskflow) and then .....	18:50
*** sshank has quit IRC		18:56
*** JudeC has quit IRC		18:59
*** pcaruana has quit IRC		19:00
*** gcheresh_ has quit IRC		19:05
nmagnezi	johnsom, o/	19:11
nmagnezi	johnsom, a question about https://review.openstack.org/#/c/505884/7	19:12
johnsom	o/	19:12
nmagnezi	johnsom, why do we have openstackclient in both places?	19:12
johnsom	Yeah, I had that question too. Let me see if I can find the comments about this.	19:13
nmagnezi	johnsom, so in the patch it looks like it bumps it in test req, but strangely enough i don't see it in master: https://github.com/openstack/python-octaviaclient/blob/master/test-requirements.txt	19:13
johnsom	If I remember right it has to do with some integrated tests	19:13
nmagnezi	johnsom, ack. just tried to make sense of it for myself :)	19:13
johnsom	Oh, interesting point	19:14
johnsom	hmmm	19:14
johnsom	https://review.openstack.org/#/c/487565/3/test-requirements.txt	19:15
johnsom	So, we did remove it	19:15
johnsom	The bot must be confused	19:15
johnsom	Oh, we removed it after stable/pike was cut	19:15
johnsom	That bot patch is against stable/pike	19:16
nmagnezi	johnsom, oh, i missed that	19:16
johnsom	Yeah, we can backport that if it is a problem for your packaging	19:16
nmagnezi	johnsom, i should have waited with my vote.. :P	19:17
*** sanfern has quit IRC		19:17
johnsom	Not too late to fix it	19:17
nmagnezi	johnsom, indeed	19:17
nmagnezi	johnsom, as for packaging, I didn't package the client yet (I plan to do so in a few weeks)	19:19
nmagnezi	but since pike was already released i think we should just leave it there	19:19
johnsom	Yep	19:19
nmagnezi	if it ain't broken.. :)	19:19
johnsom	Yep	19:20
openstackgerrit	Nir Magnezi proposed openstack/octavia master: Fix a Python3 issue with start_stop_listener https://review.openstack.org/480919	19:20
nmagnezi	rm_work, i need some advice with this one ^. A little bird (with a PTL wings) whispered me you know how to tackle those py3 issues :)	19:22
*** eezhova has quit IRC		19:23
rm_work	yeah those are always fun	19:23
rm_work	often you can run it through / test against six.text_type	19:23
rm_work	i THINK in this case that might be safe	19:24
rm_work	six.text_type("a") == six.text_type(b"a")	19:25
*** sanfern has joined #openstack-lbaas		19:25
rm_work	ah yeah doesn't work directly in py3	19:26
rm_work	but you can test	19:27
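The reason the direct comparison fails on Python 3 is that converting bytes with `str()` yields the repr-like text `b'a'` rather than decoding it. The sketch below uses only the standard library (no six); `to_text` is a hypothetical helper, and UTF-8 is an assumed encoding.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it first when it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# On Python 3, str() on bytes does not decode, so this comparison is False.
print(str(b"a") == "a")

# Decoding explicitly makes both sides comparable as text.
print(to_text(b"a") == to_text("a"))
```

Normalizing both operands through one helper before comparing keeps the check the same on Python 2 and Python 3.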
harlowja	johnsom hahaha, free candy	19:31
harlowja	i can only provide so much of the candy, the rest is up to u guys	19:33
harlowja	i had a hard enough time just trying to get etcd|zookeeper into openstack as an accepted thing	19:34
harlowja	(thankfully it now is)	19:34
harlowja	^ that blew my mind honestly (that it took that long)	19:34
harlowja	especially for a cloud distributed system...	19:34
harlowja	lol	19:34
* harlowja slightly burnt out by that crap		19:36
rm_work	johnsom: do you remember what log level we get if we don't have debug=true?	19:38
harlowja	i think also not super-happy with how people like used oslo.messaging and i think they didn't quite know what it really is doing (acking before work...)	19:38
johnsom	Yeah, I am getting worn down by things breaking out from under us... Thus the worry of adding more	19:38
harlowja	anyways, rant over	19:38
johnsom	rm_work INFO I think	19:39
rm_work	yeah I'm hoping that's true	19:39
johnsom	That is what I see in one of my devstack VMs	19:43
johnsom	Ok, back to nova/neutron fun with the case of the missing network interface	19:43
rm_work	ganbatte	19:52
openstackgerrit	Adam Harwell proposed openstack/octavia master: WIP: Floating IP Network Driver (spans L3s) https://review.openstack.org/435612	20:03
*** sshank has joined #openstack-lbaas		20:05
*** dayou has quit IRC		20:07
*** yamamoto has quit IRC		20:08
*** eezhova has joined #openstack-lbaas		20:14
*** sshank has quit IRC		20:24
*** sshank has joined #openstack-lbaas		20:24
*** yamamoto has joined #openstack-lbaas		20:27
*** yamamoto has quit IRC		20:31
*** ltomasbo has quit IRC		20:34
*** ltomasbo has joined #openstack-lbaas		20:37
johnsom	So, our 404 issue....	20:39
johnsom	Appears to be an issue inside the amp	20:39
johnsom	If I force a PCI bus re-enumeration the interface pops up	20:40
rm_work	hmmmmmmmmm	20:40
rm_work	curiosity: i wonder if we switched the gates to centos amps if it'd show up :P	20:41
rm_work	might be an ubuntu thing	20:41
johnsom	Yeah. I collected a ton of nova/neutron logs then figured I would start poking things.	20:41
rm_work	you said you managed to repro, but	20:41
rm_work	reliably? or just once randomly	20:41
johnsom	Once ever	20:42
rm_work	T_T	20:42
johnsom	But we saw it in the gates a bunch in Pike	20:42
rm_work	yes, i remember	20:42
rm_work	is it a race?	20:42
rm_work	like sure you did a re-enum and it popped up	20:42
rm_work	but maybe because nova didn't set it up in the time it was supposed to?	20:42
johnsom	If so it's in the bowels of the linux kernel hot-plug systems....	20:43
johnsom	Oh, no, I had looked at the device list just before	20:43
johnsom	https://www.irccloud.com/pastebin/5DeVTzTG/	20:43
*** aojea has quit IRC		20:44
johnsom	I ran it in the netns just to make sure there wasn't some strange netns magic going on (though PCI should not be masked by that)	20:44
johnsom	I mean, forcing a rescan if we don't find the interface we expect is not harmful.	20:45
openstackgerrit	Adam Harwell proposed openstack/octavia master: WIP: Floating IP Network Driver (spans L3s) https://review.openstack.org/435612	20:45
*** aojea has joined #openstack-lbaas		20:49
rm_work	johnsom: so, basically a quick workaround	20:59
rm_work	maybe that's fine :/	20:59
rm_work	my standards have become a lot lower as someone trying to actually get all this working reliably	21:00
openstackgerrit	Nir Magnezi proposed openstack/octavia master: Update devstack plugin and examples https://review.openstack.org/503638	21:01
*** Alex_Staf has joined #openstack-lbaas		21:02
johnsom	It's actually a pretty common practice when hot plugging things into linux hosts. I just thought we were long beyond needing it.	21:02
rm_work	aah there was another admin endpoint i forgot to talk about in the meeting T_T	21:02
rm_work	maintenance mode	21:02
rm_work	a couple of things:	21:03
rm_work	1) I think i want to store the AZ/host that comes back from the nova polling	21:04
rm_work	2) make a new table for storing currently active maintenances (either an AZ or a Host)	21:04
rm_work	3) Make an endpoint to create/read/delete to that table	21:05
rm_work	4) have some logic that can failover amps off those AZ/hosts when a maintenance is set	21:05
rm_work	jniesz: ^^ possibly relevant for you too	21:06
rm_work	for #1 i mean, in the amphora table	21:07
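The four steps above amount to recording where each amphora runs, keeping a list of active maintenances, and selecting the amphorae that sit on an affected AZ or host. A minimal in-memory sketch, assuming hypothetical `Amphora` and `Maintenance` records rather than the real Octavia tables:

```python
from collections import namedtuple

# Hypothetical records; the real amphora table has many more columns.
Amphora = namedtuple("Amphora", ["id", "az", "host"])
Maintenance = namedtuple("Maintenance", ["kind", "target"])  # kind: "az" or "host"


def amps_to_fail_over(amphorae, maintenances):
    """Return the ids of amphorae located on an AZ or host under maintenance."""
    az_targets = {m.target for m in maintenances if m.kind == "az"}
    host_targets = {m.target for m in maintenances if m.kind == "host"}
    return [a.id for a in amphorae
            if a.az in az_targets or a.host in host_targets]


amps = [
    Amphora("amp-1", "az1", "host-a"),
    Amphora("amp-2", "az2", "host-b"),
    Amphora("amp-3", "az2", "host-c"),
]
active = [Maintenance("az", "az1"), Maintenance("host", "host-c")]
affected = amps_to_fail_over(amps, active)
print(affected)
```

Whether the selected amphorae are then failed over automatically or handed to an operator script is a separate policy decision.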
xgerman_	mmh, I can see that we switch off an AZ for maintenance but actively failing stuff over - that should be left to the operator to run some script	21:08
rm_work	hmmm	21:08
xgerman_	yeah, not every maintenance might need a failover - maybe just not schedule new amps for some time	21:09
johnsom	Yeah, it seems like maintenance would stop health monitoring and block failovers	21:10
johnsom	If I remember right, you are looking for something to evacuate an AZ???	21:11
xgerman_	looks like it and I think that should be done outside Octavia	21:11
xgerman_	we just give you the building blocks	21:11
johnsom	Yeah, some folks might want to live migrate too	21:12
xgerman_	+1	21:12
xgerman_	also rm_work should write a spec — this is getting beyond what we can handle in irc ;-)	21:15
*** eezhova has quit IRC		21:15
johnsom	Yeah, this one probably should have a spec	21:15
*** ltomasbo has quit IRC		21:16
*** leitan has quit IRC		21:16
xgerman_	I haven’t looked at our minutes but if we do a spec we might do it for all the proposed ones… so we have a proper record	21:16
xgerman_	aka an admin-api spec	21:17
*** ltomasbo has joined #openstack-lbaas		21:18
rm_work	yeah prolly	21:18
johnsom	https://www.irccloud.com/pastebin/CykDBtZM/	21:19
xgerman_	root is not what it used to be?	21:19
rm_work	open('/sys/bus/pci/rescan', 'w')	21:20
rm_work	plz try	21:20
johnsom	Ah, I'm a dork and forgot the 'w'	21:20
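A rescan-on-miss helper along these lines might look like the following sketch. Writing `1` to `/sys/bus/pci/rescan` is the kernel's sysfs interface for re-enumerating the PCI bus and normally needs root; the function names, the injectable `rescan`/`present` callables, and the demo against a scratch file are assumptions made so the example runs anywhere. It is not the code from the proposed patch.

```python
import os
import tempfile

PCI_RESCAN_PATH = "/sys/bus/pci/rescan"


def rescan_pci_bus(path=PCI_RESCAN_PATH):
    """Ask the kernel to re-enumerate the PCI bus by writing '1' to sysfs."""
    with open(path, "w") as rescan_file:  # must be opened for writing
        rescan_file.write("1")


def interface_present(name, sys_class_net="/sys/class/net"):
    """Check whether the kernel currently exposes the named interface."""
    return os.path.exists(os.path.join(sys_class_net, name))


def find_interface(name, rescan=rescan_pci_bus, present=interface_present):
    """Look for the interface, forcing one PCI rescan if it is missing."""
    if present(name):
        return True
    rescan()
    return present(name)


# Demo against a scratch file so it runs without root or real hardware.
with tempfile.TemporaryDirectory() as scratch_dir:
    scratch_path = os.path.join(scratch_dir, "rescan")
    rescan_pci_bus(scratch_path)
    with open(scratch_path) as scratch:
        written = scratch.read()

seen = []  # pretend the interface shows up only after a rescan
found = find_interface("eth1",
                       rescan=lambda: seen.append("eth1"),
                       present=lambda name: name in seen)
print(written, found)
```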
openstackgerrit	Michael Johnson proposed openstack/octavia master: Force PCI bus rescan if interface is not found https://review.openstack.org/507986	21:26
rm_work	so now our gates will be impervious to failure? :P	21:27
*** yamamoto has joined #openstack-lbaas		21:28
tongl	Do we still track neutron-lbaas v2 bugs? I created a listener with a default pool, and also added a healthmonitor for this pool. However, when I tried to create a healthmonitor for the 2nd redirect_pool, it reports exception "TypeError: unhashable type: 'dict'".	21:29
tongl	Did anyone see this before in LBaaS v2?	21:29
tongl	Error log: http://paste.openstack.org/show/622092/	21:30
* rm_work doesn't use neutron-lbaas		21:31
tongl	We still have to develop our driver to support neutron-lbaas :(	21:33
johnsom	tongl neutron-lbaas bugs go here: https://storyboard.openstack.org/#!/project/906	21:33
johnsom	I will say, developers working on neutron-lbaas are few	21:33
tongl	thanks	21:34
*** yamamoto has quit IRC		21:34
johnsom	Is it the odd newline in "admin_state_up"?	21:37
*** gcheresh_ has joined #openstack-lbaas		21:39
*** gcheresh_ has quit IRC		21:43
*** sshank has quit IRC		21:47
tongl	Another quick question: in Octavia, do we allow deleting a pool when l7policy redirect_to_pool is still referencing it?	21:48
johnsom	I think the answer is no, but I have not tested it	21:50
tongl	thx	21:51
*** yamamoto has joined #openstack-lbaas		21:57
*** yamamoto has quit IRC		21:57
*** aojea has quit IRC		21:58
*** kbyrne has quit IRC		22:00
*** kbyrne has joined #openstack-lbaas		22:03
*** jniesz has quit IRC		22:04
rm_work	blegh, amps explode when i use single-create, not when i use normal create	22:22
rm_work	because of the active-standby keepalived	22:22
johnsom	Ugh	22:22
rm_work	something with the initiation of it happening very early	22:22
rm_work	logs look like this:	22:22
rm_work	http://paste.openstack.org/show/622094/	22:24
rm_work	that's one amp	22:24
rm_work	http://paste.openstack.org/show/622095/	22:26
rm_work	that's the other	22:26
rm_work	so what seems to happen ...	22:26
rm_work	http://paste.openstack.org/show/622096/	22:29
rm_work	hmmm	22:29
rm_work	actually i am unsure if it's keepalived related	22:29
*** sshank has joined #openstack-lbaas		22:29
rm_work	it actually fails over because of no health updates?	22:29
rm_work	but it does so before we consider the create "done"	22:30
rm_work	so it errors the amps because it's still PENDING_CREATE	22:30
rm_work	not sure why it's not sending health messages	22:30
johnsom	Ummm, it can't be told to failover from health messages until after it starts sending them.	22:32
rm_work	yeah i mean it seems like it STOPS sending them	22:32
rm_work	or something	22:32
rm_work	it IS sending them now	22:33
rm_work	blegh	22:33
rm_work	I am still unsure what I think about being unable to failover while LB is pending state	22:34
johnsom	Should not happen, HM doesn't own the amp	22:35
*** sshank has quit IRC		22:37
rm_work	aahh	22:37
rm_work	2017-09-27 15:37:28.083 7766 WARNING octavia.controller.healthmanager.update_db [-] Amphora 9912b285-4aee-4216-bc81-3e036db246cd health message reports 1 listeners when 0 expected	22:37
rm_work	the messages come in but are ignored I think	22:37
rm_work	because the listener count doesn't line up	22:37
johnsom	correct, that is normal on startup	22:38
rm_work	ok but it stays that way for a WHILE	22:38
rm_work	i think because it's still in the process of setting up the single-create stuff so it hasn't persisted that yet?	22:39
rm_work	but it has sent it to the amps/	22:39
rm_work	?	22:39
johnsom	Even so, it can't fail over due to HM because it should have never written an amphora_health record to the DB until the first listener comes up	22:40
rm_work	yeah the first listener comes up on the amp	22:40
rm_work	err	22:41
rm_work	hmm so at SOME POINT the numbers had to match	22:41
johnsom	I think we had a bug about that, that we really should be doing health of the amp before the listener, but you know, chicken/egg	22:41
johnsom	The last attempt at that caused it to always failover	22:42
johnsom	ha	22:42
rm_work	i have an idea...	22:42
rm_work	if len(listeners) == expected_listener_count:	22:42
rm_work	^^ replace that with	22:42
rm_work	if len(listeners) == expected_listener_count and 'PENDING' not in lb.provisioning_status:	22:43
rm_work	err sorry bad logic	22:43
rm_work	one sec	22:43
rm_work	if len(listeners) == expected_listener_count or 'PENDING' in lb.provisioning_status:	22:43
johnsom	spares pool	22:43
rm_work	erm	22:43
rm_work	hold on we're USING lb.id here	22:44
rm_work	ah i see	22:44
rm_work	one sec	22:44
*** sshank has joined #openstack-lbaas		22:45
rm_work	http://paste.openstack.org/show/622097/	22:46
rm_work	added lines 1, 6/7, edited 13	22:46
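A minimal sketch of the check rm_work proposes above, for readers who cannot open the paste. It is not the content of paste 622097 and not Octavia's actual update_db code. `LoadBalancerSketch` and `should_accept_health_message` are hypothetical names, and the `lb is None` branch assumes johnsom's "spares pool" remark refers to amps that have no LB attached yet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoadBalancerSketch:
    """Hypothetical stand-in for an LB record; only the field used here."""
    provisioning_status: str


def should_accept_health_message(reported_listener_count: int,
                                 expected_listener_count: int,
                                 lb: Optional[LoadBalancerSketch]) -> bool:
    """Decide whether a heartbeat should refresh the amp's health record."""
    # Counts line up: the existing behaviour, accept the heartbeat.
    if reported_listener_count == expected_listener_count:
        return True
    # Assumption: a spare amp has no LB, so there is no status to consult.
    if lb is None:
        return False
    # Proposed relaxation: while the LB is in any PENDING_* state the DB may
    # not have caught up with what was pushed to the amp, so accept anyway.
    return 'PENDING' in lb.provisioning_status
```

With this shape, a message reporting 1 listener when 0 are expected is accepted while the LB is PENDING_CREATE and ignored once the LB is ACTIVE.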
*** leitan has joined #openstack-lbaas		22:48
rm_work	johnsom: ^^	22:50
johnsom	What about an LB created but no listener?	22:51
johnsom	I don't remember when we start sending	22:51
johnsom	Frankly I'm still not sure why we are doing this at all...	22:51
rm_work	oh dear god, somehow i got an orphaned VM pair	22:55
rm_work	i don't know how to track them down...	22:55
*** yamamoto has joined #openstack-lbaas		22:58
*** rtjure has quit IRC		23:01
*** bcafarel has quit IRC		23:03
rm_work	k got it	23:03
*** yamamoto has quit IRC		23:05
rm_work	yeah umm shit	23:06
rm_work	so when i do single-create with active-standby, it dies before it ever finishes creating	23:06
rm_work	(the amps go to error)	23:06
johnsom	nova error?	23:06
rm_work	and then when i try cleanup (delete the LB, which is ACTIVE) it seems to be leaving the VMs behind but deleting the amp record O_o	23:07
rm_work	no, the amps failover but it's in PENDING_CREATE so they don't finish failover and go to ERROR	23:07
rm_work	maybe we shouldn't put amps in ERROR if we try a failover but don't do it because of LB state	23:07
rm_work	O_o	23:08
*** tongl has quit IRC		23:09
openstackgerrit	Adam Harwell proposed openstack/octavia master: WIP: Floating IP Network Driver (spans L3s) https://review.openstack.org/435612	23:11
rm_work	ok that fix seems to fix my startup issue	23:13
rm_work	the PENDING state is just too long for them and so it gets out an initial heartbeat somehow but then is blocked for too long T_T	23:14
johnsom	I would be figuring out how that heartbeat is coming out	23:14
*** bcafarel has joined #openstack-lbaas		23:16
rm_work	hmm	23:18
rm_work	yeah the timing on this is crazy complex	23:18
rm_work	johnsom: ok here's a question for you	23:35
rm_work	LB is happy and ACTIVE	23:35
rm_work	an amp dies	23:35
rm_work	we notice, trigger failover, LB goes to PENDING_UPDATE	23:35
rm_work	we finish the failover, mark it ACTIVE again	23:35
rm_work	right?	23:35
openstackgerrit	Michael Johnson proposed openstack/octavia master: Force PCI bus rescan if interface is not found https://review.openstack.org/507986	23:35
johnsom	Yes, it also has a health lock	23:36
rm_work	now what happens if both of the amps behind an ACTIVE_STANDBY LB die within a few seconds?	23:36
rm_work	one amp dies, we PENDING_UPDATE the LB	23:36
rm_work	second amp ... failover fails	23:36
rm_work	now think about once we have N-Active	23:37
rm_work	I really don't think we can afford to lock the LB on failover <_<	23:37
johnsom	But, it should be fine right, the second failed amp will stay failed, first one will be built, go back ACTIVE, HM will notice the second one is borked and start failover on it.	23:39
johnsom	Right?	23:39
rm_work	i don't think that works	23:39
rm_work	i need to figure out why	23:40
rm_work	but IME they stay ERROR forever	23:40
johnsom	About a year ago it did, I tried deleting both and got it to work	23:40
rm_work	i think possibly the revert doesn't properly unset the busy flag	23:40
rm_work	i'll look again next time it happens	23:40
rm_work	unfortunately my breakpoints keep getting messed up, so tempest finishes cleanup before i can investigate fully	23:41
johnsom	That could be the case actually	23:41
rm_work	but i just went through and found a ton of amps in the health table with busy-flag set	23:41
rm_work	so there is obviously a case where it doesn't unset properly somewhere	23:42
johnsom	Yeah, because that happens outside the flow	23:42
johnsom	It happens in the HM during "stale" check (still hate that term)	23:42
rm_work	MarkAmphoraHealthBusy	23:43
rm_work	that doesn't have a revert action	23:43
johnsom	so, failover error on revert should unset it	23:43
johnsom	https://github.com/openstack/octavia/blob/master/octavia/db/repositories.py#L1079	23:43
rm_work	errr	23:43
rm_work	AmphoraToErrorOnRevertTask doesn't	23:44
rm_work	trying to find a task that does	23:44
johnsom	Yeah, that is what I am saying, it probably should unset busy	23:44
rm_work	ah k	23:44
rm_work	you think in AmphoraToErrorOnRevertTask or as a revert on the MarkAmphoraHealthBusy task?	23:44
johnsom	MarkAmphoraHealthBusy is on the new amp I think	23:44
rm_work	err	23:44
rm_work	hmmm	23:45
rm_work	database_tasks.MarkAmphoraHealthBusy(	23:45
rm_work	rebind={constants.AMPHORA: constants.FAILED_AMPHORA},	23:45
johnsom	Oh, no, it's there for the non-failed failover case	23:45
rm_work	looks like it's on the old one	23:45
rm_work	although I guess doing MarkAmphoraHealthBusy in the flow at all is redundant	23:46
rm_work	because in order to RUN this flow on an amp, we've already set it, right?	23:46
rm_work	so it's just... a noop	23:46
johnsom	Well, either way, your case with PENDING will fail before it gets to MarkAmphoraHealthBusy	23:46
johnsom	Well, if it isn't actually a failure, but failover API it needs to be set	23:46
rm_work	hmmm	23:47
johnsom	failure case, it's a dup, failover case it's needed	23:47
rm_work	ok	23:47
rm_work	so maybe the revert should be there regardless	23:47
rm_work	and then ALSO somewhere higher?	23:47
*** sshank has quit IRC		23:47
johnsom	Yeah, I think AmphoraToErrorOnRevertTask should clear the busy flag	23:47
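A rough sketch of the idea the two settle on: when the failover flow reverts, mark the amp ERROR and also clear its busy flag so the health manager can pick it up again. This is plain Python that only mimics the execute/revert shape of a flow task. `InMemoryHealthRepo`, `AmphoraToErrorOnRevertSketch`, and the `statuses` dict are hypothetical and do not reflect Octavia's real repositories or task signatures.

```python
class InMemoryHealthRepo:
    """Hypothetical stand-in for the amphora_health table."""

    def __init__(self):
        self.busy = {}

    def set_busy(self, amphora_id, busy):
        self.busy[amphora_id] = busy


class AmphoraToErrorOnRevertSketch:
    """Does nothing on execute; on revert, marks ERROR and unsets busy."""

    def __init__(self, health_repo, statuses):
        self.health_repo = health_repo
        self.statuses = statuses  # amphora_id -> status string

    def execute(self, amphora_id):
        # Nothing to do on the forward path.
        pass

    def revert(self, amphora_id):
        self.statuses[amphora_id] = 'ERROR'
        # The step discussed above: without this the amp stays busy and a
        # later stale check would presumably keep skipping it.
        self.health_repo.set_busy(amphora_id, False)


repo = InMemoryHealthRepo()
repo.set_busy('amp-1', True)  # set before the failover flow starts
statuses = {'amp-1': 'ALLOCATED'}
task = AmphoraToErrorOnRevertSketch(repo, statuses)
task.revert('amp-1')  # simulate the flow failing and reverting
```

After the simulated revert, `amp-1` is in ERROR with its busy flag cleared, which is the state johnsom describes as retryable.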
*** sshank has joined #openstack-lbaas		23:52
*** sshank has quit IRC		23:53
openstackgerrit	Adam Harwell proposed openstack/octavia master: WIP: Floating IP Network Driver (spans L3s) https://review.openstack.org/435612	23:57
