openstackgerrit | Ian Wienand proposed openstack-infra/nodepool master: Default image creation to qcow2 type https://review.openstack.org/566437 | 00:36 |
*** dkranz has quit IRC | 01:06 | |
*** snapiri has quit IRC | 01:07 | |
ianw | any idea why, when running unit tests manually, I get zero stdout/err or any logging at all? | 01:18
openstackgerrit | Merged openstack-infra/nodepool master: Default image creation to qcow2 type https://review.openstack.org/566437 | 01:18 |
ianw | http://paste.openstack.org/show/720873/ | 01:18 |
tristanC | ianw: perhaps try with "testr run $test_name" after activating the venv? | 01:29 |
ianw | tristanC: i've deleted the OS_CAPTURE etc lines from .testr.conf and now do see the relevant exception | 01:32 |
ianw | something weird is going on, but my desire to dig into stdout tracing in unit testing isn't strong atm :) | 01:32 |
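For context on the capture behavior discussed above: OpenStack-style projects of this era typically drive testr through a `.testr.conf` whose `test_command` exports `OS_STDOUT_CAPTURE`/`OS_STDERR_CAPTURE`/`OS_LOG_CAPTURE`, and the testtools/oslotest base test fixtures swallow stdout, stderr, and logging whenever those are truthy. A typical file looks like this (a generic example of the common pattern, not necessarily nodepool's exact file at the time):

```ini
[DEFAULT]
test_command=OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \
             OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \
             OS_LOG_CAPTURE=${OS_LOG_CAPTURE:-1} \
             ${PYTHON:-python} -m subunit.run discover -t ./ ${OS_TEST_PATH:-./tests} $LISTOPT $IDOPTION
test_id_option=--load-list $IDFILE
test_list_option=--list
```

Because of the `${VAR:-1}` defaults, exporting e.g. `OS_STDOUT_CAPTURE=0` before `testr run` should disable capture without editing the file, which is less invasive than deleting the lines as ianw did.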
*** snapiri has joined #zuul | 01:35 | |
*** jesusaur has quit IRC | 01:45 | |
*** swest has quit IRC | 01:51 | |
*** jesusaur has joined #zuul | 01:53 | |
*** swest has joined #zuul | 02:07 | |
*** threestrands has joined #zuul | 03:59 | |
*** threestrands_ has joined #zuul | 04:00 | |
*** threestrands_ has quit IRC | 04:01 | |
*** threestrands_ has joined #zuul | 04:02 | |
*** threestrands has quit IRC | 04:04 | |
*** swest has quit IRC | 04:20 | |
*** toabctl has joined #zuul | 04:21 | |
*** threestrands_ has quit IRC | 04:39 | |
*** swest has joined #zuul | 04:59 | |
*** swest has quit IRC | 05:03 | |
*** threestrands has joined #zuul | 05:04 | |
*** threestrands has quit IRC | 05:07 | |
*** swest has joined #zuul | 05:13 | |
openstackgerrit | Ian Wienand proposed openstack-infra/zuul master: Ignore extra routes https://review.openstack.org/568195 | 05:59 |
openstackgerrit | Ian Wienand proposed openstack-infra/zuul master: Ignore extra routes https://review.openstack.org/568195 | 06:14 |
*** pcaruana has joined #zuul | 06:31 | |
*** gtema has joined #zuul | 06:44 | |
*** gtema has quit IRC | 06:44 | |
*** gtema has joined #zuul | 06:45 | |
*** gtema has quit IRC | 06:45 | |
*** gtema has joined #zuul | 06:46 | |
openstackgerrit | Artem Goncharov proposed openstack-infra/zuul master: fill `delta` with '0' for `creates` and `removes` command. https://review.openstack.org/567864 | 07:14 |
openstackgerrit | Ian Wienand proposed openstack-infra/zuul master: Only cap aiohttp for python 3.5.2 and below https://review.openstack.org/567663 | 07:19 |
openstackgerrit | Ian Wienand proposed openstack-infra/zuul master: Remove env marker from uvloop https://review.openstack.org/567665 | 07:19 |
openstackgerrit | Ian Wienand proposed openstack-infra/zuul master: Ignore extra routes https://review.openstack.org/568195 | 07:19 |
openstackgerrit | Ian Wienand proposed openstack-infra/zuul master: await in test_websocket_streaming calls https://review.openstack.org/568214 | 07:19 |
*** sshnaidm|bbl is now known as sshnaidm | 07:33 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: gerrit: add support for report only connection https://review.openstack.org/568216 | 07:44 |
*** jpena|off is now known as jpena | 07:53 | |
openstackgerrit | Ian Wienand proposed openstack-infra/zuul master: await in test_websocket_streaming calls https://review.openstack.org/568214 | 07:53 |
openstackgerrit | Ian Wienand proposed openstack-infra/zuul master: Ignore extra routes https://review.openstack.org/568195 | 07:54 |
*** ssbarnea_ has joined #zuul | 07:55 | |
*** ssbarnea_ has quit IRC | 08:06 | |
*** ssbarnea_ has joined #zuul | 08:42 | |
*** bhavik1 has joined #zuul | 08:46 | |
*** bhavik1 has quit IRC | 09:01 | |
*** sshnaidm is now known as sshnaidm|rover | 09:12 | |
*** needsleep is now known as TheJulia | 11:25 | |
*** jpena is now known as jpena|lunch | 11:31 | |
*** weshay is now known as weshay_interview | 11:59 | |
*** dkranz has joined #zuul | 12:00 | |
*** jpena|lunch is now known as jpena | 12:25 | |
*** rlandy has joined #zuul | 12:33 | |
*** dmsimard is now known as dmsimard|off | 12:58 | |
*** weshay_interview is now known as weshay | 13:01 | |
*** dkranz has quit IRC | 13:14 | |
*** elyezer has quit IRC | 13:19 | |
*** elyezer has joined #zuul | 13:21 | |
*** gtema has quit IRC | 13:35 | |
Shrews | corvus: thank you. i now have a Warrant song in my head that i cannot get rid of | 13:43 |
Shrews | "Zuul's my cherry pie, cool drink of water such a sweet surprise, streams so good make a grown man cry, sweet cherry pie." | 13:48 |
Shrews | there... now you all must suffer | 13:48 |
pabelanger | ha | 13:50 |
mordred | Shrews: I'm pretty sure SpamapS needs to not miss that ^^ | 14:11 |
pabelanger | so, I've noticed the linaro cloud has a pretty high rate of failure to launch a VM, which results in NODE_FAILURE on a job. I've noticed we've had to recheck a lot more now, since before nodepool would keep trying until successful. | 14:14
*** dkranz has joined #zuul | 14:16 | |
*** bhavik1 has joined #zuul | 14:32 | |
*** bhavik1 has quit IRC | 14:32 | |
*** acozine1 has joined #zuul | 14:43 | |
*** pcaruana has quit IRC | 15:02 | |
*** jpena is now known as jpena|brb | 15:43 | |
*** pcaruana has joined #zuul | 16:20 | |
*** jpena|brb is now known as jpena | 16:24 | |
*** mugsie has quit IRC | 17:06 | |
*** mugsie has joined #zuul | 17:06 | |
*** mugsie has quit IRC | 17:06 | |
*** mugsie has joined #zuul | 17:06 | |
*** mugsie has quit IRC | 17:08 | |
*** mugsie has joined #zuul | 17:12 | |
*** mugsie has quit IRC | 17:12 | |
*** mugsie has joined #zuul | 17:12 | |
*** jpena is now known as jpena|off | 17:19 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Move SQL web handler to driver https://review.openstack.org/568028 | 17:37 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: WIP replace use of aiohttp with cherrypy https://review.openstack.org/567959 | 17:37 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Convert streaming unit test to ws4py and remove aiohttp https://review.openstack.org/568335 | 17:37 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Convert streaming unit test to ws4py and remove aiohttp https://review.openstack.org/568335 | 17:40 |
SpamapS | Shrews: you win the Zuul improve game. ;) | 17:41 |
SpamapS | improv | 17:41 |
corvus | Shrews also has a commanding lead in the improve game | 17:42 |
SpamapS | true | 17:42 |
*** sshnaidm|rover is now known as sshnaidm|off | 18:07 | |
clarkb | pabelanger: when you say before nodepool would keep trying until successful that was true before the zuulv3 work, but zuulv3 nodepool has never done that aiui | 18:09 |
clarkb | pabelanger: the new request api is such that if all clouds capable of fulfilling a request fail or refuse that becomes a NODE_FAILURE | 18:09 |
clarkb | pabelanger: since this is the only cloud that can provide arm64 nodes its chances of doing that are much higher | 18:09 |
*** harlowja has joined #zuul | 18:12 | |
pabelanger | clarkb: right, in the case of linaro, it is actually failing to boot the node. It was nice in nodepool v2 to have it keep going until it eventually worked. Now we need to recheck, which may result in using more CI resources since linaro is having troubles. This is also another use case where a single cloud causes this to be much higher, as you say | 18:13
clarkb | we might be able to update the request handler to retry if the result were to be a failure at that point | 18:15 |
clarkb | probably by resetting the request state to that of a completely unhandled request | 18:16 |
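A minimal sketch of what clarkb is suggesting, assuming nodepool's ZooKeeper request API of the time (`storeNodeRequest` and the `declined_by`/`state` fields are real; the function and its `transient` flag are hypothetical):

```python
# Hypothetical: instead of failing a request outright once every
# provider has declined it, put it back as if it were never handled.
def finish_request(zk, request, transient):
    if transient:
        # Reset to a completely unhandled request: clear the list of
        # launchers that declined it and return it to the REQUESTED
        # state so any provider (including the failing one) retries.
        request.declined_by = []
        request.state = 'requested'   # the zk module's REQUESTED constant
    else:
        # Permanent problem, e.g. no provider serves this label:
        # fail it, and zuul reports NODE_FAILURE on the job.
        request.state = 'failed'      # the zk module's FAILED constant
    zk.storeNodeRequest(request)
```

As the discussion below notes, some bound on these resets would still be needed, or a bad cloud turns into an infinite loop with no user feedback.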
corvus | at what point would you like it to fail? | 18:21 |
clarkb | if the underlying failure came from the cloud and not nodepool maybe never? | 18:22 |
pabelanger | never may be better for users, since it means longer waiting for the job to run, but from an ops POV you might not know there is an issue with the cloud | 18:23
corvus | users may not know either. that effectively puts the system into an infinite loop with no user feedback | 18:23 |
pabelanger | yah | 18:23 |
corvus | it's worth noting that by the time it's reported to zuul, the cloud will have already failed to boot a node 3 times. | 18:24 |
corvus | (nodepool does have internal retry logic) | 18:24 |
pabelanger | I do think it is reasonable to say we need to reach out to linaro cloud and help debug the failures | 18:25 |
corvus | and that's configurable with the launch-retries parameter | 18:25
corvus | pabelanger: well, that's perhaps a task for openstack-infra | 18:25 |
pabelanger | corvus: yes, agree | 18:26 |
corvus | clarkb: are you aware of launch-retries? it's not clear from your earlier suggestion about changing the request handler behavior... | 18:27 |
clarkb | corvus: yes I am | 18:28 |
clarkb | but once all clouds enter that state it's done | 18:28
clarkb | so retries failed or unable to fulfill request | 18:28 |
clarkb | which is different from the old v2 behavior | 18:28
corvus | clarkb: so your suggestion was something along the lines of: if all handlers have failed due to transient errors, start over? | 18:28 |
clarkb | corvus: ya, if all failed because nodepool can't fulfill it (say wrong label) then fail. But if cloud says 500 error just keep going. But stick it back on the request queue so that another cloud might be able to handle it | 18:29 |
pabelanger | is launch-retries limited to the cloud the node request was originally launched on? | 18:30 |
clarkb | my memory is yes, and if one cloud fails but not all clouds have failed it will try the next one | 18:31 |
SpamapS | I dunno | 18:32 |
SpamapS | sounds to me like you just need to bump up the # of retries | 18:32 |
SpamapS | and ensure that the cloud operator is aware of their fail rate. | 18:32 |
corvus | pabelanger: it's how many times a single provider-launcher will attempt to launch before it declares the attempt failed | 18:32 |
corvus | so if there's one cloud provider, 3 attempts will be made. if there are 2 providers, then 6 attempts. | 18:32 |
pabelanger | ack, thanks | 18:34 |
clarkb | nodepool/driver/openstack/handler.py grep for retries if you want to see the code | 18:34 |
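The code clarkb points at is approximately this shape (paraphrased from the OpenStack handler; `launch-retries` is the real provider-level knob in nodepool.yaml, default 3, while `_launchNode` stands in for the actual boot-and-wait logic):

```python
# Approximate shape of the per-provider retry loop: each launcher
# makes up to `launch-retries` boot attempts before declaring the
# launch failed, which is why one provider means 3 attempts and
# two providers mean up to 6.
def launch(self, node):
    retries = self.provider.launch_retries   # 'launch-retries' in config
    attempts = 1
    while attempts <= retries:
        try:
            self._launchNode(node)   # stand-in: create server, wait ready
            return
        except Exception:
            if attempts == retries:
                # Out of attempts: the handler marks this launch failed
                # and may decline the request for this provider.
                raise
            attempts += 1
```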
tobiash | are there any updated plans for an ansible 2.5 upgrade? | 18:36 |
tobiash | on the mailing list there was a discussion about a flag day (which was last week) | 18:36 |
tobiash | we have it in production now and things look normal | 18:36 |
tobiash | (since friday) | 18:37 |
*** Guest16323 is now known as mgagne | 18:40 | |
*** mgagne has joined #zuul | 18:40 | |
*** ssbarnea_ has quit IRC | 18:56 | |
*** ssbarnea_ has joined #zuul | 18:57 | |
corvus | tobiash: my guess is that we're probably not going to have the team bandwidth to do it until after the summit (so sometime after may 28) | 19:26 |
Shrews | So, I have no idea how to deal with quota calculation when a node can be registered in zk with multiple types | 19:56 |
Shrews | This is starting to become a significant change | 19:57 |
clarkb | that cost is node specific not label right? | 20:00 |
Shrews | see quotaNeededByNodeType() | 20:01 |
Shrews | am i just supposed to choose any (or the first) label that has a matching name? or consider all matches? | 20:05
clarkb | Shrews: the request is always going to be for a specific type right? | 20:08 |
Shrews | yep | 20:08 |
clarkb | quotaNeededByNodeType then looks up that image type in the provider which tells it what the flavor is and that determines the quota cost | 20:08 |
clarkb | that node may fulfill some other label as well but it will have the same cost because the flavor is fixed for that node | 20:08 |
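For reference, the method Shrews names is roughly this shape in the OpenStack driver (paraphrased, details approximate; the dict index on `pool.labels` is exactly what multi-label support breaks):

```python
# Roughly the existing lookup: requested node type (label name) ->
# pool label -> flavor -> quota cost. The cost is per-flavor, so a
# node that happens to satisfy other labels too costs the same.
def quotaNeededByNodeType(self, ntype, pool):
    provider_label = pool.labels[ntype]   # assumes a name-keyed dict
    flavor = self._findFlavor(provider_label.flavor_name,
                              provider_label.min_ram)
    return QuotaInformation.construct_from_flavor(flavor)
```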
Shrews | but i'm going to have to pre-choose the label early and indicate that to all places where we now use node.type to map to the label. otherwise we could be comparing apples and oranges with different matches | 20:10 |
clarkb | but in the context of a request (where the quota matters aiui) there is only a single type | 20:11 |
Shrews | the request may have a single type, but that type can be represented in nodepool by many pool labels | 20:11 |
Shrews | which now must be chosen early rather than relying on a 1-to-1 mapping | 20:12 |
Shrews | ugh | 20:12 |
clarkb | I think the type would still be for the original request? | 20:13 |
clarkb | or carry the requested type as a new field | 20:13 |
clarkb | it isn't possible to have a node with multiple flavors so you still have a 1:1 mapping in that way | 20:15 |
clarkb | ya hasProviderQuota() should just work due to requests still being 1:1 label:node | 20:16 |
clarkb | and similar for hasRemainingQuota I think | 20:17 |
clarkb | if you keep the needed_types based on the request itself and indepedent of the resulting type sets I think it should work? | 20:18 |
corvus | er, type is another word for label | 20:21 |
corvus | we just changed what we called it after initial implementation | 20:22 |
corvus | we should be able to s/type/label/ in nodepool | 20:22 |
*** rlandy is now known as rlandy|brb | 20:23 | |
corvus | Shrews: ^ does that clarify things? i don't think OpenStackProvider.quotaNeededByNodeType needs any changes | 20:26 |
*** rlandy|brb is now known as rlandy | 20:41 | |
*** pcaruana has quit IRC | 20:43 | |
*** acozine1 has quit IRC | 20:48 | |
*** dkranz has quit IRC | 20:55 | |
*** dkranz has joined #zuul | 20:57 | |
*** dkranz has quit IRC | 21:04 | |
Shrews | corvus: it has to be changed. pool.labels can no longer be indexed by a label name/type | 21:07 |
Shrews | that is going to be changed from a dict to a list (which will have to be searched to find a label matching the requested label) | 21:08 |
corvus | Shrews: then, yeah, if we're going to allow the openstack provider to have duplicate labels (i think we talked about possibly doing that as a follow-up since it's not strictly necessary for the underlying api support) then whatever method we use to choose which actual-label to use for a given requested-label would need to apply to the quota check. i can only think of 'random' or 'first-matching' as making | 21:12 |
corvus | sense. | 21:12 |
Shrews | I admit, this might be quite difficult for others to comprehend w/o seeing the actual code that I've written | 21:13
Shrews | corvus: i sort of think we HAVE to change the openstack provider since this affects the driver api | 21:14 |
Shrews | i think i might see a path forward. it's just uphill... and snowing... with a driving wind in my face | 21:15 |
corvus | Shrews: i thought we were maintaining a distinction between updating the api to allow that, without actually supporting duplicate labels in dynamic drivers | 21:15 |
corvus | * We can make implementation of multi-label support in the dynamic drivers | 21:15 |
corvus | optional. I don't expect we'd add support of this to the OpenStack driver | 21:15 |
corvus | with the initial change. It can be added at a later date, if desired. | 21:15 |
corvus | from the email ^ | 21:15 |
*** sshnaidm|off has quit IRC | 21:16 | |
Shrews | corvus: i know. i'm just not seeing a way to avoid it at this point | 21:16 |
corvus | Shrews: is OpenStackProvider.quotaNeededByNodeType an API method? | 21:17 |
Shrews | no | 21:17 |
corvus | Shrews: it can continue to assume that there's only one label object for a given label name.... oh, is the issue that the config datastructure is changing? | 21:19 |
corvus | and the config datastructure is not part of the driver api? | 21:19 |
Shrews | the problem is that our code references pool.labels as a dict (not a properly defined part of the API... yes, you are correct). the static driver cannot support multiple labels without changing that | 21:20 |
Shrews | and changing that affects the other drivers | 21:20 |
Shrews | so i'm making it a proper part of the api | 21:20 |
corvus | Shrews: maybe we should hand off configuration parsing of providers completely to their drivers? | 21:21 |
Shrews | so that you have to pool.addLabel(x), and we can then do pool.getLabels() or pool.getFirstMatchingLabel() | 21:21 |
Shrews | corvus: that might be another valid way | 21:21 |
corvus | Shrews: that would put configuration options related to openstack in the openstack driver, and aws in the aws driver, etc | 21:22 |
Shrews | but there are certain structures we expect to have populated (like pools). We lose enforcement of that | 21:22 |
corvus | that's the model that zuul uses for the triggers and reporters -- zuul parses up to the point where it knows "this dict is a config blob for a github trigger", then hands it to the github driver to parse | 21:22
corvus | Shrews: we could expect certain outcomes from a driver -- like, it returns a list of pools (and possibly images, etc), after parsing the configuration. | 21:24 |
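A hedged sketch of the handoff corvus describes, modeled on zuul's trigger/reporter pattern (every name here is hypothetical; the point is that nodepool core only learns which driver owns a provider block and expects a list of pools back):

```python
class Label:
    """Driver-parsed label; duplicate names are allowed."""
    def __init__(self, cfg):
        self.name = cfg['name']
        self.cfg = cfg                 # driver-specific settings

class Pool:
    def __init__(self, name):
        self.name = name
        self.labels = []               # a list, not a name-keyed dict

    def addLabel(self, label):
        self.labels.append(label)

class OpenStackProviderConfig:
    """Hypothetical: the driver owns all parsing of its own config blob."""
    def load(self, provider_cfg):
        pools = []
        for pool_cfg in provider_cfg.get('pools', []):
            pool = Pool(pool_cfg['name'])
            for label_cfg in pool_cfg.get('labels', []):
                pool.addLabel(Label(label_cfg))
            pools.append(pool)
        return pools                   # all nodepool core ever sees
```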
Shrews | corvus: let me put some thought into it. thx for the idea | 21:24 |
Shrews | that might end up easier | 21:25 |
corvus | cool | 21:25 |
Shrews | i still worry about handling labels outside of the drivers though | 21:26 |
Shrews | might end up at the same place :/ | 21:26 |
corvus | Shrews: maybe we can aim for making it as opaque to nodepool as possible; it's just calling driver methods with a (label) argument, and they're dealing with internal data structures on their own. | 21:27 |
corvus | but yeah, it's not easy to see that far ahead, even if we squint :) | 21:28 |
Shrews | actually... the drivers already handle all of the provider config parsing | 21:28 |
Shrews | it's the things in drivers/__init__.py that would have to be moved inside the driver itself | 21:29
Shrews | maybe we can abstract most of that away | 21:30 |
Shrews | oh look. beer-thirty | 21:31 |
clarkb | pool.labels is already a list? | 21:32 |
clarkb | as for handling duplicates in a list, pass in the entire pool.labels entry that is chosen rather than just the type/label name, I guess | 21:35
clarkb | and use that to determine the flavor when calculating quota and when booting the instance | 21:35 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Move SQL web handler to driver https://review.openstack.org/568028 | 21:37 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: WIP replace use of aiohttp with cherrypy https://review.openstack.org/567959 | 21:37 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Convert streaming unit test to ws4py and remove aiohttp https://review.openstack.org/568335 | 21:37 |
corvus | clarkb: pool.labels is a list in configuration, but when parsed, it's turned into a dict -- because you can't have duplicate labels currently. i believe the issue shrews is describing is that if a driver supports multiple labels with the same name, how do we select which one to use? i'm suggesting that we should adjust the driver api so that nodepool itself doesn't have to answer that, and so that all of | 21:40
corvus | the following choices are possible inside of a driver: pick the first available (static driver); don't support multiple labels with the same name (openstack driver); maybe pick one at random or something (possible future implementation of openstack driver) | 21:40 |
clarkb | gotcha | 21:43 |
clarkb | corvus: on https://review.openstack.org/#/c/568028/4/zuul/cmd/web.py seems like we should readjust the existing connections parameter to be passed the data we need and then update to find the connections data as necessary from there? | 21:45 |
clarkb | I haven't looked to see how much of that changes with cherrypy though | 21:45 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: WIP replace use of aiohttp with cherrypy https://review.openstack.org/567959 | 21:46 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Convert streaming unit test to ws4py and remove aiohttp https://review.openstack.org/568335 | 21:46 |
corvus | clarkb: absolutely -- that happens in the cherrypy change. i just consider that temporary scaffolding to try to keep the changes separate | 21:46 |
corvus | (the cherrypy change moves most of the iterating over connections, etc, to inside zuulweb, so we just pass in the connection registry object) | 21:48 |
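The shape corvus describes might look roughly like this after the cherrypy change (hypothetical names throughout; the real patches are 567959 and 568028 linked above):

```python
# Hypothetical sketch: zuul-web receives the whole connection
# registry and iterates it internally, instead of the command-line
# entry point pre-digesting per-connection data for it.
class ZuulWeb:
    def __init__(self, listen_address, listen_port, connections):
        self.connections = connections

    def start(self):
        for name, connection in self.connections.connections.items():
            handler = connection.getWebHandler(self)   # hypothetical hook
            if handler:
                self.registerHandler(name, handler)    # hypothetical
```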
*** harlowja has quit IRC | 21:56 | |
*** ssbarnea_ has quit IRC | 21:58 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: WIP replace use of aiohttp with cherrypy https://review.openstack.org/567959 | 22:10 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Convert streaming unit test to ws4py and remove aiohttp https://review.openstack.org/568335 | 22:10 |
*** threestrands has joined #zuul | 22:16 | |
*** rlandy has quit IRC | 23:20 | |
*** snapiri has quit IRC | 23:37 |