Monday, 2018-05-14

00:36 <openstackgerrit> Ian Wienand proposed openstack-infra/nodepool master: Default image creation to qcow2 type  https://review.openstack.org/566437
01:18 <ianw> any idea why running unit tests manually I get zero stdout/err/any logging at all?
01:18 <openstackgerrit> Merged openstack-infra/nodepool master: Default image creation to qcow2 type  https://review.openstack.org/566437
01:18 <ianw> http://paste.openstack.org/show/720873/
01:29 <tristanC> ianw: perhaps try with "testr run $test_name" after activating the venv?
01:32 <ianw> tristanC: i've deleted the OS_CAPTURE etc lines from .testr.conf and now do see the relevant exception
01:32 <ianw> something weird is going on, but my desire to dig into stdout tracing in unit testing isn't strong atm :)
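For context on the OS_CAPTURE exchange above: a typical OpenStack-project .testr.conf gates output capture on environment variables, so deleting those lines (or exporting the variables as 0) lets stdout/stderr and log output through. A representative sketch, not ianw's actual file:

```ini
[DEFAULT]
test_command=OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \
             OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \
             OS_LOG_CAPTURE=${OS_LOG_CAPTURE:-1} \
             ${PYTHON:-python} -m subunit.run discover -t ./ ./tests $LISTOPT $IDOPTION
test_id_option=--load-list $IDFILE
test_list_option=--list
```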
05:59 <openstackgerrit> Ian Wienand proposed openstack-infra/zuul master: Ignore extra routes  https://review.openstack.org/568195
06:14 <openstackgerrit> Ian Wienand proposed openstack-infra/zuul master: Ignore extra routes  https://review.openstack.org/568195
07:14 <openstackgerrit> Artem Goncharov proposed openstack-infra/zuul master: fill `delta` with '0' for `creates` and `removes` command.  https://review.openstack.org/567864
07:19 <openstackgerrit> Ian Wienand proposed openstack-infra/zuul master: Only cap aiohttp for python 3.5.2 and below  https://review.openstack.org/567663
07:19 <openstackgerrit> Ian Wienand proposed openstack-infra/zuul master: Remove env marker from uvloop  https://review.openstack.org/567665
07:19 <openstackgerrit> Ian Wienand proposed openstack-infra/zuul master: Ignore extra routes  https://review.openstack.org/568195
07:19 <openstackgerrit> Ian Wienand proposed openstack-infra/zuul master: await in test_websocket_streaming calls  https://review.openstack.org/568214
07:44 <openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: gerrit: add support for report only connection  https://review.openstack.org/568216
07:53 <openstackgerrit> Ian Wienand proposed openstack-infra/zuul master: await in test_websocket_streaming calls  https://review.openstack.org/568214
07:54 <openstackgerrit> Ian Wienand proposed openstack-infra/zuul master: Ignore extra routes  https://review.openstack.org/568195
13:43 <Shrews> corvus: thank you. i now have a Warrant song in my head that i cannot get rid of
13:48 <Shrews> "Zuul's my cherry pie, cool drink of water such a sweet surprise, streams so good make a grown man cry, sweet cherry pie."
13:48 <Shrews> there... now you all must suffer
13:50 <pabelanger> ha
14:11 <mordred> Shrews: I'm pretty sure SpamapS needs to not miss that ^^
14:14 <pabelanger> so, I've noticed the linaro cloud has a pretty high rate of failure to launch a VM, which results in NODE_FAILURE on a job. We've had to recheck a lot more now, since before nodepool would keep trying until successful.
17:37 <openstackgerrit> James E. Blair proposed openstack-infra/zuul master: Move SQL web handler to driver  https://review.openstack.org/568028
17:37 <openstackgerrit> James E. Blair proposed openstack-infra/zuul master: WIP replace use of aiohttp with cherrypy  https://review.openstack.org/567959
17:37 <openstackgerrit> James E. Blair proposed openstack-infra/zuul master: Convert streaming unit test to ws4py and remove aiohttp  https://review.openstack.org/568335
17:40 <openstackgerrit> James E. Blair proposed openstack-infra/zuul master: Convert streaming unit test to ws4py and remove aiohttp  https://review.openstack.org/568335
17:41 <SpamapS> Shrews: you win the Zuul improve game. ;)
17:41 <SpamapS> improv
17:42 <corvus> Shrews also has a commanding lead in the improve game
17:42 <SpamapS> true
18:09 <clarkb> pabelanger: when you say before nodepool would keep trying until successful, that was true before the zuulv3 work, but zuulv3 nodepool has never done that aiui
18:09 <clarkb> pabelanger: the new request api is such that if all clouds capable of fulfilling a request fail or refuse, that becomes a NODE_FAILURE
18:09 <clarkb> pabelanger: since this is the only cloud that can provide arm64 nodes, its chances of doing that are much higher
18:13 <pabelanger> clarkb: right, in the case of linaro, it is actually failing to boot the node. It was nice in nodepool v2 to have it keep going until it eventually worked. Now we need to recheck, which may result in using more CI resources since linaro is having troubles. This is also another use case where a single cloud makes this much more likely, as you say
18:15 <clarkb> we might be able to update the request handler to retry if the result were to be a failure at that point
18:16 <clarkb> probably by resetting the request state to that of a completely unhandled request
18:21 <corvus> at what point would you like it to fail?
18:22 <clarkb> if the underlying failure came from the cloud and not nodepool, maybe never?
18:23 <pabelanger> never may be better for users, but means longer waiting for the job to run, and from an ops POV you might not know there is an issue with the cloud
18:23 <corvus> users may not know either.  that effectively puts the system into an infinite loop with no user feedback
18:23 <pabelanger> yah
18:24 <corvus> it's worth noting that by the time it's reported to zuul, the cloud will have already failed to boot a node 3 times.
18:24 <corvus> (nodepool does have internal retry logic)
18:25 <pabelanger> I do think it is reasonable to say we need to reach out to the linaro cloud and help debug the failures
18:25 <corvus> and that's configurable with the launch-retries parameter
18:25 <corvus> pabelanger: well, that's perhaps a task for openstack-infra
18:26 <pabelanger> corvus: yes, agreed
18:27 <corvus> clarkb: are you aware of launch-retries?  it's not clear from your earlier suggestion about changing the request handler behavior...
18:28 <clarkb> corvus: yes I am
18:28 <clarkb> but once all clouds enter that state it's done
18:28 <clarkb> so retries either failed or were unable to fulfill the request
18:28 <clarkb> which is different than the old v2 behavior
18:28 <corvus> clarkb: so your suggestion was something along the lines of: if all handlers have failed due to transient errors, start over?
18:29 <clarkb> corvus: ya, if all failed because nodepool can't fulfill it (say wrong label) then fail. But if the cloud says 500 error, just keep going. But stick it back on the request queue so that another cloud might be able to handle it
18:30 <pabelanger> is launch-retries limited to the cloud the node request was originally launched on?
18:31 <clarkb> my memory is yes, and if one cloud fails but not all clouds have failed, it will try the next one
18:32 <SpamapS> I dunno
18:32 <SpamapS> sounds to me like you just need to bump up the # of retries
18:32 <SpamapS> and ensure that the cloud operator is aware of their fail rate.
18:32 <corvus> pabelanger: it's how many times a single provider-launcher will attempt to launch before it declares the attempt failed
18:32 <corvus> so if there's one cloud provider, 3 attempts will be made.  if there are 2 providers, then 6 attempts.
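The launch-retries setting corvus refers to lives on the provider in the nodepool config. A hedged sketch of what bumping it up (SpamapS's suggestion) could look like; the provider, pool, and label names here are illustrative, not openstack-infra's real config:

```yaml
providers:
  - name: linaro
    driver: openstack
    cloud: linaro
    launch-retries: 6   # boot attempts per provider before its handler fails (default 3)
    pools:
      - name: main
        labels:
          - name: ubuntu-xenial-arm64
            flavor-name: large
            diskimage: ubuntu-xenial-arm64
```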
18:34 <pabelanger> ack, thanks
18:34 <clarkb> grep for retries in nodepool/driver/openstack/handler.py if you want to see the code
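The per-provider attempt loop corvus describes can be sketched generically. `launch_with_retries` and `LaunchError` are hypothetical names for illustration, not nodepool's actual code:

```python
class LaunchError(Exception):
    """Stand-in for a transient cloud-side boot failure."""


def launch_with_retries(launch, retries=3):
    """Call launch() up to `retries` times; re-raise the last error if all fail.

    This mirrors the behavior discussed above: a single provider-launcher
    makes a fixed number of attempts before declaring its handler failed.
    """
    last_exc = None
    for _attempt in range(retries):
        try:
            return launch()
        except LaunchError as exc:
            last_exc = exc
    raise last_exc
```

With one provider and the default of 3 retries, a cloud that fails twice and then boots still fulfills the request; a cloud that fails three times surfaces the failure to the handler.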
18:36 <tobiash> are there any updated plans for an ansible 2.5 upgrade?
18:36 <tobiash> on the mailing list there was a discussion about a flag day (which was last week)
18:36 <tobiash> we have it in production now and things look normal
18:37 <tobiash> (since friday)
19:26 <corvus> tobiash: my guess is that we're probably not going to have the team bandwidth to do it until after the summit (so sometime after may 28)
19:56 <Shrews> So, I have no idea how to deal with quota calculation when a node can be registered in zk with multiple types
19:57 <Shrews> This is starting to become a significant change
20:00 <clarkb> that cost is node specific, not label, right?
20:01 <Shrews> see quotaNeededByNodeType()
20:05 <Shrews> am i just supposed to choose any (or the first) pool label that matches the request? or consider all matches?
20:08 <clarkb> Shrews: the request is always going to be for a specific type right?
20:08 <Shrews> yep
20:08 <clarkb> quotaNeededByNodeType then looks up that type in the provider, which tells it what the flavor is, and that determines the quota cost
20:08 <clarkb> that node may fulfill some other label as well, but it will have the same cost because the flavor is fixed for that node
20:10 <Shrews> but i'm going to have to pre-choose the label early and indicate that to all the places where we now use node.type to map to the label. otherwise we could be comparing apples and oranges with different matches
20:11 <clarkb> but in the context of a request (where the quota matters aiui) there is only a single type
20:11 <Shrews> the request may have a single type, but that type can be represented in nodepool by many pool labels
20:12 <Shrews> which now must be chosen early rather than relying on a 1-to-1 mapping
20:12 <Shrews> ugh
20:13 <clarkb> I think the type would still be for the original request?
20:13 <clarkb> or carry the requested type as a new field
20:15 <clarkb> it isn't possible to have a node with multiple flavors, so you still have a 1:1 mapping in that way
20:16 <clarkb> ya, hasProviderQuota() should just work due to requests still being 1:1 label:node
20:17 <clarkb> and similar for hasRemainingQuota I think
20:18 <clarkb> if you keep the needed_types based on the request itself and independent of the resulting type sets I think it should work?
20:21 <corvus> er, type is another word for label
20:22 <corvus> we just changed what we called it after the initial implementation
20:22 <corvus> we should be able to s/type/label/ in nodepool
20:26 <corvus> Shrews: ^ does that clarify things?  i don't think OpenStackProvider.quotaNeededByNodeType needs any changes
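A minimal sketch of what the discussion describes quotaNeededByNodeType as doing, assuming the current 1:1 label-to-flavor mapping. The function name, dict shapes, and QuotaInformation fields here are illustrative, not nodepool's real signatures:

```python
from dataclasses import dataclass


@dataclass
class QuotaInformation:
    """Resource cost of fulfilling one node request."""
    instances: int = 0
    cores: int = 0
    ram: int = 0


def quota_needed_by_label(label_name, pool_labels, flavors):
    """Resolve a requested label to its pool entry, then to a flavor cost.

    With a 1:1 label:flavor mapping (the current behavior described
    above), the cost is fixed once the label is known -- even if the
    resulting node could later satisfy other labels too.
    """
    label = pool_labels[label_name]          # today: a dict keyed by label name
    flavor = flavors[label["flavor-name"]]   # the flavor determines the cost
    return QuotaInformation(instances=1,
                            cores=flavor["vcpus"],
                            ram=flavor["ram"])
```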
21:07 <Shrews> corvus: it has to be changed. pool.labels can no longer be indexed by a label name/type
21:08 <Shrews> that is going to be changed from a dict to a list (which will have to be searched to find a label matching the requested label)
21:12 <corvus> Shrews: then, yeah, if we're going to allow the openstack provider to have duplicate labels (i think we talked about possibly doing that as a follow-up since it's not strictly necessary for the underlying api support) then whatever method we use to choose which actual-label to use for a given requested-label would need to apply to the quota check.  i can only think of 'random' or 'first-matching' as making sense.
21:13 <Shrews> I admit, this might be quite difficult for others to comprehend w/o seeing the actual code that I've written
21:14 <Shrews> corvus: i sort of think we HAVE to change the openstack provider since this affects the driver api
21:15 <Shrews> i think i might see a path forward. it's just uphill... and snowing... with a driving wind in my face
21:15 <corvus> Shrews: i thought we were maintaining a distinction between updating the api to allow that, without actually supporting duplicate labels in dynamic drivers
21:15 <corvus> * We can make implementation of multi-label support in the dynamic drivers
21:15 <corvus> optional. I don't expect we'd add support of this to the OpenStack driver
21:15 <corvus> with the initial change. It can be added at a later date, if desired.
21:15 <corvus> from the email ^
21:16 <Shrews> corvus: i know. i'm just not seeing a way to avoid it at this point
21:17 <corvus> Shrews: is OpenStackProvider.quotaNeededByNodeType an API method?
21:17 <Shrews> no
21:19 <corvus> Shrews: it can continue to assume that there's only one label object for a given label name.... oh, is the issue that the config datastructure is changing?
21:19 <corvus> and the config datastructure is not part of the driver api?
21:20 <Shrews> the problem is that our code references pool.labels as a dict (not a properly defined part of the API... yes, you are correct). the static driver cannot support multiple labels without changing that
21:20 <Shrews> and changing that affects the other drivers
21:20 <Shrews> so i'm making it a proper part of the api
21:21 <corvus> Shrews: maybe we should hand off configuration parsing of providers completely to their drivers?
21:21 <Shrews> so that you have to pool.addLabel(x), and we can then do pool.getLabels() or pool.getFirstMatchingLabel()
21:21 <Shrews> corvus: that might be another valid way
21:22 <corvus> Shrews: that would put configuration options related to openstack in the openstack driver, and aws in the aws driver, etc
21:22 <Shrews> but there are certain structures we expect to have populated (like pools). We lose enforcement of that
21:22 <corvus> that's the model that zuul uses for the triggers and reporters -- zuul parses up to the point where it knows "this dict is a config blob for a github trigger", then hands it to the github driver to parse
21:24 <corvus> Shrews: we could expect certain outcomes from a driver -- like, it returns a list of pools (and possibly images, etc) after parsing the configuration.
21:24 <Shrews> corvus: let me put some thought into it. thx for the idea
21:25 <Shrews> that might end up easier
21:25 <corvus> cool
21:26 <Shrews> i still worry about handling labels outside of the drivers though
21:26 <Shrews> might end up at the same place :/
21:27 <corvus> Shrews: maybe we can aim for making it as opaque to nodepool as possible; it's just calling driver methods with a (label) argument, and they're dealing with internal data structures on their own.
21:28 <corvus> but yeah, it's not easy to see that far ahead, even if we squint :)
21:28 <Shrews> actually... the drivers already handle all of the provider config parsing
21:29 <Shrews> it's the things in drivers/__init__.py that would have to be moved inside the driver itself
21:30 <Shrews> maybe we can abstract most of that away
21:31 <Shrews> oh look. beer-thirty
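The parsing hand-off corvus proposes (mirroring zuul's trigger/reporter model: the core identifies the driver, then delegates the raw config blob and expects pools back) could look roughly like this. Every name below is hypothetical, not nodepool's actual API:

```python
class StaticDriver:
    """Illustrative driver that owns its own provider-config schema."""

    def parseProviderConfig(self, config):
        # The driver validates/parses however it likes; the caller only
        # sees the agreed-upon outcome: a list of pools.
        return list(config.get("pools", []))


# registry mapping driver names to driver objects (sketch)
DRIVERS = {"static": StaticDriver()}


def load_providers(raw_config):
    """Core-side loader: delegate each provider blob to its driver."""
    pools_by_provider = {}
    for provider in raw_config.get("providers", []):
        driver = DRIVERS[provider.get("driver", "static")]
        pools_by_provider[provider["name"]] = driver.parseProviderConfig(provider)
    return pools_by_provider
```

This keeps openstack-specific options inside the openstack driver, aws-specific ones inside the aws driver, and so on, while the core still gets the structures it expects (pools) back.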
21:32 <clarkb> pool.labels is already a list?
21:35 <clarkb> as for handling duplicates in a list, pass in the entire pool.labels entry that is chosen rather than just the type/label name, I guess
21:35 <clarkb> and use that to determine the flavor when calculating quota and when booting the instance
21:37 <openstackgerrit> James E. Blair proposed openstack-infra/zuul master: Move SQL web handler to driver  https://review.openstack.org/568028
21:37 <openstackgerrit> James E. Blair proposed openstack-infra/zuul master: WIP replace use of aiohttp with cherrypy  https://review.openstack.org/567959
21:37 <openstackgerrit> James E. Blair proposed openstack-infra/zuul master: Convert streaming unit test to ws4py and remove aiohttp  https://review.openstack.org/568335
21:40 <corvus> clarkb: pools.labels is a list in configuration, but when parsed, it's turned into a dict -- because you can't have duplicate labels currently.  i believe the issue shrews is describing is that if a driver supports multiple labels with the same name, how do we select which one to use?  i'm suggesting that we should adjust the driver api so that nodepool itself doesn't have to answer that, and so that all of the following choices are possible inside of a driver: pick the first available (static driver); don't support multiple labels with the same name (openstack driver); maybe pick one at random or something (possible future implementation of openstack driver)
21:43 <clarkb> gotcha
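Shrews's proposed list-backed pool API with a 'first matching' selection strategy could be sketched like so. The method names addLabel/getLabels/getFirstMatchingLabel come from the discussion above; the implementation is illustrative only:

```python
class Pool:
    """Sketch of a list-backed pool.labels, allowing duplicate label names."""

    def __init__(self):
        self._labels = []            # a list, so duplicates are permitted

    def addLabel(self, label):
        self._labels.append(label)

    def getLabels(self):
        return list(self._labels)

    def getFirstMatchingLabel(self, name):
        # 'first matching' -- the selection strategy suggested for the
        # static driver; 'random' would be another valid choice.
        for label in self._labels:
            if label["name"] == name:
                return label
        return None
```

Because selection lives behind the method, each driver can pick its own strategy (first-available, reject duplicates, random) without nodepool's core having to decide.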
21:45 <clarkb> corvus: on https://review.openstack.org/#/c/568028/4/zuul/cmd/web.py seems like we should readjust the existing connections parameter to be passed the data we need and then update to find the connections data as necessary from there?
21:45 <clarkb> I haven't looked to see how much of that changes with cherrypy though
21:46 <openstackgerrit> James E. Blair proposed openstack-infra/zuul master: WIP replace use of aiohttp with cherrypy  https://review.openstack.org/567959
21:46 <openstackgerrit> James E. Blair proposed openstack-infra/zuul master: Convert streaming unit test to ws4py and remove aiohttp  https://review.openstack.org/568335
21:46 <corvus> clarkb: absolutely -- that happens in the cherrypy change.  i just consider that temporary scaffolding to try to keep the changes separate
21:48 <corvus> (the cherrypy change moves most of the iterating over connections, etc, to inside zuulweb, so we just pass in the connection registry object)
22:10 <openstackgerrit> James E. Blair proposed openstack-infra/zuul master: WIP replace use of aiohttp with cherrypy  https://review.openstack.org/567959
22:10 <openstackgerrit> James E. Blair proposed openstack-infra/zuul master: Convert streaming unit test to ws4py and remove aiohttp  https://review.openstack.org/568335

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!