openstackgerrit | Ian Wienand proposed openstack-infra/nodepool master: Default image creation to qcow2 type https://review.openstack.org/566437 | 00:36 |
*** dkranz has quit IRC | 01:06 | |
*** snapiri has quit IRC | 01:07 | |
ianw | any idea why, when running unit tests manually, I get zero stdout/err or any logging at all? | 01:18
openstackgerrit | Merged openstack-infra/nodepool master: Default image creation to qcow2 type https://review.openstack.org/566437 | 01:18 |
ianw | http://paste.openstack.org/show/720873/ | 01:18 |
tristanC | ianw: perhaps try with "testr run $test_name" after activating the venv? | 01:29 |
ianw | tristanC: i've deleted the OS_CAPTURE etc lines from .testr.conf and now do see the relevant exception | 01:32 |
ianw | something weird is going on, but my desire to dig into stdout tracing in unit testing isn't strong atm :) | 01:32 |
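For context on the capture behavior discussed above: OpenStack-style projects of this era typically drive testr through a `.testr.conf` whose `test_command` exports `OS_STDOUT_CAPTURE`/`OS_STDERR_CAPTURE`/`OS_LOG_CAPTURE`, and the testtools/oslotest base test fixtures swallow stdout, stderr, and logging whenever those are truthy. A typical file looks like this (a generic example of the common pattern, not necessarily nodepool's exact file at the time):

```ini
[DEFAULT]
test_command=OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \
             OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \
             OS_LOG_CAPTURE=${OS_LOG_CAPTURE:-1} \
             ${PYTHON:-python} -m subunit.run discover -t ./ ${OS_TEST_PATH:-./tests} $LISTOPT $IDOPTION
test_id_option=--load-list $IDFILE
test_list_option=--list
```

Because of the `${VAR:-1}` defaults, exporting e.g. `OS_STDOUT_CAPTURE=0` before `testr run` should disable capture without editing the file, which is less invasive than deleting the lines as ianw did.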
*** snapiri has joined #zuul | 01:35 | |
*** jesusaur has quit IRC | 01:45 | |
*** swest has quit IRC | 01:51 | |
*** jesusaur has joined #zuul | 01:53 | |
*** swest has joined #zuul | 02:07 | |
*** threestrands has joined #zuul | 03:59 | |
*** threestrands_ has joined #zuul | 04:00 | |
*** threestrands_ has quit IRC | 04:01 | |
*** threestrands_ has joined #zuul | 04:02 | |
*** threestrands has quit IRC | 04:04 | |
*** swest has quit IRC | 04:20 | |
*** toabctl has joined #zuul | 04:21 | |
*** threestrands_ has quit IRC | 04:39 | |
*** swest has joined #zuul | 04:59 | |
*** swest has quit IRC | 05:03 | |
*** threestrands has joined #zuul | 05:04 | |
*** threestrands has quit IRC | 05:07 | |
*** swest has joined #zuul | 05:13 | |
openstackgerrit | Ian Wienand proposed openstack-infra/zuul master: Ignore extra routes https://review.openstack.org/568195 | 05:59 |
openstackgerrit | Ian Wienand proposed openstack-infra/zuul master: Ignore extra routes https://review.openstack.org/568195 | 06:14 |
*** pcaruana has joined #zuul | 06:31 | |
*** gtema has joined #zuul | 06:44 | |
*** gtema has quit IRC | 06:44 | |
*** gtema has joined #zuul | 06:45 | |
*** gtema has quit IRC | 06:45 | |
*** gtema has joined #zuul | 06:46 | |
openstackgerrit | Artem Goncharov proposed openstack-infra/zuul master: fill `delta` with '0' for `creates` and `removes` command. https://review.openstack.org/567864 | 07:14 |
openstackgerrit | Ian Wienand proposed openstack-infra/zuul master: Only cap aiohttp for python 3.5.2 and below https://review.openstack.org/567663 | 07:19 |
openstackgerrit | Ian Wienand proposed openstack-infra/zuul master: Remove env marker from uvloop https://review.openstack.org/567665 | 07:19 |
openstackgerrit | Ian Wienand proposed openstack-infra/zuul master: Ignore extra routes https://review.openstack.org/568195 | 07:19 |
openstackgerrit | Ian Wienand proposed openstack-infra/zuul master: await in test_websocket_streaming calls https://review.openstack.org/568214 | 07:19 |
*** sshnaidm|bbl is now known as sshnaidm | 07:33 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: gerrit: add support for report only connection https://review.openstack.org/568216 | 07:44 |
*** jpena|off is now known as jpena | 07:53 | |
openstackgerrit | Ian Wienand proposed openstack-infra/zuul master: await in test_websocket_streaming calls https://review.openstack.org/568214 | 07:53 |
openstackgerrit | Ian Wienand proposed openstack-infra/zuul master: Ignore extra routes https://review.openstack.org/568195 | 07:54 |
*** ssbarnea_ has joined #zuul | 07:55 | |
*** ssbarnea_ has quit IRC | 08:06 | |
*** ssbarnea_ has joined #zuul | 08:42 | |
*** bhavik1 has joined #zuul | 08:46 | |
*** bhavik1 has quit IRC | 09:01 | |
*** sshnaidm is now known as sshnaidm|rover | 09:12 | |
*** needsleep is now known as TheJulia | 11:25 | |
*** jpena is now known as jpena|lunch | 11:31 | |
*** weshay is now known as weshay_interview | 11:59 | |
*** dkranz has joined #zuul | 12:00 | |
*** jpena|lunch is now known as jpena | 12:25 | |
*** rlandy has joined #zuul | 12:33 | |
*** dmsimard is now known as dmsimard|off | 12:58 | |
*** weshay_interview is now known as weshay | 13:01 | |
*** dkranz has quit IRC | 13:14 | |
*** elyezer has quit IRC | 13:19 | |
*** elyezer has joined #zuul | 13:21 | |
*** gtema has quit IRC | 13:35 | |
Shrews | corvus: thank you. i now have a Warrant song in my head that i cannot get rid of | 13:43 |
Shrews | "Zuul's my cherry pie, cool drink of water such a sweet surprise, streams so good make a grown man cry, sweet cherry pie." | 13:48 |
Shrews | there... now you all must suffer | 13:48 |
pabelanger | ha | 13:50 |
mordred | Shrews: I'm pretty sure SpamapS needs to not miss that ^^ | 14:11 |
pabelanger | so, I've noticed the linaro cloud has a pretty high rate of failure to launch a VM, which results in NODE_FAILURE on a job. I've noticed we've had to recheck a lot more now, since before nodepool would keep trying until successful. | 14:14
*** dkranz has joined #zuul | 14:16 | |
*** bhavik1 has joined #zuul | 14:32 | |
*** bhavik1 has quit IRC | 14:32 | |
*** acozine1 has joined #zuul | 14:43 | |
*** pcaruana has quit IRC | 15:02 | |
*** jpena is now known as jpena|brb | 15:43 | |
*** pcaruana has joined #zuul | 16:20 | |
*** jpena|brb is now known as jpena | 16:24 | |
*** mugsie has quit IRC | 17:06 | |
*** mugsie has joined #zuul | 17:06 | |
*** mugsie has quit IRC | 17:06 | |
*** mugsie has joined #zuul | 17:06 | |
*** mugsie has quit IRC | 17:08 | |
*** mugsie has joined #zuul | 17:12 | |
*** mugsie has quit IRC | 17:12 | |
*** mugsie has joined #zuul | 17:12 | |
*** jpena is now known as jpena|off | 17:19 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Move SQL web handler to driver https://review.openstack.org/568028 | 17:37 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: WIP replace use of aiohttp with cherrypy https://review.openstack.org/567959 | 17:37 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Convert streaming unit test to ws4py and remove aiohttp https://review.openstack.org/568335 | 17:37 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Convert streaming unit test to ws4py and remove aiohttp https://review.openstack.org/568335 | 17:40 |
SpamapS | Shrews: you win the Zuul improve game. ;) | 17:41 |
SpamapS | improv | 17:41 |
corvus | Shrews also has a commanding lead in the improve game | 17:42 |
SpamapS | true | 17:42 |
*** sshnaidm|rover is now known as sshnaidm|off | 18:07 | |
clarkb | pabelanger: when you say before nodepool would keep trying until successful that was true before the zuulv3 work, but zuulv3 nodepool has never done that aiui | 18:09 |
clarkb | pabelanger: the new request api is such that if all clouds capable of fulfilling a request fail or refuse that becomes a NODE_FAILURE | 18:09 |
clarkb | pabelanger: since this is the only cloud that can provide arm64 nodes its chances of doing that are much higher | 18:09 |
*** harlowja has joined #zuul | 18:12 | |
pabelanger | clarkb: right, in the case of linaro, it is actually failing to boot the node. It was nice in nodepool v2 to have it keep going until it eventually worked. Now we need to recheck, which may result in using more CI resources since linaro is having troubles. This is also another use case where a single cloud causes this to be much higher, as you say | 18:13
clarkb | we might be able to update the request handler to retry if the result were to be a failure at that point | 18:15 |
clarkb | probably by resetting the request state to that of a completely unhandled request | 18:16 |
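A minimal sketch of what clarkb is suggesting, assuming nodepool's ZooKeeper request API of the time (`storeNodeRequest` and the `declined_by`/`state` fields are real; the function and its `transient` flag are hypothetical):

```python
# Hypothetical: instead of failing a request outright once every
# provider has declined it, put it back as if it were never handled.
def finish_request(zk, request, transient):
    if transient:
        # Reset to a completely unhandled request: clear the list of
        # launchers that declined it and return it to the REQUESTED
        # state so any provider (including the failing one) retries.
        request.declined_by = []
        request.state = 'requested'   # the zk module's REQUESTED constant
    else:
        # Permanent problem, e.g. no provider serves this label:
        # fail it, and zuul reports NODE_FAILURE on the job.
        request.state = 'failed'      # the zk module's FAILED constant
    zk.storeNodeRequest(request)
```

As the discussion below notes, some bound on these resets would still be needed, or a bad cloud turns into an infinite loop with no user feedback.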
corvus | at what point would you like it to fail? | 18:21 |
clarkb | if the underlying failure came from the cloud and not nodepool maybe never? | 18:22 |
pabelanger | never may be better for users, since it means longer waiting for the job to run, but from an ops POV you might not know there is an issue with the cloud | 18:23
corvus | users may not know either. that effectively puts the system into an infinite loop with no user feedback | 18:23 |
pabelanger | yah | 18:23 |
corvus | it's worth noting that by the time it's reported to zuul, the cloud will have already failed to boot a node 3 times. | 18:24 |
corvus | (nodepool does have internal retry logic) | 18:24 |
pabelanger | I do think it is reasonable to say we need to reach out to linaro cloud and help debug the failures | 18:25 |
corvus | and that's configurable with the launch-retries parameter | 18:25
corvus | pabelanger: well, that's perhaps a task for openstack-infra | 18:25 |
pabelanger | corvus: yes, agree | 18:26 |
corvus | clarkb: are you aware of launch-retries? it's not clear from your earlier suggestion about changing the request handler behavior... | 18:27 |
clarkb | corvus: yes I am | 18:28 |
clarkb | but once all clouds enter that state it's done | 18:28
clarkb | so retries failed or unable to fulfill request | 18:28 |
clarkb | which is different from the old v2 behavior | 18:28
corvus | clarkb: so your suggestion was something along the lines of: if all handlers have failed due to transient errors, start over? | 18:28 |
clarkb | corvus: ya, if all failed because nodepool can't fulfill it (say wrong label) then fail. But if cloud says 500 error just keep going. But stick it back on the request queue so that another cloud might be able to handle it | 18:29 |
pabelanger | is launch-retries limited to the cloud the node request was originally launched on? | 18:30 |
clarkb | my memory is yes, and if one cloud fails but not all clouds have failed it will try the next one | 18:31 |
SpamapS | I dunno | 18:32 |
SpamapS | sounds to me like you just need to bump up the # of retries | 18:32 |
SpamapS | and ensure that the cloud operator is aware of their fail rate. | 18:32 |
corvus | pabelanger: it's how many times a single provider-launcher will attempt to launch before it declares the attempt failed | 18:32 |
corvus | so if there's one cloud provider, 3 attempts will be made. if there are 2 providers, then 6 attempts. | 18:32 |
pabelanger | ack, thanks | 18:34 |
clarkb | nodepool/driver/openstack/handler.py grep for retries if you want to see the code | 18:34 |
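The code clarkb points at is approximately this shape (paraphrased from the OpenStack handler; `launch-retries` is the real provider-level knob in nodepool.yaml, default 3, while `_launchNode` stands in for the actual boot-and-wait logic):

```python
# Approximate shape of the per-provider retry loop: each launcher
# makes up to `launch-retries` boot attempts before declaring the
# launch failed, which is why one provider means 3 attempts and
# two providers mean up to 6.
def launch(self, node):
    retries = self.provider.launch_retries   # 'launch-retries' in config
    attempts = 1
    while attempts <= retries:
        try:
            self._launchNode(node)   # stand-in: create server, wait ready
            return
        except Exception:
            if attempts == retries:
                # Out of attempts: the handler marks this launch failed
                # and may decline the request for this provider.
                raise
            attempts += 1
```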
tobiash | are there any updated plans for an ansible 2.5 upgrade? | 18:36 |
tobiash | on the mailing list there was a discussion about a flag day (which was last week) | 18:36 |
tobiash | we have it in production now and things look normal | 18:36 |
tobiash | (since friday) | 18:37 |
*** Guest16323 is now known as mgagne | 18:40 | |
*** mgagne has joined #zuul | 18:40 | |
*** ssbarnea_ has quit IRC | 18:56 | |
*** ssbarnea_ has joined #zuul | 18:57 | |
corvus | tobiash: my guess is that we're probably not going to have the team bandwidth to do it until after the summit (so sometime after may 28) | 19:26 |
Shrews | So, I have no idea how to deal with quota calculation when a node can be registered in zk with multiple types | 19:56 |
Shrews | This is starting to become a significant change | 19:57 |
clarkb | that cost is node specific not label right? | 20:00 |
Shrews | see quotaNeededByNodeType() | 20:01 |
Shrews | am i just supposed to choose any (or the first) label that has a matching name? or consider all matches? | 20:05
clarkb | Shrews: the request is always going to be for a specific type right? | 20:08 |
Shrews | yep | 20:08 |
clarkb | quotaNeededByNodeType then looks up that image type in the provider which tells it what the flavor is and that determines the quota cost | 20:08 |
clarkb | that node may fulfill some other label as well but it will have the same cost because the flavor is fixed for that node | 20:08 |
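For reference, the method Shrews names is roughly this shape in the OpenStack driver (paraphrased, details approximate; the dict index on `pool.labels` is exactly what multi-label support breaks):

```python
# Roughly the existing lookup: requested node type (label name) ->
# pool label -> flavor -> quota cost. The cost is per-flavor, so a
# node that happens to satisfy other labels too costs the same.
def quotaNeededByNodeType(self, ntype, pool):
    provider_label = pool.labels[ntype]   # assumes a name-keyed dict
    flavor = self._findFlavor(provider_label.flavor_name,
                              provider_label.min_ram)
    return QuotaInformation.construct_from_flavor(flavor)
```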
Shrews | but i'm going to have to pre-choose the label early and indicate that to all places where we now use node.type to map to the label. otherwise we could be comparing apples and oranges with different matches | 20:10 |
clarkb | but in the context of a request (where the quota matters aiui) there is only a single type | 20:11 |
Shrews | the request may have a single type, but that type can be represented in nodepool by many pool labels | 20:11 |
Shrews | which now must be chosen early rather than relying on a 1-to-1 mapping | 20:12 |
Shrews | ugh | 20:12 |
clarkb | I think the type would still be for the original request? | 20:13 |
clarkb | or carry the requested type as a new field | 20:13 |
clarkb | it isn't possible to have a node with multiple flavors so you still have a 1:1 mapping in that way | 20:15 |
clarkb | ya hasProviderQuota() should just work due to requests still being 1:1 label:node | 20:16 |
clarkb | and similar for hasRemainingQuota I think | 20:17 |
clarkb | if you keep the needed_types based on the request itself and indepedent of the resulting type sets I think it should work? | 20:18 |
corvus | er, type is another word for label | 20:21 |
corvus | we just changed what we called it after initial implementation | 20:22 |
corvus | we should be able to s/type/label/ in nodepool | 20:22 |
*** rlandy is now known as rlandy|brb | 20:23 | |
corvus | Shrews: ^ does that clarify things? i don't think OpenStackProvider.quotaNeededByNodeType needs any changes | 20:26 |
*** rlandy|brb is now known as rlandy | 20:41 | |
*** pcaruana has quit IRC | 20:43 | |
*** acozine1 has quit IRC | 20:48 | |
*** dkranz has quit IRC | 20:55 | |
*** dkranz has joined #zuul | 20:57 | |
*** dkranz has quit IRC | 21:04 | |
Shrews | corvus: it has to be changed. pool.labels can no longer be indexed by a label name/type | 21:07 |
Shrews | that is going to be changed from a dict to a list (which will have to be searched to find a label matching the requested label) | 21:08 |
corvus | Shrews: then, yeah, if we're going to allow the openstack provider to have duplicate labels (i think we talked about possibly doing that as a follow-up since it's not strictly necessary for the underlying api support) then whatever method we use to choose which actual-label to use for a given requested-label would need to apply to the quota check. i can only think of 'random' or 'first-matching' as making | 21:12 |
corvus | sense. | 21:12 |
Shrews | I admit, this might be quite difficult for others to comprehend w/o seeing the actual code that I've written | 21:13
Shrews | corvus: i sort of think we HAVE to change the openstack provider since this affects the driver api | 21:14 |
Shrews | i think i might see a path forward. it's just uphill... and snowing... with a driving wind in my face | 21:15 |
corvus | Shrews: i thought we were maintaining a distinction between updating the api to allow that, without actually supporting duplicate labels in dynamic drivers | 21:15 |
corvus | * We can make implementation of multi-label support in the dynamic drivers | 21:15 |
corvus | optional. I don't expect we'd add support of this to the OpenStack driver | 21:15 |
corvus | with the initial change. It can be added at a later date, if desired. | 21:15 |
corvus | from the email ^ | 21:15 |
*** sshnaidm|off has quit IRC | 21:16 | |
Shrews | corvus: i know. i'm just not seeing a way to avoid it at this point | 21:16 |
corvus | Shrews: is OpenStackProvider.quotaNeededByNodeType an API method? | 21:17 |
Shrews | no | 21:17 |
corvus | Shrews: it can continue to assume that there's only one label object for a given label name.... oh, is the issue that the config datastructure is changing? | 21:19 |
corvus | and the config datastructure is not part of the driver api? | 21:19 |
Shrews | the problem is that our code references pool.labels as a dict (not a properly defined part of the API... yes, you are correct). the static driver cannot support multiple labels without changing that | 21:20 |
Shrews | and changing that affects the other drivers | 21:20 |
Shrews | so i'm making it a proper part of the api | 21:20 |
corvus | Shrews: maybe we should hand off configuration parsing of providers completely to their drivers? | 21:21 |
Shrews | so that you have to pool.addLabel(x), and we can then do pool.getLabels() or pool.getFirstMatchingLabel() | 21:21 |
Shrews | corvus: that might be another valid way | 21:21 |
corvus | Shrews: that would put configuration options related to openstack in the openstack driver, and aws in the aws driver, etc | 21:22 |
Shrews | but there are certain structures we expect to have populated (like pools). We lose enforcement of that | 21:22 |
corvus | that's the model that zuul uses for the triggers and reporters -- zuul parses up to the point where it knows "this dict is a config blob for a github trigger", then hands it to the github driver to parse | 21:22
corvus | Shrews: we could expect certain outcomes from a driver -- like, it returns a list of pools (and possibly images, etc), after parsing the configuration. | 21:24 |
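A hedged sketch of the handoff corvus describes, modeled on zuul's trigger/reporter pattern (every name here is hypothetical; the point is that nodepool core only learns which driver owns a provider block and expects a list of pools back):

```python
class Label:
    """Driver-parsed label; duplicate names are allowed."""
    def __init__(self, cfg):
        self.name = cfg['name']
        self.cfg = cfg                 # driver-specific settings

class Pool:
    def __init__(self, name):
        self.name = name
        self.labels = []               # a list, not a name-keyed dict

    def addLabel(self, label):
        self.labels.append(label)

class OpenStackProviderConfig:
    """Hypothetical: the driver owns all parsing of its own config blob."""
    def load(self, provider_cfg):
        pools = []
        for pool_cfg in provider_cfg.get('pools', []):
            pool = Pool(pool_cfg['name'])
            for label_cfg in pool_cfg.get('labels', []):
                pool.addLabel(Label(label_cfg))
            pools.append(pool)
        return pools                   # all nodepool core ever sees
```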
Shrews | corvus: let me put some thought into it. thx for the idea | 21:24 |
Shrews | that might end up easier | 21:25 |
corvus | cool | 21:25 |
Shrews | i still worry about handling labels outside of the drivers though | 21:26 |
Shrews | might end up at the same place :/ | 21:26 |
corvus | Shrews: maybe we can aim for making it as opaque to nodepool as possible; it's just calling driver methods with a (label) argument, and they're dealing with internal data structures on their own. | 21:27 |
corvus | but yeah, it's not easy to see that far ahead, even if we squint :) | 21:28 |
Shrews | actually... the drivers already handle all of the provider config parsing | 21:28 |
Shrews | it's the things in drivers/__init__.py that would have to be moved inside the driver itself | 21:29
Shrews | maybe we can abstract most of that away | 21:30 |
Shrews | oh look. beer-thirty | 21:31 |
clarkb | pool.labels is already a list? | 21:32 |
clarkb | as for handling duplicates in a list, pass in the entire pool.labels entry that is chosen rather than just the type/label name, I guess | 21:35
clarkb | and use that to determine the flavor when calculating quota and when booting the instance | 21:35 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Move SQL web handler to driver https://review.openstack.org/568028 | 21:37 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: WIP replace use of aiohttp with cherrypy https://review.openstack.org/567959 | 21:37 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Convert streaming unit test to ws4py and remove aiohttp https://review.openstack.org/568335 | 21:37 |
corvus | clarkb: pool.labels is a list in configuration, but when parsed, it's turned into a dict -- because you can't have duplicate labels currently. i believe the issue shrews is describing is that if a driver supports multiple labels with the same name, how do we select which one to use? i'm suggesting that we should adjust the driver api so that nodepool itself doesn't have to answer that, and so that all of | 21:40
corvus | the following choices are possible inside of a driver: pick the first available (static driver); don't support multiple labels with the same name (openstack driver); maybe pick one at random or something (possible future implementation of openstack driver) | 21:40 |
clarkb | gotcha | 21:43 |
clarkb | corvus: on https://review.openstack.org/#/c/568028/4/zuul/cmd/web.py seems like we should readjust the existing connections parameter to be passed the data we need and then update to find the connections data as necessary from there? | 21:45 |
clarkb | I haven't looked to see how much of that changes with cherrypy though | 21:45 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: WIP replace use of aiohttp with cherrypy https://review.openstack.org/567959 | 21:46 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Convert streaming unit test to ws4py and remove aiohttp https://review.openstack.org/568335 | 21:46 |
corvus | clarkb: absolutely -- that happens in the cherrypy change. i just consider that temporary scaffolding to try to keep the changes separate | 21:46 |
corvus | (the cherrypy change moves most of the iterating over connections, etc, to inside zuulweb, so we just pass in the connection registry object) | 21:48 |
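The shape corvus describes might look roughly like this after the cherrypy change (hypothetical names throughout; the real patches are 567959 and 568028 linked above):

```python
# Hypothetical sketch: zuul-web receives the whole connection
# registry and iterates it internally, instead of the command-line
# entry point pre-digesting per-connection data for it.
class ZuulWeb:
    def __init__(self, listen_address, listen_port, connections):
        self.connections = connections

    def start(self):
        for name, connection in self.connections.connections.items():
            handler = connection.getWebHandler(self)   # hypothetical hook
            if handler:
                self.registerHandler(name, handler)    # hypothetical
```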
*** harlowja has quit IRC | 21:56 | |
*** ssbarnea_ has quit IRC | 21:58 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: WIP replace use of aiohttp with cherrypy https://review.openstack.org/567959 | 22:10 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Convert streaming unit test to ws4py and remove aiohttp https://review.openstack.org/568335 | 22:10 |
*** threestrands has joined #zuul | 22:16 | |
*** rlandy has quit IRC | 23:20 | |
*** snapiri has quit IRC | 23:37 |