*** michael-beaver has quit IRC | 00:02 | |
*** jamesmcarthur has quit IRC | 00:04 | |
*** jamesmcarthur has joined #zuul | 00:10 | |
*** igordc has quit IRC | 00:15 | |
*** jamesmcarthur has quit IRC | 00:16 | |
*** mattw4 has quit IRC | 00:17 | |
*** jamesmcarthur has joined #zuul | 00:19 | |
*** jamesmcarthur has quit IRC | 00:41 | |
*** jamesmcarthur has joined #zuul | 00:42 | |
*** jamesmcarthur has quit IRC | 01:11 | |
*** jamesmcarthur has joined #zuul | 01:11 | |
openstackgerrit | Merged zuul/zuul master: Set url scheme on HTTP Gerrit events https://review.opendev.org/686054 | 01:22 |
*** jamesmcarthur has quit IRC | 01:41 | |
*** jamesmcarthur has joined #zuul | 01:42 | |
*** jamesmcarthur has quit IRC | 01:47 | |
*** spsurya has joined #zuul | 01:47 | |
*** jamesmcarthur has joined #zuul | 01:59 | |
*** jamesmcarthur has quit IRC | 02:03 | |
*** jamesmcarthur has joined #zuul | 02:04 | |
*** irclogbot_0 has quit IRC | 02:09 | |
*** jamesmcarthur has quit IRC | 02:09 | |
*** irclogbot_0 has joined #zuul | 02:13 | |
*** jamesmcarthur has joined #zuul | 02:33 | |
*** jamesmcarthur has quit IRC | 02:41 | |
*** saneax has joined #zuul | 03:18 | |
*** jamesmcarthur has joined #zuul | 03:37 | |
*** saneax has quit IRC | 03:37 | |
*** jamesmcarthur has quit IRC | 03:44 | |
*** recheck_ has joined #zuul | 04:35 | |
*** jangutter_ has joined #zuul | 04:36 | |
*** jangutter has quit IRC | 04:38 | |
*** recheck has quit IRC | 04:38 | |
*** fdegir9 has joined #zuul | 04:40 | |
*** fdegir has quit IRC | 04:40 | |
*** tobiash has quit IRC | 04:40 | |
*** mhu has quit IRC | 04:40 | |
*** Miouge has quit IRC | 04:40 | |
*** tobiash has joined #zuul | 04:41 | |
*** Miouge has joined #zuul | 04:42 | |
*** pcaruana has joined #zuul | 04:52 | |
*** jamesmcarthur has joined #zuul | 05:05 | |
*** jamesmcarthur has quit IRC | 05:10 | |
*** bhavikdbavishi has quit IRC | 05:25 | |
*** AJaeger has quit IRC | 05:57 | |
*** AJaeger has joined #zuul | 06:02 | |
*** jamesmcarthur has joined #zuul | 06:36 | |
*** jamesmcarthur has quit IRC | 06:49 | |
*** avass has joined #zuul | 06:52 | |
*** hashar has joined #zuul | 07:10 | |
*** badboy has joined #zuul | 07:12 | |
*** tosky has joined #zuul | 07:17 | |
*** jamesmcarthur has joined #zuul | 07:25 | |
*** jamesmcarthur has quit IRC | 07:31 | |
*** jamesmcarthur has joined #zuul | 07:39 | |
*** jamesmcarthur has quit IRC | 07:44 | |
*** jamesmcarthur has joined #zuul | 07:44 | |
*** jpena|off is now known as jpena | 07:48 | |
*** jamesmcarthur has quit IRC | 07:51 | |
*** jamesmcarthur has joined #zuul | 07:56 | |
*** mhu has joined #zuul | 08:01 | |
*** jamesmcarthur has quit IRC | 08:09 | |
*** jpena is now known as jpena|brb | 08:10 | |
*** jpena|brb is now known as jpena | 08:32 | |
*** bhavikdbavishi has joined #zuul | 08:34 | |
*** panda has quit IRC | 08:48 | |
*** panda has joined #zuul | 08:49 | |
*** bolg has joined #zuul | 09:08 | |
bolg | anybody working on mac free to review https://review.opendev.org/c/671674/5 ? Relatively small change = quick win :) | 09:09 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: Authorization rules: support YAML nested dictionaries https://review.opendev.org/684790 | 09:10 |
*** bolg has quit IRC | 09:35 | |
badboy | why did my logs stop being compressed? | 09:40 |
badboy | OOP [upload-logs : gzip console log and json output] | 09:40 |
badboy | localhost | skipping: Conditional result was False | 09:40 |
*** bhavikdbavishi has quit IRC | 09:41 | |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: authentication config: add optional token_expiry https://review.opendev.org/642408 | 09:48 |
*** bhavikdbavishi has joined #zuul | 09:49 | |
AJaeger | badboy: http://lists.zuul-ci.org/pipermail/zuul-announce/2019-September/000053.html - change merged yesterday | 09:55 |
*** hashar has quit IRC | 09:55 | |
badboy | AJaeger: thank you! | 10:07 |
badboy | AJaeger: is there a way to set timezone offset in the UI? | 10:14 |
*** badboy has quit IRC | 10:15 | |
*** badboy has joined #zuul | 10:18 | |
*** bolg has joined #zuul | 10:36 | |
*** zbr|ruck is now known as zbr|lunch | 10:39 | |
*** tosky_ has joined #zuul | 10:49 | |
*** tosky has quit IRC | 10:52 | |
*** tosky_ is now known as tosky | 10:58 | |
*** tosky_ has joined #zuul | 11:02 | |
*** tosky is now known as Guest18639 | 11:02 | |
*** tosky_ is now known as tosky | 11:02 | |
*** Guest18639 has quit IRC | 11:04 | |
*** openstackstatus has quit IRC | 11:09 | |
*** bhavikdbavishi has quit IRC | 11:14 | |
*** jpena is now known as jpena|lunch | 11:24 | |
*** jamesmcarthur has joined #zuul | 11:36 | |
*** jamesmcarthur has quit IRC | 11:43 | |
*** bhavikdbavishi has joined #zuul | 11:47 | |
*** badboy has quit IRC | 11:49 | |
*** jamesmcarthur has joined #zuul | 11:51 | |
*** bhavikdbavishi1 has joined #zuul | 11:53 | |
*** bhavikdbavishi has quit IRC | 11:54 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 11:54 | |
*** jamesmcarthur has quit IRC | 12:11 | |
*** jamesmcarthur has joined #zuul | 12:11 | |
*** ianychoi has quit IRC | 12:21 | |
*** ianychoi has joined #zuul | 12:25 | |
*** jpena|lunch is now known as jpena | 12:29 | |
*** jamesmcarthur has quit IRC | 12:30 | |
*** jamesmcarthur has joined #zuul | 12:30 | |
*** rlandy has joined #zuul | 12:40 | |
*** jamesmcarthur has quit IRC | 12:45 | |
*** jamesmcarthur has joined #zuul | 12:53 | |
*** jamesmcarthur has quit IRC | 12:54 | |
*** jamesmcarthur has joined #zuul | 12:54 | |
pabelanger | Shrews: I seem to have a stuck request: http://paste.openstack.org/show/780747/ but no more providers to try | 13:18 |
pabelanger | I'm not sure how to debug | 13:18 |
pabelanger | Shrews: http://paste.openstack.org/show/780749/ seems to be history of node request, which was all from yesterday | 13:23 |
pabelanger | so, looks like nodepool lost track of it somehow? | 13:23 |
frickler | pabelanger: I'm assuming the exception at 00:12:34,674 may tell more | 13:29 |
pabelanger | frickler: yah, that is a cloud outage in nodepool.PoolWorker.vexxhost-sjc1-v2-highcpu-1, and our 3rd attempt to launch. So we then proceeded to remove the request from there | 13:32 |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Evaluate CODEOWNERS settings during canMerge check https://review.opendev.org/644557 | 13:33 |
*** zbr|lunch is now known as zbr|ruck | 13:48 | |
*** michael-beaver has joined #zuul | 13:52 | |
Shrews | pabelanger: the only way I can see that happening is if there are still registered launchers that have not processed it | 14:05 |
pabelanger | Shrews: any suggestions how I can check that? | 14:06 |
*** panda has quit IRC | 14:06 | |
Shrews | pabelanger: comparing the "Declined By" list in http://paste.openstack.org/show/780747/ against running launchers? Or you can check the znodes in /nodepool/launchers with zk-shell or similar | 14:07 |
*** panda has joined #zuul | 14:08 | |
Shrews | pabelanger: if all registered launchers have declined the request, it will go to FAILED | 14:08 |
pabelanger | Shrews: yah, the declined-by list is all my launcher regions. But let me check zookeeper, maybe something else is there, like an old one | 14:09 |
pabelanger | hard to tell what it is waiting for | 14:09 |
Shrews | pabelanger: what's the exception that frickler pointed out? you left that out of the paste | 14:16 |
Shrews | bolg: I thought mac used kqueue, not epoll/poll? | 14:17 |
Shrews | none of that seems mac specific, unless they started supporting epoll at some point | 14:18 |
pabelanger | http://paste.openstack.org/show/780757/ | 14:19 |
pabelanger | sdkexception | 14:19 |
*** jamesmcarthur has quit IRC | 14:28 | |
*** openstackstatus has joined #zuul | 14:29 | |
*** ChanServ sets mode: +v openstackstatus | 14:29 | |
clarkb | Shrews: https://www.freebsd.org/cgi/man.cgi?poll only epoll is linux specific | 14:35 |
clarkb | kqueue is bsd specific. They both share poll and select | 14:35 |
clarkb | it's less about adding mac-specific code and more about supporting osx with poll alongside the better-performing epoll for linux | 14:36 |
Shrews | clarkb: oh, i thought he was adding epoll to existing poll support, but it's reversed. Perhaps I should have looked at the code rather than just the commit. :) | 14:37 |
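The portability point above can be sketched with Python's standard `select` module (a generic illustration, not the code in the review under discussion): `epoll` exists only on Linux, while `poll` is shared across Linux and the BSDs (including macOS), so falling back to `poll` keeps the same event loop working everywhere.

```python
import os
import select

def make_poller():
    """Prefer Linux-only epoll; fall back to the portable poll interface."""
    if hasattr(select, "epoll"):
        return select.epoll()
    return select.poll()

# Both objects expose register() and poll() with compatible event masks
# (POLLIN and EPOLLIN share the same numeric value), so callers need not
# care which implementation they got.
poller = make_poller()
r, w = os.pipe()
poller.register(r, select.POLLIN)
os.write(w, b"x")
events = poller.poll(0)  # a list of (fd, event-mask) pairs
```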
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Evaluate CODEOWNERS settings during canMerge check https://review.opendev.org/644557 | 14:40 |
openstackgerrit | Merged zuul/zuul-website master: CSS fix for ul/li in FAQ https://review.opendev.org/686003 | 14:41 |
pabelanger | Shrews: Ah, it looks like there is an old provider in zk for some reason | 14:46 |
pabelanger | but, not in my nodepool.yaml files | 14:46 |
pabelanger | how best to remove it from zk? | 14:46 |
Shrews | pabelanger: hrm. those znodes should be ephemeral, so that must mean it's still running somewhere | 14:46 |
clarkb | possible we don't remove providers unless we restart? just a hunch | 14:47 |
*** jamesmcarthur has joined #zuul | 14:48 | |
pabelanger | Shrews: pretty sure they are not registered in launchers ATM, I don't see references for them in logs | 14:48 |
*** avass has quit IRC | 14:48 | |
pabelanger | I can stop / start launch to see if that cleans it up | 14:49 |
pabelanger | it would be nice if nodepool info had a list flag or new command, to see what providers are configured and running | 14:50 |
pabelanger | Shrews: clarkb: okay, stop / start removed the info from zookeeper | 14:52 |
pabelanger | so, sounds like we might have a bug when we remove a provider, not cleaning up properly | 14:52 |
pabelanger | and that also removed my periodic jobs from zuul | 14:53 |
pabelanger | so, that was the issue | 14:53 |
pabelanger | ended up with NODE_FAILURE on said job waiting | 14:53 |
pabelanger | which makes sense | 13:53 |
Shrews | pabelanger: you should be able to use https://opendev.org/zuul/nodepool/src/branch/master/nodepool/tests/unit/test_launcher.py#L1476 to help catch the problem you saw | 14:57 |
Shrews | i'm not sure what's lacking there | 14:57 |
*** jangutter_ has quit IRC | 15:00 | |
pabelanger | Shrews: ++ I'll try to remember the order I did for the removal | 15:02 |
*** mattw4 has joined #zuul | 15:23 | |
clarkb | pabelanger: Shrews the issue is stop() on providers is a noop | 15:27 |
clarkb | however the provider isn't really a long running thread, the manager is and we call into the provider from the manager which probably explains why stop is a noop | 15:28 |
clarkb | but the zk connection is managed by the provider not the manager so there is no closure of that connection | 15:28 |
clarkb | sorry it is the handler with the state information about what is running not the provider manager | 15:32 |
clarkb | but the provider manager manages the zk connection | 15:32 |
clarkb | I think we want to push the zk connection up into the handler as it is what actually interacts with zookeeper | 15:33 |
clarkb | and it can make deicions on when to close the zk connection unlike the provider manager | 15:33 |
clarkb | oh except the node request handler is per request, hrm | 15:36 |
clarkb | in that case maybe we can simply have stop() in the provider close the zk connection and force it to simply error that request? | 15:36 |
clarkb | that likely won't help pabelanger with a single provider though | 15:36 |
clarkb | I expect that will result in node failures when reloading configs | 15:36 |
pabelanger | clarkb: for zuul.a.c, we have more than 1 provider. Which is nice! rdoproject is the place for single provider | 15:40 |
pabelanger | but also neat, you can see what the potential issue is | 15:40 |
*** tosky_ has joined #zuul | 15:41 | |
*** tosky is now known as Guest83058 | 15:41 | |
*** tosky_ is now known as tosky | 15:41 | |
*** Guest83058 has quit IRC | 15:43 | |
clarkb | reading even more, it is actually the pool worker threads that store the state I think we need (knowing when a provider manager has processed all requests coming to it) | 15:46 |
clarkb | What we want to do (roughly) is kill the zk connection once all NodeRequestHandlers associated with the current ProviderManager have completed | 15:47 |
Shrews | pabelanger: clarkb: the zk connection is shared, so i think we probably need the PoolWorker stop() to deregister the launcher | 15:48 |
Shrews | we don't currently have a method in zk.py to do that, so that will need to be added as well | 15:50 |
clarkb | Shrews: ya this is complicated by the fact that the provider manager and the zk connection are shared with multiple pool workers (where we have the running request state) | 15:51 |
*** ianychoi has quit IRC | 15:51 | |
clarkb | and ya maybe we can have the zk lib track users per connection. Then as we stop those users and go to zero we can close the tcp connection cleaning up the ephemeral nodes | 15:52 |
clarkb | oh but the zk connection is global for all providers right? | 15:52 |
clarkb | I guess that is the real underlying reason we don't clean up the ephemeral nodes | 15:53 |
*** ianychoi has joined #zuul | 15:54 | |
Shrews | right. we don't ever terminate the zk connection, except on shutdown. no need to track users, we just need to cleanup after thread stop | 15:55 |
Shrews | we just need a deregisterLauncher() call (does not yet exist) here: https://opendev.org/zuul/nodepool/src/branch/master/nodepool/launcher.py#L374 | 15:56 |
clarkb | you mean do explicit cleanup rather than wait for ephemeral cleanup. That would work too | 15:58 |
Shrews | yes. it is a bug that we depend *only* on ephemeral cleanup. that works when we shutdown nodepool, but not on dynamic changes | 15:58 |
Shrews | i've already got the code mostly written | 15:59 |
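The explicit cleanup Shrews describes can be sketched roughly as follows (a hypothetical illustration, not the actual nodepool patch: `LAUNCHER_ROOT` and the duck-typed `zk` client are assumptions; kazoo's `KazooClient` provides matching `exists()`/`delete()` methods). The point is to delete the launcher's registration znode when its worker stops, instead of waiting for ephemeral-node cleanup, which only fires when the shared ZooKeeper connection closes at shutdown.

```python
# Hypothetical sketch: explicitly remove a stopped launcher's registration
# znode rather than relying on ZooKeeper ephemeral-node cleanup.
LAUNCHER_ROOT = "/nodepool/launchers"  # assumed path layout

def deregister_launcher(zk, launcher_id):
    """Remove the launcher's znode if it is still registered."""
    path = "%s/%s" % (LAUNCHER_ROOT, launcher_id)
    if zk.exists(path):
        zk.delete(path)
```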
clarkb | Shrews: you should update the docstring on registerLauncher too. It does not automatically deregister when the launcher terminates (only when the zk connection goes away) | 16:01 |
*** jamesmcarthur has quit IRC | 16:06 | |
openstackgerrit | David Shrewsbury proposed zuul/nodepool master: Deregister a launcher when removed from config https://review.opendev.org/686198 | 16:09 |
Shrews | clarkb: pabelanger: confirmed the new test changes fail before adding the call to deregisterLauncher() ^^^ | 16:09 |
*** bolg has quit IRC | 16:09 | |
*** pcaruana has quit IRC | 16:10 | |
pabelanger | yay! Thanks for quick fix, will look shortly | 16:11 |
clarkb | Shrews: one small nit, but +2 anyway | 16:11 |
*** pcaruana has joined #zuul | 16:11 | |
Shrews | clarkb: i thought of that but decided against it. unnecessary since we don't modify any data | 16:12 |
Shrews | we just need the id | 16:12 |
clarkb | Shrews: I agree, but its nice to have symmetry | 16:14 |
clarkb | you register the same thing you deregister | 16:14 |
*** rlandy is now known as rlandy|brb | 16:22 | |
*** fdegir9 is now known as fdegir | 16:23 | |
openstackgerrit | Clark Boylan proposed zuul/zuul-jobs master: Replace command with shell in persistent-firewall https://review.opendev.org/686212 | 16:30 |
clarkb | corvus: pabelanger fungi ^ thats another "lets try tweaking things" change around persistent-firewall to see if we can get the behavior to change | 16:30 |
fungi | watch it be that iptables-save doesn't work with the minimal environment provided by command tasks or something | 16:33 |
clarkb | http://status.openstack.org/elastic-recheck/#1846093 even that shows some weird gaps | 16:34 |
clarkb | its definitely not happening all the time and happens on all the platforms (distro and cloud) and across jobs | 16:35 |
clarkb | which is why I expect it is an ansible bug | 16:35 |
*** rlandy|brb is now known as rlandy | 16:37 | |
*** tosky_ has joined #zuul | 16:40 | |
pabelanger | yah, I think it is ansible bug | 16:41 |
pabelanger | for me, I ended up switching to mitogen, which resulted in fast runs and no more -13 | 16:41 |
pabelanger | but, do want to figure out why it happens | 16:42 |
pabelanger | clarkb: +2, approve when ready | 16:42 |
clarkb | what does switching to mitogen look like? have to add package installs and set config flag? | 16:42 |
*** tosky has quit IRC | 16:43 | |
pabelanger | yah, pip install then ansible.cfg flag | 16:43 |
pabelanger | so far, it just works | 16:43 |
pabelanger | even tho totally unsupported by ansible core team | 16:43 |
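For reference, enabling mitogen is roughly the two steps pabelanger mentions: a pip install plus an ansible.cfg strategy switch. A hedged sketch (the plugin path depends on where pip installed the package and is an assumption here):

```ini
# after: pip install mitogen
[defaults]
# path is an assumption; locate ansible_mitogen in your site-packages
strategy_plugins = /usr/local/lib/python3/dist-packages/ansible_mitogen/plugins/strategy
strategy = mitogen_linear
```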
*** rfolco is now known as rfolco|dentist | 16:44 | |
*** tosky_ is now known as tosky | 16:45 | |
*** hashar has joined #zuul | 16:51 | |
corvus | tristanC: i'm trying to rework the log manifest tree view so that the folder names are no longer links to the raw log storage directory index, but instead behave just like the ">" icon to the left and expand the tree. i've spent quite some time trying to get the patternfly-react TreeView to behave in that way, but it seems constructed in just such a way that it's impossible to change the internal state of the TreeViewNode to set it to be expanded. | 16:55 |
corvus | tristanC: i've tried several approaches, but fundamentally i think the issue is that once the TreeViewNode instance is created, it only checks this.state.expanded to see if it should be expanded (it does not consult this.props.node -- so even though you can use that to set a default expanded state, once created, you can't change the prop to set the expanded state) | 16:57 |
corvus | tristanC: there is no reference to the TreeViewNode object that we can get to, so we can't call TreeViewNode.setState (which is what it does if you click on ">") | 16:58 |
corvus | tristanC: the only thing remaining i can think of is to find the DOM node, and work backward to the react component via __reactInternalInstance$... which seems like a very bad way of doing it. do you have any other ideas? | 16:59 |
corvus | code for easy reference: https://github.com/patternfly/patternfly-react/blob/master/packages/patternfly-3/patternfly-react/src/components/TreeView/TreeView.js https://github.com/patternfly/patternfly-react/blob/master/packages/patternfly-3/patternfly-react/src/components/TreeView/TreeViewNode.js | 17:00 |
corvus | fungi: ^ fyi | 17:01 |
fungi | corvus: thanks! i hadn't even gotten that far. playing around with what's in web/src/containers/build/Manifest.jsx wasn't yielding the results i anticipated | 17:03 |
fungi | and i wasn't having much luck trying to crash-course my way through reactjs internals | 17:04 |
tristanC | corvus: have you tried https://reactjs.org/docs/refs-and-the-dom.html#creating-refs ? | 17:14 |
corvus | tristanC: yeah, i have a ref to the TreeView, but there's no ref to the TreeViewNode | 17:15 |
openstackgerrit | Fabien Boucher proposed zuul/zuul master: WIP - Gitlab - Basic handling of merge_requests event https://review.opendev.org/685990 | 17:15 |
tristanC | corvus: even using the TreeView.nodes prop? ( https://github.com/patternfly/patternfly-react/blob/master/packages/patternfly-3/patternfly-react/src/components/TreeView/TreeView.js#L111 ) | 17:16 |
clarkb | "2019-10-02 17:11:10.027139 | ubuntu-bionic | 305 Use shell only when shell functionality is required" thank you ansible lint | 17:17 |
pabelanger | yah, I stopped using it and switched to yamllint. Too opinionated now, IMO | 17:18 |
openstackgerrit | Clark Boylan proposed zuul/zuul-jobs master: Replace command with shell in persistent-firewall https://review.opendev.org/686212 | 17:18 |
clarkb | pabelanger: corvus fungi ^ now with the skip linting tag | 17:19 |
corvus | tristanC: yeah, TreeView.nodes is our array of dictionaries that we pass in; it uses that to create TreeViewNode objects, but it doesn't keep track of them, and they are the objects that have the expanded state: https://github.com/patternfly/patternfly-react/blob/master/packages/patternfly-3/patternfly-react/src/components/TreeView/TreeView.js#L89-L91 | 17:19 |
corvus | clarkb: when you have a second, can you re-review https://review.opendev.org/683958 and child? i can work on deploying that once we have those images built | 17:21 |
clarkb | corvus: yup | 17:22 |
tristanC | corvus: arg i see, the map result is somehow discarded | 17:23 |
*** jpena is now known as jpena|off | 17:27 | |
tristanC | corvus: there may be a way to trigger the render() procedure of the TreeView component, e.g. by changing a dummy state... | 17:27 |
openstackgerrit | Merged zuul/zuul-registry master: Initial implementation https://review.opendev.org/683958 | 17:28 |
corvus | tristanC: i can do that by changing the nodes prop that i pass to TreeView -- it runs, but the issue is that react caches the TreeViewNode and just updates its props. once initialized, TreeViewNode doesn't consult its props anymore to determine whether it should be expanded, it only consults its state. | 17:32 |
corvus | tristanC: it's like they designed it perfectly to make it impossible to do this. | 17:32 |
*** pcaruana has quit IRC | 17:45 | |
*** jamesmcarthur has joined #zuul | 17:51 | |
openstackgerrit | Merged zuul/zuul-jobs master: Replace command with shell in persistent-firewall https://review.opendev.org/686212 | 17:54 |
*** igordc has joined #zuul | 18:03 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-registry master: Fix container image build https://review.opendev.org/685808 | 18:05 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-registry master: Add tox configuration and fixe flake8 errors https://review.opendev.org/686230 | 18:05 |
tristanC | oops, 685808 got rebased by mistake | 18:06 |
clarkb | weirdly the approval remained | 18:07 |
pabelanger | that is odd | 18:09 |
pabelanger | it looks like approval and new pS was same time | 18:09 |
pabelanger | oh, no sorry | 18:09 |
pabelanger | that was zuul enqueing | 18:09 |
clarkb | Considering the repo is new there isn't really anything to rebase on | 18:11 |
clarkb | gerrit may have recognized that? not sure what changed if anything | 18:11 |
clarkb | just the timestamps in the commit and the committer | 18:11 |
*** mattw4 has quit IRC | 18:16 | |
*** mattw4 has joined #zuul | 18:17 | |
clarkb | on the off chance that ansible explicitly exits with a -13 I grepped the source code for -13 and didn't find anything for return codes | 18:21 |
clarkb | lots of software versions and stuff though | 18:22 |
tristanC | corvus: would you mind trying strict mypy for zuul_registry? | 18:22 |
tristanC | clarkb: could this be something fixed in recent release? it seems like some opendev executors are not using updated ansible version, e.g. 2.8.0 instead of 2.8.5 | 18:23 |
clarkb | tristanC: correct pip doesn't update ansible due to how we specify our versions | 18:24 |
clarkb | That is something we could try, updating all the venvs to the latest point release of the respective versions | 18:24 |
openstackgerrit | Merged zuul/zuul-registry master: Fix container image build https://review.opendev.org/685808 | 18:33 |
*** bhavikdbavishi has quit IRC | 18:33 | |
*** bhavikdbavishi has joined #zuul | 18:35 | |
pabelanger | clarkb: Shrews: +a on nodepool zk fix today, thanks for helping | 18:40 |
*** mattw4 has quit IRC | 18:50 | |
corvus | tristanC: yes -- i'm not convinced about strict mypy yet, so i think zuul-registry is a great place to demonstrate. | 18:55 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-registry master: WIP: add type annotations https://review.opendev.org/686249 | 18:56 |
corvus | tristanC: er, in case that "yes" was unclear -- i am in favor of trying strict mypy with z-r. :) | 18:56 |
tristanC | corvus: alright, here is how it can be done ^ I'll do the other modules later today or tomorrow | 18:56 |
pabelanger | with zuul-registry, would you run 1 per region (for nodepool) or top level, alongside zuul-executors? I am guessing the 2nd option | 19:02 |
*** smcginnis has joined #zuul | 19:04 | |
smcginnis | I think I've noticed a zuul bug. | 19:04 |
smcginnis | The "ref url" link on https://zuul.opendev.org/t/openstack/build/6010c9c727e64db983744315768ad203 does not appear to be formatted properly. | 19:04 |
smcginnis | The resulting link gets you redirected to a list of tenants instead of the review. | 19:05 |
smcginnis | Looks like it may be appending the review URL to the build base URL. | 19:05 |
*** spsurya has quit IRC | 19:09 | |
clarkb | smcginnis: yes it ended up being rendered as a relative url due to the lack of the scheme component in the url | 19:10 |
clarkb | smcginnis: patch to fix this merged yesterday and requires a zuul scheduler restart to take effect | 19:10 |
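The failure mode clarkb describes is plain URL resolution: a stored ref url with no scheme is treated as a relative path and joined onto the dashboard's page URL. A small illustration with Python's urllib (URLs here are hypothetical):

```python
from urllib.parse import urljoin

base = "https://zuul.example.org/t/openstack/build/abc123"

# With a scheme, the ref url replaces the base entirely:
ok = urljoin(base, "https://review.example.org/686054")

# Without one, it is resolved relative to the build page, which is
# exactly the broken redirect smcginnis observed:
broken = urljoin(base, "review.example.org/686054")
```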
smcginnis | Cool, thanks. | 19:11 |
*** hashar has quit IRC | 19:18 | |
openstackgerrit | Merged zuul/nodepool master: Deregister a launcher when removed from config https://review.opendev.org/686198 | 19:22 |
*** bhavikdbavishi has quit IRC | 19:43 | |
*** brennen has left #zuul | 20:00 | |
daniel2 | I started getting a weird error in nodepool launcher: launcher_1 | AttributeError: 'NoneType' object has no attribute 'vcpus' | 20:14 |
daniel2 | Is it a permissions issue? | 20:14 |
*** remi_ness has joined #zuul | 20:14 | |
clarkb | is there a traceback? | 20:15 |
daniel2 | clarkb: https://shafer.cc/paste/view/raw/9e904f91 | 20:16 |
SpamapS | Hm, AWS just made some changes around vcpus in limits. | 20:16 |
SpamapS | daniel2: using AWS by any chance? | 20:16 |
daniel2 | No OpenStack. | 20:16 |
SpamapS | kk n/m then :) | 20:17 |
daniel2 | Used 24 of 400 | 20:17 |
clarkb | its trying to get the flavor's vcpu count | 20:17 |
daniel2 | Just setup a compute node with dual 20 core processors, 40 cores and 80 threads, plus 512GB of RAM. | 20:17 |
clarkb | but the flavor is a nonetype | 20:17 |
clarkb | did the flavor go away maybe? | 20:18 |
clarkb | on the cloud side? | 20:18 |
daniel2 | Nope, I'm looking right at it. | 20:19 |
clarkb | might also mean no matching flavor could be found? | 20:21 |
daniel2 | I restarted the container and it went away, I also dropped the min ram line cause the specific flavors are set | 20:21 |
clarkb | ya, if min ram was resulting in no matches that might do it | 20:22 |
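The traceback pattern being diagnosed here is generic: a flavor lookup constrained by min-ram finds no match, returns None, and a later `.vcpus` access raises AttributeError. A toy illustration (not nodepool's actual code; the Flavor class is simplified):

```python
# Toy illustration of the failure mode discussed above (not nodepool code).
class Flavor:
    def __init__(self, name, ram, vcpus):
        self.name, self.ram, self.vcpus = name, ram, vcpus

def pick_flavor(flavors, min_ram):
    """Return the smallest flavor satisfying min_ram, or None if nothing matches."""
    candidates = [f for f in flavors if f.ram >= min_ram]
    return min(candidates, key=lambda f: f.ram) if candidates else None

flavors = [Flavor("small", 2048, 1), Flavor("large", 8192, 4)]
match = pick_flavor(flavors, 4096)      # -> the "large" flavor
no_match = pick_flavor(flavors, 16384)  # -> None; no_match.vcpus would
                                        #    raise AttributeError
```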
*** smcginnis has left #zuul | 20:23 | |
*** jamesmcarthur has quit IRC | 20:23 | |
daniel2 | How do you get nodepool to not flood openstack when it has trouble creating instances? | 20:23 |
daniel2 | I end up with like 20 instances in Error state that can't be deleted. | 20:24 |
*** mattw4 has joined #zuul | 20:24 | |
clarkb | usually we'll disable the provider and debug why it happens and fix it. But its purpose for existing is to request as many nodes as required as quickly as possible | 20:24 |
daniel2 | clarkb: It doesn't even seem to give enough time for the nodes to spin up. | 20:24 |
daniel2 | It creates them then deletes them after a minute | 20:25 |
clarkb | that implies the api is telling nodepool that it failed | 20:25 |
clarkb | nodepool will wait patiently up to its timeouts if the cloud doesnt say anything is wrong | 20:25 |
corvus | yeah, if the instance went into error state, that's openstack doing that, not nodepool | 20:26 |
daniel2 | https://shafer.cc/paste/view/raw/a54ae82e Not a very informative traceback | 20:28 |
pabelanger | daniel2: you could also try modifying https://zuul-ci.org/docs/nodepool/configuration.html#attr-providers.[openstack].rate we had to do that with rdocloud for a while | 20:28 |
corvus | i wish it didn't create instances in that case, but apparently that's the architecture. nodepool will try to clean those up later on, if the cloud will let it. but sometimes they get stuck and a cloud operator has to go delete them | 20:28 |
daniel2 | The only way to delete them is go into the database usually. | 20:28 |
daniel2 | I'm confused cause these images built just fine before. | 20:29 |
corvus | daniel2: agreed -- we actually put that message in sdk because what we got back from openstack was so uninformative. :( | 20:29 |
clarkb | that can be due to no valid hypervisor being found | 20:29 |
clarkb | if you do a server show on the instancesyou should get the error from nova | 20:29 |
clarkb | if there is any info | 20:30 |
daniel2 | clarkb: what's weird is if I manually start an instance using the same parameters in Horizon, it works | 20:31 |
clarkb | does a server show on the instances reveal anything? | 20:33 |
*** jamesmcarthur has joined #zuul | 20:33 | |
daniel2 | Nope | 20:36 |
clarkb | no field in there with a json blob and a "reason" ? I want to say that is the key name for the fault | 20:38 |
daniel2 | So you think it's trying to connect but can't because of the key? | 20:39 |
clarkb | no | 20:39 |
clarkb | nodepool will give you a proper error message for that | 20:39 |
clarkb | nova is reporting a failure to boot the instance | 20:39 |
clarkb | and often nova will update the instance record with a reason later | 20:40 |
clarkb | (but not at the point it reports back to nodepool) | 20:40 |
clarkb | are you able to share the output of a server show? | 20:40 |
daniel2 | clarkb: it shows nothing because its stuck in deleting/error state | 20:41 |
daniel2 | Check your notice clarkb, it had a hostname I didn't want to broadcast publically. | 20:42 |
clarkb | nova should still tell you the server uuid and name, flavor, image etc | 20:42 |
clarkb | hrm ya doesn't have the json blob | 20:43 |
clarkb | you might need to check with the cloud logs | 20:43 |
clarkb | if you have access to them grepping on the instance uuid should bring up useful info | 20:43 |
daniel2 | Yeah I have controller access | 20:44 |
clarkb | iptables-save problems persist after updating the role to use shell. We can try updating ansible as tristanC suggests: https://review.opendev.org/#/c/686237 thanks for the review on that pabelanger | 20:47 |
daniel2 | clarkb: [instance: 1b08544b-0a37-45d5-b85d-4b4f20ab5259] Instance build timed out. Set to error state | 20:48 |
daniel2 | Thats what the compute node is giving me. | 20:49 |
clarkb | ok so it is hitting a nova timeout | 20:49 |
clarkb | off the top of my head that can happen due to image conversions that nova will attempt to do | 20:49 |
clarkb | we saw that with infracloud way back when. Not sure if other things can cause that | 20:49 |
pabelanger | yup! | 20:49 |
pabelanger | remember | 20:49 |
pabelanger | you can also increase boot-timeout in provider config | 20:50 |
daniel2 | I'll try rebuilding the images. | 20:50 |
pabelanger | daniel2: what format are you uploading as? | 20:50 |
daniel2 | I need to provision a few more compute nodes too. | 20:50 |
daniel2 | QCOW2 | 20:51 |
clarkb | pabelanger: in this case I don't think the provider config will change anything | 20:51 |
clarkb | pabelanger: its nova that is timing out not nodepool | 20:51 |
pabelanger | oh, right | 20:51 |
pabelanger | missed that | 20:51 |
clarkb | (otherwise we would get proper error in nodepool logs) | 20:51 |
pabelanger | yah, I would check nova to see if they are forcing raw, then maybe switch nodepool-builder to that format | 20:52 |
daniel2 | Does raw need qemu-utils? | 20:52 |
pabelanger | https://opendev.org/opendev/puppet-infracloud/commit/9dc122c3ddd852f4572ffff0c6d5ed6f84fa2a96 | 20:52 |
pabelanger | was our infracloud fix | 20:52 |
daniel2 | I'm curious if I could go back to using the docker builder container | 20:52 |
pabelanger | yah, we use qemu-img to convert | 20:53 |
daniel2 | https://shafer.cc/paste/view/raw/41aa331d | 20:53 |
daniel2 | Thats the image details | 20:53 |
clarkb | pabelanger: daniel2: raw is actually the "native" image type for dib | 20:54 |
clarkb | dib converts from raw to $otherformats | 20:54 |
*** remi_ness has quit IRC | 20:54 | |
daniel2 | Oh, I never specified anything. | 20:54 |
clarkb | the default for nodepool should be whatever the cloud config specifies | 20:54 |
pabelanger | isn't qcow2 the default? | 20:54 |
daniel2 | It uses qcow2 for me. | 20:55 |
clarkb | qcow2 may be the cloud config default yes | 20:55 |
clarkb | my comment about raw was more aimed at the qemu-utils question | 20:55 |
clarkb | you need qemu-img for qcow2 | 20:55 |
pabelanger | ah, right. I thought it was the other way | 20:55 |
daniel2 | Do you need qemu-utils for raw as well? | 20:56 |
clarkb | I don't think so | 20:56 |
clarkb | since there is no conversion to perform | 20:56 |
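The exchange above boils down to: dib writes a raw image natively, and any other output format is produced by converting with qemu-img (shipped in the qemu-utils package). A minimal sketch of that conversion, with hypothetical filenames:

```
# Illustrative only -- diskimage-builder produces a raw image first;
# qcow2 (or vhd, vmdk, ...) output invokes qemu-img to convert it:
qemu-img convert -O qcow2 ubuntu-focal.raw ubuntu-focal.qcow2
```

So raw-only builds can skip qemu-utils entirely, which is what clarkb is pointing out.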
*** tosky_ has joined #zuul | 21:05 | |
*** tosky has quit IRC | 21:06 | |
*** tosky_ is now known as tosky | 21:08 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: web: render log manifest consistently https://review.opendev.org/686307 | 21:38 |
corvus | fungi, clarkb, tristanC: ^ this is the best improvement i can come up with short of either replacing or modifying the treeview to get the desired expansion behavior. I think this is better than what we have now, but still not ideal. | 21:39 |
*** jamesmcarthur has quit IRC | 21:56 | |
*** rfolco|dentist is now known as rfolco | 22:17 | |
fungi | looks like the zuul-build-dashboard result isn't capable of rendering specific build pages | 22:34 |
fungi | but i get the gist of it from the commit message and diff, sounds like a fine compromise | 22:35 |
fungi | unfortunate that react can't extend the expand/contract toggle to the object name | 22:36 |
fungi | oh! zuul-build-dashboard-multi-tenant seems to be able to: https://e88fada250d77d396ef5-a605dbf95134d478b50bec2dfa092555.ssl.cf1.rackcdn.com/686307/1/check/zuul-build-dashboard-multi-tenant/1469c57/npm/html/t/opendev.org/build/4be7070266c24a2da06087347012cde9 | 22:41 |
fungi | ahh, nevermind, we don't get a logs tab with that | 22:41 |
daniel2 | So I set the format to raw but it's still trying to do qcow2 | 22:41 |
fungi | oh, it was just that build. here's one that works: https://e88fada250d77d396ef5-a605dbf95134d478b50bec2dfa092555.ssl.cf1.rackcdn.com/686307/1/check/zuul-build-dashboard-multi-tenant/1469c57/npm/html/t/local/build/3e8d94730bab4649b1859c4d075002a6/logs | 22:42 |
fungi | daniel2: nova is still trying to boot qcow2 or dib is still trying to build qcow2? | 22:44 |
daniel2 | dib is still trying to build qcow2 even when I specify raw in "formats:" | 22:45 |
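For context, the "formats:" daniel2 mentions lives on the diskimage entry in the nodepool builder config. A hypothetical fragment (image name invented for illustration; as the thread goes on to note, the cloud's own image-format setting also has to agree):

```yaml
# Hypothetical nodepool.yaml fragment: "formats" on a diskimage entry
# lists what nodepool-builder should produce for that image.
diskimages:
  - name: ubuntu-focal
    formats:
      - raw
```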
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-registry master: Add type annotations https://review.opendev.org/686249 | 22:46 |
tristanC | corvus: here is the type annotation for each module, though it's a bit ugly for swift and main because neither openstack nor cherrypy is typed. I think we should ignore those modules for now, though i left the full annotation for reference in this second PS | 22:48 |
tristanC | corvus: and in doing so, mypy figured out that some procedures may return None, so we need to check for that before doing things like json.dumps | 22:49 |
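The class of bug tristanC describes is easy to show in miniature. A minimal sketch (function and data names are hypothetical, not from zuul-registry): a lookup annotated as Optional forces callers to guard against None before passing the result to json.dumps.

```python
import json
from typing import Optional


def get_manifest(name: str) -> Optional[dict]:
    """Hypothetical lookup that may miss; mypy sees the Optional return."""
    store = {"latest": {"layers": []}}
    return store.get(name)


def render(name: str) -> str:
    manifest = get_manifest(name)
    # Without this guard, mypy reports something like:
    #   Argument 1 to "dumps" has incompatible type "Optional[dict]"
    if manifest is None:
        return "{}"
    return json.dumps(manifest)
```

This is exactly the "check for that before doing things like json.dumps" pattern: the annotation turns a latent runtime TypeError into a static error.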
clarkb | aren't main and swift the entire code that will run in production? | 22:54 |
clarkb | not sure what the value is if we ignore those two | 22:54 |
fungi | daniel2: here's how we're setting it on our builders for specific providers: https://opendev.org/opendev/system-config/src/branch/master/playbooks/templates/clouds/nodepool_builder_clouds.yaml.j2#L81 | 22:55 |
fungi | is that roughly what you did? | 22:55 |
tristanC | corvus: also, here is an emacs snippet to configure flycheck mypy: http://paste.openstack.org/show/780770/ | 22:55 |
fungi | daniel2: the daemon also may need a restart to pick up clouds.yaml changes | 22:55 |
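The opendev template fungi links sets the image format per cloud in the builder's clouds.yaml. A hypothetical fragment in that shape (cloud name and auth_url invented):

```yaml
# Hypothetical clouds.yaml fragment: openstacksdk's image_format key tells
# nodepool-builder which format this particular cloud expects.
clouds:
  mycloud:
    auth:
      auth_url: https://cloud.example.com:5000
    image_format: raw
```

As fungi notes, the builder daemon may need a restart before a clouds.yaml change like this takes effect.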
tristanC | clarkb: mypy also found issues in the storage module actually. | 22:56 |
tristanC | then for openstack and cherrypy, we could either add types there too, or at least provide stubs for the functions we use | 22:58 |
fungi | tristanC: any reason not to approve 686307? | 23:03 |
tristanC | fungi: to leave a bit more time for others to review? the review was proposed less than an hour ago... | 23:05 |
fungi | fair, it's a behavior change after all | 23:05 |
fungi | a lot of opendev's zuul users are confused by the current behavior, but that doesn't mean all zuul users are | 23:06 |
daniel2 | What is the point of the nodepool-builder docker container if it doesn't have root? It can't build images without it. | 23:07 |
tristanC | fungi: i didn't realize the current view caused confusion. the change seems to work as expected, i can +a now | 23:07 |
fungi | tristanC: i don't think it's urgent to approve, but we have seen a lot of users who don't realize there's a cooked log viewer for files in subdirectories because they keep following the raw link by clicking the directory name rather than using the > to expand the listing | 23:10 |
fungi | ideally there would have been a way to make clicking the directory name do the same as clicking the > symbol but that doesn't seem to be easily accomplished with react's treeview | 23:11 |
fungi | daniel2: i'm not sure, but i think folks extend extra privs to the builder container in that case | 23:12 |
fungi | at a minimum it needs to be able to chroot | 23:13 |
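Extending extra privileges to the builder container, as fungi suggests, might look like the following. This is an illustrative sketch only; the exact flags and mounts depend on the deployment, and the image/path names here are assumptions:

```
# Illustrative only: diskimage-builder needs chroot and loop-device
# access inside the container, hence the elevated privileges.
docker run --privileged \
  -v /var/lib/nodepool:/var/lib/nodepool \
  zuul/nodepool-builder
```

Narrower grants than --privileged (specific capabilities plus device access) may also work, depending on the elements being built.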
*** tosky has quit IRC | 23:16 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-registry master: Add type annotations https://review.opendev.org/686249 | 23:20 |
clarkb | daniel2: fungi I think that is the one image that isn't tested and as far as I know no one is using it | 23:24 |
fungi | yeah, the quickstart exercise job just uses a static node i think? | 23:28 |
clarkb | correct | 23:28 |
*** mattw4 has quit IRC | 23:28 | |
*** panda is now known as panda|off | 23:39 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!