openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Implement a static driver for Nodepool https://review.openstack.org/468624 | 01:57 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Abstract Nodepool request handling code https://review.openstack.org/468748 | 01:57 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Abstract Nodepool provider management code https://review.openstack.org/468749 | 01:57 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Collect request handling implementation in an OpenStack driver https://review.openstack.org/468750 | 01:57 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Extend Nodepool configuration syntax to support multiple drivers https://review.openstack.org/468751 | 01:57 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Add support for custom ssh port https://review.openstack.org/468752 | 01:57 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Implement an OpenContainer driver https://review.openstack.org/468753 | 01:57 |
tobiash | tristanC: cool, will have a look :) | 05:44 |
*** bhavik1 has joined #zuul | 06:03 | |
*** hashar has joined #zuul | 07:11 | |
tristanC | tobiash: nice, feedback is most welcome! | 08:47 |
tristanC | you'll also find a runc driver that should spawn containers if you set a 'host' on the nodepool provider | 08:50 |
tobiash | tristanC: jepp, saw that, will have a closer look at it later | 08:51 |
*** smyers has quit IRC | 09:02 | |
*** rcarrill1 is now known as rcarrillocruz | 09:08 | |
*** smyers has joined #zuul | 09:13 | |
tobiash | tristanC: left some thoughts on https://review.openstack.org/#/c/468749 | 10:12 |
tobiash | regarding the api of the ProviderManager base class | 10:12 |
tobiash | and I just stumbled over the question of how we should handle min_ready, as this is a label-specific attribute | 10:13 |
tobiash | (which would not apply for static nodes) | 10:14 |
tobiash | tristanC, jeblair: I just had a thought about the planned config change for the nodepool drivers | 10:27 |
tobiash | tristanC, jeblair: https://review.openstack.org/#/c/468751/1/nodepool/config.py@208 | 10:28 |
tobiash | what do you think? | 10:28 |
tristanC | tobiash: absolutely, it sounds better to let drivers manage their own configurations | 10:50 |
tobiash | tristanC: so we should prototype an example config and check with jeblair if that sounds good | 10:54 |
tristanC | note that I tried to do surgical changes to get an implementation up and running; if it's validated it will still need unit tests and improvements regarding configuration | 10:56 |
tobiash | sure | 10:57 |
tobiash | tristanC: btw, I rethought my thoughts on https://review.openstack.org/#/c/468749/ | 10:57 |
tristanC | tobiash: Thanks, that feedback is much appreciated, I'll propose a new PS tomorrow | 10:59 |
tobiash | ok :) | 11:00 |
* tristanC is utc+9 :) | 11:04 | |
* tobiash is utc+2 | 11:06 | |
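To make the "let drivers manage their own configurations" idea from 468751 a bit more concrete, here is a rough sketch of a loader that hands each provider section to the driver named in it. The `driver` key, the static `nodes` list, and the overall layout are assumptions for illustration only; the actual syntax was still under discussion at this point.

```python
# Illustrative only: the 'driver' key and the provider layout below are
# guesses at where 468751 might end up, not the agreed nodepool syntax.
import yaml

EXAMPLE_CONFIG = """
providers:
  - name: my-cloud
    driver: openstack          # existing OpenStack behaviour
    cloud: mycloud
  - name: my-static-nodes
    driver: static             # hypothetical static driver section
    nodes:
      - name: bigbox01
        host: bigbox01.example.com
        ssh-port: 2222
"""

def group_providers_by_driver(raw_yaml):
    """Parse the config and let each driver interpret only its own sections."""
    config = yaml.safe_load(raw_yaml)
    by_driver = {}
    for provider in config.get('providers', []):
        # Default to 'openstack' so existing configs keep working unchanged.
        driver = provider.get('driver', 'openstack')
        by_driver.setdefault(driver, []).append(provider)
    return by_driver

if __name__ == '__main__':
    for driver, providers in group_providers_by_driver(EXAMPLE_CONFIG).items():
        print(driver, '->', [p['name'] for p in providers])
```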
*** bhavik1 has quit IRC | 11:12 | |
*** isaacb has joined #zuul | 11:49 | |
*** nt has quit IRC | 12:18 | |
*** isaacb has quit IRC | 12:39 | |
*** Guest79619 is now known as mgagne | 13:14 | |
*** mgagne has quit IRC | 13:14 | |
*** mgagne has joined #zuul | 13:14 | |
*** jroll has joined #zuul | 15:20 | |
*** isaacb has joined #zuul | 15:33 | |
mordred | tobiash, tristanC: so - a few thoughts to toss out (great initial work btw) ... | 16:12 |
mordred | (also, these are me just thinking out loud, don't necessarily take these as *real* thoughts) | 16:13 |
mordred | tobiash, tristanC: for a static node, it's possible that an admin might want to allow more than one job to "check out" the node at a time | 16:14 |
mordred | although 'minReady' isn't a good word for that, it may be a similar concept at some level | 16:14 |
mordred | but similarly - one could imagine even a cloud/vm node type where the admin might want to say "use this for up to X concurrent jobs" and also "don't delete this when done" (don't delete when done is a very important feature for static nodes, obviously, and although I do not think openstack would ever use it, perhaps it's a useful enough feature to be in a base class) | 16:16 |
pabelanger | ++ | 16:19 |
pabelanger | concurrent jobs would be awesome | 16:20 |
mordred | tristanC, tobiash: it also may be worth considering having the list of static nodes read from config at start and written to zk so that any of the node launchers can respond to a request from zuul for one of the static nodes ... otherwise rolling out config changes to a set of node launcher processes might get weird with them having out-of-sync versions of the config (you may already be doing that and I just | 16:22 |
mordred | missed it - I haven't read _deeply_) | 16:22 |
tobiash | mordred: good idea | 16:24 |
tobiash | so the nodes could have a slots property | 16:25 |
mordred | yah. I know it's been a thing folks have asked for in the past - especially folks who have static hardware like "I have this one giant openpower server i want to use for testing, and it could handle 100 concurrent jobs easily" | 16:26 |
mordred | tristanC: also - for the oci driver - it doesn't seem to expose a "base image" anywhere and instead would just assume a "base os" that matches the OS of the node running the nodelauncher (which I actually rather like - it's a neat way to provide a simple thing there) | 16:26 |
tobiash | we also have such a use case (one node managing a cluster of test hardware) | 16:27 |
mordred | tobiash: ++ yes - that's another great example of such a thing | 16:27 |
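For the "let one static node serve several concurrent jobs" idea, a config along these lines is one way it could look. The attribute names `max-parallel-jobs` and `delete-when-done` are made up purely to illustrate the concept and are not settled nodepool syntax.

```python
# Hypothetical attributes for a static node serving multiple concurrent jobs;
# nothing here reflects agreed nodepool configuration keys.
import yaml

STATIC_PROVIDER = yaml.safe_load("""
providers:
  - name: static-lab
    driver: static
    nodes:
      - name: openpower-huge
        host: openpower.example.com
        labels: [openpower-test]
        max-parallel-jobs: 100    # "check out" up to 100 jobs at once
        delete-when-done: false   # never reclaim/delete this node
""")

for node in STATIC_PROVIDER['providers'][0]['nodes']:
    slots = node.get('max-parallel-jobs', 1)
    print('%s can serve %d concurrent job(s)' % (node['name'], slots))
```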
mordred | tristanC: perhaps there should be some sort of matching in the zuul request protocol exposed by the nodelauncher that's running an oci driver? | 16:29 |
mordred | (or maybe some way of explicitly providing a label in the nodepool config) | 16:29 |
mordred | so that if you had a nodelauncher running on fedora and a nodelauncher running on ubuntu, in the zuul job you could have a nodeset that said "oci:fedora-24" or something | 16:30 |
mordred | and only a nodelauncher running the right base os would service that node request | 16:30 |
mordred | the other option, of course, would be also extending the image builder part to know how to make base oci images and plumb those through to the oci run portion | 16:31 |
mordred | it's also possible I'm missing a part of that completely ... :) | 16:31 |
tobiash | interesting, with the oci driver the nodepool launcher would actually *become* a pool of slaves | 16:32 |
mordred | yah. I hadn't really thought of doing it that way - but I could see the appeal | 16:33 |
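One possible shape for the label matching mordred describes, where a launcher running the oci driver only services requests matching its own base OS. The `oci:<os>` label format and the matching rule are invented for illustration; this was not an agreed design.

```python
# Invented label scheme: a request label like "oci:fedora-24" is only
# serviced by a launcher whose base OS matches the part after "oci:".
def handles_label(request_label, launcher_base_os):
    """Return True if this oci launcher should service the request."""
    if not request_label.startswith('oci:'):
        return False
    wanted = request_label.split(':', 1)[1]
    return wanted.startswith(launcher_base_os)

# A launcher on fedora takes the fedora request and leaves the ubuntu one
# for a launcher whose base OS matches it.
print(handles_label('oci:fedora-24', 'fedora'))        # True
print(handles_label('oci:ubuntu-xenial', 'fedora'))    # False
```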
tobiash | but back to the question of how to manage static slaves with multiple launchers | 16:34 |
tobiash | wouldn't reading the config and pushing it to zk also be racy? | 16:35 |
tobiash | a different possibility would be to allow adding additional nodes to zookeeper at runtime, e.g. via the cli | 16:36 |
mordred | tobiash: it shouldn't be - when a node notices a config change, it could grab the static-hosts lock, then write the static host config to zk | 16:37 |
mordred | perhaps with a checksum/version so that other nodes who notice a config change can do a quick check when they get the lock to see if the config is already valid | 16:38 |
tobiash | mordred: that implies that config changes are rolled out (almost) simultaneously without a launcher (re)start in between | 16:38 |
mordred | yes. launchers should not need a restart to notice config changes - and then the only work the static 'launcher' is performing is writing the node info to zk | 16:39 |
mordred | then servicing the node requests can be as normal "do I have a matching node available in zk, if so, return to zuul, if not error" and can be done by anyone | 16:40 |
tobiash | mordred: ok, that needs to be kept in mind then (thinking of running the launcher in k8s where a container restart could happen at any time) | 16:41 |
mordred | ++ - I think if we consider a container restart and "I noticed a config change" as the same thing - then a launcher restart and a config change event both essentially just perform the "verify my config is in sync with zk" | 16:42 |
tobiash | but writing the config sounds cool as all you have to do to support static nodes with several slots is to write several nodes into zk :) | 16:42 |
mordred | tobiash: oh - that's a great idea/way to support that | 16:42 |
tobiash | mordred: that race condition possibly could be solved by storing the hashes of all (last x) configs in zk so a container restart could check if it has an older version of the config | 16:44 |
mordred | tobiash: in the case you mention, we should also probably protect against ... yup, that's where I was going | 16:44 |
tobiash | that could be a good solution | 16:46 |
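Pulling the last few ideas together (grab the static-hosts lock, compare a config hash, write one zk node per slot), here is a sketch of what the static "launcher" side could do on startup or on a config change. The zookeeper paths, the data layout, and the hash znode are all assumptions; this is nothing like eventual nodepool code, just the shape of the idea expressed with kazoo.

```python
# Assumed layout: static node entries under /nodepool/static/nodes and a
# /nodepool/static/config-hash znode guarded by a lock; all invented here.
import hashlib
import json

from kazoo.client import KazooClient

STATIC_NODES = [
    {'name': 'openpower-huge', 'host': 'openpower.example.com',
     'labels': ['openpower-test'], 'slots': 100},
]

def sync_static_nodes(zk, nodes, base='/nodepool/static'):
    payload = json.dumps(nodes, sort_keys=True).encode('utf-8')
    digest = hashlib.sha256(payload).hexdigest().encode('utf-8')

    # Any launcher that notices a config change (or restarts) runs this; the
    # lock serializes writers and the hash makes repeat runs cheap no-ops.
    with zk.Lock(base + '/config-lock', 'static-config-writer'):
        if zk.exists(base + '/config-hash'):
            current, _ = zk.get(base + '/config-hash')
            if current == digest:
                return  # someone already wrote this exact config
        for node in nodes:
            for slot in range(node.get('slots', 1)):
                # One znode per slot, so a 100-slot static host can satisfy
                # up to 100 node requests at the same time.
                path = '%s/nodes/%s-%03d' % (base, node['name'], slot)
                data = json.dumps(node).encode('utf-8')
                if zk.exists(path):
                    zk.set(path, data)
                else:
                    zk.create(path, data, makepath=True)
        if zk.exists(base + '/config-hash'):
            zk.set(base + '/config-hash', digest)
        else:
            zk.create(base + '/config-hash', digest, makepath=True)

if __name__ == '__main__':
    client = KazooClient(hosts='127.0.0.1:2181')
    client.start()
    sync_static_nodes(client, STATIC_NODES)
    client.stop()
```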
tobiash | btw, is the nodepool protocol protected against sudden launcher deaths? | 16:46 |
mordred | yes, I believe that should be covered | 16:47 |
tobiash | e.g. nodepool takes a lock (to delete a node) then dies -> could this leak an instance? | 16:47 |
mordred | if a launcher has claimed a noderequest and then goes away, its lock on the noderequest will go away and another launcher can pick up the request and fulfill it | 16:47 |
tobiash | ah, I see | 16:47 |
tobiash | the lock of the launcher vanishes with the launcher | 16:48 |
mordred | yup | 16:48 |
mordred | that's one of the things we gain in v3 | 16:48 |
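As a tiny illustration of why "the lock of the launcher vanishes with the launcher": kazoo locks are built on ephemeral znodes, so when the session that holds the lock ends (for example because the launcher process dies), the lock is released automatically. The request path below is made up.

```python
# Made-up request path; the point is only that the lock's ephemeral znode
# disappears when the zk session that created it goes away.
from kazoo.client import KazooClient

REQUEST_LOCK = '/nodepool/requests/100-0000000001/lock'

launcher = KazooClient(hosts='127.0.0.1:2181')
launcher.start()
lock = launcher.Lock(REQUEST_LOCK, 'launcher-1')
lock.acquire()

observer = KazooClient(hosts='127.0.0.1:2181')
observer.start()
print(observer.get_children(REQUEST_LOCK))  # one contender holds the lock

launcher.stop()    # simulate the launcher dying: its zk session ends...
launcher.close()
print(observer.get_children(REQUEST_LOCK))  # ...and the lock entry is gone

observer.stop()
observer.close()
```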
tobiash | :) | 16:48 |
tobiash | similar for executors in v4 will be cool | 16:48 |
tobiash | just had a jenkins oom today (v2 deployment) that left a bunch of stale jobs behind... | 16:49 |
mordred | actually - I believe we're already good on executors in v3 ... same thing - if an executor dies after having been scheduled with a job, it should get rescheduled | 16:49 |
mordred | but I might be wrong about that | 16:50 |
tobiash | how does that work? | 16:50 |
tobiash | are executors in v3 still scheduled via gearman? | 16:50 |
tobiash | so does gearman have a means to deal with this? | 16:50 |
mordred | they are - but I believe if the worker claiming a gearman job dies, the scheduler will notice that - and in v3 I believe it will re-submit the job to gearman so it will be picked up again | 16:51 |
tobiash | ah, cool | 16:51 |
mordred | *I think* | 16:51 |
clarkb | yes | 16:51 |
mordred | I could be quite wrong about that | 16:51 |
mordred | yay! | 16:51 |
mordred | I like it when I get things right | 16:51 |
clarkb | if the client goes away the worker should go away and free things up. It may not do an explicit release to nodepool though | 16:52 |
clarkb | but the jobs should be rerun and the node deleted eventually iirc | 16:52 |
clarkb | in fact we occasionally just hard restart the executors, I think the only real cost is rerunning previously completed but unrecorded work | 16:53 |
mordred | yah - I think nodepool will notice because the ownership lock on the nodes will go away perhaps? | 16:53 |
mordred | clarkb: ++ | 16:54 |
tobiash | today I tried to get gating working with my shiny new v3 deployment | 16:56 |
tobiash | I had issues with casing when comparing label names in the canMerge method in the gerrit driver | 16:56 |
tobiash | (label.tolower() is compared to another label) | 16:57 |
tobiash | with the newest gerrit we had issues in the gate if verified is spelled in lowercase while gerrit has it spelled Verified | 16:58 |
tobiash | in this case zuul does "review --label verified=2 submit" | 16:58 |
tobiash | gerrit then adds +2 to verified but doesn't submit | 16:59 |
tobiash | that's why we had to spell the verified flag as Verified in the pipeline as well (which then breaks the canMerge check) | 16:59 |
mordred | tobiash: oh - that's LOVELY | 17:01 |
tobiash | so we could either enforce the correct casing and do case sensitive matches everywhere (without tolower normalization) | 17:01 |
mordred | tobiash: that seems to be a behavior change with the latest gerrit, right? | 17:01 |
tobiash | yepp, gerrit upgrade broke it (also broke our v2 installation which I patched locally) | 17:02 |
tobiash | well, the latest gerrit at the point when we upgraded (2.13.5) | 17:03 |
*** isaacb has quit IRC | 17:04 | |
tobiash | so I would prefer to stick with case sensitive comparisons and not do any case normalization anywhere | 17:04 |
tobiash | if that's ok for you | 17:04 |
mordred | tobiash: we could also do tolower() on both sides of the comparison - but it seems like perhaps enforcing correct case in the config is the best place to validate that zuul will then send the value to gerrit with the correct case | 17:04 |
tobiash | yepp | 17:05 |
mordred | tobiash: that seems to me like the thing which would be the least surprising - worth nudging jeblair for his opinion on that one in case there's a case we haven't thought of | 17:05 |
tobiash | ok, I will propose a change for this tomorrow | 17:07 |
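A toy sketch of the direction tobiash describes: drop the lowercasing so a mis-cased label in the pipeline config fails loudly instead of silently sending the wrong case to gerrit. This is not the actual zuul gerrit driver code; the function name and data shapes are invented for illustration.

```python
# Invented data shapes: 'votes' is what gerrit reports for the change,
# 'required' is what the pipeline config asks for, exactly as written.
def can_merge(votes, required, min_value=2):
    """Case-sensitive label check: 'verified' does NOT match 'Verified'."""
    for label in required:
        # No .lower() normalization on either side, so a pipeline configured
        # with the wrong casing fails here instead of later, when
        # "gerrit review --label <label>=2 submit" silently doesn't submit.
        if votes.get(label, 0) < min_value:
            return False
    return True

assert can_merge({'Verified': 2, 'Code-Review': 2}, ['Verified', 'Code-Review'])
assert not can_merge({'Verified': 2, 'Code-Review': 2}, ['verified'])  # wrong case rejected
```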
tobiash | btw, with the help of jamielennox patches (https://review.openstack.org/#/q/topic:source-reporter) I now have a v3 setup connected to two gerrits with pipelines serving projects of both in a single queue :) | 17:11 |
mordred | tobiash: WOOT! | 17:14 |
mordred | tobiash: did you apply your suggested change to https://review.openstack.org/#/c/461981/5/zuul/driver/gerrit/gerritreporter.py ? | 17:16 |
tobiash | mordred: a slightly changed variant of it, I think | 17:17 |
mordred | kk | 17:17 |
tobiash | mordred: just combined the initial isinstance check with the canonical hostname check | 17:20 |
tobiash | mordred: updated my comment with the real life solution | 17:23 |
tobiash | I've also written a test case for this: https://review.openstack.org/#/c/468374/ | 17:24 |
tobiash | the build should become green once this patch is fixed | 17:24 |
mordred | tobiash: maybe we should just add your test case to https://review.openstack.org/#/c/461981 and also apply your fix there ... I doubt jamielennox would mind | 17:25 |
tobiash | mordred: I didn't know if anyone would be upset, so I just added the test case as a followup patch | 17:27 |
tobiash | I also haven't seen jamielennox around (only away) since I left the review, so I didn't modify his patch without asking him | 17:28 |
mordred | tobiash: fair enough - let's give it til folks are back tomorrow, and we can move it forward then | 17:30 |
tobiash | ok, sounds good | 17:30 |
tobiash | is it expected that I cannot add a project (listed in untrusted-projects) initially to a gate pipeline just by proposing a .zuul.yaml? | 18:06 |
tobiash | when I try this the gate pipeline rejects processing the change with 'zuul.DependentPipelineManager: Unable to find change queue for change ...' | 18:06 |
tobiash | after manually pushing the change and sighup'ing zuul the gate works as expected | 18:07 |
pabelanger | jamielennox|away: are you still working on nodepool-webapp? https://review.openstack.org/#/c/445675 | 18:49 |
pabelanger | wsgi seems like a good approach | 18:50 |
SpamapS | tobiash: jamielennox|away is on holiday for a week. You may want to move forward now. | 18:56 |
SpamapS | I'm sure he'll understand. :) | 18:56 |
tobiash | SpamapS: good to know | 18:59 |
tobiash | SpamapS: so what would be the correct approach? Uploading a new PS on his patch? What about the Signed-off-by line? | 19:00 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Add console-log to config-validate https://review.openstack.org/468942 | 19:32 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Only report to gerrit if the action is from gerrit https://review.openstack.org/461981 | 19:36 |
tobiash | SpamapS, mordred: hope I did it the correct way ^^^ | 19:37 |
tobiash | hm, needs a rebase | 19:38 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Remove source from reporter https://review.openstack.org/462362 | 19:41 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Remove source from reporter https://review.openstack.org/462362 | 20:04 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Only report to gerrit if the action is from gerrit https://review.openstack.org/461981 | 20:04 |
*** dkranz has quit IRC | 21:53 | |
*** hashar has quit IRC | 22:53 | |
*** jkilpatr has joined #zuul | 23:17 | |
*** jkilpatr has quit IRC | 23:25 | |
*** jkilpatr has joined #zuul | 23:48 | |
*** jkilpatr has quit IRC | 23:55 |