Monday, 2017-05-29

openstackgerritTristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Implement a static driver for Nodepool  https://review.openstack.org/46862401:57
openstackgerritTristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Abstract Nodepool request handling code  https://review.openstack.org/46874801:57
openstackgerritTristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Abstract Nodepool provider management code  https://review.openstack.org/46874901:57
openstackgerritTristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Collect request handling implementation in an OpenStack driver  https://review.openstack.org/46875001:57
openstackgerritTristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Extend Nodepool configuration syntax to support multiple drivers  https://review.openstack.org/46875101:57
openstackgerritTristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Add support for custom ssh port  https://review.openstack.org/46875201:57
openstackgerritTristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Implement an OpenContainer driver  https://review.openstack.org/46875301:57
tobiashtristanC: cool, will have a look :)05:44
*** bhavik1 has joined #zuul06:03
*** hashar has joined #zuul07:11
tristanCtobiash: nice, feedback is most welcome!08:47
tristanCyou'll also find a runc driver that should spawn containers if you set a 'host' on a nodepool provider08:50
tobiashtristanC: jepp, saw that, will have a closer look on that later08:51
*** smyers has quit IRC09:02
*** rcarrill1 is now known as rcarrillocruz09:08
*** smyers has joined #zuul09:13
tobiashtristanC: left some thoughts on https://review.openstack.org/#/c/46874910:12
tobiashregarding the api of the ProviderManager base class10:12
tobiashand I just stumbled over the question of how we should handle min_ready as this is a label-specific attribute10:13
tobiash(which would not apply for static nodes)10:14
tobiashtristanC, jeblair: I just had a thought about the planned config change for the nodepool drivers10:27
tobiashtristanC, jeblair: https://review.openstack.org/#/c/468751/1/nodepool/config.py@20810:28
tobiashwhat do you think?10:28
tristanCtobiash: absolutely, it sounds better to let drivers manage their own configurations10:50
tobiashtristanC: so we should prototype an example config and check with jeblair if that sounds good10:54
tristanCnote that I tried to do surgical changes to get an implementation up and running; if it's validated then it still needs unit tests and improvements regarding configuration10:56
tobiashsure10:57
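
A minimal sketch of the idea being discussed here — letting each driver own the parsing and validation of its provider section instead of one central config loader. The class names and the 'driver' config key below are hypothetical illustrations, not nodepool's actual code:

class ProviderConfig(object):
    def __init__(self, provider):
        self.name = provider['name']
        self.driver = provider.get('driver', 'openstack')
        self.load(provider)

    def load(self, provider):
        raise NotImplementedError

class OpenStackProviderConfig(ProviderConfig):
    def load(self, provider):
        self.cloud = provider.get('cloud')
        self.pools = provider.get('pools', [])

class StaticProviderConfig(ProviderConfig):
    def load(self, provider):
        # Driver-specific keys stay invisible to the generic loader.
        self.static_nodes = provider.get('nodes', [])

DRIVER_CONFIGS = {
    'openstack': OpenStackProviderConfig,
    'static': StaticProviderConfig,
}

def load_provider(provider):
    """Pick the config class based on the (hypothetical) 'driver' key."""
    driver = provider.get('driver', 'openstack')
    return DRIVER_CONFIGS[driver](provider)
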
tobiashtristanC: btw, I rethought my thoughts on https://review.openstack.org/#/c/468749/10:57
tristanCtobiash: Thanks, that feedback is much appreciated, I'll propose a new ps tomorrow10:59
tobiashok :)11:00
* tristanC is utc+9 :)11:04
* tobiash is utc+211:06
*** bhavik1 has quit IRC11:12
*** isaacb has joined #zuul11:49
*** nt has quit IRC12:18
*** isaacb has quit IRC12:39
*** Guest79619 is now known as mgagne13:14
*** mgagne has quit IRC13:14
*** mgagne has joined #zuul13:14
*** jroll has joined #zuul15:20
*** isaacb has joined #zuul15:33
mordredtobiash, tristanC: so - a few thoughts to toss out (great initial work btw) ...16:12
mordred(also, these are me just thinking out loud, don't necessarily take these as *real* thoughts)16:13
mordredtobiash, tristanC: for a static node, it's possible that an admin might want to allow more than one job to "check out" the node at a time16:14
mordredalthough 'minReady' isn't a good word for that, it may be a similar concept at some level16:14
mordredbut similarly - one could imagine even a cloud/vm node type where the admin might want to say "use this for up to X concurrent jobs" and also "don't delete this when done" (don't delete when done is a very important feature for static nodes, obviously, and although I do not think openstack would ever use it - perhaps it's a useful enough feature to be in a base class)16:16
pabelanger++16:19
pabelangerconcurrent jobs would be awesome16:20
mordredtristanC, tobiash: it also may be worth considering having the list of static nodes read from config at start and written to zk so that any of the node launchers can respond to a request from zuul for one of the static nodes ... otherwise rolling out config changes to a set of node launcher processes might get weird with them having out-of-sync versions of the config (you may already be doing that and I just16:22
mordredmissed it - I haven't read _deeply_)16:22
tobiashmordred: good idea16:24
tobiashso the nodes could have a slots property16:25
mordredyah. I know it's been a thing folks have asked for in the past - especially folks who have static hardware like "I have this one giant openpower server i want to use for testing, and it could handle 100 concurrent jobs easily"16:26
mordredtristanC: also - for the oci driver - it doesn't seem to expose a "base image" anywhere and instead would just assume a "base os" that matches the OS of the node running the nodelauncher (which I actually rather like - it's a neat way to provide a simple thing there)16:26
tobiashwe also have such a use case (one node managing a cluster of test hardware)16:27
mordredtobiash: ++ yes - that's another great example of such a thing16:27
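
A hedged example of what such a static node entry could look like with a concurrency property; the 'slots' and 'delete-after-use' keys are invented here purely for illustration and are not nodepool's actual syntax:

import yaml

provider = yaml.safe_load("""
name: static-lab
driver: static
nodes:
  - hostname: openpower01.example.com
    labels: [openpower]
    slots: 100               # allow up to 100 concurrent jobs on this node
    delete-after-use: false  # real hardware: never delete it when a job ends
""")

node = provider['nodes'][0]
print("%s can serve %d jobs at once" % (node['hostname'], node['slots']))
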
mordredtristanC: perhaps there should be some sort of matching in the zuul request protocol exposed by the nodelauncher that's running an oci driver?16:29
mordred(or maybe some way of explicitly providing a label in the nodepool config)16:29
mordredso that if you had a nodelauncher running on fedora and a nodelauncher running on ubuntu, in the zuul job you could have a nodeset that said "oci:fedora-24" or something16:30
mordredand only a nodelauncher running the right base os would service that node request16:30
mordredthe other option, of course, would be to also extend the image builder part to know how to make base oci images and plumb those through to the oci run portion16:31
mordredit's also possible I'm missing a part of that completely ... :)16:31
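
A toy illustration of that matching idea — a launcher only accepting node requests whose label matches the OS it is running on. The "oci:<os>" label convention and the can_service helper are purely hypothetical:

def can_service(request_labels, launcher_os):
    """Return True if every requested oci label matches this launcher's OS."""
    for label in request_labels:
        if label.startswith('oci:') and label != 'oci:' + launcher_os:
            return False
    return True

print(can_service(['oci:fedora-24'], 'fedora-24'))      # True
print(can_service(['oci:fedora-24'], 'ubuntu-xenial'))  # False: another launcher must take it
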
tobiashinteresting, with the oci driver the nodepool launcher would actually *become* a pool of slaves16:32
mordredyah. I hadn't really thought of doing it that way - but I could see the appeal16:33
tobiashbut back to the question how to manage static slaves with multiple launcher16:34
tobiashwouldn't reading the config and pushing to zk also be racy?16:35
tobiashdifferent possibility would be to allow adding additional nodes to zookeeper via e.g. cli at runtime16:36
mordredtobiash: it shouldn't be - when a node notices a config change, it could grab the static-hosts lock, then write the static host config to zk16:37
mordredperhaps with a checksum/version so that other nodes who notice a config change can do a quick check when they get the lock to see if the config is already valid16:38
tobiashmordred: that implies that config changes are rolled out (almost) simultaneously without a launcher (re)start in between16:38
mordredyes. launchers should not need a restart to notice config changes - and then the only work the static 'launcher' is performing is writing the node info to zk16:39
mordredthen servicing the node requests can be as normal "do I have a matching node available in zk, if so, return to zuul, if not error" and can be done by anyone16:40
tobiashmordred: ok, that needs to be kept in mind then (thinking of running the launcher in k8s where a container restart could happen at any time)16:41
mordred++ - I think if we consider a container restart and "I noticed a config change" as the same thing - then a launcher restart and a config change event both essentially just perform the "verify my config is in sync with zk"16:42
tobiashbut writing the config sounds cool as all you have to do to support static nodes with several slots is to write several nodes into zk :)16:42
mordredtobiash: oh - that's a great idea/way to support that16:42
tobiashmordred: that race condition could possibly be solved by storing the hashes of all (last x) configs in zk so a container restart could check if it has an older version of the config16:44
mordredtobiash: in the use you mention, we should also probably protect against ... yup, that's where I was going16:44
tobiashthat could be a good solution16:46
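
Roughly, the scheme being discussed might look like the sketch below — take a lock, compare a config hash, and write one node record per slot so a multi-slot static node can serve several requests at once. The ZooKeeper paths, data layout and function name are assumptions, not nodepool's real schema:

import hashlib
import json

from kazoo.client import KazooClient

def register_static_nodes(static_nodes):
    """Idempotently publish static node config to ZooKeeper (sketch only)."""
    zk = KazooClient(hosts='zk1.example.com:2181')
    zk.start()
    serialized = json.dumps(static_nodes, sort_keys=True).encode('utf8')
    config_hash = hashlib.sha256(serialized).hexdigest().encode('utf8')
    try:
        # Only one launcher at a time may rewrite the static host records.
        with zk.Lock('/nodepool/static-hosts-lock', 'launcher-1'):
            zk.ensure_path('/nodepool/static-hosts')
            old_hash, _ = zk.get('/nodepool/static-hosts')
            if old_hash != config_hash:
                for node in static_nodes:
                    # One znode per slot, so a node with slots: 3 can be
                    # handed out to three node requests concurrently.
                    for slot in range(node.get('slots', 1)):
                        path = '/nodepool/static-hosts/%s-%d' % (
                            node['hostname'], slot)
                        data = json.dumps(node).encode('utf8')
                        if zk.exists(path):
                            zk.set(path, data)
                        else:
                            zk.create(path, data)
                # Record which config version is now in ZooKeeper.
                zk.set('/nodepool/static-hosts', config_hash)
    finally:
        zk.stop()
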
tobiashbtw, is the nodepool protocol protected against sudden launcher deaths?16:46
mordredyes, I believe that should be covered16:47
tobiashe.g. nodepool takes a lock (to delete a node) then dies -> could this leak an instance?16:47
mordredif a launcher has claimed a noderequest and then goes away, its lock on the noderequest will go away and another launcher can pick up the request and fulfill it16:47
tobiashah, I see16:47
tobiashthe lock of the launcher vanishes with the launcher16:48
mordredyup16:48
mordredthat's one of the things we gain in v316:48
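
The mechanism behind that is ZooKeeper's ephemeral znodes: a lock node is tied to the owning client's session, so it disappears when the process does. A bare-bones illustration (the paths are examples, not nodepool's real layout):

from kazoo.client import KazooClient

zk = KazooClient(hosts='zk1.example.com:2181')
zk.start()

# The ephemeral node only exists for as long as this client's session lives.
zk.create('/nodepool/requests/0001/lock-holder', b'launcher-1',
          ephemeral=True, makepath=True)

# If the launcher is killed here, ZooKeeper removes the ephemeral node once
# the session times out, and another launcher can lock and fulfill the request.
zk.stop()  # cleanly closing the session removes it immediately
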
tobiash:)16:48
tobiashsimilar for executors in v4 will be cool16:48
tobiashjust had a jenkins oom today (v2 deployment) leaving a bunch of stale jobs behind...16:49
mordredactually - I believe we're already good on executors in v3 ... same thing  - if an executor dies after having been scheduled with a job, it should get rescheduled16:49
mordredbut I might be wrong about that16:50
tobiashhow does that work?16:50
tobiashexecutors are in v3 still scheduled via gearman?16:50
tobiashso does gearman have means to deal with this?16:50
mordredthey are - but I believe if the worker claiming a gearman job dies, the scheduler will notice that - and in v3 I believe it will re-submit the job to gearman so it will be picked up again16:51
tobiashah, cool16:51
mordred*I think*16:51
clarkbyes16:51
mordredI could be quite wrong about that16:51
mordredyay!16:51
mordredI like it when I get things right16:51
clarkbif client goes away worker should go away and free things up. It may not do an explicit release to nodepool though16:52
clarkbbut the jobs should be rerun and node deleted eventually iirc16:52
clarkbin fact we occasionally just hard restart the executors, I think only real cost is in rerunning previously completed but unrecorded work16:53
mordredyah - I think nodepool will notice because the ownership lock on the nodes will go away perhaps?16:53
mordredclarkb: ++16:54
tobiashtoday I tried to get gating working with my shiny new v3 deployment16:56
tobiashI had issues with casing when comparing label names in the canMerge method in the gerrit driver16:56
tobiash(label.tolower() is compared to another label)16:57
tobiashwith newest gerrit we had issues in the gate if verified is spelled in lowercase while gerrit has it spelled Verified16:58
tobiashin this case zuul does "review --label verified=2 submit"16:58
tobiashgerrit then adds +2 to verified but doesn't submit16:59
tobiashthat's why we had to also spell the verified flag in the pipeline as Verified (which then breaks the canMerge check)16:59
mordredtobiash: oh - that's LOVELY17:01
tobiashso we could either enforce the correct casing and do case sensitive matches everywhere (without tolower normalization)17:01
mordredtobiash: that seems to be a behavior change with latest gerrit, right?17:01
tobiashyepp, gerrit upgrade broke it (also broke our v2 installation which I patched locally)17:02
tobiashwell latest gerrit at that point when we upgraded (2.13.5)17:03
*** isaacb has quit IRC17:04
tobiashso I would prefer to stick with case sensitive comparisons and don't do any case normalization anywhere17:04
tobiashif that's ok for you17:04
mordredtobiash: we could also do tolower() on both sides of the comparison - but it seems like perhaps enforcing correct case in the config is the best place to validate that zuul will then send the value to gerrit with the correct case17:04
tobiashyepp17:05
mordredtobiash: it seems to me like the thing which would be the least surprising - worthwhile nudging jeblair for his opinion on that one in case there's a case we haven't thought of17:05
tobiashok, I will propose a change for this tomorrow17:07
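
The comparison being proposed would boil down to something like this sketch (not the actual gerrit driver code, just the shape of the check): no case normalization anywhere, so the pipeline config has to spell the label exactly as Gerrit reports it:

def can_merge(approvals, required_label, required_value):
    """Case-sensitive approval check: 'Verified' must match exactly."""
    for approval in approvals:
        if (approval['label'] == required_label and
                int(approval['value']) >= required_value):
            return True
    return False

approvals = [{'label': 'Verified', 'value': '2'}]
print(can_merge(approvals, 'Verified', 2))  # True
print(can_merge(approvals, 'verified', 2))  # False: casing must match Gerrit's
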
tobiashbtw, with the help of jamielennox patches (https://review.openstack.org/#/q/topic:source-reporter) I now have a v3 setup connected to two gerrits with pipelines serving projects of both in a single queue :)17:11
mordredtobiash: WOOT!17:14
mordredtobiash: did you apply your suggested change to https://review.openstack.org/#/c/461981/5/zuul/driver/gerrit/gerritreporter.py ?17:16
tobiashmordred: a slightly changed variant of it I think17:17
mordredkk17:17
tobiashmordred: just combined the initial isinstance check with the canonical hostname check17:20
tobiashmordred: updated my comment with the real life solution17:23
tobiashI've also written a test case for this: https://review.openstack.org/#/c/468374/17:24
tobiashbuild should become green once this patch is fixed17:24
mordredtobiash: maybe we should just add your test case to https://review.openstack.org/#/c/461981 and also apply your fix there ... I doubt jamielennox would mind17:25
tobiashmordred: I didn't know if anyone could become upset so I just added the test case as a followup patch17:27
tobiashI also haven't seen jamielennox around since I left the review, so I didn't modify his patch without asking him17:28
mordredtobiash: fair enough - let's give it til folks are back tomorrow, and we can move it forward then17:30
tobiashok, sounds good17:30
tobiashis it expected that I cannot add a project (listed in untrusted-projects) initially to a gate pipeline just via proposing a .zuul.yaml?18:06
tobiashwhen I try this the gate pipeline rejects processing the change with 'zuul.DependentPipelineManager: Unable to find change queue for change ...'18:06
tobiashafter manually pushing the change and sighup'ing zuul the gate works as expected18:07
pabelangerjamielennox|away: are you still working on nodepool-webapp? https://review.openstack.org/#/c/44567518:49
pabelangerwsgi seems like a good approach18:50
SpamapStobiash: jamielennox|away is on holiday for a week. You may want to move forward now.18:56
SpamapSI'm sure he'll understand. :)18:56
tobiashSpamapS: good to know18:59
tobiashSpamapS: so what would be the correct approach? Uploading a new ps on his patch? What about the signed off line?19:00
openstackgerritPaul Belanger proposed openstack-infra/nodepool feature/zuulv3: Add console-log to config-validate  https://review.openstack.org/46894219:32
openstackgerritTobias Henkel proposed openstack-infra/zuul feature/zuulv3: Only report to gerrit if the action is from gerrit  https://review.openstack.org/46198119:36
tobiashSpamapS, mordred: hope I did it the correct way ^^^19:37
tobiashhm, needs a rebase19:38
openstackgerritTobias Henkel proposed openstack-infra/zuul feature/zuulv3: Remove source from reporter  https://review.openstack.org/46236219:41
openstackgerritTobias Henkel proposed openstack-infra/zuul feature/zuulv3: Remove source from reporter  https://review.openstack.org/46236220:04
openstackgerritTobias Henkel proposed openstack-infra/zuul feature/zuulv3: Only report to gerrit if the action is from gerrit  https://review.openstack.org/46198120:04
*** dkranz has quit IRC21:53
*** hashar has quit IRC22:53
*** jkilpatr has joined #zuul23:17
*** jkilpatr has quit IRC23:25
*** jkilpatr has joined #zuul23:48
*** jkilpatr has quit IRC23:55
