openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Implement a static driver for Nodepool https://review.openstack.org/468624 | 01:57 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Abstract Nodepool request handling code https://review.openstack.org/468748 | 01:57 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Abstract Nodepool provider management code https://review.openstack.org/468749 | 01:57 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Collect request handling implementation in an OpenStack driver https://review.openstack.org/468750 | 01:57 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Extend Nodepool configuration syntax to support multiple drivers https://review.openstack.org/468751 | 01:57 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Add support for custom ssh port https://review.openstack.org/468752 | 01:57 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Implement an OpenContainer driver https://review.openstack.org/468753 | 01:57 |
tobiash | tristanC: cool, will have a look :) | 05:44 |
*** bhavik1 has joined #zuul | 06:03 | |
*** hashar has joined #zuul | 07:11 | |
tristanC | tobiash: nice, feedback is most welcome! | 08:47 |
tristanC | you'll also find a runc driver that should spawn containers if you set a 'host' on the nodepool provider | 08:50 |
tobiash | tristanC: jepp, saw that, will have a closer look at it later | 08:51 |
*** smyers has quit IRC | 09:02 | |
*** rcarrill1 is now known as rcarrillocruz | 09:08 | |
*** smyers has joined #zuul | 09:13 | |
tobiash | tristanC: left some thoughts on https://review.openstack.org/#/c/468749 | 10:12 |
tobiash | regarding the api of the ProviderManager base class | 10:12 |
tobiash | and I just stumbled over the question of how we should handle min_ready, as this is a label-specific attribute | 10:13 |
tobiash | (which would not apply for static nodes) | 10:14 |
tobiash | tristanC, jeblair: I just had a thought about the planned config change for the nodepool drivers | 10:27 |
tobiash | tristanC, jeblair: https://review.openstack.org/#/c/468751/1/nodepool/config.py@208 | 10:28 |
tobiash | what do you think? | 10:28 |
tristanC | tobiash: absolutely, it sounds better to let drivers manage their own configurations | 10:50 |
tobiash | tristanC: so we should prototype an example config and check with jeblair if that sounds good | 10:54 |
tristanC | note that I tried to do surgical changes to get an implementation up and running; if it's validated it will still need unit tests and improvements regarding configuration | 10:56 |
tobiash | sure | 10:57 |
tobiash | tristanC: btw, I rethought my thoughts on https://review.openstack.org/#/c/468749/ | 10:57 |
tristanC | tobiash: Thanks, that feedback is much appreciated, I'll propose a new PS tomorrow | 10:59 |
tobiash | ok :) | 11:00 |
* tristanC is utc+9 :) | 11:04 | |
* tobiash is utc+2 | 11:06 | |
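To make the "let drivers manage their own configurations" idea from 468751 a bit more concrete, here is a rough sketch of a loader that hands each provider section to the driver named in it. The `driver` key, the static `nodes` list, and the overall layout are assumptions for illustration only; the actual syntax was still under discussion at this point.

```python
# Illustrative only: the 'driver' key and the provider layout below are
# guesses at where 468751 might end up, not the agreed nodepool syntax.
import yaml

EXAMPLE_CONFIG = """
providers:
  - name: my-cloud
    driver: openstack          # existing OpenStack behaviour
    cloud: mycloud
  - name: my-static-nodes
    driver: static             # hypothetical static driver section
    nodes:
      - name: bigbox01
        host: bigbox01.example.com
        ssh-port: 2222
"""

def group_providers_by_driver(raw_yaml):
    """Parse the config and let each driver interpret only its own sections."""
    config = yaml.safe_load(raw_yaml)
    by_driver = {}
    for provider in config.get('providers', []):
        # Default to 'openstack' so existing configs keep working unchanged.
        driver = provider.get('driver', 'openstack')
        by_driver.setdefault(driver, []).append(provider)
    return by_driver

if __name__ == '__main__':
    for driver, providers in group_providers_by_driver(EXAMPLE_CONFIG).items():
        print(driver, '->', [p['name'] for p in providers])
```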
*** bhavik1 has quit IRC | 11:12 | |
*** isaacb has joined #zuul | 11:49 | |
*** nt has quit IRC | 12:18 | |
*** isaacb has quit IRC | 12:39 | |
*** Guest79619 is now known as mgagne | 13:14 | |
*** mgagne has quit IRC | 13:14 | |
*** mgagne has joined #zuul | 13:14 | |
*** jroll has joined #zuul | 15:20 | |
*** isaacb has joined #zuul | 15:33 | |
mordred | tobiash, tristanC: so - a few thoughts to toss out (great initial work btw) ... | 16:12 |
mordred | (also, these are me just thinking out loud, don't necessarily take these as *real* thoughts) | 16:13 |
mordred | tobiash, tristanC: for a static node, it's possible that an admin might want to allow more than one job to "check out" the node at a time | 16:14 |
mordred | although 'minReady' isn't a good word for that, it may be a similar concept at some level | 16:14 |
mordred | but similarly - one could imagine even a cloud/vm node type where the admin might want to say "use this for up to X concurrent jobs" and also "don't delete this when done" (don't delete when done is a very important feature for static nodes, obviously, and although I do not think openstack would ever use it, perhaps it's a useful enough feature to be in a base class) | 16:16 |
pabelanger | ++ | 16:19 |
pabelanger | concurrent jobs would be awesome | 16:20 |
mordred | tristanC, tobiash: it also may be worth considering having the list of static nodes read from config at start and written to zk so that any of the node launchers can respond to a request from zuul for one of the static nodes ... otherwise rolling out config changes to a set of node launcher processes might get weird with them having out-of-sync versions of the config (you may already be doing that and I just | 16:22 |
mordred | missed it - I haven't read _deeply_) | 16:22 |
tobiash | mordred: good idea | 16:24 |
tobiash | so the nodes could have a slots property | 16:25 |
mordred | yah. I know it's been a thing folks have asked for in the past - especially folks who have static hardware like "I have this one giant openpower server i want to use for testing, and it could handle 100 concurrent jobs easily" | 16:26 |
mordred | tristanC: also - for the oci driver - it doesn't seem to expose a "base image" anywhere and instead would just assume a "base os" that matches the OS of the node running the nodelauncher (which I actually rather like - it's a neat way to provide a simple thing there) | 16:26 |
tobiash | we also have such a use case (one node managing a cluster of test hardware) | 16:27 |
mordred | tobiash: ++ yes - that's another great example of such a thing | 16:27 |
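For the "let one static node serve several concurrent jobs" idea, a config along these lines is one way it could look. The attribute names `max-parallel-jobs` and `delete-when-done` are made up purely to illustrate the concept and are not settled nodepool syntax.

```python
# Hypothetical attributes for a static node serving multiple concurrent jobs;
# nothing here reflects agreed nodepool configuration keys.
import yaml

STATIC_PROVIDER = yaml.safe_load("""
providers:
  - name: static-lab
    driver: static
    nodes:
      - name: openpower-huge
        host: openpower.example.com
        labels: [openpower-test]
        max-parallel-jobs: 100    # "check out" up to 100 jobs at once
        delete-when-done: false   # never reclaim/delete this node
""")

for node in STATIC_PROVIDER['providers'][0]['nodes']:
    slots = node.get('max-parallel-jobs', 1)
    print('%s can serve %d concurrent job(s)' % (node['name'], slots))
```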
mordred | tristanC: perhaps there should be some sort of matching in the zuul request protocol exposed by the nodelauncher that's running an oci driver? | 16:29 |
mordred | (or maybe some way of explicitly providing a label in the nodepool config) | 16:29 |
mordred | so that if you had a nodelauncher running on fedora and a nodelauncher running on ubuntu, in the zuul job you could have a nodeset that said "oci:fedora-24" or something | 16:30 |
mordred | and only a nodelauncher running the right base os would service that node request | 16:30 |
mordred | the other option, of course, would be also extending the image builder part to know how to make base oci images and plumb those through to the oci run portion | 16:31 |
mordred | it's also possible I'm missing a part of that completely ... :) | 16:31 |
tobiash | interesting, with the oci driver the nodepool launcher would actually *become* a pool of slaves | 16:32 |
mordred | yah. I hadn't really thought of doing it that way - but I could see the appeal | 16:33 |
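One possible shape for the label matching mordred describes, where a launcher running the oci driver only services requests matching its own base OS. The `oci:<os>` label format and the matching rule are invented for illustration; this was not an agreed design.

```python
# Invented label scheme: a request label like "oci:fedora-24" is only
# serviced by a launcher whose base OS matches the part after "oci:".
def handles_label(request_label, launcher_base_os):
    """Return True if this oci launcher should service the request."""
    if not request_label.startswith('oci:'):
        return False
    wanted = request_label.split(':', 1)[1]
    return wanted.startswith(launcher_base_os)

# A launcher on fedora takes the fedora request and leaves the ubuntu one
# for a launcher whose base OS matches it.
print(handles_label('oci:fedora-24', 'fedora'))        # True
print(handles_label('oci:ubuntu-xenial', 'fedora'))    # False
```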
tobiash | but back to the question of how to manage static slaves with multiple launchers | 16:34 |
tobiash | wouldn't reading the config and pushing it to zk also be racy? | 16:35 |
tobiash | a different possibility would be to allow adding additional nodes to zookeeper at runtime, e.g. via the cli | 16:36 |
mordred | tobiash: it shouldn't be - when a node notices a config change, it could grab the static-hosts lock, then write the static host config to zk | 16:37 |
mordred | perhaps with a checksum/version so that other nodes who notice a config change can do a quick check when they get the lock to see if the config is already valid | 16:38 |
tobiash | mordred: that implies that config changes are rolled out (almost) simultaneously without a launcher (re)start in between | 16:38 |
mordred | yes. launchers should not need a restart to notice config changes - and then the only work the static 'launcher' is performing is writing the node info to zk | 16:39 |
mordred | then servicing the node requests can be as normal "do I have a matching node available in zk, if so, return to zuul, if not error" and can be done by anyone | 16:40 |
tobiash | mordred: ok, that needs to be kept in mind then (thinking of running the launcher in k8s where a container restart could happen at any time) | 16:41 |
mordred | ++ - I think if we consider a container restart and "I noticed a config change" as the same thing - then a launcher restart and a config change event both essentially just perform the "verify my config is in sync with zk" | 16:42 |
tobiash | but writing the config sounds cool as all you have to do to support static nodes with several slots is to write several nodes into zk :) | 16:42 |
mordred | tobiash: oh - that's a great idea/way to support that | 16:42 |
tobiash | mordred: that race condition possibly could be solved by storing the hashes of all (last x) configs in zk so a container restart could check if it has an older version of the config | 16:44 |
mordred | tobiash: in the case you mention, we should also probably protect against ... yup, that's where I was going | 16:44 |
tobiash | that could be a good solution | 16:46 |
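Pulling the last few ideas together (grab the static-hosts lock, compare a config hash, write one zk node per slot), here is a sketch of what the static "launcher" side could do on startup or on a config change. The zookeeper paths, the data layout, and the hash znode are all assumptions; this is nothing like eventual nodepool code, just the shape of the idea expressed with kazoo.

```python
# Assumed layout: static node entries under /nodepool/static/nodes and a
# /nodepool/static/config-hash znode guarded by a lock; all invented here.
import hashlib
import json

from kazoo.client import KazooClient

STATIC_NODES = [
    {'name': 'openpower-huge', 'host': 'openpower.example.com',
     'labels': ['openpower-test'], 'slots': 100},
]

def sync_static_nodes(zk, nodes, base='/nodepool/static'):
    payload = json.dumps(nodes, sort_keys=True).encode('utf-8')
    digest = hashlib.sha256(payload).hexdigest().encode('utf-8')

    # Any launcher that notices a config change (or restarts) runs this; the
    # lock serializes writers and the hash makes repeat runs cheap no-ops.
    with zk.Lock(base + '/config-lock', 'static-config-writer'):
        if zk.exists(base + '/config-hash'):
            current, _ = zk.get(base + '/config-hash')
            if current == digest:
                return  # someone already wrote this exact config
        for node in nodes:
            for slot in range(node.get('slots', 1)):
                # One znode per slot, so a 100-slot static host can satisfy
                # up to 100 node requests at the same time.
                path = '%s/nodes/%s-%03d' % (base, node['name'], slot)
                data = json.dumps(node).encode('utf-8')
                if zk.exists(path):
                    zk.set(path, data)
                else:
                    zk.create(path, data, makepath=True)
        if zk.exists(base + '/config-hash'):
            zk.set(base + '/config-hash', digest)
        else:
            zk.create(base + '/config-hash', digest, makepath=True)

if __name__ == '__main__':
    client = KazooClient(hosts='127.0.0.1:2181')
    client.start()
    sync_static_nodes(client, STATIC_NODES)
    client.stop()
```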
tobiash | btw, is the nodepool protocol protected against sudden launcher deaths? | 16:46 |
mordred | yes, I believe that should be covered | 16:47 |
tobiash | e.g. nodepool takes a lock (to delete a node) then dies -> could this leak an instance? | 16:47 |
mordred | if a launcher has claimed a noderequest and then goes away, its lock on the noderequest will go away and another launcher can pick up the request and fulfill it | 16:47 |
tobiash | ah, I see | 16:47 |
tobiash | the lock of the launcher vanishes with the launcher | 16:48 |
mordred | yup | 16:48 |
mordred | that's one of the things we gain in v3 | 16:48 |
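As a tiny illustration of why "the lock of the launcher vanishes with the launcher": kazoo locks are built on ephemeral znodes, so when the session that holds the lock ends (for example because the launcher process dies), the lock is released automatically. The request path below is made up.

```python
# Made-up request path; the point is only that the lock's ephemeral znode
# disappears when the zk session that created it goes away.
from kazoo.client import KazooClient

REQUEST_LOCK = '/nodepool/requests/100-0000000001/lock'

launcher = KazooClient(hosts='127.0.0.1:2181')
launcher.start()
lock = launcher.Lock(REQUEST_LOCK, 'launcher-1')
lock.acquire()

observer = KazooClient(hosts='127.0.0.1:2181')
observer.start()
print(observer.get_children(REQUEST_LOCK))  # one contender holds the lock

launcher.stop()    # simulate the launcher dying: its zk session ends...
launcher.close()
print(observer.get_children(REQUEST_LOCK))  # ...and the lock entry is gone

observer.stop()
observer.close()
```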
tobiash | :) | 16:48 |
tobiash | similar for executors in v4 will be cool | 16:48 |
tobiash | just had a jenkins oom today (v2 deployment) that left a bunch of stale jobs behind... | 16:49 |
mordred | actually - I believe we're already good on executors in v3 ... same thing - if an executor dies after having been scheduled with a job, it should get rescheduled | 16:49 |
mordred | but I might be wrong about that | 16:50 |
tobiash | how does that work? | 16:50 |
tobiash | are executors in v3 still scheduled via gearman? | 16:50 |
tobiash | so does gearman have a means to deal with this? | 16:50 |
mordred | they are - but I believe if the worker claiming a gearman job dies, the scheduler will notice that - and in v3 I believe it will re-submit the job to gearman so it will be picked up again | 16:51 |
tobiash | ah, cool | 16:51 |
mordred | *I think* | 16:51 |
clarkb | yes | 16:51 |
mordred | I could be quite wrong about that | 16:51 |
mordred | yay! | 16:51 |
mordred | I like it when I get things right | 16:51 |
clarkb | if the client goes away the worker should go away and free things up. It may not do an explicit release to nodepool though | 16:52 |
clarkb | but the jobs should be rerun and the node deleted eventually iirc | 16:52 |
clarkb | in fact we occasionally just hard restart the executors, I think the only real cost is rerunning previously completed but unrecorded work | 16:53 |
mordred | yah - I think nodepool will notice because the ownership lock on the nodes will go away perhaps? | 16:53 |
mordred | clarkb: ++ | 16:54 |
tobiash | today I tried to get gating working with my shiny new v3 deployment | 16:56 |
tobiash | I had issues with casing when comparing label names in the canMerge method in the gerrit driver | 16:56 |
tobiash | (label.tolower() is compared to another label) | 16:57 |
tobiash | with the newest gerrit we had issues in the gate if verified is spelled in lowercase while gerrit has it spelled Verified | 16:58 |
tobiash | in this case zuul does "review --label verified=2 submit" | 16:58 |
tobiash | gerrit then adds +2 to verified but doesn't submit | 16:59 |
tobiash | that's why we had to spell the verified flag as Verified in the pipeline as well (which then breaks the canMerge check) | 16:59 |
mordred | tobiash: oh - that's LOVELY | 17:01 |
tobiash | so we could either enforce the correct casing and do case sensitive matches everywhere (without tolower normalization) | 17:01 |
mordred | tobiash: that seems to be a behavior change with the latest gerrit, right? | 17:01 |
tobiash | yepp, gerrit upgrade broke it (also broke our v2 installation which I patched locally) | 17:02 |
tobiash | well, the latest gerrit at the point when we upgraded (2.13.5) | 17:03 |
*** isaacb has quit IRC | 17:04 | |
tobiash | so I would prefer to stick with case sensitive comparisons and not do any case normalization anywhere | 17:04 |
tobiash | if that's ok for you | 17:04 |
mordred | tobiash: we could also do tolower() on both sides of the comparison - but it seems like perhaps enforcing correct case in the config is the best place to validate that zuul will then send the value to gerrit with the correct case | 17:04 |
tobiash | yepp | 17:05 |
mordred | tobiash: that seems to me like the thing which would be the least surprising - worth nudging jeblair for his opinion on that one in case there's a case we haven't thought of | 17:05 |
tobiash | ok, I will propose a change for this tomorrow | 17:07 |
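A toy sketch of the direction tobiash describes: drop the lowercasing so a mis-cased label in the pipeline config fails loudly instead of silently sending the wrong case to gerrit. This is not the actual zuul gerrit driver code; the function name and data shapes are invented for illustration.

```python
# Invented data shapes: 'votes' is what gerrit reports for the change,
# 'required' is what the pipeline config asks for, exactly as written.
def can_merge(votes, required, min_value=2):
    """Case-sensitive label check: 'verified' does NOT match 'Verified'."""
    for label in required:
        # No .lower() normalization on either side, so a pipeline configured
        # with the wrong casing fails here instead of later, when
        # "gerrit review --label <label>=2 submit" silently doesn't submit.
        if votes.get(label, 0) < min_value:
            return False
    return True

assert can_merge({'Verified': 2, 'Code-Review': 2}, ['Verified', 'Code-Review'])
assert not can_merge({'Verified': 2, 'Code-Review': 2}, ['verified'])  # wrong case rejected
```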
tobiash | btw, with the help of jamielennox patches (https://review.openstack.org/#/q/topic:source-reporter) I now have a v3 setup connected to two gerrits with pipelines serving projects of both in a single queue :) | 17:11 |
mordred | tobiash: WOOT! | 17:14 |
mordred | tobiash: did you apply your suggested change to https://review.openstack.org/#/c/461981/5/zuul/driver/gerrit/gerritreporter.py ? | 17:16 |
tobiash | mordred: a slightly changed variant of it, I think | 17:17 |
mordred | kk | 17:17 |
tobiash | mordred: just combined the initial isinstance check with the canonical hostname check | 17:20 |
tobiash | mordred: updated my comment with the real life solution | 17:23 |
tobiash | I've also written a test case for this: https://review.openstack.org/#/c/468374/ | 17:24 |
tobiash | the build should become green once this patch is fixed | 17:24 |
mordred | tobiash: maybe we should just add your test case to https://review.openstack.org/#/c/461981 and also apply your fix there ... I doubt jamielennox would mind | 17:25 |
tobiash | mordred: I didn't know if anyone would be upset, so I just added the test case as a followup patch | 17:27 |
tobiash | I also haven't seen jamielennox around (only away) since I left the review, so I didn't modify his patch without asking him | 17:28 |
mordred | tobiash: fair enough - let's give it til folks are back tomorrow, and we can move it forward then | 17:30 |
tobiash | ok, sounds good | 17:30 |
tobiash | is it expected that I cannot add a project (listed in untrusted-projects) initially to a gate pipeline just by proposing a .zuul.yaml? | 18:06 |
tobiash | when I try this the gate pipeline rejects processing the change with 'zuul.DependentPipelineManager: Unable to find change queue for change ...' | 18:06 |
tobiash | after manually pushing the change and sighup'ing zuul the gate works as expected | 18:07 |
pabelanger | jamielennox|away: are you still working on nodepool-webapp? https://review.openstack.org/#/c/445675 | 18:49 |
pabelanger | wsgi seems like a good approach | 18:50 |
SpamapS | tobiash: jamielennox|away is on holiday for a week. You may want to move forward now. | 18:56 |
SpamapS | I'm sure he'll understand. :) | 18:56 |
tobiash | SpamapS: good to know | 18:59 |
tobiash | SpamapS: so what would be the correct approach? Uploading a new PS on his patch? What about the Signed-off-by line? | 19:00 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Add console-log to config-validate https://review.openstack.org/468942 | 19:32 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Only report to gerrit if the action is from gerrit https://review.openstack.org/461981 | 19:36 |
tobiash | SpamapS, mordred: hope I did it the correct way ^^^ | 19:37 |
tobiash | hm, needs a rebase | 19:38 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Remove source from reporter https://review.openstack.org/462362 | 19:41 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Remove source from reporter https://review.openstack.org/462362 | 20:04 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Only report to gerrit if the action is from gerrit https://review.openstack.org/461981 | 20:04 |
*** dkranz has quit IRC | 21:53 | |
*** hashar has quit IRC | 22:53 | |
*** jkilpatr has joined #zuul | 23:17 | |
*** jkilpatr has quit IRC | 23:25 | |
*** jkilpatr has joined #zuul | 23:48 | |
*** jkilpatr has quit IRC | 23:55 |