Thursday, 2019-08-29

SpamapSHow would people feel about some added cloud-image features for AWS to allow us to specify a set of tags and account ids rather than AMI ids?00:59
SpamapSNot getting much time to implement nodepool-builder for AMIs... but I can add some boto3 filtering in a few lines.00:59
SpamapSAlternatively: How would people feel about adding packer support to nodepool-builder? ;-)01:00
clarkbSpamapS: I think you can already overload the builder command but the way that is done you have to accept the same args as dib01:01
clarkbI think we should do another followup to make it fully configurable, though figuring out how to do that will likely involve auditing other builders01:02
clarkbthen as long as the files end up in the right spots it should work, maybe?01:02
SpamapSclarkb: yeah maybe I'll just be the first person who tries and fixes the bugs ;)01:04
SpamapSWe're starting to build custom images and just manually adjusting the AMI IDs... but... that sucks.01:05
SpamapSWould be much simpler if we could just say "find the image owned by account id # xxxxx with tag Name==ubuntu1804-local"01:06
SpamapSor if we could just drop a command in the nodepool-builder config.. that would be zomg wtf amazing01:06
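A purely hypothetical sketch of the provider config SpamapS is asking for; the image-filters block below is invented for illustration and was not a real attribute of nodepool's AWS driver at the time:

    providers:
      - name: aws-main              # hypothetical provider
        driver: aws
        region-name: us-west-2
        cloud-images:
          - name: ubuntu1804-local
            # hypothetical: look the AMI up by owning account and tag
            # instead of hard-coding an ami-id
            image-filters:
              owner-id: "123456789012"
              tags:
                Name: ubuntu1804-local
        pools:
          - name: main
            labels:
              - name: ubuntu1804-local
                cloud-image: ubuntu1804-local
                instance-type: t3.medium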
*** jamesmcarthur has joined #zuul01:17
*** michael-beaver has quit IRC01:19
*** noorul has joined #zuul01:23
ianwyeah, i think tobiash was already intercepting disk-image-create and sending it off to do something else01:28
ianwthat was the idea behind putting in dib-cmd as a diskimage option01:29
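A minimal sketch of the dib-cmd idea ianw describes (the wrapper path is hypothetical); per clarkb above, whatever command is substituted has to accept disk-image-create style arguments and leave its output files where nodepool-builder expects them:

    diskimages:
      - name: ubuntu-bionic
        # invoked in place of disk-image-create, with the same args
        dib-cmd: /usr/local/bin/my-image-build-wrapper
        elements:
          - ubuntu-minimal
          - vm
        env-vars:
          DIB_RELEASE: bionic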
*** noorul has quit IRC01:33
*** noorul has joined #zuul01:41
*** jamesmcarthur has quit IRC01:48
*** jamesmcarthur has joined #zuul01:49
*** jamesmcarthur has quit IRC01:49
*** jamesmcarthur has joined #zuul01:49
*** noorul has quit IRC01:49
*** noorul has joined #zuul01:53
noorulIs it possible to dynamically increase and decrease nodepool's pool size?01:57
*** jamesmcarthur has quit IRC02:06
clarkbit will respect quota sizing on openstack clouds02:08
clarkbthat means if you change the quota limits nodepool will respect them02:09
*** jamesmcarthur has joined #zuul02:18
*** openstackgerrit has quit IRC02:37
*** jamesmcarthur has quit IRC02:37
*** bhavikdbavishi has joined #zuul02:43
*** noorul has quit IRC02:46
*** jamesmcarthur has joined #zuul02:49
*** jamesmcarthur has quit IRC02:59
*** noorul has joined #zuul02:59
*** noorul has quit IRC03:01
*** bhavikdbavishi1 has joined #zuul03:02
*** bhavikdbavishi has quit IRC03:03
*** bhavikdbavishi1 is now known as bhavikdbavishi03:03
*** noorul has joined #zuul03:42
noorulclarkb: So there is no dynamic config loading for nodepool at the nodepool level03:43
clarkbnodepool will reload its config off of disk each time through its loop03:44
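So resizing a pool comes down to editing the nodepool config file; a minimal sketch (provider and label names hypothetical):

    providers:
      - name: my-cloud
        driver: openstack
        cloud: my-cloud
        pools:
          - name: main
            # change this value and nodepool picks it up on its next
            # pass through its run loop; no restart needed
            max-servers: 10
            labels:
              - name: ubuntu-bionic
                flavor-name: m1.large
                diskimage: ubuntu-bionic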
noorulI have a base job definition http://paste.openstack.org/show/766657/03:44
noorulclarkb: Oh I see03:44
noorulclarkb: So I just have to update the config by gitops03:44
noorulIn the initial version playbooks/base/post.yaml was not there as part of post-run03:45
noorulI added that now03:45
noorulBut after restarting both the scheduler and executor, the new PRs are not running that playbook03:46
noorulAm I missing something?03:46
clarkbI'm not sure. The bitbucket driver is new; could be a bug03:46
noorulAre you suspecting bitbucket driver here?03:47
clarkbif that is where that config is loaded then maybe03:47
noorulI thought the driver has minimal role03:47
clarkbwell that config comes from git via that driver03:47
clarkbunless you load that config from elsewhere03:47
clarkbbut restarting the scheduler should reload all configs03:47
noorulI restarted both the scheduler and executor03:48
noorulWhere on disk does zuul store the latest config?03:48
clarkbit is loaded into memory03:48
noorulI see03:49
clarkbbut the source of that is your git repo's yaml files03:49
clarkbthat commit was merged?03:49
noorulYes03:50
noorulIt got merged and now it is in master03:50
clarkbyou can check if the commit is present in your git repos /var/lib/zuul/executor-git I think03:51
noorulhttp://paste.openstack.org/show/766658/03:52
noorulThat folder has that03:53
noorulhttp://paste.openstack.org/show/766659/03:53
clarkbI would expect it to be loaded then03:54
noorulI think now it is working04:13
noorulhttp://paste.openstack.org/show/766660/04:14
noorulclarkb: It looks like it is not picking up the nodepool_rsa private key04:22
* noorul is going crazy with Zuul's key management04:22
* SpamapS peeks04:27
SpamapSnoorul: This part of it isn't so hard... you just have a private key that's on the executors, and the public portion should be registered in your cloud provider.04:28
noorulSpamapS: I am using static driver04:48
noorulSpamapS: My nodepool instance is running on the same machine04:48
noorulSpamapS: When I look at /var/lib/zuul which is the home folder for zuul user04:48
noorulSpamapS: I see under .ssh folder id_rsa and nodepool_rsa04:49
noorulI assume that rsync is running on the executor04:53
noorulas the zuul user. If so, it will use the id_rsa private key by default.04:53
noorulI am not sure where it came from. Who is responsible for adding id_rsa.pub to authorized_keys of nodepool instances?04:54
noorulSpamapS: ^^^04:56
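For the static driver nothing installs keys automatically: whoever manages the host has to put the executor's public key into the login user's ~/.ssh/authorized_keys themselves. A minimal sketch of the nodepool side (host and label names hypothetical):

    providers:
      - name: static-nodes
        driver: static
        pools:
          - name: main
            nodes:
              - name: node01.example.com
                labels:
                  - static-node
                # the executor sshes in as this user, so the matching
                # public key must already be in this user's
                # authorized_keys on the host
                username: zuul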
*** jamesmcarthur has joined #zuul05:00
*** jamesmcarthur has quit IRC05:04
*** raukadah is now known as chandankumar05:09
*** openstackgerrit has joined #zuul05:15
openstackgerritIan Wienand proposed zuul/zuul master: executor: always setup an inventory, even for blank node list  https://review.opendev.org/67918405:15
*** badboy has joined #zuul05:16
*** bjackman has joined #zuul05:47
openstackgerritIan Wienand proposed zuul/zuul master: executor: always setup an inventory, even for blank node list  https://review.opendev.org/67918406:10
*** noorul has quit IRC06:46
*** tosky has joined #zuul07:23
*** jpena|off is now known as jpena07:40
*** themroc has joined #zuul07:53
*** sshnaidm|afk is now known as sshnaidm|ruck08:20
*** tdasilva has joined #zuul08:38
tobiashzuul-maint: is anyone already running with the json log appending change? (https://review.opendev.org/676717)08:50
tobiashI'm wondering if I need to take special care when updating zuul08:50
*** hashar has joined #zuul09:00
corvusclarkb, fungi, mnaser: all things considered, i think we should simplify the log urls for swift and just put them in build_uuid[:2]/build_uuid/ for all jobs.  i think the web ui is sufficient for navigating between builds.09:22
*** sshnaidm|ruck is now known as sshnaidm|afk09:44
*** bhavikdbavishi has quit IRC10:02
openstackgerritFabien Boucher proposed zuul/zuul master: Add reference pipelines file for Github driver  https://review.opendev.org/67271210:28
*** sshnaidm|afk is now known as sshnaidm|ruck10:33
*** bjackman has quit IRC10:40
AJaeger_tobiash: clarkb restarted opendev yesterday with the json log appending change AFAIU - I haven't seen any problems mentioned but backscroll here (or #openstack-infra) from last night should mention any issues11:04
*** gtema has joined #zuul11:04
tobiashcool, thanks11:07
*** jpena is now known as jpena|lunch11:35
*** badboy has quit IRC11:59
*** rfolco has joined #zuul12:28
*** rlandy has joined #zuul12:34
*** gtema has quit IRC12:40
*** jpena|lunch is now known as jpena12:41
*** jeliu_ has joined #zuul12:42
openstackgerritFabien Boucher proposed zuul/zuul master: A Zuul reporter for Elasticsearch  https://review.opendev.org/64492712:42
fungicorvus: makes sense to me, though clarkb also suggested [:3] if we want to shard across 4k containers instead of only 25612:43
*** panda|rover|off is now known as panda|rover12:53
*** zbr has quit IRC13:08
*** zbr has joined #zuul13:16
*** brendangalloway has joined #zuul13:18
openstackgerritTobias Henkel proposed zuul/zuul master: Add support for smart reconfigurations  https://review.opendev.org/65211413:23
openstackgerritTobias Henkel proposed zuul/zuul master: Add --check-config option to zuul scheduler  https://review.opendev.org/54216013:23
brendangallowayHi, we've been looking at ways to request a nodeset across multiple connected providers.  Is there any way to do this with existing configuration options?13:26
Shrewsbrendangalloway: Not with the current nodepool. We specifically do not allow that.13:27
brendangallowayAny suggestions for how to work within that restriction?  We need to be able to access static nodes and openstack nodes within the same job13:29
jangutterShrews: would there be interest if we built something to kind-of aggregate providers, specifically for labs that need cross-provider nodesets?13:30
pabelangerbrendangalloway: you could use add_host at job run time to do it13:30
pabelangerwhich means, keeping the static node outside of nodepool13:30
Shrewsbrendangalloway: oh, you mean across multiple drivers, not providers?13:30
ShrewsI'm not aware of any restrictions to using multiple nodepool drivers simultaneously.13:31
brendangallowayYes - sorry if my terminology is not 100% accurate13:32
brendangallowaynodepool returns a node error when we try to define a nodeset with both a static node and an openstack node13:32
Shrewsbrendangalloway: just define one set of labels for static nodes, and another for the openstack nodes. Should work, afaik13:32
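A sketch of the label layout Shrews describes, one label per driver (all names hypothetical); note tobiash's point just below that a single nodeset still cannot mix the two:

    labels:
      - name: static-bare-metal
      - name: ubuntu-bionic

    providers:
      - name: static-rack
        driver: static
        pools:
          - name: main
            nodes:
              - name: host01.example.com
                labels:
                  - static-bare-metal
                username: zuul
      - name: my-openstack
        driver: openstack
        cloud: my-cloud
        pools:
          - name: main
            max-servers: 10
            labels:
              - name: ubuntu-bionic
                flavor-name: m1.large
                diskimage: ubuntu-bionic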
Shrewsthough i don't know anyone that actually does that. maybe tobiash does?13:33
Shrewsbrendangalloway: is there a traceback in the logs?13:33
jangutterShrews: can you have one provider with multiple drivers?!13:33
tobiashmixing static and dynamic nodes in a single nodeset is not possible13:33
tobiashbecause that always will be multiple providers13:34
brendangallowaylooking at the logs, each driver is given the request ['static', 'openstack'], which none are able to provide individually13:34
Shrewstobiash: ah, i guess because we always try to group to the same provider. yeah, i suppose that's right13:34
jangutterShrews, tobiash: yah, we work around it by using Ironic to spawn baremetals, but we're hitting a limit where we need the static nodes, unfortunately.13:35
Shrewsi think pabelanger's suggestion might work for you then13:35
Shrewssorry, still working on injecting coffee  :)13:37
tobiashone nodeset is always one node request and one node request can only be satisfied by one provider. That's how nodepool works at the moment.13:38
tobiashso to combine static and dynamic node you'd need two jobs13:39
pabelangerbrendangalloway: https://github.com/ansible-network/windmill-config/blob/master/tests/playbooks/bastion.yaml is an example of how to add a static node dynamically13:39
pabelangervia ansible13:39
pabelangerhowever, you'll then need to set a semaphore on the job, to avoid multiple jobs using the same static node13:39
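A stripped-down sketch of that add_host approach in a job playbook (host name, address, and user are hypothetical):

    - hosts: localhost
      tasks:
        - name: Add the static node to the inventory at run time
          add_host:
            name: bastion01.example.com
            groups: static
            ansible_host: 203.0.113.10
            ansible_user: zuul

    - hosts: static
      tasks:
        - name: Do the real work on the static node
          command: uname -a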
janguttertobiash: I was thinking of modifying nodepool to build a driver that specifically aggregates non-overlapping providers into one provider for zuul.13:40
tobiasha meta-driver?13:40
janguttertobiash: yeah - allowing you to specify that "these providers belong to the same site".13:40
tobiashthat won't solve the problem because each provider is actually split into pools, and the pools are what count for the algorithm13:41
brendangallowaypabelanger: Thanks, I'll take a look at that.  I'm guessing it works by name and not by label?13:41
tobiashso to really solve that one would need to completely redesign the algorithm for how nodepool satisfies node requests13:41
pabelangerbrendangalloway: right, the static node isn't even in nodepool13:42
pabelangerso you need to manage that info at job run time13:42
janguttertobiash: even if the labels are guaranteed not to overlap?13:42
tobiashjangutter: the current algorithm is: take node request, satisfy it completely or decline it; if you're the last one to decline, fail the request13:42
tobiashthis is done by all providers (actually poolworkers) in parallel13:43
brendangallowaypabelanger: We were hoping to leverage nodepool as a host booking system rather than needing to build our own13:43
tobiashso to solve this you'd need a way to partially satisfy a node request and let another poolworker continue13:43
janguttertobiash: right, but in my scenario, if a provider is a "slave provider" (i.e. has been registered as belonging to a site), it never tells zuul directly it can satisfy a request.13:43
pabelangerbrendangalloway: agree, I think that is the right approach; in my case, I didn't want other jobs to request this nodeset, which was the reason to keep it out13:44
janguttertobiash: but I see what you mean, it introduces a lock - the site has to wait for all pools to come back before it can say "I can satisfy this".13:44
tobiashwhat you could do in a meta provider (but that's an ugly hack): you could take the node request, create a secondary node request for each label in the original node request, wait for all to be satisfied, attach those nodes to the original node request and satisfy it13:48
tobiashI think this could work13:50
janguttertobiash: that's some really next-level recursion there!13:53
tobiashbut I think this should work13:53
brendangallowaypabelanger:  what happens if you have more than one instance of the job triggered?13:53
tobiashbrendangalloway: in this case you're doomed ;)13:54
tobiashbrendangalloway: but you can avoid this by using a semaphore on the job13:55
*** tflink has quit IRC13:55
brendangallowaypabelanger: we're trying really hard to undoom ourselves :p13:55
Shrewstobiash: you're thinking of the meta provider as a separate driver? that's an interesting thought13:57
*** mordred has quit IRC13:57
janguttertobiash: not knowing much about semaphores, but can you tell in the job that "hey, I've been given semaphore 3 out of 4"?13:57
*** tflink has joined #zuul13:57
tobiashcorvus, fungi: re executor performance, I'm wondering if we should document some performance requirements regarding the executors (e.g. on large deployments put executor on ssd + use plenty of ram)?13:57
tobiashShrews: yes, a separate driver that maybe could be configured with supported labels13:58
pabelangerbrendangalloway: yah, without a semaphore, more than one could result in odd job results13:58
tobiashjangutter: https://zuul-ci.org/docs/zuul/user/config.html#attr-job.semaphore13:59
tobiasha semaphore tells zuul how many jobs referencing that semaphore can run in parallel13:59
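In Zuul config that looks roughly like this (names hypothetical); with max: 1, every job referencing the semaphore runs serially, which is what protects a single static node:

    - semaphore:
        name: static-node-01
        max: 1

    - job:
        name: deploy-to-static-node
        semaphore: static-node-01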
Shrewstobiash: the dodgy part would be preventing the meta driver from handling its own sub-node-requests13:59
*** mordred has joined #zuul13:59
Shrewsbut, yeah, might be possible14:00
tobiashShrews: right, but it could do that easily by just not handling node requests with one node type14:00
jangutterShrews, tobiash: yeah, I had the idea of a field that you set in all the providers that says: "hey, I'm a slave to site <x>, I don't satisfy requests directly".14:01
Shrewswell, it would have to handle ALL node types. but, it could set the requestor field and just decline those requests that came from itself14:01
tobiashI think a meta driver wouldn't make much sense for handling requests with a single type (in this case it would just be proxying the request without benefit) so it could be limited to only support multi-label requests and decline all the others14:03
tobiashbut I guess we should still use the requestor field for that too14:04
Shrewstobiash: yeah, but that requires changing the common driver code (i think), and i wouldn't approve any driver that needed to change that (without a very good reason)14:04
jangutterShrews, and I guess bribery is not a very good reason :-p14:05
Shrewsjangutter: in any case, you're in unknown territory here, so there may be dragons we haven't yet thought of  :)14:05
jangutterShrews, dragons, grues, basilisks... we're kinda used to 'em.14:06
tobiashShrews: right, I thought that part was not in the common driver code but it is14:07
tobiashthen I agree that it would need to handle all types14:07
Shrewstobiash: so hard to remember such complex code  :)  especially before 2nd cup of coffee!14:08
tobiashmy coffee was already hours ago...14:08
jangutterOK, so let's assume this turns out to require common driver code mods - is there a spec process or something where we can suss out the details?14:09
Shrewsjangutter: no, just discuss here for that14:10
jangutterOkay! I'll see if I can come up with some code blasphemy that can be used as a discussion point.14:11
Shrewsjangutter: we haven't really spec'd out the common driver code (it just developed as we added more drivers), but as it has proven very stable at this point, changes to it must be carefully considered14:11
jangutterShrews: yep, core plumbing is core for a reason.14:12
*** sgw has quit IRC14:12
brendangallowayCircling back to jangutter's previous question: another, simpler hack (for our specific use case at least) would be if there were some way for a job to know which part of a semaphore it is?14:14
jangutterYah, is the job semaphore value in zuul vars somewhere?14:14
brendangallowayadd_host 1 when the job is number one in the queue, add_host 2 if it is number 214:14
brendangallowayautomatically linking lab state to job add_host calls will probably require some maintenance, but it will at least prevent zuul jobs from stomping on the static nodes14:16
*** sgw has joined #zuul14:31
clarkbzuulians I think the executor restarts on opendev have been happy with 50a0f736a3de29280e459cccc4c25a46cba9570e looking at git log since the last release most of the changes have been against web and the executor14:37
clarkbI'm thinking we should tag that commit as 3.10.2 or 3.11 (haven't checked for the appropriate semver bump). If we'd prefer, we can also restart opendev's scheduler and mergers and do the release on, say, Monday or Tuesday though14:38
*** themroc has quit IRC14:42
*** jamesmcarthur has joined #zuul14:42
tobiashmaybe the zuul-web changes justify 3.1114:49
pabelangerzuul for workflows 3.1114:49
fungiyikes14:50
fungii can't un-see that now14:51
pabelangerclarkb: we need some reno notes first I think14:52
pabelangerwe have nothing to indicate need for next release14:53
clarkbI can write one for the bug fix, but someone else might be able to better capture the web updates14:53
* clarkb writes bug relnote14:53
tobiashare the zuul web updates complete enough to do a release?14:55
openstackgerritDavid Shrewsbury proposed zuul/zuul master: Add autohold delete/info commands to web API  https://review.opendev.org/67905714:59
openstackgerritClark Boylan proposed zuul/zuul master: Add release note for bug fix to test correct commit  https://review.opendev.org/67928515:00
*** bhavikdbavishi has joined #zuul15:00
clarkbpabelanger: ^15:00
clarkbtobiash: I feel like they've been working for opendev15:00
clarkbwe know there are things to improve but they are definitely usable15:00
*** bhavikdbavishi has quit IRC15:04
pabelanger+215:05
*** jamesmcarthur has quit IRC15:07
*** jamesmcarthur has joined #zuul15:07
*** bhavikdbavishi has joined #zuul15:11
clarkbhttps://twitter.com/GerritReview/status/116705957109501542515:20
clarkbif you look closely you might see a familiar logo15:20
Shrewsvolvo!!!15:23
Shrewsno?15:23
Shrewsjfrog?15:23
clarkbthat's the one :P15:23
Shrewslooks like they've posted a tweet featuring some water-loving nomad as well15:25
*** sshnaidm|ruck is now known as sshnaidm|afk15:26
clarkbyou almost can't recognize him without the scuba gear on15:27
Shrewshe *totally* should have presented in his wet suit15:27
* Shrews must afk for a bit15:28
*** chandankumar is now known as raukadah15:40
*** brendangalloway has quit IRC15:44
openstackgerritFabien Boucher proposed zuul/zuul master: Add reference pipelines file for Github driver  https://review.opendev.org/67271215:57
*** jpena is now known as jpena|off16:11
*** sshnaidm|afk is now known as sshnaidm|off16:39
*** igordc has joined #zuul16:44
*** jamesmcarthur has quit IRC16:50
SpamapSIs there any way to restrict a role to trusted-only? I'm asking because I'm trying to figure out how to make a role to allow a user to SSH in, but I only want them to be able to get in if their key is in a whitelist of keys... since vars: is passed as extravars .. I think they could always override any value unless the role is always run in a trusted context.16:56
* SpamapS will propose the role so you can see16:56
openstackgerritClint 'SpamapS' Byrum proposed zuul/zuul-jobs master: WIP: intercept-job -- self-service SSH access  https://review.opendev.org/67930616:58
pabelangerSpamapS: rather than a role directly, make it a job with final: true? leave it in a config-project and the variable can't change?17:00
clarkbya I think roles are intentionally meant to be reusable17:00
clarkbyou could also set it up to take a secret; then only jobs in the trusted repo would have access to the secret17:00
pabelangeryah, that might work too17:01
clarkbhowever if we are talking ssh authorized_keys any job can update those for the current user17:01
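A rough sketch of the final-job-plus-secret shape being suggested, defined in a config-project (the playbook path and key are hypothetical, and a real secret would be encrypted with the project key rather than written in the clear):

    - secret:
        name: intercept-allowed-keys
        data:
          authorized_keys: "ssh-ed25519 AAAA... user@example.com"

    - job:
        name: intercept-job
        # final: true prevents variants and child jobs from
        # overriding variables or playbooks
        final: true
        run: playbooks/intercept.yaml
        secrets:
          - intercept-allowed-keys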
SpamapSclarkb: good point17:06
SpamapSbut I"m making it REALLY easy17:06
clarkbfor some this might be a downside of ansible using ssh17:07
clarkbmakes it hard to lock down access via ssh17:07
openstackgerritTristan Cacqueray proposed zuul/zuul master: web: add config page  https://review.opendev.org/63366717:11
SpamapSI've always wondered if we should firewall off once we're connected17:11
clarkbya as long as the job then deescalates its privileges that would work17:14
clarkbwe do that in a lot of the tox jobs: start by doing setup that needs root, then remove root access so that things like unittests don't abuse it17:14
*** panda|rover is now known as panda|rover|off17:18
*** hashar has quit IRC17:24
SpamapSclarkb: ok, I'll chill out on the allow list stuff then. SSH for everybody!!17:58
*** rlandy is now known as rlandy|biab18:34
*** noorul has joined #zuul18:46
clarkbtobiash: any reason to not approve https://review.opendev.org/#/c/670461/ ?18:57
clarkbI noticed you +2'd but didn't +A (I hope it was clear that I didn't think the change needed updating in my comment as I called it a Nit and +2'd anyway)18:58
tobiashclarkb: I wanted to land this together with the child change which still has a -1 by corvus (which I think is addressed now) but I didn't want to overrule18:59
clarkbgot it18:59
tobiashbut I'd love to land both19:00
tobiashone patch less on top of master ;)19:00
*** spsurya has quit IRC19:07
clarkbboth look good to me. I guess if corvus acks them they can go in then19:08
tobiash:)19:09
clarkbevrardjp: for https://review.opendev.org/#/c/676393/1 I don't think we can remove that interface now that it is there19:10
clarkbevrardjp: if we did so then jobs consuming that interface will break19:10
*** jeliu_ has quit IRC19:12
*** bhavikdbavishi has quit IRC19:21
clarkbtristanC: I left some thoughts on https://review.opendev.org/#/c/676424/519:23
*** noorul has quit IRC19:26
openstackgerritMerged zuul/zuul master: Add reference pipelines file for Gerrit driver  https://review.opendev.org/67268319:34
*** rlandy|biab is now known as rlandy19:39
*** jamesmcarthur has joined #zuul20:36
*** jeliu_ has joined #zuul20:40
*** armstrongs has joined #zuul20:55
*** armstrongs has quit IRC21:04
*** rfolco has quit IRC21:09
*** jamesmcarthur has quit IRC21:26
*** jeliu_ has quit IRC21:35
*** jamesmcarthur has joined #zuul21:38
*** rfolco has joined #zuul21:42
clarkbShrews: https://review.opendev.org/#/c/671704/2 could be useful for debugging cloud problems21:42
clarkbadds better debugging to nodepool logs when nodes fail to launch21:42
clarkbthanks tobiash !21:42
*** jamesmcarthur has quit IRC21:43
Shrewsclarkb: yes, but I left a question re: tests21:44
clarkbya I guess I see the utility of it as worthwhile, and I don't think we have tests for any of that code there21:46
clarkb(the retries and logging of quota errors), but maybe I am wrong21:48
*** jamesmcarthur has joined #zuul21:59
*** jamesmcarthur has quit IRC22:04
*** jamesmcarthur has joined #zuul22:10
*** jamesmcarthur has quit IRC22:17
*** rlandy has quit IRC22:18
clarkbianw: for https://review.opendev.org/#/c/679184/2/zuul/executor/server.py we do allow untrusted executor-only jobs but they are limited in what they can do22:29
*** tosky has quit IRC22:29
clarkbI don't think that having an inventory for that case is any less secure because those jobs could try add_host anyway?22:30
*** jamesmcarthur has joined #zuul22:31
*** jamesmcarthur has quit IRC22:35
ianwclarkb: yeah, i think that in either case, having the inventory explicit is about the same23:04
ianwthat {{hostvars}} is not defined for implicit localhost is, i think, at best, confusing23:05
