Thursday, 2019-08-29

SpamapSHow would people feel about some added cloud-image features for AWS to allow us to specify a set of tags and account ids rather than AMI ids?00:59
SpamapSNot getting much time to implement nodepool-builder for AMIs... but I can add some boto3 filtering in a few lines.00:59
SpamapSAlternatively: How would people feel about adding packer support to nodepool-builder? ;-)01:00
clarkbSpamapS: I think you can already overload the builder command but the way that is done you have to accept the same args as dib01:01
clarkbI think we should do another followup to make it fully configurable, though figuring out how to do that will likely involve auditing other builders01:02
clarkbthen as long as the files end up in the right spots it should work, maybe?01:02
SpamapSclarkb: yeah maybe I'll just be the first person who tries and fixes the bugs ;)01:04
SpamapSWe're starting to build custom images and just manually adjusting the AMI IDs... but... that sucks.01:05
SpamapSWould be much simpler if we could just say "find the image owned by account id # xxxxx with tag Name==ubuntu1804-local"01:06
SpamapSor if we could just drop a command in the nodepool-builder config.. that would be zomg wtf amazing01:06
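A purely hypothetical sketch of the provider config SpamapS is asking for; the image-filters block below is invented for illustration and was not a real attribute of nodepool's AWS driver at the time:

    providers:
      - name: aws-main              # hypothetical provider
        driver: aws
        region-name: us-west-2
        cloud-images:
          - name: ubuntu1804-local
            # hypothetical: look the AMI up by owning account and tag
            # instead of hard-coding an ami-id
            image-filters:
              owner-id: "123456789012"
              tags:
                Name: ubuntu1804-local
        pools:
          - name: main
            labels:
              - name: ubuntu1804-local
                cloud-image: ubuntu1804-local
                instance-type: t3.medium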
*** jamesmcarthur has joined #zuul01:17
*** michael-beaver has quit IRC01:19
*** noorul has joined #zuul01:23
ianwyeah, i think tobiash was already intercepting disk-image-create and sending it off to do something else01:28
ianwthat was the idea behind putting in dib-cmd as a diskimage option01:29
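A minimal sketch of the dib-cmd idea ianw describes (the wrapper path is hypothetical); per clarkb above, whatever command is substituted has to accept disk-image-create style arguments and leave its output files where nodepool-builder expects them:

    diskimages:
      - name: ubuntu-bionic
        # invoked in place of disk-image-create, with the same args
        dib-cmd: /usr/local/bin/my-image-build-wrapper
        elements:
          - ubuntu-minimal
          - vm
        env-vars:
          DIB_RELEASE: bionic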
*** noorul has quit IRC01:33
*** noorul has joined #zuul01:41
*** jamesmcarthur has quit IRC01:48
*** jamesmcarthur has joined #zuul01:49
*** jamesmcarthur has quit IRC01:49
*** jamesmcarthur has joined #zuul01:49
*** noorul has quit IRC01:49
*** noorul has joined #zuul01:53
noorulIs it possible to dynamically increase and decrease nodepool's pool size?01:57
*** jamesmcarthur has quit IRC02:06
clarkbit will respect quota sizing on openstack clouds02:08
clarkbthat means if you change the quota limits nodepool will respect them02:09
*** jamesmcarthur has joined #zuul02:18
*** openstackgerrit has quit IRC02:37
*** jamesmcarthur has quit IRC02:37
*** bhavikdbavishi has joined #zuul02:43
*** noorul has quit IRC02:46
*** jamesmcarthur has joined #zuul02:49
*** jamesmcarthur has quit IRC02:59
*** noorul has joined #zuul02:59
*** noorul has quit IRC03:01
*** bhavikdbavishi1 has joined #zuul03:02
*** bhavikdbavishi has quit IRC03:03
*** bhavikdbavishi1 is now known as bhavikdbavishi03:03
*** noorul has joined #zuul03:42
noorulclarkb: So there is no dynamic config loading for nodepool at the nodepool level03:43
clarkbnodepool will reload its config off of disk each time through its loop03:44
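So resizing a pool comes down to editing the nodepool config file; a minimal sketch (provider and label names hypothetical):

    providers:
      - name: my-cloud
        driver: openstack
        cloud: my-cloud
        pools:
          - name: main
            # change this value and nodepool picks it up on its next
            # pass through its run loop; no restart needed
            max-servers: 10
            labels:
              - name: ubuntu-bionic
                flavor-name: m1.large
                diskimage: ubuntu-bionic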
noorulI have a base job definition http://paste.openstack.org/show/766657/03:44
noorulclarkb: Oh I see03:44
noorulclarkb: So I just have to update the config by gitops03:44
noorulIn the initial version playbooks/base/post.yaml was not there as part of post-run03:45
noorulI added that now03:45
noorulBut after restarting both the scheduler and executor, the new PRs are not running that playbook03:46
noorulAm I missing something?03:46
clarkbI'm not sure. The bitbucket driver is new; could be a bug03:46
noorulAre you suspecting bitbucket driver here?03:47
clarkbif that is where that config is loaded then maybe03:47
noorulI thought the driver has minimal role03:47
clarkbwell that config comes from git via that driver03:47
clarkbunless you load that config from elsewhere03:47
clarkbbut restarting the scheduler should reload all configs03:47
noorulI restarted both the scheduler and executor03:48
noorulWhere on disk does zuul store the latest config?03:48
clarkbit is loaded into memory03:48
noorulI see03:49
clarkbbut the source of that is your git repo's yaml files03:49
clarkbthat commit was merged?03:49
noorulYes03:50
noorulIt got merged and now it is in master03:50
clarkbyou can check if the commit is present in your git repos /var/lib/zuul/executor-git I think03:51
noorulhttp://paste.openstack.org/show/766658/03:52
noorulThat folder has that03:53
noorulhttp://paste.openstack.org/show/766659/03:53
clarkbI would expect it to be loaded then03:54
noorulI think now it is working04:13
noorulhttp://paste.openstack.org/show/766660/04:14
noorulclarkb: It looks like it is not picking up the nodepool_rsa private key04:22
* noorul is going crazy with Zuul's key management04:22
* SpamapS peeks04:27
SpamapSnoorul: This part of it isn't so hard... you just have a private key that's on the executors, and the public portion should be registered in your cloud provider.04:28
noorulSpamapS: I am using static driver04:48
noorulSpamapS: My nodepool instance is running on the same machine04:48
noorulSpamapS: When I look at /var/lib/zuul which is the home folder for zuul user04:48
noorulSpamapS: I see under .ssh folder id_rsa and nodepool_rsa04:49
noorulI assume that rsync is running on the executor04:53
noorulas the zuul user. If so, it will use the id_rsa private key by default.04:53
noorulI am not sure where it came from. Who is responsible for adding id_rsa.pub to authorized_keys of nodepool instances?04:54
noorulSpamapS: ^^^04:56
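For the static driver nothing installs keys automatically: whoever manages the host has to put the executor's public key into the login user's ~/.ssh/authorized_keys themselves. A minimal sketch of the nodepool side (host and label names hypothetical):

    providers:
      - name: static-nodes
        driver: static
        pools:
          - name: main
            nodes:
              - name: node01.example.com
                labels:
                  - static-node
                # the executor sshes in as this user, so the matching
                # public key must already be in this user's
                # authorized_keys on the host
                username: zuul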
*** jamesmcarthur has joined #zuul05:00
*** jamesmcarthur has quit IRC05:04
*** raukadah is now known as chandankumar05:09
*** openstackgerrit has joined #zuul05:15
openstackgerritIan Wienand proposed zuul/zuul master: executor: always setup an inventory, even for blank node list  https://review.opendev.org/67918405:15
*** badboy has joined #zuul05:16
*** bjackman has joined #zuul05:47
openstackgerritIan Wienand proposed zuul/zuul master: executor: always setup an inventory, even for blank node list  https://review.opendev.org/67918406:10
*** noorul has quit IRC06:46
*** tosky has joined #zuul07:23
*** jpena|off is now known as jpena07:40
*** themroc has joined #zuul07:53
*** sshnaidm|afk is now known as sshnaidm|ruck08:20
*** tdasilva has joined #zuul08:38
tobiashzuul-maint: is anyone already running with the json log appending change? (https://review.opendev.org/676717)08:50
tobiashI'm wondering if I need to take special care when updating zuul08:50
*** hashar has joined #zuul09:00
corvusclarkb, fungi, mnaser: all things considered, i think we should simplify the log urls for swift and just put them in build_uuid[:2]/build_uuid/ for all jobs.  i think the web ui is sufficient for navigating between builds.09:22
*** sshnaidm|ruck is now known as sshnaidm|afk09:44
*** bhavikdbavishi has quit IRC10:02
openstackgerritFabien Boucher proposed zuul/zuul master: Add reference pipelines file for Github driver  https://review.opendev.org/67271210:28
*** sshnaidm|afk is now known as sshnaidm|ruck10:33
*** bjackman has quit IRC10:40
AJaeger_tobiash: clarkb restarted opendev yesterday with the json log appending change AFAIU - I haven't seen any problems mentioned but backscroll here (or #openstack-infra) from last night should mention any issues11:04
*** gtema has joined #zuul11:04
tobiashcool, thanks11:07
*** jpena is now known as jpena|lunch11:35
*** badboy has quit IRC11:59
*** rfolco has joined #zuul12:28
*** rlandy has joined #zuul12:34
*** gtema has quit IRC12:40
*** jpena|lunch is now known as jpena12:41
*** jeliu_ has joined #zuul12:42
openstackgerritFabien Boucher proposed zuul/zuul master: A Zuul reporter for Elasticsearch  https://review.opendev.org/64492712:42
fungicorvus: makes sense to me, though clarkb also suggested [:3] if we want to shard across 4k containers instead of only 25612:43
*** panda|rover|off is now known as panda|rover12:53
*** zbr has quit IRC13:08
*** zbr has joined #zuul13:16
*** brendangalloway has joined #zuul13:18
openstackgerritTobias Henkel proposed zuul/zuul master: Add support for smart reconfigurations  https://review.opendev.org/65211413:23
openstackgerritTobias Henkel proposed zuul/zuul master: Add --check-config option to zuul scheduler  https://review.opendev.org/54216013:23
brendangallowayHi, we've been looking at ways to request a nodeset across multiple connected providers.  Is there any way to do this with existing configuration options?13:26
Shrewsbrendangalloway: Not with the current nodepool. We specifically do not allow that.13:27
brendangallowayAny suggestions for how to work within that restriction?  We need to be able to access static nodes and openstack nodes within the same job13:29
jangutterShrews: would there be interest if we built something to kind-of aggregate providers, specifically for labs that need cross-provider nodesets?13:30
pabelangerbrendangalloway: you could use add_host at job run time to do it13:30
pabelangerwhich means, keeping the static node outside of nodepool13:30
Shrewsbrendangalloway: oh, you mean across multiple drivers, not providers?13:30
ShrewsI'm not aware of any restrictions to using multiple nodepool drivers simultaneously.13:31
brendangallowayYes - sorry if my terminology is not 100% accurate13:32
brendangallowaynodepool returns a node error when we try to define a nodeset with both a static node and an openstack node13:32
Shrewsbrendangalloway: just define one set of labels for static nodes, and another for the openstack nodes. Should work, afaik13:32
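A sketch of the label layout Shrews describes, one label per driver (all names hypothetical); note tobiash's point just below that a single nodeset still cannot mix the two:

    labels:
      - name: static-bare-metal
      - name: ubuntu-bionic

    providers:
      - name: static-rack
        driver: static
        pools:
          - name: main
            nodes:
              - name: host01.example.com
                labels:
                  - static-bare-metal
                username: zuul
      - name: my-openstack
        driver: openstack
        cloud: my-cloud
        pools:
          - name: main
            max-servers: 10
            labels:
              - name: ubuntu-bionic
                flavor-name: m1.large
                diskimage: ubuntu-bionic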
Shrewsthough i don't know anyone that actually does that. maybe tobiash does?13:33
Shrewsbrendangalloway: is there a traceback in the logs?13:33
jangutterShrews: can you have one provider with multiple drivers?!13:33
tobiashmixing static and dynamic nodes in a single nodeset is not possible13:33
tobiashbecause that always will be multiple providers13:34
brendangallowaylooking at the logs, each driver is given the request ['static', 'openstack'], which none are able to provide individually13:34
Shrewstobiash: ah, i guess because we always try to group to the same provider. yeah, i suppose that's right13:34
jangutterShrews, tobiash: yah, we work around it by using Ironic to spawn baremetals, but we're hitting a limit where we need the static nodes, unfortunately.13:35
Shrewsi think pabelanger's suggestion might work for you then13:35
Shrewssorry, still working on injecting coffee  :)13:37
tobiashone nodeset is always one node request and one node request can only be satisfied by one provider. That's how nodepool works at the moment.13:38
tobiashso to combine static and dynamic node you'd need two jobs13:39
pabelangerbrendangalloway: https://github.com/ansible-network/windmill-config/blob/master/tests/playbooks/bastion.yaml is an example of how to add a static node dynamically13:39
pabelangervia ansible13:39
pabelangerhowever, you'll then need to set a semaphore on the job, to avoid multiple jobs using the same static node13:39
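A stripped-down sketch of that add_host approach in a job playbook (host name, address, and user are hypothetical):

    - hosts: localhost
      tasks:
        - name: Add the static node to the inventory at run time
          add_host:
            name: bastion01.example.com
            groups: static
            ansible_host: 203.0.113.10
            ansible_user: zuul

    - hosts: static
      tasks:
        - name: Do the real work on the static node
          command: uname -a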
janguttertobiash: I was thinking of modifying nodepool to build a driver that specifically aggregates non-overlapping providers into one provider for zuul.13:40
tobiasha meta-driver?13:40
janguttertobiash: yeah - allowing you to specify that "these providers belong to the same site".13:40
tobiashthat won't solve the problem because each provider is actually split into pools, and the pools are what count for the algorithm13:41
brendangallowaypabelanger: Thanks, I'll take a look at that.  I'm guessing it works by name and not by label?13:41
tobiashso to really solve that one would need to completely redesign the algorithm for how nodepool satisfies node requests13:41
pabelangerbrendangalloway: right, the static node isn't even in nodepool13:42
pabelangerso you need to manage that info at job run time13:42
janguttertobiash: even if the labels are guaranteed not to overlap?13:42
tobiashjangutter: the current algorithm is: take node request, satisfy it completely or decline it; if you're the last one to decline, fail the request13:42
tobiashthis is done by all providers (actually poolworkers) in parallel13:43
brendangallowaypabelanger: We were hoping to leverage nodepool as a host booking system rather than needing to build our own13:43
tobiashso to solve this you'd need a way to partially satisfy a node request and let another poolworker continue13:43
janguttertobiash: right, but in my scenario, if a provider is a "slave provider" (i.e. has been registered as belonging to a site), it never tells zuul directly it can satisfy a request.13:43
pabelangerbrendangalloway: agree, I think that is the right approach; in my case, I didn't want other jobs to request this nodeset, which was the reason to keep it out13:44
janguttertobiash: but I see what you mean, it introduces a lock - the site has to wait for all pools to come back before it can say "I can satisfy this".13:44
tobiashwhat you could do in a meta provider (but that's an ugly hack): you could take the node request, create a secondary node request for each label in the original node request, wait for all to be satisfied, attach those nodes to the original node request and satisfy it13:48
tobiashI think this could work13:50
janguttertobiash: that's some really next-level recursion there!13:53
tobiashbut I think this should work13:53
brendangallowaypabelanger:  what happens if you have more than one instance of the job triggered?13:53
tobiashbrendangalloway: in this case you're doomed ;)13:54
tobiashbrendangalloway: but you can avoid this by using a semaphore on the job13:55
*** tflink has quit IRC13:55
brendangallowaypabelanger: we're trying really hard to undoom ourselves :p13:55
Shrewstobiash: you're thinking of the meta provider as a separate driver? that's an interesting thought13:57
*** mordred has quit IRC13:57
janguttertobiash: not knowing much about semaphores, but can you tell in the job that "hey, I've been given semaphore 3 out of 4"?13:57
*** tflink has joined #zuul13:57
tobiashcorvus, fungi: re executor performance, I'm wondering if we should document some performance requirements regarding the executors (e.g. on large deployments put executor on ssd + use plenty of ram)?13:57
tobiashShrews: yes, a separate driver that maybe could be configured with supported labels13:58
pabelangerbrendangalloway: yah, without a semaphore, more than one could result in odd job results13:58
tobiashjangutter: https://zuul-ci.org/docs/zuul/user/config.html#attr-job.semaphore13:59
tobiasha semaphore tells zuul how many jobs referencing that semaphore can run in parallel13:59
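In Zuul config that looks roughly like this (names hypothetical); with max: 1, every job referencing the semaphore runs serially, which is what protects a single static node:

    - semaphore:
        name: static-node-01
        max: 1

    - job:
        name: deploy-to-static-node
        semaphore: static-node-01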
Shrewstobiash: the dodgy part would be preventing the meta driver from handling its own sub-node-requests13:59
*** mordred has joined #zuul13:59
Shrewsbut, yeah, might be possible14:00
tobiashShrews: right, but it could do that easily by just not handling node requests with one node type14:00
jangutterShrews, tobiash: yeah, I had the idea of a field that you set in all the providers that says: "hey, I'm a slave to site <x>, I don't satisfy requests directly".14:01
Shrewswell, it would have to handle ALL node types. but, it could set the requestor field and just decline those requests that came from itself14:01
tobiashI think a meta driver wouldn't make much sense for handling requests with a single type (in this case it would just be proxying the request without benefit) so it could be limited to only support multi-label requests and decline all the others14:03
tobiashbut I guess we should still use the requestor field for that too14:04
Shrewstobiash: yeah, but that requires changing the common driver code (i think), and i wouldn't approve any driver that needed to change that (without a very good reason)14:04
jangutterShrews, and I guess bribery is not a very good reason :-p14:05
Shrewsjangutter: in any case, you're in unknown territory here, so there may be dragons we haven't yet thought of  :)14:05
jangutterShrews, dragons, grues, basilisks... we're kinda used to 'em.14:06
tobiashShrews: right, I thought that part was not in the common driver code but it is14:07
tobiashthen I agree that it would need to handle all types14:07
Shrewstobiash: so hard to remember such complex code  :)  especially before 2nd cup of coffee!14:08
tobiashmy coffee was already hours ago...14:08
jangutterOK, so let's assume this turns out to require common driver code mods - is there a spec process or something where we can suss out the details?14:09
Shrewsjangutter: no, just discuss here for that14:10
jangutterOkay! I'll see if I can come up with some code blasphemy that can be used as a discussion point.14:11
Shrewsjangutter: we haven't really spec'd out the common driver code (it just developed as we added more drivers), but as it has proven very stable at this point, changes to it must be carefully considered14:11
jangutterShrews: yep, core plumbing is core for a reason.14:12
*** sgw has quit IRC14:12
brendangallowayCircling back to jangutter's previous question: another, simpler hack (for our specific use case at least) would be if there were some way for a job to know which part of a semaphore it is?14:14
jangutterYah, is the job semaphore value in zuul vars somewhere?14:14
brendangallowayadd_host 1 when the job is number one in the queue, add_host 2 if it is number 214:14
brendangallowayautomatically linking lab state to job add_host calls will probably require some maintenance, but it will at least prevent zuul jobs from stomping on the static nodes14:16
*** sgw has joined #zuul14:31
clarkbzuulians I think the executor restarts on opendev have been happy with 50a0f736a3de29280e459cccc4c25a46cba9570e looking at git log since the last release most of the changes have been against web and the executor14:37
clarkbI'm thinking we should tag that commit as 3.10.2 or 3.11 (haven't checked for the appropriate semver bump). If we'd prefer, we can also restart opendev's scheduler and mergers and do the release on, say, Monday or Tuesday though14:38
*** themroc has quit IRC14:42
*** jamesmcarthur has joined #zuul14:42
tobiashmaybe the zuul-web changes justify 3.1114:49
pabelangerzuul for workflows 3.1114:49
fungiyikes14:50
fungii can't un-see that now14:51
pabelangerclarkb: we need some reno notes first I think14:52
pabelangerwe have nothing to indicate need for next release14:53
clarkbI can write one for the bug fix, but someone else might be able to better capture the web updates14:53
* clarkb writes bug relnote14:53
tobiashare the zuul web updates complete enough to do a release?14:55
openstackgerritDavid Shrewsbury proposed zuul/zuul master: Add autohold delete/info commands to web API  https://review.opendev.org/67905714:59
openstackgerritClark Boylan proposed zuul/zuul master: Add release note for bug fix to test correct commit  https://review.opendev.org/67928515:00
*** bhavikdbavishi has joined #zuul15:00
clarkbpabelanger: ^15:00
clarkbtobiash: I feel like they've been working for opendev15:00
clarkbwe know there are things to improve but they are definitely usable15:00
*** bhavikdbavishi has quit IRC15:04
pabelanger+215:05
*** jamesmcarthur has quit IRC15:07
*** jamesmcarthur has joined #zuul15:07
*** bhavikdbavishi has joined #zuul15:11
clarkbhttps://twitter.com/GerritReview/status/116705957109501542515:20
clarkbif you look closely you might see a familiar logo15:20
Shrewsvolvo!!!15:23
Shrewsno?15:23
Shrewsjfrog?15:23
clarkbthat's the one :P15:23
Shrewslooks like they've posted a tweet featuring some water-loving nomad as well15:25
*** sshnaidm|ruck is now known as sshnaidm|afk15:26
clarkbyou almost can't recognize him without the scuba gear on15:27
Shrewshe *totally* should have presented in his wet suit15:27
* Shrews must afk for a bit15:28
*** chandankumar is now known as raukadah15:40
*** brendangalloway has quit IRC15:44
openstackgerritFabien Boucher proposed zuul/zuul master: Add reference pipelines file for Github driver  https://review.opendev.org/67271215:57
*** jpena is now known as jpena|off16:11
*** sshnaidm|afk is now known as sshnaidm|off16:39
*** igordc has joined #zuul16:44
*** jamesmcarthur has quit IRC16:50
SpamapSIs there any way to restrict a role to trusted-only? I'm asking because I'm trying to figure out how to make a role to allow a user to SSH in, but I only want them to be able to get in if their key is in a whitelist of keys... since vars: is passed as extravars .. I think they could always override any value unless the role is always run in a trusted context.16:56
* SpamapS will propose the role so you can see16:56
openstackgerritClint 'SpamapS' Byrum proposed zuul/zuul-jobs master: WIP: intercept-job -- self-service SSH access  https://review.opendev.org/67930616:58
pabelangerSpamapS: rather than a role directly, make it a job with final: true? leave it in a config-project and the variable can't change?17:00
clarkbya I think roles are intentionally meant to be reusable17:00
clarkbyou could also set it up to take a secret; then only jobs in the trusted repo would have access to the secret17:00
pabelangeryah, that might work too17:01
clarkbhowever if we are talking ssh authorized_keys any job can update those for the current user17:01
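A rough sketch of the final-job-plus-secret shape being suggested, defined in a config-project (the playbook path and key are hypothetical, and a real secret would be encrypted with the project key rather than written in the clear):

    - secret:
        name: intercept-allowed-keys
        data:
          authorized_keys: "ssh-ed25519 AAAA... user@example.com"

    - job:
        name: intercept-job
        # final: true prevents variants and child jobs from
        # overriding variables or playbooks
        final: true
        run: playbooks/intercept.yaml
        secrets:
          - intercept-allowed-keys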
SpamapSclarkb: good point17:06
SpamapSbut I"m making it REALLY easy17:06
clarkbfor some this might be a downside of ansible using ssh17:07
clarkbmakes it hard to lock down access via ssh17:07
openstackgerritTristan Cacqueray proposed zuul/zuul master: web: add config page  https://review.opendev.org/63366717:11
SpamapSI've always wondered if we should firewall off once we're connected17:11
clarkbya as long as the job then deescalates its privileges that would work17:14
clarkbwe do that in a lot of the tox jobs: start by doing setup that needs root, then remove root access so that things like unittests don't abuse it17:14
*** panda|rover is now known as panda|rover|off17:18
*** hashar has quit IRC17:24
SpamapSclarkb: ok, I'll chill out on the allow list stuff then. SSH for everybody!!17:58
*** rlandy is now known as rlandy|biab18:34
*** noorul has joined #zuul18:46
clarkbtobiash: any reason to not approve https://review.opendev.org/#/c/670461/ ?18:57
clarkbI noticed you +2'd but didn't +A (I hope it was clear that I didn't think the change needed updating in my comment as I called it a Nit and +2'd anyway)18:58
tobiashclarkb: I wanted to land this together with the child change which still has a -1 by corvus (which I think is addressed now) but I didn't want to overrule18:59
clarkbgot it18:59
tobiashbut I'd love to land both19:00
tobiashone patch less on top of master ;)19:00
*** spsurya has quit IRC19:07
clarkbboth look good to me. I guess if corvus acks them they can go in then19:08
tobiash:)19:09
clarkbevrardjp: for https://review.opendev.org/#/c/676393/1 I don't think we can remove that interface now that it is there19:10
clarkbevrardjp: if we did so then jobs consuming that interface will break19:10
*** jeliu_ has quit IRC19:12
*** bhavikdbavishi has quit IRC19:21
clarkbtristanC: I left some thoughts on https://review.opendev.org/#/c/676424/519:23
*** noorul has quit IRC19:26
openstackgerritMerged zuul/zuul master: Add reference pipelines file for Gerrit driver  https://review.opendev.org/67268319:34
*** rlandy|biab is now known as rlandy19:39
*** jamesmcarthur has joined #zuul20:36
*** jeliu_ has joined #zuul20:40
*** armstrongs has joined #zuul20:55
*** armstrongs has quit IRC21:04
*** rfolco has quit IRC21:09
*** jamesmcarthur has quit IRC21:26
*** jeliu_ has quit IRC21:35
*** jamesmcarthur has joined #zuul21:38
*** rfolco has joined #zuul21:42
clarkbShrews: https://review.opendev.org/#/c/671704/2 could be useful for debugging cloud problems21:42
clarkbadds better debugging to nodepool logs when nodes fail to launch21:42
clarkbthanks tobiash !21:42
*** jamesmcarthur has quit IRC21:43
Shrewsclarkb: yes, but I left a question re: tests21:44
clarkbya I guess I see the utility of it as worthwhile, and I don't think we have tests for any of that code there21:46
clarkb(the retries and logging of quota errors), but maybe I am wrong21:48
*** jamesmcarthur has joined #zuul21:59
*** jamesmcarthur has quit IRC22:04
*** jamesmcarthur has joined #zuul22:10
*** jamesmcarthur has quit IRC22:17
*** rlandy has quit IRC22:18
clarkbianw: for https://review.opendev.org/#/c/679184/2/zuul/executor/server.py we do allow untrusted executor-only jobs but they are limited in what they can do22:29
*** tosky has quit IRC22:29
clarkbI don't think that having an inventory for that case is any less secure because those jobs could try add_host anyway?22:30
*** jamesmcarthur has joined #zuul22:31
*** jamesmcarthur has quit IRC22:35
ianwclarkb: yeah, i think that in either case, having the inventory explicit is about the same23:04
ianwthat {{hostvars}} is not defined for implicit localhost is, i think, at best, confusing23:05
