Friday, 2017-11-03

*** dkranz has joined #zuul  00:11
*** xinliang has quit IRC  03:43
*** xinliang has joined #zuul  03:55
<openstackgerrit> Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Make encrypt_secret.py work with OpenSSL 0.x  https://review.openstack.org/517133  06:37
<openstackgerrit> Merged openstack-infra/nodepool feature/zuulv3: Reset state on unpaused, declined request  https://review.openstack.org/517417  06:44
<openstackgerrit> Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add BaseSource.getProjectReadonly and refactor  https://review.openstack.org/517067  06:55
<openstackgerrit> Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Do not add invalid projets via the /keys API  https://review.openstack.org/517078  06:55
*** hashar has joined #zuul  08:58
*** electrofelix has joined #zuul  09:38
*** Cibo_ has joined #zuul  10:43
*** Cibo has joined #zuul  10:54
*** Cibo_ has quit IRC  10:56
*** hashar has quit IRC  11:08
*** hashar has joined #zuul  11:13
*** hashar has quit IRC  11:37
*** hashar has joined #zuul  12:19
<dmsimard> Anyone know where I could start troubleshooting three mergers being connected to geard properly but only one of them is picking up work?  13:52
*** jkilpatr has quit IRC  13:52
<dmsimard> it's connected properly ... zuul-merg 10705   zuul    6u  IPv4 26617254      0t0  TCP zm01.review.rdoproject.org:41876->managesf.review.rdoproject.org:4730 (ESTABLISHED)  13:52
<dmsimard> restarted the two not picking up work in debug and that's all I get http://paste.openstack.org/raw/625431/  13:53
<dmsimard> as I mention that, things start rolling again...  14:09
<dmsimard> ¯\_(ツ)_/¯  14:09
*** jkilpatr has joined #zuul  14:12
<openstackgerrit> David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Switch to threading model of socketserver  https://review.openstack.org/517437  14:45
*** sambetts|afk has quit IRC  14:51
*** jkilpatr has quit IRC  15:01
*** dkranz has quit IRC  15:09
<jeblair> dmsimard: telnet geardserver 4730  15:10
<jeblair> dmsimard: workers  15:10
<jeblair> dmsimard: status  15:11
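
(The same check can be scripted. Below is a minimal sketch assuming Python 3; "workers" and "status" are standard gearman admin-protocol text commands whose replies end with a line containing only ".", and the host name is taken from the lsof output earlier in this log -- adjust it for your deployment.)

    # Query geard's admin protocol over a plain TCP socket instead of telnet.
    import socket

    def gearman_admin(command, host="managesf.review.rdoproject.org", port=4730):
        """Send one admin command ('workers' or 'status') and return the reply text."""
        with socket.create_connection((host, port), timeout=10) as sock:
            sock.sendall(command.encode("ascii") + b"\n")
            reply = b""
            # Admin replies are newline-separated lines terminated by a lone ".".
            while not (reply.endswith(b"\n.\n") or reply == b".\n"):
                chunk = sock.recv(4096)
                if not chunk:
                    break
                reply += chunk
        return reply.decode("ascii", "replace")

    print(gearman_admin("workers"))  # which workers are registered for which functions
    print(gearman_admin("status"))   # queued / running / available counts per function
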
<dmsimard> jeblair: it was a red herring, what was stuck was actually our nodepool  15:11
<dmsimard> it was just weird that two mergers were not picking up anything but all three are doing stuff now.  15:11
<jeblair> dmsimard: it's worth noting that unless the system is busy and there are jobs queued, it's quite likely for only one to pick up jobs (and the same one).  all depends on cpu speed and network topology.  15:11
<jeblair> dmsimard: the geard server doesn't round-robin or anything, it wakes up all idle workers, and the first one to respond gets the next job.  15:12
<dmsimard> yeah, we had to get more than one merger.. we have three and it's fine now. Back when we only had one, sometimes we would get one of those "rebase bombs" from the upstream gerrit and that monopolizes a merger for quite a bit  15:12
<jeblair> dmsimard: and it wakes them in the same order :|  15:12
<dmsimard> like a 22-long patch stack with depends-on mixed in between  15:13
<jeblair> good times  15:13
*** jkilpatr has joined #zuul  15:14
*** kmalloc has joined #zuul  15:37
<dmsimard> jeblair: I'm trying to hunt down the best solution to fix https://review.openstack.org/#/c/514489/ and https://review.openstack.org/#/c/514490/  15:50
<dmsimard> jeblair: I wondered if we should just make sure the hostvars apply to localhost as well  15:50
<dmsimard> but at the same time, there can be legitimate "nodeless" jobs which run *only* on localhost, so I'm not sure to what extent the concept of nodepool vars applies (cloud, provider, ip addresses, etc.)  15:52
<dmsimard> For example, tristanC's work on the container driver defines a different inventory layout https://review.openstack.org/#/c/468753/24/nodepool/driver/oci/provider.py  15:54
<dmsimard> It'd look like emit-job-header would be driver-dependent  15:55
<dmsimard> so maybe we could pull that back into the executor code itself, rather than trying to keep it as a role/playbook  15:56
*** bhavik has joined #zuul  15:56
<SpamapS> jeblair: oh, waking in the same order. That would be an easy thing to fix. ;)  16:06
* SpamapS throws it on the pile  16:06
<SpamapS> jeblair: note that ideally they'd be woken up in the order that they slept.  16:09
<SpamapS> so the one that gave you a PRE_SLEEP first is the first one to wake up. That sort of functions as a round robin.  16:10
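
(A minimal sketch of that FIFO wake-up idea, purely for illustration -- this is not geard's actual implementation, and the class and method names are made up.)

    from collections import deque

    class SleepQueue:
        """Track idle workers in the order they sent PRE_SLEEP."""

        def __init__(self):
            self._sleeping = deque()  # oldest sleeper first

        def pre_sleep(self, worker):
            # Record that a worker has gone idle.
            if worker not in self._sleeping:
                self._sleeping.append(worker)

        def wake_all(self):
            # Wake (send NOOP to) workers oldest-sleeper-first, which
            # approximates a round robin across equally fast workers.
            woken = list(self._sleeping)
            self._sleeping.clear()
            return woken
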
* SpamapS ponders putting that into the protocol doc  16:10
<jeblair> SpamapS: ++  16:27
<jeblair> dmsimard: indeed we have nodeless jobs already.  16:28
<dmsimard> jeblair: what does the inventory look like for nodeless jobs?  16:30
<jeblair> dmsimard: i think the right solution is for zuul not to care.  i think our job header should gracefully handle missing data.  16:30
<jeblair> dmsimard: "hosts: []"  16:31
<jeblair> or hosts: {}  16:31
<jeblair> something like that  16:31
<jeblair> dmsimard: yeah, should be "all: hosts: {}"  16:31
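
(For illustration, a small sketch of that nodeless inventory shape and of treating nodepool data as optional. It uses PyYAML and made-up node fields; it is not the exact structure Zuul's executor writes.)

    import yaml

    def build_inventory(nodes=None):
        """Build an Ansible inventory dict; no nodes yields 'all: hosts: {}'."""
        hosts = {}
        for node in (nodes or []):
            # Nodepool-specific details are simply absent for nodeless jobs,
            # so anything consuming them has to treat them as optional.
            hosts[node["name"]] = {
                "ansible_host": node.get("interface_ip"),
                "nodepool": node.get("nodepool", {}),
            }
        return {"all": {"hosts": hosts}}

    print(yaml.safe_dump(build_inventory()))
    # all:
    #   hosts: {}
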
*** Cibo has quit IRC  17:03
*** dkranz has joined #zuul  17:07
*** jkilpatr has quit IRC  17:16
*** jkilpatr has joined #zuul  17:28
*** bhavik has quit IRC  17:36
*** Cibo has joined #zuul  18:10
*** Cibo has quit IRC  18:16
*** weshay is now known as weshay_brb  18:30
<openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Make encrypt_secret.py work with OpenSSL 0.x  https://review.openstack.org/517133  18:37
<jlk> ugh. So if I bring 20K to the table, I can get a decent XC90 on a 72 month term for the same monthly payments I'm making now. Only, I think I'm underwater on current car, so blah.  18:46
<SpamapS> jlk: wrong window :)  18:50
<jlk> ah shit.  18:50
<jlk> too bad this isn't slack, I can't delete it.  18:50
<SpamapS> you can't really delete it in slack either. :)  18:50
<SpamapS> You can just hide it from unprivileged users who haven't thought to run clients that log everything yet. ;)  18:51
*** Cibo has joined #zuul  18:52
<jlk> sure  18:52
<jlk> SpamapS: as a GH user, care to review https://review.openstack.org/#/c/517121/ ?  18:52
<jlk> jeblair: ^^ is ready for review  18:53
<SpamapS> jlk: oh yeah that one is cool :)  18:54
<SpamapS> even though I still don't have apps ;)  18:54
<jlk> ah right.  18:54
<jlk> I wonder if you'll get GraphQL before apps  18:54
<SpamapS> but I can review with hope for the future  18:54
*** Cibo has quit IRC  19:11
<SpamapS> Looks like we might have a GH bug: http://paste.openstack.org/show/625458/  19:19
*** weshay_brb is now known as weshay  19:20
*** electrofelix has quit IRC  19:21
<jlk> interesting!  19:32
<jlk> we only seem to set that value in pull request style events  19:33
<jlk> what event was that?  19:33
<jlk> I'm guessing a push  19:33
<jlk> or...  19:34
<jlk> wow, why is it a ZuulTriggerEvent?  19:35
<SpamapS> 2017-11-03 12:18:43,376 DEBUG zuul.ZuulTrigger: onChangeEnqueued {'parent-change-enqueued'}  19:49
<SpamapS> 2017-11-03 12:18:43,376 DEBUG zuul.ZuulTrigger: Checking for changes needing <Change 0x7f2c1c016a90 20,472d642f38f232719e8d75ee15c87ac09d2fa2bd>:  19:49
<SpamapS> jlk: just added parent-change-enqueued as a trigger. :)  19:50
<jlk> okay, so that's the new thing we never hit in Bonny  19:50
<SpamapS> which is how you get dependencies to merge along with their parents  19:50
<jlk> interesting that the trigger object is a zuul trigger event, so it doesn't have that updated_at  19:50
<SpamapS> the source is zuul. :)  19:51
<SpamapS> so it makes sense  19:51
<jlk> It means we wouldn't be able to determine if it's an updateOf something else  19:52
<jlk> maybe what could be done is if updated_at isn't a key, create it and make it "now"  19:53
*** dkranz has quit IRC  20:41
<SpamapS> Yeah that's the way to go I think.  20:50
<SpamapS> Because that's pretty much the way it happened  20:50
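
(A minimal sketch of the fallback jlk suggests: default a missing updated_at to "now". The function name and dict-style event data are illustrative, not the actual Zuul driver code.)

    from datetime import datetime, timezone

    def ensure_updated_at(event_data):
        """Default a missing or empty updated_at to the current time."""
        if not event_data.get("updated_at"):
            # Events synthesized by zuul itself (e.g. parent-change-enqueued)
            # carry no GitHub timestamp, so "now" is the closest truth.
            event_data["updated_at"] = datetime.now(timezone.utc).isoformat()
        return event_data
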
<SpamapS> so I have an interesting problem to deal with now.. hoping semaphore can help but I'm not sure.  20:52
<SpamapS> I have a pool of resources that are statically allocated to my CI (user accounts). Creating/deleting them is heavy so we don't want the churn. But the accounts give us isolation during a CI run...  20:53
<SpamapS> What I'm not sure how to deal with is handing out one account to only one job at a time.  20:53
<SpamapS> the semaphore will just tell me how many are concurrently running. But I kind of need something more like a resource pool.  20:54
<clarkb> could have a pre run on the executor do a checkout of the account and lock it  20:55
<clarkb> and do it all in ansible  20:55
<clarkb> though not sure if you can then modify the inventory  20:56
<clarkb> that might be the issue with this plan  20:56
<SpamapS> I think I need a state store for that.  21:01
<SpamapS> and I may need one, period.  21:01
<clarkb> ya I think that's a given if you need to coordinate arbitrary data among jobs?  21:01
<SpamapS> yeah I was just trying to think if I could use something in zuul already but I don't think I can.  21:02
<SpamapS> I honestly just need a pool of integers.  21:02
<SpamapS> since I could have an array of 10 user accounts and just pick using the integers as offsets.  21:03
<SpamapS> Feels like something zookeeper or etcd would do well.  21:04
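
(A sketch of that pool-of-integers idea using ZooKeeper via kazoo, the client library Zuul and Nodepool already use. Each of N account slots is claimed with an ephemeral znode so a slot is released automatically when the holder's session dies; the path, slot count, and ZK endpoint below are assumptions for illustration.)

    from kazoo.client import KazooClient
    from kazoo.exceptions import NodeExistsError

    POOL_PATH = "/ci-accounts/slots"   # assumed chroot-style prefix
    POOL_SIZE = 10                     # one slot per pre-created account

    def acquire_account_slot(zk):
        """Claim one free slot (0..POOL_SIZE-1), or return None if all are taken."""
        zk.ensure_path(POOL_PATH)
        for slot in range(POOL_SIZE):
            try:
                zk.create(f"{POOL_PATH}/{slot}", ephemeral=True)
                return slot            # use this as an offset into the account list
            except NodeExistsError:
                continue               # someone else holds this slot; try the next
        return None

    zk = KazooClient(hosts="zk.example.org:2181")   # assumed ZK endpoint
    zk.start()
    slot = acquire_account_slot(zk)
    # The slot is held only while this client's ZK session is alive -- which is
    # exactly the caveat jeblair raises below about playbook-scoped processes.
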
<jeblair> things like this may or may not come under nodepool's remit.  we're giving it more node types, and we may extend it to support more types of cloud resources.  but this is perhaps more general.  perhaps extending semaphores in zuul to include data, or making secrets reservable, is another option.  21:04
<jeblair> SpamapS: but i agree, nothing off-the-shelf.  21:05
<clarkb> reservable secrets makes sense to me  21:05
<clarkb> "this is my trove cluster now" "this job has complete control of this k8s deployment"  21:06
<clarkb> maybe those aren't the best examples but ya  21:06
<jeblair> SpamapS: a pre-playbook could reserve it via zk or etcd, but you'd still need a way to put it back, and we need cleanup jobs (your idea!) for that.  we haven't implemented them yet, but they're probably not *too* hard.  21:06
<SpamapS> jeblair: oh yeah I'd love it if nodepool could be extended to do this.  21:07
<SpamapS> For now I think what I'm going to do is just abuse the zookeeper we already have for zuul, but with different creds and a different "chroot"  21:10
<jeblair> SpamapS: yeah, if you can do it all within one playbook then an ephemeral zk node should work well  21:12
<SpamapS> should work OK if I start a daemon-ish process early in each playbook which holds the resource while the playbook runs. As long as the node timeout is longer than the lag between pre/run/post  21:12
<SpamapS> No, I was thinking it would cross the boundaries, but that as long as we refresh the ephemeral nodes in time it wouldn't be a problem.  21:13
<jeblair> SpamapS: i'm not sure that's going to work.  once a playbook ends, bwrap will kill the process, and ephemeral nodes are tied to connections, so zk will soon realize that connection is dead and delete it.  if you tried to lock it again, it could race with another connection (and using the standard zk locking algorithm, if there is a race, it will lose, because it'll be at the back of the line)  21:18
<SpamapS> jeblair: there's a timeout for re-establishing that dead connection, I was hoping we can make that long enough to survive pre to run.  21:19
<jeblair> SpamapS: yeah, but you won't be re-establishing it, you'll be making a new one  21:19
<SpamapS> Actually  21:19
<SpamapS> I have nodes  21:19
<jeblair> doing it on the node works :)  21:19
<SpamapS> And in my evil world, those nodes can reach my zk (or.. a ZK anyway)  21:19
<SpamapS> but gah  21:20
<SpamapS> this got evil  21:20
<SpamapS> Maybe I can just write it as a zuul feature. ;)  21:20
<SpamapS> Another thought is to just write a little API frontend.  21:20
<jeblair> you could implement cleanup jobs :)  21:20
<SpamapS> Yeah  21:20
<SpamapS> Not sure how much more I can invest in Zuul.. just getting it up and running has been a bit of a side-job.  21:21
<SpamapS> Right now my automation spins up the heavy accounts and deletes them. And that's ok because the job that needs those accounts runs like, 3 times a day. But the admins of that service have made it clear that the harder I push it, the more it will fail.  21:22
<jeblair> also, i wonder if another sort of job might be useful -- one that starts before any others in the buildset, runs for the duration of the buildset, then gets a signal (or is killed) when all others are done.  so basically doing a pre-job + cleanup-job pair but with only one job.  21:22
<SpamapS> so the pool of accounts is something I'll have to do before I get more jobs on Zuul.. but it's also something I can't do until I get more buy-in for zuul ;-)  21:22
<SpamapS> jeblair: oh interesting!  21:22
<SpamapS> A supervisor job  21:23
<jeblair> good name  21:23
<jeblair> that's still probably about as much work as the cleanup job (they share enough that maybe we can implement both at once)  21:23
*** Cibo has joined #zuul  21:23
<jeblair> SpamapS: i have a really hacky idea  21:23
<jeblair> this is terrible and no one should ever do it  21:23
<jeblair> but, you could probably make your own supervisor job just by having a long-running zero-node job that queries the zuul status.json to figure out when its peers are all done.  21:24
<jeblair> you'd still need to do some inter-job communication outside of zuul to coordinate things.  but it's an option.  21:25
<jeblair> i'm not actually sure it's any less work than implementing cleanup job.  but just brainstorming here.  :)  21:25
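
(A rough illustration of that hacky supervisor job. The endpoint URL and the status.json layout assumed here -- pipelines -> change_queues -> heads -> items with an "id" and a "jobs" list -- are guesses to verify against your own Zuul, not a documented contract.)

    import json
    import time
    import urllib.request

    STATUS_URL = "https://zuul.example.org/status.json"   # assumed endpoint

    def peers_still_running(change_id, my_job_name):
        """Return True while any sibling job for our change lacks a result."""
        status = json.load(urllib.request.urlopen(STATUS_URL))
        for pipeline in status.get("pipelines", []):
            for queue in pipeline.get("change_queues", []):
                for head in queue.get("heads", []):
                    for item in head:
                        if item.get("id") != change_id:
                            continue
                        return any(job.get("result") is None
                                   for job in item.get("jobs", [])
                                   if job.get("name") != my_job_name)
        return False

    def supervise(change_id, my_job_name="supervisor", poll=30):
        # Hold the shared resource while peers run, then fall through to
        # whatever cleanup this supervisor job is responsible for.
        while peers_still_running(change_id, my_job_name):
            time.sleep(poll)
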
<SpamapS> I like it  21:27
<SpamapS> But I think the trouble is still finding the coordination point.  21:28
<SpamapS> dunno, I've sort of backburnered it now as I think I'll have to let it sit and stew for the next 2 weeks while I get some other stuff done  21:29
<SpamapS> I'm sure if I was coming to Sydney I could have figured it out with all of you over fried Wallaby toes.  21:29
<SpamapS> jlk: have you ever succeeded in getting Zuul to dump debug logs about the github API requests it is making?  21:35
<SpamapS> I'd really like to include the ETags and stuff so GH support can debug with me.  21:35
<openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Prime github app install map on connection load  https://review.openstack.org/517121  21:50
*** harlowja has quit IRC  22:12
<jlk> SpamapS: yeah, I thought with a log config that was cranked up to debug you can get some  23:01
<jlk> oh, maybe not. I see "caching due to etag" but not what the etag itself is  23:02
<jlk> so maybe it's more github3.py debugging  23:02
<jlk> let me see if I can insert a debugger here and get data  23:03
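
(One generic way to surface those ETag headers, as a hedged sketch rather than a Zuul configuration option: raise stdlib logging for github3.py and urllib3, and turn on http.client's header dump, which prints request and response headers -- including ETag -- to stdout.)

    import http.client
    import logging

    logging.basicConfig(level=logging.DEBUG)
    logging.getLogger("github3").setLevel(logging.DEBUG)
    logging.getLogger("urllib3").setLevel(logging.DEBUG)

    # requests/urllib3 sit on top of http.client; debuglevel=1 makes it print
    # "send:" / "reply:" lines with the full header set for every request.
    http.client.HTTPConnection.debuglevel = 1
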
*** harlowja has joined #zuul  23:04
<openstackgerrit> Emilien Macchi proposed openstack-infra/zuul-jobs master: version-from-git: fix logic with tags  https://review.openstack.org/517733  23:05
*** Cibo has quit IRC  23:39
<openstackgerrit> Merged openstack-infra/zuul-jobs master: version-from-git: fix logic with tags  https://review.openstack.org/517733  23:48
*** hashar has quit IRC  23:55
