Wednesday, 2018-11-07

*** caphrim007 has quit IRC00:03
*** ssbarnea has quit IRC00:28
jheskethcorvus, mordred, dmsimard: Yep, the start of which for locally run stuff is here: https://review.openstack.org/#/q/status:open+project:openstack-infra/zuul+branch:master+topic:freeze_job (I took a long vacation last month, so I'm just catching up on things and hope to continue that "real soon now"(tm))00:37
*** irclogbot_3 has quit IRC00:38
dmsimardjhesketh: ++ I know the feeling00:56
*** rlandy has quit IRC01:58
*** threestrands has joined #zuul02:30
*** bhavikdbavishi has joined #zuul03:52
dmsimardmy zuul-fu is rusty, what's the best approach when you'd like to override just a single parameter from a job that is otherwise okay?03:58
dmsimardlike, pretend in a totally hypothetical scenario that I'd like my tox-linters job to run on ubuntu-bionic instead of the current ubuntu-xenial03:58
dmsimardit's not enough to warrant creating a child job that would inherit tox-linters03:59
dmsimardit would just be a variant I guess?04:01
clarkbdmsimard: yes would just be a variant04:07
clarkbcan be as simple as setting nodeset where you choose to run the job04:07
dmsimardfor some reason I may misremember variants as being across branches of a single project04:08
dmsimardor perhaps that's just the most common use case.. I'm rusty T_T04:09
clarkbyes that is one instance where it happens04:10
tristanCdmsimard: you may also override parameter when adding the job to a pipeline04:11
tristanCdmsimard: e.g. https://softwarefactory-project.io/r/#/c/13585/3/zuul.d/openshift-jobs.yaml04:11
dmsimardtristanC: oh, thanks04:12
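A minimal sketch of the two approaches described above, assuming a hypothetical repository's .zuul.yaml and an already-defined nodeset named ubuntu-bionic (all names here are illustrative, not taken from the discussion):

- job:
    name: tox-linters
    nodeset: ubuntu-bionic        # variant: override only the nodeset, everything else is inherited

- project:
    check:
      jobs:
        - tox-linters:
            nodeset: ubuntu-bionic    # alternative: override where the job is attached to the pipeline

Either form layers onto the existing tox-linters definition rather than defining a child job.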
*** bhavikdbavishi has quit IRC04:30
*** pall has quit IRC05:28
*** threestrands has quit IRC06:23
tobiashtristanC, corvus: what do you think about making config errors a bit more prominent in the web ui?06:47
tobiashcurrently it's just a tiny little bell in the upper right corner. This can be overlooked easily by most users. I think having a real warning bar would be more appropriate as this should indicate something a project should fix.06:49
tobiashspotting this at first glance is especially important with the github workflows and protected branches, as those offer a lot of opportunities to introduce config errors which we have no chance to catch06:51
*** bhavikdbavishi has joined #zuul06:52
tristanCtobiash: i actually don't have an opinion :)06:52
tristanCtobiash: though i meant to add a generic error_reducer to simplify api error handling:06:53
*** andreaf has quit IRC06:53
tristanCin the main app.jsx, we could have a warning banner hidden on all the pages06:53
tobiashtristanC: for me as a multi-tenant operator this would be crucial as config errors are one of our recurring pain points06:54
tristanCand an error reducer that could make it visible when an api call fails, or in this case, when the config-error list isn't empty06:54
jheskethtobiash: my opinion fwiw is that it's appropriate. The majority of visitors to the zuul dashboard are looking for their job status or results, they are likely less concerned with the configuration06:54
*** andreaf has joined #zuul06:55
jheskethof course we should be logging the errors back to the patchsets introducing them, but we need to figure out a better way of testing config projects first06:55
tristanCtobiash: my concern with multi-tenant operation is that you have to go through each tenant to get their status and config-errors. What do you think of having top-level endpoints to return aggregated data?06:56
tobiashjhesketh: the problem is that with many workflows there are no patchsets that introduce the errors, just things like repo renames, protecting a branch, etc., which are all things any repo admin can do in github without any possibility for zuul to react or report06:56
tobiashtristanC: I'm not interested in seeing all config errors on a global scale, but in making them more visible to my users (who only care about their own tenant)06:57
tobiashtristanC: each tenant is mainly responsible for its own config here06:59
tobiashwhat we're doing atm to mitigate this as well as we can is to start a zuul to verify the global config on each proposed main.yaml change (e.g. to add/rename/remove repos)07:00
tobiashthat is not going to scale in the foreseeable future, so I'd like to change that to only validating whether the main.yaml itself is valid07:01
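For reference, the main.yaml in question is Zuul's tenant configuration file; a minimal sketch with placeholder organization and project names:

- tenant:
    name: example-tenant
    source:
      github:
        config-projects:
          - example-org/zuul-config
        untrusted-projects:
          - example-org/repo-one
          - example-org/repo-two

Checking only that this file parses and is well-formed is much cheaper than starting a whole zuul to load every project's in-repo configuration.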
jheskethtristanC: it would be nice to have a top-level view for administrators, but I agree with tobiash about not displaying them all to all tenants07:01
tobiashjhesketh: so currently the errors are already scoped by tenant, so that's great07:01
jheskethtobiash: and yes, I see your point about where errors may come from... I'm still not sure if the visitors to the dashboard will care for larger warnings though? If we ever build an admin set of pages perhaps it could be more prominent there?07:01
jhesketh:thumbsup:07:01
tobiashjhesketh: I think this just needs to be a bit more visible so I don't get complaints several times a week about why zuul is not acting on a specific repo, and have to tell them 'look, there is a tiny little bell that tells you why' ;)07:03
jheskeththat's fair and I'm not against making it a bit more obvious. I would be cautious of making it look like zuul is broken when an uninitiated developer visits it for the first time to check their patch's progress though07:05
tristanCfor a large tenant, like the one at zuul.openstack.org, wouldn't it be odd to display a single project's config errors as a very visible warning bar?07:05
jheskethperhaps it could be a large warning on the project's page (when that merges)07:05
tobiashtristanC: yes, maybe that needs to be different/configurable for single and multi tenant setups?07:06
tobiashI see that a warning bar could be annoying in large single tenant installations07:07
tobiash... while it could save me much time supporting projects in my large multi-tenant setup07:08
*** bhavikdbavishi has quit IRC07:10
*** quiquell has joined #zuul07:12
quiquellGood morning07:12
*** bhavikdbavishi has joined #zuul07:13
*** pcaruana has joined #zuul07:36
*** bhavikdbavishi has quit IRC07:37
*** quiquell is now known as quiquell|brb07:49
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: doc: fix typo in secret example  https://review.openstack.org/61609508:02
*** quiquell|brb is now known as quiquell08:04
*** themroc has joined #zuul08:16
*** bhavikdbavishi has joined #zuul08:23
*** hashar has joined #zuul08:33
*** bhavikdbavishi has quit IRC08:36
*** jpena|off is now known as jpena08:50
*** hashar has quit IRC08:55
*** goern has joined #zuul08:56
*** hashar has joined #zuul09:00
*** ssbarnea has joined #zuul09:19
*** electrofelix has joined #zuul09:29
*** ttx has quit IRC09:37
quiquellHello, I am using the zuul running in docker-compose as explained in the user guide to test stuff09:39
quiquellI am pushing a project there but it takes a lot of time at "Processing changes", is that normal?09:40
*** panda|off is now known as panda09:43
*** ttx has joined #zuul09:50
*** sshnaidm|afk is now known as sshnaidm|rover10:01
*** rfolco|ruck has joined #zuul10:37
*** hashar has quit IRC10:42
quiquellianw: I am testing zuul's docker-compose thingy10:51
quiquellianw: But I have some issues pushing projects there10:51
*** bhavikdbavishi has joined #zuul10:51
quiquellianw: It takes forever... do you know what it could be?10:51
*** bhavikdbavishi has quit IRC11:24
*** ssbarnea has quit IRC11:34
*** ssbarnea has joined #zuul11:53
*** bhavikdbavishi has joined #zuul11:59
*** snapiri has joined #zuul12:22
*** sshnaidm|rover is now known as sshnaidm|afk12:50
*** jpena is now known as jpena|lunch12:53
*** rlandy has joined #zuul12:54
*** jpena|lunch is now known as jpena13:18
*** goern has quit IRC13:21
*** bhavikdbavishi has quit IRC13:29
*** JosefWells has joined #zuul13:33
JosefWellsI'm having trouble getting zuul to run ansible playbooks..13:45
JosefWellsin my logs I see:13:45
JosefWellszuul-executor_1   | FileNotFoundError: [Errno 2] No such file or directory: 'ansible'13:45
*** pcaruana has quit IRC13:45
JosefWellsI put together a docker compose for zuul a while back13:45
JosefWellshttps://github.com/josefwells/zuul-docker13:45
JosefWellsIn the docker build I see ansible being grabbed as part of the pip install zuul13:46
tobiashJosefWells: is it in the path?13:46
JosefWellsRunning setup.py bdist_wheel for ansible: started   Running setup.py bdist_wheel for ansible: finished with status 'done'13:46
JosefWellsthat is really my question I guess13:47
JosefWellsit is pip installed in the container that zuul-executor is running in13:47
JosefWellsI guess not.. I need to add the zuul home pip bin to path13:47
JosefWellsthanks tobiash, let me add that to my dockerfile13:48
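A sketch of one way to do that, expressed here as a docker-compose override rather than a Dockerfile change; the service name and install prefix are assumptions about this particular setup:

services:
  zuul-executor:
    environment:
      # assumes zuul (and therefore ansible) were pip-installed into the zuul user's home
      - PATH=/home/zuul/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

The equivalent Dockerfile fix is an ENV PATH=... line pointing at wherever pip placed the ansible entry point.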
JosefWellsI saw some discussion on the mailing list about an official docker setup for zuul13:48
tobiash:)13:48
JosefWellsthis was my first rodeo with docker, so it is probably sub-optimal13:48
tobiashJosefWells: there is a new docker based quick start tutorial: https://zuul-ci.org/docs/zuul/admin/quick-start.html13:49
JosefWellscool, I'll take a look13:49
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Fix reporting ansible errors in buildlog  https://review.openstack.org/61620613:57
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Fix reporting ansible errors in buildlog  https://review.openstack.org/61620613:59
*** pcaruana has joined #zuul14:00
tobiashcorvus, mordred: ansible errors like missing roles are currently missing from the buildlog. I assume since the ansible update, and we didn't notice because of a lack of testing for this. So here is a fix and test ^14:08
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Fix reporting ansible errors in buildlog  https://review.openstack.org/61620614:11
mordredtobiash: whoops - and looks good to me14:12
*** smyers has quit IRC14:13
tobiash:)14:13
*** smyers has joined #zuul14:14
openstackgerritMerged openstack-infra/nodepool master: Correct heading levels for Kubernetes config docs  https://review.openstack.org/61600714:20
*** pcaruana has quit IRC14:33
*** pcaruana has joined #zuul14:34
*** quiquell is now known as quiquell|off16:19
*** hashar has joined #zuul16:30
*** hashar has quit IRC16:35
*** hashar has joined #zuul16:36
*** pcaruana has quit IRC16:38
*** hashar has quit IRC17:02
*** irclogbot_3 has joined #zuul17:02
openstackgerritFabien Boucher proposed openstack-infra/zuul master: WIP - Pagure driver  https://review.openstack.org/60440417:25
*** themroc has quit IRC17:30
*** jpena is now known as jpena|off17:46
openstackgerritTobias Henkel proposed openstack-infra/nodepool master: Add resource metadata to nodes  https://review.openstack.org/61626218:04
openstackgerritMerged openstack-infra/zuul master: Fix reporting ansible errors in buildlog  https://review.openstack.org/61620618:04
openstackgerritMerged openstack-infra/zuul master: doc: fix typo in secret example  https://review.openstack.org/61609518:04
*** panda is now known as panda|off18:10
Shrewstobiash: i don't believe i really understand that metadata change18:12
tobiashShrews: zuul's side is wip, just had an idea how we could have per project/tenant compute resource statistics18:13
tobiashShrews: the idea is that nodepool could add the amount of resources (cores, ram, more in the future) to each node as metadata18:13
tobiashand zuul could push usage data when locking/unlocking nodes18:14
tobiashto statsd18:16
corvusoh, i get it.  i like that idea.18:21
Shrewshrm, seems like the calculations might be expensive this way (depending on how often they're calculated)18:22
corvusit would only change on node status changes, so in the scheme of things, we wouldn't have to calculate it that often.18:25
tobiashthe idea was to just increment/decrement gauges (maybe stored in the tenant object)18:27
tobiashthat would happen on every lock/unlock of a nodeset18:27
Shrewsoh i see. so usage as zuul sees it18:27
tobiashyes18:27
tobiashand on nodeset unlock I'd also like to emit a counter of resource*duration18:28
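Purely as an illustration of the kind of statsd series that implies (the metric names below are hypothetical, not what the work-in-progress zuul change actually uses):

# gauges adjusted on every nodeset lock/unlock, one set per tenant (and/or project)
zuul.nodepool.resources.tenant.<tenant>.cores: gauge
zuul.nodepool.resources.tenant.<tenant>.ram: gauge
# counter emitted on nodeset unlock: resource * duration (e.g. core-seconds)
zuul.nodepool.resources.tenant.<tenant>.core_usage: counter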
Shrewsi wonder if there's a better way we can de-duplicate that data, rather than having it in every node18:28
tobiashShrews: well, we could store that in the node request18:29
tobiashbut that's slightly more complicated as this needs to be updated on every node that is added18:29
tobiashbut that would be fine for me too18:30
Shrewswhat you have is fine. i'm just thinking out loud right now18:30
corvusplus, different nodes may be different sizes18:31
corvus(we don't do that in openstack -- yet -- but the ability and intent is there)18:31
tobiashif we have it in the request we would have to sum it up in nodepool18:31
tobiashwe have different sized nodes so supporting that is required for me anyway :)18:32
tobiashdo you prefer a defined format for this or the loose metadata-dict approach I took?18:33
tobiashI'd be fine with both18:33
Shrewsi just mean that a particular flavor from a provider will always be the same. seems a bit wasteful to store that data in all the nodes matching that. but i'm just doing unnecessary pre-optimization that may not be necessary18:33
Shrewsthus my overuse of "unnecessary" which was totally unnecessary18:34
Shrewsand superfluous18:34
tobiashif we want to optimize that we would need to store the flavor data in zk, which is a new object and differs by provider18:34
corvusShrews: oh, yes, you're right.  it's not exactly third-normal form is it?  :)  but yeah, we don't (yet) have a zk record for provider-flavor so that's a little more difficult.18:34
Shrewstobiash: exactly where my head was headed, then zuul could do the lookup18:35
tobiashso I would prefer to store that little bit of extra data for the sake of simplicity18:35
Shrewscorvus: it's the db dev in me... i can't shake it :)18:36
tobiashShrews: we currently have 600kb of data in zk with 200 nodes so that really might be premature optimization ;)18:36
Shrewstobiash: yeah, but it's your "more in the future" statement that we should probably think a bit about18:37
tobiashShrews: the more in the future might be volume, disk, but I cannot think of much more18:38
Shrewsi wouldn't like small metadata to grow to large metadata, then we'd be forced to redesign18:38
tobiashShrews: then we probably should go with a fixed schema18:39
corvusi wonder if containers have any useful metadata?18:40
Shrewshrm18:40
Shrewsexposed ports? volumes? base image?18:40
tobiashthat's why this should probably be optional so zuul can use it or not depending on its existence18:40
corvusyeah, ports, volumes, networks are countable.  and containers themselves for the namespaces.18:40
corvusi like the 'meta' dictionary for things like this.  easy to put whatever makes sense for a given resource in there.18:41
corvus(though i could also see having a "resource" dictionary with the same content.  but either way, i do like the data-in-a-dict approach)18:42
* tobiash just wanted to suggest to rename that dict to resource18:42
Shrewsseems like this metadata could be grouped by "label" if we determine we need to go that route18:42
tobiash"label" can be different per provider18:43
Shrewsyep, so a /<provider>/<label> node hierarchy18:44
Shrewsnot suggesting we do that. just wildly throwing out ideas18:44
Shrewsmuch like mordred does18:44
tobiashlol18:45
Shrews:)18:45
Shrewsoh wait... do we already store some metadata somewhere???18:47
Shrewsyes! in the launcher node18:47
Shrewswe store supported labels when the launcher thread registers itself18:48
tobiashI'd suggest taking the easy approach and adding the counters to a resource dict per node (or per nodeset as an optimization?). This is an easier, less error-prone approach and I don't see a scaling limit there in the foreseeable future. We can switch to a more sophisticated approach any time if this gets to be too much somewhere in the future.18:48
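A sketch of what the per-node data could look like under that approach; the field set follows the discussion above (cores and ram now, possibly disk/volume later) and the values are invented:

# hypothetical node record in ZooKeeper, trimmed to the relevant fields
type: ubuntu-bionic
provider: example-cloud
state: in-use
resources:
  cores: 8
  ram: 8192        # MiB

Zuul would add these numbers to its per-tenant gauges when it locks the nodeset and subtract them again when it unlocks it.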
*** kmalloc is now known as needscoffee19:05
*** electrofelix has quit IRC19:47
*** hashar has joined #zuul20:11
*** needscoffee is now known as kmalloc20:19
*** rfolco|ruck is now known as rfolco|off20:40
*** jtanner has quit IRC20:47
*** mattclay has quit IRC20:47
*** mrhillsman has quit IRC20:47
*** jbryce has quit IRC20:47
*** hogepodge has quit IRC20:47
*** fdegir has quit IRC20:47
*** jbryce has joined #zuul20:48
*** mnaser has quit IRC20:48
*** mattclay has joined #zuul20:48
*** hogepodge has joined #zuul20:48
*** fdegir has joined #zuul20:48
*** jtanner has joined #zuul20:49
*** mnaser has joined #zuul20:49
*** andreaf has quit IRC20:51
dkehntobiash: I guess this is a repetitive question: when the scheduler is retrieving ProtectedBranch via the github connection, is it using the zuul user? The sshkey config variable is set, but I notice that there is no user/password combo20:51
*** andreaf has joined #zuul20:52
tobiashdkehn: it is using the auth mechanism you configured in the github connection21:00
tobiashso this can be a user or github app auth21:00
*** edleafe_ has joined #zuul21:03
dkehnok, sshkey is configured, just wanted to make sure21:06
openstackgerritTobias Henkel proposed openstack-infra/zuul master: WIP: Report tenant and project specific resource usage stats  https://review.openstack.org/61630621:07
dkehntobiash: would you have time to look at a pastebin of the error I’m seeing, https://pastebin.com/T9mDkwRM21:13
openstackgerritTobias Henkel proposed openstack-infra/nodepool master: Add resource metadata to nodes  https://review.openstack.org/61626221:13
tobiashdkehn: I guess you have protected branches in that repo?21:15
dkehntobiash: actually I’m not sure21:16
tobiashdkehn: oh, well it isn't even searching for protected branches: https://github.ibm.com/api/v3/repos/wrigley/zuul/experiment-conf/branches?per_page=100&protected=021:16
tobiashdkehn: so make sure that zuul has access to that repo21:17
tobiashto me it looks like zuul has no access to that repo at all which is the reason why github returns 40421:17
tobiashdkehn: do you have the zuul.conf at hand (remove credentials before upload)21:18
dkehntobiash: actually, when you explain it to someone else you see a glaring error in the config21:19
tobiash:)21:19
clarkbhttps://etherpad.openstack.org/p/BER-reusable-zuul-job-configurations etherpad prep for tuesday session in berlin on reusing zuul configs21:33
clarkbplease feel free to add topics you'd like to discuss or have others think about21:33
*** hashar has quit IRC21:49
*** hashar_ has joined #zuul21:50
openstackgerritJames E. Blair proposed openstack-infra/zuul master: WIP: Set relative priority of node requests  https://review.openstack.org/61535621:58
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Remove unneeded nodepool test skips  https://review.openstack.org/61635821:58
*** hashar_ has quit IRC22:07
corvustobiash: two questions on 604648: 1) what do you think about my idea of just updating the cache inside of lockNode and always using the cache?  2) regardless of that, why do we care about the races in stats?  surely that's a place where it would be better to use cached data?22:12
corvustobiash: (my understanding is that this cache uses callbacks, so the updates should be relatively quick, meaning the stats should not be far out of date)22:13
tobiashcorvus: even slightly delayed stats can prevent the stats from dropping to zero in certain situations22:15
tobiashcorvus: but there is a followup that fixes some of that22:15
corvustobiash: if we make the stats cheap (by using the cache) we could just send them periodically.22:16
corvus(so we could send them on every node update, or every 10 seconds if there are no node updates)22:16
corvusthat should clear things out.22:16
tobiashcorvus: this is the followup i meant: https://review.openstack.org/#/c/61368022:17
tobiashIt does something similar22:18
tobiashcorvus: regarding 1) The current version already defaults to using the cache22:20
tobiashThe cache is only disabled in a few cases in order to work around some test races22:21
corvustobiash: i think i'm suggesting the opposite.  rather than rate limit, keep the current behavior, but send periodic updates to clean up any lingering errors.22:21
tobiashThe current behavior isn't really cheap even with the cache22:21
corvustobiash: yeah, i'm wondering why you didn't think we needed to update the cache in the node lock.  basically... i wrote a nice review comment and got no reply.  :(22:21
tobiashSorry, was side tracked at that time22:22
corvustobiash: really?  it seems like with the cache it should be very fast.22:22
tobiashThere is a lot of json parsing involved while iterating, even with the cache22:23
tobiashSo we save the network rtts but not json parsing22:23
tobiashIf we want to save that we need to cache the node objects, which would be another cache layer22:24
corvustobiash: yeah, i think we should look at doing that; there should be update events we get from the treecache that can do that for us.22:24
tobiashRegarding update on lock, I think it's a good idea, then we could remove all the scattered safety updates after lock22:25
corvus++22:25
tobiashBut I think that's independent of the cache as the update after lock already exists everywhere22:26
corvusi don't think we need to block on the stats thing -- we can live with rate limited stats for now.  just i think we should work toward a system that's fast enough that we can count all the nodes in a reasonable amount of time.  :)22:26
corvustobiash: yes, though i think it should also give us the confidence to remove the cache flag.22:27
corvusso we can hide that detail22:27
tobiashYes, with the additional cache layer that will be much faster22:27
corvus(put another way -- i think it will be easier for us as nodepool developers to just remember that everything is cached, it could be slightly out of date, *unless* a node is locked -- then it's guaranteed to be up to date and immutable.  that's easy to understand :)22:28
tobiashBut starting only with the treecache was easier, maybe it makes sense to do that in two steps22:28
corvus++22:28
corvustobiash: one more question -- do you have any work in progress to cache the node request tree?  or immediate plans to do so?22:29
tobiashNot yet22:29
corvusok, i might do that very soon :)22:30
corvusi want it for https://review.openstack.org/61535622:30
tobiashSounds cool, we are very interested in your priority idea too :)22:31
tobiashSo if I can help...22:31
*** rfolco|off has quit IRC22:34
*** ssbarnea has quit IRC22:37
tobiashcorvus: one thought I have about that is that we might need to adapt the loop behavior of assignHandlers to make this work22:38
tobiashI observed that this loop skips a lot of requests due to lock contention with several non-matching providers22:39
tobiashSo it runs through the whole list and skips every request that it couldn't lock22:40
openstackgerritJames E. Blair proposed openstack-infra/zuul master: WIP: Set relative priority of node requests  https://review.openstack.org/61535622:41
tobiashMaybe we need some criteria to tell it to abort the loop and start again from the beginning22:42
corvustobiash: hrm, i don't understand the problem you're describing22:43
tobiashThe problem is that we have, say, two providers that can fulfill most of the requests and 8 pools with seldom-used static nodes22:44
tobiashAll of these loop over the request list, lock and decline22:45
tobiashSo there is a high probability that a high prio request is locked by a static provider that will decline22:46
tobiashBut that will cause that request to be skipped by the provider that can fulfill this22:46
tobiashSo it takes the next request, and the now-skipped high prio request will only be served after the provider has looped over the complete list and starts again from the beginning22:48
tobiashIf that probability is high enough this circumvents the priority queue22:49
tobiashWe have times when this lock contention leads to a more or less random processing order in the end22:50
clarkbtobiash: maybe this is an indication the priority system should include a rewrite to use node subscriptions instead of polls?22:50
clarkbor poll without locking?22:50
clarkbread data, if actionable lock, if lock fails assume someone else has lock and move to next potential candidate?22:51
tobiashI'm sure that this is partially an indication that we need to split launchers22:52
clarkboh you are talking about the current priority system (not the proposed one)22:52
clarkbmight still be an indication that we should lock less aggressively and/or avoid polling22:53
tobiashYes, but that wouldn't matter22:53
clarkbtobiash: reduced lock contention should make the ordering more deterministic?22:53
tobiashA solution could be to start from the beginning after we accepted a request (or a few)22:54
tobiashThat would also fit into the dynamic prio22:55
tobiashAnd this should be fast enough if we cache the requests22:57
tobiashYes, I guess in openstack land this is a non-issue because most providers can fulfill the same types?23:01
clarkbtobiash: yes, though there are cases where providers fulfill specific types (specifically arm and kata test nodes)23:04
corvustobiash: ah i see.  yes, restarting from the beginning would be a good fix for that i think, and low cost with request caching.23:06
SpamapSdid something happen to the add-build-sshkey role?23:06
corvusSpamapS: i think i made a recent change to it that should be a noop in most cases.23:07
SpamapShttp://paste.openstack.org/show/734380/23:07
corvusrecent = months23:07
SpamapSMy zuul was idle for a few weeks and now every job goes to NODE_FAILURE with that fail23:07
corvusSpamapS: oh, well, the role still exists23:08
SpamapShttp://paste.openstack.org/show/734381/23:08
SpamapShm23:08
tobiashSpamapS: node_failure normally happens before ansible23:09
tobiashSpamapS: or did you mean retry_limit?23:09
corvusSpamapS: the add-build-sshkey is in zuul-jobs, so you'll need that added to the roles for the job.  i notice your second error says "git.openstack.org/openstack-infra/zuul-base-jobs" but that refers to "git.zuul-ci.org/zuul-jobs", so something may not be adding up in terms of your connections.23:12
corvus(see http://git.zuul-ci.org/cgit/zuul-base-jobs/tree/zuul.yaml#n15 )23:12
SpamapSI had to use git.openstack.org because of {some reason I forget the details of}23:15
SpamapSit's possible somebody has fixed that now23:15
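The wiring corvus is pointing at, roughly: the base job in zuul-base-jobs pulls its roles (including add-build-sshkey) from the zuul-jobs repo, so that repo has to be loaded by zuul under the exact name used in the roles stanza. A sketch, with the playbook path as a placeholder:

- job:
    name: base
    parent: null
    roles:
      - zuul: git.zuul-ci.org/zuul-jobs   # must match the name zuul knows the zuul-jobs project by
    pre-run: playbooks/base/pre.yaml      # placeholder path

If the local connections make zuul-jobs known as, say, git.openstack.org/openstack-infra/zuul-jobs instead, the roles entry has to use that name, which is the mismatch being hinted at above.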
*** rlandy is now known as rlandy|bbl23:29

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!