Monday, 2019-12-16

*** tosky has quit IRC00:02
*** jamesmcarthur has joined #zuul00:23
*** jamesmcarthur has quit IRC00:29
*** jamesmcarthur has joined #zuul01:04
*** jamesmcarthur has quit IRC01:09
*** jamesmcarthur has joined #zuul01:35
openstackgerritIan Wienand proposed zuul/nodepool master: Dockerfile: add DEBUG environment flag  https://review.opendev.org/69484501:37
openstackgerritIan Wienand proposed zuul/nodepool master: Also build sibling container images  https://review.opendev.org/69739301:37
openstackgerritIan Wienand proposed zuul/nodepool master: Add container-with-siblings functional test  https://review.opendev.org/69346401:37
openstackgerritIan Wienand proposed zuul/nodepool master: Dockerfile: install nodepool-builder dependencies  https://review.opendev.org/69330601:37
openstackgerritIan Wienand proposed zuul/nodepool master: Add a container-with-releases functional test  https://review.opendev.org/69881801:37
openstackgerritIan Wienand proposed zuul/nodepool master: Functional tests - use common verification script  https://review.opendev.org/69883401:37
*** jamesmcarthur has quit IRC01:40
*** jamesmcarthur has joined #zuul02:36
*** jamesmcarthur_ has joined #zuul02:46
*** jamesmcarthur has quit IRC02:46
*** bhavikdbavishi has joined #zuul02:53
*** jamesmcarthur_ has quit IRC03:28
*** jamesmcarthur has joined #zuul03:32
*** jamesmcarthur has quit IRC03:37
*** swest has quit IRC04:08
*** openstackgerrit has quit IRC04:08
*** dmellado has quit IRC04:08
*** irclogbot_0 has quit IRC04:08
*** dmsimard has quit IRC04:08
*** shanemcd has quit IRC04:08
*** klindgren_ has quit IRC04:08
*** ianw has quit IRC04:08
*** aspiers has quit IRC04:08
*** gothicmindfood has quit IRC04:08
*** openstackstatus has quit IRC04:11
*** openstackstatus has joined #zuul04:14
*** ChanServ sets mode: +v openstackstatus04:14
*** raukadah is now known as chkumar|rover05:22
*** saneax has joined #zuul06:42
*** sanjayu_ has joined #zuul06:48
*** saneax has quit IRC06:51
*** themroc has joined #zuul07:15
*** pcaruana has joined #zuul07:16
*** jcapitao|off has joined #zuul07:47
*** tosky has joined #zuul08:02
*** jcapitao|off is now known as jcapitao08:16
*** hashar has joined #zuul08:37
*** sshnaidm|off is now known as sshnaidm08:41
*** avass has joined #zuul08:49
avassIs it possible to reload the tenant config for only one tenant?08:50
*** jpena|off is now known as jpena08:52
*** fbo has joined #zuul09:07
*** mugsie has quit IRC09:19
*** mugsie has joined #zuul09:21
*** yolanda has quit IRC09:28
*** yolanda__ has joined #zuul09:28
*** mhu has joined #zuul09:32
*** shanemcd has joined #zuul10:07
*** dmellado has joined #zuul10:07
*** ianw has joined #zuul10:07
*** klindgren has joined #zuul10:07
*** themroc has quit IRC10:07
*** themroc has joined #zuul10:08
*** irclogbot_0 has joined #zuul10:10
*** aspiers has joined #zuul10:14
*** jcapitao is now known as jcapitao|afk11:50
*** rfolco has joined #zuul12:03
*** dmsimard has joined #zuul12:12
*** pcaruana has quit IRC12:27
*** pcaruana has joined #zuul12:33
*** jpena is now known as jpena|lunch12:39
*** sanjayu_ has quit IRC12:56
*** rlandy has joined #zuul12:58
*** jamesmcarthur has joined #zuul13:04
*** jcapitao|afk is now known as jcapitao13:09
*** jamesmcarthur has quit IRC13:15
*** jpena|lunch is now known as pjena13:31
*** pjena is now known as jpena13:32
*** Goneri has joined #zuul13:44
fungiavass: i don't think so... what's the use case? is there some problem when the other tenants are reconfigured?13:48
mordredfungi, avass: I want to say this is something tobiash was interested in a while back14:05
mordredand maybe did something about? or maybe didn't do something about?14:06
tobiashmordred, avass: check out https://review.opendev.org/#/c/65211414:19
tobiashbtw, this is ready for review and we have used it in production for two months now14:19
tobiash;)14:20
tobiashfungi: with many tenants this stalls zuul quite a while. A full reconfiguration can take up to 20 minutes in our deployment14:21
*** saneax has joined #zuul14:29
*** openstackgerrit has joined #zuul14:30
openstackgerritMonty Taylor proposed zuul/zuul master: Add --check-config option to zuul scheduler  https://review.opendev.org/54216014:30
mnaserFYI my talk was accepted into FOSDEM so I’ll be talking zuul: https://fosdem.org/2020/schedule/event/safe_gated_and_integrated_gitops_for_kubernetes/14:32
tristanCtobiash: left a comment14:33
mordredmnaser: cool!14:34
tristanCi meant, as an operator, it may be confusing to pick between a smart-reconfigure and full-reconfigure... shouldn't the reconfigure be always smart?14:37
mordredtristanC: yeah, I'd imagine scripting to trigger reconfigures when landing a change would almost always want smart - I'm not sure if there's still a use case for full ... but maybe adding smart is a more conservative way to add it - and then maybe at some point in the future if we're all happy with it we just alias full to smart? it's a good question14:41
tristanCmordred: having both works for me, but since this updates the cli/unix-socket api, we might want to avoid adding a new command14:44
*** chkumar|rover is now known as ignoreirc14:57
*** ignoreirc is now known as chkumar|rover14:58
openstackgerritMerged zuul/nodepool master: Dockerfile: create APP_DIR  https://review.opendev.org/69364614:59
avassfungi: one of the tenants has a lot of branches in one of the projects, it's a bit annoying having to reload that one when we add a project on another tenant15:00
tobiashtristanC: full reconfig is still needed i.e. to fix inconsistent cached things15:02
avasstobiash: yeah, a full-reconfigure takes about 30-40 minutes for us15:02
tobiashAs it reloads all config while smart reconfig operates incrementally15:02
tristanCtobiash: then perhaps full-reconfigure could be renamed full-reload, and the smart-reconfigure be renamed full-reconfigure ?15:05
avassis the 'full' needed. how about 'reconfigure'?15:06
tristanCavass: well most zuul operators must already be using the 'full-reconfigure' command15:07
tobiashtristanC: a full-reconfigure does a full config reload while a smart-reconfigure does an incremental approach. I don't see the need to change the already existing full-reconfigure15:08
tristanCand i guess most will want the new smart-reconfigure command, thus i'm suggesting we make it the default15:08
corvusmordred: what did you think of my concern on 542160?15:08
tristanCtobiash: then that's ok, it seems like we can just s/full/smart/ in our playbooks15:09
tobiashThere is no default, full-reconfigure is named like that in anticipation of further reconfig variants15:09
avasstobiash: thanks for the link, looks good to me except a small spelling error :)15:10
tobiashno, because the use cases are different15:12
tobiashscratch my last sentence15:12
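The difference between the two commands discussed above (full-reconfigure reloads all config, smart-reconfigure works incrementally) can be illustrated with a toy model. This is only a sketch under stated assumptions, not Zuul's implementation: the function names, the dict-based tenant config, and the `load_tenant` callback are all hypothetical.

```python
# Toy model only: NOT Zuul's implementation. Every name here is hypothetical.

def full_reconfigure(new_config, load_tenant):
    """Reload every tenant, whether or not its config changed."""
    return {name: load_tenant(name, conf) for name, conf in new_config.items()}


def smart_reconfigure(old_config, new_config, loaded, load_tenant):
    """Reload only added or changed tenants; removed tenants are dropped."""
    result = {}
    for name, conf in new_config.items():
        if name in loaded and old_config.get(name) == conf:
            result[name] = loaded[name]             # unchanged: reuse as-is
        else:
            result[name] = load_tenant(name, conf)  # new or changed: reload
    return result


# Demo: record which tenants get (expensively) reloaded.
reloaded = []

def load_tenant(name, conf):
    reloaded.append(name)
    return {"tenant": name, "projects": list(conf["projects"])}

old = {"a": {"projects": ["p1"]}, "b": {"projects": ["p2"]}}
loaded = full_reconfigure(old, load_tenant)

reloaded.clear()
new = {"a": {"projects": ["p1"]}, "b": {"projects": ["p2", "p3"]}}
loaded = smart_reconfigure(old, new, loaded, load_tenant)
print(reloaded)  # only tenant "b" changed, so only "b" is reloaded
```

In this model a full reconfiguration remains the way to discard everything that was cached, which matches the reason given above for keeping it.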
mordredcorvus: oh - that's a good point - I missed that back there15:13
corvusi'll restate it with a -115:14
mordredcorvus: thanks - I agree, it's worth a discussion about user experience15:15
tobiashcorvus, mordred: my use case is that we want to be able to test changes to tenant config upfront15:16
tobiashwe spawn a second scheduler and mergers and check if the new config is valid15:16
tobiash(for the whole tenant)15:17
corvustobiash: and with the smaller system, it's able to load the configuration within the timeout?15:18
tobiashcorvus: it worked until half a year ago where it tool ~20min for startup, now we filter the tenant list in the job based on the diff15:18
tobiashs/tool/took15:19
tobiashand with that it typically runs only 5 minutes15:19
tobiashcorvus: would you be ok to add a big exclamation mark to the docs about the use case and how it is expected to be used?15:20
corvustobiash: ok, i have a few thoughts: 1) the idea of being able to run "program --validate-config" is pretty universal, so users will be surprised when they try to use it that they also have to start daemons just to see if other daemons will start.  for this, i think we should add a section to the docs about it, and even add a quick note to the cli args pointing there (like "caveat: see [doc section]").  2)15:23
corvussince even you can't use it as written, we might want to consider altering it to be "--validate-tenant" or something like that, so it takes an argument and filters for the one tenant.  but that's a suggestion, not a -1.15:23
corvus(#2 could be a followup)15:25
tobiashvalidate-tenant actually sounds like the way to go15:25
tobiashhowever this should accept a list, so --validate-tenants?15:25
corvussure, and maybe also accept * (and/or default to *)?15:26
tobiash++15:26
tobiashgreat, thanks15:26
corvussounds good; i'll copy/paste this conversation into review :)15:26
tobiash:)15:27
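The option shape proposed above (a flag that takes a list of tenants, accepts `*`, and defaults to all tenants) could look roughly like the following argparse sketch. At this point in the discussion the option is only a proposal, so the flag name, the program name, and the `select_tenants` helper are assumptions for illustration and not the real Zuul CLI.

```python
import argparse

# Hypothetical sketch of the proposed option; this is not the Zuul CLI.

def parse_args(argv):
    parser = argparse.ArgumentParser(prog="scheduler-sketch")
    parser.add_argument(
        "--validate-tenants",
        nargs="*",          # zero or more tenant names
        default=None,       # None means the flag was not given at all
        metavar="TENANT",
        help="validate only these tenants; no names or '*' means all",
    )
    return parser.parse_args(argv)


def select_tenants(all_tenants, requested):
    """Return the tenants to validate, in configuration order."""
    if requested is None:
        return []                       # flag absent: nothing to validate
    if not requested or "*" in requested:
        return list(all_tenants)        # bare flag or '*': every tenant
    unknown = sorted(set(requested) - set(all_tenants))
    if unknown:
        raise ValueError("unknown tenants: " + ", ".join(unknown))
    return [t for t in all_tenants if t in requested]


tenants = ["alpha", "beta", "gamma"]
args = parse_args(["--validate-tenants", "gamma", "alpha"])
print(select_tenants(tenants, args.validate_tenants))
```

On a real command line a literal `*` would need quoting so the shell does not expand it, which is one argument for also treating the bare flag as "all tenants".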
*** avass has quit IRC15:30
corvustobiash: 652114 lgtm but avass found a typo15:43
*** bhavikdbavishi has quit IRC15:46
*** themroc has quit IRC15:59
AJaegerzuul-jobs reviewers, https://review.opendev.org/#/c/696337/ has two +2s but a few questions and was not approved - anybody wants to +2A? Subject is "Add pypi_fqdn to differentiate it package mirrors"16:03
*** hashar has quit IRC16:05
*** hashar has joined #zuul16:07
*** jamesmcarthur has joined #zuul16:09
*** chkumar|rover is now known as raukadah16:14
*** saneax has quit IRC17:05
*** hashar has quit IRC17:14
pabelangerif multiple jobs produce artifacts, and a child depends on both, zuul.artifacts should be updated correctly in this use case?17:14
pabelangertesting it now to confirm, but figured I'd ask17:15
*** mattw4 has joined #zuul17:26
openstackgerritTobias Henkel proposed zuul/zuul master: Add support for smart reconfigurations  https://review.opendev.org/65211417:31
tobiashcorvus: thanks, fixed17:32
corvuspabelanger: yep, and multiple changes too17:37
corvustristanC: are you happy with https://review.opendev.org/652114  (i think your questions were answered in irc, but i wanted to double check)17:56
*** jpena is now known as jpena|off18:08
tristanCcorvus: i'm very happy with it, it's a much needed improvement18:21
corvustristanC: want to go ahead and +3 then?  that way we have a record that you're happy :)  (it also had a +2 from mordred before the typo fix)18:23
tristanCcorvus: done, thanks tobiash :)18:24
*** themroc has joined #zuul18:25
mordredthat should make avass happy18:25
*** sshnaidm is now known as sshnaidm|afk18:26
clarkbopendev had 1053 job retries in the previous 24 hour period. 73 of these ended with all three attempts failing. A good chunk of these 73 are due to pre-run failures that are consistent. The good news here is that we also had a cloud outage in this period18:29
clarkbThought I'd share some data on the utility of job retries18:30
corvusclarkb: wow, that sounds like things are working well18:30
corvusi mean, the whole pre-run retry idea is working well18:30
clarkbI think what this shows opendev is that we need the retries, but we don't want a very high limit as we do have a non trivial number of consistent failures18:30
clarkbcorvus: yup18:30
corvus(the internet and stuff on it is *not* working well :)18:30
clarkbI think this also means that the zuul default of 3 retries is a good one18:31
corvus++18:31
clarkbI also wrote up an email to openstack-discuss identifying cases of those 73 consistent failures for the related parties so that they can hopefully fix them (and in some cases they are already fixing them)18:32
clarkbother zuul operators may want to keep track of these numbers too as real issues can hide in those retries18:33
clarkbhttp://lists.openstack.org/pipermail/openstack-discuss/2019-December/011600.html if others are interested in the sorts of consistent retry failures we see18:34
clarkbanyway this has been in the back of my head for a while and I finally got around to putting a bit more logging in place. From that I was able to dig in and identify that this is a useful feature and our default is good, as well as point specific projects at improvements they can make. And with that I'm going to context switch to the next thing18:39
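For operators who want to track the same two numbers (total retries, and jobs where every attempt was retried), a minimal sketch of the counting follows. The record format is an assumption made for illustration: each attempt is a `(job, outcome)` pair and a retried attempt is marked with the outcome `"RETRY"`. It does not read from Zuul, logstash, or any real data source.

```python
from collections import Counter

# Hypothetical record format: one (job, outcome) pair per build attempt.

def summarize_retries(attempts, limit=3):
    """Return (total retried attempts, jobs whose attempts were all retried)."""
    retries_per_job = Counter(
        job for job, outcome in attempts if outcome == "RETRY"
    )
    total_retries = sum(retries_per_job.values())
    # A job is exhausted once it has been retried `limit` times.
    exhausted = sorted(job for job, n in retries_per_job.items() if n >= limit)
    return total_retries, exhausted


sample = [
    ("job-a", "RETRY"), ("job-a", "SUCCESS"),                 # recovered
    ("job-b", "RETRY"), ("job-b", "RETRY"), ("job-b", "RETRY"),  # exhausted
    ("job-c", "SUCCESS"),                                     # never retried
]
total, exhausted = summarize_retries(sample)
print(total, exhausted)
```

Jobs that land in the exhausted list are the ones worth reporting to their owners, since a consistent failure there is unlikely to be transient.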
clarkbThe next thing is the OSF Annual Report Zuul section. My plan is to get a draft going on an etherpad and share that with the channel18:39
mordredclarkb: that's some awesome data18:40
clarkbone thing we might want to consider is how to store retry data in the db18:42
clarkbaiui we only report the last attempt to the database18:42
mordredinfra-root: it has come to my attention that I have some vacation days that I have to take or lose - so I'm going to be primarily AFK for the rest of the week. I'll probably still be lurking around since I'd intended to be working so we don't have a ton of extra things to do teed up18:42
clarkbthis means it might be difficult for zuul operators to track retries without having another system (like opendev's logstash/elasticsearch)18:42
*** themroc has quit IRC18:42
corvusmordred: vacationing by volunteering your time to an open source project? :)18:43
mordredcorvus: :)18:43
clarkbMy parents are flying into town thursday I expect my hours will become weird starting about then18:44
Shrewsmordred: be aware that you have until like mid february to use it18:44
corvusclarkb: yeah; i think in general we're going to need to store non-final builds and buildsets.  that would also let us have zuul-web build pages for completed builds before their buildset is complete (another common complaint).  i think that can come as part of (or after) the conversion of the sql reporter from a reporter to a mandatory component.18:45
corvus(but right now, it's structurally hard to do that)18:46
mordredShrews: yah - I think this is likely the best time to do it since stuff is slow anyway (and also I'm on the Mexican Riviera)18:46
clarkbcorvus: roger18:48
Shrewsmordred: that being said, it's a good reminder for me to examine which days to use my remaining time18:48
*** armstrongs has joined #zuul18:49
*** rfolco has quit IRC19:09
*** rfolco has joined #zuul19:10
mordredShrews: ++19:17
openstackgerritMerged zuul/zuul master: Add support for smart reconfigurations  https://review.opendev.org/65211419:21
fungimordred: glad to hear you've got some vacation time coming and so we'll be seeing more of you!19:21
*** rlandy is now known as rlandy|brb19:22
fungii personally am being shanghaied on a boat which is not destined for shanghai, and will disappear from recognizance the day after next19:25
*** jamesmcarthur has quit IRC19:27
*** jamesmcarthur has joined #zuul19:28
*** rlandy|brb is now known as rlandy19:45
*** jamesmcarthur has quit IRC19:48
*** mhu has quit IRC19:50
*** themroc has joined #zuul19:51
*** themroc has quit IRC19:51
*** Goneri has quit IRC19:53
tobiashclarkb: re retries, check out https://review.opendev.org/#/c/63350119:54
tobiashThat also helped us to analyse the reason of a retry (at least if log upload still worked)19:58
clarkbtobiash: oh that would be great20:05
clarkbcorvus: ^  that may be a workaround for the db issues?20:05
clarkbcorvus: maybe you can review that to make sure it doesn't conflict with your plans stated earlier?20:05
clarkbzuulians: the first draft of the annual report is at the bottom of fungi's data gathering etherpad. I put it there because I reference the data in the etherpad and this way it is easy for people to double check I got things correct. https://etherpad.openstack.org/p/zuul-2019-annual-report-data20:48
clarkbwe've been asked to have this ready by January 10 so not a huge rush but with holidays I figured it was better to start early than late20:48
*** jamesmcarthur has joined #zuul20:49
clarkbfeel free to edit or make suggestions.20:56
*** jamesmcarthur_ has joined #zuul20:58
*** gothicmindfood has joined #zuul20:59
*** jamesmcarthur has quit IRC21:01
corvusclarkb, tobiash: in principle i think that could work and i don't think it would impact future db changes.  but i think there are some problems with those patches with current code; i left some comments on 633501.21:06
*** Goneri has joined #zuul21:06
corvusalso, i'm still actually a little confused about how that would work in practice since there are generally no logs for retried builds.21:06
corvuslike, you just get to see that the build was retried.  it's pretty durned hard to find out why.21:07
clarkbthere are logs if it fails in pre run like devstack jobs or if the disk issues that ironic sees occur21:07
clarkbbut ya not in all cases21:07
*** jcapitao has quit IRC21:47
*** pcaruana has quit IRC21:55
*** rfolco is now known as rfolco|bbl21:58
*** mattw4 has quit IRC22:04
*** mattw4 has joined #zuul22:04
*** mattw4 has quit IRC22:28
*** mattw4 has joined #zuul22:28
*** rlandy is now known as rlandy|bbl22:51
*** jamesmcarthur_ has quit IRC23:26
*** tosky has quit IRC23:43
