Tuesday, 2020-07-07

*** rlandy has quit IRC00:00
openstackgerritMerged zuul/zuul master: Add a simple test for upstream renaming branches  https://review.opendev.org/73925500:14
fungiy2kenny: not sure if it's a direction you want to go, but nodepool launcher daemons have worker threads which perform periodic tasks for the purpose of node lifecycle management. maybe one of them could renew tokens for any nodes it knows about?00:18
openstackgerritMerged zuul/zuul master: Expire Github installation key 3 minutes before  https://review.opendev.org/73877200:25
openstackgerritGuillaume Chauvel proposed zuul/zuul master: scheduler: Fix event process abide hasUnparsedBranchCache argument  https://review.opendev.org/73904200:26
openstackgerritGuillaume Chauvel proposed zuul/zuul master: Fix branch name and project name for ref-updated create/delete  https://review.opendev.org/73832000:26
openstackgerritGuillaume Chauvel proposed zuul/zuul master: FakeGerritChange: Add Change-Id in commit message  https://review.opendev.org/73919700:26
openstackgerritGuillaume Chauvel proposed zuul/zuul master: WIP: Scheduler: Reconfiguration ref-updated create/delete  https://review.opendev.org/73919800:26
openstackgerritGuillaume Chauvel proposed zuul/zuul master: WIP: Scheduler: Reconfiguration ref-updated oldrev+newrev  https://review.opendev.org/73907800:26
*** hashar has joined #zuul00:46
*** rfolco has quit IRC00:48
*** hashar has quit IRC01:11
*** swest has quit IRC01:50
*** swest has joined #zuul02:04
y2kennyfungi: do you separate from the cleanup leaked node ones?  are those worker threads accessible from the drivers?02:09
y2kennydid you mean*02:09
*** bhavikdbavishi has joined #zuul02:21
*** hamalq has quit IRC02:25
*** ysandeep|away is now known as ysandeep02:26
*** hamalq has joined #zuul02:33
*** bhavikdbavishi has quit IRC02:34
*** bhavikdbavishi has joined #zuul02:37
fungithe cleanup thread would be an example... it's been a while since i poked around in that bit of the codebase but they'd be driver-specific by nature02:54
y2kennyfungi: ok thanks02:56
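A minimal sketch of the periodic-task idea fungi describes. It assumes a driver that can list the nodes it manages and renew a token for each one. `TokenRenewalWorker`, `list_nodes` and `renew_token` are hypothetical names used for illustration, not nodepool APIs. A real driver would hook into whatever worker threads nodepool already runs.

```python
import threading
from typing import Callable, Iterable, List, Tuple


class TokenRenewalWorker(threading.Thread):
    """Hypothetical periodic worker; not part of nodepool."""

    def __init__(self, list_nodes: Callable[[], Iterable[str]],
                 renew_token: Callable[[str], None], interval: float = 60.0):
        super().__init__(daemon=True)
        self._list_nodes = list_nodes    # returns ids of nodes the launcher knows about
        self._renew_token = renew_token  # renews the credential for one node
        self._interval = interval
        self._stop_event = threading.Event()
        self.errors: List[Tuple[str, Exception]] = []  # (node_id, error) pairs

    def run_once(self) -> None:
        # One pass over all known nodes. A failure on one node must not
        # prevent renewal for the others, so errors are recorded per node.
        for node_id in self._list_nodes():
            try:
                self._renew_token(node_id)
            except Exception as exc:
                self.errors.append((node_id, exc))

    def run(self) -> None:
        # Event.wait() is an interruptible sleep: it returns False after
        # `interval` seconds (do a pass) or True once stop() is called (exit).
        while not self._stop_event.wait(self._interval):
            self.run_once()

    def stop(self) -> None:
        self._stop_event.set()
        self.join()
```

Using `Event.wait()` as the sleep keeps shutdown prompt, and catching errors per node means one broken node does not block renewal for the rest.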
*** hamalq has quit IRC03:04
*** bhavikdbavishi1 has joined #zuul03:07
*** bhavikdbavishi has quit IRC03:09
*** bhavikdbavishi1 is now known as bhavikdbavishi03:09
*** bhavikdbavishi has quit IRC04:04
*** bhavikdbavishi has joined #zuul04:14
*** wuchunyang has joined #zuul04:33
*** evrardjp has quit IRC04:33
*** evrardjp has joined #zuul04:33
*** sgw1 has quit IRC04:39
*** vishalmanchanda has joined #zuul04:44
*** wuchunyang has quit IRC04:59
*** wuchunyang has joined #zuul04:59
*** sugaar has quit IRC05:02
swestianw: would be great if you could have another look at https://review.opendev.org/#/c/728824/05:25
*** y2kenny has quit IRC05:25
*** wuchunyang has quit IRC05:29
*** bhagyashris is now known as bhagyashris|brb05:50
*** marios has joined #zuul06:01
*** bhagyashris|brb is now known as bhagyashris06:16
*** bhavikdbavishi1 has joined #zuul06:20
*** bhavikdbavishi has quit IRC06:22
*** bhavikdbavishi1 is now known as bhavikdbavishi06:22
ianwswest: sorry, i've just had my head in non dib things06:22
*** wuchunyang has joined #zuul06:31
openstackgerritFelix Edel proposed zuul/zuul master: Introduce Patternfly 4  https://review.opendev.org/73622506:33
openstackgerritFelix Edel proposed zuul/zuul master: PF4: Add new Zuul logo with text  https://review.opendev.org/73803306:33
openstackgerritFelix Edel proposed zuul/zuul master: PF4: Update "fetching info ..." and refresh animation  https://review.opendev.org/73801006:33
openstackgerritFelix Edel proposed zuul/zuul master: PF4: Update buildset result page (new layout and styling)  https://review.opendev.org/73801106:33
swestianw: thanks a lot06:59
*** bhavikdbavishi has quit IRC07:06
openstackgerritMerged zuul/zuul master: Replace cookie use with localStorage  https://review.opendev.org/73945407:13
*** jcapitao has joined #zuul07:22
*** bhavikdbavishi has joined #zuul07:30
*** tosky has joined #zuul07:36
*** hashar has joined #zuul07:42
*** saneax has joined #zuul07:47
openstackgerritSorin Sbarnea (zbr) proposed zuul/zuul master: Reduce table nesting on build pages  https://review.opendev.org/73955907:50
zbrapparently we still have gate failures on zuul https://zuul.opendev.org/t/zuul/build/f973fd1dff9743449e6c1a34eea5f3aa07:51
zbrpromote pipeline failed consistently for more than... zuul can display.07:52
openstackgerritBenjamin Schanzel proposed zuul/zuul master: github: use change.message in squahsed commit message  https://review.opendev.org/73601908:04
*** sugaar has joined #zuul08:10
*** nils has joined #zuul08:10
*** nils has quit IRC08:11
*** nils has joined #zuul08:11
*** ysandeep is now known as ysandeep|lunch08:34
tobiashcorvus: I've administratively -2ed the auto gc patch (https://review.opendev.org/723800) so we can observe this in combination with the followup for 1-2 weeks in prod08:44
tobiashcorvus: we're running this since many weeks now and the followup (which we deployed yesterday) should rule out the last problem with it. So I'd like to give it a burn in test in our prod system for another week or so before landing those.08:45
openstackgerritSorin Sbarnea (zbr) proposed zuul/zuul master: Avoid interactive when building containers  https://review.opendev.org/73968009:00
*** bhavikdbavishi has quit IRC09:11
openstackgerritSorin Sbarnea (zbr) proposed zuul/zuul master: Attempt to remove .keep file  https://review.opendev.org/73968409:11
tobiashzbr: regarding ^ check out https://review.opendev.org/66310809:17
zbrtobiash: lol, more than a year to remove a file, and not merged yet.09:18
tobiashzbr: it was harder than initially expected and low prio ;)09:18
zbrsadly i cannot help you merge it, but once recheck passes i will try to poke others.09:19
openstackgerritFelix Edel proposed zuul/zuul master: Introduce Patternfly 4  https://review.opendev.org/73622509:20
openstackgerritFelix Edel proposed zuul/zuul master: PF4: Add new Zuul logo with text  https://review.opendev.org/73803309:20
openstackgerritFelix Edel proposed zuul/zuul master: PF4: Update "fetching info ..." and refresh animation  https://review.opendev.org/73801009:20
openstackgerritFelix Edel proposed zuul/zuul master: PF4: Update buildset result page (new layout and styling)  https://review.opendev.org/73801109:20
tobiashto be fair I wasn't too active on that change and forgot to ask for a review after addressing the comments09:21
tobiashcorvus: ++ for the branch release dance :)09:27
*** jcapitao is now known as jcapitao_afk09:30
*** tumble has joined #zuul09:31
*** bhavikdbavishi has joined #zuul09:43
*** holser has quit IRC09:48
*** ysandeep|lunch is now known as ysandeep09:48
*** jcapitao_afk is now known as jcapitao09:49
*** tosky has quit IRC10:00
zbrtobiash: corvus: small https://review.opendev.org/#/c/739680/10:00
*** tosky has joined #zuul10:01
tobiashzbr: afaik this has been fixed in the python-builder base image. Can you update them and try again?10:03
zbrafaik these values are not persistent inside the image.10:05
zbri am almost sure my image was downloaded and failed to run10:06
zbrbasically you need to be sure you defined them whenever you perform the build10:06
*** hashar has quit IRC10:07
*** wuchunyang has quit IRC10:08
tobiashzbr: the central fix was supposed to be https://review.opendev.org/73812110:09
zbri got the error today, on master.10:09
zbri bet it came from another call of apt10:10
*** bhavikdbavishi has quit IRC10:13
*** holser has joined #zuul10:15
zbrtobiash: I can assure you that "docker build ." on zuul master does trigger the interactive prompt.10:20
zbryour fix was good, but for another use-case.10:20
zbrhttp://paste.openstack.org/show/795598/10:22
tobiashit wasn't my fix ;)10:22
*** bhavikdbavishi has joined #zuul10:28
*** bhagyashris is now known as bhagyashris|brb10:37
*** harrymichal has joined #zuul10:41
*** wuchunyang has joined #zuul10:50
*** wuchunyang has quit IRC10:55
*** felixedel has joined #zuul10:57
*** jcapitao is now known as jcapitao_lunch10:59
felixedelzuul-maint: Question about the build result page: Is there a reason why there are three dedicated pages Build, BuildLogs and BuildConsole instead of just a single page using three tabs? Currently, each of those three pages is kind of using the same container, but fills it with a different content and activates a different tab.11:00
openstackgerritTobias Henkel proposed zuul/zuul master: Resume jobs after reenqueue of an item  https://review.opendev.org/73970911:01
*** holser has quit IRC11:06
*** bhagyashris|brb is now known as bhagyashris11:12
avasstobiash: wanna +2: https://review.opendev.org/#/c/727158/ to drop ansible 2.6 in zuul-jobs?11:16
tobiashavass: you mean +3?11:17
avassshould have been done by 2nd of july: http://lists.zuul-ci.org/pipermail/zuul-announce/2020-June/000075.html11:17
avassah yeah11:17
tobiashdone11:17
avassthanks!11:18
openstackgerritMerged zuul/zuul-jobs master: Drop support for ansible 2.6  https://review.opendev.org/72715811:32
tristanCfelixedel: none that I can remember... it sounds like a refactor opportunity11:36
*** ysandeep is now known as ysandeep|afk11:41
felixedeltristanC: I think I found it: https://review.opendev.org/#/c/675235/11:41
*** felixedel has quit IRC11:43
*** hashar has joined #zuul11:52
*** rfolco has joined #zuul12:04
*** ysandeep|afk is now known as ysandeep12:06
*** jcapitao_lunch is now known as jcapitao12:15
*** holser has joined #zuul12:18
*** rlandy has joined #zuul12:19
*** vishalmanchanda has quit IRC12:22
*** harrymichal has quit IRC12:29
*** harrymichal has joined #zuul12:29
*** bhavikdbavishi has quit IRC12:35
*** bhavikdbavishi has joined #zuul12:36
*** piotrowskim has joined #zuul12:52
*** sgw1 has joined #zuul13:11
*** Goneri has joined #zuul13:18
*** Goneri has quit IRC13:33
*** bhavikdbavishi has quit IRC13:51
openstackgerritSimon Westphahl proposed zuul/zuul master: Protect repo update for cat/fileschanges with lock  https://review.opendev.org/73976113:52
openstackgerritSimon Westphahl proposed zuul/zuul master: Correctly fail cat/fileschanges when update fails  https://review.opendev.org/73976213:52
*** olaph has quit IRC13:58
*** Goneri has joined #zuul14:02
openstackgerritTristan Cacqueray proposed zuul/zuul-operator master: Update to dhall lang v14  https://review.opendev.org/73976714:03
openstackgerritTristan Cacqueray proposed zuul/zuul-operator master: Add reformat changes to the blame ignore list  https://review.opendev.org/73976814:03
openstackgerritTristan Cacqueray proposed zuul/zuul-operator master: Update to dhall lang v17  https://review.opendev.org/73976714:03
openstackgerritTristan Cacqueray proposed zuul/zuul-operator master: Add reformat changes to the blame ignore list  https://review.opendev.org/73976814:04
*** bhagyashris is now known as bhagyashris|dinn14:09
*** holser has quit IRC14:35
*** holser has joined #zuul14:36
*** saneax has quit IRC14:51
tobiashtristanC: mind an easy review? https://review.opendev.org/738620 (removes some unused variables)14:54
*** saneax has joined #zuul14:55
*** bhagyashris|dinn is now known as bhagyashris14:56
*** saneax has quit IRC14:56
*** saneax has joined #zuul14:57
corvustobiash: sounds good (re admin -2 on git patch)15:00
openstackgerritTristan Cacqueray proposed zuul/zuul-operator master: Use ensure-pip role to unblock the CI  https://review.opendev.org/73978615:00
zbravass: https://github.com/ansible/ansible-lint/pull/881 please, i think that webknjaz may be on vacation this week. ignore zuul error (infra)15:00
tobiashcorvus: great, I'll +w it in 1-2 weeks if we don't observe issues15:01
*** saneax has quit IRC15:01
*** saneax has joined #zuul15:04
*** saneax has quit IRC15:05
*** hamalq has joined #zuul15:10
*** bhavikdbavishi has joined #zuul15:12
*** hamalq_ has joined #zuul15:12
*** hamalq has quit IRC15:15
*** bhavikdbavishi has quit IRC15:18
*** vishalmanchanda has joined #zuul15:25
*** bhavikdbavishi has joined #zuul15:28
openstackgerritBenjamin Schanzel proposed zuul/zuul master: github: use change.message in squahsed commit message  https://review.opendev.org/73601915:29
*** ysandeep is now known as ysandeep|away15:32
openstackgerritGuillaume Chauvel proposed zuul/zuul master: scheduler: Fix event process abide hasUnparsedBranchCache argument  https://review.opendev.org/73904215:38
openstackgerritGuillaume Chauvel proposed zuul/zuul master: Fix branch name and project name for ref-updated create/delete  https://review.opendev.org/73832015:38
openstackgerritGuillaume Chauvel proposed zuul/zuul master: FakeGerritChange: Add Change-Id in commit message  https://review.opendev.org/73919715:38
openstackgerritGuillaume Chauvel proposed zuul/zuul master: WIP: Scheduler: Reconfiguration ref-updated create/delete  https://review.opendev.org/73919815:38
openstackgerritGuillaume Chauvel proposed zuul/zuul master: WIP: Scheduler: Reconfiguration ref-updated oldrev+newrev  https://review.opendev.org/73907815:38
zbrfelixedel did a great job with pf4, i wonder when it will merge. the new UI looks a bit weird, but not necessarily in a bad way.15:39
zbrtobiash: the .keep removal still has an issue15:41
tobiashyes :(15:42
*** bhavikdbavishi1 has joined #zuul15:52
zbrhow can i find all jobs that run with a specific nodeset? i am aware of a problem with fedora-31 but i do not know how to find jobs that used it recently (last 24h)15:52
openstackgerritMerged zuul/zuul master: Avoid interactive when building containers  https://review.opendev.org/73968015:53
zbri am not aware of any filtering tricks that could allow me to do that15:53
corvuszbr: i don't think zuul provides that.  but if you want to hop over to #opendev, i can look that up for you in the logs.15:53
*** bhavikdbavishi has quit IRC15:53
*** bhavikdbavishi1 is now known as bhavikdbavishi15:53
tobiashotherwise one would need to query all builds, squash them into a list of jobnames and query them from the zuul api (which includes the nodeset)15:55
fungialso, "i want to know all the jobs with <property x>" is a bit vague. presumably the desire is to just know about jobs in merged configuration and not in speculative configuration15:57
*** reiterative has quit IRC15:57
fungibut zuul can run plenty of jobs which are not in its merged configuration state15:57
tobiashin that case one could loop over the jobs list, which is non-speculative15:57
*** reiterative has joined #zuul15:58
tobiashbut doesn't cover the 'in the last 24h' filter15:58
fungiright, "jobs which ran with <property x>" and "jobs in merged config which could run with <property x>" are overlapping sets, but neither is necessarily a proper subset of the other15:59
fungiif this were going to eventually become queryable from the builds api then i guess it should be the former, but from the jobs api it should be the latter16:00
*** marios has quit IRC16:02
corvuswe don't store enough info in the builds api (yet) to make that queryable.  there's a node field, but it's inadequate for the current data model16:02
corvuswe don't store enough info in the builds table (yet) to make that queryable.  there's a node field, but it's inadequate for the current data model16:03
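A rough sketch of tobiash's workaround. It assumes the Zuul REST endpoints `/api/tenant/<tenant>/builds` and `/api/tenant/<tenant>/job/<name>`, that job variants expose `nodeset.nodes[].label`, and that builds carry naive-UTC ISO `start_time` strings; check these shapes against your Zuul version. As fungi points out, it only inspects the merged configuration of jobs that ran recently, so a build that used a speculative nodeset change would be missed. `recent_job_names`, `variant_uses_label` and `jobs_using_label` are hypothetical helpers.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Set


def recent_job_names(builds: Iterable[dict], since: datetime) -> Set[str]:
    """Collapse build records into the set of job names that started at or after `since`."""
    names = set()
    for build in builds:
        start = build.get("start_time")
        if not start:
            continue  # builds that never started have no start time
        # Assumes naive ISO-8601 UTC strings such as "2020-07-07T15:52:00".
        if datetime.fromisoformat(start) >= since:
            names.add(build["job_name"])
    return names


def variant_uses_label(variant: dict, label: str) -> bool:
    """True if any node in this job variant's nodeset requests `label` (assumed shape)."""
    nodeset = variant.get("nodeset") or {}
    return any(node.get("label") == label for node in nodeset.get("nodes", []))


def _get_json(url: str):
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def jobs_using_label(base_url: str, tenant: str, label: str,
                     hours: int = 24, limit: int = 1000) -> List[str]:
    """Recently run job names whose merged-config variants request `label`."""
    since = datetime.now(timezone.utc).replace(tzinfo=None) - timedelta(hours=hours)
    builds = _get_json("%s/api/tenant/%s/builds?limit=%d" % (base_url, tenant, limit))
    matches = []
    for name in sorted(recent_job_names(builds, since)):
        quoted = urllib.parse.quote(name, safe="")
        variants = _get_json("%s/api/tenant/%s/job/%s" % (base_url, tenant, quoted))
        if any(variant_uses_label(v, label) for v in variants):
            matches.append(name)
    return matches
```

For example, `jobs_using_label("https://zuul.opendev.org", "zuul", "fedora-31")` would list recent jobs in the zuul tenant whose current definitions request that label, at the cost of one API call per distinct job name.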
*** jcapitao has quit IRC16:11
*** ysandeep|away is now known as ysandeep16:21
openstackgerritMerged zuul/zuul-jobs master: Allow deleting workspace after running terraform destroy  https://review.opendev.org/73877116:29
*** chandankumar is now known as raukadah16:38
*** SpamapS has quit IRC16:42
webknjaz@zbr: Monday was a public holiday in CZ, but not the whole week16:52
openstackgerritMerged zuul/zuul master: GitHub Reporter: Fix User Email in Merge Commit Message  https://review.opendev.org/73859016:58
openstackgerritMerged zuul/zuul master: Introduce Patternfly 4  https://review.opendev.org/73622517:07
openstackgerritMerged zuul/zuul master: PF4: Add new Zuul logo with text  https://review.opendev.org/73803317:11
fungiwebknjaz: you could have pretended, most of us wouldn't have questioned it ;)17:12
webknjazlol 😂17:19
zbrtoo late to hide now17:19
*** SpamapS has joined #zuul17:20
openstackgerritMerged zuul/zuul master: Remove some unused variables  https://review.opendev.org/73862017:23
corvustobiash, tristanC: this is something i've noticed with the zuul for the gerrit project.  it's running in k8s with a single mysql server, and if the scheduler pod restarts while the sql server is down, it disables the sql reporter.  then i need to manually restart the scheduler to fix it.17:29
corvusthe best solution would probably be to have an HA mysql service.  :)17:29
corvusbut should we also re-think how we handle disabling sql reporters?17:29
tobiashHA mysql is always good to have :)17:30
tobiashI think we shouldn't disable them at all probably17:30
corvusshould we have the scheduler wait for sql before starting?  or should we have it continually retry after starting and put it into service if it shows up17:30
corvusor that17:30
corvusjust let it soft-fail like any other reporter if it isn't there17:30
tobiash++17:30
corvus(i think the main thing is that right now the scheduler does the schema upgrades, so there is something special that has to happen on start)17:31
corvus(but maybe that can still happen any time)17:31
tobiashhrm, good question17:31
corvus(ugh, also, the k8s cluster has done the thing where it lost its internal dns server; i think i need to reboot the whole cluster)17:32
tobiashwe saw schema updates taking up to 30min in prod and actually I'd rather wait for it to succeed except 'losing' builds from the users' point of view17:32
tobiashs/except/instead of17:32
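If the scheduler should wait for the database rather than disable the reporter, the wait itself can be a small retry loop. A minimal sketch, assuming a `connect` callable that raises while the server is down. `timeout=None` models waiting indefinitely; a finite timeout re-raises the last error so that startup still fails. The function and its parameters are hypothetical, not Zuul code.

```python
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def wait_for_database(connect: Callable[[], T],
                      timeout: Optional[float] = 300.0,
                      interval: float = 5.0,
                      retry_on: Tuple[Type[BaseException], ...] = (Exception,),
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.monotonic) -> T:
    """Retry `connect` until it succeeds or the deadline passes.

    In practice `retry_on` should be narrowed to the driver's connection
    errors so that genuine bugs are not retried.
    """
    deadline = None if timeout is None else clock() + timeout
    while True:
        try:
            return connect()
        except retry_on:
            # Give up (re-raising the last error) if another wait would
            # overshoot the deadline; otherwise sleep and try again.
            if deadline is not None and clock() + interval > deadline:
                raise
            sleep(interval)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting. A slow schema upgrade belongs after this call returns, so it is not subject to the connect timeout.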
fungi"reboot the whole cluster"17:37
fungiso basically the cluster is the new server, and kubernetes is init17:37
*** hashar has quit IRC17:47
openstackgerritTobias Henkel proposed zuul/zuul master: Revert "Revert "Create zuul/web/static on demand""  https://review.opendev.org/66310817:55
tobiashzbr: I think that should finally do it ^17:55
tristanCcorvus: it seems like the ideal behavior would be to wait for a reporter instead of disabling it18:00
openstackgerritJames E. Blair proposed zuul/zuul master: WIP: keep retrying SQL db init  https://review.opendev.org/73982718:01
corvustristanC, tobiash: ^ maybe something like that?  i haven't tested it yet; just sketching it out18:01
openstackgerritSorin Sbarnea (zbr) proposed zuul/zuul master: Enable ANSI rendering via react-ansi  https://review.opendev.org/73944418:02
tobiashcorvus: I think _init() requires a lock, other than that I think it makes sense to be able to configure zuul such that startup still fails when sql is not available (which would be the mode we'd be using)18:05
zbrcorvus: fungi: wdyt about https://review.opendev.org/#/c/739559/ ?18:05
corvustobiash: yeah, i think you're right about the lock since it could be triggered by a web gear rpc call (otherwise it should all be in the scheduler main loop)18:05
tobiashah yeah, didn't think about the main loop :D18:06
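A minimal sketch of the other option discussed: keep retrying initialization lazily and soft-fail reports in the meantime, guarding init with a lock because both the scheduler main loop and an RPC call could trigger it. `LazySQLReporter`, `init_db` and `write` are hypothetical stand-ins, not the code in the WIP change.

```python
import threading
from typing import Callable, List


class LazySQLReporter:
    """Hypothetical reporter that soft-fails while its database is unavailable.

    `init_db` stands in for connection setup plus schema upgrade, `write`
    for recording one build.
    """

    def __init__(self, init_db: Callable[[], None], write: Callable[[dict], None]):
        self._init_db = init_db
        self._write = write
        self._lock = threading.Lock()
        self._ready = False
        self.dropped: List[dict] = []  # records that could not be reported

    def _ensure_ready(self) -> bool:
        # Callers from different threads serialize here, so init (including
        # a schema upgrade) never runs concurrently with itself.
        with self._lock:
            if not self._ready:
                try:
                    self._init_db()
                    self._ready = True
                except Exception:
                    return False  # still down; the next report retries
            return True

    def report(self, record: dict) -> bool:
        # Soft-fail like any other reporter: drop the record instead of raising.
        if not self._ensure_ready():
            self.dropped.append(record)
            return False
        self._write(record)
        return True
```

Once init succeeds the lock is only held for a flag check, so steady-state reporting pays almost nothing for the retry logic.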
corvuszbr: lgtm but i'm going to recheck it since we just landed pf418:07
fungizbr: seems reasonable, though i'll admit i'm not really savvy with react/bootstrap panel containers18:07
zbrindeed, probably not impacted but better to test.18:08
fungii definitely agree with the reasoning in the commit message18:08
fungiand assuming the recheck looks good for build results views i'm tentatively +2 on it18:09
zbrfungi: when it comes to UI, less (noise) is more.18:09
fungiabsolutely18:09
fungii still think in sgml, for what it's worth ;)18:09
zbrfungi: i asked yesterday but did not get a clear answer: i want to propose ditching the "popup on result label" and display all details in the expansion.18:10
zbrso we would no longer have 3 places to render task details18:10
corvuszbr: let's also ask tristanC about the panel -- he might have insight about the most patternfly-typical way to handle things like that18:11
*** nils has quit IRC18:11
zbrsome "typical" ways were changed in pf4 too; the idea is that we should not (ab)use all the cool UI elements it provides.18:12
fungizbr: i'm a fairly utilitarian sort of guy... if it still works and i can find the information i need and it's simpler than what we had before, then it's fine by me. but i would like broader feedback on ui preferences than just mine18:12
corvuszbr: i'm open to an experiment about the popup, but it's going to have to work really well for me to favor that.18:13
zbri do not plan to make major redesigns, only to simplify parts that proved to be confusing18:13
corvuszbr: don't embark on it if you aren't prepared for rejection.18:13
zbri wasn't expecting more than this, only to see if people are open to something like this18:14
zbrobviously the "we need to see it to decide" rule applies18:14
corvusdefinitely open.  this is our most novel ui feature, so we have the most to gain and the most to lose with experiments/changes.  :)18:15
fungii definitely favor simplicity (both for the user and for the maintainer), but i recognize that usability is a multi-dimensional vector and different people favor different dimensions in the field18:16
corvushttps://ci.gerritcodereview.com/t/gerrit/builds18:17
corvusthat's back up now that i have upgraded the cluster (an effective rolling restart)18:17
corvusand of course, since it was a restart, that's the branch tip18:18
corvusso you can see the pf4 stuff in action18:18
tobiashcool :)18:19
tristanCcorvus: zbr: i don't have much insight about pf or ui design... i guess regular users are used to clicking on the [result] box to get the details, but i can see how new users can be confused by the current layout18:19
funginice! i don't really see much of a change, which i suppose is praise coming from me ;)18:19
corvustristanC: sorry, i was asking about the panel change in https://review.opendev.org/73955918:20
tobiashand I see the retry reporting there as well :)18:20
corvusyep18:20
corvustristanC, zbr: seems like zbr is saying we should reserve panels for pages which display multiple items, so each item gets a panel; i was wondering if there's an overall page structure framework we should use in pf, or if a simple h2 is the way to go18:22
tristanCcorvus: zbr: if we are going to change the layout to accommodate new users, we might want to help existing users get used to the new behavior18:22
tristanCcorvus: about page panel, i don't know, a simple h2 seems to work fine18:22
tobiashcorvus: do you mind a re-review of 710034 (the github auth refactor). I had to do a rebase in the meantime and addressed a comment clarkb had.18:23
tristanCi would say that if the page only has one panel header, then that can be replaced by a h218:23
avasswebknjaz: seeing emojis in irc feels.. strange18:23
fungitristanC: that sounds like what 739559 is doing then18:24
fungiavass: i recently upgraded my font set to include the noto (no tofu) family and started getting them in my consoles. i agree it continues to surprise me18:25
corvuszbr, tristanC: this is the last time we discussed the task expansion: http://eavesdrop.openstack.org/irclogs/%23zuul/%23zuul.2020-06-10.log.html#t2020-06-10T15:44:0018:27
avasszbr: lgtm and I agree that it doesn't make sense to split that test into several tests so I'll go ahead and approve that18:27
avasswebknjaz: unless you have something else you want to change18:28
corvusi believe everything said then still stands.  it's worth a re-read before embarking on any changes.18:28
avasszbr: uh, except that the tests aren't passing ;)18:28
zbravass: ignore zuul jobs (best channel to say this....)18:29
corvuszbr: is that the f31 issue?18:30
zbryes18:30
zbri think paul started to work on it18:30
zbrwe did not see it because we do not (usually) run fedora jobs, started last night18:31
avasstristanC: I started looking into the zuul-operator a bit and might push some changes later in the week when I get around to it. But I think I've found some bugs :)18:32
zbrmaybe that is a good opportunity to ask about having a periodic-nodeset-sanity-pipeline, that runs daily and makes use of most important zuul-jobs roles18:33
zbrso we would know when something breaks from outside18:33
zbrsame could be used to validate that a new nodeset image "looks" good to be used18:33
avasstristanC: is there a list of things that needs to be done somewhere?18:34
tristanCavass: nice, that's quite possible, i plan on working on it some more. here are a list of missing things: https://review.opendev.org/#/c/718755/4/README.md18:36
tristanCavass: iirc we talked about providing a one-file install to set up all the services on different providers, and i think we should be able to generate such a file from the existing configuration18:38
tobiashcorvus: I noticed another unrelated failure in test_playbook and it looks like it retries the job 'timeout' instead of marking it as TIMED_OUT. However I have no idea yet why.18:40
tobiashhttps://16017867d971cd1a3c19-21d6bae6f57d664d8ef403dd2ad49654.ssl.cf1.rackcdn.com/738620/1/gate/tox-py35/219edc3/testr_results.html18:40
tristanCavass: also i'd like to adapt this k8s deploy function to work using podman play kube directly, without k8s : https://softwarefactory-project.io/cgit/zuul-images-jobs/tree/functions/deploy.dhall18:40
tobiashmaybe it's not even a test race but a real race18:40
tristanCavass: i think that would be a nice little transformer to convert the StatefulSet/Deployment to a simpler Pod18:40
avasstristanC: ah cool, I've been messing around with it with Kind so far.18:41
tristanCavass: Kind sounds like something interesting to document too18:43
*** vishalmanchanda has quit IRC18:44
tristanCavass: i meant to try podman as a way to run the smallest zuul possible locally18:44
avasstristanC: I think I'll have some reading to do to get my head around dhall first ;)18:51
*** bhavikdbavishi has quit IRC19:07
zbris it common for zuul to fail py35 tests randomly? https://zuul.opendev.org/t/zuul/build/62339692bbcf43a68c3c7d9463cde97f -- looks random to me.19:09
fungirandomness is an illusion, nondeterminism is more likely19:13
* fungi risks veering into philosophy19:13
tobiashcorvus: I think I found out the cause of the test_playbook fail. In this case it was slow and the 'timeout' job already timed out during the pre playbook causing it to retry hitting the test timeout in the end19:15
tobiashcorvus: now the key question: do we want a job to retry if it times out in a pre playbook or not?19:16
tobiashcorvus: the docs are not very clear about that: https://zuul-ci.org/docs/zuul/reference/job_def.html#attr-job.timeout19:17
avasszbr: is it always that test or is it random?19:20
zbryep19:21
openstackgerritTobias Henkel proposed zuul/zuul master: Make test_playbook more stable  https://review.opendev.org/73983519:21
tobiashcorvus: if we want to retain the current behavior, this should make the test more stable ^19:22
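The open question can be framed as a single policy flag. This is a hypothetical mapping for illustration only. It assumes, as tobiash observed, that a timeout during pre-run is currently handled like any pre-run failure (retried) and that a timeout in the main run is reported as TIMED_OUT. It is not Zuul's executor code.

```python
def classify(in_pre_run: bool, succeeded: bool, timed_out: bool,
             retry_pre_timeouts: bool = True) -> str:
    """Hypothetical result mapping for one job attempt.

    retry_pre_timeouts=True models the current behaviour (a pre-run timeout
    is retried like any pre-run failure); False models reporting TIMED_OUT
    directly, which would also stop the test from retrying until it hits
    its own timeout.
    """
    if succeeded:
        return "SUCCESS"
    if in_pre_run:
        if timed_out and not retry_pre_timeouts:
            return "TIMED_OUT"
        return "RETRY"
    if timed_out:
        return "TIMED_OUT"
    return "FAILURE"
```

Either way a plain pre-run failure still retries; the flag only decides whether running out of time counts as a node problem (retry) or a job problem (report).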
tobiashzbr: there are some test cases that might contain races, which accumulate over time until someone takes the time to track each one down (like that one ^ as well)19:25
tobiashzbr: in your case the command socket thread didn't exit cleanly as it seems19:30
zbrbtw, i find it weird to see py35 being the only python tested instead of py36 or py37.19:31
fungimaking sure we don't break "old" python?19:31
*** hashar has joined #zuul19:34
openstackgerritTobias Henkel proposed zuul/zuul master: Join command thread on exit  https://review.opendev.org/73983819:36
tobiashzbr: this might fix the issue you spotted ^19:37
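One common way to make a background thread exit cleanly is to wake it with a sentinel and then join it before shutdown continues. A minimal sketch, assuming a queue-fed command loop. `CommandThread` and its methods are hypothetical; this is not Zuul's command socket and not the code in the proposed change.

```python
import queue
import threading
from typing import Callable, Dict


class CommandThread:
    """Hypothetical stand-in for a daemon's command listener."""

    _STOP = object()  # sentinel that unblocks the blocking get()

    def __init__(self, handlers: Dict[str, Callable[[], None]]):
        self._handlers = handlers
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, name="command")

    def start(self) -> None:
        self._thread.start()

    def send(self, command: str) -> None:
        self._queue.put(command)

    def _run(self) -> None:
        while True:
            command = self._queue.get()  # blocks until a command or the sentinel
            if command is self._STOP:
                return
            handler = self._handlers.get(command)
            if handler is not None:
                handler()

    def stop(self, timeout: float = 10.0) -> None:
        # Wake the thread, then wait for it, so nothing is still running
        # when the caller (for example a test's cleanup) moves on.
        self._queue.put(self._STOP)
        self._thread.join(timeout)
        if self._thread.is_alive():
            raise RuntimeError("command thread did not exit")
```

Because the queue is FIFO, commands sent before `stop()` are still handled before the thread exits.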
tobiashzbr: in order to save resources we test the oldest and newest supported python version and not all in between19:37
zbrpy35 is joining the py27 club in two months, is that what we call newest?19:38
tobiashzbr: why newest? we test py35 and py38 where py35 being our oldest supported version19:39
zbrthere are important changes around threading on py36,py37,... so I would not attempt to skip any version.19:39
fungiwhy not compile and test every point release then? we have to draw the line somewhere19:41
fungipreviously we said testing oldest and newest we support should be sufficient to catch most errors19:42
mordredzbr: those threading changes should exist in 38 too no?19:43
tobiashthe problem is that the zuul tests take quite some time combined with unfortunately some degree of unstable tests19:43
tobiashtesting four python versions is just not feasible currently19:43
zbrjust to clarify it: i am not asking to test all 3x python, i argue that py35 may be a poor choice, and that py36 or py37 would be much more useful19:44
tobiashbut there was already discussion to ditch py35 in a not too distant future so then we'd test py36 and py3819:44
zbrprobably py36+py38 would be the best values, imho19:44
zbrw/o 37 due to resource19:45
tobiashpy35 is not a poor choice as opendev is still running zuul with that version (unless the switch to containers is already complete)19:45
tobiashand testing the lowest supported version ensures that we don't use language features that are not supported in py3519:45
tobiashmordred: did opendev already switch to containers or is that wip?19:46
mordredtobiash: we're mostly on containers19:46
mordredtobiash: we're still working on getting executors on containers - as well as our arm nodepool-builder19:46
mordredbut I think both of those are really close19:47
zbras long as we replace it with py36 before 2020-09-13 we should be fine19:47
mordredtobiash: but yes, that's right: we test py35 to ensure that we don't accidentally use too-new features19:47
mordredand in so doing break the opendev deployment19:47
tobiashzbr: fyi, there is a mailing list discussion about dropping py35: http://lists.zuul-ci.org/pipermail/zuul-discuss/2020-May/001225.html19:49
tobiashtldr as I understood is that as soon as opendev switched to containers we can ditch it19:50
zbrthat message describes the situation very well19:50
hasharhello20:03
hasharI am not talking much in here anymore. But just wanted to highlight Wikimedia has upgraded its Gerrit from 2.15 to 3.2 ~  8 days ago20:03
hasharin short: new modern ui that is way more pleasant than the old GWT based one20:04
hasharthere is support for git protocol v2  which makes fetches dramatically faster20:04
fungihashar: thanks for the update! paladox has been keeping us apprised too20:04
hasharand for the upgrade itself the person that did an update wrote a blog post upstream https://groups.google.com/g/repo-discuss/c/G5wucKJg9Ag/m/pLin-i3mBgAJ?pli=120:04
hasharahh paladox :] so nice20:05
fungiopendev is in progress planning a similar upgrade (though we'll need a pause at 2.16 i think for the notedb transition)20:05
mordredhashar: yes - I will be using info from that blog post to work on ours! :)20:05
mordredyeah. although having read the wikimedia account, maybe just plowing all the way to 3.2 over a weekend is the right choice ...20:06
hasharPaladox had setup a dev platform months ahead and contributed a lot back to Gerrit upstream20:07
fungihashar: also, good to see you around again for a bit--missed you!20:07
tristanCwe are also planning to work on upgrading software factory to gerrit 3.x in the next few months20:07
paladoxi fixed zuul v2 to support at least gerrit 2.1620:07
hasharand Christian had set up a whole replica of production as a staging/test-bed area to do the whole migration without affecting production20:07
paladoxand seems to have worked with 3.2 :)20:07
hasharfungi: so kind thank you :]  I have been busy with a bunch of other duties20:07
fungiaren't we all? ;)20:08
avassmordred: that's what we did when we upgraded from 2.15 to 3.2 ;)20:09
hasharI guess software-wise most culprits have probably been fixed by now, though there might still be some fixes pending review in upstream gerrit20:11
hasharthe key I guess was to be able to play the whole upgrade outside of production with same hardware/os/packages/git repos/database20:11
hasharand do it several times making sure everything went fine20:11
paladoxi'm currently migrating dark theme to a pref (so it'll replicate across all login users)20:11
paladox*logged in20:12
mordredI'm fairly confident it's going to go pretty well for us due to all of the work of everyone else upgrading already :)20:12
tristanCmordred: ++ :)20:13
hashar;D20:14
*** harrymichal has quit IRC20:42
*** tumble has quit IRC20:44
fungione of the (very few) benefits to being slow to upgrade, i guess20:46
corvusi was just tracking down an issue seen on gerrit's nodepool that i mentioned here last week20:56
corvuswhere zuul was throwing a bunch of retry errors because the host key doesn't match20:56
funginew developments then?20:57
corvusthat happens at the setup playbook, so i think it's not matching the host key that nodepool sends it20:57
corvusand i think that's because nodepool is getting the host key very quickly after boot20:57
corvusi'm guessing the image may have a host key burned into it but then it gets re-generated at boot20:57
fungioh, and nodepool is racing the key regen at boot20:58
corvusya20:58
fungiwhat's the distro?20:58
corvusdebian buster20:58
corvusi'll try to catch 2 in a row and see if they start with the same key20:58
fungii thought the usual tactic was to strip the keys from the image so that sshd blocks waiting for key generation20:59
corvusme too20:59
fungii wonder if instead something like cloud-init is replacing keys proactively20:59
corvusonly other theory i have is arp related20:59
fungiwell, yeah, i wouldn't be surprised if there are rogue instances squatting the same ip addresses21:00
fungiit's not like that problem is necessarily unique to openstack nova (or even virtual machines in general)21:00
corvuscool, i spotted a second booting with the same key21:08
corvusAAAAC3NzaC1lZDI1NTE5AAAAIMWnt6KgVST9yHYCOCmSz7YxFG6lB7JuHt9NXfBeKi2I21:08
corvusit then immediately changed21:08
fungisame key but more importantly different ip address?21:09
corvusso i think that's good evidence for the theory that the image has the key baked in and it gets updated on boot21:09
corvusya21:09
corvusi'm not sure how to work around this21:09
fungiagreed then, that implies the image21:09
tobiashcorvus: interesting, but how can that be fixed (assuming it's a non custom image)21:10
fungii mean, ultra hacky solution is that if you know the baked-in key fingerprint, keep retrying until it's not that21:10
fungibut not at all elegant, and fragile if the image changes its baked-in key21:10
tobiashBlacklisting in nodepool21:10
corvushrm, yeah, that could work.  that's probably better than "sleep(5)" which is the best i've come up with so far :)21:11
tobiashI guess fixing the image is not an option?21:11
corvusit's the cloud-standard google-provided image; there might be a way to fix it, but it'd be nice to handle this case anyway (and for any cloud)21:12
tobiashNodepool could auto blacklist by remembering the last x keys21:12
corvus(also, did i say buster? i think i meant stretch)21:13
corvustobiash: that's good too -- everything after the first failure should work :)21:14
tobiashLike store a set of the last 1000 keys encountered and wait until we get a unique one21:14
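A minimal sketch of the auto-blacklist idea, assuming a hypothetical `scan` callable that returns the node's current host key (nodepool's real keyscan code is not shown here): remember the last N keys seen, and keep rescanning while the key is one already encountered. Seeding the set with a known baked-in key (like the one corvus pasted above) covers the "keep retrying until it's not that" variant. `RecentHostKeys` and `wait_for_fresh_key` are illustrative names, not nodepool API.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class RecentHostKeys:
    """Bounded memory of recently seen host keys (hypothetical helper, not nodepool API)."""

    def __init__(self, limit: int = 1000):
        self.limit = limit
        self._keys: "OrderedDict[str, None]" = OrderedDict()

    def __contains__(self, key: str) -> bool:
        return key in self._keys

    def add(self, key: str) -> None:
        # Insert (or refresh) the key, then evict the oldest entries past the limit.
        self._keys[key] = None
        self._keys.move_to_end(key)
        while len(self._keys) > self.limit:
            self._keys.popitem(last=False)


def wait_for_fresh_key(scan: Callable[[], str], seen: RecentHostKeys,
                       attempts: int = 10, delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Rescan until the host key is one we have not seen recently.

    The first node that boots with a baked-in key still gets it accepted
    (and remembered); later nodes presenting that same key will retry
    until the regenerated key shows up. Returns None if attempts run out.
    """
    for _ in range(attempts):
        key = scan()
        if key not in seen:
            seen.add(key)
            return key
        sleep(delay)
    return None
```

This matches the expectation in the log that "everything after the first failure should work"; pre-seeding `seen` with a known image key avoids even that first failure.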
avassI believe ec2 has an 'initializing' phase while running user-data, but I've only seen that through the web interface. does openstack/gce have something similar?21:14
fungiin theory, the folks in charge of building and uploading those images are regulars on the debian-cloud ml... i could certainly ask some questions if desired21:15
corvusavass: good q, i'll look for that21:16
corvusalso, there are 'startup scripts'; i could probably provide one that sets a metadata flag, and assuming it runs after key generation, that would indicate that we had progressed past that21:16
corvusavass: (also, yeah this is gce)21:16
corvus(or if the startup script doesn't run after that, it could background and wait)21:17
avassI'm trying to find out how to get that state through the api though21:17
corvusi only see the "status: RUNNING" field (which is what we're already using to detect the instance is ready)21:20
corvushttps://wiki.debian.org/Cloud/GoogleComputeEngineImage is relevant21:22
corvusi wonder if buster is different21:23
avassI think I found it for ec2 at least, it's under DescribeInstanceStatus: https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeInstanceStatus.html instance-status.reachability and system-status.reachability21:23
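For the EC2 case, a sketch that checks the reachability details in a DescribeInstanceStatus response. It assumes the response dict shape that boto3's `describe_instance_status(InstanceIds=[...])` returns (`InstanceStatuses` with `InstanceStatus`/`SystemStatus`, each holding a `Details` list); the helper name is hypothetical, and whether a "passed" reachability check also means host key generation has finished is not established in this discussion.

```python
from typing import Any, Dict


def reachability_passed(response: Dict[str, Any], instance_id: str) -> bool:
    """True only if both instance and system reachability checks report 'passed'.

    Other possible detail statuses (e.g. 'initializing', 'failed',
    'insufficient-data') are treated as not ready.
    """
    for status in response.get("InstanceStatuses", []):
        if status.get("InstanceId") != instance_id:
            continue
        for check in (status.get("InstanceStatus", {}), status.get("SystemStatus", {})):
            details = {d.get("Name"): d.get("Status") for d in check.get("Details", [])}
            if details.get("reachability") != "passed":
                return False
        return True
    # Instance not present in the response: treat as not ready.
    return False
```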
fungicorvus: official debian cloud image building for buster changed significantly in an attempt to be more consistent across providers21:24
corvushere's what i think i'll do: i'll set up my local test env to do some boot tests where i output the ssh key as fast as possible, and try that with buster vs stretch21:25
corvusif buster is better, then it might make sense to just see if we can move gerrit's zuul to use buster :)21:25
corvusif they both show the problem, i'll look into a startup script to work around it21:26
fungiahh, yeah, transitioned from bootstrap-vz to fai for buster21:26
fungisounds like a good plan21:26
*** armstrongs has joined #zuul21:33
corvusoooh21:36
corvusin my local tests, i'm using the public ip, but gerrit's nodepool/zuul are using the private ip21:37
corvusi wonder if the public ip takes longer to hook up (NAT), so the key generation is done by the time i can connect21:37
corvusthat would explain why i never saw this locally21:37
corvus(and why now that i'm looking closely, it seems like it's taking much longer in my tests)21:38
corvusthat means i'll need to move my testing in-cloud :/21:39
*** armstrongs has quit IRC21:41
corvuswe could also probably just turn off host key checking; it seems pretty low-risk since this is entirely inside the cloud21:54
avasscorvus: looked through the gce docs but couldn't think of a better way to get the correct host-key except to let nodepool manage the host-keys. But that would require nodepool to be able to ssh to the node.21:55
corvus(perhaps in the gce driver, use-internal-ip:True should imply host-key-checking:False)21:55
corvusavass: that is what nodepool does21:55
avasscorvus: I mean generate new host-keys and install them :)21:55
corvusooh :)21:55
corvusyeah, i don't think we need to go that far21:55
corvushey i managed to snag a reproduction in testing21:56
corvusit took about 4 tries21:56
corvusand reprod with buster too, same key as stretch21:59
corvus(so maybe there's only one key to watch out for?)22:00
*** rlandy is now known as rlandy|bbl22:08
*** tobiash has quit IRC22:12
corvusi think for the moment, i'm going to just turn off host key checking in gerrit's nodepool, since this is starting to look like it's hard to hit otherwise.  if that works, then it may not be worth doing any of the other rube-goldberg stuff.  :)22:14
corvushttps://gerrit-review.googlesource.com/c/zuul/ops/+/27465222:23
*** avass has quit IRC22:25
mordredmhu: related to our discussion yesterday about translations - here's a guide to aliasing git commands into more native swedish: https://github.com/bjorne/git-pa-svenska :)22:27
corvusmordred: if you're around, can you look at that gerrit-zuul change ^22:36
corvusit's all commit message :)22:36
mordredcorvus: yes!22:38
mordredcorvus: done22:39
mordredcorvus: (I'd already read the scrollback here)22:39
*** erbarr has quit IRC22:45
corvusmordred: hrm, apparently that was wrong22:46
corvus2020-07-07 22:45:47,586 DEBUG zuul.AnsibleJob.output: [e: 390951245725432ba62ac536b8014109] [build: 10c8322dbefe4cf2bfc67f51c70c9c21] Ansible output: b'    "msg": "Data could not be sent to remote host \\"10.128.15.214\\". Make sure this host can be reached over ssh: Host key verification failed.\\r\\n",'22:46
*** gmann has quit IRC22:47
*** erbarr has joined #zuul22:48
corvusi'm not entirely sure why we have that option if it behaves like that22:48
*** PrinzElvis has quit IRC22:50
*** webknjaz has quit IRC22:50
*** iamweswilson has quit IRC22:50
*** kmalloc has quit IRC22:50
*** mwhahaha has quit IRC22:50
*** mnaser has quit IRC22:50
*** jbryce has quit IRC22:50
*** piotrowskim has quit IRC22:51
*** kklimonda has quit IRC22:51
*** stevthedev has quit IRC22:51
*** johnsom has quit IRC22:51
*** maxamillion has quit IRC22:51
*** dcastellani has quit IRC22:51
*** rpittau has quit IRC22:51
*** evgenyl has quit IRC22:52
*** gundalow has quit IRC22:52
*** ericsysmin has quit IRC22:52
*** erbarr has quit IRC22:52
*** tdasilva has quit IRC22:52
*** samccann has quit IRC22:53
*** ChrisShort has quit IRC22:53
*** Open10K8S has quit IRC22:53
*** lseki has quit IRC22:53
*** zbr has quit IRC22:53
*** guilhermesp has quit IRC22:53
*** donnyd has quit IRC22:53
corvusokay, i'm really confused.  is anyone setting "host-key-checking" to false?22:59
corvusbecause based on what i just saw the gerrit zuul do, i don't see how you could use it and have a working configuration23:00
*** tosky has quit IRC23:01
fungii wonder if behavior for ansible changed since we implemented that23:04
mordredcorvus: yeah - that seems very not ok23:10
*** hamalq_ has quit IRC23:10
mordredcorvus: is there a corresponding action we need to do?23:10
mordredcorvus: the docs say "when set to false nodepool-launcher will not ssh-keyscan nodes"23:11
mordredbut that's nodepool side - we'd need to tell ansible on zuul's side to not do host key validation23:11
corvusmordred: there is no such option for zuul; i think the only thing we could do would be to somehow set ansible inventory variables to do that23:13
mordredcorvus: I agree - I don't see how this could be used - I think we need to pass host-key-checking in the node dict, and then in zuul we need to set host_key_checking = False in the ansible.cfg23:13
corvusyeah23:13
mordredor inventory args: ansible_ssh_extra_args='-o StrictHostKeyChecking=no'23:13
mordredhttps://stackoverflow.com/questions/23074412/how-to-set-host-key-checking-false-in-ansible-inventory-file23:13
mordredcorvus: so clearly nobody is using this23:14
corvusi think that's a reasonable thing to do (we'd probably want it to be inventory, since it's a per-host variable)23:14
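A sketch of the per-host inventory approach being discussed, assuming a hypothetical `host_key_checking` flag carried in the node dict from nodepool (no such field is confirmed here) and a simplified `{"all": {"hosts": ...}}` inventory layout rather than zuul's actual inventory writer. Whether `ansible_ssh_extra_args` adds to or overrides `ssh_args` from ansible.cfg is the open question raised right after; this sketch doesn't settle it.

```python
from typing import Any, Dict, List


def build_inventory(nodes: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build a minimal Ansible inventory dict from node records.

    Nodes whose (hypothetical) 'host_key_checking' flag is False get a
    per-host ansible_ssh_extra_args, so only those hosts skip strict
    host key checking; all other hosts keep the default behavior.
    """
    hosts: Dict[str, Dict[str, Any]] = {}
    for node in nodes:
        host_vars: Dict[str, Any] = {"ansible_host": node["interface_ip"]}
        if not node.get("host_key_checking", True):
            host_vars["ansible_ssh_extra_args"] = "-o StrictHostKeyChecking=no"
        hosts[node["name"]] = host_vars
    return {"all": {"hosts": hosts}}
```

Keeping it as a host variable rather than a global ansible.cfg setting matches the point that this is a per-host property.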
corvusbut yeah, i'd also like to double check that23:14
mordredyeah. agree23:14
corvushrm, a lot of eu folks are not in channel right now23:14
*** hashar has quit IRC23:15
mordredcorvus: we should also double check that ansible_ssh_extra_args is extra and doesn't override ssh_args from ansible.cfg23:15
corvusi was hoping i could ping them now and collect responses tomorrow; i'll just try to ask first thing when i get up23:15
mordredI'd guess it is from naming23:15
mordredcorvus: it definitely seems like if we set host-key-checking off in nodepool that we'd want that to follow the node23:16
* mordred has to afk23:16
*** rlandy|bbl is now known as rlandy23:21
*** Goneri has quit IRC23:25
openstackgerritGuillaume Chauvel proposed zuul/zuul master: WIP: Scheduler: Reconfiguration ref-updated create/delete  https://review.opendev.org/73919823:39
openstackgerritGuillaume Chauvel proposed zuul/zuul master: WIP: Scheduler: Reconfiguration ref-updated oldrev+newrev  https://review.opendev.org/73907823:39
