Monday, 2020-06-22

*** wuchunyang has joined #zuul  [00:59]
*** wuchunyang has quit IRC  [01:05]
*** swest has quit IRC  [01:55]
*** swest has joined #zuul  [02:09]
*** bhavikdbavishi has joined #zuul  [02:56]
*** bhavikdbavishi has quit IRC  [03:04]
*** bhavikdbavishi has joined #zuul  [03:05]
*** bhavikdbavishi1 has joined #zuul  [03:08]
*** bhavikdbavishi has quit IRC  [03:10]
*** bhavikdbavishi1 is now known as bhavikdbavishi  [03:10]
*** sgw has quit IRC  [03:21]
*** wuchunyang has joined #zuul  [04:02]
*** wuchunyang has quit IRC  [04:06]
*** vishalmanchanda has joined #zuul  [04:29]
*** evrardjp has quit IRC  [04:33]
*** evrardjp has joined #zuul  [04:33]
*** bhavikdbavishi has quit IRC  [04:36]
*** bhavikdbavishi has joined #zuul  [04:37]
*** bhavikdbavishi has quit IRC  [04:44]
*** bhavikdbavishi has joined #zuul  [04:44]
*** sgw has joined #zuul  [04:58]
*** bhavikdbavishi has quit IRC  [05:02]
*** sgw has quit IRC  [05:05]
*** bhavikdbavishi has joined #zuul  [05:07]
*** bhavikdbavishi1 has joined #zuul  [05:52]
*** bhavikdbavishi has quit IRC  [05:53]
*** bhavikdbavishi1 is now known as bhavikdbavishi  [05:53]
*** saneax has joined #zuul  [06:10]
*** rpittau|afk is now known as rpittau  [06:21]
*** bhavikdbavishi has quit IRC  [06:40]
*** dennis_effa has joined #zuul  [06:40]
*** hashar has joined #zuul  [07:08]
*** bhavikdbavishi has joined #zuul  [07:27]
*** bhagyashris is now known as bhagyashris|lunc  [07:27]
*** tosky has joined #zuul  [07:29]
*** sshnaidm|off is now known as sshnaidm|ruck  [07:29]
*** jcapitao has joined #zuul  [07:37]
<openstackgerrit> Felix Edel proposed zuul/zuul master: Introduce Patternfly 4  https://review.opendev.org/736225  [07:43]
*** jpena|off is now known as jpena  [07:56]
*** raukadah is now known as chandankumar  [07:58]
*** dpawlik6 has quit IRC  [08:18]
*** odyssey4me has joined #zuul  [08:30]
<odyssey4me> hey folks, is the note in https://zuul-ci.org/docs/zuul-jobs/python-roles.html#role-ensure-python about only supporting debian true? if so, what're the options for centos/rpm?  [08:30]
*** bhagyashris|lunc is now known as bhagyashris  [08:34]
*** dpawlik6 has joined #zuul  [08:35]
<odyssey4me> it would appear to me that it's not true any more  [08:42]
*** nils has joined #zuul  [08:42]
<openstackgerrit> Jesse Pretorius (odyssey4me) proposed zuul/zuul-jobs master: [ensure-python] Remove debian-only note  https://review.opendev.org/737231  [08:44]
*** dennis_effa has quit IRC  [08:50]
<avass> odyssey4me: looks like it's only supported for debian unless you use pyenv  [08:52]
<avass> so the documentation probably needs to be updated  [08:52]
<odyssey4me> avass: yeah, I saw that and abandoned the patch... looks like holser is working on a fix in https://review.opendev.org/#/c/737060/1  [08:53]
<holser> yeah  [08:53]
<holser> starting now  [08:53]
<holser> as I was reading/replying to emails  [08:53]
<holser> coffee and back to the patch  [08:53]
<holser> bbi10  [08:53]
<avass> oh, tell me when it's ready and I'll take a look at it then :)  [08:54]
<holser> avass: sure  [09:07]
*** bolg has joined #zuul  [09:15]
<openstackgerrit> Felix Edel proposed zuul/zuul master: Introduce Patternfly 4  https://review.opendev.org/736225  [09:20]
<openstackgerrit> Felix Edel proposed zuul/zuul master: Introduce Patternfly 4  https://review.opendev.org/736225  [09:27]
<swest> zuul-maint: I'd like to kindly ask you for a review of the circular dependency change https://review.opendev.org/#/c/685354/ as well as tobiash's change queue refactor, which needs a second review from a Zuul maintainer: https://review.opendev.org/#/q/topic:change-queues+(status:open+OR+status:merged)  [09:27]
*** rpittau is now known as rpittau|bbl  [10:15]
*** wuchunyang has joined #zuul  [10:38]
*** wuchunyang has quit IRC  [10:49]
*** wuchunyang has joined #zuul  [10:49]
*** jcapitao is now known as jcapitao_lunch  [10:54]
*** jpena is now known as jpena|lunch  [11:29]
*** rlandy has joined #zuul  [11:49]
*** rlandy is now known as rlandy|ruck  [11:50]
*** bhavikdbavishi has quit IRC  [11:58]
*** rfolco has joined #zuul  [12:05]
*** wuchunyang has quit IRC  [12:11]
*** rpittau|bbl is now known as rpittau  [12:22]
*** jcapitao_lunch is now known as jcapitao  [12:23]
*** jpena|lunch is now known as jpena  [12:32]
<openstackgerrit> Felix Edel proposed zuul/zuul master: Introduce Patternfly 4  https://review.opendev.org/736225  [12:35]
*** felixedel has joined #zuul  [12:36]
*** ysandeep|away is now known as ysandeep|PTO  [12:38]
<felixedel> zuul-maint: The first Patternfly 4 patch is ready for review https://review.opendev.org/#/c/736225/ :)   It adds the patternfly4 react package, updates the header, navbar and navigation drawer with Patternfly 4 components and adapts the global page layout (that's why every page/file is changed). This should allow us to update the other components step by step from PF3 to PF4. The navigation should now also work like before and you shouldn't get lost in any undefined tenant ;-)  [12:42]
<openstackgerrit> Jan Kubovy proposed zuul/zuul master: Scheduler's pause/resume functionality  https://review.opendev.org/709735  [12:55]
<openstackgerrit> Jan Kubovy proposed zuul/zuul master: Separate connection registries in tests  https://review.opendev.org/712958  [12:55]
<openstackgerrit> Jan Kubovy proposed zuul/zuul master: Prepare Zookeeper for scale-out scheduler  https://review.opendev.org/717269  [12:55]
<openstackgerrit> Jan Kubovy proposed zuul/zuul master: Mandatory Zookeeper connection for ZuulWeb in tests  https://review.opendev.org/721254  [12:55]
<openstackgerrit> Jan Kubovy proposed zuul/zuul master: Driver event ingestion  https://review.opendev.org/717299  [12:55]
<openstackgerrit> Guillaume Chauvel proposed zuul/zuul master: Add 'uuid' to 'src_dir' in order to allow parallel jobs for a static node  https://review.opendev.org/735981  [13:05]
<avass> felixedel: cool :)  [13:05]
*** rlandy|ruck is now known as rlandy|ruck|mtg  [13:12]
*** bhavikdbavishi has joined #zuul  [13:13]
*** hashar is now known as hasharAway  [13:21]
<openstackgerrit> Felix Edel proposed zuul/zuul master: Introduce Patternfly 4  https://review.opendev.org/736225  [13:22]
*** rlandy|ruck|mtg is now known as rlandy|ruck  [13:31]
<mordred> felixedel: nice!  [13:36]
<openstackgerrit> Felix Edel proposed zuul/zuul master: Introduce Patternfly 4  https://review.opendev.org/736225  [13:37]
*** bhavikdbavishi has quit IRC  [13:48]
*** rlandy|ruck is now known as rlandy|ruck|mtg  [13:49]
*** sgw has joined #zuul  [13:49]
<openstackgerrit> James E. Blair proposed zuul/zuul-jobs master: Add tests for upload-docker-image  https://review.opendev.org/735402  [13:55]
<openstackgerrit> James E. Blair proposed zuul/zuul-jobs master: Fix and test multiarch docker builds in a release pipeline  https://review.opendev.org/737059  [13:55]
*** hasharAway is now known as hashar  [13:58]
*** rlandy|ruck|mtg is now known as rlandy|ruck  [14:01]
<tristanC> swest: i'll set up a zuul with multiple projects and branches to validate 718531, i should be able to finish the review soon  [14:16]
*** felixedel has quit IRC  [14:21]
<corvus> avass: https://review.opendev.org/735402 is green now with the htpasswd fix and the multiarch fix+test on top of it too: https://review.opendev.org/737059  [14:39]
<corvus> mordred: ^  [14:39]
<corvus> landing those will let us make another attempt at the nodepool 3.x tag  [14:39]
<openstackgerrit> Merged zuul/nodepool master: Improve max-servers handling for GCE  https://review.opendev.org/737146  [14:45]
*** sgw1 has joined #zuul  [14:48]
*** saneax has quit IRC  [14:52]
<fungi> ianw: looks like vos release of the mirror.fedora volume is now around 7 seconds with rsync 3.1.3 on focal!  [14:52]
<fungi> (with rsync -t that is)  [14:53]
*** saneax has joined #zuul  [14:53]
*** saneax has quit IRC  [14:54]
*** saneax has joined #zuul  [14:55]
<fungi> we've still got some sizeable outbound spikes from 01.dfw every 2 hours, but nothing like before: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=2362&rra_id=all  [14:55]
<clarkb> fungi: ww?  [14:56]
<clarkb> (excellent news though :) )  [14:56]
<fungi> ww?  [14:57]
*** sanjayu_ has joined #zuul  [14:57]
<mordred> fungi: \o/  [14:58]
<corvus> tobiash, fungi, mordred: moving this from #opendev -- looking at the zk kazoo tls issues, it seems to die on get_children on /nodepool/requests-lock; that node has 7877 children.  [14:58]
<corvus> we only have 600 requests  [14:59]
<fungi> clarkb: ianw: oops, wrong channel  [14:59]
<fungi> sorry for the noise  [14:59]
<corvus> so i think that in addition to the kazoo issue, we may also have a nodepool bug leaking requests-lock entries?  [14:59]
*** saneax has quit IRC  [15:00]
<fungi> yeah, so an order of magnitude more entries than expected. is it continuing to grow, or did we likely leak some at one point and they're just hanging around?  [15:01]
<fungi> just wondering if they could be cruft from an old leak  [15:02]
<bolg> corvus: we experienced a similar issue with tobiash on one of our test environments with get_children when there were too many nodes (around 8000 in our case 2w ago)  [15:03]
<fungi> granted, if kazoo is dying at only an order of magnitude difference from our production case, that's still concerning  [15:03]
<corvus> i'll check the numbers to see when they leaked  [15:04]
*** hashar is now known as hasharAway  [15:18]
<openstackgerrit> Matthieu Huin proposed zuul/zuul master: [WIP] Web UI: add i18n support, french translations  https://review.opendev.org/737290  [15:23]
*** hamalq has joined #zuul  [15:27]
*** bhavikdbavishi has joined #zuul  [15:32]
<corvus> the oldest request lock is 6744 requests before the latest one, which doesn't seem very old  [15:33]
<corvus> we delete them after 8 hours, so i think this is expected behavior  [15:36]
<corvus> i don't think we need changes to nodepool; i'll proceed with diagnosing the kazoo issue  [15:36]
<fungi> the locks don't get deleted once the request is handled?  [15:39]
<corvus> fungi: not the directory that holds the lock  [15:41]
<corvus> we might be able to make it smarter and delete the lock dir after deleting the request  [15:42]
<fungi> oh, i get it. yep  [15:44]
<fungi> so it's because we're not immediately cleaning up empty trees  [15:44]
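
(For context, a minimal sketch of the cleanup corvus describes above: deleting the kazoo lock directory once its node request is gone, while tolerating lock dirs that are still held. The paths and helper name here are assumptions for illustration, not actual nodepool code.)

    from kazoo.client import KazooClient
    from kazoo.exceptions import NoNodeError, NotEmptyError

    def delete_request_and_lock(zk: KazooClient, request_id: str) -> None:
        # Remove the request znode itself (and any children it may have).
        zk.delete(f"/nodepool/requests/{request_id}", recursive=True)
        try:
            # A non-recursive delete refuses to remove a znode that still
            # has children, so this only succeeds once no contender holds
            # or is waiting on the lock.
            zk.delete(f"/nodepool/requests-lock/{request_id}")
        except NotEmptyError:
            pass  # lock still in use; leave it for a later cleanup pass
        except NoNodeError:
            pass  # already gone
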
<corvus> clarkb: i wonder if https://github.com/python-zk/kazoo/issues/587 really is the same issue -- that report seems to have the error on start rather than during a typical read  [15:46]
<clarkb> corvus: I associated them because the "operation did not complete" errors line up, other than the ssl.c line number. I figured that could be due to different openssl versions  [15:47]
<clarkb> corvus: but ya, entirely possible there is a separate similar issue going on  [15:47]
<corvus> yeah... maybe the nicest thing to do is open another issue and link back to that one  [15:49]
*** hasharAway is now known as hashar  [15:53]
<openstackgerrit> Merged zuul/zuul-jobs master: Add tests for upload-docker-image  https://review.opendev.org/735402  [15:55]
<openstackgerrit> Merged zuul/zuul-jobs master: Fix and test multiarch docker builds in a release pipeline  https://review.opendev.org/737059  [15:55]
*** rpittau is now known as rpittau|afk  [16:01]
<corvus> mordred: you will appreciate that the first step i have taken in tracking down the problem further in kazoo is to disable the exception relocation code which is masking the real exception  [16:09]
<corvus> and i think i have a fix  [16:13]
*** sshnaidm|ruck is now known as sshnaidm|afk  [16:13]
<corvus> it's the old ssl_want_read issue  [16:13]
*** Goneri has joined #zuul  [16:14]
<mordred> corvus: I do appreciate that  [16:14]
<corvus> i'm preparing a pr now  [16:16]
<avass> corvus: cool, it doesn't test buildx yet though :)  [16:18]
<avass> but we could do that in another change  [16:18]
<avass> is nodepool's version synced to zuul's? like will it be 3.19.0?  [16:19]
<corvus> avass: nope; though i expect us to resync at 4.x  [16:19]
<corvus> avass: are you sure it doesn't test buildx?  [16:19]
<corvus> avass: let's back up.  what do you mean by "it"? :)  [16:19]
<corvus> avass: i agree that 735402 does not test buildx; but i think 737059 should.  [16:20]
<sshnaidm|afk> folks, please review the ansible collections roles for zuul when you have time: https://review.opendev.org/#/c/730360/  [16:21]
<avass> the buildx part of build/upload docker image doesn't use the docker_registry variables  [16:22]
<avass> unless I missed something  [16:23]
<corvus> avass: i don't think it needs to; i think the upload part should be the same (the buildx path pulls the image from the buildx builder back onto the node's docker cache so that the push can run normally)  [16:24]
<avass> the push is different depending on whether it's buildx or not though  [16:25]
<avass> it's either 'docker push ...' if it's normal docker, or 'docker buildx ... --push', and upload-docker-image/tasks/buildx.yaml doesn't use the buildset_registry variable yet  [16:27]
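
(Roughly the two push paths avass is contrasting, using the standard docker CLI flags; the image name and platform list are placeholders.)

    # Plain docker: the image already sits in the local daemon's cache.
    docker push registry.example.com/myimage:latest

    # Buildx: the builder pushes directly as part of the build.
    docker buildx build --platform linux/amd64,linux/arm64 \
        --tag registry.example.com/myimage:latest --push .
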
<corvus> oh, hrm, i thought setting the multiarch flag in the job would run that path, but it didn't.  [16:27]
<avass> let me push a quick change to show what I mean :)  [16:28]
<corvus> https://zuul.opendev.org/t/zuul/build/a55b55260589421e99bc5386101ff93a/console does look like it ran the 'docker build' path, not the 'buildx' path  [16:28]
<corvus> avass: i understand  [16:28]
<corvus> 737059 should have failed.  its job is not testing what it says it tests.  the thing we should figure out is why the multiarch flag didn't cause buildx to be used.  i wonder if the other multiarch jobs work?  [16:30]
<avass> good, it's probably not a lot of work to make sure it's tested though  [16:30]
<avass> upload-docker-image only checks if 'docker_images.arch' is defined  [16:31]
<mordred> corvus: it doesn't make sense to me that it doesn't run the arch code  [16:31]
<mordred> avass: yeah - but it should be, due to that ternary  [16:31]
<mordred> we don't need to do multiarch | bool | ternary do we?  [16:33]
<avass> ah, multiarch is only used in testing  [16:34]
<avass> and that toggles between using images that set the arch attribute and images that don't  [16:35]
*** jcapitao has quit IRC  [16:35]
<mordred> yeah  [16:36]
<mordred> at least in theory. it seems it might not actually be doing that  [16:36]
<avass> well, it's not used for the release-test roles  [16:36]
<mordred> oh! duh  [16:37]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [16:39]
<avass> something like that is needed, and that should break.  [16:39]
<mordred> avass: yes - I agree, that should do the trick  [16:41]
*** nils has quit IRC  [16:44]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [16:45]
<avass> corvus: and I believe that is what you want to do ^ ?  [16:46]
<avass> actually no  [16:46]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [16:46]
*** hashar is now known as hasharAway  [16:47]
<avass> I think that should mirror what the normal docker push does  [16:48]
<corvus> mordred, clarkb, fungi, tobiash, bolg: https://github.com/python-zk/kazoo/issues/618 and https://github.com/python-zk/kazoo/pull/619  [16:58]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [17:00]
<fungi> corvus: oh neat, so it's not handling server side buffering i guess?  [17:02]
<tobiash> awesome, that was quick  [17:02]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [17:03]
<corvus> fungi: well, it's openssl's weird internal state machine thing where, in order to proceed with reading, sometimes reading or writing needs to happen.  [17:04]
<fungi> ahh, so it's not as simple as polling a read, getting back zero bytes, and then trying again when there's more data in the buffer  [17:08]
<corvus> fungi: well, it is, except that it can be polling a read, getting back zero bytes, then trying again when the socket is writable.  [17:08]
<fungi> ahh, and judging from your patch it's sufficient to just ignore those conditions since it will keep polling anyway?  [17:13]
<corvus> fungi: yeah.  this is the approach we took in gear  [17:18]
<corvus> seems to have worked out okay :)  [17:18]
<fungi> makes sense, thanks  [17:18]
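
(A minimal sketch of the general pattern corvus describes -- the usual non-blocking TLS read idiom, not the actual kazoo patch: OpenSSL may demand that the socket become readable or writable before a read can make progress, and both conditions are safe to wait out and retry.)

    import select
    import ssl

    def read_some(sock: ssl.SSLSocket) -> bytes:
        """Read from a non-blocking TLS socket, tolerating OpenSSL's
        want-read/want-write states by polling and retrying."""
        while True:
            try:
                return sock.recv(4096)
            except ssl.SSLWantReadError:
                # No decrypted data yet; wait until the socket is readable.
                select.select([sock], [], [])
            except ssl.SSLWantWriteError:
                # E.g. a renegotiation must write before the read can
                # proceed; wait until the socket is writable.
                select.select([], [sock], [])
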
*** jpena is now known as jpena|off  [17:23]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [17:25]
<mordred> corvus: so - now we just have to wait for that to land and be released  [17:26]
<tobiash> corvus: btw I just saw that kazoo dropped testing of py35: https://github.com/python-zk/kazoo/releases/tag/2.7.0  [17:26]
<corvus> tobiash: i guess we're providing the 3.5 testing, until we drop it :)  [17:32]
<tobiash> seems so :)  [17:38]
<avass> mordred: might need some help with buildx: https://zuul.opendev.org/t/zuul/build/f74165333e9443c2b3ce7b9adbed4470/log/job-output.txt#603 :)  [17:39]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [17:39]
<mordred> avass: https://zuul.opendev.org/t/zuul/build/f74165333e9443c2b3ce7b9adbed4470/log/job-output.txt#580  [17:41]
<avass> ah yep, just saw that  [17:41]
<mordred> avass: we normally set up one of those for the buildset registry  [17:41]
<mordred> so maybe we're missing an equivalent step in test land  [17:41]
<mordred> avass: it's "neat" that that error doesn't cause the task to fail  [17:42]
<avass> yep, I tried to just re-use the registry used for testing upload-docker-image and hoped for the best  [17:42]
<avass> do I need to create that somehow?  [17:43]
<avass> because I think it should be close to working otherwise  [17:44]
<mordred> avass: the use-buildset-registry role does it  [17:45]
<mordred> but - a buildset registry might not make a ton of sense for this codepath?  [17:46]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [17:46]
<avass> we need one for testing buildx  [17:46]
<avass> something like the last patchset there ^ ?  [17:46]
<mordred> avass: yeah - that should work  [17:48]
<avass> oh... does buildx build everything in nested containers?  [17:54]
*** rlandy|ruck is now known as rlandy|ruck|mtg  [17:59]
<mordred> yup  [18:12]
<mordred> avass: welcome to the super magical magic magic docker docker magic docker  [18:13]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [18:14]
<avass> yeah...  [18:14]
<avass> mordred: I guess that's why we were using 'ansible_host' instead of just localhost  [18:14]
<mordred> avass: yeah  [18:15]
<SpamapS> mordred: are you sure you captured all of the magic?  [18:17]
<mordred> SpamapS: docker docker what?  [18:17]
<mordred> SpamapS: docker magic docker docker  [18:17]
<SpamapS> docker ok that docker makes docker sense.  [18:18]
<SpamapS> docker on  [18:18]
<fungi> docker on garth  [18:20]
<SpamapS> hmm, how do we stop the game then?  "KERNEL PANIC!"  [18:22]
<fungi> fished you in  [18:23]
*** hasharAway has quit IRC  [18:23]
*** bhavikdbavishi has quit IRC  [18:25]
*** rlandy|ruck|mtg is now known as rlandy|ruck  [18:26]
<openstackgerrit> Guillaume Chauvel proposed zuul/zuul-jobs master: prepare-workspace: Add Role Variable in README.rst  https://review.opendev.org/737352  [18:36]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [18:40]
*** y2kenny has joined #zuul  [18:47]
<y2kenny> I understand that a nodeset can be defined to have multiple nodes with different labels/node types.  Is it possible to request a nodeset that consists of labels from multiple providers?  For example, can I define a nodeset with a node from a static provider and a node from an OpenStack provider?  [18:48]
<avass> y2kenny: nope  [18:49]
<avass> y2kenny: not at the moment at least  [18:49]
<y2kenny> avass: ok so that's a known limitation.  Is there something very fundamental that prevents this feature from being implemented?  [18:51]
<avass> y2kenny: not that I know of, I believe the reason is to ensure that the nodes are in the same network  [18:52]
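
(For reference, a sketch of a multi-label nodeset as Zuul configures it today: labels can be mixed, but there is no per-node provider selector, so every label is satisfied by whichever provider wins the request. The label names below are made up.)

    - nodeset:
        name: mixed-labels
        nodes:
          - name: controller
            label: ubuntu-bionic      # e.g. from a cloud provider
          - name: bmc-host
            label: static-ipmi-host   # hypothetical static-provider label
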
<mnaser> crazy idea of the day: would it be sane to introduce pre-merge jobs for zuul? like yes, the whole concept of zuul is pre-merge.. but the concept for this case is: i have helm charts and dockerfiles in a repo, when my change merges, the repo is updated with the newest helm charts, my cd platform notices the repo changing and starts kicking off a new deploy -- but promote hasn't started/finished running yet.. so it uses stale images  [18:52]
<mnaser> could we do this in a way of like... revamping our upload job to promote right after, if the entire buildset passes (but i guess that means the upload job would depend on * jobs)  [18:53]
<avass> mnaser: I guess you could use a dependent job for that  [18:54]
<avass> but I guess that would have to depend on everything else in the buildset  [18:55]
<avass> and if there's anything ahead of that buildset in the queue that could be a problem  [18:56]
<corvus> mnaser: instead of having your cd system (argo?) watch the repo, can you kick it with a promote job?  [18:57]
<mnaser> corvus: that would be nice, but that would mean encoding info about every single environment we deploy to into these jobs -- something we're trying to avoid because we don't always have direct access to their locations over the internets  [18:59]
<corvus> mnaser: sorry i mean just kick argo  [18:59]
<corvus> like, tell argo not to watch the repo, but instead use the argo cli to run a convergence (whatever argo calls that, i forget)  [18:59]
<mnaser> corvus: right, argo isn't publicly accessible (even from our internal zuul that we run) for some stuff, which makes it hard to do that  [18:59]
<mnaser> so i can curl http://argo/go-update or so  [19:00]
<mnaser> (I mean, if i have to encode some logic to do some pre-deploy checks, that's on me too, but just wondered if it could be a use case)  [19:00]
<mnaser> but i guess that _might_ require a custom set of pipelines (gate (all gate checks), pre-merge (post gate pass, promote images), promote (post merge))  [19:01]
*** hashar has joined #zuul  [19:02]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [19:04]
<corvus> mnaser: a key but subtle behavior that zuul is built around is that the change isn't merged until it's merged.  zuul can't guarantee that gerrit (or github) will actually succeed in merging the change.  it tries to get as close as possible, but sometimes it fails.  i think a tighter integration would be possible if we enabled zuul-push (so zuul performs the merge instead of gerrit/github/etc).  [19:05]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [19:13]
<openstackgerrit> Guillaume Chauvel proposed zuul/zuul-jobs master: prepare-workspace: Add Role Variable in README.rst  https://review.opendev.org/737352  [19:18]
<y2kenny> avass: where do you folks usually keep track of feature requests?  Being able to mix providers in a nodeset could be useful.  [19:19]
<avass> y2kenny: I would guess https://storyboard.openstack.org/#!/project/679  [19:21]
<y2kenny> avass: great, thanks.  [19:21]
<fungi> also accessible these days as https://storyboard.openstack.org/#!/project/zuul/zuul  [19:22]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [19:25]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [19:44]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [19:56]
*** sanjayu_ has quit IRC  [19:56]
<y2kenny> corvus: last friday I asked about log streaming not being available.  I tried doing echo SHA | nc localhost 7900 and I see the log, but there's still no stream from the web ui  [20:04]
<y2kenny> I also tried doing the same thing from the web-ui server and I get the log  [20:05]
<y2kenny> is there anything else I can try to debug this?  [20:05]
<y2kenny> (right now I am thinking about restarting the browser but I doubt that's the issue since I didn't get a log stream from a different browser either.)  [20:05]
<corvus> y2kenny: you might open the devtools console in your browser and see if there are any relevant errors.  is it possible zuul-web is behind something (a reverse proxy?) that interferes with websockets?  [20:08]
<y2kenny> corvus: ok I will give that a shot  [20:08]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [20:14]
<y2kenny> corvus: interesting... I am getting a 500 for the console-stream websocket  [20:16]
*** vishalmanchanda has quit IRC  [20:16]
<y2kenny> let me see the log for the web-ui server....  [20:16]
<y2kenny> a bunch of errors: ws4py.exc.HandshakeError: Illegal value for header Connection: keep-alive  [20:18]
<avass> y2kenny, corvus: I have a vague memory of this also happening if you're running the executor in a container on a separate host or docker network, since the executor reports its own hostname for the web to connect to  [20:19]
<avass> since that would report the container hash  [20:19]
<y2kenny> avass: does that happen all the time or only occasionally?  Because the stream used to work and I haven't really changed how I deploy Zuul  [20:20]
<avass> ah, no, in that case the logs would never work  [20:21]
<mordred> y2kenny: my hunch is that sounds like regular http is getting passed to the websocket - like it's not going through the ws upgrade... maybe something changed in whatever you're using to do proxying?  [20:22]
<mordred> if you look at https://ws4py.readthedocs.io/en/latest/_modules/ws4py/client/  [20:22]
<mordred> in "def handshake_headers"  [20:22]
<mordred> there's a list of appropriate headers ... and Connection is, I think, supposed to be "upgrade"  [20:22]
<mordred> but that's a COMPLETE guess  [20:23]
<avass> mordred: yeah  [20:24]
<avass> mordred: any idea why buildx is getting connection refused when trying to push images?  [20:25]
<avass> y2kenny: if it's behind a reverse proxy, I set up my nginx config like this: https://github.com/dhackz/infra/blob/master/nginx/conf/zuul.conf#L12  [20:26]
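
(The usual nginx stanza for this -- a sketch of the common pattern, not a quote of avass's linked config; the upstream name and location path are placeholders. It forwards the websocket upgrade so ws4py sees "Connection: upgrade" rather than "keep-alive".)

    location /api/ {
        proxy_pass http://zuul-web:9000;
        # Without these, nginx speaks HTTP/1.0 to the backend and forwards
        # "Connection: keep-alive", which ws4py rejects at handshake time.
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
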
<y2kenny> mordred, avass: I don't think I am using any reverse proxy.  I deployed zuul on top of k8s and am using metallb.  [20:27]
<mordred> ah. hrm.  [20:27]
<y2kenny> could it be a bug in metallb?  I haven't restarted that  [20:27]
<mordred> maybe? maybe it's doing something (or not doing something)  [20:28]
<y2kenny> and the web-ui is still working...  just not the console streaming bit  [20:28]
<corvus> is there an ingress controller involved?  [20:28]
<y2kenny> no, just using metallb  [20:28]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [20:41]
*** rfolco has quit IRC  [20:59]
*** hashar has quit IRC  [21:27]
<y2kenny> What's the best way to kill a stuck job?  (I have asked before but I forgot the answer.)  [21:39]
<clarkb> y2kenny: I'm not sure this is the best way but you can kill the ansible process associated with that job  [21:42]
<clarkb> an easier sledgehammer approach is to restart the executor running the ansible process  [21:42]
<y2kenny> clarkb: I think the method may be dequeue  [21:42]
<y2kenny> that should work right?  [21:42]
<clarkb> ah yup that would do it too, but that does the whole change  [21:42]
<clarkb> so it is a smaller hammer  [21:42]
<y2kenny> :)  [21:43]
*** y2kenny has quit IRC  [21:47]
<fungi> yes, you can use the zuul dequeue rpc cli subcommand, but as clarkb notes it will abort/cancel all builds for the entire buildset rather than just failing the one build  [21:47]
<fungi> also i don't know whether that will actually cause ansible processes to terminate, depending on how the build is "stuck"  [21:48]
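
(For reference, the dequeue invocation looks roughly like this -- run wherever the zuul client and zuul.conf are available; the tenant, pipeline, project, and change values are placeholders.)

    # Dequeues the whole buildset for change 12345, patchset 1.
    zuul dequeue --tenant my-tenant --pipeline check \
        --project example/my-project --change 12345,1
    # For ref-updated pipelines (e.g. post/periodic), use --ref
    # instead of --change.
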
*** y2kenny has joined #zuul  [21:49]
<y2kenny> If I write a new nodepool driver, are there any corresponding changes needed on the scheduler side?  [21:57]
<y2kenny> or is everything about the node's life cycle handled within the driver?  [21:58]
<clarkb> y2kenny: I am pretty sure it should all be in the driver  [22:00]
<y2kenny> I've got a prototype driver going and it launches some nodes, so they appear in the web ui  [22:01]
<y2kenny> when I define a nodeset to use the label, the request seems to be routed to the driver properly  [22:02]
<y2kenny> but then the job got stuck  [22:02]
<y2kenny> and I am trying to figure out what might be missing  [22:02]
<y2kenny> From the nodepool side I see "Fulfilled node request" from NodeRequestHandler  [22:04]
<clarkb> zuulians: https://review.opendev.org/#/c/737027/1 is an easy docs update  [22:05]
<clarkb> y2kenny: on the zuul side you should be able to trace the request too, to see if it is unhappy with what it got  [22:05]
<clarkb> maybe a missing field or similar in the zookeeper blob  [22:06]
<y2kenny> clarkb: by zuul side do you mean the scheduler?  [22:06]
<y2kenny> clarkb: or another bit?  [22:06]
<fungi> scheduler  [22:07]
<clarkb> y2kenny: the zuul scheduler and executors  [22:07]
<clarkb> the scheduler coordinates the node requests to nodepool, then it hands that data over to the executor to use in the job  [22:07]
<clarkb> so it could be in either spot  [22:07]
<fungi> at least start with the scheduler, but then yeah, if it seems to be engaging an executor, check whether the executor logs say it's having trouble connecting or something  [22:07]
<y2kenny> clarkb: so when the scheduler says "Completed node request" that's the point of handing over to the executor, right?  (I also see ExecutorClient Execute job... after that.)  [22:11]
<y2kenny> And the executor receiving it is "ExecutorServer"?  [22:13]
<clarkb> y2kenny: yes to the first thing; once the node request is completed I would expect the next thing that is done with it is that the info is passed to the executor  [22:16]
<clarkb> y2kenny: for your second question, what is the context? docker container name? log prefix?  [22:17]
<y2kenny> the context is the executor log  [22:17]
<clarkb> y2kenny: you should be able to grep for the event id of your triggering event across all services and have it show up consistently in the logs, I think  [22:17]
<clarkb> the other thing I'll do is grep for the build sha in the executor logs  [22:17]
<y2kenny> clarkb: ok.  I am seeing multiple node requests for a single job for some reason  [22:18]
<y2kenny> that seems odd  [22:18]
<y2kenny> clarkb: the build sha is not the same as the event id, right?  [22:20]
<clarkb> y2kenny: correct, they are separate. The build sha is associated with the job build side and the event id is from the trigger event input  [22:20]
<fungi> y2kenny: correct, the event id appears in the log as "e: ..." where the ... is some uuid  [22:20]
<fungi> in [] brackets i believe  [22:21]
<fungi> yeah, it'll look like [e: 12c542cdd9d444c5a8d2e256c6691bea]  [22:21]
<fungi> taken from one of our scheduler logs just now  [22:21]
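
(Putting clarkb's and fungi's suggestions together, tracing one event across services looks something like this; the log paths are typical defaults and the id is fungi's example, so adjust both to your deployment.)

    # The same event id shows up in each service that handled the event.
    grep 'e: 12c542cdd9d444c5a8d2e256c6691bea' /var/log/zuul/debug.log
    grep 'e: 12c542cdd9d444c5a8d2e256c6691bea' /var/log/zuul/executor-debug.log
    grep '12c542cdd9d444c5a8d2e256c6691bea' /var/log/nodepool/launcher-debug.log
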
<y2kenny> fungi: does the event id get passed to the executor side or only the build id?  [22:24]
<clarkb> I thought both did, but I could be wrong about that  [22:24]
<fungi> the event id in the executor logs should correspond, checking  [22:24]
*** armstrongs has joined #zuul  [22:26]
<fungi> y2kenny: confirmed, i can find that exact event i pasted above in one of our executor's debug logs too  [22:26]
<y2kenny> I can associate the event id in both the scheduler and nodepool, but I'm having trouble finding the corresponding event id in the executor...  [22:26]
<y2kenny> would that be an indication of what is stuck?  [22:27]
<fungi> if no executor ever picked up the job request from the gearman queue, then it wouldn't be logged on any executor  [22:27]
<clarkb> ya, that could mean the scheduler is having trouble after the node request is completed but before it gets executed by the executor  [22:28]
<y2kenny> what does the scheduler do with the node request after it is completed?  [22:28]
<y2kenny> does it try to talk to the node in any way?  [22:28]
*** armstrongs has quit IRC  [22:35]
<y2kenny> oh, and on an unrelated note... does the opendev.org search function sometimes not work?  [22:35]
<fungi> the scheduler never tries to contact the node, it adds the node information to a gearman request which waits in geard's queue for an executor to claim it  [22:42]
<clarkb> y2kenny: ya, the search problem is a known issue with gitea, they have plans for fixing it. For that reason we've kept http://codesearch.openstack.org running  [22:44]
<clarkb> we hope that gitea will eventually take over those duties though :/  [22:44]
<y2kenny> fungi: so is there something in gearman I can examine about the node request?  [22:44]
<y2kenny> clarkb: that's good to know.  Thanks.  [22:45]
<clarkb> guillaumec: left some thoughts on https://review.opendev.org/#/c/732066/22  [22:45]
<fungi> y2kenny: https://zuul-ci.org/docs/zuul/howtos/troubleshooting.html#gearman-jobs  [22:45]
<y2kenny> fungi: ok.  I was using that earlier but I thought those were just stats and counts... I will dig into it deeper  [22:48]
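
(The approach that guide describes boils down to querying geard's admin port directly; roughly the following, assuming geard listens on the default port 4730.)

    # Ask geard which functions are queued/running; each output line is
    # function name, total jobs, running jobs, and available workers.
    echo status | nc 127.0.0.1 4730
    # List connected workers and the functions they have registered.
    echo workers | nc 127.0.0.1 4730
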
<corvus> y2kenny: the scheduler should tell you in the debug logs exactly what it's doing  [22:48]
<clarkb> tobiash: https://review.opendev.org/#/c/730624/ would be a good one to update when you get a chance, since I think that does address a class of problem with the window sizing  [22:49]
<y2kenny> corvus: from the scheduler log, the last useful thing I see is "INFO zuul.ExecutorClient: [e: c19e087de6f64c4790d16b73a2008946] Execute job x-ipmi (uuid: 25abe404e9134fc7a4529a37f905a90d) on nodes <NodeSet [ for change  with dependent changes [{'"  [22:51]
<y2kenny> so reading this again... is the unmatched open bracket of "<NodeSet [" supposed to happen?  [22:51]
<clarkb> y2kenny: Execute job build-openstack-sphinx-docs (uuid: ce224c2d3c35447dbd25d2e05115ac97) on nodes <NodeSet ubuntu-xenial [<Node 0017300685 ('ubuntu-xenial',):ubuntu-xenial>]> for change <- that is what the openstack driver produces  [22:53]
<clarkb> it looks like maybe your node request was fulfilled incompletely?  [22:53]
<y2kenny> clarkb: ok... so it looks like there is something wrong with the nodeset being returned  [22:53]
<y2kenny> before that line I got "INFO zuul.nodepool: Setting nodeset <NodeSet [ in use"  [22:54]
<clarkb> y2kenny: maybe double check what ends up in the zookeeper db and ensure that looks correct to you as far as the info goes  [22:56]
<clarkb> and if it does, let me know what you're looking at and we can probably grab something similar from the openstack driver to compare  [22:56]
<y2kenny> clarkb: what's the best way to drill into the zookeeper db? (I assume that's different from gearman?)  [22:57]
*** tosky has quit IRC  [22:59]
<clarkb> ya, there is a tool that helps, let me find it  [22:59]
<corvus> zk-shell  [22:59]
<clarkb> y2kenny: https://pypi.org/project/zk-shell/ that tool  [22:59]
<clarkb> it allows you to navigate the db "filesystem" and dump node contents  [22:59]
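
(A minimal zk-shell session for this kind of digging might look like the following; the ZooKeeper endpoint, request id, and payload fields are placeholders/assumptions. A healthy fulfilled request should at least carry a "fulfilled" state and a non-empty nodes list.)

    zk-shell localhost:2181
    (CONNECTED) /> tree /nodepool/requests
    (CONNECTED) /> get /nodepool/requests/100-0000000123
    {"state": "fulfilled", "requestor": "zuul.example.org",
     "node_types": ["x-ipmi-label"], "nodes": ["0000000781"],
     "declined_by": []}
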
<y2kenny> clarkb: ok I will take a look.  fwiw, "nodepool list" doesn't seem to show an anomaly.  [23:00]
<y2kenny> | 0000000781 | 10.6.198.1     | zuul-nodes-exp | ng0006              | None        | None | in-use | 00:00:49:56 | locked   |  [23:01]
<corvus> y2kenny: there's a verbose option to nodepool list, but it still may not show everything  [23:01]
<openstackgerrit> Merged zuul/zuul master: gitlab - add driver documentation  https://review.opendev.org/733880  [23:02]
<y2kenny> corvus: oh... that dumped a lot of info... (I used it with --debug)  [23:02]
<fungi> y2kenny: there should be a nodeset name before that stray "[", so i wonder if you have a yaml parsing issue (reading a string where you intended a list, maybe?)  [23:04]
<y2kenny> fungi: would that be part of the job definition?  [23:05]
<fungi> also the change id seems to be empty, and the dependent changes list looks like the opening of a list of dictionaries  [23:05]
*** rlandy|ruck is now known as rlandy|ruck|bbl  [23:05]
<fungi> y2kenny: yeah, it would probably be the nodeset parameter of that job definition, or of a parent job  [23:06]
<y2kenny> fungi: the change id I just truncated from what I copied.  sorry about that one.  [23:06]
<fungi> ahh, okay  [23:06]
<fungi> it was hard to tell  [23:06]
<y2kenny> clarkb: so I have zk-shell hooked up and I used tree to look at things  [23:10]
<y2kenny> is there a particular branch I should go into?  [23:11]
<clarkb> y2kenny: it will be something like /nodepool/node-requests/something/something  [23:11]
<clarkb> I always have to explore a bit when I do it  [23:11]
<clarkb> tobiash: left +2 on https://review.opendev.org/#/c/710034/ but didn't approve, as I think my comments are worth considering before we land that  [23:12]
<clarkb> tobiash: but I think it's mostly cosmetic things that I called out  [23:12]
<clarkb> but since it is a mostly cosmetic change anyway I figured those were worth a look  [23:12]
<y2kenny> clarkb: I see /nodepool/nodes/<Id>/lock/<some other stuff...>  [23:13]
<clarkb> hrm, if you don't have active requests they may not be there?  [23:13]
<clarkb> it's the node request completing which maps onto the nodes path above  [23:14]
<clarkb> I think you want to check your node requests as they are completed  [23:14]
<clarkb> it may be easier to log that in the services, due to the cleanup race?  [23:14]
<y2kenny> but doesn't this mean the node request was completed successfully?  [23:15]
<clarkb> y2kenny: I think technically it was; what I think may be happening is that the contents of that successful transition aren't complete enough for zuul to figure out which nodes to use  [23:16]
<clarkb> it's successful according to the protocol, but probably not when actually needing to use those nodes  [23:16]
*** rfolco has joined #zuul  [23:18]
<y2kenny> clarkb: ok... I am not sure how to capture that.  And the driver doesn't control that transition, does it?  (like... I don't recall implementing something that flips the state from "ready" to "in-use")  [23:19]
<clarkb> y2kenny: the driver signals that it is done, then the scheduler takes over from there. I think what you want to do is log the contents of the node request when the driver is marking the request as completed  [23:19]
<y2kenny> I assume everything the scheduler/executor needs is in zk.Node  [23:19]
<clarkb> y2kenny: the scheduler looks at the node request to figure out which zk Nodes to use  [23:20]
<clarkb> the node request completes with a list of nodes in it that fulfill the request  [23:20]
<clarkb> and it almost seems like maybe that is the bit that is missing  [23:20]
<y2kenny> clarkb: OOhhh kay... this is the launchesComplete of NodeRequestHandler?  [23:21]
<clarkb> y2kenny: it's actually https://opendev.org/zuul/nodepool/src/branch/master/nodepool/driver/__init__.py#L707-L719 which is generic to all drivers, so maybe that isn't the issue  [23:26]
<clarkb> based on the log you have from the scheduler it seems the request didn't provide complete node (request) data  [23:27]
<clarkb> and you'll need to work backward from there  [23:27]
<y2kenny> ok  [23:27]
<y2kenny> clarkb: I am also comparing the zk node entries to see if I missed something (comparing with a working driver entry like k8s)  [23:28]
*** hamalq has quit IRC  [23:37]
<clarkb> corvus, mhu, tristanC: left a couple of nits on https://review.opendev.org/#/c/734134/ -- will defer to others on whether that should be approved as is or if we should clean that stuff up  [23:55]
<clarkb> tristanC: ^ you approved the child, so maybe you want to just approve that one if the nits are minor enough  [23:56]

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!