Monday, 2020-06-22

*** wuchunyang has joined #zuul  [00:59]
*** wuchunyang has quit IRC  [01:05]
*** swest has quit IRC  [01:55]
*** swest has joined #zuul  [02:09]
*** bhavikdbavishi has joined #zuul  [02:56]
*** bhavikdbavishi has quit IRC  [03:04]
*** bhavikdbavishi has joined #zuul  [03:05]
*** bhavikdbavishi1 has joined #zuul  [03:08]
*** bhavikdbavishi has quit IRC  [03:10]
*** bhavikdbavishi1 is now known as bhavikdbavishi  [03:10]
*** sgw has quit IRC  [03:21]
*** wuchunyang has joined #zuul  [04:02]
*** wuchunyang has quit IRC  [04:06]
*** vishalmanchanda has joined #zuul  [04:29]
*** evrardjp has quit IRC  [04:33]
*** evrardjp has joined #zuul  [04:33]
*** bhavikdbavishi has quit IRC  [04:36]
*** bhavikdbavishi has joined #zuul  [04:37]
*** bhavikdbavishi has quit IRC  [04:44]
*** bhavikdbavishi has joined #zuul  [04:44]
*** sgw has joined #zuul  [04:58]
*** bhavikdbavishi has quit IRC  [05:02]
*** sgw has quit IRC  [05:05]
*** bhavikdbavishi has joined #zuul  [05:07]
*** bhavikdbavishi1 has joined #zuul  [05:52]
*** bhavikdbavishi has quit IRC  [05:53]
*** bhavikdbavishi1 is now known as bhavikdbavishi  [05:53]
*** saneax has joined #zuul  [06:10]
*** rpittau|afk is now known as rpittau  [06:21]
*** bhavikdbavishi has quit IRC  [06:40]
*** dennis_effa has joined #zuul  [06:40]
*** hashar has joined #zuul  [07:08]
*** bhavikdbavishi has joined #zuul  [07:27]
*** bhagyashris is now known as bhagyashris|lunc  [07:27]
*** tosky has joined #zuul  [07:29]
*** sshnaidm|off is now known as sshnaidm|ruck  [07:29]
*** jcapitao has joined #zuul  [07:37]
<openstackgerrit> Felix Edel proposed zuul/zuul master: Introduce Patternfly 4  https://review.opendev.org/736225  [07:43]
*** jpena|off is now known as jpena  [07:56]
*** raukadah is now known as chandankumar  [07:58]
*** dpawlik6 has quit IRC  [08:18]
*** odyssey4me has joined #zuul  [08:30]
<odyssey4me> hey folks, is the note in https://zuul-ci.org/docs/zuul-jobs/python-roles.html#role-ensure-python about only supporting debian true? if so, what're the options for centos/rpm?  [08:30]
*** bhagyashris|lunc is now known as bhagyashris  [08:34]
*** dpawlik6 has joined #zuul  [08:35]
<odyssey4me> it would appear to me that it's not true any more  [08:42]
*** nils has joined #zuul  [08:42]
<openstackgerrit> Jesse Pretorius (odyssey4me) proposed zuul/zuul-jobs master: [ensure-python] Remove debian-only note  https://review.opendev.org/737231  [08:44]
*** dennis_effa has quit IRC  [08:50]
<avass> odyssey4me: looks like it's only supported for debian unless you use pyenv  [08:52]
<avass> so the documentation probably needs to be updated  [08:52]
<odyssey4me> avass: yeah, I saw that and abandoned the patch... looks like holser is working on a fix in https://review.opendev.org/#/c/737060/1  [08:53]
<holser> yeah  [08:53]
<holser> starting now  [08:53]
<holser> as I was reading/replying to emails  [08:53]
<holser> coffee and back to the patch  [08:53]
<holser> bbi10  [08:53]
<avass> oh, tell me when it's ready and I'll take a look at it then :)  [08:54]
<holser> avass: sure  [09:07]
*** bolg has joined #zuul  [09:15]
<openstackgerrit> Felix Edel proposed zuul/zuul master: Introduce Patternfly 4  https://review.opendev.org/736225  [09:20]
<openstackgerrit> Felix Edel proposed zuul/zuul master: Introduce Patternfly 4  https://review.opendev.org/736225  [09:27]
<swest> zuul-maint: I'd like to kindly ask you for a review of the circular dependency change https://review.opendev.org/#/c/685354/ as well as tobiash's change queue refactor, which needs a second review from a Zuul maintainer: https://review.opendev.org/#/q/topic:change-queues+(status:open+OR+status:merged)  [09:27]
*** rpittau is now known as rpittau|bbl  [10:15]
*** wuchunyang has joined #zuul  [10:38]
*** wuchunyang has quit IRC  [10:49]
*** wuchunyang has joined #zuul  [10:49]
*** jcapitao is now known as jcapitao_lunch  [10:54]
*** jpena is now known as jpena|lunch  [11:29]
*** rlandy has joined #zuul  [11:49]
*** rlandy is now known as rlandy|ruck  [11:50]
*** bhavikdbavishi has quit IRC  [11:58]
*** rfolco has joined #zuul  [12:05]
*** wuchunyang has quit IRC  [12:11]
*** rpittau|bbl is now known as rpittau  [12:22]
*** jcapitao_lunch is now known as jcapitao  [12:23]
*** jpena|lunch is now known as jpena  [12:32]
<openstackgerrit> Felix Edel proposed zuul/zuul master: Introduce Patternfly 4  https://review.opendev.org/736225  [12:35]
*** felixedel has joined #zuul  [12:36]
*** ysandeep|away is now known as ysandeep|PTO  [12:38]
<felixedel> zuul-maint: The first Patternfly 4 patch is ready for review https://review.opendev.org/#/c/736225/ :)   It adds the patternfly4 react package, updates the header, navbar and navigation drawer with Patternfly 4 components and adapts the global page layout (that's why every page/file is changed). This should allow us to update the other components step by step from PF3 to PF4. The navigation should now also work like before and you shouldn't get lost in any undefined tenant ;-)  [12:42]
<openstackgerrit> Jan Kubovy proposed zuul/zuul master: Scheduler's pause/resume functionality  https://review.opendev.org/709735  [12:55]
<openstackgerrit> Jan Kubovy proposed zuul/zuul master: Separate connection registries in tests  https://review.opendev.org/712958  [12:55]
<openstackgerrit> Jan Kubovy proposed zuul/zuul master: Prepare Zookeeper for scale-out scheduler  https://review.opendev.org/717269  [12:55]
<openstackgerrit> Jan Kubovy proposed zuul/zuul master: Mandatory Zookeeper connection for ZuulWeb in tests  https://review.opendev.org/721254  [12:55]
<openstackgerrit> Jan Kubovy proposed zuul/zuul master: Driver event ingestion  https://review.opendev.org/717299  [12:55]
<openstackgerrit> Guillaume Chauvel proposed zuul/zuul master: Add 'uuid' to 'src_dir' in order to allow parallel jobs for a static node  https://review.opendev.org/735981  [13:05]
<avass> felixedel: cool :)  [13:05]
*** rlandy|ruck is now known as rlandy|ruck|mtg  [13:12]
*** bhavikdbavishi has joined #zuul  [13:13]
*** hashar is now known as hasharAway  [13:21]
<openstackgerrit> Felix Edel proposed zuul/zuul master: Introduce Patternfly 4  https://review.opendev.org/736225  [13:22]
*** rlandy|ruck|mtg is now known as rlandy|ruck  [13:31]
<mordred> felixedel: nice!  [13:36]
<openstackgerrit> Felix Edel proposed zuul/zuul master: Introduce Patternfly 4  https://review.opendev.org/736225  [13:37]
*** bhavikdbavishi has quit IRC  [13:48]
*** rlandy|ruck is now known as rlandy|ruck|mtg  [13:49]
*** sgw has joined #zuul  [13:49]
<openstackgerrit> James E. Blair proposed zuul/zuul-jobs master: Add tests for upload-docker-image  https://review.opendev.org/735402  [13:55]
<openstackgerrit> James E. Blair proposed zuul/zuul-jobs master: Fix and test multiarch docker builds in a release pipeline  https://review.opendev.org/737059  [13:55]
*** hasharAway is now known as hashar  [13:58]
*** rlandy|ruck|mtg is now known as rlandy|ruck  [14:01]
<tristanC> swest: i'll set up a zuul with multiple projects and branches to validate 718531, i should be able to finish the review soon  [14:16]
*** felixedel has quit IRC  [14:21]
<corvus> avass: https://review.opendev.org/735402 is green now with the htpasswd fix and the multiarch fix+test on top of it too: https://review.opendev.org/737059  [14:39]
<corvus> mordred: ^  [14:39]
<corvus> landing those will let us make another attempt at the nodepool 3.x tag  [14:39]
<openstackgerrit> Merged zuul/nodepool master: Improve max-servers handling for GCE  https://review.opendev.org/737146  [14:45]
*** sgw1 has joined #zuul  [14:48]
*** saneax has quit IRC  [14:52]
<fungi> ianw: looks like vos release of the mirror.fedora volume is now around 7 seconds with rsync 3.1.3 on focal!  [14:52]
<fungi> (with rsync -t that is)  [14:53]
*** saneax has joined #zuul  [14:53]
*** saneax has quit IRC  [14:54]
*** saneax has joined #zuul  [14:55]
<fungi> we've still got some sizeable outbound spikes from 01.dfw every 2 hours, but nothing like before: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=2362&rra_id=all  [14:55]
<clarkb> fungi: ww?  [14:56]
<clarkb> (excellent news though :) )  [14:56]
<fungi> ww?  [14:57]
*** sanjayu_ has joined #zuul  [14:57]
<mordred> fungi: \o/  [14:58]
<corvus> tobiash, fungi, mordred: moving this from #opendev -- looking at the zk kazoo tls issues, it seems to die on get_children on /nodepool/requests-lock; that node has 7877 children.  [14:58]
<corvus> we only have 600 requests  [14:59]
<fungi> clarkb: ianw: oops, wrong channel  [14:59]
<fungi> sorry for the noise  [14:59]
<corvus> so i think that in addition to the kazoo issue, we may also have a nodepool bug leaking requests-lock entries?  [14:59]
*** saneax has quit IRC  [15:00]
<fungi> yeah, so an order of magnitude more entries than expected. is it continuing to grow, or did we likely leak some at one point and they're just hanging around?  [15:01]
<fungi> just wondering if they could be cruft from an old leak  [15:02]
<bolg> corvus: we experienced a similar issue with tobiash on one of our test environments with get_children when there were too many nodes (around 8000 in our case 2w ago)  [15:03]
<fungi> granted, if kazoo is dying at only an order of magnitude difference from our production case, that's still concerning  [15:03]
<corvus> i'll check the numbers to see when they leaked  [15:04]
*** hashar is now known as hasharAway  [15:18]
<openstackgerrit> Matthieu Huin proposed zuul/zuul master: [WIP] Web UI: add i18n support, french translations  https://review.opendev.org/737290  [15:23]
*** hamalq has joined #zuul  [15:27]
*** bhavikdbavishi has joined #zuul  [15:32]
<corvus> the oldest request lock is 6744 requests before the latest one, which doesn't seem very old  [15:33]
<corvus> we delete them after 8 hours, so i think this is expected behavior  [15:36]
<corvus> i don't think we need changes to nodepool; i'll proceed with diagnosing the kazoo issue  [15:36]
<fungi> the locks don't get deleted once the request is handled?  [15:39]
<corvus> fungi: not the directory that holds the lock  [15:41]
<corvus> we might be able to make it smarter and delete the lock dir after deleting the request  [15:42]
<fungi> oh, i get it. yep  [15:44]
<fungi> so it's because we're not immediately cleaning up empty trees  [15:44]
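
(For context, a minimal sketch of the cleanup corvus describes above: deleting the kazoo lock directory once its node request is gone, while tolerating lock dirs that are still held. The paths and helper name here are assumptions for illustration, not actual nodepool code.)

    from kazoo.client import KazooClient
    from kazoo.exceptions import NoNodeError, NotEmptyError

    def delete_request_and_lock(zk: KazooClient, request_id: str) -> None:
        # Remove the request znode itself (and any children it may have).
        zk.delete(f"/nodepool/requests/{request_id}", recursive=True)
        try:
            # A non-recursive delete refuses to remove a znode that still
            # has children, so this only succeeds once no contender holds
            # or is waiting on the lock.
            zk.delete(f"/nodepool/requests-lock/{request_id}")
        except NotEmptyError:
            pass  # lock still in use; leave it for a later cleanup pass
        except NoNodeError:
            pass  # already gone
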
<corvus> clarkb: i wonder if https://github.com/python-zk/kazoo/issues/587 really is the same issue -- that report seems to have the error on start rather than during a typical read  [15:46]
<clarkb> corvus: I associated them because the "operation did not complete" errors line up, other than the ssl.c line number. I figured that could be due to different openssl versions  [15:47]
<clarkb> corvus: but ya, entirely possible there is a separate similar issue going on  [15:47]
<corvus> yeah... maybe the nicest thing to do is open another issue and link back to that one  [15:49]
*** hasharAway is now known as hashar  [15:53]
<openstackgerrit> Merged zuul/zuul-jobs master: Add tests for upload-docker-image  https://review.opendev.org/735402  [15:55]
<openstackgerrit> Merged zuul/zuul-jobs master: Fix and test multiarch docker builds in a release pipeline  https://review.opendev.org/737059  [15:55]
*** rpittau is now known as rpittau|afk  [16:01]
<corvus> mordred: you will appreciate that the first step i have taken in tracking down the problem further in kazoo is to disable the exception relocation code which is masking the real exception  [16:09]
<corvus> and i think i have a fix  [16:13]
*** sshnaidm|ruck is now known as sshnaidm|afk  [16:13]
<corvus> it's the old ssl_want_read issue  [16:13]
*** Goneri has joined #zuul  [16:14]
<mordred> corvus: I do appreciate that  [16:14]
<corvus> i'm preparing a pr now  [16:16]
<avass> corvus: cool, it doesn't test buildx yet though :)  [16:18]
<avass> but we could do that in another change  [16:18]
<avass> is nodepool's version synced to zuul's? like will it be 3.19.0?  [16:19]
<corvus> avass: nope; though i expect us to resync at 4.x  [16:19]
<corvus> avass: are you sure it doesn't test buildx?  [16:19]
<corvus> avass: let's back up.  what do you mean by "it"? :)  [16:19]
<corvus> avass: i agree that 735402 does not test buildx; but i think 737059 should.  [16:20]
<sshnaidm|afk> folks, please review the ansible collections roles for zuul when you have time: https://review.opendev.org/#/c/730360/  [16:21]
<avass> the buildx part of build/upload docker image doesn't use the docker_registry variables  [16:22]
<avass> unless I missed something  [16:23]
<corvus> avass: i don't think it needs to; i think the upload part should be the same (the buildx path pulls the image from the buildx builder back onto the node's docker cache so that the push can run normally)  [16:24]
<avass> the push is different depending on whether it's buildx or not though  [16:25]
<avass> it's either 'docker push ...' if it's normal docker, or 'docker buildx ... --push', and upload-docker-image/tasks/buildx.yaml doesn't use the buildset_registry variable yet  [16:27]
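
(Roughly the two push paths avass is contrasting, using the standard docker CLI flags; the image name and platform list are placeholders.)

    # Plain docker: the image already sits in the local daemon's cache.
    docker push registry.example.com/myimage:latest

    # Buildx: the builder pushes directly as part of the build.
    docker buildx build --platform linux/amd64,linux/arm64 \
        --tag registry.example.com/myimage:latest --push .
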
<corvus> oh, hrm, i thought setting the multiarch flag in the job would run that path, but it didn't.  [16:27]
<avass> let me push a quick change to show what I mean :)  [16:28]
<corvus> https://zuul.opendev.org/t/zuul/build/a55b55260589421e99bc5386101ff93a/console does look like it ran the 'docker build' path, not the 'buildx' path  [16:28]
<corvus> avass: i understand  [16:28]
<corvus> 737059 should have failed.  its job is not testing what it says it tests.  the thing we should figure out is why the multiarch flag didn't cause buildx to be used.  i wonder if the other multiarch jobs work?  [16:30]
<avass> good, it's probably not a lot of work to make sure it's tested though  [16:30]
<avass> upload-docker-image only checks if 'docker_images.arch' is defined  [16:31]
<mordred> corvus: it doesn't make sense to me that it doesn't run the arch code  [16:31]
<mordred> avass: yeah - but it should be, due to that ternary  [16:31]
<mordred> we don't need to do multiarch | bool | ternary do we?  [16:33]
<avass> ah, multiarch is only used in testing  [16:34]
<avass> and that toggles between using images that set the arch attribute and images that don't  [16:35]
*** jcapitao has quit IRC  [16:35]
<mordred> yeah  [16:36]
<mordred> at least in theory. it seems it might not actually be doing that  [16:36]
<avass> well, it's not used for the release-test roles  [16:36]
<mordred> oh! duh  [16:37]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [16:39]
<avass> something like that is needed, and that should break.  [16:39]
<mordred> avass: yes - I agree, that should do the trick  [16:41]
*** nils has quit IRC  [16:44]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [16:45]
<avass> corvus: and I believe that is what you want to do ^ ?  [16:46]
<avass> actually no  [16:46]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [16:46]
*** hashar is now known as hasharAway  [16:47]
<avass> I think that should mirror what the normal docker push does  [16:48]
<corvus> mordred, clarkb, fungi, tobiash, bolg: https://github.com/python-zk/kazoo/issues/618 and https://github.com/python-zk/kazoo/pull/619  [16:58]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [17:00]
<fungi> corvus: oh neat, so it's not handling server side buffering i guess?  [17:02]
<tobiash> awesome, that was quick  [17:02]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [17:03]
<corvus> fungi: well, it's openssl's weird internal state machine thing where, in order to proceed with reading, sometimes reading or writing needs to happen.  [17:04]
<fungi> ahh, so it's not as simple as polling a read, getting back zero bytes, and then trying again when there's more data in the buffer  [17:08]
<corvus> fungi: well, it is, except that it can be polling a read, getting back zero bytes, then trying again when the socket is writable.  [17:08]
<fungi> ahh, and judging from your patch it's sufficient to just ignore those conditions since it will keep polling anyway?  [17:13]
<corvus> fungi: yeah.  this is the approach we took in gear  [17:18]
<corvus> seems to have worked out okay :)  [17:18]
<fungi> makes sense, thanks  [17:18]
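
(A minimal sketch of the general pattern corvus describes -- the usual non-blocking TLS read idiom, not the actual kazoo patch: OpenSSL may demand that the socket become readable or writable before a read can make progress, and both conditions are safe to wait out and retry.)

    import select
    import ssl

    def read_some(sock: ssl.SSLSocket) -> bytes:
        """Read from a non-blocking TLS socket, tolerating OpenSSL's
        want-read/want-write states by polling and retrying."""
        while True:
            try:
                return sock.recv(4096)
            except ssl.SSLWantReadError:
                # No decrypted data yet; wait until the socket is readable.
                select.select([sock], [], [])
            except ssl.SSLWantWriteError:
                # E.g. a renegotiation must write before the read can
                # proceed; wait until the socket is writable.
                select.select([], [sock], [])
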
*** jpena is now known as jpena|off  [17:23]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [17:25]
<mordred> corvus: so - now we just have to wait for that to land and be released  [17:26]
<tobiash> corvus: btw I just saw that kazoo dropped testing of py35: https://github.com/python-zk/kazoo/releases/tag/2.7.0  [17:26]
<corvus> tobiash: i guess we're providing the 3.5 testing, until we drop it :)  [17:32]
<tobiash> seems so :)  [17:38]
<avass> mordred: might need some help with buildx: https://zuul.opendev.org/t/zuul/build/f74165333e9443c2b3ce7b9adbed4470/log/job-output.txt#603 :)  [17:39]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [17:39]
<mordred> avass: https://zuul.opendev.org/t/zuul/build/f74165333e9443c2b3ce7b9adbed4470/log/job-output.txt#580  [17:41]
<avass> ah yep, just saw that  [17:41]
<mordred> avass: we normally set up one of those for the buildset registry  [17:41]
<mordred> so maybe we're missing an equivalent step in test land  [17:41]
<mordred> avass: it's "neat" that that error doesn't cause the task to fail  [17:42]
<avass> yep, I tried to just re-use the registry used for testing upload-docker-image and hoped for the best  [17:42]
<avass> do I need to create that somehow?  [17:43]
<avass> because I think it should be close to working otherwise  [17:44]
<mordred> avass: the use-buildset-registry role does it  [17:45]
<mordred> but - a buildset registry might not make a ton of sense for this codepath?  [17:46]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [17:46]
<avass> we need one for testing buildx  [17:46]
<avass> something like the last patchset there ^ ?  [17:46]
<mordred> avass: yeah - that should work  [17:48]
<avass> oh... does buildx build everything in nested containers?  [17:54]
*** rlandy|ruck is now known as rlandy|ruck|mtg  [17:59]
<mordred> yup  [18:12]
<mordred> avass: welcome to the super magical magic magic docker docker magic docker  [18:13]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [18:14]
<avass> yeah...  [18:14]
<avass> mordred: I guess that's why we were using 'ansible_host' instead of just localhost  [18:14]
<mordred> avass: yeah  [18:15]
<SpamapS> mordred: are you sure you captured all of the magic?  [18:17]
<mordred> SpamapS: docker docker what?  [18:17]
<mordred> SpamapS: docker magic docker docker  [18:17]
<SpamapS> docker ok that docker makes docker sense.  [18:18]
<SpamapS> docker on  [18:18]
<fungi> docker on garth  [18:20]
<SpamapS> hmm, how do we stop the game then?  "KERNEL PANIC!"  [18:22]
<fungi> fished you in  [18:23]
*** hasharAway has quit IRC  [18:23]
*** bhavikdbavishi has quit IRC  [18:25]
*** rlandy|ruck|mtg is now known as rlandy|ruck  [18:26]
<openstackgerrit> Guillaume Chauvel proposed zuul/zuul-jobs master: prepare-workspace: Add Role Variable in README.rst  https://review.opendev.org/737352  [18:36]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [18:40]
*** y2kenny has joined #zuul  [18:47]
<y2kenny> I understand that a nodeset can be defined to have multiple nodes with different labels/node types.  Is it possible to request a nodeset that consists of labels from multiple providers?  For example, can I define a nodeset with a node from a static provider and a node from an OpenStack provider?  [18:48]
<avass> y2kenny: nope  [18:49]
<avass> y2kenny: not at the moment at least  [18:49]
<y2kenny> avass: ok so that's a known limitation.  Is there something very fundamental that prevents this feature from being implemented?  [18:51]
<avass> y2kenny: not that I know of, I believe the reason is to ensure that the nodes are in the same network  [18:52]
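
(For reference, a sketch of a multi-label nodeset as Zuul configures it today: labels can be mixed, but there is no per-node provider selector, so every label is satisfied by whichever provider wins the request. The label names below are made up.)

    - nodeset:
        name: mixed-labels
        nodes:
          - name: controller
            label: ubuntu-bionic      # e.g. from a cloud provider
          - name: bmc-host
            label: static-ipmi-host   # hypothetical static-provider label
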
<mnaser> crazy idea of the day: would it be sane to introduce pre-merge jobs for zuul? like yes, the whole concept of zuul is pre-merge.. but the concept for this case is: i have helm charts and dockerfiles in a repo, when my change merges, the repo is updated with the newest helm charts, my cd platform notices the repo changing and starts kicking off a new deploy -- but promote hasn't started/finished running yet.. so it uses stale images  [18:52]
<mnaser> could we do this in a way of like... revamping our upload job to promote right after, if the entire buildset passes (but i guess that means the upload job would depend on * jobs)  [18:53]
<avass> mnaser: I guess you could use a dependent job for that  [18:54]
<avass> but I guess that would have to depend on everything else in the buildset  [18:55]
<avass> and if there's anything ahead of that buildset in the queue that could be a problem  [18:56]
<corvus> mnaser: instead of having your cd system (argo?) watch the repo, can you kick it with a promote job?  [18:57]
<mnaser> corvus: that would be nice, but that would mean encoding info about every single environment we deploy to into these jobs -- something we're trying to avoid because we don't always have direct access to their locations over the internets  [18:59]
<corvus> mnaser: sorry i mean just kick argo  [18:59]
<corvus> like, tell argo not to watch the repo, but instead use the argo cli to run a convergence (whatever argo calls that, i forget)  [18:59]
<mnaser> corvus: right, argo isn't publicly accessible (even from our internal zuul that we run) for some stuff, which makes it hard to do that  [18:59]
<mnaser> so i can curl http://argo/go-update or so  [19:00]
<mnaser> (I mean, if i have to encode some logic to do some pre-deploy checks, that's on me too, but just wondered if it could be a use case)  [19:00]
<mnaser> but i guess that _might_ require a custom set of pipelines (gate (all gate checks), pre-merge (post gate pass, promote images), promote (post merge))  [19:01]
*** hashar has joined #zuul  [19:02]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [19:04]
<corvus> mnaser: a key but subtle behavior that zuul is built around is that the change isn't merged until it's merged.  zuul can't guarantee that gerrit (or github) will actually succeed in merging the change.  it tries to get as close as possible, but sometimes it fails.  i think a tighter integration would be possible if we enabled zuul-push (so zuul performs the merge instead of gerrit/github/etc).  [19:05]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [19:13]
<openstackgerrit> Guillaume Chauvel proposed zuul/zuul-jobs master: prepare-workspace: Add Role Variable in README.rst  https://review.opendev.org/737352  [19:18]
<y2kenny> avass: where do you folks usually keep track of feature requests?  Being able to mix providers in a nodeset could be useful.  [19:19]
<avass> y2kenny: I would guess https://storyboard.openstack.org/#!/project/679  [19:21]
<y2kenny> avass: great, thanks.  [19:21]
<fungi> also accessible these days as https://storyboard.openstack.org/#!/project/zuul/zuul  [19:22]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [19:25]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [19:44]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [19:56]
*** sanjayu_ has quit IRC  [19:56]
<y2kenny> corvus: last friday I asked about log streaming not being available.  I tried doing echo SHA | nc localhost 7900 and I see the log, but there's still no stream from the web ui  [20:04]
<y2kenny> I also tried doing the same thing from the web-ui server and I get the log  [20:05]
<y2kenny> is there anything else I can try to debug this?  [20:05]
<y2kenny> (right now I am thinking about restarting the browser but I doubt that's the issue since I didn't get a log stream from a different browser either.)  [20:05]
<corvus> y2kenny: you might open the devtools console in your browser and see if there are any relevant errors.  is it possible zuul-web is behind something (a reverse proxy?) that interferes with websockets?  [20:08]
<y2kenny> corvus: ok I will give that a shot  [20:08]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [20:14]
<y2kenny> corvus: interesting... I am getting a 500 for the console-stream websocket  [20:16]
*** vishalmanchanda has quit IRC  [20:16]
<y2kenny> let me see the log for the web-ui server....  [20:16]
<y2kenny> a bunch of errors: ws4py.exc.HandshakeError: Illegal value for header Connection: keep-alive  [20:18]
<avass> y2kenny, corvus: I have a vague memory of this also happening if you're running the executor in a container on a separate host or docker network, since the executor reports its own hostname for the web to connect to  [20:19]
<avass> since that would report the container hash  [20:19]
<y2kenny> avass: does that happen all the time or only occasionally?  Because the stream used to work and I haven't really changed how I deploy Zuul  [20:20]
<avass> ah, no, in that case the logs would never work  [20:21]
<mordred> y2kenny: my hunch is that sounds like regular http is getting passed to the websocket - like it's not going through the ws upgrade... maybe something changed in whatever you're using to do proxying?  [20:22]
<mordred> if you look at https://ws4py.readthedocs.io/en/latest/_modules/ws4py/client/  [20:22]
<mordred> in "def handshake_headers"  [20:22]
<mordred> there's a list of appropriate headers ... and Connection is, I think, supposed to be "upgrade"  [20:22]
<mordred> but that's a COMPLETE guess  [20:23]
<avass> mordred: yeah  [20:24]
<avass> mordred: any idea why buildx is getting connection refused when trying to push images?  [20:25]
<avass> y2kenny: if it's behind a reverse proxy, I set up my nginx config like this: https://github.com/dhackz/infra/blob/master/nginx/conf/zuul.conf#L12  [20:26]
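
(The usual nginx stanza for this -- a sketch of the common pattern, not a quote of avass's linked config; the upstream name and location path are placeholders. It forwards the websocket upgrade so ws4py sees "Connection: upgrade" rather than "keep-alive".)

    location /api/ {
        proxy_pass http://zuul-web:9000;
        # Without these, nginx speaks HTTP/1.0 to the backend and forwards
        # "Connection: keep-alive", which ws4py rejects at handshake time.
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
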
<y2kenny> mordred, avass: I don't think I am using any reverse proxy.  I deployed zuul on top of k8s and am using metallb.  [20:27]
<mordred> ah. hrm.  [20:27]
<y2kenny> could it be a bug in metallb?  I haven't restarted that  [20:27]
<mordred> maybe? maybe it's doing something (or not doing something)  [20:28]
<y2kenny> and the web-ui is still working...  just not the console streaming bit  [20:28]
<corvus> is there an ingress controller involved?  [20:28]
<y2kenny> no, just using metallb  [20:28]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds  https://review.opendev.org/737315  [20:41]
*** rfolco has quit IRC  [20:59]
*** hashar has quit IRC  [21:27]
<y2kenny> What's the best way to kill a stuck job?  (I have asked before but I forgot the answer.)  [21:39]
<clarkb> y2kenny: I'm not sure this is the best way but you can kill the ansible process associated with that job  [21:42]
<clarkb> an easier sledgehammer approach is to restart the executor running the ansible process  [21:42]
<y2kenny> clarkb: I think the method may be dequeue  [21:42]
<y2kenny> that should work right?  [21:42]
<clarkb> ah yup that would do it too, but that does the whole change  [21:42]
<clarkb> so it is a smaller hammer  [21:42]
<y2kenny> :)  [21:43]
*** y2kenny has quit IRC  [21:47]
<fungi> yes, you can use the zuul dequeue rpc cli subcommand, but as clarkb notes it will abort/cancel all builds for the entire buildset rather than just failing the one build  [21:47]
<fungi> also i don't know whether that will actually cause ansible processes to terminate, depending on how the build is "stuck"  [21:48]
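
(For reference, the dequeue invocation looks roughly like this -- run wherever the zuul client and zuul.conf are available; the tenant, pipeline, project, and change values are placeholders.)

    # Dequeues the whole buildset for change 12345, patchset 1.
    zuul dequeue --tenant my-tenant --pipeline check \
        --project example/my-project --change 12345,1
    # For ref-updated pipelines (e.g. post/periodic), use --ref
    # instead of --change.
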
*** y2kenny has joined #zuul  [21:49]
<y2kenny> If I write a new nodepool driver, are there any corresponding changes needed on the scheduler side?  [21:57]
<y2kenny> or is everything about the node's life cycle handled within the driver?  [21:58]
<clarkb> y2kenny: I am pretty sure it should all be in the driver  [22:00]
<y2kenny> I've got a prototype driver going and it launches some nodes, so they appear in the web ui  [22:01]
<y2kenny> when I define a nodeset to use the label, the request seems to be routed to the driver properly  [22:02]
<y2kenny> but then the job got stuck  [22:02]
<y2kenny> and I am trying to figure out what might be missing  [22:02]
<y2kenny> From the nodepool side I see "Fulfilled node request" from NodeRequestHandler  [22:04]
<clarkb> zuulians: https://review.opendev.org/#/c/737027/1 is an easy docs update  [22:05]
<clarkb> y2kenny: on the zuul side you should be able to trace the request too, to see if it is unhappy with what it got  [22:05]
<clarkb> maybe a missing field or similar in the zookeeper blob  [22:06]
<y2kenny> clarkb: by zuul side do you mean the scheduler?  [22:06]
<y2kenny> clarkb: or another bit?  [22:06]
<fungi> scheduler  [22:07]
<clarkb> y2kenny: the zuul scheduler and executors  [22:07]
<clarkb> the scheduler coordinates the node requests to nodepool, then it hands that data over to the executor to use in the job  [22:07]
<clarkb> so it could be in either spot  [22:07]
<fungi> at least start with the scheduler, but then yeah, if it seems to be engaging an executor, check whether the executor logs say it's having trouble connecting or something  [22:07]
<y2kenny> clarkb: so when the scheduler says "Completed node request" that's the point of handing over to the executor, right?  (I also see ExecutorClient Execute job... after that.)  [22:11]
<y2kenny> And the executor receiving it is "ExecutorServer"?  [22:13]
<clarkb> y2kenny: yes to the first thing; once the node request is completed I would expect the next thing that is done with it is that the info is passed to the executor  [22:16]
<clarkb> y2kenny: for your second question, what is the context? docker container name? log prefix?  [22:17]
<y2kenny> the context is the executor log  [22:17]
<clarkb> y2kenny: you should be able to grep for the event id of your triggering event across all services and have it show up consistently in the logs, I think  [22:17]
<clarkb> the other thing I'll do is grep for the build sha in the executor logs  [22:17]
<y2kenny> clarkb: ok.  I am seeing multiple node requests for a single job for some reason  [22:18]
<y2kenny> that seems odd  [22:18]
<y2kenny> clarkb: the build sha is not the same as the event id, right?  [22:20]
<clarkb> y2kenny: correct, they are separate. The build sha is associated with the job build side and the event id is from the trigger event input  [22:20]
<fungi> y2kenny: correct, the event id appears in the log as "e: ..." where the ... is some uuid  [22:20]
<fungi> in [] brackets i believe  [22:21]
<fungi> yeah, it'll look like [e: 12c542cdd9d444c5a8d2e256c6691bea]  [22:21]
<fungi> taken from one of our scheduler logs just now  [22:21]
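
(Putting clarkb's and fungi's suggestions together, tracing one event across services looks something like this; the log paths are typical defaults and the id is fungi's example, so adjust both to your deployment.)

    # The same event id shows up in each service that handled the event.
    grep 'e: 12c542cdd9d444c5a8d2e256c6691bea' /var/log/zuul/debug.log
    grep 'e: 12c542cdd9d444c5a8d2e256c6691bea' /var/log/zuul/executor-debug.log
    grep '12c542cdd9d444c5a8d2e256c6691bea' /var/log/nodepool/launcher-debug.log
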
<y2kenny> fungi: does the event id get passed to the executor side or only the build id?  [22:24]
<clarkb> I thought both did, but I could be wrong about that  [22:24]
<fungi> the event id in the executor logs should correspond, checking  [22:24]
*** armstrongs has joined #zuul  [22:26]
<fungi> y2kenny: confirmed, i can find that exact event i pasted above in one of our executor's debug logs too  [22:26]
<y2kenny> I can associate the event id in both the scheduler and nodepool, but I'm having trouble finding the corresponding event id in the executor...  [22:26]
<y2kenny> would that be an indication of what is stuck?  [22:27]
<fungi> if no executor ever picked up the job request from the gearman queue, then it wouldn't be logged on any executor  [22:27]
<clarkb> ya, that could mean the scheduler is having trouble after the node request is completed but before it gets executed by the executor  [22:28]
<y2kenny> what does the scheduler do with the node request after it is completed?  [22:28]
<y2kenny> does it try to talk to the node in any way?  [22:28]
*** armstrongs has quit IRC  [22:35]
<y2kenny> oh, and on an unrelated note... does the opendev.org search function sometimes not work?  [22:35]
<fungi> the scheduler never tries to contact the node, it adds the node information to a gearman request which waits in geard's queue for an executor to claim it  [22:42]
<clarkb> y2kenny: ya, the search problem is a known issue with gitea, they have plans for fixing it. For that reason we've kept http://codesearch.openstack.org running  [22:44]
<clarkb> we hope that gitea will eventually take over those duties though :/  [22:44]
<y2kenny> fungi: so is there something in gearman I can examine about the node request?  [22:44]
<y2kenny> clarkb: that's good to know.  Thanks.  [22:45]
<clarkb> guillaumec: left some thoughts on https://review.opendev.org/#/c/732066/22  [22:45]
<fungi> y2kenny: https://zuul-ci.org/docs/zuul/howtos/troubleshooting.html#gearman-jobs  [22:45]
<y2kenny> fungi: ok.  I was using that earlier but I thought those were just stats and counts... I will dig into it deeper  [22:48]
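
(The approach that guide describes boils down to querying geard's admin port directly; roughly the following, assuming geard listens on the default port 4730.)

    # Ask geard which functions are queued/running; each output line is
    # function name, total jobs, running jobs, and available workers.
    echo status | nc 127.0.0.1 4730
    # List connected workers and the functions they have registered.
    echo workers | nc 127.0.0.1 4730
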
<corvus> y2kenny: the scheduler should tell you in the debug logs exactly what it's doing  [22:48]
<clarkb> tobiash: https://review.opendev.org/#/c/730624/ would be a good one to update when you get a chance, since I think that does address a class of problem with the window sizing  [22:49]
<y2kenny> corvus: from the scheduler log, the last useful thing I see is "INFO zuul.ExecutorClient: [e: c19e087de6f64c4790d16b73a2008946] Execute job x-ipmi (uuid: 25abe404e9134fc7a4529a37f905a90d) on nodes <NodeSet [ for change  with dependent changes [{'"  [22:51]
<y2kenny> so reading this again... is the unmatched open bracket of "<NodeSet [" supposed to happen?  [22:51]
<clarkb> y2kenny: Execute job build-openstack-sphinx-docs (uuid: ce224c2d3c35447dbd25d2e05115ac97) on nodes <NodeSet ubuntu-xenial [<Node 0017300685 ('ubuntu-xenial',):ubuntu-xenial>]> for change <- that is what the openstack driver produces  [22:53]
<clarkb> it looks like maybe your node request was fulfilled incompletely?  [22:53]
<y2kenny> clarkb: ok... so it looks like there is something wrong with the nodeset being returned  [22:53]
<y2kenny> before that line I got "INFO zuul.nodepool: Setting nodeset <NodeSet [ in use"  [22:54]
<clarkb> y2kenny: maybe double check what ends up in the zookeeper db and ensure that looks correct to you as far as the info goes  [22:56]
<clarkb> and if it does, let me know what you're looking at and we can probably grab something similar from the openstack driver to compare  [22:56]
<y2kenny> clarkb: what's the best way to drill into the zookeeper db? (I assume that's different from gearman?)  [22:57]
*** tosky has quit IRC  [22:59]
<clarkb> ya, there is a tool that helps, let me find it  [22:59]
<corvus> zk-shell  [22:59]
<clarkb> y2kenny: https://pypi.org/project/zk-shell/ that tool  [22:59]
<clarkb> it allows you to navigate the db "filesystem" and dump node contents  [22:59]
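
(A minimal zk-shell session for this kind of digging might look like the following; the ZooKeeper endpoint, request id, and payload fields are placeholders/assumptions. A healthy fulfilled request should at least carry a "fulfilled" state and a non-empty nodes list.)

    zk-shell localhost:2181
    (CONNECTED) /> tree /nodepool/requests
    (CONNECTED) /> get /nodepool/requests/100-0000000123
    {"state": "fulfilled", "requestor": "zuul.example.org",
     "node_types": ["x-ipmi-label"], "nodes": ["0000000781"],
     "declined_by": []}
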
<y2kenny> clarkb: ok I will take a look.  fwiw, "nodepool list" doesn't seem to show an anomaly.  [23:00]
<y2kenny> | 0000000781 | 10.6.198.1     | zuul-nodes-exp | ng0006              | None        | None | in-use | 00:00:49:56 | locked   |  [23:01]
<corvus> y2kenny: there's a verbose option to nodepool list, but it still may not show everything  [23:01]
<openstackgerrit> Merged zuul/zuul master: gitlab - add driver documentation  https://review.opendev.org/733880  [23:02]
<y2kenny> corvus: oh... that dumped a lot of info... (I used it with --debug)  [23:02]
<fungi> y2kenny: there should be a nodeset name before that stray "[", so i wonder if you have a yaml parsing issue (reading a string where you intended a list, maybe?)  [23:04]
<y2kenny> fungi: would that be part of the job definition?  [23:05]
<fungi> also the change id seems to be empty, and the dependent changes list looks like the opening of a list of dictionaries  [23:05]
*** rlandy|ruck is now known as rlandy|ruck|bbl  [23:05]
<fungi> y2kenny: yeah, it would probably be the nodeset parameter of that job definition, or of a parent job  [23:06]
<y2kenny> fungi: the change id I just truncated from what I copied.  sorry about that one.  [23:06]
<fungi> ahh, okay  [23:06]
<fungi> it was hard to tell  [23:06]
<y2kenny> clarkb: so I have zk-shell hooked up and I used tree to look at things  [23:10]
<y2kenny> is there a particular branch I should go into?  [23:11]
<clarkb> y2kenny: it will be something like /nodepool/node-requests/something/something  [23:11]
<clarkb> I always have to explore a bit when I do it  [23:11]
<clarkb> tobiash: left +2 on https://review.opendev.org/#/c/710034/ but didn't approve, as I think my comments are worth considering before we land that  [23:12]
<clarkb> tobiash: but I think it's mostly cosmetic things that I called out  [23:12]
<clarkb> but since it is a mostly cosmetic change anyway I figured those were worth a look  [23:12]
<y2kenny> clarkb: I see /nodepool/nodes/<Id>/lock/<some other stuff...>  [23:13]
<clarkb> hrm, if you don't have active requests they may not be there?  [23:13]
<clarkb> it's the node request completing which maps onto the nodes path above  [23:14]
<clarkb> I think you want to check your node requests as they are completed  [23:14]
<clarkb> it may be easier to log that in the services, due to the cleanup race?  [23:14]
<y2kenny> but doesn't this mean the node request was completed successfully?  [23:15]
<clarkb> y2kenny: I think technically it was; what I think may be happening is that the contents of that successful transition aren't complete enough for zuul to figure out which nodes to use  [23:16]
<clarkb> it's successful according to the protocol, but probably not when actually needing to use those nodes  [23:16]
*** rfolco has joined #zuul  [23:18]
<y2kenny> clarkb: ok... I am not sure how to capture that.  And the driver doesn't control that transition, does it?  (like... I don't recall implementing something that flips the state from "ready" to "in-use")  [23:19]
<clarkb> y2kenny: the driver signals that it is done, then the scheduler takes over from there. I think what you want to do is log the contents of the node request when the driver is marking the request as completed  [23:19]
<y2kenny> I assume everything the scheduler/executor needs is in zk.Node  [23:19]
<clarkb> y2kenny: the scheduler looks at the node request to figure out which zk Nodes to use  [23:20]
<clarkb> the node request completes with a list of nodes in it that fulfill the request  [23:20]
<clarkb> and it almost seems like maybe that is the bit that is missing  [23:20]
<y2kenny> clarkb: OOhhh kay... this is the launchesComplete of NodeRequestHandler?  [23:21]
<clarkb> y2kenny: it's actually https://opendev.org/zuul/nodepool/src/branch/master/nodepool/driver/__init__.py#L707-L719 which is generic to all drivers, so maybe that isn't the issue  [23:26]
<clarkb> based on the log you have from the scheduler it seems the request didn't provide complete node (request) data  [23:27]
<clarkb> and you'll need to work backward from there  [23:27]
<y2kenny> ok  [23:27]
<y2kenny> clarkb: I am also comparing the zk node entries to see if I missed something (comparing with a working driver entry like k8s)  [23:28]
*** hamalq has quit IRC  [23:37]
<clarkb> corvus, mhu, tristanC: left a couple of nits on https://review.opendev.org/#/c/734134/ -- will defer to others on whether that should be approved as is or if we should clean that stuff up  [23:55]
<clarkb> tristanC: ^ you approved the child, so maybe you want to just approve that one if the nits are minor enough  [23:56]

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!