*** wuchunyang has joined #zuul | 00:59 | |
*** wuchunyang has quit IRC | 01:05 | |
*** swest has quit IRC | 01:55 | |
*** swest has joined #zuul | 02:09 | |
*** bhavikdbavishi has joined #zuul | 02:56 | |
*** bhavikdbavishi has quit IRC | 03:04 | |
*** bhavikdbavishi has joined #zuul | 03:05 | |
*** bhavikdbavishi1 has joined #zuul | 03:08 | |
*** bhavikdbavishi has quit IRC | 03:10 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 03:10 | |
*** sgw has quit IRC | 03:21 | |
*** wuchunyang has joined #zuul | 04:02 | |
*** wuchunyang has quit IRC | 04:06 | |
*** vishalmanchanda has joined #zuul | 04:29 | |
*** evrardjp has quit IRC | 04:33 | |
*** evrardjp has joined #zuul | 04:33 | |
*** bhavikdbavishi has quit IRC | 04:36 | |
*** bhavikdbavishi has joined #zuul | 04:37 | |
*** bhavikdbavishi has quit IRC | 04:44 | |
*** bhavikdbavishi has joined #zuul | 04:44 | |
*** sgw has joined #zuul | 04:58 | |
*** bhavikdbavishi has quit IRC | 05:02 | |
*** sgw has quit IRC | 05:05 | |
*** bhavikdbavishi has joined #zuul | 05:07 | |
*** bhavikdbavishi1 has joined #zuul | 05:52 | |
*** bhavikdbavishi has quit IRC | 05:53 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 05:53 | |
*** saneax has joined #zuul | 06:10 | |
*** rpittau|afk is now known as rpittau | 06:21 | |
*** bhavikdbavishi has quit IRC | 06:40 | |
*** dennis_effa has joined #zuul | 06:40 | |
*** hashar has joined #zuul | 07:08 | |
*** bhavikdbavishi has joined #zuul | 07:27 | |
*** bhagyashris is now known as bhagyashris|lunc | 07:27 | |
*** tosky has joined #zuul | 07:29 | |
*** sshnaidm|off is now known as sshnaidm|ruck | 07:29 | |
*** jcapitao has joined #zuul | 07:37 | |
openstackgerrit | Felix Edel proposed zuul/zuul master: Introduce Patternfly 4 https://review.opendev.org/736225 | 07:43 |
*** jpena|off is now known as jpena | 07:56 | |
*** raukadah is now known as chandankumar | 07:58 | |
*** dpawlik6 has quit IRC | 08:18 | |
*** odyssey4me has joined #zuul | 08:30 | |
odyssey4me | hey folks, is the note in https://zuul-ci.org/docs/zuul-jobs/python-roles.html#role-ensure-python about only supporting debian true? if so, what're the options for centos/rpm ? | 08:30 |
*** bhagyashris|lunc is now known as bhagyashris | 08:34 | |
*** dpawlik6 has joined #zuul | 08:35 | |
odyssey4me | it would appear to me that it's not true any more | 08:42 |
*** nils has joined #zuul | 08:42 | |
openstackgerrit | Jesse Pretorius (odyssey4me) proposed zuul/zuul-jobs master: [ensure-python] Remove debian-only note https://review.opendev.org/737231 | 08:44 |
*** dennis_effa has quit IRC | 08:50 | |
avass | odyssey4me: looks like it's only supported for debian unless you use pyenv | 08:52 |
avass | so the documentation probably needs to be updated | 08:52 |
odyssey4me | avass: yeah, I saw that and abandoned the patch... looks like holser is working on a fix in https://review.opendev.org/#/c/737060/1 | 08:53 |
holser | yeah | 08:53 |
holser | starting now | 08:53 |
holser | as I was reading/replying emails | 08:53 |
holser | coffee and back to patch | 08:53 |
holser | bbi10 | 08:53 |
avass | oh, tell me when it's ready and I'll take a look at it then :) | 08:54 |
holser | avass sure | 09:07 |
*** bolg has joined #zuul | 09:15 | |
openstackgerrit | Felix Edel proposed zuul/zuul master: Introduce Patternfly 4 https://review.opendev.org/736225 | 09:20 |
openstackgerrit | Felix Edel proposed zuul/zuul master: Introduce Patternfly 4 https://review.opendev.org/736225 | 09:27 |
swest | zuul-maint: I'd like to kindly ask you for a review of the circular dependency change https://review.opendev.org/#/c/685354/ as well as tobiash's change queue refactor, which needs a second review from a Zuul maintainer https://review.opendev.org/#/q/topic:change-queues+(status:open+OR+status:merged) | 09:27 |
*** rpittau is now known as rpittau|bbl | 10:15 | |
*** wuchunyang has joined #zuul | 10:38 | |
*** wuchunyang has quit IRC | 10:49 | |
*** wuchunyang has joined #zuul | 10:49 | |
*** jcapitao is now known as jcapitao_lunch | 10:54 | |
*** jpena is now known as jpena|lunch | 11:29 | |
*** rlandy has joined #zuul | 11:49 | |
*** rlandy is now known as rlandy|ruck | 11:50 | |
*** bhavikdbavishi has quit IRC | 11:58 | |
*** rfolco has joined #zuul | 12:05 | |
*** wuchunyang has quit IRC | 12:11 | |
*** rpittau|bbl is now known as rpittau | 12:22 | |
*** jcapitao_lunch is now known as jcapitao | 12:23 | |
*** jpena|lunch is now known as jpena | 12:32 | |
openstackgerrit | Felix Edel proposed zuul/zuul master: Introduce Patternfly 4 https://review.opendev.org/736225 | 12:35 |
*** felixedel has joined #zuul | 12:36 | |
*** ysandeep|away is now known as ysandeep|PTO | 12:38 | |
felixedel | zuul-maint: The first Patternfly 4 patch is ready for review https://review.opendev.org/#/c/736225/ :) It adds the patternfly4 react package, updates header, navbar and navigation drawer with Patternfly 4 components and adapts the global page layout (that's why every page/ file is changed). This should allow us to update the other components step | 12:42 |
felixedel | by step from PF3 to PF4. The navigation should now also work like before and you shouldn't get lost in any undefined tenant ;-) | 12:42 |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Scheduler's pause/resume functionality https://review.opendev.org/709735 | 12:55 |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Separate connection registries in tests https://review.opendev.org/712958 | 12:55 |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Prepare Zookeeper for scale-out scheduler https://review.opendev.org/717269 | 12:55 |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Mandatory Zookeeper connection for ZuulWeb in tests https://review.opendev.org/721254 | 12:55 |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Driver event ingestion https://review.opendev.org/717299 | 12:55 |
openstackgerrit | Guillaume Chauvel proposed zuul/zuul master: Add 'uuid' to 'src_dir' in order to allow parallel jobs for a static node https://review.opendev.org/735981 | 13:05 |
avass | felixedel: cool :) | 13:05 |
*** rlandy|ruck is now known as rlandy|ruck|mtg | 13:12 | |
*** bhavikdbavishi has joined #zuul | 13:13 | |
*** hashar is now known as hasharAway | 13:21 | |
openstackgerrit | Felix Edel proposed zuul/zuul master: Introduce Patternfly 4 https://review.opendev.org/736225 | 13:22 |
*** rlandy|ruck|mtg is now known as rlandy|ruck | 13:31 | |
mordred | felixedel: nice! | 13:36 |
openstackgerrit | Felix Edel proposed zuul/zuul master: Introduce Patternfly 4 https://review.opendev.org/736225 | 13:37 |
*** bhavikdbavishi has quit IRC | 13:48 | |
*** rlandy|ruck is now known as rlandy|ruck|mtg | 13:49 | |
*** sgw has joined #zuul | 13:49 | |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Add tests for upload-docker-image https://review.opendev.org/735402 | 13:55 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Fix and test multiarch docker builds in a release pipeline https://review.opendev.org/737059 | 13:55 |
*** hasharAway is now known as hashar | 13:58 | |
*** rlandy|ruck|mtg is now known as rlandy|ruck | 14:01 | |
tristanC | swest: i'll set up a zuul with multiple projects and branches to validate 718531, i should be able to finish the review soon | 14:16 |
*** felixedel has quit IRC | 14:21 | |
corvus | avass: https://review.opendev.org/735402 is green now with the htpasswd fix and the multiarch fix+test on top of it too: https://review.opendev.org/737059 | 14:39 |
corvus | mordred: ^ | 14:39 |
corvus | landing those will let us make another attempt at the nodepool 3.x tag | 14:39 |
openstackgerrit | Merged zuul/nodepool master: Improve max-servers handling for GCE https://review.opendev.org/737146 | 14:45 |
*** sgw1 has joined #zuul | 14:48 | |
*** saneax has quit IRC | 14:52 | |
fungi | ianw: looks like vos release of the mirror.fedora volume is now around 7 seconds with rsync 3.1.3 on focal! | 14:52 |
fungi | (with rsync -t that is) | 14:53 |
*** saneax has joined #zuul | 14:53 | |
*** saneax has quit IRC | 14:54 | |
*** saneax has joined #zuul | 14:55 | |
fungi | we've still got some sizeable outbound spikes from 01.dfw every 2 hours, but nothing like before: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=2362&rra_id=all | 14:55 |
clarkb | fungi: ww ? | 14:56 |
clarkb | (excellent news though :) ) | 14:56 |
fungi | ww? | 14:57 |
*** sanjayu_ has joined #zuul | 14:57 | |
mordred | fungi: \o/ | 14:58 |
corvus | tobiash, fungi, mordred: moving this from #opendev -- looking at the zk kazoo tls issues, it seems to die on get_children on /nodepool/requests-lock; that node has 7877 children. | 14:58 |
corvus | we only have 600 requests | 14:59 |
fungi | clarkb: ianw: oops, wrong channel | 14:59 |
fungi | sorry for the noise | 14:59 |
corvus | so i think that in addition to the kazoo issue, we may also have a nodepool bug leaking requests-lock entries? | 14:59 |
*** saneax has quit IRC | 15:00 | |
fungi | yeah, so an order of magnitude more entries than expected. is it continuing to grow, or did we likely leak some at one point and they're just hanging around? | 15:01 |
fungi | just wondering if they could be cruft from an old leak | 15:02 |
bolg | corvus: we experienced a similar issue with tobiash on one of our test environments with get_children when there are too many nodes (around 8000 in our case 2w ago) | 15:03 |
fungi | granted, if kazoo is dying at only an order of magnitude difference from our production case, that's still concerning | 15:03 |
corvus | i'll check the numbers to see when they leaked | 15:04 |
*** hashar is now known as hasharAway | 15:18 | |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: [WIP] Web UI: add i18n support, french translations https://review.opendev.org/737290 | 15:23 |
*** hamalq has joined #zuul | 15:27 | |
*** bhavikdbavishi has joined #zuul | 15:32 | |
corvus | the oldest request lock is 6744 requests before the latest one, which doesn't seem very old | 15:33 |
corvus | we delete them after 8 hours, so i think this is expected behavior | 15:36 |
corvus | i don't think we need changes to nodepool; i'll proceed with diagnosing the kazoo issue | 15:36 |
fungi | the locks don't get deleted once the request is handled? | 15:39 |
corvus | fungi: not the directory that holds the lock | 15:41 |
corvus | we might be able to make it smarter and delete the lock dir after deleting the request | 15:42 |
fungi | oh, i get it. yep | 15:44 |
fungi | so it's because we're not immediately cleaning up empty trees | 15:44 |
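Editor's note: the structure being discussed here is easy to inspect directly with kazoo. A minimal, read-only sketch follows; it assumes the lock directories under /nodepool/requests-lock share names with the request znodes under /nodepool/requests (adjust the ZooKeeper hosts and any TLS options to your deployment):

```python
# Rough audit sketch: compare /nodepool/requests-lock entries against
# /nodepool/requests to see how many lock znodes outlived their request.
from kazoo.client import KazooClient

zk = KazooClient(hosts='localhost:2181')  # assumed plain, local ZK for illustration
zk.start()

requests = set(zk.get_children('/nodepool/requests'))
lock_dirs = zk.get_children('/nodepool/requests-lock')
stale = [name for name in lock_dirs if name not in requests]

print('%d lock dirs, %d active requests, %d lock dirs without a matching request'
      % (len(lock_dirs), len(requests), len(stale)))

# An actual cleanup pass would also need to verify that a candidate dir has
# no ephemeral lock children before deleting it, e.g.:
#     if not zk.get_children('/nodepool/requests-lock/' + name):
#         zk.delete('/nodepool/requests-lock/' + name)

zk.stop()
```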
corvus | clarkb: i wonder if https://github.com/python-zk/kazoo/issues/587 really is the same issue -- that report seems to have the error on start rather than during a typical read | 15:46 |
clarkb | corvus: I associated them due to the operation did not complete errors lining up other than the ssl c line number. I figured that could be due to different openssl versions | 15:47 |
clarkb | corvus: but ya entierly possible there is a separate similar issue going on | 15:47 |
corvus | yeah... maybe the nicest thing to do is open another issue and link back to that one | 15:49 |
*** hasharAway is now known as hashar | 15:53 | |
openstackgerrit | Merged zuul/zuul-jobs master: Add tests for upload-docker-image https://review.opendev.org/735402 | 15:55 |
openstackgerrit | Merged zuul/zuul-jobs master: Fix and test multiarch docker builds in a release pipeline https://review.opendev.org/737059 | 15:55 |
*** rpittau is now known as rpittau|afk | 16:01 | |
corvus | mordred: you will appreciate that the first step i have taken in tracking down the problem further in kazoo is to disable the exception relocation code which is masking the real exception | 16:09 |
corvus | and i think i have a fix | 16:13 |
*** sshnaidm|ruck is now known as sshnaidm|afk | 16:13 | |
corvus | it's the old ssl_want_read issue | 16:13 |
*** Goneri has joined #zuul | 16:14 | |
mordred | corvus: I do appreciate that | 16:14 |
corvus | i'm preparing a pr now | 16:16 |
avass | corvus: cool, it doesn't test buildx yet though :) | 16:18 |
avass | but could do that in another change | 16:18 |
avass | is nodepool's version synced to zuul? like will it be 3.19.0? | 16:19 |
corvus | avass: nope; though i expect us to resync at 4.x | 16:19 |
corvus | avass: are you sure it doesn't test buildx? | 16:19 |
corvus | avass: let's back up. what do you mean by "it"? :) | 16:19 |
corvus | avass: i agree that 735402 does not test buildx; but i think 737059 should. | 16:20 |
sshnaidm|afk | folks, please review ansible collections roles for zuul in your time: https://review.opendev.org/#/c/730360/ | 16:21 |
avass | the buildx part of build/upload docker image doesn't use the docker_registry variables | 16:22 |
avass | unless I missed something | 16:23 |
corvus | avass: i don't think it needs to; i think the upload part should be the same (the buildx path pulls the image from the buildx builder back on to the node's docker cache so that the push can run normally) | 16:24 |
avass | the push is different depending on if it's buildx or not though | 16:25 |
avass | it's either 'docker push ...' if it's normal docker or 'docker buildx ... --push', and upload-docker-image/tasks/buildx.yaml doesn't use the buildset_registry variable yet | 16:27 |
corvus | oh, hrm, i thought setting the multiarch flag in the job would run that path, but it didn't. | 16:27 |
avass | let me push a quick change to show what I mean :) | 16:28 |
corvus | https://zuul.opendev.org/t/zuul/build/a55b55260589421e99bc5386101ff93a/console does look like it ran the 'docker build' not 'buildx' path | 16:28 |
corvus | avass: i understand | 16:28 |
corvus | 737059 should have failed. its job is not testing what it says it tests. the thing we should figure out is why the multiarch flag didn't cause buildx to be used. i wonder if the other multiarch jobs work? | 16:30 |
avass | good, it's probably not a lot to do to make sure it's tested though | 16:30 |
avass | upload-docker-image only checks if 'docker_images.arch' is defined | 16:31 |
mordred | corvus: it doesn't make sense to me that it doesn't run the arch code | 16:31 |
mordred | avass: yeah - but it should be due to that ternary | 16:31 |
mordred | we don't need to do multiarch | bool | ternary do we? | 16:33 |
avass | ah, multiarch is only used in testing | 16:34 |
avass | and that toggles between using images that sets the arch attribute and images that don't | 16:35 |
*** jcapitao has quit IRC | 16:35 | |
mordred | yeah | 16:36 |
mordred | at least in theory. it seems it might not actually be doing that | 16:36 |
avass | well, it's not used for the release-test roles | 16:36 |
mordred | oh! duh | 16:37 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds https://review.opendev.org/737315 | 16:39 |
avass | something like that is needed, and that should break. | 16:39 |
mordred | avass: yes - I agree, that should do the trick | 16:41 |
*** nils has quit IRC | 16:44 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds https://review.opendev.org/737315 | 16:45 |
avass | corvus: and I believe that is what you want to do ^ ? | 16:46 |
avass | actually no | 16:46 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds https://review.opendev.org/737315 | 16:46 |
*** hashar is now known as hasharAway | 16:47 | |
avass | I think that should mirror what the normal docker push does | 16:48 |
corvus | mordred, clarkb, fungi, tobiash, bolg: https://github.com/python-zk/kazoo/issues/618 and https://github.com/python-zk/kazoo/pull/619 | 16:58 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds https://review.opendev.org/737315 | 17:00 |
fungi | corvus: oh neat, so it's not handling server side buffering i guess? | 17:02 |
tobiash | awesome, that was quick | 17:02 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds https://review.opendev.org/737315 | 17:03 |
corvus | fungi: well, it's openssl's weird internal state machine thing where in order to proceed with reading, sometimes reading or writing needs to happen. | 17:04 |
fungi | ahh, so it's not as simple as polling a read and getting back zero bytes and then trying again when there's more data in the buffer | 17:08 |
corvus | fungi: well, it is, except that it can be polling a read, getting back zero bytes, then trying again when the socket is writable. | 17:08 |
fungi | ahh, and judging from your patch it's sufficient to just ignore those conditions since it will keep polling anyway? | 17:13 |
corvus | fungi: yeah. this is the approach we took in gear | 17:18 |
corvus | seems to have worked out okay :) | 17:18 |
fungi | makes sense, thanks | 17:18 |
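Editor's note: the pattern corvus describes — treating OpenSSL's want-read/want-write conditions as "no progress yet, poll again" rather than as failures — looks roughly like the sketch below. This is an illustration of the general approach (the same one gear takes), not the actual kazoo patch:

```python
import ssl

def try_read(tls_sock, bufsize=4096):
    """One non-blocking read attempt from an already-wrapped SSL socket."""
    try:
        return tls_sock.recv(bufsize)
    except ssl.SSLWantReadError:
        # OpenSSL needs more incoming bytes before it can return plaintext;
        # wait for the socket to become readable again, then retry.
        return None
    except ssl.SSLWantWriteError:
        # The TLS state machine needs to write (e.g. during renegotiation)
        # before the read can proceed; wait for writability, then retry.
        return None
```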
*** jpena is now known as jpena|off | 17:23 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds https://review.opendev.org/737315 | 17:25 |
mordred | corvus: so - now we just have to wait for that to land and be released | 17:26 |
tobiash | corvus: btw I just saw that kazoo dropped testing of py35: https://github.com/python-zk/kazoo/releases/tag/2.7.0 | 17:26 |
corvus | tobiash: i guess we're providing the 3.5 testing, until we drop it :) | 17:32 |
tobiash | seems so :) | 17:38 |
avass | mordred: might need some help with buildx: https://zuul.opendev.org/t/zuul/build/f74165333e9443c2b3ce7b9adbed4470/log/job-output.txt#603 :) | 17:39 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds https://review.opendev.org/737315 | 17:39 |
mordred | avass: https://zuul.opendev.org/t/zuul/build/f74165333e9443c2b3ce7b9adbed4470/log/job-output.txt#580 | 17:41 |
avass | ah yep, just saw that | 17:41 |
mordred | avass: we normally set up one of those for buildset registry | 17:41 |
mordred | so maybe we're missing an equiv step in test land | 17:41 |
mordred | avass: it's "neat" that that error doesn't cause the task to fail | 17:42 |
avass | yep, I tried to just re-use the registry used for testing upload-docker-image and hoped for the best | 17:42 |
avass | do I need to create that somehow? | 17:43 |
avass | because I think it should be close to working otherwise | 17:44 |
mordred | avass: the use-buildset-registry role does it | 17:45 |
mordred | but - buildset registry might not make a ton of sense for this codepath? | 17:46 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds https://review.opendev.org/737315 | 17:46 |
avass | we need one for testing buildx | 17:46 |
avass | something like the last patchset there ^ ? | 17:46 |
mordred | avass: yeah - that should work | 17:48 |
avass | oh... does buildx build everything in nested containers? | 17:54 |
*** rlandy|ruck is now known as rlandy|ruck|mtg | 17:59 | |
mordred | yup | 18:12 |
mordred | avass: welcome to the super magical magic magic docker docker magic docker | 18:13 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds https://review.opendev.org/737315 | 18:14 |
avass | yeah... | 18:14 |
avass | mordred: I guess that's why we were using 'ansible_host' instead of just localhost | 18:14 |
mordred | avass: yeah | 18:15 |
SpamapS | mordred: are you sure you captured all of the magic? | 18:17 |
mordred | SpamapS: docker docker what? | 18:17 |
mordred | SpamapS: docker magic docker docker | 18:17 |
SpamapS | docker ok that docker makes docker sense. | 18:18 |
SpamapS | docker on | 18:18 |
fungi | docker on garth | 18:20 |
SpamapS | hmm, how do we stop the game then? "KERNEL PANIC!" | 18:22 |
fungi | fished you in | 18:23 |
*** hasharAway has quit IRC | 18:23 | |
*** bhavikdbavishi has quit IRC | 18:25 | |
*** rlandy|ruck|mtg is now known as rlandy|ruck | 18:26 | |
openstackgerrit | Guillaume Chauvel proposed zuul/zuul-jobs master: prepare-workspace: Add Role Variable in README.rst https://review.opendev.org/737352 | 18:36 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds https://review.opendev.org/737315 | 18:40 |
*** y2kenny has joined #zuul | 18:47 | |
y2kenny | I understand that a nodeset can be defined to have multiple nodes with different labels/node types. Is it possible to request a nodeset that consists of labels from multiple providers? For example, can I define a nodeset with a node from a static provider and a node from an OpenStack provider? | 18:48 |
avass | y2kenny: nope | 18:49 |
avass | y2kenny: not at the moment at least | 18:49 |
y2kenny | avass: ok so that's a known limitation. Is there something very fundamental that prevents this feature from being implemented? | 18:51 |
avass | y2kenny: not that I know of, I believe the reason is to ensure that the nodes are in the same network | 18:52 |
mnaser | crazy idea of the day: would it be sane to introduce pre-merge jobs for zuul? like yes, the whole concept is zuul is pre-merge.. but the concept for this case is: i have helm charts and dockerfile's in a repo, when my change merges, the repo is updated with the newest helm charts, my cd platform notices the repo changing and starts kicking off a new deploy -- but promote hasn't started/finished running yet.. so it uses | 18:52 |
mnaser | stale images | 18:52 |
mnaser | could we do this in a way of like... revamping our upload job to promote right after if the entire buildset passes (but i guess that means upload job would depend on * jobs) | 18:53 |
avass | mnaser: I guess you could use a dependent job for that | 18:54 |
avass | but I guess that would have to depend on everything else in the buildset | 18:55 |
avass | and if there's anything ahead of that buildset in the queue that could be a problem | 18:56 |
corvus | mnaser: instead of having your cd system (argo?) watch the repo, can you kick it with a promote job? | 18:57 |
mnaser | corvus: that would be nice, but that would mean encoding info of every single environment we deploy to into these jobs -- something we're trying to avoid because we don't always have direct access to their locations over the internets | 18:59 |
corvus | mnaser: sorry i mean just kick argo | 18:59 |
corvus | like, tell argo not to watch the repo, but instead use the argo cli to run a convergence (whatever argo calls that, i forget) | 18:59 |
mnaser | corvus: right, argo isn't publicly accessible (even from our internal zuul that we run) for some stuff, which makes it hard to do that | 18:59 |
mnaser | so i can curl http://argo/go-update or so | 19:00 |
mnaser | (I mean, if i have to encode some logic to do some pre-deploy checks, that's on me too, but just wondered if it could be a use case) | 19:00 |
mnaser | but i guess that _might_ require a custom set of pipelines (gate (all gate checks), pre-merge (post gate pass, promote images), promote (post merge)) | 19:01 |
*** hashar has joined #zuul | 19:02 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds https://review.opendev.org/737315 | 19:04 |
corvus | mnaser: a key but subtle behavior that zuul is built around is that the change isn't merged until it's merged. zuul can't guarantee that gerrit (or github) will actually succeed in merging the change. it tries to get as close as possible, but sometimes it fails. i think a tighter integration would be possible if we enabled zuul-push (so zuul performs the merge instead of gerrit/github/etc). | 19:05 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds https://review.opendev.org/737315 | 19:13 |
openstackgerrit | Guillaume Chauvel proposed zuul/zuul-jobs master: prepare-workspace: Add Role Variable in README.rst https://review.opendev.org/737352 | 19:18 |
y2kenny | avass: where do you folks usually keep track of feature requests? Being able to mix providers in a nodeset can be useful. | 19:19 |
avass | y2kenny: I would guess https://storyboard.openstack.org/#!/project/679 | 19:21 |
y2kenny | avass: great, thanks. | 19:21 |
fungi | also accessible these days as https://storyboard.openstack.org/#!/project/zuul/zuul | 19:22 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds https://review.opendev.org/737315 | 19:25 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds https://review.opendev.org/737315 | 19:44 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds https://review.opendev.org/737315 | 19:56 |
*** sanjayu_ has quit IRC | 19:56 | |
y2kenny | corvus: last friday I asked about log streaming not available. I tried doing echo SHA | nc localhost 7900 and I see the log but there's still no stream from the web ui | 20:04 |
y2kenny | I also tried doing the same thing from the web-ui server and I get the log | 20:05 |
y2kenny | are there anything else I can try to debug this? | 20:05 |
y2kenny | (right now I am thinking about restarting the browser but I doubt that's the issue since I didn't get a log stream from a different browser either.) | 20:05 |
corvus | y2kenny: you might open the devtools console in your browser and see if there are any relevant errors. is it possible zuul-web is behind something (a reverse proxy?) that interferes with websockets? | 20:08 |
y2kenny | corvus: ok I will give that a shot | 20:08 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds https://review.opendev.org/737315 | 20:14 |
y2kenny | corvus: interesting... I am getting 500 for console-stream websocket | 20:16 |
*** vishalmanchanda has quit IRC | 20:16 | |
y2kenny | let me see log for the web-ui server.... | 20:16 |
y2kenny | a bunch of errors: ws4py.exc.HandshakeError: Illegal value for header Connection: keep-alive | 20:18 |
avass | y2kenny, corvus: I have a vague memory of this also happening if you're running the executor in a container on a separate host or docker network since the executor reports its own hostname for the web to connect to | 20:19 |
avass | since that would report the container hash | 20:19 |
y2kenny | avass: does that happen all the time or only occasionally? Because the stream used to work and I haven't really changed how I deploy Zuul | 20:20 |
avass | ah, no in that case the logs would never work | 20:21 |
mordred | y2kenny: my hunch is that sounds like regular http is getting passed to the websocket - like it's not going through the ws upgrade... maybe something changed in whatever you're using to do proxying? | 20:22 |
mordred | if you look at https://ws4py.readthedocs.io/en/latest/_modules/ws4py/client/ | 20:22 |
mordred | in "def handshake_headers" | 20:22 |
mordred | there's a list of appropriate headers ... and Connection is, I think, supposed to be "upgrade" | 20:22 |
mordred | but that's a COMPLETE guess | 20:23 |
avass | mordred: yeah | 20:24 |
avass | mordred: any ideas why buildx is getting connection refused when trying to push images? | 20:25 |
avass | y2kenny: if it's behind a reverse proxy, I set up my nginx config like this: https://github.com/dhackz/infra/blob/master/nginx/conf/zuul.conf#L12 | 20:26 |
y2kenny | mordred, avass: I don't think I am using any reverse proxy. I deployed zuul on top of k8s and using metallb. | 20:27 |
mordred | ah. hrm. | 20:27 |
y2kenny | could be bug in metallb? I haven't restarted that | 20:27 |
mordred | maybe? maybe it's doing something (or not doing something) | 20:28 |
y2kenny | and the web-ui is still working... just not the console streaming bit | 20:28 |
corvus | is there an ingress controller involved? | 20:28 |
y2kenny | no, just using metallb | 20:28 |
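Editor's note: one way to narrow down where the websocket upgrade breaks is to hit the console-stream endpoint from a script, first against the zuul-web pod/service directly and then via the path the browser takes (metallb in this case). The URL path, port and JSON payload below are assumptions about what the dashboard sends — adjust the host, tenant and build uuid — and the sketch needs the websocket-client package:

```python
import json
import websocket  # pip install websocket-client

# Assumed endpoint layout; point this at zuul-web directly, then at the LB.
url = 'ws://zuul-web.example.org:9000/api/tenant/example/console-stream'

ws = websocket.create_connection(url)   # a broken upgrade fails right here
ws.send(json.dumps({'uuid': 'BUILD_UUID', 'logfile': 'console.log'}))
print(ws.recv())                        # first chunk of the live console log
ws.close()
```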
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds https://review.opendev.org/737315 | 20:41 |
*** rfolco has quit IRC | 20:59 | |
*** hashar has quit IRC | 21:27 | |
y2kenny | What's the best way to kill a stuck job? (I have asked before but I forgot the answer.) | 21:39 |
clarkb | y2kenny: I'm not sure this is the best way but you can kill the ansible process associated with that job | 21:42 |
clarkb | an easier sledgehammer approach is to restart the executor running the ansible process | 21:42 |
y2kenny | clarkb: I think the method may be dequeue | 21:42 |
y2kenny | that should work right? | 21:42 |
clarkb | ah yup that would do it too, but that does the whole change | 21:42 |
clarkb | so is a smaller hammer | 21:42 |
y2kenny | :) | 21:43 |
*** y2kenny has quit IRC | 21:47 | |
fungi | yes, you can use the zuul dequeue rpc cli subcommand, but as clarkb notes it will abort/cancel all builds for the entire buildset rather than just failing the one build | 21:47 |
fungi | also i don't know whether that will actually cause ansible processes to terminate, depending on how the build is "stuck" | 21:48 |
*** y2kenny has joined #zuul | 21:49 | |
y2kenny | If I write a new nodepool driver, are there any corresponding change needed on the scheduler side? | 21:57 |
y2kenny | or is everything about the node's life cycle handled within the driver? | 21:58 |
clarkb | y2kenny: I am pretty sure it should all be in the driver | 22:00 |
y2kenny | I've got a prototype driver going and it launches some node so they appear in the web ui | 22:01 |
y2kenny | I try to define a nodeset to use the label, the request seems to be routed to the driver properly | 22:02 |
y2kenny | but then the job got stuck | 22:02 |
y2kenny | and I am trying to figureout what might be missing | 22:02 |
y2kenny | From the nodepool side I see "Fullfilled node request" from NodeRequestHandler | 22:04 |
clarkb | zuulians https://review.opendev.org/#/c/737027/1 is an easy docs update | 22:05 |
clarkb | y2kenny: on the zuul side you should be able to trace the request too to see if it is unhappy with what it got | 22:05 |
clarkb | maybe a missing field or similar in the zookeeper blob | 22:06 |
y2kenny | clarkb: by zuul side do you mean the scheduler? | 22:06 |
y2kenny | clarkb: or another bit? | 22:06 |
fungi | scheduler | 22:07 |
clarkb | y2kenny: the zuul scheduler and executors | 22:07 |
clarkb | the scheduler coordinates the noderequests to nodepool then it hands that data over to the executor to use it in the job | 22:07 |
clarkb | so it could be in either spot | 22:07 |
fungi | at least start with the scheduler, but then yeah if it seems to be engaging an executor check to see if the executor logs say it's having trouble connecting or something | 22:07 |
y2kenny | clarkb: so when the scheduler says "Completed node request" that's the point of handing over to the executor right? (I also see ExecutorClient Execute job... after that.) | 22:11 |
y2kenny | And the executor receiving it is "ExecutorServer"? | 22:13 |
clarkb | y2kenny: yes to the first thing, once node request is completed I would expect the next thing that is done with it is the info is passed to the executor | 22:16 |
clarkb | y2kenny: for your second question what is the context? docker container name? log prefix? | 22:17 |
y2kenny | the context is Executor log | 22:17 |
clarkb | y2kenny: you should be able to grep for the event id of your triggering event across all services and have it show up consistently I think in the logs | 22:17 |
clarkb | the other thing I'll do is grep for the build sha in the executor logs | 22:17 |
y2kenny | clarkb: ok. I am seeing multiple node requests for a single job for some reason | 22:18 |
y2kenny | that seems odd | 22:18 |
y2kenny | clarkb: build sha is not the same as event id right? | 22:20 |
clarkb | y2kenny: correct they are separate. The build sha is associated to the job build side and the event id is from the trigger event input | 22:20 |
fungi | y2kenny: correct, the event id appears in the log as "e: ..." where the ... is some uuid | 22:20 |
fungi | in [] brackets i believe | 22:21 |
fungi | yeah, it'll look like [e: 12c542cdd9d444c5a8d2e256c6691bea] | 22:21 |
fungi | taken from one of our scheduler logs just now | 22:21 |
y2kenny | fungi: does event id get pass to the executor side or only build id? | 22:24 |
clarkb | I thought both did but I could be wrong about that | 22:24 |
fungi | event id in the executor logs should correspond, checking | 22:24 |
*** armstrongs has joined #zuul | 22:26 | |
fungi | y2kenny: confirmed, i can find that exact event i pasted above in one of our executor's debug logs too | 22:26 |
y2kenny | I can associate eventid in both scheduler and nodepool but having trouble finding the corresponding event id in the executor... | 22:26 |
y2kenny | would that be an indication of what is stuck? | 22:27 |
fungi | if no executor ever picked up the job request in the gearman queue, then it wouldn't be logged on any executor | 22:27 |
clarkb | ya that could mean the scheduler is having trouble after the node request is completed but before it gets executed by the executor | 22:28 |
y2kenny | what does the scheduler do with the node request after it is completed? | 22:28 |
y2kenny | does it try to talk to the node in any way? | 22:28 |
*** armstrongs has quit IRC | 22:35 | |
y2kenny | oh and on an unrelated note... does the opendev.org search function sometimes not work? | 22:35 |
fungi | the scheduler never tries to contact the node, it adds the node information to a gearman request which waits in geard's queue for an executor to claim it | 22:42 |
clarkb | y2kenny: ya the search problem is a known issue with gitea, they have plans for fixing it. For that reason we've kept http://codesearch.openstack.org running | 22:44 |
clarkb | we hope that gitea will eventually take over those duties though :/ | 22:44 |
y2kenny | fungi: so is there something in gearman I can examine about noderequest? | 22:44 |
y2kenny | clarkb: that's good to know. Thanks. | 22:45 |
clarkb | guillaumec: left some thoughts on https://review.opendev.org/#/c/732066/22 | 22:45 |
fungi | y2kenny: https://zuul-ci.org/docs/zuul/howtos/troubleshooting.html#gearman-jobs | 22:45 |
y2kenny | fungi: ok. I was using that earlier but I thought those are just stats and counts... I will dig into it deeper | 22:48 |
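Editor's note: the troubleshooting page above boils down to asking geard for its function status over the plain-text gearman admin protocol (the familiar "echo status | nc localhost 4730" one-liner). A small sketch of the same query from Python — each response line is a function name followed by queued jobs, running jobs and registered workers, so an executor:execute entry with queued jobs and zero workers would explain a build that never starts:

```python
import socket

# geard normally listens on port 4730 on the scheduler host.
with socket.create_connection(('localhost', 4730)) as sock:
    sock.sendall(b'status\n')
    data = b''
    while not data.endswith(b'.\n'):   # the listing is terminated by a lone '.'
        chunk = sock.recv(4096)
        if not chunk:
            break
        data += chunk

for line in data.decode().splitlines():
    if line.startswith(('executor:', 'merger:')):
        print(line)
```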
corvus | y2kenny: the scheduler should tell you in the debug logs exactly what it's doing | 22:48 |
clarkb | tobiash: https://review.opendev.org/#/c/730624/ would be a good one to update when you get a chance. Since I think that does address a class of problem with the window sizing | 22:49 |
y2kenny | corvus: from the scheduler log, the last useful thing I see is "INFO zuul.ExecutorClient: [e: c19e087de6f64c4790d16b73a2008946] Execute job x-ipmi (uuid: 25abe404e9134fc7a4529a37f905a90d) on nodes <NodeSet [ for change with dependent changes [{'" | 22:51 |
y2kenny | so reading this again... is the left open bracket of "<NodeSet [" supposed to happen? | 22:51 |
clarkb | y2kenny: Execute job build-openstack-sphinx-docs (uuid: ce224c2d3c35447dbd25d2e05115ac97) on nodes <NodeSet ubuntu-xenial [<Node 0017300685 ('ubuntu-xenial',):ubuntu-xenial>]> for change <- that is what the openstack driver produces | 22:53 |
clarkb | it looks like maybe your node request was fulfilled incompletely? | 22:53 |
y2kenny | clarkb: ok... so looks like there is something wrong with the nodeset being returned | 22:53 |
y2kenny | before that line I got "INFO zuul.nodepool: Setting nodeset <NodeSet [ in use" | 22:54 |
clarkb | y2kenny: maybe double check what ends up in the zookeeper db and ensure that looks correct to you as far as info goes | 22:56 |
clarkb | and if it does let me know what you're looking at and we can probably grab similar from the openstack driver to compare | 22:56 |
y2kenny | clarkb: what's the best way to drill into zookeeper db? (I assume that's different from gearman?) | 22:57 |
*** tosky has quit IRC | 22:59 | |
clarkb | ya there is a tool that helps let me find it | 22:59 |
corvus | zk-shell | 22:59 |
clarkb | y2kenny: https://pypi.org/project/zk-shell/ that tool | 22:59 |
clarkb | it allows you to navigate the db "filesystem" and dump node contents | 22:59 |
y2kenny | clarkb: ok I will take a look. fwiw, "nodepool list" doesn't seem to show an anomaly. | 23:00 |
y2kenny | | 0000000781 | 10.6.198.1 | zuul-nodes-exp | ng0006 | None | None | in-use | 00:00:49:56 | locked | | 23:01 |
corvus | y2kenny: there's a verbose option to nodepool list, but it still may not show everything | 23:01 |
openstackgerrit | Merged zuul/zuul master: gitlab - add driver documentation https://review.opendev.org/733880 | 23:02 |
y2kenny | corvus: oh... that dumped a lot of info... (I used it with --debug) | 23:02 |
fungi | y2kenny: the stray "[" there should be a nodeset name, so i wonder if you have a yaml parsing issue (reading a string where you intended a list maybe?) | 23:04 |
y2kenny | fungi: that would be part of the job definition? | 23:05 |
fungi | also the change id seems to be empty and the dependent changes list looks like an opening to a list of dictionaries | 23:05 |
*** rlandy|ruck is now known as rlandy|ruck|bbl | 23:05 | |
fungi | y2kenny: yeah it would probably be the nodeset parameter of that job definition, or of a parent job | 23:06 |
y2kenny | fungi: the change id I just truncated what I copied. sorry about that one. | 23:06 |
fungi | ahh, okay | 23:06 |
fungi | it was hard to tell | 23:06 |
y2kenny | clarkb: so I have zk-shell hooked up and I used the tree to look at things | 23:10 |
y2kenny | is there a particular branch I should go into? | 23:11 |
clarkb | y2kenny: it will be something like /nodepool/node-requests/something/something | 23:11 |
clarkb | I always have to explore a bit when I do it | 23:11 |
clarkb | tobiash: left +2 on https://review.opendev.org/#/c/710034/ but didn't approve as I think my comments are worth considering before we land that | 23:12 |
clarkb | tobiash: but I think its mostly cosmetic things that I called out | 23:12 |
clarkb | but since it is a mostly cosmetic change anyway I figured those were worth a look | 23:12 |
y2kenny | clarkb: I see /nodepool/nodes/<Id>/lock/<some other stuff...> | 23:13 |
clarkb | hrm if you don't have active requests they may not be there? | 23:13 |
clarkb | it's the node request completing which maps onto the nodes path above | 23:14 |
clarkb | I think you want to check your node requests as they are completed | 23:14 |
clarkb | may be easier to log that in the services due to the cleanup race? | 23:14 |
y2kenny | but doesn't this mean the node request was completed successfully? | 23:15 |
clarkb | y2kenny: I think technically it was, what I think may be happening is that the contents of that successful transition aren't complete enough for zuul to figure out which nodes to use | 23:16 |
clarkb | its successful according to the protocol, but probably not when actually needing to use those nodes | 23:16 |
*** rfolco has joined #zuul | 23:18 | |
y2kenny | clarkb: ok... I am not sure how to capture that. And the driver doesn't control that transition does it? (like... I don't recall implementing something that flips the state from "ready" to "in-use") | 23:19 |
clarkb | y2kenny: the driver signals that it is done then the scheduler takes over from there. I think what you want to do is log the contents of the node request when the driver is marking the request as completed | 23:19 |
y2kenny | I assume everything the scheduler/executor needs is in zk.Node | 23:19 |
clarkb | y2kenny: the scheduler looks at the node request to figure out what zk Nodes to use | 23:20 |
clarkb | the node request completes with a list of nodes in it that fulfill the request | 23:20 |
clarkb | and it almost seems like maybe that is the bit that is missing | 23:20 |
y2kenny | clarkb: OOhhh kay... this is the launchesComplete of NodeRequestHandler? | 23:21 |
clarkb | y2kenny: its actually https://opendev.org/zuul/nodepool/src/branch/master/nodepool/driver/__init__.py#L707-L719 which is generic to all drivers so maybe that isn't the issue | 23:26 |
clarkb | based on the log you have from the scheduler it seems the request didn't provide complete node(request) data | 23:27 |
clarkb | and you'll need to work backward from there | 23:27 |
y2kenny | ok | 23:27 |
y2kenny | clarkb: I am also comparing the zk node entry and see if I miss something (comparing with a working driver entry like k8s) | 23:28 |
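Editor's note: to capture what clarkb suggests — the contents of the node request at the point the driver hands it back — one option is a debug hook in the prototype handler. This is a hedged sketch only: the import path and the attribute names (self.request, self.nodeset) follow the generic handler in nodepool/driver/__init__.py, and ExampleNodeRequestHandler plus its completion logic are placeholders for whatever the prototype driver actually does:

```python
import logging

from nodepool.driver import NodeRequestHandler  # assumed base class location

log = logging.getLogger('nodepool.driver.example')


class ExampleNodeRequestHandler(NodeRequestHandler):  # hypothetical prototype handler
    def launchesComplete(self):
        # ... the prototype driver's existing completion check goes here ...
        # Before reporting the request as fulfilled, log exactly what the
        # scheduler will see: the request id, the node ids attached to it,
        # and the state of each node in the handler's nodeset.
        log.debug('Request %s fulfilled with nodes %s (states: %s)',
                  self.request.id,
                  self.request.nodes,
                  [(n.id, n.state) for n in self.nodeset])
        return True
```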
*** hamalq has quit IRC | 23:37 | |
clarkb | corvus: mhu tristanC left a couple of nits on https://review.opendev.org/#/c/734134/ will defer to others if that should be approved as is or if we should clean that stuff up | 23:55 |
clarkb | tristanC: ^ you approved the child so maybe you want to just approve that one if the nits are minor enough | 23:56 |