Wednesday, 2022-09-07

@jim:acmegating.comClark: could automate with a script to grab job-output.json from every build00:00
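A minimal sketch of the kind of script suggested above, assuming the Zuul REST API builds endpoint (`/api/tenant/<tenant>/builds`) and assuming each build's `job-output.json` sits at the root of its `log_url`. The API host and tenant name are placeholders, not values from this conversation.

```python
import json
import urllib.request

# Placeholder values for illustration; replace with your own deployment.
ZUUL_API = "https://zuul.example.org/api/tenant/example-tenant"


def job_output_urls(builds):
    """Return the job-output.json URL for every build that has a log_url."""
    urls = []
    for build in builds:
        log_url = build.get("log_url")
        if not log_url:
            # Builds that never uploaded logs have no log_url; skip them.
            continue
        # Assumption: job-output.json is stored at the root of the log dir.
        urls.append(log_url.rstrip("/") + "/job-output.json")
    return urls


def fetch_builds(api_base=ZUUL_API, limit=50):
    """Fetch recent builds from the Zuul REST API builds endpoint."""
    with urllib.request.urlopen(f"{api_base}/builds?limit={limit}") as resp:
        return json.load(resp)
```

Calling `job_output_urls(fetch_builds())` against a real deployment would give the list of files to download; the fetch is kept separate so the URL logic can be checked without network access.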
-@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: [zuul/zuul-jobs] 856215: Improve stage-output functional test https://review.opendev.org/c/zuul/zuul-jobs/+/85621500:05
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul] 856214: [wip] zuul-stream : try using !127.0.0.1 for loopback https://review.opendev.org/c/zuul/zuul/+/85621400:08
-@gerrit:opendev.org- Ian Wienand proposed:00:48
- [zuul/zuul] 853208: zuul-stream : Test against a Python 2.7 container https://review.opendev.org/c/zuul/zuul/+/853208
- [zuul/zuul] 856214: zuul_stream : Use !127.0.0.1 for loopback https://review.opendev.org/c/zuul/zuul/+/856214
- [zuul/zuul] 856217: [dnm] zuul-stream multi-container testing https://review.opendev.org/c/zuul/zuul/+/856217
@iwienand:matrix.orgcorvus: https://review.opendev.org/c/zuul/zuul-jobs/+/851334 and https://review.opendev.org/c/zuul/zuul-jobs/+/852932 are two zuul-jobs changes; I think 852932 is a real bug in the zuul_azure_storage uploader.  i found that when working on the follow-on https://review.opendev.org/c/zuul/zuul-jobs/+/851289/ which runs an old version of ansible-lint that is compatible with ansible-2.8.  it turns out it wasn't being detected in our current linters pass due to what seems a regression in ansible-lint (https://github.com/ansible/ansible-lint/issues/2283)01:56
@iwienand:matrix.orgit's unclear if 851289 (running this old version of ansible-lint with 2.8) is necessary.  i was worried if we're using a modern ansible-lint (that pulls in like ansible 2.12) we could commit things that don't work with 2.8 and not detect it.  But I don't know if we want to keep running an old unsupported version of ansible-lint -- it's more likely to trip up on valid things.  01:57
@iwienand:matrix.organsible-lint has made a couple of 6.5 series point-releases, but none of them fix it missing "skip_ansible_lint" tags (https://github.com/ansible/ansible-lint/issues/2320 ) -- hence they all fail to lint zuul-jobs as it won't recognise our skips 02:00
@iwienand:matrix.orgssbarnea: ^ fyi02:00
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/nodepool] 853914: nodepool-functional-openshift: update nodpeool launcher to Fedora 36 https://review.opendev.org/c/zuul/nodepool/+/85391402:08
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/nodepool] 853914: nodepool-functional-openshift: update nodepool launcher to Fedora 36 https://review.opendev.org/c/zuul/nodepool/+/85391402:09
@iwienand:matrix.org^ that at least works to remove the dependency on fedora-3503:05
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul-jobs] 856225: upload-npm : support authToken argument https://review.opendev.org/c/zuul/zuul-jobs/+/85622503:30
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul-jobs] 856227: upload-npm: use latest nvm https://review.opendev.org/c/zuul/zuul-jobs/+/85622704:37
-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] 856246: Don't run cleanup playbooks after setup failure https://review.opendev.org/c/zuul/zuul/+/85624608:45
@avass:vassast.orgThe metastatic driver looks interesting to split a node into multiple smaller workers, but the slot information doesn't seem to be included in ansible so the job would have colliding workspaces :/09:04
@seyeongkim:matrix.orghello, can job.vars be treated as environment variables? I would like to specify UPPER_CONSTRAINTS_FILE for tempest, and for xena only, e.g. https://pastebin.ubuntu.com/p/9bd8tqybzp/ would this work?09:25
@fungicide:matrix.orgSeyeong Kim: your example is just setting an ansible variable. you'd still have to do something like template that into a shell script and source that to set it in a shell environment or as an environment entry in an ansible shell/command task, i think. but really, you shouldn't have to do that yourself, we have actual variables for the standard roles that do it for you already. the tox role will accept `tox_constraints_file` pointing to the path of the constraints file you had zuul checkout for you, like this: https://opendev.org/openstack/openstack-zuul-jobs/src/branch/master/zuul.d/jobs.yaml#L6012:11
@fungicide:matrix.organd for documentation on the tox role where that variable is explained, see here: https://zuul-ci.org/docs/zuul-jobs/python-roles.html#rolevar-tox.tox_constraints_file12:13
@fungicide:matrix.orgif that telemetry-tempest-base job you're inheriting from is the one in the openstack/telemetry-tempest-plugin on opendev, it probably inherits from something which includes the tox role from zuul-jobs and will therefore use that variable (the ancestry of that job is so many layers across so many repositories that i don't personally have time to analyze them all in order to work out exactly where though)12:26
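The distinction drawn above (a job variable is not an environment variable until something explicitly passes it into a process environment) can be illustrated with a small Python analogy. This is only a sketch of the concept, not how Zuul or Ansible implement it; `job_vars` and `run_with_env` are hypothetical names, and the path is made up.

```python
import os
import subprocess
import sys

# Hypothetical job variables, standing in for what job.vars would define.
job_vars = {"upper_constraints_file": "/tmp/upper-constraints.txt"}


def run_with_env(extra_env):
    """Run a child process and return what it sees for UPPER_CONSTRAINTS_FILE."""
    env = dict(os.environ)
    env.pop("UPPER_CONSTRAINTS_FILE", None)  # make sure we start without it
    env.update(extra_env)
    code = "import os; print(os.environ.get('UPPER_CONSTRAINTS_FILE', 'unset'))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

Calling `run_with_env({})` shows the child never sees the value just because it exists in `job_vars`; it only appears once it is passed explicitly, as in `run_with_env({"UPPER_CONSTRAINTS_FILE": job_vars["upper_constraints_file"]})`. In a job, roles such as tox with `tox_constraints_file` do that hand-off for you.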
@fungicide:matrix.orgyou'll probably have more luck asking for more project-specific suggestions in the #_oftc_#openstack-qa:matrix.org channel or maybe #_oftc_#openstack-telemetry:matrix.org12:29
@fungicide:matrix.orgmmm, looks like element did some funky interpretation of the channel names there, probably not clickable. anyway #openstack-qa or #openstack-telemetry through matrix.org's oftc bridge12:30
@noonedeadpunk:matrix.orgI have a question to operators mostly - how do you test post jobs that should be delegated to another host? I've tried to make them as "check" ones, but I do have troubles with auth, since project-key added to ssh-agent only for post job. and nodepool key can't be used as it's wiped out with add-build-sshkey role in pre-step of base job. So there should be something to add_host that will be added to the base, which is weird and doesn't make sense kind of (except it's test-base ofc...) 12:31
But I was wondering if there's any best practise for this kind of thing?
@fungicide:matrix.orgDmitriy Rabotyagov: all of the oftc irc network is bridged via matrix.org12:31
@noonedeadpunk:matrix.orgYeah, but I don't see this convo in IRC and seems my messages are not here either12:31
@fungicide:matrix.orgDmitriy Rabotyagov: right, this channel is a matrix channel, not a bridged irc channel12:31
@noonedeadpunk:matrix.orgaha, ok, gotcha12:32
@fungicide:matrix.orgi thought you were referring to my earlier attempts to link the oftc matrix bridged versions of the openstack qa and telemetry channels12:32
@noonedeadpunk:matrix.orgI do recall you worked on that but I wasn't able to recall where that ended, so... 12:33
@fungicide:matrix.orgworked on what?12:33
@fungicide:matrix.orgi mean referring to my comments about those irc channels moments ago when replying to Seyeong Kim 's question12:34
@noonedeadpunk:matrix.orgah, no, I was not :)12:34
@fungicide:matrix.orgthis conversation is getting very meta though, so probably best to move on12:34
@noonedeadpunk:matrix.orgSo yeah, my original challenge was about how to ensure post job actually works12:35
@noonedeadpunk:matrix.orgAs of now I'm quite stuck with auth to the 3rd party host12:36
@fungicide:matrix.orgas for your actual question, testing post-merge jobs pre-merge... yes access to secrets is problematic since zuul restricts those to post-review pipelines (which post-merge pipelines generally are as well). you may be able to get by adding them to a pre-merge post-review pipeline like a "gate" or something, though in opendev we've generally avoided that as well12:38
@fungicide:matrix.orgthe underlying challenge is that pre-review use of secrets risks someone exfiltrating them before a reviewer has a chance to check that changes to how they're accessed and used are safe12:39
@fungicide:matrix.orgone thing we do is have our check and gate jobs create stand-in replacements for those systems with disposable credentials and exercise with them instead of the real system the post job would connect to12:41
@noonedeadpunk:matrix.orgok, yes, fair. then it's not an option to add_host to in-memory inventory as well12:41
@fungicide:matrix.orgof course, in opendev we're lucky that we insist on open source systems, so we have the luxury of being able to just install new versions of them onto temporary test nodes12:42
@noonedeadpunk:matrix.orgWell, I'm exercising with internal zuul deployment right now. 12:43
@noonedeadpunk:matrix.organd review process would mean disturbing ppl from their work for each patchset12:43
@fungicide:matrix.orgyeah, we have zuul deployment test jobs which basically stand up an entire zuul and associated suite of services on test nodes and interact with those12:44
@noonedeadpunk:matrix.orgSo setting up a sandbox is what I tried to do, but I tried to do that with the check pipeline12:44
@avass:vassast.orgfungi: does opendev have any cpu metrics for zuul anywhere?12:44
@noonedeadpunk:matrix.orgAnd I can't access this sandbox VM....12:44
@fungicide:matrix.orgAlbin Vass: yes, we have https://cacti.openstack.org/12:44
@fungicide:matrix.orgAlbin Vass: and also some dashboards for zuul and nodepool on https://grafana.opendev.org/12:45
@noonedeadpunk:matrix.orgI guess you meant `http://` for cacti12:45
@fungicide:matrix.orgDmitriy Rabotyagov: yep, thanks. corrected. that should hopefully get replaced by a prometheus system soon-ish12:47
@avass:vassast.org> <@fungicide:matrix.org> Albin Vass: yes, we have http://cacti.openstack.org/12:47
thanks!
@sameer.deshpande:matrix.orghi , I need some help on resolving issue with installation of nodepool builder . Is this the right forum to ask queries ?13:05
@clarkb:matrix.org> <@sameer.deshpande:matrix.org> hi , I need some help on resolving issue with installation of nodepool builder . Is this the right forum to ask queries ?13:22
Yes, this is the correct location for questions about zuul and nodepool.
@clarkb:matrix.orgDmitriy Rabotyagov: another approach we use is to share as much of the content of jobs between pre and post merge. For example in opendev/system-config the system-config-run-* jobs share the same playbooks as infra-prod-service-*13:23
@clarkb:matrix.orgThat doesn't get us complete coverage but it is decent. Particularly since those playbooks are what tend to change 13:24
@avass:vassast.orgfungi: how many cores do opendev's schedulers have?13:35
@fungicide:matrix.orgAlbin Vass: virtual machine reporting 4 cores at 2.6ghz (intel xeon e5-2650) for zuul01, though zuul02 has 8 cores (other processor details look the same though)13:39
@fungicide:matrix.orgalso no idea how oversubscribed they are on the hypervisor since they're just running in a public cloud13:39
@fungicide:matrix.orgwe'll likely downsize zuul02 the next time we replace it, to be in line with the size of zuul01 (01 is newer at the moment)13:39
@alex.disp:matrix.orgHi everyone. I have a question about resources requirements for Zuul and its components: my team is currently trying to decide what to use for CI and we just can't find any documentation related to Zuul minimum/recommended hardware configuration and scaling requirements. For example we would like to have some info like "Zuul can support *number of jobs* on *number of CPU* and *RAM size*". There's currently no definite info how much CPU and RAM each component requires, and if we want to scale Zuul executors with kubernetes - when do we start to scale? What number of CPUs and RAM size is recommended for one instance of each executor?14:22
@jim:acmegating.comDisp: it varies greatly based on the jobs that are actually running.  opendev manages to get about 80-100 jobs out of an 8g/8vcpu executor, others maybe half that.  and to be clear, that's not including the actual test nodes which are obviously external.14:26
@jim:acmegating.comthe values also depend on the rate of jobs.  starting a job is more expensive for an executor14:27
@jim:acmegating.comDisp: zuul will take executors offline if they get too busy.  when that happens too much, add more.14:27
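As a rough worked example of the figures quoted above (about 80-100 jobs on an 8g/8vcpu executor for opendev, maybe half that elsewhere), the following sketch estimates an initial executor count. The function name and the default of 80 are illustrative assumptions; real capacity depends on the jobs and their start rate, and the practical signal to add more is Zuul taking executors offline too often.

```python
import math


def executors_needed(peak_concurrent_jobs, jobs_per_executor=80):
    """Estimate executor count for a given peak of concurrently running jobs.

    jobs_per_executor defaults to the low end of the opendev figure quoted
    in the discussion; use a smaller number for heavier workloads.
    """
    if peak_concurrent_jobs < 0 or jobs_per_executor <= 0:
        raise ValueError("job counts must be non-negative and capacity positive")
    # Always keep at least one executor, and round partial executors up.
    return max(1, math.ceil(peak_concurrent_jobs / jobs_per_executor))
```

For instance, `executors_needed(200)` rounds 2.5 up to 3 executors, while the more pessimistic `executors_needed(200, jobs_per_executor=40)` gives 5. Test nodes are not included in either number.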
@alex.disp:matrix.orgOk, thank you. And what about other components? Would be nice to have a spreadsheet for each of them, because some are not scalable and we would need to allocate enough resources for them from the start.14:30
@jim:acmegating.comall are scalable14:30
@alex.disp:matrix.orgI think documentation says that scheduler is not scalable.14:31
@jim:acmegating.comDisp: if you can point me at that so I can fix it, I would appreciate it 14:31
@jim:acmegating.comDisp: the entire system can run in a few gb of ram on a laptop -- it scales down and up.  so it's hard to give universal guidance there.  the minimum requirement is literally something like 4gb for all of the components running on one host.14:33
@jim:acmegating.comactually the tutorial says 2gb of ram: https://zuul-ci.org/docs/zuul/latest/tutorials/quick-start.html14:34
@jim:acmegating.comDisp:  both https://zuul-ci.org/docs/zuul/latest/components.html and https://zuul-ci.org/docs/zuul/latest/configuration.html#scheduler indicate the scheduler is scalable.  do please let me know where you see that it isn't so we can fix it.14:37
@jim:acmegating.comDisp: you can see stats from opendev here: https://grafana.opendev.org/d/21a6e53ea4/zuul-status?orgId=1  (not super busy right now)14:38
@jim:acmegating.comand individual host graphs (like cpu/ram usage) here: https://grafana.opendev.org/d/21a6e53ea4/zuul-status?orgId=114:39
@alex.disp:matrix.org> <@jim:acmegating.com> Disp: if you can point me at that so I can fix it, I would appreciate it14:39
Hm, for some reason I can't find the page I saw it on. I remember the page with all components listed and there was mention that executor and merger are scalable but scheduler is not. Maybe it was some old documentation?
@jim:acmegating.comthat was probably it14:39
@jim:acmegating.comthat was the last component we made scalable14:40
@jim:acmegating.com(caveat: nodepool launchers are scalable, but only by adding more providers)14:40
@alex.disp:matrix.orgAll right, thank you very much for your response!14:59
@jim:acmegating.comDisp: you're welcome!  enjoy zuul :)15:04
-@gerrit:opendev.org- Thomas Cardonne proposed wip: [zuul/zuul] 856294: feat(elasticsearch): support datastreams and rollover aliases https://review.opendev.org/c/zuul/zuul/+/85629415:14
-@gerrit:opendev.org- Thomas Cardonne proposed wip: [zuul/zuul] 856294: feat(elasticsearch): support datastreams and rollover aliases https://review.opendev.org/c/zuul/zuul/+/85629415:23
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 855801: Add nodeset alternatives https://review.opendev.org/c/zuul/zuul/+/85580116:31
@jim:acmegating.comClark: swest do my responses on https://review.opendev.org/849887 address your questions?16:37
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 855801: Add nodeset alternatives https://review.opendev.org/c/zuul/zuul/+/85580116:42
@westphahl:matrix.orgcorvus: yes, sorry, forgot to reply.16:50
@clarkb:matrix.orgcorvus: swest I resolved all but two of the threads I started. On the one I expect swest to respond and the other is the tenant access for building images without admin approval. It sounds like we're good with making zuul support access control to that then separately can discuss whether opendev can enable it on a per tenant basis. I think as a zuul admin being able to prevent that for a tenant is important though and want to make sure zuul supports that need. I didn't want to resolve it until I was sure that was the way we were leaning16:56
@clarkb:matrix.orgfungi: do you want to reapprove https://review.opendev.org/c/zuul/zuul-jobs/+/855402/ now? The depends on in devstack came through clean: https://review.opendev.org/c/openstack/devstack/+/85621116:57
@westphahl:matrix.orgClark: sorry reply/resolve did not work. Should be good now17:03
@clarkb:matrix.orgswest: no problem at all. The resolve option is one that I'm still learning to use myself17:05
@jim:acmegating.comClark: replied thanks.  i'm on board with the tenant config option to disable image builds.  i think that's a pretty minor clarification/change to the spec and i think we can get away without pushing a new rev.  but i'm happy to add that in and respin if desired.17:11
@jim:acmegating.com(i think in my head i was assuming allowed-triggers would be sufficient, but that's not high enough resolution so we do need something new.  either way, it's simple to do.)17:12
@clarkb:matrix.orgcorvus: I'm ok without a respin as long as we're all on board with making it a thing :)17:17
@jim:acmegating.comianw: you had a -1 on an earlier ps of https://review.opendev.org/849887 which i am confident has been resolved in later patchsets, but it would be great if you can confirm that17:24
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 856307: Fix race in test_periodic_freeze_job_failure https://review.opendev.org/c/zuul/zuul/+/85630717:30
@jim:acmegating.comzuul-maint: ^ that should make it much easier to merge changes :)17:30
@clarkb:matrix.orgI'm going to go ahead and approve the stage-output speed up change in the next little bit if I don't hear any objections. It is reasonably well tested at this point and a primary consumer is happy with it via depends on17:55
@clarkb:matrix.orgIf necessary I have no issues with reverting it as well17:56
@clarkb:matrix.orgHearing no objections I have approved it18:08
-@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: [zuul/zuul-jobs] 855402: Speed up log file fetching tasks https://review.opendev.org/c/zuul/zuul-jobs/+/85540218:19
@mbecker12:matrix.orgHi, I left a comment on my patch https://review.opendev.org/c/zuul/nodepool/+/853993 recently regarding the tests. corvus maybe you could have another look if you have a minute :)18:28
@jpew:matrix.orgI have some jobs that very rarely have a POST_FAILURE that prevents the logs from being copied to the log server; does anyone have a suggestion as to how to figure out what the problem is?18:39
@clarkb:matrix.orgjpew: the executor logs are usually where I start when the job itself doesn't have logs. You can grep the executor debug log by the build uuid to get all the info from it that is available from the ansible control side18:43
@jpew:matrix.orgYa, I think I have the debug logs turned off18:52
@jpew:matrix.orgI guess I can turn them back on18:52
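The "grep the executor debug log by the build uuid" step can be sketched in Python as below. The log path is deployment-specific and not given in this conversation, so it is left as a parameter; the function names are hypothetical.

```python
def lines_for_build(log_lines, build_uuid):
    """Yield only the log lines that mention the given build UUID."""
    for line in log_lines:
        if build_uuid in line:
            yield line.rstrip("\n")


def grep_executor_log(path, build_uuid):
    """Return all lines for one build from an executor debug log file."""
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with open(path, encoding="utf-8", errors="replace") as log_file:
        return list(lines_for_build(log_file, build_uuid))
```

This only helps if debug logging is enabled on the executor, as noted in the exchange above.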
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 856317: Add option to include returned data in MQTT reporter https://review.opendev.org/c/zuul/zuul/+/85631718:54
-@gerrit:opendev.org- Tristan Cacqueray proposed: [zuul/zuul] 856321: Add initial telemetry tracing to the executor component https://review.opendev.org/c/zuul/zuul/+/85632119:19
-@gerrit:opendev.org- Thomas Cardonne proposed wip: [zuul/zuul] 856294: feat(elasticsearch): support datastreams and rollover aliases https://review.opendev.org/c/zuul/zuul/+/85629419:34
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 856307: Fix race in test_periodic_freeze_job_failure https://review.opendev.org/c/zuul/zuul/+/85630720:40
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 855291: Fix and improve Keycloak tutorial https://review.opendev.org/c/zuul/zuul/+/85529120:57
-@gerrit:opendev.org- Zuul merged on behalf of Ian Wienand:20:57
- [zuul/zuul] 854284: zuul-tox-remote: capture callback errors in remote zuul jobs https://review.opendev.org/c/zuul/zuul/+/854284
- [zuul/zuul] 854285: zuul_stream: handle failed_when tasks https://review.opendev.org/c/zuul/zuul/+/854285
@jim:acmegating.comzuul-maint on the etherpad at https://etherpad.opendev.org/p/qF7VE9HzqPVzsCyZLWxb i originally put "add ansible 6" after 6.4.0 and before 7.0.0, but i think it may make more sense before 6.4.0 so that people who want to jump from 2.x to 6 can do so.  then we remove 2.x in 7.0.0.  does that make sense?21:09
@jim:acmegating.com(i updated the etherpad with the proposed change if you want to look there)21:09
@clarkb:matrix.orgcorvus: yes, I don't think it hurts to add more ansible earlier21:10
@clarkb:matrix.orgit's only the removals that we need to be careful with21:10
@jim:acmegating.comyeah.  and addition of an ansible in a minor rev makes sense to me ("new feature")21:11
@jim:acmegating.comso now that we have 6.3.0 out of the way, doing that in 6.4.0 seems nice21:11
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed:21:31
- [zuul/zuul] 856337: Add semaphores to REST API https://review.opendev.org/c/zuul/zuul/+/856337
- [zuul/zuul] 856338: WIP: Add semaphore support to web UI https://review.opendev.org/c/zuul/zuul/+/856338
@iwienand:matrix.org> <@jim:acmegating.com> ianw: you had a -1 on an earlier ps of https://review.opendev.org/849887 which i am confident has been resolved in later patchsets, but it would be great if you can confirm that22:09
thanks, LGTM; hopefully i'll find some ways to help, particularly extracting dib testing a bit
@jim:acmegating.com> <@iwienand:matrix.org> thanks, LGTM; hopefully i'll find some ways to help, particularly extracting dib testing a bit22:13
thanks! i won't be shy! :)
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 854009: Fix Ansible version testing https://review.opendev.org/c/zuul/zuul/+/85400922:32
-@gerrit:opendev.org- Zuul merged on behalf of Ian Wienand: [zuul/zuul] 853603: zuul_stream: handle non-string msg value https://review.opendev.org/c/zuul/zuul/+/85360322:32
@jim:acmegating.comClark: would you mind a quick review of ianw's change at https://review.opendev.org/856214 ? i'd like to go ahead and approve that with its parent22:35
@clarkb:matrix.orgcorvus: can do22:36
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 853548: Deprecate Ansible 2, make Ansible 5 default https://review.opendev.org/c/zuul/zuul/+/85354822:37
@clarkb:matrix.orgcorvus: should I approve them now or do you want to wait for the recheck to come back?22:39
@jim:acmegating.comClark: i think approve is ok -- the stream tests passed22:39
@clarkb:matrix.orgdone22:40
@jim:acmegating.comoh hrm22:40
@jim:acmegating.comthere's a funny looking unit test failure on that; i'll dbl check locally22:40
@clarkb:matrix.orgok22:40
@jim:acmegating.com99.9% sure that's not related to the change and is just a really unfortunate race22:43
@jim:acmegating.comone of the fingergw components started with a public port of 0, so i'm guessing it's a race in the unit test fixture startup22:44
@jim:acmegating.comprobably another component's component registry hadn't updated in time22:45
@jim:acmegating.comi think we're good22:45
@jim:acmegating.comsomeone should be able to make that test a little more robust if we need to22:46
