Wednesday, 2019-09-18

*** armstrongs has joined #zuul00:04
*** armstrongs has quit IRC00:13
*** jamesmcarthur has joined #zuul00:42
*** jamesmcarthur has quit IRC01:07
*** kerby has quit IRC01:55
openstackgerritMohammed Naser proposed zuul/zuul-operator master: Create zookeeper operator  https://review.opendev.org/67645802:08
*** jamesmcarthur has joined #zuul02:22
*** bhavikdbavishi has joined #zuul02:33
*** roman_g has quit IRC02:35
*** jamesmcarthur has quit IRC02:38
*** bhavikdbavishi1 has joined #zuul02:40
*** bhavikdbavishi has quit IRC02:42
*** bhavikdbavishi1 is now known as bhavikdbavishi02:42
*** jamesmcarthur has joined #zuul02:47
*** noorul has joined #zuul03:12
noorulhi03:12
noorulI am seeing the following error03:12
noorulSending result: {"result": "DISK_FULL", "warnings": [], "data": {}}03:12
noorulBut plenty of disk space is available on the node where zuul is running and on the slave nodes03:13
*** noorul has quit IRC03:18
*** jamesmcarthur has quit IRC03:24
*** jamesmcarthur has joined #zuul03:36
*** noorul has joined #zuul03:41
noorulI see this warning too03:45
noorulWARNING zuul.ExecutorDiskAccountant: /var/lib/zuul/builds/e27b42f0eff140bcb66b0ccb050e5b81 is using 280MB (limit=250)03:45
noorulIs this related?03:45
clarkbyes, there is a per-job disk limit and you are hitting that. There is a config option to make the limit larger03:47
clarkbit is there to avoid jobs filling disks on your executors and log servers03:47
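(For reference: the limit clarkb describes is the disk_limit_per_job option in the [executor] section of zuul.conf; the value is in megabytes and 250 is the default.)

    [executor]
    disk_limit_per_job = 1024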
*** jamesmcarthur has quit IRC03:50
*** noorul has quit IRC03:59
openstackgerritIan Wienand proposed zuul/zuul master: zuul_console: fix python 3 support  https://review.opendev.org/68255603:59
openstackgerritIan Wienand proposed zuul/zuul master: Support nodes setting 'auto' python-path  https://review.opendev.org/68227503:59
*** bolg has joined #zuul03:59
mnaserhttps://review.opendev.org/#/c/676458/704:40
mnaserWhat's the best way to consume the image built by the dependant job?04:41
mnaser(and should we just build the image during the test run rather than out in a job before it?)04:41
*** pcaruana has joined #zuul04:46
*** noorul has joined #zuul04:48
mnaserI wonder why the image is not being found05:02
noorulOn my slave node I am trying to use docker to run unit tests. What is the normal practice for this? Should we create a user in the docker image, or should root be used? When I have a user, the mount is not working05:31
openstackgerritIan Wienand proposed zuul/nodepool master: Set default python-path to "auto"  https://review.opendev.org/68279705:57
*** avass has joined #zuul06:13
*** igordc has joined #zuul06:17
*** igordc has quit IRC06:19
*** avass has quit IRC06:32
*** shachar has quit IRC06:35
*** snapiri has joined #zuul06:35
*** bolg has quit IRC06:50
*** roman_g has joined #zuul07:00
flaper87tobiash: how do you do the restart of the scheduler pod when you add a new project to the tenant? I've a configmap with the tenant config that is mounted in the scheduler's POD. I'd like to have a (safe?) way to signal the scheduler POD when the configmap changes07:04
flaper87I guess I could have a custom startup script07:04
flaper87but I'd rather use a more k8s method if there's one07:04
*** themroc has joined #zuul07:07
*** hashar has joined #zuul07:13
*** noorul has quit IRC07:17
*** tosky has joined #zuul07:18
mordredmnaser: the zuul images jobs (with the opendev-buildset-registry job) should make the image stuff just work - as well as image promotion so that the image built in the gate is what gets published. it's a whole thing, but it works really well and works well with depends-on (in your copious free time we should make sure your zuul(s) are set up so that people can do speculative image jobs, because they're07:19
mordredmindblowingly awesome, but they do take work from the zuul operator)07:19
mordredmnaser: if you're having issues with images not showing up, let's for sure figure that out07:19
*** bolg has joined #zuul07:23
*** sshnaidm|pto is now known as sshnaidm|rover07:24
tobiashflaper87: you can reload the scheduler without restart07:24
flaper87tobiash: yeah, but that requires getting into the pod and reloading the scheduler. was hoping to have that done automagically when the configmap is updated07:25
tobiashflaper87: in that case you could run a helper script using inotify to automate that07:26
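(A minimal sketch of such a helper, assuming inotify-tools is available in the pod and the tenant config is mounted at /etc/zuul/tenant — both names are illustrative. Kubernetes updates a mounted configmap by atomically swapping a symlink, which surfaces as a moved_to event on the directory.)

    #!/bin/sh
    # Watch the mounted configmap and trigger a full reconfiguration
    # whenever kubernetes swaps in new content.
    while inotifywait -e create,moved_to,delete_self /etc/zuul/tenant; do
        zuul-scheduler full-reconfigure
    done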
mordredflaper87: I think you're yearning for the distributed scheduler work to be completed07:27
flaper87coolio, that's what I was going to do but I wanted to know if there was a more native way to do it.07:27
*** gtema_ has joined #zuul07:27
flaper87mordred: oh, there's an ongoing work to make the scheduler distributed?07:27
flaper87I'd very much love that07:27
*** mmedvede has quit IRC07:31
mordredflaper87: yeah. https://review.opendev.org/#/c/621479/ is the spec07:32
*** arxcruz has quit IRC07:32
*** mmedvede has joined #zuul07:34
*** arxcruz has joined #zuul07:36
*** jpena|off is now known as jpena07:41
*** gtema_ has quit IRC07:45
*** gtema_ has joined #zuul07:45
*** AJaeger has quit IRC08:14
*** recheck has quit IRC08:20
*** noorul has joined #zuul08:22
noorulWhen the job log crossed the size limit, it failed with a DISK_FULL error message08:23
noorultestjob finger://aruba-virtual-machine/e27b42f0eff140bcb66b0ccb050e5b81 : DISK_FULL in 0s08:23
noorulI am not sure what the use of the finger URL is08:23
*** recheck has joined #zuul08:24
*** AJaeger has joined #zuul08:27
openstackgerritMerged zuul/zuul master: Add no-jobs reporter action  https://review.opendev.org/68127808:36
openstackgerritMerged zuul/zuul master: Add report time to item model  https://review.opendev.org/68132308:50
*** pcaruana has quit IRC08:57
*** pcaruana has joined #zuul09:01
mgoddardFYI, had a weird issue in the persistent-firewall role: https://storage.gra1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_7dd/681446/7/gate/ironic-tempest-ipa-wholedisk-direct-tinyipa-multinode/7dd6514/job-output.txt09:06
openstackgerritMerged zuul/zuul master: Add Item.formatStatusUrl  https://review.opendev.org/68132409:11
*** panda|ruck|off is now known as panda|ruck09:37
*** hashar has quit IRC09:41
*** gtema_ has quit IRC09:46
*** gtema_ has joined #zuul09:46
*** noorul has quit IRC09:49
*** bhavikdbavishi has quit IRC09:52
openstackgerritFabien Boucher proposed zuul/zuul master: Pagure - fix wrong commit gitweb url  https://review.opendev.org/67994610:02
openstackgerritFabien Boucher proposed zuul/zuul master: Pagure - handle initial comment change event  https://review.opendev.org/68031010:02
openstackgerritFabien Boucher proposed zuul/zuul master: Pagure - handle Pull Request tags (labels) metadata  https://review.opendev.org/68105010:02
openstackgerritFabien Boucher proposed zuul/zuul master: Pagure - reference pipelines add open: True requirement  https://review.opendev.org/68125210:02
openstackgerritFabien Boucher proposed zuul/zuul master: Pagure - handles pull-request.closed event  https://review.opendev.org/68127910:02
*** openstackgerrit has quit IRC10:06
*** zbr has quit IRC10:11
*** zbr has joined #zuul10:12
*** noorul has joined #zuul10:21
fbocorvus: tristanC mordred regarding Pagure, I've rebased the patch chain and removed the git.tag.creation patch. The chain should be ok to merge. For the tag.creation patch I'll discuss with pagure folks.10:23
fbohttps://review.opendev.org/#/q/topic:pagure-driver-update10:24
*** noorul has quit IRC10:26
mordredfbo: stack looks great to me10:29
*** openstackgerrit has joined #zuul10:30
openstackgerritFabien Boucher proposed zuul/zuul master: Pagure - add support for git.tag.creation event  https://review.opendev.org/67993810:30
fbomordred: thanks for the reviews!10:31
mordredfbo: thanks for working on that - it's exciting to see the pagure support and that relationship moving forward10:31
fbomordred: thanks, yes :) and we are working on a set of default jobs for Fedora distgit here an example https://src.fedoraproject.org/rpms/python-gear/pull-request/8#comment-30592. I hope packagers will like it.10:41
*** brendangalloway has joined #zuul10:46
*** noorul has joined #zuul10:51
*** AJaeger has quit IRC10:55
openstackgerritMerged zuul/zuul master: Pagure - fix wrong commit gitweb url  https://review.opendev.org/67994610:57
*** AJaeger has joined #zuul11:01
*** noorul has quit IRC11:04
*** noorul has joined #zuul11:05
*** pcaruana has quit IRC11:19
*** pcaruana has joined #zuul11:28
*** jpena is now known as jpena|lunch11:35
*** bhavikdbavishi has joined #zuul11:54
*** bhavikdbavishi has quit IRC11:59
mnaserflaper87: you can run a second pod that watches the file for changes and fires the zuul full reconfigure command11:59
*** bhavikdbavishi has joined #zuul12:05
*** jamesmcarthur has joined #zuul12:09
*** jamesmcarthur has quit IRC12:16
*** jamesmcarthur has joined #zuul12:22
*** rlandy has joined #zuul12:25
*** jamesmcarthur has quit IRC12:31
*** jpena|lunch is now known as jpena12:31
openstackgerritDavid Shrewsbury proposed zuul/zuul master: Add scheduler config options for hold expiration  https://review.opendev.org/68267512:42
*** themroc has quit IRC12:42
*** themroc has joined #zuul12:42
ShrewstristanC: if that ^^ looks good, i'll toss the rest of the stack on top of that again. Hopefully we can get this all merged today  :/12:43
Shrewsalso corvus12:44
*** avass has joined #zuul12:47
*** fdegir has quit IRC12:47
*** fdegir has joined #zuul12:48
openstackgerritDavid Shrewsbury proposed zuul/zuul master: Add scheduler config options for hold expiration  https://review.opendev.org/68267512:48
Shrewsfixed up outdated comments  ^^12:48
*** jamesmcarthur has joined #zuul12:51
*** hashar has joined #zuul12:51
tristanCShrews: i'm in a meeting now, i can have a look in a couple of hours12:56
mnaserIf I want to use the same base job across different zuuls (for logs), would it pretty much require that all deployments at least include the keys used to build the secrets on that job and I should be good to go?12:59
mnaserIt sounds like it, I haven't tried it yet but it seems like that should be enough12:59
*** Goneri has joined #zuul13:09
*** bolg has quit IRC13:12
*** hashar has quit IRC13:30
*** bhavikdbavishi has quit IRC13:30
tristanCShrews: left a comment, it lgtm, though if you don't mind i'll wait for the rebase to test the stack again13:41
tristanCfungi: re bwrap: containers usually set up cgroups, seccomp and selinux, which we don't currently get from bwrap in zuul13:53
tristanCseccomp in particular may be important, though it seems like bwrap can set up the filters, so we could get that for zuul. then zuul would mostly be missing selinux context by using bwrap instead of full containers14:00
*** swest has quit IRC14:01
*** swest has joined #zuul14:02
*** swest has quit IRC14:03
corvusmnaser: yes; as long as the keys are there, zuul won't generate them14:04
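(A rough sketch of seeding a second deployment with the same keys, assuming the default layout in which the scheduler keeps its per-project decryption keys under its state dir — the /var/lib/zuul/keys path here is an assumption; adjust to your state_dir.)

    # Copy the scheduler's project keys to another deployment so both
    # can decrypt the same job secrets.
    rsync -a /var/lib/zuul/keys/ other-scheduler:/var/lib/zuul/keys/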
corvusnoorul: the finger url is just the only url zuul had for the job at the time.  it does contain the build uuid so you can check that in logs.14:05
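(While the build is running, the finger URL can be consumed with a standard finger client, or with any raw TCP client if the gateway listens on a non-default port — the 7900 below is an assumption.)

    # Standard finger client, assuming the gateway is on port 79:
    finger e27b42f0eff140bcb66b0ccb050e5b81@aruba-virtual-machine
    # Or stream from a non-default port with netcat:
    echo e27b42f0eff140bcb66b0ccb050e5b81 | nc aruba-virtual-machine 7900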
openstackgerritDavid Shrewsbury proposed zuul/zuul master: Add scheduler config options for hold expiration  https://review.opendev.org/68267514:07
openstackgerritDavid Shrewsbury proposed zuul/zuul master: Mark nodes as USED when deleting autohold  https://review.opendev.org/66406014:12
openstackgerritDavid Shrewsbury proposed zuul/zuul master: Auto-delete expired autohold requests  https://review.opendev.org/66376214:12
openstackgerritDavid Shrewsbury proposed zuul/zuul master: Add autohold delete/info commands to web API  https://review.opendev.org/67905714:12
openstackgerritDavid Shrewsbury proposed zuul/zuul master: Remove outdated TODO  https://review.opendev.org/68242114:12
ShrewstristanC: that should be the rest of the stack ^^14:12
Shrewsthere was a test fix in 67905714:13
corvusclarkb: https://review.opendev.org/680778 could use re+2 from you14:13
pabelangermordred: where is a good place to talk about how pbr does versioning of python things from git commits? Basically, I'd like to see how to add that support into galaxy cli command now for generating tarballs used by collections.  So today, on galaxy side, it is very static.14:13
fungitristanC: do other container solutions actually apply selinux on platforms which don't ship with selinux enabled (for example, ubuntu)?14:14
pabelangerso we can get a good story / versioning around speculative galaxy collections14:14
fungipabelanger: there is a thorough spec in the pbr docs14:15
pabelangerThanks!14:15
fungipabelanger: or are you talking about the actual implementation at the python level, not the versioning rules?14:15
pabelangerfungi: yah, I wanted to see how to write the logic in python, or maybe extract it from pbr so galaxy can use it14:16
pabelangeror some sort of CLI command, where I can run and generate a proper version number14:16
fungii think pbr still relies on pkg_resources to obtain the metadata from disk, but could in theory be switched over to the newer packaging library14:16
*** noorul has quit IRC14:17
pabelangerright now, everything generated is using the same version number, which isn't ideal.14:17
*** avass has quit IRC14:18
fungioh, pbr does have some functions for things like outputting equivalent rpm and deb version strings, so you just want something similar for its normal pep-440 version strings14:18
tristanCfungi: i can't tell for ubuntu, but podman does set up a unique context label per container to restrict container access to the host and to other containers (when selinux is activated)14:19
fungiubuntu ships by default with apparmor, so maybe that gets used in similar ways by some containerization solutions14:20
pabelangerfungi: yes. Basically, take git repo. Run command, output something like 3.9.1.dev183 a2018c5 (taken from zuul.o.o UI). I can then generate proper 'Manifest' file for galaxy or bonus points if galaxy did it directly14:20
*** openstackgerrit has quit IRC14:21
fungiseems like adding an additional pbr subcommand like `pbr version [--deb, --rpm, --sdist, --wheel]` might do the trick14:26
pabelangerokay cool, that should give me starting place14:26
fungier, i guess it would be `pbr version [--deb, --rpm, --sdist, --wheel] <package>`14:26
fungiso work like `pbr info <package>` does now14:27
fungithere's also a `pbr sha <package>`14:27
fungii expect most of the plumbing is there, and i know the functions you need to generate equivalent version strings for multiple package formats are already in pbr14:28
fungi"The version.SemanticVersion class can be used to query versions of a package and present it in various forms - debian_version(), release_string(), rpm_string(), version_string(), or version_tuple()." https://docs.openstack.org/pbr/latest/user/features.html#version14:30
pabelangerfungi: yah, I think the difference will be that there is no package, as these repos don't use setuptools. So, I need to understand how pbr reads that info from git to generate the version number14:30
fungithe implementation would likely be no more than a few lines of python14:30
fungioh14:30
fungiif there's no python packaging involved, then yeah you may want a separate tool14:31
fungipbr is fairly tied to python packaging metadata14:31
pabelangerack14:31
fungithe bits of pbr which deal with finding the highest tag in your history, obtaining the commit id and so on are in https://opendev.org/openstack/pbr/src/branch/master/pbr/git.py14:33
fungithe actual version string format is likely also not too relevant outside of python packaging contexts, since it's specific to how pip and similar tools expect to order things like prerelease, postrelease and development identifiers in version strings14:34
fungithat's why pbr also has functions to translate those to deb and rpm version formats... different platforms have different rules for how to encode such details and how they sort relative to each other14:35
*** openstackgerrit has joined #zuul14:36
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: DNM: test prepare-workspace-git base-test  https://review.opendev.org/68291214:36
brendangallowaypabelanger: tried your connection reset suggestion from yesterday, but it fails due to what looks like ansible 2.5 compatibility issues.  Do you know if that will work if we upgrade from SF 3.2 to 3.3?14:56
pabelangerbrendangalloway: don't know, but which version of zuul are you using? We have multi-ansible support14:56
brendangallowaywe also noticed that the static node returns to the 'ready' state in nodepool when it comes back, even though zuul still considers it in use.  Not sure if that's related though14:56
pabelangerso you could use job.ansible-version to ask for a newer version14:57
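(That knob is the ansible-version job attribute; a minimal sketch:)

    - job:
        name: my-job
        ansible-version: 2.8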
brendangallowayWe're on Software Factory 3.2, not sure exactly which zuul version that is.  Default ansible is 2.514:57
pabelangerbrendangalloway: look on your zuul dashboard UI14:58
pabelangerbottom left, should be version info14:58
brendangallowaypabelanger:  3.6.1-1.el714:58
pabelanger3.7.0 was when we added multiple ansible: https://zuul-ci.org/docs/zuul/releasenotes.html#relnotes-3-7-014:58
pabelanger:(14:58
pabelangerbrendangalloway: yah, sounds like you might need to upgrade14:59
brendangallowayAah - the Multiple Ansible Versions doc made it sound like the feature was still work in progress, so we haven't tried to use it yet14:59
openstackgerritMerged zuul/zuul master: Pagure - handle initial comment change event  https://review.opendev.org/68031014:59
pabelangerbrendangalloway: nope! it's in production, works great14:59
brendangallowaypabelanger: https://zuul-ci.org/docs/zuul/developer/specs/multiple-ansible-versions.html maybe needs an update then?  It's the first thing that came up searching for 'zuul change ansible version'15:01
pabelanger+115:01
brendangallowaybut yes, looks like next step is upgrading and seeing if that fixes things15:01
pabelangeryah, we should consider updating it15:01
fungithat's also somewhat of a risk with documenting our design specs in the documentation tree... can make it seem like those features aren't implemented yet if you go looking for them and end up at the design spec15:03
fungithe pink warning at the top of the page is intended to convey that, but maybe it gives the opposite impression15:04
corvusoh, if that's implemented, it should be removed15:04
tristanCbrendangalloway: zuul multi-ansible support is added to SF-3.3, you would need to follow this procedure: https://www.softwarefactory-project.io/docs/3.3/operator/upgrade.html15:05
fungicorvus: the spec should be removed, or just the admonition at the top of it?15:05
corvusfungi: the spec i think15:05
fungimakes sense15:05
fungiit's always available in git history after all15:05
tristanCbrendangalloway: though we are fixing minor issues regarding the recent centos-7.7 update, feel free to ask on #softwarefactory if you have any issue with that15:05
corvuswe should make sure that any documentation value it has is covered elsewhere, then remove it.  (otherwise, we end up with documentation by spec which is less user friendly)15:06
openstackgerritMerged zuul/zuul-website master: Update to page titles and Users  https://review.opendev.org/68045915:06
*** arxcruz is now known as arxcruz|ruck15:08
*** mattw4 has joined #zuul15:09
*** themroc has quit IRC15:11
openstackgerritMerged zuul/zuul master: zuul_console: fix python 3 support  https://review.opendev.org/68255615:17
*** michael-beaver has joined #zuul15:19
clarkbcorvus: +2 reapplied15:19
*** noorul has joined #zuul15:23
*** TxGirlGeek has joined #zuul15:26
brendangallowayI don't suppose there's any way for zuul to report how many times a certain job has been run in a certain pipeline?15:32
*** zbr has quit IRC15:36
*** zbr has joined #zuul15:37
noorulAny idea how to copy a folder from one location to another using inbuilt ansible module?15:37
clarkbnoorul: you can use the synchronize module15:37
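(A minimal sketch of copying a directory to another location on the same remote node with synchronize; delegating to the node itself makes it a local rsync there. The copy module with remote_src: true is another option. Paths are illustrative.)

    - name: Copy a folder to another location on the node
      synchronize:
        src: /path/to/source/
        dest: /path/to/destination/
      delegate_to: "{{ inventory_hostname }}"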
clarkbbrendangalloway: for all time? I believe that data is in the databse but not currently exposed15:38
*** sshnaidm|rover is now known as sshnaidm15:39
*** panda|ruck is now known as panda15:39
brendangallowayclarkb: Yeah, we've ported one of our build systems into zuul and we want to switch the old one off, but the idea of no longer having build numbers is ruffling feathers15:40
brendangallowayTrying to find an easy way to provide something equivalent15:40
clarkbin the opendev system I would point people to the statsd reporting since we expose that publicly through graphite and grafana15:42
*** jamesmcarthur has quit IRC15:45
*** jamesmcarthur has joined #zuul15:47
*** jamesmcarthur has quit IRC15:49
*** noorul has quit IRC15:50
*** jamesmcarthur has joined #zuul15:52
*** noorul has joined #zuul15:53
noorulDoes zuul support running a job in a container on a static node?15:55
pabelangernoorul: yup, you can write a job to use docker for that15:57
*** jamesmcarthur has quit IRC15:57
corvusbrendangalloway: in addition to stats reporting, you can run a build query -- eg http://zuul.opendev.org/t/zuul/builds?job_name=zuul-promote-image&pipeline=promote15:57
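(The same query is available from the REST API, which is handier for scripted counting; the tenant and filter values here are illustrative.)

    curl 'https://zuul.opendev.org/api/tenant/zuul/builds?job_name=zuul-promote-image&pipeline=promote&limit=50'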
*** TxGirlGeek has quit IRC15:58
*** jamesmcarthur has joined #zuul15:59
*** recheck has quit IRC16:00
*** recheck has joined #zuul16:00
noorulpabelanger: I think I can use https://docs.ansible.com/ansible/latest/modules/docker_container_module.html16:00
*** zbr is now known as zbr|ruck16:07
*** gtema_ has quit IRC16:11
noorulI am trying something like this16:19
noorulhttp://paste.openstack.org/show/777430/16:19
noorulBut it has a syntax error at line 1116:19
nooruldid not find expected '-' indicator while parsing a block collection16:19
noorulIs it not possible to use variables in a list?16:20
pabelangerline 11 needs to quote all16:23
pabelanger":/tmp16:23
pabelangerneeds to also be inside quotes16:24
noorulpabelanger: Thank you16:25
noorulofosos: hi16:26
noorulWhat is the procedure for using 3rd party ansible modules inside zuul ?16:34
funginoorul: what sort of third-party module, and where inside zuul?16:36
funginoorul: if you're talking about use of ansible modules on the executor, zuul shadows and filters some modules in the per-ansible trees like https://opendev.org/zuul/zuul/src/branch/master/zuul/ansible/2.816:38
pabelangerI'd say nested ansible is likely a more flexible approach16:38
pabelangerbut, with coming collections we'll need to make 3rd party modules a little easier16:39
fungiand yes, having the executor's ansible invoke an ansible on a disposable build node gets you access to use any additional modules you want without compromising the security of the executor's contained ansible environment16:39
fungiansible wasn't really designed with the idea in mind that you might want to use it to run untrusted playbooks/roles/modules, so zuul cripples the ansible it's interfacing directly with to try and help preserve the security of its executors16:41
fungiif you're considering expanding the set of modules the executor allows, they should be very carefully scrutinized for ways they could be abused to compromise the environment16:43
*** tosky has quit IRC16:49
pabelangerneat, https://github.com/theopenlab/labkeeper looks to be a fork of stuff we use to deploy zuul.a.c :)16:50
noorulI am getting this error16:50
noorulFailed to import docker or docker-py - No module named requests.exceptions. Try `pip install docker` or `pip install docker-py` (Python 2.6)16:50
noorulDoes zuul only use python 2?16:50
pabelangerno, you can use python3 but you need to configure that in nodepool16:50
pabelangerhttps://zuul-ci.org/docs/nodepool/configuration.html#attr-diskimages.python-path for example16:51
noorulpabelanger: I am using static driver16:52
pabelangerhttps://zuul-ci.org/docs/nodepool/configuration.html#attr-providers.[static].pools.nodes.python-path16:52
noorulIt also has python path option16:52
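(A minimal nodepool.yaml sketch for a static node with an explicit interpreter; the node name and label are hypothetical.)

    providers:
      - name: static-provider
        driver: static
        pools:
          - name: main
            nodes:
              - name: example-node.local
                labels:
                  - ubuntu-bionic
                python-path: /usr/bin/python3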
openstackgerritMerged zuul/zuul master: Add support for the Gerrit checks plugin  https://review.opendev.org/68077816:59
noorulpabelanger: http://paste.openstack.org/show/777430/17:06
noorullooks like the : in 10.29.12.160:7990 is creating a problem17:06
noorulI am getting following error Error creating container: 500 Server Error: Internal Server Error ("invalid mode: /tmp")17:07
*** brendangalloway has quit IRC17:07
noorulhttps://docs.ansible.com/ansible/latest/modules/docker_container_module.html17:08
fungiis 10.29.12.160:7990 your connection name?17:08
pabelangershould it be: "{{ zuul.projects['10.29.12.160:7990/ac/commonlib'].src_dir }}:/tmp"17:08
openstackgerritMerged zuul/zuul master: Update gerrit pagination test fixtures  https://review.opendev.org/68211417:08
fungiahh, it's the second : causing the issue17:08
funginot the earlier one17:09
noorulOops let me correct17:09
*** jamesmcarthur_ has joined #zuul17:09
noorulThis is what I have now http://paste.openstack.org/show/777434/17:09
noorulThat was the old one17:09
noorulAs per the syntax, the third one is mode17:10
noorulSince I have a colon in the connection name itself17:10
noorulMy connection is defined here http://paste.openstack.org/show/777435/17:11
pabelangernoorul: what does your inventory file look like17:11
*** jamesmcarthur has quit IRC17:12
clarkbhttp://zuul.openstack.org/build/7dd6514fd3b24eaea2db05b09e1d0d26/log/job-output.txt#2904 seems to be the cause of the failure mgoddard pointed out earlier today17:12
clarkbbut I'm not seeing that module failure in the console of the job17:12
pabelangernoorul: that will include zuul.projects variable17:12
clarkbit says "See stdout/stderr for the exact error"17:13
clarkbanyone know what might cause a module failure like that?17:13
clarkbit has failed_when set to false which explains why that task doesn't fail the job17:14
noorulpabelanger: ?17:14
noorulI am not able to use zuul.projects['bitbucket/ac/commonlib'].src_dir }}17:16
pabelangernoorul: no, I am asking to see your inventory file, it will list the projects the job is using17:16
pabelangerwanted to see what zuul is expecting17:16
noorulDid you mean main.yaml ?17:18
noorulhttp://paste.openstack.org/show/777437/17:18
pabelangerclarkb: http://paste.openstack.org/show/777438/17:18
pabelangerusually ARA helps to expose some of that info17:18
*** hashar has joined #zuul17:18
clarkbpabelanger: yes, the stdout problem is caused by the module failure I linked to (the task that hit the module failure registers that variable)17:18
clarkbpabelanger: the job failed because stdout wasn't set. stdout wasn't set because the module that sets it failed17:18
pabelangerwe restarted zuul right17:19
pabelangerdid we pick up new version of ansible?17:19
*** jpena is now known as jpena|off17:19
clarkbI don't think we upgrade ansible due to how pip install works17:19
clarkbthough that gives me the idea of checking the executor logs17:20
clarkbI'll go do that now17:20
noorulpabelanger: I am wondering why is it using ip address instead of the name17:20
pabelangerclarkb: Hmm, I guess something about ansible changed17:21
pabelangerhowever17:22
pabelangerhttps://opendev.org/zuul/zuul-jobs/src/branch/master/roles/persistent-firewall/tasks/main.yaml17:22
pabelangerI would consider switching that to include_tasks, and pass in iptables_rules / ip6tables_rules into the tasks17:22
pabelangerlet me check something17:22
clarkbwhat is the difference?17:22
pabelangerOh17:23
pabelangerhttp://paste.openstack.org/show/777440/17:23
pabelangerrc: -1317:23
pabelangerI've seen this before17:23
pabelangerbut, haven't figured it out17:23
pabelangerI think that is SIGPIPE?17:24
pabelangerbasically, ansible is killing the task, IIRC, which raises -1317:24
pabelangerI never figured it out17:24
fungiyes, manual for signal(7) confirms 13 is sigpipe17:25
fungioh, though this is an exit code not a signal17:25
clarkbone thing I notice looking at the executor logs is that those tasks ran twice17:25
clarkbthe first time it runs it does so successfully17:25
pabelangerclarkb: I can see it run twice, but on different nodes17:26
pabelangerunless I missed it17:26
pabelangerfungi: yah, I think the - in -13 was a signal17:26
clarkbpabelanger: its twice per node for a total of 4 times17:26
pabelangerat least if I remember talking to core17:26
clarkbI think because the multinode bridge updates the firewall rules then repersists them after the initial pass17:26
clarkbthis happens 2 minutes and 15 seconds apart ish17:27
clarkblikely not a race in that case17:27
clarkbbut that also means the command is present on the host and successfully ran at least once17:27
noorulpabelanger: Any work around?17:28
pabelangerclarkb: yah, something looks odd in that playbook, we call persistent-firewall multiple times17:28
pabelangeroh17:29
clarkbpabelanger: I don't think its wrong, we call it after making changes to the rules17:29
clarkband we make changes multiple times if doing multinode bridge setup17:29
pabelangerthis is test playbook?17:29
clarkbno17:29
clarkbunfortunately the ironic job doesn't seem to successfully collect syslog :/17:29
pabelangerk, I haven't really looked to much into multi-node playbooks17:31
pabelangerbut, I think we are hitting ansible issue17:31
pabelangerwould be cool if we can reproduce17:31
clarkbit wouldn't surprise me if including the same task file multiple times is buggy in ansible17:31
clarkbwe've found that nested includes can cause weird errors17:31
pabelangeryah, in fact, I think I'm including the same role multiple times too17:32
pabelangerso, maybe you are on to something17:32
clarkbmight also be a problem with the register task. Like rerunning the same command task with a register breaks17:33
pabelangeryah, I think passing vars into tasks via include_tasks might be something to try: https://docs.ansible.com/ansible/latest/user_guide/playbooks_reuse_includes.html#including-and-importing-task-files17:35
openstackgerritMerged zuul/zuul master: Support HTTP-only Gerrit  https://review.opendev.org/68193617:35
*** zbr|ruck is now known as zbr17:36
SpamapSWatching a Circle CI demo.. we could totally steal a thing from them with their restore_cache/save_cache keywords.17:38
SpamapSSimpler than building AMI's or docker images .. just have a way to save dirs and restore dirs.17:39
fungisounds like pbuilder in debian17:39
SpamapSyeah exactly17:40
fungitar up the tree and archive it, then untar it for subsequent builds17:40
SpamapSthey just use a key to invalidate.. so you can hash things like your requirements.txt or something else.17:40
clarkbrequirements.txt won't work because it is too loose. constraints might17:41
SpamapSPipfile.lock would work17:41
clarkbalso worth noting (and this too is python specific) venvs are not portable across even minor python version updates17:42
clarkbyou'd also need to invalidate if your base image changed17:42
SpamapSYeah, that seems pretty simple to do.17:42
fungiit goes deeper still17:43
SpamapSI think you are optimizing for rare events.17:43
noorulpabelanger: I shared main.yaml http://paste.openstack.org/show/777437/17:43
SpamapSbut yes, of course, invalidation will be needed.17:44
SpamapSalso I didn't suggest venvs17:44
SpamapSwheel cache for instance17:44
fungiif your dependencies include things which build c extensions from sdist and don't provide wheels, then it won't be the same if the underlying libs/headers change17:44
fungioh, but yeah, a wheel cache, sure17:44
SpamapSyou are all thinking in terms of *BIG* infrastructure like opendev. Small shops won't want to maintain all the caches and such17:44
mordredyeah. there's definitely ways in which such a technique could be really nice - and I think we have the general tools and plumbing to be able to do such a thing17:44
fungigranted, we basically already build a wheel cache in our ci system and publish it from local hosts in each region17:44
SpamapSRight I'm looking at what a team did that migrated away from Zuul17:45
mordredand probably wiring up a similar mechanism using the plumbing we've got so that people could easily do such a thing would be really helpful to a range of folks17:45
pabelangernoorul: so, each job will produce an inventory file, which should be collected via logs. For example: https://zuul.opendev.org/t/zuul/build/adf44f2115344c1bac7733f1ef22983c/log/zuul-info/inventory.yaml17:45
SpamapSBecause we didn't have time to build our own wheel cache.17:45
fungibut i agree, not having to maintain a separate system for wheel caches would be nice, even for large deployments like opendev, absolutely17:45
pabelangerin it you can confirm the syntax, but based on your yaml file, you likely need to use bitbucket/ac/commonlib17:46
fungiSpamapS: the other concern which arises is trusted vs untrusted builds and cache poisoning. this is already a problem for systems like distcc17:46
fungier, well, ccache17:46
SpamapSonly do it on gate17:46
*** hashar has quit IRC17:46
fungiand scope it independently along project lines i suppose17:47
mordredyou could even use promote to push an updated version of the cache17:47
noorulpabelanger: I have that file17:47
mordredso that check and gate pull the cache, do $whatever to "update" it (which would frequently be noop)17:47
fungibecause project a doesn't necessarily trust caches created by changes approved by project b's maintainers17:47
mordredthen promote publishes an updated cache17:47
noorulpabelanger: It has project names that I can't put in public17:47
pabelangernoorul: that is fine, if you look at zuul.projects, it will be a list of projects you can use17:48
* mordred waves hands a bit - but it could be really cool17:48
noorulIt has this key17:48
noorul10.29.12.160:7990/ac/commonlib:17:48
noorulI am wondering why stash driver is using 10.29.12.160:7990 as prefix17:49
openstackgerritClark Boylan proposed zuul/zuul-jobs master: DO NOT MERGE test cleanup phase playbook  https://review.opendev.org/68017817:49
noorulIs it stash driver or Zuul ?17:49
pabelangerthat comes from your tenant config, IIRC17:50
pabelangerbut, you had bitbucket for connection17:50
noorulYes17:50
pabelangeryou can try setting canonical_hostname in your connection17:52
pabelangerto the hostname of your service17:52
fungimakes me wonder if the bitbucket driver could be passing the connection address/port where it's supposed to be passing the connection name17:52
noorulAny idea which interface method is used?17:53
pabelangerSpamapS: curious who moved away from zuul, is that public info?17:54
pabelangerasterisk project used to run zuul at one point, but moved to jenkins. They never did get to a point of gating asterisk, just using zuul with gerrit to test, then merging manually17:55
*** sshnaidm is now known as sshnaidm|bbl17:57
SpamapSpabelanger: It's a team here at Good Money. Experiment that so far is going well. They just didn't ever embrace Zuul/Ansible and CircleCI has a lot of bells and whistles for smaller teams/apps.17:57
pabelangerack17:58
fungii agree looking at what bells and whistles people find useful is a good opportunity to identify possible missing features17:58
mordred++17:58
mordredit's always good to learn about what things make people like other things17:58
openstackgerritMerged zuul/zuul master: Add autogenerated tag to Gerrit reviews  https://review.opendev.org/68247317:59
pabelangerSpamapS: I do agree that setting up cache stuff is time consuming. Still need to do that for zuul.a.c. Starting now to get to the point where regional mirrors of things will be helpful17:59
SpamapSI mean, to put it in context.. we were too slow, taking 17 minutes to deploy.18:01
fungiit's also possible to cache a lot more stuff on node images18:03
fungiopendev used to do a bunch more of that than it does now, because the more you cache the easier it is for your caching to hide problems with dependency management18:04
SpamapSWe did that in response, but that took us, frankly, a month.18:04
SpamapSBecause nodepool+aws has no builder.18:04
SpamapSSo we had to create a packer build, get it working, move all the pre steps into it, and then make a zuul job that packer builds and uploads an AMI.18:05
pabelangeryah, we started to cache ansible/ansible on images, which did speed up job runs a bit18:05
SpamapSBy that time they'd already moved to Circle and ripping that team's stuff out of our monolithic build took it from 17 minutes to 7, just in time for some new microservices to add 3 more minutes. ;)18:05
SpamapSBut the AMI having everything installed won us back those 3 minutes.18:06
pabelangerI think I'd like to add a docker insecure registry, but also want to back it with swift... so in a holding pattern until we figure out zuul-registry18:06
SpamapS(In reality, it happened a little differently in terms of ordering, but we're at 7 minutes for our zuul build as of today, and that team that went to CircleCI is at 4 minutes. ;)18:07
pabelangeryah, that is some of the concern I hear from awx team, and using static nodes and GCE. They get faster builds18:08
pabelangerbut, I also don't think they are rebuilding (base) images each time18:09
SpamapSI think at scale, building AMI's with caches and having local mirrors .. all "the right thing". But making it simpler for a single build to solve its own problems may actually be the more important capability.18:09
SpamapS(AMI's, or disk images, or whatever)18:10
mordredSpamapS: the thing with a cache save/restore thing is that it doesn't just have to be used for caches18:10
clarkbpabelanger: static nodes also have the potential for going sideways18:12
mordredand it's a feature we already have in zuul - its just not exposed in a single save/restore pair like in circle18:12
* clarkb lived that life for a long time does not want to go back18:12
SpamapSmordred: is that the provides thing?18:12
pabelangerclarkb: yah, agree. I totally think it is an education issue too.  But so far, they seem happy to use that, regardless of the potential issues18:12
fungiyeah, glad to no longer spend a good chunk of my day checking the health of all our static ci nodes and rebooting them18:12
SpamapSI've never really looked at it.18:12
mordredSpamapS: which is me saying - yeah - empowering people to be able to use the underlying power to solve their spefific problems without them knowing how all of the pieces work is super helpful18:12
mordredSpamapS: exactly18:13
mordred:)18:13
SpamapSBut isn't that between pipelines?18:13
pabelangerclarkb: I really don't want to have jobs be blocked when infra is down, that is the power of zuul to me18:13
mordredyeah - provides is - that's how a parent job could update a 'cache' if needed and have a child job pick up the updates. it's the other side of the coin to "I need one of these, it needs to be updated sometimes, and I need it as an input to my job"18:14
fungipabelanger: general leaks over time cause static nodes to start systemically failing jobs because they run out of disk/ram/whatever, but if you're running builds which require root privileges then it gets waaay worse since a build can absolutely shred the node or, worse, backdoor it and then steal things like credentials for subsequent builds18:14
mordredthe image build jobs are, I think, the most complete implementation of the pattern overall - but they're docker specific. the usage pattern though is essentially what one should have from a caching system in a world where depends-on and friends exist18:15
SpamapSthe general flow of   things = restore_cache("key") or build_things("key")...save_cache("key") is the one that I like.18:15
pabelangerfungi: agree! Maybe at ansiblefest you can help share that info :D18:15
fungihappy to!18:15
SpamapSLike in this case, we have a node_modules dir that only needs to be maintained when the yarn.lock changes.18:16
mordredyah. totally18:16
SpamapSso, Circle has a really nice way to just say "restore node_modules from cache if you can" and then "save node_modules to cache"18:16
SpamapSI'd like to steal that for Zuul. :)18:16
mordredjust saying - that's the same effective pattern - so the tools are there to put together the thing you're talking about18:16
SpamapSAnd one problem we have there, is that Zuul doesn't have a real abstraction for object storage/filesystem-caching/etc.18:17
fungii suppose we just needs some more specific roles/workflows around that use case18:17
SpamapSIt goes back to my idea that we don't need "nodepool" we need "thingpool"18:17
SpamapSAnyway, just a thought.18:18
SpamapSI also don't know how much I've missed in terms of roles that could make our builds simpler.18:19
openstackgerritMerged zuul/zuul master: Use robot_comments in Gerrit  https://review.opendev.org/68248718:19
SpamapSThere may be lots of roles out there that make Zuul life as simple as Circle life.18:20
noorulAfter adding canonical_hostname everything stopped working :(18:22
mordredSpamapS: yeah - I think that was kind of what I was getting at - I think the usage pattern you're talking about, while not implemented in exactly that way, is where several things either are headed or are already there, on a system-by-system basis - so I think broadly speaking that type of pattern is one that has a decent amount of work already done for it and I think it would mostly be some glue18:22
mordredSpamapS: so like - a, "yeah, I agree, that's a great interface to provide" and "I don't think we're too far away from being able to do so"18:23
SpamapSCool. Just.. it needs to be really obvious and simple to consume. Right now I think Zuul is in that stage where it can do anything, but .. it's not super approachable for the few things that should be simple and easy.18:24
mordredwith a side helping of "the specific use case example is a good one to have in our heads as we continue to work on providing good sugar"18:24
SpamapSSome things are, though. Like, if you embrace tox.ini, and all you want is CI on that.. ++ .. zuul crushes that.18:24
mordredSpamapS: yeah - I think the power in sharable job definitions is that for anythign people have written good jobs for, things become trivial and uber powerful18:25
mordredbut we definitely need to pick up more reusable job content of the same depth as the tox support18:25
pabelangerI wish all jobs were like tox jobs :) That would be huge, but also takes a lot of work.18:26
mordredyup18:26
SpamapSAnother thing that Circle does more easily is secrets.18:26
mordredbecause it also takes people who understand the ecosystem of the tools being supported. like - what are the right actions to take on behalf of someone when they are doing yarn/npm-based things18:26
clarkbmordred: you are asking the wrong person about that. I still have to undelete or is it delete? the "I'm just here to hangout" file in zuuls js builds18:27
clarkbseems like I get that wrong every time I push a new js related change18:27
SpamapSThey just have these things called "Contexts" and that's a group of envvars and other settings. Each CI step just calls out a context. That is totally doable in a Zuul shop with parent: prod-context   parent: staging-context  ... but.. it's not quite batteries-included for Zuul.18:27
mordredclarkb: well - to be fair, that's a javascript project embedded in a python project - so it's a very zuul-specific one-off18:28
clarkbSpamapS: i think at least some of that (env vars in particular) is a desire to push that onto ansible so that we aren't rewriting what ansible can do18:29
*** hashar has joined #zuul18:29
clarkbmaybe that has to change, maybe not.18:30
mordredyeah - and the inability to tell ansible "just pass these env vars to every task please kthxbai" isn't not a thing that has annoyed me before :)18:30
nooruleven after removing canonical_hostname nothing works18:33
SpamapSclarkb: I think a solid library of jobs that are one level up from base job, would do it.18:34
SpamapSmordred: yeah, that would be a nice feature to add to Ansible. Something like playbook-level generics.18:34
*** openstackgerrit has quit IRC18:37
SpamapSlike "- playbooks_defaults: { become: true, environment: { FOO: bar } }"18:42
*** mattw4 has quit IRC18:44
noorulsomehow working after removing canonical_hostname18:46
noorulCan someone help me to find a work around to http://paste.openstack.org/show/777434/ ?18:51
noorulThe colon in 10.29.12.160:7990 is conflicting with volumes syntax18:52
clarkbnoorul: can you share error messages?19:20
dmsimardSpamapS: it sort of has something like that now -- module defaults: https://docs.ansible.com/ansible/latest/user_guide/playbooks_module_defaults.html19:29
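(A play-level example combining module_defaults, which needs Ansible >= 2.7, with the existing play-level environment and become keywords; the module and values are illustrative.)

    - hosts: all
      become: true
      environment:
        FOO: bar
      module_defaults:
        uri:
          force_basic_auth: true
      tasks:
        - name: Env vars set at the play level reach every task
          shell: echo "$FOO"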
SpamapSdmsimard: I want it a level up19:29
SpamapSthat's at the play level19:29
SpamapSI need it at the playbook level19:29
noorulclarkb: Error creating container: 500 Server Error: Internal Server Error ("invalid mode: /tmp")19:29
dmsimardSpamapS: it's at the play level19:29
dmsimardyeah19:30
dmsimardmight be workaroundable with a play that imports other playbooks, not sure19:30
clarkbnoorul: did it log the argument it passed to docker?19:30
clarkbnoorul: if so that will help us udnerstand how things are getting interpolated there19:31
SpamapSimport_playbook is at the playbook level19:31
dmsimardah, yeah, I'm mistaken19:31
noorulSimple docker run http://paste.openstack.org/show/777447/ creates another problem19:32
noorulthe input device is not a TTY19:32
noorul"volumes":[1 item19:34
noorul0:"src/10.29.12.160:7990/ac/commonlib: /tmp"19:34
noorul]19:34
mordreddmsimard: still - that's cool! TIL19:35
noorulclarkb: ^^19:35
clarkbI see so the problem is the port specification in the source side?19:36
clarkbdoes docker have a way to escape that? : is a valid file name character19:36
noorulby docker did you mean the ansible module?19:37
clarkbno docker itself or whatever is consuming that input19:37
noorulIt is the ansible module19:38
*** openstackgerrit has joined #zuul19:41
openstackgerritDavid Shrewsbury proposed zuul/nodepool master: Reduce upload threads in tests from 4 to 1  https://review.opendev.org/68297719:41
*** tosky has joined #zuul19:43
noorulhttps://github.com/moby/moby/issues/860419:44
clarkbnoorul: if I'm reading that right you need the --mount flag ?19:45
noorulclarkb: Not sure whether ansible module https://docs.ansible.com/ansible/latest/modules/docker_container_module.html supports that19:46
clarkbnoorul: ya you might have to run docker via the shell or command module instead19:46
clarkbnot sure19:46
noorulI tried that, and I get the error "the input device is not a TTY"19:48
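(That error usually means docker run was given -t or -it; there is no TTY under Ansible, so those flags have to go. A sketch using the command module with --mount, which sidesteps the colon parsing; the image name and test command are hypothetical, and ansible_user_dir is prepended because bind mounts need an absolute source path.)

    - name: Run unit tests in a container without allocating a TTY
      command: >
        docker run --rm
        --mount type=bind,source={{ ansible_user_dir }}/{{ zuul.projects['10.29.12.160:7990/ac/commonlib'].src_dir }},target=/tmp
        my-test-image make test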
noorulIs there any example in opendev which uses docker run19:48
noorul?19:48
SpamapSdmsimard: I want ot echo what mordred said. I didn't know about module_defaults, and that's a handy thing19:49
Shrewszuul-maint: it sure would be pleasant to get the autohold-revamp merged and remove this Sword of Damocles from over my head19:49
* Shrews starts singing from the Rocky Horror soundtrack19:49
mordredShrews: but the dangling sword is so purty19:49
SpamapSShrews: is that a transvestite, transexual patch, from transylvania?19:50
noorulIn the devel branch mounts is supported19:50
noorulhttps://github.com/ansible/ansible/blob/devel/lib/ansible/modules/cloud/docker/docker_container.py#L37819:50
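(A sketch of that mounts syntax, which needs an Ansible with the option — devel/2.9 at the time; the container and image names are hypothetical, and the source is made absolute since bind mounts require it.)

    - name: Run unit tests in a container
      docker_container:
        name: unit-tests
        image: my-test-image
        mounts:
          - type: bind
            source: "{{ ansible_user_dir }}/{{ zuul.projects['10.29.12.160:7990/ac/commonlib'].src_dir }}"
            target: /tmp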
ShrewsSpamapS: you know it19:50
*** mattw4 has joined #zuul19:50
*** sshnaidm|bbl is now known as sshnaidm19:52
*** Goneri has quit IRC19:52
funginow we need to plan a zuul rocky horror stage performance19:59
Shrewsi call the part of Dr. Frank-N-Furter20:00
fungiwith the right baldwig i can do a decent riff raff20:00
corvusooh, i'm shivering with antici20:07
Shrews.... ?????????20:07
corvuspation20:07
Shrews*phew*20:07
fungithe suspense was unbearable20:08
*** pcaruana has quit IRC20:10
*** jamesmcarthur_ has quit IRC20:12
*** hashar has quit IRC20:12
*** jamesmcarthur has joined #zuul20:17
*** jamesmcarthur has quit IRC20:17
*** jamesmcarthur has joined #zuul20:18
dmsimardmordred, SpamapS: I've been using it to set the openstack cloud credentials so we don't need to repeat it for every module, it's pretty neat20:26
*** noorul has quit IRC20:35
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: RFC: Generic cache implementation  https://review.opendev.org/68299220:38
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: RFC: Generic cache implementation  https://review.opendev.org/68299220:39
corvusSpamapS: I did a weird thing there -- i wrote up a sketch of how we might implement a cache system like you were describing, except that the way i originally thought of implementing it wouldn't work without some changes to how zuul handles secrets.  but i wrote it anyway and pushed it as PS1.  then i revised the plan to something that would work with what we have today, but has a different interface20:42
corvus(shifting a bit more work into base jobs) and pushed that as PS2.20:42
corvusSpamapS: when you have a minute, maybe take a look at those and let me know if you think that fits the use-case, and if so, whether PS2 is something we should work on now, or whether we should take this as design input for figuring out how to make PS1 work.20:43
corvusanyone else too of course ^20:45
*** Goneri has joined #zuul20:45
corvusShrews: your stack is +3 up through the web api change20:51
corvuser, up until it20:51
daniel2Someone gave me a link to documentation on using cloud images instead of building them with nodepool.  Can someone share that link again? I can't find it.20:59
corvusdaniel2: https://zuul-ci.org/docs/nodepool/configuration.html#attr-providers.[openstack].cloud-images  (or similar for other drivers)21:11
daniel2Thanks!21:11
Shrewscorvus: woooo21:18
*** Goneri has quit IRC21:22
daniel2Does nodepool builder have any .service files?  I swore I saw one before for nodepool builder but can't find it now.  For systemd21:25
clarkbdaniel2: nodepool itself doesn't ship any, but some of the various config management tools for managing nodepool have init scripts and unit files21:26
clarkbthe zuul from scratch docs may also have examples too /me looks21:26
*** Goneri has joined #zuul21:26
daniel2ah yes, found it21:27
clarkbhttps://zuul-ci.org/docs/zuul/admin/nodepool_install.html#service-file aha I was wrong21:27
clarkbhttps://opendev.org/zuul/nodepool/src/branch/master/etc/nodepool-launcher.service21:27
daniel2I could probably edit that just to builder21:28
SpamapScorvus: will take a look later today. Cool.21:28
daniel2launcher is running in docker.  I'm going to run the builder on the host.21:28
corvusi'm restarting opendev's zuul on HEAD right now; i noticed one minor issue which caused a non-fatal traceback on a gerrit connection without an http password.  i don't think it will cause ongoing problems, and i'll push a fix in a minute.21:31
*** panda has quit IRC21:41
*** panda has joined #zuul21:42
corvusand 2 tracebacks that are fatal21:44
corvusi'm restarting our scheduler at 3.10.221:44
*** nhicher has joined #zuul21:47
openstackgerritTristan Cacqueray proposed zuul/zuul master: Store a list of held node per held build in hold request  https://review.opendev.org/68246621:47
*** michael-beaver has quit IRC21:49
*** rlandy is now known as rlandy|bbl22:04
*** jamesmcarthur has quit IRC22:06
*** jamesmcarthur has joined #zuul22:09
mnaseri know "reviews welcome" but i've been struggling at searching zuul's codebase with gitea22:27
mnaserit almost always returns 0 results22:27
SpamapSisn't that what ripgrep i for? ;-)22:27
mnaseraha22:28
mnaserand on that note22:28
mnaseri was trying to check which services are stateful vs stateless and noticed the executor actually has a state_dir config option22:28
mnaserlittle research later shows we define it as `executor_state_root` but never reference it.. ever22:28
mnaserbesides just creating that directory22:28
mnaserwait, grr, that's in the test code only22:29
mnaserok so it looks like one reference to determine disk available on the executor22:30
mnaserand storing ansible's22:30
mnasermaybe one day someone will read these chat logs and save themselves the source code reading :)22:30
*** jamesmcarthur has quit IRC22:33
*** rfolco has quit IRC22:43
clarkbSpamapS: more specifically it is why we haven't deleted codesearch22:47
*** jamesmcarthur has joined #zuul22:50
openstackgerritJames E. Blair proposed zuul/zuul master: WIP: Fix gerrit errors from production  https://review.opendev.org/68300622:53
openstackgerritJames E. Blair proposed zuul/zuul master: DNM: Use http for all gerrit tests  https://review.opendev.org/68300722:53
pabelangermnaser: yah, mostly executor.state_dir needs to be stateful23:06
pabelangergit repos / ansible, etc23:06
mnaserok so statefulsets for everything except zuul-web23:06
pabelangerand fingergw23:07
*** kerby has joined #zuul23:07
pabelangernodepool-launchers are stateless, builders maybe need access to cache for dib23:08
pabelangerstarting upgrade of zuul.a.c to 3.10.223:08
*** jamesmcarthur has quit IRC23:09
mnaserbtw, i think we should get around tagging release on dockerhub23:09
mnaserthat'd be neat23:09
pabelangerHmm, maybe we just need to add job to release pipeline23:10
mnaserwell in the release pipeline we don't really have a built artifact (unlike promote i guess)23:15
pabelangerYup, we'd need to build it, then push with right version info23:16
mnaserbecause the buildset registry would be long gone by then, so it would be a fresh rebuild23:17
pabelangerother jobs do it today23:17
pabelangereg: loci23:17
mnaser(in an ideal world it would be nice if it wasn't a fresh rebuild, and we just retag the one that's already uploaded)23:17
mnaserand that we know is tested23:17
mnaserbecause technically i guess the build that happens inside release isn't _exactly_ what we tested23:17
pabelangerI mean, could do that too. But I don't think we keep more than head in docker hub23:18
pabelangerdo we?23:18
mnaserno only head is in dockerhub23:18
mnaseractually the intermediate registry should always be running, so i'm wrong on that, i think it should be trivial, the trick is just figuring out the zuul version name in the job to tag23:19
pabelangerI think we purge that each night23:20
mnaserah that breaks my plan in that case23:20
mnasercause release can come many days after no changes23:21
pabelangerbut yah, we could keep them around some how23:21
pabelangerthen when we tag, also fetch original bits23:21
pabelangerbut, given for pypi we do a rebuild, docker should work the same23:21
pabelangerjust pulls in way more things23:21
mnaserwell since we already upload to dockerhhub, we can upload/tag every commit id there23:22
mnaserand then it's a matter of docker pull zuul/zuul-merger@<sha-of-current-released-commit> and retagging that with the tagged version23:22
mnaserit also makes it easy for someone to point towards a very specific version of zuul based on the commit id23:23
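(A rough sketch of that retag-and-release flow; the tags are placeholders, and this assumes every tested commit was pushed to Docker Hub under a per-commit tag.)

    docker pull zuul/zuul-merger:<commit-sha>
    docker tag zuul/zuul-merger:<commit-sha> zuul/zuul-merger:3.10.2
    docker push zuul/zuul-merger:3.10.2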
pabelangeryah, we did that for pypi for a bit23:23
pabelangerI liked it, so I could fetch any wheel23:23
pabelangerrather than building it myself23:24
mnaseri can imagine those repos getting a little annoyed with us tagging everything in there :p23:24
corvuspabelanger, mnaser: what do you mean by stateful for executors?  they benefit from cache but do not require it.23:26
mnasercorvus: the idea that /var/lib/zuul (aka executor.state_dir) needs to be stable23:27
corvusi mean, that if you shut one down or move it or whatever, if it has a git repo cache, great, it'll be faster.  but if it doesn't, it can totally rebuild it from scratch.23:27
pabelangerZuul version: 3.10.2 !23:27
mnaserright ok, so the state is a nice to have cache, but it is not necessary23:27
pabelangeryah, I was assuming mnaser was looking at doing a rolling upgrade in k8s. Agree, having cache isn't needed, just faster23:28
mnaserit seems like a useful optimization to have if its possible imho23:28
corvusmnaser: yep.  let's pretend i've forgotten all the right k8s vocab.  so just in general terms -- executors should be somewhat long-running so that they benefit from the cache, but for scaling up, down, or horizontal moves for eviction, etc, they can lose the state dir and be fine.23:30
mnasercorvus: that makes perfect sense23:30
corvusstate dir for the scheduler is critical (secret decryption keys)23:30
corvusoh, and everything said about executors applies to mergers, if you run any23:31
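(In kubernetes terms that suggests executors and mergers can run as a plain deployment with a throwaway volume for the state dir, e.g. this pod-spec fragment:)

    volumes:
      - name: zuul-state
        emptyDir: {}   # cache only; safe to lose on reschedule, just slower to warm up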
pabelangerso, for some reason, that I don't fully understand, https://dashboard.zuul.ansible.com/t/ansible/build/026dbb8307d5405d93e226286ade0650/log/job-output.txt seems to take some time to render the HTML23:34
pabelangerI'm unsure where the bottleneck is happening23:34
clarkbpabelanger: is it a very large file?23:34
*** jamesmcarthur has joined #zuul23:34
mnaseri think its the large file factor23:34
mnasermy browser seems to hang23:34
clarkbpabelanger: basically what happens is the js has to request the entire file then scan it with regexes. Large files will be slow because network transfer is slow as is regexing a large file23:34
pabelanger4.3mb?23:35
pabelangerif I access file directly, it loads much faster23:35
pabelangerhttps://object-storage-ca-ymq-1.vexxhost.net/v1/a0b4156a37f9453eb4ec7db5422272df/ansible_24/524/64bf07cadde572bf2ae5fa19bd0201b153f12f5a/promote/windmill-config-deploy/026dbb8/job-output.txt23:35
mnaseri think its cause one does zero css/rendering/blah23:35
pabelangerclarkb: ah, I see23:35
mnaseri just profiled it with chrome23:36
pabelangercould it be server side?23:36
mnaser2s for scripting, 4s rendering23:36
clarkbpabelanger: if directly fetching it is fast then no it shouldn't be server side23:36
pabelangerk, I'll have to dig more23:37
pabelangerwas thinking of disabling job-output.txt.html, like opendev did23:37
clarkbI think mnaser just profiled why it is slow23:37
pabelangerwhat does rendering?23:37
corvuspabelanger: obviously, whatever workflow works for you, but i've basically stopped using the text console in favor of the json: https://dashboard.zuul.ansible.com/t/ansible/build/026dbb8307d5405d93e226286ade0650/console23:38
corvus(i reserve the log viewer for logs other than the console output)23:39
mnaserok i see a little reason why it is taking up a lot23:39
mnaserwe're building a table of N rows where N is the number of lines23:40
mnaserin this case, a table with 39.4k rows23:40
*** tosky has quit IRC23:40
mnaserso the number of nodes explodes as the browser tries to render it all23:40
pabelangerhow do I see that?23:41
mnaseri usually open the chrome developer tools23:42
mnaserand then go into "performance"23:42
mnaserhit the record button (make sure you have memory profiling enabled/checked too)23:42
mnaserand then refresh the page, wait for it to be done, stop recording23:42
pabelangerk23:42
mnaserFWIW, GitHub does the same thing.. so i don't know if there's a better way to deal with this23:42
mnaserhttps://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py a 10K github file probably gives your computer a workout too23:43
mnaserand they even do syntax highlighting so its probably worse (if it was ~40k lines)23:43
pabelangerthat loads not too bad23:44
corvuswell, if it loads in 25% of the time it's comparable performance.23:44
pabelangertrue23:45
pabelangerokay, thanks for help23:45
corvuspabelanger: try out the console tab.  keep in mind you can deep-link to individual results.23:45
corvuseg https://dashboard.zuul.ansible.com/t/ansible/build/026dbb8307d5405d93e226286ade0650/console#3/0/1/localhost23:46
pabelangerwell, we have some large shell tasks, for nested ansible. Having some trouble figuring that out23:47
pabelangereg: https://dashboard.zuul.ansible.com/t/ansible/build/026dbb8307d5405d93e226286ade0650/console#3/0/1/localhost23:47
pabelangersorry23:48
pabelangerhttps://dashboard.zuul.ansible.com/t/ansible/build/026dbb8307d5405d93e226286ade0650/console#3/1/6/bastion01.sjc1.vexxhost.zuul.ansible.com23:48
corvusoh yeah, it's less great for nested ansible23:48
pabelangerk23:49
pabelangermnaser: btw: we haven't seen issues with centos7.7 so far, but not doing much with it currently. Mostly a stale mirror23:52
