Tuesday, 2019-09-17

01:10 <ianw> i'm coming to the conclusion there really is a problem with the log streamer when ansible is running under python3 ... do we really test that in production?
01:10 <ianw> i think even on our bionic nodes, we're not setting ansible_python_interpreter
01:11 <clarkb> we set it to 2 iirc
01:25 <ianw> it works locally testing zuul_stream ... but reliably fails https://review.opendev.org/#/c/682275 :/
02:43 <SpamapS> ianw: I use python3 exclusively in my setup. What's the problem?
02:44 <ianw> SpamapS: so you set python-path on your dib nodes to "/usr/bin/python3" as well?
02:45 <SpamapS> I don't have "dib nodes" ... I'm on AWS with packer-built AMIs.
02:46 <SpamapS> I've been doing this a long time, I set ansible_python_interpreter=/usr/bin/python3 in my site variables.
02:46 <SpamapS> I think I did this before nodepool had a facility for that.
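The variable SpamapS mentions can be set in any variables file the executor loads; a minimal sketch (the file path is illustrative, not a Zuul convention):

```yaml
# e.g. in a site-variables file on the executor
# force Ansible to use python3 on every node
ansible_python_interpreter: /usr/bin/python3
```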
02:47 <ianw> hrm, well it's definitely failing in the gate remote tests, and i *think* what's happening is the streamer plugin is somehow failing
02:47 <SpamapS> And my only working OS is Ubuntu 18.04
02:47 <ianw> you get the last task output, then http://paste.openstack.org/show/777058/
02:48 <ianw> Ansible output terminated
02:48 <SpamapS> Hm, let me see
02:48 <SpamapS> in the executor?
02:49 <ianw> you can see the failure in https://openstack.fortnebula.com:13808/v1/AUTH_e8fd161dc34c421a979a9e6421f823e9/zuul_opendev_logs_539/682275/2/check/zuul-tox-remote/5391677/testr_results.html.gz
02:49 <SpamapS> I see "Ansible output terminated" at the end of every job's ansible summary.
02:49 <SpamapS> but there's no traceback or anything
02:50 <ianw> sorry, i think the error is more "[Zuul] Log Stream did not terminate"
02:51 <ianw> then the job aborts
02:51 <SpamapS> I do see that now and then
02:52 <ianw> i think that means the streaming callback plugin died, somehow ... but figuring out how is currently where i'm stumped :)
02:53 <SpamapS> ahh yeah, perhaps we need to wrap it in a try/except that writes the exception into a tempfile.
02:55 <ianw> hrrm, i could wrap all functions in a decorator for that ...
03:03 <SpamapS> Yeah I guess since it's a plugin that's the way you'd have to do it.
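A sketch of the decorator approach ianw floats here; it assumes nothing about Zuul's actual plugin internals, and the function and file names are illustrative:

```python
import functools
import tempfile
import traceback


def log_exceptions(func):
    """Write any exception's traceback to a tempfile before re-raising,
    so failures inside an Ansible callback plugin aren't silently
    swallowed by the surrounding machinery."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            with tempfile.NamedTemporaryFile(
                    mode='w', prefix='callback-error-', suffix='.log',
                    delete=False) as f:
                traceback.print_exc(file=f)
            raise
    return wrapper


# hypothetical stand-in for a callback-plugin method
@log_exceptions
def v2_runner_on_ok(result):
    return result['changed']
```

In a real plugin every `v2_*` method would be wrapped this way, which is why ianw suggests a decorator rather than scattering try/excepts.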
03:17 <ianw> http://paste.openstack.org/show/777059/ ... something seems to be going bananas
03:18 <ianw> constantly forking and "ansible_zuul_console_payload_69fl742k" is somehow involved ... this seems to suggest "zuul_console:" somehow :/
04:00 <ianw> i think that might be a red herring ... the console streamer is trying to open a file that never appears maybe
06:57 <openstackgerrit> Ian Wienand proposed zuul/zuul master: [dnm] testing python3 ansible  https://review.opendev.org/682556
07:53 <ianw> https://openstack.fortnebula.com:13808/v1/AUTH_e8fd161dc34c421a979a9e6421f823e9/zuul_opendev_logs_cfd/682556/1/check/zuul-tox-remote/cfd83a2/testr_results.html.gz
07:54 <ianw> mordred: ^ if you could figure out any clues as to why this seemingly small change to python3 (https://review.opendev.org/#/c/682556/1/tests/base.py) leads to ^, which appears to me to be an issue with the streaming?
08:00 <mordred> ianw: 2019-09-17 07:19:04,069 zuul.AnsibleJob.output seems to be the last chunk that ran - and I don't know if that traceback is expected
08:01 <mordred> ianw: maybe there's a behavior/traceback change under 3 that's not getting caught properly? that test is testing if something doesn't exist - so *maybe* something that we're doing is not handling an error properly when running under 3?
08:01 <mordred> but - otherwise, no, I don't have an immediate thought
08:02 <ianw> mordred: yeah, i just can't find it :(
08:02 <ianw> 2019-09-17 07:20:04.902649 | ubuntu-bionic |     b"2019-09-17 07:19:34,105 zuul.AnsibleJob.output           DEBUG    [e: 32de590007fe490da9e7b9cc89391a90] [build: c1a1bde54c19483a97a26774f9add01f] Ansible output: b'[Zuul] Log Stream did not terminate'"
08:02 <ianw> I think that is probably a problem?
08:02 <mordred> maybe?
08:05 <ianw> dunno, throwing in the towel on this one for today, anyway
08:06 <ianw> the bigger problem is that i want to use python-path python3 for fedora 30 -> https://review.opendev.org/682569
08:07 <ianw> i think that for ansible >=2.8 zuul should be able to just leave ansible up to its own devices on this one (https://review.opendev.org/682275) but have to figure out what's going on with this failure first
08:11 <mordred> ianw: you've got all the fun ones :)
08:31 <noorul> hi
08:32 <noorul> Is there a way to define a project dependent on another project?
08:37 <SpamapS> noorul: https://zuul-ci.org/docs/zuul/user/config.html#attr-job.required-projects
08:37 <SpamapS> noorul: but that's job->project.
08:38 <SpamapS> so it can be inferred by project->job->required-project
08:38 <openstackgerrit> Merged zuul/zuul-jobs master: Add a netconsole role  https://review.opendev.org/680901
08:39 <noorul> SpamapS: Thanks! Which branch will it checkout?
08:40 <avass> noorul: https://zuul-ci.org/docs/zuul/user/config.html#attr-job.required-projects.override-checkout
08:41 <noorul> avass: Thank you! Is there an example?
08:41 <avass> noorul: So the same branch as the branch triggering the job unless override-checkout is specified
08:41 <noorul> avass: What happens if that branch does not exist in the required project?
08:42 <avass> noorul: Not sure, I guess the job would fail
08:42 <noorul> Oops!
08:43 <noorul> I am thinking how I can do the following
08:43 <noorul> I have two repositories, repo1 and repo2
08:43 <noorul> both of them have release branches
08:43 <noorul> rel_1.1.1
08:44 <noorul> Now I have a private branch br1 and using that I raise a PR
08:44 <noorul> If I define repo2 as a required project for the job
08:44 <noorul> Any idea what will happen?
08:46 <ianw> mordred: when you say "not handling an error properly" do you mean in the streamer, or somewhere else in zuul that might decide to abort the job?  http://paste.openstack.org/show/777068/ is what i see trying with local testing, but i can't make the streamer fail :/
08:48 <avass> noorul: I guess that it would try to checkout your private branch on both repositories unless the override-checkout attribute is specified
08:49 <avass> noorul: But I'm not sure exactly how it works since we're not using it yet :)
08:51 <avass> noorul: unless I'm reading this wrong
08:55 <avass> noorul: Seems strange to me that it would try to checkout the same ref for both projects since you can't guarantee that the same ref exists in both repos.
08:58 <mordred> ianw: yeah - I'm honestly not sure what I mean - I'm a bit grasping at straws there
09:00 <mordred> ianw: oh yeah - hrm. both are showing a traceback
09:02 <avass> mordred: could you shine some light on how this works? https://zuul-ci.org/docs/zuul/user/config.html#attr-job.required-projects.override-checkout
09:02 <avass> mordred: actually I mean this: https://zuul-ci.org/docs/zuul/user/config.html#attr-job.required-projects.override-checkout
09:03 <avass> mordred: which branch/ref does it checkout by default? master?
09:04 <mordred> well - by default it checks out the branch matching the target, and if it can't find that, it'll fall back to master. if override-checkout is defined, it'll use that
09:04 <avass> mordred: ah, that makes sense
09:04 <mordred> in the example from noorul above, the private branch br1 isn't relevant if it's the *source* of the PR
09:05 <mordred> assuming the PR is targeting one of the regular shared long-lived branches
09:06 <noorul> mordred: In my example I want repo2's rel_1.1.1 to be checked out
09:06 <noorul> mordred: But looks like master will be checked out
09:08 <mordred> noorul: if you are submitting the PR to repo1's rel_1.1.1 branch, zuul should also check out rel_1.1.1 of repo2
09:10 <noorul> So, if br1 exists in repo2, it will be checked out, otherwise rel_1.1.1
09:10 <noorul> Is my understanding correct?
09:15 <mordred> noorul: ah - no - sorry, I misunderstood what you meant by private branch. you mean you have a branch, br1, on the main shared repo1 and you are submitting PRs to that branch
09:15 <mordred> am I understanding that right?
09:16 <noorul> Not exactly
09:16 <noorul> I'm submitting a PR from br1 to rel_1.1.1 of repo1
09:16 <mordred> ah - awesome
09:16 <noorul> and say I have repo2 in the job's required-projects
09:16 <mordred> in that case, br1 shouldn't play into the decision making from zuul at all
09:17 <mordred> it's about which branch you are submitting a change *to* - so since you are submitting from br1 to rel_1.1.1 of repo1 - then if you add repo2 into the required-projects, zuul should default to checking out rel_1.1.1 of repo2
09:18 <noorul> I see
09:18 <mordred> so in this case you should not need an override-checkout and zuul should do the right thing
09:18 <noorul> What is the use case of override-checkout?
09:20 <mordred> in case the repos don't share a common structure. for instance, I have a job in openstacksdk that tests against ansible and has required-projects: github.com/ansible/ansible ... in this case, I want to test stable/rocky of openstacksdk against stable-2.6 of ansible - so I use override-checkout to tell zuul about that
09:21 <mordred> you could also use it if you wanted to define an additional job that tested your rel_1.1.1 of repo1 against master of repo2 - to verify that a change worked both with a release and had future upwards compat, for instance
09:22 <noorul> does it support wildcard?
09:24 <mordred> no - only direct values. however - the repos all have all of their branches in correct state, so if you need to get more clever, you can always do a git checkout in a job
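Putting the thread together, a hypothetical job definition (the job name and repo2's path are made up for noorul's example; the ansible pin mirrors mordred's openstacksdk case):

```yaml
- job:
    name: repo1-integration
    required-projects:
      # checked out at the branch the change targets (rel_1.1.1 here),
      # falling back to master if that branch does not exist
      - org/repo2
      # pinned to a specific branch regardless of the change's target
      - name: github.com/ansible/ansible
        override-checkout: stable-2.6
```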
09:24 <mordred> https://opendev.org/opendev/system-config/src/branch/master/.zuul.yaml#L192-L229 <-- this is an example of a job where we want stable-2.15 of a bunch of gerrit repos on patches to master of opendev/system-config
09:25 <noorul> I see
09:25 <noorul> thanks for that example
09:25 <noorul> I have a very simple job http://paste.openstack.org/show/777072/
09:25 <noorul> but it fails
09:25 <noorul> /bin/sh: 1: ./run_tests.sh: not found
09:26 <noorul> But the file exists
09:26 <mordred> you need to change directories to your repo
09:26 <noorul> inside run_tests.sh?
09:27 <mordred> no - in the job - that shell command is going to be running with cwd of /home/zuul
09:27 <mordred> but your repo will be in something like /home/zuul/src/opendev.org/openstack/repo1
09:27 <noorul> Hmm
09:27 <mordred> (repos are put in golang format on disk inside of the src dir)
09:28 <noorul> Is there an example?
09:28 <mordred> so - http://paste.openstack.org/show/777073/
09:28 <avass> noorul: you probably want to put a chdir: {{ zuul.project.src_dir }} on the shell command
09:29 <mordred> https://opendev.org/opendev/system-config/src/branch/master/playbooks/zuul/gerrit/repos.yaml
09:29 <mordred> yah
09:29 <mordred> {{ zuul.project.src_dir }} is great for this case
09:50 <openstackgerrit> Ian Wienand proposed zuul/zuul master: [dnm] testing python3 ansible  https://review.opendev.org/682556
09:54 <noorul> mordred: http://paste.openstack.org/show/777075/
10:03 <avass> noorul: found unacceptable key (unhashable type: 'AnsibleMapping')
10:03 <avass> noorul: https://docs.ansible.com/ansible/2.5/user_guide/playbooks_variables.html#hey-wait-a-yaml-gotcha
10:07 <avass> noorul: The jinja expression needs to be put in quotes, otherwise ansible will think it's a yaml dictionary
10:07 <avass> noorul: But only if the value starts with an expression
10:08 <avass> So it should have been chdir: "{{ zuul.project.src_dir }}", my fault :)
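The corrected task, combining mordred's chdir advice with avass's quoting fix (the playbook structure around the task is illustrative):

```yaml
- hosts: all
  tasks:
    # Quoting matters: a value that *starts* with "{{ ... }}" must be
    # quoted, or YAML parses the braces as the start of a dictionary.
    - shell: ./run_tests.sh
      args:
        chdir: "{{ zuul.project.src_dir }}"
```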
10:16 <noorul> avass: Got it
10:17 <noorul> Is it possible to define an ansible role in a non-trusted project?
10:24 <mordred> absolutely
10:27 <openstackgerrit> Ian Wienand proposed zuul/zuul master: [dnm] testing python3 ansible  https://review.opendev.org/682556
12:07 <openstackgerrit> Jan Kubovy proposed zuul/zuul master: Evaluate CODEOWNERS settings during canMerge check  https://review.opendev.org/644557
12:59 <mattymo> Anyone familiar with nodepool that could help me out? I can get nodepool to do rhel7 registration just fine, but it unregisters at the end of build. That's okay. I want now to do RH registration when nodepool-launcher launches an openstack instance
12:59 <mattymo> or is this something beyond nodepool's scope?
13:03 <pabelanger> mattymo: yes, you need to set up a zuul pre-run playbook to do that
13:03 <pabelanger> nodepool no longer has the ability to modify a node at launch time, only zuul can
13:03 <mordred> what an interesting use case ...
13:04 <pabelanger> mattymo: other option, is create some sort of boot script, that does it when the node first launches
13:04 <pabelanger> we do that today with some dns things in opendev, and for ansible-network setting up network appliance config
13:07 <mordred> pabelanger: this seems like a good description of a type of action that might (or might not) be worth pondering. because the activity in question is tied to a nodeset / label and isn't really job specific. I don't think I'd pondered the rhel activation use case before (I mean - obviously can do with a pre-run - it's just an interesting case to consider)
13:09 <fungi> it also could be the case that you want to restrict to some specific maximum number of rhel nodes running simultaneously because you only have a certain number of licenses? if so that starts to look a lot like a (perhaps label-specific) quota
13:10 <mattymo> pabelanger, is that dns boot script on a public git repo?
13:10 <fungi> i wonder if a nodepool driver shim might be a way to solve it
13:11 <pabelanger> mordred: yup, there are likely a few network appliance related things we can lump into that too. X that needs to be accomplished to finish the image build process. For now, we solved that with pre-run jobs in zuul, but it is complicated to control which playbooks run via hosts
13:13 <pabelanger> mattymo: https://opendev.org/openstack/project-config/src/branch/master/nodepool/elements/nodepool-base/finalise.d/89-unbound#L138
13:14 <tristanC> corvus: is it ok if I +3 the pagure patches from fbo you already +2?
13:14 <mattymo> thankfully in my case I have plenty of licenses
13:14 <mattymo> I just don't want creds to ever get stored on the target host
13:15 <mattymo> but the way my deployment runs, ansible runs only on the target host
13:19 <brendangalloway> I'm writing a job that redeploys a static host (by triggering a pxe boot) in the middle
13:20 <brendangalloway> However, I'm getting a failure on wait_for_reconnect timing out even though the host has come back up after the reinstall
13:20 <pabelanger> mattymo: yah, in that case, might be better to have zuul do that step. So, you can store that data as a secret in zuul
13:20 <brendangalloway> Running the ansible role outside of zuul succeeds, and I am able to reestablish a connection and continue the job
13:20 <pabelanger> then hope (which I haven't figured out or even tested) that the job doesn't try to leak the license info
13:22 <brendangalloway> Any suggestions on debugging the state of the executor at the time of the failure to figure out what is going wrong?
13:23 <pabelanger> brendangalloway: I'm not sure what wait_for_reconnect is, is that something you created?
13:23 <pabelanger> or are you using wait_for_connection
13:24 <brendangalloway> pabelanger: yes, wait_for_connection
13:25 <pabelanger> brendangalloway: maybe first use wait_for to ensure the SSH port is open?
13:25 <pabelanger> we do that today, and it works well
13:25 <brendangalloway> ok, let me try that
13:33 <openstackgerrit> Fabien Boucher proposed zuul/zuul master: Pagure - add support for git.tag.creation event  https://review.opendev.org/679938
13:37 <pabelanger> mordred: do you remember a time, where zuul might have been running multiple ansible shell tasks, on the same node from nodepool, at the same time?  For some reason I thought that was a problem a while back.  Basically, I've seen something odd, where we potentially have a shell task process running twice in zuul.a.c
13:37 <openstackgerrit> Fabien Boucher proposed zuul/zuul master: Pagure - add support for git.tag.creation event  https://review.opendev.org/679938
13:38 <pabelanger> I want to say, it was something to do with the version of command that zuul shipped for ansible?
14:12 <openstackgerrit> Tristan Cacqueray proposed zuul/zuul master: Disable rsh synchronize rsync_opts  https://review.opendev.org/682657
14:16 <mordred> pabelanger: I vaguely remember something something but also don't remember. clarkb ?
14:17 <corvus> clarkb: ^ 682657
14:18 <clarkb> corvus: tristanC done
14:18 <clarkb> mordred: pabelanger I don't recall
14:24 <corvus> tristanC: re pagure, yes -- https://review.opendev.org/679938  was the only thing i was worried about.  i just left a comment on that (cc fbo)
14:25 <mordred> corvus: I left you a comment on your robot comments patch - possibly just to prove I'm actually reading these patches
14:31 <corvus> mordred: excellent question
14:32 <corvus> mordred: i made an answer
14:34 <mordred> corvus: cool
14:37 <corvus> clarkb: do you think you could take a look soon at the gerrit stack starting at https://review.opendev.org/681132 ?
14:38 <clarkb> corvus: yes after my morning meeting I can take a look
14:38 <mordred> clarkb: I think you'll enjoy it
15:08 <openstackgerrit> Merged zuul/zuul master: Disable rsh synchronize rsync_opts  https://review.opendev.org/682657
15:28 <openstackgerrit> David Shrewsbury proposed zuul/zuul master: Add scheduler max_hold_age config option.  https://review.opendev.org/682675
15:38 <openstackgerrit> David Shrewsbury proposed zuul/zuul master: Mark nodes as USED when deleting autohold  https://review.opendev.org/664060
15:38 <openstackgerrit> David Shrewsbury proposed zuul/zuul master: Auto-delete expired autohold requests  https://review.opendev.org/663762
15:39 <brendangalloway> pabelanger: I'm trying to use wait_for to check that the ssh service has come back up, but I can't execute the task on the executor.  Not sure I correctly understood your suggestion
15:41 <pabelanger> brendangalloway: do we block it? Which error are you seeing? In our multi node setup, we run it from the nested ansible node to check if the 2nd node is online
15:41 <Shrews> tristanC: I don't really follow your comments on https://review.opendev.org/679057. Why is the tenant needed to get or delete an autohold via the web API?  I was following mhu's instructions there and I don't quite get why that's required (obviously not required via CLI).
15:41 <brendangalloway> pabelanger: "msg": "Executing local code is prohibited"
15:42 <pabelanger> k, so in that case you need to move it to a trusted playbook or maybe run it from nested ansible
15:43 <Shrews> corvus: is https://review.opendev.org/682675 what you had in mind to deal with the node expiration issue?
15:43 <brendangalloway> We're running a single node here - we asked previously about issues trying to run jobs with both static and openstack nodes.
15:44 <pabelanger> right, so the single node is still online, is that right? it is the static node you are doing a pxe with?
15:46 <brendangalloway> so we don't really have an option to spin up another node in the same job to defer the wait_for to
15:46 <brendangalloway> the single node is going down during the play and being pxe booted
15:47 <tristanC> Shrews: when a tenant REST endpoint is white-label, the user doesn't have access to /api/autohold, all their requests are scoped to /api/tenant/{user-tenant-name}/
15:47 <mordred> why are we blocking wait_for ?
15:47 <pabelanger> ah, so yah in that case you need to move the wait_for into a trusted playbook, to run from the executor
15:47 <brendangalloway> trusted playbooks are only allowed to run in a post environment, correct?
15:48 <mordred> no - they can run anywhere - their content just isn't executed speculatively - so if you propose a change to one, the change has to land before it takes effect
15:48 <tristanC> Shrews: thus if we have /api/tenant/{tenant}/autohold to list autoholds (at L1105), then we should have /api/tenant/{tenant}/autohold/{id}
15:48 <mordred> but also - I want to see if there is a way we can allow wait_for - because it seems like a sensible thing to want to do
15:50 <mordred> brendangalloway: can you try using wait_for_connection instead?
15:50 <pabelanger> that didn't work
15:50 <brendangalloway> That was my original approach
15:50 <mordred> oh. weird
15:50 <mordred> k.
15:51 <pabelanger> wait_for, was to scan to see if the port was open
15:51 <clarkb> corvus: https://review.opendev.org/#/c/682487/ is the stack you were asking for review on right?
15:51 <pabelanger> then try wait_for_connection
15:51 <pabelanger> brendangalloway: next option would be a shell ssh-keyscan / loop
15:51 <brendangalloway> but it's getting the 'unable to ssh' error
15:51 <pabelanger> but with the executor, that is going to be blocked too
15:51 <pabelanger> IIRC
15:51 <brendangalloway> wait_for being allowed to execute would be ideal
15:52 <corvus> clarkb: that's the end, https://review.opendev.org/681132 is the start
15:52 <corvus> Shrews: generally yes -- i'll take a detailed look in a few
15:54 <brendangalloway> pabelanger: That would also still need to be executed by trusted or a third party?
15:54 <pabelanger> brendangalloway: yah, in this case, you are going to have a very limited way to check in an untrusted job
15:54 <mordred> brendangalloway: yeah - I think allowing wait_for to be used makes sense - it might be tomorrow before I can get enough headspace to dive in to the action plugin exclusions and figure it all out
15:55 <pabelanger> what I'd suggest, is figure out how to do the wait check, move that job into trusted, then parent to it from the untrusted job
15:55 <mordred> pabelanger, brendangalloway: what didn't work with wait_for_connection (curious)
15:56 <brendangalloway> pabelanger: that could be an option.  Luckily the reformat is the first role called and once it's working it shouldn't need much changing
15:57 <pabelanger> not sure, I'm guessing it's something with the socket from executor to node?
15:57 <brendangalloway> mordred: "msg": "SSH Error: data could not be sent to remote host "<redacted>". Make sure this host can be reached over ssh"
15:57 <pabelanger> wait
15:57 <pabelanger> so, if a new host is coming online with pxe boot
15:57 <pabelanger> it will have new hostkeys
15:57 <pabelanger> and zuul-executor hasn't accepted them
15:58 <brendangalloway> the same playbook works when I run it from my laptop, and once nodepool sees the node, zuul accesses them just fine
15:58 <pabelanger> brendangalloway: are you preserving hostkeys?
15:58 <brendangalloway> pabelanger: that's already been solved
15:58 <pabelanger> kk
15:58 <brendangalloway> yes
15:58 <pabelanger> I wonder if ssh-agent hasn't timed out or something
15:59 <pabelanger> brendangalloway: maybe try using meta reset_connection before wait_for_connection?
15:59 <pabelanger> that would cause ansible-playbook to create a new connection again
16:02 <brendangalloway> Is there some way I can introduce a delay on that?  I'd like to wait a minute or two to make sure the connection is properly down before resetting and waiting for the connection
16:03 <pabelanger> wait_for_connection has delay / sleep / timeout settings
16:03 <pabelanger> same with wait_for
16:03 <pabelanger> but you can also use a pause task to hardcode a delay too
16:04 <brendangalloway> so wait_for_connection with ignore errors, reset connection, wait_for_connection again?
16:05 <pabelanger> reboot node, reset connection, wait_for_connection (delay / sleep / timeout)
16:06 <mordred> I look away for a second and come back to a very fun scrollback
16:06 <brendangalloway> when I call reset connection, will it just drop the connection until the next task?  Or will it try to reconnect as part of the reset?
16:06 <pabelanger> it will reconnect
16:06 <pabelanger> on the next task
16:06 <brendangalloway> sounds perfect, I will test that then
16:07 <pabelanger> I'm starting to think, when the node is booting up again, something is going on with sshd on the server side. And maybe the connection is reset or something
16:07 <pabelanger> why I like wait_for, is you can look for SSHd headers in the connection attempt
16:07 <pabelanger> eg: https://github.com/ansible/ansible-zuul-jobs/blob/master/playbooks/ansible-network-appliance-base/pre.yaml#L25
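The sequence pabelanger outlines could look like this in a playbook; the timeout values are illustrative, not prescriptive:

```yaml
# after the task that triggers the PXE reboot of the node
- meta: reset_connection   # drop the cached SSH connection

- wait_for_connection:
    delay: 120       # give the host time to actually go down first
    sleep: 10        # seconds between connection attempts
    timeout: 1800    # PXE reinstalls can take a while
```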
16:09 <brendangalloway> we do a normal reboot at other stages in the play and wait_for_connection works fine
16:12 <noorul> Where does ansible push the code to the remote node?
16:14 <clarkb> noorul: https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/prepare-workspace-git/README.rst that is the role that should be used and typically as an early action of a base job
16:15 <corvus> noorul: https://zuul-ci.org/docs/zuul/admin/quick-start.html#configure-a-base-job also has more info -- you might remember doing that part when you ran through the quickstart
16:15 <clarkb> corvus: left a question on https://review.opendev.org/#/c/681132/
16:15 <corvus> clarkb, noorul: the quickstart uses 'prepare-workspace' -- maybe we should change it to prepare-workspace-git ?
16:16 <clarkb> corvus: ++ prepare-workspace-git will handle presence of a cache or no cache
16:16 <noorul> clarkb: I am using prepare-workspace
16:16 <clarkb> it is more flexible
16:17 <corvus> clarkb: yes.  Do you want that in the form of a new patchset or followup?
16:17 <clarkb> corvus: considering there are already a few moving parts and we likely need a full stack to do anything useful a followup is probably fine
16:18 <corvus> clarkb: cool.  i'll stage the fix that way and wait for you to finish the stack before pushing it up.
16:28 <openstackgerrit> David Shrewsbury proposed zuul/zuul master: Add autohold delete/info commands to web API  https://review.opendev.org/679057
16:34 <openstackgerrit> David Shrewsbury proposed zuul/zuul master: Remove outdated TODO  https://review.opendev.org/682421
16:35 <clarkb> corvus: left some thoughts on the big change https://review.opendev.org/#/c/680778
16:35 <corvus> heh, i just replied to mordred's comment on that, i'll look at clarkb's now
16:47 <clarkb> corvus: https://review.opendev.org/#/c/681936/ I think I found a bug in that change
16:47 <clarkb> (and I -1'd because I think merging it as is would break opendev)
16:48 <corvus> clarkb: ++
16:54 <SpamapS> TIL that if you have a directory in your roles path that just has a README.rst in it, Ansible will still consider that a "role", and when you depend on that role, it will happily consider it having run successfully by not doing anything at all.
16:54 <clarkb> corvus: and comment on https://review.opendev.org/#/c/682487 I think all of my comments but the one on 681936 can be addressed as followups if you prefer
16:54 <SpamapS> This seems like.. a less than ideal default.
16:57 <corvus> clarkb: what do you think of my response on that one?
16:57 <paladox> corvus robot comments are in 2.15.0. At least i don't remember anyone adding it in a point release :)
16:58 <corvus> paladox: cool, i'll set it to >=2.15.0 then
16:58 <openstackgerrit> James E. Blair proposed zuul/zuul master: Add enqueue reporter action  https://review.opendev.org/681132
16:58 <paladox> corvus https://github.com/GerritCodeReview/gerrit/commit/3fde7e4e75f4653e4a56e6c38bc7718a3280bd9c#diff-a9ed91a039490d5fed094853d96608fe
16:58 <openstackgerrit> James E. Blair proposed zuul/zuul master: Add no-jobs reporter action  https://review.opendev.org/681278
16:58 <openstackgerrit> James E. Blair proposed zuul/zuul master: Add report time to item model  https://review.opendev.org/681323
16:58 <openstackgerrit> James E. Blair proposed zuul/zuul master: Add Item.formatStatusUrl  https://review.opendev.org/681324
16:58 <openstackgerrit> James E. Blair proposed zuul/zuul master: Add support for the Gerrit checks plugin  https://review.opendev.org/680778
16:58 <openstackgerrit> James E. Blair proposed zuul/zuul master: Update gerrit pagination test fixtures  https://review.opendev.org/682114
16:58 <openstackgerrit> James E. Blair proposed zuul/zuul master: Support HTTP-only Gerrit  https://review.opendev.org/681936
16:58 <openstackgerrit> James E. Blair proposed zuul/zuul master: Add autogenerated tag to Gerrit reviews  https://review.opendev.org/682473
16:58 <openstackgerrit> James E. Blair proposed zuul/zuul master: Use robot_comments in Gerrit  https://review.opendev.org/682487
corvusclarkb: those exceeded my own threshold for followups, so i amended the original commits and redid the whole stack16:58
clarkbcorvus: wfm16:59
paladoxit's actually in 2.14 based on that commit!16:59
*** jpena is now known as jpena|off16:59
clarkbcorvus: in https://review.opendev.org/#/c/682487/3..4/zuul/driver/gerrit/gerritconnection.py you updated the > to >= but left the version as 2.15.0 instead of 2.15.16. The rest of the stack lgtm now17:03
corvusclarkb: yeah did that based on paladox's comment above17:04
clarkboh /me catches up on irc17:04
clarkbaha thanks17:04
corvuswhich came right after i left the response in gerrit, sorry i didn't update17:04
clarkbin that case I think the whole stack is good17:05
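The fix discussed above is the difference between a strict and an inclusive version comparison: with `>` the feature gate wrongly excludes the very release that introduced the feature. A hedged Python sketch (parse_version, supports_robot_comments, and the 2.15.0 floor are illustrative stand-ins, not the actual zuul gerritconnection code):

```python
def parse_version(s):
    # "2.15.16" -> (2, 15, 16); ignore any suffixes after '-' for simplicity
    return tuple(int(p) for p in s.split("-")[0].split("."))

# Assumed minimum per the discussion above (paladox notes it may even be 2.14)
ROBOT_COMMENTS_MIN = (2, 15, 0)

def supports_robot_comments(version_string):
    # '>=' is the fix: 2.15.0 itself has the feature, so a strict '>'
    # would incorrectly report False for exactly that release
    return parse_version(version_string) >= ROBOT_COMMENTS_MIN

print(supports_robot_comments("2.15.0"))  # -> True (would be False with '>')
```

Tuple comparison gives correct lexicographic ordering here, avoiding the classic string-comparison trap where "2.9" sorts after "2.15".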
Shrewscorvus: been wanting to review that stack, but been dealing with a stack of my own  :/17:06
ShrewstristanC: you want to review https://review.opendev.org/682675 that deals with the node expiration issue?17:10
*** brendangalloway has quit IRC17:13
tristanCShrews: left a comment17:18
ShrewstristanC: Not sure what you're asking there. That code should act the same if the user supplied 0 or not17:20
tristanCShrews: if the user supplied 0, then the code defaults to the scheduler configuration value17:21
tristanCShrews: e.g. shouldn't we differentiate a supplied 0 (explicit no expiration) from the default cli value?17:21
ShrewstristanC: oh, well, that brings up a good question. Would we ever want a user supplied value to exceed our zuul's configured max? If we've configured a max of 2 days, would we want to allow something greater than that?17:25
tristanCShrews: at the moment you can, but you would have to use a silly "--hold-expiration 99999999" to ensure a value greater than zuul's configured max17:26
ShrewstristanC: right. either way, yes, that needs fixing. but i think that question needs to be answered before i can fix it properly17:26
tristanCShrews: I guess supplied expiration should never exceed what is configured in zuul17:27
tristanCShrews: or perhaps it could on the cli, but not from the rest endpoint17:27
Shrewscorvus: after our call, maybe you have an opinion on that ^^^17:30
*** michael-beaver has quit IRC17:40
openstackgerritMerged zuul/zuul-jobs master: Update the base-roles test to use prepare-workspace-git  https://review.opendev.org/68070317:47
openstackgerritMerged zuul/zuul-jobs master: Clean non-bare remote repos  https://review.opendev.org/68068917:47
*** recheck has quit IRC17:49
*** recheck has joined #zuul17:53
*** themroc has quit IRC17:54
*** igordc has joined #zuul18:16
corvusShrews, tristanC: i don't think we want to differentiate cli vs rest -- i believe the cli is expected to move to use rest eventually anyway.  the current nodepool setting is a true max -- it can't be exceeded.18:21
corvusShrews, tristanC: max_hold_expiration should probably match that.  maybe we want to add another option though, a default?  for the situations where you might not want to set a hard limit, but you don't want the default to be unlimited.18:22
corvuser, 'max_hold_age'18:22
*** bhavikdbavishi has quit IRC18:30
Shrewscorvus: ok. i can rework it for that18:41
Shrewsdefault_hold_expiration / max_hold_expiration is probably clearest18:47
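The resolution corvus and Shrews settle on, a configured default plus a hard maximum for hold expiration, can be sketched as follows. This is illustrative only: effective_hold_expiration is a hypothetical helper, and clamping an over-limit request to the max (rather than rejecting it) is an assumption, not necessarily what nodepool does:

```python
def effective_hold_expiration(requested, default, maximum):
    """Resolve a node hold expiration in seconds.

    - 0 or None means "not specified" -> fall back to the configured
      default (default_hold_expiration in the naming from the discussion)
    - the result never exceeds the configured maximum
      (max_hold_expiration); a falsy maximum means "no hard limit"
    """
    value = requested if requested else default
    if maximum and value > maximum:
        value = maximum  # clamp silently; rejecting would be the alternative
    return value
```

With this shape there is no need for the silly `--hold-expiration 99999999` trick: any oversized request simply resolves to the configured max, and an unset request gets a sane default instead of "unlimited".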
corvusmordred: i updated the gerrit stack, so if you have a sec to re-review the updated changes, that'd be swell18:55
*** armstrongs has joined #zuul19:09
*** armstrongs has quit IRC19:15
*** kerby has joined #zuul19:21
*** bolg has quit IRC19:38
*** spsurya has quit IRC19:48
*** sean-k-mooney has quit IRC20:17
*** sean-k-mooney has joined #zuul20:25
*** panda|ruck is now known as panda|ruck|off20:29
*** kerby has quit IRC20:43
*** Goneri has quit IRC20:46
openstackgerritIan Wienand proposed zuul/zuul master: [dnm] testing python3 ansible  https://review.opendev.org/68255620:53
*** kerby has joined #zuul20:56
*** hashar has quit IRC20:57
*** pcaruana has quit IRC20:58
corvuszuul-maint: i pushed up https://review.opendev.org/682743  regarding  http://lists.zuul-ci.org/pipermail/zuul-discuss/2019-September/001017.html20:59
openstackgerritIan Wienand proposed zuul/zuul master: [dnm] testing python3 ansible  https://review.opendev.org/68255621:15
openstackgerritJames E. Blair proposed zuul/zuul master: Add support for the Gerrit checks plugin  https://review.opendev.org/68077821:15
openstackgerritJames E. Blair proposed zuul/zuul master: Update gerrit pagination test fixtures  https://review.opendev.org/68211421:15
openstackgerritJames E. Blair proposed zuul/zuul master: Support HTTP-only Gerrit  https://review.opendev.org/68193621:15
openstackgerritJames E. Blair proposed zuul/zuul master: Add autogenerated tag to Gerrit reviews  https://review.opendev.org/68247321:15
openstackgerritJames E. Blair proposed zuul/zuul master: Use robot_comments in Gerrit  https://review.opendev.org/68248721:15
*** rfolco|dentist is now known as rfolco21:23
ianwSpamapS / mordred: well it seems 10+ hours of debugging that python3 failure has come down to a single "b" character :)21:37
SpamapSianw: of course it has21:52
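A plausible reading of the "single b" fix: under Python 3, formatting a bytes object into a str produces a literal b'...' wrapper, which can silently corrupt a line-oriented log-streaming protocol that worked fine under Python 2 (where str and bytes were the same type). The sketch below illustrates that class of bug; frame_log_line_buggy/frame_log_line_fixed are hypothetical, not the actual zuul_stream code:

```python
def frame_log_line_buggy(payload):
    # On Python 3, if payload is bytes this emits "b'...'" into the stream
    return "%s\n" % payload

def frame_log_line_fixed(payload):
    # Decode bytes first -- often a one-character ("b") diff in practice,
    # e.g. changing a literal or adding/removing a bytes conversion
    if isinstance(payload, bytes):
        payload = payload.decode("utf-8")
    return "%s\n" % payload

print(repr(frame_log_line_buggy(b"ok")))   # "b'ok'\n" -- corrupted framing
print(repr(frame_log_line_fixed(b"ok")))   # "ok\n"
```

Bugs like this only surface when the code path actually runs under Python 3, which is consistent with it reproducing in the zuul-tox-remote gate but not in local testing.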
openstackgerritJames E. Blair proposed zuul/project-config master: Add a third-party check pipeline to OpenDev  https://review.opendev.org/68275621:52
corvusoops ^ wrong repo :)21:59
corvusi've pushed HEAD as 3.10.2 for the security fix22:09
*** jamesmcarthur has quit IRC22:13
openstackgerritMerged zuul/zuul master: Add enqueue reporter action  https://review.opendev.org/68113222:23
openstackgerritJames E. Blair proposed zuul/zuul master: Move reference pipelines out of the quickstart  https://review.opendev.org/68276022:36
corvuswe should merge ^ asap -- i'm not sure if it's causing the quick-start instability, but it's certainly not helping debug it and it's not doing new users any favors22:37
clarkbhrm do zuul release notes only update when changes merge and not when we tag things?22:41
clarkbI'm noticing that 3.10.2 isn't in the release notes yet22:41
clarkb(still shows under in development)22:41
clarkbthere is a change in the gate so I guess we will know soon if that merges and updates release notes22:42
corvusclarkb: yep. https://zuul-ci.org/docs/zuul/3.10.2/releasenotes.html is correct, but master is not22:42
corvusuntil a change lands22:42
clarkbaha22:42
clarkbthanks!22:42
corvusunintended consequence of promote22:42
corvusoh wait, we don't promote on tag22:42
corvusso yeah, i guess we need a "rebuild master docs" job on release22:43
corvusso we'd build it twice, once for the tag, and once so that master sees the new tag22:43
corvusbut we can't use the same build for both because the tag may be behind master22:43
corvusi think this is a peculiarity of projects that put their release notes in their docs, which isn't most of the reno users?22:43
clarkbI think openstack reno usage publishes the release notes independent of the docs22:44
corvusi sent the release announce email22:45
fungiyes, openstack projects run a separate release notes job22:46
fungiindependent of docs jobs for the same repos22:46
corvusi think we should be able to fix it with some override-checkout mojo22:47
corvusi'm too braindead to work it out myself right now :)22:47
fungii don't think it's especially urgent, no22:48
clarkbya the next update to master will fix it22:48
clarkband that will happen shortly according to the zuul status page22:48
* mnaser just barging in with ideas22:48
mnaserhow different/far is bwrap's security model from docker and friends22:49
mnaseri.e. could we technically have a k8s native zuul-executor "driver" to use pods instead of bwrap22:49
mnaser(coming in from a security approach and not as much of a "how much code has to be done")22:50
clarkbI think it's quite a bit different from how people often deploy k8s but not so different from openshift's locked down pods22:50
clarkbfor example k8s on google by default gave every pod admin access to the account in the past (I think they have changed that since)22:50
mnaseroof22:50
mnaseri was thinking more on the container side of things, that stuff could technically be locked down via serviceaccount/rolebindings22:51
SpamapSmnaser: bubblewrap as zuul uses it is pretty locked down. clarkb's assessment matches my own.22:51
clarkbspecifically it is there to limit blast radius if you manage to do something on the executor via ansible in an untrusted job22:51
fungiyeah, at best docker and pals would be no better security-wise (they rely on the same kernel features after all)22:52
SpamapSEssentially, if you can break out of bwrap, you probably have a local kernel root.22:52
clarkband often times the way k8s is deployed means pods are trusted to interact with the rest of the system22:52
clarkbwe want the opposite of that22:52
SpamapShm, that's not been my experience22:52
mnaserright but there's a lot of stuff now to avoid exactly that (networkpolicy, securitycontext, etc)22:52
mnaseri was just trying to compare the container vs bwrap aspect, if in some weird/odd way you could have pods _only_ instead of zuul-executors with bwraps inside them22:53
mnaserthen you can start scaling things out far more because you're not locking down an executors 'bwrap' to a single host22:53
SpamapSUnless you allow privileged: true, my experience has been that most k8s nodes are set up to be pretty safe from container escape.22:53
mnaserSpamapS: my only annoyance is the fact all services are exposed to everything by default22:54
mnaserbut networkpolicy can work around that easily22:54
clarkbSpamapS: cloud providers like gke put cloud authentication stuff in those pods though22:54
clarkbSpamapS: and from that you change the config to allow privileged true and win22:54
SpamapSmnaser: that's life, IMO. zero trust networking is the cloud model that we follow. Everything requires auth.22:54
fungiwell, part of the problem for scaling it to hosts other than the executor is that we share files directly into the bubblewrap containers22:54
clarkbI think its more of a make things easy for users problem in popular deployments22:54
corvusmnaser: it's come up before.  i think it would be possible, perhaps desirable for various non-security purposes to have a container-based executor, but it's going to take a lot of planning and implementation.  we're talking a pretty big spec, and it's absolutely going to depend on the successful completion of the zuul-operator spec.  in the mean time, i agree with clarkb and SpamapS that it wouldn't be a22:55
corvussecurity win.  the more compelling reasons to do that have to do with, honestly, things like better integration with openshift.22:55
clarkbnot inherent to k8s, and openshift for example locks it down22:55
mnaseron the subject of the zuul-operator, i am kinda under a very pressing deadline so i have been taking time to set things up and see what a golang based operator looks like22:55
corvus(and yeah, fungi has hit on one of the fundamental design problems to be overcome)22:56
mnaserand ive discovered more useful things like the ability to literally embed another operator that uses operator-sdk22:56
clarkbhttps://cloud.google.com/kubernetes-engine/docs/concepts/security-overview#securing_instance_metadata for gke docs on the subject22:56
mnaserso why ask the user to install the zookeeper operator when you can quite literally just include it as a dependency _inside_ your operator and it will start managing zookeeper, so 1 operator for everything22:56
corvusmnaser: well, the spec addresses that with https://github.com/operator-framework/operator-lifecycle-manager22:57
mnaserand im not talking about an extra pod, im talking about the apis and controllers living _inside_ the zuul-operator22:57
corvusmnaser: would embedding be a better approach?22:57
mnaseryeah, but this brings up a whole bunch of other things; the zuul-operator would manage Zookeeper types inside its namespace22:57
mnaseri.e.: to get started, simply install the zuul-operator. and that's it.  you're done.22:58
corvusmnaser: i thought that would come out of the lifecycle manager too?22:58
mnaserno OLM -- or -- "if you dont want to use OLM.... make sure you have the zookeeper and the pxc and the etc"22:58
mnaserit sounds like OLM is actually an extra component that will make it so you have 3 operators running in your namespace, whereas in this case, you have one22:59
mnaserthis means you dont need anything _except_ the zuul-operator running22:59
SpamapSStill22:59
mnaseralso some other fun things you can do with golang that you couldnt with the ansible one, i actually can do things like use the provided github credentials23:00
SpamapSthat's highly optimized23:00
mnaserpoll github23:00
SpamapSI respect the desire to optimize it23:00
SpamapSbut there's a whole community to think about23:00
mnaserand pull down repositories it's installed into23:00
SpamapS98% awesome, and maintainable by ansible-knowing folks is better than 100% awesome but only 10% of Zuul users can approach it.23:00
mnaserso no need to necessarily list out all the repos you are using the app for (obviously some might want to explicitly list it, but it does simplify life in that manner)23:00
mnaseri totally understand23:01
mnaseranyways, i'd be happy to show what i have at some point if there's interest, i don't think i'd probably want to throw it out there to cause confusion for those seeking a zuul operator23:02
*** tosky has quit IRC23:11
*** jamesmcarthur has joined #zuul23:24
openstackgerritMerged zuul/zuul master: Move reference pipelines out of the quickstart  https://review.opendev.org/68276023:25
*** mattw4 has quit IRC23:34
*** igordc has quit IRC23:39
*** jamesmcarthur has quit IRC23:50
*** rlandy has quit IRC23:51

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!