tristanC | johanssone: i'm no pip expert, though you can grab the web stuff from http://tarballs.openstack.org/zuul/zuul-content-latest.tar.gz | 00:36 |
*** yolanda has joined #zuul | 03:44 | |
*** yolanda_ has quit IRC | 03:46 | |
*** swest has joined #zuul | 04:35 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Change test prints to log.info https://review.openstack.org/554058 | 05:38 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix logging in tests to be quiet when expected https://review.openstack.org/554054 | 05:38 |
*** Rohaan has joined #zuul | 05:39 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix logging in tests to be quiet when expected https://review.openstack.org/554054 | 05:44 |
tobiash | mordred: I took over your logging changes, hope that's ok | 05:44 |
*** hashar has joined #zuul | 05:52 | |
*** AJaeger has quit IRC | 05:54 | |
*** AJaeger has joined #zuul | 05:58 | |
openstackgerrit | Merged openstack-infra/zuul master: Add timestamps to multiline debug message logs https://review.openstack.org/572845 | 06:06 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix logging in tests to be quiet when expected https://review.openstack.org/554054 | 06:22 |
*** pcaruana has joined #zuul | 06:26 | |
Rohaan | tristanC: ping | 06:45 |
tristanC | Rohaan: Hey! | 06:48 |
Rohaan | tristanC: I was trying to install rh-python35-python-openshift, but yum complains : No package rh-python35-python-openshift available. Which repository needs to be enabled for this package? | 06:55 |
*** bhavik1 has joined #zuul | 06:59 | |
tristanC | Rohaan: it's only in sf-master, it will be part of the next release. you can try it out using this repo: http://softwarefactory-project.io/repos/sf-release-master.rpm | 07:01 |
tristanC | Rohaan: or you can install using "pip install openshift" as root | 07:01 |
* Rohaan checks | 07:05 | |
*** bhavik1 has quit IRC | 07:11 | |
Rohaan | tristanC: Did you try this with a local openshift instance or some hosted openshift instance? | 07:49 |
*** jpena|off is now known as jpena | 07:50 | |
tristanC | Rohaan: only with a dedicated all-in-one "oc cluster up" setup | 07:50 |
Rohaan | I tried with an Openshift instance which we get by subscribing to Openshift.io, but it seems we don't have access to create namespaces there :( | 07:52 |
Rohaan | Now I'm trying with a local openshift cluster on my machine. | 07:52 |
tristanC | argh, yes, this proposed implementation needs to be able to create projects and use the kubectl ansible connection plugin to run tasks on the pods | 07:53 |
Rohaan | I'm running sfactory in a centos instance in Virtualbox | 07:53 |
tristanC | well i wrote another driver that creates pods in an existing project, but that's not part of the current container spec implementation in zuul; fwiw this driver is: https://review.openstack.org/#/c/535557/ | 07:54 |
tristanC | Rohaan: you can run oc cluster up on the zuul host or on a separate instance. I've tried both the fedora origin package and the upstream release binary on centos too | 07:55 |
Rohaan | Oh, would it be okay if I do that on the Zuul host? | 07:56 |
Rohaan | lemme try | 07:56 |
tristanC | yes sure, just grab the server binary from https://github.com/openshift/origin/releases and run it alongside zuul | 07:57 |
rcarrillocruz | tristanC: the creation of the openshift node for a given job will be in its project. How is that creation done: is it a template where you just plumb in the node type from nodepool, or is it created within python code? I assume the latter... | 08:06 |
rcarrillocruz | asking cos...it would be VERY interesting if the driver could allow pushing a $deployment_template, thinking of networking use cases where we spin up a topology per a given job | 08:07 |
*** Wei_Liu has joined #zuul | 08:09 | |
tristanC | rcarrillocruz: afaiu, it's better to create a new project for each PR so that they each get an isolated image registry | 08:11 |
tristanC | rcarrillocruz: the current driver i propose creates the project and service account using the python sdk (see the sketch below) here: https://review.openstack.org/#/c/570667/3/nodepool/driver/openshift/provider.py L88 | 08:11 |
rcarrillocruz | agreed, not saying the opposite. What I'm asking is whether the driver is, I assume, just addressing the creation of the PR node | 08:11 |
rcarrillocruz | in terms of how you are developing, wondering if you create with python or you create a YAML template and then push with the kubectl/openshift plugin | 08:12 |
rcarrillocruz | so python | 08:12 |
rcarrillocruz | gotcha | 08:12 |
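For illustration, creating a project with the python sdk looks roughly like this — a hedged sketch using the openshift dynamic client; the project name is hypothetical, and the real code is at the review tristanC links:

```python
# Sketch only -- the actual driver code is at review 570667. Assumes the
# "openshift" Python SDK and a valid ~/.kube/config for the nodepool user.
from kubernetes import config
from openshift.dynamic import DynamicClient

k8s_client = config.new_client_from_config()   # reads ~/.kube/config
dyn = DynamicClient(k8s_client)

# ProjectRequest is the OpenShift resource used to create a new project.
project_requests = dyn.resources.get(
    api_version='project.openshift.io/v1', kind='ProjectRequest')
project_requests.create(body={
    'apiVersion': 'project.openshift.io/v1',
    'kind': 'ProjectRequest',
    'metadata': {'name': 'zuul-pr-42'},        # hypothetical per-PR name
})
```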
tristanC | rcarrillocruz: then the job starts with access to this namespace; the initial implementation just builds an image and starts a single pod with a sleep command (sketched below): https://review.openstack.org/#/c/570669/2/playbooks/openshift/deploy-project.yaml | 08:13 |
rcarrillocruz | also tristanC , i've started poking at kubevirt and it works, i can get VMs on openshift | 08:13 |
rcarrillocruz | it's VERY intriguing for zuul purposes :-) | 08:13 |
rcarrillocruz | getting to a world where you can have zuul in openshift running containers when possible and VMs when not possible is very interesting | 08:14 |
tristanC | but this openshift-base job could easily be extended to use a more complex deployment recipe, e.g. to deploy a cross-project environment | 08:14 |
tristanC | ultimately, here's what a job to test a container deployment looks like: https://softwarefactory-project.io/draft/zuul-openshift/#orgheadline10 | 08:15 |
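Going by that description, the deploy playbook is on the order of the sketch below; the oc invocations and names are assumptions, and the real tasks are in review 570669:

```yaml
# Hedged sketch: build an image from the checked-out source and start a
# single pod that idles so later playbooks can run tasks on it.
- hosts: localhost
  tasks:
    - name: Build the speculative image from the job workspace
      command: oc start-build demo --from-dir={{ zuul.project.src_dir }}

    - name: Start one pod that sleeps until the job attaches to it
      command: oc run demo --image=demo --restart=Never --command -- sleep infinity
```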
*** electrofelix has joined #zuul | 08:16 | |
tristanC | rcarrillocruz: it seems like kubevirt is another openshift use case that may need another nodepool driver. Does it set up floating ip and ssh access, presumably set up with cloud-init? | 08:18 |
rcarrillocruz | yeah, if you look at the kubevirt demo repo, they spin up a cirros pod by attaching a volume to it for cloud-init consumption | 08:19 |
rcarrillocruz | https://raw.githubusercontent.com/kubevirt/demo/master/manifests/vm.yaml | 08:19 |
tristanC | that's neat :-) | 08:21 |
*** gtema has joined #zuul | 08:28 | |
rcarrillocruz | mordred , corvus : ^ , openshift *can* run VMs | 08:34 |
rcarrillocruz | i tested it over the weekend | 08:34 |
Rohaan | tristanC: Hey, I've installed and logged into openshift. Is there some way to force nodepool to create an empty openshift project again? I was at step 3.1 with the previous openshift instance | 08:51 |
Rohaan | I've tried restarting nodepool/zuul services but that doesn't seem to create it | 08:52 |
tristanC | Rohaan: unfortunately, you can't swap out a nodepool provider like that; you need to first set it to max-servers: 0, then remove it... | 08:56 |
tristanC | Rohaan: since it's a test setup, I think you can "systemctl stop zookeeper; rm -Rf /var/lib/zookeeper/version-2/*; systemctl start zookeeper" | 08:57 |
tristanC | that should purge nodepool database | 08:57 |
openstackgerrit | Fabien Boucher proposed openstack-infra/zuul master: Accumulate errors in context managers - part 2 https://review.openstack.org/574028 | 09:01 |
Rohaan | tristanC: I restarted zookeeper, but now nodepool list returns nothing :( | 09:03 |
tristanC | Rohaan: perhaps restart the nodepool service, or does the openshift-project label have a min-ready value? | 09:05 |
tristanC | (systemctl restart rh-python35-nodepool-launcher) | 09:05 |
Rohaan | yes, I restarted nodepool also | 09:06 |
Rohaan | How can I check if openshift-project label has a min-ready value? | 09:07 |
tristanC | Rohaan: could you share /var/log/nodepool/launcher.log ? | 09:07 |
tristanC | and /etc/nodepool/nodepool.yaml , label definition is in this file | 09:08 |
Rohaan | nodepool.yaml: https://pastebin.com/vSzdLt1k | 09:10 |
Rohaan | nodepool launcher.log : https://pastebin.com/3B0ijhng | 09:10 |
tristanC | Rohaan: oh, you are missing the openshift provider, you need to add https://softwarefactory-project.io/draft/zuul-openshift/#orgheadline9 to /etc/nodepool/nodepool.yaml | 09:12 |
tristanC | no need to restart nodepool, it auto reloads the configuration when it changes | 09:13 |
*** fbo has joined #zuul | 09:14 | |
tristanC | Rohaan: make sure to follow all the steps of that draft zuul-openshift documentation | 09:15 |
Rohaan | ah, sorry | 09:16 |
Rohaan | I had added that in the /root/config/nodepool/openshift.yaml file | 09:17 |
tristanC | Rohaan: then you need to git commit && git push that file to trigger the config-update that will copy the content into /etc/nodepool/nodepool.yaml | 09:18 |
Rohaan | tristanC: Have I added it correctly? https://pastebin.com/8WnhfXmF | 09:22 |
tristanC | Rohaan: that should work, though next time you'd better merge the "providers" and "labels" lists so as not to confuse the yaml parser | 09:24 |
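For reference, the merged shape tristanC suggests looks roughly like this; the entries are illustrative, and the actual openshift provider fields come from the draft doc linked above:

```yaml
# Hedged sketch of a single merged nodepool.yaml: one "labels" list and
# one "providers" list, instead of two separate YAML documents. The
# pre-existing entries and the openshift fields here are illustrative.
labels:
  - name: existing-label        # whatever labels were already defined
  - name: openshift-project
providers:
  - name: existing-provider     # whatever providers were already defined
    driver: openstack
  - name: openshift
    driver: openshift
```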
Rohaan | Now the previous nodepool providers are back, but I still don't see the openshift project :( | 09:29 |
tristanC | Rohaan: could you paste new launcher.log ? | 09:31 |
Rohaan | https://pastebin.com/b52g0sKV | 09:32 |
tristanC | Rohaan: you need to set up nodepool's .kube/config: sudo -u nodepool oc login -u developer https://localhost:8443 | 09:34 |
Rohaan | I remember I did that. but let me try again | 09:35 |
Rohaan | Hm, .kube/config just got created. Earlier I could log in successfully, but I used to get a small "stat .: permission denied" error/warning | 09:40 |
tristanC | Rohaan: hum, perhaps the sudo needs to be run from a public directory, not from /root | 09:41 |
Rohaan | tristanC: Yay, I can now see openshift provider https://pastebin.com/Pcu7Rj54 | 09:47 |
tristanC | Rohaan: nice! | 09:47 |
Rohaan | tristanC: Hey, have I changed .zuul.yaml in the demo project correctly? https://pastebin.com/dpcXLAj5 . Somehow I can't see the job status in the zuul console after doing git review | 10:05 |
tristanC | Rohaan: I think you need to remove the initial "- project" statement and only keep the one that sets the openshift-test job | 10:07 |
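Concretely, the remaining stanza would look something like this — a minimal sketch, with the job name taken from later in this conversation:

```yaml
# Hedged sketch of the corrected .zuul.yaml: one project stanza that
# only adds the openshift-test job to the check pipeline.
- project:
    check:
      jobs:
        - openshift-test
```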
Rohaan | Do I need to create a patch again? Or would amending and doing git review again trigger it? | 10:10 |
Rohaan | Right now I'm just amending it and pushing it again | 10:11 |
tristanC | both actions would work, though it sounds better to amend | 10:12 |
Rohaan | Somehow it's not coming up. Do you know where zuul stores its logs? | 10:15 |
Rohaan | https://ibin.co/44uRhpaq0ztW.png | 10:17 |
Rohaan | tristanC: Any idea? | 10:21 |
tristanC | Rohaan: clicking the "toggle ci" button at the bottom should show zuul configuration errors in gerrit comments | 10:21 |
tristanC | Rohaan: otherwise you need to look at /var/log/zuul/scheduler.log | 10:22 |
Rohaan | tristanC: Thanks. I see it's complaining about base-openshift not being defined https://pastebin.com/jyVEhrXX | 10:24 |
tristanC | Rohaan: you need to copy those files into /root/config/playbooks: https://review.openstack.org/#/c/570669/ | 10:25 |
tristanC | and define the openshift-base job in /root/config/zuul.d/openshift.yaml using this: https://review.openstack.org/#/c/570669/2/zuul.yaml | 10:25 |
tristanC | then commit and push those | 10:26 |
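In outline, the job definition copied into /root/config/zuul.d/openshift.yaml would look roughly like this — a sketch based on the discussion; the authoritative version is in review 570669, and the label name comes from the nodepool config above:

```yaml
# Hedged sketch of the openshift-base job definition, not the review's
# exact content.
- job:
    name: openshift-base
    parent: null
    run: playbooks/openshift/deploy-project.yaml
    nodeset:
      nodes:
        - name: project
          label: openshift-project
```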
*** Wei_Liu has quit IRC | 10:31 | |
Rohaan | tristanC: I found the mistake: in https://review.openstack.org/#/c/570669/2/zuul.yaml the job name is openshift-base, but in the doc we write base-openshift. Now it seems to be coming up | 10:35 |
*** Wei_Liu has joined #zuul | 10:35 | |
Rohaan | but it fails with: openshift-test finger://dhcppc4:7979/7e3edec464a942a9be90d17879af6fd6 : RETRY_LIMIT in 1s | 10:36 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix signature of overridden methods in LogStreamHandler https://review.openstack.org/574204 | 10:44 |
tristanC | Rohaan: to debug RETRY_LIMIT, change the zuul handler to DEBUG in /etc/zuul/executor-logging.yaml; then "systemctl restart rh-python35-zuul-executor"; | 10:45 |
tristanC | Rohaan: then run "/opt/rh/rh-python35/root/usr/bin/zuul-executor verbose" | 10:46 |
tristanC | Rohaan: that will make the executor write ansible-playbook (the job execution) output to /var/log/zuul/executor.log | 10:46 |
tristanC | Rohaan: you can comment "recheck" on the code review to retrigger the job | 10:46 |
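The executor-logging.yaml change amounts to raising the zuul logger to DEBUG; a sketch of the relevant section, assuming the standard Python dictConfig layout (handler names in the installed file may differ):

```yaml
# Hedged sketch of /etc/zuul/executor-logging.yaml after the change:
# only the "level" line changes, from INFO to DEBUG.
loggers:
  zuul:
    handlers:
      - normal        # assumed handler name
    level: DEBUG
    propagate: false
```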
Rohaan | tristanC: Thanks. Getting some python import errors there https://pastebin.com/QqyuHSwr | 10:52 |
tristanC | Rohaan: oh, after you installed the "sf-release-master.rpm", did you run "yum update -y" ? | 10:54 |
Rohaan | oops | 10:54 |
openstackgerrit | Fabien Boucher proposed openstack-infra/zuul master: Add a config_errors info to {tenant}/config-errors endpoint https://review.openstack.org/553873 | 10:56 |
openstackgerrit | Fabien Boucher proposed openstack-infra/zuul master: Add .stestr to .gitignore https://review.openstack.org/574213 | 11:00 |
Rohaan | tristanC: After doing `yum update -y` do I need to restart nodepool also? | 11:02 |
tristanC | Rohaan: I guess mostly ansible got upgraded to version 2.5; then you can just recheck the job without restarting the zuul-executor | 11:04 |
tristanC | if zuul got upgraded, then you need to re-apply the tiny patch that supports openshift resources, it's: https://review.openstack.org/#/c/570668/ | 11:05 |
*** GonZo2000 has joined #zuul | 11:08 | |
*** jpena is now known as jpena|lunch | 11:19 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Switch content type of public keys back to text/plain https://review.openstack.org/574220 | 11:23 |
*** Rohaan has quit IRC | 11:36 | |
*** Rohaan has joined #zuul | 11:50 | |
*** jpena|lunch is now known as jpena | 12:03 | |
*** Rohaan has quit IRC | 12:03 | |
*** rlandy has joined #zuul | 12:17 | |
*** rlandy is now known as rlandy|rover | 12:18 | |
*** pcaruana has quit IRC | 12:39 | |
*** odyssey4me has quit IRC | 12:39 | |
*** odyssey4me has joined #zuul | 12:39 | |
*** Wei_Liu has quit IRC | 12:45 | |
*** Wei_Liu has joined #zuul | 12:45 | |
*** myoung|off is now known as myoung | 13:19 | |
*** pcaruana has joined #zuul | 13:32 | |
openstackgerrit | Merged openstack-infra/nodepool master: Fail quickly for disabled provider pools https://review.openstack.org/573762 | 13:35 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Hide queue headers for empty queues when filtering https://review.openstack.org/572588 | 13:36 |
*** GonZo2000 has quit IRC | 13:38 | |
openstackgerrit | Merged openstack-infra/nodepool master: Fix 'satisfy' spelling errors https://review.openstack.org/573823 | 13:42 |
Shrews | mordred: thx | 13:44 |
openstackgerrit | Merged openstack-infra/zuul master: Add .stestr to .gitignore https://review.openstack.org/574213 | 13:46 |
openstackgerrit | Merged openstack-infra/zuul master: github: Optimize getPullReviews() a bit https://review.openstack.org/570077 | 13:46 |
*** gtema has quit IRC | 13:50 | |
openstackgerrit | Fabien Boucher proposed openstack-infra/zuul master: Add tenant yaml validation option to scheduler https://review.openstack.org/574265 | 13:52 |
openstackgerrit | Fabien Boucher proposed openstack-infra/zuul master: Add tenant yaml validation option to scheduler https://review.openstack.org/574265 | 13:55 |
openstackgerrit | Fabien Boucher proposed openstack-infra/zuul master: Add tenant yaml validation option to scheduler https://review.openstack.org/574265 | 13:56 |
*** ianychoi has quit IRC | 14:02 | |
*** needssleep is now known as TheJulia | 14:15 | |
openstackgerrit | Merged openstack-infra/nodepool master: Correctly use connection-port in static driver https://review.openstack.org/569339 | 14:22 |
corvus | i need to run an errand this morning that could take a while; i hope to be back around lunch but it's hard to say. | 14:25 |
pabelanger | fetch-zuul-cloner seems to only work with a single zuul connection; with multiple it will fail | 14:46 |
openstackgerrit | Merged openstack-infra/zuul master: Make file matchers overridable https://review.openstack.org/571745 | 14:46 |
openstackgerrit | Monty Taylor proposed openstack-infra/nodepool master: Consume Task and TaskManager from openstacksdk https://review.openstack.org/414759 | 14:54 |
openstackgerrit | Monty Taylor proposed openstack-infra/nodepool master: Switch to RateLimitingTaskManager base class https://review.openstack.org/574287 | 14:57 |
mordred | Shrews, corvus: ^^ those two patches, along with "Import rate limiting TaskManager from nodepool https://review.openstack.org/574285" - should be basically a no-op/cleanup from a nodepool perspective, but should help people over in sdk land see the whole picture | 15:00 |
mordred | I've got at least 2 people who have expressed interest in improving the caching story over there, but if you don't understand the nodepool task manager you really can't touch the shade/sdk caching code | 15:01 |
mordred | I broke it up into two patches so that we could see the is-a-thread to has-a-thread change in place; that first patch should basically work today as-is | 15:03 |
Shrews | mordred: ++ | 15:05 |
Shrews | mordred: why does 574285 depend on 414759? | 15:09 |
Shrews | seems backwards, but i haven't looked too closely yet | 15:10 |
mordred | Shrews: just so we can make sure that the has-a-thread rework is good in nodepool before importing it into sdk | 15:12 |
mordred | the patches don't *actually* depend on each other - but if nodepool reviewers find a flaw in 414759, I don't want the flawed code landing in sdk-land | 15:13 |
Shrews | k | 15:13 |
openstackgerrit | Monty Taylor proposed openstack-infra/nodepool master: Consume Task and TaskManager from openstacksdk https://review.openstack.org/414759 | 15:21 |
openstackgerrit | Monty Taylor proposed openstack-infra/nodepool master: Switch to RateLimitingTaskManager base class https://review.openstack.org/574287 | 15:21 |
openstackgerrit | Merged openstack-infra/nodepool master: Log connection port in static driver on timeout https://review.openstack.org/569334 | 15:39 |
openstackgerrit | Monty Taylor proposed openstack-infra/nodepool master: Use openstacksdk instead of os-client-config https://review.openstack.org/566158 | 15:42 |
*** electrofelix has quit IRC | 15:53 | |
*** dtruong_ has quit IRC | 16:05 | |
*** GonZo2000 has joined #zuul | 16:30 | |
openstackgerrit | qingszhao proposed openstack-infra/nodepool master: fix tox python3 overrides https://review.openstack.org/574344 | 17:03 |
*** jpena is now known as jpena|off | 17:13 | |
openstackgerrit | Doug Hellmann proposed openstack-infra/zuul-jobs master: allow build-python-release consumers to override python version https://review.openstack.org/574373 | 18:17 |
*** dtruong has joined #zuul | 18:31 | |
clarkb | tristanC: I wanted to followup on the nodepool azure driver. Any chance you would be interested in working with kata to see if you can get that tested? I can introduce you to people and help too | 18:34 |
*** myoung is now known as myoung|lunch | 18:52 | |
openstackgerrit | Doug Hellmann proposed openstack-infra/zuul-jobs master: allow build-python-release consumers to override python version https://review.openstack.org/574373 | 19:09 |
*** bhavik1 has joined #zuul | 19:14 | |
openstackgerrit | Doug Hellmann proposed openstack-infra/zuul-jobs master: allow build-python-release consumers to override python version https://review.openstack.org/574373 | 19:17 |
*** GonZo2000 has quit IRC | 19:48 | |
*** bhavik1 has quit IRC | 19:58 | |
-openstackstatus- NOTICE: Zuul was restarted for a software upgrade; changes uploaded or approved between 19:30 and 19:50 will need to be rechecked | 19:58 | |
*** myoung|lunch is now known as myoung | 19:59 | |
*** dkranz has quit IRC | 20:05 | |
clarkb | corvus: not sure if you saw, but on friday I noticed that github event handling in zuul doesn't always handle depends-on properly | 20:26 |
corvus | clarkb: no, i missed that | 20:26 |
corvus | clarkb: have more detail yet? | 20:26 |
clarkb | corvus: http://paste.openstack.org/show/723011/ shows the logs from that. The TL;DR is that the depends-on processing path wasn't handled properly when I edited the PR to add the depends-on. I had to push a new "patchset" for it to process the depends-on | 20:27 |
corvus | clarkb: i bet we can come up with a test case for that; i think our fake github is sufficiently expressive for that | 20:28 |
corvus | clarkb: do you know what the event github sent was? | 20:28 |
clarkb | corvus: I don't. But reading the code in the independent pipeline manager's checkForChangesNeededBy, the issue is we return True (and check for True) to know if the depends-on are managed properly, but we return the list of needed changes if there are needed changes. Then elsewhere in the independent manager we call addChange() on those changes to tie them up properly. But in the base manager class we check against True only | 20:30 |
clarkb | _processOneItem() specifically doesn't handle the case where a list of needed changes is returned | 20:30 |
clarkb | so I think this may be an issue on any event if the event does not go through the addChange() path | 20:31 |
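A minimal, self-contained illustration of the mismatch clarkb describes (not zuul's actual code; names are simplified):

```python
# One method returns either True or a list of needed changes, while one
# of its callers tests only for literal True, so the list path is dropped.

def check_for_changes_needed_by(change, satisfied):
    needed = [dep for dep in change["deps"] if dep not in satisfied]
    if not needed:
        return True      # all dependencies satisfied
    return needed        # changes that still need to be enqueued

change = {"id": "pr-1", "deps": ["pr-2"]}

# Independent-manager style: the list is handled via addChange().
result = check_for_changes_needed_by(change, satisfied=set())
if result is not True:
    for dep in result:
        print("enqueue needed change:", dep)

# Base-manager _processOneItem style (the suspect path): anything that
# is not literally True is treated as a failure and the list is ignored.
if check_for_changes_needed_by(change, satisfied=set()) is True:
    print("process item")
else:
    print("item stalls; needed changes never enqueued here")
```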
corvus | clarkb: i think i understand. so it's github-only by virtue of the fact that changing a gerrit commit message means a new patch, but in github it does not. i do think we ought to be able to repro in a test case. | 20:34 |
clarkb | ya, where I am still a little lost reading the code is how a change gets into the pipeline queue via a PR-update event without having gone through the addChange path somewhere along the way | 20:36 |
corvus | it should not be able to | 20:36 |
clarkb | since getChangeQueue (which creates the queues in independent pipelines) creates them as part of addChange() | 20:39 |
clarkb | maybe it was using cached data? | 20:39 |
clarkb | also there is a possibly related problem where changes sometimes show up in the third-party check pipeline twice. tobiash said that was a known issue he was debugging though | 20:40 |
clarkb | I'll start by trying to write this up into a bug as it doesn't sound like a known issue | 20:42 |
*** hashar has quit IRC | 20:44 | |
tobiash | clarkb: actually I planned to debug the double enqueue this week but haven't done so yet | 21:12 |
clarkb | tobiash: that is ok, the week only just started :) | 21:13 |
tobiash | clarkb: regarding adding depends-on later, I also observed this, and I think it doesn't update the cached change since this case isn't really a new patchset | 21:14 |
tobiash | So I guess we need to handle some event properly | 21:15 |
tobiash | I saw that depends-on issue in our deployment too, btw | 21:16 |
corvus | clarkb: moving our discussion about the log issue here -- ( http://logs.openstack.org/20/564220/1/check/devstack-multinode/7a4d81d/ara-report/result/ba7fbc30-b825-4650-a903-a6c4afdcbcb8/ ) | 21:17 |
corvus | are we certain that any v3-native multinode jobs are working? | 21:17 |
corvus | that error looks like the 'command' module is being called without the zuul_log_id argument. that argument is set by the zuul_stream callback plugin inside v2_playbook_on_task_start. i'm starting to wonder if that method isn't always called in the same way that it previously was. | 21:18 |
corvus | tobiash: ^ | 21:19 |
clarkb | ah ok you are about where I am | 21:20 |
corvus | i'm looking at the job-output.json from that build, and i see some "zuul_log_id": null in it | 21:25 |
tobiash | Ara says it's a shell task, but the shell task before is ok | 21:25 |
tobiash | corvus: maybe within the with_items it works differently now? | 21:26 |
corvus | what's the previous shell task that's okay? | 21:27 |
tobiash | http://logs.openstack.org/20/564220/1/check/devstack-multinode/7a4d81d/ara-report/result/1ee921c4-34c4-4294-bf7d-d9b042ea4a6e/ | 21:27 |
corvus | tobiash: that's on a different host | 21:27 |
corvus | looks like http://logs.openstack.org/20/564220/1/check/devstack-multinode/7a4d81d/ara-report/result/0c83e4f7-8241-4edf-9df6-8cf0455c9775/ is the previous shell task on that host, and it didn't have the error | 21:28 |
clarkb | maybe it doesn't call v2_playbook_on_task_start once for every node running the task? | 21:28 |
corvus | tobiash: the failed task isn't in a with_items, is it? | 21:29 |
corvus | it looks like it's just in a when block | 21:29 |
tobiash | Ah, so it's not with_items but a task which is executed on several hosts at once | 21:29 |
corvus | (but it is in a block, which is notable) | 21:29 |
tobiash | clarkb: yes, that's maybe the problem | 21:32 |
tobiash | corvus: this might be testable with tox-remote if we fake two ansible hosts with the same ip | 21:33 |
corvus | ++ | 21:33 |
clarkb | reading the code I think it actually calls the callback on each host | 21:37 |
clarkb | the free StrategyModule's run method seems to do this | 21:37 |
corvus | most of the "zuul_log_id" entries in the json have uuids. only one task has "null" as the value. it's the apt-get update command in the pre-playbook. it's null for both hosts. | 21:38 |
corvus | i don't see the connection yet | 21:38 |
tristanC | clarkb: provided an azure account, or shell access to a system with one already set up, i could find some time to get it working. Though I can't promise to maintain it, and it's probably better if someone from the infra^Wwinterscale team is assigned to it | 21:39 |
corvus | (i suspect that it's the proximate cause for the error though -- you get to write to one log file called "None", but not a second time) | 21:39 |
corvus | tristanC: we don't really "assign" people :) | 21:39 |
corvus | people volunteer for things | 21:40 |
tobiash | corvus: is there something else special with this command task? | 21:44 |
tristanC | corvus: heh, my bad wording. though it is a proprietary service, so supporting it should deserve some sort of compensation :-) | 21:44 |
tobiash | Maybe task.action doesn't match 'command' or 'shell' in this case | 21:45 |
corvus | tobiash: hrm, i can't see why it wouldn't... | 21:45 |
tristanC | clarkb: and moving forward, i guess we need to reconsider some sort of label restriction per project or tenant | 21:45 |
clarkb | tristanC: yes, we have an open story for that and I think to start we can operate on an honor system and if people abuse it go from there | 21:46 |
tobiash | corvus: i need sleep now. If you need help with this just ping me and I'll try to reproduce and debug this tomorrow. | 21:47 |
corvus | tobiash: thanks | 21:49 |
tristanC | clarkb: fair enough. well let me know when i can start putting the code together | 21:49 |
*** myoung is now known as myoung|off | 21:50 | |
corvus | i think we want to finish reworking the driver interface and make sure it can handle containers, then add more drivers | 21:50 |
tristanC | corvus: the first step is more of a prototype to get the sdk working (e.g. create instances, clean up leaks, etc.); that shouldn't be hard to rebase on a driver interface change | 21:51 |
corvus | tristanC: yep | 21:52 |
clarkb | ya I think both of those things can be worked in parallel particularly if we have azure access in the near future | 21:52 |
tobiash | corvus: one thing I noticed is that the task is skipped on one node | 21:55 |
tobiash | Maybe that's the special thing | 21:55 |
clarkb | the action value comes out of the yaml parser for tasks | 21:56 |
corvus | the other task is a role handler. i don't know what a role handler is | 21:56 |
corvus | ok, now i understand what a role handler is | 21:57 |
corvus | anyway, that's a special thing about it | 21:57 |
tobiash | The failed task is skipped for compute1 and fails for controller | 21:58 |
tobiash | So there could be a side effect of the on-skipped callback | 21:58 |
corvus | so far, we have null values on: 1) a handler in a role in a playbook; 2) a task with a when argument in a block with a when argument in a role in a playbook | 21:59 |
corvus | add "after a skipped task" to #2 :) | 21:59 |
corvus | i don't think we can add that to #1 though | 22:00 |
clarkb | it is also a parse error to not have a task action value | 22:00 |
corvus | i'm going to start seeing if i can reproduce any of this locally | 22:03 |
clarkb | ok, I was wrong about running the v2_playbook_on_task_start() function for each host. Reading the linear strategy instead (which appears to be the default), it will only do this for the first host,task pair in its loop | 22:18 |
clarkb | but it also seems to have always had this behavior | 22:19 |
corvus | clarkb: that probably is good enough if we assume that the hosts are distinct (we actually can't because of the new ability to add the same host to the inventory twice, but that's really a separate problem. that's not what we're running into here. so let's set that aside for later) | 22:22 |
corvus | clarkb: it has also occurred to me that we're modifying a data structure here. i wonder if there's a code path in ansible 2.5 where this modification is no longer permanent. like, we get passed a copy of the task structure. | 22:23 |
corvus | clarkb: unless i misunderstood what you said about the first host,task pair... | 22:24 |
clarkb | corvus: my reading of it is that you have a task, command to echo foo, that gets turned into a list of (hostx, task_command_foo), (hosty, task_command_foo) | 22:25 |
corvus | clarkb: do you mean that it only runs once for each task? (and that one invocation just gets whichever host happens to be first?) | 22:25 |
clarkb | the first time through that list it will run the callback against task_command_foo. What I am not sure about is if that task object is the same for hosty and hostx or if that changed (but it might explain the behavior if it is a different task object) | 22:25 |
clarkb | corvus: yup that | 22:25 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: allow build-python-release consumers to override python version https://review.openstack.org/574373 | 22:27 |
clarkb | it does look like blocks play into this as well in the get next task code | 22:29 |
corvus | i haven't been able to reproduce it by including this role: http://paste.openstack.org/show/723237/ | 22:31 |
clarkb | this quickly gets into internal ansible state machine stuff and I'm losing my way through it. But d6872a7b070d1179e7d76bcda1490bb7003c4574 and b107e397cbd75d8e095b08495da2ac2aad3ef96f appear suspicious | 22:33 |
clarkb | that second one in particular: I think it meant the old behavior was that all the tasks were premade, and when you iterate through that host,task list they share task objects | 22:35 |
clarkb | but now I don't know that that happens as they don't cache them upfront | 22:35 |
*** EmilienM|PTO is now known as EmilienM | 22:41 | |
corvus | i have been able to get a null log id by using handlers | 22:41 |
clarkb | corvus: what does that look like? | 22:44 |
clarkb | (I'm not sure I've seen that example) | 22:44 |
corvus | clarkb: patterned after configure-mirrors role | 22:44 |
corvus | there's a handlers/main.yaml with the task in it, and then elsewhere, something says "notify: [task name]" | 22:45 |
clarkb | and its the handler itself in handlers/main.yaml that breaks? | 22:46 |
corvus | clarkb: not so much breaking as being called with zuul_log_id=null | 22:46 |
*** snapiri has quit IRC | 22:47 | |
*** snapiri has joined #zuul | 22:47 | |
corvus | clarkb: in the real exemplar, this happened first, so it didn't break either (it's okay to have a log file called "log-None". but only once. the second time breaks) | 22:47 |
corvus | clarkb: so the block task in devstack (which i still haven't gotten to run with a null task locally) broke because it ran after the handler task had already written a log-None file | 22:48 |
clarkb | why can't it just keep writing to the existing file? similar to how the zuul_log_id will be reused by tasks in a job I think | 22:48 |
clarkb | oh each log is its own uuid | 22:49 |
clarkb | I thought it was the job uuid for some reason | 22:49 |
corvus | clarkb: yeah. i suspect it can't write to the file because the first task is done as root, the second as 'stack' user | 22:50 |
corvus | thus the permission denied error on the file | 22:50 |
corvus | which, i guess is something i'm missing from my local test :) | 22:51 |
clarkb | just thinking out loud here: if each task gets its own uuid for the log file anyway, each task already has a built-in uuid. We could possibly use that instead of passing in an explicit argument | 22:52 |
corvus | clarkb: that seems plausible | 22:52 |
clarkb | granted it is task._uuid so maybe that is why we provide our own | 22:53 |
corvus | i've got something like this in a test locally: http://paste.openstack.org/show/723242/ but it's still not failing on the 'test block' task | 23:11 |
corvus | i think i've got all the significant things | 23:12 |
clarkb | on the block side of things, maybe you need it to have some nodes where the when evaluates to false? | 23:16 |
clarkb | the discover nodes one will only run on one of the multiple hosts due to a when clause | 23:16 |
corvus | clarkb: oh good point | 23:16 |
corvus | still no joy | 23:20 |
corvus | (and i have compute1 and controller in the same inventory order) | 23:21 |
corvus | i don't have any groups though | 23:21 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool master: Implement an Azure driver https://review.openstack.org/554432 | 23:24 |
corvus | clarkb: i'm nearing EOD, i wrote up a story here: https://storyboard.openstack.org/#!/story/2002528 | 23:36 |
corvus | mordred, tobiash: ^ that's as far as i've gotten | 23:36 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: DNM: reproducer for 2002528 https://review.openstack.org/574487 | 23:38 |
corvus | mordred, tobiash, clarkb: ^ that's my incomplete attempt to modify test_zuul_console to reproduce that | 23:39 |
clarkb | corvus: before you go, I dug up the original from openstack-helm http://logs.openstack.org/55/574055/3/check/openstack-helm-infra-ubuntu/9e9ac15/ara-report/result/54b51530-2857-4003-83e2-53fe446808b0/ that is failing on a shell task without a block or conditional, at least not at the level the task is defined. It's pretty boring other than a delegate_to | 23:44 |
corvus | huh | 23:45 |
clarkb | it is the running of that task on the second node that makes it unhappy (runs fine on the first node) | 23:48 |
corvus | why isn't the previous task on node-1 a pause? | 23:48 |
clarkb | I was just wondering that myself | 23:49 |
clarkb | if you filter to node-1 there in ara there is no task at all | 23:49 |
clarkb | perhaps pause is implicitly a one-node thing in ansible? | 23:50 |
clarkb | it just picks one and does it? | 23:50 |
corvus | yeah, that seems to be the behavior | 23:52 |
clarkb | I wonder if this is the sort of thing where if we ask ansible upstream about the behavior they will know off the top of their heads what is going on | 23:52 |
clarkb | I too need to EOD here shortly. We finally have sun and that means yardwork | 23:53 |
corvus | clarkb: i tried this, still no joy: http://paste.openstack.org/show/723246/ | 23:53 |
*** rlandy|rover is now known as rlandy|rover|bbl | 23:55 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!