hubbot | FAILING CHECK JOBS: gate-tripleo-ci-centos-7-container-to-container-upgrades-master-nv, tripleo-quickstart-extras-gate-newton-delorean-full-minimal | check logs @ https://review.openstack.org/472607 and fix them ASAP. | 00:13 |
*** Goneri has quit IRC | 00:19 | |
*** rlandy|bbl is now known as rlandy | 01:55 | |
*** rlandy has quit IRC | 01:56 | |
hubbot | FAILING CHECK JOBS: gate-tripleo-ci-centos-7-container-to-container-upgrades-master-nv, tripleo-quickstart-extras-gate-newton-delorean-full-minimal | check logs @ https://review.openstack.org/472607 and fix them ASAP. | 02:13 |
*** jaganathan has quit IRC | 02:43 | |
*** jaganathan has joined #oooq | 02:47 | |
*** ykarel|away has joined #oooq | 03:48 | |
*** ykarel|away is now known as ykare | 03:48 | |
*** ykare is now known as ykarel | 03:48 | |
hubbot | FAILING CHECK JOBS: gate-tripleo-ci-centos-7-container-to-container-upgrades-master-nv, tripleo-quickstart-extras-gate-newton-delorean-full-minimal | check logs @ https://review.openstack.org/472607 and fix them ASAP. | 04:13 |
ykarel | So still we have issues in promotion related to registry | 04:19 |
ykarel | failed: [localhost] (item=swift-account) => {"changed": false, "item": "swift-account", "msg": "Error searching for image trunk.registry.rdoproject.org/tripleomaster/centos-binary-swift-account - 500 Server Error: Internal Server Error (\"{\"message\":\"layer does not exist\"}\")"} | 04:20 |
*** udesale has joined #oooq | 04:21 | |
*** ratailor has joined #oooq | 04:55 | |
ykarel | Also noticed that retries: 3 is not working for docker pull ^^, so we should try fixing retries to see if that helps in reducing failures | 05:14 |
*** pgadiya has joined #oooq | 05:24 | |
*** pgadiya has quit IRC | 05:24 | |
*** udesale_ has joined #oooq | 05:33 | |
*** ykarel_ has joined #oooq | 05:33 | |
*** ykarel has quit IRC | 05:36 | |
*** udesale has quit IRC | 05:36 | |
*** hamzy has quit IRC | 05:37 | |
*** agopi has quit IRC | 05:43 | |
*** agopi has joined #oooq | 05:43 | |
*** jaganathan has quit IRC | 05:44 | |
*** jaganathan has joined #oooq | 05:44 | |
*** marios has joined #oooq | 05:46 | |
*** hamzy has joined #oooq | 05:47 | |
*** quiquell|off is now known as quiquell|ruck | 05:57 | |
quiquell|ruck | ykarel_: Good morning, going to check | 05:58 |
*** jfrancoa has joined #oooq | 06:05 | |
*** jfrancoa has joined #oooq | 06:06 | |
hubbot | FAILING CHECK JOBS: gate-tripleo-ci-centos-7-container-to-container-upgrades-master-nv, tripleo-quickstart-extras-gate-newton-delorean-full-minimal | check logs @ https://review.openstack.org/472607 and fix them ASAP. | 06:13 |
*** jtomasek has joined #oooq | 06:25 | |
*** jtomasek has quit IRC | 06:25 | |
*** jtomasek has joined #oooq | 06:26 | |
*** holser__ has joined #oooq | 06:41 | |
*** links has joined #oooq | 06:47 | |
*** skramaja has joined #oooq | 07:00 | |
*** kopecmartin has joined #oooq | 07:15 | |
*** quiquell|ruck is now known as quiquell|ruck|af | 07:15 | |
*** quiquell|ruck|af is now known as quique|ruck|afk | 07:15 | |
*** florianf has joined #oooq | 07:16 | |
*** links has quit IRC | 07:29 | |
*** tesseract has joined #oooq | 07:29 | |
*** udesale__ has joined #oooq | 07:29 | |
*** ykarel__ has joined #oooq | 07:30 | |
*** udesale_ has quit IRC | 07:32 | |
*** ykarel_ has quit IRC | 07:33 | |
*** ykarel__ is now known as ykarel|lunch | 07:36 | |
*** amoralej|off is now known as amoralej | 07:38 | |
*** links has joined #oooq | 07:47 | |
*** tosky has joined #oooq | 07:48 | |
*** links has quit IRC | 07:52 | |
*** quique|ruck|afk is now known as quiquell|ruck | 07:58 | |
*** ccamacho has joined #oooq | 07:59 | |
*** bogdando has joined #oooq | 07:59 | |
*** lucas-afk is now known as lucasagomes | 08:05 | |
*** gkadam has joined #oooq | 08:07 | |
*** ykarel|lunch is now known as ykarel | 08:13 | |
hubbot | FAILING CHECK JOBS: gate-tripleo-ci-centos-7-container-to-container-upgrades-master-nv, tripleo-quickstart-extras-gate-newton-delorean-full-minimal | check logs @ https://review.openstack.org/472607 and fix them ASAP. | 08:13 |
ykarel | quiquell|ruck, you find something about promotion failures? | 08:30 |
quiquell|ruck | ykarel: I have to check it out with panda, I also see some timeouts at image fetching | 08:32 |
ykarel | quiquell|ruck, Ok, will prepare a patch to fix retries and see how it goes | 08:32 |
quiquell|ruck | failed: [localhost] (item=[u'etcd', u'f106094e961c5ab430687d673063baee379f6bbd_310b64d1']) => {"changed": false, "item": ["etcd", "f106094e961c5ab430687d673063baee379f6bbd_310b64d1"], "msg": "Error removing image docker.io/tripleomaster/centos-binary-etcd:f106094e961c5ab430687d673063baee379f6bbd_310b64d1 - UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)"} | 08:33 |
quiquell|ruck | at master | 08:33 |
ykarel | hmm so two different errors one for localhost, and other for rdo registry | 08:34 |
quiquell|ruck | I think RDO is ok | 08:35 |
*** udesale_ has joined #oooq | 08:35 | |
quiquell|ruck | TASK [Push images to rdoproject registry with named label] is working fine | 08:35 |
quiquell|ruck | Feels like a docker image layer missing before push | 08:36 |
*** udesale__ has quit IRC | 08:38 | |
quiquell|ruck | Maybe something is half baked at docker hub | 08:43 |
*** agopi has quit IRC | 08:44 | |
*** links has joined #oooq | 08:49 | |
quiquell|ruck | syslog: inotify_add_watch(7, /dev/dm-1, 10) failed: No such file or directory | 08:51 |
quiquell|ruck | Don't know if it can be related | 08:51 |
*** pgadiya has joined #oooq | 08:57 | |
*** pgadiya has quit IRC | 08:57 | |
*** panda|rover|off is now known as panda|rover | 08:59 | |
quiquell|ruck | panda|rover: Hi | 09:02 |
ykarel | quiquell|ruck, fix retries: https://review.rdoproject.org/r/13419 | 09:03 |
quiquell|ruck | panda|rover: ykarel introducing a change on retries | 09:05 |
*** jtomasek has quit IRC | 09:07 | |
amoralej | https://review.openstack.org/#/c/561482/ is failing in check | 09:29 |
amoralej | it seems infra | 09:29 |
amoralej | can i just recheck? | 09:29 |
amoralej | is the undercloud-containers job broken?, two failures in a row | 09:32 |
amoralej | quiquell|ruck, ^ | 09:32 |
quiquell|ruck | amoralej: Let me do a quick check | 09:32 |
amoralej | any known issue? | 09:32 |
amoralej | 2018-04-17 20:50:15.983954 | primary | [WARNING]: No hosts matched, nothing to do | 09:32 |
amoralej | http://logs.openstack.org/82/561482/1/check/tripleo-ci-centos-7-undercloud-containers/ecc4b0c/job-output.txt.gz | 09:32 |
ykarel | amoralej, looks like there is an issue, docker image not found | 09:33 |
ykarel | i have seen other job as well | 09:33 |
ykarel | Image docker.io/tripleomaster/centos-binary-rabbitmq has no tag f106094e961c5ab430687d673063baee379f6bbd_310b64d1 | 09:33 |
quiquell|ruck | ImageUploaderException: Image docker.io/tripleomaster/centos-binary-rabbitmq has no tag f106094e961c5ab430687d673063baee379f6bbd_310b64d1 | 09:34 |
quiquell|ruck | Oops, the same | 09:34 |
quiquell|ruck | There's something weird with docker hub | 09:34 |
ykarel | similar bug is there but for different tag: https://bugs.launchpad.net/tripleo/+bug/1764870 | 09:34 |
openstack | Launchpad bug 1764870 in tripleo "Missing tags in dockerhub, impossible to deploy a containerized undercloud" [Critical,Triaged] - Assigned to Gabriele Cerami (gcerami) | 09:34 |
amoralej | that must be breaking all reviews | 09:35 |
panda|rover | is this still happening ? | 09:36 |
ykarel | yes | 09:36 |
panda|rover | mmm there's no rabbitmq with that tag, I see other containers but not rabbitmq | 09:41 |
ykarel | Tag and push to docker hub: failed: [localhost] (item=[u'f106094e961c5ab430687d673063baee379f6bbd_310b64d1', u'rabbitmq']) => {"changed": false, "item": ["f106094e961c5ab430687d673063baee379f6bbd_310b64d1", "rabbitmq"], "msg": "Error searching for image docker.io/docker.io/tripleomaster/centos-binary-rabbitmq - 500 Server Error: Internal Server Error (\"{\"message\":\"layer does not exist\"}\")"} | 09:49 |
ykarel | and many other images for this tag failed to be pushed | 09:50 |
panda|rover | ykarel: but now the images are there | 09:50 |
panda|rover | I see f106094e961c5ab430687d673063baee379f6bbd_310b64d1 | 09:50 |
panda|rover | ykarel: for rabbitmq | 09:50 |
ykarel | panda|rover, current run 2018-04-18 07:51:27,452 14001 INFO promoter Promoting the container images for dlrn hash f106094e961c5ab430687d673063baee379f6bbd on master to current-tripleo should have fixed it | 09:51 |
panda|rover | ykarel: yeah probably | 09:52 |
panda|rover | so we had a transient error | 09:52 |
panda|rover | and we are not handling it very well | 09:53 |
ykarel | sorry, 2018-04-18 07:51:27,452 14001 INFO promoter Promoting the container images for dlrn hash f106094e961c5ab430687d673063baee379f6bbd on master to current-tripleo | 09:53 |
ykarel | 2018-04-18 08:01:01,758 21610 ERROR promoter Another promoter process is running | 09:53 |
ykarel | looking at which other promoter is running | 09:53 |
ykarel | panda|rover, can you check which is running currently | 09:54 |
ykarel | which release and which hash | 09:54 |
panda|rover | ykarel: mhh I see master and queens are running at the same time, and I'm not sure it should happen | 09:56 |
ykarel | f106094e961c5ab430687d673063baee379f6bbd_310b64d1 | 09:56 |
ykarel | 229 MB | 09:56 |
ykarel | 7 minutes ago for rabbitmq | 09:56 |
panda|rover | ykarel: and it's not easy to understand which hash | 09:56 |
panda|rover | ykarel: no, that was me | 09:56 |
panda|rover | ykarel: I repushed it manually | 09:56 |
ykarel | you pushed? | 09:56 |
ykarel | Okk | 09:56 |
panda|rover | ykarel: but it was already there | 09:56 |
panda|rover | ykarel: all the layers were existing | 09:57 |
ykarel | But the question is why check jobs are using a non promoted hash: f106094e961c5ab430687d673063baee379f6bbd_310b64d1 | 09:57 |
ykarel | is this expected? | 09:58 |
panda|rover | ykarel: because the push happens before the promotion | 09:59 |
panda|rover | ykarel: so we promote only after we have all the containers in place | 09:59 |
panda|rover | the weird thing is why the jobs are trying to get that hash | 10:01 |
ykarel | yeah, that's what I mean, jobs should use the tag specified in the baseurl of https://trunk.rdoproject.org/centos7-master/current-tripleo/delorean.repo | 10:01 |
panda|rover | they should resolve the hash at the start, not taking the containers with the tag "current-tripleo" | 10:01 |
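The resolution step being argued for here can be sketched in a few lines; this is an illustrative helper (the function name and the exact baseurl layout are assumptions, not the promoter's actual code) that pulls the full dlrn hash out of a current-tripleo delorean.repo:

```python
import re

def dlrn_hash_from_repo(repo_text):
    """Extract the full dlrn hash (40-char commit hash + 8-char distro
    hash) from a delorean.repo baseurl line, e.g.
    baseurl=https://trunk.rdoproject.org/centos7-master/f1/06/f106..._310b64d1
    Returns None when no hash-shaped path component is present."""
    match = re.search(r'baseurl=\S*?([0-9a-f]{40}_[0-9a-f]{8})', repo_text)
    return match.group(1) if match else None
```

A job that resolves the hash once at the start and uses it everywhere is immune to the promote tag moving mid-run.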
*** bogdando has quit IRC | 10:02 | |
*** bogdando has joined #oooq | 10:04 | |
*** holser__ has quit IRC | 10:04 | |
*** holser__ has joined #oooq | 10:05 | |
panda|rover | ykarel: that's what I always thought, maybe there was a change, and some pull is using the current-tripleo tag directly instead of the hash | 10:05 |
quiquell|ruck | ykarel: I think openstack jobs use docker hub and rdo jobs use rdo repo | 10:05 |
*** zoli is now known as zoli|lunch | 10:07 | |
panda|rover | ykarel: crap. prep-container is using the tag directly | 10:07 |
panda|rover | quiquell|ruck: https://github.com/openstack/tripleo-quickstart-extras/blob/master/roles/overcloud-prep-containers/templates/overcloud-prep-containers.sh.j2#L44 | 10:08 |
panda|rover | this should be the resolved hash | 10:09 |
panda|rover | it was overlooked in the review | 10:09 |
panda|rover | and I remember reviewing it | 10:10 |
panda|rover | damn | 10:10 |
hubbot | FAILING CHECK JOBS: gate-tripleo-ci-centos-7-container-to-container-upgrades-master-nv, tripleo-quickstart-extras-gate-newton-delorean-full-minimal | check logs @ https://review.openstack.org/472607 and fix them ASAP. | 10:14 |
panda|rover | ykarel: where did you take that error log from ? | 10:18 |
ykarel | check job | 10:18 |
ykarel | https://review.openstack.org/#/c/561482 | 10:19 |
panda|rover | this is getting weirder | 10:19 |
*** jaganathan has quit IRC | 10:21 | |
*** udesale__ has joined #oooq | 10:21 | |
panda|rover | ykarel: which job is failing ? | 10:22 |
panda|rover | ykarel: undercloud containers ? | 10:22 |
ykarel | yes | 10:22 |
ykarel | panda|rover, can this be the fix: https://review.openstack.org/#/c/559492 | 10:22 |
*** udesale_ has quit IRC | 10:24 | |
panda|rover | ykarel: this is about container update | 10:28 |
panda|rover | ykarel: do you know the undercloud containers workflow well ? | 10:31 |
ykarel | no :( | 10:31 |
panda|rover | ykarel: what's latest master hash ? | 10:47 |
ykarel | current-tripleo? | 10:48 |
panda|rover | ykarel: yes | 10:48 |
ykarel | a2e69c2c44417c85334944a4c46f91648aa0b97f_3791bf5d | 10:48 |
panda|rover | and we are also trying to promote a new one | 10:53 |
ykarel | yes | 10:53 |
ykarel | panda|rover, looks like https://review.openstack.org/#/c/559492 would solve or at least work around the issue | 11:00 |
panda|rover | ykarel: how so ? | 11:00 |
ykarel | panda|rover, it will set update_containers to true | 11:01 |
ykarel | and because of that containers would be prepared from docker_image_tag | 11:01 |
ykarel | and thus undercloud would be installed from correct tag(promoted one) | 11:01 |
panda|rover | ykarel: I think we have problems with the uploads, not all the containers from that tag are present anyway | 11:11 |
ykarel | yes that's one issue | 11:11 |
ykarel | and other is wrong tag is used for containers in undercloud install | 11:12 |
panda|rover | ykarel: I'm not even sure about that | 11:13 |
panda|rover | ykarel: because it either uses current-tripleo directly, and then I don't understand from where it's taking the explicit hash | 11:13 |
ykarel | f106094e961c5ab430687d673063baee379f6bbd_310b64d1 is used for undercloud containers which is not promoted yet | 11:14 |
*** zoli|lunch is now known as zoli | 11:14 | |
panda|rover | ykarel: or it's trying to download the explicit hash as tag, and I don't understand from where it's taking it | 11:14 |
*** zoli is now known as zoli|wfh | 11:14 | |
*** zoli|wfh is now known as zoli | 11:14 | |
panda|rover | ykarel: certainly not from the current-tripleo tag | 11:14 |
ykarel | yes, /me also trying to find that | 11:15 |
*** udesale__ has quit IRC | 11:16 | |
*** lucasagomes is now known as lucas-hungry | 11:18 | |
amoralej | ykarel, panda|rover so iiuc undercloud containers is broken? | 11:55 |
panda|rover | amoralej: master just promoted, it should be ok | 11:56 |
panda|rover | amoralej: but it's not clear what the method of selecting the tag is, and it is not working well when we have problems with the image uploads | 11:57 |
amoralej | ok | 11:57 |
amoralej | i'll recheck | 11:57 |
weshay | quiquell|ruck, morning | 11:58 |
ykarel | panda|rover, but in promotion jobs we don't have a job with only containerized undercloud and no containerized overcloud | 11:58 |
ykarel | panda|rover, is there any such job? | 11:58 |
*** atoth has joined #oooq | 11:59 | |
panda|rover | ykarel: I think there is | 11:59 |
ykarel | ok, which one, i will check something there | 11:59 |
*** rfolco|off is now known as rfolco | 12:01 | |
weshay | panda|rover, any ideas what the root cause of "500 Server Error: Internal Server Error (\"{\"message\":\"layer does not exist\"}\")"}" is? | 12:01 |
weshay | panda|rover, afaict.. the last run of the container push worked w/o error in both rdo and docker | 12:02 |
panda|rover | weshay: it's either device mapper overload when there are two promotions happening at the same moment | 12:02 |
panda|rover | or file system is failing | 12:02 |
weshay | ah | 12:02 |
panda|rover | we are seeing some worrying messages in the logs | 12:03 |
weshay | ya.. two promotions at the same time seems like an issue | 12:03 |
panda|rover | we are currently stopping all the automatic runs | 12:03 |
weshay | k | 12:03 |
panda|rover | and we have a queens run launched manually | 12:03 |
weshay | manually from the promotion server, or using the manual push script? | 12:03 |
panda|rover | weshay: you in the program call ? I'm there only to not leave quique alone | 12:03 |
weshay | panda|rover, ya.. I'm on | 12:03 |
panda|rover | weshay: manually from the promoter server | 12:03 |
panda|rover | weshay: ok I'll drop then | 12:04 |
weshay | panda|rover, k | 12:04 |
weshay | panda|rover, ya.. ur good | 12:04 |
weshay | quiquell|ruck, will be great | 12:04 |
panda|rover | yeah, just the first time will be a bit rough if they start asking questions :) | 12:04 |
quiquell|ruck | weshay, panda|rover: Let's see, I have already changed my pants | 12:07 |
weshay | quiquell|ruck, lolz | 12:07 |
quiquell|ruck | weshay: Even with the problem at promote we are still green, aren't we? | 12:07 |
weshay | quiquell|ruck, you actually may be one of the first OpenStackers to join the program call in their first 90 days | 12:07 |
weshay | so ++ | 12:07 |
weshay | quiquell|ruck, board says queens is green :) | 12:08 |
weshay | we are green | 12:08 |
quiquell|ruck | deal | 12:08 |
*** lhinds- is now known as lhinds | 12:08 | |
quiquell|ruck | Looks like around 20 is missing | 12:09 |
quiquell|ruck | For the manual promoter to promote queens | 12:09 |
quiquell|ruck | Will check this https://dashboards.rdoproject.org/queens before talking | 12:09 |
weshay | panda|rover, quiquell|ruck are any jobs failing due to the registry? missing containers | 12:09 |
quiquell|ruck | weshay: containerized undercloud | 12:09 |
quiquell|ruck | tripleo-ci-centos-7-undercloud-upgrades | 12:10 |
*** amoralej is now known as amoralej|lunch | 12:10 | |
weshay | hrm.. ya | 12:10 |
weshay | panda|rover, 2018-04-18 08:43:39 | Exception: Image docker.io/tripleomaster/centos-binary-rabbitmq has no tag f106094e961c5ab430687d673063baee379f6bbd_310b64d1. | 12:11 |
ykarel | weshay, is there a place where i can find all the runs for tripleo-ci-centos-7-undercloud-containers? | 12:12 |
weshay | panda|rover, shouldn't that be fixed now that you have pushed all the containers for master? | 12:12 |
quiquell|ruck | ykarel: at zuul's builds | 12:12 |
weshay | ykarel, http://cistatus.tripleo.org/ | 12:12 |
panda|rover | weshay: master promoted again 1 hour ago | 12:12 |
weshay | lolz | 12:13 |
panda|rover | weshay: but there's something wrong with how the undercloud container job chooses its hash | 12:13 |
ykarel | weshay, Thanks | 12:13 |
weshay | panda|rover, ya.. it should use the hash | 12:13 |
weshay | not the current-tripleo tag | 12:13 |
panda|rover | weshay: yep, I see the same error in prep containers now, after the upgrade sprints | 12:14 |
hubbot | FAILING CHECK JOBS: gate-tripleo-ci-centos-7-container-to-container-upgrades-master-nv, tripleo-quickstart-extras-gate-newton-delorean-full-minimal | check logs @ https://review.openstack.org/472607 and fix them ASAP. | 12:14 |
panda|rover | weshay: we are using the tag directly | 12:14 |
weshay | panda|rover, we should push the hash tag first, and the softlink tag second | 12:14 |
quiquell|ruck | They are talking about containerized undercloud problems now | 12:14 |
panda|rover | weshay: we already do that | 12:14 |
*** jtomasek has joined #oooq | 12:15 | |
panda|rover | weshay https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/container-push/container-push.yml#L122 | 12:15 |
weshay | panda|rover, ya.. so I was looking at that yesterday | 12:16 |
weshay | panda|rover, which with_items is that using? | 12:16 |
weshay | tag: "{{ item[0] }}" | 12:16 |
weshay | \/centos-binary-{{ item[1] }}" | 12:17 |
weshay | those two are confusing to me | 12:17 |
panda|rover | weshay: it's a with_nested. item[0] is the first element in the with_nested list, item[1] is the second | 12:17 |
weshay | ykarel, you see it? | 12:17 |
weshay | panda|rover, ha.. I'm blind.. thanks | 12:18 |
ykarel | weshay, looking | 12:18 |
ykarel | panda|rover, weshay it's using with_nested: | 12:22 |
ykarel | item[0] means 1st list element, and item[1] means element from second list | 12:22 |
ykarel | ohh it's already told, | 12:23 |
ykarel | then what to look | 12:23 |
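For reference, `with_nested` iterates the cross product of its lists; a quick Python model of what the task sees (the example tag and image names come from the log, not from the playbook itself):

```python
from itertools import product

# with_nested: [[tag, ...], [image, ...]] hands the task every
# (tag, image) pair as item, so item[0] is the tag and item[1] the image.
tags = ["f106094e961c5ab430687d673063baee379f6bbd_310b64d1"]
images = ["rabbitmq", "etcd"]

pairs = list(product(tags, images))
for item in pairs:
    print("centos-binary-%s:%s" % (item[1], item[0]))
```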
weshay | quiquell|ruck, nice work! | 12:25 |
quiquell|ruck | weshay: Done! put that on my 90 days plan ! | 12:26 |
quiquell|ruck | :-P | 12:26 |
ykarel | panda|rover, you told that you pushed manually f106094e961c5ab430687d673063baee379f6bbd_310b64d for rabbitmq | 12:26 |
weshay | :) | 12:26 |
ykarel | panda|rover, can you tell how you did that | 12:26 |
quiquell|ruck | thanks for the assistance btw | 12:26 |
panda|rover | ykarel: yes, but only for rabbitmq | 12:26 |
ykarel | Ok, how | 12:26 |
panda|rover | ykarel: I'm not giving out this information for free | 12:27 |
ykarel | :) what need to be done | 12:28 |
panda|rover | ykarel: I want a chicken masala delivered to my home | 12:28 |
ykarel | u r crazy :) | 12:28 |
panda|rover | ykarel: you basically pull the image from registry with the tag, then retag changing the registry name, then push | 12:28 |
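The manual repush panda|rover describes is a pull/retag/push sequence; this helper only builds the command strings (the function name and registry defaults are illustrative, not the actual script used on the promoter server):

```python
def repush_commands(image, tag,
                    src_registry="trunk.registry.rdoproject.org/tripleomaster",
                    dst_registry="docker.io/tripleomaster"):
    """Build the docker commands to pull an image by tag from one
    registry, retag it for another registry, and push it there."""
    src = "%s/%s:%s" % (src_registry, image, tag)
    dst = "%s/%s:%s" % (dst_registry, image, tag)
    return ["docker pull %s" % src,
            "docker tag %s %s" % (src, dst),
            "docker push %s" % dst]
```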
ykarel | did you push only the <hash> tag or also the name <current-tripleo> | 12:29 |
*** trown|outtypewww is now known as trown | 12:29 | |
panda|rover | ykarel: only the hash | 12:29 |
ykarel | please try pushing current-tripleo | 12:29 |
quiquell|ruck | weshay: Do you know how to access the bmc image of the RHEL stacks ? | 12:29 |
panda|rover | ykarel: I'd rather not, it will change the link to a wrong container | 12:29 |
panda|rover | ykarel: did you find any error with current-tripleo | 12:30 |
ykarel | [zuul@subnode-0 ~]$ skopeo inspect docker://docker.io/tripleomaster/centos-binary-rabbitmq:current-tripleo|grep rdo_version | 12:30 |
ykarel | "rdo_version": "b23b33707ba4f4bd0682e58be30e1d16c6232992_d032039d" | 12:30 |
ykarel | panda|rover, ^^ | 12:30 |
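The skopeo check above can be scripted rather than grepped; a small sketch that parses `skopeo inspect` JSON and reads the `rdo_version` label (the sample JSON is a hypothetical, minimal version of the real output):

```python
import json

def image_rdo_version(inspect_output):
    """skopeo inspect prints image metadata as JSON; the promoted hash
    shows up under Labels as "rdo_version" (as grepped in the log)."""
    data = json.loads(inspect_output)
    return (data.get("Labels") or {}).get("rdo_version")

sample = ('{"Name": "docker.io/tripleomaster/centos-binary-rabbitmq", '
          '"Labels": {"rdo_version": '
          '"b23b33707ba4f4bd0682e58be30e1d16c6232992_d032039d"}}')
```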
*** rlandy has joined #oooq | 12:30 | |
panda|rover | ykarel: yes, that's the hash that just promoted 1 hour ago | 12:30 |
ykarel | i can see b23b33707ba4f4bd0682e58be30e1d16c6232992_d032039d | 12:31 |
ykarel | 229 MB | 12:31 |
ykarel | 10 hours ago | 12:31 |
*** lucas-hungry is now known as lucasagomes | 12:31 | |
panda|rover | ykarel: I think the creation time is maintained in the container itself | 12:31 |
panda|rover | ykarel: and it was indeed created 10 hours ago | 12:32 |
panda|rover | ykarel: then there's the promotion pipeline, and the promotion process, with all the delays we had | 12:32 |
ykarel | panda|rover, so when you pushed rabbitmq i saw the time and it was 7 minutes ago | 12:32 |
panda|rover | 10 hours looks right | 12:32 |
panda|rover | heh | 12:32 |
panda|rover | you're right | 12:34 |
panda|rover | so at this point | 12:34 |
panda|rover | ykarel: for b23b33707ba4f4bd0682e58be30e1d16c62 part of the upload happened 10 hours ago | 12:35 |
panda|rover | but we were able to finish only 1 hour ago | 12:35 |
panda|rover | for all the containers | 12:35 |
weshay | quiquell|ruck, you are free to drop | 12:36 |
panda|rover | ykarel: also, we are tagging all the containers with current-tripleo, only after they are all pushed with the hash first | 12:36 |
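That ordering (hash tags first, promote tag last) can be modeled as a two-phase plan; a hypothetical plan builder, not the promoter's real code:

```python
def promotion_plan(images, dlrn_hash, registry="docker.io/tripleomaster",
                   promote_tag="current-tripleo"):
    """Push every image under its hash tag first; only once the full set
    is up is the promote tag applied, so current-tripleo never points
    at a partially uploaded batch."""
    pushes = ["push %s/%s:%s" % (registry, img, dlrn_hash)
              for img in images]
    retags = ["tag %s/%s:%s -> %s/%s:%s"
              % (registry, img, dlrn_hash, registry, img, promote_tag)
              for img in images]
    return pushes + retags
```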
quiquell|ruck | weshay: Ok, Mark McLoughlin is asking for something, I think trello cards for queens | 12:36 |
*** quiquell|ruck is now known as quique|ruck|food | 12:37 | |
weshay | quiquell|ruck, on the call? | 12:37 |
quique|ruck|food | weshay: On the doc | 12:37 |
quique|ruck|food | Comments on the right | 12:37 |
weshay | quique|ruck|food, https://trello.com/c/KhgqKQGB | 12:37 |
panda|rover | is asking for a CIX card | 12:38 |
panda|rover | I updated the bug, I'll update the card too | 12:38 |
rlandy | oh we're still on this container tag issue :( | 12:38 |
quique|ruck|food | weshay: Cool thanks | 12:39 |
rlandy | trown: good morning ... | 12:39 |
trown | rlandy: good morning... I finally had some success last night! | 12:40 |
rlandy | trown: that's good - mine timed out on TASK [Render deployment file for InstanceIdDeployment] | 12:40 |
trown | rlandy: I think it might be a hostname thing... I changed a few things at once though, so not totally sure | 12:40 |
panda|rover | apevec already updated the card | 12:40 |
rlandy | trown: which fs? | 12:41 |
trown | rlandy: but when I went back and deployed non-container pike scenario I could see rabbit was complaining | 12:41 |
trown | rlandy: I did fs007 on pike, just for familiarity in troubleshooting | 12:41 |
rlandy | non-container? or containerized? | 12:41 |
trown | rlandy: I will get what I have in to the patch so you can try, but I eventually got that featureset working | 12:42 |
trown | non-container | 12:42 |
trown | I am trying my changes on fs010 queens now though | 12:42 |
rlandy | trown: cool - in the mean time, I was going to try sshnaidm|off's new patch | 12:42 |
trown | k | 12:42 |
rlandy | trown: my deployment went much further the second time around though | 12:43 |
rlandy | when I reran overcloud-deploy manually on the undercloud | 12:43 |
rlandy | that I can't really explain | 12:44 |
trown | rlandy: that is probably a fluke... it isn't really possible to rerun multinode | 12:45 |
rlandy | trown: nothing really happened the first time round | 12:45 |
rlandy | bailed very early | 12:46 |
ykarel | panda|rover, b23b33707ba4f4bd0682e58be30e1d16c6232992_d032039d tagged images pushed manually? | 12:46 |
ykarel | as i can't see this hash in logs | 12:46 |
panda|rover | ykarel: no, it was an automatic run | 12:47 |
ykarel | ok logs will appear when script finishes | 12:47 |
ykarel | right? | 12:47 |
panda|rover | ykarel: script already finished | 12:48 |
*** apetrich_ has joined #oooq | 12:48 | |
ykarel | then where are logs? | 12:48 |
ykarel | i am looking here: http://38.145.34.55/master.log | 12:48 |
panda|rover | ykarel: search for the hash, it should be at 11:07, the first mention | 12:49 |
panda|rover | 2018-04-18 11:07:33,288 19409 INFO promoter Promoting the container images for dlrn hash b23b33707ba4f4bd0682e58be30e1d16c6232992 on master to current-tripleo | 12:49 |
ykarel | ya found after refreshing :) | 12:49 |
panda|rover | ykarel: oh yeah, we still don't have tail -f via http | 12:50 |
ykarel | so no failure this time | 12:50 |
ykarel | and master promoted | 12:50 |
panda|rover | yes, but I also halted all automatic runs | 12:50 |
ykarel | Ok, so u will manually run for remaining releases | 12:50 |
panda|rover | ykarel: yes, but I'm doing this to test the theory, that the server is unable to handle correctly more than one promotion at a time | 12:51 |
panda|rover | ykarel: we got all sort of dm errors in the logs | 12:51 |
ykarel | okk, i heard dm is deprecated, no? | 12:52 |
quique|ruck|food | panda|rover: We can restart docker instead of the server | 12:52 |
ykarel | dm is devicemapper, right? | 12:52 |
quique|ruck|food | ykarel: Yep | 12:52 |
panda|rover | quique|ruck|food: yes, but after queens finishes | 12:52 |
*** ratailor has quit IRC | 12:52 | |
quique|ruck|food | panda|rover: sure sure | 12:54 |
panda|rover | quique|ruck|food: twice sure, we must be very very sure | 12:54 |
quique|ruck|food | panda|rover: very very sure sure | 12:55 |
panda|rover | ykarel: we had errors in the logs that don't make much sense | 12:55 |
panda|rover | ykarel: missing layers when we are trying to push, or delete an image | 12:55 |
*** apetrich_ has quit IRC | 12:55 | |
ykarel | panda|rover, i think first of all we should fix retries | 12:55 |
ykarel | i think retries is not working | 12:56 |
weshay | panda|rover, note.. pike also promoted .. just in case you didn't see that | 12:56 |
quique|ruck|food | panda|rover: Looks like the ansible docker module does a lookup, maybe after pushing, to check that it's there | 12:56 |
quique|ruck|food | Or maybe before | 12:57 |
panda|rover | ykarel: "If the until parameter isn’t defined, the value for the retries parameter is forced to 1" | 12:57 |
ykarel | and we don't have until | 12:57 |
panda|rover | so it's certainly not working, but if the problem is temporary, it may not be solved by the time we are retrying | 12:57 |
weshay | panda|rover, do you want me to reschedule your 1-1? | 12:58 |
panda|rover | weshay: no, I have to wait for queens promotion to finish before taking any other action | 12:58 |
ykarel | panda|rover, yes it would not be solved but we can learn something from that if it passes in some tries | 12:58 |
weshay | panda|rover, I'm in | 12:59 |
panda|rover | ykarel: if it's a load problem we are just delaying the inevitable | 12:59 |
quique|ruck|food | If we go to the retry, let's add a delay if it's not one by default | 13:00 |
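In plain Python, the behaviour the fixed `retries`/`until` combination plus a delay should give looks roughly like this; the helper is an illustrative sketch, not the Ansible internals (note the docs quote above: without an `until:` condition, `retries` is forced to 1):

```python
import time

def retry(task, retries=3, delay=5):
    """Re-run task until it succeeds, mirroring Ansible's
    retries/until/delay behaviour. Raises the last error once the
    retry budget is exhausted."""
    for attempt in range(1, retries + 1):
        try:
            return task()
        except Exception:
            if attempt == retries:
                raise
            time.sleep(delay)  # give a transient registry error time to clear
```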
ykarel | hmm but is that load causing issue | 13:02 |
*** Goneri has joined #oooq | 13:04 | |
*** amoralej|lunch is now known as amoralej | 13:05 | |
quique|ruck|food | panda|rover: queen promoted :-) yeeeih !!! | 13:07 |
panda|rover | quique|ruck|food: hhmmm | 13:12 |
quique|ruck|food | https://dashboards.rdoproject.org/queens | 13:17 |
arxcruz | weshay: 1-1 in 10 ? | 13:21 |
*** zoli is now known as zoli|afk | 13:21 | |
weshay | arxcruz, aye | 13:22 |
*** quique|ruck|food is now known as quiquell|ruck | 13:24 | |
trown | booyakasha | 13:29 |
trown | 2018-04-18 13:17:20 | Ran: 1 tests in 73.0000 sec. | 13:29 |
trown | 2018-04-18 13:17:20 | - Passed: 1 | 13:29 |
trown | 2018-04-18 13:17:20 | - Skipped: 0 | 13:29 |
trown | 2018-04-18 13:17:20 | - Expected Fail: 0 | 13:29 |
trown | 2018-04-18 13:17:20 | - Unexpected Success: 0 | 13:29 |
trown | 2018-04-18 13:17:20 | - Failed: 0 | 13:29 |
trown | ^^ featureset10-queens | 13:29 |
trown | rlandy: ^ | 13:29 |
weshay | arxcruz, ready | 13:31 |
weshay | trown++ | 13:31 |
hubbot | weshay: trown's karma is now 39 | 13:31 |
weshay | trown++ | 13:31 |
hubbot | weshay: trown's karma is now 40 | 13:31 |
weshay | trown++ | 13:31 |
hubbot | weshay: trown's karma is now 41 | 13:31 |
panda|rover | booyakasha ? | 13:31 |
panda|rover | booyakasha++ ? | 13:31 |
rlandy | trown: very nice :) | 13:33 |
panda|rover | quiquell|ruck: weird, the script has not finished yet on my console | 13:33 |
panda|rover | it's hanging there | 13:33 |
quiquell|ruck | panda|rover: Same here | 13:35 |
panda|rover | I still see this 25fc8390fb49c6660b4dc953a2e2f0f3c977ff2b as ongoing | 13:35 |
*** trown|brb has joined #oooq | 13:35 | |
*** trown has quit IRC | 13:35 | |
panda|rover | oh | 13:35 |
panda|rover | it's promoting two hashes ? | 13:35 |
panda|rover | in a row ? | 13:35 |
panda|rover | that should not happen ... | 13:36 |
panda|rover | oh no no | 13:36 |
panda|rover | it's promoting phase1 | 13:36 |
panda|rover | Promoting the container images for dlrn hash 25fc8390fb49c6660b4dc953a2e2f0f3c977ff2b on queens to current-tripleo-rdo | 13:36 |
panda|rover | but it's working smoothly | 13:36 |
panda|rover | the overload theory stregthen | 13:37 |
panda|rover | I think we need to put the global lock in place .. | 13:37 |
rlandy | trown: what was the fix? | 13:37 |
rlandy | trown: can I rerun from https://etherpad.openstack.org/p/libvirt-setup-fake-nodepool-poc or are there more instructions? | 13:38 |
panda|rover | and then we can merge the retries fix from ykarel | 13:38 |
quiquell|ruck | panda|rover: More than a global lock, just one promoter script with a scheduler | 13:39 |
quiquell|ruck | This way we can even give priorities to different promotions | 13:39 |
quiquell|ruck | panda|rover: It's stuck here 28754 | 13:40 |
quiquell|ruck | Here /usr/bin/python2 /tmp/ansible_1FfWtT/ansible_module_docker_image.py | 13:41 |
panda|rover | quiquell|ruck: is that a process id ? | 13:41 |
panda|rover | ok | 13:41 |
quiquell|ruck | And the directory doesn't exist | 13:41 |
panda|rover | quiquell|ruck: look at the docker images output | 13:42 |
panda|rover | quiquell|ruck: it's pushing current-tripleo-rdo tags to docker.io | 13:43 |
jfrancoa | panda|rover: weshay: do you have a moment for a question? | 13:43 |
panda|rover | jfrancoa: it's going to cost you | 13:43 |
quiquell|ruck | panda|rover: that's phase 1? | 13:43 |
jfrancoa | panda|rover: whatever it is, I'll pay it ;-) | 13:43 |
panda|rover | jfrancoa: I like gazpacho a lot, you know ? | 13:43 |
weshay | jfrancoa, sure.. in 1-1 but can irc now or chat in a bit | 13:43 |
panda|rover | jfrancoa: shoot | 13:44 |
panda|rover | quiquell|ruck: yes | 13:44 |
quiquell|ruck | panda|rover: So it's doing the right job, just promoting phase 1 too | 13:44 |
quiquell|ruck | :-) | 13:44 |
panda|rover | weshay: phase1 of the previous promotion | 13:44 |
jfrancoa | panda|rover: we were wondering if it would be possible to create a pipeline similar to the experimental one, but dedicated to upgrades | 13:44 |
panda|rover | jfrancoa: I think we have to discuss it with rdo infra folks | 13:44 |
trown|brb | rlandy: I think the fix was the hostname stuff I added to the patch, but ya I updated the etherpad with the updated patch | 13:44 |
*** trown|brb is now known as trown | 13:45 | |
jfrancoa | panda|rover: the thing is that many patches are being backported in tht to former releases, and there is no upgrades job running in that project (which is normal because some of the jobs are not as stable as we'd wish) | 13:45 |
panda|rover | trown: can you stop being successful, there will be nothing left to do for the rest of the sprint :) | 13:45 |
trown | irc is acting goofy | 13:45 |
jfrancoa | panda|rover: but, if we could have an option to trigger all upgrade-related jobs upon request, something like "check-rdo-upgrades" | 13:45 |
*** udesale__ has joined #oooq | 13:45 | |
trown | panda|rover: lol... there is plenty to do to clean up the mess I have made... I was almost ready to give up on the entire sprint yesterday :P | 13:46 |
rlandy | trown: vm flavor? patch still has subnode-2 as control | 13:46 |
rlandy | subnode-1 | 13:46 |
trown | rlandy: ya I switched that back, and made them both big... like I said I changed a lot at the same time | 13:46 |
quiquell|ruck | panda|rover, trown: We are also overloading the sprints ? | 13:46 |
* trown has poor operational discipline | 13:46 | |
quiquell|ruck | With too much success | 13:47 |
panda|rover | quiquell|ruck: we are always overloading sprints | 13:47 |
trown | rlandy: but what is in that patcheset is what worked for me | 13:47 |
rlandy | trown: np - just checking so I test the right thing and avoid asking questions | 13:47 |
* rlandy reads through the diffs | 13:47 | |
trown | rlandy: and I suspect it was just the hostname change that did it, because that looked to be what was causing rabbitmq to barf when I went back to a release I actually know how to troubleshoot well | 13:48 |
panda|rover | jfrancoa: technically it's perfectly doable, the problem is the load this could cause on rdocloud, that's why we should talk with rdo infra | 13:48 |
panda|rover | jfrancoa: we can also try to find suitable alternatives | 13:48 |
jfrancoa | panda|rover: I submitted something in between, which makes use of the experimental pipeline: https://review.rdoproject.org/r/#/c/13420/1/zuul/upstream.yaml@307 | 13:48 |
rlandy | trown: can we pls change out your personal key? | 13:49 |
rlandy | --upload '/home/trown/.ssh/id_rsa.pub:/root/.ssh/authorized_keys' | 13:49 |
trown | rlandy: whoops, sure | 13:49 |
* rlandy has to manually correct that each time | 13:49 | |
jfrancoa | panda|rover: but the amount of jobs triggered would be huge (as the ovb-experimental are also included + the upgrades ones) | 13:50 |
trown | rlandy: will ~/ work for now? | 13:50 |
rlandy | trown: yep | 13:50 |
rlandy | anything we can merge | 13:50 |
rlandy | for now | 13:50 |
trown | rlandy: ok updated that | 13:50 |
rlandy | trown: I needed to edit /etc/resolv.conf | 13:51 |
rlandy | on subnode-0 | 13:51 |
panda|rover | jfrancoa: mmhh | 13:51 |
rlandy | to resolve the repos | 13:51 |
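The manual fix rlandy describes amounts to pointing the subnode at a working resolver; a sketch of it, with the nameserver address purely an example:

```shell
# Hypothetical workaround: give subnode-0 a resolver so the repo
# hostnames resolve. 8.8.8.8 is an example address, not the one used.
sudo tee /etc/resolv.conf >/dev/null <<'EOF'
nameserver 8.8.8.8
EOF
```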
weshay | trown, what was the issue? dns? | 13:51 |
trown | rlandy: hmm I have not needed to do that... | 13:51 |
rlandy | maybe just my setup | 13:51 |
rlandy | ok - let's try this again | 13:51 |
trown | weshay: still not 100% sure, but I think it was rabbitmq hostname issues | 13:51 |
trown | weshay: rabbitmq is very particular about that stuff | 13:52 |
hrybacki | rlandy: this morning confirmed that rebase didn't fix the issue either | 13:52 |
weshay | yes it is | 13:52 |
weshay | trown, is everything pushed to gerrit? | 13:52 |
trown | weshay: ya | 13:53 |
rlandy | hrybacki: then I think we still have a problem with the quickstart change https://github.com/openstack/tripleo-quickstart/commit/c8f9d725ea0306f980b96eff42b0f99230c5b8c7 | 13:53 |
*** holser__ has quit IRC | 13:54 | |
rlandy | and whatever it was fixed to | 13:54 |
trown | testing fresh with only what is in gerrit now to make sure | 13:54 |
rlandy | weshay: ^^ hrybacki is having problem with changes being copied | 13:54 |
*** holser__ has joined #oooq | 13:54 | |
quiquell|ruck | rlandy: Checking RHEL 7.5 I have arrived at "Failed to set up security class mapping" | 13:54 |
rlandy | weshay: you changed that revert? | 13:55 |
weshay | rlandy, that was reverted | 13:55 |
weshay | https://github.com/openstack/tripleo-quickstart/commit/05cce7bd240192b3682ba0545f863ab8e8ed5229 | 13:55 |
weshay | rlandy, the bug the user hit was fixed w/ https://github.com/openstack/tripleo-quickstart/commit/4ab8b782768c91f6ba1be9a9f6f58634a5e202b3 | 13:55 |
rlandy | quiquell|ruck: is that same error as in the screen shot and as on the other two stacks? | 13:56 |
rlandy | if so, I forwarded weshay emails from bob fournier regarding the IPA image | 13:57 |
quiquell|ruck | rlandy: Yep | 13:57 |
rlandy | ok - pls see his response to the emails | 13:57 |
rlandy | I was told to drop the investigation at that point | 13:57 |
quiquell|ruck | weshay: Can you send me those e-mails those e-mails | 13:59 |
* quiquell|ruck rerepeating | 13:59 | |
quiquell|ruck | rlandy: Found a bug from him | 14:02 |
quiquell|ruck | https://bugzilla.redhat.com/show_bug.cgi?id=1566110 | 14:02 |
openstack | bugzilla.redhat.com bug 1566110 in openstack-selinux "selinux errors in IPA with OSP-12 using RHEL 7.5" [High,Closed: notabug] - Assigned to lhh | 14:02 |
rlandy | quiquell|ruck: forwarding you the last emails | 14:03 |
quiquell|ruck | rlandy: most obliged | 14:04 |
quiquell|ruck | are we the ones building the IPA images, or do we have to use "official" ones? | 14:05 |
rlandy | quiquell|ruck: we don't build any images | 14:05 |
rlandy | if you look at the release file, you will see the IPA images defined | 14:06 |
rlandy | ie: where we pull it from | 14:06 |
rlandy | rhos-release install | 14:07 |
quiquell|ruck | Maybe with RHEL 7.5 we are exposing some selinux stuff into the introspected nodes | 14:08 |
quiquell|ruck | Or even a stupid kernel boot option ? | 14:08 |
rlandy | weshay: to get back to hrybacki's problem, he is making changes to tqe on the undercloud in /opt/stack and not seeing those changes picked up in the toci-gate-test run (manually, on an env set up by the reproducer) ... | 14:12 |
rlandy | the zuul_changes are picked up | 14:12 |
rlandy | but not subsequent changes | 14:12 |
weshay | hrybacki, post the log again | 14:12 |
hrybacki | weshay: oh it's so long gone. I'll look up the channel log shortly though | 14:12 |
weshay | faker | 14:12 |
hrybacki | lol | 14:12 |
weshay | :) | 14:13 |
rlandy | hrybacki: weshay: https://paste.fedoraproject.org/paste/KMIyEKt5JSrYYhd4j8Xe1w | 14:13 |
panda|rover | jfrancoa: you have 5 minutes to chat ? | 14:13 |
jfrancoa | panda|rover: sure | 14:13 |
*** zoli|afk is now known as zoli | 14:14 | |
hubbot | FAILING CHECK JOBS: gate-tripleo-ci-centos-7-container-to-container-upgrades-master-nv, tripleo-quickstart-extras-gate-newton-delorean-full-minimal | check logs @ https://review.openstack.org/472607 and fix them ASAP. | 14:14 |
*** zoli is now known as zoli|wfh | 14:14 | |
hrybacki | weshay: that was the complete run. Here is the specific example of missing bits: https://paste.fedoraproject.org/paste/C18QtGEix-K9Wrk-I9hguA | 14:14 |
panda|rover | jfrancoa: bj/gcerami ? | 14:14 |
jfrancoa | panda|rover: joining | 14:15 |
weshay | hrybacki, this is something you are running locally? | 14:15 |
hrybacki | weshay: so I am running a reproducer script that is pulling in changes for OOOQ and OOOQ-E | 14:15 |
hrybacki | after the initial deployment I log onto the undercloud and make changes in /opt/stack/* (to avoid submitting patchsets for test runs) | 14:15 |
*** gkadam has quit IRC | 14:16 | |
hrybacki | then I invoke the toci script expecting it to use what lives in /opt/stack/* | 14:16 |
hrybacki | but what we find is /tmp/.quickstart/* does not line up with what is in /opt/stack/* | 14:16 |
hrybacki | my guess is quickstart.sh is re-pulling the changes from gerrit | 14:16 |
hrybacki | IIUC that is not the intended behavior | 14:17 |
hrybacki | and a one letter typo => new patchset :( | 14:18 |
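One way to confirm the mismatch hrybacki describes is to compare the commit each tree is actually at; a sketch, with the directory layout assumed from the chat rather than verified against the reproducer:

```shell
# Hypothetical helper: report whether two checkouts of a repo point at
# the same commit, e.g. /opt/stack/<repo> vs /tmp/.quickstart/<repo>.
same_head() {
    local a b
    a=$(git -C "$1" rev-parse HEAD) || return 2
    b=$(git -C "$2" rev-parse HEAD) || return 2
    if [ "$a" = "$b" ]; then echo same; else echo different; fi
}
# Example invocation (paths are assumptions from the chat):
# same_head /opt/stack/tripleo-quickstart /tmp/.quickstart/tripleo-quickstart
```

If this prints `different` after a toci run, quickstart.sh really did re-fetch from gerrit rather than use the local tree.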
rlandy | quiquell|ruck: panda|rover: note incoming review from rasca adding rhos-13 pipeline ( see discussion on #rhos-dev) | 14:21 |
weshay | hrybacki, hrm... | 14:21 |
rlandy | weshay: ^^ fyi | 14:21 |
weshay | k | 14:21 |
weshay | quiquell|ruck, panda|rover tool I quickly ginned up to help discover the owning DFG for escalations https://github.com/weshayutin/google_sheet_search | 14:21 |
weshay | hrybacki, if you could get that setup and tmate us in that would be very helpful | 14:22 |
hrybacki | weshay: yeah. Not sure if it's a bug or not but it does require me to bombard CI more than I'd prefer when testing stuff out | 14:22 |
hrybacki | If it is a new bug (or a recurrence of an old one) I'm happy to create/add to an old bug report | 14:23 |
hrybacki | rlandy: weshay ^^ | 14:23 |
rlandy | hrybacki: the old bug report was a diff problem to start with | 14:24 |
quiquell|ruck | rlandy: Going to check thanks | 14:24 |
quiquell|ruck | weshay: Watched | 14:25 |
hrybacki | ack. But the intended behavior is for what lives in /opt/stack to be what toci uses and subsequently what ends up in /tmp/.quickstart, right rlandy? | 14:25 |
rlandy | hrybacki: stupid question - do you clean out LOCAL_WORKING_DIR="$WORKSPACE/.quickstart" when you rerun? | 14:26 |
hrybacki | rlandy: on the undercloud? | 14:26 |
hrybacki | rlandy: every execution is on a fresh undercloud (I've never had success running a toci script more than once on a system) | 14:27 |
rlandy | ok - should create a new tmp dir anyways | 14:28 |
* hrybacki nods | 14:28 | |
hrybacki | it does. I also wipe out the old /tmp/repro dirs just in case | 14:28 |
rlandy | quiquell|ruck: wrt rhos-7.5 | 14:28 |
rlandy | bob questions the way we are extracting the IPA image | 14:31 |
rlandy | you can check his untar comments | 14:31 |
rlandy | the envs are still up | 14:31 |
rlandy | tbh - I don't see what we are doing wrong all of a sudden | 14:31 |
rlandy | but we need to prove it | 14:31 |
rlandy | I don't think we magically create selinux issues | 14:32 |
rlandy | the question is if somehow others are turning off selinux so we are the ones showing it | 14:32 |
panda|rover | :( lp is timing out right after I wrote a long bug description | 14:35 |
quiquell|ruck | rlandy: I will check, I have seen a success now running only the introspection on one of the baremetals | 14:37 |
quiquell|ruck | rlandy: Do you know how to check the kernel boot options of the IPA image? | 14:51 |
quiquell|ruck | weshay, adarazs: We can see the script now | 14:53 |
adarazs | quiquell|ruck: ? you mean start the meeting early? | 14:54 |
quiquell|ruck | https://bluejeans.com/7891065232 | 14:54 |
quiquell|ruck | adarazs: Yes, If we all can | 14:54 |
weshay | aye joining | 14:54 |
adarazs | me too | 14:54 |
rlandy | panda|rover: pls see question on #rhos-ops | 15:01 |
rlandy | is there any milestone that needs to be achieved during next week? we are planning a scheduled outage because of a networking maintenance job. Is it ok, or should any specific date be avoided? | 15:01 |
rlandy | rdocloud | 15:02 |
rlandy | quiquell|ruck: ^^ fyi | 15:02 |
panda|rover | rlandy: sorry I wasn't in the program call, last queens import was 3 days ago | 15:04 |
rlandy | panda|rover: pls join #rhos-ops for the discussion | 15:04 |
weshay | quiquell|ruck, https://github.com/rdo-infra/ci-config/tree/master/ci-scripts | 15:13 |
*** ykarel has quit IRC | 15:17 | |
*** quiquell|ruck is now known as quiquell|off | 15:17 | |
rlandy | quiquell|off: ugh - sorry - missed you :( | 15:18 |
rlandy | going to answer the IPA question | 15:18 |
panda|rover | quiquell|off: https://review.rdoproject.org/r/13429 | 15:18 |
panda|rover | d'oh | 15:18 |
quiquell|off | rlandy: Just paste it here I will read it tomorrow | 15:18 |
rlandy | k - good night | 15:18 |
panda|rover | quiquell|off: have a nice rest of day | 15:18 |
rlandy | sorry - got distracted on other channel | 15:19 |
hrybacki | rlandy: weshay is there a way to stop the log squelching in the toci script? | 15:21 |
weshay | hrybacki, like what in particular? | 15:22 |
hrybacki | weshay: I want verbose ansible output with none of this `no_log: true` business for debugging | 15:22 |
weshay | hrybacki, the ansible logs rarely help much | 15:23 |
*** skramaja has quit IRC | 15:23 | |
hrybacki | weshay: I feel like I'm blindly searching for a needle in a haystack | 15:24 |
panda|rover | weshaystack | 15:24 |
weshay | hrybacki, tmate your env | 15:25 |
hrybacki | weshay: ? | 15:26 |
hrybacki | I just need to see what the tripleo-inventory role is actually doing | 15:27 |
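For the verbosity half of what hrybacki is after, Ansible honors an environment variable, so a toci run can be made chatty without editing the scripts; note this does not undo tasks marked `no_log: true` — those have to be edited in the playbook itself:

```shell
# Raise Ansible verbosity for the next run; equivalent to passing -vvv.
# This does NOT override tasks that set no_log: true.
export ANSIBLE_VERBOSITY=3
echo "ANSIBLE_VERBOSITY=$ANSIBLE_VERBOSITY"
```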
*** tosky has quit IRC | 15:27 | |
*** tosky has joined #oooq | 15:29 | |
hrybacki | weshay: http://etherpad.corp.redhat.com/tls-everywhere-on-rdo-cloud -- current issues that aren't struck out are still persisting | 15:30 |
rlandy | trown: got delayed - running with your latest changes now | 15:34 |
trown | rlandy: k... I think I might be missing something, my tests from scratch have failed... one was on fs037 though | 15:35 |
*** bogdando has quit IRC | 15:36 | |
rlandy | running so far | 15:37 |
*** ykarel has joined #oooq | 15:45 | |
*** holser__ has quit IRC | 15:47 | |
*** holser__ has joined #oooq | 15:48 | |
*** udesale__ has quit IRC | 15:52 | |
trown | rlandy: I think we need to reboot the vms after setting the hostname... or maybe just set it with the hostname command too... | 15:53 |
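The alternative trown mentions — applying the hostname immediately instead of waiting for a reboot — would look roughly like this on a subnode; the hostname value is an example, not the one the patch uses:

```shell
# Hypothetical: set the hostname both persistently and for the running
# system, so rabbitmq sees it without a reboot. Requires root.
sudo hostnamectl set-hostname subnode-0.localdomain
sudo hostname subnode-0.localdomain
```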
rlandy | trown: getting fail ssh'ing to subnode-1 | 15:54 |
rlandy | failure | 15:54 |
*** links has quit IRC | 15:55 | |
rlandy | toci-gate-test | 15:55 |
rlandy | says ip is unreachable | 15:55 |
trown | hmmm that is different than what I was hitting | 15:56 |
rlandy | ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=Verbose -o PasswordAuthentication=no -o ConnectionAttempts=32 -tt -i /etc/nodepool/id_rsa 192.168.122.99 sudo mkdir -p /opt/stack/new/tripleo-ci | 15:56 |
rlandy | times out | 15:56 |
rlandy | new error | 15:57 |
*** agopi has joined #oooq | 15:57 | |
rlandy | 192.168.122.17 is the undercloud | 15:58 |
*** jfrancoa has quit IRC | 16:06 | |
*** florianf has quit IRC | 16:16 | |
myoung | weshay: panda|rover: does this look ok (sprint 12 planning summary for email) - would like to send this shortly: https://docs.google.com/document/d/1ZuK4uvRO-kpjiErChzwI5s3PU2pZxwQal5IRYWl32qU/edit?usp=sharing | 16:16 |
myoung | chandankumar: ^^ | 16:18 |
trown | rlandy: ok.. that is different than what I am hitting... apparently I am hitting some new legit DIB bug... see #tripleo | 16:20 |
trown | rlandy: I do think we probably need a reboot of the nodes though... adding that in | 16:21 |
* rlandy tries that | 16:23 | |
weshay | trown, rlandy https://bugs.launchpad.net/tripleo/+bug/1765123 | 16:25 |
openstack | Launchpad bug 1765123 in tripleo "dib-run-parts fails with /tmp/tmpkGJoNl/pre-install.d/05-rpm-epel-release: line 9: DISTRO_NAME: unbound variable" [Critical,Triaged] | 16:25 |
trown | rlandy: I put up a new patch with that, that is also on top of the fix for DIB issue | 16:25 |
trown | rlandy: updating etherpad as well | 16:25 |
panda|rover | trown: is https://review.openstack.org/561630 ready to be merged ? | 16:29 |
trown | panda|rover: no | 16:29 |
rlandy | thanks | 16:30 |
*** kopecmartin has quit IRC | 16:30 | |
weshay | trown, you need to change your patch to depend-on: https://review.openstack.org/#/c/562325/ | 16:32 |
trown | weshay: I am not so sure that patch will work tbh... I put my patch on top of alex's that will exclude dib from current repo | 16:33 |
weshay | trown, k k.. I'm just not sure if the bug is in the latest package | 16:34 |
weshay | in current-tripleo | 16:34 |
trown | that could not get through promotion | 16:34 |
weshay | k | 16:35 |
weshay | trown, sorry.. btw.. for the libvirt playbook.. run as a regular user? | 16:35 |
*** lucasagomes is now known as lucas-afk | 16:35 | |
trown | weshay: ya regular user | 16:36 |
*** panda|rover is now known as panda|rover|off | 16:36 | |
trown | rlandy: I think I am hitting the same thing as you now... I think this is why I hardcoded the key | 16:42 |
trown | rlandy: trying to find a better way | 16:42 |
rlandy | trown: yep - that was not the case before | 16:44 |
weshay | trown, mind if I add | 16:44 |
weshay | - name: ensure libvirt volume path exists | 16:44 |
weshay |   file: | 16:44 |
weshay |     path: "{{ libvirt_volume_path }}" | 16:44 |
weshay |     state: directory | 16:44 |
weshay | to roles/libvirt/setup/common/tasks/main.yml | 16:45 |
rlandy | I just wanted to get the review to a mergeable state | 16:45 |
rlandy | hence the request to remove the hardcoded key | 16:45 |
trown | rlandy: ya, and annoying to have to change that anytime a new patchset is up | 16:45 |
weshay | ? | 16:46 |
trown | weshay: line 54 https://review.openstack.org/#/c/561630/5/roles/libvirt/setup/overcloud/tasks/fake_nodepool.yml | 16:47 |
trown | weshay: I had it hardcoded to /home/trown/... | 16:47 |
trown | weshay: but what I have there now doesn't actually work | 16:48 |
weshay | don't think it's related | 16:48 |
weshay | was failing on | 16:49 |
weshay | TASK [libvirt/setup/common : Start volume pool] ********************************************************************************************************************************************************************* | 16:49 |
weshay | Wednesday 18 April 2018 12:39:16 -0400 (0:00:00.461) 0:00:07.681 ******* | 16:49 |
weshay | fatal: [whayutin-testbox]: FAILED! => {"changed": false, "failed": true, "msg": "cannot open directory '/opt/vm_images': No such file or directory"} | 16:49 |
rlandy | I got that before | 16:49 |
rlandy | create the /opt/vm_images dir | 16:50 |
weshay | rlandy, right.. but ansible should do that | 16:51 |
weshay | :) | 16:51 |
weshay | trown, it won't hurt anything | 16:51 |
trown | weshay: ya we are talking about 2 different things :P | 16:53 |
weshay | trown, rlandy updated the review | 16:54 |
trown | weshay: there are a million things to clean up, that is what will become the sprint, I just want to get something that works | 16:54 |
trown | weshay: if you hack on the same review though... that will get messy | 16:54 |
weshay | ya.. I don't like doing it | 16:54 |
weshay | gerrit sucks in that regard | 16:54 |
weshay | so.. what's a good workflow.. two diff reviews and then reconcile them | 16:54 |
weshay | ? | 16:54 |
weshay | anyone have a sec for a zuul config question | 17:05 |
*** amoralej is now known as amoralej|off | 17:07 | |
rlandy | here we go again | 17:12 |
*** trown is now known as trown|lunch | 17:12 | |
rlandy | Set hostname correctly for subnode-0 - "Failed to connect to the host via ssh | 17:14 |
* rlandy goes back to old changes | 17:14 | |
*** ykarel has quit IRC | 17:21 | |
*** holser__ has quit IRC | 17:24 | |
*** marios has quit IRC | 17:37 | |
*** zoli|wfh is now known as zoli|gone | 18:03 | |
*** ykarel has joined #oooq | 18:04 | |
*** ykarel has quit IRC | 18:19 | |
*** trown|lunch is now known as trown | 18:30 | |
trown | rlandy: I updated review with a fix for the ssh key issue | 18:32 |
trown | rlandy: it now defaults to ~/.ssh/id_rsa.pub ... but can be overridden, and actually works :P | 18:33 |
rlandy | cool - will try in a bit - just fixing some hardware | 18:34 |
trown | the fix from alex for the undercloud install dib issue did not work for me on queens... trying pike | 18:36 |
*** holser__ has joined #oooq | 18:44 | |
trown | oh duh... just realized I have not been passing ZUUL_CHANGES to toci | 18:46 |
*** Goneri has quit IRC | 18:50 | |
*** tosky has quit IRC | 18:52 | |
*** tosky has joined #oooq | 18:55 | |
*** tesseract has quit IRC | 19:15 | |
*** atoth has quit IRC | 19:30 | |
*** holser__ has quit IRC | 19:36 | |
hrybacki | trown: woo! that was another one I was gonna bring up :P | 19:55 |
*** dmellado has quit IRC | 20:45 | |
*** holser__ has joined #oooq | 21:06 | |
rlandy | trown: hit a failure TASK [repo-setup : Setup repos on live host] subnode-2 | 21:07 |
rlandy | see that? | 21:07 |
rlandy | subnode-2? | 21:07 |
rlandy | hosts only has subnode 0 and 1 | 21:08 |
*** strattao has quit IRC | 21:09 | |
trown | nah different hosts... gotta run though | 21:11 |
trown | doesn't matter what we name them in the dummy setup part though | 21:11 |
*** trown is now known as trown|outtypewww | 21:11 | |
*** strattao has joined #oooq | 21:12 | |
*** apetrich_ has joined #oooq | 21:27 | |
*** holser__ has quit IRC | 21:48 | |
*** apetrich_ has quit IRC | 21:49 | |
*** rfolco is now known as rfolco|off | 21:59 | |
*** rlandy has quit IRC | 22:18 | |
*** yolanda has quit IRC | 22:32 | |
*** tosky has quit IRC | 23:02 | |
*** strattao has quit IRC | 23:14 | |
*** strattao has joined #oooq | 23:15 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!