*** jmasud has joined #oooq | 00:04 | |
*** jmasud has quit IRC | 00:43 | |
*** holser has quit IRC | 03:17 | |
*** ykarel|away is now known as ykarel | 04:23 | |
*** ratailor has joined #oooq | 04:38 | |
*** jtomasek has joined #oooq | 04:59 | |
*** jmasud has joined #oooq | 05:13 | |
*** saneax has joined #oooq | 05:26 | |
*** jtomasek has quit IRC | 05:37 | |
ysandeep | folks o/ , have you seen this kind of error before in a container build: "SystemError: The following jobs were incomplete: [{'swift-base" ? but the container build itself seems successful | 05:56 |
ysandeep | https://sf.hosted.upshift.rdu2.redhat.com/logs/openstack-periodic-rhos-17/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-containers-rhel-8-rhos-17-build-push/75dc618/logs/build.log ? | 05:56 |
bhagyashris | ysandeep, hey, there was some discussion about this on Friday, you can see the logs here http://paste.openstack.org/show/794448/ | 06:01 |
ysandeep | bhagyashris, thank you o/ | 06:02 |
*** matbu has joined #oooq | 06:06 | |
*** udesale has joined #oooq | 06:19 | |
*** jmasud has quit IRC | 06:40 | |
*** jtomasek has joined #oooq | 06:41 | |
*** jmasud has joined #oooq | 06:42 | |
*** yolanda has joined #oooq | 06:43 | |
*** ysandeep is now known as ysandeep|afk | 06:57 | |
*** skramaja has joined #oooq | 07:04 | |
*** jmasud has quit IRC | 07:12 | |
*** ccamacho has joined #oooq | 07:32 | |
*** tosky has joined #oooq | 07:39 | |
*** amoralej|off is now known as amoralej | 07:52 | |
*** jpena|off is now known as jpena | 07:56 | |
*** ysandeep|afk is now known as ysandeep | 08:08 | |
*** dtantsur has joined #oooq | 08:12 | |
*** jfrancoa has joined #oooq | 08:40 | |
*** sshnaidm|afk is now known as sshnaidm | 08:44 | |
*** jtomasek has quit IRC | 08:48 | |
*** jtomasek has joined #oooq | 08:50 | |
*** apetrich has joined #oooq | 08:54 | |
*** holser has joined #oooq | 08:56 | |
*** jschlueter has joined #oooq | 09:12 | |
*** ccamacho has quit IRC | 09:55 | |
*** jbadiapa has joined #oooq | 10:02 | |
akahat | cgoncalves, o/ | 10:30 |
cgoncalves | akahat, hi | 10:30 |
akahat | cgoncalves, I need to talk about this: https://review.opendev.org/#/c/731501 | 10:30 |
akahat | cgoncalves, will the enable_provider_drivers not work here? | 10:31 |
akahat | I mean we have only three drivers: amphora, octavia and ovn. | 10:31 |
*** ccamacho has joined #oooq | 10:31 | |
cgoncalves | akahat, ideally the OVN provider driver should be appended. we must not assume the 'octavia' and 'amphora' provider drivers are enabled | 10:32 |
akahat | cgoncalves, okay. so appending only ovn will work? | 10:34 |
cgoncalves | akahat, yes, appended if the provider driver isn't already present. I am not sure I follow what's driving this change though. could you please help me understand? | 10:40 |
*** jtomasek has quit IRC | 10:41 | |
akahat | cgoncalves, this will help to understand: https://tree.taiga.io/project/tripleo-ci-board/task/1699?kanban-status=1447275 | 10:42 |
*** jtomasek has joined #oooq | 10:43 | |
*** derekh has joined #oooq | 10:46 | |
cgoncalves | akahat, maybe a better approach would be to construct the 'enabled_provider_drivers' in tempestconf based on the enabled provider drivers that you can get via Octavia API | 10:48 |
cgoncalves | btw, you have a typo in the conf setting. it is "enable***d***_provider_drivers" | 10:49 |
cgoncalves | akahat, https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=list-providers-detail#list-providers | 10:49 |
akahat | cgoncalves, okay. I'll fix it. | 10:52 |
akahat | cgoncalves, thanks :) | 10:52 |
cgoncalves | you're welcome | 10:54 |
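The approach cgoncalves suggests above can be sketched with the load-balancer CLI, which exposes the same data as the List Providers API; the exact value format octavia-tempest-plugin expects for enabled_provider_drivers (plain names vs. name:description pairs) is an assumption here and should be checked against the plugin's config options.

    # Sketch: discover the enabled provider drivers from the Octavia API
    # and join them into a comma-separated config value.
    providers=$(openstack loadbalancer provider list -f value -c name | paste -sd, -)
    echo "enabled_provider_drivers = ${providers}"   # e.g. amphora,octavia,ovn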
arxcruz | sshnaidm: hey, how can I add an ansible collection in tq? | 10:58 |
arxcruz | sshnaidm: context: os_tempest is now using openstack.cloud collection | 10:58 |
arxcruz | and we need to add it on tq | 10:59 |
arxcruz | I'm checking the tripleo-ansible-operator; it has the python.py file and you install it using the tripleo-quickstart-extras requirements, but openstack.cloud doesn't have that, and I'm not sure if it's the right way | 10:59 |
arxcruz | I would do it with ansible-galaxy, but tq doesn't seem to have that | 11:00 |
sshnaidm | arxcruz, https://review.opendev.org/#/c/730083/ | 11:04 |
arxcruz | sshnaidm: danke Herr Shnaidm, I was seeing this path to follow but was unsure | 11:06 |
sshnaidm | arxcruz, de nada, señor Arx | 11:07 |
sshnaidm | or how do you say it in Brazilian Portuguese? :D | 11:07 |
arxcruz | de nada senhor Arx | 11:08 |
arxcruz | we don't have the ñ in portuguese | 11:08 |
arxcruz | in this case it's pronounced the same, just with a slight accent | 11:09 |
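For background on the question above, pulling in a collection such as openstack.cloud is normally done through an ansible-galaxy requirements file; the sketch below is generic (the file name and install path are assumptions), and the actual wiring for tripleo-quickstart is whatever the linked review 730083 implements.

    # Generic sketch, not the tq change itself:
    cat > collections-requirements.yml <<'EOF'
    collections:
      - name: openstack.cloud
    EOF
    ansible-galaxy collection install -r collections-requirements.yml \
        -p "${HOME}/.ansible/collections"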
*** jpena is now known as jpena|lunch | 11:32 | |
*** rfolco has joined #oooq | 11:37 | |
weshay|ruck | 0/ | 11:48 |
cgoncalves | \0 | 11:50 |
weshay|ruck | pojadhav|ruck, rfolco we should sync up | 11:53 |
pojadhav|ruck | weshay|ruck, yup | 11:53 |
weshay|ruck | k.. holding for rfolco | 11:53 |
rfolco | weshay|ruck, pojadhav|ruck need 2 min, will get a coffee | 11:54 |
*** rfolco is now known as rfolco|rover | 11:54 | |
weshay|ruck | https://meet.google.com/one-rbow-bcs | 11:56 |
weshay|ruck | pojadhav|ruck, 2020-06-08 08:33:52.573060 | primary | urllib3.exceptions.LocationParseError: Failed to parse: https://trunk.rdoproject.org/api-centos8-master-uc/api/report_result | 11:58 |
*** rlandy has joined #oooq | 12:06 | |
*** skramaja has quit IRC | 12:10 | |
*** skramaja has joined #oooq | 12:10 | |
rfolco|rover | weshay|ruck, https://review.opendev.org/#/c/733790 | 12:11 |
*** jfrancoa has quit IRC | 12:12 | |
*** jfrancoa has joined #oooq | 12:14 | |
weshay|ruck | rfolco|rover, pojadhav|ruck https://logserver.rdoproject.org/openstack-periodic-24hr/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-1ctlr_1comp-featureset002-train/773e973/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz | 12:20 |
*** amoralej is now known as amoralej|lunch | 12:22 | |
*** ratailor has quit IRC | 12:31 | |
rlandy | rfolco|rover: do we have scrum today? | 12:39 |
rfolco|rover | rlandy, no, we have only on thu | 12:40 |
rlandy | oh ok - we took the once a week option | 12:40 |
rfolco|rover | per team's suggestion yep | 12:40 |
rfolco|rover | for this sprint only, as an experiment | 12:40 |
*** udesale_ has joined #oooq | 12:49 | |
weshay|ruck | rfolco|rover, that's the neutron ptl.. Slawomir Kaplonski | 12:52 |
*** udesale has quit IRC | 12:52 | |
ysandeep | rlandy, hey, sorry, I was out on Friday so I am not sure what you and chandankumar decided for the image build issue. Any luck with it? | 12:52 |
rfolco|rover | weshay|ruck, ok thanks | 12:52 |
weshay|ruck | rfolco|rover, irc slaweq | 12:53 |
rlandy | ysandeep: hi - yes - chandankumar changed some settings in the env review | 12:53 |
ysandeep | rlandy, whenever you are free, can we sync for a few minutes today? | 12:53 |
rlandy | ysandeep: but then that image expired :( | 12:53 |
rlandy | so I had to update the image | 12:54 |
*** saneax is now known as saneax_AFK | 12:54 | |
rlandy | ysandeep: yeah - just trying to kick the two diff image build jobs again so we can promote and clear the scenario failures | 12:54 |
rlandy | will ping you in a bit | 12:54 |
rfolco|rover | pojadhav|ruck, need 10 min before we sync | 12:55 |
ysandeep | rlandy, ack thanks! and yes, that container build is failing with weird SystemErrors; I saw you and marios had some discussion about it on Friday. | 12:55 |
pojadhav|ruck | rfolco|rover, okay | 12:56 |
*** jpena|lunch is now known as jpena | 13:03 | |
*** ykarel is now known as ykarel|afk | 13:11 | |
*** amoralej|lunch is now known as amoralej | 13:14 | |
rlandy | ysandeep: ok - if you have time now, let's chat | 13:16 |
ysandeep | rlandy, sure | 13:16 |
rlandy | ysandeep: https://meet.google.com/xpa-ceom-onm | 13:17 |
sshnaidm | rlandy, hi | 13:20 |
sshnaidm | rlandy, how is downstream ovb going? | 13:20 |
*** rlandy is now known as rlandy|mtg | 13:20 | |
rlandy|mtg | sshnaidm: not great - the introspection still fails | 13:20 |
rlandy|mtg | it looks like the cloud is very slow to respond to power actions | 13:21 |
sshnaidm | rlandy|mtg, timeout? | 13:21 |
rlandy|mtg | in mtg now - will show you in a bit | 13:21 |
sshnaidm | ack | 13:21 |
rlandy|mtg | introspection outright fails | 13:21 |
rlandy|mtg | no clear trace as to why | 13:21 |
weshay|ruck | pojadhav|ruck, rfolco|rover directories/files still missing in latest https://logserver.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-centos-8-buildimage-overcloud-full-master/4bd3223/job-output.txt | 13:31 |
rfolco|rover | 2020-06-08 08:37:50.462917 | primary | ok: All assertions passed | 13:35 |
rfolco|rover | 2020-06-08 08:37:50.463339 | | 13:35 |
rfolco|rover | 2020-06-08 08:37:50.500997 | primary | ok: All assertions passed | 13:35 |
rfolco|rover | missing /etc/pki/CA/private | 13:35 |
rfolco|rover | weshay|ruck, ok let me fix it and test the routine locally first | 13:36 |
weshay|ruck | rfolco|rover, what are you fixing? | 13:39 |
rfolco|rover | weshay|ruck, it should fail assertions | 13:39 |
rfolco|rover | isn't it? | 13:39 |
weshay|ruck | rfolco|rover, 2020-06-08 13:24:28.136248 | primary | "msg": "Assertion failed" | 13:39 |
rfolco|rover | weshay|ruck, I was looking at different job https://logserver.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-centos-8-buildimage-overcloud-full-master/ea5c167/job-output.txt | 13:42 |
*** TrevorV has joined #oooq | 13:43 | |
rfolco|rover | weshay|ruck, ok so it's doing what it's supposed to | 13:43 |
weshay|ruck | rfolco|rover, the build is failing appropriately, but we need to resolve the root cause of missing files still | 13:46 |
rfolco|rover | weshay|ruck, yeah, looking at which package provides it and comparing to green jobs | 13:46 |
rfolco|rover | weshay|ruck, openssl-libs-1.1.1c-2.el8_1.1.x86_64 is installed | 13:48 |
rfolco|rover | in the controller at least :) | 13:48 |
rfolco|rover | weshay|ruck, this check is weird... there is no "missing /etc/pki..." on rpm_va https://logserver.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-centos-8-buildimage-overcloud-full-master/4bd3223/rpm_va.txt | 13:54 |
rfolco|rover | so the grep fails | 13:54 |
rfolco|rover | let me look at this test | 13:54 |
rfolco|rover | just thinking out loud here... | 14:00 |
rfolco|rover | [rfolco@redbox tripleo-ci]$ rpm -Va openssl-libs-1.1.1c-2.el8_1.1.x86_64 /etc/pki/tls/private | 14:00 |
rfolco|rover | [rfolco@redbox tripleo-ci]$ echo $? | 14:00 |
rfolco|rover | 0 | 14:00 |
*** ykarel|afk is now known as ykarel | 14:09 | |
rfolco|rover | weshay|ruck, I think I know what the issue is | 14:09 |
rfolco|rover | weshay|ruck, the check is giving false negative | 14:09 |
weshay|ruck | rfolco|rover, keep in mind, the issue often shows up in the deploy.. what makes you think it's a false negative? | 14:11 |
rfolco|rover | the test now is giving false negative | 14:11 |
rfolco|rover | the fix | 14:11 |
rfolco|rover | weshay|ruck, if we tmate I can explain better | 14:12 |
weshay|ruck | k.. in mtg atm | 14:12 |
rfolco|rover | ok | 14:12 |
*** rlandy|mtg is now known as rlandy | 14:15 | |
rlandy | sshnaidm: hi .. is it possible that we see a significant slowdown on the nodes after changing the network? | 14:16 |
rlandy | sshnaidm: we have jobs timing out that never did before last week | 14:16 |
rfolco|rover | weshay|ruck, ok, I'm confident this is wrong and working on a fix... http://pastebin.test.redhat.com/872965 -- buggy code: https://github.com/openstack/tripleo-ci/blob/508376e178eab29f0debc5dbb40908d5dc985eb1/roles/oooci-build-images/tasks/image_sanity.yaml#L36 | 14:17 |
rfolco|rover | weshay|ruck, in summary: if (***AND ONLY IF***) we find /etc/pki/tls/private in rpm_Va output, we should check if it is "missing" | 14:21 |
rfolco|rover | updated pastebin shows this http://pastebin.test.redhat.com/872972 | 14:21 |
ysandeep | rlandy, fyi.. test run for that last task passed | 14:23 |
weshay|ruck | rfolco|rover, ah ya.. see what you mean.. /etc/pki is not listed here https://logserver.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-centos-8-buildimage-overcloud-full-master/4bd3223/rpm_va.txt | 14:25 |
weshay|ruck | but is marked as failed here https://logserver.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-centos-8-buildimage-overcloud-full-master/4bd3223/job-output.txt | 14:25 |
rfolco|rover | weshay|ruck, exactly, we should fail only if the file is in rpm_va plus "missing" | 14:25 |
rfolco|rover | patch coming | 14:26 |
weshay|ruck | k great | 14:26 |
sshnaidm | rlandy, maybe, but I don't know how it's possible | 14:31 |
sshnaidm | rlandy, routing in internal network should be better and faster actually.. | 14:32 |
*** TrevorV has quit IRC | 14:36 | |
sshnaidm | rlandy, do you see slowness in specific steps? | 14:37 |
rlandy | sshnaidm: either way, wrt OVB, we have an IPMI connection now - so that is good, but the response to power on/off is very slow | 14:37 |
sshnaidm | rlandy, logs? | 14:37 |
*** TrevorV has joined #oooq | 14:37 | |
rlandy | sshnaidm: see the last run on https://code.engineering.redhat.com/gerrit/#/c/200436 | 14:39 |
sshnaidm | rlandy, and which jobs time out? | 14:40 |
rlandy | sshnaidm: the MTU on the private subnet is 1450 and 1500 on external | 14:40 |
rlandy | sshnaidm: the ipa multinode job, for example | 14:40 |
rlandy | getting logs | 14:40 |
rlandy | should those MTUs match? | 14:41 |
sshnaidm | "No nodes are manageable at this time." | 14:42 |
sshnaidm | rlandy, the lower the MTU the better.. | 14:42 |
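One quick way to confirm the effective path MTU when the tenant network is set to 1450 is a don't-fragment ping sized to that MTU; the target address below is a placeholder.

    # 1422 bytes of ICMP payload + 8 (ICMP) + 20 (IP) = 1450 on the wire
    ping -M do -s 1422 -c 3 <undercloud_ip>
    # If this fails while smaller payloads pass, the path MTU is below 1450.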
rlandy | sshnaidm: so if you watch introspection, two things happen | 14:44 |
rlandy | the nodes stay in validating for some time | 14:44 |
rlandy | and then get to enroll | 14:44 |
rlandy | or manageable | 14:44 |
rlandy | then when the nodes are in manageable state, | 14:44 |
rlandy | and introspection does start, the power on command is issued | 14:45 |
rlandy | and registered | 14:45 |
rlandy | but the nodes don't power on for a very long time | 14:45 |
sshnaidm | rlandy, I don't see introspection start in this job; it fails before that with the "no manageable nodes" error | 14:46 |
rlandy | I reran on the node | 14:46 |
sshnaidm | rlandy, maybe when introspection starts in job, the nodes are still in enroll | 14:46 |
rlandy | sshnaidm: yes | 14:46 |
rlandy | the nodes take too long to get to every state that is expected | 14:47 |
rlandy | tbh, idk if this cloud can support OVB | 14:47 |
rlandy | the nodes are in fact still in verifying | 14:47 |
rlandy | and only get to enroll afterwards | 14:48 |
sshnaidm | rlandy, so maybe it's worth adding a polling check, with a timeout, to wait for them to reach the manageable state | 14:49 |
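The polling idea could look roughly like the following; the timeout values are arbitrary and a real job would likely do this as an Ansible task rather than a shell loop.

    # Wait until every ironic node reports "manageable", or give up after 15 min.
    timeout=900; interval=30; elapsed=0
    while true; do
        pending=$(openstack baremetal node list -f value -c "Provisioning State" \
                  | grep -vc '^manageable$')
        [ "$pending" -eq 0 ] && break
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "Timed out waiting for nodes to become manageable" >&2
            exit 1
        fi
        sleep "$interval"; elapsed=$((elapsed + interval))
    done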
rlandy | sshnaidm: maybe something else hit this cloud ... see the container build job for example: | 14:49 |
rlandy | https://sf.hosted.upshift.rdu2.redhat.com/zuul/t/tripleo-ci-internal/builds?job_name=periodic-tripleo-containers-rhel-8-rhos-17-build-push | 14:49 |
rlandy | see the slow down after 06/02 | 14:49 |
rlandy | so 06/03 onwards | 14:50 |
rlandy | the jobs take twice as long | 14:50 |
rlandy | 2020-06-03 things go downhill | 14:51 |
sshnaidm | yeah, no idea what's happening.. | 14:53 |
rfolco|rover | weshay|ruck, https://review.opendev.org/734112 Fix image_sanity check | 14:57 |
rfolco|rover | weshay|ruck, https://review.rdoproject.org/r/27986 Test image_sanity fix | 14:57 |
rfolco|rover | pojadhav|ruck, ^ | 14:58 |
rlandy | sshnaidm: what's the equivalent of provider_net_shared_3 on rdocloud? | 15:01 |
rlandy | 38.145.32.0/22 | 15:02 |
sshnaidm | rlandy, yep, 38.145.32.0/22 | 15:03 |
weshay|ruck | rfolco|rover, I like the change, but look at the output here https://logserver.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-centos-8-buildimage-overcloud-full-master/4bd3223/rpm_va.txt | 15:03 |
weshay|ruck | /etc/pki is not listed | 15:04 |
weshay|ruck | so either grep should return that it's not found | 15:04 |
weshay|ruck | https://review.opendev.org/#/c/734112/1/roles/oooci-build-images/tasks/image_sanity.yaml | 15:04 |
weshay|ruck | rfolco|rover, run a grep yourself and check the return code on something you find and something you don't | 15:05 |
weshay|ruck | rfolco|rover, and look at ur patch again | 15:05 |
*** skramaja has quit IRC | 15:06 | |
weshay|ruck | zbr, FYI.. https://review.opendev.org/#/c/734083/ | 15:07 |
weshay|ruck | zbr, extra points if we can move to centos-stream | 15:07 |
zbr | i use stream locally, switching to it should be very easy | 15:08 |
zbr | if something works on non-stream, likely it will still work with stream. | 15:08 |
weshay|ruck | zbr, ensuring we have the ecosystem upstream is ++ | 15:08 |
rfolco|rover | weshay|ruck, rpm -Va won't add all the files to the output.... /etc/pki/tls/private exists and is not added to the rpm_va.txt | 15:08 |
weshay|ruck | rfolco|rover, the issue is w/ grep | 15:08 |
weshay|ruck | and the return code | 15:08 |
weshay|ruck | afaict | 15:08 |
ykarel | rlandy, weshay|ruck can u review https://review.opendev.org/#/c/733471/ | 15:09 |
sshnaidm | rlandy, we can revert the tenant config change and see if it helps the other jobs, but we'll lose OVB for now | 15:09 |
rlandy | sshnaidm: discussing on #rhos-ops | 15:10 |
weshay|ruck | rfolco|rover, http://pastebin.test.redhat.com/873012 | 15:10 |
weshay|ruck | ykarel, looking | 15:10 |
rfolco|rover | weshay|ruck, yes, that's why I reverse-grep in my patch | 15:11 |
rlandy | ykarel: oh gosh, that constraints change keeps going | 15:11 |
rfolco|rover | weshay|ruck, I look for missing first | 15:11 |
ykarel | rlandy, yes :( | 15:11 |
rfolco|rover | weshay|ruck, then I reverse grep -v "/etc/pki..." | 15:11 |
weshay|ruck | rfolco|rover, ur right.. missed the -v | 15:11 |
weshay|ruck | :) | 15:11 |
rfolco|rover | weshay|ruck, I tested w/ a file that is not in the rpm_va output, a file that is in the output but not missing... and a file that is marked as missing | 15:12 |
rfolco|rover | weshay|ruck, the famous "works in my machine" | 15:13 |
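The corrected logic described here - fail only when a path both appears in the rpm -Va output and is flagged as missing - boils down to something like the shell sketch below; it is an illustration, not the actual Ansible task in image_sanity.yaml, and the file names are assumptions.

    rpm_va_log=rpm_va.txt            # assumed dump of `rpm -Va` from the built image
    path=/etc/pki/tls/private
    if grep '^missing' "$rpm_va_log" | grep -q "$path"; then
        echo "FAIL: $path reported missing in $rpm_va_log" >&2
        exit 1
    fi
    # A path not mentioned in the output at all is healthy and must not fail the check.
    echo "OK: $path not reported missing"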
weshay|ruck | zbr, can you work w/ carlos on enabling centos-stream for tripleo upstream? not RIGHT NOW, but generally speaking? https://review.opendev.org/#/q/topic:centos-stream+(status:open+OR+status:merged) | 15:16 |
weshay|ruck | we've talked about this is in the past | 15:16 |
zbr | i already added a watch on that change, I will help him. | 15:16 |
cgoncalves | cool, thanks | 15:17 |
weshay|ruck | zbr++ | 15:17 |
weshay|ruck | cgoncalves++ | 15:17 |
*** jmasud has joined #oooq | 15:19 | |
rlandy | ysandeep: fyi ... see #rhos-ops | 15:37 |
ysandeep | rlandy, checking | 15:37 |
rlandy | having discussion about slow downstream cloud | 15:37 |
weshay|ruck | rlandy, sshnaidm working this from another angle for ya.. /me is MOD, looking at the customer escalation board atm | 15:41 |
weshay|ruck | that issue is not tracked there, and imho should be | 15:41 |
cgoncalves | zbr, weshay|ruck: does tripleo build images from an existing centos image ("centos" DIB element) or "centos-minimal"? | 15:41 |
*** jmasud has quit IRC | 15:44 | |
weshay|ruck | sec.. dealing w/ psi issues | 15:45 |
weshay|ruck | rlandy, sshnaidm open a lp to track, and mark promotion blocker | 15:46 |
rlandy | weshay|ruck: ack | 15:47 |
*** ykarel is now known as ykarel|away | 15:51 | |
*** ysandeep is now known as ysandeep|afk | 15:54 | |
rlandy | rfolco|rover: can you post that LP you had on the failing container build when they all pass? | 15:59 |
rlandy | weshay|ruck: ^^ | 15:59 |
weshay|ruck | I'm not aware of that | 16:00 |
weshay|ruck | I'll check the hackmd | 16:00 |
rlandy | weshay|ruck: rfolco|rover: got it ... https://bugs.launchpad.net/tripleo/+bug/1879365 | 16:04 |
openstack | Launchpad bug 1879365 in tripleo "[container build] SystemError: The following jobs were incomplete: state=finished" [High,Incomplete] | 16:04 |
*** udesale_ has quit IRC | 16:08 | |
*** dtantsur is now known as dtantsur|afk | 16:10 | |
*** ysandeep|afk is now known as ysandeep | 16:15 | |
*** jmasud has joined #oooq | 16:17 | |
rlandy | weshay|ruck: sshnaidm: from #rhos-ops, it looks like there are people working on the downstream cloud slowness and korde "I've bumped up the urgency in the case again." | 16:22 |
rlandy | do we want another blocker LP? | 16:22 |
sshnaidm | idk, but I'd like to know about such cases asap and not waste hours trying to find what's wrong | 16:23 |
sshnaidm | the notification part doesn't seem to work at all | 16:23 |
rlandy | ok - creating one anyways | 16:24 |
weshay|ruck | rlandy, what's the bug #? | 16:24 |
weshay|ruck | notification? | 16:25 |
weshay|ruck | sshnaidm, which part are you speaking to? | 16:25 |
sshnaidm | weshay|ruck, if cloud is broken we should know about that asap | 16:25 |
weshay|ruck | rlandy, bz is probably more appropriate in this case | 16:25 |
rlandy | https://one.redhat.com/tools-and-services/details/psi-openstack-cloud-d | 16:25 |
rlandy | weshay|ruck: k - adding | 16:25 |
weshay|ruck | sshnaidm, yes.. agree | 16:25 |
rlandy | weshay|ruck: we're locked out of JIRA atm | 16:25 |
rlandy | auth issue | 16:25 |
rlandy | there is some tracking there | 16:26 |
Tengu | sso's dead apparently. | 16:26 |
weshay|ruck | probably runs on PSI | 16:26 |
Tengu | uhuhu | 16:27 |
ysandeep | rlandy, fyi.. hey, we tried but were unable to reproduce the issue manually - we tried running that test playbook against localhost and the undercloud, and tried running the playbook from outside just like Zuul does, but didn't hit any issue :( | 16:27 |
ysandeep | rlandy, working on the theory that the issue is somewhere else and is being falsely reported. I am rerunning that job with localhost replaced by undercloud as a test. | 16:27 |
weshay|ruck | rlandy, sshnaidm Alan owns the relationship w/ the cloud provider.. we just need to cix.. | 16:27 |
rlandy | ysandeep: ^^ there are a lot of issues with the downstream cloud atm | 16:28 |
weshay|ruck | cix can be cross-referenced w/ jira or whatever other bs we need | 16:28 |
rlandy | weshay|ruck: yeah - creating BZ - will mention the JIRA ticket | 16:28 |
ysandeep | rlandy, ack, not sure if it's related, but will trigger the jobs later then | 16:29 |
rlandy | ysandeep: may not be worth your debug time atm | 16:29 |
weshay|ruck | rlandy, sshnaidm https://access.redhat.com/support/cases/#/case/02671591 | 16:29 |
weshay|ruck | fyi | 16:29 |
ysandeep | rlandy, o/ thanks.. i will go to sleep then.. See you tomorrow o/ Have a great day ahead :) | 16:30 |
rlandy | ysandeep: yeah - sorry about all this | 16:30 |
sshnaidm | weshay|ruck, "There was an error loading case." | 16:30 |
zbr | rlandy or weshay|ruck: quick review on https://review.rdoproject.org/r/#/c/27987/ | 16:30 |
rlandy | ysandeep: will leave you email if there is any progress | 16:30 |
ysandeep | rlandy, thanks! that will help | 16:30 |
*** ysandeep is now known as ysandeep|away | 16:30 | |
weshay|ruck | rlandy, imho.. bz that lists that ticket is enough.. then email rhos-dev w/ the cix flags in the subject | 16:30 |
weshay|ruck | sshnaidm, you may need to be logged in | 16:31 |
weshay|ruck | or another system is down | 16:31 |
weshay|ruck | lolz | 16:31 |
sshnaidm | weshay|ruck, I am.. | 16:31 |
rlandy | weshay|ruck: yep - can't log in to BZ atm | 16:32 |
weshay|ruck | lolz | 16:32 |
rlandy | another 2020 disaster | 16:32 |
rlandy | zbr: we have no more molecule tests on centos7? if so, great | 16:33 |
zbr | rlandy: incorrect: we still have them but we are now using py36 on both c7/8. | 16:33 |
sshnaidm | weshay|ruck, upstream CI times out as well, it takes 1 hour to prepare containers: https://187cce064a1459d372de-21abb6d2b9f578210dfe07e5ee1d658a.ssl.cf1.rackcdn.com/730083/2/check/tripleo-ci-centos-8-scenario001-standalone/ac77443/logs/undercloud/var/log/tripleo-container-image-prepare.log | 16:33 |
rlandy | zbr: then +2 | 16:34 |
zbr | basically this helps us to migrate our codebase to py36 w/o forcing the system bump at the same time | 16:34 |
zbr | smaller steps = safer | 16:34 |
rlandy | ack | 16:34 |
* sshnaidm is out to prepare bunker and supplies | 16:34 | |
*** sshnaidm is now known as sshnaidm|afk | 16:34 | |
rlandy | Requests typically are < 2ms but are now taking > 10 secs. | 16:39 |
rlandy | yep - that looks like our issue | 16:39 |
rlandy | sloooooooooowwww cloud | 16:39 |
*** amoralej is now known as amoralej|lunch | 16:52 | |
*** amoralej|lunch is now known as amoralej|off | 16:52 | |
*** jmasud has quit IRC | 16:57 | |
*** derekh has quit IRC | 17:01 | |
*** jmasud has joined #oooq | 17:06 | |
rfolco|rover | zbr, on a quick look, do you understand why this failed ? https://08c3aae88ab0ce3ed41d-baf4f807d40559415da582760ebf9456.ssl.cf1.rackcdn.com/733659/7/check/tripleo-buildimage-overcloud-full-centos-7-train/c35c051/build.log | 17:15 |
rfolco|rover | zbr, if command -v python3 executed, why is python_path empty, and why was that the last command to run? https://opendev.org/openstack/diskimage-builder/src/branch/master/diskimage_builder/elements/dib-python/pre-install.d/01-dib-python#L18 | 17:16 |
zbr | not sure, but i remember having a similar problem in other places | 17:18 |
rfolco|rover | zbr, ok thanks... will compare to the scl run | 17:19 |
rfolco|rover | weshay|ruck, can you re-w+ this one https://review.opendev.org/#/c/732618/ | 17:19 |
rfolco|rover | weshay|ruck, not sure what happened | 17:20 |
weshay|ruck | rfolco|rover, depends-on https://review.opendev.org/#/c/730763 | 17:20 |
zbr | rfolco|rover: i wonder if command may return multiple lines in some cases; that could break the code in ugly ways | 17:20 |
*** jpena is now known as jpena|off | 17:20 | |
weshay|ruck | which needs https://review.opendev.org/#/c/733790/ | 17:20 |
rfolco|rover | weshay|ruck, ah ok gotcha | 17:21 |
zbr | i know that type does return multiple results and that you need to "| head -n1" | 17:21 |
weshay|ruck | rfolco|rover, this should fix the epel issue if we saw that in ussuri https://review.opendev.org/#/c/733790/3/container-images/tripleo_kolla_template_overrides.j2 | 17:21 |
rfolco|rover | weshay|ruck, yep | 17:21 |
rfolco|rover | zbr, command -v you mean? | 17:22 |
zbr | yep | 17:22 |
rfolco|rover | aahhh the first result might be empty | 17:23 |
rfolco|rover | or command -v python3 is really returning nothing | 17:27 |
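A defensive version of the interpreter detection being debated, guarding against both empty and multi-line output, might look like this; it is a sketch, not the actual DIB element (diskimage_builder/elements/dib-python/pre-install.d/01-dib-python), which may differ.

    python_path=$(command -v python3 2>/dev/null | head -n1)
    if [ -z "$python_path" ]; then
        python_path=$(command -v python2 2>/dev/null | head -n1)
    fi
    if [ -z "$python_path" ]; then
        echo "no python interpreter found in the image" >&2
        exit 1
    fi
    echo "using $python_path"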
weshay|ruck | rlandy, fyi.. this fails if you run the master release from a centos-7 virthost.. /me updates this patch | 17:45 |
weshay|ruck | https://review.opendev.org/#/c/733471/3 | 17:45 |
weshay|ruck | rlandy, can chat when ever | 17:49 |
rlandy | weshay|ruck: https://meet.google.com/qin-kmpv-nwf | 17:59 |
*** saneax_AFK has quit IRC | 18:03 | |
*** jmasud has quit IRC | 18:11 | |
*** jmasud has joined #oooq | 18:13 | |
*** rlandy is now known as rlandy|mtg | 18:20 | |
*** jmasud has quit IRC | 18:26 | |
weshay|ruck | rlandy|mtg, https://review.opendev.org/#/c/724193/ | 18:50 |
weshay|ruck | https://review.opendev.org/#/c/729824/ | 18:50 |
weshay|ruck | rlandy|mtg, tripleo-build-containers-ubi-8 SUCCESS in 45m 15s (non-voting) | 18:52 |
weshay|ruck | rlandy|mtg, https://review.opendev.org/#/c/724193/50 | 18:52 |
*** rlandy|mtg is now known as rlandy | 18:56 | |
weshay|ruck | rlandy, the patch to fix ussuri containers build is close to merging | 18:57 |
weshay|ruck | known issue | 18:57 |
rlandy | great | 18:57 |
weshay|ruck | rlandy, removed DNM, https://review.opendev.org/#/c/730321/ | 19:12 |
rlandy | thanks - voted | 19:12 |
weshay|ruck | rlandy, k.. and I got these in the right place.. thought I didn't but I did | 19:13 |
weshay|ruck | https://review.opendev.org/#/c/733392/ | 19:13 |
weshay|ruck | https://review.opendev.org/#/c/734100/1/zuul.d/standalone-jobs.yaml | 19:13 |
rlandy | weshay|ruck: CIX email sent for https://bugzilla.redhat.com/show_bug.cgi?id=1845266 | 19:20 |
openstack | bugzilla.redhat.com bug 1845266 in releng "Significant slowdown in running jobs in PSI upshift - internal zuul" [Unspecified,New] - Assigned to apevec | 19:20 |
rlandy | weshay|ruck: I'm going to try the old container build push job again (from testproject) now that kforde says the API response time issues may have been addressed | 19:22 |
rlandy | will see if it makes any difference | 19:22 |
*** jmasud has joined #oooq | 19:37 | |
rlandy | weshay|ruck: https://code.engineering.redhat.com/gerrit/202706 Update rhos-17 promotion criteria with new jobs added. | 19:49 |
weshay|ruck | rlandy, thanks | 19:59 |
rfolco|rover | weshay|ruck, I don't know what to do with fs020, failing on master, ussuri, train.. | 20:07 |
rfolco|rover | weshay|ruck, mostly the same issue: pacemaker | 20:07 |
weshay|ruck | which issue w/ pacemaker? | 20:08 |
weshay|ruck | rfolco|rover, is it on https://hackmd.io/YAqFJrKMThGghTW4P2tabA?both ? | 20:10 |
rfolco|rover | this is failing since ever | 20:11 |
rfolco|rover | https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-master&job_name=periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-ussuri&job_name=periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-train&result=FAILURE | 20:11 |
rfolco|rover | weshay|ruck, https://bugs.launchpad.net/tripleo/+bug/1867602 | 20:11 |
openstack | Launchpad bug 1867602 in tripleo "overcloud deploy failed due to Systemd start for pcsd failed" [Medium,Triaged] | 20:11 |
rfolco|rover | bug filed a few sprints ago | 20:11 |
rfolco|rover | weshay|ruck, its not 100% consistent, it also failed on tempest sometimes | 20:12 |
weshay|ruck | rfolco|rover, that's the images dude | 20:12 |
weshay|ruck | rfolco|rover, it's the missing files in the overcloud images | 20:13 |
rfolco|rover | hmmm | 20:13 |
rfolco|rover | mark as dup then? | 20:13 |
weshay|ruck | rfolco|rover, No such file or directory: '/var/log/pcsd/pcsd.log | 20:13 |
weshay|ruck | rfolco|rover, that goes away.. like the /etc/pki issue goes away when there is a valid working overcloud-full image | 20:14 |
rfolco|rover | ok | 20:15 |
weshay|ruck | rfolco|rover, see my last comment in that bug | 20:15 |
rfolco|rover | weshay|ruck, will mark dup of https://bugs.launchpad.net/tripleo/+bug/1879766 | 20:16 |
openstack | Launchpad bug 1879766 in tripleo "master ovb jobs failing on Destination directory /etc/pki/tls/private does not exist" [Critical,Triaged] - Assigned to chandan kumar (chkumar246) | 20:16 |
weshay|ruck | k | 20:16 |
weshay|ruck | rfolco|rover, the sooner we can push a working image, the sooner these problems go away | 20:18 |
weshay|ruck | rfolco|rover, https://review.rdoproject.org/r/#/c/27986/ | 20:18 |
weshay|ruck | focus there | 20:19 |
rfolco|rover | again... | 20:19 |
rfolco|rover | yep | 20:19 |
rfolco|rover | working on it | 20:19 |
rfolco|rover | well.. now the check IS RIGHT | 20:20 |
rfolco|rover | weshay|ruck, ^ image_sanity is doing what it's supposed to do | 20:21 |
rfolco|rover | the files are really missing and failing the job | 20:21 |
weshay|ruck | not all the time | 20:22 |
weshay|ruck | rfolco|rover, the sanity check was failing ALL the time.. you are fixing that bit | 20:23 |
weshay|ruck | rfolco|rover, pojadhav|ruck and chandankumar should pick it up from you.. and also figure out why it does in fact fail sometimes | 20:24 |
rfolco|rover | weshay|ruck, last time it failed even when the filename was not in the rpm_va output. | 20:25 |
weshay|ruck | rlandy, https://code.engineering.redhat.com/gerrit/#/c/202706/ is correct, merged | 20:30 |
rlandy | thanks | 20:31 |
rfolco|rover | weshay|ruck, but even with the check itself fixed, if a file is marked missing in the rpm_va output the job will fail... | 20:31 |
rfolco|rover | missing /var/lib/pcsd | 20:31 |
weshay|ruck | ya.. same shit | 20:31 |
rfolco|rover | so also need to understand why the image is missing files... | 20:32 |
*** jbadiapa has quit IRC | 20:32 | |
weshay|ruck | rfolco|rover, yes indeed we do.. this started after chandankumar's refactor of the tripleo-ci/ooo-buildimage and oooq/buildimages role | 20:32 |
rfolco|rover | weshay|ruck, I did not look at the code yet, but maybe we close out the qcow2 image while it's still copying files into it | 20:33 |
*** ccamacho has quit IRC | 20:39 | |
*** jtomasek has quit IRC | 21:18 | |
*** jmasud has quit IRC | 21:28 | |
*** jmasud has joined #oooq | 21:38 | |
*** jmasud has quit IRC | 21:55 | |
*** jfrancoa has quit IRC | 21:56 | |
rlandy | weshay|ruck: still around? | 22:02 |
weshay|ruck | rlandy, aye | 22:10 |
weshay|ruck | rlandy, check ur email | 22:10 |
rlandy | weshay|ruck: thanks for the graphical backup | 22:11 |
weshay|ruck | :) | 22:11 |
rlandy | weshay|ruck: what's your opinion on reverting the change that added the private network? | 22:11 |
weshay|ruck | rlandy, do you want me to look at other jobs? | 22:12 |
rlandy | weshay|ruck: I don't think so - we're in the same place. OVB just died - as did BM again in accessing the undercloud | 22:12 |
weshay|ruck | rlandy, let's schedule a 1/2 for you, sagi and myself to chat | 22:12 |
rlandy | hangs on introspection | 22:12 |
rlandy | weshay|ruck: k - tomorrow morning | 22:13 |
rlandy | at this point, I'd rather go back to the direct external node connection | 22:13 |
rlandy | and give OVB a shot another time | 22:13 |
weshay|ruck | k | 22:14 |
weshay|ruck | let's rope in sagi and chat about it | 22:14 |
rlandy | yep | 22:15 |
rlandy | I give up | 22:18 |
*** dmellado_ has joined #oooq | 23:10 | |
*** dmellado has quit IRC | 23:11 | |
*** dmellado_ is now known as dmellado | 23:11 | |
*** tosky has quit IRC | 23:13 | |
*** TrevorV has quit IRC | 23:19 |