*** rascasoft has quit IRC | 00:00 | |
*** dsneddon has joined #oooq | 00:02 | |
*** dsneddon has quit IRC | 00:06 | |
hubbot1 | FAILING CHECK JOBS on master: tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001, tripleo-ci-centos-7-standalone-upgrade, tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset053, tripleo-ci-centos-7-scenario008-multinode-oooq-container, tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039, tripleo-ci-centos-7-scenario012-standalone, tripleo-ci-centos-7-scenario010-multinode-oooq-container, tripleo-ci- (4 more messages) | 00:18 |
---|---|---|
*** rlandy has quit IRC | 00:20 | |
*** vinaykns has joined #oooq | 00:33 | |
*** dsneddon has joined #oooq | 00:35 | |
*** vinaykns has quit IRC | 00:39 | |
*** dsneddon has quit IRC | 00:49 | |
*** agopi has quit IRC | 01:04 | |
*** dsneddon has joined #oooq | 01:15 | |
*** dsneddon has quit IRC | 01:28 | |
*** rascasoft has joined #oooq | 01:36 | |
*** dsneddon has joined #oooq | 01:39 | |
*** rascasoft has quit IRC | 01:44 | |
*** dsneddon has quit IRC | 01:55 | |
*** dsneddon has joined #oooq | 02:00 | |
hubbot1 | FAILING CHECK JOBS on master: tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001, tripleo-ci-centos-7-standalone-upgrade, tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset053, tripleo-ci-centos-7-scenario008-multinode-oooq-container, tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039, tripleo-ci-centos-7-scenario012-standalone, tripleo-ci-centos-7-scenario010-multinode-oooq-container, tripleo-ci- (4 more messages) | 02:18 |
*** openstackstatus has quit IRC | 02:22 | |
*** openstack has joined #oooq | 02:23 | |
*** ChanServ sets mode: +o openstack | 02:23 | |
*** rascasoft has joined #oooq | 03:00 | |
*** dsneddon has quit IRC | 03:05 | |
*** agopi has joined #oooq | 03:06 | |
*** rascasoft has quit IRC | 03:10 | |
*** apetrich has quit IRC | 03:16 | |
*** saneax has joined #oooq | 03:18 | |
*** dsneddon has joined #oooq | 03:23 | |
*** dsneddon has quit IRC | 03:28 | |
*** dsneddon has joined #oooq | 03:32 | |
*** dsneddon has quit IRC | 03:37 | |
*** ykarel|away has joined #oooq | 03:39 | |
*** ykarel|away is now known as ykarel | 03:39 | |
*** udesale has joined #oooq | 03:49 | |
*** saneax has quit IRC | 03:51 | |
*** skramaja has joined #oooq | 03:52 | |
*** skramaja has quit IRC | 03:56 | |
*** dsneddon has joined #oooq | 03:56 | |
*** skramaja has joined #oooq | 03:57 | |
hubbot1 | FAILING CHECK JOBS on master: tripleo-ci-centos-7-standalone-upgrade, tripleo-ci-centos-7-scenario008-multinode-oooq-container, tripleo-ci-centos-7-scenario012-standalone, tripleo-ci-fedora-28-standalone, tripleo-ci-centos-7-scenario010-multinode-oooq-container @ https://review.openstack.org/604298, stable/pike: tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039 @ https://review.openstack.org/602248, (3 more messages) | 04:18 |
*** rascasoft has joined #oooq | 04:41 | |
*** udesale has quit IRC | 04:45 | |
*** udesale has joined #oooq | 04:46 | |
*** rascasoft has quit IRC | 04:49 | |
*** dsneddon has quit IRC | 05:05 | |
*** marios_ has joined #oooq | 05:09 | |
*** marios_ is now known as marios | 05:09 | |
ykarel | marios, looks like container-build push job is not working correctly | 05:16 |
ykarel | i commented https://bugs.launchpad.net/tripleo/+bug/1818994/comments/4 | 05:17 |
openstack | Launchpad bug 1818994 in tripleo "ovb jobs broken because pacemaker is unconfigured" [Critical,Triaged] - Assigned to Juan Antonio Osorio Robles (juan-osorio-robles) | 05:17 |
marios | o/ ykarel thanks checking in a minute | 05:21 |
*** dsneddon has joined #oooq | 05:35 | |
*** dsneddon has quit IRC | 05:39 | |
*** ykarel is now known as ykarel|afk | 05:40 | |
*** dsneddon has joined #oooq | 05:42 | |
marios | ykarel|afk: o/ looking at it now (we *should* be using tripleo-ci-testing repos in the build job, i mean we added it recently, but there could be something missing still) | 05:53 |
*** jtomasek has joined #oooq | 05:59 | |
*** rf0lc0 has joined #oooq | 06:00 | |
*** ykarel|afk is now known as ykarel | 06:01 | |
ykarel | marios, afaik tripleo-ci-testing repo is used | 06:01 |
ykarel | but still tripleo-ci-testing tagged containers and version-hash tagged containers are different | 06:01 |
ykarel | tripleo-ci-testing tagged wrong, version hash tagged correct | 06:02 |
ykarel | you can try downloading both and confirm | 06:02 |
marios | ykarel: oh ok i thought your comment was about the repos. so what is wrong about them then I mean what are you comparing in what you're getting from skopo | 06:04 |
marios | skopo | 06:04 |
marios | ha | 06:04 |
marios | skopeo | 06:04 |
ykarel | marios, my comment was tripleo-common package is not correct(not from tripleo-ci-testing repo) in container | 06:04 |
*** jbadiapa has quit IRC | 06:05 | |
ykarel | marios, try:- trunk.registry.rdoproject.org/tripleomaster/centos-binary-mistral-engine:1ac63709436a0230f547040e4a514470a3c19d78_9c2c4c8f and trunk.registry.rdoproject.org/tripleomaster/centos-binary-mistral-engine:tripleo-ci-testing | 06:06 |
ykarel | and see package list, you will get the difference | 06:06 |
marios | ykarel: ack | 06:06 |
*** rfolco|ruck|off has quit IRC | 06:07 | |
marios | ykarel: well for one i see "Created": "2019-03-11T15:17:57.921333611Z" vs "Created": "2019-03-07T22:55:29.594370349Z", i mean thats definitely an issue they should at least be from same job run | 06:09 |
*** irclogbot_0 has quit IRC | 06:09 | |
*** irclogbot_0 has joined #oooq | 06:10 | |
*** panda|rover|off has quit IRC | 06:10 | |
ykarel | 07 march is too old | 06:10 |
ykarel | job is running regularly, so seems job itself has issue | 06:10 |
marios | ykarel: yeah the job first tags using the hash so its the retag which fails i mean its like it didn't tag on ci-testing | 06:11 |
marios | it didn't re-tag | 06:11 |
ykarel | possibly | 06:12 |
*** panda has joined #oooq | 06:12 | |
hubbot1 | FAILING CHECK JOBS on master: tripleo-ci-centos-7-standalone-upgrade, tripleo-ci-centos-7-scenario008-multinode-oooq-container, tripleo-ci-centos-7-scenario012-standalone, tripleo-ci-fedora-28-standalone, tripleo-ci-centos-7-scenario010-multinode-oooq-container @ https://review.openstack.org/604298, stable/pike: tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039 @ https://review.openstack.org/602248, (3 more messages) | 06:18 |
*** rascasoft has joined #oooq | 06:19 | |
*** rascasoft has quit IRC | 06:32 | |
*** saneax has joined #oooq | 06:49 | |
marios | ykarel: panda rf0lc0 fyi https://bugs.launchpad.net/tripleo/+bug/1819583 | 07:00 |
openstack | Launchpad bug 1819583 in tripleo "periodic-... containers-build-push job skips retagging with tripleo-ci-testing" [Critical,In progress] - Assigned to Marios Andreou (marios-b) | 07:00 |
marios | panda: am looking at it just fyi | 07:00 |
marios | panda: ykarel damn it its a typo yaml vs yml | 07:03 |
* marios facepalm | 07:03 | |
marios | fixing | 07:03 |
ykarel | omg :) | 07:03 |
marios | https://git.openstack.org/cgit/openstack-infra/tripleo-ci/commit/?id=0f61e33f01886e3fbf36e7af4110e11a9e4f80bb&context=3&ignorews=0&dt=0 | 07:03 |
marios | ykarel: :/ | 07:04 |
marios | o_O | 07:04 |
marios | ykarel: yeah see in the diff there we add tag.yaml but include tag.yml | 07:04 |
marios | don't know why it didn't fail though on the include | 07:04 |
ykarel | marios, may be built_images returned blank list | 07:05 |
marios | ykarel: hmm :/ that is also not good | 07:05 |
marios | ykarel: if that is the case though it says 'changed' for that one | 07:05 |
marios | (i mean in console and also in ara ) | 07:05 |
marios | whereas the tag is skipped | 07:06 |
ykarel | changed can be for blank [] as well | 07:06 |
marios | ykarel: k well we'll find out | 07:06 |
ykarel | ack | 07:06 |
marios | i think next periodic run in 2 hours maybe won't make that one though | 07:06 |
ykarel | hmm | 07:06 |
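A rough sketch of a static check that could catch this kind of `tag.yaml` vs `tag.yml` mismatch before a job runs. It is an assumption-laden illustration: it scans lines with a regex rather than parsing YAML, ignores templated or quoted paths, and the file names in the demo are only modelled on the ones mentioned above.

```python
import re
import tempfile
from pathlib import Path

# Matches simple one-line includes such as "- include: tag.yml".
INCLUDE_RE = re.compile(
    r"^\s*-?\s*(?:include|include_tasks|import_tasks):\s*(\S+)\s*$"
)


def missing_includes(playbook: Path) -> list[str]:
    """Return include targets that do not exist next to the playbook."""
    missing = []
    for line in playbook.read_text().splitlines():
        match = INCLUDE_RE.match(line)
        if match and not (playbook.parent / match.group(1)).exists():
            missing.append(match.group(1))
    return missing


# Demo: the task file is added as tag.yaml but included as tag.yml.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "tag.yaml").write_text("- debug: msg=retag\n")
    (root / "run.yaml").write_text("- include: tag.yml\n")
    result = missing_includes(root / "run.yaml")
    print(result)
```

A check like this runs regardless of whether the include is ever reached at runtime, which matters when the include sits in a loop over a list that may be empty.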
*** apetrich has joined #oooq | 07:13 | |
*** dsneddon has quit IRC | 07:22 | |
*** dsneddon has joined #oooq | 07:49 | |
*** ykarel is now known as ykarel|lunch | 07:52 | |
*** kopecmartin|off is now known as kopecmartin | 07:53 | |
*** dsneddon has quit IRC | 07:54 | |
*** jbadiapa has joined #oooq | 07:54 | |
*** jfrancoa has joined #oooq | 08:07 | |
*** holser_ has joined #oooq | 08:10 | |
arxcruz | sshnaidm: https://review.openstack.org/641641 it was creating the right subunit and gunzip but the ping test remains there, i just copy now the right subunit file without gunzip it :) | 08:11 |
hubbot1 | FAILING CHECK JOBS on master: tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001, tripleo-ci-centos-7-standalone-upgrade, tripleo-ci-centos-7-scenario008-multinode-oooq-container, tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039, tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset053, tripleo-ci-centos-7-scenario012-standalone, tripleo-ci-fedora-28-standalone, tripleo-ci-centos-7-scenario010-multinode-oooq- (3 more messages) | 08:18 |
*** ykarel|lunch is now known as ykarel | 08:20 | |
*** ccamacho has joined #oooq | 08:21 | |
*** dsneddon has joined #oooq | 08:29 | |
*** chem has joined #oooq | 08:29 | |
*** amoralej|off is now known as amoralej | 08:36 | |
*** jschlueter has quit IRC | 08:39 | |
*** jschlueter has joined #oooq | 08:40 | |
*** jpena|off is now known as jpena | 08:44 | |
*** bogdando has joined #oooq | 08:46 | |
*** dtantsur|afk is now known as dtantsur | 08:52 | |
arxcruz | sshnaidm: panda https://review.rdoproject.org/r/#/c/18796/ and https://review.rdoproject.org/r/#/c/18795/ please? :D | 08:56 |
sshnaidm | arxcruz, did you run these jobs somewhere for test/ | 09:01 |
sshnaidm | ? | 09:01 |
arxcruz | sshnaidm: no, how can I do it ? can i add a job on tqe with a depends on rdo? | 09:02 |
sshnaidm | arxcruz, just make a dummy patch to rdo-jobs and set for example rocky job in "project" there | 09:02 |
arxcruz | ok | 09:03 |
arxcruz | brb | 09:03 |
sshnaidm | arxcruz, https://review.rdoproject.org/r/#/c/19328/ | 09:07 |
arxcruz | sshnaidm: ack | 09:23 |
*** rf0lc0 is now known as rfolco|ruck | 09:34 | |
*** derekh has joined #oooq | 09:34 | |
*** dsneddon has quit IRC | 09:35 | |
*** tosky has joined #oooq | 09:44 | |
bogdando | o/ devops gurus | 09:48 |
bogdando | https://bugs.launchpad.net/tripleo/+bug/1818994/comments/7 weshay WDYT? | 09:49 |
openstack | Launchpad bug 1818994 in tripleo "ovb jobs broken because pacemaker is unconfigured" [Critical,Triaged] - Assigned to Juan Antonio Osorio Robles (juan-osorio-robles) | 09:49 |
bogdando | is it time for that yet? | 09:49 |
sshnaidm | marios, panda for fedora ovb: https://review.rdoproject.org/r/#/c/19327/ | 09:49 |
bogdando | I think the update-package run takes something close to the full rebuild time | 09:50 |
bogdando | got numbers? | 09:50 |
sshnaidm | bogdando, we can look at build containers job to compare | 09:51 |
bogdando | yeah | 09:51 |
bogdando | I wonder where could we host that one-time registry to consume for neighbour jobs executed in the pipeline | 09:53 |
marios | ack sshnaidm but not right now middle sthing | 09:53 |
bogdando | and order it with dependency of zuul | 09:53 |
bogdando | something to place onto discussions list (again)... | 09:53 |
bogdando | weshay: ^^ | 09:53 |
bogdando | ykarel: ^^ | 09:54 |
sshnaidm | bogdando, not sure I understand - what does mean "neighbour jobs" registry? | 09:54 |
bogdando | so where does that container-build-push live? | 09:54 |
bogdando | want to see its numbers | 09:54 |
bogdando | sshnaidm: those in the zuul pipeline | 09:55 |
bogdando | the active one | 09:55 |
bogdando | there is a set of standalone/multinode et al jobs there | 09:55 |
bogdando | and if we order those on the ad-hoc build containers job instead... | 09:55 |
bogdando | like we did for tox ordering | 09:56 |
sshnaidm | bogdando, https://github.com/openstack-infra/tripleo-ci/blob/e15753d072c051a89890fa29df43f5a58c21a2e2/zuul.d/build-containers.yaml#L5 | 09:56 |
bogdando | sshnaidm: https://review.openstack.org/#/q/topic:ci_pipelines+(status:open+OR+status:merged) | 09:56 |
bogdando | those neighbors | 09:56 |
bogdando | see for dependency: | 09:56 |
bogdando | so it may be like that: | 09:57 |
bogdando | dependencies: &deps_build_containers | 09:57 |
bogdando | and adding it for all | 09:57 |
bogdando | centrally, in tripleo-ci | 09:57 |
sshnaidm | bogdando, would be easier just to rebuild all containers every N hours and just download them in jobs | 09:58 |
bogdando | sshnaidm: no | 09:59 |
bogdando | every N hours brings us back to the source issue | 09:59 |
bogdando | see mistral container packages versions mismatching | 09:59 |
bogdando | it should be just adhoc versions consumed from zuul deps | 09:59 |
bogdando | I mean those depends-on in the patch under test | 10:00 |
bogdando | or whatever it assembles dlrn repos for buils from | 10:00 |
bogdando | builds* | 10:00 |
sshnaidm | bogdando, patch updates don't take time usually | 10:00 |
bogdando | so just like we build local dlrn repos for jobs, same to containers registry | 10:00 |
ykarel | bogdando, looks like mixing issues https://bugs.launchpad.net/tripleo/+bug/1818994 | 10:01 |
openstack | Launchpad bug 1818994 in tripleo "ovb jobs broken because pacemaker is unconfigured" [Critical,Triaged] - Assigned to Juan Antonio Osorio Robles (juan-osorio-robles) | 10:01 |
bogdando | sshnaidm: yeah, but that's for consistency, not time | 10:01 |
bogdando | we need consistent view into versions used for builds | 10:01 |
ykarel | ^^ issue is in promotion jobs where we don't update containers, it's issue in container-build-push job | 10:01 |
ykarel | which is new job and not finish yet | 10:01 |
*** dsneddon has joined #oooq | 10:01 | |
bogdando | ykarel: I was thinking of just never having possible versions mismatches | 10:01 |
bogdando | if that's possible to do for ci jobs | 10:02 |
sshnaidm | bogdando, not sure I get the problem, how could version be mismatched? | 10:02 |
bogdando | please read for https://bugs.launchpad.net/tripleo/+bug/1818994/comments | 10:02 |
openstack | Launchpad bug 1818994 in tripleo "ovb jobs broken because pacemaker is unconfigured" [Critical,Triaged] - Assigned to Juan Antonio Osorio Robles (juan-osorio-robles) | 10:02 |
ykarel | bogdando, yes that's different issue, where container updates are done in all job and takes time when promotion is delayed | 10:03 |
ykarel | but can't get how version mismatch issue is related | 10:03 |
bogdando | I think the idea to consume artifacts from promotions on periodic basis is not applicable for the "front side" CI jobs | 10:03 |
bogdando | it should be left for periodic jobs only | 10:03 |
bogdando | and for regular jobs, let's never consume periodic promotions, just build it all adhoc | 10:04 |
bogdando | ok, nevermind | 10:04 |
bogdando | that's just me then | 10:04 |
bogdando | sorry) | 10:04 |
bogdando | perhaps I don't have the whole picture right | 10:04 |
bogdando | but still not bought on using periodic promotions for regular jobs, I think that's wrong and always creates the mismatching issues | 10:06 |
sshnaidm | bogdando, the problem in bug is about tagging containers.. it's not really related | 10:06 |
*** dsneddon has quit IRC | 10:06 | |
sshnaidm | bogdando, currently we use newest tripleo packages in containers, they are not waiting for promotion | 10:07 |
sshnaidm | bogdando, promotions promote non-tripleo packages like nova, etc | 10:07 |
bogdando | yes, right. I think I just forgot about tripleo-current etc tags... | 10:08 |
bogdando | so do you think building containers for jobs locally wouldn't improve anything? | 10:09 |
bogdando | thinking of consistency for used versions, not time to do | 10:09 |
bogdando | IMO whatever that new tripleo-build-containers-jobs job does, please consider changing it to be done as a step 2 in each pipeline, locally, if tox passes for a change | 10:12 |
bogdando | so we could have tox (PASSED) -> tripleo-build-containers-jobs -> PASSED -> standalone/multinode* -> RUNNING | 10:12 |
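The ordering proposed above can be modelled as a small dependency graph. This is only a sketch of the idea, not Zuul configuration: the job names are taken loosely from the discussion, and the staging logic uses Python's standard `graphlib` rather than anything Zuul itself does.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs that must pass before it starts.
dependencies = {
    "tox": set(),
    "tripleo-build-containers-jobs": {"tox"},
    "standalone": {"tripleo-build-containers-jobs"},
    "multinode": {"tripleo-build-containers-jobs"},
}


def run_order(deps):
    """Group jobs into stages; jobs within one stage can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = run_order(dependencies)
for number, jobs in enumerate(stages, start=1):
    print(number, jobs)
```

The deployment jobs only become ready once the container build stage has finished, so they would all consume the same freshly built images.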
zbr | does any of you have a r8 machine running all the time? i need to check things from time to time. | 10:15 |
hubbot1 | FAILING CHECK JOBS on master: tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001, tripleo-ci-centos-7-standalone-upgrade, tripleo-ci-centos-7-scenario008-multinode-oooq-container, tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039, tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset053, tripleo-ci-centos-7-scenario012-standalone, tripleo-ci-fedora-28-standalone, tripleo-ci-centos-7-scenario010-multinode-oooq- (3 more messages) | 10:18 |
*** panda is now known as panda|rover | 10:22 | |
arxcruz | jesus christ, every time I need to fix something in validate-tempest, that I see so many ifs and elses to all workarounds, I want to cry | 10:23 |
arxcruz | it's terrible to maintain it, and i did a huge work to rewrite most of it in ansible | 10:23 |
panda|rover | arxcruz: solution ? | 10:24 |
arxcruz | panda|rover: get a bomb, and explode everything | 10:24 |
arxcruz | time machine also works | 10:24 |
panda|rover | arxcruz: ok, I'll contact the black market | 10:25 |
*** skramaja has quit IRC | 10:25 | |
panda|rover | arxcruz: I think I have something based on both your ideas: a time bomb | 10:26 |
sshnaidm | arxcruz, split it to a few | 10:29 |
sshnaidm | arxcruz, like tempest-configure, tempest-run-container, tempest-run-package, etc | 10:29 |
rfolco|ruck | arxcruz, panda|rover I see 2 tempest failures in gate, aware ? | 10:33 |
rfolco|ruck | http://logs.openstack.org/06/640706/3/gate/tripleo-ci-centos-7-standalone/27f4007/logs/undercloud/home/zuul/tempest/tempest.html.gz | 10:33 |
rfolco|ruck | http://logs.openstack.org/43/641743/2/gate/tripleo-ci-centos-7-standalone/e1d6ec2/logs/undercloud/home/zuul/tempest/tempest.html.gz | 10:33 |
rfolco|ruck | these 2 ^ | 10:33 |
*** dsneddon has joined #oooq | 10:34 | |
arxcruz | rfolco|ruck: nope | 10:34 |
arxcruz | rfolco|ruck: randon i guess | 10:34 |
arxcruz | random | 10:34 |
rfolco|ruck | random in gate is bad | 10:35 |
panda|rover | random is generally bad | 10:35 |
arxcruz | sshnaidm: yeah, that's what i did, but check the configure-tempest.sh and run-tempest.sh | 10:35 |
arxcruz | so you check if is container, but also check if is standalone, and if is container, you check for a rc file, however, if is also standalone, it doesn't have a rc file | 10:36 |
arxcruz | so if you're running a standalone, with containerized tempest, it will fail | 10:36 |
panda|rover | marios: zbr I was tasked to push a set of fedora containers to docker.io, can you point me to a hash that is passing deployment ? | 10:36 |
arxcruz | because it doesn't have a rc file | 10:36 |
arxcruz | and we weren't testing it | 10:36 |
*** dsneddon has quit IRC | 10:39 | |
*** ykarel is now known as ykarel|lunch | 10:40 | |
sshnaidm | arxcruz, I meant completely different roles | 10:40 |
sshnaidm | arxcruz, so you don't need to check if it's containerized or not, but just run tempest-container role, configured in featureset | 10:41 |
arxcruz | sshnaidm: problem is, we are moving to os_tempest, so it won't be worth the effort | 10:41 |
sshnaidm | arxcruz, then it's solved :D | 10:42 |
arxcruz | sshnaidm: but we are not there yet, now i'm working on the problem in promotion, that i need to fix in validate-tempest, but in order to fix it, i have to fix several other things | 10:42 |
arxcruz | like inception | 10:42 |
arxcruz | panda|rover: is that okay, if we switch to run tempest as packages ? | 10:43 |
arxcruz | that would 'fix' the problem | 10:43 |
panda|rover | zbr: marios pushing e3f9cc7df7c87a2fce4e9ddfa05f8365ca63703d_4bfa3685 to docker.io | 10:50 |
panda|rover | zbr: marios and manually promoting to current-tripleo | 10:51 |
zbr | panda|rover: cool! | 10:51 |
panda|rover | arxcruz: fix which problem ? the failure in gates ? | 10:51 |
arxcruz | panda|rover: https://bugs.launchpad.net/tripleo/+bug/1819440 | 10:52 |
openstack | Launchpad bug 1819440 in tripleo "phase 1 failing on No such file or directory: '/home/stack/tempest'" [Critical,In progress] - Assigned to Arx Cruz (arxcruz) | 10:52 |
panda|rover | arxcruz: ah that one | 10:52 |
marios | panda|rover: ack i am trying to fix https://launchpad.net/bugs/1819583 still local debug will post sthing in bit there (will update https://review.openstack.org/#/c/642662/ or the parent for test) | 10:53 |
openstack | Launchpad bug 1819583 in tripleo "periodic-... containers-build-push job skips retagging with tripleo-ci-testing" [Critical,In progress] - Assigned to Marios Andreou (marios-b) | 10:53 |
panda|rover | arxcruz: so you want to run tempest from packages and not from containers for all master ? | 10:53 |
arxcruz | panda|rover: at least to release the promotion | 10:54 |
arxcruz | because... there are a lot of logic wrong on container, and to really fix it... it might take a while | 10:54 |
panda|rover | arxcruz: ok, so you suggest we switch that particular job only, to run from packages, on master | 10:55 |
sshnaidm | panda|rover, can you take a look please? https://review.rdoproject.org/r/#/c/19327/ | 10:55 |
sshnaidm | arxcruz, why not to configure only phase1 to use package? | 10:56 |
panda|rover | sshnaidm: tested anywhere ? | 10:58 |
arxcruz | sshnaidm: panda|rover whatever it takes to unblock the promotion :) | 10:59 |
panda|rover | arxcruz: ok you either modify the call in jenkins or you modify a parameter somewhere to make just that job run on packages | 11:00 |
*** dsneddon has joined #oooq | 11:04 | |
*** ykarel|lunch is now known as ykarel | 11:07 | |
sshnaidm | panda|rover, not in the job | 11:08 |
*** dsneddon has quit IRC | 11:10 | |
panda|rover | oh wow | 11:10 |
panda|rover | the promotion script was not updated to promote fedora | 11:10 |
panda|rover | at all | 11:10 |
panda|rover | I'll put up a review | 11:11 |
*** chkumar|pto is now known as chandankumar | 11:11 | |
* chandankumar just checking out, will back tomorrow at work! | 11:12 | |
rfolco|ruck | sshnaidm, zbr marios: did anyone add f28 container build job to sova ? | 11:14 |
zbr | not me | 11:15 |
sshnaidm | rfolco|ruck, yes | 11:15 |
rfolco|ruck | sshnaidm, can't find it. Show me pls? | 11:15 |
sshnaidm | rfolco|ruck, it will take a couple hours to appear in the site | 11:16 |
rfolco|ruck | fair | 11:16 |
rfolco|ruck | sshnaidm, thanks | 11:16 |
panda|rover | zbr: are the containers called fedora-binary or fedora28-binary ? | 11:17 |
zbr | panda|rover: i think without version | 11:18 |
panda|rover | zbr: ok, good | 11:18 |
panda|rover | and bad | 11:18 |
panda|rover | at the same time | 11:18 |
panda|rover | good for now, bad for the future | 11:18 |
zbr | yeah, i know that once we get the new centos 8, we will have another round of issues. | 11:19 |
zbr | panda|rover: btw, do we need to support both versions 7/8 on the same os release? if not, we do not care. | 11:20 |
panda|rover | zbr: you will care when it's time to make the transition ... | 11:22 |
panda|rover | OH if you WILL care | 11:22 |
panda|rover | :) | 11:22 |
*** chandankumar is now known as chkumar246 | 11:35 | |
*** dsneddon has joined #oooq | 11:39 | |
marios | rfolco|ruck: did not (re containers jobs and sova ) and i notice there is no containers-push job at http://cistatus.tripleo.org/promotion/ (saw it friday actually and was away yesterday ) | 11:41 |
rfolco|ruck | marios, sshnaidm did, it will refresh soon | 11:42 |
rfolco|ruck | thanks marios | 11:42 |
marios | rfolco|ruck: cool thanks | 11:42 |
marios | ykarel: panda|rover updated the way we're getting the list of containers... using the build log instead https://review.openstack.org/#/c/642662/3/playbooks/tripleo-buildcontainers/run.yaml (& we'll find out after https://review.rdoproject.org/r/#/c/19131/ reports) | 11:43 |
*** dsneddon has quit IRC | 11:44 | |
panda|rover | marios: mmmmhhh | 11:46 |
panda|rover | marios: MMMMMMMHHHH | 11:46 |
* marios cowers | 11:47 | |
panda|rover | marios: who creates build.log.txt ? | 11:49 |
marios | panda|rover: line 82 | 11:51 |
*** chem has quit IRC | 11:51 | |
marios | panda|rover: its the literal list from the kolla build output like http://logs.rdoproject.org/31/19131/4/check/periodic-tripleo-centos-7-master-containers-build-push/ea18cc2/logs/build.log.txt.gz | 11:52 |
ykarel | marios, ack, but i think you have to consider failure cases as well | 11:54 |
marios | ykarel: well the retag should fail and tell us if there was no such container | 11:55 |
marios | ykarel: that what you mean | 11:55 |
ykarel | marios, so you want to push some containers, even if some containers failed to build? | 11:56 |
*** jpena is now known as jpena|lunch | 11:56 | |
marios | ykarel: well that happens anyway | 11:56 |
marios | ykarel: i mean its the way kolla build works... it build and push as soon as built and then move onto next | 11:56 |
ykarel | marios, okk, just check if ansible task is clean in case of build failures | 11:57 |
ykarel | i mean built_containers: "{{ (lookup('file', '{{ workspace }}/build.log.txt' )|from_json).built | map(attribute='name') | list }}" | 11:57 |
ykarel | i have not seen how that file looks like in case of failures | 11:58 |
marios | ykarel: i mean its a good point and weakness of the current approach... plan is to make it build and push into two tasks in future so we can push all the things at same time rather than current way | 11:58 |
marios | ykarel: so maybe we should check 'failed' is empty or something in http://logs.rdoproject.org/31/19131/4/check/periodic-tripleo-centos-7-master-containers-build-push/ea18cc2/logs/build.log.txt.gz | 11:58 |
ykarel | marios, ack | 11:58 |
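A hedged sketch of the check being discussed: read the build report, refuse to continue if anything is listed under `failed`, and otherwise return the names under `built`. The `built`/`name` shape follows the Ansible expression quoted above; the shape of `failed` entries and the sample data are assumptions, since the log does not show a failing run.

```python
import json


def built_container_names(build_log_text: str) -> list[str]:
    """Return names of built containers, or raise if any build failed."""
    report = json.loads(build_log_text)
    failed = report.get("failed", [])
    if failed:
        # Assumption: failed entries are either dicts with 'name' or plain strings.
        names = ", ".join(
            item["name"] if isinstance(item, dict) else str(item)
            for item in failed
        )
        raise RuntimeError(f"refusing to retag, failed builds: {names}")
    return [item["name"] for item in report.get("built", [])]


# Hypothetical report contents, for illustration only.
sample = '{"built": [{"name": "mistral-engine"}, {"name": "nova-api"}], "failed": []}'
names = built_container_names(sample)
print(names)
```

Failing early here would avoid retagging a partial set of images, although with the current build-and-push flow some images may already have been pushed under the hash tag.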
*** dsneddon has joined #oooq | 12:01 | |
panda|rover | marios: ykarel yeah, there's some logic about checking that everything is where it should be in the container-push.yml of the promoter. You can look at that | 12:01 |
panda|rover | marios: so build.log is not a json file | 12:01 |
*** rlandy has joined #oooq | 12:01 | |
marios | panda|rover: i tried this with wget http://logs.rdoproject.org/31/19131/4/check/periodic-tripleo-centos-7-master-containers-build-push/ea18cc2/logs/buil.log.txt.gz https://paste.fedoraproject.org/paste/cLXw09fGgodVZA-4~SAHhA | 12:03 |
marios | panda|rover: seems to read it fine | 12:03 |
marios | panda|rover: and from_yaml works same :D | 12:04 |
panda|rover | marios: the file is build.log then, not build.log.txt ... the txt is added by the collect logs to make it readable directly from the browser | 12:05 |
marios | panda|rover: right... updating! | 12:05 |
*** dsneddon has quit IRC | 12:05 | |
panda|rover | marios: still something doesn't seem right | 12:05 |
panda|rover | marios: line 82 is a redirection | 12:05 |
marios | panda|rover: and its yaml not json | 12:05 |
marios | well isn't it json /me confused | 12:06 |
marios | both works though | 12:06 |
panda|rover | marios: how can a file with a correct yaml be created by a bash redirection ? | 12:06 |
marios | panda|rover: well its there man http://logs.rdoproject.org/31/19131/4/check/periodic-tripleo-centos-7-master-containers-build-push/ea18cc2/logs/build.log.txt.gz | 12:07 |
panda|rover | marios: no ok, I understand | 12:07 |
panda|rover | marios: the openstack overcloud container image build command actually outputs a yaml | 12:08 |
panda|rover | and we capture that output in a file | 12:08 |
marios | panda|rover: so apparently its both valid yaml and valid json ... :/ http://yaml-online-parser.appspot.com/?url=http%3A%2F%2Flogs.rdoproject.org%2F31%2F19131%2F4%2Fcheck%2Fperiodic-tripleo-centos-7-master-containers-build-push%2Fea18cc2%2Flogs%2Fbuild.log.txt.gz https://www.freeformatter.com/json-validator.html | 12:08 |
marios | panda|rover: ack | 12:08 |
panda|rover | and eventually errors, in a more loggy fashion, in build-err.log | 12:08 |
panda|rover | marios: valid json is also valid yaml, but the opposite is not | 12:09 |
panda|rover | marios: I think the command is outputting json | 12:09 |
marios | panda|rover: bah ok updating again :D | 12:09 |
*** trown|outtypewww is now known as trown | 12:10 | |
arxcruz | panda|rover: where's the job definition for tripleo-quickstart-promote-ocata-rdo_trunk-minimal ? | 12:10 |
panda|rover | marios: which is good for ansible | 12:10 |
panda|rover | arxcruz: in jenkins | 12:10 |
panda|rover | arxcruz: why ocata ? | 12:11 |
arxcruz | panda|rover: sorry, wrong copy and paste | 12:12 |
panda|rover | arxcruz: https://ci.centos.org/job/tripleo-quickstart-promote-ocata-rdo_trunk-minimal_pacemaker/ | 12:12 |
panda|rover | arxcruz: the answer is similar anyway, you have to modify the jenkins config | 12:12 |
arxcruz | panda|rover: i mean, which repo should I edit? | 12:13 |
hubbot1 | FAILING CHECK JOBS on master: tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001, tripleo-ci-centos-7-standalone-upgrade, tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset053, tripleo-ci-centos-7-scenario008-multinode-oooq-container, tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039, tripleo-ci-centos-7-scenario012-standalone, tripleo-ci-centos-7-scenario010-multinode-oooq-container, tripleo-ci- (3 more messages) | 12:18 |
*** panda|rover is now known as panda|rover|lunc | 12:21 | |
weshay | panda|rover|lunc rfolco|ruck thanks for the work this morning.. afaict.. rdo may be down again? | 12:24 |
rfolco|ruck | weshay, I did not check last hour, let me refresh | 12:25 |
rfolco|ruck | weshay, rdo seems ok, did you see anything ? | 12:26 |
weshay | rfolco|ruck ya.. looking at the cockpit, noticing the measurements stopped at yesterday | 12:28 |
weshay | and all the node failures | 12:28 |
sshnaidm | panda|rover|lunc, fixed https://review.rdoproject.org/r/#/c/19327/ | 12:31 |
*** dsneddon has joined #oooq | 12:33 | |
*** chem has joined #oooq | 12:33 | |
rfolco|ruck | weshay, I see some retry_limit from yesterday... stack create looks good http://dashboard-ci.tripleo.org/d/cEEjGFFmz/cockpit?orgId=1&panelId=232&fullscreen | 12:33 |
rfolco|ruck | oh no | 12:34 |
rfolco|ruck | weshay, right, latest periodic did not run, ok | 12:34 |
jaosorior | still seeing: | 12:38 |
jaosorior | 2019-03-12 11:40:50 | [2019/03/12 11:38:04 AM] [ERROR] stdout: ERROR : [/etc/sysconfig/network-scripts/ifup-eth] Error, some other host (26:72:54:BF:0B:59) already uses address 172.17.0.71. | 12:38 |
jaosorior | in OVB | 12:38 |
*** dsneddon has quit IRC | 12:39 | |
weshay | jaosorior ya.. I think rdo is back down | 12:42 |
rfolco|ruck | jaosorior, will also check if rdo needs a cleanup | 12:43 |
weshay | rfolco|ruck that happens automatically | 12:43 |
weshay | however it isn't able to kill everything at times | 12:43 |
jaosorior | brb | 12:43 |
rfolco|ruck | weshay, no more leftovers like used ips? | 12:43 |
*** jaosorior has quit IRC | 12:43 | |
rfolco|ruck | weshay, yep, thats what I am talking about | 12:43 |
weshay | k we have about 6 instances in error state | 12:44 |
*** udesale has quit IRC | 12:50 | |
*** udesale has joined #oooq | 12:51 | |
panda|rover|lunc | weshay: https://hub.docker.com/r/tripleomaster/fedora-binary-openstack-base/tags | 12:57 |
sshnaidm | weshay, panda|rover|lunc do we need ovb jobs with fedora? https://github.com/rdo-infra/rdo-jobs/blob/98bbed7e0157a175b7ee2b6d4604408344ce7c54/zuul.d/ovb-jobs.yaml#L295 | 12:58 |
*** jpena|lunch is now known as jpena | 13:00 | |
panda|rover|lunc | sshnaidm: we probably do, I don't think we can seriously promote without OVB. Not sure about the time though | 13:02 |
sshnaidm | panda|rover|lunc, mm.. but ovb nodes as fedora or centos? | 13:03 |
sshnaidm | panda|rover|lunc, because this job will use fedora as undercloud and centos as overcloud | 13:03 |
marios | m hicks call now reminder | 13:04 |
panda|rover|lunc | marios: ZZZzz | 13:05 |
weshay | wait.. | 13:11 |
weshay | did I just hear base rhel is now open? | 13:12 |
weshay | I just joined | 13:12 |
weshay | marios ^ | 13:12 |
*** panda|rover|lunc is now known as panda|rover | 13:12 | |
marios | weshay: last/ready build last night but live at summit | 13:15 |
*** jaosorior has joined #oooq | 13:15 | |
weshay | ? | 13:15 |
weshay | not understanding that | 13:15 |
*** dsneddon has joined #oooq | 13:15 | |
marios | weshay: not sure but see pvt | 13:17 |
*** dsneddon has quit IRC | 13:20 | |
amoralej | we are updating ovs for stein, we have gated with oooq jobs but let us know if you observe anything abnormal | 13:25 |
*** dsneddon has joined #oooq | 13:25 | |
amoralej | moving to 2.11, will be in the repo in next hour or so | 13:25 |
amoralej | https://review.rdoproject.org/r/#/c/19209/ | 13:25 |
weshay | amoralej any progress in the planning to squash dlrn and dlrn-deps into a single repo for master? | 13:27 |
amoralej | wes, we added some info in https://tree.taiga.io/project/tripleo-ci-board/epic/601 about deps repo | 13:28 |
amoralej | and how to use it in a versioned way | 13:28 |
weshay | amoralej k thanks for pointing that out, will review | 13:30 |
*** dsneddon has quit IRC | 13:32 | |
panda|rover | te-broker changed address | 13:45 |
jaosorior | oops | 13:45 |
jaosorior | panda|rover: why was the change of address an issue? | 13:46 |
ykarel | panda|rover, te-broker still used? i thought all ovb job moved away from it | 13:47 |
panda|rover | jaosorior: it's mostly a toolbox now, used for minor tasks, previously it meant that RDO job could not contact it to spawn OVB environment | 13:48 |
panda|rover | ykarel: jaosorior we still run the script that cleans up OVB stacks there, and it wasn't running properly without floating ip | 13:48 |
ykarel | panda|rover, ack, thanks for clarifying it | 13:49 |
jaosorior | I see | 13:49 |
panda|rover | but even after the cleanup, we have 30 instances out of 730 older than 5 hours | 13:49 |
panda|rover | that means that the load is legit and very high | 13:49 |
*** jbadiapa has quit IRC | 13:55 | |
*** dsneddon has joined #oooq | 13:56 | |
*** openstack has joined #oooq | 15:38 | |
*** ChanServ sets mode: +o openstack | 15:38 | |
*** openstackstatus has joined #oooq | 15:38 | |
*** ChanServ sets mode: +v openstackstatus | 15:38 | |
weshay | I'm slow to respond, in another mtg | 15:38 |
* weshay rereads to see if I get a similar impression | 15:38 | |
marios | weshay: panda|rover rfolco|ruck removing promotion blocker comment #2 https://bugs.launchpad.net/tripleo/+bug/1819583 | 15:40 |
*** dsneddon has joined #oooq | 15:41 | |
*** jbadiapa has quit IRC | 15:41 | |
rfolco|ruck | marios, ack | 15:43 |
openstack | Launchpad bug 1819583 in tripleo "periodic-... containers-build-push job skips retagging with tripleo-ci-testing" [Critical,In progress] - Assigned to Marios Andreou (marios-b) | 15:43 |
marios | weshay: rfolco|ruck panda|rover also https://bugs.launchpad.net/tripleo/+bug/1818994/comments/10 | 15:51 |
openstack | Launchpad bug 1818994 in tripleo "ovb jobs broken because pacemaker is unconfigured" [Critical,Triaged] - Assigned to Juan Antonio Osorio Robles (juan-osorio-robles) | 15:51 |
arxcruz | sshnaidm: https://review.openstack.org/#/c/641641/ it's working now :) | 15:51 |
arxcruz | sshnaidm: http://logs.openstack.org/41/641641/5/check/tripleo-ci-centos-7-standalone-os-tempest/28e3506/logs/testrepository.subunit.gz | 15:52 |
panda|rover | marios: uhm, if mistral is building with the wrong package, it happens even before the first tagging | 15:55 |
marios | panda|rover: can you rephrase please | 15:58 |
panda|rover | marios: was reading https://bugs.launchpad.net/tripleo/+bug/1818994. if tripleo-common ended up with a wrong version in mistral container, it's either hash mismatch between containers, or the mistral container did not build with the correct tripleo-common package. And in this last case, this happens *BEFORE* any tagging or retagging. | 16:01 |
openstack | Launchpad bug 1818994 in tripleo "ovb jobs broken because pacemaker is unconfigured" [Critical,Triaged] - Assigned to Juan Antonio Osorio Robles (juan-osorio-robles) | 16:01 |
*** weshay is now known as Dwight | 16:18 | |
hubbot1 | FAILING CHECK JOBS on master: tripleo-ci-centos-7-standalone-upgrade, tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset053, tripleo-ci-centos-7-scenario008-multinode-oooq-container, tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039, tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset035, tripleo-ci-centos-7-scenario012-standalone, tripleo-ci-centos-7-scenario010-multinode-oooq-container, tripleo-ci- (3 more messages) | 16:18 |
*** Dwight is now known as DwightH | 16:18 | |
*** DwightH is now known as weshay | 16:19 | |
*** agopi has joined #oooq | 16:22 | |
*** ccamacho has quit IRC | 16:29 | |
*** jfrancoa has quit IRC | 16:31 | |
*** holser_ has quit IRC | 16:31 | |
panda|rover | sshnaidm: do we have a specific file as environment for tq/tqe in RDO? | 16:32 |
sshnaidm | panda|rover, multinode-rdocloud.yml, ovb-rdocloud.yml | 16:34 |
sshnaidm | panda|rover, all *rdocloud.yml https://github.com/openstack-infra/tripleo-ci/tree/668c03178df892055d3d30dde1c04b5d50883f90/toci-quickstart/config/testenv | 16:34 |
*** udesale has quit IRC | 16:37 | |
panda|rover | sshnaidm: yep, ok, but all in tripleo-ci, and nowhere else. | 16:38 |
sshnaidm | panda|rover, where should it be? | 16:38 |
*** chem has quit IRC | 16:38 | |
panda|rover | sshnaidm: wonder why we are not setting undercloud_docker_registry_mirror in those files | 16:38 |
sshnaidm | panda|rover, why should it differ from upstream? | 16:39 |
sshnaidm | toci-quickstart/config/testenv/multinode.yml:undercloud_docker_registry_mirror: "{{ lookup('env','NODEPOOL_DOCKER_REGISTRY_PROXY') }}" | 16:40 |
sshnaidm | the same thing should be for rdo cloud | 16:40 |
panda|rover | sshnaidm: that's what I don't understand, why are we setting that value for upstream and not for rdocloud | 16:41 |
sshnaidm | panda|rover, we apply multinode-rdocloud after we apply multinode.yml | 16:41 |
sshnaidm | at least we should | 16:42 |
panda|rover | mmhh, the chain of overrides strikes again. | 16:43 |
*** d0ugal has quit IRC | 17:04 | |
weshay | marios need to chat any more about tagging? | 17:04 |
*** kopecmartin is now known as kopecmartin|off | 17:05 | |
panda|rover | weshay: IIUC there's still some concern if the missed tag is affecting bugs like https://bugs.launchpad.net/tripleo/+bug/1818994 somehow. I was trying to understand where the wrong tripleo-common package version came from | 17:06 |
openstack | Launchpad bug 1818994 in tripleo "ovb jobs broken because pacemaker is unconfigured" [Critical,Triaged] - Assigned to Juan Antonio Osorio Robles (juan-osorio-robles) | 17:06 |
weshay | sshnaidm were you able to get on any of those boxes | 17:10 |
weshay | ? | 17:11 |
sshnaidm | weshay, not yet | 17:11 |
weshay | panda|rover let me know if you need to hand off anything | 17:11 |
weshay | marios zbr panda|rover fyi :) f28 is running through deployment upstream http://logs.openstack.org/88/615988/26/check/tripleo-ci-fedora-28-standalone/8221d4a/logs/tempest.html.gz | 17:15 |
weshay | one fix for at least some of the tempest is https://review.openstack.org/#/c/642517/ | 17:16 |
zbr | weshay: btw, what is the conclusion with nova hw arch? we got N answers, with N+1 ideas :) | 17:17 |
zbr | q35 ? | 17:17 |
weshay | zbr see my email to openstack-discuss | 17:19 |
weshay | zbr going to go w/ pc first.. and see how q35 goes | 17:20 |
*** trown is now known as trown|lunch | 17:20 | |
zbr | so i need to ping few people to merge https://review.openstack.org/#/c/642517/2 | 17:21 |
weshay | arxcruz is this still wip? https://review.openstack.org/#/c/635478/ | 17:21 |
zbr | panda|rover: sshnaidm please help with ^^ | 17:22 |
sshnaidm | zbr, it's worth reviewing only after CI jobs pass | 17:23 |
panda|rover | zbr: I have no idea what that patch is doing, completely missing context | 17:24 |
*** panda|rover is now known as panda|rover|off | 17:24 | |
zbr | lots of emails on openstack-discuss rel to it. mainly fedora default hw-arch used by nova was not compatible and we are switching to a value that is more portable "pc". it has no side effects on centos because pc is an alias pointing to current value. | 17:25 |
zbr | think of "pc" as some kind of "current" | 17:26 |
zbr | i will try to update the bug with info related to the issue.... | 17:26 |
sshnaidm | zbr, yes, please, there is no info in the bug about it | 17:27 |
zbr | panda|rover|off: sshnaidm I updated the description | 17:38 |
*** dtantsur is now known as dtantsur|afk | 17:39 | |
zbr | i think that in the end we do want this one https://review.openstack.org/#/c/642443/ -- but i will wait for the CI results. | 17:40 |
weshay | zbr the comment from nova folks was already not to use pc | 17:45 |
weshay | zbr the suggestion is q35 or what ever | 17:45 |
weshay | note my email re: rhel doc | 17:45 |
weshay | of course we get a node failure on the f28 container build | 17:46 |
weshay | :( | 17:46 |
zbr | weshay: my impression was that q35 was likely better, but I would prefer two steps: pc first, and q35 after. I was writing the follow-up as we chat. | 17:46 |
weshay | zbr right.. but not in the puppet | 17:46 |
weshay | not at first at least | 17:46 |
weshay | let's update the standalone jobs.. and then we'll propose q35 on puppet | 17:46 |
weshay | for x86_64 | 17:46 |
*** amoralej is now known as amoralej|off | 17:47 | |
zbr | weshay: not against it but we need to remember to undo our override if we do this. | 17:48 |
zbr | weshay: i am glad that q35 is supported by my obsolete supermicro nodes, i was worried that they might be left out. ;) | 17:50 |
zbr | weshay: this may be only few chars to change but this kind of change can have major implications. | 17:51 |
zbr | weshay: let's use this topic for the subject https://review.openstack.org/#/q/topic:nova-arch+(status:open+OR+status:merged) | 17:52 |
*** bogdando has quit IRC | 17:52 | |
zbr | so we avoid duplicating effort | 17:52 |
*** derekh has quit IRC | 17:55 | |
weshay | zbr++ | 17:56 |
hubbot1 | weshay: zbr's karma is now 2 | 17:56 |
*** jbadiapa has joined #oooq | 17:58 | |
weshay | rlandy can you look at the timeout value for ovb jobs in rdo | 18:00 |
weshay | I think we may need 3.5 hrs | 18:00 |
rlandy | weshay: ack | 18:01 |
rlandy | weshay: depends on the ovb - some have diff specified timeouts - which one is problematic | 18:02 |
sshnaidm | rlandy, weshay take a look please: https://review.rdoproject.org/r/#/c/19259/ it fixes git-review problem with libvirt mode, and finally it works in beaker machines | 18:02 |
weshay | rlandy /me looking at http://dashboard-ci.tripleo.org/d/cEEjGFFmz/cockpit?orgId=1&panelId=207&fullscreen | 18:03 |
weshay | fs001 | 18:03 |
weshay | fs35 | 18:03 |
rlandy | on master = I see | 18:04 |
sshnaidm | rlandy, weshay with this patch: http://rdo-ci-fx2-02-s4.v101.rdoci.lab.eng.rdu.redhat.com:8000/01/1001/1/check/tripleo-ci-centos-7-standalone-dlrn-hash-tag/2f9cc57/job-output.txt.gz | 18:04 |
rfolco|ruck | anyone seen this before or have any insights? | 18:06 |
rfolco|ruck | http://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-centos-7-master-containers-build-push/ea05d75/logs/build-err.log.txt.gz | 18:06 |
rfolco|ruck | 2 containers started failing to build | 18:06 |
rfolco|ruck | ERROR:kolla.common.utils:ironic-inspector Failed with status: error | 18:06 |
rfolco|ruck | ERROR:kolla.common.utils:octavia-base Failed with status: error | 18:06 |
sshnaidm | weshay, check this please: https://review.rdoproject.org/r/#/c/19327/ | 18:08 |
* rlandy is looking for obvious timeout | 18:09 | |
rlandy | collect logs | 18:09 |
*** d0ugal has joined #oooq | 18:09 | |
*** sshnaidm is now known as sshnaidm|afk | 18:11 | |
rlandy | weshay: collect_logs took 20 mins | 18:11 |
rlandy | comparing that with other fs001 job | 18:12 |
*** trown|lunch is now known as trown | 18:13 | |
rlandy | I guess previously it took 18 mins | 18:14 |
rlandy | so we may have overrun | 18:14 |
rlandy | weshay: ack - looks like master is overrunning the time during collect logs ... I would agree to increase the time but there may be a reason we have a time increase | 18:18 |
hrybacki | o/ -- where is OOOQ dropping live deployment logs nowadays? | 18:18 |
hubbot1 | FAILING CHECK JOBS on master: tripleo-ci-centos-7-standalone-upgrade, tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset053, tripleo-ci-centos-7-scenario008-multinode-oooq-container, tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039, tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset035, tripleo-ci-centos-7-scenario012-standalone, tripleo-ci-centos-7-scenario010-multinode-oooq-container, tripleo-ci- (3 more messages) | 18:19 |
rlandy | collect logs runs 20 mins - used to run 18 - which is not a big deal | 18:19 |
rlandy | but the previous run time plus that pushes over the mark | 18:19 |
rlandy | sshnaidm|afk: looks ok - waiting on ci to complete | 18:23 |
hrybacki | specifically trying to find more details during undercloud deployment that keeps | 18:27 |
rlandy | hrybacki: where is your job running upstream or rdocloud? | 18:29 |
hrybacki | rlandy: neither -- running locally to a virthost | 18:29 |
hrybacki | I recall there used to be an undercloud_deploy.log* but sniffing around I can't seem to find that | 18:30 |
rlandy | can you access the undercloud? | 18:30 |
rlandy | undercloud_install.log in /home/<user> | 18:31 |
hrybacki | rlandy++ ty | 18:32 |
hubbot1 | hrybacki: rlandy's karma is now 49 | 18:32 |
weshay | hrybacki if you reference an upstream job.. you'll see a bunch of helpful links | 18:32 |
weshay | hrybacki rlandy https://review.openstack.org/#/c/642546/ | 18:33 |
weshay | hrybacki ovb jobs are the same but we can't create the footer there yet .. e.g. http://logs.openstack.org/46/642546/1/check/tripleo-ci-centos-7-containers-multinode/052bd3f/logs/ | 18:34 |
hrybacki | nice | 18:34 |
hrybacki | weshay: are we able to deploy against RDO Cloud again already? | 18:34 |
weshay | hrybacki it's starting to go green again .. http://dashboard-ci.tripleo.org/d/cEEjGFFmz/cockpit?orgId=1&panelId=207&fullscreen | 18:35 |
weshay | we're looking at the timeouts | 18:35 |
weshay | still not great.. but it may work | 18:35 |
weshay | hrybacki actually | 18:35 |
weshay | jump on my blue | 18:35 |
* weshay will show you something possibly helpful | 18:36 | |
weshay | https://bluejeans.com/u/whayutin/ | 18:36 |
hrybacki | ack | 18:36 |
weshay | sshnaidm|afk jobs on vexx are working :) | 18:40 |
weshay | hrybacki http://logs.rdoproject.org/83/642583/2/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001/79d4eda/logs/README-reproducer-zuul-based-quickstart.html | 18:42 |
*** jpena is now known as jpena|off | 18:44 | |
rlandy | weshay: been looking at jobs times for fs001 master vs rocky - jobs are definitely running longer | 18:56 |
rlandy | maybe that's expected | 18:56 |
weshay | rfolco|ruck fix_released if it's working | 19:02 |
weshay | rlandy well.. there are a lot of yum updates to containers now too | 19:02 |
rfolco|ruck | weshay, panda|rover|off had already moved to fix_release. I just checked latest runs and confirmed the error vanished. | 19:02 |
weshay | because the lack of promotions | 19:02 |
weshay | rfolco|ruck k | 19:03 |
weshay | thanks | 19:03 |
rfolco|ruck | weshay, thank you | 19:03 |
rlandy | weshay: k - just to be aware if we are covering a performance regression | 19:03 |
rlandy | but +1 to increasing the time | 19:03 |
rlandy | clear that we are running out during collect logs | 19:03 |
rlandy | not one particular step | 19:04 |
weshay | failing the jobs is not going to find it, but enabling rooks team would be ++ for that | 19:04 |
weshay | rlandy ya.. that is VERY difficult to isolate | 19:04 |
rlandy | weshay: ack - in process | 19:04 |
weshay | rlandy let's just add 30min and then get back to work on the other stuff.. getting rook up there would be the best thing we can do | 19:05 |
weshay | rlandy also you'll want to review this.. re: reproducer https://review.openstack.org/#/c/642578/ | 19:05 |
weshay | open to changing that | 19:05 |
rlandy | ah we're going public now | 19:06 |
weshay | rlandy not really | 19:20 |
weshay | but still want to call the old one out as deprecated | 19:21 |
weshay | or maybe.. a different name | 19:21 |
weshay | maybe the new one is NEW | 19:21 |
weshay | really I'm trying to clean up the dir | 19:21 |
*** d0ugal has quit IRC | 19:23 | |
weshay | rfolco|ruck are the overcloud deployment failures in periodic master accounted for in the bugs you guys opened this morning? | 19:27 |
rlandy | weshay: understand - looks reasonable | 19:28 |
rfolco|ruck | weshay, one is tempest, and I opened it yesterday. The others I filed this morning are not periodic master. They're gate. | 19:30 |
weshay | rfolco|ruck ok.. let's get the periodic failures in lp's | 19:31 |
fmount | ping | 19:31 |
weshay | umount | 19:31 |
rfolco|ruck | weshay, right I'll check latest failures and open bugs | 19:31 |
weshay | rfolco|ruck this looks like infra to me http://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master/72b71cd/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz | 19:33 |
rfolco|ruck | rlandy, any reason the new repro would not reproduce this https://ci.centos.org/artifacts/rdo/jenkins-tripleo-quickstart-promote-rocky-rdo_trunk-minimal-118/console.txt.gz ? | 19:33 |
rfolco|ruck | weshay, apparently yes, panda|rover|off and I had the same conclusion about this. | 19:34 |
rfolco|ruck | weshay, I'm finding any other different one... if you have anything send to me, I can file bugs | 19:34 |
rlandy | rfolco|ruck: what do you mean by not reproduce? | 19:35 |
rlandy | not fail the same way? | 19:35 |
rfolco|ruck | rlandy, reproducer is supposed to work on this job ? | 19:35 |
rfolco|ruck | rlandy, the new one | 19:35 |
rlandy | rfolco|ruck: ack - only works with zuul-based jobs | 19:36 |
weshay | rfolco|ruck this one got past os-net-config http://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset002-master-upload/3565ce7/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz | 19:36 |
rfolco|ruck | I want to build an env for arx to debug | 19:36 |
rlandy | rfolco|ruck: it's a zuul-based reproducer | 19:36 |
weshay | rfolco|ruck any ci.centos job you just run quickstart.sh | 19:36 |
weshay | w/ the same args | 19:37 |
weshay | you'll need your centos box to do that | 19:37 |
weshay | one | 19:37 |
weshay | anyone see pip fail on build-test-packages? | 19:37 |
weshay | 2019-03-12 19:25:43.496709 | primary | Could not find a version that satisfies the requirement cryptography>=2.3 (from pyOpenSSL>=16.2.0->rdopkg==0.47.3) (from versions: ) | 19:37 |
weshay | 2019-03-12 19:25:43.496802 | primary | No matching distribution found for cryptography>=2.3 (from pyOpenSSL>=16.2.0->rdopkg==0.47.3) | 19:37 |
rfolco|ruck | weshay, this one fs002 above hits the pacemaker bug | 19:45 |
rfolco|ruck | "Error: Evaluation Error: Error while evaluating a Function Call, The 'hacluster_pwd' hiera key is undefined, did you forget to include ::tripleo::profile::base::pacemaker in your role? (file: /etc/puppet/modules/tripleo/manifests/profile/base/pacemaker.pp, line: 94, column: 5) on node overcloud-controller-0.localdomain", | 19:45 |
rfolco|ruck | so fs001 probably infra, fs002 pacemaker | 19:45 |
rfolco|ruck | https://bugs.launchpad.net/tripleo/+bug/1818994 probably does not show in cockpit coz it was reopened (was in fix_commit) | 19:48 |
openstack | Launchpad bug 1818994 in tripleo "ovb jobs broken because pacemaker is unconfigured" [Critical,Triaged] - Assigned to Juan Antonio Osorio Robles (juan-osorio-robles) | 19:48 |
rfolco|ruck | weshay, ^ | 19:48 |
rfolco|ruck | so tempest, pacemaker... checking what else | 19:49 |
rfolco|ruck | weshay, actually latest periodic skipped due to container build bug - https://bugs.launchpad.net/tripleo/+bug/1819766 | 20:01 |
openstack | Launchpad bug 1819766 in tripleo "containers build job failing" [Critical,Triaged] - Assigned to Gabriele Cerami (gcerami) | 20:01 |
weshay | rfolco|ruck put the trace in the summary please https://bugs.launchpad.net/tripleo/+bug/1819766 | 20:08 |
openstack | Launchpad bug 1819766 in tripleo "containers build job failing" [Critical,Triaged] - Assigned to Gabriele Cerami (gcerami) | 20:08 |
weshay | so we can find it amongst other container build job failures | 20:08 |
rfolco|ruck | weshay, this is all the job produces | 20:09 |
rfolco|ruck | I put what we see in build-err.log | 20:10 |
weshay | rfolco|ruck /me updated | 20:12 |
rfolco|ruck | weshay, ah with containers name | 20:12 |
rfolco|ruck | not trace | 20:12 |
weshay | well .. usually a trace, if the trace is meaningful | 20:13 |
weshay | in this case it's not | 20:13 |
rfolco|ruck | k | 20:13 |
weshay | rfolco|ruck folks get into the habit of saying.. for instance | 20:13 |
weshay | multinode-container job failed... as the bug summary | 20:13 |
weshay | so you end up w/ 30 bugs w/ that summary | 20:13 |
rfolco|ruck | weshay, agreed | 20:14 |
weshay | rfolco|ruck so do your best to make it specific and hopefully unique | 20:14 |
rfolco|ruck | weshay, ok thanks for your patience | 20:14 |
weshay | rfolco|ruck it would be nice to update the upstream playbook that logs container build to get more info | 20:14 |
rfolco|ruck | weshay, ok, I have the timestamp patch maybe we can add more verbose logs | 20:15 |
weshay | https://github.com/openstack-infra/tripleo-ci/blob/master/playbooks/tripleo-buildcontainers/post.yaml#L10-L27 | 20:15 |
rfolco|ruck | weshay, https://review.openstack.org/#/c/639089/5/playbooks/tripleo-buildcontainers/run.yaml | 20:16 |
rfolco|ruck | weshay, need a more verbose output to openstack container build command | 20:17 |
rfolco|ruck | or in kolla.cfg I don't know | 20:18 |
weshay | rfolco|ruck that is a very helpful update | 20:18 |
hubbot1 | FAILING CHECK JOBS on master: tripleo-ci-centos-7-standalone-upgrade, tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset035, tripleo-ci-centos-7-scenario008-multinode-oooq-container, tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039, tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset053, tripleo-ci-centos-7-scenario012-standalone, tripleo-ci-fedora-28-standalone, tripleo-ci-centos-7-scenario010-multinode-oooq- (3 more messages) | 20:19 |
weshay | rfolco|ruck we're going to switch to http://logs.openstack.org/89/639089/5/check/tripleo-build-containers-centos-7-buildah/cd41d96/logs/buildah-builds/ buildah soonish | 20:20 |
weshay | would be interesting to see if http://logs.openstack.org/89/639089/5/check/tripleo-build-containers-centos-7-buildah/cd41d96/logs/build.log.txt.gz | 20:21 |
weshay | is really all the real output we get | 20:21 |
weshay | Emilien did that work.. | 20:21 |
* weshay checks to see if there is a job that fails | 20:21 | |
rfolco|ruck | weshay, hmm doesn't seem to be collecting stderr | 20:22 |
weshay | http://logs.openstack.org/63/642663/5/check/tripleo-build-containers-centos-7-buildah/a3c4184/logs/build-err.log.txt.gz | 20:22 |
rfolco|ruck | might be wrong though | 20:22 |
weshay | http://logs.openstack.org/63/642663/5/check/tripleo-build-containers-centos-7-buildah/a3c4184/logs/build.log.txt.gz | 20:23 |
rfolco|ruck | ah there is | 20:23 |
weshay | hrm.. not much info | 20:23 |
* weshay pings Emilien | 20:23 | |
rfolco|ruck | yeah need to increase verbosity or debug mode on kolla.cfg I suppose... or in openstack cmd | 20:23 |
weshay | rfolco|ruck I think it would be kolla.cfg | 20:25 |
weshay | not sure | 20:25 |
weshay | but I think so | 20:25 |
weshay | common is just a wrapper | 20:26 |
rfolco|ruck | weshay, looking at kolla docs | 20:26 |
rfolco|ruck | weshay, there is a debug = true | 20:28 |
rfolco|ruck | weshay, will put in a patch for check job | 20:28 |
weshay | thanks | 20:32 |
*** agopi has quit IRC | 20:42 | |
*** fmount has quit IRC | 20:43 | |
*** fmount has joined #oooq | 20:44 | |
*** chkumar246 is now known as chandankumar | 20:47 | |
rlandy | weshay: hello | 21:20 |
rlandy | weshay: new wrt internal shared user | 21:20 |
weshay | rlandy howdy | 21:20 |
rlandy | weshay: pasting on pvt - internal details | 21:21 |
weshay | rlandy we need to not use rdo mirrors in the reproducer | 21:23 |
weshay | :( | 21:23 |
weshay | 2019-03-12 21:22:58.996567 | primary | Could not install packages due to an EnvironmentError: HTTPConnectionPool(host='mirror.regionone.rdo-cloud.rdoproject.org', port=80): Max retries exceeded with url: /pypifiles/packages/50/d8/95f7cb04344033bf9d1a12c5a7969a15999b6a710fbe1969c517333d9a62/bcrypt-3.1.6-cp27-cp27mu-manylinux1_x86_64.whl (Caused by C | 21:23 |
weshay | onnectTimeoutError(<pip._vendor.urllib3.connection.HTTPConnection object at 0x7f74d8297990>, 'Connection to mirror.regionone.rdo-cloud.rdoproject.org timed out. (connect timeout=60.0)')) | 21:23 |
rlandy | weshay: because rdocloud goes down? | 21:23 |
rlandy | if running in rdocloud, it's best | 21:24 |
rlandy | for libvirt we set it otherwise | 21:24 |
rlandy | weshay: ^^ what do you want set? | 21:25 |
* rlandy thinks there is a mirror option | 21:26 | |
weshay | rlandy I don't know re: mirrors | 21:26 |
weshay | failing twice in a row | 21:26 |
weshay | :( | 21:26 |
weshay | makes me sad | 21:26 |
rlandy | where are you running? | 21:26 |
rlandy | libvirt or rdocloud? | 21:26 |
*** agopi has joined #oooq | 21:27 | |
weshay | rlandy I guess it was an rdo job | 21:27 |
weshay | ovb | 21:27 |
weshay | so in theory it should work.. but DANG IT | 21:27 |
rlandy | the rdocloud mirrors are best | 21:27 |
rlandy | you can set them in the script | 21:27 |
weshay | they were | 21:28 |
rlandy | but not sure that's a good idea | 21:28 |
rlandy | to something other than rdo mirrors | 21:28 |
weshay | Downloading http://mirror.regionone.rdo-cloud.rdoproject.org/pypifiles/packages/7b/7c/c9386b82a25115cccf1903441bba3cbadcfae7b678a20167347fa8ded34c/pyasn1-0.4.5-py2.py3-none-any.whl ( | 21:28 |
rlandy | you can try ... https://github.com/openstack/tripleo-quickstart-extras/blob/master/roles/create-zuul-based-reproducer/templates/reproducer-zuul-based-quickstart.sh.j2#L257 | 21:32 |
rlandy | when you run the launcher playbook | 21:32 |
rlandy | weshay: ^^ | 21:32 |
weshay | k will try in a bit.. | 21:45 |
*** vkapalav has quit IRC | 21:51 | |
zbr | interesting q35 seems to break some tempest test, https://stackoverflow.com/questions/55131153/how-do-i-make-pytest-fail-fast-as-a-user-level-configuration | 21:52 |
zbr | ... or they are caused by something else. | 21:52 |
*** d0ugal has joined #oooq | 21:53 | |
zbr | i am curious what caused tmpwatch to fail to install on f28 job.... as the rpm is clearly the same on both distros. http://logs.openstack.org/17/642517/2/check/tripleo-ci-fedora-28-standalone/cea6a01/logs/undercloud/home/zuul/standalone_deploy.log.txt.gz#_2019-03-12_19_09_44 | 21:58 |
rlandy | weshay: when you have a moment ... https://sf.hosted.upshift.rdu2.redhat.com/logs/51/165051/18/check/tripleo-ci-rhel-7-standalone-rhos-14/95f88cb/job-output.txt.gz#_2019-03-12_17_33_02_374238 - any suggestions with deps on rhel? | 22:08 |
hubbot1 | FAILING CHECK JOBS on master: tripleo-ci-centos-7-standalone-upgrade, tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset035, tripleo-ci-centos-7-scenario008-multinode-oooq-container, tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039, tripleo-ci-centos-7-scenario012-standalone, tripleo-ci-fedora-28-standalone, tripleo-ci-centos-7-scenario010-multinode-oooq-container @ https://review.openstack.org/604298, (3 more messages) | 22:19 |
*** agopi has quit IRC | 22:23 | |
*** tosky has quit IRC | 22:43 | |
*** jjoyce has quit IRC | 22:46 | |
*** jjoyce has joined #oooq | 22:48 | |
*** jjoyce has quit IRC | 22:51 | |
*** jjoyce has joined #oooq | 22:52 | |
*** rascasoft has quit IRC | 23:05 | |
*** rascasoft has joined #oooq | 23:40 | |
*** dsneddon has quit IRC | 23:46 | |
*** rascasoft has quit IRC | 23:53 |
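
The 16:38 to 16:43 exchange about `undercloud_docker_registry_mirror` turns on the order in which the testenv files are applied: `multinode.yml` first, then `multinode-rdocloud.yml`. A minimal sketch of that layering follows, assuming a plain top-level merge in which later files win. The `apply_overrides` helper and the `cloud_name` key are hypothetical and only illustrate the idea; this is not how tripleo-ci actually loads these files.

```python
import os


def apply_overrides(*layers):
    """Merge config layers left to right; a later layer wins on key conflicts."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Stand-in for multinode.yml: the mirror is read from the environment,
# mirroring the lookup('env', 'NODEPOOL_DOCKER_REGISTRY_PROXY') quoted in the log.
multinode = {
    "undercloud_docker_registry_mirror": os.environ.get(
        "NODEPOOL_DOCKER_REGISTRY_PROXY", ""
    ),
}

# Stand-in for multinode-rdocloud.yml (hypothetical content): it does not
# mention the mirror, so the value from the base file survives the merge.
multinode_rdocloud = {
    "cloud_name": "rdocloud",
}

effective = apply_overrides(multinode, multinode_rdocloud)
print(sorted(effective))
```

If the RDO-specific file were applied on its own, or before the base file, the mirror setting would be missing or overwritten, which is one way to read the "chain of overrides" remark.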