Friday, 2018-05-25

hubbotFAILING CHECK JOBS on stable/ocata: tripleo-ci-centos-7-undercloud-upgrades @ https://review.openstack.org/564291, master: tripleo-ci-centos-7-3nodes-multinode, tripleo-ci-centos-7-containerized-undercloud-upgrades @ https://review.openstack.org/56044500:16
*** EmilienM is now known as EmilienM_PTO00:18
*** matbu has quit IRC00:19
*** jtomasek has quit IRC00:25
*** jtomasek has joined #oooq00:29
*** jtomasek has quit IRC00:53
*** jtomasek has joined #oooq01:15
*** tcw has quit IRC01:17
*** tcw has joined #oooq01:17
*** jtomasek_ has joined #oooq01:27
*** jtomasek has quit IRC01:28
myoungrlandy|rvr|bbl: i can add ya...02:15
myoungrlandy|rvr|bbl: doing that now02:15
myoungrlandy|rvr|bbl: the reason the dashboards at http://rhos-release.virt.bos.redhat.com:3030/rhosp are showing 7d for queens promotions is that they are looking at the date the hash was created.02:17
myoungrlandy|rvr|bbl: the dashboards @ http://dashboards.rdoproject.org/queens list both the creation and promotion dates for this reason...so for rdo2 (current-tripleo-rdo-internal) https://trunk.rdoproject.org/centos7-queens/61/15/61152f1f452f02d2f0bccc8e3b3b1695103c4114_ba256d89 is the currently promoted hash, created 5/17, promoted 5/2202:18
myoungrlandy|rvr|bbl: next rdo2 jobs will pull the current-tripleo-rdo hash (https://trunk.rdoproject.org/centos7-queens/85/de/85de06e2c40bfdc8dee80506f8d1d809a93b900e_25e5ea4b), created today, promoted today02:19
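The 7-day figure is simply the age measured from the hash's creation date rather than from its promotion date. A minimal Python sketch of the two measurements, using the rdo2 hash dates quoted above (created 5/17, promoted 5/22); the helper name and the "as of" date of 2018-05-24 are illustrative assumptions, not taken from the dashboard code:

```python
from datetime import date


def age_in_days(since: date, as_of: date) -> int:
    """Whole days elapsed between `since` and `as_of`."""
    return (as_of - since).days


# Dates of the rdo2 (current-tripleo-rdo-internal) hash discussed above.
created = date(2018, 5, 17)
promoted = date(2018, 5, 22)
as_of = date(2018, 5, 24)  # assumed "today", chosen for illustration

print("age by creation date:", age_in_days(created, as_of))    # 7
print("age by promotion date:", age_in_days(promoted, as_of))  # 2
```

A dashboard keyed on creation date will therefore keep showing the hash as a week old even though it was promoted only two days earlier.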
*** rlandy|rvr|bbl is now known as rlandy|rover02:20
rlandy|rovermyoung: hi - reading back02:20
rlandy|rover2018-05-24 17:57:42,051 4326 INFO     promoter Skipping promotion of current-tripleo-rdo to current-tripleo-rdo-internal, missing successful jobs: ['oooq-queens-rdo_trunk-bmu-haa16-lab-float_nic_with_vlans']02:21
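The promoter skipped this hash because at least one required job had no successful run. A minimal sketch of that check as the log message implies it, assuming a plain set difference between required and successful jobs; the second job name in the example is hypothetical, and this is not the promoter's actual source:

```python
from __future__ import annotations


def missing_jobs(required: set[str], successful: set[str]) -> list[str]:
    """Required jobs that have no successful run, sorted for stable output."""
    return sorted(required - successful)


def can_promote(required: set[str], successful: set[str]) -> bool:
    """Promote only when every required job has succeeded."""
    return not missing_jobs(required, successful)


required = {
    "oooq-queens-rdo_trunk-bmu-haa16-lab-float_nic_with_vlans",
    "hypothetical-fs020-job",  # hypothetical second criterion
}
successful = {"hypothetical-fs020-job"}

print(missing_jobs(required, successful))
print(can_promote(required, successful))
```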
myoungrlandy|rover: looking at current job https://rhos-dev-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rdo-promote-queens-rdo_trunk/88/02:21
myoungrlandy|rover: IMHO this log is the pits and hard to read/parse lol.02:21
rlandy|roverI know02:21
* myoung looks at http://38.145.34.55/queens.log02:21
rlandy|roverqueens says it is 7 days out02:22
rlandy|roverthat job did run02:22
myoungso the snippet above was the promoter trying to promote 23e8921b6f52e0361bf5e78123ff3843a4c7328c_1fed7df502:23
myoungi think...sec...always takes me a few mins.02:23
* myoung thinks we should spend a sprint on making ruck/rover eyes bleed less02:23
myoungrlandy|rover: i end up looking here btw http://sol.usersys.redhat.com/dlrnapi-reports/queens-combined.txt02:24
myoung^^ all promotion activity for queens02:24
myoungand there are filtered ones too by phase e.g. http://sol.usersys.redhat.com/dlrnapi-reports/queens-current-tripleo-rdo-internal.txt02:24
rlandy|rover2018-05-22 08:37:10, https://trunk.rdoproject.org/centos7-queens/61/15/61152f1f452f02d2f0bccc8e3b3b1695103c4114_ba256d89, current-tripleo-rdo-internal02:24
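The reports myoung links appear to list one promotion per line in the form shown in rlandy's paste: timestamp, repo URL, phase. Assuming that layout holds for every line (only one sample line is shown here), a short Python sketch that finds the latest promotion for a given phase:

```python
from __future__ import annotations

from datetime import datetime
from typing import Iterable, NamedTuple, Optional


class Promotion(NamedTuple):
    when: datetime
    repo_url: str
    phase: str


def parse_line(line: str) -> Promotion:
    """Parse '<YYYY-mm-dd HH:MM:SS>, <repo url>, <phase>' into a Promotion."""
    when, url, phase = (field.strip() for field in line.split(",", 2))
    return Promotion(datetime.strptime(when, "%Y-%m-%d %H:%M:%S"), url, phase)


def latest_for_phase(lines: Iterable[str], phase: str) -> Optional[Promotion]:
    """Most recent promotion to `phase`, or None if there is none."""
    promotions = (parse_line(line) for line in lines if line.strip())
    matching = [p for p in promotions if p.phase == phase]
    return max(matching, key=lambda p: p.when, default=None)


report = [
    "2018-05-22 08:37:10, https://trunk.rdoproject.org/centos7-queens/61/15/"
    "61152f1f452f02d2f0bccc8e3b3b1695103c4114_ba256d89, current-tripleo-rdo-internal",
]
latest = latest_for_phase(report, "current-tripleo-rdo-internal")
print(latest.when, latest.repo_url)
```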
rlandy|rovernot 7 days old02:25
myounghttp://dashboards.rdoproject.org/queens02:25
myoungso the internal dashboard is saying 7 days because it's checking the creation date of the repo file, which is 5/1702:26
rlandy|roverwhy doesn't it promote a more current one?02:26
rlandy|roverjobs passed today02:26
rlandy|roverugh ... promotion failure02:27
myoung2018-05-17 01:27 is the creation date02:27
myounglooking at log now to see why02:27
myoungjobs passed (fs20 and oooq-bmu)02:27
rlandy|rover "overcloud_deploy_result": "failed"02:27
rlandy|roversorry - looking at that failure02:27
myoungno maybe because this02:27
myoung2018-05-25 00:40:25,799 17311 ERROR    promoter Unable to acquire lock. Another promoter process is running. Aborting.02:27
myoungthis is a wedged promoter process, or another one02:28
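The promoter refuses to start while another instance holds its lock, so a wedged process blocks every later run. The log does not show how the promoter actually locks; as a hedged illustration, here is a non-blocking exclusive lock on a lock file using `fcntl.flock` (Unix only), where the lock file path is a hypothetical parameter:

```python
import fcntl
import os
from typing import Optional


def try_acquire_lock(path: str) -> Optional[int]:
    """Return a file descriptor holding an exclusive lock, or None if already held."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_NB makes this fail immediately instead of waiting for the holder.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None  # "Another promoter process is running. Aborting."
    return fd


def release_lock(fd: int) -> None:
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

With `flock`, the kernel drops the lock when the holding process exits, so a lock that never goes away usually means the old process is still alive (wedged) rather than that a file was left behind.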
myounghave a quick sec?  getting into promoter server and adding your key02:28
rlandy|rover  ResourceInError: resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. , Code: 500"02:28
myoung^^ which is that?02:28
rlandy|rovermyoung: looking at the error in the current promotion job ...  Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"02:34
rlandy|roverfs001 failing both queens and master02:34
myoungrlandy|rover: ack..re promoter server it's in the middle of promoting master containers02:34
myoung2018-05-25 00:54:07,222 17991 INFO     promoter Promoting the container images for dlrn hash 98e667518f7aaa0aa9f31e2d41bbd6d3124cc7e3 on master to current-tripleo-rdo-internal02:34
rlandy|roverok - but queens should promote a newer hash02:34
myoungyes...after master02:35
myoungpromoter.sh does master --> pike --> ocata --> queens serially02:35
rlandy|roveroh gee02:35
myounglast cycle wes and I spent some friday night time plowing thru this, dns at that time was messed up so stuff was just straight up failing...but a full container push was taking around 88m02:36
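Because the releases are promoted one after another, queens only starts after master, pike and ocata finish. A rough Python sketch of the resulting delay, assuming purely for illustration that every release takes the ~88 minutes quoted for a full container push (real durations will differ per release):

```python
from itertools import accumulate


def start_offsets(order: list, minutes: dict) -> dict:
    """Minutes after the run begins at which each release's promotion starts."""
    durations = [minutes[release] for release in order]
    # Each release starts once every release ahead of it has finished.
    starts = [0] + list(accumulate(durations))[:-1]
    return dict(zip(order, starts))


order = ["master", "pike", "ocata", "queens"]  # serial order used by promoter.sh
minutes = {release: 88 for release in order}   # assumption: ~88m per release
print(start_offsets(order, minutes))
```

Under that assumption queens would not start until about 264 minutes (4.4 hours) into the run.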
myoungone of the pain points around our promoter is the output is all hidden until it's done02:38
myoungso can't tell if there's a problem or not...other than sleuthing as root on the promoter vm02:38
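One way to make a long-running step observable is to stream its output line by line instead of collecting it until the end. A minimal Python sketch, assuming the promoter step can be launched as a subprocess (its real command line is not shown here):

```python
import subprocess
import sys
from typing import List, Optional, TextIO, Tuple


def run_streaming(cmd: List[str], log: Optional[TextIO] = None) -> Tuple[int, List[str]]:
    """Run `cmd`, echoing each output line as soon as it arrives.

    Returns the exit code and the captured lines (without trailing newlines).
    """
    log = log if log is not None else sys.stdout
    lines: List[str] = []
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave stderr so errors appear in order
        text=True,
        bufsize=1,  # line-buffered on the reading side of the pipe
    ) as proc:
        for line in proc.stdout:
            log.write(line)
            log.flush()
            lines.append(line.rstrip("\n"))
    # Leaving the `with` block waits for the process, so returncode is set.
    return proc.returncode, lines
```

The child must also flush its own output for this to help; for a Python child that means `python -u` or `PYTHONUNBUFFERED=1`.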
rlandy|roverok - thanks for your help02:40
rlandy|roverit's so confusing02:40
rlandy|roverand I'm kind of on my own here02:40
rlandy|rovermyoung: sorry - just logging a bug02:41
rlandy|roverI'll need you to check me02:41
rlandy|roverit's been a while since I logged a promotion blocker02:41
rlandy|rovermyoung: pls check this ... https://bugs.launchpad.net/tripleo/+bug/177328902:46
openstackLaunchpad bug 1773289 in tripleo "[queens/master promotion] fs001 fails overcloud deploy with 'No valid host was found. , Code: 500'" [Critical,Triaged]02:46
rlandy|roverdid I get the tags/status etc.correct?02:46
myoungrlandy|rover: yup02:47
myoungthe alert tag will add an alert in the #tripleo channel02:47
myoungpromotion-blocker will autocreate a CIX card02:47
*** myoung is now known as myoung|off02:57
*** myoung|off is now known as myoung|zzz03:12
rlandy|roverarxcruz|ruck: when you get in ... https://bugs.launchpad.net/tripleo/+bug/177328903:19
openstackLaunchpad bug 1773289 in tripleo "[queens/master promotion] fs001 fails overcloud deploy with 'No valid host was found. , Code: 500'" [Critical,Triaged]03:19
rlandy|roverI started investigating but it's getting late here03:20
*** skramaja has joined #oooq04:09
*** skramaja has quit IRC04:12
*** skramaja has joined #oooq04:12
*** skramaja has quit IRC04:13
*** skramaja has joined #oooq04:13
hubbotFAILING CHECK JOBS on stable/ocata: gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-ocata, tripleo-ci-centos-7-undercloud-upgrades @ https://review.openstack.org/564291, master: tripleo-ci-centos-7-3nodes-multinode, gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-queens-branch, tripleo-ci-centos-7-containerized-undercloud-upgrades, gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master @  (1 more message)04:16
*** ykarel|away has joined #oooq04:17
*** links has joined #oooq04:19
*** jaganathan has joined #oooq04:23
*** jtomasek_ has quit IRC04:30
*** pgadiya has joined #oooq04:43
*** pgadiya has quit IRC04:43
*** ratailor has joined #oooq04:44
*** ykarel|away is now known as ykarel04:55
*** links has quit IRC04:58
*** saneax has joined #oooq05:06
*** links has joined #oooq05:08
*** links has quit IRC05:12
*** links has joined #oooq05:14
*** udesale has joined #oooq05:21
*** marios has joined #oooq05:36
*** quiquell|off is now known as quiquell05:37
quiquellarxcruz|ruck: Do you know why this is failing ? http://logs.openstack.org/67/570167/1/gate/tripleo-ci-centos-7-scenario002-multinode-oooq-container/18e0db5/job-output.txt.gz05:38
quiquellIt's around tempest05:38
quiquellIt's a reproducer change05:38
quiquellft1.1: setUpClass (tempest.api.object_storage.test_object_services.ObjectTest)_StringException: Traceback (most recent call last):05:39
quiquell  File "/usr/lib/python2.7/site-packages/tempest/test.py", line 172, in setUpClass05:39
quiquell    six.reraise(etype, value, trace)05:40
quiquell  File "/usr/lib/python2.7/site-packages/tempest/test.py", line 165, in setUpClass05:40
quiquell    cls.resource_setup()05:40
quiquell  File "/usr/lib/python2.7/site-packages/tempest/api/object_storage/test_object_services.py", line 36, in resource_setup05:40
quiquell    cls.container_name = cls.create_container()05:40
quiquell  File "/usr/lib/python2.7/site-packages/tempest/api/object_storage/base.py", line 113, in create_container05:40
quiquell    cls.container_client.update_container(container_name)05:40
quiquell  File "/usr/lib/python2.7/site-packages/tempest/lib/services/object_storage/container_client.py", line 37, in update_container05:40
quiquell    resp, body = self.put(url, body=None, headers=headers)05:40
quiquell  File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 343, in put05:40
quiquell    return self.request('PUT', url, extra_headers, headers, body, chunked)05:40
quiquell  File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 668, in request05:40
quiquell    self._error_checker(resp, resp_body)05:40
quiquell  File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 794, in _error_checker05:40
quiquell    raise exceptions.PreconditionFailed(resp_body, resp=resp)05:40
quiquelltempest.lib.exceptions.PreconditionFailed: Precondition Failed05:40
quiquellDetails: Bad URL05:40
quiquell05:40
*** jfrancoa has joined #oooq05:40
*** ratailor has quit IRC06:40
*** ratailor has joined #oooq06:42
*** ccamacho has quit IRC06:53
*** ccamacho has joined #oooq06:53
chandankumarHey guys, fs001 is failing on pike and queens while deploying overcloud07:13
chandankumarhttps://review.rdoproject.org/jenkins/job/gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-pike-branch/4837/07:13
chandankumarhttps://review.rdoproject.org/jenkins/job/gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-queens-branch/1795/07:13
*** tesseract has joined #oooq07:17
*** rlandy|rover has quit IRC07:22
*** matbu has joined #oooq07:23
*** ykarel is now known as ykarel|lunch07:33
*** tesseract-RH has joined #oooq07:38
*** bogdando has joined #oooq07:48
arxcruz|ruckquiquell: investigating07:53
*** saneax has quit IRC07:58
*** holser__ has joined #oooq08:02
*** amoralej|off is now known as amoralej08:04
arxcruz|ruckquiquell: can you please send me again that grafana link?08:12
arxcruz|ruckI forgot to add it to my bookmarks and yesterday i restarted my laptop :(08:12
*** dtantsur|afk is now known as dtantsur08:33
*** kopecmartin has joined #oooq08:34
*** panda|off is now known as panda08:51
*** panda is now known as Guest4565808:52
*** Guest45658 is now known as panda08:53
*** ykarel|lunch is now known as ykarel08:54
*** tesseract-RH has quit IRC08:56
quiquellarxcruz|ruck: http://38.145.34.131:3000/d/pgdr_WVmk/ruck-rover?orgId=109:03
quiquellChanging it now09:03
arxcruz|ruckchanging what ?09:03
quiquellthe dashboard09:04
quiquellHave you skipped the test?09:04
arxcruz|ruckquiquell: yes, i'm finishing deploy a scenario002 and i'll test09:10
arxcruz|ruckbut it's continuously failing, so in order not to block anything it's better to skip it for now09:11
arxcruz|rucki also opened the lp09:11
*** jbadiapa has quit IRC09:20
*** jbadiapa has joined #oooq09:21
*** udesale_ has joined #oooq09:30
*** udesale__ has joined #oooq09:32
*** udesale has quit IRC09:33
*** udesale_ has quit IRC09:34
*** marios has quit IRC09:39
*** marios has joined #oooq09:40
*** holser__ has quit IRC09:45
*** zoli is now known as zoli|lunch09:52
quiquellpanda: Good morning09:56
quiquellHave an idea for the upgrades delta type09:56
quiquellhttps://trello.com/c/pfQ867XP/779-differentiate-the-same-featureset-to-do-the-two-begin-release-end-release-combinationsç+09:56
quiquellhttps://review.openstack.org/#/c/57055109:56
quiquellhttps://review.openstack.org/#/c/5705509:56
quiquellShit09:56
quiquellhttps://review.openstack.org/#/c/57055109:56
pandaquiquell: so we need to duplicate job definitions for the two types of jobs10:00
quiquellThis was going to happen10:01
quiquellWe want to run two types of upgrades10:01
pandaquiquell: that is adding another job cloned from this but with upgrade_delta_type: decrement10:01
quiquellWe can have a base job10:01
quiquellFor upgrades10:01
quiquellOr the base is the default10:01
quiquellidea is not to add jobs with n -> n + 1 ?10:02
quiquellWe were going to have to do new jobs anyways10:02
pandaquiquell: don't take anything I say as a no, I'm just evaluating the consequences10:03
quiquellpanda: Ok ok, it always feels like a negative10:03
quiquellpanda: Another option would be something like a job builder with all the upgrade possibilities10:04
quiquellwe can have a base job with10:05
quiquell job:10:05
quiquell    name: tripleo-ci-centos-7-containerized-undercloud-upgrades10:05
quiquell    parent: tripleo-ci-dsvm10:05
quiquell    run: playbooks/tripleo-ci/run.yaml10:05
quiquell    post-run: playbooks/tripleo-ci/post.yaml10:05
quiquell    timeout: 1080010:05
quiquell    nodeset: legacy-centos-710:05
quiquell    voting: false10:05
quiquell    branches: ^(?!stable/(newton|ocata|pike|queens)).*$10:06
quiquell    vars:10:06
quiquell      toci_jobtype: singlenode-featureset05010:06
quiquelland then the children10:06
quiquelljust one with upgrade_delta_type: increment and other with upgrade_delta_type: decrement10:06
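The idea is one base job plus two children that differ only in `upgrade_delta_type`. Zuul layers a child job's `vars` over its parent's; the Python sketch below models that as a recursive dictionary merge (a simplification of Zuul's behaviour), and the `-increment`/`-decrement` child job names are hypothetical:

```python
import copy


def merge_vars(parent: dict, child: dict) -> dict:
    """Return parent vars overlaid with child vars; child values win on conflicts."""
    merged = copy.deepcopy(parent)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_vars(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


base_vars = {"toci_jobtype": "singlenode-featureset050"}
children = {
    # Hypothetical child job names.
    "tripleo-ci-centos-7-containerized-undercloud-upgrades-increment":
        {"upgrade_delta_type": "increment"},
    "tripleo-ci-centos-7-containerized-undercloud-upgrades-decrement":
        {"upgrade_delta_type": "decrement"},
}

for name, child_vars in children.items():
    print(name, merge_vars(base_vars, child_vars))
```

Each child inherits `toci_jobtype` unchanged and adds only its own delta type, which is what keeps the two definitions short.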
quiquellpanda: Can we override jobs vars at project-template ?10:06
pandaquiquell: the only concern I have is that we may be forced to call two jobs in a different way if they do something different10:07
*** hamzy has quit IRC10:07
quiquellpanda: What do you mean ?10:08
pandawith this solution we are calling two jobs that do two different things with the same name. I'm just not sure if they are or not different enough that we need to call them in a different way10:08
pandawe're really playing with releases here10:08
quiquellpanda: They will have different names10:08
quiquellChanging job names is a problem I think10:08
quiquellWe can just add the new ones with a suffix10:09
pandaquiquell: and if we add a suffix we don't need the variable10:09
quiquellis it normal to parse the job name?10:09
pandaquiquell: because we will handle the suffix in the TOCI_JOBTYPE handling loop10:09
pandaquiquell: not the name, but the type10:10
quiquellYou mean have a new TOCI_JOBTYPE10:10
quiquellbut we still need new jobs10:10
pandaquiquell: yes10:10
quiquellpointing to the TOCI_JOBTYPE10:10
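panda's alternative is to encode the delta in TOCI_JOBTYPE itself and handle it where the job type is parsed. A hedged Python sketch of such parsing; the `-increment`/`-decrement` suffix format is hypothetical, only `singlenode-featureset050` comes from the job definition above, and the real TOCI_JOBTYPE handling in tripleo-ci is not reproduced here:

```python
import re
from typing import NamedTuple, Optional

# Hypothetical format: <nodes>-featureset<NNN>[-increment|-decrement]
JOBTYPE_RE = re.compile(
    r"^(?P<nodes>.+?)-featureset(?P<featureset>\d{3})"
    r"(?:-(?P<delta>increment|decrement))?$"
)


class JobType(NamedTuple):
    nodes: str
    featureset: str
    delta: Optional[str]  # None when the job type carries no upgrade suffix


def parse_jobtype(jobtype: str) -> JobType:
    match = JOBTYPE_RE.match(jobtype)
    if match is None:
        raise ValueError(f"unrecognised TOCI_JOBTYPE: {jobtype!r}")
    return JobType(match["nodes"], match["featureset"], match["delta"])


print(parse_jobtype("singlenode-featureset050"))
print(parse_jobtype("singlenode-featureset050-decrement"))
```

Non-upgrade job types simply parse with `delta=None`, so they never carry a variable that does not apply to them.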
quiquellTOCI_JOBTYPE is more consistent with what we have10:11
quiquellBut kind of cryptic if we add more stuff10:11
pandaquiquell: yes10:12
quiquellBut the variable is not good for non-upgrade jobs10:12
quiquellIt will add a variable that doesn't make sense10:12
pandaquiquell: first thing to understand really is whether we really need to have two different job types for the two sides of the job10:12
quiquellpanda: We have to do two different runs10:13
quiquellpanda: At different zuul executions10:14
quiquellDon't know if we can enqueue two builds from the same job10:14
quiquellEvery time a job runs10:14
pandafood for questions :)10:15
arxcruz|ruckquiquell: so, i wasn't able to reproduce the object storage test failure; on rdocloud it's working :(10:15
pandaquiquell: probably openstack-indra is the best place to dump our concerns on the name10:16
quiquellor #zuul10:16
hubbotFAILING CHECK JOBS on stable/ocata: tripleo-ci-centos-7-undercloud-upgrades @ https://review.openstack.org/564291, master: tripleo-ci-centos-7-3nodes-multinode, gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-queens-branch, tripleo-ci-centos-7-containerized-undercloud-upgrades, gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master @ https://review.openstack.org/56044510:16
pandaquiquell: #zuul is probably more focused on zuul itself, not on the naming policy in infra10:17
quiquellpanda: The main question if we need additional jobs10:18
quiquellpanda: in that case, we have multiple solutions to pass the upgrade delta type10:18
pandaI think it's absolutely inevitable to add additional jobs10:21
quiquellpanda: Then we have two options, new variable or TOCI_TYPE10:21
pandaquiquell: yes, and that depends at least partially on if we need a different job name  or not10:22
pandajob name and type10:22
quiquellpanda: New name is good for debugging10:22
pandawe need to find a good name to address this10:22
pandaa big historic name10:23
pandalike little endian and big endian10:23
quiquellMy 2 cents is about deltas10:23
quiquellupgrade delta type10:23
quiquellincrement/decrement is about deltas10:24
pandadelta to me is a difference between two values ... too much physics and math in my curriculum10:25
quiquelllt or gt is a difference10:26
quiquellboth of them10:26
quiquellincrement decrement10:26
pandadownstep and upstep ? we need brands, like slogans, mottos10:27
pandathis is advertisement, marketing, we need a single word to encapsulate an entire concept10:27
quiquellffu is more than one step10:27
pandalike featureset10:28
quiquellffu is delta=310:28
pandawhich nobody likes apparently, but it's the solution that sucks less10:28
quiquellYep, let's not spend too much time on naming10:28
pandayes, but with ffu it is easy, you always have downsteps10:29
quiquellnop10:29
pandabut with n-1 -> n or n -> n+ 1 the delta is always 110:29
quiquellwe don't have to support n -> n + 3 ?10:29
quiquellA change in newton and need to test newton -> queens10:29
quiquellOr a change in ocata and need to test ocata -> rocky10:30
pandaquiquell: no10:30
pandaquiquell: n -3 is usually so OEL that most of the times is not even possible to put a change in n-310:31
pandaEOL*10:31
quiquellI see10:31
quiquellok10:31
pandaquiquell: and not all the releases will have ffu10:31
quiquellMust be hell to support it btw10:31
pandanewton -> queens, then queens -> T release10:32
quiquellok ok10:32
pandaquiquell: yep10:32
quiquellwe have to focus on n -> n + 110:32
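A sketch of how an increment/decrement delta could map onto concrete upgrade paths, assuming "increment" means n -> n+k and "decrement" means n-k -> n for a change landing on release n (the discussion above does not pin the semantics down exactly). The release list covers only the releases named here, and FFU is the k=3 case:

```python
RELEASES = ["newton", "ocata", "pike", "queens", "rocky"]


def upgrade_path(changed_release: str, delta_type: str, steps: int = 1) -> tuple:
    """Return (from_release, to_release) for a change landing on `changed_release`."""
    i = RELEASES.index(changed_release)
    if delta_type == "increment":    # n -> n + steps
        start, end = i, i + steps
    elif delta_type == "decrement":  # n - steps -> n
        start, end = i - steps, i
    else:
        raise ValueError(f"unknown delta type: {delta_type!r}")
    if start < 0 or end >= len(RELEASES):
        raise ValueError("upgrade path falls outside the known releases")
    return RELEASES[start], RELEASES[end]


print(upgrade_path("queens", "increment"))     # ('queens', 'rocky')
print(upgrade_path("queens", "decrement"))     # ('pike', 'queens')
print(upgrade_path("newton", "increment", 3))  # ('newton', 'queens'), the FFU case
```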
pandaquiquell: how do I access the wiki page with the table with upgrade map ?10:32
pandaI want to add the other map10:32
quiquellGoing to link it in the trello card10:33
pandaquiquell: do I need credentials ?10:33
quiquellI was thinking about scripting it out10:33
quiquellTo generate the table from git repos10:33
quiquellhttps://wiki.openstack.org/wiki/Tripleo-upgrades-fs-variables10:33
quiquellDon't think so10:33
*** udesale__ has quit IRC10:36
*** udesale__ has joined #oooq10:36
chandankumarmyoung|zzz: arxcruz|ruck I and kopecmartin have populated the backlog for this sprint with all description feel free to take a look https://trello.com/c/dksT94bI/768-sprint-14-release-python-tempestconf-200 checklist10:39
pandaquiquell: I was thinking about something like this10:40
pandaquiquell: updated the page with a new table10:40
*** holser__ has joined #oooq10:41
pandaquiquell: updated again10:44
pandaquiquell: and regarding your idea of making a script to update the table: https://xkcd.com/1319/ :)10:46
quiquellpanda: ^ Programmers are lazy people :-)10:51
pandaquiquell: going to discuss this with jistr10:51
quiquellpanda: Ask him why do we need a undercloud + overcloud upgrade job10:52
quiquellDoesn't make any sense to me10:52
quiquellMaybe to discover problems from an upgraded undercloud when you upgrade de overcloud ?10:52
quiquells/de/the/10:53
pandaI asked him yesterday. This is the complete workflow, it's one of the things that customers do, and there is value in checking all the steps of an upgrade; he mentioned ssl as a pain point10:53
pandaand one of the reasons it's important to test it10:53
quiquellpanda: Maybe we can do the undercloud upgrade, shelve it and next job do a overcloud upgrade10:54
quiquellTo be able to chop it10:54
pandaquiquell: I doubt it :(10:54
quiquellpanda: Crazy idea10:54
*** marios has quit IRC11:00
*** marios has joined #oooq11:00
*** jbadiapa has quit IRC11:01
*** zoli|lunch is now known as zoli11:04
*** hubbot has quit IRC11:04
*** hubbot has joined #oooq11:05
*** jbadiapa has joined #oooq11:16
*** udesale__ has quit IRC11:31
*** jaosorior has quit IRC11:47
*** jaosorior has joined #oooq11:48
*** Guest60997 is now known as honza11:49
*** ratailor has quit IRC11:49
hubbotFAILING CHECK JOBS on stable/ocata: tripleo-ci-centos-7-undercloud-upgrades @ https://review.openstack.org/564291, master: tripleo-ci-centos-7-3nodes-multinode, tripleo-ci-centos-7-scenario002-multinode-oooq-container, gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-queens-branch, gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master @ https://review.openstack.org/56044512:17
*** rlandy has joined #oooq12:39
*** rlandy is now known as rlandy|rover12:39
rlandy|roverarxcruz|ruck: hello12:42
arxcruz|ruckrlandy|rover: hey, not so good today12:42
*** myoung|zzz is now known as myoung12:42
rlandy|roverarxcruz|ruck: we need PTL permission for https://review.openstack.org/#/c/570533/12:42
arxcruz|ruckrlandy|rover: i opened a bug for the object storage test that is failing in scenario00212:42
arxcruz|ruckrlandy|rover: scenario002 doesn't use featureset01912:43
rlandy|roveryes - I saw the commit12:43
arxcruz|ruckbut yeah, forgot to check if 019 run this test12:43
myoungrlandy|rover, arxcruz|ruck, do you guys have time to sync in a bit ~2 hrs ?12:43
rlandy|roverfeatureset016 - featureset01912:43
rlandy|roveryeah12:43
myoungor sooner if tempest squad sprint 14 planning goes on the faster side12:44
arxcruz|ruckmyoung: i might not, but i'll let you know12:44
rlandy|roverarxcruz|ruck: opened https://bugs.launchpad.net/tripleo/+bug/1773289 yesterday12:44
openstackLaunchpad bug 1773289 in tripleo "[queens/master promotion] fs001 fails overcloud deploy with 'No valid host was found. , Code: 500'" [Critical,Triaged]12:44
chandankumarmyoung: it will go faster, we are almost done with everything12:44
rlandy|roverarxcruz|ruck: that did not get to the escalation board12:45
rlandy|rovernot sure what I missed there12:45
rlandy|roverarxcruz|ruck: the promotion server needs work12:45
rlandy|roverhttp://38.145.34.55/queens.log12:45
arxcruz|ruckrlandy|rover: this test is not in fs016 or 01912:46
rlandy|roverthat's fine - ignore my comment12:46
rlandy|roverarxcruz|ruck: are you looking at the promotion logs?12:46
arxcruz|ruckrlandy|rover: might need the milestone?12:46
rlandy|roverarxcruz|ruck: milestone?12:47
arxcruz|ruckrlandy|rover: no, i wasn't12:47
arxcruz|rucktarget milestone12:47
arxcruz|rucki add rocky-3 in the bug12:47
arxcruz|rucknot sure12:47
rlandy|roverah12:47
rlandy|roverarxcruz|ruck: bugger question12:47
rlandy|roverbigger12:47
arxcruz|ruck4212:47
rlandy|roverwill check if it's still failing12:48
rlandy|roverrokcy-212:48
rlandy|roverrocky-212:48
rlandy|rover| 2018-05-25 07:26 || 118.0 min || Overcloud stack: FAILED. /home/jenkins/overcloud-deploy.sh fail. || Logs || openstack-periodic |12:49
rlandy|rover| 2018-05-25 01:30 || 118.0 min || Overcloud stack: FAILED. /home/jenkins/overcloud-deploy.sh fail. || Logs || openstack-periodic |12:49
rlandy|roverstill failing12:49
rlandy|roverugh12:49
rlandy|roverarxcruz|ruck: I am concerned about why queens phase 2 has not promoted12:51
rlandy|roverit should have12:51
*** amoralej is now known as amoralej|lunch12:52
rlandy|roverykarel: hello - you ping'ed about https://bugs.launchpad.net/tripleo/+bug/1773289 late last night?12:52
openstackLaunchpad bug 1773289 in tripleo "[queens/master promotion] fs001 fails overcloud deploy with 'No valid host was found. , Code: 500'" [Critical,Triaged] - Assigned to Ronelle Landy (rlandy)12:52
rlandy|roverarxcruz|ruck: ocata is also a pain - I can't get that to reproduce12:53
rlandy|roverit fails on a diff error each time12:53
*** jaosorior has quit IRC12:55
rlandy|roverykarel: I can reproduce the error on my own tenant for fs001 - Went to status ERROR due to "Message: No valid host was found. , Code: 500"12:55
rlandy|roverprovisioning state -  clean failed12:56
ykarelrlandy|rover, yes i pinged, and seeing the bug created yesterday made me wonder whether it could be because of the rdo cloud minor update yesterday12:57
rlandy|roverykarel: it could be - but here is what I see in my reproducer12:58
pandaquiquell: I have updated the table after an exhausting meeting with jistr12:58
rlandy|roverykarel: introspection passes12:58
rlandy|rover 4 node(s) successfully moved to the "available" state.12:58
quiquellpanda: Can you give me some summary ?12:58
quiquellOr the table is enough12:58
pandaquiquell: lol12:58
rlandy|roverbut when I look at the state of the nodes after deploy failed12:58
rlandy|roverthey are in provisioning state -  clean failed12:59
pandaquiquell: the table is not enough, but you have to read it first12:59
pandaquiquell: it's actually very complex12:59
rlandy|roverykarel: that is the first time I have seen such a provisioning state12:59
rlandy|roverpanda: EmilienM_PTO asked if we could land https://review.openstack.org/#/c/568946/ last night13:00
rlandy|rovertrown|outtypewww: ^^13:00
rlandy|roverany objections?13:00
pandait's too late to land it last night13:00
myoungchandankumar: kopecmartin arxcruz|ruck i'll be a few mins late to planning13:00
rlandy|roverhttps://review.openstack.org/#/c/567060/ merged13:00
rlandy|roverpanda: ^^13:00
chandankumarmyoung: ack13:00
panda\o/13:01
rlandy|roverpanda: will need to recheck https://review.openstack.org/#/c/568946/13:01
rlandy|roverbut any objection to merging it?13:01
pandarlandy|rover: it's ok to land it, we may need to modify this implementation in this sprint13:01
rlandy|roverpanda: ok - thanks13:01
rlandy|roverwill recheck it13:02
rlandy|roveronce we figure out fs00113:02
rlandy|roverykarel: any thoughts?13:02
rlandy|rover2018-05-25 05:22:44 | 2018-05-25 05:21:37Z [1.Controller]: CREATE_FAILED  ResourceInError: resources.Controller: Went to status ERROR due to "Message: Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance 43016df9-31a6-42d9-a391-05d27df25a56., Code: 500"13:03
rlandy|roverthat is usually a state of the overcloud nodes not being sufficient for the deployment13:04
ykarelrlandy|rover, i have seen this error earlier also, ironic guys would have more insight on it. rlandy|rover do you see some errors in nova13:05
rlandy|roverdtantsur: hi - what would put nodes in a 'clean failed' state if introspection passed?13:05
*** skramaja has quit IRC13:05
rlandy|roverykarel: yep - asking ironic gurus13:06
ykarelrlandy|rover, on one of the logs i can see: Insufficient compute resources: Free disk 39.00 GB < requested 40 GB, do you see similar in your local reproducer13:06
rlandy|roverI am looking there now13:07
rlandy|roverwe nay need bigger nodes13:07
rlandy|rovermay13:07
rlandy|roverwhy only fs001, though?13:08
rlandy|roverfs035 passes13:08
ykarelgood point, let's check why13:10
ykarelcontainerized undercloud?13:10
rlandy|roverykarel: that error makes sense though13:10
rlandy|roverqueens and master failing13:10
rlandy|roverfor what we see happen to the nodes13:11
rlandy|rovercontainerized_overcloud: >-13:11
rlandy|rover  {% if release in ['newton', 'ocata', 'pike'] -%}13:11
rlandy|rover  false13:11
rlandy|rover  {%- else -%}13:11
rlandy|rover  true13:11
rlandy|rover  {%- endif -%}13:11
rlandy|roverykarel: yes ^^13:12
dtantsurrlandy|rover: cleaning process failed13:12
dtantsurif you have it enabled, it will run when the node goes to "available"13:12
rlandy|roverdtantsur: is it a problem if cleaning process fails?13:12
dtantsurrlandy|rover: yes, it should not13:12
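As dtantsur describes, with automated cleaning enabled a node passes through cleaning on its way to "available", and a cleaning failure parks it in "clean failed", where the scheduler cannot use it. A deliberately simplified Python model of that path; the event names are illustrative labels, and real Ironic has more states (e.g. "clean wait") and transitions than shown:

```python
from typing import Iterable

# Simplified subset of Ironic's provision-state machine (illustrative only).
TRANSITIONS = {
    ("manageable", "provide"): "cleaning",
    ("cleaning", "clean_ok"): "available",
    ("cleaning", "clean_error"): "clean failed",
    ("available", "deploy"): "deploying",
}


def run_events(state: str, events: Iterable[str], automated_clean: bool = True) -> str:
    """Apply events in order and return the final provision state."""
    for event in events:
        if not automated_clean and (state, event) == ("manageable", "provide"):
            state = "available"  # cleaning is skipped entirely
            continue
        try:
            state = TRANSITIONS[(state, event)]
        except KeyError:
            raise ValueError(f"{event!r} is not allowed in state {state!r}") from None
    return state


print(run_events("manageable", ["provide", "clean_error"]))          # clean failed
print(run_events("manageable", ["provide"], automated_clean=False))  # available
```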
rlandy|roverundercloud_clean_nodes: >-13:13
rlandy|rover  {% if release not in ['newton','ocata','pike'] -%}13:13
rlandy|rover  true13:13
rlandy|rover  {%- else -%}13:13
rlandy|rover  false13:13
rlandy|rover  {%- endif -%}13:13
rlandy|roverdtantsur: ^^ that setting13:13
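Both settings pasted above use the same release test, so in fs001 they switch on together for queens and master. A plain-Python restatement of the two Jinja expressions, treating the rendered "true"/"false" text as a boolean; the function names mirror the variable names, and this is not how the featureset files are actually evaluated (Ansible renders the Jinja):

```python
LEGACY_RELEASES = ("newton", "ocata", "pike")


def containerized_overcloud(release: str) -> bool:
    # {% if release in ['newton', 'ocata', 'pike'] %} false {% else %} true
    return release not in LEGACY_RELEASES


def undercloud_clean_nodes(release: str) -> bool:
    # {% if release not in ['newton','ocata','pike'] %} true {% else %} false
    return release not in LEGACY_RELEASES


for release in ("pike", "queens", "master"):
    print(release, containerized_overcloud(release), undercloud_clean_nodes(release))
```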
arxcruz|ruckrlandy|rover: how to log into promotion? what's the username? centos?13:13
rlandy|roverarxcruz|ruck: ask myoung to add your key13:14
dtantsurrlandy|rover: right, this should pass on queens and master13:14
rlandy|roverI can do it in a but13:14
rlandy|roverbit13:14
rlandy|roverykarel: dtantsur: ok - that setting is in fs001 and not in fs03513:14
pandaquiquell: everything clear, right ? :)13:14
rlandy|roverand it's failing13:14
rlandy|roverI have a reproducer13:15
quiquellpanda: I am digesting13:15
dtantsurwell, then we need to fix it :) it's not enabled in fs035 indeed13:15
rlandy|roverdtantsur: ok - pls advise as to how - is it an infra issue13:16
rlandy|roverI can give you access to my reproducer env13:16
rlandy|roverhttps://bugs.launchpad.net/tripleo/+bug/177328913:16
openstackLaunchpad bug 1773289 in tripleo "[queens/master promotion] fs001 fails overcloud deploy with 'No valid host was found. , Code: 500'" [Critical,Triaged] - Assigned to Ronelle Landy (rlandy)13:16
rlandy|rover^^ related bug13:16
pandaquiquell: bon appetit13:17
rlandy|roverit's blocking gates/promotion since yesterday13:17
rlandy|roverwe are getting 'no valid host' - which would make sense if the nodes are in clean failed state13:18
dtantsurrlandy|rover: "Timeout reached while waiting for callback for node" it may be an infra issue, hard to tell13:18
rlandy|roverdtantsur: we have a third possibility ....13:19
rlandy|rover<ykarel> rlandy|rover, on one of the logs i can see: Insufficient compute resources: Free disk 39.00 GB < requested 40 GB, do you see similar in your local reproducer13:19
dtantsurrlandy|rover: I wonder if we can be hitting https://storyboard.openstack.org/#!/story/200207913:19
rlandy|roverbut that does not kill fs03513:19
dtantsurtbh in your logs I don't see cleaning failures, but rather: https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master/512d712/undercloud/var/log/containers/ironic/ironic-conductor.log.txt.gz?#_2018-05-25_01_13_16_59413:20
dtantsurwhich is a deploy failure13:20
dtantsurwhat makes you think that cleaning is related?13:20
rlandy|roverha13:20
rlandy|roverjust that I saw clean failed state on my reproducer13:21
rlandy|rovermay be a side problem13:21
rlandy|roverdtantsur: we can increase the nodes size13:21
dtantsurbtw this log is stripped - it does not have the beginning13:21
rlandy|roverfor the overcloud13:21
dtantsuris it okay?13:21
dtantsurrlandy|rover: can I get a link to the "Insufficient resources" in the logs please?13:22
rlandy|roverykarel saw that13:22
rlandy|roverthat error would make the most sense but I can't tell why then fs035 would pass13:22
* rlandy|rover gets full logs13:23
ykareldtantsur, https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master/512d712/undercloud/var/log/extra/errors.txt.gz#_2018-05-25_01_14_27_34613:23
ykarelrlandy|rover, ^^13:23
dtantsurhmm13:24
*** tcw has quit IRC13:25
dtantsuron master we should not use disk resources at all, I wonder why it's popping up13:25
rlandy|roveron master and queens13:25
*** tcw has joined #oooq13:25
rlandy|roversince yesterday13:25
rlandy|roverwe had an rdocloud update13:25
rlandy|roverperhaps something changed there??13:26
*** tcw has quit IRC13:26
dtantsurrlandy|rover: well, I suspect the immediate problem is that (contrary to common sense) you cannot put an instance requiring 40 GiB to a 40 GiB node13:26
dtantsurs/node/machine/13:26
dtantsurIronic needs some space for partition table, configdrive, etc13:27
dtantsurso when introspecting we report <real disk size> - 113:27
dtantsurwhich should no longer matter since queens, but for some reason it does13:27
* dtantsur goes hunting for nova folks13:27
rlandy|roverdtantsur: I'd agree :) - but it seems strange that it only hots fs00113:27
rlandy|roverhits13:28
dtantsuragreed13:28
dtantsurthere is one more difference between these jobs13:28
dtantsurfs001 runs a simpler version of introspection. it should not affect local disk discovery, but...13:29
dtantsurrlandy|rover: anyway, asking owalsh on #tripleo13:29
rlandy|roverdtantsur: thanks - following13:29
rlandy|rover I can change the node size but I would prefer not to13:30
dtantsurwell, I think this problem can be ignored, this is why:13:31
dtantsurthe failure begins with 4 nodes getting callback timeouts (from the ramdisk)13:31
dtantsurthen nova tries to reschedule the nodes, BUT we have cleaning enabled, so the nodes may not be immediately available13:32
dtantsurduring this time we see all kinds of weird messages from nova13:32
dtantsurbut the key problem seems to be the callback timeout13:32
dtantsurI wonder if we actually have similar timeouts on fs035 but they're masked by retries?13:32
dtantsurrlandy|rover: thought dump ^^13:33
rlandy|roverdtantsur: fs035 does not have cleaning enabled so we would not have that extra step13:33
dtantsurrlandy|rover: well, it's fine, but the cleaning itself may not be an issue13:33
dtantsurit may simply uncover something requiring rescheduling13:34
dtantsurbut this is just a wild guess13:34
dtantsurcan I get similar logs from fs035 please?13:34
rlandy|roveryep - posting13:34
rlandy|roverdtantsur: https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset035-master/ebcb753/undercloud/13:35
rlandy|roverlogs from the same promotion run13:35
dtantsurno, no timeouts there13:36
dtantsurso, it's time to start my favourite rant: we don't publish console logs from the "baremetals", do we?13:36
dtantsuraka https://bugs.launchpad.net/tripleo/+bug/177108213:38
openstackLaunchpad bug 1771082 in tripleo "[rfe] TripleO CI must collect and publish console logs from fake overcloud nodes during introspection and deployment" [High,Triaged]13:38
rlandy|roverno :( - but I'll raise that priority13:38
quiquellpanda: this table mix updates + upgrades13:40
quiquellLet's separate them13:40
quiquellI don't understand the package type13:40
rlandy|roverhmmm ... we should deploy large nodes13:40
rlandy|roverhttps://github.com/openstack-infra/tripleo-ci/blob/master/scripts/te-broker/create-env#L5113:41
rlandy|roverci.m1.large    |  8192 |   80 |         0 |     4 | False13:41
rlandy|roverwhy is it complaining about 40?13:41
*** jfrancoa has quit IRC13:42
rlandy|rovercomparing: https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset035-master/ebcb753/undercloud/home/jenkins/instackenv.json.txt.gz13:43
rlandy|roverand13:43
rlandy|roverhttps://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master/64c9587/undercloud/home/jenkins/instackenv.json.txt.gz13:44
rlandy|roverdoes say 8013:44
*** jfrancoa has joined #oooq13:46
dtantsurrlandy|rover: they have only one disk, right?13:46
pandaquiquell: I have a personal issue to solve, then we can chat.13:47
quiquellpanda: Let's talk next week, I will disconnect soon13:47
rlandy|roverI think so13:47
ykarelrlandy|rover, have you seen any failures recently? it looks like we are not seeing it now. this morning i asked kforde to look into the issue, then he applied some patches and said they should fix the issue13:47
dtantsurrlandy|rover: because apparently introspection detects 40..13:48
pandaquiquell: regarding package type, we have repos that are never installed in the overcloud. THT for example is only installed in the undercloud, so the injection on the overcloud is never needed13:48
ykareli can see some passes in fs001 for queens/master gate jobs13:48
* dtantsur MAAAAAGIIIC13:48
pandaquiquell: other repos like nova, for example, will eventually need to be injected in both undercloud and overcloud13:48
rlandy|roverykarel: that would be nice13:48
quiquellpanda: So you mean excluded overcloud packages ?13:48
rlandy|roverI'll ask kforde what he did13:48
rlandy|roverdtantsur: thanks for your time13:49
ykarelrlandy|rover, yes we should know that :)13:49
rlandy|roversorry for the runaround13:49
dtantsurrlandy|rover: you're always welcome :)13:49
pandaquiquell: I mean that for some changes, we'll need to ignore overcloud injection, because it's not needed13:49
*** d0ugal_ has joined #oooq13:50
rlandy|roverdtantsur: I am going to find out what happened on the infra side so we don't have this fun again :)13:50
quiquellpanda: Ok, but the package type, are the packages excluded in the overcloud, isn't it ?13:50
quiquellpanda: But let's talk next week13:50
dtantsur:)13:50
*** d0ugal has quit IRC13:51
pandaquiquell: if the package type is undercloud only, yes13:53
quiquellpanda: A lot of stuff is mixed in the table, we have to simplify man13:53
pandaquiquell: upgrades are complicated. Let's sync next week. This is what we came up with jiri13:54
quiquellpanda: Let's sync, have a good weekend13:54
*** tesseract-RH has joined #oooq13:57
rlandy|roverarxcruz|ruck: hi - if you send me your keys, I can get you on the promotion server13:59
arxcruz|ruckrlandy|rover: https://github.com/arxcruz.keys14:00
*** tesseract has quit IRC14:00
*** ykarel is now known as ykarel|away14:00
*** amoralej|lunch is now known as amoralej14:02
quiquellrlandy|rover: About promoter, life I check the promoter.sh script was still running in the tmux14:02
quiquells/life/last/14:02
rlandy|roverquiquell: master takes forever and I want queens to promote :)14:02
rlandy|roverqueens phase 214:02
quiquellok14:02
rlandy|roverthose jobs are running and the dashboard still says 7d old14:02
quiquellThere is now a promoter.sh script instead of crontab14:02
rlandy|roverwhich it is not14:03
quiquellTo run them sequentially14:03
quiquellIf you do a tmux a14:03
quiquellYou go to the execution of the script14:03
quiquellNo one has ever had time to productify that14:03
quiquellrlandy|rover, arxcruz|ruck: https://review.rdoproject.org/r/#/c/13622/14:04
rlandy|roverlooking14:04
rlandy|roverarxcruz|ruck: promotion just failed14:05
quiquellIt's blocked at master14:05
rlandy|roverInstall the undercloud14:05
arxcruz|ruckquiquell: you have my -114:06
rlandy|roveroh dear14:06
rlandy|roverpromotion failing all over the place14:06
arxcruz|ruckrlandy|rover: checking14:06
quiquellarxcruz|ruck: This is just a WIP14:07
rlandy|roverquiquell: will have to review this when the world is not on fire14:07
quiquellrlandy|rover: Sure thing, we didn't manage to productify the state of promoter14:08
*** moguimar has joined #oooq14:08
arxcruz|ruckrlandy|rover: which job fails ?14:09
arxcruz|rucki'm not seeing14:09
arxcruz|ruckwhich phase14:09
ykarel|awayarxcruz|ruck, queens/master14:10
ykarel|awayhttps://review.rdoproject.org/zuul/14:10
rlandy|roverarxcruz|ruck: the promotion is blood red14:10
rlandy|rovermaster and queens are failing for diff reasons14:11
*** d0ugal_ has quit IRC14:11
*** d0ugal has joined #oooq14:11
arxcruz|ruckchecking14:12
rlandy|rover Prepare for the containerized deployment on master14:12
arxcruz|ruckImageUploaderException: Could not pull image docker.io/tripleomaster/centos-binary-rsyslog-base14:12
rlandy|roveryep14:12
rlandy|roverwe got a promoter issue14:13
rlandy|roverarxcruz|ruck: I'll check the queens problem14:13
rlandy|roverif you can investigate this one14:13
arxcruz|ruckrlandy|rover: opening the bug14:13
arxcruz|ruckrlandy|rover: I can for like one hour, i'm in the office today, so i need to leave soon, it's 4pm here14:14
ykarel|awaywhy docker.io is used ^^, here rdo registry should have been used14:14
*** quiquell is now known as quiquell|off14:14
arxcruz|ruckykarel|away: in this phase ?14:14
arxcruz|rucki think only on phase 2 rdo registry is being used no ?14:15
rlandy|roverqueens also has containerized_deployment issues14:15
ykarel|awayarxcruz|ruck, in promotion jobs rdo registry should be used as container-build job pushes to rdo registry, no?14:15
rlandy|roverImageUploaderException: Could not pull image docker.io/tripleoqueens/centos-binary-cron14:15
rlandy|roverimagename: docker.io/tripleoqueens/centos-binary-manila-scheduler:1f0eaa23ba556a9af5abd7f394374dd81d657d3d_5a496b9e14:16
hubbotFAILING CHECK JOBS on stable/queens: gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-queens @ https://review.openstack.org/567224, stable/ocata: tripleo-ci-centos-7-undercloud-upgrades @ https://review.openstack.org/564291, master: tripleo-ci-centos-7-3nodes-multinode, tripleo-ci-centos-7-scenario002-multinode-oooq-container, gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-queens-branch, gate- (1 more message)14:17
rlandy|roverok - well we got one major issue going on here14:17
rlandy|roverykarel|away: arxcruz|ruck; ^^ so at least it's not two as I though14:17
rlandy|roverthought14:17
arxcruz|ruckrlandy|rover: https://bugs.launchpad.net/tripleo/+bug/1773381 opened for the docker issue14:18
openstackLaunchpad bug 1773381 in tripleo "Promotion failing to pull image from docker" [Critical,Triaged] - Assigned to Arx Cruz (arxcruz)14:18
arxcruz|ruckykarel|away: so i'll move to rdo14:18
rlandy|roverok14:18
rlandy|rovernot so fast with the moving - let's compare14:18
rlandy|roverimagename: trunk.registry.rdoproject.org/tripleoqueens/centos-binary-aodh-api:d29f833a5977b89a110d19c02fa6bf860709af34_f0f5884b14:19
rlandy|roveryeah - that looks diff14:19
rlandy|roverwhat happened to change that?14:19
arxcruz|ruckrlandy|rover: ykarel|away hmmm https://github.com/openstack-infra/tripleo-ci/blob/master/toci-quickstart/config/testenv/multinode-rdocloud.yml#L4014:21
arxcruz|ruckshould be using registry instead of docker, perhaps the PERIODIC var isn't set anymore ?14:21
rlandy|roverlooking at what might have changed14:22
rlandy|roverhttps://github.com/openstack-infra/tripleo-ci/blob/master/toci-quickstart/config/testenv/ovb-rdocloud.yml#L2614:23
rlandy|roversame there14:23
myoungchandankumar, arxcruz|ruck, kopecmartin: thanks for your time and a great sprint planning!  I think we're in great shape for sprint 14.  The pre-planning and time put into cards for this sprint ahead of planning meeting worked very well.14:26
myoungchandankumar++14:26
hubbotmyoung: chandankumar's karma is now 414:26
myoungkopecmartin++14:26
hubbotmyoung: kopecmartin's karma is now 114:26
rlandy|rover[undercloud-deploy : Install the undercloud] issue - but maybe that's a side problem14:26
myoungarxcruz|ruck++14:26
hubbotmyoung: arxcruz|ruck's karma is now 114:26
arxcruz|ruckrlandy|rover: periodic is set to 114:26
arxcruz|ruckhttps://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset016-master/e77b063/console.txt.gz#_2018-05-25_13_35_33_59214:26
myoungchandankumar: do you have a hot sec?  forgot to ask you something14:26
myoungchandankumar: bluejeans.com/matyoung, only need 45sec14:27
rlandy|roverhmmm ... maybe it's hardcoded now14:30
arxcruz|ruckrlandy|rover: what is hardcoded ?14:30
arxcruz|ruckokay, latest success had pull from trunk.registry.rdoproject.org14:34
rlandy|rover13:35:33 +(/opt/stack/new/tripleo-ci/toci_gate_test.sh:209): PERIODIC=114:35
rlandy|roverarxcruz|ruck: latest success>14:36
rlandy|rover?14:36
arxcruz|ruckrlandy|rover: https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset016-master/ece9569/console.txt.gz14:37
rlandy|roverarxcruz|ruck: oh you mean before this failure>14:37
arxcruz|ruckrlandy|rover: https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset016-master/ece9569/undercloud/home/jenkins/overcloud_prep_containers.log.txt.gz#_2018-05-25_06_00_0414:37
arxcruz|ruckrlandy|rover: yes14:37
rlandy|roverquestion is - what changed??14:37
rlandy|roverarxcruz|ruck: maybe this review messed things up? https://review.openstack.org/#/c/567060/34/toci_quickstart.sh14:39
rlandy|roverI don't see how though14:39
rlandy|rover13:38:28 TASK [Always build images in the periodic jobs] ********************************14:41
rlandy|rover13:38:28 Friday 25 May 2018  13:38:28 +0000 (0:00:00.058)       0:04:59.672 ************14:41
rlandy|rover13:38:28  [WARNING]: when statements should not include jinja2 templating delimiters14:41
rlandy|rover13:38:28 such as {{ }} or {% %}. Found: {{ lookup('env', 'PERIODIC')|default('0')|int ==14:41
rlandy|rover13:38:28 1 }}14:41
rlandy|rover13:38:28 ok: [undercloud]14:41
rlandy|rover^^ knows it's set14:41
arxcruz|ruckrlandy|rover: where are you seeing this task ?14:45
*** links has quit IRC14:45
rlandy|roverin fs035 console log14:45
rlandy|roverseems fine though14:45
arxcruz|ruckrlandy|rover: thing is, periodic is being set to 114:46
rlandy|roverarxcruz|ruck: merged this morning Zuul14:48
rlandy|roverChange has been successfully merged by Zuul14:48
rlandy|rover6:59 AM14:48
rlandy|roverhttps://review.openstack.org/#/c/567060/14:49
rlandy|roveridk what else changed14:49
ykarel|awayrlandy|rover, i think you are correct ^^ patch is causing the issue14:49
ykarel|awaythis messed with the install command14:49
rlandy|roverykarel|away: arxcruz|ruck: I am going to revert it14:50
ykarel|awayand changed the order of --extra-args passed, multinode-rdocloud should override the release file params14:50
*** jtomasek has joined #oooq14:50
ykarel|awayrlandy|rover, yup revert should fix that,14:50
rlandy|roverykarel|away: arxcruz|ruck: https://review.openstack.org/#/c/570585/14:50
ykarel|away${RELEASE_ARGS[$playbook] $PLAYBOOK_REPEATED_ARGS should correct, but good to check complete order14:51
rlandy|roverykarel|away: the tests on the patch never failed14:51
ykarel|awayrlandy|rover, because we use docker.io and promoted container images on gate jobs14:51
rascamyoung, rlandy|rover hey folks how's going? After an entire day of debugging I should be able to fix the master Rdophase2 job... testing it right now, hopefully soon I'll ask you for a review...14:51
rlandy|rover$PLAYBOOK_REPEATED_ARGS should correct?14:52
rlandy|rovershould be correct?14:52
rlandy|roverykarel|away: ^^ not sure what you mean14:52
ykarel|awayrlandy|rover, so multinode-rdocloud.yml should be in the end of the command(after promoting-testing-hash-master.yml)14:52
ykarel|awaybut the patch reversed the order14:53
*** tesseract has joined #oooq14:53
arxcruz|ruckrlandy|rover: maybe add a related-bug or closes bug https://bugs.launchpad.net/tripleo/+bug/1773381  ?14:53
openstackLaunchpad bug 1773381 in tripleo "Promotion failing to pull image from docker" [Critical,Triaged] - Assigned to Arx Cruz (arxcruz)14:53
rlandy|roveryep - will add that14:54
rlandy|roverI'll fix that patch later14:54
ykarel|awayrlandy|rover, you got what i meant?14:55
*** tesseract has quit IRC14:56
*** tesseract-RH has quit IRC14:56
rlandy|roverykarel|away: yep - will rework order14:56
*** tesseract has joined #oooq14:56
rlandy|roverykarel|away: arxcruz|ruck: https://review.openstack.org/#/c/570585/14:56
rlandy|roverpls vote14:57
arxcruz|ruckrlandy|rover: i'm just a +1 guy :/14:57
rlandy|roverpanda: ^^ pls vote14:58
rlandy|roverykarel|away: do you have +2?14:59
arxcruz|ruckwe should not merge stuff on friday lol14:59
arxcruz|ruckrlandy|rover: already talked with alex14:59
ykarel|awayrlandy|rover, voted +1 :)14:59
*** moguimar has quit IRC14:59
rlandy|roverthanks14:59
rlandy|roverI'll rework the playbooks order there15:00
* ykarel|away leaving15:00
rlandy|roverwe should get ykarel|away +2 rights15:00
rlandy|rovermore than earned it15:01
arxcruz|ruckwe should steal ykarel|away for our team15:01
ykarel|awaythere is still lot more to learn15:01
rlandy|roverpanda: ping ping there15:02
myoungpanda: ping, still want to take a whack at backlog grooming?15:02
arxcruz|ruckmyoung: rlandy|rover want to talk, otherwise i'm heading to home15:03
rlandy|roverarxcruz|ruck: ant news on stuck promotion script>15:03
rlandy|roverany15:03
rlandy|roverarxcruz|ruck: also I am out on monday15:04
myoungarxcruz|ruck: rlandy|rover: I can chat if you like or are blocked, or need help with the promoter / promotion script, or ______.15:04
arxcruz|ruckrlandy|rover: not yet, but i can take a look15:04
arxcruz|rucki didn't because myoung said he was looking into it after the meeting15:04
myoungarxcruz|ruck: rlandy|rover: I wanted to sync prior to holiday just to sync on where we are.15:04
rlandy|rovermyoung: arxcruz|ruck: ready to sync when you are15:05
myoungarxcruz|ruck, rlandy|rover, I'm in my room (bj/matyoung).  panda hasn't entered yet so guessing I have this hour free :)15:05
* myoung logs into promoter to poke at it15:05
rlandy|roverI need panda to +2 my patch first15:05
pandaouch15:06
arxcruz|ruckpanda: bad panda15:06
myoungpanda: lol...i'm trying on sarcasm :)15:06
pandasorry15:06
rlandy|roverpanda: before you go anywhere https://review.openstack.org/#/c/570585/15:06
pandaI'm trying to solve too many problems at once15:06
rlandy|roverneed to rework the order there15:06
rlandy|roverbut blocking promotion atm15:07
pandamyoung: coming15:07
rlandy|roverI need another +215:07
pandarlandy|rover: approved, what did it break ?15:07
rlandy|roverpanda: the order of arguments15:07
rlandy|rovereasy enough fix15:08
rlandy|roverI'll do it later15:08
pandaoh15:08
pandaok15:08
rlandy|roverjust want to revert to get the promotions moving15:08
pandasure15:08
rlandy|roveredge case15:08
*** holser__ has quit IRC15:09
*** holser__ has joined #oooq15:12
*** jfrancoa has quit IRC15:12
*** bogdando has quit IRC15:14
*** holser__ has quit IRC15:17
*** dtrainor has joined #oooq15:18
*** matbu has quit IRC15:19
*** ykarel|away has quit IRC15:19
*** tcw has joined #oooq15:23
*** marios has quit IRC15:25
*** jtomasek has quit IRC15:27
*** dtrainor has quit IRC15:36
rascarlandy|rover, myoung, https://code.engineering.redhat.com/gerrit/#/c/139887/ if you can please have a look by the end of today and maybe merge it, this should fix BM Rdophase2 deployments15:50
*** zoli is now known as zoli|gone16:02
*** ccamacho has quit IRC16:06
rlandy|roversorry - busy with upstream fires atm16:13
*** ccamacho has joined #oooq16:15
hubbotFAILING CHECK JOBS on stable/ocata: tripleo-ci-centos-7-undercloud-upgrades @ https://review.openstack.org/564291, master: tripleo-ci-centos-7-3nodes-multinode, tripleo-ci-centos-7-scenario002-multinode-oooq-container, gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-queens-branch, gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master @ https://review.openstack.org/560445, stable/queens: gate- (1 more message)16:17
*** jtomasek has joined #oooq16:28
rlandy|roverpanda: https://review.openstack.org/#/c/570593/ - your vote pls16:33
rlandy|roveralex ok'ed it16:33
pandarlandy|rover: quotes are wrong16:34
rlandy|roverpanda: overcloud_templates_path is a var16:35
rlandy|roverthe rest is an actual string16:35
pandadoes this work ?16:35
rlandy|roverwe will find out16:35
rlandy|roverpanda: so ...16:36
rlandy|roverhttps://github.com/openstack/tripleo-quickstart/blob/master/config/release/tripleo-ci/master.yml#L2116:36
rlandy|roverwe default to a var w/o brackets16:36
rlandy|roveridk how else to denote a string and a var in default16:37
*** jtomasek has quit IRC16:37
pandarlandy|rover: yeah it may be the only way, I just never done it before. I'll +2, you can +1W when checks pass16:38
rlandy|roverpanda:ack16:38
rlandy|roversorry for the rush - a lot of breaking changes :(16:39
pandayeah, let's try not to add some more :P16:40
rlandy|rovermyoung: seen this error before: https://logs.rdoproject.org/85/570585/2/openstack-check/gate-tripleo-ci-centos-7-container-to-container-upgrades-master-nv/Zb3586619a40848269762ae63b5fe5c5f/undercloud/home/jenkins/overcloud_deploy.log.txt.gz#_2018-05-25_16_20_55?16:46
rlandy|roveryou mentioned something about hostname bet maybe that was unrelated16:46
rlandy|roverhttps://bugs.launchpad.net/tripleo/+bug/177199716:47
openstackLaunchpad bug 1771997 in tripleo "Mixed version (R/Q) deploy with config-download cannot find role_name variable" [High,Fix released] - Assigned to Jiří Stránský (jistr)16:47
rlandy|rovernvm - been there for a wheil16:50
rlandy|roverwhile16:50
rlandy|roverpanda: hmmm - string didn't quite work16:55
* myoung finds lunch and will biab 16:56
pandarlandy|rover: :(16:56
rlandy|roverthinking about how to do this16:56
rlandy|rover-r {{undercloud_roles_data|default("{{ overcloud_templates_path/roles_data_undercloud.yaml}}")}}16:57
rlandy|roverpanda: ^^ how do we feel about that one16:57
myoungpanda: thanks for spending an hour with me backlog grooming.16:58
myoungpanda++16:58
hubbotmyoung: panda's karma is now 116:58
rlandy|rover"{{ overcloud_templates_path }} /roles_data_undercloud.yaml"16:58
rlandy|roverbetter yet16:58
myoungall: if curious where the "new" and "technical debt" cols on our trello board went, we've been plowing thru, grooming, sorting, cleaning, etc.16:59
myounghttps://trello.com/b/N9gHLMyP/tripleo-ci-backlog-grooming is the sandbox.  Nothing is being arbitrarily nuked, but moved to columns that can be reviewed by whomever is interested.16:59
pandarlandy|rover: yeah, let's try the second one16:59
rlandy|roverokie dokie16:59
rlandy|roverbrb17:02
*** rlandy|rover is now known as rlandy|rover|brb17:02
rlandy|rover|brblet's hope the world does not catch fire yet again in the next few minutes17:02
*** tesseract has quit IRC17:15
*** zoli|gone is now known as zoli17:23
*** rlandy|rover|brb is now known as rlandy|rover17:34
*** amoralej is now known as amoralej|off17:43
*** myoung is now known as myoung||bbl17:45
*** kopecmartin has quit IRC17:56
*** dtantsur is now known as dtantsur|afk18:06
hubbotFAILING CHECK JOBS on stable/ocata: tripleo-ci-centos-7-undercloud-upgrades @ https://review.openstack.org/564291, master: tripleo-ci-centos-7-3nodes-multinode, tripleo-ci-centos-7-scenario002-multinode-oooq-container, gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-queens-branch, gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master @ https://review.openstack.org/56044518:17
*** ccamacho has quit IRC18:43
arxcruz|ruckrlandy|rover: just arrive home, if you need my help with something18:55
rlandy|roverarxcruz|ruck: hi18:59
rlandy|roverwe have patches in flight18:59
rlandy|roverhttps://review.openstack.org/#/c/570593/19:00
rlandy|roverto fic fs00119:00
rlandy|roverfix19:00
rlandy|roverand https://review.openstack.org/#/c/570585 waiting for gates19:00
rlandy|roverarxcruz|ruck: queens still says 7d19:01
rlandy|rovermyoung||bbl: ^^ did we get anywhere with that?19:01
rlandy|rover2018-05-25 15:33:40,714 6680 INFO     promoter Promoting the container images for dlrn hash 85de06e2c40bfdc8dee80506f8d1d809a93b900e on queens to current-tripleo-rdo-internal19:02
rlandy|roverarxcruz|ruck: ocata needs work19:03
rlandy|roverwill get back to that after resubmitting reverted patch19:03
*** holser__ has joined #oooq19:04
*** myoung||bbl is now known as myoung19:09
myoungrlandy|rover: looking at it now19:09
myoungrlandy|rover: it pulled all 98 images from rdo for the promoted queens hash.  the queens promo script is still running19:10
myoungrlandy|rover: looking at it now to see where it's hung up19:10
myoungrlandy|rover: seems to be working, just slowly.  very slowly19:11
myoungroot     30635  0.0  0.2 138292  8964 pts/2    Sl+  19:01   0:00 /usr/bin/docker-current push trunk.registry.rdoproject.org/tripleoqueens/centos-binary-cinder-api:current-tripleo-rdo-internal19:11
myoungrlandy|rover: cinder-api is the 12th image...so the prev 11 should be tagged with current-tripleo-rdo-internal in rdoregistry...19:12
* myoung checks https://console.registry.rdoproject.org/registry#/images/tripleomaster19:12
*** holser__ has quit IRC19:12
*** holser__ has joined #oooq19:12
* myoung meant https://console.registry.rdoproject.org/registry#/images/tripleoqueens19:13
rlandy|rovermyoung: thanks19:20
rlandy|roveryay - passing fs001 https://review.rdoproject.org/jenkins/job/gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master/14852/console19:21
rlandy|roverwaiting to check logs there19:21
pandarlandy|rover: I'll let you approve19:33
*** panda is now known as panda|off19:33
rlandy|roverpanda|off: thanks19:34
rlandy|roverwaiting on gates19:34
rlandy|rovertripleo-quickstart-extras-gate-newton-delorean-full-minimalFAILURE19:34
rlandy|rovermyoung: hi - https://ci.centos.org/view/rdo/view/tripleo-gate/job/tripleo-quickstart-extras-gate-newton-delorean-full-minimal/ looks like this has been failing for a while19:37
rlandy|roverdo we still care?19:37
rlandy|roverhttps://ci.centos.org/artifacts/rdo/jenkins-tripleo-quickstart-extras-gate-newton-delorean-full-minimal-5960/undercloud/home/stack/undercloud_install.log.gz19:39
rlandy|roverfailing since 05/-919:40
rlandy|rover0919:40
rlandy|rovermyoung: ^^19:41
rlandy|rover2018-05-09 20:25:19 - was the last pass19:41
myoungrlandy|rover: ok so the promoter for queens is running, but DREADFULLY slow19:51
myoungyou can see the current status here:19:51
myounghttps://console.registry.rdoproject.org/registry#/?namespace=tripleoqueens19:51
myoungit's pushing cinder-scheduler now19:51
myoungbut at this rate it's going to take a couple hours.  something's not right19:51
myoungwe can run it from another machine (we did this last sprint too) - it's not too hard and I have things set up to do it already19:52
rlandy|rovermyoung: thanks for watching19:52
myoungi can do with you, but I need to go pick up the wee one (ok she's 13)19:52
myoungsomething is not right with rdocloud networking (again) or with our promoter vm19:52
myoungthings are taking 4x or more longer than they should19:52
myoungrlandy|rover: on promoter server, /home/centos/ci-config/ci-scripts/container-push/parsed_containers-queens.txt is the list (in order) of containers that the (running) script is using19:55
myoung[centos@promoter-server container-push]$ ps aux | grep docker | grep push19:57
myoungroot       978  0.0  0.2 138292  8836 pts/2    Sl+  19:46   0:00 /usr/bin/docker-current push trunk.registry.rdoproject.org/tripleoqueens/centos-binary-cinder-volume:current-tripleo-rdo-internal19:57
myoungcinder-volume is 14/98 images, and it's been running for hours19:57
myoungwill check again when I'm back at keyboard, will plan to run this promotion from a box outside rdocloud if it's still poking along later19:57
*** myoung is now known as myoung|bbl19:57
myoung|bblarxcruz|ruck: ^^20:01
arxcruz|ruckmyoung|bbl: dns issues maybe?20:06
arxcruz|rucki would vote to reboot the vm just in case20:06
*** holser__ has quit IRC20:08
rlandy|roverarxcruz|ruck: fyi - if you have time on monday ... https://bugs.launchpad.net/tripleo/+bug/177344520:11
openstackLaunchpad bug 1773445 in tripleo "tripleo-quickstart-extras-gate-newton-delorean-full-minimal fails to install undercloud - Access denied for user 'heat'@'192.168.24.1" [High,Triaged]20:11
rlandy|roverit's been broken since 05/0920:11
rlandy|roverso it was not us20:12
arxcruz|ruckok20:15
hubbotFAILING CHECK JOBS on master: tripleo-ci-centos-7-3nodes-multinode, gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master @ https://review.openstack.org/56044520:17
arxcruz|ruck:GG:20:21
*** yolanda has quit IRC20:34
rlandy|rovermyoung|bbl: ping when you are back21:18
hubbotFAILING CHECK JOBS on master: tripleo-ci-centos-7-3nodes-multinode, gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master @ https://review.openstack.org/56044522:17
*** myoung|bbl is now known as myoung22:56
myoungrlandy|rover: this one now... root     13355  0.0  0.2 138292  8748 pts/2    Sl+  22:46   0:00 /usr/bin/docker-current push trunk.registry.rdoproject.org/tripleoqueens/centos-binary-gnocchi-metricd:current-tripleo-rdo-internal22:57
myoung^^ 24/9822:57
myoungwant me to run this locally?22:57
*** hamzy has joined #oooq23:18
rlandy|roverit's ok23:57
*** rlandy|rover has quit IRC23:58