Thursday, 2018-11-15

*** tosky has quit IRC00:27
hubbot1FAILING CHECK JOBS on master: tripleo-ci-centos-7-scenario002-multinode-oooq-container, tripleo-ci-centos-7-scenario001-multinode-oooq-container, tripleo-ci-fedora-28-standalone @ https://review.openstack.org/604298, stable/pike: tripleo-ci-centos-7-scenario001-multinode-oooq-container @ https://review.openstack.org/602248, master: tripleo-ci-centos-7-scenario001-multinode-oooq-container, tripleo-ci- (1 more message)00:48
ssbarnea|bkp2rlandy: https://review.openstack.org/#/c/574731/00:55
*** saneax has quit IRC02:10
hubbot1FAILING CHECK JOBS on master: tripleo-ci-centos-7-scenario002-multinode-oooq-container, tripleo-ci-centos-7-scenario001-multinode-oooq-container, tripleo-ci-fedora-28-standalone @ https://review.openstack.org/604298, stable/pike: tripleo-ci-centos-7-scenario001-multinode-oooq-container @ https://review.openstack.org/602248, master: tripleo-ci-centos-7-scenario001-multinode-oooq-container, tripleo-ci- (1 more message)02:48
*** saneax has joined #oooq03:11
*** vkapalav has quit IRC03:11
*** agopi|brb has joined #oooq03:15
*** udesale has joined #oooq03:52
*** gkadam has joined #oooq04:23
*** jaganathan has joined #oooq04:27
*** gkadam has quit IRC04:27
hubbot1FAILING CHECK JOBS on master: tripleo-ci-centos-7-scenario001-multinode-oooq-container, tripleo-ci-fedora-28-standalone @ https://review.openstack.org/604298, master: tripleo-ci-centos-7-containers-multinode-queens, tripleo-ci-centos-7-scenario001-multinode-oooq-container, tripleo-ci-fedora-28-standalone @ https://review.openstack.org/560445, stable/queens: tripleo-ci-centos-7-scenario008-multinode-oooq- (1 more message)04:48
*** chandankumar has joined #oooq04:59
*** chandankumar is now known as chkumar|ruck05:03
*** chem has quit IRC05:26
*** rlandy has quit IRC05:48
*** ratailor has joined #oooq05:50
*** apetrich has quit IRC05:52
*** apetrich has joined #oooq06:08
*** ratailor_ has joined #oooq06:13
*** ratailor has quit IRC06:15
*** ykarel has joined #oooq06:30
*** jfrancoa has joined #oooq06:33
hubbot1FAILING CHECK JOBS on master: tripleo-ci-centos-7-scenario001-multinode-oooq-container, tripleo-ci-fedora-28-standalone @ https://review.openstack.org/604298, master: tripleo-ci-centos-7-containers-multinode-queens, tripleo-ci-centos-7-scenario001-multinode-oooq-container, tripleo-ci-fedora-28-standalone @ https://review.openstack.org/560445, stable/queens: tripleo-ci-centos-7-scenario008-multinode-oooq- (1 more message)06:48
*** ykarel has quit IRC07:22
*** ykarel has joined #oooq07:42
*** apetrich has quit IRC07:47
*** quiquell is now known as quiquell|bbl08:10
*** sshnaidm|pto is now known as sshnaidm08:15
*** apetrich has joined #oooq08:33
*** tosky has joined #oooq08:38
*** gkadam has joined #oooq08:38
*** apetrich has quit IRC08:40
*** dtrainor_ has joined #oooq08:48
hubbot1FAILING CHECK JOBS on master: tripleo-ci-centos-7-scenario001-multinode-oooq-container, tripleo-ci-fedora-28-standalone @ https://review.openstack.org/604298, master: tripleo-ci-centos-7-containers-multinode-queens, tripleo-ci-centos-7-scenario001-multinode-oooq-container, tripleo-ci-fedora-28-standalone @ https://review.openstack.org/560445, stable/queens: tripleo-ci-centos-7-scenario008-multinode-oooq- (1 more message)08:49
*** amoralej|off is now known as amoralej08:49
*** dtrainor has quit IRC08:51
*** dtrainor__ has joined #oooq08:52
chkumar|rucksshnaidm, Hello08:54
chkumar|rucksshnaidm, standalone deployment is supported from stein cycle onwards na?08:54
*** dtrainor_ has quit IRC08:55
sshnaidmchkumar|ruck, I think so, why?08:56
chkumar|rucksshnaidm, we are running https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-7-standalone-rocky for rocky so asked08:57
*** apetrich has joined #oooq08:59
*** d0ugal has quit IRC09:02
*** d0ugal has joined #oooq09:03
sshnaidmchkumar|ruck, ack09:04
sshnaidmchkumar|ruck, do you know why it could happen? https://logs.rdoproject.org/33/614633/6/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master-vexxhost/3134b57/logs/undercloud/home/zuul/tempest.log.txt.gz#_2018-11-15_06_15_1309:04
chkumar|rucksshnaidm, looking into that currently09:05
sshnaidmchkumar|ruck, also consider this change depends on some another: https://review.openstack.org/#/c/614633/09:13
chkumar|rucksshnaidm, my current vexxhost deployment on RDO cloud failed during rdo cloud http://paste.openstack.org/show/734853/09:14
chkumar|ruck*during overcloud deploy09:14
sshnaidmhmm09:21
*** dtrainor_ has joined #oooq09:23
*** dtrainor__ has quit IRC09:25
*** sshnaidm is now known as sshnaidm|afk09:26
*** dtrainor__ has joined #oooq09:27
chkumar|rucksshnaidm|afk, if I check nova api logs there is working fine https://logs.rdoproject.org/33/614633/6/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master-vexxhost/3134b57/logs/undercloud/var/log/containers/nova/nova-api.log.txt.gz#_2018-11-15_05_40_40_54309:28
chkumar|rucktosky, Hello09:28
chkumar|rucktosky, https://logs.rdoproject.org/33/614633/6/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master-vexxhost/3134b57/logs/undercloud/home/zuul/tempest.log.txt.gz#_2018-11-15_06_15_1309:28
chkumar|rucktosky, here while querying nova extensions it is giving 40409:28
chkumar|rucktosky, but in nova api log, it is fine09:29
chkumar|rucktosky, do I need to look somewhere else also?09:29
toskyis the IP correct? I see also a weird traceback09:30
*** dtrainor_ has quit IRC09:30
chkumar|rucktosky, it should use 10.0.0.5 instead of that 172 one09:33
chkumar|rucklet me check from this ip is coming09:34
chkumar|rucktosky, sshnaidm|afk https://logs.rdoproject.org/33/614633/6/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master-vexxhost/3134b57/job-output.txt.gz#_2018-11-15_04_14_28_67466509:37
chkumar|rucktosky, sshnaidm|afk baremetal-335_3 instances have those ip09:37
chkumar|ruckstarting with 172.17.009:37
chkumar|rucktosky, is it possible that when https://10.0.0.5:portnumber/v2.1/extensions called, it internally routed to one of the public ip?09:40
chkumar|ruckor may be running tempest from docker container is doing this?09:41
toskyI have no idea about that specific setup :)09:41
chkumar|ruckbecause of --net=host it is taking the host ip09:42
*** derekh has joined #oooq09:50
*** dtrainor_ has joined #oooq09:53
*** dtrainor__ has quit IRC09:56
*** dtrainor__ has joined #oooq09:57
*** dtrainor_ has quit IRC10:00
*** sshnaidm|afk is now known as sshnaidm10:02
*** chem has joined #oooq10:06
chkumar|ruckpanda|rover|off, container build is passing now10:08
chkumar|ruckpanda|rover|off, but not in openstack-periodic pipeline as it got kicked earlier10:09
*** apetrich has quit IRC10:09
*** gkadam has quit IRC10:13
*** gkadam has joined #oooq10:14
hubbot1FAILING CHECK JOBS on master: tripleo-ci-centos-7-scenario001-multinode-oooq-container, tripleo-ci-fedora-28-standalone @ https://review.openstack.org/604298, master: tripleo-ci-centos-7-containers-multinode-queens, tripleo-ci-centos-7-scenario001-multinode-oooq-container, tripleo-ci-fedora-28-standalone @ https://review.openstack.org/560445, stable/queens: tripleo-ci-centos-7-scenario008-multinode-oooq- (1 more message)10:49
*** udesale has quit IRC10:52
ssbarnea|bkp2sshnaidm: can we workflow https://review.openstack.org/#/c/574731/ ?11:00
sshnaidmssbarnea|bkp2, done11:01
ssbarnea|bkp2sshnaidm: thanks. i randomly came across it but it was obvious. maybe we need to have some CR cleaning sessions once a week.11:02
ssbarnea|bkp2abandoning or merging them11:03
*** panda|rover|off is now known as panda|rover11:08
ssbarnea|bkp2sshnaidm: f28 related https://review.openstack.org/#/c/618027/11:15
*** ratailor_ has quit IRC11:45
*** saneax has quit IRC11:51
*** chem has quit IRC11:57
*** amoralej is now known as amoralej|lunch12:02
* marios afk for a bit back for scrum12:03
*** jtomasek has joined #oooq12:05
sshnaidmssbarnea|bkp2, please use "WIP" in commit message when you testing/debugging/trying something: https://review.openstack.org/#/c/613672/12:09
*** quiquell|bbl is now known as quiquell12:09
ssbarnea|bkp2sshnaidm: that one is not WIP, is ready for review, as far as I am concerned it can be merged like this.12:10
sshnaidmssbarnea|bkp2, is it ready for merge?12:10
ssbarnea|bkp2when I have WIP, I add -1 Workflow on it, even gerrit states near -1 W: "Work in progress".12:11
ssbarnea|bkp2sshnaidm: i cannot force people to review, but the change is valid, passed zuul and only waiting for reviews. this does not make it a WIP, or maybe we should start using "RIP" prefix... for review-in-progress ;)12:12
sshnaidmssbarnea|bkp2, do you have card about it?12:13
ssbarnea|bkp2sshnaidm: *scroll* the description and you will see it.12:13
sshnaidmssbarnea|bkp2, why do you need fedora:26 ?12:14
ssbarnea|bkp2sshnaidm: this was added because we had some CI nodes running on fedora-26 and rlandy discovered some builds in qs.sh that were happening only on fedora-26.12:15
ssbarnea|bkp2i would be happy to remove fedora-26 once we have confirmation that we no longer have such nodes.12:15
quiquellssbarnea|bkp2: Hello there, I usually add WIP and -1 as -1 sometimes disappear at rebases or new patchsets12:15
quiquellssbarnea|bkp2: We have f26 at phase2 nodes, isn't it ?12:16
*** chem has joined #oooq12:16
ssbarnea|bkp2please put comment on the review, i will be happy to address these. regarding f26 (and f27) i think rlandy can say.12:17
ssbarnea|bkp2she upgraded at least one node from f26 to f27 but i don't know about overall status of this.12:17
ssbarnea|bkp2if someone can tell me where to look it will look.12:17
quiquellssbarnea|bkp2: I suppose we can find the phase2 promotion jobs, maybe it's easier to update the missing nodes than think about support f2612:18
sshnaidmas I understand we get rid off phase2 at all12:19
sshnaidmssbarnea|bkp2, well, you can't merge this patch when you clone molecule from github12:20
sshnaidmssbarnea|bkp2, it's ok for local usage, but not for CI12:21
ssbarnea|bkp2quiquell: i am afraid is not easy because we cannot upgrade to f28 because.... py3, and we cannot (easily) switch to centos-7 either, so we are partially stuck, i think.12:21
*** saneax has joined #oooq12:22
quiquellssbarnea|bkp2: Maybe we can use f2712:22
quiquellssbarnea|bkp2: Suppose is similar to centos712:22
ssbarnea|bkp2sshnaidm: i totally agree with you on git aspect. add it there and I will fix it. mainly i was waiting for a new release.12:22
sshnaidmssbarnea|bkp2, that's why I asked if it's ready for merge, it is not. I think it's worth review it and try it, but it should be with W-1 or WIP12:23
chkumar|rucksshnaidm, I am seeing the same issue on https://logs.rdoproject.org/66/610966/9/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset053/1530b8d/logs/undercloud/home/zuul/tempest.log.txt.gz#_2018-11-15_11_12_2112:23
chkumar|rucklet me retry the reproducer12:24
*** saneax has quit IRC12:24
sshnaidmchkumar|ruck, yes.. looks like mess with network isolation?12:24
chkumar|rucksshnaidm, yes kind of12:25
chkumar|rucksshnaidm, is docker doing something fishy there?12:25
ssbarnea|bkp2sshnaidm: thanks for the comments, added -W on it and addressing them.12:28
sshnaidmchkumar|ruck, I don't know, it worked with dockers before.. please check when did it start to fail on vexxhost12:29
*** saneax has joined #oooq12:30
*** chem has quit IRC12:33
sshnaidmchkumar|ruck, maybe we can find the breaking patch..12:37
*** chem has joined #oooq12:38
quiquellsshnaidm: Good TV series12:39
ssbarnea|bkp2do we have a generic bug (LP) about timeouts?12:41
chkumar|rucktripleo-ci-centos-7-scenario00(1-8)-multinode-oooq-container timed out recently12:42
sshnaidmquiquell, need to sync with you via bluej, do you have a time now?12:46
ssbarnea|bkp2i found https://bugs.launchpad.net/tripleo/+bug/1781888 related to timeouts but that one mentions upgrade jobs, my question was more generic because I want to add an elastic-recheck search for the generic timeouts.12:47
openstackLaunchpad bug 1781888 in tripleo "[stable/queens]tripleo-ci-centos-7-scenario000-multinode-oooq-container-upgrades job timing out on stable/queens noop jobs " [Critical,Triaged] - Assigned to Sagi (Sergey) Shnaidman (sshnaidm)12:47
sshnaidmquiquell, well, we have a mtg soon, then after it12:47
ssbarnea|bkp2should I repurpose existing bug or create new generic one?12:48
ssbarnea|bkp2I need bug number for elastic-search12:48
sshnaidmssbarnea|bkp2, why do we need elastic recheck for generic timeouts?12:48
hubbot1FAILING CHECK JOBS on master: tripleo-ci-centos-7-scenario001-multinode-oooq-container, tripleo-ci-fedora-28-standalone @ https://review.openstack.org/604298, master: tripleo-ci-centos-7-scenario001-multinode-oooq-container, tripleo-ci-fedora-28-standalone @ https://review.openstack.org/560445, stable/queens: tripleo-ci-centos-7-scenario008-multinode-oooq-container @ https://review.openstack.org/567224,  (1 more message)12:49
ssbarnea|bkp2sshnaidm: because wes asked me to lower the number of uncategorized failures from http://status.openstack.org/elastic-recheck/data/others.html12:49
ssbarnea|bkp2and is important goal, with card on it.12:50
sshnaidmssbarnea|bkp2, but what is a goal?12:51
ssbarnea|bkp2sshnaidm: don't confuse the recheck in the repo name with "recheck" we do on gerrit ;)12:51
quiquellsshnaidm: Can do after it, tomorrow I will work at my afternoon12:51
quiquellsshnaidm: s/tomorrow/today/12:51
ssbarnea|bkp2excerpt from README: "Use ElasticSearch to classify OpenStack gate failures"12:51
sshnaidmssbarnea|bkp2, I know what is elastic recheck12:52
sshnaidmssbarnea|bkp2, and what is it helpful for?12:52
sshnaidmquiquell, cool12:52
*** jtomasek has quit IRC12:53
sshnaidmssbarnea|bkp2, if it's for just counting timeouts in gates, we have it in cockpit graphs12:53
*** rlandy has joined #oooq12:56
ykarelchkumar|ruck, sshnaidm is the log collection broken i can't find overcloud logs in ovb jobs12:58
ykarel?12:58
panda|roverykarel: logs ?12:59
*** ykarel has quit IRC12:59
*** ykarel has joined #oooq12:59
ykarelpanda|rover, u can check any of the ovb logs in last promotion run: https://trunk-primary.rdoproject.org/api-centos-rocky/api/civotes_detail.html?commit_hash=4af630f02a985625b587049d633bed4c7752d2d1&distro_hash=15dcc0dfac7994229f2a5e5a4d948227e1f84fdf13:00
ykareli see some ssl error which results in 0 overcloud_nodes13:00
ykarelget_overcloud_nodes.py13:00
ykarelssbarnea|bkp2, can u check/fix containers collection logs, seems broken by ur patch: https://review.openstack.org/#/c/610491/44/roles/collect-logs/tasks/collect.yml@23913:00
chkumar|ruckykarel, that's why I am thinking where is subnode2 logs gone in timedout job13:01
ykarelchkumar|ruck, in multinode job logs are there13:01
ykareli am talking about ovb13:02
ykarelthere may be some other issue13:02
chkumar|ruckykarel, https://logs.rdoproject.org/66/610966/9/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset053/1530b8d/logs/13:02
ykarelthis is ovb13:02
chkumar|ruckme too talking about ovb13:02
ykarelokk generally in ovb overcloud is not named as subnode-213:03
ykarelthere is overcloud-controller, overcloud-compute etc13:03
ykarelbut that's configurable13:03
ykareland in multinode it's subnode-2 subnode-313:03
ykarelso confused13:03
chkumar|ruckyes13:04
ykarelssbarnea|bkp2, u can check logs in ur patch to see what went wrong13:05
ykarellook for: syntax error: unexpected end of file13:05
ykarelex log: http://logs.openstack.org/91/610491/44/check/tripleo-ci-centos-7-scenario002-multinode-oooq-container/941fac4/logs/quickstart_collect_logs.log13:05
*** apetrich has joined #oooq13:05
ssbarnea|bkp2ykarel: what to search for in that log?13:07
ykarelssbarnea|bkp2, syntax error: unexpected end of file13:07
ssbarnea|bkp2ykarel: ok good point.13:07
ssbarnea|bkp2ykarel: if I would have to guess,  ( ) need escaping.13:11
ykarelssbarnea|bkp2, is \ before $ required?13:12
ykarelanyway good to check logs before merging the fix for it13:13
*** chkumar|ruck is now known as chkumar|ruck|brb13:15
ssbarnea|bkp2ykarel: https://review.openstack.org/#/c/618163/ -- should fix it.13:17
*** chkumar|ruck|brb is now known as chkumar|ruck13:18
ykarelssbarnea|bkp2, ack let's see the result13:25
sshnaidmssbarnea|bkp2, you sigh very hard :) can you mute please?13:25
weshaychkumar|ruck, few minutes late13:27
ykarelssbarnea|bkp2, following command runs: docker exec heat_api_cfn bash -c "$(command -v dnf || command -v yum) list installed" with ur last patch13:29
ykareland with current: docker exec  heat_api_cfn bash -c "$\(command -v dnf || command -v yum\) list installed"13:29
ykarelthis what u want ^^13:29
ykarel?13:29
weshaypanda|rover, can you run the rest of the mtg?13:31
ssbarnea|bkp2ykarel: I think what we need to run is: bash -c "$(command -v dnf || command -v yum) list installed"13:32
*** trown|outtypewww is now known as trown13:32
chkumar|ruckweshay, ok13:32
ssbarnea|bkp2this command would return installed packages. if you know how to test it faster with docker context it could speed us up.13:33
panda|roverweshay: yes13:33
weshaypanda|rover, thank you13:33
ykarelssbarnea|bkp2, that is currently running but failing, can u try locally running on a container13:34
ykarelex log: http://logs.openstack.org/91/610491/44/check/tripleo-ci-centos-7-scenario002-multinode-oooq-container/941fac4/logs/undercloud/var/log/extra/docker/containers/heat_api_cfn/docker_info.log.txt.gz13:35
*** udesale has joined #oooq13:37
ykarelssbarnea|bkp2, we need to run: bash -c "\$(command -v dnf || command -v yum) list installed"13:38
ssbarnea|bkp2ykarel: so the right fix would be to write \\\$ and no need to escape the ()13:39
ykarelssbarnea|bkp2, yes looks so, but i still can't get why syntax error: EOF is there13:40
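The escaping question above comes down to which shell expands `$(...)`: the one on the host or the one started by `bash -c` inside the container. As a minimal sketch of one way to sidestep the backslash counting, assuming the command line is assembled in Python rather than in the Ansible task under review, `shlex.quote` can single-quote the inner script so the calling shell passes it through untouched. The helper name `docker_exec_command` and its `engine` parameter are illustrative assumptions, not part of the collect-logs role.

```python
import shlex

# The package-manager lookup quoted in the discussion. It has to be
# evaluated by the shell inside the container, not by the host shell.
INNER_SCRIPT = "$(command -v dnf || command -v yum) list installed"


def docker_exec_command(container, script, engine="docker"):
    """Return a shell-safe command line running `script` via bash -c in `container`.

    shlex.quote wraps the script in single quotes, so the calling shell
    does not expand "$(...)" and no backslashes are needed.
    (Hypothetical helper, for illustration only.)
    """
    argv = [engine, "exec", container, "bash", "-c", script]
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    print(docker_exec_command("heat_api_cfn", INNER_SCRIPT))
```

Splitting the result back with `shlex.split` yields the original script as the last argument, which is a quick way to confirm nothing was expanded or lost on the way.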
*** apetrich has quit IRC13:44
ykarelssbarnea|bkp2, it would be more clear if we write the stderr also to file: https://review.openstack.org/#/c/610491/44/roles/collect-logs/tasks/collect.yml@24413:45
ykarelas with loop it's not clear what error is with which command13:45
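The point about stderr can be sketched as follows: when several commands run in one loop, each record in the log should carry the command, its stdout and stderr together, and its exit code, so a failure can be traced to the command that produced it. This is a minimal Python illustration of that idea, not the collect-logs task itself; `run_and_log` and the record layout are assumptions made for the example.

```python
import subprocess
import sys


def run_and_log(commands, log_path):
    """Run each argv list in `commands`, appending a labelled record to `log_path`.

    Returns the exit codes in the same order as `commands`.
    (Hypothetical helper, for illustration only.)
    """
    exit_codes = []
    with open(log_path, "a", encoding="utf-8") as log:
        for argv in commands:
            result = subprocess.run(
                argv,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,  # fold stderr into the same stream
                text=True,
                check=False,  # keep going so later commands are still logged
            )
            # Label the record so errors can be matched to their command.
            log.write("+ {}\n".format(" ".join(argv)))
            log.write(result.stdout)
            log.write("(exit code {})\n\n".format(result.returncode))
            exit_codes.append(result.returncode)
    return exit_codes


if __name__ == "__main__":
    codes = run_and_log([[sys.executable, "--version"]], "commands.log")
    print(codes)
```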
weshaychkumar|ruck, https://review.openstack.org/#/c/618092/1/roles/collect-logs/tasks/collect.yml13:48
chkumar|ruckweshay, https://github.com/openstack-infra/tripleo-ci/blob/475da5cdf82ed3db25d0ce2fca2140bdaffcea01/scripts/get_docker_logs.sh13:49
weshaychkumar|ruck, https://review.openstack.org/#/c/618092/1/roles/collect-logs/tasks/collect.yml13:51
*** jfrancoa has quit IRC13:51
ykarelchkumar|ruck, https://bugs.launchpad.net/tripleo/+bug/1803547 seems to be happening after weshay's patch: https://review.openstack.org/#/c/617845/13:51
openstackLaunchpad bug 1803547 in tripleo "tempest run failed with 'The specified regex doesn't match with anything ' on tripleo-ci-centos-7-scenario001-multinode-oooq-container" [Critical,Triaged] - Assigned to chandan kumar (chkumar246)13:51
ykarelas whitelist and blacklist for scenario001 seems same13:51
ykarelso no test13:51
*** amoralej|lunch is now known as amoralej13:52
ykarelso a real issue13:52
ykarela/it's a13:52
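Why an identical whitelist and blacklist leaves nothing to run can be shown with a small sketch. It assumes the usual include-then-exclude semantics (keep a test if it matches a whitelist regex and no blacklist regex); it is not tempest's own selection code, and the test IDs below are made up for the example.

```python
import re


def select_tests(test_ids, whitelist, blacklist):
    """Keep IDs that match any whitelist regex and no blacklist regex.

    (Hypothetical helper illustrating include-then-exclude filtering.)
    """
    include = [re.compile(pattern) for pattern in whitelist]
    exclude = [re.compile(pattern) for pattern in blacklist]
    return [
        test_id
        for test_id in test_ids
        if any(rx.search(test_id) for rx in include)
        and not any(rx.search(test_id) for rx in exclude)
    ]


if __name__ == "__main__":
    # Made-up test IDs, for illustration only.
    tests = ["tempest.scenario.test_a", "tempest.api.test_b"]
    same = [r"tempest\.scenario"]
    # Every whitelisted test is also blacklisted, so nothing is selected.
    print(select_tests(tests, same, same))
```

With the same pattern on both sides, every candidate the whitelist admits is immediately excluded, which is consistent with a "regex doesn't match with anything" style of failure.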
panda|roverhttps://tree.taiga.io/project/tripleo-ci-board/us/109  https://review.openstack.org/60728813:54
*** agopi|brb is now known as agopi13:54
weshaychkumar|ruck, https://cryptic-cliffs-32040.herokuapp.com/13:55
ykarelpanda|rover, did you get the missing logs in ovb?13:55
ssbarnea|bkp2ykarel: clearly the failure to include stderr is a bug.13:55
ssbarnea|bkp2ykarel: in fact i am going to rewrite the entire loop as it does not make sense to me.13:57
weshaychkumar|ruck, http://logs.openstack.org/13/617913/2/check/tripleo-ci-centos-7-scenario001-multinode-oooq-container/e232a20/logs/undercloud/home/zuul/skip_file.gz13:57
ykarelssbarnea|bkp2, but remember to see if it's working before merging :)13:57
chkumar|ruckpanda|rover, weshay see ya tomorrow14:02
chkumar|ruckbye14:02
*** chkumar|ruck has quit IRC14:02
*** ykarel is now known as ykarel|away14:04
*** jfrancoa has joined #oooq14:04
quiquellsshnaidm: Do you want to sync now ?14:08
panda|roverykarel|away: sorry was in meeting14:08
sshnaidmquiquell, https://bluejeans.com/u/sshnaidm/14:09
ykarel|awaypanda|rover, ack no issue14:09
panda|roverykarel|away: yes collect logs is broken14:31
panda|roverykarel|away: it's failing to get the overcloud nodes ips14:31
panda|roverkeystoneauth1.exceptions.connection.SSLError: SSL exception connecting to https://192.168.24.2:13000/v3/auth/tokens: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:579)14:32
quiquellsshnaidm: All progress in the taiga task https://tree.taiga.io/project/tripleo-ci-board/task/348?kanban-status=144727514:32
quiquellsshnaidm: also the github link to it14:32
*** ykarel|away has quit IRC14:34
sshnaidmquiquell, ack14:35
weshaypanda|rover, is there a taiga task for collect logs?14:37
panda|roverweshay: which part of collect logs ?14:37
panda|roverweshay: anyway, there is not task14:37
weshaypanda|rover, re: the overcloud nodes14:38
panda|roverweshay: it's a bug we just discovered14:38
*** vkapalav has joined #oooq14:40
panda|roveroh ykarel is gone, I'll open the bug then14:42
weshaygeez14:43
panda|roverweshay: https://bugs.launchpad.net/tripleo/+bug/180356114:45
openstackLaunchpad bug 1803561 in tripleo "Collect logs in OVB job fails to fetch overcloud nodes and their logs" [High,Triaged] - Assigned to Gabriele Cerami (gcerami)14:45
hubbot1FAILING CHECK JOBS on master: tripleo-ci-centos-7-scenario001-multinode-oooq-container, tripleo-ci-fedora-28-standalone @ https://review.openstack.org/604298, master: tripleo-ci-centos-7-scenario001-multinode-oooq-container, tripleo-ci-fedora-28-standalone @ https://review.openstack.org/560445, stable/queens: tripleo-ci-centos-7-scenario008-multinode-oooq-container @ https://review.openstack.org/567224,  (1 more message)14:49
weshaypanda|rover, so we don't lose track https://tree.taiga.io/project/tripleo-ci-board/issue/380?kanban-status=202773314:49
panda|roverweshay: ok14:50
weshaypanda|rover, it may be related to https://logs.rdoproject.org/33/614633/7/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master-vexxhost/ffd8061/logs/undercloud/home/zuul/tempest.log.txt.gz#_2018-11-15_12_12_2714:51
weshaynhicher, panda|rover are we able to put a hold on a vexxhost ovb deployment?14:51
weshaynhicher, so NONE of the nodes are cleaned up14:51
weshaynhicher, panda|rover or maybe we can run a reproducer live in that env14:51
panda|roverweshay: different problems, you pasted a network connection error, here the network works, but SSL exchange fails for the certificate. My first idea was to make the script ignore the certificate, but the script we have is using nova client, and if it worked so far, this may be a legit error.14:55
panda|rovertbh I don't know since when it started to fail14:56
nhicherweshay: we can hold only nodepool instances, so only undercloud14:57
panda|roverwe would need to instruct the te broker to not remove overcloud nodes.14:58
*** d0ugal has quit IRC14:58
weshaypanda|rover, nhicher to figure out wth is going ..  I would suggest launching an ovb by hand using the vexx creds14:59
weshaynhicher, is that possible?15:01
nhicherweshay: I can set undercloud in hold state and add yours keys on it if you want15:03
weshaynhicher, k.. and after a job launches there... panda|rover would probably have to turn off the tebroker15:04
weshaynhicher, so he'll need access to that15:04
nhicherweshay: ok, panda|rover can you give me your ssh pub key ?15:04
*** ykarel|away has joined #oooq15:05
*** jfrancoa has quit IRC15:24
*** jfrancoa has joined #oooq15:39
panda|roverall the OVB logs we have available for the last periodic runs are missing the overcloud for the same error, I can track up 24 Oct15:41
panda|rovernot easy to understand when this started15:41
panda|rovertrying with log search15:42
weshaypanda|rover, it's been going on for a while15:50
weshaysurprised no one noticed15:50
weshaypanda|rover, let's focus on the fixing it15:51
weshaynot finding when it started15:51
weshayI don't think we have logs that old15:51
rlandywoohoo - filter worked15:54
rlandyweshay: ^^15:54
rlandyzuul_legacy_vars15:54
weshayrlandy, oh nice15:57
panda|roverweshay: I'll put a workaround in our script, to disable SSL verification15:57
weshaypanda|rover, cool15:57
panda|rovernot sure if we even care to test SSL chain15:57
weshaynot w/ that script15:57
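For readers unfamiliar with what "disable SSL verification" involves, here is a minimal sketch using only Python's standard `ssl` module. It is not the change proposed in the review, and the script discussed above goes through the nova client rather than raw sockets; `make_ssl_context` is a hypothetical helper that only shows the two settings a client has to relax to get past a `CERTIFICATE_VERIFY_FAILED` error.

```python
import ssl


def make_ssl_context(verify=True):
    """Return a client SSL context, optionally with certificate checks off.

    (Hypothetical helper, for illustration only.)
    """
    ctx = ssl.create_default_context()
    if not verify:
        # Order matters: hostname checking has to be switched off before
        # verify_mode may be set to CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


if __name__ == "__main__":
    insecure = make_ssl_context(verify=False)
    print(insecure.verify_mode == ssl.CERT_NONE)
```

Turning verification off hides the symptom rather than explaining why the certificate stopped validating, which matches the framing of it as a workaround in the conversation.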
weshaypanda|rover, now that newton and ocata are dead15:58
weshaywe could probably just use openstack commands15:58
weshayto get the list of the overcloud nodes15:58
weshaybut I digress15:58
*** d0ugal has joined #oooq16:00
quiquellweshay, sshnaidm: will remove them from the cockpit16:04
weshayquiquell, what?16:04
quiquellweshay: ocata16:05
sshnaidmquiquell, yeah, it's a time16:06
sshnaidmquiquell, maybe pike too16:06
quiquellsshnaidm: ack16:07
quiquellThe train is coming16:07
panda|roverweshay: https://review.openstack.org/61820516:10
weshaypanda|rover, k.. awesome.. think you need to run check-rdo on it16:11
panda|roverweshay: already did16:15
panda|roverif gertty says the truth at least16:15
mariosweshay: rfolco o/ comments please on the design of the featureset_override, v18 @ https://review.openstack.org/#/c/616872/18/playbooks/tripleo-ci/run-v3.yaml and i updated the description in https://tree.taiga.io/project/tripleo-ci-board/task/363 to reflect our no longer needing a featureset and describing this featureset_override16:15
mariosweshay: i restricted with the path as we said, and also decided to not even expose the standalone_custom_env_files instead using a new/dedicated override we can re-use specifically for the scenario jobs (i.e. to address your concern weshay about allowing this override.) anyway wdyt16:16
weshayhrm.. /me looks16:17
marioshttps://review.openstack.org/#/c/616872/18/zuul.d/standalone-jobs.yaml here you specify your featureset_override and we process it in the run-v3 i pointed at there ^16:17
*** gkadam has quit IRC16:18
weshayhrm.. not sure if I grok this yet https://review.openstack.org/#/c/616872/18/playbooks/tripleo-ci/run-v3.yaml16:20
mariosweshay: this is the interface we expose: https://review.openstack.org/#/c/616872/18/zuul.d/standalone-jobs.yaml16:20
weshayya16:20
weshayand just the file name?16:20
mariosweshay: so for standalone scenario jobs we have this standalone_scenario_environment allowed override16:20
weshayof the env16:21
weshay-/\116:21
mariosweshay: i made it very specific, in line with the discussion, i.e. NOT general mechanism. specifically, for the scenario standalone override16:21
mariosweshay: so then they have in featureset_override: "standalone_scenario_environment: 'ci/environments/scenario001-standalone.yaml'" and i use a replace to make that into the standalone standalone_custom_env_files: [/usr/share/openstack-tripleo etc16:22
mariosweshay: makes sense?16:22
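The override described above (accept one narrowly scoped key, restrict it to a known directory, then rewrite it into the list variable the deployment expects) can be sketched in Python. This is an illustration of the idea only, not the run-v3.yaml implementation: `expand_override` is a hypothetical helper, and `TEMPLATES_DIR` is a placeholder because the conversation gives only the beginning of the real install path.

```python
import posixpath
import re

# Placeholder prefix: the chat only shows the start of the real path.
TEMPLATES_DIR = "/path/to/templates"

# Only plain file names directly under ci/environments are accepted.
ALLOWED = re.compile(r"ci/environments/[\w.-]+\.ya?ml")


def expand_override(override, templates_dir=TEMPLATES_DIR):
    """Turn the restricted override key into standalone_custom_env_files.

    (Hypothetical helper, for illustration only.)
    """
    rel = override.get("standalone_scenario_environment")
    if rel is None:
        return {}
    if not ALLOWED.fullmatch(rel):
        raise ValueError("override must be a file under ci/environments: %r" % rel)
    return {"standalone_custom_env_files": [posixpath.join(templates_dir, rel)]}


if __name__ == "__main__":
    print(expand_override(
        {"standalone_scenario_environment": "ci/environments/scenario001-standalone.yaml"}
    ))
```

Keeping the accepted pattern tight is what makes this a dedicated override rather than a general mechanism: anything outside `ci/environments` is rejected instead of being passed through.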
weshayk.. but why:  "standalone_scenario_environment: 'ci/environments/scenario001-standalone.yaml16:23
weshayand not16:23
weshay "standalone_scenario_environment: 'scenario001-standalone.yaml16:23
mariosweshay: right and then also assume ci/environments you mean?16:23
weshayI guess we want the option of taking from some other dir?16:23
mariosweshay: right but since we want to be very specific i think it fits16:23
mariosweshay: so we're saying the interface is to use this specific override for standalone scenario your env file lives here16:24
weshayI *think* so.. I think simplicity and being concise will help16:24
weshayit's not a major diff either way, but my focus is on usability16:24
weshayas I want this to go out to the world16:24
mariosweshay: yeah so we are passing this in the job (ie the regex works) http://logs.openstack.org/72/616872/17/check/tripleo-ci-centos-7-scenario001-standalone/b2804dd/logs/reproducer-quickstart/featureset-override.yaml and then the command looks like http://logs.openstack.org/72/616872/17/check/tripleo-ci-centos-7-scenario001-standalone/b2804dd/logs/undercloud/home/zuul/standalone.sh.txt.gz16:25
weshaymarios, maybe it's a new dir under ci/environments16:25
weshayto be clean16:25
weshayci/environments/standalone/*16:25
mariosweshay: sure but then its a comment at https://review.openstack.org/#/c/616592/ please?16:25
weshayaye16:25
*** chandankumar has joined #oooq16:26
mariosweshay: so you are mostly /first impression +1 on the idea of NOT exposing standalone_custom_env_files and instead having a thing we will use specifically for these standalone scenario jobs (that thing being "standalone_scenario_environment" or whatever else you want to call it16:27
*** d0ugal has quit IRC16:27
weshayI'm +1 on it for sure.. just now focused on the minor details of the implementation16:28
*** ykarel|away has quit IRC16:28
weshaymarios, on that note...16:28
weshaystandalone_scenario_environment may be too wordy16:28
mariosweshay: ack cool. i wasted a long time trying to make my regex match ' and " until i realised the to_nice_yaml here removes those https://github.com/openstack-infra/tripleo-ci/blob/master/playbooks/tripleo-ci/templates/featureset-override.j2#L1  .. so yeah details16:28
weshayand I think we can deploy w/o scenarios16:28
weshayright16:28
weshaystandalone_environment16:29
mariosweshay: sure man add comment on the review? or even standalone_scenario_env16:29
marioswhatever16:29
weshayya16:29
mariosi'll update all that tomorrow, undo the layout changes to add all the jobs back and lets merge it already. though the job is currently broken http://logs.openstack.org/72/616872/17/check/tripleo-ci-centos-7-scenario001-standalone/b2804dd/logs/undercloud/home/zuul/standalone_deploy.log.txt.gz16:29
mariosweshay: key thing is to work all this stuff out for scen1. then copy/paste16:30
*** chandankumar has quit IRC16:30
mariosweshay: i think requiring update for the directory is too much/not worth it16:35
mariosweshay: i'll just update my review to at least assume /ci/environments and just use alex filename scenario001-standalone.yml as is16:35
mariosweshay: k?16:35
weshayupdated the review16:36
*** agopi is now known as agopi|brb16:36
weshaymarios, ya.. that is fine16:36
mariosweshay: thank you16:36
weshaymarios, think about zuul triggers as you do this though16:36
*** udesale has quit IRC16:36
*** d0ugal has joined #oooq16:39
*** agopi|brb has quit IRC16:40
*** quiquell has quit IRC16:46
*** jfrancoa has quit IRC16:46
mariosweshay: rfolco v19 interface like https://review.openstack.org/#/c/616872/19/zuul.d/standalone-jobs.yaml https://review.openstack.org/#/c/616872/19/playbooks/tripleo-ci/run-v3.yaml16:47
mariosstandalone_environment lgtm16:48
hubbot1FAILING CHECK JOBS on master: tripleo-ci-centos-7-scenario001-multinode-oooq-container, tripleo-ci-fedora-28-standalone @ https://review.openstack.org/604298, master: tripleo-ci-centos-7-scenario001-multinode-oooq-container, tripleo-ci-fedora-28-standalone @ https://review.openstack.org/560445, stable/queens: tripleo-ci-centos-7-scenario008-multinode-oooq-container @ https://review.openstack.org/567224,  (1 more message)16:49
weshaymarios, cool.. nit on the commit message16:50
mariosrlandy: panda|rover can we catch up tomorrow :/ sorry it took longer than i expected16:50
weshayaight16:50
rlandysure16:50
weshayl8r16:50
mariosweshay: ack thanks. hopefully we can now focus on getting the job working cos it isn't currently but didn't investigate if its a known issue or something specific to scen1 standalone16:50
weshayya.. see that is what I was thinking :)16:51
mariosrlandy: hopefully i spend much less time on this tomorrow16:51
weshayI assume that will happen on all of them16:51
weshayaint nothing in this world for free16:51
mariossshnaidm: good call on the nomination16:59
mariosimo/fwiw16:59
*** d0ugal has quit IRC17:07
*** quiquell has joined #oooq17:18
*** agopi|brb has joined #oooq17:27
*** agopi|brb is now known as agopi17:28
weshaymarios, you still here?17:32
weshaymarios, just figure it out w/ alex17:32
*** sshnaidm is now known as sshnaidm|afk17:37
ssbarnea|bkp2marios: weshay : https://review.openstack.org/#/c/618021/ may look trivial but it fixes two duplicate key bugs and implements preventive measures.17:40
mariosweshay: not really, at the gym.. v22/3 was updated whilst jumping around like a lunatic. i think i will refrain from more till the morning and you guys decide ;)17:41
weshaymarios, np I will add a comment17:41
weshayfigured it out17:41
*** quiquell has quit IRC17:43
*** quiquell has joined #oooq17:43
*** chandankumar has joined #oooq17:44
*** quiquell is now known as quiquell|off17:48
*** trown is now known as trown|lunch17:48
*** saneax has quit IRC17:50
ssbarnea|bkp2https://review.openstack.org/#/c/618163/ bugfix on collect logs is ready for review https://review.openstack.org/#/c/618163/17:55
*** derekh is now known as derekh_afk18:01
panda|roverssbarnea|bkp2: \\\\\\\\\\!18:02
panda|roverssbarnea|bkp2: anyway I think there's another change about to merge that is adding another command there18:02
panda|roverssbarnea|bkp2: ${engine} version, and adding some spacing between the outputs in the log file18:03
panda|roverTengu: what's the change id for that ^ ?18:03
*** panda|rover is now known as panda|rover|off18:05
ssbarnea|bkp2panda|rover: well, we will find out. because I use "set +x" we should no longer need to do echo and separation.18:05
ssbarnea|bkp2panda|rover|off: still, I wonder where the resulting file is collected because I was not able to spot it yet. somehow I don't like the overly verbose ansible output -- really hard to read. i would personally do a: no_log: results.rc == 0 once I have the proof that the file is collected.18:07
panda|rover|offssbarnea|bkp2: you don't see it, it's in the cloud18:08
panda|rover|offsomewhere18:09
panda|rover|offssbarnea|bkp2: http://logs.openstack.org/63/618163/2/check/tripleo-ci-centos-7-containers-multinode/9687fcf/logs/undercloud/var/log/extra/docker/docker_allinfo.log.txt.gz18:10
panda|rover|offssbarnea|bkp2: no, that's not it18:11
panda|rover|offssbarnea|bkp2: mmh maybe we're really not collecting it18:13
ssbarnea|bkp2i know, everything is in the cloud...18:14
chandankumarOh it is someone else computer :-018:14
ssbarnea|bkp2when i will launch my own cloud, I will call it devnull, because is the most popular destination.18:16
chandankumarhehe18:16
*** chandankumar has quit IRC18:17
ssbarnea|bkp2any core I can persuade to workflow https://review.openstack.org/#/c/618021/ ?18:18
panda|rover|offssbarnea|bkp2: tempest_log_file was already removed by another patch18:23
ssbarnea|bkp2well, at the time I raised it it was still there.18:24
ssbarnea|bkp2and I think you do remember having a similar discussion a few weeks back, same kind of bug and based on suggestion I decided to split the fix in two, .... we all know that the preventive measure never managed to get merged.18:25
panda|rover|offssbarnea|bkp2: what's with all those repos in pre-commit-config ?18:26
ssbarnea|bkp2panda|rover|off:  this is how pre-commit hooks work, is similar to how we have requirements, the difference is that pre-commit ones are *always* pinned (for consistency),18:28
panda|rover|offssbarnea|bkp2: so every time I git commit, I download those repos ?18:29
ssbarnea|bkp2while pre-commit is a python project, it does not install stuff using wheels, uses only git+revision as this being the only portable way to install stuff (some repos are using nodejs for example, not ours)18:30
ssbarnea|bkp2nope, try it first and run twice and you will see what happens.18:30
panda|rover|offssbarnea|bkp2: IIRC it's not used by default ?18:31
ssbarnea|bkp2first time you use a repo+hash it will install it in your user cache, after this it will reuse it (even across different projects)18:31
panda|rover|offoh now it is18:31
panda|rover|offI have to install pre-commit to run tox ?18:31
ssbarnea|bkp2no need.18:32
ssbarnea|bkp2tox will install pre-commit inside linters, and use it.18:32
panda|rover|offit's in test-req18:32
panda|rover|offmmmhh, so we are fetching a bunch of random repos from github to run our linters ? :/18:33
ssbarnea|bkp2but you can install it if you want (and also activate the hook, again *optional*)18:33
ssbarnea|bkp2these are not random repos, all of them are official project git locations.18:33
ssbarnea|bkp2is not a 3rd party repo, is official repos for both bashate and yamllint.18:34
panda|rover|offssbarnea|bkp2: why not add yamllint in test-requirements ?18:35
ssbarnea|bkp2panda|rover|off: multiple reasons: one of them being that you would have to write a LOT of boilerplate code to implement the ability to check only the touched files, and this would only be the start.18:36
ssbarnea|bkp2think about pre-commit like a generic tox for linters, one that can run from inside tox.18:37
*** amoralej is now known as amoralej|off18:37
ssbarnea|bkp2i have another change that is switching from internal ansible-linter to pre-commit ansible-linter making it 10x faster on each execution.18:38
panda|rover|offssbarnea|bkp2: internal ?18:38
panda|rover|offssbarnea|bkp2: you're tying a linters job in openstack to github.com availability  and named repos, not using packages, just running unverified code on CI node ...18:39
panda|rover|offbashate should use openstack git18:41
panda|rover|offof all this, is the only thing that I don't like ... it's triggering my SysAdmin sensors18:42
ssbarnea|bkp2panda|rover|off:  check http://codesearch.openstack.org/?q=github.com%2F&i=nope&files=.txt&repos= -- try to skip the openstack ones, you will find lots.18:44
ssbarnea|bkp2but at the same time, i do understand your worries, i've seen some ugly things in the past, especially around private repos....18:44
ssbarnea|bkp2i think is more about *what* we point to: a pypi package that is not pinned has same security as a git one.18:45
ssbarnea|bkp2nice part about pre-commit is that you can put hashes, making it close to impossible to hack.18:46
panda|rover|offssbarnea|bkp2: I see lots of mentions in documentation18:47
panda|rover|offssbarnea|bkp2: comments18:48
panda|rover|offssbarnea|bkp2: release file for mitaka18:48
ssbarnea|bkp2regarding using git.openstack.org instead of github.com - is questionable: i find github.com considerably faster than our infra, and I also like the idea of using M$ money to lower our git server loads.18:48
hubbot1FAILING CHECK JOBS on master: tripleo-ci-centos-7-scenario001-multinode-oooq-container, tripleo-ci-fedora-28-standalone @ https://review.openstack.org/604298, master: tripleo-ci-centos-7-scenario001-multinode-oooq-container, tripleo-ci-fedora-28-standalone @ https://review.openstack.org/560445, stable/queens: tripleo-ci-centos-7-scenario008-multinode-oooq-container @ https://review.openstack.org/567224,  (1 more message)18:49
panda|rover|offssbarnea|bkp2: I think it's best practice at this point to use git.openstack when we can, even if it's slower18:50
panda|rover|offanyway, going home. I'm not blocking your review18:51
ssbarnea|bkp2panda|rover|off: sure, i don't have anything against that. can fix it quickly, other remarks?  personally I would prefer to merge it like this and fix it later as it already got the magic votes on it.18:51
ssbarnea|bkp2but if you want I can also include the ansible-lint part in it and restart from zero :(18:52
ssbarnea|bkp2just observed that the bashate one was already like this, was not part of this change.18:56
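Editor's sketch of the caching behaviour described above ("first time you use a repo+hash it will install it in your user cache, after this it will reuse it"). This is not pre-commit's actual implementation; the function names, the cache layout and the example URL are all hypothetical and only illustrate the idea of keying a cache on a pinned (repo, rev) pair.

```python
import hashlib
import tempfile
from pathlib import Path


def cache_dir_for(cache_root: Path, repo_url: str, rev: str) -> Path:
    """Hypothetical helper: map a pinned (repo, rev) pair to one cache directory."""
    key = hashlib.sha256(f"{repo_url}@{rev}".encode("utf-8")).hexdigest()[:16]
    return cache_root / key


def ensure_installed(cache_root: Path, repo_url: str, rev: str, install):
    """Install on first use and reuse afterwards.

    `install` is a caller-supplied callable that receives the target directory.
    Returns (path, installed_now).
    """
    target = cache_dir_for(cache_root, repo_url, rev)
    if target.exists():
        # Same repo and same pinned rev: reuse, no second download.
        return target, False
    target.mkdir(parents=True)
    install(target)
    return target, True


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    installs = []
    # Placeholder URL and rev, not a real hook repository.
    for _ in range(2):
        path, fresh = ensure_installed(
            root, "https://example.invalid/hook", "abc123", installs.append
        )
        print(path.name, "installed now" if fresh else "reused from cache")
```

Because the key includes the rev, changing the pin produces a new cache entry, while any project that pins the same repo and rev under the same cache root shares the existing one.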
*** trown|lunch is now known as trown18:56
weshaypanda|rover|off, updated your patch to https://review.openstack.org/#/c/614633/19:53
weshayso we can test w/ vexx19:54
*** sshnaidm|afk is now known as sshnaidm|off20:00
*** derekh_afk has quit IRC20:07
ssbarnea|bkp2weshay: any chance to convince you about https://review.openstack.org/#/c/618021/ ? -- i will take care of potential rebasing when it goes in.20:25
weshayssbarnea|bkp2, did you see my comment?20:25
ssbarnea|bkp2weshay: now I did, and the answer: http://codesearch.openstack.org/?q=https%3A%2F%2Fgithub.com%2Fpre-commit%2Fpre-commit-hooks&i=nope&files=&repos=20:27
ssbarnea|bkp2mainly 4 projects, with me involved in some of them but not all.20:27
ssbarnea|bkp2nice comment! btw.20:27
weshay:)20:29
*** agopi has quit IRC20:37
*** agopi has joined #oooq20:38
weshayssbarnea|bkp2, think you can setup openstack mirroring for that repo?20:48
hubbot1FAILING CHECK JOBS on master: tripleo-ci-centos-7-scenario001-multinode-oooq-container, tripleo-ci-fedora-28-standalone @ https://review.openstack.org/604298, stable/pike: tripleo-ci-centos-7-scenario002-multinode-oooq-container @ https://review.openstack.org/602248, master: tripleo-ci-centos-7-scenario001-multinode-oooq-container, tripleo-ci-fedora-28-standalone, tripleo-ci-centos-7-scenario007-multinode-oooq- (1 more message)20:49
ssbarnea|bkp2weshay: i have no idea, but i can look for it.21:08
weshayssbarnea|bkp2, ya.. I think it's just a config file21:08
weshayssbarnea|bkp2, using github is bad21:08
ssbarnea|bkp2not sure why it is bad, only case would be throttling from github but i didn't ever see that on openstack. still, I will look into it. if adoption explodes and we lack caching it may be needed.21:10
ssbarnea|bkp2i am going offline now, if this merges i will be busy tomorrow, fixing others' CRs. but it would be worth the effort.21:11
*** dtrainor__ has quit IRC21:26
weshayssbarnea|bkp2, go ask in infra if you want more details21:27
weshaybut openstack tests failing when github is down or dos attacked happens21:27
weshayand that's bad21:27
weshayopenstack tests failing because git.openstack is down.. is obvious21:27
weshayssbarnea|bkp2, get it21:27
weshayssbarnea|bkp2, aye. .thanks for the hands and eyes21:27
*** d0ugal has joined #oooq21:46
*** d0ugal has quit IRC22:06
*** trown is now known as trown|outtypewww22:13
*** derekh has joined #oooq22:23
*** derekh is now known as derekh_afk22:38
hubbot1FAILING CHECK JOBS on master: tripleo-ci-centos-7-scenario001-multinode-oooq-container, tripleo-ci-fedora-28-standalone @ https://review.openstack.org/604298, stable/pike: tripleo-ci-centos-7-scenario002-multinode-oooq-container @ https://review.openstack.org/602248, master: tripleo-ci-centos-7-scenario001-multinode-oooq-container, tripleo-ci-fedora-28-standalone, tripleo-ci-centos-7-scenario007-multinode-oooq- (1 more message)22:49
*** d0ugal has joined #oooq22:53
*** saneax has joined #oooq23:10
*** agopi is now known as agopi|brb23:19
*** agopi|brb has quit IRC23:24
*** vkapalav has quit IRC23:38
*** agopi|brb has joined #oooq23:49
