Monday, 2018-01-29

*** bobh has joined #openstack-infra00:01
*** dingyichen has quit IRC00:02
*** dingyichen has joined #openstack-infra00:04
*** bobh has quit IRC00:06
*** dhill_ has quit IRC00:07
*** abelur_ has joined #openstack-infra00:14
*** lbragstad has joined #openstack-infra00:14
openstackgerritIan Wienand proposed openstack/diskimage-builder master: Export DIB_ROOT_LABEL from final state  https://review.openstack.org/53227900:19
*** bobh has joined #openstack-infra00:20
*** tosky has quit IRC00:23
*** bobh has quit IRC00:24
*** salv-orlando has quit IRC00:29
*** bobh has joined #openstack-infra00:31
openstackgerritMerged openstack-infra/elastic-recheck master: Add query for stable branch cellsv1 job libvirt crash bug 1745838  https://review.openstack.org/53861400:34
openstackbug 1745838 in OpenStack Compute (nova) "legacy-tempest-dsvm-cells constantly failing on stable pike and ocata due to libvirt connection reset" [Undecided,New] https://launchpad.net/bugs/174583800:34
*** bobh has quit IRC00:35
*** bobh has joined #openstack-infra00:40
*** bobh has quit IRC00:45
*** kiennt26 has joined #openstack-infra00:46
*** gcb has joined #openstack-infra00:53
*** hemna has quit IRC00:55
jlvillalAny ideas on why this patch isn't moving to the 'gate'? https://review.openstack.org/#/c/537972/00:57
*** cuongnv has joined #openstack-infra00:57
jlvillalIt has CR +2, V +1, WF +100:57
jlvillalDoesn't appear to depend on any un-merged patches.00:57
* jlvillal tries a recheck00:59
*** hemna has joined #openstack-infra01:01
corvusjlvillal: the w+1 event may have been missed due to the zuul outage earlier.  a second w+1 or toggling the existing w+1 would enqueue it; or if no core reviewers are available, recheck works (though it'll go through check again first)01:02
jlvillalcorvus, I did try doing a W -1, CR -201:03
jlvillalcorvus, And then undoing it. Didn't seem to take.01:03
jlvillalcorvus, So I'll see if the recheck works01:03
jlvillalThanks01:03
corvuswell, it's in check now, but if that didn't work, it may not move into gate.  i'll take a look and see if i can spot what was missing01:04
corvusjlvillal: oh, it needs a w+1.01:07
corvusjlvillal: if you just go ahead and give it that now, it'll go into gate01:07
jlvillalcorvus, It has a W+1 right now01:07
jlvillalNot by me01:07
corvusjlvillal: right, it needs to see the event01:07
jlvillalOKay. I'll try :)01:08
jlvillalcorvus, I see it in the 'gate' now. Thanks.01:08
corvusjlvillal: it's event driven, so the things that put a change into gate are either w+1 or v+101:09
jlvillalcorvus, Ah, okay. Thanks01:09
corvus(if a matching event happens, then it checks the other requirements)01:09
jlvillalcorvus, If I removed my W+1, would it stop? Or keep going?01:09
* jlvillal just curious01:10
corvusjlvillal: it'll keep going.  will still need at least one +1 on there when it's done to merge.01:10
jlvillalcorvus, Okay.01:11
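For context on the exchange above: Zuul's gate pipeline is event driven, so a change is only considered for gating when a matching Gerrit event arrives (such as a fresh Workflow +1 or Zuul's own Verified +1 comment); the votes already on the change are then checked as requirements. A minimal sketch of how such a pipeline is typically configured is below — the exact openstack-infra definition differs, so treat names and values as illustrative:

    - pipeline:
        name: gate
        manager: dependent
        require:
          gerrit:
            open: True
            current-patchset: True
            approval:
              # votes the change must already carry
              - Verified: [1, 2]
                username: zuul
              - Workflow: 1
        trigger:
          gerrit:
            # events that cause Zuul to (re-)evaluate the change for gating
            - event: comment-added
              approval:
                - Workflow: 1
            - event: comment-added
              approval:
                - Verified: 1
              username: zuul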
*** liujiong has joined #openstack-infra01:19
*** gcb has quit IRC01:21
*** liujiong has quit IRC01:27
*** gongysh has joined #openstack-infra01:29
*** salv-orlando has joined #openstack-infra01:29
*** salv-orlando has quit IRC01:35
*** mriedem has quit IRC01:37
*** b_bezak has joined #openstack-infra01:59
*** b_bezak has quit IRC02:04
*** hongbin has joined #openstack-infra02:06
*** spligak_ has joined #openstack-infra02:11
*** spligak has quit IRC02:12
*** armax has joined #openstack-infra02:14
*** cuongnv has quit IRC02:27
*** cuongnv has joined #openstack-infra02:28
*** salv-orlando has joined #openstack-infra02:31
*** salv-orlando has quit IRC02:35
*** jamesmcarthur has joined #openstack-infra02:37
*** sshnaidm has quit IRC02:40
*** jamesmcarthur has quit IRC02:47
*** jamesmcarthur has joined #openstack-infra02:47
*** yamamoto has joined #openstack-infra02:58
*** gongysh has quit IRC03:03
*** rcernin has quit IRC03:08
*** harlowja has joined #openstack-infra03:11
openstackgerritNguyen Van Trung proposed openstack-dev/hacking master: Drop py34 target in tox.ini  https://review.openstack.org/53873103:14
openstackgerritIan Wienand proposed openstack/diskimage-builder master: [DNM] Separate initial state and create state  https://review.openstack.org/53873203:15
*** xinliang has quit IRC03:16
*** xinliang has joined #openstack-infra03:16
*** harlowja has quit IRC03:18
*** dave-mccowan has quit IRC03:20
*** olaph has joined #openstack-infra03:20
*** stakeda has joined #openstack-infra03:21
*** olaph1 has quit IRC03:22
*** jamesmcarthur has quit IRC03:25
*** jamesmcarthur has joined #openstack-infra03:26
*** jgwentworth is now known as melwitt03:26
*** jamesmcarthur has quit IRC03:31
*** salv-orlando has joined #openstack-infra03:31
*** cshastri has joined #openstack-infra03:32
*** salv-orlando has quit IRC03:36
*** annp has joined #openstack-infra03:40
*** jamesmcarthur has joined #openstack-infra03:56
*** jamesmcarthur has quit IRC04:03
*** rcernin has joined #openstack-infra04:04
*** hongbin has quit IRC04:04
*** jamesmcarthur has joined #openstack-infra04:09
openstackgerritIan Wienand proposed openstack/diskimage-builder master: Set default label for XFS disks  https://review.openstack.org/53227904:11
*** olaph1 has joined #openstack-infra04:13
*** olaph has quit IRC04:13
*** owalsh_ has joined #openstack-infra04:13
*** jamesmcarthur has quit IRC04:14
*** jamesmcarthur has joined #openstack-infra04:16
*** owalsh has quit IRC04:17
*** psachin has joined #openstack-infra04:17
*** sree has joined #openstack-infra04:19
*** jamesmcarthur has quit IRC04:21
*** ykarel has joined #openstack-infra04:21
*** bhagyashris has quit IRC04:22
EmilienMI keep having post failures on http://logs.openstack.org/12/538012/1/check/build-openstack-releasenotes/245ceec/job-output.txt.gz04:22
EmilienMnot sure what to do really04:22
EmilienMdmsimard, pabelanger^ is that related to the fsck thing you're running?04:23
*** xarses has quit IRC04:24
*** xarses has joined #openstack-infra04:25
*** rosmaita has quit IRC04:26
*** jamesmcarthur has joined #openstack-infra04:29
*** pgadiya has joined #openstack-infra04:30
*** daidv has joined #openstack-infra04:31
*** salv-orlando has joined #openstack-infra04:32
*** jamesmcarthur has quit IRC04:35
ianwEmilienM: hmm, is that the remote hanging up?04:36
*** salv-orlando has quit IRC04:37
EmilienMianw: not sure tbh04:37
ianwrax-ord ... are your failures consistent in that region maybe?04:37
EmilienMianw: consistent04:38
EmilienMI did 10 rechecks04:38
EmilienMalways failing04:38
EmilienMI'll see tomorrow, cheers04:38
ianwnope, http://logs.openstack.org/12/538012/1/check/build-openstack-releasenotes/74c04f7/zuul-info/inventory.yaml for example is different region04:38
*** jamesmcarthur has joined #openstack-infra04:38
ianw ubuntu-xenial | Error: Cannot find source directory `/home/zuul/src/git.openstack.org/openstack/os-net-config/releasenotes/source'.04:39
ianwi think that's the root cause04:39
*** jamesmcarthur has quit IRC04:42
*** armax has quit IRC04:49
*** pgadiya has quit IRC04:50
*** pgadiya has joined #openstack-infra04:51
*** jamesmcarthur has joined #openstack-infra04:59
*** jamesmcarthur has quit IRC05:03
*** pgadiya has quit IRC05:03
*** ramishra has joined #openstack-infra05:06
*** claudiub|2 has joined #openstack-infra05:07
*** hongbin has joined #openstack-infra05:09
*** jamesmcarthur has joined #openstack-infra05:12
openstackgerritMatthew Treinish proposed openstack-infra/storyboard master: Make notification driver configurable  https://review.openstack.org/53857405:16
openstackgerritMatthew Treinish proposed openstack-infra/storyboard master: WIP: Add MQTT notification publisher  https://review.openstack.org/53857505:16
*** pgadiya has joined #openstack-infra05:16
*** jamesmcarthur has quit IRC05:17
openstackgerritMerged openstack/diskimage-builder master: Don't install dmidecode on Fedora ppc64le  https://review.openstack.org/53685205:23
openstackgerritMerged openstack/diskimage-builder master: Add support for Fedora 27, remove EOL Fedora 25  https://review.openstack.org/53675905:23
*** jamesmcarthur has joined #openstack-infra05:27
*** hongbin has quit IRC05:28
*** jamesmcarthur has quit IRC05:32
*** salv-orlando has joined #openstack-infra05:33
*** jamesmcarthur has joined #openstack-infra05:36
*** janki has joined #openstack-infra05:37
*** salv-orlando has quit IRC05:37
*** jamesmcarthur has quit IRC05:41
*** agopi|out has quit IRC05:44
*** jamesmcarthur has joined #openstack-infra05:46
*** wolverineav has joined #openstack-infra05:49
*** jamesmcarthur has quit IRC05:52
*** jamesmcarthur has joined #openstack-infra06:01
*** xinliang has quit IRC06:02
*** e0ne has joined #openstack-infra06:04
*** jamesmcarthur has quit IRC06:06
*** chenying has joined #openstack-infra06:12
*** chenying_ has quit IRC06:12
*** jamesmcarthur has joined #openstack-infra06:13
*** xinliang has joined #openstack-infra06:14
*** xinliang has quit IRC06:14
*** xinliang has joined #openstack-infra06:14
*** jamesmcarthur has quit IRC06:18
*** jamesmcarthur has joined #openstack-infra06:23
*** ramishra has quit IRC06:27
*** jamesmcarthur has quit IRC06:27
*** ramishra has joined #openstack-infra06:28
*** salv-orlando has joined #openstack-infra06:34
*** jamesmcarthur has joined #openstack-infra06:35
*** salv-orlando has quit IRC06:38
*** salv-orlando has joined #openstack-infra06:38
*** jamesmcarthur has quit IRC06:40
*** yolanda_ has quit IRC06:44
*** dsariel has joined #openstack-infra06:47
*** jamesmcarthur has joined #openstack-infra06:48
*** jamesmcarthur has quit IRC06:53
*** e0ne has quit IRC06:56
*** makowals has joined #openstack-infra06:56
*** e0ne has joined #openstack-infra06:58
*** jamesmcarthur has joined #openstack-infra07:00
*** jamesmcarthur has quit IRC07:05
*** jamesmcarthur has joined #openstack-infra07:06
*** rcernin has quit IRC07:11
*** jamesmcarthur has quit IRC07:11
*** e0ne has quit IRC07:11
*** pcichy has joined #openstack-infra07:12
*** jamesmcarthur has joined #openstack-infra07:16
*** pcichy has quit IRC07:17
*** jamesmcarthur has quit IRC07:21
*** namnh has joined #openstack-infra07:21
*** armaan has joined #openstack-infra07:26
*** jamesmcarthur has joined #openstack-infra07:27
*** matbu has quit IRC07:29
*** jamesmcarthur has quit IRC07:31
*** andreas_s has joined #openstack-infra07:32
*** ramishra has quit IRC07:32
*** salv-orlando has quit IRC07:33
*** jamesmcarthur has joined #openstack-infra07:33
*** slaweq has joined #openstack-infra07:34
*** ramishra has joined #openstack-infra07:35
*** ykarel is now known as ykarel|lunch07:35
*** olaph has joined #openstack-infra07:37
*** olaph1 has quit IRC07:37
*** jamesmcarthur has quit IRC07:38
*** slaweq has quit IRC07:38
*** slaweq has joined #openstack-infra07:38
*** gcb has joined #openstack-infra07:39
*** jamesmcarthur has joined #openstack-infra07:39
*** slaweq has quit IRC07:40
*** slaweq has joined #openstack-infra07:40
*** pcaruana has joined #openstack-infra07:44
*** jamesmcarthur has quit IRC07:44
*** florianf has joined #openstack-infra07:48
*** jamesmcarthur has joined #openstack-infra07:50
*** jamesmcarthur has quit IRC07:54
*** salv-orlando has joined #openstack-infra07:55
*** jamesmcarthur has joined #openstack-infra08:00
*** jtomasek has joined #openstack-infra08:01
*** gongysh has joined #openstack-infra08:03
*** jamesmcarthur has quit IRC08:04
*** kjackal has joined #openstack-infra08:08
*** gongysh has quit IRC08:08
*** b_bezak has joined #openstack-infra08:09
*** links has joined #openstack-infra08:09
openstackgerritDuong Ha-Quang proposed openstack-infra/project-config master: Remove legacy jobs in tap-as-a-service  https://review.openstack.org/51322808:14
*** jamesmcarthur has joined #openstack-infra08:15
*** zhenguo has joined #openstack-infra08:16
*** ralonsoh has joined #openstack-infra08:20
*** jamesmcarthur has quit IRC08:20
*** d0ugal has quit IRC08:22
*** alexchadin has joined #openstack-infra08:25
*** AJaeger has quit IRC08:26
*** tesseract has joined #openstack-infra08:26
prometheanfireI probably need some zuul help, I'm not sure what's causing the seemingly random fails, but something is...08:27
*** d0ugal has joined #openstack-infra08:27
prometheanfiresee https://review.openstack.org/536793 for example08:28
*** jamesmcarthur has joined #openstack-infra08:28
prometheanfirehttps://review.openstack.org/537645 too08:28
prometheanfireI wonder if meltdown mitigation patches slowed down gate and we are running against that08:29
prometheanfireevrardjp: we should probably talk here :P08:29
prometheanfireevrardjp: thanks for looking into it, tony set up the gate initially08:30
*** AJaeger has joined #openstack-infra08:31
prometheanfireevrardjp: where do you see those stats?08:31
evrardjpI am just checking http://zuul.openstack.org/builds.html08:32
evrardjpfor the patch you've shown me needing a few rechecks, it was due to many reasons08:32
*** jamesmcarthur has quit IRC08:33
evrardjpso I don't think it's a timeout thing that deserves changing08:33
evrardjpI dug a little deeper into the numbers, and you can see that one of your offenders could theoretically be the job: cross-neutron-py3508:33
evrardjp(enter that and the project: openstack/requirements )08:33
evrardjpyou'll see it's not a big offender, and I don't think you should change your limits right now, it should be fine.08:34
evrardjpif it becomes too many timeouts, then maybe a little deeper investigation would be wise. Then maybe change the timeout limits if nothing else can be changed.08:34
evrardjpand that's a maybe08:35
evrardjp(and only for that job)08:35
evrardjpwell that's what I'd do08:35
evrardjpbut after quickly checking your running jobs, I saw one of them had a post-merge issue08:35
evrardjpthat's something that could interest infra: http://logs.openstack.org/c5/c5053646aa1bbbd0b2f2a5b269ddb42a9f29d49e/post/publish-openstack-python-branch-tarball/8205ccb/job-output.txt.gz#_2018-01-29_08_15_27_78274008:36
evrardjpit looks like the branch tarball got a (maybe temporary?) issue.08:36
*** efoley has joined #openstack-infra08:37
*** ykarel|lunch is now known as ykarel08:37
*** sshnaidm has joined #openstack-infra08:37
evrardjpjust saying, I am no infra person, and all.08:38
*** jpich has joined #openstack-infra08:41
*** masber has quit IRC08:41
*** masber has joined #openstack-infra08:42
*** jamesmcarthur has joined #openstack-infra08:42
*** apetrich has quit IRC08:44
*** salv-orlando has quit IRC08:45
*** dingyichen has quit IRC08:46
*** jamesmcarthur has quit IRC08:47
*** jpena|off is now known as jpena08:48
openstackgerritChandan Kumar proposed openstack-infra/project-config master: Added check-requirements and publish-to-pypi jobs  https://review.openstack.org/53883808:48
*** makowals has quit IRC08:50
*** jamesmcarthur has joined #openstack-infra08:50
*** olaph has quit IRC08:50
*** amoralej|off is now known as amoralej08:53
*** olaph has joined #openstack-infra08:54
*** jamesmcarthur has quit IRC08:54
*** rossella_s has joined #openstack-infra08:56
*** jamesmcarthur has joined #openstack-infra08:57
AJaegerprometheanfire, evrardjp, what's the timeout for those jobs? Is it the default?08:58
*** alexchadin has quit IRC08:58
*** makowals has joined #openstack-infra08:58
prometheanfireAJaeger: ya, default08:58
*** alexchadin has joined #openstack-infra08:58
AJaegeryou might want to follow https://review.openstack.org/537016 to set it to 2400s08:58
AJaegerprometheanfire: shall I send you a patch?08:59
prometheanfireAJaeger: that'd be awesome08:59
prometheanfireI could probably figure it out with that example, though08:59
prometheanfirelet me do it, good learning08:59
*** d0ugal has quit IRC09:00
AJaegerprometheanfire: https://review.openstack.org/538842 done already - sorry, read your message too late09:01
prometheanfireAJaeger: https://gist.github.com/e7dc44d57043c9fa3e1204341c8829c809:01
prometheanfirelol09:01
*** jamesmcarthur has quit IRC09:01
*** efoley has quit IRC09:01
AJaegerprometheanfire: but can abandon ;)09:01
AJaegerprometheanfire: your call..09:01
*** rossella_s has quit IRC09:01
*** rfolco|off is now known as rfolco09:01
*** salv-orlando has joined #openstack-infra09:02
AJaegerprometheanfire: yeah, that works - I just put it earlier :)09:02
prometheanfirenah, just tell me if my patch would do it09:02
prometheanfirecool09:02
AJaegerprometheanfire: so, get that change in quickly to avoid some problems.09:02
prometheanfireya09:02
AJaegerregarding the tarball: That might be a node going down, some infra-root needs to check that later09:03
prometheanfirenot sure you could accelerate that, up to you09:03
AJaegerprometheanfire, no, I cannot09:03
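For anyone hitting the same default-timeout problem: the fix being discussed is a per-job timeout override carried in the project's own Zuul configuration. A rough sketch of that kind of change is below — the job name is taken from the conversation above and the exact layout of https://review.openstack.org/538842 may differ:

    - project:
        check:
          jobs:
            - cross-neutron-py35:
                timeout: 2400
        gate:
          jobs:
            - cross-neutron-py35:
                timeout: 2400

The value is in seconds, so 2400 gives the job 40 minutes instead of the default.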
*** e0ne has joined #openstack-infra09:04
*** rossella_s has joined #openstack-infra09:05
jianghuawhi, I hit an error on a compute node when using devstack to set up a multi-node env. I think the problem is that we need to include libpcre3-dev as a general prerequisite package.09:06
jianghuawAs python-pcre is not added in the upper-constraints.txt and python-pcre depends on libpcre3-dev.09:06
jianghuawhttps://github.com/openstack/requirements/blob/master/upper-constraints.txt#L44009:06
jianghuawI created a patch here to fix it: https://review.openstack.org/#/c/538841/09:06
jianghuawcould you help to have a look?09:07
*** giblet is now known as gibi_09:07
jianghuawThanks. My multi-node testing job is broken due to the above problem. At the moment I install this package in the image.09:08
prometheanfireI thought we just removed python-pcre from requirements09:08
prometheanfiremaybe that was another conversation09:09
jianghuawprometheanfire, I'm not sure why python-pcre is added.09:09
jianghuawok.09:09
jianghuawalready there is some discussion on it.09:09
jianghuawDo we have a conclusion? And who is removing it?09:10
prometheanfireit doesn't look like it was removed, just never added09:11
prometheanfireanyway, it's after 3 and I have to sleep some time09:11
*** jamesmcarthur has joined #openstack-infra09:11
*** finucannot is now known as sfinucan09:12
*** sfinucan is now known as stephenfin09:12
jianghuawprometheanfire, ok. Thanks. But that's really in https://github.com/openstack/requirements/blob/master/upper-constraints.txt#L44009:13
prometheanfireoh, maybe I missed it (was just looking at recent commits)09:15
prometheanfirethat was added or changed 2 months ago09:15
*** matbu has joined #openstack-infra09:15
prometheanfirehttps://github.com/openstack/requirements/commit/89cebce27b7bd84260ea8e01a3fff1b64851e41d added as a dep from another module it looks like09:16
*** jamesmcarthur has quit IRC09:16
jianghuawyes09:16
prometheanfireanyway, time for me to sleep09:17
jianghuawThanks anyway:-)09:17
*** owalsh_ has quit IRC09:17
*** owalsh has joined #openstack-infra09:17
*** makowals has quit IRC09:18
*** pcichy has joined #openstack-infra09:19
*** dbecker has joined #openstack-infra09:19
*** d0ugal has joined #openstack-infra09:20
*** derekh has joined #openstack-infra09:23
*** kiennt26 has quit IRC09:24
*** abelur_ has quit IRC09:24
jianghuawAJaeger, added you to review my patch: https://review.openstack.org/#/c/538841/09:24
jianghuawThanks in advance:-)09:25
AJaegerjianghuaw: I'm not a devstack core, better ask on #openstack-qa.09:25
*** kopecmartin has joined #openstack-infra09:26
*** jamesmcarthur has joined #openstack-infra09:26
jianghuawAJaeger, thanks.09:26
*** apetrich has joined #openstack-infra09:28
*** yamahata has quit IRC09:28
AJaegerjianghuaw: I commented nevertheless...09:28
jianghuawthanks. Indeed that should be *now*. My bad:-)09:29
*** yamamoto has quit IRC09:30
*** jamesmcarthur has quit IRC09:31
*** jamesmcarthur has joined #openstack-infra09:32
*** s-shiono has quit IRC09:33
*** e0ne has quit IRC09:35
jianghuawAJaeger, uploaded a new PS.09:35
*** jamesmcarthur has quit IRC09:37
openstackgerritMasahito Muroi proposed openstack-infra/project-config master: Add publish-to-pypi in blazar-nova repo  https://review.openstack.org/53818509:42
*** shardy has joined #openstack-infra09:43
evrardjprcarrillocruz: mordred: jeblair: I am not sure where you are with integrating tests for PRs in ansible's github repo into our zuul jobs, but I started to work on this real quick: https://review.openstack.org/#/c/538856/ to add a job on our side to be able to test ansible modules.09:44
evrardjpwhen one of you is available, ping me to know if I should continue or not.09:44
evrardjpI just wanted to have a draft, get it working, and adapt that pattern if need be. Improvements of speed can come later.09:45
*** makowals has joined #openstack-infra09:45
*** jamesmcarthur has joined #openstack-infra09:45
*** makowals has quit IRC09:45
AJaegerevrardjp: read backscroll about https://review.openstack.org/537955 - that'S all I know...09:45
*** efoley has joined #openstack-infra09:46
evrardjpoh there are already devstack tests apparently. Maybe I am too late in the game.09:46
AJaegerevrardjp: best to talk to mordred and corvus later09:47
*** yamamoto has joined #openstack-infra09:47
evrardjpAJaeger: yup, thanks!09:47
chandankumarAJaeger: Hello, Thanks for working on tempest-lib project removal :-)09:49
chandankumarAJaeger: https://review.openstack.org/#/c/538838/09:50
*** jamesmcarthur has quit IRC09:50
chandankumarAJaeger: i have added openstackci as a owner for uploading the package to pypi09:51
chandankumarfor python-tempestconf09:52
AJaegerchandankumar: pypi.org/pypi/python-tempestconf does not exist09:52
AJaegerplease double check ^09:52
*** jamesmcarthur has joined #openstack-infra09:52
*** erlon_ has quit IRC09:53
*** pblaho has joined #openstack-infra09:53
chandankumarAJaeger: https://pypi.python.org/pypi/python-tempestconf09:53
chandankumarrefresh again09:54
AJaegeryeah, looks fine now...09:54
AJaegerchandankumar: home page looks wrong - but next release will fix that ;)09:55
chandankumarAJaeger: https://review.openstack.org/#/c/538840/ for new release09:55
AJaegerchandankumar: let's update setup.cfg first, please. That's sooo wrong for an official openstack project09:56
*** jamesmcarthur has quit IRC09:57
*** makowals has joined #openstack-infra09:58
*** e0ne has joined #openstack-infra10:00
chandankumarAJaeger: sorry i did not get that, you mean adding version metadata in setup.cfg, or updating the Red Hat stuff part?10:01
*** kopecmartin has quit IRC10:01
AJaegerchandankumar: yes, remove Red Hat stuff. home page, mailing list, author look wrong10:01
*** sree has quit IRC10:01
chandankumarAJaeger: sure10:02
*** sree has joined #openstack-infra10:02
AJaegerchandankumar: you could even publish to docs.o.o (no job setup currently) - if you have docs for that.10:02
*** jamesmcarthur has joined #openstack-infra10:03
chandankumarAJaeger: currently we do not have many docs, but docs will be added in the future; then we can publish them10:07
chandankumarAJaeger: https://review.openstack.org/#/c/538862/10:07
*** jamesmcarthur has quit IRC10:09
*** namnh has quit IRC10:09
*** namnh has joined #openstack-infra10:09
*** sree has quit IRC10:11
AJaegerchandankumar: thanks!10:14
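For reference, the project-config change being discussed follows the usual pattern of attaching job templates to a project; a minimal sketch (assuming the standard template names, not the literal contents of https://review.openstack.org/538838) looks roughly like:

    - project:
        name: openstack/python-tempestconf
        templates:
          - check-requirements
          - publish-to-pypi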
*** chenying_ has joined #openstack-infra10:15
*** chenying has quit IRC10:15
*** jamesmcarthur has joined #openstack-infra10:16
*** cuongnv has quit IRC10:17
*** jamesmcarthur has quit IRC10:21
*** stakeda has quit IRC10:21
*** adarazs is now known as adarazs_brb10:25
ssbarneaWould it be possible to get rid of the wiki captcha? Its presence makes any kind of contribution a huge PITA. No wonder pages end up being outdated.10:26
*** jamesmcarthur has joined #openstack-infra10:27
*** hjensas has joined #openstack-infra10:27
*** oidgar has joined #openstack-infra10:28
*** gcb has quit IRC10:29
*** ldnunes has joined #openstack-infra10:31
*** jamesmcarthur has quit IRC10:31
*** threestrands_ has joined #openstack-infra10:33
*** jamesmcarthur has joined #openstack-infra10:35
*** threestrands has quit IRC10:36
*** lucas-afk is now known as lucasagomes10:37
rcarrillocruzevrardjp: where's openstack-ansible-functional-<os> parent job defined?10:37
evrardjpopenstack-ansible-tests10:37
*** jappleii__ has joined #openstack-infra10:37
evrardjphttps://github.com/openstack/openstack-ansible-tests/blob/master/zuul.d/jobs.yaml10:37
evrardjpit's just a start to get the ball rolling.10:38
*** jappleii__ has quit IRC10:38
*** panda|off is now known as panda10:38
*** makowals has quit IRC10:39
*** jappleii__ has joined #openstack-infra10:39
evrardjpbut if you're already on something else, I can drop this. It's just that I promised you testing, and I am on my way of delivering now.10:40
*** jamesmcarthur has quit IRC10:40
*** threestrands_ has quit IRC10:41
*** dhajare has joined #openstack-infra10:42
evrardjpIf you don't need it, fine for me, less code in our repos. :D10:43
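To make the openstack-ansible discussion concrete: a child job for exercising Ansible modules would normally just inherit from one of the openstack-ansible-functional-<os> parents defined in openstack-ansible-tests. A hypothetical sketch is below — the job name, playbook path and description are invented for illustration and are not the contents of https://review.openstack.org/538856:

    - job:
        name: openstack-ansible-module-tests              # hypothetical name
        parent: openstack-ansible-functional-ubuntu-xenial  # one of the parents mentioned above
        description: Exercise proposed Ansible modules in an OSA functional environment.
        run: playbooks/ansible-module-tests.yml           # illustrative path
        required-projects:
          - openstack/openstack-ansible-tests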
*** pbourke has quit IRC10:43
*** pbourke has joined #openstack-infra10:45
*** danpawlik has quit IRC10:46
*** jamesmcarthur has joined #openstack-infra10:48
*** tpsilva has joined #openstack-infra10:49
*** jaosorior has joined #openstack-infra10:52
*** namnh has quit IRC10:52
*** jamesmcarthur has quit IRC10:53
*** gcb has joined #openstack-infra10:53
cmurphysomething up with http://zuul.openstack.org/ ?10:55
*** alexchadin has quit IRC10:55
*** alexchadin has joined #openstack-infra10:55
cmurphy"The proxy server received an invalid response from an upstream server."10:55
cmurphyinfra-root ^10:56
*** makowals has joined #openstack-infra10:56
cmurphygoing to cry if my change was dequeued :'(10:57
AJaegercmurphy: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=64792&rra_id=all10:59
*** gcb has quit IRC10:59
*** alexchadin has quit IRC10:59
AJaegerinfra-root, we hit out of memory on zuul ;(10:59
cmurphyoh lovely :(10:59
AJaeger16 GB of swap ;( http://cacti.openstack.org/cacti/graph.php?local_graph_id=64794&rra_id=all11:00
AJaeger#status notice Zuul is currently under heavy load. Do not *recheck* or *approve* any changes.11:01
openstackstatusAJaeger: sending notice11:01
*** annp has quit IRC11:01
*** ganso has joined #openstack-infra11:02
cmurphythanks AJaeger11:02
*** jamesmcarthur has joined #openstack-infra11:02
-openstackstatus- NOTICE: Zuul is currently under heavy load. Do not *recheck* or *approve* any changes.11:03
*** adarazs_brb is now known as adarazs11:03
*** alexchadin has joined #openstack-infra11:04
openstackstatusAJaeger: finished sending notice11:04
ianwhmm, it's gone nuts i guess11:04
AJaegercmurphy: that's all I can do ;(11:05
cmurphyseems to be back and the queues are still there11:05
AJaegerinfra-root, I fear that the mass approval of https://review.openstack.org/#/q/status:open+topic:zuulv3-projects is the problem ;(11:05
AJaegerianw, cmurphy shall we wait and let them finish?11:05
* AJaeger goes to lunch - bbl11:05
ianwthere's stuff in there about exceptions and nodes not locked11:05
*** electrofelix has joined #openstack-infra11:06
ianwi'm afraid at this point, i'm not really up for a debugging session11:06
ianwso although i could restart fairly easily, if it goes wrong we might be in a worse position than we are now11:06
ianwi think it's best if someone else looks into it, i'm off now11:07
cmurphyi bet mordred or fungi or pabelanger will be up in the next couple of hours11:07
ianwyep, and they have all day to fix it up too ;) ttyl11:07
*** alexchadin has quit IRC11:07
*** jamesmcarthur has quit IRC11:07
*** danpawlik has joined #openstack-infra11:13
*** jamesmcarthur has joined #openstack-infra11:14
*** gcb has joined #openstack-infra11:17
*** jamesmcarthur has quit IRC11:18
*** gcb has quit IRC11:18
*** tosky has joined #openstack-infra11:21
*** olaph has quit IRC11:22
*** olaph has joined #openstack-infra11:23
*** sambetts|afk is now known as sambetts|11:24
*** sambetts| is now known as sambetts11:24
*** andreas_s has quit IRC11:25
*** andreas_s has joined #openstack-infra11:25
*** dhajare has quit IRC11:26
*** jamesmcarthur has joined #openstack-infra11:27
*** alexchadin has joined #openstack-infra11:30
*** _ari_|DevConf is now known as _ari_|conf11:32
*** _ari_|conf is now known as _ari_|brno11:32
*** jamesmcarthur has quit IRC11:32
sambettsHey Infra what's the status of Zuul, I see the message from Friday but wanted to make sure before I start rechecking stuff11:33
sambettsI can't get on zuul.openstack.org so I assume something not good11:33
sambetts?11:33
toskysambetts: there was a notification not long ago11:33
*** alexchadin has quit IRC11:33
AJaegersambetts: https://wiki.openstack.org/wiki/Infrastructure_Status11:34
sambettsoh ... my client timestamped that message Friday.... not sure why...11:34
*** andreas_s has quit IRC11:35
AJaegersambetts: we had other challenges on Friday ;(11:35
sambetts:(11:36
*** dhajare has joined #openstack-infra11:38
*** andreas_s has joined #openstack-infra11:40
*** jamesmcarthur has joined #openstack-infra11:40
*** alexchadin has joined #openstack-infra11:41
*** alexchadin has quit IRC11:43
*** jamesmcarthur has quit IRC11:45
*** jamesmcarthur has joined #openstack-infra11:49
*** dklyle has quit IRC11:52
*** david-lyle has joined #openstack-infra11:53
*** jamesmcarthur has quit IRC11:53
*** andreas_s has quit IRC11:55
*** andreas_s has joined #openstack-infra12:00
*** jamesmcarthur has joined #openstack-infra12:01
*** alexchadin has joined #openstack-infra12:02
*** andreas_s has quit IRC12:03
*** andreas_s has joined #openstack-infra12:03
toskyAJaeger: while waiting for zuul to come up, I have a question where you may help, related to publishing artifacts12:04
toskyrecently we (sahara) merged this https://review.openstack.org/#/c/532690 and I realized that the artifacts are published now under  http://tarballs.openstack.org/sahara-extra/dist/ instead of  http://tarballs.openstack.org/sahara/dist/12:05
toskynow, before changing all references to the old URLs, is there an easy way (or are we even allowed) to make the output of a sahara-extra job publish under tarballs.o.o/sahara/ and not /sahara-extra?12:06
toskyor if not, can we put a symlink?12:06
*** jamesmcarthur has quit IRC12:07
*** rfolco is now known as rfolco|ruck12:07
*** sree has joined #openstack-infra12:10
*** pblaho has quit IRC12:11
*** jamesmcarthur has joined #openstack-infra12:11
*** erlon_ has joined #openstack-infra12:13
*** salv-orlando has quit IRC12:14
*** jamesmcarthur has quit IRC12:16
*** jamesmcarthur has joined #openstack-infra12:22
*** salv-orlando has joined #openstack-infra12:22
*** jamesmcarthur has quit IRC12:26
*** sree_ has joined #openstack-infra12:27
*** sree_ is now known as Guest4021512:27
*** sshnaidm has quit IRC12:28
*** jamesmcarthur has joined #openstack-infra12:28
*** salv-orlando has quit IRC12:29
*** sree has quit IRC12:30
*** salv-orlando has joined #openstack-infra12:32
*** jpena is now known as jpena|lunch12:33
*** jamesmcarthur has quit IRC12:33
*** katkapilatova has joined #openstack-infra12:39
dmsimardconfig-core, infra-root: I'll be semi-afk all week in a certification thingy so I might not be responsive to pings. Hopefully fungi and clarkb are back this week!12:40
mnasersigh12:41
mnaserit looks like someone in puppet world took the liberty of doing the remove project name patches12:41
*** jamesmcarthur has joined #openstack-infra12:41
mnaserwithout a wait timeout12:41
mnaserand then someone else did the same.  causing even more load12:41
mnaserthen 2nd person went and -1'd patches of 1st person12:42
* mnaser flips table12:42
*** pblaho has joined #openstack-infra12:42
*** sshnaidm has joined #openstack-infra12:43
*** cshastri has quit IRC12:43
*** jamesmcarthur has quit IRC12:46
*** janki has quit IRC12:48
*** pcichy has quit IRC12:48
*** armaan_ has joined #openstack-infra12:49
*** jamesmcarthur has joined #openstack-infra12:50
*** adarazs is now known as adarazs_lunch12:51
*** armaan has quit IRC12:52
*** jamesmcarthur has quit IRC12:55
*** jamesmcarthur has joined #openstack-infra12:56
*** jamesmcarthur has quit IRC13:01
cmurphyi think the zuul issues are going to start causing a lot of POST_FAILUREs like this one http://logs.openstack.org/41/538541/1/gate/neutron-grenade/4c60129/job-output.txt.gz :(13:04
AJaegertosky: best discuss with mordred , he's the master of publishing in CI ;)13:04
AJaegermnaser: and some people approved large number of these changes ...13:05
toskyAJaeger: I was trying to avoid to fill mordred's request buffer :)13:05
AJaegertosky: I'm fine reviewing but can't help in designing. No time today to dig into this13:06
*** rosmaita has joined #openstack-infra13:06
openstackgerritBalazs Gibizer proposed openstack-infra/project-config master: consolidate nova job definitions  https://review.openstack.org/53890813:07
toskyAJaeger: sure, sure, it was not a request to proceed further; I just feel sorry about the load on mordred13:07
tosky(and in general for all people with a lot of stuff on their plate)13:07
*** amoralej is now known as amoralej|lunch13:07
*** sshnaidm_ has joined #openstack-infra13:11
*** jamesmcarthur has joined #openstack-infra13:12
ssbarneaanyone knows if gerritbot supports patterns for matching project list?13:12
*** Guest40215 has quit IRC13:13
*** pgadiya has quit IRC13:13
rosmaitais it just me, or has http://zuul.openstack.org/ become non-responsive?13:13
cmurphyrosmaita: known issue https://wiki.openstack.org/wiki/Infrastructure_Status13:14
*** sshnaidm has quit IRC13:14
cmurphyhoping an infra-root can help soon :(13:14
rosmaitacmurphy: ty, i always look at the topic in #openstack-infra-incident but looks like the wiki is a better place13:16
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Move github webhook from webapp to zuul-web  https://review.openstack.org/53571113:16
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Move status_url from webapp to web section  https://review.openstack.org/53677313:16
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Remove webapp  https://review.openstack.org/53678013:16
cmurphyrosmaita: yeah there was a notice sent out but i guess that doesn't change the channel topics13:17
*** jamesmcarthur has quit IRC13:17
*** dprince has joined #openstack-infra13:19
*** jamesmcarthur has joined #openstack-infra13:19
*** dhajare has quit IRC13:20
*** panda is now known as panda|lunch13:22
*** alexchad_ has joined #openstack-infra13:22
*** kopecmartin has joined #openstack-infra13:23
*** vivsoni has quit IRC13:23
AJaegercmurphy: shall I change the topics? I can sent out a status alert...13:24
*** alexchadin has quit IRC13:24
*** vivsoni has joined #openstack-infra13:24
*** mkopec_ has joined #openstack-infra13:25
*** jamesmcarthur has quit IRC13:25
cmurphyAJaeger: your call13:26
AJaeger;)13:27
*** kopecmartin has quit IRC13:28
AJaegerWhat about this wording: #status alert Zuul is currently under heavy load. Do not *recheck* or *approve* any changes until we give the go ahead.13:29
cmurphyAJaeger: lgtm13:29
AJaeger#status alert Zuul is currently under heavy load. Do not *recheck* or *approve* any changes until we give the go ahead.13:29
openstackstatusAJaeger: sending alert13:29
*** rlandy has joined #openstack-infra13:30
AJaegerthanks cmurphy. Then let's do it - we can always change.13:30
AJaegerit might avoid some questions...13:30
-openstackstatus- NOTICE: Zuul is currently under heavy load. Do not *recheck* or *approve* any changes until we give the go ahead.13:32
*** ChanServ changes topic to "Zuul is currently under heavy load. Do not *recheck* or *approve* any changes until we give the go ahead."13:32
AJaegerFrom Zuul (I could just attach it) "Queue lengths: 751 events, 0 management events, 1355 results."13:32
*** jpena|lunch is now known as jpena13:33
*** fultonj has joined #openstack-infra13:34
*** alexchadin has joined #openstack-infra13:34
*** tmorin has joined #openstack-infra13:34
openstackstatusAJaeger: finished sending alert13:35
*** alexchad_ has quit IRC13:37
*** d0ugal has quit IRC13:38
*** d0ugal has joined #openstack-infra13:38
AJaegerinfra-root, FYI ^ - please investigate and restart zuul13:40
*** edmondsw has joined #openstack-infra13:41
*** trown|outtypewww is now known as trown13:42
*** trown is now known as trown|rover13:42
*** alexchadin has quit IRC13:43
pabelangermorning13:44
*** tellesnobrega has quit IRC13:44
*** tellesnobrega has joined #openstack-infra13:45
*** hemna_ has joined #openstack-infra13:45
AJaegerpabelanger, good morning! Ready for a short fire drill? Otherwise grab your coffee first, please...13:45
pabelangersure, grabbing coffee13:46
pabelangerand seeing if I can save queues13:46
*** alexchadin has joined #openstack-infra13:47
AJaegerpabelanger: don't restore *all* changes, otherwise we swap again13:47
AJaegerpabelanger: my theory: We're swapping due to too many approvals of topic:zuulv3-projects changes13:48
pabelangerAJaeger: why, is there a specific change that pushed us over?13:48
pabelangerokay, I'm not sure how to filter them out13:48
AJaegerpabelanger: just this list: https://review.openstack.org/#/q/status:open+topic:zuulv3-projects13:49
pabelangerwe can start with gate first13:49
AJaegerpabelanger: those are in gate...13:49
AJaegerremove all openstack-ansible from gate ;)13:49
AJaegerthat's the bulk of the changes13:49
pabelangerthat won't be easy13:49
pabelangerI only have patchset ides13:50
pabelangerid*13:50
pabelangeractually13:50
pabelangerlet me first dump queue13:50
AJaegerpabelanger: worst case: Do not restore gate for now...13:51
AJaegerpabelanger: sorry, can't help further - meeting time.13:51
*** dhajare has joined #openstack-infra13:52
pabelangerokay, I also know the project13:54
pabelangerdumping queues13:54
pabelangerstopping zuul13:55
*** kgiusti has joined #openstack-infra13:56
AJaegerpabelanger: can you do the #status ok later, please?13:57
* AJaeger gave 30 mins ago a status alert13:58
*** sree has joined #openstack-infra13:58
*** ykarel is now known as ykarel|away14:00
pabelangerstill bringing zuul backonline14:00
*** yamamoto has quit IRC14:01
*** slaweq has quit IRC14:02
pabelangermerge:cat running14:02
pabelangermerger:cat*14:02
*** slaweq has joined #openstack-infra14:02
*** mriedem has joined #openstack-infra14:03
*** sree has quit IRC14:03
pabelangerloading queue (minus some OSA patches)14:05
*** zhenguo has quit IRC14:05
*** ykarel|away has quit IRC14:06
*** ihrachys has joined #openstack-infra14:06
*** dhajare has quit IRC14:07
*** slaweq has quit IRC14:07
*** jamesmcarthur has joined #openstack-infra14:08
*** dhill_ has joined #openstack-infra14:09
*** tmorin has quit IRC14:09
pabelangerloading check queue now14:12
*** david-lyle has quit IRC14:15
*** psachin has quit IRC14:15
*** dmsimard is now known as dmsimard|afk14:17
*** eyalb has joined #openstack-infra14:17
*** mkopec_ has quit IRC14:17
*** yamamoto has joined #openstack-infra14:18
*** r-daneel has joined #openstack-infra14:20
*** myoung is now known as myoung|reboot14:20
pabelangerokay, I've stopped loading changes into check, up to 18GB of ram14:21
*** Goneri has joined #openstack-infra14:24
*** jcoufal has joined #openstack-infra14:24
*** edmondsw has quit IRC14:25
*** dhajare_ has joined #openstack-infra14:25
pabelanger#status notice we've been able to restart zuul, and re-enqueue changes for gate. Please hold off on recheck or approves, we are still recovering. More info shortly.14:28
openstackstatuspabelanger: sending notice14:28
evrardjppabelanger: what's happening on our side?14:28
*** superdan is now known as dansmith14:28
*** dave-mccowan has joined #openstack-infra14:28
*** edmondsw has joined #openstack-infra14:28
*** myoung|reboot is now known as myoung14:29
-openstackstatus- NOTICE: we've been able to restart zuul, and re-enqueue changes for gate. Please hold off on recheck or approves, we are still recovering. More info shortly.14:29
*** makowals has quit IRC14:29
*** dhajare_ has quit IRC14:30
pabelangerAJaeger: do we know how to stop corvus script?14:31
openstackstatuspabelanger: finished sending notice14:31
*** makowals has joined #openstack-infra14:31
*** makowals has quit IRC14:32
*** ralonsoh_ has joined #openstack-infra14:32
mnaserpabelanger: re approving puppet-openstack, did that and notified #puppet-openstack to not approve anything in the meantime too :)14:32
*** dave-mccowan has quit IRC14:33
*** efoley has quit IRC14:33
*** efoley has joined #openstack-infra14:33
pabelangergreat, thank you14:33
*** makowals has joined #openstack-infra14:34
*** ykarel|away has joined #openstack-infra14:34
*** dave-mccowan has joined #openstack-infra14:34
pabelangerso, we're at about 19GB for zuul right now14:34
pabelangerthings look to have leveled off for the moment14:35
*** eyalb has left #openstack-infra14:35
*** ralonsoh has quit IRC14:35
*** jamesmca_ has joined #openstack-infra14:36
AJaegerpabelanger: corvus's script sends a new change every 20 mins - best talk with him on next steps. It was all well so far - but I see at least 40 *approved* changes among them, so that was too much. Perhaps he has to stop for now...14:37
*** jamesmca_ is now known as jamesmcarthur_14:39
d0ugalHas this moved? http://zuulv3.openstack.org/14:40
mnaserd0ugal: zuul.openstack.org14:41
evrardjpd0ugal: yes a few time ago14:41
d0ugalaha, thanks14:41
d0ugal(sorry, I have been out for a few weeks)14:41
evrardjphaha that would explain. There was a redirect before to zuul.openstack.org from zuulv3.14:41
evrardjpjust changing the bookmark would do the trick :p14:42
*** amoralej|lunch is now known as amoralej14:42
*** rloo has joined #openstack-infra14:43
rloohi, is there some way to remove a patch fro zuul. eg, abandon the patch?14:43
pabelangerrloo: yes, abandon will dequeue it from zuul14:44
rloopabelanger: sweet. will do that until things are working better in zuul-la-la-land14:44
*** dhajare_ has joined #openstack-infra14:45
*** janki has joined #openstack-infra14:46
*** kopecmartin has joined #openstack-infra14:47
*** daidv has quit IRC14:49
*** jcoufal has quit IRC14:49
*** daidv has joined #openstack-infra14:49
*** salv-orlando has quit IRC14:49
*** salv-orlando has joined #openstack-infra14:50
*** esberglu has joined #openstack-infra14:51
*** gcb has joined #openstack-infra14:52
*** ykarel|away has quit IRC14:53
*** bfournie has joined #openstack-infra14:54
*** salv-orlando has quit IRC14:55
openstackgerritTobias Henkel proposed openstack-infra/nodepool master: Fix relaunch attempts when hitting quota errors  https://review.openstack.org/53693014:58
*** fresta has quit IRC14:59
*** jcoufal has joined #openstack-infra15:00
*** fresta has joined #openstack-infra15:01
*** lucasagomes is now known as lucas-hungry15:01
*** gibi_ is now known as gibi15:01
*** fresta has quit IRC15:02
*** r-daneel_ has joined #openstack-infra15:02
*** rfolco has joined #openstack-infra15:03
*** rfolco|ruck has quit IRC15:03
*** rfolco_ has joined #openstack-infra15:03
pabelangerafk for a few moments15:03
*** Goneri has quit IRC15:03
*** r-daneel has quit IRC15:04
*** r-daneel_ is now known as r-daneel15:04
*** tmorin has joined #openstack-infra15:04
*** slaweq has joined #openstack-infra15:05
*** rfolco_ is now known as rfolco|ruck15:05
*** adarazs_lunch is now known as adarazs15:05
*** kiennt26 has joined #openstack-infra15:05
*** fresta has joined #openstack-infra15:06
mriedeme-r seems to be dead http://status.openstack.org/ - no updates since saturday15:09
*** slaweq has quit IRC15:10
*** yamahata has joined #openstack-infra15:10
*** r-daneel has quit IRC15:11
*** r-daneel has joined #openstack-infra15:11
sc`the monday thundering herd :(15:11
*** hongbin has joined #openstack-infra15:12
*** panda|lunch is now known as panda15:12
*** gus has quit IRC15:13
pabelangermriedem: I can look shortly15:13
*** makowals_ has joined #openstack-infra15:13
sc`my cookbooks' tests prevented me from pushing more than one of the zuul changes through at a time for chef. the ones that checked cleanly got merged to keep the amount of in-flight changes low15:13
*** eharney has joined #openstack-infra15:13
sc`...come monday. thanks, finger-return.15:14
*** gus has joined #openstack-infra15:14
*** makowals has quit IRC15:16
*** Goneri has joined #openstack-infra15:17
mgagnepabelanger: did you see any improvement since we enable the quota refresher in inap-mtl01?15:19
*** oidgar has quit IRC15:19
*** rfolco has quit IRC15:20
pabelangermgagne: possible, when did you enable it? I haven't seen any failures in a while15:20
*** rfolco has joined #openstack-infra15:20
pabelangermgagne: aside from when we upload new images, thundering herd issue15:20
mgagneenabled since January 1615:21
*** rfolco has quit IRC15:21
mgagnepabelanger: ok, don't you have the same issue with other providers?15:21
mgagneor only with us?15:21
pabelangermgagne: we have an issue in citycloud with quotas, but think that is a bug in nodepool.15:22
pabelangermgagne: maybe OVH would be the other place we seem some quota issue15:23
mgagnepabelanger: and how about the thundering herd issue?15:23
*** hemna_ has quit IRC15:23
pabelangermgagne: mostly inap, we often see compute nodes fail to boot the new images just after upload. But it's usually fixed by the 2nd time we have launched on the compute node15:24
smcginnisWe're still supposed to be holding off on approvals, right?15:24
*** myoung is now known as myoung|brb15:24
pabelangersmcginnis: yah, for a bit longer. Would like to make sure corvus or another infra-root is online before we open the flood gates again15:24
mgagnepabelanger: ok, I can try something on my side and see if it improves15:25
smcginnispabelanger: ack, thanks!15:25
pabelangermgagne: I think we talked in the past, but does compute nodes convert images to raw after download? Or is that settings disabled in nova15:26
mgagnepabelanger: you are thinking like me =)15:26
*** bobh_ has joined #openstack-infra15:26
pabelangermgagne: :)15:26
*** ijw has joined #openstack-infra15:28
*** bobh_ has quit IRC15:28
*** mylu has joined #openstack-infra15:33
*** dhajare_ has quit IRC15:35
*** Goneri has quit IRC15:36
*** alexchadin has quit IRC15:37
*** myoung|brb is now known as myoung15:37
*** Goneri has joined #openstack-infra15:39
*** r-daneel_ has joined #openstack-infra15:39
*** r-daneel has quit IRC15:40
*** r-daneel_ is now known as r-daneel15:40
*** sree has joined #openstack-infra15:43
*** claudiub has joined #openstack-infra15:46
corvusAJaeger, pabelanger: i stopped the script.  it shouldn't have been a problem for zuul, but i guess folks got carried away?15:47
*** gcb has quit IRC15:47
pabelangercorvus: I'm still looking to see what tipped us over on memory, but ya, first indications are we just +A'd too many at once15:48
AJaegercorvus: 40+ of your changes approved in short order and all in gate is my theory of what brought us down...15:48
*** claudiub|3 has joined #openstack-infra15:48
*** esberglu has quit IRC15:48
*** hemna_ has joined #openstack-infra15:49
*** claudiub|2 has quit IRC15:49
*** esberglu has joined #openstack-infra15:49
*** ramishra has quit IRC15:49
corvusperhaps i should only run the script during the week instead of on the weekend15:49
AJaegercorvus: yeah...15:50
AJaegerhow do you want to get those in that are approved already?15:51
*** salv-orlando has joined #openstack-infra15:51
*** claudiub has quit IRC15:51
*** mylu has quit IRC15:53
*** hemna_ has quit IRC15:54
* AJaeger will be back later15:54
corvusAJaeger: not sure i understand the question15:54
*** andreww has joined #openstack-infra15:54
*** mylu has joined #openstack-infra15:55
AJaegercorvus: we have 40+ approved but not merged changes for this. Do you want to recheck them one by one? Merge them with some time in between? Or what should be done with them?15:55
*** salv-orlando has quit IRC15:55
*** xarses_ has joined #openstack-infra15:55
*** tosky has quit IRC15:56
*** inc0 has joined #openstack-infra15:56
*** david-lyle has joined #openstack-infra15:56
*** b_bezak has quit IRC15:57
*** sree_ has joined #openstack-infra15:57
*** felipemonteiro has joined #openstack-infra15:57
corvusAJaeger: i don't really want to do anything with them right now.15:57
*** sree_ is now known as Guest6445615:57
*** eharney has quit IRC15:58
mnasercorvus: there are 2 individuals who have been doing a lot of bulk changes in puppet openstack without discussing with us, who noticed your changes and started doing the same as you15:58
mnaserbut without the 20 minute spacing..15:58
mnaseri think their attempt at helping might have contributed15:58
*** andreww has quit IRC15:59
*** sree has quit IRC15:59
corvusmnaser: i guess i should send out emails when i do this.  i had hoped to just save everyone from having to worry or even think about it by just doing it.15:59
AJaegermnaser: I saw melissaml and Tuan - and sent both emails once I noticed. Tuan abandoned the changes. I think melissa did not submit new ones...15:59
*** lucas-hungry is now known as lucasagomes15:59
mnaserAJaeger: thanks, i tried to send emails too, some -1's were tossed at each other :\16:00
mnaserwe get this often in puppet-openstack where we get a big bulk of changes like this without being consulted but anyways16:00
AJaegermnaser: and a third person indeed on puppet Hoang Trung Hieu ;(16:00
*** eharney has joined #openstack-infra16:01
mnaserits a bit of a mess yeah, i'm waiting for gate to slow down and i was staggering my approves too16:01
*** felipemonteiro has quit IRC16:01
AJaegerquestion now is, abandon those puppet reviews and let corvus's ones in - or merge them?16:02
* AJaeger really leaves now16:02
mnaserAJaeger: ill discuss this with the puppet team16:03
*** tesseract has quit IRC16:03
*** tosky has joined #openstack-infra16:04
pabelangerokay, zuul's results queue looks to be caught up. I'm hoping the waviness of nodes / requests will start to level out: http://grafana.openstack.org/dashboard/db/zuul-status16:05
*** jappleii__ has quit IRC16:05
pabelangerI'm going to top up coffee16:05
*** hemna_ has joined #openstack-infra16:06
*** kiennt26 has quit IRC16:10
EmilienMpabelanger: I'm not sure what to do with https://review.openstack.org/#/c/538012/16:13
EmilienMin POST_FAILURE for release notes16:13
*** xarses_ has quit IRC16:14
*** slaweq has joined #openstack-infra16:14
*** slaweq has quit IRC16:14
*** xarses_ has joined #openstack-infra16:14
EmilienMis it related to the zuul heavy load?16:14
pabelangerEmilienM: looks like bug with job: http://logs.openstack.org/12/538012/1/check/build-openstack-releasenotes/5ee5d6d/job-output.txt.gz#_2018-01-29_16_02_43_56207016:14
pabelangerEmilienM: I'll dig more into it shortly, still watching zuul this morning16:15
corvuspabelanger: fyi the status alert still says don't recheck or approve16:15
pabelangercorvus: yah, I think we can clear now.16:15
clarkbEmilienM: pabelanger looks like the release note build failed because the source dir couldn't be found which led to not having build artifacts to sync which led to a post failure16:16
pabelangerhow does the following sound: status ok zuul.o.o is back online, feel free to recheck / approve patches.16:16
*** dsariel has quit IRC16:16
*** mylu has quit IRC16:17
EmilienMclarkb: ah yeah, sources dir is missing, let me fix that16:17
EmilienMthat's probably it16:17
*** felipemonteiro has joined #openstack-infra16:18
pabelanger#status ok zuul.o.o is back online, feel free to recheck / approve patches.16:18
openstackstatuspabelanger: sending ok16:18
pabelangerAJaeger: mriedem: looks like we still need to get https://review.openstack.org/533608/ to help nova. I rechecked the depends-on patches already16:19
*** links has quit IRC16:20
mriedempabelanger: yup thanks - i was holding off on the rechecks b/c of the earlier 'don't recheck'16:20
*** ChanServ changes topic to "Discussion of OpenStack Developer and Community Infrastructure | docs http://docs.openstack.org/infra/ | bugs https://storyboard.openstack.org/ | source https://git.openstack.org/cgit/openstack-infra/ | channel logs http://eavesdrop.openstack.org/irclogs/%23openstack-infra/"16:20
-openstackstatus- NOTICE: zuul.o.o is back online, feel free to recheck / approve patches.16:20
*** felipemonteiro_ has joined #openstack-infra16:21
openstackgerritMerged openstack-infra/nodepool master: Fix race in test_failed_provider  https://review.openstack.org/53852916:21
pabelangerclarkb: have you seen http://paste.openstack.org/show/657148/ before? that is on logstash-worker01.o.o16:22
clarkbpabelanger: ya that shouldn't be fatal I don't think unless we've filled the disk again with crm114 data16:23
*** olaph1 has joined #openstack-infra16:24
openstackstatuspabelanger: finished sending ok16:24
openstackgerritJavier Peña proposed openstack-infra/system-config master: Move AFS mirror code to puppet-openstackci  https://review.openstack.org/52903216:24
*** olaph has quit IRC16:24
*** felipemonteiro has quit IRC16:25
pabelangerclarkb: yah, it does appear to be fatal, but I haven't seen that worker process anything else after it16:25
pabelangerlet me poke around why that is16:25
*** pcaruana has quit IRC16:25
pabelangerlogproc+ 18309  4.5  0.0      0     0 ?        Z    16:19   0:20 [classify-log.cr] <defunct>16:27
pabelangerthat doesn't look healthy16:27
pabelangeraround same time too16:27
*** yamamoto has quit IRC16:28
pabelangerclarkb: where should I be looking for crm114 data?16:28
clarkbpabelanger: /var/run/crm114 iirc16:29
clarkbmight be /var/lib/crm11416:29
clarkbpabelanger: typically when I debug e-r/es/logstash problems I try to start at the bottom of the pipeline. So make sure es is happy first. Then logstash is running. Then logstash workers. It could be that logstash OOM'd or something which made the workers unhappy16:32
*** dhill_ has quit IRC16:33
*** kopecmartin has quit IRC16:34
pabelangerokay16:34
*** slaweq has joined #openstack-infra16:34
*** armaan has joined #openstack-infra16:36
*** andreas_s has quit IRC16:37
*** tushar_ has joined #openstack-infra16:37
*** andreas_s has joined #openstack-infra16:38
*** armaan_ has quit IRC16:39
tushar_Hi16:39
tushar_can we use zuul v2 with github instead of gerrit?16:39
*** felipemonteiro_ has quit IRC16:40
pabelangertushar_: only zuulv3 today has native support for github integration16:40
*** dhill_ has joined #openstack-infra16:40
*** andreas_s has quit IRC16:41
*** andreas_s has joined #openstack-infra16:42
*** andreas_s has quit IRC16:42
*** andreas_s has joined #openstack-infra16:43
*** yamamoto has joined #openstack-infra16:43
*** jpena is now known as jpena|brb16:44
*** spligak_ has quit IRC16:44
pabelangermgagne: do you mind checking the following IP in http://logs.openstack.org/23/524423/44/gate/openstack-tox-py35/a86c30b/zuul-info/inventory.yaml16:45
mgagnepabelanger: what's happening?16:45
mgagneorphan/zombie instance?16:45
pabelangermgagne: we're seeing WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!16:45
pabelangeryah, maybe16:45
pabelangerhttp://logs.openstack.org/23/524423/44/gate/openstack-tox-py35/a86c30b/job-output.txt.gz#_2018-01-29_16_35_05_94612416:45
tushar_pabelanger : ok, that means zuul v2 can only work with gerrit16:45
*** e0ne has quit IRC16:46
pabelangertushar_: yah, we only supported gerrit for zuulv216:46
tushar_pabelanger : thank you16:46
*** slaweq_ has joined #openstack-infra16:47
*** tesseract has joined #openstack-infra16:47
*** oidgar has joined #openstack-infra16:47
tobiashtushar_: there were some experimental patches for v2 but16:48
*** yamamoto has quit IRC16:48
tobiashI think for github it's better to switch to v316:48
*** ykarel|away has joined #openstack-infra16:49
tushar_tobiash : yes , thank you16:49
prometheanfire10:50 <smcginnis> prometheanfire: I think I saw something that the "/#/c" format of the URLs don't work for the new Depends-on syntax.16:50
prometheanfirecan someone confirm?16:50
mgagnepabelanger: I found the instance and destroyed it16:51
mordredprometheanfire: the c format works fine16:51
*** salv-orlando has joined #openstack-infra16:51
*** yamamoto has joined #openstack-infra16:51
prometheanfiremordred: ok, thanks16:51
*** yamamoto has quit IRC16:51
prometheanfiresmcginnis: #/c works aparently :D16:52
*** slaweq_ has quit IRC16:52
*** andreas_s has quit IRC16:52
pabelangermgagne: thanks! do you know the UUID? Want to see if we had an error on nodepool side16:52
smcginnisOh, great.16:52
mgagnec7de571d-bc73-4945-97b6-4c738276962e16:52
*** Guest64456 has quit IRC16:53
smcginnisprometheanfire, mordred: What about the second to last paragraph here: http://lists.openstack.org/pipermail/openstack-dev/2018-January/126650.html16:53
Roamer`tobiash, pabelanger, tushar_, but can't one still use github with zuulv2 only as a Git mirror for "projects from git", not as an event stream?  Still follow the review.o.o event stream and get the patches from Gerrit, but then check out the other repos from GitHub?16:53
*** salv-orlando has quit IRC16:54
Roamer`(I've never tried it, I just can't see a reason why it would not work)16:54
*** salv-orlando has joined #openstack-infra16:54
*** eharney_ has joined #openstack-infra16:54
mgagnepabelanger: seems there are more, will check16:54
tobiashRoamer`: sure, that works16:56
*** andreas_s has joined #openstack-infra16:57
pabelangermgagne: nodepool looks to have been fine with that UUID, no exceptions raised16:57
*** jamesmcarthur has quit IRC16:57
mordredsmcginnis: oh! well, ignore me and pay attention to jim always16:57
mgagnepabelanger: ok, could be that if you restart a Nova service, it might just drop the request and never properly delete the instance.16:58
pabelangerRoamer`: right, in that case zuul doesn't have any interactions with github, just gerrit. So that is supported, as long as you replicate projects to github, end users could pull from github. But like you say, patches must be sent to gerrit16:58
pabelangermgagne: k16:58
*** eharney has quit IRC16:58
mordredprometheanfire: (also, pay attention to smcginnis and corvus and ignore me - I'm wrong)16:58
mgagnepabelanger: deleting ~10 rogue instances16:58
*** jpich has quit IRC16:59
pabelangermgagne: thank you16:59
prometheanfiremordred: ok, so just remove the /#/c from the url?17:00
mordredprometheanfire: yah17:00
smcginnismordred: :)17:00
mnaserat what point can i queue another check in zuul to avoid memory blowing up?  once i see the jobs appear in zuul.openstack.org -- can i continue at that point?17:00
mgagneall done. bug is still in our backlog... =(17:00
*** andreas_s has quit IRC17:01
mordredRoamer`: you can ... but you should almost certainly not, as github is actually less stable as a source of cloning git refs. you should use git.openstack.org as a source of cloning refs and review.o.o as the source of the event stream17:03
*** janki has quit IRC17:04
mordredRoamer`: I'm not saying that to be a hater, fwiw - we recently connected zuulv3 to ansible/ansible on github and had a pile of issues related to being able to clone/fetch from github which caused us to need to turn off the github connection until we can go make the cloning more robust17:05
*** janki has joined #openstack-infra17:05
*** pcichy has joined #openstack-infra17:06
tobiashmordred: did you have problems with auth and so on or just networking instabilities?17:08
*** ijw has quit IRC17:09
mordredtobiash: just networking and/or missing refs17:09
tobiashmordred: when running in github app mode you also need https://review.openstack.org/#/c/535716/17:09
*** janki has quit IRC17:09
*** sshnaidm_ is now known as sshnaidm17:09
tobiashat least if auth matters17:10
mordredtobiash: oh - it's possible I don't actually know - in this case it's a public repo so we should have just been cloning/fetching normally17:10
tobiash(and it matters for api rate limits)17:10
*** wolverineav has quit IRC17:10
pabelangermnaser: if the results queue on zuul.o.o is backing up (currently 316 results) then it is likely zuul is doing dynamic reloads.  Not always, but seems to be a good indicator.17:10
mnaserok i'll watch that number and try to approve when that number is low17:10
pabelangermnaser: ideally it should be 0 most of the time, when it is growing, zuul is usually doing something with CPU17:11
* mnaser write a bot to notify when is a good time to approve changes :P17:11
mordredtobiash: I haven't looked at the actual errors in the logs myself - but I think ours were different than yours in that patch17:12
*** eharney_ is now known as eharney17:12
tobiashmordred: probably, but github also rate limits anonymous clones17:13
*** armax has joined #openstack-infra17:14
mordredtobiash: good point17:14
pabelangermordred: mnaser: do you mind reviewing https://review.openstack.org/537995/, is the removal of infracloud-chocolate in nodepool.17:15
pabelangerwould like to see about starting to clean that up this week17:15
*** ykarel|away has quit IRC17:16
mordredpabelanger: do you want me to avoid +A?17:16
mordredpabelanger: +2 - will let you +A as needed17:16
pabelangermordred: yah, just wanted to get some eyes on it. I'm happy to +3 if everybody is good17:17
pabelangerclarkb: ^17:17
Shrewspabelanger: oh, that reminds me. i'm going to use the new erase command to cleanup vanilla data17:17
pabelangerShrews: yay17:17
*** tmorin has quit IRC17:18
Shrewspabelanger: w00t. success17:18
Shrews7 nodes of build data removed17:18
pabelangercool17:19
mnaserpabelanger: lgtm too, feel free to +A when you would like17:20
pabelangerShrews: is there another command to erase images?17:20
Shrewspabelanger: no, 'erase' does both images and nodes (note the actual images or instances, just the zk data)17:21
Shrewspabelanger: i had manually done the vanilla nodes earlier (which led to me creating the new command)17:21
*** r-daneel_ has joined #openstack-infra17:21
Shrewss/note/not/17:22
Shrewspabelanger: 'nodepool info infracloud-chocolate' will show what will be removed if you s/info/erase/17:22
*** r-daneel has quit IRC17:23
*** r-daneel_ is now known as r-daneel17:23
clarkbpabelanger: ya I just wanted to make sure we didn't remove it if someone had more info or had reason to keep it17:24
clarkbbut its been a while now probably fine to remove if it hasn't come back17:24
pabelangerShrews: okay, cool. So did you run it against vanilla right?17:25
Shrewspabelanger: correct. you want to do chocolate?17:25
*** mlavalle has joined #openstack-infra17:25
pabelangerShrews: yah, think so. It is still down and we are landing 53799517:25
pabelangerclarkb: wfm17:26
mlavalleHi, are we also expected to move the neutron periodic jobs to the neutron repo: https://github.com/openstack-infra/project-config/blob/master/zuul.d/projects.yaml#L10220?17:26
*** gyee has joined #openstack-infra17:29
*** armaan_ has joined #openstack-infra17:29
*** armaan_ has quit IRC17:31
clarkbmlavalle: I think the ultimate goal is to have everything except for maybe the system-required template moved into the repos themselves17:31
clarkbthat said, I'm not sure how periodic jobs, which are branchless, will interact if defined in branched repos17:31
*** jpena|brb is now known as jpena17:31
*** armaan_ has joined #openstack-infra17:32
mlavalleclarkb: thanks. how can we clarify that?17:32
*** armaan has quit IRC17:32
clarkbmlavalle: probably just need to try it and see how it works. I think you may end up wanting to define branch specific periodic jobs in each branch?17:33
corvusyes, that should work17:33
mlavalleclarkb, corvus: ok cool. I'll put together a patch for Neutron master and start playing with it. I will hold off on merging until Rocky17:34
corvusinstead of 'periodic-neutron-foo-pike' just create 'periodic-neutron-foo' and define it on the pike branch, and put another copy on the ocata branch, etc.17:34
mlavalleok, sounnds reasonable17:35
mlavallematches this: https://docs.openstack.org/infra/manual/zuulv3.html#periodic-jobs17:35
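A hedged sketch of the per-branch approach corvus describes, printed as the .zuul.yaml stanzas it would translate to; the job name and playbook path are illustrative, not Neutron's actual config:

    cat <<'EOF'
    - job:
        name: neutron-functional-nightly            # same name defined on every branch
        run: playbooks/functional-nightly/run.yaml  # hypothetical playbook path

    - project:
        periodic:
          jobs:
            - neutron-functional-nightly
    EOF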
*** slaweq_ has joined #openstack-infra17:36
AJaegermlavalle: no need to name them periodic-, see https://docs.openstack.org/infra/manual/drivers.html#consistent-naming-for-jobs-with-zuul-v317:37
AJaegermlavalle: and you can easily test them, add them *initially* to check gate to see that the job works - but don't merge. Once it works, add to periodic pipeline.17:38
*** ykarel|away has joined #openstack-infra17:38
mlavalleAJaeger: ok, will keep both things in mind :-)17:38
mlavallethnaks17:38
mlavallewe will merge after we release Queens17:39
*** oidgar has quit IRC17:40
*** slaweq_ has quit IRC17:40
corvusoh, yes, what AJaeger said.17:41
*** myoung is now known as myoung|food17:42
smcginnisLooks like we had a release job failure for python-blazarclient.17:42
smcginnis- tag-releases finger://ze03.openstack.org/1a4929c9670547a89fd3eb23896329d7 : POST_FAILURE in 3m 55s17:42
smcginnisHow do we determine what happened there?17:43
smcginniscc dhellmann ^17:43
corvussmcginnis: infra-root is required if all we have is the finger url17:43
corvussmcginnis: i'll dig it up17:43
smcginnisThanks corvus17:44
corvus2018-01-29 17:28:16,458 DEBUG zuul.AnsibleJob: [build: 1a4929c9670547a89fd3eb23896329d7]         msg: 'There was an issue creating /srv/static/logs/9c/9c06fcd854a83f2e348e062cbea507b7d752369a17:45
corvus2018-01-29 17:28:16,459 DEBUG zuul.AnsibleJob: [build: 1a4929c9670547a89fd3eb23896329d7]           as requested: [Errno 13] Permission denied: ''/srv/static/logs/9c/9c06fcd854a83f2e348e062cbea507b7d752369a'''17:45
corvusdmsimard|afk, infra-root: ^ i suspect something is amiss with the rsync17:46
pabelangeroh no17:46
*** jamesmcarthur_ has quit IRC17:46
corvusls -la /srv/static/logs/9c/17:46
dhellmannsmcginnis : the new missing-releases output: http://paste.openstack.org/show/657270/17:46
corvusdrwx------   2 root    root    4096 Jan 28 14:04 .17:46
pabelangerread-only FS again17:47
corvuspabelanger: where do you see that?17:47
corvuspabelanger: it's an ownership issue17:47
openstackgerritMerged openstack-infra/project-config master: Remove infracloud-chocolate from nodepool  https://review.openstack.org/53799517:47
pabelangercorvus: still confirming17:47
pabelangercorvus: yes, please ignore me17:48
corvusok17:48
*** jamesmcarthur has joined #openstack-infra17:49
corvusinfra-root: i'm not really well enough to operate machinery as root.  can someone correct the filesystem permissions, and decide if we need to stop dmsimard's rsync?17:50
pabelangeryes, give me a moment and I'll start looking17:50
clarkbI'm not entirely sure I know everything going on here but let me know how I can help17:50
*** sree has joined #openstack-infra17:51
pabelangerclarkb: yes, we have a mix of root / jenkins permissions on /srv/static/logs17:52
pabelangerI think this is because dmsimard|afk rsync process is running as root17:52
*** yamamoto has joined #openstack-infra17:52
pabelangerand not jenkins user17:52
corvussmcginnis: the job otherwise succeeded, only failed to copy the logs17:53
clarkbis it running with -a? that should've preserved uids17:53
clarkbbut ya we may need to stop the rsync, chown what is there then restart rsync with proper uid handling17:53
pabelangerI am not sure if temp server has jenkins user, would need to first check17:53
clarkbpabelanger: even if it doesn't the filesystem should have ownership set to those uids17:53
Shrewspabelanger: fyi, chocolate zk data erased17:53
pabelangerclarkb: possible chown happens after all data is copied?17:54
pabelangerShrews: ack17:54
dhellmanncorvus , smcginnis : I see the zaqar client release jobs in the queue17:54
clarkbpabelanger: my concern about that is the rsync is supposed to take like a week? and then who knows how long the chown will take17:54
*** slaweq_ has joined #openstack-infra17:55
*** florianf has quit IRC17:55
smcginnisdhellmann: I think this was blazarclient.17:55
*** sree has quit IRC17:55
pabelangerclarkb: agree. I think best case now is stop rsync, chown /srv/static/logs, then look back into rsync restore17:55
clarkbpabelanger: that would be my vote too17:56
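A rough sketch of that plan (stop the copy, fix ownership, resume with ownership preserved); the source host below is a placeholder, not the real server:

    # after stopping the rsync in the screen session:
    chown -R jenkins:jenkins /srv/static/logs
    # -a preserves ownership when run as root; --numeric-ids avoids uid/name mismatches
    rsync -a --numeric-ids --progress root@SOURCE_HOST:/srv/static/logs/ /srv/static/logs/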
pabelangerokay, let me connect to screen17:56
pabelangerand stopped17:56
dhellmannsmcginnis : blazarclient just merged and is still in release-post17:57
pabelangerchown -R jenkins:jenkins running17:57
openstackgerritsebastian marcet proposed openstack-infra/openstackid-resources master: Added endpoint get speakers summits assistance by summit  https://review.openstack.org/53899217:58
*** rkukura has joined #openstack-infra17:58
*** slaweq_ has quit IRC17:59
*** wolverineav has joined #openstack-infra18:00
*** derekh has quit IRC18:00
*** yamamoto has quit IRC18:01
*** sambetts is now known as sambetts|afk18:01
*** sree has joined #openstack-infra18:04
*** slaweq has quit IRC18:07
*** agopi|out has joined #openstack-infra18:07
*** slaweq has joined #openstack-infra18:08
*** efoley has quit IRC18:08
*** sree has quit IRC18:09
dmsimard|afkcorvus, pabelanger: hey, briefly stepping in before I go out again.. the rsync should be running with -avz --progress18:09
*** weshay|ruck is now known as weshay|ruck|brb18:09
dmsimard|afkI also questioned in infra-incident yesterday whether we should bother with the rsync at all considering it is bound to take several days18:09
*** david-lyle has quit IRC18:09
*** wolverineav has quit IRC18:10
*** yamahata has quit IRC18:10
dmsimard|afkI'm totally open to stopping the rsync altogether18:10
openstackgerritMerged openstack-infra/openstackid-resources master: Added endpoint get speakers summits assistance by summit  https://review.openstack.org/53899218:11
*** trown|rover is now known as trown|lunch18:11
pabelangerrsync is already stopped, now trying to change permissions of root:root back to jenkins:jenkins18:11
pabelangerclarkb: wonder if we should just find -type d first, then set to jenkins:jenkins. I'm then guessing ansible would do the right thing to upload new files18:12
*** ralonsoh_ has quit IRC18:12
pabelangerwe can then run it again on files after18:12
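The two-pass variant pabelanger sketches, directories first so new uploads work right away, then regular files afterwards; the exact find invocation is an assumption:

    find /srv/static/logs -type d ! -user jenkins -exec chown jenkins:jenkins {} +
    find /srv/static/logs -type f ! -user jenkins -exec chown jenkins:jenkins {} +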
clarkbpabelanger: ya not sure if the extra checking will be faster or slower18:13
clarkbwhat you've got running now is probably fine?18:13
pabelangerk, I did chown top level directories first18:14
pabelangerbut is running now across everything, and in 00/ directory18:14
*** myoung|food is now known as myoung18:14
*** ykarel|away has quit IRC18:16
*** Swami has joined #openstack-infra18:17
openstackgerritFabien Boucher proposed openstack-infra/zuul-jobs master: Propose to move submit-log-processor-jobs and submit-logstash-jobs in zuul-jobs  https://review.openstack.org/53784718:17
*** jamesmcarthur has quit IRC18:18
*** jpena is now known as jpena|off18:18
mgagnepabelanger: so I disabled force_raw_images in inap-mtl01, lets see how it goes18:19
pabelangermgagne: let me check my notes, I want to say there is a 2nd setting needed18:19
mgagnepabelanger: ok, let me know =)18:20
*** weshay|ruck|brb is now known as weshay18:20
*** weshay is now known as weshay|ruck18:20
pabelangermgagne: https://review.openstack.org/368955/18:20
pabelangermgagne: we also set libvirt/images_type to qcow218:21
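For reference, the nova settings being discussed as they would sit in nova.conf on the compute nodes; the values shown are illustrative, not inap's actual config:

    cat <<'EOF'
    [DEFAULT]
    force_raw_images = False
    use_cow_images = True

    [libvirt]
    images_type = qcow2
    EOF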
*** jamesmcarthur has joined #openstack-infra18:21
*** slaweq has quit IRC18:23
*** slaweq has joined #openstack-infra18:23
pabelangersmcginnis: dhellmann: do you mind trying your release-test project tag, would like to see if logging has been corrected (I believe it should be for release).18:24
mgagnepabelanger: could it be that image_type defaults to flat if cow is disabled? https://github.com/openstack/nova/blob/ffd59abf1635b35e38396468f9828e2d8cc85f09/nova/virt/libvirt/imagebackend.py#L114918:24
*** tosky has quit IRC18:24
*** felipemonteiro has joined #openstack-infra18:24
mgagnebut use_cow_images is true by default: https://github.com/openstack/nova/blob/f96e89cc5183e107cffeaf47525ab337c18d7e14/nova/conf/compute.py#L23018:25
*** felipemonteiro_ has joined #openstack-infra18:25
clarkbpabelanger: what server is the rsync running from?18:26
*** jamesmcarthur has quit IRC18:26
pabelangerclarkb: logs.o.o is where I found screen18:26
pabelangermgagne: I'm not sure, I'd have to look into it more18:26
mgagneok, or lets wait until tomorrow and see if it fixes anything. Or I can stop being lazy and test it =)18:27
pabelangermgagne: yah, infracloud was mitaka, so it might have been fixed since then18:28
mgagnepabelanger: oh, we are still running mitaka =)18:28
dmsimard|afkclarkb, pabelanger: the server the data is in is 104.130.246.18718:28
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool master: Partial revert for disabled provider change  https://review.openstack.org/53899518:28
clarkbdmsimard|afk: pabelanger cool thanks18:28
mgagnepabelanger: btw, why aren't you updating? should you be using ci/cd? /s =)18:28
ShrewsI'm not sure what made me discover it, but 538995 fixes a change that just merged.18:29
pabelangermgagne: lost access to hardware18:29
mgagnethat's a good reason I guess lol18:29
pabelanger:)18:29
*** felipemonteiro has quit IRC18:29
*** efried is now known as efried_hexchat18:29
*** r-daneel_ has joined #openstack-infra18:33
rosmaitahate to be a PITA (who am i kidding), but could someone take a look at https://review.openstack.org/536525 ? it's been showing as all success on zuul status page for a while now, but is still in the integrated queue18:34
rosmaita"for a while18:34
rosmaita" > 1 hour18:34
*** r-daneel has quit IRC18:34
clarkbrosmaita: it cannot merge until the things ahead of it in the gate queue merge or are rejected18:34
rosmaitaclarkb gotcha, i was only looking locally18:36
rosmaitanow the outlook is much bleaker when i look globally :(18:36
*** r-daneel has joined #openstack-infra18:37
*** spzala has joined #openstack-infra18:37
*** spzala has quit IRC18:37
*** r-daneel_ has quit IRC18:38
*** jamesmcarthur has joined #openstack-infra18:38
*** e0ne has joined #openstack-infra18:40
*** jamesmcarthur has quit IRC18:42
*** jamesmcarthur has joined #openstack-infra18:44
*** jaosorior has quit IRC18:48
mriedemare we ok to recheck things now?18:50
*** camunoz has joined #openstack-infra18:50
smcginnismriedem: Should be OK.18:51
mriedemok18:51
clarkbthe only thing you may run into at this point is the permissions issue on the log server which pabelanger is working to fix18:51
*** hemna_ has quit IRC18:52
clarkbpabelanger: any idea how far it has gotten at this point?18:53
*** yamahata has joined #openstack-infra18:53
*** shardy has quit IRC18:56
*** jamesmcarthur has quit IRC18:57
cmurphyhttps://review.openstack.org/#/c/537645/ is about to fail with a timeout in neutron-grenade and domino a bunch of other jobs, would it make sense to increase the timeout in neutron-grenade?18:58
*** tushar_ has quit IRC19:00
AJaegercmurphy: we increased timeout of unit tests already, do we see this more often? Then it makes sense...19:01
AJaegercmurphy: job lives in neutron repo19:01
*** slaweq_ has joined #openstack-infra19:01
AJaegercmurphy: http://zuul.openstack.org/builds.html?job_name=neutron-grenade19:02
clarkbinfra-root can you take a look at https://bugs.launchpad.net/openstack-ci/+bug/1745512 to see if my comment(s) are accurate?19:03
openstackLaunchpad bug 1745512 in OpenStack Core Infrastructure "openstack email server blacklisted" [High,Confirmed] - Assigned to OpenStack CI Core (openstack-ci-core)19:03
cmurphyi guess this is the first i've seen it for neutron-grenade in particular, it just seems like something or other is always timing out and taking out a queue of successful jobs with it19:03
*** jamesmcarthur has joined #openstack-infra19:03
*** tosky has joined #openstack-infra19:03
clarkbcmurphy: it's fairly common for things to break all over during feature freeze :/19:03
cmurphyokay, it's just really discouraging and i'm looking for ways to help19:04
clarkbcmurphy: what seems to happen is we merge a ton of code last minute that hasn't had the same level of review or testing in order to get it in with the idea we'll fix it before release19:05
*** slaweq_ has quit IRC19:05
clarkbthat coupled with the extra load in general for the extra demand results in sadness19:05
clarkband the result is we spend the next few weeks whack a moling things to make them happy again19:06
*** jamesmcarthur has quit IRC19:06
cmurphyyeah, i've definitely learned to plan better for next time19:07
*** lucasagomes is now known as lucas-afk19:08
*** jamesmcarthur has joined #openstack-infra19:09
*** dprince has quit IRC19:10
*** sshnaidm is now known as sshnaidm|afk19:12
*** jamesmcarthur has quit IRC19:12
*** trown|lunch is now known as trown|rover19:14
*** jamesmcarthur has joined #openstack-infra19:16
AJaegercmurphy's https://review.openstack.org/#/c/537645/ has timed out first roles and no output since 90 mins ;( we should take it out of the queue instead of waiting longer ;(19:17
pabelangerclarkb: only in 03, we have a long way to go19:17
pabelangerAJaeger: no, please done19:19
pabelangerdon't*19:19
cmurphyAJaeger: looks like it just aborted19:19
pabelangerI think zuul would have rerun it19:19
pabelanger:(19:19
AJaegerARGH ;(19:19
AJaegersorry, i rebased19:19
*** CrayZee has joined #openstack-infra19:19
*** CrayZee is now known as snapiri-19:19
cmurphyi didn't know that timing out jobs would automatically be rerun, that's good to know19:20
*** jamesmcarthur has quit IRC19:20
pabelangerit depends on the return result we get back from ansible, in some cases we requeue the job to try again19:21
AJaegercmurphy: I didn't either19:21
pabelangerespecially if provider is having issues19:21
pabelangerat least, that is what I have seen before19:21
AJaegerindeed, we do - should have looked closer ;(19:22
*** edmondsw has quit IRC19:23
openstackgerritMatthieu Huin proposed openstack-infra/zuul master: [WIP] zuul web: add admin endpoint, enqueue commands  https://review.openstack.org/53900419:23
*** harlowja has joined #openstack-infra19:25
pabelangerAJaeger: yah, no output is a bug in the timeout handler. Often seen if we have networking issues with a provider. We'd wait until the timeout value for all phases of post-run playbooks19:26
pabelangerI think it was hung on deleting SSH keys19:26
*** david-lyle has joined #openstack-infra19:27
*** tesseract has quit IRC19:31
*** snapiri- has quit IRC19:32
openstackgerritsebastian marcet proposed openstack-infra/openstackid-resources master: Added endpoint delete summit speaker assistance  https://review.openstack.org/53900719:35
*** jamesmcarthur has joined #openstack-infra19:38
*** xarses_ has quit IRC19:44
*** dprince has joined #openstack-infra19:45
*** kjackal has quit IRC19:48
*** mriedem1 has joined #openstack-infra19:52
*** mriedem has quit IRC19:52
*** xarses_ has joined #openstack-infra19:53
*** e0ne has quit IRC19:54
*** jamesmcarthur has quit IRC19:55
*** e0ne has joined #openstack-infra19:57
*** mriedem1 is now known as mriedem19:57
openstackgerritMerged openstack-infra/openstackid-resources master: Added endpoint delete summit speaker assistance  https://review.openstack.org/53900719:58
*** e0ne has quit IRC19:59
*** jamesmcarthur has joined #openstack-infra20:03
*** pramodrj07 has joined #openstack-infra20:04
*** slaweq_ has joined #openstack-infra20:08
*** Swami has quit IRC20:09
*** eharney has quit IRC20:09
*** jamesmca_ has joined #openstack-infra20:11
pabelangeranother gate reset by nova tox job, timed out again20:12
pabelangerwe should consider maybe promoting those patches20:12
AJaegerEmilienM: please wait with further approvals on the Zuul project name removal changes20:12
*** slaweq_ has quit IRC20:12
*** jamesmca_ has quit IRC20:13
pabelangerinfra-root: any objections to promote https://review.openstack.org/536936/ to help stop the integrated queue from resetting?20:13
AJaegerEmilienM: every change to zuul config files increases Zuuls memory usage and might lead us to kill again...20:13
EmilienMAJaeger: ok...20:13
*** jamesmcarthur has quit IRC20:13
*** jamesmca_ has joined #openstack-infra20:13
EmilienMAJaeger: I'm just reviewing patches20:13
EmilienMhow long should I wait?20:14
AJaegerEmilienM: please take a break on those that touch zuul yaml files - until the current ones are merged I guess20:14
pabelangersorry, it is 53793320:14
pabelangerbut 536936 is also important, but can look at that in a bit20:15
AJaegerpabelanger: I'm fine with 53793320:15
AJaegerpabelanger: 536936 only defines when jobs to run20:15
ianwseems ok (just catching up ...)20:15
*** Swami has joined #openstack-infra20:15
AJaegerso, important to not introduce regressions but it's on stable/ocata, so could wait as well a bit...20:15
AJaegermorning ianw20:15
EmilienMAJaeger: ok20:16
pabelangerAJaeger: I believe https://review.openstack.org/533608/ is part of the issue too20:16
pabelangerAJaeger: but 536936 and 536934 should also include timeout bump from 537933 otherwise, same issues will happen on stable branches20:17
pabelangermriedem: ^FYI20:17
cmurphy537933 seems to be timing out itself though :/20:18
clarkbthats interesting that they seem to blame kernel patches for meltdown?20:18
clarkbI wonder if we have more data on that20:18
*** ldnunes has quit IRC20:18
AJaegerpabelanger: agreed20:19
*** Goneri has quit IRC20:19
openstackgerritMatthieu Huin proposed openstack-infra/zuul master: zuul autohold: allow filtering per commit  https://review.openstack.org/53699320:19
mriedempabelanger: yeah i think i made a note to myself about that earlier today when the backports were failing b/c nova-tox-functional was timing out20:20
mriedemthat i might have to backport and squash the timeout change too20:20
pabelangercmurphy: I'm not sure what happened in tempest-full, but openstack-tox-functional-py35 is a duplicate test currently. Replaced by nova-tox-functional-py3520:20
pabelangermriedem: ya, i think that might be good.20:20
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Remove webapp  https://review.openstack.org/53678020:21
AJaegermriedem: what about removing depends-on from https://review.openstack.org/#/c/533608/ and merge the change?20:21
pabelangerAJaeger: another option is to just remove openstack-tox-functional-py35 from master branch (skip) then do so for other stable branches, until everything is properly backported in nova. To help keep master moving20:21
AJaegerpabelanger: yep, works as well. mriedem, what do you want? pabelanger and myself can +2A quickly...20:22
*** eharney has joined #openstack-infra20:22
pabelangerIMO: 533608 should be updated to use branches on stable only.20:22
mriedemah yeah we could do that,20:23
mriedemchange https://review.openstack.org/#/c/533608/ to be !master,20:23
pabelangeronce landed, turn our focus to stable branches20:24
pabelangermriedem: yah20:24
mriedemand then push the removal patch on top with the depends-on20:24
mriedemi can do that quick20:24
pabelangerk20:25
*** amoralej is now known as amoralej|off20:26
*** edmondsw has joined #openstack-infra20:30
*** rfolco|ruck is now known as rfolco|off20:31
*** Swami has quit IRC20:31
*** edmondsw has quit IRC20:32
*** edmondsw_ has joined #openstack-infra20:32
openstackgerritMatt Riedemann proposed openstack-infra/project-config master: Moving nova functional test def to in tree  https://review.openstack.org/53360820:34
openstackgerritMatt Riedemann proposed openstack-infra/project-config master: Only run openstack-tox-functional on nova stable branches  https://review.openstack.org/53901620:34
mriedempabelanger: AJaeger: ok i think this does it ^20:34
mriedemi didn't mess with the bottom patch to run on stable/queens since it doesn't exist yet, and figured these should all be merged by the time we cut stable/queens anyway20:35
*** agopi|out has quit IRC20:36
AJaegermriedem: looks good.20:37
AJaegermriedem: yeah, hope this can be solved quickly20:37
pabelanger+320:37
AJaegermriedem: might need 80 mins to get nodes - and then should merge quickly...20:38
pabelangeryah, lets see how long new nodes take20:39
mriedemafter waiting a week, i think i can wait 80 minutes :)20:41
mriedemgood idea on slicing that up btw20:41
AJaegermriedem: btw. you asked me a couple of days ago for a nova docs change - done in https://review.openstack.org/538163 . could you put that on your review queue, please?20:46
mriedemah yeah thanks20:46
mriedemi figured you'd find a suse minion20:46
clarkbpabelanger: I'm about to grab lunch but kids should be napping and I can help dig into the logstash stuff20:47
pabelangerclarkb: thanks, I haven't made much progress20:48
AJaegermriedem: was quicker fixing myself ;)20:48
AJaegermriedem: thanks20:49
clarkbpabelanger: es02 reports that cluster is happy and "green". Logstash-worker01 workers all seem to be writing to their log files did you do anything there?20:53
*** Swami has joined #openstack-infra20:53
pabelangerclarkb: I did restart workers on that server, but believe logstash-worker02 has the same issue, but I didn't restart anything20:54
clarkb02 looks happy too20:54
*** e0ne has joined #openstack-infra20:54
* clarkb scans all the servers really quick20:54
pabelangerk20:54
clarkbthe gearman server reports we are keeping up too fwiw20:55
pabelangerclarkb: oh, could it have been permissions on logs.o.o?20:55
clarkbpossibly20:55
pabelangeryah, might be related to that20:55
pabelangerso, I've spent most of today watching zuul.o.o, and it does appear we are spending a lot of time doing dynamic reloads.  Maybe 5-10mins at a time, then we get results from the queue, and proceed to do reloads again, get more results, etc. During the periods of reloads, we don't seem to be processing node requests, and unblock each time we get more results20:57
clarkbe-r reports happiness now too, so whatever it was, I'm guessing it was something related to the logs server20:57
pabelangerleading to wavy graphs in grafana: http://grafana.openstack.org/dashboard/db/nodepool20:57
AJaegerpabelanger: agree with your observation. So, this results in ready nodes that are waiting for jobs for those 5-10 mins and thus start later than they could?21:06
*** olaph has joined #openstack-infra21:06
*** olaph1 has quit IRC21:07
pabelangerAJaeger: yah, we end up pooling ready nodes (from nodepool) and eventually zuul has CPU to switch to in-use.21:09
pabelangerhaving the gate reset isn't helping, as I think it then triggers a new round of dynamic reloads21:10
smcginnisSo every few results, we end up introducing another 5-10 minutes of latency?21:12
AJaegersmcginnis: this hits us pretty hard with gate resets. Not really a problem with new changes added at the ned21:14
AJaegers/ned/end21:14
pabelangerwell, I'd like to avoid saying that is what's happening until we confirm from logs. But that is what I have just been noticing looking at zuul.o.o and grafana21:16
AJaegerpabelanger: it explains nicely what I'm seeing as well21:17
* AJaeger waves good night21:18
*** anticw has joined #openstack-infra21:22
*** dprince has quit IRC21:22
anticwis there a channel (this one?) to ask about zuul (v3) quirks/issues?21:22
*** slaweq_ has joined #openstack-infra21:22
clarkbanticw: if they are specific to openstack's use of zuulv3 this channel is a good place. But ifyou are running your own zuul for your CI then #zuul may be better21:24
anticwclarkb: specific to openstack zuulv3 ... thanks21:26
anticwre: http://zuul.openstack.org/ ... if i have a filter (most of the time) ... the bulk of the screen is taken up with queues that aren't relevant/useful ... how can i hide those?21:27
*** slaweq_ has quit IRC21:27
clarkbanticw: for that we'd likely need to modify the filtering javascript to remove empty pipelines after applying filters21:27
anticwre: .zuul.yaml ... i searched the docs, but couldn't find a way to do this ... is there a way to specify the *minimum* resources suitable for a builder?  some are slow and will time out, this is wasted effort for everyone21:27
clarkbanticw: all of our test nodes should be roughly equivalent. 8 vcpus, at least 80GB of disk, an external ip etc21:28
clarkbanticw: so as of right now there isn't much to distinguish the test nodes. There is talk of starting to add smaller instances for jobs like pep8 though which will likely be implemented via a different "label" that you can use in your nodesets21:29
anticwclarkb: some are unreliable, i thought about doing tests on hostname to detect known patterns of VMs which fail often but that feels a bit snarky21:29
clarkbanticw: can you be more specific about the unreliableness? I think the best way forward there is to address those problems instead of avoiding them entirely. We've explicitly tried to build a system that is resilient to cloud failures and that starts to break down if you exclude clouds21:30
anticwclarkb: for openstack-helm... it does... a lot...  and it's not uncommon to see docker and/or kubernetes timeouts21:31
anticwwhen i search the failed logs (this was a week or two back) some hosts seemed more problematic than others21:31
*** e0ne has quit IRC21:32
pabelangerwhat sort of timeouts related to docker / k8s?21:32
clarkbanticw: its not uncommon for cloud specific behavior to create problems but typically we can and have addressed that directly and have not just avoided the cloud entirely21:32
clarkbspecific details are helpful so that we can understand the actual problems here21:33
pabelanger+121:33
*** r-daneel has quit IRC21:33
*** r-daneel has joined #openstack-infra21:33
anticwclarkb: the builder script checks things are up ... and times out after say 10 minutes ... some are fine after five, some are not21:34
clarkbanticw: and that implies not all services are running in that amount of time?21:35
clarkbanticw: do we know if that is beacuse they are blocking on network io to get packages or images?21:35
anticwmy guess is slow IO21:35
anticwbut i don't think we know21:35
anticwportdirect: ? do we know?21:35
clarkbok, I think ^ is what we need to figure out before we start "solving" the problem21:35
anticwfair21:36
anticwlater today i will dig out recent failures related to things being abnormally slow and point at the specific log items21:36
clarkbwe've put a lot of effort into making things like caching proxies for docker images for example21:36
*** vivsoni has quit IRC21:36
anticwones from 2+ weeks ago i think are less useful21:36
pabelangeryah, wonder if maybe downloading packages from network, we've dealt with that in the past with regional mirrors / apache reverse proxy21:36
clarkband if you aren't using that proxy image downloads likely will be slow21:36
anticwpabelanger: i don't think it's image download performance21:36
pabelangereg: if you are downloading directly from docker.io, I can see that21:36
clarkbbut straightforward fix for problems like that is using the mirrors21:36
*** vivsoni has joined #openstack-infra21:36
clarkbetc21:36
anticwit's usually k8s pods are slow to be in a ready state21:36
*** dsariel has joined #openstack-infra21:37
pabelangeris there logs showing the failure?21:38
clarkbright we should dig into the causes of that then decide on the best way to address it21:38
*** kgiusti has left #openstack-infra21:38
anticwpabelanger: yeah, there are but as gate scripts change daily i will point only recent ones21:38
anticw(almost daily)21:38
*** jamesmca_ has quit IRC21:39
anticwmost builders have how much ram?  i'm going to start one here as a reference21:40
clarkbanticw: they all have 8GB of ram and 8vcpus21:41
*** jamesmcarthur has joined #openstack-infra21:42
anticwthanks21:42
openstackgerritMerged openstack-infra/nodepool master: Partial revert for disabled provider change  https://review.openstack.org/53899521:43
*** eharney has quit IRC21:43
openstackgerritMatthieu Huin proposed openstack-infra/zuul master: [WIP] zuul web: add admin endpoint, enqueue commands  https://review.openstack.org/53900421:43
*** myoung is now known as myoung|bbl21:46
*** Goneri has joined #openstack-infra21:47
*** jamesmcarthur has quit IRC21:53
*** olaph1 has joined #openstack-infra21:54
*** olaph has quit IRC21:55
*** jamesmcarthur has joined #openstack-infra21:56
*** andreww has joined #openstack-infra21:58
clarkbanticw: https://docs.openstack.org/infra/manual/testing.html is a general document on the topic22:01
*** xarses_ has quit IRC22:01
*** trown|rover is now known as trown|outtypewww22:02
*** dsariel has quit IRC22:02
anticwclarkb: thanks22:03
clarkbneeds an update though we got rid of all the static privileged VMs22:03
anticwit specifies the RAM which i've only seen through experience not documentation before22:04
*** threestrands has joined #openstack-infra22:05
*** threestrands has quit IRC22:05
*** threestrands has joined #openstack-infra22:05
*** threestrands_ has joined #openstack-infra22:07
*** threestrands has quit IRC22:08
*** threestrands_ has quit IRC22:08
*** threestrands has joined #openstack-infra22:08
*** threestrands has quit IRC22:08
*** threestrands has joined #openstack-infra22:08
*** jamesmcarthur has quit IRC22:09
openstackgerritClark Boylan proposed openstack-infra/infra-manual master: Update testing doc with zuul v3 info  https://review.openstack.org/53902922:13
*** dtruong has quit IRC22:13
clarkbupdated to reflect current situation a bit better22:13
*** jamesmcarthur has joined #openstack-infra22:14
pabelanger539016 is finally running jobs22:14
*** dmellado has quit IRC22:17
*** stevebaker has quit IRC22:17
*** stevebaker has joined #openstack-infra22:18
*** dmellado has joined #openstack-infra22:20
*** threestrands_ has joined #openstack-infra22:21
*** felipemonteiro_ has quit IRC22:22
*** threestrands has quit IRC22:23
pabelangerokay, fixed another permission issue on logs.o.o, had to chmod 0775 top-level directories. We might also want to update our publish playbooks to confirm that permission too22:23
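Roughly, the follow-up fix described above; the exact invocation is an assumption:

    find /srv/static/logs -maxdepth 1 -type d -exec chmod 0775 {} +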
*** edmondsw_ is now known as edmondsw22:26
clarkbis the chown still running? I am guessing it is22:28
*** ganso has quit IRC22:29
*** dbecker has quit IRC22:30
*** dave-mccowan has quit IRC22:31
*** dbecker has joined #openstack-infra22:31
*** jamesmcarthur has quit IRC22:33
pabelangeryah, up to 0822:33
*** jappleii__ has joined #openstack-infra22:35
*** jappleii__ has quit IRC22:36
*** jappleii__ has joined #openstack-infra22:37
*** jcoufal has quit IRC22:37
EmilienMgerrit is down?22:37
*** threestrands_ has quit IRC22:37
EmilienMssh: connect to host review.openstack.org port 29418: Network is unreachable22:37
pabelangerEmilienM: works for me22:40
*** bfournie has quit IRC22:40
EmilienMok22:41
EmilienMagain my canadian line22:41
*** mylu has joined #openstack-infra22:42
openstackgerritMatthieu Huin proposed openstack-infra/zuul master: zuul autohold: allow filtering per commit  https://review.openstack.org/53699322:43
EmilienMpabelanger: try pushing a patch22:44
EmilienMit doesn't work, wes tried as well22:44
pabelangerEmilienM: ssh review.openstack.org -p2941822:45
pabelangerthat work?22:45
EmilienMmeh22:45
*** mylu has quit IRC22:46
*** rcernin has joined #openstack-infra22:46
*** mylu has joined #openstack-infra22:46
pabelangerremote:   https://review.openstack.org/539036 Test22:46
pabelangerEmilienM: possible VPN issue?22:46
EmilienMpabelanger: it works now22:48
EmilienMweird22:48
EmilienManyway22:48
openstackgerritMerged openstack-infra/project-config master: Only run openstack-tox-functional on nova stable branches  https://review.openstack.org/53901622:49
pabelangermriedem: AJaeger: ^now merged22:50
*** mylu has quit IRC22:50
mriedemwoot22:52
mriedemabout 30 minutes for that to flush through rihgt?22:52
mriedem*right22:52
pabelangermaybe 120mins22:54
pabelangerjob looks gone now in gate for master branch22:55
pabelangerI'll keep an eye out for 53793322:56
pabelangerbut first grabbing some food22:56
*** slaweq_ has joined #openstack-infra23:00
bnemecEmilienM: I had similar intermittent problems in the past week or two.23:03
bnemecMy suspicion is that it was trying to use the ipv6 DNS entry.23:03
pabelangerclarkb: Shrews: looks like citycloud-sto2 might be full of ready nodes, and zuul doesn't know it.  I manually deleted a node, but maybe want to see why that is.  It looks to be around the time zuul was swapping a lot this morning23:03
bnemecI hard-coded the ipv4 address in /etc/hosts and haven't had a problem since.23:04
bnemecAt some point I'll have to look into a more permanent fix.23:04
*** slaweq_ has quit IRC23:04
*** dtruong has joined #openstack-infra23:04
pabelangerclarkb: Shrews: maybe this is where max-ready-age comes into play?23:05
*** jamesmcarthur has joined #openstack-infra23:05
*** edmondsw has quit IRC23:06
clarkbpabelanger: are there records for the nodes in zookeeper ?23:06
clarkb(I'm trying to determine that myself using zk-shell)23:07
*** weshay|ruck is now known as weshay|ruck|afk23:07
pabelangerclarkb: I haven't looked yet, but nodepool list shows them as ready / unlocked23:07
pabelangerso not sure why zuul isn't using them23:07
clarkbin that case they must be in zk as nodepool list is getting its info from there23:08
pabelangeroh, they just went in-use23:08
clarkbmy understanding is that the request handler is basically first to lock wins23:08
pabelangerI wonder if my delete maybe updated something in zookeeper23:08
*** salv-orlando has quit IRC23:08
*** ekcs has joined #openstack-infra23:08
clarkbso if the other clouds' threads are quicker at locking the requests then those nodes won't be used until the request handler for that cloud gets some locks23:08
*** salv-orlando has joined #openstack-infra23:09
*** tpsilva has quit IRC23:09
pabelanger| 0002238654 | citycloud-sto2         | centos-7         | 53d89441-aa55-4e0c-9b5d-c5bae64ef3a6 | 77.81.189.44    |                                        | ready    | 00:10:34:28 | unlocked |23:09
*** Goneri has quit IRC23:09
*** claudiub|3 has quit IRC23:10
pabelangerwould be good to see why that is still idle23:10
*** rlandy is now known as rlandy|bbl23:11
pabelangeryah, I only see a few centos-7 now ready for 10 hours in sto223:13
*** salv-orlando has quit IRC23:13
*** uberjay_ has joined #openstack-infra23:14
*** bfournie has joined #openstack-infra23:16
clarkbpabelanger: my understanding of how it works is there is a request handler thread for each pool. These poll zookeeper for new requests in zk and if they see one attempt to lock it. Once they have the request lock they check if they have any existing nodes to fulfill the request if they don't then they attempt to boot new instances for the request. If they are at quota they block until they are no23:16
clarkblonger at quota before fulfilling the request23:16
clarkbmy guess is that when zuul was under load it made a ton of node requests. sto2 fulfilled them but then zuul went away so the requests went away but we had nodes for them23:17
Shrewspabelanger: citycloud-sto2 is at quota23:18
clarkbah that would explain why it blocks23:18
*** uberjay has quit IRC23:18
*** felipemonteiro_ has joined #openstack-infra23:18
Shrewspabelanger: there are 4 ready nodes, the sto2 thread is trying to handle a request that wants a ubuntu-xenial node. none of those ready nodes are ubuntu-xenial23:18
Shrewsso it's paused waiting for quota release to build one23:19
*** bnemec has quit IRC23:20
pabelangerShrews: yah, so we must have blocked sto2 again (somehow) and my delete requests got things flowing again23:20
*** mlavalle has quit IRC23:20
*** bnemec has joined #openstack-infra23:20
clarkbpabelanger: your delete requests changed it fomr being at quota to not being at quota23:20
pabelangerclarkb: yah23:20
*** edmondsw has joined #openstack-infra23:20
clarkbI think the issue here is that if you block at quota but none of your existing nodes belong to zuul requests then you'll be deadlocked23:20
Shrewswhat did you delete?23:20
pabelangerShrews: I wanted to see if cloud was processing the request (maybe outage)23:21
pabelangerclarkb: I think if we had max-ready-age set to some value (2 hours) we would have deleted a node and unwedged. not the best, but a potential workaround23:22
Shrewsthat assumes you have something to delete23:22
clarkbya. Another approach may be to have nodepool decline requests if it would block and is at quota and has ready nodes23:22
clarkb(but that may result in failed requests unexpectedly)23:23
clarkbShrews: in this case the entire quota was consumed by ready nodes not tied to existing zuul reuqests23:23
clarkbI think because we restarted zuul23:23
clarkbso any of the ready nodes could be deleted23:23
pabelangeryah, believe so, too. They were unlocked for 10 hours23:24
Shrewsclarkb: can you expand more on "ready nodes not tied to existing zuul requests"?23:24
clarkbShrews: yes, sto2 had 50 ready nodes which put it at quota. So the next request it got made it block. Those ready nodes would never go away because the zuul process that requested them was stopped23:25
clarkbShrews: and since there is a single request handler per pool we weren't able to use those ready nodes in other zuul requests23:25
Shrewswait23:25
Shrewsso, i saw 5 ready nodes23:26
Shrewsare you saying all 50 were READY and locked?23:26
pabelanger50 ready and unlocked23:26
pabelangerhttp://grafana.openstack.org/dashboard/db/nodepool-city-cloud23:26
pabelangerfor ~10 hours23:26
Shrewsthat graph is not helpful for me. do either of you have raw zk data for that?23:27
clarkbI don't, just going off of what pabelanger said23:27
Shrewsnodepool list --detail output maybe?23:27
pabelangerI just have nodepool list23:27
*** olaph1 is now known as olaph23:27
Shrewspabelanger: may i see that?23:27
pabelanger1 sec23:27
Shrewsthx23:27
clarkbbasically zuul is running and gets into an unhappy state, while on its way to this unhappy state sto2 made a bunch of centos7 nodes for it. Then we restart zuul unlocking all of those nodes and "freeing" them up23:28
clarkbexcept that the next request sto2 processed was for a different flavor and thus blocked23:28
pabelangerShrews: clarkb: http://paste.openstack.org/show/657537/23:29
pabelanger$ sudo -H -u nodepool nodepool delete 000223803723:29
clarkbbasically deadlocking because it had used up its quota with a single label but was trying to fulfill a new request for a different label23:29
pabelangeris what I ran to test cloud23:29
clarkbwe can't free up nodes because no jobs can run on them and we can't boot new instance because we have no free quota23:29
clarkbdeleting one node allowed the blocked request to proceed then if the next request was for centos7 everything starts to get happy23:30
*** uberjay_ has quit IRC23:30
*** stakeda has joined #openstack-infra23:30
clarkbah ok so it wasn't 50 of a single label23:30
clarkblooking at that list I think trusty or debian or fedora or suse request would block though as they aren't xenial or centos723:31
*** uberjay has joined #openstack-infra23:31
pabelangerI'd have to see what nl04.o.o was doing (that is the launcher for citycloud)23:31
pabelangerbut because they were unlocked, I thought zuul would just iterate over unlocked nodes and use them23:32
Shrewspabelanger: was this list *after* restarting a zuul process?23:32
pabelangerShrews: at least 8 hours23:32
pabelanger1 sec23:32
pabelanger14:00 UTC is when I started zuul up again23:33
pabelangerso, 9.5 hours I'd say23:33
*** jamesmcarthur has quit IRC23:34
Shrewssomething isn't making sense here. i don't think i can evaluate it w/o seeing it in real time myself.23:36
pabelangerYa, I really should not have deleted the node23:36
*** s-shiono has joined #openstack-infra23:37
*** olaph has quit IRC23:37
*** olaph has joined #openstack-infra23:37
*** ekhugen- has joined #openstack-infra23:44
Shrewspabelanger: clarkb: ah, so further log digging shows that citycloud-sto2 was handling a request for an ubuntu-trusty node. I count 50 nodes (max-servers = 50 for sto2) in pabelanger's output23:44
Shrewsnone of those 50 were a trusty node23:44
pabelangerokay, so wedged right?23:45
Shrewspabelanger: right23:45
pabelangerk23:45
pabelangeralso :(23:45
Shrewspabelanger: so max-ready-age would definitely have helped in this scenario.23:45
*** ekhugen_alt has quit IRC23:45
*** igormarnat has quit IRC23:45
*** logan- has quit IRC23:45
*** StevenK has quit IRC23:45
*** clarkb has quit IRC23:45
*** jlvillal has quit IRC23:45
*** zeus has quit IRC23:45
*** _Cyclone_ has quit IRC23:45
*** mandre has quit IRC23:45
*** honza has quit IRC23:45
*** adarazs has quit IRC23:45
*** jistr has quit IRC23:45
*** dtantsur|afk has quit IRC23:45
*** r-daneel has quit IRC23:45
Shrewsi was just trying to make sure there was not a bug23:45
pabelangerk23:45
*** edmondsw has quit IRC23:46
* Shrews returns to his evening23:47
pabelangerShrews: thanks!23:47
*** edmondsw has joined #openstack-infra23:47
*** freerunner has quit IRC23:48
Shrewspabelanger: np. i think we could try setting a fairly low max-ready-age in our environment. maybe an hour? will need to consider that a bit more i suppose23:50
pabelangerShrews: yup, a good topic to discuss23:50
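A sketch of how max-ready-age would sit in a nodepool v3 config; the label name and the one-hour value are examples only:

    cat <<'EOF'
    labels:
      - name: ubuntu-xenial
        min-ready: 1
        max-ready-age: 3600   # seconds a READY node may sit unused before deletion
    EOF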
*** freerunner has joined #openstack-infra23:51
*** jamesmcarthur has joined #openstack-infra23:51
*** igormarnat has joined #openstack-infra23:51
*** logan- has joined #openstack-infra23:51
*** StevenK has joined #openstack-infra23:51
*** clarkb has joined #openstack-infra23:51
*** jlvillal has joined #openstack-infra23:51
*** zeus has joined #openstack-infra23:51
*** _Cyclone_ has joined #openstack-infra23:51
*** mandre has joined #openstack-infra23:51
*** honza has joined #openstack-infra23:51
*** adarazs has joined #openstack-infra23:51
*** jistr has joined #openstack-infra23:51
*** dtantsur|afk has joined #openstack-infra23:51
*** mylu has joined #openstack-infra23:51
*** edmondsw has quit IRC23:51
*** dave-mccowan has joined #openstack-infra23:53
*** jamesmcarthur has quit IRC23:55
*** hongbin has quit IRC23:57
