Monday, 2018-12-17

openstackgerritMerged openstack-infra/system-config master: Change openstack-dev to openstack-discuss  https://review.openstack.org/62538900:05
*** slaweq has joined #openstack-infra00:11
*** slaweq has quit IRC00:15
*** wolverineav has joined #openstack-infra00:29
*** wolverineav has quit IRC00:33
*** dkehn has quit IRC01:29
*** dkehn has joined #openstack-infra01:36
*** bobh has joined #openstack-infra01:44
*** wolverineav has joined #openstack-infra01:48
openstackgerritIan Wienand proposed openstack-infra/nodepool master: [wip] Add dogpile.cache master to the -src tests  https://review.openstack.org/62545701:49
*** oanson has quit IRC01:59
*** dave-mccowan has joined #openstack-infra02:07
*** slaweq has joined #openstack-infra02:11
*** hongbin has joined #openstack-infra02:11
*** slaweq has quit IRC02:16
*** wolverineav has quit IRC02:18
*** wolverineav has joined #openstack-infra02:25
*** wolverineav has quit IRC02:25
*** wolverineav has joined #openstack-infra02:25
*** hongbin has quit IRC02:28
*** bobh has quit IRC02:37
*** dave-mccowan has quit IRC02:47
*** bobh has joined #openstack-infra02:49
*** hongbin has joined #openstack-infra02:50
*** psachin has joined #openstack-infra02:58
*** mrsoul has quit IRC03:02
openstackgerritIan Wienand proposed openstack-infra/project-config master: Add github dogpile.cache to project list  https://review.openstack.org/62546703:06
openstackgerritIan Wienand proposed openstack-infra/nodepool master: [wip] Add dogpile.cache master to the -src tests  https://review.openstack.org/62545703:07
*** dklyle has joined #openstack-infra03:20
*** lbragstad has joined #openstack-infra03:26
openstackgerritMerged openstack-infra/project-config master: add release jobs for git-os-job  https://review.openstack.org/62527303:33
openstackgerritMerged openstack-infra/project-config master: import openstack-summit-counter repository  https://review.openstack.org/62529203:33
*** bhavikdbavishi has joined #openstack-infra03:41
*** ramishra has joined #openstack-infra03:42
*** ykarel has joined #openstack-infra03:43
*** udesale has joined #openstack-infra03:49
*** armax has quit IRC03:57
*** slaweq has joined #openstack-infra04:11
*** bobh has quit IRC04:12
*** slaweq has quit IRC04:16
*** hongbin has quit IRC04:37
*** ykarel has quit IRC04:40
*** eernst has joined #openstack-infra04:49
*** chandan_kumar is now known as chandankumar04:55
*** ykarel has joined #openstack-infra05:01
*** ykarel has quit IRC05:10
*** ykarel has joined #openstack-infra05:12
*** agopi has quit IRC05:15
*** dklyle has quit IRC05:16
*** wolverineav has quit IRC05:24
*** wolverineav has joined #openstack-infra05:25
*** wolverineav has quit IRC05:35
*** janki has joined #openstack-infra05:36
*** eernst has quit IRC05:36
*** rcernin has joined #openstack-infra05:46
*** markvoelker has joined #openstack-infra05:47
*** rcernin has quit IRC05:47
*** markvoelker has quit IRC05:51
*** bhavikdbavishi1 has joined #openstack-infra05:59
*** bhavikdbavishi has quit IRC06:01
*** bhavikdbavishi1 is now known as bhavikdbavishi06:01
openstackgerritOpenStack Proposal Bot proposed openstack-infra/project-config master: Normalize projects.yaml  https://review.openstack.org/62548706:08
*** rcernin has joined #openstack-infra06:09
*** rcernin has quit IRC06:09
*** rcernin has joined #openstack-infra06:09
*** rcernin has quit IRC06:09
*** quiquell|off has quit IRC06:09
*** slaweq has joined #openstack-infra06:11
*** slaweq has quit IRC06:16
openstackgerritRico Lin proposed openstack-infra/irc-meetings master: Change Heat meeting schedule  https://review.openstack.org/62549306:20
*** wolverineav has joined #openstack-infra06:35
*** wolverineav has quit IRC06:40
*** AJaeger has quit IRC07:00
openstackgerritMerged openstack-infra/project-config master: Normalize projects.yaml  https://review.openstack.org/62548707:02
*** AJaeger has joined #openstack-infra07:09
*** apetrich has joined #openstack-infra07:16
openstackgerritM V P Nitesh proposed openstack/diskimage-builder master: Adding new dib element  https://review.openstack.org/62550107:19
*** quiquell has joined #openstack-infra07:24
*** dpawlik has joined #openstack-infra07:26
*** yboaron_ has quit IRC07:28
*** jtomasek has joined #openstack-infra07:32
*** jtomasek has quit IRC07:33
*** jtomasek has joined #openstack-infra07:33
*** Emine has joined #openstack-infra07:33
*** oanson has joined #openstack-infra07:34
*** e0ne has joined #openstack-infra07:35
*** slaweq has joined #openstack-infra07:37
*** e0ne has quit IRC07:39
*** slaweq has quit IRC07:40
*** jpena|off is now known as jpena07:42
*** pgaxatte has joined #openstack-infra07:44
*** slaweq has joined #openstack-infra07:44
*** markvoelker has joined #openstack-infra07:48
*** yolanda has joined #openstack-infra07:50
*** jbadiapa has joined #openstack-infra07:57
*** ykarel is now known as ykarel|lunch07:58
*** rpittau has joined #openstack-infra07:58
*** pcaruana has joined #openstack-infra08:18
*** jpena is now known as jpena|away08:21
*** witek_ is now known as witek08:27
*** yboaron_ has joined #openstack-infra08:29
*** yamamoto has quit IRC08:30
*** yamamoto has joined #openstack-infra08:32
*** yboaron_ has quit IRC08:33
*** yboaron_ has joined #openstack-infra08:34
*** witek has quit IRC08:35
*** tosky has joined #openstack-infra08:36
*** gfidente has joined #openstack-infra08:38
*** ykarel|lunch is now known as ykarel08:39
*** ccamacho has quit IRC08:42
*** yamamoto has quit IRC08:46
*** alexchadin has joined #openstack-infra08:47
*** gibi has joined #openstack-infra08:52
*** agopi has joined #openstack-infra09:01
*** jpich has joined #openstack-infra09:02
*** shardy has joined #openstack-infra09:04
*** lpetrut has joined #openstack-infra09:09
openstackgerritWayne Chan proposed openstack/diskimage-builder master: Update mailinglist from dev to discuss  https://review.openstack.org/62551809:18
*** owalsh_ is now known as owalsh09:24
*** yamamoto has joined #openstack-infra09:26
*** tosky has quit IRC09:29
*** ccamacho has joined #openstack-infra09:31
odyssey4meclarkb mnaser ah, that old chestnut - we did resolve it in the rocky cycle, but have not ported that fix back to older branches... not sure we should either - what's going on there is that it tries to install the appropriate packages from the local mirror (a container on the host), then falls back to pypi if all the packages aren't on that local mirror09:31
*** tosky has joined #openstack-infra09:33
AJaegerodyssey4me: if you don't backport it, then stop running the broken jobs...09:34
odyssey4meclarkb mnaser Given that our approach in master/rocky seems to be successful - perhaps we can port it back. Let me propose it and see what the cores think about it.09:35
*** rossella_s has joined #openstack-infra09:35
odyssey4meAJaeger Stopping the jobs running means losing test coverage. The job is not broken, it is working as designed. It's just outputting some logs which appear to be interfering with something which hasn't been expressed. I'm happy to work with clarkb to get that resolved, but not happy to cut test coverage.09:36
*** bhavikdbavishi has quit IRC09:36
AJaegerodyssey4me: happy to see it addressed ;)09:38
*** yamamoto has quit IRC09:45
*** yamamoto has joined #openstack-infra09:45
*** ssbarnea|rover has joined #openstack-infra09:47
*** rtjure has quit IRC09:52
*** derekh has joined #openstack-infra09:58
*** e0ne has joined #openstack-infra10:02
*** yamamoto has quit IRC10:06
*** ginopc has joined #openstack-infra10:08
*** lbragstad has quit IRC10:10
*** jpena|away is now known as jpena10:12
*** xek has joined #openstack-infra10:16
*** yamamoto has joined #openstack-infra10:18
*** yboaron_ has quit IRC10:21
*** yboaron_ has joined #openstack-infra10:22
*** yamamoto has quit IRC10:23
*** sambetts_ has joined #openstack-infra10:29
*** pbourke has quit IRC10:34
*** pbourke has joined #openstack-infra10:36
*** aojea has joined #openstack-infra10:37
*** markmcd has quit IRC10:44
*** derekh has quit IRC10:46
*** derekh has joined #openstack-infra10:47
*** markmcd has joined #openstack-infra10:52
*** electrofelix has joined #openstack-infra10:52
*** yamamoto has joined #openstack-infra11:00
*** ginopc has quit IRC11:01
*** ginopc has joined #openstack-infra11:02
*** udesale has quit IRC11:10
mgoddardhello infra team, we are looking at upgrading the version of Docker used in kolla-ansible. Doing so brings in a new constraint on the URL of the Docker registry mirror - it cannot contain a path. The registry mirror provided currently in CI has a path - /registry-1.docker/. How difficult would it be to configure the mirror to also support use without a path? It's also important here to avoid a hard11:17
mgoddardbreak in existing jobs. The Docker bug on the topic is at https://github.com/moby/moby/issues/36598.11:18
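The path constraint mgoddard describes can be checked mechanically. A minimal sketch (the helper name is invented for illustration and is not part of any real tooling; the pathless example hostname is hypothetical): newer docker daemons accept a mirror URL of scheme + host(:port) only, per moby/moby#36598.

```python
from urllib.parse import urlparse

def mirror_url_ok(url):
    """True if a registry mirror URL should satisfy newer docker
    daemons: scheme + host(:port) only, no path component
    (see https://github.com/moby/moby/issues/36598)."""
    parsed = urlparse(url)
    return parsed.path in ("", "/") and not parsed.query and not parsed.fragment

# The CI mirror URL with its /registry-1.docker/ path is rejected...
assert not mirror_url_ok("http://mirror.dfw.rax.openstack.org:8081/registry-1.docker/")
# ...while a pathless variant (hypothetical hostname) would be accepted.
assert mirror_url_ok("http://mirror.example.org/")
```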
*** yboaron_ has quit IRC11:23
*** bhavikdbavishi has joined #openstack-infra11:30
*** e0ne has quit IRC11:31
*** yboaron_ has joined #openstack-infra11:33
*** rfolco has joined #openstack-infra11:37
*** yamamoto has quit IRC11:38
*** yamamoto has joined #openstack-infra11:38
*** tpsilva has joined #openstack-infra11:42
*** rtjure has joined #openstack-infra11:53
*** dkehn has quit IRC11:57
openstackgerritSorin Sbarnea proposed openstack-infra/elastic-recheck master: Identify *POST* timeout failures individually  https://review.openstack.org/62557312:03
openstackgerritSorin Sbarnea proposed openstack-infra/elastic-recheck master: Identify *POST* timeout failures individually  https://review.openstack.org/62557312:03
*** rpittau is now known as rpittau|lunch12:09
*** bhavikdbavishi has quit IRC12:09
odyssey4meclarkb mnaser AJaeger proposed the back ports to OSA's pike & queens branches: https://review.openstack.org/#/q/Ic966bafd04c4c01b3d93851a0e3ec2c1f3312f2812:13
openstackgerritSorin Sbarnea proposed openstack-infra/zuul-jobs master: Remove world writable umask from /src folder  https://review.openstack.org/62557612:14
*** jpena is now known as jpena|lunch12:30
*** janki has quit IRC12:31
*** bobh has joined #openstack-infra12:34
*** bobh has quit IRC12:38
*** udesale has joined #openstack-infra12:39
*** e0ne has joined #openstack-infra12:47
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Delay Github fileschanges workaround to pipeline processing  https://review.openstack.org/62558412:50
*** lpetrut has quit IRC12:51
*** bobh has joined #openstack-infra12:53
*** rlandy has joined #openstack-infra12:57
*** janki has joined #openstack-infra12:59
*** markvoelker has quit IRC13:05
*** boden has joined #openstack-infra13:11
*** e0ne has quit IRC13:11
*** rpittau|lunch is now known as rpittau13:11
*** boden has quit IRC13:12
*** bhavikdbavishi has joined #openstack-infra13:13
*** e0ne has joined #openstack-infra13:14
*** boden has joined #openstack-infra13:14
*** Bhujay has joined #openstack-infra13:15
*** bobh has quit IRC13:16
*** jamesmcarthur has joined #openstack-infra13:20
fricklermgoddard: I don't think that it should be difficult, just tedious. would need dedicated dns records per mirror and an appropriate vhost set up13:20
fricklerinfra-root: FYI this ^^ seems to be what is breaking zuul-quick-start jobs, too. seeing the same error on the node held for this http://logs.openstack.org/55/624855/3/check/zuul-quick-start/00d956c/job-output.txt.gz#_2018-12-17_12_36_58_996620 http://paste.openstack.org/show/737483/13:22
*** dave-mccowan has joined #openstack-infra13:23
*** smarcet has joined #openstack-infra13:23
*** alexchadin has quit IRC13:24
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Delay Github fileschanges workaround to pipeline processing  https://review.openstack.org/62558413:26
openstackgerritMerged openstack-infra/system-config master: Stop running unnecessary tests on trusty  https://review.openstack.org/62535813:26
*** yamamoto has quit IRC13:27
*** jpena|lunch is now known as jpena13:27
fricklermgoddard: FYI I added that topic for tomorrow's infra meeting, maybe you want to join us or read up on it afterwards13:29
*** yamamoto has joined #openstack-infra13:32
*** jamesmcarthur has quit IRC13:32
*** Bhujay has quit IRC13:33
*** rh-jelabarre has joined #openstack-infra13:33
fricklerhmm, actually we are using dedicated ports already. so maybe we can just drop the path if we update our mirror config. http://git.openstack.org/cgit/openstack-infra/zuul-jobs/tree/roles/install-docker/tasks/mirror.yaml13:35
fricklerthere even is a comment about it here, so IIUC it should be possible to simply switch from :8081/path to :8082/ http://git.openstack.org/cgit/openstack-infra/system-config/tree/modules/openstack_project/templates/mirror.vhost.erb#n38713:39
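frickler's reading above amounts to a one-line client-side change per job: point the docker daemon at the pathless :8082 vhost instead of :8081/path. A hedged sketch, assuming docker >= 1.6 (v2 registry protocol); the mirror hostname follows the region-local CI mirror naming and is illustrative only, and on a real node the target file would be /etc/docker/daemon.json followed by restarting dockerd:

```python
import json
import pathlib

# Switch from the path-style :8081 mirror to the pathless :8082 vhost
# by rewriting the daemon config. Hostname is illustrative (region-local
# CI mirror); "daemon.json" here stands in for /etc/docker/daemon.json.
mirror = "http://mirror.dfw.rax.openstack.org:8082"

conf_path = pathlib.Path("daemon.json")
conf_path.write_text(json.dumps({"registry-mirrors": [mirror]}, indent=2) + "\n")
```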
*** e0ne has quit IRC13:42
*** dave-mccowan has quit IRC13:42
*** gagehugo has joined #openstack-infra13:42
*** kgiusti has joined #openstack-infra13:42
openstackgerritJens Harbott (frickler) proposed openstack-infra/zuul-jobs master: Always use pathless docker mirror URI  https://review.openstack.org/62559613:43
fricklerinfra-root: ^^ I'm probably missing something here, but this might be a simple solution13:44
*** e0ne has joined #openstack-infra13:45
*** dkehn has joined #openstack-infra13:45
*** jamesmcarthur has joined #openstack-infra13:47
*** bhavikdbavishi has quit IRC13:49
*** quiquell is now known as quiquell|lunch13:49
*** pcaruana has quit IRC13:50
openstackgerritMerged openstack/os-testr master: Updated from global requirements  https://review.openstack.org/53399313:51
fungifrickler: mgoddard: or a dedicated tcp port?13:54
fungioh, i see that's in fact what we've already set up! ;)13:56
openstackgerritJens Harbott (frickler) proposed openstack-infra/nodepool master: Switch devstack jobs to Xenial  https://review.openstack.org/62485513:57
*** lbragstad has joined #openstack-infra13:58
*** lbragstad has quit IRC13:59
mgoddardfrickler: thanks for looking into this. So it looks like we can use 8082 without a path already?13:59
*** adriancz has joined #openstack-infra13:59
fungiyes, that seems to be why we added the 8082 variant13:59
fricklermgoddard: at least that's how I'm interpreting the situation currently. the above patch is testing this now14:00
fricklerhmm, though maybe a depends-on won't work there properly14:00
mgoddardfrickler: it looks like SamYaple hit this issue based on the comment in that file, so presumably it's working for him14:00
fricklermgoddard: yeah, but it still may vary depending on the exact docker version in use. but I think you could give it a try if you want, the mirror setup should be working already14:01
mgoddardfrickler: this looks promising: http://git.openstack.org/cgit/openstack/airship-maas/tree/tools/gate/playbooks/vars.yaml14:02
openstackgerritsebastian marcet proposed openstack-infra/puppet-openstackid master: Updated script to support PHP7  https://review.openstack.org/62495714:02
mgoddardfrickler: I'll give it a shot. Thanks for the help14:02
*** lbragstad has joined #openstack-infra14:04
*** pcaruana has joined #openstack-infra14:05
*** dave-mccowan has joined #openstack-infra14:06
*** smarcet has quit IRC14:06
*** mriedem has joined #openstack-infra14:08
*** bobh has joined #openstack-infra14:09
fricklerfwiw, this whole thing seems to have been triggered by backporting a recent version of docker.io into xenial-updates last thursday https://launchpad.net/ubuntu/+source/docker.io14:09
*** nhicher has joined #openstack-infra14:11
*** dave-mccowan has quit IRC14:11
*** bobh has quit IRC14:12
*** bobh has joined #openstack-infra14:13
*** jamesmcarthur has quit IRC14:13
*** smarcet has joined #openstack-infra14:14
*** jamesmcarthur has joined #openstack-infra14:14
*** quiquell|lunch has quit IRC14:15
*** quiquell has joined #openstack-infra14:15
dulekFolks, I'm seeing "kuryr-daemon 2609G" in dstat's top-mem column.14:15
dulekExample here: http://logs.openstack.org/27/625327/3/check/kuryr-kubernetes-tempest-daemon-openshift-octavia/cb22439/controller/logs/screen-dstat.txt.gz, around 10:2214:16
fungiwoo! kuryr likes it some memory i suppose?14:16
fungiat least that's a kuryr-specific job, so it's presumably not impacting more general job configurations14:16
*** ginopc has quit IRC14:16
*** psachin has quit IRC14:17
*** ginopc has joined #openstack-infra14:17
dulekHow possible is it that this is some dstat quirk? I strongly doubt that my process allocates more than 2 TB of memory without OOM stepping in.14:17
fricklerdulek: just allocating memory should be no issue as long as it isn't actually used14:19
openstackgerritMerged openstack/ptgbot master: Handle all schedule in a single table  https://review.openstack.org/60730714:20
dulekfrickler: You mean a problem for OOM; in general I guess it's not a good move to allocate 2 TB of RAM. ;)14:21
dulekfrickler: It drains 3 GB of swap, but yeah, looks like it stops there.14:21
*** graphene has joined #openstack-infra14:22
openstackgerritMerged openstack/ptgbot master: Split up function colorizing non-colored tracks  https://review.openstack.org/62003614:22
openstackgerritMerged openstack/ptgbot master: Load base schedule dynamically  https://review.openstack.org/60730814:22
openstackgerritMerged openstack/ptgbot master: Rename ~reload to ~emptydb  https://review.openstack.org/62003714:24
openstackgerritMerged openstack/ptgbot master: Make 'unbook' available for all  https://review.openstack.org/62004314:27
openstackgerritMerged openstack/ptgbot master: Add emergency messages (~motd and ~cleanmotd)  https://review.openstack.org/62004714:27
openstackgerritMerged openstack/ptgbot master: Give better hints in case of command errors  https://review.openstack.org/62005914:27
*** yboaron has joined #openstack-infra14:27
*** yboaron_ has quit IRC14:27
fricklerfungi: https://review.openstack.org/625596 passed and fixed the zuul-quick-start job for https://review.openstack.org/624855 , which in turn is needed to fix nodepool and unblock other things, if you have time for a review yet. other infra-root, too ;)14:27
openstackgerritMerged openstack/ptgbot master: Allow unscheduled tracks to use now/next  https://review.openstack.org/62006614:28
fungifrickler: thanks!14:28
openstackgerritWitold Bedyk proposed openstack-infra/irc-meetings master: Add second time for Monasca team meeting  https://review.openstack.org/62560914:29
*** kiennt26 has joined #openstack-infra14:30
fungifrickler: that's in zuul-jobs and looks like a potential behavior change for downstream consumers. merits discussing in #zuul or on the zuul-discuss ml at a minimum14:31
*** bobh has quit IRC14:31
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Delay Github fileschanges workaround to pipeline processing  https://review.openstack.org/62558414:32
openstackgerritSorin Sbarnea proposed openstack-infra/zuul-jobs master: Remove world writable umask from /src folder  https://review.openstack.org/62557614:33
*** yboaron_ has joined #openstack-infra14:36
*** yamamoto has quit IRC14:38
*** yboaron has quit IRC14:39
*** yamamoto has joined #openstack-infra14:39
*** graphene has quit IRC14:42
*** graphene has joined #openstack-infra14:44
*** yamamoto has quit IRC14:45
*** yamamoto has joined #openstack-infra14:47
*** bobh has joined #openstack-infra14:48
*** bobh has quit IRC14:52
*** yamamoto has quit IRC14:53
openstackgerritMonty Taylor proposed openstack-infra/project-config master: Add docker mirror url entries to site variables  https://review.openstack.org/62561514:55
*** kiennt26 has left #openstack-infra14:55
mordredfrickler, fungi, tobiash: ^^ that as a step one14:55
*** toabctl has quit IRC14:57
*** calbers_ has quit IRC14:57
*** toabctl has joined #openstack-infra14:57
*** calbers has joined #openstack-infra14:58
dhellmanngerrit-admin: when you have a moment, could someone please add me to the osc-summit-counter-core and osc-summit-counter-release groups? https://review.openstack.org/#/admin/groups/1991,members and https://review.openstack.org/#/admin/groups/1992,members14:58
fungidhellmann: done15:00
dhellmannfungi : thanks!15:00
*** beekneemech is now known as bnemec15:00
fungiany time!15:00
*** e0ne has quit IRC15:02
openstackgerritMonty Taylor proposed openstack-infra/zuul-jobs master: Update install-docker to use docker site variable  https://review.openstack.org/62561715:03
*** bobh has joined #openstack-infra15:03
mordredEmilienM|off, ssbarnea|rover: do you know if tripleo is using the install-docker role? and if so, are you still using an older docker that needs the old-style docker mirror?15:05
*** e0ne has joined #openstack-infra15:06
*** efried has joined #openstack-infra15:06
openstackgerritMerged openstack-infra/irc-meetings master: Change Heat meeting schedule  https://review.openstack.org/62549315:07
*** jamesmcarthur has quit IRC15:09
*** jamesmcarthur has joined #openstack-infra15:10
*** yboaron_ has quit IRC15:10
openstackgerritMerged openstack-infra/opendev-website master: Add .zuul.yaml  https://review.openstack.org/62413915:11
*** jamesmcarthur has quit IRC15:15
fricklermordred: fyi, codesearching for "registry-mirrors" also shows kayobe and various airship repos15:15
*** dpawlik has quit IRC15:17
*** dpawlik has joined #openstack-infra15:18
*** dpawlik has quit IRC15:18
openstackgerritMerged openstack-infra/opendev-website master: Add some initial content thoughts  https://review.openstack.org/62262415:18
openstackgerritMerged openstack-infra/opendev-website master: Convert initial content to html for publication  https://review.openstack.org/62414915:18
openstackgerritThierry Carrez proposed openstack-infra/puppet-ptgbot master: No longer needs room map in configuration  https://review.openstack.org/62561915:19
*** derekh has quit IRC15:24
ssbarnea|roverrlandy: one side-effect of running reproducer: https://review.openstack.org/#/c/625621/15:24
*** chandankumar is now known as chkumar|out15:26
*** ykarel is now known as ykarel|away15:26
*** yamamoto has joined #openstack-infra15:27
*** derekh has joined #openstack-infra15:27
*** pcaruana has quit IRC15:29
openstackgerritMerged openstack-infra/project-config master: Add github dogpile.cache to project list  https://review.openstack.org/62546715:31
corvusfungi: i didn't see a review from you on 625596 -- do you want to hold that change or do you think it's okay to merge (potentially being too openstack-specific for wider use at the moment)?15:33
corvus(i'm trying to catch up and don't have everything paged in)15:34
fungii was mostly just making sure the discussion in #zuul had played out first15:35
*** ykarel|away has quit IRC15:35
evrardjpmordred: it would be easier (by far!) if you have an explicit version of docker you're thinking about15:35
mordredevrardjp: it seems that docker 1.6 is where support for v2 registries happened, and that's in Trusty but the other things seem to have newer15:37
mordredevrardjp: so I'm starting to think that we don't really need to support the v1 registry format anymore15:37
mordredcorvus: I've got 2 followup changes up to potentially clean up the non-openstack portions15:38
corvusfungi, mordred: okay my read of #zuul is we should merge 596 now, and merge the frickler/tobiash/mordred stuff as soon as it's ready (which might be in a few mins)15:38
amorinhey all15:38
fungicorvus: i concur and have +2'd it15:38
mordredcorvus: biggest outstanding question is whether we want to attempt to support the old style v1 registry (other than by url overrides) - which I'm now leaning towards "no"15:38
mordredcorvus: yes15:38
*** andreww has joined #openstack-infra15:39
*** jamesmcarthur has joined #openstack-infra15:40
openstackgerritDoug Hellmann proposed openstack-infra/project-config master: add release job for osc-summit-counter  https://review.openstack.org/62562715:41
EmilienM|offmordred: we don't use this role afaik15:41
openstackgerritJens Harbott (frickler) proposed openstack-infra/nodepool master: [wip] Add dogpile.cache master to the -src tests  https://review.openstack.org/62545715:42
EmilienM|offBut I'll check when back on computer15:42
*** xarses_ has quit IRC15:42
smarcetfungi: morning could we retrigger https://review.openstack.org/#/c/611936/ ? it's failing on legacy unit tests15:48
evrardjpmordred: oh yeah, I am using v2 by default in all I write15:48
mgoddardfrickler: just to feed back, using port 8082 worked a charm. Thanks15:50
fungismarcet: wow, that job's so broken it doesn't even make it far enough to generate a console log?15:50
smarcetfungi: yeah it seems so15:50
smarcetfungi: not sure whats going on there15:50
mordredEmilienM|off: cool. I'm guessing you're using docker > 1.6 at this point too yeah?15:51
*** pcaruana has joined #openstack-infra15:51
corvussmarcet, fungi: best thing to do is retrigger it and then watch the log in the web browser while it's running15:52
tobiashfungi: I guess the logs of that job just were deleted given the job run was 6 weeks ago (the successful job also has no logs)15:52
fungismarcet: oh! i see, the job ran 6 weeks ago, so we've already expired those job logs15:52
corvusoh :)  that15:52
*** ykarel|away has joined #openstack-infra15:52
fungismarcet: any review comment starting with the word "recheck" and no accompanying vote will do that, i've added one now15:53
smarcetok will note that15:53
smarcetthx u :)15:53
*** slaweq has quit IRC15:54
*** armax has joined #openstack-infra15:55
openstackgerritsebastian marcet proposed openstack-infra/system-config master: Migrate OpenStackID dev server to php7  https://review.openstack.org/62564015:55
*** udesale has quit IRC15:57
clarkbamorin: hello15:59
*** dklyle has joined #openstack-infra15:59
*** slaweq has joined #openstack-infra16:01
clarkbinfra-root https://review.openstack.org/#/c/625350/ fixes our ansible base server application16:04
mordredEmilienM|off: fwiw - it looks like tripleo is still consuming the v1 proxy (which is fine)16:05
dmsimardclarkb: got someone to look at https://github.com/ansible/ansible/issues/4996916:05
*** janki has quit IRC16:05
clarkbdmsimard: ya I think they found the reason but odd an unhandled exception wouldnt result in failure16:05
clarkbcould be acouple things that need fixing in ansible16:06
*** quiquell is now known as quiquell|off16:06
mordredclarkb: maybe, since v2 support seems like a thing for all the docker versions people are using, we should get people transitioned from the :8081 mirror to the :8082 mirror16:07
clarkbmordred: if you look at apache usage it's 99% v1 due to tripleo and nothing uses v216:07
mordredclarkb: there don't seem to be many places it's used: http://codesearch.openstack.org/?q=8081%2Fregistry&i=nope&files=&repos=16:07
openstackgerritMerged openstack-infra/zuul-jobs master: Always use pathless docker mirror URI  https://review.openstack.org/62559616:07
*** aojea has quit IRC16:07
clarkbI don't think we care if people use one or the other since it's easy enough to run both?16:07
mordredclarkb: actually, I just looked at the 8081 logs and it does seem to be tripleo but it seems to be using v2 now16:07
mordred104.130.139.141 - - [17/Dec/2018:16:03:19 +0000] "GET /cloudflare/registry-v2/docker/registry/v2/blobs/sha256/ec/ecdfdc556e7deea5f905380baaa27c9770625870837e3bfc73e06c353644ab56/data?verify=1545065599-iSYZzie9ooFj%2Bl6t9bPFhVEEd3c%3D HTTP/1.1" 200 72000541 cache hit16:08
mordred"http://mirror.dfw.rax.openstack.org:8081/registry-1.docker/v2/tripleomaster/centos-binary-neutron-l3-agent/blobs/sha256:ecdfdc556e7deea5f905380baaa27c9770625870837e3bfc73e06c353644ab56" "docker/1.13.1 go/go1.9.4 kernel/3.10.0-957.1.3.el7.x86_64 os/linux arch/amd64 UpstreamClient(docker-sdk-python/3.5.0)"16:08
clarkbv2 through the v1 proxy? neat16:09
clarkbIm guessing thats achieved by not using docker itself16:09
*** gyee has joined #openstack-infra16:10
mordredaccording to the docs, the latest version of registry is supposed to support both and is supposed to support them seamlessly16:10
mordredclarkb: so maybe the thing that caused us to run two was a bug and has since been fixed?16:10
clarkbaiui it was the client side refusing to accept path'd mirrors16:11
*** graphene has quit IRC16:11
clarkbI think it is/was purely a client issue16:11
mordrednod. well - I think we've hit a point where the version of docker everyone is using can handle the v2 mirror16:11
*** ykarel|away is now known as ykarel16:12
clarkband with tripleo using podman etc there is a good chance their client tooling isn't so restrictive16:12
*** graphene has joined #openstack-infra16:12
openstackgerritJames E. Blair proposed openstack-infra/project-config master: Add check queue labels for relative-priority  https://review.openstack.org/62564516:15
corvusclarkb, AJaeger, mordred: ^16:15
*** Bhujay has joined #openstack-infra16:16
clarkbcorvus: thanks.16:17
clarkbit would be much appreciated if we can fix our ansible runs via https://review.openstack.org/#/c/625350/ or something similar before we all disappear for holidays too. This ended up being an ansible bug which I reproduced and filed upstream (link in the comments for that change)16:18
*** ccamacho has quit IRC16:19
*** jamesmcarthur has quit IRC16:21
*** ccamacho has joined #openstack-infra16:21
dulekfungi, frickler: Looks like kuryr's "appetite" for memory is just a dstat bug.16:23
mordredclarkb: lgtm16:23
mordredclarkb: yeah - podman works great with the v2 mirror (I've already tried)16:23
dulekfungi, frickler: I think it's splitting /proc/<pid>/stat the wrong way.16:24
*** Emine has quit IRC16:24
mordredclarkb: I have been meaning to write an install-podman role that will similarly setup the registry mirror16:24
mordredclarkb: incidentally, this list:16:26
mordredclarkb: registries = ['docker.io', 'registry.fedoraproject.org', 'quay.io', 'registry.access.redhat.com', 'registry.centos.org']16:26
dulekfungi, frickler: Yup, for kuryr it's multiplying virt-mem * PAGE_SIZE instead of rss.16:26
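dulek's diagnosis above (dstat scaling the wrong /proc stat field by the page size) can be illustrated by parsing the fields directly. A minimal sketch, assuming Linux and the field layout from proc(5): vsize (field 23) is already in bytes, and only rss (field 24, in pages) should be multiplied by PAGE_SIZE; the function name is invented for illustration and is not dstat's code.

```python
import os

def proc_mem(pid="self"):
    """Return (vsize_bytes, rss_bytes) for a process, per proc(5)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # comm (field 2) is parenthesized and may itself contain spaces or ')',
    # so split on the *last* ')' before splitting the remaining fields.
    rest = data.rsplit(")", 1)[1].split()
    vsize_bytes = int(rest[20])   # field 23: virtual memory size, in bytes
    rss_pages = int(rest[21])     # field 24: resident set size, in pages
    return vsize_bytes, rss_pages * os.sysconf("SC_PAGE_SIZE")
```

Multiplying field 23 by the page size, as dstat apparently did here, inflates a few GB of virtual size into apparent terabytes, which matches the 2609G reading.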
fungiinteresting16:26
mordredclarkb: is what gets installed into registries.conf by default for fedora16:26
mrhillsmanfollowing up on a discussion on the openci ml re opendev; are the svcs - in particular ml and git - available right now with custom domain?16:27
mordredmrhillsman: ml yes16:27
fungimrhillsman: the plan under discussion for git is to deprecate the custom domanis and use git.opendev.org16:27
fungier, domains16:27
mordredmrhillsman: git is currently avail with custom domain- but once the git farm is branded as opendev.org - the plan is to drop custom git domains16:27
corvusfungi, mrhillsman: or perhaps even just 'opendev.org'16:28
corvus(no 'git.')16:28
*** pcaruana has quit IRC16:28
fungioh, right, that was also an option16:28
mordredyeah. opendev.org/openstack/nova should work just fine16:28
mordredmrhillsman: so we'd be able to offer opendev.org/openci/foo - for instance16:28
mrhillsmanso no custom domains?16:28
corvusmrhillsman: https://review.openstack.org/623033  is the plan under discussion16:28
mrhillsmanacross anything?16:28
fungimrhillsman: for mailman mailing lists and web content we're likely to continue supporting custom domains going forward16:29
mrhillsmanwill the underlying things be available for someone to host to resolve that use case?16:29
mrhillsmani guess like how - cannot remember the name right now - the red hat zuul installer16:30
fungiso for example the https://opendev.org/openci/openci-website git repo could publish https://openci.org/ content hosted from the files.opendev.org server... something like that16:30
mrhillsmanif someone wants to use a custom domain to tie into opendev or flat out install everything in a different location16:30
fungimrhillsman: windmill?16:31
mrhillsmannah, brain fart, sec16:31
mrhillsmani think tristan does the updates for it16:31
mordredmrhillsman: software factory16:31
mrhillsmanyeah16:31
mordredmrhillsman: I think right now the thinking is that with a neutral base domain, it's similar to github.com or gitlab.com - and people don't seem to mind that for their git repo hosting, so the complexity of whitelabeling the git domains could be avoided16:32
clarkbmordred: re registries yes that is how you specify them now without paths or as a URI. Its jus a hostname then host is expected to speak the right protocol at the right url for it16:32
openstackgerritMerged openstack-infra/puppet-mediawiki master: Optionally alias to a favicon.ico file if provided  https://review.openstack.org/43908216:33
mordredmrhillsman: however, it's an in-discussion topic, so if there is a strong use case for whitelabeled git domains, now would definitely be the time to talk it through16:33
clarkband with the docker client tooling there is no way to say its foo.com/over/here iirc16:33
smarcetfungi: where could i get the source code for legacy-laravel-openstackid-unittests? i think that it's failing bc it's trying to use php5 and we need to update to php7.216:34
mordredclarkb: yah - I was mainly pointing out that rh is starting to ship things that expect to be able to talk to a set of registries not just dockerhub, so we might want to ponder adding those others to mirroring infrastructure so that it's possible to plop down a similar file that points to mirrors for all of that content16:34
clarkbmordred: ugh16:34
clarkbmordred: this is the nodejs don't actually host your packages (pypi before it too) problem all over again16:34
mordredclarkb: more of a heads-up than a needs-action at the moment16:34
mordredclarkb: actually- in newer tooling you can specify registry as part of container name16:35
mrhillsmanok, so to get the ml going what is needed?16:35
clarkbmordred: oh that's good16:35
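As an aside, the registry-as-part-of-the-name convention mordred mentions works by treating the first path component of an image reference as a registry host when it looks like one. A minimal sketch; the mirror hostname used here is hypothetical:

```python
# Sketch of the registry-in-the-name convention: an image reference is
#   [registry[:port]/]path/name[:tag]
# and the first path component is taken as a registry host when it looks
# like one (contains '.' or ':', or is 'localhost'). The mirror hostname
# below is hypothetical.
def qualify_image(image: str, mirror: str) -> str:
    """Prefix an unqualified image reference with a mirror registry host."""
    if "/" in image:
        first = image.split("/", 1)[0]
        if "." in first or ":" in first or first == "localhost":
            return image  # already names a registry
    return f"{mirror}/{image}"

print(qualify_image("openstack/nova:latest", "mirror.example.org:8080"))
# -> mirror.example.org:8080/openstack/nova:latest
print(qualify_image("quay.io/foo/bar", "mirror.example.org:8080"))
# -> quay.io/foo/bar (left alone)
```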
mordredmrhillsman: ml going is just a patch to system-config and making sure dns is set up properly. importing existing archives might be a little more work - I'll defer to fungi on that16:36
openstackgerritMerged openstack-infra/system-config master: Manage the favicon.ico file for the wiki  https://review.openstack.org/43908316:36
mrhillsmanok cool16:36
mordredmrhillsman: I'm happy to help with the project-config patch if that's the direction you decide you'd like to go16:36
corvusmrhillsman: example change to add a new list domain: https://review.openstack.org/56954516:37
fungimrhillsman: for lists.opendev.org i split it over a couple of changes: https://review.openstack.org/625096 sets up the domain, then https://review.openstack.org/625254 to add a specific ml in it16:37
mrhillsmani think everyone, according to the discussions on the ml and in meetings over the past few months, is ok with the change16:37
*** bhavikdbavishi has joined #openstack-infra16:38
clarkbnote that https://review.openstack.org/#/c/625350/ is necessary to get in first to have that config apply correctly16:38
mrhillsmanoh wait, fungi, so can there be lists.openci.io or has to be lists.opendev.org; i could have misunderstood16:38
mrhillsmangit single place, ml custom domain if you want?16:39
*** aojea has joined #openstack-infra16:39
fungimrhillsman: we have the ability to do either. what we haven't discussed yet is how we decide what domains we're willing to host white-labeled services for16:39
fungiat the ptg we said we'd at least do it for osf projects and pilot projects, but we didn't rule out supporting domains for other projects beyond that16:39
mordredyeah - and openci is the type of project I'd imagine being ok supporting custom domains for - if doing that is in-game16:40
*** _alastor_ has joined #openstack-infra16:40
mrhillsmanok. is that discussion a priority; has an expected decision date?16:40
mrhillsmanjust to get an idea16:41
fungii think it's a "discussion" i'd be comfortable having via review comments on a simple system-config change in gerrit16:41
mrhillsman++16:41
fungibut i don't know how any other stakeholders feel about it16:41
mrhillsmanunderstood16:41
clarkbya related to that I think it's in the class of thing we probably will end up tackling once it becomes something someone wants to do16:41
clarkbrather than try and write down all the rules beforehand16:41
fungithis is a nice opportunity to force the conversation to happen16:42
mrhillsmani'll push a patch16:42
clarkb(the concrete use cases likely help form better opinions too rather than guessing at use cases)16:42
mrhillsman++16:42
openstackgerritThierry Carrez proposed openstack-infra/irc-meetings master: Update count_slot_usage for new recurrences  https://review.openstack.org/62565616:42
openstackgerritThierry Carrez proposed openstack-infra/irc-meetings master: Add count_slot_usage argument for sensitivity  https://review.openstack.org/62565716:42
*** jamesmcarthur has joined #openstack-infra16:42
fungimrhillsman: also i'm happy to help with importing your archives if/once the list gets created on our server16:43
mrhillsmanty sir16:43
*** Emine has joined #openstack-infra16:48
*** aojea has quit IRC16:51
*** jamesmcarthur has quit IRC16:54
openstackgerritMerged openstack-infra/project-config master: Add check queue labels for relative-priority  https://review.openstack.org/62564516:54
*** jamesmcarthur has joined #openstack-infra16:55
openstackgerritsebastian marcet proposed openstack-infra/openstackid master: Migration to PHP 7.x  https://review.openstack.org/61193616:55
*** pgaxatte has quit IRC16:56
*** Bhujay has quit IRC16:56
*** graphene has quit IRC16:58
*** graphene has joined #openstack-infra17:00
openstackgerritJens Harbott (frickler) proposed openstack-infra/nodepool master: [wip] Add dogpile.cache master to the -src tests  https://review.openstack.org/62545717:00
openstackgerritClark Boylan proposed openstack-infra/opendev-website master: Add publishing of content to opendev-website  https://review.openstack.org/62566517:01
clarkbsuper simple start at publishing opendev website content. Starting with the logs server as I haven't dug into how the afs stuff works yet17:01
*** efried has quit IRC17:01
openstackgerritMerged openstack-infra/system-config master: Update favicon for newer OpenStack logo  https://review.openstack.org/43904517:01
*** graphene has quit IRC17:04
*** graphene has joined #openstack-infra17:06
*** fuentess has joined #openstack-infra17:07
openstackgerritJames E. Blair proposed openstack-infra/infra-specs master: Add opendev Gerrit spec  https://review.openstack.org/62303317:08
corvusclarkb: ^ that should be ready for voting.17:10
fricklermordred: do you want to approve https://review.openstack.org/624855 to fix nodepool to unblock sdk?17:10
corvuslooks like it's already on the agenda17:10
*** ramishra has quit IRC17:12
smarcetfungi: i am seeing the error now http://logs.openstack.org/36/611936/7/check/legacy-laravel-openstackid-unittests/c3619f5/job-output.txt.gz, says that no matching package php7.0 is available on the dist, but it does exist on xenial17:15
mordredfrickler: the -src jobs are still failing with that patch17:17
fungismarcet: http://logs.openstack.org/36/611936/7/check/legacy-laravel-openstackid-unittests/c3619f5/zuul-info/inventory.yaml indicates that job ran on an ubuntu-trusty node17:17
mordredfungi: although that explains why the sdk job was failing with zypper issues17:18
smarcetfungi: ok where could i change that to run on xenial?17:18
fricklermordred: yes, but we need to start somewhere, then fix the dogpile.cache issue, then onwards. step by step, I'd say17:21
openstackgerritMerged openstack-infra/zuul master: Use combined status for Github status checks  https://review.openstack.org/62341717:21
mordredfrickler: ah - ok. actually - lemme look at something real quick ...17:21
openstackgerritClark Boylan proposed openstack-infra/opendev-website master: Publish opendev website to afs on merge.  https://review.openstack.org/62567117:21
clarkband that I think should work for publishing to afs17:21
*** derekh has quit IRC17:21
*** jtomasek has quit IRC17:21
fungismarcet: seems it's set at https://git.openstack.org/cgit/openstack-infra/openstack-zuul-jobs/tree/zuul.d/zuul-legacy-jobs.yaml#n433 but we should be looking at moving that job definition into the openstackid repo and updating it to be zuulv3-native per https://docs.openstack.org/infra/manual/zuulv3.html17:22
mordredfrickler: ok. it does seem like we are at least appropriately installing openstacksdk from source (just wanted to make sure)17:22
fungithat job was merely auto-converted from the old jenkins job definitions17:22
clarkbegonzalez: I've noticed that quite a few hits for http://status.openstack.org/elastic-recheck/index.html#1708704 are for kolla jobs that don't appear to be using our local in cloud region mirrors. Is that something we can help you/kolla clean up?17:22
fricklermordred: although this error looks like something new now http://logs.openstack.org/55/624855/4/check/nodepool-functional-py35/32ff9b2/job-output.txt.gz#_2018-12-17_15_16_36_77033117:23
*** jpich has quit IRC17:23
*** graphene has quit IRC17:24
smarcetfungi: ok got it17:24
smarcetthx u17:24
*** graphene has joined #openstack-infra17:25
*** trown is now known as trown|lunch17:25
egonzalezclarkb patch is being merged, was because a package was removed from rpm repos, so image build were failing17:25
*** eernst has joined #openstack-infra17:27
*** eernst has quit IRC17:27
*** eernst has joined #openstack-infra17:28
clarkbegonzalez: have a link to that fix?17:29
clarkbegonzalez: looks like opendaylight ?17:30
egonzalezclarkb https://review.openstack.org/#/c/623426/17:31
egonzalezvitrage17:31
clarkbegonzalez: ok, looking at the logstash hits for that bug on the kolla jobs it seems related to opendaylight somehow (maybe that is the flaky repo?)17:32
*** eernst has quit IRC17:32
egonzalezhrm, we have an odl repo you may not have mirrored17:33
clarkbmriedem: Do we want to send one last CI status update before we all disappear? We've aggregated repos by logical group/queue for prioritizing node assignments, we've fixed OVS installation repo setup for multinode bootstrapping on Centos. That's the infra stuff. QA/Devstack have updated cirros image to version 0.3.6 from 0.3.5 and merged dansmiths direct-io change. I think nova has done some stuff too17:33
egonzalezclarkb baseurl=https://cbs.centos.org/repos/nfv7-opendaylight-6-release/x86_64/os but moving to 9th release now17:34
clarkbegonzalez: ya, we've found centos.org is quite flaky :(17:35
clarkbso we may want to add a mirror/cache for that repo if we don't already have it17:35
*** smarcet has quit IRC17:36
egonzalezprobably, given the odl packages are quite big >300mb17:36
clarkbhttp://mirror.dfw.rax.openstack.org/centos/7/ is what we mirror from centos.org. Which is a different repo than the opendaylight repo. Let me see if we proxy cache it17:37
*** bhavikdbavishi has quit IRC17:37
*** ykarel is now known as ykarel|away17:37
fungi300mb? wow, no wonder they don't carry it in the distro17:37
fungithat's pretty massive for a distro package17:38
clarkbegonzalez: we proxy cache https://nexus.opendaylight.org/ at http://$mirror_node:8080/opendaylight17:38
mriedemclarkb: sure if you want, i'd just ask you don't tag it with [infra] since then it filters for me to a folder i don't normally read :)17:38
clarkbnot sure if that location also hosts the same package repos17:38
mriedemclarkb: i haven't assessed where we are with the nova-specific gate stuff yet, trying to focus on some other things first this week before i'm out next week17:38
clarkbmriedem: no worries. I think the general push towards "people please look at this stuff" has resulted in good results all around17:39
jrosseri have a mirror related question, we get this var in a job NODEPOOL_CEPH_MIRROR=http://mirror.mtl01.inap.openstack.org/ceph-deb-hammer which isn't totally helpful for getting at ceph != hammer..... is there something else i should be doing to construct the path to the ceph mirror?17:40
clarkbegonzalez: I don't know the mapping between opendaylight release numbers and names but that location does have centos repos for opendaylight releases17:41
clarkbjrosser: that is an unfortunate side effect of how every ceph release is a different repository17:41
clarkbjrosser: I think we picked the one to advertise based on whatever devstack + nova + cinder are/were testing at the time17:42
clarkbjrosser: but if you browse eg http://mirror.dfw.rax.openstack.org/ you'll see we have different versions available17:42
jrosserhmm ok17:43
openstackgerritMerged openstack-infra/nodepool master: Switch devstack jobs to Xenial  https://review.openstack.org/62485517:44
clarkbjrosser: we can bump the globally advertised version too if we just do a quick check it won't explicitly break everyone and send a note to people that it is happening17:44
jrosserhow about a proxy for download.ceph.com? is that feasible17:45
jrosserbecause it's hard to write code once that works in CI and in the outside world right now17:45
clarkbjrosser: yes, however that won't solve the select different url problem as we don't do transparent proxies. (the issue there being we can't restrict usage of transparent proxies so rather than be bad internet citizens we reverse proxy specific backends)17:46
clarkbalso the mirrors we have there should be far more reliable than a caching proxy17:46
*** e0ne has quit IRC17:46
clarkbjrosser: the way we try to handle this generally is have job setup apply the appropriate mirror info then have the job test workload skip over it17:48
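To make jrosser's "construct the path to the ceph mirror" problem concrete, here is one way a job could derive a sibling release's URL from the advertised NODEPOOL_CEPH_MIRROR value. The side-by-side /ceph-deb-&lt;release&gt; layout is an assumption based on browsing the mirror root, not a guarantee:

```python
from urllib.parse import urlsplit, urlunsplit

def ceph_mirror_for(release: str, advertised: str) -> str:
    """Rewrite the advertised NODEPOOL_CEPH_MIRROR URL for another release.

    Assumes the per-release repositories sit side by side on the same
    mirror host as /ceph-deb-<release>, as seen when browsing the mirror
    root; that layout is an assumption, not a guarantee.
    """
    parts = urlsplit(advertised)
    return urlunsplit(parts._replace(path=f"/ceph-deb-{release}"))

print(ceph_mirror_for(
    "luminous", "http://mirror.mtl01.inap.openstack.org/ceph-deb-hammer"))
# -> http://mirror.mtl01.inap.openstack.org/ceph-deb-luminous
```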
egonzalezclarkb cannot find odl repos in here http://mirror.dfw.rax.openstack.org17:49
clarkbegonzalez: ya I think the only thing is the proxy for nexus.opendaylight.org as noted above17:50
egonzalezis there a list of the mirrors i can look at to point the ci jobs?17:50
clarkbegonzalez: https://git.openstack.org/cgit/openstack-infra/system-config/tree/modules/openstack_project/templates/mirror.vhost.erb is going to be most correct as it is the apache config for the reverse proxies17:51
egonzalezclarkb thanks will take a look17:51
clarkbnavigating eg http://mirror.dfw.rax.openstack.org shows you all of the content we mirror out of AFS17:51
clarkbthen we reverse proxy cache along side that things that are less practical to mirror on AFS. At this point the AFS mirrors are largely for the main distro mirrors17:52
*** Emine has quit IRC17:54
*** Emine has joined #openstack-infra17:54
*** diablo_rojo has joined #openstack-infra17:54
jrosserclarkb: i'd have to think harder about it as there is xenial/bionic/centos vs which ceph release is packaged for those vs. which openstack release is being tested in any particular job - so bumping the release in NODEPOOL_CEPH_MIRROR just changes the shape of the problem19:54
*** smarcet has joined #openstack-infra17:55
*** _alastor_ has quit IRC17:57
*** tosky has quit IRC17:57
*** Emine has quit IRC17:58
*** _alastor_ has joined #openstack-infra17:59
openstackgerritSalvador Fuentes Garcia proposed openstack-infra/openstack-zuul-jobs master: kata-containers: add /usr/sbin to the PATH  https://review.openstack.org/62567918:04
*** smarcet has quit IRC18:13
*** jpena is now known as jpena|off18:14
*** rpittau has quit IRC18:16
clarkbfungi: I think I tracked down one possible cause of those post failures18:17
clarkbThe error was: template error while templating string: no filter named \'bool\'. String: {% if zuul_site_upload_logs | default(true) | bool or (zuul_site_upload_logs == \'failure\' and not zuul_success | bool) %}18:17
clarkbfrom the executor logs18:17
clarkbI'm working on a fix18:17
fungioh fun18:17
clarkbhrm here I thought it was going to be called boolean or similar but docs seem to imply bool is correct18:18
clarkbgit.openstack.org/openstack-infra/zuul-jobs/roles/upload-logs/tasks/main.yaml is the file it seems unhappy about18:19
*** jamesmcarthur has quit IRC18:19
*** wolverineav has joined #openstack-infra18:21
fungiwrong parenthetical grouping there maybe?18:21
fungishould it be (zuul_site_upload_logs == 'failure' and not zuul_success) | bool18:21
fungior is it just zuul_success that needs to be recast there?18:22
fungii guess the == operator already renders a bool?18:23
clarkbya maybe it can't find a valid filter called bool to convert a boolean to a boolean?18:23
clarkbin which case it could be a grouping issue18:23
*** jamesmcarthur has joined #openstack-infra18:24
*** imacdonn has quit IRC18:24
clarkbdmsimard: mordred ^ you probably know more about ansible jinja2 filters than we do18:24
corvusclarkb: can you point me at what you're looking at?18:24
*** imacdonn has joined #openstack-infra18:24
*** smarcet has joined #openstack-infra18:24
clarkbcorvus: I'm looking at the output of `ssh ze07.openstack.org grep 5c245a7825554131aeaabf7f589cc28b /var/log/zuul/executor-debug.log`18:24
clarkbwhich is the executor log for a job that failed POST_FAILURE on my ansible crash fix change18:25
corvusclarkb: did we change something related to that recently?18:25
clarkbnot that I am aware of18:25
*** ykarel|away has quit IRC18:25
fungicould https://github.com/ansible/ansible/issues/31115 be related?18:25
clarkbfungi: maybe? I don't think we override any of the jinja filters18:28
fungik18:29
*** _alastor_ has quit IRC18:31
fungiseems odd it wouldn't find one of its builtin filters18:31
*** electrofelix has quit IRC18:32
*** _alastor_ has joined #openstack-infra18:32
*** dpawlik has joined #openstack-infra18:32
clarkbfungi: ya makes me think you may be on to something with the typing being important18:34
*** wolverineav has quit IRC18:34
clarkbif jinja2 cares about input types it may be trying to say no valid bool filter for that input type found?18:35
*** wolverineav has joined #openstack-infra18:35
clarkblooks like jinja2 doesn't ship bool, that must come from ansible18:36
*** jamesmcarthur has quit IRC18:37
clarkbthe ansible code implies any type is valid as input though18:38
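To make the failure mode above concrete: stock jinja2 ships no `bool` filter (Ansible registers its own in lib/ansible/plugins/filter/core.py), so compiling the template from the executor error fails until a filter by that name exists. A minimal reproduction, with a simplified stand-in for Ansible's to_bool rather than the real implementation:

```python
import jinja2

# The template from the executor error, verbatim apart from quoting.
TEMPLATE = (
    "{% if zuul_site_upload_logs | default(true) | bool or "
    "(zuul_site_upload_logs == 'failure' and not zuul_success | bool) %}"
    "upload{% else %}skip{% endif %}"
)

env = jinja2.Environment()
try:
    env.from_string(TEMPLATE)
except jinja2.TemplateAssertionError as exc:
    # Stock jinja2 rejects the template at compile time: no filter named 'bool'
    print("compile failed:", type(exc).__name__)

# Simplified stand-in for Ansible's to_bool; not the actual implementation.
def to_bool(value):
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in ("yes", "on", "1", "true", "t")

env.filters["bool"] = to_bool
out = env.from_string(TEMPLATE).render(zuul_success=True)
print(out)  # default(true) makes the first operand win: "upload"
```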
*** wolverineav has quit IRC18:38
*** Vadmacs has joined #openstack-infra18:38
*** wolverineav has joined #openstack-infra18:38
corvusit ran with "-e zuul_success=True"18:39
*** smarcet has quit IRC18:39
corvuszuul_site_upload_logs should not be defined currently18:40
fungilooks like that line was last altered in https://review.openstack.org/611622 which merged 2018-10-19 so it's been that way a couple months now18:40
*** trown|lunch is now known as trown18:41
corvusclarkb, fungi: i have no idea what happened, but it doesn't seem normal.18:41
clarkblib/ansible/plugins/filter/core.py is where to_bool is defined18:41
*** e0ne has joined #openstack-infra18:42
clarkband we don't seem to redefine that in zuul18:42
clarkbzuul does define its own set of filters though18:42
clarkblast updated in october as well18:43
corvusclarkb, fungi: puppet performed an install_zuul around that time.18:44
clarkboh interesting18:44
corvusDec 17 17:38:25 ze07 puppet-user[21405]: (/Stage[main]/Zuul/File[/opt/graphitejs]/ensure) removed18:45
corvusDec 17 17:40:07 ze07 puppet-user[21405]: (/Stage[main]/Zuul/Exec[install_zuul]) Triggered 'refresh' from 1 events18:45
corvusthat means install_zuul happened between those 2 timestamps, right?  the log error was at 17:39:40,09818:45
clarkbyes puppet logs after it is done performing a task iirc18:45
fungiwas there a removal/reinstallation of ansible then?18:46
*** e0ne has quit IRC18:46
corvusthe version on the host is from december 1318:46
corvusso i wouldn't expect it... but maybe?18:46
fungijust theorizing maybe jinja was looking for the filter plugin while pip was in the process of uninstalling/reinstalling ansible18:47
corvusfungi: yeah, i think that theory best fits most of the facts; the only thing missing is a stronger suggestion that pip touched ansible itself18:48
*** mriedem has quit IRC18:48
corvusi don't know if we have those logs18:48
clarkb2.5.14 was released today18:48
clarkber18:48
clarkbI'm bad at dates18:48
clarkbon the 13th18:48
fungi`stat /usr/local/lib/python3.5/dist-packages/ansible/plugins/filter/core.py` on ze07 says Modify: 2018-12-17 17:39:24.512241363 +000018:49
clarkbya ok you found that version already18:49
egonzalezclarkb added a patch to use proxies for ODL and percona repos https://review.openstack.org/#/c/625688/, hopefully it fixes the elastic recheck18:49
clarkbegonzalez: great, thanks18:49
clarkbegonzalez: hopefully makes your jobs more reliable :)18:49
egonzalezyeah, we have a lot of timeouts on percona repos18:50
fungiclarkb: corvus: so i think that tells us it did at least overwrite that file around the time we hit the error18:50
*** mriedem has joined #openstack-infra18:51
corvusfungi, clarkb: oh, ha -- i think it just never occurred to me that we would go for 6 days without merging a zuul change, but that has happened18:52
corvusso there was no trigger to install the new ansible until now18:52
fungiit _was_ a quiet week18:52
*** jamesmcarthur has joined #openstack-infra18:53
clarkbah ok so it was the ansible upgrade, but needed zuul repo update to trigger the bits to pull that in18:53
fungiso anyway, pretty sure we have evidence to at least say that upgrading ansible in-place while an executor is active can be detrimental to the running jobs ;)18:53
corvusi think that ties up all the loose ends.  i believe we knew this was a possibility, but figured ansible releases are infrequent enough that we just wouldn't worry too much when it happened.18:53
clarkbmakes sense (though annoying it has a race there)18:53
corvuslong-term, we may be able to make this better with the multi-version work18:53
fungiagreed, i'm not too worried about it, chalk this up to continuous deployment failure modes to be on the lookout for18:54
fungihonestly, i'm amazed that we upgraded ansible across a dozen busy executors and only saw a handful of post_error results as fallout18:54
*** smarcet has joined #openstack-infra18:55
fungiclarkb: though as a result, i doubt this explains the post_failure results ssbarnea|rover noticed over the weekend18:55
clarkbcorvus: could also possibly be made more reliable if ansible didn't lazy load that stuff, but not sure that is desirable (may impact memory or startup time)18:55
clarkbfungi: yup agreed18:55
ssbarnea|roverfungi: clarkb: i think that I found the root cause for the post timeouts, originating with an ansible security fix: https://github.com/ansible/ansible/pull/42142/files18:58
*** shardy has quit IRC18:58
ssbarnea|roverif you look in our logs, you will find that almost all tripleo builds present such a warning, because, for some .... reason our /home/zuul/src is made world writable.18:59
fungioh, neat!18:59
ssbarnea|roveri cannot blame ansible for that, it is our fault for doing such an insane chmod19:00
ssbarnea|rovertripleo contains some ssh arguments that are supposed to prevent frozen tasks at https://github.com/openstack/tripleo-quickstart/blob/457e61fb73eb55153cd4b8105c6090b9730c13be/ansible.cfg#L2119:00
ssbarnea|rovermainly the ServerAliveInterval19:01
*** dpawlik has quit IRC19:01
*** e0ne has joined #openstack-infra19:01
ssbarnea|roverif the config is not loaded, .... i am not sure what ssh_args will end up using, but i have the impression that this may be the cause.19:01
*** dpawlik has joined #openstack-infra19:01
clarkbfungi: https://review.openstack.org/#/c/625095/1 is the proxy change for pip caching from last week if you want to take a look19:04
fungissbarnea|rover: yes, that's a good bet. particularly on centos we've witnessed headless ssh invocations hang when trying to close down their socket19:06
clarkbfungi: I had that change and the ansible fix on the meeting agenda to bring them up in case we don't get them in today (as I'm going to be afk latter half of the week and didn't want that getting lost)19:07
ssbarnea|roverfungi: clarkb now if you can help me fix this it would be great. My first attempt was to look at https://review.openstack.org/62557619:08
ssbarnea|roverwhich is supposed to be tested via https://review.openstack.org/#/c/625680/ (still in queue)19:08
*** _alastor_ has quit IRC19:08
ssbarnea|roverthe problem is that my test of a similar change on rdo (forked role) failed to fix the permission as ansible kept complaining about world writable folder. Is u=rwX,g=rX,o=rX wrong?19:09
*** Emine has joined #openstack-infra19:10
ssbarnea|roveror this only adds new permissions without removing w from o?19:10
openstackgerritsebastian marcet proposed openstack-infra/openstack-zuul-jobs master: Update openstackid jobs  https://review.openstack.org/62569119:10
openstackgerritMerged openstack-infra/system-config master: Copy pasta the debian base server bits, don't include them  https://review.openstack.org/62535019:11
*** jamesmcarthur has quit IRC19:11
clarkbssbarnea|rover: I would use a bitmask instead to avoid ambiguity around that19:12
clarkbbut ya you may need -w to remove the flag using that mode of chmod?19:12
ssbarnea|roverclarkb: that runs recursively on both files and directories, so not sure how to ensure directories are still exec.19:12
ssbarnea|roverthat line is a chmod, i hope you are not trying to tell me that we need two chmods there :D19:13
clarkbssbarnea|rover: a quick test with chmod itself has that working (no need to -w)19:14
ssbarnea|roverbased on https://serverfault.com/a/35107/10361 it should be correct.19:14
clarkbssbarnea|rover: possibly a bug in ansible if it doesn't work19:14
clarkbone way to check that could be to exec chmod?19:14
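clarkb's quick test can be reproduced directly. The answer to ssbarnea|rover's question is that a symbolic '=' assignment replaces the whole permission set for that class, so `o=rX` strips an existing `o+w` without an explicit `o-w`, and capital `X` adds execute only to directories (or files that already had an execute bit). A sketch, assuming a Linux host with the coreutils chmod binary:

```python
import os
import stat
import subprocess
import tempfile

# Reproduce the "quick test with chmod itself": a symbolic '=' assignment
# replaces the whole permission set for that class, so o=rX drops an
# existing o+w without needing an explicit o-w, and the capital X grants
# execute only to directories (or files that already had an execute bit).
with tempfile.TemporaryDirectory() as tmp:
    d = os.path.join(tmp, "src")
    f = os.path.join(d, "file")
    os.mkdir(d)
    open(f, "w").close()
    os.chmod(d, 0o777)  # world-writable, like the legacy role leaves it
    os.chmod(f, 0o666)
    subprocess.run(["chmod", "-R", "u=rwX,g=rX,o=rX", d], check=True)
    dir_mode = stat.S_IMODE(os.stat(d).st_mode)
    file_mode = stat.S_IMODE(os.stat(f).st_mode)

print(oct(dir_mode), oct(file_mode))  # 0o755 0o644: o+w gone, files not +x
```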
ssbarnea|roverclarkb: this folder is kept between builds, right?19:16
clarkbssbarnea|rover: it is on the test nodes which are single use for us so no19:16
*** smarcet has quit IRC19:16
ssbarnea|roverclarkb: this means that if I remove the entire chmod, the files should just have their default permissions, which are supposed to be correct. right?19:17
clarkbI don't know what the umask is on the different platforms. It is possible it will be too restrictive on some of them19:17
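The umask concern can be checked per platform with a tiny probe: a newly created file gets its nominal creation mode (0o666 for most tools) masked by the process umask, so a platform defaulting to something like 0o077 would indeed leave the tree too restrictive for other accounts. A minimal sketch:

```python
import os
import stat
import tempfile

def mode_with_umask(umask: int) -> int:
    """Create a file with nominal mode 0o666 under the given umask."""
    old = os.umask(umask)
    try:
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "probe")
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o666)
            os.close(fd)
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        os.umask(old)

print(oct(mode_with_umask(0o022)))  # 0o644: group/other can read
print(oct(mode_with_umask(0o077)))  # 0o600: too restrictive for shared src
```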
ssbarnea|roveri didn't really understand the reasoning behind that task in particular. why doing that operating in the first place19:18
clarkboh I see why it is writeable now though, it's so that you can hardlink19:18
clarkbssbarnea|rover: it's so that you can have copies of the repos in other parts of the fs without being the file owner in the "source"19:18
ssbarnea|roverclarkb: well, this is not safe, so no more hardlinking.19:18
clarkbwhy isn't it safe?19:19
clarkban alternative approach would be to run your ansible.cfg out of /etc or similar19:19
clarkb(and lock it down to make ansible happy, this is how the infra team's integration testing works iirc)19:19
ssbarnea|roverbecause another user could mess with the file, i kinda agree with ansible that o+w is a very bad idea.19:19
ssbarnea|roverregardless of what reasons we have behind.19:20
clarkbthey are single use test nodes though19:20
fungii think this just means we should only clone from those src trees, and not perform operations directly in them19:20
fungithe hardlinks in the clone will get appropriate permissions instead19:20
ssbarnea|roverdo we really need more than one user accesing these files?19:21
fungidevstack uses several accounts on the system19:21
fungii think the stack user does the cloning in devstack's case?19:21
clarkbalso19:21
clarkbzuul-cloner is deprecated and has been for about a year now19:21
clarkbwe should be deleting code that relies on those paths19:22
amorinhey fungi frickler clarkb , I worked on the BHS1 aggregate19:23
clarkbamorin: hello19:23
amorincould you maybe try some io / ram stress test?19:23
amorinfrickler spawn many instances there19:23
clarkbamorin: yes, are those instnaces distributed to the differeny hypervisors?19:23
amorinalmost all of them are on separate hosts19:23
amorinyes19:23
amorininstance name frickler-test17 is the only one not accessible19:24
ssbarnea|roverclarkb: fungi : so it should be fine to remove the entire chmod.19:24
amorinthe host is under intervention19:24
clarkbgreat. I can look at running some artificial dd based testing as well as devstack + tempest later today19:24
amorinsounds great19:24
amorinlet me know the result, I must leave now but I will read it tomorrow morning19:25
clarkbamorin: will do, thank you!19:25
openstackgerritSorin Sbarnea proposed openstack-infra/zuul-jobs master: Remove unsecure chmod which makes src world writable  https://review.openstack.org/62569419:27
clarkbssbarnea|rover: I'm not sure we can remove it until we've moved more jobs off of it19:27
clarkb(and depending on how long zuul wants to support those users as well)19:28
clarkbI just worry about changing the behavior of a deprecated thing when the focus should be on using other tools instead19:28
ssbarnea|roverclarkb: you know better what relies on it, so we can test that change.19:28
clarkbssbarnea|rover: any of the jobs using devstack-gate19:28
clarkb(which are slowly going away)19:28
*** _alastor_ has joined #openstack-infra19:28
ssbarnea|roverclarkb: tripleo is too big/slow to make this, we still need to perform maintenance on deprecated roles.19:29
ssbarnea|roverslowly is the word....19:29
ssbarnea|roverregarding hardlink permissions the workaround for devstack or whoever else wants to hardlink is to put all users in a common group, so they should be able to hardlink19:31
ssbarnea|roverthe problem is with "o" not with "g".19:31
clarkbssbarnea|rover: alternatively, the fix for ansible is to use ansible config in /etc19:32
fungiyes, but those jobs are effectively frozen. altering their behavior significantly now is not a great idea19:32
clarkbfungi: ++ exactly19:32
ssbarnea|roverclarkb: this makes no sense to me, tested code should be able to run with its internal setup.19:33
fungipart of why we made zuul v3 was to reduce the degree to which the infra team had to care about corner cases with widely-shared job definitions19:33
ssbarnea|roverwe may even want/need to have multiple ansible.cfg files.19:33
clarkbssbarnea|rover: unless that code is using a legacy portion of zuul that we'd like you to stop using19:33
clarkbif you are in that situation then you either work around the legacy behavior or get onto the modern tooling19:34
clarkbbut continuing to update and try to support the legacy behavior doesn't scale19:34
pabelangerclarkb: ssbarnea|rover: fungi: https://review.openstack.org/596874/ was how I fixed the world readable directories, it has to do with how we are doing use-cached-repos.19:36
pabelangerif somebody wanted to pick it up again19:36
clarkbpabelanger: thats different19:36
clarkbin this case its the zuul cloner interface we are trying to preserve for eg devstack-gate19:37
clarkbwhich actually does rely on that functionality19:37
*** jamesmcarthur has joined #openstack-infra19:37
*** anteaya has joined #openstack-infra19:37
ssbarnea|roveri think i will go offline now before i say something i cannot take back.19:38
clarkbssbarnea|rover: basically we realized that the way zuul cloner works is a mistake19:38
clarkbwe fixed it19:38
clarkbwe fixed it by not using zuul cloner and the bad interface anymore for new jobs and encourage people to convert19:38
pabelangerclarkb: agree, even if fixed, I still think /home/zuul/src would be world readable for this reason. But agree, zuul-cloner is legacy and should be avoided. in fact, we should remove it from our base jobs, we are still including it by default19:38
clarkbso we agree on that19:38
clarkbwhat we don't agree on is changing the legacy role and possibly breaking many jobs19:39
clarkbbecause then we'd have to update many legacy jobs instead of just replacing them19:40
ssbarnea|roverclarkb: how can I see the impact of that change on old jobs? I am just curious to see how it breaks them.19:41
pabelangerRemove fetch-zuul-cloner from base job: https://review.openstack.org/513506/19:41
pabelangerI abandoned it by mistake19:41
pabelangerthat could possibly break jobs, but jobs using it should be parented to legacy-base, not base19:41
clarkbssbarnea|rover: push a change to devstack-gate that depends on your change to update the chmod19:42
clarkbssbarnea|rover: that should still run devstack gate jobs there19:42
ssbarnea|roverclarkb: thanks, I will do that now. better to see the extent of the damage. maybe we are lucky.19:42
openstackgerritSorin Sbarnea proposed openstack-infra/devstack-gate master: DNM: test umask correction on src  https://review.openstack.org/62569719:45
fungidoesn't guarantee there aren't other more susceptible users of that floating around in the system of course19:46
*** bobh has quit IRC19:47
fungibut it's a useful high-level check anyway19:47
clarkbya I think it is a useful baseline19:47
fungia lot of projects have copied bits of legacy functionality into the jobs in their respective repos too, ostensibly as a stepping-stone to rewriting them, but a lot are still nowhere near complete with that i suspect19:48
clarkbhrm speaking of devstack-gate. Looking at benchmarking these bhs1 nodes and d-g had/has the reproduce.sh script. We don't have that for current jobs so maybe I use d-g for that? Have to figure out how I want to drive this19:49
*** bobh has joined #openstack-infra19:49
fungiwhich is going to make it fun when we eventually try to remove legacy pieces19:49
fungihrm, yeah no great ideas for orchestrating devstack+tempest other than old devstack-gate or hard-coding a bunch of configuration swiped from a recent devstack job and then altering the playbooks to install that configuration19:51
clarkbok I think what I'm going to do is boot instances for me to test on then once I've figured out a procedure apply that to frickler's distributed test nodes? Or maybe write the procedure down, pass that to frickler and make sure it doesn't conflict with any of frickler's plans19:53
*** Vadmacs has quit IRC19:53
*** _alastor_ has quit IRC19:53
*** bobh has quit IRC19:54
clarkbhehe we actually run a bunch of zuulv3 native jobs on d-g19:54
fungiwell, orchestrating some simple i/o testing with dd should be safe and quick to turn out19:54
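The simple dd-based check fungi mentions could be as small as the sketch below (file path and sizes here are arbitrary choices, and `oflag=direct` is left out since not every filesystem supports it):

```shell
# Rough I/O smoke test with dd: write 32 MiB and flush it to disk,
# then read it back. dd prints its own throughput summary on stderr.
TESTFILE=/tmp/dd-io-test
dd if=/dev/zero of="$TESTFILE" bs=1M count=32 conv=fdatasync 2>&1 | tail -n1
dd if="$TESTFILE" of=/dev/null bs=1M 2>&1 | tail -n1
stat -c '%s' "$TESTFILE"
```

The `conv=fdatasync` flag makes dd fsync before reporting, so the write number reflects actual disk throughput rather than the page cache.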
*** _alastor_ has joined #openstack-infra19:55
clarkbya I'll spin up my test server (on xenial since that is what we were looking at in the past), figure out some artificial benchmarking as well as devstack + tempest19:55
clarkbthen we can apply that to fricklers distributed nodes19:56
*** bobh has joined #openstack-infra19:57
openstackgerritMerged openstack-infra/zuul master: Modify some file content errors  https://review.openstack.org/62427820:00
*** bobh has quit IRC20:02
*** jamesmcarthur has quit IRC20:05
*** sshnaidm is now known as sshnaidm|off20:07
*** graphene has quit IRC20:07
*** jamesmcarthur has joined #openstack-infra20:08
clarkbusing different fio settings I find that if we do direct io we get about 10MB/s reads and writes, but disabling direct io its in the 300MB/s range for reads and writes20:12
clarkbI think that does lend some weight to the argument that memory is part of the issue here20:12
clarkb(I imagine you'd fall back on direct io behavior if you have no memory to cache things in, though dansmith will probably explain how that assumption is wrong)20:13
*** wolverineav has quit IRC20:22
openstackgerritMerged openstack-infra/system-config master: Set CacheIgnoreCacheControl on pypi proxy cache  https://review.openstack.org/62509520:26
clarkbfirst issue I've found is the bhs1 mirror is not accessible from bhs1 instances20:26
clarkbit is accessible from the internet (my laptop at home)20:26
clarkboh wait20:27
clarkbdid I forget to set config drive on my instance? that may be the cause20:27
openstackgerritSorin Sbarnea proposed openstack-infra/devstack-gate master: DNM: test umask correction on rdo Depends-On: https://review.rdoproject.org/r/17732 Change-Id: Id4e0ecb6cca01570a611c9824c6d821fb1e6d9b0  https://review.openstack.org/62569720:27
clarkbya my instance has a /19 not a /3220:27
clarkboh ya I have to set the metadata to fix that20:28
bodenhi. I was testing out py3.6 for lower-constraints (https://review.openstack.org/#/c/623229/) and ran into  "Error: could not determine PostgreSQL version from '10.6'"... has anyone seen this by chance?20:28
openstackgerritSorin Sbarnea proposed openstack-infra/devstack-gate master: DNM: test umask correction on src  https://review.openstack.org/62569720:28
*** sthussey has joined #openstack-infra20:32
clarkbboden: I think psycopg2 may have a -wheel or -binary package now to install that includes the pre-linked binary stuff? I'm guessing that failure is lack of ability to find those headers?20:35
clarkbboden: http://logs.openstack.org/84/625684/1/check/vmware-tox-lower-constraints/3b21419/job-output.txt.gz#_2018-12-17_18_35_45_388284 doesn't install headers just the client and server I think20:35
bodenclarkb is that something I need to resolve as part of the project?20:36
*** _alastor_ has quit IRC20:37
clarkbboden: yes, it should be part of your bindep.txt file20:37
bodenclarkb: ok I will dig; not familiar with that off hand20:38
openstackgerritIan Wienand proposed openstack-infra/zuul-jobs master: Add a note on testing  https://review.openstack.org/62457820:39
clarkbboden: basically its a list of binary dependencies for your project. We install them as part of the job setup20:39
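For the PostgreSQL headers case above, the bindep.txt entries would look something like this (package names are illustrative; the exact dev package name differs per distro):

```
# bindep.txt: binary dependencies installed during job setup
libpq-dev [platform:dpkg]
postgresql-devel [platform:rpm]
```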
openstackgerritIan Wienand proposed openstack-infra/zuul-jobs master: Add a note on testing trusted roles  https://review.openstack.org/62457820:39
*** _alastor_ has joined #openstack-infra20:42
*** wolverineav has joined #openstack-infra20:49
*** yboaron has joined #openstack-infra20:49
clarkbansible log says we are applying iptables role again20:50
clarkbfungi: ^ fyi I think the lists server is going to get ansibled exim config now, want to double check the config is as you expect?20:51
fungiyep20:53
fungicurrently last modified friday at 21:17z20:54
*** wolverineav has quit IRC20:54
clarkbmy first ovh bhs1 test node ran devstack in 1216 seconds, this is without cached repos too20:56
clarkbI have a second with the networking set up as we have it in nodepool20:56
clarkbtempest is running on the first now20:56
clarkbso far things are looking good to me, but I think we still want to run all of this on the distributed VMs once we are happy with my rough set of steps20:57
*** wolverineav has joined #openstack-infra21:00
*** jamesmcarthur_ has joined #openstack-infra21:01
*** wolverineav has quit IRC21:02
*** jamesmcarthur has quit IRC21:04
*** wolverineav has joined #openstack-infra21:04
*** dmellado has quit IRC21:05
openstackgerritSorin Sbarnea proposed openstack-infra/elastic-recheck master: Identify *POST* timeout failures individually  https://review.openstack.org/62571721:05
*** wolverineav has quit IRC21:05
*** stevebaker has quit IRC21:05
*** wolverineav has joined #openstack-infra21:06
*** gouthamr has quit IRC21:06
*** graphene has joined #openstack-infra21:07
*** graphene has quit IRC21:09
*** graphene has joined #openstack-infra21:10
*** _alastor_ has quit IRC21:16
*** graphene has quit IRC21:16
*** gouthamr has joined #openstack-infra21:17
*** _alastor_ has joined #openstack-infra21:18
*** yolanda has quit IRC21:18
*** jamesmcarthur has joined #openstack-infra21:20
*** Emine has quit IRC21:23
*** jamesmcarthur_ has quit IRC21:23
clarkbfrickler: amorin I'm collecting the data I'm finding on my test nodes on the etherpad https://etherpad.openstack.org/p/bhs1-test-node-slowness21:27
*** rcernin has joined #openstack-infra21:28
clarkbat this rate I am half expecting that I won't be able to run the script on all of the frickler* nodes so would be good if frickler can do that with the script I'll be adding to the etherpad?21:28
clarkbfwiw so far things are looking good on the two instances I booted21:28
*** gouthamr_ has joined #openstack-infra21:30
*** slaweq has quit IRC21:30
*** bobh has joined #openstack-infra21:33
*** gouthamr_ has quit IRC21:37
openstackgerritMerged openstack-infra/git-review master: docs: Misc updates  https://review.openstack.org/61057421:38
openstackgerritMerged openstack-infra/git-review master: docs: Call out use of an agent to store SSH passwords  https://review.openstack.org/61061621:38
openstackgerritMerged openstack-infra/git-review master: tox.ini: add passenv = http_proxy https_proxy # _JAVA_OPTIONS  https://review.openstack.org/62449621:38
*** gouthamr_ has joined #openstack-infra21:41
*** xek has quit IRC21:44
*** gouthamr_ has quit IRC21:46
*** slaweq has joined #openstack-infra21:46
clarkbhttp://paste.openstack.org/show/737512/ those look good as benchmarks21:47
openstackgerritSean McGinnis proposed openstack-infra/irc-meetings master: Switch release team meeting to Thursday 1600  https://review.openstack.org/62529021:48
clarkbinfra-root ^ any other benchmarking ideas that I can test on my throwaway instance? Everything I'm seeing shows it as happy. I think if frickler agrees we can reenable ovh tomorrow?21:49
*** dpawlik has quit IRC21:50
*** slaweq has quit IRC21:51
funginothing comes to mind21:51
*** gouthamr_ has joined #openstack-infra21:51
*** yboaron has quit IRC21:52
*** wolverineav has quit IRC21:54
*** wolverineav has joined #openstack-infra21:54
*** gouthamr_ has quit IRC21:56
*** trown is now known as trown|outtypewww21:58
clarkbI'll leave my two instances running in case anyone wants to jump on them and try stuff. I've recorded the devstack and tempest runtimes off of them in that paste above and they look good to me21:58
*** _alastor_ has quit IRC22:04
*** _alastor_ has joined #openstack-infra22:05
mriedemclarkb: good news on http://status.openstack.org/elastic-recheck/#1806912 is that it looks like n-api is no longer a problem22:06
mriedemafter dan and my fixes are merged22:06
clarkbmriedem: looks like there is a spike there, any links to the changes?22:06
clarkblet me see if spike is on master22:07
mriedemno idk what that's from22:07
mriedemlogstash says it's all on master22:07
clarkbthey are also in check22:07
clarkbso could just be noise22:07
*** slaweq has joined #openstack-infra22:08
mriedemsomething is weird with the g-api check though,22:08
mriedembecause looking at one of the failures, g-api is ready in 3 seconds22:08
mriedemhttp://logs.openstack.org/55/625555/1/check/tempest-full/60bd495/controller/logs/screen-g-api.txt.gz22:08
fungiclarkb: excellent fracas update22:09
mriedemwell i guess g-api is ready,22:09
mriedembut glance isn't serving GET /images requests22:09
clarkbmaybe it isn't ready yet then :P22:10
clarkbthat might be good feedback to glance though to not mark things ready until they can serve requests?22:10
*** ndahiwade has joined #openstack-infra22:11
mriedemalso do you know why this dropped off? http://status.openstack.org/elastic-recheck/#180751822:11
clarkbmriedem: I wonder if that maps to when we turned off ovh bhs122:12
fungilooks like it continued well into friday22:12
clarkbbhs1 was turned off on the 7th so that isn't it22:12
clarkbmriedem: is that the apache proxy configuration fixes maybe?22:13
clarkbhrm no that was the 11th and devstack/tempest move to bionic were the 12th22:14
clarkbmriedem: oh actually I wonder if it was the oslo policy infinite recursion bug22:14
clarkbmriedem: there was a thread about that on -discuss and cdent helped them sort it out22:14
mriedemlooking at http://logs.openstack.org/55/625555/1/check/tempest-full/60bd495/controller/logs/apache/access_log.txt.gz i see the 503s in there22:14
mriedem10.209.34.117 - - [17/Dec/2018:20:33:15 +0000] "GET /image HTTP/1.1" 503 568 "-" "curl/7.58.0"22:15
clarkbmriedem: https://review.openstack.org/#/c/625086/ I think that explains http://status.openstack.org/elastic-recheck/#180751822:16
mriedemyeah same22:16
*** bobh has quit IRC22:20
*** slaweq has quit IRC22:20
clarkbmriedem: the 503s imply to me that the backend service isn't actually up and serving requests yet22:20
*** rh-jelabarre has quit IRC22:21
*** bobh has joined #openstack-infra22:22
mriedemthis is where devstack is hitting g-api as well http://logs.openstack.org/55/625555/1/check/tempest-full/60bd495/controller/logs/apache/tls-proxy_error_log.txt.gz#_2018-12-17_20_33_08_84019322:22
clarkbmriedem: the 503s happen much later in the job than the g-api startup22:22
mriedemyeah that's true22:22
clarkbdevstack stops running at ~18:4222:23
clarkbbut 503s are at ~20:3322:23
openstackgerritMerged openstack-infra/irc-meetings master: Switch release team meeting to Thursday 1600  https://review.openstack.org/62529022:23
clarkbmriedem: I wonder if that is the same issue dansmith was looking at on friday22:24
mriedem2018-12-17 20:33:08.714 | + functions-common:_run_under_systemd:1484 :   sudo systemctl start devstack@g-api.service22:24
mriedemwtf22:24
clarkbbasically the whole system goes to lunch and rabbitmq times out and so on22:24
mriedemyeah22:25
mriedem=ERROR REPORT==== 17-Dec-2018::18:42:10 === closing AMQP connection <0.1344.0> (10.209.34.117:41178 -> 10.209.34.117:5672 - uwsgi:9448:b8040f63-7cc7-4025-9957-3626a7000014): missed heartbeats from client, timeout: 60s22:25
clarkband this job did run with direct-io enabled22:25
*** e0ne has quit IRC22:25
clarkband this is a different cloud than the last one (this is rax-dfw the friday one was inap)22:25
clarkbpointing at maybe a bionic issue22:26
clarkbmriedem: does syslog show anything /me looks22:26
mriedemoh heh btw infra core https://review.openstack.org/#/q/topic:dump-rabbitmqctl-report+(status:open+OR+status:merged)22:28
mriedemi posted those in july of 2017 the last time we had weird rabbit issues in the gate22:28
clarkbmriedem: will need to go into devstack now that d-g isn't used much22:29
clarkbmriedem: this is totally thinking out loud a bit but it almost looks like we have two devstack runs on the same host22:29
clarkband possibly the second clobbers the first22:29
SpamapStobiash: I'm kind of over Alpine too. the -slim debian images are pretty good and you don't have to trust a bunch of people who aren't great at communicating where they're from, who they are, or why we're supposed to trust them.22:29
SpamapSI'd really love for pbrx to switch to Debian or Ubuntu honestly.22:30
fungiSpamapS: overflow from discussion in #zuul?22:30
clarkbmriedem: looking in syslog you can see we install packages twice about two hours apart from each other22:30
clarkbhttp://logs.openstack.org/55/625555/1/check/tempest-full/60bd495/controller/logs/syslog.txt.gz#_Dec_17_18_19_46 and http://logs.openstack.org/55/625555/1/check/tempest-full/60bd495/controller/logs/syslog.txt.gz#_Dec_17_20_19_5822:30
clarkbis it possible rax gave us a duplicate IP and somehow everything just worked for two jobs to ssh into the same host?22:31
mriedemDec 17 20:33:08 ubuntu-bionic-rax-dfw-0001240665 sudo[9402]:    stack : TTY=unknown ; PWD=/opt/stack/devstack ; USER=root ; COMMAND=/bin/systemctl start devstack@g-api.service22:31
clarkbcorvus: ^ fyi this may be someting we want to guard against22:31
clarkbmriedem: I'm going to look in nodepool logs for that IP now22:31
*** tpsilva has quit IRC22:32
mriedemyeah it starts it twice22:32
clarkbnodepool doesn't seem to think the IP was used twice22:32
clarkbat least not reused after the 1800ish usage at the start of that job22:32
corvusclarkb: when we add the per-build ssh key, do we also remove the global key from authorized_keys?22:33
tobiashSpamapS: yes, the -slim images aren't that huge anymore22:34
*** boden has quit IRC22:34
corvusclarkb: looks like no; so i guess we don't protect against that22:34
clarkbcorvus: ya we remove it from the ssh agent but not authorized_keys22:35
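A sketch of what removing the global key from authorized_keys could look like (the path and key comments below are made up purely for illustration, not what zuul actually writes):

```shell
# Demo: strip one pubkey (matched by its comment) from an
# authorized_keys file while keeping the per-build key intact.
AK=/tmp/authorized_keys.demo
printf '%s\n' \
  'ssh-rsa AAAAexample... zuul-global-key' \
  'ssh-rsa BBBBexample... per-build-key' > "$AK"
grep -v 'zuul-global-key' "$AK" > "$AK.tmp" && mv "$AK.tmp" "$AK"
cat "$AK"
```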
SpamapSOh whoops, I think I meant for that to go to #zuul22:35
*** jamesmcarthur has quit IRC22:35
clarkbthat said I don't see evidence in nodepool we are interacting that way, something else must be starting devstack twice22:35
SpamapSright as I typed those messages I dropped my headset. Must have hit a key to move me here. ;)22:35
corvusSpamapS: all keys lead to -infra22:35
*** tobias-urdin has joined #openstack-infra22:38
clarkbmriedem: I think if we sort out why devstack is run twice we win a prize. The job output file seems to be devstack first run22:38
clarkbmriedem: the devstack log file is the second devstack run22:39
clarkbsyslog doesn't seem to show a reboot22:40
ianwclarkb: it's not confusion over output capture, between the console, the -early log file and the devstacklog.txt file?22:40
*** yamamoto has quit IRC22:40
*** yamamoto has joined #openstack-infra22:41
clarkbianw: I don't think so, if you look in syslog it's clearly running devstack package installs twice22:41
clarkbianw: and those package installs seem to line up timestamps wise with the two devstack runs we see (one in job output the other in devstack log)22:41
tobias-urdininfra-root I've been working on getting access to the "openstack" account on PuppetForge https://forge.puppet.com/openstack and I've got in contact with the Puppet Inc employee that has access to this account and that managed this (manual) process of uploading Puppet module tarballs. I would like to get this account owned by the OpenStack project and then automate the upload of versions there22:41
tobias-urdin later on.22:41
clarkband apache seems to corroborate that glance is running at first then after gets sad22:41
tobias-urdinIs there any email I can use to transfer that account or should I try to seize control of the account personally first?22:41
clarkbtobias-urdin: it may make sense to try and have the puppet openstack team own that and use zuul secrets to provide that data to your jobs22:42
*** fuentess has quit IRC22:42
clarkbianw: the two start_time values differ too22:44
tobias-urdinclarkb: yeah, that's my end goal here, I need an email address for that account and maybe infra has some email I could use, otherwise I can transfer it to myself temporarily.22:44
clarkbtobias-urdin: I don't think the infra team wants to keep being manager of those secrets that don't directly tie to the infra team. Does puppetforge not have the concept of a group owning something?22:45
ianwtobias-urdin: you can use infra-root@openstack.org and we could keep the credentials in our store, and provide the secret?22:45
clarkbianw: ya we can do that, though I think we are trying to encourage teams to rely on us less for stuff like that since we don't have to be that roadblock anymore22:45
smcginnisclarkb: If you have a moment, I thought after https://review.openstack.org/#/c/620664/ merged, the UI would reflect that after some time. Are there additional steps needed to get Cinder set up like Designate is?22:46
ianwtrue, but as team maintained i think it's probably ok if people want a circuit breaker on one person going missing?22:46
clarkbianw: ya, the way loci has addressed this is to have an org with >1 member on dockerhub then a robot account that is also a member of that org22:47
mriedemclarkb: i hope the prize isn't a stuffed animal because i'm up to my knees in those already22:47
clarkbso that anyone in the org can update the robot credentials22:47
clarkbif puppetforge supports something similar I think that would be a good approach22:47
tobias-urdinunfortunately i dont think it does, it's more a super simple approach where you have an account with a username and publish under that, no organization-like functionality.22:48
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Add governance document  https://review.openstack.org/62243922:49
ianwtobias-urdin: and then you're planning on a centralised release job to publish there?22:53
clarkbmriedem: ianw zuul notices that job is a failure after the second devstack completes22:53
tobias-urdinianw: if it's ok with you i'll propose he changes email to infra-root@openstack.org then we can recover a password that way to skip having to send the password, probably unencrypted, somewhere.22:53
tobias-urdinianw: i was hoping being able to put together a job that can push modules there automatically after version bump in openstack/releases22:53
ianwtobias-urdin: ok, well tell me when the email comes in, i can setup a random password, store it in our repo and provide the secret for such a job22:54
tobias-urdinsince the process has always been manual, and hasn't been done since Cody, the Puppet Inc employee, discontinued his involvement in Puppet Openstack, so we haven't had any updated modules there in years.22:54
tobias-urdinianw: thanks, i'll let you know!22:55
ianwthis is the type of thing that automated jobs do well :)22:55
tobias-urdinindeed :)22:55
*** bobh has quit IRC22:56
clarkbmriedem: ianw ara's failed run devstack task captures the output of the "second" devstack run23:00
clarkbbased on matching start_time data23:00
*** Emine has joined #openstack-infra23:04
ianwclarkb: is this the 60bd495 logs you posted above?23:04
clarkbianw: yes23:04
*** jamesmcarthur has joined #openstack-infra23:05
clarkbif it were a time sync issue I would expect the nested start time values to line up at least since that should be recorded from the same frame of reference (on the test node)23:07
*** Emine has quit IRC23:08
ianwhttp://logs.openstack.org/55/625555/1/check/tempest-full/60bd495/job-output.txt.gz#_2018-12-17_18_14_43_99281023:09
ianwwhat's 10.209.34.11723:09
clarkbthe rax private ip address for that instance23:10
clarkbmatches up with http://logs.openstack.org/55/625555/1/check/tempest-full/60bd495/zuul-info/inventory.yaml23:11
clarkbits almost like ansible decides to run that task multiple times23:13
clarkbbut only records the second (but our console logging hacks record the first, also maybe the console logging hacks not working for the second point at a possible cause?)23:13
ianw$ journalctl --file ./devstack.journal23:17
ianwFailed to open files: Bad message23:17
ianwi wonder if anyone has actually used that ...23:17
clarkbianw: its in the export format so you have to reimport it23:17
clarkbsystemd-journal-remote is the tool iirc23:18
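The reimport dance looks roughly like this (the binary path varies by distro: /lib/systemd/ on Debian/Ubuntu, /usr/lib/systemd/ on Fedora):

```shell
# devstack.journal is in the journal *export* format, so journalctl
# cannot open it directly; convert it back to a native journal first.
/lib/systemd/systemd-journal-remote -o /tmp/devstack.journal.native ./devstack.journal
journalctl --utc --file /tmp/devstack.journal.native
```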
*** jamesmcarthur has quit IRC23:18
clarkbI had it working when sdague added it23:19
clarkbbut its been a longtime23:19
*** _alastor_ has quit IRC23:20
clarkbthe keystone service log implies that time is mostly continuous and we don't jump or reset23:21
ianwDec 18 07:29:41 ubuntu-bionic-rax-dfw-0001240665 devstack@keystone.service[16401]23:22
ianw2018-12-17 20:19:54.622 | + ./stack.sh:main:488                      :   exec23:23
ianwthat's 12 hours?23:24
ianwor is journalctl doing some sort of TZ manipulation ...23:25
clarkb"Dec 17 18:19:39 ubuntu-bionic-rax-dfw-0001240665 sudo[2837]:     zuul : TTY=unknown ; PWD=/home/zuul ; USER=stack ; COMMAND=/bin/sh -c echo BECOME-SUCCESS-smynvpgdbxuytkbqxjwinxoiuwrzslwf; /usr/bin/python2" then later "Dec 17 20:19:51 ubuntu-bionic-rax-dfw-0001240665 sudo[24797]:     zuul : TTY=unknown ; PWD=/home/zuul ; USER=stack ; COMMAND=/bin/sh -c echo23:25
clarkbBECOME-SUCCESS-smynvpgdbxuytkbqxjwinxoiuwrzslwf; /usr/bin/python2"23:25
clarkbits running the same exact script twice according to syslog23:25
clarkbianw: I think it may be trying to normalize that for you? I dunno23:25
ianwyeah i think it is, --utc helps23:25
clarkbianw: I thought the export format was supposed to fix those issues for us :(23:25
clarkbbased on the above syslog I think this must be an ansible bug23:26
clarkbansible is running the same exact command for the task twice23:26
clarkbdmsimard: mordred corvus ^ any ideas23:26
clarkbfrom ara's perspective it isn't though, the time start and end coincide with the two running back to back ish23:28
clarkbbut ara only has logs for the second in the output portion of ara23:29
clarkbI'm guessing this is second day in a row we get to find a big ansible bug :)23:29
*** gfidente has quit IRC23:30
*** jamesmcarthur has joined #openstack-infra23:30
clarkbthey are suspiciously almost exactly 2 hours apart from each other23:31
*** _alastor_ has joined #openstack-infra23:31
*** ndahiwade has quit IRC23:31
clarkbkeepalive behavior maybe?23:31
mordredclarkb: WEIRD23:34
*** jamesmcarthur has quit IRC23:35
*** dkehn has quit IRC23:37
*** dkehn has joined #openstack-infra23:38
*** mriedem has quit IRC23:40
*** kgiusti has left #openstack-infra23:41
clarkbthose randbits are defined in make_become_cmd23:42
dmsimardI have no idea23:42
dmsimardLacking a bit of context though23:43
openstackgerritIan Wienand proposed openstack-infra/nodepool master: [wip] Add dogpile.cache master to the -src tests  https://review.openstack.org/62545723:43
dmsimardWhat are we troubleshooting ? :D23:44
*** eernst has joined #openstack-infra23:44
clarkbdmsimard: it appears we have run devstack twice in a job causing the job to fail as the second run interferes with the first. From syslog ansible runs the same BECOME-SUCCESS command with the same randbits twice 2 hours apart23:45
*** eernst has quit IRC23:45
clarkbdmsimard: its almost as if ansible has a bug that causes it to run the task twice23:45
clarkb_low_level_execute_command() calls make_become_cmd and doesn't loop23:46
clarkbso for those randbits to remain constant I think we have to be in _low_level_execute_command23:46
clarkbthough I'm probably looking at devel not 2.5.1423:46
*** eernst_ has joined #openstack-infra23:47
*** eernst_ has quit IRC23:51
ianwmordred: http://logs.openstack.org/62/618962/1/check/nodepool-functional-py35-redhat-src/a7a7304/controller/logs/devstacklog.txt.gz ... all still failing with dogpile issues ... investigating23:52
clarkbreading the ansible ssh connection _bare_run it does seem to Popen() twice but it should guard against that. Possible that we are seeing that as an issue? I dunno23:53
ianwahhh, we're not installing from constraints as nodepool goes into a virtualenv and we're using pip directly, rather than via devstack madness/magic23:53
clarkbya if there is a Popen that has an oserror or IOerror then ansible catches that and sets p = None then it checks if p = None and tries again23:54
* clarkb finds a link23:54
*** _alastor_ has quit IRC23:54
clarkbis it possible https://github.com/ansible/ansible/blob/devel/lib/ansible/plugins/connection/ssh.py#L711-L731 starts the process which goes out to lunch for a while due to IO issues23:55
clarkbthen after 2 hours (with it running remotely too) it gets an IOError23:56
clarkbthen runs it again23:56
clarkbmordred: ianw dmsimard ^23:56
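A minimal Python sketch of the failure mode being described (a simplified illustration, not ansible's actual code): if starting the process raises OSError/IOError, the handle stays None and the loop starts the command again, so any remote side effects from the first attempt happen a second time.

```python
# Hypothetical retry pattern: the process handle is reset to None on
# OSError/IOError, and a None handle triggers another full attempt.
def run_with_retry(start_process, attempts=2):
    p = None
    for attempt in range(attempts):
        try:
            p = start_process()
        except OSError:  # IOError is an alias of OSError in Python 3
            p = None
        if p is not None:
            return p, attempt + 1
    raise RuntimeError("failed to start process")

calls = []

def flaky_start():
    calls.append("exec")  # the remote side effect fires on every attempt
    if len(calls) == 1:
        # first attempt dies late, e.g. a dropped ssh connection
        raise IOError("connection dropped")
    return "process-handle"

handle, attempts = run_with_retry(flaky_start)
print(handle, attempts, len(calls))
```

The key point is that the command's side effects are not rolled back when the connection dies, so a late IOError means the work runs twice, matching the two devstack runs two hours apart.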
ianwmordred: oh, i see, the < 0.7 pin wasn't actually merged to openstacksdk ... can we sort something out here?  it's blocking glean, nodepool, dib ...23:57
mordredianw: there is a fix from kmalloc up for review ... but it failed things last time - which things do you think we should merge?23:58
mordredclarkb: it certainly does look like it's possible for an ioerror on the first command to lead to a second invocation23:58
mordredclarkb: due to not p23:58
*** Emine has joined #openstack-infra23:59
mordredclarkb: and something something ssh timeout something ioerror?23:59
ianwmordred: personally my thought would be to merge the < 0.7 pin to give a bit more time to consider kmalloc's fix ...23:59
ianwmordred: which also matches since requirements has merged the pin too23:59
clarkbmordred: ya ssh timeout or tcp keepalives etc. The 2 hour gap there makes me really suspicious of that23:59

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!