Monday, 2019-01-07

*** bobh has quit IRC00:10
*** martinkennelly has quit IRC00:23
*** hwoarang_ has joined #openstack-infra01:33
*** hwoarang has quit IRC01:34
*** wolverineav has joined #openstack-infra01:39
*** bhavikdbavishi has joined #openstack-infra01:47
*** wolverineav has quit IRC01:57
*** bobh has joined #openstack-infra01:57
openstackgerritzhouxinyong proposed openstack/diskimage-builder master: Delete the duplicate words in  50-zipl  https://review.openstack.org/62881502:02
*** bobh has quit IRC02:05
*** jamesmcarthur has joined #openstack-infra02:24
*** bhavikdbavishi has quit IRC02:37
*** hongbin has joined #openstack-infra02:47
*** jamesmcarthur has quit IRC02:50
*** whoami-rajat has joined #openstack-infra02:52
*** psachin has joined #openstack-infra02:59
*** hwoarang has joined #openstack-infra03:10
*** hwoarang_ has quit IRC03:12
*** wolverineav has joined #openstack-infra03:19
openstackgerritzhurong proposed openstack-infra/project-config master: Retire murano-deployment  https://review.openstack.org/62885003:26
*** hongbin has quit IRC03:39
openstackgerritMerged openstack-infra/project-config master: Add 'Review-Priority' for Zaqar repos  https://review.openstack.org/62832303:46
*** jamesmcarthur has joined #openstack-infra03:51
*** bhavikdbavishi has joined #openstack-infra03:55
*** ramishra has joined #openstack-infra04:01
*** bhavikdbavishi has quit IRC04:02
*** bhavikdbavishi has joined #openstack-infra04:02
*** jamesmcarthur has quit IRC04:18
*** wolverineav has quit IRC04:23
*** udesale has joined #openstack-infra04:33
*** bobh_ has joined #openstack-infra04:38
openstackgerritzhurong proposed openstack-infra/project-config master: Retire murano-deployment  https://review.openstack.org/62885005:14
*** jamesmcarthur has joined #openstack-infra05:19
*** jamesmcarthur has quit IRC05:23
*** bobh_ has quit IRC05:29
*** ykarel has joined #openstack-infra05:49
*** diablo_rojo has joined #openstack-infra05:50
*** bobh_ has joined #openstack-infra05:51
*** bobh_ has quit IRC05:56
*** hwoarang_ has joined #openstack-infra06:02
*** hwoarang has quit IRC06:03
*** bhavikdbavishi has quit IRC06:05
*** wolverineav has joined #openstack-infra06:07
*** armax has quit IRC06:11
*** wolverineav has quit IRC06:12
*** bobh_ has joined #openstack-infra06:41
*** jtomasek has joined #openstack-infra06:53
*** rcernin has quit IRC06:56
*** apetrich has joined #openstack-infra07:06
*** AJaeger has quit IRC07:11
openstackgerritzhurong proposed openstack-infra/project-config master: Retire murano-deployment  https://review.openstack.org/62885007:11
*** AJaeger has joined #openstack-infra07:20
*** bhavikdbavishi has joined #openstack-infra07:24
*** dpawlik has joined #openstack-infra07:29
*** ykarel is now known as ykarel|lunch07:29
*** bobh_ has quit IRC07:34
*** adriancz has joined #openstack-infra07:35
*** agopi_ has joined #openstack-infra07:37
*** rpittau has joined #openstack-infra07:38
*** agopi_ is now known as agopi07:40
*** tosky has joined #openstack-infra07:45
*** yamamoto has joined #openstack-infra07:56
*** yamamoto has quit IRC07:58
*** ginopc has joined #openstack-infra07:59
*** diablo_rojo has quit IRC08:02
*** aojea has joined #openstack-infra08:08
*** bobh_ has joined #openstack-infra08:09
*** pcaruana has joined #openstack-infra08:14
*** agopi has quit IRC08:18
*** kjackal has joined #openstack-infra08:22
*** ykarel|lunch is now known as ykarel08:23
*** pcaruana has quit IRC08:24
*** rascasoft has joined #openstack-infra08:25
*** xek has joined #openstack-infra08:39
*** yamamoto has joined #openstack-infra08:42
*** rcarrillocruz has joined #openstack-infra08:45
*** pcaruana has joined #openstack-infra08:46
*** bobh_ has quit IRC08:55
*** jpich has joined #openstack-infra08:57
*** yamamoto has quit IRC09:05
*** ykarel has quit IRC09:06
*** ykarel has joined #openstack-infra09:06
*** bobh_ has joined #openstack-infra09:16
*** gfidente has joined #openstack-infra09:27
*** shardy has joined #openstack-infra09:31
*** owalsh_ is now known as owalsh09:35
*** derekh has joined #openstack-infra09:36
*** ssbarnea|bkp2 has joined #openstack-infra09:37
*** ssbarnea has quit IRC09:39
*** wolverineav has joined #openstack-infra09:49
*** wolverineav has quit IRC09:53
*** yamamoto has joined #openstack-infra09:53
*** jaosorior has joined #openstack-infra10:03
*** dtantsur|afk is now known as dtantsur10:08
*** agopi has joined #openstack-infra10:17
*** roman_g has joined #openstack-infra10:18
*** bobh_ has quit IRC10:19
*** agopi is now known as agopi_10:20
*** agopi_ is now known as agopi10:21
*** bhavikdbavishi has quit IRC10:21
*** gfidente has quit IRC10:27
openstackgerritMerged openstack-infra/zuul master: dict_object.keys() is not required for *in* operator  https://review.openstack.org/62148210:27
*** sshnaidm|off is now known as sshnaidm10:30
*** yamamoto has quit IRC10:35
*** arxcruz|brb is now known as arxcruz10:35
openstackgerritMerged openstack/ptgbot master: Pin irc module to 15.1.1 to avoid import error  https://review.openstack.org/62690610:36
openstackgerritMerged openstack/ptgbot master: Generate PTGbot index page dynamically  https://review.openstack.org/62690710:37
*** mpeterson has quit IRC10:40
*** mpeterson has joined #openstack-infra10:42
*** mpeterson has quit IRC10:42
*** yamamoto has joined #openstack-infra10:45
*** mpeterson has joined #openstack-infra10:54
*** udesale has quit IRC10:54
*** gfidente has joined #openstack-infra11:02
*** mpeterson has quit IRC11:05
*** mpeterson has joined #openstack-infra11:06
*** pbourke has quit IRC11:07
*** pbourke has joined #openstack-infra11:09
*** pcaruana has quit IRC11:11
*** pcaruana has joined #openstack-infra11:16
ssbarnea|bkp2infra-root: need review on https://review.openstack.org/#/c/625576/ -- to undo the damaging unsafe umask on src folder11:19
jheskethssbarnea|bkp2: lgtm11:24
*** kjackal has quit IRC11:34
*** tobias-urdin is now known as tobias-urdin_afk11:34
*** kjackal has joined #openstack-infra11:34
*** rpittau is now known as rpittau|lunch11:41
ssbarnea|bkp2jhesketh: thanks. this one is more important than it appears as it has some unexpected side effects, also preventing us from fixing the random post timeouts.11:45
ssbarnea|bkp2... not to mention that the original approach did not make much sense anyway :D11:45
jheskethright, I'm not comfortable pushing it through without a second review though as it's a trusted repo. Additionally, given its potential effects it should probably be babysat, and sadly it's late here11:46
ssbarnea|bkp2jhesketh: totally agree. i am sure we will get others from the US.11:48
*** yamamoto has quit IRC11:50
*** yamamoto has joined #openstack-infra11:53
*** bhavikdbavishi has joined #openstack-infra11:54
*** agopi has quit IRC11:56
*** dpawlik has quit IRC11:58
*** tobias-urdin_afk is now known as tobias-urdin11:59
* SpamapS tries treating his insomnia with updating the shade package in Debian. It's not working... but at least shade will be up to date. :-P12:12
openstackgerritSlawek Kaplonski proposed openstack-infra/openstack-zuul-jobs master: Remove tempest-dsvm-neutron-scenario-linuxbridge job definition  https://review.openstack.org/62894212:13
*** bobh_ has joined #openstack-infra12:16
*** dkehn has quit IRC12:21
*** bobh_ has quit IRC12:21
*** agopi has joined #openstack-infra12:24
*** dpawlik has joined #openstack-infra12:25
fricklercorvus: did you decide when to do the k8s walkthrough? seems like tomorrow is preferred, but it would be good if I could know for sure soon, so I can plan my evenings a bit12:26
*** yamamoto has quit IRC12:26
*** yamamoto has joined #openstack-infra12:29
*** ykarel is now known as ykarel|afk12:30
*** zigo has quit IRC12:31
SpamapSfrickler: IIRC k8s walkthrough is on Tuesday (US/Pacific), not sure the exact time.12:32
*** bhavikdbavishi has quit IRC12:32
openstackgerritTobias Henkel proposed openstack-infra/nodepool master: Extract common config parsing for ProviderConfig  https://review.openstack.org/62509412:34
*** zhangfei has joined #openstack-infra12:37
*** rlandy has joined #openstack-infra12:42
fricklerSpamapS: after the infra meeting is the first option on the ethercalc, wednesday the second one, so that would match my assumption12:51
SpamapSya12:52
*** yamamoto has quit IRC12:52
fricklerinfra-root: system-config-run-base-ansible-devel seems to be failing since friday, "ERROR! Unexpected Exception, this is probably a bug: 'PlaybookCLI' object has no attribute 'options'" http://logs.openstack.org/16/628216/4/check/system-config-run-base-ansible-devel/2fbf1ef/job-output.txt.gz#_2019-01-04_17_21_05_82768312:52
*** evrardjp has joined #openstack-infra12:54
*** rpittau|lunch is now known as rpittau12:56
*** evrardjp has quit IRC12:59
*** evrardjp has joined #openstack-infra12:59
*** jhesketh has quit IRC13:05
*** jhesketh has joined #openstack-infra13:06
fricklerlooks like this might be the culprit https://github.com/ansible/ansible/commit/afdbb0d9d5bebb91f632f0d4a1364de5393ba17a13:08
*** kaiokmo has joined #openstack-infra13:09
fricklerpossibly a genuine upstream bug instead of some bad use of internals on our side13:09
mordredfrickler: yah - I think so - we don't use the python api for running playbooks13:10
mordredfrickler: oh - wait - I think we might be using CLI options in callback plugins13:11
mordredfrickler: ok. I'm going to stop responding until I've had more coffee13:12
mordredfrickler: that's us running ansible in a job to run base.yaml - so isn't a zuul-side error, so yeah, I'd say that's most likely to be an ansible bug13:13
*** ykarel|afk is now known as ykarel13:15
*** boden has joined #openstack-infra13:18
*** wolverineav has joined #openstack-infra13:25
openstackgerritMerged openstack-infra/project-config master: Add fetch-output to base job  https://review.openstack.org/51185113:26
*** trown|outtypewww is now known as trown13:27
*** tmorin has joined #openstack-infra13:29
*** wolverineav has quit IRC13:29
*** boden has quit IRC13:30
tmorinhi infraroot: would someone be available to freeze and open access to a CI devstack VM, to allow me to investigate a failure I can't manage to reproduce locally?13:30
*** dave-mccowan has joined #openstack-infra13:31
tmorin(the job is legacy-tempest-dsvm-networking-bgpvpn-bagpipe for change 626895,3)13:31
*** tmorin has left #openstack-infra13:31
*** tmorin has joined #openstack-infra13:31
*** ykarel is now known as ykarel|away13:32
tmorininfra-root ^   (perhaps more likely to ping someone than 'infraroot')13:32
tmorinthanks in advance13:32
SpamapSmordred: ty for the shade +A .. I'm getting 1.30.0 into Debian and finally fixing the RC that kept it out of all releases (usr/bin/shade-inventory in python- and python3-)13:32
SpamapShaving to remember some incantations though13:32
SpamapSmordred: quite nice to drop all of those old build-deps for openstacksdk.13:33
*** udesale has joined #openstack-infra13:36
fricklertmorin: looking13:38
mordredSpamapS: ++ - I enjoyed your tweet about the packaging this morning13:38
tmorinfrickler: thanks!  (I did a 'recheck' minutes ago, after seeing a failure with many jobs)13:39
SpamapSmordred: yeah, I just wish I was asleep instead of tweeting to your delight. ;)13:40
tmorin"ERROR Unable to find playbook  /var/lib/zuul/builds/cf13bfca241e43f890868f4c09ce963c/trusted/project_0/git.openstack.org/openstack-infra/project-config/playbooks/base/post-ssh.yaml" -> this seems unusual, although totally unrelated to the issue I'm trying to investigate13:41
*** jamesmcarthur has joined #openstack-infra13:41
tmorinfrickler: the job isn't yet running, it's currently in "queued" status13:42
mnaseri just ran into the same error that tmorin ran into for my job13:42
fricklertmorin: hmm, it looks like we may have broken project-config13:42
mnaseri mean13:43
mnaserthe last merge into project-config is13:43
fricklermordred: this relates to https://review.openstack.org/#/c/511851/ ^^13:43
mnaseryeah13:43
fricklernetworking-bgpvpn-dsvm-functional networking-bgpvpn-dsvm-functional : ERROR Unable to find playbook /var/lib/zuul/builds/551b054a32fb425caa81d8ef5ba4ca2d/trusted/project_0/git.openstack.org/openstack-infra/project-config/playbooks/base/post-ssh.yaml13:43
openstackgerritJens Harbott (frickler) proposed openstack-infra/project-config master: Revert "Add fetch-output to base job"  https://review.openstack.org/62896713:44
fricklerinfra-root: ^^ probably need this until we have a better fix13:44
mnaserwe probably missed a post-ssh.yaml somewhere13:45
mnaseryep13:46
mnaserwe missed the one13:46
mnaserinside 'base'13:46
fricklerthis rather looks like a bad rebase to me13:46
*** smarcet has joined #openstack-infra13:46
*** jcoufal has joined #openstack-infra13:46
fricklermeh, the revert fails zuul, too13:47
*** ykarel|away has quit IRC13:47
mnaserwe might need an infra-root to force merge13:47
fricklermnaser: do you think you have an easy fix? otherwise I'd force-merge the revert13:47
fungimordred: pabelanger: you've been working on the fetch-output stuff, right? ^13:47
mnaseri mean13:47
mnaseri see a post-ssh.yaml that is referenced in the base job13:47
mordreduhoh. yeah - let's revert asap13:48
fricklerfungi: yes, https://review.openstack.org/511851 merged 20 minutes ago, seems it broke everything13:48
openstackgerritMohammed Naser proposed openstack-infra/project-config master: fetch-output: switch base to use post.yaml  https://review.openstack.org/62897013:48
mnasermordred: frickler ^13:48
mnaseri think that's the issue13:49
mordredmnaser: wow, how did we miss that13:49
mordredwe could just slam that one in instead - I think we're going to need a force-merge in either case13:49
mnaserill leave the decision of revert OR force-merge that up to whoever13:49
fricklerit will probably fail in zuul, too, yes13:49
fricklermordred: I'll leave it to you to clean up your patch, if you don't mind13:50
mordredkk. I'm going to force-merge mnaser's patch13:50
openstackgerritMerged openstack-infra/project-config master: fetch-output: switch base to use post.yaml  https://review.openstack.org/62897013:50
mnaserok lets see if we unbroke things now13:50
*** kgiusti has joined #openstack-infra13:51
mordredoh. sigh. I only set it on base-minimal before. sorry about that!13:51
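The breakage above boils down to the base job still naming a post-run playbook that the change had renamed on disk. A minimal before/after sketch, assuming the project-config layout discussed here (the real job stanzas carry many more settings):

```yaml
# zuul.d/jobs.yaml in project-config (sketch)
- job:
    name: base
    post-run: playbooks/base/post-ssh.yaml   # broken: playbook was renamed

# mnaser's fix (628970) points it at the renamed playbook:
- job:
    name: base
    post-run: playbooks/base/post.yaml
```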
mnaserwe know who's buying first round of drinks next ptg!13:51
mordredyes!13:52
fungiibm? ;)13:52
evrardjphaha13:52
mordredthanks ginny13:52
fungisorry, too soon13:53
mnaseri see jobs starting up13:53
mnaseri think we're good13:53
*** boden has joined #openstack-infra13:53
fungithanks for spotting that!13:54
mordredfrickler, fungi, mnaser: since you all have fetch-output stuff paged in- there's a patch that adds functional tests too: https://review.openstack.org/#/c/628731/13:56
fricklertmorin: o.k., now your job should be running properly and will be held once/if it fails. which ssh key shall I use for access?13:57
mnasermordred: btw, i would suggest getting in the habit of using 'is' instead of '|'13:57
mnaseri think | is being dropped soon, it throws deprecation stuff all over runs in newer versions of ansible13:58
tmorinfrickler: to be able to troubleshoot the problematic OVS state, I need to prevent tempest cleanup steps from happening13:58
mordredmnaser: ah - good call13:58
tmorinfrickler: I already tweaked the tempest test to do a sleep(10000) at the right place13:59
mnasermordred: it also reads easier sometimes, 'log_directory is not changed'13:59
mnaserbut anyways, that's just a nit :)13:59
tmorinfrickler: so we don't need to wait for a failure to freeze it13:59
mordredmnaser: totally. I'm going to do a followup that changes those - and also the ones that I cargo-culted from :)13:59
tmorinfrickler: sending you my pub key by PM13:59
mordredmnaser: does | succeeded go to is succeeded too?13:59
mnaseryep mordred14:00
fricklertmorin: oh, o.k., then I need to dig into finding the correct node before it is being listed as held14:00
mnaserhttps://docs.ansible.com/ansible/latest/user_guide/playbooks_tests.html#test-syntax -- "As of Ansible 2.5, using a jinja test as a filter will generate a warning."14:01
*** whoami-rajat has quit IRC14:01
openstackgerritMonty Taylor proposed openstack-infra/openstack-zuul-jobs master: Use is instead of | for tests  https://review.openstack.org/62897314:02
mordredmnaser: ^^14:02
openstackgerritsebastian marcet proposed openstack-infra/openstack-zuul-jobs master: Update laravel legacy jobs for PHP 7.x  https://review.openstack.org/62897414:04
*** tmorin has quit IRC14:04
mnasermordred: small tweak :)14:06
openstackgerritMonty Taylor proposed openstack-infra/zuul-base-jobs master: Add fetch-output to base jobs  https://review.openstack.org/62897514:07
openstackgerritMonty Taylor proposed openstack-infra/zuul-base-jobs master: Ignore errors on ssh key removal  https://review.openstack.org/62897614:07
mordredmnaser: ah - yes - thanks!14:07
mordredSpamapS: ^^ you use the zuul-base-jobs repo I believe? you might be interested in the stack there14:07
openstackgerritMonty Taylor proposed openstack-infra/openstack-zuul-jobs master: Use is instead of | for tests  https://review.openstack.org/62897314:10
mordredmnaser: fixed. thanks! that's much better14:10
*** e0ne has joined #openstack-infra14:12
*** ykarel|away has joined #openstack-infra14:12
pabelangerfungi: it doesn't look like 511851 was staged properly via base-test first, which broke things on merge14:12
smarcetfungi: mordred: morning , when u have some time please review https://review.openstack.org/#/c/628974/ thx !14:13
*** irclogbot_1 has quit IRC14:14
mordredpabelanger: nah - we did base-test - it's just when we applied it to base, I only updated the base-minimal job description and not also the base job description14:15
mordredsilly me14:15
mordredit makes me really wish zuul would consider a job definition that references a non-existent playbook as a config error (although it would be a bit expensive for it to do so)14:16
*** jamesmcarthur has quit IRC14:18
dhellmannconfig-core: this change to add a release job to the placement repo is a prereq for including it in the stein release, and the deadline for that is this week. Please add it to your review queue for the next couple of days. https://review.openstack.org/#/c/628240/14:19
*** e0ne has quit IRC14:19
pabelangermordred: ah, I see 628780 now14:24
AJaegermordred, mnaser, could you review https://review.openstack.org/#/c/628240/ , please?14:25
*** xek has quit IRC14:27
*** xek has joined #openstack-infra14:28
mordredAJaeger: done14:30
mordredpabelanger: yeah - oh well, we can't be perfect :)14:30
AJaegermordred: something still broken, see https://review.openstack.org/#/c/628731/14:30
AJaegerERROR Unable to find playbook /var/lib/zuul/builds/45e70d41476244b1b1ebdcea184fd3d8/trusted/project_0/git.openstack.org/openstack-infra/project-config/playbooks/base-minimal/post.yaml14:31
mordredoh ugh14:31
mordredyeah - one sec14:31
mordredthat'll only be affecting those tests and not everybody14:31
mordredbut still ugh14:31
dhellmannconfig-core: this patch to add new repos to the sahara project is also a prereq for a governance change for which this week's milestone is the deadline. Please add it to your review queue for early this week. https://review.openstack.org/#/c/628209/14:32
openstackgerritMonty Taylor proposed openstack-infra/project-config master: Rename base-minimal/post-ssh to base-minimal/post  https://review.openstack.org/62898314:33
mordredpabelanger, AJaeger: ^^14:33
mordredpabelanger: maybe I should have started with getting your base job refactor in first :)14:33
mordredAJaeger: wildcards work in gerritbot config? (re: 628209)14:34
mordredAJaeger: ah - so they do. neat!14:35
*** smarcet has quit IRC14:36
*** needsleep is now known as TheJulia14:36
AJaegermordred: yes14:36
*** smarcet has joined #openstack-infra14:37
ssbarnea|bkp2mordred: pabelanger : any chance to get https://review.openstack.org/#/c/625576/ merged?14:37
AJaegerssbarnea|bkp2: let's first fix the current breakage, please ;)14:38
AJaegerssbarnea|bkp2: your change has a high risk for failure and is untested...14:38
pabelangerssbarnea|bkp2: I haven't followed closely, but I thought the conclusion of the talks with clarkb and corvus was that this is expected behavior for legacy reasons14:39
pabelangerI'd much rather see jobs stop using zuul-cloner14:39
pabelangerand delete that role14:39
*** irclogbot_1 has joined #openstack-infra14:39
mordredyeah - I'm worried about the fallout from making that change - it's super hard to test or figure out what might break14:39
*** fuentess has joined #openstack-infra14:39
dhellmannmordred : thank you!14:40
openstackgerritMerged openstack-infra/openstack-zuul-jobs master: Update laravel legacy jobs for PHP 7.x  https://review.openstack.org/62897414:40
ssbarnea|bkp2mordred: that chmod was evil in the first place, i do understand that we need to be careful about that change, but this does not mean we shouldn't repair the damage just because of potential risk. right?14:41
*** jamesmcarthur has joined #openstack-infra14:42
ssbarnea|bkp2i think we should be able to find out in less than 30min if something important is affected and address it (doing it at job level, a local fix, or even a revert)14:43
openstackgerritMerged openstack-infra/project-config master: Add publish-to-pypi template to placement  https://review.openstack.org/62824014:44
fungissbarnea|bkp2: is your supposition that the issue which https://review.openstack.org/512285 attempted to fix by adding that is no longer present?14:45
*** ykarel|away is now known as ykarel14:45
*** smarcet has quit IRC14:45
*** e0ne has joined #openstack-infra14:46
ssbarnea|bkp2fungi: yep, this was my impression that we no longer need the hardlinking support.14:46
*** e0ne has quit IRC14:47
fungii think the discussion of the original problem starts at http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2017-10-15.log.html#t2017-10-15T12:24:4514:48
openstackgerritMerged openstack-infra/project-config master: Add new Sahara repositories for split plugins  https://review.openstack.org/62820914:49
openstackgerritMerged openstack-infra/project-config master: Rename base-minimal/post-ssh to base-minimal/post  https://review.openstack.org/62898314:50
tobias-urdinfungi: a while ago ianw_pto helped fix the forge.puppet.com credentials for the "openstack" account and iiuc it should be stored as a secret in zuul now but I can't seem to find it after browsing all repos and commit history, do you know where I could find it?14:50
*** anteaya has joined #openstack-infra14:50
AJaegertobias-urdin: might be only in the "private" hiera secret store14:52
anteayaso some confused third party ci person doesn't yet understand the purpose of an example wiki page: https://wiki.openstack.org/w/index.php?title=ThirdPartySystems/Example&diff=next&oldid=5644314:52
anteayaI'll change the text back and email the account cc'ing the infra email list14:53
fungithanks anteaya14:53
anteayaI don't know how successful I'll be, so thought I'd let folks know14:53
fungitobias-urdin: is there a job uploading files to puppetforge? i can likely trace backwards from whatever's using it14:54
fungianteaya: if you're a wiki admin you should be able to roll back the edit14:54
tobias-urdinfungi: no, I'm in the process of building that but the missing piece is what the secret was named so I can access it14:54
*** smarcet has joined #openstack-infra14:55
fungianteaya: if you see an "undo" link next to it in the list at https://wiki.openstack.org/w/index.php?title=ThirdPartySystems&action=history then that should hopefully do what you need14:56
*** rlandy is now known as rlandy|rover14:56
*** smarcet has quit IRC14:56
*** zhangfei has quit IRC14:57
fungitobias-urdin: i'll look in the usual places and see if we've recorded it somewhere14:57
tobias-urdinfungi: thanks!14:57
fungitobias-urdin: i have a password for an openstackinfra user on puppetforge with a comment: This is for the "openstack" namespace.  This used to be owned by a single user, but at request of PTL was assigned to infra.  user names map 1:1 with emails so we could not reuse above.  Note this + email gets filtered into a folder on infra-root imap server14:59
fungioh, sorry, that comment was for the openstack user, the openstackinfra user is noted as unused14:59
tobias-urdinthat's probably it, we had some issues since there was an openstack-infra namespace already that used the infra-root email15:01
*** irclogbot_1 has quit IRC15:01
anteayafungi: seems I am a wiki admin, I found a rollback option15:01
tobias-urdinthe namespace for modules is the username of the account so openstack/glance (which we upload from git.o.org/openstack/puppet-glance) puppet module is the "openstack" account on forge.puppet.com15:02
fungianteaya: great! i thought i recalled you being one, but didn't have time to check your account perms15:02
openstackgerritPaul Belanger proposed openstack-infra/zuul master: Clean up command sockets on stop  https://review.openstack.org/62899015:03
anteayafungi: you have a great memory15:04
fungitobias-urdin: makes sense. so anyway, if the intent is that the credentials for that account are going to be managed centrally (not by the puppet-openstack team), then we likely need the playbook which will use it added to the openstack-infra/project-config repo. if you want to propose that with a placeholder for the zuul secret, i can upload a revision of the change which includes the encrypted15:04
fungipassword15:04
mordredfungi: yeah - I think the idea was to have a central "upload to puppetforge" job sort of like upload-pypi15:08
mordrediirc15:08
* mordred wasn't 100% paying attention15:08
tobias-urdinhttps://review.openstack.org/#/q/topic:forge-publish+(status:open+OR+status:merged)15:09
*** irclogbot_1 has joined #openstack-infra15:09
tobias-urdin^ is what i have so far, where https://review.openstack.org/#/c/627573/ is the one that will use the secret15:10
anteayaalso looking at that contributor's edits on the wiki, that username is a 9-digit number and it appears their co-worker's username is a 10-digit number15:10
anteayafeels weird to me15:10
anteayabut we don't have a rule about wiki usernames15:10
* anteaya is looking at https://wiki.openstack.org/w/index.php?title=ThirdPartySystems&diff=cur&oldid=16746115:11
fungianteaya: i've seen so many weird things from our community i've started to question them less and less often ;)15:11
anteayaokay, thanks for the sanity check15:11
fungitobias-urdin: great! so i guess we need to add a secret in zuul.d/secrets.yaml and then add it to the secrets list for the release-openstack-puppet job description15:12
clarkbssbarnea|bkp2: pabelanger mordred  ya my concern is that that role is for preserving "legacy" behavior15:13
clarkbso updating it to not be legacy is potentially dangerous15:14
ssbarnea|bkp2fungi: re chmod on src, if i understand well the risk is only around legacy jobs, right? probably something like http://codesearch.openstack.org/?q=cp%20-l&i=nope&files=&repos= being the affected stuff, right?15:14
clarkbinstead you should stop using zuul cloner15:14
fungiwhat shall we call it? "puppetforge_credentials" looks like it would be most consistent with the other entries, but there is a lot of variation in there15:14
fungitobias-urdin: ^15:14
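Roughly what that ends up looking like — a sketch assuming the name fungi suggests; the ciphertext is produced by encrypting the password against the repo's public key, and the actual field names may differ:

```yaml
# zuul.d/secrets.yaml (sketch)
- secret:
    name: puppetforge_credentials
    data:
      username: openstack
      password: !encrypted/pkcs1-oaep
        - <ciphertext generated with the project's public key>

# the job description gains the secret:
- job:
    name: release-openstack-puppet
    secrets:
      - puppetforge_credentials
```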
fungissbarnea|bkp2: lots of things perform a hardlinking copy under the hood... virtualenv (via tox or otherwise), git clone file:///... and more15:16
ssbarnea|bkp2fungi: the hardlinking limitation due to file permissions applies only when the original owner is different from the one trying to do the hardlinking.15:17
ssbarnea|bkp2fungi: if everything is run under the same user, there is no need to "hack" default umask15:17
fungiand testing for it becomes challenging because if it's between different filesystems then those tools have fallbacks which will do something other than hardlink (because that's not possible) but we have different filesystem layouts in different providers15:17
ssbarnea|bkp2and AFAIK, anything under ~zuul should be owned by zuul and no other users.15:17
tobias-urdinfungi: sounds good to me, but i'll leave it entirely up to infra :)15:18
*** dpawlik has quit IRC15:18
fungissbarnea|bkp2: sure, top offenders will be jobs which use multiple accounts, such as devstack and devstack-based functional test jobs15:18
ssbarnea|bkp2fungi: even jobs with multiple accounts have workarounds that do not need the umask hack: just adding the stack user to the zuul group fixes the hardlinking issue15:18
ssbarnea|bkp2we only need to avoid o+w15:19
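A sketch of that suggested workaround as an Ansible task, assuming the devstack-style "stack" user and a "zuul" group (the names are illustrative, not the actual devstack implementation):

```yaml
- name: Let the stack user hardlink zuul-owned files
  become: yes
  user:
    name: stack
    groups: zuul
    append: yes   # keep stack's existing supplementary groups
```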
fungissbarnea|bkp2: that happened? on old branches of devstack too?15:19
ssbarnea|bkp2fungi: i don't know that, i am only trying to eradicate the umask while addressing the risks that entails.15:21
*** openstackgerrit has quit IRC15:22
*** openstackgerrit has joined #openstack-infra15:23
openstackgerritPaul Belanger proposed openstack-infra/zuul master: Ensure command_socket is last thing to close  https://review.openstack.org/62899515:23
openstackgerritSorin Sbarnea proposed openstack-infra/devstack-gate master: Replace cp -l with --reflink=auto  https://review.openstack.org/62899815:31
dmsimardbtw, tagged ara 0.16.2 for release. No new features -- addresses warnings and a deprecation notice.15:34
*** e0ne has joined #openstack-infra15:42
*** bobh_ has joined #openstack-infra15:43
*** bobh_ has quit IRC15:47
*** e0ne has quit IRC15:50
openstackgerritWill Szumski proposed openstack-dev/pbr master: Do not globally replace path prefix  https://review.openstack.org/62900615:56
smcginnisdmsimard: Not sure if you saw, but the ara release jobs failed. Well, just the docs publishing. Looks like the readthedocs config isn't fully set up.15:57
dmsimardsmcginnis: Oh? I'll look -- thanks15:57
clarkbrtd changed their api and now we cant remotely trigger updates15:58
clarkbiirc15:58
dmsimardclarkb: even with the new webhook stuff ?15:58
dmsimardsmcginnis: do you have a link handy ?15:58
smcginnisYeah, let me track that down.15:58
smcginnisdmsimard: http://lists.openstack.org/pipermail/release-job-failures/2019-January/001015.html15:59
smcginnisdmsimard: The logs don't actually have much useful info though - http://logs.openstack.org/a3/a31a4f8cbbc84f3d96efb0ffc533621190fdde46/release/trigger-readthedocs-webhook/d500e56/job-output.txt.gz#_2019-01-07_15_25_57_41688016:00
clarkbdmsimard: yes they broke it after the webhook stuff. ianw filed abug with them16:00
dmsimardsmcginnis: hmmm, I probably need to put the webhook_id elsewhere than https://github.com/openstack/ara/blob/master/zuul.d/layout.yaml#L316:00
dmsimardclarkb: ok, I'll trigger it manually for now16:01
*** bnemec is now known as stackymcstackfac16:04
*** stackymcstackfac is now known as bnemec16:05
*** fuentess has quit IRC16:06
*** smarcet has joined #openstack-infra16:12
*** jamesmcarthur has quit IRC16:17
*** jamesmcarthur_ has joined #openstack-infra16:17
*** pcaruana has quit IRC16:21
*** jamesmcarthur_ has quit IRC16:22
*** jamesmcarthur has joined #openstack-infra16:22
*** wolverineav has joined #openstack-infra16:24
fungiheading out to run some lunch errands, but will be back as soon as i can16:28
*** whoami-rajat has joined #openstack-infra16:28
*** wolverineav has quit IRC16:29
*** smarcet has quit IRC16:29
*** psachin has quit IRC16:31
*** smarcet has joined #openstack-infra16:32
openstackgerritWill Szumski proposed openstack-dev/pbr master: Do not globally replace path prefix  https://review.openstack.org/62900616:33
*** ramishra has quit IRC16:34
*** armax has joined #openstack-infra16:39
*** udesale has quit IRC16:40
*** bobh_ has joined #openstack-infra16:41
clarkbssbarnea|bkp2: would it be reasonable for your ansible use case to add a chmod prior to running ansible? or cloning the roles/playbooks to another location first? I'd really like to avoid breaking people in that legacy state by changing the expectations around that. We recognize there were bugs and deficiencies with that setup which is why we've replaced it entirely in zuulv3.16:42
clarkblogan-: fwiw host_id: 704a6e4d2ae61ad0bf113de69b52cb6414dadb287241358ebaf1c7b2 shows up in a couple jobs that exhibit weird ipv4 connectivity between test nodes in limestone cloud. http://logs.openstack.org/31/628731/7/check/openstack-infra-multinode-integration-ubuntu-trusty/35c4982/zuul-info/inventory.yaml is one example with ovs vxlan tunnel over ipv4 not working and16:46
clarkbhttp://logs.openstack.org/00/628200/1/gate/tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates/660080e/job-output.txt is a tripleo job unable to ssh from one node to the other for ansible over ipv416:46
clarkb(still a small data set so unfortunately don't have much more info than that)16:46
*** fuentess has joined #openstack-infra16:47
clarkbdmsimard: https://github.com/rtfd/readthedocs.org/issues/4986 is the rtd bug16:48
clarkbstill open but looks to be accepted and should be fixed in the future16:48
*** smarcet has quit IRC16:51
*** ginopc has quit IRC16:53
openstackgerritWill Szumski proposed openstack-dev/pbr master: Do not globally replace path prefix  https://review.openstack.org/62900616:54
*** wolverineav has joined #openstack-infra16:54
*** notmyname has quit IRC16:55
openstackgerritMerged openstack-infra/system-config master: Turn on the future parser for all zuul mergers  https://review.openstack.org/61629516:56
*** wolverineav has quit IRC16:56
*** notmyname has joined #openstack-infra16:57
*** smarcet has joined #openstack-infra16:58
*** rfolco has quit IRC16:58
*** rfolco has joined #openstack-infra16:59
*** rpittau has quit IRC17:00
*** smarcet has quit IRC17:02
*** aojea has quit IRC17:04
openstackgerritMerged openstack-infra/zuul master: Fix ignored but tracked .keep file  https://review.openstack.org/62139117:06
openstackgerritMerged openstack-infra/system-config master: Turn on the future parser for zuul.openstack.org  https://review.openstack.org/61629617:06
clarkbinfra-root ^ I'll be watching that17:08
*** wolverineav has joined #openstack-infra17:08
mordredclarkb: ++17:09
clarkbfungi: the changes after that one are for lists.kata and lists.open, any chance you want to approve and/or babysit those with me today?17:09
*** dustinc has joined #openstack-infra17:11
openstackgerritMatthieu Huin proposed openstack-infra/zuul-jobs master: [WIP] upload-pypi: add option to register packages  https://review.openstack.org/62901817:19
ssbarnea|bkp2clarkb: so we're not fixing the broken zuul-cloner role because of the risks and because it is supposed to go away. Still, I believe we do include it in 99% of jobs based on https://github.com/openstack-infra/project-config/blob/ab0cb430d130aaed3e6d333384c4d6d8740040fe/playbooks/base/pre.yaml#L38 -- fetch-zuul-cloner does more than fetch it, it also messes with the src folder.17:20
ssbarnea|bkp2clarkb: can we make the role execution conditional? ... a step towards deprecation.17:21
ssbarnea|bkp2or better, to run the "umask" task only on old jobs. do we have a variable we can use to add a condition?17:21
clarkbssbarnea|bkp2: right I don't think we want to fix the frozen deprecated process. Instead we want to convert jobs to the new process. I think the plan for that was to make a different base job for legacy jobs. And the main base job wouldn't run the zuul cloner shim setup17:22
clarkbssbarnea|bkp2: but that process ran into problems because jobs were not marked legacy but had legacy dependencies. Probably what we can do is notify the dev list of the change happening then make the switch in a couple weeks17:23
clarkbpabelanger: mordred ^ I think you had a lot more of that paged in than I did17:23
clarkbcorvus: http://paste.openstack.org/show/740460/ shows up on zuul node puppet runs. Can I just clean that up out of band to make the puppet logs quieter?17:25
clarkbmordred: ^ you may know too as it seems related to the zuul dashboard hosting17:25
*** trown is now known as trown|lunch17:26
mordredclarkb: uhoh. what did I do?17:26
corvusyeah, i think that's very old status page17:26
ssbarnea|bkp2smells like a chicken-and-egg kind of issue: never fixing "base" because someone is/may be using it. How about having a "base2" base to use? At least this approach allows people to adopt newer base(s) without having to worry about some project being too slow.17:26
clarkbssbarnea|bkp2: yes I'm saying we should fix base, just give people some time to change their jobs if they need to first17:26
clarkbssbarnea|bkp2: basically the thing that has been missing is someone to drive and coordinate that work. There isn't a lack of wanting to do it17:27
mordredclarkb: I agree - I think we shoudl fix base to not run fetch-zuul-cloner and have that only run in legacy-base17:27
ssbarnea|bkp2mordred: i like that idea. i was considering using a "when" condition on inclusion of this role.17:28
ssbarnea|bkp2import_role with when works ok, I just don't know what to check for (how to know which job is old/new)17:29
clarkbssbarnea|bkp2: I don't think we want to make it complicated like that. Instead rely on zuul's job inheritance to simplify it for us. Use base if your job is not legacy and legacy-base if it is17:29
clarkb(I don't know if legacy-base exists yet)17:29
clarkbbase won't have the zuul cloner shim setup in it and legacy-base will17:30
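In job terms the migration is just a reparenting — a sketch with a hypothetical job name:

```yaml
# Jobs that still call the zuul-cloner shim keep it by parenting here:
- job:
    name: legacy-example-dsvm-job   # hypothetical
    parent: legacy-base             # runs fetch-zuul-cloner

# Everything else parents to base, which drops the shim once it is removed.
```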
clarkbcorvus: ok I'll clean those dirs up17:30
*** rf0lc0 has joined #openstack-infra17:30
*** jpich has quit IRC17:31
*** rf0lc0 has quit IRC17:31
mordredyes - legacy-base exists17:32
mordredand it runs fetch-zuul-cloner17:32
*** rfolco has quit IRC17:33
mordredso I think we should be able to warn people, give it a little time, then remove fetch-zuul-cloner from base17:33
mordredall of the autoconverted legacy jobs use legacy-base17:33
clarkb++17:33
openstackgerritSorin Sbarnea proposed openstack-infra/project-config master: WIP: attempt removal of fetch-zuul-cloner from base job  https://review.openstack.org/62901917:34
ssbarnea|bkp2mordred: clarkb ^^ so my WIP test above could be the future removal. I already created a DNM change that tests its effect on tripleo https://review.openstack.org/#/c/625680/17:38
clarkbssbarnea|bkp2: I don't think the depends on will work because project-config changes must be merged first before they can be tested17:38
mordredssbarnea|bkp2: that unfortunately won't work ...17:38
mordredyeah - what clarkb said17:38
clarkbwhat we can do is troll logstash for zuul-cloner usage17:39
mordred(this will get better with pabelanger's base job refactor, but that hasn't landed yet)17:39
clarkband cross check that with people using base and not legacy-base17:39
*** ykarel has quit IRC17:39
clarkb(I'm not sure how much work that is)17:39
mordredclarkb: we could also work through pushing in pabelanger's base refactor so that we could test zuul-cloner removal with depends-on17:39
clarkbzuul01 ran with futureparser and seems happy17:39
ssbarnea|bkp2i don't know how you can detect its usage, as the role runs as part of every job.17:40
clarkbssbarnea|bkp2: you'd be looking for jobs that actually run the zuul-cloner command later17:40
clarkbif they don't then they don't need the shim17:40
clarkbmordred: ya17:40
mordredclarkb: https://review.openstack.org/#/q/status:open+topic:base-minimal-jobs - I was going to work on landing that once done with the zuulv3-output topic17:40
*** rfolco has joined #openstack-infra17:42
*** rkukura has joined #openstack-infra17:42
*** gyee has joined #openstack-infra17:45
*** dtantsur is now known as dtantsur|afk17:45
AJaegermordred: feel free to ping if you need review for zuulv3-output, I suggest we wrap that up without waiting another 12 months ;)17:46
mordredAJaeger: totally. It's my main priority job-wise at the moment17:47
jrosseri've seen mirror errors like this a fair few times now http://logs.openstack.org/26/628926/8/check/openstack-ansible-functional-centos-7/fec2d75/job-output.txt.gz#_2019-01-07_17_29_52_85580617:47
clarkbAJaeger: mordred is there much else to do after the base job switch? I guess convert a job or two and point people to that setup?17:48
*** wolverineav has quit IRC17:49
clarkbjrosser: http://mirror.regionone.limestone.openstack.org:8080/rdo/centos7/a4/4b/a44b7fc6344c56410b94f2e69ef07fd4b48abb6a_b72eb3dd/python-oslo-utils-lang-3.37.1-0.20181012100734.6e0b90b.el7.noarch.rpm exists but the 3.39 rpm your job requests does not. Those are caching proxies so you've tried to install a package that does not exist on the remote I think17:49
clarkbjrosser: its possible we've got cache mismatches between indexes and actualy contents? except the index I see shows 3.37 and that rpm exists17:50
AJaegerclarkb: https://review.openstack.org/#/q/topic:zuulv3-output+status:open is the list of open reviews - mordred is rebasing one after the other and merging them...17:50
ssbarnea|bkp2http://codesearch.openstack.org/?q=bin%2Fzuul-cloner&i=nope&files=&repos= reports ~444 occurrences, which makes me hopeless about reaching 0 in my lifetime. using logstash does not make me more optimistic either17:50
mordredclarkb: yeah - I think the next bit after that stack would be converting some of our main jobs - like devstack and build-sphinx17:50
openstackgerritMerged openstack-infra/puppet-ptgbot master: No longer needs room map in configuration  https://review.openstack.org/62561917:51
mordredas examples - but also to get large portions of our system converted over17:51
mordredI think we need to wait a bit before we can convert things like unittests base though17:51
mordredwe'll need to give deployers a deprecation period to get fetch-output into their base jobs17:52
clarkbjrosser: https://trunk.rdoproject.org/centos7/a4/4b/a44b7fc6344c56410b94f2e69ef07fd4b48abb6a_b72eb3dd/ seems to only have 3.37 too. Any idea where 3.39 is coming from?17:52
mordredbut I think converting openstack-only base jobs should be straightforward17:52
clarkbssbarnea|bkp2: ya that's why I suggest we notify people via the list. Maybe include that codesearch link and a link to a logstash query17:52
clarkbssbarnea|bkp2: but let them fix it themselves17:52
clarkbssbarnea|bkp2: most of them should be using legacy-base if they came from the job conversion we did from jjb17:53
mordredyah. I'd say using zuul-cloner and not using legacy-base should be an unsupported config17:54
ssbarnea|bkp2somehow i find the concept of a full switch from any v1 to v2 very hard to achieve. Can't we find a more progressive adoption approach? maybe we can enable/disable features/changes using a versioned variable, "zuul_job_version", which is implicitly 1 but we could define a bigger value in our jobs. And we can have tasks that run based on the value of this.17:56
jrosserclarkb: i think that is a dependency of some other package, it's easier to parse here but there are a bunch it can't find http://logs.openstack.org/26/628926/8/check/openstack-ansible-functional-centos-7/fec2d75/logs/ara-report/result/f84f0c97-d4d7-416b-8972-b3abcaa08833/17:56
clarkbssbarnea|bkp2: we crossed that bridge when zuul v3 decided to be incompatible with < 317:56
clarkbssbarnea|bkp2: I think for the future we should be more careful, but zuul-cloner is an artifact of zuul v2 which we do not run anymore and we should stop using it entirely17:57
clarkbjrosser: I wonder if RDO is updating packages before putting all of the dependencies in place17:57
clarkbjrosser: dmsimard may know17:57
openstackgerritMerged openstack-infra/puppet-ptgbot master: No longer build index page in puppet-ptgbot  https://review.openstack.org/62691117:57
AJaegerhttp://zuul.openstack.org/status/change/628731,7 is waiting since 3 hours for Debian nodes ;( Any problems with Debian nodes ?17:58
dmsimardclarkb, jrosser: having lunch right now, I'll be able to check in a few17:59
jrosserno worries - i'm travelling home soon but i've seen a fair few of these so it's worth a bit of a dig later17:59
clarkbAJaeger: | 0001544620 | ovh-bhs1            | debian-stretch         | a3145fd4-7ce5-4a5a-be55-6b2407f00cac | 158.69.65.196   | 2607:5300:201:2000::335                | ready    | 00:02:24:55 | locked   |18:00
clarkbAJaeger: it is odd that a ready node would be locked for 2.5 hours18:00
*** ykarel has joined #openstack-infra18:00
*** diablo_rojo has joined #openstack-infra18:00
*** derekh has quit IRC18:00
clarkb| 0001547894 | rax-dfw             | debian-stretch         | 2f0bd8e3-135e-47d5-b28c-8cde74f3af85 | None            | None                                   | building | 00:00:00:17 | locked   | is new node building and | 0001547866 | inap-mtl01          | debian-stretch         | 15b690bd-0005-46c1-b47a-4047db6ed536 | 198.72.124.91   |                                        | in-use   | 00:00:00:4918:00
clarkb| locked   | is recently used nodes18:00
clarkbAJaeger: my guess is that locked ready node is tied to that job18:01
*** jcoufal has quit IRC18:01
clarkbShrews: ^ is that something you might want to look at?18:01
logan-interesting log on that first job you linked clarkb. it looks like it has connectivity across the vxlan but only in one direction?18:01
clarkblogan-: no I think it is broken in both directions. We ping the local IP locally and remotely. The pings that succeed are for the local IP18:02
AJaegerclarkb: thanks18:02
clarkblogan-: that helps us to know if hte interface itself is broken or if it is the tunnel. The local IP pinging implies the tunnel is the issue18:03
*** jcoufal has joined #openstack-infra18:04
ssbarnea|bkp2clarkb: regarding rdo, if i remember well, updating deps doesn't happen beforehand, due to its slow speed. but overall i think the logic has changed over time.18:04
logan-ah ok, so 172.24.4.1 is on the 'primary' side of the tunnel, and 172.24.4.2 is on secondary18:05
clarkblogan-: yup18:05
*** diablo_rojo has quit IRC18:05
Shrewsclarkb: i can look, but it's not uncommon for zuul to hold ready node locks for that long18:06
Shrewswill see if i can track that one down though18:06
*** ykarel has quit IRC18:06
clarkbShrews: would it be waiting on an executor to be available to run the job? iirc we get nodes before an executor18:07
clarkbbasically what prevents that job from starting if it has a node and is queued. Executor availability is all I can think of right now18:08
clarkbssbarnea|bkp2: re cloner shim. Basically most jobs that use it will be carried over from our JJB conversion and use legacy-base. There is potential for some jobs to use the shim and parent to base and not legacy-base. If they do this they will break and we can fix them pretty easily by converting to legacy-base. So there might be a short period of brokenness, but it comes with a straightforward fix. If we18:11
clarkbcommunicate that people can check things beforehand (and fix it beforehand) then I think we've done a good job there and anyone broken should have a relatively easy path forward for fixing too18:11
clarkbssbarnea|bkp2: if we want to wait on the base job refactoring that allows us to test more of the base jobs we can do that too which will make the step of auditing whether or not it will break you easier18:11
*** agopi has quit IRC18:14
Shrewsclarkb: looks like ovh-bhs1 is at quota, so it's paused trying to handle the most recent request, but frequently getting launch errors. the active request queue is slowly shrinking, but node requests piled up in that queue are delayed (that node is part of one of the requests, but that one still needs 1 more node)18:15
*** agopi has joined #openstack-infra18:15
clarkbah so this is multinode requests when at quota18:16
Shrewsclarkb: correct18:16
clarkbamorin: ^ hello not sure if you are back from holidays, but we've noticed our quota in bhs1 is lower than we had previously18:17
clarkbamorin: do we need to update our configs or is that a bug?18:17
*** rkukura has quit IRC18:18
*** agopi has quit IRC18:18
*** jamesmcarthur_ has joined #openstack-infra18:19
*** agopi has joined #openstack-infra18:19
*** jamesmcarthur has quit IRC18:19
*** trown|lunch is now known as trown18:20
Shrewsclarkb: i'm confused by that since we have max-servers as 150 but i count only 36 ovh-bhs1 nodes18:22
Shrewssomething out of sync?18:22
clarkbShrews: yes our quota in bhs1 has been lowered18:22
clarkbShrews: don't know why or how yet, but basically nodepool is operating under the lower quota numbers18:22
clarkbShrews: part of it is we've kept frickler's test nodes around in that cloud so that's ~20 instances18:23
*** jamesmcarthur_ has quit IRC18:23
clarkbbut that still doesn't account for the full decrease.18:23
*** jamesmcarthur has joined #openstack-infra18:23
Shrewshrm18:23
clarkbfrickler: ^ maybe we should clean those up now that it has been almost a month?18:23
*** jamesmcarthur has quit IRC18:24
*** jamesmcarthur has joined #openstack-infra18:24
*** jamesmcarthur has quit IRC18:24
*** jamesmcarthur has joined #openstack-infra18:25
*** jamesmcarthur has quit IRC18:26
*** jamesmcarthur_ has joined #openstack-infra18:26
*** agopi_ has joined #openstack-infra18:26
*** jamesmcarthur_ has quit IRC18:27
*** agopi has quit IRC18:29
*** dkehn has joined #openstack-infra18:29
*** jamesmcarthur has joined #openstack-infra18:32
*** jamesmcarthur_ has joined #openstack-infra18:34
*** jamesmcarthur has quit IRC18:34
fungiclarkb: sure, once i'm caught up, happy to keep an eye on puppeting of the listservers18:37
clarkbfungi: great, want to approve the first one when you are in a good spot for that? I'll be around all day so can switch to that when you are ready18:37
*** jamesmcarthur_ has quit IRC18:40
fungiclarkb: "first one" being 628216?18:43
clarkbfungi: yup18:43
fungii have it queued up in gertty now, will approve shortly, sure. thanks!18:44
*** rkukura has joined #openstack-infra18:45
*** wolverineav has joined #openstack-infra18:48
clarkbI'll pop out for a bit now then should be back by the time it can be merged and applied18:50
*** wolverineav has quit IRC18:52
dmsimardinfra-root: apache on mirror02.regionone.limestone.openstack.org is complaining about a read-only filesystem when trying to write cache header files, ex: http://paste.openstack.org/raw/740463/18:54
clarkbdmsimard: any errors in dmesg or the kernel log indicating why the fs is ro?18:55
fungidmesg ring buffer has been spammed by filesystem errors18:55
dmsimardclarkb: there are ext4-fs errors in dmesg but I'm trying to find when it started or if there are any afs errors18:56
dmsimardalso, yes, what fungi said -- actual non-fs errors have been cycled out18:56
clarkbdmsimard: that cache isnt on afs it is "local"18:56
logan-verified hv is not full /dev/mapper/SYSVG-ROOT  465G  251G  191G  57% /18:56
dmsimardclarkb: oh, oops18:57
fungiyeah, we have a lvm2 logical volume for it, from a vg on top of pv /dev/vdb118:57
dmsimardI'm working my way up the apache logs, so far this has been ongoing since at least jan 218:58
dmsimardapache logs go as far back as dec 31st and there were read-only errors already18:59
fungisyslog only has a one-week retention there, yeah18:59
dmsimard[Mon Dec 31 06:35:17.172980 2018] [cache_disk:warn] [pid 2883:tid 140135773492992] (30)Read-only file system18:59
fungioldest syslog entry is:18:59
fungiDec 31 06:25:09 mirror02 kernel: [2073564.770768] EXT4-fs error (device dm-0): ext4_lookup:1606: inode #3367: comm updatedb.mlocat: deleted inode referenced: 825118:59
clarkbfwiw I don't think that is the source of jrosser's 404 as we seem to proxy without caching19:00
fungiright, i think this would account for proxy performance degradation19:00
funginot 40419:00
jrosserIt may be worth looking in logstash for similar because I have a hunch all of these I’ve seen were in limestone19:01
dmsimardyes and no19:01
dmsimardI found that issue as part of troubleshooting the 404 :)19:01
fungiright, we ought to fix it regardless19:02
clarkbya should be fixed19:02
dmsimardmnaser pointed out that we may have been pulling a stale .repo file but it doesn't appear to be that way19:02
clarkbdmsimard: no I checked directly and it seems to line up with our mirror19:02
clarkbmy hunch is that the rdo repo is updating packages with new deps before adding the new deps19:03
dmsimardI was in #openstack-ansible attempting to help, tl;dr is that OSA is setting up a recent repo (built today) which contains the packages that are 404'ing, but for some reason yum is looking at this 1-month-old repo19:03
dmsimardThere is a 404 on http://mirror.regionone.limestone.openstack.org:8080/rdo/centos7/a4/4b/a44b7fc6344c56410b94f2e69ef07fd4b48abb6a_b72eb3dd/python2-glanceclient-2.15.0-0.20181226110746.c4c92ec.el7.noarch.rpm19:03
dmsimardbecause that package is actually at https://trunk.rdoproject.org/centos7/90/44/9044841473d3c9a4c70882bfb5ca59f89cf7afa0_6dde040f/python-glanceclient-2.15.0-0.20181226110746.c4c92ec.el7.src.rpm19:04
fungilooks like that pv is via cinder volume f18e717d-8981-4134-8fe1-57596f7481e419:04
dmsimardor http://mirror.regionone.limestone.openstack.org:8080/rdo/centos7/90/44/9044841473d3c9a4c70882bfb5ca59f89cf7afa0_6dde040f/python-glanceclient-2.15.0-0.20181226110746.c4c92ec.el7.src.rpm19:04
*** tosky has quit IRC19:05
fungilogan-: possible we briefly lost contact with the cinder backend sometime >1 week ago? that's usually sufficient for active volumes to go read-only on us19:05
dmsimardthat /90/44 repository is set up properly by OSA as far as we can tell http://logs.openstack.org/26/628926/8/check/openstack-ansible-functional-centos-7/fec2d75/logs/ara-report/result/fd68c556-12ed-4096-9d07-697951a4b3cf/19:05
dmsimardI'm not exactly sure what is going on, would like to address the caching issue and see if we can reproduce19:06
fungiif i stop apache2 and openafs services on mirror02.regionone.limestone i should be able to remount those filesystems read-write, but it probably makes more sense to reboot the instance anyway19:07
fungiinfra-root: objections to an emergency reboot of mirror02.regionone.limestone? ^19:07
logan-fungi: maybe when we rebooted hosts to update the kernel for nested virt fixes, ceph hung io long enough to kick it into ro.. i can't remember exactly when that was, a couple weeks ago though19:07
*** smarcet has joined #openstack-infra19:07
fungilogan-: that could easily be it. we don't have syslog back that far unfortunately to get a more exact timestamp19:08
dmsimardfungi: I think we'd want to do a stop/start to ensure we're on a new process with a brand new volume connection19:08
clarkbfungi considering it broken anyway seems fine. We can disable it in nodepool too if we want to be more graceful about it19:08
fungidmsimard: yeah, rebooting the instance will certainly stop and start the processes running in it ;)19:08
clarkbdmsimard: can we stop start as users of nova?19:08
dmsimardfungi: I mean the KVM process19:08
clarkboh I read it as stop start the qemu process19:08
fungiohhh19:08
pabelangerclarkb: mordred: https://review.openstack.org/513506/ removed zuul-cloner from base; I think the issue there was that legacy tox jobs still depend on it. Maybe we just reparent them to legacy-base?19:09
pabelangerclarkb: mordred: but agree, a heads-up to the ML about the fallout might be a good idea19:09
fungiyeah, i can `sudo poweroff` in the instance and then `openstack server boot` it fresh19:09
dmsimardI'm not sure to what extent it applies to ceph, I remember iscsi issues that could only be resolved by spinning up a new process and a soft reboot wasn't enough19:09
fungithough when we've seen this sort of thing in other clouds in the past, just remounting the filesystems read-write after a good fsck is usually sufficient19:10
*** wolverineav has joined #openstack-infra19:10
*** wolverineav has quit IRC19:10
*** wolverineav has joined #openstack-infra19:10
*** jamesmcarthur has joined #openstack-infra19:11
dmsimardI suppose we should first attempt to remount before considering rebooting19:11
fungiwell, the outage from a reboot should be brief19:15
fungibut we can dial down max-servers first if we want19:15
clarkbthat is the safest way19:17
clarkbbut I agree should be short and if we think jobs are broken anyway...19:17
*** jamesmcarthur has quit IRC19:18
clarkbok I have to pop out for a few minutes. back in a few19:18
dmsimardfungi: any of these solutions sound good to me -- I can send a patch for max-servers or we can put nl02 in the emergency file temporarily19:22
*** shardy has quit IRC19:24
*** shardy has joined #openstack-infra19:25
fungiprobably simplest to just do nl02 in emergency and then manually update max-servers on it, then wait for the used count to empty, then we can poweroff the mirror instance and boot it again19:30
dmsimardWas going to add it in the emergency file but saw a .swp file from a minute ago :p19:31
fungithat was me19:31
dmsimardok19:31
*** gfidente has quit IRC19:32
fungi#status log temporarily lowered max-servers to 0 in limestone-regionone in preparation for a mirror instance reboot to clear a cinder volume issue19:32
openstackstatusfungi: finished logging19:32
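The knob involved is the provider's max-servers in the launcher config — a sketch assuming the limestone stanza in nl02's nodepool.yaml (the pool name is illustrative):

```yaml
# /etc/nodepool/nodepool.yaml on nl02 (sketch)
providers:
  - name: limestone-regionone
    pools:
      - name: main
        max-servers: 0   # temporarily zero while the mirror is rebooted
```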
openstackgerritsebastian marcet proposed openstack-infra/openstack-zuul-jobs master: Updated laravel jobs to include php7 repo  https://review.openstack.org/62903519:32
fungii'll keep an eye on that to make sure the current puppet/ansible pulse doesn't re-up it19:32
fungithat was the only provider on nl02 booting nodes, btw. the others (citycloud, packethost) were already set to max-servers 019:34
*** agopi__ has joined #openstack-infra19:34
*** smarcet has quit IRC19:36
*** agopi_ has quit IRC19:36
clarkbpabelanger: ^ re packethost have you been able to follow up on using some osa there?19:36
clarkbI'm back now too19:37
pabelangerclarkb: Yah, they are keen. We are just finishing up the POC with ansible-network.19:37
clarkbNICE19:37
clarkber didn't mean the caps there but still nice :)19:37
pabelangerclarkb: I think we actually can start pushing up code to review.o.o next week, and start deploying it from zuul19:38
cloudnull^ nice19:38
*** smarcet has joined #openstack-infra19:38
pabelangerbut yah, OSA works well on stable/rocky19:38
smarcetfungi: mordred: clarkb: please review https://review.openstack.org/#/c/629035/ thx19:40
dmsimard#status log mirror02.regionone.limestone.openstack.org's filesystem on the additional cinder volume went read only for >1 week (total duration unknown) causing errors when apache was attempting to update its cache files.19:41
openstackstatusdmsimard: finished logging19:41
openstackgerritMerged openstack-infra/openstack-zuul-jobs master: Add fetch-output and ensure-output-dirs tests  https://review.openstack.org/62873119:43
openstackgerritMerged openstack-infra/openstack-zuul-jobs master: Use is instead of | for tests  https://review.openstack.org/62897319:44
clarkbmordred: ^ fyi19:44
mordredclarkb: woot19:44
mordredsmarcet: zuul is sad about that patch19:45
smarcetyes saw it19:45
smarcetmordred: will fix sorry about that19:45
mordredsmarcet: no worries!19:45
*** jamesmcarthur has joined #openstack-infra19:47
openstackgerritsebastian marcet proposed openstack-infra/openstack-zuul-jobs master: Updated laravel jobs to include php7 repo  https://review.openstack.org/62903519:52
*** whoami-rajat has quit IRC19:58
*** kjackal has quit IRC20:04
smarcetmordred: fungi: fixed https://review.openstack.org/#/c/629035/20:07
AJaegermordred: want to abandon https://review.openstack.org/628668 now?20:08
*** agopi__ has quit IRC20:09
openstackgerritsebastian marcet proposed openstack-infra/openstack-zuul-jobs master: Updated laravel jobs to include php7 repo  https://review.openstack.org/62903520:21
*** bobh_ has quit IRC20:21
*** imacdonn has joined #openstack-infra20:22
clarkbfungi: should I approve https://review.openstack.org/#/c/628216/4 or were you still planning to do it?20:27
fungii've approved it just now and am watching syslog on the server20:29
clarkbgreat20:30
fungilast puppet apply there was 19:58:4020:30
*** e0ne has joined #openstack-infra20:40
imacdonnhi guys ... https://review.openstack.org/#/c/612393/ failed in the gate due to a tempest timeout ... must it be rechecked, or is there any shortcut (requeue?)20:41
clarkbimacdonn: in general failures have to be rechecked due to the "clean check" requirement20:44
clarkbthe major exception to this is when we are trying to get bug fixes in for the gate itself and want to expedite that process to avoid unnecessary gate resets20:44
clarkb(this is why I keep pushing for people to help debug and fix gate errors)20:44
imacdonnclarkb: yeah, I understand that, but it seems like a timeout has a high probability of not being the code's fault ... I think exceptions may have been given in the past, but I don't recall the circumstances20:45
imacdonnjust seems like a waste of resources to go through check again20:45
imacdonnunderstood, though ... figured it was worth asking ;)20:46
clarkbimacdonn: what we are trying to avoid there is making it easy for flaky code to go through the gate and merge, then be flaky for everyone (we've seen this happen in the past, and it's one of the reasons for clean check. The other is ensuring that we have relatively up-to-date results, avoiding unnecessary gating)20:46
clarkbimacdonn: and yes the flaky gate failures are often not directly related to the specific change that failed.20:47
clarkbwhich is why I keep pushing people to identify the failures, track them with elastic-recheck and ideally fix them if we are able20:47
imacdonnclarkb: yeah, I get it .... trying to diagnose timeouts on infra that you have little visibility into can be challenging, though ;)20:48
clarkbimacdonn: ya one of the frequent steps we have to take is add additional logging around the unhappy code20:49
fungicorvus: noticing /var/lib/bind/zones/zuulci.org/zone.db.signed on adns1.opendev.org is (much) older than zone.db, and rndc zonestatus doesn't list a "next resign node" or "next resign time" for it (also says "secure: no"). still digging to see why it's not getting new sigs20:51
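For reference, the check fungi is describing is something like:

    rndc zonestatus zuulci.org
    # a healthy signed zone reports "secure: yes" along with
    # "next resign node" / "next resign time" entries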
fungioh! /etc/bind/keys/zuul-ci.org is empty20:52
fungithat could be related ;)20:52
openstackgerritMerged openstack-infra/system-config master: Fix glob for lists.katacontainers.io  https://review.openstack.org/62821620:52
fungisame on adns1.openstack.org as well20:53
fungii wonder how we got a signed zone for it to begin with20:53
fungier, meant to say /etc/bind/keys/zuulci.org is empty20:54
*** bobh_ has joined #openstack-infra20:54
*** jamesmcarthur has quit IRC20:54
fungi/etc/bind/keys/zuul-ci.org (with the hyphen) definitely has content20:55
*** smarcet has quit IRC20:58
fungilooks like we're missing a section for the zuulci.org zone in /etc/ansible/hosts/group_vars/adns.yaml on bridge.o.o20:58
*** bobh_ has quit IRC21:00
fungiworking on getting some added now21:02
corvusfungi: since it was registered through csc, it was probably never really signed21:05
corvus(cause i think opendev.org was the first csc domain we dnssec'd)21:05
fungimakes sense21:06
fungistrangely, it has a zone.db.signed file anyway21:06
*** smarcet has joined #openstack-infra21:07
clarkbcorvus: before I forget, want to follow up with the thread about the k8s walkthrough with a time selection? Probably want to do that today so people can make time tomorrow if that is when we are doing it21:07
corvusclarkb: will do now21:07
dmsimardfungi: need to take off and I'll be back later, it looks like limestone is clear: http://grafana.openstack.org/d/WFOSH5Siz/nodepool-limestone?orgId=121:08
fungidmsimard: yep, i'll get it rebooted thoroughly here in a but21:09
fungier, in a bit21:09
*** xek has quit IRC21:11
fungi#status log generated and added dnssec keys for zuulci.org to /etc/ansible/hosts/group_vars/adns.yaml on bridge.o.o21:15
openstackstatusfungi: finished logging21:15
fungihopefully that'll get things rolling21:16
*** smarcet has quit IRC21:19
clarkbfungi: do we need to give the registrar some of that info?21:20
clarkbseems like we had to do that for opendev21:20
clarkbfungi: also lists.kata should be puppeting in the next 10-15 minutes I think21:21
clarkb(I've got a shell there now too)21:21
*** smarcet has joined #openstack-infra21:21
fungiyeah, the last pulse was a no-op but i think it started before the change merged21:22
fungiclarkb: we likely need to provide ds records to csc for zuulci.org once everything is confirmed working21:22
fungibut i'd hold off doing that until we see the serial update21:23
*** wolverineav has quit IRC21:26
*** kgiusti has left #openstack-infra21:27
clarkbfungi: and that just affects the ability to verify the signed zone?21:30
fungiright21:30
clarkbfungi: lists.kata lgtm21:30
fungiclarkb: yeah, other than all the deprecation messages it looks to have been a no-op?21:31
clarkbyup21:31
fungii'll go ahead and approve the next change now21:31
clarkb++21:31
fungiand watch lists.o.o accordingly21:31
*** jcoufal has quit IRC21:31
clarkbfungi: and so I understand the dns thing better, csc wasn't syncing the zone because it wasn't properly signed?21:36
fungihad nothing to do with csc21:36
clarkboh ns1 and ns2 weren't syncing it from adns1?21:36
fungins1/ns2.opendev.org were serving old copies of the zone because that's what adns1.opendev.org was providing them21:36
fungiadns1.opendev.org had a zone.db.signed file (presumably copied from adns1.openstack.org?) corresponding to before the ns record changes in the current zone.db file21:37
fungiand zone.db.signed is what was getting served21:38
clarkbgot it21:38
fungiif it had been updating zone.db.signed on each new zone.db change, that would have worked out fine21:38
*** jamesmcarthur has joined #openstack-infra21:39
fungibut since it had no dnssec keys for the zuulci.org zone, it wasn't able to sign the newer version of the zone so kept serving the old signed one21:39
fungithe fix (i'm hoping!) was to run the dnssec-keygen commands from https://docs.openstack.org/infra/system-config/dns.html#adding-a-zone and then insert their contents into /etc/ansible/hosts/group_vars/adns.yaml on bridge.o.o21:41
fungiif not, at least i'll get to see what else is missing next21:41
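A hedged sketch of that keygen step; the algorithm and key sizes here are assumptions, and the linked system-config documentation is authoritative for the exact flags:

    # generate a zone-signing key and a key-signing key into the zone's key directory
    sudo dnssec-keygen -K /etc/bind/keys/zuulci.org -a RSASHA256 -b 2048 -n ZONE zuulci.org
    sudo dnssec-keygen -K /etc/bind/keys/zuulci.org -a RSASHA256 -b 2048 -n ZONE -f KSK zuulci.org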
fungibut yeah, if zone.db.signed gets updated here in a little while, then we can run the dnssec-dsfromkey command there and provide the output to csc21:42
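That last step would be roughly the following, where the key tag 12345 is a placeholder for whatever dnssec-keygen actually produced:

    # print the DS record(s) to hand to the registrar
    dnssec-dsfromkey /etc/bind/keys/zuulci.org/Kzuulci.org.+008+12345.key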
clarkbimacdonn: digging into that failure, it seems the main tempest run was fine http://logs.openstack.org/93/612393/21/gate/tempest-full-py3/952f7f7/job-output.txt.gz#_2019-01-07_18_28_17_260583 then the slow tests run gets really unhappy for about 35 minutes and the job times out21:43
clarkbfungi: got it21:43
fungiinfra-root: i've powered off mirror02.regionone.limestone now that there are no jobs running in that region. i'll get it booted back up and checked out here in a few21:43
*** e0ne has quit IRC21:44
*** smarcet has quit IRC21:44
imacdonnclarkb: I guess they weren't kidding when they marked them as "slow" :)21:44
clarkbimacdonn: dstat shows the node goes relatively idle after the first tempest run too21:46
clarkbI think that rules out a memory or cpu or disk issue21:46
clarkbimacdonn: http://logs.openstack.org/93/612393/21/gate/tempest-full-py3/952f7f7/controller/logs/screen-n-cpu.txt.gz#_Jan_07_18_33_34_967897 shows us virtual interface creation failed (and it had ~5 minutes to do that)21:50
openstackgerritMerged openstack-infra/system-config master: Fix glob for lists.o.o  https://review.openstack.org/62821721:51
clarkbhttp://logs.openstack.org/93/612393/21/gate/tempest-full-py3/952f7f7/controller/logs/screen-q-agt.txt.gz#_Jan_07_18_28_36_447492 neutron seems to think the device was created properly21:52
*** auristor has quit IRC21:53
fungiwow, neat, mirror02.regionone.limestone is refusing to let me ssh in now, and it's been at least 5 minutes since i issued the server start command. guess i'll check the console21:54
fungilovely, it can't fsck /dev/main/mirror02-cache-apache2 so has dropped to single-user21:56
imacdonnclarkb: hmm... message queue issue ?21:57
logan-fungi: :( lmk if i can be of any assistance on my end21:58
clarkbimacdonn: possibly. Looking at rabbit logs there are some unexpected disconnects but none from the nova-compute pids21:58
clarkbimacdonn: do those events go through nova conductor instead?21:58
*** bnemec has quit IRC21:58
clarkbhrm the disconnects all seem to be uwsgi processes so not conductor either21:59
clarkbimacdonn: probably need nova and/or neutron to dig into that more. Maybe dansmith or slaweq can help22:00
clarkbhttp://status.openstack.org/elastic-recheck/#1808171 may be related as well22:01
*** trown is now known as trown|outtypewww22:01
*** wolverineav has joined #openstack-infra22:01
*** wolverineav has quit IRC22:01
*** wolverineav has joined #openstack-infra22:01
fungilogan-: thanks! just getting pulled in lots of directions so taking me a few minutes to dig up credentials22:02
clarkbimacdonn: ya that logstash signature matches except for the test name. So we may want to broaden that query22:02
*** bnemec has joined #openstack-infra22:02
*** e0ne has joined #openstack-infra22:02
clarkbya lots of hits if I remove the test filter22:03
fungilogan-: aha, `console url show ...` did the trick. hooray for people running *actual* openstack!22:03
clarkbimacdonn: I'll go bug neutron22:03
*** auristor has joined #openstack-infra22:06
imacdonnclarkb: thanks! I'll watch there22:06
fungiyay! i can ssh into mirror02.regionone.limestone now, after manually rerunning fsck with -y via a root shell on the oob console22:08
*** rh-jelabarre has quit IRC22:08
logan-great22:08
clarkbfungi: was it refusing to complete the boot without a fsck?22:08
fungidepends on your definition of "refusing," "complete" and "boot" i guess ;)22:08
fungii happily complained that one of the filesystems in /etc/fstab had errors, and then helpfully dropped to a root shell in single-user mode22:09
fungier, it happily complainec22:09
fungisomething22:10
fungiit's getting to that time of day where my typing is even more atrocious than usual22:10
clarkbah22:11
*** slaweq has quit IRC22:12
clarkbfungi: have we reenabled that region in nodepool yet?22:22
funginot yet22:24
fungii'm in a bunch of conversations, trying to finish checking the mirror out22:24
fungiapache and afs caches look sane, hitting from a browser22:26
*** ianw_pto is now known as ianw22:26
*** boden has quit IRC22:26
fungiand the filesystems are mounted and no errors are being reported22:26
ianwHNY everyone22:26
fungihalf-normal yodelling to you too!22:27
clarkbianw: hello22:28
mordredhello ianw!22:29
*** slaweq has joined #openstack-infra22:30
*** e0ne has quit IRC22:32
openstackgerritMonty Taylor proposed openstack-infra/zuul-jobs master: Update upload-logs to process docs as well  https://review.openstack.org/51185322:32
*** slaweq has quit IRC22:34
fungi#status log nl02 has been removed from the emergency maintenance list now that the filesystems on mirror02.regionone.limestone have been repaired and checked out22:41
openstackstatusfungi: finished logging22:41
*** diablo_rojo has joined #openstack-infra22:43
*** rcernin has joined #openstack-infra22:45
*** tosky has joined #openstack-infra22:52
manjeetsclarkb, hi, adding success-comment in pipeline.yaml didn't make it go to the CI section22:54
clarkbmanjeets: I'm not sure then22:54
fungimanjeets: can you link to an example review where your ci system added a comment?22:55
*** eernst has joined #openstack-infra22:55
manjeetsfungi, https://review.openstack.org/#/c/603501/22:56
manjeetscomments from Intel SriovTaas CI check22:56
openstackgerritMerged openstack-infra/zuul master: Add timer for starting_builds  https://review.openstack.org/62346822:57
*** eernst_ has joined #openstack-infra22:57
*** eernst has quit IRC22:57
*** tmorin has joined #openstack-infra22:57
fungimanjeets: thanks, i think the account display name needs to be adjusted to just "Intel SriovTaas CI" without the "check" on the end of the account name22:57
manjeetsfungi, i'll try that too thanks !!22:58
tmorinfrickler: you can now release the CI node you had frozen earlier today to let me debug, I gathered enough information to explore a path to a solution -- many thanks!22:58
*** eernst_ has quit IRC22:58
*** eernst has joined #openstack-infra22:59
fungimanjeets: you can see the regular expression used to match on the display names for comments at https://git.openstack.org/cgit/openstack-infra/system-config/tree/modules/openstack_project/files/gerrit/hideci.js#n1922:59
fungivar ciRegex = /^(.* CI|Jenkins|Zuul)$/;22:59
fungiso if there's something after the "CI" in the name, that will cause it not to match22:59
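Since hideci.js uses a plain anchored regex, a display name can be sanity-checked against it with grep's extended-regex mode:

    $ echo "Intel SriovTaas CI check" | grep -E '^(.* CI|Jenkins|Zuul)$'   # no output, exit 1
    $ echo "Intel SriovTaas CI" | grep -E '^(.* CI|Jenkins|Zuul)$'
    Intel SriovTaas CI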
fungiclarkb: puppet is running on lists.o.o now23:00
fungiand just finished23:00
tmorin( frickler, or anyone else acting as infra-root, the CI node that can be freed was ubuntu-xenial-inap-mtl01-0001542013 )23:00
manjeetsfungi: I wonder how this is working here https://review.openstack.org/#/c/629041/23:00
fungiclarkb: looks like it was (properly) a no-op?23:00
manjeetsthere's Intel NFV CI check23:00
fungimanjeets: "check" is the pipeline name. if you hit the "toggle ci" button at the bottom of the page you'll see the display name for that account is just "Intel NFV CI" with no "check" after it23:01
*** eernst has quit IRC23:02
fungithe "check" part is taken from the job report string, where it says "Build succeeded (check pipeline)."23:02
manjeetsgot it, thanks fungi! for some reason I read that thinking "check" was part of the name23:02
manjeetsmy bad23:02
manjeetsfungi, cool that worked https://review.openstack.org/#/c/603501/3323:02
clarkbfungi: yup looks like a proper noop. The next services in the list are openstackid. Any idea if we are puppeting those currently? <- smarcet may know too23:02
manjeetsthanks !23:03
imacdonnclarkb: argh, my recheck failed on that thing that looks like an address conflict (ssh timeout)! can't win! http://logs.openstack.org/93/612393/21/check/cinder-tempest-dsvm-lvm-lio-barbican/ebc3a73/23:03
fungiclarkb: we are not (currently) puppeting openstackid.org production, while smarcet works through updating openstackid to newer php on openstackid-dev23:03
fungimanjeets: great! happy you got it worked out23:04
clarkbfungi: should we go ahead and flip openstackid-dev to futureparser now?23:04
clarkbthen we can flip the switch for prod too and it should work if -dev is happy23:04
fungiclarkb: we might want to double-check that it won't complicate what smarcet is doing on openstackid-dev now. also i think he wants to couple this with server rebuilds on xenial (they're still trusty)23:05
clarkbfungi: ok, I'm happy either way. We managed to get through a whole chunk of services onto futureparser and we can rebase the list order if necessary from this point forward23:06
clarkbI expect it's only a small handful of services now23:06
*** tmorin has quit IRC23:06
fungiyeah, might make sense to bump those further up the list or something23:06
fungialso http://grafana.openstack.org/d/WFOSH5Siz/nodepool-limestone says 50 nodes in use again, so ansible/puppet has restored the old max-servers23:07
fungiif tmorin comes back, that was one of at least 3 nodes held with the same comment, so i'm not sure whether he's done with all of them or just that one23:10
*** rascasoft has quit IRC23:10
fungimnaser: are you done with the magnum-kubernetes-conformance troubleshooting for that last pair of nodes we held a week ago?23:10
*** smarcet has joined #openstack-infra23:13
*** diablo_rojo has quit IRC23:14
openstackgerritLuigi Toscano proposed openstack-infra/project-config master: Basic job and queue definitions for sahara-plugin-*  https://review.openstack.org/62906823:14
fungicorvus: clarkb: ansible added the keys under /etc/bind/keys/zuulci.org and bind seemed to be aware of them, but didn't update /var/lib/bind/zones/zuulci.org/zone.db.signed until i issued a `sudo rndc loadkeys zuulci.org`23:19
fungithough it's still serving that same older soa with the 1526407320 serial from may23:21
*** ianychoi has quit IRC23:21
fungiso all it seems to have done is refreshed the signature on the old zone content?23:22
fungii wonder if we should clear out the contents of /var/lib/bind/zones/zuulci.org and let the signatures get recreated fresh23:23
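A rough, untested sketch of that reset, assuming bind's inline signing regenerates the file once the stale copy is gone:

    sudo rm /var/lib/bind/zones/zuulci.org/zone.db.signed*   # drop the stale signed zone (and its journal)
    sudo rndc loadkeys zuulci.org                            # make sure the new keys are loaded
    sudo rndc sign zuulci.org                                # ask named to re-sign from the current zone.db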
fungiis anybody else having trouble pulling up https://etherpad.openstack.org/ right now?23:24
openstackgerritMerged openstack-infra/openstackid-resources master: Migration to PHP 7.x  https://review.openstack.org/61622623:24
fungii can't ssh to it at all23:24
fungivia ipv6 or ipv423:25
fungii wonder if the host just crashed out from under it23:25
clarkbssh not working via ipv4 from here23:25
clarkbit does ping, console might say something interesting?23:25
fungijumping into rackspace dashboard, yeah23:26
fungi"Loading Console ..."23:28
fungiis all it gives me23:28
fungiexpecting an e-mail from fanatical support to infra-root@o.o in 3... 2... 1...23:28
fungicacti says load average and iowait went through the roof just before it went dead for us23:35
fungii wanted to check whether this was a good opportunity to rebuild it on xenial while it's offline anyway, so i went to pull up the etherpad where we had the list of remaining servers to upgrade... :/23:39
fungianyway, etherpad-dev seems to already be on xenial so i suspect etherpad.o.o is as well23:39
clarkbyup23:40
clarkbafs* kdc* groups* health status lists.* openstackid* ask graphite pbx refstack static and wiki-dev are the remaining servers23:41
clarkbwe also need to rm puppetmaster at some point (it's still trusty but is replaced with bridge which is bionic)23:41
clarkbfungi: ^ that is from my cached copy of the etherpad23:41
fungicacti is still reporting values for snmp polls in the past few minutes, so i was about to say maybe the host is up...23:42
fungi"This message is to inform you that the host your cloud server 'etherpad01.openstack.org' resides on alerted our monitoring systems at 23:41 UTC. We are currently investigating the issue and will update you as soon as we have additional information regarding what is causing the alert.  Please do not access or modify 'etherpad01.openstack.org' during this process.  Please reference this incident ID if23:43
fungiyou need to contact support: CSHD-9wZ2KeoQVvD"23:43
fungi#status notice The Etherpad service at https://etherpad.openstack.org/ has been offline since 23:22 UTC due to a hypervisor issue in our service provider, but should hopefully return to service shortly.23:47
openstackstatusfungi: sending notice23:47
*** tosky has quit IRC23:48
-openstackstatus- NOTICE: The Etherpad service at https://etherpad.openstack.org/ has been offline since 23:22 UTC due to a hypervisor issue in our service provider, but should hopefully return to service shortly.23:49
corvusmordred: regardless of whether we think it's ready for gitea; i bet we could do an HA etherpad/percona in k8s.23:50
openstackstatusfungi: finished sending notice23:51
fungithat would be neat23:51
clarkbcorvus: the one gotcha there is that only one nodejs process can serve all clients for a single pad23:51
clarkbcorvus: so you have to have some fairly intelligent load balancing happening23:52
mordredcorvus: ++23:52
fungihigh-availability doesn't necessarily imply active/active23:52
clarkbfungi: ya active/standby would be the simplest way to do it probably23:52
fungiwe could get away with active/standby probably (with some data loss at failover)23:52
mordredfungi: yah. just being able to start a new process quickly on a different backend node as things go south would be a nice win23:52
corvusyeah, i guess we could have just one etherpad pod which gets auto-rescheduled, or we could probably have a stateful LB.23:52
mordredyah. could do both23:53
corvusso an active-active percona system with a single re-schedulable etherpad server.  piece of cake.23:53
fungilooks like the server is back up and responding to ssh again23:53
mordredyup23:53
fungi23:53:51 up 1 min,  1 user,  load average: 0.18, 0.06, 0.0123:53
corvus(and by piece of cake, i mean "mordred did all that percona work already" :)23:54
fungiheh23:54
fungiwhen you say "percona" you're referring to "Percona Server for MySQL"?23:55
*** dave-mccowan has quit IRC23:56
clarkbI'm guessing galera23:56
fungia la https://github.com/percona/percona-server23:56
*** jamesmcarthur has quit IRC23:56
clarkbwhich does active/active/active mysql23:56
mordredpercona xtradb cluster23:59
mordredhttps://review.openstack.org/#/c/626054/23:59
