Tuesday, 2019-09-10

sean-k-mooneywhat is the smallest job that actually acquires a node? does the noop job do that?00:00
pabelangerthe noop job doesn't use any nodes, you likely want something tox related00:00
pabelangeror create a hello world job00:01
sean-k-mooneyif not i might create a simple hello world job that just00:01
sean-k-mooneyya that is what i was thinking00:01
clarkbre ironic if they come asking for access to the node I held it is | 0011007873 | rax-ord              | ubuntu-bionic        | 0326030d-851e-41bd-86df-d5acb9191f7b | 23.253.173.73   | 2001:4801:7825:104:be76:4eff:fe10:4061 | hold     | 00:05:29:02  | unlocked |00:01
sean-k-mooneyi can disable all jobs except the canary job and run check experimental on it a few times in a row00:01
fungiclarkb: that could also warrant a brief e-mail to openstack-discuss00:01
sean-k-mooneythat will skip the queue more or less00:02
clarkbfungi: good point (I did file a bug but email will hopefully get eyeballs on it)00:02
sean-k-mooneyalthough it's 1 am so maybe tomorrow00:02
clarkbsean-k-mooney: you should sleep :)00:02
sean-k-mooneyyes that is what my brain is currently telling me00:03
sean-k-mooneythanks for your help o/00:03
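A minimal node-acquiring "hello world" job like the one discussed above could be sketched in Zuul's job syntax roughly as follows (the job name, playbook path, and node label here are illustrative, not an existing job):

```yaml
# Hypothetical minimal canary job: unlike the node-less noop job, the
# nodeset below forces nodepool to actually boot an instance for the build.
- job:
    name: hello-world-canary
    description: Boot a node and do nothing, to exercise node launches.
    run: playbooks/hello-world.yaml
    nodeset:
      nodes:
        - name: primary
          label: ubuntu-bionic
```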
*** ociuhandu has joined #openstack-infra00:10
*** factor has joined #openstack-infra00:12
clarkbjohnsom: and I haven't forgotten about your dns problems. 680340 is close to being able to address the next debugging steps for that I think as will my cleanup-run playbook00:13
johnsomclarkb Cool, thanks.00:13
*** aaronsheffield has quit IRC00:14
*** ociuhandu has quit IRC00:15
ianwclarkb: i just had a query on executor jobs with the cleanup stuff00:17
*** goldyfruit_ has joined #openstack-infra00:20
rm_workhey, we just noticed hacking checks stopped working after the upgrade from `hacking!=0.13.0,<0.14,>=0.12.0` to 1.1.000:22
rm_workanyone aware of any changes necessary to get them to run on the new version? did we miss a migration?00:22
*** mattw4 has quit IRC00:23
rm_worklooks like between 1.0.0 and 1.1.000:28
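The version jump rm_work describes can be checked mechanically. A small sketch (pure stdlib, assuming simple dotted-integer version strings rather than full PEP 440 handling) showing that 1.1.0 falls outside the old `hacking!=0.13.0,<0.14,>=0.12.0` pin:

```python
# Sketch: evaluate the old hacking specifier against candidate versions.
# Assumes dotted-integer versions; a real resolver uses PEP 440 rules.
def parse(version):
    return tuple(int(part) for part in version.split("."))

def satisfies_old_pin(version):
    v = parse(version)
    return parse("0.12.0") <= v < parse("0.14") and v != parse("0.13.0")

print(satisfies_old_pin("0.12.1"))  # True: inside the old range
print(satisfies_old_pin("0.13.0"))  # False: explicitly excluded
print(satisfies_old_pin("1.1.0"))   # False: the new release is outside it
```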
donnydso i restarted the l3-agent. I am looking in the logs now to see if there is an issue00:29
donnydit wasn't happy when i brought it back online.. so I am thinking maybe a service didn't start correctly or something00:30
logan-re-enabling limestone host aggregate, will monitor for any issues as we start accepting jobs again00:31
donnydsean-k-mooney: I use bgp for routing, but that subnet already has an entry. I route the whole /64 for that tenant at my edge... so it may as well be static00:33
donnydevery now and again the l3-agent on my side sticks... I don't have a better way to explain it other than that..00:35
donnydso maybe give it another go when you get back on tomorrow00:35
logan-sean-k-mooney: catching up on backscroll.. i'd be happy to support numa labels on limestone. also happy to discuss our experience keeping nested virt operational.00:37
donnydAlso I would just like to point out we didn't really seem to have this issue when we had a separate pool for the custom stuff. Maybe that is also worth a go00:39
* donnyd goes to get some food and then sleep. be back in the am00:39
*** michael-beaver has quit IRC00:40
*** prometheanfire has quit IRC00:42
*** gyee has quit IRC00:42
*** prometheanfire has joined #openstack-infra00:43
*** markvoelker has joined #openstack-infra00:46
*** markvoelker has quit IRC00:48
*** markvoelker has joined #openstack-infra00:49
rm_workah figured out the hacking-checks issue00:52
*** diablo_rojo has quit IRC00:57
*** markvoelker has quit IRC00:59
*** markvoelker has joined #openstack-infra00:59
*** markvoelker has quit IRC01:04
*** nicolasbock has quit IRC01:04
*** nicolasbock has joined #openstack-infra01:04
*** markvoelker has joined #openstack-infra01:07
*** yamamoto has joined #openstack-infra01:15
*** markvoelker has quit IRC01:17
*** markvoelker has joined #openstack-infra01:18
*** bobh has joined #openstack-infra01:19
*** markvoelker has quit IRC01:23
*** hongbin has joined #openstack-infra01:35
*** rfolco has quit IRC01:41
*** markvoelker has joined #openstack-infra01:48
*** jamesmcarthur has joined #openstack-infra01:48
*** nicolasbock has quit IRC02:01
*** apetrich has quit IRC02:10
*** spsurya has joined #openstack-infra02:18
*** yamamoto has quit IRC02:19
*** markvoelker has quit IRC02:20
*** markvoelker has joined #openstack-infra02:22
*** bobh has quit IRC02:23
*** icarusfactor has joined #openstack-infra02:25
*** ykarel|away has joined #openstack-infra02:25
*** factor has quit IRC02:27
*** jamesmcarthur has quit IRC02:30
*** larainema has joined #openstack-infra02:33
*** roman_g has quit IRC02:34
*** yamamoto has joined #openstack-infra03:03
*** hamzy_ has quit IRC03:06
*** rlandy|bbl has quit IRC03:12
*** igordc has quit IRC03:31
*** rh-jelabarre has quit IRC03:34
*** dave-mccowan has quit IRC03:35
*** armax has quit IRC03:41
*** exsdev0 has joined #openstack-infra03:43
*** exsdev has quit IRC03:44
*** exsdev0 is now known as exsdev03:44
*** xarses has quit IRC03:44
clarkbfwiw no new fn network errors at launch for the last several hours03:45
clarkbI've rechecked sean-k-mooney's change so hopefully there are results for sean-k-mooney when back at it03:45
*** xarses has joined #openstack-infra03:45
*** hongbin has quit IRC03:46
clarkbhrm the first attempt at booting that numa label failed :/ maybe the quietness was a fluke03:49
*** ykarel|away has quit IRC03:53
*** ramishra has joined #openstack-infra03:54
clarkbianw: responded to your question. And now I call it a night03:56
AJaegerconfig-core, https://review.opendev.org/678356 and https://review.opendev.org/678357 are ready for review, dependencies merged - those remove now unused publish jobs and update promote jobs. Please review04:06
*** udesale has joined #openstack-infra04:07
*** ykarel|away has joined #openstack-infra04:08
*** ykarel|away is now known as ykarel04:09
openstackgerritMerged opendev/base-jobs master: Add cleanup phase to base(-test)  https://review.opendev.org/68110004:12
*** snapiri has joined #openstack-infra04:19
*** ociuhandu has joined #openstack-infra04:30
*** ociuhandu has quit IRC04:34
*** jtomasek has joined #openstack-infra04:38
*** igordc has joined #openstack-infra04:44
*** markvoelker has quit IRC04:48
*** ricolin has joined #openstack-infra05:00
*** kjackal has joined #openstack-infra05:17
*** markvoelker has joined #openstack-infra05:26
openstackgerritMerged zuul/zuul master: Web: rely on new attributes when determining task failure  https://review.opendev.org/68049805:28
*** jtomasek has quit IRC05:29
*** markvoelker has quit IRC05:30
*** diga has joined #openstack-infra05:33
*** raukadah is now known as chandankumar05:34
*** ralonsoh has joined #openstack-infra05:38
*** pots has quit IRC05:42
*** soniya29 has joined #openstack-infra05:57
AJaegerianw: what's your timeframe for https://etherpad.openstack.org/p/static-services ? I can write the job updates for publishing/promote (just added myself to etherpad for that)06:00
openstackgerritIan Wienand proposed zuul/zuul master: [wip] Test and expand documentation for executor-only jobs  https://review.opendev.org/67918406:00
ianwAJaeger: ummm ... when it gets done? :)  the hard bit will be moving the publishing to afs i guess?06:00
AJaegerThat should be easy, I can propose all the AFS publishing jobs...06:01
AJaegerJust tell me when - and whether you want them as single changes or one large one...06:01
AJaeger(easish for me - so, I volunteer)06:01
ianwAJaeger: well i mean any time you like!  the ideal would be to get to the point that static.o.o just isn't doing anything06:02
ianwthe redirects shouldn't be too hard.  we didn't really come to a conclusion on whether we should spin up a new service or move them over to files.openstack.org's apache ... we can probably push for more of an answer in the meeting if that's becoming the holdup06:03
AJaegerlet me work the next days on the jobs and then we can discuss how to split them up.06:03
AJaegerI only volunteer for any publish/promote jobs - not the redirects ;)06:03
ianwthat's fine :)  i mean i have changes out that handle all the redirects via haproxy, if we want to do that06:04
AJaegerok06:04
*** dtantsur|afk is now known as dtantsur06:15
dtantsurclarkb: I don't think we're creating anything on test nodes (only in VMs inside test nodes)06:16
*** dklyle has quit IRC06:20
*** dklyle has joined #openstack-infra06:21
ianwmirror.opensuse06:24
ianwRelease failed: VOLSER: Problems encountered in doing the dump !06:25
ianwhrm ...06:25
*** kopecmartin|off is now known as kopecmartin06:26
*** gfidente has joined #openstack-infra06:27
ianwTue Sep 10 06:18:38 2019 1 Volser: ReadVnodes: IH_CREATE: File exists - restore aborted06:28
ianwTue Sep 10 06:18:38 2019 Scheduling salvage for volume 536871010 on part /vicepa over FSSYNC06:28
ianwi think that must have been it06:28
ianwStarting ForwardMulti from 536871010 to 536871010 on afs02.dfw.openstack.org (full release).06:28
ianwit was06:28
fricklerinfra-root: the kernel panics on bionic could be this one https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1842447 , hit us in several production nodes. kernel -62 should be fine06:31
openstackLaunchpad bug 1842447 in linux (Ubuntu) "Kernel Panic with linux-image-4.15.0-60-generic when specifying nameserver in docker-compose" [Undecided,Confirmed]06:31
AJaegerianw, could you review https://review.opendev.org/#/c/678356/ and https://review.opendev.org/#/c/678357/6, please?06:31
ianwthis looks like the same volume corruption we saw and i posted about @ https://lists.openafs.org/pipermail/openafs-devel/2018-May/020491.html06:36
ianwwe will have to try a similar salvage operation.  i won't do anything until the current batch of "vos unlock / vos release" operations complete (running in root screen on afs01.dfw)06:37
ianwso far mirror.opensuse is the only volume to have issues releasing, which is good at least06:37
ianwit still has06:38
ianwmirror.ubuntu-ports06:38
ianwmirror.ubuntu06:38
ianwmirror.yum-puppetlabs06:38
ianwto go06:38
*** slaweq has joined #openstack-infra06:41
dirkroman_g: AJaeger: fungi: regarding the opensuse 15.0 mirror issues, I see it is current on files.openstack.org - so where is the issue? also, please switch your jobs away from the opensuse-150 nodeset to opensuse-15 (which is 15.1 right now) to reduce maintenance burden going forward06:42
openstackgerritMerged openstack/project-config master: Remove now unused publish jobs  https://review.opendev.org/67835606:42
*** pgaxatte has joined #openstack-infra06:44
*** igordc has quit IRC07:00
*** tesseract has joined #openstack-infra07:05
*** rcernin has quit IRC07:09
*** ociuhandu has joined #openstack-infra07:14
*** kjackal has quit IRC07:17
iceyanybody else seeing issues with ipv6 + review.opendev.org?07:17
*** kjackal has joined #openstack-infra07:18
*** threestrands has quit IRC07:20
*** ociuhandu has quit IRC07:21
*** yamamoto has quit IRC07:22
*** apetrich has joined #openstack-infra07:24
*** xenos76 has joined #openstack-infra07:26
*** rpittau|afk is now known as rpittau07:28
*** yamamoto has joined #openstack-infra07:30
*** jpena|off is now known as jpena07:33
*** yamamoto has quit IRC07:34
*** apetrich has quit IRC07:39
*** ykarel is now known as ykarel|lunch07:39
*** apetrich has joined #openstack-infra07:40
ianwdirk: per some of my prior messages; all mirroring is currently paused while we try to recover all the volumes to full replication status, and it looks like the opensuse volume will need a little help to get recovered07:41
ianwicey: i connect via ipv6 and not seeing any issues07:41
iceyianw: weird, I can `ping6 google.com`, but not review.opendev.org07:42
ianw64 bytes from review01.openstack.org (2001:4800:7819:103:be76:4eff:fe04:9229): icmp_seq=1 ttl=48 time=280 ms07:42
*** xenos76 has quit IRC07:42
iceyI suspect it's something wonky on my side, which is all I need to know (/me keeps digging!)07:43
*** pkopec has joined #openstack-infra07:43
*** xenos76 has joined #openstack-infra07:44
*** apetrich has quit IRC07:45
*** happyhemant has joined #openstack-infra07:46
*** trident has quit IRC07:50
*** pgaxatte has quit IRC07:50
*** apetrich has joined #openstack-infra07:52
*** lpetrut has joined #openstack-infra07:55
*** pgaxatte has joined #openstack-infra07:57
*** trident has joined #openstack-infra08:01
*** dchen has quit IRC08:04
*** e0ne has joined #openstack-infra08:06
*** jtomasek has joined #openstack-infra08:06
*** priteau has joined #openstack-infra08:06
* AJaeger will be offline for a bit...08:17
*** AJaeger has left #openstack-infra08:17
*** AJaeger has quit IRC08:17
*** ociuhandu has joined #openstack-infra08:20
*** tkajinam has quit IRC08:22
*** panda|rover has quit IRC08:23
*** panda has joined #openstack-infra08:24
*** ociuhandu has quit IRC08:25
*** ociuhandu has joined #openstack-infra08:26
*** roman_g has joined #openstack-infra08:29
*** ociuhandu has quit IRC08:30
*** ociuhandu has joined #openstack-infra08:30
openstackgerritMerged openstack/project-config master: Add allowed-projects to static publish jobs  https://review.opendev.org/67835708:32
*** derekh has joined #openstack-infra08:33
*** soniya29 has quit IRC08:34
*** sshnaidm|afk is now known as sshnaidm|ruck08:37
*** ociuhandu has quit IRC08:40
*** ociuhandu has joined #openstack-infra08:41
*** ociuhandu has quit IRC08:44
openstackgerritBogdan Dobrelya (bogdando) proposed zuul/zuul-jobs master: Fix delegating nodepool_ip and switch_ip facts  https://review.opendev.org/68118208:49
*** yamamoto has joined #openstack-infra08:56
*** ykarel|lunch is now known as ykarel|09:01
*** ykarel| is now known as ykarel09:01
*** jaosorior has joined #openstack-infra09:11
*** rfolco has joined #openstack-infra09:14
*** yamamoto has quit IRC09:17
*** soniya29 has joined #openstack-infra09:22
*** kaiokmo has joined #openstack-infra09:24
*** rcernin has joined #openstack-infra09:38
openstackgerritBogdan Dobrelya (bogdando) proposed zuul/zuul-jobs master: Fix delegating nodepool_ip and switch_ip facts  https://review.opendev.org/68118209:50
*** happyhemant has quit IRC09:56
openstackgerritFabien Boucher proposed zuul/zuul master: Pagure - handle Pull Request tags (labels) metadata  https://review.opendev.org/68105009:59
*** dklyle has quit IRC10:03
*** dklyle has joined #openstack-infra10:04
openstackgerritFabien Boucher proposed zuul/zuul master: Pagure - handle initial comment change event  https://review.opendev.org/68031010:06
*** udesale has quit IRC10:09
*** apetrich has quit IRC10:10
*** udesale has joined #openstack-infra10:10
openstackgerritFabien Boucher proposed zuul/zuul master: Pagure - handle Pull Request tags (labels) metadata  https://review.opendev.org/68105010:12
*** markvoelker has joined #openstack-infra10:16
*** xenos76 has quit IRC10:16
*** markvoelker has quit IRC10:21
*** xenos76 has joined #openstack-infra10:26
*** AJaeger has joined #openstack-infra10:43
jrosseri think this is wrong using version_compare as a filter when it is a test https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/configure-mirrors/templates/etc/yum.repos.d/fedora-updates.repo.j2#L510:46
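jrosser's point is that `version_compare` (now `version`) is a Jinja2 *test*, so it belongs after `is`, not after a `|` pipe as a filter. A hedged illustration (the linked template is the real fix site; the task below is made up):

```yaml
# Wrong: invokes the test as if it were a filter
#   {{ ansible_distribution_version | version_compare('29', '>=') }}
# Right: invoke it as a test with `is`
- name: Record whether this Fedora is new enough (illustrative task)
  set_fact:
    updates_enabled: "{{ ansible_distribution_version is version('29', '>=') }}"
```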
*** markvoelker has joined #openstack-infra10:55
*** jpena is now known as jpena|lunch10:59
*** ociuhandu has joined #openstack-infra11:00
*** markvoelker has quit IRC11:00
*** ociuhandu has quit IRC11:01
*** nicolasbock has joined #openstack-infra11:06
*** ociuhandu has joined #openstack-infra11:07
*** rcosnita has joined #openstack-infra11:11
*** rcosnita has quit IRC11:11
*** udesale has quit IRC11:14
*** larainema has quit IRC11:15
*** pgaxatte has quit IRC11:19
*** rh-jelabarre has joined #openstack-infra11:29
*** rh-jelabarre has quit IRC11:30
*** rh-jelabarre has joined #openstack-infra11:30
*** priteau has quit IRC11:38
*** prometheanfire has quit IRC11:41
*** prometheanfire has joined #openstack-infra11:41
zbrianw: can you help me with https://review.opendev.org/#/c/680962/ ?11:48
*** ykarel is now known as ykarel|afk11:50
*** pgaxatte has joined #openstack-infra11:50
*** jamesmcarthur has joined #openstack-infra11:54
zbrsince we switched to the new dashboard view for logs the custom success/failure-urls are no longer working, is that a known bug?11:59
AJaegerzbr: can you give an example, please?12:04
fricklerzbr: that is being called a feature12:04
*** markvoelker has joined #openstack-infra12:04
fricklerinfra-root: devstack fedora-"latest", i.e. F28, jobs are failing with this mirror issue, is there anything we can do? ianw was working on moving to F29 or F30, but that's not ready yet IIUC https://zuul.opendev.org/t/openstack/build/8dd1004470124f9ba95a677c7d95995e/log/job-output.txt#232112:05
*** jamesmcarthur has quit IRC12:08
sean-k-mooneylogan-: thanks for the offer of trying to provide a multi numa flavor with nested virt via limestone. it would be very helpful if you could. in general having several sources of the label would likely help mitigate any network issues we hit or individual errors in a single provider.12:09
sean-k-mooneyfrickler: just so you are aware a future qemu is going to break oslo utils12:09
*** jpena|lunch is now known as jpena12:10
openstackgerritBogdan Dobrelya (bogdando) proposed zuul/zuul-jobs master: Fix evaluating nodepool_ip and switch_ip facts  https://review.opendev.org/68118212:10
sean-k-mooneyi had a job that was trying to use the virt-preview repo with fedora 28 and i found that the output of qemu image info or something like that in the future version has changed12:10
sean-k-mooneyso going to fedora 30 might hit the same issue12:10
sean-k-mooneyit's a relatively easy fix but i have not done it yet12:11
*** apetrich has joined #openstack-infra12:12
*** njohnston has joined #openstack-infra12:13
*** jamesmcarthur has joined #openstack-infra12:13
AJaegerfrickler: FYI, devstack is the only remaining f28 user - everybody else got moved to F29 already12:16
*** goldyfruit_ has quit IRC12:18
openstackgerritBogdan Dobrelya (bogdando) proposed zuul/zuul-jobs master: Fix evaluating nodepool_ip and switch_ip facts  https://review.opendev.org/68118212:19
*** rlandy has joined #openstack-infra12:23
*** jamesmcarthur has quit IRC12:24
*** gfidente has quit IRC12:24
*** Lucas_Gray has joined #openstack-infra12:25
*** jamesmcarthur has joined #openstack-infra12:26
openstackgerritAndy Ladjadj proposed zuul/zuul master: Fix: prevent usage of hashi_vault  https://review.opendev.org/68104112:28
*** jamesmcarthur has quit IRC12:29
*** Lucas_Gray has quit IRC12:32
*** Wryhder has joined #openstack-infra12:32
*** Wryhder is now known as Lucas_Gray12:33
*** apetrich has quit IRC12:33
zbrAJaeger: frickler yes, here is the example: click the openstack-tox-molecule job on https://review.opendev.org/#/c/669223/612:34
*** soniya29 has quit IRC12:34
zbrthis job was supposed to load the reports.html file from tox, based on https://opendev.org/openstack/openstack-zuul-jobs/src/branch/master/zuul.d/jobs.yaml#L212-L21312:34
zbrit was doing this, but not anymore12:35
zbri think it stopped doing this when we switched to the new way of browsing the logs12:35
*** pgaxatte has quit IRC12:35
zbrthe report.html is still there, but the user needs to dig to find it.12:36
AJaegerzbr: yes, indeed. To make the reports easily available, you need to follow the changes we did for docs12:37
*** derekh has quit IRC12:43
*** pgaxatte has joined #openstack-infra12:43
*** derekh has joined #openstack-infra12:43
zbrAJaeger: I am not sure what you did for docs, tried to find it but failed. Clearly the output file is collected, what is missing is the change to the URLs.12:47
zbrif it is no longer possible to change the success-url that is ok, as long as I can make the report visible on the main log page, maybe like the "console" tab?12:48
openstackgerritBogdan Dobrelya (bogdando) proposed zuul/zuul-jobs master: Fix evaluating nodepool_ip and switch_ip facts  https://review.opendev.org/68118212:48
zbrclearly that file is one of the most important outcomes (artifacts) of that build, like the html on docs too. and we need to make it easily accessible.12:49
AJaegerzbr: https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/fetch-sphinx-tarball/tasks/html.yaml#L31 shows what was done - zuul_return is the magic.12:52
openstackgerritAndy Ladjadj proposed zuul/zuul master: Fix: prevent usage of hashi_vault  https://review.opendev.org/68104112:54
zbrAJaeger: thanks, this is clearly what i was looking for. One more bit: do I still need to get the file again here, even if I know it is collected correctly?12:55
*** mriedem has joined #openstack-infra12:55
AJaegerzbr: not sure, better ask corvus later12:55
*** AJaeger has quit IRC12:55
zbrAJaeger: i will try anyway, i will learn by doing it. thanks again, really helpful.12:55
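The zuul_return pattern AJaeger points at could look like the following for the molecule report (the task name is illustrative, and `reports.html` is the file zbr mentions; the real fetch-sphinx-tarball role linked above returns a docs artifact the same way):

```yaml
# Hypothetical post-run task: register the report as a build artifact so it
# is linked from the Zuul build page, replacing the old success-url.
- name: Return molecule report artifact to Zuul
  zuul_return:
    data:
      zuul:
        artifacts:
          - name: "Molecule report"
            url: "reports.html"
```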
*** aaronsheffield has joined #openstack-infra12:56
*** Goneri has joined #openstack-infra12:57
*** udesale has joined #openstack-infra12:57
*** hamzy_ has joined #openstack-infra12:59
*** kaiokmo has quit IRC13:02
*** gfidente has joined #openstack-infra13:03
*** kaiokmo has joined #openstack-infra13:05
*** ykarel|afk is now known as ykarel13:10
*** eharney has joined #openstack-infra13:12
openstackgerritSorin Sbarnea proposed openstack/openstack-zuul-jobs master: openstack-tox-molecule: replace success-url and failure-url  https://review.opendev.org/68125113:13
openstackgerritBogdan Dobrelya (bogdando) proposed zuul/zuul-jobs master: Fix evaluating nodepool_ip and switch_ip facts  https://review.opendev.org/68118213:16
*** sthussey has joined #openstack-infra13:18
openstackgerritFabien Boucher proposed zuul/zuul master: Pagure - reference pipelines add open: True requirement  https://review.opendev.org/68125213:22
*** goldyfruit_ has joined #openstack-infra13:23
openstackgerritMatt Riedemann proposed opendev/elastic-recheck master: Add query for nova functional test race bug 1843433  https://review.opendev.org/68125613:28
openstackbug 1843433 in OpenStack Compute (nova) "functional test test_migrate_server_with_qos_port fails intermittently due to race condition" [Medium,Confirmed] https://launchpad.net/bugs/1843433 - Assigned to Balazs Gibizer (balazs-gibizer)13:28
fungiamorin: looks like our ovh account is still working since you fixed it on sunday... do you think we're safe to assume the bot isn't going to automatically turn that account off again at this point?13:37
amorinfungi: yes13:38
amorinwe are safe now13:38
fungithanks! we've been waiting to turn swift log storage back on there since it's not as robust against such things as nodepool13:38
fungii'll approve change 680855 now in that case13:38
*** eharney has quit IRC13:42
*** rcernin has quit IRC13:45
*** AJaeger has joined #openstack-infra13:46
openstackgerritCorey Bryant proposed openstack/project-config master: Retire charm-neutron-api-genericswitch  https://review.opendev.org/68125913:47
openstackgerritMerged opendev/elastic-recheck master: Add query for nova functional test race bug 1843433  https://review.opendev.org/68125613:48
openstackbug 1843433 in OpenStack Compute (nova) "functional test test_migrate_server_with_qos_port fails intermittently due to race condition" [Medium,In progress] https://launchpad.net/bugs/1843433 - Assigned to Balazs Gibizer (balazs-gibizer)13:48
openstackgerritMerged opendev/base-jobs master: Revert "Stop storing logs on OVH"  https://review.opendev.org/68085513:51
*** ramishra has quit IRC13:51
*** ramishra has joined #openstack-infra13:51
*** eharney has joined #openstack-infra13:55
zbrwhat are the supported metadata types on zuul artifacts? I tried to find some docs but apparently they are just a free form text field.13:56
*** ociuhandu has quit IRC13:57
openstackgerritCorey Bryant proposed openstack/project-config master: End project gating for charm-neutron-api-genericswitch  https://review.opendev.org/68125913:57
*** ociuhandu has joined #openstack-infra13:58
*** jtomasek has quit IRC14:00
openstackgerritSorin Sbarnea proposed openstack/openstack-zuul-jobs master: openstack-tox-molecule: replace success-url and failure-url  https://review.opendev.org/68125114:01
*** jtomasek has joined #openstack-infra14:01
fungizbr: i don't know, but i wouldn't be surprised if that's left up to the service being used to store and serve artifacts, so in our case, swift14:02
*** tkajinam has joined #openstack-infra14:02
openstackgerritCorey Bryant proposed openstack/project-config master: End project gating for charm-neutron-api-genericswitch  https://review.opendev.org/68125914:03
*** ociuhandu has quit IRC14:03
corvuszbr: yes, free form14:05
zbrcorvus: optional?14:05
corvuszbr: yes; not currently used by zuul.  actually the entire metadata dict is optional and free-form.14:08
zbrcorvus: thanks.14:08
corvuszbr: the promote jobs use those fields to find the right kind of artifact to promote14:08
fungiahh, so we don't pass the mime type info along in the swift upload to instruct it what mime type to claim when serving the files?14:09
corvusfungi: we do, but not manually, that's handled automatically by the uploads-logs-swift role14:09
fungioh, got it, so we use a magic number lib of some sort to infer mime types rather than whatever the job has claimed?14:10
openstackgerritCorey Bryant proposed openstack/project-config master: End project gating for charm-neutron-api-genericswitch  https://review.opendev.org/68125914:10
corvusfungi: yep.  there isn't actually any direct link between an artifact and a logfile anyway; usually it points to the logserver, but you can (and we do) link to other urls such as zuul-preview urls14:11
fungithat makes sense. thanks!14:12
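The filename-based inference corvus describes can be seen with the stdlib `mimetypes` module (whether upload-logs-swift uses exactly this module is an assumption here; the point is guessing Content-Type from the name rather than trusting what the job claims):

```python
import mimetypes

# Guess a Content-Type from the file name alone, the way a log uploader
# might before handing objects to swift.
for name in ("job-output.txt", "reports.html", "logs.tar.gz"):
    guessed, encoding = mimetypes.guess_type(name)
    print(name, guessed, encoding)  # e.g. reports.html -> text/html
```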
openstackgerritCorey Bryant proposed openstack/project-config master: Remove charm-neutron-api-genericswitch from infra  https://review.opendev.org/68127014:15
*** Lucas_Gray has quit IRC14:24
*** Lucas_Gray has joined #openstack-infra14:25
*** dtantsur is now known as dtantsur|afk14:25
donnydclarkb: should we re-enable swift logs for FN... the propane guy didn't even show up yesterday14:28
openstackgerritBogdan Dobrelya (bogdando) proposed zuul/zuul master: Normalize/alias nodepool inventory IPs in executor  https://review.opendev.org/68127314:28
clarkbdonnyd: sure, I think we are going to reenable ovh too based on emails so probably want to stack on AJaeger's change for that if it hasn't merged yet14:29
donnydyea that makes sense to me14:29
*** Lucas_Gray has quit IRC14:30
*** armax has joined #openstack-infra14:31
*** ociuhandu has joined #openstack-infra14:32
*** ociuhandu has quit IRC14:33
fungiclarkb: i replied to that e-mail and we also discussed it in here at 13:37z, so the change merged to reenable logs in ovh at 13:51z14:33
*** ociuhandu has joined #openstack-infra14:33
clarkbthanks!14:34
donnydclarkb: you have the review handy for that?14:34
openstackgerritCorey Bryant proposed openstack/project-config master: End project gating for charm-neutron-api-genericswitch  https://review.opendev.org/68125914:34
clarkbdonnyd: sounds like it merged so just base your change on current master14:34
donnydok14:34
openstackgerritCorey Bryant proposed openstack/project-config master: End project gating for charm-neutron-api-genericswitch  https://review.opendev.org/68125914:35
*** ianychoi_ is now known as ianychoi14:37
*** Lucas_Gray has joined #openstack-infra14:37
openstackgerritCorey Bryant proposed openstack/project-config master: End project gating for charm-neutron-api-genericswitch  https://review.opendev.org/68125914:37
openstackgerritBogdan Dobrelya (bogdando) proposed zuul/zuul master: Normalize/alias nodepool inventory IPs in executor  https://review.opendev.org/68127314:37
*** ociuhandu has quit IRC14:37
openstackgerritDonny Davis proposed opendev/base-jobs master: Re-enable FN swift logs - electrical work complete  https://review.opendev.org/68127514:38
openstackgerritJeremy Stanley proposed openstack/project-config master: Report openstack/election repo changes in IRC  https://review.opendev.org/68127614:38
*** ociuhandu has joined #openstack-infra14:39
openstackgerritTristan Cacqueray proposed zuul/zuul master: spec: add a zuul-runner cli  https://review.opendev.org/68127714:40
*** ykarel is now known as ykarel|afk14:42
openstackgerritJames E. Blair proposed zuul/zuul master: WIP: Add support for the Gerrit checks plugin  https://review.opendev.org/68077814:42
openstackgerritJames E. Blair proposed zuul/zuul master: WIP: Add no-jobs reporter action  https://review.opendev.org/68127814:42
openstackgerritCorey Bryant proposed openstack/project-config master: End project gating for charm-neutron-api-genericswitch  https://review.opendev.org/68125914:43
*** e0ne has quit IRC14:44
*** e0ne has joined #openstack-infra14:44
*** jamesmcarthur has joined #openstack-infra14:47
openstackgerritFabien Boucher proposed zuul/zuul master: Pagure - handles pull-request.closed event  https://review.opendev.org/68127914:49
openstackgerritBogdan Dobrelya (bogdando) proposed zuul/zuul-jobs master: Fix evaluating nodepool_ip and switch_ip facts  https://review.opendev.org/68118214:50
*** Lucas_Gray has quit IRC14:52
fungitristanC: this is not urgent, but i noticed that since you originally created the #openstack-election irc channel on freenode, you're the only one with administrative control over it. when you get time, can i trouble you to run this? `/msg chanserv access #openstack-election add openstackinfra +AFRefiorstv`14:56
openstackgerritCorey Bryant proposed openstack/project-config master: Remove charm-neutron-api-genericswitch from infra  https://review.opendev.org/68127014:58
*** udesale has quit IRC15:00
*** udesale has joined #openstack-infra15:00
*** diablo_rojo has joined #openstack-infra15:01
*** spsurya has quit IRC15:05
openstackgerritMerged opendev/base-jobs master: Re-enable FN swift logs - electrical work complete  https://review.opendev.org/68127515:05
clarkbdonnyd: ^ fyi15:05
openstackgerritMerged zuul/zuul master: Overriding max. starting builds.  https://review.opendev.org/67046115:05
*** pgaxatte has quit IRC15:06
*** tkajinam has quit IRC15:13
donnydclarkb: i just tested swift to make sure it's actually working. I don't think we had any issues with it before did we?15:13
clarkbdonnyd: I had not heard of any issues before15:14
openstackgerritCorey Bryant proposed openstack/project-config master: End project gating for charm-neutron-api-genericswitch  https://review.opendev.org/68125915:14
openstackgerritCorey Bryant proposed openstack/project-config master: Remove charm-neutron-api-genericswitch from infra  https://review.opendev.org/68127015:14
clarkbjohnsom: fyi I think https://review.opendev.org/680340 is ready now to fix worlddump in devstack which will hopefully aid in debugging your dns problems15:15
openstackgerritBogdan Dobrelya (bogdando) proposed zuul/zuul-jobs master: Fix evaluating nodepool_ip and switch_ip facts  https://review.opendev.org/68118215:16
donnydclarkb: I am also working on a faster router.. I think it is possible that FN is having an issue there and having something that can keep up would probably help greatly15:18
*** ociuhandu has quit IRC15:19
donnydwe keep seeing the occasional DNS can't be found or ssh inbound not working kinda things15:19
*** ociuhandu has joined #openstack-infra15:19
donnydAnd currently that device *is* the bottleneck15:19
donnydso I am working on something that should be orders of magnitude better15:20
*** ociuhandu has quit IRC15:22
*** ociuhandu has joined #openstack-infra15:23
openstackgerritMerged zuul/zuul master: Update heuristing of parallel starting builds.  https://review.opendev.org/67170215:23
*** ginopc has joined #openstack-infra15:24
*** gfidente has quit IRC15:25
*** ociuhandu has quit IRC15:28
*** ociuhandu has joined #openstack-infra15:31
*** gyee has joined #openstack-infra15:35
*** ociuhandu has quit IRC15:38
openstackgerritJeremy Stanley proposed openstack/project-config master: Report openstack/election repo changes in IRC  https://review.opendev.org/68127615:38
openstackgerritBogdan Dobrelya (bogdando) proposed zuul/zuul-jobs master: Fix evaluating nodepool_ip and switch_ip facts  https://review.opendev.org/68118215:38
openstackgerritClark Boylan proposed opendev/base-jobs master: Remove explicit prints from cleanup playbook  https://review.opendev.org/68129115:38
clarkbinfra-root ^ testing shows those prints are not required. If we can get that in I'll rerun tests then push up a change to apply to base and base-minimal if it still looks good15:38
fungii fast-approved it15:41
*** apetrich has joined #openstack-infra15:41
*** ykarel|afk is now known as ykarel|away15:42
*** factor has joined #openstack-infra15:45
*** icarusfactor has quit IRC15:45
*** ykarel|away has quit IRC15:47
clarkbtyty15:47
*** ociuhandu has joined #openstack-infra15:47
*** mattw4 has joined #openstack-infra15:48
corvusclarkb: do you even need the register?15:48
clarkbcorvus: oh no I don't15:48
openstackgerritClark Boylan proposed opendev/base-jobs master: Remove explicit prints from cleanup playbook  https://review.opendev.org/68129115:49
fungiunapproved in that case15:49
clarkblet's go ahead and fix that now so that we have what we want for base and base-minimal when we're happy with the results15:49
corvusclarkb: did you have a test change i can look at?15:49
clarkbcorvus: I do https://zuul.opendev.org/t/zuul/build/0ee6b77daafd40e3ac410e6f308801c8 that one15:49
corvus(just curious to see what that looks like)15:49
clarkbcorvus: due to when cleanup runs we don't get that in published logs though. You have to grep that uuid on the executor (ze04)15:49
corvusclarkb: ah right :)15:50
openstackgerritBogdan Dobrelya (bogdando) proposed zuul/zuul-jobs master: Fix evaluating nodepool_ip and switch_ip facts  https://review.opendev.org/68118215:54
*** ginopc has quit IRC15:55
*** ociuhandu has quit IRC15:58
*** ociuhandu has joined #openstack-infra15:58
*** ociuhandu has quit IRC16:02
*** ociuhandu has joined #openstack-infra16:03
openstackgerritMerged opendev/base-jobs master: Remove explicit prints from cleanup playbook  https://review.opendev.org/68129116:03
clarkbjohnsom: the mirror hasn't been updated yet. ianw was working on that overnight (relative to my location)16:04
clarkbjohnsom: I'll see if I can't figure out where that left off16:04
johnsomThank you!16:05
*** ykarel|away has joined #openstack-infra16:06
*** tesseract has quit IRC16:06
*** rpittau is now known as rpittau|afk16:06
*** eernst has joined #openstack-infra16:08
clarkbI think ianw's script completed (I don't see it running anymore and the only locked volume isn't in his script's list)16:09
clarkbcorvus: fungi ^ did you want to double check that? but if you agree I think we can turn the mirror-update servers back on and have it start updating things again?16:09
clarkband that should get us an up to date mirror?16:09
clarkbI guess we also need to know if ianw's script successfully vos released those volumes before we let the cron at it, which may time out?16:10
clarkb/afs/.openstack.org/mirror/ubuntu/timestamp.txt and /afs/openstack.org/mirror/ubuntu/timestamp.txt differ. Maybe that means we need to do a vos release16:11
*** eernst has quit IRC16:11
corvusclarkb: yes, and the 'last update' timestamps on mirror.ubuntu and mirror.ubuntu.readonly differ16:12
*** lpetrut has quit IRC16:12
corvusclarkb: (under 'vos examine mirror.ubuntu' and 'vos examine mirror.ubuntu.readonly')16:13
clarkbcorvus: are we able to check if there is a vos release running for a volume without running ps everywhere? maybe examine would tell us that?16:13
clarkbbecause if we know that vos release isn't running for ubuntu then maybe we just run it?16:13
*** mattw4 has quit IRC16:14
corvusclarkb: 'vos listvldb' says mirror.wheel.bionicx64 is the only release in progress16:14
corvusclarkb: and yes, vos examine shows the same info16:14
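The two checks corvus describes (compare the "Last Update" timestamps on the RW and RO volumes, and look for an in-progress release) are easy to script. A minimal sketch, assuming the `Last Update` line formatting shown in the sample strings below; real `vos examine` output carries more fields, so treat the regex as a starting point rather than a faithful parser:

```python
import re

def last_update(vos_examine_output):
    """Extract the 'Last Update' timestamp from `vos examine` output."""
    match = re.search(r"Last Update\s+(.+)", vos_examine_output)
    return match.group(1).strip() if match else None

def needs_release(rw_output, ro_output):
    # If the RW site and the .readonly replica report different
    # Last Update times, the replicas are stale and a `vos release`
    # is warranted (as with mirror.ubuntu here).
    return last_update(rw_output) != last_update(ro_output)
```

In practice you would feed it the captured output of `vos examine mirror.ubuntu -localauth` and `vos examine mirror.ubuntu.readonly -localauth`.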
*** gfidente has joined #openstack-infra16:15
clarkbhere we go. root@afs01.dfw.o.o:/root/unlock.log and unlock.sh16:15
clarkbthe unlock.log file shows that ubuntu released successfully ~2.5 hours ago16:15
clarkbexcept the RW and RO volumes differ in content16:16
*** kopecmartin is now known as kopecmartin|off16:16
corvusdid a mirror update somehow run after the release started?16:17
clarkbcorvus: I don't think so unless the mirror-update servers booted again16:18
*** e0ne has quit IRC16:18
clarkbianw shutdown the update servers in order to run this script unchallenged16:18
*** mattw4 has joined #openstack-infra16:25
fungithey're not responding to icmp ping, at least16:25
mnaseri have a question -- how do we feel about enabling CR for all of infra images and installing py3 in there by default?16:27
mnasermaking all of our images py3 native16:27
mnasers/infra/nodepool/16:27
clarkbCR?16:27
clarkbfungi: corvus I need to find breakfast but can keep digging on the afs stuff after back in a bit16:27
clarkbfungi: corvus: thinking maybe we do another vos release just to rule it out and then maybe we need to investigate if the volume needs salvaging?16:28
tristanCclarkb: I guess mnaser is referring to https://wiki.centos.org/AdditionalResources/Repositories/CR16:30
corvusclarkb: another vos release sounds prudent.  the timestamps only differ by like 1 minute which suggests to me some minor change16:30
mnasercorrect, using CR there16:31
clarkbtristanC: mnaser: I think one of our goals with our testing is to test what people will find in the wild. If you install centos 7 you don't get the CR repos by default. If you want contents from that repo then your jobs can update the node as necessary to make that happen; then it is also documented and automatable for anyone in the real world trying to do the same16:31
clarkbI don't think we should update the base images we use to do that16:32
mnaserclarkb: but we also develop what you dont see in the wild yet (for master)16:32
mnaserand whatever is in CR will become centos 7.7 in the next few days16:32
tristanCmnaser: we do enable CR in some of our jobs as a pre-tasks16:32
fungiyeah, testing openstack on centos7 with cr suggests we expect users will actually deploy that combination16:32
*** mattw4 has quit IRC16:33
pabelangerpython3 is in CR for centos7 now?16:34
mnaseryep16:34
pabelangerwow16:34
pabelangerstill holding out hope for centos 8 :p16:36
pabelangerhttps://wiki.centos.org/About/Building_8 seems to imply things are pretty much done16:37
fungithe easy 99% is done, the nearly impossible 1% is all that's left now? ;)16:38
mnaserthere's an intersting thread on centos-devel16:39
*** igordc has joined #openstack-infra16:39
clarkbmnaser: specifically we test our software's future. If we want to have specific jobs that combine our future with another future we can do that. But I don't think we want that as the default16:39
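tristanC's approach of enabling CR per-job as a pre-task could look roughly like this; a sketch, not taken from any real job definition — the play layout and task names are assumptions, and it relies on the CR repo definition shipping disabled in the stock CentOS 7 images:

```yaml
# Hypothetical pre-run fragment: opt one job into the CentOS 7
# CR (Continuous Release) repo instead of changing the base images.
- hosts: all
  become: true
  tasks:
    - name: Enable the CentOS 7 CR repository
      when:
        - ansible_distribution == 'CentOS'
        - ansible_distribution_major_version == '7'
      block:
        - name: Ensure yum-config-manager is available
          yum:
            name: yum-utils
            state: present
        - name: Enable the CR repo
          command: yum-config-manager --enable cr
```

This keeps the "test what people find in the wild" default intact while making the CR opt-in documented and repeatable.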
*** jpena is now known as jpena|off16:39
clarkbcorvus: ok I'm running `vos release -v mirror.ubuntu -localauth` on afs01.dfw in root screen window 116:43
*** igordc has quit IRC16:44
*** derekh has quit IRC16:44
*** kjackal has quit IRC16:46
* fungi disappears for a bit to try and reverse the boarding-up of windows16:46
clarkbvos release is done and the timestamps match now16:47
clarkbchecking if the kernel update is present16:48
clarkbkernel update is not present16:48
clarkbI think that means the next step is turning the mirror update server back on and doing an ubuntu mirror update16:48
*** gfidente has quit IRC16:49
corvus++16:49
*** ykarel|away has quit IRC16:54
clarkbserver is booted. the /var/run/reprepro dir does not exist. I seem to recall /var/run with systemd is a tmpfs not backed by disk and we use puppet to create that dir?16:55
clarkbI'll manually create it once I figure that out to speed things up16:55
clarkbhrm no maybe that isn't how it is set up? where does /var/run/reprepro come from?16:57
clarkbah it is in puppet just a different manifest16:57
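For reference, on systemd hosts /var/run is a symlink to /run, a tmpfs, so anything created there disappears at reboot — which is why something (here, puppet) has to recreate the directory. A tmpfiles.d entry is the standard systemd way to have it recreated at boot; the path and ownership below are illustrative, not the deployed config:

```
# /etc/tmpfiles.d/reprepro.conf -- illustrative, not the deployed config
# type  path               mode  uid   gid   age
d       /var/run/reprepro  0755  root  root  -
```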
clarkbubuntu mirror update is now running in screen on mirror-update01.openstack.org16:59
*** ociuhandu_ has joined #openstack-infra17:00
*** ociuhandu has quit IRC17:03
*** ykarel|away has joined #openstack-infra17:03
*** ociuhandu_ has quit IRC17:05
*** udesale has quit IRC17:12
clarkbnb01 and nb02 are not running builders currently. Looks like they both OOMed on August 2717:12
clarkbI'm going to clean their disks then reboot them (may as well get updates since they aren't doing anything anyway)17:12
openstackgerritJames E. Blair proposed zuul/zuul master: WIP: Add support for the Gerrit checks plugin  https://review.opendev.org/68077817:14
clarkbubuntu mirror is updated and released. Now just need to get the builders going17:16
*** jamesmcarthur has quit IRC17:18
clarkbnb01 has been asked to reboot but it has not come back yet :/17:29
clarkbI'll give it 5 more minutes then use the api to re-reboot it17:30
*** njohnston is now known as njohnston|lunch17:30
*** ralonsoh has quit IRC17:32
*** nicolasbock has quit IRC17:35
*** jtomasek has quit IRC17:37
clarkbhard reboot fixed it. Now cleaning up the disk and then will reenable nodepool-builder and reboot again (to ensure that reboots not after an OOM work)17:37
*** ralonsoh has joined #openstack-infra17:39
*** nicolasbock has joined #openstack-infra17:40
*** igordc has joined #openstack-infra17:44
*** trident has quit IRC17:46
*** e0ne has joined #openstack-infra17:46
*** igordc has quit IRC17:50
clarkbnb01 is up and running and building images17:53
*** kjackal has joined #openstack-infra17:54
*** diablo_rojo has quit IRC17:55
*** igordc has joined #openstack-infra17:57
*** trident has joined #openstack-infra17:59
openstackgerritClark Boylan proposed opendev/base-jobs master: Add cleanup playbook to all base jobs  https://review.opendev.org/68132218:05
clarkbinfra-root ^ I think that is ready. I also pasted logs from ze10 for a base-test tested job that ran there18:06
*** igordc has quit IRC18:06
*** igordc has joined #openstack-infra18:06
AJaegerclarkb: can we avoid the duplication? Why not use the same playbook everywhere?18:09
AJaegerclarkb: you made a pasto, I commented18:10
clarkbAJaeger: that's how those jobs are done today, with the duplication18:10
clarkbI believe it's so that they can be modified without unintended side effects18:10
openstackgerritJames E. Blair proposed zuul/zuul master: WIP: Add support for the Gerrit checks plugin  https://review.opendev.org/68077818:12
openstackgerritJames E. Blair proposed zuul/zuul master: WIP: Add report time to item model  https://review.opendev.org/68132318:12
openstackgerritJames E. Blair proposed zuul/zuul master: WIP: Add Item.formatStatusUrl  https://review.opendev.org/68132418:12
AJaegerclarkb: the others mainly include roles, but yes, it's duplicated.18:13
corvusclarkb, AJaeger: when we correct the typo, can we put the playbook in 'base/' rather than base-minimal?18:13
corvus(so the base-minimal job will refer to the base/ playbook, not the other way around)18:14
corvusoh wait i have misunderstood18:14
paladoxcorvus you'll be happy with https://github.com/dburm/pg-test-result-plugin !18:14
corvusone sec18:14
paladoxlive demo at https://gerrit.git.wmflabs.org/r/c/testing/test/+/310118:14
corvusclarkb, AJaeger: forget i said anything :)18:15
*** igordc has quit IRC18:15
corvuspaladox: cool, that looks like a polygerrit version of our hacky hideci script18:16
paladoxyeh18:16
corvuspaladox: so our upgrade path can be 2.13 ---> 2.16, with hideci on the gwt ui, and pg-test-results on the polygerrit ui, pause for a bit, then -> 3.0 with checks18:17
paladoxyup18:17
*** eharney has quit IRC18:17
paladoxi plan on installing that for wikimedia18:18
corvuspaladox: nice, i think that'll work great, thanks!18:18
paladoxif there's no objection18:18
*** jamesmcarthur has joined #openstack-infra18:18
corvusnoted in https://etherpad.openstack.org/p/gerrit-upgrade18:18
paladox:)18:18
paladox(just noting it works for only the latest patchset)18:19
paladoxit basically looks through messages to match a pattern.18:19
corvuspaladox: that's mostly what hideci does too (though it counts runs from previous patchsets to tell you how many there were; we can live without that i think)18:19
paladoxah18:19
corvusor, it's probably a trivial pr to the pg plugin to add that if it sounds useful18:20
paladoxyup18:20
paladoxcorvus https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/gerrit/plugins/wikimedia/+/master/gr-wikimedia/gr-wikimedia-custom-buttons.html is what i used to add a "Recheck" button if you want to copy that!18:22
clarkbcorvus: AJaeger do you think maybe base-test should be a separate playbook since we tend to test that independently but then I could have base and base-minimal share a common playbook?18:23
clarkbI do think having base-test be separate is likely a good thing to avoid test stuff leaking into production unexpectedly18:23
openstackgerritClark Boylan proposed opendev/base-jobs master: Add cleanup playbook to all base jobs  https://review.opendev.org/68132218:24
clarkbAJaeger: corvus ^ that fixes the typo18:24
clarkb#status log Rebooted and cleaned up /opt/dib_tmp on nb01 and nb02 after their builders stopped running due to OOMs18:28
openstackstatusclarkb: finished logging18:28
clarkbjohnsom: ^ I've got the entire image build toolchain running again. Now we just need ubuntu-bionic to have its turn in the queue and get uploaded18:33
johnsomOk, let me know when I should fire up a test18:34
*** ociuhandu has joined #openstack-infra18:39
*** ykarel|away has quit IRC18:39
*** ociuhandu has quit IRC18:44
AJaegerclarkb: sharing base and base-minimal should be ok - and yes, having base-test separately helps. But we can keep that separate as well...18:44
*** ralonsoh has quit IRC18:46
fungiwell, base-test is for manually testing the base job. base-minimal is for using an analog of the base job in tests of jobs, right?18:48
clarkbfungi: ya, mostly base-minimal replaces pre-run playbook stuff though so that test jobs can test those playbooks directly18:49
*** goldyfruit___ has joined #openstack-infra18:49
*** gyee has quit IRC18:49
fungithe main consumer, according to codesearch, is https://opendev.org/zuul/zuul-jobs/src/branch/master/zuul-tests.d/general-roles-jobs.yaml#L103-L10618:50
*** gyee has joined #openstack-infra18:50
fungiso if the desire is to separate the base job from tests of the base job or its contents, then separating base, base-test and base-minimal is desirable18:50
*** goldyfruit_ has quit IRC18:52
fungialso jobs like https://opendev.org/openstack/openstack-zuul-jobs/src/branch/master/zuul.d/jobs.yaml#L6-L1918:52
*** panda is now known as panda|rover|off18:54
*** jamesmcarthur has quit IRC18:57
*** ricolin has quit IRC18:58
*** kjackal has quit IRC19:03
*** kjackal has joined #openstack-infra19:08
ianwok, mirroring update -- it looks like everything but mirror.opensuse released itself and is back in sync19:09
clarkbianw: ya I turned mirror-update.openstack.org back on19:09
ianwi have run the salvage described in https://lists.openafs.org/pipermail/openafs-devel/2018-May/020493.html19:09
clarkbianw: in order to get ubuntu updates with the kernel that fixes the OVS panic19:09
clarkband that all seems happy19:09
ianwi'm re-running the mirror.opensuse release again (in root screen on afs01) and we can see if that volume is fixed by that19:10
clarkbianw: sounds great19:10
*** eharney has joined #openstack-infra19:11
Shrewsianw: since you're around now, do you still need those two held nodes? couple of weeks old now19:17
*** pcaruana has quit IRC19:20
ianwShrews: ... um which ones sorry?19:20
Shrewsianw: i don't remember. i just cleaned several yesterday. yours were the only 2 left19:20
clarkb(I've since added a couple debugging network issues in testing)19:20
Shrewsianw: the comment is "ianw: redirs" if that helps19:22
clarkbjohnsom: bionic is building now so hopefully in about an hour. Will confirm with you before you should recheck though19:22
ianwShrews: oh right yeah, sorry just pulling up status page ... no that can be deleted if you have a console up, thanks, or i can do it19:22
Shrewsianw: i'm there now, i'll delete them19:22
ianwthanks19:22
Shrewsdone. np19:23
*** pkopec has quit IRC19:28
*** mattw4 has joined #openstack-infra19:29
*** mattw4 has quit IRC19:33
*** panda|rover|off has quit IRC19:37
openstackgerritJames E. Blair proposed opendev/system-config master: Add docs for recovering an OpenAFS fileserver  https://review.opendev.org/68133819:38
*** panda has joined #openstack-infra19:38
johnsomclarkb Ok, thanks19:42
clarkbinfra-root https://review.opendev.org/#/c/681322/ cleanup playbook in production is ready for review now I think20:01
clarkbjohnsom: bionic image just finished building and is being uploaded to clouds now20:02
ianwinfra-root: also if one other wants to check f30 node support with -> https://review.opendev.org/680919 ... i can monitor20:02
johnsomNice20:03
ianwonce this opensuse volume releases (assuming it does now after the salvage ...) i might take the chance to switch on auditing and try a fedora volume release before re-enabling20:05
*** e0ne has quit IRC20:19
clarkbmgagne: are you about? curious if you know what the status of infra using inap is. I think you mentioned someone else was working on it now?20:35
*** ociuhandu has joined #openstack-infra20:37
openstackgerritMerged zuul/zuul master: Record handler tasks in json job output  https://review.opendev.org/68072620:38
clarkbcmurphy: did you manage to get the timeout fixes in for keystone? I noticed that the lower constraints job hits that too (and I'm guessing upper constraints) if they haven't been updated yet20:48
cmurphyclarkb: yeah we did but i missed that one https://review.opendev.org/68116120:49
cmurphyif you want to make the gate a little healthier you could consider promoting that one, it's been waiting for almost 6 hours20:50
clarkbcool. Also I think openstack/requirements runs your unittests on dep updates20:50
cmurphyoof :/20:50
clarkbshould be able to make a similar update in requirements20:50
clarkbas for promoting I'm not opposed, though we've got cinder also flapping. It would probably be helpful if someone could take a cross-project view and start identifying these changes as priorities, then we can promote them all20:51
*** kjackal has quit IRC20:52
*** ociuhandu has quit IRC20:52
*** ociuhandu has joined #openstack-infra20:53
clarkbcmurphy: I enqueued 681161 into the gate (but didn't promote it to head since it looks like things ahead of it might be stable-ish for now?)20:53
clarkbcinder in particular has been flapping on unittests and the cinder ahead passed those20:54
clarkbif I catch it resetting again I'll promote20:54
cmurphythanks clarkb20:55
clarkbsmcginnis: diablo_rojo_phon ttx ^ thinking out loud here the release team might want to act like ATC for changes that fix the gate?20:55
clarkb(qa seems a bit hands off these days, but maybe someone from qa is willing to help too?)20:56
fungii'm not sure what "act like atc" means20:56
clarkbfungi: track changes that fix issues in our testing and ensure they get enqueued/promoted as appropriate when ready20:57
*** ociuhandu has quit IRC20:57
clarkband to prioritize these efforts across projects20:57
clarkbbasically the situation we are in now is a large backlog because we are all fighting against each other with flaky testing20:57
clarkbrather than focusing on ensuring that fixes get attention first20:57
smcginnisclarkb: I can push the cinder fix through. It's one of those stupid false UT failures.20:58
fungiahh, okay, i mainly just didn't know what "atc" meant in that context20:58
clarkbjohnsom: bionic is updated on all clouds that run jobs right now20:58
clarkbjohnsom: I think you are clear to check if octavia jobs are happier now20:58
johnsomclarkb Awesome. I will give it a go20:59
clarkbfungi: for example that keystone fix has sat queued for ~5 hours in check because there are a ton of nova neutron cinder keystone etc changes all up too20:59
fungimaybe it's just that i don't know what "atc" is an abbreviation of there. not "active technical contributor" but something else20:59
clarkbfungi: air traffic control21:00
clarkbsorry21:00
fungigot it. thanks!21:00
*** Goneri has quit IRC21:00
fungii thought you were suggesting that they should open voting for release management ptl to anyone who fixed gate bugs (which is also an interesting suggestion, but entirely different)21:00
clarkboh no, mostly we need someone (some group) to basically do air traffic control for changes since things are really unhappy and we are theoretically in a stabilization period and have RCs in ~2 weeks21:01
fungiyes, having release and/or qa folks help track and coordinate major gate-obstructing fixes in openstack projects would be a huge help21:01
smcginnisWe may want to push through https://review.opendev.org/#/c/681318/21:01
smcginnisIt's approved, but is going to take a long time yet to land.21:01
smcginnisI think this was the main cinder issue causing resets.21:02
clarkbsmcginnis: I've enqueued it too21:03
smcginnisThanks!21:03
clarkbthe concern I have is that a long ish gate that resets often represents significant wasted effort21:04
smcginnisDefinitely.21:05
clarkband that then slows down check21:05
fungiyes, and significant delays21:05
fungithat21:05
fungislows down everything, not *just* check21:05
clarkbya21:05
clarkbI've got the promote command typed up for those two and will run it if I catch a reset21:07
clarkbsmcginnis: note the top of gate is a cinder change that failed in unittests and legacy-grenade-dsvm-cinder-mn-sub-volbak21:12
clarkblooks like an error with nova cells in that job21:14
clarkbmriedem: ^ https://48dfe966566e2a08c129-e099c5c03695c7198c297e75ec3f8d05.ssl.cf1.rackcdn.com/680838/1/gate/legacy-grenade-dsvm-cinder-mn-sub-volbak/ab25959/logs/grenade.sh.txt.gz do you understand why that may happen?21:14
mriedemlooks like a subnode isn't getting discovered, could be a race in how the job is setup21:17
mriedemi've never heard of that job before though - is it voting?21:17
clarkbmriedem: yes it and a unittest failure are the cause of the latest gate reset21:18
mriedemnot really a cells issue, just a failure to map the host issue21:18
clarkbwe think we've got a change up to fix the unittest issue. Next up fixing that grenade job I guess21:18
clarkbmriedem: that's the thing that d-g runs after all the stack.sh runs right?21:18
clarkbit's from d-g/tool/something.sh ?21:18
clarkb(its a legacy job so should be using d-g)21:18
mriedemyeah https://github.com/openstack/devstack-gate/blob/401e7535c6c29f9ba814058610ef6cefc952678b/devstack-vm-gate.sh#L25121:19
mriedembtw https://review.opendev.org/#/c/681238/ is a known nova functional test race bug fix as well21:19
clarkbmriedem: that change is a bit more involved than the other two I just enqueued to the gate so I don't trust myself to evaluate it. You, however, have approved it. Are we reasonably sure it will pass testing as is without having run check tests first?21:21
clarkb(I don't want to enqueue something that will make the gate worse)21:21
mriedemi was going to rebase it into the middle of another series that needs to use some code off it so it can sit out21:22
clarkbk21:22
mriedemwill be doing so from the comfort of a dance class lobby while my kid is in class...21:22
mriedemi see this Creating host mapping for compute host 'ubuntu-xenial-ovh-gra1-0011108653': e6914d05-6244-4b1f-86dc-b73a4cc5ce3c21:24
mriedemin that one job21:24
clarkbgate just reset so I'm promoting those two21:25
mriedemi think that's the control node21:25
clarkbhttps://d494348350733031166c-4e71828f84900af50a9a26357b84a827.ssl.cf1.rackcdn.com/676138/16/gate/openstack-tox-py27/9628b8b/job-output.txt caused this reset, nova db migration tests21:26
mriedemi think that's an old one http://status.openstack.org/elastic-recheck/#182325121:26
mriedemor http://status.openstack.org/elastic-recheck/#179336421:26
mriedemin that legacy-grenade-dsvm-cinder-mn-sub-volbak failure i can't see where discover_hosts was run from d-g on the subnode21:27
mriedemit's supposed to generate a log but i don't see that getting archived21:27
mriedemwe must not archive $WORKSPACE/logs/devstack-gate-discover-hosts.txt ?21:27
mriedemoh nvm i see why,21:28
mriedemfor grenade we don't create that log: https://github.com/openstack/devstack-gate/blob/401e7535c6c29f9ba814058610ef6cefc952678b/devstack-vm-gate.sh#L71421:29
clarkband we only run it if the branch in devstack has it for the old side21:29
mriedemit's been around since ocata so it'll be there21:30
clarkbok the change is against queens https://review.opendev.org/#/c/680838/ so we would be looking at pike21:31
mriedemthis is what that log should look like https://776a947f0924bf63aed6-b3fb5ecd7a4ff2f6244fae5211e9579f.ssl.cf5.rackcdn.com/673990/18/check/nova-live-migration/4d7d637/logs/devstack-gate-discover-hosts.txt.gz21:31
clarkbhttps://opendev.org/openstack/devstack/src/branch/stable/pike/tools/discover_hosts.sh and it does exist21:31
clarkbis it possible that nova-manage does not exist so it noops?21:32
mriedemnova-manage exists,21:32
mriedemotherwise you can't migrate the db schema21:32
clarkbis it possible it doesn't exist on the subnode?21:33
clarkboh we run that only on the contrller so ya should be fine21:33
mriedemyeah,21:33
mriedemso what i think is probably happening,21:33
*** sshnaidm|ruck is now known as sshnaidm|afk21:33
mriedemis in pike devstack might not have a patch to wait properly/longer for nova-compute to come up on the subnode before stacking is done,21:34
clarkbmriedem: grep discover_hosts.sh in https://48dfe966566e2a08c129-e099c5c03695c7198c297e75ec3f8d05.ssl.cf1.rackcdn.com/680838/1/gate/legacy-grenade-dsvm-cinder-mn-sub-volbak/ab25959/logs/grenade.sh.txt.gz21:34
mriedemi think that's something mnaser fixed in devstack but i'm not sure how far back that fix went21:34
mriedemif discover_hosts runs before the subnode nova-compute service is created in the db it won't discover it21:34
clarkbah21:34
clarkb2019-09-10 20:22:55.309 | Found 1 unmapped computes in cell: 7ba51989-bbc4-42fe-93e2-7f52bf2b2828 is what it says21:34
clarkbso I guess what this might boil down to is stop running multinode grenade on queens?21:35
*** jamesmcarthur has joined #openstack-infra21:35
mriedemor backport the devstack fixes - i remember this showed up after the meltdown/spectre patches started hitting nodepool providers and slowing things down enough that we'd see the race21:36
mriedemi'm having trouble finding the patch though21:36
mriedemhttps://review.opendev.org/#/q/I0cd7f193589a1a0776ae76dc30cecefe7ba9e5db21:37
mriedembingo21:37
mriedemnot in pike21:37
mriedembrotha21:37
mriedemoh shit it is21:37
mriedemheh well i'm out of ideas o-)21:37
mriedemi have to run to this class, will be online later21:37
*** rh-jelabarre has quit IRC21:37
*** mriedem has quit IRC21:37
*** exsdev0 has joined #openstack-infra21:42
*** exsdev has quit IRC21:43
*** exsdev0 is now known as exsdev21:43
*** rh-jelabarre has joined #openstack-infra21:43
clarkbdonnyd: thinking out loud here re your swift graphs. Might be good to show a rate of growth (even aggregate is fine) so we can see whether we are always trending upward, plateauing, or trending down21:47
clarkbI'm working to finally get around to filing that gitea bug and noticed https://github.com/go-gitea/gitea/issues/49121:53
clarkbtl;dr gitea upstream seems aware of their performance issues, there is someone maintaining a branch that addresses it but they haven't upstreamed it :/21:53
fungithat sounds familiar21:54
*** iokiwi has quit IRC21:59
*** adriant has quit IRC21:59
*** mriedem has joined #openstack-infra22:00
mriedemclarkb: so on that failed queens grenade job,22:05
mriedemthe last discover_hosts.sh runs at:22:05
mriedem2019-09-10 20:32:13.462 | + ./post-stack.sh:main:8                   :   /opt/stack/old/devstack/tools/discover_hosts.sh22:05
mriedemthe compute node record for the subnode that was not discovered is created after that:22:05
mriedemSep 10 20:32:24.538778 ubuntu-xenial-ovh-gra1-0011108654 nova-compute[26738]: INFO nova.compute.resource_tracker [None req-09805f53-608d-4e2a-bfeb-1c3222e9b451 None None] Compute node record created for ubuntu-xenial-ovh-gra1-0011108654:ubuntu-xenial-ovh-gra1-0011108654 with uuid: b5a32e0d-b32d-4516-a21b-cb5ac5952aba22:05
clarkbhttps://github.com/go-gitea/gitea/issues/8146 has been filed22:06
clarkbmriedem: k so that was what you thought was fixed and backported?22:06
ianwok, i guess the whole-server salvage worked and opensuse mirror r/o is synced now22:06
mriedemwell what's weird is that devstack log on the subnode finds the compute service in the api earlier:22:07
mriedem2019-09-10 20:32:07.888 | ++ ::                                       :   openstack --os-cloud devstack-admin --os-region RegionOne compute service list --host ubuntu-xenial-ovh-gra1-0011108654 --service nova-compute -c ID -f value22:07
ianwi was hoping mirror.wheel.bionicx64 was just in progress, but it seems not.  i'll recover that too22:08
mriedemclarkb: so no that patch i linked earlier about the timeout isn't the same thing,22:11
mriedemlooks like the subnode devstack says the compute is ready b/c the service record is in the API,22:11
mriedembefore the compute node record is created which is what discover_hosts is looking for,22:11
mriedemwhich doesn't really make sense to me, but if that node is pike...it's been a long time since i've looked at what might be buggy there still22:11
mriedemhttp://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22is%20not%20mapped%20to%20any%20cell%5C%22%20AND%20tags%3A%5C%22console%5C%22&from=7d22:13
*** jamesmcarthur has quit IRC22:14
mriedempredominantly on ovh-gra1 nodes, i wonder if there is some subtle timing issue on those nodes22:14
*** jamesmcarthur has joined #openstack-infra22:15
*** aaronsheffield has quit IRC22:16
*** jamesmcarthur has quit IRC22:17
mriedemlooking at nova pike code, the service record that devstack is waiting on is created before the compute node record that discover_hosts is looking for,22:18
mriedemso that's a race22:18
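The failure mode here — devstack's wait succeeds on the service record, while discover_hosts needs the compute node record that appears a few seconds later — is a classic poll-the-wrong-thing race. A generic sketch of the fix shape, with `list_compute_nodes` as a stand-in for whatever real query (nova-manage, the API) an actual devstack patch would use; all names here are hypothetical:

```python
import time

def wait_for_compute_node(list_compute_nodes, hostname,
                          timeout=60.0, interval=2.0):
    """Poll until the compute *node* record for hostname exists.

    Waiting on this, rather than on the service record, closes the
    window where discover_hosts runs before the subnode is mappable.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if hostname in list_compute_nodes():
            return True
        time.sleep(interval)
    return False
```

The key design point is polling for the same record discover_hosts consumes, so "ready" means "discoverable" rather than merely "registered".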
ianwwith no releases happening, i'm going to enable logging on the afs servers so we can hopefully get to the bottom of this slow release issue22:18
clarkbianw: k22:18
*** jamesmcarthur has joined #openstack-infra22:18
clarkbmriedem: I guess we have to decide how valuable it is to keep supporting pike then?22:18
clarkbmriedem: I only called it out because it caused a gate reset but there are other ways of dealing with those22:19
mriedemlooking at logstash results this is pretty rare it looks like22:19
mriedemi would make legacy-grenade-dsvm-cinder-mn-sub-volbak non-voting in stable/queens first, though it looks like that job hasn't made its way to cinder yet so it's still branchless21:20
mriedemalthough couldn't you make it non-voting in the cinder stable/pike .zuul.yaml? https://github.com/openstack/cinder/blob/stable/queens/.zuul.yaml#L5722:21
*** slaweq has quit IRC22:21
*** jamesmcarthur has quit IRC22:21
clarkbya I think you can22:21
clarkbbasically list it there and set nonvoting to true22:21
clarkbseems like this was the same trick used to increase keystone's unittest timeouts22:22
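The branch-local override being discussed might look roughly like this; treat it as a sketch — the surrounding project stanza layout in cinder's stable-branch .zuul.yaml is an assumption, not copied from the repo:

```yaml
# Hypothetical fragment for cinder's stable-branch .zuul.yaml:
# redefine the inherited job as non-voting on this branch only,
# so the discover_hosts race stops resetting the gate.
- project:
    check:
      jobs:
        - legacy-grenade-dsvm-cinder-mn-sub-volbak:
            voting: false
```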
openstackgerritMerged openstack/project-config master: Add Fedora 30 nodes  https://review.opendev.org/68091922:24
*** jamesmcarthur has joined #openstack-infra22:24
ianwok, interesting; i've enabled auditing and am doing a "vos release mirror.fedora" now.  since nothing has written to this since it was released (mirror-update.opendev.org is off ...) i'd expect this would be a zero-delta update22:25
ianw... it does not appear to think it is ...22:25
mriedemi'm puzzled why this isn't an issue after pike nodes but nothing is coming to mind right now22:25
mriedemmaybe it is and we're just lucky22:25
clarkbmnaser: re https://review.opendev.org/#/c/662300/ we are planning project renames on Monday. Now would be a good time to update that if it is still something you want; otherwise maybe abandon?22:26
*** panda has quit IRC22:26
clarkbmriedem: that is often the case particularly with timing issues22:26
mriedemhmm, when did devstack change to systemd?22:26
clarkbmriedem: swap out hardware and all of a sudden no longer lucky22:26
clarkbmriedem: sdague did it so a while back22:26
clarkbI think after the first PTG22:27
clarkbwhenever that was22:27
mriedemthe service record and compute node record should be created by the time the service is reported as running/active by systemctl22:27
mriedemfirst ptg was pike22:27
mriedemin hot lanta22:27
mriedemso, this might be a case of pre-systemd devstack not waiting long enough for the service to be fully started and records created22:27
clarkbin the legionnaires hotel22:27
*** panda has joined #openstack-infra22:28
clarkbfungi: is it valid for a project to be in two storyboard groups? https://review.opendev.org/#/c/669298/3/gerrit/projects.yaml22:30
*** ociuhandu has joined #openstack-infra22:30
mriedemfwiw systemd in devstack was made the default in queens https://review.opendev.org/#/c/499186/22:31
fungiclarkb: absolutely22:31
fungia project can be in as many groups as make sense22:32
*** prometheanfire has quit IRC22:33
*** prometheanfire has joined #openstack-infra22:33
clarkbremote: https://review.opendev.org/681353 project rename records22:33
clarkbI think https://etherpad.openstack.org/p/project-renames-2019-09-10 and its related changes are ready for review now22:34
clarkbfungi: thanks for confirming22:34
*** ociuhandu has quit IRC22:35
johnsomclarkb So far looking good. Will know for sure in a few hours.22:35
*** jamesmcarthur has quit IRC22:37
*** mriedem has quit IRC22:41
clarkbcorvus: fungi ianw do we want to apply cleanups globally? https://review.opendev.org/#/c/681322/22:43
*** jamesmcarthur has joined #openstack-infra22:44
fungiianw and i both just +2'd it but would appreciate corvus's input so i did not approve22:47
clarkbk22:47
fungisince it's a change to the base job, i feel extra caution is warranted22:47
fungialso i'm knocking off for the day... it's been a long one22:48
corvusfungi, clarkb: +222:49
corvusclarkb: interested in making that happen only on failure?22:49
corvusi think it's possible22:49
clarkbcorvus: hrm can I check a zuul var for that?22:50
clarkbzuul_success says grep22:50
corvusyep22:50
clarkbsure22:50
corvusthat's the one i was thinking of22:50
clarkbdo you think that should go through base-test too (probably not a bad idea)22:50
corvusyes22:50
*** asettle has quit IRC22:51
openstackgerritClark Boylan proposed opendev/base-jobs master: Add cleanup playbook to all base jobs  https://review.opendev.org/68132222:54
openstackgerritClark Boylan proposed opendev/base-jobs master: Update cleanup tasks to only happen on failure  https://review.opendev.org/68135422:54
clarkb681354 is parent of 681322 and modifies base-test22:54
*** mattw4 has joined #openstack-infra22:55
clarkbI'll test that on both successful and failed job cases22:55
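In Ansible terms, the failure-only condition might look roughly like this (playbook structure and task file name are assumed, not taken from 681354; zuul_success is the variable named above):

```yaml
# Sketch of a base-job cleanup playbook gated on the job result.
# zuul_success arrives as the string "true"/"false", so cast it with
# | bool before negating; default(true) keeps the tasks skipped if the
# variable is ever absent.
- hosts: localhost
  tasks:
    - name: Run cleanup tasks only when the job failed
      include_tasks: cleanup-tasks.yaml
      when: not (zuul_success | default(true) | bool)
```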
*** jamesmcarthur has quit IRC22:57
clarkbianw: is it possible that something other than rsync is touching files (maybe some verification step though I didn't think we had those for the rsynced repos)22:59
*** rcernin has joined #openstack-infra22:59
ianwclarkb: so right now i'm just testing with rsync completely out of the loop.  it just finished "vos release mirror.fedora" and now i am re-running it23:00
*** mattw4 has quit IRC23:00
*** shachar has joined #openstack-infra23:01
ianwi'm watching the rxdebug stats on afs02, right now it seems that the thread handling this has received 3110647592 bytes23:02
*** tkajinam has joined #openstack-infra23:03
ianwwe'll see where it ends up at ... but multiple gigabytes for a volume that has had no updates doesn't seem right23:03
*** snapiri has quit IRC23:03
ianwwe have the fileaudit logs too, so can see nothing touched the underlying r/w volume23:03
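A back-of-the-envelope check of the rxdebug counter quoted above: ~3.1e9 bytes moved for a supposedly zero-delta release really is multiple gigabytes, which is what makes the result look wrong.

```shell
# Convert the observed rxdebug byte counter to GiB (integer arithmetic).
bytes=3110647592
gib=$(( bytes / 1024 / 1024 / 1024 ))
echo "${gib}+ GiB transferred for a release with no writes"
```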
*** exsdev has quit IRC23:10
*** slaweq has joined #openstack-infra23:11
*** rcosnita has joined #openstack-infra23:13
*** exsdev has joined #openstack-infra23:13
*** slaweq has quit IRC23:15
*** goldyfruit___ has quit IRC23:17
*** rcosnita has quit IRC23:18
*** jamesmcarthur has joined #openstack-infra23:18
*** jamesmcarthur has quit IRC23:21
openstackgerritJames E. Blair proposed zuul/zuul master: WIP: Add support for the Gerrit checks plugin  https://review.opendev.org/68077823:22
*** jamesmcarthur has joined #openstack-infra23:23
johnsomclarkb Yes, confirmed. No more retries.23:24
johnsomclarkb ianw Thank you for all your help.23:25
*** xenos76 has quit IRC23:25
*** jamesmcarthur has quit IRC23:26
openstackgerritMerged zuul/nodepool master: Fix node failures when at volume quota  https://review.opendev.org/67170423:29
clarkbjohnsom: great, one less issue to worry about. I wonder if other jobs will be more stable too23:30
johnsomAny job actually putting traffic through floating IPs should have been impacted.23:30
johnsomDoesn't seem to take much traffic either23:31
*** dchen has joined #openstack-infra23:31
*** jamesmcarthur has joined #openstack-infra23:31
clarkbthe traffic would need to fragment too23:35
clarkbwhich if I'm remembering correctly from way back when we didn't always have mtus set correctly doesn't always happen23:35
johnsomI know we do for the amphora, but this was out at the neutron level,  so... don't know23:35
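As a rough illustration of the fragmentation condition being discussed (the payload and MTU values here are assumed for the example, not taken from any job logs):

```shell
# A UDP datagram fragments when IP header (20) + UDP header (8) + payload
# exceeds the path MTU -- e.g. a full-size payload over a tunnel/overlay
# interface with a reduced MTU.
payload=1500   # assumed payload size
mtu=1450       # assumed overlay-network MTU
total=$(( 20 + 8 + payload ))
if [ "$total" -gt "$mtu" ]; then
    echo "datagram of ${total} bytes fragments at MTU ${mtu}"
else
    echo "datagram of ${total} bytes fits within MTU ${mtu}"
fi
```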
*** iokiwi has joined #openstack-infra23:37
*** adriant has joined #openstack-infra23:39
*** jamesmcarthur has quit IRC23:41
*** mtreinish has quit IRC23:43
*** sthussey has quit IRC23:46

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!