clarkb | The meeting will begin shortly | 18:59 |
---|---|---|
* fungi listens to the muzak | 19:00 | |
*** ianw_pto is now known as ianw | 19:00 | |
fungi | [please stand by] | 19:00 |
ianw | o/ | 19:00 |
frickler | o/ | 19:00 |
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue Nov 2 19:01:08 2021 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link http://lists.opendev.org/pipermail/service-discuss/2021-November/000294.html Our Agenda | 19:01 |
clarkb | Welcome, you'll find the agenda for this meeting ^ there. | 19:01 |
clarkb | #topic Announcements | 19:01 |
clarkb | The Gerrit User Summit will be happening sometime early next month and details should be coming out soon. I expect that it will be remote but don't know that for sure. | 19:02 |
clarkb | I bring it up because I was discussing that we did the 3.2 -> 3.3 upgrade and automated much of our testing for that, and he thought other Gerrit users would be interested in hearing how we manage our gerrit | 19:02 |
fungi | i guess we're in a much better position to talk about the things we're doing with gerrit, now that we're running a relatively recent release | 19:02 |
clarkb | I think our installation is a bit different than many others because while we run a fairly large instance we don't currently do HA or have very strict uptime requirements. But at the same time we automate much of our testing and development around gerrit now. | 19:03 |
clarkb | Anyway the event might be interesting to those who attend this meeting so calling it out here. I'm hoping I can submit something to talk about how we run gerrit stuff too | 19:03 |
clarkb | #topic Actions from last meeting | 19:04 |
clarkb | #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-10-26-19.01.txt minutes from last meeting | 19:04 |
clarkb | ianw you had an action to start on the gerrit 3.4 stuff | 19:04 |
fungi | i hope he didn't do that on his vacation | 19:05 |
clarkb | write a checklist, hold a node, and test the downgrade. | 19:05 |
ianw | yes i started on that | 19:05 |
ianw | #link https://etherpad.opendev.org/p/gerrit-upgrade-3.4 | 19:05 |
clarkb | thank you! | 19:05 |
ianw | i got as far as noticing that the plugin updates do seem to break our theme | 19:06 |
clarkb | that is good news since it was one of the questions we had | 19:06 |
ianw | so i'll dig into that first | 19:06 |
clarkb | er | 19:06 |
clarkb | I read it as does not. But I guess knowing is good either way just more work in this case | 19:06 |
clarkb | yay for testing :) | 19:06 |
ianw | indeed :) | 19:07 |
fungi | those used a "lightweight" polygerrit plugin method, so i guess we need some java to go along with it now | 19:07 |
clarkb | I think you may still be able to do pure javascript plugins but you have to hook in some specific way? | 19:07 |
fungi | s/those/the theme/ | 19:07 |
clarkb | ianw: the original file came from the android theme iirc. We might be able to see how they updated their theme? | 19:08 |
ianw | yeah, good idea. honestly i haven't even had an initial look at it yet | 19:08 |
fungi | paladox also might have suggestions as to how to update it since he effectively supplied the original for us | 19:08 |
clarkb | ya no rush. Just wanted to check in on this as it was recorded as an action. Sounds like good progress. Thanks | 19:09 |
clarkb | The other recorded action was for infra root to review the mailman 3 spec | 19:09 |
clarkb | let's just dive into that topic now | 19:09 |
clarkb | #topic Specs | 19:09 |
clarkb | #link https://review.opendev.org/810990 Mailman 3 spec | 19:09 |
clarkb | ianw and I have reviewed it and appear happy with the spec. I'd like to approve this soon if we can as the holiday period is a good time for this type of work | 19:10 |
clarkb | frickler: corvus: do you think you have time to review it this week? Any objections to landing it this week if not? | 19:10 |
frickler | I'll put it on my list but won't object to anything | 19:11 |
clarkb | ok in that case I'll aim to approve it end of day Friday if no review objections come up | 19:11 |
clarkb | thanks! | 19:11 |
clarkb | #topic Topics | 19:12 |
fungi | also i'm happy to make adjustments during setup if people come up with new concerns | 19:12 |
clarkb | fungi: thanks | 19:12 |
clarkb | #topic Improving OpenDev's CD throughput | 19:12 |
clarkb | I'll admit I haven't really had a chance to look at this yet. I feel like this sort of change requires me to not be juggling a few things so I can focus on understanding the end result, and I haven't had that opportunity yet | 19:12 |
clarkb | It is on my todo list if I ever find that block of time :/ | 19:13 |
ianw | yeah i need to cycle back on some failures in jobs too | 19:13 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/807672 | 19:13 |
clarkb | specifically that change and its child if others have time too | 19:13 |
clarkb | at this point I think it just needs reviewers and someone to look at failures. Then we can improve it as required by review and start landing changes | 19:14 |
clarkb | #topic Gerrit account cleanups | 19:15 |
clarkb | Just a note that I haven't heard back from the user I most recently did the fixup for | 19:15 |
clarkb | I suppose no news is good news in this case | 19:15 |
fungi | they seemed relatively uncommunicative anyway | 19:15 |
clarkb | #topic Fedora 34 boot problems | 19:16 |
clarkb | I've not managed to keep up with the status of this other than reviewing a change here and there | 19:17 |
clarkb | Is this still an issue? Anything we need to do to help fix it? | 19:17 |
ianw | the dracut fix made it into f34, but i was not clear if that actually would fix the default initramfs | 19:17 |
ianw | so i have a change out still that regenerates it with dracut | 19:18 |
ianw | however, i also just updated for fedora 35 | 19:18 |
fungi | what's the anticipated release date for 35? | 19:19 |
ianw | i wasn't sure what to do with the mirror, but i think it just released today | 19:19 |
ianw | so that solves having to figure out "/devel" paths | 19:19 |
clarkb | fungi: it was yesterday I think | 19:19 |
fungi | oh, then yeah that may just be a better place to focus regardless | 19:19 |
fungi | in the meantime we're not all that blocked on 34 since we've got three providers where it can boot now | 19:20 |
clarkb | I think only 2 have labels configured for it though | 19:20 |
ianw | (sorry just logging into gerrit) | 19:20 |
clarkb | but that is probably good enough while we get f35 up | 19:20 |
fungi | oh, i guess we never added it to vexxhost | 19:20 |
clarkb | the other related item was I had a -1 comment on the f33 mirror cleanup. I don't think we can remove the fedora atomic image yet because magnum has older branches still using it | 19:21 |
ianw | i guess we still think we have fedora 29 users | 19:21 |
fungi | granted it's also not terribly efficient that we've got poolworkers accepting node requests they'll ultimately be unable to fulfil after waiting 15-20 minutes for the node to never become reachable | 19:21 |
clarkb | but we should definitely clean out f33 | 19:21 |
clarkb | ianw: ya I think we should also send a note to openstack-discuss that that image needs to go away. It isn't something anyone should be using and they need to make a plan for using something else? | 19:21 |
fungi | if we do decide to abandon 34 and focus on 35, then we should probably remove the 34 label from providers where we know it's broken | 19:22 |
ianw | ok, i think it's more a "this is going away" message at this point ... | 19:22 |
clarkb | ianw: yup. Basically we know it is used but no one should use it and we need to clean it up. Let's give them sufficient warning, then proceed | 19:22 |
fungi | maybe a one-sentence reminder that we tend to not keep eol distro versions around | 19:23 |
ianw | i can split those up, drop f33, add f35, then start builds for f35, update zuul-jobs and any users and then we can drop f34 | 19:23 |
clarkb | sounds like a plan | 19:23 |
fungi | but in the meantime, drop f34 everywhere besides inmotion and citycloud (and maybe add it to vexxhost?) | 19:24 |
fungi | we're just wasting resources trying to boot it everywhere else | 19:24 |
clarkb | fungi: ya otherwise that | 19:24 |
ianw | ok, i'll do that too, although i hope the removal proceeds in a timely fashion :) | 19:25 |
fungi | it's more just that i'm watching nodepool try to boot a f34 node in rackspace right now | 19:25 |
clarkb | thanks. Let me know if I can help. Happy to do reviews on that as the slow f34 boots affected random things when it was a bigger issue | 19:26 |
clarkb | #topic Zuul multi scheduler setup | 19:26 |
clarkb | Over the weekend zuul ran with an active active scheduler for the first time | 19:27 |
clarkb | I saw a report that at least one job was started by one scheduler and finished by another | 19:27 |
clarkb | Unfortunately there have been some bumps along the way (corvus is currently doing a restart to fall back to a single scheduler after debugging the main issue) | 19:27 |
clarkb | basically keep this in mind if you are doing any zuul work. And if you notice any weird zuul behavior reporting that back to the zuul matrix room is a good idea | 19:28 |
corvus | it went a lot better than i expected actually :) | 19:28 |
fungi | we ran with one again just a few minutes ago! | 19:28 |
clarkb | I'm happy because it is nice to see all that code review done last week show results. Super exciting to see zuulv5 when it is ready | 19:29 |
clarkb | But ya if you notice weirdness please report it. That information and feedback is useful. | 19:29 |
fungi | also the zuul restart docs have been updated, and include information on dumping some diagnostic data | 19:30 |
clarkb | #topic FIPS testing in our CI system | 19:31 |
clarkb | We're seeing more and more interest in testing software on FIPS enabled systems, particularly for openstack. | 19:31 |
clarkb | The way we've been approaching this is having the jobs install whatever they need for that then enable configs and reboot into the new kernel state | 19:32 |
clarkb | The reason for this is that managing another set of centos-8 and fedora-* images just for FIPS doesn't really scale well, and zuul supports this reboot case just fine | 19:32 |
clarkb | This does present an issue where a lot of jobs set ephemeral state that doesn't survive reboots | 19:33 |
clarkb | for example multinode networking creates ovs networks and those don't come back after a reboot. swift unittesting creates an xfs filesystem that is mounted and not added to fstab | 19:33 |
fungi | i had a random crazy thought about that... what if glean grew the ability to run userdata scripts like cloud-init can, and nodepool supplied those to do things like enable fips and reboot, or tweak kernel parameters to limit available memory and reboot... we wouldn't need separate images then, just separate labels | 19:34 |
clarkb | If people come to you with problems around FIPS testing it is probably a good idea to check for any lost ephemeral state as an early debugging step | 19:34 |
clarkb | fungi: I think we intentionally avoid that stuff because it is really difficult to debug | 19:34 |
clarkb | fungi: the reason that we prefer putting as much logic into zuul as possible is that the info can then be exposed to users easily | 19:34 |
fungi | yeah, fair, only the console log can really provide that info | 19:35 |
clarkb | in fact debugging the swift issue was done with a held node but I did not log into that machine at all and only used job info | 19:35 |
ianw | it would only be a label though to pass fips=1 on the command line? | 19:35 |
ianw | does that not happen via glance options, similar to some of the stuff we set for arm64 images? | 19:36 |
clarkb | ianw: it would be a label with a user script set that installed necessary packages and updated config then did a reboot | 19:36 |
ianw | glance metadata | 19:36 |
clarkb | no it happens via nova metadata and is per instance | 19:36 |
clarkb | debugging that stuff is really difficult | 19:36 |
clarkb | personally I don't think we should go that route for this reason | 19:37 |
fungi | i don't think all hypervisors can alter the kernel command line anyway | 19:37 |
ianw | i did think it was just a switch, that we could have the images ready and boot them in either mode. but i haven't looked at details, obviously | 19:38 |
clarkb | Something else to be aware of is that to address the loss of state people are wanting to modify all the jobs with FIPS enabled flags. I've been -1'ing those and asking for different base jobs instead. The reason for this is zuul-jobs is meant to be a generally reusable library of jobs for all zuul users and adding a bunch of fips flags to generic jobs seems to pollute | 19:38 |
clarkb | that goal | 19:38 |
clarkb | basically have a multinode-fips job instead of a multinode job with a fips flag | 19:38 |
clarkb | The last thing I wanted to call out on this topic is I think we should also try to encourage projects to avoid having two copies of all the jobs to cover fips=1 and fips=0 | 19:39 |
fungi | i can't remember, does multiple inheritance (mix-ins) work? did that ever get added or just talked about? | 19:39 |
clarkb | fungi: there is a way to do it but it is undocumented iirc | 19:40 |
fungi | ahh, so probably discouraged | 19:40 |
corvus | i don't discourage it :) | 19:40 |
clarkb | We should probably encourage fips by default for projects that is important for with the assumption that if it works under fips it will work without fips | 19:40 |
clarkb | and/or targeted fips testing and not attempt to test everything under fips | 19:40 |
ianw | i'm just trying to read ... does the local config require regenerating the initramfs? | 19:42 |
clarkb | ianw: https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/enable-fips/tasks/main.yaml is the current implementation | 19:42 |
clarkb | I do not know what all `fips-mode-setup --enable` does | 19:43 |
clarkb | but I assume it is non trivial if it comes with its own command | 19:43 |
clarkb | I don't think you can change the operating mode without a reboot either | 19:44 |
clarkb | since it changes kernel stuff that can't be modified without rebooting | 19:44 |
ianw | it looks like it's mostly regenerating initramfs, disabling prelink and some sshd_config tweaks | 19:44 |
fungi | apparently you can do it on ubuntu lts too, but need to have a ua subscription | 19:44 |
ianw | i thought prelink was dead anyway, have to investigate that | 19:45 |
clarkb | Anyway the most important thing I wanted to call out was the hint that FIPS related problems are potentially related to losing state in the reboot | 19:46 |
clarkb | since that was overlooked in the swift case for far too long | 19:46 |
clarkb | #topic Open Discussion | 19:47 |
clarkb | I ended up rebooting nb03 to address its weird high load average | 19:47 |
fungi | the implementation detail of swift's functional tests creating and mounting an xfs filesystem on a loop device is easily overlooked anyway | 19:47 |
clarkb | the system itself is happy afterwards but nodepool-builder won't start there due to openshift==0.0.1 being installed | 19:47 |
clarkb | https://review.opendev.org/c/zuul/nodepool/+/816389 should fix that but we wanted to confirm it doesn't make the nodepool image builds very slow before approving it | 19:47 |
ianw | oh, indeed. i'm not sure if we implemented something to add extra wheels to the opendev requirements build, or just talked about it | 19:48 |
frickler | that sounds like we lack some test for that image? | 19:49 |
clarkb | frickler: yup largely because we'd need to figure out arm64 specific jobs for it | 19:49 |
clarkb | it is doable but not as easy as getting it covered with say the dib jobs | 19:49 |
clarkb | ianw: I think your spec for zuul is likely to be the plan for properly solving that problem? | 19:49 |
ianw | it doesn't work on the dib side because devstack doesn't work on arm64 | 19:49 |
clarkb | ah | 19:50 |
ianw | we could do a "does it start" test | 19:51 |
clarkb | ya we would fail that currently and that would be an improvement | 19:51 |
clarkb | won't cover everything but should ensure basic functionality | 19:51 |
fungi | that might be "good enough" for a secondary architecture exercise anyway | 19:52 |
fungi | since if it starts, the rest of its operation is unlikely to differ substantially | 19:52 |
ianw | yeah, not exactly sure what that would look like -- perhaps just start a ZK container and make sure it gets into a listening state? | 19:53 |
ianw | although probably just tox tests would pick this up too? | 19:53 |
clarkb | if we can have it build a simple image but not upload it (do we have a way to force a build without upload?) then we can nodepool dib-image-list it | 19:53 |
ianw | maybe we should just run that in check-arm64? | 19:54 |
clarkb | ianw: you would need to ensure deps are installed the same way when running unittests but that should work too | 19:54 |
clarkb | part of this is an artifact of how we make the qemu arm64 emulated docker image build run in a reasonable time frame | 19:55 |
clarkb | if we just ran tox on arm64 it would probably work | 19:55 |
clarkb | because it would find the sdist for openshift 0.11.2 and install from that. | 19:55 |
fungi | (or fail, more importantly) | 19:55 |
fungi | oh, i see what you mean | 19:55 |
clarkb | well in this case it wouldn't fail | 19:55 |
fungi | yeah | 19:55 |
clarkb | if we reproduced the same install method for unittests then it would fail | 19:55 |
fungi | so tox is not good enough | 19:55 |
ianw | yeah, it's more making sure we're pulling the same wheels etc. in tox | 19:56 |
fungi | in this case, not good enough without setting nonstandard options for pip's dep solver to skip source-only versions anyway | 19:56 |
ianw | this does more-or-less cycle back to the rough spec from our discussions on arm64 wheels + zuul | 19:57 |
clarkb | yup I think if we focus on that we'll largely solve this specific problem | 19:57 |
ianw | #link https://review.opendev.org/c/zuul/zuul/+/815406 | 19:57 |
clarkb | any arm64 testing would be to sanity check that result | 19:57 |
clarkb | and we can possibly rely on unittests then if we have the better arm64 stuff for zuul | 19:58 |
ianw | yeah, so i think calling this out there and making sure we address it is probably the way forward | 19:58 |
clarkb | ++ | 19:58 |
ianw | i can update that for the testing case and we can loop back on it | 19:58 |
clarkb | sounds good. And we are at time | 20:00 |
clarkb | thank you everyone! | 20:00 |
fungi | thanks clarkb! | 20:00 |
clarkb | We'll see you here next week and the week after. but then many of us have a big holiday in three weeks | 20:00 |
clarkb | I expect that I won't be around much during the week of US thanksgiving | 20:00 |
clarkb | #endmeeting | 20:00 |
opendevmeet | Meeting ended Tue Nov 2 20:00:45 2021 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 20:00 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2021/infra.2021-11-02-19.01.html | 20:00 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2021/infra.2021-11-02-19.01.txt | 20:00 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2021/infra.2021-11-02-19.01.log.html | 20:00 |
clarkb | Just a heads up. Feel free to have the meeting without me or skip. I suspect we'll just do a skip | 20:01 |
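
The debugging hint from the FIPS topic (check for ephemeral state lost in the reboot) can be partly automated. The sketch below is a hypothetical helper, not part of zuul-jobs or any OpenDev tooling: it compares a `/proc/mounts`-style table with an `/etc/fstab`-style table and lists block-device mounts that have no fstab entry, i.e. mounts like swift's loop-mounted xfs filesystem that will not come back after a reboot. The sample tables and the `/srv/node/sdb1` path are made up for illustration. It only covers mounts; other ephemeral state, such as the OVS networks set up for multinode jobs, would need separate checks.

```python
from typing import List, Set


def parse_mount_table(text: str) -> List[List[str]]:
    """Split an fstab- or /proc/mounts-style table into whitespace-separated
    fields, skipping blank lines and comments."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rows.append(line.split())
    return rows


def find_unpersisted_mounts(proc_mounts: str, fstab: str) -> List[str]:
    """Return mount points of block-device mounts with no fstab entry.

    Hypothetical helper: such mounts are lost by a reboot like the one
    performed after enabling FIPS mode.
    """
    persisted: Set[str] = {row[1] for row in parse_mount_table(fstab) if len(row) >= 2}
    missing = []
    for row in parse_mount_table(proc_mounts):
        if len(row) < 2:
            continue
        device, mount_point = row[0], row[1]
        # Only consider real block devices; skip proc, sysfs, tmpfs and similar.
        if device.startswith("/dev/") and mount_point not in persisted:
            missing.append(mount_point)
    return missing


# Made-up sample data resembling a test node before the FIPS reboot.
proc_mounts = """
/dev/vda1 / ext4 rw,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
/dev/loop0 /srv/node/sdb1 xfs rw,noatime 0 0
"""
fstab = """
# /etc/fstab
LABEL=cloudimg-rootfs / ext4 defaults 0 1
"""

print(find_unpersisted_mounts(proc_mounts, fstab))  # → ['/srv/node/sdb1']
```

Comparing by mount point rather than device keeps the check working when fstab identifies the root filesystem by label or UUID instead of by device path.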
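
The arm64 nodepool-builder failure from Open Discussion can be illustrated with a toy resolver. Assuming the image build accepts only prebuilt wheels (roughly the effect of pip's `--only-binary :all:`; whether the build uses that exact option is an assumption), the newest acceptable release can be a very old one that happens to ship a wheel, while an ordinary install on the same platform would build the newer sdist. The index below is hypothetical data shaped like the case discussed, and the version comparison is a naive numeric one, not full PEP 440 ordering.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical index: version -> artifact kinds usable on the target platform.
# Shaped like the case in the meeting, not real PyPI data.
INDEX: Dict[str, List[str]] = {
    "0.0.1": ["wheel"],
    "0.11.2": ["sdist"],
}


def version_key(version: str) -> Tuple[int, ...]:
    # Naive numeric ordering; real resolvers implement PEP 440.
    return tuple(int(part) for part in version.split("."))


def pick_version(index: Dict[str, List[str]], binary_only: bool) -> Optional[str]:
    """Return the newest version an installer with these rules would accept."""
    candidates = [v for v, kinds in index.items() if not binary_only or "wheel" in kinds]
    if not candidates:
        return None
    return max(candidates, key=version_key)


print(pick_version(INDEX, binary_only=False))  # → 0.11.2
print(pick_version(INDEX, binary_only=True))   # → 0.0.1
```

This is also why a plain tox run on arm64 would not catch the problem unless it installed dependencies the same way the image build does.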