clarkb | Anyone else here for our meeting? | 19:00 |
clarkb | We will get started momentarily | 19:00 |
ianw | o/ | 19:00 |
fungi | ahoy | 19:00 |
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue Oct 12 19:01:10 2021 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link http://lists.opendev.org/pipermail/service-discuss/2021-October/000288.html Our Agenda | 19:01 |
clarkb | #topic Announcements | 19:01 |
clarkb | I forgot to mention this in the agenda but next week is the PTG | 19:01 |
clarkb | I requested a short amount of time for ourselves largely as office hours that other projects can jump into to discuss stuff with us | 19:01 |
clarkb | I plan to be there and should get an etherpad together today. If the times work out for you feel free to join, otherwise I think we'll have it covered | 19:02 |
clarkb | But also keep in mind that is happening next week and we should avoid changes to meetpad/etherpad if possible | 19:02 |
clarkb | #topic Actions from last meeting | 19:03 |
clarkb | #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-10-05-19.01.txt minutes from last meeting | 19:03 |
clarkb | We had actions last week but they were all related to specs. Let's just jump into specs discussion then :) | 19:03 |
clarkb | #topic Specs | 19:03 |
clarkb | First up I did manage to update the prometheus spec based on feedback on how to run it. Ended up settling on using the built binary to avoid docker weirdness and old versions in distros | 19:04 |
clarkb | #link https://review.opendev.org/c/opendev/infra-specs/+/804122 Prometheus Cacti replacement | 19:04 |
clarkb | corvus brought up a good concern which is that we should ensure that it can run on old old distros and I confirmed it seems to work on xenial | 19:04 |
clarkb | fungi: I was going to work with you to check it on trusty when you have time | 19:04 |
fungi | ahh, yep | 19:05 |
clarkb | I didn't want to touch the remaining trusty node without you being around | 19:05 |
fungi | i should have time tomorrow | 19:05 |
clarkb | Cool I'll ping you tomorrow then. | 19:05 |
clarkb | My plan is to approve this spec if nothing comes up by end of day Thursday for me | 19:05 |
clarkb | if you haven't reviewed the spec and would like to, now is the time to do that | 19:05 |
clarkb | and we can note any trusty problems before landing it too | 19:05 |
clarkb | Next up is the mailman 3 spec | 19:05 |
clarkb | #link https://review.opendev.org/810990 Mailman 3 spec | 19:05 |
clarkb | I've reviewed this and the plan seems straightforward. Essentially spin up a new machine running mm3. Then migrate existing mm2 vhosts into it as users are ready starting with opendev | 19:06 |
clarkb | If other infra-root can review this spec that would be much appreciated | 19:06 |
clarkb | fungi: ^ anything else to add on mailman 3? | 19:07 |
fungi | nah, the migration tools are fairly honed from what i understand, so other than new interfaces and probably some new message formatting in places, users shouldn't really be impacted | 19:07 |
clarkb | thank you for putting that together. I'm excited to be able to use the new frontend | 19:08 |
clarkb | #topic Topics | 19:08 |
clarkb | #topic Improving OpenDev CD throughput | 19:08 |
fungi | i guess the actual steps for the cut-over could stand to be drilled down into a bit, deciding how we want to go about making sure deliveries to the list get queued up while we're copying things over | 19:08 |
fungi | but we can work that out as we get closer | 19:08 |
clarkb | ++ | 19:09 |
clarkb | ianw: you mentioned trying to pick this up again. I think the next step is largely in making that first change in the stack mergeable (by running jobs for it somehow?) | 19:09 |
clarkb | note that corvus pointed out the change in zuul you thought would fix it likely won't as the playbooks aren't changing for those jobs | 19:10 |
ianw | yeah i added an update to a readme | 19:10 |
clarkb | ah ok, I should go back and rereview then | 19:11 |
ianw | (late last night ... and i just realised that file doesn't trigger anything either :) | 19:11 |
ianw | i'll try something else! but yep, i did respond to all comments | 19:11 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/807672 is the first change in the sequence | 19:11 |
clarkb | cool and I'll look at rereviewing things this afternoon | 19:11 |
clarkb | #topic Gerrit account cleanups | 19:13 |
clarkb | This is something that has gone on the back burner with zuul updates, gerrit upgrades, gitea upgrades, openstack releases etc. I haven't forgotten about it and will try and pick it up next week if the PTG is quiet for me (I expect it to be but you never know with an event) | 19:13 |
clarkb | Really just noting that I still intend on getting to this but it is an easy punt because it doesn't typically immediately affect stuff | 19:14 |
clarkb | #topic Gerrit Project Renames | 19:14 |
clarkb | We announced last week that we would rename gerrit projects Friday October 15 at 18:00 UTC | 19:14 |
clarkb | All of our project renaming testing continues to function, so we should largely be mechanically ready for this | 19:14 |
clarkb | The one thing that has been noted is that we need to update project metadata in gitea after renames to update descriptions and urls and storyboard links | 19:15 |
clarkb | I think the easiest way to do that would be to run the gitea project management with the full update flag set after we rename. Either using a subset projects.yaml or just doing it for everything (which could take hours) | 19:15 |
clarkb | fungi: ^ you were thinking about this too, did you have a sense for how you wanted to approach it? | 19:16 |
fungi | it's still not clear to me why we can't update specific projects | 19:16 |
fungi | though i suppose we do need to perform a full run at least once to catch up | 19:16 |
clarkb | fungi: the fundamental issue is that the input to renaming and the input to setting project metadata are different. The rename playbook takes that simple yaml file with old and new names. The project metadata takes projects.yaml | 19:17 |
clarkb | This is why I think it is simpler to do it as two distinct steps. | 19:17 |
fungi | oh, it doesn't filter projects.yaml by specific entries? | 19:17 |
clarkb | no projects.yaml is not referenced at all in the rename process | 19:18 |
clarkb | Then to make things more complicated projects.yaml doesn't get updated until after the rename is done and we merge the associated changes | 19:18 |
fungi | oh, i see, we would normally update projects.yaml after renaming | 19:18 |
clarkb | and since it takes hours to do the force update we don't do those | 19:18 |
fungi | so would need to run the metadata update after that | 19:18 |
clarkb | er we don't automatically do those | 19:18 |
clarkb | yup | 19:19 |
fungi | but we could tell it to only update the projects which had been renamed, once things are up and the projects.yaml update merges, right? | 19:19 |
fungi | rather than telling it to update every project listed in the file | 19:19 |
clarkb | fungi: the current code does not support that. We could hack it in by running the update against an edited projects.yaml. But even that might be complicated since I think the playbook syncs project config directly | 19:20 |
clarkb | this is the bit I was hoping someone would have time to look at | 19:20 |
clarkb | I think it is ok to do the metadata update as a separate step post rename, but ya we should devise a method of making it less expensive | 19:20 |
fungi | ahh, okay, so we need a way to tell the metadata update script to filter its actions to specific project names, i guess | 19:20 |
clarkb | ya that | 19:20 |
fungi | seems like that shouldn't be too hard, we could probably have it use the rename file as the filter | 19:21 |
fungi | i'll see if i can figure out what we need to be able to do that bit | 19:21 |
clarkb | essentially our rename process becomes: run rename playbook, restart things and land project-config updates, ensure zuul is happy, manually run the gitea repo management playbook hopefully not in the most expensive configuration possible | 19:21 |
clarkb | and it is that very last bit that we need someone to look at | 19:22 |
clarkb | thanks! | 19:22 |
fungi | i guess we could even avoid doing a full sync this time by feeding it the historical rename files too | 19:22 |
clarkb | fungi: yup, we could go through and pull out all the names that need updating as a subset of the whole | 19:23 |
clarkb | since we have those records | 19:23 |
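The filtering approach discussed above could look roughly like this minimal Python sketch. Note this is purely illustrative: the key names (`old`/`new` in the rename record, `project` in the projects.yaml entries) are assumptions for the example, not the real opendev/system-config schema.

```python
# Hypothetical sketch: restrict a full projects.yaml-style listing to
# only the projects that were just renamed, using the rename record
# file as the filter. Key names ("old", "new", "project") are assumed
# for illustration and may not match the real system-config layout.

def select_renamed(all_projects, rename_records):
    """Return the subset of project entries whose name appears as a
    'new' name in the rename records."""
    new_names = {record["new"] for record in rename_records}
    return [p for p in all_projects if p["project"] in new_names]

# Inline data standing in for the parsed YAML files:
projects = [
    {"project": "opendev/system-config", "description": "config"},
    {"project": "example/renamed-repo", "description": "moved"},
]
renames = [{"old": "example/old-repo", "new": "example/renamed-repo"}]

print(select_renamed(projects, renames))
```

Feeding the historical rename files in as additional `rename_records`, as suggested above, would just mean concatenating those lists before filtering.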
clarkb | I'll work on an etherpad process doc as well as reviewing the project-config proposal and writing up the record files for this rename tomorrow | 19:23 |
clarkb | Anything else to cover on this topic? | 19:24 |
fungi | i don't think so | 19:25 |
clarkb | #topic Gerrit 3.3 upgrade | 19:25 |
clarkb | This went exceptionally well. I'm still looking around wondering what happened and if it is too good to be true :) | 19:25 |
clarkb | the 2.13 -> 3.2 upgrade has scarred me | 19:25 |
clarkb | I've been trying to add hashtag:gerrit-3.3 to changes related to the upgrade and the cleanup afterwards. Feel free to add this hashtag to your changes too | 19:26 |
clarkb | At this point the major change remaining is the 3.2 image cleanup change. | 19:26 |
fungi | i'm trying to run down what looks like a new comment-related crash in gertty, probably related to the upgrade to 3.3 | 19:27 |
clarkb | Do we have opinions on when we'll feel comfortable dropping the 3.2 images? | 19:27 |
ianw | dropping the jobs won't purge the images from dockerhub though will it? | 19:28 |
clarkb | ianw: it will not, however the images in docker hub will eventually get aged out and deleted on that end | 19:28 |
clarkb | (I forget what the timing on that is with dockerhub's new policy) | 19:28 |
clarkb | https://review.opendev.org/c/opendev/system-config/+/813074 is the change. I'm feeling more and more confident in 3.3 since we haven't had any issues yet that make me think revert | 19:29 |
fungi | iirc, the image ageout has to do with when it last got a download | 19:29 |
clarkb | maybe we go ahead and land it and we can always restore it again later if necessary or just use the existing 3.2 tag until that ages out in docker hub | 19:29 |
ianw | yep; we can always rebuild too, and have local copies. so i think 813074 is probably gtg | 19:29 |
clarkb | fungi: ya and our test jobs for 3.2 download that tag | 19:29 |
clarkb | ianw: ok I've removed the WIP | 19:30 |
fungi | i'm fine dropping them at any time, yeah. it's increasingly unlikely we'd try to roll back at this point | 19:30 |
fungi | we're nearing the 48-hour mark | 19:30 |
clarkb | cool please review the change then :) | 19:30 |
fungi | yeah, i was, but... gertty crashing on one of those changes distracted me ;) | 19:30 |
ianw | the attention set seems to be working | 19:31 |
clarkb | I've also got https://review.opendev.org/c/opendev/system-config/+/813534/ up which is semi related in that the changes to update post upgrade hit a bug where we set the heap limit to 96gb in testing which doesn't work if the jvm tries to allocate memory on an 8gb instance | 19:31 |
clarkb | ianw: ^ I made a new ps on that fixing an issue I caught when writing the followups we talked about yesterday. The followups are there too. I think the whole stack deserves careful review to ensure we don't accidentally remove anything. | 19:31 |
clarkb | oh it just occurred to me that the gerrit -> review group rename needs to be checked against the private group vars. Let me WIP that until that is done | 19:32 |
clarkb | ianw: oh neat it is telling me to review the CD Improvement change since you pushed a new ps | 19:33 |
clarkb | I really think the attention set has the potential for being very powerful, just need to sort out how to make it work for us | 19:33 |
ianw | yeah i've been careful to unclick people when voting, which isn't something that needs attention | 19:33 |
clarkb | The last thing I had on this topic is pointing out that we are already testing the 3.3 -> 3.4 upgrade in CI now :) We should start thinking about scheduling that upgrade next. Probably really look at that post PTG so in 2 weeks? | 19:34 |
clarkb | ianw: using the modify button at the bottom of the comment window thing? | 19:34 |
ianw | i did add a note on that @ | 19:34 |
ianw | #link https://bugs.chromium.org/p/gerrit/issues/detail?id=15154 | 19:34 |
ianw | still every bug seems to go into the polygerrit category there :/ | 19:35 |
ianw | i guess maybe this is polygerrit; i don't know who owns it | 19:35 |
clarkb | ianw: their bug tracker is broken and the normal issue type can't be submitted because no one is assigned to receive notifications for them or some such | 19:36 |
clarkb | so it defaults to polygerrit and you have to hope it gets in front of the right people. But I agree this could be a polygerrit issue | 19:36 |
ianw | clarkb: yep; i think that's going to be the major issue -- if everyone adds your attention when they +1/+2 your change, your attention list becomes less useful | 19:36 |
ianw | also if anyone pops up with dashboard issues see | 19:37 |
ianw | #link https://groups.google.com/g/repo-discuss/c/565rD1Sjiag | 19:37 |
clarkb | might be worth an email to opendev-discuss calling out the modify action and the dashboard stuff | 19:37 |
ianw | basically; /#/dashboard/... doesn't work, /dashboard/... does. unclear if this is a bug or feature | 19:37 |
ianw | sure i can draft something | 19:38 |
clarkb | thanks | 19:38 |
fungi | service-discuss? | 19:38 |
clarkb | fungi: yup sorry. Every other -discuss is name-discuss | 19:39 |
fungi | cool, just making sure you didn't mean some other-discuss | 19:39 |
fungi | (like openstack-discuss) | 19:39 |
clarkb | Thanks again to everyone who helped make this upgrade happen. I think we're in a really good place as far as gerrit goes. We can upgrade with minimal impact and in some cases downgrade. Much of the things we do with gerrit like project creation, project renaming, etc are tested. We even have upgrade testing | 19:40 |
clarkb | Oh and the new server etc | 19:40 |
clarkb | We've come a long way since we were on 2.13 a year ago | 19:40 |
clarkb | #topic Open Discussion | 19:42 |
clarkb | That was it for the agenda. | 19:42 |
clarkb | Anything else? | 19:42 |
ianw | #link https://review.opendev.org/c/opendev/system-config/+/812622 | 19:42 |
ianw | fungi: ^ maybe you could double check i didn't fat finger anything; that's from the lock issues i think we discussed last week | 19:43 |
clarkb | oh for some reason I thought that had landed and noticed we still had the error that it should fix | 19:43 |
clarkb | but I guess it hasn't landed and ++ to reviewing it and getting it in to fix those conflicts | 19:43 |
fungi | checking | 19:43 |
ianw | borg verify lock issues i mean | 19:43 |
fungi | and yeah, i too saw the cron message and thought we had already fixed it, so good to know! | 19:44 |
clarkb | also gitea01 started failing backups again which makes me wonder about networking in vexxhost again :) | 19:44 |
ianw | the recent -devel job failure had me thinking about bridge upgrades too | 19:44 |
ianw | at one time it seemed to be under a lot of pressure for its small size, but i don't recall any issues recently | 19:45 |
clarkb | One tricky thing I remembered about bridge updates is we have ssh rules around bridge connectivity iirc. We may need to spin up a new bridge, then update all the things to talk to it, then swap over? | 19:45 |
clarkb | ianw: the effort to parallelize the CD stuff could have us wanting a bigger server again | 19:45 |
clarkb | ianw: might be a good idea to get that work done first, monitor resource needs and size appropriately for a new server? | 19:45 |
ianw | i thought it might be a good time to start thinking about using zuul-secrets in more places | 19:46 |
clarkb | ianw: to avoid needing the bastion? | 19:47 |
corvus | we could easily >4x the load on bridge before it's a problem. | 19:47 |
clarkb | ya the main issue that seems to affect bridge performance is having leaked ssh connections from ansible runs pile up, which causes system load to grow | 19:49 |
clarkb | when that isn't happening it doesn't need a ton of resources | 19:49 |
ianw | clarkb: yep; it would be a different way of working but I think moves us even more towards a "gitops" model | 19:50 |
clarkb | ya I think my biggest struggle there is still around manually running stuff. It seems super useful to be able to do that when things pop up and zuul doesn't have a great answer to that (yet?) | 19:51 |
corvus | i think the cd story gets a lot better if we can remove the ansible module restrictions for untrusted jobs on the executor (but that's a v5+ idea) | 19:52 |
fungi | and the "leaked" (really indefinitely hung) ansible ssh processes seem to crop up when we have pathological server failures where ssh authentication begins but the login never completes | 19:52 |
fungi | rebooting whatever server is stuck generally clears it up | 19:52 |
clarkb | if we want to hold off a bit on updating bridge with the idea that zuul improves before we need to upgrade I'd be ok with that. But we should probably write down a concrete set of things we can go to zuul about making better for that. The modules thing is another good one | 19:53 |
frickler | before we end, I also want to mention the issue with suds-jurko, which is mostly masked in CI because we have an old wheel built for it | 19:54 |
ianw | clarkb: yep, i agree, but i imagine we could have some sort of "escape-hatch" where we do something like replicate the key and have a way to manually run ansible that provides the decrypted secrets | 19:54 |
clarkb | but also "zuul please run this job now" bypass as sysadmins would also be nice alternative to the escape hatch bridge gives us | 19:54 |
clarkb | ianw: ya that could work too | 19:54 |
fungi | worth noting, we'll be unable to upgrade to the upcoming ansible release until we're on focal, or else we need to use a nondefault python3 with it on bridge | 19:54 |
clarkb | frickler: oh ya thats a good call out | 19:54 |
frickler | so maybe we want to expire wheels after some time or have some other way to check whether they still can be built | 19:54 |
ianw | frickler: ahh, i have a spec for that i think :) | 19:55 |
corvus | i'm unaware of the suds-jurko issue, is there a summary? | 19:55 |
clarkb | we built a wheel for suds-jurko some time ago with older setuptools and have that in our wheel mirror. But suds-jurko doesn't build with current setuptools. This means running openstacky things outside of our CI system is problematic as that package doesn't install | 19:55 |
clarkb | corvus: ^ | 19:55 |
ianw | #link https://review.opendev.org/#/c/703916/ | 19:55 |
frickler | the wheel was built with setuptools < 58, with 58 it fails | 19:55 |
corvus | thx | 19:56 |
fungi | a sort of toctou problem, i guess | 19:56 |
clarkb | and ya doing a fresh wipe of the wheel mirrors periodically would be a good way to expose that stuff in CI | 19:56 |
ianw | that doesn't exactly cover this scenario, but related. basically that we are an append-only copy that grows indefinitely | 19:56 |
frickler | also https://bugs.launchpad.net/cinder/+bug/1946340 | 19:56 |
frickler | we first noticed it with fedora, because we don't seem to have a wheel for py39 | 19:56 |
clarkb | we probably don't need daily rebuilds but weekly or monthly might be a good approach. And solve the indefinite growth problem too | 19:56 |
clarkb | separately suds-jurko has been unmaintained for like 7 years... | 19:57 |
clarkb | and should be replaced too | 19:58 |
fungi | one of half a dozen (at least) dead dependencies which setuptools 58 turned up in various projects i know about | 19:58 |
frickler | yes, in that specific case, there's suds-community as a replacement | 19:58 |
frickler | https://review.opendev.org/c/openstack/requirements/+/813302 lists what fails in u-c | 19:58 |
frickler | with a job that doesn't use the pre-built wheels | 19:59 |
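The breakage described above comes from setuptools 58.0 removing support for the `use_2to3` setup() keyword, which suds-jurko still relies on. A pre-flight check before attempting such a build might look like this sketch (the function and version strings are illustrative assumptions, not part of any real job):

```python
# Hypothetical sketch: decide whether an installed setuptools can
# still build packages that depend on the removed use_2to3 option
# (setuptools 58.0 dropped it, which is what breaks suds-jurko).

def supports_use_2to3(version):
    """Return True for setuptools versions before 58, which still
    honored the use_2to3 setup() keyword."""
    major = int(version.split(".")[0])
    return major < 58

print(supports_use_2to3("57.5.0"))  # True: old enough to build suds-jurko
print(supports_use_2to3("58.2.0"))  # False: use_2to3 was removed
```

In practice a check like this would gate a `pip install 'setuptools<58'` step in the wheel build job, though as noted in the discussion the real fix is moving off dead packages like suds-jurko entirely (e.g. to suds-community).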
clarkb | We are at time. Feel free to continue conversation in #opendev or on the mailing list. | 20:00 |
clarkb | Thank you everyone for listening and participating. We'll probably be around next week since I don't expect a ton of direct PTG involvement. But if that changes I'll try to send an email about it | 20:00 |
clarkb | #endmeeting | 20:00 |
opendevmeet | Meeting ended Tue Oct 12 20:00:49 2021 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 20:00 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2021/infra.2021-10-12-19.01.html | 20:00 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2021/infra.2021-10-12-19.01.txt | 20:00 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2021/infra.2021-10-12-19.01.log.html | 20:00 |
fungi | thanks clarkb! | 20:01 |