clarkb | meeting time | 19:00 |
clarkb | I'm a bit behind due to the docker stuff. Please excuse my lack of organization today | 19:00 |
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue Mar 14 19:01:16 2023 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/YZXXWZ7LB3KEF3AMJV3WIPFKCGH2IA2O/ Our Agenda | 19:01 |
clarkb | #topic Announcements | 19:02 |
clarkb | Daving saving time has gone into effect for some of us. Heads up that it will go into effect for others in 2-3 weeks as well | 19:02 |
clarkb | heh I can't type either. *Daylight saving time | 19:02 |
fungi | i favor switching to daving savelights time | 19:03 |
clarkb | Our meeting doesn't change the time it occurs at. It remains at 19:00 UTC but this time may have shifted relative to your local timezone due to the time change | 19:03 |
clarkb | OpenStack is making its 2023.1/Antelope release next week. That should occur on a wednesday so roughly 8 days from now | 19:03 |
fungi | yeah, "festivities" will likely start around 09:00 utc | 19:04 |
fungi | maybe a bit later | 19:04 |
fungi | release notes jobs in the tag pipeline will need about 8-10 hours due to serialization | 19:04 |
fungi | would love to work out a better option than that semaphore at some point | 19:05 |
clarkb | it's only there to prevent errors that aren't actually fatal in the docs jobs right? | 19:05 |
clarkb | I mean you could just remove the semaphore and tell them to validate docs publication? | 19:05 |
clarkb | or maybe I'm confusing issues and there is a more important reason to have the semaphore | 19:05 |
fungi | well, it's there to solve when someone approves release requests for several branches of the same project and they race uploads of the release notes and one regresses the others | 19:06 |
fungi | because all branches share the same tree in afs | 19:06 |
fungi | so they need a per-project semaphore, which doesn't really exist (without defining a separate one for each of hundreds of repos) | 19:06 |
clarkb | aha, could possibly remove the semaphore temporarily for the release since only that one branch should be getting releases on that day? | 19:07 |
fungi | possible, i'll bring it up with them | 19:07 |
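For context, a Zuul semaphore like the one fungi describes is declared once and then referenced from the job definition; it caps how many builds of that job can run at the same time. A minimal sketch, with illustrative names rather than the actual openstack release job config:

```yaml
# Declare a semaphore that allows only one concurrent holder.
- semaphore:
    name: releasenotes-publish
    max: 1

# Any job referencing the semaphore waits its turn, which is what
# serializes the release notes publication described above.
- job:
    name: publish-release-notes
    semaphore: releasenotes-publish
```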
clarkb | The week after next the virtual PTG will be taking place | 19:08 |
clarkb | And that was it for announcements | 19:09 |
clarkb | #topic Bastion Host Changes | 19:09 |
clarkb | ianw: are you here? I was hoping we'd be able to decide on whether or not we are proceeding with the backups stack. | 19:09 |
clarkb | #link https://review.opendev.org/q/topic:bridge-backups | 19:09 |
ianw | yes :) | 19:09 |
clarkb | It looks like you may need a second reviewer? Something we should probably do in this case since we need multiple people to stash keys? | 19:10 |
clarkb | Any volunteers for second reviews? | 19:10 |
ianw | yes, and probably a few people to say they're on board with holding a secret for it, otherwise it's not going to work | 19:10 |
fungi | i can try to take a look, and am happy to safeguard a piece of the key | 19:11 |
clarkb | I'm happy to stash the bits into my keepassxc db | 19:11 |
ianw | ok, well if fungi can take a look we can move one way or the other | 19:12 |
clarkb | fungi: thanks! I think thats the next step then. Get a second review and assuming review is happy make a plan to distribute the right key bits | 19:12 |
clarkb | anything else bridge related? | 19:12 |
ianw | nope, not for now | 19:13 |
clarkb | #topic Mailman 3 | 19:13 |
clarkb | fungi: I haven't seen anything new here, but want to make sure I didn't miss anything | 19:14 |
corvus | i'm on board for being a (partial) keymaster | 19:15 |
fungi | yeah, i got very close to picking it back up today, before things started to get exciting again | 19:16 |
fungi | so nothing new to share yet | 19:16 |
clarkb | yes boring would be nice occasionally | 19:16 |
fungi | vinz clortho, keymaster of gozer | 19:16 |
clarkb | #topic Gerrit Updates | 19:16 |
clarkb | ianw's stack of copyCondition and submit requirements changes has landed as has the manual update to All-Projects for submit requirements | 19:17 |
clarkb | We did run into some problems with the All-Projects update because 'and' and 'AND' are different in Gerrit 3.6 query expressions | 19:17 |
clarkb | But that got sorted out and I think things have been happy since (at least no new complaints since then) | 19:17 |
fungi | but not in 3.7. that seems like an unfortunate choice of fix not to backport | 19:17 |
clarkb | ianw: from your work on this are there other ACL updates you think we need to make or are we all up to date for modern Gerrit 3.7 expectations? | 19:18 |
ianw | nope, i think we're ready for the 3.7 transition from that POV now | 19:18 |
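For readers unfamiliar with the ACL work being discussed: Gerrit 3.6+ replaces label functions with copyCondition and submit-requirement stanzas in project.config. A hedged sketch of what such stanzas look like (label names and conditions are illustrative, not the exact OpenDev All-Projects content); the uppercase AND operator is the form that tripped things up on 3.6 as mentioned above:

```ini
[label "Code-Review"]
    copyCondition = changekind:NO_CODE_CHANGE OR changekind:TRIVIAL_REBASE OR is:MIN

[submit-requirement "Code-Review"]
    description = A maximum Code-Review vote and no veto vote is required
    submittableIf = label:Code-Review=MAX AND -label:Code-Review=MIN
    canOverrideInChildProjects = true
```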
ianw | i will spend a little time updating https://etherpad.opendev.org/p/gerrit-upgrade-3.7 today | 19:19 |
clarkb | great! | 19:19 |
ianw | a couple of things to check, but i think all known knowns and known unknowns are dealt with :) | 19:19 |
clarkb | ianw: as far as ensuring we don't slide backwards goes can we update the little checker tool to only allow function = NoBlock and require copyCondition not the old thing? | 19:19 |
clarkb | I think if we do those two things it will prevent any cargo culting of old info accidentally | 19:20 |
ianw | oh yes, sorry that's on my todo list. the snag i hit was that the normalizer isn't really a linter in the way of a normal linter, but a transformer, and then if there's a diff it stops | 19:20 |
clarkb | ya in that case maybe just delete the lines we don't want which will produce a diff | 19:20 |
clarkb | and hopefully that diff is clear that we don't want those lines because they are removed (don't need to replace them with an equivalent as that would be more effort) | 19:20 |
ianw | i guess the problem is that that then creates a diff that is wrong | 19:21 |
ianw | i wasn't sure if the point was that you could apply the diff | 19:21 |
ianw | if so, it kind of implies writing a complete function -> s-r transformer | 19:21 |
clarkb | I think the idea was the diff would help people correct their changes and bonus points if you could directly apply it | 19:21 |
clarkb | in this case I think it is ok if we have a diff that isn't going to completely fix changes for people and simply force an error and pull the eye to where the problem is | 19:22 |
ianw | i could do something like add a comment line "# the following line is deprecated, work around it"? | 19:22 |
clarkb | ++ | 19:22 |
ianw | ok, i'll do that then | 19:22 |
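To illustrate the behaviour agreed on above, here is a small hypothetical sketch of a checker that inserts a warning comment above deprecated `function =` lines instead of trying to rewrite them into submit requirements; the regex and output format are assumptions, not the actual normalizer tool:

```python
#!/usr/bin/env python3
"""Annotate deprecated Gerrit label functions in an ACL file."""
import re
import sys

# Flag any label function other than NoBlock; those are the deprecated ones.
DEPRECATED = re.compile(r"^\s*function\s*=\s*(?!NoBlock\b)", re.IGNORECASE)


def annotate(lines):
    out = []
    for line in lines:
        if DEPRECATED.match(line):
            out.append("# DEPRECATED: label functions other than NoBlock are "
                       "replaced by submit requirements in Gerrit 3.6+\n")
        out.append(line)
    return out


if __name__ == "__main__":
    with open(sys.argv[1]) as acl:
        sys.stdout.writelines(annotate(acl.readlines()))
```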
clarkb | #topic Project Renames and Gerrit Upgrade | 19:24 |
clarkb | Quick check if we think we are still on track for an April 7th upgrade of Gerrit and project renames | 19:24 |
clarkb | I think the only concern that has come up is the docker org deletion on the 14th | 19:25 |
clarkb | mostly worried that will demand our time and we won't be able to prep for gerrit things appropriately. But it is probably too early to cancel or move the date for that. Mostly bringing it up as a risk | 19:25 |
clarkb | And then I wanted to talk about the coordination of that. Do we want to do the renames and upgrade in one window or two separate windows? And what sorts of times are we looking at? | 19:26 |
clarkb | ianw: I think you were thinking of doing the Gerrit upgrade late April 6 UTC or early April 7 UTC? Then maybe fungi and I do the renames during our working hours April 7 if we do two different windows | 19:27 |
clarkb | If we do one window I can be around to do it all late April 6 early April 7 but I think that gets more difficult for fungi | 19:28 |
ianw | i guess question 1 is do we want renames or upgrade first? | 19:28 |
corvus | i agree it's worth keeping an eye on, and if anyone feels overburdened, raise a flag and we can slow down or deal with it. but from right now at least, i think we can work on both. | 19:28 |
fungi | i can swing it | 19:29 |
fungi | i just can't do the week after as i'll be offline | 19:29 |
clarkb | ianw: I think one reason to do renames first would be if we had previously done renames under that gerrit version. But we have never renamed anything under 3.6 so order doesn't matter much | 19:29 |
clarkb | fungi: ianw ok in that case maybe aim for ~2200-2300 UTC April 6 and do both of them? | 19:30 |
clarkb | and we can sort out the order another time if we're committing to a single block like that | 19:30 |
ianw | ok, if that's a bit late we could move it forward a few hours too | 19:31 |
fungi | wfm | 19:31 |
clarkb | Ok with that decided (lets say 2200 UTC to make it a bit easier for fungi) should we send email about that now? | 19:31 |
clarkb | for some value of now approximately equal to soon | 19:32 |
ianw | ++ | 19:32 |
clarkb | I can do that I just want to make sure we're reasonably confident first | 19:32 |
clarkb | cool I'll add that to my todo list | 19:32 |
fungi | thanks! | 19:32 |
ianw | i am happy to drive, and we'll have checklists, so hopefully it's really just don't be drunk at that time in case the worst happens :) | 19:32 |
clarkb | haha | 19:32 |
ianw | or maybe, get drunk, in case the worst happens. either way :) | 19:32 |
clarkb | #topic Old Server Upgrades | 19:33 |
clarkb | Much progress has been made with the giteas. | 19:33 |
clarkb | As of Friday we're entirely jammy for the gitea cluster in production behind the load balancer | 19:33 |
clarkb | I have changes up to clean up gitea01-04 but have WIP'd them because I think the openstack release tends to be a high load scenario for the giteas and that is a good sanity check we won't need those servers before deleting them | 19:34 |
clarkb | I'll basically aim to keep replicating to the gitea01-04 backends until after the openstack release and if all looks well after that clean them up | 19:34 |
fungi | yeah, especially when some of the deployment projects update and all their users start pulling the new release at the same time | 19:35 |
clarkb | there are two reasons for the caution here. The first is that we've changed the flavor type for the new servers and we've seen some high cpu steal at times. But those flavors are bigger on more modern cpus so in theory will be quicker anyway, so I've reduced the gitea backend count from 8 to 6 | 19:35 |
clarkb | so far though those new servers have looked ok | 19:35 |
clarkb | just want to keep an eye out through the release before making the cleanup more permanent | 19:35 |
clarkb | ianw has also started looking at nameserver replacements | 19:36 |
clarkb | #link https://etherpad.opendev.org/p/2023-opendev-dns | 19:36 |
clarkb | #link https://review.opendev.org/q/topic:jammy-dns | 19:36 |
clarkb | good news the docker stuff doesn't affect dns :) | 19:36 |
ianw | yep sorry got totally distracted on that, but will update all that now that we've got consensus on the names | 19:36 |
fungi | thanks for working on it | 19:36 |
clarkb | Ya this is all good progress. Still more work to do including etherpad which I had previously planned to do after the PTG | 19:37 |
clarkb | its possible to get it done quickly pre ptg but the ptg relies on etherpad so much I'd kinda prefer changing things after | 19:37 |
clarkb | jitsi meet as well | 19:37 |
corvus | clarkb: the gitea graphs look good. qq (i hope it's quick, if not, nevermind and we can take it offline) -- what happened between march 7-9 -- maybe we had fewer new servers and then added more? | 19:37 |
clarkb | corvus: yes, we had 4 new servers and we also got hit by a bot crawler that was acting like a 2014 samsung phone | 19:38 |
clarkb | corvus: we addressed that by updating our user agent block list to block the ancient phone and added two more servers for a total of 6 | 19:38 |
clarkb | I thought we might get away with 4 servers instead of 8 but that incident showed that was probably too small | 19:39 |
fungi | so the issue was twofold: a bad actor and fewer backends | 19:39 |
corvus | cool; thanks | 19:39 |
fungi | it noticeably slowed down response times for clients too | 19:39 |
fungi | while that was going on | 19:39 |
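As a concrete illustration of the user agent block described above, assuming an Apache reverse proxy sits in front of the gitea backends (the directive names are real Apache 2.4, but the UA string and location are hypothetical, not the actual OpenDev rule):

```apache
# Tag requests whose User-Agent looks like the old phone browser, then deny them.
SetEnvIfNoCase User-Agent "SM-G900" blocked_client
<Location "/">
    <RequireAll>
        Require all granted
        Require not env blocked_client
    </RequireAll>
</Location>
```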
clarkb | If I get time this week or next I'll probably try to do a server or two that the ptg doesn't interact with (mirror nodes maybe?) | 19:40 |
clarkb | anyway measurable progress here. Thanks for all the help | 19:40 |
clarkb | #topic AFS volume quotas and utilization | 19:40 |
clarkb | Last week I bumped AFS quotas for the volumes that were very close to the limit | 19:41 |
clarkb | That avoided breaking any of those distro repo mirrors which is great. But doesn't address the ever-growing disk utilization problem | 19:41 |
clarkb | also it looks like deleting fedora 35 and adding fedora 37 resulted in a net increase of disk utilization | 19:41 |
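For reference, quota checks and bumps like the ones mentioned above are done with the OpenAFS client tools; a minimal sketch, where the path and size are illustrative rather than the actual volumes touched last week:

```sh
# Show usage vs quota for a mirror volume
fs listquota -path /afs/openstack.org/mirror/ubuntu
# Raise the quota (value is in 1K blocks); requires admin AFS tokens
fs setquota -path /afs/openstack.org/mirror/ubuntu -max 650000000
```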
ianw | i should be able to remove 36 fairly quickly | 19:42 |
clarkb | I did poke around looking for some easy wins deleting things (something that has worked well in the past) and didn't really come up with any other than: Maybe we can drop the openeuler mirror and force them to pull from upstream like we do with rocky? | 19:42 |
clarkb | ianw: oh thats good to know | 19:42 |
fungi | there's also a debian release coming up which we'll probably need at least a temporary bump in capacity for before we can drop old-oldstable | 19:42 |
clarkb | Maybe lets get that done before making any afs decisions. The other idea I had was we should maybe consider adding a new backing volume to the two dfw fileservers | 19:42 |
clarkb | I don't think this is urgent as long as we are not adding new stuff (debian will force the issue when that happens) | 19:44 |
clarkb | I guess start with fedora 36 cleanup then evaluate what is necessary to add new debian content | 19:44 |
fungi | worth trying to find out if debian-buster images are still heavily used, or transition them off our mirroring if they are infrequently used but unlikely to get dropped from use soon | 19:44 |
fungi | in order to free up room for debian-bookworm in a few months | 19:44 |
clarkb | fungi: ya that's an option. Can also make buster talk to upstream if infrequently used | 19:45 |
ianw | you can't have one *volume* > 2tb right (that was pypi's issue?) | 19:45 |
clarkb | but keep the images | 19:45 |
clarkb | ianw: correct | 19:45 |
clarkb | ianw: we can add up to 12 cinder volumes each a max of 1TB (these are cloud limitations) to the lvm on the fileservers so we are well under total afs disk potential | 19:45 |
fungi | yeah, that's what i meant by transition off our mirroring | 19:45 |
clarkb | but then an individual afs volume can't be more than 2TB | 19:46 |
fungi | but also the more cinder devices we attach, the more precarious the server becomes | 19:46 |
ianw | i guess the only problem is if those screw up, it becomes increasingly difficult to recover | 19:47 |
ianw | heh, jinx | 19:47 |
corvus | we can add more servers | 19:47 |
clarkb | ya and also just general risk of an outage | 19:47 |
fungi | it basically multiplies the chances of the server suffering a catastrophic failure from an iscsi incident | 19:47 |
fungi | right, more afs servers with different rw volumes may be more robust than adding more storage to one server | 19:48 |
corvus | (doesn't affect our overall chances of being hit by an iscsi incident, but may contain the fallout and make it easier to recover) | 19:48 |
fungi | the risk of *an* outage doesn't decrease, but the impact of an outage for a single device or server decreases to just the volumes served from it | 19:48 |
clarkb | corvus: does growing vicepa require services be stopped? | 19:49 |
ianw | we also add everything under vicepa -- we could use other partitions? | 19:49 |
clarkb | if so that may be another good reason to use new servers | 19:49 |
clarkb | ianw: heh jinx. I'm not sure what the mechanics of the underlying data are like and whether or not one approach should be preferred | 19:50 |
fungi | also, vos release performance may improve, since we effectively serialize those today with the assumption that otherwise we'll overwhelm the one server with the rw volumes | 19:50 |
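A sketch of the "attach another cinder volume and grow the vicepa filesystem" option being weighed here, since the fileservers already back /vicepa with LVM per the discussion above; device, volume group, and logical volume names are assumptions:

```sh
pvcreate /dev/xvdf                     # initialize the newly attached cinder volume
vgextend main /dev/xvdf                # add it to the existing volume group
lvextend -l +100%FREE main/vicepa      # grow the logical volume backing /vicepa
resize2fs /dev/main/vicepa             # ext4 can be grown online, consistent with corvus's point below
```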
clarkb | we've only got 10 minutes left and there are a couple of other things I wanted to discuss. Lets keep afs in mind and we can brainstorm ideas going forward but it isn't urgent today | 19:50 |
clarkb | more of a mid term thing | 19:51 |
clarkb | #topic Quo vadis Storyboard | 19:51 |
corvus | we're not using raw partitions, we're using ext filesystems, so i don't think anything needs to be stopped to grow it, but i'm not positive on that. | 19:51 |
clarkb | corvus: ack | 19:51 |
clarkb | frickler won't be able to attend this meeting today but made a good point that with the PTG coming up there may be discussions from projects about not using storyboard anymore | 19:51 |
clarkb | I mentioned in #opendev that I think we should continue to encourage those groups to work together and coordinate any tooling they might produce so that we don't have duplicated efforts | 19:52 |
clarkb | But does leave open the question for what we should do. I also mentioned in #opendev that if I was a lone person making a decision I think I'd look at sunsetting storyboard since we haven't been able to effectively operate/upgrade/maintain it | 19:52 |
clarkb | with an ideal sunset involving more than 30 days notice and if we can make some sort of read only archive that is easier to manage | 19:53 |
clarkb | That said I don't think I should make decisions like that alone so am open to feedback and other ideas | 19:53 |
clarkb | I'm also happy to jump into ptg sessions that involve storyboard to try and help where I can during the ptg | 19:53 |
clarkb | Maybe y'all can digest those ideas and let me know if they make sense or are terrible or have better ones :) | 19:54 |
clarkb | Definitely not something we have time for today or in this meeting. But the feedback would be helpful | 19:54 |
ianw | perhaps sunsetting it would be the push someone needs to dedicate resources on it? | 19:54 |
clarkb | ianw: its possible | 19:54 |
clarkb | I think that is unlikely but it is a theoretical outcome | 19:55 |
ianw | either way something happens then, i guess | 19:55 |
clarkb | ok running out of time and one more item remains | 19:55 |
clarkb | this is not on the agenda but worth bringing up | 19:55 |
clarkb | #topic Docker ending free team organizations | 19:55 |
fungi | because people will ask about it anyway ;) | 19:55 |
clarkb | Docker is ending their free team organization setup which we use for opendevorg and zuul on docker hub | 19:56 |
clarkb | (there are actually two other orgs openstackinfra and stackforge which are unused and empty) | 19:56 |
clarkb | This will affect us one way or another and we are very likely going to need to make changes | 19:56 |
clarkb | It isn't clear yet which changes we will need to make and of the options which we should take but I started an etherpad to collect info and try to make that decision making easier | 19:56 |
clarkb | #link https://etherpad.opendev.org/p/MJTzrNTDMFyEUxi1ReSo | 19:57 |
clarkb | I think we should continue to gather information and collect ideas there for the next day or two without trying to attribute too much value to any of them. Then once we have a good clear picture make some decisions | 19:57 |
corvus | one point it would be useful to clarify is whether it's possible, and if so how, we can have an unpaid organization on quay.io to host our public images. quay.io says that's possible, but i only see a $15/mo developer option on the pricing page, and account signup requires a phone number. | 19:57 |
clarkb | If you sign up with a phone number and get what you need I'm happy to sacrifice mine | 19:58 |
clarkb | ianw: ^ maybe that is something you can ask about at red hat? | 19:58 |
clarkb | basically clarify what account setup requirements are and if public open source projects need to pay for public image hosting | 19:58 |
corvus | (i'd be happy to sign up to find out too, except that i wear a lot of hats, and if i only get one phone number, i don't know if i should burn it for "opendev" "zuul" or "acme gating"...) | 19:59 |
ianw | i can certainly look into it -- off the top of my head i don't know anyone directly involved for instant answers but i'll see what i can find | 19:59 |
clarkb | ianw: thanks! | 19:59 |
corvus | (or maybe it's okay to have two accounts with the same phone number.. <shrug>) | 19:59 |
clarkb | Also NeilHanlon (Rocky Linux) and Ramereth (OSUOSL) have similar issues/concerns with this and we may be able to learn from each other. They have both applied for docker's open source program which is apparently one way around this | 20:00 |
clarkb | I asked them to provide us with info on how that goes just so that we've got it and can weigh that option | 20:00 |
fungi | or at least someone on twitter with the same name as a c-level exec at docker claimed that they won't delete teams who apply for the open source tier | 20:01 |
* fungi takes nothing for granted these days | 20:01 |
clarkb | yes, they also say they won't allow names to be reused which means if/when we get our orgs deleted others shouldn't be able to impersonate us | 20:01 |
clarkb | this is important because docker clients default to dockerhub if you don't qualify the image names with a location | 20:02 |
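To make that default concrete: an unqualified image name resolves to Docker Hub, while a fully qualified name pins the registry. The image name below is chosen for illustration and the quay.io path is hypothetical (no such mirror exists yet):

```sh
# Unqualified, implicitly docker.io (Docker Hub):
docker pull opendevorg/gerrit:3.6
# Fully qualified, explicitly targeting another registry:
docker pull quay.io/opendevorg/gerrit:3.6
```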
clarkb | and we are at time. I can smell lunch too so we'll end here :) | 20:02 |
clarkb | Thank you everyone! | 20:02 |
clarkb | #endmeeting | 20:02 |
opendevmeet | Meeting ended Tue Mar 14 20:02:53 2023 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 20:02 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2023/infra.2023-03-14-19.01.html | 20:02 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2023/infra.2023-03-14-19.01.txt | 20:02 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2023/infra.2023-03-14-19.01.log.html | 20:02 |
corvus | fungi: good point; technically someone on github who says they're at docker said there won't be namespace takeover issues... but they don't actually have any flair that would confirm their position... | 20:03 |
fungi | oof | 20:04 |