clarkb | anyone else here for the meeting? | 19:00 |
---|---|---|
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue Aug 10 19:01:10 2021 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link http://lists.opendev.org/pipermail/service-discuss/2021-August/000273.html Our Agenda | 19:01 |
clarkb | #topic Announcements | 19:01 |
clarkb | I had none. Let's just jump right into the meeting proper | 19:01 |
clarkb | #topic Actions from last meeting | 19:01 |
clarkb | #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-08-03-19.01.txt minutes from last meeting | 19:01 |
clarkb | I did manage to get around to writing up the start of a prometheus spec yesterday and today | 19:02 |
clarkb | #link https://review.opendev.org/c/opendev/infra-specs/+/804122 Prometheus spec | 19:02 |
clarkb | This is still quite high level as I haven't run one locally but did read a fair bit of documentation yesterday | 19:02 |
clarkb | I think in this case we don't need to have a bunch of specifics sorted out early because we can run this side by side with cacti while we sort it out and make it do what we want | 19:02 |
corvus | ++ | 19:03 |
fungi | yeah, i'm in favor of feeling it out once it's running | 19:03 |
clarkb | Then as noted in the spec we can run it for a month or so and compare data between cacti and prometheus before shutting down cacti | 19:03 |
corvus | i will review the spec asap | 19:03 |
clarkb | I think it captures the important bits, but I'm happy for feedback and will update it appropriately | 19:03 |
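As context for the side-by-side comparison clarkb proposes, a minimal sketch of pulling a range of samples from Prometheus's HTTP query API so they can be eyeballed against the cacti graphs. The server URL and metric name below are placeholders, not anything specified in the spec:

```python
import json
import time
import urllib.parse
import urllib.request

# Placeholder URL; the spec does not name the eventual server.
PROM = "http://prometheus.opendev.org:9090"

def query_range(expr, hours=24, step="300s"):
    """Fetch a time series from Prometheus's documented query_range API."""
    end = int(time.time())
    params = urllib.parse.urlencode({
        "query": expr,
        "start": end - hours * 3600,
        "end": end,
        "step": step,
    })
    with urllib.request.urlopen(f"{PROM}/api/v1/query_range?{params}") as resp:
        return json.load(resp)["data"]["result"]

# Example: compare 24h of load average against the matching cacti graph.
# for series in query_range("node_load1"):
#     print(series["metric"], series["values"][:3])
```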
fungi | i need to read it still, was there any treatment of how we might import our historical data, or just keep the old graphs around? | 19:03 |
clarkb | fungi: no, I think that is well beyond the scope of that spec | 19:04 |
fungi | got it, thanks | 19:04 |
clarkb | you'd need to write an rrd to tsdb conversion tool which may exist? | 19:04 |
clarkb | https://groups.google.com/g/opentsdb/c/H7t-WPY11Ro | 19:05 |
fungi | yeah, or may be as simple as plugging a couple of python libraries into one another | 19:05 |
clarkb | if someone wants to work on that during that side by side period it should definitely be possible | 19:05 |
clarkb | but I'm not sure it is critical? | 19:05 |
fungi | right, it's something else we'll want to figure out as a group | 19:05 |
corvus | i'd vote for just keeping cacti around for many months until we don't care | 19:05 |
clarkb | corvus: ya that was sort of what I was thinking | 19:05 |
fungi | certainly one option | 19:05 |
clarkb | basically keep cacti around to ensure the data we have in prometheus is at least as accurate as cacti then when ready delete cacti | 19:06 |
clarkb | the spec says we can do that after a month but happy to update that to be more flexible | 19:06 |
fungi | depends on how much we value being able to compare against older trending (and how much older) | 19:06 |
corvus | if there's a security reason we can't keep cacti up, we could still keep it around but firewall it | 19:06 |
corvus | so that if we need to look at old data, it's possible (if not easy) | 19:06 |
fungi | anyway, all things we can hash out later | 19:07 |
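For reference, the rrd-to-tsdb conversion fungi and clarkb speculate about could be roughly this simple. This is a hypothetical sketch, assuming rrdtool is installed, the RRD holds a single data source, and an OpenTSDB instance (per the thread linked above) is the target; all names and URLs are illustrative, and batching for large archives is omitted:

```python
import json
import subprocess
import urllib.request

def rrd_to_opentsdb(rrd_path, metric, tags,
                    tsdb_url="http://localhost:4242/api/put"):
    # 'rrdtool fetch' prints "timestamp: value" lines for the AVERAGE RRA.
    out = subprocess.check_output(
        ["rrdtool", "fetch", rrd_path, "AVERAGE"], text=True)
    points = []
    for line in out.splitlines():
        if ":" not in line:
            continue  # skip the header line listing data source names
        ts, _, rest = line.partition(":")
        values = rest.split()
        if not values or values[0] in ("nan", "-nan"):
            continue  # unknown samples in the RRD
        points.append({"metric": metric, "timestamp": int(ts),
                       "value": float(values[0]), "tags": tags})
    # OpenTSDB's documented HTTP ingest endpoint accepts a JSON list.
    req = urllib.request.Request(
        tsdb_url, data=json.dumps(points).encode(),
        headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)

# Example (hypothetical paths/names):
# rrd_to_opentsdb("load.rrd", "cacti.load", {"host": "review01"})
```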
clarkb | #topic Topics | 19:07 |
clarkb | Ya let's hash it out in the spec review :) | 19:07 |
clarkb | #topic Service Coordinator Election | 19:07 |
clarkb | The end of today, UTC, is the end of the service coordinator nomination period | 19:07 |
clarkb | I've not seen anyone volunteer yet :P | 19:08 |
clarkb | I'll keep doing it if no one else wants to do it, but definitely think someone else should do it | 19:08 |
clarkb | Anyway this is your reminder of that deadline. Please do volunteer if you are interested | 19:09 |
fungi | i can volunteer if you really need to step down, but i'm not sure another openinfra foundation staff member is a better choice. as things are, it's a struggle to explain that opendev is a community rather than a service run by the foundation | 19:09 |
fungi | (hard to explain that to the rest of the foundation staff as much as to the public) | 19:09 |
clarkb | I think for me it would be nice to be able to focus on more of the technical details of upgrading services and running new services, etc. But I agree that is also a struggle | 19:10 |
clarkb | and I think having someone else do it can be good for a shift in approach/perspective | 19:10 |
fungi | from a sustainability perspective, it would be nice to have an option other than foundation employees | 19:10 |
clarkb | #topic Review Upgrades | 19:12 |
clarkb | I believe old server cleanups have happened. Thank you ianw again for doing a bunch of the work on this | 19:13 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/803374 Clean up old mysql gerrit stuff | 19:13 |
ianw | yep all done | 19:13 |
clarkb | That removes the mysql connector from our images as well as support for h2 and mysql from the gerrit role in system-config | 19:13 |
clarkb | at this point I think we are good to move forward on landing that as there haven't been problems with prod since the mariadb switch | 19:13 |
fungi | neatly wrapped up! | 19:13 |
fungi | i agree | 19:13 |
ianw | the only thing left on the cleanup list is "decide on sshfp records" | 19:14 |
clarkb | our options are to have no sshfp records, or to publish records only for the port 29418 gerrit sshd on review.o.o and the port 22 sshd on review02.o.o ? | 19:15 |
ianw | personally i think we generally want to access ssh on port 22 & 29418 @ review.opendev.org so that is in conflict with choosing one for sshfp records | 19:15 |
clarkb | fwiw I've been trying to train myself to ssh to the actual host fqdn when using port 22 and use review.o.o for 29418 | 19:15 |
fungi | i'm okay leaving it as-is, but it's inconsistent with how we handle sshfp records for admin access to our other servers | 19:15 |
clarkb | but ya I'm not doing any sshfp verification from my client as far as I know | 19:16 |
clarkb | I'm happy to leave it as is with the comment in the zone file about why this host is different | 19:16 |
fungi | on the other hand, if we do have a review02.opendev.org-only sshfp record then it wouldn't directly conflict with anything, we'd just need to separate the address records and not use a cname for that | 19:16 |
ianw | at the time i was thinking also things like zuul want review02 as the ssh target | 19:17 |
ianw | but that turned out to not work so well | 19:17 |
ianw | (gerrit ssh port target i mean) | 19:17 |
fungi | another option would be to switch openssh to using the same host key as the gerrit service. it's the only service running there, so i'm not super concerned that someone might get ahold of the api hostkey and use that to take control of the underlying operating system; if they get that first bit then the whole server is already sunk really | 19:17 |
fungi | it's not as if there's anything else to protect which the gerrit service doesn't have access to | 19:18 |
clarkb | that is an interesting idea. I hadn't considered that before. It would make distinguishing between gerrit hosts a bit more fuzzy, but would simplify sshfp records | 19:18 |
fungi | yeah, i guess it's the transitional gerrit server replacement period when there are two running which is the real issue | 19:19 |
ianw | hrm, i'm not sure we have any ansible logic for writing out host keys on base servers though | 19:19 |
clarkb | I don't feel strongly about any of the options fwiw. I'm happy with the current situation but have also started trying to train myself when ssh'ing to use the actual host fqdn which falls in line with the old sshfp setup | 19:19 |
clarkb | ianw: ya we don't | 19:19 |
fungi | right, my takeaway is that all the solutions are fairly complex and have their own distinct downsides, so i'm good with the option requiring the least work (that is, to be clear, just leaving it how it's configured now) | 19:20 |
ianw | i think we're all ok with no records and a comment why, which is the status quo | 19:20 |
ianw | all right, decided. i'll cross that off the list and so other than that cleanup change, i think this is done! | 19:21 |
fungi | the split record solution was elegant enough until we had to reason about server replacements | 19:21 |
fungi | thanks! | 19:21 |
clarkb | ianw: ++ we can always reevaluate if some reason to have the sshfp records pops up | 19:21 |
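For reference, the SSHFP records under discussion are just hashes of a host's public key blobs; this sketch reproduces roughly what `ssh-keygen -r` emits (the hostname and key path in the example are illustrative, not opendev's actual layout):

```python
import base64
import hashlib

# Key-type names from the .pub line mapped to SSHFP algorithm numbers
# (RFC 4255 / RFC 7479).
ALGO = {"ssh-rsa": 1, "ssh-dss": 2, "ecdsa-sha2-nistp256": 3, "ssh-ed25519": 4}

def sshfp_record(pubkey_line, hostname):
    """Return an 'IN SSHFP' record for one line of a host public key file."""
    keytype, b64, *_ = pubkey_line.split()
    blob = base64.b64decode(b64)               # raw wire-format key blob
    digest = hashlib.sha256(blob).hexdigest()  # fptype 2 = SHA-256
    return f"{hostname} IN SSHFP {ALGO[keytype]} 2 {digest}"

# Example (hypothetical path):
# print(sshfp_record(open("/etc/ssh/ssh_host_ed25519_key.pub").read(),
#                    "review02.opendev.org."))
```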
clarkb | #topic Project Renames | 19:21 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/803992 Accommodate zuul's new zk key management system | 19:21 |
clarkb | I've pushed that up with a depends-on to handle the future zuul state where it doesn't implicitly back up things to disk | 19:22 |
clarkb | The other thing we had on the todo list was updating the docs to handle the edits we made to the etherpad compared to the documented process | 19:22 |
clarkb | has anyone started on that change yet? | 19:23 |
fungi | also we discovered that accepting the inability to run zuul jobs on rename changes makes it hard to spot when you've caught all the remaining tentacles. we ended up merging two fixes (i think it was two?) where the old project name was referenced | 19:23 |
clarkb | yup, I think part of the doc updates should be splitting those changes up so that we can review them with more CI testing upfront | 19:23 |
fungi | i agree, but last time this came up we couldn't agree on where/how to split them so we wound up just keeping it all squashed | 19:24 |
clarkb | ya it's a bit of a pain iirc | 19:24 |
clarkb | I was thinking we could do an "add everything but don't remove old stuff" change for things like acls etc | 19:25 |
fungi | also no i haven't yet written any process changes based on the notes in the pad | 19:25 |
clarkb | then we can safely land that first and then land a cleanup that does the actual rename? | 19:25 |
fungi | #link https://etherpad.opendev.org/p/project-renames-2021-07-30 The maintenance plan we followed | 19:25 |
clarkb | fungi: ok, I can probably look at that this week. | 19:25 |
clarkb | that == writing the docs update change | 19:25 |
fungi | i may get to it if you don't. i think a lot of it is going to be deletions anyway | 19:26 |
clarkb | Then we can delete this from the agenda along with the review upgrade topic :) | 19:26 |
clarkb | fungi: thanks | 19:26 |
fungi | i guess it's step #5 there which will need some consideration | 19:27 |
fungi | well, and step #1 | 19:27 |
fungi | also is there anything about how zuul handles configuration we can improve to make this easier, or which we can take advantage of (run a config check on the altered config in the check pipeline?) | 19:28 |
clarkb | fungi: the problem is that zuul in prod is verifying its own config against the config changes | 19:28 |
clarkb | fungi: we could run a testing zuul to validate things but those jobs won't even run due to the config errors in the proposal | 19:28 |
fungi | well, it isn't going to speculatively apply the change anyway, the refusal to enqueue is a safeguard | 19:29 |
fungi | maybe there's an option we could add to bypass that safety check in a yes-i-know-this-doesn't-make-sense kind of way? | 19:30 |
clarkb | something like that would work for acl verification at least | 19:30 |
clarkb | basically where we do out of band validation | 19:30 |
fungi | or post zuul v5 maybe some support for actual repository renames in zuul, where it can reason about such things... but that's likely to be a significant undertaking | 19:31 |
clarkb | ya something to bring up with the zuul maintainers I suspect | 19:32 |
clarkb | Let's continue on. We can hash out our options while writing and reviewing the docs updates | 19:32 |
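One hedged sketch of the "out of band validation" clarkb mentions: gerrit ACL files use git-config syntax, so `git config --file ... --list` already acts as a parser check, exiting non-zero on anything it cannot parse. The paths are whatever a rename change touches; nothing here is an existing opendev job:

```python
import subprocess
import sys

def check_acl(path):
    """Return True if git can parse the ACL file at path."""
    result = subprocess.run(
        ["git", "config", "--file", path, "--list"],
        capture_output=True, text=True)
    if result.returncode != 0:
        print(f"{path}: {result.stderr.strip()}")
    return result.returncode == 0

if __name__ == "__main__":
    # Check every file so all errors are reported, not just the first.
    ok = all([check_acl(p) for p in sys.argv[1:]])
    sys.exit(0 if ok else 1)
```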
clarkb | #topic Matrix Homeserver and bots | 19:32 |
clarkb | tristanC's prometheus metrics show that gerritbot loses connectivity to review.opendev.org reliably every hour | 19:33 |
clarkb | Sorting that out is probably a good idea, though possibly not critical to zuul using the service | 19:33 |
fungi | that's affecting our production gerrit, or a test instance? | 19:33 |
clarkb | We also got billed for the homeserver in the expected amount which means that aspect is working without surprises (a very good thing) | 19:34 |
corvus | tristanC: are you working on that? | 19:34 |
clarkb | fungi: our production gerrit | 19:34 |
fungi | er, production gerritbot (the irc-connected one)? | 19:34 |
clarkb | fungi: aiui yes | 19:34 |
fungi | neat | 19:34 |
clarkb | oh sorry no | 19:34 |
clarkb | the production matrix gerritbot | 19:34 |
corvus | it's affected the irc one too? | 19:34 |
corvus | didn't think so | 19:34 |
clarkb | I don't have any evidence that it is affecting the irc gerritbot | 19:34 |
fungi | well, that's what i'm wondering. if the gerrit connection code is all the same then it could i suppose | 19:35 |
ianw | is it reliably at the same time every hour, or reliably once an hour? | 19:35 |
clarkb | ianw: same time every hour according to the prometheus graph I saw | 19:35 |
clarkb | fungi: it's completely different. irc gerritbot uses paramiko iirc and matrix gerritbot uses libssh2 in haskell | 19:35 |
ianw | i do seem to remember rewriting/fixing the gerritbot reconnect logic at some point | 19:35 |
ianw | it might be hiding any drops | 19:35 |
clarkb | I'm calling it out because it may lead to service impacts for zuul to use the matrix bot | 19:36 |
corvus | clarkb: do you know if tristanC is working on a fix? | 19:37 |
corvus | (i'm unaware of any previous discussion about this -- it's the first time i'm hearing of it) | 19:37 |
clarkb | corvus: I do not know. It was mentioned over the weekend and I don't know if anyone including tristanC is looking into it further | 19:37 |
clarkb | https://matrix-client.matrix.org/_matrix/media/r0/download/matrix.org/TIjNHQWUwHJlwgOpLbQRMYdN was what tristanC shared on Sunday (relative to me) | 19:38 |
corvus | is there some discussion somewhere? | 19:39 |
corvus | i can't find anything in #opendev eavesdrop logs | 19:40 |
clarkb | corvus: it was in #opendev on oftc from ~2100UTC Sunday to early Monday | 19:40 |
clarkb | https://meetings.opendev.org/irclogs/%23opendev/%23opendev.2021-08-08.log.html#t2021-08-08T21:29:52 and https://meetings.opendev.org/irclogs/%23opendev/%23opendev.2021-08-09.log.html#t2021-08-09T00:15:09 | 19:40 |
corvus | https://meetings.opendev.org/irclogs/%23opendev/%23opendev.2021-08-09.log.html#t2021-08-09T00:15:09 looks relevant | 19:41 |
clarkb | I haven't seen mention of it since | 19:41 |
corvus | okay, well, i was hoping to get the 'all clear' at this meeting to move zuul over, but it doesn't seem like we're there | 19:41 |
corvus | tristanC: can you please provide an update (if you're not here now, maybe over in #opendev when you are around) on the impact of this issue and if you're addressing it? | 19:42 |
clarkb | I think I'm happy for Zuul to use it as is. It would be up to Zuul if they are ok with the connection error issue being sorted out concurrently with the move | 19:42 |
clarkb | Billing was my last major concern before moving (I didn't want zuul to move then us get a large unexpected bill and have to move quickly to something else for example) | 19:43 |
corvus | clarkb: i don't feel like i have enough info to make that decision -- like -- how much stream-events time does gerritbot miss? | 19:43 |
clarkb | corvus: ya getting more info makes sense | 19:43 |
clarkb | Why don't we follow up with tristanC on that? Then if keepalives fixed it Zuul can proceed, otherwise dig in more and make a decision. But I think from OpenDev's perspective it's largely up to Zuul's level of comfort with starting to actually use the services | 19:45 |
clarkb | Anything else to bring up on this subject? | 19:46 |
fungi | that jibes with my reckoning | 19:46 |
corvus | that's it | 19:46 |
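For reference, the keepalive idea floated above would look something like this in a paramiko-based stream-events consumer such as the irc gerritbot; the account name and key path are placeholders, not the production bot's configuration:

```python
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
# Placeholder credentials; the real bot's account and key differ.
client.connect("review.opendev.org", port=29418, username="gerritbot",
               key_filename="/path/to/id_ed25519")
# Send an SSH-level keepalive every 60s so an idle connection is not
# silently dropped by intermediate NAT/firewall state expiring hourly.
client.get_transport().set_keepalive(60)
stdin, stdout, stderr = client.exec_command("gerrit stream-events")
for line in stdout:
    print(line, end="")  # one JSON event per line
```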
clarkb | #topic Gitea Backups | 19:46 |
clarkb | We got an email saying the lists backup failed as well. I was worried that it may be suffering the same issue, but it only happened the once | 19:47 |
clarkb | I suspect that was "normal" internet flakiness rather than the persistent variety | 19:47 |
clarkb | ianw: did an email get sent about this yet? | 19:47 |
ianw | ahh, no sorry | 19:49 |
clarkb | Alright, considering the lists issue hasn't persisted I think that is all for this topic | 19:49 |
clarkb | #topic Gitea 1.15.0 upgrade | 19:49 |
clarkb | Thank you everyone for helping to review and land the prep changes for this work. We are no longer using hacky UI interactions via http and instead use the REST api for all gitea project management updates | 19:50 |
clarkb | The latest gitea 1.15.0-rc3 release seems to work fine in testing with the associated template updates and file moves | 19:50 |
clarkb | Upstream has a milestone set up, due on the 18th, for the 1.15.0 release and no outstanding bugs are listed. I expect the release will happen soon. Once it happens we can update my change, hold the nodes, and directly verify that stuff works as expected | 19:51 |
clarkb | The other gotcha is that the hosting of the logos changes and the paths move | 19:51 |
clarkb | this will impact review and paste's theming | 19:51 |
clarkb | If anyone has time to host those logos on static or with each service that uses them that might be a good idea | 19:52 |
fungi | we haven't merged any project additions to exercise the new api interactions in production, as far as anyone knows? | 19:52 |
clarkb | then we aren't updating a bunch of random stuff when our hacked up gitea theming changes | 19:52 |
clarkb | fungi: ya I don't know of any new project creations since | 19:52 |
ianw | ah i can make a static logo location | 19:53 |
fungi | i think baking the logos into each image/deploying them to each server is probably the safest so we don't have unnecessary cross-site hosting | 19:53 |
fungi | but keeping them in a single place in system-config (or some repo) would be good so we don't have duplicates in git | 19:53 |
clarkb | I hadn't considered that concern. It seems to be working now at least, but preventing future problems seems like a good thing | 19:54 |
clarkb | We can definitely coordinate the 1.15.0 gitea update around making sure we're happy with logo hosting | 19:54 |
clarkb | While it would be nice to update early we don't need to | 19:54 |
clarkb | Almost out of time so let's move on here | 19:55 |
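For reference, the REST calls that replaced the hacky UI interactions would look roughly like this against Gitea's documented v1 API; the host, org, and token values are placeholders rather than opendev's actual configuration:

```python
import requests

# Placeholder backend URL and token; opendev runs several gitea backends.
GITEA = "https://gitea01.opendev.org:3000"
HEADERS = {"Authorization": "token REDACTED"}

def ensure_repo(org, name, description=""):
    """Create a repo under an org, treating an existing repo as success."""
    r = requests.post(f"{GITEA}/api/v1/orgs/{org}/repos",
                      headers=HEADERS,
                      json={"name": name, "description": description})
    if r.status_code == 409:
        return  # repo already exists; creation is idempotent for our purposes
    r.raise_for_status()

# Example (hypothetical project):
# ensure_repo("openstack", "new-project", "Imported by project-config")
```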
clarkb | #topic Mailman Ansible and Upgrades | 19:55 |
clarkb | The newlist fix landed | 19:55 |
clarkb | I don't know of any new lists being created since, so keep an eye out when that happens | 19:55 |
clarkb | I have not had time to snapshot the lists.kc.io server yet for in-place upgrade testing but hope that it will happen this week | 19:56 |
clarkb | #topic Open Discussion | 19:56 |
clarkb | Anything else? | 19:56 |
fungi | i've got nothing | 19:56 |
clarkb | Rico Lin reached out to fungi and me about doing a presentation about OpenDev for Open Infra Days Asia 2021. This is happening in a month and we have ~3 weeks to put together a recorded talk. I'd like to give it a go, but am balancing that with everything else | 19:57 |
ianw | i'm trying to get to the bottom of debian-stable | 19:57 |
ianw | https://review.opendev.org/q/topic:%22debian-stretch-rm%22+(status:open%20OR%20status:merged) | 19:57 |
clarkb | Mentioning it in case anyone is interested in helping put that together. I've been told that one of the easiest ways to do a recording like that is to have a recorded conference call where you present the material either to an empty call or to your co-presenters | 19:57 |
fungi | ianw: not sure if you saw, but jrosser was in favor of bypassing ci to merge the removals from murano-dashboard | 19:57 |
ianw | fungi: oh, no missed that but that seems good | 19:58 |
fungi | clarkb: yeah, i expect we could talk through some slides on jitsi-meet and then someone could record it locally from their browser | 19:58 |
fungi | the more the merrier on that | 19:58 |
clarkb | And we are at time | 20:00 |
fungi | thanks clarkb! | 20:00 |
clarkb | Thank you everyone! | 20:00 |
clarkb | #endmeeting | 20:00 |
opendevmeet | Meeting ended Tue Aug 10 20:00:11 2021 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 20:00 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2021/infra.2021-08-10-19.01.html | 20:00 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2021/infra.2021-08-10-19.01.txt | 20:00 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2021/infra.2021-08-10-19.01.log.html | 20:00 |