clarkb | Our meeting will begin in a couple of minutes | 18:59 |
---|---|---|
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue Feb 8 19:01:20 2022 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
ianw | o/ | 19:01 |
clarkb | #link https://lists.opendev.org/pipermail/service-discuss/2022-February/000317.html Our Agenda | 19:01 |
frickler | \o | 19:01 |
clarkb | #topic Announcements | 19:01 |
clarkb | OpenInfra Summit CFP needs your input: https://openinfra.dev/summit/ You have until tomorrow to get your ideas in. | 19:01 |
clarkb | I believe the deadline is just something like 21:00 UTC February 9 | 19:02 |
clarkb | it is based on US central time. If you plan to get something in last minute definitely double check what the cut off is | 19:02 |
clarkb | Service coordinator nominations run January 25, 2022 - February 8, 2022. Today is the last day. | 19:02 |
clarkb | two things about this. First I've realized that I don't think I formally sent an announcement of this outside of our meeting agendas and the email planning for it last year | 19:03 |
clarkb | second we don't have any volunteers yet | 19:03 |
clarkb | Do we think we should send a formal announcement in a dedicated thread and give it another week? Not doing that was on me. I think in my head I had done so because I put it on the agenda, but that probably wasn't sufficient | 19:04 |
fungi | oh, sure i don't think delaying it a week will hurt | 19:05 |
frickler | I would be surprised if anyone outside this group would show up, so I'd be fine with you just continuing | 19:05 |
clarkb | I'm willing to volunteer again, but would be happy to have someone else do it too. I just don't want anyone to feel like I was being sneaky if I volunteer last minute after doing a poor job announcing this | 19:05 |
clarkb | frickler: agreed, but in that case I don't think waiting another week will hurt anything either. | 19:05 |
clarkb | And that way I can feel like this was done a bit more above board | 19:05 |
frickler | yeah, it won't change anything, true | 19:06 |
frickler | oh, while we're announcing, I'll be on PTO next monday and tuesday | 19:06 |
frickler | so probably won't make it to the meeting either | 19:06 |
clarkb | If there are no objections I'll send an email today asking for nominations until 23:59 UTC February 15, 2022 just to make sure it is clear and anyone can speak up if they aren't reading meeting agendas or attending | 19:06 |
clarkb | frickler: hopefully you get to do something fun | 19:07 |
clarkb | alright I'm not hearing objections so I'll proceed with that plan after the meeting | 19:07 |
clarkb | #topic Actions from last meeting | 19:07 |
clarkb | #link http://eavesdrop.openstack.org/meetings/infra/2022/infra.2022-02-01-19.01.txt minutes from last meeting | 19:08 |
clarkb | We recorded two actions. The first was for clarkb to make a list of opendev projects to retire | 19:08 |
clarkb | #link https://etherpad.opendev.org/p/opendev-repo-retirements List of repos to retire. Please double check. | 19:08 |
clarkb | This is step 0 to cleaning up old reviews as we can abandon any changes associated with those repos once they are retired | 19:08 |
clarkb | If you get time please take a look and cross any out that shouldn't be retired or feel free to add projects that should be. I'm hoping to start batching those up later this week | 19:08 |
clarkb | And the other action was frickler was going to push a change to reenable Gerrit mergeability checks | 19:09 |
clarkb | frickler: I looked for a change but didn't see one. Did I miss it or should we continue recording this as an action? | 19:09 |
frickler | yeah, I didn't get to that after having fun with cirros | 19:09 |
frickler | please continue | 19:09 |
clarkb | #action frickler Push change to reenable Gerrit mergeability checks | 19:09 |
clarkb | thanks! the cirros stuff helped us double check other important items :) | 19:10 |
clarkb | #topic Topics | 19:10 |
clarkb | #topic Improving OpenDev's CD throughput | 19:10 |
clarkb | I keep getting distracted and reviewing these changes falls lower on the list :( | 19:10 |
clarkb | ianw: anything new to call out for those of us that haven't been able to give this attention recently? | 19:10 |
ianw | no, they might make it to the top of the todo list later this week :) | 19:11 |
clarkb | #topic Container Maintenance | 19:11 |
clarkb | I haven't made recent progress on this, but feel like I am getting out of the hole of the zuul release and server upgrades and summit CFP where I have time for this. jentoio if an afternoon later this week works for you please reach out. | 19:12 |
clarkb | jentoio: I think we can get on a call together for an hour or two and work through a specific item together | 19:12 |
clarkb | I did take notes a week or two back that should help us identify a good candidate and take it from there | 19:13 |
clarkb | Anyway let's sync up later today (or otherwise soon) and find a good time that works | 19:14 |
clarkb | #topic Nodepool image cleanups | 19:14 |
clarkb | CentOS 8 is now gone. We've removed the image from nodepool and the repo from our mirrors. | 19:15 |
clarkb | We accelerated this because upstream started removing bits from repos that broke things anyway | 19:15 |
clarkb | However, we ran into problems where projects were stuck in a chicken and egg situation unable to remove centos 8 jobs because centos 8 was gone | 19:15 |
clarkb | To address this we added the nodeset back to base-jobs but centos-8-stream provides the nodes | 19:15 |
clarkb | We should check periodically with projects on when we can remove that label to remove any remaining confusion over what centos-8 is. But we don't need to be in a huge rush for that. | 19:16 |
ianw | hrm, i thought we were doing that just to avoid zuul errors | 19:17 |
fungi | odds are some of those jobs may "just work" with stream, but they should be corrected anyway | 19:17 |
ianw | i guess it works ... but if they "just work" i think they'll probably never get their node-type fixed | 19:17 |
clarkb | ianw: ya it was the circular zuul errors that prevented them from removing the centos-8 jobs | 19:17 |
fungi | ianw: we were doing it to avoid zuul errors which prevented those projects from merging the changes they needed to remove the nodes | 19:17 |
clarkb | they are still expected to remove those jobs and stop using the nodeset | 19:17 |
ianw | sure; i wonder if a "fake" node that doesn't cause errors, but doesn't run anything, could work | 19:18 |
fungi | the alternative was for gerrit admins to bypass testing and merge the removal changes for them since they were untestable | 19:18 |
frickler | can we somehow make those jobs fail instead of passing? that would increase probability of fixing things | 19:18 |
fungi | but yeah, we could have pointed them at a different label too | 19:18 |
clarkb | ianw: frickler: that is a good idea though I'm not sure how to make that happen | 19:18 |
fungi | i guess we'd need to add an intentionally broken distro | 19:19 |
ianw | if we had a special node that just always was broken | 19:19 |
clarkb | we might be able to do it by setting up a nodepool image+label where the username isn't valid | 19:19 |
clarkb | so nodepool would node_failure it | 19:19 |
fungi | and this becomes an existential debate into what constitutes "broken" ;) | 19:19 |
fungi | but yeah, that sounds pretty broken | 19:19 |
clarkb | basically reuse an existing image but tell nodepool to login as zuulinvalid | 19:19 |
fungi | and would probably be doable without a special image | 19:20 |
clarkb | then we don't have another image around | 19:20 |
clarkb | fungi: ya exactly | 19:20 |
fungi | wfm | 19:20 |
ianw | i wonder if it's generic enough to do something in nodepool itself | 19:20 |
frickler | and nodefailure is actually better than just failing the jobs | 19:20 |
ianw | maybe a no-op node | 19:20 |
frickler | because it gives a hint into the right direction | 19:21 |
clarkb | frickler: ++ | 19:21 |
ianw | although, no-op tends to suggest passing, rather than failing | 19:21 |
ianw | i can take a look at some options | 19:21 |
clarkb | thanks! | 19:21 |
clarkb | ianw: frickler: I've also noticed progress on getting things switched over to fedora 35 | 19:22 |
clarkb | are we near being able to remove fedora 34 yet? | 19:22 |
frickler | yes, I merged ianw's work earlier today | 19:22 |
ianw | yeah, thanks for that, once devstack updates that should be the last user and we can remove that node type for f35 | 19:23 |
frickler | f34 | 19:23 |
clarkb | exciting | 19:23 |
ianw | #link https://review.opendev.org/c/openstack/diskimage-builder/+/827772 | 19:23 |
ianw | is one minor one that stops a bunch of locale error messages, but will need a dib release | 19:23 |
frickler | another question that came up was how long do we want to keep running xenial images? | 19:24 |
clarkb | frickler: at this point I think they are largely there for our own needs with the last few puppeted things | 19:24 |
frickler | no, there are some py2 jobs and others afaict | 19:24 |
clarkb | frickler: I've got on my todo list to start cleaning up openstack health, subunit2sql, and logstash/elasticsearch stuff which will be a huge chunk of that removed. Now that openstack is doing that with opensearch | 19:25 |
clarkb | frickler: oh interesting | 19:25 |
clarkb | frickler: I think we should push those ahead as much as possible. The fact xenial remains is an artifact of our puppetry and not because we think anyone should be using it anymore | 19:25 |
frickler | publish-openstack-sphinx-docs-base is one example | 19:26 |
clarkb | frickler: do you think that deserves a targeted email to the existing users? | 19:26 |
fungi | right, if we still want to test that our remaining puppet modules can deploy things, we need xenial servers until we decommission the last of them or update their configuration management to something post-puppet | 19:26 |
fungi | thankfully their numbers are continuing to dwindle | 19:27 |
frickler | I'd say I try to collect a more complete list of jobs that would be affected and we can discuss again next week | 19:27 |
clarkb | sounds good | 19:27 |
frickler | except I'm not there, but I'll try to prepare the list at least | 19:27 |
clarkb | frickler: ya if you put the list on the agenda or somewhere else conspicuous I can bring it up | 19:27 |
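A minimal sketch of how such a list of xenial-using jobs could be assembled from the public Zuul REST API, along the lines frickler describes above. The tenant name and label below are assumptions for illustration, and the sketch relies on the job detail endpoint exposing each variant's nodeset, as zuul.opendev.org does today:

```python
# Sketch: list Zuul jobs whose own variants still request ubuntu-xenial nodes.
import requests

ZUUL_API = 'https://zuul.opendev.org/api/tenant/openstack'  # assumed tenant

jobs = requests.get(f'{ZUUL_API}/jobs', timeout=60).json()
for job in jobs:
    # One request per job; slow but simple for a one-off audit.
    variants = requests.get(f"{ZUUL_API}/job/{job['name']}", timeout=60).json()
    for variant in variants:
        nodeset = variant.get('nodeset') or {}
        labels = {node.get('label') for node in nodeset.get('nodes', [])}
        if 'ubuntu-xenial' in labels:
            print(job['name'])
            break
```

Jobs that only inherit a xenial nodeset from a parent without redefining it would show up under the parent rather than their own name, so output like this is a starting point for the list rather than a complete answer.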
ianw | on a probably similar note, i started looking at the broken translation proposal jobs | 19:29 |
frickler | but those were bionic iirc? | 19:29 |
ianw | these seemed to break when we moved ensure-sphinx to py3 only, so somehow it seems py2 is involved | 19:29 |
fungi | yeah, bionic | 19:30 |
ianw | but also, all the zanata stuff is stuck in puppet but has no clear path | 19:30 |
frickler | ianw: from what I saw, they were py3 before | 19:30 |
fungi | note that translation imports aren't the only jobs which were broken by that ensure-sphinx change, a bunch of pre-py3 openstack branches were still testing docs builds daily | 19:30 |
frickler | so rather the change from virtualenv to python3 -m venv | 19:30 |
ianw | mmm, yeah that could be it | 19:30 |
ianw | #link https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/828219 | 19:31 |
clarkb | ianw: ya I think we need to start a thread on the openstack discuss list (because zanata is openstack specific) and lay out the issues and try to get help with a plan forward | 19:31 |
ianw | better explains ^ ... if "python3 -m venv" has a lower version of pip by default than the virtualenv | 19:31 |
clarkb | I'm not sure I'm fully clued in on all the new stuff. Maybe we should draft an email on an etherpad to make sure we cover all the important points? | 19:31 |
ianw | clarkb: i feel like there have been threads, but yeah restarting the conversation won't hurt. although i feel like, just like with the images, we might need to come to a "this will break at X" time-frame | 19:32 |
ianw | i can dig through my notes and come up with something | 19:33 |
clarkb | ianw: ya I think that is fair. It is basically what we did with elasticsearch too. basically this is not currently maintainable and we don't have the bandwidth to make it maintainable. If people want to help us change that please reach out otherwise we'll need to sunset at $time | 19:33 |
clarkb | and ya there have been some threads, but I think they end up bitrotting in people's minds as zanata continues to work :) | 19:34 |
clarkb | thanks! | 19:35 |
clarkb | #topic Cleaning up Old Reviews | 19:35 |
clarkb | As mentioned previously I came up with a list of old repos in opendev that we can probably retire | 19:35 |
clarkb | #link https://etherpad.opendev.org/p/opendev-repo-retirements List of repos to retire. Please double check | 19:35 |
clarkb | If we're happy with the list running through retirements for them will largely be mechanical and then we can abandon all related changes as phase one of the cleanup here | 19:36 |
fungi | i added one a few minutes ago | 19:37 |
clarkb | Then we can see where we are at and dive into system-config cleanups | 19:37 |
clarkb | fungi: thanks! | 19:37 |
fungi | abandoning the changes for those repos will also be necessary as a step in their retirement anyway | 19:37 |
clarkb | fungi: yup the two processes are tied together which is nice | 19:37 |
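As a rough illustration of the "abandon all related changes" step, a sketch against the standard Gerrit REST change query and abandon endpoints. The credentials, project name, and message are placeholders, not values from the discussion:

```python
# Sketch: abandon all open changes for a retired project via the Gerrit REST API.
import json
import requests

GERRIT = 'https://review.opendev.org'
AUTH = ('myuser', 'my-http-password')  # HTTP credentials from Gerrit settings (placeholder)

def gerrit_json(response):
    # Gerrit prefixes JSON responses with ")]}'" to prevent XSSI; strip the first line.
    response.raise_for_status()
    return json.loads(response.text.split('\n', 1)[1])

def abandon_open_changes(project, message):
    query = {'q': f'project:{project} status:open', 'n': 500}
    changes = gerrit_json(requests.get(f'{GERRIT}/a/changes/', params=query, auth=AUTH))
    for change in changes:
        # Abandon by change number; repeat the query if a repo has more than 500 open changes.
        requests.post(
            f"{GERRIT}/a/changes/{change['_number']}/abandon",
            json={'message': message},
            auth=AUTH,
        ).raise_for_status()

abandon_open_changes('opendev/some-retired-repo',
                     'This repository is being retired; abandoning open changes.')
```

In practice something like this would be run per repository from the retirement etherpad once the retirement changes themselves have merged.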
clarkb | The last item on the agenda was already covered (Gerrit mergability checking) | 19:38 |
clarkb | #topic Open Discussion | 19:39 |
clarkb | I'm working on a gitea 1.16 upgrade. First change to do that with confidence is an update to our gitea testing to ensure we're exercising the ssh container: https://review.opendev.org/c/opendev/system-config/+/828203 | 19:39 |
clarkb | The child change which actually upgrades to 1.16 should probably be WIP (I'll do that now) until we've had time to go over the changelog a bit more and hold a node to look for any problems | 19:40 |
clarkb | As a general rule the bugfix releases have been fine with gitea but the feature releases have been more problematic and I don't mind taking some time to check stuff | 19:40 |
frickler | do we know how much quota we have for our log storage and how much of it we are using? | 19:41 |
frickler | the cirros artifacts, once I manage to collect them properly, are pretty large and I don't want to explode anything | 19:41 |
clarkb | frickler: I don't think quota was ever explicitly stated. Instead we maintained an expiry of 30 days which is what we did prior to the move to swift | 19:42 |
clarkb | The swift apis in the clouds we use for this should be able to give us container size iirc | 19:42 |
clarkb | but I haven't used the swift tooling in a while so could be wrong about that | 19:42 |
frickler | o.k., so I can try to look into checking that manually, thx | 19:42 |
clarkb | note that we shard the containers too so you might need a script to get it | 19:43 |
clarkb | but that should be doable | 19:43 |
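A sketch of the kind of script clarkb mentions, summing the sharded log containers with openstacksdk. The cloud name and container prefix are assumptions for illustration; the count/bytes fields are the ones the SDK exposes on container listings:

```python
# Sketch: total up object count and bytes used across sharded log containers.
import openstack

conn = openstack.connect(cloud='logs-cloud')  # assumed clouds.yaml entry

total_bytes = 0
total_objects = 0
for container in conn.object_store.containers():
    # The shard naming here is a placeholder; adjust to the real container prefix.
    if not container.name.startswith('zuul_logs'):
        continue
    total_bytes += container.bytes
    total_objects += container.count

print(f'{total_objects} objects, {total_bytes / 1024**3:.1f} GiB across shards')
```

If the log shards are the only containers in the account, a HEAD on the account itself would also report the total bytes used in one call.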
fungi | we had a request come through backchannels at the foundation from a user who is looking to have an old ethercalc spreadsheet restored (it was defaced or corrupted at some point in the past). unfortunately the most they know is that the content was intact as of mid-2019. any idea if we even have backups stretching back that far (not to mention that they'd be in bup not borg)? | 19:43 |
ianw | hrm, i think the answer is no | 19:44 |
clarkb | fungi: I think if you login to the two backup servers you can see what volumes are mounted and see if we have any of the old volumes, but I suspect ya, that's right | 19:44 |
clarkb | your best bet is probably to get the oldest version in borg and spin up an ethercalc locally off of that and see if the content was corrupted then | 19:44 |
fungi | i was going to say that we have no easy way to restore the data regardless, but also if we don't even have the data still then that's pretty much the end of it | 19:45 |
clarkb | (which is unfortunately a lot of work) | 19:45 |
ianw | unless deletion is just a bit flip of "this is just deleted" | 19:45 |
ianw | you can quickly mount the borg backups on the host | 19:45 |
clarkb | ianw: it looks like ethercalc doesn't store history of edits like etherpad does unfortunately | 19:45 |
clarkb | it is far more "ether" :) | 19:45 |
fungi | yeah, ethercalc itself can't unwind edits | 19:45 |
fungi | once a pad is defaced, the way you fix it is to reimport the backup export you made | 19:46 |
clarkb | but also it is redis so there isn't an easy "just grab these rows from the db backup" option | 19:46 |
fungi | in this case the user did make a backup, but onto a device whose hard drive apparently died recently | 19:46 |
clarkb | it's a whole db backup and I think you have to use it that way | 19:46 |
ianw | ethercalc02-filesystem-2021-02-28T05:51:02 is the earliest | 19:47 |
fungi | oh, thanks for checking! | 19:47 |
ianw | (/usr/local/bin/borg-mount backup02.vexxhost.opendev.org on the host) | 19:47 |
fungi | that's the borg backup. do we still have older bup backups? | 19:48 |
fungi | or did we delete the bup servers? | 19:48 |
clarkb | fungi: correct the bup servers were deleted. Their volumes were kept for a time. Then also eventually deleted iirc | 19:49 |
clarkb | you could double check that the bup volumes are gone. it's possible they remained. But pretty sure the servers did not | 19:49 |
ianw | neither has bup backups mounted | 19:49 |
fungi | yeah, okay, so sounds like that's gone, thanks | 19:49 |
ianw | i have in my old todo notes "cleanup final bup backup volume 2021-07" | 19:50 |
clarkb | corvus: made a thing that was really cool I wanted to call out https://twitter.com/acmegating/status/1490821104918618112 | 19:50 |
ianw | but no note that i did that ... fungi i'll poke to make sure | 19:51 |
clarkb | shows how we can do testing of gerrit upstream in our downstream jobs with our configs | 19:51 |
clarkb | (using changes that fungi and I wrote as illustrations) | 19:51 |
ianw | nice! | 19:52 |
fungi | now if they'll only merge your fix so i can get on with plastering over gitiles | 19:52 |
clarkb | Sounds like that may be it. Thank you everyone! | 19:54 |
clarkb | we'll see you here next week | 19:54 |
clarkb | #endmeeting | 19:54 |
opendevmeet | Meeting ended Tue Feb 8 19:54:41 2022 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:54 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2022/infra.2022-02-08-19.01.html | 19:54 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2022/infra.2022-02-08-19.01.txt | 19:54 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2022/infra.2022-02-08-19.01.log.html | 19:54 |
fungi | thanks clarkb! | 19:55 |