clarkb | Our meeting will begin in a couple of minutes | 18:59 |
---|---|---|
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue Feb 8 19:01:20 2022 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
ianw | o/ | 19:01 |
clarkb | #link https://lists.opendev.org/pipermail/service-discuss/2022-February/000317.html Our Agenda | 19:01 |
frickler | \o | 19:01 |
clarkb | #topic Announcements | 19:01 |
clarkb | OpenInfra Summit CFP needs your input: https://openinfra.dev/summit/ You have until tomorrow to get your ideas in. | 19:01 |
clarkb | I believe the deadline is just something like 21:00 UTC February 9 | 19:02 |
clarkb | it is based on US central time. If you plan to get something in last minute definitely double check what the cut off is | 19:02 |
clarkb | Service coordinator nominations run January 25, 2022 - February 8, 2022. Today is the last day. | 19:02 |
clarkb | two things about this. First I've realized that I don't think I formally sent an announcement of this outside of our meeting agendas and the email planning for it last year | 19:03 |
clarkb | second we don't have any volunteers yet | 19:03 |
clarkb | Do we think we should send a formal announcement in a dedicated thread and give it another week? Not doing that was on me. I think in my head I had done so because I put it on the agenda, but that probably wasn't sufficient | 19:04 |
fungi | oh, sure i don't think delaying it a week will hurt | 19:05 |
frickler | I would be surprised if anyone outside this group would show up, so I'd be fine with you just continuing | 19:05 |
clarkb | I'm willing to volunteer again, but would be happy to have someone else do it too. I just don't want anyone to feel like I was being sneaky if I volunteer last minute after doing a poor job announcing this | 19:05 |
clarkb | frickler: agreed, but in that case I don't think waiting another week will hurt anything either. | 19:05 |
clarkb | And that way I can feel like this was done a bit more above board | 19:05 |
frickler | yeah, it won't change anything, true | 19:06 |
frickler | oh, while we're announcing, I'll be on PTO next monday and tuesday | 19:06 |
frickler | so probably won't make it to the meeting either | 19:06 |
clarkb | If there are no objections I'll send an email today asking for nominations until 23:59 UTC February 15, 2022 just to make sure it is clear and anyone can speak up if they aren't reading meeting agendas or attending | 19:06 |
clarkb | frickler: hopefully you get to do something fun | 19:07 |
clarkb | alright I'm not hearing objections so I'll proceed with that plan after the meeting | 19:07 |
clarkb | #topic Actions from last meeting | 19:07 |
clarkb | #link http://eavesdrop.openstack.org/meetings/infra/2022/infra.2022-02-01-19.01.txt minutes from last meeting | 19:08 |
clarkb | We recorded two actions. The first was for clarkb to make a list of opendev projects to retire | 19:08 |
clarkb | #link https://etherpad.opendev.org/p/opendev-repo-retirements List of repos to retire. Please double check. | 19:08 |
clarkb | This is step 0 to cleaning up old reviews as we can abandon any changes associated with those repos once they are retired | 19:08 |
clarkb | If you get time please take a look and cross any out that shouldn't be retired or feel free to add projects that should be. I'm hoping to start batching those up later this week | 19:08 |
clarkb | And the other action was frickler was going to push a change to reenable Gerrit mergeability checks | 19:09 |
clarkb | frickler: I looked for a change but didn't see one. Did I miss it or should we continue recording this as an action? | 19:09 |
frickler | yeah, I didn't get to that after having fun with cirros | 19:09 |
frickler | please continue | 19:09 |
clarkb | #action frickler Push change to reenable Gerrit mergeability checks | 19:09 |
clarkb | thanks! the cirros stuff helped us double check other important items :) | 19:10 |
clarkb | #topic Topics | 19:10 |
clarkb | #topic Improving OpenDev's CD throughput | 19:10 |
clarkb | I keep getting distracted and reviewing these changes falls lower on the list :( | 19:10 |
clarkb | ianw: anything new to call out for those of us that haven't been able to give this attention recently? | 19:10 |
ianw | no, they might make it to the top of the todo list later this week :) | 19:11 |
clarkb | #topic Container Maintenance | 19:11 |
clarkb | I haven't made recent progress on this, but feel like I am getting out of the hole of the zuul release and server upgrades and summit CFP where I have time for this. jentoio if an afternoon later this week works for you please reach out. | 19:12 |
clarkb | jentoio: I think we can get on a call together for an hour or two and work through a specific item together | 19:12 |
clarkb | I did take notes a week or two back that should help us identify a good candidate and take it from there | 19:13 |
clarkb | Anyway let's sync up later today (or otherwise soon) and find a good time that works | 19:14 |
clarkb | #topic Nodepool image cleanups | 19:14 |
clarkb | CentOS 8 is now gone. We've removed the image from nodepool and the repo from our mirrors. | 19:15 |
clarkb | We accelerated this because upstream started removing bits from repos that broke things anyway | 19:15 |
clarkb | However, we ran into problems where projects were stuck in a chicken and egg situation unable to remove centos 8 jobs because centos 8 was gone | 19:15 |
clarkb | To address this we added the nodeset back to base-jobs but centos-8-stream provides the nodes | 19:15 |
clarkb | We should check periodically with projects on when we can remove that label to remove any remaining confusion over what centos-8 is. But we don't need to be in a huge rush for that. | 19:16 |
ianw | hrm, i thought we were doing that just to avoid zuul errors | 19:17 |
fungi | odds are some of those jobs may "just work" with stream, but they should be corrected anyway | 19:17 |
ianw | i guess it works ... but if they "just work" i think they'll probably never get their node-type fixed | 19:17 |
clarkb | ianw: ya it was the circular zuul errors that prevented them from removing the centos-8 jobs | 19:17 |
fungi | ianw: we were doing it to avoid zuul errors which prevented those projects from merging the changes they needed to remove the nodes | 19:17 |
clarkb | they are still expected to remove those jobs and stop using the nodeset | 19:17 |
ianw | sure; i wonder if a "fake" node that doesn't cause errors, but doesn't run anything, could work | 19:18 |
fungi | the alternative was for gerrit admins to bypass testing and merge the removal changes for them since they were untestable | 19:18 |
frickler | can we somehow make those jobs fail instead of passing? that would increase probability of fixing things | 19:18 |
fungi | but yeah, we could have pointed them at a different label too | 19:18 |
clarkb | ianw: frickler: that is a good idea though I'm not sure how to make that happen | 19:18 |
fungi | i guess we'd need to add an intentionally broken distro | 19:19 |
ianw | if we had a special node that just always was broken | 19:19 |
clarkb | we might be able to do it by setting up a nodepool image+label where the username isn't valid | 19:19 |
clarkb | so nodepool would node_failure it | 19:19 |
fungi | and this becomes an existential debate into what constitutes "broken" ;) | 19:19 |
fungi | but yeah, that sounds pretty broken | 19:19 |
clarkb | basically reuse an existing image but tell nodepool to login as zuulinvalid | 19:19 |
fungi | and would probably be doable without a special image | 19:20 |
clarkb | then we don't have another image around | 19:20 |
clarkb | fungi: ya exactly | 19:20 |
fungi | wfm | 19:20 |
ianw | i wonder if it's generic enough to do something in nodepool itself | 19:20 |
frickler | and nodefailure is actually better than just failing the jobs | 19:20 |
ianw | maybe a no-op node | 19:20 |
frickler | because it gives a hint into the right direction | 19:21 |
clarkb | frickler: ++ | 19:21 |
ianw | although, no-op tends to suggest passing, rather than failing | 19:21 |
ianw | i can take a look at some options | 19:21 |
clarkb | thanks! | 19:21 |
clarkb | ianw: frickler: I've also noticed progress on getting things switched over to fedora 35 | 19:22 |
clarkb | are we near being able to remove fedora 34 yet? | 19:22 |
frickler | yes, I merged ianw's work earlier today | 19:22 |
ianw | yeah, thanks for that, once devstack updates that should be the last user and we can remove that node type for f35 | 19:23 |
frickler | f34 | 19:23 |
clarkb | exciting | 19:23 |
ianw | #link https://review.opendev.org/c/openstack/diskimage-builder/+/827772 | 19:23 |
ianw | is one minor one that stops a bunch of locale error messages, but will need a dib release | 19:23 |
frickler | another question that came up was how long do we want to keep running xenial images? | 19:24 |
clarkb | frickler: at this point I think they are largely there for our own needs with the last few puppeted things | 19:24 |
frickler | no, there are some py2 jobs and others afaict | 19:24 |
clarkb | frickler: I've got on my todo list to start cleaning up openstack health, subunit2sql, and logstash/elasticsearch stuff which will be a huge chunk of that removed. Now that openstack is doing that with opensearch | 19:25 |
clarkb | frickler: oh interesting | 19:25 |
clarkb | frickler: I think we should push those ahead as much as possible. The fact xenial remains is an artifact of our puppetry and not because we think anyone should be using it anymore | 19:25 |
frickler | publish-openstack-sphinx-docs-base is one example | 19:26 |
clarkb | frickler: do you think that deserves a targeted email to the existing users? | 19:26 |
fungi | right, if we still want to test that our remaining puppet modules can deploy things, we need xenial servers until we decommission the last of them or update their configuration management to something post-puppet | 19:26 |
fungi | thankfully their numbers are continuing to dwindle | 19:27 |
frickler | I'd say I try to collect a more complete list of jobs that would be affected and we can discuss again next week | 19:27 |
clarkb | sounds good | 19:27 |
frickler | except I'm not there, but I'll try to prepare the list at least | 19:27 |
clarkb | frickler: ya if you put the list on the agenda or somewhere else conspicuous I can bring it up | 19:27 |
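A minimal sketch of how such a list of xenial-using jobs could be assembled from the public Zuul REST API, along the lines frickler describes above. The tenant name and label below are assumptions for illustration, and the sketch relies on the job detail endpoint exposing each variant's nodeset, as zuul.opendev.org does today:

```python
# Sketch: list Zuul jobs whose own variants still request ubuntu-xenial nodes.
import requests

ZUUL_API = 'https://zuul.opendev.org/api/tenant/openstack'  # assumed tenant

jobs = requests.get(f'{ZUUL_API}/jobs', timeout=60).json()
for job in jobs:
    # One request per job; slow but simple for a one-off audit.
    variants = requests.get(f"{ZUUL_API}/job/{job['name']}", timeout=60).json()
    for variant in variants:
        nodeset = variant.get('nodeset') or {}
        labels = {node.get('label') for node in nodeset.get('nodes', [])}
        if 'ubuntu-xenial' in labels:
            print(job['name'])
            break
```

Jobs that only inherit a xenial nodeset from a parent without redefining it would show up under the parent rather than their own name, so output like this is a starting point for the list rather than a complete answer.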
ianw | on a probably similar note, i started looking at the broken translation proposal jobs | 19:29 |
frickler | but those were bionic iirc? | 19:29 |
ianw | these seemed to break when we moved ensure-sphinx to py3 only, so somehow it seems py2 is involved | 19:29 |
fungi | yeah, bionic | 19:30 |
ianw | but also, all the zanata stuff is stuck in puppet but has no clear path | 19:30 |
frickler | ianw: from what I saw, they were py3 before | 19:30 |
fungi | note that translation imports aren't the only jobs which were broken by that ensure-sphinx change, a bunch of pre-py3 openstack branches were still testing docs builds daily | 19:30 |
frickler | so rather the change from virtualenv to python3 -m venv | 19:30 |
ianw | mmm, yeah that could be it | 19:30 |
ianw | #link https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/828219 | 19:31 |
clarkb | ianw: ya I think we need to start a thread on the openstack discuss list (because zanata is openstack specific) and lay out the issues and try to get help with a plan forward | 19:31 |
ianw | better explains ^ ... if "python3 -m venv" has a lower version of pip by default than the virtualenv | 19:31 |
clarkb | I'm not sure I'm fully clued in on all the new stuff. Maybe we should draft an email on an etherpad to make sure we cover all the important points? | 19:31 |
ianw | clarkb: i feel like there have been threads, but yeah restarting the conversation won't hurt. although i feel like, just like with the images, we might need to come to a "this will break at X" time-frame | 19:32 |
ianw | i can dig through my notes and come up with something | 19:33 |
clarkb | ianw: ya I think that is fair. It is basically what we did with elasticsearch too. basically this is not currently maintainable and we don't have the bandwidth to make it maintainable. If people want to help us change that please reach out otherwise we'll need to sunset at $time | 19:33 |
clarkb | and ya there have been some threads, but I think they end up bitrotting in people's minds as zanata continues to work :) | 19:34 |
clarkb | thanks! | 19:35 |
clarkb | #topic Cleaning up Old Reviews | 19:35 |
clarkb | As mentioned previously I came up with a list of old repos in opendev that we can probably retire | 19:35 |
clarkb | #link https://etherpad.opendev.org/p/opendev-repo-retirements List of repos to retire. Please double check | 19:35 |
clarkb | If we're happy with the list running through retirements for them will largely be mechanical and then we can abandon all related changes as phase one of the cleanup here | 19:36 |
fungi | i added one a few minutes ago | 19:37 |
clarkb | Then we can see where we are at and dive into system-config cleanups | 19:37 |
clarkb | fungi: thanks! | 19:37 |
fungi | abandoning the changes for those repos will also be necessary as a step in their retirement anyway | 19:37 |
clarkb | fungi: yup the two processes are tied together which is nice | 19:37 |
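As a rough illustration of the "abandon all related changes" step, a sketch against the standard Gerrit REST change query and abandon endpoints. The credentials, project name, and message are placeholders, not values from the discussion:

```python
# Sketch: abandon all open changes for a retired project via the Gerrit REST API.
import json
import requests

GERRIT = 'https://review.opendev.org'
AUTH = ('myuser', 'my-http-password')  # HTTP credentials from Gerrit settings (placeholder)

def gerrit_json(response):
    # Gerrit prefixes JSON responses with ")]}'" to prevent XSSI; strip the first line.
    response.raise_for_status()
    return json.loads(response.text.split('\n', 1)[1])

def abandon_open_changes(project, message):
    query = {'q': f'project:{project} status:open', 'n': 500}
    changes = gerrit_json(requests.get(f'{GERRIT}/a/changes/', params=query, auth=AUTH))
    for change in changes:
        # Abandon by change number; repeat the query if a repo has more than 500 open changes.
        requests.post(
            f"{GERRIT}/a/changes/{change['_number']}/abandon",
            json={'message': message},
            auth=AUTH,
        ).raise_for_status()

abandon_open_changes('opendev/some-retired-repo',
                     'This repository is being retired; abandoning open changes.')
```

In practice something like this would be run per repository from the retirement etherpad once the retirement changes themselves have merged.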
clarkb | The last item on the agenda was already covered (Gerrit mergability checking) | 19:38 |
clarkb | #topic Open Discussion | 19:39 |
clarkb | I'm working on a gitea 1.16 upgrade. First change to do that with confidence is an update to our gitea testing to ensure we're exercising the ssh container: https://review.opendev.org/c/opendev/system-config/+/828203 | 19:39 |
clarkb | The child change which actually upgrades to 1.16 should probably be WIP (I'll do that now) until we've had time to go over the changelog a bit more and hold a node to look for any problems | 19:40 |
clarkb | As a general rule the bugfix releases have been fine with gitea but the feature releases have been more problematic and I don't mind taking some time to check stuff | 19:40 |
frickler | do we know how much quota we have for our log storage and how much of it we are using? | 19:41 |
frickler | the cirros artifacts, once I manage to collect them properly, are pretty large and I don't want to explode anything | 19:41 |
clarkb | frickler: I don't think quota was ever explicitly stated. Instead we maintained an expiry of 30 days which is what we did prior to the move to swift | 19:42 |
clarkb | The swift apis in the clouds we use for this should be able to give us container size iirc | 19:42 |
clarkb | but I haven't used the swift tooling in a while so could be wrong about that | 19:42 |
frickler | o.k., so I can try to look into checking that manually, thx | 19:42 |
clarkb | note that we shard the containers too so you might need a script to get it | 19:43 |
clarkb | but that should be doable | 19:43 |
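A sketch of the kind of script clarkb mentions, summing the sharded log containers with openstacksdk. The cloud name and container prefix are assumptions for illustration; the count/bytes fields are the ones the SDK exposes on container listings:

```python
# Sketch: total up object count and bytes used across sharded log containers.
import openstack

conn = openstack.connect(cloud='logs-cloud')  # assumed clouds.yaml entry

total_bytes = 0
total_objects = 0
for container in conn.object_store.containers():
    # The shard naming here is a placeholder; adjust to the real container prefix.
    if not container.name.startswith('zuul_logs'):
        continue
    total_bytes += container.bytes
    total_objects += container.count

print(f'{total_objects} objects, {total_bytes / 1024**3:.1f} GiB across shards')
```

If the log shards are the only containers in the account, a HEAD on the account itself would also report the total bytes used in one call.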
fungi | we had a request come through backchannels at the foundation from a user who is looking to have an old ethercalc spreadsheet restored (it was defaced or corrupted at some point in the past). unfortunately the most they know is that the content was intact as of mid-2019. any idea if we even have backups stretching back that far (not to mention that they'd be in bup not borg)? | 19:43 |
ianw | hrm, i think the answer is no | 19:44 |
clarkb | fungi: I think if you login to the two backup servers you can see what volumes are mounted and see if we have any of the old volumes, but I suspect ya, that's right | 19:44 |
clarkb | your best bet is probably to get the oldest version in borg and spin up an ethercalc locally off of that and see if the content was corrupted then | 19:44 |
fungi | i was going to say that we have no easy way to restore the data regardless, but also if we don't even have the data still then that's pretty much the end of it | 19:45 |
clarkb | (which is unfortunately a lot of work) | 19:45 |
ianw | unless deletion is just a bit flip of "this is just deleted" | 19:45 |
ianw | you can quickly mount the borg backups on the host | 19:45 |
clarkb | ianw: it looks like ethercalc doesn't store history of edits like etherpad does unfortunately | 19:45 |
clarkb | it is far more "ether" :) | 19:45 |
fungi | yeah, ethercalc itself can't unwind edits | 19:45 |
fungi | once a pad is defaced, the way you fix it is to reimport the backup export you made | 19:46 |
clarkb | but also it is redis so there isn't an easy "just grab these rows from the db backup" option | 19:46 |
fungi | in this case the user did make a backup, but onto a device whose hard drive apparently died recently | 19:46 |
clarkb | it's a whole db backup and I think you have to use it that way | 19:46 |
ianw | ethercalc02-filesystem-2021-02-28T05:51:02 is the earliest | 19:47 |
fungi | oh, thanks for checking! | 19:47 |
ianw | (/usr/local/bin/borg-mount backup02.vexxhost.opendev.org on the host) | 19:47 |
fungi | that's the borg backup. do we still have older bup backups? | 19:48 |
fungi | or did we delete the bup servers? | 19:48 |
clarkb | fungi: correct the bup servers were deleted. Their volumes were kept for a time. Then also eventually deleted iirc | 19:49 |
clarkb | you could double check that the bup volumes are gone. it's possible they remained. But pretty sure the servers did not | 19:49 |
ianw | neither has bup backups mounted | 19:49 |
fungi | yeah, okay, so sounds like that's gone, thanks | 19:49 |
ianw | i have in my old todo notes "cleanup final bup backup volume 2021-07" | 19:50 |
clarkb | corvus: made a thing that was really cool I wanted to call out https://twitter.com/acmegating/status/1490821104918618112 | 19:50 |
ianw | but no note that i did that ... fungi i'll poke to make sure | 19:51 |
clarkb | shows how we can do testing of gerrit upstream in our downstream jobs with our configs | 19:51 |
clarkb | (using changes that fungi and I wrote as illustrations) | 19:51 |
ianw | nice! | 19:52 |
fungi | now if they'll only merge your fix so i can get on with plastering over gitiles | 19:52 |
clarkb | Sounds like that may be it. Thank you everyone! | 19:54 |
clarkb | we'll see you here next week | 19:54 |
clarkb | #endmeeting | 19:54 |
opendevmeet | Meeting ended Tue Feb 8 19:54:41 2022 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:54 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2022/infra.2022-02-08-19.01.html | 19:54 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2022/infra.2022-02-08-19.01.txt | 19:54 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2022/infra.2022-02-08-19.01.log.html | 19:54 |
fungi | thanks clarkb! | 19:55 |