Thursday, 2025-03-13

fricklerJayF: do you still need your ironic-inspector-grenade node? just saw that while cleaning up my own held nodes06:28
JayFfrickler: no 13:29
JayFThanks and sorry I left it 😅13:29
fungithe /opt/backups-202010 volume on backup02.ca-ymq-1.vexxhost has reached 90% again, so i've started pruning it in a root screen session13:35
fricklerJayF: no problem, thanks for the update, deleted the node13:49
*** benj_9 is now known as benj_14:19
clarkbI don't see anything in scrollback indicating problems with the new vexxhost mirrors. I'll get a change up to start removing the old mirrors from config14:50
opendevreviewClark Boylan proposed opendev/system-config master: Remove old vexxhost mirrors  https://review.opendev.org/c/opendev/system-config/+/94426914:56
opendevreviewClark Boylan proposed opendev/zone-opendev.org master: Remove old vexxhost mirrors from DNS  https://review.opendev.org/c/opendev/zone-opendev.org/+/94427014:58
clarkbThe gerrit meets event next week happens at midnight GMT Wednesday (that's a fun one to try and do a time conversion for: 5pm pacific tuesday?) and will be covering Gerrit caches15:18
clarkbconsidering our struggles with caches I'm going to try and attend15:19
clarkbbut figured I'd mention it in case anyone else wants to try too (usually they stream it to youtube and then you can send questions through discord)15:19
*** benj_4 is now known as benj_15:21
fungii think gmt==utc for now still, as bst hasn't started yet15:39
fungibut i could be wrong, i didn't check the start/end dates for bst15:39
clarkbfungi: yes that seems to be the case. I punched it into my calendar tool and selected GMT and it gave me 5pm Tuesday pacific which is Midnight UTC15:40
clarkbthe UK starts summer time on the last Sunday of March15:40
frickleriiuc GMT==UTC always holds, british daylight saving time is GMT+1? same start+end as in europe also15:46
clarkbah yup Internet says that is true for most purposes (they are measured differently apparently but otherwise equivalent)15:47
clarkband British Summer Time (BST) is the timezone for summer that they will switch to at the end of the month15:47
fungiright, that's like saying usa est is always UTC-5, which is true, but when people schedule meetings in "est" they typically also mean "edt" (utc-4) in the summer15:49
fungii've definitely run into places where meetings were scheduled in "gmt" which really also meant bst during the months where that's valid15:50
clarkbya though maybe in the UK the distinction is more explicit because greenwich mean time and british summer time don't share any words but time15:50
clarkbfun15:50
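
For reference, the conversion clarkb punched into a calendar tool can be reproduced with Python's zoneinfo; the meeting date below (the Wednesday of the following week) is an assumption for illustration:

    from datetime import datetime
    from zoneinfo import ZoneInfo

    # Midnight GMT on Wednesday; 2025-03-19 is an assumed date for "next week"
    meeting = datetime(2025, 3, 19, 0, 0, tzinfo=ZoneInfo("Etc/GMT"))
    print(meeting.astimezone(ZoneInfo("America/Los_Angeles")))
    # -> 2025-03-18 17:00:00-07:00, i.e. 5pm Tuesday Pacific
    print(meeting.astimezone(ZoneInfo("Europe/London")))
    # -> 2025-03-19 00:00:00+00:00 (GMT); a 00:00 UTC meeting after the last
    #    Sunday of March would show up in London as 01:00 BST instead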
fungi#status log Pruned backups on backup02.ca-ymq-1.vexxhost reducing volume usage from 90% to 62%16:00
opendevstatusfungi: finished logging16:01
JayFfungi: BST is in a couple of weeks; I'm in my "all the meetings with UK folks are later" period :D  16:02
JayFoh, clark with the real specific info :D 16:02
clarkbhttps://review.opendev.org/c/opendev/system-config/+/944269 is ready when we are (removes old vexxhost mirrors from inventory)16:02
*** benj_2 is now known as benj_16:20
*** benj_3 is now known as benj_16:29
clarkbdouble checking the periodic pipeline from last night it looks like it failed on infra-prod-base. Pulling up base.yaml.log on bridge the issue is a dpkg.lock problem on storyboard. Is that typically a transient thing or do we need to go clear out locks and stuff?16:52
clarkbE: Could not get lock /var/lib/dpkg/lock-frontend - open (11: Resource temporarily unavailable)16:52
clarkbI'm not worried about the periodic pipeline and the higher semaphore limit as adding the new mirrors yesterday ran all the jobs so we've already covered that16:53
clarkbseparately each gitea backend appears happy so I think the fix of forcing things through apache is working16:55
fungiclarkb: transient, i was able to manually update package indices on it just now with no error16:58
clarkbunattended upgrades timing I guess?16:59
fungii can confirm, just a sec17:00
fungilooks like it ran today at 02:14:10 and 06:49:2817:02
clarkb"start": "2025-03-13 02:18:08.040765" <- from the afiled ansible task17:02
clarkbany idea why it would run at the 02:00 time? I think 06:00 is normal right?17:02
fungiit's random17:03
clarkbah17:03
clarkbfwiw ansible was trying to run apt-get autoremove to remove extra deps17:03
clarkbI wonder if we should make that a failed_when false task17:03
clarkbthough getting signal that something is wrong if packages can't be removed and should be is probably a good idea so I'm not in a hurry to do that. Just throwing the idea out there17:04
fungiyesterday it ran at 06:59:02 but the day before it was 06:41:28 and 14:02:5117:04
clarkbif this works 95% of the time it's probably good enough17:04
fungiwe could consider turning off unattended-upgrades autorun and instead invoking it with a periodic zuul job holding a semaphore17:05
clarkbya that's an idea. Just invoke the command via ansible in infra-prod-base. That may make that job run a fair bit longer though17:05
opendevreviewMerged opendev/system-config master: Remove old vexxhost mirrors  https://review.opendev.org/c/opendev/system-config/+/94426917:06
clarkbany job it goes into would need to be holding an exclusive lock as other playbooks often do apt tasks17:06
clarkbso a new job isn't necessarily beneficial I don't think17:06
clarkbunless we only run that job in periodic and not deploy etc17:06
clarkbinfra-prod-unattended-upgrades then have infra-prod-base soft depend on that and only run it in periodic?17:08
clarkbprobably not necessary at this point but something to consider if periodic is extra flaky due to unattended upgrades17:08
fungiyeah, the more servers we have the higher the chances that at least one is checking for package updates when periodics trigger17:09
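
As an aside on the failure mode: the "Resource temporarily unavailable" above is apt reporting errno 11 (EAGAIN) from a non-blocking lock attempt on the dpkg frontend lock while unattended-upgrades holds it. A minimal Python sketch of checking that lock before running apt, purely illustrative and not something the playbooks currently do:

    import fcntl
    import time

    LOCK = "/var/lib/dpkg/lock-frontend"

    def dpkg_lock_free() -> bool:
        # Try a non-blocking exclusive lock; apt does the same and reports
        # EAGAIN as "Resource temporarily unavailable" when it loses the race
        with open(LOCK, "a") as fd:
            try:
                fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                fcntl.lockf(fd, fcntl.LOCK_UN)
                return True
            except BlockingIOError:
                return False

    # Bounded wait for unattended-upgrades to finish before running apt tasks
    for _ in range(30):
        if dpkg_lock_free():
            break
        time.sleep(10)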
clarkbthe deployment for 944269 is just about done. Should we pull those nodes from dns now?17:47
fungisure, approved 944270 now17:48
opendevreviewMerged opendev/zone-opendev.org master: Remove old vexxhost mirrors from DNS  https://review.opendev.org/c/opendev/zone-opendev.org/+/94427017:49
clarkbgreat. I'll work on clearing the cloud resources after lunch17:50
clarkbwill have to be careful with those since some are boot from volume and others aren't and will want to make sure we clean up those extra bits17:51
fungideploy for 944269 took right at 40 minutes17:51
fungiso that seems to be our new normal for inventory changes17:52
clarkband we might be able to shave another few minutes by increasing the semaphore limit but I think those returns will be less impressive17:52
clarkb1 to 2 was 2 hours to 1 hour. 2 to 4 was 1 hour to 40 minutes. Going to 6 might get us to 35 minutes?17:53
fungihard to say because of the event horizons where it temporarily coalesces to one running job while dependencies wait17:54
fungii don't think it will follow a well-defined curve17:54
clarkbgood point17:54
fungithough i do think we could get at least some improvements for these particular buildsets even taking it as high as 8x17:56
fungibut yeah, probably not going to get them much below 35 minutes just counting up the durations on the necessarily serialized parts17:57
funginot without refactoring the blocking jobs anyway17:57
fungiand we touch the inventory infrequently enough that there's probably not a lot of point in spending more time over-optimizing for that specific case17:58
clarkbstill, to go from at least 2 hours down to 40 minutes is great17:58
fungiyeah, this is plenty faster17:59
clarkbI'm still poking around gerrit replacement things. We put replication config in host vars not group vars (that's good, makes spinning up the new server safer/easier). In the host vars I notice that we still issue a cert for review.openstack.org. It doesn't hurt much to have that but do wonder if we think there is some point where we might not need to serve that url anymore18:03
clarkbor rather review.openstack.org is an altname for the LE cert18:03
clarkbwe can probably check apache logs to get an indication of usage but web crawlers might make that noisy18:04
clarkbalso review02 has no sshfp records (intentionally)18:09
clarkbcurrently there are about 4 bots competing for time on the old review.openstack.org vhost18:12
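
A rough sketch of that log check, splitting crawler traffic out of the old vhost's access log (the log path and combined log format are assumptions about the server layout):

    import re
    from collections import Counter

    BOT_RE = re.compile(r"bot|crawl|spider|scrapy", re.IGNORECASE)
    # combined log format ends with "referer" "user-agent"
    UA_RE = re.compile(r'"[^"]*" "(?P<ua>[^"]*)"\s*$')

    counts = Counter()
    with open("/var/log/apache2/review.openstack.org-access.log") as log:
        for line in log:
            match = UA_RE.search(line)
            agent = match.group("ua") if match else ""
            counts["crawler" if BOT_RE.search(agent) else "other"] += 1
    print(counts)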
fungiit's still serving a redirect to the new hostname so that seems fine. we have a number of those situations still, e.g. etherpad18:21
clarkbya just wondering if we want to keep the redirects forever or sunset them if the only active traffic is AI web crawler bots18:22
clarkbthe main concern with them is they rely on the cname redirect in openstack.org dns which I think you are the only one here that has access to modify now?18:23
clarkbthough I can probably get access too18:23
fungimy only real concern with dropping them is breaking links in old mailing list posts, blog entries, etc18:26
clarkbwe can also undo that later if we need to. Doesn't have to be part of the server replacement18:28
fungiwe don't need separate vhosts for them, could just rely on sni/san+redirect to canonical hostname in a single vhost18:28
fungiwe have it set up that way in other services for simplicity18:29
clarkbthough splitting the logs is nice18:29
clarkbanyway I just figured it was worth double checking assumptions. I don't feel strongly about it, just noticed it as something we are doing and current logs don't show a lot of activity. But if people are clicking old links that won't pop up as very active18:30
fungihttps://lists.debian.org/msgid-search/20250313182324.4kibyrjumhdhgt3g@shell.thinkmo.de is a rather fun bug18:51
fungi(weird netplan/systemd/cloud-init crash when no serial console is attached)18:53
JayFThere's a lot of unpleasantness around serial consoles. That's why, at least in the baremetal world, we've discouraged use of them, including measurable performance hits from just having console output logged to serial.18:56
fungiwell, in this case netplan was crashing when there was no attached virtual serial device, but when there was one the vm would boot up just fine18:58
clarkbI'm going to start cleaning up the old mirror resources in vexxhost. Looks like mirror02s are straightforward and go away as is as they don't have volumes attached. mirror01's have a boot volume and a cache volume, neither of which is set to cleanup on delete, so will need manual deletion after the server is gone19:17
fungisounds good, thanks19:18
clarkb#status log Deleted mirror02.sjc1 (cf0ad69f-3669-4a25-a172-176921626a51) and mirror02.ca-ymq-1 (4dae3327-2ca1-4c66-9595-20ef991633c6) in each corresponding vexxhost region. mirror03's in each region replaced them.19:20
clarkbeasy ones are done19:20
opendevstatusclarkb: finished logging19:20
clarkb#status log Deleted mirror01.sjc1.vexxhost.opendev.org (a23a4761-58f3-4237-a2da-30717f96a6fa) and its two volumes (7a1f6f6e-49ff-421d-a320-df526a5162f3, a1a165ab-9b9c-4207-bdfa-71aadb5d75ca) as mirror03 has replaced it.19:35
opendevstatusclarkb: finished logging19:35
clarkbI don't know if anyone in nova or cinder land is listening but one of the problems I have with boot from volume is that when you delete an instance its old boot volume is now detached in a way that is not easily tied back to the old instance in any way as far as I can tell19:36
clarkbyou have to take extra steps if you wish to track that info. This makes it easy to orphan volumes and not know what they are for19:36
clarkband then you have to remember to check anytime you delete an instance that might be bfv to take a bunch of notes first19:36
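
The workaround described here, recording a server's volume IDs before deleting it so the detached volumes can still be identified and cleaned up, looks roughly like this with openstacksdk (the cloud and server names are illustrative):

    import openstack

    conn = openstack.connect(cloud="vexxhost-sjc1")  # cloud name is illustrative
    server = conn.compute.find_server("mirror01.sjc1.opendev.org",
                                      ignore_missing=False)

    # Record the attached volume IDs first; once the server is deleted there is
    # nothing on the detached volumes tying them back to the old instance.
    volume_ids = [attachment["id"] for attachment in server.attached_volumes]
    print(server.id, volume_ids)

    conn.compute.delete_server(server)
    conn.compute.wait_for_delete(server)
    for volume_id in volume_ids:
        conn.block_storage.delete_volume(volume_id, ignore_missing=False)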
clarkb#status log Deleted mirror01.ca-ymq-1.vexxhost.opendev.org (a064b2e3-f47c-4e70-911d-363c81ccff5e) and its two volumes (6590c0fd-6cfb-49fa-bdf0-8636704642ca, eca67816-54a7-4be6-b14f-9b0e733a804f) as mirror03 has replaced it.19:41
clarkbI tried to be very careful and complete19:41
opendevstatusclarkb: finished logging19:42
clarkbI've got a science fair to get to in a couple hours but https://review.opendev.org/c/opendev/system-config/+/943819 may be another good one to get landed (switches hound to python3.12 which is really only used for jeepyb to set up the hound config iirc)21:57
clarkbthoughts on that? if we don't do it now I'll try to do it first thing tomorrow21:57
fungiyeah, that's pretty straightforward, approved22:01
clarkbya the screenshots confirmed it seemed to get a working config so I expect it is safe too22:06
clarkbI'm going to slowly try to roll out more python3.12. As great as 3.11 has been we can't stay parked there forever22:06
clarkbthinking about the bfv problem I described earlier: I wonder if nova/cinder should simply name the volume "Boot device for instancename (instanceuuid)" or similar22:08
opendevreviewMerged opendev/system-config master: Update Hound image to python3.12  https://review.opendev.org/c/opendev/system-config/+/94381922:39
clarkbthat is deploying and I'll check the service afterwards22:41
clarkbI expect it will be about a 5-10 minute outage while hound starts up then should be back to normal after22:42
fungiyeah, it takes time to reindex all the code22:42
fungideploy succeeded22:47
fungihound is still starting though, of course22:47
clarkbya I'm checking the container log occasionally to see if it is done spinning up22:49
clarkbit's still chewing through repos as far as I can tell22:49
clarkbit should be up now22:53
clarkbhttps://codesearch.opendev.org/?q=whereto&i=nope&literal=nope&files=&excludeFiles=&repos= works for me22:53
fungiyep23:02
clarkbI'm popping out now for science fair things23:25
fungifor SCIENCE!23:26
fungihave fun!23:26
