Thursday, 2025-03-13

fricklerJayF: do you still need your ironic-inspector-grenade node? just saw that while cleaning up my own held nodes06:28
JayFfrickler: no 13:29
JayFThanks and sorry I left it 😅13:29
fungithe /opt/backups-202010 volume on backup02.ca-ymq-1.vexxhost has reached 90% again, so i've started pruning it in a root screen session13:35
fricklerJayF: no problem, thanks for the update, deleted the node13:49
*** benj_9 is now known as benj_14:19
clarkbI don't see anything in scrollback indicating problems with the new vexxhost mirrors. I'll get a change up to start removing the old mirrors from config14:50
opendevreviewClark Boylan proposed opendev/system-config master: Remove old vexxhost mirrors  https://review.opendev.org/c/opendev/system-config/+/94426914:56
opendevreviewClark Boylan proposed opendev/zone-opendev.org master: Remove old vexxhost mirrors from DNS  https://review.opendev.org/c/opendev/zone-opendev.org/+/94427014:58
clarkbThe gerrit meets event next week happens at midnight GMT Wednesday (that's a fun one to try and do a time conversion for: 5pm pacific tuesday?) and will be covering Gerrit caches15:18
clarkbconsidering our struggles with caches I'm going to try and attend15:19
clarkbbut figured I'd mention it in case anyone else wants to try too (usually they stream it to youtube and then you can send questions through discord)15:19
*** benj_4 is now known as benj_15:21
fungii think gmt==utc for now still, as bst hasn't started yet15:39
fungibut i could be wrong, i didn't check the start/end dates for bst15:39
clarkbfungi: yes that seems to be the case. I punched it into my calendar tool and selected GMT and it gave me 5pm Tuesday pacific which is Midnight UTC15:40
clarkbthe UK starts summer time on the last Sunday of March15:40
frickleriiuc GMT==UTC always holds, british daylight saving time is GMT+1? same start+end as in europe also15:46
clarkbah yup Internet says that is true for most purposes (they are measured differently apparently but otherwise equivalent)15:47
clarkband British Summer Time (BST) is the timezone for summer that they will switch to at the end of the month15:47
fungiright, that's like saying usa est is always UTC-5, which is true, but when people schedule meetings in "est" they typically also mean "edt" (utc-4) in the summer15:49
fungii've definitely run into places where meetings were scheduled in "gmt" which really also meant bst during the months where that's valid15:50
clarkbya though maybe in the UK the distinction is more explicit because greenwich mean time and british summer time don't share any words but time15:50
clarkbfun15:50
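
For reference, the conversion clarkb punched into a calendar tool can be reproduced with Python's zoneinfo; the meeting date below (the Wednesday of the following week) is an assumption for illustration:

    from datetime import datetime
    from zoneinfo import ZoneInfo

    # Midnight GMT on Wednesday; 2025-03-19 is an assumed date for "next week"
    meeting = datetime(2025, 3, 19, 0, 0, tzinfo=ZoneInfo("Etc/GMT"))
    print(meeting.astimezone(ZoneInfo("America/Los_Angeles")))
    # -> 2025-03-18 17:00:00-07:00, i.e. 5pm Tuesday Pacific
    print(meeting.astimezone(ZoneInfo("Europe/London")))
    # -> 2025-03-19 00:00:00+00:00 (GMT); a 00:00 UTC meeting after the last
    #    Sunday of March would show up in London as 01:00 BST instead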
fungi#status log Pruned backups on backup02.ca-ymq-1.vexxhost reducing volume usage from 90% to 62%16:00
opendevstatusfungi: finished logging16:01
JayFfungi: BST is in a couple of weeks; I'm in my "all the meetings with UK folks are later" period :D  16:02
JayFoh, clark with the real specific info :D 16:02
clarkbhttps://review.opendev.org/c/opendev/system-config/+/944269 is ready when we are (removes old vexxhost mirrors from inventory)16:02
*** benj_2 is now known as benj_16:20
*** benj_3 is now known as benj_16:29
clarkbdouble checking the periodic pipeline from last night it looks like it failed on infra-prod-base. Pulling up base.yaml.log on bridge the issue is a dpkg.lock problem on storyboard. Is that typically a transient thing or do we need to go clear out locks and stuff?16:52
clarkbE: Could not get lock /var/lib/dpkg/lock-frontend - open (11: Resource temporarily unavailable)16:52
clarkbI'm not worried about the periodic pipeline and the higher semaphore limit as adding the new mirrors yesterday ran all the jobs so we've already covered that16:53
clarkbseparately each gitea backend appears happy so I think the fix of forcing things through apache is working16:55
fungiclarkb: transient, i was able to manually update package indices on it just now with no error16:58
clarkbunattended upgrades timing I guess?16:59
fungii can confirm, just a sec17:00
fungilooks like it ran today at 02:14:10 and 06:49:2817:02
clarkb"start": "2025-03-13 02:18:08.040765" <- from the afiled ansible task17:02
clarkbany idea why it would run at the 02:00 time? I think 06:00 is normal right?17:02
fungiit's random17:03
clarkbah17:03
clarkbfwiw ansible was trying to run apt-get autoremove to remove extra deps17:03
clarkbI wonder if we should make that a failed_when false task17:03
clarkbthough getting signal that something is wrong if packages can't be removed and should be is probably a good idea so I'm not in a hurry to do that. Just throwing the idea out there17:04
fungiyesterday it ran at 06:59:02 but the day before it was 06:41:28 and 14:02:5117:04
clarkbif this works 95% of the time it's probably good enough17:04
fungiwe could consider turning off unattended-upgrades autorun and instead invoking it with a periodic zuul job holding a semaphore17:05
clarkbya that's an idea. Just invoke the command via ansible in infra-prod-base. That may make that job run a fair bit longer though17:05
opendevreviewMerged opendev/system-config master: Remove old vexxhost mirrors  https://review.opendev.org/c/opendev/system-config/+/94426917:06
clarkbany job it goes into would need to be holding an exclusive lock as other playbooks often do apt tasks17:06
clarkbso a new job isn't necessarily beneficial I don't think17:06
clarkbunless we only run that job in periodic and not deploy etc17:06
clarkbinfra-prod-unattended-upgrades then have infra-prod-base soft depend on that and only run it in periodic?17:08
clarkbprobably not necessary at this point but something to consider if periodic is extra flaky due to unattended upgrades17:08
fungiyeah, the more servers we have the higher the chances that at least one is checking for package updates when periodics trigger17:09
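
As an aside on the failure mode: the "Resource temporarily unavailable" above is apt reporting errno 11 (EAGAIN) from a non-blocking lock attempt on the dpkg frontend lock while unattended-upgrades holds it. A minimal Python sketch of checking that lock before running apt, purely illustrative and not something the playbooks currently do:

    import fcntl
    import time

    LOCK = "/var/lib/dpkg/lock-frontend"

    def dpkg_lock_free() -> bool:
        # Try a non-blocking exclusive lock; apt does the same and reports
        # EAGAIN as "Resource temporarily unavailable" when it loses the race
        with open(LOCK, "a") as fd:
            try:
                fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                fcntl.lockf(fd, fcntl.LOCK_UN)
                return True
            except BlockingIOError:
                return False

    # Bounded wait for unattended-upgrades to finish before running apt tasks
    for _ in range(30):
        if dpkg_lock_free():
            break
        time.sleep(10)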
clarkbthe deployment for 944269 is just about done. Should we pull those nodes from dns now?17:47
fungisure, approved 944270 now17:48
opendevreviewMerged opendev/zone-opendev.org master: Remove old vexxhost mirrors from DNS  https://review.opendev.org/c/opendev/zone-opendev.org/+/94427017:49
clarkbgreat. I'll work on clearing the cloud resources after lunch17:50
clarkbwill have to be careful with those since some are boot from volume and others aren't and will want to make sure we clean up those extra bits17:51
fungideploy for 944269 took right at 40 minutes17:51
fungiso that seems to be our new normal for inventory changes17:52
clarkband we might be able to shave another few minutes by increasing the semaphore limit but I think those returns will be less impressive17:52
clarkb1 to 2 was 2 hours to 1 hour. 2 to 4 was 1 hour to 40 minutes. Going to 6 might get us to 35 minutes?17:53
fungihard to say because of the event horizons where it temporarily coalesces to one running job while dependencies wait17:54
fungii don't think it will follow a well-defined curve17:54
clarkbgood point17:54
fungithough i do think we could get at least some improvements for these particular buildsets even taking it as high as 8x17:56
fungibut yeah, probably not going to get them much below 35 minutes just counting up the durations on the necessarily serialized parts17:57
funginot without refactoring the blocking jobs anyway17:57
fungiand we touch the inventory infrequently enough that there's probably not a lot of point in spending more time over-optimizing for that specific case17:58
clarkbstill, to go from at least 2 hours down to 40 minutes is great17:58
fungiyeah, this is plenty faster17:59
clarkbI'm still poking around gerrit replacement things. We put replication config in host vars not group vars (that's good, makes spinning up the new server safer/easier). In the host vars I notice that we still issue a cert for review.openstack.org. It doesn't hurt much to have that but do wonder if we think there is some point where we might not need to serve that url anymore18:03
clarkbor rather review.openstack.org is an altname for the LE cert18:03
clarkbwe can probably check apache logs to get an indication of usage but web crawlers might make that noisy18:04
clarkbalso review02 has no sshfp records (intentionally)18:09
clarkbcurrently there are about 4 bots competing for time on the old review.openstack.org vhost18:12
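
A rough sketch of that log check, splitting crawler traffic out of the old vhost's access log (the log path and combined log format are assumptions about the server layout):

    import re
    from collections import Counter

    BOT_RE = re.compile(r"bot|crawl|spider|scrapy", re.IGNORECASE)
    # combined log format ends with "referer" "user-agent"
    UA_RE = re.compile(r'"[^"]*" "(?P<ua>[^"]*)"\s*$')

    counts = Counter()
    with open("/var/log/apache2/review.openstack.org-access.log") as log:
        for line in log:
            match = UA_RE.search(line)
            agent = match.group("ua") if match else ""
            counts["crawler" if BOT_RE.search(agent) else "other"] += 1
    print(counts)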
fungiit's still serving a redirect to the new hostname so that seems fine. we have a number of those situations still, e.g. etherpad18:21
clarkbya just wondering if we want to keep the redirects forever or sunset them if the only active traffic is AI web crawler bots18:22
clarkbthe main concern with them is they rely on the cname redirect in openstack.org dns which I think you are the only one here that has access to modify now?18:23
clarkbthough I can probably get access too18:23
fungimy only real concern with dropping them is breaking links in old mailing list posts, blog entries, etc18:26
clarkbwe can also undo that later if we need to. Doesn't have to be part of the server replacement18:28
fungiwe don't need separate vhosts for them, could just rely on sni/san+redirect to canonical hostname in a single vhost18:28
fungiwe have it set up that way in other services for simplicity18:29
clarkbthough splitting the logs is nice18:29
clarkbanyway I just figured it was worth double checking assumptions. I don't feel strongly about it, just noticed it as something we are doing and current logs don't show a lot of activity. But if people are clicking old links that won't pop up as very active18:30
fungihttps://lists.debian.org/msgid-search/20250313182324.4kibyrjumhdhgt3g@shell.thinkmo.de is a rather fun bug18:51
fungi(weird netplan/systemd/cloud-init crash when no serial console is attached)18:53
JayFThere's a lot of unpleasantness around serial consoles. That's why, at least in the baremetal world, we've discouraged use of them, including measurable performance hits from just having console output logged to serial.18:56
fungiwell, in this case netplan was crashing when there was no attached virtual serial device, but when there was one the vm would boot up just fine18:58
clarkbI'm going to start cleaning up the old mirror resources in vexxhost. Looks like mirror02s are straightforward and go away as is as they don't have volumes attached. mirror01's have a boot volume and a cache volume, neither of which is set to cleanup on delete, so will need manual deletion after the server is gone19:17
fungisounds good, thanks19:18
clarkb#status log Deleted mirror02.sjc1 (cf0ad69f-3669-4a25-a172-176921626a51) and mirror02.ca-ymq-1 (4dae3327-2ca1-4c66-9595-20ef991633c6) in each corresponding vexxhost region. mirror03's in each region replaced them.19:20
clarkbeasy ones are done19:20
opendevstatusclarkb: finished logging19:20
clarkb#status log Deleted mirror01.sjc1.vexxhost.opendev.org (a23a4761-58f3-4237-a2da-30717f96a6fa) and its two volumes (7a1f6f6e-49ff-421d-a320-df526a5162f3, a1a165ab-9b9c-4207-bdfa-71aadb5d75ca) as mirror03 has replaced it.19:35
opendevstatusclarkb: finished logging19:35
clarkbI don't know if anyone in nova or cinder land is listening but one of the problems I have with boot from volume is that when you delete an instance its old boot volume is now detached in a way that is not easily tied back to the old instance in any way as far as I can tell19:36
clarkbyou have to take extra steps if you wish to track that info. This makes it easy to orphan volumes and not know what they are for19:36
clarkband then you have to remember to check anytime you delete an instance that might be bfv to take a bunch of notes first19:36
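
The workaround described here, recording a server's volume IDs before deleting it so the detached volumes can still be identified and cleaned up, looks roughly like this with openstacksdk (the cloud and server names are illustrative):

    import openstack

    conn = openstack.connect(cloud="vexxhost-sjc1")  # cloud name is illustrative
    server = conn.compute.find_server("mirror01.sjc1.opendev.org",
                                      ignore_missing=False)

    # Record the attached volume IDs first; once the server is deleted there is
    # nothing on the detached volumes tying them back to the old instance.
    volume_ids = [attachment["id"] for attachment in server.attached_volumes]
    print(server.id, volume_ids)

    conn.compute.delete_server(server)
    conn.compute.wait_for_delete(server)
    for volume_id in volume_ids:
        conn.block_storage.delete_volume(volume_id, ignore_missing=False)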
clarkb#status log Deleted mirror01.ca-ymq-1.vexxhost.opendev.org (a064b2e3-f47c-4e70-911d-363c81ccff5e) and its two volumes (6590c0fd-6cfb-49fa-bdf0-8636704642ca, eca67816-54a7-4be6-b14f-9b0e733a804f) as mirror03 has replaced it.19:41
clarkbI tried to be very careful and complete19:41
opendevstatusclarkb: finished logging19:42
clarkbI've got a science fair to get to in a couple hours but https://review.opendev.org/c/opendev/system-config/+/943819 may be another good one to get landed (switches hound to python3.12 which is really only used for jeepyb to set up the hound config iirc)21:57
clarkbthoughts on that? if we don't do it now I'll try to do it first thing tomorrow21:57
fungiyeah, that's pretty straightforward, approved22:01
clarkbya the screenshots confirmed it seemed to get a working config so I expect it is safe too22:06
clarkbI'm going to slowly try to roll out more python3.12. As great as 3.11 has been we can't stay parked there forever22:06
clarkbthinking about the bfv problem I described earlier: I wonder if nova/cinder should simply name the volume "Boot device for instancename (instanceuuid)" or similar22:08
opendevreviewMerged opendev/system-config master: Update Hound image to python3.12  https://review.opendev.org/c/opendev/system-config/+/94381922:39
clarkbthat is deploying and I'll check the service afterwards22:41
clarkbI expect it will be about a 5-10 minute outage while hound starts up then should be back to normal after22:42
fungiyeah, it takes time to reindex all the code22:42
fungideploy succeeded22:47
fungihound is still starting though, of course22:47
clarkbya I'm checking the container log occasionally to see if it is done spinning up22:49
clarkbit's still chewing through repos as far as I can tell22:49
clarkbit should be up now22:53
clarkbhttps://codesearch.opendev.org/?q=whereto&i=nope&literal=nope&files=&excludeFiles=&repos= works for me22:53
fungiyep23:02
clarkbI'm popping out now for science fair things23:25
fungifor SCIENCE!23:26
fungihave fun!23:26
