frickler | JayF: do you still need your ironic-inspector-grenade node? just saw that while cleaning up my own held nodes | 06:28 |
JayF | frickler: no | 13:29 |
JayF | Thanks and sorry I left it 😅 | 13:29 |
fungi | the /opt/backups-202010 volume on backup02.ca-ymq-1.vexxhost has reached 90% again, so i've started pruning it in a root screen session | 13:35 |
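A minimal sketch of the kind of pruning involved, assuming borg-managed repositories on the /opt/backups-202010 volume; the repository path and retention values below are illustrative assumptions, not the production configuration:

```shell
# check how full the backup volume is (mount point assumed)
df -h /opt/backups-202010

# prune old archives from one repository; retention values are illustrative only
borg prune --list --stats \
    --keep-daily 7 --keep-weekly 4 --keep-monthly 6 \
    /opt/backups-202010/example-server/borg

# borg >= 1.2 needs an explicit compact to actually free the space
borg compact /opt/backups-202010/example-server/borg
```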
frickler | JayF: no problem, thanks for the update, deleted the node | 13:49 |
*** benj_9 is now known as benj_ | 14:19 | |
clarkb | I don't see anything in scrollback indicating problems with the new vexxhost mirrors. I'll get a change up to start removing the old mirrors from config | 14:50 |
opendevreview | Clark Boylan proposed opendev/system-config master: Remove old vexxhost mirrors https://review.opendev.org/c/opendev/system-config/+/944269 | 14:56 |
opendevreview | Clark Boylan proposed opendev/zone-opendev.org master: Remove old vexxhost mirrors from DNS https://review.opendev.org/c/opendev/zone-opendev.org/+/944270 | 14:58 |
clarkb | The Gerrit meets event next week happens at midnight GMT Wednesday (that's a fun one to try a time conversion for: 5pm Pacific Tuesday?) and will be covering Gerrit caches | 15:18 |
clarkb | considering our struggles with caches I'm going to try and attend | 15:19 |
clarkb | but figured I'd mention it in case anyone else wants to try too (usually they stream it to youtube and then you can send questions through discord) | 15:19 |
*** benj_4 is now known as benj_ | 15:21 | |
fungi | i think gmt==utc for now still, as bst hasn't started yet | 15:39 |
fungi | but i could be wrong, i didn't check the start/end dates for bst | 15:39 |
clarkb | fungi: yes that seems to be the case. I punched it into my calendar tool and selected GMT and it gave me 5pm Tuesday pacific which is Midnight UTC | 15:40 |
clarkb | the UK starts summer time on the last Sunday of March | 15:40 |
frickler | iiuc GMT==UTC always holds, british daylight saving time is GMT+1? same start+end as in europe also | 15:46 |
clarkb | ah yup, the internet says that is true for most purposes (they are measured differently apparently but otherwise equivalent) | 15:47 |
clarkb | and British Summer Time (BST) is the timezone for summer that they will switch to at the end of the month | 15:47 |
fungi | right, that's like saying usa est is always UTC-5, which is true, but when people schedule meetings in "est" they typically also mean "edt" (utc-4) in the summer | 15:49 |
fungi | i've definitely run into places where meetings were scheduled in "gmt" which really also meant bst during the months where that's valid | 15:50 |
clarkb | ya though maybe in the UK the distinction is more explicit because greenwich mean time and british summer time don't share any words but time | 15:50 |
clarkb | fun | 15:50 |
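A quick way to sanity-check that conversion from a shell, assuming GNU date and that the meeting falls on Wednesday 2025-03-19: in mid-March London is still on GMT while the US west coast is already on PDT.

```shell
# midnight GMT on Wednesday 2025-03-19, rendered in two other zones
TZ="America/Los_Angeles" date -d "2025-03-19 00:00 GMT"  # Tue Mar 18 17:00:00 PDT 2025
TZ="Europe/London" date -d "2025-03-19 00:00 GMT"        # Wed Mar 19 00:00:00 GMT 2025 (BST starts Mar 30)
```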
fungi | #status log Pruned backups on backup02.ca-ymq-1.vexxhost reducing volume usage from 90% to 62% | 16:00 |
opendevstatus | fungi: finished logging | 16:01 |
JayF | fungi: BST is in a couple of weeks; I'm in my "all the meetings with UK folks are later" period :D | 16:02 |
JayF | oh, clark with the real specific info :D | 16:02 |
clarkb | https://review.opendev.org/c/opendev/system-config/+/944269 is ready when we are (removes old vexxhost mirrors from inventory) | 16:02 |
*** benj_2 is now known as benj_ | 16:20 | |
*** benj_3 is now known as benj_ | 16:29 | |
clarkb | double checking the periodic pipeline from last night it looks like it failed on infra-prod-base. Pulling up base.yaml.log on bridge the issue is a dpkg lock problem on storyboard. Is that typically a transient thing or do we need to go clear out locks and stuff? | 16:52 |
clarkb | E: Could not get lock /var/lib/dpkg/lock-frontend - open (11: Resource temporarily unavailable) | 16:52 |
clarkb | I'm not worried about the periodic pipeline and the higher semaphore limit as adding the new mirrors yesterday ran all the jobs, so we've already covered that | 16:53 |
clarkb | separately each gitea backend appears happy so I think the fix of forcing things through apache is working | 16:55 |
fungi | clarkb: transient, i was able to manually update package indices on it just now with no error | 16:58 |
clarkb | unattended upgrades timing I guess? | 16:59 |
fungi | i can confirm, just a sec | 17:00 |
fungi | looks like it ran today at 02:14:10 and 06:49:28 | 17:02 |
clarkb | "start": "2025-03-13 02:18:08.040765" <- from the afiled ansible task | 17:02 |
clarkb | any idea why it would run at the 02:00 time? I think 06:00 is normal right? | 17:02 |
fungi | it's random | 17:03 |
clarkb | ah | 17:03 |
clarkb | fwiw ansible was trying to run apt-get autoremove to remove extra deps | 17:03 |
clarkb | I wonder if we should make that a failed_when false task | 17:03 |
clarkb | though getting signal that something is wrong if packages can't be removed and should be is probably a good idea so I'm not in a hurry to do that. Just throwing the idea out there | 17:04 |
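A rough sketch of the middle ground between failing hard and failed_when: false: retry for a bounded time when the dpkg lock is busy, then surface the real failure. This is only an illustration of the idea, not what the playbook currently does (Ansible's task-level retries/until would be the equivalent there).

```shell
#!/bin/bash
# retry apt-get autoremove a few times in case unattended-upgrades is
# holding the dpkg lock; a real implementation would distinguish lock
# contention from other apt failures before retrying
for attempt in 1 2 3 4 5; do
    if apt-get -y autoremove; then
        exit 0
    fi
    echo "apt failed (possibly dpkg lock busy), retry ${attempt}/5 in 60s" >&2
    sleep 60
done
exit 1
```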
fungi | yesterday it ran at 06:59:02 but the day before it was 06:41:28 and 14:02:51 | 17:04 |
clarkb | if this works 95% of the time it's probably good enough | 17:04 |
fungi | we could consider turning off unattended-upgrades autorun and instead invoking it with a periodic zuul job holding a semaphore | 17:05 |
clarkb | ya that's an idea. Just invoke the command via ansible in infra-prod-base. That may make that job run a fair bit longer though | 17:05 |
opendevreview | Merged opendev/system-config master: Remove old vexxhost mirrors https://review.opendev.org/c/opendev/system-config/+/944269 | 17:06 |
clarkb | any job it goes into would need to be holding an exclusive lock as other playbooks often do apt tasks | 17:06 |
clarkb | so a new job isn't necessarily beneficial I don't think | 17:06 |
clarkb | unless we only run that job in periodic and not deploy etc | 17:06 |
clarkb | infra-prod-unattended-upgrades then have infra-prod-base soft depend on that and only run it in periodic? | 17:08 |
clarkb | probably not necessary at this point but something to consider if periodic is extra flaky due to unattended upgrades | 17:08 |
fungi | yeah, the more servers we have the higher the chances that at least one is checking for package updates when periodics trigger | 17:09 |
clarkb | the deployment for 944269 is just about done. Should we pull those nodes from dns now? | 17:47 |
fungi | sure, approved 944270 now | 17:48 |
opendevreview | Merged opendev/zone-opendev.org master: Remove old vexxhost mirrors from DNS https://review.opendev.org/c/opendev/zone-opendev.org/+/944270 | 17:49 |
clarkb | great. I'll work on clearing the cloud resources after lunch | 17:50 |
clarkb | will have to be careful with those since some are boot from volume and others aren't, and will want to make sure we clean up those extra bits | 17:51 |
fungi | deploy for 944269 took right at 40 minutes | 17:51 |
fungi | so that seems to be our new normal for inventory changes | 17:52 |
clarkb | and we might be able to shave another few minutes by increasing the semaphore limit but I think those returns will be less impressive | 17:52 |
clarkb | 1 to 2 was 2 hours to 1 hour. 2 to 4 was 1 hour to 40 minutes. Going to 6 might get us to 35 minutes? | 17:53 |
fungi | hard to say because of the event horizons where it temporarily coalesces to one running job while dependencies wait | 17:54 |
fungi | i don't think it will follow a well-defined curve | 17:54 |
clarkb | good point | 17:54 |
fungi | though i do think we could get at least some improvements for these particular buildsets even taking it as high as 8x | 17:56 |
fungi | but yeah, probably not going to get them much below 35 minutes just counting up the durations on the necessarily serialized parts | 17:57 |
fungi | not without refactoring the blocking jobs anyway | 17:57 |
fungi | and we touch the inventory infrequently enough that there's probably not a lot of point in spending more time over-optimizing for that specific case | 17:58 |
clarkb | still, to go from at least 2 hours down to 40 minutes is great | 17:58 |
fungi | yeah, this is plenty faster | 17:59 |
clarkb | I'm still poking around gerrit replacement things. We put replication config in host vars not group vars (that's good; it makes spinning up the new server safer/easier). In the host vars I notice that we still issue a cert for review.openstack.org. It doesn't hurt much to have that but do wonder if we think there is some point where we might not need to serve that url anymore | 18:03 |
clarkb | or rather review.openstack.org is an altname for the LE cert | 18:03 |
clarkb | we can probably check apache logs to get an indication of usage but web crawlers might make that noisy | 18:04 |
clarkb | also review02 has no sshfp records (intentionally) | 18:09 |
clarkb | currently there are about 4 bots competing for time on the old review.openstack.org vhost | 18:12 |
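One way to gauge whether anything besides crawlers still hits the old vhost is to tally requests per user agent; a sketch assuming a combined-format access log (the log path is a placeholder, not the real filename):

```shell
# count requests per user agent on the old vhost's access log
awk -F'"' '{print $6}' /var/log/apache2/review.openstack.org-access.log \
    | sort | uniq -c | sort -rn | head -20
```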
fungi | it's still serving a redirect to the new hostname so that seems fine. we have a number of those situations still, e.g. etherpad | 18:21 |
clarkb | ya just wondering if we want to keep the redirects forever or sunset them if the only active traffic is AI web crawler bots | 18:22 |
clarkb | the main concern with them is they rely on the cname redirect in openstack.org dns which I think you are the only one here that has access to modify now? | 18:23 |
clarkb | though I can probably get access too | 18:23 |
fungi | my only real concern with dropping them is breaking links in old mailing list posts, blog entries, etc | 18:26 |
clarkb | we can also undo that later if we need to. Doesn't have to be part of the server replacement | 18:28 |
fungi | we don't need separate vhosts for them, could just rely on sni/san+redirect to canonical hostname in a single vhost | 18:28 |
fungi | we have it set up that way in other services for simplicity | 18:29 |
clarkb | though splitting the logs is nice | 18:29 |
clarkb | anyway I just figured it was worth double checking assumptions. I don't feel strongly about it, just noticed it as something we are doing and current logs don't show a lot of activity. But if people are clicking old links that won't pop up as very active | 18:30 |
fungi | https://lists.debian.org/msgid-search/20250313182324.4kibyrjumhdhgt3g@shell.thinkmo.de is a rather fun bug | 18:51 |
fungi | (weird netplan/systemd/cloud-init crash when no serial console is attached) | 18:53 |
JayF | There's a lot of unpleasantness around serial consoles. That's why, at least in the baremetal world, we've discouraged use of them. Including measurable performance hits from just having console logged to serial console. | 18:56 |
fungi | well, in this case netplan was crashing when there was no attached virtual serial device, but when there was one the vm would boot up just fine | 18:58 |
clarkb | I'm going to start cleaning up the old mirror resources in vexxhost. Looks like mirror02s are straightforward and go away as-is since they don't have volumes attached. mirror01s have a boot volume and a cache volume, neither of which is set to clean up on delete, so those will need manual deletion after the server is gone | 19:17 |
fungi | sounds good, thanks | 19:18 |
clarkb | #status log Deleted mirror02.sjc1 (cf0ad69f-3669-4a25-a172-176921626a51) and mirror02.ca-ymq-1 (4dae3327-2ca1-4c66-9595-20ef991633c6) in each corresponding vexxhost region. mirror03's in each region replaced them. | 19:20 |
clarkb | easy ones are done | 19:20 |
opendevstatus | clarkb: finished logging | 19:20 |
clarkb | #status log Deleted mirror01.sjc1.vexxhost.opendev.org (a23a4761-58f3-4237-a2da-30717f96a6fa) and its two volumes (7a1f6f6e-49ff-421d-a320-df526a5162f3, a1a165ab-9b9c-4207-bdfa-71aadb5d75ca) as mirror03 has replaced it. | 19:35 |
opendevstatus | clarkb: finished logging | 19:35 |
clarkb | I don't know if anyone in nova or cinder land is listening, but one of the problems I have with boot from volume is that when you delete an instance, its old boot volume is left detached in a way that is not easily tied back to the old instance, as far as I can tell | 19:36 |
clarkb | you have to take extra steps if you wish to track that info. This makes it easy to orphan volumes and not know what they are for | 19:36 |
clarkb | and then you have to remember to check anytime you delete an instance that might be bfv to take a bunch of notes first | 19:36 |
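The manual workaround being described is roughly: record the attached volume IDs before deleting the server, then delete the volumes by ID afterwards. A sketch with the plain openstack CLI (server name and volume IDs are placeholders):

```shell
# note the attached volume IDs while the server still exists,
# since the association disappears once the instance is deleted
openstack server show mirror01.example.opendev.org

# delete the instance, then the now-orphaned volumes by the recorded IDs
openstack server delete mirror01.example.opendev.org
openstack volume delete <boot-volume-uuid> <cache-volume-uuid>
```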
clarkb | #status log Deleted mirror01.ca-ymq-1.vexxhost.opendev.org (a064b2e3-f47c-4e70-911d-363c81ccff5e) and its two volumes (6590c0fd-6cfb-49fa-bdf0-8636704642ca, eca67816-54a7-4be6-b14f-9b0e733a804f) as mirror03 has replaced it. | 19:41 |
clarkb | I tried to be very careful and complete | 19:41 |
opendevstatus | clarkb: finished logging | 19:42 |
clarkb | I've got a science fair to get to in a couple hours but https://review.opendev.org/c/opendev/system-config/+/943819 may be another good one to get landed (switches hound to python3.12 which is really only used for jeepyb to set up the hound config iirc) | 21:57 |
clarkb | thoughts on that? if we don't do it now I'll try to do it first thing tomorrow | 21:57 |
fungi | yeah, that's pretty straightforward, approved | 22:01 |
clarkb | ya the screenshots confirmed it seemed to get a working config so I expect it is safe too | 22:06 |
clarkb | I'm going to slowly try to roll out more python3.12. As great as 3.11 has been we can't stay parked there forever | 22:06 |
clarkb | thinking about the bfv problem I described earlier: I wonder if nova/cinder should simply name the volume "Boot device for instancename (instanceuuid)" or similar | 22:08 |
opendevreview | Merged opendev/system-config master: Update Hound image to python3.12 https://review.opendev.org/c/opendev/system-config/+/943819 | 22:39 |
clarkb | that is deploying and I'll check the service afterwards | 22:41 |
clarkb | I expect it will be about a 5-10 minute outage while hound starts up then should be back to normal after | 22:42 |
fungi | yeah, it takes time to reindex all the code | 22:42 |
fungi | deploy succeeded | 22:47 |
fungi | hound is still starting though, of course | 22:47 |
clarkb | ya I'm checking the container log occasionally to see if it is done spinning up | 22:49 |
clarkb | its still chewing through repos as far as I can tell | 22:49 |
clarkb | it should be up now | 22:53 |
clarkb | https://codesearch.opendev.org/?q=whereto&i=nope&literal=nope&files=&excludeFiles=&repos= works for me | 22:53 |
fungi | yep | 23:02 |
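A scripted version of that check, just polling the public endpoint for an HTTP 200 (a simple probe sketch, not how the deploy job verifies the service):

```shell
# poll the codesearch frontend until it answers 200 or we give up
for i in $(seq 1 30); do
    code=$(curl -s -o /dev/null -w '%{http_code}' https://codesearch.opendev.org/)
    [ "$code" = "200" ] && echo "hound is up" && exit 0
    sleep 20
done
echo "hound still not responding after 10 minutes" >&2
exit 1
```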
clarkb | I'm popping out now for science fair things | 23:25 |
fungi | for SCIENCE! | 23:26 |
fungi | have fun! | 23:26 |