opendevreview | Ian Wienand proposed opendev/system-config master: install-root-key : run on localhost https://review.opendev.org/c/opendev/system-config/+/944084 | 01:38 |
ianw | tonyb: what ever happened with the tzdata thing -> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_dee/944084/2/check/system-config-run-letsencrypt/deea11d/bridge99.opendev.org/ansible/install-root-key.2025-03-12T01%3A51%3A16.log | 02:28 |
tonyb | There is one patch that is ready to merge, but getting the rest merged means larger work which I'm hoping to get to this week | 02:57 |
*** mrunge_ is now known as mrunge | 06:39 | |
amorin | hey team, corvus, we won't be able to perform the flavor upgrade on March 17, the team is partially off. | 08:50 |
amorin | we will most likely be able to perform that in April. I'll keep you posted with a date proposal | 08:50 |
opendevreview | Karolina Kula proposed opendev/glean master: WIP: Add support for CentOS 10 keyfiles https://review.opendev.org/c/opendev/glean/+/941672 | 09:35 |
karolinku | Hello, work on keyfile support is progressing (passing https://zuul.opendev.org/t/openstack/build/35604b7b0b704cd8b0e3a43f1395b4b0), but I'm still missing two functionalities - IPv6 and bonding. I found some sample config which includes bonding, but now I'm looking for a config which would include IPv6 configuration (to have something I can base it on). Can you help me find something? | 11:12 |
slaweq | hi frickler and fungi - qq about zuul jobs, do we have any job that can be used in e.g. "post" queue to trigger quay.io's webhook to build container image there? | 12:44 |
tonyb | slaweq: I don't think we do. I think we have jobs that build in our CI and then publish to Quay.io | 12:52 |
tonyb | karolinku: Check out https://opendev.org/openstack/ironic/src/branch/master/ironic/tests/json_samples/network_data.json#L22-L33 and https://opendev.org/openstack/ironic/src/branch/master/ironic/tests/json_samples/network_data.json#L64-L82 | 12:53 |
slaweq | tonyb thx for info, I will check those then | 13:01 |
frickler | slaweq: not sure what is actually needed for quay.io, but we do have something to trigger webhooks for readthedocs builds, maybe you could adapt that | 13:02 |
slaweq | yes, I know that one | 13:02 |
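For reference, one way to adapt the readthedocs-style approach to quay.io would be a post-run task that simply POSTs to the repository's build-trigger webhook. This is only a sketch: the endpoint URL, token, and payload shape below are placeholder assumptions about quay.io's custom-git trigger, not anything currently configured in opendev.

```shell
#!/bin/bash
# Hypothetical post-job step: notify a quay.io build trigger so the image is
# rebuilt there. URL and payload fields are placeholders; take the real
# values from the trigger's settings in the quay.io UI.
QUAY_WEBHOOK_URL='https://quay.io/webhooks/push/trigger/<trigger-uuid>'  # placeholder
COMMIT_SHA="$(git rev-parse HEAD)"
curl -fsS -X POST "$QUAY_WEBHOOK_URL" \
  -H 'Content-Type: application/json' \
  -d "{\"commit\": \"${COMMIT_SHA}\", \"ref\": \"refs/heads/master\", \"default_branch\": \"master\"}"
```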
karolinku[m] | @tonyb this is bonding and I already found it. IPv6 is something I couldn't find | 13:07 |
fungi | thanks for the update amorin! | 13:07 |
tonyb | Isn't the 2nd link IPv6? | 13:07 |
karolinku[m] | oh, sorry, you are right, I got confused by "private-ipv4" name | 13:22 |
karolinku[m] | thank you! | 13:22 |
*** elodilles is now known as elodilles_pto | 13:22 | |
tonyb | karolinku[m]: Yeah. I guess bonus points if you fix that in ironic ;P | 13:48 |
* tonyb wonders if it's worth borrowing that network_data.json file and adding it to glean for more test coverage | 13:49 | |
tonyb | but that's a not right now suggestion | 13:49 |
karolinku[m] | I mostly based it on this config: https://github.com/canonical/cloud-init/issues/5366 | 13:50 |
tonyb | Ah true | 13:51 |
Clark[m] | tonyb: karolinku: glean has ipv6 examples: https://opendev.org/opendev/glean/src/branch/master/glean/tests/fixtures/rax-iad/mnt/config/openstack/latest/network_data.json | 13:54 |
tonyb | Okay. That figures | 13:55 |
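For readers following along, here is roughly what a static IPv6 entry looks like in the network_data.json format glean consumes, loosely modeled on those fixtures; the MAC, addresses, and IDs below are made up for illustration.

```shell
# Write an illustrative network_data.json fragment containing a static IPv6
# network entry; all values are examples, not taken from any real cloud.
cat > /tmp/network_data.json <<'EOF'
{
  "links": [
    {"id": "interface0", "type": "phy", "ethernet_mac_address": "aa:bb:cc:dd:ee:01", "mtu": 1500}
  ],
  "networks": [
    {"id": "network1", "type": "ipv6", "link": "interface0",
     "ip_address": "2001:db8::10", "netmask": "ffff:ffff:ffff:ffff::",
     "routes": [{"network": "::", "netmask": "::", "gateway": "2001:db8::1"}]}
  ],
  "services": [
    {"type": "dns", "address": "2001:4860:4860::8888"}
  ]
}
EOF
```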
tonyb | Can we +A https://review.opendev.org/c/opendev/system-config/+/923684 ... the rest of the series needs work but that one should be safe | 14:08 |
fungi | done | 14:28 |
fungi | i also approved the ansible log redirect change | 14:28 |
fungi | hopefully in a bit we can approve 944081 now that we see gitea09 is still fine today | 14:29 |
opendevreview | Merged opendev/system-config master: run-production-playbook: redirect via ansible logger https://review.opendev.org/c/opendev/system-config/+/943999 | 14:49 |
opendevreview | Merged opendev/system-config master: Also include tzdata when installing ARA https://review.opendev.org/c/opendev/system-config/+/923684 | 14:49 |
fungi | i'll check logs from 943999 when it deploys | 14:50 |
fungi | looks like it only ran infra-prod-bootstrap-bridge | 14:57 |
opendevreview | Amy Marrich proposed opendev/irc-meetings master: Adjust meeting time an hour earlier https://review.opendev.org/c/opendev/irc-meetings/+/944125 | 15:02 |
fungi | 923684 similarly only deployed infra-prod-bootstrap-bridge | 15:02 |
fungi | but the hourlies are running now so those will be a good indication | 15:03 |
clarkb | ya I think that is expected and agreed hourlies should be good checks | 15:04 |
clarkb | last night's periodic build for infra-prod-service-gitea passed despite the port 3000 block on gitea09, and the system-config-run change for gitea blocking port 3000 passed. Any objection to me approving the change to block port 3000 on all the giteas now? | 15:17 |
clarkb | I guess we can wait for fungi to be happy with the logging situation | 15:17 |
clarkb | fungi: let me know when you're satisfied and I'll approve the gitea port block | 15:17 |
fungi | https://zuul.opendev.org/t/openstack/buildset/f130cb4fdf4c4a0d820f658ba8f67308 is the latest hourly buildset that just reported at 15:12:26 utc | 15:20 |
clarkb | https://zuul.opendev.org/t/openstack/build/aefb7604fb4846eab30071f982c85447/console#2/1/3/bridge01.opendev.org this shows the new command ran and we don't get stderr to the stdout field in ansible anymore | 15:21 |
clarkb | now we need to check the log on bridge but I need to load my ssh keys so will be a minute before I can do that | 15:21 |
fungi | https://zuul.opendev.org/t/openstack/build/8216d8ada27f483c998246035e1e737e/console#2/1/3/bridge01.opendev.org for example includes the 2>&1 not | 15:21 |
fungi | s/not/now/ | 15:21 |
fungi | stderr is also empty, as expected | 15:22 |
fungi | /var/log/ansible/service-eavesdrop.yaml.log on bridge looks fine to me | 15:22 |
clarkb | as does service-zuul.yaml.log. I do note the expected stderr output isn't in that file, and I think that is because the stderr output from manage-projects was specific to things running against review, as there is some host that isn't in inventory but we use a group for it or something | 15:23 |
clarkb | so I think that is still fine | 15:24 |
fungi | contents of /var/log/ansible/ansible.log for that run is very minimal now too, as intended | 15:24 |
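For context, the effect being verified above is roughly the following shell idiom: merging stderr into stdout so a single stream lands in the per-playbook log on bridge instead of leaking into the Zuul console. The command and paths are illustrative, not the exact invocation from 943999.

```shell
# Illustrative only: run a production playbook with stderr folded into stdout
# and everything appended to its per-playbook log under /var/log/ansible/.
ansible-playbook playbooks/service-eavesdrop.yaml \
  >> /var/log/ansible/service-eavesdrop.yaml.log 2>&1
```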
fungi | clarkb: please approve 944081 when ready, i was just waiting for you to be around and settled in for the day | 15:28 |
fungi | or i can approve it | 15:28 |
clarkb | done | 15:28 |
opendevreview | Merged opendev/irc-meetings master: Adjust meeting time an hour earlier https://review.opendev.org/c/opendev/irc-meetings/+/944125 | 15:29 |
opendevreview | Clark Boylan proposed opendev/system-config master: Update infra-prod limit semaphore to a max of 4 https://review.opendev.org/c/opendev/system-config/+/944126 | 15:32 |
clarkb | I don't think ^ is urgent but I didn't want to forget | 15:33 |
clarkb | fungi: have a moment for https://review.opendev.org/c/opendev/system-config/+/944063 as well? That is a good one just to avoid future problems making updates to those images | 15:52 |
clarkb | I suspect we can approve https://review.opendev.org/c/opendev/system-config/+/943992 now as well since infra-prod stuff seems generally happy? | 15:52 |
clarkb | thanks! once that set of changes flushes out I'm going to look at booting a noble server or two | 16:20 |
opendevreview | Merged opendev/system-config master: Drop public port 3000 access for Gitea https://review.opendev.org/c/opendev/system-config/+/944081 | 17:02 |
opendevreview | Merged opendev/system-config master: Update infra-prod limit semaphore to a max of 4 https://review.opendev.org/c/opendev/system-config/+/944126 | 17:02 |
clarkb | those ended up just behind the hourly jobs | 17:04 |
fungi | and the semaphore change landed barely too late to increase parallelism in this hourly run | 17:05 |
clarkb | The iptables update change is deploying now. I've ssh'd into gitea09 and 10 to confirm iptables updates when ansible gets there | 17:12 |
opendevreview | Merged opendev/system-config master: Trigger related jobs when statsd images update https://review.opendev.org/c/opendev/system-config/+/944063 | 17:15 |
opendevreview | Merged opendev/system-config master: Have puppet depend on letsencrypt https://review.opendev.org/c/opendev/system-config/+/943992 | 17:15 |
fungi | those should benefit from the parallelism | 17:16 |
clarkb | infra-prod-base seems to have updated the iptables rules files on disk but loaded rules haven't updated yet. That may be an ansible handler though, which happens late, so I'm being patient | 17:18 |
clarkb | ah yup loaded rules appear updated on gitea09 and gitea10 now | 17:19 |
fungi | confirmed, no mention of 3000 in iptables or ip6tables -L now | 17:19 |
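A sketch of that sort of check, run on one of the gitea backends; it assumes 3081 is the TLS port mentioned just below and that you have sudo on the host.

```shell
# Confirm no loaded firewall rule still references port 3000, and that the
# backend port (3081) still answers locally.
sudo iptables -L -n | grep -w 3000 || echo 'no IPv4 rules mention 3000'
sudo ip6tables -L -n | grep -w 3000 || echo 'no IPv6 rules mention 3000'
curl -skI --max-time 5 https://localhost:3081/ | head -n 1
```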
fungi | as expected, 944081 is running at 2x parallelism still | 17:20 |
clarkb | side note: if you open :3000 in firefox then edit to :3081 firefox doesn't seem to recheck connectivity until you hit refresh | 17:20 |
fungi | interesting | 17:20 |
clarkb | but anyway :3081 on 09 and 10 is available to me and https://opendev.org seems to work | 17:21 |
clarkb | The infra-prod-service-gitea job which is going to run soon is the last major check I think | 17:21 |
fungi | i guess that and manage-projects aren't running because they depend on the letsencrypt job | 17:22 |
clarkb | I think gitea depends on LE and manage-projects may depend on gitea? | 17:23 |
clarkb | so ya we can still end up with some serialized job sets. | 17:23 |
fungi | curious to see how the next several waiting in line do | 17:34 |
fungi | also this manage-projects run is worth checking the logs for after it finishes since it's the only one we publish in the clear, i think? | 17:35 |
fungi | https://zuul.opendev.org/t/openstack/build/084541f37a1342e990b71b5e41c03ac5/log/manage-projects.yaml.log | 17:35 |
fungi | there it is | 17:35 |
fungi | lgtm | 17:36 |
clarkb | the stderr is in there now rather than in the zuul logs too | 17:36 |
fungi | mmm, 944126 is only going to run the bootstrap job | 17:36 |
clarkb | ya it just needs to update the git repos basically | 17:36 |
fungi | i wonder, is there a reason to run that job by itself? | 17:37 |
clarkb | yes for the git repo updates | 17:37 |
fungi | i mean, do the git repo updates have any benefit if no other jobs are using that new state? | 17:37 |
fungi | 944063 may show us >2x parallelism | 17:38 |
fungi | looks like it's going to run both statsd jobs at the same time as the bootstrap | 17:38 |
fungi | oh, though they're just promote jobs | 17:38 |
clarkb | fungi: we interact with the git repos on the host as do cron jobs like zuul upgrades and reboots. 99% of the time it's probably fine to let something else catch it up later but being accurate seems good to me | 17:38 |
fungi | yeah, cronjobs are a good reason | 17:38 |
fungi | here we go, gitea-lb, zuul-lb and zookeeper all at the same time now | 17:39 |
clarkb | zoom zoom | 17:39 |
fungi | so that's 3x | 17:39 |
fungi | somebody needs to update the inventory, that'll be a great test | 17:40 |
clarkb | yup I plan on booting some new mirrors in vexxhost as soon as this set of work is able to be paged out | 17:40 |
clarkb | but happy for someone else to find other edits too | 17:40 |
clarkb | fungi: I would expect the hourly jobs to do 4x too. We should try and catch bridge system load while that happens | 17:43 |
fungi | yeah, that should be coming up in about 15 minutes | 17:43 |
clarkb | looks like each vexxhost mirror is 2vcpu 8GB memory with 50GB root disk. Then our typical 200GB cache volume split into two mounts half for apache2 and half for openafs | 17:47 |
fungi | sounds right | 17:47 |
tonyb | That's my recollection | 17:48 |
fungi | the new rackspace flex mirrors are 4vcpu and 80gb rootfs, but otherwise similar | 17:49 |
clarkb | and looks like both are boot from volume. I might switch away from boot from volume if that is an option | 17:49 |
fungi | worth a try | 17:49 |
clarkb | (and if anyone is wondering, I'm looking at these two mirrors because review is currently hosted in vexxhost, so booting new noble servers there seems like the next step to get feedback for the review replacement) | 17:49 |
fungi | maybe also upload a fresh vanilla noble cloud image too | 17:50 |
clarkb | our launch node system does a full system update and reboot before handing the node over to us | 17:50 |
clarkb | it should be fine to use the image that tonyb uploaded previously | 17:50 |
fungi | oh, if it's fairly recent then yeah that works. mainly just wanted to be sure it was an official ubuntu image and not a doctored one | 17:51 |
clarkb | I think the images were uploaded late last year so not super recent but also not doctored | 17:52 |
clarkb | but I'll double check. Right now I'm still parsing flavor details | 17:53 |
clarkb | in sjc1 we booted with flavor v2-standard-2 which has no root disk size hence the boot from volume. v3-standard-2 appears to have a 40gb root disk and otherwise matches v2-standard-2 (2vcpu 8gb memory) | 17:54 |
clarkb | I'm going to try the v3-standard-2 flavor and not boot from volume | 17:54 |
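For reference, the flavor comparison being done here can be reproduced with the standard openstack CLI (assuming credentials for the region are loaded); the flavor names come from the chat above.

```shell
# Compare the candidate flavors' vcpus, ram, and root disk.
openstack flavor show v2-standard-2 -c vcpus -c ram -c disk
openstack flavor show v3-standard-2 -c vcpus -c ram -c disk
openstack flavor show v3-standard-8 -c vcpus -c ram -c disk
```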
clarkb | oh wait | 17:56 |
clarkb | tonyb: you've already added new mirrors in vexxhost in both regions running noble looks like. Do we just need to update DNS then remove the old server? | 17:56 |
clarkb | hrm the sjc1 server doesn't appear to have a volume mounted for caches | 17:57 |
clarkb | and the flavor is v3-standard-8 not v3-standard-2 so maybe bigger than we need. | 17:58 |
clarkb | tonyb: fungi: thoughts on whether we want to retrofit those servers to add the volumes or create new smaller ones with new volumes? | 17:58 |
clarkb | basically we have some half baked mirror02 nodes booted on noble in both vexxhost regions. The base flavor is bigger than necessary and we'd need to attach 200GB volumes to each then migrate content from the existing root fs hosted paths onto the newly mounted paths | 18:00 |
clarkb | or I can boot new mirror03 nodes and start fresh on a smaller flavor | 18:00 |
clarkb | then cleanup mirror02 and mirror01 in each region | 18:00 |
clarkb | given the extra unneeded resource usage I'm leaning towards starting over | 18:01 |
fungi | are they in the inventory yet? | 18:02 |
clarkb | fungi: yes | 18:02 |
clarkb | but the primary dns record for the mirrors in those regions don't point to them yet | 18:03 |
clarkb | hourly jobs have started so I'm going to pay attention to that for a minute | 18:03 |
clarkb | there are 4 jobs running right now | 18:04 |
clarkb | load average is almost 2 | 18:04 |
fungi | i don't think we could move the mirror02 instances to new flavors without recreating them, could we? and we avoid replacing inventory entries with the same name because of ansible fact caches on bridge | 18:04 |
clarkb | I believe both things to be true | 18:04 |
clarkb | you can increase the size of nodes but not shrink them and only in some clouds iirc | 18:05 |
clarkb | load average up to 2.14 | 18:05 |
fungi | oh, right, this would be a shrinking not a growing | 18:05 |
clarkb | and then 2.75 | 18:06 |
fungi | so i think either live with the mirror02 instances being on the wrong larger flavors, or create new mirror03 instances on our preferred flavors | 18:06 |
fungi | i was too late to catch the parallelism, there are only two remaining jobs running now | 18:07 |
clarkb | yes and if we go with mirror02 instances we will have to do extra surgery to sort out the volume situation to copy over old data or clear it out to avoid masking it. With new nodes we can ensure the volumes are ready in place before adding to inventory so we only ever write the content to the cache volume. Either way works | 18:07 |
clarkb | fungi: it ran 4 jobs at once | 18:07 |
clarkb | and I think load peaked at 2.75 | 18:07 |
fungi | awesome | 18:07 |
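The load watching above is just standard tooling on bridge; a minimal example of the kind of spot-check being done.

```shell
# Check the load average and see which ansible-playbook processes are
# contributing while the hourly buildset runs.
uptime
ps -eo pcpu,pmem,args --sort=-pcpu | grep '[a]nsible-playbook' | head -n 5
```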
clarkb | we can see if tonyb has thoughts since I think tonyb did the deployments of the 02 mirrors too | 18:09 |
clarkb | we just ran hourly jobs in like 7 minutes | 18:09 |
fungi | snappy | 18:09 |
fungi | that also means less of a delay when we end up with deploy jobs enqueued just after the top of the hour | 18:10 |
fungi | so these speed improvements compound on one another | 18:10 |
clarkb | yup | 18:10 |
clarkb | I think this is pretty minimal too because the nodepool job runs for 6 minutes 15 seconds and the bootstrap job which has to run first took 1 minute 15 ish seconds before pausing | 22:12 |
clarkb | that's 7 minutes and 30 seconds ish and the entire buildset took 7 minutes 43 seconds | 22:12 |
clarkb | any additional parallelism won't make a noticeable difference unless we run bigger buildsets like periodic or some deployments | 22:13 |
clarkb | I've marked the infra prod parallel execution item on our TODO list as done | 18:15 |
clarkb | while we wait for tonyb's feedback on the mirror stuff, https://review.opendev.org/c/opendev/system-config/+/943819 is another one that we can probably roll out. The statsd container updates lgtm as both the zookeeper and haproxy graphs on grafana have recent content | 18:17 |
clarkb | and with lunch approaching maybe I'll wait until I've been fed; if I don't hear back by then, we proceed with whichever mirror surgery sounds best to us? | 18:18 |
tonyb | Ummm I don't recall the state of those noble mirrors. I strongly suspect I chose poor names for testing the images I uploaded. | 18:21 |
tonyb | I'd suggest just deleting them and starting with fresh | 18:21 |
clarkb | tonyb: the image names are fine "Ubuntu Noble OpenDev 20240724" | 18:22 |
clarkb | I think what happened is you thought the -8 in v3-standard-8 meant 8GB memory but it means 8vcpu I think | 18:22 |
clarkb | and memory is 32GB | 18:22 |
tonyb | Also those images are as close to vanilla as possible. I scripted the download/convert/upload to all the clouds | 18:22 |
clarkb | which is much much bigger than we need for the mirror nodes :) | 18:22 |
clarkb | tonyb: ya I plan to reuse your images. They are stale but our launch system updates them and reboots before we use them for anything important so should be fine | 18:23 |
tonyb | Yes, much bigger. That's also partially why I think we should drop them and start from scratch | 18:23 |
fungi | the weather here has taken an unexpected turn for the pleasant, so in a bit i may take a longer late-lunch/early-dinner break to enjoy some food outdoors and get in a short walk | 18:23 |
clarkb | it was more a question of whether it's better to fix the in-flight stuff or start over. And ya I think starting over to get the sizing right makes sense to me | 18:23 |
tonyb | If we can use the same setup as other regions where we have an external volume that'd be cool too | 18:23 |
clarkb | I'll launch new mirror03 nodes and we can cleanup mirror01 and mirror02 together when the time is right | 18:23 |
clarkb | tonyb: yup that is the plan | 18:24 |
tonyb | Perfect thanks | 18:24 |
tonyb | Sorry I left that dangling | 18:24 |
clarkb | it happens. I'm sure you can find things I've left behind in gerrit and elsewhere | 18:24 |
tonyb | Hehe | 18:24 |
clarkb | fungi: enjoy | 18:24 |
fungi | i have an entire list of half-done things i mean to get back to, but the list itself is only half-done and i've been meaning to get back to it | 18:25 |
clarkb | I'll start booting the new node in sjc1 momentarily. I want to work through one before I worry about the other so that I can fix mistakes once not twice :) | 18:25 |
fungi | okay, heading out, i'll probably be up to a couple of hours, but should be around again by 20:30-ish | 18:34 |
clarkb | the sjc1 server is up and I've created a cache volume for it. Next step is to attach that volume, format it, and mount it, and then I can push up a change to add it to the inventory. Though maybe I should do one change for both regions to speed stuff up | 18:42 |
clarkb | but I need to pause here to get lunch going so will pick it back up in a bit | 18:42 |
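A rough sketch of those create/attach/format/mount steps, split into the two cache mounts mentioned earlier via LVM. Server, volume, VG/LV names, the /dev/vdb device, and the mount points are all assumptions for illustration; the real servers may be laid out differently.

```shell
# Create and attach a 200GB cache volume (names are made up).
openstack volume create --size 200 mirror03-sjc1-cache
openstack server add volume mirror03.sjc1.vexxhost.opendev.org mirror03-sjc1-cache

# On the server, split the volume into apache2 and openafs cache mounts.
sudo pvcreate /dev/vdb
sudo vgcreate cache /dev/vdb
sudo lvcreate -l 50%VG -n apache2 cache
sudo lvcreate -l 100%FREE -n openafs cache
sudo mkfs.ext4 /dev/cache/apache2
sudo mkfs.ext4 /dev/cache/openafs
sudo mkdir -p /var/cache/apache2 /var/cache/openafs
echo '/dev/cache/apache2 /var/cache/apache2 ext4 defaults 0 2' | sudo tee -a /etc/fstab
echo '/dev/cache/openafs /var/cache/openafs ext4 defaults 0 2' | sudo tee -a /etc/fstab
sudo mount -a
```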
opendevreview | Clark Boylan proposed opendev/zone-opendev.org master: Add new vexxhost mirrors https://review.opendev.org/c/opendev/zone-opendev.org/+/944150 | 20:00 |
opendevreview | Clark Boylan proposed opendev/system-config master: Replace mirror02 with mirror03 in vexxhost regions https://review.opendev.org/c/opendev/system-config/+/944151 | 20:03 |
clarkb | infra-root ^ please review that carefully as there are a number of moving parts | 20:03 |
clarkb | infra-root looking at zuul status we may have an unhappy swift target | 20:09 |
fungi | looking | 20:10 |
fungi | decided to check back in before we take a walk | 20:10 |
clarkb | build 367a55e1a95d4aa289223604bccfd3fc ran on ze05 I'm looking at its logs there | 20:11 |
clarkb | it tried to upload to ovh_bhs | 20:12 |
fungi | i guess see if we get the same errors to gra1 as well | 20:12 |
fungi | sometimes they both go in tandem | 20:13 |
clarkb | ya let me grab a few more example builds and see if I can find them | 20:13 |
clarkb | https://public-cloud.status-ovhcloud.com/ this says keystone may be down. Instead of needle-in-a-haystack checks let me see what grafana says | 20:13 |
clarkb | https://grafana.opendev.org/d/2b4dba9e25/nodepool3a-ovh?orgId=1&from=now-6h&to=now&timezone=utc&var-region=$__all seems to have started just before 18:30 | 20:14 |
clarkb | I'll push up a change to disable both regions | 20:14 |
fungi | standing by to review | 20:14 |
opendevreview | Clark Boylan proposed opendev/base-jobs master: Disable ovh job log uploads https://review.opendev.org/c/opendev/base-jobs/+/944152 | 20:16 |
fungi | not urgent, but question on 944150 when you're free to revisit it | 20:16 |
fungi | approved 944152 but can also bypass zuul to merge if you like, there's a good chance it won't land on its own | 20:17 |
clarkb | fungi: responded | 20:18 |
clarkb | fungi: ya I think we probably should bypass zuul | 20:18 |
fungi | already halfway there | 20:18 |
opendevreview | Merged opendev/base-jobs master: Disable ovh job log uploads https://review.opendev.org/c/opendev/base-jobs/+/944152 | 20:20 |
clarkb | How about status notice One of our Zuul job log storage providers is experiencing errors. We have removed that storage target from base jobs. You should be able to safely recheck changes now. | 20:21 |
fungi | lgtm | 20:21 |
clarkb | #status notice One of our Zuul job log storage providers is experiencing errors. We have removed that storage target from base jobs. You should be able to safely recheck changes now. | 20:21 |
opendevstatus | clarkb: sending notice | 20:21 |
-opendevstatus- NOTICE: One of our Zuul job log storage providers is experiencing errors. We have removed that storage target from base jobs. You should be able to safely recheck changes now. | 20:22 | |
clarkb | https://zuul.opendev.org/t/opendev/build/8ca02d7df1ec495fbcfeadcf9eb1254a looks like we also have a regular job failure in that repo | 20:22 |
clarkb | I don't think it was caused by my change | 20:22 |
clarkb | oh that's promote after you force-merged complaining that there was no build to promote | 20:23 |
clarkb | nevermind that is expected in this situation | 20:23 |
fungi | right | 20:23 |
fungi | these are not the droids you're looking for, move along | 20:23 |
clarkb | roger roger | 20:24 |
opendevstatus | clarkb: finished sending notice | 20:25 |
opendevreview | Merged opendev/zone-opendev.org master: Add new vexxhost mirrors https://review.opendev.org/c/opendev/zone-opendev.org/+/944150 | 20:27 |
fungi | okay, things seem to be working again so heading out for a walk, should be back in no more than 45 minutes | 20:27 |
clarkb | enjoy! | 20:30 |
clarkb | the inventory update change for the mirrors failed on an rsync ssh key exchange problem. I'll recheck it once it reports | 21:06 |
*** benj_1 is now known as benj_ | 21:13 | |
fungi | cool, back now anyway | 21:17 |
clarkb | and it is back in the gate again | 21:44 |
fungi | should be merging any moment now, the last running gate job is just starting to upload its logs | 22:12 |
clarkb | I'm still paying attention. This will be a good exercise of the increased semaphore limit due to the inventory update I think | 22:13 |
fungi | yep | 22:13 |
opendevreview | Merged opendev/system-config master: Replace mirror02 with mirror03 in vexxhost regions https://review.opendev.org/c/opendev/system-config/+/944151 | 22:14 |
fungi | and hourlies are already well and done thanks to how fast they're running | 22:14 |
clarkb | service-mirror is about 2/3 of the way through the list so maybe starting in half an hour? | 22:15 |
clarkb | though with the doubled limit I'm just guessing | 22:15 |
fungi | with bootstrap paused and just base running, load average on bridge is around 2.5 already | 22:19 |
clarkb | ya base hits every server so likely has higher load costs | 22:20 |
fungi | up over 3 now | 22:20 |
clarkb | I guess that is an important thing to keep in mind: the load on bridge is proportional to the number of servers we have ansible interacting with at once | 22:20 |
clarkb | so it kinda works out that base is a dependency of everything else as it is going to be costly on its own | 22:20 |
fungi | but since base runs by itself this is fairly okay | 22:20 |
fungi | yeah, exactly | 22:21 |
fungi | here comes the load | 22:24 |
clarkb | charge! | 22:24 |
fungi | these 4 jobs aren't half the system load as base imposed though | 22:24 |
fungi | so yeah, we could probably increase the parallelism even further if we see this working out for a while, but changes that touch the inventory are probably the only ones it will make any significant additional performance improvement for | 22:25 |
fungi | it's doing a good job of saturating the semaphore at least, as soon as one of the four completes another spins up to take its place | 22:26 |
clarkb | ya we get diminishing returns as the number goes higher anyway | 22:27 |
fungi | i guess everything else past here has to wait for letsencrypt to finish, sort of an event horizon | 22:28 |
clarkb | yup | 22:28 |
fungi | so we start the bootstrap, pause it and run base by itself, then about a third of the jobs at 4x parallelism, then once letsencrypt completes we do the remaining 3/3 at 4x again | 22:30 |
fungi | er, remaining 2/3 | 22:30 |
fungi | this is also best case conditions at the moment as we're not really running a node request backlog and have plenty of available quota, so new builds are starting with very little latency | 22:31 |
clarkb | which should still be a massive improvement | 22:31 |
fungi | absolutely | 22:32 |
fungi | it'll be interesting to compare to historical deploy buildset times for other changes that touched the inventory | 22:33 |
clarkb | I think ovh may be happier now looking at https://grafana.opendev.org/d/2b4dba9e25/nodepool3a-ovh?orgId=1&from=now-6h&to=now&timezone=utc&var-region=$__all and https://public-cloud.status-ovhcloud.com/incidents/9myc4g6tfvlb | 22:34 |
clarkb | let me see if I can find my base-test test change and if that looks good I can push up a revert | 22:34 |
fungi | and it's off to the races once more | 22:34 |
fungi | just had to catch its breath is all | 22:34 |
clarkb | https://review.opendev.org/c/zuul/zuul-jobs/+/680178 is the base-test test change | 22:35 |
fungi | k | 22:35 |
fungi | software factory is still reporting syntax errors on zuul-jobs changes | 22:36 |
fungi | load average on bridge during parallel jobs is mostly staying in the 0.5-1.5 range | 22:38 |
clarkb | service-mirror just started; my estimate was too high by 7 ish minutes | 22:38 |
fungi | keeping in mind this is also a 6vcpu server | 22:38 |
fungi | er, 8vcpu | 22:39 |
clarkb | https://zuul.opendev.org/t/zuul/build/b8d6e273124f4230b2eba82ccbc6db58/logs appears to have uploaded to ovh, confirming things appear to work again | 22:41 |
opendevreview | Clark Boylan proposed opendev/base-jobs master: Revert "Disable ovh job log uploads" https://review.opendev.org/c/opendev/base-jobs/+/944171 | 22:43 |
clarkb | I don't think the mirror job has gotten to either mirror yet | 22:44 |
fungi | https://public-cloud.status-ovhcloud.com/incidents/9myc4g6tfvlb | 22:44 |
fungi | seems to confirm | 22:44 |
clarkb | oh its compiling openafs on the mirrors | 22:45 |
clarkb | which is sloooooow | 22:45 |
fungi | even slower on my workstation unfortunately | 22:45 |
opendevreview | Merged opendev/base-jobs master: Revert "Disable ovh job log uploads" https://review.opendev.org/c/opendev/base-jobs/+/944171 | 22:49 |
clarkb | https://mirror03.sjc1.vexxhost.opendev.org/ https://mirror03.ca-ymq-1.vexxhost.opendev.org/ tada | 22:50 |
fungi | and the full deploy buildset is nearly done | 22:50 |
clarkb | give me a sec and I'll get a change up to update DNS and then tomorrow or Friday we can clean up the old servers | 22:50 |
fungi | might be finished in under 40 minutes from the time it enqueued | 22:51 |
fungi | never mind, zuul's estimate for the remaining job probably puts it a bit over | 22:51 |
clarkb | so we went from 2 hours to 1 hour to 40 minutes roughly | 22:52 |
clarkb | oh but this one is an exceptionally long one due to compiling openafs | 22:52 |
clarkb | wouldn't surprise me if periodic is closer to half an hour tonight | 22:53 |
opendevreview | Clark Boylan proposed opendev/zone-opendev.org master: Switch vexxhost mirrors over to the new Noble mirrors https://review.opendev.org/c/opendev/zone-opendev.org/+/944173 | 22:54 |
fungi | anyway, yeah, comparing system load peaks on bridge to its cpu count, i don't think it's even close to breaking a sweat | 22:55 |
clarkb | 41 minutes 27 seconds | 22:57 |
clarkb | says the buildsets page | 22:57 |
clarkb | pretty good | 22:57 |
fungi | not too shabby at all | 22:57 |
opendevreview | Merged opendev/zone-opendev.org master: Switch vexxhost mirrors over to the new Noble mirrors https://review.opendev.org/c/opendev/zone-opendev.org/+/944173 | 23:09 |
clarkb | that landed while hourly jobs were running and it's already deploying | 23:12 |
fungi | nice | 23:12 |
clarkb | and dns resolves the new records for me | 23:14 |
fungi | same here | 23:20 |
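The DNS spot-check above is ordinary resolution; the record names here assume the usual mirror.<region>.<provider> convention pointing at the new mirror03 hosts and may differ slightly from the actual zone contents.

```shell
# Confirm the region-level names and the new hosts resolve.
dig +short CNAME mirror.sjc1.vexxhost.opendev.org
dig +short AAAA mirror03.sjc1.vexxhost.opendev.org
dig +short CNAME mirror.ca-ymq-1.vexxhost.opendev.org
dig +short AAAA mirror03.ca-ymq-1.vexxhost.opendev.org
```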
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!