opendevreview | Jeremy Stanley proposed opendev/system-config master: Install ssl-cert-check from distro package not Git https://review.opendev.org/c/opendev/system-config/+/939187 | 00:00 |
fungi | the cronjob needed a path change too | 00:00 |
fungi | mmm, should i include a task to clean up the old git repo on disk if present? | 00:01 |
clarkb | note that certcheck runs on the cacti node which might be older | 00:01 |
clarkb | based on inventory/service/groups.yaml certcheck group membership | 00:01 |
fungi | oh, good point, bionic there | 00:02 |
fungi | hah, on cacti it's already installed from distro package | 00:02 |
clarkb | huh we must run it out of the git repo for some reason then | 00:02 |
clarkb | I wonder what the reason is | 00:03 |
fungi | i'm digging | 00:03 |
corvus | bionic distro version is an older version | 00:04 |
fungi | probably we decided the version in bionic (3.27) is too old | 00:04 |
fungi | looks like we switched from distro package to git checkout coincident with migrating it from puppet to ansible, but the commit message doesn't explain the reason | 00:07 |
fungi | and no obvious review comments about it in https://review.opendev.org/728743 either | 00:09 |
fungi | oh, actually, i missed some history from when it was split out to a separate repo. it happened here in 2019: https://review.openstack.org/650162 | 00:10 |
fungi | "so that we get new features like support for SNI" | 00:11 |
fungi | i'll wip 939187 and note that it's on hold until we replace cacti or move the function to a different server | 00:11 |
corvus | any reason not to run it on bridge? | 00:13 |
fungi | apparently we do already in the test jobs | 00:13 |
fungi | just not in production | 00:13 |
corvus | let's make production more like test ;) | 00:14 |
fungi | the only reason i can think of is to minimize the surface area for vulnerabilities on bridge, given its importance, but this doesn't seem like a high risk | 00:14 |
fungi | clarkb: ^ maybe we should touch on that briefly during the meeting? too late to get it onto the agenda? | 00:14 |
Clark[m] | Sorry, switched to the laptop to do the mega update my desktop got this morning and I'm not on irc yet. The agenda hasn't gone out yet, I can add it | 00:16 |
fungi | https://www.githubstatus.com/incidents/qd96yfgvmcf9 might explain the pull error that job reported | 00:27 |
clarkb | ok added to the agenda and I'll get that sent out momentarily | 00:47 |
fungi | thanks! | 01:15 |
opendevreview | Lajos Katona proposed openstack/project-config master: Remove taas-tempest-plugin and taas-dashboard jobs https://review.opendev.org/c/openstack/project-config/+/938665 | 10:30 |
opendevreview | Vladimir Kozhukalov proposed zuul/zuul-jobs master: [remove-registry-tag] Improve usage experience https://review.opendev.org/c/zuul/zuul-jobs/+/939234 | 13:09 |
*** dhill is now known as Guest5897 | 13:31 | |
*** jhorstmann is now known as Guest5900 | 13:55 | |
opendevreview | Merged openstack/project-config master: Remove taas-tempest-plugin and taas-dashboard jobs https://review.opendev.org/c/openstack/project-config/+/938665 | 14:50 |
clarkb | anyone know how to confirm https://quay.io/repository/opendevmirror/mariadb/manifest/sha256:1d3a79e8186307b8f76141e190230f04bce4ebef98057586b8ca75cb0c5055c4 and https://hub.docker.com/layers/library/mariadb/10.11/images/sha256-db739b1d86fd8606a383386d2077aa6441d1bdd1c817673be62c9b74bdd6e7f3 are the same image without fetching them locally? | 15:35 |
clarkb | in theory those are both the 10.11 mariadb tag, but I don't find shas that match up. Potentially because the docker hub side is multiarch and the quay side is just amd64? | 15:35 |
clarkb | if I fetch them locally they both end up with the same image id so docker inspect basically inspects the same image strongly implying they are the same. I was just hoping the web dashboards would give us some method of making the same comparison | 15:38 |
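A minimal sketch of that local comparison via the Docker SDK for Python, assuming the docker package and a local daemon are available; the two image references are the tags being discussed, everything else is illustrative:

    import docker

    client = docker.from_env()

    # With an explicit tag, images.pull() returns a single Image object.
    mirrored = client.images.pull("quay.io/opendevmirror/mariadb", tag="10.11")
    upstream = client.images.pull("mariadb", tag="10.11")

    # The image id is the digest of the image config, which is what the local
    # client deduplicates on, so matching ids mean both registries served the
    # same image content.
    print("quay image id:      ", mirrored.id)
    print("docker hub image id:", upstream.id)
    print("same image:", mirrored.id == upstream.id)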
clarkb | anyway infra-root any preference on whether paste02 should incorporate a move to the mirrored image or do that in a separate parent change first? | 15:39 |
fungi | can you checksum them? | 15:43 |
clarkb | fungi: I think that is effectively what the docker pull and docker inspect locally is doing | 15:44 |
fungi | makes sense | 15:45 |
clarkb | my gripe here is that the local client figures out they are the same and effectively merges them, but the two web dashboards don't provide a quick and easy way to do that | 15:45 |
clarkb | just wondering if I'm missing something to make this easier | 15:45 |
fungi | i'd be fine squashing the mirror switch into the existing change, but i guess separating it out would make a clearer example for any other similar switching we want people to work on | 15:45 |
clarkb | I think the issue with hashes mismatching is that both web dashboards show you the manifest hash. The manifest on docker hub has many arches in it while quay.io has a single arch, resulting in different manifests and different hashes. But the amd64 image is the same sha256:73b895f9f0fdb34280b709e601d5293a411b1c92cc2ff71a2f70d88fa0d6ddc0 | 15:47 |
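To make that comparison without pulling anything, a sketch along these lines can query both registry v2 APIs directly; the anonymous token handling for public repositories is an assumption, and a mirror that re-serialized the manifest can still show a different manifest digest for identical content:

    import requests

    ACCEPT = ", ".join([
        "application/vnd.docker.distribution.manifest.list.v2+json",
        "application/vnd.oci.image.index.v1+json",
        "application/vnd.docker.distribution.manifest.v2+json",
        "application/vnd.oci.image.manifest.v1+json",
    ])

    def get_manifest(registry, auth_url, service, repo, ref):
        # Anonymous pull token, then fetch the manifest (or manifest list) for the tag.
        token = requests.get(
            auth_url, params={"service": service, "scope": f"repository:{repo}:pull"}
        ).json()["token"]
        resp = requests.get(
            f"https://{registry}/v2/{repo}/manifests/{ref}",
            headers={"Authorization": f"Bearer {token}", "Accept": ACCEPT},
        )
        resp.raise_for_status()
        return resp.headers.get("Docker-Content-Digest"), resp.json()

    hub_digest, hub_manifest = get_manifest(
        "registry-1.docker.io", "https://auth.docker.io/token",
        "registry.docker.io", "library/mariadb", "10.11")
    quay_digest, _ = get_manifest(
        "quay.io", "https://quay.io/v2/auth", "quay.io",
        "opendevmirror/mariadb", "10.11")

    # Docker Hub serves a multi-arch manifest list; pull out the linux/amd64 entry.
    amd64 = next(m for m in hub_manifest["manifests"]
                 if m["platform"]["architecture"] == "amd64"
                 and m["platform"]["os"] == "linux")

    print("docker hub manifest list digest:", hub_digest)
    print("docker hub linux/amd64 manifest:", amd64["digest"])
    print("quay.io manifest digest:        ", quay_digest)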
clarkb | I can separate it out then | 15:47 |
opendevreview | Clark Boylan proposed opendev/system-config master: Add paste02 to inventory as a new paste server https://review.opendev.org/c/opendev/system-config/+/939124 | 15:52 |
opendevreview | Clark Boylan proposed opendev/system-config master: Switch to quay.io/opendevmirror/mariadb:10.11 for paste https://review.opendev.org/c/opendev/system-config/+/939252 | 15:52 |
clarkb | there is an email about review.o.o's cert expiring in 29 days or less. I just checked and it looks like we did renew the cert so I suspect that not all of the apache processes have aged out? If the warning persists we may do a proper apache restart | 16:11 |
clarkb | feel free to double check me on that | 16:12 |
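A quick way to see which certificate Apache is actually serving (and hence whether the renewal has been picked up by all workers) is to pull it off the socket; a minimal standard-library sketch, nothing opendev-specific assumed:

    import datetime
    import socket
    import ssl

    host = "review.opendev.org"
    ctx = ssl.create_default_context()
    with socket.create_connection((host, 443)) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()

    expires = datetime.datetime.utcfromtimestamp(
        ssl.cert_time_to_seconds(cert["notAfter"]))
    print("serial :", cert.get("serialNumber"))
    print("expires:", expires, "UTC")
    print("days left:", (expires - datetime.datetime.utcnow()).days)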
fungi | thanks, i was going to look into that but hadn't gotten to it yet | 16:14 |
fungi | also i see an infra-prod-service-zuul failure, i'm digging into that now | 16:14 |
fungi | ERROR: for executor Head "https://quay.io/v2/zuul-ci/zuul-executor/manifests/latest": received unexpected HTTP status: 502 Bad Gateway | 16:16 |
fungi | guess it was just a blip, a subsequent run succeeded | 16:17 |
clarkb | fungi: ya quay noted they had some 500 error problems that were being addressed | 16:24 |
clarkb | this was in a banner when I tried to compare mariadb hashes | 16:24 |
opendevreview | Merged opendev/engagement master: Switch from pipermail to hyperkitty archives https://review.opendev.org/c/opendev/engagement/+/938983 | 16:24 |
opendevreview | Merged opendev/engagement master: Add historical report data https://review.opendev.org/c/opendev/engagement/+/938984 | 16:24 |
fungi | oh cool, mystery solved then, nothing to see here | 16:32 |
fungi | python 3.14.0a4 is up | 16:44 |
clarkb | the mariadb switch to opendevmirror seems to have worked | 17:30 |
clarkb | so I guess if we feel that is safe for opendev production services we can proceed with that then deploy paste02 | 17:31 |
fungi | a couple of git vulnerabilities were just announced on the oss-security mailing list, but don't seem likely to impact our systems or users of our workflows: https://www.openwall.com/lists/oss-security/2025/01/14/4 | 18:32 |
clarkb | fungi: any reason to not approve https://review.opendev.org/c/opendev/system-config/+/939252 and then go eat lunch? | 19:57 |
clarkb | It should be a noop | 19:57 |
fungi | sounds good to me, done | 19:58 |
clarkb | thanks and now I will find food | 19:58 |
fungi | i'm popping out for a bit to grab dinner, but i guess i can recheck the paste02 inventory add again when i get back | 19:59 |
opendevreview | Merged opendev/system-config master: Switch to quay.io/opendevmirror/mariadb:10.11 for paste https://review.opendev.org/c/opendev/system-config/+/939252 | 20:31 |
clarkb | fungi: I posted a couple of comments on the bindep stack but nothing is critical so I +2'd | 20:36 |
clarkb | https://zuul.opendev.org/t/openstack/build/a5d5ecd8d50545f3896b7a8744c7b973 was successful. Checking the server next | 20:39 |
clarkb | the image ids match between mariadb and the quay mirrored image. One thing I didn't expect is that docker-compose still restarted the database | 20:39 |
clarkb | so now I'm glad I didn't do that for all services all at once | 20:40 |
clarkb | https://paste.opendev.org/show/boCi85KicbL8LtDwEeiE/ I am able to make a paste | 20:40 |
clarkb | and I can load old pastes so I think this is happy now | 20:40 |
clarkb | the paste02 inventory add actually needs a reapproval because I stacked it on top of the quay.io mirror mariadb change. I'll approve it now | 20:41 |
opendevreview | Merged opendev/system-config master: Add paste02 to inventory as a new paste server https://review.opendev.org/c/opendev/system-config/+/939124 | 21:10 |
clarkb | I forgot that changes to the inventory file are going to trigger all the jobs. But those are enqueued and it should get to it eventually | 21:18 |
fungi | indeed | 21:20 |
opendevreview | Jay Faulkner proposed openstack/project-config master: Deprecate ironic-lib https://review.opendev.org/c/openstack/project-config/+/939282 | 21:25 |
opendevreview | Jeremy Stanley proposed opendev/bindep master: Evacuate most metadata out of setup.cfg https://review.opendev.org/c/opendev/bindep/+/938520 | 21:27 |
opendevreview | Jeremy Stanley proposed opendev/bindep master: Drop support for Python 3.6 https://review.opendev.org/c/opendev/bindep/+/938568 | 21:27 |
opendevreview | Jeremy Stanley proposed opendev/bindep master: Drop requirements.txt https://review.opendev.org/c/opendev/bindep/+/938570 | 21:27 |
opendevreview | Jay Faulkner proposed openstack/project-config master: Deprecate ironic-lib https://review.opendev.org/c/openstack/project-config/+/939282 | 21:34 |
opendevreview | Jay Faulkner proposed openstack/project-config master: Deprecate ironic-lib https://review.opendev.org/c/openstack/project-config/+/939282 | 21:44 |
opendevreview | Jay Faulkner proposed openstack/project-config master: Deprecate ironic-lib https://review.opendev.org/c/openstack/project-config/+/939282 | 21:47 |
opendevreview | Jay Faulkner proposed openstack/project-config master: Deprecate ironic-lib https://review.opendev.org/c/openstack/project-config/+/939282 | 21:49 |
opendevreview | Jay Faulkner proposed openstack/project-config master: Deprecate ironic-lib https://review.opendev.org/c/openstack/project-config/+/939282 | 21:55 |
clarkb | the paste deployment should happen soon. Maybe in 15 minutes? | 22:16 |
fungi | yeah, still trying to keep an eye on it so i can test before we switch the cname | 22:21 |
clarkb | well we have to move the data too | 22:21 |
clarkb | (just want to make sure we don't forget that step) | 22:21 |
fungi | right, who was going to dump/import the db and when? | 22:21 |
fungi | presumably we'll want to temporarily stop apache on the production server for that, so we don't lose any additions | 22:22 |
clarkb | I was sorta not thinking about it until after we had a successful deployment. But assuming that happens we could do it immediately I guess? | 22:22 |
clarkb | yes, the process should be: stop old server services except the db, dump the db, copy the dump to the new server, restore, update the cname | 22:22 |
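Spelled out as a rough sketch of those steps (every container name, service name, database name, and path below is a placeholder assumption; the real compose file and credentials come from the deployed configuration):

    import subprocess

    def remote(host, command):
        # Run a shell command on a remote host; redirections happen on the far end.
        print(f"{host}$ {command}")
        subprocess.run(["ssh", host, command], check=True)

    # 1. Stop everything on the old server except the database so no new pastes land.
    remote("paste01", "docker-compose -f /etc/lodgeit-docker/docker-compose.yaml stop lodgeit")

    # 2. Dump the lodgeit database from the still-running mariadb container
    #    (credentials omitted; they live in the deployed configuration).
    remote("paste01", "docker exec mariadb mysqldump --single-transaction lodgeit > /root/lodgeit.sql")

    # 3. Copy the dump across and restore it into the new server's mariadb container.
    subprocess.run(["scp", "-3", "paste01:/root/lodgeit.sql", "paste02:/root/lodgeit.sql"], check=True)
    remote("paste02", "docker exec -i mariadb mysql lodgeit < /root/lodgeit.sql")

    # 4. Finally, switch the paste.opendev.org CNAME to the new server.

The -3 flag makes scp relay the copy through the local host rather than requiring direct connectivity between the two paste servers.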
clarkb | then there are some followup changes I need to rebase that will update the backup system to back up the new server | 22:22 |
fungi | did we want to do a test dump/import first and then stop apache to do a final dump/import and cname update? or just do it all in one? | 22:23 |
clarkb | a test is probably not a bad idea, but the db version doesn't change nor does the lodgeit version chagne since they are all in containers so it should just work as long as we do the restore properly | 22:23 |
fungi | since it's in a container, i doubt the base os upgrade is likely to break anything | 22:23 |
fungi | yeah, that's also where my head's at | 22:23 |
fungi | so mainly wondering if we wanted to do the switch straight away, or schedule it for later | 22:24 |
fungi | i'm around now and good with either option | 22:24 |
clarkb | I'm happy to schedule for later. The biggest unknown is the stability of podman I guess? | 22:24 |
fungi | outage will probably be long enough we'll at least want to #status log or notice | 22:24 |
clarkb | though my held nodes ran things without noticeable problems | 22:24 |
clarkb | if we do wait a day or two we can confirm that podman is still running things on the new server before switching | 22:25 |
clarkb | that might be the best option | 22:25 |
fungi | notably, i don't see any indication that we're performing periodic db backups of the current production server | 22:26 |
clarkb | fungi: paste01 is in the backup list | 22:26 |
fungi | yeah, we're doing fs backups | 22:26 |
clarkb | oh I see its the db specifically | 22:26 |
fungi | but no db dumps | 22:26 |
fungi | i was looking for one to gauge the data transfer time order of magnitude | 22:26 |
clarkb | Tue Jan 14 17:28:12 UTC 2025 Backing up stream archive mariadb | 22:27 |
clarkb | the backup logs show it does the separate db backup stream so I think that is happening | 22:27 |
clarkb | that took about a minute and a half to stream from mariadb to the backup server so it probably won't take long | 22:27 |
fungi | weird, on paste01 i don't see any separate cronjobs for that | 22:28 |
clarkb | it's not a separate cronjob, it's part of the main one for each backup server | 22:28 |
clarkb | the backup process looks for any streaming commands somewhere (I forget where we stash those scripts) and runs them after it does the fs backups if they exist | 22:28 |
fungi | aha, but also we're not doing dumps to /var/backups/mysql or whatever like we do on some servers | 22:28 |
fungi | okay, probably good enough | 22:29 |
opendevreview | Clark Boylan proposed opendev/zone-opendev.org master: Swap paste.o.o to paste02 https://review.opendev.org/c/opendev/zone-opendev.org/+/939289 | 22:30 |
clarkb | fungi: ya the local backups were how we used to do things, then ianw reworked it to be more space efficient. We've continued to do both things for some servers (like gitea iirc) | 22:30 |
clarkb | I've gone ahead and staged a change for the CNAME update | 22:30 |
clarkb | I'll wip it for now | 22:30 |
fungi | cool | 22:30 |
clarkb | fungi: but maybe the thing to do is do a test run without taking paste01 down to gauge how long it will take, then we can decide from that if we just send it or if we want to announce it with a bit more advance notice and/or give podman some burn in time | 22:31 |
clarkb | the job failed so that's next to debug | 22:31 |
fungi | mmm | 22:32 |
clarkb | arg this is my fault | 22:33 |
clarkb | I assumed the secret vars were in a group vars file but they must be in a host vars file | 22:33 |
clarkb | this is based on the complaint in the ansible log file saying a secret carrying var was not defined | 22:33 |
fungi | i ran into the same with the keycloak and mailman uplifts | 22:34 |
fungi | sorry i forgot to check that | 22:34 |
clarkb | we actually have both group and host vars so I'm first going to understand what is what but then I'll fix in place on bridge and we can probably just let the daily runs fix it for us | 22:35 |
clarkb | since re-enqueuing for this one change will rerun everything | 22:35 |
fungi | yeah | 22:35 |
clarkb | ok I think the group var file is basically old data. Do we want to merge the content from the paste01 host var file into the group var file then plan to delete the paste01 host var file after we switch to paste02? Or do we want to generate new passwords and secret keys for the new server and have them in different host var files and remove the group var file? | 22:36 |
clarkb | fungi: ^ do you have a preference on that? | 22:37 |
fungi | i'm fine with the first thing, it's less work and we have no reason to expect we need to make more work for ourselves | 22:37 |
clarkb | ack, proceeding with that now. You can check the git log when I'm done | 22:38 |
fungi | i'm already in place and ready to check | 22:41 |
clarkb | fungi: I'm done | 22:42 |
clarkb | I think the two things to check are did I sufficiently set up paste02 for success and did I do anything to make paste01 unhappy. I think we should be good on both counts but extra eyes catch silly mistakes | 22:43 |
fungi | clarkb: git show lgtm | 22:51 |
clarkb | thanks for checking. But I think that likely forces us into a later migration date. I don't think we're in a hurry to do this, otherwise I would manually rerun the playbook or something. I guess we could still do that if we want | 22:52 |
corvus | in general group vars seems more like how we want to manage most things (i think the exception would be during a migration with changing data) | 23:00 |
fungi | agreed | 23:01 |