Tuesday, 2025-01-14

opendevreviewJeremy Stanley proposed opendev/system-config master: Install ssl-cert-check from distro package not Git  https://review.opendev.org/c/opendev/system-config/+/93918700:00
fungithe cronjob needed a path change too00:00
fungimmm, should i include a task to clean up the old git repo on disk if present?00:01
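For anyone unfamiliar with ssl-cert-check, the kind of cron path change being described would look roughly like the sketch below; the git checkout location, schedule, and domain list file are hypothetical placeholders, not taken from the actual change:

    # hypothetical old entry, running the script out of a git checkout:
    #   0 6 * * * root /opt/ssl-cert-check/ssl-cert-check -a -f /etc/ssl-cert-check/domains.list -e root
    # hypothetical new entry, using the distro-packaged binary instead:
    0 6 * * * root /usr/bin/ssl-cert-check -a -f /etc/ssl-cert-check/domains.list -e root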
clarkbnote that certcheck runs on the cacti node which might be older00:01
clarkbbased on inventory/service/groups.yaml certcheck group membership00:01
fungioh, good point, bionic there00:02
fungihah, on cacti it's already installed from distro package00:02
clarkbhuh we must run it out of the git repo for some reason then00:02
clarkbI wonder what the reason is00:03
fungii'm digging00:03
corvusbionic distro version is an older version00:04
fungiprobably we decided the version in bionic is too old (3.27)00:04
fungilooks like we switched from distro package to git checkout coincident with migrating it from puppet to ansible, but the commit message doesn't explain the reason00:07
fungiand no obvious review comments about it in https://review.opendev.org/728743 either00:09
fungioh, actually, i missed some history from when it was split out to a separate repo. it happened here in 2019: https://review.openstack.org/65016200:10
fungi"so that we get new features like support for SNI"00:11
fungii'll wip 939187 and note that it's on hold until we replace cacti or move the function to a different server00:11
corvusany reason not to run it on bridge?00:13
fungiapparently we do already in the test jobs00:13
fungijust not in production00:13
corvuslet's make production more like test ;)00:14
fungithe only reason i can think of is to minimize the surface area for vulnerabilities on bridge, given its importance, but this doesn't seem like a high risk00:14
fungiclarkb: ^ maybe we should touch on that briefly during the meeting? too late to get it onto the agenda?00:14
Clark[m]Sorry, switched to the laptop to do the mega update the desktop got this morning and I'm not on irc yet. The agenda hasn't gone out yet, I can add it00:16
fungihttps://www.githubstatus.com/incidents/qd96yfgvmcf9 might explain the pull error that job reported00:27
clarkbok added to the agenda and I'll get that sent out momentarily00:47
fungithanks!01:15
opendevreviewLajos Katona proposed openstack/project-config master: Remove taas-tempest-plugin and taas-dashboard jobs  https://review.opendev.org/c/openstack/project-config/+/93866510:30
opendevreviewVladimir Kozhukalov proposed zuul/zuul-jobs master: [remove-registry-tag] Improve usage experience  https://review.opendev.org/c/zuul/zuul-jobs/+/93923413:09
*** dhill is now known as Guest589713:31
*** jhorstmann is now known as Guest590013:55
opendevreviewMerged openstack/project-config master: Remove taas-tempest-plugin and taas-dashboard jobs  https://review.opendev.org/c/openstack/project-config/+/93866514:50
clarkbanyone know how to confirm https://quay.io/repository/opendevmirror/mariadb/manifest/sha256:1d3a79e8186307b8f76141e190230f04bce4ebef98057586b8ca75cb0c5055c4 and https://hub.docker.com/layers/library/mariadb/10.11/images/sha256-db739b1d86fd8606a383386d2077aa6441d1bdd1c817673be62c9b74bdd6e7f3 are the same image without fetching them locally?15:35
clarkbin theory those are both the 10.11 mariadb tag, but I don't find shas that match up. Potentially because the docker hub side is multiarch and the quay side is just amd64?15:35
clarkbif I fetch them locally they both end up with the same image id so docker inspect basically inspects the same image strongly implying they are the same. I was just hoping the web dashboards would give us some method of making the same comparison15:38
clarkbanyway infra-root any preference on whether paste02 should incorporate a move to the mirrored image or do that in a separate parent change first?15:39
fungican you checksum them?15:43
clarkbfungi: I think that is effectively what the docker pull and docker inspect locally is doing15:44
fungimakes sense15:45
clarkbmy gripe here is that the local client figures out they are the same and effectively merges them, but the two web dashboards don't provide a quick and easy way to do that15:45
clarkbjust wondering if I'm missing something to make this easier15:45
fungii'd be fine squashing the mirror switch into the existing change, but i guess separating it out would make a clearer example for any other similar switching we want people to work on15:45
clarkbI think the issue with hashes mismatching is that both web dashboards show you the manifest hash. The manifest on docker hub has many arches in it while quay.io has a single arch, resulting in different manifests and therefore different hashes. But the amd64 image is the same sha256:73b895f9f0fdb34280b709e601d5293a411b1c92cc2ff71a2f70d88fa0d6ddc015:47
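As a side note, one way to make that comparison without pulling anything locally is to resolve both tags down to their image config digest (the "image id" that docker inspect reports); a sketch assuming skopeo and jq are available and that the quay.io tag is a single-arch manifest as described above:

    # Find the amd64 entry in Docker Hub's multi-arch manifest list.
    AMD64_MANIFEST=$(skopeo inspect --raw docker://docker.io/library/mariadb:10.11 \
      | jq -r '.manifests[] | select(.platform.architecture=="amd64" and .platform.os=="linux") | .digest')

    # Read the config digest (image id) from that per-arch manifest.
    skopeo inspect --raw "docker://docker.io/library/mariadb@${AMD64_MANIFEST}" | jq -r '.config.digest'

    # The quay.io mirror is single-arch, so its manifest carries the config digest directly.
    skopeo inspect --raw docker://quay.io/opendevmirror/mariadb:10.11 | jq -r '.config.digest'

If the two config digests match, the tags point at the same image even though the manifest digests shown by the two web dashboards differ.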
clarkbI can separate it out then15:47
opendevreviewClark Boylan proposed opendev/system-config master: Add paste02 to inventory as a new paste server  https://review.opendev.org/c/opendev/system-config/+/93912415:52
opendevreviewClark Boylan proposed opendev/system-config master: Switch to quay.io/opendevmirror/mariadb:10.11 for paste  https://review.opendev.org/c/opendev/system-config/+/93925215:52
clarkbthere is an email about review.o.o's cert expiring in 29 days or less. I just checked and it looks like we did renew the cert so I suspect that not all of the apache processes have aged out? If the warning persists we may do a proper apache restart16:11
clarkbfeel free to double check me on that16:12
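For reference, a quick way to see which certificate the running apache workers are actually serving (as opposed to what has been renewed on disk) is something like:

    # Print the expiry date of the certificate currently presented on port 443.
    echo | openssl s_client -connect review.opendev.org:443 -servername review.opendev.org 2>/dev/null \
      | openssl x509 -noout -enddate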
fungithanks, i was going to look into that but hadn't gotten to it yet16:14
fungialso i see an infra-prod-service-zuul failure, i'm digging into that now16:14
fungiERROR: for executor  Head "https://quay.io/v2/zuul-ci/zuul-executor/manifests/latest": received unexpected HTTP status: 502 Bad Gateway16:16
fungiguess it was just a blip, a subsequent run succeeded16:17
clarkbfungi: ya quay noted they had some 500 error problems that were being addressed16:24
clarkbthis was in a banner when I tried to compare mariadb hashes16:24
opendevreviewMerged opendev/engagement master: Switch from pipermail to hyperkitty archives  https://review.opendev.org/c/opendev/engagement/+/93898316:24
opendevreviewMerged opendev/engagement master: Add historical report data  https://review.opendev.org/c/opendev/engagement/+/93898416:24
fungioh cool, mystery solved then, nothing to see here16:32
fungipython 3.14.0a4 is up16:44
clarkbthe mariadb switch to opendevmirror seems to have worked17:30
clarkbso I guess if we feel that is safe for opendev production services we can proceed with that then deploy paste0217:31
fungia couple of git vulnerabilities were just announced on the oss-security mailing list, but don't seem likely to impact our systems or users of our workflows: https://www.openwall.com/lists/oss-security/2025/01/14/418:32
clarkbfungi: any reason to not approve https://review.opendev.org/c/opendev/system-config/+/939252 and then go eat lunch?19:57
clarkbIt should be a noop19:57
fungisounds good to me, done19:58
clarkbthanks and now I will find food19:58
fungii'm popping out for a bit to grab dinner, but i guess i can recheck the paste02 inventory add again when i get back19:59
opendevreviewMerged opendev/system-config master: Switch to quay.io/opendevmirror/mariadb:10.11 for paste  https://review.opendev.org/c/opendev/system-config/+/93925220:31
clarkbfungi: I posted a couple of comments on the bindep stack but nothing is critical so I +2'd20:36
clarkbhttps://zuul.opendev.org/t/openstack/build/a5d5ecd8d50545f3896b7a8744c7b973 was successful. Checking the server next20:39
clarkbthe image ids match between mariadb and the quay mirrored image. One thing I didn't expect is that docker-compose still restarted the database20:39
clarkbso now I'm glad I didn't do that for all services all at once20:40
clarkbhttps://paste.opendev.org/show/boCi85KicbL8LtDwEeiE/ I am able to make a paste20:40
clarkband I can load old pastes so I think this is happy now20:40
clarkbthe paste02 inventory add actually needs a reapproval because I stacked it on top of the quay.io mirror mariadb change. I'll approve it now20:41
opendevreviewMerged opendev/system-config master: Add paste02 to inventory as a new paste server  https://review.opendev.org/c/opendev/system-config/+/93912421:10
clarkbI forgot that changes to the inventory file are going to trigger all the jobs. But those are enqueued and it should get to it eventually21:18
fungiindeed21:20
opendevreviewJay Faulkner proposed openstack/project-config master: Deprecate ironic-lib  https://review.opendev.org/c/openstack/project-config/+/93928221:25
opendevreviewJeremy Stanley proposed opendev/bindep master: Evacuate most metadata out of setup.cfg  https://review.opendev.org/c/opendev/bindep/+/93852021:27
opendevreviewJeremy Stanley proposed opendev/bindep master: Drop support for Python 3.6  https://review.opendev.org/c/opendev/bindep/+/93856821:27
opendevreviewJeremy Stanley proposed opendev/bindep master: Drop requirements.txt  https://review.opendev.org/c/opendev/bindep/+/93857021:27
opendevreviewJay Faulkner proposed openstack/project-config master: Deprecate ironic-lib  https://review.opendev.org/c/openstack/project-config/+/93928221:34
opendevreviewJay Faulkner proposed openstack/project-config master: Deprecate ironic-lib  https://review.opendev.org/c/openstack/project-config/+/93928221:44
opendevreviewJay Faulkner proposed openstack/project-config master: Deprecate ironic-lib  https://review.opendev.org/c/openstack/project-config/+/93928221:47
opendevreviewJay Faulkner proposed openstack/project-config master: Deprecate ironic-lib  https://review.opendev.org/c/openstack/project-config/+/93928221:49
opendevreviewJay Faulkner proposed openstack/project-config master: Deprecate ironic-lib  https://review.opendev.org/c/openstack/project-config/+/93928221:55
clarkbthe paste deployment should happen soon. Maybe in 15 minutes?22:16
fungiyeah, still trying to keep an eye on it so i can test before we switch the cname22:21
clarkbwell we have to move the data too22:21
clarkb(just want to make sure we don't forget that step)22:21
fungiright, who was going to dump/import the db and when?22:21
fungipresumably we'll want to temporarily stop apache on the production server for that, so we don't lose any additions22:22
clarkbI was sorta not thinking about it until after we had a successful deployment. But assuming that happens we could do it immediately I guess?22:22
clarkbyes, the process should be: stop the old server's services except the db, dump the db, copy the dump to the new server, restore, update the cname22:22
clarkbthen there are some followup changes I need to rebase that will update the backup system to back up the new server22:22
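A rough sketch of that sequence; the service/container names, credentials, and file paths below are placeholders rather than the real deployment details:

    # On paste01: stop the frontend so no new pastes land, but leave the db running.
    systemctl stop apache2
    docker exec mariadb mariadb-dump --all-databases -uroot -p"$DB_ROOT_PASSWORD" > /root/paste-dump.sql

    # Copy the dump over and restore it on the new server.
    scp /root/paste-dump.sql paste02.opendev.org:/root/
    # On paste02:
    docker exec -i mariadb mariadb -uroot -p"$DB_ROOT_PASSWORD" < /root/paste-dump.sql

    # Finally, un-WIP and merge the zone change so the paste.opendev.org CNAME points at paste02.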
fungidid we want to do a test dump/import first and then stop apache to do a final dump/import and cname update? or just do it all in one?22:23
clarkba test is probably not a bad idea, but the db version doesn't change nor does the lodgeit version change since they are all in containers, so it should just work as long as we do the restore properly22:23
fungisince it's in a container, i doubt the base os upgrade is likely to break anything22:23
fungiyeah, that's also where my head's at22:23
fungiso mainly wondering if we wanted to do the switch straight away, or schedule it for later22:24
fungii'm around now and good with either option22:24
clarkbI'm happy to schedule for later. The biggest unknown is the stability of podman I guess?22:24
fungioutage will probably be long enough we'll at least want to #status lot or notice22:24
clarkbthough my held nodes ran things without noticeable problems22:24
fungis/lot/log/22:25
clarkbif we do wait a day or two we can confirm that podman is still running things on the new server before switching22:25
clarkbthat might be the best option22:25
funginotably, i don't see any indication that we're performing periodic db backups of the current production server22:26
clarkbfungi: paste01 is in the backup list22:26
fungiyeah, we're doing fs backups22:26
clarkboh I see its the db specifically22:26
fungibut no db dumps22:26
fungii was looking for one to gauge the data transfer time order of magnitude22:26
clarkbTue Jan 14 17:28:12 UTC 2025 Backing up stream archive mariadb22:27
clarkbthe backup logs show it does the separate db backup stream so I think that is happening22:27
clarkbthat took about a minute and a half to stream from mariadb to the backup server so it probably won't take long22:27
fungiweird, on paste01 i don't see any separate cronjobs for that22:28
clarkbit's not a separate cronjob, it's part of the main one for each backup server22:28
clarkbthe backup process looks for any streaming commands somewhere (I forget where we stash those scripts) and runs them after it does the fs backups if they exist22:28
fungiaha, but also we're not doing dumps to /var/backups/mysql or whatever like we do on some servers22:28
fungiokay, probably good enough22:29
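To make the streaming mechanism concrete: roughly, the nightly backup job runs any per-service stream script it finds and pipes the output straight into a new borg archive instead of writing a local dump first. The script path, archive name, and repo location below are illustrative guesses, not the real configuration:

    # Hypothetical stream script contents (e.g. something like /etc/borg-streams/mariadb):
    #!/bin/bash
    docker exec mariadb mariadb-dump --all-databases -uroot -p"$DB_ROOT_PASSWORD"

    # The backup job then does approximately:
    /etc/borg-streams/mariadb | borg create ssh://borg@backup-server/path/to/repo::paste01-mariadb-{now} -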
opendevreviewClark Boylan proposed opendev/zone-opendev.org master: Swap paste.o.o to paste02  https://review.opendev.org/c/opendev/zone-opendev.org/+/93928922:30
clarkbfungi: ya the local backups were how we used to do things then ianw reworked it to be more space efficient. We've continued to do both things for some servers (like gitea iirc)22:30
clarkbI've gone ahead and staged a change for the CNAME update22:30
clarkbI'll wip it for now22:30
fungicool22:30
clarkbfungi: but maybe the thing to do is a test run without taking paste01 down to gauge how long it will take, then we can decide from that if we just send it or if we want to announce it with a bit more advance notice and/or give podman some burn-in time22:31
clarkbthe job failed so that's next to debug22:31
fungimmm22:32
clarkbarg this is my fault22:33
clarkbI assumed the secret vars were in a group vars file but they must be in a host vars file22:33
clarkbthis is based on the complaint in the ansible log file saying a secret carrying var was not defined22:33
fungii ran into the same with the keycloak and mailman uplifts22:34
fungisorry i forgot to check that22:34
clarkbwe actually have both group and host vars so I'm first going to understand what is what but then I'll fix in place on bridge and we can probably just let the daily runs fix it for us22:35
clarkbsince re-enqueuing for this one change will rerun everything22:35
fungiyeah22:35
clarkbok I think the group var file is basically old data. Do we want to merge the content from the paste01 host var file into the group var file then plan to delete the paste01 host var file after we switch to paste02? Or do we want to generate new passwords and secret keys for the new server and have them in different host var files and remove the group var file?22:36
clarkbfungi: ^ do you have a preference on that?22:37
fungii'm fine with the first thing, it's less work and we have no reason to expect we need to make more work for ourselves22:37
clarkback proceeding with that now. You can check the git log when I'm done22:38
fungii'm already in place and ready to check22:41
clarkbfungi: I'm done22:42
clarkbI think the two things to check are: did I sufficiently set up paste02 for success, and did I do anything to make paste01 unhappy. I think we should be good on both counts but extra eyes catch silly mistakes22:43
fungiclarkb: git show lgtm22:51
clarkbthanks for checking. But I think that likely forces us into a later migration date. I don't think we're in a hurry to do this, otherwise I would manually rerun the playbook or something. I guess we could still do that if we want22:52
corvusin general group vars seems more like how we want to manage most things (i think the exception would be during a migration with changing data)23:00
fungiagreed23:01
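To illustrate the option chosen above (all paths and variable names are hypothetical, just showing the shape of the change made on bridge):

    # Merge the paste01 host vars into the group vars file so both paste01 and
    # paste02 pick up the same secrets, then tidy and commit.
    cd /path/to/private/ansible/vars        # hypothetical location on bridge
    cat host_vars/paste01.opendev.org.yaml >> group_vars/paste.yaml
    $EDITOR group_vars/paste.yaml            # de-duplicate the merged keys
    git add -A && git commit -m "Move paste secrets from host vars to group vars"
    # host_vars/paste01.opendev.org.yaml can be removed once paste02 replaces paste01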
