clarkb | infra-root I'm getting ready to put the review servers in the emergency file and do presyncing of the data (which should dramatically speed up the actual syncs during the downtime) | 13:50 |
clarkb | https://etherpad.opendev.org/p/i_vt63v18c3RKX2VyCs3 is our document | 13:51 |
fungi | cool, i'm around but on a conference call for the next hour | 13:51 |
clarkb | one thing I realized is that the manage-projects git repo cache and json state cache are not synced as part of this migration. That should be fine as they are both caches | 13:52 |
clarkb | but I wanted to call it out as something that we might see later when manage-projects runs for the first time. In particular I would expect it to be slow the first run | 13:52 |
fungi | ah, yeah, cold cache. we could copy it over, but it won't slow down the outage if we don't, just slow down the first mp run | 13:57 |
clarkb | ya I think we ignore it for now and we can try and copy it over later if it becomes a problem | 14:01 |
clarkb | index has presynced, git is presyncing now | 14:01 |
clarkb | then I'll also copy the replication config over but not put it in place yet. | 14:02 |
clarkb | I think /opt/lib/jeepyb is the manage-projects cache. Probably copying over the project.cache json blob is sufficient then let it grow out the git repo/ref cache over time as things change relative to that? I don't know | 14:06 |
clarkb | but as mentioned I think it should be safe for us to proceed as is and just let it build out that cache in the first place | 14:06 |
fungi | should we do a service notice sometime in the next hour reminding people about the upcoming outage and pointing to https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/message/D6VGKXHKXCV6TD6MFJY4H4KQBIM3AQYI/ ? | 14:07 |
clarkb | ++ | 14:07 |
fungi | and yeah, i think we perform a cold cache test of manage-projects in zuul jobs, right? | 14:08 |
fungi | (because there is no jeepyb cache on the test nodes) | 14:08 |
fungi | service notice Reminder: The Gerrit service on review.opendev.org will be offline for a server replacement maintenance between 16:00-17:00 UTC today per https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/message/D6VGKXHKXCV6TD6MFJY4H4KQBIM3AQYI/ | 14:10 |
fungi | something like that? | 14:10 |
clarkb | "yes" I'm not sure the regular system-config-run-review jobs use manage-projects at all. But the separate jeepyb gerritlib integration job does. However this jeepyb gerritlib integration job uses a much smaller projects.yaml that isn't the same as our production one | 14:11 |
fungi | s/^service notice/status notice/ | 14:11 |
clarkb | the notice lgtm | 14:11 |
fungi | cool | 14:11 |
clarkb | git presync took almost 10 minutes but a rerun took only a few seconds | 14:12 |
clarkb | so I think we're in good shape there now | 14:12 |
fungi | perfect | 14:12 |
clarkb | now to copy over the replication config so that it is ready for us later | 14:13 |
fungi | #status notice notice Reminder: The Gerrit service on review.opendev.org will be offline for a server replacement maintenance between 16:00-17:00 UTC today per https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/message/D6VGKXHKXCV6TD6MFJY4H4KQBIM3AQYI/ | 14:13 |
opendevstatus | fungi: sending notice | 14:13 |
-opendevstatus- NOTICE: notice Reminder: The Gerrit service on review.opendev.org will be offline for a server replacement maintenance between 16:00-17:00 UTC today per https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/message/D6VGKXHKXCV6TD6MFJY4H4KQBIM3AQYI/ | 14:13 | |
clarkb | the replication config is stashed in ~gerrit2/tmp/clarkb/replication.config for now | 14:16 |
opendevstatus | fungi: finished sending notice | 14:16 |
clarkb | I guess I can go ahead and clear out the cache files on review03 now since gerrit is stopped there | 14:17 |
clarkb | fungi: for later: maybe you can be set up to force merge https://review.opendev.org/c/opendev/zone-opendev.org/+/947137 to ensure it goes in ahead of the hourly jobs at 1600 UTC (maybe merge it at 15:59?) | 14:20 |
clarkb | then I'll be ready to shutdown gerrit on review02 as soon as that deploys and sync data across | 14:20 |
fungi | yeah, can do | 14:21 |
clarkb | looking at /usr/local/bin/manage-projects it bind mounts root's ssh known_hosts file which appears to have the record in it that we need for the ssh connection to work. It also bind mounts /opt/lib/jeepyb which does not exist on review03 yet. It isn't clear to me if docker/podman will create that directory as part of the bind mount process or if we need to create it first | 14:31 |
clarkb | "If you use --volume to bind-mount a file or directory that does not yet exist on the Docker host, Docker automatically creates the directory on the host for you. It's always created as a directory." | 14:33 |
clarkb | so in theory I think this should work as is | 14:34 |
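Per the Docker docs quoted above it should indeed work as is, though the auto-created directory will be root-owned. If particular ownership or modes matter, pre-creating the path is the belt-and-braces option; a sketch, run against a scratch prefix instead of the real /opt so it is safe to execute anywhere:

```shell
#!/bin/sh
# Docker creates a missing bind-mount source as a root-owned directory.
# Pre-creating it lets us control mode/ownership instead. ROOT is a
# scratch prefix standing in for / on review03.
set -e
ROOT=$(mktemp -d)
CACHE="$ROOT/opt/lib/jeepyb"

# install -d creates the whole path with the requested mode in one step
install -d -m 0755 "$CACHE"
# on the real host we would also chown it, e.g.: chown gerrit2: "$CACHE"

ls -ld "$CACHE"
```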
clarkb | I guess we can sync over the manage-projects cache after we have review03 up and running since manage-projects shouldn't run until we land the switch to pull it out of review-staging. Maybe that is the safest thing to do? | 14:38 |
clarkb | I guess let's worry about that when the initial migration is done | 14:38 |
fungi | yeah, seems like something to double-check later today | 15:01 |
fungi | i've got the submit queued up for 947137,1 in about 45 minutes | 15:14 |
clarkb | then do we want to resend your notice at 1600 too? | 15:17 |
fungi | yeah | 15:26 |
fungi | i can do that once i merge the change | 15:26 |
fungi | status notice notice The Gerrit service on review.opendev.org is offline for a server replacement maintenance between 16:00-17:00 UTC today per https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/message/D6VGKXHKXCV6TD6MFJY4H4KQBIM3AQYI/ | 15:27 |
clarkb | ++ | 15:27 |
fungi | slight edit to bring that one into the present-tense | 15:27 |
clarkb | then I'll wait for that to roll out and for DNS to start resolving the new location before I stop gerrit on 02 and begin data syncs | 15:27 |
fungi | in theory we shouldn't need to wait for either of those | 15:28 |
clarkb | we need to wait long enough that the change replicates and zuul can fetch it | 15:28 |
fungi | the moment the deploy finishes we should be able to go ahead | 15:28 |
clarkb | but ya once we're far enough along it should be fine to proceed | 15:28 |
fungi | well, that yes, but that will happen before the deploy completes | 15:29 |
clarkb | ++ | 15:29 |
fungi | whereas change in dns resolution won't start to propagate until after it's deployed to the server | 15:29 |
clarkb | fwiw I was thinking something like dig review.opendev.org @ns03.opendev.org just to verify the dns update occurred but not necessarily wait for google et al to see it | 15:30 |
fungi | yeah, wfm | 15:32 |
fungi | in theory that should show the new value shortly before the deploy job completes | 15:33 |
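Checking the authoritative server directly, as clarkb describes, sidesteps recursive resolver caches entirely. A sketch of that wait-for-cutover loop; the query is stubbed out here (and the IP is a documentation-range placeholder) since the real `dig +short review.opendev.org @ns03.opendev.org` needs network access:

```shell
#!/bin/sh
# Poll the authoritative nameserver until it serves the new address.
# query() is a stub standing in for:
#   dig +short review.opendev.org @ns03.opendev.org
# NEW_IP is a placeholder from the documentation range, not review03's.
set -e
NEW_IP='203.0.113.7'

query() {
  echo "$NEW_IP"   # the stub answers with the new record immediately
}

tries=0
until [ "$(query)" = "$NEW_IP" ]; do
  tries=$((tries + 1))
  [ "$tries" -lt 30 ] || { echo 'timed out waiting for DNS' >&2; exit 1; }
  sleep 2
done
echo "authoritative server now answers $NEW_IP"
```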
fungi | infra-root: 10 minutes to maintenance | 15:50 |
clarkb | I'm ready now :) | 15:51 |
clarkb | I'm staged to stop gerrit on 02 and dump the sql db there. Then on 03 there is a screen where I'll run the data synchronization from | 15:53 |
clarkb | (no screen on 02 as its just stopping the service and performing the db dump) | 15:53 |
fungi | thanks, attached | 15:54 |
fungi | merging the dns change now | 15:58 |
opendevreview | Merged opendev/zone-opendev.org master: Switch review.o.o to review03 https://review.opendev.org/c/opendev/zone-opendev.org/+/947137 | 15:59 |
fungi | #status notice notice The Gerrit service on review.opendev.org is offline for a server replacement maintenance between 16:00-17:00 UTC today per https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/message/D6VGKXHKXCV6TD6MFJY4H4KQBIM3AQYI/ | 15:59 |
opendevstatus | fungi: sending notice | 15:59 |
clarkb | deploy jobs have started | 15:59 |
-opendevstatus- NOTICE: notice The Gerrit service on review.opendev.org is offline for a server replacement maintenance between 16:00-17:00 UTC today per https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/message/D6VGKXHKXCV6TD6MFJY4H4KQBIM3AQYI/ | 15:59 | |
clarkb | side note in the future maybe we should do things at the half hour so that we avoid hourly jobs entirely? | 16:01 |
fungi | fair point | 16:01 |
opendevstatus | fungi: finished sending notice | 16:02 |
clarkb | `dig review.opendev.org @ns03.opendev.org` seems to show me the new record now | 16:02 |
clarkb | and that deploy job was a success. Should I wait for bootstrap bridge for hourlies to finish or just proceed with stopping services now? | 16:02 |
fungi | i would just proceed | 16:03 |
clarkb | I guess I should proceed since dns updates can be seen on that server at this point anyway and there is no gerrit on 03 running | 16:03 |
clarkb | doing so | 16:03 |
clarkb | gerrit is down and mariadb is up | 16:04 |
clarkb | will do the db dump momentarily (want to update the etherpad progress notes) | 16:04 |
fungi | i've been crossing out what you've done too | 16:05 |
fungi | looks right | 16:06 |
clarkb | db is synced and loaded. proceeding to the rsyncs now | 16:08 |
fungi | k | 16:08 |
fungi | seems done | 16:10 |
fungi | looks like i would expect | 16:10 |
clarkb | index and git should be synced now | 16:11 |
clarkb | fungi: were you able to follow ^ that or do you want to double check in scrollback? | 16:11 |
fungi | i was able to follow it | 16:12 |
clarkb | cool and you're happy that db, git, and indexes all synced? If so I'll copy the replication config over and I think we can turn on gerrit | 16:12 |
fungi | yeah, all looks right | 16:13 |
fungi | replication config lgtm too | 16:13 |
clarkb | ok ready for me to start gerrit on 03? | 16:14 |
fungi | yep, go for it | 16:14 |
clarkb | cool just updated the etherpad to reflect we're about to start gerrit. | 16:15 |
clarkb | I'll start gerrit momentarily | 16:15 |
clarkb | [2025-04-21T16:15:49.548Z] [main] INFO com.google.gerrit.pgm.Daemon : Gerrit Code Review 3.10.5-1-g47283ba335-dirty ready | 16:16 |
clarkb | unfortunately our old fail to ssh friend is still there | 16:16 |
clarkb | so IP change didn't help that | 16:16 |
fungi | Powered by Gerrit Code Review (3.10.5-1-g47283ba335-dirty)\ | 16:16 |
fungi | webui is up | 16:16 |
clarkb | I'm going to work on logging in and all that now | 16:16 |
fungi | i was able to log in | 16:17 |
fungi | pulling up changes and diffs is working for me | 16:17 |
fungi | very snappy | 16:17 |
clarkb | I was able to log in as well | 16:18 |
corvus | i just did some searching by files and it's working (so i guess that confirms some indexes) | 16:18 |
corvus | gertty is happy | 16:19 |
clarkb | corvus: any chance you've checked zuul yet? | 16:19 |
corvus | nope but i can | 16:19 |
clarkb | if we think zuul is happy we can approve https://review.opendev.org/c/opendev/zone-opendev.org/+/947614 and see that zuul ci testing and replication to gitea etc all work | 16:19 |
fungi | yeah, my gertty went "offline" for a bit during the outage but came right back on its own without a restart | 16:20 |
corvus | zuul01 is getting stream events | 16:21 |
clarkb | corvus: cool I think we approve 947614 then | 16:21 |
corvus | looks pretty normal to me | 16:21 |
fungi | openstack tenant isn't very busy, but that's to be expected | 16:21 |
clarkb | I approved that change | 16:22 |
clarkb | its gate job enqueued | 16:22 |
fungi | yeah, picked it up right away | 16:22 |
fungi | seems the job is running normally | 16:22 |
opendevreview | Merged opendev/zone-opendev.org master: Update the SOA serial https://review.opendev.org/c/opendev/zone-opendev.org/+/947614 | 16:23 |
fungi | watching for `host -t soa opendev.org ns03.opendev.org` to update | 16:23 |
clarkb | https://opendev.org/opendev/zone-opendev.org/commits/branch/master shows that replicated | 16:24 |
fungi | currently serving 1744647079 but should switch to 1744911189 during deploy | 16:24 |
fungi | opendev.org has SOA record adns02.opendev.org. hostmaster.opendev.org. 1744911189 3600 600 864000 300 | 16:25 |
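fungi's serial check can be scripted: in `host -t soa` output the serial is the seventh whitespace-separated field, and the deploy is confirmed once it reaches the new value. A sketch using the exact line from above as canned input (a live check would pipe in `host -t soa opendev.org ns03.opendev.org` instead):

```shell
#!/bin/sh
# Pull the SOA serial out of `host -t soa` output and compare it against
# the serial expected after the deploy. Input is canned from the log.
set -e
line='opendev.org has SOA record adns02.opendev.org. hostmaster.opendev.org. 1744911189 3600 600 864000 300'
expected=1744911189

serial=$(printf '%s\n' "$line" | awk '{print $7}')
echo "serving serial: $serial"
if [ "$serial" -ge "$expected" ]; then
  echo 'deploy confirmed'
fi
```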
clarkb | great that roughly verifies code review approval, zuul gating, zuul merge, zuul deploy of system-config stuff | 16:26 |
fungi | also 3 minutes from approval to serving a dns update is... impressive | 16:26 |
clarkb | the next big step is https://review.opendev.org/c/opendev/system-config/+/947139 but because that touches inventory/.* we will trigger manage projects. So before we get to 947139 do we want to sync /opt/lib/jeepyb? | 16:28 |
clarkb | part of me is leaning towards that sync just to avoid errors in manage-projects due to timeouts | 16:28 |
clarkb | or other unexpected behavior | 16:28 |
fungi | i was going to say it would be good to exercise it with a cold/missing cache, but you make a good point about job timeouts | 16:28 |
clarkb | while y'all think about that I'm going to manually disable the replication config on review02 | 16:28 |
clarkb | replication config on review02 is unconfigured (I left it in the state 03 was in before the migration. Top level configs in place but removed all the targets) | 16:30 |
clarkb | /opt/lib/jeepyb is owned by root not gerrit2 so the key setup I used to sync the other data may not work. | 16:31 |
clarkb | fungi: ^ any good ideas for the best way to sync that? It is small so should be quick once we have a plan for it | 16:32 |
clarkb | https://review.opendev.org/c/starlingx/test/+/947751 is a newly pushed patchset | 16:33 |
clarkb | I'm going to double check that replicated | 16:33 |
corvus | m-p is idempotent right, so if it does error we can "just" keep running it? i don't feel strongly either way. | 16:34 |
clarkb | https://opendev.org/starlingx/test/commit/ac386f4a548e95ac32a789bae8612d004b28909e seems to show that replicated | 16:35 |
clarkb | corvus: yes it should be. The idea is it checks state against gerrit if the local cache doesn't know the answer to things like what is the hash of the refs/meta/config | 16:35 |
clarkb | corvus: so it should slowly rebuild that cache from scratch if all goes as expected and we don't sync the cache | 16:35 |
fungi | probably an easy way to copy it is to just tar it up on 02, scp that to 03 and untar there | 16:37 |
clarkb | the original use of tar | 16:37 |
fungi | that way we can preserve ownership without needing to worry about which accounts have access to ssh | 16:37 |
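The tar round trip fungi describes can be sketched as follows; on the real hosts the tarball would be staged in ~gerrit2/tmp and scp'd across (those exact paths are assumptions), while the demo below does the same pack/unpack through scratch directories so it can actually run:

```shell
#!/bin/sh
# Copy a tree as a tarball so ownership and modes survive even when the
# transfer account differs from the file owner. On the servers this is
# roughly: tar up /opt/lib/jeepyb on review02, scp the tarball over,
# untar with -p on review03. Scratch dirs stand in for the two hosts.
set -e
OLD=$(mktemp -d)   # stands in for review02
NEW=$(mktemp -d)   # stands in for review03
mkdir -p "$OLD/jeepyb"
echo '{}' > "$OLD/jeepyb/project.cache"

tar -C "$OLD" -cf "$NEW/jeepyb_cache.tar" jeepyb   # pack on the old host
tar -C "$NEW" -xpf "$NEW/jeepyb_cache.tar"         # -p restores modes on the new host

tar -tf "$NEW/jeepyb_cache.tar"   # the same listing check fungi ran
```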
clarkb | I marked check pushing new patchsets and checking replication as done via starlingx/test 947751,3 | 16:38 |
fungi | well, no, the original use of tar was as a writer for tape archives ;) | 16:38 |
fungi | hence its name | 16:38 |
clarkb | fungi: any chance you want to sync /opt/lib/jeepyb? | 16:38 |
clarkb | you can use /home/gerrit2/tmp/ as the staging location for the tarball on 02 then gerrit2 can fetch it from there | 16:38 |
fungi | i can, sure | 16:38 |
fungi | it's tiny anyway, under a megabyte | 16:39 |
clarkb | this isn't on the etherpad but I've run gerrit show-queue just to check that the server isn't running unexpected tasks or falling behind | 16:40 |
clarkb | it looks correct to me | 16:40 |
clarkb | I've also spot checked the "reviewed" flag on older changes that I know I've reviewed and those flags are present so the sql db sync appears to have worked | 16:42 |
fungi | `tar tf tmp/jeepyb_cache.tar` as gerrit2 on review03 shows expected file list | 16:42 |
clarkb | oh huh are those directories empty? | 16:42 |
clarkb | I guess we create a scratch space to do the refs/meta/config updates that we don't keep around then record the hash in the json blob | 16:43 |
fungi | yeah, i wonder if we should have copied something else | 16:43 |
fungi | the /opt/lib/jeepyb/* directories on review02 are definitely all empty too | 16:43 |
clarkb | the project.cache file timestamp looks about right | 16:43 |
clarkb | and /usr/local/bin/manage-projects does -v/opt/lib/jeepyb:/opt/lib/jeepyb for bind mounts so I think the path is correct | 16:44 |
clarkb | I think we just don't keep as much data there as I expected and the json blob is the interesting thing? | 16:44 |
fungi | /opt/lib/jeepyb/project.cache.old i guess | 16:46 |
clarkb | fungi: /opt/lib/jeepyb/project.cache | 16:46 |
fungi | aha | 16:46 |
fungi | /opt/lib/jeepyb/project.cache.old must be cruft from long ago | 16:46 |
fungi | last modified over 2 years ago, yep | 16:47 |
clarkb | I guess that gives us a third option we could sync just /opt/lib/jeepyb/project.cache | 16:47 |
clarkb | so that we've got an up to date json blob with refs/meta/config hashes we don't have to recalculate | 16:47 |
clarkb | but then let jeepyb build out whatever scratch workspace it needs from there | 16:47 |
clarkb | I kinda like that as a halfway step | 16:48 |
clarkb | fungi: cool I see that file extracted now | 16:49 |
clarkb | and looks like you hashed it and confirmed it matches what is on 02 | 16:50 |
clarkb | given that should I go ahead and remove my WIP from https://review.opendev.org/c/opendev/system-config/+/947139 and remove review03.opendev.org from the emergency file? | 16:50 |
fungi | yeah, let's see what happens | 16:50 |
clarkb | ok wip is removed and review03 is not in the emergency file. review02 is in the emergency file and will stay there | 16:51 |
clarkb | anyone else want to review 947139 or should I approve it? | 16:52 |
clarkb | I guess I should approve it | 16:55 |
clarkb | this is done. You have a few minutes while that gates to stop it | 16:55 |
fungi | thanks! | 16:57 |
clarkb | once that is done I have ~three changes in mind for the longer term followup. First is replacing sighup with sigint (this is already written). Then a change to pull review02 from the inventory. Then finally a change that depends on both of those prior changes that removes the docker compose version specifier so that we stop getting warnings about that from the new host | 17:02 |
clarkb | those are longer term because I don't think we're in a huge rush to land any of them. Timing is likely largely based around when we're comfortable we won't rollback for some reason | 17:03 |
clarkb | oh then we can also move the gerrit images to quay | 17:03 |
clarkb | hourly deploy just completed so when 947139 lands it will dive right in. I think manage-projects occurs near the end though fwiw | 17:09 |
fungi | all four of those sound good, yes | 17:09 |
opendevreview | Merged opendev/system-config master: Unstage review03.opendev.org https://review.opendev.org/c/opendev/system-config/+/947139 | 17:10 |
clarkb | oh that didn't trigger quite as many jobs as I expected. but manage-projects did queue up | 17:10 |
fungi | even better | 17:10 |
clarkb | infra-root now that we're on the noble host if you need to run `docker-compose down` for some reason to shutdown gerrit on 03 you'll see that it sits there waiting and waiting. The reason is we have a 5 minute timeout for gerrit to stop and we issue a sighup that never reaches the container due to apparmor rules. If you do this you can switch to another prompt and `ps -elf | grep | 17:23 |
clarkb | gerrit | grep jdk` to get the pid of the java process then `sudo kill -HUP $pid` and docker-compose down will see the container has shutdown and proceed as usual | 17:23 |
clarkb | I have a change up to switch us over to sigint which should just work then you won't have to think about this anymore | 17:23 |
clarkb | but calling this out now in case that sigint change doesn't merge before you need this trick | 17:24 |
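The find-and-signal trick above can be wrapped in a few lines; the pgrep pattern for the real Gerrit JVM is an assumption (the log greps for `jdk`), and the demo signals a throwaway `sleep` process instead so the sketch is runnable anywhere:

```shell
#!/bin/sh
# Locate a process by command-line pattern and send SIGHUP so a hung
# `docker-compose down` can finish. Against Gerrit the pattern would be
# something like 'jdk.*[Gg]errit'; here a sleep stands in for the JVM.
set -e
sleep 600 &
target=$!

# -n = newest match, -f = match the full command line
pid=$(pgrep -n -f 'sleep 600')
echo "sending SIGHUP to pid $pid"
kill -HUP "$pid"

wait "$target" 2>/dev/null || true   # reap it; sleep dies on SIGHUP
```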
clarkb | manage-projects should be just a minute or two away now | 17:25 |
clarkb | it just started | 17:26 |
clarkb | I'm tailing the log on bridge. It runs against gitea first. Also I think ansible may buffer the entire output of manage-projects and not write it until it's done? | 17:26 |
clarkb | it's done. I'm like 90% certain it nooped as expected but the log isn't small so digging in a bit more now | 17:29 |
fungi | the log is very verbose. it lists tons of things it decides not to do | 17:30 |
corvus | sigint lgtm | 17:31 |
clarkb | `grep manage_projects /var/log/ansible/manage-projects.yaml.log | grep -v 'Processing project' | grep -v 'skipping ACLs'` the only output from this is that it wrote its cache file at the end | 17:31 |
clarkb | I think that confirms it nooped | 17:31 |
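The noop check clarkb ran generalizes to any manage-projects log: drop the per-project chatter and whatever survives is the set of real actions. A sketch of the same filter chain against a tiny synthetic log (the real file lives on bridge at /var/log/ansible/manage-projects.yaml.log):

```shell
#!/bin/sh
# Filter a manage-projects log down to actual actions; if only the cache
# write remains, the run was a noop. The log content here is synthetic.
set -e
log=$(mktemp)
cat > "$log" <<'EOF'
manage_projects: Processing project openstack/nova
manage_projects: skipping ACLs for openstack/nova
manage_projects: Processing project opendev/system-config
manage_projects: skipping ACLs for opendev/system-config
manage_projects: wrote project cache file
EOF

remaining=$(grep manage_projects "$log" | grep -v 'Processing project' | grep -v 'skipping ACLs')
echo "$remaining"
```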
corvus | https://review.opendev.org/947540 | 17:32 |
clarkb | and ya I think ^ should be safe to land before we decide to remove review02 since sigint should work with the old docker compose setup too | 17:33 |
clarkb | but the other changes I need to write are largely going to wait on us being ready to cleanup the old server I think | 17:33 |
clarkb | I think we're in a good spot now. I haven't seen anything concerning. Review02 gerrit is shutdown and in the emergency file. Review03 is now acting like a normal gerrit server with that last change landing. I'm going to take a break for something to eat (I skipped breakfast) then will dive into the todo list at the end of the etherpad | 17:35 |
clarkb | thank you everyone for the extra set of eyeballs. | 17:35 |
corvus | clarkb: thank you for the planning and doing of the work! :) | 17:57 |
fungi | yes, thanks!!! | 18:05 |
fungi | went very smoothly, and we were able to keep the outage to about 15 minutes at the start of the window | 18:05 |
fungi | extreme success | 18:05 |
opendevreview | Clark Boylan proposed opendev/system-config master: Add review03 to a couple of places that were missed https://review.opendev.org/c/opendev/system-config/+/947757 | 18:09 |
opendevreview | Clark Boylan proposed opendev/system-config master: Remove review02 from the inventory https://review.opendev.org/c/opendev/system-config/+/947758 | 18:09 |
opendevreview | Clark Boylan proposed opendev/system-config master: Drop docker-compose version specifier for Gerrit https://review.opendev.org/c/opendev/system-config/+/947759 | 18:09 |
clarkb | infra-root like 947540 I expect 947757 to be a safe change to land now. It adds the server to a couple places I missed initially. Nothing major but we probably want to fix that before we worry about cleaning up the server in the subsequent change | 18:09 |
opendevreview | Clark Boylan proposed opendev/system-config master: Migrate gerrit images to quay.io https://review.opendev.org/c/opendev/system-config/+/882900 | 18:22 |
opendevreview | Clark Boylan proposed opendev/system-config master: Fix gerrit upgrade testing https://review.opendev.org/c/opendev/system-config/+/947761 | 18:24 |
clarkb | infra-root 947761 is another thing I noticed that should be safe to land whenever as only upgrade testing of gerrit is really affected | 18:25 |
clarkb | and then 882900 is a restoration and rebase of the old change that would've put gerrit images on quay. | 18:26 |
clarkb | and with those changes I think I'm caught up on followup changes | 18:27 |
clarkb | now to sync the user cleanup info from my homedir on the old host to the new host | 18:27 |
clarkb | review03:~clarkb/gerrit_user_cleanups should now contain the content from review02:~clarkb/gerrit_user_cleanups | 18:31 |
clarkb | I feel like I'm largely caught up now and just waiting on reviews for the safe changes | 18:33 |
clarkb | given that I'm going to unpause ubuntu-noble image builds now | 18:33 |
clarkb | #status log Migrated Gerrit services from review02 to review03 | 18:35 |
opendevstatus | clarkb: finished logging | 18:35 |
fungi | yay! | 18:35 |
clarkb | #status log Unpaused ubuntu-noble and ubuntu-noble-arm64 nodepool image builds | 18:35 |
opendevstatus | clarkb: finished logging | 18:35 |
fungi | also i think i've reviewed and +2'd all the outstanding related changes as of moments ago | 18:35 |
clarkb | thanks. Looks like I can approve a few of them. I'll approve the sigint change and the fixup to add review03 in a few places | 18:37 |
corvus | you got a pair of +2s on all of them. sounds like a winning hand | 18:38 |
clarkb | the quay change is broken. It's pulling the base images from quay which it shouldn't. I'll fix that | 18:38 |
fungi | gerrit-base? | 18:39 |
clarkb | for anyone following along I approved https://review.opendev.org/c/opendev/system-config/+/947761 https://review.opendev.org/c/opendev/system-config/+/947757 and https://review.opendev.org/c/opendev/system-config/+/947540 I think they are mostly disjoint enough to be safe to land together or separately etc so just sent it | 18:40 |
fungi | we do still have old copies of that on quay too, but i guess the change doesn't start updating those? | 18:40 |
clarkb | fungi: no the python base images | 18:40 |
fungi | aha | 18:40 |
fungi | because originally we had moved everything over | 18:40 |
fungi | python-builder and python-base both apparently | 18:41 |
opendevreview | Clark Boylan proposed opendev/system-config master: Migrate gerrit images to quay.io https://review.opendev.org/c/opendev/system-config/+/882900 | 18:41 |
clarkb | ya but then we found the problem and now the new approach is to do it bit by bit as things move to noble | 18:41 |
fungi | we had also been publishing a literal gerrit-base image to quay | 18:42 |
clarkb | ya we'll resume doing that with 882900 | 18:42 |
clarkb | that's the image that has java and everything but the war in it. Then we have the version specific images that drop the war in | 18:43 |
fungi | ah, we start from python-base as gerrit-base and then amend it and publish the latter | 18:43 |
clarkb | ya we build on the python image to add in java there | 18:44 |
clarkb | since we need both java and python. Then we mix in the specific version of gerrit that we want in the final image | 18:44 |
fungi | via system-config-upload-image-gerrit-base | 18:44 |
fungi | okay, makes sense | 18:44 |
fungi | is there a reason we don't publish python-builder and python-base to both dockerhub and quay, so builds for dependent images going to quay can use it from there? | 18:45 |
corvus | we do have it in opendevmirror; zuul uses it | 18:46 |
clarkb | probably the main thing is the docker hub pipeline is different than the quay pipeline. I think we could switch docker hub pipeline over to the container generic pipeline too though | 18:46 |
clarkb | but basically ^ its just more work right now probably | 18:47 |
fungi | makes sense | 18:47 |
corvus | https://quay.io/repository/opendevmirror/python-base?tab=tags | 18:47 |
corvus | switching to mirror would reduce the docker calls, but we'd lose stacked image testing | 18:48 |
clarkb | things are still gating. I'm going to eat lunch while I wait | 19:11 |
fungi | i'm around in case any of them goes sideways | 19:11 |
clarkb | https://zuul.opendev.org/t/openstack/build/8521e3cbdb584dd897f73794e0c36879/log/job-output.txt#17743 is an interesting result from the quay.io change | 19:46 |
clarkb | I suspect but can't say for sure that we maybe didn't use the newly built base image there and instead used the old stale base image that is on quay | 19:47 |
clarkb | and then that explodes when trying to run the resulting war under that java version? | 19:47 |
clarkb | this isn't urgent though so I'm just going to ignore that for now. I think we can fetch the images locally and inspect them directly to determine what is going on there | 19:48 |
fungi | the newly-built vs stale gerrit-base image? | 19:48 |
clarkb | fungi: ya that error is saying the java can't run the classes in the war I think. So I'm thinking the java version is older than expected so we fetched whatever is actually on quay and not built by https://zuul.opendev.org/t/openstack/build/c0fb6820e42049a3b7cc3c7f7058ee8a | 19:49 |
clarkb | we should be using the newly built image in that buildset but maybe the jobs aren't configured correctly to do that or something | 19:49 |
clarkb | but since this is all down the road followups I'm also thinking punt for now | 19:49 |
fungi | system-config-build-image-gerrit-3.10 needs to run with the artifact system-config-build-image-gerrit-base creates, right? or are they only combined when system-config-run-review-3.10 runs? | 19:50 |
clarkb | right system-config-build-image-gerrit-3.10 should run with the image built by sytem-config-build-image-gerrit-base | 19:52 |
clarkb | within the same buildset | 19:52 |
clarkb | however I suspect this hasn't happened based on that error | 19:52 |
fungi | system-config-build-image-gerrit-3.10 started after system-config-build-image-gerrit-base completed, so it had the opportunity at least | 19:53 |
clarkb | https://zuul.opendev.org/t/openstack/build/c0fb6820e42049a3b7cc3c7f7058ee8a/log/job-output.txt#2699-2758 is where we push to the buildset registry | 19:55 |
corvus | i see that job running the docker-image/pre.yaml playbook; maybe it should run container-image/pre.yaml ? | 19:55 |
clarkb | https://zuul.opendev.org/t/openstack/build/26a4c797e04f4f9e90a279d6da54bab8/log/job-output.txt#1404-1531 is where we fetch it | 19:56 |
clarkb | corvus: aha that may cause it | 19:56 |
fungi | ah, right, if it's not using podman then it's going to pull from dockerhub | 19:57 |
fungi | er, from quay directly | 19:57 |
clarkb | corvus: where do you see that? I'm looking at the console on the jobs and both seem to use container-image? | 19:57 |
fungi | instead of the intermediate, because it can't handle proxy locations for anything other than dockerhub | 19:58 |
fungi | intermediate or buildset | 19:58 |
corvus | https://zuul.opendev.org/t/openstack/build/8521e3cbdb584dd897f73794e0c36879/console | 19:58 |
corvus | 3rd pre playbook | 19:58 |
clarkb | ah in the run job | 19:59 |
clarkb | I do wonder if actually the problem is using docker to build the image since as fungi says that will not use the mirror unless we're using buildah or buildx? | 19:59 |
clarkb | and then separately we may also need to update the run job as well? | 19:59 |
clarkb | the three followup changes should land shortly. I'll watch them and check things like the docker compose file update | 20:02 |
clarkb | I don't recall needing to switch that for system-config-run-paste and from testing it seemed to work when we moved it. But maybe that is also broken | 20:05 |
clarkb | definitely worthy of followup but will have to be another time | 20:05 |
corvus | maybe we just accept things are squirrely until everything in a given stack is switched over | 20:06 |
clarkb | ya that might be a reasonable option depending on the underlying issue | 20:06 |
clarkb | the remaining gitea build is about to run testinfra so hopefully it finishes soon | 20:17 |
opendevreview | Merged opendev/system-config master: Use sigint instead of sighup to stop gerrit https://review.opendev.org/c/opendev/system-config/+/947540 | 20:24 |
opendevreview | Merged opendev/system-config master: Add review03 to a couple of places that were missed https://review.opendev.org/c/opendev/system-config/+/947757 | 20:24 |
opendevreview | Merged opendev/system-config master: Fix gerrit upgrade testing https://review.opendev.org/c/opendev/system-config/+/947761 | 20:24 |
clarkb | stop_signal is now SIGINT on disk and the running containers appear to have been left alone. Job is still running for 947540 though | 20:28 |
fungi | i guess we'll want to do a live test of it at some point when things are quiet | 20:28 |
clarkb | ya but I think that is safe to do another day as well. I have that held node where I tested it a lot (admittedly primarily with kill not docker compose but still) | 20:29 |
fungi | agreed | 20:29 |
clarkb | manage projects ran again and again appears to have noop'd | 20:42 |
clarkb | once these changes deploy I'm going to work on getting a meeting agenda out. Please let me know if you have any edits/updates/deletions/etc to add | 20:44 |
clarkb | but then I may call it early today. i had an early start and it's hard to context switch into something else after doing gerrit all day | 20:45 |
clarkb | all three of those changes have deployed now and all three buildsets report success | 20:52 |
fungi | good idea | 20:53 |
clarkb | I made note of this in #openstack-infra but new noble images have built and are in the process of uploading to the various cloud regions | 20:54 |
clarkb | I don't really know how to test things for neutron. I'm guessing specific jobs should be run but if I tried rechecking stuff now it would be hit and miss for getting on the new image | 20:54 |
fungi | i expect they'll let us know if things are (still/re)broken | 21:01 |
clarkb | ya I think the main risk is we build a second new image before they can test and they both end up broken | 21:03 |
clarkb | but noble pushed new kernels that supposedly fix this and we can't stay in the past forever. It has been almost a month. If things are still broken knowing that and finding workarounds is probably the right thing | 21:04 |
fungi | exactly | 21:08 |
clarkb | https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting has the updates I wanted to apply to it in place | 21:10 |
clarkb | anything else? | 21:10 |
corvus | lgtm; we can talk about who has signed up for image building | 21:12 |
clarkb | oh ya let me add that | 21:12 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Add win_zuul_console to start-zuul-console https://review.opendev.org/c/zuul/zuul-jobs/+/947769 | 21:27 |
clarkb | I'll send out the meeting agenda at 22:00 UTC | 21:36 |
clarkb | looking at the build and system-config-run jobs in https://review.opendev.org/c/opendev/system-config/+/947010 for hound I'm like 95% certain that the image was properly used from the intermediate build registry and not the two year old image that is on quay | 22:03 |
clarkb | my suspicion with the gerrit change is that because docker image builds are trying to use the speculative state we're running into a problem | 22:03 |
clarkb | but it's fine on the runtime side when we use podman as is | 22:04 |
clarkb | ya I think the missing piece is that use-buildset-registry also needs to configure /etc/buildkitd.toml or we can switch from docker to podman for the builds maybe | 22:12 |
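If the missing piece really is the buildkitd config, the shape of the fix would be something like the following sketch (the mirror endpoint is an assumed placeholder, not the actual buildset registry address):

```toml
# Hypothetical /etc/buildkitd.toml fragment: point BuildKit at the
# speculative buildset registry as a mirror for quay.io.
[registry."quay.io"]
  mirrors = ["buildset-registry.example:5000"]  # placeholder endpoint

[registry."buildset-registry.example:5000"]
  http = false
  insecure = true  # buildset registries typically use a self-signed cert
```

Without something like this, docker/BuildKit builds bypass the speculative registry even when the runtime side is configured correctly.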
opendevreview | Clark Boylan proposed opendev/system-config master: Migrate gerrit images to quay.io https://review.opendev.org/c/opendev/system-config/+/882900 | 22:20 |
clarkb | that attempts to build with podman as the build command which should in theory recognize the mirror config that is set up for us | 22:21 |
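Podman, by contrast, reads mirror configuration from containers-registries.conf, which is what the runtime-side setup already writes; roughly (the mirror endpoint is an assumed placeholder):

```toml
# Hypothetical /etc/containers/registries.conf fragment; `podman build`
# consults this file too, so the same mirror covers builds and pulls.
[[registry]]
prefix = "quay.io"
location = "quay.io"

[[registry.mirror]]
location = "buildset-registry.example:5000"  # placeholder endpoint
insecure = true
```

This is the asymmetry being complained about below: docker/BuildKit and podman each have their own config file and format for the same registry-mirror concept.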
clarkb | corvus: also note that https://zuul.opendev.org/t/openstack/build/8521e3cbdb584dd897f73794e0c36879/console#2/0/13/localhost this seems to be how we preload things for system-config-run jobs that use a mix of docker and podman things | 22:24 |
clarkb | so I think the system-config-run jobs are ok, but only when artifacts are setup to preload the images? | 22:24 |
clarkb | times like this I wish the docker image system was less implementation specific (everyone has their own config files and behaviors...) | 22:25 |
clarkb | ok now to get the meeting agenda out | 22:26 |
clarkb | fungi: I've detached from the review03 screen. I think we can clean it up. If you agree I can attach later and exit the shells in there | 22:30 |
clarkb | or feel free to | 22:31 |
fungi | i agree, i already detached a couple hours ago | 22:31 |
clarkb | cool I'll probably get to that tomorrow morning. I'm going to wind down my typing activities for the day. I had to reload keys just to push that 882900 update | 22:32 |
clarkb | which is a sign I've had my keys loaded for long enough. This way I can go on a bike ride too | 22:32 |
fungi | indeed | 22:32 |
opendevreview | Merged openstack/project-config master: storlets: Add gerritbot notification about stable branches https://review.opendev.org/c/openstack/project-config/+/947469 | 22:54 |
opendevreview | Merged openstack/project-config master: zaqar: Fix missing notification about stable branches https://review.opendev.org/c/openstack/project-config/+/947467 | 22:55 |
opendevreview | Merged openstack/project-config master: watcher: Remove notification of puppet-watcher https://review.opendev.org/c/openstack/project-config/+/947466 | 22:57 |
opendevreview | Merged openstack/project-config master: Remove telemetry groups https://review.opendev.org/c/openstack/project-config/+/946767 | 23:00 |
opendevreview | Merged openstack/project-config master: Only pause update_constraints.sh when needed https://review.opendev.org/c/openstack/project-config/+/946541 | 23:00 |
opendevreview | Merged openstack/project-config master: Charms: add review priority to charms repos https://review.opendev.org/c/openstack/project-config/+/942381 | 23:00 |
opendevreview | Merged opendev/gerritlib master: Run the Gerritlib Jeepyb Gerrit integration job on Noble https://review.opendev.org/c/opendev/gerritlib/+/944407 | 23:01 |
opendevreview | Merged openstack/project-config master: Remove qdrouterd role from the config https://review.opendev.org/c/openstack/project-config/+/938194 | 23:08 |
opendevreview | Merged openstack/project-config master: Move OSA sync to integrated repository https://review.opendev.org/c/openstack/project-config/+/947628 | 23:08 |
opendevreview | Merged openstack/project-config master: Deprecate openstack-ansible-tests repository https://review.opendev.org/c/openstack/project-config/+/947629 | 23:08 |
opendevreview | Merged opendev/system-config master: Publish hound container images to quay https://review.opendev.org/c/opendev/system-config/+/947010 | 23:17 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!