*** JamesNg[m] is now known as jamesngn[m] | 01:09 | |
*** elodilles_pto is now known as elodilles | 07:23 | |
*** mrunge_ is now known as mrunge | 09:20 | |
*** mko is now known as Guest2243 | 12:05 | |
tonyb | clarkb: fungi: Would it be helpful for me to do steps 4-6 in the gerrit 3.10 upgrade checklist? | 14:35 |
fungi | tonyb: i had already set myself reminders to do those, but feel free! | 14:35 |
fungi | i was just getting ready to do step 6 now, in case running deploy jobs take a little longer to complete | 14:36 |
fungi | and then step 5 at the top of the hour as a one-hour notice | 14:37 |
tonyb | fungi: okay. Sounds like you have it in hand already. I don't want to get in the way | 14:37 |
fungi | happy to share, my morning was sidetracked a bit by a power blip that made it clear to me the ups for my workstation needs replacing. since it's rebooted anyway i'm taking a few extra minutes to upgrade packages/kernel on it | 14:39 |
tonyb | Ah okay | 14:39 |
fungi | openafs kernel modules take soooo long to build | 14:40 |
tonyb | Yes, Yes they do | 14:42 |
tonyb | Okay emergency file updated | 14:47 |
fungi | thanks! | 14:49 |
fungi | tonyb: are you in a position to send the status notice now as well? | 15:01 |
tonyb | #status notice Gerrit on review.opendev.org is being upgraded to version 3.10 and will be offline. We have allocated an hour for the outage window lasting until 1700 UTC. | 15:01 |
opendevstatus | tonyb: sending notice | 15:01 |
fungi | hah, perfect! | 15:01 |
fungi | thanks!!! | 15:01 |
-opendevstatus- NOTICE: Gerrit on review.opendev.org is being upgraded to version 3.10 and will be offline. We have allocated an hour for the outage window lasting until 1700 UTC. | 15:01 | |
tonyb | np, I was just a little distracted so I missed the target by a little | 15:02 |
tonyb | #ooops | 15:02 |
tonyb | #status notice Gerrit on review.opendev.org is being upgraded to version 3.10 and will be offline starting at 1600 UTC. We have allocated an hour for the outage window lasting until 1700 UTC. | 15:02 |
opendevstatus | tonyb: finished sending notice | 15:04 |
opendevstatus | tonyb: sending notice | 15:04 |
-opendevstatus- NOTICE: Gerrit on review.opendev.org is being upgraded to version 3.10 and will be offline starting at 1600 UTC. We have allocated an hour for the outage window lasting until 1700 UTC. | 15:04 | |
opendevstatus | tonyb: finished sending notice | 15:08 |
fungi | good catch | 15:09 |
tonyb | fungi: Thanks. Hopefully it's clear enough and if not anyone confused will ask here | 15:10 |
fungi | yeah, should be fine | 15:10 |
fungi | looking at trending for the openeuler mirror, if growth rate of the new version is similar to the old one, we'll probably hit the 350gb quota around june | 15:14 |
tonyb | That doesn't seem sustainable | 15:14 |
fungi | seems to grow linearly at about 25gb/mo | 15:15 |
tonyb | long term that's a lot of disk if there isn't a natural reset/truncate point | 15:17 |
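As a rough aside on the mirror growth figures above: with linear growth, the time left before the quota is hit is the remaining headroom divided by the monthly growth rate. The sketch below uses the 350 GB quota and the roughly 25 GB/month rate mentioned in the channel. The current usage is not stated in the log, so `current_gb` is a hypothetical input, and the function name is only illustrative.

```python
def months_until_quota(current_gb: float, quota_gb: float = 350.0,
                       growth_gb_per_month: float = 25.0) -> float:
    """Estimate months until a linearly growing volume reaches its quota.

    quota_gb and growth_gb_per_month default to the figures from the chat;
    current_gb is NOT given in the log and must be supplied by the caller.
    """
    if growth_gb_per_month <= 0:
        raise ValueError("growth rate must be positive for a linear projection")
    headroom = quota_gb - current_gb
    # Already at or over quota means there is no time left.
    return max(headroom, 0.0) / growth_gb_per_month


if __name__ == "__main__":
    # 200 GB is a made-up starting point, purely for illustration.
    print(months_until_quota(current_gb=200.0))
```

This only models the steady linear trend described in the discussion; a release rotation or cleanup would reset the curve and invalidate the estimate.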
clarkb | I think the reset is probably the next release but as this transition shows there hasn't been much community involvement in getting those rotated regularly | 15:24 |
clarkb | also I am awake and almost ready (I still need tea and ssh keys need loading) | 15:24 |
tonyb | Yeah I guess that's part of my worry? | 15:25 |
clarkb | but ya I personally would like to see openeuler try a no mirror approach given the amount of jobs run there and see if that is stable enough | 15:27 |
tonyb | clarkb: That sounds fair | 15:27 |
clarkb | I have tea and ssh keys are loaded | 15:39 |
clarkb | I've confirmed the preflight items (the logging config is in place and the change to swap our gerrit image tag in prod after we do manual things has a +1 from zuul) | 15:41 |
tonyb | sounds good to me | 15:42 |
clarkb | the emergency file looks good and I see the notice was sent | 15:42 |
clarkb | apparently emergency file edits were in the file twice so I trimmed the extra one. | 15:43 |
clarkb | screen is started and logging in screen is enabled too | 15:43 |
fungi | attached | 15:44 |
* tonyb is also attached FWIW | 15:47 | |
clarkb | sounds good. I'm happy to drive since I've gone through this several times in the last several weeks | 15:49 |
fungi | i've also given the openstack release managers a couple of reminders this morning | 15:53 |
clarkb | er I meant the emergency file edits step was in the etherpad twice so I edited the etherpad | 15:56 |
clarkb | the actual emergency file looked fine | 15:56 |
clarkb | I think I've just remembered that we need to restart zuul schedulers to pick up the new gerrit version too, but I don't think zuul has any new functionality based on that version so it's probably fine for this upgrade if it waits for our normal friday/saturday upgrade process | 15:57 |
tonyb | clarkb: ooooh that makes more sense ... I couldn't figure out how I'd added the content twice | 15:57 |
clarkb | tonyb: sorry about that. I reread what I wrote and realized it wasn't very clear | 15:57 |
tonyb | clarkb: all good | 15:57 |
corvus | clarkb: agree re zuul | 15:58 |
clarkb | I'll make a note about the zuul scheduler thing on the etherpad so we remember for next time | 15:59 |
clarkb | #status notice Gerrit on review.opendev.org is being upgraded to version 3.10 and will be offline momentarily. We have allocated an hour for the outage window lasting until 1700 UTC. | 15:59 |
opendevstatus | clarkb: sending notice | 15:59 |
-opendevstatus- NOTICE: Gerrit on review.opendev.org is being upgraded to version 3.10 and will be offline momentarily. We have allocated an hour for the outage window lasting until 1700 UTC. | 16:00 | |
clarkb | once that gets back with a complete notice I'll start with the disruptive commands (down gerrit and proceed with the rest of the upgrade process) | 16:01 |
clarkb | in the future maybe we send the notice at 15:55 UTC :) | 16:03 |
opendevstatus | clarkb: finished sending notice | 16:03 |
fungi | there we are | 16:03 |
clarkb | ok starting now | 16:03 |
clarkb | fs backup is done; the db backup is running now | 16:05 |
clarkb | it is done too; both subcommands report terminating with success status, rc 0 in the log file | 16:06 |
fungi | and no containers running | 16:07 |
clarkb | these indexes are not small | 16:09 |
clarkb | I half wonder if we should skip this step for the future and just reindex from scratch if we downgrade | 16:10 |
fungi | i think it's good to keep | 16:10 |
fungi | you can plan a longer outage for the upgrade, but rollback is always likely to end in a scramble | 16:11 |
fungi | so easing the rollback process seems like a reasonable choice | 16:11 |
clarkb | ack | 16:13 |
clarkb | pulled image looks correct to me | 16:13 |
clarkb | ok now is when I'll lean on ya'll to check gerrit functionality looks good to you | 16:15 |
clarkb | we are pruning caches and reindexing all of the indexes though so things might be a bit slow | 16:15 |
clarkb | I'm going to check on reindexing progress | 16:16 |
clarkb | reindexing appears to be moving forward but it has a lot of work to do | 16:16 |
clarkb | the web ui loads for me and reports the expected version. I can see change lists and changes and at least one file diff | 16:17 |
fungi | grrr... my broadband provider picked now to have an outage | 16:17 |
clarkb | I'll leave the log file tail in the first screen window and create window 1 to check the config diffs | 16:18 |
tonyb | UI looks good, version is as expected logout/login (with OpenID) "just worked" | 16:18 |
fungi | i'm on a phone terminal right now so no longer following the screen session, but will try to get up on a tether asap | 16:19 |
tonyb | changes load but currently there isn't any diff data | 16:19 |
clarkb | the config diff lgtm. As expected there is a delta but it is limited to the email soy templates | 16:19 |
clarkb | (we don't manage those) | 16:19 |
clarkb | tonyb: ya diffs take a bit to load as caches trim and refresh but after 5-10 minutes should be consistently available | 16:19 |
clarkb | change upload, recheck and zuul enqueuing things, replication and eventually merging a change are the big items to check on | 16:20 |
clarkb | anyone have a change to upload or push a new patchset to? | 16:21 |
tonyb | clarkb: yeah, I know it's expected, just calling it out so that I can confirm when it starts working (which it has done) | 16:21 |
clarkb | I suspect those tracebacks for commits not being found may be due to the reindexing backlog | 16:22 |
clarkb | I'm happy with reindexing progress down to under 2k tasks | 16:22 |
clarkb | (we started with over 16k) | 16:23 |
tonyb | nice | 16:23 |
clarkb | of course it is a long tail as it gets into projects like nova and neutron and cinder but progress is progress | 16:23 |
clarkb | I rechecked https://review.opendev.org/c/opendev/system-config/+/935395 because it runs only a few jobs | 16:24 |
clarkb | and zuul has enqueued things for that recheck | 16:24 |
clarkb | I don't see any new changes or patchsets since the update | 16:28 |
clarkb | I guess I'll work on a noop change to check that | 16:28 |
opendevreview | Clark Boylan proposed opendev/system-config master: DNM testing https://review.opendev.org/c/opendev/system-config/+/937266 | 16:29 |
clarkb | ok new change create worked per ^ and zuul has enqueued it | 16:29 |
tonyb | Thanks | 16:30 |
clarkb | I'm going to check replication of 937266 next | 16:30 |
fungi | okay, finally on a slightly more useful computer via phone tether | 16:31 |
clarkb | git fetch origin refs/changes/66/937266/1 for origin https://opendev.org/opendev/system-config (fetch) succeeded and git show FETCH_HEAD looks correct to me | 16:32 |
clarkb | that has me checking off all of the things we've explicitly listed to check for functionality in the etherpad | 16:32 |
fungi | seems like i missed the most exciting steps, thankfully nothing got more exciting than expected | 16:32 |
clarkb | but please do your own checks as we have different browsers, network setups, and workflows | 16:32 |
clarkb | fungi: I wonder if you should get on the starlink bandwagon :) | 16:33 |
clarkb | also welcome back | 16:33 |
tonyb | Oh feck I was making the replication check way more complex than it needed to be. | 16:34 |
fungi | yeah, there's a new fiber provider rolling out lines here, so hoping to give them a try soon | 16:34 |
clarkb | tonyb: heh ya it's a bit hard to discover the magical refs since they are mostly implementation details but we replicate them too so they make a quick and easy check | 16:34 |
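For readers unfamiliar with the "magical refs": Gerrit stores each patchset under `refs/changes/<last two digits of the change number>/<change number>/<patchset>`, which is how change 937266 patchset 1 becomes `refs/changes/66/937266/1` in the fetch above. A minimal sketch of building that ref and the matching fetch command follows; the helper names are illustrative assumptions, not part of any Gerrit tooling.

```python
from typing import List


def change_ref(change_number: int, patchset: int) -> str:
    """Return the Gerrit ref name for a given change and patchset."""
    if change_number <= 0 or patchset <= 0:
        raise ValueError("change number and patchset must be positive")
    # Gerrit shards refs by the last two digits of the change number,
    # zero-padded so single-digit remainders still use two characters.
    shard = f"{change_number % 100:02d}"
    return f"refs/changes/{shard}/{change_number}/{patchset}"


def fetch_command(remote: str, change_number: int, patchset: int) -> List[str]:
    """Build (but do not run) the git fetch argv for a replicated change ref."""
    return ["git", "fetch", remote, change_ref(change_number, patchset)]


if __name__ == "__main__":
    print(" ".join(fetch_command("origin", 937266, 1)))
```

Running the resulting command against the replication target and inspecting `git show FETCH_HEAD` is the same check described in the channel.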
clarkb | we're currently waiting on reindexing to complete per the etherpad but as mentioned before any additional testing people do is appreciated | 16:35 |
tonyb | clarkb: I was going to each gitea server bypassing the haproxy and checking the object was there, which is more full gitea checking than gerrit -> gitea replication checking | 16:36 |
clarkb | tonyb: ah I see. That doesn't hurt but ya shouldn't be necessary to check if new gerrit can talk to the version of gitea we run | 16:37 |
clarkb | if a specific gitea has a problem I wouldn't expect that to be due to the upgrade | 16:37 |
tonyb | Yeah. My thought process was just plain wrong | 16:39 |
clarkb | when reindexing is complete and we're happy with the testing we have done https://review.opendev.org/c/opendev/system-config/+/937051 is the next thing to get in place so we can remove the emergency file entries | 16:40 |
clarkb | corvus: I think zuul jobs that attempted to update state from gerrit while gerrit was done have reported merge conflict | 16:42 |
clarkb | corvus: I believe this is expected and a recheck will sort them out; however, I wonder if we should update zuul to differentiate between an actual merge command fail and network access/git access failures | 16:42 |
fungi | s/done/down/ presumably | 16:42 |
clarkb | yes down sorry | 16:42 |
clarkb | just about 1k reindexing tasks remaining | 16:43 |
clarkb | https://review.opendev.org/936705 is a change in the gate that should merge soon | 16:44 |
fungi | at least this time the fiber cut is only a mile down the road, not off the side of a deserted stretch of highway over on the mainland like usual | 16:45 |
clarkb | fungi: you should lobby them to wrap the fiber in lots of kevlar and steel | 16:46 |
fungi | and whatever cern uses for antimatter containment | 16:46 |
clarkb | https://review.opendev.org/c/openstack/ansible-collection-kolla/+/936705 did merge | 16:47 |
fungi | seems things are generally working post-upgrade. great work! | 16:47 |
clarkb | reindexing is complete. 3 changes failed to reindex and quickly skimming the log it's ancient changes that we've had problems with before | 16:51 |
clarkb | so I think that is expected | 16:51 |
clarkb | I'm going to stop the logfile tail now | 16:51 |
fungi | thanks! | 16:51 |
tonyb | Sounds good | 16:51 |
tonyb | Thanks clarkb | 16:51 |
clarkb | so I think the next step is to decide if we are happy with this and if so land the change to reflect the new version in system-config | 16:52 |
clarkb | I haven't seen anything concerning yet so I think we can proceed with landing that change | 16:52 |
clarkb | landing that change doesn't prevent a rollback later just makes it slightly more painful | 16:52 |
fungi | sounds good to me, virtual +2 | 16:52 |
clarkb | I removed my -W | 16:53 |
clarkb | if at least one other person can +2 before I approve that would be great (so we record it in there more than just a virtual +2) | 16:53 |
clarkb | https://review.opendev.org/c/opendev/system-config/+/937051 this change | 16:54 |
clarkb | or maybe fungi can just add the +2 later? | 16:57 |
clarkb | I'll go ahead and approve it; we have the time it takes to gate for anyone to protest and ya'll can add +2's later | 16:58 |
clarkb | and just before that appears ready to merge I'll remove nodes from the emergency file so we can check idempotency | 17:00 |
fungi | i can +2 for reals once my broadband is back, getting logged into gerrit or setting up gertty on this connection is going to be a bit of extra work | 17:00 |
clarkb | I just don't want history in gerrit to look like I was ninja upgrading gerrit | 17:00 |
clarkb | but even that is not a big deal | 17:00 |
clarkb | and while I wait I'm going to refresh my tea | 17:01 |
tonyb | I have +2+A'd 937051 | 17:02 |
fungi | thanks tonyb! | 17:03 |
clarkb | tyty | 17:03 |
tonyb | fungi: good luck with your broadband :/ | 17:04 |
clarkb | I think our total outage was about 15 minutes? maybe 20 | 17:04 |
clarkb | and most of that was waiting for things to copy/backup | 17:04 |
fungi | yeah, if it goes on much longer i'll swap my workstation over to the phone tether, though it's really not optimized for lower-bandwidth activity | 17:05 |
fungi | they're estimating service restoration in the next 3 hours, so i may need to bite the bullet for now | 17:06 |
clarkb | my ISP is getting purchased by Bell Canada | 17:06 |
tonyb | clarkb: 14 mins going off the IRC timestamps, and roughly 10 was waiting on backups ;P | 17:07 |
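The arithmetic behind that estimate can be reproduced from the log's own timestamps. The sketch below assumes the outage runs from "ok starting now" (16:03) to the first report that the web UI loads (16:17), and that the backup/copy phase ends when the pulled image is confirmed (16:13). Those boundary choices are a reading of the log rather than anything the participants stated, and the helper name is hypothetical.

```python
from datetime import datetime


def minutes_between(start_hhmm: str, end_hhmm: str) -> int:
    """Whole minutes between two HH:MM timestamps on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end_hhmm, fmt) - datetime.strptime(start_hhmm, fmt)
    if delta.total_seconds() < 0:
        raise ValueError("end time is before start time (same-day assumption)")
    return int(delta.total_seconds() // 60)


if __name__ == "__main__":
    # Assumed boundaries taken from the IRC timestamps in this log.
    print("outage minutes:", minutes_between("16:03", "16:17"))
    print("backup/copy minutes:", minutes_between("16:03", "16:13"))
```

IRC timestamps only have minute resolution, so the result is accurate to about a minute either way.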
corvus | clarkb: it's not trivial to make that differentiation in zuul. we are intentionally conservative about what information git outputs that we send back to the user. | 17:07 |
clarkb | somewhat concerned because the scrappy localish ISP i've been on has been great from a communications perspective | 17:07 |
clarkb | corvus: ack and it's understandable from our end by comparing timestamps and we can just tell people to recheck | 17:07 |
tonyb | clarkb: Yeah I can understand that feeling | 17:07 |
fungi | is all of oregon seceding from the usa to become part of canada, or only the portland area? | 17:08 |
clarkb | the main network admin is on their subreddit answering questions. I have a feeling that will stop once the acquisition completes | 17:08 |
clarkb | fungi: the main secession movement is "cascadia" which is where oregon and washington steal BC and we become our own thing | 17:08 |
fungi | that definitely sounds like more work | 17:09 |
clarkb | there is also the state of jefferson which is southern oregon and northern california becoming a 51st state | 17:09 |
clarkb | and then "greater idaho" which is where eastern oregon splits off to join idaho | 17:09 |
clarkb | I wonder if any other part of the country has this many border realignment groups. Hawaii and alaska maybe? | 17:09 |
clarkb | I suspect the problem is people who live in oregon really like it (weather isn't too extreme, it's beautiful almost everywhere) but then get annoyed when politics don't align so rather than moving themselves want to move the borders | 17:10 |
opendevreview | Clark Boylan proposed openstack/project-config master: Update jeepyb triggered Gerrit builds to Gerrit 3.10 https://review.opendev.org/c/openstack/project-config/+/937268 | 17:16 |
clarkb | looking at post upgrade tasks led to ^ there are several other tasks too that aren't super urgent like dropping 3.9 image builds and adding 3.11 and updating the upgrade job | 17:19 |
clarkb | that said I suspect the issue corvus found with 3.11 means adding 3.11 and updating the upgrade job may be a bit of work compared to the previous upgrade lookahead job updates | 17:20 |
clarkb | which makes me less enthusiastic for jumping into that today | 17:20 |
clarkb | I'll probably try to get some naive changes up today then we can work through the broken next week | 17:20 |
tonyb | Also I suppose the dropping 3.9 can/should wait until we confirm a rollback isn't going to happen | 17:22 |
clarkb | ++ | 17:23 |
clarkb | I think the change to update the gerrit version is going to land a minute or two after hourly jobs enqueue | 17:47 |
clarkb | that said I think it is safe to remove hosts from the emergency file while hourly jobs run because the hourly jobs don't hit review | 17:49 |
clarkb | so I'll go ahead and do that now | 17:49 |
clarkb | (also we don't automatically restart gerrit so even if things update out of sync the running service should continue on the version we want until we fix the on disk version) | 17:49 |
tonyb | ++ | 17:49 |
* tonyb puts laptop down and thinks about dinner | 17:50 | |
clarkb | and done, enjoy dinner | 17:50 |
clarkb | I will check the docker-compose file after jobs run inside of the screen so that we log that then I'll stop the screen and move the logfile | 17:53 |
clarkb | as expected hourly jobs have started before the change merged | 18:01 |
clarkb | this shouldn't be a problem it will just delay our idempotency check by 15-20 minutes | 18:02 |
opendevreview | Merged opendev/system-config master: Update Gerrit image tag to 3.10 (from 3.9) https://review.opendev.org/c/opendev/system-config/+/937051 | 18:05 |
opendevreview | Clark Boylan proposed opendev/gerritlib master: Test gerrit lib (and jeepyb) against Gerrit 3.10.3 https://review.opendev.org/c/opendev/gerritlib/+/937276 | 18:11 |
opendevreview | Clark Boylan proposed opendev/system-config master: Remove log cleanup cronjob from review https://review.opendev.org/c/opendev/system-config/+/937277 | 18:16 |
opendevreview | Clark Boylan proposed opendev/system-config master: Drop Gerrit log cleanup cron from Ansible https://review.opendev.org/c/opendev/system-config/+/937278 | 18:16 |
clarkb | infra-prod-service-review is running now | 18:22 |
clarkb | ansible reports "changed": false, when writing out the docker-compose.yaml and 3.10 still shows up in the file on disk and the timestamp for file modification hasn't changed | 18:25 |
clarkb | if anyone else wants to confirm that in the screen session that would be great then I'll turn off the screen | 18:25 |
clarkb | manage-projects is running now | 18:26 |
clarkb | https://zuul.opendev.org/t/openstack/build/ac72d6bfe8174fa3abca27d817880320 manage projects was successful | 18:30 |
clarkb | skimming the log on bridge I see that project acl updates are all skipped as anticipated and there are no obvious errors | 18:31 |
clarkb | so ya I think we're good for both manage-projects/jeepyb and also docker-compose | 18:31 |
clarkb | I'm going to shutdown the screen now | 18:32 |
clarkb | oh wait, someone was checking something; looks like the buffer got cleared. I'll wait for that to be done (let me know) | 18:32 |
fungi | that was me, sorry, i'm not succeeding at getting into the screen session (it's acting hung after i screen -x, but i guess keypresses were going through). since it seems like nothing went sideways, i'm going to step away and get the shower i skipped earlier. power outage and network outage in the same morning has made this day productivity-challenged | 18:33 |
clarkb | ack and ya I think everything looks good you can always take a look later | 18:34 |
clarkb | quitting screen now | 18:34 |
fungi | thanks. i think the background traffic volume in/out of my workstation is just overwhelming this phone tether | 18:35 |
clarkb | the screenlog is in the upgrade scratch dir and screen is closed | 18:36 |
clarkb | cleaning up autoholds and writing changes to drop 3.9 images and testing add 3.11 images and testing as well as landing the other post upgrade changes I've pushed are next on the agenda. However, I don't think there is a rush on that as the further we get on those the more painful any rollback will be | 18:37 |
clarkb | fungi: assuming network connectivity improves for you I think the main things to check are that docker-compose.yaml looks right, that manage-projects looks happy to you (log is on bridge in the ansible log dir), and that the screenlog we captured is complete | 18:39 |
clarkb | but I've checked all that and it lgtm so I'm not super worried | 18:39 |
fungi | noted | 18:39 |
fungi | i'll hopefully be able to take a look soon | 18:39 |
clarkb | I'm going to take a short break (I haven't eaten anything today) and then work on proposing more followup changes | 18:40 |
opendevreview | Clark Boylan proposed opendev/system-config master: Update gerrit image build job dependencies https://review.opendev.org/c/opendev/system-config/+/937288 | 19:43 |
opendevreview | Clark Boylan proposed opendev/system-config master: Drop Gerrit 3.9 image builds and testing https://review.opendev.org/c/opendev/system-config/+/937289 | 19:44 |
opendevreview | Clark Boylan proposed opendev/system-config master: Add Gerrit 3.11 image builds and testing https://review.opendev.org/c/opendev/system-config/+/937290 | 19:44 |
clarkb | infra-root of ^ https://review.opendev.org/c/opendev/system-config/+/937288 should be landed soonish. The other two are less urgent | 19:44 |
clarkb | I've also been using topic:update-gerrit-3.10 if you want to load all of them up at once | 19:46 |
clarkb | looks like we need to land the jeepyb dependency update before zuul will test the gerrit image changes (this is due to jobs being removed that are still used by jeepyb until that change lands) https://review.opendev.org/c/openstack/project-config/+/937268 is the other change that would be good to land soonish as a result | 19:51 |
fungi | ugh, isp has bumped my repair estimate out another 3.5 hours | 19:51 |
fungi | i've been taking the opportunity to clean up around my lab and reroute wiring at least | 19:52 |
clarkb | fwiw I have reproduced the login behavior from 3.9.8 on 3.10.3 (well, as far as checking login links and refreshing to see they updated; I haven't done a full login pass) | 19:54 |
clarkb | I'll make a note to look into getting my fix that landed on 3.9.8 merged up into 3.10 soonish | 19:54 |
clarkb | er it landed on stable-3.9 after 3.9.8 | 19:54 |
clarkb | typically they roll things forward themselves too, but my guess is with holidays and very few changes on 3.9 so far pushing the merge commit up myself will speed things up | 19:55 |
clarkb | fungi: if your internet comes back and you haven't called it a weekend then I think 937288 and 937268 would be great to get in. But they can also safely wait until next week | 22:10 |
clarkb | I find myself wearing out and winding down and my day didn't start as early as yours. Gerrit upgrades are always a bit draining | 22:10 |
fungi | i'm still holding out hope i can take a look at those, though my isp has bumped out their estimated resolution time by several more hours | 22:50 |
fungi | 02:00 utc now | 22:51 |