Saturday, 2025-06-28

corvusthe zuul playbook failed on the first deployment because "cmd": "docker-compose exec -T scheduler zuul-scheduler smart-reconfigure" did not run00:14
corvuswe should probably wrap that in a conditional to check if the scheduler is running00:15
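[editor's note] The conditional suggested here could look roughly like the sketch below. It is not the actual playbook change; it only illustrates the idea of skipping smart-reconfigure when no scheduler is up. The function names are hypothetical, and it assumes that `docker-compose ps --services --filter status=running` prints one running service name per line on the host in question.

```python
import subprocess

# The command quoted in the log above.
RECONFIGURE_CMD = ["docker-compose", "exec", "-T", "scheduler",
                   "zuul-scheduler", "smart-reconfigure"]


def scheduler_is_running(run=subprocess.run):
    """Return True if docker-compose lists `scheduler` among running services."""
    # Assumption: this invocation prints one running service name per line.
    result = run(
        ["docker-compose", "ps", "--services", "--filter", "status=running"],
        capture_output=True, text=True,
    )
    return result.returncode == 0 and "scheduler" in result.stdout.split()


def smart_reconfigure_if_running(run=subprocess.run):
    """Run smart-reconfigure only when the scheduler is up; report whether it ran."""
    if not scheduler_is_running(run):
        return False  # e.g. first deployment: nothing to reconfigure yet
    run(RECONFIGURE_CMD, check=True)
    return True
```

The `run` parameter exists only so the check can be exercised without a real docker-compose; in an Ansible playbook the same idea would more likely be a `when:` condition on the task.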
corvushrm, incidentally, zuul02 is failing with a broken docker package, but i'm not going to bother fixing it since i'm about to delete it.00:16
corvusi believe that zuul01 ran far enough to be a workable system, so i'm going to start it now00:16
corvushrm, looks like letsencrypt did not run00:28
corvusi'm running it now manually00:39
Clark[m]Looks like because infra-prod-base failed a bunch of things were skipped00:39
Clark[m]As a hack you can copy the certs since the names don't change and worry about LE later00:40
corvusyeah, unfortunately, i don't know the log url for infra-prod-base.  that was my fault.  i should have stopped the zuul01 scheduler sooner, but the deploy stuff was fast enough it cut off zuul01's database access :)00:40
corvusi suppose that's probably on bridge somewhere?00:41
Clark[m]corvus: on bridge it's in /var/log/ansible00:41
corvusoh it's just called "base"00:41
corvusi was looking for like "infra-prod"00:41
corvusrestart exim on zuul02 failed.........00:42
corvusso... also weird... but also, still not going to fix :)00:42
corvusanyway, letsencrypt is done, i'm going to stop zuul02 now and see if all the web stuff still looks good00:42
corvusyep, looks good, only running on the new zuul01 now.  zuul02 is down.00:43
corvushopefully round 2 goes better00:43
corvus(still, we had like... a minute or so of a partial outage)00:44
opendevreviewMerged opendev/zone-opendev.org master: Replace zuul02  https://review.opendev.org/c/opendev/zone-opendev.org/+/95360300:47
opendevreviewMerged opendev/system-config master: Replace zuul02  https://review.opendev.org/c/opendev/system-config/+/95360501:34
corvushttps://imgur.com/MgD9uNl02:19
corvusi'm not sure i've checked the status page at this time of day since the ui update...02:19
corvusthat's a lot of little boxes02:19
tkajinamI wonder if anyone else sees problems with access to zuul build logs02:58
corvustkajinam: yes, i'm working on it... something is wrong with the load balancer02:58
tkajinamcorvus, thanks !02:58
tkajinamyeah I thought it can be a problem with lb or network. firefox complains PR_END_OF_FILE_ERROR while chrome complains ERR_CONNECTION_CLOSED03:02
tkajinamboth indicate the response is not returned properly03:02
opendevreviewJames E. Blair proposed opendev/system-config master: Update zuul-lb configuration  https://review.opendev.org/c/opendev/system-config/+/95368703:38
corvusinfra-root: ^ i completely forgot about that.  i'm assuming we thought we had a good reason for that at the time... but i think maybe we should think about making that automatic.03:39
corvustkajinam: should be fixed now03:42
corvus#status log replaced zuul01/zuul02 with noble servers; deleted old vms03:49
opendevstatuscorvus: finished logging03:49
opendevreviewJames E. Blair proposed opendev/zone-opendev.org master: Replace ze01-ze06  https://review.opendev.org/c/opendev/zone-opendev.org/+/95368804:04
opendevreviewJames E. Blair proposed opendev/system-config master: Replace ze01-ze06  https://review.opendev.org/c/opendev/system-config/+/95368904:04
corvusinfra-root: ^ i'd like to start on those tomorrow.04:05
tkajinamcorvus, thank you! I can access the logs now.04:40
opendevreviewMerged opendev/system-config master: Update zuul-lb configuration  https://review.opendev.org/c/opendev/system-config/+/95368704:59
corvusinfra-root: i also forgot about https://review.opendev.org/941468 -- it looks like we left /var/haproxy in place, which confused me since i didn't realize it was unused; /var/lib/haproxy is the new location.05:17
corvusnext time i'm at a terminal long enough to monitor, i plan to remove those old directories from both of the load balancers (zuul and gitea).05:18
corvusin the meantime, i've confirmed that the zuul haproxy has the correct config now.05:18
opendevreviewMerged opendev/zone-opendev.org master: Replace ze01-ze06  https://review.opendev.org/c/opendev/zone-opendev.org/+/95368814:19
corvus#status log removed unused /var/haproxy from zuul-lb02 and gitea-lb0214:22
opendevstatuscorvus: finished logging14:23
opendevreviewMerged opendev/system-config master: Replace ze01-ze06  https://review.opendev.org/c/opendev/system-config/+/95368914:44
corvusinitial deployment of all of those was fine, except for ze05 which had a Failed to lock directory /var/lib/apt/lists/: E:Could not get lock /var/lib/apt/lists/lock. It is held by process 2294 (apt-get) -- hopefully fixed in the next run15:36
corvuswonder if it was an unattended updates or something15:37
fungialmost definitely, nothing else should be locking the package database while you're in the middle of initial setup15:42
fungiprobably just got lucky with the random timing on it15:43
corvusoh weird, second run failed with the same pid holding the lock15:43
corvusmaybe it's a stuck unattended upgrades15:43
fungimight be a stuck process in that case? i usually look at `ps afuxww` for a quick tree of parentage15:43
corvusyep that's it exactly15:44
corvusi'll kill it, then run apt-get update/upgrade manually to make sure all is ok15:44
fungimaybe something's blocked on i/o15:44
fungiyeah15:44
fungithough that doesn't bode well for a newly-launched vm15:44
corvushttp    2303 _apt    3u  IPv4 136119      0t0    TCP ze05.opendev.org:46006->ubuntu-archive-3.ps5.canonical.com:http (CLOSE_WAIT)15:45
corvusapt-get upgrade did nothing, and apt-get -f install did nothing as well.15:47
corvusrunning the playbook manually now15:48
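[editor's note] The diagnosis above rested on two runs reporting the same pid as the lock holder. A minimal sketch of that check, assuming the apt error text has the form quoted earlier in the log ("It is held by process 2294 (apt-get)"); the function names are hypothetical and this is not tooling anyone in the channel used.

```python
import re

# Matches the holder clause in apt's "Could not get lock" message.
LOCK_HOLDER = re.compile(r"held by process (?P<pid>\d+) \((?P<name>[^)]+)\)")


def lock_holder(message):
    """Return (pid, process_name) from an apt lock error, or None if absent."""
    match = LOCK_HOLDER.search(message)
    if match is None:
        return None
    return int(match.group("pid")), match.group("name")


def same_holder_across_runs(messages):
    """True if two or more lock errors all name the same holder.

    A repeated pid across separate runs suggests a stuck process rather
    than unlucky timing with a short-lived apt-get.
    """
    holders = [lock_holder(m) for m in messages]
    return len(holders) > 1 and None not in holders and len(set(holders)) == 1
```

Once a pid is known, a process tree such as the `ps afuxww` output mentioned above shows what spawned it.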
corvus#status log replaced ze01-ze06 with noble vms; deleted old vms16:15
opendevstatuscorvus: finished logging16:15
corvusi'm going to gracefully stop ze07-ze12 so that for the rest of the day we'll run on 50% capacity, but all new vms.  if something blows up, hopefully we see it during that period, and we'll still have the old vms.  if everything is okay, i'll swap them out tonight.16:17
corvus(so far, the jobs running on the new ze01 look good)16:17
opendevreviewJames E. Blair proposed opendev/zone-opendev.org master: Replace ze07-ze12  https://review.opendev.org/c/opendev/zone-opendev.org/+/95371316:26
opendevreviewJames E. Blair proposed opendev/system-config master: Replace ze07-ze12  https://review.opendev.org/c/opendev/system-config/+/95371416:26
opendevreviewMerged opendev/system-config master: Switch zuul-schedulers/web to noble  https://review.opendev.org/c/opendev/system-config/+/95269717:04
opendevreviewMerged opendev/system-config master: Switch zuul-executors to noble  https://review.opendev.org/c/opendev/system-config/+/95269817:06
