Wednesday, 2025-01-08

clarkbexperimentally, with docker compose talking to podman on noble, only restart: always results in containers running post reboot. "no", "on-failure" and "unless-stopped" all have the same behavior of not starting containers on boot if they were running when a reboot command was issued00:00
clarkbI don't know that I'll get to testing the docker-compose with docker behavior this evening but I should be able to run through that tomorrow morning and we can compare00:00
clarkbI'm going to reset the held paste setup back to restart: always and start containers so that it matches our existing configs00:01
RMI am hitting "https://review.opendev.org/SignInFailure,SIGN_IN,Contact+site+administrator" while trying to login to open, it was working fine before this morning00:06
RMsorry opendev* I mean00:07
RMcan the administrator look into this issue?00:07
clarkbRM: it looks like your email address is in use by another account. This implies that when you logged in previously you did so with one openid and since then something has changed for you on the ubuntu one side of things to give you a new openid with the same email address00:10
clarkbRM: gerrit wants to create a new account for the new openid but fails to do so because an email address cannot be shared by multiple accounts00:11
clarkbthe easiest solution to this is to login with your prior ubuntu one openid if you are able to00:11
clarkbif you are not able to do so then we may have to do some account surgery and essentially retire your old account and delete the conflicting email address from that account so that you can login with the new openid and get a new gerrit account00:12
RMThank you, it is working now.  I used the older account.00:25
clarkbRM: great if you look at https://review.opendev.org/settings/#Identities and https://review.opendev.org/settings/#Profile that should give you a complete list of emails that can't be associated with another account00:26
RMThank you, this helps00:32
*** janders3 is now known as janders03:16
fricklerinfra-root: I would like to get some movement into the stuck unattended-upgrades process, causing us to receive daily mails from almost all hosts. this is a typical example https://paste.opendev.org/show/bgP5k3wLnWDm4V08p0qz/07:59
fricklerany objection to me doing the updates manually on one host in order to find out about the blockers and then look at whether to automate something or just do it manually on all hosts as a one-off?07:59
frickleralso I'm not sure about the docker related updates. do we just install the then-current version when creating a host and those are not touched by unattended-upgrades? are we fine with that?08:01
*** thuvh1 is now known as thuvh09:10
tonybfrickler: I suspect that docker isn't being updated as it doesn't match any of the allowed-Origins, which I think is our bug to correct.09:46
tonybfrickler: It looks to me like policy ubuntu-pro-client is a new package, and that requires config and that's the cause of all the other things being on hold (ignoring the docker issue)09:48
tonybfrickler: Looking at: https://download.docker.com/linux/ubuntu/dists/jammy/Release I think we should add origin=Docker, suite=jammy to our nodes, probably as part of the install-docker role09:50
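To illustrate why a package repo can be skipped when its origin is not in the allowed list, here is a minimal Python sketch. It is a simplified model, not unattended-upgrades' actual matching code: it assumes every key=value pair in a pattern must match exactly, and the function names and the `ubuntu_only` pattern list are hypothetical. Only the `origin=Docker, suite=jammy` pattern comes from the suggestion above.

```python
# Simplified model of origin-pattern matching; names here are illustrative.

def parse_pattern(pattern: str) -> dict:
    """Split 'origin=Docker, suite=jammy' into {'origin': 'Docker', 'suite': 'jammy'}."""
    fields = {}
    for part in pattern.split(","):
        key, _, value = part.strip().partition("=")
        fields[key] = value
    return fields


def matches(pattern: str, release: dict) -> bool:
    """A release matches when every key=value pair in the pattern is equal."""
    return all(release.get(k) == v for k, v in parse_pattern(pattern).items())


def is_allowed(release: dict, patterns: list) -> bool:
    """A repo is considered for upgrades if any configured pattern matches."""
    return any(matches(p, release) for p in patterns)


# Release fields as described for the Docker jammy repo in the message above.
docker_release = {"origin": "Docker", "suite": "jammy"}

# Hypothetical existing configuration that only lists a distro origin.
ubuntu_only = ["origin=Ubuntu, suite=jammy-security"]

print(is_allowed(docker_release, ubuntu_only))
print(is_allowed(docker_release, ubuntu_only + ["origin=Docker, suite=jammy"]))
```

Under these assumptions the Docker repo is ignored until a pattern naming its origin and suite is added. Whether that is desirable is a separate question, as discussed later in the log.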
fricklertonyb: did you intend to upload a new PS for https://review.opendev.org/c/opendev/system-config/+/921321 ? seems a bit weird to me to mark comments as done when in fact nothing has changed yet10:39
fricklerconfig-core: two small reviews https://review.opendev.org/c/openstack/project-config/+/938537 https://review.opendev.org/c/openstack/project-config/+/93818010:42
fricklertonyb: regarding docker, the question for me is do we really want unattended upgrades for that? or maybe rather pin to some specific version explicitly, avoiding also unintentional upgrades, and just update via config changes when we deem those necessary?10:45
tonybfrickler: I will, but the changes to the second patch in that series are taking longer than I expected :/10:47
tonybfrickler: I think we do want it handled automatically, but there is years of context that I don't have10:50
fricklercorvus: seems the image mirror jobs ran fine tonight, but I cannot see the repos. iirc from kolla, newly created repos need some more clicks in order to be made public?13:11
opendevreviewLajos Katona proposed openstack/project-config master: Remove taas-tempest-plugin and taas-dashboard jobs  https://review.opendev.org/c/openstack/project-config/+/93866514:30
clarkbfrickler: tonyb  we cannot upgrade docker safely automatically; doing so often (always?) requires restarting the containers16:01
clarkbit's less about the version (I can't recall an instance where updating the version presented problems) and more about the upgrade process itself16:02
fungiyeah, the package upgrade restarts dockerd which downs and ups all the containers16:02
clarkbbut also unattended upgrades shouldn't touch docker so that shouldn't create any problems for us? the issue with things getting stuck may be separate?16:03
clarkbI think the new package thing is likely the problem16:03
fungiright, the docker-ce packages aren't considered for upgrade by unattended-upgrades with our current configuration16:04
clarkbre the opendevmirror yes I think you have to manually toggle things to public. I may have encoded that into our regular push roles16:05
clarkbcorvus: frickler ensure-quay-repo in zuul-jobs takes care of that I think so we may want to modify the job to run that first?16:07
clarkbit's the "visibility" attribute in there16:08
clarkbI suspect if we add that to the jobs it will automatically update things for us for all the images on the next run16:09
clarkbbut it requires an api token which is maybe different to the docker api credential?16:10
clarkbcorvus: frickler: do we want to update the job and see that it flips things to public properly or manually set those images to public for now and do the job update as part of followup image mirroring?16:18
clarkbI personally really need to focus on some annual paperwork stuff today. I'm going to do docker restart testing shortly if the held node is good then I may need to tune out IRC for a bit to focus on that so I'll defer to others16:19
clarkbOh I also brought up the Gerrit H2 cache problems yesterday and there is some feedback from upstream. All h2 db access is currently serialized in gerrit due to the version of h2. This can explain why db operations on startup hold everything up. They are working on updating the h2 version which would remove this restriction. There is some thought that it wasn't the pruning action that16:22
clarkbwas slow but instead the creation of bloom filters. There is work to add better indexes to the h2 dbs to improve the general performance of operations like this. And finally reindexing writes to the caches which can cause them to explode in size because reindexing looks at all changes rather than just the working set (so you don't really get a cache behavior but instead a total16:22
clarkbproblem space db) and there is an open change to make that optional16:22
clarkbthe suggestion to us was that we try and monitor the problem more closely to characterise it a bit better. Perhaps capturing jstacks again if things are slow16:23
clarkbbut also it seems like there may be some broader effort to improve h2 usage, but not necessarily fix the specific problem of the dbs can get really big16:23
clarkbfor docker-compose and docker managed containers experiencing a reboot while running the default of no policy and a policy of no both don't restart the containers on boot. This matches the podman case. on-failure policy is a weird one. The gerrit container restarted but mariadb did not. If you look at the "no" cases this is because gerrit was exiting non zero but mariadb was not so16:40
clarkbit treats gerrit as having failed so it restarts it. This differs from the podman behavior. unless-stopped resulted in containers being started on boot (expected since they weren't manually stopped prior to the reboot) this differs from the podman behavior. Then finally always resulted in containers starting up. This matches podman behavior.16:40
clarkbthis means that both unless-stopped and on-failure differ. The on-failure behavior is inconsistent enough with docker that my initial impression is that we probably shouldn't rely on that anyway?16:40
clarkbOn paper it seems like a good idea but if you can restart a container that depends on another container but not the one it depends on due to technicalities in the rules then maybe it isn't a good choice anyway.16:41
clarkbThat leaves us with unless-stopped which I guess we can approximate with down instead of stop?16:41
clarkbI think it is worth all of us considering what that means for our services going forward and if we're happy with the less rich but maybe more consistent behavior of podman?16:42
clarkbcc infra-root and corvus in particular16:42
clarkbthe test nodes are also up if anyone wants to check themselves. The noble node with podman is 158.69.66.158 and focal node with docker is 200.225.47.2416:44
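The reboot behavior reported above can be summarized in a short Python sketch. It only encodes the observations from these two test nodes (podman on noble, docker on focal). The function and variable names are illustrative and are not part of docker or podman, and `exit_code` stands for the code a container exited with when the host went down.

```python
# Encodes the observed restart-on-boot results; not an API of either runtime.
POLICIES = ["no", "on-failure", "unless-stopped", "always"]


def starts_on_boot(runtime: str, policy: str, exit_code: int = 0) -> bool:
    """Return True if a container running at reboot time came back on boot."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    if policy == "always":
        return True  # both runtimes started the containers
    if runtime == "podman":
        return False  # "no", "on-failure" and "unless-stopped" all stayed down
    if runtime == "docker":
        if policy == "unless-stopped":
            return True  # containers were not manually stopped before reboot
        if policy == "on-failure":
            return exit_code != 0  # gerrit (non zero) restarted, mariadb did not
        return False  # "no" / default
    raise ValueError(f"unknown runtime: {runtime}")


def differing_policies() -> list:
    """Policies where docker and podman disagree for some exit code."""
    return [
        p for p in POLICIES
        if any(
            starts_on_boot("docker", p, code) != starts_on_boot("podman", p, code)
            for code in (0, 1)
        )
    ]


print(differing_policies())  # ['on-failure', 'unless-stopped']
```

Only "no" and "always" behave the same on both runtimes in this model, which is why the discussion turns to using down for containers that should stay stopped.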
clarkbI think I'm still willing to muddle through podman. This is a setback but a relatively minor one compared to the image hosting and speculative image testing problems. Worst case I suspect we could write some monitor/management unit that gives us richer handling of reboots17:11
fungimore consistent means more predictable. i'm in favor of predictability17:15
fricklerclarkb: regarding docker upgrades, yes, the restarts caused by it are a concern. I'm just wondering whether we need to be more on the lookout then for possible security things that could affect us? also doing a pin to make the installed version consistent across our servers could still be worthwhile17:17
fricklerthe thing with stuck unattended upgrades is unrelated, I just noticed the docker thing when trying an "apt upgrade"17:18
clarkbI have tried to do updates for docker things in the past yes.17:19
clarkbmanually I mean17:19
fricklerbut if there are no concerns, I'd try to run "apt install ubuntu-advantage-tools sosreport" on one host and see how we can resolve those being stuck?17:19
clarkbI think doing that is fine. I would try to pick the least impactful affected host17:21
fungiyeah, i've manually upgraded those in the past because of other stuck installs breaking ansible deploy runs17:21
fungii wouldn't be overly concerned, though if i were especially industrious i'd look into forcing them to be uninstalled everywhere possible instead17:22
fungibecause we don't make use of them except on eol distro versions17:23
fricklero.k, sosreport couldn't be upgraded because the new version needs more python3-* deps, we can purge it instead. ubuntu-advantage-tools is pulled in by ubuntu-minimal, so it is more difficult to get rid of. I guess I'll just do the manual upgrades, then17:31
fricklerI have done a focal and a noble node (tracing01 and a mirror), now to find jammy one and then wait for the results tomorrow17:35
corvusclarkb: thanks; i think switching our behavior to "down" things that we don't want to restart is okay and we can proceed with podman.  as long as we know the right thing to do.17:35
corvusclarkb: i will try to manually flip the quay bit and then later update the job for future repos.17:36
corvusokay https://quay.io/organization/opendevmirror looks right now17:40
funginice!17:41
clarkbcorvus: sounds good to both things17:44
clarkbfrickler: I think we have a location in our base server role to purge packages. We should add sosreport to that list if it isn't already there17:44
opendevreviewDr. Jens Harbott proposed opendev/system-config master: Add sosreport to list of absent packages for base server  https://review.opendev.org/c/opendev/system-config/+/93868217:50
fricklerclarkb: ack ^^17:50
opendevreviewJames E. Blair proposed opendev/system-config master: Rename container mirror job  https://review.opendev.org/c/opendev/system-config/+/93868618:11
opendevreviewMerged opendev/system-config master: Remove explicit docker-compose install in nodepool-launcher  https://review.opendev.org/c/opendev/system-config/+/93862018:13
opendevreviewJeremy Stanley proposed opendev/bindep master: Evacuate most metadata out of setup.cfg  https://review.opendev.org/c/opendev/bindep/+/93852018:15
opendevreviewJeremy Stanley proposed opendev/bindep master: Drop support for Python 3.6  https://review.opendev.org/c/opendev/bindep/+/93856818:15
opendevreviewJeremy Stanley proposed opendev/bindep master: Drop requirements.txt  https://review.opendev.org/c/opendev/bindep/+/93857018:15
opendevreviewJames E. Blair proposed opendev/base-jobs master: Add opendev-mirror-container-images job  https://review.opendev.org/c/opendev/base-jobs/+/93868718:20
opendevreviewJames E. Blair proposed opendev/system-config master: Use opendev container mirror base job  https://review.opendev.org/c/opendev/system-config/+/93868918:26
opendevreviewJames E. Blair proposed opendev/system-config master: Mirror jaegertracing  https://review.opendev.org/c/opendev/system-config/+/93869018:29
corvusclarkb: frickler fungi https://review.opendev.org/q/hashtag:opendevmirror  should address the visibility problem18:30
corvusalso, one extra image added at the end.  again, sort of wishy washy on whether it should be in opendev or zuul... :|18:31
opendevreviewJames E. Blair proposed opendev/system-config master: Use opendev container mirror base job  https://review.opendev.org/c/opendev/system-config/+/93868918:33
opendevreviewJames E. Blair proposed opendev/system-config master: Mirror jaegertracing  https://review.opendev.org/c/opendev/system-config/+/93869018:33
clarkbcorvus: does container registry credentials have the apikey in it?18:39
clarkboh yup I just have to map the name to the opendev specific name18:39
clarkbI see it now sorry for the noise18:39
clarkbcorvus: there is a valid linter error in the base-jobs change18:40
clarkbcorvus: and I have a question on https://review.opendev.org/c/opendev/system-config/+/93868918:41
opendevreviewJames E. Blair proposed opendev/base-jobs master: Add opendev-mirror-container-images job  https://review.opendev.org/c/opendev/base-jobs/+/93868718:51
corvusclarkb: thx replied18:52
clarkbthanks +2 from me on the whole stack at this point. The rename to old- change is approved too18:56
opendevreviewMerged opendev/system-config master: Add sosreport to list of absent packages for base server  https://review.opendev.org/c/opendev/system-config/+/93868219:29
opendevreviewMerged opendev/system-config master: Rename container mirror job  https://review.opendev.org/c/opendev/system-config/+/93868619:29
opendevreviewMerged opendev/base-jobs master: Add opendev-mirror-container-images job  https://review.opendev.org/c/opendev/base-jobs/+/93868719:46
opendevreviewMerged opendev/system-config master: Use opendev container mirror base job  https://review.opendev.org/c/opendev/system-config/+/93868920:05
opendevreviewVladimir Kozhukalov proposed openstack/project-config master: Add refs/tags/* acls for openstack-helm projects  https://review.opendev.org/c/openstack/project-config/+/93869820:51
tonybfungi: The last time we held a mediawiki node you added me to the needed groups/roles etc so I could test the spam patrolling process.  Can you make that same change on the live site?22:15
fungitonyb: "the live site" being wiki.openstack.org? sure22:17
fungiwhat's your username there?22:17
tonybTonyBreeds22:18
tonybfungi: Thank you22:18
fungidone, you have the same perms as i do now22:19
fungihttps://wiki.openstack.org/w/index.php?title=Special:RecentChanges&limit=500&hidepatrolled=1&days=3022:19
tonybhuzzah!22:19
fungithat's the url i usually check, should be empty currently22:19
tonybGreat.22:20
tonybI can't say how often but I'll keep an eye on it22:21
fungiit's not critical. i try to check it about once a day when i get to my desk, but sometimes i go a week without getting to it22:29
fungithe goal is that if a spammer does end up creating an account and posting garbage, we'll catch and delete it reasonably quickly before search engines pick it up and block their account so it doesn't get reused22:30
fungiin the past we've had incidents where scammers posted fake support hotline phone numbers in order to get them to show up in search engines, and then we got complaints when their purported victims ended up scammed (though my guess is the "complaints" were actually from the scammers pretending to be victims of their own scams, since they were asking to have their losses refunded)22:32
