Wednesday, 2025-01-08

clarkb	experimentally, with docker compose talking to podman on noble, only restart: always results in containers running post-reboot. "no", "on-failure", and "unless-stopped" all have the same behavior of not starting containers on boot if they were running when a reboot command was issued	00:00
clarkb	I don't know that I'll get to testing the docker-compose with docker behavior this evening, but I should be able to run through that tomorrow morning and we can compare	00:00
clarkb	I'm going to reset the held paste setup back to restart: always and start the containers so that it matches our existing configs	00:01
RM	I am hitting "https://review.opendev.org/SignInFailure,SIGN_IN,Contact+site+administrator" while trying to log in to open; it was working fine before this morning	00:06
RM	sorry opendev* I mean	00:07
RM	can the administrator look into this issue?	00:07
clarkb	RM: it looks like your email address is in use by another account. This implies that when you logged in previously you did so with one openid, and since then something has changed for you on the Ubuntu One side of things to give you a new openid with the same email address	00:10
clarkb	RM: gerrit wants to create a new account for the new openid but fails to do so because an email address cannot be shared by multiple accounts	00:11
clarkb	the easiest solution to this is to login with your prior ubuntu one openid if you are able to	00:11
clarkb	if you are not able to do so, then we may have to do some account surgery and essentially retire your old account and delete the conflicting email address from that account so that you can log in with the new openid and get a new gerrit account	00:12
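To make the account conflict concrete, here is a toy model in Python of the rule clarkb describes: an unknown openid makes Gerrit try to create a new account, and that fails when the email address already belongs to an existing account. This is a hypothetical sketch (the class, method names, and example openids are invented for illustration), not Gerrit's actual implementation.

```python
class EmailInUseError(Exception):
    """Raised when a new account would reuse an email owned by another account."""


class AccountRegistry:
    """Hypothetical toy model of the rule described above, not Gerrit code."""

    def __init__(self):
        self.account_by_openid = {}
        self.account_by_email = {}
        self.next_id = 1

    def login(self, openid, email):
        # A known openid simply logs in to the account it is attached to.
        if openid in self.account_by_openid:
            return self.account_by_openid[openid]
        # An unknown openid means a new account must be created, which fails
        # if the email address is already registered on another account.
        if email in self.account_by_email:
            raise EmailInUseError(email)
        account_id = self.next_id
        self.next_id += 1
        self.account_by_openid[openid] = account_id
        self.account_by_email[email] = account_id
        return account_id

    def remove_email(self, email):
        # The "account surgery" fallback: delete the conflicting address from
        # the old account so that a new account can claim it.
        self.account_by_email.pop(email, None)


registry = AccountRegistry()
old_id = registry.login("openid-old", "user@example.org")
try:
    registry.login("openid-new", "user@example.org")  # new openid, same email
except EmailInUseError:
    print("SignInFailure: email already in use")
```

Logging in with the old openid keeps working, which is the easy fix RM ended up using.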
RM	Thank you, it is working now. I used the older account.	00:25
clarkb	RM: great. If you look at https://review.opendev.org/settings/#Identities and https://review.opendev.org/settings/#Profile, that should give you a complete list of the emails that can't be associated with another account	00:26
RM	Thank you, this helps	00:32
*** janders3 is now known as janders		03:16
frickler	infra-root: I would like to get some movement into the stuck unattended-upgrades process, causing us to receive daily mails from almost all hosts. this is a typical example https://paste.opendev.org/show/bgP5k3wLnWDm4V08p0qz/	07:59
frickler	any objection to me doing the updates manually on one host in order to find out about the blockers, and then looking at whether to automate something or just do it manually on all hosts as a one-off?	07:59
frickler	also I'm not sure about the docker related updates. do we just install the then-current version when creating a host and those are not touched by unattended-upgrades? are we fine with that?	08:01
*** thuvh1 is now known as thuvh		09:10
tonyb	frickler: I suspect that docker isn't being updated because it doesn't match any of the Allowed-Origins, which I think is our bug to correct.	09:46
tonyb	frickler: It looks to me like policy ubuntu-pro-client is a new package, and that requires config and that's the cause of all the other things being on hold (ignoring the docker issue)	09:48
tonyb	frickler: Looking at: https://download.docker.com/linux/ubuntu/dists/jammy/Release I think we should add origin=Docker, suite=jammy to our nodes, probably as part of the install-docker role	09:50
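A simplified sketch in Python of why the docker-ce packages are skipped: in unattended-upgrades, each Allowed-Origins entry is an "Origin:Suite" pair, and only packages from a repository whose Release file matches an entry are considered. The allowed list below is an assumption modeled on Ubuntu's stock jammy defaults (not OpenDev's actual configuration), and the real tool also supports richer Origins-Pattern matching. The Docker values are the Origin and Suite tonyb read from the Docker Release file.

```python
# ASSUMPTION: a subset of Ubuntu's stock Allowed-Origins for jammy, not
# OpenDev's real configuration.
ALLOWED_ORIGINS = ["Ubuntu:jammy", "Ubuntu:jammy-security"]


def origin_allowed(release_fields, allowed=ALLOWED_ORIGINS):
    """Simplified check: does a repo's Origin:Suite appear in the allow list?"""
    key = f"{release_fields['Origin']}:{release_fields['Suite']}"
    return key in allowed


ubuntu_security = {"Origin": "Ubuntu", "Suite": "jammy-security"}
# Values as read from https://download.docker.com/linux/ubuntu/dists/jammy/Release
docker_repo = {"Origin": "Docker", "Suite": "jammy"}
```

With this list, origin_allowed(docker_repo) is False, so docker-ce is never offered for an unattended upgrade; appending "Docker:jammy" would change that, which is what the later discussion argues against.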
frickler	tonyb: did you intend to upload a new PS for https://review.opendev.org/c/opendev/system-config/+/921321 ? seems a bit weird to me to mark comments as done when in fact nothing has changed yet	10:39
frickler	config-core: two small reviews https://review.opendev.org/c/openstack/project-config/+/938537 https://review.opendev.org/c/openstack/project-config/+/938180	10:42
frickler	tonyb: regarding docker, the question for me is do we really want unattended upgrades for that? or maybe rather pin to some specific version explicitly, avoiding also unintentional upgrades, and just update via config changes when we deem those necessary?	10:45
tonyb	frickler: I will, but the changes to the second patch in that series are taking longer than I expected :/	10:47
tonyb	frickler: I think we do want it handled automatically, but there are years of context that I don't have	10:50
frickler	corvus: seems the image mirror jobs ran fine tonight, but I cannot see the repos. iirc from kolla, newly created repos need some more clicks in order to be made public?	13:11
opendevreview	Lajos Katona proposed openstack/project-config master: Remove taas-tempest-plugin and taas-dashboard jobs https://review.opendev.org/c/openstack/project-config/+/938665	14:30
clarkb	frickler: tonyb: we cannot safely upgrade docker automatically; doing so often (always?) requires restarting the containers	16:01
clarkb	it's less about the version (I can't recall an instance where updating the version presented problems) and more about the upgrade process itself	16:02
fungi	yeah, the package upgrade restarts dockerd which downs and ups all the containers	16:02
clarkb	but also unattended upgrades shouldn't touch docker so that shouldn't create any problems for us? the issue with things getting stuck may be separate?	16:03
clarkb	I think the new package thing is likely the problem	16:03
fungi	right, the docker-ce packages aren't considered for upgrade by unattended-upgrades with our current configuration	16:04
clarkb	re the opendevmirror yes I think you have to manually toggle things to public. I may have encoded that into our regular push roles	16:05
clarkb	corvus: frickler ensure-quay-repo in zuul-jobs takes care of that I think so we may want to modify the job to run that first?	16:07
clarkb	it's the "visibility" attribute in there	16:08
clarkb	I suspect if we add that to the jobs it will automatically update things for us for all the images on the next run	16:09
clarkb	but it requires an api token, which is maybe different from the docker api credential?	16:10
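A minimal sketch of the kind of API call involved, assuming Quay's changevisibility REST endpoint (a POST to /api/v1/repository/<namespace>/<name>/changevisibility with a JSON "visibility" field) and a Quay OAuth token, which, as clarkb suspects, is separate from the registry push credential. The function name, repository, and token are placeholders, and this is not the ensure-quay-repo role's code; the request is only built here, not sent.

```python
import json
import urllib.request

QUAY_API = "https://quay.io/api/v1"


def visibility_request(repository, token, visibility="public"):
    """Build (but do not send) a Quay changevisibility request.

    repository is "namespace/name"; token is a placeholder for a Quay OAuth
    API token with admin rights on the repository.
    """
    if visibility not in ("public", "private"):
        raise ValueError(f"unsupported visibility: {visibility}")
    body = json.dumps({"visibility": visibility}).encode()
    return urllib.request.Request(
        f"{QUAY_API}/repository/{repository}/changevisibility",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical repository name; urllib.request.urlopen(req) would send it.
req = visibility_request("opendevmirror/example-image", "PLACEHOLDER-TOKEN")
```

Running the same call on every job run is harmless for repos that are already public, which fits the idea of letting the job fix all images on its next run.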
clarkb	corvus: frickler: do we want to update the job and see that it flips things to public properly or manually set those images to public for now and do the job update as part of followup image mirroring?	16:18
clarkb	I personally really need to focus on some annual paperwork today. I'm going to do docker restart testing shortly; if the held node is good, I may need to tune out IRC for a bit to focus on that, so I'll defer to others	16:19
clarkb	Oh, I also brought up the Gerrit H2 cache problems yesterday and there is some feedback from upstream. All H2 db access is currently serialized in Gerrit due to the version of H2. This can explain why db operations on startup hold everything up. They are working on updating the H2 version, which would remove this restriction. There is some thought that it wasn't the pruning action that	16:22
clarkb	was slow but instead the creation of bloom filters. There is work to add better indexes to the H2 dbs to improve the general performance of operations like this. And finally, reindexing writes to the caches, which can cause them to explode in size because reindexing looks at all changes rather than just the working set (so you don't really get a cache behavior but instead a total	16:22
clarkb	problem space db), and there is an open change to make that optional	16:22
clarkb	the suggestion to us was that we try and monitor the problem more closely to characterise it a bit better. Perhaps capturing jstacks again if things are slow	16:23
clarkb	but also it seems like there may be some broader effort to improve H2 usage, but not necessarily to fix the specific problem of the dbs getting really big	16:23
clarkb	for docker-compose and docker managed containers experiencing a reboot while running, the default of no policy and a policy of "no" both don't restart the containers on boot. This matches the podman case. The on-failure policy is a weird one: the gerrit container restarted but mariadb did not. If you look at the "no" cases, this is because gerrit was exiting non-zero but mariadb was not, so	16:40
clarkb	it treats gerrit as having failed and restarts it. This differs from the podman behavior. unless-stopped resulted in containers being started on boot (expected since they weren't manually stopped prior to the reboot); this also differs from the podman behavior. Then finally, always resulted in containers starting up. This matches the podman behavior.	16:40
clarkb	this means that both unless-stopped and on-failure differ. The on-failure behavior is inconsistent enough even with docker that my initial impression is that we probably shouldn't rely on it anyway?	16:40
clarkb	On paper it seems like a good idea but if you can restart a container that depends on another container but not the one it depends on due to technicalities in the rules then maybe it isn't a good choice anyway.	16:41
clarkb	That leaves us with unless-stopped which I guess we can approximate with down instead of stop?	16:41
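The held-node results above can be summarized as a small lookup in Python. The table only encodes what was observed for containers that were running when reboot was issued; the exited_nonzero flag stands in for the gerrit/mariadb exit-status difference, and the removed flag models the "down instead of stop" idea, relying on compose's down removing containers rather than merely stopping them. The names are illustrative and not part of any tool.

```python
# Observed in the held-node tests: does a container that was running when
# reboot was issued come back on boot?
# "docker" = docker-compose with docker (focal node)
# "podman" = docker compose talking to podman (noble node)
OBSERVED = {
    ("podman", "no"): False,
    ("podman", "on-failure"): False,
    ("podman", "unless-stopped"): False,
    ("podman", "always"): True,
    ("docker", "no"): False,
    ("docker", "unless-stopped"): True,
    ("docker", "always"): True,
}
POLICIES = ("no", "on-failure", "unless-stopped", "always")


def starts_on_boot(runtime, policy, exited_nonzero=False, removed=False):
    if removed:
        # down deletes the container, so no restart policy can revive it.
        return False
    if (runtime, policy) == ("docker", "on-failure"):
        # Docker restarted gerrit (non-zero exit at shutdown) but not mariadb.
        return exited_nonzero
    return OBSERVED[(runtime, policy)]


def differing_policies(exited_nonzero=False):
    """Policies whose on-boot behavior differs between docker and podman."""
    return [
        p for p in POLICIES
        if starts_on_boot("docker", p, exited_nonzero)
        != starts_on_boot("podman", p, exited_nonzero)
    ]
```

Under this model only "always" behaves the same on both runtimes regardless of exit status, and pairing it with down for services that should stay off gives roughly the unless-stopped behavior.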
clarkb	I think it is worth all of us considering what that means for our services going forward and if we're happy with the less rich but maybe more consistent behavior of podman?	16:42
clarkb	cc infra-root and corvus in particular	16:42
clarkb	the test nodes are also up if anyone wants to check themselves. The noble node with podman is 158.69.66.158 and focal node with docker is 200.225.47.24	16:44
clarkb	I think I'm still willing to muddle through podman. This is a setback, but a relatively minor one compared to the image hosting and speculative image testing problems. Worst case, I suspect we could write some monitor/management unit that gives us richer handling of reboots	17:11
fungi	more consistent means more predictable. i'm in favor of predictability	17:15
frickler	clarkb: regarding docker upgrades, yes, the restarts caused by it are a concern. I'm just wondering whether we need to be more on the lookout then for possible security things that could affect us? also doing a pin to make the installed version consistent across our servers could still be worthwhile	17:17
frickler	the thing with stuck unattended upgrades is unrelated, I just noticed the docker thing when trying an "apt upgrade"	17:18
clarkb	I have tried to do updates for docker things in the past yes.	17:19
clarkb	manually I mean	17:19
frickler	but if there are no concerns, I'd try to run "apt install ubuntu-advantage-tools sosreport" on one host and see how we can resolve those being stuck?	17:19
clarkb	I think doing that is fine. I would try to pick the least impactful affected host	17:21
fungi	yeah, i've manually upgraded those in the past because of other stuck installs breaking ansible deploy runs	17:21
fungi	i wouldn't be overly concerned, though if i were especially industrious i'd look into forcing them to be uninstalled everywhere possible instead	17:22
fungi	because we don't make use of them except on eol distro versions	17:23
frickler	o.k, sosreport couldn't be upgraded because the new version needs more python3-* deps, we can purge it instead. ubuntu-advantage-tools is pulled in by ubuntu-minimal, so it is more difficult to get rid of. I guess I'll just do the manual upgrades, then	17:31
frickler	I have done a focal and a noble node (tracing01 and a mirror), now to find a jammy one and then wait for the results tomorrow	17:35
corvus	clarkb: thanks; i think switching our behavior to "down" things that we don't want to restart is okay and we can proceed with podman. as long as we know the right thing to do.	17:35
corvus	clarkb: i will try to manually flip the quay bit and then later update the job for future repos.	17:36
corvus	okay https://quay.io/organization/opendevmirror looks right now	17:40
fungi	nice!	17:41
clarkb	corvus: sounds good to both things	17:44
clarkb	frickler: I think we have a location in our base server role to purge packages. We should add sosreport to that list if it isn't already there	17:44
opendevreview	Dr. Jens Harbott proposed opendev/system-config master: Add sosreport to list of absent packages for base server https://review.opendev.org/c/opendev/system-config/+/938682	17:50
frickler	clarkb: ack ^^	17:50
opendevreview	James E. Blair proposed opendev/system-config master: Rename container mirror job https://review.opendev.org/c/opendev/system-config/+/938686	18:11
opendevreview	Merged opendev/system-config master: Remove explicit docker-compose install in nodepool-launcher https://review.opendev.org/c/opendev/system-config/+/938620	18:13
opendevreview	Jeremy Stanley proposed opendev/bindep master: Evacuate most metadata out of setup.cfg https://review.opendev.org/c/opendev/bindep/+/938520	18:15
opendevreview	Jeremy Stanley proposed opendev/bindep master: Drop support for Python 3.6 https://review.opendev.org/c/opendev/bindep/+/938568	18:15
opendevreview	Jeremy Stanley proposed opendev/bindep master: Drop requirements.txt https://review.opendev.org/c/opendev/bindep/+/938570	18:15
opendevreview	James E. Blair proposed opendev/base-jobs master: Add opendev-mirror-container-images job https://review.opendev.org/c/opendev/base-jobs/+/938687	18:20
opendevreview	James E. Blair proposed opendev/system-config master: Use opendev container mirror base job https://review.opendev.org/c/opendev/system-config/+/938689	18:26
opendevreview	James E. Blair proposed opendev/system-config master: Mirror jaegertracing https://review.opendev.org/c/opendev/system-config/+/938690	18:29
corvus	clarkb: frickler fungi https://review.opendev.org/q/hashtag:opendevmirror should address the visibility problem	18:30
corvus	also, one extra image added at the end. again, sort of wishy-washy on whether it should be in opendev or zuul... :\|	18:31
opendevreview	James E. Blair proposed opendev/system-config master: Use opendev container mirror base job https://review.opendev.org/c/opendev/system-config/+/938689	18:33
opendevreview	James E. Blair proposed opendev/system-config master: Mirror jaegertracing https://review.opendev.org/c/opendev/system-config/+/938690	18:33
clarkb	corvus: do the container registry credentials have the apikey in them?	18:39
clarkb	oh yup I just have to map the name to the opendev specific name	18:39
clarkb	I see it now sorry for the noise	18:39
clarkb	corvus: there is a valid linter error in the base-jobs change	18:40
clarkb	corvus: and I have a question on https://review.opendev.org/c/opendev/system-config/+/938689	18:41
opendevreview	James E. Blair proposed opendev/base-jobs master: Add opendev-mirror-container-images job https://review.opendev.org/c/opendev/base-jobs/+/938687	18:51
corvus	clarkb: thx replied	18:52
clarkb	thanks +2 from me on the whole stack at this point. The rename to old- change is approved too	18:56
opendevreview	Merged opendev/system-config master: Add sosreport to list of absent packages for base server https://review.opendev.org/c/opendev/system-config/+/938682	19:29
opendevreview	Merged opendev/system-config master: Rename container mirror job https://review.opendev.org/c/opendev/system-config/+/938686	19:29
opendevreview	Merged opendev/base-jobs master: Add opendev-mirror-container-images job https://review.opendev.org/c/opendev/base-jobs/+/938687	19:46
opendevreview	Merged opendev/system-config master: Use opendev container mirror base job https://review.opendev.org/c/opendev/system-config/+/938689	20:05
opendevreview	Vladimir Kozhukalov proposed openstack/project-config master: Add refs/tags/* acls for openstack-helm projects https://review.opendev.org/c/openstack/project-config/+/938698	20:51
tonyb	fungi: The last time we held a mediawiki node you added me to the needed groups/roles etc so I could test the spam patrolling process. Can you make that same change on the live site?	22:15
fungi	tonyb: "the live site" being wiki.openstack.org? sure	22:17
fungi	what's your username there?	22:17
tonyb	TonyBreeds	22:18
tonyb	fungi: Thank you	22:18
fungi	done, you have the same perms as i do now	22:19
fungi	https://wiki.openstack.org/w/index.php?title=Special:RecentChanges&limit=500&hidepatrolled=1&days=30	22:19
tonyb	huzzah!	22:19
fungi	that's the url i usually check, should be empty currently	22:19
tonyb	Great.	22:20
tonyb	I can't say how often but I'll keep an eye on it	22:21
fungi	it's not critical. i try to check it about once a day when i get to my desk, but sometimes i go a week without getting to it	22:29
fungi	the goal is that if a spammer does end up creating an account and posting garbage, we'll catch and delete it reasonably quickly before search engines pick it up and block their account so it doesn't get reused	22:30
fungi	in the past we've had incidents where scammers posted fake support hotline phone numbers in order to get them to show up in search engines, and then we got complaints when their purported victims ended up scammed (though my guess is the "complaints" were actually from the scammers pretending to be victims of their own scams, since they were asking to have their losses refunded)	22:32

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!