clarkb | Experimentally, with docker compose talking to podman on noble, only restart: always results in containers running post-reboot. "no", "on-failure" and "unless-stopped" all have the same behavior of not starting containers on boot if they were running when a reboot command was issued | 00:00 |
---|---|---|
clarkb | I don't know that I'll get to testing the docker-compose with docker behavior this evening, but I should be able to run through that tomorrow morning and we can compare | 00:00 |
clarkb | I'm going to reset the held paste setup back to restart: always and start containers so that it matches our existing configs | 00:01 |
RM | I am hitting "https://review.opendev.org/SignInFailure,SIGN_IN,Contact+site+administrator" while trying to log in to open; it was working fine before this morning | 00:06 |
RM | sorry opendev* I mean | 00:07 |
RM | can the administrator look into this issue? | 00:07 |
clarkb | RM: it looks like your email address is in use by another account. This implies that when you logged in previously you did so with one openid, and since then something has changed for you on the Ubuntu One side of things to give you a new openid with the same email address | 00:10 |
clarkb | RM: gerrit wants to create a new account for the new openid but fails to do so because an email address cannot be shared by multiple accounts | 00:11 |
clarkb | the easiest solution to this is to login with your prior ubuntu one openid if you are able to | 00:11 |
clarkb | if you are not able to do so, then we may have to do some account surgery and essentially retire your old account and delete the conflicting email address from it so that you can log in with the new openid and get a new gerrit account | 00:12 |
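A minimal Python sketch of the constraint described above, assuming a simplified in-memory store: a known openid logs into its existing account, a new openid needs a new account, and creating that account fails if the email is already owned. `AccountStore`, `login`, `retire_email` and the example address are hypothetical, not Gerrit's API or data.

```python
class EmailConflict(Exception):
    """Raised when a new account would reuse an email owned by another account."""


class AccountStore:
    """Hypothetical toy model of the one-account-per-email rule."""

    def __init__(self):
        self.next_id = 1000000
        self.email_owner = {}   # email -> account id
        self.identities = {}    # openid -> account id

    def login(self, openid, email):
        # A known openid logs straight into its existing account.
        if openid in self.identities:
            return self.identities[openid]
        # A new openid means creating a new account, which needs a free email.
        if email in self.email_owner:
            raise EmailConflict(
                f"{email} already belongs to account {self.email_owner[email]}")
        account_id = self.next_id
        self.next_id += 1
        self.email_owner[email] = account_id
        self.identities[openid] = account_id
        return account_id

    def retire_email(self, email):
        # "Account surgery": free the email so a new account can claim it.
        del self.email_owner[email]


store = AccountStore()
old = store.login("old-openid", "rm@example.org")
try:
    store.login("new-openid", "rm@example.org")     # the SignInFailure case
    conflict = False
except EmailConflict:
    conflict = True
same = store.login("old-openid", "rm@example.org")  # the easy fix: old openid
store.retire_email("rm@example.org")
new = store.login("new-openid", "rm@example.org")   # works after surgery
```

Logging in with the old openid never touches account creation, which is why it sidesteps the conflict entirely.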
RM | Thank you, it is working now. I used the older account. | 00:25 |
clarkb | RM: great. If you look at https://review.opendev.org/settings/#Identities and https://review.opendev.org/settings/#Profile, that should give you a complete list of emails that can't be associated with another account | 00:26 |
RM | Thank you, this helps | 00:32 |
*** janders3 is now known as janders | 03:16 | |
frickler | infra-root: I would like to get some movement into the stuck unattended-upgrades process, causing us to receive daily mails from almost all hosts. this is a typical example https://paste.opendev.org/show/bgP5k3wLnWDm4V08p0qz/ | 07:59 |
frickler | any objection to me doing the updates manually on one host in order to find out about the blockers, and then looking at whether to automate something or just do it manually on all hosts as a one-off? | 07:59 |
frickler | also I'm not sure about the docker related updates. do we just install the then-current version when creating a host and those are not touched by unattended-upgrades? are we fine with that? | 08:01 |
*** thuvh1 is now known as thuvh | 09:10 | |
tonyb | frickler: I suspect that docker isn't being updated as it doesn't match any of the Allowed-Origins, which I think is our bug to correct. | 09:46 |
tonyb | frickler: It looks to me like policy ubuntu-pro-client is a new package, and that requires config and that's the cause of all the other things being on hold (ignoring the docker issue) | 09:48 |
tonyb | frickler: Looking at: https://download.docker.com/linux/ubuntu/dists/jammy/Release I think we should add origin=Docker, suite=jammy to our nodes, probably as part of the install-docker role | 09:50 |
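A simplified Python model of the origin matching being discussed, assuming exact matches on just `origin` and `suite` (real unattended-upgrades supports more keys and pattern forms). The "current" pattern list is hypothetical, not our actual config; the sketch only shows the matching mechanics, not whether adding Docker is desirable.

```python
def parse_pattern(pattern):
    """Split 'origin=Docker,suite=jammy' into {'origin': 'Docker', 'suite': 'jammy'}."""
    fields = {}
    for part in pattern.split(","):
        key, _, value = part.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def is_allowed(release, patterns):
    """True if every key in at least one pattern exactly matches the release."""
    for pattern in patterns:
        wanted = parse_pattern(pattern)
        if all(release.get(key) == value for key, value in wanted.items()):
            return True
    return False


# Fields as they would be read from each repository's Release file
# (keys lower-cased here for simplicity).
ubuntu_security = {"origin": "Ubuntu", "suite": "jammy-security"}
docker_ce = {"origin": "Docker", "suite": "jammy"}

current = ["origin=Ubuntu,suite=jammy-security"]      # hypothetical existing list
proposed = current + ["origin=Docker,suite=jammy"]    # the suggested addition
```

With only the Ubuntu pattern, `is_allowed(docker_ce, current)` is false, which is the "doesn't match any of the Allowed-Origins" situation; adding the Docker pattern makes it true.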
frickler | tonyb: did you intend to upload a new PS for https://review.opendev.org/c/opendev/system-config/+/921321 ? seems a bit weird to me to mark comments as done when in fact nothing has changed yet | 10:39 |
frickler | config-core: two small reviews https://review.opendev.org/c/openstack/project-config/+/938537 https://review.opendev.org/c/openstack/project-config/+/938180 | 10:42 |
frickler | tonyb: regarding docker, the question for me is do we really want unattended upgrades for that? or maybe rather pin to some specific version explicitly, avoiding also unintentional upgrades, and just update via config changes when we deem those necessary? | 10:45 |
tonyb | frickler: I will, but the changes to the second patch in that series are taking longer than I expected :/ | 10:47 |
tonyb | frickler: I think we do want it handled automatically, but there are years of context that I don't have | 10:50 |
frickler | corvus: seems the image mirror jobs ran fine tonight, but I cannot see the repos. iirc from kolla, newly created repos need some more clicks in order to be made public? | 13:11 |
opendevreview | Lajos Katona proposed openstack/project-config master: Remove taas-tempest-plugin and taas-dashboard jobs https://review.opendev.org/c/openstack/project-config/+/938665 | 14:30 |
clarkb | frickler: tonyb we cannot safely upgrade docker automatically; doing so often (always?) requires restarting the containers | 16:01 |
clarkb | it's less about the version (I can't recall an instance where updating the version presented problems) and more about the upgrade process itself | 16:02 |
fungi | yeah, the package upgrade restarts dockerd which downs and ups all the containers | 16:02 |
clarkb | but also unattended upgrades shouldn't touch docker so that shouldn't create any problems for us? the issue with things getting stuck may be separate? | 16:03 |
clarkb | I think the new package thing is likely the problem | 16:03 |
fungi | right, the docker-ce packages aren't considered for upgrade by unattended-upgrades with our current configuration | 16:04 |
clarkb | re the opendevmirror yes I think you have to manually toggle things to public. I may have encoded that into our regular push roles | 16:05 |
clarkb | corvus: frickler ensure-quay-repo in zuul-jobs takes care of that I think so we may want to modify the job to run that first? | 16:07 |
clarkb | it's the "visibility" attribute in there | 16:08 |
clarkb | I suspect if we add that to the jobs it will automatically update things for us for all the images on the next run | 16:09 |
clarkb | but it requires an api token which is maybe different to the docker api credential? | 16:10 |
clarkb | corvus: frickler: do we want to update the job and see that it flips things to public properly or manually set those images to public for now and do the job update as part of followup image mirroring? | 16:18 |
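A minimal sketch of flipping a repository to public through Quay's REST API instead of the web UI, assuming the `POST /api/v1/repository/{namespace}/{repo}/changevisibility` endpoint and an OAuth bearer token (consistent with the point that the API token differs from the docker push credential). It only builds the request; `some-image` and `TOKEN` are placeholders, and this is not necessarily how `ensure-quay-repo` does it.

```python
import json
import urllib.request

QUAY_API = "https://quay.io/api/v1"


def visibility_request(namespace, repo, token, visibility="public"):
    """Build (but do not send) a request changing a Quay repo's visibility.

    Assumes the changevisibility endpoint and a bearer token; both are
    assumptions for illustration.
    """
    if visibility not in ("public", "private"):
        raise ValueError("visibility must be 'public' or 'private'")
    url = f"{QUAY_API}/repository/{namespace}/{repo}/changevisibility"
    body = json.dumps({"visibility": visibility}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder repo and token; urllib.request.urlopen(req) would send it.
req = visibility_request("opendevmirror", "some-image", "TOKEN")
```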
clarkb | I personally really need to focus on some annual paperwork stuff today. I'm going to do docker restart testing shortly; if the held node is good, then I may need to tune out IRC for a bit to focus on that, so I'll defer to others | 16:19 |
clarkb | Oh I also brought up the Gerrit H2 cache problems yesterday and there is some feedback from upstream. All h2 db access is currently serialized in gerrit due to the version of h2. This can explain why db operations on startup hold everything up. They are working on updating the h2 version which would remove this restriction. There is some thought that it wasn't the pruning action that | 16:22 |
clarkb | was slow, but instead the creation of bloom filters. There is work to add better indexes to the h2 dbs to improve the general performance of operations like this. And finally, reindexing writes to the caches, which can cause them to explode in size because reindexing looks at all changes rather than just the working set (so you don't really get a cache behavior but instead a total | 16:22 |
clarkb | problem space db), and there is an open change to make that optional | 16:22 |
clarkb | the suggestion to us was that we try and monitor the problem more closely to characterise it a bit better. Perhaps capturing jstacks again if things are slow | 16:23 |
clarkb | but also it seems like there may be some broader effort to improve h2 usage, but not necessarily to fix the specific problem of the dbs getting really big | 16:23 |
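A toy Python illustration of why serialized database access can hold everything up at startup: a single global `threading.Lock` stands in for the serialized h2 access (an assumption for illustration, not Gerrit code), so a fast lookup must wait for a slow cache operation that already holds the lock. The operation names are hypothetical.

```python
import threading
import time

db_lock = threading.Lock()   # stand-in for serialized h2 access
completed = []
slow_started = threading.Event()


def db_op(name, seconds, started=None):
    with db_lock:
        if started is not None:
            started.set()      # signal that the lock is now held
        time.sleep(seconds)    # pretend to do cache work
        completed.append(name)


slow = threading.Thread(target=db_op, args=("slow-cache-prune", 0.2, slow_started))
slow.start()
slow_started.wait()            # ensure the slow op holds the lock first

fast = threading.Thread(target=db_op, args=("fast-lookup", 0.0))
start = time.monotonic()
fast.start()
fast.join()
waited = time.monotonic() - start   # the fast op's time is mostly waiting
slow.join()

print(completed)  # ['slow-cache-prune', 'fast-lookup']
```

A newer h2 without the serialization would let the fast lookup proceed concurrently instead of inheriting the slow operation's latency.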
clarkb | for docker-compose and docker managed containers experiencing a reboot while running, the default of no policy and a policy of "no" both don't restart the containers on boot. This matches the podman case. The on-failure policy is a weird one: the gerrit container restarted but mariadb did not. If you look at the "no" cases, this is because gerrit was exiting non-zero but mariadb was not, so | 16:40 |
clarkb | it treats gerrit as having failed so it restarts it. This differs from the podman behavior. unless-stopped resulted in containers being started on boot (expected since they weren't manually stopped prior to the reboot) this differs from the podman behavior. Then finally always resulted in containers starting up. This matches podman behavior. | 16:40 |
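The reboot results reported above can be summarized as a small Python decision function. It encodes only these test observations for a container that was running when the reboot was issued (not docker or podman source); `exited_nonzero` is whether the container exited non-zero during shutdown, as gerrit did and mariadb did not. The podman default (no policy) case was not reported, so treating it like "no" is an assumption.

```python
def restarts_on_boot(engine, policy, exited_nonzero):
    """Observed reboot behavior; policy None means no restart policy set."""
    if policy == "always":
        return True                 # both engines restarted these
    if engine == "podman":
        # "no", "on-failure" and "unless-stopped" all stayed down;
        # the default (None) is assumed to behave the same way.
        return False
    if policy == "unless-stopped":
        return True                 # not manually stopped before the reboot
    if policy == "on-failure":
        return exited_nonzero       # why gerrit came back but mariadb did not
    return False                    # docker default and "no"
```

Laid out this way, the two divergences are easy to see: `unless-stopped` and `on-failure` are the only policies whose answer depends on the engine.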
clarkb | this means that both unless-stopped and on-failure differ. The on-failure behavior is inconsistent enough with docker that my initial impression is that we probably shouldn't rely on that anyway? | 16:40 |
clarkb | On paper it seems like a good idea but if you can restart a container that depends on another container but not the one it depends on due to technicalities in the rules then maybe it isn't a good choice anyway. | 16:41 |
clarkb | That leaves us with unless-stopped which I guess we can approximate with down instead of stop? | 16:41 |
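A toy Python model of why `down` can approximate `unless-stopped` under podman with `restart: always`. It assumes, as the discussion implies but does not verify here, that on boot every container that still exists with policy "always" gets started. `stop` leaves the container in place, while `down` removes it, so only `down` keeps it off across a reboot. `Host` and the service names are hypothetical.

```python
class Host:
    """Hypothetical toy host tracking containers and restart policies."""

    def __init__(self):
        self.containers = {}  # name -> {"policy": str, "running": bool}

    def up(self, name, policy="always"):
        self.containers[name] = {"policy": policy, "running": True}

    def stop(self, name):
        # Like `docker compose stop`: the container remains, not running.
        self.containers[name]["running"] = False

    def down(self, name):
        # Like `docker compose down`: the container is removed entirely.
        self.containers.pop(name, None)

    def reboot(self):
        # Assumed boot behavior: start every existing "always" container.
        for state in self.containers.values():
            state["running"] = state["policy"] == "always"

    def running(self):
        return sorted(n for n, s in self.containers.items() if s["running"])


host = Host()
for name in ("gerrit", "stopped-service", "downed-service"):
    host.up(name)
host.stop("stopped-service")   # comes back after the reboot
host.down("downed-service")    # stays gone after the reboot
host.reboot()
```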
clarkb | I think it is worth all of us considering what that means for our services going forward and if we're happy with the less rich but maybe more consistent behavior of podman? | 16:42 |
clarkb | cc infra-root and corvus in particular | 16:42 |
clarkb | the test nodes are also up if anyone wants to check themselves. The noble node with podman is 158.69.66.158 and focal node with docker is 200.225.47.24 | 16:44 |
clarkb | I think I'm still willing to muddle through podman. This is a setback but a relatively minor one compared to the image hosting and speculative image testing problems. Worst case I suspect we could write some monitor/management unit that gives us richer handling of reboots | 17:11 |
fungi | more consistent means more predictable. i'm in favor of predictability | 17:15 |
frickler | clarkb: regarding docker upgrades, yes, the restarts caused by it are a concern. I'm just wondering whether we need to be more on the lookout then for possible security things that could affect us? also doing a pin to make the installed version consistent across our servers could still be worthwhile | 17:17 |
frickler | the thing with stuck unattended upgrades is unrelated, I just noticed the docker thing when trying an "apt upgrade" | 17:18 |
clarkb | I have tried to do updates for docker things in the past yes. | 17:19 |
clarkb | manually I mean | 17:19 |
frickler | but if there are no concerns, I'd try to run "apt install ubuntu-advantage-tools sosreport" on one host and see how we can resolve those being stuck? | 17:19 |
clarkb | I think doing that is fine. I would try to pick the least impactful affected host | 17:21 |
fungi | yeah, i've manually upgraded those in the past because of other stuck installs breaking ansible deploy runs | 17:21 |
fungi | i wouldn't be overly concerned, though if i were especially industrious i'd look into forcing them to be uninstalled everywhere possible instead | 17:22 |
fungi | because we don't make use of them except on eol distro versions | 17:23 |
frickler | o.k, sosreport couldn't be upgraded because the new version needs more python3-* deps, we can purge it instead. ubuntu-advantage-tools is pulled in by ubuntu-minimal, so it is more difficult to get rid of. I guess I'll just do the manual upgrades, then | 17:31 |
frickler | I have done a focal and a noble node (tracing01 and a mirror), now to find a jammy one and then wait for the results tomorrow | 17:35 |
corvus | clarkb: thanks; i think switching our behavior to "down" things that we don't want to restart is okay and we can proceed with podman. as long as we know the right thing to do. | 17:35 |
corvus | clarkb: i will try to manually flip the quay bit and then later update the job for future repos. | 17:36 |
corvus | okay https://quay.io/organization/opendevmirror looks right now | 17:40 |
fungi | nice! | 17:41 |
clarkb | corvus: sounds good to both things | 17:44 |
clarkb | frickler: I think we have a location in our base server role to purge packages. We should add sosreport to that list if it isn't already there | 17:44 |
opendevreview | Dr. Jens Harbott proposed opendev/system-config master: Add sosreport to list of absent packages for base server https://review.opendev.org/c/opendev/system-config/+/938682 | 17:50 |
frickler | clarkb: ack ^^ | 17:50 |
opendevreview | James E. Blair proposed opendev/system-config master: Rename container mirror job https://review.opendev.org/c/opendev/system-config/+/938686 | 18:11 |
opendevreview | Merged opendev/system-config master: Remove explicit docker-compose install in nodepool-launcher https://review.opendev.org/c/opendev/system-config/+/938620 | 18:13 |
opendevreview | Jeremy Stanley proposed opendev/bindep master: Evacuate most metadata out of setup.cfg https://review.opendev.org/c/opendev/bindep/+/938520 | 18:15 |
opendevreview | Jeremy Stanley proposed opendev/bindep master: Drop support for Python 3.6 https://review.opendev.org/c/opendev/bindep/+/938568 | 18:15 |
opendevreview | Jeremy Stanley proposed opendev/bindep master: Drop requirements.txt https://review.opendev.org/c/opendev/bindep/+/938570 | 18:15 |
opendevreview | James E. Blair proposed opendev/base-jobs master: Add opendev-mirror-container-images job https://review.opendev.org/c/opendev/base-jobs/+/938687 | 18:20 |
opendevreview | James E. Blair proposed opendev/system-config master: Use opendev container mirror base job https://review.opendev.org/c/opendev/system-config/+/938689 | 18:26 |
opendevreview | James E. Blair proposed opendev/system-config master: Mirror jaegertracing https://review.opendev.org/c/opendev/system-config/+/938690 | 18:29 |
corvus | clarkb: frickler fungi https://review.opendev.org/q/hashtag:opendevmirror should address the visibility problem | 18:30 |
corvus | also, one extra image added at the end. again, sort of wishy washy on whether it should be in opendev or zuul... :| | 18:31 |
opendevreview | James E. Blair proposed opendev/system-config master: Use opendev container mirror base job https://review.opendev.org/c/opendev/system-config/+/938689 | 18:33 |
opendevreview | James E. Blair proposed opendev/system-config master: Mirror jaegertracing https://review.opendev.org/c/opendev/system-config/+/938690 | 18:33 |
clarkb | corvus: does container registry credentials have the apikey in it? | 18:39 |
clarkb | oh yup I just have to map the name to the opendev specific name | 18:39 |
clarkb | I see it now sorry for the noise | 18:39 |
clarkb | corvus: there is a valid linter error in the base-jobs change | 18:40 |
clarkb | corvus: and I have a question on https://review.opendev.org/c/opendev/system-config/+/938689 | 18:41 |
opendevreview | James E. Blair proposed opendev/base-jobs master: Add opendev-mirror-container-images job https://review.opendev.org/c/opendev/base-jobs/+/938687 | 18:51 |
corvus | clarkb: thx replied | 18:52 |
clarkb | thanks +2 from me on the whole stack at this point. The rename to old- change is approved too | 18:56 |
opendevreview | Merged opendev/system-config master: Add sosreport to list of absent packages for base server https://review.opendev.org/c/opendev/system-config/+/938682 | 19:29 |
opendevreview | Merged opendev/system-config master: Rename container mirror job https://review.opendev.org/c/opendev/system-config/+/938686 | 19:29 |
opendevreview | Merged opendev/base-jobs master: Add opendev-mirror-container-images job https://review.opendev.org/c/opendev/base-jobs/+/938687 | 19:46 |
opendevreview | Merged opendev/system-config master: Use opendev container mirror base job https://review.opendev.org/c/opendev/system-config/+/938689 | 20:05 |
opendevreview | Vladimir Kozhukalov proposed openstack/project-config master: Add refs/tags/* acls for openstack-helm projects https://review.opendev.org/c/openstack/project-config/+/938698 | 20:51 |
tonyb | fungi: The last time we held a mediawiki node you added me to the needed groups/roles etc so I could test the spam patrolling process. Can you make that same change on the live site? | 22:15 |
fungi | tonyb: "the live site" being wiki.openstack.org? sure | 22:17 |
fungi | what's your username there? | 22:17 |
tonyb | TonyBreeds | 22:18 |
tonyb | fungi: Thank you | 22:18 |
fungi | done, you have the same perms as i do now | 22:19 |
fungi | https://wiki.openstack.org/w/index.php?title=Special:RecentChanges&limit=500&hidepatrolled=1&days=30 | 22:19 |
tonyb | huzzah! | 22:19 |
fungi | that's the url i usually check, should be empty currently | 22:19 |
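For checking a different window, the review URL above can be rebuilt with Python's standard `urllib.parse.urlencode`; the parameter names and defaults are copied from that URL, and `recent_unpatrolled` is a hypothetical helper name.

```python
from urllib.parse import urlencode

BASE = "https://wiki.openstack.org/w/index.php"


def recent_unpatrolled(days=30, limit=500):
    """Build the RecentChanges URL that hides already-patrolled edits."""
    params = {
        "title": "Special:RecentChanges",
        "limit": limit,
        "hidepatrolled": 1,   # only show edits nobody has patrolled yet
        "days": days,
    }
    # safe=":" keeps the page title's colon unescaped, as in the original URL
    return f"{BASE}?{urlencode(params, safe=':')}"
```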
tonyb | Great. | 22:20 |
tonyb | I can't say how often but I'll keep an eye on it | 22:21 |
fungi | it's not critical. i try to check it about once a day when i get to my desk, but sometimes i go a week without getting to it | 22:29 |
fungi | the goal is that if a spammer does end up creating an account and posting garbage, we'll catch and delete it reasonably quickly before search engines pick it up and block their account so it doesn't get reused | 22:30 |
fungi | in the past we've had incidents where scammers posted fake support hotline phone numbers in order to get them to show up in search engines, and then we got complaints when their purported victims ended up scammed (though my guess is the "complaints" were actually from the scammers pretending to be victims of their own scams, since they were asking to have their losses refunded) | 22:32 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!