clarkb | Hello, it is meeting time | 18:59 |
clarkb | we'll get started in a couple of minutes | 18:59 |
fungi | ahoy! | 18:59 |
ianw | o/ | 19:00 |
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue Oct 4 19:01:15 2022 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link https://lists.opendev.org/pipermail/service-discuss/2022-October/000363.html Our Agenda | 19:01 |
clarkb | #topic Announcements | 19:01 |
clarkb | The OpenStack release is happening this week (tomorrow in fact) | 19:01 |
clarkb | fungi: I think you indicated you would try to be around early tomorrow to keep an eye on things. I'll do my best too | 19:01 |
clarkb | But I don't expect any issues | 19:02 |
fungi | yeah, though i have appointments starting around 14:00 utc | 19:02 |
fungi | so will be less available at that point | 19:02 |
fungi | extra eyes are appreciated | 19:02 |
clarkb | I can probably be around at that point and take over | 19:03 |
clarkb | The other thing to note is that the PTG is in 2 weeks | 19:03 |
clarkb | #topic Bastion Host Changes | 19:04 |
clarkb | let's dive right into the agenda | 19:04 |
clarkb | ianw has made progress on a stack of changes to shift bridge to running ansible out of a venv | 19:04 |
clarkb | #link https://review.opendev.org/q/topic:bridge-ansible-venv | 19:04 |
clarkb | The changes lgtm but please do review them carefully since this is the bastion host | 19:04 |
ianw | yep i need to loop back on your comments, thank you, but it's close | 19:05 |
clarkb | ianw: one thing I noted on one of the changes is that launch node may need different venvs for different clouds in order to have different versions of openstacksdk | 19:05 |
clarkb | It is possible that a good followup to this will be managing launch node venvs for that purpose | 19:05 |
clarkb | And then separately your change to update zuul to disable console log file generation landed in zuul and I think the most recent restart of the cluster picked it up | 19:06 |
clarkb | That means we can configure our jobs to not write those files | 19:06 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/855472 | 19:06 |
ianw | yeah; is that mostly cinder/rax? i feel like that's been a pita before, and i saw in scrollback annoyances adding disk to nodepool builders | 19:06 |
ianw | (openstacksdk venvs) | 19:06 |
clarkb | ianw: it's now rax and networking (not sure if nova or neutron is the problem there) | 19:06 |
clarkb | but ya | 19:06 |
clarkb | ianw: re console log writing I have a note there that a second location also needs the update. | 19:07 |
fungi | though on a related note, the openstacksdk maintainers want to add a new pipeline in the openstack tenant of our zuul to ease testing of public clouds | 19:07 |
fungi | (a "post-review" pipeline and associated required label in gerrit to enable/trigger it) | 19:08 |
clarkb | and I've proposed a topic to the openstack tc ptg to discuss not forgetting the sdk is a tool for end users in addition to an internal api tool for openstack clusters | 19:08 |
ianw | ++ | 19:08 |
fungi | i think i'm the only reviewer to have provided them feedback on those changes so far | 19:08 |
ianw | we don't want to have to start another project to smooth out differences in openstacksdk versions ... maybe call it "shade" :) | 19:08 |
clarkb | fungi: I thought I left a comment too | 19:09 |
fungi | ahh, cool | 19:09 |
fungi | i probably missed the update | 19:09 |
clarkb | indicating that I don't think there's a reason to put it in project-config | 19:09 |
clarkb | since project-config doesn't protect the secrets in the way they think it does | 19:10 |
fungi | oh, that part, yeah | 19:10 |
fungi | the pipeline creation still needs to happen in project-config though, as does the acl addition and support for the new review label in our linter | 19:10 |
clarkb | I guess I'm not up to date on why any of that is necessary. I'll have to take another look | 19:11 |
fungi | i can bring up more details when we get to open discussion | 19:11 |
clarkb | but ya infra-root please look over the ansible in venv changes and the console log file disabling change(s). And ianw don't forget the second change needed for that | 19:11 |
clarkb | Anything else to bring up on this topic? | 19:11 |
ianw | yep i'll loop back on that | 19:12 |
ianw | one minor change this revealed in zuul was | 19:12 |
ianw | #link https://review.opendev.org/c/zuul/zuul/+/860062 | 19:12 |
ianw | after i messed up the node allocations. that improves an edge-case error message | 19:12 |
ianw | i think probably the last thing i can do is switch the testing to "bridge.opendev.org" | 19:12 |
ianw | all the changes should have abstracted things such that it should work | 19:13 |
ianw | at that point, i think we're ready (modulo launching focal nodes) to do the switch. it will still be quite a manual process getting secrets etc, but i'm happy to do that | 19:13 |
clarkb | ya and using the symlink into $PATH should make it fairly transparent to all the infra-prod job runs | 19:13 |
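For illustration, a minimal sketch of the venv-plus-symlink arrangement described above, written as Ansible tasks; the venv path and the set of linked entry points are assumptions for the example, not the actual system-config code:

```yaml
# Sketch only: install Ansible into a dedicated venv on bridge and
# symlink its entry points into the default PATH, so infra-prod jobs
# can keep invoking plain `ansible-playbook` unchanged.
- name: Install Ansible into a venv
  ansible.builtin.pip:
    name: ansible
    virtualenv: /usr/ansible-venv        # assumed path
    virtualenv_command: python3 -m venv

- name: Link venv entry points into the default PATH
  ansible.builtin.file:
    src: "/usr/ansible-venv/bin/{{ item }}"
    dest: "/usr/local/bin/{{ item }}"
    state: link
  loop:
    - ansible
    - ansible-playbook
```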
clarkb | #topic Updating Bionic Servers / Launching Jammy Servers | 19:15 |
clarkb | That's a good lead into the next topic | 19:15 |
clarkb | corvus did try to launch the new tracing server on a jammy host but that failed because our base user role couldn't delete the ubuntu user while a process owned by it was still running | 19:15 |
clarkb | I believe what happened there is launch node logged in as the ubuntu user and used it to set up root. Then it logged back in as root and tried to delete the ubuntu user but something was left behind from the original login | 19:16 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/860112 Update to launch node to handle jammy hosts | 19:16 |
clarkb | That is one attempt at addressing this. Basically we use userdel --force which won't care if a process is running. Then the end of launch node processing involves a reboot which should clear out any stale processes | 19:16 |
clarkb | The downside to this is that --force has some behaviors we may not want generally which is why I've limited the --force deletion to users associated with the base distro cloud images and not with our regular users | 19:17 |
clarkb | this way failures to remove regular users will bubble up and we can debug them more closely | 19:17 |
clarkb | If we don't like that I think another approach would be to have launch login as ubuntu, set up root, then reboot the host and log back in after a reboot | 19:18 |
corvus | what kind of undesirable behaviors? | 19:18 |
clarkb | the reboot should clear out any stale processes and allow userdel to run as before | 19:18 |
clarkb | corvus: "This option forces the removal of the user account, even if the user is still logged in. It also forces userdel to remove the user's home directory and mail spool, even if another user uses the same home directory or if the mail spool is not owned by the specified user." | 19:18 |
clarkb | corvus: in particular I think we want it to error if a normal user outside of the launch context is logged in or otherwise has processes running | 19:19 |
clarkb | as that is something we should address. In the launch context the ubuntu user isn't something we care about and we'll reboot in a few minutes anyway | 19:19 |
corvus | yep agree. seems like --force is okay (even exactly what we want) for this case, and basically almost never otherwise. | 19:19 |
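As a concrete illustration of the approach clarkb describes, a hedged sketch using Ansible's user module, whose force: true maps to userdel --force; the list of distro users is illustrative, not the actual launch-node role:

```yaml
# Sketch only: force-remove the default cloud-image users. --force
# succeeds even if a process owned by the user is still running, and
# the reboot at the end of launch node cleans up any leftovers.
# Regular users are removed elsewhere without --force so failures
# there still surface for debugging.
- name: Remove default cloud image users
  ansible.builtin.user:
    name: "{{ item }}"
    state: absent
    remove: true   # also delete the home directory
    force: true    # pass --force to userdel
  loop:
    - ubuntu
    - debian
    - centos
```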
clarkb | anyway I expect that with that change landed we can retry a jammy node launch and see if we make more progress there | 19:20 |
clarkb | but also let me know if we want to try a different approach like the early reboot during launch idea | 19:21 |
clarkb | Did anyone else have server upgrade related items for this topic? | 19:21 |
ianw | all sounds good thanks! hopefully we have some new nodes up soon :) if not bridge, the arm64 bits too | 19:22 |
clarkb | #topic Mailman 3 | 19:23 |
clarkb | We continue to make progress. Though things have probably slowed a bit | 19:23 |
clarkb | In particular my efforts to work upstream to improve the images seem to have stalled. | 19:23 |
clarkb | There haven't been any responses to the github issues and PRs so I sent email to the mailman3 users list and the response I got there was that maxking is basically the only person who devs on those and we need to wait for maxking | 19:23 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/860157 Forking upstream mm3 images | 19:24 |
clarkb | because of that I finally gave in and pushed ^ to fork the images. | 19:24 |
clarkb | I think this leads us to two major questions: 1) Do we want to fork or just use the images with their existing issues? and 2) If we do want to fork how forked do we want to get? If we do a minimal fork we can more easily resync with upstream if they become active again. But then we need to continue to carry workarounds in our mm3 role and stick to their uid and gid selections. | 19:25 |
clarkb | It is worth noting that I did look at maybe just building our own images based on our python base image stuff. The problem with that is it appears there is a lot of inside knowledge over what versions of things need to be combined together to make a working system | 19:25 |
clarkb | https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/message/H7YK27E4GKG3KNAUPWTV32XWRWPFEU25/ upstream even acknowledges the confusion | 19:26 |
clarkb | For that reason I think we're best off forking or working upstream if we can manage it and then hope upstream curates those lists of specific versions for us | 19:26 |
clarkb | The existing change does a "lightweight" fork fwiw. The only change I made to the images was to install lynx which is necessary for html to text conversion | 19:27 |
clarkb | I don't think we need to decide on any of this right now in the meeting. But I wanted to throw the considerations out there and ask y'all to take a look. Feel free to leave your thoughts on the change and I'll do my best to follow up there | 19:28 |
clarkb | with that out of the way fungi did you have anything to add on the testing side? | 19:28 |
fungi | it seems like a reasonable path forward, and opens us up to adding other fixes | 19:28 |
fungi | i expect we'll want to hold another node with the build using the forked containers, and do another test import | 19:28 |
clarkb | ++ and probably do that after we update the prod fields that are too long for the new db? | 19:29 |
fungi | i also wanted to double-check that we're redirecting some common patterns like list description pages | 19:29 |
fungi | and i was going to fix those three lists with message templates that were too large for the columns in the db and do at least one more import test | 19:29 |
fungi | yes | 19:30 |
fungi | but otherwise we're probably close to scheduling maintenance for some initial site cut-overs | 19:30 |
clarkb | sounds good. Maybe see if we can get feedback on the image fork idea and then hold based on that | 19:31 |
fungi | right | 19:31 |
clarkb | since we may need to make changes to the images | 19:31 |
fungi | and maybe we'll hear back from the upstream image maintainer | 19:31 |
fungi | but at least we have options if not | 19:31 |
clarkb | Anything else? | 19:32 |
fungi | not on my end | 19:32 |
clarkb | #topic Gitea Connectivity Issues | 19:33 |
clarkb | At the end of last week we had several reports from users in europe that had problems with git clones to opendev.org | 19:33 |
clarkb | We were unable to reproduce this from north american isp connections and from our ovh region in france | 19:34 |
clarkb | Ultimately I think we decided it was something between the two endpoints and not something we could fix ourselves. | 19:34 |
clarkb | However | 19:34 |
clarkb | it did expose that our gitea logging no longer correlated connections from haproxy -> apache -> gitea | 19:34 |
clarkb | haproxy -> apache was working fine. The problem was apache -> gitea and that appears to be related to gitea switching http libraries from macaron to go-chi | 19:35 |
clarkb | basically go-chi doesn't handle x-forwarded-for properly to preserve port info and instead the port becomes :0 | 19:35 |
clarkb | We made some changes to stop forwarding x-forwarded-for which forces everything to record the actual ports in use. This mostly works but apache -> gitea does reuse connections for multiple requests which means that it isn't a fully 1:1 mapping now but it is better than what we had on friday | 19:36 |
clarkb | I think we can also force apache to use a new connection for each request but that is probably overkill? | 19:36 |
clarkb | I wanted to bring this up in case anyone had better ideas or concerns with these changes since we tried to get them in quickly last week while debugging | 19:36 |
fungi | the request pipelining is probably more efficient, yeah, i don't think i'd turn it off just to make logs easier to correlate | 19:37 |
clarkb | Sounds like no one has any immediate concerns. | 19:39 |
clarkb | #topic Open Discussion | 19:39 |
clarkb | Zuul will make its 7.0.0 release soon. The next step in the zuul release planning process is to switch opendev to ansible 6 by default to ensure that is working happily. I had asked that we do that after the openstack release. But once openstack releases I think we can make that change | 19:39 |
clarkb | I had a test devstack change up to check ansible 6 on the devstack jobs and that seemed to work happily | 19:40 |
clarkb | https://review.opendev.org/c/openstack/devstack/+/858436 | 19:40 |
clarkb | Now is a good time to test things with ansible 6 if you have any concerns | 19:40 |
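For anyone wanting to check a job ahead of the default flip, a hedged sketch of pinning a single Zuul job to Ansible 6 via the job-level ansible-version attribute (the job name is a hypothetical variant, not an existing job):

```yaml
# Sketch only: run one variant of a job under Ansible 6 before the
# tenant-wide default changes, to catch compatibility issues early.
- job:
    name: devstack-ansible-6   # hypothetical variant
    parent: devstack
    ansible-version: "6"
```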
fungi | #link https://review.opendev.org/859977 Add post-review pipeline | 19:41 |
fungi | that's where most of the discussion i was talking about earlier took place | 19:41 |
ianw | thanks -- slightly related to ansible updates, i think ansible-lint has fixed some issues that were holding us back from upgrading in zuul-jobs, i'll take a look | 19:42 |
fungi | the openstacksdk maintainers want to take advantage of zuul's post-review pipeline flag to run some specific jobs which use secrets but limit them to changes which the core reviewers have okayed | 19:42 |
clarkb | fungi: and looks like they don't want to use gate for that because they don't want the changes to merge at that point necessarily | 19:42 |
fungi | right, the reviewers want build results after checking that it's safe to run those jobs but before approving them | 19:43 |
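For context, a rough sketch of what such a pipeline definition could look like; the pipeline and label names follow the proposal under review here but were still being bikeshedded at this point in the meeting:

```yaml
# Sketch only: an independent pipeline with Zuul's post-review flag
# set so its jobs may use secrets, gated on a dedicated Gerrit label
# that core reviewers apply once they judge a change safe to run.
- pipeline:
    name: post-review
    manager: independent
    post-review: true
    require:
      gerrit:
        approval:
          - Allow-Post-Review: 1
    trigger:
      gerrit:
        - event: comment-added
          approval:
            - Allow-Post-Review: 1
```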
clarkb | it might be worth considering if "Allow-Post-Review" conveys the intent here clearly as this might be a pipeline that is adopted more widely | 19:43 |
fungi | we'd discussed this as a possibility (precisely for the case they bring up, testing with public cloud credentials), so i tried to rehash some of our earlier conversations about that | 19:44 |
clarkb | (typically I'd avoid bikeshedding stuff like that but once it is in gerrit acls it is hard to change) | 19:44 |
fungi | yeah, allow-post-review was merely my best suggestion. what they had before that was even less clear | 19:44 |
corvus | (this use case was an explicit design requirement for zuul, so something like this was anticipated and planned for) | 19:44 |
fungi | something to convey "voting +1 here means it's safe to run post-review pipeline jobs" but small enough to be a gerrit label name | 19:45 |
corvus | in design, i think we called it a "restricted check" pipeline or something like that. | 19:45 |
fungi | that's not terrible | 19:45 |
clarkb | no objections from me to move forward on this. As mentioned this was always something we anticipated might become a useful utility | 19:46 |
fungi | yeah, the previous name they had for it was the "post-check" pipeline (and a corresponding gerrit label of the same name) | 19:46 |
fungi | but i agree bikeshedding on terms at least a little is probably good just because of the cargo cult potential | 19:47 |
corvus | the "post-check" phrasing is slightly confusing to me. | 19:47 |
fungi | yeah, since we already have pipelines in that tenant called post and check | 19:47 |
clarkb | I think my initial concern with "allow-post-review" is it doesn't convey what is being allowed. Just that something is | 19:47 |
fungi | short for allow-post-review-jobs-to-run | 19:48 |
corvus | for the label name, maybe something that conveys "safety" or some level of having been "reviewed"... | 19:48 |
fungi | yes, something along those lines would be good | 19:49 |
fungi | my wordsmithing was simply not getting me all that far | 19:49 |
fungi | everything i came up with was too lengthy | 19:49 |
corvus | yeah, i'm not much help either | 19:49 |
clarkb | ya its a tough one | 19:49 |
clarkb | trigger-zuul-secrets | 19:49 |
fungi | word-soup | 19:49 |
clarkb | indeed | 19:50 |
fungi | anyway, since it's a use case we'd discussed at length, but it's been a while, i just wanted to call those changes to others' attention so they don't go unnoticed | 19:50 |
clarkb | ++ thanks | 19:51 |
fungi | especially since it's also in service of something we've had a bee in our collective bonnet over (loss of old public cloud support in openstacksdk) | 19:51 |
corvus | ++ | 19:51 |
clarkb | I'll give it a couple more minutes for anything else, but then we can probably end about 5 minutes early today | 19:52 |
clarkb | sounds like that is it. Thank you everyone | 19:54 |
clarkb | We'll be back next week | 19:54 |
clarkb | same location and time | 19:54 |
clarkb | #endmeeting | 19:55 |
opendevmeet | Meeting ended Tue Oct 4 19:55:03 2022 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:55 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2022/infra.2022-10-04-19.01.html | 19:55 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2022/infra.2022-10-04-19.01.txt | 19:55 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2022/infra.2022-10-04-19.01.log.html | 19:55 |
fungi | thanks clarkb! | 19:55 |