*** openstackstatus has quit IRC | 01:20 | |
*** openstack has joined #opendev-meeting | 01:22 | |
*** ChanServ sets mode: +o openstack | 01:22 | |
*** sboyron has joined #opendev-meeting | 07:58 | |
*** hashar has joined #opendev-meeting | 08:00 | |
*** hashar is now known as hasharAway | 11:44 | |
*** hasharAway is now known as hashar | 12:35 | |
*** hashar is now known as hasharAway | 15:27 | |
*** hasharAway is now known as hashar | 15:58 | |
*** hashar is now known as hasharAway | 18:23 | |
clarkb | Anyone else here for the meeting? we will get started shortly | 18:59 |
ianw | o/ | 19:00 |
fungi | yep | 19:01 |
clarkb | #startmeeting infra | 19:01 |
openstack | Meeting started Tue Feb 9 19:01:16 2021 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
*** openstack changes topic to " (Meeting topic: infra)" | 19:01 | |
openstack | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link http://lists.opendev.org/pipermail/service-discuss/2021-February/000180.html Our Agenda | 19:01 |
clarkb | #topic Announcements | 19:01 |
*** openstack changes topic to "Announcements (Meeting topic: infra)" | 19:01 | |
clarkb | I had no announcements | 19:01 |
clarkb | #topic Actions from last meeting | 19:01 |
*** openstack changes topic to "Actions from last meeting (Meeting topic: infra)" | 19:01 | |
clarkb | #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-02-02-19.01.txt minutes from last meeting | 19:02 |
clarkb | I had an action to start writing down a xenial upgrade todo list. | 19:02 |
clarkb | #link https://etherpad.opendev.org/p/infra-puppet-conversions-and-xenial-upgrades | 19:02 |
clarkb | I started there, it is incomplete, but figured starting with something that we can all put notes on was better than waiting for perfect | 19:02 |
clarkb | ianw also had an action to followup with wiki backups. Any update on that? | 19:03 |
ianw | yes, i am getting closer :) | 19:03 |
ianw | do you want to talk about pruning now? | 19:03 |
clarkb | lets pick that up later | 19:04 |
clarkb | #topic Priority Efforts | 19:04 |
*** openstack changes topic to "Priority Efforts (Meeting topic: infra)" | 19:04 | |
clarkb | #topic OpenDev | 19:04 |
*** openstack changes topic to "OpenDev (Meeting topic: infra)" | 19:04 | |
clarkb | I have continued to make progress (though it feels slow) on the gerrit account situation | 19:04 |
clarkb | 11 more accounts with preferred emails lacking external ids have been cleaned up. The bulk of these were simply retired. But one example for tbachman's accounts was a good learning experience | 19:05 |
clarkb | With tbachman there were two accounts. An active one that had preferred email set and no external id for that email and another inactive account with the same preferred email and external ids to match | 19:06 |
clarkb | tbachman said the best thing for them was to update the preferred email to a current email address. We tested this on review-test and tbachman was able to fix things on their end. The update was then made on the prod server | 19:06 |
clarkb | To avoid confusion with the other unused account I set it inactive | 19:07 |
clarkb | The important bit of news here is that users can actually update things themselves within the web ui and don't need us to intervene for this situation. They just need to update their preferred email address to be one of the actual email addresses further down in the settings page | 19:07 |
clarkb | I have also begun looking at the external id email conflicts. This is where two or more different accounts have external ids for the same email address | 19:08 |
clarkb | The vast majority of these seem to be accounts where one is clearly the account that has been used and the other is orphaned | 19:08 |
clarkb | for these cases I think we retire the orphaned account then remove the external ids associated with that account that conflict. The order here is important to ensure we don't generate a bunch of new "preferred email doesn't have an external id" errors | 19:09 |
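A minimal sketch of that retire-then-clean-up order using Gerrit's SSH admin interface; the account id is a placeholder, and the conflicting external ids themselves are removed by committing edits to a checkout of the external-ids ref (as described later in the meeting) rather than with a single built-in command:

    # step 1: retire (deactivate) the orphaned account -- 12345 is an illustrative id
    ssh -p 29418 review.opendev.org gerrit set-account --inactive 12345
    # step 2: remove that account's conflicting external ids by editing a
    # checkout of the external-ids ref and committing the fix back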
clarkb | There are a few cases where both accounts have been used and we may need to use our judgement or perhaps disable both accounts and let the user come to us with problems if they are still around (however most of these seem to be from years ago) | 19:10 |
clarkb | I suspect that the vast majority of users who are active and have these problems have reached out to us to help fix them | 19:10 |
clarkb | Where I am struggling is that I am finding it hard to automate the classification aspects. I have automated a good chunk of the data pulling but there is a fair bit of judgement in "what do we do next" | 19:11 |
clarkb | if others get a chance maybe they can take a look at my notes on review-test and see if any improvements to process or information gathering stand out. I'd also be curious if people think I've proposed invalid solutions to the issues | 19:11 |
clarkb | we don't need to go through that here though, can do that outside of meetings | 19:12 |
clarkb | As a reminder the workaround in the short term is to make changes with gerrit offline then reindex accounts (and groups?) with gerrit offline | 19:12 |
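For reference, a sketch of that offline workaround; the paths assume the usual Gerrit site layout and are illustrative:

    # with the gerrit service stopped, rebuild the affected indexes
    java -jar /var/gerrit/bin/gerrit.war reindex --index accounts -d /var/gerrit
    java -jar /var/gerrit/bin/gerrit.war reindex --index groups -d /var/gerrit
    # then start gerrit again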
clarkb | I'm hoping we can fix all these issues without ever doing that, but that option is available if we run into a strong need for it | 19:13 |
clarkb | As far as next steps go I'll continue to classify things in my notes on -test and if others agree the proposed plans there seem valid I should make a checkout of the external ids on review and then start committing those fixes | 19:13 |
clarkb | then if we do have to take a downtime we can get as many fixes as are already prepared in too | 19:14 |
clarkb | Next up is a pointer to my gerrit 3.3 image build changes | 19:14 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/765021 Build 3.3 images | 19:14 |
clarkb | reviews appreciated. | 19:14 |
clarkb | And that takes us to the gitea OOM'ing from last week | 19:14 |
clarkb | we had to add richer logging to apache so that we had source connection port for the haproxy -> apache connections. We haven't seen the issue return so haven't really had any new data to debug aiui | 19:15 |
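The richer apache logging boils down to recording the client (haproxy) source port in the access log so entries can be matched against haproxy's connection logs; a hedged sketch, not the exact format that merged:

    # %{remote}p is the source port of the haproxy -> apache connection
    LogFormat "%h:%{remote}p %l %u %t \"%r\" %>s %b" combined_srcport
    CustomLog /var/log/apache2/gitea-ssl-access.log combined_srcport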
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/774023 Rate limiting framework change for haproxy. | 19:15 |
clarkb | I also put up an example of what haproxy tcp connection based rate limits might look like. I think the change as proposed would completely break users behind corporate NAT though | 19:15 |
clarkb | so the change is WIP | 19:15 |
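A rough sketch of the kind of per-source tcp connection rate limit that change explores; the table size, window, threshold, and backend name are illustrative assumptions rather than the proposed values, and as noted this approach would also throttle many users sharing one corporate NAT address:

    frontend balance_git_https
        bind :443
        # track each source ip's connection rate over a 60 second window
        stick-table type ip size 100k expire 120s store conn_rate(60s)
        tcp-request connection track-sc0 src
        # reject sources opening more than 100 tcp connections per minute
        tcp-request connection reject if { sc0_conn_rate gt 100 }
        # assumes the existing gitea backend is defined elsewhere
        default_backend gitea_servers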
clarkb | fungi: ianw anything else to add re Gitea OOMs? | 19:16 |
fungi | i'm already finding it hard to remember last week. that's not good | 19:16 |
ianw | yeah, i don't think we really found a smoking gun, it just sort of went away? | 19:17 |
clarkb | ya it went away and by the time we got better logging in place there wasn't much to look at | 19:17 |
clarkb | I guess we keep our eyes open and use better logging next time around if it happens again. Separately maybe take a look at haproxy rate limiting and decide if we want to implement some version of that? | 19:17 |
clarkb | (the trick is going to be figuring out what a valid bound is that doesn't just break all the corporate NAT users) | 19:18 |
clarkb | sounds like that may be it let's move on | 19:19 |
clarkb | #topic Update Config Management | 19:19 |
*** openstack changes topic to "Update Config Management (Meeting topic: infra)" | 19:19 | |
clarkb | There are OpenAFS and refstack ansible (and docker in the case of refstack) efforts underway. | 19:19 |
clarkb | I also saw mention that launch node may not be working? | 19:19 |
ianw | launch node was working for me yesterday (i launched a refstack) ... but openstack client on bridge isn't | 19:20 |
clarkb | oh I see I think I mixed up launch node and openstackclient | 19:20 |
fungi | problems with latest openstackclient (or sdk?) talking to rackspace's api | 19:20 |
ianw | well, it can't talk to rax anyway. i didn't let myself yak shave, fungi had a bit of a look too | 19:20 |
clarkb | ianw: I've got an older openstackclient in a venv in my homedir that I use to cross check against clouds when that happens | 19:20 |
clarkb | basically to answer the question of "does this work if we use old osc" | 19:21 |
ianw | yeah, same, and my older client works | 19:21 |
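A sketch of the older-client cross-check being described here, assuming an illustrative version pin and cloud name:

    # build a throwaway venv with a known-older openstackclient release
    python3 -m venv ~/old-osc
    ~/old-osc/bin/pip install 'python-openstackclient==5.4.0'
    # if this works while the current client fails, the regression is in the
    # newer client/sdk stack rather than in the cloud itself
    ~/old-osc/bin/openstack --os-cloud openstackci-rax server list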
fungi | problem is the exception isn't super helpful because it's masked by a retry | 19:21 |
fungi | so the exception is that the number of retries was exceeded | 19:21 |
fungi | and it (confusingly) complains about host lookup failing | 19:21 |
clarkb | did osc drop keystone api v2 support? | 19:22 |
clarkb | that might be something to check? | 19:22 |
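One hedged way to get past the retry-masked failure and see which keystone endpoint and auth version are actually being used (the cloud name is again illustrative):

    # --debug dumps the keystoneauth requests and the real exception hidden
    # behind the "retries exceeded" wrapper
    openstack --os-cloud openstackci-rax --debug token issue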
fungi | if mordred gets bored he might be interested in looking at that failure case | 19:22 |
clarkb | I can probably take a look later today after lunch and bike ride stuff. Would be a nice change of pace from staring at gerrit accounts :) | 19:22 |
clarkb | let me know if that would be helpful | 19:22 |
fungi | but it probably merits being brought up in #openstack-sdk if it hasn't been already | 19:22 |
mordred | fungi: what did I do? | 19:23 |
fungi | mordred: you totally broke rackspace ;) | 19:23 |
fungi | not really | 19:23 |
mordred | ah - joy | 19:23 |
fungi | just thought you might be interested that latest openstacksdk is failing to talk to rackspace's keystone | 19:24 |
mordred | that's exciting | 19:24 |
fungi | using older openstacksdk works, so that's how we got around it in the short term | 19:24 |
fungi | well, an older openstacksdk install, so also older dependencies. it could be any of a number of them | 19:25 |
clarkb | ianw: I've got openafs and refstack as separate agenda items, should we just go over them here or move on and catch up under proper topic headings? | 19:25 |
ianw | up to you | 19:25 |
clarkb | #topic General topics | 19:26 |
*** openstack changes topic to "General topics (Meeting topic: infra)" | 19:26 | |
clarkb | #topic OpenAFS Cluster Status | 19:26 |
mordred | fungi: I'll take a look - the only thing that would be likely to have an impact would be keystoneauth | 19:26 |
*** openstack changes topic to "OpenAFS Cluster Status (Meeting topic: infra)" | 19:26 | |
clarkb | I don't think I saw any movement on this but wanted to double check. The fileservers are upgraded to 1.8.6 but not the db servers? | 19:26 |
ianw | the openafs status is that all servers/db servers are running 1.8.6-5 | 19:26 |
clarkb | oh nice the db servers got upgraded too. Excellent. Thank you for working on that | 19:27 |
clarkb | next steps there are to do the server upgrades then? | 19:27 |
clarkb | I've got them on my initial pass of a list for server upgrades too | 19:27 |
ianw | yep; so next i'll try an in-place focal upgrade, probably on one of the db servers first as they're small, and start that process | 19:27 |
clarkb | great, thanks again | 19:28 |
clarkb | #topic Refstack upgrade and container deployment | 19:28 |
*** openstack changes topic to "Refstack upgrade and container deployment (Meeting topic: infra)" | 19:28 | |
ianw | i got started on this | 19:29 |
ianw | there's a couple of open reviews in https://review.opendev.org/q/topic:%22refstack%22+(status:open%20OR%20status:merged) to add the production deployment jobs | 19:29 |
clarkb | is there a change to add a server to inventory yet? I suppose for this server we won't have dns changes as dns will be updated via rax | 19:29 |
ianw | yeah i merged that yesterday | 19:30 |
clarkb | #link https://review.opendev.org/q/topic:%22refstack%22+(status:open%20OR%20status:merged) Refstack changes that need review | 19:30 |
ianw | if we can just double check those jobs, i can babysit it today | 19:30 |
clarkb | cool I can take a look at those really quickly after the meeting I bet | 19:30 |
fungi | SotK: ^ that may also make a good example for doing the storyboard deployment | 19:30 |
clarkb | ++ | 19:30 |
ianw | then have to look at the db migration; the old server seemed to use a trove database while we're running it from a container now | 19:30 |
clarkb | ya I expect we'll restore from a dump for now to test that things work, then schedule a downtime so that we can stop refstack properly, do a dump, restore from that, then start on the new server with dns updates | 19:31 |
clarkb | and kopecmartin volunteered to test the service that has been newly deployed which will go a long way as I don't even know how to interact with it properly | 19:32 |
clarkb | Anything else to add on this topic? | 19:32 |
ianw | yep, there's terse notes at | 19:32 |
ianw | #link https://etherpad.opendev.org/p/refstack-docker | 19:32 |
ianw | other than that no | 19:32 |
clarkb | thank you everyone who helped move this along | 19:32 |
clarkb | #topic Bup and Borg Backups | 19:33 |
*** openstack changes topic to "Bup and Borg Backups (Meeting topic: infra)" | 19:33 | |
clarkb | ianw feel free to give us an update on borg db streaming and pruning and all other new info | 19:33 |
ianw | the streaming bit seems to be going well | 19:34 |
ianw | modulo of course mysqldump --all-databases stopping actually dumping all databases with a recent update | 19:34 |
clarkb | but it does still work if you specify the databases explicitly | 19:35 |
ianw | #link https://bugs.launchpad.net/ubuntu/+source/mysql-5.7/+bug/1914695 | 19:35 |
openstack | Launchpad bug 1914695 in mysql-5.7 (Ubuntu) "mysqldump --all-databases not dumping any databases with 5.7.33" [Undecided,New] | 19:35 |
clarkb | (which is the workaround we're going with?) | 19:35 |
fungi | also there was some unanticipated fallout from the bup removal | 19:35 |
ianw | nobody else has commented or mentioned anything in this bug, and i can't find anything in the mysql bug thing (though it's a bit of a mess) and i don't know how much more effort we want to spend on it, because it's talking to a 5.1 server in our case | 19:36 |
fungi | apparently apt-get thought bup was the only reason we wanted pymysql installed on the storyboard server, so when bup got uninstalled so did the python-pymysql package. hilarity ensued | 19:36 |
clarkb | mordred: ^ possible you may be interested? but ya I think our workaround is likely sufficient | 19:36 |
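The workaround amounts to enumerating the databases in the backup command rather than relying on --all-databases; a minimal sketch with placeholder database names and output path:

    # instead of: mysqldump --opt --all-databases
    mysqldump --opt --databases db_one db_two | gzip > /var/backups/mysql-backup.sql.gz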
ianw | I also realised some things about borg's append-only model and pruning that are explained in their docs, if you read them the right way | 19:38 |
ianw | i've put up some reviews at | 19:39 |
ianw | #link https://review.opendev.org/q/topic:%22backup-more-prune%22+status:open | 19:39 |
ianw | that provides a script to do manual prunes of the backups, and a cron job to warn us via email when the backup partitions are looking full | 19:39 |
ianw | i think that is the best way to manage things for now | 19:39 |
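A sketch of the shape of those two pieces; retention values, paths, threshold, and mail address are illustrative assumptions, not the exact scripts under review:

    #!/bin/bash
    # manual prune of one client's borg repository, run on the backup server;
    # note that in append-only repos space is only reclaimed once compaction runs
    borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 12 /opt/backups/borg-review01

    # cron-driven warning when the backup partition is getting full
    used=$(df --output=pcent /opt/backups | tail -1 | tr -d ' %')
    if [ "$used" -gt 90 ]; then
        echo "backup volume at ${used}%" | mail -s "backup server disk warning" infra-root@example.org
    fi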
clarkb | ianw: that seems like a good compromise, similar to how the certchecker reminded us to go buy new certs when we weren't using LE | 19:40 |
ianw | i think the *best* way would be to have rolling LVM snapshots implemented on the backup server | 19:40 |
ianw | but i think it's more important to just get running 100% with borg in a stable manner first | 19:41 |
clarkb | ++ | 19:41 |
ianw | so yeah, basically request for reviews on the ideas presented in those changes | 19:42 |
clarkb | thank you for sticking to this. Its never an easy thing to change, but helps enable upgrades to focal and beyond for a number of services | 19:42 |
ianw | but i think we've got it working at a stable working set. some things we can't avoid like the review backups being big diffs due to git pack file updates | 19:42 |
clarkb | we could stop packing but then gerrit would get slow | 19:42 |
clarkb | Anything else on this or should we move on? | 19:43 |
fungi | we could "backup" git repositories via replication rather than off the fs? | 19:43 |
fungi | though what does the replication in that case? | 19:43 |
clarkb | fungi: the risk with that is a corrupted repo wouldn't be able to roll back easily | 19:44 |
fungi | yeah | 19:44 |
clarkb | with proper backups we can go to an old state | 19:44 |
fungi | well, assuming the repository was not mid-write when we backed it up | 19:44 |
ianw | yep, and although the deltas take up a lot of space, the other side is they do prune well | 19:44 |
clarkb | I think git is pretty good about that | 19:44 |
clarkb | basically git does order of operations to make backups like that mostly work aiui | 19:45 |
clarkb | Alright lets move on as we have a few more topics to cover | 19:45 |
clarkb | #topic Xenial Server Upgrades | 19:46 |
*** openstack changes topic to "Xenial Server Upgrades (Meeting topic: infra)" | 19:46 | |
clarkb | #link https://etherpad.opendev.org/p/infra-puppet-conversions-and-xenial-upgrades | 19:46 |
clarkb | this has sort of been in continual progress over time, but as xenial eol approaches I think we should capture what remains and start prioritizing things | 19:46 |
clarkb | I've started to write down a partial list in that etherpad | 19:46 |
clarkb | I'm hoping that I might have time next week to start doing rolling replacements of zuul-mergers, zuul-executors, and nodepool-launchers | 19:47 |
clarkb | my idea there was to redeploy one of each on focal and we can check everything is happy with the switch, then roll through the others in each group | 19:47 |
clarkb | If you've got ideas on priorities or process/method/etc feel free to add notes to that etherpad | 19:48 |
clarkb | #topic Meetpad Audio Stopped Working | 19:48 |
*** openstack changes topic to "Meetpad Audio Stopped Working (Meeting topic: infra)" | 19:48 | |
clarkb | Late last week a few of us noticed that meetpad's audio wasn't working. By the time I got around to trying it again in order to look at it this week it was working | 19:49 |
fungi | yeah, it seems to be working fine today | 19:49 |
fungi | used it for a while | 19:49 |
clarkb | Last week I had actually tried using the main meet.jit.si service as well and had problems with it too. I suspect that we may have deployed a bug then deployed the fix all automatically | 19:49 |
clarkb | This reminds me that I think corvus has mentioned we should be able to unfork one of the images we are running too | 19:50 |
*** diablo_rojo has joined #opendev-meeting | 19:50 | |
clarkb | it is possible that having a more static image for one of the services could have contributed as well | 19:50 |
* diablo_rojo appears suuuuper late | 19:50 | |
clarkb | corvus: ^ is it just a matter of replacing the image in our docker-compose? | 19:50 |
corvus | ohai | 19:51 |
corvus | clarkb: everything except -web is unpinned i think | 19:52 |
corvus | -web is the fork | 19:52 |
clarkb | and to unfork web we just update our docker-compose file? maybe set some new settings? | 19:52 |
corvus | i don't think we'd be updating/restarting any of those automatically | 19:52 |
clarkb | corvus: I think we may do a docker-compose pull && docker-compose up -d regularly | 19:52 |
corvus | gimme a sec | 19:53 |
clarkb | similar to how gitea does it (and it finds new mariadb images) | 19:53 |
corvus | okay, yeah, looks like we do restart, last was 4/5 days ago | 19:54 |
corvus | to unfork web, we actually update our dockerfile and the docker-compose | 19:54 |
clarkb | ok, it wasn't clear to me if we had to keep building the image ourselves or if we can use theirs like we do for the other services | 19:55 |
corvus | (we're building the image from a github/jeblair source repo and deploying it; to unfork, change docker-compose to deploy from upstream and rm the dockerfile) -- but don't do that yet, upstream may not have updated the image. | 19:55 |
corvus | we should use theirs | 19:55 |
clarkb | got it | 19:55 |
corvus | https://hub.docker.com/r/jitsi/web | 19:55 |
corvus | https://hub.docker.com/layers/jitsi/web/latest/images/sha256-018f7407c2514b5eeb27f4bc4d887ae4cd38d8446a0958c5ca9cee3fa811f575?context=explore | 19:56 |
corvus | 4 days ago | 19:56 |
corvus | we should unfork now | 19:56 |
clarkb | excellent. Did you want to write that change? If not I'm sure we can find a volunteer | 19:56 |
corvus | their build of -web should now have the meetpad PR merge in it | 19:56 |
corvus | clarkb: i will do so | 19:56 |
clarkb | thank you | 19:56 |
corvus | #action corvus unfork jitsi-meet | 19:56 |
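A hedged sketch of what the unfork amounts to in the meetpad docker-compose file: drop the locally built image in favour of upstream's published one and let the existing periodic docker-compose pull && docker-compose up -d pick it up (service name, tag, and surrounding layout are illustrative):

    services:
      web:
        # previously built locally from the forked jitsi-meet source;
        # unforking means consuming upstream's image directly
        image: jitsi/web:latest
        restart: always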
clarkb | #topic InMotion OpenStack as a Service | 19:57 |
*** openstack changes topic to "InMotion OpenStack as a Service (Meeting topic: infra)" | 19:57 | |
clarkb | Really quickly before our hour is up: I have deployed a control plane for an inmotion openstack managed cloud | 19:57 |
clarkb | everything seems to work at first glance and we could bootstrap users and projects and then point cloud launcher at it. Except that none of the api endpoints have ssl | 19:57 |
clarkb | there is a VIP involved somehow that load balances requests across the three control plane nodes (it is "hyperconverged") | 19:58 |
clarkb | I need to figure out how to properly listen on that VIP and then can run a simple ssl terminating proxy with a self signed cert or LE cert that forwards to local services | 19:58 |
clarkb | I have not yet figured that out | 19:58 |
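The eventual shape of that proxy would be something along these lines once the VIP question is sorted out; every value here (VIP address, port, cert path, backend) is an illustrative assumption, shown for keystone only:

    frontend keystone_tls
        # terminate TLS on the VIP with an LE or self-signed cert (pem = cert + key)
        bind 203.0.113.10:5000 ssl crt /etc/haproxy/cloud-api.pem
        default_backend keystone_plain

    backend keystone_plain
        # forward to the plaintext keystone endpoint the kolla containers expose locally
        server keystone 127.0.0.1:5000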
clarkb | I've also tried to give this feedback back to inmotion as something that would be useful | 19:59 |
clarkb | another thing worth noting is that we have a /28 of ipv4 addresses there currently so the ability to expand our nodepool resources is minimal right now | 19:59 |
fungi | got it, so by default their cloud deployments don't provide a reachable rest api? | 19:59 |
clarkb | well they do but in plaintext | 19:59 |
corvus | clarkb: what's the vip attached to? | 19:59 |
fungi | oh! http just no https? | 20:00 |
clarkb | corvus: I have no idea. I tried looking and couldn't find it then ran out of time | 20:00 |
clarkb | a lot of things are in kolla containers and they all run the same exact command so it's been interesting poking around | 20:00 |
clarkb | (they run some sort of init that magically knows what other commands to run) | 20:00 |
clarkb | fungi: yup | 20:01 |
ianw | is it only ipv4 or also ipv6? | 20:01 |
clarkb | ianw: currently only ipv4 but ipv6 is something that they are looking at | 20:01 |
fungi | ipv6 is all the rage with the kids these days | 20:01 |
clarkb | (I expect that if we use this more properly it will be as an ipv6 "only" cloud then use the ipv4 /28 to do nat for outbound like limestone does) | 20:01 |
clarkb | but that is still theoretical right now | 20:01 |
clarkb | also we are now at time | 20:01 |
clarkb | thank you everyone! | 20:01 |
ianw | yeah, a /28 is ... 16? nodes? - control plane bits? | 20:01 |
fungi | thanks clarkb! | 20:02 |
fungi | ianw: correct | 20:02 |
clarkb | ianw: the control plane has a separate /28 or /29 | 20:02 |
clarkb | this /28 is for the neutron networking side so ya probably 14 usable and after neutron uses a couple 12? | 20:02 |
clarkb | We can continue conversations in #opendev | 20:02 |
clarkb | #endmeeting | 20:02 |
fungi | if the entire /28 is routed to the endpoint you could in theory use all 16 addresses | 20:02 |
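For the record, the arithmetic behind those numbers: a /28 is 2^(32-28) = 16 addresses; in a normal neutron subnet the network and broadcast addresses plus a gateway/router port consume a few of those, leaving roughly 12-14 for nodepool instances, whereas a /28 routed entirely to the endpoint could in principle use all 16.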
*** openstack changes topic to "Incident management and meetings for the OpenDev sysadmins; normal discussions are in #opendev" | 20:02 | |
openstack | Meeting ended Tue Feb 9 20:02:33 2021 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 20:02 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-02-09-19.01.html | 20:02 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-02-09-19.01.txt | 20:02 |
openstack | Log: http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-02-09-19.01.log.html | 20:02 |
*** hasharAway has quit IRC | 21:14 | |
kopecmartin | ianw: thank you for working on it | 21:21 |
*** gmann is now known as gmann_afk | 22:13 | |
*** sboyron has quit IRC | 23:18 |