*** hamalq_ has quit IRC | 01:59 | |
*** sboyron has joined #opendev-meeting | 07:53 | |
*** hashar has joined #opendev-meeting | 07:57 | |
*** sboyron_ has joined #opendev-meeting | 09:04 | |
*** sboyron has quit IRC | 09:04 | |
*** sboyron_ has quit IRC | 09:06 | |
*** sboyron_ has joined #opendev-meeting | 09:07 | |
*** sboyron_ has quit IRC | 12:07 | |
*** sboyron_ has joined #opendev-meeting | 12:07 | |
*** sboyron_ has quit IRC | 13:08 | |
*** sboyron_ has joined #opendev-meeting | 13:09 | |
*** hashar is now known as hasharAway | 15:23 | |
*** yoctozepto has quit IRC | 15:43 | |
*** yoctozepto has joined #opendev-meeting | 15:44 | |
*** sboyron_ has quit IRC | 15:51 | |
*** sboyron has joined #opendev-meeting | 16:02 | |
*** sboyron has quit IRC | 16:17 | |
*** sboyron has joined #opendev-meeting | 16:17 | |
*** hasharAway is now known as hashar | 16:43 | |
*** sboyron has quit IRC | 17:22 | |
*** sboyron has joined #opendev-meeting | 17:24 | |
*** hashar is now known as hasharDinner | 17:49 | |
clarkb | #startmeeting infra | 19:01 |
---|---|---|
openstack | Meeting started Tue Jan 5 19:01:20 2021 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
*** openstack changes topic to " (Meeting topic: infra)" | 19:01 | |
openstack | The meeting name has been set to 'infra' | 19:01 |
clarkb | hello everyone, welcome to the first meeting of 2021 | 19:01 |
clarkb | Others indicated they would be delayed in joining so I'll give it a few minutes before we dive into the agenda I sent out | 19:01 |
clarkb | #link http://lists.opendev.org/pipermail/service-discuss/2021-January/000160.html Our Agenda | 19:02 |
clarkb | #topic Announcements | 19:05 |
*** openstack changes topic to "Announcements (Meeting topic: infra)" | 19:05 | |
clarkb | I didn't have any announcements. Were there others to share? | 19:05 |
* corvus joins late | 19:05 | |
fungi | i've nothing to share | 19:06 |
clarkb | #topic Actions from last meeting | 19:06 |
*** openstack changes topic to "Actions from last meeting (Meeting topic: infra)" | 19:06 | |
clarkb | #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-12-08-19.01.txt minutes from last meeting | 19:06 |
clarkb | It has been a while since our last meeting. I don't see any actions registered there. I think we can just roll forward into 2021 | 19:07 |
clarkb | #topic Priority Efforts | 19:07 |
*** openstack changes topic to "Priority Efforts (Meeting topic: infra)" | 19:07 | |
clarkb | #topic Update Config Management | 19:07 |
*** openstack changes topic to "Update Config Management (Meeting topic: infra)" | 19:07 | |
clarkb | Over the holidays it appears that rax was doing a number of host migrations. A nonzero number of these failed, leaving servers unreachable | 19:07 |
clarkb | other than services like ethercalc, wiki, and elasticsearch going down as a result, one of the fallouts of this is that our ansible playbooks try to connect to the servers and never time out, piling up a number of stale ansible-playbook processes and their children on bridge | 19:08 |
clarkb | then subsequent runs time out because the server is slow due to load | 19:08 |
clarkb | We do set an ansible ssh connection timeout but it doesn't seem to be sufficient in these cases | 19:09 |
clarkb | fungi: ^ I think you had a theory for why that may be but I can't remember it right now? | 19:09 |
fungi | because ssh doesn't time out connecting | 19:09 |
fungi | ssh authenticates and hangs | 19:09 |
clarkb | I see, it's the next step that isn't being useful | 19:09 |
clarkb | I wonder if we can make that better in ansible or if ansible already has tooling to try and detect that. | 19:10 |
fungi | basically the servers are in a pathological condition which i think ansible's timeout mechanism doesn't take into consideration but happens rather regularly for us | 19:10 |
clarkb | like maybe we can set a task timeout to some value like 2 hours | 19:10 |
clarkb | anyway we don't need to solve it here. I just wanted to call that out since we hit this problem multiple times on bridge over the holidays (and on our return) | 19:11 |
corvus | unsure if this is on/off topic, but i made some changes to the root email alias, and it doesn't seem to have taken effect on many servers; is our periodic ansible run failing due to these issues? | 19:11 |
fungi | it's either hanging the connection indefinitely during or immediately following authentication, i'm not sure which | 19:11 |
clarkb | corvus: base was failing, but should be running as of yesterday evening my local time | 19:11 |
clarkb | correction: base was timing out | 19:11 |
corvus | ok, so i'll see if my inbox is full again tomorrow :) | 19:12 |
fungi | yeah, so servers later in the sequence would have been repeatedly skipped | 19:12 |
clarkb | and if you notice servers are unresponsive reboots seem to correct their issues | 19:13 |
clarkb | any other config management items to bring up? that was all I had | 19:13 |
clarkb | #topic OpenDev | 19:14 |
*** openstack changes topic to "OpenDev (Meeting topic: infra)" | 19:14 | |
clarkb | On the Gerrit tuning topic, we enabled the git v2 protocol, then updated our zuul images to enable it client side, and that was the last gerrit tuning we did | 19:14 |
clarkb | it seems to be working from a functionality perspective (zuul and git review are happy etc) but probably too early to say if it has helped with the system load issues | 19:15 |
corvus | yeah, we also scheduled holidays ;) | 19:15 |
corvus | if the tuning doesn't work out, let's fall back on scheduling more holidays | 19:16 |
fungi | yeah, i'll be more convinced next week or the week after when everyone's turning it up to 11 again | 19:16 |
clarkb | Other tuning ideas are the strong refs for jgit caches (potentially needs more memory and is scary for that reason), setting up service user and regular user thread counts to better balance CI and humans, and on the upstream mailing list there has been a ton of recent discussion from other users about tuning caches | 19:16 |
clarkb | corvus: do you know where ianw has gotten with the zuul results plugin work? I think you were helping to get that into an upstream plugin? | 19:17 |
clarkb | I expect we will be able to incorporate that into our images soon, but I've not yet caught up on the status of this work | 19:18 |
fungi | i'll readily admit i ended up not finding time to work on the jeepyb fixes for update_bug/update_bp as other problems kept preempting my time | 19:18 |
corvus | um... i haven't checked recently but last i remember is it exists in an upstream repo | 19:18 |
clarkb | corvus: cool so progress :) | 19:18 |
clarkb | the other thing ianw had brought up was using the built in WIP status for changes. In testing that we have found that Zuul doesn't understand WIP status changes as unmergeable | 19:19 |
corvus | #link https://gerrit.googlesource.com/plugins/zuul-results-summary/ | 19:19 |
clarkb | we mentioned this last time we had a meeting but we should discourage users from using that until Zuul does understand that status | 19:19 |
corvus | i can add that feature | 19:19 |
clarkb | the preexisting WIP vote on the workflow should be used until zuul has been updated | 19:20 |
clarkb | corvus: thanks | 19:20 |
corvus | #action corvus add wip support to zuul | 19:20 |
clarkb | The last Gerrit related topic I wanted to bring up was the 3.3 upgrade. guillaumec says that 3.3.1 incorporates the fix for zuul | 19:20 |
corvus | this was the comments thing (that would break 'recheck' i think) | 19:21 |
clarkb | I think that means we can start looking at 3.3.1 upgrades if people have time. The upgrade does involve some changes like Non-Interactive Users group being renamed to Service Users and I am sure there are other things to consider, so if we do that let's read release notes and test it (review-test can still be used for this I think) | 19:21 |
clarkb | corvus: yup | 19:21 |
corvus | i haven't checked on what the final status of that is (ie, do we need to enable an option or is it transparently backwards compat) | 19:21 |
clarkb | oh good point we should also double check this fix doesn't need settings to be effective | 19:22 |
corvus | i think people were leaning towards not requiring that, but it was a suggestion, so we should verify | 19:22 |
clarkb | I don't know that I'll have time to drive a gerrit upgrade at the beginning of the year. I've got all the typical beginning of the year things distracting me. But I can help anyone else who may have time (if they don't also have beginning of the year items) | 19:22 |
clarkb | ianw was also working on improving our testing of gerrit in CI | 19:23 |
clarkb | it might be worth getting those improvements landed then relying on it to help verify the next upgrade. I don't think we're in a rush so that may be a good idea | 19:23 |
clarkb | The other opendev related upgrade is Gitea 1.13 | 19:24 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/769226 | 19:25 |
clarkb | this upgrade seems to be a bigger leap than previous gitea upgrades. They have added new features like project management kanban boards | 19:25 |
clarkb | our testing is decent for api checking but maybe we should hold the run job for that change now and put a repo or three in it and confirm it is happy from a ui perspective? | 19:25 |
corvus | oO | 19:25 |
clarkb | this version also adds elasticsearch support for indexing. It isn't the default and I think we should upgrade to it first without worrying about elasticsearch just to sort out the other changes. Then as a followon we can work to sort out elasticsearch | 19:26 |
fungi | our manage-projects test loads repos into gitea, can we depends-on or something to just take advantage of that and hold it? | 19:26 |
clarkb | fungi: the gitea test creates all of the projects, but without git content | 19:27 |
clarkb | fungi: all you need to do is push the content in after holding it | 19:27 |
fungi | ahh | 19:27 |
clarkb | (we could potentially modify the job to push in content for some small repos too) | 19:27 |
clarkb | that may be a good idea | 19:27 |
fungi | or push some ourselves after setting up necessary credentials, yeah | 19:27 |
clarkb | ya why don't we do that. I'll WIP the change and suggest we hold it and check the ui since the upgrade is a bit more involved than ones we have done previously | 19:29 |
clarkb | Any other opendev topics to discuss or should we move on? | 19:30 |
fungi | annual report? | 19:30 |
clarkb | that's next though I guess technically it fits under here | 19:30 |
fungi | or did you have a separate topic for that? | 19:30 |
fungi | ahh, no worries | 19:30 |
clarkb | ya I had it in general topics but it is the opendev project update. Let's talk about it here | 19:30 |
* fungi should read meeting agendas | 19:30 | |
clarkb | We have been asked to put together a project update for opendev in the foundation's annual report | 19:31 |
clarkb | #link https://etherpad.opendev.org/p/opendev-2020-annual-report | 19:31 |
clarkb | I have written a draft. But I'm happy to scrap that if others want to write one. Also happy for edits and suggestions | 19:31 |
clarkb | I believe we have a week from tomorrow to get it together so this isn't a huge rush but is also a near future item to figure out | 19:31 |
fungi | i'm also putting some polish on our engagement metrics generator: https://review.opendev.org/729293 | 19:34 |
clarkb | I've been planning to do periodic rereads and edits myself too. Basically want to reread it with it being a bit more fresh, then correct things as necessary | 19:34 |
clarkb | #topic General topics | 19:34 |
*** openstack changes topic to "General topics (Meeting topic: infra)" | 19:34 | |
clarkb | #topic Bup and Borg Backups | 19:34 |
*** openstack changes topic to "Bup and Borg Backups (Meeting topic: infra)" | 19:34 | |
clarkb | I think we may be about ready to drop this entry from our agenda. I'll double check with ianw when holidays end. | 19:35 |
clarkb | tldr aiui is we're using borg now, bup should be disabled at least on some servers | 19:35 |
clarkb | we'll keep the old bup backups around on the old volumes like we've done with previous bup rotations | 19:35 |
clarkb | if you haven't yet had a chance to interact with borg and try out recovery methods that may be a good exercise. Should only take about half an hour I would expect | 19:36 |
clarkb | #topic InMotion Hosted Cloud | 19:37 |
*** openstack changes topic to "InMotion Hosted Cloud (Meeting topic: infra)" | 19:37 | |
clarkb | The other thing I've been working on this week is getting an account with inmotion bootstrapped so that we can spin up an openstack cloud there for nodepool resources when they are ready | 19:37 |
clarkb | I have created an account and the details for that as well as our contacts are in the usual location. There is no actual cloud yet though. AIUI we are waiting on them to tell us they are ready to try bootstrapping the actual resources | 19:38 |
fungi | this is the experiment where we're sort of on the hook as openstack cloud admins, right? | 19:39 |
fungi | infracloud mk2? | 19:39 |
clarkb | yes, but I think we've decided that we are comfortable with a redeploy strategy using their provided management tools | 19:40 |
clarkb | in theory that means the actual overhead to us is low | 19:40 |
fungi | okay, so basically hands-off and if it breaks we push a button and rebuild it all | 19:40 |
clarkb | exactly | 19:40 |
corvus | so if it breaks or we need to upgrade, ^ that? | 19:40 |
clarkb | yup | 19:40 |
corvus | that happens occasionally with our current providers too | 19:41 |
clarkb | they have also expressed interest in zuul and nodepool so maybe we can get them involved there too | 19:41 |
fungi | openstack as a service. it'll be interesting | 19:41 |
clarkb | #topic Open Discussion | 19:42 |
*** openstack changes topic to "Open Discussion (Meeting topic: infra)" | 19:42 | |
clarkb | That was about all I had. There are some old agenda items that I should probably clean up after thinking about them for half a second | 19:43 |
clarkb | I've got meetings mon-wed next week that will have me distracted in the mornings (and maybe afternoons? I don't know if that has been sorted out yet) | 19:43 |
clarkb | I should be around for our meeting next week though | 19:43 |
fungi | yeah, same here (same meetings) | 19:44 |
fungi | but they're half-day if memory serves, so shouldn't be entirely distracting | 19:44 |
clarkb | Anything else? or should we call it here? | 19:46 |
* fungi has nothing | 19:46 | |
clarkb | sounds like that may be it then. Thanks everyone and we'll see you here next week | 19:47 |
fungi | thanks clarkb! | 19:47 |
clarkb | feel free to bring up discussions in #opendev or on the mailing list and we can pick things up there if they were missed here | 19:47 |
corvus | thanks! | 19:47 |
clarkb | #endmeeting | 19:47 |
*** openstack changes topic to "Incident management and meetings for the OpenDev sysadmins; normal discussions are in #opendev" | 19:47 | |
openstack | Meeting ended Tue Jan 5 19:47:41 2021 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:47 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-01-05-19.01.html | 19:47 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-01-05-19.01.txt | 19:47 |
openstack | Log: http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-01-05-19.01.log.html | 19:47 |
*** sboyron has quit IRC | 20:06 | |
*** hasharDinner is now known as hashar | 20:12 | |
*** hashar has quit IRC | 22:38 |
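
The stale ansible-playbook processes discussed under "Update Config Management" pile up because the ssh connection timeout never fires once authentication has succeeded. One possible mitigation, sketched here purely as an illustration and not as what the team deployed, is an outer watchdog that gives each run a hard wall-clock deadline and kills the whole process group when the deadline passes. The helper name `run_with_deadline` is hypothetical, the 2 hour figure is only the value floated in the meeting, and the sketch assumes a POSIX host.

```python
import os
import signal
import subprocess


def run_with_deadline(cmd, deadline_seconds):
    """Run cmd and kill its whole process group if it overruns the deadline.

    Returns the command's exit code, or None if the deadline was hit.
    (Hypothetical helper; POSIX only.)
    """
    # start_new_session=True makes the child the leader of a new session and
    # process group, so the children it spawns can be signalled with it.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=deadline_seconds)
    except subprocess.TimeoutExpired:
        # The process group id equals the child's pid because it is the
        # session leader; SIGKILL cannot be ignored by a hung process.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()  # reap the killed child so no zombie is left behind
        return None
```

A wrapper on bridge could then call something like `run_with_deadline(["ansible-playbook", "<playbook>"], 2 * 60 * 60)`, where the playbook path is a placeholder. Killing the process group rather than only the parent matters here because the log notes that the children of the stale processes also accumulated.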
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!