-openstackstatus- NOTICE: The Gerrit service on review.opendev.org is being quickly restarted to apply a new security patch | 00:55 | |
*** hamalq has quit IRC | 02:36 | |
*** sboyron has joined #opendev-meeting | 07:02 | |
*** sboyron_ has joined #opendev-meeting | 07:19 | |
*** sboyron has quit IRC | 07:22 | |
*** hashar has joined #opendev-meeting | 08:13 | |
*** kopecmartin has quit IRC | 08:48 | |
*** kopecmartin has joined #opendev-meeting | 08:50 | |
*** zbr1 has joined #opendev-meeting | 11:15 | |
*** zbr has quit IRC | 11:16 | |
*** zbr1 is now known as zbr | 11:16 | |
*** hashar is now known as hasharLunch | 12:29 | |
*** hasharLunch is now known as hashar | 13:18 | |
*** hashar is now known as hasharAway | 15:27 | |
*** hasharAway is now known as hashar | 16:04 | |
*** zbr1 has joined #opendev-meeting | 17:17 | |
*** zbr has quit IRC | 17:18 | |
*** zbr1 is now known as zbr | 17:18 | |
*** hashar is now known as hasharDinner | 18:41 | |
clarkb | anyone else here for the meeting? | 19:01 |
---|---|---|
clarkb | #startmeeting infra | 19:01 |
openstack | Meeting started Tue Feb 2 19:01:06 2021 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
*** openstack changes topic to " (Meeting topic: infra)" | 19:01 | |
openstack | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link http://lists.opendev.org/pipermail/service-discuss/2021-February/000179.html Our Agenda | 19:01 |
kopecmartin | hi o/ | 19:01 |
ianw | o/ | 19:02 |
clarkb | hello kopecmartin your agenda item is near the tail end of the meeting, if that is a problem feel free to say something and we can cover it earlier (not sure what meeting timing is like for you) | 19:02 |
kopecmartin | clarkb: it's fine, i'll wait :) | 19:02 |
clarkb | #topic Announcements | 19:03 |
*** openstack changes topic to "Announcements (Meeting topic: infra)" | 19:03 | |
clarkb | I had no announcements | 19:03 |
clarkb | #topic Actions from last meeting | 19:03 |
*** openstack changes topic to "Actions from last meeting (Meeting topic: infra)" | 19:03 | |
*** sboyron_ has quit IRC | 19:03 | |
clarkb | #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-01-26-19.01.txt minutes from last meeting | 19:03 |
clarkb | fungi's request for help on config-core duties went out | 19:03 |
clarkb | fungi: thank you for that | 19:03 |
clarkb | #action clarkb begin puppet -> ansible and xenial upgrade audit | 19:04 |
clarkb | I did not manage to find time for ^ so have added it back on | 19:04 |
clarkb | ianw: do we need to keep an action item for wiki backups or are those happy now? | 19:04 |
ianw | not done yet ... | 19:04 |
clarkb | #action ianw figure out borg backups for wiki | 19:05 |
clarkb | Ok let's dive into our topics for today | 19:05 |
clarkb | #topic Priority Efforts | 19:05 |
*** openstack changes topic to "Priority Efforts (Meeting topic: infra)" | 19:05 | |
clarkb | #topic OpenDev | 19:05 |
*** openstack changes topic to "OpenDev (Meeting topic: infra)" | 19:05 | |
clarkb | The service coordination nominations period has finished. | 19:05 |
clarkb | I didn't see anyone else volunteer by the weekend so I put my name in | 19:05 |
clarkb | #link http://lists.opendev.org/pipermail/service-discuss/2021-January/000161.html Clarkb appears to be the only nomination | 19:05 |
clarkb | I haven't seen any other since I made mine either. | 19:06 |
clarkb | I think that means I'm it again, but if I missed something please call it out :) | 19:06 |
clarkb | Since last week's meeting I've done a bit of work on the Gerrit account inconsistencies problem | 19:07 |
clarkb | #link https://etherpad.opendev.org/p/gerrit-user-consistency-2021 High level notes. | 19:07 |
clarkb | I've started to try and keep high level notes there while keeping the PII out of the etherpad | 19:07 |
clarkb | Group problems and 81 accounts with preferred emails missing external ids have been fixed. | 19:07 |
clarkb | thank you fungi for being an extra set of eyes while I worked through ^ | 19:08 |
fungi | any time! | 19:08 |
clarkb | We have 28 accounts with preferred email addresses that don't have a matching external id | 19:08 |
clarkb | We have ~642 accounts with conflicting emails in their external ids. This needs more investigating to better understand the fix for them. | 19:08 |
clarkb | Need to correct the ~642 external id issues before we can push updates to refs/meta/external-ids with Gerrit online. | 19:08 |
clarkb | Workaround is we can stop gerrit, push to external ids directly, reindex accounts (and groups?), start gerrit, then clear accounts caches (and groups caches?) | 19:08 |
clarkb | I've given the next set of steps some thought and I think roughly it is: | 19:08 |
clarkb | Classify users further into situation groups | 19:09 |
clarkb | Decide on next steps for users depending on their situation group | 19:09 |
clarkb | Fix the preferred email issue if possible as this can be done with gerrit online | 19:09 |
clarkb | Start a refs/meta/external-ids checkout in a shared location and begin committing fixes to it. If we can't push all the fixes as separate commits we can squash them together and then push. | 19:09 |
clarkb | that might be broken down further to do all the preferred email issues first as we can correct them online. Then do the external ids | 19:09 |
zbr | clarkb: does this mean manually investigating and patching >600 accounts? | 19:09 |
clarkb | Another upside to doing it ^ that way is I expect some of the external id fixes will result in preferred email issues in the account side. If we fix the existing issues first we won't confuse them with any new ones we introduce | 19:10 |
clarkb | zbr: yes | 19:10 |
fungi | probably semi-scripted at least | 19:10 |
fungi | and with distinct classifications, some of them may be quick to blow through | 19:11 |
clarkb | right the 81 we fixed already were 95% done with a script once we were satisfied with an earlier pass and classification | 19:11 |
clarkb | depending on how the classification for external ids goes I think downtime to correct a portion of them is an option as well | 19:11 |
clarkb | that will help us ensure that we're making changes that are viable once loaded into gerrit | 19:12 |
clarkb | but I don't want to make too strong of a plan for those until we start actually committing changes to that shared checkout | 19:12 |
zbr | while this does not sound like a joy, if splitting the work can speed things up, i may give it a try. | 19:12 |
clarkb | zbr: part of the problem here is that it is all PII so I think we need to be careful who we give access to. Currently it is just gerrit admins | 19:13 |
clarkb | (you need to be admin to access the refs) | 19:13 |
zbr | ah. | 19:13 |
clarkb | anyway, after fixing the first 81 accounts I spent a bit of time doing further classification but then got distracted | 19:14 |
clarkb | I need to pick that back up again. I think that a good chunk of the remaining preferred email issues can be fixed like the first 81 | 19:14 |
clarkb | but I need to actually make those lists and then see if others agree | 19:14 |
clarkb | I'll be trying to pick this back up again this week | 19:14 |
clarkb | Other Gerrit items: | 19:15 |
clarkb | We upgraded Gerrit to ~3.2.7 yesterday to patch a security issue | 19:15 |
clarkb | I also tested that Gerrit's workinprogress state is handled by zuul properly when you approve changes. It appears to ignore workinprogress changes properly now | 19:15 |
clarkb | (we expected it to since the fix was deployed, but needed to test with actual gerrit) | 19:16 |
clarkb | ianw and I have made some improvements to the gerrit testing too. | 19:16 |
clarkb | the selenium stuff is a bit better now and I added a test to check that the x/ clone workaround continues to work | 19:17 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/765021 Build 3.3 images | 19:17 |
clarkb | I also resurrected my change to build 3.3 images. I don't think we're in a hurry to upgrade, doing the OS upgrade first seems like better prioritization, but having working image builds ready for us would be nice | 19:17 |
clarkb | That was what I had for OpenDev and Gerrit things. Anything else to add before we move on? | 19:18 |
ianw | hrm that doesn't run the system-config job? | 19:18 |
clarkb | ianw: no, because currently the system-config job is 3.2 only | 19:18 |
clarkb | I think in a followup I could add a system-config + 3.3 job | 19:18 |
clarkb | or if people prefer can add it to this existing change | 19:19 |
ianw | oh i see totally new jobs :) i think it would be great to run it, either way | 19:19 |
clarkb | ok I'll probably start with a followup change then as that is slightly easier and take it from there | 19:20 |
clarkb | #topic Update Config Management | 19:21 |
*** openstack changes topic to "Update Config Management (Meeting topic: infra)" | 19:21 | |
clarkb | I am not aware of anything to add to this other than maybe the refstack topic which we've got later on in the agenda | 19:21 |
clarkb | Might be worth mentioning that I helped zuul fix an issue in zuul-registry that new buildx exposed. This was affecting our ability to do multiarch builds (things like nodepool builders) | 19:22 |
clarkb | that should be fixed now though. Thank you zbr for calling out the problem | 19:23 |
clarkb | #topic General topics | 19:24 |
*** openstack changes topic to "General topics (Meeting topic: infra)" | 19:24 | |
clarkb | #topic OpenAFS cluster status | 19:24 |
*** openstack changes topic to "OpenAFS cluster status (Meeting topic: infra)" | 19:24 | |
clarkb | Just a quick status check on the openafs cluster. I think we still need to upgrade the db servers? | 19:24 |
clarkb | ianw: fungi: anything else to note about ^ ? | 19:25 |
fungi | that's still the status afaik | 19:25 |
ianw | yeah, i got distracted with other things. high on my todo list :) | 19:26 |
fungi | and then we can think through upgrading operating systems/replacing servers | 19:26 |
clarkb | no worries, just making sure I (and others) are up to date | 19:26 |
clarkb | #topic Bup and Borg Backups | 19:26 |
*** openstack changes topic to "Bup and Borg Backups (Meeting topic: infra)" | 19:26 | |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/773570 | 19:26 |
clarkb | I think that is the latest on borg? Would be good if fungi frickler and corvus could review that one | 19:27 |
zbr | clarkb: i just realized why i did not get notifications from Zuul-jobs-failures mailing list... i did not whitelist the user. | 19:27 |
clarkb | ianw: feel free to fill us in on any and all relevant info for this topic though :) | 19:27 |
ianw | yeah, i'm pretty focused on getting our working set to a reasonable level | 19:28 |
ianw | unfortunately i didn't quite fully grok the implications of --append mode and the particular way borg implements that | 19:28 |
clarkb | (I didn't either) | 19:28 |
ianw | all the details are in the changelog of 773570 | 19:29 |
fungi | i caught some of it last night before i passed out | 19:29 |
ianw | anyway, a better way would be to do something like a rolling set of LVM snapshots on the server side | 19:30 |
fungi | i guess cow wouldn't help there because of the encryption layer | 19:30 |
fungi | or maybe it would, depends on if borg manages to not update most of the blocks when updating the backup | 19:31 |
ianw | we don't encrypt the backups, i think it would be ok | 19:31 |
clarkb | I think cow would be fine the way we're using borg | 19:31 |
fungi | oh, right then | 19:31 |
ianw | anyway we can discuss in the review, but yeah i would like to get this all sorted and running by itself very soon | 19:33 |
clarkb | ++ to getting this sorted soon. I intend on looking at it much closer this afternoon. I want to catch up on the docs and related issues | 19:33 |
clarkb | Anything else or should we move on to the next item? | 19:33 |
ianw | nope, move on | 19:34 |
clarkb | #topic Deploy a new refstack.openstack.org server | 19:34 |
*** openstack changes topic to "Deploy a new refstack.openstack.org server (Meeting topic: infra)" | 19:34 | |
clarkb | kopecmartin: has updated my old change to make a refstack container | 19:34 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/705258 | 19:34 |
clarkb | we think it is just about ready to be landed, then we can spin up a new refstack server on bionic/focal (probably focal), make sure it works ( kopecmartin has volunteered to help with this step ), then migrate the data from the old instance to the new one with a scheduled downtime | 19:35 |
clarkb | I think the main thing we need help with is someone to spin up the new instance, configure dns records, and ensure that LE and ansible and all that are happy | 19:36 |
ianw | happy to help with that | 19:36 |
clarkb | cool. I'm happy to keep helping too, but worried that I'm not in a great spot to drive any single effort right now (as I'm assisting a bunch) | 19:37 |
clarkb | I also expect we may need to learn us a refstack in order to figure out what the migration from old to new server will look like, but I'm 99% sure that can happen once we're happy the new deployment functions at all | 19:37 |
ianw | can it run without DNS pointed to it? | 19:37 |
ianw | i haven't looked but i was imagining it would be a db import? | 19:38 |
clarkb | ianw: you might need to edit your local /etc/hosts to make everything happy but it should | 19:38 |
clarkb | yes I believe it is a db import | 19:38 |
*** hasharDinner is now known as hashar | 19:39 | |
clarkb | kopecmartin: do you know what kind of testing you think would be appropriate here? | 19:39 |
ianw | speaking of, should it be in our backup rotation? | 19:39 |
clarkb | ianw: probably | 19:39 |
kopecmartin | clarkb: click on every button in the UI , try to upload new results files, register a new user maybe .. this kind of things | 19:40 |
clarkb | kopecmartin: got it, general usage | 19:40 |
clarkb | makes sense | 19:40 |
ianw | ok, we can tackle that separately. i'll review 705258 and can try starting something | 19:40 |
clarkb | I expect all that will work if you set /etc/hosts | 19:40 |
kopecmartin | yeah, we dropped py2 support, so i'd like to exercise every function of the site | 19:40 |
kopecmartin | luckily it's not that complex | 19:40 |
clarkb | Thank you ianw for helping out. | 19:41 |
clarkb | Anything else on this topic? | 19:41 |
kopecmartin | not from my side, just let me know if you need anything | 19:41 |
clarkb | will do! | 19:41 |
clarkb | The next two items are on my plate and have been neglected due to other distractions. This is why I'm wary to dive into something new :/ | 19:42 |
clarkb | #topic Picking up steam on Puppet -> Ansible rewrites | 19:42 |
*** openstack changes topic to "Picking up steam on Puppet -> Ansible rewrites (Meeting topic: infra)" | 19:42 | |
clarkb | I have yet to write this etherpad, but I'm hopeful I'll get to it this week. I think it will give us good perspective and ability to prioritize effort | 19:42 |
clarkb | Not really anything else to add to this. Other than thank you to everyone who has continued to push on migrating us off of puppet | 19:43 |
clarkb | #topic inmotion cloud openstack as a service | 19:43 |
*** openstack changes topic to "inmotion cloud openstack as a service (Meeting topic: infra)" | 19:43 | |
clarkb | I'm hoping that tomorrow I can try turning this on and see what happens | 19:43 |
clarkb | If all goes well hopefully we'll be able to expand nodepool's resource pool | 19:44 |
clarkb | it's been a while since I did one of these though so should be interesting to see how it goes | 19:44 |
clarkb | I know they are interested in our feedback too, which always makes it easier when things are weird or not working | 19:45 |
clarkb | #topic Open Discussion | 19:45 |
*** openstack changes topic to "Open Discussion (Meeting topic: infra)" | 19:45 | |
clarkb | Anything else that didn't make it on the agenda that you'd like to bring up? | 19:45 |
fungi | change in vexxhost node memory? | 19:45 |
fungi | something we probably need to keep an eye on, as folks could start merging regressions for memory use more easily | 19:46 |
ianw | i missed that, did it go up or down? | 19:46 |
fungi | or will generally start asking why not all of our nodes have 32gb ram | 19:46 |
clarkb | ianw: it went up to 32GB of memory | 19:46 |
clarkb | the risk is that changes could merge in vexxhost that cannot merge anywhere else | 19:47 |
fungi | #link https://review.opendev.org/773710 Switch to using v3-standard-8 flavors | 19:47 |
ianw | ahh | 19:48 |
clarkb | fungi: piecing together dansmith's question in #openstack-infra and some of what was discussed in #opendev is this also thought to improve io in vexxhost? | 19:48 |
clarkb | or are those separate concerns? | 19:48 |
ianw | i feel like there was at some point something we did booting nodes with like kernel mem= parameters to keep them all the same | 19:48 |
ianw | but that's probably very silly, to have 32gb allocated but artificially limit to only 8 | 19:49 |
fungi | i'm not clear on whether it will improve i/o performance | 19:49 |
fungi | ianw: yeah, that's what i was referring to in my review comment | 19:49 |
clarkb | ianw: ya we did that to avoid the fear that we could merge things in one cloud and then break jobs in all the others | 19:49 |
fungi | also we had to do it in bootloader configuration, which means applying it to all our providers | 19:50 |
clarkb | back then you couldn't reboot with new kernel parameters, you can now so it would be a bit of a bandaid to do it now | 19:50 |
ianw | we could run them as static nodes and do a 1:4 reverse split :) | 19:51 |
clarkb | that is an interesting thought, static nodes seem like pain though :) | 19:52 |
clarkb | maybe converting the set to a large k8s cluster and then scheduling into that with nodepool would make sense if we found infinite time somewhere :) | 19:53 |
ianw | we'll see how the bare-metal cloud thing works out :) | 19:53 |
clarkb | definitely worth a brainstorm to think about other ways of slicing them | 19:53 |
clarkb | I'll think it through on tomorrow's bike ride :) | 19:53 |
clarkb | or try to anyway, it's probably going to be cold and my brain won't work | 19:53 |
ianw | yeah the k8s cluster is probably actually a pretty sane thing to think about | 19:54 |
clarkb | sounds like that may be all. Thank you everyone | 19:57 |
clarkb | #endmeeting | 19:57 |
*** openstack changes topic to "Incident management and meetings for the OpenDev sysadmins; normal discussions are in #opendev" | 19:57 | |
openstack | Meeting ended Tue Feb 2 19:57:12 2021 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:57 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-02-02-19.01.html | 19:57 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-02-02-19.01.txt | 19:57 |
openstack | Log: http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-02-02-19.01.log.html | 19:57 |
fungi | thanks clarkb! | 19:57 |
*** hashar has quit IRC | 21:40 | |
*** diablo_rojo has joined #opendev-meeting | 22:53 |