clarkb | Anyone else here for the meeting? | 19:00 |
ianw | o/ | 19:01 |
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue Jun 29 19:01:06 2021 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link http://lists.opendev.org/pipermail/service-discuss/2021-June/000262.html Our Agenda | 19:01 |
clarkb | #topic Announcements | 19:01 |
clarkb | No real announcements other than that my life is returning to its normally scheduled day-to-day, so I'll be around at typical times now | 19:01 |
clarkb | The one exception to that is that Monday is apparently an observed holiday here | 19:02 |
fungi | yes, the one where citizens endeavor to celebrate the independence of their nation by blowing up a small piece of it | 19:02 |
diablo_rojo | o/ | 19:02 |
fungi | always a fun occasion | 19:02 |
clarkb | fungi: yup, but also this year I think we are declaring the pandemic is over here and we should remove all precautions | 19:02 |
fungi | blowing up in more ways than one, in that case | 19:03 |
clarkb | But I'll be around Tuesday and we'll have a meeting as usual | 19:03 |
clarkb | #topic Actions from last meeting | 19:03 |
clarkb | #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-06-15-19.01.txt minutes from last meeting | 19:03 |
clarkb | I have not brought up the ELK situation with openstack leadership yet. diablo_rojo fyi I intend to do that when I find time in the near future, mostly just to plan out what we do next as far as the wind-down goes | 19:04 |
clarkb | #action clarkb Followup with OpenStack on ELK retirement | 19:04 |
clarkb | ianw: have ppc packages been cleaned up from centos mirrors? | 19:04 |
fungi | though they're presenting a call to action for it to the board of directors tomorrow | 19:04 |
diablo_rojo | makes sense to me. | 19:05 |
fungi | (the elastic recheck support request, i mean) | 19:05 |
clarkb | fungi: yup, I don't think we have to say "it's turning off tomorrow"; more of a "we are doing these things, you are doing those things; when is a reasonable time to say it's dead or not" | 19:05 |
clarkb | and start to create the longer term expectations | 19:05 |
ianw | clarkb: yep, that was done https://review.opendev.org/c/opendev/system-config/+/797365 | 19:05 |
clarkb | ianw: excellent thanks! | 19:06 |
clarkb | and I don't think a spec for a Prometheus replacement for Cacti has been written yet either. I'm mostly keeping this on the list because I think it is a good idea and keeping it visible can only help make it happen :) | 19:06 |
clarkb | #action someone write spec to replace Cacti with Prometheus | 19:06 |
fungi | also, while it didn't get flagged as an action item it effectively was one: | 19:07 |
fungi | #link https://review.opendev.org/797990 Stop updating Gerrit RDBMS for repo renames | 19:07 |
fungi | now i can stop forgetting to remember to do that | 19:07 |
clarkb | fungi: great, I'll have to give that a review (I've been on a review push myself the last few days trying to catch up on all the awesome work everyone has been doing) | 19:07 |
clarkb | #topic Topics | 19:08 |
clarkb | #topic Eavesdrop and Limnoria | 19:08 |
clarkb | We discovered there was a bug in the channel log conversion from raw text logs to html that may have explained the lag people noticed in those files | 19:08 |
clarkb | basically we ran the conversion once an hour instead of every 15 minutes. Fungi wrote a fix for that. | 19:09 |
fungi | and it merged | 19:09 |
fungi | so should be back to behaving normally now | 19:09 |
clarkb | Would be good to keep an eye out for any new reports of lag in those logs, but I think we can call it fixed now based on what we saw timestamp wise yesterday | 19:09 |
clarkb | ++ | 19:09 |
ianw | sorry about that, missed a * from the old job :/ | 19:09 |
fungi | that was the new lag, by the way, the old lag before that was related to flushing files | 19:10 |
fungi | so we actually had two lag sources playing off one another | 19:10 |
clarkb | ah cool I wasn't sure if we saw lag in the text files previously or only html | 19:10 |
clarkb | the text files seemed happy yesterday when we looked, at least, and then we fixed the html side | 19:10 |
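For illustration, the hourly-versus-every-15-minutes mixup ianw describes comes down to a single crontab field; a hypothetical sketch (the script path is invented, and the real job is defined in system-config):

```
# buggy entry: runs once an hour, at minute 0
0 * * * *    /usr/local/bin/irc-logs-to-html
# intended entry: runs every 15 minutes
*/15 * * * * /usr/local/bin/irc-logs-to-html
```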
clarkb | #topic Gerrit Account Cleanup | 19:11 |
clarkb | I'm hoping to find time for this among everything else and deactivate those accounts whose external ids we'll delete later | 19:11 |
clarkb | fungi: you started to look at that more closely, have you had a chance to do a sufficient sampling to be comfortable with the list? | 19:12 |
fungi | yes, my spot-checking didn't turn up any concerns | 19:13 |
clarkb | great, I'll try to pencil this in for the end of the week and do the account retirement/deactivation; then in a few weeks we can do the external id deletions for all those that don't complain (and none should) | 19:13 |
clarkb | #topic Review Upgrade | 19:14 |
clarkb | #link https://etherpad.opendev.org/p/gerrit-upgrade-2021 Upgrade Checklist | 19:14 |
clarkb | The agenda says this document is ready for review. infra-root please take a look at it | 19:14 |
clarkb | ianw: does the ipv6 problem that recently happened put a pause on this while we sort that out? | 19:14 |
ianw | i'm not sure, i rebooted the host and it came back | 19:15 |
ianw | i know there was some known issue at some point prior that required a reboot; it might have been broken then | 19:16 |
clarkb | ianw: the issue that happened on the cloud side? | 19:16 |
clarkb | Considering that we do have the option of removing the AAAA record from DNS temporarily if necessary, I suspect this isn't critical. But others may feel more strongly about ipv6 | 19:16 |
fungi | there was a host outage and reboot/migration forced at one point, but i don't recall how long ago | 19:17 |
fungi | and probably didn't track it closely since the server was not yet in production | 19:17 |
ianw | right, that feels like the sort of thing that duplicate addresses might pop up in | 19:17 |
clarkb | it happened the weekend after I did all those focal reboots | 19:17 |
clarkb | I remember because I delayed review02s reboot and then vexxhost took care of it for me :) | 19:17 |
fungi | ahh, right, mnaser let us know about it, could find a more precise time in irc logs | 19:18 |
clarkb | and ya that seems like a possibility if there was a migration with two instances out there fighting over arp | 19:18 |
clarkb | (or even just not properly flushing the router's tables first) | 19:18 |
fungi | well, dad (duplicate address detection) operates on the server seeing evidence of a conflict | 19:19 |
fungi | so presumably there really were two systems trying to use the same v6 address at the same moment | 19:19 |
clarkb | got it | 19:19 |
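As a hedged aside, on a Linux guest an address that lost duplicate address detection gets flagged "dadfailed", so a quick check on the server would look something like this (the interface name is an assumption):

```
# list IPv6 addresses that failed duplicate address detection;
# "ens3" is an assumed interface name
ip -6 addr show dev ens3 | grep -i dadfailed
```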
ianw | anyway, if we can work on that checklist, i'm happy to maybe do this on a .au monday morning. that's usually a very quiet time | 19:19 |
ianw | i'm not sure if we could be ready for the 5th, but that would be even quieter | 19:20 |
fungi | will do, thanks! | 19:20 |
clarkb | ianw: yup I'll need to add that to my list of reviews for today. And I can usually do .au morning as well, since that overlaps with my afternoon/evening without too much pain | 19:20 |
clarkb | ianw: I think your suggested date of the 19th is probably reasonable | 19:20 |
clarkb | that way we can announce it with a couple of weeks of notice too (so that firewall rules can be updated in various places if necessary) | 19:20 |
clarkb | maybe plan to send that out in a couple of days after we have a chance to double check your checklist | 19:21 |
ianw | the 12th maybe too, although i'll be out a day or two before that (still deciding on plans wrt. lockdowns, etc.) | 19:21 |
clarkb | I like giving a bit of notice for this and 19th feels like a good balance between too little and too much | 19:22 |
clarkb | infra-root ^ feel free to weigh in though | 19:22 |
fungi | in the past we've announced the new ip addresses somewhat in advance | 19:23 |
clarkb | yes in the past we've tried to do ~4 weeks iirc | 19:23 |
fungi | since a number of companies maintain firewall exceptions allowing their employees or ci systems to connect | 19:23 |
clarkb | but we also had more companies with strict firewall rules than we have today (or at least they don't complain as much anymore) | 19:23 |
ianw | ok, i can construct a notification for that soon then, as i don't see any reason we'll change the ip, and reverse dns is set up too | 19:23 |
fungi | right, i do think 4 weeks is probably excessive today | 19:24 |
fungi | but if we can give them a heads up, sooner would be better than later | 19:24 |
clarkb | ++ | 19:24 |
clarkb | we could even advertise the new IPs with a no sooner than X date | 19:24 |
clarkb | then they can add firewall rules and keep the old one in place until we do the switch | 19:25 |
clarkb | but the 19th seems like a good option to me. | 19:25 |
clarkb | we should cross-check against release schedules for various projects, but I think that is a relatively quiet time | 19:25 |
clarkb | Anything else on the review upgrade topic? | 19:25 |
ianw | not really, i just want to get the checklist as detailed as possible | 19:26 |
fungi | i got nothin' | 19:26 |
fungi | thanks for organizing this, ianw! | 19:26 |
clarkb | #topic Listserv upgrades | 19:26 |
clarkb | ++ thanks! | 19:26 |
clarkb | I've somewhat stalled out on this and worry I've got a number of other tasks that are just as or more important fighting for time | 19:27 |
clarkb | If anyone else wants to boot the test node and run through an upgrade on it, I've already started notes on an etherpad somewhere that I should dig up again. But if not I'll keep this on my list and try to get to it when I can | 19:27 |
clarkb | Mostly this is a heads up that I'm probably not getting to it this week. Hopefully next | 19:28 |
clarkb | #topic Draft matrix spec | 19:28 |
clarkb | #link https://review.opendev.org/796156 Draft matrix spec | 19:28 |
clarkb | I reached out to EMS (element matrix services) today through a contact that corvus had | 19:28 |
clarkb | Their day was largely already over but they said they will try to schedule a call with me tomorrow. | 19:29 |
clarkb | I suspect that corvus would be interested in being on that call. Is anyone else interested too? We'll be overlapping with the Pacific timezone and Europe so the window for that isn't very large | 19:29 |
corvus | thanks! i'm hoping we can narrow the options down and revise the spec with something more concrete there | 19:30 |
clarkb | I suspect this initial conversation will be super high level and not incredibly important for everyone to be on. But I'm happy to include others if there is interest | 19:30 |
clarkb | corvus: ++ | 19:30 |
fungi | i can be on the call, but am happy to entrust the discussion to the two of you | 19:31 |
clarkb | alright I'll see what they say schedule wise tomorrow | 19:32 |
clarkb | #topic gitea01 backups | 19:32 |
clarkb | Not sure if anyone has looked into this yet but gitea01 seems to be failing to back up to one of our two backup targets | 19:32 |
ianw | is it somewhat random? | 19:32 |
clarkb | Thought I would bring it up here to ensure it wasn't forgotten. I don't think this is super urgent as we haven't made any recent project renames (which would update the db tables that we want to backup) | 19:33 |
fungi | i haven't checked the logs, just noticed the notifications to the root inbox | 19:33 |
clarkb | ianw: no it seems to happen consistently each day | 19:33 |
fungi | seems like it's consistently every day | 19:33 |
clarkb | the consistency is why I believe only one backup target is affected | 19:33 |
clarkb | (otherwise we'd see multiple timestamps?) | 19:33 |
ianw | i'm sure it's mysql dropping right? | 19:33 |
fungi | appears to have started on 2021-06-12 | 19:33 |
clarkb | ianw: I haven't even dug in that far, but probably a good guess | 19:34 |
mordred | clarkb: (sorry, I'd also love to be on the matrix call, but obviously don't block on me) | 19:34 |
ianw | http://paste.openstack.org/show/807046/ | 19:34 |
fungi | socket timeouts maybe? | 19:35 |
clarkb | mordred: noted | 19:35 |
fungi | i wonder if the connection goes idle waiting on the query to complete | 19:35 |
ianw | but only to the vexxhost backup | 19:35 |
fungi | which implies some router in that path dropping state prematurely | 19:36 |
fungi | or nat if we're doing a vip | 19:36 |
ianw | and this runs in vexxhost, right? so the external further-away rax backup is working | 19:36 |
clarkb | yup gitea01 is in sjc vexxhost | 19:36 |
clarkb | and the mysql is localhost | 19:36 |
fungi | oh, it's vexx-to-vexx dropping? hmm... yeah that's strange | 19:37 |
fungi | and same region presumably | 19:37 |
ianw | 64 bytes from 2604:e100:1:0:f816:3eff:fe83:a5e5 (2604:e100:1:0:f816:3eff:fe83:a5e5): icmp_seq=6 ttl=47 time=72.0 ms | 19:37 |
ianw | 64 bytes from backup01.ord.rax.opendev.org (2001:4801:7825:103:be76:4eff:fe10:1b1): icmp_seq=3 ttl=52 time=49.9 ms | 19:37 |
ianw | the ping to rax seems lower | 19:37 |
fungi | also surprising | 19:37 |
clarkb | if the backup server is in montreal then that would make sense | 19:38 |
clarkb | since ord is slightly closer to sjc than montreal | 19:38 |
clarkb | anyway we don't have to do live debugging in the meeting. I just wanted to bring it up as a not super urgent issue but one that should probably be addressed | 19:38 |
clarkb | (the db backups in both sites should be complete until we do a project rename) | 19:39 |
fungi | i thought he was saying that higher rtt was locally within vexxhost | 19:39 |
fungi | but yeah, we can dig into it after the meeting | 19:39 |
clarkb | as it is project renames that update the redirects which live in the db | 19:39 |
ianw | this streams the output of mysqldump directly to the server | 19:39 |
clarkb | #topic Scheduling Project Renames | 19:39 |
ianw | so if anyone knows any timeout options for that, let me know :) | 19:40 |
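A minimal sketch of the streaming pattern ianw describes, with ssh keepalives as one hedged guess at a timeout mitigation; the host name, database name, and destination here are illustrative, not the real backup job:

```
# stream the dump straight to the backup server instead of staging it
# locally; ServerAliveInterval/ServerAliveCountMax keep an otherwise
# idle ssh connection from being dropped mid-dump
mysqldump --single-transaction gitea \
  | ssh -o ServerAliveInterval=15 -o ServerAliveCountMax=8 \
        borg@backup-host.example.org 'cat > gitea01-db.sql'
```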
clarkb | Let's move on and then we can discuss further at the end, or eat lunch/breakfast/dinner :) | 19:40 |
fungi | in theory we can "just do it" now that the rename playbook no longer tries to update the nonexistent mysql db | 19:40 |
clarkb | For project renames, do we want to try to incorporate that into the server move? My preference would be that we do the renames the week after, once we're settled into the new server, and not try to overdo it | 19:40 |
fungi | i don't think we had any other pending blockers besides actual scheduling anyway | 19:40 |
clarkb | fungi linked one of the changes we need in order to do renames | 19:40 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/797990/ | 19:41 |
fungi | yeah, once that merges i mean | 19:41 |
clarkb | Anyone have a concern with doing the renames a week after the move? | 19:41 |
clarkb | That should probably be enough time to be settled in on the new server and if not we can always reschedule | 19:42 |
ianw | ++ | 19:42 |
fungi | wfm | 19:42 |
clarkb | but that gives us a time frame to tell people to get their requests in for | 19:42 |
clarkb | great | 19:42 |
fungi | and also a window to do any non-urgent post-move config tweaks | 19:42 |
clarkb | ++ | 19:42 |
fungi | in case we spot things which need adjusting | 19:42 |
clarkb | #topic Open Discussion | 19:43 |
clarkb | Anything else to bring up? | 19:44 |
diablo_rojo | I think I have the container mostly set up for the ptgbot? | 19:44 |
clarkb | diablo_rojo: oh cool are there changes that need review? | 19:44 |
diablo_rojo | oh. failing zuul though. | 19:44 |
fungi | on the oftc migration wrap-up, i have an infra manual change which needs reviewing: | 19:45 |
diablo_rojo | clarkb, just the one kinda? I haven't written the role yet for it. Started with setting up the container | 19:45 |
fungi | #link https://review.opendev.org/797531 Switch docs from referencing Freenode to OFTC | 19:45 |
clarkb | diablo_rojo: have a link? | 19:45 |
diablo_rojo | https://review.opendev.org/c/openstack/ptgbot/+/798025 | 19:45 |
clarkb | great I'll try to take a look at that change too. Feel free to reach out about the failures too | 19:46 |
clarkb | fungi: that looks like a good one to get in ASAP to avoid any additional confusion it may be causing | 19:47 |
fungi | there was some discussion between other reviewers about adjustments, so more feedback around those for preferences would be appreciated | 19:47 |
ianw | diablo_rojo: i think you've got an openstack that should be an opendev at first glance : FileNotFoundError: [Errno 2] No such file or directory: '/home/zuul/src/openstack.org/opendev/ptgbot' | 19:48 |
diablo_rojo | Oh I thought I had that as opendev originally. | 19:48 |
diablo_rojo | I can change that back | 19:48 |
ianw | i think it has a high chance of working with that | 19:49 |
diablo_rojo | Sweet. | 19:49 |
diablo_rojo | Will do that now. | 19:49 |
ianw | speaking of building images for external projects | 19:49 |
ianw | #link https://review.opendev.org/c/openstack/project-config/+/798413 | 19:49 |
ianw | is there a reason lodgeit isn't in openstack? i can't reference its image build jobs from system-config jobs, so can't do a speculative build of the image | 19:50 |
fungi | yeah, the ptgbot repo is openstack/ptgbot | 19:50 |
fungi | the puppet-ptgbot repo we'll be retiring is opendev/puppet-ptgbot | 19:50 |
fungi | different namespaces | 19:50 |
ianw | yeah, i think "opendev.org/openstack/ptgbot" is the path | 19:50 |
clarkb | ianw: no I think it was one of the very first moves out to opendev and we probably just figured it was fine to be completely separate | 19:50 |
clarkb | ianw: we've learned some stuff since then | 19:50 |
ianw | ok, if we could add it with that review that would be helpful :) | 19:51 |
ianw | #link https://review.opendev.org/c/opendev/system-config/+/798400 | 19:51 |
clarkb | ianw: you may need a null include for that repo though | 19:51 |
clarkb | ianw: since its jobs are expected to be handled in the opendev tenant | 19:51 |
clarkb | include: [] is what we do for gerrit just above in your change | 19:51 |
clarkb | corvus: ^ can probably confirm that | 19:51 |
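A hedged sketch of the include: [] pattern clarkb references, as it might appear in a Zuul tenant config (the tenant layout and repo placement here are illustrative):

```yaml
# load the repo so its image build jobs can be used speculatively, but
# take no job definitions from it in this tenant
- tenant:
    name: openstack
    source:
      gerrit:
        untrusted-projects:
          - opendev/lodgeit:
              include: []
```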
fungi | yeah, i think the expectation was that the rest would be moving to the opendev tenant in time, and then we could interlink them | 19:52 |
ianw | it is working: https://104.130.239.208/ is a held node | 19:52 |
fungi | i've managed to move some more leaf repos into the opendev tenant, but things heavily integrated with system-config or openstack/project-config are harder | 19:52 |
ianw | but there is some sort of db timeout weirdness. when you submit, you can see in the network window it gets redirected to the new paste but then it seems to take 60s for the query to return | 19:53 |
ianw | i'm not yet sure if it's my janky hand-crafted server there or something systematic | 19:53 |
ianw | suggestions welcome | 19:53 |
clarkb | ianw: if you hack /etc/hosts locally wouldn't that avoid any redirect problems? | 19:54 |
clarkb | might help isolate things a bit. But I doubt that is a solution | 19:54 |
ianw | i don't think it is name resolution; it really seems like the db, or something in sqlalchemy, takes that long to return | 19:55 |
ianw | but then it does, and further queries work fine | 19:55 |
clarkb | it only happens the first time? | 19:55 |
clarkb | We are just about at time. I need lunch and then I have a large stack of changes and etherpads to review :) Thank you everyone! We'll be back here same time and place next week. As always feel free to reach out to us anytime on the mailing list or in #opendev | 19:56 |
ianw | when you paste a new ... paste. anyway, yeah, chat in #opendev | 19:56 |
fungi | thanks clarkb! | 19:57 |
clarkb | ya sorry, realized we should move along (not going to lie in part because I am now very hungry :) ) | 19:57 |
clarkb | #endmeeting | 19:57 |
opendevmeet | Meeting ended Tue Jun 29 19:57:29 2021 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:57 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2021/infra.2021-06-29-19.01.html | 19:57 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2021/infra.2021-06-29-19.01.txt | 19:57 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2021/infra.2021-06-29-19.01.log.html | 19:57 |