clarkb | Just about meeting time. We'll get started shortly | 18:59 |
---|---|---|
ianw | o/ | 19:00 |
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue Jun 14 19:01:09 2022 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link https://lists.opendev.org/pipermail/service-discuss/2022-June/000339.html Our Agenda | 19:01 |
clarkb | #topic Announcements | 19:01 |
clarkb | I had none | 19:01 |
clarkb | There were no actions last meeting either so we can dive right into the agenda | 19:02 |
clarkb | #topic Topics | 19:02 |
clarkb | #topic Improving CD throughput | 19:02 |
clarkb | We worked through the issues with the zuul cluster upgrade and reboot playbook and managed to run it to completion without error | 19:03 |
clarkb | The next step there is to run it automatically. It took about 18 hours to complete so I figure a daily cron with some sort of locking mechanism is appropriate. Any concern with getting that set up? | 19:03 |
frickler | I'd think maybe once a week would be often enough? | 19:04 |
frickler | running this 3/4 of the time seems a bit much to me | 19:04 |
clarkb | ya we don't have to do it as often as possible either | 19:04 |
clarkb | In that case maybe a weekend cron to do it when zuul is under the least load. I can work on that | 19:05 |
fungi | probably one more manual run is in order before we turn on a cronjob too | 19:05 |
clarkb | ++ | 19:05 |
fungi | i'm happy to run it, e.g., tomorrow | 19:06 |
clarkb | thanks | 19:06 |
fungi | this week is pretty quiet, so may go faster and also probably less impact if something does go wrong | 19:06 |
clarkb | sounds like a plan. Anything else on this topic? | 19:07 |
clarkb | #topic Gerrit 3.5 upgrade planning | 19:09 |
clarkb | ianw: are we still on track for doing this your monday (sunday utc)? | 19:09 |
ianw | yes i think so | 19:10 |
clarkb | I ended up pushing a change for the collision checking config, but in the process realized the default is to enable it so that bit is less urgent than I thought it was | 19:10 |
ianw | couple of config todo's but i'll get that done soon | 19:10 |
ianw | ++, sorry haven't checked review queues just yet but sounds good | 19:11 |
clarkb | I guess let us know if we need to review anything or go over the process. I was planning to look at the etherpad more closely again, but this upgrade very closely resembles the 3.4 upgrade iirc | 19:11 |
clarkb | the next one to 3.6 is a bit more involved but we aren't going that far | 19:12 |
clarkb | #link https://etherpad.opendev.org/p/gerrit-upgrade-3.5 | 19:12 |
clarkb | anything else to call out before the weekend upgrade? | 19:13 |
ianw | nope, as you say this one doesn't seem too involved | 19:13 |
clarkb | #topic Changing our default ansible version in Zuul | 19:14 |
clarkb | I meant to send email about this but then summit travel and prep ended up being too distracting. | 19:14 |
frickler | I also forgot about kolla testing with all the other brokenness | 19:15 |
clarkb | Do we think two weeks notice if I send an email this week is sufficient for flipping to ansible v5 by default at the end of june or should we do it in july? | 19:15 |
fungi | seems reasonable. set it much longer and openstack's release cycle will be too far along | 19:15 |
frickler | I think it is o.k., I don't expect many people to act before it happens | 19:16 |
fungi | agreed | 19:16 |
frickler | and that, too | 19:16 |
fungi | well, the two are directly related ;) | 19:16 |
clarkb | ok I'll plan to send notice of that changing June 30 then (it's a thursday so that gives people time before the weekend to look at brokenness) | 19:16 |
fungi | thanks! | 19:16 |
clarkb | #topic Enable webapp on nodepool launchers | 19:17 |
clarkb | frickler: I think you added this one. I did want to point out we do run a webserver on the builders | 19:17 |
frickler | yes, I came across that while looking at how to check a stuck image build | 19:17 |
clarkb | But I think you're looking for access to the newer launcher api stuff | 19:17 |
frickler | the webserver only serves logs and images right now iiuc | 19:17 |
frickler | we could add the couple of special URLs that the api serves to it | 19:18 |
frickler | and then have a data source to check image builds quite easily | 19:18 |
clarkb | ya I think adding that is fine and a good idea | 19:18 |
frickler | do we need a spec? otherwise I could just hack up a patch I think | 19:18 |
clarkb | I don't think we need a spec. We already have a webserver in place and there isn't any privileged info | 19:19 |
clarkb | just a matter of adding the webserver to the launchers and wiring it up to the api bits | 19:19 |
clarkb | (no new servers, no new security concerns, no new dns records, etc pretty straightforward) | 19:19 |
ianw | my theory with this was that we should be able to see from a dashboard like ... | 19:20 |
clarkb | the zuul dashboard does expose nodes and labels but not the images | 19:20 |
ianw | https://grafana.opendev.org/d/f3089338b3/nodepool-dib-status?orgId=1 | 19:20 |
ianw | i have to admit i haven't looked at that in a while, and now it has a big *green* FAILED | 19:21 |
frickler | oh, I didn't know that page | 19:21 |
clarkb | oh ya I don't recall knowing that existed | 19:21 |
ianw | grafana has ways to alert us of issues, but we've never quite managed to get consensus on actually turning that on | 19:22 |
frickler | maybe if we manage to make "failed" red, that's already all we need | 19:22 |
frickler | just for comparison, this is an example of what the api result looks like https://paste.opendev.org/show/bwHPkLhxzyARMsOryUyV/ | 19:22 |
frickler | but this is also maybe something to shortly talk about | 19:23 |
frickler | arm64 builds are broken, haven't checked yet why | 19:23 |
clarkb | I don't think it hurts to have the information available directly via the api too if we still want to add that | 19:24 |
ianw | yeah i saw that note, thanks, sorry i've been out a few days but will look into it | 19:24 |
frickler | and centos9 waits for a dib release which is difficult because there is a nasty workaround merged | 19:24 |
clarkb | but I agree the dashboard is likely more generally a better way to consume it | 19:24 |
clarkb | I'm on my laptop keyboard and my typing is extra bad | 19:24 |
frickler | I'll try to get the API working anyway, yes | 19:24 |
ianw | yeah i'm hoping the centos 9 packages have been fixed in the last few days | 19:24 |
frickler | and the other thing is wheels haven't been published for 14 days, I think also due to centos9 | 19:25 |
clarkb | ya the afs packaging is sensitive to booting on current kernels so when the images get delayed wheels get delayed | 19:26 |
clarkb | I wonder if we need to only publish if all jobs pass though | 19:26 |
clarkb | and instead just publish whatever we've built | 19:26 |
ianw | yeah, that's been a constant issue; not sure if we have a "finally" type zuul dependency? | 19:27 |
frickler | or make arch specific publishing? | 19:27 |
fungi | i suppose that's safe, it shouldn't create a wheel if building that wheel fails, so we're probably not going to be more likely to publish broken wheels that way at least | 19:27 |
clarkb | fungi: yup exactly. If we write a wheel it should be fine to publish | 19:27 |
fungi | that said, we're more likely to not notice it's broken if we do that | 19:28 |
ianw | https://opendev.org/openstack/project-config/src/branch/master/zuul.d/projects.yaml#L4811 is where it is released | 19:28 |
fungi | er, more likely to not notice we've started failing to build some wheels i mean | 19:28 |
clarkb | fungi: ya I think that is the balance. Is it better to hold everything up and probably notice or do best effort and maybe not notice as quickly | 19:28 |
ianw | (also, grafana monitors this, and i would also be happy for it to push me notifications it was broken) | 19:29 |
clarkb | ianw: in the past we've said making notifications like that opt in would be fine. I think I'm also ok with sending them to an infra-root@ folder | 19:30 |
clarkb | I would probably consume them ^ that way | 19:30 |
clarkb | (we just want to avoid people getting middle of the night pages and feeling obligated to do something, but an alert that can be checked in the morning is something I would find helpful) | 19:30 |
fungi | yes, my position on it is that notifications of what's broken are fine, as long as we don't set expectations that someone is necessarily going to address whatever we're being notified about, and as long as the false failure rate isn't significant | 19:31 |
fungi | we already do it for cronjobs, expiring ssl certs, et cetera | 19:31 |
ianw | https://review.opendev.org/c/opendev/system-config/+/573183/ was in this area | 19:32 |
clarkb | I think I would avoid irc (at least to start) and do email if we can | 19:34 |
clarkb | simply because it is easier to "subscribe" with email | 19:34 |
clarkb | (though most irc clients will let you filter stuff out too) | 19:34 |
clarkb | but ya I think if we can make grafana send us an email to infra-root@ and elsewhere that would work | 19:34 |
ianw | https://meetings.opendev.org/irclogs/%23openstack-infra/%23openstack-infra.2018-06-07.log.html#t2018-06-07T23:43:25 was some discussion on it | 19:36 |
ianw | at the time i accidentally left a test server alerting to #openstack-infra, which probably had people starting from a base of "already annoyed" :) | 19:37 |
fungi | hah | 19:37 |
frickler | we might use a dedicated channel then. but I'm also not against mail | 19:38 |
clarkb | ya a dedicated channel would be the other method. Then I just want join that channel on my phone :) | 19:38 |
ianw | also, this might go into another point of contention on this as well, which is i'm not sure exactly how to set it up, but i feel like grafyaml may not support it | 19:39 |
clarkb | if these are things we can add to specific graphs it may work with grafyaml as is | 19:40 |
clarkb | anyway we have one more agenda item to get to. We don't need to design this here. It may be worth a specific agenda item or a spec/email thread for future discussion though | 19:42 |
clarkb | #topic Running a URL shortener | 19:42 |
clarkb | frickler pointed out that people use services like bit.ly | 19:42 |
frickler | another thing I came up with, yes | 19:43 |
clarkb | #topic https://opensource.com/article/18/7/apache-url-shortener an open source alternative we could host | 19:43 |
frickler | and seeing that apache2 has everything one needs was new to me | 19:43 |
clarkb | I'm not opposed and this seems like the sort of thing that would fit in well on static.o.o | 19:44 |
ianw | i guess my concern is that it seems to be a target for abuse, isn't that why github killed "git.io"? | 19:44 |
clarkb | ianw: in this case I think you'd have to modify a file via gerrit, it wouldn't be self service | 19:44 |
frickler | well we would still have reviews in front of the data | 19:44 |
frickler | I would do it within project-config for simplicity, but we could also use a dedicated repo if you prefer | 19:45 |
fungi | yeah, the main concern i have is that this is something we'd probably have to commit to maintain ~forever or else break people's external links | 19:45 |
fungi | however, it does seem like a pretty lightweight thing | 19:45 |
ianw | oh, so basically just a vhost with a list of 301 redirects? | 19:45 |
clarkb | ianw: ya | 19:45 |
clarkb | it is simple enough that fungi's concern doesn't seem to be a major thing. If we had to run a proper wsgi service or similar I'd think differently | 19:46 |
frickler | RewriteMap shortlinks txt:/data/web/shortlink/links.txt RewriteRule ^/(.+)$ ${shortlinks:$1} [R=temp,L] | 19:46 |
ianw | that's what a large part of static.o.o is anyway :) | 19:46 |
fungi | agreed | 19:46 |
ianw | i certainly don't have an issue if it's just an easy-to-update config file that goes through review | 19:46 |
fungi | for the sites we already host, we do similar things, e.g. zuul-ci.org/start | 19:47 |
frickler | then another question would be whether e.g. l.opendev.org is short enough or we want to grab a shorter domain | 19:47 |
frickler | I reserved od42.de just in case, but not sure if everyone would be fine using a .de domain | 19:48 |
clarkb | using another domain typically adds another level of management with the registrar service | 19:48 |
ianw | i always find it weird that these things use what i generally don't consider stable countries as a top-level domain | 19:49 |
clarkb | it's not impossible but avoiding that if possible is likely a good idea | 19:49 |
fungi | ianw: .io is a pet peeve of mine, yeah | 19:49 |
ianw | something in .dev maybe, but i imagine anything short is unavailable | 19:49 |
fungi | note that .dev is controlled by google too | 19:50 |
fungi | and they have a history of forcing a number of "experimental" features for domains in that tld as a result | 19:50 |
clarkb | my vote is something like l.opendev.org as it is one less thing to manage and I feel that is short enough to work on conference slides for example | 19:51 |
fungi | (where experimental means anything they're considering for tie-ins with chrome) | 19:51 |
frickler | we don't have to decide now, I can start preparing a patch with that anyway | 19:51 |
clarkb | yup we could expand to another domain later if we decide it is necessary | 19:51 |
ianw | ++ i can't imagine we can get any shorter without spending ridiculous amounts of $ anyway | 19:51 |
fungi | the foundation already spent a semi-large amount of money to buy opendev.org off a scalper as it was | 19:52 |
fungi | and reusing a subdomain of opendev.org is also a bit of useful advertising for the collaboratory too | 19:53 |
fungi | "oh opendev, what's that?" | 19:53 |
clarkb | let's open it up to anything else before we run out of time | 19:53 |
clarkb | #topic Open Discussion | 19:54 |
clarkb | anything else? | 19:54 |
frickler | do we want to restrict targets to being opendev related? | 19:54 |
clarkb | frickler: ya I wouldn't use it for arbitrary stuff to avoid that abuse concern ianw brought up | 19:54 |
frickler | anyway, can discuss that once I have a patch | 19:54 |
ianw | #link https://review.opendev.org/c/opendev/system-config/+/845066 | 19:54 |
ianw | that's a doc update for duplicate accounts | 19:55 |
clarkb | ah I'll have to take a look at that one | 19:55 |
ianw | and cleans up some other things | 19:55 |
frickler | I also have a zuul patch if someone gets bored ;) | 19:56 |
frickler | #link https://review.opendev.org/c/zuul/zuul/+/834671 | 19:56 |
ianw | interesting ... do people take anonymous patches? | 19:58 |
frickler | ianw: zuul can only see public data, not everyone publishes that | 19:58 |
frickler | in particular for the email | 19:59 |
clarkb | And we are at time. Thanks everyone. We'll be back here next week | 20:00 |
fungi | thanks clarkb! | 20:00 |
clarkb | #endmeeting | 20:00 |
opendevmeet | Meeting ended Tue Jun 14 20:00:09 2022 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 20:00 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2022/infra.2022-06-14-19.01.html | 20:00 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2022/infra.2022-06-14-19.01.txt | 20:00 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2022/infra.2022-06-14-19.01.log.html | 20:00 |
frickler | thx | 20:00 |
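
Note on the "cron with some sort of locking mechanism" idea from the CD throughput topic: since one run of the playbook took about 18 hours, a scheduled run needs to skip itself if the previous one is still going. The following is only a minimal sketch of one way to do that, using a non-blocking advisory `flock`; the `exclusive_lock` helper and the lock file path are hypothetical and nothing in the meeting says this is how it will be implemented.

```python
import fcntl
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def exclusive_lock(path: str) -> Iterator[bool]:
    """Yield True if we obtained the lock, False if another run holds it.

    Hypothetical helper for illustration; the lock is advisory and is
    released when the block exits or the process dies.
    """
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            # LOCK_NB makes this fail immediately instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            yield False
            return
        try:
            yield True
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


if __name__ == "__main__":
    # The path is an assumption made up for this example.
    with exclusive_lock("/tmp/zuul-reboot.lock") as acquired:
        if acquired:
            print("lock acquired, the playbook would run here")
        else:
            print("previous run still in progress, skipping")
```

A weekly weekend cron entry would simply invoke a wrapper like this; a run that overlaps with a still-running one exits without touching anything.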
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!
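
Note on the URL shortener topic: the `RewriteMap ... txt:` directive frickler pasted looks up the request path in a plain text file of `key target` pairs and answers with a temporary redirect (`R=temp`, i.e. 302). The sketch below imitates that lookup in Python to show what a reviewed `links.txt` would drive. The function names, the sample entry, and the 404 for unknown keys are assumptions of this example, not Apache behaviour or anything agreed in the meeting.

```python
from typing import Dict, Optional, Tuple


def parse_link_map(text: str) -> Dict[str, str]:
    """Parse 'key target' lines; blank lines and '#' comments are ignored."""
    links: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue  # skip malformed entries rather than guessing
        links[parts[0]] = parts[1]
    return links


def resolve(path: str, links: Dict[str, str]) -> Tuple[int, Optional[str]]:
    """Return (status, location) for a request path such as '/agenda'."""
    target = links.get(path.lstrip("/"))
    if target is None:
        # Choice made for this sketch only; not what the Apache rule does.
        return 404, None
    return 302, target


if __name__ == "__main__":
    # Hypothetical map content; a real file would be changed via Gerrit review.
    sample = "# short name, then target\nagenda https://example.org/agenda\n"
    print(resolve("/agenda", parse_link_map(sample)))
```

Because the map is just a text file, restricting targets to opendev-related URLs (as raised in open discussion) could be handled at review time or by a small check over the parsed entries.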