Tuesday, 2022-06-21

clarkbMeeting time19:01
clarkbAnyone else here?19:01
clarkb#startmeeting infra19:01
opendevmeetMeeting started Tue Jun 21 19:01:22 2022 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
opendevmeetThe meeting name has been set to 'infra'19:01
ianwo/19:01
frickler\o19:02
clarkb#link https://lists.opendev.org/pipermail/service-discuss/2022-June/000340.html Our Agenda19:02
clarkb#topic Announcements19:02
clarkbI had no announcements19:03
clarkbAnd there were no actions from last meeting19:03
clarkbThat means we can dive right in19:03
clarkb#topic Topics19:03
clarkb#topic Improving CD throughput19:03
clarkb#link https://review.opendev.org/c/opendev/system-config/+/846195 Running Zuul cluster upgrade playbook automatically.19:03
clarkbThis change is ready for review now. It sets up a cron that fires on the weekend to upgrade and reboot our entire zuul cluster19:04
clarkbfungi: ran the playbook by hand last week and it continued to function. I think we're as ready as we will be to do this, but let me know if you disagree. Also call out any problems with my implementation if you see them19:04
fungiyeah, it completed without issue19:04
fungitook over 24 hours to complete, but weekly seems like a reasonable cadence for that19:05
clarkband it should go quicker when zuul is calmer over the weekend19:06
fungiyes19:06
clarkbanyway we can followup further in review. Just wanted to call out the change exists and has had its -W removed19:06
clarkbAnything else on this topic?19:06
clarkb#topic Glean rpm platform static ipv6 support19:08
clarkbThis is sort of a ninja addition to the agenda, but I realized i never followed up on this whole thing due to travel19:08
clarkbianw: I see all the glean changes ended up merging. I guess we did the releases and things are happy on ovh now?19:08
clarkbIs there anything else that needs to be done or can we consider this completed?19:09
ianwi think that's complete, i haven't heard anything else on it19:09
clarkbgreat. Thank you for taking care of that19:09
ianwcertainly if somebody wants something to do, there could be quite a bit of refactoring done in glean19:09
ianwbut, it works, so if it ain't broke ... :)19:09
clarkb#topic Container Maintenance19:10
clarkbI wanted to call out that we upgraded our first mariadb installation from 10.4 to 10.6 during the gerrit 3.5 upgrade process19:10
clarkbAs far as I can tell that went well. We should probably start thinking about upgrading the DBs for other services too19:11
clarkb#link https://etherpad.opendev.org/p/gerrit-upgrade-3.5 captures mariadb upgrade process19:11
clarkbThis isn't urgent and does require rot access as performed on gerrit. However, there is an env var we can set to have mariadb do it automatically for us if non roots want to help us be brave and push those changes :)19:12
clarkbs/rot/root/19:12
fungiroot1319:13
clarkb#topic Gerrit 3.5 upgrade19:13
clarkbThis happened. Thank you to everyone (ianw in particular) who helped get us here19:13
clarkbThe upgrade itself went very smoothly. We have noticed a couple of issues since then.19:14
clarkb#link https://bugs.chromium.org/p/gerrit/issues/detail?id=16018 Now fixed19:14
fungione of which is already addressed19:14
clarkbyup that one I just linked is fixed upstream and the fix is deployed in opendev19:14
clarkbThe other issue which frickler noticed is that it seems any change marked WorkInProgress also shows up as merge conflicting in a change listing19:14
clarkbI am hoping to have time to look into that probably tomorrow as I suspect that is a bug on the gerrit side where they equate WIP and merge conflict for some reason19:15
clarkbIf you notice other issues please do call them out19:15
fungihas anyone had an opportunity to confirm whether that also happens on 3.6?19:15
frickleractually some other kolla people noticed first19:15
clarkbI have not. The next thing to discuss is removing 3.4 images and adding 3.6 stuff which will make it easier for us to test that19:15
ianw++ having a 3.6 replication in s-c would probably be a good help 19:16
fricklerone other thing that seems new is the highlighting on hovering with the pointer on a word, which I think is very annoying19:16
ianwi feel like it was doing that before; or maybe i'm just thinking of the upstream gerrit19:16
clarkbyes that is new, and yes that is intentional and I haven't figured out if I like it or not yet19:16
clarkbianw: upstream gerrit did it before but 3.5 brought it to us19:16
clarkbI wonder if we can put that behind a config and turn it off19:16
frickleris that configurable somehow?19:16
clarkbfrickler: I looked for user config for it yesterday and couldn't find it19:17
clarkbI think user config would be ideal, but server wide would be acceptable too. I'll have to look at it more19:17
fungii guess that's another behavior gertty is shielding me from19:17
fungisince i don't get what's being described19:17
fungis/get/understand/19:17
clarkbfungi: ya if you mouse over words in the diff view gerrit now highlights all occurrences of that word in the diff19:17
clarkbI personally prefer explicit use of ^F19:17
fungiuh... huh19:17
fricklerin screaming yellow19:17
clarkbnot sure why they added that, but it's definitely something that seems intentional19:18
clarkb#link https://review.opendev.org/q/topic:gerrit-3.4-cleanups19:18
clarkbI've pushed up changes to begin some of the cleanup here. The first two are actually followups to running 3.519:18
clarkb#link https://review.opendev.org/c/opendev/system-config/+/84703419:18
clarkb#link https://review.opendev.org/c/openstack/project-config/+/84705719:19
fungiis ^F another gerrit override? one of the reasons i don't use the webui is that it also implements its own keybindings, which seems to somehow override my browser's keybinds, so for someone who prefers to do keyboard-driven browser navigation it's nigh unworkable19:19
clarkbif we can get those reviewed then the rest of the stack can be checked and landed when we are happy that we are unlikely to revert19:19
clarkbfungi: old gerrit captured ^F but modern polygerrit doesn't; it just uses the browser search function now19:19
clarkbfungi: there are still other shortcut keys though19:19
fungioh, well that's an improvement at least, though my keyboard navigation plugin relies on / to search which is probably still overridden19:20
clarkbI'm thinking maybe next week we remove 3.4 and hopefully by then we've also got the 3.6 jobs working and can land 3.6 quickly after 3.4 is removed19:20
fungiif there were an option to disable all the gerrit webui keybindings, i might consider trying it out again19:20
clarkbfungi: / is not overridden anymore19:20
clarkbbut [ and ] still move forward and backward through a review19:21
clarkboh wait / is but it grabs the normal search bar now not in page text search19:21
clarkbsorry about that, they just changed what search meant in the context of / I guess19:21
fungiyeah. ultimately i see that as the failing of the browser for not giving me an option to just ignore javascript keypress capture19:22
clarkbThe other thing we'll want to monitor is general memory usage by 3.5 since other users have had memory trouble. I suspect we're fine simply because we don't run extra plugins and metric gathering19:23
clarkbBut I'm remembering now that I meant to take a heap usage measurement then compare it daily or something19:23
fungialso the earlier memory capture attempts in zuul didn't show any difference, though obviously there's no production load there19:23
clarkbgerrit show-caches will give us this info I'll run that after the meeting and check it every day at roughly 1900 utc for the next few days19:24
clarkbAnything else gerrit upgrade related to call out before we move on?19:25
fungijust the surprisingly few complaints we've received19:25
fungiit's like hardly anyone even noticed we upgraded19:25
fungisuccess19:25
clarkb++19:25
clarkb#topic Enable Nodepool Launcher Webapp19:27
frickleractually it turns out that it is enabled19:27
fricklerjust not present in the config, which doesn't work in my local setup, but somehow works in the opendev deployment19:27
fricklerso nothing to do there19:28
clarkbI think ianw was also talking about fixing up the grafana dashboard stuff?19:28
fricklerthe grafana page has two issues: the "failed" status isn't shown in red19:28
clarkbto add missing images?19:28
fricklerand some of the timing graphs are missing19:28
clarkboh right the color19:28
ianwyes i did start looking at this ...19:29
fricklerbut I couldn't find a solution for any of this19:29
fungiwhat's the launcher webapp path?19:29
ianwthe reason it doesn't work is because grafana has redone the way it deals with thresholds19:29
ianwand it seems the way that grafyaml writes thresholds is in the old format, that doesn't quite get upgraded properly to the new format, so it doesn't set "red" when the value is "1"19:30
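The threshold mismatch ianw describes can be illustrated with a small sketch. It assumes the older panel style carries thresholds as a comma-separated string with a parallel list of colors, and that newer Grafana panels expect a list of steps in absolute mode. The function name `thresholds_to_steps` is hypothetical, and the field layout should be checked against a dashboard exported from the Grafana version actually deployed. This is not grafyaml's implementation.

```python
import json


def thresholds_to_steps(thresholds, colors):
    """Turn an old-style threshold string into a new-style step list.

    Assumption: `thresholds` looks like "1" or "1,2" and `colors` has
    exactly one more entry than there are threshold values.
    """
    values = [float(v) for v in thresholds.split(",") if v.strip()]
    if len(colors) != len(values) + 1:
        raise ValueError("need exactly one more color than thresholds")
    # The first step has no lower bound, so its value is null in JSON.
    steps = [{"color": colors[0], "value": None}]
    for color, value in zip(colors[1:], values):
        steps.append({"color": color, "value": value})
    return {"mode": "absolute", "steps": steps}


if __name__ == "__main__":
    # A "failed" status of 1 should render red, anything below it green.
    print(json.dumps(thresholds_to_steps("1", ["green", "red"]), indent=2))
```

With this shape, a value of 1 or more falls into the second step and is drawn in red.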
fricklerfor the webapp, we should replace the apache default page with a list of the possible links. and maybe add ssl. http://nl01.opendev.org/19:31
clarkbianw: ok so a grafyaml update is required.19:31
fricklerhttp://nl01.opendev.org/dib-image-list and http://nl01.opendev.org/image-list are the useful things19:31
fungiwhat's the launcher webapp path?19:31
clarkbis that fork you found still active? I wonder if we're better off just adopting that?19:31
ianwso i started trying to reverse engineer the way to do new thresholds, and I find it fairly frustrating and frankly a bit pointless to be trying to rewrite grafyaml to a non-existent API19:31
fungithanks!19:31
clarkbianw: non-existent because they change it arbitrarily?19:32
fungiand/or don't document it?19:32
ianwboth19:32
fungishades of jjb19:33
ianwthey do tend to write backwards compat things so that old dashboards load19:33
ianwthen also, as mentioned, i found https://github.com/deliveryhero/grafyaml who have done a bunch of reverse engineering work for new panel types, etc19:34
ianwbut have also run reformatters, and not chosen to interact with the upstream development at all19:34
ianwwhich was also depressing19:34
fungiit's apache 2.0 though, so we could just fork their changes back in19:35
fungioh, except for the reformatting19:35
fricklersounds like a typical german startup I must say19:35
ianwthen last time we discussed just using the raw json (which imo really isn't that hard to read) there was disagreement, which was also not fun19:35
ianwso frankly i left the whole exercise a bit depressed over everything :)19:36
clarkbya maybe we need to revive that discussion and see if we can find a compromise. Like maybe we can store the raw json as yaml to make it reviewable and then have a really light weight tool that just converts yaml to json directly for grafana consumption19:36
fungii'm tempted to open a github issue against that project saying that their updates aren't importable upstream due to the reformatting19:36
clarkbrather than doing a different representation entirely in yaml19:36
clarkbI assume ruamel can do that in a pretty straightforward manner for us19:37
fungipyyaml could presumably do it too19:37
ianwi don't know if we're having any different conversation than we had before, but i think if you look at the json output exported by grafana it's quite readable.  they are not doing anything crazy in there19:38
fungiunless you're worried about preserving ordering or inline comments19:38
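The lightweight converter clarkb suggests could look roughly like the sketch below: the dashboard is kept in YAML for review and dumped straight to JSON for Grafana, with no intermediate representation. The function names are hypothetical, PyYAML is assumed to be installed, and, as fungi notes, comments and key ordering from the YAML file are not preserved.

```python
import json


def render_dashboard_json(dashboard):
    """Serialize an already-parsed dashboard mapping as stable JSON text."""
    # sort_keys keeps the output deterministic so diffs stay small.
    return json.dumps(dashboard, indent=2, sort_keys=True) + "\n"


def yaml_to_dashboard_json(yaml_text):
    """Parse YAML text and return the equivalent JSON document."""
    import yaml  # PyYAML, a third-party dependency (assumed installed)

    return render_dashboard_json(yaml.safe_load(yaml_text))
```

Calling `yaml_to_dashboard_json(open(path).read())` on a dashboard file would give the JSON to hand to Grafana. ruamel.yaml could be swapped in if round-tripping comments matters.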
ianwit's just they arbitrarily choose to refactor things19:38
clarkbianw: I think it was corvus  who had the major objection and didn't feel json was human readable at all.19:38
clarkb(regardless of the actual json)19:38
clarkbmaybe this deserves a mailing list thread19:39
clarkbas I don't know that corvus is watching the meeting today19:39
ianwyes, fair enough19:40
clarkband we can be explicit about what the issues are that way and try to find a workaround/compromise/etc19:40
fungicould probably stand to be a ml thread either way19:40
clarkb++19:40
ianwi mean, delivery hero or whatever have obviously seen some usefulness in it too19:40
ianwbut imo it's just a losing game if upstream don't want to give you an api to work to19:41
clarkbya you'll always be fighting problems like this19:41
corvusoh hi19:41
clarkbcorvus: I think where we've ended up is we need to do something re grafyaml and grafana dashboard/graph management as the current tool is not working in ways that are annoying. But we should start up a mailing list thread to discuss it further to make sure we capture all the angles (including this random fork on github)19:42
fricklerdo we know if that lack of documentation is intentional on the grafana side?19:43
fungifwiw, i don't see any obvious indication that the deliveryhero devs tried to upstream patches for grafyaml, just skimming the reviews there19:43
clarkbianw: is that something you would like to start or would it be helpful if I try to give it a go19:43
ianwi can send a mail, sure19:43
clarkbthanks19:43
ianwi don't think we need to keep having the same conversation in irc, for sure19:43
clarkbAlright anything else on this before we move on?19:44
clarkb#topic Custom url shortener19:45
fricklerthat's an easy one: still on my todo list19:45
clarkbok just wanted to make sure I had not missed a change 19:45
fricklernope19:45
clarkb#topic Removing Projects from Zuul19:46
clarkbThis was not on the emailed agenda because it occurred to me just this morning19:46
clarkbThe changes I pushed up to windmill and x/vmware-nsx to remove their use of pipeline level queue definitions in the gerrit config have not been reviewed and most of them fail CI19:47
clarkbone idea to address this is to simply remove projects like that from our zuul config.19:47
clarkbSeparately I do also notice that it seems like literally no one has addressed this problem in openstack at all19:47
clarkbbut I think for this topic I'm mostly concerned about what are very likely dead projects that we should just decouple from zuul until they become active again19:48
clarkbAre there any objections to that or concerns with doing that?19:48
fricklerno, but an additional idea, can we also handle some of the long standing zuul config errors like that?19:48
fungii'm fine with that. for openstack projects, i'm happy to present the tc with the list of projects we're removing, and suggest that they can be re-added when their authors are ready to address any problems19:48
fungisame for config errors19:49
ianwwould you commit something to the projects saying "btw this zuul config is not being processed"?19:49
ianwi just wonder if people do try to commit something, and it goes into gerrit and nothing happens19:49
clarkbok a lot going on. I'm going to start with ianw's since that was one of the concerns I had too19:49
corvusi understood the suggestion as remove them from the tenant config; and i think that sounds good19:49
clarkbWhich was how do we make people aware of this change if they are already not really paying attention19:49
fungithough that can lead to a cascade effect, since many of those errors are due to projects which have been renamed or retired still listed as required-projects in old branches of other projects' configs19:49
corvus(removing from the tenant config means no commits to projects necessary)19:49
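Removing projects from the tenant config, as corvus describes, amounts to deleting their entries from the tenant's project lists. The sketch below works on an already-parsed tenant config and assumes each entry is either a plain project name or a single-key mapping of name to options. Grouped entries are not handled, and `drop_projects` and the tenant and project names in the usage example other than `x/vmware-nsx` are hypothetical.

```python
def entry_name(entry):
    """Return the project name for a simple tenant config entry, else None."""
    if isinstance(entry, str):
        return entry
    if isinstance(entry, dict) and len(entry) == 1:
        return next(iter(entry))
    return None  # grouped or unknown entries are left untouched


def drop_projects(tenants, source, names):
    """Remove the named projects in place and return the names removed."""
    names = set(names)
    removed = []
    for item in tenants:
        projects = item.get("tenant", {}).get("source", {}).get(source, {})
        for key in ("config-projects", "untrusted-projects"):
            if key not in projects:
                continue
            kept = []
            for entry in projects[key]:
                name = entry_name(entry)
                if name in names:
                    removed.append(name)
                else:
                    kept.append(entry)
            projects[key] = kept
    return removed


if __name__ == "__main__":
    config = [{"tenant": {"name": "example-tenant", "source": {"gerrit": {
        "untrusted-projects": ["x/vmware-nsx", "example/active-project"]}}}}]
    print(drop_projects(config, "gerrit", ["x/vmware-nsx"]))
```

No commit to the affected repositories is needed, which matches the point made above. The trade-off raised in the discussion still applies: changes pushed to a removed project get no CI feedback.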
clarkbcorvus: correct, but it also means if someone pushes code to that repo now they'll just be silently ignored19:50
fricklerwe could add a job in project-config that just outputs some comment?19:50
corvusyes, and presumably ask someone what is up and end up at service-discuss or #opendev 19:50
clarkbfrickler: the problem with that is you need the project in the tenant config to run the job against the repo19:51
clarkbI think where I've ended up on that is what corvus describes19:51
clarkbbasically it isn't ideal but they should know where to go asking questions19:51
fricklercan't we just ignore the in-project config like we do for some github projects?19:52
corvus(you could theoretically exclude all config objects from those projects and then run a job from a config repo, but that sounds like a lot of work for people who aren't around)19:52
fungijust so i can go back to the openstack tc with a clear message, it's that leaving project zuul configs in a broken/error state indefinitely is not okay, even if it's "just" on some old stable branches ~nobody cares about, and we will be taking those projects out of the tenant config even if their master branch configs are still working19:52
corvusmy argument would be that opendev's level of service for projects should not exceed the attention given by their developers to them.19:52
clarkbfungi: sort of. This is specifically re http://lists.openstack.org/pipermail/openstack-discuss/2022-May/028603.html19:52
fricklercorvus: fair enough, I support that19:53
clarkbfungi: I do think though that we're approaching what you describe which is that broken project configs regardless of the reason create problems for the projects in question and others. If they aren't going to do basic care and feeding then we'll remove from the CI system to avoid confusion19:53
fungiwell, i was extending it to frickler's suggestion that we "also handle some of the long standing zuul config errors like that"19:53
clarkbcorvus: ++19:53
corvus(my comments are mostly in the context of abandoned projects)19:53
clarkbthe risk with doing widespread removal is that it will chain reaction down all the dependencies19:53
fungiyes, my point was that already a vast number of the config errors are due to retired/renamed projects no longer appearing in the tenant config19:54
fungiso i would expect the error count to grow if we remove them19:54
clarkbanyway to start I'm just suggesting x/vmware-nsx and windmill be removed since they both appear dead and are not part of openstack. Then separately we need to push openstack harder to actually fix this stuff19:55
fungialso the errors are branch-specific, but tenant removal is project-level19:55
clarkband if pushing openstack harder doesn't result in fixing these things we should consider removing from zuul at that time19:55
clarkbbut I don't think we're quite to that point yet for openstack. But we should probably warn them that is ultimately our failsafe on the zuul side19:55
fungiyeah, i'll give a gently firm reminder19:56
corvusmaybe separately ask openstack to appoint some janitors for those projects (midonet, etc)?19:57
fungidefinitely19:58
fricklerI also noticed that the zuul tenant has collected a set of config errors btw.19:58
corvusyes, they are unresolvable until opendev finishes being extracted from openstack19:59
clarkbAlright we are just about at time19:59
clarkb#topic Open Discussion19:59
clarkbAnything else?19:59
corvuspossibly some may remain even then19:59
fungii'm hoping to refresh our meetpad configs to current upstream examples/defaults20:00
funginot quite sure how best to minimize our differential going forward20:00
fungiwe've tried a few different things, but those files are quite large and our edits represent a small proportion of them20:00
fungiopen to ideas in #opendev if anyone has some20:01
clarkbfungi: probably upstreaming support for flags we need is the best way20:01
clarkbbut also we are at time20:01
clarkbthank you everyone20:01
clarkb#endmeeting20:01
opendevmeetMeeting ended Tue Jun 21 20:01:32 2022 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)20:01
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2022/infra.2022-06-21-19.01.html20:01
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2022/infra.2022-06-21-19.01.txt20:01
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2022/infra.2022-06-21-19.01.log.html20:01
