Tuesday, 2021-11-30

clarkbMeeting time in a minute or two18:59
clarkbIt's been a while too :)18:59
ianwo/19:00
clarkb#startmeeting infra19:01
opendevmeetMeeting started Tue Nov 30 19:01:02 2021 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
opendevmeetThe meeting name has been set to 'infra'19:01
clarkb#link http://lists.opendev.org/pipermail/service-discuss/2021-November/000303.html Our Agenda19:01
clarkbWe have an agenda.19:01
clarkb#topic Announcements19:01
clarkbGerrit User Summit is happening Thursday and Friday this week from 8am-11am pacific time virtually19:01
clarkbIf you are interested in joining registration is free. I think they will have recordings too if you prefer to catch up out of band19:02
fungialso there was a new git-review release last week19:02
clarkbI intend to join as there is a talk on gerrit updates that I think will be useful for us to hear19:02
clarkbyup, please update your git-review installation to help ensure it is working properly. I've already updated since my local git version changed, forcing me to update19:02
clarkbI haven't had any issues with new git review yet19:03
fungigit-review 2.2.019:03
fungii sort of rushed it through because an increasing number of people were upgrading to newer git, which it was broken with19:03
clarkbthe delta to the previous release was small too so probably the right move19:03
fungibut yeah, follow up on the service-discuss ml or in #opendev if you run into anything unexpected with it19:04
clarkb#topic Actions from last meeting19:05
clarkb#link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-11-16-19.01.txt minutes from last meeting19:05
clarkbI don't see any recorded actions19:05
clarkbWe'll dive right into the fun stuff then19:05
clarkb#topic Topics19:05
clarkb#topic Improving CD Throughput19:05
clarkbsorry small network hiccup19:06
clarkbA number of changes have landed to make this better while keeping our serialized one job after another setup19:07
clarkb#link https://review.opendev.org/c/opendev/system-config/+/807808 Update system-config once per buildset.19:07
clarkb#link https://review.opendev.org/c/opendev/base-jobs/+/818297/ Reduce actions needed to be taken in base-jobs.19:07
ianwyep, those are the last two19:08
clarkbThese are the last two updates to keep the status quo but prepare for parallel ops19:08
clarkbOnce those go in we can start thinking about adding/updating semaphores to allow jobs to run in parallel. Very exciting. Thank you ianw for pushing this along19:08
ianwyep i'll get to that change soon and we can discuss19:08
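[editor's note: the semaphore change discussed above could look something like this Zuul config sketch; the semaphore and job names are illustrative assumptions, not the actual opendev configuration]

```yaml
# Hypothetical Zuul semaphore allowing two deploy jobs to run in
# parallel instead of fully serialized; names are made up here.
- semaphore:
    name: infra-prod-deploy
    max: 2

- job:
    name: infra-prod-service-example
    # Jobs sharing this semaphore are limited to "max" concurrent runs
    semaphore: infra-prod-deploy
```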
clarkb#topic Zuul multi scheduler setup19:09
clarkbJust a note that a number of bug fixes have landed to zuul since we last restarted19:09
clarkbI expect that we'll be doing a restart at some point soon to check everything is happy before zuul cuts a new release19:10
clarkbI'm not sure if that will require a full restart and clearing of the zk state, corvus would know. Basically it is possible that this won't be a graceful restart19:10
fungiafter our next restart, it would probably be helpful to comb the scheduler/web logs for any new exceptions getting raised19:10
clarkbs/graceful/no downtime/19:10
corvusyes we do need a full clear/restart19:10
clarkbcorvus: thank you for confirming19:11
fungii saw you indicated similar in matrix as well19:11
fungi(for future 4.11 anyway)19:11
clarkband ya generally be on the lookout for odd behaviors, our input has been really helpful to the development process here and we should keep providing that feedback19:11
corvusi'd like to do that soon, but maybe after a few more changes land19:11
corvuswe should probably talk about multi web19:12
corvusit is, amusingly, now our spof :)19:12
clarkbcorvus: are we thinking run a zuul-web on zuul01 as well then dns round robin?19:13
corvus(amusing since it hasn't ever actually been a spof except that opendev only ever needed to run 1)19:13
corvusthat's an option, or a LB19:13
clarkbif we add an haproxy that might work better for outages and balancing but it would still be a spof for us19:13
corvuswe might want to think about the LB so we can have more frequent restarts without outages19:13
clarkbI guess the idea is haproxy will need to restart less often than zuul-web and in many cases haproxy is able to keep connections open until they complete19:14
fungidns round-robin is only useful for (coarse) load distribution, not failover19:14
fricklerdo we have octavia available? is that in vexxhost?19:14
corvusi figure if it's good enough for gitea it's good enough for zuul; we know that we'll want to restart zuul-web frequently, and there's a pretty long window when a zuul-web is not fully initialized, so a lb setup could make a big difference.19:14
clarkbfrickler: I think it is available in vexxhost, but we don't host these services in vexxhost currently so that would add a large (~40ms?) rtt between the lb frontend and backend19:15
clarkbcorvus: good point re gitea19:15
fungion the other hand, if we need to take the lb down for an extended period, which is far less often, we can change dns to point directly to a single zuul-web while we work on the lb19:15
ianwit's a bit old now, but https://review.opendev.org/c/opendev/system-config/+/677903 does the work to make haproxy a bit more generic for situations such as this19:16
fungior just build a new lb and switch dns to it, then tear down the old one19:16
ianw(haproxy roles, not haproxy itself)19:16
clarkbianw: oh ya we'll want something like that if we go the haproxy route and don't aaS it19:17
corvusianw: is that for making a second haproxy server, or for using an existing one for more services?19:17
corvus(i think it's option #1 from the commit msg)19:17
clarkbcorvus: I read the commit message as #1 as well19:17
ianwcorvus: iirc that was when we were considering a second haproxy server19:17
fungiyeah, make it easier for us to reuse the system configuration, not the individual load balancer instances19:17
corvusthat approach seems good to me (but i don't feel strongly; if there's an aas we'd like to use that should be fine too)19:18
fungiso that we don't end up with multiple almost identical copies of the same files in system-config for different load balancers19:18
clarkbcorvus: I think I have a slight preference for using our existing tooling for consistency19:19
clarkband separately if someone wants to investigate octavia we can do that and switch wholesale later (I'd be most concerned about using it across geographically distributed systems with disparate front and back ends)19:19
fungithough for that we'd probably be better off with some form of dns-based global load balancing19:20
fungigranted it can be a bit hard on the nameservers19:20
fungi(availability checks driving additions and removals to a dedicated dns record/zone)19:21
fungirequires very short ttls, which some caching resolvers don't play nicely with19:21
corvusok, i +2d ianw's change; seems like we can base a zuul-lb role on that19:22
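[editor's note: a zuul-lb based on the existing gitea-lb haproxy role might look roughly like the sketch below; the server names, ports, and tcp-mode choice are assumptions modeled on the gitea setup, not a reviewed configuration]

```
# Hypothetical haproxy config fragment balancing two zuul-web backends.
defaults
    timeout connect 10s
    timeout client  2m
    timeout server  2m

frontend zuul-web-https
    bind :::443 v4v6
    mode tcp
    default_backend zuul-web

backend zuul-web
    mode tcp
    option tcp-check
    # Health checks let haproxy skip a zuul-web that is restarting or
    # still initializing; hostnames here are illustrative.
    server zuul01 zuul01.opendev.org:443 check
    server zuul02 zuul02.opendev.org:443 check
```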
clarkbsounds good, anything else zuul related to go over?19:22
corvusi'll put that on my list, but it's #2 on my opendev task list, so if someone wants to grab it first feel free :)19:23
corvus(and that's all from me)19:24
clarkb#topic User management on our systems19:24
clarkbThe update to irc gerritbot here went really well. The update to matrix-gerritbot did not.19:24
clarkbIt turns out that matrix-gerritbot needs a cache dir in $HOME/.cache to store its dhall intermediate artifacts19:24
clarkband that didn't play nicely with the idea of running the container as a different user as it couldn't write to $HOME/.cache. I had thought I had bind mounted everything it needed and that it was all read only but that wasn't the case. To make things a bit worse, the dhall error log messages couldn't be written because the image lacked a utf8 locale and the error messages had utf8 characters19:25
clarkbtristanC has updated the matrix-gerritbot image to address these things so we can try again this week. I need to catch back up on that.19:25
clarkbOne thing I wanted to ask about is whether or not we'd like to build our own matrix-gerritbot images using docker instead of nix so that we can have a bit more fully featured image as well as understand the process19:26
clarkbI found the nix stuff to be quite obtuse myself and basically punted on it as a result19:26
clarkb(the image is really interesting: it sets a bash prompt but no bash is installed, there is no /tmp (I tried to override $HOME to /tmp to fix the issue and that didn't work), etc)19:27
clarkbI don't need an answer to that in this meeting but wanted to call it out. Let me know if you think that is a good or terrible idea once you have had a chance to ponder it19:28
fungii agree, it's nice to have images which can be minimally troubleshot at least19:28
ianwit wouldn't quite fit our usual python-builder base images, though, either?19:29
clarkbianw: correct, it would be doing very similar things but with haskell and cabal instead of python and pip19:29
clarkbianw: we'd do a build in a throwaway image/layer and then copy the resulting binary into a more minimal haskell image19:29
clarkbs/haskell/ghc/ I guess19:29
clarkbhttps://hub.docker.com/_/haskell is the image we'd probably use19:30
clarkbI don't think we would need to maintain the base images, we could just FROM that image a couple of times and copy the resulting binary over19:30
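[editor's note: the two-stage build described above could be sketched roughly as below; the repo layout, binary name (gerritbot-matrix), GHC image tag, and runtime packages are all assumptions for illustration, not the image tristanC publishes]

```dockerfile
# Hypothetical multi-stage build: compile with the full haskell image,
# then COPY the resulting binary into a slimmer runtime image.
FROM haskell:9 AS builder
WORKDIR /src
COPY . .
RUN cabal update && cabal install --installdir=/out exe:gerritbot-matrix

FROM debian:bullseye-slim
# Install a utf8 locale so error messages with utf8 characters can be
# written (the failure mode hit above), plus CA certs for TLS.
RUN apt-get update && apt-get install -y --no-install-recommends \
      locales ca-certificates && rm -rf /var/lib/apt/lists/* \
    && sed -i 's/^# *en_US.UTF-8/en_US.UTF-8/' /etc/locale.gen && locale-gen
ENV LANG=en_US.UTF-8
COPY --from=builder /out/gerritbot-matrix /usr/local/bin/gerritbot-matrix
ENTRYPOINT ["/usr/local/bin/gerritbot-matrix"]
```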
clarkbWe can move on. I wanted to call this out and get people thinking about it so that we can make a decision later. It isn't urgent to decide now as it isn't an operational issue at the moment19:31
clarkb#topic UbuntuOne two factor auth19:31
clarkb#link http://lists.opendev.org/pipermail/service-discuss/2021-November/000298.html Using 2fa with ubuntu one19:31
fungiat the beginning of last week i started that ml thread19:31
fungii wanted to bring it up again today since i know a lot of people were afk last week19:32
fungiso far there have been no objections to proceeding, and two new volunteers to test19:32
clarkbI have no objections, if users are comfortable with the warning in the group description I think we should enroll those who are interested19:32
fungieven though we haven't really made a call for volunteers yet19:32
ianw(i think i approved one already, sorry, after not reading the email)19:32
fungino harm done ;)19:33
clarkbya was hrw, I think hrw was aware of the concerns after working at canonical previously19:33
clarkban excellent volunteer :)19:33
fungii just didn't want to go approving more volunteers or asking for volunteers until we seemed to have some consensus that we're ready19:33
clarkbI think so, its been about a year, I have yet to have a problem in that time19:33
fungii'll give people until this time tomorrow to follow up on the ml as well before i more generally declare that we're seeking volunteers to help try it out19:34
clarkbsounds like a plan, thanks19:34
fricklerI guess I can't be admin for that group without being member?19:35
fungifrickler: correct19:35
clarkbfrickler: I think that is correct due to how lp works19:35
fungii'm also happy to add more admins for the group19:36
fricklero.k., not a blocker I'd think, but I'm not going to join at least for now19:36
clarkbOne thing we might need to clarify with canonical/lp/ubuntu is what happens if someone is removed from the group19:36
clarkband until then don't remove anyone?19:36
fungii'll make sure to mention that in the follow-up19:37
fungimaybe hrw knows, even19:37
ianwit does seem like from what it says it's a one-way ticket, i was treating it as such19:37
ianwbut good to confirm19:37
clarkbianw: yup, that is why I asked because if we add more admins they need to be aware of that and not remove people potentially19:37
clarkbit may also be the case that the enrollment happens on the backend once and then never changes regardless of group membership19:38
clarkbWe have a couple more topics so lets continue on19:38
clarkb#topic Adding a lists.openinfra.dev mailman site19:38
clarkb#link https://review.opendev.org/818826 add lists.openinfra.dev19:38
clarkbfungi: I guess you've decided it is safe to add the new site based on current resource usage on lists.o.o?19:39
clarkbOne thing I'll note is that I don't think we've added a new site since we converted to ansible. Just be on the lookout for anything odd due to that. We do test site creation in the test jobs though19:39
fungiyeah, i've been monitoring the memory usage there and it's actually under less pressure after the ubuntu/python/mailman upgrade19:39
clarkbyou'll also need to update DNS over in the dns as a service but that is out of band and it is safe to land this before that happens19:40
fungifor some summary background, as part of the renaming of the openstack foundation to the open infrastructure foundation, there's a desire to move the foundation-specific mailing lists off the openstack.org domain19:40
fungii'm planning to duplicate the list configs and subscribers, but leave the old archives in place19:40
clarkbfungi: is there any concern for impact on the mm3 upgrade from this? I guess it is just another site to migrate but we'll be doing a bunch of those either way19:41
fungiand forward from the old list addresses to the new ones of course19:41
fungiyeah, one of the reasons i wanted to knock this out was to reduce the amount of list configuration churn we need to deal with shortly after a move to mm3 when we're still not completely familiar with it19:42
clarkbmakes sense. I think you've got the reviws you need so approve when ready I guess :)19:42
fungiso the more changes we can make before we migrate, the more breathing room we'll have after to finish coming up to speed19:42
clarkbAnything else on this topic?19:42
funginope, thanks. i mainly wanted to make sure everyone was aware this was going on so there were few surprises19:43
clarkbthank you for the heads up19:43
clarkb#topic Proxying and caching Ansible Galaxy in our providers19:43
clarkb#link https://review.opendev.org/818787 proxy caching ansible galaxy19:43
clarkbThis came up in the context of tripleo jobs needing to use ansible collections and having less reliable downloads19:44
fungiright19:44
clarkbI think we set them up with zuul github projects they can require on their jobs19:44
fungiyes we added some of the collections they're using, i think19:44
clarkbIs the proxy cache something we think we should move those ansible users to? or should we continue adding github projects?19:44
clarkbor do we need some combo of both?19:44
fungithat's my main question19:45
fungione is good for integration testing, the other good for deployment testing19:45
fungiif you're writing software which pulls things from galaxy, you may want to exercise that part of it19:45
clarkbcorvus: from a zuul perspective I know we've struggled with the github api throttling during zuul restarts. Is that something you think we should try to optimize by reducing the number of github projects in our zuul config?19:45
clarkbfungi: I think you can still point galaxy at a local file dir url. And I'm not sure you gain much testing galaxy's ability to parse file:/// vs https:///19:46
corvusclarkb: i don't know if that's necessary at this point; i think it's worth forgetting what we knew and starting a fresh analysis (if we think it's worthwhile or is/could-be a problem)19:46
corvusmuch has changed19:46
clarkbcorvus: got it19:46
clarkbAt the end of the day adding the proxy cache is pretty low effort on our end. But the zuul required projects should be far more reliable for jobs. And since we are already doing that I sort of lean that direction19:47
clarkbBut considering the low effort to run the caching proxy I'm good with doing both and letting users decide which tradeoff is best for them19:48
fungiyeah, the latter means we need to review every new addition, even if the project doesn't actually need to consume that dependency from arbitrary git states19:48
fungiwith the caching proxy, if they add a collection or role from galaxy they get the benefit of the proxy right away19:49
clarkbgood point. I'll add this to my review list for after lunch and we can roll forward with both while we sort out github connections in zuul19:49
clarkbAnything else on this subject?19:49
fungibut i agree that if the role or collection is heavily used then having it in the tenant config is going to be superior for stability19:49
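[editor's note: jobs opting in to the caching proxy would point the galaxy client at the mirror via ansible.cfg, roughly as below; the mirror URL follows opendev's usual proxy naming but is an assumption, as is the server alias]

```ini
# Hypothetical ansible.cfg fragment directing collection/role
# downloads through a caching proxy instead of galaxy.ansible.com.
[galaxy]
server_list = opendev_proxy

[galaxy_server.opendev_proxy]
url = https://mirror.example.opendev.org/galaxy/
```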
fungii didn't have anything else on that one19:50
clarkb#topic Open Discussion19:50
clarkbWe've got 10 minutes for any other items to discuss.19:50
fungiyou had account cleanups on the agenda too19:50
clarkbya but there isn't anything to say about them. I've been out and have had no time to discuss them19:50
fungifor anyone reviewing storyboard, i have a couple of webclient fixes up19:51
clarkbIt's a bit aspirational at this point :/ I need to block off a solid day or three and just dive into it19:51
fungi#link https://review.opendev.org/814053 Bindep cleanup and JavaScript updates19:51
fungithat solves bitrot in the tests19:51
clarkb#link https://review.opendev.org/c/opendev/system-config/+/819733 upgrade Gerrit to 3.3.819:51
fungiand makes it deployable again19:51
clarkbGerrit made new versions and ^ updates our image so that we can upgrade19:51
clarkbMight want to do that during a zuul restart?19:51
fungiyeah, since we need to clear zk anyway that probably makes sense19:52
fungi#link https://review.opendev.org/814041 Update default contact in error message template19:52
fungithat fixes the sb error message to point users to oftc now instead of freenode19:52
fungican't merge until the tests work again (the previous change i mentioned)19:52
ianwoh i still have the 3.4 checklist to work through.  hopefully can discuss next week19:53
clarkbianw: 819733 does update the 3.4 image to 3.4.2 as well. We may want to refresh the test system on that once the above change lands19:53
clarkbThe big updates in these new versions are to reindexing so something that might actually impact the upgrade19:53
clarkbthey added a bunch of performance improvements sounds like19:54
ianwiceweasel ... there's a name i haven't heard in a while19:54
fungiespecially since it essentially no longer exists19:54
ianwclarkb: ++19:54
clarkbLast call, then we can all go eat $meal19:56
ianwkids these days wouldn't even remember the trademark wars of ... 2007-ish?19:56
fungii had to trademark uphill both ways in the snow19:57
clarkbianw: every browser is Chrome now too19:57
clarkb#endmeeting19:57
opendevmeetMeeting ended Tue Nov 30 19:57:59 2021 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)19:57
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2021/infra.2021-11-30-19.01.html19:57
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2021/infra.2021-11-30-19.01.txt19:57
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2021/infra.2021-11-30-19.01.log.html19:57
clarkbThank you everyone19:58
clarkbWe'll see you here next week19:58
fungithanks clarkb!19:58
corvusand app too (thanks electron!)19:58
fungisame bat time, same bat channel!19:58
ianwhttps://en.wikipedia.org/wiki/Mozilla_software_rebranded_by_Debian - 2006 - i'll take 2007 as a pretty good guess off the top of my head :)19:59
ianwi remember installing it on an Itanium desktop, so that constrained the timeline a bit19:59
clarkbwow itanium20:00
ianwgosh that was a long time ago!20:00
clarkbwe had a couple racks of itanium servers when I was at Intel. I think they were largely idle because by that point in time everyone knew the arch wasn't going anywhere20:01
ianwoh those were the days.  this was pre amd64 (which was pre x86-64!) so just about everything had weird issues related to 64-bit pointers20:01
ianwthere was ~ nobody running gnome, etc. on 64-bit in those days20:02
clarkbianw: not even on sparc?20:02
ianwmaybe enthusiasts would play with things on a sparc, or an alpha multia etc. but generally you'd run x11 and something more basic like fvwm20:03
clarkbI guess solaris had the CDE probably not many gnome users. But by the time opensolaris happened gnome was the default iirc20:04
clarkbwe had a lot of sparc stuff at the university20:04
ianwyeah, sparc was fun and coveted hardware if you could find it.  i feel like people fiddling with the alternative archs were also more bsd-ish.  a lot of netbsd going around for alpha and sparc at the time20:07
ianwi don't know why i coveted 200lb boxes full of jet engine fans, but it was a different time :)20:09
fungii did have a 64-bit sparc (sunstation) and 64-bit mips (sgi indy)20:14
fungis/sunstation/sparcstation/20:14
fungi(the sunstations also existed but i didn't have one)20:15
fungier, no, the sgi indy was 32-bit mips, but i did have a dec alpha as well20:16
fungii eventually swapped out the sparcstation for sun t1-105 rackmount servers because they were more compact and drew less power20:17

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!