Thursday, 2024-04-04

opendevreviewMerged opendev/irc-meetings master: We have decided to adjust meeting time to 0700 during summer time.  https://review.opendev.org/c/opendev/irc-meetings/+/91494009:02
SvenKieskeI'm currently trying to decide which parent job to utilize for my new linting job, and looking at https://opendev.org/openstack/openstack-zuul-jobs/src/branch/master/zuul.d/jobs.yaml#L72 I have questions. Are the python27 jobs really still running? The build history for those is not loading for me in zuul. I guess I could grep all the zuul jobs.yamls to see if they are enabled at all09:30
SvenKieskein general it seems - for me anyway - that there are some jobs in there which could probably be updated? referencing EOL branches etc? But I'm not sure if these are still in use for some fips testing or other stuff?09:31
SvenKieskeI don't see any jobs there explicitly running on newer releases (2023.X et al). The newest branch referenced in most parent jobs, e.g. "openstack-tox", is the zed release. Even openstack-tox-py311's parent is openstack-tox with no newer branches declared? I must be missing something?09:50
SvenKieskeah, those are defined here: https://opendev.org/openstack/openstack-zuul-jobs/src/branch/master/zuul.d/project-templates.yaml#L645 weird structure..09:54
SvenKieskeso it seems that previously, gate testing jobs and their branches were defined in jobs.yaml https://opendev.org/openstack/openstack-zuul-jobs/commit/6d85fd8399ed6b9f2358412945cd6683989662cd09:59
SvenKieskebut nowadays this is done in project-templates.yaml https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/91371010:00
SvenKieskeand nobody cleaned that up? afaik it's not necessary to split the branch definitions for the jobs being run across two files here?10:00
SvenKieskemaybe I'm still not seeing the whole picture here, but it does seem to make some kind of sense at least. although it seems a little brittle and error prone to have no single source of truth for which branch is being used for which job at which point in time.10:03
*** sfinucan is now known as stephenfin12:25
fungiSvenKieske: you might find https://zuul.opendev.org/t/openstack/jobs an easier way to browse the zuul jobs defined in the openstack tenant, and then cross-reference them to git from there12:28
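For anyone who wants to script this instead of clicking through the UI, here is a minimal sketch that lists job names from the Zuul REST API; the /api/tenant/<tenant>/jobs endpoint and the response fields are assumptions based on what the web UI appears to use.

```python
#!/usr/bin/env python3
"""Sketch: list Zuul jobs for a tenant via the REST API.

Assumes the /api/tenant/<tenant>/jobs endpoint returns a JSON list of
job objects with a "name" key, mirroring what the web UI displays.
"""
import requests

ZUUL_URL = "https://zuul.opendev.org"   # assumption: the public OpenDev Zuul
TENANT = "openstack"


def list_jobs(base_url: str, tenant: str) -> list[str]:
    # Fetch the tenant's job list and return just the job names, sorted.
    resp = requests.get(f"{base_url}/api/tenant/{tenant}/jobs", timeout=30)
    resp.raise_for_status()
    return sorted(job["name"] for job in resp.json())


if __name__ == "__main__":
    for name in list_jobs(ZUUL_URL, TENANT):
        if "py27" in name:  # e.g. narrow down to the python 2.7 jobs
            print(name)
```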
fungito answer your question about whether some projects still maintain compatibility with and test on python 2.7, yes of course. there are still distributions that support it even if it's not supported upstream by the python community, and openstack project branches that (at least very recently) supported being installed with python 2.712:30
fungiand even master branches of non-branching tools and libraries that need to continue to support those older versions of the software (e.g. pbr). i think we only dropped python 2.7 support from bindep a few weeks ago12:33
fungihttps://zuul.opendev.org/t/openstack/builds?job_name=openstack-tox-py27&project=openstack/pbr12:36
fungihttps://zuul.opendev.org/t/opendev/builds?job_name=tox-py27&project=opendev/bindep though that's in a different zuul tenant these days12:38
SvenKieskeah, that's why I didn't find any jobs, thanks!12:38
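A small sketch of the same kind of builds query done programmatically, assuming the /api/tenant/<tenant>/builds endpoint accepts the job_name/project filters used in the web UI URLs above and returns a JSON list of build records (field names are assumptions, hence the .get() calls).

```python
import requests


def recent_builds(tenant: str, job_name: str, project: str, limit: int = 10):
    # Assumption: the builds API mirrors the query parameters of the web UI.
    resp = requests.get(
        f"https://zuul.opendev.org/api/tenant/{tenant}/builds",
        params={"job_name": job_name, "project": project, "limit": limit},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()


for build in recent_builds("openstack", "openstack-tox-py27", "openstack/pbr"):
    print(build.get("end_time"), build.get("result"), build.get("ref_url"))
```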
SvenKieskedoes pbr still use py27?12:39
fungiyes, it still supports python 2.7 because other projects supporting python 2.7 need to be installable with the latest versions of pbr12:39
fungia prime example though is https://zuul.opendev.org/t/openstack/builds?project=openstack%252Fswift12:40
fungiyou'll see it runs several different "py27" jobs even on master branch changes12:41
SvenKieskeinteresting, I was under the impression that the transition to python3 was complete years ago. at least that was the marketing speak around it. so swift still supports python2?12:43
SvenKieskeis that to support some old redhat cruft? I assume swift does support python3. I wouldn't even know how to currently install python2 on most distros, maybe software heritage has old packages.12:45
tkajinamwe globally removed python 2 support at ussuri afair, and that was mainly because SwiftStack required it for a bit longer (to run new swift on older ubuntu). Idk if that requirement still stands12:48
tkajinams/that was/keeping py2 support in swift/   I mean12:49
SvenKieskei quickly grepped for python2|3 in the swift repo; at least there seem to be some remnants of python2 support left. As the only tests being run on python2 seem to be linting|tox stuff, I doubt that it works.12:52
SvenKieskejust recently found out that swift support with the keystone auth backend had been broken since the zed release in k-a because we did not test that.12:53
SvenKieskeso my basic assumption since roughly 6 years is: untested code is broken code.12:54
tkajinamSvenKieske, that's interesting. do you have a bug for it ?12:54
tkajinamwe run some tempest tests to validate deployment with swift + keystone in puppet jobs but I've never seen any problems so far (though our test coverage is quite limited)12:55
tkajinamI think swift + keystone is covered by usual dsvm jobs run in multiple projects12:55
SvenKiesketkajinam: https://bugs.launchpad.net/kolla-ansible/+bug/2060121 this is a kolla-ansible-specific bug, see the attached patch12:55
SvenKieskebut it reinforces my view. another contributor is working on implementing tempest tests in kolla now. we need more integration tests in kolla and I think tempest is the best we can add.12:56
tkajinamhttps://github.com/openstack/devstack/blob/master/lib/swift#L43512:57
SvenKieskewe have custom bash integration tests and they mostly work, but they really only test a fraction, even of our default install.12:57
tkajinamI'd say that's not a bug in swift but one in deployment tools. though I feel like the requirement of /v3 path is redundant and something we may want to improve12:57
SvenKieskeyeah sure, I was talking about kolla-ansible, that's my main interaction point with openstack :) it's a bug in this deployment tool12:58
tkajinama slightly tricky point in this discussion is that you may need to test the s3 api instead of the native swift api, and tempest does not cover that for now12:58
SvenKieskeit was just an example where no tests lead to silently broken code, for many releases even.12:59
tkajinamyeah12:59
SvenKieskeso if swift does not run integration tests in python2 I doubt it works in python2, until proven otherwise :)13:00
tkajinamhm https://github.com/openstack/swift/tree/master/test/functional/s3api13:01
fungiSvenKieske: to be clear, our "marketing speak" was that we fully supported python 3. you can fully support python 3 and 2.7 if you're careful about how you write your software13:16
SvenKieskesure, but imho I also read somewhere that python2 support was to be removed, and afaik there are projects without python2 support13:17
fungiand yes, it was in service of people on older (but still supported by their vendor) gnu/linux distributions being able to upgrade to the latest releases of swift13:17
SvenKieskeI'm pretty sure I myself added usage of a standard library module that's not available in python213:17
fungisometimes people don't want to upgrade the distribution they're running, and as long as that distro still provides necessary things like security fixes i don't see that as a concern13:18
SvenKieskeno13:18
SvenKieskebut https://docs.openstack.org/tempest/latest/supported_version.html does not list any python 2 version as supported13:18
SvenKieskeso imho it's good to wonder why we burn CI cycles on python2 tests?13:19
SvenKieskeah damn, that's only tempest13:19
fungiSvenKieske: maybe the missing piece here is that swift doesn't require keystone. it can be installed as a stand-alone service13:19
fungiand from what i understand there are quite a few large deployments of stand-alone swift without other openstack services alongside it13:20
fungiand that's what the swift team has been trying to make sure kept working on python 2.713:21
SvenKieskebut python2 is also not listed on the supported runtimes for zed: https://governance.openstack.org/tc/reference/runtimes/zed.html13:21
fungiSvenKieske: right, in the zed release projects were not *required* to support python 2.7, but that doesn't mean they couldn't still choose to do so13:22
SvenKieskeokay, makes sense :)13:22
fungiopenstack-wide support guarantees are the minimum required of projects included in openstack, not the extent of what some of them might support in isolation13:23
SvenKieskeimho it would still make sense to at least think about a roadmap for when to officially demand removal of this though.13:23
fungii think i heard that the swift team was sunsetting python 2.7 support, i don't recall what that timeline was, perhaps that's something they'll be talking about in ptg sessions next week13:24
SvenKieskeokay, thanks for the insights. I wasn't really aware there are actually still openstack projects with python2 support. guess I'm one of the lucky 10000 ( https://xkcd.com/1053/ )13:25
fungiopenstack is pretty vast, it's hard to know everything that goes on in it13:31
corvus1my guesstimate of the db import time was way off, it took 9 hours.14:14
fricklerwow, that's a lot14:18
corvus1yeah, i think it's all the indexes... i made the estimate based on the rate of the artifact table, but it has few indexes.  i think the builds/buildsets/refs tables, which have lots of indexes, slowed down a lot14:19
corvus1during the recent migration, i turned off indexes then recreated them at the end. i think we should see if it's possible to do that with mysqldump14:20
corvus1s/turned off/deleted/14:20
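A minimal sketch of that approach (drop the secondary indexes, bulk-load, recreate them at the end), assuming a pymysql connection; the table and index names below are hypothetical placeholders, not the real Zuul schema, and the actual mysqldump invocation is not shown.

```python
#!/usr/bin/env python3
"""Sketch: drop secondary indexes before a bulk import, recreate after.

Primary keys are left alone -- only secondary indexes are dropped,
since those are what slow down large bulk inserts.
"""
import pymysql

# Hypothetical secondary indexes to drop/recreate around the import.
SECONDARY_INDEXES = {
    "zuul_build": [
        ("idx_build_job_name",
         "CREATE INDEX idx_build_job_name ON zuul_build (job_name)"),
        ("idx_build_uuid",
         "CREATE INDEX idx_build_uuid ON zuul_build (uuid)"),
    ],
}


def run(conn, statement):
    with conn.cursor() as cur:
        cur.execute(statement)
    conn.commit()


def drop_indexes(conn):
    for table, indexes in SECONDARY_INDEXES.items():
        for name, _create in indexes:
            run(conn, f"ALTER TABLE {table} DROP INDEX {name}")


def recreate_indexes(conn):
    for _table, indexes in SECONDARY_INDEXES.items():
        for _name, create in indexes:
            run(conn, create)


if __name__ == "__main__":
    conn = pymysql.connect(host="localhost", user="zuul",
                           password="secret", database="zuul")
    drop_indexes(conn)
    # ... run the bulk import here (e.g. feed the mysqldump file to the
    # mysql client), then rebuild the indexes:
    recreate_indexes(conn)
```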
fricklerunrelated, didn't we have like 500+ config errors? now I see only 335 and I wonder what happened14:20
corvus1i think if we do that, we may approach the runtime of the migration (which i think was 40m?)14:20
corvus1heh, i wonder if archive.org crawled our config error page?  and if we're about to block it?  :)14:21
fricklerinteresting idea. seems there's only one single copy from 2022 though :(14:26
corvus1frickler: i think the scheduler logs the error count every time it reconfigures a tenant; so you could at least double check the numbers, but not the actual error contents from the logs14:28
fricklerI just checked against a copy I made 4 weeks ago. seems my memory of the count is mostly correct and we have 250 fewer errors for openstack/trove combined with some general increase due to 2024.1 branching14:32
fungihopefully that means someone actually fixed trove's job configs14:49
clarkbfungi: I do wonder if we should make a pbr2 package and then have that drop python<3.8 support. Probably a lot of effort for minimal gain14:50
clarkbparticularly since we may be able to drop python2 in the near future which is probably the biggest tripping hazard14:50
fungiwe'll probably be able to drop python2 support from pbr soon enough that it's not worth the extra dance14:50
fungiyeah, that14:50
clarkbthe one place where it could get tricky is if people see that as an invitation to start adding newer python3 stuff, but then we won't necessarily work with newer python3 when installing in old locations either. Though I suspect that pip's fake pyproject.toml stuff may help a bit there14:51
clarkbI'll be joining the gerrit community meeting in about 9 minutes. Going to ask them about this reindexing bug on gerrit 3.9 (my current biggest concern with an upgrade)14:51
fungitesting with a sufficiently old python3 is probably enough to control that14:52
clarkbgood point14:52
corvus1hrm, the mysqldump script already disables keys during the import, so i'm not sure that manually deleting them and adding them would be faster.15:04
fungiany idea what the bottleneck is? network bandwidth? cpu on the trove instance?15:05
corvus1(or string parsing overhead of a text dump file)?15:05
corvus1i don't know, and it's a bit hard to tell without access to the db host...15:06
fungimaybe this is where that "break the warranty sticker" root login comes in handy15:07
fungior we can accelerate our plans to launch a dedicated mariadb server instance15:07
corvus1well, even that's just root db user15:07
fungioh, root in the db, not a root command shell in the os15:07
fungiyeah, that's maybe some help still (i think mysql has performance details for some stuff?) but not the whole picture15:08
corvus1i'm sure we can make this faster, but is it worth it?  do we sink a lot of time into improving it, or is 9 hours of missing build records on a saturday acceptable?15:08
fungii think it's acceptable, but also wonder if it's much less work than an ansible playbook that installs the mariadb container we use elsewhere and launching a server15:09
fricklerI think it is fine, too. bonus if you make sure to start after the periodic-weekly pipeline is done15:10
corvus1yeah, if we can fold a migration to self-hosted in, that would be ideal; just not sure how fast we can cobble together that change15:10
fungii'm starting to look at it because i can't help myself, but really shouldn't be since i'm up against several other deadlines15:13
fungii guess the main things we need are a service-zuul-db playbook that includes the mariadb role and our other standard roles, custom firewall rules allowing query access from the zuul schedulers, some rudimentary testinfra test(s)... what else?15:15
corvus1yes -- except there is no mariadb role because we don't have any standalone mariadbs15:15
corvus1so that has to start as a copy of, say, the gerrit role with a bunch of stuff removed15:15
fungioh, yes i totally missed that and assumed we had already made a shared mariadb role, but i see we haven't15:16
fungiinstead we just embed mariadb container configuration in every service that needs one15:16
corvus1yep15:16
* fungi had started from a copy of the etherpad role but gerrit would have also worked yes15:17
corvus1if you're doing that, i can launch the server15:17
fungibut yeah, maybe we just do the migration to another trove on saturday. i don't think i can commit to writing and debugging this before the weekend15:18
corvus1ok i'll stand down then :)15:18
fungithe scope isn't substantial, but it's more than i have time for before next week at the earliest15:18
clarkbhttps://gerrit-review.googlesource.com/c/gerrit/+/417857 progress15:18
fungiyay!15:19
fungiand with that, i'm disappearing for lunch but shouldn't be more than an hour tops15:19
clarkbI also understand the issue much better now. Basically there was a new feature added that allowed a full offline reindex to start from a checkpoint state (possibly precreated before you do an upgrade); this keeps deltas small and speeds up your "full" offline reindex. However, there was a bug and in some cases (I think when you did not create the checkpoint state, which is non-default)15:28
clarkbit would completely delete the changes index15:28
clarkbthen you start gerrit and panic. It turns out that if you rerun a full reindex from that state it works because it's starting from 0. That means the workaround is to simply rerun the reindex15:28
clarkbbut this was under-documented and obtuse, and when you're in a "gerrit is basically completely unusable" state you're not likely to find that path forward15:28
clarkbanyway that revert pulls out the functionality and then 3.10 (current master) has reimplemented it in a different more robust way15:29
clarkbThe other thing that was called out is that SAP apparently has hit what they think may be a race between C git and JGit during repacking of large repos. It sounds like packed-refs ends up getting truncated and then the repo is unusable. They were able to restore from backups though16:00
clarkbApparently one installation has for many years (like a decade) made a hard link to packed-refs before running gc, then only removes the hard link if things check out cleanly. We may want to investigate doing this in our system. (it was nasser saying they do that, so we can follow up if it seems like a good idea)16:01
clarkbSAP seemed to think it is extremely unlikely to happen though (and they don't even know that it is a race between c git and jgit; that is just a theory).16:02
clarkbsounds like their gerrit install has many more changes than ours and much larger repos involved16:02
opendevreviewJames E. Blair proposed opendev/system-config master: Add a standalone zuul db server  https://review.opendev.org/c/opendev/system-config/+/91507916:08
corvus1fungi: clarkb ^ i dropped and added a single index on the new server on the artifacts table and it took over an hour, which is kind of spooking me about using this trove db.  so i went ahead and tried to throw together a self-hosted change.16:10
clarkbcorvus1: you did that on the old server or the new test one? (or maybe both?)16:11
corvus1new one; i don't have a comparable time for the old server, so i can't say whether it's slower or not16:14
clarkbcorvus1: posted some quick thoughts on that change. But lgtm16:17
clarkbwell other than addressing those minor things I mean16:18
opendevreviewJames E. Blair proposed opendev/system-config master: Add a standalone zuul db server  https://review.opendev.org/c/opendev/system-config/+/91507916:24
corvus1clarkb: thanks!16:24
clarkbcorvus1: looks like you've got the secrets file open so gpg is telling me to go away (I presume for the trove stuff)16:26
clarkbif you don't need it anymore there is an edit I'd like to make this morning16:26
corvus1yep, i'll exit now16:26
clarkbthe email about the password change that infra-root just got was me16:34
corvus1i'm launching an 8gb performance flavor in dfw for zuul-db0116:36
corvus1on jammy16:36
fungiokay, back now16:37
fungiand reviewing the db server change, awesome!16:40
clarkboh maybe the passwd change email only went to me. Maybe we should update that contact email. One thing at a time :)16:41
fungiyeah, we talked about that when i was doing the mfa stuff for it too16:42
opendevreviewJames E. Blair proposed opendev/zone-opendev.org master: Add zuul-db01  https://review.opendev.org/c/opendev/zone-opendev.org/+/91508216:45
clarkbcorvus1: on the db role change I think maybe we need to add the groups file for testing to the big list that gets copied by ansible to set up the test bridge?16:46
opendevreviewJames E. Blair proposed opendev/system-config master: Add zuul-db01  https://review.opendev.org/c/opendev/system-config/+/91508316:46
clarkbI'm trying to dig that up and will get a link16:47
clarkbcorvus1: https://opendev.org/opendev/system-config/src/branch/master/playbooks/zuul/run-base.yaml#L115 needs an entry there. I'll leave a gerrit comment for historical reasons too16:48
corvus1clarkb: found it16:48
corvus1yep16:48
opendevreviewJames E. Blair proposed opendev/system-config master: Add a standalone zuul db server  https://review.opendev.org/c/opendev/system-config/+/91507916:49
opendevreviewJames E. Blair proposed opendev/system-config master: Add zuul-db01  https://review.opendev.org/c/opendev/system-config/+/91508316:49
corvus1clarkb: fungi i just ran the same drop/add on my local mysql8 db with a slightly older copy of the opendev db and it finished in 19min.  so i think it's safe to say that the trove mysql8 is not optimal, but it won't be clear if self-hosting will be an improvement until we test there.16:52
clarkback16:52
fungiit definitely sounds like a promising data point though16:52
corvus1i need to run some errands -- if ya'll can continue pushing on the mariadb thing, that would be much appreciated16:53
fungiabsolutely, thanks!!!16:55
opendevreviewClark Boylan proposed opendev/system-config master: Rebuild the etherpad container image  https://review.opendev.org/c/opendev/system-config/+/91508416:55
opendevreviewJames E. Blair proposed opendev/system-config master: Set standalone mariadb innodb buffer pool to 4G  https://review.opendev.org/c/opendev/system-config/+/91508517:01
corvus1okay really leaving now :)17:01
opendevreviewJeremy Stanley proposed opendev/system-config master: Add a standalone zuul db server  https://review.opendev.org/c/opendev/system-config/+/91507917:51
clarkbfungi: I think you need to rebase the other two changes on that too?17:54
clarkbor maybe you want to see check pass first? that's fine I guess17:54
fungiyeah, i wasn't eager to rebase those until i see it passing17:54
fungijust in case there are other surprises lingering17:55
fungibut will do, for sure, once it's all good17:55
Clark[m]Thanks. I'm working on lunch now. It got cold again so I'm making a quick dashi to do some ramen18:19
fungiyum! what base are you using? shiitake? kombu? bonito? niboshi? some combination of those?18:21
Clark[m]Kombu and katsuobushi (bonito)18:22
fungisoooo gooooooooood!18:22
fungioiishi18:23
fungier, oishii i meant18:23
Clark[m]Nothing fancy, just putting something together with what I've got laying around. Noodles were bought fresh but had one serving left hiding in thebfreezert18:25
Clark[m]*the freezer18:25
fungiwe end up with a lot of shiitake dashi from rehydrating dried mushrooms for other dishes, excellent for reuse18:26
fungiless so recently since we've been growing our own shiitake though18:28
fungigiven my irc nick you'd think i would have at least attempted mushroom farming earlier in life, but i've discovered it's surprisingly easy18:30
Clark[m]I've debated buying one of the kits. I feel like I would end up killing them like I do the plants in my yard18:31
fungikits are mostly for educational purposes and not a sustainable way to farm18:36
fungilonger term you can just grow shiitake on billets of oak in a dark place like your basement, root cellar or crawlspace18:36
fungithe wood needs to stay damp but not soaked, and you just harvest the mushroom growth from them periodically18:37
fungihttps://zuul.opendev.org/t/openstack/build/60d3f05987e34125b88c1cdbe8a85ad919:16
fungiwhat am i missing?19:16
fungithe change has:19:16
fungiassert mariadb_log_file.contains('mariadb: ready for connections')19:16
fungioh, wait, that's a buildset for the old patchset19:17
fungicheck hasn't reported on the new patchset19:17
fungiguess i found an outdated notification in my inbox19:17
fungithough oddly, it was sent 5 minutes ago19:18
fungioh, it's for a child change, not the one i updated19:19
fungiokay, less confused now19:19
fungithough the new patchset is failing in a new way19:20
fungi"Apr  4 18:24:24 zuul-db99 docker-mariadb[15314]: 2024-04-04 18:24:24 0 [Note] mariadbd: ready for connections." https://zuul.opendev.org/t/openstack/build/e9eb0810ce1c45c995bcea46b6655405/log/zuul-db99.opendev.org/containers/docker-mariadb.log#3619:23
fungihttps://zuul.opendev.org/t/openstack/build/e9eb0810ce1c45c995bcea46b6655405/log/job-output.txt#54619-5462219:24
corvus1ha i see it19:24
corvus1i'll fix19:24
* fungi sighs19:24
fungiwhat did i miss?19:24
opendevreviewJames E. Blair proposed opendev/system-config master: Add a standalone zuul db server  https://review.opendev.org/c/opendev/system-config/+/91507919:25
corvus1fungi: you're gonna love it19:25
fungizomg19:26
fungihow did i not notice that extra d? did i not cut and paste? i guess not!19:26
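For reference, a minimal sketch of what the corrected testinfra assertion could look like, assuming testinfra's host fixture and a hypothetical log path; the actual test in system-config may differ.

```python
# Sketch of a testinfra check for the standalone mariadb container,
# matching the daemon's actual name ("mariadbd", with the extra d).
# The log path below is a hypothetical placeholder.

def test_mariadb_ready(host):
    log = host.file("/var/log/containers/docker-mariadb.log")
    assert log.exists
    # The daemon logs "mariadbd: ready for connections." once it is up.
    assert log.contains("mariadbd: ready for connections")
```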
corvus1dbdbdbdbd19:26
fungifeels like it should be a friday19:27
corvus1i want to start a db project called pqdb19:27
fungiand name the process pqdbdbd19:27
corvus1and the query client pqdbdbdpq19:28
fungii'd use it as often as i was able to type it19:29
opendevreviewJames E. Blair proposed opendev/system-config master: Add zuul-db01  https://review.opendev.org/c/opendev/system-config/+/91508319:29
opendevreviewJames E. Blair proposed opendev/system-config master: Set standalone mariadb innodb buffer pool to 4G  https://review.opendev.org/c/opendev/system-config/+/91508519:29
fungi::19:32
fungimy window manager is unusually squirrelly today19:32
fungiguess i'll take the opportunity for a package upgrade. running out of ways to procrastinate on paperwork19:33
clarkbit's really maria db d ?19:55
clarkbthat's like the equivalent of a typing tongue twister19:55
corvus1yup for realz20:04
fungireally for reals20:24
corvus1first change is looking good, but third change failed with this failure which seems spurious: https://zuul.opendev.org/t/openstack/build/7504595c5a474e7a81bcbf57f62c9f2620:32
corvus1(and related to the zk host)20:33
corvus1i'm going to recheck that but we should keep that in mind if it shows up again20:33
clarkbya looks like the test node lost networking?20:34
clarkb++ to a recheck20:34
corvus1clarkb: fungi https://review.opendev.org/915082 can you +3 that?20:34
fungithat's my interpretation as well20:34
fungicorvus1: done20:35
corvus1i have created secret creds for it on bridge, so i think all pieces are in place, just awaiting merge20:37
opendevreviewMerged opendev/zone-opendev.org master: Add zuul-db01  https://review.opendev.org/c/opendev/zone-opendev.org/+/91508220:38
opendevreviewMerged opendev/system-config master: Add a standalone zuul db server  https://review.opendev.org/c/opendev/system-config/+/91507921:21
opendevreviewMerged opendev/system-config master: Add zuul-db01  https://review.opendev.org/c/opendev/system-config/+/91508321:21
corvus1that's enough to get started; once that deploys i'll manually make the config change, start the db, and start an import21:25
fungiinfra-prod-base deploy failed for 915079, checking into the logs to see why21:34
fungizuul-db01.opendev.org      : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=021:36
funginot sure what that's all about, i can sudo ssh into it from bridge21:37
fungi"Data could not be sent to remote host "104.239.240.24". Make sure this host can be reached over ssh: Host key verification failed."21:38
fungi`sudo ssh -4 zuul-db01.opendev.org` is also working with no interaction, so it's not a host key mismatch on the v4 addy or anything like that21:39
fungishould we just reenque the change into deploy?21:40
corvus1fungi: yeah, i agree with all that.  any chance you have a buildset link?21:45
corvus1fungi: actually there is a giant deploy happening now, i guess we can just let it run?21:46
corvus1fungi: oh weird, actually i would not have expected 079 to try to contact the host because it wasn't added until 915083, which is the change after it, and the one that's running now21:49
corvus1i'm not sure why it thought it had a zuul-db01 in inventory since it hadn't been added to the inventory yet.... :?21:49
fungicorvus1: oh, we've seen this before21:49
corvus1but at any rate, i think we can expect 083 to work since at least everything should definitely be in place then21:49
fungiapprove changes together and the deploy works off the state of everything that merged before it started21:50
fungiwhich fails for the penultimate change because it's operating off state that isn't relevant yet21:50
fungihappened to me when i added lists01.opendev.org now that i think about it21:51
fungiif we'd waited to approve the inventory addition until after the prior change deploy had finished, it would have been fine21:51
fungibut the inventory addition showed up early during the parent change's deploy21:51
corvus1oh because deploy is a change pipeline, not ref-updated21:52
fungizactly21:52
fungiso not a show-stopper, but worth thinking about whether there's a way to solve that short-term race i suppose21:53
fungiother than just delaying later approvals, that is21:54
fungibecause we'll never remember to do that21:55
clarkbbut shouldn't ssh still work?21:55
fungiit's possible we didn't add the host key as known at that point21:56
clarkbI can understand there may be an ordering thing going on, but if the inventory has the node and we have ssh set up (launch node does this) it should still be able to connect, right?21:56
clarkboh right the host key comes out of the inventory and we may have needed an earlier step to apply that21:56
fungimy guess is it's a known_hosts challenge, right21:56
clarkbI wonder if bootstrap bridge does that21:57
fungihard to say for sure because the logs are a bit opaque21:57
clarkband it can run at the same time as other playbooks? that might be the source of the bug21:57
fungipossible we're skipping a necessary job with changed file filters, yep21:57
fungior at least not running it first21:57
clarkbya I think that is it (either of those two scenarios or both)21:58
fungiwhich we might also be able to solve another way if we could make openssh ignore missing known_hosts entries as long as there's a matching sshfp result21:59
fungithat was the original intent with sshfp records, but openssh upstream backpedaled after dnssec failed to gain traction22:00
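The real fix being discussed is on the openssh/deployment side, but as a rough illustration of the idea, here is a sketch of checking a host key against SSHFP DNS records in Python, assuming dnspython is available; record types, field names, and the hostname are assumptions for illustration.

```python
import base64
import hashlib

import dns.resolver


def sshfp_matches(hostname: str, host_key_b64: str) -> bool:
    """Return True if a SHA-256 SSHFP record matches the given host key blob."""
    # SSHFP fingerprint type 2 is the SHA-256 digest of the raw key blob.
    digest = hashlib.sha256(base64.b64decode(host_key_b64)).digest()
    for answer in dns.resolver.resolve(hostname, "SSHFP"):
        if answer.fp_type == 2 and answer.fingerprint == digest:
            return True
    return False


# Hypothetical usage: host_key_b64 would come from the server's public key file.
# print(sshfp_matches("zuul-db01.opendev.org", host_key_b64))
```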
corvus1the letsencrypt job failed (don't know why yet) but that's a stroke of luck in that it skipped 21 jobs and the zuul-db deploy is next since it doesn't depend on it.22:25
clarkb'ansible.vars.hostvars.HostVarsVars object' has no attribute 'letsencrypt_certcheck_domains'22:35
clarkbhowever, in the task just two-ish tasks prior it records the value of that variable, so I don't know why that failed22:35
clarkbsome sort of bug in fact recording?22:35
opendevreviewMerged opendev/system-config master: Set standalone mariadb innodb buffer pool to 4G  https://review.opendev.org/c/opendev/system-config/+/91508522:45
corvus1clarkb: but nb02 is not in that list...22:45
corvus1okay that's some serious ansible wizardry to build the list of domains and it's failing at the nb02 step22:47
corvus1there's a comment in that task file that makes me think maybe this happens occasionally22:50
corvus1iptables on zuul-db01 looks reasonable22:51
opendevreviewJames E. Blair proposed opendev/system-config master: Mariadb: listen on all IP addresses  https://review.opendev.org/c/opendev/system-config/+/91509622:57
corvus1clarkb: fungi ^ one more thing; i have that in place manually and i think that's it.22:57
fungioh, good catch. for the mailman3 role i configured it to only work on the loopback22:59
corvus1fungi: clarkb we need to put /var/mariadb/db on /opt -- how should we do that?23:16
fungioh, for more disk space? hmm..23:17
corvus1should i just change the docker-compose to use /opt instead of /var/mariadb?  or do a bind mount... or...?23:17
corvus1ya23:17
corvus1seems like maybe just moving the docker-compose volume mounts might be easiest/best?23:17
fungiyeah, i think for other things we've done something like /opt/mariadb and then fiddled with fstab and cinder volumes if we deploy in another provider23:18
fungiclarkb: ^ does that sound right?23:18
opendevreviewJames E. Blair proposed opendev/system-config master: Move standalone mariadb to /opt  https://review.opendev.org/c/opendev/system-config/+/91509823:20
corvus1since it's easy, i've made that change locally on the server.  i'm going to put it into emergency for now so it won't be reverted23:33
fungisounds good. i would have approved it, but would appreciate clarkb's input once he's back23:34
corvus1yeah, it's super easy to undo if we want something else23:34
Clark[m]Usually we bind mount the drive to something in /var23:35
Clark[m]But the end result is the same other than the path. Etherpad is an example of this iirc23:35
corvus1is that in ansible, or was that just done manually?23:36
Clark[m]I think it's done with launch node flags telling it what to do with the ephemeral drive?23:37
Clark[m]Or with a volume via launch node23:37
corvus1 /dev/main/main-etherpad02  /var/etherpad/db  ext4  errors=remount-ro,barrier=0  0  223:37
corvus1that's looking like lvm23:37
fungioh, so we mount the ephemeral disk to /var/something in those cases?23:38
corvus1well, at least in etherpad's case, we got an extra volume and lvm'd it and mounted that; it's not actually using /opt23:38
fungiyeah, on etherpad02 we have /dev/xvdb1 as a pv and then make a logical volume on it23:38
Clark[m]That's a volume; xvde is ephemeral23:39
fungiright, we're using a cinder volume in that case, mainly so that we have some insurance beyond backups23:39
corvus1want i should make a volume for zuul-db02 and mount it at /var/mariadb ?23:40
corvus1mimic etherpad?23:40
corvus1might be nice to have /opt for scratch space for giant sql files anyway :)23:40
fungii suppose it's a question of whether we 1. need more space than the ephemeral disk provides, 2. might consider detaching and attaching to a different server in the future23:41
fungior yeah, 3. want to be able to use the ephemeral disk for something else entirely23:41
corvus11: not today, but we'll use ~half of it i think; 30 out of 60g  maybe more23:42
corvus12. probability significantly higher than 023:42
corvus1and 3 -- yeah, if we share it, we will be nearly out of space if we make a single mysqldump.  so yeah, cinder has some things going for it.  :)23:42
fungisounds good to me. also we could probably use some of our ssd quota for that rather than the default sata, for added performance23:45
corvus1yep23:45
corvus1VolumeManager.create() got an unexpected keyword argument 'backup_id'23:45
corvus1i got that when i ran volume create... :(23:45
corvus1do i need to use a certain venv?23:45
fungiwe don't need it to be huge. probably the 100gb minimum rackspace requires would suffice23:45
corvus1the one in launcher-venv does not work23:47
tonybcorvus1: yes we do.   I think it's /home/fungi/xyzy23:47
tonybsomething like that.   history should help23:48
fungiheh, um...23:48
corvus1tonyb: thanks!  different error: The plugin rackspace_apikey could not be found23:48
corvus1so i think fungi's secret venv has bitrotted since the mfa stuff and we need a new one!23:48
tonybahhh that be new due to the MFA stuff23:48
fungiwe can probably fix that by installing the rackspace client plugin into that venv23:48
corvus1fungi: if you want -- i won't touch it since it's in your homedir :)23:49
fungii just installed rackspaceauth into that xyzzy venv23:50
fungitry again?23:50
corvus1fungi: success, thanks!23:51
fungibut really, we should figure out why /usr/launcher-venv doesn't work for that23:51
corvus1other openstack volume commands work in the global env, only the create fails with the backup_id thing23:52
fungiyeah, i had previously only tested things like volume list23:54
fungiso didn't realize the main venv wasn't able to create new volumes23:55
corvus1okay all done, and change abandoned23:57
corvus1removed host from emergency23:58
