Wednesday, 2024-01-31

clarkbfungi: ok posted a review.00:04
tkajinamI wonder if https://review.opendev.org/c/openstack/project-config/+/905976 can be moved forward02:06
tonybtkajinam: Looks good to me.02:08
tkajinamtonyb, thanks !02:09
fungiclarkb: thanks!02:11
opendevreviewMerged openstack/project-config master: Add puppet-ceph-release right for special stable branch handling  https://review.opendev.org/c/openstack/project-config/+/90597602:46
tonybI've done more poking on the inmotion cloud and it looks like there are instances in the nova_api database that are deleted in the nova_cell0 database which explains the mismatch.  I've reached out in openstack-nova for some help and will keep prodding there05:59
tonybI think it's just a matter of missed cleanups but I'd like some help from nova to make sure I do it right.06:00
tonybWhile working on it I may need to set the various hypervisors to disabled in a rolling fashion but I don't think that's any worse than what we have right now.06:01
fricklertonyb: ack, thx for digging through this06:30
opendevreviewJan Marchel proposed openstack/project-config master: Add new components to NebulOuS project: prediction-orchestrator, exn-middleware, overlay-network-agent  https://review.opendev.org/c/openstack/project-config/+/90706008:50
*** liuxie is now known as liushy08:57
*** zigo_ is now known as zigo09:43
*** ykarel_ is now known as ykarel10:00
opendevreviewMerged openstack/project-config master: Add new components to NebulOuS project: prediction-orchestrator, exn-middleware, overlay-network-agent  https://review.opendev.org/c/openstack/project-config/+/90706013:17
*** d34dh0r5| is now known as d34dh0r5315:01
opendevreviewJeremy Stanley proposed opendev/system-config master: Upgrade to Keycloak 23.0  https://review.opendev.org/c/opendev/system-config/+/90714115:21
*** d34dh0r53 is now known as d34dh0r5|16:01
*** d34dh0r5| is now known as d34dh0r5316:01
fungiclarkb: i think i addressed all your comments on ^ and the keycloak job is still passing (buildset just hasn't reported yet)16:22
clarkback I'll rereview shortly16:22
fungino rush, just making sure you're aware16:22
clarkbI just need to finish doing local updates and catch up on emails16:23
fungii hear ya, that sounds like the first 3 hours of my day16:23
clarkbalso tea which I now have16:24
fungioh! yes i'm overdue for a cup myself, thanks for the reminder16:24
clarkbfungi: one thing that occurred to me is I'm not sure if the keycloak service has db backups yet (or backups at all). Would be a good followup post redeployment to ensure that is added16:32
clarkbI don't think we need to bundle it in the deployment though since we aren't using it for anything critical yet16:32
fungiclarkb: yes, i added a note about db backups on the etherpad, i was unsure if that was something i could include in the initial change or if it needed to be a followup after deployment16:33
clarkbI think you can do it all together, but that is a lot of moving parts and something that might end up getting discarded if we redo it16:33
fungii don't think it has any persistent data other than in the database, so unless we also want to backup its logs (maybe not a bad idea for forensic reasons) the db backup should be sufficient for disaster recovery16:34
clarkbI think we get the system backups with the exclusions by default then we can add db on top. Not sure if the current roles allow you to just do the db16:35
clarkbalso the change lgtm now16:35
fungireally, remote logging (or local worm) would be best for forensics, but not something to worry about for now16:36
fungii wonder what worm drive options there might be. something tells me that's not a common thing in cloud providers16:37
clarkbaroo?16:38
fungi"write once read many"16:39
fungisometimes called "append-only"16:39
fungiapparently amazon glacier has a worm option16:39
clarkbah. "worm drive" I think gears and stuff16:39
fungihah, yes that's also a type of gear. i had to replace one in my stand mixer recently16:40
fungii suppose the modern solution is cryptographic approaches for tamper-evident logging, i.e. merkle-damgård16:44
fungie.g., you progressively add each line to an iterative hash of the previous line16:45
fungibut then you still have to put the hashes somewhere they can't be tampered with16:46
fungichained hashes16:47
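
The chained-hash idea fungi describes can be sketched roughly as follows. This is a minimal illustration, not what syslog-ng or any other tool actually does: the function names (`chain_hashes`, `first_mismatch`), the choice of SHA-256, and the empty seed are all assumptions made for the example. Each digest covers the current line plus the previous digest, so editing or removing one line changes every digest after it.

```python
import hashlib


def chain_hashes(lines, seed=b""):
    """Return one hex digest per line; each digest covers the line and the previous digest."""
    digests = []
    prev = hashlib.sha256(seed).digest()  # hypothetical starting value for the chain
    for line in lines:
        prev = hashlib.sha256(prev + line.encode("utf-8")).digest()
        digests.append(prev.hex())
    return digests


def first_mismatch(lines, digests, seed=b""):
    """Return the index of the first line that no longer matches its digest, or None."""
    expected = chain_hashes(lines, seed)
    for index, (want, got) in enumerate(zip(expected, digests)):
        if want != got:
            return index
    if len(expected) != len(digests):
        # Lines were appended or truncated relative to the recorded digests.
        return min(len(expected), len(digests))
    return None


log = ["login alice", "sudo -i", "logout alice"]
recorded = chain_hashes(log)
tampered = ["login alice", "sudo -l", "logout alice"]
print(first_mismatch(log, recorded))       # untouched log
print(first_mismatch(tampered, recorded))  # edited second line
```

As noted in the discussion, this only makes tampering evident if the recorded digests themselves are stored somewhere the attacker cannot rewrite.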
fungihmm... apparently syslog-ng has something along those lines: https://man.archlinux.org/man/secure-logging.7.en16:49
fungithough that also encrypts them16:49
fungibut ultimately, the most thorough solution is to just stream logs in near-real-time to another system and try to make sure that the place you send your logs is unlikely to get compromised even if someone manages to tamper with the sending system and tries to hide their tracks by editing or removing logs16:55
fungiso yeah, there's no real magic solution like the old-school worm enforced at the hardware level (or even older school logging to greenbar on an impact printer attached to a serial line to a different locked room/building)16:57
fungiwhat was great was when the admins would mindlessly just toss piles of that into an unsecured dumpster, and you could laboriously read through looking for places where someone accidentally typed their password at the username prompt16:58
fungii mean, not that i ever did that or anything16:59
fungion closer inspection, this version of keycloak seems to do all its logging to stdout and gets captured in the container log, so we can drop that extra mount i think17:07
fungion the host filesystem for the held node, /var/log/keycloak/ is entirely empty17:08
clarkbthat may be another change between jboss and wildfly17:09
clarkbin that case I think the update to have syslog consume it for us is fine and we can probably drop the dir and mount for the log dir?17:09
fungiyeah, that's what i'm thinking17:10
fungiminor concern though, there's still h2 databases in the container. i'll check for signs it's actually using sql17:10
fungipossible i've got the envvars wrong17:11
fungiyeah, there are no tables in the keycloak database17:15
fungiresorting to cloning the source to dig for confirmation the envvar names are correct, but wow this is not a small repo17:29
fungiworst case we can probably just map in our own https://github.com/keycloak/keycloak/blob/main/quarkus/dist/src/main/content/conf/keycloak.conf and set values directly there17:29
fungi605mb just checking out the main branch17:30
clarkbfungi: https://www.keycloak.org/server/containers has different vars17:31
fungithere are build-time and run-time envvars17:31
fungipretty sure those are the options to set when building your own image17:31
clarkblooks like instead of an address we have to give it a full jdbc connection string?17:31
clarkboh weird17:32
clarkbfungi: further down in that page they provide the db info as args to the start command17:32
clarkbunder running a standard keycloak container. Maybe ditch the env vars and use the command line instead?17:32
fungiit looks like DB_VENDOR may have changed to just DB? https://github.com/keycloak/keycloak/blob/main/quarkus/config-api/src/main/java/org/keycloak/config/DatabaseOptions.java17:32
fungiand yeah, i considered switching to the cli opts, since we already have several we're supplying anyway17:33
clarkbhttps://mariadb.com/kb/en/about-mariadb-connector-j/ has jdbc url example for mariadb17:33
fungii'm going to fiddle with the held node a bit and see what works17:33
jrosseri have some examples of this if youre interested17:33
jrosserwe run HA keycloak and mariadb17:33
fungijrosser: oh really? yes please!17:33
jrosser`db-url=jdbc:{{ keycloak_jdbc_provider }}://{{ keycloak_jdbc_haproxy_vip }}:{{ keycloak_jdbc_db_port }}/{{ keycloak_db_name }}`17:34
jrosserfrom the conf file17:34
jrosseransible, of course so those are our vars17:34
fungijrosser: also https://review.opendev.org/c/opendev/system-config/+/907141/11/playbooks/roles/keycloak/templates/docker-compose.yaml.j2 is what we'd tried up to this point17:34
clarkbfungi: looking at that file I agree DB appears to be the var to set the high level type17:34
clarkbbut it isn't clear to me if those are read as env vars17:35
clarkb--db=postgres is in the first example link I provided so they seem to map to cli args at least17:35
fungiclarkb: the other common envvars like DB_PASSWORD turned up in that file17:35
fungiso just a hunch17:35
fungii'm going to break for lunch and then start fiddling around a bit17:36
jrosserfungi: ours is installed from distro packages so we template out the conf file17:37
jrosserbut we have recently done a massive series of upgrades bringing it to a pretty new version17:37
fungijrosser: yeah, like i said earlier, we can also just map our own conffile into the container if we want17:55
fungibut having some semi-stable api (more stable than tracking changes to their default config file) would be preferable if we can work it out17:56
opendevreviewJeremy Stanley proposed opendev/system-config master: Upgrade to Keycloak 23.0  https://review.opendev.org/c/opendev/system-config/+/90714118:35
opendevreviewJeremy Stanley proposed opendev/system-config master: DNM: Fail keycloak testing for an autohold  https://review.opendev.org/c/opendev/system-config/+/90660018:35
fungiclarkb: jrosser: ^ apparently they template out the db url with sub-options18:35
clarkbthat simplifies things18:35
fungiso you can do --db-url-host, --db-url-port, --db-url-database...18:35
fungithe tricky bit is that --db-url-host gets stuck straight into the jdbc url string, which is colon-delimited, so you have to include [] if using raw ipv6 addresses18:36
fungi--db-url-host=::1 didn't work (and returned odd errors about the port), which confused me initially, until i realized it was reading that as a null host and null port18:37
fungi--db-url-host=[::1] worked a treat though18:38
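
The bracket problem described above can be illustrated with a small sketch. It assumes the host value is dropped verbatim into a URL of the shape `jdbc:<vendor>://<host>:<port>/<database>`, which matches the template jrosser pasted but is not confirmed to be exactly how Keycloak builds the string. The helper names `jdbc_host` and `jdbc_url` are hypothetical.

```python
import ipaddress


def jdbc_host(host):
    """Wrap raw IPv6 literals in [] so the colons are not read as host:port separators."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not a bare IP literal: a hostname, or an already-bracketed address.
        return host
    return f"[{host}]" if addr.version == 6 else host


def jdbc_url(vendor, host, port, database):
    """Assemble a JDBC URL in the assumed vendor://host:port/database shape."""
    return f"jdbc:{vendor}://{jdbc_host(host)}:{port}/{database}"


print(jdbc_url("mariadb", "::1", 3306, "keycloak"))
print(jdbc_url("mariadb", "127.0.0.1", 3306, "keycloak"))
```

Without the brackets, `::1:3306` is ambiguous, which is consistent with the odd port errors fungi saw for `--db-url-host=::1`.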
fungii also added a test to confirm we have expected initial database content18:39
JayFI may have just backported an ironic fix of a similar shape :-|18:39
JayFalthough I think we found some things want [] and some break with []18:39
fungiwhat's old is old again18:39
clarkband docker just refuses to understand both versions18:39
clarkband podman refuses to change in solidarity with docker18:39
fungibecause the podman folks love docker so very, very much18:40
JayFAt least all our API mistakes are our API mistakes. It has to be annoying to be chasing someone else's API18:40
fungiJayF: only when it's undocumented18:40
fungiwhich, you know, is most of the time18:41
clarkbJayF: the frustrating thing as an end user is that podman is not compatible with docker in a bunch of different ways18:41
JayFI have18:41
clarkbbut for some reason ipv6 literal support is not one of the ways they can differ18:41
JayF**I have lots of opinions about podman, and none of them include "this is a good idea that influenced tech in a positive way". I prefer someone be incompatible rather than 90% there18:41
fungiit's mainly annoying that they clearly chose to be incompatible with docker in some ways, but then refuse to acknowledge clear bugs with the excuse that they want to be bug-compatible with docker18:42
* fungi has cake and eats it too18:43
JayFYeah, this is the pattern you get trapped in if you chase someone else's API18:43
clarkbnow I want cake18:44
fungithe cake is a lie18:50
opendevreviewClark Boylan proposed opendev/system-config master: Update to etherpad 1.9.6  https://review.opendev.org/c/opendev/system-config/+/90734918:57
fungiwow, database test worked on the first try!18:57
clarkbnice18:58
fungithe new held keycloak test node is 104.239.230.3119:00
fungialso no h2 files in the running container this time19:01
fungiwhich i considered adding a test for, but figured checking mariadb for content was sufficient19:01
clarkbya and I think h2 dbs can be used as caches (gerrit does something like this)19:05
clarkbso as long as the permanent data ends up in mariadb we should be good19:05
fricklerfwiw I'm seeing packet loss and sometimes-slow-responses from review.o.o like SvenKieske did earlier (in #*-kolla)19:13
clarkbI'm seeing very minimal loss within my isp before packets jump out of our AS but nothing beyond that19:18
fungifrickler: ipv4 i guess? you're presumably still not able to reach it at all over ipv619:21
clarkbI've started to try and add some more depth to the pre ptg etherpad19:23
fungifor me, ping -4 from home to review.o.o looks like: 100 packets transmitted, 100 received, 0% packet loss, time 99725ms rtt min/avg/max/mdev = 66.104/74.671/431.658/36.152 ms19:23
fungiping -6 is surprisingly a little better: 100 packets transmitted, 100 received, 0% packet loss, time 99140ms rtt min/avg/max/mdev = 54.896/57.737/68.236/2.460 ms19:26
fricklerfungi: yes, v6 is still unreachable19:31
JayF75.196/79.345/89.269/4.652 ms from here over v4, it looks good/normal to me19:33
* JayF has no v619:33
fungii'm doing some tests from our mirror server in france, since that's the geographically closest network to germany i have access to, though i doubt it's following similar routes19:36
fungii could also boot one in vexxhost's warsaw region but that's not really any better19:37
fungiipv6: 100 packets transmitted, 100 received, 0% packet loss, time 99136ms, rtt min/avg/max/mdev = 80.353/81.795/99.545/3.724 ms19:41
fungiipv4: 100 packets transmitted, 100 received, 0% packet loss, time 99154ms, rtt min/avg/max/mdev = 80.125/80.971/98.320/2.549 ms19:41
fungifairly consistent from there19:41
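
For comparing ping summaries like the ones pasted above, a small parser can help. This is only a sketch: it assumes the Linux iputils summary wording seen in this log ("packets transmitted", "rtt min/avg/max/mdev"), and the function name `parse_ping_summary` is made up for the example.

```python
import re

# Patterns match the summary text as it appears in the pasted output.
LOSS_RE = re.compile(r"(\d+) packets transmitted, (\d+) received, ([\d.]+)% packet loss")
RTT_RE = re.compile(r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms")


def parse_ping_summary(text):
    """Extract packet counts, loss percentage and rtt statistics (in ms) from a ping summary."""
    loss = LOSS_RE.search(text)
    rtt = RTT_RE.search(text)
    if loss is None or rtt is None:
        raise ValueError("unrecognized ping summary")
    rtt_min, rtt_avg, rtt_max, rtt_mdev = (float(value) for value in rtt.groups())
    return {
        "sent": int(loss.group(1)),
        "received": int(loss.group(2)),
        "loss_pct": float(loss.group(3)),
        "min": rtt_min,
        "avg": rtt_avg,
        "max": rtt_max,
        "mdev": rtt_mdev,
    }


v4 = parse_ping_summary(
    "100 packets transmitted, 100 received, 0% packet loss, time 99725ms "
    "rtt min/avg/max/mdev = 66.104/74.671/431.658/36.152 ms"
)
v6 = parse_ping_summary(
    "100 packets transmitted, 100 received, 0% packet loss, time 99140ms "
    "rtt min/avg/max/mdev = 54.896/57.737/68.236/2.460 ms"
)
print(v4["mdev"] > v6["mdev"])  # True: the v4 sample has far more jitter than v6
```

A high mdev with zero loss, as in the v4 sample, points at occasional slow responses rather than dropped packets.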
jrosserfrickler: do you see where they are lost with mtr?19:42
fungiwe can also install the mtr package on review.o.o to get the reverse path for comparison, since cases like this quite often involve an asymmetric route somewhere and you've got a 50% chance to see failures misattributed to the first hop where they diverge19:44
fricklerjrosser: seems to be only the final two hops, so either the vexxhost link is full or something going wrong on the return path19:51
fricklertoo bad mnaser isn't around any more most of the time to look at things from the inside19:52
fungiwhen you see failures like that close to a provider edge, odds are you're dealing with an asymmetric route and the loss is somewhere on the way back19:52
mnaseri'm around, but honestly, there's not much we can do with zayo, i've filed endless tickets with them19:52
fungiwe can run mtr from review.o.o to see where it errors19:52
mnaseri'm playing ping pong with them and it's just a matter of having the contract lapse and recommending no one to ever touch their stuff :)19:53
fungii definitely don't envy you, nor do i miss chasing backbone provider problems19:53
mnaserit turns out after all the internet is a series of tubes19:54
fungivery leaky ones at that19:54
mnaserits not a big truck19:54
fungimost of the troubles i remember would end up being two backbone providers who couldn't agree on who was responsible for upgrading the capacity on their peering with one another, so they'd just point fingers and let customers suffer until one of them eventually caved and added more circuits19:55
fungiour bgp tables were a never-ending churn of pads and prefs to try to work around the worst offenders, but there was only so much we could do19:56
jrosserjust now everything i can look at (not at my work laptop) goes via cogent and looks OK19:56
fricklermnaser: sorry to hear that. though from my traceroutes, both directions seem to be via cogent. and tbh I've heard more bad stories about cogent than zayo, but who knows20:14
mnaserfrickler: historically cogent has been the bad guy, but surprisingly they got their act together20:14
fricklerfungi: would we install mtr by hand or do we need to add it to the automation somewhere?20:15
fricklermnaser: well not in terms of their connectivity to german telekom it seems20:16
fungifrickler: i would just manually `sudo apt install mtr` but i wouldn't object to adding it and similar diagnostic tools to our default set if others are in favor20:18
fricklerI went for mtr-tiny in order to avoid installing like 100 X11 libraries20:20
tonybyeah I think adding it to the defaults is good.20:21
fricklerbut having that as default tool together with things like tcpdump and nc is a good idea20:21
tonybalso maybe jq?20:21
fricklerjq is also good, yes20:21
tonybis nmap too much?20:22
fricklerhmm ... at least questionable I'd say, too easy to do unwanted things with it20:22
fricklerI can look into a patch tomorrow, eoding for now20:24
fungiyeah, i don't see nmap as being in the same category as those other things20:24
opendevreviewClark Boylan proposed opendev/system-config master: DNM force etherpad failure to hold node  https://review.opendev.org/c/opendev/system-config/+/84097220:38
clarkbput a hold in place for ^ after a successful test run on the parent20:39
fungilooks like you should have a held node for it now21:24
opendevreviewJames E. Blair proposed openstack/project-config master: DNM: test syntax error  https://review.opendev.org/c/openstack/project-config/+/90736221:44
clarkb173.231.255.107 is the held node and I'm in the clarkb-test etherpad if you want to help test21:49
clarkbchrome is doing the random reconnect thing we've seen in the past but seems to work otherwise21:52
clarkbif others can't find issues I think this is probably a safe update21:52
clarkbcuriously chrome and firefox render that reddish color differently21:53
*** blarnath is now known as d34dh0r5322:17
clarkbI missed that https://discuss.python.org/t/what-to-do-about-gpus-and-the-built-distributions-that-support-them/7125 is a thing pypi is actually looking at now22:20
clarkbthis same issue is what ultimately led to us turning off our pypi mirroring22:20
fungiyeah22:21
clarkbit seems like the fundamental issue is that CUDA isn't packaged in a way that is consumable as a dependency so everyone bundles it22:23
clarkbkind of surprising to me that very little of the discussion seems to have gone down the path of "stop allowing cuda to do this to us"22:24
fungistockholm syndrome22:25
clarkbnvidia is making large buckets of money in large part due to the success of cuda + python22:26
clarkbits crazy to me that investing a small amount of that into making the packaging of the software not suck seems insurmountable22:26
clarkbI guess at the end of the thread there is talk of the cudapython lib which does some of that22:27
clarkbexcept that those bindings are different than the ones everyone is already using22:27
fungiheld etherpad lgtm22:28
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Add zuul-tenant-conf-check role/job  https://review.opendev.org/c/zuul/zuul-jobs/+/90736322:35
clarkbI've got a doctor appointment tomorrow morning but maybe we upgrade etherpad when I get back22:44
clarkbtonyb: looks like fungi reviewed the meetpad stack too if we want to start merging some of those. I think most of them are safe as they don't try and replace anything yet?22:45
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Add zuul-tenant-conf-check role/job  https://review.opendev.org/c/zuul/zuul-jobs/+/90736322:46
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Add zuul-tenant-conf-check role/job  https://review.opendev.org/c/zuul/zuul-jobs/+/90736322:57
fungisgtm23:05
tonybclarkb:  Sounds good.  I'll add a comment to the first review to address your question23:11
tonybAlso FWIW: I'm slowly removing the stuck nodes from inmotion23:12
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Add zuul-tenant-conf-check role/job  https://review.opendev.org/c/zuul/zuul-jobs/+/90736323:14
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Add zuul-tenant-conf-check role/job  https://review.opendev.org/c/zuul/zuul-jobs/+/90736323:24
clarkbtonyb: what process did you end up using for cleaning up the stuck nodes?23:43
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Add zuul-tenant-conf-check role/job  https://review.opendev.org/c/zuul/zuul-jobs/+/90736323:55
