clarkb | last call on meeting agenda content. I'll get that sent out shortly | 00:16 |
---|---|---|
tonyb | nothing from me | 00:17 |
fungi | i have nothing else to suggest either | 00:18 |
clarkb | ok I'll send it nowish then | 00:21 |
opendevreview | Tony Breeds proposed openstack/diskimage-builder master: Add a tool for displaying CPU flags https://review.opendev.org/c/openstack/diskimage-builder/+/937836 | 04:52 |
opendevreview | Joel Capitao proposed openstack/project-config master: Authorize packstack-core to force push to remove branch https://review.opendev.org/c/openstack/project-config/+/937792 | 08:27 |
opendevreview | Joel Capitao proposed openstack/project-config master: Authorize packstack-release to delete https://review.opendev.org/c/openstack/project-config/+/937792 | 08:41 |
*** ykarel_ is now known as ykarel | 12:28 | |
NeilHanlon | clarkb, fungi: at this time Rocky is not intending to diverge from the x86-v3 baseline that RH has decided on for v10.. though if a SIG wanted to, I'd support it... we're already planning on having RISC-V support for Rocky 10 via a SIG, so... | 15:02 |
clarkb | NeilHanlon: I doubt we'd drive such an effort. Mostly just curious what sorts of choices others are making | 15:15 |
karolinku[m] | if after updating this CPU flags info it appears that x86-v3 is available, would you consider creating a label which would work with C10? | 15:17 |
clarkb | karolinku[m]: I think that depends a lot on what we discover. My primary concerns are that availability will be extremely limited, and as mentioned yesterday I feel like that is ok for minor features that are minimally used, which projects can live without simply by having longer jobs, vs an entire platform that only works on more specialized hardware | 15:28 |
clarkb | I also have a concern that this effectively means centos 10 can only run on what is likely our most performant hardware | 15:29 |
clarkb | it's one thing to turn off nested virt support and force projects to use emulation or rewrite tests to approach the problem another way. It is another to say "you can't have centos 10 stream anymore today because a cloud is no longer present" | 15:30 |
clarkb | disk use of gitea09 and paste seems stable even with the logging changes (that's good) | 17:08 |
clarkb | I'll see how things are looking after my morning of meetings and we can probably proceed with a gerrit restart at that point | 17:08 |
opendevreview | Merged openstack/project-config master: Authorize packstack-release to delete https://review.opendev.org/c/openstack/project-config/+/937792 | 17:20 |
opendevreview | Tony Breeds proposed openstack/diskimage-builder master: Add a tool for displaying CPU flags https://review.opendev.org/c/openstack/diskimage-builder/+/937836 | 19:00 |
tonyb | karolinku[m]: Can you rebase your CS-10 testing patch on top of ^^ | 20:04 |
tonyb | karolinku[m]: note my change doesn't *do* anything to ensure x86_64-v3 nodes but it makes it very clear if you got one. | 20:04 |
clarkb | tonyb: karolinku[m] I would continue to use the nested virt labels for now since that constrains the problem space a bit and it would be good to know, among those clouds, which are not capable | 20:08 |
clarkb | then once we've understood that subset we can go to the wider labels | 20:08 |
tonyb | clarkb: you wanted me to include KVM info alongside the cpu-level. Are you thinking capturing the output of `qemu-kvm --version` and `qemu-system-$(arch) --version` would be adequate? | 20:09 |
clarkb | ya that seems adequate. But we can also probably just check the packaged versions too and not have an explicit check | 20:10 |
clarkb | 7.2 and newer can emulate haswell (though I don't know what the underlying cpu requirements for that are) | 20:10 |
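A minimal sketch of the kind of node introspection being discussed, not the actual diskimage-builder tool from change 937836; the ld.so path and the qemu binary names are assumptions that vary by distro.

```bash
#!/bin/bash
# Hedged sketch: report the x86-64 microarchitecture level and qemu versions
# so a job log makes it obvious what kind of node it landed on.

# glibc 2.33+ ld.so can report which x86-64-v* hwcaps levels the CPU supports.
LDSO=/lib64/ld-linux-x86-64.so.2   # assumed path; differs on some distros
"$LDSO" --help 2>/dev/null | grep -E 'x86-64-v[0-9]' || true

# Fall back to the raw CPU flags if ld.so does not report hwcaps.
grep -m1 '^flags' /proc/cpuinfo

# qemu/KVM versions as discussed above; either binary may be absent.
qemu-kvm --version 2>/dev/null || true
"qemu-system-$(arch)" --version 2>/dev/null || true
```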
tonyb | clarkb: Yup I agree, I also have in my head a poorly formed request about adding a label explicitly for "foundational support" for RHEL-10-like distros. NOT for general CI of those OSs | 20:12 |
tonyb | so we'd know that we can build images, but not actually build and add them to our clouds | 20:12 |
tonyb | with my thinking being that as we do get clouds and adequate quota, so that these machines are more common, we could potentially add CI | 20:14 |
tonyb | although I haven't thought much about if that's actually helpful | 20:14 |
Clark[m] | tonyb: for the foundational stuff I wondered if we could just qemu emulate haswell to check a build and not do functests | 20:21 |
Clark[m] | We already emulate those VM boots for checking iirc | 20:21 |
Clark[m] | So it shouldn't be any slower | 20:21 |
tonyb | Clark[m]: That's certainly possible | 20:22 |
clarkb | infra-root there is nothing in the openstack gate or release pipelines. I've just notified the release team that I'll be restarting Gerrit shortly | 20:55 |
clarkb | if there are no objections I'll start on that in a couple of minutes by sending a notice first then figure out the commands I need to run while that posts | 20:56 |
tonyb | no objection from me | 20:57 |
clarkb | I'm not pulling new images or anything like that, just a docker-compose down; mv the waiting queue task files for the replication plugin; then docker-compose up -d | 20:58 |
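A rough sketch of that down/mv/up sequence; the compose directory and the replication plugin's waiting-queue path are assumptions, not taken from the actual server layout.

```bash
cd /etc/gerrit-compose               # hypothetical docker-compose directory
docker-compose down
mkdir -p /home/gerrit2/tmp
# set aside the replication plugin's persisted waiting-queue task files
mv /home/gerrit2/review_site/data/replication/ref-updates/waiting \
   /home/gerrit2/tmp/replication-waiting-$(date +%F)     # assumed path
docker-compose up -d
```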
clarkb | #status notice Gerrit will be restarted to pick up a small configuration update. You may notice a short Gerrit outage. | 20:59 |
opendevstatus | clarkb: sending notice | 20:59 |
-opendevstatus- NOTICE: Gerrit will be restarted to pick up a small configuration update. You may notice a short Gerrit outage. | 21:00 | |
opendevstatus | clarkb: finished sending notice | 21:03 |
clarkb | ok proceeding with the restart now. There is a root screen on review02 if anyone else is interested but this should be quick | 21:03 |
clarkb | web ui is up but I'm still waiting for diffs (this is expected) | 21:05 |
clarkb | the /var/log/containers content for the db container appears to have updated as expected | 21:06 |
clarkb | still waiting for files/diffs | 21:09 |
JayF | how long does it usually take for the diffs to start back working again? | 21:12 |
clarkb | usually about 5 minutes, it's going long this time but I'm not seeing anything yet indicating why | 21:13 |
clarkb | it does prune caches on startup and I suspect that is related | 21:14 |
clarkb | seems like it may have stopped responding too? | 21:15 |
JayF | I'm seeing the same. | 21:15 |
clarkb | arg I don't understand what is going on yet; the error log is largely devoid of anything related | 21:15 |
clarkb | there are some ssh timeouts | 21:15 |
clarkb | oh now the error log is logging rejected connections over http so that explains that at least | 21:16 |
clarkb | not the underlying cause but it is logging it | 21:16 |
clarkb | from what I can see gerrit is rejecting http connections so then the apache proxy is responding with 502 bad gateways | 21:20 |
clarkb | gerrit show-queue shows a gerrit_file_diff pruning task started around the restart | 21:22 |
clarkb | I suspect this is related to the lack of diffs | 21:22 |
clarkb | however it isn't clear to me yet why this has snowballed into gerrit rejecting http connections. maybe all of us trying to load diffs filled up all the http slots? | 21:22 |
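For reference, the inspection commands being discussed (show-queue above, show-caches a few lines below) run over Gerrit's admin SSH interface; the host and account here are placeholders.

```bash
# list pending/running server tasks, grouped by queue
ssh -p 29418 admin@review.opendev.org gerrit show-queue --wide --by-queue
# cache hit rates and sizes; can be slow on a busy server, as seen below
ssh -p 29418 admin@review.opendev.org gerrit show-caches
```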
corvus | clarkb: i'm around and can take a look | 21:23 |
clarkb | corvus: thanks. the gerrit_file_diff cache is 61GB on disk last | 21:24 |
clarkb | s/last// | 21:24 |
clarkb | the last time we pruned it was 0100 and gerrit says it was 3GB at that time | 21:25 |
clarkb | however this is an h2 db so the on disk size might be much larger than the actual cache content size? | 21:25 |
hashar | yup :) | 21:25 |
hashar | unless you get it vacuumed from time to time | 21:25 |
clarkb | no vacuuming that i know of | 21:25 |
clarkb | there are a number of other stuck tasks in the queue as well so I'm guessing that things got stuck in gerrit running startup tasks and that is leading to other things not proceeding | 21:26 |
hashar | do you have some monitoring in place? | 21:27 |
clarkb | hashar: we don't use the prometheus plugin type stuff if that is what you are asking. And I think we removed that one java monitoring plugin when log4shell happened | 21:28 |
clarkb | (possibly bad) ideas: we could stop gerrit and start it again to clear out the tasks and/or try manually killing tasks | 21:28 |
clarkb | if we stop and start gerrit again we could potentially move the git file diff cache aside to see if bringing that up clean is happier | 21:28 |
hashar | the prometheus plugin and the grafana dashboard maintained somewhere upstream have very nicely helped us | 21:29 |
corvus | i ran a show-caches command a few mins ago; it hasn't returned yet | 21:29 |
clarkb | corvus: ya that one is slow | 21:29 |
corvus | clarkb: do you mean we removed java melody? | 21:29 |
hashar | show-caches runs a full gc iirc | 21:29 |
clarkb | corvus: yes java melody was the one I couldn't remember the name for | 21:29 |
clarkb | looks like gerrit is responsive now fwiw | 21:30 |
corvus | losing javamelody is sad; it has been very helpful to get a stack trace when something was stuck... :/ | 21:30 |
clarkb | I wonder if the show caches running a gc tripped something over into working again? or it could be that we just needed to wait | 21:30 |
clarkb | corvus: agreed but it also did scary things with the logging methods iirc so we removed it at the time. We could probably add it back now? | 21:30 |
corvus | well my show-caches hasn't returned yet | 21:30 |
clarkb | gerrit has cleaned up the logging stuff quite a bit | 21:30 |
JayF | My diff loads (well, at least once) now, fwiw. | 21:31 |
corvus | i don't see a significant difference in the queue | 21:31 |
clarkb | corvus: ya the show queue output looks very similar to when it was being sad | 21:32 |
hashar | for the H2 backed cache: the database will grow over time and get fragmented as bits are written/removed from it | 21:32 |
corvus | so if it's responsive, i'm guessing it's just slowly working through a backlog while still performing the pruning | 21:32 |
clarkb | hashar: is the suggestion that you delete the backing files occasionally? | 21:32 |
hashar | on startup, Gerrit only allows 200ms to compact the database, which is certainly not long enough, and if the host does not restart often it is essentially never cleaned | 21:32 |
corvus | "does not restart often" matches here :) | 21:33 |
hashar | on our setup we had been transferring the caches over and over and had massive multi-GB caches which eventually one day ended up filling the disk | 21:33 |
hashar | I probably spent two weeks of my life debugging it, exactly two years ago | 21:33 |
hashar | my fix was to set `-Dh2.maxCompactTime=15000` which sets the time allowed to compact the db to 15 seconds | 21:34 |
hashar | and after some restart the files were smaller | 21:34 |
hashar | I wrote about my debugging at https://phabricator.wikimedia.org/phame/post/view/300/shrinking_h2_database_files/ | 21:34 |
hashar | the raw journal is in https://phabricator.wikimedia.org/T323754 | 21:35 |
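One way hashar's fix could be wired in, assuming the default review_site layout; gerrit.config is git-config formatted, so git config can edit it, and 15000 ms mirrors the value above. A Gerrit restart is needed for container.javaOptions changes to apply.

```bash
# assumed path to gerrit.config; value is the compaction budget in milliseconds
git config -f /home/gerrit2/review_site/etc/gerrit.config \
    --add container.javaOptions "-Dh2.maxCompactTime=15000"
```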
clarkb | hashar: did the problems with those caches eventually lead to slow startup times like we observed? | 21:35 |
hashar | but I don't think we had any performance issue. The files were just super large | 21:35 |
clarkb | I'm wondering if this is related or we may have a different more urgent problem that we need to debug first then get back to the gerrit stuff | 21:35 |
clarkb | * get back to the gerrit cache stuff | 21:35 |
hashar | I don't think that slowed the startup times | 21:35 |
hashar | at least I don't remember that to have been an issue | 21:35 |
hashar | nor that anything has improved after having the files compacted | 21:36 |
hashar | our diff caches went from 12G and 8.2G down to 0.5G | 21:36 |
clarkb | looks like index tasks for some changes show up in the queue then go away | 21:36 |
corvus | clarkb: i'm going to attempt to get a stack trace in the container | 21:37 |
clarkb | but the ones at the top of the queue listing from around when we restarted are slow | 21:37 |
clarkb | corvus: ok | 21:37 |
clarkb | *from around when we restarted are slow and still in there | 21:37 |
corvus | loving "ps: command not found" | 21:38 |
hashar | (for the H2 cache, Gerrit 3.10 has a system running the H2 cache pruning on a daily basis at 1:00 (UTC I think) https://gerrit-review.googlesource.com/Documentation/config-gerrit.html#cachePruning | 21:38 |
clarkb | there is a growing set of errors in the error log for a user attempting to push to starlingx/metal | 21:38 |
clarkb | but it almost looks like they tried to push and got an error, but then gerrit eventually caught up and now they get "no new changes" errors? | 21:39 |
corvus | clarkb: /home/gerrit2/review_site/logs/jstack.log | 21:40 |
clarkb | corvus: looks like the gerrit_file_diff is RUNNABLE in that list | 21:42 |
clarkb | system load has shot back up and I think http is unhappy again | 21:42 |
clarkb | though maybe less unhappy than before; seems things eventually loaded for me, just slowly | 21:42 |
corvus | i'm worried that there may be a deadlock... | 21:43 |
clarkb | corvus: the queue dropped down quite a bit actually | 21:43 |
corvus | lemme put some things together | 21:43 |
clarkb | ack | 21:43 |
clarkb | gerrit_file_diff and git_file_diff are quite large on disk | 21:45 |
clarkb | everything else appears to be under 10GB | 21:45 |
corvus | i think a lot of the index commands are waiting on my show-caches command to finish | 21:47 |
corvus | so that's clearly a dangerous command to run :( | 21:47 |
corvus | that thread is runnable though; it's apparently reading the h2 db. | 21:48 |
clarkb | corvus: ah ok your show-cache isn't showing up anymore and the index tasks are no longer in there | 21:48 |
corvus | oh yep it finished | 21:48 |
clarkb | so theory time: the cache pruning is very expensive on very large backing files and things may get caught up in that? | 21:48 |
clarkb | I suspect though don't know for sure that we can remove the cache files and have gerrit start those cache files over again | 21:49 |
clarkb | hashar: ^ do you know? | 21:49 |
clarkb | the git_file_diff backing file is even larger | 21:49 |
hashar | gerrit_file_diff and git_file_diff hold `git diff` output | 21:50 |
clarkb | but I'm kind of thinking it may be prudent to stop gerrit, move those files aside / delete them, then start gerrit back up again | 21:50 |
hashar | or something like that, so they are necessarily super large | 21:50 |
clarkb | hashar: ya but 61GB and 222GB large? | 21:50 |
clarkb | the actual data in them is about 3GB | 21:50 |
hashar | oh | 21:50 |
hashar | is the cache pruning showing in the show-queue output or the jstack? | 21:51 |
clarkb | hashar: both | 21:51 |
hashar | https://gerrit-review.googlesource.com/Documentation/config-gerrit.html#cachePruning | 21:51 |
hashar | looks like it default to be enabled on startup | 21:51 |
hashar | so potentially cachePruning.pruneOnStartup=false ? | 21:51 |
clarkb | ya and before 3.10 it only ran on startup | 21:51 |
clarkb | well we want to prune, I think, to try and keep disk usage under control over time | 21:51 |
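For reference only, since pruning is still wanted here: hashar's suggested toggle would look roughly like this in gerrit.config (key name as given above; check the linked config-gerrit docs for the exact spelling).

```bash
# hypothetical: skip pruning at startup, keeping only the daily 01:00 run
git config -f /home/gerrit2/review_site/etc/gerrit.config \
    cachePruning.pruneOnStartup false
```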
corvus | i did another jstack dump in jstack-2.log | 21:52 |
corvus | the disk cache pruner thread stack is different, so it's doing something :) | 21:52 |
clarkb | ok that is good to confirm | 21:53 |
hashar | my fix was to raise it to 15 seconds to let it prune the caches (`java -Dh2.maxCompactTime=15000`) | 21:53 |
hashar | and I haven't looked at whether that had any effect with the newer "pruneOnStartup" | 21:53 |
clarkb | hashar: I think that is a good followup though I am concerned it won't be sufficient with things being as large as they are | 21:53 |
clarkb | and instead wondering if we can just delete the caches. I want to say we can and gerrit generates new empty caches on startup | 21:54 |
hashar | true yes | 21:54 |
clarkb | but probably a good idea to move the cache aside rather than delete it | 21:54 |
clarkb | then once gerrit is up and happy delete it | 21:54 |
corvus | clarkb: i think you are correct; i think all the h2 stuff can be considered ephemeral and i'm fairly sure an individual missing h2 db would be created empty. | 21:54 |
clarkb | I'm kinda thinking we should maybe take the hit of moving the files aside then given their size and apparent impact on startup | 21:55 |
hashar | there is one h2 backed db which is not a cache though | 21:55 |
hashar | I think the one storing whether a given file has been reviewed | 21:55 |
clarkb | oh we're processing git_file_diff according to show queue now so it is done with gerrit_file_diff | 21:55 |
corvus | yep | 21:55 |
clarkb | hashar: we store that one in mariadb | 21:55 |
corvus | and the web ui is pretty responsive | 21:55 |
hashar | so it took like 20 minutes to clear the git_file_diff? | 21:56 |
hashar | +1 on mariadb :) | 21:56 |
clarkb | hashar: more like 45 minutes for gerrit_file_diff I think | 21:56 |
hashar | has the file become smaller at least? | 21:56 |
clarkb | no | 21:57 |
clarkb | corvus: do you think stopping gerrit, moving the two massive diff caches aside then starting gerrit again is something we should try and if so should we do that nowish? | 21:58 |
clarkb | it does feel like this is related to gerrit becoming overly occupied with cache maintenance on startup leading to an inability to process other tasks/requests | 21:58 |
clarkb | I suspect that growth of those files will be more constrained in the future too since we have daily pruning now when before it was only pruning on startup | 22:00 |
corvus | clarkb: i think we're going to get out of this eventually and the system is sufficiently operable at the moment that we don't need to do that. | 22:01 |
corvus | but maybe taking the hit of that now with a few days before holidays sets us up to be more resilient if there is another problem later? | 22:02 |
clarkb | corvus: that is/was one of my thoughts though I suspect if there is a reason to restart gerrit later we can simply bundle the cache moves into that effort | 22:03 |
corvus | basically -- i'd say let's do that friday, if friday weren't like the last day anyone would be around for a while. so maybe now is better if we want to just nip it in the bud. | 22:03 |
clarkb | ya I think the main reason to do it now would be to see that we can restart gerrit safely with a known process before we have holidays | 22:03 |
corvus | i'm on board with that line of thinking | 22:03 |
tonyb | ++ | 22:03 |
corvus | clarkb: all clear from me whenever you want to do that | 22:04 |
clarkb | ok the process I'm thinking is we down gerrit, move the replication waiting queue aside, move the gerrit_file_diff and git_file_diff files out of the cache and into /home/gerrit2/tmp (this prevents them from being backed up but is the same fs so it should be an immediate mv of the large files), then up gerrit | 22:05 |
clarkb | let me get commands written down for all that then we can send a notice and try again | 22:05 |
corvus | oh look there's a tempfile | 22:05 |
corvus | -rw-r--r-- 1 gerrit2 gerrit2 237595340800 Dec 17 22:05 git_file_diff.h2.db | 22:05 |
corvus | -rw-r--r-- 1 gerrit2 gerrit2 1517463392 Dec 17 21:50 git_file_diff.698130932.385.temp.db | 22:05 |
corvus | i wonder if that can be used to gauge progress | 22:06 |
corvus | anyway -- main thing i was looking for is to "mv gerrit__diff." out of the way -- to make sure we get all the related files. | 22:07 |
corvus | uh not sure if that made it through the bridge right, but you get the idea. | 22:07 |
clarkb | ya basically get the lock file and the trace file and the tempfile if present | 22:07 |
corvus | ++ | 22:08 |
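Putting the plan together as a sketch; paths are assumptions, and the globs are deliberately broad so the .h2.db, .lock.db, .trace.db and *.temp.db files corvus mentions all move together. Gerrit recreates empty caches on startup.

```bash
cd /etc/gerrit-compose                               # hypothetical compose dir
docker-compose down
mkdir -p /home/gerrit2/tmp
# replication waiting queue, as in the earlier restart (assumed path)
mv /home/gerrit2/review_site/data/replication/ref-updates/waiting /home/gerrit2/tmp/
# move every file belonging to the two oversized diff caches out of the cache dir
mv /home/gerrit2/review_site/cache/gerrit_file_diff.* \
   /home/gerrit2/review_site/cache/git_file_diff.* /home/gerrit2/tmp/
docker-compose up -d
```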
clarkb | someone want to work on sending a status notice? I should be ready by the time that gets through | 22:09 |
clarkb | actually I think I'm ready now | 22:10 |
clarkb | how about #status notice You may have noticed the Gerrit restart was a bit bumpy. We have identified an issue with Gerrit caches that we'd like to address which we think will make this better. This requires one more restart | 22:11 |
hashar | +1 :) | 22:11 |
corvus | ++ | 22:11 |
clarkb | ok sending that now | 22:11 |
clarkb | #status notice You may have noticed the Gerrit restart was a bit bumpy. We have identified an issue with Gerrit caches that we'd like to address which we think will make this better. This requires one more restart | 22:11 |
opendevstatus | clarkb: sending notice | 22:11 |
-opendevstatus- NOTICE: You may have noticed the Gerrit restart was a bit bumpy. We have identified an issue with Gerrit caches that we'd like to address which we think will make this better. This requires one more restart | 22:12 | |
hashar | fun: your git_file_diff has a diskLimit of 2G and gerrit_file_diff 3G | 22:12 |
clarkb | hashar: ya that was based on a single day size | 22:12 |
clarkb | which is what the docs suggest | 22:12 |
hashar | the cache pruning that happens on startup does some magic sql queries to keep those caches under those limits | 22:13 |
clarkb | well we had the default limits which were 128MB iirc then we did a restart a day after a prior restart and based the sizes on the reported prune size from the second restart | 22:13 |
hashar | if you have the h2 files on disk that are severely larger than those (230G and 61G), then my guess is you suffered from the same issue I have encountered: the database needs to be compacted | 22:13 |
hashar | which is a different mechanism than the cachePruning one | 22:13 |
clarkb | right, it's the content vs the backing file problem. But now pruning happens daily, which is new in 3.10, so maybe that will keep it under control | 22:14 |
corvus | it would do that only if pruning also compacts the db? | 22:14 |
hashar | that is only keeping the data under those 2G and 3G limits | 22:14 |
corvus | i think hashar is talking about something like postgres vacuuming | 22:14 |
hashar | the compaction is at a lower level (that is the H2 driver itself) | 22:15 |
hashar | yeah same as vacuuming | 22:15 |
opendevstatus | clarkb: finished sending notice | 22:15 |
corvus | so we'd need a "compact h2" system cron job | 22:15 |
hashar | I knew of the concept after Sqlite Vacuum | 22:15 |
hashar | and eventually found H2 has the exact same logic but named Compact | 22:15 |
hashar | when Gerrit connects to the H2 databases through the java driver, the driver does compact upon connection | 22:16 |
clarkb | ok proceeding | 22:16 |
hashar | for up to 20 ms | 22:16 |
corvus | oh that's the timeout you mentioned | 22:16 |
hashar | or up to system property `h2.maxCompactTime` ms | 22:16 |
hashar | yeah sorry I wasn't clear | 22:16 |
hashar | so in 20 ms it can't vacuum much | 22:16 |
corvus | no you were clear :) | 22:16 |
hashar | I gave an arbitrary 15 seconds value, restarted Gerrit some times and eventually the file got smaller | 22:17 |
corvus | so right after clarkb clears out the db file would be a really good time to bump that property | 22:17 |
hashar | and if you have been carrying those h2 cache files over and over as we did | 22:18 |
hashar | then I guess you had the exact same issue I have encountered :b | 22:18 |
hashar | ever growing caches! | 22:18 |
clarkb | oh oops it's already coming back | 22:18 |
hashar | which makes me regret not having pushed that further upstream to have a nice solution implemented | 22:18 |
clarkb | but we can deal with that some time later as in theory we have time for that | 22:18 |
hashar | have you nuked both h2 files?* | 22:18 |
corvus | clarkb: yeah i think we just do that today/tomorrow or maybe next year :) | 22:19 |
corvus | but that's a system-config change and i think we don't want to rush it | 22:19 |
clarkb | ++ | 22:19 |
clarkb | I have confirmed that new h2 backing files and lock files were created | 22:19 |
clarkb | and show-queue looks much much cleaner | 22:21 |
clarkb | and I can see diffs | 22:21 |
hashar | congratulations! | 22:22 |
clarkb | the disk cache pruning things that show up in show-queue are not running; they are the 01:00 scheduled tasks I believe | 22:22 |
clarkb | and distinct from the startup gerrit tasks which have all completed as far as I can tell | 22:22 |
corvus | yep i saw some startup cache pruning that is done now | 22:23 |
clarkb | I'm making notes now to followup with deletion of the large h2 backing files from the tmpdir and to look at the compaction option hashar mentioned | 22:25 |
clarkb | then I'll push an update to my podman docker compose change to make it mergeable | 22:25 |
clarkb | which should be a good exercise of gerrit overall | 22:25 |
corvus | i'll delete the jstack logs | 22:27 |
clarkb | hashar: any idea if this is something upstream knows about? It does seem like h2 is a bad caching implementation if it grows indefinitely | 22:27 |
hashar | I guess I will investigate the size of those cache files. At a quick glance Wikimedia runs with the default of 10mb in memory and whatever the default is for disk based | 22:28 |
clarkb | I think I saw there was someone looking at an alternative caching implementation I wonder if that is better | 22:28 |
hashar | I might have mentioned it on the upstream Discord yeah | 22:28 |
hashar | my guess is Google is using something else | 22:28 |
hashar | but Sap surely relies on H2 | 22:28 |
clarkb | also fwiw I had originally thought that the caches being too small is what led to diffs not loading quickly on startup, but now I'm wondering if that is purely related to the backing files not pruning quickly | 22:28 |
clarkb | and so in theory smaller caches are actually better? | 22:28 |
hashar | so maybe worth filing a task for it or at least have the driver to compact things | 22:29 |
hashar | no clue | 22:29 |
clarkb | we may want to revert the changes to increase the cache sizes and clear out all of the h2 caches and reset our baselines | 22:29 |
clarkb | (I don't think this is urgent) | 22:29 |
hashar | my guess is one wants to look at the hit ratio | 22:30 |
opendevreview | Clark Boylan proposed opendev/system-config master: Run containers on Noble with docker compose and podman https://review.opendev.org/c/opendev/system-config/+/937641 | 22:31 |
clarkb | I think ^ should be mergeable now though it won't do anything until we have followups that run on noble. Probably also want to sanity check that the jobs it does run don't try to install and use podman | 22:32 |
hashar | I am off, congratulations on fixing the Gerrit caches! | 22:32 |
clarkb | hashar: thank you for the help! | 22:33 |
hashar | you are very welcome, I am always happy to help here | 22:34 |
hashar | I guess take note of the magic compact time for next year | 22:35 |
clarkb | yup I have written a note in my notes file locally | 22:35 |
hashar | and do note https://phabricator.wikimedia.org/phame/post/view/300/shrinking_h2_database_files/ | 22:35 |
clarkb | done added the link to my notes | 22:36 |
hashar | also the compaction time should be able to be passed to the java database driver, based on my comment at https://phabricator.wikimedia.org/T323754#8419742 | 22:36 |
hashar | java -cp h2-1.3.176.jar org.h2.tools.Shell -url 'jdbc:h2:file:.git_file_diff;IFEXISTS=TRUE;MAX_COMPACT_TIME=15000' | 22:37 |
hashar | sql> SHUTDOWN COMPACT; | 22:37 |
hashar | The file went from 9G to 302MB \o/ | 22:37 |
hashar | which is a HACKY way to compact it manually :b | 22:37 |
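Cleaned up from hashar's lines above: an offline way to compact a copy of one of these cache databases. This assumes Gerrit does not have the file open and that the copy sits in the current directory; the JDBC URL points at the file minus its .h2.db suffix.

```bash
# open a copy of git_file_diff.h2.db with a generous compaction budget
java -cp h2-1.3.176.jar org.h2.tools.Shell \
    -url 'jdbc:h2:file:./git_file_diff;IFEXISTS=TRUE;MAX_COMPACT_TIME=15000'
# then at the sql> prompt:
#   SHUTDOWN COMPACT;
```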
clarkb | hashar: did you do that while gerrit was running? | 22:37 |
hashar | nop | 22:37 |
hashar | locally | 22:37 |
hashar | err I mean independently | 22:38 |
hashar | I think I copied the whole cache file on my local machine to ease debugging | 22:38 |
clarkb | ya that makes sense; seems likely that would conflict with gerrit h2 operations. especially since they set the default compaction time so short, it must be something you don't want happening while stuff is running | 22:38 |
hashar | but potentially Gerrit could learn yet another setting that would be passed down to the jdbc connection url | 22:38 |
hashar | but I did not look into that since the java property was a quick/good enough fix | 22:39 |
clarkb | makes sense thank you for the pointers | 22:39 |
hashar | at least I wrote a blog post! :b | 22:39 |
hashar | what I wonder is why you would have the issue only triggering now | 22:40 |
clarkb | hashar: I'm guessing the size went over some threshold that caused us to not trim within the timeout of some request or client and then that snowballed into many many requests | 22:41 |
clarkb | it is interesting that Gerrit started almost immediately with no file diff delay etc after moving those files aside | 22:41 |
clarkb | makes me feel more confident this was related | 22:41 |
hashar | most probably yeah | 22:42 |
hashar | the cache pruning tries to keep your caches below 2G/3G apparently. Maybe it does not run that often | 22:43 |
hashar | anyway, no cache, no slowness :) | 22:43 |
hashar | you are probably set for the next ten years | 22:43 |
clarkb | ha indeed | 22:43 |
hashar | I am off for real! | 22:43 |
clarkb | goodnight | 22:43 |
clarkb | timburke: I'm just now seeing your comments on the swift container deleter script that I wrote. Sorry, I have ignored that mostly because more important things have been fighting for my attention. I'll try to get back to it at some point as we still have need for it, and thank you for the feedback | 22:45 |
clarkb | infra-root I've confirmed that I can push a change, comment on a change, search changes, view diffs and do other web ui actions. I've also fetched the new patchset I pushed from gitea, implying replication is working | 22:47 |
clarkb | the only thing I haven't seen yet is a change merge | 22:47 |
clarkb | I don't have any changes that are in a good spot to merge right this moment. Does anyone else have a change we can merge as a canary? | 22:48 |
tonyb | sorry I don't think I do | 22:50 |
corvus | actually yes | 22:55 |
corvus | +3 https://review.opendev.org/935970 | 22:55 |
clarkb | awesome thanks! | 22:56 |
clarkb | corvus: hrm that change isn't showing up in zuul | 22:57 |
clarkb | but zuul says it is starting the jobs so it must've seen it? | 22:57 |
clarkb | oh that's opendev/zuul-jobs not zuul/zuul-jobs | 22:58 |
clarkb | I see it now | 22:59 |
clarkb | also mordred's wandertracks has a couple of changes in the gate queued up forever, I wonder what is with that | 22:59 |
clarkb | https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_46d/937641/13/check/system-config-run-review-3.10/46d66cc/bridge99.opendev.org/ara-report/results/147.html I think that shows gerrit system-config-run jobs properly selecting the defaul.yaml from my updated install-docker role and not the noble specific file | 23:34 |
clarkb | I see it installing docker-compose too and not docker compose | 23:34 |
clarkb | corvus: the opendev tenant also has some stuck promote jobs for an opendev/base-jobs merge from a month ago. I suspect we can simply evict those changes and the wandertracks ones and move on, but before we attempt that I wanted to make sure you didn't want to try and debug this | 23:36 |
clarkb | though I suspect log files may have rolled over | 23:37 |
opendevreview | Clark Boylan proposed opendev/zuul-jobs master: Compress raw/vhd images https://review.opendev.org/c/opendev/zuul-jobs/+/935970 | 23:47 |
clarkb | corvus: ^ there was a post run failure and I think that should fix it | 23:47 |
clarkb | I ended up using https://review.opendev.org/c/opendev/sandbox/+/934997 to test merging. That worked and it replicated here: https://opendev.org/opendev/sandbox/commits/branch/master | 23:49 |