Wednesday, 2022-04-06

opendevreviewMerged openstack/project-config master: Retire openstack-health project: end project gating  https://review.opendev.org/c/openstack/project-config/+/83670700:01
opendevreviewGhanshyam proposed openstack/project-config master: Retire openstack-health projects: remove project from infra  https://review.opendev.org/c/openstack/project-config/+/83670900:09
opendevreviewGhanshyam proposed openstack/project-config master: Retire opendev/puppet-openstack_health: set noop jobs  https://review.opendev.org/c/openstack/project-config/+/83671000:31
opendevreviewGhanshyam proposed openstack/project-config master: Retire opendev/puppet-openstack_health: remove project from infa  https://review.opendev.org/c/openstack/project-config/+/83671200:39
*** ysandeep|out is now known as ysandeep05:22
*** jpena|off is now known as jpena07:36
noonedeadpunkclarkb: regarding zuul queues, I have 2 questions. 1. Should we calculate the time the project was consuming in the gate before we switched to shared queues, to compare how much time we started wasting with that approach? 2. Can I just define that in project-config?08:20
*** raukadah is now known as chandankumar08:29
*** ysandeep is now known as ysandeep|lunch09:08
opendevreviewElod Illes proposed openstack/project-config master: Add ansible-collection-kolla to projects.yaml  https://review.opendev.org/c/openstack/project-config/+/83676309:49
*** ysandeep|lunch is now known as ysandeep10:03
*** rlandy_ is now known as rlandy10:19
opendevreviewMarcin Juszkiewicz proposed openstack/project-config master: Build and publish wheel mirror for CentOS Stream 9  https://review.opendev.org/c/openstack/project-config/+/83679310:24
opendevreviewMarcin Juszkiewicz proposed openstack/project-config master: Add centos-stream-9-arm64 nodes  https://review.opendev.org/c/openstack/project-config/+/83679610:34
opendevreviewMarcin Juszkiewicz proposed openstack/project-config master: Remove CentOS 8 wheel mirrors  https://review.opendev.org/c/openstack/project-config/+/83679910:37
*** dviroel|out is now known as dviroel11:21
*** ysandeep is now known as ysandeep|afk11:37
opendevreviewMerged openstack/project-config master: Remove CentOS 8 wheel mirrors  https://review.opendev.org/c/openstack/project-config/+/83679911:50
*** ysandeep|afk is now known as ysandeep12:56
opendevreviewMarcin Juszkiewicz proposed openstack/openstack-zuul-jobs master: Add build-wheel-cache jobs for CentOS Stream 9  https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/83682913:19
opendevreviewMarcin Juszkiewicz proposed openstack/project-config master: Build and publish wheel mirror for CentOS Stream 9  https://review.opendev.org/c/openstack/project-config/+/83679313:19
lajoskatonafrickler: Hi, can you join the Neutron discussion for OVN-related questions? (https://www.openstack.org/ptg/rooms/grizzly)13:23
opendevreviewGhanshyam proposed openstack/project-config master: Remove tempest-lib from infra  https://review.opendev.org/c/openstack/project-config/+/83670313:24
*** dasm|off is now known as dasm|ruck13:44
*** dviroel is now known as dviroel|ptg14:13
clarkbnoonedeadpunk: I think that is a different waste value to the one you were concerned about previously. The previous concern was wasted resources, which isn't quite the same as gate time. From a resource perspective that is tracked in graphite and there are some dashboards for it already: https://grafana.opendev.org/d/94891e7b01/resource-usage-by-tenants-and-projects?orgId=1 but15:01
clarkblooks like those graphs may have broken recently15:02
clarkbnoonedeadpunk: I think graphite is also tracking the time in queues but we don't have dashboards for it15:02
clarkbnoonedeadpunk: for 2) you can define it there since there should already be project entries for the projects there15:03
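(A minimal sketch of the kind of ad-hoc query described above, pulling queue residency data straight from Graphite's render API since no dashboard exists for it yet. The endpoint and the metric path are illustrative assumptions; the real names would need to be confirmed against the Zuul statsd documentation and the Graphite metric tree.)

```python
# Hypothetical sketch: query Graphite's render API for per-pipeline
# "resident time" data. Host and metric path are assumed for illustration.
import json
import urllib.request

GRAPHITE = "https://graphite.opendev.org/render"  # assumed endpoint
TARGET = "stats.timers.zuul.tenant.openstack.pipeline.gate.resident_time.mean"  # assumed path

url = f"{GRAPHITE}?target={TARGET}&from=-30d&format=json"
with urllib.request.urlopen(url) as resp:
    series = json.load(resp)

# Graphite returns a list of series, each with a "target" name and
# [value, timestamp] datapoint pairs; average the non-null values.
for metric in series:
    points = [value for value, _ts in metric["datapoints"] if value is not None]
    if points:
        avg_ms = sum(points) / len(points)
        print(f'{metric["target"]}: avg {avg_ms / 60000:.1f} minutes over 30 days')
```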
clarkblajoskatona: I'm trying to remember, were you working on the resource usage dashboards?15:05
lajoskatonaclarkb: yes, but yatin actually finished it15:05
clarkbah. I'm not sure why those graphs would've stopped having data recently. I wonder if zuul shifted the location of that info but I don't recall that happening15:06
lajoskatonaclarkb: this one: https://grafana.opendev.org/d/94891e7b01/resource-usage-by-tenants-and-projects?orgId=115:06
clarkbya data seems to end on march 25th15:06
lajoskatonaclarkb: true, I don't know what happened with it15:06
clarkblet me ask the zuul project if they know what may have caused that15:07
*** dasm|ruck is now known as dasm|ruck|bbl15:08
noonedeadpunkclarkb: likely I wasn't able to explain my concerns correctly :)15:10
clarkbnoonedeadpunk: well either way that info should be in graphite and you can query it there15:10
noonedeadpunkclarkb: as my concern was mostly about random failures in the gate that would invalidate the whole queue15:10
noonedeadpunkok, gotcha15:11
clarkbnoonedeadpunk: right, but it was expressed as a concern about wasting resources, not wall time for developers (they are related but distinct problems)15:11
noonedeadpunk(I just today got OOM during tempest in gates)15:11
clarkbwall time is much more susceptible to other demands in the system15:11
noonedeadpunkeventually having a flavor with 12GB of RAM would be quite helpful to eliminate that15:12
funginoonedeadpunk: we have flavors with 1615:12
noonedeadpunkfungi: I checked nodepool conf and saw only 1 provider having them... 15:12
clarkbbut they are limited and you may experience NODE_FAILURE depending on availability of the cloud provider15:13
fungiit's possible we're down to 1 now that ericsson cancelled their citycloud donation, yeah. i'll double-check15:13
clarkbyes I think it was vexxhost and citycloud and now just vexxhost? citycloud could never reliably boot the larger instances either15:13
noonedeadpunkuhhhh... citycloud should just donate on their own... 15:13
funginoonedeadpunk: you'd think that, but... if you happen to know anybody there that would be awesome15:14
noonedeadpunkwe're working on that but progress is slow :(15:14
* noonedeadpunk happens to know himself....15:14
* noonedeadpunk happens to also know evrardjp, who is CTO there15:15
fungioh, i didn't realize that's where he ended up!15:15
clarkbright so with citycloud we had resources there independent of airship and had to shut them all down15:15
clarkbthen airship + ericsson + citycloud did a thing for a bit that was smaller but intended for larger flavor sizes. Unfortunately the larger flavor sizes were very flaky15:16
fungier, didn't have resources independent of what was being donated for airship15:16
clarkband now we don't have that at all which leaves just vexxhost for the larger flavor sizes aiui15:16
clarkbfungi: we did several years ago15:17
noonedeadpunkwell, we have some ideas, but have soooo limited time for that....15:17
noonedeadpunkand huge backlog...15:17
clarkbbut then they were gone for a year or two by the time airship donation happened15:17
fungioh, right there was an earlier citycloud provider but we were running into scheduling issues with it then too15:17
noonedeadpunkbut will see15:17
noonedeadpunkI didn't know that you had to drop citycloud though15:17
clarkbnoonedeadpunk: yes we were told they didn't have the extra capacity anymore iirc15:18
clarkbwhich is a perfectly valid reason to stop donating15:18
fungion the original donation, right15:18
clarkbJust want to point out that asking us to give you bigger instances is difficult when the clouds you work for are unable to do it :)15:18
fungiand then the more recent donation ended because ericsson stopped paying for it15:18
clarkbre OOMing I've tried a few times to encourage projects to look at their resource consumption particularly memory15:19
noonedeadpunkok, gotcha15:19
clarkbthe last time I did debugging of it privsep was a major culprit15:19
clarkbbecause every service (maybe even each process) runs a separate privsep instance and they all together consume more memory than any single openstack service iirc15:19
noonedeadpunkwell, the job that failed was installing the whole telemetry stack....15:19
fungiright, i expect openstack-ansible is falling victim to unchecked memory bloat across openstack services, which happens in part because they happily use all the memory our flavors offer15:20
clarkbfungi: right, if we had 16GB of memory as default then openstack would balloon to fill that15:20
clarkbwe can kick the can down the road or push openstack to fix it15:20
fungiif every project started using 16gb flavors for testing, suddenly the openstack services would just grow their memory footprint in response, right15:20
clarkbI've attempted the please fix it route without much success15:20
fungiand so openstack-ansible would probably continue to oom once that happened15:21
noonedeadpunkwell we are really limiting the number of workers, which reduces memory consumption15:21
*** ysandeep is now known as ysandeep|out15:21
noonedeadpunkbut from AIO builds it feels like something around 12GB of RAM is needed for stable operation. Such sandboxes can keep working for several months without issues15:21
fungiit does, until the individual service projects see that their memory-hungry patches are no longer failing with oom, and start to merge them without concern for how much memory they're wasting15:22
noonedeadpunkfor example, when we were testing manila I spent half a day finding a flavor that would allow the VM to spawn (in terms of image requirements) and not OOM15:22
noonedeadpunkas 256 was too small for focal to boot and 384 too much to start the VM15:23
noonedeadpunkSo I'd say we're almost fitting into 8GB, until hungrier jobs (like the ones with ceph) are launched...15:24
fungiprobably the tc would need to select integrated/aggregate testing memory footprints as a cross-project goal15:25
fungiand figure out ways to keep services from consuming however much memory is available15:25
fungibecause otherwise, projects are just going to grow their memory consumption to fit whatever new flavor is provided15:26
fungiit seems like our current test nodes are the only thing forcing them to be careful about memory waste15:27
clarkbreally I think if privsep was improved we'd see a major benefit15:28
fungithere was a time when we ran devstack jobs in 2gb ram virtual machines. when we switched it to 4gb, all the projects quickly grew their memory utilization such that running in 2gb was no longer possible. same happened almost immediately again when we increased from 4gb to 8gb15:28
clarkblike maybe running a single privsep for all the things or making its regexes more efficient (re2 maybe?)15:28
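(As a rough illustration of the privsep overhead being discussed: one way to see it on a test node is to sum the resident memory of every privsep helper process. This sketch assumes psutil is installed and that the helpers show up as "privsep-helper" in the process table; both are assumptions, not guaranteed specifics of any job.)

```python
# Rough sketch for auditing per-service privsep memory overhead on a node:
# sum the RSS of every process that looks like a privsep helper.
# Assumes psutil is available and the helpers appear as "privsep-helper".
import psutil

total_rss = 0
count = 0
for proc in psutil.process_iter(['name', 'cmdline', 'memory_info']):
    try:
        cmdline = ' '.join(proc.info['cmdline'] or [])
        if 'privsep-helper' in (proc.info['name'] or '') or 'privsep-helper' in cmdline:
            total_rss += proc.info['memory_info'].rss
            count += 1
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        continue  # process exited or is not readable; skip it

print(f"{count} privsep helpers using {total_rss / 1024 ** 2:.0f} MiB RSS in total")
```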
opendevreviewElod Illes proposed openstack/openstack-zuul-jobs master: Use again bionic in lower-constraints for xena, wallaby and victoria  https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/83684315:29
fzzf[m]Hi, I use nodepool-builder to build disk images. diskimage-builder runs `"curl -v -L -o source-repositories/.download.qg00hgu8 -w '%{http_code}' --connect-timeout 10 --retry 3 --retry-delay 30 https://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-disk.img -z"` and a **timeout occurred**.... (full message at https://matrix.org/_matrix/media/r0/download/matrix.org/NzuCpmBXgBZrhuVqdwZaYcPU)15:29
opendevreviewElod Illes proposed openstack/openstack-zuul-jobs master: Use again bionic in lower-constraints for xena and older branches  https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/83684315:29
fungifzzf[m]: normally, dib should only download a particular cirros image once on each builder and then cache it indefinitely15:30
fungifzzf[m]: i'm able to download that image at home in 3.3 seconds. maybe you have a proxy or filter between your builder and the internet which is blocking that request?15:32
opendevreviewElod Illes proposed openstack/openstack-zuul-jobs master: Use again bionic in lower-constraints for older branches  https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/83684315:33
opendevreviewClark Boylan proposed openstack/project-config master: Update grafana resource usage graphs to match new paths  https://review.opendev.org/c/openstack/project-config/+/83684415:33
clarkblajoskatona: yasufum  ^ I think that will fix the issue with those graphs15:33
clarkber sorry yasufum I meant to ping yatin and tab complete failed me15:33
lajoskatonaclarkb: thanks, sorry we have neutron session in the meantime15:34
fzzf[m]fungi: Is there any way to set a proxy or the timeout, or a way to download the image manually and configure it?15:36
clarkbfzzf[m]: DIB should respect http_proxy env var settings. Is the problem that you are running in an environment that doesn't have external access without a proxy?15:39
clarkbI don't expect increasing the timeout will make it work any better if whatever caused the initial timeout isn't addressed15:40
clarkbspecifically if you cannot connect in 10 seconds that indicates a problem somewhere aside from the timeout15:40
*** dviroel|ptg is now known as dviroel|ptg|lunch15:42
opendevreviewMerged openstack/project-config master: Retire opendev/puppet-openstack_health: set noop jobs  https://review.opendev.org/c/openstack/project-config/+/83671015:44
fzzf[m]clarkb: Does DIB have any variable configuration for http_proxy?15:46
clarkbfzzf[m]: I think it should respect the standard env vars: http_proxy and https_proxy15:47
fzzf[m]clarkb: Do you mean the variables displayed by env command?15:51
clarkbfzzf[m]: the env command displays all currently set environment variables yes. I'm saying if you set http_proxy and https_proxy as environment variables pointing at your proxy that DIB should respect it15:52
*** dasm|ruck|bbl is now known as dasm|ruck15:52
fungithe http_proxy and https_proxy environment variables are a standard unix/linux way of specifying the location of outbound proxies for your systems, it's not something dib/nodepool/zuul-specific15:53
fungiif you're going to be running any servers in a network which requires use of a proxy, it's how you would handle that for most applications you run15:54
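(To illustrate the point about the standard proxy environment variables: a child process such as disk-image-create, and the curl it shells out to, simply inherits http_proxy/https_proxy from whatever launched it. The proxy URL and element list below are placeholders for illustration, not real configuration.)

```python
# Illustrative only: passing the standard http_proxy/https_proxy environment
# variables down to disk-image-create. Proxy URL and elements are placeholders.
import os
import subprocess

env = dict(os.environ)
env.update({
    "http_proxy": "http://proxy.example.com:3128",   # placeholder proxy
    "https_proxy": "http://proxy.example.com:3128",  # placeholder proxy
    "no_proxy": "localhost,127.0.0.1",
})

# dib (and the curl commands it runs) inherit these variables from the
# environment of the process that launched them.
subprocess.run(
    ["disk-image-create", "-o", "test-image", "ubuntu-minimal", "vm"],
    env=env,
    check=True,
)
```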
timburkefungi, clarkb: fyi, looks like the fedora mirror issue for py310 cleared up. thanks again for the quick investigation!15:55
fungitimburke: thanks for pointing it out! glad things there stabilized eventually15:56
fungiseems like there was a very large mirror push from fedora and the second-tier mirror we sync from was probably mid-update for a while15:56
fzzf[m]<fungi> "the http_proxy and https_proxy..." <- okay, I get it, I'll check it. thanks.15:59
opendevreviewMerged openstack/project-config master: Update grafana resource usage graphs to match new paths  https://review.opendev.org/c/openstack/project-config/+/83684416:00
opendevreviewMarcin Juszkiewicz proposed openstack/openstack-zuul-jobs master: Add build-wheel-cache jobs for CentOS Stream 9  https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/83682916:03
opendevreviewMarcin Juszkiewicz proposed openstack/openstack-zuul-jobs master: Install EPEL on CentOS Stream 9 before using it  https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/83685216:03
*** dviroel|ptg|lunch is now known as dviroel|ptg16:03
opendevreviewMarcin Juszkiewicz proposed openstack/openstack-zuul-jobs master: Add build-wheel-cache jobs for CentOS Stream 9  https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/83682916:13
opendevreviewMarcin Juszkiewicz proposed openstack/project-config master: Add centos-stream-9-arm64 nodes  https://review.opendev.org/c/openstack/project-config/+/83679616:16
opendevreviewMarcin Juszkiewicz proposed openstack/project-config master: Add EPEL into CentOS Stream 9 images  https://review.opendev.org/c/openstack/project-config/+/83685516:19
*** jpena is now known as jpena|off16:35
opendevreviewMerged openstack/project-config master: Add EPEL into CentOS Stream 9 images  https://review.opendev.org/c/openstack/project-config/+/83685516:38
opendevreviewMerged openstack/project-config master: Add centos-stream-9-arm64 nodes  https://review.opendev.org/c/openstack/project-config/+/83679616:40
clarkbok three of the resource usage graphs work now but not the first one16:49
opendevreviewClark Boylan proposed openstack/project-config master: Fix small bug in resource usage grpahs  https://review.opendev.org/c/openstack/project-config/+/83686116:51
clarkbfungi: ^ I suspect that small typo is to blame16:52
fungid'oh! i did not spot it16:52
clarkbI wrote it :)16:52
*** dasm|ruck is now known as dasm|ruck|bbl17:06
opendevreviewMerged openstack/project-config master: Fix small bug in resource usage grpahs  https://review.opendev.org/c/openstack/project-config/+/83686117:10
*** dviroel|ptg is now known as dviroel17:22
*** dasm|ruck|bbl is now known as dasm|ruck17:36
*** dviroel is now known as dviroel|mtg17:47
clarkbthe resource usage graphs are fixed now18:22
*** dviroel|mtg is now known as dviroel18:43
*** rlandy is now known as rlandy|biab19:48
*** rlandy|biab is now known as rlandy20:04
*** dviroel is now known as dviroel|afk20:34
*** dasm|ruck is now known as dasm|off21:12
*** rlandy is now known as rlandy|out22:46
