Wednesday, 2022-04-06

opendevreviewMerged openstack/project-config master: Retire openstack-health project: end project gating  https://review.opendev.org/c/openstack/project-config/+/83670700:01
opendevreviewGhanshyam proposed openstack/project-config master: Retire openstack-health projects: remove project from infra  https://review.opendev.org/c/openstack/project-config/+/83670900:09
opendevreviewGhanshyam proposed openstack/project-config master: Retire opendev/puppet-openstack_health: set noop jobs  https://review.opendev.org/c/openstack/project-config/+/83671000:31
opendevreviewGhanshyam proposed openstack/project-config master: Retire opendev/puppet-openstack_health: remove project from infa  https://review.opendev.org/c/openstack/project-config/+/83671200:39
*** ysandeep|out is now known as ysandeep05:22
*** jpena|off is now known as jpena07:36
noonedeadpunkclarkb: regarding zuul queues, I have 2 questions. 1. Should we calculate the time the project was consuming in the gate before we switched to shared queues, to compare how much time we started wasting with that approach? 2. Can I just define that in project-config?08:20
*** raukadah is now known as chandankumar08:29
*** ysandeep is now known as ysandeep|lunch09:08
opendevreviewElod Illes proposed openstack/project-config master: Add ansible-collection-kolla to projects.yaml  https://review.opendev.org/c/openstack/project-config/+/83676309:49
*** ysandeep|lunch is now known as ysandeep10:03
*** rlandy_ is now known as rlandy10:19
opendevreviewMarcin Juszkiewicz proposed openstack/project-config master: Build and publish wheel mirror for CentOS Stream 9  https://review.opendev.org/c/openstack/project-config/+/83679310:24
opendevreviewMarcin Juszkiewicz proposed openstack/project-config master: Add centos-stream-9-arm64 nodes  https://review.opendev.org/c/openstack/project-config/+/83679610:34
opendevreviewMarcin Juszkiewicz proposed openstack/project-config master: Remove CentOS 8 wheel mirrors  https://review.opendev.org/c/openstack/project-config/+/83679910:37
*** dviroel|out is now known as dviroel11:21
*** ysandeep is now known as ysandeep|afk11:37
opendevreviewMerged openstack/project-config master: Remove CentOS 8 wheel mirrors  https://review.opendev.org/c/openstack/project-config/+/83679911:50
*** ysandeep|afk is now known as ysandeep12:56
opendevreviewMarcin Juszkiewicz proposed openstack/openstack-zuul-jobs master: Add build-wheel-cache jobs for CentOS Stream 9  https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/83682913:19
opendevreviewMarcin Juszkiewicz proposed openstack/project-config master: Build and publish wheel mirror for CentOS Stream 9  https://review.opendev.org/c/openstack/project-config/+/83679313:19
lajoskatonafrickler: Hi, can you join the Neutron discussion for OVN-related questions? (https://www.openstack.org/ptg/rooms/grizzly)13:23
opendevreviewGhanshyam proposed openstack/project-config master: Remove tempest-lib from infra  https://review.opendev.org/c/openstack/project-config/+/83670313:24
*** dasm|off is now known as dasm|ruck13:44
*** dviroel is now known as dviroel|ptg14:13
clarkbnoonedeadpunk: I think that is a different waste value to the one you were concerned about previously. The previous concern was wasted resources, which isn't quite the same as gate time. From a resource perspective that is tracked in graphite and there are some dashboards for it already: https://grafana.opendev.org/d/94891e7b01/resource-usage-by-tenants-and-projects?orgId=1 but15:01
clarkblooks like those graphs may have broken recently15:02
clarkbnoonedeadpunk: I think graphite is also tracking the time in queues but we don't have dashboards for it15:02
clarkbnoonedeadpunk: for 2) you can define it there since there should already be project entries for the projects there15:03
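(A minimal sketch of the kind of ad-hoc query described above, pulling queue residency data straight from Graphite's render API since no dashboard exists for it yet. The endpoint and the metric path are illustrative assumptions; the real names would need to be confirmed against the Zuul statsd documentation and the Graphite metric tree.)

```python
# Hypothetical sketch: query Graphite's render API for per-pipeline
# "resident time" data. Host and metric path are assumed for illustration.
import json
import urllib.request

GRAPHITE = "https://graphite.opendev.org/render"  # assumed endpoint
TARGET = "stats.timers.zuul.tenant.openstack.pipeline.gate.resident_time.mean"  # assumed path

url = f"{GRAPHITE}?target={TARGET}&from=-30d&format=json"
with urllib.request.urlopen(url) as resp:
    series = json.load(resp)

# Graphite returns a list of series, each with a "target" name and
# [value, timestamp] datapoint pairs; average the non-null values.
for metric in series:
    points = [value for value, _ts in metric["datapoints"] if value is not None]
    if points:
        avg_ms = sum(points) / len(points)
        print(f'{metric["target"]}: avg {avg_ms / 60000:.1f} minutes over 30 days')
```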
clarkblajoskatona: I'm trying to remember, were you working on the resource usage dashboards?15:05
lajoskatonaclarkb: yes, but yatin actually finished it15:05
clarkbah. I'm not sure why those graphs would've stopped having data recently. I wonder if zuul shifted the location of that info but I don't recall that happening15:06
lajoskatonaclarkb: this one: https://grafana.opendev.org/d/94891e7b01/resource-usage-by-tenants-and-projects?orgId=115:06
clarkbya data seems to end on march 25th15:06
lajoskatonaclarkb: true, I don't know what happened with it15:06
clarkblet me ask the zuul project if they know what may have caused that15:07
*** dasm|ruck is now known as dasm|ruck|bbl15:08
noonedeadpunkclarkb: likely I wasn't able to explain my concerns correctly :)15:10
clarkbnoonedeadpunk: well either way that info should be in graphite and you can query it there15:10
noonedeadpunkclarkb: as my concern was mostly about random failures in the gate that would invalidate the whole queue15:10
noonedeadpunkok, gotcha15:11
clarkbnoonedeadpunk: right, but it was expressed as a concern about wasting resources, not wall time for developers (they are related but distinct problems)15:11
noonedeadpunk(I just today got OOM during tempest in gates)15:11
clarkbwall time is much more susceptible to other demands in the system15:11
noonedeadpunkeventually having a flavor with 12GB of RAM would be quite helpful to eliminate that15:12
funginoonedeadpunk: we have flavors with 1615:12
noonedeadpunkfungi: I checked nodepool conf and saw only 1 provider having them... 15:12
clarkbbut they are limited and you may experience NODE_FAILURE depending on availability of the cloud provider15:13
fungiit's possible we're down to 1 now that ericsson cancelled their citycloud donation, yeah. i'll double-check15:13
clarkbyes I think it was vexxhost and citycloud and now just vexxhost? citycloud could never reliably boot the larger instances either15:13
noonedeadpunkuhhhh... citycloud should just donate on their own... 15:13
funginoonedeadpunk: you'd think that, but... if you happen to know anybody there that would be awesome15:14
noonedeadpunkwe're working on that but progress is slow :(15:14
* noonedeadpunk happens to know himself....15:14
* noonedeadpunk happens to also know evrardjp, who is CTO there15:15
fungioh, i didn't realize that's where he ended up!15:15
clarkbright so with citycloud we had resources there independent of airship and had to shut them all down15:15
clarkbthen airship + ericsson + citycloud did a thing for a bit that was smaller but intended for larger flavor sizes. Unfortunately the larger flavor sizes were very flaky15:16
fungier, didn't have resources independent of what was being donated for airship15:16
clarkband now we don't have that at all which leaves just vexxhost for the larger flavor sizes aiui15:16
clarkbfungi: we did several years ago15:17
noonedeadpunkwell, we have some ideas, but have soooo limited time for that....15:17
noonedeadpunkand huge backlog...15:17
clarkbbut then they were gone for a year or two by the time airship donation happened15:17
fungioh, right there was an earlier citycloud provider but we were running into scheduling issues with it then too15:17
noonedeadpunkbut will see15:17
noonedeadpunkI didn't know that you had to drop citycloud though15:17
clarkbnoonedeadpunk: yes we were told they didn't have the extra capacity anymore iirc15:18
clarkbwhich is a perfectly valid reason to stop donating15:18
fungion the original donation, right15:18
clarkbJust want to point out that asking us to give you bigger instances is difficult when the clouds you work for are unable to do it :)15:18
fungiand then the more recent donation ended because ericsson stopped paying for it15:18
clarkbre OOMing I've tried a few times to encourage projects to look at their resource consumption particularly memory15:19
noonedeadpunkok, gotcha15:19
clarkbthe last time I did debugging of it privsep was a major culprit15:19
clarkbbecause every service (maybe even each process) runs a separate privsep instance and they all together consume more memory than any single openstack service iirc15:19
noonedeadpunkwell, the job that failed was installing the whole telemetry stack....15:19
fungiright, i expect openstack-ansible is falling victim to unchecked memory bloat across openstack services, which happens in part because they happily use all the memory our flavors offer15:20
clarkbfungi: right, if we had 16GB of memory as default then openstack would balloon to fill that15:20
clarkbwe can kick the can down the road or push openstack to fix it15:20
fungiif every project started using 16gb flavors for testing, suddenly the openstack services would just grow their memory footprint in response, right15:20
clarkbI've attempted the please fix it route without much success15:20
fungiand so openstack-ansible would probably continue to oom once that happened15:21
noonedeadpunkwell we are really limiting the number of workers, which reduces memory consumption15:21
*** ysandeep is now known as ysandeep|out15:21
noonedeadpunkbut from AIO builds it feels like something around 12GB of RAM is needed for stable operation. Such sandboxes can keep working for several months without issues15:21
fungiit does, until the individual service projects see that their memory-hungry patches are no longer failing with oom, and start to merge them without concern for how much memory they're wasting15:22
noonedeadpunkfor example, when we were testing manila I spent half a day finding a flavor that would allow the VM to spawn (in terms of image requirements) and not OOM15:22
noonedeadpunkas 256 was too small for focal to boot and 384 too much to start the VM15:23
noonedeadpunkSo I'd say we're almost fitting into 8GB, until hungrier jobs (like the ones with ceph) are launched...15:24
fungiprobably the tc would need to select integrated/aggregate testing memory footprints as a cross-project goal15:25
fungiand figure out ways to keep services from consuming however much memory is available15:25
fungibecause otherwise, projects are just going to grow their memory consumption to fit whatever new flavor is provided15:26
fungiit seems like our current test nodes are the only thing forcing them to be careful about memory waste15:27
clarkbreally I think if privsep was improved we'd see a major benefit15:28
fungithere was a time when we ran devstack jobs in 2gb ram virtual machines. when we switched it to 4gb, all the projects quickly grew their memory utilization such that running in 2gb was no longer possible. same happened almost immediately again when we increased from 4gb to 8gb15:28
clarkblike maybe running a single privsep for all the things or making its regexes more efficient (re2 maybe?)15:28
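(As a rough illustration of the privsep overhead being discussed: one way to see it on a test node is to sum the resident memory of every privsep helper process. This sketch assumes psutil is installed and that the helpers show up as "privsep-helper" in the process table; both are assumptions, not guaranteed specifics of any job.)

```python
# Rough sketch for auditing per-service privsep memory overhead on a node:
# sum the RSS of every process that looks like a privsep helper.
# Assumes psutil is available and the helpers appear as "privsep-helper".
import psutil

total_rss = 0
count = 0
for proc in psutil.process_iter(['name', 'cmdline', 'memory_info']):
    try:
        cmdline = ' '.join(proc.info['cmdline'] or [])
        if 'privsep-helper' in (proc.info['name'] or '') or 'privsep-helper' in cmdline:
            total_rss += proc.info['memory_info'].rss
            count += 1
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        continue  # process exited or is not readable; skip it

print(f"{count} privsep helpers using {total_rss / 1024 ** 2:.0f} MiB RSS in total")
```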
opendevreviewElod Illes proposed openstack/openstack-zuul-jobs master: Use again bionic in lower-constraints for xena, wallaby and victoria  https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/83684315:29
fzzf[m]Hi, I use nodepool-builder to build disk images. diskimage-builder runs `"curl -v -L -o source-repositories/.download.qg00hgu8 -w '%{http_code}' --connect-timeout 10 --retry 3 --retry-delay 30 https://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-disk.img -z"` and a **timeout occurred**.... (full message at https://matrix.org/_matrix/media/r0/download/matrix.org/NzuCpmBXgBZrhuVqdwZaYcPU)15:29
opendevreviewElod Illes proposed openstack/openstack-zuul-jobs master: Use again bionic in lower-constraints for xena and older branches  https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/83684315:29
fungifzzf[m]: normally, dib should only download a particular cirros image once on each builder and then cache it indefinitely15:30
fungifzzf[m]: i'm able to download that image at home in 3.3 seconds. maybe you have a proxy or filter between your builder and the internet which is blocking that request?15:32
opendevreviewElod Illes proposed openstack/openstack-zuul-jobs master: Use again bionic in lower-constraints for older branches  https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/83684315:33
opendevreviewClark Boylan proposed openstack/project-config master: Update grafana resource usage graphs to match new paths  https://review.opendev.org/c/openstack/project-config/+/83684415:33
clarkblajoskatona: yasufum  ^ I think that will fix the issue with those graphs15:33
clarkber sorry yasufum I meant to ping yatin and tab complete failed me15:33
lajoskatonaclarkb: thanks, sorry we have neutron session in the meantime15:34
fzzf[m]fungi: Is there any way to set a proxy or the timeout, or a way to download the image manually and configure it?15:36
clarkbfzzf[m]: DIB should respect http_proxy env var settings. Is the problem that you are running in an environment that doesn't have external access without a proxy?15:39
clarkbI don't expect increasing the timeout will make it work any better if whatever caused the initial timeout isn't addressed15:40
clarkbspecifically if you cannot connect in 10 seconds that indicates a problem somewhere aside from the timeout15:40
*** dviroel|ptg is now known as dviroel|ptg|lunch15:42
opendevreviewMerged openstack/project-config master: Retire opendev/puppet-openstack_health: set noop jobs  https://review.opendev.org/c/openstack/project-config/+/83671015:44
fzzf[m]clarkb: Does DIB have any variable configuration for http_proxy?15:46
clarkbfzzf[m]: I think it should respect the standard env vars: http_proxy and https_proxy15:47
fzzf[m]clarkb: Do you mean the variables displayed by env command?15:51
clarkbfzzf[m]: the env command displays all currently set environment variables yes. I'm saying if you set http_proxy and https_proxy as environment variables pointing at your proxy that DIB should respect it15:52
*** dasm|ruck|bbl is now known as dasm|ruck15:52
fungithe http_proxy and https_proxy environment variables are a standard unix/linux way of specifying the location of outbound proxies for your systems, it's not something dib/nodepool/zuul-specific15:53
fungiif you're going to be running any servers in a network which requires use of a proxy, it's how you would handle that for most applications you run15:54
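(To illustrate the point about the standard proxy environment variables: a child process such as disk-image-create, and the curl it shells out to, simply inherits http_proxy/https_proxy from whatever launched it. The proxy URL and element list below are placeholders for illustration, not real configuration.)

```python
# Illustrative only: passing the standard http_proxy/https_proxy environment
# variables down to disk-image-create. Proxy URL and elements are placeholders.
import os
import subprocess

env = dict(os.environ)
env.update({
    "http_proxy": "http://proxy.example.com:3128",   # placeholder proxy
    "https_proxy": "http://proxy.example.com:3128",  # placeholder proxy
    "no_proxy": "localhost,127.0.0.1",
})

# dib (and the curl commands it runs) inherit these variables from the
# environment of the process that launched them.
subprocess.run(
    ["disk-image-create", "-o", "test-image", "ubuntu-minimal", "vm"],
    env=env,
    check=True,
)
```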
timburkefungi, clarkb: fyi, looks like the fedora mirror issue for py310 cleared up. thanks again for the quick investigation!15:55
fungitimburke: thanks for pointing it out! glad things there stabilized eventually15:56
fungiseems like there was a very large mirror push from fedora and the second-tier mirror we sync from was probably mid-update for a while15:56
fzzf[m]<fungi> "the http_proxy and https_proxy..." <- okay, I get it, I'll check it. thanks.15:59
opendevreviewMerged openstack/project-config master: Update grafana resource usage graphs to match new paths  https://review.opendev.org/c/openstack/project-config/+/83684416:00
opendevreviewMarcin Juszkiewicz proposed openstack/openstack-zuul-jobs master: Add build-wheel-cache jobs for CentOS Stream 9  https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/83682916:03
opendevreviewMarcin Juszkiewicz proposed openstack/openstack-zuul-jobs master: Install EPEL on CentOS Stream 9 before using it  https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/83685216:03
*** dviroel|ptg|lunch is now known as dviroel|ptg16:03
opendevreviewMarcin Juszkiewicz proposed openstack/openstack-zuul-jobs master: Add build-wheel-cache jobs for CentOS Stream 9  https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/83682916:13
opendevreviewMarcin Juszkiewicz proposed openstack/project-config master: Add centos-stream-9-arm64 nodes  https://review.opendev.org/c/openstack/project-config/+/83679616:16
opendevreviewMarcin Juszkiewicz proposed openstack/project-config master: Add EPEL into CentOS Stream 9 images  https://review.opendev.org/c/openstack/project-config/+/83685516:19
*** jpena is now known as jpena|off16:35
opendevreviewMerged openstack/project-config master: Add EPEL into CentOS Stream 9 images  https://review.opendev.org/c/openstack/project-config/+/83685516:38
opendevreviewMerged openstack/project-config master: Add centos-stream-9-arm64 nodes  https://review.opendev.org/c/openstack/project-config/+/83679616:40
clarkbok three of the resource usage graphs work now but not the first one16:49
opendevreviewClark Boylan proposed openstack/project-config master: Fix small bug in resource usage grpahs  https://review.opendev.org/c/openstack/project-config/+/83686116:51
clarkbfungi: ^ I suspect that small typo is to blame16:52
fungid'oh! i did not spot it16:52
clarkbI wrote it :)16:52
*** dasm|ruck is now known as dasm|ruck|bbl17:06
opendevreviewMerged openstack/project-config master: Fix small bug in resource usage grpahs  https://review.opendev.org/c/openstack/project-config/+/83686117:10
*** dviroel|ptg is now known as dviroel17:22
*** dasm|ruck|bbl is now known as dasm|ruck17:36
*** dviroel is now known as dviroel|mtg17:47
clarkbthe resource usage graphs are fixed now18:22
*** dviroel|mtg is now known as dviroel18:43
*** rlandy is now known as rlandy|biab19:48
*** rlandy|biab is now known as rlandy20:04
*** dviroel is now known as dviroel|afk20:34
*** dasm|ruck is now known as dasm|off21:12
*** rlandy is now known as rlandy|out22:46
