*** jistr has quit IRC | 00:00 | |
*** jistr has joined #openstack-infra | 00:00 | |
fungi | dansmith: one shortcut might be to patch the job to fetch the webserver access logs (if it doesn't already?) and see when it was getting requests | 00:06 |
clarkb | fungi: I was looking for those and couldn't find them | 00:07 |
clarkb | its possible they are there and I was just not seeing them though | 00:07 |
dansmith | fungi: yeah | 00:08 |
dansmith | but | 00:08 |
dansmith | my butt has been in this chair since I got up at 0430 for a meeting so I better go see what things further than six feet look like | 00:09 |
fungi | yeah, i get that ;) | 00:10 |
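As a rough illustration of fungi's suggestion, a post-run task along these lines could pull the access logs back from the test node; the apache log path and the destination under the executor's log root are assumptions, not the job's actual configuration.

    # Hypothetical post-run playbook snippet for collecting apache access logs.
    # The log path and the destination are guesses about this particular job.
    - hosts: all
      tasks:
        - name: Fetch apache access logs from the test node
          ansible.builtin.fetch:
            src: /var/log/apache2/access.log
            dest: "{{ zuul.executor.log_root }}/apache/"
            flat: false
          become: true
          failed_when: false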
*** ianychoi__ has joined #openstack-infra | 00:14 | |
clarkb | dansmith: they are very wet and rainy right now | 00:15 |
*** tosky has quit IRC | 00:15 | |
*** ianychoi_ has quit IRC | 00:17 | |
*** piotrowskim has quit IRC | 00:17 | |
dansmith | clarkb: not in my kitchen :) | 00:21 |
*** jamesmcarthur has joined #openstack-infra | 01:42 | |
*** __ministry1 has joined #openstack-infra | 01:44 | |
*** sreejithp has quit IRC | 01:54 | |
*** dviroel has quit IRC | 01:54 | |
*** jamesmcarthur has quit IRC | 02:07 | |
*** zzzeek has quit IRC | 02:12 | |
*** zzzeek has joined #openstack-infra | 02:12 | |
*** jamesmcarthur has joined #openstack-infra | 02:14 | |
*** lbragstad_ has joined #openstack-infra | 02:14 | |
*** ysandeep|away is now known as ysandeep | 02:15 | |
*** lbragstad has quit IRC | 02:17 | |
*** jamesmcarthur has quit IRC | 02:19 | |
*** jamesmcarthur has joined #openstack-infra | 02:29 | |
*** rcernin has quit IRC | 02:37 | |
*** rcernin has joined #openstack-infra | 02:51 | |
*** tbachman has quit IRC | 02:51 | |
*** tbachman has joined #openstack-infra | 02:54 | |
*** tbachman_ has joined #openstack-infra | 02:56 | |
*** tbachman has quit IRC | 02:59 | |
*** tbachman_ is now known as tbachman | 02:59 | |
*** jamesmcarthur has quit IRC | 03:05 | |
*** jamesmcarthur has joined #openstack-infra | 03:06 | |
*** jamesmcarthur has quit IRC | 03:11 | |
*** jamesmcarthur has joined #openstack-infra | 03:13 | |
*** jamesmcarthur has quit IRC | 03:14 | |
*** jamesmcarthur has joined #openstack-infra | 03:16 | |
*** ykarel has joined #openstack-infra | 03:18 | |
*** jamesmcarthur has quit IRC | 03:20 | |
*** jamesmcarthur has joined #openstack-infra | 03:20 | |
*** jamesmcarthur has quit IRC | 03:20 | |
*** jamesmcarthur has joined #openstack-infra | 03:22 | |
*** jamesmcarthur has quit IRC | 03:26 | |
*** ykarel has quit IRC | 03:27 | |
*** dustinc has joined #openstack-infra | 03:29 | |
*** jamesmcarthur has joined #openstack-infra | 03:32 | |
*** jamesmcarthur has quit IRC | 03:37 | |
*** d34dh0r53 has quit IRC | 03:47 | |
*** d34dh0r53 has joined #openstack-infra | 03:48 | |
*** d34dh0r53 has quit IRC | 03:48 | |
*** d34dh0r53 has joined #openstack-infra | 03:49 | |
*** d34dh0r53 has quit IRC | 03:49 | |
*** jamesmcarthur has joined #openstack-infra | 03:51 | |
*** lbragstad_ is now known as lbragstad | 03:51 | |
*** jamesmcarthur has quit IRC | 03:52 | |
*** jamesmcarthur has joined #openstack-infra | 03:53 | |
*** d34dh0r53 has joined #openstack-infra | 03:53 | |
*** d34dh0r53 has quit IRC | 03:55 | |
*** d34dh0r53 has joined #openstack-infra | 03:56 | |
*** d34dh0r53 has quit IRC | 03:56 | |
*** d34dh0r53 has joined #openstack-infra | 03:57 | |
*** d34dh0r53 has quit IRC | 03:57 | |
*** d34dh0r53 has joined #openstack-infra | 03:58 | |
*** d34dh0r53 has quit IRC | 03:58 | |
*** d34dh0r53 has joined #openstack-infra | 03:59 | |
*** zul has quit IRC | 04:16 | |
*** __ministry1 has quit IRC | 04:53 | |
*** ykarel has joined #openstack-infra | 05:01 | |
*** ramishra has quit IRC | 05:29 | |
*** ramishra has joined #openstack-infra | 05:29 | |
*** ramishra has quit IRC | 05:30 | |
*** ramishra has joined #openstack-infra | 05:30 | |
*** ramishra has quit IRC | 05:32 | |
*** ramishra has joined #openstack-infra | 05:32 | |
*** dchen has quit IRC | 05:36 | |
*** dchen has joined #openstack-infra | 05:36 | |
*** dustinc has quit IRC | 05:39 | |
*** dchen has quit IRC | 05:47 | |
*** dchen has joined #openstack-infra | 05:48 | |
*** dchen has quit IRC | 05:49 | |
*** ykarel_ has joined #openstack-infra | 05:50 | |
*** dchen has joined #openstack-infra | 05:53 | |
*** ykarel has quit IRC | 05:53 | |
*** ykarel_ is now known as ykarel | 05:53 | |
*** ramishra has quit IRC | 05:55 | |
*** ramishra has joined #openstack-infra | 05:55 | |
*** ykarel_ has joined #openstack-infra | 05:58 | |
*** ykarel has quit IRC | 06:00 | |
*** rcernin has quit IRC | 06:08 | |
*** rcernin has joined #openstack-infra | 06:08 | |
*** ykarel_ is now known as ykarel | 06:15 | |
*** rcernin has quit IRC | 06:17 | |
*** rcernin has joined #openstack-infra | 06:17 | |
*** rcernin has quit IRC | 06:17 | |
*** rcernin has joined #openstack-infra | 06:19 | |
*** jamesmcarthur has quit IRC | 06:34 | |
*** dchen has quit IRC | 06:52 | |
*** dchen has joined #openstack-infra | 06:53 | |
*** jamesmcarthur has joined #openstack-infra | 07:04 | |
*** jcapitao has joined #openstack-infra | 07:08 | |
*** jamesmcarthur has quit IRC | 07:10 | |
*** jamesmcarthur has joined #openstack-infra | 07:22 | |
*** eolivare has joined #openstack-infra | 07:28 | |
*** slaweq has joined #openstack-infra | 07:28 | |
*** ralonsoh has joined #openstack-infra | 07:28 | |
*** vishalmanchanda has joined #openstack-infra | 07:33 | |
*** hashar has joined #openstack-infra | 07:58 | |
*** dklyle has quit IRC | 07:58 | |
*** hashar has quit IRC | 08:01 | |
*** hashar has joined #openstack-infra | 08:01 | |
*** sboyron_ has joined #openstack-infra | 08:04 | |
*** psachin has joined #openstack-infra | 08:11 | |
*** amoralej|off is now known as amoralej | 08:15 | |
*** ysandeep is now known as ysandeep|lunch | 08:18 | |
*** gfidente has joined #openstack-infra | 08:18 | |
*** andrewbonney has joined #openstack-infra | 08:19 | |
*** ykarel is now known as ykarel|lunch | 08:21 | |
*** jamesmcarthur has quit IRC | 08:40 | |
*** tosky has joined #openstack-infra | 08:40 | |
*** rpittau|afk is now known as rpittau | 08:41 | |
*** jamesmcarthur has joined #openstack-infra | 08:56 | |
*** jpena|off is now known as jpena | 08:57 | |
*** jamesmcarthur has quit IRC | 09:04 | |
*** ociuhandu has joined #openstack-infra | 09:06 | |
*** gfidente has quit IRC | 09:06 | |
*** gfidente has joined #openstack-infra | 09:09 | |
*** lucasagomes has joined #openstack-infra | 09:11 | |
jcapitao | hello folks, we are facing timed-out connections when accessing tarballs.opendev.org | 09:12 |
*** jamesmcarthur has joined #openstack-infra | 09:15 | |
*** dulek has joined #openstack-infra | 09:22 | |
*** jamesmcarthur has quit IRC | 09:22 | |
*** xek has joined #openstack-infra | 09:28 | |
*** ociuhandu has quit IRC | 09:28 | |
*** nightmare_unreal has joined #openstack-infra | 09:30 | |
*** ociuhandu has joined #openstack-infra | 09:33 | |
*** jamesmcarthur has joined #openstack-infra | 09:34 | |
*** ysandeep|lunch is now known as ysandeep | 09:38 | |
*** derekh has joined #openstack-infra | 09:39 | |
*** psachin has quit IRC | 09:40 | |
*** jamesmcarthur has quit IRC | 09:40 | |
*** psachin has joined #openstack-infra | 09:43 | |
*** ociuhandu has quit IRC | 09:45 | |
*** wanzenbug has joined #openstack-infra | 09:51 | |
*** jamesmcarthur has joined #openstack-infra | 09:51 | |
*** jamesmcarthur has quit IRC | 09:58 | |
*** wanzenbug has quit IRC | 10:04 | |
*** yamamoto_ has quit IRC | 10:04 | |
*** psachin has quit IRC | 10:04 | |
*** jamesmcarthur has joined #openstack-infra | 10:09 | |
Tengu | seems to be answering now, but reeeeeall slow | 10:15 |
*** jamesmcarthur has quit IRC | 10:17 | |
*** ociuhandu has joined #openstack-infra | 10:19 | |
*** ociuhandu has quit IRC | 10:24 | |
*** rcernin has quit IRC | 10:24 | |
*** ykarel|lunch is now known as ykarel | 10:29 | |
*** jamesmcarthur has joined #openstack-infra | 10:29 | |
*** sshnaidm|afk is now known as sshnaidm|ruck | 10:35 | |
*** jamesmcarthur has quit IRC | 10:36 | |
*** ociuhandu has joined #openstack-infra | 10:36 | |
*** ociuhandu has quit IRC | 10:41 | |
*** ociuhandu has joined #openstack-infra | 10:42 | |
*** hashar has quit IRC | 10:45 | |
*** jamesmcarthur has joined #openstack-infra | 10:47 | |
*** dtantsur|afk is now known as dtantsur | 10:49 | |
*** rcernin has joined #openstack-infra | 10:55 | |
*** gfidente has quit IRC | 10:55 | |
*** gfidente has joined #openstack-infra | 10:58 | |
*** jamesmcarthur has quit IRC | 10:59 | |
*** jcapitao is now known as jcapitao_lunch | 11:11 | |
*** jamesmcarthur has joined #openstack-infra | 11:12 | |
*** dviroel has joined #openstack-infra | 11:14 | |
*** rcernin has quit IRC | 11:15 | |
*** jamesmcarthur has quit IRC | 11:19 | |
*** yamamoto has joined #openstack-infra | 11:23 | |
*** ociuhandu has quit IRC | 11:27 | |
*** jamesmcarthur has joined #openstack-infra | 11:31 | |
*** jamesmcarthur has quit IRC | 11:37 | |
*** yamamoto has quit IRC | 11:46 | |
*** jamesmcarthur has joined #openstack-infra | 11:49 | |
*** tbachman has quit IRC | 11:52 | |
*** tbachman has joined #openstack-infra | 11:53 | |
*** jamesmcarthur has quit IRC | 11:56 | |
*** ociuhandu has joined #openstack-infra | 11:59 | |
*** yamamoto has joined #openstack-infra | 12:00 | |
*** rcernin has joined #openstack-infra | 12:01 | |
*** jamesmcarthur has joined #openstack-infra | 12:09 | |
*** ociuhandu_ has joined #openstack-infra | 12:12 | |
*** ociuhandu has quit IRC | 12:15 | |
*** rlandy has joined #openstack-infra | 12:16 | |
*** jamesmcarthur has quit IRC | 12:16 | |
*** przemeklal has joined #openstack-infra | 12:19 | |
*** jamesmcarthur has joined #openstack-infra | 12:30 | |
*** jamesmcarthur has quit IRC | 12:37 | |
*** dchen has quit IRC | 12:40 | |
*** jpena is now known as jpena|lunch | 12:41 | |
openstackgerrit | Pedro Luis Marques Sliuzas proposed openstack/project-config master: Add Metrics Server App to StarlingX https://review.opendev.org/c/openstack/project-config/+/773883 | 12:42 |
*** rcernin has quit IRC | 12:43 | |
*** ociuhandu_ has quit IRC | 12:46 | |
*** ociuhandu has joined #openstack-infra | 12:47 | |
*** jamesmcarthur has joined #openstack-infra | 12:50 | |
*** jamesmcarthur has quit IRC | 12:57 | |
*** amoralej is now known as amoralej|lunch | 13:02 | |
*** jcapitao_lunch is now known as jcapitao | 13:05 | |
*** jamesmcarthur has joined #openstack-infra | 13:09 | |
*** ociuhandu has quit IRC | 13:10 | |
*** ociuhandu has joined #openstack-infra | 13:11 | |
fungi | jcapitao: Tengu: apache got itself tied in knots on that server, not entirely sure how/why, but it was restarted around 10:20 utc and that seems to have cleared it up | 13:12 |
*** jamesmcarthur has quit IRC | 13:14 | |
Tengu | fungi: ok. maybe some tweaked queries making it crash? though it doesn't seem to have any dynamic content - just plain "auto index", isn't it? | 13:16 |
fungi | Tengu: yes, it's serving static files from global afs, and doing mod_autoindex in cases where there's no index file | 13:17 |
Tengu | fungi: maybe afs crashed for #reason? it's a network filesystem/share iirc? | 13:18 |
fungi | well, apache itself was still serving content, just slowly, and restarting apache caused it to clear up | 13:19 |
Tengu | or... maybe I'm mixing things. iirc there's a special afs thing for openstack infra | 13:19 |
fungi | afs is a global filesystem (like the way dns is a global database) | 13:19 |
Tengu | fungi: might be due to locked inodes locking httpd process at some point, making it slow as hell due to lack of resources... | 13:19 |
Tengu | or something like that - not sure if that afs has "inode" concept... | 13:20 |
fungi | and yes, looking in dmesg this was logged three times by the kernel at 10:04:33 utc "Waiting for busy volume 536870992 () in cell openstack.org" so there was some temporary blip at that time but that somehow left apache in a bad state | 13:20 |
Tengu | :) | 13:21 |
Tengu | maybe that mod_autoindex doing some funny things in the back and unhappy with that blip... | 13:21 |
fungi | 536870992 is the docs volume's read-only replica, served from afs01.dfw.openstack.org and afs01.ord.openstack.org | 13:24 |
fungi | both of those servers seem to be up and happy though | 13:24 |
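For anyone reproducing this kind of check, the volume state and its replica sites can be inspected with the OpenAFS vos tooling; a sketch, assuming you are on a host where -localauth is available:

    # Show the status of the volume from the dmesg message (the docs RO replica)
    vos examine 536870992 -localauth
    # Show where the read-write and read-only sites for the docs volume live
    vos listvldb -name docs -localauth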
Tengu | network glitch? | 13:25 |
fungi | highly likely | 13:25 |
openstackgerrit | Merged openstack/project-config master: CentOS 8 Stream initial enablement for AArch64 https://review.opendev.org/c/openstack/project-config/+/772887 | 13:25 |
fungi | i'm checking our network graphs for the involved systems to see if maybe there was bandwidth starvation | 13:26 |
openstackgerrit | Pedro Luis Marques Sliuzas proposed openstack/project-config master: Add Metrics Server App to StarlingX https://review.opendev.org/c/openstack/project-config/+/773883 | 13:26 |
*** jamesmcarthur has joined #openstack-infra | 13:27 | |
fungi | there's been a fairly steady volume of outbound traffic from afs01.dfw which started up around that time, though it's not substantial, just mildly anomalous | 13:28 |
Tengu | stop that torrent service then ;) | 13:29 |
* Tengu runs away | 13:29 | |
fungi | heh | 13:29 |
fungi | i don't find corresponding inbound traffic on the other fileservers so i don't think it was a volume release | 13:34 |
fungi | but also could be entirely unrelated | 13:34 |
*** jamesmcarthur has quit IRC | 13:36 | |
*** jpena|lunch is now known as jpena | 13:37 | |
frickler | fungi: the slowness of static.o.o was well before those latest "busy volume" msgs appeared, they weren't there when I checked the server initially today, so pretty sure they're not the trigger for this event | 13:46 |
fungi | cool, so probably an unrelated coincidence | 13:46 |
fungi | oh, right, it was mentioned in #opendev at 08:36z | 13:47 |
fungi | so had already been going on for at least an hour or two | 13:47 |
*** zul has joined #openstack-infra | 13:50 | |
*** jamesmcarthur has joined #openstack-infra | 13:51 | |
*** ykarel_ has joined #openstack-infra | 13:51 | |
*** ykarel has quit IRC | 13:54 | |
*** jamesmcarthur has quit IRC | 13:56 | |
*** lbragstad has quit IRC | 13:57 | |
*** amoralej|lunch is now known as amoralej | 13:58 | |
*** ykarel_ is now known as ykarel | 13:59 | |
*** ociuhandu has quit IRC | 14:08 | |
*** jamesmcarthur has joined #openstack-infra | 14:08 | |
*** ociuhandu has joined #openstack-infra | 14:09 | |
*** jamesmcarthur has quit IRC | 14:15 | |
*** rlandy is now known as rlandy|training | 14:21 | |
*** ociuhandu has quit IRC | 14:27 | |
*** ociuhandu has joined #openstack-infra | 14:28 | |
*** ociuhandu has quit IRC | 14:32 | |
*** akahat|rover is now known as akahat | 14:34 | |
*** arxcruz is now known as arxcruz|ruck | 14:34 | |
*** lbragstad has joined #openstack-infra | 14:35 | |
*** sreejithp has joined #openstack-infra | 14:40 | |
*** gfidente has quit IRC | 14:46 | |
*** ysandeep is now known as ysandeep|afk | 14:48 | |
*** jamesmcarthur has joined #openstack-infra | 14:52 | |
*** gfidente has joined #openstack-infra | 14:54 | |
*** jamesmcarthur has quit IRC | 14:57 | |
*** bcafarel has quit IRC | 14:58 | |
*** d34dh0r53 has quit IRC | 15:01 | |
*** d34dh0r53 has joined #openstack-infra | 15:01 | |
*** jamesmcarthur has joined #openstack-infra | 15:05 | |
*** ociuhandu has joined #openstack-infra | 15:06 | |
*** jamesmcarthur has quit IRC | 15:09 | |
*** ociuhandu has quit IRC | 15:15 | |
*** jamesmcarthur has joined #openstack-infra | 15:18 | |
*** ociuhandu has joined #openstack-infra | 15:18 | |
*** jamesmcarthur has quit IRC | 15:23 | |
*** ykarel_ has joined #openstack-infra | 15:30 | |
*** ysandeep|afk is now known as ysandeep | 15:31 | |
*** ykarel has quit IRC | 15:32 | |
*** jamesmcarthur has joined #openstack-infra | 15:36 | |
*** jamesmcarthur has quit IRC | 15:43 | |
*** zxiiro has joined #openstack-infra | 15:49 | |
*** dklyle has joined #openstack-infra | 15:51 | |
*** jamesmcarthur has joined #openstack-infra | 15:52 | |
*** dklyle has quit IRC | 15:53 | |
*** david-lyle has joined #openstack-infra | 15:53 | |
*** jamesmcarthur has quit IRC | 15:57 | |
*** jamesmcarthur has joined #openstack-infra | 15:58 | |
clarkb | the daily periodic jobs do start at ~0600 iirc | 15:58 |
clarkb | though I'm unsure if they will actually get nodes quickly since zuul demand has been high | 15:58 |
*** ociuhandu_ has joined #openstack-infra | 16:00 | |
*** ykarel_ is now known as ykarel | 16:00 | |
*** ociuhandu has quit IRC | 16:03 | |
*** ysandeep is now known as ysandeep|away | 16:06 | |
*** rlandy|training is now known as rlandy | 16:07 | |
*** jamesmcarthur has quit IRC | 16:14 | |
*** ykarel has quit IRC | 16:17 | |
*** d34dh0r53 has quit IRC | 16:18 | |
*** d34dh0r53 has joined #openstack-infra | 16:19 | |
openstackgerrit | Matt McEuen proposed openstack/project-config master: New Project Request: airship/gerrit-to-github-bot https://review.opendev.org/c/openstack/project-config/+/773936 | 16:19 |
*** jamesmcarthur has joined #openstack-infra | 16:29 | |
*** jamesmcarthur has quit IRC | 16:29 | |
*** ociuhandu_ has quit IRC | 16:30 | |
*** jamesmcarthur has joined #openstack-infra | 16:36 | |
*** ociuhandu has joined #openstack-infra | 16:38 | |
*** hashar has joined #openstack-infra | 16:45 | |
*** sshnaidm|ruck is now known as sshnaidm | 16:52 | |
*** gfidente has quit IRC | 17:00 | |
*** gfidente has joined #openstack-infra | 17:02 | |
*** lucasagomes has quit IRC | 17:31 | |
*** ralonsoh has quit IRC | 17:31 | |
*** ociuhandu_ has joined #openstack-infra | 17:34 | |
*** ociuhandu has quit IRC | 17:37 | |
*** ociuhandu_ has quit IRC | 17:38 | |
*** david-lyle is now known as dklyle | 17:39 | |
*** amoralej is now known as amoralej|off | 17:49 | |
*** ociuhandu has joined #openstack-infra | 17:56 | |
*** jcapitao has quit IRC | 17:57 | |
*** gfidente is now known as gfidente|afk | 17:57 | |
*** jpena is now known as jpena|off | 17:59 | |
*** ociuhandu has quit IRC | 18:01 | |
*** derekh has quit IRC | 18:02 | |
*** eolivare has quit IRC | 18:07 | |
*** rpittau is now known as rpittau|afk | 18:08 | |
openstackgerrit | Merged openstack/project-config master: Update ACLs of Ironic Projects to allow Edit Hashtags https://review.opendev.org/c/openstack/project-config/+/772427 | 18:10 |
*** dtantsur is now known as dtantsur|afk | 18:16 | |
*** przemeklal has quit IRC | 18:17 | |
*** d34dh0r53 has quit IRC | 18:22 | |
openstackgerrit | Merged openstack/project-config master: Remove anachronistic jobs from scciclient https://review.opendev.org/c/openstack/project-config/+/772908 | 18:23 |
*** d34dh0r53 has joined #openstack-infra | 18:24 | |
*** nightmare_unreal has quit IRC | 18:26 | |
dansmith | how do I get the js for this? https://zuul.opendev.org/t/openstack/job/tripleo-ci-centos-8-containers-multinode | 18:28 |
dansmith | adding .js doesn't do it | 18:28 |
dansmith | clarkb: ^ | 18:29 |
dansmith | er, I mean the json of course | 18:29 |
clarkb | let me see | 18:29 |
fungi | dansmith: what json are you looking for? do you mean the yaml for the job definition? | 18:30 |
dansmith | fungi: isn't that an html rendering of a data structure? | 18:30 |
clarkb | ya and it should be json | 18:30 |
fungi | dansmith: oh! the json from the zuul api, got it | 18:30 |
dansmith | yeah | 18:30 |
fungi | quick batman, to the page source | 18:31 |
clarkb | https://zuul-ci.org/docs/zuul/reference/web.html is the doc but it seems to not have the job lookup defined | 18:31 |
dansmith | maybe I need to restrict my accept? | 18:31 |
clarkb | dansmith: https://zuul.opendev.org/api/tenant/openstack/job/tripleo-ci-centos-8-containers-multinode | 18:31 |
dansmith | ah there it is, thanks | 18:32 |
fungi | clarkb is much faster at reverse-engineering things than i am | 18:32 |
fungi | but yeah, it's just rendered from the job method | 18:33 |
clarkb | fungi: I took a leap that the job listing api and job detail api would be similar | 18:34 |
clarkb | I got lucky :) | 18:34 |
fungi | almost like someone designed a reasonably intuitive api. shocker | 18:35 |
fungi | (and kudos to the zuul developers!) | 18:35 |
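For reference, the endpoint clarkb pasted can be scripted directly; a minimal sketch, assuming the API returns a list of job variants (which is how the web UI appears to render it), each with an optional nodeset:

    # Fetch a job definition from the Zuul REST API and print its node labels.
    import json
    import urllib.request

    url = ("https://zuul.opendev.org/api/tenant/openstack/"
           "job/tripleo-ci-centos-8-containers-multinode")
    with urllib.request.urlopen(url) as resp:
        variants = json.load(resp)

    for variant in variants:
        nodes = (variant.get("nodeset") or {}).get("nodes", [])
        print(variant.get("name"), [n.get("label") for n in nodes])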
dansmith | fungi: clarkb: I'm trying to count up the total node usage time for a given run on a patch | 18:48 |
dansmith | does this look legit? https://pastebin.com/9CW9sSZh | 18:48 |
dansmith | that comes from this patch: https://review.opendev.org/c/openstack/glance/+/770682 | 18:48 |
dansmith | just picked at random | 18:48 |
clarkb | looking | 18:49 |
dansmith | actually | 18:49 |
dansmith | heh | 18:49 |
dansmith | that has no multinode jobs apparently | 18:49 |
dansmith | let me pick a nova | 18:50 |
fungi | dansmith: seems reasonable, how are you extracting the node count? from the inventory? | 18:50 |
fungi | and yeah, a change with some multi-node jobs would help | 18:50 |
clarkb | ya was going to mention it looks right, but no multinode to verify the multiplication that implies | 18:50 |
clarkb | and to clarify those numbers are basically the base patchset cost (without counting for gate or rechecks) | 18:50 |
clarkb | an ideal node hours count | 18:51 |
fungi | right, things like gate resets, build retries, et cetera all pile on top of that | 18:51 |
fungi | rechecks can be covered a little more directly | 18:51 |
clarkb | but may also not need that info if we want to compare assumed cost | 18:52 |
fungi | but any builds internally aborted/discarded/rerun by zuul won't be reported | 18:52 |
clarkb | like if one base cost is vastly different than another we can use that for some comparisons | 18:52 |
fungi | and yeah, that can mostly be assumed as a fixed overhead cost percentage | 18:52 |
dansmith | https://pastebin.com/NdagG24Q | 18:52 |
openstackgerrit | Merged openstack/project-config master: Add Metrics Server App to StarlingX https://review.opendev.org/c/openstack/project-config/+/773883 | 18:52 |
fungi | though technically it will be greater in projects with more involved/longer gate queues | 18:52 |
dansmith | all I care about is base time, not penalizing job weight for rehecks.. just "ideally, this would take N hours of time" | 18:52 |
clarkb | dansmith: ya I think what you've got there looks good with respect to that goal. Note that multinode jobs can have >2 nodes too, I expect you're already handling that but wanted to call it out just in case | 18:53 |
dansmith | rechecks meaning zuul retrying something or other transient things | 18:53 |
fungi | it's an excellent place to start (and also possibly stop, as being "good enough") | 18:53 |
dansmith | clarkb: yep, I'm counting nodeset['nodes'] | 18:53 |
dansmith | that makes nova's count much higher than my off-the-cuff counting of the big jobs in my head | 18:54 |
fungi | "Job nova-grenade-multinode takes 2 nodes for 1h 18m 32s, total 2h 37m 4s" is that time already multiplied by 2 or should be multiplied after to get a total utilization? | 18:54 |
clarkb | fungi: the first number is wall time and second is node hours is how I read it | 18:55 |
dansmith | yes | 18:55 |
dansmith | here's a tripleo: https://pastebin.com/suaYz7hE | 18:55 |
fungi | yeah, so 2 * 1h 18m 32s is 2h 37m 4s | 18:55 |
fungi | sorry, i was being dense ;) | 18:55 |
fungi | both are included | 18:56 |
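dansmith's script itself isn't reproduced here (the pastebins have long since expired), but a rough sketch of the same node-hour arithmetic against the Zuul API might look like this; the builds endpoint's change/patchset filters and duration field are the assumed inputs.

    # Rough sketch: total node-seconds for one patchset of a change, computed
    # as sum(build duration * node count of the job's largest variant).
    import json
    import urllib.request

    API = "https://zuul.opendev.org/api/tenant/openstack"

    def get(path):
        with urllib.request.urlopen(API + path) as resp:
            return json.load(resp)

    def node_count(job_name):
        # A job can have several variants; take the largest declared nodeset
        # and default to a single node when none is declared.
        counts = [len((v.get("nodeset") or {}).get("nodes", []))
                  for v in get("/job/" + job_name)]
        return max([c for c in counts if c] or [1])

    def node_seconds(change, patchset):
        total = 0
        for build in get(f"/builds?change={change}&patchset={patchset}"):
            if build.get("duration"):
                total += build["duration"] * node_count(build["job_name"])
        return total

    print(round(node_seconds(770682, 1) / 3600.0, 1), "node-hours")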
dansmith | only about 50% more than nova, which is surprising given the 40% of the resources that clark's script gives, vs nova's 10 | 18:56 |
fungi | of course, the number of change revisions, number of rechecks issued, times a build failed in pre-run or due to presumed network issues and was retried automatically, times a slew of builds were tanked by a gate reset and restarted... all that adds up | 18:57 |
dansmith | well, not counting manual rechecks or volume of commits, is there really any meat in the "automatically retried by zuul" bucket? | 18:58 |
dansmith | if there is, it'd be good to know that, but I guess I thought compared to a 4h tripleo job run, probably not so much | 18:59 |
dansmith | in the past, when nova was the big user, the high patch volume for nova was an argument to cut down the amount of stuff we run per commit, so I think that's still valid here | 19:00 |
dansmith | very few glance commits means it's not really a big deal how much they run | 19:00 |
dansmith | (for example) | 19:00 |
clarkb | ya, glance's bigger concern will be returning results back to their devs in a reasonable amount of time and with a reasonable chance of success (more jobs means more chance for failure) | 19:01 |
clarkb | the measure may still be worthwile in that context | 19:01 |
dansmith | well, that's the point of this.. not to make the tripleo jobs run faster or slower, but because actual result time is sooooo long right now | 19:02 |
dansmith | but yeah, glance doing 10h worth of work, if it has to be serialized due to unavailable workers, after a 4h wait, is no good | 19:02 |
dansmith | it sounds like tripleo has some jobs they can cut out of their current setup right now, so that may put them closer to where nova is | 19:03 |
dansmith | I still don't have a good way to determine commit volume per project to know if tripleo has like a thousand changes a day such that their jobs would need to be smaller, or whether nova is still high in terms of review pushes per day compared to others | 19:04 |
clarkb | fungi: ^ may have scripting tools that can figure that out (or similar enough) | 19:05 |
dansmith | yeah I tried one of his but it crashed | 19:05 |
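One hedged way to get that change volume is Gerrit's REST API; the 7-day window, the n=500 cap, and the project list below are purely illustrative.

    # Count changes touched per project in the last week via the Gerrit REST API.
    # Gerrit prefixes its JSON with ")]}'" to defeat XSSI, so strip the first line.
    import json
    import urllib.request

    GERRIT = "https://review.opendev.org"

    def recent_changes(project, days=7, limit=500):
        url = f"{GERRIT}/changes/?q=project:{project}+-age:{days}d&n={limit}"
        with urllib.request.urlopen(url) as resp:
            body = resp.read().decode("utf-8")
        # If the last entry has _more_changes set, paging with the S= offset
        # parameter would be needed for an exact count.
        return json.loads(body.split("\n", 1)[1])

    for project in ("openstack/nova", "openstack/tripleo-heat-templates"):
        print(project, len(recent_changes(project)))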
dansmith | tripleo uses 20 nodes for that run, compared to nova's 27 FWIW | 19:06 |
* dansmith also thinks we have fat we can cut from nova | 19:06 | |
clarkb | the node counts would be another interesting stat to track too. Since once you've got a node its yours for ~ 3hours until it times out. But you may have to wait for more nodes if you need more to run the job set | 19:07 |
clarkb | My hunch is that this will be less useful than node hours since node hours more accurately capture the pool usage | 19:08 |
clarkb | but I could be wrong | 19:08 |
dansmith | yeah, there's definitely a "but how much of this hour were you *wasting* a node" but that's harder | 19:10 |
clarkb | dansmith: was your example for tripleo from a tripleo-heat-templates change? | 19:10 |
clarkb | based on the zuul log scraping I did that repo uses the most overall node hours | 19:11 |
dansmith | clarkb: it was | 19:11 |
dansmith | I will clean this up and pastebin it | 19:11 |
clarkb | and thats a 13.8% tht vs 8.96% nova in my scrape of the last week | 19:12 |
clarkb | which is closer to a 20 vs 31 hours ratio | 19:12 |
dansmith | true I was comparing all of tripleo | 19:13 |
clarkb | 1.54 ratio vs 1.55 according to python float math | 19:14 |
clarkb | based on that I'd like to say "that means tht and nova have similar change rates in gerrit" but I don't think we can make that leap yet (due to rechecks and potential for gate thrashing). It is a good sign that things seem to line up between two different viewpoints | 19:15 |
dansmith | yeah | 19:16 |
fungi | dansmith: also a single change for a project is likely a poor sample size for resources used, given rampant application of file and branch filters | 19:16 |
*** andrewbonney has quit IRC | 19:17 | |
clarkb | ya https://review.opendev.org/766980 in the gate right now is running more jobs than in dansmith's example (which is why I asked if the source was tht), But I think its still a good starting point and we can take more samples and see what median/average/whatever looks like | 19:17 |
fungi | but yeah, revision count points to developer/feedback loop efficiency, rechecks point to job instability, automated retries possibly to jobs doing unorthodox things which could crash test nodes, et cetera | 19:18 |
dansmith | yeah, obviously there's a potential for under-counting there | 19:18 |
fungi | the average revision count will be higher, for example, in projects which have a culture of very long change stacks, when developers rebase most of a stack to address a review comment on something toward the beginning | 19:20 |
fungi | something as simple as avoiding unnecessarily stacking commits together when they're not strictly dependent could make a significant difference | 19:21 |
fungi | i mean, squashing commits could help in similar ways, but honestly i wouldn't advocate for workflow changes which reduce reviewability for the sake of resource savings | 19:22 |
dansmith | clarkb: that one you linked above is 31.5h, not too much more | 19:23 |
dansmith | although that was a check run, not gate | 19:23 |
dansmith | any chance their gate is larger than check? | 19:23 |
clarkb | oh interesting, it could be I suppose | 19:23 |
fungi | usually the opposite, changes might run in check which are omitted in the gate, but sure it's possible they've done the reverse for some reason | 19:24 |
clarkb | we generally suggest that gate should always be a subset of check (so that you have confidence that when things go into the gate they will pass as they already passed in check) | 19:24 |
dansmith | https://termbin.com/eslv | 19:24 |
dansmith | fungi: right, but that's why I'm asking | 19:24 |
fungi | also trimming in dependent pipelines like gate is likely to have a bigger impact than in independent pipelines like check, because of the "gate reset" factor | 19:25 |
dansmith | yup | 19:25 |
dansmith | that's basic gating 101 to me now, but maybe that has been lost over the years | 19:25 |
dansmith | so to be honest, both nova and tripleo seem pretty fat to me, given that one computer couldn't likely finish all the work done on *each patch* in a friggin day | 19:28 |
dansmith | but the tripleo bigness seems less bigly than I was expecting | 19:28 |
openstackgerrit | Matt McEuen proposed openstack/project-config master: New Project Request: airship/gerrit-to-github-bot https://review.opendev.org/c/openstack/project-config/+/773936 | 19:29 |
clarkb | dansmith: fungi looking at https://review.opendev.org/c/openstack/tripleo-heat-templates/+/766980/ check jobs it ran 19 jobs in check and has 17 in gate. It appears the two missing in gate were non voting in check | 19:29 |
clarkb | so that all checks out from a gate is subset of check perspective | 19:30 |
dansmith | yeah | 19:30 |
fungi | dansmith: i would at least be cromulent not to embiggen them further | 19:30 |
fungi | er, it would | 19:30 |
dansmith | for sure, and I think both could shrink down | 19:30 |
dansmith | in nova, I've proposed to stop running two grenades that have 100% overlap, but there's a ceph zuulv3 thing that has to be worked out first | 19:31 |
dansmith | as an example | 19:31 |
dansmith | and I think that the tripleo people are open to dropping some or all of those current nv jobs, as well as one of theirs that was covering an upgrade scenario that has recently passed | 19:31 |
clarkb | I think it's more likely that the difference in jobs I noticed between dansmith's paste example and the change in the gate is based on file matchers triggering different sets of jobs for different kinds of changes to the repo | 19:31 |
dansmith | and that's goodness from a resource conservation perspective | 19:32 |
clarkb | dansmith: worth mentioning the "experimental" queue is a good place for things that we'd like to be able to run on demand but don't need often | 19:32 |
dansmith | clarkb: yep | 19:32 |
clarkb | that way you can balance dropping jobs against needing to run them occasionally | 19:32 |
dansmith | especially things that we know fail or fail a lot and are just for traffic lighting | 19:32 |
dansmith | traffic lighting? maybe "sniff testing" | 19:33 |
dansmith | anyway | 19:33 |
dansmith | so, funny story | 19:35 |
dansmith | I picked a neutron one | 19:35 |
dansmith | 54.5h | 19:35 |
clarkb | wow | 19:36 |
dansmith | slaweq: ^ | 19:36 |
dansmith | hate to think what it was before the recent patch to drop a bunch of co-gating | 19:37 |
dansmith | clarkb: also, 42 nodes | 19:37 |
fungi | experimental has a low priority... is it lower than check priority though? if memory serves we're still limited to three priority levels because of the gearman protocol, and release activities fall into the top priority, then gating... | 19:37 |
dansmith | vs 22 for a tripleo run | 19:37 |
* fungi tries to figure it out | 19:37 | |
clarkb | fungi: zuul and nodepool have the ability to do finer grained priority than that, but I don't know if we expose it that way in configs | 19:38 |
clarkb | and ya experimental and check should be roughyl at the same level. More that if you know you need something you trigger the experimental jobs but 90% of the time they don't incur any overhead | 19:39 |
dansmith | also I hardly ever see the experimental queue on my dashboard, but that might be because there's nothing from nova in it | 19:39 |
fungi | check and experimental are currently both precedence low in https://opendev.org/openstack/project-config/src/branch/master/zuul.d/pipelines.yaml | 19:39 |
dansmith | Job neutron-tempest-dvr-ha-multinode-full takes 3 nodes for 1h 53m 11s, total 5h 39m 33s | 19:39 |
dansmith | hoo boy | 19:39 |
fungi | oh, right, the point is that things in the experimental pipeline are not automatically triggered, only run when someone issues a "check experimental" comment | 19:39 |
clarkb | https://zuul-ci.org/docs/zuul/reference/pipeline_def.html#attr-pipeline.precedence ya we still only expose the three levels but I'm pretty sure internally its a range 0-999 or similar | 19:40 |
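For context, the "check experimental" behaviour comes from the pipeline's trigger definition; a stripped-down sketch of that shape (the real definition, with its full reporter list, lives in openstack/project-config's zuul.d/pipelines.yaml):

    # Simplified sketch of an on-demand pipeline; reporters are trimmed.
    - pipeline:
        name: experimental
        description: On-demand pipeline for expensive or rarely needed jobs.
        manager: independent
        precedence: low
        trigger:
          gerrit:
            # Jobs only run when someone leaves this review comment.
            - event: comment-added
              comment: (?i)^(Patch Set [0-9]+:)?( [\w\\+-]*)*(\n\n)?\s*check experimental\s*$
        success:
          gerrit: {}
        failure:
          gerrit: {}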
clarkb | dansmith: and now we know the math works for >2 nodes :) | 19:41 |
dansmith | heh | 19:41 |
dansmith | clarkb: is there an appropriate git tree of random infra tools I could throw this in instead of a pastebin? | 19:44 |
openstackgerrit | Merged openstack/project-config master: New Project Request: airship/gerrit-to-github-bot https://review.opendev.org/c/openstack/project-config/+/773936 | 19:49 |
*** rcernin has joined #openstack-infra | 19:49 | |
fungi | we've got a tools directory in opendev/system-config which has been a bit of a dumping ground | 19:52 |
fungi | not certain that's a tradition we want to continue, but maybe | 19:53 |
dansmith | not a big deal, just figured if there was somewhere obvious | 19:53 |
slaweq | dansmith: clarkb: the problem for us is that this multinode dvr job is really the only job which is testing dvr (except the grenade one, but that isn't covering everything) | 19:57 |
dansmith | slaweq: ack, it's not just that job that is the problem, it's just the biggest | 19:58 |
slaweq | dansmith: I know | 19:58 |
slaweq | our problem is that we have many backends and config combinations to test | 19:58 |
slaweq | and because of that we have many jobs | 19:58 |
dansmith | slaweq: sure, lots of projects have many more combinations than they can test | 19:58 |
fungi | also it's not absolutely necessary to run every test for every backend, in theory you could only run tests you expect to have differing results/code paths for those backends | 20:00 |
dansmith | and, sometimes you have to say "-W, let's run experimental on this patch to get coverage from $subsystem" | 20:01 |
fungi | odds are most of the code tested by jobs for two different backends is the same | 20:01 |
slaweq | fungi: yes, I thought about defining some lists of unrelated files for various jobs | 20:01 |
slaweq | I will try to propose something soon | 20:01 |
fungi | well, thinking less in terms of file filters for jobs and more about what tests are run in which jobs, but sure both could help | 20:01 |
slaweq | fungi: ok, I will take a look for that too | 20:02 |
dansmith | yes, irrelevant_files, tempest test filters, and sometimes human intervention knowing when to run an optional job | 20:02 |
slaweq | dansmith: I will work on improvement there | 20:02 |
dansmith | slaweq: thanks, definitely appreciated :) | 20:02 |
fungi | and dansmith's suggestion on using the experimental pipeline for resource-intensive jobs which test a tiny slice of the codebase is good too | 20:02 |
fungi | (tiny compared to the overlap with other automatically run jobs i mean) | 20:02 |
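A minimal sketch of the file-filter approach under discussion; the job name is the one mentioned above, but the parent and path list are illustrative rather than neutron's actual configuration:

    # Hypothetical example: skip an expensive multinode job when a change only
    # touches docs, release notes, or unit tests.
    - job:
        name: neutron-tempest-dvr-ha-multinode-full
        parent: tempest-multinode-full-py3
        irrelevant-files:
          - ^doc/.*$
          - ^releasenotes/.*$
          - ^neutron/tests/unit/.*$
          - ^.*\.rst$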
*** zxiiro has quit IRC | 20:03 | |
*** ociuhandu has joined #openstack-infra | 20:04 | |
dansmith | clarkb: I can't fetch the zuul job definition for the devstack-platform-async job I added in here: https://review.opendev.org/c/openstack/devstack/+/771505 | 20:08 |
dansmith | whyfornot? | 20:08 |
clarkb | dansmith: first guess is because it is a new job that hasn't merged to the data set yet | 20:09 |
clarkb | so it only exists in the context of that change when that change is tested | 20:09 |
dansmith | okay | 20:09 |
dansmith | I guess that makes sense, but I guess I also didn't think that zuul would be collecting all defined jobs somehow either | 20:11 |
*** hashar has quit IRC | 20:11 | |
dansmith | devstack is low volume but I thought it maybe had a lot of resource usage because it runs multinode things, | 20:11 |
dansmith | but 16h and 23 nodes | 20:11 |
clarkb | I think one of the big improvements devstack did was it dropped a bunch of tempest testing | 20:12 |
dansmith | yeah | 20:12 |
clarkb | it tests the devstack portion of tempest jobs in a number of ways but then doesn't worry about tempest a whole lot anymore | 20:12 |
fungi | as far as querying the job method to figure out node counts, yeah that will only work for jobs in merged configuration so jobs which are proposed in a change and executed speculatively won't be queryable that way | 20:17 |
fungi | but also it's a toctou situation, where you may be querying the current node count for a job which was not the node count at the time a build of it ran in the change you're looking at | 20:17 |
dansmith | yeah, clearly this requires running on fresh jobs | 20:18 |
fungi | you can probably safely ignore/discard those situations from a statistical standpoint, but you do have to at least not break when you hit them | 20:18 |
dansmith | is there any sort of post-buildset hook where we could run this, and then be able to plot the big ones out of logstash or something? | 20:18 |
dansmith | we would need the job runtimes some other way obviously | 20:19 |
fungi | might just make more sense to emit them in statsd and be able to query from graphite/in a grafana dashboard | 20:19 |
clarkb | fungi: the trouble there is I don't think we do that sort of data collection at a patchset level | 20:20 |
*** rcernin has quit IRC | 20:20 | |
clarkb | since graphite collapses things down | 20:20 |
*** rcernin has joined #openstack-infra | 20:20 | |
clarkb | but I could be wrong about that (I did look around in graphite a bit yesterday and wasn't finding a better way to do it via its data) | 20:20 |
fungi | yeah, it would definitely require choosing our preferred aggregation up front | 20:20 |
clarkb | oh the other issue was we do more organization by job not buildset | 20:20 |
clarkb | there are probably ways to bend it to our will for this, but I'm not immediately aware of what that process would be | 20:21 |
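If the statsd route were taken, the emitting side might look like the sketch below; the metric naming, the graphite host and port, and the per-buildset granularity are exactly the open questions raised above, and the client shown is just the common Python statsd package.

    # Hedged sketch: emit node-seconds per buildset as a statsd counter.
    # Graphite would then aggregate per project/pipeline, losing the
    # per-patchset view discussed above.
    import statsd

    client = statsd.StatsClient("graphite.opendev.org", 8125,
                                prefix="zuul.nodehours")

    def report_buildset(project, pipeline, node_seconds):
        # e.g. zuul.nodehours.openstack_nova.check
        key = f"{project.replace('/', '_')}.{pipeline}"
        client.incr(key, int(node_seconds))

    report_buildset("openstack/nova", "check", 31 * 3600)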
*** jamesmcarthur has quit IRC | 20:22 | |
*** lamt has joined #openstack-infra | 20:28 | |
dansmith | slaweq: sorry, I think I picked a stable/ patch to run that on.. the master number is 38h not 54h, but node count is still quite high at 32 | 20:48 |
dansmith | slaweq: maybe stable needs some tripleo-ectomy like you did for master? | 20:49 |
fungi | though also if stable branch changes are far less frequent, the gain from spending effort to improve their efficiency could be small | 20:50 |
dansmith | yep | 20:50 |
fungi | like if there are 100 master branch changes for every stable backport, not a lot of pojnt | 20:50 |
fungi | point | 20:50 |
dansmith | at 54h and 42 nodes, I'm not sure it's not worth it, but.. I know the scale is very different | 20:51 |
*** sshnaidm is now known as sshnaidm|afk | 20:52 | |
*** SpamapS has quit IRC | 20:52 | |
*** rcernin has quit IRC | 21:22 | |
*** hamalq has joined #openstack-infra | 21:27 | |
openstackgerrit | Merged openstack/project-config master: Add ansible-role-pki repo https://review.opendev.org/c/openstack/project-config/+/773385 | 21:36 |
*** ociuhandu has quit IRC | 21:36 | |
fungi | and waiting for that to deploy | 21:37 |
fungi | er, wrong channel | 21:37 |
*** sboyron_ has quit IRC | 21:49 | |
*** SpamapS has joined #openstack-infra | 21:54 | |
*** thiago__ has joined #openstack-infra | 22:00 | |
*** tdasilva_ has quit IRC | 22:02 | |
*** xek has quit IRC | 22:03 | |
*** rcernin has joined #openstack-infra | 22:05 | |
*** rcernin has quit IRC | 22:06 | |
*** tdasilva_ has joined #openstack-infra | 22:06 | |
*** thiago__ has quit IRC | 22:06 | |
*** rcernin has joined #openstack-infra | 22:07 | |
*** openstackgerrit has quit IRC | 22:11 | |
*** vishalmanchanda has quit IRC | 22:30 | |
*** tdasilva_ has quit IRC | 22:34 | |
*** thiago__ has joined #openstack-infra | 22:34 | |
*** slaweq has quit IRC | 22:41 | |
dansmith | clarkb: https://zuul.opendev.org/t/openstack/build/ad126256fe7b4b3e9454dbd6a6532ec7/log/job-output.txt#25680 | 22:42 |
dansmith | Speedup: 1.336 | 22:42 |
*** slaweq has joined #openstack-infra | 22:43 | |
clarkb | just over a 5 minute time savings? that's pretty good when you multiply it by the number of devstack jobs run | 22:43 |
dansmith | yup | 22:43 |
*** rcernin has quit IRC | 22:45 | |
*** slaweq has quit IRC | 22:47 | |
*** rlandy is now known as rlandy|bbl | 23:16 | |
*** dchen has joined #openstack-infra | 23:34 | |
*** tosky has quit IRC | 23:54 |