Monday, 2021-11-15

-@gerrit:opendev.org- Dong Zhang proposed: [zuul/zuul] 817885: WIP: just for test https://review.opendev.org/c/zuul/zuul/+/81788504:01
-@gerrit:opendev.org- Dong Zhang proposed: [zuul/zuul] 817885: WIP: just for test https://review.opendev.org/c/zuul/zuul/+/81788504:45
@westphahl:matrix.orgcorvus: it shouldn't be possible for the run handler to process a tenant that is not loaded. A tenant is only added to the abide after config loading is completed for a tenant. Also, priming/reconfig is protected by the tenant read-write lock so I can't imagine a case where the scheduler discards events for pipelines as it should never have an incomplete or partial state of a tenant (e.g. when the read lock is held)06:37
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] 817949: SQL reporter: add substring search on some fields https://review.opendev.org/c/zuul/zuul/+/81794913:44
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] 793159: Web UI: make more filters selectable in build, buildset searches https://review.opendev.org/c/zuul/zuul/+/79315913:49
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] 808041: [web] Early support for pagination in builds, buildsets search https://review.opendev.org/c/zuul/zuul/+/80804113:49
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] 808042: [zuul-web] Add pagination information when querying builds, buildsets endpoints https://review.opendev.org/c/zuul/zuul/+/80804213:49
@jim:acmegating.comswest: but if it wins a connection election and starts distributing connection events to tenants, it wouldn't distribute it to a tenant that wasn't loaded, right?13:56
@westphahl:matrix.orgcorvus: correct, but that's not covered by 817869 AFAIKS14:04
@westphahl:matrix.orgso I think it would make sense to make the connections wait until the scheduler is primed14:06
@jim:acmegating.comswest: okay, makes sense.  i'll rework it to do that.14:07
@jim:acmegating.comswest: we want the connections to accept events, but not distribute them to tenants14:08
@jim:acmegating.com(otherwise, we'd have a huge window of lost events on startup)14:08
@westphahl:matrix.orgyes14:09
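The behaviour agreed on above (connections accept events immediately, but only distribute them once the scheduler is primed) can be sketched as follows. This is a minimal illustration, not Zuul's implementation: `BufferingConnection`, `on_event`, `distribute` and the `primed` flag are hypothetical names chosen for the example.

```python
import queue
import threading


class BufferingConnection:
    """Hypothetical connection that accepts events but defers distribution."""

    def __init__(self, primed: threading.Event):
        # `primed` is assumed to be set by the scheduler once all tenants are loaded.
        self.primed = primed
        self.pending = queue.Queue()
        self.delivered = []  # stand-in for per-tenant event queues

    def on_event(self, event):
        # Always accept the event so nothing is lost during startup.
        self.pending.put(event)

    def distribute(self):
        # Forward buffered events only after the scheduler is primed.
        if not self.primed.is_set():
            return 0
        count = 0
        while True:
            try:
                event = self.pending.get_nowait()
            except queue.Empty:
                break
            self.delivered.append(event)
            count += 1
        return count
```

Buffering rather than dropping is the point of the design: events that arrive before priming stay queued and are forwarded in arrival order afterwards.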
-@gerrit:opendev.org- Felix Edel proposed: [zuul/zuul] 817959: Remove fake SQL classes and variables from tests https://review.opendev.org/c/zuul/zuul/+/81795914:38
-@gerrit:opendev.org- Felix Edel proposed: [zuul/zuul] 817960: Add missing onStop() call to TestMysqlDatabase https://review.opendev.org/c/zuul/zuul/+/81796014:42
-@gerrit:opendev.org- Felix Edel proposed: [zuul/zuul] 817963: Fix NoneType ppc exception in freezeJobGraph() https://review.opendev.org/c/zuul/zuul/+/81796315:10
@felixedel:matrix.orgcorvus:  A small fix for the latest changes to the ppc debugging https://review.opendev.org/c/zuul/zuul/+/817963. In case you have a better approach/idea, feel free to update this change ;-) I just spotted this exception while running some tests. However, this exception doesn't seem to affect the test results.15:14
-@gerrit:opendev.org- Clint Byrum proposed:16:42
- [zuul/nodepool] 817487: Add network and subnetwork to GCE driver https://review.opendev.org/c/zuul/nodepool/+/817487
- [zuul/nodepool] 817990: Adding namespace args to k8s pod type https://review.opendev.org/c/zuul/nodepool/+/817990
@spamaps:spamaps.ems.host```FAIL: nodepool.tests.unit.test_launcher.TestLauncher.test_node_assignment_at_tenant_quota_ram```17:43
@spamaps:spamaps.ems.hostAnyone know if that one regularly times out?17:43
@jim:acmegating.comi've seen it once or twice; it's likely a slightly flaky test17:44
@spamaps:spamaps.ems.hostKK, I'll recheck17:44
@spamaps:spamaps.ems.hostOn to other things.. `Warning: rbac.authorization.k8s.io/v1beta1 Role is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 Role`17:44
@spamaps:spamaps.ems.hostLooks like one of the things the k8s driver does needs some updating for newer k8s.17:45
@spamaps:spamaps.ems.hostI think it's just dropping the beta tag.. but I wonder.. are we testing k8s in the gate? I didn't look yet.17:46
@jim:acmegating.comspamaps: yes, twice: nodepool-functional-k8s and nodepool-functional-openshift17:47
@spamaps:spamaps.ems.host`      minikube_version: v1.22.0  # NOTE(corvus): 1.23.0 failed with 404 on create_namespaced_role`17:55
@spamaps:spamaps.ems.hostIndeed. ;)17:55
@jim:acmegating.comyep, that needs some updating17:56
@spamaps:spamaps.ems.hostI am guessing v1 was ratified a long, long time ago.17:56
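The "dropping the beta tag" idea from the deprecation warning can be sketched on a plain manifest dictionary. This is an illustration only: `upgrade_rbac_api_version` is a hypothetical helper, and it does not show how the nodepool k8s driver actually creates roles.

```python
import copy

# API group/version strings taken from the deprecation warning quoted above.
DEPRECATED_RBAC = "rbac.authorization.k8s.io/v1beta1"
STABLE_RBAC = "rbac.authorization.k8s.io/v1"


def upgrade_rbac_api_version(manifest):
    """Return a copy of `manifest` with the deprecated RBAC apiVersion replaced."""
    updated = copy.deepcopy(manifest)
    if updated.get("apiVersion") == DEPRECATED_RBAC:
        updated["apiVersion"] = STABLE_RBAC
    return updated


# Example Role manifest; the name and rules are made up for illustration.
role = {
    "apiVersion": DEPRECATED_RBAC,
    "kind": "Role",
    "metadata": {"name": "example-role"},
    "rules": [{"apiGroups": [""], "resources": ["pods"], "verbs": ["get"]}],
}
print(upgrade_rbac_api_version(role)["apiVersion"])
```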
@jim:acmegating.comspamaps: i have confirmed that gerrit's zuul was spamming the recursive equality issue as expected18:25
@jim:acmegating.comand it appears it is no longer doing so now that i have restarted it18:26
@clarkb:matrix.orgI've been asked to do a resource usage audit of the opendev zuul/nodepool install and I found the graphite data we emit from zuul for that, but the numbers look off to me. Also we removed the job name from the info log entry on the executors that records this in the logs. Two questions. Am I looking at the statsd graphite data wrong? Any idea what the scale is there? Milliseconds? And would it be terrible to include the job name in the BuildRequest records so that they get logged?18:49
@clarkb:matrix.orgfwiw I'd like to use the statsd data for aggregate reporting, but the logs are still helpful when spot checking specific things18:49
@jim:acmegating.comi think it'd be great to include the job name18:50
@clarkb:matrix.orggreat, in that case I'll try to take a look at adding that in today18:51
@jim:acmegating.comthe graphite data can be a little tricky to interpret; i usually have to have the docs for the timer type handy to work out what i need18:54
@jim:acmegating.comif you paste a link, i can try to help18:55
@clarkb:matrix.orgnoted. I'll have to find the docs for these entries18:55
@jim:acmegating.comyou have to keep the aggregation in mind with timers too18:56
@clarkb:matrix.orgcorvus: https://graphite.opendev.org/?width=586&height=308&target=stats.zuul.nodepool.resources.project.opendev_org-openstack-neutron.instances is the sort of thing I'm looking at18:56
@jim:acmegating.comClark: another option would be the sql db and/or rest api18:56
@clarkb:matrix.orgThe idea would be to compare total instance time for each openstack project in a graph18:56
@clarkb:matrix.org(or something along those lines)18:56
@jim:acmegating.comthe sql db is how zuul answers the question of how long a job takes now18:56
@clarkb:matrix.orgah right18:57
@clarkb:matrix.orgI would probably need to sum all the data points over $timeperiod per project?18:57
@clarkb:matrix.orgThen I could say eg Neutron used X time over this period and Nova used Y time.18:58
@clarkb:matrix.orgintegralByInterval(thatpath, "1h") Seems to give me something maybe useful19:02
@jim:acmegating.comClark: re that graphite link -- the underlying metric is "instance-seconds", but it's being interpreted as a rate per second, so it's instance-seconds/second, and graphite averages that over a minute.19:02
@clarkb:matrix.orgI see, they aren't raw values, they are a rate. Is that how we should be sending the data?19:03
@clarkb:matrix.orgI guess graphite deals with rates19:03
@clarkb:matrix.orgAnyway putting that into integralByInterval seems to give me something closer to what I want (hourly or other period usage)19:04
@jim:acmegating.comwell, it's statsd that's doing the '/second' bit.19:04
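The arithmetic behind turning a per-second rate back into a total can be sketched outside Graphite. The example assumes each datapoint is an average rate in instance-seconds per second over a fixed step (60 seconds here, matching the one-minute averaging mentioned above). The function names are hypothetical, and the sketch makes no claim about how Graphite's own functions scale their output.

```python
def total_instance_seconds(rates, step_seconds=60):
    """Sum rate * step over all datapoints to recover instance-seconds."""
    return sum(rate * step_seconds for rate in rates)


def bucket_totals(rates, step_seconds=60, bucket_seconds=3600):
    """Split the series into fixed-size buckets and total each one."""
    per_bucket = bucket_seconds // step_seconds
    return [
        total_instance_seconds(rates[i:i + per_bucket], step_seconds)
        for i in range(0, len(rates), per_bucket)
    ]


# Two hours of made-up data: two instances busy for an hour, then one.
rates = [2.0] * 60 + [1.0] * 60
print(bucket_totals(rates))  # [7200.0, 3600.0]
```

With these made-up numbers the first hour totals 7200 instance-seconds (two instances for 3600 seconds) and the second hour 3600.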
@clarkb:matrix.orgBig link warning: https://graphite.opendev.org/?width=943&height=529&target=integralByInterval(stats.zuul.nodepool.resources.project.opendev_org-openstack-neutron.instances%2C%20%221d%22)&target=integralByInterval(stats.zuul.nodepool.resources.project.opendev_org-openstack-nova.instances%2C%20%221d%22)&from=00%3A00_20211109&until=23%3A59_20211115 is closer to what I need19:07
@clarkb:matrix.organy idea why when I shift the range to the 8th or previous the scale drops by a factor of 10?19:07
@clarkb:matrix.orgIt seems to do that even if I move the end date backward so not something to do with the range width19:08
@clarkb:matrix.orgI know we fixed a bug related to this but I didn't think it was that recently. Maybe that was when we restarted to pick up the sos work19:09
@jim:acmegating.comi don't remember the details, but i do think there was a period where we may not have been reporting stats correctly19:10
@clarkb:matrix.orgok it is probably good enough to compare more recent stuff and just make it work going forward. Thanks for walking me through this it was helpful19:10
@spamaps:spamaps.ems.host> <@jim:acmegating.com> and it appears it is no longer doing so now that i have restarted it20:49
Woot!
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul] 818019: Add job name to BuildRequest repr https://review.opendev.org/c/zuul/zuul/+/81801920:53
@clarkb:matrix.orgThere is the change I promised. Easier to write than I thought it would be as job_name was already an attribute on BuildRequest20:53
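As a rough illustration of why this kind of change is small when the attribute already exists, here is a sketch of a repr that includes the job name. `ExampleBuildRequest` and its fields are hypothetical stand-ins, not the actual Zuul class or the content of change 818019.

```python
class ExampleBuildRequest:
    """Hypothetical stand-in for a build request record."""

    def __init__(self, uuid, job_name, state):
        self.uuid = uuid
        self.job_name = job_name
        self.state = state

    def __repr__(self):
        # Including job_name means any log line that formats the request shows it.
        return (
            f"<ExampleBuildRequest {self.uuid}, "
            f"job={self.job_name}, state={self.state}>"
        )


print(ExampleBuildRequest("abc123", "tox-py38", "requested"))
```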
@clarkb:matrix.orgI need to followup on my config loading change too as unittests were unhappy with it20:54
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed on behalf of Felix Edel: [zuul/zuul] 817963: Fix NoneType ppc exception in freezeJobGraph() https://review.opendev.org/c/zuul/zuul/+/81796320:58
@clarkb:matrix.orgcorvus:  small thing on https://review.opendev.org/c/zuul/zuul/+/815558 if you have a second20:58
@jim:acmegating.comayep21:00
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 815558: Replace asserts with exceptions https://review.opendev.org/c/zuul/zuul/+/81555821:00
@clarkb:matrix.orgcorvus: for https://review.opendev.org/c/zuul/zuul/+/817869 I guess we're waiting on the new patchset that does connection work prevention?21:12
@jim:acmegating.comClark: yep, i'll try to get to it today, but it's behind a few things in my queue21:18
@clarkb:matrix.orgno rush, I was just making sure I understood the state of the sos queue. I think I've approved a bunch of approvable changes21:21
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul] 817652: Prevent duplicate config file entries https://review.opendev.org/c/zuul/zuul/+/81765221:47
@clarkb:matrix.orgYay for testing. That was an actual bug in the client when I updated readConfig21:48
-@gerrit:opendev.org- Zuul merged on behalf of Felix Edel: [zuul/zuul] 817959: Remove fake SQL classes and variables from tests https://review.opendev.org/c/zuul/zuul/+/81795922:02
-@gerrit:opendev.org- Zuul merged on behalf of Felix Edel: [zuul/zuul] 817960: Add missing onStop() call to TestMysqlDatabase https://review.opendev.org/c/zuul/zuul/+/81796022:06
-@gerrit:opendev.org- Zuul merged on behalf of Felix Edel: [zuul/zuul] 817963: Fix NoneType ppc exception in freezeJobGraph() https://review.opendev.org/c/zuul/zuul/+/81796322:30
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 817646: Re-order scheduler shutdown https://review.opendev.org/c/zuul/zuul/+/81764622:30
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul] 817652: Prevent duplicate config file entries https://review.opendev.org/c/zuul/zuul/+/81765223:16
@clarkb:matrix.orgArg I ran unittests but not the linter before the last push. Sorry about that23:16
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/nodepool] 818030: Add admin password support for Azure driver https://review.opendev.org/c/zuul/nodepool/+/81803023:52
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/nodepool] 807464: Add metastatic driver https://review.opendev.org/c/zuul/nodepool/+/80746423:53
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed:23:58
- [zuul/nodepool] 818030: Add admin password support for Azure driver https://review.opendev.org/c/zuul/nodepool/+/818030
- [zuul/nodepool] 807464: Add metastatic driver https://review.opendev.org/c/zuul/nodepool/+/807464
