Wednesday, 2019-07-24

clarkbany idea how http://logs.openstack.org/58/671858/3/gate/zuul-tox-docs/b6f9626/job-output.txt.gz#_2019-07-23_16_48_40_674698 happened and if it has been corrected?00:14
*** ianychoi has quit IRC00:18
*** ianychoi has joined #zuul00:20
fungiclarkb: fixed by 672372 maybe?00:22
clarkbah I see that now in scrollback thanks00:23
openstackgerritMerged zuul/zuul master: Fix sphinx error  https://review.opendev.org/67237200:33
*** mattw4 has quit IRC00:37
*** jamesmcarthur has joined #zuul00:52
*** jamesmcarthur has quit IRC00:58
*** igordc has quit IRC01:04
*** jamesmcarthur has joined #zuul01:20
*** bhavikdbavishi has joined #zuul01:52
*** bhavikdbavishi1 has joined #zuul01:55
*** bhavikdbavishi has quit IRC01:57
*** bhavikdbavishi1 is now known as bhavikdbavishi01:57
*** jamesmcarthur has quit IRC02:16
*** jamesmcarthur has joined #zuul02:17
*** jamesmcarthur has quit IRC02:21
*** jamesmcarthur has joined #zuul02:22
*** jamesmcarthur has quit IRC02:24
*** jamesmcarthur has joined #zuul02:24
*** jamesmcarthur has quit IRC02:32
*** bhavikdbavishi has quit IRC02:46
*** mattw4 has joined #zuul03:21
*** bhavikdbavishi has joined #zuul03:35
*** mattw4 has quit IRC03:37
*** mattw4 has joined #zuul03:43
*** mattw4 has quit IRC03:51
*** igordc has joined #zuul03:58
*** igordc has quit IRC04:01
*** bolg has joined #zuul04:02
*** raukadah is now known as chandankumar04:02
*** pcaruana has joined #zuul04:27
*** michael-beaver has quit IRC04:31
*** bjackman has joined #zuul04:43
*** pcaruana has quit IRC05:12
*** jangutter has quit IRC05:18
*** pcaruana has joined #zuul05:25
*** bolg has quit IRC05:34
*** bolg has joined #zuul05:36
*** bolg has quit IRC05:55
*** tosky has joined #zuul06:41
*** rlandy has joined #zuul06:58
*** jpena|off is now known as jpena07:12
*** jangutter has joined #zuul07:19
*** jpena is now known as jpena|mtg07:19
daniel2So with the dockerized zuul setup, how does nodepool build images?  Does it still use diskimagebuilder?07:21
*** jangutter has quit IRC07:23
openstackgerritIan Wienand proposed zuul/nodepool master: Enable debug logs for openstack-functional tests  https://review.opendev.org/67241207:23
*** hashar has joined #zuul07:41
*** bolg has joined #zuul07:53
*** threestrands has joined #zuul08:02
*** rlandy is now known as rlandy|mtg08:51
*** altlogbot_2 has quit IRC08:51
*** irclogbot_2 has quit IRC08:51
*** altlogbot_2 has joined #zuul08:53
*** irclogbot_3 has joined #zuul08:53
*** hwangbo has quit IRC09:04
*** jangutter has joined #zuul09:16
*** saneax has joined #zuul09:35
*** bhavikdbavishi has quit IRC09:38
*** bolg has quit IRC09:39
*** bolg has joined #zuul09:48
*** jamesmcarthur has joined #zuul09:49
*** arxcruz is now known as arxcruz|brb09:54
*** jamesmcarthur has quit IRC09:58
*** jamesmcarthur_ has joined #zuul09:58
*** threestrands has quit IRC10:08
*** sshnaidm|afk is now known as sshnaidm10:10
*** jamesmcarthur_ has quit IRC10:11
zbr_can can I add some extra tasks to run on a child-job without overriding pre-run from parent? Is there a way to do this?10:13
zbr_i guess roles would do the trick, but I am not sure how inheritance works with them10:14
AJaegerzbr_: if you add a pre-run, it is run *in-addition*, see also https://zuul-ci.org/docs/zuul/user/config.html#job10:16
AJaegerzbr_: so, you never override pre-run10:16
openstackgerritFabien Boucher proposed zuul/zuul master: Return dependency cycle failure to user  https://review.opendev.org/67248710:16
zbr_AJaeger: super! thanks.10:17
zbr_it totally makes sense to do it like that.10:17
*** hashar has quit IRC10:40
*** saneax has quit IRC11:32
*** saneax has joined #zuul11:33
*** irclogbot_3 has quit IRC11:33
*** irclogbot_3 has joined #zuul11:36
*** wxy-xiyuan has quit IRC11:42
*** hashar has joined #zuul11:57
*** bolg has quit IRC12:04
*** bhavikdbavishi has joined #zuul12:12
*** roman_g has joined #zuul12:22
roman_gHello team! Users question. I have 2 dependent changes in 2 repos - A and B. Change in repo B has proper Depends-On: xxxxxx pointing to the change in repo A. Code in repo B git-clones repo A internally and it currently clones 'master'. How could I properly utilize Zuul to do git-clone repo A for me, and apply the parent change over it?12:28
roman_g*User's12:28
flaper87what ansible version does zuul-executor use by default?12:28
flaper87(when there are multiple versions installed, that is)12:28
flaper87is it the oldest or the latest?12:29
*** arxcruz|brb is now known as arxcruz12:29
Shrewsflaper87: https://zuul-ci.org/docs/zuul/user/config.html#attr-job.ansible-version12:29
*** bhavikdbavishi has quit IRC12:35
*** bhavikdbavishi has joined #zuul12:39
flaper87Shrews: thanks12:39
*** bhavikdbavishi1 has joined #zuul12:42
*** bhavikdbavishi has quit IRC12:43
*** bhavikdbavishi1 is now known as bhavikdbavishi12:43
AJaegerroman_g: Use required-projects - it's a list of repos that we download on disk for you, it's the same branch plus your depends-on changes on it.12:44
AJaegerroman_g: So, just use that repo, you get it for free ;)12:44
AJaegerI meant: checked out tree - and not entirely free but no magic for you to apply a change12:44
AJaegerroman_g: you can pass that dir to your jobs like in https://opendev.org/zuul/nodepool/src/branch/master/playbooks/nodepool-functional-openstack/pre.yaml#L412:47
*** bolg has joined #zuul12:50
*** yoctozepto has joined #zuul12:54
roman_gAJaeger: thank you! Looking into it.12:56
*** bjackman has quit IRC12:59
*** bolg has quit IRC12:59
*** bolg has joined #zuul13:01
*** bolg has quit IRC13:05
*** jku has joined #zuul13:06
*** jku has quit IRC13:06
*** jank has joined #zuul13:07
*** jank has quit IRC13:07
tristanCcorvus: about zuul-tests.d, is there a change planned to match such test job when a job definition changes (to avoid adding the full zuul.yaml to the files list)?13:09
*** jank has joined #zuul13:10
AJaegertristanC: already merged ;)13:11
AJaegertristanC: see https://review.opendev.org/66975213:11
*** jpena|mtg is now known as jpena|off13:12
*** jank has quit IRC13:14
tristanCAJaeger: thanks. So a tox-linters-test job would set tox-linters as parent to trigger when tox-linters definition change right?13:15
*** jank has joined #zuul13:15
openstackgerritFabien Boucher proposed zuul/zuul master: Return dependency cycle failure to user  https://review.opendev.org/67248713:15
AJaegertristanC: not sure what happens if parent changes, best check implementation or ask corvus...13:16
*** bhavikdbavishi has quit IRC13:20
tristanCcorvus: it seems like  match-on-config-updates implies that a job is able to self test itself. Wouldn't it make sense to let another job run when a job definition change? e.g. tox-linters-test.match-on-job-update: [tox-linters] ?13:24
*** jpena|off is now known as jpena13:25
*** jpena is now known as jpena|mtg13:27
*** jamesmcarthur has joined #zuul13:55
*** jeliu_ has joined #zuul14:07
*** lennyb has quit IRC14:18
*** jamesmcarthur has quit IRC14:25
*** michael-beaver has joined #zuul14:29
*** mattw4 has joined #zuul14:31
*** rlandy|mtg has quit IRC14:32
fungidaniel2: depends on what you mean by "dockerized zuul setup" but sure you can run nodepool builders (including diskimage-builder) in a container. nodepool can also talk to container orchestration engines (kubernetes, openshift) to launch containers for jobs to run in, if that's what you're asking14:35
*** jpena|mtg is now known as jpena|off14:36
corvustristanC: right, the match-on-config-update feature was designed to eliminate the need for '.zuul.yaml' in files matchers, so that when a job is updated, it is run.  most jobs are self-testing.  i'm having trouble imagining a job which tests another job without also being descended from that job.  in zuul-jobs, we have a lot of test-foo jobs, but they are testing roles.  a tox-linters-test job could inherit14:39
corvusfrom tox-linters, and it should match on a job update to either.  so i think even that use-case is covered.14:39
tristanCcorvus: so match-on-config-updates also matches parent job definition change?14:42
corvustristanC: afaik it should14:42
corvustristanC: it's basically implemented as a diff for the finalized job config; if anything about what it's about to run changes, it'll match14:43
tristanCcorvus: the use-case is to be able to test the pre/post phase of the job14:43
tristanCcorvus: for example, for tox job, we might want to be able to setup a tools/tests-setup.sh, which would need to be done before the unittest pre phase14:44
corvustristanC: got it.  yeah, that should work14:45
*** igordc has joined #zuul14:49
*** AshBullock has joined #zuul14:51
*** mattw4 has quit IRC14:51
AshBullockHey guys, we are trying to hook up kubernetes to nodepool, we see in the documented configs two labels, namepace and pod, we've added both of these, but are seeing "AttributeError: 'NoneType' object has no attribute 'create_namespace'"  in the nodepool logs, how are these labels supposed to be setup? Thanks14:55
tristanCAJaeger: there should be an exception in the logs about "Couldn't load client from config"14:56
tristanCAshBullock: ^ (sorry AJaeger, wrong autocomplete)14:56
AshBullockthis is our config file for nodepool http://paste.openstack.org/show/754804/14:59
*** mattw4 has joined #zuul15:00
AshBullockand the error we get is http://paste.openstack.org/show/754805/15:00
AshBullockand to confirm we do see the error you mentioned 2019-07-24 14:55:57,139 ERROR nodepool.driver.kubernetes.KubernetesProvider: Couldn't load client from config15:01
*** swest has quit IRC15:03
tristanCcorvus: alright, we'll give this a try then. I guess we can re-run the pre and post phase of the parent in the run phase of the test job and do the assert in the child job15:03
tristanCs/test job/child job/15:04
tristanCAshBullock: did you setup the ~nodepool/.kube/config file?15:05
AshBullockhave the kube config added yes15:06
AshBullockNow receiving this error: "Failure","message":"namespaces is forbidden: User \"system:anonymous\" cannot create resource \"namespaces\" in API group \"\" at the cluster scope","reason":"Forbidden","details":{"kind":"namespaces"},"code":403}15:07
AshBullockafter updating my nodepool config15:08
AshBullockto reference the kube context name correctly15:08
tristanCAshBullock: perhaps there is something to enable in EKS to enable your service account to list/create resources15:09
*** jank has quit IRC15:09
AshBullockthanks, I'll look into that now15:09
tristanCAshBullock: iirc eks requires an iam token to use the api from outside. is your nodepool-launcher service running in eks?15:12
*** roman_g has left #zuul15:29
AshBullockthanks, managed to get it spinning up now, issue was the aws client was out of date and did not have the get-token command15:32
AshBullockthanks for all the help15:33
clarkbwe might want to make note of that requirement in the docs?15:34
corvusyeah, any additional info that could help future users would be great :)15:37
AshBullockSo I've got the containers running but get this error: main | MODULE FAILURE: error: You must be logged in to the server (Unauthorized)     after installing kubectl on the executor, I can run kubectl commands from nodepool user but I assume the ansible run is using a virtual env, any ideas how to solve this?15:42
AshBullockthis is running on the pre.yml tasks15:43
*** mattw4 has quit IRC15:44
*** mattw4 has joined #zuul15:44
AshBullockwhich is running zuul roles  add-build-sshkey , prepare-workspace15:45
tristanCAshBullock: nodepool creates a service account per zuul noderequest like so: https://opendev.org/zuul/nodepool/src/branch/master/nodepool/driver/kubernetes/provider.py#L15515:46
tristanCAshBullock: which may different from the one you had to configure for nodepool...15:47
AshBullockso that created service account probably doesn't have access to the credentials?15:48
AshBullockso how would I pass them through to the user?15:48
*** mattw4 has quit IRC15:49
tristanCAshBullock: it shouldn't have access to the credentials to prevent the job from tempering with another job resources.15:50
tristanCAshBullock: it is meant to be a restricted account for the namespace created per job15:51
AshBullockwe're targeting hosts: all, should we be targeting hosts: pod as per this guide: https://www.softwarefactory-project.io/tech-preview-using-openshift-as-a-resource-provider.html15:51
tristanC"hosts: all" should match the resources given to the job15:53
tristanCthe blog post uses "pod" instead just to be more explicit15:54
AshBullockwith containers is there a list of approved zuul roles to use?15:56
tristanCAshBullock: most should work, except for the one using the "synchronize" module15:57
tristanCAshBullock: here is how we copy the sources to pod: https://review.opendev.org/63140215:58
*** AshBullock has quit IRC16:04
clarkbI have rechecked https://review.opendev.org/#/c/671858/3 since sphinx builds were fixed (just a heads up that that should be going in)16:06
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: Download-artifact: use the artifact type rather than name  https://review.opendev.org/67255716:09
*** hashar has quit IRC16:10
*** AshBullock has joined #zuul16:13
*** pcaruana has quit IRC16:13
mordredcorvus: that also looks good - I can go either way16:20
corvusmordred: it's a little more work, but ending up with "_type" is going to bother me less, so i'm perfectly happy to do it :)16:21
mordred\o/16:21
openstackgerritJames E. Blair proposed zuul/zuul master: Add log browsing to build page  https://review.opendev.org/67190616:25
openstackgerritJames E. Blair proposed zuul/zuul master: Move artifacts to their own section  https://review.opendev.org/67237916:25
daniel2fungi: So I added the nodepool builder container, I'm not sure if you or anyone active here has done this, but I can't get the images directory to create due to permission issues.  Even if its put inside of the home /var/lib/nodepool.16:27
daniel2Docker log shows: PermissionError: [Errno 13] Permission denied: '/var/lib/nodepool/images/builder_id.txt'16:28
clarkbdaniel2: can you check the uid:gid ownership of that file and that of the nodepool-builder process?16:30
clarkbthe nodepool-builder records that file so that it identifies itself uniquely to zookeeper iirc. It will need to be able to write to that directory and file16:31
daniel2clarkb: so when docker runs, it does a whoami, but fails `whoami: cannot find name for user ID 10001`16:31
daniel2This is using the zuul/nodepool-builder image16:31
daniel2That folder is probably not the right permissions but not sure how to change that.  I guess I could write a file to run before the command16:32
clarkbI expect the nodepool-builder image wants you to mount in a volume or bindmount for that directly so that it persists between docker container instances16:32
clarkband you'll have to set permissions appropriately?16:32
daniel2clarkb: That's what I'm doing.16:32
daniel2using a bind mount16:32
clarkbdaniel2: you can docker exec a shell into the image to poke around and check things like permissions16:32
daniel2I cant because the container died due to nodepool-builder exiting.16:32
clarkbyou can run it then and change the command16:33
daniel2dunno why I didnt think of that16:33
daniel2must be the lack of sleep16:33
clarkbjust double confirming that uid is the one set by the image so that is the expected value16:35
daniel2clarkb: this is just strange, the permissions appear correct.16:38
daniel2Would it possibly be an issue with the host?16:38
pabelangeris /var/lib/nodepool/images/ a mount?16:39
daniel2/dev/mapper/area50--vg-root on /var/lib/nodepool/images type ext4 (rw,relatime,errors=remount-ro,data=ordered)16:39
pabelangerthat is from inside container?16:40
daniel2yes16:40
pabelangerand touch /var/lib/nodepool/images/foo.txt works?16:40
daniel2no, gives permission denied16:40
pabelangerwhat does mount outside of container look like16:41
clarkband what are permissions of the directory16:41
clarkb(to create a file you need w on the dir iirc)16:42
daniel2drwxrwxr-x 2 ubuntu ubuntu 4.0K Jul 24 04:50 images16:42
pabelangerwhat is uuid of ubuntu?16:42
clarkbthat would be the issue I think16:42
daniel2100016:42
pabelangerthat should be the same as user in container16:42
daniel2eheh16:42
daniel2Can I specify the user id in the docker compose file16:43
clarkbwe may need to modify our entrypoint to chown that16:43
daniel2ah I see16:43
pabelangerI don't think we start a nodepool-builder in quickstart do we?16:43
daniel2no we don't.16:43
clarkbpabelanger: we don't16:43
daniel2I added that myself.16:43
Shrewsquickstart uses static nodes so builder isn't necessary16:45
daniel2builder_1       | chown: changing ownership of '/var/lib/nodepool/images': Operation not permitted16:45
daniel2:D16:45
pabelangerdaniel2: https://opendev.org/windmill/ansible-role-nodepool/src/branch/master/tests/playbooks/templates/etc/systemd/system/nodepool-builder.service.j2 is how I do it directly with docker, -u 1001:1001, for the volumes16:46
*** mattw4 has joined #zuul16:46
pabelangernot sure how do to it with compose16:46
clarkbpabelanger: that presumes you chown outside of docker though right?16:46
daniel2ohhh16:46
clarkbpabelanger: its the same problem in either case whether you change the uid or not16:46
clarkbsomething has to set ownership on that dir an dlooks like the current image is failing to do so16:47
daniel2Well, I guess changing the id wouldn't work in nodepool-builder16:47
clarkbdaniel2: do you have more log context for the chown failure?16:47
daniel2no thats all it said16:47
pabelangerclarkb: I cann't remember. I'd have to look at docs16:47
pabelangerbut I don't chown anything directly, docker does it16:47
pabelangerdaniel2: but that is using zuul/nodepool-builder images, so should work16:48
daniel2I could try and bind mount the service file16:48
daniel2oh no16:49
daniel2Thats to start with docker16:49
pabelangeryah, this isn't using compose, just docker directly16:49
pabelangeryou could just try command manually, and see if it works16:50
daniel2https://shafer.cc/paste/view/0c89553c That was what I had in docker-compose.yaml when I tried with chown16:50
pabelangerthen, work to add it to compose16:50
clarkbdaniel2: did you chown it to ubuntu:ubuntu then?16:51
daniel2I got past it16:51
daniel2I set user: 1000:1000 in docker-compose section for builder16:52
clarkbsure but if you didn't chown it to 1000:1000 in the first place would it have owrked?16:52
daniel2No, I didn't chown it until you guys had mentioned it16:52
daniel2before I wasn't doing anything outside of the norm16:52
clarkbfwiw our entrypoint for the opendev gitea images does an explicit chown of the mounts. I'm not see where we might do that for nodepool-builder16:52
clarkbdaniel2: what i sthe context of builder_1       | chown: changing ownership of '/var/lib/nodepool/images': Operation not permitted ? was that a chown you ran or a chown that it tried to rn on its own?16:53
daniel2it was a chown I ran16:53
daniel2using the command config line in the docker-compose section16:53
pabelangerclarkb: Yah, same. I have ansible chown / chmod the /var/lib/nodepool folder to specific user, which is uuid of user in container. I assume compose has the same ability16:53
clarkbgotcha so I think we may want ot update nodepool/tools/uid_entrypoint.sh to chown things properly16:53
clarkbpabelanger: its not compose's job, it is the entrypoint16:54
clarkbat least with how we've set up gitea16:54
Shrewsperhaps the best course of action is to set images-dir in nodepool.yaml to some place that a) it has permission to write to, and b) has space to actually build images17:00
daniel2mkdir: cannot create directory '/var/lib/nodepool/.cache/image-create': Permission denied17:00
daniel2this is getting old D:17:00
Shrewsthat will fix the builder_id.txt issue since it writest to images-dir17:00
Shrewsan external volume is probably best17:01
daniel2It is.17:01
daniel2I have no name!@7c14b606484d:/$ ls17:01
daniel2Nice hostname17:01
daniel2or username.17:01
clarkbShrews: yes the way it is done with other tools is to have the entrypoint do the chown17:01
clarkbShrews: it would also work to chown outside of the container17:01
* clarkb gets an example17:02
clarkbShrews: https://opendev.org/opendev/system-config/src/branch/master/docker/gitea-init/entrypoint.sh#L17-L26 we could do that with nodepool's entrypoint17:04
Shrewsclarkb: but /var/lib is within the actual container, right? i'm suggesting an external volume mounted at run time17:04
clarkbShrews: yes17:04
clarkbsame as with /data/ in the gitea example17:05
*** AshBullock has quit IRC17:09
*** igordc has quit IRC17:16
*** igordc has joined #zuul17:17
daniel2So I fixed one problem and created another :)17:23
openstackgerritJames E. Blair proposed zuul/zuul-operator master: WIP: testing  https://review.opendev.org/67256717:24
*** sgw has quit IRC17:25
fungidaniel2: welcome to computers? ;)17:25
daniel2haha right17:25
fungipretty much describes my typical day17:25
daniel2at least I finished enough of the CI setup that we were able to close out that issue for the sprint.17:26
daniel2Thats why I was up so late, wanted to knock that out.17:26
fungiawesome~!17:26
daniel2We moevd the nodepool stuff to another issue.17:26
*** igordc has quit IRC17:27
yoctozeptotried asking on infra but maybe here is a better place - any idea why http://zuul.openstack.org/builds?project=openstack%2Fkolla-ansible&pipeline=periodic&pipeline=periodic-stable&branch=master&branch=stable%2Fstein&branch=stable%2Frocky&branch=stable%2Fqueens does not return?17:29
yoctozeptoit does if you replace kolla-ansible with kolla17:29
yoctozeptoor remove filter on periodic-stable17:30
yoctozeptootherwise it does not work (or would take hours? waited several minutes already)17:30
fungiit's not clear to me what that query means17:33
fungiare those terms expected to be anded? ored?17:33
clarkbfungi: it operates as an AND17:33
clarkb(sorry lucene query language all caps habit)17:33
clarkbI think that is why you get no results17:34
clarkbyou can't be in two pipelines17:34
fungiare the types compared via and but multiple options within each type compared with or?17:34
clarkbfungi: I don't think so17:34
* clarkb looks at the sql query17:34
clarkbhrm it moved17:35
AJaegerclarkb, fungi: http://zuul.openstack.org/builds?project=openstack%2Fkolla-ansible&pipeline=periodic-stable is much simpler and shows the problem that yoctozepto has...17:36
AJaegeryoctozepto: keep it simple, please ;)17:36
fungii'm guessing the expectation is that this should query "project:openstack/kolla-ansible AND pipeline:(periodic-stable OR stable) AND branch:(master OR stable/stein or stable/rocky OR stable/queens)"17:36
corvusif someone wants to implement that, go for it, but that's not how it works :)17:37
yoctozeptoAJaeger: I did in the other channel, then I discovered this one loads in a couple of minutes17:37
fungiahh, sorry, my network connection here is going in and out, so my responses are lagging somewhat17:37
yoctozepto;-)17:37
clarkbas a sanity check periodic-stable pipeline does have the mysql reporter listed17:37
yoctozeptoguys17:38
yoctozeptobut it works for kolla17:38
yoctozeptomagic: http://zuul.openstack.org/builds?project=openstack%2Fkolla&pipeline=periodic&pipeline=periodic-stable&branch=master&branch=stable%2Fstein&branch=stable%2Frocky&branch=stable%2Fqueens17:38
clarkbAJaeger: that url works for me17:38
yoctozeptohence someone has implemented it17:38
yoctozeptobut it does not want to work for kolla-ansible17:38
fungior this is undefined behavior17:38
yoctozeptofor no particular reason17:38
clarkblooking at the api code it expects a singular pipeline17:38
yoctozeptololz, but it worked so great until I tried it on k-a17:39
yoctozepto;D17:39
yoctozeptohttp://zuul.openstack.org/builds?project=openstack%2Fnova&pipeline=periodic&pipeline=periodic-stable&branch=master&branch=stable%2Fstein&branch=stable%2Frocky&branch=stable%2Fqueens17:39
yoctozeptoetc.17:39
yoctozepto;D17:39
fungibut yes, the results there do seem to match the pseudoquery i wrote above, so possible it works that way by accident17:39
yoctozeptoit also does work for check+gate: http://zuul.openstack.org/builds?project=openstack%2Fkolla-ansible&pipeline=gate&pipeline=check&branch=master&branch=stable%2Fstein&branch=stable%2Frocky&branch=stable%2Fqueens17:40
corvusif there are multiple values for a parameter, then they are treated as an "in" query17:40
fungii wonder if one of those branches or pipelines has no kolla-ansible matches and that's the difference17:40
yoctozeptoonly not if you sprinkle periodic-stable17:41
yoctozeptofungi: good one17:41
yoctozeptolemme check17:41
AJaegeryoctozepto: so, what's the smallest query that shows the problem?17:41
*** pcaruana has joined #zuul17:41
yoctozeptoAJaeger: I wish I knew17:41
AJaegeryoctozepto: yeah, get now results for the one I posted using kolla-ansible - was too impatient last time ;)17:41
clarkbq = self.listFilter(q, buildset_table.c.pipeline, pipeline)17:42
clarkbcorvus: ^ that generates the "in" ?17:42
*** saneax has quit IRC17:42
corvusclarkb: yep17:42
corvusif it's single, it's "==", otherwise it's "in"17:42
clarkbah yup I see the defintion of listFilter now17:43
yoctozeptorocky/queens do not have any periodic-stable17:43
yoctozeptobut withouth them: http://zuul.openstack.org/builds?project=openstack%2Fkolla-ansible&branch=master&branch=stable%2Fstein&pipeline=periodic&pipeline=periodic-stable17:43
yoctozeptostill nothing17:43
*** saneax has joined #zuul17:43
*** saneax has quit IRC17:44
yoctozeptoAJaeger: you might like this, it's shorter ;p17:44
yoctozeptoI think I overloaded Zuul now17:44
yoctozeptoit says Fetching info...17:44
yoctozeptofor the previously working too now17:44
fungiwell, it's only zuul-web you're overloading, presumably17:45
yoctozeptoyeah, thought about that17:45
fungi(if you are)17:45
fungiseparate daemon from the scheduler17:45
yoctozeptoI'm not well versed in zuul's components yet17:45
yoctozeptoI see, will have that in mind17:45
yoctozepto(the next time I overload it)17:45
corvusthis is the query that has been running for 11 minutes: http://paste.openstack.org/show/754810/17:46
clarkbis it possible that seraching over (project, branch, pipeline) simply needs to be better indexed?17:46
clarkbthe query that works is only (project, pipeline)17:46
yoctozeptoclarkb: it works fast for kolla and nova for example17:47
* yoctozepto - zuul-web official overloader17:47
corvuswow it does construct the query differently for kolla and kolla-ansible17:47
* yoctozepto convicted for generating a 11 minute query :-(17:48
fungihuh, that's bizarre17:48
yoctozeptomaybe due to -?17:48
corvushttp://paste.openstack.org/show/754811/17:48
yoctozeptoI mean: -17:48
corvusfirst one is k-a, second is k17:48
yoctozeptoah, on the mysql side17:49
yoctozeptoUsing filesort and Using join buffer (Block Nested Loop) spoil the play :-(17:50
corvusmordred, Shrews: ^ i'm stumped as to why the query planner would make that choice17:57
corvuskolla-ansible has 11k buildsets in the db, and kolla has 10k.  so they should both be in the index17:58
yoctozeptocorvus: could you try explain after replacing USE INDEX with FORCE INDEX?17:58
yoctozeptocorvus: you can also compare with nova as it is fine too17:58
corvusyoctozepto: force index with k-a shows the same plan17:59
Shrewswhat indexes are available on the zuul_build table?18:00
yoctozeptocorvus: yeah, was worth trying anyway, it protects agains full scan on the same table though...18:00
Shrewsit's doing a table scan on that table for both queries18:00
corvushttps://etherpad.openstack.org/p/Z0cucbdugf18:00
*** igordc has joined #zuul18:01
* fungi is enjoying this impromptu crash course in database mysteries18:01
Shrewscorvus: could you get a describe for the other two tables?18:02
* yoctozepto sharing fungi's enjoyment18:03
corvusdone18:03
Shrewswhat's the difference in those two queries? my eyes can't find it18:04
yoctozeptoShrews: kolla vs kolla-ansible18:04
yoctozeptochanging project caused it18:04
corvusyeah, only the string constant; no structural change18:04
Shrewsand they're both slow?18:04
yoctozeptoexactly18:04
yoctozeptonope18:05
yoctozeptokolla fast18:05
yoctozeptokolla-ansible slow18:05
yoctozeptonova fast18:05
yoctozeptoprobably many more fast18:05
Shrewsoh. figuring out why the optimizer does what it does is dark magic. lemme see if i can spot anything obvious though. my guess is the cardinality is significantly different for the projects18:06
fungi(de)optimizer18:07
Shrewsselect count(*) from zuul_buildset where project = "openstack/kolla"       <--- that and a similar count for "openstack/kolla-ansible" might be useful18:09
yoctozeptoand nova18:09
Shrewsor at least interesting18:09
fungi17:58 <corvus> kolla-ansible has 11k buildsets in the db, and kolla has 10k...18:09
yoctozeptoand nova has probably many more18:10
corvus4618:10
corvusexact numbers in etherpad now18:10
yoctozepto;D18:10
yoctozeptok-a has an even number18:11
yoctozeptothe others have odd18:11
corvusnova has the fast query, nova-specs is slow18:12
yoctozeptoand count is?18:13
corvusyoctozepto: and nova-specs is 2153 -- odd :)18:13
yoctozepto:-(18:13
Shrewspipeline might make a difference in counts18:13
corvusyoctozepto: but that is making me wonder about your '-' theory18:13
yoctozeptocorvus: yeah but why xD18:13
corvusit's a longer form of something that's also in the index18:14
yoctozeptoI proposed that before you explained it is mysql query optimizer18:14
yoctozeptoah, you think this way18:14
yoctozeptowell, it did try to use project_pipeline_idx18:15
yoctozeptoand cardinality is different/better18:15
yoctozeptobecause as you observed we have a prefix18:15
yoctozeptoI would try out some others but zuul-web is still angry at me18:16
yoctozeptoactually it does not load for me at all atm18:16
clarkbya I think it is angry more globally18:16
clarkbscheduler is still running so nothing should be lost18:16
yoctozeptoI hope so18:17
corvusi can probably kill the queries18:17
clarkbI checked the scheduler logs and it was busy and happy18:17
yoctozeptobut if mysql locked tables then it is bad anyway18:17
corvusah, there's only one long query right now18:17
corvusrunning for 1566 seconds18:17
yoctozeptokill it anyways18:17
corvusjust died on its own :)18:18
yoctozeptook ;-)18:18
yoctozepto(yeah, right)18:18
corvusif we have more questions, i'm happy to run 'explain' commands so we can find out without tying things up18:18
yoctozeptozuul-web still angry18:18
yoctozeptosure, let's go with more - dash examples18:18
yoctozeptosomething small18:19
yoctozeptokarbor18:19
yoctozeptokarbor-dashboard18:19
yoctozepto(well, relatively to nova)18:19
Shrewsso, when the pipeline is considered in the counts, the cardinality of the rows is much more different: 820 rows vs. 330718:20
Shrewsmight be enough to cause the optimizer to choose a different plan18:20
Shrewssadly, i've forgotten sooooo much about query optimization18:20
yoctozeptoShrews: A vs B18:21
yoctozeptoA = ? B = ?18:21
Shrewsprepared statements might make sense here18:23
yoctozeptoShrews: what cardinalities were those that you posted?18:24
yoctozeptok vs k-a?18:24
yoctozeptok-a vs k?18:24
yoctozeptosomething vs nova? :D18:24
yoctozepto(I can't do that myself, you know)18:24
Shrewsproject name and pipeline counts from zuul_buildsets18:24
Shrewsselect count(*) from zuul_buildset where project = "openstack/kolla-ansible" and pipeline IN ('periodic', 'periodic-stable');18:24
Shrewsselect count(*) from zuul_buildset where project = "openstack/kolla" and pipeline IN ('periodic', 'periodic-stable');18:25
yoctozeptoso kolla simply has more here?18:25
*** panda has quit IRC18:25
Shrewsyes. but nova has 3000+ too.18:26
yoctozeptoand that's why it is fast18:26
Shrewsno. just speculation18:26
yoctozeptoyeah, but quite possible18:26
yoctozeptoit actually used that index18:26
yoctozeptowith 82018:27
yoctozepto|  1 | SIMPLE      | zuul_buildset | NULL       | range | PRIMARY,project_pipeline_idx,project_change_idx | project_pipeline_idx | 1536    | NULL               |     820 |     4.00 | Using index condition; Using where; Using temporary; Using filesort |18:27
yoctozeptoso it liked this one specifically18:27
yoctozepto<corvus> yoctozepto: and nova-specs is 2153 -- odd :)18:27
yoctozeptoShrews: can you check nova-specs filtered?18:27
Shrewsi get 0 with nova-specs18:28
corvusnova-specs with periodic pipelines is 018:28
corvuswhich makes sense18:28
yoctozeptoand it's slow18:28
yoctozeptoslowest 0 in history, been there, done that18:28
Shrewswith the smaller cardinality of that combo, it's picking a poor index (project_pipeline_idx) and we then scan 820 x 7093912 rows, vs just 709391218:31
Shrewsis that a needed index?18:31
yoctozeptowe can exclude it18:32
corvusShrews: which one, project_pipeline_idx?18:32
*** panda has joined #zuul18:33
Shrewsi'd have to get a mysqldump of that data and then remember a whole bunch of stuff before i could say what a fix is18:33
Shrewscorvus: yeah18:33
yoctozeptoIGNORE INDEX (blah)18:33
corvusShrews: i think it's only there to speed up queries like this :)18:34
Shrewscorvus: seems to do the opposite  :)18:34
yoctozeptoif you don't want to remove it entirely, try if IGNORE INDEX after that join will help us for now18:35
corvusyoctozepto: yeah that seems to switch to the other form18:35
yoctozeptoYAY18:35
corvus2614 rows in set (9.24 sec)18:35
yoctozeptoreasonable18:35
*** hwangbo has joined #zuul18:36
yoctozeptostill, could probably be better if things were index more optimally for this type of query18:36
corvusit also seems that removing the branch terms uses the better query18:36
corvusyeah, i think the thinking for the project_pipeline_idx was that it should help this case, because we're giving it a project and a pipeline, so it should be able to get the 820 buildsets that match, then join with the builds.18:38
yoctozepto¯\_(ツ)_/¯18:38
yoctozepto^ that's mysql to us18:38
corvusyoctozepto, Shrews: using a single pipeline helps as well18:39
corvusi wonder if indexing the project and pipeline separately would help?18:39
yoctozeptocorvus: http://zuul.openstack.org/builds?project=openstack%2Fkolla-ansible&pipeline=periodic-stable&branch=master&branch=stable%2Fstein&branch=stable%2Frocky&branch=stable%2Fqueens - seems slow ?18:40
yoctozeptoah not that much18:40
yoctozeptolike 15 s18:40
yoctozeptoso fine indeed18:40
yoctozeptoI got impatient waiting minutes for the bad one18:40
yoctozeptojust like AJaeger did18:41
corvusthat's actually a 3rd query plan18:41
corvusyoctozepto: http://paste.openstack.org/show/754812/18:42
corvuswhen i switched to 1 pipeline, i did 'periodic' and got the 'fast' one.  your switching to 'periodic-stable' got us a new 'medium' one :)18:42
*** tosky has quit IRC18:42
corvus(it looks sort of like the 'slow' one, but it's a 'ref' rather than a 'range' scan)18:42
yoctozeptolol18:43
openstackgerritTristan Cacqueray proposed zuul/zuul-operator master: WIP: debug zuul-operator-functional-k8s job  https://review.opendev.org/67257618:43
clarkbunrelated but do we need to edit zuul's pipeline configs so that rechecked changes go into the gate?18:44
yoctozeptoI don't get this thingy18:45
yoctozeptowe have18:45
yoctozeptoINNER JOIN zuul_buildset ON zuul_buildset.id = zuul_build.buildset_id18:45
yoctozepto[ FROM zuul_build USE INDEX (PRIMARY) ]18:45
tristanCjeliu_: corvus: we shouldn't need the operator-sdk to test the zuul-operator. It seems like the cli should be used for local dev... I'll have a look why the pod doesn't start now18:45
yoctozeptobuildset has18:45
yoctozepto  PRIMARY KEY (`id`),18:45
corvustristanC: agreed, and thanks!  i think jeliu_ was also just looking into that too18:45
yoctozeptobuild has18:45
*** fdegir has quit IRC18:45
yoctozepto  PRIMARY KEY (`id`),18:45
yoctozepto  KEY `buildset_id` (`buildset_id`),18:45
yoctozeptoyet we get:18:45
yoctozepto|  1 | SIMPLE      | zuul_build    | NULL       | ALL  | NULL                                            | NULL                 | NULL    | NULL               | 7094344 |     0.00 | Using where; Using join buffer (Block Nested Loop) |18:45
yoctozeptoit's counterintuitive for me atm18:46
*** fdegir has joined #zuul18:46
yoctozeptodamn lol18:46
yoctozeptoUSE INDEX18:46
yoctozeptowhat happens if drop this bastard?18:46
yoctozeptoif you* drop18:46
corvusyoctozepto: language :)18:46
yoctozeptosorry18:47
yoctozeptonon-native speaker here, I probably feel it differently18:47
corvusyoctozepto: that corrected some very bad queries we used to have18:47
yoctozeptohmm, do you have examples?18:47
yoctozeptodid they use buildset_id or job_name_buildset_id_idx?18:48
yoctozeptomaybe let's add at least buildset_id18:48
yoctozeptoto the list18:48
corvusyoctozepto: https://review.opendev.org/605170 says it was the build list query we were fixing18:49
corvuswe could probably find more detail in irc logs around that18:49
yoctozeptoI checked https://dictionary.cambridge.org/pl/dictionary/english/bastard some dictionaries really list that as offensive, sorry18:50
corvusyoctozepto: no prob :)18:50
corvusyoctozepto: but i think that may have been the query without any project filters18:50
yoctozeptofun thing is the best translations to Polish are not offensive language at all, they are actually much politer than what you actually hear18:51
yoctozepto"MySQL's query optimizer is choosing a poor index to use when"18:51
yoctozeptopoor index is not very precise :D18:51
corvusyoctozepto: agreed.  though i'm not sure we could characterize our current problem with 100% precision :)18:51
yoctozeptoit's a very generic method anyway18:52
yoctozeptothis probably hit some things good, some things bad18:52
yoctozeptocorvus: well, forcibly excluding the best JOIN INDEX18:53
yoctozeptois a BAD IDEA18:53
yoctozepto(TM)18:53
corvusif we drop use index, things get very fast indeed, for k-a, 2614 rows in set (0.99 sec)18:53
yoctozeptocan you re-explain the queries without this?18:54
corvusyoctozepto: line 114 of etherpad18:54
yoctozeptocan yo do that for kolla and nova and nova-specs too?18:55
yoctozeptoand possibly kolla-ansible in the "medium case"?18:55
yoctozepto(interested in how mysql goes about it)18:56
yoctozepto|  1 | SIMPLE      | zuul_build    | NULL       | ref   | buildset_id                                     | buildset_id          | 5       | zuul.zuul_buildset.id |    8 |   100.00 | NULL                                                                |18:56
yoctozepto^ this, sir, is pure heaven18:56
corvusi'm about 30 mins late for lunch now, maybe Shrews can help?18:56
yoctozeptocorvus: sorry, I am actually planning my bedtime now18:57
Shrewswe still may want to consider prepared statement for this if it's executed a lot18:57
Shrewsif it's rare/on-demand... meh18:57
corvusyoctozepto: i'm pretty sure where that breaks down is when we do the same queries without any project, pipeline, or branch selection; ie, what you get when you first hit the builds page without entering any search terms18:57
fungiit's on-demand insofar as folks are querying it via the builds page on the web dashboard18:57
yoctozeptoon-demand, my case possibly rare but I wanted to propose it as CI health check18:57
Shrewsglad my suspicion bore us some fruit though18:57
yoctozeptocorvus: it can be checked easily18:58
corvusyep; someone needs to take over for me though; i need food18:58
corvusbiab18:58
yoctozeptoShrews/fungi?18:59
Shrewsyoctozepto: sorry, got distracted... what do you need?19:00
yoctozeptolet's have explains for completenes19:00
yoctozeptowithtout the use index19:00
yoctozeptocorvus produced one at L11419:00
yoctozeptowould like to look at kolla and nova cases too19:01
yoctozeptoand maybe the no-WHERE case as corvus suggested?19:01
Shrewslemme look19:01
yoctozepto(apart from tenant WHERE probably)19:01
Shrewsyoctozepto: posted nova and kolla explains19:05
yoctozeptohmm, you got the same color as I have ;D19:06
yoctozeptook, it did them all in the same way19:06
yoctozeptowhich is... good19:06
Shrewsand why would we want no WHERE clause?19:06
yoctozeptonot that 'Using index condition; Using where; Using temporary; Using filesort' is very good but still19:07
yoctozeptoShrews: corvus suggested main page was slow without the use index19:07
yoctozeptocould be*19:07
yoctozepto(not was)19:07
yoctozeptoit should run just the tenant where19:07
yoctozeptobut not the others19:07
Shrewspersonally, i'm not interested in complex, slow queries without a where clause19:08
Shrewswe're doing something wrong if we are doing that19:08
yoctozeptoShrews: it's the curse of on-demand stuff19:09
yoctozeptoShrews: in case you want to take a look: http://zuul.openstack.org/builds19:10
yoctozeptothis is what we users are filtering19:10
mordredShrews: I'm back from lunch - it seems like this is a topic that someone might reasonably expect me to also look at?19:11
yoctozeptomordred: be my guest19:11
* mordred is still reading scrollback19:11
Shrewsmordred: i think the immediate need has dissipated19:11
mordredok. there's a LOT of scrollback19:12
fungiwe painted an elder sign on the side of the database and it seems to be keeping the great old ones on the other side of the portal now19:12
mordredoh thank god19:12
mordredI was worried I was going to have to page in the mysql optimizer internals again19:13
yoctozeptofungi: fantasy RPG fan?19:13
yoctozeptomordred: we kind of found the culprit19:13
fungiyoctozepto: or h.p. lovecraft stories. take your pick19:13
yoctozeptofungi: or both?19:13
yoctozeptomordred: USE INDEX(PRIMARY) kills our best available JOIN index19:14
Shrewsmordred: it's been over 10 years for both of us. i question any of the knowledge that we've paged out at this point19:14
mordredShrews: yeah.19:14
mordredyoctozepto: something was doing an explicit index hint?19:14
yoctozeptomordred: yeah, the very generic method19:15
yoctozeptohttps://review.opendev.org/#/c/605170/2/zuul/driver/sql/sqlconnection.py19:15
yoctozeptowish the commit message was a bit less enigmatic19:15
mordredah - yeah - I think I remember chatting about that at the time. 9 times out of 10 index hints wind up being the wrong choice - I think at the time it was presenting as the strange case where it was needed - but now things have shifted again (Which is usually the issue with index hints - most of the time humans can't keep up with the optimizer in terms of dealing with changing data set)19:17
yoctozeptomordred: optimizer changes too19:17
yoctozeptoheuristic change19:17
yoctozeptodata changes19:18
mordredhttp://eavesdrop.openstack.org/irclogs/%23zuul/%23zuul.2018-09-25.log.html#t2018-09-25T16:25:1019:18
mordredthere's the original conversation19:18
fungiyeah, we're likely running a newer mysql/mariadb in opendev than we were back then19:18
fungiso entirely possible the optimizer is no longer the same19:18
yoctozepto"doing a full table scan to check the tenant"19:19
yoctozeptonot happening today19:20
mordredwell - that's good at least :)19:20
fungiahh, actually we're using a trove instance for the db, so i doubt the version has been updated19:21
yoctozepto"a box of Sakila memberberries"19:21
mordredyoctozepto: I set them right next to my little glass dolphin sculpture19:26
mordredof course, now I can't remember where either of them are19:26
fungiin a storage unit somewhere by the airport19:26
mordredoh right19:26
mordredalthough it turns out "by" in this case is give or take an hour19:27
fungifor atlanta, that's practically next door?19:27
mordredwe opted for a unit in kennesaw, because it was super cheap and near one of my cousins19:27
yoctozeptoah, guys19:27
yoctozeptoback to real life19:27
mordred:)19:27
* mordred disturbs the real work19:27
yoctozeptowhat about reverting that one change19:28
yoctozeptoand observing?19:28
fungiwe would only need to restart zuul-web with that revert applied, right?19:29
fungiif so, doesn't seem terribly risky19:29
mordredwell - I don't think we should revert revert should we? we should only remove the with_hint line?19:30
mordredor have I not paged in enough of the backscroll?19:30
mordredNEVERMIND19:30
mordredthe other bits are just formatting change19:30
fungithat's what reverting 605170 would be, right19:30
mordredyeah19:31
mordredwant me to propose a revert?19:31
fungiwhoever wants to push the button, i'm happy to review19:31
fungithough i was thinking we could just hand-apply it on zuul.opendev.org and restart zuul-web19:33
mordredfungi: that's also a fine choice19:33
mordredfungi: we could do those in parallel even19:33
fungiand see if it has the anticipated performance impact19:33
openstackgerritMonty Taylor proposed zuul/zuul master: Revert "Speed up build list query under mysql"  https://review.opendev.org/67258119:33
yoctozeptobut don't forget to change it permanently later19:33
clarkbya the hand rvert and restart is a common thing we've done when checking things like memory leaks19:34
mordredthat gives us a place to discuss19:34
clarkbI would do it tha way to quickly get results iwthout ending up with revert revert revert revert commits19:34
yoctozeptoyeah, it needs some time for observations19:34
fungiyoctozepto: right, i mean hand-apply the proposed change without approving it straight away19:34
fungistill propose it for review so we have a place to better record our observations19:34
*** hwangbo has quit IRC19:34
fungiwhich mordred has done19:35
openstackgerritMerged zuul/zuul master: Update xterm to >= 3.14.5  https://review.opendev.org/67185819:39
clarkbI'll keep an eye on ^19:39
clarkbwas tested with the check queue results so should e fine19:40
*** hwangbo has joined #zuul19:50
corvusno way19:54
fungino way what?19:54
corvusthe solution to the problem is to go back to when things were even worse?19:54
corvusi sunk several hours of analysis into this, including why that revert would be a bad idea19:55
corvusand i go out for lunch, and come back to find that in my absence, folks have just decided to ignore that?19:55
fungiahh, i misunderstood that was the conclusion19:55
corvus19:31 < mordred> want me to propose a revert?19:56
corvus19:31 < fungi> whoever wants to push the button, i'm happy to review19:56
fungireview by testing it, to see if that made it worse/better19:57
fungi(push revert button generating the revert, not button to approve it)19:58
corvusa year ago, that query took 20 seconds.  why would it be faster now?19:58
corvus50 rows in set (1 min 14.12 sec)19:59
fungiit sounded like the theory was that changes in relative table sizes have caused the optimizer to choose different indices now than they did in october, but i have likely misunderstood19:59
corvusfungi: i tested it ^19:59
corvusthat's going to happen everytime someone goes to the build page http://zuul.openstack.org/builds20:01
fungigot it, so 672581 will cause the builds page default query to take 20s to load on opendev's deployment20:02
corvusfungi: no, it would have taken 20 seconds a year ago, it would take 1 minute 14 seconds now.20:02
clarkb74.12 seconds now looks like20:02
fungiwell, even 20s would not be great, agreed20:03
fungiso yes the revert does seem to not be a solution20:03
fungimordred: yoctozepto: Shrews: ^20:03
corvusit will make one arcane and rarely used query faster at the cost of making the simple query used many times per day extremely slow20:04
Shrewsi just returned from physical therapy, catching up. what's being reverted?20:04
corvusShrews: nothing -- https://review.opendev.org/67258120:05
mordrednothing. corvus already analyzed it and determined it's a terrible idea20:05
Shrewsoh yeah. that wasn't the solution20:06
corvuswith my limited knowledge, i can think of 2 ways to proceed -- 1) repeat the process from a year ago and try to come up with a better set of indexes that works in all cases; or 2) maybe we look into keeping the hint where we aren't filtering by project, etc, but drop it when we are20:07
corvusoption 2 seems kind of hacky, like we're doubling down on the "try to beat the optimizer" bet.  but it also seems like it might be practical.20:08
corvusand i agree with mordred in principle; i would love to not try to beat the optimizer20:08
fungibut by default the optimizer is selecting a less optimal query for the default page view20:09
mordredcorvus: I'm still digesting - but I agree - we're already in beat-the-optimizer land, so I don't think 2 is any WORSE - and if we know in code the difference between filtering by project and not, sending two different queries to the db is a fine choice20:09
mordredlike, it's not uncommon for the answer to hard query optimization to be "put more logic in the app and ask the database different questions" - even though it frequently feels wrong20:10
corvusmordred: okay, i think it'd be great if you continued to digest this (https://etherpad.openstack.org/p/Z0cucbdugf has a bunch of queries you can run on the prod zuul db) 'cause you and Shrews are gonna be more likely to synthenize an option 3 than i am.  but while you work on that, i can try regression testing a bunch of queries against the use/don't-use hint idea and see if that's feasible.20:12
mordredcorvus: yeah. I am now reading the etherpad and enjoying it20:12
corvusalso, if anyone else wants to do this, i'm sure we can put up a copy of the db somewhere; there isn't a whit of sensitive data in it.20:13
Shrewscorvus: i was about to suggest letting me mysqldump the data and play with it locally.20:13
Shrewsof course, now i have to remember how to use mysqldump  :/20:13
corvusShrews: ++20:13
clarkbmysqldump -u foo -p databasename | gzip -9 > foo.sql.gz20:14
clarkbroughly20:14
Shrewshrm, how can i pull that out of the container20:14
corvusShrews: no container on this host20:14
clarkbI don't think it is in a container20:14
Shrewsoh wait20:14
Shrewsyah20:14
Shrewsnm20:14
corvusleast, not yet :)20:15
corvus(but also, ftr, we have figured out how to do that; it's a bit wonky with lots of weird shell quoting; we do that for gitea)20:15
fungiShrews: yeah, the database is i a trove instance, so you need to know the hostname along with the credentials20:16
fungibut can be found in the zuul configs in /etc20:17
mordredcorvus: so - just so I can make sure I'm understanding what's going on from the etherpad ...20:17
mordredthe first query is an example of with the hint but filtering by project20:17
mordredwhich shows the results of 820 * 7093912 rows which == a lot of rows20:18
Shrewsmordred: the query plan changes based on project name (which caused me to suspect cardinality of that data). the one you reference there is the slow plan20:19
corvusmordred: ah, the hint didn't really come into the conversation until really late.  line 114.20:19
mordredwait - both .. yeah20:19
mordredthis is why I'm talking out loud :)20:19
corvusmordred: so, yes that's true for the first query.  but it's also true for the second query.  but then what Shrews said -- the only diff between those 2 queries is the project name.20:19
Shrewsmordred: corvus: anyone else: data dump in my home dir on zuul01 (zuul.sql.gz)20:20
fungithanks Shrews!20:20
corvuscool; if anyone without access to that server wants it, i can put it somewhere public20:21
Shrews5.7.18 MySQL Community Server20:23
Shrewsfor those playing along20:23
Shrewsmordred: we could put everything in ndb20:25
* Shrews waits for hurled wet critters20:25
mordredShrews: I was actually thinking mongo20:26
mordred(but i could totally kill this with NdbRecord)20:26
corvusyeah, this is the kind of data it was made for20:31
Shrewsoh wow. 5.7.18 is really old20:34
mordredyeah. when we start deploying zuul from our container images I TOTALLY want to reassess our db backend20:34
corvusyeah, that's why we're tending toward mariadb containers for new things20:34
*** panda has quit IRC20:35
Shrews5.7.27 is the closest archive download available. will try that20:35
*** panda has joined #zuul20:37
mordredShrews: your user on servers is uncapitalized which confuses me20:45
Shrewskeeps the feds off my trail20:46
* mordred decapitalizes20:46
mordredcorvus: I can't believe I'm about to say this - but we should change this to be a subquery20:54
corvusmordred: feeling feverish?20:55
mordredI know. but rewriting the first query as a subquery returns in 0.03 seconds and looks at 1536 * 5 rows20:55
mordredI'm still experimenting, mind you - so let's consider that an anecdote20:56
mordredfor now20:56
corvusmordred, Shrews: i have some initial results from my regression testing on dropping the hint.  as expected, so far it's only a problem for the queries which have no project or pipeline conditionals.  if you have either of those, it's good.  if you have a project, it's excellent (0 seconds).  if you have a pipeline without a project, it's okay (1-5 seconds).  that makes me wonder if an independent pipeline20:58
corvusindex would additionally be helpful (we only have (project, pipeline) right now)20:58
mordredI doubt it -the pipeline cardinality is going to be super low - it's really only valuable combined with something else20:59
corvusoh that's a good point21:00
mordredoh - let me check pipeline without project in my subquery experiment21:00
corvusi think i should improve this script a little bit to make it more automated, and so we can add in more of the fields we search (job, build, etc), and we can also do apples/apples with mordred's subquery idea21:01
mordred2.8 seconds for pipeline without project. 0.29 seconds for nova21:01
corvusmordred: branch or no?21:02
mordredyeah - those all have branches21:02
mordredor - a list of them21:02
corvusmordred: then that's the same time i got (2.35s)21:02
mordredawesome21:02
corvusmordred: so what's the subquery like with only "zuul_buildset.tenant='openstack'" in the where clause?21:03
mordredchecking21:04
*** pcaruana has quit IRC21:05
mordred8.69 seconds21:05
mordredof course, that produces 7.2 million records21:06
corvuswell, it's better than 74, but not better than 021:07
corvusmordred: throw a 'limit 50' on the end there :)21:07
corvus(that query with the hint and limit 50 is 0.00 seconds)21:07
mordredit's no better with limit because there is no support (at least in 5.7) for limit in a subquery21:08
mordredoh - although I guess it's actually not correct to limit in the subquery in this case21:08
mordredcorvus: it's possible we might actually have a collection of different queries that are each better for different use cases21:09
corvusmordred: well, also i'm wondering if the subquery is producing pretty similar results to the single query but without the hint21:10
mordredit might be - except for the limit 50 case with the hint where that's smoking the subquery21:11
Shrewsare you both testing on the production db? i just remembered sometimes running ANALYZE on a table makes a difference on funky path decisions. might want to give that a whirl21:12
Shrewsjust got my local db loaded21:12
mordredShrews: I am testing on the production db21:12
corvusyep21:12
mordredwhich is a sentence I realize is a crazy sentence to type21:12
Shrewshow very infra-responsible of you two   :J)21:13
corvusit's not *that* important of a db :)21:13
* fungi wonders if mariadb has added support for freudian_analyze21:13
mordredcorvus, Shrews: I put a couple of subquery examples at the bottom21:13
fungier, i should have said psychoanalyze ;)21:14
corvusmordred: is there a case where you think the subquery is better than not?21:14
mordredcorvus: we could totally get around the limit limitation by performing it as 2 queries - one with a limit on the buildset table to produce a list of ids, then a second query with a constructed in list with its own limit21:15
mordredcorvus: I think it's very good on things that filter by project21:15
corvusmordred: but i think the single query without the hint is also good on things that filter by project21:15
mordredit's not awful at things that don't limit by project21:15
mordredyeah21:15
corvusgiven that so far the single query with hint seems to be the only thing that can handle the query with no additional filters, it may not be worth the complexity of adding subquery into the mix if it doesn't have a win over single-query-without-hint21:16
mordredcorvus: what about with the hint without project and with limit - is that still terrible?21:16
mordredagree21:17
*** armstrongs has joined #zuul21:17
corvusmordred: re your question ^ are you asking about pipeline and branch or no?21:17
*** mattw4 has quit IRC21:18
Shrewsmy analyze suggestion does nothing locally, fwiw21:19
corvusbasically: hint: yes, pipeline: no, project: no, branch: no  -->  0.00s  (this is the query for the builds landing page)21:19
corvusbut so far, that's the only thing we'd want to use the hint for21:20
corvusoh, sorry -- if we only have branch, we want to use the hint21:20
mordredyeah, because branch also has super low cardinality21:21
corvusso let me rephrase that: so far, it's looking like "use the hint if we have pipeline or project; otherwise do not use the hint" with a single query is producing excellent results21:21
*** armstrongs has quit IRC21:22
mordredwfm21:24
clarkbconsole log streaming still works in the web ui so the xterm update must be working21:26
mordredclarkb: woot21:28
clarkbmordred: corvus points out we aren't updating the js in production currently21:29
clarkbbecause js tarball has moved21:29
mordredclarkb: oh. yeah. so we have not learned that the xterm update is working21:30
*** jeliu_ has quit IRC21:30
*** mattw4 has joined #zuul21:31
corvusmordred: for timing purposes i would like to eliminate the query cache21:31
corvusmordred: know of a way to do that?21:31
corvuscause i'm starting to get crazy fast times21:32
mordredyes - one sec21:32
mordredcorvus: select SQL_NO_CACHE ...21:32
mordredcorvus: or SET SESSION query_cache_type = OFF;21:33
Shrewscorvus: that's a deprecated modifier now. weird21:38
corvusgood thing we're on an old server21:38
corvusokay, i have automated my regression script; running it now21:38
mordredyeah - that's the 5.7 thing21:38
mordredcool21:38
Shrewscorvus: it should be deprecated even on 5.7 ('show warnings' after your select will output the deprecation notice)21:39
Shrewsstill deprecated in 8.0 so... whatever21:39
corvusmordred, Shrews: http://paste.openstack.org/show/754818/21:39
fungithere's a mysql 8.0? what rock have i been living under?21:40
corvusi omitted the queries with no project or pipeline, since i ran them earlier and we know that without the hint they take forever21:40
corvusthat's measuring execution time in python, so that includes query parsing, fetching data, etc.  it's going to be a little more than what the mysql cli would report21:40
mordredcool21:41
Shrewsfungi: went from 5.7 to 8.0, so you haven't been under a rock too long21:41
fungiahh, okay21:41
fungii see, there were 7.x cluster releases21:41
corvusi'm running my script with the other search terms now (job, build, result, etc)21:52
corvusit's looking like we also don't want to use the hint when we have a change; that's the other thing that's indexed on the buildset table.21:58
corvusso far: if not (project or pipeline or change): use hint; otherwise do not use hint21:58
corvushttp://paste.openstack.org/show/754820/22:01
corvusthat held up for that set of things22:01
mordredcorvus: I'm fascinated that pipeline makes it ok22:02
mordredproject and change make sense to me - they're the things that are the most specific22:02
corvuslet me run that narrowly22:02
mordredcorvus: also - I have meltybrain - can I restate to make sure I'm parsing - if project or pipeline or chage: do not use hint ; else: use hint22:03
corvusmordred: yep -- except i think you are right to be suspicious of pipeline22:04
corvushttp://paste.openstack.org/show/754821/22:04
corvusit looks like we *are* better off using the hint with pipeline; it's just that if we don't use the hint with pipeline, it's not terrible22:05
mordredcorvus: I think it we wanted to generalize, we could do a query at startup something like "select count(distinct(column_name))" to get the number of distinct values - and then define a column threshold over which we use the hint22:05
mordredthat's super weird - we only have l ike 10 pipeline names22:06
mordredbut - empirical data wins22:06
mordredbut for now - I think just hard-coding the logic as you have defined it seems like a fine choice22:06
corvusmordred: yeah, there's a relationship to the indexes here, so it's not *crazy*; just incomprehensible22:06
mordredyeah22:07
* mordred needs to dinner ... I'll check back post dinnering to see if there is further things to be baffled by22:07
corvusproject and change are the most accessible indexes on that table22:07
corvusk, i'll probably check a few more things then write this up22:07
mordredyeah. project and change are the best things to be looking for22:07
corvussorta weird that build isn't faster; we may want to look into that one22:08
corvusoh that's not weird at all22:08
corvuswe don't have an index on build uuid22:08
corvusthat's a big oversight22:08
corvusShrews: are you still around?  can you try some queries for me before and after adding an index?22:09
Shrewscorvus: can be. just a sec22:09
corvusi'm digging up the q's22:09
Shrewsk. ready22:10
corvusShrews: http://paste.openstack.org/show/754822/22:10
corvusShrews: can you run both of those before and after adding an index on zuul_build.uuid ?22:11
Shrewsyup22:11
Shrewscorvus: err, the explains, yes?22:11
Shrewsor do you want the actual data?22:11
corvusShrews: timing22:12
Shrewsoh22:12
corvus(explains could be interesting too)22:12
*** mattw4 has quit IRC22:12
corvusthat's the table with 7m rows, so it may take a minute to build the index22:12
corvusalso, how long does it take to build the index is good info :)22:12
Shrewsbefore index, 1st query is 11.71s, 2nd query is 2.96s22:13
Shrewscorvus: explains for those are at the end of the etherpad. anything else before i add the index22:14
corvusShrews: not that i can think of now22:15
Shrewsok. gimme a sec to recall the create index syntax22:15
*** mattw4 has joined #zuul22:16
Shrewscorvus: index creation data at the end of the etherpad22:18
corvus30s isn't too bad22:19
Shrewscorvus: that uuid does not exist in my data22:19
corvusoh, heh, it's really new22:19
corvusi'm sure i have an old one handy, 1 sec22:19
corvusShrews: try 0f4573fc46934b79bec64438d7d63c7022:20
Shrewsgood news is with the index, both empty result sets returned REALLY quickly22:20
Shrewscorvus: see end of etherpad. the results of the non-index queries will be slightly skewed since there were no results22:23
Shrewsbut you can consider the timings best case i guess?22:23
Shrews0s and 2.78s for the queries with the index22:23
Shrewsi can drop the index and retry them with the existing uuid if you like22:25
*** jeliu_ has joined #zuul22:25
corvusShrews: can i beg you to drop the index and repeat that with a slight modification?  (1 sec and i'll explain)22:26
Shrewsyep22:26
corvuswe have a job name index there that includes the buildset id, so i looked up why and found this: https://review.opendev.org/48161422:27
corvusa "covering index"22:27
corvusi'm thinking that we'd probably want the same thing for a build uuid index -- so can you create it as (uuid, buildset_id) ?22:28
Shrewscreating. you want the queries with the existing uuid i assume corvus?22:29
corvusyeah22:29
Shrewsindex create about the same, 28.6s22:29
Shrewscorvus: 0.00s on the first query, 2.84s on the second22:31
corvusk, about the same22:31
Shrewsyeah22:31
corvusbut theoretically better assuming the 'covering index' thing works like that22:32
corvusthat seems like a Vocabulary Word so i assume mordred is right about that :)22:32
Shrewscorvus: the explain actually looks much better22:32
openstackgerritJeff Liu proposed zuul/zuul-operator master: [WIP] Verify Operator Pod Running  https://review.opendev.org/67039522:32
Shrewscorvus: see end of etherpad again22:33
corvusthx22:33
Shrewsthat actually may have more to do with using an existing uuid22:34
Shrewshrm, nope22:35
Shrews(uuid, buildset_id) looks like a winner22:36
corvuscool, i'll include a schema change with this patch too22:36
*** jeliu_ has quit IRC22:36
corvusnow i'm curious why job_name isn't performing in the same way; maybe it's because the cardinality is too low on something like 'openstack-tox-py27'22:36
corvusi'm trying it with a more rare job22:36
corvusyep, that's it.  if i use 'zuul-operator-functional-k8s' as the job name, it looks like what you got for the build uuid query22:38
corvusso, because the uuid is unique, once we add that index, we're always going to have good results with no hint.22:39
Shrewscorvus: openstack-tox-py27 occurs the most of any other value22:40
corvusbut with the same kind of index for job name, we get better results using the hint with a job with lots of hits like 'openstack-tox-py27', but better results without the hit for a rare job.22:40
Shrewsi placed the top 20 counts in the etherpad22:40
corvusthis is where it would be really nice if the optimizer were making the right choices :)22:41
corvushttp://paste.openstack.org/show/754823/ numbers for those ^22:41
Shrewsonly 29 for zuul-operator-functional-k8s22:42
Shrewscorvus: need anything else? food time here22:43
corvusShrews: nope, thanks!22:43
corvusi'll flip a coin on whether to use the hint for job name, then write up the patch22:43
openstackgerritJames E. Blair proposed zuul/zuul master: Improve SQL query performance in some cases  https://review.opendev.org/67260623:13
corvusShrews, mordred, yoctozepto: ^ whew.  there we go.  thanks for your help there23:14
openstackgerritJames E. Blair proposed zuul/zuul master: Improve SQL query performance in some cases  https://review.opendev.org/67260623:14
openstackgerritJames E. Blair proposed zuul/zuul master: Improve SQL query performance in some cases  https://review.opendev.org/67260623:15
*** michael-beaver has quit IRC23:24
openstackgerritIan Wienand proposed zuul/nodepool master: Enable debug logs for openstack-functional tests  https://review.opendev.org/67241223:28
*** mattw4 has quit IRC23:44

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!