Monday, 2020-03-16

*** tosky has quit IRC00:14
*** jamesmcarthur has joined #zuul00:23
openstackgerritMerged zuul/nodepool master: Stop using hacking for anything  https://review.opendev.org/71278100:42
*** armstrongs has joined #zuul01:22
*** sshnaidm is now known as sshnaidm|afk01:31
*** armstrongs has quit IRC01:31
openstackgerritMerged zuul/nodepool master: Add visual dividers for each image in Dockerfile  https://review.opendev.org/71300201:44
*** jamesmcarthur has quit IRC02:09
*** swest has quit IRC02:10
*** jamesmcarthur has joined #zuul02:13
*** jamesmcarthur has quit IRC02:18
*** jamesmcarthur has joined #zuul02:20
*** swest has joined #zuul02:24
openstackgerritIan Wienand proposed zuul/nodepool master: Add diskimages-globals section for common configuration  https://review.opendev.org/71315702:48
*** jamesmcarthur has quit IRC03:13
openstackgerritIan Wienand proposed zuul/nodepool master: Add diskimages-globals section for common configuration  https://review.opendev.org/71315703:13
*** jamesmcarthur has joined #zuul03:27
*** bhavikdbavishi has joined #zuul03:33
*** jamesmcarthur has quit IRC03:56
*** pleia2 has joined #zuul04:11
*** jamesmcarthur has joined #zuul04:37
openstackgerritIan Wienand proposed zuul/nodepool master: Dockerfile: set SECLEVEL=1 for RAX dfw and ord region compatability  https://review.opendev.org/71316204:47
*** jamesmcarthur has quit IRC04:57
openstackgerritIan Wienand proposed zuul/nodepool master: Dockerfile: set SECLEVEL=1 for RAX dfw and ord region compatability  https://review.opendev.org/71316205:02
*** bolg has joined #zuul05:13
*** evrardjp has quit IRC05:35
*** evrardjp has joined #zuul05:36
*** sanjayu_ has joined #zuul05:42
openstackgerritIan Wienand proposed zuul/nodepool master: Dockerfile: set SECLEVEL=1 for RAX dfw and ord region compatability  https://review.opendev.org/71316206:18
*** irclogbot_3 has quit IRC06:29
*** dpawlik has joined #zuul06:58
*** jamesmcarthur has joined #zuul06:59
*** jamesmcarthur has quit IRC07:03
*** bolg has quit IRC07:25
*** irclogbot_0 has joined #zuul07:30
*** NBorg has joined #zuul08:06
*** dpawlik has quit IRC08:07
*** dpawlik has joined #zuul08:07
*** sugaar has joined #zuul08:11
NBorgHello. Is there a suggested way to use tox (my problem is playbooks-syntax) from zuul-jobs if you use roles from another repo? Modify ANSIBLE_ROLES_PATH in tox.ini?08:17
*** dpawlik has quit IRC08:18
*** bolg has joined #zuul08:22
*** dpawlik has joined #zuul08:28
*** jcapitao has joined #zuul08:32
*** jcapitao has quit IRC08:34
*** jcapitao has joined #zuul08:36
*** bolg has quit IRC08:37
*** bolg has joined #zuul08:38
*** jcapitao has quit IRC08:42
*** jcapitao has joined #zuul08:42
*** jpena|off is now known as jpena08:50
*** tosky has joined #zuul09:07
*** sanjayu_ has quit IRC09:22
*** arxcruz is now known as arxcruz|rover09:34
zbrmorning!09:34
zbrwho can help me merge few bits? https://review.opendev.org/#/c/708642/09:34
*** harrymichal has joined #zuul09:37
AJaegerzbr: I suggest to ask clarkb again, he had some comments and one of his was not addressed...09:39
zbrAJaeger: my impression was that was not an actionable comment.bindep/test-reqs do not change very often, but they can impact testing, so it makes sense to run when modified.09:41
zbrAJaeger: in fact the one I really care about is https://review.opendev.org/#/c/690057/09:44
zbrbecause over the weekend I was unable to fix some broken tox jobs on ansible zuul instance due to lack of this feature09:45
zbrthe centos-7 image had preinstalled ancient tox, and ensure-tox is currently unable to upgrade it. paul was in PTO, so it was impossible to do anything.09:46
zbrprobably he will fix the image today, but is much better to be able to workaround bugs in images.09:47
AJaegerzbr: leave a comment to his one ;)09:49
zbralready did09:49
AJaegergreat09:49
AJaegerzbr: I think one point clarkb wanted to make is that the this is for the test job and only the bindep file in zuul-jobs. Just to double check that you are aware of it09:50
AJaegerzbr: I'll approve tomorrow if clarkb finds no time today09:50
zbri am wondering if it is possible to validate an image before switching it on zuul, so we can avoid putting broken images.09:51
zbri am still hoping to reach a point where a change made to the build image is tested like a normal job and last step is to try to use that image, so we know is not worse than previous one.09:52
AJaegerzbr: not today, I expect this would need major  changes to nodepool and Zuul09:52
zbrthis was my impression too, but we should not ignore it, i seen multiple issues related to this.09:53
zbronce we have this, it will be much easier to update images (more confidence)09:53
*** saneax has joined #zuul10:01
*** avass has joined #zuul10:03
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Replace occurences of '/' in nodename to something more path friendly  https://review.opendev.org/71318210:08
*** sshnaidm|afk is now known as sshnaidm10:08
avassSoooo, someone just gave a node in nodeset the name something like 'abc/123'. And it looks like that's completely valid, except it causes some problem with the valdiate-host role10:09
avassAnd I can see how names like that could cause a lot of problems in other ways10:09
*** hashar has joined #zuul10:10
avassBut since it's valid in both zuul and ansible I think it should be fixed10:11
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Replace occurences of '/' in nodename to something more path friendly  https://review.opendev.org/71318210:16
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Replace occurences of '/' in nodename to something more path friendly  https://review.opendev.org/71318210:18
openstackgerritDaniel Pawlik proposed zuul/zuul-jobs master: DNM - checking new images  https://review.opendev.org/71318310:22
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Replace occurences of '/' in nodename to something more path friendly  https://review.opendev.org/71318210:24
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Replace occurences of '/' in nodename to something more path friendly  https://review.opendev.org/71318210:28
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Replace occurences of '/' in nodename to something more path friendly  https://review.opendev.org/71318210:30
*** hashar is now known as hasharLunch11:09
*** rlandy has joined #zuul11:30
*** Defolos has quit IRC11:39
*** Defolos has joined #zuul11:44
*** avass has quit IRC11:50
*** avass has joined #zuul11:58
*** jcapitao is now known as jcapitao_lunch12:00
*** harrymichal has quit IRC12:05
*** harrymichal has joined #zuul12:06
*** hasharLunch has quit IRC12:15
*** pabelanger has joined #zuul12:37
*** NBorg has quit IRC12:40
*** bhavikdbavishi has quit IRC12:48
*** jpena is now known as jpena|lunch12:59
*** jcapitao_lunch is now known as jcapitao13:08
*** hasharLunch has joined #zuul13:22
*** hasharLunch is now known as hashar13:23
openstackgerritMonty Taylor proposed zuul/nodepool master: Revert "Dockerfile: add support for arbritary uid"  https://review.opendev.org/71306013:26
openstackgerritMonty Taylor proposed zuul/zuul master: Remove fix-tox workaround for python3.5  https://review.opendev.org/71249513:27
openstackgerritMonty Taylor proposed zuul/zuul master: Remove duplicate variables for tox jobs  https://review.opendev.org/71249613:27
*** dpawlik has quit IRC13:29
*** avass has quit IRC13:29
*** Defolos has quit IRC13:36
*** NBorg has joined #zuul13:36
*** Defolos has joined #zuul13:38
*** erbarr has joined #zuul13:38
*** jamesmcarthur has joined #zuul13:42
*** jpena|lunch is now known as jpena13:50
*** zxiiro has joined #zuul13:54
openstackgerritMerged zuul/nodepool master: Stop comparing hostnames to determine ownership  https://review.opendev.org/71305713:58
tristanCzuul-maint: a small stack of 3 changes for the zuul-operator is waiting for review here: https://review.opendev.org/713138 . Those are little things i'm basing on a wip integration tests13:59
corvustristanC: where's the wip integration tests?14:02
tristanCwell it's still a wip, i meant my wip is on top of the stack14:03
corvusok that clears it up :)14:03
mordredcorvus: that's a fun issue from avass there in scrollback14:04
tristanCi basically have a role to setup static node provider and i'm loooking forward triggering jobs and testing the different parts like builds result, console stream, etc...14:04
corvusmordred: yeah -- i wonder what "valid in ansible" means?14:04
corvustristanC: you can borrow a bunch of stuff from the quickstart for that14:05
mordredcorvus: that's the second time in a week I've seen someone use a character in a zuul name that I'd never considered using - but which there's no specific reason not to use (mnaser used a : in a job name and my brain when "pow")14:05
mordredcorvus: maybe you can put a / in an inventory hostname since it's a valid ansible char?14:05
corvusmordred: right, but it's not a valid dns name14:05
mnaserya i kinda like ':' esp because it helps splitting different smaller 'slugs'14:06
mordredcorvus: yah - but ansible inventories don't have anything to do with dns14:06
corvusmordred: ah, then 713182 is probably the way to go.  i wonder if the prepare artifact role is okay?14:07
mordredcorvus: good question14:08
mordredcorvus: I have verified14:08
mordredcorvus: foo/bar ansible_host=mordred-irc.inaugust.com14:08
mordredin a hosts file for ansible totally works with foo/bar being the ansible name of that hose14:08
*** plaurin has joined #zuul14:08
mordredhost14:08
plaurinGood morning irc people!14:09
* tristanC waves14:09
mnasero/14:09
*** Goneri has joined #zuul14:09
plaurinzuul chugging nicely with happy developers since kubernetes stream output is working :)14:10
corvusmordred: qq on 71306014:10
corvusplaurin: \o/14:10
mordredplaurin: \o/14:11
plaurinI was wondering are there any plans for more interactions with the zuul ui? like triggering or killing jobs14:11
mordredcorvus: I don't think the sudo setting works if we don't add a nodepool user14:11
corvusmordred: this is a good point14:12
corvusmordred: we actually do create a zuul user in the zuul containers14:13
corvusmordred: so maybe we should leave that part and just revert out the uid entrypoint bit?14:13
tristanCplaurin: there are plans for that indeed, for killing or enqueuing jobs there is https://review.opendev.org/#/c/643536/14:14
plaurinautohold / dequeue / enqueue, nice14:15
plaurinuse those quite a lot14:16
tristanCplaurin: and you already reviewed https://review.opendev.org/635716 which adds manual trigger, but I think we need a proper spec for such use-case14:16
*** Goneri has quit IRC14:16
plaurinya I remember that one14:16
plaurinthose upcoming features will make a big difference14:17
plaurinfor me and my team anyways :)14:18
*** Goneri has joined #zuul14:19
tristanCplaurin: fwiw, we also investigated a service to convert arbritary event like AMQP message to a project pipeline config with this project: https://pagure.io/software-factory/zuul-gateway14:20
corvusplaurin: zuul is primarily intended to integrate with code review systems so that developers mostly interact with it there (rather than with its own ui).  what are the use cases that cause you to want to manually trigger and kill jobs?14:22
mordredcorvus: ok. good point. I'll update it14:23
mordredcorvus: I might need your eyeballs on a container job14:23
mordredcorvus: https://zuul.opendev.org/t/openstack/build/565c64362194492fa0c603c3396d053c/console14:23
mordredcorvus: it uploaded the image but failed uploading the tag14:24
mordredwith an access denied14:24
corvusmordred: that sounds like an internet hiccup14:24
mordredyeah? awesome14:25
mordredshould I just reapprove the change do you think?14:25
corvusmordred: yeah14:25
corvusmordred: and i guess we should loop that?14:25
mordredcorvus: yeah - probably so14:26
mordredcorvus: if reapprove works, let's add a loop14:26
mordredif not - we can investigate more14:26
*** harrymichal has quit IRC14:26
plaurinWe're doing UUT testing (units under test), BBD development. We run hardware tests and firmware upgrade/downgrade tests on actual hardware. Reality is that hardware and custom firmware crashes machines a lot, making many of our zuul job wait for unit connections, etc... We of course need proper failure detection but this is complex in the case of hardware and firmware issues and we do get a lot of 'stuck' scripts for long running job14:26
plaurins (4h+ long jobs).14:26
plaurinhappens a lot that we want to stop something instead of waiting for 4+ hours for it to finish14:27
mordredso thus why you'd like to be able to manually cancel a job?14:27
plaurinand then trigger it again after we fix hardware14:27
openstackgerritMerged zuul/zuul master: Declare support for Python3.8  https://review.opendev.org/71248914:27
plaurinyeah, I use the zuul dequeue and enqueue quite a lot14:27
corvusplaurin: which code review system do you use?  (gerrit, github, ...?)14:28
plauringerrit14:28
plaurinwe also use a lot of cron ' timer ' zuul jobs, running qualification tests 24h14:29
plaurinwe have to kill those from time to time after hardware failure, stress tests, etc14:29
corvusplaurin: i do not recommend it now, but in the future, we should have good support for the checks plugin which has a 're-run' button right in gerrit.  so that could help with that part of it.  mhu's work would still be the best answer for the dequeue part of it.14:29
mordredyeah - in the mean time - we use a "recheck" keyword in comments in gerrit to allow re-triggering jobs14:30
plaurinyeah we use recheck a lot too when we can. I set recheck and rebuild14:30
mordredcool. so yeah - mhu's work is your answer for the other thing14:30
corvusplaurin: thanks for the info, it's good to hear how it's being used :)14:31
plaurinsure np :)14:31
mordredoh - there's another trick you can use14:31
mordredplaurin: if you abandon the change, zuul should cancel its jobs14:32
mordredand then kick them off again when you restore it14:32
plaurinI have a stuck job right now lol let me try this14:32
corvusor if you upload a new patchset (you could just change the commit message right in gerrit), it will cancel and restart14:32
plaurinyes new patchset I am aware of14:33
plaurinha, abandon trick works cool14:34
corvusplaurin: what version of gerrit are you running?14:34
corvusmordred, plaurin: i just thought of a variant of the abandon trick: if we add support for gerrit tags to pipeline requirements (which we should anyway) you could have a pipeline which "rejects: tag: suppress-ci" and you could tag a change with that and it would cause zuul to cancel jobs and refuse to enqueue it.14:36
plaurin2.16.1314:37
plaurinyes that would be nice for the tag14:37
corvus(ftr, i don't think any of this is a better ux than mhu's web dequeue; these are just workarounds until that's ready)14:37
plaurin+114:38
fungifrom a "jobs are stuck and have to wait for them to time out or abort manually" perspective, how about a watchdog? maybe an option for a job which says it should be prematurely stopped if the console stream goes silent for x minutes, or something like that14:41
corvusclarkb, mnaser: you may be interested in giving https://review.opendev.org/710649 and children a quick look -- dhall updates to the operator14:42
plaurinfungi we have couple of BBD hardware stress test cases that can be waiting silently for 1 to 2 hours, like a CPU stress test for example14:44
fungiyeah, just wondering how you normally identify that it's not going to complete which leads you to end it earlier, and whether that's something which could signal zuul to do the same instead14:45
plaurinI depend a lot on devs designing proper failure cases and detection :(14:46
plaurinYou bet we had a lot of "It works on my machine but not on zuul, so zuul is broken' (...lol)14:46
fungiso then... what leads you to stop a job? you said "we do get a lot of 'stuck' scripts for long running jobs ... happens a lot that we want to stop something instead of waiting"14:47
fungihow do you make that determination?14:47
plaurinIt's kinda hard to know and usually unexpected stuff. Many times it's the devs who ask me to stop them because they see something not going well or taking more time than usual14:48
plaurinmost of my job have 8 hours timeout14:48
fungiahh, okay, just curious if it's a general set of cases which zuul could identify or get notification of automatically14:48
plaurinalso their test builtin timeout are badly designed but I have no control over this hehe :(14:48
plaurinI mean their test timeout built in the code14:49
corvusand if there's some way to detect it (by probing or other inspection of the test resource), maybe we could have ansible do that, and tell zuul the job has failed14:50
plaurinimho inspection of the status of the test resource should be part of the dev's BDD14:50
plaurinthey have so many tests that kills and re-enable the network layer that would be hard to follow14:51
plaurinsoo many unexpected variables hehe14:51
plaurinby having trigger 'buttons' I can react quickly for them manually14:52
plaurinI guess I could leverage the zuul api too14:52
Shrewsmordred: what's driving the nodepool reverts?14:56
plaurina button to "disable" a job would also be useful for me. For my 24/7 cron jobs. Sometime I have to suspend them for hardware maintenance like bricked machines, etc14:57
mordredShrews: we reverted the similar code in zuul already because it turns out it's not actually providing the value it purports to - but more importantly it actually broke things in the nodepool-builder image when running under podman14:58
corvusShrews: see https://github.com/RHsyseng/container-rhel-examples/issues/103 for more info14:59
Shrews<sung to batman theme> nuh nuh, nuh nuh, nuh nuh, nuh nuh, podman!15:00
Shrewsyou're welcome for that15:00
openstackgerritMonty Taylor proposed zuul/nodepool master: Revert abitrary uid support  https://review.opendev.org/71306015:00
Shrewscorvus: oh, yeah. that's not good15:02
fungiholy security holes, podman! now i have the theme stuck in my head, thanks15:02
Shrews>:)15:03
fungi*pow*15:03
clarkbcorvus: left a response to one of your comments on https://review.opendev.org/#/c/713157/2 if you have a minute15:14
clarkbmordred: and left a thought on https://review.opendev.org/#/c/713162/3 in response to one of your ideas. I doubt that will work though15:15
clarkbah yup fungi has responded with similar doubt15:16
corvusclarkb: wait, elements are not ordered?15:16
clarkbcorvus: not on the command line. They are internally ordered15:16
corvushuh, how about that15:16
mnaseri don't want to be a pain but i think maybe we should have opendev/nodepool and zuul/nodepool if we're going to have provider-specific hacks/workarounds15:20
corvusmnaser: regarding?15:20
mnasercorvus: like the /etc/ssl/openssl.cnf change in the zuul/nodepool change discussed above15:20
clarkbmnaser: fwiw that ssl change is in line with other distros15:21
clarkbdebian is being weird there15:21
clarkboh unless this is the new one15:21
clarkbfor the new one ya, this gets weird which is why I was trying to brainstorm config based fixes15:21
clarkb(I think monty was tring to do that too)15:21
mnaser"Note that CAs have stopped issuing certificates that didn't meet those requirements in January 2015, and since January 2017 all valid CA certificates should meet those requirements." -- i'm curious how we'd run into that .. unless they have a really really old cert15:22
*** Goneri has quit IRC15:22
clarkbmnaser: its easy enough to check with openssl s_client /me looks15:23
mnaserthey both seem recent15:23
clarkband are issued by digicert but sure enough doing sha115:24
mordredmnaser: I agree - I'd rather avoid putting ssl workarounds for a provider into the image and would much prefer figuring out a way to have it be in the provider-specific config15:24
fungiyeah, i think step 1 is to try interacting with rackspace via osc/sdk from a recent debian and see if there's a way to make that work without editing /etc/ssl/openssl.cnf15:25
clarkbok its the peer signing digest not the cert itself15:25
clarkbthis is what is used during the handshake15:25
clarkbwhich is why they can have a recent cert that isn't rejected, but still fail because sha115:25
corvusmnaser: in principle, i agree; but i also don't think "use the default upstream openssl.cnf" file is too out there; if this can happen in sdk, that would be best, but if not, this seems low risk/cost.15:26
fungias a reminder, this is what debian's doing to the upstream openssl.cnf when packaging it for redistribution: https://salsa.debian.org/debian/openssl/-/blob/debian/unstable/debian/patches/Set-systemwide-default-settings-for-libssl-users.patch15:27
clarkb"In TLS 1.2 the client can send which digest algorithms are supported for the handshake. If the client does not offer SHA-1 the server should not use SHA-1, which means that it might use other algorithms offered by the client or that the connection fails."15:27
clarkbfrom stackexchange so unsure of how accurate that is, but maybe we can force the remote to use sha2 by not claiming to accept sha115:28
*** sshnaidm is now known as sshnaidm|afk15:28
* clarkb trying to see how to do that with s_client now15:28
fungii doubt the handshake is claiming to accept sha1, but set_ciphers() should allow us to remove it if so15:29
*** Goneri has joined #zuul15:29
mordredcorvus: well - maybe the answer might want to be "use the upstream openssl conf file" instead of "sed a particular setting"15:32
clarkb-sigalgs SHA256 is apparently not correct15:32
corvusmordred: indeed15:34
clarkbah I need algorithm+hash15:34
clarkb`openssl s_client -host dfw.images.api.rackspacecloud.com -port 443 -sigalgs RSA+SHA256` continues to use sha115:34
tristanCianw: about https://review.opendev.org/713157 , have you thought about using dhall to manage your nodepool.yaml configuration? we are giving it a try and working on dhall-nodepool binding: https://tree.taiga.io/project/morucci-software-factory/us/337815:36
clarkbtristanC: I'm not sue I could +2 such a change in opendev15:36
corvusyes please, please keep dhall away from users15:36
corvusi still don't understand the operator15:37
corvustristanC: if there is a problem with the zuul or nodepool configuration language, maybe we could work together on making it better?15:37
corvusrather that replacing it with something that's very hard for users to understand?15:38
corvus(i thought with our discussions on the operator we were in agreement that it was worth exploring for a system like the operator where programmatic manipulation of the config files was necessary, but that it was not suitable for direct use by end-users, which is what we're talking about here)15:39
clarkbdoes openssl itself disable sha1 peer signatures by default?15:39
mnaserclarkb: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=912759 i think the same thing is happening here15:39
openstackDebian bug 912759 in openssl ""wrong signature type" with working websites" [Important,Open]15:39
mnaserwhich links to https://github.com/openssl/openssl/issues/712615:40
clarkbmnaser: probably. Though I'm trying to gauge risk and if debian is being over cautious here15:40
clarkbif openssl has decided this isn't valid we've got more leverage with the server side too15:42
clarkbbecause they'll break pretty broadly15:42
tristanCcorvus: iiuc, either we add abstraction in the runtime like nodepool and zuul to express complex things, like `use this env-vars on all images because it is declared in this new diskimages-globals option`, or either we keep the configuration simple (or even simpler as we could remove zuul project-template too) by moving the abstraction in the configuration language directly15:42
tristanCcorvus: i am happy with either solution, though i prefer the later as it works for any runtimes15:43
clarkbmnaser: et al https://www.openssl.org/docs/man1.1.1/man3/SSL_CTX_set_security_level.html seems to say that seclevel 1 is the default15:45
tristanCand both solution are not incompatible, for the issue ianw was having about setting the zuul public key on all images, that something you can do locally in opendev system-config using a dhall-to-yaml step15:46
mordredclarkb, mnaser, corvus: reading the above it seems to me like debian is pushing a bit too aggressively here with their config file changes. why don't we just install the upstream openssl conf?15:46
corvustristanC: i don't think the other zuul maintainers are in favor of the second option, so let's do what we've been doing for the past few years: slowly and carefully evolve the config syntax as needed.15:46
clarkband even specifies that higher than 1 is likely to cause many problems15:46
mordredyeah15:46
clarkbmordred: ya I think that is what I'm leaning towards especialyl since openssl explicitly warns against doing what debian does15:46
mordredyeah15:46
mordredshould we just do it in the python-base image?15:46
clarkbnote master says the same thing https://www.openssl.org/docs/manmaster/man3/SSL_CTX_set_security_level.html15:47
clarkbmordred: thats likely to reduce future debugging headaches as more things land on that image15:47
clarkb(though we may be able to get away with stricter rules in specific cases)15:48
corvustristanC: note that we learned a lot of things from the zuul syntax that we could probably take advantage of if we reworked nodepool; we never made as much of a clean break with nodepool as we did for zuul in v315:48
mnaserI’m indifferent about sed or replacing the whole thing after doing this small tad of research15:48
mordredany thoughts on the best way to do that? should we just download a current copy of the upstream file into system-config? or do a build-time curl?15:48
mnaserIt seems like the right approach however imho15:48
clarkbmnaser: I would vendor15:48
clarkber mordred ^15:49
mordredclarkb: kk15:49
mordredanybody got a link to the upstream file handy?15:49
tristanCcorvus: well for nodepool, the issue we are trying to solve with dhall is assigning labels and diskimages accross different provider and launcher15:49
zbrhow do I query zuul about all builds that run in the last 7 days? apparently the api has a count limit but no way to sort the results15:49
zbrand clearly order is neither start or end time15:49
*** jcapitao is now known as jcapitao_lunch15:50
corvustristanC: that sounds like something that could drive a spec for a new config file format.15:50
*** jcapitao_lunch is now known as jcapitao_afk15:50
clarkbmordred: https://github.com/openssl/openssl/blob/master/apps/openssl.cnf I think that is it15:51
*** bolg has quit IRC15:51
clarkbI don't know if that will work as pure drop in though (possible paths need changing/)15:51
*** bolg has joined #zuul15:51
tristanCcorvus: iirc, mordred suggested that nodepool configuration could be managed like zuul, e.g. in config projects.15:51
mordredclarkb: maybe we should just apply a reverse of the patch fungi linked15:52
mordredclarkb: actually. why don't I move this convo to opendev15:53
tristanCbut i guess before that, a much needed improvement would be to have a single nodepool configuration that could be used by many launcher and builder service15:53
clarkbtristanC: isn't that how it works already?15:53
clarkbI guess not with launchers15:53
*** bolg has quit IRC15:53
clarkb(builders kinda need to be set up that way as we discovered recently though)15:54
corvustristanC: note that one of the main tasks of zuul-operator is supposed to be to take the nodepool launcher config and split it so there is one pod for each launcher; that's not implemented yet, is it?15:54
corvusclarkb: not really15:54
tristanCcorvus: right, the zuul-operator could do that, but wouldn't it better if nodepool support it natively?15:54
corvusclarkb: builders are intended to have the same config and act as a cluster; it's only when you split architectures do they need different configs.15:54
tristanC(and it's not implemented yet, i don't think the launcher service is even activated yet)15:54
corvustristanC: well, the operator has to decide how many launcher pods to run anyway15:54
corvus(or, someone could propose a new nodepool algorithm that lets launchers act as a cluster)15:55
corvustristanC: let's see if we can wrap up the work on zk auth/tls?15:58
corvustristanC: it looks like my zuul + quickstart patches are failing tests, i'll fix them up.  but they should generally be working.  you suggested that you might want to just drop the sasl support and just use x509 for authn/z.  do you still want to do that?15:59
tristanCcorvus: i think that cluster algorithm would help a lot. then back to the initial new diskimages-globals option, not sure if it's ok to revisit now, but i would suggest to add a more generic parent inheritence system similar to what zuul do with jobs15:59
corvustristanC: i agree15:59
corvusabout both15:59
corvustristanC: i don't think there's any rush on diskimages-globals, feel free to -1 that for more design15:59
tristanCalright, let me drop a comment and then i'll look into zk auth16:00
corvus(clearly, we can just keep copying dicts around until we're happy with something better)16:00
corvusclarkb, mnaser, fungi, mordred, SpamapS (if you're around): what do you think about using tls certs (signed by a private ca) for zookeeper (as authn and authz) and omitting the sasl password support?16:03
clarkbcorvus: that should be fine16:03
fungiseems light it ought to work16:04
fungis/light/like/16:04
*** saneax has quit IRC16:04
mordredseems fine to me16:04
fungihow easy is it to specify a ca to trust in zk?16:05
fungido you have to alter the trust set in the jvm?16:05
tristanCcorvus: assuming tls can prevent anon connection, then i'm happy to abandon the sasl support16:05
mordredfungi: well, we have to do that already for tls connections for zk anyway - so basically with tls support we've done all the hard bits of config - so I think the thought is we might as well just use that for auth?16:06
mnasercorvus: from someone who waves around the world k8s a lot, i'd probably prefer that cause i can use something like cert manager to create a ca for the deployment and then manage all the certs distribution16:06
*** bhavikdbavishi has joined #zuul16:08
corvustristanC: btw, zk acls also support x509, so if we wanted, we could keep the default_acl stuff; we wouldn't have to require it, but it could be an option...16:08
corvusmnaser: yeah, tristanC started looking at cert-manager16:09
corvusmnaser: i wrote a shell script that does the equivalent for non-cert-manager environments.  i think cert-manager supports everything the shell script does, so it's hopefully mostly a matter of "porting" it.16:09
*** hashar has quit IRC16:12
tristanCcorvus: mnaser: i had a quick look and i couldn't find how it could be deployed in a the same namespace as the zuul-operator (since i didn't want to bother with a cluster wide configuration for our self-signed cert only used by control-plane)16:15
tristanCand i wonder if cert-manager is really worth the trouble for zk and gearman tls, since they are self-signed16:16
corvustristanC: cert-manager defines CRDs, so it needs to be installed by an admin in the cluster anyway.  is that a problem?16:18
*** harrymichal has joined #zuul16:21
SpamapScorvus: +1, it's the better option. TBH, I'm also starting to wonder if Zuul 6 should be "the etcd3 migration". ZK seems to be losing its community rapidly. :-/16:22
fungiup side is that zk server certs don't need to be self-signed if we're already setting up a ca to sign the peer certs for auth16:22
fungiwe could sign the server certs with the same ca and then peers have the option of validating the server certs16:23
corvusSpamapS: ack; that'll be a good time to re-evaluate16:23
corvusfungi: yeah, all of these are actually "private CA-signed"; i think tristanC mistyped16:25
corvus(nothing is self-signed, zk validates the signature of every cert, server-side and client-side)16:25
fungioh, okay, thanks for clarifying16:26
fungiand yes, i have nothing against self-signed server certs (provided appropriate precautions are taken to establish the server's identity some other way), but if there's already a private ca involved then there's little reason for self-signing16:27
tristanCcorvus: you don't need to be admin to defines CRD, the zuul-operator can (and is) installed locally without being cluster admin16:28
mnasercorvus, tristanC: i mean in my case, because of zuul-helm, i can probably add support to generate a ca/etc and point users at "make sure cert-manager is installed"16:29
mnaserso it just lets you use the crds opt-in16:29
tristanCfungi: corvus: i meant the ca is self-signed. or perhaps i don't actually understand tls :)16:29
*** jamesmcarthur has quit IRC16:30
*** harrymichal has quit IRC16:30
tristanCmnaser: i think that should work yes. i don't know how that would be expressed in zuul-helm though16:31
*** jamesmcarthur has joined #zuul16:31
corvustristanC: yes, the ca is self-signed; the ca signs the certs; the server and client trust the (self-signed) ca.16:31
tristanCcorvus: in that case, i don't think we need any cluster wide cert-manager right? the (self-signed) ca would only be used by zuul services16:31
tristanCused internally*16:32
corvustristanC: yeah, it's not about that16:32
corvusdidn't we talk about this on friday?16:32
tristanCi think so, then i looked at using https://cert-manager.io/docs/configuration/selfsigned/16:32
corvusyes i looked at that too, that's how you make the ca.16:33
corvusi don't understand the resistance to using cert-manager?16:33
tristanCbut when installing cert-manager, it created a new namespace and required cluster wide things that didn't make sense to me for zuul ca16:33
fungitristanC: right, any ca which is the root of trust may as well sign itself (though they can be cross-signed too). the point of the ca cert is that you can implicitly trust it through out of band distribution16:33
fungiso it doesn't ultimately matter where its signature came from16:33
fungias opposed to independently distributing a bunch of self-signed server and even client certs16:34
tristanCcorvus: the resistance for me is that it's an extra requirements, and it seems like it requires cluster admin privilege.16:35
fungia ca gives you the ability to only have to solve distributing one cert16:35
corvustristanC: i guess i don't see those as objectionable.  being able to use the standard k8s way of managing a CA seems worth it to me.  it's one extra line in the installation step to install it.16:38
*** jamesmcarthur has quit IRC16:41
*** jamesmcarthur has joined #zuul16:42
corvusmnaser: you seem to think cert-manager is worthwhile, what would you recommend?16:42
mnasercorvus: i mean from a zuul perspective, it shouldn't really care what you end up doing.  from an operator/helm perspective, i would think it would be nice to have something like certSource: cert-manager or something along those lines16:43
mnaserand then that way it can generate a ca and certs off of it16:43
mnaserand the neat thing is they all end up in k8s secrets which you can just mount into the container/env variables/whatever way you end up manipulating it16:44
mnaserprobably just mount it in /etc/zuul/tls and point configs towards it16:44
tristanCcorvus: here is a cert-manager installation log http://paste.openstack.org/show/790745/ , used memory went from 948MB to 1.1GB, or about 15% increase in usage. i mean that's not too bad, but still...16:45
corvusi feel like if memory efficiency is a concern, maybe k8s isn't the right system.16:45
fungiany idea what's using the extra memory? a ca shouldn't need to be a persistent service unless you're also providing a software hsm to automatically handle signing requests or something16:46
fungis/software hsm/virtual hsm/16:46
corvusfungi: there's a daemon to watch for cert requests and process them16:46
corvus==pod16:46
mnaseralso16:47
mnasercainjector and webhook is running in there16:47
mnasernot needed for this specific use case16:47
clarkbfungi: think of it like a whole new api service that updates a ca16:47
mnaserits also nice because it can talk to ACME issuers or whatnot16:47
mnaseror something even like vault16:48
mordredyeah - I mean - k8s isn't exactly the system you use to run the minimum number of daemons16:48
fungiyep, got it. so basically acting like an hsm16:48
funginot just distributing a static ca cert16:48
mordredit's a system you run to remove the cost of running the daemons you need so that you can reuse work from other people without worrying TOO much about things like "oh, but then we'd need to run another service"16:49
tristanCcorvus: mordred: well i do care about memory efficienty. but that's not very relevant here, unfortunately.16:49
corvustristanC: i think mnaser has outlined why using cert-manager would be valuable.  if you think that reducing the dependency set is more valuable than that, then i guess we could just handle it internally (or support both).16:50
mnaseri mean again i dont think this should be a hard dependency16:50
mnaserbut its a "nice to have if i already have cert-manager deployed"16:50
corvusi just want to point out, i'm the one saying "we should use the k8s thing" and tristanC is saying "we should do it ourselves" :)16:51
mnaserwhich many clusters will end up having anyways16:51
corvusmnaser: that sounds like a vote for "handle it internally (or support both)"16:51
corvustristanC: so if you want to handle it internally, where will the CA live?  a persistent volume for the operator pod?16:52
fungiyeah, it's a good point, the overhead from adding the kubernetes certmanager is only overhead if you don't already need it for something else16:52
mordredI could see support both - although I could also see just doing cert-manager for a start and then adding "handle it ourselves" as a followup16:52
mnaseri haven't followed the tls stuff lately, is it "tls only" or "tls if you want it"16:52
mordredlike - my main worry with "handle it ourselves" is that we're then reimplementing cert-manager16:52
tristanCi meant that i'm happy to look into cert-manager, but i worry it's oversized for what we need. also, isn't the jks still an experimental feature that was released 11 day ago?16:52
mnaserbecause in that case i would be on the side of "you only get tls if you have cert-manager in your cluster .. or you provide your own certs"16:52
mnaserthe pattern used in things like ingress controllers is that you point towards a tls secret16:53
tristanCmordred: isn't zk-ca.sh already a cert-manager reimplementation?16:53
mnaserand then the cert-manager would automatically generate certs for that16:53
corvusmnaser: i think by zuul v4 we're going to want to require authentication, and it looks like tls may be the way to do that, so probably zuulv4 will require tls16:54
corvusit will be optional in v316:54
*** hashar has joined #zuul16:54
mnaserif it's required, then i think the zuul operator should accept a tls field which points towards a secret in k8s that has a ca/cert/key16:54
tristanCcorvus: when adding or removing zk service, shouldn't the cluster be restarted anyway?16:55
mnaserhow that tls secret is generated is up to the user16:55
corvustristanC: no, latest zk can do that live16:55
tristanCmnaser: i disagree, why the user should have to worry about genearting zk tls?16:55
mnaserwell, because they might want to copy paste it from some really complicated internal CA they use16:56
mordredI mean - cert-manager seems fairly common across k8s installs these days, right mnaser?16:56
tristanCbeside it's not such a simple process, you need a java keystore thingy that needs to be base64 encoded16:56
mnaseralso, the ingress is a good example of this16:56
mnaserit doesnt generate the certs, it points towards a tls secret16:56
mnaserhow that secret is generated can either be the user (using cert-manager or manually creating the secret)16:56
mnaserthat way you're covering all your bases16:57
mnasera secret that's of type tls is very standard16:57
fungithere will absolutely be deployments in organizations where they need to supply their own certificates and trust without depending on any integrated ca/cert management automation, due to regulations compliance16:57
tristanCmnaser: to that token, we should also drop gearman tls generation from the operator, and that makes using zuul crd a bit more complicated as you need to provide more and more configuration files16:57
mordredso - I think we might be talking across each other here16:58
fungilikely16:59
mordredfor the purposes of intra-zuul communication we need certs signed by a CA - and that CA should _not_ be a CA shared with other things (like some corporate CA) because it's being used for client-auth16:59
mordredone of the ways we can accomplish this in teh zuul operator is to have the zuul operator use cert-manager to do the work, which will generate the k8s secrets which it will then use16:59
corvusauthz even16:59
mordredone of the ways is for the operator to use something like zk-ca.sh to generate k8s secrets which could be used by the operator to config things17:00
fungiright, while it *shouldn't* be shared with a general corporate ca, it may need its dedicated ca managed by some outside system to be able to satisfy compliance requirements17:00
mordredand one of teh ways is provide config settings that the user can use to specify a set of k8s secrets that user would like to be used for the purpose and have the operator not generate them17:01
corvusfungi, mordred: (this case is why it may still be useful to keep the acl support in zuul+nodepool -- in that case one could use a shared corporate CA, but set acls so that only the zuul client certs have authz)17:01
mordredas this is for intra-service comm - there are almost no use cases where the third is desirable, so I think for now let's just ignore it17:01
fungiit may not make sense, but there are some industry regulations which demand certain properties for any certificate authority operated within the organization17:01
mordred(we can always add config overrides later if we find a use case)17:01
mordredbut I thnik the main thing we're discussing now is: "should the zuul operator use cert-manager to generate, or zk-ca.sh to generate - or both"17:02
fungibut yes, so long as we still leave an out to supply the certificates on your own manually, i think that's covered17:02
tristanCoh right, having an optional, user-supplied, secret for tls files wfm. though it's not specified in the zuul-operator spec17:03
mordredtristanC: so you'd be happy having the zuul-operator use cert-manager if one is there and no secrets are specified and not having a different auto-generated method?17:04
fungifor the users who aren't hamstrung by draconian regulations compliance requirements, i think whatever provides the most seamless and hand-off solution while maintaining reasonable levels of security is preferred17:04
fungieven if it drags in a dependency and consumes a bit more system resources in the process17:05
tristanCmordred: that sounds like an implementation detail. in one case we need to require a new cert-manager dependencies, in the other we use the existing zk-ca.sh. i picked the later because i plan to use it outside of kubernetes too17:05
corvusit does sound like it's 3 options: 1) manual 2) cert-manager 3) zk-ca.sh;  all of them would produce a secret with cert/key/ca files which will be mounted into the pods and specified in the service configs.  they just produce those secrets different ways.17:06
fungi1 covers all cases but is not very happymaking for people who just want it to work and don't care how the certs are managed. 2 seems good for the second case17:08
fungii guess 3 would be as well, but then if someone already has kubernetes cert-manager for other reasons they've effectively got two different ca managers going in the environment17:08
corvus3) zk-ca.sh is the simplest case -- it handles everything behind the scenes with no extra dependencies;  2) requires more of the cluster administrator, but may integrate better with existing k8s practices;  1) is the option for people who need it for $reasons.17:09
tristanCcorvus: which could be expressed in the CRD input... for each tls configuration, we could have an `Optional <UserSecret | use-cert-manager>`, and if that's not set, we generate the cert locally using zk-ca17:09
corvusi'm pretty sure if we don't start with cert-manager, it'll never be implemented in the operator though.17:10
tristanCor if we don't want to support running without cert-manager, we drop the Optional17:10
corvusthe thing i'm worried about is mnaser saying "i want to use cert-manager" and adding it to zuul-helm, and tristanC saying "i don't want to use cert-manager" and not adding it to the operator.17:11
mnasercorvus: for what it's worth, i was thinking cert-manager would just be opt-in, otherwise you have to supply the secret as values17:11
tristanCmnaser: what changes are needed to add cert-manager to zuul-helm?17:11
mnasernothing, just adding one more file into templates/ to generate the ca+certs17:12
*** jcapitao_afk is now known as jcapitao17:12
corvusmnaser: right, so options #1 and #2 for zuul-helm.17:12
tristanCcorvus: i'm not saying we shouldn't add it to the operator, it's just that will take more time than using the existing zk-ca.sh17:12
mnaseryes, i do not want to have people rely on external things.  step #1 is a secret that's manually generated using provided values, step #2 is adding k8s cr's to generate those (if the user wants)17:13
tristanCdo we know if the cert-manager can even be used to satisfy zookeeper jks secretstore thing? :)17:13
corvusi do not know that yet17:14
tristanCcorvus: wait, perhaps that if we support user provided cert in the zuul-operator, then supporting the cert-manager is just a matter of adding more CR to the deployment recipe, and that could be transparent for the operator?17:14
mnaserwhat tristanC said is exactly what i've been pushing for ^17:15
mnaserall cert-manager does is create a 'tls' secret inside k8s17:15
tristanCmnaser: wouldn't that be the same for zuul-helm, that's actually transparent for the chart too right?17:15
corvuswe need a cert for each zk server17:15
mnaserso as long as we consume a 'tls' secret inside k8s, that's all we need to essentially do17:15
tristanCcorvus: well that's ok, we can have a List UserSecret17:16
corvusi was assuming the operator would need to create a CertRequest CR for each zk pod it created17:16
tristanCcorvus: that's still up for discussion, but i think we need a user controlled `count` attribute to the zookeeper schemas17:17
mnaserhmm17:17
corvusor we could fix it at 317:17
tristanCcorvus: yes, that would be the easiest17:17
mordreddoesn't tobiash run 5 atm?17:18
corvusoh, i wonder why17:18
mordredthere was a reason - I have forgotten what it is17:18
mnasergerman engineering probably17:18
mnaser=P17:18
corvusmaybe we do need a count17:18
corvus>=3 :)17:18
mordredI think we need a count17:18
mordredyeah17:18
tristanCso to summarize, the zuul-operator doesn't have to worry about cert-manager, and if the user doesn't provided tls secret, then we just run the existing zk-ca17:19
mordredbut that's kind of why I was thinking it would be nice to have the operator just handle this - it's a lot of plumbing details that aren't really interesting to a user - and while we _could_ have a chunk of yaml that all users copy-pasta17:20
mordredtristanC: that's not what my take away has been so far :)17:20
corvustristanC, mnaser: can you explain how we get the right number of certs in place for the operator if it doesn't know about cert-manager?17:20
corvus(and named the right way)17:21
corvus(note that the cert subjects must match the hostname; i believe there's an option to disable hostname verification, but it's going to look dodgy for us to set that if we don't need to)17:21
tristanCcorvus: for idx in $(seq input.zookeeper.count); do zk-ca.sh zuul-zk-${idx}; done ?17:22
corvustristanC: no i understand how zk-ca.sh will work.  i'm asking about cert-manager.17:22
mnaseri wasn't aware we have multiple certs in place17:23
mnaserthis is interesting now because i'm not sure how we can mount a different secret per pod17:23
corvusmnaser: minimum of 5.17:23
mnaserso if you have 3 replicas of zuul-executor, how do you make sure each one gets its _own_ secret17:23
corvus(at least 3 servers, a ca, and a client (which can be shared among all clients -- we don't need hostname verification for that))17:23
corvusmnaser: zuul-executor can share secrets17:24
corvusit's the zk servers that (ideally) require unique certs17:24
mnaserright, i think in plain k8s, that is somewhat of a challenge, as pods are meant/designed to be identical in definition17:24
mordredthere's a pattern for this used in the mariadb galera cluster17:25
tristanCmnaser: here is the scheduler statefulset with zookeeper and gearman tls : https://zuul.opendev.org/t/zuul/build/562b3ab7f32940b18db7cc35043d2f26/console#3/0/4/ubuntu-bionic (search for StatefulSet/zuul-scheduler )17:25
corvusmnaser: this isn't uinque to certs, each zk server needs to know its own id too, that's usually a configmap-per-server17:25
tristanCoh i see, different secret per replicaset17:25
corvusthese aren't replicas17:26
mordredyeah. these are three individually defined pods in a logical relationship with each other17:26
tristanChmm, how about one statefulset per zookeeper pod then?17:27
mnaseri wouldn't encourage that, because you'll end up with like zuul-0-0 zuul-1-0 etc17:27
mnaserim looking how the current helm charts do things (like different config/id)17:28
fungigranted if the executors all share a cert, then if one of them is compromised you have to replace the cert on all executors17:28
fungiprobably not a huge deal, but something to keep in mind in the risk model for the system17:28
mnaseryeah the zuul bits are easy i think the zookeeper are the harder ones17:28
mordredyah - becuae the executors are only precious due to cache coherency - they _can_ just scale up and down in count17:29
mordredthe zk cluster is different17:29
tristanCmordred: iiuc, in the event the user doesn't provide tls secret, the fact that the operator use a cert-manager or run the zk-ca script doesn't make a significant difference? and if for compliance or other reason the user would like to use a cert-manager, then the user provide them as tls secret17:30
corvusfungi: yes; though given that (unless you're using zoning) all executors can run any job, if one is compromised, it would be a good idea to burn them all17:30
fungii concur17:30
fungihow is certificate revocation handled in that case?17:30
fungiis a crl maintained/distributed to everything using client certs for authorization?17:30
clarkbfungi: new ca17:30
fungiouch17:31
clarkbI'm not sure it checks a revocation list17:31
corvuscrls cost more than cas.17:31
clarkb(many things seem to not do that)17:31
corvustristanC: i'm just not getting that.  cert-manager is not to handle the case of compliance.  it's for folks who want to manage their k8s systems using k8s tools.  it should be just as automatic as the zk-ca.sh.17:32
tobiashcorvus, mordred: we run 5 because availability (and because we're running them on tmpfs because ceph on spindles performs poorly with zk at our scale)17:33
mordredtobiash: awesome. thanks!17:33
corvustristanC: but i've yet to understand what you and mnaser do about how allowing users to supply certs manually enables cert-manager to work with the operator.17:33
corvustobiash: so more zk servers mitigates the risk from not having persistent storage?17:34
zenkurohi, when I run tox -e docs in locs17:34
mnasercorvus: i can talk about how it makes sense for zuul, but for zookeeper, i dont know actually17:34
zenkuroin local zuul copy do I execute any additional programs?17:34
corvuszenkuro: that should be all that's necessary; are you getting an error?17:34
mordredzenkuro: it shouldn't be running anything other than sphinx-build in the virtualenv17:35
*** evrardjp has quit IRC17:35
tobiashcorvus: well it didn't save us from data loss tbh. What we do now is we have data and snapshots on tmpfs and a sync crontab that syncs all to a persistent volume every minute17:35
*** evrardjp has joined #zuul17:36
tobiashAnd that is synced back to tmpfs on pre startzp of zk17:36
tristanCcorvus: if user sets zk.count = 3; then it needs to provide three certificate issued for '${cr-name}-zk-[123]-0.${namespace}.svc...', then the operator attach each secret to each zk deployment17:36
corvustristanC: right, that's not something that can be put in a static k8s yaml file17:37
tristanCcorvus: why not?17:37
corvusbecause that's templated?17:37
corvusi mean, that's the whole point of operators, right?  to do this for users?17:37
corvusthe way the operator should interact with cert-manager is: when it knows how many zk servers to create, it creates the cert request CRs for each17:38
corvus(because, i mean, you *can* run a whole zuul cluster without any operators or helm charts or anything, just using a single static k8s yaml file, see https://gerrit.googlesource.com/zuul/ops/+/refs/heads/master/k8s/zuul.yaml  but that's tedious)17:39
*** chandankumar is now known as raukadah17:39
mordredyeah. like - I feel like the tls connections for zk are _totally_ an impl detail that the operator should elide for the user17:40
mordredmost users shouldn't even have to care that there is a zk at all17:40
corvus++17:40
mordredbut certainly shouldn't have to care about how the CA and certs work for it to talk to itself17:40
tristanCok, but you still want the zuul-operator to register zk tls certificate in an external service17:41
zenkuromordred: it tries to run zuul-manage-ansible17:41
mordredI want the operator to use CRDs from cert-manager to generate a CA, a client cert and a cert-per-zk-pod17:41
zenkurocorvus: yep17:41
mordredbecuase it is an existing k8s system that is designed for this purpose17:42
corvuszenkuro: can you copy the error into http://paste.openstack.org/ and then paste the resulting url here so we can see it?17:42
tristanCmordred: sigh, alright, i'll have a look now and see if it can actually be used17:42
corvuszenkuro: it does that so it can include the help output in the manual17:42
mordredif that is untennable, I think the operator should use zk-ca.sh to do the same thing - my main worry is that now we will have our own local impl of cert-manager that only does what we're doing17:42
mordredtristanC: this is mostly because managing certs and cas is one of those things that I think extra effort shoudl be used to re-use existing work because most of the time doing it yourself is a recipe for making a mistake somewhere and not noticing17:43
tristanCmordred: right i understand, but i'd rather spend time doing something else than zk tls, especially since we eventually made the zk-ca works last week.17:45
tristanCand if there is a mistake in zk-ca, we should care and fix it, as it will likely be the tool used outside of kubernetes17:46
zenkurocorvus: sure, in 5 min17:46
tristanCand iiuc, zk requires jks, which seems to be an experimental feature of the cert-manager that got released less than two weeks ago17:47
tristanCso perhaps we should wait cert-manager support for jks to be stable before relying on it?17:47
corvustristanC: zk requires either a jks file; or a pkcs12 file with .jks extension; or a pkcs8 pem-encoded file with a .pem extension (but the key has to be encrypted)17:49
corvusi don't know which of those cert-manager can output, but i think that's at least 2 or 3 distinct options there17:50
mordredtristanC: that's good point about managing zk-ca.sh anyway, why don't we start with zk-ca.sh - but I think using cert-manager if it's there is still worthwhile (and the thing that k8s people with cert-manager would expect to happen), perhaps we do that as an upgrade followon - and have the operator detect if it can use cert-manager or if it needs to use zk-ca17:51
corvusmordred: ++ i think it's important that we have an answer about cert-manager17:52
*** jamesmcarthur has quit IRC17:52
corvusbecause if an experienced k8s operator (person) uses this, they will expect that (see mnaser's very first reaction when we started talking about this)17:52
corvusso the answer should either be "we do! use this option!" or "we looked into it, and it doesn't quite produce the output we need"17:53
corvusbut it should not be "because we didn't feel like it".  that will reflect poorly.17:53
tristanCcorvus: so there are ways to use cert-manager for zk in the zuul-operator, but what about the zuul-heml charts, should we start here first, then i'll port the change to zuul-operator (as i did with the initial service labels) ?17:57
*** jamesmcarthur has joined #zuul17:57
*** jpena is now known as jpena|off17:58
corvustristanC: i don't quite know enough helm to know how to do that; but i'm assuming there's some way that will be similar -- however you create the 3 (or more) zk pods, do the same for the tls certs17:59
corvustristanC: (i don't think there's a way to use zk-ca.sh with helm -- other than just running it before running helm and then using its output as input to helm)18:00
zenkurocorvus: softwarefactory-project.io/paste/show/176318:04
tristanCoh well, it seems like zuul-helm is using https://github.com/helm/charts/tree/master/incubator/zookeeper  which simply doesn't have tls or sasl18:05
corvuszenkuro: hrm, i was hoping for more output.  there may be something useful in .tox/docs/log/ directory.18:07
corvuszenkuro: but also, you may need to install the packages from bindep.txt18:08
corvuszenkuro: something like "sudo yum -y install $(bindep -b test)" may be necessary...18:09
corvusmordred, clarkb: ^ does that sound right?18:09
clarkbcorvus: yes that command for installing things looks correct18:09
clarkbrun from the repo top level dir (otherwise you can specify a path to the bindep file)18:10
mordredcorvus: I rechecked that osc image job and got the same error - so it might actually be a something18:12
zenkurocorvus: softwarefactory-project.io/paste/show/176518:12
mordredoh wait18:13
zenkurolooks like Im mising zuul manage ansible... but18:13
corvuszenkuro: that should be installed by tox; but running it is failing for some reason18:14
zenkurodo I need to install this to have check doc style? Isnt it a problem for security?18:14
corvuszenkuro: another thing you could do is try running ".tox/docs/bin/zuul-manage-ansible -h" and see what the output is18:14
mordredcorvus: ignore me - I hadn't added the appropriate zuul user to the dockerhub org :)18:14
corvusmordred: aha!18:15
*** sshnaidm|afk is now known as sshnaidm18:15
corvuszenkuro: if you have run tox before, but have recently updated the zuul repo, there may be new dependencies.  in that case, you should run "tox -r -e docs" to recreate it.18:15
zenkurocorvus: it fails with some python deps18:15
corvuszenkuro: pathspec?18:15
zenkurore218:16
fungithis may be the recent fedora re2 incompatibility someone was talking about18:16
corvuszenkuro: then do both things: run the bindep command, then run "tox -r -e docs"18:16
corvusfungi: oh?18:16
corvusi'm unfamiliar with the issue18:16
fungii'll see if i can find details18:16
fungi2020-03-13 11:30:21 tobiash clarkb, ianw: fyi libre2 in latest fedora is incompatible with prebuilt wheel of fb-re218:17
fungi(from scrollback in here)18:18
zenkuroyah, I have fedora18:18
tobiashIn fedora fr-re2 must be force installed by source (I don't recall the command atm)18:19
corvus--no-binary ?18:19
tobiashcorvus: I think that was it combined with a filter18:20
mnaserfwiw zuul-helm does _not_ do zookeeper deployments18:20
corvus--no-binary fb-re2 ?18:21
*** jamesmcarthur has quit IRC18:22
zenkurohm... kooks like installation of re2 devel src solved the issue18:22
tristanCmnaser: is the zookeeper dependencies coming from https://github.com/helm/charts/tree/master/incubator/zookeeper ?18:22
mnasertristanC: only for the "zuul-system" chart which is meant to be an integration point for now18:23
corvuszenkuro: what commands did you run?18:23
mnasermordred: unrelated but i know you tinkered a lot with pyenv.  did you find a way to get pyenv to install "latest" of a specific series18:24
mnaseraka pyenv install 3.7 (noonedeadpunk dug in initially with no success18:24
*** bhavikdbavishi has quit IRC18:25
tristanCmnaser: hmm, how is the zuul-helm supposed to be used then?18:25
mordredmnaser: nope. I do "pyenv install --list | grep '^ *3.7'"18:25
mordredthen look for the version with my eyeballs18:25
mordredit's not idea18:25
mordredideal18:25
mnasertristanC: "bring your own zookeeper and db"18:25
*** jcapitao has quit IRC18:25
tristanCmnaser: oh, ok. then when zuul requires zk auth, then that would be 'bring your own zk tls too'18:27
mnaseryep18:27
zenkurocorvus: dnf install re2-devel18:30
corvuszenkuro: ok good, that's already in bindep, so the bindep command i gave you earlier would have run that18:30
zenkurocorvus: tox -r -e docs18:30
tobiashcorvus, zenkuro: I used this to fix the tox env: ".tox/py37/bin/pip install --force-reinstall --ignore-installed --no-binary :all: fb-re2"18:31
tobiash(and installed re2-devel)18:31
corvustobiash, zenkuro: i just ran tox -e docs in a fedora:latest container after running "yum -y install $(bindep -b doc test)" and it worked without error18:31
zenkurobut why docs require so much things?18:31
tobiashcorvus: maybe the package has been fixed in the meantime18:32
corvuszenkuro: because it runs the zuul commands to collect the help output18:32
tobiashat least there was a new release in february18:33
*** jamesmcarthur has joined #zuul18:35
tobiashcorvus: oh, it looks like they just removed the prebuilt wheels from pypi: https://pypi.org/project/fb-re2/1.0.7/#files18:36
tobiashthat's also the reason the devel package is required now18:36
corvuswell that works :)18:38
tristanCaccording to https://github.com/jetstack/cert-manager/pull/2643/files#diff-c210e0aac6b999044e3472381a445c4b , it seems like the keystore password needs to be set at cert-manager installation time18:44
*** jamesmcarthur has quit IRC18:51
tristanCand well, it seems like it's part of the crd yet, only the controller seems to be implemeneted. corvus: is there something else i can use beside pkcs12 or jks?18:59
zbrensure-tox: https://review.opendev.org/#/c/690057/ -- i really need it in order to be able to change how tox is installed19:01
tristanChere is the Certificate CR I tried using a pkcs12 enabled cert-manager: http://paste.openstack.org/show/790752/19:01
tristanCcorvus: i think that's a good reason why the zuul-operator is not using cert-manager ^19:05
corvustristanC: pem-encoded pkcs8 with an encrypted key is an option19:10
tristanCcorvus: i'm not seing an option to provide an encryption key for pcks8, do you think that should be provided by the cert-manager?19:14
mnaserso with adding the 'vexxhost' tenant -- i noticed zuul breaks out after the first missing required-project (rather than checking all of them *then* breaking out)19:15
mnaserso its complanining about openstack/cinder *only* missing rather than all the others19:15
corvustristanC: probably worth a look at how the jks support handles it (it seems very likely that jks support would include a store password)19:15
mordredcorvus: where did we ever wind up with the "treat required projects as foreign" or whatever we were calling that potential change for making things like re-using repos like devstack easier? I'm pretty sure I should know the answer, but I'm not able to page it into my brain pellets19:16
corvusmordred: needs more design: https://review.opendev.org/61314319:17
mordredah yes!19:17
tristanCcorvus: reading https://github.com/jetstack/cert-manager/blob/master/cmd/controller/app/options/options.go#L322 , it seems like those (experimental) options are provided at cert-manager install time, L6597 of the yaml file19:17
mordredmnaser: ^^19:17
mordredmnaser: idea being that you really shouldn't need to add _any_ of those projects to your tenant if you don't want to - assuming the design issues can be sorted out19:18
corvustristanC: oof, that is the wrong place for that19:25
corvustristanC: it should be per-keystore, so i'd expect it to be an attribute of the certificate19:25
corvustristanC: or, at the *very* least, a property of the issuer19:26
corvusit's certainly not appropriate for it to be cluster-wide19:26
corvustristanC: what yaml file?19:27
corvusi'm having trouble figuring out where --experimental-jks-password should be provided19:28
tristanCcorvus: https://github.com/jetstack/cert-manager/releases/download/v0.14.0/cert-manager.yaml  L6597 I added: `- --experimental-issue-pkcs8` and `- --experimental-pkcs12-keystore-password=secretstorepassword`19:28
*** jamesmcarthur has joined #zuul19:28
mnasermordred: yeah -- i think i'd be a little more comfortable fixing the for loop that breaks early :P19:28
corvustristanC: thanks, that clarifies.  also, wow, that's an interesting way for that to happen.19:28
corvusmnaser: i *think* having that evaluate the complete set of r-p would be feasible and reasonable; i can't think of a reason not to do it right now.19:29
mnaserok i found it19:30
mnaserwe're raising an exception when we hit an unknown project19:30
corvusmnaser: in configloader.py19:30
corvusyeah that19:30
mnaserthe messages zuul emits aren't really like.. an api right?19:30
mordredseems collecting them and raising an exception with a larger error payload would be potentially nicer19:30
mnaser^^^19:30
mnaserso that sems reasonable to just start rewriting the message19:31
corvusmnaser: nope; the ones that are are special exceptions19:31
corvusyep19:31
* mordred afks for just a bit19:31
* corvus gets lunch19:31
*** harrymichal has joined #zuul19:34
mnaserjust wrote something with tests now, testing locally19:36
* clarkb is back ish from lunch and dealing with kids19:41
mnasercant run the tests locally because gear and select19:54
clarkbmnaser: there is a chnage up to address that but still needs people on osx to test I think19:54
mnaserclarkb: yeah i saw that and i saw it was also reverted too at some point, assuming it broke other things19:55
clarkbmnaser: https://review.opendev.org/#/c/708267/19:55
clarkbyes this fixes the revert (or so we hope)19:55
mnaserclarkb: i'll try to make time to try that out19:58
openstackgerritMohammed Naser proposed zuul/zuul master: config: show error for all unknown projects  https://review.opendev.org/71332319:58
mnasercorvus, mordred: should be sane ^19:58
mordredmnaser: you angered the gods of whitespace20:05
mnaserNOT AGAIN20:05
mnaserugh20:07
openstackgerritMohammed Naser proposed zuul/zuul master: config: show error for all unknown projects  https://review.opendev.org/71332320:07
*** jamesmcarthur has quit IRC20:29
*** jamesmcarthur has joined #zuul20:30
ianwtristanC : if things are complex, you could write the config file in jinja2 and deploy it as a template too ... personally i see it as something small enough to be useful and put in the linked dnm project-config change to illustrate why20:33
ianwif yourself and corvus like the idea of a parent structure more, can certainly implement that20:34
ianwbut if nobody thinks the general idea is useful, well, then that's that :)20:34
tristanCianw: i don't mind either way. I think there is value in being able to express hierarchy. In our case, we have fedora-30, fedora-31 and fedora-rawhide that are mostly identical beside the version name, and having parent image would remove the repetition. With a base image, we'll also get the global option you are proposing too.20:55
*** jamesmcarthur has quit IRC20:57
ianwtristanC: yeah, thanks for that suggestion it does feel more "zuul-onic"; i can write something up and see if it slots in neatly or if there's some horrific corner case i haven't thought of yet :) (my initial feeling is the former)20:59
tristanCianw: well we have to be careful about merging. we could override parent attribute with child, but perhaps for elements list we want to extend instead?21:01
ianwyes, elements list i think we want to extend, as done in the original21:03
ianwotherwise dict update() semantics21:03
*** jamesmcarthur has joined #zuul21:06
openstackgerritJames E. Blair proposed zuul/zuul master: Add TLS support for ZooKeeper  https://review.opendev.org/71253121:08
openstackgerritJames E. Blair proposed zuul/zuul master: Use ZK TLS in quickstart  https://review.opendev.org/71281721:08
corvustristanC: ^ i think that should fix the errors that showed up in tests21:10
tristanCcorvus: i'll submit a zuul-operator update shortly21:12
tristanCcorvus: qq, it seems like using a generic name like `zk` works for client auth. Is a fqdn required for quorum ssl, or would it be possible to use a single server key for all zk pod?21:14
corvustristanC: it's required for quorum unless you turn off host verification (i tend to think we should leave it on if we can, but it's open to discussion if folks think it would be okay to turn it off -- mostly i worry it will look bad)21:15
corvus(i guess it's probably the only way to perform authz on the quorum side, unless you used two different cas.  so it's probably good to keep it enabled)21:16
clarkbShrews: left a couple notes/questions on https://review.opendev.org/#/c/712539/321:17
openstackgerritTristan Cacqueray proposed zuul/zuul-operator master: Add TLS configuration to ZooKeeper service  https://review.opendev.org/71275921:18
tristanCcorvus: alright, so that seems to work for the current single pod zk service ^21:19
tristanCnow i wonder what it would take to add tls support to the existing zk operator or chart so that the operator doesn't have to worry about that21:20
openstackgerritClark Boylan proposed zuul/nodepool master: Revert abitrary uid support  https://review.opendev.org/71306021:20
clarkbmordred: corvus ianw ^ decided to updated that and keep that moving forward21:21
corvustristanC: should be pretty easy -- but if the zuul-operator is going to run zk internally, it should at least have a quorum of 321:22
tristanCcorvus: maybe not so easy. the zk helm chart manage replication using a single statefulset, thus if we want one secret per pod, then that might a more profond refactor21:26
tristanCftr, it is https://github.com/helm/charts/blob/master/incubator/zookeeper/templates/statefulset.yaml21:26
*** plaurin has quit IRC21:27
tristanCand that seems to be the case for the operator too: https://github.com/pravega/zookeeper-operator/blob/1c4e3f1908412e219efa630445ce003fdcae279e/pkg/zk/generators.go#L4121:28
corvustristanC: how does it set myid?21:29
tristanCcorvus: https://github.com/pravega/zookeeper-operator/blob/master/docker/bin/zookeeperStart.sh#L33 where ORD is the last number of the hostname, e.g. zk-0  zk-1 ...21:32
tristanCperhaps we could shove all the service cert inside a single secret, and use that zookeeperStart script to pick the right cert based on ORD too21:33
corvustristanC: i love how many "how do i do x in k8s?" questions are answered with "custom start script" :)21:36
corvustristanC: but yeah, if they've already gone down the path of parsing the hostname, then that could work.21:37
*** sshnaidm has quit IRC21:37
corvustristanC: actually...21:38
corvustristanC: jks has a concept of an alias -- and i think zk uses that to ask the jks for the cert, so it might actually just work to just put all the certs in the same jks and give that to each host21:38
corvustristanC: i did not test that, but i could do that now without too much trouble if that sounds useful.21:39
corvustristanC: (and i'm sorry i didn't think of this earlier)21:39
*** jamesmcarthur has quit IRC21:41
*** sshnaidm has joined #zuul21:44
tristanCcorvus: tls support request submitted here: https://github.com/pravega/zookeeper-operator/issues/14121:51
tristanCmordred: would you mind approving https://review.opendev.org/#/c/71281621:54
mordredtristanC: +A21:54
tristanCiirc tobiash manage zk in k8s locally, and perhaps it would be better if the zuul-operator does that too21:57
tristanCbut the fact that no zk deployment system seems to bother with tls or sasl is quite annoying21:58
*** jamesmcarthur has joined #zuul21:59
corvustristanC: well, it's pretty new21:59
corvustristanC: i'm working on testing the single-keystore idea22:00
corvustristanC: it doesn't look like that's working.  even though zk insists that the keystore alias for a cert match the subject of the cert, it doesn't actually pull out the cert which matches its own hostname.  so when i build a keystore with 3 certs, all the quorum hosts present the first cert when they connect. :(22:04
corvusthat's a real missed opportunity22:04
corvusso i think we're stuck with "provide a different cert file to each quorum host"22:04
corvusthis is what my keystore looked like: http://paste.openstack.org/show/790758/22:05
corvusbut they all used zoo122:05
*** jamesmcarthur has quit IRC22:06
tristanCarg, too bad. makes me wonder why the keystore support multiple key :)22:06
corvustristanC: i think for this case, but i think the zk developers didn't take advantage of it22:08
corvustristanC: oh! i found that you *can* use unencrypted pkcs8 files22:17
corvustristanC: i think that means that we should be able to use the standard output from cert-manager without change22:18
openstackgerritJames E. Blair proposed zuul/zuul master: Don't use JKS with ZK  https://review.opendev.org/71334022:21
corvustristanC: ^ like that22:21
mordredcorvus: that seems WAY better22:32
corvusyeah, it doesn't solve the additional complexity for the operator, but it at least reduces the problem to something that most people are familiar with ("here is a pem file for this server")22:33
corvushopefully it makes the cert-manager problem easier22:33
mordred++22:33
corvus("for the operator" -> "for the zk operator")22:33
*** jamesmcarthur has joined #zuul22:36
*** Defolos has quit IRC22:40
*** jamesmcarthur has quit IRC22:42
*** jamesmcarthur has joined #zuul22:42
*** Defolos has joined #zuul22:42
*** jamesmcarthur has quit IRC22:43
*** jamesmcarthur has joined #zuul22:44
tristanCcorvus: that looks promising, i updated the zk-operator issue.22:46
corvustristanC: cool; if it's not too hard for you to try, it may be worth asking cert-manager for a cert, then seeing if zk can start up with that (using a config like my latest change to quickstart)22:49
*** threestrands has joined #zuul22:51
mnaseri posted these over the weekend and they're blocking a change of mine: https://review.opendev.org/#/c/713107/ and https://review.opendev.org/#/c/713115/ if possible to have a look at them :)23:04
mnaser(i have a depends-on so my change is pending on those)23:04
*** jamesmcarthur has quit IRC23:11
*** jamesmcarthur has joined #zuul23:11
openstackgerritMerged zuul/zuul master: config: show error for all unknown projects  https://review.opendev.org/71332323:15
*** hashar has quit IRC23:15
*** jamesmcarthur has quit IRC23:20
*** jamesmcarthur has joined #zuul23:36
*** harrymichal has quit IRC23:39
*** clayg has left #zuul23:49
*** jamesmcarthur has quit IRC23:56
*** jamesmcarthur has joined #zuul23:57

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!