*** tosky has quit IRC | 00:14 | |
*** jamesmcarthur has joined #zuul | 00:23 | |
openstackgerrit | Merged zuul/nodepool master: Stop using hacking for anything https://review.opendev.org/712781 | 00:42 |
---|---|---|
*** armstrongs has joined #zuul | 01:22 | |
*** sshnaidm is now known as sshnaidm|afk | 01:31 | |
*** armstrongs has quit IRC | 01:31 | |
openstackgerrit | Merged zuul/nodepool master: Add visual dividers for each image in Dockerfile https://review.opendev.org/713002 | 01:44 |
*** jamesmcarthur has quit IRC | 02:09 | |
*** swest has quit IRC | 02:10 | |
*** jamesmcarthur has joined #zuul | 02:13 | |
*** jamesmcarthur has quit IRC | 02:18 | |
*** jamesmcarthur has joined #zuul | 02:20 | |
*** swest has joined #zuul | 02:24 | |
openstackgerrit | Ian Wienand proposed zuul/nodepool master: Add diskimages-globals section for common configuration https://review.opendev.org/713157 | 02:48 |
*** jamesmcarthur has quit IRC | 03:13 | |
openstackgerrit | Ian Wienand proposed zuul/nodepool master: Add diskimages-globals section for common configuration https://review.opendev.org/713157 | 03:13 |
*** jamesmcarthur has joined #zuul | 03:27 | |
*** bhavikdbavishi has joined #zuul | 03:33 | |
*** jamesmcarthur has quit IRC | 03:56 | |
*** pleia2 has joined #zuul | 04:11 | |
*** jamesmcarthur has joined #zuul | 04:37 | |
openstackgerrit | Ian Wienand proposed zuul/nodepool master: Dockerfile: set SECLEVEL=1 for RAX dfw and ord region compatability https://review.opendev.org/713162 | 04:47 |
*** jamesmcarthur has quit IRC | 04:57 | |
openstackgerrit | Ian Wienand proposed zuul/nodepool master: Dockerfile: set SECLEVEL=1 for RAX dfw and ord region compatability https://review.opendev.org/713162 | 05:02 |
*** bolg has joined #zuul | 05:13 | |
*** evrardjp has quit IRC | 05:35 | |
*** evrardjp has joined #zuul | 05:36 | |
*** sanjayu_ has joined #zuul | 05:42 | |
openstackgerrit | Ian Wienand proposed zuul/nodepool master: Dockerfile: set SECLEVEL=1 for RAX dfw and ord region compatability https://review.opendev.org/713162 | 06:18 |
*** irclogbot_3 has quit IRC | 06:29 | |
*** dpawlik has joined #zuul | 06:58 | |
*** jamesmcarthur has joined #zuul | 06:59 | |
*** jamesmcarthur has quit IRC | 07:03 | |
*** bolg has quit IRC | 07:25 | |
*** irclogbot_0 has joined #zuul | 07:30 | |
*** NBorg has joined #zuul | 08:06 | |
*** dpawlik has quit IRC | 08:07 | |
*** dpawlik has joined #zuul | 08:07 | |
*** sugaar has joined #zuul | 08:11 | |
NBorg | Hello. Is there a suggested way to use tox (my problem is playbooks-syntax) from zuul-jobs if you use roles from another repo? Modify ANSIBLE_ROLES_PATH in tox.ini? | 08:17 |
*** dpawlik has quit IRC | 08:18 | |
*** bolg has joined #zuul | 08:22 | |
*** dpawlik has joined #zuul | 08:28 | |
*** jcapitao has joined #zuul | 08:32 | |
*** jcapitao has quit IRC | 08:34 | |
*** jcapitao has joined #zuul | 08:36 | |
*** bolg has quit IRC | 08:37 | |
*** bolg has joined #zuul | 08:38 | |
*** jcapitao has quit IRC | 08:42 | |
*** jcapitao has joined #zuul | 08:42 | |
*** jpena|off is now known as jpena | 08:50 | |
*** tosky has joined #zuul | 09:07 | |
*** sanjayu_ has quit IRC | 09:22 | |
*** arxcruz is now known as arxcruz|rover | 09:34 | |
zbr | morning! | 09:34 |
zbr | who can help me merge few bits? https://review.opendev.org/#/c/708642/ | 09:34 |
*** harrymichal has joined #zuul | 09:37 | |
AJaeger | zbr: I suggest to ask clarkb again, he had some comments and one of his was not addressed... | 09:39 |
zbr | AJaeger: my impression was that was not an actionable comment.bindep/test-reqs do not change very often, but they can impact testing, so it makes sense to run when modified. | 09:41 |
zbr | AJaeger: in fact the one I really care about is https://review.opendev.org/#/c/690057/ | 09:44 |
zbr | because over the weekend I was unable to fix some broken tox jobs on ansible zuul instance due to lack of this feature | 09:45 |
zbr | the centos-7 image had preinstalled ancient tox, and ensure-tox is currently unable to upgrade it. paul was in PTO, so it was impossible to do anything. | 09:46 |
zbr | probably he will fix the image today, but is much better to be able to workaround bugs in images. | 09:47 |
AJaeger | zbr: leave a comment to his one ;) | 09:49 |
zbr | already did | 09:49 |
AJaeger | great | 09:49 |
AJaeger | zbr: I think one point clarkb wanted to make is that the this is for the test job and only the bindep file in zuul-jobs. Just to double check that you are aware of it | 09:50 |
AJaeger | zbr: I'll approve tomorrow if clarkb finds no time today | 09:50 |
zbr | i am wondering if it is possible to validate an image before switching it on zuul, so we can avoid putting broken images. | 09:51 |
zbr | i am still hoping to reach a point where a change made to the build image is tested like a normal job and last step is to try to use that image, so we know is not worse than previous one. | 09:52 |
AJaeger | zbr: not today, I expect this would need major changes to nodepool and Zuul | 09:52 |
zbr | this was my impression too, but we should not ignore it, i seen multiple issues related to this. | 09:53 |
zbr | once we have this, it will be much easier to update images (more confidence) | 09:53 |
*** saneax has joined #zuul | 10:01 | |
*** avass has joined #zuul | 10:03 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Replace occurences of '/' in nodename to something more path friendly https://review.opendev.org/713182 | 10:08 |
*** sshnaidm|afk is now known as sshnaidm | 10:08 | |
avass | Soooo, someone just gave a node in nodeset the name something like 'abc/123'. And it looks like that's completely valid, except it causes some problem with the valdiate-host role | 10:09 |
avass | And I can see how names like that could cause a lot of problems in other ways | 10:09 |
*** hashar has joined #zuul | 10:10 | |
avass | But since it's valid in both zuul and ansible I think it should be fixed | 10:11 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Replace occurences of '/' in nodename to something more path friendly https://review.opendev.org/713182 | 10:16 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Replace occurences of '/' in nodename to something more path friendly https://review.opendev.org/713182 | 10:18 |
openstackgerrit | Daniel Pawlik proposed zuul/zuul-jobs master: DNM - checking new images https://review.opendev.org/713183 | 10:22 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Replace occurences of '/' in nodename to something more path friendly https://review.opendev.org/713182 | 10:24 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Replace occurences of '/' in nodename to something more path friendly https://review.opendev.org/713182 | 10:28 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Replace occurences of '/' in nodename to something more path friendly https://review.opendev.org/713182 | 10:30 |
*** hashar is now known as hasharLunch | 11:09 | |
*** rlandy has joined #zuul | 11:30 | |
*** Defolos has quit IRC | 11:39 | |
*** Defolos has joined #zuul | 11:44 | |
*** avass has quit IRC | 11:50 | |
*** avass has joined #zuul | 11:58 | |
*** jcapitao is now known as jcapitao_lunch | 12:00 | |
*** harrymichal has quit IRC | 12:05 | |
*** harrymichal has joined #zuul | 12:06 | |
*** hasharLunch has quit IRC | 12:15 | |
*** pabelanger has joined #zuul | 12:37 | |
*** NBorg has quit IRC | 12:40 | |
*** bhavikdbavishi has quit IRC | 12:48 | |
*** jpena is now known as jpena|lunch | 12:59 | |
*** jcapitao_lunch is now known as jcapitao | 13:08 | |
*** hasharLunch has joined #zuul | 13:22 | |
*** hasharLunch is now known as hashar | 13:23 | |
openstackgerrit | Monty Taylor proposed zuul/nodepool master: Revert "Dockerfile: add support for arbritary uid" https://review.opendev.org/713060 | 13:26 |
openstackgerrit | Monty Taylor proposed zuul/zuul master: Remove fix-tox workaround for python3.5 https://review.opendev.org/712495 | 13:27 |
openstackgerrit | Monty Taylor proposed zuul/zuul master: Remove duplicate variables for tox jobs https://review.opendev.org/712496 | 13:27 |
*** dpawlik has quit IRC | 13:29 | |
*** avass has quit IRC | 13:29 | |
*** Defolos has quit IRC | 13:36 | |
*** NBorg has joined #zuul | 13:36 | |
*** Defolos has joined #zuul | 13:38 | |
*** erbarr has joined #zuul | 13:38 | |
*** jamesmcarthur has joined #zuul | 13:42 | |
*** jpena|lunch is now known as jpena | 13:50 | |
*** zxiiro has joined #zuul | 13:54 | |
openstackgerrit | Merged zuul/nodepool master: Stop comparing hostnames to determine ownership https://review.opendev.org/713057 | 13:58 |
tristanC | zuul-maint: a small stack of 3 changes for the zuul-operator is waiting for review here: https://review.opendev.org/713138 . Those are little things i'm basing on a wip integration tests | 13:59 |
corvus | tristanC: where's the wip integration tests? | 14:02 |
tristanC | well it's still a wip, i meant my wip is on top of the stack | 14:03 |
corvus | ok that clears it up :) | 14:03 |
mordred | corvus: that's a fun issue from avass there in scrollback | 14:04 |
tristanC | i basically have a role to setup static node provider and i'm loooking forward triggering jobs and testing the different parts like builds result, console stream, etc... | 14:04 |
corvus | mordred: yeah -- i wonder what "valid in ansible" means? | 14:04 |
corvus | tristanC: you can borrow a bunch of stuff from the quickstart for that | 14:05 |
mordred | corvus: that's the second time in a week I've seen someone use a character in a zuul name that I'd never considered using - but which there's no specific reason not to use (mnaser used a : in a job name and my brain when "pow") | 14:05 |
mordred | corvus: maybe you can put a / in an inventory hostname since it's a valid ansible char? | 14:05 |
corvus | mordred: right, but it's not a valid dns name | 14:05 |
mnaser | ya i kinda like ':' esp because it helps splitting different smaller 'slugs' | 14:06 |
mordred | corvus: yah - but ansible inventories don't have anything to do with dns | 14:06 |
corvus | mordred: ah, then 713182 is probably the way to go. i wonder if the prepare artifact role is okay? | 14:07 |
mordred | corvus: good question | 14:08 |
mordred | corvus: I have verified | 14:08 |
mordred | corvus: foo/bar ansible_host=mordred-irc.inaugust.com | 14:08 |
mordred | in a hosts file for ansible totally works with foo/bar being the ansible name of that hose | 14:08 |
*** plaurin has joined #zuul | 14:08 | |
mordred | host | 14:08 |
plaurin | Good morning irc people! | 14:09 |
* tristanC waves | 14:09 | |
mnaser | o/ | 14:09 |
*** Goneri has joined #zuul | 14:09 | |
plaurin | zuul chugging nicely with happy developers since kubernetes stream output is working :) | 14:10 |
corvus | mordred: qq on 713060 | 14:10 |
corvus | plaurin: \o/ | 14:10 |
mordred | plaurin: \o/ | 14:11 |
plaurin | I was wondering are there any plans for more interactions with the zuul ui? like triggering or killing jobs | 14:11 |
mordred | corvus: I don't think the sudo setting works if we don't add a nodepool user | 14:11 |
corvus | mordred: this is a good point | 14:12 |
corvus | mordred: we actually do create a zuul user in the zuul containers | 14:13 |
corvus | mordred: so maybe we should leave that part and just revert out the uid entrypoint bit? | 14:13 |
tristanC | plaurin: there are plans for that indeed, for killing or enqueuing jobs there is https://review.opendev.org/#/c/643536/ | 14:14 |
plaurin | autohold / dequeue / enqueue, nice | 14:15 |
plaurin | use those quite a lot | 14:16 |
tristanC | plaurin: and you already reviewed https://review.opendev.org/635716 which adds manual trigger, but I think we need a proper spec for such use-case | 14:16 |
*** Goneri has quit IRC | 14:16 | |
plaurin | ya I remember that one | 14:16 |
plaurin | those upcoming features will make a big difference | 14:17 |
plaurin | for me and my team anyways :) | 14:18 |
*** Goneri has joined #zuul | 14:19 | |
tristanC | plaurin: fwiw, we also investigated a service to convert arbritary event like AMQP message to a project pipeline config with this project: https://pagure.io/software-factory/zuul-gateway | 14:20 |
corvus | plaurin: zuul is primarily intended to integrate with code review systems so that developers mostly interact with it there (rather than with its own ui). what are the use cases that cause you to want to manually trigger and kill jobs? | 14:22 |
mordred | corvus: ok. good point. I'll update it | 14:23 |
mordred | corvus: I might need your eyeballs on a container job | 14:23 |
mordred | corvus: https://zuul.opendev.org/t/openstack/build/565c64362194492fa0c603c3396d053c/console | 14:23 |
mordred | corvus: it uploaded the image but failed uploading the tag | 14:24 |
mordred | with an access denied | 14:24 |
corvus | mordred: that sounds like an internet hiccup | 14:24 |
mordred | yeah? awesome | 14:25 |
mordred | should I just reapprove the change do you think? | 14:25 |
corvus | mordred: yeah | 14:25 |
corvus | mordred: and i guess we should loop that? | 14:25 |
mordred | corvus: yeah - probably so | 14:26 |
mordred | corvus: if reapprove works, let's add a loop | 14:26 |
mordred | if not - we can investigate more | 14:26 |
*** harrymichal has quit IRC | 14:26 | |
plaurin | We're doing UUT testing (units under test), BBD development. We run hardware tests and firmware upgrade/downgrade tests on actual hardware. Reality is that hardware and custom firmware crashes machines a lot, making many of our zuul job wait for unit connections, etc... We of course need proper failure detection but this is complex in the case of hardware and firmware issues and we do get a lot of 'stuck' scripts for long running job | 14:26 |
plaurin | s (4h+ long jobs). | 14:26 |
plaurin | happens a lot that we want to stop something instead of waiting for 4+ hours for it to finish | 14:27 |
mordred | so thus why you'd like to be able to manually cancel a job? | 14:27 |
plaurin | and then trigger it again after we fix hardware | 14:27 |
openstackgerrit | Merged zuul/zuul master: Declare support for Python3.8 https://review.opendev.org/712489 | 14:27 |
plaurin | yeah, I use the zuul dequeue and enqueue quite a lot | 14:27 |
corvus | plaurin: which code review system do you use? (gerrit, github, ...?) | 14:28 |
plaurin | gerrit | 14:28 |
plaurin | we also use a lot of cron ' timer ' zuul jobs, running qualification tests 24h | 14:29 |
plaurin | we have to kill those from time to time after hardware failure, stress tests, etc | 14:29 |
corvus | plaurin: i do not recommend it now, but in the future, we should have good support for the checks plugin which has a 're-run' button right in gerrit. so that could help with that part of it. mhu's work would still be the best answer for the dequeue part of it. | 14:29 |
mordred | yeah - in the mean time - we use a "recheck" keyword in comments in gerrit to allow re-triggering jobs | 14:30 |
plaurin | yeah we use recheck a lot too when we can. I set recheck and rebuild | 14:30 |
mordred | cool. so yeah - mhu's work is your answer for the other thing | 14:30 |
corvus | plaurin: thanks for the info, it's good to hear how it's being used :) | 14:31 |
plaurin | sure np :) | 14:31 |
mordred | oh - there's another trick you can use | 14:31 |
mordred | plaurin: if you abandon the change, zuul should cancel its jobs | 14:32 |
mordred | and then kick them off again when you restore it | 14:32 |
plaurin | I have a stuck job right now lol let me try this | 14:32 |
corvus | or if you upload a new patchset (you could just change the commit message right in gerrit), it will cancel and restart | 14:32 |
plaurin | yes new patchset I am aware of | 14:33 |
plaurin | ha, abandon trick works cool | 14:34 |
corvus | plaurin: what version of gerrit are you running? | 14:34 |
corvus | mordred, plaurin: i just thought of a variant of the abandon trick: if we add support for gerrit tags to pipeline requirements (which we should anyway) you could have a pipeline which "rejects: tag: suppress-ci" and you could tag a change with that and it would cause zuul to cancel jobs and refuse to enqueue it. | 14:36 |
plaurin | 2.16.13 | 14:37 |
plaurin | yes that would be nice for the tag | 14:37 |
corvus | (ftr, i don't think any of this is a better ux than mhu's web dequeue; these are just workarounds until that's ready) | 14:37 |
plaurin | +1 | 14:38 |
fungi | from a "jobs are stuck and have to wait for them to time out or abort manually" perspective, how about a watchdog? maybe an option for a job which says it should be prematurely stopped if the console stream goes silent for x minutes, or something like that | 14:41 |
corvus | clarkb, mnaser: you may be interested in giving https://review.opendev.org/710649 and children a quick look -- dhall updates to the operator | 14:42 |
plaurin | fungi we have couple of BBD hardware stress test cases that can be waiting silently for 1 to 2 hours, like a CPU stress test for example | 14:44 |
fungi | yeah, just wondering how you normally identify that it's not going to complete which leads you to end it earlier, and whether that's something which could signal zuul to do the same instead | 14:45 |
plaurin | I depend a lot on devs designing proper failure cases and detection :( | 14:46 |
plaurin | You bet we had a lot of "It works on my machine but not on zuul, so zuul is broken' (...lol) | 14:46 |
fungi | so then... what leads you to stop a job? you said "we do get a lot of 'stuck' scripts for long running jobs ... happens a lot that we want to stop something instead of waiting" | 14:47 |
fungi | how do you make that determination? | 14:47 |
plaurin | It's kinda hard to know and usually unexpected stuff. Many times it's the devs who ask me to stop them because they see something not going well or taking more time than usual | 14:48 |
plaurin | most of my job have 8 hours timeout | 14:48 |
fungi | ahh, okay, just curious if it's a general set of cases which zuul could identify or get notification of automatically | 14:48 |
plaurin | also their test builtin timeout are badly designed but I have no control over this hehe :( | 14:48 |
plaurin | I mean their test timeout built in the code | 14:49 |
corvus | and if there's some way to detect it (by probing or other inspection of the test resource), maybe we could have ansible do that, and tell zuul the job has failed | 14:50 |
plaurin | imho inspection of the status of the test resource should be part of the dev's BDD | 14:50 |
plaurin | they have so many tests that kills and re-enable the network layer that would be hard to follow | 14:51 |
plaurin | soo many unexpected variables hehe | 14:51 |
plaurin | by having trigger 'buttons' I can react quickly for them manually | 14:52 |
plaurin | I guess I could leverage the zuul api too | 14:52 |
Shrews | mordred: what's driving the nodepool reverts? | 14:56 |
plaurin | a button to "disable" a job would also be useful for me. For my 24/7 cron jobs. Sometime I have to suspend them for hardware maintenance like bricked machines, etc | 14:57 |
mordred | Shrews: we reverted the similar code in zuul already because it turns out it's not actually providing the value it purports to - but more importantly it actually broke things in the nodepool-builder image when running under podman | 14:58 |
corvus | Shrews: see https://github.com/RHsyseng/container-rhel-examples/issues/103 for more info | 14:59 |
Shrews | <sung to batman theme> nuh nuh, nuh nuh, nuh nuh, nuh nuh, podman! | 15:00 |
Shrews | you're welcome for that | 15:00 |
openstackgerrit | Monty Taylor proposed zuul/nodepool master: Revert abitrary uid support https://review.opendev.org/713060 | 15:00 |
Shrews | corvus: oh, yeah. that's not good | 15:02 |
fungi | holy security holes, podman! now i have the theme stuck in my head, thanks | 15:02 |
Shrews | >:) | 15:03 |
fungi | *pow* | 15:03 |
clarkb | corvus: left a response to one of your comments on https://review.opendev.org/#/c/713157/2 if you have a minute | 15:14 |
clarkb | mordred: and left a thought on https://review.opendev.org/#/c/713162/3 in response to one of your ideas. I doubt that will work though | 15:15 |
clarkb | ah yup fungi has responded with similar doubt | 15:16 |
corvus | clarkb: wait, elements are not ordered? | 15:16 |
clarkb | corvus: not on the command line. They are internally ordered | 15:16 |
corvus | huh, how about that | 15:16 |
mnaser | i don't want to be a pain but i think maybe we should have opendev/nodepool and zuul/nodepool if we're going to have provider-specific hacks/workarounds | 15:20 |
corvus | mnaser: regarding? | 15:20 |
mnaser | corvus: like the /etc/ssl/openssl.cnf change in the zuul/nodepool change discussed above | 15:20 |
clarkb | mnaser: fwiw that ssl change is in line with other distros | 15:21 |
clarkb | debian is being weird there | 15:21 |
clarkb | oh unless this is the new one | 15:21 |
clarkb | for the new one ya, this gets weird which is why I was trying to brainstorm config based fixes | 15:21 |
clarkb | (I think monty was tring to do that too) | 15:21 |
mnaser | "Note that CAs have stopped issuing certificates that didn't meet those requirements in January 2015, and since January 2017 all valid CA certificates should meet those requirements." -- i'm curious how we'd run into that .. unless they have a really really old cert | 15:22 |
*** Goneri has quit IRC | 15:22 | |
clarkb | mnaser: its easy enough to check with openssl s_client /me looks | 15:23 |
mnaser | they both seem recent | 15:23 |
clarkb | and are issued by digicert but sure enough doing sha1 | 15:24 |
mordred | mnaser: I agree - I'd rather avoid putting ssl workarounds for a provider into the image and would much prefer figuring out a way to have it be in the provider-specific config | 15:24 |
fungi | yeah, i think step 1 is to try interacting with rackspace via osc/sdk from a recent debian and see if there's a way to make that work without editing /etc/ssl/openssl.cnf | 15:25 |
clarkb | ok its the peer signing digest not the cert itself | 15:25 |
clarkb | this is what is used during the handshake | 15:25 |
clarkb | which is why they can have a recent cert that isn't rejected, but still fail because sha1 | 15:25 |
corvus | mnaser: in principle, i agree; but i also don't think "use the default upstream openssl.cnf" file is too out there; if this can happen in sdk, that would be best, but if not, this seems low risk/cost. | 15:26 |
fungi | as a reminder, this is what debian's doing to the upstream openssl.cnf when packaging it for redistribution: https://salsa.debian.org/debian/openssl/-/blob/debian/unstable/debian/patches/Set-systemwide-default-settings-for-libssl-users.patch | 15:27 |
clarkb | "In TLS 1.2 the client can send which digest algorithms are supported for the handshake. If the client does not offer SHA-1 the server should not use SHA-1, which means that it might use other algorithms offered by the client or that the connection fails." | 15:27 |
clarkb | from stackexchange so unsure of how accurate that is, but maybe we can force the remote to use sha2 by not claiming to accept sha1 | 15:28 |
*** sshnaidm is now known as sshnaidm|afk | 15:28 | |
* clarkb trying to see how to do that with s_client now | 15:28 | |
fungi | i doubt the handshake is claiming to accept sha1, but set_ciphers() should allow us to remove it if so | 15:29 |
*** Goneri has joined #zuul | 15:29 | |
mordred | corvus: well - maybe the answer might want to be "use the upstream openssl conf file" instead of "sed a particular setting" | 15:32 |
clarkb | -sigalgs SHA256 is apparently not correct | 15:32 |
corvus | mordred: indeed | 15:34 |
clarkb | ah I need algorithm+hash | 15:34 |
clarkb | `openssl s_client -host dfw.images.api.rackspacecloud.com -port 443 -sigalgs RSA+SHA256` continues to use sha1 | 15:34 |
tristanC | ianw: about https://review.opendev.org/713157 , have you thought about using dhall to manage your nodepool.yaml configuration? we are giving it a try and working on dhall-nodepool binding: https://tree.taiga.io/project/morucci-software-factory/us/3378 | 15:36 |
clarkb | tristanC: I'm not sue I could +2 such a change in opendev | 15:36 |
corvus | yes please, please keep dhall away from users | 15:36 |
corvus | i still don't understand the operator | 15:37 |
corvus | tristanC: if there is a problem with the zuul or nodepool configuration language, maybe we could work together on making it better? | 15:37 |
corvus | rather that replacing it with something that's very hard for users to understand? | 15:38 |
corvus | (i thought with our discussions on the operator we were in agreement that it was worth exploring for a system like the operator where programmatic manipulation of the config files was necessary, but that it was not suitable for direct use by end-users, which is what we're talking about here) | 15:39 |
clarkb | does openssl itself disable sha1 peer signatures by default? | 15:39 |
mnaser | clarkb: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=912759 i think the same thing is happening here | 15:39 |
openstack | Debian bug 912759 in openssl ""wrong signature type" with working websites" [Important,Open] | 15:39 |
mnaser | which links to https://github.com/openssl/openssl/issues/7126 | 15:40 |
clarkb | mnaser: probably. Though I'm trying to gauge risk and if debian is being over cautious here | 15:40 |
clarkb | if openssl has decided this isn't valid we've got more leverage with the server side too | 15:42 |
clarkb | because they'll break pretty broadly | 15:42 |
tristanC | corvus: iiuc, either we add abstraction in the runtime like nodepool and zuul to express complex things, like `use this env-vars on all images because it is declared in this new diskimages-globals option`, or either we keep the configuration simple (or even simpler as we could remove zuul project-template too) by moving the abstraction in the configuration language directly | 15:42 |
tristanC | corvus: i am happy with either solution, though i prefer the later as it works for any runtimes | 15:43 |
clarkb | mnaser: et al https://www.openssl.org/docs/man1.1.1/man3/SSL_CTX_set_security_level.html seems to say that seclevel 1 is the default | 15:45 |
tristanC | and both solution are not incompatible, for the issue ianw was having about setting the zuul public key on all images, that something you can do locally in opendev system-config using a dhall-to-yaml step | 15:46 |
mordred | clarkb, mnaser, corvus: reading the above it seems to me like debian is pushing a bit too aggressively here with their config file changes. why don't we just install the upstream openssl conf? | 15:46 |
corvus | tristanC: i don't think the other zuul maintainers are in favor of the second option, so let's do what we've been doing for the past few years: slowly and carefully evolve the config syntax as needed. | 15:46 |
clarkb | and even specifies that higher than 1 is likely to cause many problems | 15:46 |
mordred | yeah | 15:46 |
clarkb | mordred: ya I think that is what I'm leaning towards especialyl since openssl explicitly warns against doing what debian does | 15:46 |
mordred | yeah | 15:46 |
mordred | should we just do it in the python-base image? | 15:46 |
clarkb | note master says the same thing https://www.openssl.org/docs/manmaster/man3/SSL_CTX_set_security_level.html | 15:47 |
clarkb | mordred: thats likely to reduce future debugging headaches as more things land on that image | 15:47 |
clarkb | (though we may be able to get away with stricter rules in specific cases) | 15:48 |
corvus | tristanC: note that we learned a lot of things from the zuul syntax that we could probably take advantage of if we reworked nodepool; we never made as much of a clean break with nodepool as we did for zuul in v3 | 15:48 |
mnaser | I’m indifferent about sed or replacing the whole thing after doing this small tad of research | 15:48 |
mordred | any thoughts on the best way to do that? should we just download a current copy of the upstream file into system-config? or do a build-time curl? | 15:48 |
mnaser | It seems like the right approach however imho | 15:48 |
clarkb | mnaser: I would vendor | 15:48 |
clarkb | er mordred ^ | 15:49 |
mordred | clarkb: kk | 15:49 |
mordred | anybody got a link to the upstream file handy? | 15:49 |
tristanC | corvus: well for nodepool, the issue we are trying to solve with dhall is assigning labels and diskimages accross different provider and launcher | 15:49 |
zbr | how do I query zuul about all builds that run in the last 7 days? apparently the api has a count limit but no way to sort the results | 15:49 |
zbr | and clearly order is neither start or end time | 15:49 |
*** jcapitao is now known as jcapitao_lunch | 15:50 | |
corvus | tristanC: that sounds like something that could drive a spec for a new config file format. | 15:50 |
*** jcapitao_lunch is now known as jcapitao_afk | 15:50 | |
clarkb | mordred: https://github.com/openssl/openssl/blob/master/apps/openssl.cnf I think that is it | 15:51 |
*** bolg has quit IRC | 15:51 | |
clarkb | I don't know if that will work as pure drop in though (possible paths need changing/) | 15:51 |
*** bolg has joined #zuul | 15:51 | |
tristanC | corvus: iirc, mordred suggested that nodepool configuration could be managed like zuul, e.g. in config projects. | 15:51 |
mordred | clarkb: maybe we should just apply a reverse of the patch fungi linked | 15:52 |
mordred | clarkb: actually. why don't I move this convo to opendev | 15:53 |
tristanC | but i guess before that, a much needed improvement would be to have a single nodepool configuration that could be used by many launcher and builder service | 15:53 |
clarkb | tristanC: isn't that how it works already? | 15:53 |
clarkb | I guess not with launchers | 15:53 |
*** bolg has quit IRC | 15:53 | |
clarkb | (builders kinda need to be set up that way as we discovered recently though) | 15:54 |
corvus | tristanC: note that one of the main tasks of zuul-operator is supposed to be to take the nodepool launcher config and split it so there is one pod for each launcher; that's not implemented yet, is it? | 15:54 |
corvus | clarkb: not really | 15:54 |
tristanC | corvus: right, the zuul-operator could do that, but wouldn't it better if nodepool support it natively? | 15:54 |
corvus | clarkb: builders are intended to have the same config and act as a cluster; it's only when you split architectures do they need different configs. | 15:54 |
tristanC | (and it's not implemented yet, i don't think the launcher service is even activated yet) | 15:54 |
corvus | tristanC: well, the operator has to decide how many launcher pods to run anyway | 15:54 |
corvus | (or, someone could propose a new nodepool algorithm that lets launchers act as a cluster) | 15:55 |
corvus | tristanC: let's see if we can wrap up the work on zk auth/tls? | 15:58 |
corvus | tristanC: it looks like my zuul + quickstart patches are failing tests, i'll fix them up. but they should generally be working. you suggested that you might want to just drop the sasl support and just use x509 for authn/z. do you still want to do that? | 15:59 |
tristanC | corvus: i think that cluster algorithm would help a lot. then back to the initial new diskimages-globals option, not sure if it's ok to revisit now, but i would suggest to add a more generic parent inheritence system similar to what zuul do with jobs | 15:59 |
corvus | tristanC: i agree | 15:59 |
corvus | about both | 15:59 |
corvus | tristanC: i don't think there's any rush on diskimages-globals, feel free to -1 that for more design | 15:59 |
tristanC | alright, let me drop a comment and then i'll look into zk auth | 16:00 |
corvus | (clearly, we can just keep copying dicts around until we're happy with something better) | 16:00 |
corvus | clarkb, mnaser, fungi, mordred, SpamapS (if you're around): what do you think about using tls certs (signed by a private ca) for zookeeper (as authn and authz) and omitting the sasl password support? | 16:03 |
clarkb | corvus: that should be fine | 16:03 |
fungi | seems light it ought to work | 16:04 |
fungi | s/light/like/ | 16:04 |
*** saneax has quit IRC | 16:04 | |
mordred | seems fine to me | 16:04 |
fungi | how easy is it to specify a ca to trust in zk? | 16:05 |
fungi | do you have to alter the trust set in the jvm? | 16:05 |
tristanC | corvus: assuming tls can prevent anon connection, then i'm happy to abandon the sasl support | 16:05 |
mordred | fungi: well, we have to do that already for tls connections for zk anyway - so basically with tls support we've done all the hard bits of config - so I think the thought is we might as well just use that for auth? | 16:06 |
mnaser | corvus: from someone who waves around the world k8s a lot, i'd probably prefer that cause i can use something like cert manager to create a ca for the deployment and then manage all the certs distribution | 16:06 |
*** bhavikdbavishi has joined #zuul | 16:08 | |
corvus | tristanC: btw, zk acls also support x509, so if we wanted, we could keep the default_acl stuff; we wouldn't have to require it, but it could be an option... | 16:08 |
corvus | mnaser: yeah, tristanC started looking at cert-manager | 16:09 |
corvus | mnaser: i wrote a shell script that does the equivalent for non-cert-manager environments. i think cert-manager supports everything the shell script does, so it's hopefully mostly a matter of "porting" it. | 16:09 |
*** hashar has quit IRC | 16:12 | |
tristanC | corvus: mnaser: i had a quick look and i couldn't find how it could be deployed in a the same namespace as the zuul-operator (since i didn't want to bother with a cluster wide configuration for our self-signed cert only used by control-plane) | 16:15 |
tristanC | and i wonder if cert-manager is really worth the trouble for zk and gearman tls, since they are self-signed | 16:16 |
corvus | tristanC: cert-manager defines CRDs, so it needs to be installed by an admin in the cluster anyway. is that a problem? | 16:18 |
*** harrymichal has joined #zuul | 16:21 | |
SpamapS | corvus: +1, it's the better option. TBH, I'm also starting to wonder if Zuul 6 should be "the etcd3 migration". ZK seems to be losing its community rapidly. :-/ | 16:22 |
fungi | up side is that zk server certs don't need to be self-signed if we're already setting up a ca to sign the peer certs for auth | 16:22 |
fungi | we could sign the server certs with the same ca and then peers have the option of validating the server certs | 16:23 |
corvus | SpamapS: ack; that'll be a good time to re-evaluate | 16:23 |
corvus | fungi: yeah, all of these are actually "private CA-signed"; i think tristanC mistyped | 16:25 |
corvus | (nothing is self-signed, zk validates the signature of every cert, server-side and client-side) | 16:25 |
fungi | oh, okay, thanks for clarifying | 16:26 |
fungi | and yes, i have nothing against self-signed server certs (provided appropriate precautions are taken to establish the server's identity some other way), but if there's already a private ca involved then there's little reason for self-signing | 16:27 |
tristanC | corvus: you don't need to be admin to defines CRD, the zuul-operator can (and is) installed locally without being cluster admin | 16:28 |
mnaser | corvus, tristanC: i mean in my case, because of zuul-helm, i can probably add support to generate a ca/etc and point users at "make sure cert-manager is installed" | 16:29 |
mnaser | so it just lets you use the crds opt-in | 16:29 |
tristanC | fungi: corvus: i meant the ca is self-signed. or perhaps i don't actually understand tls :) | 16:29 |
*** jamesmcarthur has quit IRC | 16:30 | |
*** harrymichal has quit IRC | 16:30 | |
tristanC | mnaser: i think that should work yes. i don't know how that would be expressed in zuul-helm though | 16:31 |
*** jamesmcarthur has joined #zuul | 16:31 | |
corvus | tristanC: yes, the ca is self-signed; the ca signs the certs; the server and client trust the (self-signed) ca. | 16:31 |
tristanC | corvus: in that case, i don't think we need any cluster wide cert-manager right? the (self-signed) ca would only be used by zuul services | 16:31 |
tristanC | used internally* | 16:32 |
corvus | tristanC: yeah, it's not about that | 16:32 |
corvus | didn't we talk about this on friday? | 16:32 |
tristanC | i think so, then i looked at using https://cert-manager.io/docs/configuration/selfsigned/ | 16:32 |
corvus | yes i looked at that too, that's how you make the ca. | 16:33 |
corvus | i don't understand the resistance to using cert-manager? | 16:33 |
tristanC | but when installing cert-manager, it created a new namespace and required cluster wide things that didn't make sense to me for zuul ca | 16:33 |
fungi | tristanC: right, any ca which is the root of trust may as well sign itself (though they can be cross-signed too). the point of the ca cert is that you can implicitly trust it through out of band distribution | 16:33 |
fungi | so it doesn't ultimately matter where its signature came from | 16:33 |
fungi | as opposed to independently distributing a bunch of self-signed server and even client certs | 16:34 |
tristanC | corvus: the resistance for me is that it's an extra requirements, and it seems like it requires cluster admin privilege. | 16:35 |
fungi | a ca gives you the ability to only have to solve distributing one cert | 16:35 |
corvus | tristanC: i guess i don't see those as objectionable. being able to use the standard k8s way of managing a CA seems worth it to me. it's one extra line in the installation step to install it. | 16:38 |
*** jamesmcarthur has quit IRC | 16:41 | |
*** jamesmcarthur has joined #zuul | 16:42 | |
corvus | mnaser: you seem to think cert-manager is worthwhile, what would you recommend? | 16:42 |
mnaser | corvus: i mean from a zuul perspective, it shouldn't really care what you end up doing. from an operator/helm perspective, i would think it would be nice to have something like certSource: cert-manager or something along those lines | 16:43 |
mnaser | and then that way it can generate a ca and certs off of it | 16:43 |
mnaser | and the neat thing is they all end up in k8s secrets which you can just mount into the container/env variables/whatever way you end up manipulating it | 16:44 |
mnaser | probably just mount it in /etc/zuul/tls and point configs towards it | 16:44 |
tristanC | corvus: here is a cert-manager installation log http://paste.openstack.org/show/790745/ , used memory went from 948MB to 1.1GB, or about 15% increase in usage. i mean that's not too bad, but still... | 16:45 |
corvus | i feel like if memory efficiency is a concern, maybe k8s isn't the right system. | 16:45 |
fungi | any idea what's using the extra memory? a ca shouldn't need to be a persistent service unless you're also providing a software hsm to automatically handle signing requests or something | 16:46 |
fungi | s/software hsm/virtual hsm/ | 16:46 |
corvus | fungi: there's a daemon to watch for cert requests and process them | 16:46 |
corvus | ==pod | 16:46 |
mnaser | also | 16:47 |
mnaser | cainjector and webhook is running in there | 16:47 |
mnaser | not needed for this specific use case | 16:47 |
clarkb | fungi: think of it like a whole new api service that updates a ca | 16:47 |
mnaser | its also nice because it can talk to ACME issuers or whatnot | 16:47 |
mnaser | or something even like vault | 16:48 |
mordred | yeah - I mean - k8s isn't exactly the system you use to run the minimum number of daemons | 16:48 |
fungi | yep, got it. so basically acting like an hsm | 16:48 |
fungi | not just distributing a static ca cert | 16:48 |
mordred | it's a system you run to remove the cost of running the daemons you need so that you can reuse work from other people without worrying TOO much about things like "oh, but then we'd need to run another service" | 16:49 |
tristanC | corvus: mordred: well i do care about memory efficienty. but that's not very relevant here, unfortunately. | 16:49 |
corvus | tristanC: i think mnaser has outlined why using cert-manager would be valuable. if you think that reducing the dependency set is more valuable than that, then i guess we could just handle it internally (or support both). | 16:50 |
mnaser | i mean again i dont think this should be a hard dependency | 16:50 |
mnaser | but its a "nice to have if i already have cert-manager deployed" | 16:50 |
corvus | i just want to point out, i'm the one saying "we should use the k8s thing" and tristanC is saying "we should do it ourselves" :) | 16:51 |
mnaser | which many clusters will end up having anyways | 16:51 |
corvus | mnaser: that sounds like a vote for "handle it internally (or support both)" | 16:51 |
corvus | tristanC: so if you want to handle it internally, where will the CA live? a persistent volume for the operator pod? | 16:52 |
fungi | yeah, it's a good point, the overhead from adding the kubernetes certmanager is only overhead if you don't already need it for something else | 16:52 |
mordred | I could see support both - although I could also see just doing cert-manager for a start and then adding "handle it ourselves" as a followup | 16:52 |
mnaser | i haven't followed the tls stuff lately, is it "tls only" or "tls if you want it" | 16:52 |
mordred | like - my main worry with "handle it ourselves" is that we're then reimplementing cert-manager | 16:52 |
tristanC | i meant that i'm happy to look into cert-manager, but i worry it's oversized for what we need. also, isn't the jks still an experimental feature that was released 11 day ago? | 16:52 |
mnaser | because in that case i would be on the side of "you only get tls if you have cert-manager in your cluster .. or you provide your own certs" | 16:52 |
mnaser | the pattern used in things like ingress controllers is that you point towards a tls secret | 16:53 |
tristanC | mordred: isn't zk-ca.sh already a cert-manager reimplementation? | 16:53 |
mnaser | and then the cert-manager would automatically generate certs for that | 16:53 |
corvus | mnaser: i think by zuul v4 we're going to want to require authentication, and it looks like tls may be the way to do that, so probably zuulv4 will require tls | 16:54 |
corvus | it will be optional in v3 | 16:54 |
*** hashar has joined #zuul | 16:54 | |
mnaser | if it's required, then i think the zuul operator should accept a tls field which points towards a secret in k8s that has a ca/cert/key | 16:54 |
tristanC | corvus: when adding or removing zk service, shouldn't the cluster be restarted anyway? | 16:55 |
mnaser | how that tls secret is generated is up to the user | 16:55 |
corvus | tristanC: no, latest zk can do that live | 16:55 |
tristanC | mnaser: i disagree, why the user should have to worry about genearting zk tls? | 16:55 |
mnaser | well, because they might want to copy paste it from some really complicated internal CA they use | 16:56 |
mordred | I mean - cert-manager seems fairly common across k8s installs these days, right mnaser? | 16:56 |
tristanC | beside it's not such a simple process, you need a java keystore thingy that needs to be base64 encoded | 16:56 |
mnaser | also, the ingress is a good example of this | 16:56 |
mnaser | it doesnt generate the certs, it points towards a tls secret | 16:56 |
mnaser | how that secret is generated can either be the user (using cert-manager or manually creating the secret) | 16:56 |
mnaser | that way you're covering all your bases | 16:57 |
mnaser | a secret that's of type tls is very standard | 16:57 |
fungi | there will absolutely be deployments in organizations where they need to supply their own certificates and trust without depending on any integrated ca/cert management automation, due to regulations compliance | 16:57 |
tristanC | mnaser: to that token, we should also drop gearman tls generation from the operator, and that makes using zuul crd a bit more complicated as you need to provide more and more configuration files | 16:57 |
mordred | so - I think we might be talking across each other here | 16:58 |
fungi | likely | 16:59 |
mordred | for the purposes of intra-zuul communication we need certs signed by a CA - and that CA should _not_ be a CA shared with other things (like some corporate CA) because it's being used for client-auth | 16:59 |
mordred | one of the ways we can accomplish this in teh zuul operator is to have the zuul operator use cert-manager to do the work, which will generate the k8s secrets which it will then use | 16:59 |
corvus | authz even | 16:59 |
mordred | one of the ways is for the operator to use something like zk-ca.sh to generate k8s secrets which could be used by the operator to config things | 17:00 |
fungi | right, while it *shouldn't* be shared with a general corporate ca, it may need its dedicated ca managed by some outside system to be able to satisfy compliance requirements | 17:00 |
mordred | and one of teh ways is provide config settings that the user can use to specify a set of k8s secrets that user would like to be used for the purpose and have the operator not generate them | 17:01 |
corvus | fungi, mordred: (this case is why it may still be useful to keep the acl support in zuul+nodepool -- in that case one could use a shared corporate CA, but set acls so that only the zuul client certs have authz) | 17:01 |
mordred | as this is for intra-service comm - there are almost no use cases where the third is desirable, so I think for now let's just ignore it | 17:01 |
fungi | it may not make sense, but there are some industry regulations which demand certain properties for any certificate authority operated within the organization | 17:01 |
mordred | (we can always add config overrides later if we find a use case) | 17:01 |
mordred | but I thnik the main thing we're discussing now is: "should the zuul operator use cert-manager to generate, or zk-ca.sh to generate - or both" | 17:02 |
fungi | but yes, so long as we still leave an out to supply the certificates on your own manually, i think that's covered | 17:02 |
tristanC | oh right, having an optional, user-supplied, secret for tls files wfm. though it's not specified in the zuul-operator spec | 17:03 |
mordred | tristanC: so you'd be happy having the zuul-operator use cert-manager if one is there and no secrets are specified and not having a different auto-generated method? | 17:04 |
fungi | for the users who aren't hamstrung by draconian regulations compliance requirements, i think whatever provides the most seamless and hand-off solution while maintaining reasonable levels of security is preferred | 17:04 |
fungi | even if it drags in a dependency and consumes a bit more system resources in the process | 17:05 |
tristanC | mordred: that sounds like an implementation detail. in one case we need to require a new cert-manager dependencies, in the other we use the existing zk-ca.sh. i picked the later because i plan to use it outside of kubernetes too | 17:05 |
corvus | it does sound like it's 3 options: 1) manual 2) cert-manager 3) zk-ca.sh; all of them would produce a secret with cert/key/ca files which will be mounted into the pods and specified in the service configs. they just produce those secrets different ways. | 17:06 |
fungi | 1 covers all cases but is not very happymaking for people who just want it to work and don't care how the certs are managed. 2 seems good for the second case | 17:08 |
fungi | i guess 3 would be as well, but then if someone already has kubernetes cert-manager for other reasons they've effectively got two different ca managers going in the environment | 17:08 |
corvus | 3) zk-ca.sh is the simplest case -- it handles everything behind the scenes with no extra dependencies; 2) requires more of the cluster administrator, but may integrate better with existing k8s practices; 1) is the option for people who need it for $reasons. | 17:09 |
tristanC | corvus: which could be expressed in the CRD input... for each tls configuration, we could have an `Optional <UserSecret | use-cert-manager>`, and if that's not set, we generate the cert locally using zk-ca | 17:09 |
corvus | i'm pretty sure if we don't start with cert-manager, it'll never be implemented in the operator though. | 17:10 |
tristanC | or if we don't want to support running without cert-manager, we drop the Optional | 17:10 |
corvus | the thing i'm worried about is mnaser saying "i want to use cert-manager" and adding it to zuul-helm, and tristanC saying "i don't want to use cert-manager" and not adding it to the operator. | 17:11 |
mnaser | corvus: for what it's worth, i was thinking cert-manager would just be opt-in, otherwise you have to supply the secret as values | 17:11 |
tristanC | mnaser: what changes are needed to add cert-manager to zuul-helm? | 17:11 |
mnaser | nothing, just adding one more file into templates/ to generate the ca+certs | 17:12 |
*** jcapitao_afk is now known as jcapitao | 17:12 | |
corvus | mnaser: right, so options #1 and #2 for zuul-helm. | 17:12 |
tristanC | corvus: i'm not saying we shouldn't add it to the operator, it's just that will take more time than using the existing zk-ca.sh | 17:12 |
mnaser | yes, i do not want to have people rely on external things. step #1 is a secret that's manually generated using provided values, step #2 is adding k8s cr's to generate those (if the user wants) | 17:13 |
tristanC | do we know if the cert-manager can even be used to satisfy zookeeper jks secretstore thing? :) | 17:13 |
corvus | i do not know that yet | 17:14 |
tristanC | corvus: wait, perhaps that if we support user provided cert in the zuul-operator, then supporting the cert-manager is just a matter of adding more CR to the deployment recipe, and that could be transparent for the operator? | 17:14 |
mnaser | what tristanC said is exactly what i've been pushing for ^ | 17:15 |
mnaser | all cert-manager does is create a 'tls' secret inside k8s | 17:15 |
tristanC | mnaser: wouldn't that be the same for zuul-helm, that's actually transparent for the chart too right? | 17:15 |
corvus | we need a cert for each zk server | 17:15 |
mnaser | so as long as we consume a 'tls' secret inside k8s, that's all we need to essentially do | 17:15 |
tristanC | corvus: well that's ok, we can have a List UserSecret | 17:16 |
corvus | i was assuming the operator would need to create a CertRequest CR for each zk pod it created | 17:16 |
tristanC | corvus: that's still up for discussion, but i think we need a user controlled `count` attribute to the zookeeper schemas | 17:17 |
mnaser | hmm | 17:17 |
corvus | or we could fix it at 3 | 17:17 |
tristanC | corvus: yes, that would be the easiest | 17:17 |
mordred | doesn't tobiash run 5 atm? | 17:18 |
corvus | oh, i wonder why | 17:18 |
mordred | there was a reason - I have forgotten what it is | 17:18 |
mnaser | german engineering probably | 17:18 |
mnaser | =P | 17:18 |
corvus | maybe we do need a count | 17:18 |
corvus | >=3 :) | 17:18 |
mordred | I think we need a count | 17:18 |
mordred | yeah | 17:18 |
tristanC | so to summarize, the zuul-operator doesn't have to worry about cert-manager, and if the user doesn't provided tls secret, then we just run the existing zk-ca | 17:19 |
mordred | but that's kind of why I was thinking it would be nice to have the operator just handle this - it's a lot of plumbing details that aren't really interesting to a user - and while we _could_ have a chunk of yaml that all users copy-pasta | 17:20 |
mordred | tristanC: that's not what my take away has been so far :) | 17:20 |
corvus | tristanC, mnaser: can you explain how we get the right number of certs in place for the operator if it doesn't know about cert-manager? | 17:20 |
corvus | (and named the right way) | 17:21 |
corvus | (note that the cert subjects must match the hostname; i believe there's an option to disable hostname verification, but it's going to look dodgy for us to set that if we don't need to) | 17:21 |
tristanC | corvus: for idx in $(seq input.zookeeper.count); do zk-ca.sh zuul-zk-${idx}; done ? | 17:22 |
corvus | tristanC: no i understand how zk-ca.sh will work. i'm asking about cert-manager. | 17:22 |
mnaser | i wasn't aware we have multiple certs in place | 17:23 |
mnaser | this is interesting now because i'm not sure how we can mount a different secret per pod | 17:23 |
corvus | mnaser: minimum of 5. | 17:23 |
mnaser | so if you have 3 replicas of zuul-executor, how do you make sure each one gets its _own_ secret | 17:23 |
corvus | (at least 3 servers, a ca, and a client (which can be shared among all clients -- we don't need hostname verification for that)) | 17:23 |
corvus | mnaser: zuul-executor can share secrets | 17:24 |
corvus | it's the zk servers that (ideally) require unique certs | 17:24 |
mnaser | right, i think in plain k8s, that is somewhat of a challenge, as pods are meant/designed to be identical in definition | 17:24 |
mordred | there's a pattern for this used in the mariadb galera cluster | 17:25 |
tristanC | mnaser: here is the scheduler statefulset with zookeeper and gearman tls : https://zuul.opendev.org/t/zuul/build/562b3ab7f32940b18db7cc35043d2f26/console#3/0/4/ubuntu-bionic (search for StatefulSet/zuul-scheduler ) | 17:25 |
corvus | mnaser: this isn't uinque to certs, each zk server needs to know its own id too, that's usually a configmap-per-server | 17:25 |
tristanC | oh i see, different secret per replicaset | 17:25 |
corvus | these aren't replicas | 17:26 |
mordred | yeah. these are three individually defined pods in a logical relationship with each other | 17:26 |
tristanC | hmm, how about one statefulset per zookeeper pod then? | 17:27 |
mnaser | i wouldn't encourage that, because you'll end up with like zuul-0-0 zuul-1-0 etc | 17:27 |
mnaser | im looking how the current helm charts do things (like different config/id) | 17:28 |
fungi | granted if the executors all share a cert, then if one of them is compromised you have to replace the cert on all executors | 17:28 |
fungi | probably not a huge deal, but something to keep in mind in the risk model for the system | 17:28 |
mnaser | yeah the zuul bits are easy i think the zookeeper are the harder ones | 17:28 |
mordred | yah - becuae the executors are only precious due to cache coherency - they _can_ just scale up and down in count | 17:29 |
mordred | the zk cluster is different | 17:29 |
tristanC | mordred: iiuc, in the event the user doesn't provide tls secret, the fact that the operator use a cert-manager or run the zk-ca script doesn't make a significant difference? and if for compliance or other reason the user would like to use a cert-manager, then the user provide them as tls secret | 17:30 |
corvus | fungi: yes; though given that (unless you're using zoning) all executors can run any job, if one is compromised, it would be a good idea to burn them all | 17:30 |
fungi | i concur | 17:30 |
fungi | how is certificate revocation handled in that case? | 17:30 |
fungi | is a crl maintained/distributed to everything using client certs for authorization? | 17:30 |
clarkb | fungi: new ca | 17:30 |
fungi | ouch | 17:31 |
clarkb | I'm not sure it checks a revocation list | 17:31 |
corvus | crls cost more than cas. | 17:31 |
clarkb | (many things seem to not do that) | 17:31 |
corvus | tristanC: i'm just not getting that. cert-manager is not to handle the case of compliance. it's for folks who want to manage their k8s systems using k8s tools. it should be just as automatic as the zk-ca.sh. | 17:32 |
tobiash | corvus, mordred: we run 5 because availability (and because we're running them on tmpfs because ceph on spindles performs poorly with zk at our scale) | 17:33 |
mordred | tobiash: awesome. thanks! | 17:33 |
corvus | tristanC: but i've yet to understand what you and mnaser do about how allowing users to supply certs manually enables cert-manager to work with the operator. | 17:33 |
corvus | tobiash: so more zk servers mitigates the risk from not having persistent storage? | 17:34 |
zenkuro | hi, when I run tox -e docs in locs | 17:34 |
mnaser | corvus: i can talk about how it makes sense for zuul, but for zookeeper, i dont know actually | 17:34 |
zenkuro | in local zuul copy do I execute any additional programs? | 17:34 |
corvus | zenkuro: that should be all that's necessary; are you getting an error? | 17:34 |
mordred | zenkuro: it shouldn't be running anything other than sphinx-build in the virtualenv | 17:35 |
*** evrardjp has quit IRC | 17:35 | |
tobiash | corvus: well it didn't save us from data loss tbh. What we do now is we have data and snapshots on tmpfs and a sync crontab that syncs all to a persistent volume every minute | 17:35 |
*** evrardjp has joined #zuul | 17:36 | |
tobiash | And that is synced back to tmpfs on pre startzp of zk | 17:36 |
tristanC | corvus: if user sets zk.count = 3; then it needs to provide three certificate issued for '${cr-name}-zk-[123]-0.${namespace}.svc...', then the operator attach each secret to each zk deployment | 17:36 |
corvus | tristanC: right, that's not something that can be put in a static k8s yaml file | 17:37 |
tristanC | corvus: why not? | 17:37 |
corvus | because that's templated? | 17:37 |
corvus | i mean, that's the whole point of operators, right? to do this for users? | 17:37 |
corvus | the way the operator should interact with cert-manager is: when it knows how many zk servers to create, it creates the cert request CRs for each | 17:38 |
corvus | (because, i mean, you *can* run a whole zuul cluster without any operators or helm charts or anything, just using a single static k8s yaml file, see https://gerrit.googlesource.com/zuul/ops/+/refs/heads/master/k8s/zuul.yaml but that's tedious) | 17:39 |
*** chandankumar is now known as raukadah | 17:39 | |
mordred | yeah. like - I feel like the tls connections for zk are _totally_ an impl detail that the operator should elide for the user | 17:40 |
mordred | most users shouldn't even have to care that there is a zk at all | 17:40 |
corvus | ++ | 17:40 |
mordred | but certainly shouldn't have to care about how the CA and certs work for it to talk to itself | 17:40 |
tristanC | ok, but you still want the zuul-operator to register zk tls certificate in an external service | 17:41 |
zenkuro | mordred: it tries to run zuul-manage-ansible | 17:41 |
mordred | I want the operator to use CRDs from cert-manager to generate a CA, a client cert and a cert-per-zk-pod | 17:41 |
zenkuro | corvus: yep | 17:41 |
mordred | becuase it is an existing k8s system that is designed for this purpose | 17:42 |
corvus | zenkuro: can you copy the error into http://paste.openstack.org/ and then paste the resulting url here so we can see it? | 17:42 |
tristanC | mordred: sigh, alright, i'll have a look now and see if it can actually be used | 17:42 |
corvus | zenkuro: it does that so it can include the help output in the manual | 17:42 |
mordred | if that is untennable, I think the operator should use zk-ca.sh to do the same thing - my main worry is that now we will have our own local impl of cert-manager that only does what we're doing | 17:42 |
mordred | tristanC: this is mostly because managing certs and cas is one of those things that I think extra effort shoudl be used to re-use existing work because most of the time doing it yourself is a recipe for making a mistake somewhere and not noticing | 17:43 |
tristanC | mordred: right i understand, but i'd rather spend time doing something else than zk tls, especially since we eventually made the zk-ca works last week. | 17:45 |
tristanC | and if there is a mistake in zk-ca, we should care and fix it, as it will likely be the tool used outside of kubernetes | 17:46 |
zenkuro | corvus: sure, in 5 min | 17:46 |
tristanC | and iiuc, zk requires jks, which seems to be an experimental feature of the cert-manager that got released less than two weeks ago | 17:47 |
tristanC | so perhaps we should wait cert-manager support for jks to be stable before relying on it? | 17:47 |
corvus | tristanC: zk requires either a jks file; or a pkcs12 file with .jks extension; or a pkcs8 pem-encoded file with a .pem extension (but the key has to be encrypted) | 17:49 |
corvus | i don't know which of those cert-manager can output, but i think that's at least 2 or 3 distinct options there | 17:50 |
mordred | tristanC: that's good point about managing zk-ca.sh anyway, why don't we start with zk-ca.sh - but I think using cert-manager if it's there is still worthwhile (and the thing that k8s people with cert-manager would expect to happen), perhaps we do that as an upgrade followon - and have the operator detect if it can use cert-manager or if it needs to use zk-ca | 17:51 |
corvus | mordred: ++ i think it's important that we have an answer about cert-manager | 17:52 |
*** jamesmcarthur has quit IRC | 17:52 | |
corvus | because if an experienced k8s operator (person) uses this, they will expect that (see mnaser's very first reaction when we started talking about this) | 17:52 |
corvus | so the answer should either be "we do! use this option!" or "we looked into it, and it doesn't quite produce the output we need" | 17:53 |
corvus | but it should not be "because we didn't feel like it". that will reflect poorly. | 17:53 |
tristanC | corvus: so there are ways to use cert-manager for zk in the zuul-operator, but what about the zuul-heml charts, should we start here first, then i'll port the change to zuul-operator (as i did with the initial service labels) ? | 17:57 |
*** jamesmcarthur has joined #zuul | 17:57 | |
*** jpena is now known as jpena|off | 17:58 | |
corvus | tristanC: i don't quite know enough helm to know how to do that; but i'm assuming there's some way that will be similar -- however you create the 3 (or more) zk pods, do the same for the tls certs | 17:59 |
corvus | tristanC: (i don't think there's a way to use zk-ca.sh with helm -- other than just running it before running helm and then using its output as input to helm) | 18:00 |
zenkuro | corvus: softwarefactory-project.io/paste/show/1763 | 18:04 |
tristanC | oh well, it seems like zuul-helm is using https://github.com/helm/charts/tree/master/incubator/zookeeper which simply doesn't have tls or sasl | 18:05 |
corvus | zenkuro: hrm, i was hoping for more output. there may be something useful in .tox/docs/log/ directory. | 18:07 |
corvus | zenkuro: but also, you may need to install the packages from bindep.txt | 18:08 |
corvus | zenkuro: something like "sudo yum -y install $(bindep -b test)" may be necessary... | 18:09 |
corvus | mordred, clarkb: ^ does that sound right? | 18:09 |
clarkb | corvus: yes that command for installing things looks correct | 18:09 |
clarkb | run from the repo top level dir (otherwise you can specify a path to the bindep file) | 18:10 |
mordred | corvus: I rechecked that osc image job and got the same error - so it might actually be a something | 18:12 |
zenkuro | corvus: softwarefactory-project.io/paste/show/1765 | 18:12 |
mordred | oh wait | 18:13 |
zenkuro | looks like Im mising zuul manage ansible... but | 18:13 |
corvus | zenkuro: that should be installed by tox; but running it is failing for some reason | 18:14 |
zenkuro | do I need to install this to have check doc style? Isnt it a problem for security? | 18:14 |
corvus | zenkuro: another thing you could do is try running ".tox/docs/bin/zuul-manage-ansible -h" and see what the output is | 18:14 |
mordred | corvus: ignore me - I hadn't added the appropriate zuul user to the dockerhub org :) | 18:14 |
corvus | mordred: aha! | 18:15 |
*** sshnaidm|afk is now known as sshnaidm | 18:15 | |
corvus | zenkuro: if you have run tox before, but have recently updated the zuul repo, there may be new dependencies. in that case, you should run "tox -r -e docs" to recreate it. | 18:15 |
zenkuro | corvus: it fails with some python deps | 18:15 |
corvus | zenkuro: pathspec? | 18:15 |
zenkuro | re2 | 18:16 |
fungi | this may be the recent fedora re2 incompatibility someone was talking about | 18:16 |
corvus | zenkuro: then do both things: run the bindep command, then run "tox -r -e docs" | 18:16 |
corvus | fungi: oh? | 18:16 |
corvus | i'm unfamiliar with the issue | 18:16 |
fungi | i'll see if i can find details | 18:16 |
fungi | 2020-03-13 11:30:21 tobiash clarkb, ianw: fyi libre2 in latest fedora is incompatible with prebuilt wheel of fb-re2 | 18:17 |
fungi | (from scrollback in here) | 18:18 |
zenkuro | yah, I have fedora | 18:18 |
tobiash | In fedora fr-re2 must be force installed by source (I don't recall the command atm) | 18:19 |
corvus | --no-binary ? | 18:19 |
tobiash | corvus: I think that was it combined with a filter | 18:20 |
mnaser | fwiw zuul-helm does _not_ do zookeeper deployments | 18:20 |
corvus | --no-binary fb-re2 ? | 18:21 |
*** jamesmcarthur has quit IRC | 18:22 | |
zenkuro | hm... kooks like installation of re2 devel src solved the issue | 18:22 |
tristanC | mnaser: is the zookeeper dependencies coming from https://github.com/helm/charts/tree/master/incubator/zookeeper ? | 18:22 |
mnaser | tristanC: only for the "zuul-system" chart which is meant to be an integration point for now | 18:23 |
corvus | zenkuro: what commands did you run? | 18:23 |
mnaser | mordred: unrelated but i know you tinkered a lot with pyenv. did you find a way to get pyenv to install "latest" of a specific series | 18:24 |
mnaser | aka pyenv install 3.7 (noonedeadpunk dug in initially with no success | 18:24 |
*** bhavikdbavishi has quit IRC | 18:25 | |
tristanC | mnaser: hmm, how is the zuul-helm supposed to be used then? | 18:25 |
mordred | mnaser: nope. I do "pyenv install --list | grep '^ *3.7'" | 18:25 |
mordred | then look for the version with my eyeballs | 18:25 |
mordred | it's not idea | 18:25 |
mordred | ideal | 18:25 |
mnaser | tristanC: "bring your own zookeeper and db" | 18:25 |
*** jcapitao has quit IRC | 18:25 | |
tristanC | mnaser: oh, ok. then when zuul requires zk auth, then that would be 'bring your own zk tls too' | 18:27 |
mnaser | yep | 18:27 |
zenkuro | corvus: dnf install re2-devel | 18:30 |
corvus | zenkuro: ok good, that's already in bindep, so the bindep command i gave you earlier would have run that | 18:30 |
zenkuro | corvus: tox -r -e docs | 18:30 |
tobiash | corvus, zenkuro: I used this to fix the tox env: ".tox/py37/bin/pip install --force-reinstall --ignore-installed --no-binary :all: fb-re2" | 18:31 |
tobiash | (and installed re2-devel) | 18:31 |
corvus | tobiash, zenkuro: i just ran tox -e docs in a fedora:latest container after running "yum -y install $(bindep -b doc test)" and it worked without error | 18:31 |
zenkuro | but why docs require so much things? | 18:31 |
tobiash | corvus: maybe the package has been fixed in the meantime | 18:32 |
corvus | zenkuro: because it runs the zuul commands to collect the help output | 18:32 |
tobiash | at least there was a new release in february | 18:33 |
*** jamesmcarthur has joined #zuul | 18:35 | |
tobiash | corvus: oh, it looks like they just removed the prebuilt wheels from pypi: https://pypi.org/project/fb-re2/1.0.7/#files | 18:36 |
tobiash | that's also the reason the devel package is required now | 18:36 |
corvus | well that works :) | 18:38 |
tristanC | according to https://github.com/jetstack/cert-manager/pull/2643/files#diff-c210e0aac6b999044e3472381a445c4b , it seems like the keystore password needs to be set at cert-manager installation time | 18:44 |
*** jamesmcarthur has quit IRC | 18:51 | |
tristanC | and well, it seems like it's part of the crd yet, only the controller seems to be implemeneted. corvus: is there something else i can use beside pkcs12 or jks? | 18:59 |
zbr | ensure-tox: https://review.opendev.org/#/c/690057/ -- i really need it in order to be able to change how tox is installed | 19:01 |
tristanC | here is the Certificate CR I tried using a pkcs12 enabled cert-manager: http://paste.openstack.org/show/790752/ | 19:01 |
tristanC | corvus: i think that's a good reason why the zuul-operator is not using cert-manager ^ | 19:05 |
corvus | tristanC: pem-encoded pkcs8 with an encrypted key is an option | 19:10 |
tristanC | corvus: i'm not seing an option to provide an encryption key for pcks8, do you think that should be provided by the cert-manager? | 19:14 |
mnaser | so with adding the 'vexxhost' tenant -- i noticed zuul breaks out after the first missing required-project (rather than checking all of them *then* breaking out) | 19:15 |
mnaser | so its complanining about openstack/cinder *only* missing rather than all the others | 19:15 |
corvus | tristanC: probably worth a look at how the jks support handles it (it seems very likely that jks support would include a store password) | 19:15 |
mordred | corvus: where did we ever wind up with the "treat required projects as foreign" or whatever we were calling that potential change for making things like re-using repos like devstack easier? I'm pretty sure I should know the answer, but I'm not able to page it into my brain pellets | 19:16 |
corvus | mordred: needs more design: https://review.opendev.org/613143 | 19:17 |
mordred | ah yes! | 19:17 |
tristanC | corvus: reading https://github.com/jetstack/cert-manager/blob/master/cmd/controller/app/options/options.go#L322 , it seems like those (experimental) options are provided at cert-manager install time, L6597 of the yaml file | 19:17 |
mordred | mnaser: ^^ | 19:17 |
mordred | mnaser: idea being that you really shouldn't need to add _any_ of those projects to your tenant if you don't want to - assuming the design issues can be sorted out | 19:18 |
corvus | tristanC: oof, that is the wrong place for that | 19:25 |
corvus | tristanC: it should be per-keystore, so i'd expect it to be an attribute of the certificate | 19:25 |
corvus | tristanC: or, at the *very* least, a property of the issuer | 19:26 |
corvus | it's certainly not appropriate for it to be cluster-wide | 19:26 |
corvus | tristanC: what yaml file? | 19:27 |
corvus | i'm having trouble figuring out where --experimental-jks-password should be provided | 19:28 |
tristanC | corvus: https://github.com/jetstack/cert-manager/releases/download/v0.14.0/cert-manager.yaml L6597 I added: `- --experimental-issue-pkcs8` and `- --experimental-pkcs12-keystore-password=secretstorepassword` | 19:28 |
*** jamesmcarthur has joined #zuul | 19:28 | |
mnaser | mordred: yeah -- i think i'd be a little more comfortable fixing the for loop that breaks early :P | 19:28 |
corvus | tristanC: thanks, that clarifies. also, wow, that's an interesting way for that to happen. | 19:28 |
corvus | mnaser: i *think* having that evaluate the complete set of r-p would be feasible and reasonable; i can't think of a reason not to do it right now. | 19:29 |
mnaser | ok i found it | 19:30 |
mnaser | we're raising an exception when we hit an unknown project | 19:30 |
corvus | mnaser: in configloader.py | 19:30 |
corvus | yeah that | 19:30 |
mnaser | the messages zuul emits aren't really like.. an api right? | 19:30 |
mordred | seems collecting them and raising an exception with a larger error payload would be potentially nicer | 19:30 |
mnaser | ^^^ | 19:30 |
mnaser | so that sems reasonable to just start rewriting the message | 19:31 |
corvus | mnaser: nope; the ones that are are special exceptions | 19:31 |
corvus | yep | 19:31 |
* mordred afks for just a bit | 19:31 | |
* corvus gets lunch | 19:31 | |
*** harrymichal has joined #zuul | 19:34 | |
mnaser | just wrote something with tests now, testing locally | 19:36 |
* clarkb is back ish from lunch and dealing with kids | 19:41 | |
mnaser | cant run the tests locally because gear and select | 19:54 |
clarkb | mnaser: there is a chnage up to address that but still needs people on osx to test I think | 19:54 |
mnaser | clarkb: yeah i saw that and i saw it was also reverted too at some point, assuming it broke other things | 19:55 |
clarkb | mnaser: https://review.opendev.org/#/c/708267/ | 19:55 |
clarkb | yes this fixes the revert (or so we hope) | 19:55 |
mnaser | clarkb: i'll try to make time to try that out | 19:58 |
openstackgerrit | Mohammed Naser proposed zuul/zuul master: config: show error for all unknown projects https://review.opendev.org/713323 | 19:58 |
mnaser | corvus, mordred: should be sane ^ | 19:58 |
mordred | mnaser: you angered the gods of whitespace | 20:05 |
mnaser | NOT AGAIN | 20:05 |
mnaser | ugh | 20:07 |
openstackgerrit | Mohammed Naser proposed zuul/zuul master: config: show error for all unknown projects https://review.opendev.org/713323 | 20:07 |
*** jamesmcarthur has quit IRC | 20:29 | |
*** jamesmcarthur has joined #zuul | 20:30 | |
ianw | tristanC : if things are complex, you could write the config file in jinja2 and deploy it as a template too ... personally i see it as something small enough to be useful and put in the linked dnm project-config change to illustrate why | 20:33 |
ianw | if yourself and corvus like the idea of a parent structure more, can certainly implement that | 20:34 |
ianw | but if nobody thinks the general idea is useful, well, then that's that :) | 20:34 |
tristanC | ianw: i don't mind either way. I think there is value in being able to express hierarchy. In our case, we have fedora-30, fedora-31 and fedora-rawhide that are mostly identical beside the version name, and having parent image would remove the repetition. With a base image, we'll also get the global option you are proposing too. | 20:55 |
*** jamesmcarthur has quit IRC | 20:57 | |
ianw | tristanC: yeah, thanks for that suggestion it does feel more "zuul-onic"; i can write something up and see if it slots in neatly or if there's some horrific corner case i haven't thought of yet :) (my initial feeling is the former) | 20:59 |
tristanC | ianw: well we have to be careful about merging. we could override parent attribute with child, but perhaps for elements list we want to extend instead? | 21:01 |
ianw | yes, elements list i think we want to extend, as done in the original | 21:03 |
ianw | otherwise dict update() semantics | 21:03 |
*** jamesmcarthur has joined #zuul | 21:06 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: Add TLS support for ZooKeeper https://review.opendev.org/712531 | 21:08 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Use ZK TLS in quickstart https://review.opendev.org/712817 | 21:08 |
corvus | tristanC: ^ i think that should fix the errors that showed up in tests | 21:10 |
tristanC | corvus: i'll submit a zuul-operator update shortly | 21:12 |
tristanC | corvus: qq, it seems like using a generic name like `zk` works for client auth. Is a fqdn required for quorum ssl, or would it be possible to use a single server key for all zk pod? | 21:14 |
corvus | tristanC: it's required for quorum unless you turn off host verification (i tend to think we should leave it on if we can, but it's open to discussion if folks think it would be okay to turn it off -- mostly i worry it will look bad) | 21:15 |
corvus | (i guess it's probably the only way to perform authz on the quorum side, unless you used two different cas. so it's probably good to keep it enabled) | 21:16 |
clarkb | Shrews: left a couple notes/questions on https://review.opendev.org/#/c/712539/3 | 21:17 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: Add TLS configuration to ZooKeeper service https://review.opendev.org/712759 | 21:18 |
tristanC | corvus: alright, so that seems to work for the current single pod zk service ^ | 21:19 |
tristanC | now i wonder what it would take to add tls support to the existing zk operator or chart so that the operator doesn't have to worry about that | 21:20 |
openstackgerrit | Clark Boylan proposed zuul/nodepool master: Revert abitrary uid support https://review.opendev.org/713060 | 21:20 |
clarkb | mordred: corvus ianw ^ decided to updated that and keep that moving forward | 21:21 |
corvus | tristanC: should be pretty easy -- but if the zuul-operator is going to run zk internally, it should at least have a quorum of 3 | 21:22 |
tristanC | corvus: maybe not so easy. the zk helm chart manage replication using a single statefulset, thus if we want one secret per pod, then that might a more profond refactor | 21:26 |
tristanC | ftr, it is https://github.com/helm/charts/blob/master/incubator/zookeeper/templates/statefulset.yaml | 21:26 |
*** plaurin has quit IRC | 21:27 | |
tristanC | and that seems to be the case for the operator too: https://github.com/pravega/zookeeper-operator/blob/1c4e3f1908412e219efa630445ce003fdcae279e/pkg/zk/generators.go#L41 | 21:28 |
corvus | tristanC: how does it set myid? | 21:29 |
tristanC | corvus: https://github.com/pravega/zookeeper-operator/blob/master/docker/bin/zookeeperStart.sh#L33 where ORD is the last number of the hostname, e.g. zk-0 zk-1 ... | 21:32 |
tristanC | perhaps we could shove all the service cert inside a single secret, and use that zookeeperStart script to pick the right cert based on ORD too | 21:33 |
corvus | tristanC: i love how many "how do i do x in k8s?" questions are answered with "custom start script" :) | 21:36 |
corvus | tristanC: but yeah, if they've already gone down the path of parsing the hostname, then that could work. | 21:37 |
*** sshnaidm has quit IRC | 21:37 | |
corvus | tristanC: actually... | 21:38 |
corvus | tristanC: jks has a concept of an alias -- and i think zk uses that to ask the jks for the cert, so it might actually just work to just put all the certs in the same jks and give that to each host | 21:38 |
corvus | tristanC: i did not test that, but i could do that now without too much trouble if that sounds useful. | 21:39 |
corvus | tristanC: (and i'm sorry i didn't think of this earlier) | 21:39 |
*** jamesmcarthur has quit IRC | 21:41 | |
*** sshnaidm has joined #zuul | 21:44 | |
tristanC | corvus: tls support request submitted here: https://github.com/pravega/zookeeper-operator/issues/141 | 21:51 |
tristanC | mordred: would you mind approving https://review.opendev.org/#/c/712816 | 21:54 |
mordred | tristanC: +A | 21:54 |
tristanC | iirc tobiash manage zk in k8s locally, and perhaps it would be better if the zuul-operator does that too | 21:57 |
tristanC | but the fact that no zk deployment system seems to bother with tls or sasl is quite annoying | 21:58 |
*** jamesmcarthur has joined #zuul | 21:59 | |
corvus | tristanC: well, it's pretty new | 21:59 |
corvus | tristanC: i'm working on testing the single-keystore idea | 22:00 |
corvus | tristanC: it doesn't look like that's working. even though zk insists that the keystore alias for a cert match the subject of the cert, it doesn't actually pull out the cert which matches its own hostname. so when i build a keystore with 3 certs, all the quorum hosts present the first cert when they connect. :( | 22:04 |
corvus | that's a real missed opportunity | 22:04 |
corvus | so i think we're stuck with "provide a different cert file to each quorum host" | 22:04 |
corvus | this is what my keystore looked like: http://paste.openstack.org/show/790758/ | 22:05 |
corvus | but they all used zoo1 | 22:05 |
*** jamesmcarthur has quit IRC | 22:06 | |
tristanC | arg, too bad. makes me wonder why the keystore support multiple key :) | 22:06 |
corvus | tristanC: i think for this case, but i think the zk developers didn't take advantage of it | 22:08 |
corvus | tristanC: oh! i found that you *can* use unencrypted pkcs8 files | 22:17 |
corvus | tristanC: i think that means that we should be able to use the standard output from cert-manager without change | 22:18 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Don't use JKS with ZK https://review.opendev.org/713340 | 22:21 |
corvus | tristanC: ^ like that | 22:21 |
mordred | corvus: that seems WAY better | 22:32 |
corvus | yeah, it doesn't solve the additional complexity for the operator, but it at least reduces the problem to something that most people are familiar with ("here is a pem file for this server") | 22:33 |
corvus | hopefully it makes the cert-manager problem easier | 22:33 |
mordred | ++ | 22:33 |
corvus | ("for the operator" -> "for the zk operator") | 22:33 |
*** jamesmcarthur has joined #zuul | 22:36 | |
*** Defolos has quit IRC | 22:40 | |
*** jamesmcarthur has quit IRC | 22:42 | |
*** jamesmcarthur has joined #zuul | 22:42 | |
*** Defolos has joined #zuul | 22:42 | |
*** jamesmcarthur has quit IRC | 22:43 | |
*** jamesmcarthur has joined #zuul | 22:44 | |
tristanC | corvus: that looks promising, i updated the zk-operator issue. | 22:46 |
corvus | tristanC: cool; if it's not too hard for you to try, it may be worth asking cert-manager for a cert, then seeing if zk can start up with that (using a config like my latest change to quickstart) | 22:49 |
*** threestrands has joined #zuul | 22:51 | |
mnaser | i posted these over the weekend and they're blocking a change of mine: https://review.opendev.org/#/c/713107/ and https://review.opendev.org/#/c/713115/ if possible to have a look at them :) | 23:04 |
mnaser | (i have a depends-on so my change is pending on those) | 23:04 |
*** jamesmcarthur has quit IRC | 23:11 | |
*** jamesmcarthur has joined #zuul | 23:11 | |
openstackgerrit | Merged zuul/zuul master: config: show error for all unknown projects https://review.opendev.org/713323 | 23:15 |
*** hashar has quit IRC | 23:15 | |
*** jamesmcarthur has quit IRC | 23:20 | |
*** jamesmcarthur has joined #zuul | 23:36 | |
*** harrymichal has quit IRC | 23:39 | |
*** clayg has left #zuul | 23:49 | |
*** jamesmcarthur has quit IRC | 23:56 | |
*** jamesmcarthur has joined #zuul | 23:57 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!