Monday, 2020-03-16

*** tosky has quit IRC		00:14
*** jamesmcarthur has joined #zuul		00:23
openstackgerrit	Merged zuul/nodepool master: Stop using hacking for anything https://review.opendev.org/712781	00:42
*** armstrongs has joined #zuul		01:22
*** sshnaidm is now known as sshnaidm\|afk		01:31
*** armstrongs has quit IRC		01:31
openstackgerrit	Merged zuul/nodepool master: Add visual dividers for each image in Dockerfile https://review.opendev.org/713002	01:44
*** jamesmcarthur has quit IRC		02:09
*** swest has quit IRC		02:10
*** jamesmcarthur has joined #zuul		02:13
*** jamesmcarthur has quit IRC		02:18
*** jamesmcarthur has joined #zuul		02:20
*** swest has joined #zuul		02:24
openstackgerrit	Ian Wienand proposed zuul/nodepool master: Add diskimages-globals section for common configuration https://review.opendev.org/713157	02:48
*** jamesmcarthur has quit IRC		03:13
openstackgerrit	Ian Wienand proposed zuul/nodepool master: Add diskimages-globals section for common configuration https://review.opendev.org/713157	03:13
*** jamesmcarthur has joined #zuul		03:27
*** bhavikdbavishi has joined #zuul		03:33
*** jamesmcarthur has quit IRC		03:56
*** pleia2 has joined #zuul		04:11
*** jamesmcarthur has joined #zuul		04:37
openstackgerrit	Ian Wienand proposed zuul/nodepool master: Dockerfile: set SECLEVEL=1 for RAX dfw and ord region compatability https://review.opendev.org/713162	04:47
*** jamesmcarthur has quit IRC		04:57
openstackgerrit	Ian Wienand proposed zuul/nodepool master: Dockerfile: set SECLEVEL=1 for RAX dfw and ord region compatability https://review.opendev.org/713162	05:02
*** bolg has joined #zuul		05:13
*** evrardjp has quit IRC		05:35
*** evrardjp has joined #zuul		05:36
*** sanjayu_ has joined #zuul		05:42
openstackgerrit	Ian Wienand proposed zuul/nodepool master: Dockerfile: set SECLEVEL=1 for RAX dfw and ord region compatability https://review.opendev.org/713162	06:18
*** irclogbot_3 has quit IRC		06:29
*** dpawlik has joined #zuul		06:58
*** jamesmcarthur has joined #zuul		06:59
*** jamesmcarthur has quit IRC		07:03
*** bolg has quit IRC		07:25
*** irclogbot_0 has joined #zuul		07:30
*** NBorg has joined #zuul		08:06
*** dpawlik has quit IRC		08:07
*** dpawlik has joined #zuul		08:07
*** sugaar has joined #zuul		08:11
NBorg	Hello. Is there a suggested way to use tox (my problem is playbooks-syntax) from zuul-jobs if you use roles from another repo? Modify ANSIBLE_ROLES_PATH in tox.ini?	08:17
*** dpawlik has quit IRC		08:18
*** bolg has joined #zuul		08:22
*** dpawlik has joined #zuul		08:28
*** jcapitao has joined #zuul		08:32
*** jcapitao has quit IRC		08:34
*** jcapitao has joined #zuul		08:36
*** bolg has quit IRC		08:37
*** bolg has joined #zuul		08:38
*** jcapitao has quit IRC		08:42
*** jcapitao has joined #zuul		08:42
*** jpena\|off is now known as jpena		08:50
*** tosky has joined #zuul		09:07
*** sanjayu_ has quit IRC		09:22
*** arxcruz is now known as arxcruz\|rover		09:34
zbr	morning!	09:34
zbr	who can help me merge few bits? https://review.opendev.org/#/c/708642/	09:34
*** harrymichal has joined #zuul		09:37
AJaeger	zbr: I suggest to ask clarkb again, he had some comments and one of his was not addressed...	09:39
zbr	AJaeger: my impression was that was not an actionable comment.bindep/test-reqs do not change very often, but they can impact testing, so it makes sense to run when modified.	09:41
zbr	AJaeger: in fact the one I really care about is https://review.opendev.org/#/c/690057/	09:44
zbr	because over the weekend I was unable to fix some broken tox jobs on ansible zuul instance due to lack of this feature	09:45
zbr	the centos-7 image had preinstalled ancient tox, and ensure-tox is currently unable to upgrade it. paul was in PTO, so it was impossible to do anything.	09:46
zbr	probably he will fix the image today, but is much better to be able to workaround bugs in images.	09:47
AJaeger	zbr: leave a comment to his one ;)	09:49
zbr	already did	09:49
AJaeger	great	09:49
AJaeger	zbr: I think one point clarkb wanted to make is that the this is for the test job and only the bindep file in zuul-jobs. Just to double check that you are aware of it	09:50
AJaeger	zbr: I'll approve tomorrow if clarkb finds no time today	09:50
zbr	i am wondering if it is possible to validate an image before switching it on zuul, so we can avoid putting broken images.	09:51
zbr	i am still hoping to reach a point where a change made to the build image is tested like a normal job and last step is to try to use that image, so we know is not worse than previous one.	09:52
AJaeger	zbr: not today, I expect this would need major changes to nodepool and Zuul	09:52
zbr	this was my impression too, but we should not ignore it, i seen multiple issues related to this.	09:53
zbr	once we have this, it will be much easier to update images (more confidence)	09:53
*** saneax has joined #zuul		10:01
*** avass has joined #zuul		10:03
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: Replace occurences of '/' in nodename to something more path friendly https://review.opendev.org/713182	10:08
*** sshnaidm\|afk is now known as sshnaidm		10:08
avass	Soooo, someone just gave a node in nodeset the name something like 'abc/123'. And it looks like that's completely valid, except it causes some problem with the valdiate-host role	10:09
avass	And I can see how names like that could cause a lot of problems in other ways	10:09
*** hashar has joined #zuul		10:10
avass	But since it's valid in both zuul and ansible I think it should be fixed	10:11
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: Replace occurences of '/' in nodename to something more path friendly https://review.opendev.org/713182	10:16
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: Replace occurences of '/' in nodename to something more path friendly https://review.opendev.org/713182	10:18
openstackgerrit	Daniel Pawlik proposed zuul/zuul-jobs master: DNM - checking new images https://review.opendev.org/713183	10:22
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: Replace occurences of '/' in nodename to something more path friendly https://review.opendev.org/713182	10:24
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: Replace occurences of '/' in nodename to something more path friendly https://review.opendev.org/713182	10:28
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: Replace occurences of '/' in nodename to something more path friendly https://review.opendev.org/713182	10:30
*** hashar is now known as hasharLunch		11:09
*** rlandy has joined #zuul		11:30
*** Defolos has quit IRC		11:39
*** Defolos has joined #zuul		11:44
*** avass has quit IRC		11:50
*** avass has joined #zuul		11:58
*** jcapitao is now known as jcapitao_lunch		12:00
*** harrymichal has quit IRC		12:05
*** harrymichal has joined #zuul		12:06
*** hasharLunch has quit IRC		12:15
*** pabelanger has joined #zuul		12:37
*** NBorg has quit IRC		12:40
*** bhavikdbavishi has quit IRC		12:48
*** jpena is now known as jpena\|lunch		12:59
*** jcapitao_lunch is now known as jcapitao		13:08
*** hasharLunch has joined #zuul		13:22
*** hasharLunch is now known as hashar		13:23
openstackgerrit	Monty Taylor proposed zuul/nodepool master: Revert "Dockerfile: add support for arbritary uid" https://review.opendev.org/713060	13:26
openstackgerrit	Monty Taylor proposed zuul/zuul master: Remove fix-tox workaround for python3.5 https://review.opendev.org/712495	13:27
openstackgerrit	Monty Taylor proposed zuul/zuul master: Remove duplicate variables for tox jobs https://review.opendev.org/712496	13:27
*** dpawlik has quit IRC		13:29
*** avass has quit IRC		13:29
*** Defolos has quit IRC		13:36
*** NBorg has joined #zuul		13:36
*** Defolos has joined #zuul		13:38
*** erbarr has joined #zuul		13:38
*** jamesmcarthur has joined #zuul		13:42
*** jpena\|lunch is now known as jpena		13:50
*** zxiiro has joined #zuul		13:54
openstackgerrit	Merged zuul/nodepool master: Stop comparing hostnames to determine ownership https://review.opendev.org/713057	13:58
tristanC	zuul-maint: a small stack of 3 changes for the zuul-operator is waiting for review here: https://review.opendev.org/713138 . Those are little things i'm basing on a wip integration tests	13:59
corvus	tristanC: where's the wip integration tests?	14:02
tristanC	well it's still a wip, i meant my wip is on top of the stack	14:03
corvus	ok that clears it up :)	14:03
mordred	corvus: that's a fun issue from avass there in scrollback	14:04
tristanC	i basically have a role to setup static node provider and i'm loooking forward triggering jobs and testing the different parts like builds result, console stream, etc...	14:04
corvus	mordred: yeah -- i wonder what "valid in ansible" means?	14:04
corvus	tristanC: you can borrow a bunch of stuff from the quickstart for that	14:05
mordred	corvus: that's the second time in a week I've seen someone use a character in a zuul name that I'd never considered using - but which there's no specific reason not to use (mnaser used a : in a job name and my brain when "pow")	14:05
mordred	corvus: maybe you can put a / in an inventory hostname since it's a valid ansible char?	14:05
corvus	mordred: right, but it's not a valid dns name	14:05
mnaser	ya i kinda like ':' esp because it helps splitting different smaller 'slugs'	14:06
mordred	corvus: yah - but ansible inventories don't have anything to do with dns	14:06
corvus	mordred: ah, then 713182 is probably the way to go. i wonder if the prepare artifact role is okay?	14:07
mordred	corvus: good question	14:08
mordred	corvus: I have verified	14:08
mordred	corvus: foo/bar ansible_host=mordred-irc.inaugust.com	14:08
mordred	in a hosts file for ansible totally works with foo/bar being the ansible name of that hose	14:08
*** plaurin has joined #zuul		14:08
mordred	host	14:08
plaurin	Good morning irc people!	14:09
* tristanC waves		14:09
mnaser	o/	14:09
*** Goneri has joined #zuul		14:09
plaurin	zuul chugging nicely with happy developers since kubernetes stream output is working :)	14:10
corvus	mordred: qq on 713060	14:10
corvus	plaurin: \o/	14:10
mordred	plaurin: \o/	14:11
plaurin	I was wondering are there any plans for more interactions with the zuul ui? like triggering or killing jobs	14:11
mordred	corvus: I don't think the sudo setting works if we don't add a nodepool user	14:11
corvus	mordred: this is a good point	14:12
corvus	mordred: we actually do create a zuul user in the zuul containers	14:13
corvus	mordred: so maybe we should leave that part and just revert out the uid entrypoint bit?	14:13
tristanC	plaurin: there are plans for that indeed, for killing or enqueuing jobs there is https://review.opendev.org/#/c/643536/	14:14
plaurin	autohold / dequeue / enqueue, nice	14:15
plaurin	use those quite a lot	14:16
tristanC	plaurin: and you already reviewed https://review.opendev.org/635716 which adds manual trigger, but I think we need a proper spec for such use-case	14:16
*** Goneri has quit IRC		14:16
plaurin	ya I remember that one	14:16
plaurin	those upcoming features will make a big difference	14:17
plaurin	for me and my team anyways :)	14:18
*** Goneri has joined #zuul		14:19
tristanC	plaurin: fwiw, we also investigated a service to convert arbritary event like AMQP message to a project pipeline config with this project: https://pagure.io/software-factory/zuul-gateway	14:20
corvus	plaurin: zuul is primarily intended to integrate with code review systems so that developers mostly interact with it there (rather than with its own ui). what are the use cases that cause you to want to manually trigger and kill jobs?	14:22
mordred	corvus: ok. good point. I'll update it	14:23
mordred	corvus: I might need your eyeballs on a container job	14:23
mordred	corvus: https://zuul.opendev.org/t/openstack/build/565c64362194492fa0c603c3396d053c/console	14:23
mordred	corvus: it uploaded the image but failed uploading the tag	14:24
mordred	with an access denied	14:24
corvus	mordred: that sounds like an internet hiccup	14:24
mordred	yeah? awesome	14:25
mordred	should I just reapprove the change do you think?	14:25
corvus	mordred: yeah	14:25
corvus	mordred: and i guess we should loop that?	14:25
mordred	corvus: yeah - probably so	14:26
mordred	corvus: if reapprove works, let's add a loop	14:26
mordred	if not - we can investigate more	14:26
*** harrymichal has quit IRC		14:26
plaurin	We're doing UUT testing (units under test), BBD development. We run hardware tests and firmware upgrade/downgrade tests on actual hardware. Reality is that hardware and custom firmware crashes machines a lot, making many of our zuul job wait for unit connections, etc... We of course need proper failure detection but this is complex in the case of hardware and firmware issues and we do get a lot of 'stuck' scripts for long running job	14:26
plaurin	s (4h+ long jobs).	14:26
plaurin	happens a lot that we want to stop something instead of waiting for 4+ hours for it to finish	14:27
mordred	so thus why you'd like to be able to manually cancel a job?	14:27
plaurin	and then trigger it again after we fix hardware	14:27
openstackgerrit	Merged zuul/zuul master: Declare support for Python3.8 https://review.opendev.org/712489	14:27
plaurin	yeah, I use the zuul dequeue and enqueue quite a lot	14:27
corvus	plaurin: which code review system do you use? (gerrit, github, ...?)	14:28
plaurin	gerrit	14:28
plaurin	we also use a lot of cron ' timer ' zuul jobs, running qualification tests 24h	14:29
plaurin	we have to kill those from time to time after hardware failure, stress tests, etc	14:29
corvus	plaurin: i do not recommend it now, but in the future, we should have good support for the checks plugin which has a 're-run' button right in gerrit. so that could help with that part of it. mhu's work would still be the best answer for the dequeue part of it.	14:29
mordred	yeah - in the mean time - we use a "recheck" keyword in comments in gerrit to allow re-triggering jobs	14:30
plaurin	yeah we use recheck a lot too when we can. I set recheck and rebuild	14:30
mordred	cool. so yeah - mhu's work is your answer for the other thing	14:30
corvus	plaurin: thanks for the info, it's good to hear how it's being used :)	14:31
plaurin	sure np :)	14:31
mordred	oh - there's another trick you can use	14:31
mordred	plaurin: if you abandon the change, zuul should cancel its jobs	14:32
mordred	and then kick them off again when you restore it	14:32
plaurin	I have a stuck job right now lol let me try this	14:32
corvus	or if you upload a new patchset (you could just change the commit message right in gerrit), it will cancel and restart	14:32
plaurin	yes new patchset I am aware of	14:33
plaurin	ha, abandon trick works cool	14:34
corvus	plaurin: what version of gerrit are you running?	14:34
corvus	mordred, plaurin: i just thought of a variant of the abandon trick: if we add support for gerrit tags to pipeline requirements (which we should anyway) you could have a pipeline which "rejects: tag: suppress-ci" and you could tag a change with that and it would cause zuul to cancel jobs and refuse to enqueue it.	14:36
plaurin	2.16.13	14:37
plaurin	yes that would be nice for the tag	14:37
corvus	(ftr, i don't think any of this is a better ux than mhu's web dequeue; these are just workarounds until that's ready)	14:37
plaurin	+1	14:38
fungi	from a "jobs are stuck and have to wait for them to time out or abort manually" perspective, how about a watchdog? maybe an option for a job which says it should be prematurely stopped if the console stream goes silent for x minutes, or something like that	14:41
corvus	clarkb, mnaser: you may be interested in giving https://review.opendev.org/710649 and children a quick look -- dhall updates to the operator	14:42
plaurin	fungi we have couple of BBD hardware stress test cases that can be waiting silently for 1 to 2 hours, like a CPU stress test for example	14:44
fungi	yeah, just wondering how you normally identify that it's not going to complete which leads you to end it earlier, and whether that's something which could signal zuul to do the same instead	14:45
plaurin	I depend a lot on devs designing proper failure cases and detection :(	14:46
plaurin	You bet we had a lot of "It works on my machine but not on zuul, so zuul is broken' (...lol)	14:46
fungi	so then... what leads you to stop a job? you said "we do get a lot of 'stuck' scripts for long running jobs ... happens a lot that we want to stop something instead of waiting"	14:47
fungi	how do you make that determination?	14:47
plaurin	It's kinda hard to know and usually unexpected stuff. Many times it's the devs who ask me to stop them because they see something not going well or taking more time than usual	14:48
plaurin	most of my job have 8 hours timeout	14:48
fungi	ahh, okay, just curious if it's a general set of cases which zuul could identify or get notification of automatically	14:48
plaurin	also their test builtin timeout are badly designed but I have no control over this hehe :(	14:48
plaurin	I mean their test timeout built in the code	14:49
corvus	and if there's some way to detect it (by probing or other inspection of the test resource), maybe we could have ansible do that, and tell zuul the job has failed	14:50
plaurin	imho inspection of the status of the test resource should be part of the dev's BDD	14:50
plaurin	they have so many tests that kills and re-enable the network layer that would be hard to follow	14:51
plaurin	soo many unexpected variables hehe	14:51
plaurin	by having trigger 'buttons' I can react quickly for them manually	14:52
plaurin	I guess I could leverage the zuul api too	14:52
Shrews	mordred: what's driving the nodepool reverts?	14:56
plaurin	a button to "disable" a job would also be useful for me. For my 24/7 cron jobs. Sometime I have to suspend them for hardware maintenance like bricked machines, etc	14:57
mordred	Shrews: we reverted the similar code in zuul already because it turns out it's not actually providing the value it purports to - but more importantly it actually broke things in the nodepool-builder image when running under podman	14:58
corvus	Shrews: see https://github.com/RHsyseng/container-rhel-examples/issues/103 for more info	14:59
Shrews	<sung to batman theme> nuh nuh, nuh nuh, nuh nuh, nuh nuh, podman!	15:00
Shrews	you're welcome for that	15:00
openstackgerrit	Monty Taylor proposed zuul/nodepool master: Revert abitrary uid support https://review.opendev.org/713060	15:00
Shrews	corvus: oh, yeah. that's not good	15:02
fungi	holy security holes, podman! now i have the theme stuck in my head, thanks	15:02
Shrews	>:)	15:03
fungi	pow	15:03
clarkb	corvus: left a response to one of your comments on https://review.opendev.org/#/c/713157/2 if you have a minute	15:14
clarkb	mordred: and left a thought on https://review.opendev.org/#/c/713162/3 in response to one of your ideas. I doubt that will work though	15:15
clarkb	ah yup fungi has responded with similar doubt	15:16
corvus	clarkb: wait, elements are not ordered?	15:16
clarkb	corvus: not on the command line. They are internally ordered	15:16
corvus	huh, how about that	15:16
mnaser	i don't want to be a pain but i think maybe we should have opendev/nodepool and zuul/nodepool if we're going to have provider-specific hacks/workarounds	15:20
corvus	mnaser: regarding?	15:20
mnaser	corvus: like the /etc/ssl/openssl.cnf change in the zuul/nodepool change discussed above	15:20
clarkb	mnaser: fwiw that ssl change is in line with other distros	15:21
clarkb	debian is being weird there	15:21
clarkb	oh unless this is the new one	15:21
clarkb	for the new one ya, this gets weird which is why I was trying to brainstorm config based fixes	15:21
clarkb	(I think monty was tring to do that too)	15:21
mnaser	"Note that CAs have stopped issuing certificates that didn't meet those requirements in January 2015, and since January 2017 all valid CA certificates should meet those requirements." -- i'm curious how we'd run into that .. unless they have a really really old cert	15:22
*** Goneri has quit IRC		15:22
clarkb	mnaser: its easy enough to check with openssl s_client /me looks	15:23
mnaser	they both seem recent	15:23
clarkb	and are issued by digicert but sure enough doing sha1	15:24
mordred	mnaser: I agree - I'd rather avoid putting ssl workarounds for a provider into the image and would much prefer figuring out a way to have it be in the provider-specific config	15:24
fungi	yeah, i think step 1 is to try interacting with rackspace via osc/sdk from a recent debian and see if there's a way to make that work without editing /etc/ssl/openssl.cnf	15:25
clarkb	ok its the peer signing digest not the cert itself	15:25
clarkb	this is what is used during the handshake	15:25
clarkb	which is why they can have a recent cert that isn't rejected, but still fail because sha1	15:25
corvus	mnaser: in principle, i agree; but i also don't think "use the default upstream openssl.cnf" file is too out there; if this can happen in sdk, that would be best, but if not, this seems low risk/cost.	15:26
fungi	as a reminder, this is what debian's doing to the upstream openssl.cnf when packaging it for redistribution: https://salsa.debian.org/debian/openssl/-/blob/debian/unstable/debian/patches/Set-systemwide-default-settings-for-libssl-users.patch	15:27
clarkb	"In TLS 1.2 the client can send which digest algorithms are supported for the handshake. If the client does not offer SHA-1 the server should not use SHA-1, which means that it might use other algorithms offered by the client or that the connection fails."	15:27
clarkb	from stackexchange so unsure of how accurate that is, but maybe we can force the remote to use sha2 by not claiming to accept sha1	15:28
*** sshnaidm is now known as sshnaidm\|afk		15:28
* clarkb trying to see how to do that with s_client now		15:28
fungi	i doubt the handshake is claiming to accept sha1, but set_ciphers() should allow us to remove it if so	15:29
*** Goneri has joined #zuul		15:29
mordred	corvus: well - maybe the answer might want to be "use the upstream openssl conf file" instead of "sed a particular setting"	15:32
clarkb	-sigalgs SHA256 is apparently not correct	15:32
corvus	mordred: indeed	15:34
clarkb	ah I need algorithm+hash	15:34
clarkb	`openssl s_client -host dfw.images.api.rackspacecloud.com -port 443 -sigalgs RSA+SHA256` continues to use sha1	15:34
tristanC	ianw: about https://review.opendev.org/713157 , have you thought about using dhall to manage your nodepool.yaml configuration? we are giving it a try and working on dhall-nodepool binding: https://tree.taiga.io/project/morucci-software-factory/us/3378	15:36
clarkb	tristanC: I'm not sue I could +2 such a change in opendev	15:36
corvus	yes please, please keep dhall away from users	15:36
corvus	i still don't understand the operator	15:37
corvus	tristanC: if there is a problem with the zuul or nodepool configuration language, maybe we could work together on making it better?	15:37
corvus	rather that replacing it with something that's very hard for users to understand?	15:38
corvus	(i thought with our discussions on the operator we were in agreement that it was worth exploring for a system like the operator where programmatic manipulation of the config files was necessary, but that it was not suitable for direct use by end-users, which is what we're talking about here)	15:39
clarkb	does openssl itself disable sha1 peer signatures by default?	15:39
mnaser	clarkb: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=912759 i think the same thing is happening here	15:39
openstack	Debian bug 912759 in openssl ""wrong signature type" with working websites" [Important,Open]	15:39
mnaser	which links to https://github.com/openssl/openssl/issues/7126	15:40
clarkb	mnaser: probably. Though I'm trying to gauge risk and if debian is being over cautious here	15:40
clarkb	if openssl has decided this isn't valid we've got more leverage with the server side too	15:42
clarkb	because they'll break pretty broadly	15:42
tristanC	corvus: iiuc, either we add abstraction in the runtime like nodepool and zuul to express complex things, like `use this env-vars on all images because it is declared in this new diskimages-globals option`, or either we keep the configuration simple (or even simpler as we could remove zuul project-template too) by moving the abstraction in the configuration language directly	15:42
tristanC	corvus: i am happy with either solution, though i prefer the later as it works for any runtimes	15:43
clarkb	mnaser: et al https://www.openssl.org/docs/man1.1.1/man3/SSL_CTX_set_security_level.html seems to say that seclevel 1 is the default	15:45
tristanC	and both solution are not incompatible, for the issue ianw was having about setting the zuul public key on all images, that something you can do locally in opendev system-config using a dhall-to-yaml step	15:46
mordred	clarkb, mnaser, corvus: reading the above it seems to me like debian is pushing a bit too aggressively here with their config file changes. why don't we just install the upstream openssl conf?	15:46
corvus	tristanC: i don't think the other zuul maintainers are in favor of the second option, so let's do what we've been doing for the past few years: slowly and carefully evolve the config syntax as needed.	15:46
clarkb	and even specifies that higher than 1 is likely to cause many problems	15:46
mordred	yeah	15:46
clarkb	mordred: ya I think that is what I'm leaning towards especialyl since openssl explicitly warns against doing what debian does	15:46
mordred	yeah	15:46
mordred	should we just do it in the python-base image?	15:46
clarkb	note master says the same thing https://www.openssl.org/docs/manmaster/man3/SSL_CTX_set_security_level.html	15:47
clarkb	mordred: thats likely to reduce future debugging headaches as more things land on that image	15:47
clarkb	(though we may be able to get away with stricter rules in specific cases)	15:48
corvus	tristanC: note that we learned a lot of things from the zuul syntax that we could probably take advantage of if we reworked nodepool; we never made as much of a clean break with nodepool as we did for zuul in v3	15:48
mnaser	I’m indifferent about sed or replacing the whole thing after doing this small tad of research	15:48
mordred	any thoughts on the best way to do that? should we just download a current copy of the upstream file into system-config? or do a build-time curl?	15:48
mnaser	It seems like the right approach however imho	15:48
clarkb	mnaser: I would vendor	15:48
clarkb	er mordred ^	15:49
mordred	clarkb: kk	15:49
mordred	anybody got a link to the upstream file handy?	15:49
tristanC	corvus: well for nodepool, the issue we are trying to solve with dhall is assigning labels and diskimages accross different provider and launcher	15:49
zbr	how do I query zuul about all builds that run in the last 7 days? apparently the api has a count limit but no way to sort the results	15:49
zbr	and clearly order is neither start or end time	15:49
*** jcapitao is now known as jcapitao_lunch		15:50
corvus	tristanC: that sounds like something that could drive a spec for a new config file format.	15:50
*** jcapitao_lunch is now known as jcapitao_afk		15:50
clarkb	mordred: https://github.com/openssl/openssl/blob/master/apps/openssl.cnf I think that is it	15:51
*** bolg has quit IRC		15:51
clarkb	I don't know if that will work as pure drop in though (possible paths need changing/)	15:51
*** bolg has joined #zuul		15:51
tristanC	corvus: iirc, mordred suggested that nodepool configuration could be managed like zuul, e.g. in config projects.	15:51
mordred	clarkb: maybe we should just apply a reverse of the patch fungi linked	15:52
mordred	clarkb: actually. why don't I move this convo to opendev	15:53
tristanC	but i guess before that, a much needed improvement would be to have a single nodepool configuration that could be used by many launcher and builder service	15:53
clarkb	tristanC: isn't that how it works already?	15:53
clarkb	I guess not with launchers	15:53
*** bolg has quit IRC		15:53
clarkb	(builders kinda need to be set up that way as we discovered recently though)	15:54
corvus	tristanC: note that one of the main tasks of zuul-operator is supposed to be to take the nodepool launcher config and split it so there is one pod for each launcher; that's not implemented yet, is it?	15:54
corvus	clarkb: not really	15:54
tristanC	corvus: right, the zuul-operator could do that, but wouldn't it better if nodepool support it natively?	15:54
corvus	clarkb: builders are intended to have the same config and act as a cluster; it's only when you split architectures do they need different configs.	15:54
tristanC	(and it's not implemented yet, i don't think the launcher service is even activated yet)	15:54
corvus	tristanC: well, the operator has to decide how many launcher pods to run anyway	15:54
corvus	(or, someone could propose a new nodepool algorithm that lets launchers act as a cluster)	15:55
corvus	tristanC: let's see if we can wrap up the work on zk auth/tls?	15:58
corvus	tristanC: it looks like my zuul + quickstart patches are failing tests, i'll fix them up. but they should generally be working. you suggested that you might want to just drop the sasl support and just use x509 for authn/z. do you still want to do that?	15:59
tristanC	corvus: i think that cluster algorithm would help a lot. then back to the initial new diskimages-globals option, not sure if it's ok to revisit now, but i would suggest to add a more generic parent inheritence system similar to what zuul do with jobs	15:59
corvus	tristanC: i agree	15:59
corvus	about both	15:59
corvus	tristanC: i don't think there's any rush on diskimages-globals, feel free to -1 that for more design	15:59
tristanC	alright, let me drop a comment and then i'll look into zk auth	16:00
corvus	(clearly, we can just keep copying dicts around until we're happy with something better)	16:00
corvus	clarkb, mnaser, fungi, mordred, SpamapS (if you're around): what do you think about using tls certs (signed by a private ca) for zookeeper (as authn and authz) and omitting the sasl password support?	16:03
clarkb	corvus: that should be fine	16:03
fungi	seems light it ought to work	16:04
fungi	s/light/like/	16:04
*** saneax has quit IRC		16:04
mordred	seems fine to me	16:04
fungi	how easy is it to specify a ca to trust in zk?	16:05
fungi	do you have to alter the trust set in the jvm?	16:05
tristanC	corvus: assuming tls can prevent anon connection, then i'm happy to abandon the sasl support	16:05
mordred	fungi: well, we have to do that already for tls connections for zk anyway - so basically with tls support we've done all the hard bits of config - so I think the thought is we might as well just use that for auth?	16:06
mnaser	corvus: from someone who waves around the world k8s a lot, i'd probably prefer that cause i can use something like cert manager to create a ca for the deployment and then manage all the certs distribution	16:06
*** bhavikdbavishi has joined #zuul		16:08
corvus	tristanC: btw, zk acls also support x509, so if we wanted, we could keep the default_acl stuff; we wouldn't have to require it, but it could be an option...	16:08
corvus	mnaser: yeah, tristanC started looking at cert-manager	16:09
corvus	mnaser: i wrote a shell script that does the equivalent for non-cert-manager environments. i think cert-manager supports everything the shell script does, so it's hopefully mostly a matter of "porting" it.	16:09
*** hashar has quit IRC		16:12
tristanC	corvus: mnaser: i had a quick look and i couldn't find how it could be deployed in a the same namespace as the zuul-operator (since i didn't want to bother with a cluster wide configuration for our self-signed cert only used by control-plane)	16:15
tristanC	and i wonder if cert-manager is really worth the trouble for zk and gearman tls, since they are self-signed	16:16
corvus	tristanC: cert-manager defines CRDs, so it needs to be installed by an admin in the cluster anyway. is that a problem?	16:18
*** harrymichal has joined #zuul		16:21
SpamapS	corvus: +1, it's the better option. TBH, I'm also starting to wonder if Zuul 6 should be "the etcd3 migration". ZK seems to be losing its community rapidly. :-/	16:22
fungi	up side is that zk server certs don't need to be self-signed if we're already setting up a ca to sign the peer certs for auth	16:22
fungi	we could sign the server certs with the same ca and then peers have the option of validating the server certs	16:23
corvus	SpamapS: ack; that'll be a good time to re-evaluate	16:23
corvus	fungi: yeah, all of these are actually "private CA-signed"; i think tristanC mistyped	16:25
corvus	(nothing is self-signed, zk validates the signature of every cert, server-side and client-side)	16:25
fungi	oh, okay, thanks for clarifying	16:26
fungi	and yes, i have nothing against self-signed server certs (provided appropriate precautions are taken to establish the server's identity some other way), but if there's already a private ca involved then there's little reason for self-signing	16:27
tristanC	corvus: you don't need to be admin to defines CRD, the zuul-operator can (and is) installed locally without being cluster admin	16:28
mnaser	corvus, tristanC: i mean in my case, because of zuul-helm, i can probably add support to generate a ca/etc and point users at "make sure cert-manager is installed"	16:29
mnaser	so it just lets you use the crds opt-in	16:29
tristanC	fungi: corvus: i meant the ca is self-signed. or perhaps i don't actually understand tls :)	16:29
*** jamesmcarthur has quit IRC		16:30
*** harrymichal has quit IRC		16:30
tristanC	mnaser: i think that should work yes. i don't know how that would be expressed in zuul-helm though	16:31
*** jamesmcarthur has joined #zuul		16:31
corvus	tristanC: yes, the ca is self-signed; the ca signs the certs; the server and client trust the (self-signed) ca.	16:31
tristanC	corvus: in that case, i don't think we need any cluster wide cert-manager right? the (self-signed) ca would only be used by zuul services	16:31
tristanC	used internally*	16:32
corvus	tristanC: yeah, it's not about that	16:32
corvus	didn't we talk about this on friday?	16:32
tristanC	i think so, then i looked at using https://cert-manager.io/docs/configuration/selfsigned/	16:32
corvus	yes i looked at that too, that's how you make the ca.	16:33
corvus	i don't understand the resistance to using cert-manager?	16:33
tristanC	but when installing cert-manager, it created a new namespace and required cluster wide things that didn't make sense to me for zuul ca	16:33
fungi	tristanC: right, any ca which is the root of trust may as well sign itself (though they can be cross-signed too). the point of the ca cert is that you can implicitly trust it through out of band distribution	16:33
fungi	so it doesn't ultimately matter where its signature came from	16:33
fungi	as opposed to independently distributing a bunch of self-signed server and even client certs	16:34
tristanC	corvus: the resistance for me is that it's an extra requirements, and it seems like it requires cluster admin privilege.	16:35
fungi	a ca gives you the ability to only have to solve distributing one cert	16:35
corvus	tristanC: i guess i don't see those as objectionable. being able to use the standard k8s way of managing a CA seems worth it to me. it's one extra line in the installation step to install it.	16:38
*** jamesmcarthur has quit IRC		16:41
*** jamesmcarthur has joined #zuul		16:42
corvus	mnaser: you seem to think cert-manager is worthwhile, what would you recommend?	16:42
mnaser	corvus: i mean from a zuul perspective, it shouldn't really care what you end up doing. from an operator/helm perspective, i would think it would be nice to have something like certSource: cert-manager or something along those lines	16:43
mnaser	and then that way it can generate a ca and certs off of it	16:43
mnaser	and the neat thing is they all end up in k8s secrets which you can just mount into the container/env variables/whatever way you end up manipulating it	16:44
mnaser	probably just mount it in /etc/zuul/tls and point configs towards it	16:44
tristanC	corvus: here is a cert-manager installation log http://paste.openstack.org/show/790745/ , used memory went from 948MB to 1.1GB, or about 15% increase in usage. i mean that's not too bad, but still...	16:45
corvus	i feel like if memory efficiency is a concern, maybe k8s isn't the right system.	16:45
fungi	any idea what's using the extra memory? a ca shouldn't need to be a persistent service unless you're also providing a software hsm to automatically handle signing requests or something	16:46
fungi	s/software hsm/virtual hsm/	16:46
corvus	fungi: there's a daemon to watch for cert requests and process them	16:46
corvus	==pod	16:46
mnaser	also	16:47
mnaser	cainjector and webhook is running in there	16:47
mnaser	not needed for this specific use case	16:47
clarkb	fungi: think of it like a whole new api service that updates a ca	16:47
mnaser	its also nice because it can talk to ACME issuers or whatnot	16:47
mnaser	or something even like vault	16:48
mordred	yeah - I mean - k8s isn't exactly the system you use to run the minimum number of daemons	16:48
fungi	yep, got it. so basically acting like an hsm	16:48
fungi	not just distributing a static ca cert	16:48
mordred	it's a system you run to remove the cost of running the daemons you need so that you can reuse work from other people without worrying TOO much about things like "oh, but then we'd need to run another service"	16:49
tristanC	corvus: mordred: well i do care about memory efficienty. but that's not very relevant here, unfortunately.	16:49
corvus	tristanC: i think mnaser has outlined why using cert-manager would be valuable. if you think that reducing the dependency set is more valuable than that, then i guess we could just handle it internally (or support both).	16:50
mnaser	i mean again i dont think this should be a hard dependency	16:50
mnaser	but its a "nice to have if i already have cert-manager deployed"	16:50
corvus	i just want to point out, i'm the one saying "we should use the k8s thing" and tristanC is saying "we should do it ourselves" :)	16:51
mnaser	which many clusters will end up having anyways	16:51
corvus	mnaser: that sounds like a vote for "handle it internally (or support both)"	16:51
corvus	tristanC: so if you want to handle it internally, where will the CA live? a persistent volume for the operator pod?	16:52
fungi	yeah, it's a good point, the overhead from adding the kubernetes certmanager is only overhead if you don't already need it for something else	16:52
mordred	I could see support both - although I could also see just doing cert-manager for a start and then adding "handle it ourselves" as a followup	16:52
mnaser	i haven't followed the tls stuff lately, is it "tls only" or "tls if you want it"	16:52
mordred	like - my main worry with "handle it ourselves" is that we're then reimplementing cert-manager	16:52
tristanC	i meant that i'm happy to look into cert-manager, but i worry it's oversized for what we need. also, isn't the jks still an experimental feature that was released 11 day ago?	16:52
mnaser	because in that case i would be on the side of "you only get tls if you have cert-manager in your cluster .. or you provide your own certs"	16:52
mnaser	the pattern used in things like ingress controllers is that you point towards a tls secret	16:53
tristanC	mordred: isn't zk-ca.sh already a cert-manager reimplementation?	16:53
mnaser	and then the cert-manager would automatically generate certs for that	16:53
corvus	mnaser: i think by zuul v4 we're going to want to require authentication, and it looks like tls may be the way to do that, so probably zuulv4 will require tls	16:54
corvus	it will be optional in v3	16:54
*** hashar has joined #zuul		16:54
mnaser	if it's required, then i think the zuul operator should accept a tls field which points towards a secret in k8s that has a ca/cert/key	16:54
tristanC	corvus: when adding or removing zk service, shouldn't the cluster be restarted anyway?	16:55
mnaser	how that tls secret is generated is up to the user	16:55
corvus	tristanC: no, latest zk can do that live	16:55
tristanC	mnaser: i disagree, why the user should have to worry about genearting zk tls?	16:55
mnaser	well, because they might want to copy paste it from some really complicated internal CA they use	16:56
mordred	I mean - cert-manager seems fairly common across k8s installs these days, right mnaser?	16:56
tristanC	beside it's not such a simple process, you need a java keystore thingy that needs to be base64 encoded	16:56
mnaser	also, the ingress is a good example of this	16:56
mnaser	it doesnt generate the certs, it points towards a tls secret	16:56
mnaser	how that secret is generated can either be the user (using cert-manager or manually creating the secret)	16:56
mnaser	that way you're covering all your bases	16:57
mnaser	a secret that's of type tls is very standard	16:57
fungi	there will absolutely be deployments in organizations where they need to supply their own certificates and trust without depending on any integrated ca/cert management automation, due to regulations compliance	16:57
tristanC	mnaser: to that token, we should also drop gearman tls generation from the operator, and that makes using zuul crd a bit more complicated as you need to provide more and more configuration files	16:57
mordred	so - I think we might be talking across each other here	16:58
fungi	likely	16:59
mordred	for the purposes of intra-zuul communication we need certs signed by a CA - and that CA should _not_ be a CA shared with other things (like some corporate CA) because it's being used for client-auth	16:59
mordred	one of the ways we can accomplish this in teh zuul operator is to have the zuul operator use cert-manager to do the work, which will generate the k8s secrets which it will then use	16:59
corvus	authz even	16:59
mordred	one of the ways is for the operator to use something like zk-ca.sh to generate k8s secrets which could be used by the operator to config things	17:00
fungi	right, while it shouldn't be shared with a general corporate ca, it may need its dedicated ca managed by some outside system to be able to satisfy compliance requirements	17:00
mordred	and one of teh ways is provide config settings that the user can use to specify a set of k8s secrets that user would like to be used for the purpose and have the operator not generate them	17:01
corvus	fungi, mordred: (this case is why it may still be useful to keep the acl support in zuul+nodepool -- in that case one could use a shared corporate CA, but set acls so that only the zuul client certs have authz)	17:01
mordred	as this is for intra-service comm - there are almost no use cases where the third is desirable, so I think for now let's just ignore it	17:01
fungi	it may not make sense, but there are some industry regulations which demand certain properties for any certificate authority operated within the organization	17:01
mordred	(we can always add config overrides later if we find a use case)	17:01
mordred	but I thnik the main thing we're discussing now is: "should the zuul operator use cert-manager to generate, or zk-ca.sh to generate - or both"	17:02
fungi	but yes, so long as we still leave an out to supply the certificates on your own manually, i think that's covered	17:02
tristanC	oh right, having an optional, user-supplied, secret for tls files wfm. though it's not specified in the zuul-operator spec	17:03
mordred	tristanC: so you'd be happy having the zuul-operator use cert-manager if one is there and no secrets are specified and not having a different auto-generated method?	17:04
fungi	for the users who aren't hamstrung by draconian regulations compliance requirements, i think whatever provides the most seamless and hand-off solution while maintaining reasonable levels of security is preferred	17:04
fungi	even if it drags in a dependency and consumes a bit more system resources in the process	17:05
tristanC	mordred: that sounds like an implementation detail. in one case we need to require a new cert-manager dependencies, in the other we use the existing zk-ca.sh. i picked the later because i plan to use it outside of kubernetes too	17:05
corvus	it does sound like it's 3 options: 1) manual 2) cert-manager 3) zk-ca.sh; all of them would produce a secret with cert/key/ca files which will be mounted into the pods and specified in the service configs. they just produce those secrets different ways.	17:06
fungi	1 covers all cases but is not very happymaking for people who just want it to work and don't care how the certs are managed. 2 seems good for the second case	17:08
fungi	i guess 3 would be as well, but then if someone already has kubernetes cert-manager for other reasons they've effectively got two different ca managers going in the environment	17:08
corvus	3) zk-ca.sh is the simplest case -- it handles everything behind the scenes with no extra dependencies; 2) requires more of the cluster administrator, but may integrate better with existing k8s practices; 1) is the option for people who need it for $reasons.	17:09
tristanC	corvus: which could be expressed in the CRD input... for each tls configuration, we could have an `Optional <UserSecret \| use-cert-manager>`, and if that's not set, we generate the cert locally using zk-ca	17:09
corvus	i'm pretty sure if we don't start with cert-manager, it'll never be implemented in the operator though.	17:10
tristanC	or if we don't want to support running without cert-manager, we drop the Optional	17:10
corvus	the thing i'm worried about is mnaser saying "i want to use cert-manager" and adding it to zuul-helm, and tristanC saying "i don't want to use cert-manager" and not adding it to the operator.	17:11
mnaser	corvus: for what it's worth, i was thinking cert-manager would just be opt-in, otherwise you have to supply the secret as values	17:11
tristanC	mnaser: what changes are needed to add cert-manager to zuul-helm?	17:11
mnaser	nothing, just adding one more file into templates/ to generate the ca+certs	17:12
*** jcapitao_afk is now known as jcapitao		17:12
corvus	mnaser: right, so options #1 and #2 for zuul-helm.	17:12
tristanC	corvus: i'm not saying we shouldn't add it to the operator, it's just that will take more time than using the existing zk-ca.sh	17:12
mnaser	yes, i do not want to have people rely on external things. step #1 is a secret that's manually generated using provided values, step #2 is adding k8s cr's to generate those (if the user wants)	17:13
tristanC	do we know if the cert-manager can even be used to satisfy zookeeper jks secretstore thing? :)	17:13
corvus	i do not know that yet	17:14
tristanC	corvus: wait, perhaps that if we support user provided cert in the zuul-operator, then supporting the cert-manager is just a matter of adding more CR to the deployment recipe, and that could be transparent for the operator?	17:14
mnaser	what tristanC said is exactly what i've been pushing for ^	17:15
mnaser	all cert-manager does is create a 'tls' secret inside k8s	17:15
tristanC	mnaser: wouldn't that be the same for zuul-helm, that's actually transparent for the chart too right?	17:15
corvus	we need a cert for each zk server	17:15
mnaser	so as long as we consume a 'tls' secret inside k8s, that's all we need to essentially do	17:15
tristanC	corvus: well that's ok, we can have a List UserSecret	17:16
corvus	i was assuming the operator would need to create a CertRequest CR for each zk pod it created	17:16
tristanC	corvus: that's still up for discussion, but i think we need a user controlled `count` attribute to the zookeeper schemas	17:17
mnaser	hmm	17:17
corvus	or we could fix it at 3	17:17
tristanC	corvus: yes, that would be the easiest	17:17
mordred	doesn't tobiash run 5 atm?	17:18
corvus	oh, i wonder why	17:18
mordred	there was a reason - I have forgotten what it is	17:18
mnaser	german engineering probably	17:18
mnaser	=P	17:18
corvus	maybe we do need a count	17:18
corvus	>=3 :)	17:18
mordred	I think we need a count	17:18
mordred	yeah	17:18
tristanC	so to summarize, the zuul-operator doesn't have to worry about cert-manager, and if the user doesn't provided tls secret, then we just run the existing zk-ca	17:19
mordred	but that's kind of why I was thinking it would be nice to have the operator just handle this - it's a lot of plumbing details that aren't really interesting to a user - and while we _could_ have a chunk of yaml that all users copy-pasta	17:20
mordred	tristanC: that's not what my take away has been so far :)	17:20
corvus	tristanC, mnaser: can you explain how we get the right number of certs in place for the operator if it doesn't know about cert-manager?	17:20
corvus	(and named the right way)	17:21
corvus	(note that the cert subjects must match the hostname; i believe there's an option to disable hostname verification, but it's going to look dodgy for us to set that if we don't need to)	17:21
tristanC	corvus: for idx in $(seq input.zookeeper.count); do zk-ca.sh zuul-zk-${idx}; done ?	17:22
corvus	tristanC: no i understand how zk-ca.sh will work. i'm asking about cert-manager.	17:22
mnaser	i wasn't aware we have multiple certs in place	17:23
mnaser	this is interesting now because i'm not sure how we can mount a different secret per pod	17:23
corvus	mnaser: minimum of 5.	17:23
mnaser	so if you have 3 replicas of zuul-executor, how do you make sure each one gets its _own_ secret	17:23
corvus	(at least 3 servers, a ca, and a client (which can be shared among all clients -- we don't need hostname verification for that))	17:23
corvus	mnaser: zuul-executor can share secrets	17:24
corvus	it's the zk servers that (ideally) require unique certs	17:24
mnaser	right, i think in plain k8s, that is somewhat of a challenge, as pods are meant/designed to be identical in definition	17:24
mordred	there's a pattern for this used in the mariadb galera cluster	17:25
tristanC	mnaser: here is the scheduler statefulset with zookeeper and gearman tls : https://zuul.opendev.org/t/zuul/build/562b3ab7f32940b18db7cc35043d2f26/console#3/0/4/ubuntu-bionic (search for StatefulSet/zuul-scheduler )	17:25
corvus	mnaser: this isn't uinque to certs, each zk server needs to know its own id too, that's usually a configmap-per-server	17:25
tristanC	oh i see, different secret per replicaset	17:25
corvus	these aren't replicas	17:26
mordred	yeah. these are three individually defined pods in a logical relationship with each other	17:26
tristanC	hmm, how about one statefulset per zookeeper pod then?	17:27
mnaser	i wouldn't encourage that, because you'll end up with like zuul-0-0 zuul-1-0 etc	17:27
mnaser	im looking how the current helm charts do things (like different config/id)	17:28
fungi	granted if the executors all share a cert, then if one of them is compromised you have to replace the cert on all executors	17:28
fungi	probably not a huge deal, but something to keep in mind in the risk model for the system	17:28
mnaser	yeah the zuul bits are easy i think the zookeeper are the harder ones	17:28
mordred	yah - becuae the executors are only precious due to cache coherency - they _can_ just scale up and down in count	17:29
mordred	the zk cluster is different	17:29
tristanC	mordred: iiuc, in the event the user doesn't provide tls secret, the fact that the operator use a cert-manager or run the zk-ca script doesn't make a significant difference? and if for compliance or other reason the user would like to use a cert-manager, then the user provide them as tls secret	17:30
corvus	fungi: yes; though given that (unless you're using zoning) all executors can run any job, if one is compromised, it would be a good idea to burn them all	17:30
fungi	i concur	17:30
fungi	how is certificate revocation handled in that case?	17:30
fungi	is a crl maintained/distributed to everything using client certs for authorization?	17:30
clarkb	fungi: new ca	17:30
fungi	ouch	17:31
clarkb	I'm not sure it checks a revocation list	17:31
corvus	crls cost more than cas.	17:31
clarkb	(many things seem to not do that)	17:31
corvus	tristanC: i'm just not getting that. cert-manager is not to handle the case of compliance. it's for folks who want to manage their k8s systems using k8s tools. it should be just as automatic as the zk-ca.sh.	17:32
tobiash	corvus, mordred: we run 5 because availability (and because we're running them on tmpfs because ceph on spindles performs poorly with zk at our scale)	17:33
mordred	tobiash: awesome. thanks!	17:33
corvus	tristanC: but i've yet to understand what you and mnaser do about how allowing users to supply certs manually enables cert-manager to work with the operator.	17:33
corvus	tobiash: so more zk servers mitigates the risk from not having persistent storage?	17:34
zenkuro	hi, when I run tox -e docs in locs	17:34
mnaser	corvus: i can talk about how it makes sense for zuul, but for zookeeper, i dont know actually	17:34
zenkuro	in local zuul copy do I execute any additional programs?	17:34
corvus	zenkuro: that should be all that's necessary; are you getting an error?	17:34
mordred	zenkuro: it shouldn't be running anything other than sphinx-build in the virtualenv	17:35
*** evrardjp has quit IRC		17:35
tobiash	corvus: well it didn't save us from data loss tbh. What we do now is we have data and snapshots on tmpfs and a sync crontab that syncs all to a persistent volume every minute	17:35
*** evrardjp has joined #zuul		17:36
tobiash	And that is synced back to tmpfs on pre startzp of zk	17:36
tristanC	corvus: if user sets zk.count = 3; then it needs to provide three certificate issued for '${cr-name}-zk-[123]-0.${namespace}.svc...', then the operator attach each secret to each zk deployment	17:36
corvus	tristanC: right, that's not something that can be put in a static k8s yaml file	17:37
tristanC	corvus: why not?	17:37
corvus	because that's templated?	17:37
corvus	i mean, that's the whole point of operators, right? to do this for users?	17:37
corvus	the way the operator should interact with cert-manager is: when it knows how many zk servers to create, it creates the cert request CRs for each	17:38
corvus	(because, i mean, you can run a whole zuul cluster without any operators or helm charts or anything, just using a single static k8s yaml file, see https://gerrit.googlesource.com/zuul/ops/+/refs/heads/master/k8s/zuul.yaml but that's tedious)	17:39
*** chandankumar is now known as raukadah		17:39
mordred	yeah. like - I feel like the tls connections for zk are _totally_ an impl detail that the operator should elide for the user	17:40
mordred	most users shouldn't even have to care that there is a zk at all	17:40
corvus	++	17:40
mordred	but certainly shouldn't have to care about how the CA and certs work for it to talk to itself	17:40
tristanC	ok, but you still want the zuul-operator to register zk tls certificate in an external service	17:41
zenkuro	mordred: it tries to run zuul-manage-ansible	17:41
mordred	I want the operator to use CRDs from cert-manager to generate a CA, a client cert and a cert-per-zk-pod	17:41
zenkuro	corvus: yep	17:41
mordred	becuase it is an existing k8s system that is designed for this purpose	17:42
corvus	zenkuro: can you copy the error into http://paste.openstack.org/ and then paste the resulting url here so we can see it?	17:42
tristanC	mordred: sigh, alright, i'll have a look now and see if it can actually be used	17:42
corvus	zenkuro: it does that so it can include the help output in the manual	17:42
mordred	if that is untennable, I think the operator should use zk-ca.sh to do the same thing - my main worry is that now we will have our own local impl of cert-manager that only does what we're doing	17:42
mordred	tristanC: this is mostly because managing certs and cas is one of those things that I think extra effort shoudl be used to re-use existing work because most of the time doing it yourself is a recipe for making a mistake somewhere and not noticing	17:43
tristanC	mordred: right i understand, but i'd rather spend time doing something else than zk tls, especially since we eventually made the zk-ca works last week.	17:45
tristanC	and if there is a mistake in zk-ca, we should care and fix it, as it will likely be the tool used outside of kubernetes	17:46
zenkuro	corvus: sure, in 5 min	17:46
tristanC	and iiuc, zk requires jks, which seems to be an experimental feature of the cert-manager that got released less than two weeks ago	17:47
tristanC	so perhaps we should wait cert-manager support for jks to be stable before relying on it?	17:47
corvus	tristanC: zk requires either a jks file; or a pkcs12 file with .jks extension; or a pkcs8 pem-encoded file with a .pem extension (but the key has to be encrypted)	17:49
corvus	i don't know which of those cert-manager can output, but i think that's at least 2 or 3 distinct options there	17:50
mordred	tristanC: that's good point about managing zk-ca.sh anyway, why don't we start with zk-ca.sh - but I think using cert-manager if it's there is still worthwhile (and the thing that k8s people with cert-manager would expect to happen), perhaps we do that as an upgrade followon - and have the operator detect if it can use cert-manager or if it needs to use zk-ca	17:51
corvus	mordred: ++ i think it's important that we have an answer about cert-manager	17:52
*** jamesmcarthur has quit IRC		17:52
corvus	because if an experienced k8s operator (person) uses this, they will expect that (see mnaser's very first reaction when we started talking about this)	17:52
corvus	so the answer should either be "we do! use this option!" or "we looked into it, and it doesn't quite produce the output we need"	17:53
corvus	but it should not be "because we didn't feel like it". that will reflect poorly.	17:53
tristanC	corvus: so there are ways to use cert-manager for zk in the zuul-operator, but what about the zuul-heml charts, should we start here first, then i'll port the change to zuul-operator (as i did with the initial service labels) ?	17:57
*** jamesmcarthur has joined #zuul		17:57
*** jpena is now known as jpena\|off		17:58
corvus	tristanC: i don't quite know enough helm to know how to do that; but i'm assuming there's some way that will be similar -- however you create the 3 (or more) zk pods, do the same for the tls certs	17:59
corvus	tristanC: (i don't think there's a way to use zk-ca.sh with helm -- other than just running it before running helm and then using its output as input to helm)	18:00
zenkuro	corvus: softwarefactory-project.io/paste/show/1763	18:04
tristanC	oh well, it seems like zuul-helm is using https://github.com/helm/charts/tree/master/incubator/zookeeper which simply doesn't have tls or sasl	18:05
corvus	zenkuro: hrm, i was hoping for more output. there may be something useful in .tox/docs/log/ directory.	18:07
corvus	zenkuro: but also, you may need to install the packages from bindep.txt	18:08
corvus	zenkuro: something like "sudo yum -y install $(bindep -b test)" may be necessary...	18:09
corvus	mordred, clarkb: ^ does that sound right?	18:09
clarkb	corvus: yes that command for installing things looks correct	18:09
clarkb	run from the repo top level dir (otherwise you can specify a path to the bindep file)	18:10
mordred	corvus: I rechecked that osc image job and got the same error - so it might actually be a something	18:12
zenkuro	corvus: softwarefactory-project.io/paste/show/1765	18:12
mordred	oh wait	18:13
zenkuro	looks like Im mising zuul manage ansible... but	18:13
corvus	zenkuro: that should be installed by tox; but running it is failing for some reason	18:14
zenkuro	do I need to install this to have check doc style? Isnt it a problem for security?	18:14
corvus	zenkuro: another thing you could do is try running ".tox/docs/bin/zuul-manage-ansible -h" and see what the output is	18:14
mordred	corvus: ignore me - I hadn't added the appropriate zuul user to the dockerhub org :)	18:14
corvus	mordred: aha!	18:15
*** sshnaidm\|afk is now known as sshnaidm		18:15
corvus	zenkuro: if you have run tox before, but have recently updated the zuul repo, there may be new dependencies. in that case, you should run "tox -r -e docs" to recreate it.	18:15
zenkuro	corvus: it fails with some python deps	18:15
corvus	zenkuro: pathspec?	18:15
zenkuro	re2	18:16
fungi	this may be the recent fedora re2 incompatibility someone was talking about	18:16
corvus	zenkuro: then do both things: run the bindep command, then run "tox -r -e docs"	18:16
corvus	fungi: oh?	18:16
corvus	i'm unfamiliar with the issue	18:16
fungi	i'll see if i can find details	18:16
fungi	2020-03-13 11:30:21 tobiash clarkb, ianw: fyi libre2 in latest fedora is incompatible with prebuilt wheel of fb-re2	18:17
fungi	(from scrollback in here)	18:18
zenkuro	yah, I have fedora	18:18
tobiash	In fedora fr-re2 must be force installed by source (I don't recall the command atm)	18:19
corvus	--no-binary ?	18:19
tobiash	corvus: I think that was it combined with a filter	18:20
mnaser	fwiw zuul-helm does _not_ do zookeeper deployments	18:20
corvus	--no-binary fb-re2 ?	18:21
*** jamesmcarthur has quit IRC		18:22
zenkuro	hm... kooks like installation of re2 devel src solved the issue	18:22
tristanC	mnaser: is the zookeeper dependencies coming from https://github.com/helm/charts/tree/master/incubator/zookeeper ?	18:22
mnaser	tristanC: only for the "zuul-system" chart which is meant to be an integration point for now	18:23
corvus	zenkuro: what commands did you run?	18:23
mnaser	mordred: unrelated but i know you tinkered a lot with pyenv. did you find a way to get pyenv to install "latest" of a specific series	18:24
mnaser	aka pyenv install 3.7 (noonedeadpunk dug in initially with no success	18:24
*** bhavikdbavishi has quit IRC		18:25
tristanC	mnaser: hmm, how is the zuul-helm supposed to be used then?	18:25
mordred	mnaser: nope. I do "pyenv install --list \| grep '^ *3.7'"	18:25
mordred	then look for the version with my eyeballs	18:25
mordred	it's not idea	18:25
mordred	ideal	18:25
mnaser	tristanC: "bring your own zookeeper and db"	18:25
*** jcapitao has quit IRC		18:25
tristanC	mnaser: oh, ok. then when zuul requires zk auth, then that would be 'bring your own zk tls too'	18:27
mnaser	yep	18:27
zenkuro	corvus: dnf install re2-devel	18:30
corvus	zenkuro: ok good, that's already in bindep, so the bindep command i gave you earlier would have run that	18:30
zenkuro	corvus: tox -r -e docs	18:30
tobiash	corvus, zenkuro: I used this to fix the tox env: ".tox/py37/bin/pip install --force-reinstall --ignore-installed --no-binary :all: fb-re2"	18:31
tobiash	(and installed re2-devel)	18:31
corvus	tobiash, zenkuro: i just ran tox -e docs in a fedora:latest container after running "yum -y install $(bindep -b doc test)" and it worked without error	18:31
zenkuro	but why docs require so much things?	18:31
tobiash	corvus: maybe the package has been fixed in the meantime	18:32
corvus	zenkuro: because it runs the zuul commands to collect the help output	18:32
tobiash	at least there was a new release in february	18:33
*** jamesmcarthur has joined #zuul		18:35
tobiash	corvus: oh, it looks like they just removed the prebuilt wheels from pypi: https://pypi.org/project/fb-re2/1.0.7/#files	18:36
tobiash	that's also the reason the devel package is required now	18:36
corvus	well that works :)	18:38
tristanC	according to https://github.com/jetstack/cert-manager/pull/2643/files#diff-c210e0aac6b999044e3472381a445c4b , it seems like the keystore password needs to be set at cert-manager installation time	18:44
*** jamesmcarthur has quit IRC		18:51
tristanC	and well, it seems like it's part of the crd yet, only the controller seems to be implemeneted. corvus: is there something else i can use beside pkcs12 or jks?	18:59
zbr	ensure-tox: https://review.opendev.org/#/c/690057/ -- i really need it in order to be able to change how tox is installed	19:01
tristanC	here is the Certificate CR I tried using a pkcs12 enabled cert-manager: http://paste.openstack.org/show/790752/	19:01
tristanC	corvus: i think that's a good reason why the zuul-operator is not using cert-manager ^	19:05
corvus	tristanC: pem-encoded pkcs8 with an encrypted key is an option	19:10
tristanC	corvus: i'm not seing an option to provide an encryption key for pcks8, do you think that should be provided by the cert-manager?	19:14
mnaser	so with adding the 'vexxhost' tenant -- i noticed zuul breaks out after the first missing required-project (rather than checking all of them then breaking out)	19:15
mnaser	so its complanining about openstack/cinder only missing rather than all the others	19:15
corvus	tristanC: probably worth a look at how the jks support handles it (it seems very likely that jks support would include a store password)	19:15
mordred	corvus: where did we ever wind up with the "treat required projects as foreign" or whatever we were calling that potential change for making things like re-using repos like devstack easier? I'm pretty sure I should know the answer, but I'm not able to page it into my brain pellets	19:16
corvus	mordred: needs more design: https://review.opendev.org/613143	19:17
mordred	ah yes!	19:17
tristanC	corvus: reading https://github.com/jetstack/cert-manager/blob/master/cmd/controller/app/options/options.go#L322 , it seems like those (experimental) options are provided at cert-manager install time, L6597 of the yaml file	19:17
mordred	mnaser: ^^	19:17
mordred	mnaser: idea being that you really shouldn't need to add _any_ of those projects to your tenant if you don't want to - assuming the design issues can be sorted out	19:18
corvus	tristanC: oof, that is the wrong place for that	19:25
corvus	tristanC: it should be per-keystore, so i'd expect it to be an attribute of the certificate	19:25
corvus	tristanC: or, at the very least, a property of the issuer	19:26
corvus	it's certainly not appropriate for it to be cluster-wide	19:26
corvus	tristanC: what yaml file?	19:27
corvus	i'm having trouble figuring out where --experimental-jks-password should be provided	19:28
tristanC	corvus: https://github.com/jetstack/cert-manager/releases/download/v0.14.0/cert-manager.yaml L6597 I added: `- --experimental-issue-pkcs8` and `- --experimental-pkcs12-keystore-password=secretstorepassword`	19:28
*** jamesmcarthur has joined #zuul		19:28
mnaser	mordred: yeah -- i think i'd be a little more comfortable fixing the for loop that breaks early :P	19:28
corvus	tristanC: thanks, that clarifies. also, wow, that's an interesting way for that to happen.	19:28
corvus	mnaser: i think having that evaluate the complete set of r-p would be feasible and reasonable; i can't think of a reason not to do it right now.	19:29
mnaser	ok i found it	19:30
mnaser	we're raising an exception when we hit an unknown project	19:30
corvus	mnaser: in configloader.py	19:30
corvus	yeah that	19:30
mnaser	the messages zuul emits aren't really like.. an api right?	19:30
mordred	seems collecting them and raising an exception with a larger error payload would be potentially nicer	19:30
mnaser	^^^	19:30
mnaser	so that sems reasonable to just start rewriting the message	19:31
corvus	mnaser: nope; the ones that are are special exceptions	19:31
corvus	yep	19:31
* mordred afks for just a bit		19:31
* corvus gets lunch		19:31
*** harrymichal has joined #zuul		19:34
mnaser	just wrote something with tests now, testing locally	19:36
* clarkb is back ish from lunch and dealing with kids		19:41
mnaser	cant run the tests locally because gear and select	19:54
clarkb	mnaser: there is a chnage up to address that but still needs people on osx to test I think	19:54
mnaser	clarkb: yeah i saw that and i saw it was also reverted too at some point, assuming it broke other things	19:55
clarkb	mnaser: https://review.opendev.org/#/c/708267/	19:55
clarkb	yes this fixes the revert (or so we hope)	19:55
mnaser	clarkb: i'll try to make time to try that out	19:58
openstackgerrit	Mohammed Naser proposed zuul/zuul master: config: show error for all unknown projects https://review.opendev.org/713323	19:58
mnaser	corvus, mordred: should be sane ^	19:58
mordred	mnaser: you angered the gods of whitespace	20:05
mnaser	NOT AGAIN	20:05
mnaser	ugh	20:07
openstackgerrit	Mohammed Naser proposed zuul/zuul master: config: show error for all unknown projects https://review.opendev.org/713323	20:07
*** jamesmcarthur has quit IRC		20:29
*** jamesmcarthur has joined #zuul		20:30
ianw	tristanC : if things are complex, you could write the config file in jinja2 and deploy it as a template too ... personally i see it as something small enough to be useful and put in the linked dnm project-config change to illustrate why	20:33
ianw	if yourself and corvus like the idea of a parent structure more, can certainly implement that	20:34
ianw	but if nobody thinks the general idea is useful, well, then that's that :)	20:34
tristanC	ianw: i don't mind either way. I think there is value in being able to express hierarchy. In our case, we have fedora-30, fedora-31 and fedora-rawhide that are mostly identical beside the version name, and having parent image would remove the repetition. With a base image, we'll also get the global option you are proposing too.	20:55
*** jamesmcarthur has quit IRC		20:57
ianw	tristanC: yeah, thanks for that suggestion it does feel more "zuul-onic"; i can write something up and see if it slots in neatly or if there's some horrific corner case i haven't thought of yet :) (my initial feeling is the former)	20:59
tristanC	ianw: well we have to be careful about merging. we could override parent attribute with child, but perhaps for elements list we want to extend instead?	21:01
ianw	yes, elements list i think we want to extend, as done in the original	21:03
ianw	otherwise dict update() semantics	21:03
*** jamesmcarthur has joined #zuul		21:06
openstackgerrit	James E. Blair proposed zuul/zuul master: Add TLS support for ZooKeeper https://review.opendev.org/712531	21:08
openstackgerrit	James E. Blair proposed zuul/zuul master: Use ZK TLS in quickstart https://review.opendev.org/712817	21:08
corvus	tristanC: ^ i think that should fix the errors that showed up in tests	21:10
tristanC	corvus: i'll submit a zuul-operator update shortly	21:12
tristanC	corvus: qq, it seems like using a generic name like `zk` works for client auth. Is a fqdn required for quorum ssl, or would it be possible to use a single server key for all zk pod?	21:14
corvus	tristanC: it's required for quorum unless you turn off host verification (i tend to think we should leave it on if we can, but it's open to discussion if folks think it would be okay to turn it off -- mostly i worry it will look bad)	21:15
corvus	(i guess it's probably the only way to perform authz on the quorum side, unless you used two different cas. so it's probably good to keep it enabled)	21:16
clarkb	Shrews: left a couple notes/questions on https://review.opendev.org/#/c/712539/3	21:17
openstackgerrit	Tristan Cacqueray proposed zuul/zuul-operator master: Add TLS configuration to ZooKeeper service https://review.opendev.org/712759	21:18
tristanC	corvus: alright, so that seems to work for the current single pod zk service ^	21:19
tristanC	now i wonder what it would take to add tls support to the existing zk operator or chart so that the operator doesn't have to worry about that	21:20
openstackgerrit	Clark Boylan proposed zuul/nodepool master: Revert abitrary uid support https://review.opendev.org/713060	21:20
clarkb	mordred: corvus ianw ^ decided to updated that and keep that moving forward	21:21
corvus	tristanC: should be pretty easy -- but if the zuul-operator is going to run zk internally, it should at least have a quorum of 3	21:22
tristanC	corvus: maybe not so easy. the zk helm chart manage replication using a single statefulset, thus if we want one secret per pod, then that might a more profond refactor	21:26
tristanC	ftr, it is https://github.com/helm/charts/blob/master/incubator/zookeeper/templates/statefulset.yaml	21:26
*** plaurin has quit IRC		21:27
tristanC	and that seems to be the case for the operator too: https://github.com/pravega/zookeeper-operator/blob/1c4e3f1908412e219efa630445ce003fdcae279e/pkg/zk/generators.go#L41	21:28
corvus	tristanC: how does it set myid?	21:29
tristanC	corvus: https://github.com/pravega/zookeeper-operator/blob/master/docker/bin/zookeeperStart.sh#L33 where ORD is the last number of the hostname, e.g. zk-0 zk-1 ...	21:32
tristanC	perhaps we could shove all the service cert inside a single secret, and use that zookeeperStart script to pick the right cert based on ORD too	21:33
corvus	tristanC: i love how many "how do i do x in k8s?" questions are answered with "custom start script" :)	21:36
corvus	tristanC: but yeah, if they've already gone down the path of parsing the hostname, then that could work.	21:37
*** sshnaidm has quit IRC		21:37
corvus	tristanC: actually...	21:38
corvus	tristanC: jks has a concept of an alias -- and i think zk uses that to ask the jks for the cert, so it might actually just work to just put all the certs in the same jks and give that to each host	21:38
corvus	tristanC: i did not test that, but i could do that now without too much trouble if that sounds useful.	21:39
corvus	tristanC: (and i'm sorry i didn't think of this earlier)	21:39
*** jamesmcarthur has quit IRC		21:41
*** sshnaidm has joined #zuul		21:44
tristanC	corvus: tls support request submitted here: https://github.com/pravega/zookeeper-operator/issues/141	21:51
tristanC	mordred: would you mind approving https://review.opendev.org/#/c/712816	21:54
mordred	tristanC: +A	21:54
tristanC	iirc tobiash manage zk in k8s locally, and perhaps it would be better if the zuul-operator does that too	21:57
tristanC	but the fact that no zk deployment system seems to bother with tls or sasl is quite annoying	21:58
*** jamesmcarthur has joined #zuul		21:59
corvus	tristanC: well, it's pretty new	21:59
corvus	tristanC: i'm working on testing the single-keystore idea	22:00
corvus	tristanC: it doesn't look like that's working. even though zk insists that the keystore alias for a cert match the subject of the cert, it doesn't actually pull out the cert which matches its own hostname. so when i build a keystore with 3 certs, all the quorum hosts present the first cert when they connect. :(	22:04
corvus	that's a real missed opportunity	22:04
corvus	so i think we're stuck with "provide a different cert file to each quorum host"	22:04
corvus	this is what my keystore looked like: http://paste.openstack.org/show/790758/	22:05
corvus	but they all used zoo1	22:05
*** jamesmcarthur has quit IRC		22:06
tristanC	arg, too bad. makes me wonder why the keystore support multiple key :)	22:06
corvus	tristanC: i think for this case, but i think the zk developers didn't take advantage of it	22:08
corvus	tristanC: oh! i found that you can use unencrypted pkcs8 files	22:17
corvus	tristanC: i think that means that we should be able to use the standard output from cert-manager without change	22:18
openstackgerrit	James E. Blair proposed zuul/zuul master: Don't use JKS with ZK https://review.opendev.org/713340	22:21
corvus	tristanC: ^ like that	22:21
mordred	corvus: that seems WAY better	22:32
corvus	yeah, it doesn't solve the additional complexity for the operator, but it at least reduces the problem to something that most people are familiar with ("here is a pem file for this server")	22:33
corvus	hopefully it makes the cert-manager problem easier	22:33
mordred	++	22:33
corvus	("for the operator" -> "for the zk operator")	22:33
*** jamesmcarthur has joined #zuul		22:36
*** Defolos has quit IRC		22:40
*** jamesmcarthur has quit IRC		22:42
*** jamesmcarthur has joined #zuul		22:42
*** Defolos has joined #zuul		22:42
*** jamesmcarthur has quit IRC		22:43
*** jamesmcarthur has joined #zuul		22:44
tristanC	corvus: that looks promising, i updated the zk-operator issue.	22:46
corvus	tristanC: cool; if it's not too hard for you to try, it may be worth asking cert-manager for a cert, then seeing if zk can start up with that (using a config like my latest change to quickstart)	22:49
*** threestrands has joined #zuul		22:51
mnaser	i posted these over the weekend and they're blocking a change of mine: https://review.opendev.org/#/c/713107/ and https://review.opendev.org/#/c/713115/ if possible to have a look at them :)	23:04
mnaser	(i have a depends-on so my change is pending on those)	23:04
*** jamesmcarthur has quit IRC		23:11
*** jamesmcarthur has joined #zuul		23:11
openstackgerrit	Merged zuul/zuul master: config: show error for all unknown projects https://review.opendev.org/713323	23:15
*** hashar has quit IRC		23:15
*** jamesmcarthur has quit IRC		23:20
*** jamesmcarthur has joined #zuul		23:36
*** harrymichal has quit IRC		23:39
*** clayg has left #zuul		23:49
*** jamesmcarthur has quit IRC		23:56
*** jamesmcarthur has joined #zuul		23:57

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!