Friday, 2019-11-22

*** rlandy is now known as rlandy|bbl  [00:01]
*** mattw4 has quit IRC  [00:47]
*** michael-beaver has quit IRC  [00:50]
*** tosky has quit IRC  [00:56]
*** openstackstatus has joined #zuul  [01:04]
*** ChanServ sets mode: +v openstackstatus  [01:04]
*** jamesmcarthur has joined #zuul  [01:34]
*** igordc has quit IRC  [01:57]
*** rfolco has joined #zuul  [02:22]
*** rfolco has quit IRC  [02:27]
*** rlandy|bbl is now known as rlandy  [02:53]
*** rlandy has quit IRC  [02:56]
*** jamesmcarthur has quit IRC  [03:13]
*** openstackgerrit has joined #zuul  [05:55]
<openstackgerrit> Merged zuul/zuul-jobs master: install-podman: also install slirp4netns  https://review.opendev.org/695601  [05:55]
<ianw> Shrews: i know right!  next we'll need a trumpet winsock package  [05:58]
*** rfolco has joined #zuul  [05:58]
*** rfolco has quit IRC  [06:03]
*** zbr has joined #zuul  [06:06]
*** zbr|ooo has quit IRC  [06:08]
*** raukadah is now known as chkumar|rover  [06:37]
*** saneax has joined #zuul  [07:28]
*** tosky has joined #zuul  [08:15]
*** pcaruana has joined #zuul  [08:21]
*** bolg has joined #zuul  [08:36]
<openstackgerrit> Matthieu Huin proposed zuul/zuul master: authentication config: add optional token_expiry  https://review.opendev.org/642408  [09:03]
<openstackgerrit> Matthieu Huin proposed zuul/zuul master: Authorization rules: support YAML nested dictionaries  https://review.opendev.org/684790  [09:04]
*** mhu has joined #zuul  [09:04]
<mhu> Hey there, here are a few patches on the zuul_admin_web topic that have been +2'ed for a while but were blocked by another patch, can we get these to +3 now that the blocker was merged?  [09:07]
<mhu> https://review.opendev.org/#/c/642408/ and https://review.opendev.org/#/c/684790/  [09:07]
<mhu> and this one too while we're at it, as it'd simplify GUI integration https://review.opendev.org/#/c/684790/  [09:08]
*** sshnaidm is now known as sshnaidm|off  [09:21]
<openstackgerrit> Tobias Henkel proposed zuul/zuul master: Fix deletion of stale build dirs on startup  https://review.opendev.org/620697  [10:01]
<reiterative> Hi. Can anyone point me to some guidance on suitable memory / storage allocations for the various elements of Zuul (NodePool, scheduler, executors)? This is not for a production environment (yet), and will only need to support a small number of devs (10-20) and a modest number of repos initially. I'm expecting to discover a lot of this in the process, but if anyone has tips or lessons learned to share, I'd be very grateful!  [10:31]
*** yolanda__ has joined #zuul  [11:08]
*** yolanda has quit IRC  [11:09]
*** mhu has quit IRC  [11:23]
*** yolanda__ has quit IRC  [11:33]
*** yolanda has joined #zuul  [11:46]
*** yolanda has quit IRC  [11:52]
*** rfolco has joined #zuul  [11:55]
<fungi> reiterative: for our environment (i help run opendev), we mostly stick with smallish virtual machines for each component and scale out with more virtual machines as we need. we've been using 8gb ram flavors for zuul executors and nodepool builders, 2gb for dedicated zuul mergers and nodepool launchers, and then the bit which can't scale horizontally at the moment is the zuul scheduler which we're using a 30gb  [11:57]
<fungi> flavor for (it also hosts the zuul-web and zuul-finger daemons) but now that we seem to have gotten the memory usage under control there it could probably get by fine for us on 15gb, oh and 4gb flavor for our zookeeper cluster nodes  [11:57]
<fungi> as for storage, it really depends on how much data you're dealing with  [11:57]
<fungi> we build a bunch of different virtual machine images for different releases of at least half a dozen linux distributions, so we mount a 1tb block device on them to give diskimage-builder room to do its thing  [11:59]
<fungi> our executors each have an 80gb volume mounted on /var/lib/zuul  [12:00]
<reiterative> Thanks fungi - that's very helpful  [12:01]
<fungi> for size comparison, we've tuned our executors to be able to comfortably run around 100 concurrent builds each  [12:02]
<fungi> if you want to be able to see more detailed stats of the individual servers, we have public graphs: http://cacti.openstack.org/  [12:03]
<reiterative> Yes, I can see that the executors and individual nodes would need plenty of available resources, but what about the Zuul scheduler and nodepool instances?  [12:03]
<fungi> sizing nodepool instances will *really* depend on your job payloads. we standardize on 8gb ram, 80gb disk and 8vcpus for job nodes  [12:04]
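
For reference, a node size like the one fungi describes would be expressed in Nodepool's OpenStack provider configuration roughly as follows. This is a minimal sketch; the provider, cloud, image, and flavor names are illustrative, not OpenDev's actual configuration.

    # nodepool.yaml (sketch)
    labels:
      - name: ubuntu-bionic
        min-ready: 1

    providers:
      - name: example-cloud            # hypothetical provider
        cloud: example
        diskimages:
          - name: ubuntu-bionic
        pools:
          - name: main
            max-servers: 10
            labels:
              - name: ubuntu-bionic
                diskimage: ubuntu-bionic
                flavor-name: standard-8  # assumed flavor: 8 vCPU / 8GB RAM / 80GB disk
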
*** yolanda has joined #zuul  [12:05]
<fungi> the scheduler server really doesn't use much disk, though we put its /var/lib/zuul on a separate volume just in case  [12:05]
<fungi> keep in mind ours is a fairly large environment, with around a couple thousand projects and running roughly 1000 concurrent builds at peak  [12:09]
<reiterative> Yes, I am feeling very much out of my depth at this scale :-)  [12:09]
<fungi> but to handle that we're running several nodepool launchers, 12 zuul executors, 8 additional dedicated zuul mergers and a 3-node zk cluster  [12:10]
<fungi> and we spread the workload across many different public cloud providers  [12:10]
<fungi> there are other folks in here who run just a few concurrent builds for small teams who can hopefully give you an example from that end of the scale  [12:11]
<fungi> the key takeaway though is that you can start out small (even an all-in-one deployment on a single server if you really want) and then scale out by moving components to their own servers and adding more of them as you need  [12:12]
<reiterative> Thanks for your help - it's really useful to understand how it works at scale, as that's one of the reasons I'm so interested in Zuul  [12:13]
<reiterative> But I need to start small :-)  [12:14]
<fungi> yep. i think pabelanger has a few smallish deployments which might provide a good counterpoint, once he's around  [12:14]
<fungi> we've got regulars in here running zuul/nodepool at a wide variety of scales  [12:15]
<reiterative> Great  [12:16]
*** yoctozepto has quit IRC  [12:43]
*** yoctozepto has joined #zuul  [12:43]
*** jamesmcarthur has joined #zuul  [13:04]
*** rlandy has joined #zuul  [13:04]
<Shrews> ianw: Yeah. I believe the Internet truly was more fun when you had to experience it via Netscape, that ran on slackware linux, that you installed via 20-some floppies, that you obtained from hours spent in a computer lab with a green-bar printer, and then spent hours installing, then hours customizing Xconfig, just so you can dial into a local modem pool to get terminal access to run slirp to get said Internet access that was slow as mud (not  [13:05]
<Shrews> the game).  [13:05]
<fungi> Shrews: i just had this discussion yesterday on another irc network. for me the "good ol' days" of the internet were when you could walk up to just about any serial terminal or telnet to any server and log in as guest/guest  [13:07]
<Shrews> ah yes  [13:07]
* fungi tries to recapture that sense of community and belonging in everything he builds  [13:08]
* Shrews tries to telnet to yuggoth.org machines as guest  [13:09]
<fungi> so do a bazillion bots, so no that specific example is unfortunately no longer relevant  [13:10]
*** mhu has joined #zuul  [13:10]
*** mgoddard has quit IRC  [13:44]
*** mgoddard has joined #zuul  [13:45]
*** jamesmcarthur has quit IRC  [14:12]
*** jamesmcarthur has joined #zuul  [14:37]
*** themr0c has quit IRC  [14:54]
*** Goneri has joined #zuul  [15:17]
<openstackgerrit> Tobias Henkel proposed zuul/zuul master: Fix deletion of stale build dirs on startup  https://review.opendev.org/620697  [15:24]
<tobiash> corvus: this finally fixes the test cases of the executor job dir cleanup ^  [15:25]
<corvus> tobiash: i keep forgetting that hasn't merged  [15:32]
<corvus> should we wait 6 more days and merge it on its 1 year anniversary?  [15:32]
<tobiash> lol  [15:32]
<tobiash> works for me, but I'll pull it into our deployment now ;)  [15:33]
<corvus> heh, i'll review it asap, thanks!  [15:33]
<tobiash> unfortunately it got a little bit bigger than anticipated due to necessary test changes  [15:34]
<openstackgerrit> James E. Blair proposed zuul/zuul-jobs master: WIP: use-buildset-registry: add podman support  https://review.opendev.org/695051  [15:38]
*** bolg has quit IRC  [15:56]
*** saneax has quit IRC  [15:59]
<pabelanger> fungi: reiterative: yup, we've gone to the next step of sizing our nodes in nodepool, mostly because of budget reasons. So far, it has worked out pretty well. As for the control plane, we also have some minimal sized servers to run zuul, e.g. 1vcpu/1gb for zuul-merger; again it works well but took some time to figure out sizing  [16:01]
<tobiash> corvus: I see the yellow bell in the zuul on https://zuul.opendev.org/t/zuul/status, some unknown projects and undefined jobs  [16:02]
<tobiash> it doesn't look like important errors, just noticed  [16:02]
*** saneax has joined #zuul  [16:03]
<tobiash> grr, still a failing test case in tox remote :/  [16:04]
<openstackgerrit> Tobias Henkel proposed zuul/zuul master: Fix deletion of stale build dirs on startup  https://review.opendev.org/620697  [16:06]
<tobiash> but that should be it  [16:06]
*** adso78 has quit IRC  [16:26]
*** chkumar|rover is now known as raukadah  [16:31]
*** mattw4 has joined #zuul  [16:35]
*** yolanda has quit IRC  [16:50]
<clarkb> tobiash: looks like the quickstart job failed because the scheduler failed to connect to the mariadb container?  [16:56]
<corvus> i think that's a periodic failure we haven't figured out yet  [16:56]
<clarkb> looking at timestamps it appears that mysql reports ready to accept connections about 20 seconds after the timeout occurred  [16:56]
<corvus> hrm, i thought we observed that it was ready long before the timeout  [16:57]
<clarkb> From mysql log: 2019-11-22 16:33:06 0 [Note] mysqld: ready for connections.  [16:57]
<clarkb> From scheduler log: 2019-11-22T16:32:43+00:00 Timeout waiting for mysql  [16:58]
<corvus> huh.  what happened here?  https://zuul.opendev.org/t/zuul/build/b49f217927644946a85ac642742c0b2c/log/container_logs/mysql.log#61-63  [16:59]
<corvus> that's a 3 minute gap  [16:59]
<corvus> well, a 2 minute 17 second gap  [17:00]
<corvus> maybe we need to add another 3 minutes to our waiting?  [17:00]
<clarkb> I wonder if that is part of normal mariadb startup or is docker compose restarting things?  [17:08]
<clarkb> I guess it would be part of mariadb otherwise we'd get different containers in different log files?  [17:09]
<corvus> i suspect (but am not certain) that the mariadb container image starts mariadb, creates the database and user specified in the env vars, then restarts it.  but i don't know why it sat there for 2+ minutes.  [17:09]
<corvus> it's normally very fast.  [17:09]
<clarkb> https://jira.mariadb.org/browse/MDEV-13869 is a fixed bug but there are cases of mariadb being slow to start for reasons at least  [17:12]
<clarkb> I wonder if we can bump the logging verbosity during that temporary server startup time  [17:13]
<clarkb> we could build a zuul/mariadb image that has gone through the init step already  [17:16]
<openstackgerrit> James E. Blair proposed zuul/zuul-jobs master: WIP: use-buildset-registry: add podman support  https://review.opendev.org/695051  [17:25]
<corvus> clarkb: yeah, though if this only happens 5% of the time, just bumping the timeout might be okay  [17:25]
<openstackgerrit> Clark Boylan proposed zuul/zuul master: Increase mariadb log verbosity  https://review.opendev.org/695737  [17:27]
<clarkb> corvus: ^ that may give us more information if I've read the dockerfile, entrypoint, and mariadb docs properly  [17:27]
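
A minimal sketch of the kind of docker-compose tweak being discussed, assuming the docker-library mariadb image forwards extra command arguments to mysqld and that log_warnings is the relevant verbosity knob; whether the entrypoint's temporary init-time server also picks this up is an open question, which would be consistent with the later observation that the logging did not visibly improve.

    # docker-compose.yaml (sketch)
    services:
      mysql:
        image: mariadb
        # assumption: arguments given here are appended to the mysqld command line
        command: --log-warnings=9
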
<corvus> clarkb: want to recheck-bash that to try to see the error?  [17:28]
<clarkb> corvus: ya I think we should try to catch it pre merge if possible  [17:29]
*** Guest24639 has joined #zuul  [17:32]
*** Guest24639 is now known as mgagne_  [17:34]
*** michael-beaver has joined #zuul  [17:35]
<tobiash> clarkb: you could duplicate that job to increase the chance of hitting this error  [17:39]
<clarkb> ya and remove the other jobs too  [17:41]
<clarkb> I'll do that on a second patchset if I don't get lucky on the first  [17:41]
<pabelanger> looking at the zuul-build-image job we have, the artifact that is produced is uploaded to the intermediate registry, which lives outside of nodepool testing resources right? I am trying to figure out a good way to showcase the new artifact system zuul has, but use it in the context of the new collections we have in ansible.  Basically, build collection (tarball), publish to speculative galaxy, have jobs download  [17:42]
<pabelanger> that content and test with  [17:42]
<pabelanger> but, galaxy here is built on top of nodepool, not the control plane  [17:43]
<clarkb> pabelanger: correct, it is a long lived service  [17:43]
<clarkb> pabelanger: that is necessary to ensure depends on works  [17:43]
<pabelanger> Hmm  [17:43]
<pabelanger> so, in the case of galaxy we'd generate a pre-release tarball, using speculative info for the version (it is actually pbr right now). I am not sure how to handle parallel collections getting uploaded  [17:45]
<pabelanger> and conflicting  [17:45]
<corvus> pabelanger: don't forget about the buildset registry  [17:46]
<openstackgerrit> Matthieu Huin proposed zuul/zuul master: [WIP] admin REST API: zuul-web integration  https://review.opendev.org/643536  [17:46]
<corvus> there are 2 pieces: the buildset registry lets jobs within a change share images, the intermediate registry lets jobs between changes share images  [17:46]
<corvus> the buildset registry is ephemeral  [17:46]
<pabelanger> okay, that is likely what I am thinking of then  [17:47]
<corvus> you could do the same with galaxy by running a buildset galaxy to test speculative collections, then add in an intermediate galaxy to share them between changes.  [17:48]
<pabelanger> okay, yes. that seems to be the idea I want. For some reason I forgot about the buildset galaxy  [17:48]
<clarkb> to avoid conflicts you would need a tagging/versioning scheme that avoided them  [17:48]
<clarkb> since pre-merge, you are right, there may be multiple changes that can all become version x.y.z  [17:49]
<pabelanger> yah, I am not completely sure how we do that today with docker, but plan to look  [17:49]
<corvus> pabelanger: if you take this doc and s/registry/galaxy/ i think the architecture will mostly work: https://zuul-ci.org/docs/zuul-jobs/docker-image.html  [17:49]
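
Translated to the galaxy case, the project pipeline from that document might be wired up roughly like this. The job names are hypothetical; the pattern of job dependencies is the same one used for the buildset registry.

    # .zuul.yaml (sketch)
    - project:
        check:
          jobs:
            - run-buildset-galaxy            # hypothetical analogue of the buildset-registry job
            - build-collection:
                dependencies:
                  - run-buildset-galaxy
            - test-with-collection:
                dependencies:
                  - build-collection
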
<pabelanger> for the collection today, I just hacked up a simple script based on pbr to give us a semver number: https://github.com/ansible-network/releases/blob/master/ansible_releases/cmd/generate_ansible_collection.py  [17:50]
<pabelanger> corvus: ++ thank you  [17:50]
<corvus> the push-to-intermediate-registry and pull-from-intermediate-registry handle the collision case by giving the images special tags based on the buildset id  [17:50]
<corvus> if that wouldn't work for galaxy -- then, well, you don't actually need the intermediate "galaxy" to be a galaxy.  it could actually just be a static file server or swift or whatever.  [17:51]
<corvus> you can even just use existing log storage (like we do for promoting doc builds)  [17:51]
<corvus> we just thought that it made more sense to put container images in a registry, for space and other considerations.  [17:51]
<pabelanger> okay cool, yah I suspect it will take a little time to dig more into it. My main goal is to show a tighter integration for testing for galaxy, on top of zuul. And I currently think the way the docker artifact system works now can also work in the case of galaxy  [17:54]
<pabelanger> today we just build the collection tarball on disk, then install from it.  Adding galaxy into that mix I think will blow minds  [17:54]
<corvus> yeah, it's a great case  [17:55]
<corvus> it lets you test "if i published this change to galaxy, how would the stuff that depends on installing it from galaxy work?"  [17:55]
<pabelanger> ++ that is what I am hoping to demo to humans. I think that is going to be an important thing now with move to collections  [17:56]
<pabelanger> galaxy now becomes the source of truth for ansible users  [17:56]
<clarkb> tobiash: corvus I can't just list the same job multiple times under check right? I need to make a new variant of that job?  [17:56]
<tobiash> yes  [17:58]
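
A minimal sketch of that approach: define a second job that inherits from the existing zuul-quick-start job and list both in check (the -2 name is made up).

    # .zuul.yaml (sketch)
    - job:
        name: zuul-quick-start-2
        parent: zuul-quick-start

    - project:
        check:
          jobs:
            - zuul-quick-start
            - zuul-quick-start-2
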
*** jamesmcarthur has quit IRC  [18:00]
<openstackgerrit> Clark Boylan proposed zuul/zuul master: DNM: Increase mariadb log verbosity  https://review.opendev.org/695737  [18:08]
<clarkb> I guess the variables being set like that wasn't enough  [18:11]
<openstackgerrit> Clark Boylan proposed zuul/zuul master: DNM: Increase mariadb log verbosity  https://review.opendev.org/695737  [18:14]
<pabelanger> corvus: for zuul, the buildset registry is running in the zuul-build-image job and currently zuul-quick-start then consume the image from it, does that sound correct?  [18:17]
<clarkb> pabelanger: yes  [18:18]
<pabelanger> great  [18:18]
<pabelanger> do we do anything special in docker registry to create namespaces?  [18:20]
<pabelanger> I would assume yes  [18:20]
<corvus> pabelanger: for docker.io vs quay.io?  not yet, that's what i'm working on.  [18:21]
<corvus> but atm, there's only one galaxy, so that probably doesn't impact this idea.  [18:21]
<pabelanger> sorry, I was trying to understand if you did docker push foo/bar to buildset registry, do you first need to create the foo namespace on docker side. Or does the client create that at push?  [18:22]
*** igordc has joined #zuul  [18:23]
<corvus> pabelanger: the buildset registry will auto-create any repository pushed to it.  it requires authentication (which is randomly generated at the start of the build)  [18:24]
<corvus> pabelanger: a key part of doing this speculatively is configuring the clients to first try to pull from the buildset registry, and if that fails, pull directly from upstream.  [18:25]
<pabelanger> Yah, that makes sense  [18:26]
<corvus> pabelanger: i'm not sure if the galaxy client can be configured with mirrors in a fallback configuration (that's how we do it for container images).  if not, then we may need some kind of proxy.  (or, you know, add mirror support if it's missing)  [18:26]
*** irclogbot_1 has quit IRC  [18:27]
<pabelanger> corvus: in fact, ansible-galaxy CLI does support something like this now (multiple galaxy servers). So it should work out of the box if all galaxy servers are configured properly  [18:31]
<pabelanger> I haven't tested however  [18:31]
<corvus> ++  [18:31]
*** irclogbot_1 has joined #zuul  [18:31]
<openstackgerrit> James E. Blair proposed zuul/zuul-jobs master: WIP: use-buildset-registry: add podman support  https://review.opendev.org/695051  [18:35]
<openstackgerrit> Merged zuul/zuul master: Fix deletion of stale build dirs on startup  https://review.opendev.org/620697  [18:58]
<tobiash> \o/  [19:05]
*** gtema has joined #zuul  [19:08]
<clarkb> my change seems to have caught some slow mariadb starts but I don't think we got any better logging  [19:19]
<clarkb> based on my reading of the mariadb docs I would've expected more innodb logs at least. Which makes me think that my change isn't taking effect  [19:20]
<openstackgerrit> James E. Blair proposed zuul/zuul-jobs master: WIP: use-buildset-registry: add podman support  https://review.opendev.org/695051  [19:20]
<Shrews> tobiash: Do we know for sure that subprocess.run() cannot throw any exception in your new call in https://review.opendev.org/676266 ? I fear if it does, and we don't handle that, we leave things in a weird state  [19:21]
<Shrews> I'd feel better if there were a try: except: around that  [19:22]
<corvus> ++  [19:23]
<clarkb> hrm I don't get the Online DDL messages but I do get the plugin initialization messages and the aborted connection messages. Maybe it is logging at a higher verbosity after all  [19:23]
<tobiash> Shrews: you're right, the docs about that are not clear enough, I'll wrap it  [19:25]
<Shrews> tobiash: cool. just left a note. same for the deleteImage() call in that section  [19:26]
<tobiash> ok, I'll change that probably on monday  [19:27]
<Shrews> tobiash: everything else looks great though  [19:27]
<tobiash> thanks!  [19:27]
*** gtema has quit IRC  [19:36]
<Shrews> tobiash: also, tristanC found a nit in https://review.opendev.org/693672 that probably should be fixed either in that review or a follow up  [19:38]
<tobiash> Shrews: I'll update this as well  [19:42]
*** michael-beaver has quit IRC  [19:45]
<pabelanger> so, this just happened: vmware-vcsa-6.7.0-vexxhost-ca-ymq-1-0000163120  [19:47]
<pabelanger> doing a POC with zuul.a.c, on how we can use it to test vmware modules for ansible  [19:48]
<clarkb> https://github.com/docker-library/mariadb/issues/261 comparing logs I am pretty sure this is the cause of our mariadb slowness  [19:51]
<openstackgerrit> Clark Boylan proposed zuul/zuul master: DNM: Increase mariadb log verbosity  https://review.opendev.org/695737  [19:53]
<clarkb> this ps applies a workaround that will break tzinfo data in mysql (but I expect we don't use that in zuul anyway and gerrit isn't using this db)  [19:53]
<clarkb> they basically seem to say you need nvme storage now  [19:54]
<clarkb> (someone said "I have an ssd and this affects me too" and then the response was "ya but your ssd isn't fast like nvme")  [19:55]
<fungi> yeesh  [19:55]
<Shrews> i wonder if the oracle version exhibits the same behavior  [19:57]
<Shrews> or even percona  [19:58]
<clarkb> I also wonder if they could bootstrap the tzinfo in the container image build  [19:59]
<clarkb> I guess not because we mount in the data dir  [19:59]
<clarkb> and the tzinfo stuff goes into normal tables looks like  [19:59]
<fungi> i still can't wrap my head around why they don't just store/replicate subsecond precision epoch and convert to/from that as needed. must be they're stuck with standards decisions made long, long ago  [20:00]
<Shrews> I vote we just blame LinuxJedi since he isn't here to defend himself  [20:01]
<Shrews> :)  [20:01]
<corvus> pabelanger: neato :)  [20:04]
<fungi> i thought that's what we were doing  [20:04]
*** saneax has quit IRC  [20:05]
<corvus> clarkb: we could pin to an older version in the quickstart  [20:08]
<clarkb> corvus: ya looks like 10.4.7 should work  [20:08]
<clarkb> 10.4.8 introduced the new tzinfo behavior on the 10.4 series  [20:09]
<clarkb> 10.4.10 is what we currently use  [20:10]
<corvus> but i also agree we're unlikely to be bothered by the SKIP if we just wanted to do that  [20:10]
*** yolanda has joined #zuul  [20:10]
<clarkb> ya I think we store everything in utc in the db and if we need to convert for users that should happen outside the db  [20:13]
<Shrews> fyi: i'll be afk all week next week. someone be sure to feed and water Zuul while i'm away.  [20:29]
<corvus> Shrews: happy thanksgiving!  [20:29]
<corvus> i'll be afk as well  [20:29]
<Shrews> you too! and all other US-based folks  [20:29]
<corvus> and everyone else, enjoy the break from the us-based folks :)  [20:30]
<openstackgerrit> Clark Boylan proposed zuul/zuul master: Disable Mariadb TZINFO table generation  https://review.opendev.org/695737  [20:33]
<clarkb> ok I've cleaned up my change and chosen the TZINFO skip. Largely because it seems to be what others are doing, so if we have further problems hopefully we find a path forward easily  [20:33]
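
A rough sketch of the shape of that workaround in the quickstart's docker-compose, assuming the docker-library mariadb entrypoint's MYSQL_INITDB_SKIP_TZINFO switch is the knob being used; the actual change may differ in variable name and service definition.

    # docker-compose.yaml (sketch)
    services:
      mysql:
        image: mariadb
        environment:
          # assumption: any non-empty value makes the entrypoint skip loading
          # the timezone tables during database initialization
          MYSQL_INITDB_SKIP_TZINFO: "1"
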
<clarkb> I did mention a few other options that we could try instead in the commit message  [20:33]
<clarkb> I'll be around early next week before disappearing.  [20:34]
<Shrews> i think that's an acceptable change for us for now  [20:34]
*** mhu has quit IRC  [20:39]
<openstackgerrit> James E. Blair proposed zuul/zuul-jobs master: WIP: use-buildset-registry: add podman support  https://review.opendev.org/695051  [20:41]
<openstackgerrit> Merged zuul/zuul master: web: handle jobs that have multiple parent  https://review.opendev.org/695450  [21:09]
<openstackgerrit> James E. Blair proposed zuul/zuul-jobs master: WIP: use-buildset-registry: add podman support  https://review.opendev.org/695051  [21:13]
*** mgagne_ is now known as mgagne  [21:13]
<openstackgerrit> Merged zuul/zuul master: Disable Mariadb TZINFO table generation  https://review.opendev.org/695737  [21:18]
*** rlandy has quit IRC  [21:18]
<corvus> the podman job just passed  [21:26]
<openstackgerrit> James E. Blair proposed zuul/zuul-jobs master: Add build-container-image role  https://review.opendev.org/695251  [21:34]
<openstackgerrit> James E. Blair proposed zuul/zuul-jobs master: use-buildset-registry: Add podman support  https://review.opendev.org/695051  [21:34]
*** mattw4 has quit IRC  [21:54]
*** rfolco has quit IRC  [21:58]
*** mattw4 has joined #zuul  [21:59]
*** Goneri has quit IRC  [22:02]
<corvus> tristanC, mordred, ianw: ^ the speculative podman container stack is green.  [22:04]
*** nhicher has quit IRC  [22:24]
*** nhicher has joined #zuul  [22:32]
*** mattw4 has quit IRC  [23:01]
*** tosky has quit IRC  [23:45]

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!