Friday, 2019-11-22

*** rlandy is now known as rlandy|bbl  [00:01]
*** mattw4 has quit IRC  [00:47]
*** michael-beaver has quit IRC  [00:50]
*** tosky has quit IRC  [00:56]
*** openstackstatus has joined #zuul  [01:04]
*** ChanServ sets mode: +v openstackstatus  [01:04]
*** jamesmcarthur has joined #zuul  [01:34]
*** igordc has quit IRC  [01:57]
*** rfolco has joined #zuul  [02:22]
*** rfolco has quit IRC  [02:27]
*** rlandy|bbl is now known as rlandy  [02:53]
*** rlandy has quit IRC  [02:56]
*** jamesmcarthur has quit IRC  [03:13]
*** openstackgerrit has joined #zuul  [05:55]
<openstackgerrit> Merged zuul/zuul-jobs master: install-podman: also install slirp4netns  https://review.opendev.org/695601  [05:55]
<ianw> Shrews: i know right!  next we'll need a trumpet winsock package  [05:58]
*** rfolco has joined #zuul  [05:58]
*** rfolco has quit IRC  [06:03]
*** zbr has joined #zuul  [06:06]
*** zbr|ooo has quit IRC  [06:08]
*** raukadah is now known as chkumar|rover  [06:37]
*** saneax has joined #zuul  [07:28]
*** tosky has joined #zuul  [08:15]
*** pcaruana has joined #zuul  [08:21]
*** bolg has joined #zuul  [08:36]
<openstackgerrit> Matthieu Huin proposed zuul/zuul master: authentication config: add optional token_expiry  https://review.opendev.org/642408  [09:03]
<openstackgerrit> Matthieu Huin proposed zuul/zuul master: Authorization rules: support YAML nested dictionaries  https://review.opendev.org/684790  [09:04]
*** mhu has joined #zuul  [09:04]
<mhu> Hey there, here are a few patches on the zuul_admin_web topic that have been +2'ed for a while but were blocked by another patch, can we get these to +3 now that the blocker was merged?  [09:07]
<mhu> https://review.opendev.org/#/c/642408/ and https://review.opendev.org/#/c/684790/  [09:07]
<mhu> and this one too while we're at it, as it'd simplify GUI integration https://review.opendev.org/#/c/684790/  [09:08]
*** sshnaidm is now known as sshnaidm|off  [09:21]
<openstackgerrit> Tobias Henkel proposed zuul/zuul master: Fix deletion of stale build dirs on startup  https://review.opendev.org/620697  [10:01]
<reiterative> Hi. Can anyone point me to some guidance on suitable memory / storage allocations for the various elements of Zuul (NodePool, scheduler, executors)? This is not for a production environment (yet), and will only need to support a small number of devs (10-20) and a modest number of repos initially. I'm expecting to discover a lot of this in the process, but if anyone has tips or lessons learned to share, I'd be very grateful!  [10:31]
*** yolanda__ has joined #zuul  [11:08]
*** yolanda has quit IRC  [11:09]
*** mhu has quit IRC  [11:23]
*** yolanda__ has quit IRC  [11:33]
*** yolanda has joined #zuul  [11:46]
*** yolanda has quit IRC  [11:52]
*** rfolco has joined #zuul  [11:55]
<fungi> reiterative: for our environment (i help run opendev), we mostly stick with smallish virtual machines for each component and scale out with more virtual machines as we need. we've been using 8gb ram flavors for zuul executors and nodepool builders, 2gb for dedicated zuul mergers and nodepool launchers, and then the bit which can't scale horizontally at the moment is the zuul scheduler which we're using a 30gb  [11:57]
<fungi> flavor for (it also hosts the zuul-web and zuul-finger daemons) but now that we seem to have gotten the memory usage under control there it could probably get by fine for us on 15gb, oh and 4gb flavor for our zookeeper cluster nodes  [11:57]
<fungi> as for storage, it really depends on how much data you're dealing with  [11:57]
<fungi> we build a bunch of different virtual machine images for different releases of at least half a dozen linux distributions, so we mount a 1tb block device on them to give diskimage-builder room to do its thing  [11:59]
<fungi> our executors each have an 80gb volume mounted on /var/lib/zuul  [12:00]
<reiterative> Thanks fungi - that's very helpful  [12:01]
<fungi> for size comparison, we've tuned our executors to be able to comfortably run around 100 concurrent builds each  [12:02]
<fungi> if you want to be able to see more detailed stats of the individual servers, we have public graphs: http://cacti.openstack.org/  [12:03]
<reiterative> Yes, I can see that the executors and individual nodes would need plenty of available resources, but what about the Zuul scheduler and nodepool instances?  [12:03]
<fungi> sizing nodepool instances will *really* depend on your job payloads. we standardize on 8gb ram, 80gb disk and 8vcpus for job nodes  [12:04]
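
For reference, a node size like the one fungi describes would be expressed in Nodepool's OpenStack provider configuration roughly as follows. This is a minimal sketch; the provider, cloud, image, and flavor names are illustrative, not OpenDev's actual configuration.

    # nodepool.yaml (sketch)
    labels:
      - name: ubuntu-bionic
        min-ready: 1

    providers:
      - name: example-cloud            # hypothetical provider
        cloud: example
        diskimages:
          - name: ubuntu-bionic
        pools:
          - name: main
            max-servers: 10
            labels:
              - name: ubuntu-bionic
                diskimage: ubuntu-bionic
                flavor-name: standard-8  # assumed flavor: 8 vCPU / 8GB RAM / 80GB disk
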
*** yolanda has joined #zuul  [12:05]
<fungi> the scheduler server really doesn't use much disk, though we put its /var/lib/zuul on a separate volume just in case  [12:05]
<fungi> keep in mind ours is a fairly large environment, with around a couple thousand projects and running roughly 1000 concurrent builds at peak  [12:09]
<reiterative> Yes, I am feeling very much out of my depth at this scale :-)  [12:09]
<fungi> but to handle that we're running several nodepool launchers, 12 zuul executors, 8 additional dedicated zuul mergers and a 3-node zk cluster  [12:10]
<fungi> and we spread the workload across many different public cloud providers  [12:10]
<fungi> there are other folks in here who run just a few concurrent builds for small teams who can hopefully give you an example from that end of the scale  [12:11]
<fungi> the key takeaway though is that you can start out small (even an all-in-one deployment on a single server if you really want) and then scale out by moving components to their own servers and adding more of them as you need  [12:12]
<reiterative> Thanks for your help - it's really useful to understand how it works at scale, as that's one of the reasons I'm so interested in Zuul  [12:13]
<reiterative> But I need to start small :-)  [12:14]
<fungi> yep. i think pabelanger has a few smallish deployments which might provide a good counterpoint, once he's around  [12:14]
<fungi> we've got regulars in here running zuul/nodepool at a wide variety of scales  [12:15]
<reiterative> Great  [12:16]
*** yoctozepto has quit IRC  [12:43]
*** yoctozepto has joined #zuul  [12:43]
*** jamesmcarthur has joined #zuul  [13:04]
*** rlandy has joined #zuul  [13:04]
<Shrews> ianw: Yeah. I believe the Internet truly was more fun when you had to experience it via Netscape, that ran on slackware linux, that you installed via 20-some floppies, that you obtained from hours spent in a computer lab with a green-bar printer, and then spent hours installing, then hours customizing Xconfig, just so you can dial into a local modem pool to get terminal access to run slirp to get said Internet access that was slow as mud (not  [13:05]
<Shrews> the game).  [13:05]
<fungi> Shrews: i just had this discussion yesterday on another irc network. for me the "good ol' days" of the internet were when you could walk up to just about any serial terminal or telnet to any server and log in as guest/guest  [13:07]
<Shrews> ah yes  [13:07]
* fungi tries to recapture that sense of community and belonging in everything he builds  [13:08]
* Shrews tries to telnet to yuggoth.org machines as guest  [13:09]
<fungi> so do a bazillion bots, so no that specific example is unfortunately no longer relevant  [13:10]
*** mhu has joined #zuul  [13:10]
*** mgoddard has quit IRC  [13:44]
*** mgoddard has joined #zuul  [13:45]
*** jamesmcarthur has quit IRC  [14:12]
*** jamesmcarthur has joined #zuul  [14:37]
*** themr0c has quit IRC  [14:54]
*** Goneri has joined #zuul  [15:17]
<openstackgerrit> Tobias Henkel proposed zuul/zuul master: Fix deletion of stale build dirs on startup  https://review.opendev.org/620697  [15:24]
<tobiash> corvus: this finally fixes the test cases of the executor job dir cleanup ^  [15:25]
<corvus> tobiash: i keep forgetting that hasn't merged  [15:32]
<corvus> should we wait 6 more days and merge it on its 1 year anniversary?  [15:32]
<tobiash> lol  [15:32]
<tobiash> works for me, but I'll pull it into our deployment now ;)  [15:33]
<corvus> heh, i'll review it asap, thanks!  [15:33]
<tobiash> unfortunately it got a little bit bigger than anticipated due to necessary test changes  [15:34]
<openstackgerrit> James E. Blair proposed zuul/zuul-jobs master: WIP: use-buildset-registry: add podman support  https://review.opendev.org/695051  [15:38]
*** bolg has quit IRC  [15:56]
*** saneax has quit IRC  [15:59]
<pabelanger> fungi: reiterative: yup, we've gone to the next step of sizing our nodes in nodepool, mostly because of budget reasons. So far, it has worked out pretty well. As for the control plane, we also have some minimal sized servers to run zuul, e.g. 1vcpu/1gb for zuul-merger; again it works well but took some time to figure out sizing  [16:01]
<tobiash> corvus: I see the yellow bell in the zuul on https://zuul.opendev.org/t/zuul/status, some unknown projects and undefined jobs  [16:02]
<tobiash> it doesn't look like important errors, just noticed  [16:02]
*** saneax has joined #zuul  [16:03]
<tobiash> grr, still a failing test case in tox remote :/  [16:04]
<openstackgerrit> Tobias Henkel proposed zuul/zuul master: Fix deletion of stale build dirs on startup  https://review.opendev.org/620697  [16:06]
<tobiash> but that should be it  [16:06]
*** adso78 has quit IRC  [16:26]
*** chkumar|rover is now known as raukadah  [16:31]
*** mattw4 has joined #zuul  [16:35]
*** yolanda has quit IRC  [16:50]
<clarkb> tobiash: looks like the quickstart job failed because the scheduler failed to connect to the mariadb container?  [16:56]
<corvus> i think that's a periodic failure we haven't figured out yet  [16:56]
<clarkb> looking at timestamps it appears that mysql reports ready to accept connections about 20 seconds after the timeout occurred  [16:56]
<corvus> hrm, i thought we observed that it was ready long before the timeout  [16:57]
<clarkb> From mysql log: 2019-11-22 16:33:06 0 [Note] mysqld: ready for connections.  [16:57]
<clarkb> From scheduler log: 2019-11-22T16:32:43+00:00 Timeout waiting for mysql  [16:58]
<corvus> huh.  what happened here?  https://zuul.opendev.org/t/zuul/build/b49f217927644946a85ac642742c0b2c/log/container_logs/mysql.log#61-63  [16:59]
<corvus> that's a 3 minute gap  [16:59]
<corvus> well, a 2 minute 17 second gap  [17:00]
<corvus> maybe we need to add another 3 minutes to our waiting?  [17:00]
<clarkb> I wonder if that is part of normal mariadb startup or is docker compose restarting things?  [17:08]
<clarkb> I guess it would be part of mariadb otherwise we'd get different containers in different log files?  [17:09]
<corvus> i suspect (but am not certain) that the mariadb container image starts mariadb, creates the database and user specified in the env vars, then restarts it.  but i don't know why it sat there for 2+ minutes.  [17:09]
<corvus> it's normally very fast.  [17:09]
<clarkb> https://jira.mariadb.org/browse/MDEV-13869 is a fixed bug but there are cases of mariadb being slow to start for reasons at least  [17:12]
<clarkb> I wonder if we can bump the logging verbosity during that temporary server startup time  [17:13]
<clarkb> we could build a zuul/mariadb image that has gone through the init step already  [17:16]
<openstackgerrit> James E. Blair proposed zuul/zuul-jobs master: WIP: use-buildset-registry: add podman support  https://review.opendev.org/695051  [17:25]
<corvus> clarkb: yeah, though if this only happens 5% of the time, just bumping the timeout might be okay  [17:25]
<openstackgerrit> Clark Boylan proposed zuul/zuul master: Increase mariadb log verbosity  https://review.opendev.org/695737  [17:27]
<clarkb> corvus: ^ that may give us more information if I've read the dockerfile, entrypoint, and mariadb docs properly  [17:27]
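
A minimal sketch of the kind of docker-compose tweak being discussed, assuming the docker-library mariadb image forwards extra command arguments to mysqld and that log_warnings is the relevant verbosity knob; whether the entrypoint's temporary init-time server also picks this up is an open question, which would be consistent with the later observation that the logging did not visibly improve.

    # docker-compose.yaml (sketch)
    services:
      mysql:
        image: mariadb
        # assumption: arguments given here are appended to the mysqld command line
        command: --log-warnings=9
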
<corvus> clarkb: want to recheck-bash that to try to see the error?  [17:28]
<clarkb> corvus: ya I think we should try to catch it pre merge if possible  [17:29]
*** Guest24639 has joined #zuul  [17:32]
*** Guest24639 is now known as mgagne_  [17:34]
*** michael-beaver has joined #zuul  [17:35]
<tobiash> clarkb: you could duplicate that job to increase the chance of hitting this error  [17:39]
<clarkb> ya and remove the other jobs too  [17:41]
<clarkb> I'll do that on a second patchset if I don't get lucky on the first  [17:41]
<pabelanger> looking at the zuul-build-image job we have, the artifact that is produced is uploaded to the intermediate registry, which lives outside of nodepool testing resources right? I am trying to figure out a good way to showcase the new artifact system zuul has, but use it in the context of the new collections we have in ansible.  Basically, build collection (tarball), publish to speculative galaxy, have jobs download  [17:42]
<pabelanger> that content and test with  [17:42]
<pabelanger> but, galaxy here is built on top of nodepool, not the control plane  [17:43]
<clarkb> pabelanger: correct, it is a long lived service  [17:43]
<clarkb> pabelanger: that is necessary to ensure depends on works  [17:43]
<pabelanger> Hmm  [17:43]
<pabelanger> so, in the case of galaxy we'd generate a pre-release tarball, using speculative info for the version (it is actually pbr right now). I am not sure how to handle parallel collections getting uploaded  [17:45]
<pabelanger> and conflicting  [17:45]
<corvus> pabelanger: don't forget about the buildset registry  [17:46]
<openstackgerrit> Matthieu Huin proposed zuul/zuul master: [WIP] admin REST API: zuul-web integration  https://review.opendev.org/643536  [17:46]
<corvus> there are 2 pieces: the buildset registry lets jobs within a change share images, the intermediate registry lets jobs between changes share images  [17:46]
<corvus> the buildset registry is ephemeral  [17:46]
<pabelanger> okay, that is likely what I am thinking of then  [17:47]
<corvus> you could do the same with galaxy by running a buildset galaxy to test speculative collections, then add in an intermediate galaxy to share them between changes.  [17:48]
<pabelanger> okay, yes. that seems to be the idea I want. For some reason I forgot about the buildset galaxy  [17:48]
<clarkb> to avoid conflicts you would need a tagging/versioning scheme that avoided them  [17:48]
<clarkb> since pre-merge, you are right, there may be multiple changes that can all become version x.y.z  [17:49]
<pabelanger> yah, I am not completely sure how we do that today with docker, but plan to look  [17:49]
<corvus> pabelanger: if you take this doc and s/registry/galaxy/ i think the architecture will mostly work: https://zuul-ci.org/docs/zuul-jobs/docker-image.html  [17:49]
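
Translated to the galaxy case, the project pipeline from that document might be wired up roughly like this. The job names are hypothetical; the pattern of job dependencies is the same one used for the buildset registry.

    # .zuul.yaml (sketch)
    - project:
        check:
          jobs:
            - run-buildset-galaxy            # hypothetical analogue of the buildset-registry job
            - build-collection:
                dependencies:
                  - run-buildset-galaxy
            - test-with-collection:
                dependencies:
                  - build-collection
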
<pabelanger> for the collection today, I just hacked up a simple script based on pbr to give us a semver number: https://github.com/ansible-network/releases/blob/master/ansible_releases/cmd/generate_ansible_collection.py  [17:50]
<pabelanger> corvus: ++ thank you  [17:50]
<corvus> the push-to-intermediate-registry and pull-from-intermediate-registry handle the collision case by giving the images special tags based on the buildset id  [17:50]
<corvus> if that wouldn't work for galaxy -- then, well, you don't actually need the intermediate "galaxy" to be a galaxy.  it could actually just be a static file server or swift or whatever.  [17:51]
<corvus> you can even just use existing log storage (like we do for promoting doc builds)  [17:51]
<corvus> we just thought that it made more sense to put container images in a registry, for space and other considerations.  [17:51]
<pabelanger> okay cool, yah I suspect it will take a little time to dig more into it. My main goal is to show a tighter integration for testing for galaxy, on top of zuul. And I currently think the way the docker artifact system works now can also work in the case of galaxy  [17:54]
<pabelanger> today we just build the collection tarball on disk, then install from it.  Adding galaxy into that mix I think will blow minds  [17:54]
<corvus> yeah, it's a great case  [17:55]
<corvus> it lets you test "if i published this change to galaxy, how would the stuff that depends on installing it from galaxy work?"  [17:55]
<pabelanger> ++ that is what I am hoping to demo to humans. I think that is going to be an important thing now with move to collections  [17:56]
<pabelanger> galaxy now becomes the source of truth for ansible users  [17:56]
<clarkb> tobiash: corvus I can't just list the same job multiple times under check right? I need to make a new variant of that job?  [17:56]
<tobiash> yes  [17:58]
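
A minimal sketch of that approach: define a second job that inherits from the existing zuul-quick-start job and list both in check (the -2 name is made up).

    # .zuul.yaml (sketch)
    - job:
        name: zuul-quick-start-2
        parent: zuul-quick-start

    - project:
        check:
          jobs:
            - zuul-quick-start
            - zuul-quick-start-2
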
*** jamesmcarthur has quit IRC  [18:00]
<openstackgerrit> Clark Boylan proposed zuul/zuul master: DNM: Increase mariadb log verbosity  https://review.opendev.org/695737  [18:08]
<clarkb> I guess the variables being set like that wasn't enough  [18:11]
<openstackgerrit> Clark Boylan proposed zuul/zuul master: DNM: Increase mariadb log verbosity  https://review.opendev.org/695737  [18:14]
<pabelanger> corvus: for zuul, the buildset registry is running in the zuul-build-image job and currently zuul-quick-start then consume the image from it, does that sound correct?  [18:17]
<clarkb> pabelanger: yes  [18:18]
<pabelanger> great  [18:18]
<pabelanger> do we do anything special in docker registry to create namespaces?  [18:20]
<pabelanger> I would assume yes  [18:20]
<corvus> pabelanger: for docker.io vs quay.io?  not yet, that's what i'm working on.  [18:21]
<corvus> but atm, there's only one galaxy, so that probably doesn't impact this idea.  [18:21]
<pabelanger> sorry, I was trying to understand if you did docker push foo/bar to buildset registry, do you first need to create the foo namespace on docker side. Or does the client create that at push?  [18:22]
*** igordc has joined #zuul  [18:23]
<corvus> pabelanger: the buildset registry will auto-create any repository pushed to it.  it requires authentication (which is randomly generated at the start of the build)  [18:24]
<corvus> pabelanger: a key part of doing this speculatively is configuring the clients to first try to pull from the buildset registry, and if that fails, pull directly from upstream.  [18:25]
<pabelanger> Yah, that makes sense  [18:26]
<corvus> pabelanger: i'm not sure if the galaxy client can be configured with mirrors in a fallback configuration (that's how we do it for container images).  if not, then we may need some kind of proxy.  (or, you know, add mirror support if it's missing)  [18:26]
*** irclogbot_1 has quit IRC  [18:27]
<pabelanger> corvus: in fact, ansible-galaxy CLI does support something like this now (multiple galaxy servers). So it should work out of the box if all galaxy servers are configured properly  [18:31]
<pabelanger> I haven't tested however  [18:31]
<corvus> ++  [18:31]
*** irclogbot_1 has joined #zuul  [18:31]
<openstackgerrit> James E. Blair proposed zuul/zuul-jobs master: WIP: use-buildset-registry: add podman support  https://review.opendev.org/695051  [18:35]
<openstackgerrit> Merged zuul/zuul master: Fix deletion of stale build dirs on startup  https://review.opendev.org/620697  [18:58]
<tobiash> \o/  [19:05]
*** gtema has joined #zuul  [19:08]
<clarkb> my change seems to have caught some slow mariadb starts but I don't think we got any better logging  [19:19]
<clarkb> based on my reading of the mariadb docs I would've expected more innodb logs at least. Which makes me think that my change isn't taking effect  [19:20]
<openstackgerrit> James E. Blair proposed zuul/zuul-jobs master: WIP: use-buildset-registry: add podman support  https://review.opendev.org/695051  [19:20]
<Shrews> tobiash: Do we know for sure that subprocess.run() cannot throw any exception in your new call in https://review.opendev.org/676266 ? I fear if it does, and we don't handle that, we leave things in a weird state  [19:21]
<Shrews> I'd feel better if there were a try: except: around that  [19:22]
<corvus> ++  [19:23]
<clarkb> hrm I don't get the Online DDL messages but I do get the plugin initialization messages and the aborted connection messages. Maybe it is logging at a higher verbosity after all  [19:23]
<tobiash> Shrews: you're right, the docs about that are not clear enough, I'll wrap it  [19:25]
<Shrews> tobiash: cool. just left a note. same for the deleteImage() call in that section  [19:26]
<tobiash> ok, I'll change that probably on monday  [19:27]
<Shrews> tobiash: everything else looks great though  [19:27]
<tobiash> thanks!  [19:27]
*** gtema has quit IRC  [19:36]
<Shrews> tobiash: also, tristanC found a nit in https://review.opendev.org/693672 that probably should be fixed either in that review or a follow up  [19:38]
<tobiash> Shrews: I'll update this as well  [19:42]
*** michael-beaver has quit IRC  [19:45]
<pabelanger> so, this just happened: vmware-vcsa-6.7.0-vexxhost-ca-ymq-1-0000163120  [19:47]
<pabelanger> doing a POC with zuul.a.c, on how we can use it to test vmware modules for ansible  [19:48]
<clarkb> https://github.com/docker-library/mariadb/issues/261 comparing logs I am pretty sure this is the cause of our mariadb slowness  [19:51]
<openstackgerrit> Clark Boylan proposed zuul/zuul master: DNM: Increase mariadb log verbosity  https://review.opendev.org/695737  [19:53]
<clarkb> this ps applies a workaround that will break tzinfo data in mysql (but I expect we don't use that in zuul anyway and gerrit isn't using this db)  [19:53]
<clarkb> they basically seem to say you need nvme storage now  [19:54]
<clarkb> (someone said "I have an ssd and this affects me too" and then the response was "ya but your ssd isn't fast like nvme")  [19:55]
<fungi> yeesh  [19:55]
<Shrews> i wonder if the oracle version exhibits the same behavior  [19:57]
<Shrews> or even percona  [19:58]
<clarkb> I also wonder if they could bootstrap the tzinfo in the container image build  [19:59]
<clarkb> I guess not because we mount in the data dir  [19:59]
<clarkb> and the tzinfo stuff goes into normal tables looks like  [19:59]
<fungi> i still can't wrap my head around why they don't just store/replicate subsecond precision epoch and convert to/from that as needed. must be they're stuck with standards decisions made long, long ago  [20:00]
<Shrews> I vote we just blame LinuxJedi since he isn't here to defend himself  [20:01]
<Shrews> :)  [20:01]
<corvus> pabelanger: neato :)  [20:04]
<fungi> i thought that's what we were doing  [20:04]
*** saneax has quit IRC  [20:05]
<corvus> clarkb: we could pin to an older version in the quickstart  [20:08]
<clarkb> corvus: ya looks like 10.4.7 should work  [20:08]
<clarkb> 10.4.8 introduced the new tzinfo behavior on the 10.4 series  [20:09]
<clarkb> 10.4.10 is what we currently use  [20:10]
<corvus> but i also agree we're unlikely to be bothered by the SKIP if we just wanted to do that  [20:10]
*** yolanda has joined #zuul  [20:10]
<clarkb> ya I think we store everything in utc in the db and if we need to convert for users that should happen outside the db  [20:13]
<Shrews> fyi: i'll be afk all week next week. someone be sure to feed and water Zuul while i'm away.  [20:29]
<corvus> Shrews: happy thanksgiving!  [20:29]
<corvus> i'll be afk as well  [20:29]
<Shrews> you too! and all other US-based folks  [20:29]
<corvus> and everyone else, enjoy the break from the us-based folks :)  [20:30]
<openstackgerrit> Clark Boylan proposed zuul/zuul master: Disable Mariadb TZINFO table generation  https://review.opendev.org/695737  [20:33]
<clarkb> ok I've cleaned up my change and chosen the TZINFO skip. Largely because it seems to be what others are doing, so if we have further problems hopefully we find a path forward easily  [20:33]
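
A rough sketch of the shape of that workaround in the quickstart's docker-compose, assuming the docker-library mariadb entrypoint's MYSQL_INITDB_SKIP_TZINFO switch is the knob being used; the actual change may differ in variable name and service definition.

    # docker-compose.yaml (sketch)
    services:
      mysql:
        image: mariadb
        environment:
          # assumption: any non-empty value makes the entrypoint skip loading
          # the timezone tables during database initialization
          MYSQL_INITDB_SKIP_TZINFO: "1"
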
<clarkb> I did mention a few other options that we could try instead in the commit message  [20:33]
<clarkb> I'll be around early next week before disappearing.  [20:34]
<Shrews> i think that's an acceptable change for us for now  [20:34]
*** mhu has quit IRC  [20:39]
<openstackgerrit> James E. Blair proposed zuul/zuul-jobs master: WIP: use-buildset-registry: add podman support  https://review.opendev.org/695051  [20:41]
<openstackgerrit> Merged zuul/zuul master: web: handle jobs that have multiple parent  https://review.opendev.org/695450  [21:09]
<openstackgerrit> James E. Blair proposed zuul/zuul-jobs master: WIP: use-buildset-registry: add podman support  https://review.opendev.org/695051  [21:13]
*** mgagne_ is now known as mgagne  [21:13]
<openstackgerrit> Merged zuul/zuul master: Disable Mariadb TZINFO table generation  https://review.opendev.org/695737  [21:18]
*** rlandy has quit IRC  [21:18]
<corvus> the podman job just passed  [21:26]
<openstackgerrit> James E. Blair proposed zuul/zuul-jobs master: Add build-container-image role  https://review.opendev.org/695251  [21:34]
<openstackgerrit> James E. Blair proposed zuul/zuul-jobs master: use-buildset-registry: Add podman support  https://review.opendev.org/695051  [21:34]
*** mattw4 has quit IRC  [21:54]
*** rfolco has quit IRC  [21:58]
*** mattw4 has joined #zuul  [21:59]
*** Goneri has quit IRC  [22:02]
<corvus> tristanC, mordred, ianw: ^ the speculative podman container stack is green.  [22:04]
*** nhicher has quit IRC  [22:24]
*** nhicher has joined #zuul  [22:32]
*** mattw4 has quit IRC  [23:01]
*** tosky has quit IRC  [23:45]

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!