Wednesday, 2019-12-18

*** tosky has quit IRC00:00
*** mattw4 has quit IRC00:02
SpamapSmnaser: v3 is pretty nice.. I'd be down for testing. :)00:03
mnaserSpamapS: yeah, funnily enough tho, we've historically never used tiller ever :p00:03
mnaserso helm v3 was a "hey glad y'all finally caught on" :P00:03
*** saneax has quit IRC00:20
corvusmnaser: sweet -- maybe we can use those in the gerrit-review deployment00:21
mnasercorvus: ill push it up at some point, probably later today or tomorrow and then we can look about adding it to opendev00:22
corvus++00:22
mnaserthe only help i'd appreciate (i tried this and failed a while back) is tagged releases on dockerhub, that would be nice00:25
*** rlandy has quit IRC00:29
corvusagreed; i'll try to get to that if someone else doesn't soon, but i've a bit larger backlog than normal right now00:35
*** sgw has joined #zuul01:01
SpamapSTBH, Helm is a far far simpler approach than the operator.01:09
SpamapSIt can't do as much, but it's definitely intended as "package management for k8s apps" rather than "do all the magic"01:09
mnaserSpamapS: yep, its a good intermediate step for now01:12
SpamapSWe've actually shifted over to managing kubernetes objects with Terraform of late. The lifecycle and dependency management is particularly nice.01:20
SpamapSBut we have a bunch of helm stuff that will stay in helm for a while.01:21
*** swest has joined #zuul01:30
*** swest has quit IRC01:35
*** swest has joined #zuul01:50
*** bhavikdbavishi has joined #zuul02:47
*** gouthamr has quit IRC03:14
*** gouthamr has joined #zuul03:24
*** jamesmcarthur has joined #zuul03:29
*** jamesmcarthur has quit IRC03:44
*** pcaruana has joined #zuul05:24
*** raukadah is now known as chkumar|rover05:58
*** saneax has joined #zuul06:24
*** jamesmcarthur has joined #zuul06:46
*** jamesmcarthur has quit IRC06:51
*** jcapitao|afk has joined #zuul07:27
*** jcapitao|afk is now known as jcapitao07:28
*** gtema_ has joined #zuul07:58
*** tosky has joined #zuul08:18
*** jpena|off is now known as jpena08:22
*** gtema_ has quit IRC08:26
*** tosky has quit IRC08:32
*** avass has joined #zuul08:44
*** themroc has joined #zuul08:49
*** tosky has joined #zuul09:16
*** mhu has joined #zuul09:33
*** bhavikdbavishi has quit IRC09:58
*** tosky has quit IRC10:02
*** tosky has joined #zuul10:03
*** jcapitao is now known as jcapitao|afk11:29
*** sshnaidm has quit IRC12:04
*** bhavikdbavishi has joined #zuul12:04
tristanCfwiw i'm still exploring dhall-lang, and for application deployment here is what i wrote for zuul: https://github.com/TristanCacqueray/dhall-operator/blob/master/applications/Zuul.dhall12:07
*** avass has quit IRC12:11
*** mgoddard has quit IRC12:31
*** jpena is now known as jpena|lunch12:41
*** armstrongs has joined #zuul12:41
*** mgoddard has joined #zuul12:49
*** armstrongs has quit IRC12:50
*** mgoddard has quit IRC12:54
*** rlandy has joined #zuul12:57
*** sshnaidm has joined #zuul12:58
*** Goneri has quit IRC12:58
*** sshnaidm has quit IRC13:02
*** sshnaidm has joined #zuul13:03
*** AshBullock has joined #zuul13:04
*** electrofelix has joined #zuul13:05
*** jamesmcarthur has joined #zuul13:05
*** mgoddard has joined #zuul13:09
AshBullockHey all, I have a question on the nodepool kubernetes driver, I'm seeing some jobs running on our eks cluster hitting RETRY_LIMIT intermittently, I was wondering if there is an undocumented max-pods setting for kubernetes similar to the openshift driver ? https://zuul-ci.org/docs/nodepool/configuration.html#attr-providers.[openshiftpods].max-pods13:09
*** jamesmcarthur has quit IRC13:12
*** jamesmcarthur has joined #zuul13:13
*** jcapitao|afk is now known as jcapitao13:16
*** ssbarnea has quit IRC13:22
*** jamesmcarthur has quit IRC13:29
ShrewsAshBullock: no, there is no such setting in the kubernetes driver13:33
ShrewsAshBullock: that being said, quota issues (such as max-pods) should not cause RETRY_LIMIT errors. You would see messages in nodepool about "not enough quota to satisfy" the request, and it would simply not handle the request until quota freed up. You are likely hitting some sort of communication issue, I'm guessing (not a k8s expert).13:39
Shrewsfirst place you may want to look is the zuul executor logs for the builds encountering the retry. might be more info there13:43
*** jpena|lunch is now known as jpena13:45
*** Goneri has joined #zuul13:46
*** bhavikdbavishi has quit IRC13:48
*** jamesmcarthur has joined #zuul13:55
*** jamesmcarthur has quit IRC13:59
AshBullockThanks for the help Shrews, I'll take a look at the executor logs to see if I can find anything14:00
*** sshnaidm has quit IRC14:02
*** jamesmcarthur has joined #zuul14:03
Shrewscorvus: the etherpad lgtm. who is going to run/operate their zuul?14:04
*** chkumar|rover is now known as chandankumar14:07
clarkbNODE_FAILURE is what you get if you can't provision with nodepool14:14
clarkbretry limit happens when the job runs but zuul detects failures in pre-run or ansible returns an exit code indicating network problems14:15
clarkbzuul by default tries 3 times to run the job when it hits this14:15
clarkbfailing 3 times results in retry limit14:15
clarkbAshBullock: ^14:15
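
Editor's note: the distinction clarkb draws above can be sketched as a small model. This is not Zuul's code; the outcome labels and the `run_job` helper are hypothetical names chosen for illustration, and the default of 3 mirrors the "tries 3 times" figure quoted in the log.

```python
# Hypothetical outcome labels for one build attempt (not Zuul's internal names).
SUCCESS = "success"
RUN_FAILURE = "run_failure"          # the job itself failed; not retried
PRE_RUN_FAILURE = "pre_run_failure"  # failure detected in a pre-run playbook
NETWORK_FAILURE = "network_failure"  # ansible exit code suggesting network trouble

RETRYABLE = {PRE_RUN_FAILURE, NETWORK_FAILURE}


def run_job(run_once, node_provisioned=True, attempts=3):
    """Return the result reported for a job in this toy model.

    run_once(attempt_number) returns one of the outcome labels above.
    """
    if not node_provisioned:
        # Provisioning never succeeded, so the job never ran at all.
        return "NODE_FAILURE"
    for attempt in range(1, attempts + 1):
        outcome = run_once(attempt)
        if outcome == SUCCESS:
            return "SUCCESS"
        if outcome not in RETRYABLE:
            return "FAILURE"
        # Retryable outcome: loop around and try again.
    return "RETRY_LIMIT"
```

In this model an intermittent RETRY_LIMIT means every attempt ended in a retryable failure, which is why the executor logs for the affected builds are the suggested place to look.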
mordredShrews: they/we are14:20
*** themroc has quit IRC14:21
mordredShrews: idea being having a community repo/repos sort of similar to opendev where we have a gitops repo for running it14:23
*** bhavikdbavishi has joined #zuul14:48
*** sshnaidm has joined #zuul14:53
*** AshBullock has quit IRC15:04
*** ssbarnea has joined #zuul15:12
mnaserhrm, the latest nodepool-builder image on dockerhub doesnt have debootstrap15:28
mnaserhttps://opendev.org/zuul/nodepool/commit/46d0ce248326127c2d883a415af98fea66af889d this commit implies "Note the sibling build will have installed many of these from the bindep.txt file from diskimage-builder itself." but adding "However, when using releases this is not done."15:28
mnaserdoes that mean i have to build my own images and add these bits on top of it (i'm ok with that, it just seems like uh, work that new users might struggle with)15:29
mordredhrm15:29
mordredI do not believe that's the intent15:29
mordredmnaser: let's loop corvus in when he gets up so we can talk through it15:30
mnaserya looking at the /var/log/apt/history.log in the container, its not htere15:31
mnaser*there15:31
mordredmnaser: (the ultimate solution here is the finishing of the docker-base-image patches for dib so that debootstrap is not needed anymore)15:31
mnasermordred: yeah i was thinking about that too!  i was wondering about the feasibility of running docker-base-image inside a container15:31
*** panda has quit IRC15:31
mordredmnaser: should be fine actually - it just does a podman export15:32
mordredso it doesn't actually _run_ a container build or anything - just fetches and then exports the filesystem15:32
mnasermordred: i guess we'd need podman as a runtime dependency in that case but that's fine by me15:32
mordredyah15:32
mordred(it's actually written to work with podman or docker - but I think podman is the nicer runtime dep)15:33
mnaseryeah its probably not gonna try and mess around with trying to get a systemd service/etc done15:33
mordredmnaser: https://review.opendev.org/#/c/693619/ is what I've got so far15:33
*** panda has joined #zuul15:33
mnasermordred: i actually searched for that patch hoping it'd have merged15:34
mordredmnaser: I think that part is solid enough - the hard bits are going to be https://review.opendev.org/#/c/693642/ and similar for the other base os images15:34
mnaserand then i would have started using it =P15:34
clarkbmordred: mnaser that commit merged yesterday, maybe it failed to upload the updated image?15:34
mordredmnaser: honestly - I think it's all likely not too hard to get moving15:34
clarkbor your pull is from pre merge?15:34
mnaserclarkb: the last merge was 18 hours ago and the most recent image in dockerhub was 18 hours ago (and also the one i have)15:34
mordredclarkb: I don't think that commit is sufficient15:34
clarkboh I thought debootstrap was specifically listed15:35
mordredclarkb: somewhere we missed that we need some of the siblings behavior in production builds too15:35
mordredit's not15:35
clarkbah15:35
mordredI'm honestly not sure what the *right* solution is - think it's worth a quick discussion - I'm pretty sure implementing the right solution won't be as hard as figuring out what it is :)15:36
mnaseryep, agreed15:36
mnaseri think for now ill kinda just uh, have an image with a few extra packages (based on the most recent tagged release)15:36
mnaserjust to unblock the helm charts work im at15:37
mordredmnaser: I betcha those two dib patches would get you a bootable ubuntu image if you did container-base-image vm ubuntu-kernel DIB_CONTAINER_IMAGE=docker.io/library/ubuntu15:37
mnaserso far launcher works well (tested with cloudimages) and working on builder now15:37
mordredmnaser: woot15:37
clarkbI think if we are installing vhd-utils and debian-keyring and yum etc we may as well install debootstrap15:38
clarkbyum is the equivalent of debootstrap there for red hat distros15:39
mordredI agree15:39
openstackgerritMonty Taylor proposed zuul/nodepool master: Add debootstrap to builder package list  https://review.opendev.org/69970715:40
clarkbI know ianw intends to get this into production after PTO15:40
mordredclarkb, mnaser: ^^15:40
clarkbI expect things will work a bit more happily out of the box once we dogfood it15:41
mnaserclarkb: thats a very reasonable argument IMHO15:41
*** ssbarnea has quit IRC15:46
mnaserfwiw mnaser/nodepool-builder:latest is running with mordred patch, so ill test that and see if any other things pop up missing15:50
tristanCSoftware Factory 3.4 has been released, among other things it removes SCL for python3 and the zuul rpm doesn't have patches anymore: https://www.softwarefactory-project.io/releases/3.4/15:51
corvusohai15:53
clarkbtristanC: removes SCL because centos/rhel 7 provide python 3 directly?15:54
mnaserchallenge #2: sudo mount --bind /opt/cache/apt/debian /tmp/dib_build.8Jsgxogy/mnt/var/cache/apt/archives => mount: /tmp/dib_build.8Jsgxogy/mnt/var/cache/apt/archives: permission denied. -- gonna guess i have to find the right capability to add to this container15:55
tristanCclarkb: yes, we rebuilt every python3 components using the python-3.6 provided by el715:55
* mnaser goes back to research15:55
clarkbmnaser: ya you'll need privileges15:55
corvusmordred: ++ 69970715:55
mnaserclarkb: ya im trying to avoid privileged: true and finding the right caps to add..15:55
corvusmnaser: you are my hero15:56
clarkbI think mount is its own cap?15:56
mordredmnaser: we've been wanting a human to do that for like 2 years now15:56
tristanCclarkb: we still enable the rh-git-218 SCL because zuul needs a more recent git15:56
mnasercorvus, mordred \o/15:56
clarkbtristanC: hrm what aspect of zuul requires newer git?15:56
mnasermount seems to require CAP_SYS_ADMIN:X15:57
tristanCclarkb: iirc GIT_SSH_COMMAND doesn't work on el715:57
mordredof course it does15:57
mnasersuper unrelated and old school but http://linux-vserver.org/Capabilities_and_Flags -- seems like there are contexts15:57
mnaserand we can build caps based on contexts, SECURE_MOUNT which is allowing to mount15:57
mnaseri wonder if k8s can support these15:57
mnaserseems like a vserver construct tho :<15:58
clarkbthere are ways around that if people want to hack on dib. One method is FUSE (might make builds slower?) another is mkfs.* for certain filesystems can take an existing fs tree and write it into the new fs on a file without mounting it15:59
clarkbext4 can do that but I don't think xfs or btrfs can16:00
mnaseryeah it's gonna have to be CAP_SYS_ADMIN because thats the only way you can get `mount` :(16:03
corvusi wonder if it's because of things like proc and sysfs16:04
clarkbcorvus: that and the way it writes the file image out is to mount a file as block device16:04
mnaseractually proc and sysfs seem to be ok to mount by default16:04
clarkbit then unmounts that file and you get the .raw image. This is then converted to other formats16:04
clarkbin theory that could be fuse mounted16:05
clarkbit could also have its contents written directly by mkfs if the fs types support that16:05
mnaserapparently certain file systems have a `FS_USERNS_MOUNT` flag (procfs, tmpfs, sysfs) which makes them ok, but mounting ext4/nfs/btrfs/overlayfs etc are no bueno16:06
mordredmnaser: how does img work? (or the other "build images in unprivileged containers using user namespaces")16:08
clarkbmordred: they don't have to create a proper filesystem aiui16:09
mnasermordred: i mean i think i could have avoided that error by disabling the apt cache16:09
mnasercause it was trying to bind mount /var/cache/apt/archives16:09
clarkbmnaser: I don't think so. the end of dib runs is to mkfs on a file, mount it, write the fs out, unmount it, and convert from raw to $format16:10
clarkbthat mount will need perms16:10
mnaseryeah so i would have failed later if i disabled it16:10
clarkbyes16:10
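
Editor's note: the two ways of producing the final image that clarkb describes (mount a file-backed filesystem and copy into it, versus letting mkfs populate ext4 directly from a tree) can be contrasted with a sketch that only builds command lists and executes nothing. It is a simplification, not what diskimage-builder runs: the function names, the default size, and the use of `mount -o loop` are assumptions for illustration. It relies on `mke2fs`'s `-d` option, which copies a directory into the new filesystem.

```python
def mount_based_steps(image, tree, mount_point, fmt="qcow2", size="2G"):
    """Commands for the mkfs / mount / copy / unmount / convert flow."""
    raw = f"{image}.raw"
    return [
        ["truncate", "-s", size, raw],                      # sparse backing file
        ["mkfs.ext4", "-F", raw],                           # filesystem on the file
        ["sudo", "mount", "-o", "loop", raw, mount_point],  # needs mount privileges
        ["sudo", "cp", "-a", f"{tree}/.", mount_point],     # write the tree out
        ["sudo", "umount", mount_point],
        ["qemu-img", "convert", "-f", "raw", "-O", fmt, raw, f"{image}.{fmt}"],
    ]


def mountless_ext4_steps(image, tree, fmt="qcow2", size="2G"):
    """Same result for ext4 without any mount: mkfs copies the tree itself."""
    raw = f"{image}.raw"
    return [
        ["truncate", "-s", size, raw],
        ["mkfs.ext4", "-F", "-d", tree, raw],               # populate from directory
        ["qemu-img", "convert", "-f", "raw", "-O", fmt, raw, f"{image}.{fmt}"],
    ]
```

The second list contains no mount or loop-device step, which is the property that would matter inside an unprivileged container; as noted in the log, this route is not expected to be available for xfs or btrfs.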
*** chandankumar is now known as raukadah16:10
mnaserbtw lol, the next nugget16:10
mnaser"ps: command not found"16:10
mnaser:p16:10
clarkbdib is running ps or you are?16:11
mnaser /usr/local/lib/python3.7/site-packages/diskimage_builder/lib/common-functions: line 177: ps: command not found16:11
mnaserdib is16:11
mnaserhttps://github.com/openstack/diskimage-builder/blob/master/diskimage_builder/lib/common-functions#L172-L18916:11
clarkbdo we need to install coreutils/sysutils because the container images strip that out16:11
clarkbalso ^ would make the container image based bootstrap weird I expect16:12
mnaserim gonna keep iterating and push up a patch with all the things ill find..16:13
mnasermkdir: cannot create directory '/etc/modprobe.d': Permission denied16:19
mnaserhmmm16:19
mnaserthis feels like a bug16:20
mnaserhttps://opendev.org/openstack/diskimage-builder/src/branch/master/diskimage_builder/elements/modprobe/extra-data.d/50-modprobe-blacklist -- it should be `$TMP_MOUNT_PATH/etc/modprobe.d` no ?16:21
mnaseror maybe it should be prefixed with a sudo16:21
clarkbmnaser: I think it should be $TMP_MOUNT_PATH prefixed16:24
mnaserright, because we're not actually trying to check if the host has kmod or not..16:24
clarkbexactly16:24
mnaserok ill push up a patch now16:24
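
Editor's note: the fix agreed on here is to resolve the directory relative to the image's mount point rather than the build host's root. The element itself is a shell script; the sketch below only illustrates the path logic in Python, and `image_path` is a hypothetical helper, not part of diskimage-builder.

```python
import os


def image_path(tmp_mount_path, path_in_image):
    """Map an absolute path inside the image to its location on the build host."""
    # Drop the leading slash first: os.path.join discards earlier components
    # when a later one is absolute, which would silently target the host.
    return os.path.join(tmp_mount_path, path_in_image.lstrip("/"))
```

For the mount point shown later in the log, `image_path("/tmp/dib_build.qfvFRgnB/mnt", "/etc/modprobe.d")` yields the `/tmp/dib_build.qfvFRgnB/mnt/etc/modprobe.d` path that the follow-up error message refers to.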
*** ssbarnea has joined #zuul16:26
openstackgerritMerged zuul/nodepool master: Add debootstrap to builder package list  https://review.opendev.org/69970716:27
mnaserhttps://review.opendev.org/699722 modprobe.d: use $TMP_MOUNT_PATH16:27
openstackgerritMohammed Naser proposed zuul/nodepool master: Add procps to packages in Dockerfile  https://review.opendev.org/69972516:32
mnaserhmm16:33
mnasermkdir: cannot create directory '/tmp/dib_build.qfvFRgnB/mnt/etc/modprobe.d': Permission denied16:33
mnaseri guess we gotta sudo that?16:33
clarkbpossibly. The relative permissions get mind bendy there because that is in the nested fs and ya /etc there is probably owned by uid 0 which is root16:34
clarkband sudo will reconcile that16:34
mnaserok great, got past that16:46
mnaserim gonna keep putting things inside https://review.opendev.org/#/q/topic:nodepool-in-k8s16:47
*** jcapitao is now known as jcapitao|afk16:48
*** rlandy is now known as rlandy|brb17:17
*** hashar has joined #zuul17:21
*** bhavikdbavishi has quit IRC17:23
*** mattw4 has joined #zuul17:35
*** jpena is now known as jpena|off17:36
*** panda has quit IRC17:36
*** panda has joined #zuul17:39
mnaserexec_sudo: losetup: cannot find an unused loop device17:40
mnaseri've wrestled this enough for a while with no success, added CAP_MKNOD and no bueno17:41
mnaserapparently we do some mknods but that feels wrong https://serverfault.com/a/72049617:42
mnaserand seems to imply that they are shared with the host17:42
tristanCmnaser: perhaps try to bind mount /dev/loop-control or authorize the c:10:237 device ?17:42
mnasertristanC: ok, i'll dig from there17:45
*** rlandy|brb is now known as rlandy17:46
openstackgerritTristan Cacqueray proposed zuul/zuul master: spec: add a zuul-runner cli  https://review.opendev.org/68127717:59
mnaserhmm18:04
mnaserwe can look into this at some point: https://github.com/braincorp/partfs18:04
*** rfolco has quit IRC18:07
*** electrofelix has quit IRC18:09
tristanCmnaser: you'd still need a device access, e.g. /dev/fuse18:09
clarkbtristanC: ya but anyone can read and write to that device18:10
clarkbat least on my machine18:10
clarkbyou won't need additional permissions/capabilities past that aiui18:10
mnasertristanC, clarkb: but apparently also fuse can be mounted inside containers18:10
mnaseraccording to some article i remember reading at some point18:10
pabelangermnaser: https://review.opendev.org/415927/ might be helpful, that was my last attempt at DIB inside docker, back in 201718:13
pabelangerthere was even work to use docker for dib matrix of tests18:14
*** yolanda__ is now known as yolanda18:18
*** tosky has quit IRC18:23
*** Goneri has quit IRC18:28
mnaserpabelanger: neat18:35
mnaserim so tempted to just say f'it and add "privileged: true" :(18:35
mnaserwe're at: "failed to set up loop device: Operation not permitted"18:36
tobiashmnaser: there was a time when containerized dib leaked loop devices18:36
mnasertobiash: thats what im worried about too18:36
mnaseri exposed a single loop device only (loop0)18:36
tobiashmnaser: you'll need to use privileged18:36
pabelangermnaser: yah, needs to be privileged right now18:36
mnasertobiash: ive been adding manual CAPS as needed..18:37
mnaserim at CAP_MKNOD and CAP_SYS_ADMIN ..18:37
tobiashok, that'll take some time to find all needed privs ;)18:37
mnaseri think ill cutover to privileged for now to make sure it works then i will scale back18:38
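
Editor's note: the two options being weighed (a short list of added capabilities versus full privileged mode) correspond to different container `securityContext` settings in Kubernetes. The sketch below builds that fragment as a plain dictionary; `builder_security_context` is a hypothetical helper, and the capability list is only the pair named in the log, which the participants report was not sufficient on its own for loop device setup.

```python
import json


def builder_security_context(privileged=False, extra_caps=("SYS_ADMIN", "MKNOD")):
    """Return a container securityContext fragment as a dict."""
    if privileged:
        # Full access: what the log falls back to in order to get a working build.
        return {"privileged": True}
    # Narrower variant: Kubernetes capability names omit the CAP_ prefix.
    return {"privileged": False, "capabilities": {"add": list(extra_caps)}}


if __name__ == "__main__":
    print(json.dumps(builder_security_context(), indent=2))
```

Starting from `privileged=True` and scaling back, as mnaser proposes, amounts to switching to the second branch and growing `extra_caps` until the build passes.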
tobiashmnaser: for the loopdev leak, we have this in the root.d phase: http://paste.openstack.org/show/787742/18:41
tobiashbut no idea if that's still required18:41
tobiashwe needed this back in zuulv2 days and stick with that18:42
mnasertobiash: ya i remember running into similar issues a long time ago18:43
*** openstackgerrit has quit IRC18:43
mnaserya privileged just uh, fixed it all, but we'll see.18:44
clarkbtobiash: mnaser re the loop leak I would expect that would affect containerized and not containerized dib the same and i don't believe that is something we see leaking on our builders18:44
clarkbheh but now that I check I think maybe we do18:45
mnaser:P18:45
clarkbwhat is weird about that is we don't seem to hit the node limit18:45
clarkbso we don't leak them quickly?18:45
clarkbin any case that isn't container specific18:45
mnaseri think im going to make builders a statefulset18:46
tobiashclarkb: back then we had this issue only in dockerized envs18:46
mnaserbecause the builder hostname will be changing often during redeploys and the builder ids are constantly changing18:47
clarkbmnaser: that shouldn't matter? the biggest reason to make it stateful will be to keep the cache around so that your builds are faster18:47
tobiashmnaser: yes, builders need to be a statefulset18:47
mnaserhttps://www.irccloud.com/pastebin/VSmF0iC0/18:47
clarkbtobiash: oh?18:47
tobiashAs well as the executors18:47
tobiashclarkb: but I don't remember the reasons18:48
mnasereverytime you redeploy, it'll be a different hostname18:48
mnaserthe executors might make sense bc of the cache18:48
clarkbmnaser: hrm and I guess we use the hostnames to identify deleting images?18:48
mnaseryep18:48
mnaserso they're not deleting cause those "nodes" arent responding18:48
clarkbseems like we could make that better in nodepool, but probably also low priority18:49
clarkb(the image is already deleted on the nodepool side if the host is gone so it should noop and be happy there)18:49
tobiashmnaser: the executors also need a stable identity because of live streaming18:49
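
Editor's note: the cleanup problem mnaser describes comes from image records being tied to the hostname of the builder that created them. The toy model below is not nodepool's data model; the record fields and `split_cleanup_work` are hypothetical. It shows why pod names that change on every redeploy leave records nobody claims, while the stable `name-0`, `name-1` pod names of a StatefulSet do not.

```python
def split_cleanup_work(image_records, live_hostnames):
    """Return (claimable, orphaned) records that are waiting for deletion."""
    live = set(live_hostnames)
    claimable, orphaned = [], []
    for rec in image_records:
        if rec["state"] != "deleting":
            continue
        # Only the builder whose hostname matches the record acts on it.
        (claimable if rec["builder"] in live else orphaned).append(rec)
    return claimable, orphaned


records = [
    {"image": "ubuntu-1", "builder": "nodepool-builder-6c9f7-x2k4q", "state": "deleting"},
    {"image": "ubuntu-2", "builder": "nodepool-builder-0", "state": "deleting"},
    {"image": "ubuntu-3", "builder": "nodepool-builder-0", "state": "ready"},
]
# After a redeploy, only the StatefulSet-style hostname is still alive.
claimable, orphaned = split_cleanup_work(records, ["nodepool-builder-0"])
```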
*** openstackgerrit has joined #zuul18:50
openstackgerritMerged zuul/nodepool master: Add procps to packages in Dockerfile  https://review.opendev.org/69972518:50
openstackgerritMerged zuul/nodepool master: Functional tests - use common verification script  https://review.opendev.org/69883418:50
*** rfolco has joined #zuul18:52
*** sshnaidm is now known as sshnaidm|afk18:58
*** jamesmcarthur has quit IRC19:03
mnaserok so https://review.opendev.org/#/c/699722/ helped me build images locally if anyone wants to help push that through19:25
clarkbmnaser: +2. ianw is out on pto so may not get to it soon. If another infra reviewer can ack it though I think we can merge it without ianw19:26
*** jamesmcarthur has joined #zuul19:26
* mnaser is trying to avoid having a local build as much as possible19:27
*** mgoddard has quit IRC19:33
*** mgoddard has joined #zuul19:34
*** Goneri has joined #zuul19:37
*** jamesmcarthur has quit IRC19:38
*** mhu has quit IRC19:41
SpamapSI hit an interesting problem today19:48
SpamapSwe have a job in our gate that creates a terraform plan... that's a diff against the infrastructure that it saves as an artifact...19:48
SpamapSbut we don't apply until promote, post-merge. The promote job goes and digs out the artifact, and applies that diff, or complains if the infrastructure changed and the diff is stale.19:49
SpamapSWe approved two changes in rapid succession, and zuul went [gate changeA = plan1][gate changeB = plan1][merge changeA][promote changeA+plan1 == SUCCESS][merge changeB][promote changeB+plan1 == stale FAIL]19:50
*** decimuscorvinus has quit IRC19:51
*** decimuscorvinus has joined #zuul19:52
SpamapSNow, if we semaphore the gate job and the promote job, we can shrink the window for duplicate plans, but we can't eliminate it. There's a window where the semaphore is unlocked, and changeA is merged, and gating changeB wins, and makes a duplicate plan, and then the same scenario happens...19:52
SpamapSAny ideas? At this point, we're thinking block the plan creation gate job until any unapplied plans are applied.. but that's also going to make things extremely serialized (maybe that's what we want?)19:53
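
Editor's note: the sequence SpamapS describes can be replayed with a toy model. This is not Terraform; `Infra`, its serial counter, and `StalePlanError` are hypothetical stand-ins for "a saved plan is only valid against the state it was computed from".

```python
class StalePlanError(Exception):
    """Raised when a saved plan no longer matches the current state."""


class Infra:
    """Toy infrastructure: a set of resources plus a state serial."""

    def __init__(self):
        self.serial = 0
        self.resources = set()

    def plan(self, desired):
        # A plan is a diff against the state as it is right now.
        return {"base_serial": self.serial, "add": set(desired) - self.resources}

    def apply(self, plan):
        if plan["base_serial"] != self.serial:
            raise StalePlanError("infrastructure changed since plan was made")
        self.resources |= plan["add"]
        self.serial += 1


def replay_gate_then_promote():
    """Both changes plan in gate before either is applied in promote."""
    infra = Infra()
    plan_a = infra.plan({"net"})        # gate changeA
    plan_b = infra.plan({"net", "vm"})  # gate changeB, same base state
    results = []
    for plan in (plan_a, plan_b):       # promote changeA, then changeB
        try:
            infra.apply(plan)
            results.append("SUCCESS")
        except StalePlanError:
            results.append("stale FAIL")
    return results
```

If the plan is instead generated at apply time, for example `infra.apply(infra.plan(desired))` in the promote step, the base state always matches and the stale case cannot occur in this model, at the cost of no longer knowing the plan before merge.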
corvusSpamapS: does the plan for B not include the changes that A makes?19:54
mordredSpamapS: it seems like the gate job for changeB needs to be making a plan/diff that would be the result of changeB being applied if changeA was already applied19:54
corvus(just wondering why the promote of B didn't see the A plan component as existing/noop)19:54
corvusi think mordred and i are saying similar things19:54
mordredwhich - if it's diffing against production, is kind of hard to simulate in the gate job, since changeA hasn't been applied yet19:54
mordredcorvus: I agree :_)19:55
mordredSpamapS: are corvus and I tracking the problem correctly at least?19:55
SpamapSTerraform doesn't give us the option to assume a plan has already been applied. It's not as smart as git.19:58
SpamapSmordred: yes you're right that the gate job for change B needs to make a plan that includes the results of changeA's plan. In order to do that, one must apply change A's plan. There's no stacking.19:59
corvusyeah, from a high level, it seems like "the production system" is a part of the gate environment.  i think that means it either needs to be able to be modeled serially (so that changes are stacked correctly), or mutexed into the singleton that it is20:01
SpamapScorvus: right, basically we're going to end up passing a lock through as an artifact that promote will unlock by applying or failing, and until that happens, no plans can be created.20:02
corvus(i wonder if it's feasible to make a tool which manipulates terraform plans that way -- subtraction, addition, etc)20:02
SpamapSI'm not sure it would be valid unfortunately. Cloud APIs often have emergent effects.20:03
SpamapSThe plan may be "create a foo" and that creation will get an ID that is now part of the state of the system.20:03
corvusSpamapS: ack20:03
SpamapSI'm also not sure this is how we need the system to work.20:04
SpamapSWe did this to lock in the handoff from gate tests -> promote applies.20:04
corvusthis seems like an interesting consideration in deployment systems -- how stateful vs stateless they are20:04
SpamapSAgreed, this is a particularly sticky wicket.20:05
SpamapSUp until yesterday, we'd just let the promote job apply whatever it needed to based on the code in the repo.20:05
corvusSpamapS: incidentally, if it's not too boring for you to explain it to me, why make a plan in the gate?  why isn't that just something that happens post-merge?20:05
corvusoh, heh, i think your last sentence is getting at my question :)20:06
corvusso yeah, what changed?20:06
SpamapSThat's a good question.20:06
SpamapSWe wanted to make sure that what we gated is the *only* change that happens. I'm not sure it's as valuable as we thought, especially if it makes our deployment pipeline serialize with the gate.20:06
corvusoh interesting20:07
SpamapSThe minor problem we were solving is that sometimes there are in-process manual changes that may get overridden by stuff landing in the gate.20:07
SpamapSTBH I'm struggling to come up with reasons of value.20:08
corvusheh, it's always those "manual" edge cases that mess up this whole gitops thing20:08
SpamapSWe may just want to drop it and be a bit more forceful with "apply what's in the code base"20:08
* corvus looks at opendev's "emergency" file20:08
SpamapSThere was also some talk about vetting the plans in the gate, so like, make sure it never deletes an RDS or something, but one can't really inspect them in the current state of terraform, so that's just a fantasy.20:09
*** jamesmcarthur has joined #zuul20:09
SpamapSI actually think the thing we want is not plan-in-the-gate but plan-in-check. So.. inform the user of what this change would do, and then maybe give them some kind of option to say "yes this is approved, but only with that plan"20:11
clarkbcorvus: re gerrit and zuul. I don't think you can do blue green deployments of zuul if gating. We can do that for subcomponents that scale out like the executor though20:12
corvusclarkb: is that re the original msg or the reply i just sent?20:13
clarkbcorvus: the one you just sent and the one from thomas20:13
clarkbbut as you say its well tested20:13
corvusi meant to say that blue/green wasn't necessary because of gating, but maybe i could have been more explicit20:13
corvusi wrote too many words20:14
clarkbcorvus: ya I got that but it goes the other way too. If the zuul is gating a project you can't blue green that install due to shared state20:14
clarkbbut ya I think once you get to gated state a lot of those concerns go away20:15
corvusah yes.  the flip side of SpamapS's coin20:15
SpamapScorvus: mordred thanks for your wisdom. I think we're going to move the plan generation into promote.20:15
*** rf0lc0 has joined #zuul20:15
corvusSpamapS: sounds good; thanks for the brain food.  i love hearing about use cases like this20:16
SpamapSThat way we still get a plan artifact of what happened, but we don't block gate jobs from running.20:16
*** hashar has quit IRC20:16
SpamapS(terraform plans tied to git commits are extremely useful for audits / RCA's)20:16
*** rfolco has quit IRC20:17
*** hashar has joined #zuul20:24
SpamapScorvus: if we find that we do want to have this serialization between gate and promote, I wonder if there's room for a new type of mutex-ish object where it follows the artifacts across pipelines. As annoying as serializing the plan generation would be.. as long as everything else could run in the gate.. and there's just this big queue of "generate plan and upload artifact" waiting.. that's a fast process..20:37
SpamapSthis might still be useful in other contexts.20:37
SpamapSbut, yeah, let's wait for a second attempt at it before we go beyond noodling20:37
*** hashar has quit IRC20:38
*** rf0lc0 has quit IRC20:41
mnaserdoes paramiko have any sort of uh, run-time dependencies like the actual ssh binaries?20:42
corvusi don't think so20:42
mnaserssh-keyscan to the IP of this machine works perfectly, but nodepool is timing out20:43
corvusmnaser: is it using the right ip?  it should be in the error log if it's timing out20:55
mnasercorvus: yeah, its the ipv4 one (eliminating any ipv6 shenanigans)20:56
pabelangerwhat timeout do you have setup for boot?20:56
pabelangermaybe keyscan happening too fast?20:56
corvusmnaser: this is the method, if you want to try manual debug: https://opendev.org/zuul/nodepool/src/branch/master/nodepool/nodeutils.py#L5820:57
corvusmnaser: you can actually just import that and run it from the repl; no special objects/classes needed20:57
mnaseryeah im running it manually on my local machine vs that container to see what is different20:57
mnaserit def times out running it separately too20:57
* mnaser hmms20:57
mnaserthe nodepool-builder container has a bit more stuff in there (ssh tool is there) and it scans successfully20:59
mnaserlet me see if it actually runs there20:59
mnaserok, extracting the method out, it actually.. works?21:06
mnaserit is returning a `ssh-ed25519` key21:06
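
Editor's note: when a key scan times out in one container but works in another, a lower-level check can separate network reachability from SSH library behaviour. The sketch below is a standard-library-only probe and is not the nodepool function corvus links to; `ssh_banner` is a hypothetical helper that opens a TCP connection and returns the first line the server sends, which for an SSH server is normally its identification string.

```python
import socket


def ssh_banner(host, port=22, timeout=10.0):
    """Connect over TCP and return the first line sent by the server."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = b""
        # Read byte by byte until newline; cap the length to stay bounded.
        while not data.endswith(b"\n") and len(data) < 255:
            chunk = sock.recv(1)
            if not chunk:
                break
            data += chunk
    return data.decode("ascii", errors="replace").strip()
```

Running this from both containers against the same IP would show whether the timeout happens before any SSH negotiation starts; a `socket.timeout` or `OSError` here points at connectivity rather than at key exchange.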
*** rf0lc0 has joined #zuul21:26
*** rf0lc0 has quit IRC21:26
*** jcapitao|afk has quit IRC21:38
corvusclarkb: do you think we should restart opendev before releasing 3.14?21:44
clarkbcorvus: let me look at the git log21:44
corvusit's the 2.5 removal, smart-reconfig, and a couple of bugfixes.21:45
clarkbya those bugfixes are maybe worth restarting for since they affect pipeline behavior?21:46
clarkbI'm not too concerned about the ansible version removal21:46
corvusk, i'll get that started then21:46
fungii'm only on for a moment from tonight's hotel, but have to cheer for the "pi release"21:48
corvusfungi: wait till the bugfix releases.... 3.14.1592653521:49
fungiyass21:50
clarkbthey put pi on the wall in our only underground MAX station here. And got it wrong21:50
corvusknuths christmas lecture this year was on pi21:50
clarkbapparently they had taken the value as printed in some textbook which also got it wrong21:50
corvusclarkb: wow21:50
clarkband its carved into the stone wall21:50
clarkbso they never changed it :)21:51
fungishould i be worried about how well the max isn't engineered, if they can't get pi right?21:51
clarkbfungi: I think siemens makes the trains and not local construction company so probably ok21:51
corvushttps://www.roadsideamerica.com/tip/2081421:51
fungii bet siemens knows pi21:51
*** dtroyer has joined #zuul21:52
*** pcaruana has quit IRC21:54
*** saneax has quit IRC21:55
corvusalso letterspacing lining numbers is a bit of a typographic blunder.21:57
clarkbcorvus: if only they had you to help them do the layout :)21:58
corvusyes, they could have had the wrong numbers in better style!21:58
clarkbone of the really neat things about that station is the have the vertical cores they took laid out horizontally then have applied geological timeline tidbits along it21:59
corvusok, opendev restarted; we'll watch that a bit and then cut a release22:01
*** jamesmcarthur has quit IRC22:04
*** mattw4 has quit IRC22:07
*** mattw4 has joined #zuul22:10
*** mattw4 has quit IRC22:51
*** saneax has joined #zuul22:54
*** mattw4 has joined #zuul22:59
*** rlandy is now known as rlandy|bbl23:11
*** saneax has quit IRC23:24
*** mattw4 has quit IRC23:40
*** mattw4 has joined #zuul23:40
*** mattw4 has quit IRC23:45
*** mattw4 has joined #zuul23:46