Friday, 2021-07-02

opendevreviewJames E. Blair proposed zuul/zuul master: Use the nodeset build parameter instead of hosts/groups  https://review.opendev.org/c/zuul/zuul/+/79912700:26
corvusclarkb: ^ in response to your comments; tristanC ^ can you look at that and check if that will work with the runner (i suspect it may need a minor update)00:28
opendevreviewJames E. Blair proposed zuul/zuul master: WIP: Remove nodes/groups from build parameters  https://review.opendev.org/c/zuul/zuul/+/79912800:28
opendevreviewMerged zuul/zuul master: Lock/unlock nodes on executor server  https://review.opendev.org/c/zuul/zuul/+/77461000:48
opendevreviewIan Wienand proposed zuul/nodepool master: [wip] make /sys ro in container  https://review.opendev.org/c/zuul/nodepool/+/79912500:53
opendevreviewMerged zuul/zuul master: Remove ExecutorApi.update() call from tests  https://review.opendev.org/c/zuul/zuul/+/79877803:23
opendevreviewMerged zuul/zuul master: Add some jitter to apscheduler interval cleanup jobs  https://review.opendev.org/c/zuul/zuul/+/79895303:47
opendevreviewMerged zuul/zuul master: Add comment for test method  https://review.opendev.org/c/zuul/zuul/+/79895403:47
*** jpena|off is now known as jpena06:52
opendevreviewSimon Westphahl proposed zuul/zuul master: Refactor pipeline processing in run handler  https://review.opendev.org/c/zuul/zuul/+/79598509:40
opendevreviewTobias Henkel proposed zuul/nodepool master: Speed up node list  https://review.opendev.org/c/zuul/nodepool/+/76054211:18
opendevreviewTobias Henkel proposed zuul/nodepool master: Delete init nodes when resetting lost requests  https://review.opendev.org/c/zuul/nodepool/+/74410711:21
opendevreviewTobias Henkel proposed zuul/zuul master: Call manageLoad during pause and unpause  https://review.opendev.org/c/zuul/zuul/+/75576511:23
*** jpena is now known as jpena|lunch11:36
*** bhagyashris_ is now known as bhagyashris|ruck12:15
*** jpena|lunch is now known as jpena12:36
opendevreviewTobias Henkel proposed zuul/zuul master: Optimize stats reporting per node request  https://review.opendev.org/c/zuul/zuul/+/75496712:53
opendevreviewDenis proposed zuul/zuul master: [api][cors] Access-Control-Allow-Origin * for all routes  https://review.opendev.org/c/zuul/zuul/+/76769114:09
corvusi've just restarted opendev's zuul on master14:31
tobiash[m]yay, fingers crossed14:33
corvussome jobs have run to completion, so yay!  i'm not quite sure what to make of the changes in zk behavior yet14:35
opendevreviewJames E. Blair proposed zuul/zuul master: Fix zuul.executors.accepting stats bug  https://review.opendev.org/c/zuul/zuul/+/79921915:07
corvustobiash: ^15:07
corvustobiash: so we still are lacking a method of calculating the executor queue, right?15:07
corvustobiash: i think that was in an old version of the executor api change, but somehow got dropped15:11
tobiash[m]I think so15:11
corvusi found it in commit 0004c2865c (an old unmerged commit of I5de26afdf6774944b35472e2054b93d12fe21793)15:12
corvusi'll see if i can extract it15:12
corvusthe total is easy; the zone stuff is going to require a little extra work15:19
corvusso i think this will be original code instead of just extracting the old stuff15:20
opendevreviewJames E. Blair proposed zuul/zuul master: Correct executor queue stats  https://review.opendev.org/c/zuul/zuul/+/79922315:38
corvustobiash, fungi: ^ i think that should do it15:38
corvuswe may have a steadily increasing number of zk watches -- we should keep an eye on that and see if it levels out or keeps increasing15:40
clarkbthose should grow with the number of nodes right? since we add watches under each znode with contents worth watching? If we see znode count fall dramatically but watches continue to climb that could indicate a leak15:41
corvusyeah, nodes are holding steady at 40k while watches are increasing linearly15:42
corvusit's possible we're not clearing them from the executor api15:43
corvusbut i want to give it a bit more time before i call it a pattern15:43
clarkbcorvus: super small thing on the stats fix15:47
*** jpena is now known as jpena|off15:53
opendevreviewJames E. Blair proposed zuul/zuul master: Correct executor queue stats  https://review.opendev.org/c/zuul/zuul/+/79922316:09
corvusclarkb: ++16:09
clarkbcorvus: are we able to query zk for a list of watches via the admin side?16:31
clarkbcorvus: if so maybe we take a listing now then wait $timeperiod and take another listing and check the delta?16:31
clarkbif we see watches going away we know we aren't leaking those watches but if some category does not go away then there is a good chance those do leak?16:32
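The snapshot-and-diff approach described here can be sketched as set arithmetic over two watch listings. This is an illustrative sketch only: the JSON shape of the AdminServer's wchp response is an assumption, so the function just operates on dicts mapping znode path to a list of session ids, with made-up paths.

```python
# Sketch of diffing two watch snapshots taken $timeperiod apart.
# The snapshot format (path -> session ids) is assumed, not wchp's exact output.

def watch_diff(first, second):
    """Return (surviving, added, removed) watched paths between two snapshots."""
    surviving = sorted(set(first) & set(second))
    added = sorted(set(second) - set(first))
    removed = sorted(set(first) - set(second))
    return surviving, added, removed

# Hypothetical snapshots:
snap1 = {
    "/zuul/build-requests/unzoned/aaa": ["0x1", "0x2"],
    "/zuul/build-requests/unzoned/bbb": ["0x1"],
}
snap2 = {
    "/zuul/build-requests/unzoned/aaa": ["0x1", "0x2"],  # build long finished
    "/zuul/build-requests/unzoned/ccc": ["0x3"],
}

surviving, added, removed = watch_diff(snap1, snap2)
print(surviving)  # paths whose watches never went away -> leak candidates
```

If watches for completed builds keep showing up in `surviving` across several snapshots, that points at a leak rather than slow cleanup.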
opendevreviewMerged zuul/zuul master: Fix zuul.executors.accepting stats bug  https://review.opendev.org/c/zuul/zuul/+/79921916:34
clarkbwchs, wchc, and wchp are commands that will do that (I don't think we have any enabled via the socket but we can hit them via the rest api?). Note that wchc and wchp come with warnings that those operations can be expensive and impact the server16:49
clarkbcorvus: I think wchp is what we want: "Lists detailed information on watches for the server, by path. This outputs a list of paths (znodes) with associated sessions. Note, depending on the number of watches this operation may be expensive (ie impact server performance), use it carefully."16:59
clarkbof course now we have to decide if we are willing to try it (maybe run it on zk04 which has the least number of watches?)16:59
clarkbcorvus: running wchs (the one not annotated with be careful messages) shows the number of paths and the number of watches. The number of paths seems to be growing far less than the number of watches17:11
clarkbbut both do seem to grow17:12
corvusclarkb: sorry was away; good ideas17:24
clarkbthat's ok I've just been monitoring wchs informally and both num paths and watches go up17:25
corvusclarkb: i'd like to do either a wchc or wchp; which do you think?  i'm thinking wchp17:25
* clarkb rereads the docs17:25
clarkbcorvus: ya wchp seems better since it is watches by path. I suspect that mapping will make it easier to find any potential leaks17:26
clarkbcorvus: do you want me to go ahead and run that on zk04 or do you want to do it? It appears that it pretty prints json too though I should test that redirected17:27
clarkbinterestingly zk05 had a large drop in watches but zk04 and zk06 have not17:29
corvusclarkb: why don't you do it and save the output since you have a session going17:32
clarkbok17:33
clarkbcorvus: zk04:~clarkb/wchp.20210702-first17:34
corvusi'm logged into zk04 (also, this is borderline opendev ops related, but i figure since we're suspecting a zuul issue, we can keep going in this channel)17:34
clarkbya I figured it would be of interest to this group17:34
corvusas expected, many build request watches17:34
clarkbI can rerun it in say 5 minutes then we can look at diffs I guess17:34
corvus++17:34
corvusi'm going to look up some uuids while we wait17:34
corvusclarkb: oh, actually, i'd like to know who's on the other end of sessions 288312462838726666, 288312462838726667, 28831246283872665917:35
clarkbalso it ran pretty quickly implying that this wasn't too expensive for us17:35
corvusdo we have a way to figure that out?17:36
clarkbI'm sure there is a way but I don't know off the top of my head17:36
clarkbhttps://zookeeper.apache.org/doc/r3.5.9/zookeeperAdmin.html may tell us17:36
clarkbcons maybe17:36
corvusclarkb: are you just doing "nc localhost 2181"?17:36
corvusoh you're using the rest api?17:37
clarkbcorvus: no I'm using the rest api because the wch* commands are not whitelisted17:37
corvusclarkb: what's that look like?  i've never used it17:37
corvusthx for the pm :)17:38
clarkb6659 is zuul02, 6667 is ze12, and 6666 is ze0517:40
corvusit is interesting that all of the requests have exactly those hosts...17:42
clarkbthere are only 6 total sessions from when I ran cons, this is I think expected because other clients will be talking to other servers in the cluster17:42
corvusah of course17:42
clarkblet me see what the other sessions are, they could all be mergers maybe?17:42
clarkb288312462838726660 is also zuul02. Everything else is zuul mergers or nodepool builders17:44
corvusclarkb: adding up all the watchers for /zuul/build-requests/unzoned/ca6cd1ae56cd406eb330ce0d5b6abbc4 gives us 13 sessions17:44
corvusso we know every build request is getting a watch from every executor plus the scheduler17:44
clarkbI'm running my followup wchp now17:45
clarkbdiffing the two shows that no build request watches went away, we only added new ones (which could be coincidence given the time some of those take to run?)17:47
corvusi checked the first item in the first list: ff56c3afd7cc4ac0b143c54cfdc53a72 and zuul says it finished at 15:33:00,761 which is almost exactly the time of your first query.  but it's still there in the second query.17:47
corvusi think that's a leak17:47
clarkbya seems like it17:47
corvusokay, i'm going to go off and stare at some code17:47
clarkbcorvus: maybe double check that we aren't returning an exception in the logs in the watch methods which prevents the watch from being cleaned up?17:48
corvus(and maybe think about calling wchp from unit tests...)17:48
clarkbsince we need to return False alone iirc17:48
clarkbcorvus: does the request still exist too?17:48
corvusgood q17:49
clarkbthat might be another thing to check is if we are leaking the znode entirely? (Not sure how znode deletions interact with watches)17:49
corvuswe do have some exceptions with the executor client trying to return nodesets, we should look at that but it's likely a separate problem17:49
corvusclarkb: Path /zuul/build-requests/unzoned/ff56c3afd7cc4ac0b143c54cfdc53a72 doesn't exist17:51
corvusso just a watch leak17:51
corvus(in general, you can watch paths that don't exist in zk)17:52
clarkbgot it17:53
clarkbthinking out loud here if this was simply a matter of watch callbacks not being called quickly enough due to contention we would expect some of the watches to go away but more slowly than they are created. The diffs indicate this is unlikely the case as we only see new watches and none removed17:54
corvus++17:55
corvusclarkb: if there were an exception in our callback, it would prevent the watch from being removed.  but i'm pretty sure it would be logged under kazoo.recipe.watchers17:59
corvusi believe https://github.com/python-zk/kazoo/blob/master/kazoo/recipe/watchers.py#L155 is the relevant kazoo code17:59
corvuswe probably should make our callback more robust, but i don't immediately see the error since i think that isn't happening.18:00
clarkbcorvus: could it be the if not content: return code at the top of the callback?18:02
clarkbthat would only trigger on the changed case but if CHANGED and DELETED are both triggered by deletion maybe that is the problem?18:03
clarkbhrm except that we should get it fired for both events in that case18:03
clarkbunless event.type is a mask18:03
clarkbhttps://kazoo.readthedocs.io/en/latest/api/protocol/states.html#kazoo.protocol.states.EventType doesnt say anything about masks18:06
corvustesting says we just get deleted on deleted18:08
clarkbperhaps something to do with the closure18:13
clarkblike maybe that has messed up kazoo's ability to count arguments and pass in event properly? (It shouldn't since we hide that from kazoo with the watch() signature)18:13
corvusi can observe the behavior in tests18:16
corvusby adding a wchp call to the end of a test18:16
clarkbside note: a third wchp continues to show the same behavior, no build request watches are being cleaned up18:16
corvusi think this may be a characteristic of the datawatch kazoo recipe18:20
corvusi think there would have to be one more update after the delete in order for the watch to be triggered and not re-set18:22
corvusi think https://github.com/python-zk/kazoo/blob/master/kazoo/recipe/watchers.py#L191-L192 causes the watcher to be re-set after the delete happens18:23
clarkbcorvus: isn't that running before the callback on line 206?18:26
corvusthe java api has a removeWatches call, but i don't see that in kazoo18:26
clarkbdistinct to https://github.com/python-zk/kazoo/blob/master/kazoo/recipe/watchers.py#L168 ?18:27
corvusclarkb: exactly: delete in zk happens; zk sends client a "delete" event for the watcher; Datawatch calls getdata(); getdata() calls 'get' which fails, then calls 'exists' which succeeds and sets a watch.  then our callback happens, and we tell it to stop.18:28
corvusthe system is left with a watch in place because of the exists call.18:28
corvusi think the only way to get a datawatch to stop watching a path is to return False before the node is deleted, and then have one more event (which could be deletion) happen.  i think the 'exists' handling was added so you could watch a node that doesn't exist yet; i don't think they thought through watching a node that gets deleted.18:30
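corvus's read of the recipe can be modeled without a ZooKeeper server. The toy flow below (invented names, not kazoo's real internals) shows how the exists() fallback re-arms the watch even though the callback asks to stop:

```python
# Toy model of the DataWatch delete path described above; not kazoo code.

class FakeZK:
    """Stand-in client that tracks the watch state for a single path."""
    def __init__(self):
        self.node_exists = True
        self.watch_armed = False

    def get(self, watch=None):
        if not self.node_exists:
            raise KeyError("NoNodeError")
        self.watch_armed = True          # a successful get sets a watch
        return b"data"

    def exists(self, watch=None):
        self.watch_armed = True          # exists sets a watch even on a missing node
        return self.node_exists

def datawatch_fire(client, callback):
    """Toy version of DataWatch._get_data: get, fall back to exists, then call back."""
    client.watch_armed = False
    try:
        data = client.get(watch=datawatch_fire)
    except KeyError:
        client.exists(watch=datawatch_fire)   # <-- watch re-armed here
        data = None
    callback(data)                            # returning False now is too late

zk = FakeZK()
datawatch_fire(zk, lambda d: True)    # initial watch on an existing node

zk.node_exists = False                # node deleted; the watch fires
datawatch_fire(zk, lambda d: False)   # callback says stop, but...
print(zk.watch_armed)                 # ...exists() already left a watch behind: True
```

The callback's False return stops the recipe from delivering further events, but the watch registered by the exists() call stays in ZooKeeper, matching the leak seen in the wchp diffs.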
corvusbasically, we want a datawatch which doesn't call exists18:31
clarkband the exists call sets a watch that is _get_data so it is recursive18:33
clarkband _get_data never seems to return False?18:34
clarkbcorvus: can we fix this with an event type check in _get_data? basically don't call the get or exists if event is deleted? we already know it isn't existing18:35
corvusclarkb: yes -- but it may be an uphill battle to get that accepted; i could see the argument that they may have users that want watches to persist after deletion.  so at least i think it would need to be a new optional argument to the constructor or something18:38
opendevreviewJames E. Blair proposed zuul/zuul master: WIP datawatch  https://review.opendev.org/c/zuul/zuul/+/79931718:39
clarkbcorvus: ya that seems reasonable, or even a subclass: DeletableDataWatch18:39
corvusclarkb: ^ that has the behavior we want (i removed the exists check, since we're not calling it until after we know it exists)18:39
clarkbright we only add those watches when we have a child watch that indicates the entity exists. I suppose there is a race there if the buildrequest is immediately deleted for some reason?18:40
clarkbbut in that case we still don't want to watch it18:41
clarkbwe just want to move on. I think that means your fix is appropriate if you ignore private api overrides18:41
corvusclarkb: yeah, either approach would work for us: option (clarkb): datawatch can be created regardless of whether the znode exists, but once it's deleted, watch is removed.  option (corvus): watch can only be created if node exists, watch is removed if node deleted.18:41
corvusyeah, maybe mine is slightly preferable in that it helps keep things tidy in the case of the create-then-delete race?18:42
clarkbya. Also I think you need to set data and stat since they may not be set if an exception is raised?18:42
clarkbthey don't get None values they will be undefined iirc18:42
clarkbare watches an implied ephemeral type? if the connection goes away we'll auto purge the leaked watches? (thinking about what cleanup might entail)18:46
clarkbcorvus: we also use DataWatch() elsewhere in existing code. Any idea why this doesnt' seem to be a problem for them?18:49
corvusyeah, when clients disconnect, watches disappear18:49
clarkbNodeRequests in particular seem to be using DataWatches18:49
corvusclarkb: we stop watching node requests once they are fulfilled or canceled18:52
corvusbut deletion is a separate step18:52
clarkbgot it that stops the watching entirely so when the deletion event happens there is no watch to recursively keep watching with at that point18:52
corvusyep18:53
corvusthere may be some edge cases there, but that's the typical workflow.  i expect it would be okay to use whatever we come up with here for that as well18:53
clarkbthough that could potentially race, it is unlikely to happen in bulk like this (and possibly never at all depending on timing)18:53
clarkb++18:53
corvusanother approach we could do is to avoid using DataWatch and set the watches ourselves... but honestly, datawatch is a relatively simple recipe, and we'd probably just end up reimplementing most of it.18:54
clarkbThis seems like a reasonable enough use case that we should maybe try upstreaming your child class?18:56
clarkbBasically a case where you only awnt to watch something as long as it exists18:56
corvusclarkb: re data+stat -- i don't think we need to set those?  because of the return?18:56
clarkboh I missed there is a return in the exception handler. I thought it was falling through. Don't we need to fall through and call our watcher one last time for the DELETED event?18:57
corvusyes18:57
clarkbthe deletion will happen and the watcher will fire for the DELETED event. The get will fail on NoNodeError and we then need to set None types for those values and fall through and let _log_func_exception run one more time I think18:58
clarkband then maybe try upstreaming that as ExistingDataWatch with some updates to the code to check self._ever_called and short circuit if that is false18:59
clarkbya I think in the exception handler block you want to do what you currently have if not self._ever_called else fallthrough and let the handler run one last time19:01
clarkbThen you should have a watcher that does nothing if the data does not exist when created and will stop doing things when the data is eventually deleted19:03
clarkb(if the watcher chooses to handle the deleted event I guess)19:03
corvusor we could still allow it to be called the first time, just with null data/stat/event; then the callback would know that it was deleted19:04
clarkbya that would work too19:04
clarkbif not data and not event then you don't exist at initialization.19:05
clarkband i guess letting the callback decide what to do in that case is more flexible19:05
opendevreviewJames E. Blair proposed zuul/zuul master: WIP datawatch  https://review.opendev.org/c/zuul/zuul/+/79931719:05
corvussomething like that ^19:05
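The behavior being converged on can be sketched in the same toy model (again illustrative, not the actual zuul patch): skip the exists() fallback entirely, deliver one final callback with None, and leave no watch armed.

```python
# Toy sketch of the "existing data watch" behavior; names are invented.

class FakeZK:
    def __init__(self):
        self.node_exists = True
        self.watch_armed = False

    def get(self, watch=None):
        if not self.node_exists:
            raise KeyError("NoNodeError")
        self.watch_armed = True
        return b"data"

seen = []

def existing_datawatch_fire(client, callback):
    """Like the DataWatch flow, but with no exists() fallback: a deleted
    node yields one last callback with None and leaves no watch armed."""
    client.watch_armed = False
    try:
        data = client.get(watch=existing_datawatch_fire)
    except KeyError:
        data = None          # node gone: don't re-arm, just report it
    callback(data)

zk = FakeZK()
existing_datawatch_fire(zk, seen.append)   # watch set on the live node

zk.node_exists = False                     # deletion fires the watch
existing_datawatch_fire(zk, seen.append)   # final callback sees None
print(zk.watch_armed, seen)                # -> False [b'data', None]
```

The callback seeing None for data/stat/event is how it learns the node was deleted, and since get() is the only call that arms a watch, nothing is left behind afterwards.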
corvusi'm going to get lunch, then i'll polish it up19:05
clarkbleft a comment. I think it is really close19:11
clarkbAnd thinking out loud more, maybe it is better to vendor the parent class in the child (and separate them) so that we don't run into private api change problems? at least until we can get something upstreamed?19:14
clarkbkazoo is apache2 licensed so that shouldn't pose a problem. We just do some attribution and should be good?19:15
corvusclarkb: replied... can you take a look?20:47
clarkbcorvus: yup looking now20:54
clarkbcorvus: what about the case where you might want to keep listening after handling the DELETE event. I guess we're basically saying that isn't a thing here20:56
clarkbthis watcher is only good on valid nodes until they are deleted?20:57
clarkbcorvus: your comment makes sense though. And I agree with it as long as we don't intend on letting the DELETE event handler decide if the watch should stop or not20:59
corvusclarkb: yes that's the intent; structurally, it's very difficult to let the callback decide that, because the datawatcher calls the zk method that would set the watch before calling the callback21:04
corvusby very difficult, i think i mean 'impossible within the remaining design constraints of the DataWatch recipe'21:05
clarkbI think you would have to use the self._ever_called and event.type attributes to decide that21:05
clarkbif self._ever_called and event.type == DELETED and not data then you know it has gone away21:06
corvusclarkb: no it's too late21:06
corvusmaybe i'm misunderstanding the suggestion21:06
clarkbkeeping the behavior simple like you've done makes sense to me regardless of trying to figure out the state machine21:06
corvusi just want to check something21:07
corvusyou know the zookeeper watch is actually set by the "client.get" call, right?21:07
clarkbcorvus: I think you can call retry(exists, watcher) if the callback didn't return false and self._ever_called is true and data is None21:08
clarkbcorvus: yes both the get and the exists set it21:08
corvusok, then it's me that's missing something... :)21:08
corvusbut the callback is called after the exists call21:08
corvusso how do we let the callback decide whether or not to set the watch?21:08
josefwellsHey zuul-friends, getting close to my zuul on k8s.  My executor logs show: 2021-07-02 18:54:47,152 DEBUG zuul.AnsibleJob.output: [e: e16b8800-db66-11eb-8e5e-3951beb15980] [build: c7f4c441db6b4223b861f95dd37b2deb] Ansible output: b'bwrap: Failed to make / slave: Permission denied' 21:09
clarkbcorvus: you can move that call after I think21:09
clarkbcorvus: and then wrap it in the conditional to check if the callback wants you to put it back again21:09
clarkbcorvus: its more trouble than it is worth21:09
corvusclarkb: it's before so that we pass in the stat21:09
corvusclarkb: yeah that would mostly work, but i think it's potentially racy21:10
corvusthe "exists+watch" combo provides an important assurance that we don't miss any changes21:10
josefwellsbut I'm not sure where to start looking for the issue.. any debug pointers?  Everything is suspect to me :)21:10
corvus(or get+watch)21:10
corvusjosefwells: make sure the executor container is running privileged21:11
josefwellscorvus: ok, thanks21:12
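In a Kubernetes pod spec, running the executor privileged is a securityContext setting on the container. This is an illustrative fragment; the container and image names are placeholders:

```yaml
# illustrative pod-spec fragment -- names are placeholders
containers:
  - name: zuul-executor
    image: zuul/zuul-executor
    securityContext:
      privileged: true   # bwrap needs this (or equivalent capabilities)
```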
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Enable ZooKeeper 4 letter words  https://review.opendev.org/c/zuul/zuul-jobs/+/79933421:24
opendevreviewJames E. Blair proposed zuul/zuul master: Add ExistingDataWatch class  https://review.opendev.org/c/zuul/zuul/+/79931721:26
corvusclarkb: ^ final version21:26
clarkbcorvus: cool looking21:26
corvuswell, i mean, finally ready for real review :)21:26
corvusi'll work on a kazoo pr meanwhile21:27
josefwellscorvus: ok, lets just say for the sake of hypotheticals that I can't run a privileged container in my cluster.. is there some other option?21:42
clarkbjosefwells: I think it is possible you could enable the specific items that bwrap needs to create its containers21:43
clarkba privileged container is a shortcut for that. That said there is a good chance that won't work because those may be considered to be effectively the same as privileged21:43
josefwellsbummer town, guess I just need to bite the bullet and own another server21:44
josefwellstrying to get out of that business21:44
josefwellsclarkb: where can I find a list of privileges I would need?21:45
opendevreviewMerged zuul/zuul-jobs master: Enable ZooKeeper 4 letter words  https://review.opendev.org/c/zuul/zuul-jobs/+/79933421:45
josefwellsclarkb: Oh, if it is ability to create containers, yeah, seems like that would be..21:46
clarkbjosefwells: https://github.com/containers/bubblewrap#sandboxing I think that covers the possible options. I know that user namespaces are required for zuul's use case. Not sure about the others21:46
clarkbjosefwells: I know that tobiash[m] has/had to run zuul on a separate openshift from their normal openshift in order to allow for privileged containers. corvus also helps manage the gerrit zuul which runs on gke. I think on gke it is a non issue though21:52
clarkbit is possible tobiash[m] or tristanC may have input on how to make it run more happily21:52
opendevreviewMerged zuul/zuul master: Correct executor queue stats  https://review.opendev.org/c/zuul/zuul/+/79922321:53
josefwellsclarkb: thanks.. I've checked with my people.. I'll figure something out21:55
clarkbcorvus: I did leave a comment on the latest patchset to make sure I understand the test watch thing properly. Otherwise I think we can go with that22:11
corvusclarkb: cool; want to take a quick look at https://github.com/python-zk/kazoo/pull/648 ?22:18
corvus(while i review your comment)22:18
opendevreviewJames E. Blair proposed zuul/zuul master: Add ExistingDataWatch class  https://review.opendev.org/c/zuul/zuul/+/79931722:22
corvusclarkb: i did change something in that test method ^22:22
clarkbcorvus: For the PR: small typo in the docstring for the new class: "functiol" instead of "function". The tests you added look good. The only thing I would caution with them is the event.wait(0.2) lines may be too short depending on how long the watchers take to process. But waiting for longer will make the tests slower22:25
corvusoof, thx ill fix the typo22:25
corvus0.2 was used in other tests with similar 'expected failure' so i went with that; i agree it's concerning tho22:26
corvuszuul-maint: if anyone else is around to review https://review.opendev.org/799317 that would be great; otherwise, i'll probably merge it in a little while just with +2 from clarkb and me, so we can restart opendev with that before the watches get too out of hand (i'm not too worried about them, but would like to nip it in the bud before it becomes a problem :)22:30
clarkbI need to take a short break but I can be around for that later today22:36
clarkbthe merging and restart and all that I mean22:36
corvuscool, i went ahead and +w'd it; if someone sees an issue, feel free to block it; otherwise, hopefully it'll merge within an hour or so and we can restart; i'm going to afk and get out of this chair for a bit :)22:39
fungiheh, was about to say i'll take a look22:46
fungibut there's still time if i catch a problem with it22:46
funginot that i expect to22:46
fungiahh, yeah this was tested on the z-j change i reviewed22:47
clarkbThe upstream PR is maybe a bit easier to read23:00
clarkbyou can pick out the new stuff a bit easier in the diff there, though corvus annotated the vendored code with comments23:00
fungiyep23:01
clarkbless than 11 minutes zuul says23:32
opendevreviewMerged zuul/zuul master: Add ExistingDataWatch class  https://review.opendev.org/c/zuul/zuul/+/79931723:38
corvusyay!23:38
corvushrm, promote failed; i'll re-enqueue that23:41
corvusoh actually, it failed in the cleanup part; i think it's okay, i'll double check dockerhub23:41
clarkbok23:42
corvusno, i think it needs to re-run; i'll re-enqueue23:43
clarkbcorvus: I think it may be old still23:43
clarkbya23:44
clarkbI'll notify the openstack release team though it should be a noop since it is the weekend for them23:44
corvussomething's fishy; it failed again, and there's no change_799317_latest tag23:46
clarkbhuh did the cleanup possibly fail because the image wasn't uploaded in the first place?23:48
clarkbno it seems we uploaded to docker hub23:51
clarkbhttps://zuul.opendev.org/t/zuul/build/08da96babad640a29136acdbb2244b49/log/job-output.txt#11370 is where that should have happened23:51
corvusit's almost like dockerhub isn't being consistent23:51
corvusbut the "Get manifest" task shouldn't have worked without that tag23:53
corvusdocker pull zuul/zuul:change_799317_latest works for me23:53
corvusso basically, the tag actually exists in the registry, but it doesn't show up in the web ui and the api can't delete it23:54
clarkbweird23:55
clarkbthat would explain the failure then I guess23:55
corvusyeah; we could manually pull that tag on all the hosts and tag it as latest; that's not great though since it could restart into something else23:56
clarkbhttps://status.docker.com/pages/533c6539221ae15e3f000031 indicates an active issue with email23:56
clarkbI doubt that is related to the issue we are seeing23:56
clarkbcorvus: any idea if we can manually update latest on dockerhub?23:56
corvusmaybe smtp is their distributed consensus algorithm23:56
clarkbor maybe we already tried that23:56
corvusclarkb: yeah i think it should work; basically do what the job does without the 'delete' call23:57
clarkbya that23:57
corvuswe could also add a 'failed_when: false' to that delete method23:57

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!