Wednesday, 2019-10-09

00:00 *** armstrongs has quit IRC
00:07 *** mattw4 has quit IRC
00:30 *** saneax has quit IRC
01:24 *** igordc has quit IRC
02:11 *** dperique has joined #zuul
02:12 *** dperique has left #zuul
02:12 *** dperique has joined #zuul
02:14 *** dperique has left #zuul
02:40 *** bhavikdbavishi has joined #zuul
02:43 *** bhavikdbavishi1 has joined #zuul
02:44 *** bhavikdbavishi has quit IRC
02:44 *** bhavikdbavishi1 is now known as bhavikdbavishi
03:40 *** bhavikdbavishi has quit IRC
03:44 *** bhavikdbavishi has joined #zuul
03:58 *** raukadah is now known as chandankumar
04:18 <flaper87> corvus: nice, thanks for suggesting `tools`. I'll be writing this script for sure. I hope to be able to do that in the next month or so
04:42 *** saneax has joined #zuul
06:39 *** bolg has joined #zuul
07:09 *** hashar has joined #zuul
07:10 *** pcaruana has joined #zuul
07:13 *** themroc has joined #zuul
07:14 *** tosky has joined #zuul
07:20 *** saneax has quit IRC
07:46 *** jpena|off is now known as jpena
07:58 *** tosky has quit IRC
07:59 *** tosky has joined #zuul
08:28 *** hashar has quit IRC
08:32 *** saneax has joined #zuul
08:34 *** hashar has joined #zuul
08:43 <openstackgerrit> Simon Westphahl proposed zuul/nodepool master: Sort waiting static nodes by creation time  https://review.opendev.org/687271
09:31 *** themr0c has joined #zuul
09:31 *** themroc has quit IRC
09:59 *** hashar has quit IRC
10:00 *** avass is now known as Guest61857
10:00 *** avass has joined #zuul
10:02 *** themr0c has quit IRC
10:02 *** themr0c has joined #zuul
10:09 *** themr0c has quit IRC
10:11 *** themroc has joined #zuul
10:13 <openstackgerrit> Fabien Boucher proposed zuul/zuul master: Pagure - add the enqueue_ref unit test  https://review.opendev.org/687351
10:15 *** jamesmcarthur has joined #zuul
10:19 *** jamesmcarthur has quit IRC
10:24 *** bhavikdbavishi has quit IRC
11:14 <openstackgerrit> Fabien Boucher proposed zuul/zuul master: Pagure - add support for git.tag.creation event  https://review.opendev.org/679938
11:15 <openstackgerrit> Fabien Boucher proposed zuul/zuul master: Pagure - Support for branch creation/deletion  https://review.opendev.org/685116
11:15 <openstackgerrit> Fabien Boucher proposed zuul/zuul master: Pagure - add the enqueue_ref unit test  https://review.opendev.org/687351
11:16 <openstackgerrit> Simon Westphahl proposed zuul/zuul master: Add optional support for circular dependencies  https://review.opendev.org/685354
11:23 <openstackgerrit> Simon Westphahl proposed zuul/zuul master: Spec for allowing circular dependencies  https://review.opendev.org/643309
11:29 *** jpena is now known as jpena|lunch
11:52 *** bhavikdbavishi has joined #zuul
11:55 *** bhavikdbavishi1 has joined #zuul
11:56 *** bhavikdbavishi has quit IRC
11:56 *** bhavikdbavishi1 is now known as bhavikdbavishi
11:57 *** themroc has quit IRC
12:05 *** avass has quit IRC
12:16 <tobiash> corvus: I had some thoughts on 677111 (direct push), what do you think?
12:23 *** jpena|lunch is now known as jpena
12:44 *** avass has joined #zuul
13:26 *** jangutter_ has joined #zuul
13:30 *** jangutter has quit IRC
13:31 *** pcaruana has quit IRC
13:33 *** pcaruana has joined #zuul
13:41 *** ianychoi has quit IRC
13:43 *** saneax has quit IRC
13:51 *** rf0lc0 has joined #zuul
13:52 *** klindgren_ has joined #zuul
13:56 *** johanssone has quit IRC
13:56 *** rfolco has quit IRC
13:56 *** swest has quit IRC
13:56 *** SotK has quit IRC
13:56 *** gothicmindfood has quit IRC
13:56 *** klindgren has quit IRC
13:56 *** phildawson has joined #zuul
13:56 *** SotK has joined #zuul
13:57 *** swest has joined #zuul
13:58 *** openstackstatus has quit IRC
13:58 *** johanssone has joined #zuul
14:02 *** rf0lc0 is now known as rfolco
14:03 *** swest has quit IRC
14:06 *** fdegir has quit IRC
14:07 *** fdegir has joined #zuul
14:19 *** jamesmcarthur has joined #zuul
14:23 <corvus> tobiash: replied
14:25 <tobiash> corvus: so you'd prefer to only make the push asynchronous instead of reporting in general?
14:28 <corvus> tobiash: yes, i think so -- i don't think we should scale out the scheduler with internal threads -- that has a limit...  but that just made me think of something....
14:29 <corvus> tobiash: we could leave the reporting implementation the way it is, and make the direct-push work depend on scale-out schedulers.
14:29 <corvus> tobiash: then, if a single scheduler is busy managing a push, other schedulers can still work on other pipelines
14:30 <corvus> (no matter how we implement it, a pipeline is basically frozen while its head is being reported)
14:30 <tobiash> good point
14:31 <tobiash> could we decouple the cyclic dependency support from the direct push, then? Technically those two are independent of each other.
14:32 <corvus> tobiash: i'm uncomfortable having cyclic dependencies without 2-phase commit or some way to roll back
14:33 <tobiash> we could make the limitations clear in the docs, and it will always be an opt-in feature
14:35 <corvus> personally, i would say "don't use this, it is too dangerous, you can get into a state where you have to fix repos manually" and i don't like the idea of putting code out there that we tell people not to use.
14:35 <corvus> gerrit/github saying "no i can't merge this" is not theoretical, it happens
14:36 <tobiash> my problem is that we have a pressing need for cyclic deps and cannot wait for the scale-out scheduler. And I'm also uncomfortable rebasing this for 6+ months
14:36 <tobiash> and with github enterprise direct push is not possible at all yet
14:37 <corvus> tobiash: re ghe, why?
14:37 <tobiash> because of access restrictions with github apps; it was resolved on github.com two months ago
14:38 <corvus> tobiash: do you have an eta for ghe?
14:39 <tobiash> I hope for the next version but we don't get any eta for anything from gh :(
14:39 *** michael-beaver has joined #zuul
14:39 <mordred> future looking statements / revenue recognition / blah blah
14:40 *** jangutter has joined #zuul
14:43 *** jangutter_ has quit IRC
14:44 <corvus> tobiash: i'd like to keep cyclic connected to direct-push, but i don't think we need to wait for the HA scheduler.  i think your connection-report-task-queue idea may not be too hard, and i think it would be compatible with the HA scheduler.
14:45 <corvus> tobiash: (or, at least, easy to transition to the HA scheduler)
14:45 <tobiash> ok, sounds like a compromise; after the scale-out scheduler it should be easy to rip it out again
14:47 <corvus> and i think it's fine for us to have 'experimental' support for cyclic with github, since once ghe releases with that, we'll be able to complete the work easily.
14:50 <tobiash> :)
14:55 <pabelanger> I should note, the new triage roles on github.com are nice! We're in the process of dropping write permissions for humans, but apparently people don't like giving up root access once they have it :D
14:55 <tobiash> corvus: it would be great if you could put 671674 (macos support for gear) onto your review list. This would simplify zuul development for us, as we could then run zuul tests natively on our dev machines. It's not urgent though.
14:58 <corvus> tobiash: can we make the selection automatic?
15:00 <corvus> tobiash: also, why _wait_for_connection if not using epoll?
15:03 *** bolg has quit IRC
15:04 <tobiash> corvus: yep, automatic selection should be no problem I guess
15:05 <tobiash> regarding _wait_for_connection we should ask bolg (he seems to be offline atm)
15:07 <openstackgerrit> Jan Gutter proposed zuul/nodepool master: Add port-cleanup-interval config option  https://review.opendev.org/687024
15:37 *** bhavikdbavishi has quit IRC
15:40 <pabelanger> mordred: so, I'd like to drop the dependency on setup.cfg for tox_siblings. In our case we have ansible roles that want to use tox and siblings, but we don't ship setuptools files in our repos. Looks like we just use it to get the project name for logging, plus the following check
15:40 <pabelanger> changed=False, msg="No name in setup.cfg, skipping siblings")
15:40 <pabelanger> maybe we can flip that to look for a tox.ini file
15:41 <pabelanger> or get the project name from a zuul variable
15:43 <mordred> the project name is important - it's how tox-siblings knows which of the projects in required-projects should be installed into the tox venv (it doesn't just install everything from required-projects, because of things like the openstack/requirements repo)
15:43 <mordred> pabelanger: so I don't think we should change the default behavior - that said, I think adding support somehow for the things you're trying to do with your roles would be in scope
15:44 <mordred> however, roles wouldn't get "installed" into the tox venv - so maybe we should think through making sure we're solving your problem
15:46 <pabelanger> mordred: well, if we can get the name from something other than setup.cfg, that is really what I'm looking for.  zuul.project.short_name comes to mind
15:46 <mordred> it doesn't match
15:46 <mordred> we need the pip requirement name
15:46 <pabelanger> ah
15:46 <mordred> and we need it for every repo in required-projects
15:47 <mordred> do you have an example I can look at?
15:47 <pabelanger> okay, yah, so the use case here is: no pip project, a repo that uses tox for dependency management, and we want cross-project testing of a dependency - molecule in this case
15:47 <pabelanger> sure
15:47 <pabelanger> https://github.com/ansible-security/ids_config/pull/5
15:48 <jangutter> pabelanger: regarding dropping root access, Android made me actively _want_ to have a mode where root access has been revoked.
15:48 <pabelanger> jangutter: ++
15:48 <jangutter> pabelanger: one of the few examples where I went from "I must be able to root the phone" to "If this phone is rootable, I don't trust it"
15:49 <jangutter> pabelanger: I just realised that I publicly stated that I've given up having control of my own device. Handing in my geek card now.
15:50 <mordred> pabelanger: ok - I'm in a meeting, but let me wrap my head around what you've got here
15:50 <pabelanger> mordred: np! thanks. I can work around it for now by adding the setuptools bits
15:51 <jangutter> Shrews: one unrelated failure in https://review.opendev.org/#/c/687024/ but is that more or less what you had in mind?
15:52 *** openstackgerrit has quit IRC
15:52 <mordred> pabelanger: I don't think you need tox-siblings here - your install script is the thing that's handling "installing" the sibling repos
15:53 <mordred> (also, I like the sed solution there :) )
15:53 <pabelanger> mordred: yah, for roles that is right. But in test-requirements.txt, we need to pull in a newer molecule version, via depends-on
15:53 <fungi> jangutter: yeah, that works *as long as* you trust google not to abuse/sell/give root access for your device without your knowledge
15:53 *** toabctl has quit IRC
15:54 <fungi> all "root access not possible" means is that google claims there are no backdoors other than the ones it chooses to keep secret from you
15:54 <mordred> pabelanger: nod - so the only thing that should need a setup.cfg is molecule - where is the lack of setup.cfg in that repo breaking you? (we should be gracefully handling repos that don't have setup.cfg)
15:55 <fungi> jangutter: well, and also your hardware vendor
15:55 <jangutter> fungi: yep. and it's turtles all the way down... (speaking as someone working for a hardware vendor....)
15:55 <mordred> pabelanger: (so it's entirely possible there's a bug we need to fix)
15:55 <jangutter> fungi: but at the end of the day, do you really trust your physics stack? I mean, neutrinos are obviously such a security hole.
15:55 <pabelanger> mordred: yes, I believe that is right. Currently setup.cfg is a hard requirement
15:56 <pabelanger> if we can skip that, I think things will work properly, assuming we know the right name
15:56 <Shrews> jangutter: a quick look says "yes". will have a closer look in a bit
15:57 <fungi> jangutter: i'd wager if google wants to be able to sell those phones in countries which demand access to their residents' mobile devices, then there are probably just holes they don't tell people about so they can continue to sell their products
15:57 <mordred> pabelanger: well - I think we don't need to know the name - is one of the older versions of the PR one without the setup.cfg?
15:57 * Shrews fumes over OS X changing his default shell out from under him
15:57 <jangutter> Shrews didn't read Ars Technica's writeup on Catalina.
15:58 <fungi> i guess if those devices become unavailable for purchase in china (or, for that matter, the usa) then it probably means it's actually not granting secret access for governments and law enforcement
15:58 <pabelanger> mordred: yes, I believe https://dashboard.zuul.ansible.com/t/ansible/build/c388db008b4749679d2fa3573711f52a
15:58 <jangutter> fungi: fun fact, South Africa has a law mandating government access to intercept communications. We're waay ahead of the rest of the world with surveillance-as-a-service.
15:59 <mordred> pabelanger: cool, thanks
15:59 <fungi> jangutter: so does the usa, it's been on the books since ~1998; i had to manage law enforcement access to packet sniffers at an internet service provider for years
16:00 <fungi> basically they give you two days to set up access for law enforcement agencies after receipt of a warrant, but prefer you just set them up with perpetual access and they "promise" to only actually use it when they have a valid warrant issued
16:01 *** bhavikdbavishi has joined #zuul
16:01 <fungi> calea, the communications assistance for law enforcement act
16:02 <jangutter> fungi: ours requires monitoring at all times, aggregated to a centralised surveillance warehouse. access to the historical data is only given with a warrant, or if you know a guy.
16:02 <fungi> well, also here, warrants are available if you know a guy, but sure, may as well dispense with the formality ;)
16:12 *** igordc has joined #zuul
16:20 *** openstackgerrit has joined #zuul
16:20 <openstackgerrit> James E. Blair proposed zuul/zuul-registry master: Implement namespacing  https://review.opendev.org/687613
16:20 <corvus> tristanC: ^ let me know if that makes sense
16:21 <openstackgerrit> James E. Blair proposed zuul/zuul-registry master: Implement namespacing  https://review.opendev.org/687613
16:22 <tristanC> corvus: having tests would help understanding... could we add a test_main with the cherrypy test object?
16:22 *** jamesmcarthur has quit IRC
16:22 <corvus> tristanC: i plan on adding tests, but i think the most effective test of this would be functional
16:23 <corvus> i plan to extend the test playbook to set this up and exercise it with docker, podman, k8s, etc
16:24 <corvus> that's next on my list
16:27 *** gothicmindfood has joined #zuul
16:29 *** jangutter has quit IRC
16:30 *** jamesmcarthur has joined #zuul
16:32 *** hashar has joined #zuul
16:33 <openstackgerrit> James E. Blair proposed zuul/zuul-registry master: Implement namespacing  https://review.opendev.org/687613
16:35 <fungi> looks like nodepool-build-image has been consistently ending in post_failure today (or it's a coincidence which has hit three builds across two changes): http://zuul.opendev.org/t/zuul/builds?job_name=nodepool-build-image
16:35 *** jamesmcarthur has quit IRC
16:36 <fungi> "push: dial tcp 127.0.0.1:9000: connect: connection refused"
16:36 <fungi> is that using the new zuul-registry implementation yet?
16:37 <corvus> fungi: no, z-r is only used for opendev's intermediate registry
16:38 <fungi> okay, thanks. and yeah, this error does look familiar so i guess it's just the nondeterministic problem we've seen previously with the old design
16:38 <fungi> just wanted to be sure it wasn't something suddenly broken in z-r
16:38 <corvus> fungi: oh, i was thinking this was new
16:38 <corvus> i don't understand the error yet
16:39 <fungi> well, i recall making a joke about the fact that it says "dial" earlier in the week or maybe last week
16:39 * fungi digs
16:39 *** jamesmcarthur has joined #zuul
16:40 <fungi> ahh, no, the other occurrence was a "no route to host" you were experiencing when testing locally
16:40 <fungi> not "connection refused"
16:40 <corvus> this may be fallout from the jwt change, but i don't see how yet
16:42 <fungi> but the db does have a similar failure on record from 5 days ago: http://zuul.opendev.org/t/zuul/build/54670bf42d5a41e0bfced6e6ee3a537c
16:42 <corvus> oh wow, big bug.  patch incoming :)
16:48 <openstackgerrit> James E. Blair proposed zuul/zuul-registry master: Fix authorization URL  https://review.opendev.org/687622
16:48 <fungi> thanks! reviewing now. jangutter initially mentioned one of those failures in #openstack-infra so i went digging to see whether it was happening often
16:48 <corvus> fungi, tristanC, mordred: ^
16:48 <corvus> we also need a system-config change for opendev, i'll do that real quick
16:49 <corvus> er, that patch is missing something, 1 sec
16:49 <openstackgerrit> James E. Blair proposed zuul/zuul-registry master: Fix authorization URL  https://review.opendev.org/687622
16:49 <fungi> aha, so that worked when you were testing locally, i guess, because it actually was listening on 127.0.0.1?
16:49 <corvus> fungi: yes, and the tests are as well
16:50 <corvus> however, that is not the right url to use when obtaining a token from opendev's intermediate registry :)
16:51 <fungi> of course
16:53 *** jpena is now known as jpena|off
16:53 *** panda is now known as panda|off
17:03 <daniel2> so switching to raw from qcow2 entirely fixed my issues, nodes are spinning up and marked as "ready"
17:22 <fungi> remind me, what was the previous behavior you saw when using qcow2?
17:22 <fungi> boot timeouts?
17:23 <SpamapS> Been there, done that, crashed that cloud.
17:24 <fungi> did you at least get a tee shirt?
17:26 <pabelanger> fungi: yah, nova defaults to converting qcow2 to raw on compute nodes
17:26 <pabelanger> we had the same issue in infracloud
17:26 <pabelanger> but disabled that in nova.conf
17:27 <fungi> i remember
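[The nova.conf setting pabelanger is alluding to is nova's `force_raw_images` option; a minimal sketch of disabling the qcow2-to-raw conversion on compute nodes. Check your own deployment before copying - image handling interacts with other libvirt image settings.]

```ini
# /etc/nova/nova.conf on the compute nodes
[DEFAULT]
# Defaults to True: nova converts qcow2 images to raw on the
# hypervisor, which can be slow and disk-hungry for large images.
force_raw_images = False
```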
17:37 *** bhavikdbavishi has quit IRC
17:50 <openstackgerrit> Merged zuul/zuul-registry master: Fix authorization URL  https://review.opendev.org/687622
17:52 <openstackgerrit> James E. Blair proposed zuul/zuul-registry master: Implement namespacing  https://review.opendev.org/687613
17:52 <corvus> okay, we should be back to normal now
17:53 <corvus> feel free to recheck any changes that hit problems with container jobs (eg zuul-quick-start, build-image, etc)
17:54 <fungi> thanks!!!
17:59 *** igordc has quit IRC
18:45 *** jamesmcarthur has quit IRC
19:07 *** pcaruana has quit IRC
19:11 *** mgagne has quit IRC
19:29 <mordred> corvus: I got a 500 from the registry: https://zuul.opendev.org/t/openstack/build/32f0bcffc0e84a41b22f912dd96c2690
19:30 <mordred> corvus: my hunch is that it's a depends-on image layer from a job from before the new registry rollout ...
19:30 <mordred> but I have not investigated further yet, and 500 is a weird error for that
19:31 <corvus> mordred: that would make sense, i did not port over the data
19:31 <corvus> we should check the times first to verify your hunch, then also, i agree, see about maybe having that situation present itself differently :)
19:32 <mordred> corvus: ah - yes - missing layer
19:32 <mordred> corvus: swift is returning a 404 and that's causing an exception in cherrypy
19:32 <mordred> corvus: let me see if I can make a patch
19:34 *** hashar has quit IRC
19:39 <openstackgerrit> Monty Taylor proposed zuul/zuul-registry master: Raise a 404 when we don't find a blob from swift  https://review.opendev.org/687657
19:39 <mordred> corvus: ^^ hasher.update(None) doesn't work too well
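[A sketch of the shape of that fix, not the actual zuul-registry code - the storage call and names here are hypothetical stand-ins. The idea: when the backend has no data for a blob, translate that into a 404 instead of letting hashlib choke on None (in the real server the exception would be cherrypy.HTTPError(404)):]

```python
import hashlib


class BlobNotFound(Exception):
    """Stand-in for cherrypy.HTTPError(404) in the real server."""


def read_blob(storage, path):
    # storage.get() is a hypothetical backend call returning bytes,
    # or None when swift 404s on the object.
    data = storage.get(path)
    if data is None:
        # Without this guard, hasher.update(None) raises a TypeError,
        # which cherrypy surfaces as a 500 instead of a 404.
        raise BlobNotFound(path)
    hasher = hashlib.sha256()
    hasher.update(data)
    return data, hasher.hexdigest()
```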
19:40 <tobias-urdin> quick question: if i have a job with a parent and i want to override required-projects completely and not append to it, can i do that?
19:42 <corvus> tobias-urdin: no, usually that situation calls for reworking the inheritance (so that required-projects are added at a lower level)
19:42 <tobias-urdin> ok thanks!
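[An illustration of the restructuring corvus suggests - all job names here are made up. Because required-projects accumulate through inheritance, keep them off the shared parent and declare them only on the leaf jobs, so each variant fully controls its own list:]

```yaml
# Hypothetical example: a common parent with no required-projects...
- job:
    name: base-integration
    run: playbooks/integration.yaml

# ...and leaf jobs that each declare exactly the projects they need.
- job:
    name: integration-foo
    parent: base-integration
    required-projects:
      - example/foo

- job:
    name: integration-bar
    parent: base-integration
    required-projects:
      - example/bar
```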
19:44 <corvus> mordred: i agree with your technical fix, but i'm confused about how we ended up with a manifest referencing a blob which doesn't exist
19:45 <corvus> mordred: can you point me at some job output?
19:51 <mordred> corvus: yes
19:52 <mordred> corvus: https://zuul.opendev.org/t/openstack/build/32f0bcffc0e84a41b22f912dd96c2690 is the job output for the failed job
19:53 <mordred> corvus: and then the docker logs for the registry container have the http trace
19:54 <mordred> corvus: I'm also confused as to how we wound up in the situation
19:54 <mordred> s/the/that/
19:54 <mnaser> forgive the irony of this, but is "kick off a jenkins job and return value" something we can add into zuul/zuul-jobs?
19:54 <mnaser> i'm only saying it because several voices raised the need (and afaik, i think that's how volvo talked about how they did it?)
19:55 <pabelanger> should be, I know software factory does it with ovirt, IIRC
19:55 <corvus> mnaser: yep, i don't see why not.  it's worth noting that there are some serious pitfalls there if you want to try to do that to actually test a change, but there are plenty of other legitimate reasons to do so.
19:55 *** hashar has joined #zuul
19:56 <corvus> mnaser: having a role for that which explained clearly what it does and the limitations would be good, i think.  also, if folks wanted to try to overcome those limitations (ie, transfer git repos to a jenkins workspace, etc), good to have a place to share.
19:58 <corvus> (to elaborate: it would be simple and correct to use that to trigger some other build or deployment system when a change merges.  it would take a lot of work to do that to run a test.  if someone thinks "i will just have my jenkins job fetch the change and test it" then they don't understand what zuul is doing)
19:59 <mordred> yeah
19:59 <corvus> it is a problem that can be overcome, but this is at the core of the impedance mismatch between jenkins and zuul, and, er, really the reason we wrote it in the first place.
20:00 <mordred> yeah
20:00 <mordred> corvus: although - I do think zuul has grown some abilities since last I pondered this concept that might make solving it easier than it was before ... but definitely a non-zero amount of work that is fraught with dragons and peril
20:01 <Shrews> and ponies?  evil, evil ponies?
20:01 <mordred> so many evil ponies
20:03 <fungi> the only kind
20:03 <corvus> anyone have a good 'browse swift like a filesystem' cli tool?
20:04 <mordred> I do not
20:04 <corvus> i just want to cat some files
20:05 * corvus writes a repl real quick
20:05 <mordred> corvus: I will enjoy your repl
20:06 <fungi> i see https://github.com/redbo/cloudfuse but not sure if it's any good
20:07 <mordred> also https://github.com/ovh/svfs
20:07 <pabelanger> mnaser: I do agree with corvus; running jenkins jobs triggered from zuul can be done, but I wouldn't do it long term. But I can see it as part of a migration plan or POC.
20:08 <pabelanger> https://gerrit-staging.phx.ovirt.org/#/c/379/ was the example I was thinking of
20:09 <fungi> neat, ovirt's using zuul?!? (and gerrit?)
20:09 <fungi> i had no idea
20:10 <pabelanger> I believe they are experimenting with it, but tristanC and dmsimard should know more
20:11 <dmsimard> corvus: s3ql
20:11 <dmsimard> (for browsing swift)
20:11 <corvus> dmsimard, pabelanger, mordred: thanks!
20:12 <corvus> mordred: it's looking for this blob: sha256:49f4f4efd2abc8d2780773269f0d96c0c62f153a673e677d2b16fe3d881aa75d
20:12 <tristanC> fungi: here is oVirt's config project: https://gerrit.ovirt.org/gitweb?p=ovirt-zuul-config.git;a=summary
20:12 <fungi> nifty!
20:13 <corvus> mordred: so the question is, when the b0ac18ded52e418b971393bc0568f7ff_latest manifest was uploaded, why wasn't that blob there?
20:13 <tristanC> mnaser: another missing piece to trigger jenkins jobs is a zuul-jobs role to create a git reference that jenkins can clone from
20:14 <tristanC> e.g.: https://gerrit.ovirt.org/gitweb?p=jenkins.git;a=blob;f=zuul-playbooks/expose_source.yaml;h=973ff601e6e5abbbefa43808813f3c0e6946d538;hb=HEAD
20:15 <tristanC> mnaser: we are also looking into triggering tekton pipelines from zuul jobs...
20:15 <mordred> corvus: that is a fascinating question
20:17 <corvus> mordred: that should be this job: http://zuul.opendev.org/t/openstack/build/b0ac18ded52e418b971393bc0568f7ff
20:18 <pabelanger> tristanC: why not push the content to the jenkins node?
20:18 <tristanC> pabelanger: i think it's because the jobs are expecting to clone something
20:19 <mordred> node ownership gets weird too
20:19 <corvus> mnaser, tristanC: maybe you could work on porting that overt playbook into zuul-jobs?
20:19 <corvus> ovirt even
20:20 <corvus> mordred: http://paste.openstack.org/show/782465/
20:20 <mnaser> yeah, i can see if i have some time; i'm kinda trying to see if i have an actual use case that i can test this against
20:20 <mordred> corvus: that looks like it uploaded and returned a 201
20:21 <corvus> ayup
20:22 <mordred> we're not setting expires headers, are we?
20:22 <corvus> mordred: nope.  we have a cron job to prune
20:26 <corvus> mordred: i wonder if we're seeing another manifestation of whatever caused the issues with the docker registry?
20:27 <corvus> a 0 byte file and a 404 share some things in common
20:29 <corvus> mordred: i'm out of immediate ideas -- i suggest we stop the prune cron for a while, add a little more debug logging, and wait for it to happen again
20:30 <mordred> corvus: ooh. that's an interesting thought
20:31 <mordred> corvus: like - what if we're writing this, and then something is going away
20:31 <mordred> corvus: yeah. I think that's a good idea
20:31 <fungi> what criteria does the prune use (or is it supposed to use) to decide it's safe to remove images?
20:31 <mordred> corvus: also - it might be wasteful resource-wise - but what if we fetch after put to verify we get content back?
20:32 <fungi> that does seem like it could be costly for large images
20:32 <mordred> corvus: (although, also - technically it is an eventually consistent system, so I believe the object is not guaranteed to exist immediately after PUT)
20:32 <corvus> mordred: can we get clarification on that?
20:32 <corvus> because that sounds very problematic
20:32 <mordred> fungi: maybe we can HEAD the object we PUT to at least get a record in the logs that swift thinks the thing is there and has a length?
20:32 <fungi> that would not be costly, sure
20:33 <corvus> mordred: the head sounds like a reasonable debug thing under the circumstances.
20:33 *** pcaruana has joined #zuul
20:33 <mordred> timburke, tdasilva: ^^ are we saying dumb swift things?
20:33 <fungi> and yeah, i've always heard that swift makes no guarantees of immediate availability from one node for objects written to another node, but i'm unsure of the details of that
20:36 <mordred> corvus: do we know if we've seen this on both ceph and swift? (assuming the zero-byte issue is the same as this 404 issue)
20:37 <corvus> mordred: only ever on rax swift
20:37 <corvus> mordred: by that, i mean we've never run this on anything else
20:37 <mordred> corvus: nod
20:39 <mnaser> i'm pretty sure swift has eventual consistency
20:39 * timburke is reading...
20:40 <mnaser> "For example, suppose a container server is under load and a new object is put in to the system. The object will be immediately available for reads as soon as the proxy server responds to the client with success. However, the container server did not update the object listing, and so the update would be queued for a later update. Container listings, therefore, may not immediately contain the object."
20:40 *** pcaruana has quit IRC
20:40 <mnaser> from swift docs, but i'm sure you have a much better source of info :)
20:41 <corvus> that's fine -- we only use listings for async things like prune
20:42 <corvus> (but if a put=201 can be followed by a get=404, then we may need to rethink some things)
20:46 <timburke> you'd need some pretty pathological failures to have a 201 on PUT be followed by a 404 on GET -- having a bunch of nodes error-limited during the PUT but then having that wear off by the time you're doing the GET, or some weird write affinity setting and then the cluster's WAN link goes down, that sort of thing
20:47 <timburke> of course, i don't have much insight into how rax swift behaves these days...
20:47 <corvus> cool, i think that's good enough for us to assume that mordred's 'head after put' is a potentially useful debug tool, as well as confirmation that our design isn't crazy
20:48 <corvus> (if head after put returns 404, then indeed it does mean something has gone wrong)
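[The pattern mordred proposes - write the object, then immediately HEAD it and log what the store claims to hold - might look roughly like this. This is a sketch against a generic object-store client: put_object/head_object and the log messages are hypothetical stand-ins, not the actual zuul-registry storage API.]

```python
import logging

log = logging.getLogger("registry.store")


def put_blob(store, path, data):
    """Upload a blob, then HEAD it to record that the store can see it.

    'store' is any object with put_object(path, data) and
    head_object(path) -> dict with 'content-length', or None on a miss.
    """
    store.put_object(path, data)
    # Debug aid: a HEAD right after the PUT is cheap (no body transfer)
    # and leaves a log record of the size the store reports.  With an
    # eventually consistent backend a miss here is suspicious, though
    # not necessarily fatal.
    head = store.head_object(path)
    if head is None:
        log.warning("HEAD after PUT returned 404 for %s", path)
        return False
    log.debug("PUT %s: %d bytes sent, store reports %s bytes",
              path, len(data), head.get("content-length"))
    return True
```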
20:55 <corvus> mordred: since that upload, we've enabled cherrypy request logging, so we're already getting a bit more info than before, though i have my doubts that would tell us more in this case -- we have enough logging to know that we believe we uploaded both the manifest blob and the tag which pointed to it.  i think the head-after-put log would be useful, and turning off pruning just to eliminate that as a potential source are the best next steps.
20:56 <timburke> fwiw, the 404 in http://paste.openstack.org/show/782465/ seems expected -- given the is-stale check around https://github.com/openstack/openstacksdk/blob/0.36.0/openstack/object_store/v1/_proxy.py#L378
20:57 <timburke> i guess i'm still wondering what the observed bad was -- but i've gotta run for a meeting
20:59 <corvus> timburke: oh sorry -- it's the next 404 after that, several days later, that's unexpected
21:01 <mordred> corvus: want me to make a patch for that?
21:01 <corvus> mordred: that'd be great, i've got a handful of half-written patches in my tree right now :/
21:02 <mordred> kk
21:02 <corvus> timburke, mordred: here's all the relevant logging i can come up with: the first chunk is the upload of the blob and the tag which references it, and the second chunk is when we went to go fetch that blob.
21:03 <corvus> timburke, mordred: http://paste.openstack.org/show/782618/
21:04 <corvus> mordred, fungi: the tricky thing about the prune command is that i can't find any output to confirm whether it did or did not delete anything.  i *expect* it not to delete anything, and in that case i also expect it not to output anything.  so i think the approach there should be for us to turn off prune for a bit (we're not scheduled to prune anything for a while anyway), update it to emit some logging without pruning, make sure we get that going where we want, then turn it back on for real.
21:04 <mordred> corvus: ++
21:05 <fungi> yeah, that's reasonable
21:06 <fungi> i too would expect no output to mean nothing was done, but out of curiosity how is docker able to figure out that images written to the registry are no longer "needed"?
21:06 <fungi> does that depend on having actual containers running from them on the system in question?
21:07 <fungi> (which for the registry would only be the images used to run the registry service itself, right?)
21:08 <corvus> fungi: it's not docker doing the pruning i'm talking about, it's zuul-registry.  zuul-registry prune does 2 things -- clean up aborted uploads, and remove images older than a certain amount of time (180 days for opendev) -- it is, after all, an *intermediate* registry.
21:09 <corvus> as long as we don't have many aborted uploads, our usage won't be affected by not pruning for quite some time
21:09 <fungi> ahh, well 687673 is turning off the `docker image prune -f` task
21:10 <fungi> thought that was still what was being discussed, sorry
21:10 <corvus> fungi: welp, that change is completely wrong then, let me fix it :)
21:11 <fungi> ;)
21:12 <fungi> okay, the new patchset is making more sense in the context of this discussion, thanks!
21:12 <corvus> yay code review
21:19 *** hashar has quit IRC
21:25 *** armstrongs has joined #zuul
21:25 <openstackgerrit> Monty Taylor proposed zuul/zuul-registry master: HEAD object after PUT  https://review.opendev.org/687681
21:28 *** rfolco has quit IRC
21:29 <openstackgerrit> Merged zuul/zuul-registry master: Raise a 404 when we don't find a blob from swift  https://review.opendev.org/687657
21:34 <openstackgerrit> Monty Taylor proposed zuul/zuul-registry master: HEAD object after PUT  https://review.opendev.org/687681
21:35 *** armstrongs has quit IRC
21:37 *** avass has quit IRC
22:42 *** jamesmcarthur has joined #zuul
22:46 <openstackgerrit> James E. Blair proposed zuul/zuul-registry master: Run docker and podman push/pull tests  https://review.opendev.org/687692
22:47 <corvus> tristanC: i did not get as far as i hoped with tests today, but my plan is to essentially do what's in that change ^ again in the buildset configuration; hopefully tomorrow
23:01 <openstackgerrit> Merged zuul/nodepool master: Assign static 'building' nodes in cleanup handler  https://review.opendev.org/687261
23:14 *** saneax has joined #zuul
23:19 *** openstackstatus has joined #zuul
23:19 *** ChanServ sets mode: +v openstackstatus
23:30 *** tosky has quit IRC
23:43 *** jamesmcarthur has quit IRC

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!