*** wxy-xiyuan has joined #zuul | 00:04 | |
*** adamw has joined #zuul | 01:15 | |
*** adamw has quit IRC | 01:19 | |
*** adamw has joined #zuul | 01:20 | |
*** jamesmcarthur has quit IRC | 02:43 | |
*** jamesmcarthur has joined #zuul | 02:53 | |
*** bhavikdbavishi has joined #zuul | 03:06 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: upload-afs: rename to upload-afs-roots; add afs-upload-synchronize https://review.opendev.org/705368 | 03:07 |
---|---|---|
*** bhavikdbavishi1 has joined #zuul | 03:13 | |
*** bhavikdbavishi has quit IRC | 03:15 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 03:15 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: upload-afs: rename to upload-afs-roots; add afs-upload-synchronize https://review.opendev.org/705368 | 03:18 |
openstackgerrit | Ian Wienand proposed zuul/project-config master: Migrate to upload-afs-roots role https://review.opendev.org/705372 | 03:22 |
*** jamesmcarthur has quit IRC | 03:22 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: Remove deprecated upload-afs role https://review.opendev.org/705373 | 03:24 |
*** jamesmcarthur has joined #zuul | 03:28 | |
*** jamesmcarthur has quit IRC | 03:46 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: upload-afs: rename to upload-afs-roots; add afs-upload-synchronize https://review.opendev.org/705368 | 03:48 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: Remove deprecated upload-afs role https://review.opendev.org/705373 | 03:48 |
*** jamesmcarthur has joined #zuul | 03:51 | |
*** jamesmcarthur has quit IRC | 04:50 | |
*** raukadah is now known as chkumar|rover | 04:55 | |
*** evrardjp has quit IRC | 05:33 | |
*** evrardjp has joined #zuul | 05:34 | |
*** jamesmcarthur has joined #zuul | 06:01 | |
*** jamesmcarthur has quit IRC | 06:06 | |
*** sanjayu_ has joined #zuul | 06:12 | |
*** mattw4 has joined #zuul | 06:17 | |
*** sanjayu__ has joined #zuul | 06:22 | |
*** sanjayu_ has quit IRC | 06:25 | |
*** mattw4 has quit IRC | 06:43 | |
*** felixedel has joined #zuul | 06:48 | |
*** sanjayu__ has quit IRC | 06:56 | |
*** saneax has joined #zuul | 06:58 | |
AJaeger | ianw: what about sending the announcement for the removal already now so that people can review the whole stack? Or do you want to give folks a chance for first review today? corvus, any suggestions on the stack above? ^ | 07:05 |
*** AJaeger has quit IRC | 07:06 | |
*** AJaeger has joined #zuul | 07:08 | |
*** felixedel has quit IRC | 07:27 | |
*** pcaruana has joined #zuul | 07:43 | |
*** tosky has joined #zuul | 08:29 | |
ianw | AJaeger: i don't know if i'd send something to the announce list about a *potential* deprecation, but if you'd like something to the main list i could | 08:31 |
ianw | i feel like the intersection of people using zuul and uploading various artifacts to afs is probably just openstack, though? | 08:32 |
AJaeger | ianw: I guess, but we should announce it nevertheless. I suggest you chat with corvus when both of you are online ;) Since he wrote the role initially AFAIR, he might be a good reviewer as well. | 08:43 |
*** yolanda has joined #zuul | 09:03 | |
openstackgerrit | Felix Schmidt proposed zuul/zuul master: Implement github checks API https://review.opendev.org/705168 | 09:18 |
mnaser | ^ this is cool! | 09:21 |
*** sshnaidm|off is now known as sshnaidm | 09:30 | |
*** felixedel has joined #zuul | 09:53 | |
*** felixedel has left #zuul | 10:02 | |
*** bhavikdbavishi has quit IRC | 10:12 | |
*** wxy-xiyuan has quit IRC | 10:42 | |
*** mhu has joined #zuul | 10:43 | |
*** bhavikdbavishi has joined #zuul | 11:10 | |
tobiash | corvus: when you have time, I'd like to discuss a cherrypy related problem we have and possible options to resolve this | 11:40 |
webknjaz | @tobiash: if it's generic enough, you can ask me about CherryPy | 11:42 |
tobiash | tldr is that we're facing occasional connection resets when connecting to zuul-web when many requests in parallel are coming in. I can reproduce this problem also locally with a minimal cherrypy based server. The hypothesis is that the http server of cherrypy cannot accept connections fast enough so they are rejected by the kernel | 11:42 |
*** bhavikdbavishi1 has joined #zuul | 11:43 | |
tobiash | I've experimented locally with combining cherrypy with the tornado web server (see https://docs.cherrypy.org/en/latest/deploy.html#tornado) | 11:43 |
*** bhavikdbavishi has quit IRC | 11:44 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 11:44 | |
webknjaz | does it happen w/o tornado? | 11:44 |
tobiash | which resolves this issue, I also have a running poc with zuul that works except websockets | 11:44 |
webknjaz | what's the worker thread count setting? | 11:44 |
tobiash | webknjaz: I've experimented with default setting, and also higher settings (like 100) | 11:45 |
webknjaz | there's recently been some refactoring in Cheroot concerning worker threads, try updating and see if it gets better | 11:46 |
tobiash | I used latest versions which are on pypi for both cherrypy and cheroot | 11:46 |
webknjaz | if you can share a reproducer with just CherryPy, I could take a deeper look. maybe post that as a GH issue. | 11:47 |
tobiash | test example cherrypy only: http://paste.openstack.org/show/789060/ | 11:48 |
*** hashar has joined #zuul | 11:48 | |
tobiash | test example cherrypy with tornado: http://paste.openstack.org/show/789061/ | 11:48 |
tobiash | k, I'll open an issue | 11:48 |
webknjaz | thanks | 11:49 |
webknjaz | and also describe how you send requests and all the details there | 11:49 |
tobiash | yes, thanks | 11:51 |
*** felixedel has joined #zuul | 11:58 | |
tobiash | webknjaz: shall I open the ticket against cherrypy or cheroot? | 12:07 |
webknjaz | Not sure yet. It looks like a Cheroot issue but it's fine to open it in CherryPy too. I think, if I manage to come up with a Cheroot-only reproducer, I'll just transfer the issue across repos myself (github will put a redirect from the old location so it's fine). | 12:09 |
tobiash | ok, thanks | 12:10 |
*** avass has joined #zuul | 12:11 | |
*** rfolco has joined #zuul | 12:24 | |
tobiash | issue: https://github.com/cherrypy/cherrypy/issues/1839 | 13:01 |
webknjaz | thanks | 13:01 |
*** rlandy has joined #zuul | 13:01 | |
tobiash | thanks for helping :) | 13:02 |
openstackgerrit | Merged zuul/zuul master: Dockerfile: create a zuul user with uid 10001 https://review.opendev.org/650246 | 13:09 |
*** bolg has joined #zuul | 13:10 | |
mhu | ttx, thanks for the clarification on the history w/ CNCF & CDF - the whole github criteria is indeed unreasonable | 13:16 |
mhu | and also weird since non open source offerings like Circle CI are on the landscape | 13:17 |
*** jamesmcarthur has joined #zuul | 13:21 | |
*** jamesmcarthur has quit IRC | 13:34 | |
*** Goneri has joined #zuul | 13:40 | |
*** jamesmcarthur has joined #zuul | 13:46 | |
hashar | hi | 13:56 |
hashar | mhu: I would guess it is to gauge the popularity of a software | 13:56 |
hashar | since most everything is on github nowadays | 13:57 |
mhu | hashar, yes that was my assumption too | 13:57 |
hashar | the repository used to be monitored from gerrit to github, maybe that is enough | 13:57 |
hashar | or one could check with them as to why github is a requirement in the first place | 13:58 |
ttx | mhu: the weird part is that the github requirement is only for open source projects. | 14:08 |
ttx | So basically it's easier to get a proprietary solution listed. | 14:09 |
pabelanger | 'Projects must be open source and hosted on or mirrored to GitHub.' | 14:13 |
pabelanger | how does proprietary even get hosted? | 14:13 |
pabelanger | https://github.com/cdfoundation/cdf-landscape#new-entries | 14:13 |
hashar | oh | 14:19 |
hashar | they have a landscape.yml file with a list of projects | 14:19 |
hashar | and then a bot auto-generates a processed_landscape.yml to add a bunch of metadata | 14:19 |
hashar | such as # of commits, # of tweets etc | 14:19 |
webknjaz | tobiash: I've added comments on the issue | 14:19 |
mordred | yeah. it's very much from a worldview of commercial mindset | 14:20 |
tobiash | webknjaz: thanks, I'm already trying this out | 14:20 |
hashar | so I guess they require github for their bot to be able to process the project extra meta data | 14:20 |
tobiash | the sysctl values you mentioned are the same on the ubuntu box by default | 14:20 |
tobiash | trying now server.socket_queue_size with that value on ubuntu | 14:20 |
webknjaz | do you have slow machines/a lot of things running in background? | 14:20 |
tobiash | no, macos was idle otherwise and the ubuntu test is on a freshly spawned ubuntu aws machine | 14:21 |
*** jamesmcarthur has quit IRC | 14:21 | |
*** jamesmcarthur has joined #zuul | 14:23 | |
tobiash | still getting connection resets | 14:23 |
tobiash | also with somaxconn 1024 no change | 14:24 |
*** bolg has quit IRC | 14:29 | |
webknjaz | tobiash: did you adjust `somaxconn` with `server.socket_queue_size` setting? | 14:33 |
tobiash | yes, default is 128 so I tested with server.socket_queue_size and I did a second test run with both values 1024 | 14:34 |
*** jamesmcarthur has quit IRC | 14:35 | |
*** bhavikdbavishi has quit IRC | 14:36 | |
*** zxiiro has joined #zuul | 14:37 | |
*** chkumar|rover is now known as raukadah | 14:37 | |
webknjaz | plz try `cheroot<8.1.0` just in case | 14:41 |
hashar | tobiash: for sockets troubles, I often rely on the utility `ss` a good swiss army knife to list/monitor socket states | 14:42 |
hashar | so you can list all sockets in state time_wait with destination port 443 for example | 14:42 |
webknjaz | yeah, `ss` should be helpful here | 14:43 |
hashar | which stands for "socket statistics" | 14:45 |
webknjaz | yep | 14:47 |
webknjaz | Just ran that reproducer in 4 parallel shells, still not reproducible on my machine | 14:47 |
hashar | might be different linux kernel tcp settings? :/ | 14:48 |
webknjaz | totally, taking into account it's Gentoo Linux :) | 14:48 |
tobiash | I just retried the test with cheroot<8.1.0 and the resets are gone | 14:48 |
tobiash | however it seems to get slow at some point | 14:48 |
webknjaz | alright, so we need to work towards creating a reproducer in pure Cheroot | 14:48 |
webknjaz | yep, it's slow because worker threads are per TCP connection, and not per HTTP request + the pool size is static (unless you use `cherrypy-dynpool`) | 14:50 |
webknjaz | moving the issue to the Cheroot tracker then | 14:50 |
tobiash | oh from the docs I thought it's growing automatically | 14:50 |
webknjaz | nope, the pool doesn't grow but there's an interface for external things to resize it. | 14:51 |
tobiash | yepp, with 200 threads it's fast :) | 14:51 |
webknjaz | I'm curious whether it's related to https://github.com/cherrypy/cheroot/issues/249 | 14:55 |
*** fbo has quit IRC | 15:00 | |
tobiash | webknjaz: shall I try a bisect? | 15:01 |
webknjaz | I mean, it's quite obvious that the regression is introduced by https://github.com/cherrypy/cheroot/pull/199 | 15:02 |
webknjaz | We need to (1) create a stable reproducer based on pure Cheroot (w/o CherryPy layer) and (2) figure out how to fix it. | 15:03 |
tobiash | is there a minimal standalone example how to use it? Then I'll try to reproduce it directly with cheroot | 15:04 |
webknjaz | oh, it's just a WSGI server (well, on top of an HTTP server) | 15:05 |
webknjaz | feed it with a dummy WSGI app | 15:05 |
webknjaz | simple example: https://github.com/cherrypy/cheroot/blob/bee5df9/cheroot/wsgi.py#L7-L15 | 15:06 |
*** jamesmcarthur has joined #zuul | 15:12 | |
tobiash | also reproduces with the same client script (needs a few more threads) and directly with ^ (only changed port) | 15:15 |
*** mattw4 has joined #zuul | 15:18 | |
webknjaz | I'll try to hit it with `ab` locally, maybe if I use big enough parallelism value, it'll break... | 15:20 |
webknjaz | I guess it's because it' | 15:22 |
webknjaz | it's hard to hit limits with 32GB + i7 w/ 8 virtual cores | 15:23 |
tobiash | well, I tested on an 72 core aws ubuntu machine ;) | 15:24 |
tobiash | (it's also used for other tests on an on-demand basis) | 15:24 |
*** mattw4 has quit IRC | 15:25 | |
webknjaz | So I increased numbers in the client script to ~10k and started hitting `ERROR:test:FAILED http://localhost:5000/info: HTTPConnectionPool(host='localhost', port=5000): Max retries exceeded with url: /info (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f028f3fe790>: Failed to establish a new connection: [Errno 16] Device or resource busy'))` but that seems like the client got stuck, not srv | 15:26 |
tobiash | yes, I get a mixture of 'Connection reset by peer' and 'Remote end closed connection without response' | 15:27 |
webknjaz | oh, and the server was hitting "too many open files" for a while | 15:27 |
webknjaz | that's different | 15:27 |
webknjaz | facepalm | 15:27 |
webknjaz | I was testing against wrong cheroot version | 15:28 |
webknjaz | okay, now I see the error as described | 15:29 |
tobiash | :) | 15:29 |
webknjaz | PBKAC | 15:29 |
*** zbr has quit IRC | 15:30 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Cap cheroot to fix issues with concurrent requests https://review.opendev.org/705459 | 15:32 |
tobiash | zuul-maint: this avoids hitting https://github.com/cherrypy/cheroot/issues/263 until there is a fix available ^ | 15:33 |
tobiash | webknjaz: thanks a lot for your help! | 15:33 |
webknjaz | you're welcome :) | 15:34 |
*** zbr has joined #zuul | 15:34 | |
*** hashar has quit IRC | 15:35 | |
*** rfolco is now known as rfolco|eats | 15:43 | |
mordred | corvus, tristanC: in https://review.opendev.org/#/c/702106 - what installs ZK now? | 15:46 |
*** electrofelix has joined #zuul | 15:47 | |
tristanC | mordred: the zookeeper service definition is https://review.opendev.org/#/c/702106/22/conf/zuul/resources.dhall@558 , and it's currently just spawning docker.io/library/zookeeper | 15:48 |
tristanC | mordred: i mean, it is an optional service, that depends if the user provides a zk secret | 15:49 |
mordred | tristanC: ah - cool - thanks! | 15:50 |
tristanC | mordred: and this is not sufficient according to the spec, a follow-up should setup a more solid solution such as the zk operator or even the helm chart | 15:50 |
mordred | yah - but what's there is an adequate first step I think | 15:50 |
tristanC | yep, that's just the easiest thing to use right now to get it going | 15:50 |
*** felixedel has quit IRC | 15:55 | |
mordred | tristanC: ok - another one for you - in https://review.opendev.org/#/c/702716/7/roles/zuul-ensure-gearman-tls/tasks/main.yaml - why write the generated certs locally? just for debugging? | 15:56 |
tristanC | mordred: just because it's easier to make the openssl cli write file | 15:57 |
tristanC | mordred: the operator sdk setups a cwd per CR to avoid conflict, and iiuc this can be used for such tasks | 15:58 |
mordred | tristanC: oh - no - what I mean is - line 36 - we don't seem to use those files for anything? | 15:59 |
mordred | (I'm just wondering if I'm missing something - the openssl creation and then k8s secret creation makes sense) | 16:00 |
tristanC | mordred: oh that, they are used in the next review, to dump the status queues | 16:00 |
tristanC | mordred: https://review.opendev.org/#/c/703624/7/roles/zuul-restart-when-zuul-conf-changed/module_utils/gearlib.py@24 | 16:01 |
mordred | tristanC: ah - cool | 16:01 |
mordred | tristanC: I really wish lookup syntax was different | 16:06 |
tristanC | mordred: what do you mean? | 16:07 |
mordred | tristanC: the zuul_conf_secret: "{{ lookup('k8s', api_version='v1', kind='Secret', namespace=namespace, resource_name=zuul_name + '-secret-zuul') }}" would be SO neat if it was more structured and less embedded in a string - but there's nothing we can really do about that :) | 16:10 |
tristanC | mordred: indeed, that's odd we are not using them as regular ansible task | 16:14 |
mordred | yeah. It's always been weird to me that it's not a more first-class construct. but oh well | 16:15 |
*** avass has quit IRC | 16:27 | |
*** sshnaidm is now known as sshnaidm|afk | 16:27 | |
*** mattw4 has joined #zuul | 16:31 | |
*** mattw4 has quit IRC | 16:32 | |
*** jamesmcarthur has quit IRC | 16:33 | |
*** jamesmcarthur has joined #zuul | 16:37 | |
*** zxiiro has quit IRC | 16:45 | |
*** mattw4 has joined #zuul | 16:51 | |
*** rfolco|eats is now known as rfolco | 16:51 | |
*** mattw4 has quit IRC | 16:56 | |
tristanC | mordred: thanks for the review on the zuul-operator. The next thing we'll need is the zuul secret to get the image promotion: https://review.opendev.org/704187 needs a config core to zuul-encrypt the dockerhub password | 16:58 |
*** mattw4 has joined #zuul | 17:08 | |
corvus | tristanC: can you update the nodepool dockerfile to match the user approach in zuul? | 17:33 |
*** evrardjp has quit IRC | 17:33 | |
*** evrardjp has joined #zuul | 17:34 | |
tristanC | corvus: yes, should we remove the entrypoint too? | 17:35 |
corvus | tristanC: yes, i think so. it's probably not as dangerous there, but still could be problematic on a builder. | 17:37 |
openstackgerrit | Tristan Cacqueray proposed zuul/nodepool master: Dockerfile: create a nodepool user with uid 10001 https://review.opendev.org/705497 | 17:38 |
openstackgerrit | Tristan Cacqueray proposed zuul/nodepool master: Dockerfile: remove the uid_entrypoint service https://review.opendev.org/705241 | 17:41 |
tristanC | corvus: in case the removal is found problematic, here are the required changes split in two ^ | 17:42 |
clarkb | tristanC: corvus I've gone through all but the last change in the operator stack and left thoughts | 17:49 |
clarkb | For the dhall itself it looks like it got cleaned up quite a bit which is nice | 17:49 |
clarkb | still a bit of nesting but looks like the vars from parent scope are used in child scopes and things are labeled well | 17:49 |
clarkb | corvus: https://review.opendev.org/#/c/703624/7/roles/zuul-restart-when-zuul-conf-changed/library/dump_zuul_changes.py has a question you might want to look at in particular because I think it is a general zuul thing | 17:51 |
corvus | clarkb: tricky. in the long run, the ha-scheduler work should obviate the need for any of this, and avoids that issue. i don't want to invest too much time working around the status quo and would rather look ahead to implementing that. however, adding a 'do-not-restore' flag isn't a big deal. but really, you'd still need to be careful about pipelines like that anyway (if you added the flag, then a | 17:57 |
corvus | restart happened, you could be missing a bunch of release jobs). so really, the best thing is probably for folks to just be careful about when they do things that cause scheduler lifecycle events (or else, just don't support that for now). | 17:57 |
corvus | i pretty much answered that with stream-of-consciousness didn't i? | 17:58 |
corvus | i guess to some up: there be dragons, and one way or another we either shrug off the difficulties, or just don't do that for now. | 17:58 |
corvus | sum up even | 17:58 |
clarkb | corvus: ya I think the difference is the operator hides that from the humans | 17:59 |
clarkb | today without the operator you have to think about those consequences | 17:59 |
clarkb | with the operator you won't notice until it breaks your releases (or similar) | 17:59 |
*** electrofelix has quit IRC | 18:06 | |
*** jamesmcarthur has quit IRC | 18:12 | |
tristanC | clarkb: thank you for the review, i replied to your comment | 18:16 |
*** jamesmcarthur has joined #zuul | 18:19 | |
clarkb | tristanC: I guess centos8 doesn't have openshift/kubectl packages yet? | 18:21 |
tristanC | clarkb: it doesn't seem like there is one already. it isn't part of centos7 either, it's only provided by the PaaS SIG | 18:23 |
clarkb | but the paas sig packages for centos 7 | 18:24 |
clarkb | (mostly just weird to be using old centos here since the base image is fedora iirc) | 18:24 |
clarkb | I guess if it gets compiled into a statically linked binary from golang then it doesn't matter much | 18:25 |
clarkb | I think we should add a note to https://review.opendev.org/#/c/703631/5/roles/zuul-reconfigure-tenant-when-conf-changed/library/k8s_exec.py about when we can remove it | 18:26 |
tristanC | clarkb: operator-framework images are based on ubi (rhel8), and yes, static golang package install works across el7 and el8 | 18:27 |
tristanC | clarkb: yes, let me update that file, and i can switch to tarball install from https://dl.k8s.io/v1.17.0/kubernetes-client-linux-amd64.tar.gz | 18:28 |
Shrews | corvus: line 425 of your change in https://review.opendev.org/#/c/705053/2/zuul/driver/gerrit/gerritconnection.py ... i feel like that string interpolation is missing the 'age' param, or did i miss something? | 18:28 |
Shrews | changes = self.connection.simpleQueryHTTP("status:merged -age:%ss") | 18:29 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Add foreground option https://review.opendev.org/635649 | 18:30 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Deprecate -d switch for running in foreground https://review.opendev.org/705185 | 18:30 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Don't enforce foreground with -d switch https://review.opendev.org/705189 | 18:30 |
mordred | Shrews: I agree with you | 18:33 |
Shrews | i thought maybe there is some scary magic in simpleQueryHTTP... but that does not appear to be the case | 18:33 |
*** armstrongs has joined #zuul | 18:39 | |
*** armstrongs has quit IRC | 18:42 | |
*** plaurin has joined #zuul | 18:47 | |
*** jamesmcarthur has quit IRC | 18:54 | |
plaurin | Hello irc people! Quick question, when using kubernetes with zuul, the log stream seems a bit different. I am reusing a job that used to work on static ssh nodes, but now I seem to have less output when it's running on a pod than on an ssh node | 18:55 |
plaurin | running on static node seems to be more verbose by default than kubectl/openshift | 18:56 |
tristanC | plaurin: that's expected, the zuul_stream module used to stream console logs requires a tcp connection, not available with pods | 18:56 |
plaurin | any possible workaround? That's a major impact for me if I cannot have proper log stream output (I have jobs that runs for 4+ hours) | 18:58 |
plaurin | other than using debug everywhere | 19:00 |
tobiash | plaurin: with the current log streaming mechanism I don't see any easy way of improving this unfortunately | 19:04 |
tobiash | it might be possible to leverage kubectl port-forward to implement this for kubectl connections | 19:05 |
tobiash | for this to work one would need to map an unused port to each host and override the log streaming target to localhost:<chosen port> for the according host | 19:06 |
tobiash | and run a kubectl port forward process in parallel similar to the ssh-agent | 19:07 |
clarkb | thinking out loud here, could you stream the `docker log $container` logs | 19:07 |
clarkb | that should capture stdout and stderr right? | 19:07 |
tobiash | clarkb: zuul expects to get the log stream per command task | 19:08 |
clarkb | tobiash: I thought it was per playbook. | 19:08 |
tobiash | it's per task and also the log stream signals the end (and result code) of the task afaik | 19:09 |
plaurin | oh that's why I lose my stream for my tasks that take 1+ hours | 19:09 |
clarkb | I think I understand what you are saying. I think I'm suggesting that we don't run the ansible plugin that does that at all | 19:09 |
clarkb | and instead bypass ansible and look directly at the stdout from the container. But ansible probably eats all that I guess | 19:10 |
clarkb | maybe we can update the ansible plugin to write to stdout under k8s instead of to a file that is streamed? | 19:10 |
clarkb | then on the executor side we can stream it from `docker logs` or similar | 19:10 |
tobiash | there are also some nifty logging poc refactorings by mordred that might resolve this issue as well | 19:12 |
clarkb | essentially what we've done is build our own docker log utility for VMs | 19:13 |
clarkb | and I think we should be able to use docker log (or equivalent) when running in k8s? | 19:13 |
corvus | clarkb, tobiash: iirc, the architecture here is that zuul has a callback plugin which gets called on every task. if the task is a "command" task, we start streaming from the remote host. now we're talking about a kubectl exec task, right? so maybe we can update that callback plugin to say "if the task is kubectl exec, start a 'kubectl logs -f' process" | 19:14 |
corvus | clarkb: (kubectl logs == docker logs, but does not have to run on the same host) | 19:14 |
clarkb | corvus: ya I think that is the more specific version of what I'm trying to say | 19:14 |
corvus | clarkb: yeah, sorry -- i should have specified i was trying to articulate your idea in more detail :) | 19:15 |
mordred | corvus, clarkb: that sounds like a decent and tractable idea | 19:15 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: Add tenant reconfiguration when main.yaml changed https://review.opendev.org/703631 | 19:15 |
plaurin | seems like a real good idea to me, I would be the 'first customer' of such a feature | 19:16 |
tobiash | yes, sounds doable as long as we keep outputting the same end markers | 19:16 |
tobiash | ftr, this is the mordred redesign logging stack: https://review.opendev.org/#/q/topic:zuul-stream-rework+(status:open+OR+status:merged) | 19:17 |
corvus | actually, is this where we would implement it? https://opendev.org/zuul/zuul/src/branch/master/zuul/ansible/base/callback/zuul_stream.py#L269-L271 | 19:17 |
corvus | (i guess it's a command task on a kubectl connection we're talking about?) | 19:18 |
tobiash | corvus: yes, that's the receiving side | 19:18 |
corvus | so that becomes "start a kubectl logs thread" instead of "start a tcp receiver thread" | 19:19 |
tobiash | I think if we go that route we should abstract the streamer here: https://opendev.org/zuul/zuul/src/branch/master/zuul/ansible/base/callback/zuul_stream.py#L275 | 19:19 |
tobiash | maybe change that from a plain thread with a function to a streamer base class that is/has a thread and specializations for tcp and kubectl | 19:20 |
corvus | tobiash: good idea; that may help with mordred's rework too | 19:20 |
mordred | ++ | 19:20 |
corvus | (because then you could replace the tcp with domain sockets, and not touch the kubectl) | 19:20 |
clarkb | tristanC: ianw can you check my comment on https://review.opendev.org/#/c/705337/1 and see what you think? | 19:21 |
tobiash | corvus: btw, hadn't we the same exception for winrm there? https://opendev.org/zuul/zuul/src/branch/master/zuul/ansible/base/callback/zuul_stream.py#L269 | 19:22 |
tobiash | with winrm we don't support streaming yet as well | 19:23 |
clarkb | tobiash: corvus we would still need to update the python that runs in ansible on the remote side to write to stdout instead of to the log file that is streamed right? | 19:23 |
plaurin | tobiash, corvus, mordred: if I can help in some way let me know, else you have my full support for this discussion around logs 😎️ | 19:23 |
corvus | tobiash: that seems like a good idea -- do you get errors or warnings or anything for failed streaming attempts? | 19:23 |
corvus | plaurin: are you interested in doing some python hacking? :) | 19:23 |
tobiash | corvus: sometimes yes, and I thought I was pretty sure that I added this even before the kubectl thing | 19:24 |
tobiash | clarkb: yes, that's the other thing that's needed | 19:24 |
corvus | clarkb, tobiash, mordred: oh wait -- are we talking about the case where we run a pod that just sits there and does nothing as the main process while we run other commands on the pod? | 19:25 |
corvus | because if that's the case, then kubectl logs isn't going to produce any useful output | 19:25 |
corvus | (the pod-as-machine case or whatever we called it) | 19:25 |
clarkb | corvus: I think what happens is because we don't have log streaming all of the ansible stuff that happens on the pod is written to a file that we can't get at | 19:26 |
tobiash | corvus: oh yes I think so | 19:26 |
*** hashar has joined #zuul | 19:26 | |
tobiash | corvus: but we could still do a kubectl port-forward and stream from localhost thing with the same abstraction in the callback | 19:26 |
clarkb | if we changed that ansible plugin thing to write to stdout we would be able to get that data from the logs command | 19:26 |
tobiash | in that case we wouldn't even need to modify the command module | 19:27 |
clarkb | if however it isn't using that plugin thing then ya I think what tobiash says is correct | 19:27 |
tristanC | clarkb: zj prefix wfm. though i've been using '_' successfully, and it's surprising the doc doesn't mention this as valid since it's common for python variable name. | 19:27 |
clarkb | we just grab the stdout anyway | 19:27 |
clarkb | tristanC: I think my biggest concern would be the next release of ansible deprecating that it works :/ | 19:28 |
clarkb | tristanC: I can use zj_ as a prefix as that is valid according to the docs and should make it unique enough for us | 19:28 |
tobiash | corvus: phew windows is filtered out at a different point: https://review.opendev.org/#/c/615804/1/zuul/ansible/callback/zuul_stream.py | 19:28 |
tobiash | I almost thought this got broken | 19:29 |
plaurin | corvus: not really interested for contributing python, you would have to take me by the hand anyways to produce anything of value, but I'm willing to try and test patches on my setup | 19:29 |
corvus | plaurin: ack. it looks like we're back to the design phase anyway :/ | 19:29 |
corvus | tobiash: i think i'm not following the forward idea | 19:29 |
openstackgerrit | Merged zuul/zuul master: Add gcloud_service auth option for Gerrit driver https://review.opendev.org/704904 | 19:30 |
corvus | clarkb: so your idea is change the kubectl connection plugin to output whatever it gets to stdout? | 19:30 |
tobiash | corvus: instead of kubectl logs the streamer could execute kubectl port-forward <some local port>:<log port> and then stream from localhost:<some local port> | 19:30 |
tobiash | that would be possible as well if we do the streamer abstraction | 19:31 |
plaurin | corvus: also for now I might be able to have a workaround using async tasks + debug statements, that might work | 19:31 |
tobiash | to be precise: "kubectl port-forward pod/mypod :19885", parse the output to get local port and then stream from localhost:<parsed port> | 19:33 |
clarkb | corvus: similar to how the command stuff happens for VMs but instead of writing to a file write to stdout then it can be streamed from k8s | 19:34 |
corvus | tobiash: in this situation, does the modified command module still run? | 19:34 |
tobiash | corvus: yes, the same command module as we use right now | 19:34 |
tobiash | and the zuul_console module as we use in vms to serve on 19885 as well | 19:35 |
corvus | so when you do "command: /bin/true" on a vm, it copies an ansiball over ssh to the vm and executes it with python. when you do the same for a host with a kubectl connection type, it also copies an ansiball over (via kubectl) and executes it with python? | 19:36 |
tobiash | that's what I expect | 19:36 |
corvus | ok, then yeah, the port-forward thing sounds like it would be feasible then | 19:37 |
tobiash | otherwise you'd need kube_foo modules that work differently just as it is with windows | 19:37 |
tobiash | we could make the streamers even stackable so we can reuse the tcp streamer then | 19:37 |
corvus | clarkb: ^ if that's the case, then i think that's why the 'stdout' approach wouldn't work -- the stdout would actually just be the json, just like a regular ssh connection | 19:38 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Gerrit: poll for merged changes if no stream events https://review.opendev.org/705053 | 19:38 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Add google-cloud-storage to executor ansible https://review.opendev.org/705279 | 19:38 |
clarkb | corvus: ah | 19:38 |
corvus | Shrews, mordred: ^ derp, thanks. | 19:39 |
clarkb | actually wait | 19:39 |
clarkb | what we write to a file on the VM isn't json | 19:39 |
corvus | clarkb: no i mean what ansible sees is json | 19:39 |
clarkb | so we should be able to write the same data to stdout instead? | 19:39 |
tobiash | clarkb: you mean the pod's stdout, not ansible's stdout? | 19:40 |
clarkb | yes | 19:40 |
tobiash | yes, would be possible but makes things more complicated than needed | 19:40 |
clarkb | where k8s will collect it | 19:40 |
clarkb | I think that avoids the need for port 19885 | 19:40 |
clarkb | because stdout is already aggregated and streamable | 19:41 |
tobiash | that port would only be open inside the pod | 19:41 |
corvus | clarkb: do you mean have the pod's main command be "tail -f /zuul_logfile"? | 19:41 |
tobiash | access is via kubectl port-forward then | 19:41 |
tobiash | so the port doesn't need to be reachable | 19:41 |
clarkb | corvus: that is one way to do it. Can't we just write to fd 0 instead of zuul_logfile though? | 19:42 |
tobiash | and executing kubectl port-forward is as easy as executing kubectl logs | 19:42 |
clarkb | er fd 1 | 19:42 |
corvus | clarkb: in this situation, there are 2 processes on the pod | 19:42 |
corvus | clarkb: the main process is basically "sleep infinity". that's what kubernetes thinks it's running, and that's what you'd get with kubectl logs. the other is what ansible runs via kubectl exec | 19:43 |
tobiash | I'd prefer using port-forward over writing into stdout of a foreign process ;) | 19:43 |
clarkb | corvus: hrm, why aren't we just running the process directly? | 19:43 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: Add CONTRIBUTE file https://review.opendev.org/705535 | 19:43 |
corvus | clarkb: because the pod terminates at the end of the process. | 19:43 |
mordred | clarkb: because we need to run multiple commands in the same container | 19:44 |
corvus | clarkb: we asked nodepool for a long-running pod. | 19:44 |
clarkb | I see | 19:44 |
clarkb | in that case having the long running process either tail the file or have it write to port 19885 makes sense to me | 19:44 |
corvus | clarkb: there is another option, which is to ask nodepool for a namespace, then you can run as many pods as you want, but you're on your own there. | 19:44 |
clarkb | ya | 19:45 |
corvus | (ie, zuul doesn't know anything about pods you create in that namespace) | 19:45 |
clarkb | ya I was ignoring the "native" case | 19:45 |
corvus | it seems like at first blush, the port-forward will probably fit in easier with the existing code | 19:45 |
corvus | (the "tail -f case" is actually a bit more complicated, because it's really "tail -f whatever the current file for the current command is") | 19:46 |
tobiash | zuul-maint: this is a small fix that resolves occasional connection problems to zuul-web (which also can lead to missed events in case of github): https://review.opendev.org/705459 | 19:53 |
*** irclogbot_3 has quit IRC | 20:05 | |
*** irclogbot_2 has joined #zuul | 20:07 | |
ianw | clarkb: thinking about the loop variables -- "_item" would be a bad choice as again likely to conflict, maybe _zj_... actually satisfies everything (clearly don't use this, and also very unlikely to conflict) | 20:09 |
clarkb | the _ prefix is a problem according to the docs though | 20:09 |
clarkb | I'm worried they'll start enforcing that rule of first char is letter | 20:10 |
ianw | yeah, i wonder if that's really just a rule of thumb in the docs for a regular end-user, rather than people writing things to be used in a library situation | 20:11 |
*** mhu has quit IRC | 20:13 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: DNM: Debug container build https://review.opendev.org/704215 | 20:19 |
*** jamesmcarthur has joined #zuul | 20:20 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Use apt mirror infrastructure during zuul-quick-start https://review.opendev.org/649448 | 20:22 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: DNM: Debug container build https://review.opendev.org/704215 | 20:22 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Enable ansible cleanup https://review.opendev.org/636015 | 20:24 |
openstackgerrit | Tristan Cacqueray proposed zuul/nodepool master: Dockerfile: create a nodepool user with uid 10001 https://review.opendev.org/705497 | 20:29 |
*** jamesmcarthur has quit IRC | 20:38 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: upload-afs: rename to upload-afs-roots; add afs-upload-synchronize https://review.opendev.org/705368 | 20:42 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: Remove deprecated upload-afs role https://review.opendev.org/705373 | 20:42 |
openstackgerrit | Merged zuul/zuul master: Gerrit: poll for merged changes if no stream events https://review.opendev.org/705053 | 20:43 |
openstackgerrit | Merged zuul/zuul master: Cap cheroot to fix issues with concurrent requests https://review.opendev.org/705459 | 20:54 |
*** Goneri has quit IRC | 21:16 | |
fungi | ofosos: let me know how your connectivity explorations go (once you've sufficiently recovered from fosdem). i'm still en route home but should be around more usual hours on wednesday | 21:18 |
openstackgerrit | Clark Boylan proposed zuul/zuul-jobs master: Use unique loop vars to avoid conflicts https://review.opendev.org/705337 | 21:35 |
openstackgerrit | Merged zuul/zuul master: Add google-cloud-storage to executor ansible https://review.opendev.org/705279 | 21:38 |
corvus | tristanC, mordred: can you +3 https://review.opendev.org/705313 soon? it's trivial and having it in for the next restart would be helpful | 21:46 |
tristanC | corvus: done | 21:50 |
corvus | tobiash: +2 on scale-out-scheduler spec :) | 21:51 |
tobiash | wohoo | 21:51 |
corvus | super cool. when do we start? :) | 21:52 |
fungi | i think that means we just have started, right? | 21:53 |
*** jamesmcarthur has joined #zuul | 21:53 | |
corvus | or we started a year ago | 21:54 |
tobiash | ha scheduler is now one of our high prio topics. bolg is starting to work full time on this atm | 21:58 |
corvus | huzzah! | 21:59 |
tobiash | corvus: maybe tomorrow we could discuss a way forward for the cyclic deps as well? | 22:00 |
corvus | tobiash: ah yes, i'll put that on my list :) | 22:00 |
tobiash | thanks :) | 22:01 |
*** mattw4 has quit IRC | 22:09 | |
*** mattw4 has joined #zuul | 22:09 | |
pabelanger | interested in both HA scheduler and circular depends too | 22:24 |
*** saneax has quit IRC | 22:25 | |
*** mattw4 has quit IRC | 22:29 | |
*** mattw4 has joined #zuul | 22:29 | |
*** jamesmcarthur has quit IRC | 22:32 | |
*** rfolco has quit IRC | 22:36 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-jobs master: use-buildset-registry: disable docker userland proxy https://review.opendev.org/702753 | 22:36 |
clarkb | going over the operator stack again I've +2'd basically everything but https://review.opendev.org/#/c/702106/22 because tobiash left some really good points there and i'm not sure if we need to address those before merging or not | 22:43 |
openstackgerrit | Merged zuul/zuul master: Fix github app authentication to work with checks API endpoints https://review.opendev.org/705167 | 22:46 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: Add tenant reconfiguration when main.yaml changed https://review.opendev.org/703631 | 22:46 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: Add CONTRIBUTE file https://review.opendev.org/705535 | 22:46 |
tristanC | clarkb: i'm happy to fix tobiash's comments in the review or as a follow-up, but those are also good learning opportunities | 22:49 |
tobiash | I'm fine with followups, they are also easier to review | 22:51 |
tristanC | the operator stack also depends-on https://review.opendev.org/#/c/702753/ which may need a toggle before landing in zuul-jobs | 22:55 |
clarkb | tristanC: I think we should set that config elsewhere since it isn't related to the registry right? It is zuul operator on docker specific (with gearman) | 22:57 |
clarkb | tristanC: I believe the docker config manipulation there will honor other existing configs (it does a merge) | 22:57 |
clarkb | (maybe in install-docker if it makes sense to have that as a global setting) | 22:58 |
openstackgerrit | Merged zuul/zuul master: Add some debug lines for provides/requires https://review.opendev.org/705313 | 23:00 |
*** jamesmcarthur has joined #zuul | 23:11 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-jobs master: DNM: install-docker: enable setting docker userland proxy https://review.opendev.org/702753 | 23:16 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: Add OpenShift SCC and functional test https://review.opendev.org/702758 | 23:16 |
tristanC | clarkb: ok, testing the new install-docker approach with 702758 | 23:17 |
*** jamesmcarthur has quit IRC | 23:17 | |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Add upload-logs-gcs role https://review.opendev.org/703711 | 23:20 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Add index_links option to zuul manifest https://review.opendev.org/705580 | 23:20 |
*** hashar has quit IRC | 23:26 | |
*** plaurin has quit IRC | 23:37 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: web: link to index.html if index_links is set https://review.opendev.org/705585 | 23:45 |
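
The port-forward idea tobiash describes at 19:33 (run "kubectl port-forward pod/mypod :19885", parse the output to get the local port, then stream from localhost) could be sketched roughly as below. This is only an illustration, not Zuul code: the function name `parse_forwarded_port` is hypothetical, and the sketch assumes kubectl announces the forward with a line of the form "Forwarding from 127.0.0.1:<local port> -> <remote port>".

```python
import re

# Assumed shape of the line kubectl prints once the forward is up, e.g.
#   Forwarding from 127.0.0.1:41234 -> 19885
#   Forwarding from [::1]:41234 -> 19885
_FORWARD_RE = re.compile(
    r"^Forwarding from (?:127\.0\.0\.1|\[::1\]):(\d+) -> (\d+)$"
)


def parse_forwarded_port(line, remote_port=19885):
    """Return the local port kubectl picked, or None if the line does not match.

    Hypothetical helper: `remote_port` defaults to 19885, the log streaming
    port mentioned in the discussion.
    """
    match = _FORWARD_RE.match(line.strip())
    if match and int(match.group(2)) == remote_port:
        return int(match.group(1))
    return None
```

A streamer built this way would read kubectl's output line by line until this helper returns a port, then open an ordinary TCP connection to localhost on that port, which is what would let the existing TCP streamer be reused.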