*** jamesmcarthur has joined #zuul | 00:17 | |
*** jamesmcarthur has quit IRC | 01:33 | |
*** jamesmcarthur has joined #zuul | 02:14 | |
*** jamesmcarthur has quit IRC | 02:37 | |
*** jamesmcarthur has joined #zuul | 03:04 | |
*** jamesmcarthur has quit IRC | 03:33 | |
*** bhavikdbavishi has joined #zuul | 03:44 | |
*** threestrands has joined #zuul | 04:37 | |
*** threestrands has quit IRC | 04:37 | |
*** threestrands has joined #zuul | 04:37 | |
*** raukadah is now known as chkumar|rover | 04:37 | |
*** dkehn has quit IRC | 05:04 | |
*** AJaeger has quit IRC | 05:45 | |
*** AJaeger has joined #zuul | 05:48 | |
*** aluria has joined #zuul | 05:56 | |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Fix weak dependencies to work with child_jobs https://review.opendev.org/677936 | 06:39 |
---|---|---|
*** themroc has joined #zuul | 06:43 | |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Fix weak dependencies to work with child_jobs https://review.opendev.org/677936 | 06:50 |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Fix weak dependencies to work with child_jobs https://review.opendev.org/677936 | 07:10 |
*** jpena|off is now known as jpena | 07:11 | |
*** sshnaidm|afk is now known as sshnaidm | 07:16 | |
*** threestrands has quit IRC | 07:24 | |
*** hashar has joined #zuul | 07:45 | |
*** jangutter has joined #zuul | 08:06 | |
*** themroc has quit IRC | 08:15 | |
*** themroc has joined #zuul | 08:16 | |
*** bhavikdbavishi has quit IRC | 09:45 | |
*** saneax has joined #zuul | 09:59 | |
*** sanjayu_ has joined #zuul | 10:43 | |
*** saneax has quit IRC | 10:43 | |
*** badboy has joined #zuul | 10:55 | |
badboy | hi guys | 10:56 |
badboy | quick question, is it possible to set Zuul up to be triggered by abandon event? | 10:56 |
*** badboy has quit IRC | 11:03 | |
*** gtema_ has joined #zuul | 11:20 | |
*** jpena is now known as jpena|lunch | 11:25 | |
*** hashar has quit IRC | 11:38 | |
*** badboy has joined #zuul | 11:44 | |
*** gtema_ has quit IRC | 11:46 | |
tristanC | badboy: it seems like all event types ( https://www.gerritcodereview.com/cmd-stream-events.html#events ) are available for trigger, e.g. you could try change-abandoned | 11:51 |
badboy | tristanC: thank you, will try that! | 11:54 |
*** rlandy has joined #zuul | 11:59 | |
*** rlandy is now known as rlandy|ruck | 11:59 | |
*** badboy has quit IRC | 12:01 | |
*** weshay_MOD is now known as weshay | 12:07 | |
*** jpena|lunch is now known as jpena | 12:30 | |
*** mgoddard has quit IRC | 12:47 | |
*** jamesmcarthur has joined #zuul | 12:47 | |
*** mgoddard has joined #zuul | 12:47 | |
fungi | that ought to work. we run (or at least used to, i haven't checked) jobs on change-restored events which are the inverse of change-abandoned | 12:53 |
fungi | gerrit considers abandoned changes to be closed though, so if you're trying to report to one it will only allow commenting and not voting | 12:54 |
pabelanger | morning | 13:31 |
pabelanger | we are seeing a newish error with swift upload logs | 13:31 |
pabelanger | http://paste.openstack.org/raw/764544/ | 13:31 |
pabelanger | module 'keystoneauth1.exceptions.http' has no attribute 'HTTPError' | 13:32 |
pabelanger | unsure if related to vexxhost or python lib | 13:32 |
pabelanger | mnaser:^ in case you want to look | 13:32 |
mnaser | seems like a 504 from our side that made that happen | 13:33 |
mnaser | we saw some issues earlier, mainly because of the sheer amount of objects being uploaded all at once (see discussion on zuul-discuss) | 13:33 |
mnaser | something like ~60-65% of all objects uploaded in a day (2.5m in opendev case) are ara-report files | 13:33 |
pabelanger | okay, so the thought is opendev might be impacting swift? | 13:34 |
fungi | we experienced similar issues with ara's raw files format initially as well, basically we ran out of inodes on the filesystem where we were trying to store them | 13:36 |
pabelanger | that might explain why we've seen an increase in POST_FAILURES for zuul.a.c recently | 13:36 |
fungi | as pointed out on the ml thread, it's not actually the identical static files (icons, scripts, stylesheets) which account for the bulk of those files, it's the data. think of it as a database format where every row for every table is in its own file | 13:37 |
*** jeliu_ has joined #zuul | 13:42 | |
AJaeger | pabelanger: I saw a patch merged for this, let me find it... | 13:43 |
AJaeger | pabelanger: I4afe8c9fc8239a31d62a2a1d09794211b5066472 | 13:43 |
*** jamesmcarthur has quit IRC | 13:44 | |
*** hashar has joined #zuul | 13:45 | |
*** jamesmcarthur has joined #zuul | 13:47 | |
pabelanger | odd, we should be running this now | 13:48 |
AJaeger | pabelanger: so, is running it the problem? Meaning: Do you have another keystoneauth version running than we do? | 13:49 |
openstackgerrit | Jens Harbott (frickler) proposed zuul/zuul-jobs master: Fix handling of dangling symlink on manifest generation https://review.opendev.org/678552 | 13:52 |
*** jamesmcarthur has quit IRC | 13:52 | |
pabelanger | AJaeger: let me check which version I have installed | 13:52 |
openstackgerrit | Dmitriy Rabotyagov (noonedeadpunk) proposed zuul/zuul-jobs master: Check path for existance in generate_manisfest.py https://review.opendev.org/678553 | 13:52 |
pabelanger | keystoneauth1==3.16.0 | 13:53 |
AJaeger | pabelanger: I don't know what OpenDev uses... | 13:55 |
pabelanger | okay, thanks. It sounds like, we might be exposing this issue, only if upload to vexxhost times out | 13:55 |
AJaeger | 3.17 just came out - a pull shows both 3.16 and 3.17 | 13:55 |
pabelanger | which, seems to happen more, now that opendev is also uploading there | 13:56 |
pabelanger | due to volume of ara bits | 13:56 |
*** mgoddard has quit IRC | 13:56 | |
openstackgerrit | Jens Harbott (frickler) proposed zuul/zuul-jobs master: Fix handling of dangling symlink on manifest generation https://review.opendev.org/678552 | 13:57 |
*** mgoddard has joined #zuul | 13:58 | |
AJaeger | zuul-maint, could you review this, please? ^ | 13:59 |
fungi | on ze01, pip list reports keystoneauth1 (3.17.0) | 14:00 |
fungi | however, the executor daemon on it last restarted 17 days ago | 14:00 |
fungi | so if it's importing keystoneauth at start, then it hasn't been using anything newer than that date | 14:01 |
AJaeger | and the diff between 3.16 and 3.17 shows nothing related to this AFAIU | 14:02 |
openstackgerrit | Merged zuul/zuul-jobs master: Fix handling of dangling symlink on manifest generation https://review.opendev.org/678552 | 14:14 |
fungi | well, 3.16 is also newer than the last executor restart | 14:15 |
fungi | i have a feeling it's running on 3.15.0 comparing release dates for keystoneauth1 with process start times | 14:16 |
mordred | fungi, AJaeger: keystoneauth1.exceptions.http.HttpError is the correct exception name | 14:17 |
clarkb | fungi: because ansible is the result of new forks it should use current sdk install | 14:17 |
mordred | so I think if we're trying to catch HTTPError that's a bug | 14:17 |
mordred | there is also an HTTPClientError | 14:18 |
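A note on why this only surfaced when uploads failed: Python evaluates the expression in an `except` clause only when an exception actually reaches it, so a misspelled exception name stays hidden until the first real error. The sketch below is illustrative only. It uses a hypothetical stand-in namespace (`fake_http`) in place of `keystoneauth1.exceptions.http` so that it runs without the library; only the `HttpError` and `HTTPError` names come from the discussion above, and the `upload` function is invented for the example.

```python
import types


class HttpError(Exception):
    """Stand-in for keystoneauth1.exceptions.http.HttpError."""


# Hypothetical stand-in for the real module. It only defines HttpError,
# mirroring the report that no attribute named HTTPError exists.
fake_http = types.SimpleNamespace(HttpError=HttpError)


def upload(fail, use_wrong_name):
    """Pretend to upload; optionally fail and catch with the wrong name."""
    try:
        if fail:
            raise HttpError("504 Gateway Timeout")
        return "uploaded"
    # This expression is evaluated only once an exception is in flight.
    except (fake_http.HTTPError if use_wrong_name else fake_http.HttpError):
        return "retried"
```

With `fail=False` the wrong name is never looked up and the call succeeds, which would match the observation that the error only appeared when an upload timed out.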
fungi | clarkb: ahh, this is being called by ansible, not imported by zuul? in that case i concur, it'll be 3.17.0 | 14:18 |
fungi | er, well, possibly not | 14:19 |
fungi | that's the system context version | 14:19 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: Add tests for manifest generation for missing files https://review.opendev.org/678573 | 14:19 |
fungi | ansible will use the one in the corresponding venv for its ansible version, right? | 14:19 |
mnaser | ^ this is not really that clean.. | 14:19 |
mnaser | i think its ok enough, but if someone wants to polish it, please feel free to, i gotta dig into other things :< | 14:20 |
*** jamesmcarthur has joined #zuul | 14:21 | |
*** michael-beaver has joined #zuul | 14:21 | |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Update keystoneauth exception name https://review.opendev.org/678575 | 14:21 |
mordred | fungi, clarkb, corvus ^^ | 14:21 |
fungi | yeah, these ansible venvs have keystoneauth1 versions contemporary to their initial creation, so 3.13.1 for all the ansible venvs except the ansible 2.8 venv which has 3.14.0 | 14:21 |
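Since each Ansible venv carries its own copy of the library, the version has to be queried with that venv's interpreter, not the system one. A minimal sketch of such a check, assuming Python 3.8 or newer for `importlib.metadata`; the helper name `installed_version` is made up for this example.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Run this with the interpreter of the venv you want to inspect.
    print("keystoneauth1:", installed_version("keystoneauth1"))
```

Running the same script with the system Python and with a venv's Python would show whether the two environments have drifted apart.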
corvus | that's untestable code that we just added, so it's not surprising it didn't work the first time out | 14:24 |
*** jamesmcarthur has quit IRC | 14:28 | |
corvus | mnaser: test looks good -- left a nit and a question about which behavior we want | 14:29 |
openstackgerrit | Merged zuul/zuul-jobs master: Update keystoneauth exception name https://review.opendev.org/678575 | 14:38 |
*** gtema_ has joined #zuul | 14:41 | |
*** gtema_ has quit IRC | 14:44 | |
*** sanjayu_ has quit IRC | 15:11 | |
clarkb | corvus: I wrote a bunch of zuul changes on friday. With exception of this first one https://review.opendev.org/678049 the others https://review.opendev.org/#/c/678286/ https://review.opendev.org/#/c/678312/ should be straightforward docs updates if you have a moment | 15:14 |
clarkb | corvus: the docs updates were things that noorul ran into that we helped noorul work through so are real issues people are having | 15:14 |
corvus | fungi, clarkb: mordred and i are talking to the folks working on the gerrit checks plugin tomorrow about closing some of the gaps so zuul could maybe use it | 15:15 |
corvus | clarkb: thanks i'll take a look | 15:16 |
fungi | as was https://review.opendev.org/678243 but looks like i have a docs build error | 15:18 |
fungi | will update in a moment | 15:18 |
*** jamesmcarthur has joined #zuul | 15:18 | |
openstackgerrit | Jeff Liu proposed zuul/zuul-operator master: Add PerconaXDB Cluster to Zuul-Operator https://review.opendev.org/677315 | 15:19 |
corvus | fungi: a new user wanted to use the logging config? | 15:20 |
*** dkehn has joined #zuul | 15:20 | |
fungi | corvus: yes, to turn on debugging | 15:20 |
corvus | i thought that was a flag? | 15:20 |
*** jamesmcarthur has quit IRC | 15:22 | |
*** jamesmcarthur has joined #zuul | 15:22 | |
fungi | hrm, if so i didn't find it when searching the zuul docs | 15:22 |
corvus | hrm, apparently that only happens if it runs in the foreground | 15:22 |
fungi | nor did i find anything at all related to configuring service logging | 15:23 |
corvus | honestly, i'd rather we finally fix that rather than encourage folks to add logging configs | 15:23 |
fungi | ahh, okay | 15:23 |
fungi | so proxy the logging module configuration via one of the existing service config files? | 15:23 |
corvus | https://review.opendev.org/635649 | 15:24 |
corvus | fungi: ^ | 15:24 |
fungi | oh, neat | 15:24 |
fungi | so when zuul is containerized, users don't expect it to create log files? | 15:25 |
fungi | also, for the record, noorul was following the zfs instructions not the quickstart, since it was for something related to the bitbucket driver | 15:26 |
corvus | fungi: that is true for many users of containers | 15:26 |
fungi | so i don't know if the container stuff is relevant to the zfs instructions | 15:26 |
fungi | but maybe manually starting the service with logging to foreground still would be | 15:26 |
corvus | fungi: yeah, but that means that noorul would merely have to add "-d" to the invocation rather than learn the python logging file format | 15:26 |
corvus | fungi: that change severs the two | 15:26 |
corvus | fungi: all 4 cases in the matrix are supported :) | 15:27 |
fungi | let me figure out where i left my bottle of red pills | 15:28 |
corvus | hrm | 15:28 |
corvus | wait, maybe that only handles three? | 15:28 |
clarkb | worth noting that journald also works like docker in this case | 15:29 |
corvus | fungi, clarkb, tobiash: ^ check my comment on that. it's late and i'm making small mistakes now | 15:29 |
clarkb | they both grab all the stdout/stderr and record them | 15:29 |
* corvus eods | 15:31 | |
fungi | thanks corvus! enjoy the cool se evening weather | 15:31 |
clarkb | corvus: I think you generally don't want to write to stdout in the daemon case because daemonization closes the fds | 15:32 |
openstackgerrit | Monty Taylor proposed zuul/zuul master: Apply changes to command module from ansible 2.6 https://review.opendev.org/678594 | 15:33 |
fungi | yes, closing inherited fds is necessary to be able to fully disassociate from the calling process | 15:34 |
mordred | pabelanger: ^^ you might want to look at that one | 15:34 |
fungi | obviously for container and systemd use cases you may choose not to daemonize | 15:34 |
mordred | pabelanger: flaper87 noticed that we're missing a new parameter added in ansible 2.6 | 15:34 |
fungi | oh? | 15:35 |
clarkb | fungi: ya in those two cases you are not supposed to daemonize and then if you write to stdout they collect those as logs | 15:35 |
* flaper87 +1'd | 15:35 | |
fungi | ahh, 678594 | 15:35 |
*** noorul has joined #zuul | 15:36 | |
clarkb | mordred: does that need to be version specific so that < 2.6 don't get weird errors? | 15:37 |
clarkb | oh I guess we convert argv to args so that may actually just backport the support to 2.5. May still cause problems if people test their 2.5 ansible with zuul and it works then deploy and it fails | 15:38 |
*** mattw4 has joined #zuul | 15:38 | |
*** mattw4 has quit IRC | 15:39 | |
*** mattw4 has joined #zuul | 15:39 | |
*** mattw4 has quit IRC | 15:42 | |
*** noorul has quit IRC | 15:42 | |
*** mattw4 has joined #zuul | 15:42 | |
openstackgerrit | Merged zuul/nodepool master: openstack: handle safely invalid network name https://review.opendev.org/677501 | 15:47 |
*** mattw4 has quit IRC | 15:52 | |
*** stewie925 has joined #zuul | 15:58 | |
*** noorul has joined #zuul | 16:00 | |
*** jpena is now known as jpena|off | 16:02 | |
*** noorul has quit IRC | 16:07 | |
openstackgerrit | Jeff Liu proposed zuul/zuul-operator master: Add PerconaXDB Cluster to Zuul-Operator https://review.opendev.org/677315 | 16:07 |
*** noorul has joined #zuul | 16:11 | |
noorul | hi | 16:14 |
noorul | Does merger care about all the branches in the repo? | 16:14 |
openstackgerrit | Merged zuul/zuul master: Document js tool installation in scratch doc https://review.opendev.org/678286 | 16:14 |
fungi | noorul: yes, because a change could be proposed for any branch of a repository | 16:15 |
fungi | it will only calculate merges for the target branches of changes in the set it's considering, but i believe it prepares all branches from the configured remote | 16:16 |
fungi | local copies of all branches that is | 16:16 |
clarkb | yes it does that to load configs for all branches | 16:16 |
fungi | ahh, right, that too, for the cat jobs | 16:17 |
noorul | I see this exception http://paste.openstack.org/show/764710/ in the log. But it is not marking the build as MERGE_FAILURE | 16:20 |
fungi | if things were working correctly, the scheduler should then report that failure | 16:23 |
fungi | have you checked the scheduler log around that timestamp? | 16:23 |
noorul | It says merge failure, http://paste.openstack.org/show/764712/ | 16:28 |
fungi | does the change ahead of it which it's conflicting with still have builds in progress? | 16:31 |
noorul | Yes | 16:31 |
fungi | because zuul won't report until it decides for sure it won't be able to merge. if the change ahead of it fails a build and gets kicked out of the queue, then this change will no longer conflict and will be tested normally | 16:31 |
noorul | fungi: I see | 16:32 |
fungi | if the change ahead of it (with which it's conflicting) succeeds all its builds and merges, then this change will be kicked out because it can no longer be merged to the branch, and then the scheduler should report a merge failure on it | 16:32 |
noorul | fungi: Is the scheduler responsible for updating build status as MERGE_FAILURE? | 16:33 |
noorul | fungi: I see that in this scenario using Gerrit, the build status is set as MERGE_FAILURE | 16:33 |
noorul | but not in the case of stash. I think that stash driver has nothing to do here. | 16:34 |
fungi | yes, it will happen as part of the buildset reporting process. it will request an updated merge from the mergers if the change ahead of it is ejected | 16:34 |
openstackgerrit | Merged zuul/zuul master: Set git user config in from scratch document https://review.opendev.org/678312 | 16:34 |
fungi | otherwise it will report the merge failure | 16:34 |
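The three outcomes fungi describes can be summarised as a small decision table. This is a simplified model of the explanation above, not Zuul's scheduler code; the state and action names are invented for the sketch.

```python
def action_for_conflicting_change(ahead_state):
    """Map the state of the change ahead to what happens to the change behind.

    ahead_state is one of "in_progress", "failed" or "merged"
    (names invented for this sketch).
    """
    if ahead_state == "in_progress":
        # Nothing is reported yet: the conflict may still go away.
        return "wait"
    if ahead_state == "failed":
        # The change ahead is ejected, so a fresh merge is requested
        # and the change behind is tested normally.
        return "remerge_and_test"
    if ahead_state == "merged":
        # The conflict is now permanent for this branch.
        return "report_merge_failure"
    raise ValueError(f"unknown state: {ahead_state!r}")
```

In this model, a change stuck behind a conflicting change with builds still running stays in the "wait" state, which would look like a missing MERGE_FAILURE report.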
*** igordc has joined #zuul | 16:37 | |
*** jamesmcarthur has quit IRC | 16:40 | |
corvus | clarkb: my intent wasn't to write to stdout in the debug case, but rather to change the debug level of the default file output handler which is configured if no logger config is specified. i only intended to suggest that the setDebug() method be called if the debug arg is present regardless of which output handler is used | 16:48 |
clarkb | gotcha | 16:48 |
corvus | i don't know what i actually wrote on the review because i'm tired, but that's what i meant :) | 16:49 |
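The idea of keeping the debug flag independent of the output destination can be sketched with the standard `logging` module. This is not the code from the review under discussion; the function name, logger name and default log file are assumptions made for the example.

```python
import logging
import sys


def configure_logging(foreground, debug, logfile="service-sketch.log"):
    """Pick an output handler and a level as two independent choices."""
    if foreground:
        # Container or systemd case: write to stdout and let the
        # supervisor collect it.
        handler = logging.StreamHandler(sys.stdout)
    else:
        # Daemon case: stdout is closed, so log to a file.
        # delay=True postpones opening the file until the first record.
        handler = logging.FileHandler(logfile, delay=True)
    logger = logging.getLogger("service-sketch")
    logger.handlers = [handler]
    # The debug flag only changes the level, whichever handler is in use.
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    return logger
```

All four combinations of `foreground` and `debug` are valid here, which is the matrix mentioned earlier in the conversation.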
*** hashar has quit IRC | 16:49 | |
noorul | corvus, clarkb: Is it possible for you to take a look at stash PR to figure out why I am not able to get the merge failure from Zuul? | 16:53 |
*** hashar has joined #zuul | 16:57 | |
openstackgerrit | Dmitriy Rabotyagov (noonedeadpunk) proposed zuul/zuul-jobs master: Fix handling of dangling symlink https://review.opendev.org/678619 | 16:58 |
openstackgerrit | Merged zuul/zuul master: bindep: add unzip and bzip2 for rpm platform https://review.opendev.org/678433 | 16:58 |
*** bhavikdbavishi has joined #zuul | 17:01 | |
*** jamesmcarthur has joined #zuul | 17:04 | |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: Add tests for manifest generation for missing files https://review.opendev.org/678573 | 17:05 |
noorul | Sometimes both the PR tests run in parallel | 17:09 |
noorul | https://imgur.com/a/0KBjOXA | 17:09 |
mordred | clarkb: re: the patch for adding the argv to command module - https://review.opendev.org/#/c/650431/7 pabelanger has a patch up to remove 2.5 support anyway (since it's EOL) | 17:09 |
mordred | clarkb: so I was kind of not worrying about it too much | 17:09 |
clarkb | ya I'm on the side of the fence that just because ansible eols aggressively doesn't mean we have to. There is typically little reason to upgrade ansible as a user and comes with quite a bit of headache to do so | 17:11 |
clarkb | I think if ansible was backward compatible and upgrades just worked I'd care less | 17:11 |
clarkb | I expect zuul users don't want to do large amounts of job churn frequently either | 17:12 |
mordred | yeah | 17:14 |
mordred | clarkb: we could also copy the current command module to the 2.5 directory replacing the symlink | 17:15 |
mordred | which shouldn't be too hard | 17:15 |
*** hashar has quit IRC | 17:18 | |
openstackgerrit | Monty Taylor proposed zuul/zuul master: Apply changes to command module from ansible 2.6 https://review.opendev.org/678594 | 17:18 |
mordred | clarkb: there's doing that | 17:18 |
*** armstrongs has joined #zuul | 17:19 | |
*** mattw4 has joined #zuul | 17:19 | |
*** noorul has quit IRC | 17:20 | |
*** chkumar|rover is now known as raukadah | 17:21 | |
mugsie | Shrews: re - https://review.opendev.org/#/c/554432/ - that is working for me locally :/ | 17:27 |
mugsie | and the unit test seems to be loading and running fine as well | 17:27 |
Shrews | mugsie: I’m surprised since I didn’t see a config definition for ‘driver’, but I only did a quick pass before an appointment | 17:28 |
armstrongs | Quick question I noticed when I run a shell command from a vm in nodepool it streams the stdout but when you run it on a container using the kubernetes driver it doesn't stream the stdout and you have to view it in the output in the json or ara report. Is there any way to get the stdout to behave the same on containers as vms? | 17:28 |
*** jamesmcarthur has quit IRC | 17:29 | |
mugsie | I basically copied AWS's config layout, but I can update to add in the extra stuff you pointed out tomorrow at some point | 17:29 |
clarkb | armstrongs: the way console streaming works we run a little daemon on the test node to collect and stream that data. As is I don't think people want to pollute containers with that by default, but that should all just be a base job config thing iirc | 17:32 |
clarkb | armstrongs: there is long term ongoing work to have ansible do what zuul does for streaming out of the box and once that happens it shouldn't matter what platform is used | 17:32 |
armstrongs | clarkb: could you point me towards what I need to put in the base job if you have an example. Also thanks for the info 😊 | 17:35 |
clarkb | armstrongs: zuul/zuul-jobs/roles/start-zuul-console | 17:36 |
armstrongs | Ah awesome thanks again | 17:37 |
clarkb | we have that in our base job pre run playbook | 17:37 |
fungi | armstrongs: https://opendev.org/opendev/base-jobs/src/branch/master/playbooks/base/pre.yaml#L18 | 17:37 |
armstrongs | Cool will give it a go | 17:38 |
tristanC | armstrongs: zuul-console may not work in kubernetes as it requires a tcp access to the pod netns from the zuul-executor. iirc kubectl connection doesn't have access to the pod netns and rely on the exec api of kubernetes | 17:40 |
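Whether the executor can open a TCP connection to the console daemon is easy to probe before debugging anything else. A minimal sketch; the function name is invented, and the host and port are placeholders to be filled in with whatever address and port the daemon is expected to listen on in a given deployment.

```python
import socket


def console_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

If this returns False from the executor host for a pod, live streaming cannot work for that pod no matter how the daemon is started inside it.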
openstackgerrit | Dmitriy Rabotyagov (noonedeadpunk) proposed zuul/zuul-jobs master: Fix handling of dangling symlink https://review.opendev.org/678619 | 17:45 |
openstackgerrit | Dmitriy Rabotyagov (noonedeadpunk) proposed zuul/zuul-jobs master: Add tests for manifest generation for missing files https://review.opendev.org/678573 | 17:46 |
*** jamesmcarthur has joined #zuul | 17:49 | |
*** jamesmcarthur has quit IRC | 17:58 | |
clarkb | any other zuulian want to review https://review.opendev.org/#/c/676717/ I think that will improve memory overhead for running zuul jobs which will help opendev | 18:02 |
*** armstrongs has quit IRC | 18:08 | |
*** jamesmcarthur has joined #zuul | 18:12 | |
*** bhavikdbavishi has quit IRC | 18:32 | |
*** noorul has joined #zuul | 18:40 | |
SpamapS | clarkb:I wonder if we could make the zuul_console daemon a sidecar that is automatically added to every pod. | 18:51 |
*** noorul has quit IRC | 18:52 | |
SpamapS | Wouldn't be too hard.. emptyDir shares the socket, envvar added to the main pod with the ID. | 18:52 |
clarkb | or we can push ansible to add support in tree | 18:53 |
clarkb | which I believe they want anyway | 18:53 |
clarkb | then it will work for ansible always hopefully | 18:53 |
SpamapS | tristanC: that TCP access that the executor needs doesn't need anything other than a port to contact. Why would it need access in the netns? | 18:53 |
clarkb | I'll add this to my list of things to bring up at ansiblefest dev day | 18:54 |
SpamapS | Just grab the IP of the container and the sidecar should have the usual port defined. | 18:54 |
clarkb | the other item is the exec per task | 18:54 |
SpamapS | clarkb: That would be nice. :) | 18:54 |
*** noorul has joined #zuul | 18:57 | |
openstackgerrit | Dmitriy Rabotyagov (noonedeadpunk) proposed zuul/zuul-jobs master: Fix handling of dangling symlink https://review.opendev.org/678619 | 19:01 |
*** noorul has quit IRC | 19:01 | |
SpamapS | Ugh.. https://opendev.org/zuul/nodepool/commit/da2701e0b19cbe75cdbd79cfeafaf7c643546fc7 broke us btw. I understand, using the Dockerfile in the repo may not be something that is part of releases. Just, FYI.. that broke us. We're having to redo a bunch of stuff to be able to deploy Nodepool 3.8.0. :-P | 19:04 |
clarkb | SpamapS: it broke because the uid you had been using wasn't 10001? | 19:06 |
SpamapS | clarkb:correct. | 19:06 |
SpamapS | Permissions problems.. have to rework our Kubernetes pod specs | 19:06 |
clarkb | I think we saw that as forward compatible because you can specify whatever you want it to be, but ya if you don't specify you get the default | 19:07 |
*** noorul has joined #zuul | 19:07 | |
SpamapS | not a huge deal. I just want to raise that this was yet another thing that changed under us. I am not upset, or anything. I just .. it happens a lot.. I feel like we're doing the wrong thing or something. | 19:08 |
*** armstrongs has joined #zuul | 19:09 | |
openstackgerrit | Dmitriy Rabotyagov (noonedeadpunk) proposed zuul/zuul-jobs master: Add tests for manifest generation for missing files https://review.opendev.org/678573 | 19:11 |
*** noorul has quit IRC | 19:12 | |
tristanC | SpamapS: i meant access to the tcp port of the zuul-console daemon, which is not exposed by default iiuc | 19:14 |
SpamapS | tristanC:right but we can expose it in the k8s driver by running it as a sidecar and sharing the socket in. That way your pod image is clean, you don't even need python. :) | 19:14 |
SpamapS | well.. n/m.. you do.. because ansible | 19:15 |
tristanC | SpamapS: i'm not sure how that will work, the kubectl connection doesn't have an ip address, thus the zuul_stream wouldn't be able to contact the right ingress entrypoint. and even so, how would you map the default port to the zuul-console netns? | 19:17 |
*** noorul has joined #zuul | 19:17 | |
openstackgerrit | Dmitriy Rabotyagov (noonedeadpunk) proposed zuul/zuul-jobs master: Fix handling of dangling symlink https://review.opendev.org/678619 | 19:19 |
tristanC | another solution to get the output in job-output.txt (but not live) would be to tweak zuul_stream and make it dump the result object when the connection is kubectl | 19:22 |
*** noorul has quit IRC | 19:22 | |
SpamapS | tristanC: We could make a nodeport for it. | 19:27 |
*** noorul has joined #zuul | 19:28 | |
openstackgerrit | Dmitriy Rabotyagov (noonedeadpunk) proposed zuul/zuul-jobs master: Fix handling of dangling symlink https://review.opendev.org/678619 | 19:28 |
Shrews | SpamapS: I, for one, wasn't really considering the nodepool Dockerfile a "production" piece of code, rather something that we used for our testing. While you obviously were. Perhaps that's the something that isn't quite right there and we need some more better testing around that. | 19:29 |
Shrews | In which case, this feels more like a "packaging testing" issue and feels out of place within nodepool repo itself. | 19:30 |
Shrews | But I'm often told that I'm a weird person | 19:30 |
clarkb | ya I think some of the pain there is SpamapS is running zuul and nodepool completely disjoint from many of us. github not gerrit, aws/k8s with nodepool, kubernetes to host the services, etc. Different groups of us use pieces of that collection but there isn't full overlap iirc | 19:31 |
openstackgerrit | Dmitriy Rabotyagov (noonedeadpunk) proposed zuul/zuul-jobs master: Add tests for manifest generation for missing files https://review.opendev.org/678573 | 19:31 |
clarkb | I think we can cover a lot of those gaps with better testing | 19:31 |
*** noorul has quit IRC | 19:32 | |
tristanC | SpamapS: isn't nodeport unique per node? it seems like only the first job will be able to spawn the zuul-console service | 19:34 |
tristanC | SpamapS: also, setting nodeport requires admin privilege on okd | 19:34 |
*** armstrongs has quit IRC | 19:35 | |
*** noorul has joined #zuul | 19:36 | |
Shrews | I'd love to solve the us-breaking-SpamapS issues. Some areas we can improve with additional testing (like the Dockerfile thing), others, like the AWS driver, I'm not sure we'd ever be able to do anything about. | 19:37 |
clarkb | Shrews: this is a bit of hand waving but openstack does/did have ec2 api layer | 19:40 |
clarkb | I have no idea how good a stand in for aws that is, but is a potential option | 19:40 |
fungi | seems it's still semi-active: https://opendev.org/openstack/ec2-api/commits/branch/master | 19:42 |
Shrews | yeah, i don't know anything about that either | 19:42 |
*** noorul has quit IRC | 19:43 | |
clarkb | I think logan- said they use it in some capacity, might have insight on applicability to this use case | 19:43 |
*** stewie925 has quit IRC | 19:44 | |
fungi | https://review.opendev.org/650397 was the last commit of substance to merge and was related to testing, but was reviewed fairly quickly | 19:45 |
fungi | ~4 months ago | 19:45 |
*** hashar has joined #zuul | 19:46 | |
fungi | and it's still considered an official team in openstack: https://governance.openstack.org/tc/reference/projects/ec2-api.html | 19:46 |
Shrews | i would expect issues with a driver test using a translation API rather than the actual API most users would be using | 19:47 |
Shrews | but also... something is better than nothing sometimes | 19:47 |
clarkb | Shrews: ya that could happen. It might however catch bugs in improper use of the api? | 19:47 |
*** noorul has joined #zuul | 19:48 | |
clarkb | probably won't know how useful it is until we try it and I'm not sure if the investment makes that worthwhile | 19:48 |
Shrews | I think users of that driver (with a much more vested interest in it) would have to do the investing. But I think that's only 1 person, atm | 19:49 |
*** noorul has quit IRC | 19:52 | |
*** jamesmcarthur has quit IRC | 19:53 | |
SpamapS | clarkb: I really hope to move to the kubernetes operator once it exists. That should help me align better. | 19:53 |
SpamapS | Sign me up as a beta tester. | 19:53 |
SpamapS | tristanC: You can't make a nodeport in a namespace you have control over? | 19:54 |
SpamapS | tristanC: we may be talking about doing this on a different level. I'm suggesting that we make a way for Zuul to tell Nodepool that it wants these things when it asks for a pod. | 19:54 |
SpamapS | like, in the nodepool request, when it asks for a label that is k8s based, it should be able to also tack on a little thing that tells it to run the sidecar and create a nodeport. | 19:55 |
SpamapS | So, not talking about doing it from the ansible. Do it in nodepool and zuul. | 19:56 |
*** mattw4 has quit IRC | 19:58 | |
openstackgerrit | Merged zuul/zuul-jobs master: Fix handling of dangling symlink https://review.opendev.org/678619 | 19:59 |
openstackgerrit | Jeff Liu proposed zuul/zuul-operator master: Add PerconaXDB Cluster to Zuul-Operator https://review.opendev.org/677315 | 20:00 |
tristanC | SpamapS: yes I understand it needs to be done by nodepool, but multiple zuul builds can run on one kubernetes host which may have only one public ip (nodeip) | 20:02 |
openstackgerrit | Dmitriy Rabotyagov (noonedeadpunk) proposed zuul/zuul-jobs master: Add tests for manifest generation for missing files https://review.opendev.org/678573 | 20:02 |
tristanC | SpamapS: thus the first build may get the correct nodeport, and we can somehow tell zuul that the console of the kubectl connection is the kubernetes nodeip, but that will only work for the first build | 20:03 |
tristanC | SpamapS: okd doesn't let regular users use nodeport as they should be using regular ingress routes, which are either based on http vhost or dynamic port mapping, which isn't supported by the zuul_stream module. | 20:04 |
SpamapS | tristanC:nodeports are unique ports. So you'd get back the node IP and a random port for zuul to use as the console port. | 20:05 |
SpamapS | ingresses are for http.. :-P | 20:05 |
SpamapS | We can make zuul_stream read an environment variable or something. | 20:05 |
SpamapS | Anyway, point is, I think it can be done just with k8s primitives and some plumbing between zuul and nodepool. But, I defer to the implementors. I will not be able to work on this myself. :_/ | 20:06 |
*** hashar has quit IRC | 20:08 | |
tristanC | SpamapS: oh right, if we can make the zuul_stream callback and the zuul_console use an arbitrary port, that would work | 20:11 |
tristanC | but i've not been able to change the zuul-console listening port without changing the ansible role vars (the python module that spawns the daemon doesn't have access to the environment or site vars) | 20:12 |
clarkb | it can scan /proc for that info | 20:14 |
clarkb | we did that at one point to find the pid iirc | 20:14 |
clarkb | though maybe that means we can't find the pid to find the port | 20:15 |
SpamapS | Yeah not claiming it's easy.. like clarkb said.. Ansible needs to put this in the core so we can simplify it. | 20:15 |
SpamapS | I bet if it's done right we can drop the zuul_console daemon and just multiplex the output from the python modules that get uploaded. | 20:16 |
* SpamapS kind of wishes he could just work on that for 3 months. | 20:16 | |
*** mattw4 has joined #zuul | 20:17 | |
*** mattw4 has quit IRC | 20:28 | |
*** mattw4 has joined #zuul | 20:39 | |
*** dolpher has joined #zuul | 21:01 | |
dolpher | How is the zuul user created in the nodepool image? Is there a dib element for doing that? | 21:03 |
clarkb | dolpher: https://opendev.org/openstack/project-config/src/branch/master/nodepool/elements/zuul-worker that is the element we use | 21:07 |
flaper87 | Is this a valid job definition? http://paste.openstack.org/show/765019/ (specifically line #5 where I'm using an ansible variable that is defined in a previous ansible play using zuul_return) | 21:07 |
flaper87 | the zuul_return is called in the `run` playbook. I should probably try running this task in the `pre-run` and then consuming the returned value from the `run` playbook | 21:13 |
clarkb | I don't think there are shared values between the phases | 21:14 |
clarkb | zuul_return will share the info between jobs though | 21:14 |
tristanC | flaper87: it depends, zuul_returns are passed between buildset dependent jobs | 21:14 |
SpamapS | I've used a file on disk to pass info between pre/normal/post | 21:15 |
flaper87 | oh, mmh, then that's not what I need :/ | 21:15 |
dolpher | clarkb: thanks, looks like nodepool-base depends on it, so can I use the nodepool-base element in a 3rd party CI env? or any other suggested/required elements to include? | 21:16 |
flaper87 | I need to generate an auth token from a trusted playbook and pass the token to the job so that the job can use it to authenticate to a service | 21:16 |
clarkb | dolpher: I think those elements are largely going to be geared towards opendev's images. In general you shouldn't need much more than an adduser and setting the ssh key (which you could even use something like cloud init to set at runtime too) | 21:18 |
clarkb | if those elements work for you then great, they tend not to change much, with dns fiddling being the most likely changes based on history iirc | 21:19 |
clarkb | flaper87: I believe that will make your auth token exposable | 21:19 |
dolpher | clarkb: got it, thanks! | 21:19 |
clarkb | flaper87: if you need to keep the token secure I think that the consumer of the token will also need to be in a trusted repo | 21:20 |
clarkb | flaper87: if you don't need to keep the token secure then you can likely do what spamaps does and write to disk and load from there on subsequent playbooks | 21:25 |
*** igordc has quit IRC | 21:26 | |
flaper87 | clarkb: SpamapS gotcha, thanks! I'll play with that although I think the answer is that I do have to keep this secure. | 21:26 |
*** mattw4 has quit IRC | 21:28 | |
*** jeliu_ has quit IRC | 21:28 | |
clarkb | flaper87: the problem with what you are planning, handing a secure set of data to an untrusted playbook, is that I could push a change that is tested pre review that cats that data to the console | 21:29 |
clarkb | (or whatever method I want to sneak it off of the test node) | 21:29 |
clarkb | if however it is fully in a trusted repo you can only do that post review | 21:29 |
clarkb | the idea being it won't get approved if it does something bad like that | 21:29 |
SpamapS | flaper87: The thing you said, generating a token, is exactly what I do. The pre has AWS access keys to generate an STS token. I write the token, which has a timeout of exactly the same as the job timeout, into ~/.aws/credentials. | 21:31 |
SpamapS | Actually I've been meaning to open source that role | 21:32 |
SpamapS | http://paste.openstack.org/show/765021/ <-- pretty simple actually | 21:34 |
*** mattw4 has joined #zuul | 21:36 | |
SpamapS | clarkb: Oh this reminds me! I've been thinking a really cool enhancement to zuul_stream would be that you could feed it a list of hashes of secrets from the trusted phase, and it would XXXXX any strings that match those hashes in the output stream. | 21:43 |
SpamapS | That wouldn't prevent malicious compromise (they can just encrypt it to a key), but it would prevent accidentally printing stuff. | 21:43 |
fungi | i guess it also would be tricky to find what substrings to check against the hashes | 21:51 |
fungi | since for any output stream of nontrivial length the possible substrings (even below a reasonably small max string length) would be nigh innumerable | 21:52 |
clarkb | you'd have to tokenize based on whitespace or some other rule? otherwise ya arbitrary length strings | 21:54 |
SpamapS | You could do some interesting optimization with run length. | 22:02 |
SpamapS | like if you know hash=abc123 and len=47, you can stop checking when there are only 46 chars left. You only have to check the output of things from the console.. it's not that much data. | 22:03 |
SpamapS | add in character classing (class==binary is check every byte, class==word means check non-whitespace, etc) | 22:04 |
SpamapS | Anyway, just a fun thought. | 22:04 |
clarkb | does specifying hash length compromise the one way ness of the hash ? (I think for short values it probably does?) | 22:04 |
* SpamapS goes back to using the hell out of zuul as it is. ;) | 22:04 | |
clarkb | so many rainbow tables | 22:05 |
SpamapS | clarkb: yeah, for short values RL would be a big help for rainbow tables and such. | 22:05 |
SpamapS | Use a fast hash algorithm with a strong one to back it up, like rsync uses crc32 before others, and you can just check every damn byte with minimal overhead. | 22:06 |
SpamapS | but.. yeah.. fantasyland is over | 22:06 |
fungi | not exposing the raw length of secrets is one of the reasons for using oaep in our encoding choice | 22:21 |
fungi | so we only leak a loosely quantized length | 22:21 |
fungi | knowing exactly (or even almost) how long a secret is can mean a significant reduction in work factor for brute-force guessing | 22:22 |
fungi | and even moreso for educated guessing | 22:23 |
SpamapS | Indeed. I think just a fast, collision-prone-but-secure hash, followed by a very slow one like sha512, would allow efficient filtering w/o the length. | 22:28 |
SpamapS | not sure what that faster hash is.. would be fun to play with a few and find the right balance. | 22:28 |
SpamapS | Hm, but if something wrapped a secret, you'd have to accept that as a potential oops-around. | 22:29 |
SpamapS | (The whole idea in my head assumes you'd print the secret on a single line) | 22:29 |
SpamapS | n/m on this | 22:29 |
SpamapS | fun mental exercise | 22:29 |
* SpamapS withdraws | 22:29 | |
fungi | oh, and also, pkcs1 is a kdf, not just a raw hash, for a reason. publishing a plain hash of the secret also significantly speeds attempts at guessing the value because a simple hash is much faster to compute than applying a typical kdf | 22:36 |
fungi | there's a massive market for special-purpose sha256 or sha512 hash generating processors | 22:37 |
fungi | those hashes aren't designed to be slow or computationally intensive to calculate | 22:37 |
SpamapS | I wonder if instead of focusing on hashing, the right thing is focusing on getting the secret into a secure place that can filter the output in real time. | 22:37 |
SpamapS | But.. this is again just an oops-preventor.. so. not worth spending much time. :) | 22:38 |
fungi | i like where you're going with the idea | 22:38 |
fungi | but yes, the execution is going to be fraught with pitfalls, risking compromising the strength of the mechanisms protecting secrets it's attempting to catch leaks of | 22:39 |
fungi | though i believe we do have some tests in zuul which encrypt a known secret, use it in some jobs, and then search the resulting logs for a leak of that secret | 22:40 |
fungi | granted that sort of testing can miss unlikely branches in execution or similar corner cases, so it's also not a panacea | 22:41 |
SpamapS | fungi: Oh hm, that technique would be an interesting way to test if a change is doing something naughty with a secret.... | 22:41 |
SpamapS | fungi: you could conceivably run the job w/ a known secret in the secret value, and do the same check. | 22:42 |
fungi | right | 22:42 |
SpamapS | Except it would probably fail when using said secret, which is usually the point of secrets. | 22:42 |
SpamapS | (like, I mean, a real job) | 22:42 |
fungi | yeah, would have to be a fairly abstract job without external interactions i suppose | 22:43 |
SpamapS | Yeah that's not really the thing I want to prevent. I want to prevent somebody from leaving their debug stuff in. ;) | 22:43 |
SpamapS | All we have in defense of that now is code review. | 22:43 |
fungi | and you would likely need to mock up possible failure scenarios in it, since the most common problem is "i tried to log into this remote service and failed, here are the credentials i used..." | 22:43 |
SpamapS | Yeah, no, the only thing I can think of is that you run a filter in front of the logs that knows the exact strings it's not supposed to print. | 22:44 |
SpamapS | There's a company out there, Netskope, that does this with things like Google Docs and Dropbox. You securely feed them all of your secrets, with 0 context, and they will scan all of your google docs and dropbox files for those specific strings. | 22:45 |
fungi | this is verging into the wonderful world of proxies (you love that planet, i know). i'm familiar with an entire industry built around scanning and blocking egress traffic which includes known trigger words/phrases | 22:45 |
SpamapS | But you have to trust them to have those secrets. | 22:45 |
fungi | yeah, the big enterprise solutions involve transparent proxies at your border which you trust (because you audited the source code in them yourself? doubtful, but anyway...) | 22:46 |
fungi | basically application-layer gateways serving the "data loss prevention" concerned industrial sectors | 22:48 |
fungi | you have to rub them down with snakeoil every morning | 22:49 |
*** mattw4 has quit IRC | 23:02 | |
clarkb | ianw when your day starts thoughts on the latest patchset for https://review.opendev.org/#/c/678049/ would be great | 23:24 |
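
The idea SpamapS lands on at 22:44, a filter in front of the logs that knows the exact strings it must not print, could be sketched roughly as below. This is only an illustration and not anything that exists in zuul or zuul_stream: the names build_masker, filter_stream and MASK are hypothetical, and the sketch assumes the filter runs somewhere trusted that is allowed to hold the plaintext secrets. As noted at 22:29, a secret wrapped across lines slips through, so this is an oops-preventer rather than a defense against a malicious change.

```python
import re
from typing import Callable, Iterable, Iterator

# Replacement text, mirroring the "XXXXX" mentioned in the discussion.
MASK = "XXXXX"


def build_masker(secrets: Iterable[str]) -> Callable[[str], str]:
    """Return a function that masks every known secret found in one line.

    Hypothetical helper: the secrets are plain strings held by a trusted
    process, not hashes.
    """
    # Drop empty strings and duplicates; try longer secrets first so a
    # secret that contains another one is masked as a whole.
    ordered = sorted({s for s in secrets if s}, key=len, reverse=True)
    if not ordered:
        return lambda line: line
    pattern = re.compile("|".join(re.escape(s) for s in ordered))
    return lambda line: pattern.sub(MASK, line)


def filter_stream(lines: Iterable[str], secrets: Iterable[str]) -> Iterator[str]:
    """Yield each console line with known secrets replaced by MASK.

    Works line by line, so a secret split across two lines is not caught.
    """
    mask = build_masker(secrets)
    for line in lines:
        yield mask(line)


if __name__ == "__main__":
    console = [
        "login failed, credentials used: token=hunter2-abc",
        "retrying in 5 seconds",
    ]
    for out in filter_stream(console, ["hunter2-abc"]):
        print(out)
```

Matching exact strings avoids the substring-enumeration problem fungi raises for hashes, at the cost of having to trust whatever process holds the plaintext.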