*** jamesmcarthur has quit IRC | 00:01 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul-jobs master: WIP: use-buildset-registry: support running before docker installed https://review.openstack.org/638180 | 00:47 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul-jobs master: Split docker mirror config into its own role https://review.openstack.org/638195 | 00:47 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul-jobs master: WIP: use-buildset-registry: support running before docker installed https://review.openstack.org/638180 | 01:18 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul-jobs master: Split docker mirror config into its own role https://review.openstack.org/638195 | 01:18 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul-jobs master: use-buildset-registry: configure as a pull-through proxy https://review.openstack.org/638312 | 01:18 |
*** logan- has quit IRC | 01:22 | |
*** logan- has joined #zuul | 01:23 | |
*** logan- has quit IRC | 01:31 | |
*** logan- has joined #zuul | 01:31 | |
*** ruffian_sheep has joined #zuul | 01:39 | |
ruffian_sheep | Does anyone know how to deal with it? git.exc.GitCommandError: Cmd('git') failed due to: exit code(-13) | 01:43 |
ruffian_sheep | I can run the cmd myself, but it doesn't work when run by the service. http://paste.openstack.org/show/745241/ | 01:43 |
*** sdake has quit IRC | 01:44 | |
*** sdake has joined #zuul | 01:46 | |
*** jamesmcarthur has joined #zuul | 02:10 | |
SpamapS | ruffian_sheep: that looks like maybe zuul isn't using the right SSH key | 02:11 |
SpamapS | hm | 02:23 |
SpamapS | zuul.change doesn't get set in post pipelines from what I can tell (at least, on my github) | 02:23 |
SpamapS | which a) disagrees with the docs (zuul.change is in the generic section of zuul attributes), and b) makes using the promote pipeline more complicated. | 02:24 |
SpamapS | I have a feeling we could make an attempt at setting change, since the github UI figures it out. | 02:35 |
*** jamesmcarthur has quit IRC | 02:37 | |
*** jamesmcarthur has joined #zuul | 02:42 | |
*** kratec has left #zuul | 02:45 | |
*** sdake has quit IRC | 02:48 | |
*** sdake has joined #zuul | 02:48 | |
*** sdake has quit IRC | 02:57 | |
*** sdake has joined #zuul | 02:58 | |
*** jamesmcarthur has quit IRC | 03:08 | |
*** pwhalen has quit IRC | 03:14 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: gerrit: add support for report only connection https://review.openstack.org/568216 | 03:18 |
*** pwhalen has joined #zuul | 03:19 | |
*** saneax has joined #zuul | 03:25 | |
*** jamesmcarthur has joined #zuul | 03:32 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add buildsets page https://review.openstack.org/630041 | 03:32 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/buildset/{uuid} route https://review.openstack.org/630078 | 03:32 |
*** spsurya has joined #zuul | 03:34 | |
*** jamesmcarthur has quit IRC | 04:00 | |
*** jamesmcarthur has joined #zuul | 04:01 | |
*** jamesmcarthur has quit IRC | 04:05 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: gerrit: add support for report only connection https://review.openstack.org/568216 | 04:08 |
*** saneax has quit IRC | 04:11 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/buildset/{uuid} route https://review.openstack.org/630078 | 04:19 |
*** nilashishc has joined #zuul | 04:21 | |
*** nilashishc has quit IRC | 04:25 | |
*** ianychoi has quit IRC | 04:31 | |
*** sdake has quit IRC | 04:50 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/buildset/{uuid} route https://review.openstack.org/630078 | 04:54 |
*** sdake has joined #zuul | 04:54 | |
*** sdake has quit IRC | 05:14 | |
*** sdake has joined #zuul | 05:16 | |
*** saneax has joined #zuul | 05:45 | |
ruffian_sheep | But I can run the cmd to clone this project myself. Doesn't that mean I set the right ssh key? SpamapS | 05:46 |
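The distinction SpamapS is pointing at: the merger/executor clones with the key configured on the Zuul connection, running as the zuul service user, so a clone that works from your own shell doesn't prove the service has a usable key. A minimal sketch of the relevant zuul.conf settings, assuming a Gerrit connection; the hostname and key path below are placeholders, not taken from this log:

```ini
[connection gerrit]
driver=gerrit
server=review.example.com
user=zuul
# the key must exist at this path and be readable by the zuul service user
sshkey=/var/lib/zuul/.ssh/id_rsa
```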
*** fdegir has quit IRC | 05:48 | |
*** sdake has quit IRC | 05:48 | |
*** fdegir has joined #zuul | 05:48 | |
*** sdake has joined #zuul | 05:50 | |
*** kmrchdn is now known as chandankumar | 06:01 | |
*** sdake has quit IRC | 06:03 | |
*** [GNU] has quit IRC | 06:03 | |
*** sdake has joined #zuul | 06:04 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool master: Add python-path option to label https://review.openstack.org/637338 | 06:22 |
*** spsurya has quit IRC | 06:25 | |
*** nilashishc has joined #zuul | 06:27 | |
*** sdake has quit IRC | 06:37 | |
*** quiquell|off is now known as quiquell | 06:50 | |
*** [GNU] has joined #zuul | 07:02 | |
*** spsurya has joined #zuul | 07:17 | |
*** bhavikdbavishi has joined #zuul | 07:19 | |
*** pcaruana has joined #zuul | 07:19 | |
*** quiquell is now known as quiquiell|brb | 07:30 | |
*** rlandy|bbl is now known as rlandy | 07:44 | |
*** bhavikdbavishi has quit IRC | 07:56 | |
*** quiquiell|brb is now known as quiquell | 07:59 | |
*** themroc has joined #zuul | 08:09 | |
*** jpena|off is now known as jpena | 08:29 | |
*** saneax has quit IRC | 08:37 | |
*** hashar has joined #zuul | 08:37 | |
*** electrofelix has joined #zuul | 09:08 | |
*** gtema has joined #zuul | 09:26 | |
*** panda|off is now known as panda | 09:54 | |
*** iurygregory has quit IRC | 10:01 | |
*** spsurya has quit IRC | 10:22 | |
*** bhavikdbavishi has joined #zuul | 11:10 | |
*** ruffian_sheep has quit IRC | 11:21 | |
*** nilashishc has quit IRC | 11:43 | |
*** jpena is now known as jpena|lunch | 11:56 | |
*** bhavikdbavishi has quit IRC | 11:59 | |
*** bhavikdbavishi has joined #zuul | 11:59 | |
*** sdake has joined #zuul | 12:01 | |
*** quiquell is now known as quiquell|lunch | 12:05 | |
*** nilashishc has joined #zuul | 12:11 | |
*** bhavikdbavishi1 has joined #zuul | 12:20 | |
*** bhavikdbavishi has quit IRC | 12:20 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 12:20 | |
*** saneax has joined #zuul | 12:40 | |
*** jesusaur has quit IRC | 12:50 | |
*** jesusaur has joined #zuul | 12:56 | |
openstackgerrit | Sorin Sbarnea proposed openstack-infra/zuul-jobs master: Assure iptables is installed inside multi-node-firewall role https://review.openstack.org/638414 | 12:58 |
*** gtema has quit IRC | 12:59 | |
openstackgerrit | Sorin Sbarnea proposed openstack-infra/zuul-jobs master: Assure iptables is installed inside multi-node-firewall role https://review.openstack.org/638414 | 13:08 |
*** quiquell|lunch is now known as quiquell | 13:10 | |
*** jpena|lunch is now known as jpena | 13:15 | |
*** sdake has quit IRC | 13:23 | |
*** sdake has joined #zuul | 13:25 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: use-buildset-registry: configure as a pull-through proxy https://review.openstack.org/638312 | 13:28 |
*** bhavikdbavishi has quit IRC | 13:32 | |
*** gtema has joined #zuul | 13:35 | |
*** sdake has quit IRC | 13:36 | |
*** sdake has joined #zuul | 13:37 | |
*** rlandy has joined #zuul | 13:40 | |
*** jamesmcarthur has joined #zuul | 13:43 | |
*** nilashishc has quit IRC | 13:43 | |
*** jamesmcarthur has quit IRC | 13:58 | |
*** jamesmcarthur has joined #zuul | 13:58 | |
*** jamesmcarthur has quit IRC | 14:03 | |
*** sdake has quit IRC | 14:14 | |
*** panda is now known as panda|ruck | 14:16 | |
*** gtema has quit IRC | 14:18 | |
*** jamesmcarthur has joined #zuul | 14:29 | |
*** sdake has joined #zuul | 14:31 | |
*** jamesmcarthur has quit IRC | 14:34 | |
*** nilashishc has joined #zuul | 14:39 | |
*** nilashishc has quit IRC | 14:42 | |
*** sdake has quit IRC | 14:51 | |
pabelanger | Shrews: https://storyboard.openstack.org/#!/story/2005064 is a fun bug in nodepool we found last week, I'm happy to work on it, but would need some design help | 14:58 |
pabelanger | for history, http://eavesdrop.openstack.org/irclogs/%23zuul/%23zuul.2019-02-12.log.html#t2019-02-12T15:23:56 is the discussion last week with clarkb | 14:58 |
*** sdake has joined #zuul | 15:01 | |
Shrews | pabelanger: interesting. i haven't read the irc log yet, but did you identify why the instance creation failed? | 15:04 |
Shrews | i don't see that in any of the links | 15:05 |
pabelanger | Shrews: cloud was in a bad state, along with lack of resources | 15:05 |
pabelanger | quota for provider was superhigh, even though there was no compute capacity | 15:05 |
Shrews | pabelanger: what do you mean by "bad state"? | 15:06 |
Shrews | or is the quota thing the reason | 15:06 |
Shrews | or a symptom | 15:06 |
pabelanger | Shrews: like the controllers for the cloud were in the process of dying, and needed to be rebooted | 15:07 |
pabelanger | the cloud in general isn't very stable | 15:07 |
corvus | SpamapS: for 'promote' you need a change-based pipeline (not like 'post' which is sha-based). my guess for github is you want to trigger on a 'pull_request' event, action 'closed', and require 'merged'. | 15:07 |
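A rough sketch of the pipeline corvus describes, for illustration only; the connection name "github", the pipeline name, and the choice of manager are assumptions, not taken from this log:

```yaml
- pipeline:
    name: promote
    manager: supercedent        # assumption; any change-based manager illustrates the point
    trigger:
      github:
        - event: pull_request
          action: closed
    require:
      github:
        merged: true
```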
Shrews | pabelanger: what confuses me is that an error is reported for the node creation, yet instances are still being created (or so it would seem). how would you go about solving that? the cloud seems to be lying to us | 15:08 |
*** saneax has quit IRC | 15:08 | |
pabelanger | Shrews: however, once it recovered, there was no capacity for new nodes, because of all the ERROR'd instances nodepool created. And because of the reuse of nodepool_node_id, the cleanup handler couldn't see those instances to try to delete them | 15:08 |
pabelanger | Shrews: Yah, I am not too sure. The other key info is that retries on that provider is crazy high (99999999), so as not to have nodepool report NODE_FAILURE results back to gerrit | 15:09 |
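For reference, the setting pabelanger is describing corresponds roughly to the nodepool provider option below (a sketch only; the provider and cloud names are placeholders):

```yaml
providers:
  - name: example-provider      # placeholder
    cloud: example-cloud        # placeholder clouds.yaml entry
    launch-retries: 99999999    # effectively "retry forever", so NODE_FAILURE is never reported
```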
*** jamesmcarthur has joined #zuul | 15:10 | |
pabelanger | one idea clarkb and I discussed was to update the nodepool_node_id on the ERROR'd node when the exception is raised, but unsure if that is the right thing to do | 15:11 |
jkt | tristanC: I have an option for the runc driver to optionally use overlayfs for container's rootfs. Should I squash it to that change under review, or submit as a separate one? | 15:12 |
Shrews | pabelanger: how can you update nodepool_node_id on a node that is reported to not exist when the exception is raised? | 15:12 |
Shrews | pabelanger: https://softwarefactory-project.io/paste/show/1424/ shows that we don't see the node at that point | 15:13 |
Shrews | pabelanger: if the root of the problem is that what the cloud reports to us does not match reality (i feel like that's what your bug report might be saying?), then i don't know how to solve that up front. | 15:13 |
Shrews | pabelanger: let me read through the irc scrollback to see if i'm missing something | 15:15 |
pabelanger | Shrews: right, not sure why it doesn't exist, I suspect an openstack-side issue. However, we do eventually see it in our cleanup handler, but fail to try to delete it because of: http://git.zuul-ci.org/cgit/nodepool/tree/nodepool/driver/openstack/provider.py#n514 | 15:15 |
pabelanger | if we didn't reuse the nodepool_node_id on failure, I don't think we'd have this issue (but unsure what that means for the code base) | 15:15 |
pabelanger | sure | 15:15 |
*** jamesmcarthur has quit IRC | 15:17 | |
*** swest has quit IRC | 15:17 | |
Shrews | pabelanger: can you explain how that code causes us to fail to delete it? is it because that zknode exists? if so, what is its STATE value? | 15:19 |
Shrews | pabelanger: also, have you been able to create a test case that shows this? | 15:20 |
Shrews | i think we should start with that, if possible | 15:21 |
pabelanger | Shrews: yah, it still existed. The state was either READY or in-use, because a node eventually was built with the nodepool_node_id: see line 86 at https://softwarefactory-project.io/paste/show/1428/ | 15:22 |
pabelanger | I don't seem to have the original zk output | 15:22 |
pabelanger | I think what happens in the case of ^ paste is: once server 3cb0fc64-0f00-48b5-afeb-2fad08d941cf is finished running a job (in this case usually 3 hours), the data will eventually be deleted in zk, then our cleanup handler will start to see the leaked error nodes | 15:23 |
pabelanger | but we never confirmed that, because they manually deleted the ERROR'd nodes using the openstack client | 15:24 |
Shrews | pabelanger: ok, there are lots of question marks here... if you can create a test case that fails and demonstrates the issue, it would be much easier to dissect | 15:27 |
pabelanger | ack, I can work on that | 15:27 |
Shrews | because i'm still confused as to what state the zookeeper database and clouds were actually in | 15:28 |
pabelanger | just thinking, I'm not sure how to mock the openstack side, because we need to actually validate there will be 2 nodes created (1 error, 1 ready) but nodepool thinks there is only 1 zookeeper entry | 15:30 |
pabelanger | due to same nodepool_node_id | 15:30 |
Shrews | pabelanger: do you think what's happening is that we 1) get an exception on launch 2) create a zknode for the errored instance 3) cleanup thread tries to delete errored instance but it does not yet exist 4) we delete the zknode 5) errored instance now appears in the cloud | 15:36 |
Shrews | i'm still trying to understand what's causing the errored instances to not get deleted | 15:38 |
pabelanger | Shrews: yes, that seems to be the flow. | 15:39 |
pabelanger | I am unsure why that is | 15:39 |
Shrews | so why isn't http://git.zuul-ci.org/cgit/nodepool/tree/nodepool/driver/openstack/provider.py#n514 deleting it? | 15:39 |
pabelanger | but step 5) errored node appears, with the original nodepool_node_id, which has likely been reused for the next node launch attempt | 15:40 |
Shrews | ah, that's new info | 15:40 |
pabelanger | Shrews: sorry, that is what I was trying to show in: https://softwarefactory-project.io/paste/show/1430/ you can see 0001615360 exists twice for 2 different nodes | 15:41 |
*** sdake has quit IRC | 15:48 | |
*** jamesmcarthur has joined #zuul | 15:48 | |
Shrews | so nodepool_node_id is not matching the actual zk node sequence id, caused by the new code here: http://git.zuul-ci.org/cgit/nodepool/tree/nodepool/driver/openstack/handler.py#n260 | 15:49 |
*** sdake has joined #zuul | 15:50 | |
Shrews | however, that *should* eventually resolve itself | 15:51 |
pabelanger | Shrews: right, the new nodepool_node_id is deleted, with step 3 above leaking the original | 15:52 |
pabelanger | yah, I agree it should resolve, however the time it takes to resolve could end up being a large amount of time | 15:52 |
Shrews | yeah, until the READY instance gets used and deleted, the errored instances will remain | 15:53 |
pabelanger | with retries set to 999999 we end up creating a lot of ERROR'd nodes | 15:53 |
*** jamesmcarthur has quit IRC | 15:53 | |
pabelanger | yes | 15:53 |
Shrews | that's an unusually high number | 15:53 |
pabelanger | yup, single cloud provider, and don't want nodepool to report back NODE_FAILURES | 15:53 |
pabelanger | so, it's basically set up to keep trying forever until a nodepool node comes online | 15:53 |
pabelanger | kinda like v2 nodepool | 15:54 |
Shrews | ok, wow. edge case i don't believe we've considered | 15:54 |
pabelanger | yah, so I'm not sure of the best path there to have nodepool delete the ERROR'd nodes, because right now it results in people manually cleaning them up | 15:56 |
Shrews | pabelanger: ok, now that i better grasp what you're seeing, i can now say with 100% certainty that i don't have an immediate answer. :) | 15:56 |
Shrews | will require some thought | 15:56 |
pabelanger | Yay! | 15:56 |
pabelanger | cool, thanks | 15:56 |
*** jamesmcarthur has joined #zuul | 15:58 | |
*** jamesmcarthur has quit IRC | 15:58 | |
*** jamesmcarthur has joined #zuul | 15:58 | |
Shrews | right now, nodepool assumes temporary excess usage for a short period. but your use case flips that on its head, where we need NO excess usage for a possibly long period of time | 16:01 |
Shrews | not certain there is a quick fix for that, but... maybe? | 16:02 |
*** themroc has quit IRC | 16:03 | |
*** rfolco is now known as rfolco|rover | 16:07 | |
clarkb | Shrews: pabelanger my short summary of understanding that bug is we don't delete it because the node request eventually succeeds. Once the successful node is used and marked for deletion the leaked nodes should be deletable as well | 16:09 |
clarkb | so the issue is more the time it takes to delete than that it never gets deleted | 16:09 |
*** quiquell is now known as quiquell|off | 16:13 | |
*** jamesmcarthur has quit IRC | 16:13 | |
Shrews | clarkb: that's pretty much the clarifying comment i just left on the storyboard issue | 16:14 |
Shrews | clarkb: pabelanger: does my comment here seem to sum it up properly? https://storyboard.openstack.org/#!/story/2005064 | 16:15 |
clarkb | Shrews: yup that looks correct to me | 16:16 |
pabelanger | Shrews: yes, thanks | 16:16 |
clarkb | I think the other thing that complicates this is we refer to the nova metadata to get the request id | 16:17 |
clarkb | but that ends up being stale. I wonder if we can replace that with a boot key that a node request updates for each node booted | 16:17 |
Shrews | yeah, if we could keep those values in sync, that would be the solution | 16:23 |
clarkb | Shrews: or maybe do node-request-id-iteration-number | 16:25 |
clarkb | in nova metadata | 16:25 |
clarkb | then the node request can record the iteration that was successful. This has the nice effect of being human readable | 16:25 |
*** jamesmcarthur has joined #zuul | 16:27 | |
*** hashar has quit IRC | 16:39 | |
clarkb | following up on the github commit to PR cache in Zuul. It does seem to help the openstack deployment but I'm still seeing the occasional run up to a backlog of a couple hundred then back down to zero | 16:45 |
clarkb | https://github.com/ansible/ansible/pull/52682 is a PR merging that caused it to run up today. | 16:45 |
clarkb | I haven't confirmed this but I think the merge_commit_sha is relative to the current branch head of the target branch | 16:45 |
clarkb | this means that if, say, ansible were to merge two commits in succession, the merge_commit_sha for the second would move and we wouldn't necessarily have that cached, forcing us to do the full scan | 16:45 |
clarkb | if this is what is happening I don't think there is much that zuul can do unfortunately. And it really points to the need for better query API tooling in github (which they acknowledge, so yay) | 16:46 |
openstackgerrit | Matthieu Huin proposed openstack-infra/zuul master: Proposed spec: tenant-scoped admin web API https://review.openstack.org/562321 | 17:06 |
*** pcaruana has quit IRC | 17:07 | |
*** nilashishc has joined #zuul | 17:20 | |
*** jamesmcarthur_ has joined #zuul | 17:24 | |
*** jamesmcarthur has quit IRC | 17:27 | |
*** nilashishc has quit IRC | 17:31 | |
*** nilashishc has joined #zuul | 17:31 | |
*** nilashishc has quit IRC | 17:40 | |
*** panda|ruck is now known as panda|ruck|off | 17:48 | |
openstackgerrit | Jan Kundrát proposed openstack-infra/nodepool master: runc: Allow overlayfs mounts for container's rootfs https://review.openstack.org/638469 | 17:57 |
*** sdake has quit IRC | 18:05 | |
*** sdake has joined #zuul | 18:05 | |
*** jpena is now known as jpena|off | 18:29 | |
jlk | have y'all brought up TravisCI news in here yet? | 18:30 |
pabelanger | haven't heard anything recently | 18:32 |
jlk | https://www.reddit.com/r/devops/comments/at3oyq/it_looks_like_ibera_is_gutting_travis_ci_just_a/ | 18:32 |
pabelanger | I know they were acquired a few weeks ago | 18:32 |
jlk | A bunch of sudden layoffs, in some cases without the person's manager knowing about it | 18:32 |
Shrews | wow | 18:34 |
*** electrofelix has quit IRC | 18:37 | |
pabelanger | eep | 18:37 |
mordred | yuck | 18:55 |
jlk | There's going to be some good people on the job market who understand CI systems. | 18:56 |
mordred | jlk: not to be morbid, but it's a good reminder that the issue is never the intention of the original well-intentioned startup ... it's the potential behavior of whoever buys them that's the issue | 18:56 |
mordred | jlk: ++ | 18:56 |
jlk | mordred: that's some uh... relevant musings given uh... recent acquisitions. | 18:57 |
mordred | jlk: not, of course, that you have any experience being part of a startup that was bought by people who then proceeded to have no clue or anything | 18:57 |
jlk | yeah... | 18:57 |
jlk | I am currently rather pleased with my giant corp overlord. | 18:57 |
* Shrews gets a cold chill for some reason | 18:57 | |
mordred | jlk: your new overlord seems to have become one of the most clued in corp overlords, which really just makes me doubt reality | 18:58 |
jlk | this timeline is so weird | 18:58 |
*** hashar has joined #zuul | 18:59 | |
mordred | :) | 18:59 |
*** manjeets has quit IRC | 19:14 | |
*** manjeets has joined #zuul | 19:14 | |
*** hashar has quit IRC | 19:14 | |
fungi | that timeline where the empire commits to a lengthy effort to rebuild coruscant, and then paints the death star a cheerful shade of yellow and starts selling flowers out of the docking bay | 19:56 |
*** jamesmcarthur_ has quit IRC | 19:56 | |
*** jamesmcarthur has joined #zuul | 19:57 | |
fungi | no, wait, i meant rebuild alderaan (just gotta find all the pieces first) | 19:58 |
*** jamesmcarthur has quit IRC | 20:01 | |
*** jamesmcarthur has joined #zuul | 20:18 | |
*** jamesmcarthur has quit IRC | 20:40 | |
*** jamesmcarthur has joined #zuul | 20:40 | |
*** jamesmcarthur has quit IRC | 20:45 | |
*** jamesmcarthur has joined #zuul | 20:53 | |
daniel2 | So nodepool has a bunch of stale builds: http://paste.openstack.org/show/745652/ I can't seem to clean them out, any suggestions? | 20:54 |
clarkb | daniel2: this is still older nodepool not v3 right? | 20:55 |
daniel2 | clarkb: yes, 0.5.0 | 20:55 |
daniel2 | We were running 0.3.0 originally but I was able to make a case to at least upgrade to 0.5.0 | 20:55 |
clarkb | daniel2: I think that happens if the builder and main server which talks to the mysql db get out of sync (this was a big reason for moving to zookeeper for this data). In that case I think you just have to drop the database entries | 20:55 |
daniel2 | the mysql database tables are empty. I forgot about zookeeper | 20:56 |
daniel2 | It must be storing in that. | 20:56 |
daniel2 | clarkb: thanks, cleared the node in zookeeper which fixed that. | 21:00 |
openstackgerrit | Matthieu Huin proposed openstack-infra/zuul master: Proposed spec: tenant-scoped admin web API https://review.openstack.org/562321 | 21:26 |
tristanC | jkt: it sounds like overlayfs could use a separate review, but maybe not, i don't know the implementation detail | 21:27 |
*** hashar has joined #zuul | 21:32 | |
*** badboy has quit IRC | 21:41 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul-jobs master: run-buildset-registry: run a dual registry https://review.openstack.org/638514 | 21:54 |
*** jamesmcarthur has quit IRC | 21:57 | |
*** jamesmcarthur has joined #zuul | 22:05 | |
*** jamesmcarthur has quit IRC | 22:07 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul-jobs master: use-buildset-registry: support running before docker installed https://review.openstack.org/638180 | 22:12 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul-jobs master: Split docker mirror config into its own role https://review.openstack.org/638195 | 22:12 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul-jobs master: Use buildset registry push endpoint https://review.openstack.org/638520 | 22:12 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul-jobs master: Use buildset registry push endpoint https://review.openstack.org/638520 | 22:15 |
clarkb | looking at the github event processing we definitely seem to take a minute and a half to process consecutive ansible merged PR status events | 22:22 |
clarkb | I would expect the first to populate the cache then the second and third and so on to be looked up that way | 22:22 |
clarkb | beginning to suspect either the shas are not up to date as expected (like when does github update merge_commit_sha ?) or we've got a bug | 22:23 |
clarkb | jlk: ^ do you know how eventually consistent the merge_commit_sha data is? | 22:23 |
jlk | I do not. and I'm just walking out the door now | 22:23 |
clarkb | no worries then | 22:24 |
clarkb | we've had this problem for a lot longer than a day:) | 22:24 |
jlk | you're using merge_commit_sha as the key, not the HEAD of the PR ? | 22:26 |
clarkb | jlk: our cache maps shas to PRs. We are putting the regular head sha and the merge_commit_sha both in the cache | 22:27 |
*** sdake has quit IRC | 22:27 | |
clarkb | jlk: the idea being once the PR merges the status event will be on the merge_commit_sha | 22:27 |
jlk | I think that merge_commit_sha is a background task, it may update whenever a human hits the PR page, or some other background timer? I'm not totally sure. I don't get the feeling that it's calculated at every API get. | 22:28 |
clarkb | but it wouldn't surprise me if the merge_commit_shas are updated in batches, as evaluating what things merge to every time there is another merge is likely not cheap | 22:28 |
clarkb | ya so that is a best effort caching and may not be completely accurate | 22:28 |
jlk | yeah, and I think that assumes a particular type of merge, which if it's a human it could be different | 22:28 |
clarkb | (sometimes we'll get lucky) | 22:28 |
*** hashar has quit IRC | 22:28 | |
jlk | squash vs rebase vs whatever | 22:28 |
clarkb | oh good point | 22:29 |
jkt | tristanC: I'm almost going to bed, but https://review.openstack.org/638469 . Also, I'm quite tempted to run a full init (systemd in my case) in the container just so that I could use ansible roles for e.g. starting httpd | 22:31 |
jkt | tristanC: do you see a huge blocker in that? | 22:31 |
tristanC | jkt: well, there is a security concern if you enable privileged access. | 22:41 |
tristanC | jkt: if you want r/w access to /, then we might want to look at better runtime like lxd | 22:44 |
jkt | tristanC: it's on overlayfs | 22:46 |
tristanC | jkt: i meant, you'll need to become root, and then you are exposed to issues like CVE-2019-5736 | 22:49 |
tristanC | jkt: the intent of the runc driver is to be as safe as possible by locking down the environment to only enable unprivileged tasks like tox | 22:50 |
tristanC | jkt: it can surely be extended to authorize privileged access, but instead of managing overlayfs manually, we may want to look at another runtime for that | 22:51 |
tristanC | jkt: for example i think "podman --systemd" does a better job at running systemd in a container | 22:52 |
mordred | tristanC, jkt: I'm halfway through writing an email on that very topic actually - hopefully I'll be finished by tomorrow | 23:07 |
tristanC | mordred: i'm looking forward to it :) | 23:11 |
mordred | \o/ | 23:12 |
*** sdake has joined #zuul | 23:13 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: gerrit: add support for report only connection https://review.openstack.org/568216 | 23:20 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/buildset/{uuid} route https://review.openstack.org/630078 | 23:25 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: zuul-runner: add quick-start integration test https://review.openstack.org/635701 | 23:28 |
tristanC | corvus: can you please reconsider your -2 on https://review.openstack.org/555153, last PS should have addressed your concerns | 23:32 |
corvus | tristanC: yes, sorry i have a backlog of reviews to do, i've been unusually busy with other work | 23:33 |
*** sdake has quit IRC | 23:33 | |
corvus | tristanC: i did just add a -2 to 635701 though. of course feel free to continue using that change to debug problems with testing -- i just didn't want you to spend too much time on writing a quick-start based test. | 23:34 |
*** sdake has joined #zuul | 23:38 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: run-buildset-registry: run a dual registry https://review.openstack.org/638514 | 23:39 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: use-buildset-registry: support running before docker installed https://review.openstack.org/638180 | 23:39 |
corvus | tristanC: it looks like the web trigger change is injecting job variables? that also violates the model. the model is that events enqueue refs. once the ref is enqueued, we forget the information about the triggering event. any trigger driver needs to be able to enqueue a ref and cause the same jobs to run. | 23:40 |
tristanC | corvus: arg... how would the web trigger generate a ref then? | 23:42 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Split docker mirror config into its own role https://review.openstack.org/638195 | 23:43 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Use buildset registry push endpoint https://review.openstack.org/638520 | 23:43 |
corvus | tristanC: it doesn't have to create a ref, it has to tell zuul to enqueue a ref. for example, the timer trigger enqueues the branch-tip ref. | 23:43 |
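For illustration, a minimal periodic pipeline of the sort corvus is referring to; the pipeline name and cron schedule here are arbitrary. The timer trigger just enqueues each configured project's branch tip, with no per-run parameters:

```yaml
- pipeline:
    name: periodic
    manager: independent
    trigger:
      timer:
        - time: '0 2 * * *'
```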
tristanC | right, but how to set custom variables then? | 23:43 |
corvus | we can't -- triggers can't set variables. | 23:44 |
tristanC | well, i would argue that a gerrit trigger event can set variables... | 23:44 |
corvus | the whole point of the model is that all of the workflow actions are encoded in configuration stored in git. there's no ad-hoc running jobs manually with special arguments. | 23:44 |
corvus | tristanC: the gerrit and github triggers can not set job variables | 23:45 |
tristanC | i could create a review with new variables, and the resulting gerrit trigger event would apply those variables to the jobs | 23:45 |
corvus | tristanC: right, but you've created a git commit to do that. that's part of the git-driven workflow. | 23:46 |
*** sdake has quit IRC | 23:46 | |
corvus | there's a record of it, other people can see it, copy it, etc. there's no reliance on humans setting parameters manually. | 23:47 |
*** sdake has joined #zuul | 23:47 | |
tristanC | corvus: well you could delete the review in recent gerrit | 23:47 |
corvus | that doesn't sound like a *good* git-driven workflow, in fact, it sounds like a very bad one, but it does still sound like a git-driven workflow. | 23:50 |
*** sdake has quit IRC | 23:51 | |
*** sdake_ has joined #zuul | 23:51 | |
corvus | the fundamental issue is that we must not create a system where the only way to do something in zuul is manually. | 23:51 |
tristanC | corvus: i understand the point of a git-driven workflow, but asking our users to create a fake review to start a job with a custom parameter is a no-go unfortunately | 23:54 |
corvus | tristanC: so we're going to have to continue to dive into the underlying use case | 23:54 |
tristanC | corvus: perhaps the webtrigger could take care of creating and closing the review using the gerrit connection? | 23:55 |
tristanC | corvus: iiuc it's really a common use case for jenkins users, that is, to start a job with custom parameters | 23:55 |
corvus | tristanC: right, that's not a use case | 23:55 |
corvus | that's "jenkins does something this way" | 23:56 |
corvus | i want to know what thing you're trying to accomplish | 23:56 |
corvus | if the answer is "a user wants to run a job with custom parameters" then the next question is "why do they want to do that? what are they trying to cause to happen?" | 23:57 |
tristanC | corvus: iiuc, Q[AE] folks want to run jobs manually, for example to reproduce a customer case; they would start the same ci job but with the right set of options to investigate | 23:58 |
tristanC | corvus: which is something they can do with jenkins through the web interface | 23:58 |
corvus | tristanC: well, won't that give them the same answer? | 23:59 |
corvus | it's a gating system, presumably it has already run with those parameters and passed. | 23:59 |