Monday, 2019-03-04

*** jamesmcarthur has quit IRC00:00
*** jamesmcarthur has joined #zuul00:00
*** jamesmcarthur has quit IRC00:02
*** jamesmcarthur has joined #zuul00:32
*** jamesmcarthur has quit IRC01:01
*** sdake has joined #zuul01:06
*** jesusaur has quit IRC01:19
*** jesusaur has joined #zuul01:25
*** sdake has quit IRC01:44
*** sdake has joined #zuul01:48
*** sdake has quit IRC01:59
*** sdake has joined #zuul02:05
*** sdake has quit IRC02:08
*** sdake has joined #zuul02:42
*** sdake has quit IRC02:44
*** sdake has joined #zuul02:50
*** jamesmcarthur has joined #zuul03:01
*** jamesmcarthur has quit IRC03:05
*** sdake has quit IRC03:24
*** sdake has joined #zuul03:26
*** jamesmcarthur has joined #zuul03:27
*** sdake has quit IRC03:41
*** jamesmcarthur has quit IRC03:41
*** sdake has joined #zuul05:09
*** snapiri has joined #zuul05:19
tristanCpabelanger: zuul-maint: i've started a zuul-discuss thread about https://review.openstack.org/632620 , what's the next step to land this change?05:22
*** saneax has joined #zuul05:26
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: Separate out executor server from runner  https://review.openstack.org/60707905:41
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: zuul-runner: implement prep-workspace  https://review.openstack.org/60708205:41
*** raukadah has quit IRC05:45
*** chandankumar has joined #zuul05:46
*** sdake has quit IRC05:48
*** sdake has joined #zuul05:53
*** sdake has quit IRC06:01
*** sdake has joined #zuul06:02
*** sdake has quit IRC06:03
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: Add API endpoint to get frozen jobs  https://review.openstack.org/60707706:26
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: Get executor job params  https://review.openstack.org/60707806:26
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: Separate out executor server from runner  https://review.openstack.org/60707906:26
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: zuul-runner: implement prep-workspace  https://review.openstack.org/60708206:26
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: zuul-runner: add yaml based configuration file  https://review.openstack.org/64067206:26
*** badboy has joined #zuul06:54
tristanCcorvus: is there an example of job using provides/requires in opendev's zuul?07:00
*** quiquell|off is now known as quiquell07:03
*** [GNU] has joined #zuul07:19
*** quiquell is now known as quiquell|brb07:43
*** themroc has joined #zuul07:44
*** snapiri has quit IRC07:52
*** gtema has joined #zuul07:58
*** jesusaur has quit IRC08:06
*** delhage has left #zuul08:10
*** jesusaur has joined #zuul08:10
*** panda|ruck|off is now known as panda|ruck08:12
*** quiquell|brb is now known as quiquell08:18
*** pcaruana has joined #zuul08:25
*** hashar has joined #zuul08:36
*** jpena|off is now known as jpena08:56
openstackgerritMatthieu Huin proposed openstack-infra/zuul master: [WIP] Add Authorization Rules configuration  https://review.openstack.org/63985509:43
*** electrofelix has joined #zuul10:07
openstackgerritMatthieu Huin proposed openstack-infra/zuul master: Proposed spec: tenant-scoped admin web API  https://review.openstack.org/56232110:25
*** gtema has quit IRC10:34
*** gtema has joined #zuul10:47
*** saneax has quit IRC10:52
*** gtema has quit IRC10:56
badboyhi11:23
badboyis it possible to trigger a job based on a label?11:23
*** dkehn has quit IRC11:32
*** gtema has joined #zuul11:33
*** hashar has quit IRC11:43
*** hashar has joined #zuul12:08
*** saneax has joined #zuul12:29
*** jpena is now known as jpena|lunch12:38
*** edmondsw has joined #zuul12:50
*** zbr has quit IRC12:53
*** zbr|ssbarnea has joined #zuul12:54
*** zbr|ssbarnea has quit IRC12:54
*** zbr has joined #zuul12:55
*** rlandy has joined #zuul12:57
*** saneax has quit IRC13:15
*** sdake has joined #zuul13:24
*** jamesmcarthur has joined #zuul13:24
*** jpena|lunch is now known as jpena13:31
*** jamesmcarthur has quit IRC13:32
*** jamesmcarthur has joined #zuul13:33
*** mhu has quit IRC13:36
*** mhu has joined #zuul13:37
*** jamesmcarthur has quit IRC13:38
*** rfolco is now known as rfolco|pto13:50
*** jamesmcarthur has joined #zuul14:03
*** jamesmcarthur has quit IRC14:06
*** jamesmcarthur has joined #zuul14:06
*** pwhalen has quit IRC14:13
*** sdake has quit IRC14:14
*** gtema has quit IRC14:23
*** jamesmcarthur has quit IRC14:24
*** sdake has joined #zuul14:25
*** pwhalen has joined #zuul14:25
*** sdake has quit IRC14:25
*** pcaruana has quit IRC14:31
*** sdake has joined #zuul14:38
*** jamesmcarthur has joined #zuul14:38
tobiashbadboy: in github?14:41
tobiashyes14:41
tobiashbadboy: you can use the labeled action: https://zuul-ci.org/docs/zuul/admin/drivers/github.html#trigger-configuration14:42
tobiash(in the pipeline config)14:42
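Per the linked trigger docs, a GitHub pipeline trigger can fire on `pull_request` events whose action is `labeled`, matching label names literally. The Python sketch below illustrates that matching rule under stated assumptions. It is not Zuul's code: the helper name and the `run-tests` label are hypothetical. The payload shape (`action` plus a top-level `label` object) is how GitHub delivers `labeled` webhooks. In practice you configure this in the pipeline's `trigger:` section and do not write any code.

```python
from typing import Any, Mapping, Sequence


def labeled_trigger_matches(
    event_type: str,
    payload: Mapping[str, Any],
    wanted_labels: Sequence[str],
) -> bool:
    """Return True if a GitHub webhook should fire a label-based trigger.

    Hypothetical helper illustrating the idea of a trigger with
    event=pull_request, action=labeled and a literal list of label names.
    """
    if event_type != "pull_request":
        return False
    if payload.get("action") != "labeled":
        return False
    # GitHub sends the label that was just applied under payload["label"].
    label_name = payload.get("label", {}).get("name")
    return label_name in wanted_labels


event = {"action": "labeled", "label": {"name": "run-tests"}}
print(labeled_trigger_matches("pull_request", event, ["run-tests"]))  # True
```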
*** sdake has quit IRC14:49
*** sdake has joined #zuul14:55
openstackgerritMatthieu Huin proposed openstack-infra/zuul master: [WIP] Add Authorization Rules configuration  https://review.openstack.org/63985514:56
openstackgerritMonty Taylor proposed openstack-infra/nodepool master: Remove TaskManager and just use keystoneauth  https://review.openstack.org/64064315:08
*** sdake has quit IRC15:11
openstackgerritMonty Taylor proposed openstack-infra/nodepool master: Remove TaskManager and just use keystoneauth  https://review.openstack.org/64064315:12
corvustristanC: re provides/requires -- not fully functioning yet, but this one does: http://git.openstack.org/cgit/opendev/base-jobs/tree/zuul.yaml#n177 -- we need to restart nodepool/zuul for provider affinity to continue testing that (we had jobs unable to contact the buildset registry because ipv4-only regions couldn't talk to ipv6-only)15:14
*** sdake has joined #zuul15:15
Shrewsoh, speaking of nodepool restarts, i don't think we've restarted launchers since we added back the port cleanup code15:16
Shrewsso we'll have a couple of things to watch there15:17
jkttristanC: I think opendev doesn't use provides/requires yet (it hasn't been in a release yet), but I've used it successfully15:17
mordredShrews: also - we cut a new openstacksdk release which touched image uploading code ... it's been tested, obviously - but also gremlins exist in the world, so we should watch the builders next restart too15:18
jkttristanC: this is my producer, https://gerrit.cesnet.cz/plugins/gitiles/CzechLight/dependencies/+/master , and here's one consumer: https://gerrit.cesnet.cz/plugins/gitiles/CzechLight/netconf-cli/+/master15:19
mordred(the changes to image code were organizational - the code is ultimately the same code, but it got moved around)15:19
Shrewsmordred: wheeeee15:19
jkttristanC: it's passing three different artifacts; some additional pieces are at https://gerrit.cesnet.cz/plugins/gitiles/ci/zuul-jobs-cesnet/+/master15:19
fungijkt: being in a release isn't a requirement for opendev's zuul to use a new feature... we continuously deploy from master and even act as a frequent proving ground for unreleased features15:24
fungithough i don't know of any provides/requires examples in our configuration right off the top of my head15:25
*** pcaruana has joined #zuul15:25
*** chandankumar is now known as raukadah15:32
openstackgerritMonty Taylor proposed openstack-infra/nodepool master: Remove TaskManager and just use keystoneauth  https://review.openstack.org/64064315:38
*** quiquell is now known as quiquell|off15:47
*** sdake has quit IRC16:00
*** themroc has quit IRC16:00
tobiashcorvus: I have an interesting case where the ansible ssh connection retry mechanism (we hard code 3 atm) causes more harm than it helps16:10
tobiashcorvus: during a compile job ansible thought the connection had been lost and it retried that single task (while the first try in fact was still running). So in the end we had two compile processes running at the same time which failed the job.16:11
tobiashcorvus: so I think this is even potentially dangerous and we should think about whether we want this behavior or instead rely on unreachable nodes detection and retry the whole job instead16:12
tobiashthe idea of ansible is that everything is idempotent, so executing the same task multiple times is normally ok. But in ci use cases this is not necessarily the case16:13
tobiashand then it might be just incorrect to retry a task we don't even know if it's still running16:14
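The failure tobiash describes can be reproduced in miniature. The Python sketch below is a toy model and not Ansible's SSH retry code. Threads stand in for remote processes. The model assumes that every connection drops while the build runs and that the remote process survives the drop. Under those assumptions, zero retries gives one build, and each retry adds one more concurrent copy of a task that is not concurrency safe.

```python
import threading


class FakeNode:
    """Hypothetical remote node that counts concurrent 'compile' runs."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.active = 0
        self.max_active = 0

    def compile(self, started: threading.Event, done: threading.Event) -> None:
        with self._lock:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
        started.set()
        done.wait()  # the long-running build keeps going on the node
        with self._lock:
            self.active -= 1


def run_task_with_retries(node: FakeNode, retries: int) -> int:
    """Run one task, retrying every time the 'connection' drops.

    Assumption of this toy model: every attempt's connection drops mid-build,
    but the remote process is NOT killed, so each retry starts another copy
    in parallel. Returns the peak number of simultaneous compiles.
    """
    done = threading.Event()
    workers = []
    for _ in range(retries + 1):
        started = threading.Event()
        t = threading.Thread(target=node.compile, args=(started, done))
        t.start()
        started.wait()        # the remote command is now running
        t.join(timeout=0.01)  # controller gives up: "connection lost"
        workers.append(t)
    done.set()  # let every copy of the build finish
    for t in workers:
        t.join()
    return node.max_active


print(run_task_with_retries(FakeNode(), 0))  # 1
print(run_task_with_retries(FakeNode(), 3))  # 4
```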
mordredtobiash: did the connection get lost because the compile was making the machine seem unresponsive?16:14
mordredretrying a task that could be running does seem more dangerous than retrying a task where we weren't able to connect in the first place - do we have any way of distinguishing the two?16:15
tobiashmordred: I don't know. I just saw that the json log looks normal (because ansible doesn't store the result of the first try somewhere) while the job-output.txt showed what actually happened (task restart in parallel to the still running task)16:16
tobiashno, we don't even get any hint about it because that's deep inside ansible16:17
tobiashand if the connection is terminated there is no way of knowing if the process executed by that ssh session was interrupted or successful or failed.16:21
Shrewsmordred or SpamapS: have opinions on a good rust tutorial?16:21
tobiashso I think the most correct thing here would be to not do a retry which should result in an unreachable result of the task and thus a retry of the job16:21
mordredShrews: the rust book is actually pretty solid https://doc.rust-lang.org/book/16:23
mordredtobiash: great - yeah - not being able to distinguish the two is not awesome16:24
Shrewsmordred: thx16:24
corvusShrews, mordred: i spent a good deal of the weekend reading the rust book.  i'm in chapter 18 now.  it is quite good.  keep in mind there's also "rust by example", depending on your learning style.  https://doc.rust-lang.org/#learn-rust16:30
corvusthere are some things that are completely new in rust though (eg, ownership and lifetime annotations), so that even if you don't start with the book, there are probably some things worth going to the book and reading later to get more background.16:33
corvustobiash, mordred: we may want to pull in pabelanger on the retry question too, istr he worked on that a bit.16:34
Shrewscorvus: ch 18? fast reader16:35
clarkbcorvus: Shrews the rust zuul preview tool is actually a really good illustration of ownership too due to its use of anonymous functions and variables with limited-scope lifetimes stored in a long-lived cache16:35
corvusShrews: it rained a lot this weekend16:36
Shrewscorvus: since i'm mostly still laid up from my injuries, hopefully i can plow through as quickly16:37
corvusShrews: it's like you planned it!  :)16:38
pabelangertobiash: what type of task was your compile job? some sort of long lived shell / command?16:43
tobiashlooks like I have to learn rust too once y'all reimplement zuul in rust ;)16:43
tobiashpabelanger: yes, long lived shell task16:43
pabelangerI'm not sure I'd want to remove retries completely, but we did discuss maybe exposing the value to users to modify16:43
corvuspabelanger: do you think retries help, or are necessary?16:44
tobiashpabelanger: we already have job retries in place for connection failures16:44
pabelangercorvus: in our case, I think they have helped. We have pretty poor networking in our cloud, however I don't really have numbers to back it up16:45
tobiashBut maybe a config option is a reasonable compromise. But I have to say that this can lead to hard-to-debug job failures16:45
pabelangerI'm open to a revert, but need to let people know this could result in a higher number of retries for jobs16:47
pabelangerbut also okay with config options too16:48
SpamapSShrews: mordred and corvus have you covered. The book is everything. :)16:48
tobiashthe retry mechanism as ansible implements it is actually only valid if every task is idempotent and concurrency safe16:48
tobiashwhich is not the case for typical build jobs16:48
SpamapSShrews: also to contrast with the preview tool code, which is tight and focused on a single thing, here's a more scripty Rust thing that I wrote: https://github.com/ToolsForHumans/shyaml/blob/master/src/main.rs16:49
tobiashso it's a trade-off16:49
SpamapS(and I use every day)16:49
pabelangertobiash: right, we try to use creates field for shell when possible, but agree that is much harder with long running tasks.16:50
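pabelanger's `creates` approach skips a shell task when its output path already exists. The Python sketch below is a toy model of that guard and not Ansible's implementation. It shows why the guard does not help with long-running tasks: the path only appears once the first run has finished. A retry that starts while the first build is still running finds no path and runs a second copy. The function and file names are hypothetical.

```python
import os
from typing import Callable


def shell_with_creates(run: Callable[[], None], creates: str) -> str:
    """Toy model of a shell task guarded by a 'creates' path.

    If `creates` already exists the command is skipped; otherwise it runs
    and is expected to produce that path itself when it completes.
    """
    if os.path.exists(creates):
        return "skipped"
    run()
    return "ran"
```

A finished build makes a later retry a no-op, but a retry racing an unfinished build still reports `"ran"`, because the marker does not exist yet.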
corvustobiash, pabelanger: when the retries=3 setting was added, we had not yet implemented the job_unreachable_file.  my understanding is that means we were missing some cases where ansible should have been reporting unreachable (where, if it had, we would have retried the job)16:50
corvustobiash, pabelanger: since we added retries=3 to deal with "unreachable" situations, i wonder if, now that we have the job_unreachable_file, if it would be okay to remove retries=3 and rely on job retries?16:51
tobiashcorvus, pabelanger: I think that would at least be the 100% correct way16:52
tobiashso retries=3 would be just an optimization to do a smaller scoped retry at the expense of possible races16:52
pabelangerjust looking to see what job_unreachable_file does, first I have heard of that16:53
tobiashso I would vote for either removing it or defaulting to 0 and explain in the docs that this is a potentially dangerous optimization16:53
tobiashpabelanger: tldr is that ansible doesn't return the correct result code for unreachable so we use the callback to detect that16:54
openstackgerritMatthieu Huin proposed openstack-infra/zuul master: [WIP] Add Authorization Rules configuration  https://review.openstack.org/63985516:54
tobiashpabelanger: https://git.zuul-ci.org/cgit/zuul/tree/zuul/ansible/callback/zuul_unreachable.py#n4016:55
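As tobiash summarizes, Ansible's exit code does not reliably report an unreachable host. A callback therefore records the condition in a file, and Zuul retries the whole job. The Python sketch below outlines that flow under stated assumptions. The function names, the file contents, and the result strings are hypothetical. The real callback is the linked `zuul_unreachable.py`.

```python
import os


def record_unreachable(marker_path: str, host: str) -> None:
    """Hypothetical callback hook: note that `host` became unreachable."""
    with open(marker_path, "a") as f:
        f.write(host + "\n")


def job_outcome(returncode: int, marker_path: str) -> str:
    """Hypothetical executor-side decision after ansible-playbook exits.

    The marker file takes precedence over the exit code, since the exit
    code alone does not reliably signal 'unreachable'.
    """
    if os.path.exists(marker_path):
        return "RETRY_JOB"  # rerun the whole job on fresh nodes
    return "SUCCESS" if returncode == 0 else "FAILURE"
```

With this split, task-level SSH retries are no longer needed to catch dropped nodes. The job-level retry covers them without ever re-running a single task that might still be executing.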
pabelangertobiash: corvus: I would be okay with a revert, however I do think there might be cases where operators understand the usage of retries and want to enable / disable it. Retrying a job but having nodepool relaunch it again, is expensive compared to ansible just retrying again.16:57
pabelangerand given the $$$ cost to boot another VM, could affect budgets16:57
mhuHello, could I get some eyes on the tenant-scoped web API spec: https://review.openstack.org/#/c/562321/ ? I've added authZ handling and I'd like to get some feedback before I get too much into my PoC16:57
tobiashcorvus, pabelanger: so is it a reasonable compromise for all of us to default to 0 and make it configurable (with warning in the docs)?16:58
pabelangertobiash: that wfm if others are good16:59
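The compromise is to default task-level retries to 0, make them configurable, and document the risk. The Python sketch below shows one way that could look in an INI-style executor config. The `[executor]` section is used for illustration, and the `ansible_ssh_retries` option name is hypothetical, not an existing Zuul setting.

```python
import configparser
import warnings


def ssh_retries(config: configparser.ConfigParser) -> int:
    """Read a hypothetical [executor] ansible_ssh_retries option.

    Defaults to 0 (no task-level retries; rely on job retries instead) and
    warns when enabled, since retrying a task that may still be running is
    only safe for idempotent, concurrency-safe tasks.
    """
    retries = config.getint("executor", "ansible_ssh_retries", fallback=0)
    if retries < 0:
        raise ValueError("ansible_ssh_retries must be >= 0")
    if retries > 0:
        warnings.warn(
            "task-level SSH retries can run non-idempotent tasks twice",
            RuntimeWarning,
        )
    return retries
```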
corvuspabelanger: hrm.  it seems like it would be difficult for an operator to say that it's correct.  based on what tobiash is saying, whether it's correct to use retries= depends on exactly how an individual task is written16:59
*** hashar has quit IRC17:00
corvus(does ansible have a "mosh" connection plugin? :)17:01
tobiashmosh would probably solve that ;)17:02
pabelangercorvus: I would agree, it depends on the task for sure. And possibly that's the reason why we haven't seen more of this case17:04
mordredcorvus: we should totally write a mosh connection plugin :)17:07
pabelangerjust asked on the internal ansible channel; there seem to have been attempts at a mock connection plugin, but nothing more than Friday hacking, it seems17:14
*** sdake has joined #zuul17:22
*** hashar has joined #zuul17:28
*** saneax has joined #zuul17:28
openstackgerritMatthieu Huin proposed openstack-infra/zuul master: [WIP] Add Authorization Rules configuration  https://review.openstack.org/63985517:28
mordredpabelanger: s/mock/mosh/ right? :)17:32
pabelangermordred: yes!17:32
clarkbas a side note, the jobs themselves would have to avoid the network as well if the network is flaky for improvements to ansible network connectivity to have a major impact on job reliability17:33
clarkbI'm not sure how practical it is to invest significantly in making the ansible "ssh" connection reliable on flaky networks when pip install will fail17:33
tobiashA mock plugin would be great too and make all users happy ;)17:37
mordredtobiash: ++17:43
clarkbHey I've been asked if we can get reviews on https://review.openstack.org/#/c/639871/ to get the summit banner thing up on the zuul site17:43
pabelangerclarkb: in our use case, the executors are on the bad network, while the test nodes work well.  However, we still haven't made the change to zuul executor zones, so that is likely another way we'll be better off without retries17:45
corvusclarkb: lgtm17:46
clarkbty17:46
dmsimardThe failure mode of zuul-web during whatever is occurring in upstream openstack-infra is not very cool17:47
fungias in zuul-web running and accepting connections while the scheduler is offline?17:48
dmsimardYeah, there should be something better than a white page17:48
fungilooks like it's back up now, but yeah, i agree it tends to look confusing when the scheduler is out of service17:49
dmsimardMaybe a message to the effect that the web interface was not able to connect to the scheduler ? Would be useful as an end user :D17:49
dmsimardI can create a story about it perhaps.17:50
corvusdmsimard: it's supposed to do that.  there is a javascript bug.  a bugfix would be appreciated.17:50
dmsimardcorvus: my javascript/react skills are most definitely limited but I'll create a story17:51
corvusdmsimard: assign tristanC to it, he said he'd fix it last time this was reported17:52
dmsimardwfm17:52
dmsimardhttps://storyboard.openstack.org/#!/story/2005134 for ^17:57
josefwellsmy change: https://review.openstack.org/640548 failed testing, but I looked and it seems unrelated.  Didn't see it on elastic-search or a launchpad-bug17:58
openstackgerritMerged openstack-infra/zuul-website master: Add a promotional message banner and events list  https://review.openstack.org/63987117:58
fungijosefwells: yeah, looks like test_client_dequeue_change_by_ref had opposite the expected result18:21
fungiit's possible that test is racy18:21
fungii agree i don't see how a change to example playbooks in the documentation could cause that18:21
fungijosefwells: if you leave a review comment on that change starting with the word "recheck" (no quotes) it should get tested again18:22
*** electrofelix has quit IRC18:22
*** saneax has quit IRC18:25
*** jpena is now known as jpena|off18:27
*** jamesmcarthur_ has joined #zuul18:32
*** hashar is now known as hasharDinner18:32
*** jamesmcarthur has quit IRC18:35
*** jamesmcarthur_ has quit IRC18:41
*** panda|ruck is now known as panda|ruck|off18:41
fungilooks like corvus did that already18:49
corvusyep! the job that matters passed, so we should be able to land that soon :)18:50
*** jamesmcarthur has joined #zuul18:53
*** pcaruana has quit IRC18:56
josefwellsthanks fungi and corvus, I found this stuff while getting my quickstart going (and making changes for github instead of gerrit) thought I would share the love and walk through your process as well18:57
josefwellsI find the elastic-search stuff very interesting too, might end up picking that up for my processes as well18:57
fungithanks josefwells! it's much appreciated18:57
*** jlvillal has quit IRC18:58
*** jamesmcarthur has quit IRC18:59
*** jlvillal has joined #zuul18:59
*** jamesmcarthur has joined #zuul18:59
*** electrofelix has joined #zuul19:02
*** jamesmcarthur has quit IRC19:04
*** sdake has quit IRC19:07
fungiand tox-py36 did pass when the change was rechecked, so test_client_dequeue_change_by_ref doesn't seem to be consistently broken19:09
*** electrofelix has quit IRC19:10
*** jamesmcarthur has joined #zuul19:22
*** openstackgerrit has quit IRC19:23
josefwellsyes, reducing false-fails is very important in my system as well, where test times can easily reach into 4 hours.19:33
*** openstackgerrit has joined #zuul19:34
openstackgerritMerged openstack-infra/zuul master: quickstart: web and others wait on mysql to start  https://review.openstack.org/64054819:34
tobiashcorvus: responded on 63459719:46
tobiashcorvus: do you still need 638596 for testing docker stuff or is it safe to recheck and land it?19:49
corvustobiash: it can land20:03
corvustobiash: re reply: ack, thanks20:04
tobiashcorvus: thx20:04
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Log exception on module failure with empty stdout  https://review.openstack.org/64065020:09
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Manage ansible installations within zuul  https://review.openstack.org/63193020:09
*** jamesmcarthur has quit IRC20:19
*** jamesmcarthur has joined #zuul20:19
*** hasharDinner is now known as hashar20:34
*** sdake has joined #zuul20:52
*** jamesmcarthur has quit IRC20:58
*** jamesmcarthur has joined #zuul20:59
*** jamesmcarthur has quit IRC20:59
*** jamesmcarthur has joined #zuul20:59
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Manage ansible installations within zuul  https://review.openstack.org/63193021:00
tobiashjosefwells, corvus, fungi: commented on https://review.openstack.org/64054821:04
openstackgerritMerged openstack-infra/zuul master: Optionally disable disk_limit_per_job  https://review.openstack.org/63859621:05
*** jamesmcarthur has quit IRC21:06
corvustobiash: how did the job work if the playbook wasn't mounted?21:06
tobiashcorvus: because the startup script doesn't care about the failure21:06
tobiashdue to ';' instead of '&&'21:06
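The `;` versus `&&` difference explains why the quickstart job passed with the playbook missing. The Python sketch below illustrates it by running two tiny shell scripts through `/bin/sh`. The quickstart's actual commands are not shown in the log, so `false` and `true` are placeholders for "the failing first step" and "the command that starts the service".

```python
import subprocess


def exit_status(script: str) -> int:
    """Run `script` with /bin/sh and return its exit status."""
    return subprocess.run(script, shell=True).returncode


# With ';' the second command still runs, and the script reports the
# status of the LAST command (0), so the earlier failure is hidden.
print(exit_status("false; true"))    # 0
# With '&&' the second command is skipped and the failure propagates.
print(exit_status("false && true"))  # 1
```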
corvustobiash: i guess we should fix both of those things :)21:06
tobiashcorvus: the fix is also part of the ansible stack (because it will replace the playbook with a shell script)21:07
tobiashso we could fix this or wait for the multi-ansible stack depending how urgent this is21:07
tobiashI'm fine with both options21:08
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Manage ansible installations within zuul  https://review.openstack.org/63193021:08
fungitobiash: mm, good catch on both counts21:08
tobiashlemme fix that real quick21:09
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Fix missing wait-to-start playbook in quick start  https://review.openstack.org/64087121:11
tobiashcorvus, fungi, josefwells: ^21:11
corvustobiash: thx!21:12
tobiashyw21:12
fungimuch obliged21:12
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Fix test race in test_client_dequeue_change_by_ref  https://review.openstack.org/64087821:30
tobiashcorvus, fungi: I think this might fix the test race that josefwells hit in 640548 ^21:31
*** jamesmcarthur has joined #zuul21:37
*** sdake has quit IRC21:41
corvusi wonder why that uses a timer trigger in the first place?21:43
*** sdake has joined #zuul21:43
corvustobiash: thanks!21:44
tobiashcorvus: the comment says because it wants a ref based change21:44
tobiashhowever that could probably be a post job as well21:44
corvusor tag21:46
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Manage ansible installations within zuul  https://review.openstack.org/63193021:47
tobiashyepp21:48
*** klindgren_ has joined #zuul21:52
*** shanemcd has quit IRC21:53
*** klindgren has quit IRC21:53
*** smcginnis has quit IRC21:53
*** jamesmcarthur has quit IRC21:57
*** shanemcd has joined #zuul21:57
*** jamesmcarthur has joined #zuul21:58
openstackgerritMatthieu Huin proposed openstack-infra/zuul master: [WIP] Web: plug the authorization engine  https://review.openstack.org/64088421:59
*** jamesmcarthur has quit IRC22:12
*** jamesmcarthur has joined #zuul22:13
*** sdake has quit IRC22:50
openstackgerritMatthieu Huin proposed openstack-infra/zuul master: [WIP] Web: plug the authorization engine  https://review.openstack.org/64088422:51
*** smcginnis has joined #zuul22:54
*** jamesmcarthur has quit IRC22:57
*** jamesmcarthur has joined #zuul22:58
*** jamesmcarthur has quit IRC23:00
*** hashar has quit IRC23:10
*** sdake has joined #zuul23:12
*** openstackgerrit has quit IRC23:28
*** sdake has quit IRC23:35
*** rlandy has quit IRC23:45
