Friday, 2018-10-05

*** aluria has quit IRC01:29
*** rlandy has quit IRC03:22
*** openstackgerrit has quit IRC04:52
*** openstackgerrit has joined #zuul04:57
openstackgerritMerged openstack-infra/zuul-jobs master: support passing extra arguments to bdist_wheel in build-python-release  https://review.openstack.org/60790005:22
*** nilashishc has joined #zuul06:45
*** quiquell|off is now known as quiquell|brb06:50
*** nilashishc has quit IRC06:54
*** pcaruana has joined #zuul06:57
*** nilashishc has joined #zuul07:02
*** nilashishc has quit IRC07:06
*** quiquell|brb is now known as quiquell07:08
*** nilashishc has joined #zuul07:08
*** jpena|off is now known as jpena07:10
*** nilashishc has quit IRC07:19
*** nilashishc has joined #zuul07:22
tobiashtristanC: did you remove 'homepage' from the package.json on purpose in the revert of the revert?07:37
tristanCtobiash: yes, it is actually not needed, the default to '/' is fine07:40
tobiashtristanC: that broke my nifty sed to change it in the dockerfile ;)07:41
tristanCtobiash: it also broke my sub-url patch ;)07:42
tobiashtristanC: do you know if that's overridable by an env var during the build?07:42
tristanCtobiash: i don't think so, you'll have to patch the json07:43
tobiashok07:43
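
For reference, a minimal sketch of the sed workaround tobiash describes: splicing a "homepage" key back into package.json before the web build so the UI can be served from a sub-url. The insertion point, the sub-url value, and the use of GNU sed are illustrative assumptions, not taken from the actual Dockerfile:

    # run inside the zuul web/ source tree before "yarn build";
    # re-adds the "homepage" key just before the first "name": entry (GNU sed)
    sed -i '0,/"name":/s|"name":|"homepage": "/zuul/",\n  "name":|' package.json
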
*** aluria has joined #zuul07:47
tobiashtristanC: deployment works now07:52
tobiashtristanC: found an issue: normal click on live log works, new tab of a live log results in 40407:53
tristanCtobiash: hum, it should, what does the link url look like?07:58
tobiashtristanC: https://cc-dev1-ci.bmwgroup.net/zuul/t/cc-playground/stream/c32ac7dfe26d4d4e9ce5d1e578efb7f2?logfile=console.log07:58
tobiashtristanC: is the stream route missing? https://git.zuul-ci.org/cgit/zuul/tree/zuul/web/__init__.py#n58608:00
tobiashI only see console-stream (which is the websocket)08:00
tristanCtobiash: stream route is defined L42 of https://review.openstack.org/#/c/607479/2/web/src/routes.js08:01
tristanCtobiash: the web server shouldn't return a 404, what's the url that fails?08:02
*** nilashishc has quit IRC08:02
tobiashtristanC: the url above08:02
tristanCtobiash: does the other url, e.g. /builds, work?08:03
tobiashtristanC: so there are two types of route? one in the js (for normal clicks) and one in cherrypy (for deep links?)08:03
tobiashtristanC: yes, builds works as deep link08:04
tristanCtobiash: there are web interface routes, i.e. how the index.html loads the page components; that is routes.js08:04
tristanCtobiash: then there are api routes defined in cherrypy08:04
tristanCtobiash: the https://review.openstack.org/#/c/607479/2/zuul/web/__init__.py edits should return the index.html for both '/builds' and '/stream' requests08:05
tristanCor is it not working because of the '?logfile' querystring?08:05
tobiashtristanC: confirmed, 404 goes away when I remove the ?logfile querystring08:07
tobiashbut that breaks the streaming itself ;)08:08
tristanCtobiash: i see, then maybe we need to add "*arg, **kwarg" to the default() method of the static handler08:09
tristanClet me try this quickly08:09
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: Revert "Revert "web: rewrite interface in react""  https://review.openstack.org/60747908:13
tristanCtobiash: ^ should fix that issue08:14
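
For context, a minimal sketch of the kind of change tristanC describes: letting the static handler's default() accept extra path segments and query-string parameters (such as ?logfile=console.log) so deep links get index.html instead of a 404. The class layout and the serve_file call are assumptions, not the actual zuul/web/__init__.py code:

    import os

    import cherrypy


    class StaticHandler:
        """Serve the single-page app entry point for any web-route deep link."""

        def __init__(self, root):
            self.root = root

        @cherrypy.expose
        def default(self, *args, **kwargs):
            # Accept arbitrary path segments (*args) and query-string
            # parameters (**kwargs), e.g. /stream/<uuid>?logfile=console.log,
            # and always hand back index.html; the JS router in routes.js
            # then resolves the actual page component.
            return cherrypy.lib.static.serve_file(
                os.path.join(self.root, 'index.html'))
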
*** nilashishc has joined #zuul08:15
*** nilashishc has quit IRC08:24
*** nilashishc has joined #zuul08:26
*** panda|off is now known as panda08:40
*** electrofelix has joined #zuul08:42
tobiashtristanC: confirmed, this fixes the issue :)08:55
*** chandankumar has joined #zuul09:06
*** chandankumar has quit IRC09:48
*** chandankumar has joined #zuul09:59
*** chandankumar has quit IRC10:26
*** sshnaidm is now known as sshnaidm|off10:35
tobiashmordred: btw, our ansible segfaults are gone with ubuntu based executors10:39
*** nilashishc has quit IRC10:45
*** nilashishc has joined #zuul10:48
*** jpena is now known as jpena|lunch11:04
*** jesusaur has quit IRC11:06
*** jesusaur has joined #zuul11:14
*** quiquell is now known as quiquell|lunch11:20
*** quiquell|lunch is now known as quiquell11:42
*** jpena|lunch is now known as jpena12:06
*** mrhillsman has joined #zuul12:19
mrhillsmanany idea why i see success status and logs but zuul is not reporting back to github and all of our nodepool nodes are stuck in-use? payloads are successfully delivered, jobs are queued up12:19
mrhillsmannodepool is not deleting nodes, zuul is not reporting status back to github12:20
mrhillsmanhttp://status.openlabtesting.org/t/openlab/status.html everything has just been "stuck" for hours12:21
*** rlandy has joined #zuul12:25
tobiashmrhillsman: in your status I see that there are events queued (probably also the result events that trigger reporting)12:25
*** samccann has joined #zuul12:25
tobiashmrhillsman: this is normal during reconfigurations, but not for hours12:26
tobiashmrhillsman: in that case you probably need to check the zuul-scheduler logs for anomalies12:26
mrhillsmanfor a time zookeeper was not reachable12:27
mrhillsman2018-10-05 10:35:47,546 WARNING kazoo.client: Connection dropped: outstanding heartbeat ping not received12:27
tobiashmrhillsman: maybe the mergers have no connection to the scheduler via gearman12:28
mrhillsmanbut that is all that is in scheduler and zookeeper12:28
tobiashmrhillsman: you also might have had connection problems from mergers to the scheduler?12:28
tobiashmrhillsman: currently mergers cannot detect this in some situations12:29
tobiashmrhillsman: you could try to restart mergers and executors12:29
mrhillsmani did restart them not long ago12:29
mrhillsmanwhat is weird is they are all on the same server12:29
mrhillsmanand things were fine until a couple of days ago12:29
tobiashdo you have a log of a merger?12:30
mrhillsmani do12:30
mrhillsmanthere were some errors yesterday but that was when i was fixing previous fail12:32
mrhillsmanonce i got things back and "working"; nodes available to nodepool, all services restarted12:32
mrhillsmani ran into an issue where the websocket was not available so a job would show up but it seemed like the executor could not connect to the node12:33
mrhillsmanit was late and i figured to check it when i got up12:33
mrhillsmanand this is what i woke up to lol12:33
tobiashmrhillsman: maybe the last few log lines of the scheduler could help12:34
mrhillsmanhttps://www.irccloud.com/pastebin/aUV17ODS/12:35
mrhillsmani restarted zookeeper12:35
mrhillsmanbefore the executer and merger restart12:36
mrhillsmanso now the scheduler logs are normal12:36
mrhillsman2018-10-05 12:36:14,549 DEBUG zuul.RPCListener: Received job zuul:status_get12:36
mrhillsman gearman is not showing any jobs12:37
tobiashthat's because I opened your status page link ;)12:37
mrhillsmanhttps://www.irccloud.com/pastebin/Ij9GOUVn/12:37
tobiashhrm, is there something unrelated to zk in the scheduler log before that?12:37
mrhillsmanthere's a lot of those status_get lines12:37
mrhillsmanchecking12:37
tobiashso the mergers are there so my previous theory is wrong12:38
mrhillsmanso there are some lines like this: 2018-10-05 08:01:29,478 DEBUG zuul.layout: Project <ProjectConfig github.com/cloudfoundry/bosh-openstack-cpi-release source: cloudfoundry/bosh-openstack-cpi-release/.zuul.yaml@master {ImpliedBranchMatcher:master}> did not match item <QueueItem 0x7ff9f01e01d0 for <Branch 0x7ff9f01e0a90 cloudfoundry/bosh-openstack-cpi-release refs/heads/wip_s3_compiled_releases updated None..None> in periodic>12:42
mrhillsmanand then things look fine12:43
mrhillsmanoverall things look fine12:43
mrhillsmanthat is the only anomaly12:43
mrhillsmanand there is an error much earlier than that about a particular nodetype not being available12:43
mrhillsmanException: The nodeset "ubuntu-bionic" was not found.12:43
mrhillsmanalso these 2018-10-05 08:00:21,503 DEBUG zuul.Pipeline.openlab.periodic:   <class 'zuul.model.Branch'> does not support dependencies12:44
tobiashmrhillsman: hrm, maybe a thread is stuck12:45
mrhillsmanif i restart the scheduler will all those clear up?12:45
mrhillsmanthe stuff on the dashboard12:45
tobiashmrhillsman: wait a second12:45
mrhillsmanok12:46
tobiashmrhillsman: is your current queue important?12:46
mrhillsmanit is not12:46
mrhillsmani can deal with the fallout12:46
mrhillsmani think i want to kill the periodic and disable bosh jobs for now12:47
tobiashok, you should create a stack dump before the restart so we have a chance to check if a thread was stuck12:47
mrhillsmanok12:47
tobiashyou can send SIGUSR2 to the scheduler process to do that12:47
mrhillsmanthx12:47
tobiashit should print a stack trace of every thread to the log12:47
tobiasha restart after that should be fine (if you're ok with a lost queue)12:48
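
A minimal sketch of the signal step tobiash outlines, assuming the scheduler process can be found by name (the process name is an assumption):

    # dump a stack trace of every scheduler thread into the scheduler log
    kill -USR2 $(pgrep -f zuul-scheduler)
    # once the dump is captured, restart the scheduler (queued items are lost)
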
mrhillsmanok it printed the stack trace12:49
mrhillsmanhttp://paste.openstack.org/show/731586/12:53
mrhillsmanhrmmm...maybe i will not have to restart the scheduler12:54
mrhillsmana bunch of stuff just disappeared12:54
mrhillsmanand nodepool started deleting/building nodes again12:54
mrhillsmanthis is crazy12:54
mrhillsmanjobs are running12:55
mrhillsmanand status updates sent to github12:56
tobiashmrhillsman: the thing that was probably stuck was unlocking a node (line 282 in your stack dump)12:58
tobiashmrhillsman: maybe that has a very long timeout12:58
mrhillsmaninteresting12:59
mrhillsmani wonder if that is a result of something with nodepool13:00
mrhillsmancause all of a sudden all of the nodes that were in-use i guess unlocked and got deleted13:00
mrhillsmanand the executor/merger reported back to github13:00
tobiashmrhillsman: if zuul loses its zookeeper session it automatically loses its locks (that is enforced by zookeeper)13:01
mrhillsmanit was like everything just ground to a halt after the jobs completed13:01
tobiashif that happens nodepool deletes all those nodes that were in-use and unlocked13:01
mrhillsmando you think i need to move zookeeper to its own node13:01
mrhillsmani'll try to debug it a little via the logs13:02
mrhillsmanright now all zuul things are on one node and all nodepool on another13:02
tobiashdo you have zk on ceph or san with sometimes high latencies?13:02
mrhillsmanzk is on the same node as zuul daemons13:03
tobiashin the beginning I had many problems with zk (I'm on ceph) until I made it run on tmpfs (with 5 replica)13:03
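
A minimal sketch of the tmpfs approach tobiash mentions, assuming ZooKeeper keeps its data under /var/lib/zookeeper; the path, size, and ownership are assumptions, and keeping the data dir in RAM only makes sense with enough replicas (tobiash uses five):

    # /etc/fstab: put the ZooKeeper data directory on tmpfs to avoid storage latency
    tmpfs  /var/lib/zookeeper  tmpfs  size=512m,mode=0750,uid=zookeeper,gid=zookeeper  0  0
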
mrhillsmanok, i'll look into things13:03
mrhillsmanwe were running less jobs and i had things spread out then we consolidated13:04
mrhillsmanbut now we have more stuff running13:04
mrhillsmanso making some changes are probably in order13:04
mrhillsmanthx for your help13:04
tobiashno problem13:05
*** samccann has quit IRC13:09
*** samccann has joined #zuul13:10
*** evrardjp has joined #zuul13:25
evrardjpI am curious, is it possible to use a playbook in a job's run: stanza from a required_project, so not from the main project's repo?13:26
AJaegerevrardjp: yes, you can do that - that's what we do the whole time with project-config ;)13:27
evrardjpit seems relative paths don't work: ../<required_projectname>/<playbook_relativepath_in_required_project>.yml13:27
evrardjpAJaeger: opening project-config right now then :)13:27
AJaegerevrardjp: https://zuul-ci.org/docs/zuul/user/config.html#attr-job.roles and http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul.d/jobs.yaml#n11713:28
tobiashevrardjp: no, that is not possible with playbook run stanzas, but you can re-use jobs from different projects13:29
evrardjpI am getting opposite messages there :)13:30
*** panda is now known as panda|off13:31
evrardjpI thought roles were like ansible roles, and therefore had to be called in plays to be units of re-use13:31
AJaegerevrardjp: show us a change and let tobiash and myself review ;)13:31
tobiashevrardjp: AJaeger probably meant roles while you asked for playbooks13:31
tobiashI think there might be a misunderstanding ;)13:32
evrardjpthat is fair, I understand that roles would be the "reusable" unit :)13:32
evrardjpI just didn't want to go for roles if I had to still write my own play. I will rethink this :)13:32
tobiashevrardjp: yes, roles and jobs are reusable, but not playbooks13:32
AJaegertobiash: indeed - I talked about roles ;(13:32
AJaegerevrardjp: so, either roles or jobs - not playbooks13:33
evrardjpyup that's what I expected13:33
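
To illustrate the distinction, a minimal sketch of a job that reuses roles from another project via job.roles while its run playbook stays in the defining project; the job, project, and path names are placeholders:

    - job:
        name: example-deploy
        parent: base
        # roles/ from this repo are added to the Ansible role path on the executor
        roles:
          - zuul: openstack-infra/project-config
        # the playbook itself has to live in the project that defines the job
        run: playbooks/example-deploy.yaml
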
*** quiquell is now known as quiquell|off13:34
*** EmilienM is now known as EvilienM13:57
*** nilashishc has quit IRC14:10
*** panda|off has quit IRC14:36
*** panda has joined #zuul14:37
*** pcaruana has quit IRC15:39
*** jimi|ansible has joined #zuul15:56
openstackgerritClark Boylan proposed openstack-infra/zuul-jobs master: Retry failed git pushses on workspace setup  https://review.openstack.org/60830315:58
clarkbanyone know if ^ will be tested as is or do I need to do something more to test that? in any case I think it is a simple change that should make job pre run setup more reliable15:59
openstackgerritClark Boylan proposed openstack-infra/zuul-jobs master: Retry failed git pushses on workspace setup  https://review.openstack.org/60830316:00
logan-clarkb: I think you'll need register and until attrs on that task for retry to work there. (see https://docs.ansible.com/ansible/2.5/user_guide/playbooks_loops.html#do-until-loops)16:09
clarkblogan-: ya it wasn't clear to me if the until is necessary if normal failure checking was good enough16:10
clarkblogan-: if the current failure checking of the task is good enough, do I need an explicit until to say 'until this succeeds'?16:10
clarkbor is that implied by retries > 0?16:10
logan-just register: git_clone / until: git_clone is success should be sufficient16:10
logan-yeah i think you have to specify it anyway16:10
logan-based on the note "If the until parameter isn’t defined, the value for the retries parameter is forced to 1."16:11
clarkbok16:11
* clarkb updates16:11
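
A minimal sketch of the register/until pattern logan- points to, applied to a git push task like the one under review; the task command and variable names are placeholders, not the actual zuul-jobs role:

    - name: Push the prepared repo to the remote workspace
      command: git push --mirror ssh://example-node/workspace/repo  # placeholder
      register: git_push
      until: git_push is success
      retries: 3
      delay: 5
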
*** jpena is now known as jpena|off16:12
openstackgerritClark Boylan proposed openstack-infra/zuul-jobs master: Retry failed git pushses on workspace setup  https://review.openstack.org/60830316:12
*** ianychoi_ is now known as ianychoi16:19
*** pcaruana has joined #zuul16:42
*** pcaruana has quit IRC16:50
*** nilashishc has joined #zuul16:53
pabelangerclarkb: I can test it via ansible-network, it is an untrusted job for us17:03
clarkbpabelanger: if you don't mind doing that and reviewing based on results I would appreciate it greatly17:03
pabelangersure17:04
clarkbthe jobs hit by this are retried due to failing in pre-run; not needing to spin up new test nodes for that, and instead delaying a few seconds and retrying, should be a benefit17:04
pabelangerclarkb: ah, mirror-workspace-git-repos, sorry. We are not using that yet. I was planning on trying to implement that shortly17:06
pabelangerin this case, you'll need to propose a new mirror-workspace-git-repos-test role, land it, then use base-test to test17:06
clarkbany idea if the new role can be a symlink to the existing one or does it have to be a proper copy?17:07
pabelangeryah, proper copy. roles will be loaded executor side, and I don't think zuul will allow symlinks17:10
clarkbI guess it would also have to merge as well because it goes in base-test17:11
pabelangeryes17:11
pabelangerI'm hoping to do the git mirror things in untrusted when testing with ansible-network, but not sure it can because of the git mirror --push logic17:12
pabelangerI think zuul will block it17:12
*** nilashishc has quit IRC17:17
openstackgerritMerged openstack-infra/nodepool master: Run release-zuul-python on release  https://review.openstack.org/60764917:17
ShrewsCan someone else please +3 this race fix for a nodepool test? https://review.openstack.org/60467817:19
clarkbShrews: I'll take a look17:22
Shrewsclarkb: thx17:27
openstackgerritJeremy Stanley proposed openstack-infra/zuul-website master: Update events lists/banner after 2018 Ansiblefest  https://review.openstack.org/60832017:31
tobiashShrews: that failed in the gate because of a zk problem. I also see this sometimes in zuul. Maybe we should increase the session timeout and/or place zk data on tmpfs during tests17:47
Shrewstobiash: it was only 4 seconds between connecting to zk and losing the connection. i don't think either of those would fix that. i've seen it before too, but have no idea what causes it17:51
tobiashHrm, shouldn't the default timeout be 10s?17:53
tobiashBefore the end of the session timeout the connection state cannot be lost but only suspended17:54
*** electrofelix has quit IRC18:10
*** jesusaur has quit IRC19:16
*** jesusaur has joined #zuul19:23
openstackgerritMerged openstack-infra/zuul-website master: Update events lists/banner after 2018 Ansiblefest  https://review.openstack.org/60832019:29
openstackgerritClark Boylan proposed openstack-infra/zuul-jobs master: Retry failed git pushses on workspace setup  https://review.openstack.org/60830319:31
openstackgerritClark Boylan proposed openstack-infra/zuul-jobs master: Add test workspace setup role  https://review.openstack.org/60834219:31
clarkbok ^ with https://review.openstack.org/608343 should test this retry change19:33
clarkbpabelanger: logan- AJaeger ^ fyi19:33
pabelanger+219:37
openstackgerritJames E. Blair proposed openstack-infra/zuul master: WIP: docker-compose quickstart example  https://review.openstack.org/60834419:46
openstackgerritMerged openstack-infra/nodepool master: Fix race in test_launchNode_delete_error  https://review.openstack.org/60467820:18
*** samccann has quit IRC20:33
*** EvilienM is now known as EmilienM22:06
*** rlandy has quit IRC22:24
openstackgerritJames E. Blair proposed openstack-infra/zuul master: WIP: docker-compose quickstart example  https://review.openstack.org/60834422:25
openstackgerritJames E. Blair proposed openstack-infra/zuul master: WIP: docker-compose quickstart example  https://review.openstack.org/60834422:26

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!