Wednesday, 2017-06-28

00:18 *** dmsimard has quit IRC
00:20 *** dmsimard has joined #zuul
00:29 <mordred> jeblair: yes - that was my thinking - the string there is a thing for a log entry
00:29 <mordred> jeblair: however, I don't feel strongly one way or the other
00:35 <mordred> jeblair: also - yes, your playbook in the etherpad looks right to me
01:13 *** xinliang_ has quit IRC
01:24 *** xinliang_ has joined #zuul
02:33 *** timrc has quit IRC
02:42 *** timrc has joined #zuul
02:54 <xinliang_> tristanC: I figured out the reason why my CI won't trigger jobs. I was confused by the two projects: openstack-dev/ci-sandbox and openstack-dev/sandbox
02:55 <xinliang_> My CI is listening to the ci-sandbox project, but I added the recheck comment to the sandbox project, so it didn't trigger a job ;-)
02:55 <xinliang_> now it is working, thanks.
02:57 <tristanC> xinliang_: nice! good to hear :-)
02:57 <xinliang_> :-)
03:00 <tristanC> xinliang_: you might want to set up the zuul status web page, it's quite handy for checking what's going on. For example using http://git.openstack.org/cgit/openstack-infra/puppet-zuul/tree/templates/zuul.vhost.erb
03:01 <tristanC> and copying zuul/etc/status/public_html over to /var/lib/zuul/www
03:02 <xinliang_> tristanC: sounds great! will try to set it up
03:27 <xinliang_> tristanC: The zuul web is already set up by puppet.
03:29 <xinliang_> tristanC: One other question: do you know of any way to run CI jobs on bare metal machines?
03:30 <xinliang_> currently, all the openstack CI jobs are running on VMs, right?
03:33 <tristanC> xinliang_: with jenkins, you can set up a jenkins slave on bare metal, and if it has the right node label, then it will pick up zuul jobs
03:34 <tristanC> xinliang_: with zuul-launcher, you can set up custom nodes in zuul.conf so that jobs get scheduled directly to a specific node
03:35 <tristanC> xinliang_: and yes, currently all the openstack CI jobs run on ephemeral VMs, so that the environment is identical between runs
03:41 <xinliang_> tristanC: thanks, I heard that openstack CI has migrated away from jenkins, so jobs are only managed by zuul, right?
03:42 <tristanC> xinliang_: yes, zuul 2.5 has a zuul-launcher service that can execute jenkins-job-builder jobs in place of jenkins
03:43 <tristanC> xinliang_: and the next version (zuul v3) will replace the jjb definitions with ansible playbooks
03:44 <xinliang_> that sounds good, which makes the ci system simpler
03:44 <xinliang_> And if we run jobs on bare metal machines, is there any upstream tool for managing the metal machines?
03:46 <tristanC> not afaik, you'll have to build something custom, with ironic for example
03:47 <xinliang_> ok
03:47 <tristanC> at least you can prevent a slave from running more than one job using the OFFLINE_NODE_WHEN_COMPLETE job parameter
03:48 <clarkb> nodepool works with ironic and nova
03:48 <clarkb> people are using it that way for some ci
03:49 <xinliang_> clarkb: great! good to hear this.
03:50 <tristanC> clarkb: oh, didn't know that, good to know!
05:09 *** lennyb has quit IRC
05:21 *** lennyb has joined #zuul
05:51 *** isaacb has joined #zuul
06:31 *** isaacb has quit IRC
08:37 *** isaacb has joined #zuul
08:47 *** hashar has joined #zuul
10:00 *** jkilpatr has quit IRC
10:34 <openstackgerrit> Jamie Lennox proposed openstack-infra/zuul feature/zuulv3: Common threading layer  https://review.openstack.org/478466
10:58 *** jkilpatr has joined #zuul
11:41 *** jkilpatr has quit IRC
11:52 *** dkranz has quit IRC
12:01 *** jkilpatr has joined #zuul
12:11 *** jkilpatr has quit IRC
12:12 *** jkilpatr has joined #zuul
12:28 *** jkilpatr has quit IRC
13:10 <openstackgerrit> Jamie Lennox proposed openstack-infra/zuul feature/zuulv3: Common threading layer  https://review.openstack.org/478466
13:27 *** dkranz has joined #zuul
14:44 *** rcarrill1 is now known as rcarrillocruz
14:58 *** isaacb has quit IRC
15:37 *** jkilpatr has joined #zuul
15:56 <jeblair> mordred: i think we're going to run into tobiash's role-repo problem soon.  i think we hadn't noticed it because of our openstack-zuul-jobs/openstack-zuul-roles split, but as we move roles into zuul-jobs and project-config, we're going to need to name those same repos as role repos.
16:04 <mordred> jeblair: yay!
16:04 <jeblair> mordred: so i was thinking i might take a stab at implementing that rather than writing it up as a story :)
16:05 <jeblair> mordred: can you take a look at 478313 and 478315 ?
16:05 <jeblair> i'd like to merge those and 478311 and restart the executor and see if we can't get log publishing going
16:08 <mordred> jeblair: ++
16:09 <mordred> jeblair: for 478313 - as a temp workaround while you fix the tobiash bug, we could also just add openstack-infra/zuul-jobs to the roles list of the base job
16:09 <jeblair> mordred: yeah i think we'll need to do that
16:10 <mordred> jeblair: I can do that as a followup patch
16:11 <jeblair> mordred: cool
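For reference, adding a repo to a job's roles list in zuul v3 looks roughly like this (a sketch; the actual base-job definition in project-config may differ):

    - job:
        name: base
        roles:
          # Make the Ansible roles in openstack-infra/zuul-jobs available
          # to this job and everything that inherits from it.
          - zuul: openstack-infra/zuul-jobs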
16:11 <mordred> jeblair: oh - I just realized something - hosts: all roles: collect-logs - is the collect-logs role handling overlapping log file names?
16:11 <jeblair> mordred: it will handle them by clobbering them right now.  :)
16:11 <mordred> :)
16:11 * mordred will file story for that
16:12 <tobiash> mordred: you might want to encode inventory_hostname into the destination dir
16:12 <jeblair> mordred: so that's an interesting thing -- do we want the collect-logs to ... what tobiash said :)
16:12 <jeblair> mordred, tobiash: maybe yes, if there's more than one host, but no if there's only one host
16:13 <jeblair> (that way you don't have to navigate to logs/ubuntu-xenial/subunit.html for a unit test)
16:14 <tobiash> jeblair: that should be possible with ansible
16:14 <mordred> https://storyboard.openstack.org/#!/story/2001092
16:14 <mordred> funny - I wrote all of those things into the story :)
16:15 <jeblair> it's like we're all on the same page :)
16:16 <mordred> there's probably a third case - which is "what if a job wants to do pre-rationalization" - like I could imagine devstack maybe wanting the logs from the 'controller' node to show up in the base logs dir as "the logs" - but still have subdirs for each additional node
16:16 <mordred> which I don't think the base job needs to know about
16:16 <mordred> but maybe there should be a way for a job to do some log collection and then say "I've already done this work, please do not collect logs for me"
16:18 <mordred> I added a note about that to the story
16:18 <jeblair> mordred, tobiash: or... perhaps we leave the role as is -- it copies every logs/ dir from every host, which is fine for the single-host case.  and if you write a multi-node job, well, it's not going to do anything by default.  you just expect to pre-rationalize them.  ie, devstack may pull all of its logs into the logs/ dir on the controller, in whatever layout it wants.
16:19 <tobiash> jeblair: that would also be an option
16:19 <jeblair> (i mean, a job has to *put* something into the logs/ dir on the node -- so just say that the job needs to be responsible for understanding what it's putting there)
16:21 <mordred> jeblair: I like that as an answer to the follow-up about "what to do about advanced jobs like devstack" ... I kind of like the system automatically handling multi-node jobs with host directory logs
16:22 <mordred> since it's kind of a neat way to support multi-node - and if the rationalize step is just "is there content in only one hostname dir in the local logs dir? if so, collapse" - then devstack can simply move all of the logs to the 'controller' node
16:23 <jeblair> mordred: yeah, if we implement it that way, we may have the best of both worlds.  if you accidentally or on purpose have content in logs/ on more than one host, we rationalize it for you.  if you know what you're doing, you don't end up in that position and we do nothing.
16:24 <mordred> ++
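A minimal Ansible sketch of the collect-then-collapse idea just agreed on (paths and the log_root variable are illustrative, not the actual collect-logs role): pull logs/ from every node into a per-hostname directory on the executor, then flatten when only one host produced anything:

    # Pull each node's logs/ into a per-hostname directory.
    - hosts: all
      tasks:
        - name: Collect logs from each node
          synchronize:
            mode: pull
            src: "{{ ansible_user_dir }}/logs/"
            dest: "{{ log_root }}/{{ inventory_hostname }}/"

    # Collapse the per-hostname layer when only one host had logs.
    - hosts: localhost
      tasks:
        - name: Find per-hostname log directories
          find:
            paths: "{{ log_root }}"
            file_type: directory
          register: host_dirs

        - name: Flatten single-host logs into the top of log_root
          shell: mv {{ host_dirs.files[0].path }}/* {{ log_root }}/
          when: host_dirs.files | length == 1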
* tobiash is heading home now16:24
*** bhavik1 has joined #zuul16:30
*** bhavik1 has quit IRC16:32
dmsimardThe zuul executor nodes, are they localized to each region or is it a global zuul cluster somewhere ?16:34
dmsimardhttps://github.com/openstack-infra/system-config/blob/master/hiera/common.yaml suggests it's a common cluster somewhere16:35
mordreddmsimard: common cluster16:36
mordreddmsimard: the communication between scheduler and executors is over gear - communication between regions from executors to nodes is over ssh - which is better suited to WAN traffic16:37
dmsimardmordred: fair, I guess the hidden question behind that was did you ever consider localizing more parts of the infrastructure (such as logs) -- cause that common cluster (for example) could end up pulling logs from an EMEA region to USA16:38
dmsimardI'm not really concerned about things like data privacy laws, but more around latency and throughput16:38
dmsimardI guess decentralizing the infrastructure would have a cost on complexity/management16:39
dmsimard(not just in software but in human resources)16:39
mordreddmsimard: it has come up in conversation a few times (for openstack, we do, in fact, pull logs from EMEA to USA) - but doing so would require teaching zuul more about clouds and regions, and at the moment that's not information zuul takes in to account when scheduling16:40
dmsimardyeah, that logic lives in nodepool.16:40
mordreddmsimard: since it's been fine so far at infra-scale, the idea hasn't really bubbled up to the point of being a problem that needs solving16:40
dmsimardalright, thanks :)16:40
clarkbfwiw our bandwidth between emea and north america is really great16:41
clarkbso as long as you aren't sensitive to latency you tend to not have problems16:41
clarkbour clouds in europe seem to have really solid network setups (ovh and citycloud)16:42
dmsimardyeah, it's also okay not to prematurely optimize things until there's a real problem16:42
dmsimardif it works at infra scale it's good enough for me :)16:43
mordreddmsimard: :)16:43
mordreddmsimard: it's nice that we have a stupily big version for data isn't it? :)16:43
clarkb(afs is sensitive to latency so I've got a couple ideas to test in order to make that better, but currently focused on gerrit upgrade items)16:43
dmsimardmordred: the 14x1TB cinder volume over LVM is still a bit mind boggling :)16:44
clarkbdmsimard: and its too small!16:44
clarkb:)16:44
dmsimardclarkb: I know right, part of why decentralizing that would perhaps ease the burden :p16:45
dmsimardbut fungi said it's a WIP to move to a bigger node/provider (at which point does it make sense to scale horizontally instead?)16:45
clarkbpotentially. We want to switch providers to avoid volume maintenance taking out the entire fs as first priority16:46
clarkbbut sounds like there may be some room for scaling up too16:46
clarkbthe problem with 14 1TB volumes under a single fs is that anytime that cloud does maintenance on their volumes at least one of our volumes seems to be affected16:47
clarkbs/the/a/16:47
dmsimardyeah that's awful16:47
*** hashar has quit IRC16:48
fungiit's an outage magnet, more or less16:49
fungicoupled with needing a day or more outage to our ci system to perform a full fsck when that happens16:50
fungiso, so, so many tiny files16:50
*** jkilpatr has quit IRC16:50
mordreddmsimard: one of the biggest issues with de-centralized log storage is that we'd lose a global index for the logs - knowing the change id would no longer be enough to find the build logs - you'd also have to know what cloud it ran on16:51
fungidecentralized log storage seems fine to me as long as there's a centralized index16:51
mordreddmsimard: this, of course, is solvable, as all things are - mostly just pointing out one of the issues that can make it slightly more complex to deal with16:51
mordredfungi: yup16:51
jeblairsomeone said something about a zuul dashboard... :)16:51
fungiif only the thing deciding where the logs go knew where the logs went...16:52
clarkbjust tested to reassure myself and we get about 10MBps from gra1 to dfw on a 57MB file16:52
jeblairspeaking of logs, i'm going to try restarting the executor now16:52
clarkbwhich isn't bad for crossing an ocean. Better than my home connection16:52
jeblairand i will restart the scheduler16:57
17:04 <rcarrillocruz> hey folks, noob question: I have a nodepool node stuck in a locked state and can't delete it. I assume this can be unlocked with some zookeeper command?
17:05 <Shrews> rcarrillocruz: weird. what state does ZK report the node in?
17:05 <Shrews> READY, BUILDING, etc
17:05 <rcarrillocruz> deleting
17:06 <Shrews> rcarrillocruz: hmm, is the cleanup thread running?
17:06 <rcarrillocruz> should be, cos I can delete other nodes just fine
17:06 <rcarrillocruz> but can't this one
17:07 <jlk> o/
17:08 <Shrews> would be good if we could figure out how it got into this state
17:08 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Create a new logger for gerrit IO  https://review.openstack.org/478566
17:09 <rcarrillocruz> Shrews: can't really tell how i got there, any hint how to unlock the node to delete it tho?
17:10 <Shrews> rcarrillocruz: do you have zk-shell?
17:10 <rcarrillocruz> let me see
17:10 <rcarrillocruz> nope, is that on pypi?
17:10 <Shrews> https://github.com/rgs1/zk_shell
17:10 <rcarrillocruz> yah, pip installing it
17:11 <jeblair> mordred: speedy +3 on 478570 pls?
17:11 <rcarrillocruz> lol, it installs twitter libs ?
17:11 <rcarrillocruz> anyway, i'm in the zk-shell *shell* now Shrews
17:12 <rcarrillocruz> oh neat
17:12 <rcarrillocruz> so
17:12 <rcarrillocruz> zookeeper has a tree structure
17:12 <Shrews> rcarrillocruz: using that, just rm the znode
17:12 <jeblair> wait
17:12 <rcarrillocruz> ls nodepool/nodes
17:13 <jeblair> do not rm the node
17:13 <jeblair> give me a second to find some debug commands
17:14 * rcarrillocruz waits
17:14 <mordred> jeblair: done
17:14 <jeblair> (it's never okay for a nodepool admin to have to use zk-shell, so we need to debug any time it happens)
17:14 <jeblair> rcarrillocruz: run 'dump' and paste the output please
17:16 <jeblair> you can actually run that without zk-shell
17:17 <jeblair> echo dump|nc localhost 2181
17:17 <jeblair> will work
17:17 <jeblair> https://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#The+Four+Letter+Words
17:17 <jeblair> (but zk-shell also supports it)
17:17 <rcarrillocruz> there you go
17:17 <rcarrillocruz> https://paste.fedoraproject.org/paste/FmmFyItw~8KpVWFZIVz0wA
17:19 <jeblair> so it is the poolworker that holds the lock
17:20 <jeblair> rcarrillocruz: are you running with debug level logging?
17:22 <rcarrillocruz> i'm not running nodepool in the foreground with a debug level set, if that's what you're asking; it's installed with an ansible role and started with systemd scripts. I can tail launcher-debug.log
17:23 <jeblair> rcarrillocruz: no, i just meant debug logs -- if "DEBUG" entries show up in launcher-debug.log, that's what i'm looking for
17:23 <rcarrillocruz> yeah, they show up
17:23 <jeblair> rcarrillocruz: can you grep "00166" in launcher-debug.log?  it will also have a server UUID on the "Waiting for server" line.  also grep for that UUID and pastebin both please?
17:25 <rcarrillocruz> http://paste.openstack.org/show/613957/
17:25 <rcarrillocruz> that's the last hit
17:25 <rcarrillocruz> otoh
17:25 <rcarrillocruz> i spotted an error
17:25 <rcarrillocruz> sec, i'll paste
17:26 <rcarrillocruz> http://paste.openstack.org/show/613958/
17:26 <jeblair> rcarrillocruz: perfect, thanks!
17:27 <rcarrillocruz> sorry for the rightmost dots, i'm in a tmux session, some other user must have a different window size
17:27 *** Shuo has joined #zuul
17:27 <jeblair> rcarrillocruz, Shrews: i wonder, since we have a stuck ephemeral node, if the best solution might be to restart the launcher rather than manually delete the zk node?  Shrews, what do you think?
17:28 <Shrews> that would work
17:28 <jeblair> rcarrillocruz: oh, that traceback is useful, though i'm not convinced that's the problem
17:28 <jeblair> i think it may be from a different thread
17:29 <jeblair> rcarrillocruz: were there any more interesting log entries around 2017-06-27 17:01:12,489 ?
17:29 <rcarrillocruz> let me look
17:29 <Shrews> rcarrillocruz: did you shutdown/restart or otherwise do something to zookeeper while nodepool was running?
17:29 <jeblair> (that was the "deleting ZK node" line)
17:35 *** bhavik1 has joined #zuul
17:42 *** bhavik1 has quit IRC
17:43 <Shuo> What's zuul v3's dependency on nodepool? Can a zuul environment work without it?
17:43 <jlk> Zuul relies on nodepool to provide nodes on which to execute the tests
17:44 <jlk> s/tests/jobs/
17:44 <SpamapS> Shuo: I'd say that zuulv3 needs _something_ to be on the other side of ZK. Nodepool is currently the only thing that implements that, so it's an effective dependency, but you could, in theory, write something else to do nodepool's job.
17:44 <jlk> Without nodepool, there are very limited things Zuul can do
17:45 <SpamapS> I've oft wondered whether, when we get to a place of stability in that interface, we could write smaller shims for clouds that have their own nodepool-ish things. Like AWS autoscale groups, for instance.
17:46 <SpamapS> sans nodepool, you can only do noop jobs.. I think?
17:48 <mordred> SpamapS: yah - although those are also likely good backend modules for nodepool itself
17:49 <mordred> a heat driver got brought up the other day, for instance - but there are some design questions we'll need to think about in such cases
17:51 <tobiash> I think it could be possible to run jobs without a nodeset locally on the executor
17:51 <jeblair> yeah, there's enough overhead to the protocol and resource management that i think it's going to be useful to think of most species of that problem as another nodepool driver (possibly even an out-of-tree driver in the future).  but, yeah, if we run into a problem from a different genus, that might be a good option for just writing to the zk api.
17:52 <jeblair> tobiash: correct; a job with no nodes can still do things on the executor.  that could come in handy for things like "trigger this remote webhook".
17:52 <Shuo> SpamapS: I wonder if something like Kubernetes/Mesos could be placed in nodepool's role
17:52 <tobiash> that would restrict the jobs to trusted jobs (not sure it makes any sense for a nodepool-less deployment)
17:53 <jeblair> tobiash: untrusted jobs run on the executor too and can still be useful.  like that webhook example.  it doesn't need to be trusted.
17:53 <Shuo> I am asking this because we are not an OpenStack shop, nor a VM shop (in our data center)
17:53 <jlk> SpamapS: you can't even do a "noop" job without nodepool. That doesn't work outside of the test runner
17:54 <mordred> Shuo: writing a k8s/mesos driver for nodepool is definitely a thing that we should do
17:54 <tobiash> jeblair: ah, right, missed things like that
17:55 <jeblair> Shuo: yeah, kubernetes, etc, support in nodepool is on the roadmap, but we don't know what it will look like yet.  mesos may be easier.  :)
17:55 <jlk> I think there's some code needed to allow zuul to run a job without a node
17:55 <Shuo> jlk: if my plan is to avoid the dependency on nodepool (and, say, I can bear with a static resource capacity at this moment), can I have some kind of setup?
17:55 <jeblair> jlk: i think it should work -- that's actually the code path for about 85% of the tests :)
17:56 <jeblair> Shuo: we plan on using nodepool to manage static resources as well.  there are actually patches in progress for that.
17:56 <jlk> jeblair: oh? I'd like to know more! I thought it was special-cased for the test runner.
17:56 <mordred> Shuo: static resource capacity is another thing that should come from nodepool - there isn't really any intent to modify zuul to not need nodepool - however, tristanC is working on the patches to allow nodepool to have statically defined resources
17:56 <mordred> Shuo: as that's a thing many people need
17:57 <Shuo> jeblair: that would be great because my current thought is to make a simple cluster to make the case for zuul.
17:57 <mordred> Shuo: ++
17:57 <jeblair> jlk: i *think* it's just an empty node request.  it might still go out to nodepool to be satisfied (which is silly).  let's test it later after i get logs working.  :)
17:57 <jlk> fair enough
18:00 <SpamapS> jlk: oh, fake nodepool to the rescue I guess ;-)
18:00 <SpamapS> funny story...
18:00 <SpamapS> I think if you had nodeset-less jobs..
18:00 <SpamapS> you could just scale out executors
18:00 <SpamapS> with "whatever method you like"
18:00 <Shuo> jeblair: "let's test it later after i get logs working.  :)" sounds like a very quick ETA? My time horizon is around 5 weeks :-)
18:00 <SpamapS> and if you don't want to let job definers define their host OS... that would be a viable Zuul v3 use case.
18:01 <Shuo> mordred: glad this resonates :-)
18:01 <SpamapS> like if all you want to let them do is run stuff that will also be on the executor.. I see no problem with that.
18:01 <SpamapS> (it's not something I want... but.. I can see a case for it)
18:02 <jlk> I could see a case for it
18:02 <jlk> zuul as a relay agent to other systems
18:02 <jlk> handle ingest, gate keeping, and reporting, but farm "work" out to somewhere else.
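A sketch of what such a nodeset-less job could look like in zuul v3 configuration (the job and playbook names are hypothetical); with an empty nodeset, the run playbook executes only on the executor:

    - job:
        name: notify-webhook
        # No nodes requested from nodepool: the playbook runs on the
        # executor itself, as discussed above.
        nodeset:
          nodes: []
        run: playbooks/notify-webhook.yaml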
Shuojeblair: "mesos may be easier.  :)" if that's something I can manage to contribute to (bear in mind that I don't know any zuul internals), I'd love to take someone to my plate.18:03
jlkShuo: that'd be more of a nodepool contribution18:03
Shrewstfw you spend an hour debugging a client/server interaction to realize a lack of \n caused all of your problems18:03
Shuojlk: do we have an up-to-date nodepool archtiecture diagram?18:04
jlkShuo: I do not know the answer to that18:04
Shuojeblair: do we have an up-to-date zuul (working with nodepool) architecture diagram?18:04
jeblairShuo: nope18:05
Shrewswell, we have this (https://github.com/openstack-infra/nodepool/blob/feature/zuulv3/doc/source/devguide.rst) though we've let that get slightly out of date. but it is close18:05
Shrewsand only covers the builder18:05
ShuoShrews: thanks.18:06
jeblairShuo: we're just now working out what the internal nodepool architecture is for multiple drivers18:06
Shrewsoh, yeah, def no diagrams for the drivers18:06
Shuojeblair: not meant to interrupt or demand any commitment, but if documentation is something that I can help on, I'd love to help on as I hope to make zuul internally well discussed.18:07
ShuoShrews: thanks for sharing. btw, are you working on nodepool specifically?18:08
jeblairShuo: you may be interested in taking a look at 463328 and its 12 child changes18:08
jeblairShuo: https://review.openstack.org/46332818:08
ShrewsShuo: i rewrote most parts of nodepool to implement jeblair's v3 specs. not actively working on any changes to it right now.18:09
*** jkilpatr has joined #zuul18:09
* SpamapS now wonders about something like a set of ansible modules that manage mesos things that would just run on the executor.18:10
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Make sure playbook and role repos are updated  https://review.openstack.org/47762718:10
mordredSpamapS: yah - passing in mesos/k8s endpoint information as part of the inventory is definitely a thing thathas come up in the discussions we've had so far when we've looked in to container workloads intended for such systems - there's a few fun and complex issues that come up once the whiteboard comes out that it's going to be a fun conversation when we all have the brainspace to tackle it18:13
jeblairSpamapS: it would work, but i don't think it's a good fit for most kinds of jobs.  we gave up on that architecture with jenkins 6 years ago when openstack was 1/200th the size it is now.  managing resources separately from job execution makes a lot of things better at all scales.18:14
ShuoShrews: reading the pointer you shared now. maybe a dump question, can you give an example of "BuildWorker" (is that a single machine or is that an endpoint of a driver such as mesos master)?18:15
jeblairmordred, SpamapS: we need to land https://review.openstack.org/477627 and restart the executor again.  we're running into the problem that fixes.18:16
mordredjeblair: +2 from me - SpamapS you got a sec to look at it?18:16
* SpamapS looks18:19
ShrewsShuo: that diagram is not useful to you in relation to drivers. It's for the nodepool-builder (nodepool/builder.py). You would be interested in nodepool-launcher (nodepool/launcher.py).18:19
SpamapSso many updates18:20
SpamapSmordred: I feel like maybe mesos and k8s include better resource management themselves than jenkins + openstacks. But I could be wrong.18:21
* SpamapS hears Dak ... STAY ON TARGET...18:21
SpamapSI +3'd 477627, but do we need to kick it to get it into the gate?18:22
SpamapSah no, it's just rechecking18:22
mordredSpamapS: I'm not necessarily agreeing or disagreeing with that - I'm saying there's a bunch of other bits that get tricky, and it'll be a fun problem to work on in a few months because I think solving it will be valuable to many people18:23
SpamapSmordred: Right.. first let's get these torpedos into the exhaust port.18:29
mordredSpamapS: ++18:31
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Make sure playbook and role repos are updated  https://review.openstack.org/47762718:32
Shuojeblair: "https://review.openstack.org/463328" looks like a great start point for me. How do I start picking it up?18:35
jeblairShuo: i'm not sure i understand the question18:36
Shuojeblair: https://review.openstack.org/463328 is an active CL under review, how should I jump in and start working on it?18:38
openstackgerritTobias Henkel proposed openstack-infra/zuul feature/zuulv3: Fix and test report urls for unknown failures  https://review.openstack.org/46921418:46
jlkoh crap, I think I need to change my docker file for python3 needs now18:46
jeblairShuo: feel free to log into gerrit and submit comments; there are 12 patches after that one too18:48
Shuojeblair: thanks. leave for lunch for now, will be back and try to catch up in the afternoon. by now...18:49
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: Add web-based console log streaming  https://review.openstack.org/46335318:53
*** Shuo has quit IRC19:06
jeblairmordred: i'm not seeing any output from the post playbook: http://paste.openstack.org/show/613976/19:09
jeblairmordred: i'm re-running with verbose19:09
mordredjeblair: hrm. let me look in just a sec19:18
*** Guest28796 is now known as mgagne19:19
*** mgagne has quit IRC19:19
*** mgagne has joined #zuul19:19
20:20 *** Shuo has joined #zuul
20:21 <jeblair> i'm going to temporarily stop the nodepool launcher as a poor substitute for lacking autohold
20:28 *** Shuo has quit IRC
20:28 *** Shuo has joined #zuul
20:44 <mordred> jeblair: heya - sorry - I had a phone call with dkranz - any current debugging state I should start looking at? or just start from scratch?
20:45 <jeblair> mordred: nah, verbose didn't give any more output; i'm going to "hold" a node ^ so i can poke
20:45 <mordred> jeblair: kk
20:48 <jeblair> mordred: while i'm waiting -- i just realized that pabelanger implemented a log pull in the tox job -- so i think he was imagining that any jobs with interesting logs would do their own pulls back to the executor, whereas i imagined they would move things into ~/logs on the node.  i'm not sure my idea is better.  we may want to go with the pabelanger plan for a bit and see what that looks like.
20:50 <clarkb> jeblair: with proper secrets they don't have to use the executor as a staging area, right?
20:50 <clarkb> seems like avoiding that if possible would be good for bandwidth reasons
20:52 <jeblair> clarkb: technically... yeah, we could rsync directly from node to logserver; however, that means the node has some kind of credential access (not sure what ansible does there... agent forwarding?) after the node has run an untrusted job, so i get itchy about that.
20:52 <jeblair> clarkb: v2.5 does the staging thing and we haven't really noticed :)
20:53 <clarkb> that's true, and we staged with jenkins for years too (that is how scp worked)
20:58 <jeblair> mordred: there's some output (like the "PLAY" and "TASK" lines) going to the ZUUL_JOB_OUTPUT_FILE that isn't ending up in the zuul log because it's not going to stdout; i think that's normal -- but did we discuss maybe having everything go to stdout when we use -vvv?
21:03 <jeblair> in ansible, i want to have a "when:" conditional based on whether an attribute/key/whatever is defined.  in other words, sometimes "zuul.newrev" will be defined, sometimes it won't.  "when: zuul.newrev" throws an error when it doesn't exist... what's the right way to do that?
21:03 <mordred> jeblair: maybe? I mean, we can definitely do that
21:03 <jeblair> maybe jinja with a default value...?
21:04 <mordred> yah
21:04 <mordred> when: zuul.newrev | default(None) or something like that
21:05 <jeblair> [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: {{ zuul.newrev | default(None) }}
21:05 <jeblair> i think it *worked* since it's just a warning....
21:07 <jeblair> oh, it just doesn't want {{}} because it's already a jinja string
21:07 <jeblair> yeah, so just dropping that works
21:07 <Shuo> jeblair: can you add me to the reviewers list? I don't seem to be able to publish my comment on https://review.openstack.org/#/c/463328/ for some reason (though I logged onto gerrit)
21:07 <jeblair> Shuo: that shouldn't be necessary -- what happens when you try to publish it?
21:08 <jeblair> mordred: apparently "zuul.newrev is defined" is also a thing
21:08 <mordred> jeblair: that seems clearer
21:08 <jeblair> mordred: ya, hard to argue with that :)
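As a playbook fragment, the form jeblair settled on looks like this (the task itself is illustrative):

    - name: Act only on ref-updated events where zuul.newrev exists
      debug:
        msg: "New revision: {{ zuul.newrev }}"
      # A bare "when: zuul.newrev" errors when the key is absent;
      # "is defined" handles both the present and absent cases.
      when: zuul.newrev is defined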
21:09 <Shuo> jeblair: I have written a full comment, and can 'save' it (now it's a draft in my web view) but can't publish -- sorry, long time no use gerrit :-)
21:10 *** dkranz has quit IRC
21:10 <jeblair> Shuo: after you do that, there's a "Reply..." button at the top of the main change screen; that should publish your comments
21:13 <jeblair> mordred: i don't see a way to set the user with add_host, but we need to log in as 'jenkins' to static.openstack.org (which we add to the inventory with add_host)
21:15 <jeblair> mordred: oh, wait, maybe i just add an ansible_user variable and it does it... trying
21:15 <Shuo> jeblair: done
21:15 <jeblair> Shuo: thanks!
21:15 <mordred> jeblair: yes - should be ansible_ssh_user iirc
21:16 <mordred> nope. ansible_user - ansible_ssh_user is the old one
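So the inventory addition under discussion would look something like this (host and user are from the conversation; the group name is illustrative):

    - name: Add the log server to the inventory
      add_host:
        name: static.openstack.org
        groups: logservers
        # Extra key=value arguments to add_host become host variables;
        # ansible_user sets the login user (ansible_ssh_user is the
        # older, deprecated spelling).
        ansible_user: jenkins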
21:16 <Shuo> also, sent an email to your @redhat.com, please take a look once you get a chance...
21:16 <Shuo> jeblair: also, sent an email to your @redhat.com, please take a look once you get a chance...
21:18 <jeblair> Shuo: will do
21:18 <Shuo> jeblair: thanks!
21:23 <jeblair> mordred: w00t! i got http://logs.openstack.org/76/478576/2/check/31feea4/logs/ to happen with a bunch of manual changes.  :)  patching now.
21:28 <jeblair> 21:26 < openstackgerrit> James E. Blair proposed openstack-infra/project-config master: Fix base post playbook  https://review.openstack.org/478639
21:28 <jeblair> 21:28 < openstackgerrit> James E. Blair proposed openstack-infra/openstack-zuul-roles master: Fix several issues with upload-logs role  https://review.openstack.org/478646
21:29 <jeblair> mordred: ^ can you +3 those?
21:36 <Shuo> I'd like to share this with the zuul community (see the 2nd reply to the first comment), I really feel zuul has huge potential to grow
21:36 <Shuo> https://lwn.net/Articles/702177/
21:51 <mordred> jeblair: on it - also - did you have to patch zuul itself? or just the playbooks?
21:51 <mordred> jeblair: and if it was just playbooks - do we need to address debuggability? or are these just hard because they deal with the log publication system, so chicken-and-egg?
21:53 <jeblair> mordred: i think both.  i think sending all output to stdout on -vvv will help.  as will the cmdline zuul-executor idea.
21:53 <jeblair> and, of course, autohold
21:54 <jeblair> i think with those 3 things, this would be human-debuggable.
21:54 <mordred> ++
22:02 <jlk> Interesting question time
22:02 <jlk> wait, no. n/m
22:04 <SpamapS> Shuo: interesting article
22:04 * SpamapS is reading it
22:04 <SpamapS> they completely missed that gerrit has keyboard shortcuts, for one.. and git-review..
22:04 <SpamapS> and "it's hard to do local testing" .. ???!!!
22:04 <SpamapS> the UI has the clone instructions right there (not to mention gertty just does it for you)
22:04 <SpamapS> weird
22:05 <jeblair> you can't argue with kernel developers
22:05 <jeblair> 2017-06-28 22:03:02,939 DEBUG zuul.AnsibleJob: [build: 4f9c2f7d45e9428ba1f09964d3cb9e33] Ansible output: b'fatal: [static.openstack.org]: UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to remote host \\"static.openstack.org\\". Make sure this host can be reached over ssh", "unreachable": true}'
22:05 <jeblair> okay that's new
22:06 <mordred> jeblair: that doesn't seem ideal
22:07 <SpamapS> Shuo: "if only zuul scaled well" .. but no supporting evidence. I'd say Zuul scales pretty darn well.
22:07 <mordred> SpamapS: someone said zuul doesn't scale?
22:07 <mordred> SpamapS: are they high?
22:07 <jeblair> mordred: apparently we did, in the openstack-infra channel
22:08 <mordred> we did?
22:08 <jeblair> so says a person on the internet
22:08 <mordred> wow
22:08 <mordred> I mean
22:08 <mordred> people on the internet are often wrong
22:08 <mordred> but that's about the wrongest I've heard anyone be in quite some time
22:09 <mordred> it may not (yet) scale DOWN as well as we'd like - but I think we prove just about every day that it scales UP
22:09 <jeblair> this is from last october, too, after zuul achieved its highest scaling point :)
22:10 <mordred> oh - that's from prometheanfire - that's really weird, he's much more clued in than that
22:10 * mordred walks away from the lwn article
22:11 <SpamapS> yeah
22:11 <SpamapS> lwn is like that
22:12 <jeblair> this is one of those "is everything i read in the newspaper about things that i'm not expert in *this* inaccurate?" moments
22:13 <jeblair> mordred, SpamapS: could it be that we have to add static.o.o host keys to known_hosts?
22:14 <mordred> jeblair: it's worth trying an ssh as zuul to jenkins@static to see - are you on and can you try that?
22:14 <clarkb> jeblair: I'm pretty sure it's all that inaccurate, yes
22:15 <mordred> jeblair: there is definitely no entry in the zuul user's known_hosts
22:15 <jlk> jeblair: so I was just introducing zuul to somebody, and they kind of stumbled over the project definition schema, and were a bit confused/irritated that the pipelines weren't under their own key, to make it much more clear what was a pipeline, instead of just taking anything that isn't one of the reserved keywords in a project hash
22:16 <jlk> jeblair: they expected something more like https://review.openstack.org/#/c/463328/5/doc/source/project-config.rst
22:16 <jlk> er
22:16 <jlk> https://gist.github.com/caphrim007/40ec478f0ab2e233ea804f22b8531d99
22:17 <mordred> wow, how fascinating. I never want additional deepness in my yaml :)
22:19 <mordred> jeblair: oh - duh. we create a known_hosts in the job dir - so we'd need to be able to pass in additional known_hosts entries
22:19 <mordred> jeblair: or maybe start with the zuul user's known_hosts and append to it?
22:23 <jlk> mordred: yeah, he's totally unfamiliar with zuul, so he thought "check" was a specific keyword for the project, not implicitly a pipeline name (because it didn't match any other keys in the schema)
22:23 <mordred> jlk: nod
22:24 <mordred> jlk: I was actually just chatting with someone earlier today and "how does a new user discover what pipelines are available" came up - which may or may not be related
22:24 <jlk> somewhat related, yeah
22:24 <jlk> comes down to "the admin has to have documented them"
22:25 <jlk> until we do something like an API where you can ask zuul what pipelines exist
22:27 <mordred> this is the lovely area in between "the system is flexible and lets you define any pipeline name you want" and "as a noob user, what are the names and what do they mean"
22:27 <clarkb> ++ on not adding unnecessary depth to yaml. I think that happens a lot because people don't understand the datastructure itself
22:27 <clarkb> the datastructure is part of the semantics here and we should take advantage of that
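For reference, the two shapes being compared look roughly like this (job names are made up, and the nested form is only along the lines of what the gist proposes). In the flat form zuul v3 uses, any key that isn't a reserved keyword is treated as a pipeline name:

    # Current flat form: "check" is implicitly a pipeline name.
    - project:
        name: org/example
        check:
          jobs:
            - example-tox-py35

    # Nested alternative: pipelines grouped under an explicit key.
    - project:
        name: org/example
        pipelines:
          check:
            jobs:
              - example-tox-py35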
22:32 <mordred> clarkb: sure - but also I think we should not ignore reported confusion from new users
22:33 <clarkb> definitely
22:37 <Shuo> mordred: to reply to "SpamapS: someone said zuul doesn't scale?", this is exactly why I feel it is worth speaking out my thoughts here https://review.openstack.org/#/c/463328/
22:37 <jeblair> jlk: yeah, i could probably be convinced we should move it down a level; we've had to do similar things elsewhere (job graph), this is probably about the last place we have something like that...
22:40 <Shuo> SpamapS: I was not saying zuul does not scale, and I think the person making such a comment probably did not spend time doing research; that's why I felt what I commented on James's CL might add huge value to zuul https://review.openstack.org/#/c/463328/
22:43 <mordred> Shuo: totally! fwiw I did not think you were saying that - and I agree
22:46 <SpamapS> Shuo: indeed, we were just reacting to the comment violently. It is not directed at you, and I like where you're going. :)
22:51 <jeblair> mordred: yeah, if i re-run my command, i get asked to accept the key, and if i say no, i get the same error
22:51 <jeblair> mordred: so i think it's known_hosts
22:52 <mordred> jeblair: yah
22:52 <jeblair> mordred: i think i'll just append to it in the playbook for now (since that's where the host is being added, it makes sense to keep those things together)
22:52 <mordred> jeblair: kk. you have access to the host_key in the playbook?
22:53 <jeblair> mordred: well, it's got a bunch of hardcoded stuff in there right now: username and host, so i can just add this.
22:53 <mordred> jeblair: ++
22:53 <jeblair> something something parameterize later
22:53 <jeblair> sweet, there's a known_hosts module
22:53 <mordred> jeblair: ya - seems like either an option to seed the per-job known_hosts file with ~zuul/.ssh/known_hosts - or to pass in one or more entries to inject
22:53 <mordred> jeblair: \o/
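With the known_hosts module, the append could look roughly like this (the key value is a placeholder, not the real static.openstack.org host key):

    - name: Accept the log server's host key
      known_hosts:
        path: "{{ ansible_user_dir }}/.ssh/known_hosts"
        name: static.openstack.org
        # Placeholder; the real entry would come from the server's
        # /etc/ssh/ssh_host_rsa_key.pub.
        key: "static.openstack.org ssh-rsa AAAAB3...example"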
23:00 <jlk> crap, I think I hit some python3 issues in the github code
23:02 <jlk> TypeError: a bytes-like object is required, not 'str'
23:03 <clarkb> yup
23:03 <clarkb> you'll need to encode whatever is str to bytes
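clarkb's point, in REPL form (values are illustrative): Python 3 keeps text (str) and bytes strictly separate, and anything headed for the wire or a WSGI body has to be bytes:

    >>> payload = '{"action": "opened"}'   # str (text)
    >>> body = payload.encode('utf-8')     # bytes, as a WSGI body must be
    >>> import json
    >>> json.loads(body.decode('utf-8'))   # decode back to str to parse
    {'action': 'opened'}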
23:08 <jeblair> mordred: 23:07 < openstackgerrit> James E. Blair proposed openstack-infra/project-config master: Add static.o.o key to known_hosts for log publisher  https://review.openstack.org/478672
23:09 <jlk> wtf, I can't even access the object
23:10 <jlk> (Pdb) p request.json
23:10 <jlk> *** TypeError: a bytes-like object is required, not 'str'
23:12 <mordred> jlk: request.data I believe is where the raw data is, if that's a requests object
23:12 <jlk> it's not, it's a webob
23:12 <jlk> <class 'webob.request.Request'>
23:13 <mordred> jlk: ah. well - request.body I think?
23:13 <jlk> (Pdb) request.body
23:13 <jlk> *** TypeError: a bytes-like object is required, not 'str'
23:14 <jlk> webob's page claims to support python3. I'm skeptical
23:14 <mordred> jlk: where in this code is this hitting you - do you know?
23:14 <openstackgerrit> Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add ability to auto-generate simple one-line shell playbooks  https://review.openstack.org/478675
23:14 <jlk> it's in the githubconnection.py file, when it's trying to handle a webhook event
23:15 <jlk> it's where we try to get the content as a json dict object
23:15 <mordred>             json_body = request.json_body
23:15 <mordred> that bit?
23:15 <jlk> these are all internal methods of webob.
23:15 <jlk> yeah
23:16 <mordred> jlk: will it let you print request.headerlist?
23:16 <mordred> or request.headers
23:17 <jlk> yeah
23:17 <jlk> ugh, python3, how do I quickly show the keys of a thing...
23:17 <clarkb> jlk: foo.__dict__ ?
23:17 <mordred> jlk: list(foo.keys())
23:17 <clarkb> oh, actual keys of a dict
23:18 <mordred> jlk: specifically curious if the incoming payload from gh had a 'Content-type' header set
23:18 <jlk>  'CONTENT_TYPE': 'application/json'
23:18 <jlk> I'm sending it with curl
23:18 <mordred> or charset
23:18 <mordred> ah
23:19 <jlk> maybe I need to add a charset?
23:19 <clarkb> content-encoding is I think what you want
23:19 <jamielennox> jlk: this one? http://paste.openstack.org/show/613989/
23:19 <jlk> jamielennox: the bottom one, yes
23:20 <mordred> jlk: yah - my hunch is that you need to add a charset in your curl
23:20 <jamielennox> ok - but not caused by the same thing then?
23:20 <mordred> jlk: my reading of the webob docs is that it needs a charset set so that it knows what to do with encoding
23:20 <jamielennox> whatever is making repository == None
23:20 <jamielennox> webob should set a default charset?
23:20 <jlk> this I think is my first time running this all in python3, I may be up for a while :(
23:21 <jlk> github doesn't send that in the headers
23:21 <mordred> jlk: you should be able to pass "content-type: application/json; charset=utf8" and webob claims it'll do the right thing
23:21 <mordred> hrm
23:21 <jlk> http://paste.openstack.org/show/613990/
23:22 <mordred> jlk: excellent
23:23 <jlk> yeah, I can't select others.
23:25 <jlk> default_body_encoding is set to 'UTF-8' by default; it exists to allow users to get/set the Response object using .text, even if no charset has been set for the Content-Type.
23:25 <jlk> The default_content_type is used as the default for the Content-Type header that is returned on the response. It is text/html.
23:25 <jlk> oh, that's content-type, not charset :/
23:25 <mordred> yah - default_body_encoding seems like the right thing
23:26 <jamielennox> charset should be its own property on webob, you shouldn't need to manually do the headers
23:27 <jlk> the object knows it's application/json
23:27 <jlk> and it thinks the charset is 'UTF-8'
23:45 <jeblair> i think we should drop the --keep-jobdir argument to the executor and replace it with an IPC like verbose.  so we'd run "zuul-executor keep" or something to enable it.
23:46 <jeblair> (it's difficult to start the executor with the init system and pass an argument like that)
23:46 <mordred> jeblair: ++
23:51 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Replace --keep-jobdirs with an IPC  https://review.openstack.org/478682
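Under that change, job-dir retention becomes a runtime command rather than a startup flag, something along the lines of (the first command is the one proposed above; the off switch is an assumption):

    zuul-executor keep      # start retaining job dirs for debugging
    zuul-executor nokeep    # assumed counterpart to turn retention off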
23:51 <jeblair> 2017-06-28 23:49:30.347020 | localhost | ERROR: Results: => {"changed": false, "failed": true, "msg": "Failed to write to file /root/.ssh/known_hosts: [Errno 2] No such file or directory: '/root/.ssh/tmpS8tEVy'"}
23:51 <jeblair> why's it trying to write there?
23:51 <jlk> I am so confused
