Tuesday, 2017-11-21

openstackgerritMonty Taylor proposed openstack-infra/zuul-jobs master: Add general sphinx and reno jobs and role  https://review.openstack.org/52114200:00
openstackgerritMonty Taylor proposed openstack-infra/zuul-jobs master: Add support for warning-is-error to sphinx role  https://review.openstack.org/52161800:00
openstackgerritMonty Taylor proposed openstack-infra/zuul-jobs master: Update fetch sphinx output to use sphinx vars  https://review.openstack.org/52159000:00
tristanCmordred: oops, the angular version note got lost in the rebase, it was documented here: https://review.openstack.org/#/c/466561/1/etc/status/fetch-dependencies.sh   (v1.5.6)01:00
openstackgerritTristan Cacqueray proposed openstack-infra/zuul feature/zuulv3: web: add /static installation instructions  https://review.openstack.org/52169401:24
tristanCmordred: /win 4501:25
tristanCoops, well this https://review.openstack.org/521694 recap /static installation instructions01:25
mordredtristanC: ah - cool! I can update my patch to use 1.5.6 instead of 1.5.801:28
mordredtristanC: https://review.openstack.org/#/c/521625/ - unless you want to squash mine into your patch there - either way is fine with me01:28
tristanCmordred: or i can verify zuul.angular.js works with 1.5.8, the concern is about the $locationProvider used in builds.json to parse the query string args01:29
tristanCmordred: it feels like those curl scripts are a band-aid until we integrate webpack or something01:30
tristanCclarkb: agreed, what matters is how zuul jobs leverage ansible, though one bit to account for is how the zuul_stream callback works; iiuc the zuul-executor needs a tcp connection to the slave's zuul_console daemon01:33
tristanCwhich seems to assume the nodepool slave already has a regular network connection alongside the ansible_connection01:35
pabelangerclarkb: jeblair: I added the notes about the zuulv3.o.o outage last week to: https://wiki.openstack.org/wiki/Infrastructure_Status01:39
pabelangerhttps://review.openstack.org/513915/ was the commit in question that stopped us from starting zuul again01:39
pabelangerand required a force merge of: https://review.openstack.org/519949/01:39
jeblairpabelanger: that was the original version of that commit; it was fine.  mordred merged that repo into another outside of gerrit, which is why the problem arose.02:09
pabelangerokay02:17
pabelangerremote:   https://review.openstack.org/521700 demo variable scoping issue in ansible02:27
pabelangerjeblair: mordred: clarkb: dmsimard: a very simple patch to demo the issues I am having ^.  This isn't a zuul issue, but a difference in how different inventory files can affect how ansible runs.  We can go into more detail in the morning02:29
tristanCregarding nodepool backends, this isn't a blocker to release v3 today from my point of view. though we might want to merge a few simple additions to support custom ansible_connection and ansible_user so that it works for tobiash's use case.02:40
pabelangertristanC: which patch is that?02:55
tristanCpabelanger: https://review.openstack.org/453983 and https://review.openstack.org/45398302:56
openstackgerritMerged openstack-infra/zuul feature/zuulv3: web: add /static installation instructions  https://review.openstack.org/52169403:00
pabelangertristanC: think one was to be https://review.openstack.org/50197603:04
pabelangerbut cool, never knew we had that03:04
pabelangerlooks interesting03:04
pabelangertobiash: left comments on 50197603:07
*** harlowja has quit IRC03:22
dmsimardmordred, tristanC, jeblair: I'd love to pick your brain about a question I have regarding the API implementation in ARA if you happen to be around03:45
dmsimardjlk: maybe you too since you used it in BonnyCI :D03:47
dmsimardbefore 1.0, aggregating data to a single location (i.e, running ansible from different servers with ARA set up) meant using a database server, like MySQL, creating credentials and a database -- and then configure those credentials so that ARA knows how to connect to that database03:48
dmsimardIn OpenStack terms, it's not very different from how different nova compute nodes know where the nova database is as well as the username and password03:49
dmsimardIt's not ideal for a number of reasons, one of which is because the user has read/write access to the database and those credentials might end up on users' laptops because that's where they run ansible from. A bit meh.03:49
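Roughly, that pre-1.0 setup means every machine running ansible carries something like this in its ansible.cfg pointing ARA at the shared database (hostname and credentials here are placeholders):

    [ara]
    # shared central database; every node that runs ansible needs these credentials
    database = mysql+pymysql://ara_user:ara_password@db.example.org/ara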
dmsimardSo, enter 1.0 with this shiny new API. There's either the default standalone/offline/internal API which has no authentication, no network calls and no HTTP involved.03:50
dmsimardOr there's the HTTP REST API that you can make available so that you can get/post/update data03:51
dmsimardI don't really want to be in the business of managing API tokens or credentials, or ACLs but there might not be any other way. I'd really have two kind of "users" (or "tokens"), read-only and read-write. However, I'm not really sure how to go about managing tokens/users.03:52
dmsimardI think when we discussed the Zuul API in Denver jeblair said he also wasn't interested in managing credentials and would rather keep the API opened and securing the API as an exercise to the operator (i.e, hide /api/admin properly through a webserver or something)03:55
dmsimardI'm wondering if I should do the same thing or not. I really want to keep the code base as simple as possible.. I'm a bit concerned about the implications of adding credentials, permissions, etc.03:56
dmsimard</endwalloftext>03:56
tristanCdmsimard: how about using http authorization and adding htaccess in front of the ara server?04:00
tristanCi guess this is how it's going to be implemented for the zuul-web/admin endpoint04:01
*** harlowja has joined #zuul04:01
dmsimardtristanC: Like a http authentication ? or restriction by IP ?04:01
dmsimardI'm not sure how http authentication in front of an API would work from a client perspective04:01
dmsimardrestriction by IP (or hosting the API in a restricted network to begin with) is probably what I had in mind04:02
dmsimardI suppose since the client uses python requests, it's probably easy to go through the http auth and then do a GET/POST/PATCH/etc, just never seen that done before04:03
dmsimardbut yes, it's an interesting idea I hadn't thought about. I really just don't want to end up *validating* the credentials and matching those to some permissions04:04
tristanCdmsimard: i meant like support a 'authorization' or even a 'x-auth-token' http header at the client level, and then use a middleware to authorize the request on top of the ara server04:07
tristanCdmsimard: though, isn't zuul going to only use the standalone/offline/internal api of ara?04:09
dmsimardtristanC: probably, yes.. the API is useful to ARA first of all, it is consuming the API instead of doing custom SQL queries everywhere04:11
dmsimardthe API endpoint is available if people are interested in aggregating data from different locations that way04:12
dmsimardbut it also allows to query ARA programmatically over HTTP04:13
dmsimardi.e, give me the tasks for this playbook -- or give me the results for this task04:13
dmsimardRunning the API endpoint is not required at all, the default is still the internal API that is completely offline without HTTP04:14
tristanCdmsimard: speaking of which, i'd be interested in the 'give me the output of all the failed task'04:14
tristanCwhich sounds like the first query the zuul user should get when looking at the ara report of his job04:15
dmsimardtristanC: yup, you could totally do something like this (totally just wrote it now) http://paste.openstack.org/raw/626895/04:21
tristanCso that would be part of a "ara generate report --failed-first" or something like that?04:22
dmsimardtristanC: that's python, it's not a frontend/UI implementation04:22
dmsimardtristanC: it's something that, for example, the zuul executor could do to learn about failures and maybe link to them directly or something.04:23
* dmsimard waves hands like mordred would04:24
tristanCdmsimard: i meant, right now you have to click "logs -> ara -> playbook -> task-page -> the task that failed" to get the reason why your job failed04:25
*** haint has quit IRC04:25
tristanCdmsimard: what would be cool is to shorten all those intermediary clicks so that when you click logs, then you get the output of the tasks that failed04:25
dmsimardtristanC: yeah but really this failed task result is already available to a direct link like http://logs.openstack.org/72/516172/4/check/openstack-tox-cover/f3e9208/ara/result/09382b17-4cfe-44dd-b0c1-729feeef3e4f/04:26
dmsimardA static report isn't going to have an API available in order to query it04:27
dmsimardBut the executor can query ARA after the playbook has completed, determine if there has been any failures, and link to it accordingly04:27
dmsimardAnyway, your imagination is the limit around what you want to end up doing with the API04:29
*** smyers has quit IRC04:36
*** smyers has joined #zuul04:36
*** yolanda has quit IRC04:44
*** nguyentrihai has joined #zuul05:34
*** haint has joined #zuul05:40
*** nguyentrihai has quit IRC05:43
*** harlowja has quit IRC05:52
tobiashpabelanger: did you mean 503148 or forgot to click send on 501976?06:16
openstackgerritTobias Henkel proposed openstack-infra/nodepool feature/zuulv3: Rename ssh_port to connection_port  https://review.openstack.org/50080006:26
openstackgerritTobias Henkel proposed openstack-infra/nodepool feature/zuulv3: Support username also for unmanaged cloud images  https://review.openstack.org/50080806:28
*** yolanda has joined #zuul06:45
*** hashar has joined #zuul07:03
openstackgerritTobias Henkel proposed openstack-infra/nodepool feature/zuulv3: Add connection-type to provider diskimage  https://review.openstack.org/50314807:38
openstackgerritTobias Henkel proposed openstack-infra/nodepool feature/zuulv3: Don't gather host keys for non ssh connections  https://review.openstack.org/50316607:38
openstackgerritTobias Henkel proposed openstack-infra/nodepool feature/zuulv3: Add connection-port to provider diskimage  https://review.openstack.org/50411207:38
openstackgerritTobias Henkel proposed openstack-infra/zuul feature/zuulv3: Use username from node information if available  https://review.openstack.org/45398307:45
openstackgerritTobias Henkel proposed openstack-infra/zuul feature/zuulv3: Rename ssh_port to connection_port  https://review.openstack.org/50079907:45
openstackgerritTobias Henkel proposed openstack-infra/zuul feature/zuulv3: Use connection type supplied from nodepool  https://review.openstack.org/50197607:45
*** rcarrillocruz has quit IRC08:49
*** rcarrillocruz has joined #zuul09:42
*** hashar has quit IRC10:03
*** jesusaur has quit IRC10:07
*** hashar has joined #zuul10:16
*** electrofelix has joined #zuul10:18
*** jesusaur has joined #zuul10:19
*** jhesketh has quit IRC10:28
*** jhesketh has joined #zuul10:30
*** jkilpatr has joined #zuul12:04
*** isaacb has joined #zuul12:13
openstackgerritTobias Henkel proposed openstack-infra/zuul feature/zuulv3: Use username from node information if available  https://review.openstack.org/45398312:28
openstackgerritTobias Henkel proposed openstack-infra/zuul feature/zuulv3: Rename ssh_port to connection_port  https://review.openstack.org/50079912:28
openstackgerritTobias Henkel proposed openstack-infra/zuul feature/zuulv3: Use connection type supplied from nodepool  https://review.openstack.org/50197612:28
rcarrillocruzhey folks, to trigger check jobs on .zuul.yaml, that's not implicit right?13:08
rcarrillocruzlike13:09
rcarrillocruzi have to explicitly put a files regex on .zuul.yaml if i want to trigger the job should it be modified13:09
rcarrillocruz?13:09
rcarrillocruzi.e.13:09
rcarrillocruz    files:13:09
rcarrillocruz      - ^lib/ansible/modules/network/ovs/.*$13:09
rcarrillocruz      - ^test/integration/targets/openvswitch.*13:09
rcarrillocruzi should also add13:09
rcarrillocruz- .zuul.yaml13:09
rcarrillocruzshould I want that job to be triggered on .zuul.yaml mod ?13:10
tobiashrcarrillocruz: if you use a files filter you probably also want to add .zuul.yaml in case you touch the corresponding job13:15
rcarrillocruzOk, thx for confirming13:16
tobiashrcarrillocruz: and probably the corresponding playbook13:17
rcarrillocruzAye13:32
tobiashrcarrillocruz: if you want to limit what jobs run on zuul.yaml changes you also could split that into several files13:56
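A minimal sketch of the in-repo job definition being discussed (the job name and playbook path are made up; the two module/test regexes come from the paste above), with .zuul.yaml and the playbook listed in the files matcher so the job also triggers when they change:

    # .zuul.yaml
    - job:
        name: openvswitch-integration
        run: playbooks/openvswitch-integration.yaml
        files:
          - ^lib/ansible/modules/network/ovs/.*$
          - ^test/integration/targets/openvswitch.*
          - ^playbooks/openvswitch-integration\.yaml$
          - ^\.zuul\.yaml$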
*** jkilpatr_ has joined #zuul14:00
*** jkilpatr has quit IRC14:03
dmsimardjeblair: so I noticed that ARA is still not up to date on the executors.. we had gotten stuck by https://review.openstack.org/#/c/516740/14:04
dmsimardI happened to have switched to Firefox (57 is awesome) and there's a bugfix in one of the latest releases that resolves an issue with permanent links on firefox :(14:04
*** hashar has quit IRC14:11
*** hashar has joined #zuul14:11
rcarrillocruzmordred: in terms of zuul , third party  CI and github, how's the story there? will 3rd partys willing to CI to create their own GH app and 'we' install it on our repos or is there other mechanism in the roadmap ?14:49
rcarrillocruzother question: depends-on does not work in multiCI  envs (like mixing a Github and Gerrit) iiuc, does it work on github to github tho ?14:55
*** weshay is now known as weshay_pto14:55
mordredrcarrillocruz: yes to the first question14:56
mordredrcarrillocruz: for the second, cross-source depends-on is the thing that doesn't work yet- but it's on the short-term roadmap14:57
rcarrillocruzDoes it work if source are GH?14:57
mordredrcarrillocruz: so that'll be fixed before we cut an official 3.0 release14:58
mordredyes - it works with gh14:58
rcarrillocruzSweet thx14:58
pabelangertobiash: rcarrillocruz: I would think .zuul.yaml would be implicitly matched by files filters, but never tested15:06
rcarrillocruzyeah, thought so, but encountered otherwise. I think it makes sense, as you tie a file, to a job, to a pipeline15:07
rcarrillocruzif it was implied15:07
rcarrillocruzthat would mean kicking off on all pipelines15:07
rcarrillocruzat least that's how i assume the rationale is about needing that to be explicit15:07
tobiashpabelanger: oh, didn't think of that possibility15:09
tobiashis it really implied?15:09
pabelangertobiash: I am not sure, I assumed it was. but need to test myself. i think zuul will always load its config15:09
rcarrillocruzso, reading that roadmap thing15:15
rcarrillocruzi'm curious about the dashboard15:15
rcarrillocruzwhat does it mean15:15
rcarrillocruzbundling something on zuul , html stuff and all, to get zuulv3.openstack.org kind of interface15:16
rcarrillocruzfrom what i see, 8001 is the zuul 'api', to see live status of queues . I assume the dashboard as we know it is something we deploy outside of zuul package15:16
rcarrillocruz?15:16
rcarrillocruzheh, was chatting with dmsimard the other day that we may eventually also need ansible_connection along with ansible_user plumbed up to zuul, just spotted tobiash https://review.openstack.org/#/c/501976/15:19
rcarrillocruz++15:19
dmsimardtobiash++15:19
rcarrillocruztristanC: the dashboard thing you got it assigned, is that dashboard a thing that will be bundled within zuul ?15:20
rcarrillocruzdo you have changes for it to look around?15:20
rcarrillocruznm, https://review.openstack.org/#/q/topic:zuul-web+(status:open+OR+status:merged)15:27
pabelangerrcarrillocruz: I think it would be something like: https://softwarefactory-project.io/zuul3/local/builds.html15:28
rcarrillocruzoh15:28
rcarrillocruzmucho bonito!15:28
jeblairrcarrillocruz: dashboard will be built in to zuul -- all the web stuff will be combined15:28
pabelangerI think SF rolled that out yesterday, which is the basis for the zuul dashboard15:28
jeblairrcarrillocruz: "topic:zuul-web" has the changes15:28
rcarrillocruz++15:28
rcarrillocruzthat's great15:29
rcarrillocruzcos i was having a hard time figuring out how to get a dashboard last night15:29
rcarrillocruzi think i better wait to get that merged15:29
tobiashrcarrillocruz, dmsimard: yeah, just rebased it this morning :)15:31
rcarrillocruzthat's great for me, cos, i need ansible_connection to be either local or network_cli in order to test network devices from executor15:32
rcarrillocruzworkaround now, i create a bastion with nodepool that creates an inventory on the fly with the needed vars15:32
jeblairrcarrillocruz: i wonder how that will work with the security protections we have against local connections for untrusted jobs (it doesn't apply to trusted jobs, but i'm sure we'd want to find a way to make both work).  (cc: mordred)15:38
rcarrillocruzyeah. for this POC, i wanted to have jobs in-repo cos i find it super useful to get those tests on commit. OTOH, that forces me to get this additional bastion as i can't do certain things on the executor. I think when I show this to my peers and we move it forward in prod i'll just make the 'run ansible network integration tests' job a role on the config project so I don't double-jump to kick off tests,15:43
rcarrillocruzexecutor-bastion-testnode15:43
jeblairrcarrillocruz: i'm assuming the network_cli connection plugin wouldn't cause many security concerns on the executor...  what do you need local for?15:45
rcarrillocruzit's a trick we have on network modules on 2.3/2.4. We check on action plugins if it's local, then switch to network_cli. We had to do that back in 2.3 to leverage ansible command line flags like -k and -u , instead of needing to pass creds as module side args15:48
rcarrillocruzhttps://github.com/ansible/ansible/blob/stable-2.4/lib/ansible/plugins/action/ios.py#L5115:49
rcarrillocruzgood news is that on devel we're moving away from that hack, and devices ssh connection will use connection: network_cli , no more local15:49
rcarrillocruzbut there are a few platform families needing that transition15:49
mordredjeblair, rcarrillocruz: I believe ansible_connection should be fine with our security protections - if we have nodepool pass it, then it'll be in the inventory - what we protect against is a user setting it as a variable (iirc)15:49
mordredlemme check though15:50
jeblairmordred: okay, that's the way my brain was heading, but wanted to make sure15:50
jeblairso even that local->network_cli hack may work15:50
rcarrillocruzso something i haven't tested yet but i think i may hit a roadblock is https://github.com/openstack-infra/zuul/blob/feature/zuulv3/zuul/executor/server.py#L140615:52
rcarrillocruzby default, there's a gather_facts on nodepool nodes15:53
rcarrillocruzhowever, for network devices that will fail15:53
rcarrillocruzwe don't have a shell to play with15:53
rcarrillocruzlet alone python to gather facts15:53
rcarrillocruzshould that be tweakable somehow, or should the gather_facts phase be overridable in the job section15:53
rcarrillocruz?15:53
pabelangerI wonder if you could lay down an empty file in the fact cache, like we do for localhost as a pre playbook15:54
mordredrcarrillocruz: well - we do that in server to pre-cache the facts - which along with gather: smart means tasks in jobs shouldn't themselves run fact gathering ...15:54
rcarrillocruzor paramiko nodepool connection_port as we chatted the other day15:55
rcarrillocruzmordred: but don't we fail early with NODE_FAILURE should that pre stage fail?15:55
mordredperhaps, similar to connection it's something we need to know about a node from nodepool - 'supports fact gathering'15:55
mordredrcarrillocruz: oh - absolutely- that'll totally break you as it is today15:55
mordredbiggest question would be how to know that the node type in question does not support fact gathering15:56
rcarrillocruzso not sure how we would tackle that, as an executor flag (pre fact gather yes/no)15:56
rcarrillocruzor have a new param on the node15:56
rcarrillocruzsaying15:56
rcarrillocruz'supports fact gathering'15:56
mordredrcarrillocruz: how do the network modules themselves handle playbook automatic fact gathering?15:56
rcarrillocruzboth are not mutually exclusive15:56
rcarrillocruzmordred: today we have <platform>_facts15:56
rcarrillocruzin the short term, gather_facts will be pluggable15:57
rcarrillocruzmeaning, if we hint ansible that the node is a network thing  ( think with ansible_network_os), then executor will spawn the right 'driver'15:57
rcarrillocruzi think alikins was on it, not sure if we'll get that for 2.6 at the very least15:57
mordredrcarrillocruz: yah - but ansible-playbook runs fact gathering on hosts ... are there just hard-coded lists in playbook that say "don't run fact gathering if ansible_connection is network_cli or something?"15:57
pabelangerwell, I think we only run setup_playbooks (gather facts) today to ensure SSH has been set up properly. We could re-tweak that again and stop doing ansible -m setup to validate SSH is working, which moves facts back into ansible-playbook. Wouldn't that allow somebody to set gather_facts: false in all playbooks?15:58
mordredoh - wait15:58
rcarrillocruzyeah, i think gather_facts is only on ssh connection15:58
mordredif we need to plumb ansible_connection in anyway, we could just check if ansible_connection == 'ssh' in that fact gathering15:58
rcarrillocruzmordred: http://docs.ansible.com/ansible/latest/ios_facts_module.html we do as modules15:58
mordredsince, as pabelanger says, it's in support of our ssh connections15:59
rcarrillocruzcool15:59
rcarrillocruzi thnk that's a good compromise15:59
pabelangeryah15:59
rcarrillocruzso, wait for tobiash changes to land15:59
rcarrillocruzthen change executor logic to test that15:59
jeblairwell, it's twofold16:00
jeblairit's not just to validate that ssh is working, but also to establish the ssh controlpersist connections16:00
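One way the "check ansible_connection == 'ssh'" idea above could look, assuming the node's connection type shows up as ansible_connection in the inventory; the playbook and names here are an illustrative sketch, not the executor's actual setup playbook:

    # setup.yaml -- illustrative sketch only
    - hosts: all
      gather_facts: false
      tasks:
        - name: Validate ssh and warm the fact cache, skipping non-ssh nodes
          setup:
          when: (ansible_connection | default('ssh')) == 'ssh'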
jlkAs folks who work on CI, y'all will appreciate this: https://unix.stackexchange.com/questions/405783/why-does-man-print-gimme-gimme-gimme-at-003016:01
pabelangerwoot!  https://github.com/gtest-org/ansible/pull/1 tox-pep8 works (still) via github connection driver16:01
pabelangertook 10mins to sync git repo to node however :D16:01
rcarrillocruzjeblair: what's the reason to check for the controlpersist? asking, as in paramiko we don't have such, it's the reason why ansible-connection was written, to have 'feature parity'16:02
*** isaacb has quit IRC16:02
mordredrcarrillocruz: we set up controlpersist independently16:03
rcarrillocruzjlk: off-topic, does anyone know sigmavirus or where he hangs out? https://github.com/sigmavirus24/github3.py/pull/671 , i guess it would be good to get a release to not carry the editable package on requirements.txt16:03
mordredrcarrillocruz: because of wrapping ansible-playbook calls in bubblewrap16:03
rcarrillocruzic16:05
jlkrcarrillocruz: I know not, but definitely worth poking upstream again :(16:05
jeblairdigging deeper, i *think* things should still work even if controlpersist isn't established there16:05
mordredrcarrillocruz: also because we start an ssh agent so that we can inject the ssh keys into it and then remove them so that they are not there for the jobs16:05
rcarrillocruzah16:06
rcarrillocruzso that explains the remove_build_key role16:06
rcarrillocruzi was wondering what was about it16:06
mordredI mean - there's a few things we could make ssh-aware - like we don't need to start an ssh agent if ansible_connection != ssh16:06
rcarrillocruzthat's the rationale for it?16:06
jlkhrm.16:06
jlkmordred: would we need to model that "add/remove" capability if the connection is not ssh? like if the connection is kubectl exec ?16:07
mordredrcarrillocruz: ya - we have a base key that we manage, we use that in service of creating a per-build key and adding it to the remote nodes, then removing access to the original key from the job before handing things off16:07
jlkis the threat model written up somewhere w/ the keys?16:08
jeblairrcarrillocruz: that way a job can't (somehow) ssh into another host outside the set it's been given16:08
rcarrillocruzwas it ever on the table the idea to spawn executors from nodepool itself ? like a control plane pool16:08
jeblairit shouldn't be able to do that anyway, but just in case16:08
mordredjlk: unsure - we could make the key dance in the base job no-op if ansible_connection != ssh - or it's possible we'll need to do similar things for other systems, like win_rm which uses passwords/certs iirc16:08
jlkmordred: yeah we may need to do that in the k8s exec route. Otherwise a task on the executor could exec into pods/containers from another job16:09
pabelangerjlk: github question, how is the 'detail' url in 'all checks have passed' box work? https://github.com/gtest-org/ansible/pull/116:10
jlkunless we figure out a way to prevent docker/kubectl calls from happening via shell on the executor16:10
mordredjlk: nod. yah - it seems like there is a dance we need to do generally, but the impl may be different for each type of ansible connection plugin16:10
mordredjlk: we have that way16:10
pabelangerjlk: is that something we need zuul to update with stream.html page / final logs?16:10
jeblairit's worth noting the key swap is a second layer of defense.  it should not be possible for an untrusted job to add a host to the inventory or run a local shell.16:10
mordredjlk: docker/kubectl calls are already prevented from running via shell on the executor16:10
jlkpabelanger: it's the zuul_url fed back through as part of the status POST call16:10
mordredjeblair: ++16:11
jlkjeblair: oh good point.16:11
jeblair(but just in case that happens somehow, we didn't want the result of that attack to be "you can ssh into any node zuul can ssh into")16:11
pabelangerjlk: thanks, in bonnyci, did you get it properly configured to point to your final logs?16:11
mordredyah16:11
jlkpabelanger: for zuul 2.x yes, for 3.x I think that's still an open question16:11
mordredit seems like good form to have equivalent auth dances for other connection types16:11
jlk(particularly since that URL dances around)16:11
jlkYou can have only one URL, so you need it to link to a page that shows all the jobs from a pipeline, with links into their logs16:12
jeblairjlk: what did you do in zuul v2?16:12
*** isaacb has joined #zuul16:12
jlkwe pointed to a directory16:13
jlkand that directory had subdirs for all the jobs I believe16:13
jeblairoh, so you constructed the logpath specifically for that case16:13
jeblairmakes sense16:13
pabelangerokay, so it is possible we still have some work to do on v316:14
jeblairour path in openstack is constructed to organize by change, but not buildset16:14
jeblairmaybe we could switch it?16:15
jeblairinstead of /change/patchset/pipeline/job/build/  we could use /change/patchset/pipeline/buildset/job/16:15
rcarrillocruzi copy pasted the url format from bonnyCI , this is how it looks like on my side http://38.145.34.35/logs/ansible-networking/check/github.com/rcarrillocruz-org/ansible-fork/5/c66a514898a14a9ba93a813c8d32a117/16:15
jeblairor we can link to the dashboard url for the buildset16:16
jeblaironce the dashboard lands16:16
jeblairthat may be the better approach16:16
jeblairrcarrillocruz: ah thx, makes sense16:17
mordredI like the dashboard approach ... since that link could potentially contain the in-progress links and change to the log links (at leats in theory)16:17
jlkyeah the dashboard was what we were hoping for16:17
jlkand works more like Travis, CircleCI, Shippable, etc.16:17
mordredso the link given in the status could be a persistent link that people could re-use16:18
jeblairmordred: i agree, though currently the dashboard doesn't handle in-progress links16:18
mordredyah16:18
jeblairand it's not trivial to add16:18
pabelangerjeblair: don't mind trying the new URL format16:18
* rcarrillocruz will follow what shippable does, to make people more comfortable on current way of doing ansible CI things16:18
jeblair(i think we *can*, it's just programming, but it's merging two data sources)16:18
jeblairrcarrillocruz: what does that mean?16:19
jeblairtristanC: replied on https://review.openstack.org/50327016:26
tristanCjeblair: followed up :)16:39
rcarrillocruzjust echoing jlk 'works more like travis, shippable'. At Ansible they use Shippable, so I'll try to show things like Shippable on zuul PR notifications16:40
jeblairrcarrillocruz: right, i'm asking what that means :)16:41
tristanCfwiw i'm not convinced the routes we decided on at the ptg are the best, it makes apache rewrites a bit weird to serve static .html files on dynamic paths16:41
tristanCi wonder if we shouldn't step back and have instead a single .html file that would query the different controller paths16:42
tristanCor if you have another suggestion, i wouldn't mind using another routes list and refactoring the html bits16:44
rcarrillocruzthis is how a link on an ansible PR looks: https://app.shippable.com/github/ansible/ansible/runs/44996/summary/console . From the gtest PR that was put up earlier we point to the main zuul v3 dashboard, would be good to point to the actual job stream link16:45
rcarrillocruznot sure how we get the shippable 'run' link16:45
rcarrillocruzi can ask mattclay16:45
jlkPretty sure it comes with the status from Shippable16:46
jlkthe pending one16:46
rcarrillocruzwootz16:46
rcarrillocruzhttps://github.com/rcarrillocruz-org/ansible-fork/pull/516:46
rcarrillocruzjanky16:46
rcarrillocruzbut i get 'usable' links back on PR16:46
rcarrillocruzjust added zuul_return on the base post playbook16:46
jlkhttps://travis-ci.org/BonnyCI/hoist/builds/267787248?utm_source=github_status&utm_medium=notification is a relevant link from Travis16:47
jlkit's the URL it tosses on status POSTs16:48
mattclayrcarrillocruz: You had a question about getting Shippable run links?16:48
rcarrillocruzoh16:48
rcarrillocruzdid not even know you were here mattclay16:48
rcarrillocruz:-)16:48
* mattclay waves16:49
rcarrillocruzso we were wondering how the shippable 'details' link gets pointed straight to the job being run16:49
rcarrillocruzas in, the zuul report put up on openstack just points to the main dashboard16:49
rcarrillocruzhttps://github.com/gtest-org/ansible/pull/116:49
mattclayrcarrillocruz: You mean the 'Details' link for the Shippable status that shows up on a PR?16:50
rcarrillocruzyah16:50
mattclayrcarrillocruz: I believe it's this: https://developer.github.com/v3/repos/statuses/#create-a-status16:50
jlkright, like I said. It's a URL that is provided as part of the POST to set the commit status16:51
mattclayIt gets updated every time the run status changes until it's finished.16:52
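For reference, the 'Details' link comes from the target_url field of the commit status POST in the API mattclay linked; a minimal sketch with python-requests (owner, repo, sha, token and URL are placeholders):

    import requests

    # placeholder values for illustration only
    owner, repo, sha = "example-org", "example-repo", "0123456789abcdef"
    token = "<github api token>"

    status = {
        "state": "pending",                                # later success/failure/error
        "target_url": "https://zuul.example.org/status",   # becomes the 'Details' link
        "description": "check pipeline started",
        "context": "check",
    }
    resp = requests.post(
        "https://api.github.com/repos/{}/{}/statuses/{}".format(owner, repo, sha),
        json=status,
        headers={"Authorization": "token {}".format(token)},
    )
    resp.raise_for_status()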
pabelangerI don't think shippable comments on PRs like zuul right?16:55
*** isaacb has quit IRC16:57
rcarrillocruzbit different yeah, it doesn't put a comment per-se16:57
rcarrillocruzhttps://github.com/ansible/ansible/pull/3314616:57
rcarrillocruzit's an 'all checks have passed' that you can click16:58
rcarrillocruziirc with zuul we put a straight comment from the bot16:58
mordredrcarrillocruz: we can do either - it's configurable16:58
mordredrcarrillocruz: you can configure it to report into that status link, or to leave comments, or both16:58
*** hashar is now known as hasharAway17:00
rcarrillocruzaha17:00
* rcarrillocruz just looking at comment option of GH reporter17:00
jeblairi thought we could not update the link...?17:00
jeblairif we can update the url, then we should link to the status page in the start report, then link to the logs/dashboard in the final report17:05
jeblairbut i thought someone said we could only set the url once17:05
jeblairoh, maybe i'm misremembering, and the problem is that, without the dashboard, we don't have a single url for the buildset after the builds complete?17:06
jeblairso once we *do* have the dashboard, we can do what i described above: set the url to status page on start, then set the url to dashboard on final17:07
jeblairrcarrillocruz, jlk: ^ does that sound right?17:07
mordredyes - I think that was the main issue17:07
pabelangeryah, that looks to be right based on docs17:07
jeblaircool.  i will be very happy when this is all straightened out.  :)17:07
jeblairtristanC: can we just have zuul-web serve the static html files?17:08
*** bhavik1 has joined #zuul17:16
tristanCjeblair: well yes, that's what it does by default17:17
openstackgerritTristan Cacqueray proposed openstack-infra/zuul feature/zuulv3: web: add /{tenant}/jobs route  https://review.openstack.org/50327017:18
jeblairtristanC: sorry, i may have misunderstood what you were saying about apache rewriting17:21
tristanCjeblair: to keep things dead simple from a user pov, we said that the status page would avaiable at /{tenant}/status.html17:22
tristanCjeblair: that content is served by /{tenant}/status.json and the page just do "get status.json"17:22
tristanCjeblair: which is all good with a standalone zuul-web service17:23
tristanCjeblair: but to serve those html files (which includes builds.html, jobs.html, and later {jobname}.html) from a proxy, then we need to rewrite those url using something like:17:24
jeblairwhy would we need to rewrite individual urls in the proxy?  normally i would just expect to proxy the root17:25
tristanCAliasMatch "^/zuul3/.*/(.*).html" "/var/www/zuul-web/static/$1.html"17:27
tristanCjeblair: rewrite static files so that they are served by apache instead of aiohttp17:27
jeblairtristanC: why not let aiohttp serve them?17:27
tristanCjeblair: good point :-) i may have over optimized that thing...17:28
jeblairtristanC: also, we're sending cache-control headers for the status page at least, we can probably make sure we set those correctly for the html pages too, and then apache will end up serving them from cache anyway most of the time, with no extra configuration17:29
tristanCalright then nevermind that concern, let's do this instead17:32
tristanCjust need to add cache-control to the static file controller17:33
jeblair++17:35
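A small sketch of what that could look like on the aiohttp side (the paths, port and max-age are made up; zuul-web's actual handlers may differ):

    from aiohttp import web

    @web.middleware
    async def cache_control(request, handler):
        # let the fronting apache cache the static pages, like it does status.json
        response = await handler(request)
        if request.path.endswith('.html'):
            response.headers['Cache-Control'] = 'public, max-age=60'
        return response

    app = web.Application(middlewares=[cache_control])
    # hypothetical location of status.html / builds.html / jobs.html
    app.router.add_static('/static/', path='zuul/web/static', show_index=False)
    web.run_app(app, port=9000)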
pabelangerjeblair: mordred: clarkb: dmsimard: I linked this last night, but https://review.openstack.org/521700/ is an example of the issues I was trying to explain around the need for https://review.openstack.org/521324/17:39
pabelangerit shows how group vars are handled differently based on inventory file17:40
pabelangerjlk: ^might be interesting to you too17:40
* dmsimard looks17:42
dmsimardpabelanger: I think I understand what's going on but that seems like a bug in Ansible to me17:44
dmsimardDoing something in Zuul to address that seems like a workaround for a bug17:44
dmsimardv3-inventory and v3-inventory-group should behave the same17:45
dmsimardWell.. maybe not, actually17:45
jeblairpabelanger: which numbers do you get when you run v3-inventory-group?17:52
*** bhavik1 has quit IRC17:53
jeblairah it's in the job log -- 6789017:53
clarkbpabelanger: so the problem is in how group vars are associated to a host if its logical name doesn't change?17:53
jeblairpabelanger: you switched the order of the groups in v3-inventory vs v3-inventory-group.  is that important?17:55
dmsimardpabelanger: I sort of remember something related to changes in variable scopes and inheritance in 2.4... let check17:56
dmsimardpabelanger: heh, that sounds like our culprit too: https://github.com/ansible/ansible/issues/2900817:57
dmsimard"import_playbook from child directory break var scope"17:57
pabelangerjeblair: oh, that is a typo, doesn't affect things17:57
dmsimardpabelanger: bcoca explains the change here: https://github.com/ansible/ansible/issues/29008#issuecomment-33055898717:58
pabelangerclarkb: maybe? I don't know why it doesn't work17:58
pabelangerdmsimard: looking17:58
dmsimardpabelanger: tl;dr, in 2.3 vars were loaded at the start (which confuses Ansible in your case because you have one host in two groups) and in 2.4 they are loaded on demand which should have the desired behavior17:58
jeblairdmsimard: yeah, that's how i'm reading it17:59
dmsimardpabelanger: the issue is prevalent especially if you have the same hostvar in more than one group_vars18:00
dmsimardotherwise it probably doesn't reproduce18:00
pabelangerdmsimard: right, I know include is deprecated in 2.4 and should switch to new syntax, but I haven't tested that yet18:00
jeblairpabelanger: would it be very difficult for you to try your example under 2.4?18:00
pabelangerjeblair: nope, i can run that now18:00
dmsimardpabelanger: it's not a matter of using include or import, there *is* a change in how variables are loaded in 2.418:00
dmsimardpabelanger: see bcoca's comment18:01
pabelangeryes18:01
pabelangerlet me first test with include and 2.418:01
pabelangerthen, switch up to import_playbook18:02
dmsimardI don't think either matters18:02
dmsimardat least going off by what they're saying in the bug18:02
pabelangerokay, 2.4.1.0 also failed. changing some syntax18:06
pabelangerv3-inventory-group also fails using import_playbooks18:07
pabelangerdmsimard: which is what you expected18:07
pabelangerdmsimard: so, what are you thinking is the correct process?18:07
pabelangerI think it comes down to: http://paste.openstack.org/show/626981/18:09
pabelangerv3-inventory, is 2 plays (which seems to load vars properly) and v3-inventory-group is 1 play18:10
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add inventory variables for checkouts  https://review.openstack.org/52197618:21
electrofelixdoes zuul support running a specific job on failure? we need to do some parsing of data in order to report back to users on change failure?18:33
jlkI don't think we have a 'finally' type bit of a pipeline18:35
jlkthat's an interesting feature addition, that a spec would be nice for18:35
electrofelixI was hoping the failure_actions in v3 might be something that was thinking along those lines18:35
jlkin your playbook you could catch failure from a play and handle it within that job18:35
electrofelixbit difficult when any of 5 jobs launched could be the cause of the failure18:36
jlknod, where do you see failure_actions? I may have missed something18:36
mordredelectrofelix: so - one of the things on the todo list is better parsing/presentation of the logged json for each job... for instance:18:37
mordredelectrofelix: http://logs.openstack.org/05/521105/3/infra-check/openstack-zuul-jobs-linters/942626d/job-output.json.gz18:37
mordredelectrofelix: has all the base data that was used to produce http://logs.openstack.org/05/521105/3/infra-check/openstack-zuul-jobs-linters/942626d/job-output.txt.gz18:38
electrofelixit's in the zuul/model.py, I figured it might be a generic replacement for how the failure message was performed before18:38
tobiashelectrofelix: post playbooks should be executed regardless of a failed run playbook (within the same job)18:38
mordredelectrofelix: so with an html view of that, collapsing the non-error portions and expanding only the failure portion should be fairly easy18:38
tobiashif that's enough18:38
mordredandyah - what tobiash said18:38
electrofelixtobiash: we'd end up needing to copy the same code to the post playbook of all jobs (and btw, we're still using Jenkins...)18:39
mordredelectrofelix: I think I may not fully understand which thing you're trying to do?18:39
electrofelixI was hoping there might be something where we could say: on failure of any of the jobs for the change in the pipeline, run this job18:39
tobiashelectrofelix: so you're talking about zuulv2?18:40
mordredelectrofelix: what would you do in the job that runs in response to on_failure?18:40
electrofelixtobiash: yes, but also considering moving to zuulv3 (still works with Gearman)18:40
mordredelectrofelix: but you don't need to copy the same code to the post playbook of all jobs - you should be able to put the code you need in the post playbook of your base job18:41
tobiashso cleanup jobs are not there in zuul but I think there were already discussions about that some months ago18:41
mordredelectrofelix: also, did I paste you https://etherpad.openstack.org/p/zuulv3-jenkins-integration yet?18:42
electrofelixmordred: take the git tree, parse some metadata stored in the failed commit message, look up some changes further upstream (we're doing artifact promotion), and notify the source projects that produced the artifact that just failed its promotion18:42
jeblairtobiash: yes, i think we're planning on adding them shortly after 3.018:42
mordredelectrofelix: yah - you should totally be able to just do that in a post playbook on your base job - it'll have a variable that indicates whether the job failed or not, and it also has all of the git repo state available18:43
electrofelixmordred: yep, I think it's orthogonal, I possibly just need to understand a bit more about base jobs and what that means in working with gearman/jenkins18:43
mordredelectrofelix: ++18:43
mordredelectrofelix: mostly poking to make sure I understand the thing you're wanting to accomplish. you could do it with a cleanup job, but I'm pretty sure you could do it with a base job.18:45
jeblairthe difference between cleanup/base would be whether it happens once per job in a buildset, or once for the whole buildset18:45
mordredjeblair: ++18:46
mordredelectrofelix: all that said - you do know that zuulv3 isn't compatible with the jenkins gearman plugin, yeah? that's the reason I pasted that etherpad about v3/jenkins integration thoughts18:46
electrofelixbased on my hazy understanding of terminology, once for the whole buildset, we only care if any of the jobs for the change have failed, we don't care which one18:46
mordrednod. so cleanup job, once it exists, may map better for you18:47
jlkwoo, use case -> solution.18:47
jlkgo team18:47
jeblairya, and now we have 2 use cases for cleanup18:48
mordredand since all you need is the git repo state, you should be able to potentially write a nodeless cleanup job18:48
mordred\o/18:48
mordredthat means it may even be a good idea :)18:48
electrofelixmordred: I thought the additional work was to make it work better rather than it not working at all. I see there are still references to gearman in v318:50
electrofelixmordred: what is it that doesn't currently work? is it just the nodepool integration? or can zuulv3 not launch jobs on Jenkins with static slaves at all?18:55
mordredelectrofelix: oh - it still uses gearman, but it uses gearman as an internal communication mechanism, not as an interface with external systems18:55
jeblairthe zuulv3 spec may provide some background: http://specs.openstack.org/openstack-infra/infra-specs/specs/zuulv3.html18:56
jeblairalso some of the docs we wrote for the infra migration: https://docs.openstack.org/infra/manual/zuulv3.html#what-is-zuul-v318:57
jeblairin short, it handles execution and multi-node orchestration itself, via ansible.  jobs run as ansible playbooks.  those can be simple playbooks which just run tests (which is the bulk of what we do in openstack infra), but ansible gives us a lot of flexibility in interacting with other systems, so mordred's etherpad lays out a way of doing so18:59
*** jeblair is now known as thecount19:01
*** thecount is now known as jeblair19:01
mordredyah. one of the biggest bits is figuring out how to get the zuul prepared repo state onto the node that jenkins is going to run the job on - that's the reason for the handoff dance in the second part of the etherpad19:01
mordredif you're doing static nodes, then obviously the nodepool integration bit isn't as important19:01
electrofelixis it no longer possible from the node side to clone/pull from zuul merger?19:02
mordredalthough we might have to brainstorm about how to get the information about the correct static node to zuul if it's static nodes jenkins owns rather than static nodes zuul owns and passes over19:02
mordredelectrofelix: nope. they don't run git servers at all19:03
jeblairthat was mostly to facilitate the workflow where cloud resources don't have access to the control plane, so it's a push rather than a pull now19:03
electrofelixmordred: ah, well that would be a problem, I wonder if we could hack something there temporarily as otherwise it might become so difficult to migrate that no time ever gets allocated for us to help work on it19:04
mordredelectrofelix: well - once the jenkins integration stuff is done (several people want it/need it) I imagine it would be much easier for you to migrate19:05
jeblair(and as mordred mentioned, doing a first pass of the zuul-trigger-plugin without nodepool support should be a lot easier)19:05
jeblair(if you only have static nodes to worry about for the moment)19:05
mordred++19:06
* rcarrillocruz vaguely recalls having a reverse tunnel to the merger in Gozer just for that19:06
mordredrcarrillocruz: sssh. don't put that tunnel in the docs :)19:06
dmsimardpabelanger: hey sorry I went to get lunch19:06
* rcarrillocruz also has dug Gozer deep in its memory so he may be wrong19:06
rcarrillocruz:P19:07
dmsimardpabelanger: so you were not able to get group_vars to load expectedly in either cases with 2.4.1.0 ?19:07
electrofelixrcarrillocruz: I thought gozer was old enough that it still had the push merge functionality ;-)19:07
rcarrillocruzlol, we had so many hacks it's hard to remember19:07
electrofelixNeeding to hack something together to handle this reporting back to other repos on a failure in a different repo might make it difficult to persuade people it's worth writing the zuul trigger plugin, and then we'd have a more difficult time migrating19:08
rcarrillocruzlike all the proxy mesh i put to make pulls to work on the internal labs19:08
pabelangerdmsimard: both v2-inventory and v3-inventory work as expected, v3-inventory-group fails19:08
dmsimardpabelanger: so same behavior as 2.3 ??19:08
pabelangerdmsimard: right19:09
pabelangerwhich, is fine for me.19:09
dmsimardlet me try something19:09
* tobiash had fun deploying openshift a hundred times during the last two weeks19:13
electrofelixmordred jeblair: so we run a git daemon from the same container as the zuul merger instance (using supervisor), which is obviously a giant security hole for private repos, but hey let's not worry about that. Seems like that might allow us to migrate to v3 without the zuul trigger plugin for getting code onto the slaves19:16
dmsimardpabelanger: ok, FWIW I confirm the behavior -- when asking bcoca about it, it is expected behavior. The problem is basically that you have group vars for the *same* host in two groups, the last one loaded wins in that case which is ultimately defined by alphabetical order by default19:21
pabelangerright19:22
dmsimardpabelanger: but this behavior can be changed with the "ansible_group_priority" var.. I don't see it on docs.ansible.org but there's a mention of it here https://github.com/ansible/ansible/pull/2877719:22
pabelangerdmsimard: https://review.openstack.org/521324/ is my attempt to fix it19:23
jeblairelectrofelix: well, in v3 the git repos we want to put onto the workers are on the new zuul-executor server, and they're in a job-specific directory.  it, erm, would be physically possible for you to do the same sort of thing, except that it's an even larger security hole.  it's definitely not intended to be served out.  tbh, i'm not sure it'd be that much harder to do the jenkins plugin.19:23
dmsimardpabelanger: that makes sense19:23
dmsimardpabelanger: this problem hurts my brain19:23
dmsimardjeblair: I understand pabelanger's issue now19:24
dmsimardjeblair: forget about SSH, different plays or var scopes.. it's about the same *inventory host* being in two different groups, and these two groups each have a group_vars.. There has to be one group_vars that wins over the other. What pabelanger aims to fix is to provide the ability to generate different *inventory hosts* which are really the same nodepool VM as to make sure each group_vars is loaded properly19:26
dmsimardI hope that makes sense, this one hurts my brain for some reason19:26
electrofelixjeblair: the problem is selling it, it can sound much easier when it's supposedly just a script, and far more work when it's a plugin, whether it is the same amount of work to solve doesn't always figure into it...19:27
electrofelixmordred: I'll try chatting to you more about the plugin, I've a feeling I won't be able to get it to fly this side of feb, but might at least try19:29
dmsimardpabelanger: something that is worth mentioning is that this problem doesn't reproduce if you have different var names19:30
dmsimardpabelanger: we're seeing this "race" because the same var is defined in both places19:30
clarkbdmsimard: interesting so it must merge the vars together?19:30
clarkband its last overlapping name wins?19:30
dmsimardclarkb: yes, child > parent, priority  and then 'alpha sort'19:31
dmsimardThere's arguably not much else they can do19:31
dmsimardThere has to be something to resolve conflicts19:31
pabelangerdmsimard: right, I want to keep variable names, but set them to different values based on host.  I could rewrite the playbooks to use unique vars, but not something I'd like to do19:31
mordredelectrofelix: so - it's also been suggested to me that the thing I'm calling a plugin might be able to be done with a groovy script in a jenkinsfile19:32
dmsimardpabelanger: yup, just clarifying the behavior about "conflicting" group_vars19:32
pabelangerdmsimard: yah, thanks19:32
pabelangeryou explained it better then I could19:32
mordredelectrofelix: I don't really know much about those - but I bet if we put our heads together we could come up with a hacky POC approach that would do the handoffs appropriately but not involve a new plugin19:32
electrofelixmordred: yes, but it requires a system groovy script and that would just be a precursor to a plugin because you really wouldn't want to have to replicate that for every job19:33
mordredelectrofelix: nod19:34
jeblairdmsimard: yes, though i had read the comment on the bug about loading on demand as suggesting that perhaps when a host is being used because it's in a specific group, that group would win, not some arbitrary first or last group.19:35
jeblairbut i'm not arguing with reality, that was just what i was hoping for :)19:36
dmsimardjeblair: that's what I thought too, actually, which is why I was surprised to see the issue stayed there in 2.419:39
dmsimardlet me challenge upstream on that19:39
pabelangerI should also note, https://review.openstack.org/519596/ didn't actually fix the issue with var scoping as I expected. So, we could just abandon that now, if we don't see value in doing it19:40
pabelangerit still required an updated inventory file19:40
dmsimardpabelanger: added a comment on https://review.openstack.org/#/c/521324/ which summarizes what we discussed19:42
*** hasharAway is now known as hashar19:44
*** electrofelix has quit IRC19:45
dmsimardjeblair: vars are loaded "just in time" for the host, but it doesn't change the fact that when it loads the vars there is a conflict that needs to be resolved19:47
dmsimardThere's no awareness of context as to what group the play is running against vs variable loading19:48
dmsimardWhich would be awkward anyway, if you target a play against "all", you don't really know what group you're targetting19:48
jeblairya, makes sense19:49
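A tiny reproduction of the behavior being described, with made-up host/group/variable names: one host in two groups that both set the same variable in group_vars; by default the group later in alphabetical order wins, and ansible_group_priority (which has to live in the inventory itself) can override that:

    # inventory.yaml -- illustrative names only
    all:
      children:
        compute:
          hosts:
            node1:
          vars:
            ansible_group_priority: 10   # make 'compute' win despite alpha order
        switch:
          hosts:
            node1:

    # group_vars/compute.yaml
    listen_port: 6789

    # group_vars/switch.yaml
    listen_port: 12345   # wins for node1 if the priority above is removed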
openstackgerritMonty Taylor proposed openstack-infra/zuul-jobs master: Only run whereto if htaccess file exists  https://review.openstack.org/52199619:59
openstackgerritMonty Taylor proposed openstack-infra/zuul-jobs master: Only run whereto if htaccess file exists  https://review.openstack.org/52199620:00
*** jasondotstar has joined #zuul20:13
kklimondahow does zuul promote work?20:23
kklimondabased on the description I expected promote to move the given change to the top of the queue (below the currently running jobs), but either that did not happen, or the UI is just not showing that correctly.20:25
openstackgerritMerged openstack-infra/zuul-jobs master: Add general sphinx and reno jobs and role  https://review.openstack.org/52114220:29
openstackgerritMerged openstack-infra/zuul-jobs master: Add support for warning-is-error to sphinx role  https://review.openstack.org/52161820:34
jeblairkklimonda: it should move the change to the top of the queue *ahead* of the currently running jobs20:36
kklimonda@jeblair is there any way to see the internal zuul queue to see if that has happened? Or is zuul web/status.json "canonical" representation anyway?20:37
jeblairkklimonda: it's canonical20:37
jeblairkklimonda: can you describe your initial state, your promote command, and the state after running promote in more detail?20:38
kklimondasure20:38
jeblairkklimonda: (maybe use etherpad.openstack.org if it helps to write it out there)20:38
kklimondajeblair: https://etherpad.openstack.org/p/zuulv3-promote - zuulv3 web is public, and nothing critical in logs, so I just wrote it all down20:42
jeblairlooking20:42
jeblairoh it's check, i was assuming gate20:43
kklimondais this a gate-only feature? what's the difference?20:44
jeblairwithin a pipeline, there are multiple queues.  in gate (dependent pipelines), these are determined by which projects affect each other and need to be tested together.  in check (independent pipelines), all of the items are independent (ie, their ordering in the pipeline doesn't affect each other), so every item gets its own queue.20:45
kklimondaah, that makes sense20:45
jeblairso yeah, promote isn't going to do anything in that case since it's a queue of one20:45
jeblairwe could probably alter it to do something more useful in that case20:46
kklimondawould it be possible to implement that for check too? As in, how much work would that be? I'm only juggling 2 patches right now ;)20:46
kklimondas/check/independent pipelines/20:47
jeblairi don't think it would be a simple change... mostly because the behavior we get in gate comes as side effect of re-ordering the queue (the dependency stack has changed, so zuul cancels jobs and re-launches)20:48
jeblairhere's an idea though20:48
jeblairthe goal is really "get me results for this change faster", right?20:48
pabelangerI actually think if you promote a change in check, it gets moved back to the bottom of the status page20:49
pabelangerat least that is how I remember it when I tried to promote something in check many moons ago20:49
jlkpunishment!20:50
jeblairperhaps we could add a command to change the priority for a specific change.  normally priority is determined by the pipeline.  but if we had a command to say "increase the priority of this change", zuul could cancel the node request for that change, and re-issue it with the updated, higher, priority.  this would let it get the nodes faster and therefore complete faster.20:50
kklimondaright, that's how I've assumed that to be working in the first place - then I started reading the code, and got confused :)20:51
kklimondaI was missing "gate-only" part of the puzzle, now that code makes more sense20:51
jeblairnode allocation is now the dominant factor in when changes start running jobs.  the gearman queue is far less relevant now20:51
kklimondaright20:51
jeblairthis priority change would probably be a lot easier to do.20:51
kklimondawould that also affect dependent pipelines, or is promote basically doing the same thing anyway?20:52
pabelangerYa, prioirity would be nice20:52
kklimondaif it is, perhaps we could just reuse promote for both pipelines, and just make zuul do a different thing based on the pipeline type.. which sounds pretty nasty..20:53
jeblairkklimonda: it could work on dependent pipelines, but promote would still be better there, because a change at the end of the queue with jobs that have finished still won't report until the change ahead has.  though you could use priority on a set of changes in one change queue to give them advantage over a different change queue.20:54
jeblairkklimonda: yeah... i'm sort of thinking that two commands may be clearer, but maybe we should have 'zuul promote' error out on independent pipelines?20:54
jeblair("You probably don't want this, use priority instead")20:55
kklimondamhm, error out with a message about the other command could work20:55
kklimondajeblair: btw, now that the summit is over if you have time, I've reworked https://review.openstack.org/#/c/515169/ a bit20:59
kklimondaI've also had an idea how to unify autohold requests for jobs, changes and refs by making the last part of the key a regex (.* for job-wide, refs/changes/[change]/.* for changes and full ref for refs)21:01
kklimondabut before I write it I wanted someone to take a look at the current revision21:02
jeblairkklimonda: ah, thanks!  i had a successful vacation and managed to completely forget everything from before the summit.  :)21:02
dmsimardjeblair: that's quite the feat21:03
kklimondahaha, didn't know that was actually possible - tell me your secret ;)21:03
openstackgerritMonty Taylor proposed openstack-infra/zuul-jobs master: Set success-url for sphinx-docs to html  https://review.openstack.org/52201721:05
mordredkklimonda: vacationing in places where there is no internet is helpful - I did the same thing before the summit - it made the beginning of the summit fun, as I had to re-learn what a computer is21:06
dmsimardmordred, jeblair, pabelanger: By the way, static report generation in ARA might not make it into 1.0. The use case with Zuul made me realize that it /really/ doesn't scale well and I'd much rather improve the sqlite "middleware" option I came up with instead ( http://ara.readthedocs.io/en/latest/advanced.html )21:07
jeblairi deprived myself of oxygen by climbing something like 6 thousand stairs; probably caused permanent brain damage but felt great.21:07
mordredjeblair: \o/21:07
mordredjeblair: you had too many brain nuggets anyway21:07
mordreddmsimard: nod. where did we get on deploying the middleware version in openstack land?21:08
jeblairdmsimard: thanks, makes sense21:08
dmsimardThe static generation in ARA doesn't come for free, there are some constraints and hacks involved to ensure parity between the dynamic and the static version -- so improving the story around "arbitrary" sqlite databases and making the report always "dynamic" will allow for more freedom21:08
dmsimardmordred: not yet, there's reviews for logs-dev.o.o here: https://review.openstack.org/#/q/topic:ara-sqlite-middleware21:09
dmsimardI just -W'd https://review.openstack.org/#/c/513866/ because I need to double check something with the vhost setup first.21:09
kklimonda@dmsimard with sqlite middleware, would it be possible to "parse" ara reports programmatically, for example to get per-task durations?21:10
dmsimardkklimonda: when 1.0 is released, yes -- not in the current "stable" version21:10
dmsimardkklimonda: I actually discussed this last night, hang on21:11
dmsimardkklimonda: http://eavesdrop.openstack.org/irclogs/%23zuul/%23zuul.2017-11-21.log.html#t2017-11-21T04:14:3921:12
dmsimardsee for example http://paste.openstack.org/raw/626895/ which gets information about failed tasks for a particular playbook21:12
kklimondamhm, that will probably make a lot of things easier :)21:13
kklimondaI had to gather the duration of a single task across all the jobs; right now I ended up parsing html (with the power of grep and sed), but being able to just load a bunch of sqlite DBs and run queries on them would be much nicer21:14
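Something along these lines would cover that use case once per-job sqlite databases are published; the table and column names (tasks, name, started, ended) and the log path glob are assumptions here and should be checked against the actual ARA schema.

    # Sketch: pull the start/end times of one task out of many ARA sqlite DBs.
    # Table/column names and the path pattern are assumptions, not ARA's
    # documented schema.
    import glob
    import sqlite3

    QUERY = "SELECT name, started, ended FROM tasks WHERE name = ?"

    def task_durations(task_name, pattern="logs/*/ara-report/ansible.sqlite"):
        for path in glob.glob(pattern):
            conn = sqlite3.connect(path)
            try:
                for name, started, ended in conn.execute(QUERY, (task_name,)):
                    print("%s: %s ran from %s to %s" % (path, name, started, ended))
            finally:
                conn.close()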
dmsimardkklimonda: and the cool thing about the API is that the client-side implementation (that paste just now) knows how to "talk" to the API offline/internally or over HTTP REST without any changes in the implementation21:15
dmsimardso people can write "plugins" or whatever they want and it'll just work, whether I'm running it locally on my laptop without a centralized instance or if I'm sending data over http21:16
kklimondaso with this implementation anyone could write a python script that will connect to ara endpoints and query them for various data?21:16
dmsimardyes, right now the client is bundled in ara -- but the plan is to unbundle it.. like python-araclient or something. Same for the other components (webapp especially)21:17
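The shape of that could be roughly as below; the endpoint path and both client classes are invented here purely to illustrate the "same calls, different transport" point and are not ARA's actual client API.

    # Illustration only -- not ARA's real client API.
    import json
    import urllib.request

    class HttpClient(object):
        def __init__(self, endpoint):
            self.endpoint = endpoint

        def get(self, path):
            with urllib.request.urlopen(self.endpoint + path) as resp:
                return json.loads(resp.read().decode("utf-8"))

    class OfflineClient(object):
        def __init__(self, api_app):
            self.app = api_app  # hypothetical in-process API object

        def get(self, path):
            return self.app.dispatch("GET", path)  # hypothetical dispatch

    def failed_tasks(client, playbook_id):
        # The caller does not care which client it was handed.
        return client.get("/api/v1/tasks?playbook=%s&status=failed" % playbook_id)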
dmsimardIt's not 100% clear yet how the API will end up being restricted (or not).. I'm not interested in the business of handling credentials, passwords, permissions, ACLs/RBAC, etc. This might be an exercise left to the operator -- to restrict through a webserver or something.21:18
kklimondaright now there is no RBAC etc. anyway, right?21:20
dmsimardRight, but there's also no API and the interface is 100% passive21:20
kklimondaanyone can just pull static files and have their sanity tested by parsing them with regexes21:20
dmsimardThe interface in 1.0 remains 100% passive, but you can POST/PATCH/DELETE through the API21:20
kklimondahum21:20
kklimondawhat would be the use case for making changes to the already generated report?21:21
kklimonda(I'm probably missing something obvious, I only see ARA as a tool to display zuul job results right now :))21:21
dmsimardMostly things that you don't know until later21:22
dmsimardFor example, we might want to create a record in the database for a task21:22
dmsimardand then update it later once we know if it failed or passed21:22
dmsimardara itself isn't really going to be modifying historical data, but the ability is there -- the api is super generic21:23
dmsimardkklimonda: https://github.com/ansible/ansible/blob/devel/lib/ansible/plugins/callback/default.py might give some context around how things work21:24
dmsimardkklimonda: ara is a callback plugin that leverages each of these hooks (v2_playbook_on_start, v2_runner_on_failed, etc.) and in some circumstances you want to circle back to an event that started (a task) and "finish" it (mark it as successful)21:25
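A minimal sketch of that pattern, stripped of anything ARA-specific: the v2_* hook names and signatures are Ansible's real callback entry points, while save_task() and finish_task() are stand-ins for whatever the plugin does with its database.

    # Sketch of "open a record when the task starts, finish it when the
    # result arrives".  Only the hooks are Ansible's; storage is a placeholder.
    from ansible.plugins.callback import CallbackBase

    def save_task(name):
        # placeholder: a real plugin would insert a row in its database
        return {'name': name, 'status': None}

    def finish_task(record, status):
        # placeholder: update the row once the outcome is known
        record['status'] = status

    class CallbackModule(CallbackBase):
        CALLBACK_VERSION = 2.0
        CALLBACK_TYPE = 'notification'
        CALLBACK_NAME = 'record_sketch'

        def __init__(self):
            super(CallbackModule, self).__init__()
            self.current = None

        def v2_playbook_on_task_start(self, task, is_conditional):
            self.current = save_task(task.get_name())

        def v2_runner_on_ok(self, result):
            finish_task(self.current, 'ok')

        def v2_runner_on_failed(self, result, ignore_errors=False):
            finish_task(self.current, 'failed')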
dmsimardmordred: btw "select count(*)" is stupid slow for a number of reasons and I kind of want to keep numbers about the amount of data processed by ara. How do you feel about just selecting the last row and getting the id instead? like "select id from table order by id desc limit 1"? It's going to be inaccurate if you end up deleting data but it's not really a lie in the sense that ara did process those21:31
dmsimardI totally hacked something to make it run faster with sqlalchemy (thank you anonymous stackoverflow person) but it's still way too slow21:32
mordreddmsimard: have you tried "select count(id)" ?21:33
mordreddmsimard: select count(*) has a special optimization in mysql that makes it fast21:33
mordreddmsimard: but if you select count(id) sqlalchemy _should_ be able to use the index on the primary key21:34
dmsimardI don't remember, I do know that it is a fairly well documented issue that select count is slow in sqlalchemy21:34
mordrednod. well - getting the highest value from an auto increment int primary key column should be good enough21:34
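For reference, the two approaches side by side with SQLAlchemy; `model` here stands for any mapped class with an auto-increment integer primary key, and max(id) only approximates "rows processed" once rows get deleted, which is the trade-off being accepted.

    # Sketch comparing an exact count with the "highest id" approximation.
    from sqlalchemy import func

    def exact_count(session, model):
        # SELECT COUNT(id) -- can be slow on large InnoDB tables.
        return session.query(func.count(model.id)).scalar()

    def approximate_count(session, model):
        # Equivalent to SELECT id FROM table ORDER BY id DESC LIMIT 1.
        return session.query(func.max(model.id)).scalar() or 0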
dmsimardmordred: that special optimization is in innodb ?21:35
mordreddmsimard: oh - actually, I think it's just in myisam - trying to remember - it's been a few years since my consulting days and it gets hazy21:37
dmsimardheh21:37
dmsimardI vaguely remember doing repeated "show table status" on innodb tables and the row count varying wildly21:38
jeblairmordred: maybe you're at the phase now where you can only tune mysql while drunk21:38
dmsimardoh look it's explained here21:38
dmsimard The number of rows. Some storage engines, such as MyISAM, store the exact count. For other storage engines, such as InnoDB, this value is an approximation, and may vary from the actual value by as much as 40 to 50%. In such cases, use SELECT COUNT(*) to obtain an accurate count.21:38
dmsimardThe Rows value is NULL for tables in the INFORMATION_SCHEMA database.21:38
mordreddmsimard: yah - there it is21:38
dmsimardgood to know21:39
mordredand https://www.percona.com/blog/2007/04/10/count-vs-countcol/ explains how innodb will do which type of scan in which cases21:39
mordredalso https://www.percona.com/blog/2006/12/01/count-for-innodb-tables/21:40
mordreddepending on how much you want to know :)21:40
dmsimardIt's been at least 2 years since I've actively tuned mysql but it's still fun :)21:41
*** jkilpatr_ has quit IRC21:47
*** threestrands has joined #zuul21:48
*** jkilpatr has joined #zuul22:05
*** hashar has quit IRC23:22
