Thursday, 2017-03-16

openstackgerritJamie Lennox proposed openstack-infra/zuul feature/zuulv3: Add entry for ref-replication-scheduled event  https://review.openstack.org/446251  00:47
openstackgerritK Jonathan Harker proposed openstack-infra/zuul feature/zuulv3: WIP: Reenable test_build_configuration_conflict  https://review.openstack.org/446275  02:09
*** jasondotstar has quit IRC03:25
*** jasondotstar has joined #zuul03:26
*** leifmadsen has quit IRC03:26
*** leifmadsen has joined #zuul03:27
jamielennoxis it possible/are there any examples of shipping custom ansible modules in a common-config? what's the path?03:34
openstackgerritJamie Lennox proposed openstack-infra/zuul feature/zuulv3: Re-add the ability to set username on zuul-executor  https://review.openstack.org/446308  05:58
*** isaacb has joined #zuul06:26
openstackgerritTristan Cacqueray proposed openstack-infra/nodepool master: Add default logging file argument  https://review.openstack.org/446324  07:02
*** bhavik1 has joined #zuul07:03
*** bhavik1 has quit IRC07:11
*** isaacb has quit IRC07:20
*** isaacb has joined #zuul07:23
*** hashar has joined #zuul07:43
openstackgerritTobias Henkel proposed openstack-infra/zuul feature/zuulv3: Change mutex to counting semaphore  https://review.openstack.org/442563  08:36
*** hashar has quit IRC08:42
*** hashar has joined #zuul08:53
*** hashar has quit IRC08:55
*** hashar has joined #zuul08:59
*** Cibo_ has quit IRC08:59
*** Cibo_ has joined #zuul09:04
*** Cibo_ has quit IRC09:08
*** isaacb has quit IRC09:16
*** isaacb has joined #zuul09:17
pabelangermorning13:07
mordredmorning pabelanger13:09
*** hashar is now known as hasharNap13:30
*** isaacb has quit IRC13:58
*** isaacb has joined #zuul14:11
pabelangerjeblair: do you mind looking at 446095 and 446110 this morning? Shrews patches to bring nl01.o.o back online14:14
*** Cibo_ has joined #zuul14:39
openstackgerritMerged openstack-infra/nodepool feature/zuulv3: Reset lost requests  https://review.openstack.org/446095  14:40
openstackgerritMerged openstack-infra/nodepool feature/zuulv3: Deallocate ready nodes with no requests  https://review.openstack.org/446110  14:42
*** hasharNap is now known as hashar14:43
Shrewspabelanger: let me know when you're ready to restart nl01. want to watch things14:47
jeblairpabelanger, Shrews: done :)14:47
jeblairjamielennox: i don't think we've experimented with that yet14:47
pabelangerjeblair: danke14:47
pabelangerShrews: Sure, I'm ready once we update the git repo on disk, so 15mins or so14:48
Shrewspabelanger: there are currently 5 nodes (the max) allocated to non-existing requests. They should get deleted at startup. The first 4 requests are PENDING and they should get reset to REQUESTED.14:51
*** isaacb has quit IRC14:51
* Shrews crosses fingers14:51
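Loosely, what the two patches above (reset lost requests, deallocate ready nodes with no requests) are expected to do at startup; this is illustrative code only, not the actual nodepool implementation:

    def recover_at_startup(nodes, requests):
        """Illustration only: free nodes tied to vanished requests and
        re-queue requests left PENDING by a dead launcher."""
        live_request_ids = {r['id'] for r in requests}
        for node in nodes:
            if node.get('allocated_to') and node['allocated_to'] not in live_request_ids:
                node['allocated_to'] = None   # deallocate so it can be reused or cleaned up
        for req in requests:
            if req['state'] == 'pending':
                req['state'] = 'requested'    # a launcher will pick it up again

    # tiny demo with plain dicts standing in for znodes
    nodes = [{'id': '0000000584', 'allocated_to': '100-0000001999'}]
    requests = [{'id': '100-0000002088', 'state': 'pending'}]
    recover_at_startup(nodes, requests)
    assert nodes[0]['allocated_to'] is None and requests[0]['state'] == 'requested'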
*** isaacb has joined #zuul14:51
pabelangerack14:53
mordredjeblair, jamielennox: one can put custom modules in roles, and we do deal with roles -so I'd hope we can just get it for free by putting modules in roles14:54
mordredShrews: I wish you luck14:55
jeblairmordred, jamielennox: agreed -- and i think we've implemented enough for it to work at this point if someone wants to try it out14:57
jeblairmordred, jamielennox: however, we do still have an outstanding bug (there's a story) where we may not put roles in the correct security context, so be aware of that.14:58
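For what it's worth, Ansible's own convention is that a role may carry custom modules in a library/ directory, so a config repo could in principle ship something like the sketch below. This is untested against the v3 executor, and the repo, role, and module names are made up.

    #!/usr/bin/python
    # common-config/roles/example-role/library/example_greeting.py  (hypothetical layout)
    from ansible.module_utils.basic import AnsibleModule

    def main():
        module = AnsibleModule(argument_spec=dict(
            name=dict(type='str', required=True)))
        # a module shipped inside a role is callable from that role's tasks
        module.exit_json(changed=False, greeting='hello %s' % module.params['name'])

    if __name__ == '__main__':
        main()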
*** Cibo_ has quit IRC15:00
Shrewsjamielennox: hrm, this json error on your nodepool change makes no sense to me: http://logs.openstack.org/75/445675/3/check/nodepool-coverage-ubuntu-xenial/45b42f1/console.html#_2017-03-15_20_22_33_272050  15:08
Shrewsjamielennox: maybe some sort of race condition we (or kazoo) do not account for15:09
ShrewsNode znodes should always have data15:09
*** isaacb has quit IRC15:10
pabelangerHmm, I might need some help. Going a little crazy as to why e1c26d56b541f (^446110) is not the tip of feature/zuulv3: http://git.openstack.org/cgit/openstack-infra/nodepool/log/?h=feature/zuulv3  15:14
pabelangerbut is locally for me15:14
pabelangerbecause, nl01.o.o /opt/nodepool is also out of sync with local15:15
Shrewspabelanger: hrm, a fresh 'git pull' for me does not get those two latest merges15:16
pabelangerI think something might be up with our git farm15:17
Shrewsyeah15:17
pabelangermoving to #openstack-infra15:17
jeblairokay, that's recovering now16:04
pabelangerconfirmed, nl01.o.o updating now16:05
pabelangerShrews: let me know when you are ready16:06
Shrewspabelanger: ready. do you need to 'pip install' again since that didn't happen last time?16:07
pabelangerShrews: no, all fixed.16:07
Shrewsk16:07
pabelangerokay, starting nodepool-launcher16:07
pabelangercool, nodes were deleted16:08
Shrewsyep16:08
Shrewspending requests being processed16:09
pabelangerOh16:10
pabelangerheh16:10
pabelangerwe'll need to restart zuul to pick up new executor changes too16:10
pabelangerwe'll let the queue clear out first16:10
pabelangerShrews: I do see a locked ready node however16:12
Shrewspabelanger: that can be normal16:12
Shrewsthey get locked satisfying requests16:13
* Shrews tests the new 'request-list' command16:14
pabelangerokay, I'm going to stop zuul-launcher and update to zuul-executor. Jobs are getting requeued until they hit the RETRY_LIMIT16:15
Shrewspabelanger: umm, i'm confused why some nodes have AZ 'nova' and others are blank? if there are AZ's, we should list them in nodepool.yaml16:15
Shrewsoh, they're building. nm16:15
Shrewsjeblair: we should probably add node.az to the zuul model definition. seems we lose that value when you set the state to IN_USE16:16
clarkbfwiw I don't think we have any clouds where az currently matters16:17
clarkb(but others may and future might)16:17
jeblairShrews: yeah, though also, we should probably have the model classes transparently keep any unknown values, so we don't stomp on them.16:17
Shrewsjeblair++16:17
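A minimal sketch of the "keep unknown values" idea: anything the model does not recognize when reading a znode is stashed and written back out, so one component cannot stomp on fields another component set. The class and field names here are illustrative, not the real nodepool/zuul model code.

    import json

    class NodeModel(object):
        KNOWN = ('state', 'az', 'allocated_to')

        def __init__(self):
            self.state = None
            self.az = None
            self.allocated_to = None
            self._extra = {}                  # anything we did not recognize

        def fromDict(self, d):
            for k, v in d.items():
                if k in self.KNOWN:
                    setattr(self, k, v)
                else:
                    self._extra[k] = v        # keep it instead of dropping it
            return self

        def toDict(self):
            d = dict(self._extra)             # write unknown keys back out
            for k in self.KNOWN:
                d[k] = getattr(self, k)
            return d

    # round trip: 'hostname' survives even though this model does not know about it
    n = NodeModel().fromDict(json.loads('{"state": "in-use", "hostname": "x"}'))
    assert n.toDict()['hostname'] == 'x'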
Shrewspabelanger: hrm, seems we have a lock now.  :(16:19
Shrewsjeblair: many nodes stuck in IN_USE16:21
Shrewsand locked16:21
jeblairwho owns the lock?16:21
jeblairlooks like zuul16:22
pabelangerokay, zuulv3-dev started again16:23
jeblairwait what?16:23
Shrewsoh, well16:23
jeblairwhy are we restarting things?16:23
pabelangerto use new zuul-executor16:24
pabelangershould I have not done that?16:24
jeblairwe were trying to debug some stuck locks16:24
jeblairit's possible the executor restart made them stuck16:24
jeblairi think we still have a bug with that16:24
jeblairpabelanger: if you restarted the executor first, then the main daemon, that may have been the cause16:25
Shrewsnodepool-launcher is still hung up16:25
pabelangerapologies, I missed the locks part16:25
jeblairpabelanger: did you configure gerrit debug logging?16:25
jeblairin zuul16:25
pabelangerjeblair: no, I did rename the launcher-logging.conf file to executor-logging.conf16:26
pabelangerchecking it now16:26
Shrewspabelanger: jeblair: can we shutdown both things now so I can have a stable ZK state to investigate current issues?16:26
pabelangerI can stop executor again16:26
jeblairgo for it16:26
jeblairi think i know the logging problem16:26
pabelangerexecutor stopped16:27
*** Cibo_ has joined #zuul16:27
Shrewshrm, i think we may be hitting some znode versioning issues16:31
Shrews2017-03-16 16:22:31,198 DEBUG nodepool.NodeRequestHandler: Locked existing node 0000000584 for request 100-0000002088  16:31
Shrewsyet, when I looked at that in zk-shell, the 'allocated_to' was still set for a lost request16:32
Shrewsonly after zuul-exe was stopped did the value of allocated_to change to 2088  16:32
Shrewsafter n-l was stopped, i guess16:33
jeblairpabelanger: did you only stop the executor?16:34
jeblairpabelanger: i thought Shrews asked for both components to be stopped.  zuul-server is the only part that talks to nodepool.16:34
pabelangerjeblair: I also stopped / started zuul too16:34
pabelangerzuul is still running, I can stop that16:35
jeblairi see zuul-scheduler running right now16:35
jeblairShrews: you have not had a stable environment16:35
pabelangeryes, I restarted it when I did executor, but have not stopped it16:35
Shrewsyeah, wanted both stopped. just so as not to fill the queue16:35
jeblairpabelanger: why don't you go ahead and stop it16:36
pabelangerscheduler stopped too16:36
Shrewsso, i need to read up on znode versioning a bit. i'm really confused here16:36
Shrewsbecause n-l did what it was supposed to do, but the data it was reading was not what was expected16:37
*** bhavik1 has joined #zuul16:37
pabelangerwhile I am looking at launcher-debug on nl01.o.o, does it make sense to expose our cleanup timer to nodepool.yaml (like we did before)? I think 5 seconds might be a little aggressive16:44
jeblairpabelanger: i do not think it should be exposed as an option because it doesn't make sense for a user to tune it.  we should hardcode it to a good value.16:45
Shrewspabelanger: We want deleted nodes to be cleaned up quickly (thus the short time), but I was thinking it might be a good idea to have that as a separate thread from all of the other cleanup that can be less aggressive16:45
jeblair++16:46
pabelangerAh, I see16:46
jeblairthough, another option would be to set watches on nodes so that we get immediate notification of when to clean them up.16:46
jeblairthen all of the periodic cleanup tasks can be less aggressive16:47
pabelangerI think the current ServerList that happens every 5 seconds is from our _cleanupLeakedInstances function. Today, we do 60 seconds on nodepool.o.o16:47
jeblairpabelanger: on master we get immediate notification16:48
Shrewswatches are another option16:48
pabelangernot a big deal, was just trying to see what nl01.o.o was doing with all the ServerList calls16:48
Shrewsharlowja: Can you think of any reason why reads from ZK would not return the latest version of a znode?16:48
Shrewspabelanger: i'd be fine with first separating out into a separate thread, then doing the watches later16:49
jeblairthat seems like a tractable plan16:50
pabelangerwould NodeCleanupWorker() launch the new thread or Nodepool()?16:51
jeblairpabelanger, jhesketh: https://review.openstack.org/224336 invalidated our logging configuration.16:51
jeblairpabelanger: is puppet running on zuulv3-dev?16:51
pabelangerjeblair: no, I made the changes manually16:51
jeblairpabelanger, jhesketh: ok, i'm going to start manually making a new logging config there, and we can copy it into puppet for master later16:52
pabelangerwfm16:52
Shrewspabelanger: i envisioned NodePool doing it (a new DeletedNodeWorker thread and rename NodeCleanupWorker to CleanupWorker)16:53
Shrewspabelanger: if you want to take a stab at it, go for it!16:53
pabelangerShrews: sure, would be good to try more threading programming16:54
pabelangerShrews: would _cleanupNodeRequestLocks() and _cleanupNodes() go into DeletedNodeWorker?16:59
*** hashar has quit IRC17:00
Shrewspabelanger: yes, though I might consider moving the new check in _cleanupNodes() for the allocated_to stuff into the less aggressive thread17:02
Shrewsnot sure that's necessary. haven't given too much thought to it yet17:03
pabelangerShrews: okay, let me propose this over a few patches, so I better understand things.17:03
Shrewspabelanger: the leaked node stuff should go into the less aggressive thread, obviously. that's where the server listing stuff is coming from17:05
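Roughly the thread split being proposed: one aggressive worker that only tears down already-deleted nodes, and one slower worker for the leaked-instance scan that is generating the server listings. The names follow the discussion; the bodies are placeholders, not the actual nodepool code.

    import threading
    import time

    class BaseCleanupWorker(threading.Thread):
        def __init__(self, interval):
            super(BaseCleanupWorker, self).__init__()
            self.daemon = True
            self._interval = interval
            self._running = True

        def run(self):
            while self._running:
                self._work()
                time.sleep(self._interval)

        def stop(self):
            self._running = False

    class DeletedNodeWorker(BaseCleanupWorker):
        """Aggressive: only tear down nodes already marked for deletion."""
        def _work(self):
            pass  # placeholder for _cleanupNodes() / _cleanupNodeRequestLocks()

    class CleanupWorker(BaseCleanupWorker):
        """Less aggressive: leaked instances, provider server listings, etc."""
        def _work(self):
            pass  # placeholder for _cleanupLeakedInstances()

    # e.g. NodePool could start both with different intervals:
    workers = [DeletedNodeWorker(interval=5), CleanupWorker(interval=60)]
    for w in workers:
        w.start()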
jeblairpabelanger, Shrews: okay i have a new logging config staged for when we're ready to restart zuul17:05
Shrewsjeblair: i'm stumped on this versioning stuff17:05
harlowjaShrews reads from how many nodes?17:05
Shrewsmaybe my assumption is wrong17:06
Shrewsharlowja: umm, what? from "nodes"?17:06
harlowjasorry, how many zookeeper servers17:06
Shrewsharlowja: 117:06
harlowjaoh17:06
harlowjathen, nope can't think of a reason17:07
harlowjalol17:07
Shrews*sigh*17:07
Shrewsok, maybe this is just a bug somewhere then17:07
Shrewsin my code, that is17:07
harlowjaif > 1 i think u can get stale reads (due to how the commit/blah blah algorithm works)17:07
Shrewsobviously a bug  :)17:07
* Shrews tries a new test case17:08
* harlowja trying to find if there are any kazoo folks around anymore17:13
harlowjai starting to think i'm the last one left in #zookeeper17:13
harlowjalol17:13
Shrewsah ha. found at least one bug17:14
Shrewsthis may be the main thing. i've been assuming zk-shell reads the latest version of things. maybe not?17:19
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Unpause when we grab a pre-ready node  https://review.openstack.org/446642  17:20
harlowjait should17:20
harlowjadidn't think zk-shell was caching anything17:20
Shrewsjeblair: pabelanger: ^^^17:20
harlowjaShrews https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkGuarantees  17:20
harlowjamay just be useful to read17:20
harlowja`Sometimes developers mistakenly assume one other guarantee that ZooKeeper does not in fact make. This is:17:20
harlowjaSimultaneously Consistent Cross-Client Views17:20
harlowja    ZooKeeper does not guarantee that at every instance in time, two different clients will have identical views of ZooKeeper data.`17:20
*** bhavik1 has quit IRC17:20
harlowjait will guarantee `Sequential Consistency`17:21
harlowjaand blah blah17:21
Shrewsharlowja: hrm, except our clients (threads in this case) share the same ZK connection, so....17:22
harlowjak17:23
Shrewsthat is, indeed, interesting though.17:23
Shrewswe could see differences between zuul and nodepool b/c of that17:23
harlowjaya, there might be a way to tweak this to make it happen17:24
jeblairShrews: +217:24
Shrews"The clients view of the system is guaranteed to be up-to-date within a certain time bound (on the order of tens of seconds)."17:25
Shrewstens of seconds?17:25
Shrewswonder why so high17:26
harlowjalikely the time the server has before it kicks itself out of the zookeeper cluster, lol17:26
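If stale reads between clients ever did turn out to be the culprit, ZooKeeper's escape hatch is sync(): it forces the server a client is connected to to catch up with the leader before the next read. A hedged kazoo sketch (the node path is just the example from the log above); a single-server setup like this one should not need it.

    from kazoo.client import KazooClient

    zk = KazooClient(hosts='127.0.0.1:2181')
    zk.start()

    path = '/nodepool/nodes/0000000584'   # example path, see the log lines above
    zk.sync(path)                         # wait until this server has the leader's latest state
    data, stat = zk.get(path)             # this read now reflects the newest version
    print(stat.version, data)
    zk.stop()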
jesusaurjeblair: yesterday I started poking test_build_configuration_conflict() in test_scheduler.py, and I discovered that merge conflicts are getting reported as success. Should a merge failure call sendWorkFail instead of sendWorkComplete? or is there another way the merger should notify the server about the merge failure?17:26
harlowjaShrews https://github.com/coreos/etcd/issues/741 is an interesting read also17:28
jeblairjesusaur: i think they should report sendworkcomplete but with a null commit, or maybe a boolean flag?17:29
harlowjapretty sure (though not 100% sure that etcd does the same, in that reads may return slightly different versions of the data, depending on the server the client is connected to)17:29
jesusaurjeblair: ok, so then something should check for the null commit and/or merged=False afterwards?17:31
jeblairjesusaur: hrm, i *thought* we had re-added merge checking on all changes, but it looks like that hasn't happened yet.  i think someone was working on it but i guess they dropped off.17:34
jeblairjesusaur: the missing code is in I14e3d96cadec6e4b4cca66137071e8ed67f161a1.  things related to prepareRef17:36
SpamapSShrews: it's ok for Zuul and Nodepool to see different views, as long as they both eventually see the same view.17:37
harlowjaalso https://aphyr.com/posts/313-strong-consistency-models17:37
jeblairthe main thing is whether locking works :)17:37
harlowjadistributed systems are hard17:38
harlowjalol17:38
jeblairjesusaur: but prepareLayout does very similar work now; so i think the result needs to be a combination of the two.17:39
jeblairjesusaur, SpamapS: sorry my evaluation of that turned out to be wrong.17:40
jesusaurjeblair: ah, thanks for the breadcrumbs, I'll dig through this a bit more17:43
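The shape being suggested, very roughly: the merger always completes the gear job and carries the outcome in the payload, and the scheduler treats a null commit or merged=False as a merge failure rather than a crashed merger. The payload field names below are made up for illustration.

    import json

    def report_merge_result(job, commit, merged):
        """Merger side: always sendWorkComplete; put the outcome in the payload."""
        payload = dict(merged=merged, commit=commit)
        job.sendWorkComplete(json.dumps(payload))   # job assumed to be a gear worker job

    def merge_succeeded(payload):
        """Scheduler side: a null commit or merged=False means the merge failed."""
        data = json.loads(payload)
        return bool(data.get('merged')) and data.get('commit') is not None

    assert merge_succeeded('{"merged": true, "commit": "abc123"}')
    assert not merge_succeeded('{"merged": false, "commit": null}')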
openstackgerritPaul Belanger proposed openstack-infra/nodepool feature/zuulv3: Rename NodeCleanupWorker to DeletedNodeWorker  https://review.openstack.org/446649  17:43
openstackgerritPaul Belanger proposed openstack-infra/nodepool feature/zuulv3: Create BaseNodeWorker class  https://review.openstack.org/446650  17:43
openstackgerritPaul Belanger proposed openstack-infra/nodepool feature/zuulv3: Split DeleteNodeWorker into two threads  https://review.openstack.org/446651  17:43
SpamapSjeblair: can you disambiguate "that" in your message for me?17:46
ShrewsSpamapS: yep, "eventually" consistent is what i'd expect17:49
jeblairSpamapS: the area that jesusaur is working on -- i thought we had re-added the merge conflict detection, but i think that work got dropped.17:51
rbergeronshrews, jeblair, spamaps, pabelanger, anyone else full of comments: I've got the draft of the mail update here -- https://etherpad.openstack.org/p/zuul-v3-first-status-update (at the top, the previous version is at the bottom)17:51
rbergeron(yes, i know i said i would do this on my own but just want to double check if anyone has anything specific they want to highlight)17:51
Shrewsrbergeron: i'm full of something...17:51
rbergeronshrews: me too. cheetos puffs, to be specific17:52
Shrewsrbergeron: i added words (at the same time as someone else)17:53
jeblairo/17:53
SpamapSjeblair: OH! ACK17:53
jeblairSpamapS: so that probably invalidates some of my evals in that etherpad17:54
SpamapSjeblair: well we can always fix that. :)17:54
jeblairrbergeron: lgtm thanks!17:55
jesusaurSpamapS: and with any luck, I will!17:56
jeblairrbergeron, pabelanger: we could mention the launcher rename17:56
jeblairah, pabelanger is on that :)17:56
Shrewsjeblair: perhaps we should reference your email for that syntax change?17:56
*** Cibo_ has quit IRC17:57
pabelanger++17:57
jeblairShrews: good idea: http://lists.openstack.org/pipermail/openstack-infra/2017-January/005018.html  17:57
jeblairrbergeron: ^17:57
Shrewsadded.17:58
pabelangerrbergeron: updates left, looks good17:59
rbergeronpabelanger, shrews, jeblair: merci beaucoup; looks like y'all added all those things above into the mail directly too (thanks!). will ship it out :)18:15
jeblairShrews: nodepool fix merged18:23
Shrewsjeblair: yup. pabelanger, ready for restart when nl01 updates18:24
Shrewssomebody restart it?18:25
Shrewsb/c it's now running... but i don't think the code updated18:25
Shrewsoh, it's been running18:26
ShrewsO.o18:26
pabelangerI haven't done anything yet18:26
Shrewsit was running. i stopped it18:26
Shrewsnot sure why18:27
pabelangerlet me check if puppet has run18:27
Shrews/opt/nodepool hasn't updated18:27
pabelangerconfirmed18:27
pabelangerI can kick.sh it18:28
Shrewspabelanger: can i manually 'git pull', or does that muck things up18:28
pabelangerShrews: actually18:28
pabelangerShrews: /opt/system-config/production/tools/kick.sh nl01.openstack.org  18:28
pabelangerfrom puppetmaster as root18:28
Shrewsi'm not touching puppet things. you can do it, or we can wait  :)18:28
pabelangerk, running18:28
pabelangerShrews: okay, restart when you are ready18:30
pabelangerwe also need to restart zuulv3-dev.o.o18:31
Shrewspabelanger: go for it. the request pool drained18:31
pabelangerstarted18:32
pabelangerk, we have no changes in queue18:33
pabelangerlet me add one18:33
pabelangerjust rechecked 441332  18:34
pabelangerOh, heh18:34
pabelangerconfigure_mirror.sh is failing, since we removed /etc/nodepool/ folder18:34
pabelangerbut, I do see an issue18:34
jeblairyeah, i reckon it's time to reimplement that in ansible18:35
pabelangerjobs should be set to failing, but they are requeuing due to exit code 218:35
pabelangerneed to see why that is a 218:35
Shrewsnodepool-launcher seemed to do its thing. will need some watching though18:37
jeblairpabelanger: that's a pre-playbook, right?18:37
jeblairpabelanger: we re-queue for pre playbook failures18:37
ShrewsThese nodepool bugs over the last few days were fantastic. Nothing like the drizzle savepoint bug that drove me crazy for weeks and made me curse mordred for getting me involved, but interesting nonetheless18:39
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add secret top-level config object  https://review.openstack.org/446156  18:41
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add per-repo public and private keys  https://review.openstack.org/406382  18:41
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Associate secrets with jobs  https://review.openstack.org/446687  18:41
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Decrypt secrets and plumb to Ansible  https://review.openstack.org/446688  18:41
Shrewswooo, requests18:41
ShrewsMOAR18:42
jeblairi'll type faster!18:42
* Shrews watches the debug log and sees the matrix18:43
pabelangerjeblair: ya, pre-run. thanks18:46
Shrewsstill chugging... looking pretty good18:48
Shrewsjeblair: w00t. it scoffed at your attempts to throw it for a loop (literally, a loop)18:49
jeblairheh18:49
pabelangerjeblair: what are your thoughts on how we get nodepool (provider / cloud / region / az) into zuul? Is it possible to get some of the data out of zookeeper?18:49
Shrewsthat question confuses me18:50
pabelangerI somehow need to access the following data (that we deleted from nodepool) from zuul.  https://review.openstack.org/#/c/445572/  18:52
pabelangerwe use it to build up our mirror information18:52
pabelangeras an example: http://mirror.dfw.rax.openstack.org/  18:52
Shrewsoh, ok. less confused18:52
jeblairah yeah, we should be able to get all of that.  what's the best way to give it to ansible?  in vars.yaml?  or should it be in the inventory file?18:53
pabelangervars would be the easiest18:54
pabelangerbut18:54
* Shrews uses this time of less brokenness to step away for a bit. bbl18:54
pabelangerdo we maybe want to setup a FQDN in inventory?18:54
jeblairit's per-host information18:54
pabelangerthen extract the info from it?18:54
jeblair(i mean, those things should generally all be the same, but it feels like we should be able to support them being different)18:54
pabelangeraccording to docs there is inventory_hostname (which could be FQDN) and inventory_hostname_short, which is up to first dot18:59
pabelangerso, adding it to inventory could be an option18:59
jeblairpabelanger: wait, are you suggesting we make up a fqdn for the hosts?18:59
pabelangerjeblair: Ya, but that might be too specific to openstack-infra19:00
jeblairyes19:01
jeblairpabelanger: what i'm asking is whether it's possible to add host-specific metadata to the inventory19:01
jeblairor whether we have to put that in variables and then cross-reference them19:01
jeblairhttp://docs.ansible.com/ansible/intro_inventory.html#host-variables  19:01
jeblairthat seems to say yes19:01
pabelangerjeblair: right, what is the difference, then, from just adding them into vars.yaml?19:02
jeblairpabelanger: so it makes the most sense to me to add "nodepool_provider", etc, to host variables.  then each invocation of a 'setup mirror' role would be correct on its local host, i assume.19:02
jeblairpabelanger: we can have many hosts.  some values like this may end up being different some day.19:03
pabelangerI see19:04
pabelangerI don't have a preference atm, so I'll defer to you19:04
jeblairhost variables it is then19:05
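To make the host-variables option concrete: the metadata rides on each host's own inventory line, so a role can simply reference {{ nodepool_provider }} without any cross-referencing by hostname. A sketch of what a getHostList()-style helper might emit; the variable names follow the discussion and are not necessarily what the final patch uses.

    def inventory_line(node):
        """Render one static-inventory line with per-host nodepool metadata."""
        host_vars = dict(
            ansible_host=node['interface_ip'],
            nodepool_provider=node.get('provider', ''),
            nodepool_region=node.get('region', ''),
            nodepool_az=node.get('az', ''),
        )
        rendered = ' '.join('%s=%s' % (k, v) for k, v in sorted(host_vars.items()))
        return '%s %s' % (node['name'], rendered)

    node = dict(name='ubuntu-xenial-123', interface_ip='203.0.113.7',
                provider='rax', region='dfw', az='nova')
    print(inventory_line(node))
    # ubuntu-xenial-123 ansible_host=203.0.113.7 nodepool_az=nova nodepool_provider=rax nodepool_region=dfw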
jeblairpabelanger: i'm heading to lunch now; it's all yours if you want to work on that19:06
pabelangerjeblair: sure, I can try19:06
Shrewspabelanger: i want to bike shed on https://review.openstack.org/446650  :)19:38
pabelangerShrews: sure!19:40
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Decrypt secrets and plumb to Ansible  https://review.openstack.org/446688  20:04
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Associate secrets with jobs  https://review.openstack.org/446687  20:04
pabelangerjeblair: I might need some help with putting vars into inventory, it looks like we need to update test/base.py to properly populate getHostList()20:13
pabelangerright now vars.yaml looks to be the easier route :)20:14
pabelangerShrews: no objection, fixing20:14
jeblairpabelanger: do you think that vars.yaml and inventory variables are equivalent?20:17
pabelangerjeblair: good question, I don't really use inventory files to store variables. Maybe jlk is better to ask20:18
jeblairpabelanger: i do not, and that's what i was trying to convey earlier20:18
pabelangerMaybe I just haven't had a need for them until now20:18
jeblairpabelanger: we have a bunch of hosts.  we have metadata for them.  ansible has a way of associating metadata with hosts.  that seems the right way to do it.20:18
jeblairpabelanger: if we stick the information in vars, then we have to cross-reference with some index, like the hostname20:19
pabelangerjeblair: yes, I agree in that case inventory variables make sense.20:19
jeblairpabelanger: so the vars would look like: "hosts: compute1: provider: rax"  and then, the role would have to do something like "{{hosts.$fqdn.provider}}" i'm not sure what that actually looks like in jinja20:20
jeblairpabelanger: but if we put it in the inventory, then the role just says "{{nodepool_provider}}" because it should be set already, if i'm following the docs20:21
pabelangerjeblair: I am happy to continue down inventory variables, I just need some help with test/base.py.20:21
jeblair(i'm sure all those specifics are wrong, but i think that would be the general process required)20:21
pabelangercurrently getHostList() in test/base.py returns an empty hosts variable. Need some guidance on how to properly mock the information20:22
jeblairpabelanger: i don't think getHostList in tests/base.py needs updating20:22
jeblairpabelanger: that's just adding a local connection flag20:22
jeblairpabelanger: it calls the real getHostList in executor/server.py20:23
pabelangerjeblair: right, I was appending the nodepool_provider info along side ansible_host, for getHostList() in executor/server.py20:23
jeblairpabelanger: which is where the additional nodepool data should be added.  but it will also need to be plumbed through from nodepool itself, which means in nodepool.py, and zk.py.  possibly also the fake nodepool in tests/base.py (to add any missing attrs like az)20:24
pabelangerbecause args['nodes'] is empty, the values are not set20:24
pabelangerjeblair: mind if I show you what I have now?20:24
pabelangermaybe I am on the wrong path20:24
jeblairpabelanger: sure20:24
jeblairpabelanger: though if nodes is empty, it's probably one of the many test cases which run jobs with no nodes20:25
pabelangermaybe that is the problem20:25
jeblair(that still needs to work, but obviously isn't going to exercise the code)20:25
jeblairpabelanger: i think many of the tests in test_v3 use nodes20:26
jeblairpabelanger: certainly the TestAnsible one20:28
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: WIP: Create nodepool inventory variables  https://review.openstack.org/446727  20:28
pabelangerjeblair: ^20:28
pabelangerI haven't added a test case yet20:28
pabelangerbut of the test_v3 tests I looked at20:29
pabelangerinventory is set to 'localhost ansible_connection=local'20:29
pabelangerwhich, could be because of what you just mentioned about nodes20:30
pabelangerchecking that now20:30
jeblairpabelanger: oh yeah, the test gethostlist also appends a localhost entry, mostly so that if a test doesn't request any nodes, there's still something in the inventory.  i think we can just ignore that.20:32
jeblairpabelanger: i think what you're seeing is that you're on the right path, and just need to use a test that requests nodes, then you should see it20:33
pabelangeryes, I think you are right20:33
pabelangerlet me do that20:33
pabelangerjeblair: yup, that was it. Thanks!20:34
pabelangerI'll add a test job now20:34
jeblairpabelanger: can you add 'az' while you're at it?20:36
jeblairShrews: hrm, it looks like the zuul model is already set up to avoid clobbering unknown values.20:37
jeblairShrews: but i wonder if we missed a read, so we just wrote old values?20:38
pabelangerjeblair: yes20:39
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Create nodepool inventory variables  https://review.openstack.org/446727  21:02
pabelangerjeblair: ready for comments21:03
jeblairpabelanger: +221:04
pabelangerjeblair: cool, sorry about the noise earlier about tests/base.py, I should have checked nodes were defined first21:05
jeblairpabelanger: np.  it's perhaps not immediately obvious that most of the tests don't actually request nodes.  :)21:06
pabelangerjeblair: right, is that a valid setup still? I remember something about we might not want that to happen on the executor21:08
jeblairpabelanger: i'm not sure.  perhaps we will want to support jobs that only perform some action on the executor?21:09
pabelangerjeblair: unsure also21:11
pabelangerif we did that, what motivation would somebody have to run nodepool?21:11
Shrewsjeblair: hrm, then i have no immediate explanation21:14
pabelangerclarkb: Shrews: either of you mind reviewing 446727? moves nodepool provider variables into a job inventory file21:16
clarkbI can try to take a look in a few minutes (finishing up one thing) but then have to head out to kid's doctor appointment so depends on how long it takes me I guess21:16
Shrewspabelanger: +221:25
pabelangerclarkb: when you have time, no rush21:26
openstackgerritJamie Lennox proposed openstack-infra/zuul feature/zuulv3: Handle the 'all' group in callback.  https://review.openstack.org/446749  21:27
clarkbit was simple, I approved21:27
clarkbthe "nice" thing about doctors appointment is my other thing (a tempest update) should have test results by the time I get back to a computer21:28
Shrewsjeblair: i think the AZ thing may either be a shade bug or a nova bug. Seems that only sometimes is an AZ value returned: http://paste.openstack.org/show/603028/  21:32
Shrewsjeblair: that could make our AZ grouping not too efficient21:33
Shrewsif "nova" is the same as "no AZ", we can't know that21:34
* Shrews places $$$ on nova bug21:34
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Create nodepool inventory variables  https://review.openstack.org/446727  21:35
Shrewsheh, i think i lost $$$21:38
clarkbyou dont have to have AZs iirc21:38
Shrewsmordred: i can see OS-EXT-AZ:availability_zone populated from 'openstack server show', but shade isn't always giving this data back in the location attr21:39
ShrewsYou know what, I bet it's not populated right away by nova, so we need to check it again after waitForServer()21:44
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Set node AZ after we're done waiting for it.  https://review.openstack.org/446753  21:47
Shrewsmordred: nevermind! above is a hopeful fix21:48
*** rattboi is now known as quartermaster21:48
*** quartermaster is now known as rattboi21:57
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Serve public keys through webapp  https://review.openstack.org/446756  21:58
jeblairthat's the extent of the cryptography related parts of the secrets facility.  we still have some config-related changes to make for it.22:02
jheskethMorning22:04
*** yolanda has quit IRC22:16
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Decrypt secrets and plumb to Ansible  https://review.openstack.org/446688  22:17
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Associate secrets with jobs  https://review.openstack.org/446687  22:17
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Serve public keys through webapp  https://review.openstack.org/446756  22:17
SpamapSjeblair: what was the logical conclusion of the discussion around RSA-OAEP yesterday?22:35
* SpamapS vaguely remembers looking at gcrypt and then shaking in the corner22:36
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Remove irrelevant test_merging_queues  https://review.openstack.org/446768  22:38
jeblairSpamapS: i think it trailed off.  but the way it's gelling in my mind is: RSA-OAEP gets us the ability to encrypt something up to a couple hundred bytes long.  to do something longer we'd probably switch to some kind of hybrid scheme (pgp or rsa encrypted AES session key).  that will produce longer ciphertext.  i think we have 3 choices:22:39
jeblairSpamapS: 1) stick with pkcs1 and accept the limits.  (we can change our minds later).22:39
jeblairSpamapS: 2) keep pkcs1 and also add a second system.  maybe have the encryption utility automatically select it for you.22:40
jeblairSpamapS: 3) drop pkcs1 for the hybrid system and accept that ciphertext blobs will be bigger.22:40
jeblairSpamapS: i think that the framework in zuul will be substantially the same, and also, there is highly likely to be some RSA involved either way, so i decided to press on with implementation to spec for the moment.  i think little work will end up being wasted if we change.22:41
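For context on the option 1 limit: RSA-OAEP can encrypt at most k - 2*hLen - 2 bytes, i.e. 470 bytes with a 4096-bit key and SHA-1 (446 with SHA-256), which is why anything longer needs the hybrid scheme. A sketch using the 'cryptography' package, not a statement about what Zuul's implementation does:

    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa, padding

    private_key = rsa.generate_private_key(public_exponent=65537, key_size=4096)
    public_key = private_key.public_key()

    oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA1()),
                        algorithm=hashes.SHA1(), label=None)

    secret = b'a short api key'            # must fit in 512 - 2*20 - 2 = 470 bytes
    ciphertext = public_key.encrypt(secret, oaep)
    assert private_key.decrypt(ciphertext, oaep) == secret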
jeblairprobably we need a good ML thread to hash out what we want.22:42
jeblairfungi: ^22:42
SpamapSAlways a good idea to go slow and listen well when playing with crypto.22:42
jeblairya22:42
SpamapSjeblair: my only concern is that AFAICT, there's nothing to identify a ciphertext as pkcs1 other than attempting decryption, so changing our minds later may be complicated.22:44
fungiSpamapS: the function is namespaced as a yaml datatype in the spec22:44
SpamapSThough I suppose we can make a good guess since the data length is relatively static.22:44
jeblairyeah, we annotate it with a yaml tag, so it's forward-compatible22:44
SpamapSAh, does that still work with safe_load?22:45
SpamapSI've never used the feature.22:45
jeblairSpamapS: yeah, you can annotate an object as being 'safe'22:45
fungiml thread sounds like a great idea, and also i agree the current plan makes it not particularly hard to swap out or even increase the algorithms supported22:45
SpamapSjeblair: neat22:45
SpamapSwell yeah, then I'm for (1)22:45
jeblairSpamapS: https://review.openstack.org/446156 has that bit22:45
jeblairconfigloader.py22:46
jeblair(due to metaclass magic, the class defn of EncryptedPKCS1 is all that's required to make pyyaml know to load that)22:46
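The pyyaml mechanics being described, sketched independently of the actual zuul patch (the tag string here is only illustrative): subclassing yaml.YAMLObject with yaml_loader = yaml.SafeLoader is the metaclass magic that registers the tag, so safe_load keeps working.

    import yaml

    class EncryptedPKCS1(yaml.YAMLObject):
        yaml_tag = u'!encrypted/pkcs1'    # illustrative tag string
        yaml_loader = yaml.SafeLoader     # register with SafeLoader so safe_load accepts it

        def __init__(self, ciphertext):
            self.ciphertext = ciphertext

        @classmethod
        def from_yaml(cls, loader, node):
            return cls(loader.construct_scalar(node))

    doc = yaml.safe_load("password: !encrypted/pkcs1 aGVsbG8=\n")
    assert isinstance(doc['password'], EncryptedPKCS1)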
fungipkcs#1 gets zuul the ability to handle typical passwords/passphrases and short-ish api keys. something else will be needed to pass along things like asymmetric x.509/rsa key material, i agree22:47
SpamapSjeblair: fancy22:48
fungicomputers are both infuriating and amazing22:48
SpamapSfungi: isn't there a somewhat standard encryption format that CSR's use?22:49
fungicertificate signing requests aren't encrypted22:49
SpamapSI thought they were, by a password, unless you decline to do so.22:50
fungiyou're probably thinking of pkcs#12 (a.k.a. "pfx") which is commonly used to encrypt x.509 key bundles, but it's symmetric22:51
fungi"by a password" is a giveaway that it's symmetric22:51
SpamapSCould be22:51
SpamapSyeah22:51
SpamapSthe only arbitrary-length-data public key crypto I'm familiar with are both email standards: PGP and S/MIME22:52
SpamapSThough I guess they're just "message" standards, not specifically email.22:52
fungipkcs#7 can, but it's also the underpinning of s/mime22:54
* fungi sometimes wishes he knew less about these things. shadows of a former career22:56
SpamapSHey, I feel the same about flipping burgers. ;)22:57
fungithe two are alike in soooo many ways22:57
*** jamielennox is now known as jamielennox|away23:03
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Set filter according to PR/Change in URL  https://review.openstack.org/446782  23:05
*** jamielennox|away is now known as jamielennox23:05
SpamapSjeblair: so.. stuck jobs..23:21
SpamapSjeblair: I'm looking at how to re-enable that test, and I wonder if we can still get into the situation that it artificially creates.23:21
jeblairSpamapS: yeah, something about the word 'stuck' and all the hullabaloo about gearman keeps distracting me from what i think is its main purpose, which is in the comment23:23
jeblairor, rather the docstring23:23
jeblair"Test that pending jobs are cleaned up if removed from layout"23:23
SpamapSjeblair: the way it works now.. if I remove a job that is in the queue, the scheduler correctly cancels it.23:24
SpamapS2017-03-16 16:19:10,115 zuul.DependentPipelineManager    ERROR    Unable to find change queue for project org/project  23:24
SpamapS2017-03-16 16:19:10,115 zuul.Scheduler                   WARNING  Canceling build <Build 7a09454c0cd34b988de9e276f8712c64 of gate-noop on <Worker Unknown>> during reconfiguration23:24
SpamapSthat does not end up in history tho23:25
SpamapSvs. before, when it did.23:25
jeblairhrm.  i agree that behavior matches the docstring.  i may not fully understand the mechanics of the test as applied to v2...  lemme dig further.23:26
SpamapSjeblair: so perhaps it's ok to just run that scenario and assert that there's no history?23:26
jeblairSpamapS: maybe?23:26
SpamapSlet me push up what I have and we can see the same code at least23:26
SpamapSjeblair: I may be testing something else with the way I refactored the test.23:28
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Re-enable test_stuck_job_cleanup  https://review.openstack.org/446783  23:29
SpamapSthere's code23:29
jlkDid the zuulv3 install have a hiccup?23:30
jlkmy change just got a bunch of RETRY_LIMIT, and it doesn't seem to show up at all in review.23:30
jlk(other htan the vote)23:30
jeblairjlk: yeah, playbooks broken atm23:30
jlkokay.23:30
jeblairjlk: pabelanger is fixing it, but we'll have some false -1s for a while23:30
jeblairSpamapS: ah, it's testing that the gearman job is removed from the gearman queue23:31
jeblairSpamapS: the first config has it running 'project-merge'.  *that* is the job that it's looking for in self.assertEqual(len(self.gearman_server.getQueue()), 1)23:31
jeblairSpamapS: (it doesn't let the job run, just lets the launcher enqueue it)23:32
pabelangerjeblair: jlk: we need to restart zuulv3-dev.o.o to pick up inventory variables first23:32
jeblairSpamapS: and the final assertion is checking that project-merge still didn't actually run.23:32
jeblairpabelanger: oh has that merged?23:32
SpamapSjeblair: ah, so it's just making sure the queue is made of the one thing that did run.23:33
pabelangerjeblair: yes, I still need to update our playbooks however23:33
jeblairSpamapS: well, completed history, but yes.23:33
Shrewspabelanger: any chance you can review https://review.openstack.org/446753 ? very easy review. would like to restart n-l with it23:34
jeblairSpamapS: so the key to that test is the hold_jobs_in_queue part.  we want to keep doing that, and not set hold_jobs_in_build.23:35
pabelangerShrews: +323:35
jeblairSpamapS: pointed some things out in review23:37
openstackgerritMerged openstack-infra/nodepool feature/zuulv3: Set node AZ after we're done waiting for it.  https://review.openstack.org/446753  23:38
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Replace ssh-keyscan with host keys from node  https://review.openstack.org/446785  23:40
pabelangerjeblair: ^ would be interested in comments when you are ready to review.  Specifically, I made some changes to getHostList() return value23:41
Shrewsany objections to nodepool-launcher restart?23:42
pabelangernone here, going to do the same for zuulv3-dev too23:42
pabelangerjeblair: ^23:42
jeblair++23:42
Shrewsoh noes23:45
jeblairpabelanger: lgtm (assuming what we get from nodepool is ready to splat into the known_hosts file)23:47
pabelangerjeblair: I think so, I see the data in zookeeper using zk-shell23:48
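Assuming the node data holds the raw key lines a keyscan would have produced, building known_hosts entries is mostly just prefixing each key with the host address. A small sketch of that assumption; the field names are illustrative.

    def known_hosts_entries(node):
        """Build known_hosts lines from host keys gathered by nodepool at boot."""
        ip = node['interface_ip']
        return ['%s %s' % (ip, key) for key in node.get('host_keys', [])]

    node = dict(interface_ip='203.0.113.7',
                host_keys=['ssh-rsa AAAAB3Nza...exampleonly'])
    with open('/tmp/zuul-known-hosts', 'w') as f:
        f.write('\n'.join(known_hosts_entries(node)) + '\n')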
* jeblair hopes Shrews is okay23:48
Shrewswell23:49
jeblairShrews is well.  that's good.23:49
jeblairall i need to hear23:49
Shrewssomething weird happened23:50
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Revert "Configure regional mirrors for our jobs"  https://review.openstack.org/446786  23:51
ShrewsOpenStackCloudException: Multiple matches found for None23:51
Shrewsi know shade had a release today, and there were some changes to the name/id matching23:51
Shrewsso... maybe a bug there23:51
Shrewsbut also, we should not try to delete a node if the id is None23:52
jeblairpabelanger: +323:52
pabelangerjeblair: danke23:53
Shrewsjeblair: pabelanger: we need jamielennox's fix https://review.openstack.org/#/c/445299/  23:53
pabelangerthat will unblock us for now, I'll start refactoring our mirror logic tomorrow23:53
pabelangerShrews: jeblair: Did we want a test for that? Not sure I understand the issue enough23:54
jamielennoxpabelanger, Shrews: i'm not sure how to write a test for that one, there was nothing that really existed similar23:54
Shrewspabelanger: i can try and work out a test tomorrow if we approve that tonight23:55
pabelangerI am okay with that23:55
Shrewsjamielennox: we just need to make createServer() fail in the test. i think the framework for that might already exist23:56
pabelanger+2, so jeblair can review23:56
jeblair+323:56
jamielennoxShrews: there was something like it but it was on a different thread, but i would need to double check for all the details23:56
jamielennoxlike i think there was a test for create server failure, but i couldn't see a test that was actually running the cleanup thread23:57
Shrewsjamielennox: look at test_fail_request_on_launch_failure23:57
jamielennoxhowever my priority was making my stuff work at the time so i could have missed something23:57
Shrewsjamielennox: we'd just need to make sure the node is deleted properly. can probably do so in that same test23:58
jamielennoxShrews: hmm, ok, i think that was what i was looking at, assume i just missed something23:59
openstackgerritMerged openstack-infra/nodepool feature/zuulv3: Don't try and delete nodes with no external_id  https://review.openstack.org/445299  23:59
