Wednesday, 2018-09-19

*** jiapei has quit IRC00:24
*** _ari_ has quit IRC02:45
*** rfolco has quit IRC04:08
*** sshnaidm has joined #zuul05:10
*** sshnaidm has quit IRC05:11
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: executor: enable zuul_return to update Ansible inventory  https://review.openstack.org/59009205:44
*** chkumar|off is now known as chkumar|ruck06:53
*** hashar has joined #zuul07:10
openstackgerritSimon Westphahl proposed openstack-infra/nodepool master: Cleanup of leaked resource for static driver  https://review.openstack.org/60008407:30
openstackgerritSimon Westphahl proposed openstack-infra/nodepool master: Implement liveness check for static nodes  https://review.openstack.org/60151307:30
*** jesusaur has quit IRC07:39
*** jesusaur has joined #zuul07:44
*** jpena|off is now known as jpena08:31
openstackgerritSorin Sbarnea proposed openstack-infra/zuul master: Assure that status tooltip is displayed on entire row  https://review.openstack.org/60350409:05
rcarrillocruzTIL about https://zuul-ci.org/docs/zuul/user/badges.html09:31
rcarrillocruznice!09:31
mordredrcarrillocruz: \o/10:36
*** jpena is now known as jpena|lunch11:20
*** rfolco has joined #zuul12:17
tobiashmordred: do you know if nodepool can re-use floatingips instead of create/attach/detach/delete?12:17
tobiashI have nodepool in its own tenant and don't need a cleanup of unused floatingips there12:17
tobiashthis currently slows down the openstacksdk task queue dramatically in my environment12:18
*** hashar is now known as hasharAway12:21
mordredtobiash: I don't believe we've implemented that option in nodepool. it's ... an unfortunately racey process12:23
tobiashhrm, ok12:23
tobiashso the workaround would be to create multiple providers to the same cloud?12:23
mordredtobiash: the issue is that there is no way to atomically say "re-use an existing floating ip, else create one for me"12:23
tobiashah, ok12:24
*** jpena|lunch is now known as jpena12:24
mordredand since we do server creation in threads, we'd have to either have some sort of mutex somewhere to coordinate picking an available floating ip - or else handle conflict errors better12:24
mordredthat said - I think for your use case it could certainly be nicer to just re-use12:25
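To illustrate the race being described, here is a minimal sketch of a reuse-or-create pattern. The helper names and the ConflictError type are illustrative assumptions, not the actual nodepool or openstacksdk API:

    import random

    class ConflictError(Exception):
        """Stand-in for the cloud's 409 'already attached' error."""

    def acquire_floating_ip(cloud, server, max_attempts=3):
        # Reuse-or-create is not atomic: another launcher thread can grab
        # the same free IP between the list call and the attach call.
        for _ in range(max_attempts):
            free = [ip for ip in cloud.list_floating_ips() if not ip['attached']]
            candidate = random.choice(free) if free else cloud.create_floating_ip()
            try:
                cloud.attach_ip_to_server(server, candidate)  # may lose the race
                return candidate
            except ConflictError:
                continue  # someone else attached it first; pick again
        raise RuntimeError('could not acquire a floating ip')

Either a shared lock around the pick or retry-on-conflict as above would be needed, which is the extra complexity being weighed against simply creating and deleting a floating ip per node.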
tobiashmordred: is there a way to alternatively parallelize the task queue?12:27
tobiashthe problem is just that attach, create, delete take 1-2 seconds and block any other openstack request in that time12:27
tobiashso if we could have a thread pool of, say, 5 parallel requests this issue would probably be solved for me too12:27
mordredtobiash: it's a good point - I've actually been thinking about making the taskmanager be per-service too12:28
tobiashjust tested and launched 200 nodes at once using nodepool and task contention caused some boot timeouts12:28
mordredso that we're ratelimiting on a per-service basis (which is where server-side limits usually sit anyway)12:29
tobiashthat would make this better but still 200 instances would then wait for the requests for 200 floating ips and 50 of them would time out12:30
mordredas for making the taskmanager have a thread pool - I think that would work, but we'd probably have to be more clever with how we track the rate limiting12:30
tobiashthe rate limiting is thought of as max x requests per second, right?12:30
mordredyah. oh, I guess we could just track when a request enters the pool12:31
mordredinterestingly enough - we actually already have a version of this for swift object segments12:32
mordredI wonder if we could just re-use it inside of the sdk Adapter class12:33
* mordred goes to ponder the possibility12:34
mordredtobiash: yes - I think we can make a reasonably simple patch to openstacksdk to make this happen12:36
tobiashmordred: cool12:37
mordredtobiash: remote:   https://review.openstack.org/603739 Run all tasks through the threadpool12:43
mordredtobiash: that's COMPLETELY untested12:44
tobiashmordred: cool :)12:45
mordredbut I can't think of any reason it wouldn't work12:45
tobiashmordred: btw, how do you handle releases in openstacksdk?12:47
tobiashmordred: the 17.x releases are tied to the current openstack version?12:47
tobiashmordred: so if we would want to have this and 602618 usable by nodepool what would be the best approach?12:48
*** samccann has joined #zuul12:58
mordredtobiash: we can cut a new release any time - and they're not tied to openstack version13:00
mordredtobiash: in fact, we need to make one with clarkb's logging change in it for nodepool logs anyway13:00
tobiashah cool13:00
tobiashmordred: I didn't quite understand your change13:00
tobiashin case run_async is True the change is a noop right?13:01
tobiashis run_async False when used by nodepool?13:01
mordredtobiash: yes, that's right. the only thing that was calling run_async before was the swift object segment upload code13:01
tobiashso that is a different task manager?13:02
mordredit's the same task manager - it's a different behavior inside the task manager - if run_async is True in the task manager, then concurrent.futures is used and a Future object is returned13:02
tobiashah13:03
tobiashthanks for the explanation :)13:03
mordredsure! it's a deep dark section of the code13:03
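A minimal sketch of the behavior described above — a simplified stand-in for openstacksdk's TaskManager, not its actual code; the names submit_task and run_async just mirror the discussion:

    import time
    import threading
    from concurrent.futures import ThreadPoolExecutor

    class RateLimitedTaskManager:
        def __init__(self, rate=2.0, workers=5):
            self._delay = 1.0 / rate        # minimum gap between task starts
            self._last = 0.0
            self._lock = threading.Lock()
            self._pool = ThreadPoolExecutor(max_workers=workers)

        def submit_task(self, task, run_async=False):
            # Rate limiting applies to when a task *starts*, so we track the
            # moment it enters the pool rather than when it finishes.
            with self._lock:
                wait = self._last + self._delay - time.monotonic()
                if wait > 0:
                    time.sleep(wait)
                self._last = time.monotonic()
                future = self._pool.submit(task)
            if run_async:
                return future       # caller gets a concurrent.futures.Future
            return future.result()  # caller blocks; other tasks keep running

With run_async=False each caller still sees a synchronous call, but a slow request (a 1-2 second floating-ip attach, say) no longer holds up every other request queued behind it.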
mordredspeaking of ...13:03
mordredShrews: when you feel like staring at deep dark code - https://review.openstack.org/603739 and https://review.openstack.org/#/c/602085/13:04
tobiashmordred: is that thread pool configurable?13:08
tobiashin terms of max parallel requests?13:08
tobiashnevermind, found it13:09
mordredtobiash: only by constructor parameter to TaskManager13:09
tobiashthat seems like a sane default and is probably not overwritten in nodepool13:09
mordredyah. although if we wind up landing this patch to sdk - we should probably add a config option to nodepool to modify it13:14
mordredI think it's probably fine to keep it that way for now though13:14
tobiashyepp13:14
Shrewsmordred: about to head out for a checkup on my wrist, but can look when i return13:28
mordredShrews: have you considered replacing your wrist with an angry cat?13:28
Shrewsmordred: it's too early for you to have been drinking13:29
mordredShrews: I would have thought after 10 years you'd know me better than that :)13:30
*** annabelleB has joined #zuul13:56
*** samccann has quit IRC14:04
*** samccann has joined #zuul14:04
clarkbmordred: my sdk logging change?14:31
mordredclarkb: yah14:31
*** gouthamr_ is now known as gouthamr14:35
*** _ari_ has joined #zuul15:06
*** hasharAway is now known as hashar15:30
*** chkumar|ruck is now known as chkumar|off15:34
openstackgerritMarkus Hosch proposed openstack-infra/zuul master: Add support for authentication/STARTTLS to SMTP  https://review.openstack.org/60383315:42
openstackgerritMarkus Hosch proposed openstack-infra/zuul master: Add support for authentication/STARTTLS to SMTP  https://review.openstack.org/60383315:44
openstackgerritMarkus Hosch proposed openstack-infra/zuul master: Add support for authentication/STARTTLS to SMTP  https://review.openstack.org/60383315:46
*** pcaruana has joined #zuul15:47
*** annabelleB has quit IRC15:50
*** annabelleB has joined #zuul15:53
*** annabelleB has quit IRC16:03
tobiashmordred: commented on https://review.openstack.org/60373916:18
*** annabelleB has joined #zuul16:23
*** pcaruana has quit IRC16:27
mordredtobiash: thanks!16:32
tobiashmordred: I was brave and tried that in production ;)16:33
clarkbtobiash: that is really brave :P16:33
tobiashbut the rollback was easy and it only caused some minutes delay16:33
mordredwow!16:33
mordredtobiash: hrm. that's ... interesting16:34
mordredtobiash: but thank you very much - I'll have to dig in a little further I guess :)16:34
Shrewshopefully our tests catch that16:36
tobiashShrews: the integration tests for sure16:37
tobiashthe unit tests: no16:37
tobiashI ran the unit tests before I tried it :P16:37
tobiashmordred: maybe do task_manager.wait_for_futures instead?16:38
tobiashfound that somewhere else in the code16:38
*** panda has joined #zuul16:38
*** annabelleB has quit IRC16:50
tobiashShrews: looks like I was wrong, right now only two non-voting jobs have failed and all other jobs are already green except for two that are still running16:50
Shrewswell, looks like maybe test_get() failure might be related? http://logs.openstack.org/39/603739/1/check/openstacksdk-functional-devstack-magnum/f278e29/testr_results.html.gz16:54
Shrewsit at least references the new code16:55
*** annabelleB has joined #zuul16:59
openstackgerritSorin Sbarnea proposed openstack-infra/zuul master: Assure that status tooltip is displayed on entire row  https://review.openstack.org/60350416:59
tobiashShrews: does nodepool override the task manager of openstacksdk??17:00
tobiashit overrides the submit_task method which is used within the openstacksdk taskmanager to switch between async and not async17:01
tobiashmaybe the problem is not openstacksdk17:01
mordredtobiash: I was just about to type that17:02
mordredbasically, the nodepool task_manager doesn't support the underlying async logic17:02
mordredbut fixing that is going to be a bit more work17:03
mordredactually ... it MIGHT not be that terrible17:04
tobiashmordred: it somehow looks like it re-implements more of the task manager than needed?17:07
mordredyeah. well - it's the original task_manager and we've been refactoring things out and in to openstacksdk - I keep meaning to get the multithreaded one from nodepool also added to openstacksdk so that we can look at the whole picture in one file17:07
Shrewsmordred: didn't you do something around this in the past?17:08
mordredyeah - I keep making baby steps17:09
Shrewsguess i'm thinking of https://review.openstack.org/#/c/574287/17:09
openstackgerritMonty Taylor proposed openstack-infra/nodepool master: Add support for async tasks  https://review.openstack.org/60385017:11
mordredsomething like that ^^ in nodepool, combined with the sdk patch, perhaps17:12
mordredof course, that patch screams "this needs a refactor"17:12
mordredShrews: it's also worth noting that dtantsur mentioned on https://review.openstack.org/#/c/574285/ that the task manager code is remarkably similar to what they did in futurist library17:13
mordredand he'd be open to us adding a rate-limiting feature to that17:13
mordredso maybe it's worth considering that if/when we decide to do any further refactoring of the taskmanager code17:14
clarkbmordred: I might be overthinking this but if you are rate limiting does running async gain you anything?17:16
clarkbexecute wait execute wait can be done synchronously and is simpler to reason about17:17
mordredclarkb: yah - because with rate limiting we're making sure we don't _start_ executing a particular task until the appropriate timeslice17:18
mordredclarkb: so with the async it allows us to fire off the correct number of tasks for the timeslice but have them perform in parallel17:18
mordred(we already do this for uploading swift large object segments - because the time it takes to upload a segment is much larger than the rate limit timeslice)17:19
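To put made-up numbers on that: with a rate limit of 2 requests per second and attach/create calls that each take about 2 seconds, a purely synchronous queue tops out at roughly 0.5 completed calls per second, while starting one call every 0.5 seconds and letting them overlap sustains the full 2 per second with about 4 calls in flight at once.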
clarkbah17:19
mordredclarkb: this is going to be one of those where we spend a few weeks making sure we get the interactions right and the end result is going to be a tiny number of SLOC17:20
tobiashmordred: I'll try this out now17:21
tobiash(this time in dev instance...)17:21
clarkbmordred: did leave a comment on the nodepool change17:23
clarkbI think there is a small bug17:23
clarkbtobiash: ^ fyi17:23
mordredclarkb: yes - you are very right - although I don't think putting the put into the else will work ...17:24
mordredlet me try again17:24
mordredoh - it's even easier17:26
*** jpena is now known as jpena|off17:30
mordredok. when I said "easier" ... I lied17:30
*** annabelleB has quit IRC18:17
*** annabelleB has joined #zuul18:21
Shrewstobiash: my brain must not be working correctly because i'm not getting what a "leaked" static node is according to https://review.openstack.org/600084 (left a comment there). is there a real life example you could describe for me?18:22
AJaegeranybody wants to +A a zuul-jobs change that has +2s already? https://review.openstack.org/#/c/602290/ fixes print node information18:23
tobiashShrews: I was on vacation at this time but I think it was reducing the concurrency of a static node via config18:24
tobiashOr removing a static node from config18:24
Shrewstobiash: we have tests for decreasing max-parallel-jobs, which is why i'm confused i suppose18:24
tobiashswest: ^18:25
tobiashThere were at least problems on config changes and the need to do manual cleanup in zk18:26
Shrewstobiash: swest: if it's removing a static node from the config, then i don't think that leak test is valid because it shares a hostname with an already registered node. in fact, if i change the test to use a different hostname, it fails18:26
Shrews(hostnames assumed to be unique across nodes)18:26
Shrewsmaybe we should use a different term than "leak". In openstack world, this is an instance that exists that we do not have recorded in ZK. This test seems to be the opposite of that18:32
Shrewspossibly also a source of my confusion  :)18:34
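In that OpenStack-driver sense, a "leak" check is roughly the comparison below (illustrative only — the metadata key is an assumption, not the actual nodepool cleanup code):

    def find_leaked_servers(cloud_servers, zk_node_ids):
        # Leaked: instances we apparently launched (nodepool metadata present)
        # but that no longer have a corresponding record in ZooKeeper.
        return [server for server in cloud_servers
                if server.metadata.get('nodepool_node_id')
                and server.metadata['nodepool_node_id'] not in zk_node_ids]

The static-driver test under discussion is closer to the inverse: a ZooKeeper record surviving for a node that was removed from the config.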
tobiashShrews: do we remove znodes only on config changes and unlocks?18:34
tobiashWhat happens if the config change is during a restart (and thus unnoticed by nodepool)?18:35
Shrewswe remove znodes (for the static driver) if we decrease parallelism or remove a config entry. not sure what you mean about unlocks18:35
tobiashWith unlocks I meant finished jobs18:35
Shrewsfinished jobs should cause the node to be deleted which triggers the nodeDeletedNotification() where we re-register it18:37
Shrewsi *think* a config change across restarts gets consolidated at startup, but i'm not 100% sure18:37
Shrewsit obviously *should*18:38
tobiashYes, the question is what happens if we decrease parallelism during a restart18:38
Shrewsthat question is totally not obvious anywhere in that stack  :)18:38
Shrewsbut i agree we should investigate it18:38
tobiashI think that might be the thing swest means by leaked (z)nodes18:39
Shrewsbut if that's the problem we are solving, let's not call it a leak18:39
Shrewsterm overloading  :)18:39
tobiashI'm always open for better naming :)18:40
Shrewshow about "restart config consolidation"?18:42
Shrewsor other words18:42
Shrewsnargle sniping18:42
Shrewsrabbit wrangling18:42
tobiashOr just config consolidation?18:42
Shrewswell, we always consolidate the config on config *changes*. not sure we are testing across *restarts* correctly18:43
tobiashGood point18:44
Shrewsit's actually good that's being brought up because i don't think i've ever considered that18:45
tobiashYeah, that driver is still new and we start using it more now18:46
Shrewsawesome18:46
*** annabelleB has quit IRC19:15
*** annabelleB has joined #zuul19:22
*** hashar has quit IRC21:23
mordredtobiash, Shrews, clarkb: remote:   https://review.openstack.org/603739 Run all tasks through the threadpool21:33
mordredI took another stab - a bit  more aggressive this time21:34
mordredhowever, I think this time it should work:21:35
mordred>>> c.compute.get('/')21:35
mordred<Response [200]>21:35
mordred>>> c.compute.get('/', run_async=True)21:35
mordred<Future at 0x7fad1205ec18 state=running>21:35
clarkbmordred: are some queries async by default? and if so does nodepool still need to accommodate those?21:36
mordredclarkb: nothing that nodepool directly touches21:36
mordredclarkb: the swift segment uploads are async - but they're internal to sdk21:37
clarkbright nodepool says upload image, and that is synchronous from its perspective21:37
mordredyup21:37
clarkbsdk can upload a bunch of bytes async under that21:37
mordredyah. and with this change the same should be true for create_server21:37
mordredfrom each launcher thread's perspective it should look synchronous21:38
mordredbut behind the scenes the task manager should be able to service those tasks async in parallel, while never starting tasks faster than the rate limit allows21:38
mordredDEFINITELY double-check me on that logic though21:38
*** samccann has quit IRC21:43
SpamapSOne of the users on my team at GoDaddy ran into a really annoying problem this week. If you have syntax errors in your ansible... it just exits 1 and you get nothing in the logs.22:12
SpamapSWould be quite useful if those errors from ansible were sent back to the user.22:13
clarkbSpamapS: I want to say those show up in the console log?22:13
clarkbI know I've seen them before. notmyname had one at the PTG and didn't require infra-root to debug it (he found the OSA table and they helped him)22:13
SpamapSThey do not in ours. The problem we had was that some directories were being moved around and so there were missing roles.22:16
SpamapSWe just got "RUN starting.." and then "RUN ENDED FAILURE"22:17
clarkbSpamapS: I'm trying to get an example of notmyname's error so that we have that info too22:17
SpamapSthe executor log had the ansible errors in it22:17
clarkbSpamapS: http://logs.openstack.org/86/601686/32/check/swift-multinode-rolling-upgrade/4ed9236/job-output.txt.gz#_2018-09-14_19_36_25_29156322:29
clarkbSpamapS: I would sort of expect all the ansible errors to be bubbled up like that? maybe it is a stdout vs stderr thing22:29
clarkbin this case it was an actual error in the yaml syntax itself22:30
clarkbrather than a higher level ansible error22:30
clarkbhttp://paste.openstack.org/show/730388/ that might be a bit more permanent22:34
clarkbSpamapS: in this case the playbooks referenced roles that were not included as required roles?22:36
clarkbs/this/your/22:36
*** dkehn has joined #zuul22:46
SpamapSclarkb: the roles are local to the playbook23:11
SpamapSclarkb: so they're in the implied roles path, not an explicit one.23:11
*** annabelleB has quit IRC23:15
*** dkehn has quit IRC23:17
* SpamapS has now created a `zuul-local` command which does a reasonably decent job of running playbooks designed for zuul without zuul.23:18
SpamapSstarting to think though that it might make more sense to make that an entry point in zuul itself that has access to all of zuul's parsers23:18
SpamapS(right now I have to recreate the job with cmdline args)23:18
clarkbSpamapS: could also use executor bits to keep things lined up23:19
SpamapSSo, that was my original approach23:19
SpamapSbut that runs into need for a lot of zuul objects23:20
SpamapSand thus, I think the approach I have now is a dead end23:20
SpamapSI definitely am using it right now for testing playbooks that are really hard to iterate on otherwise...23:20
SpamapSbut basically what I want is a way to just run a single CLI command that does scheduler, merger, and executor's job all at once, locally23:20
SpamapScall it "zuul-byovm"23:21
SpamapSas in, bring your own vm23:21
clarkbiirc the idea was to use the zuul api to get the job config bits and then inject them into a local executor thing23:21
SpamapSsomething like `zuul-byovm --tenant Foo --project bar -c zuul.conf --inventory 1.2.3.4, --user ubuntu --job test-my-thing`23:22
pabelangerSpamapS: agree, there are still use cases where only a zuul admin can see ansible failures; a wrong path for a role is one I know of.23:22
SpamapSIt would parse out the job, run all the playbooks in the order they need to be run, and the only requirement is you need a local blob of any secrets.23:22
SpamapSRight now I have that but I have to assemble a bunch of things..23:23
SpamapSlike for instance, I can't just add -M zuul/ansible/library, because the command replacement in there expects to magically have zuul_log_id added to its arguments23:23
SpamapSso I have a dir with symlinks to just zuul_console and zuul_return23:23
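A very rough sketch of the kind of wrapper being described — entirely hypothetical (there is no zuul-byovm entry point); it only shells out to ansible-playbook with a hand-built inventory rather than resolving --tenant/--project/--job through zuul's own parsers:

    #!/usr/bin/env python3
    import argparse
    import subprocess

    def main():
        parser = argparse.ArgumentParser(prog='zuul-byovm')
        parser.add_argument('--inventory', required=True,
                            help='comma-terminated host list, e.g. "1.2.3.4,"')
        parser.add_argument('--user', default='ubuntu')
        parser.add_argument('--secrets', help='path to a local blob of secret vars')
        parser.add_argument('playbooks', nargs='+',
                            help='pre/run/post playbooks, in execution order')
        args = parser.parse_args()

        for playbook in args.playbooks:
            cmd = ['ansible-playbook', '-i', args.inventory, '-u', args.user, playbook]
            if args.secrets:
                cmd += ['-e', '@' + args.secrets]
            # Stop at the first failing playbook, roughly like the executor would.
            subprocess.run(cmd, check=True)

    if __name__ == '__main__':
        main()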
SpamapSanyway, I just think we can't keep doing the same thing with base jobs' pre and post playbooks23:25
SpamapSit's damn near impossible to write them efficiently without something like this.23:25
SpamapSThey are far too complected with zuul to just be ansible-playbooked23:26
SpamapSso, this is my POC, and it's making my life a berzillion times easier...23:26
SpamapSI'm going to sit down and think through how we might do it in a way that is generic and useful for zuul users.23:26
SpamapSbut where they can sit and iterate on playbooks on their own vms23:27
SpamapSalso like, the assumptions about having a clean SSH agent ... so many things23:30
clarkbSpamapS: you have to have copies of all repos to have complete config23:32
clarkbwhich is why asking zuul for the config seems to be a good idea23:32
SpamapSehhh... yes and no23:36
SpamapSyes I think that's the perfect solution23:36
SpamapSno I don't think that's the only solution23:36
SpamapSThe config isn't actually the hard part to reason about.23:36
SpamapSIt's the specialized environment and all of the inputs to it.23:37
SpamapSbut yeah, if we could do that, neat.23:37
clarkbyou would be unable to know your config is complete otherwise23:39
clarkbyou could ave users guess at it23:39
SpamapSthat's not actually a problem23:39
clarkband probably get it right most of the time23:39
clarkbit is if reproducing upstream behavior is important23:40
SpamapSreproducing perfectly isn't ... reproducing a reasonable facsimile is.23:40
SpamapSBut yeah, I *want* what you describe23:40
SpamapSAnd I think we could definitely do that.23:41
SpamapSbut I don't mind assembling a bunch of git repos in a dir myself and just saying "use this dir as my tenant config of projects"23:42
SpamapSwhich is what my janky thing does now23:42
