*** jiapei has quit IRC | 00:24 | |
*** _ari_ has quit IRC | 02:45 | |
*** rfolco has quit IRC | 04:08 | |
*** sshnaidm has joined #zuul | 05:10 | |
*** sshnaidm has quit IRC | 05:11 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: executor: enable zuul_return to update Ansible inventory https://review.openstack.org/590092 | 05:44 |
*** chkumar|off is now known as chkumar|ruck | 06:53 | |
*** hashar has joined #zuul | 07:10 | |
openstackgerrit | Simon Westphahl proposed openstack-infra/nodepool master: Cleanup of leaked resource for static driver https://review.openstack.org/600084 | 07:30 |
openstackgerrit | Simon Westphahl proposed openstack-infra/nodepool master: Implement liveness check for static nodes https://review.openstack.org/601513 | 07:30 |
*** jesusaur has quit IRC | 07:39 | |
*** jesusaur has joined #zuul | 07:44 | |
*** jpena|off is now known as jpena | 08:31 | |
openstackgerrit | Sorin Sbarnea proposed openstack-infra/zuul master: Assure that status tooltip is displayed on entire row https://review.openstack.org/603504 | 09:05 |
rcarrillocruz | TIL about https://zuul-ci.org/docs/zuul/user/badges.html | 09:31 |
rcarrillocruz | nice! | 09:31 |
mordred | rcarrillocruz: \o/ | 10:36 |
*** jpena is now known as jpena|lunch | 11:20 | |
*** rfolco has joined #zuul | 12:17 | |
tobiash | mordred: do you know if nodepool can re-use floatingips instead of create/attach/detach/delete? | 12:17 |
tobiash | I have nodepool in its own tenant and don't need a cleanup of unused floatingips there | 12:17 |
tobiash | this currently slows down the openstacksdk task queue dramatically in my environment | 12:18 |
*** hashar is now known as hasharAway | 12:21 | |
mordred | tobiash: I don't believe we've implemented that option in nodepool. it's ... an unfortunately racey process | 12:23 |
tobiash | hrm, ok | 12:23 |
tobiash | so the workaround would be to create multiple providers to the same cloud? | 12:23 |
mordred | tobiash: the issue is that there is no way to atomically say "re-use an existing floating ip, else create one for me" | 12:23 |
tobiash | ah, ok | 12:24 |
*** jpena|lunch is now known as jpena | 12:24 | |
mordred | and since we do server creation in threads, we'd have to either have some sort of mutex somewhere to coordinate picking an available floating ip - or else handle conflict errors better | 12:24 |
mordred | that said - I think for your use case it could certainly be nicer to just re-use | 12:25 |
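To make the race mordred describes concrete, here is a minimal sketch of the non-atomic "re-use an existing floating IP, else create one" flow. The helpers passed in (list_unattached_fips, attach_fip) and the FipConflict exception are hypothetical placeholders, not openstacksdk API; create_floating_ip roughly matches the cloud-layer call, but treat the exact signature as an assumption.

```python
# Sketch only: shows why "re-use else create" is racy across launcher threads.
class FipConflict(Exception):
    """Placeholder: raised when another thread attached the floating IP first."""


def acquire_floating_ip(cloud, server, list_unattached_fips, attach_fip):
    # Another launcher thread can grab the same IP between this listing and
    # the attach below -- that window is the race being discussed.
    for fip in list_unattached_fips(cloud):
        try:
            attach_fip(cloud, server, fip)
            return fip
        except FipConflict:
            # Lost the race for this candidate; try the next one.
            continue
    # Nothing re-usable (or every candidate was stolen), so create a new one.
    return cloud.create_floating_ip(server=server)
```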
tobiash | mordred: is there a way to alternatively parallelize the task queue? | 12:27 |
tobiash | the problem is just that attach, create, delete take 1-2 seconds and block any other openstack request in that time | 12:27 |
tobiash | so if we could have a thread pool of say 5 parallel requests this issue would be solved for me probably too | 12:27 |
mordred | tobiash: it's a good point - I've actually been thinking about making the taskmanager be per-service too | 12:28 |
tobiash | just tested and launched 200 nodes at once using nodepool and task contention caused some boot timeouts | 12:28 |
mordred | so that we're ratelimiting on a per-service basis (which is where server-side limits usually sit anyway) | 12:29 |
tobiash | that would make this better but still 200 instances would then wait for the requests for 200 floating ips and 50 of them will timeout | 12:30 |
mordred | as for making the taskmanager have a thread pool - I think that would work, but we'd probably have to be more clever with how we track the rate limiting | 12:30 |
tobiash | the rate limiting is thought of as max x requests per second, right? | 12:30 |
mordred | yah. oh, I guess we could just track when a request enters the pool | 12:31 |
mordred | interestingly enough - we actually already have a version of this for swift object segments | 12:32 |
mordred | I wonder if we could just re-use it inside of the sdk Adapter class | 12:33 |
* mordred goes to ponder the possibility | 12:34 | |
mordred | tobiash: yes - I think we can make a reasonably simple patch to openstacksdk to make this happen | 12:36 |
tobiash | mordred: cool | 12:37 |
mordred | tobiash: remote: https://review.openstack.org/603739 Run all tasks through the threadpool | 12:43 |
mordred | tobiash: that's COMPLETELY untested | 12:44 |
tobiash | mordred: cool :) | 12:45 |
mordred | but I can't think of any reason it wouldn't work | 12:45 |
tobiash | mordred: btw, how do you handle releases in openstacksdk? | 12:47 |
tobiash | mordred: the 17.x releases are tied to the current openstack version? | 12:47 |
tobiash | mordred: so if we would want to have this and 602618 usable by nodepool what would be the best approach? | 12:48 |
*** samccann has joined #zuul | 12:58 | |
mordred | tobiash: we can cut a new release any time - and they're not tied to openstack version | 13:00 |
mordred | tobiash: in fact, we need to make one with clarkb's logging change in it for nodepool logs anyway | 13:00 |
tobiash | ah cool | 13:00 |
tobiash | mordred: I didn't quite understand your change | 13:00 |
tobiash | in case run_async is True the change is a noop right? | 13:01 |
tobiash | is run_async False when used by nodepool? | 13:01 |
mordred | tobiash: yes, that's right. the only thing that was calling run_async before was the swift object segment upload code | 13:01 |
tobiash | so that is a different task manager? | 13:02 |
mordred | it's the same task manager - it's a different behavior inside the task manager - if run_async is True in the task manager, then concurrent.futures is used and a Future object is returned | 13:02 |
tobiash | ah | 13:03 |
tobiash | thanks for the explanation :) | 13:03 |
mordred | sure! it's a deep dark section of the code | 13:03 |
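As a rough illustration of the run_async behavior mordred describes, a simplified sketch of the two code paths is below. This is not the actual openstacksdk TaskManager, just an assumed shape of the logic: async callers get a concurrent.futures Future, synchronous callers get the result directly.

```python
# Simplified sketch of the run_async flag discussed above -- illustrative only.
import concurrent.futures


class TaskManagerSketch:
    def __init__(self, workers=5):
        self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=workers)

    def submit_function(self, fn, *args, run_async=False, **kwargs):
        if run_async:
            # Async path: hand the work to the pool and return a Future
            # immediately (the swift object segment uploads use this path).
            return self._pool.submit(fn, *args, **kwargs)
        # Sync path: run in the calling thread and return the result directly.
        return fn(*args, **kwargs)
```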
mordred | speaking of ... | 13:03 |
mordred | Shrews: when you feel like staring at deep dark code - https://review.openstack.org/603739 and https://review.openstack.org/#/c/602085/ | 13:04 |
tobiash | mordred: is that thread pool configurable? | 13:08 |
tobiash | in terms of max parallel requests? | 13:08 |
tobiash | nevermind, found it | 13:09 |
mordred | tobiash: only by constructor parameter to TaskManager | 13:09 |
tobiash | that seems like a sane default and is probably not overwritten in nodepool | 13:09 |
mordred | yah. although if we wind up landing this patch to sdk - we should probably add a config option to nodepool to modify it | 13:14 |
mordred | I think it's probably fine to keep it that way for now though | 13:14 |
tobiash | yepp | 13:14 |
Shrews | mordred: about to head out for a checkup on my wrist, but can look when i return | 13:28 |
mordred | Shrews: have you considered replacing your wrist with an angry cat? | 13:28 |
Shrews | mordred: it's too early for you to have been drinking | 13:29 |
mordred | Shrews: I would have thought after 10 years you'd know me better than that :) | 13:30 |
*** annabelleB has joined #zuul | 13:56 | |
*** samccann has quit IRC | 14:04 | |
*** samccann has joined #zuul | 14:04 | |
clarkb | mordred: my sdk logging change? | 14:31 |
mordred | clarkb: yah | 14:31 |
*** gouthamr_ is now known as gouthamr | 14:35 | |
*** _ari_ has joined #zuul | 15:06 | |
*** hasharAway is now known as hashar | 15:30 | |
*** chkumar|ruck is now known as chkumar|off | 15:34 | |
openstackgerrit | Markus Hosch proposed openstack-infra/zuul master: Add support for authentication/STARTTLS to SMTP https://review.openstack.org/603833 | 15:42 |
openstackgerrit | Markus Hosch proposed openstack-infra/zuul master: Add support for authentication/STARTTLS to SMTP https://review.openstack.org/603833 | 15:44 |
openstackgerrit | Markus Hosch proposed openstack-infra/zuul master: Add support for authentication/STARTTLS to SMTP https://review.openstack.org/603833 | 15:46 |
*** pcaruana has joined #zuul | 15:47 | |
*** annabelleB has quit IRC | 15:50 | |
*** annabelleB has joined #zuul | 15:53 | |
*** annabelleB has quit IRC | 16:03 | |
tobiash | mordred: commented on https://review.openstack.org/603739 | 16:18 |
*** annabelleB has joined #zuul | 16:23 | |
*** pcaruana has quit IRC | 16:27 | |
mordred | tobiash: thanks! | 16:32 |
tobiash | mordred: I was brave and tried that in production ;) | 16:33 |
clarkb | tobiash: that is really brave :P | 16:33 |
tobiash | but the rollback was easy and it only caused some minutes delay | 16:33 |
mordred | wow! | 16:33 |
mordred | tobiash: hrm. that's ... interesting | 16:34 |
mordred | tobiash: but thank you very much - I'll have to dig in a little further I guess :) | 16:34 |
Shrews | hopefully our tests catch that | 16:36 |
tobiash | Shrews: the integration tests for sure | 16:37 |
tobiash | the unit tests: no | 16:37 |
tobiash | I ran the unit tests before I tried it :P | 16:37 |
tobiash | mordred: maybe do task_manager.wait_for_futures instead? | 16:38 |
tobiash | found that somewhere else in the code | 16:38 |
*** panda has joined #zuul | 16:38 | |
*** annabelleB has quit IRC | 16:50 | |
tobiash | Shrews: looks like I was wrong; right now only two non-voting jobs failed and all the other jobs are already green, except for two that are still running | 16:50 |
Shrews | well, looks like maybe test_get() failure might be related? http://logs.openstack.org/39/603739/1/check/openstacksdk-functional-devstack-magnum/f278e29/testr_results.html.gz | 16:54 |
Shrews | it at least references the new code | 16:55 |
*** annabelleB has joined #zuul | 16:59 | |
openstackgerrit | Sorin Sbarnea proposed openstack-infra/zuul master: Assure that status tooltip is displayed on entire row https://review.openstack.org/603504 | 16:59 |
tobiash | Shrews: does nodepool override the task manager of openstacksdk?? | 17:00 |
tobiash | it overrides the submit_task method, which is used within the openstacksdk taskmanager to switch between async and not async | 17:01 |
tobiash | maybe the problem is not openstacksdk | 17:01 |
mordred | tobiash: I was just about to type that | 17:02 |
mordred | basically, the nodepool task_manager doesn't support the underlying async logic | 17:02 |
mordred | but fixing that is going to be a bit more work | 17:03 |
mordred | actually ... it MIGHT not be that terrible | 17:04 |
tobiash | mordred: it somehow looks like it re-implements more of the task manager than needed? | 17:07 |
mordred | yeah. well - it's the original task_manager and we've been refactoring things out of nodepool and into openstacksdk - I keep meaning to get the multithreaded one from nodepool also added to openstacksdk so that we can look at the whole picture in one file | 17:07 |
Shrews | mordred: didn't you do something around this in the past? | 17:08 |
mordred | yeah - I keep making baby steps | 17:09 |
Shrews | guess i'm thinking of https://review.openstack.org/#/c/574287/ | 17:09 |
openstackgerrit | Monty Taylor proposed openstack-infra/nodepool master: Add support for async tasks https://review.openstack.org/603850 | 17:11 |
mordred | something like that ^^ in nodepool, combined with the sdk patch, perhaps | 17:12 |
mordred | of course, that patch screams "this needs a refactor" | 17:12 |
mordred | Shrews: it's also worth noting that dtantsur mentioned on https://review.openstack.org/#/c/574285/ that the task manager code is remarkably similar to what they did in futurist library | 17:13 |
mordred | and he'd be open to us adding a rate-limiting feature to that | 17:13 |
mordred | so maybe it's worth considering that if/when we decide to do any further refactoring of the taskmanager code | 17:14 |
clarkb | mordred: I might be overthinking this but if you are rate limiting does running async gain you anything? | 17:16 |
clarkb | execute wait execute wait can be done synchronously and is simpler to reason about | 17:17 |
mordred | clarkb: yah - because with rate limiting we're making sure we don't _start_ executing a particular task until the appropriate timeslice | 17:18 |
mordred | clarkb: so with the async it allows us to fire off the correct number of tasks for the timeslice but have them perform in parallel | 17:18 |
mordred | (we already do this for uploading swift large object segments - because the time it takes to upload a segment is much larger than the rate limit timeslice) | 17:19 |
clarkb | ah | 17:19 |
mordred | clarkb: this is going to be one of those where we spend a few weeks making sure we get the interactions right and the end result is going to be a tiny number of SLOC | 17:20 |
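A minimal sketch of the "rate limit when a task *starts*, then let it run in parallel" idea mordred explains above: submissions are released into a thread pool no faster than the configured rate, but once released they execute concurrently. The class and parameter names are assumptions, not the sdk or futurist code.

```python
# Sketch of rate-gated parallel submission -- illustrative, not the real code.
import concurrent.futures
import threading
import time


class RateGatedPool:
    def __init__(self, rate, workers=5):
        self._interval = 1.0 / rate
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()
        self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=workers)

    def submit(self, fn, *args, **kwargs):
        # Reserve the next timeslice under the lock, then sleep outside it so
        # other submitters can line up behind us.
        with self._lock:
            now = time.monotonic()
            wait = max(0.0, self._next_slot - now)
            self._next_slot = max(now, self._next_slot) + self._interval
        if wait:
            time.sleep(wait)  # don't *start* the task before its timeslice
        return self._pool.submit(fn, *args, **kwargs)
```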
tobiash | mordred: I'll try this out now | 17:21 |
tobiash | (this time in dev instance...) | 17:21 |
clarkb | mordred: did leave a comment on the nodepool change | 17:23 |
clarkb | I think there is a small bug | 17:23 |
clarkb | tobiash: ^ fyi | 17:23 |
mordred | clarkb: yes - you are very right - although I don't think putting the put into the else will work ... | 17:24 |
mordred | let me try again | 17:24 |
mordred | oh - it's even easier | 17:26 |
*** jpena is now known as jpena|off | 17:30 | |
mordred | ok. when I said "easier" ... I lied | 17:30 |
*** annabelleB has quit IRC | 18:17 | |
*** annabelleB has joined #zuul | 18:21 | |
Shrews | tobiash: my brain must not be working correctly because i'm not getting what a "leaked" static node is according to https://review.openstack.org/600084 (left a comment there). is there a real life example you could describe for me? | 18:22 |
AJaeger | anybody wants to +A a zuul-jobs change that has +2s already? https://review.openstack.org/#/c/602290/ fixes print node information | 18:23 |
tobiash | Shrews: I was on vacation at the time, but I think it was reducing the concurrency of a static node via config | 18:24 |
tobiash | Or removing a static node from config | 18:24 |
Shrews | tobiash: we have tests for decreasing max-parallel-jobs, which is why i'm confused i suppose | 18:24 |
tobiash | swest: ^ | 18:25 |
tobiash | There were at least problems on config changes and the need to do manual cleanup in zk | 18:26 |
Shrews | tobiash: swest: if it's removing a static node from the config, then i don't think that leak test is valid because it shares a hostname with an already registered node. in fact, if i change the test to use a different hostname, it fails | 18:26 |
Shrews | (hostnames assumed to be unique across nodes) | 18:26 |
Shrews | maybe we should use a different term than "leak". In openstack world, this is an instance that exists that we do not have recorded in ZK. This test seems to be the opposite of that | 18:32 |
Shrews | possibly also a source of my confusion :) | 18:34 |
tobiash | Shrews: do we remove znodes only on config changes and unlocks? | 18:34 |
tobiash | What happens if the config change is during a restart (and thus unnoticed by nodepool)? | 18:35 |
Shrews | we remove znodes (for the static driver) if we decrease parallelism or remove a config entry. not sure what you mean about unlocks | 18:35 |
tobiash | With unlocks I meant finished jobs | 18:35 |
Shrews | finished jobs should cause the node to be deleted which triggers the nodeDeletedNotification() where we re-register it | 18:37 |
Shrews | i *think* a config change across restarts gets consolidated at startup, but i'm not 100% sure | 18:37 |
Shrews | it obviously *should* | 18:38 |
tobiash | Yes, the question is what happens if we decrease parallelism during a restart | 18:38 |
Shrews | that question is totally not obvious anywhere in that stack :) | 18:38 |
Shrews | but i agree we should investigate it | 18:38 |
tobiash | I think that might be the thing swest means by leaked (z)nodes | 18:39 |
Shrews | but if that's the problem we are solving, let's not call it a leak | 18:39 |
Shrews | term overloading :) | 18:39 |
tobiash | I'm always open for better naming :) | 18:40 |
Shrews | how about "restart config consolidation"? | 18:42 |
Shrews | or other words | 18:42 |
Shrews | nargle sniping | 18:42 |
Shrews | rabbit wrangling | 18:42 |
tobiash | Or just config consolidation? | 18:42 |
Shrews | well, we always consolidate the config on config *changes*. not sure we are testing across *restarts* correctly | 18:43 |
tobiash | Good point | 18:44 |
Shrews | it's actually good that's being brought up because i don't think i've ever considered that | 18:45 |
tobiash | Yeah, that driver is still new and we start using it more now | 18:46 |
Shrews | awesome | 18:46 |
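To make the "config consolidation across restarts" question concrete, here is a hedged sketch of what a startup pass for the static driver could look like: compare what the config declares against what is registered in ZooKeeper and flag anything no longer covered (e.g. max-parallel-jobs lowered, or a host removed while the launcher was down). All names below are hypothetical; this is not the actual nodepool static driver code.

```python
# Hypothetical startup consolidation for the static driver -- sketch only.
def find_stale_static_nodes(zk_nodes, config_nodes):
    # Build the set of (hostname, slot) pairs the current config allows,
    # assuming one registered node record per parallel job slot.
    allowed = set()
    for node in config_nodes:
        for slot in range(node["max-parallel-jobs"]):
            allowed.add((node["name"], slot))

    # Anything registered in ZooKeeper but not allowed by the config is
    # stale and should be deregistered rather than left behind.
    return [n for n in zk_nodes if (n["hostname"], n["slot"]) not in allowed]
```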
*** annabelleB has quit IRC | 19:15 | |
*** annabelleB has joined #zuul | 19:22 | |
*** hashar has quit IRC | 21:23 | |
mordred | tobiash, Shrews, clarkb: remote: https://review.openstack.org/603739 Run all tasks through the threadpool | 21:33 |
mordred | I took another stab - a bit more aggressive this time | 21:34 |
mordred | however, I think this time it should work: | 21:35 |
mordred | >>> c.compute.get('/') | 21:35 |
mordred | <Response [200]> | 21:35 |
mordred | >>> c.compute.get('/', run_async=True) | 21:35 |
mordred | <Future at 0x7fad1205ec18 state=running> | 21:35 |
clarkb | mordred: are some queries async by default? and if so, does nodepool still need to accommodate those? | 21:36 |
mordred | clarkb: nothing that nodepool directly touches | 21:36 |
mordred | clarkb: the swift segment uploads are async - but they're internal to sdk | 21:37 |
clarkb | right nodepool says upload image, and that is synchronous from its perspective | 21:37 |
mordred | yup | 21:37 |
clarkb | sdk can upload a bunch of bytes async under that | 21:37 |
mordred | yah. and with this change the same should be true for create_server | 21:37 |
mordred | from each launcher thread's perspective it should look synchronous | 21:38 |
mordred | but behind the scenes the task manager should be able to service those tasks async in parallel, while never starting more tasks than the rate limit allows | 21:38 |
mordred | DEFINITELY double-check me on that logic though | 21:38 |
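For illustration, the caller-side view of the pattern mordred describes might look roughly like this: the launcher thread hands create_server to a rate-gated task manager and simply blocks on the Future, so the call appears synchronous even though the pool services many such tasks in parallel underneath. The task_manager.submit name is a stand-in, not the exact nodepool API.

```python
# Sketch of the "looks synchronous to the launcher thread" facade.
def launch_node(task_manager, conn, **server_args):
    future = task_manager.submit(conn.create_server, **server_args)
    return future.result()  # blocks only this launcher thread
```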
*** samccann has quit IRC | 21:43 | |
SpamapS | One of the users on my team at GoDaddy ran in to a really annoying problem this week. If you have syntax errors in your ansible... it just exits 1 and you get 0 in the logs. | 22:12 |
SpamapS | Would be quite useful if those errors from ansible were sent back to the user. | 22:13 |
clarkb | SpamapS: I want to say those show up in the console log? | 22:13 |
clarkb | I know I've seen them before. notmyname had one at the PTG and didn't require infra-root to debug it (he found the OSA table and they helped him) | 22:13 |
SpamapS | They do not in ours. The problem we had was that some directories were being moved around and so there were missing roles. | 22:16 |
SpamapS | We just got "RUN starting.." and then "RUN ENDED FAILURE" | 22:17 |
clarkb | SpamapS: I'm trying to get an example of notmyname's error so that we have that info too | 22:17 |
SpamapS | the executor log had the ansible errors in it | 22:17 |
clarkb | SpamapS: http://logs.openstack.org/86/601686/32/check/swift-multinode-rolling-upgrade/4ed9236/job-output.txt.gz#_2018-09-14_19_36_25_291563 | 22:29 |
clarkb | SpamapS: I would sort of expect all the ansible errors to be bubbled up like that? maybe it is a stdout vs stderr thing | 22:29 |
clarkb | in this case it was an actual error in the yaml syntax itself | 22:30 |
clarkb | rather than a higher level ansible error | 22:30 |
clarkb | http://paste.openstack.org/show/730388/ that might be a bit more permanent | 22:34 |
clarkb | SpamapS: in your case the playbooks referenced roles that were not included as required roles? | 22:36 |
*** dkehn has joined #zuul | 22:46 | |
SpamapS | clarkb: the roles are local to the playbook | 23:11 |
SpamapS | clarkb: so they're in the implied roles path, not an explicit one. | 23:11 |
*** annabelleB has quit IRC | 23:15 | |
*** dkehn has quit IRC | 23:17 | |
* SpamapS has now created a `zuul-local` command which does a reasonably decent job of running playbooks designed for zuul without zuul. | 23:18 | |
SpamapS | starting to think though that it might make more sense to make that an entry point in zuul itself that has access to all of zuul's parsers | 23:18 |
SpamapS | (right now I have to recreate the job with command-line args) | 23:18 |
clarkb | SpamapS: could also use executor bits to keep things lined up | 23:19 |
SpamapS | So, that was my original approach | 23:19 |
SpamapS | but that runs into need for a lot of zuul objects | 23:20 |
SpamapS | and thus, I think the approach I have now is a dead end | 23:20 |
SpamapS | I definitely am using it right now for testing playbooks that are really hard to iterate on otherwise... | 23:20 |
SpamapS | but basically what I want is a way to just run a single CLI command that does scheduler, merger, and executor's job all at once, locally | 23:20 |
SpamapS | call it "zuul-byovm" | 23:21 |
SpamapS | as in, bring your own vm | 23:21 |
clarkb | iirc the idea was to use the zuul api to get the job config bits and then inject them into a local executor thing | 23:21 |
SpamapS | something like `zuul-byovm --tenant Foo --project bar -c zuul.conf --inventory 1.2.3.4, --user ubuntu --job test-my-thing` | 23:22 |
pabelanger | SpamapS: agree, there are still use cases where only a zuul admin can see ansible failures; a wrong path for a role is one I know of. | 23:22 |
SpamapS | It would parse out the job, run all the playbooks in the order they need to be, and the only requirement is you need a local blob of any secrets. | 23:22 |
SpamapS | Right now I have that but I have to assemble a bunch of things.. | 23:23 |
SpamapS | like for instance, I can't just add -M zuul/ansible/library, because the command replacement in there expects to magically have zuul_log_id added to its arguments | 23:23 |
SpamapS | so I have a dir with symlinks to just zuul_console and zuul_return | 23:23 |
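A rough sketch of what the proposed entry point could look like, using the flags SpamapS lists above. Everything here is hypothetical since the tool is only being proposed: the command name, the options, and especially resolve_job_playbooks, which in a real implementation would need Zuul's own configuration parsers.

```python
#!/usr/bin/env python3
# Hypothetical "zuul-byovm" sketch: parse the proposed flags and run the
# job's playbooks with ansible-playbook against a user-supplied VM.
import argparse
import subprocess


def resolve_job_playbooks(config, tenant, project, job):
    # Placeholder: a real implementation would load the tenant config and
    # return the job's pre-run, run and post-run playbooks in order.
    return ["playbooks/pre.yaml", "playbooks/run.yaml", "playbooks/post.yaml"]


def main():
    parser = argparse.ArgumentParser(prog="zuul-byovm")
    parser.add_argument("--tenant", required=True)
    parser.add_argument("--project", required=True)
    parser.add_argument("-c", "--config", default="zuul.conf")
    parser.add_argument("--inventory", required=True,
                        help="ansible inventory, e.g. '1.2.3.4,'")
    parser.add_argument("--user", default="ubuntu")
    parser.add_argument("--job", required=True)
    args = parser.parse_args()

    playbooks = resolve_job_playbooks(args.config, args.tenant,
                                      args.project, args.job)
    for playbook in playbooks:
        # Run each playbook in order against the given inventory.
        subprocess.run(["ansible-playbook", "-i", args.inventory,
                        "-u", args.user, playbook], check=True)


if __name__ == "__main__":
    main()
```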
SpamapS | anyway, I just think we can't keep doing the same thing with base jobs' pre and post playbooks | 23:25 |
SpamapS | it's damn near impossible to write them efficiently without something like this. | 23:25 |
SpamapS | They are far too complected with zuul to just be ansible-playbooked | 23:26 |
SpamapS | so, this is my POC, and it's making my life a berzillion times easier... | 23:26 |
SpamapS | I'm going to sit down and think through how we might do it in a way that is generic and useful for zuul users. | 23:26 |
SpamapS | but where they can sit and iterate on playbooks on their own vms | 23:27 |
SpamapS | also like, the assumptions about having a clean SSH agent ... so many things | 23:30 |
clarkb | SpamapS: you have to have copies of all repos to have complete config | 23:32 |
clarkb | which is why asking zuul for the config seems to be a good idea | 23:32 |
SpamapS | ehhh... yes and no | 23:36 |
SpamapS | yes I think that's the perfect solution | 23:36 |
SpamapS | no I don't think that's the only solution | 23:36 |
SpamapS | The config isn't actually the hard part to reason about. | 23:36 |
SpamapS | It's the specialized environment and all of the inputs to it. | 23:37 |
SpamapS | but yeah, if we could do that, neat. | 23:37 |
clarkb | you would be unable to know your config is complete otherwise | 23:39 |
clarkb | you could have users guess at it | 23:39 |
SpamapS | that's not actually a problem | 23:39 |
clarkb | and probably get it right most of the time | 23:39 |
clarkb | it is if reproducing upstream behavior is important | 23:40 |
SpamapS | reproducing perfectly isn't ... reproducing a reasonable facsimile is. | 23:40 |
SpamapS | But yeah, I *want* what you describe | 23:40 |
SpamapS | And I think we could definitely do that. | 23:41 |
SpamapS | but I don't mind assembling a bunch of git repos in a dir myself and just saying "use this dir as my tenant config of projects" | 23:42 |
SpamapS | which is what my janky thing does now | 23:42 |