Wednesday, 2014-10-29

*** vivek-eb_ has joined #openstack-lbaas  [00:07]
*** vivek-ebay has quit IRC  [00:07]
*** xgerman has quit IRC  [00:41]
*** vivek-eb_ has quit IRC  [00:57]
*** mwang2 has quit IRC  [01:00]
*** fnaval has quit IRC  [01:26]
*** vivek-ebay has joined #openstack-lbaas  [01:36]
*** vivek-ebay has quit IRC  [01:43]
*** amotoki has joined #openstack-lbaas  [01:59]
*** fnaval has joined #openstack-lbaas  [02:16]
*** sbfox has joined #openstack-lbaas  [02:19]
*** sbfox has quit IRC  [02:33]
*** fnaval has quit IRC  [03:00]
*** vivek-ebay has joined #openstack-lbaas  [03:00]
*** blogan_ has joined #openstack-lbaas  [03:15]
<openstackgerrit> Brandon Logan proposed a change to stackforge/octavia: Doc detailing amphora lifecycle management  https://review.openstack.org/130424  [03:44]
*** blogan_ has quit IRC  [03:45]
*** sbfox1 has joined #openstack-lbaas  [04:09]
*** kobis has joined #openstack-lbaas  [04:21]
*** kobis has quit IRC  [04:26]
*** sbfox1 has quit IRC  [04:48]
*** kobis has joined #openstack-lbaas  [04:48]
*** kobis has quit IRC  [04:56]
*** kobis has joined #openstack-lbaas  [05:00]
*** sbfox has joined #openstack-lbaas  [05:11]
*** blogan_ has joined #openstack-lbaas  [05:23]
<rm_you> blogan: ooo, new spec  [05:23]
*** kobis has quit IRC  [05:28]
<blogan_> aye  [05:28]
<blogan_> well just an update to a WIP  [05:28]
*** vivek-ebay has quit IRC  [05:48]
*** vivek-ebay has joined #openstack-lbaas  [06:07]
*** kobis has joined #openstack-lbaas  [06:21]
*** vivek-eb_ has joined #openstack-lbaas  [06:31]
*** vivek-ebay has quit IRC  [06:33]
<openstackgerrit> Brandon Logan proposed a change to stackforge/octavia: Doc detailing amphora lifecycle management  https://review.openstack.org/130424  [06:37]
*** blogan_ has quit IRC  [06:38]
*** mwang2 has joined #openstack-lbaas  [06:56]
*** mwang2 has quit IRC  [07:01]
*** vivek-eb_ has quit IRC  [07:15]
*** jschwarz has joined #openstack-lbaas  [07:15]
*** rm_you| has joined #openstack-lbaas  [07:20]
*** rm_you has quit IRC  [07:22]
*** woodster_ has quit IRC  [07:40]
*** sbfox has quit IRC  [07:51]
*** jschwarz has quit IRC  [07:59]
*** vivek-ebay has joined #openstack-lbaas  [10:16]
*** vivek-ebay has quit IRC  [10:21]
*** amotoki has quit IRC  [10:27]
*** vivek-ebay has joined #openstack-lbaas  [12:01]
*** woodster_ has joined #openstack-lbaas  [12:31]
*** vivek-ebay has quit IRC  [12:43]
*** Krast has joined #openstack-lbaas  [12:48]
*** mestery has quit IRC  [13:08]
*** amotoki has joined #openstack-lbaas  [13:09]
*** markmcclain has joined #openstack-lbaas  [13:09]
*** mestery has joined #openstack-lbaas  [13:11]
*** mestery has quit IRC  [13:13]
*** mestery has joined #openstack-lbaas  [13:13]
*** amotoki has quit IRC  [13:31]
*** Krast has quit IRC  [13:43]
*** markmcclain has quit IRC  [14:12]
*** xgerman has joined #openstack-lbaas  [14:57]
*** ptoohill_ has joined #openstack-lbaas  [15:02]
*** mlavalle has joined #openstack-lbaas  [15:38]
*** dboik has joined #openstack-lbaas  [15:40]
<dougwig> morning  [15:47]
<ptoohill> Mornin'  [15:56]
<openstackgerrit> Trevor Vardeman proposed a change to stackforge/octavia: Defining interface for amphora drivers  https://review.openstack.org/130352  [16:02]
<openstackgerrit> Trevor Vardeman proposed a change to stackforge/octavia: Defining interface for amphora drivers  https://review.openstack.org/130352  [16:11]
*** dboik has quit IRC  [16:13]
*** markmcclain has joined #openstack-lbaas  [16:18]
<blogan> awesome, another patch gets merged into the feature branch  [16:23]
<blogan> now those monster patches are no more  [16:23]
<blogan> mestery: thanks!  [16:24]
*** kobis has quit IRC  [16:25]
<mestery> blogan: Yes sir! That last one should unlock you folks even more.  [16:25]
*** kobis has joined #openstack-lbaas  [16:26]
*** vivek-ebay has joined #openstack-lbaas  [16:34]
*** fnaval has joined #openstack-lbaas  [16:35]
*** codekobe has quit IRC  [16:37]
*** sballe has quit IRC  [16:38]
*** ctracey has quit IRC  [16:38]
*** dboik has joined #openstack-lbaas  [16:46]
*** dboik has quit IRC  [16:51]
*** sbfox has joined #openstack-lbaas  [16:51]
<blogan> dougwig ping  [17:01]
<dougwig> joining  [17:01]
<blogan> sam hasn't joined yet  [17:02]
<dougwig> it's late for him, right?  [17:02]
<blogan> i think its just 7 or 8 pm  [17:03]
<ptoohill> He set the date/time  [17:03]
<blogan> yeah its 7 pm there  [17:03]
<dougwig> hmm, then why aren't our group meetings at this time?  :)  [17:03]
<blogan> lol  [17:03]
<blogan> i agree  [17:03]
<blogan> i thought israel was like 10 hours ahead of us, but turns out its only 7  [17:04]
<ptoohill> Sam is running 5 mins late  [17:05]
<ptoohill> He posted this in the google slides  [17:05]
<rm_work> yeah WTB this time for our meetings T_T  [17:06]
*** dboik has joined #openstack-lbaas  [17:06]
<dougwig> there is an opening either at this time slot, or one hour earlier.  [17:08]
<rm_work> uhhh so what you're telling me  [17:10]
<rm_work> is that we could have our meetings at 11am CST. and we have instead been doing them at 9am.  [17:10]
<rm_work> wtf  [17:10]
<rm_work> or, 9am instead of 7am for the poor PST people  [17:11]
*** dboik has quit IRC  [17:11]
*** dboik has joined #openstack-lbaas  [17:12]
*** sbalukoff has quit IRC  [17:17]
*** kobis has quit IRC  [17:28]
*** kobis has joined #openstack-lbaas  [17:28]
*** markmcclain has quit IRC  [17:31]
*** masteinhauser is now known as zz_mas  [17:41]
*** ctracey has joined #openstack-lbaas  [17:42]
*** sbfox has quit IRC  [17:42]
*** sbfox has joined #openstack-lbaas  [17:45]
*** codekobe has joined #openstack-lbaas  [17:48]
*** dboik has quit IRC  [17:59]
*** dboik has joined #openstack-lbaas  [18:00]
*** dboik has quit IRC  [18:01]
*** sballe has joined #openstack-lbaas  [18:02]
*** dboik has joined #openstack-lbaas  [18:03]
*** dboik has quit IRC  [18:03]
*** dboik has joined #openstack-lbaas  [18:04]
<openstackgerrit> Trevor Vardeman proposed a change to stackforge/octavia: Defining interface for amphora drivers  https://review.openstack.org/130352  [18:05]
*** dboik_ has joined #openstack-lbaas  [18:06]
*** dboik has quit IRC  [18:06]
*** markmcclain has joined #openstack-lbaas  [18:25]
*** sbalukoff has joined #openstack-lbaas  [18:32]
<sbalukoff> Hey folks, if you haven't updated it yet, please don't forget to fill out the weekly stand-up etherpad:  standup  [18:47]
<sbalukoff> D'oh!  [18:47]
<sbalukoff> https://etherpad.openstack.org/p/octavia-weekly-standup  [18:47]
<ptoohill> ;)  [18:53]
<sbalukoff> Also, if you've got anything else for the agenda, feel free to add it: https://wiki.openstack.org/wiki/Octavia/Weekly_Meeting_Agenda  [18:55]
*** dboik_ has quit IRC  [18:58]
*** dboik has joined #openstack-lbaas  [18:58]
*** ptoohill_ has quit IRC  [19:00]
*** markmcclain1 has joined #openstack-lbaas  [19:01]
*** markmcclain2 has joined #openstack-lbaas  [19:02]
*** markmcclain has quit IRC  [19:03]
*** dboik has quit IRC  [19:03]
*** dboik has joined #openstack-lbaas  [19:04]
*** markmcclain1 has quit IRC  [19:06]
*** kobis has joined #openstack-lbaas  [19:22]
*** zz_mas is now known as masteinhauser  [19:28]
*** vivek-ebay has quit IRC  [19:29]
*** ptoohill_ has joined #openstack-lbaas  [19:30]
*** dboik has quit IRC  [19:32]
*** vivek-ebay has joined #openstack-lbaas  [19:33]
*** dboik has joined #openstack-lbaas  [19:36]
*** dboik has quit IRC  [19:37]
*** dboik has joined #openstack-lbaas  [19:39]
*** dboik has quit IRC  [19:41]
*** dboik has joined #openstack-lbaas  [19:42]
*** sbfox has quit IRC  [19:46]
*** sbfox has joined #openstack-lbaas  [19:48]
*** kobis has quit IRC  [19:49]
*** markmcclain2 has quit IRC  [19:50]
*** markmcclain has joined #openstack-lbaas  [19:51]
*** jamiem has joined #openstack-lbaas  [19:52]
*** barclaac|2 has joined #openstack-lbaas  [19:53]
*** barclaac has quit IRC  [19:53]
<dougwig> https://www.irccloud.com/pastebin/AIEwr5av  [19:54]
<ptoohill> wooten!  [19:55]
*** kobis has joined #openstack-lbaas  [19:56]
*** ajmiller has joined #openstack-lbaas  [19:58]
<blogan> thanks to oleg on that one  [20:00]
*** mwang2 has joined #openstack-lbaas  [20:02]
*** kobis has quit IRC  [20:04]
*** jorgem has joined #openstack-lbaas  [20:04]
*** sbfox has quit IRC  [20:09]
*** markmcclain has quit IRC  [20:13]
*** sbfox has joined #openstack-lbaas  [20:15]
<openstackgerrit> Trevor Vardeman proposed a change to stackforge/octavia: Defining interface for amphora drivers  https://review.openstack.org/130352  [20:31]
<openstackgerrit> Trevor Vardeman proposed a change to stackforge/octavia: Defining interface for compute drivers  https://review.openstack.org/130352  [20:34]
*** jamiem has quit IRC  [20:38]
<xgerman> for a moment I read amphora drivers instead of compute drivers  [20:55]
<rm_work> yeah it was the other :P  [21:01]
<rm_work> we made him fix it  [21:01]
<rm_work> I was like  [21:01]
<sbalukoff> johnsom_: Do you have time to continue discussion here?  [21:01]
<rm_work> "wait, german is working on the same thing as you?!"  [21:01]
<rohara> someone remind me what we got cookin' for octavia at summit next week  [21:02]
<johnsom_> Sure  [21:02]
<sbalukoff> Ok, so, I apologize: I don't mean to single you out on this  [21:02]
<johnsom_> Thank you.  It definitely seemed like it  [21:02]
<sbalukoff> I probably haven't been as clear as I should have been with regard to expectations for people participating in this project.  [21:03]
<johnsom_> I don't mind heads up that people care about stuff I signed up for.  I got a lot of "we will build our own image", so I was working offline on it and without top priority  [21:04]
<sbalukoff> What do you mean when you say "I got a lot of 'we will build our own image'"?  [21:04]
<johnsom_> Two weeks ago I put together upstart with respawn for haproxy to go into these images.  [21:04]
<blogan> johnsom_: i think the disconnect is that we are waiting to build those images until we have something to work off of, and an actual working octavia with some default image, which would be yours  [21:04]
<blogan> sbalukoff: bc we (rax) mentioned we will probably end up having to do our own image  [21:05]
<sbalukoff> blogan: Aah.  [21:05]
<sbalukoff> Ok.  [21:05]
<johnsom_> When I signed up for this it seemed like everyone was saying they were going to do their own spin anyway, i.e. not interested.  Basically it would just be used for testing.  [21:05]
<sbalukoff> So, that's all really secondary to the discussion I want to have.  [21:05]
<rohara> we will end up doing our own image as well  [21:06]
<johnsom_> Ok, so I hear you that people need this.  I will get working on code to check in under WIP.  [21:06]
<rohara> sbalukoff: i am making my schedule for next week. what are we doing for octavia? i know this came up, but i can't find it.  [21:07]
<sbalukoff> Ok, sure.  [21:07]
<sbalukoff> rohara: Right now, it's mostly ad-hoc meetings and coding work as people are able.  [21:07]
<rohara> sbalukoff: ok  [21:07]
<blogan> johnsom_: i don't mean to single anyone out, but I think this is a good time to adopt a policy of doing the WIPs in gerrit, and not waiting until you think its done and then pushing  [21:07]
<blogan> i had to get out of that mentality myself  [21:07]
<sbalukoff> I would like to go over front-end and back-end topology stuff thoroughly with people so we can know what features we're going to need to add to Neutron to make this work.  [21:08]
<blogan> i still think we all need to work on reviewing and giving feedback on WIP reviews now  [21:08]
<rohara> sbalukoff: sign me up. i'll just wander around looking at names on badges until i find you :)  [21:08]
<sbalukoff> rohara: I'm going to be stuck at our booth off and on during the first part of the week.  [21:08]
<blogan> are we going to do that groupme app again?  [21:09]
<johnsom_> My personal approach has been to develop something that at least produces something useful before checking in, then iterating from there.  I don't want to waste people's precious review time on WIP stuff that is very incomplete.  [21:09]
<rohara> sbalukoff: well i know where to find you then  [21:09]
<sbalukoff> blogan: It's more difficult, what with international cell phone rates.  [21:09]
<johnsom_> Maybe that pendulum swung too far here  [21:09]
<blogan> johnsom_: i know that urge, but if people are interested in it, they are willing to look at incomplete work and give feedback  [21:09]
<sbalukoff> johnsom_: So, at last week's meeting we discussed ideas around how to effectively collaborate.  [21:09]
<johnsom_> Yep, which was the surprise of the day here....  [21:10]
<sbalukoff> And again, the pithy phrase "If it ain't in gerrit, it doesn't exist" is probably the best policy here.  [21:10]
<sbalukoff> johnsom_: Sorry, it's in the meeting minutes from last week.  [21:10]
<johnsom_> So why the push to have people work on stuff folks have signed up for when we have unassigned blueprints?  Just curious here  [21:11]
<sbalukoff> Again, the dilemma I'm facing is that I have people who are enthusiastic and want to work on Octavia, especially the bits that aren't blocked by anything else.  [21:11]
<sbalukoff> And I also have people who have signed themselves up for writing parts of this project who have either had their priorities re-arranged by their working circumstances, or otherwise are not making visible progress.  [21:11]
<sbalukoff> johnsom_: Many of those unassigned blueprints are either lower priority, or are blocked by other blueprints.  [21:12]
<johnsom_> Sorry if you don't think I am enthusiastic about Octavia. Frankly I want it yesterday.  [21:12]
<sbalukoff> And I *know* we're all itching to have something real we can work with.  [21:12]
<sbalukoff> johnsom_: That's not what I'm saying.  [21:12]
<sbalukoff> And you ARE NOT under attack here.  [21:13]
<sbalukoff> Please don't take offense.  [21:13]
<sbalukoff> The point I'm getting at is: As PTL, I need to be able to make sure that 1) we are making progress on this project and 2) people who want to work on it have something to do.  [21:13]
<johnsom_> Again, I didn't think base-image was a priority, so news to me.  Good to know.  [21:14]
<johnsom_> Yep, as I have mentioned, please, as PTL poke me and help me understand the priority.  [21:14]
<sbalukoff> Ok, I'll be more persistent. What is the best way to get a hold of you?  [21:15]
<sbalukoff> johnsom_: So, controller design is also a priority  [21:15]
<johnsom_> There are eleven non-blocked blueprints, two of which are high.  Can we get those folks fired up on one of those?  [21:16]
<sbalukoff> Most of those have people working on them with WIP reviews.  [21:16]
<johnsom_> I am always in this room and at the meetings.  [21:16]
<sbalukoff> You weren't last week. ;P  [21:17]
<sbalukoff> But... whatever.  [21:17]
<johnsom_> I was looking at those that are unassigned.  Are we missing some assignments?  [21:17]
<blogan> yeah what blueprints aren't assigned that have WIPs?  [21:18]
<blogan> dont tell me its trevor  [21:18]
<johnsom_> Really?  I updated the standup last week.  I'm pretty sure I was there  [21:18]
<sbalukoff> Then how did you miss the "if it ain't in gerrit, it doesn't exist" discussion?  [21:18]
<sbalukoff> Also, many of the lower priority blueprints are effectively blocked by other groundwork that needs to be done, even if they aren't listed as such.  [21:19]
<sbalukoff> (Again, I hate launchpad.)  [21:19]
<johnsom_> launchpad has a blocked status  [21:20]
<blogan> i think there are two blueprints that are High that can be worked on, amphorae-scheduler and amphora-health-monitor  [21:20]
<sbalukoff> johnsom_: How close are you to having a spec to review on the controller?  [21:20]
<johnsom_> Anyway, sbalukoff, are we good?  [21:21]
<johnsom_> I'm somewhere between 50% to 75% done  [21:21]
<sbalukoff> johnsom_: Are you good with the "if it ain't in gerrit, it doesn't exist" idea?  [21:21]
<sbalukoff> Because if there's nothing in there that's marked WIP, I don't have any concrete evidence that progress is being made.  [21:22]
<johnsom_> Well, I think we shouldn't be quick to disregard people that have signed up for work, published a spec, etc.  It would be throwing away code/research that is progress by passing tasks around.  [21:23]
<xgerman> johnsom_ +1  [21:23]
<xgerman> sbalukoff, when are you taking roll of gerrit?  [21:24]
<rm_work> Yeah I think I am going to upload my certmanager change today, even though it *can't* pass py26/py27 tests yet because python-barbicanclient v3.0 isn't released to pypi yet  [21:24]
<sbalukoff> johnsom_: Agreed. But I need a way to really know where you're at. The idea of writing a bunch of stuff on your own and then presenting something close to the finished product does not work well in this environment.  [21:24]
<sbalukoff> xgerman: I usually look through gerrit every day, even if I don't have time every day to review.  [21:24]
<xgerman> so if we slack for 10 days you will take it away or so?  [21:25]
<sbalukoff> Mostly, though, if I have someone asking me "what should I work on?" I look at both gerrit and launchpad.  [21:25]
<johnsom_> So what I am hearing is you don't trust what people are reporting in the standups and meetings.  If you look at the log from the Octavia meeting two weeks ago I stated that I was prioritizing the controller spec over the image work to get the spec out early.  I also asked if that was a problem and offered to re-order  [21:26]
<johnsom_> 20:41:07 <blogan> how is the base image coming along?  [21:26]
<johnsom_> 20:42:15 <johnsom> I have done some experimental images and have a good line of sight on the code.  I just prioritized a first draft of the controller spec.  [21:26]
<johnsom_> 20:42:30 <blogan> okay cool, just wanted to get an idea  [21:26]
<johnsom_> 20:42:38 <johnsom> If it is a blocker, I can re-order  [21:26]
<sbalukoff> Ah yes.... that was the week I was on vacation  [21:27]
<sbalukoff> And I should have been quicker to tell you it was more important than the controller spec.  [21:27]
<sbalukoff> Still, y'all seem to be missing the point:  [21:27]
<rm_work> commit early, commit often  [21:27]
<sbalukoff> rm_work: +1  [21:28]
<rm_work> still working on blogan :P but yeah, gerrit WIP is great for iteration, use it the same way you would use a private git repo, that's essentially what it is  [21:28]
<rm_work> having lots of patchsets is not a sin  [21:28]
<rm_work> just means you're actively iterating  [21:29]
<blogan> yes it is! but it is the lesser of sins in this case  [21:29]
<sbalukoff> xgerman: If it appears you've stalled for 10 days, I will probably ask what's up. If someone else is champing at the bit to work on it, I'll probably let them.  [21:29]
<jorgem> I'm an angel then since I have no patchsets :)  [21:29]
<blogan> or a bloodsucking manager  [21:29]
<johnsom_> I don't think I am.  I get it.  I just think you are jumping on me for some alternate motive.  I see open, non-blocked, non-assigned, high blueprints available, so why you want to switch developers doesn't really make sense to me.  It's not like I haven't posted anything.  I wrote and posted the spec.  [21:29]
<xgerman> sbalukoff, I am actively working on it + I will get the WIP up today...  [21:30]
<sbalukoff> Again, no offense is meant by this!  [21:30]
<xgerman> (or better I am pair programming with Min)  [21:30]
<rm_work> sbalukoff: you commented about showing Octavia storing a copy of the user's cert -- remember we decided NOT to do that  [21:30]
<sbalukoff> I'm certainly not saying you lack enthusiasm or are lazy or some other such nonsense.  [21:30]
<sbalukoff> The point is we all have jobs that have us doing many things at once.  [21:30]
<rm_work> (on my TLS review)  [21:30]
<xgerman> yeah, having clear deadlines helps me focus my (and HP's) mind  [21:30]
<sbalukoff> And to keep progress on this project moving forward, we need to be flexible in shifting workloads around.  [21:31]
<rm_work> well, all of us at Rackspace except jorgem are full-time Octavia for the next ... indefinite period  [21:31]
<sbalukoff> rm_work: Will discuss that in a minute.  [21:31]
<xgerman> rm_work, cool  [21:32]
<sbalukoff> rm_work: That's great. That means I can throw more work at you. :)  [21:32]
<rm_work> heh  [21:32]
<rm_work> jorgem is soaking up all the non-Octavia stuff our team has to do :)  [21:32]
<jorgem> I want to be fulltime :(  [21:32]
<sbalukoff> xgerman and johnsom_: Am I being unreasonable about this?  [21:32]
<jorgem> *sniff* *sniff*  [21:32]
<blogan> dont forget neutron lbaas  [21:33]
<xgerman> sbalukoff, yes and no. We like to do the stuff we signed up for so if we know you need it by X we will get it done  [21:33]
<johnsom_> I think the big gap is communication.  Again, I was up front about what I was working on.  People didn't seem to be excited about the base image and I knew the spec would spark excitement, so commit early, commit often....  [21:33]
<sbalukoff> johnsom_: Then let's add frequent and early commits to gerrit as part of our communication strategy.  [21:34]
<johnsom_> No one spoke up when I mentioned the trade off.  [21:34]
<johnsom_> So, can I help you find something for these new folks to work on?  [21:34]
<sbalukoff> johnsom_: Again, I'll take the heat for that.  [21:34]
<sbalukoff> Sure  [21:35]
<xgerman> and more than one person can also work on one blueprint  [21:35]
<sbalukoff> xgerman: +1  [21:35]
<johnsom_> Yep.  [21:35]
<blogan> ive had trevor help me out on a few  [21:35]
<sbalukoff> johnsom_: I would love to see your WIP stuff on the controller. I see that becoming our next big hurdle once we have base images being built  [21:36]
<xgerman> yeah, and he worked closely with ajmiller  [21:36]
<sbalukoff> (And it's the glue between so many other components-- we all need to see how it's coming together.)  [21:36]
<xgerman> so it even works across companies  :-)  [21:36]
<blogan> im sure the controller work will deal with the amphora lifecycle management doc i've been working on  [21:36]
<sbalukoff> And the Operator API  [21:36]
<blogan> yeah  [21:37]
<rm_work> yeah a lot of stuff kinda needs to evolve in parallel  [21:37]
<xgerman> yep, I will try to absorb all of Michael's knowledge so we can discuss at the summit :-)  [21:37]
<sbalukoff> And... well.. all the other -interface(s)  [21:37]
<blogan> osmosis  [21:37]
<johnsom_> I think the health monitoring is open and one I would like to see explored.  I would like to see some proof of life via the monitoring url or something similar.  It is not clear to me where this lives and what the tiers are, i.e. in the amp, controller, etc.  [21:37]
<sbalukoff> xgerman: I'd love to see something in gerrit that we can start talking about, even if it's only 50 - 75% done.  [21:37]
<xgerman> well, writing is a time consuming task :-)  [21:38]
<johnsom_> Tell me about it  [21:38]
<sbalukoff> xgerman: It is. But ultimately, we don't have much concrete to go off of if it isn't written down.  [21:38]
<johnsom_> We had a good conversation about the cert handling too  [21:38]
<blogan> what is time consuming is trying to perfect the seqdiag bc some of the options aren't documented and so you are forced to look at the source code to find out what it supports  [21:38]
<sbalukoff> Heck, even if 90% of the document is "to be determined" the remaining 10% at least gives us something to work off of.  [21:39]
<blogan> and it all feels like a waste of time, but ocd kicks in  [21:39]
<xgerman> lol  [21:39]
<blogan> another good reason to push to gerrit early and often is if by chance you do get pulled off the blueprint, then someone can easily get your code and start from there  [21:40]
<sbalukoff> codekobe and / or intr1nsic: Would y'all be willing to take on the amphora-health-monitor blueprint?  [21:40]
<sbalukoff> blogan: +1  [21:40]
<sbalukoff> xgerman and johnsom_: Just to be clear, "commit early, commit often" is something I NEED to see happen.  [21:41]
<sbalukoff> I realize this might mean a change in how you're used to doing things.  [21:41]
<johnsom_> I think it is a balance we all need to work on  [21:41]
<intr1nsic> sbalukoff I think so  [21:41]
<sbalukoff> We're not playing poker here, y'all. There's no need to hide your hand, eh.  [21:42]
<sbalukoff> And I don't care if the code is crap to begin with.  [21:42]
<johnsom_> sbalukoff I think I have been very straightforward  [21:43]
<codekobe> amphora-health-monitor blueprint aye  [21:43]
<codekobe> when does this need to be done by  [21:43]
<sbalukoff> johnsom_: Ok, I agree. Are you willing to follow "commit early, commit often" with your actual specs and code going forward?  [21:43]
<codekobe> i can get with you on what has already been discussed  [21:43]
<codekobe> sbalukoff i am sure you already have some ideas of what the health check should include  [21:44]
<sbalukoff> codekobe: We'll need it when we plug it into the controller, when we have a reference amphora image being built, and when we have a driver to control it.  [21:44]
<codekobe> sbalukoff: intr1nsic and i can take a look at that  [21:44]
<sbalukoff> No specific date.  [21:45]
<johnsom_> codekobe intr1nsic - awesome.  This is the one area that ties the Amp to a controller.  [21:45]
<codekobe> are we looking for the amp to report in, or for controller to poll?  [21:45]
<sbalukoff> The health check will ultimately touch the amphora API, the amphora driver, and controller.  [21:45]
<codekobe> not sure what has been discussed already  [21:45]
<codekobe> so if it touches the amphora API that would tell me the controller is polling  [21:46]
<sbalukoff> codekobe: We're looking for the amphora to emit health checks at regular intervals. Initially this can be done via a RESTful API which lives on the back-side of the controller (yet to be defined), but ultimately it'll probably happen via HMAC-signed UDP messages from the amphora.  [21:46]
<codekobe> I see, so amphora reports in  [21:47]
<johnsom_> sbalukoff I thought we had decided that the health messages are UDP  [21:47]
<blogan> sbalukoff: if it is done through a restful api doesn't that mean the controller needs to be running a web server?  [21:47]
<sbalukoff> codekobe: We also have a need for a "deep diagnostic" healthcheck (usually, shortly after the amphora is spun up), as well as regular light-weight check-ins after that.  [21:48]
<dougwig> whoa, you guys wrote a book.  [21:48]
<sbalukoff> blogan: Yes, it does.  [21:48]
<blogan> so every component is basically going to have a web server?  [21:48]
<sbalukoff> johnsom_: The talk was to do it over a REST interface at first because people thought that would be simpler to implement out of the gate.  [21:48]
<sbalukoff> Ultimately we do want it over UDP.  [21:48]
<blogan> i feel like we should just do it UDP from the beginning  [21:49]
<xgerman> REST simpler +1  [21:49]
<xgerman> and the interface accounts for that ;-)  [21:49]
<sbalukoff> I think xgerman was the one primarily advocating the REST API for this to begin with.  [21:50]
<xgerman> yes + sballe -- we both love REST!!  [21:50]
*** openstackgerrit has quit IRC  [21:50]
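
    [A minimal sketch of the HMAC-signed UDP heartbeat sbalukoff describes above.
    The message fields, port, and shared key below are illustrative assumptions,
    not an agreed Octavia wire format.]

    # Amphora emits a signed datagram; controller drops anything it cannot verify.
    import hashlib
    import hmac
    import json
    import socket
    import time

    SHARED_KEY = b'per-amphora-secret'        # assumed: provisioned at amphora boot
    CONTROLLER_ADDR = ('192.0.2.10', 5555)    # assumed: controller health listener

    def send_heartbeat(amphora_id, status='ACTIVE'):
        """Emit one signed heartbeat datagram from the amphora."""
        payload = json.dumps({'id': amphora_id,
                              'status': status,
                              'ts': time.time()}).encode('utf-8')
        digest = hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            sock.sendto(payload + b'|' + digest.encode('ascii'), CONTROLLER_ADDR)
        finally:
            sock.close()

    def verify_heartbeat(datagram):
        """Controller side: return the decoded message, or None if the HMAC fails."""
        payload, _, digest = datagram.rpartition(b'|')
        expected = hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected.encode('ascii'), digest):
            return None
        return json.loads(payload.decode('utf-8'))
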
<blogan> i like rest too but it seems like overkill in this instance  [21:51]
<codekobe> sbalukoff: there seems to be a lot of similarity to Trove here  [21:51]
<sbalukoff> codekobe: There is!  [21:51]
<sbalukoff> So if we can totally steal their work, we should.  [21:51]
<intr1nsic> +1  [21:51]
<sballe> sbalukoff: I love REST!  [21:52]
<codekobe> yeah, i can look into how they handle that  [21:52]
<xgerman> and trove is awesome, too, the more we can take from them the better ;-)  [21:52]
<codekobe> they have an api, a task controller, an agent  [21:52]
<sbalukoff> Let's see what trove is doing, and then decide where we go on the health checks.  [21:52]
<rm_work> guys, please give me a sec to catch up  [21:52]
<rm_work> :)  [21:52]
<rm_work> I will probably have comments  [21:53]
<codekobe> http://docs.openstack.org/developer/trove/dev/design.html  [21:53]
<codekobe> we can look that over  [21:53]
* dougwig motions rm_work closer, and says, "RUN!"  [21:53]
<codekobe> we dont have to copy, but we might be able to borrow where it makes sense  [21:53]
<rm_work> lol  [21:54]
<sbalukoff> Anyway, xgerman and johnsom_: Please be aware: Even if you're not willing to commit to "commit early, commit often" in so many words:  Note that that's what I'm expecting to see. Weekly updates or ad-hoc asking of "where are you at?" are not going to be as informative as stuff that's in gerrit.  [21:55]
<rm_work> so, we are talking about... amphora health checks? not node health checks, right?  [21:55]
<xgerman> amphora healthchecks -- node healthchecks are done by haproxy  [21:55]
<rm_work> right  [21:55]
<xgerman> sbalukoff we will amend our life to appease the great leader  [21:56]
<rm_work> so, if the amphora is announcing its healthy state via UDP ...  [21:56]
<sbalukoff> member healthchecks.  [21:56]
<sbalukoff> xgerman: Thank you  [21:56]
<rm_work> what is the threshold for "this amphora is no longer healthy"  [21:56]
<sbalukoff> rm_work: Going too long without a healthcheck.  [21:57]
<xgerman> well, right now we ask with REST and if it times out for X times or gets route not found we shoot it  [21:57]
<rm_work> sbalukoff: so, there is a process monitoring the latest update times for each amphora?  [21:57]
<sbalukoff> rm_work: There needs to be, yes.  [21:57]
<rm_work> ok, what can we expect the poll time on that to be?  [21:57]
<xgerman> there are some more nuances like if a controller in az1 can't reach any lbs in az2 -- this might mean az2 is down or az1 lost network connectivity, etc.  [21:57]
<rm_work> just example numbers -- would it be polling every second to see if an amphora hadn't responded within the last two seconds? or something like that?  [21:58]
<sbalukoff> xgerman: +1  [21:58]
<xgerman> rm_work poll time, etc. all needs to be configurable  [21:58]
<sbalukoff> rm_work: Say, once a minute from each amphora  [21:58]
<rm_work> looking for some sense of scale  [21:58]
<rm_work> sbalukoff: so, the amphora announces once per minute? or we just check for "at least one announce in the last minute"?  [21:58]
<sbalukoff> rm_work: Again, control in v0.5 is not meant to be that scalable.  [21:58]
<rm_work> I meant more like, O(...)  [21:59]
<rm_work> seconds, minutes....  [21:59]
<xgerman> yeah, in libra we allow 60 s downtime so polling like every 5s for 5-6 times would work  [21:59]
<rm_work> ok, I thought we were looking for subsecond failovers  [21:59]
<rm_work> architecting it this way does not work well for that  [22:00]
<sbalukoff> rm_work: That's what active-active or active-standby are supposed to accomplish.  [22:00]
<rm_work> or really even approaching that  [22:00]
<rm_work> alright, but even active-active will need to announce when one amp goes down, in a reasonable amount of time, right?  [22:00]
<sbalukoff> In active-standby, each amphora is probing its partner once per second.  [22:00]
<dougwig> how many amphora per controller are we supporting?  [22:00]
<johnsom_> Does the health check info need to be persistent?  [22:00]
<rm_work> ok  [22:00]
<sbalukoff> dougwig: We don't have a specific number on that yet.  [22:01]
<xgerman> rm_work back to active-active  [22:01]
<codekobe> dougwig: do we have to support amphora per controller?  [22:01]
<xgerman> some component needs to fail over sub second and not send stuff to the broken lb; the controller still has more time to bring in a new lb  [22:01]
<sbalukoff> johnsom_: As in, stored in the database? Most likely yes-- the process receiving the health checks may not be the same one that checks for dead amphorae.  [22:01]
<xgerman> those are two different problems  [22:01]
<rm_work> xgerman: alright  [22:01]
<johnsom_> I could see a controller fail over needing to have the near term health history since we are checking for missing check-in messages.  [22:01]
<codekobe> controller should be HA  [22:01]
<rm_work> so the failover mechanism will be unrelated entirely to health monitoring of the amps  [22:02]
<codekobe> it would be nice for controllers not to have to fail over  [22:02]
<sbalukoff> rm_work: If you want sub-second failover, yes.  [22:02]
<codekobe> but allow for multiple controllers  [22:02]
<sbalukoff> codekobe: That's Octavia v1.0  [22:02]
<codekobe> i suppose that makes it difficult to determine which controller should take action on a failed node  [22:02]
<johnsom_> That is what I thought.  So are we considering something like Redis to handle that transaction rate?  I think we would melt a transactional database  [22:02]
<sbalukoff> Octavia v0.5 is meant mostly to work the bugs out of amphora lifecycle, network connectivity, etc. It'll support many amphorae, but isn't yet concerned with scaling the control layer.  [22:03]
<codekobe> johnsom_: i think we would with the transactions.  Not to mention we probably dont need to store health checks long term  [22:03]
<sbalukoff> johnsom_: That's not a bad idea.  [22:03]
<codekobe> a cache would seem more appropriate  [22:04]
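
    [The "process monitoring the latest update times for each amphora" mentioned
    above could be as small as a last-seen cache plus a periodic sweep; a rough
    sketch, with the 60-second threshold borrowed from the libra example and the
    rest assumed for illustration.]

    import time

    FAILED_THRESHOLD = 60    # assumed: seconds of silence before an amphora is suspect

    last_seen = {}           # amphora_id -> timestamp of the newest heartbeat

    def record_heartbeat(heartbeat):
        """Called by whatever receives heartbeats (REST, UDP, or a queue)."""
        last_seen[heartbeat['id']] = heartbeat['ts']

    def sweep_for_dead_amphorae(now=None):
        """Run on an interval; returns IDs that have gone too long without checking in."""
        now = now or time.time()
        return [amp_id for amp_id, ts in last_seen.items()
                if now - ts > FAILED_THRESHOLD]
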
<rm_work> crazy idea: what about a queue?  [22:04]
<codekobe> haha  [22:04]
<sbalukoff> I will beat you, rm_work.  [22:04]
<sbalukoff> ;)  [22:04]
<johnsom_> I think we have at least two levels of health check going on, one for failover in seconds and one for Amp "cluster" fail over  [22:04]
<rm_work> seriously tho  [22:04]
<rm_work> health announces go onto a queue, queue length is like, 1  [22:04]
<sbalukoff> Queue seems pretty heavy for this.  [22:04]
<xgerman> +1  [22:04]
<rm_work> and it doesn't have to be a hardened queue  [22:04]
<rm_work> that way only the latest health announce is ever on the queue  [22:05]
<codekobe> rm_work: i'm not sure how a queue would work, because you are tracking how long it has been since the last checkin, which implies we are keeping a state, which is not for a queue  [22:05]
<rm_work> whenever the controller wants to check, whatever its polling period  [22:05]
<rm_work> the single message on the queue is the latest status  [22:05]
<sbalukoff> Also, I'd rather not expose amphorae to a queue directly.  [22:05]
<dougwig> we put rest *EVERYWHERE* and a queue is heavy?!??!?  hahahaha.  [22:05]
<dougwig> sorry, i'm going to go insane now.  [22:05]
<rm_work> I mean, the amphora broadcasts to a queue  [22:05]
<johnsom_> Anyway, just some thoughts.  I think the health check thing is going to be interesting.  I need to get back to work as I've been highlighted so many times in the last hour....  [22:05]
<rm_work> not the other way around  [22:06]
<rm_work> amphora -> queue: hello, I am <ACTIVE>, <timestamp>  [22:06]
<sbalukoff> johnsom_: Thanks for your time today.  [22:06]
<rm_work> that happens (n) times  [22:06]
<rm_work> at whatever frequency  [22:06]
<blogan> dougwig: +1  [22:06]
<rm_work> and the controller queries the queue at whatever frequency it wants, and gets the latest status and timestamp  [22:07]
<codekobe> rm_work: i could see that as a way to deliver the message  [22:07]
<sbalukoff> rm_work: Again, don't want a queue exposed directly to the amphora. I see the potential for that being too easily abused if the amphora gets hacked.  [22:07]
<rm_work> if the timestamp is old, then that's bad  [22:07]
<codekobe> although requires more than udp  [22:07]
<codekobe> so the controller would not just get the latest message  [22:07]
<rm_work> sbalukoff: what, so the hacked amphora could announce to a queue of size 1?  [22:07]
<codekobe> it would end up feeding however many messages were sent since last poll?  [22:07]
<rm_work> the latest message is all that matters  [22:07]
<rm_work> codekobe: queue size 1  [22:07]
<rm_work> http://www.rabbitmq.com/maxlength.html  [22:08]
<rm_work> "Messages will be dropped or dead-lettered from the front of the queue to make room for new messages once the limit is reached."  [22:08]
<sbalukoff> rm_work: You now open up your undercloud based on any vulnerabilities (which are probably well known) in the queue software.  [22:08]
<rm_work> and it doesn't have to be a "reliable" queue  [22:08]
<sbalukoff> I'd rather have something we write ourselves doing sanity-checks and whatnot on messages from the amphorae.  [22:09]
<rm_work> sbalukoff: if the amphora only ever writes to the queue (never reads), and the only thing that reads from the queue is only looking for a state and timestamp, and the only possible action to take from that is that it could kill the hacked amphora and put up a new one...  [22:09]
<rm_work> i don't see the problem  [22:09]
<codekobe> rm_work: does queue size 1 always keep the latest message, i would think it would just stop accepting messages after reaching that size  [22:09]
<rm_work> "Messages will be dropped or dead-lettered from the front of the queue to make room for new messages once the limit is reached."  [22:09]
<rm_work> per the rabbitmq docs  [22:10]
<codekobe> thanks^^  cant read apparently  [22:10]
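
    [For reference, the length-1, non-durable queue rm_work is describing can be
    declared with RabbitMQ's x-max-length argument; a sketch using pika, with the
    queue name, broker host, and staleness threshold as illustrative assumptions.]

    import json
    import time

    import pika

    connection = pika.BlockingConnection(
        pika.ConnectionParameters(host='broker.example.net'))
    channel = connection.channel()

    # Non-durable queue capped at one message: RabbitMQ drops the oldest message
    # to make room, so only the newest heartbeat is ever kept.
    channel.queue_declare(queue='amphora-health.amp-0001',
                          durable=False,
                          arguments={'x-max-length': 1})

    # Amphora side: publish the latest status (non-persistent delivery).
    channel.basic_publish(exchange='',
                          routing_key='amphora-health.amp-0001',
                          body=json.dumps({'status': 'ACTIVE', 'ts': time.time()}),
                          properties=pika.BasicProperties(delivery_mode=1))

    # Controller side: grab whatever is newest, whenever it polls.
    method, header, body = channel.basic_get(queue='amphora-health.amp-0001',
                                             auto_ack=True)
    if body is not None:
        heartbeat = json.loads(body)
        is_stale = time.time() - heartbeat['ts'] > 60   # assumed threshold
    connection.close()
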
<sbalukoff> rm_work: Unless, say, there's a vulnerability in rabbitmq that just requires having access to any queue to exploit. I mean, clearly that software has been well hardened</sarcasm>  [22:10]
<rm_work> sbalukoff: i mean, what could be exploited?  [22:10]
<rm_work> i guess being able to read from the queue, they could see which amphorae are up?  [22:10]
<codekobe> rm_work: i think this would also allow for ha controllers  [22:10]
<codekobe> right out of the gate  [22:11]
<rm_work> codekobe: right  [22:11]
<sbalukoff> rm_work: Are you expecting me to know all the ways in which someone might try exploiting rabbitmq?  [22:11]
<codekobe> whichever controller pulls the old message can take action  [22:11]
<sbalukoff> A cache would allow for HA controllers, too.  [22:11]
<rm_work> codekobe: correct, that was my thought  [22:11]
<codekobe> and the controller's job primarily is going to be to execute tasks from a queue  [22:11]
<rm_work> sbalukoff: the queue is a cache  [22:11]
<codekobe> that is all it is doing with api calls  [22:11]
<codekobe> api calls go to queue, controller picks up  [22:11]
<rm_work> sbalukoff: trying to imagine the worst possible thing that could happen  [22:12]
<rm_work> i guess, they take down the queue?  [22:12]
<rm_work> which would cause the controller to freak out  [22:12]
<rm_work> and not be able to get status from any amphora  [22:12]
<codekobe> so it does kind of make sense for amphora to write to 1 message queue, and if the timestamp is too old, the task takes action, otherwise it ignores  [22:12]
<rm_work> which could cause downtime...  [22:12]
<sbalukoff> rm_work: I look and see a bloated piece of software on whose access we wouldn't be able to do sanity checks.  [22:13]
<codekobe> so i imagine we are having a cluster of queues  [22:13]
<sbalukoff> I don't like it.  [22:13]
<rm_work> codekobe: at least you seem to understand what i'm getting at :P  [22:13]
<codekobe> and i also imagine that if the queue is down, our api is down  [22:13]
<rm_work> sbalukoff: we're using queues other places too...  [22:13]
<codekobe> because most configuration calls will be async  [22:13]
<rm_work> sbalukoff: do we not like queues anymore now?  [22:13]
<sbalukoff> rm_work: Nothing as exposed to the amphorae  [22:13]
<sbalukoff> Remember, amphorae are going to be the front-lines for attacks.  [22:14]
<sbalukoff> They *will* occasionally be exploited.  [22:14]
<rm_work> sbalukoff: well, I assume we won't have any remote access running on public ports  [22:14]
<sbalukoff> rm_work: Well, I hate rabbit. :P  [22:14]
<rm_work> so if they take over HAProxy...  [22:14]
<sbalukoff> Mostly because it's an unreliable piece of crap. ;)  [22:15]
<sbalukoff> But... eh...  [22:15]
<rm_work> well, let's even say they somehow manage to INSTALL SSH on it and get it running on a port, and log in to the amphora with root access  [22:15]
<sbalukoff> rm_work: If they take over the amphora, we need to minimize their ability to affect other amphorae or the undercloud.  [22:15]
<rm_work> at that point, they could go after the queue I guess, or the API  [22:15]
<sbalukoff> This is part of the reason amphorae live on an "LB Network" and not any undercloud network.  [22:15]
<codekobe> so they could still attack the api  [22:16]
<rm_work> sbalukoff: i mean, they ARE on the network with our API  [22:16]
<sbalukoff> Sure, but it's easier for us to limit the effectiveness of that attack since there will be only a few types of messages that get passed over it.  [22:16]
<codekobe> and now you are talking about having a shared secret between the amphorae and the controllers so that it cant affect other amphorae  [22:16]
<rm_work> there would only be ONE type of message on this queue  [22:16]
<sbalukoff> codekobe: Yes, indeed.  [22:16]
<codekobe> so  [22:17]
<codekobe> i have an idea  [22:17]
<codekobe> hear me out  [22:17]
<rm_work> so, you're expecting them to be able to... DOS the queue?  [22:17]
<sbalukoff> rm_work: There are going to be more than just healthchecks on this.  [22:17]
<sbalukoff> edge notifications will also happen over this mechanism (eventually)  [22:17]
<codekobe> ok  [22:17]
<rm_work> sbalukoff: i thought we were talking about the healthchecks mechanism :P  [22:17]
<codekobe> so if this is a 1 message queue  [22:17]
<sbalukoff> (BRB... really need to use the bathroom.)  [22:17]
<codekobe> then that means each amphora will have its own queue  [22:17]
<codekobe> which can also have its own creds for that queue  [22:18]
<rm_work> yeah, i wonder how rabbit can handle that scale  [22:18]
<codekobe> so you could isolate the attack there  [22:18]
<rm_work> codekobe: i think he is worried that rabbitmq might have a vulnerability that could allow a DoS?  [22:18]
<codekobe> because if the amp was compromised, it would only have access to its own queue  [22:18]
<rm_work> essentially it seems like he doesn't trust rabbitmq to be secure  [22:18]
<codekobe> well, you can sign messages....  [22:19]
<rm_work> i mean really, it's just anything that speaks AMQP that we're talking about here  [22:19]
<codekobe> yes  [22:19]
<codekobe> not just rabbit  [22:19]
<codekobe> but rabbit is probably the most common implementation  [22:19]
<codekobe> but really we will HAVE to use oslo.messaging  [22:19]
<rm_work> hmm yes  [22:19]
<codekobe> so whatever amqp backend it supports  [22:19]
<codekobe> so i think the DoS attack angle is a moot point  [22:20]
<xgerman> codekobe each amphora has its own queue is a bad idea. Trust me!  [22:20]
<codekobe> if an amphora gets compromised it can DoS the controller rest-api just as much as a queue  [22:20]
<xgerman> Sorry, I am distracted (sbalukoff wants me to put stuff in gerrit)  [22:20]
<codekobe> lol^  [22:20]
<codekobe> so this design will end up in gerrit here  [22:21]
<codekobe> but this is a good discussion to figure out what that spec will look like  [22:21]
<xgerman> no, it's some other things I promised him  [22:21]
<codekobe> gotcha  [22:21]
<xgerman> but amphora talking to queue is bad  [22:21]
<xgerman> it introduces a single point of error you don't have when you make the controller talk to REST IMHO  [22:21]
<codekobe> continuing to think it through  [22:21]
<sbalukoff> So yes, I don't trust rabbitmq to be secure.  [22:21]
<rm_work> xgerman: it also removes a single point of error, which is the controller  [22:22]
<rm_work> IE, a controller failover event  [22:22]
<codekobe> i dont follow the single point of error, as the queue is clustered, but i do think it will be annoying for every new amphora to require a 1 message rabbit queue to be made with creds  [22:22]
<rm_work> the queues essentially get created on the fly IIRC  [22:22]
<codekobe> yeah, with a queue, you do get HA controllers out of the box  [22:22]
<codekobe> with no failover  [22:23]
<sbalukoff> And it's not really about a DOS vulnerability-- I mean, it's actually not hard to DOS an amphora from the internet, since these things will usually be the front-end "webserver" for a given openstack cluster.  [22:23]
<rm_work> i've never had to "create" a queue  [22:23]
<codekobe> just a grid of controllers  [22:23]
<rm_work> you just define it and start using it, and it's there  [22:23]
<codekobe> but we will have to apply creds to it i imagine, but i guess that is trivial  [22:23]
<codekobe> it is a weird pattern, but it does give you HA controllers  [22:24]
<intr1nsic> I think using rabbit is more secure than trying to re-invent something different  [22:24]
<codekobe> without having to worry about failover  [22:24]
<codekobe> and we arent going to avoid using rabbit (or favorite amqp here)  [22:24]
<rm_work> and it works more reliably than UDP and allows the amphora announce time to diverge from the controller poll time, and doesnt risk overloading the controller with health messages  [22:25]
<blogan> better question is if this has been tried before and what were the results  [22:25]
<codekobe> that is a good question  [22:25]
<blogan> how does trove do heart beats?  [22:25]
<codekobe> i still need to deep dive how trove is doing this  [22:25]
<xgerman> well codekobe we tried that before at HP and it doesn't work  [22:25]
<rm_work> xgerman: what did you try exactly?  [22:25]
<intr1nsic> Tried using the queue for heartbeats?  [22:25]
<rm_work> this is a very specific configuration  [22:25]
<rm_work> single-message queues, NOT in reliable mode  [22:26]
<codekobe> I am blaming this on rm_work, he brought it up first hehe  [22:26]
<rm_work> durable, whatever  [22:26]
<xgerman> yes, we tried using a queue for heartbeats --  [22:26]
<rm_work> codekobe: heh, fine by me  [22:26]
<rm_work> i did lead with "this is a crazy idea" :P  [22:26]
<rm_work> but i like it  [22:26]
<rm_work> a lot  [22:26]
<codekobe> but if it works i would like to be known as an advocate  [22:26]
<rm_work> would need to do some testing to check feasibility at scale  [22:26]
<codekobe> yes, needs a load test probably  [22:27]
<rm_work> just because no one has done it before, doesn't mean it's a bad idea :P  [22:27]
<blogan> im sure someone has done it before  [22:27]
<codekobe> i mean, the queue should be able to handle it as well as an http api  [22:27]
<xgerman> well, I said we did that before  [22:27]
<sbalukoff> xgerman is saying they've done it before.  [22:27]
<codekobe> but we would need some test here  [22:27]
<rm_work> i am still not clear that they did EXACTLY this  [22:27]
<blogan> xgerman: what issues arose? load issues?  [22:27]
<rm_work> specifically, single-message queue length, and non-durable  [22:28]
<xgerman> well, none of the existing queue implementations are very good at HA in a neutron network  [22:28]
<rm_work> and yeah, i'm curious what the problem ended up being that prevented them from moving forward with it :)  [22:28]
<rm_work> err, wat  [22:28]
<rm_work> you couldn't get the queues to work HA?  [22:28]
<rm_work> i feel like that's a problem for using queues for anything, in general -- which is a problem because we use queues elsewhere in Octavia already!  [22:29]
<blogan> well i dont care what way we do it, as long as it works well, but i have to reiterate that I believe a REST API for just a heartbeat is a bit overkill, similar to how it was said a queue was heavy handed  [22:29]
<xgerman> well, hear me out.  [22:29]
<sbalukoff> blogan: I'm actually fine with going with a UDP-based heartbeat (and edge notifications) from the start.  [22:30]
<xgerman> So for RabbitMQ you can only run Active standby in a neutron network - which limits your scale  [22:30]
<sbalukoff> Agreeing to do it over REST was a compromise.  [22:30]
<rm_work> sbalukoff: well, i have a trove of concerns (lol, pun) about the UDP method  [22:30]
<sbalukoff> (over REST as well / or at first)  [22:30]
<blogan> sbalukoff never makes compromises, who is this imposter?  [22:30]
<sbalukoff> rm_work: I'm sure you do.  [22:30]
<codekobe> well lets define some requirements maybe?  [22:30]
<sbalukoff> codekobe: Probably a really good place to start.  [22:31]
<codekobe> because there is at least one thing we should think about before choosing an implementation  [22:31]
<xgerman> also when you do a healthcheck over the queue you are testing that the VM can send messages to the queue but NOT if the VM is reachable from outside  [22:31]
<sbalukoff> That, and educating ourselves on how Trove does this, and what didn't work with the HP implementation.  [22:31]
<blogan> just fyi, the amphora lifecycle management doc will not go into detail about how the heartbeat will be accomplished, just that it is done  [22:31]
<codekobe> Do we want the controllers to be HA, or have to failover  [22:31]
<xgerman> we had vms which could send messages to the queue but were not accessible anymore from the outside (all because of Neutron)  [22:31]
<rm_work> xgerman: hmm, so the healthchecks would be from the amp to the controller API via the public interface?  [22:31]
<sbalukoff> blogan: Agreed. That doc is supposed to be relatively high-level anyway.  [22:31]
<blogan> xgerman: couldn't that same thing happen over a rest api? it can only send requests to the controller  [22:32]
<xgerman> rm_work I was thinking the controller would poll the Amp in 0.5 and then we emit udp  [22:32]
<xgerman> but that doesn't solve the wacky neutron problems we have seen at HP  [22:32]
<rm_work> if the controller polls the amp, just let the amp respond :/  [22:32]
<sbalukoff> codekobe: Have a look at the design documents for v0.5, v1.0 and v2.0 to answer that question. :)  [22:32]
<rm_work> like, as the response to the poll :P  [22:32]
<codekobe> ahh  [22:32]
<codekobe> i wasnt around for those talks :(  [22:33]
<rm_work> why make a synchronous operation async and UDP? :P  [22:33]
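
    [By contrast, the poll-driven approach xgerman mentions earlier ("we ask with
    REST and if it times out for X times ... we shoot it") is roughly this shape.
    The URL, port, and limits are assumptions, and mark_amphora_failed is a
    hypothetical cleanup hook, not an existing Octavia call.]

    import requests

    POLL_TIMEOUT = 2       # seconds per request (assumed)
    FAILURE_LIMIT = 5      # consecutive misses before acting (assumed)

    def poll_amphora(amphora, failure_counts):
        """Controller polls one amphora; shoot it after too many consecutive misses."""
        url = 'https://%s:9443/v1/health' % amphora['lb_network_ip']  # assumed path
        try:
            # Certificate validation is skipped in this sketch for brevity.
            resp = requests.get(url, timeout=POLL_TIMEOUT, verify=False)
            resp.raise_for_status()
            failure_counts[amphora['id']] = 0
            return True
        except requests.RequestException:
            misses = failure_counts.get(amphora['id'], 0) + 1
            failure_counts[amphora['id']] = misses
            if misses >= FAILURE_LIMIT:
                mark_amphora_failed(amphora)   # hypothetical cleanup hook
            return False
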
<codekobe> i think it will be hard to switch controller from failover to HA  [22:33]
<xgerman> but a common problem for us health check wise is that the VM replies/emits heartbeats in the control network but lost contact to the public Internet  [22:33]
<codekobe> rather than try and design HA  [22:33]
<xgerman> codekobe controllers will be a cluster -- so intrinsic HA  [22:34]
<codekobe> ok, i'll have to read up on that  [22:34]
<sbalukoff> xgerman: So that wasn't a problem I was looking to solve with the UDP-emitted healthchecks (I envisioned them being just over the LB network), but that's good to know.  [22:34]
<intr1nsic> I'm interested in xgerman's experiences but the neutron <-> public access sounds worse than just making sure the instance is up  [22:34]
<rm_work> xgerman: yeah i don't really understand what method you're advocating for  [22:34]
<codekobe> so the amphora should report whether it can hit some public interface?  [22:35]
<rm_work> or, rather, I don't understand the method you mentioned  [22:35]
<xgerman> well, I am just telling you about the problems we have with our queue based healthcheck  [22:35]
<xgerman> it's not like we solved them  [22:35]
<rm_work> anyway, UDP is a neat idea but impractical for healthchecks when the times get very low -- if you're trying to get down to just a few seconds, and it emits every second, and it misses a couple in a row -- that's not great  [22:36]
<xgerman> for what?  [22:36]
<sbalukoff> rm_work: Again, for sub-second failover we don't rely on this system.  [22:36]
<xgerman> I am still not convinced that the controller needs to initiate a failover -- it just needs to clean up  [22:36]
<sbalukoff> xgerman: +1  [22:37]
<rm_work> sbalukoff: true, but it shouldn't be long, or you'll end up with cascade failures  [22:37]
<xgerman> in Octavia 2.0 I think the failover will be done by our ODS component  [22:37]
*** fnaval has quit IRC  [22:37]
<sbalukoff> rm_work: not long in this case is probably a minute or two.  [22:37]
<rm_work> if load is what takes down an amphora, then not replacing a downed amp quickly will just lead to an outage  [22:37]
<sbalukoff> ODS?  [22:37]
<xgerman> Open Daylight  [22:38]
<xgerman> or SDN  [22:38]
<sbalukoff> rm_work: A cascading failure is inevitable in that case  [22:38]
<xgerman> yeah, what rm_work likes to do is scale up  [22:38]
<sbalukoff> You literally can't replace them fast enough if you can't scale horizontally.  [22:38]
<sbalukoff> Aah.  [22:38]
<rm_work> i mean, i wasn't really saying it now, since autoscaling is a long way down the road, but yeah eventually i'd like to see one amp goes down -> two replace it automatically  [22:39]
<rm_work> at least short-term  [22:39]
<codekobe> i could see that, if it is due to load  [22:39]
<rm_work> though if it's a real DDoS we'd need mitigation at the network layer  [22:39]
<sbalukoff> rm_work: Ok, so assuming autoscale is running at n+1 capacity, you still have time to "slowly" detect a failure and clean up.  [22:40]
<sbalukoff> If you're not running n+1 capacity, you're screwed in any case.  [22:40]
<rm_work> sbalukoff: I am assuming active-active  [22:40]
<rm_work> BTW  [22:40]
<sbalukoff> rm_work: So am I.  [22:40]
<rm_work> k  [22:40]
<xgerman> well, I am assuming like 20 active LBs on the same VIP  [22:40]
<sbalukoff> xgerman: Yep.  [22:41]
<rm_work> yeah I guess if you're at 20, losing one and replacing it within 60s isn't too bad  [22:41]
<sbalukoff> Especially if those 20 are doing the load of 19.  [22:41]
<sbalukoff> (ie. n+1)  [22:41]
<sbalukoff> (At most the load of 19)  [22:41]
<rm_work> yeah, predictive autoscaling is better than reactive  [22:41]
<sbalukoff> Though if we're going there, it should probably be a percentage of extra capacity rather than just raw amphorae.  [22:42]
<rm_work> getting off-topic though, i didn't really mean to bring that up  [22:42]
<sbalukoff> :)  [22:42]
<rm_work> i was just saying i'd like replacement time to not be long  [22:42]
<xgerman> no worries it's good to talk about those things  [22:42]
<sbalukoff> I can see why you did, I think. I was pointing out that we can't rely on the controller to initiate failovers.  [22:42]
<sbalukoff> Only clean up from them.  [22:42]
<codekobe> so is the spec i am to work on still going to cover healthcheck process? or is the entire lifecycle  [22:42]
<xgerman> and it was never my intention to make the controller do that  [22:43]
<sbalukoff> codekobe: blogan is working on the lifecycle spec.  [22:43]
<rm_work> yeah but getting onto "predictive autoscaling" at this point is like talking about what we're going to do when we land on a planet in a different galaxy. let's get to Mars first, plox  [22:43]
<sbalukoff> You can coordinate with him on that, eh.  [22:43]
<codekobe> awesome, so i still care how we implement the healthcheck then  [22:43]
<xgerman> so healthchecks have two components:  [22:43]
<xgerman> How do we measure the health of the lb?  [22:43]
<xgerman> - network reachability  [22:43]
<xgerman> is haproxy running (pid)  [22:43]
<xgerman> is it working (stats?)  [22:44]
<xgerman> do we see traffic?  [22:44]
<xgerman> ...  [22:44]
<xgerman> and then how do we get that info to the controller  [22:44]
<rm_work> yeah, and checking that can be initiated either by the controller or by the amp  [22:44]
<rm_work> and we're trying to solve for scalability  [22:44]
<sbalukoff> xgerman: So all of that can be handled via some localized health-check daemon on the amphora, and then reported to the controller somehow, right?  [22:44]
<rm_work> I think that was the reason for going to UDP  [22:44]
<codekobe> ok, so we still need to figure out the whole amqp vs udp thing  [22:44]
<codekobe> etc  [22:44]
<rm_work> but I think we also can't trade too much reliability  [22:44]
<sbalukoff> codekobe: Yes.  [22:45]
<rm_work> which is where the queue idea came in -- since it solves for both  [22:45]
<codekobe> i'm not sure if we made that decision or just got sidetracked earlier  [22:45]
*** jorgem has quit IRC  [22:45]
<xgerman> sbalukoff yes  [22:45]
<rm_work> we got sidetracked  [22:45]
<rm_work> UDP is unreliable, but closer to being a scalable option  [22:45]
<sbalukoff> And I would prefer to look at what Trove is doing, and find out if there were any queue-related problems from HP's experience doing this.  [22:45]
<rm_work> REST is reliable, but less scalable  [22:45]
<rm_work> (IMO)  [22:45]
<sbalukoff> rm_work: Agreed.  [22:46]
<rm_work> AMQP is reliable and scalable  [22:46]
<rm_work> (again, IMO)  [22:46]
<rm_work> so that's why I proposed a queue based solution  [22:46]
<sbalukoff> Part of the idea here is that it shouldn't matter if a health check is missed once in a while. Hence the reason I didn't worry too much about UDP.  [22:46]
<blogan> not reliable from xgerman's experience  [22:46]
<rm_work> yeah, so, revising to "reliable in theory"  [22:46]
<xgerman> not reliable for a lot of messages/clients  [22:46]
<sbalukoff> Also not from ours--  [22:46]
<rm_work> so, we need to validate concerns  [22:47]
<sbalukoff> Our first load balancing product used rabbitmq for its messaging.  [22:47]
<sbalukoff> The problem we found was that client libraries were unreliable.  [22:47]
<blogan> well we will most likely be using rabbitmq for messaging correct? at least from the API to the controller  [22:47]
<sbalukoff> They didn't handle network events, or server restarts well.  [22:47]
<rm_work> yeah, but there is a huge difference between durable and non-durable queues, for one -- and the number of messages that go on the queue make a difference as well  [22:47]
<xgerman> yes, and we anticipate not having 10,000 controllers  [22:47]
<codekobe> network events they do not  [22:47]
<rm_work> we're talking about 100% non-durable for this  [22:47]
<sbalukoff> blogan: Fewer components will rely on it. :P  [22:48]
<rm_work> non-durable non-persistent queues should not have any of the problems you are mentioning  [22:48]
<blogan> good point  [22:48]
<codekobe> ahh  [22:48]
<blogan> i was just validating that we will be using it still  [22:48]
<codekobe> ok, well any way we can timeout on this topic?  [22:48]
<codekobe> we should dig into trove  [22:48]
<blogan> yeah im about to head out  [22:48]
<codekobe> or at least I should  [22:48]
<rm_work> k  [22:48]
<codekobe> and then report back  [22:48]
<codekobe> maybe tomorrow?  [22:48]
<sbalukoff> Sure  [22:48]
<xgerman> well, practically we run a cluster of queues with one queue server in each availability zone  [22:48]
<sbalukoff> (Also, again, I don't think rabbitmq is hardened enough for the environment we'll be putting it in.)  [22:49]
<xgerman> yep, most queues we have seen have issues if the network is less than pristine  [22:49]
<rm_work> hmm  [22:49]
<sbalukoff> xgerman: I didn't understand the last part of what you just typed.  [22:49]
<rm_work> i have used queues over the Internet and never had problems >_>  [22:50]
<rm_work> i was kinda hoping a local network wouldn't be a problem  [22:50]
<blogan> rm_work: have you used queues over neutron at scale?  [22:50]
<xgerman> ok, we have issues where the neutron network drops packets from time to time, which seems to make queue servers very angry  [22:50]
<rm_work> to be fair, no  [22:50]
<rm_work> but that's why i said we'd need to do some load testing  [22:50]
<rm_work> xgerman: like, for their cluster sync?  [22:51]
<blogan> but if xgerman has experiences in it already, thats good enough a test  [22:51]
<sbalukoff> xgerman: And UDP messaging would be OK with this, so long as packet loss isn't high.  [22:51]
<xgerman> rm_work -- yes, cluster sync is a no-go  [22:51]
<rm_work> blogan: well, i'm saying that what i proposed could be significantly different enough from what they tried to make it a different beast  [22:51]
<xgerman> since we can only run active-passive to avoid split brain  [22:51]
<rm_work> but, if the problems were with the basic infra, then maybe not  [22:51]
*** dboik has quit IRC  [22:52]
<rm_work> well, you do have to remember that we're talking about this as opposed to *UDP*  [22:52]
<rm_work> so reliability concerns are ... >_>  [22:52]
<blogan> might be different but if the problem is the queues and the unreliable neutron networks then it probably won't matter  [22:52]
<sbalukoff> Actually, they're better with UDP.  [22:52]
<blogan> anyway i gotta go  [22:52]
<blogan> ill talk to yall later  [22:52]
<xgerman> cool  [22:52]
<sbalukoff> If rabbit gets messed up from a few dropped packets, it doesn't recover gracefully.  [22:52]
<xgerman> no it doesn't  [22:52]
<rm_work> ok, so, maybe not rabbit <_<  [22:52]
<rm_work> but i'm talking about AMQP  [22:53]
<xgerman> well, gearman will hang, too  [22:53]
<sbalukoff> If we drop a few UDP packets, who cares? Things keep on chugging once the next packets come through.  [22:53]
<xgerman> yeah, same with REST  [22:53]
<sbalukoff> xgerman: +1  [22:53]
<xgerman> also our problems get magnified since we run the queues with TLS  [22:53]
<sbalukoff> True, and we will need to, since the LB Network is not a trusted network.  [22:54]
<sbalukoff> Anyway, again, let's find out what Trove is doing.  [22:54]
<sbalukoff> And table this for now.  [22:54]
<sbalukoff> (Any objections to that?)  [22:54]
<rm_work> kk  [22:54]
<intr1nsic> +1  [22:55]
<xgerman> I know the PTL; we can ask him in paris ;-)  [22:55]
<sbalukoff> Nice!  [22:55]
<xgerman> yeah, he is HP...  [22:55]
<rm_work> heh  [22:56]
<rm_work> gs  [22:56]
<rm_work> err  [22:56]
<rm_work> just put up https://review.openstack.org/131889  [22:57]
<sbalukoff> Ok, I'mma go get some lunch. I'll send my unreasonable demands (chat) at you later!  [22:59]
<rm_work> lol lunch  [22:59]
<rm_work> so, I went ahead and jumped the gun and implemented the TLS spec (partially) in that CR :)  [23:00]
<sbalukoff> Bastard!  [23:01]
<sbalukoff> Ok, I'll have a look when I get back. ;)  [23:01]
<ajmiller> trevorv blogan sbalukoff I just posted a new patch to https://review.openstack.org/#/c/130002/6  [23:06]
<xgerman> rm_work how do we mark as work in progress?  [23:15]
<xgerman> anybody?  [23:16]
<xgerman> ok, figured it out  [23:18]
*** ptoohil__ has joined #openstack-lbaas  [23:34]
*** ajmiller has quit IRC  [23:37]
*** ptoohill_ has quit IRC  [23:38]
*** barclaac|2 has quit IRC  [23:45]
<rm_you|> xgerman: ah sorry, yeah  [23:50]
<xgerman> no worries  [23:50]
<rm_you|> review -> workflow -1  [23:50]
<rm_you|> but you got it  [23:50]
<xgerman> yeah, I am pair programming with Min  [23:50]
*** barclaac has joined #openstack-lbaas  [23:55]
*** rm_you| is now known as rm_you  [23:56]

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!