Tuesday, 2013-10-22

*** jdaggett has joined #openstack-marconi00:15
*** jdaggett has quit IRC00:19
*** amitgandhi has joined #openstack-marconi00:25
*** reed has quit IRC00:43
*** reed has joined #openstack-marconi00:44
*** jdaggett has joined #openstack-marconi00:45
*** jdaggett has quit IRC00:50
*** nosnos has joined #openstack-marconi00:52
*** jdaggett has joined #openstack-marconi01:36
*** amitgandhi has quit IRC01:36
*** oz_akan_ has joined #openstack-marconi01:41
*** oz_akan_ has quit IRC02:22
*** reed has quit IRC02:38
*** reed has joined #openstack-marconi05:49
*** yassine has joined #openstack-marconi08:11
*** cthulhup has joined #openstack-marconi08:22
*** reed has quit IRC09:11
*** tedross has joined #openstack-marconi11:38
*** malini_afk is now known as malini11:40
*** malini is now known as malini_afk12:13
*** malini_afk is now known as malini12:15
*** nosnos has quit IRC12:27
*** nosnos has joined #openstack-marconi12:28
*** nosnos has quit IRC12:32
*** ayoung has quit IRC12:51
*** jcru has joined #openstack-marconi12:54
*** oz_akan_ has joined #openstack-marconi13:08
*** alcabrera has joined #openstack-marconi13:10
alcabreraGood morning! :D13:11
*** mpanetta has joined #openstack-marconi13:20
*** malini is now known as malini_afk13:31
*** amitgandhi has joined #openstack-marconi13:42
*** ayoung has joined #openstack-marconi13:49
openstackgerritZhihao Yuan proposed a change to openstack/marconi: feat(shard): queue listing from multiple sources  https://review.openstack.org/5312714:19
zyuan^^ i'm trying to find out which change makes py26 fail14:19
zyuandon't review these14:20
alcabrerak14:22
openstackgerritZhihao Yuan proposed a change to openstack/marconi: feat(shard): queue listing from multiple sources  https://review.openstack.org/5312714:24
*** reed has joined #openstack-marconi14:30
openstackgerritZhihao Yuan proposed a change to openstack/marconi: feat(shard): queue listing from multiple sources  https://review.openstack.org/5312714:36
openstackgerritZhihao Yuan proposed a change to openstack/marconi: feat(shard): queue listing from multiple sources  https://review.openstack.org/5312715:00
openstackgerritZhihao Yuan proposed a change to openstack/marconi: feat(shard): queue listing from multiple sources  https://review.openstack.org/5312715:04
*** whenry has joined #openstack-marconi15:05
*** jdaggett1 has joined #openstack-marconi15:08
*** malini_afk is now known as malini15:08
openstackgerritAlejandro Cabrera proposed a change to openstack/marconi: feat: add shard management resource  https://review.openstack.org/5070215:13
openstackgerritAlejandro Cabrera proposed a change to openstack/marconi: feat: shards storage controller interface  https://review.openstack.org/5072115:14
openstackgerritZhihao Yuan proposed a change to openstack/marconi: feat(shard): queue listing from multiple sources  https://review.openstack.org/5312715:17
*** vkmc has joined #openstack-marconi15:17
*** vkmc has quit IRC15:17
*** vkmc has joined #openstack-marconi15:17
*** kgriffs_afk is now known as kgriffs15:23
openstackgerritZhihao Yuan proposed a change to openstack/marconi: feat(shard): queue listing from multiple sources  https://review.openstack.org/5312715:24
zyuanplease review https://review.openstack.org/#/c/53127/15:29
zyuani changed nothing, and it passed. jenkins you win.15:29
alcabreralol15:30
alcabreraI'll review that patch once I finish getting jenkins happy on my local rebase.15:31
alcabrera*appeasing jenkins15:31
alcabreraerrmm... hmm.. jenkins -> tox15:32
alcabrera:P15:32
kgriffsalcabrera: got a minute to discuss health endpoint stuff?15:40
alcabrerayup15:40
kgriffsok, so we have two bugs15:40
kgriffshttps://bugs.launchpad.net/marconi/+bug/124292615:40
alcabrerawhitelisting for keystone ^^15:41
kgriffsso, my thinking there was to have keystone inject a header, e.g., X-Auth-Whitelisted15:41
kgriffsor something15:41
kgriffsso the other middleware can key off of that and not complain about missing auth stuff15:41
kgriffsthat would allow the solution to work with EOM as well as if an operator is just deploying with out-of-box middleware support15:42
alcabreraI see. As for keystone itself, we'd need support for whitelisting particular routes.15:42
alcabreras/itself/input15:43
zyuanhmm15:43
zyuanthat looks scary15:43
kgriffswhy scary?15:43
zyuanwhat if user send X-Auth-Whitelisted?15:43
mpanettaDon't forget that the capabilities of the thing running the health check is pretty minimal...15:43
mpanettaI can't send any headers.15:44
kgriffsthe keystone middleware would strip that header from the client, just like it already strips X-Roles15:44
kgriffs(and such)15:44
zyuanour X-Project-Id?15:44
kgriffsmpanetta: the LB does not need to send that header15:44
mpanettaAh ok15:44
kgriffsit would be injected by the keystone middleware15:44
kgriffsjust to notify downstream middlewarez15:45
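
The header idea above can be sketched as plain WSGI middleware. This is only an illustration, not the keystone middleware: the class name WhitelistingAuth, the validate_token callable, and the whitelisted path list are all hypothetical. The one point taken from the discussion is that any client-supplied copy of X-Auth-Whitelisted is stripped before the middleware sets its own.

```python
# WSGI exposes the X-Auth-Whitelisted request header under this environ key.
WHITELIST_KEY = 'HTTP_X_AUTH_WHITELISTED'


class WhitelistingAuth(object):
    """Hypothetical stand-in for an auth middleware with route whitelisting."""

    def __init__(self, app, whitelisted_paths, validate_token):
        self.app = app
        self.whitelisted_paths = frozenset(whitelisted_paths)
        # validate_token is an assumed callable: token (or None) -> bool
        self.validate_token = validate_token

    def __call__(self, environ, start_response):
        # Never trust a copy of the header sent by the client.
        environ.pop(WHITELIST_KEY, None)

        if environ.get('PATH_INFO') in self.whitelisted_paths:
            # Tell downstream middleware that missing auth data is expected.
            environ[WHITELIST_KEY] = 'true'
            return self.app(environ, start_response)

        if not self.validate_token(environ.get('HTTP_X_AUTH_TOKEN')):
            start_response('401 Unauthorized', [('Content-Length', '0')])
            return [b'']

        return self.app(environ, start_response)
```

Downstream middleware would then check for the environ key and skip its own complaints about missing auth data when it is present.
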
alcabrerahmmm...15:45
kgriffsthat is one option15:45
kgriffsother options include having another app15:45
zyuan...15:45
kgriffstwo variants on that15:46
zyuanif we only allow the header on 1 endpoint, that looks fine15:46
kgriffsa. simple middleware app that implements its own health, otherwise passes through15:46
kgriffsb. middleware that implements health check but translates it to a request to post something to a special/hidden "health check" queue15:47
alcabreraI kind of like (a), given that it would just wrap the marconi app.15:48
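
A minimal sketch of option (a), assuming a plain WSGI stack: a wrapper that answers the health path itself and hands every other request to the wrapped app. The class name, the '/health' default, and the 204 status are assumptions for illustration; demo_app and call are hypothetical helpers used only to exercise the wrapper.

```python
class HealthMiddleware(object):
    """Answer the health path locally; pass every other request through."""

    def __init__(self, app, path='/health'):
        self.app = app      # the wrapped WSGI application
        self.path = path    # assumed health path

    def __call__(self, environ, start_response):
        if environ.get('PATH_INFO') == self.path:
            # Empty body; the 204 status is an assumption of this sketch.
            start_response('204 No Content', [('Content-Length', '0')])
            return [b'']
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    """Hypothetical stand-in for the wrapped application."""
    body = b'queues'
    start_response('200 OK', [('Content-Length', str(len(body)))])
    return [body]


def call(app, path):
    """Invoke a WSGI app with a minimal environ; return (status, body)."""
    seen = {}

    def start_response(status, headers):
        seen['status'] = status

    chunks = app({'PATH_INFO': path, 'REQUEST_METHOD': 'GET'}, start_response)
    return seen['status'], b''.join(chunks)
```

Because the wrapper answers before the request reaches the wrapped app, a check like this only shows that the worker process is accepting requests; it says nothing about the storage backend.
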
kgriffs...which brings me to my second bug.15:48
kgriffshttps://bugs.launchpad.net/marconi/+bug/124326815:48
mpanettaAh15:49
alcabreracool - a deeper health check15:49
mpanettaYes, we need a deep check.15:49
kgriffsWhatever we do, I think that we need to check that we can talk to a real process and that process can talk to its storage backend15:49
zyuanhow deep? all shards?15:49
kgriffsthe check goes to a single web head15:50
kgriffsthis is for the LB, right?15:50
alcabrerazyuan has a point - a single webhead can access any shard.15:50
kgriffsis this also going to be used by the uwsgi router?15:50
mpanettayes15:50
mpanettawell LB, does not have to go through router.15:50
kgriffs(or localhost nginx balancer, whatever operators deploy in front of workers)15:51
mpanettaBut I think that was how we were going to expose the endpoint.15:51
kgriffsmpanetta: if the router supports checking worker health, that could be useful as well - esp. if it knows how to restart workers15:51
mpanettaBut yeah, single web head, this is not a health check as would necessarily be performed by a queues user, just the system.15:51
kgriffsor some kind of localhost daemon manager15:52
mpanettakgriffs: It actually seems to know how to restart pretty well, I killed some workers manually the other day and it happily restarted them.15:52
kgriffsok, but can it preemptively check for hung or 500's or something15:55
mpanettaNot that I am aware of15:55
kgriffsok15:57
kgriffsi guess the LB alerts when a node becomes unhealthy?15:58
kgriffsso someone can look at it??15:58
mpanettaIt drops the node from the rotation15:58
mpanettano alerts are sent the way we have the LB's configured at the moment.15:58
kgriffsah, that should be remedied. If something goes MIA for an extended period, someone needs to be notified15:59
mpanettaYes15:59
kgriffsaaaanyway16:02
kgriffsre sharding, that *is* a good point16:06
kgriffsSo, a deep check would need to verify it can communicate with all shards16:07
kgriffsso, the shard catalog would need a "health check" call or something16:07
kgriffsactually16:07
kgriffsif the storage driver interface includes a "health" method16:08
mpanettaShards, are the db distribution method?16:08
kgriffsthen we are cool, since the shard driver can just implement that as well16:08
zyuanif so, then "healthy" needs to be related to the sharding fallback algorithm16:08
zyuanwe are going to use16:08
mpanettaThat sounds perfect16:08
kgriffsmpanetta: app-level db sharding16:08
mpanettakgriffs: Ah ok16:09
alcabrerahmm...16:10
alcabrerait shouldn't be very difficult to extend the shard catalogue storage driver to check the health of all registered shards.16:10
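
One way to picture the idea: if every storage driver exposes a health method, a shard-aware driver can implement the same method by asking each registered shard. The names below (Storage, is_alive, ShardCatalogue, FakeShard) are hypothetical and do not reflect Marconi's actual storage interface.

```python
class Storage(object):
    """Hypothetical storage driver interface with a health method."""

    def is_alive(self):
        raise NotImplementedError


class ShardCatalogue(Storage):
    """Hypothetical shard-aware driver: healthy only if every shard is."""

    def __init__(self, shards):
        # shards maps a shard name to its Storage driver
        self._shards = dict(shards)

    def unhealthy_shards(self):
        failed = []
        for name, driver in sorted(self._shards.items()):
            try:
                ok = driver.is_alive()
            except Exception:
                # A driver that raises is treated as unhealthy.
                ok = False
            if not ok:
                failed.append(name)
        return failed

    def is_alive(self):
        return not self.unhealthy_shards()


class FakeShard(Storage):
    """Test double standing in for a real backend driver."""

    def __init__(self, alive):
        self._alive = alive

    def is_alive(self):
        return self._alive
```

Whether one dead shard should mark the whole web head unhealthy depends on the fallback policy, which is still open in the discussion; this sketch simply takes the strictest reading.
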
*** mpanetta is now known as mpanetta_lunch16:11
*** yassine has quit IRC16:20
*** jdaggett has joined #openstack-marconi16:50
*** jdaggett1 has quit IRC16:54
openstackgerritAlejandro Cabrera proposed a change to openstack/marconi: feat: shards mongodb driver + tests  https://review.openstack.org/5081516:57
*** whenry has quit IRC17:03
*** fvollero is now known as fvollero|gone17:04
zyuankgriffs: ping17:48
zyuani think it will be easier to add a noop() to storage interface18:00
zyuani think it needs a very short connection timeout...18:01
*** ametts has quit IRC18:01
*** JRow has joined #openstack-marconi18:19
*** JRow has quit IRC18:43
alcabrerakgriffs: ping18:54
zyuankgriffs: i have some questions18:54
zyuan...18:54
zyuanyou first18:54
alcabrera:D18:54
kgriffspong18:55
alcabreraw00t18:55
*** JRow has joined #openstack-marconi18:55
alcabreraso to be sure, every patch up to https://review.openstack.org/#/c/50815/ is rebased and ready.18:55
alcabreraI need to double check the single catalogue driver patch.18:55
alcabreraI'm almost done rebasing the transport + storage part of sharding.18:56
alcabrera*sharding admin18:56
*** mpanetta_lunch is now known as mpanetta18:57
kgriffsok, I will take a look at those shortly19:00
alcabrerakgriffs: thanks!19:00
zyuankgriffs: i noticed that the whole app only uses 1 database connection.  it might be a stupid question but.... why not 1 connection per client?19:01
kgriffsthe hole app, meaning "queues" app?19:03
kgriffswhole19:03
zyuanwhole19:03
zyuanyea19:03
kgriffsin the case of the mongodb driver, pymongo maintains its own connection pool iirc19:05
zyuanwsgi container can run N apps, but N database connections are served; multiple sessions currently share 1 connection.19:05
zyuankgriffs: it doesn't matter iiuc, that's only useful when you ask you multiple connections; for each app we ask for 1 connection19:06
zyuanfor* multiple...19:06
kgriffseach wsgi app is single-threaded, right?19:07
kgriffsi mean, each worker process19:07
zyuanit's allowed to be not19:07
zyuan^^ not very sure19:07
kgriffswell, you could try deploying Marconi using multithreaded workers19:08
kgriffsbut, I don't think anyone has tried it yet19:08
zyuanno, it's allowed to be not. because flask support multithread19:08
kgriffsI'm trying to think whether Falcon has any state that would blow up in a multithreaded environment19:09
zyuanso, you mean we are just fine to reuse the database connections?  can greenlet preempt between queries?19:09
kgriffspymongo is gevent-aware, not sure about eventlet19:09
kgriffsso, you can multiplex across a single client connection19:10
zyuanhmmm, ok19:10
alcabrerapymongo has issues with eventlet, iirc, because it runs on gevent19:10
zyuani know sqlite doesn't work if you have multiple threads accessing 1 connection...19:10
zyuani ask this because a traditional website only connects to the db when a request comes in19:11
zyuananyway.  then that means we do need a noop() DB access to test whether a shard is still alive19:12
kgriffstraditional setups also  use thread pools per connection.19:12
kgriffsbut usually that is less efficient than using an evented module combined with a set of single-threaded worker processes19:12
zyuankgriffs: that's fine; the purpose is to limit the total threads count, but still 1 thread 1 connection19:13
zyuankgriffs: i vaguely thing so19:13
zyuank*19:13
zyuanthe next question is about X-Auth-Whitelisted19:13
zyuani don't know what Marconi needs to do.19:14
zyuankeystone controlled every access19:14
zyuanif no request comes to marconi, there won't be an X-Auth-Whitelisted seen by any side19:15
zyuani looked at keystone's conf, it makes more sense to me if there is a configuration to exclude some uri...19:16
kgriffsyeah, I was thinking it would be great to patch python-keystoneclient middleware to support whitelisting, but...19:21
kgriffsit can be a real pain to get them to accept anything19:21
kgriffs(or so I've heard)19:21
zyuanbut X-Auth-Whitelisted also needs a keystone patch; i don't see what marconi can do...19:21
zyuan(unless we have two apps...)19:22
*** JRow has left #openstack-marconi19:28
kgriffsjust a sec19:28
kgriffsI have an idea19:28
zyuanbtw, pls review https://review.openstack.org/#/c/53127/ ; i finally got jenkins to accept it (by doing nothing)19:31
*** malini is now known as malini_afk19:32
kgriffshttps://docs.google.com/file/d/0BxZAkOZUwvFAWnU1aWJfdjJ6V1U/edit?usp=sharing19:34
zyuan??19:35
zyuancoooool19:36
zyuani uses dia19:36
zyuanthis is.... basically a dia19:36
kgriffshttp://oi39.tinypic.com/rrqb80.jpg19:37
alcabrerakgriffs: +1 - middleware saves the day19:37
zyuanput pipeline before keystone?19:38
kgriffsso, that is one option.19:38
kgriffsi am thinking we could have a generic wsgi pipeline constructor19:38
kgriffsdownside is this won't work for non-HTTP transport19:39
kgriffsbut I guess that is something to discuss later19:39
kgriffsalthough19:39
kgriffswe could just translate ZMQ messages to WSGI calls. :p19:39
kgriffsaaaanyway19:40
kgriffscross that bridge when we come to it19:40
alcabrerathat'd be a chore. :P19:40
alcabrerayup19:40
kgriffssince we don't know yet how auth will work period in that case, anyway19:40
alcabreragood point - keystone zmq auth plugin.19:40
alcabrera>.>19:40
mpanettaHow would whitelisting work?19:49
kgriffsso, my current thinking is we have a wsgi pipeline app that is configurable via json19:49
mpanettaI guess what I mean is, how does the system determine who/what is whitelisted?  Are only local connections allowed to be whitelisted?19:50
kgriffsyou give it a list of apps and it loads them with stevedore19:50
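
A rough sketch of such a pipeline constructor, under stated assumptions: the JSON layout ({"pipeline": [...]}) is invented for illustration, and a plain dict of factories stands in for the stevedore lookup so the example stays self-contained. tagging_middleware and terminal_app are hypothetical helpers that only record call order.

```python
import json


def build_pipeline(config_text, registry, app):
    """Wrap app with the middleware named in config, outermost first."""
    names = json.loads(config_text)['pipeline']
    # Wrap from the inside out so the first name in the list runs first.
    for name in reversed(names):
        app = registry[name](app)
    return app


def tagging_middleware(tag):
    """Return a middleware factory that records its tag when called."""
    def factory(app):
        def middleware(environ, start_response):
            environ.setdefault('demo.trace', []).append(tag)
            return app(environ, start_response)
        return middleware
    return factory


def terminal_app(environ, start_response):
    """Hypothetical stand-in for the application at the end of the chain."""
    start_response('200 OK', [])
    return [b'']


# The dict below is a stand-in for a plugin lookup, not stevedore itself.
REGISTRY = {name: tagging_middleware(name)
            for name in ('auth', 'rbac', 'health')}
CONFIG = '{"pipeline": ["health", "auth", "rbac"]}'
```

With this shape, reordering the stack is a config change: moving "health" ahead of "auth" in the list is enough to let it see requests before auth does.
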
kgriffsum19:50
kgriffsit can be whatever you like, I suppose19:50
mpanettaAh ok it is app level whitelisting19:50
kgriffswhitelist based on URL and/or ip address or something19:50
kgriffsbut really, can't you block the auth url at the LB from outside users hitting it?19:51
kgriffs(blacklist)19:51
kgriffsthat seems more reliable than the app trying to determine whether the caller is an admin or load balancer or something19:52
mpanettathe LB is quite dumb19:54
mpanettaIt does not allow URL filtering19:54
mpanettaI think the idea was the admin endpoint would not go through the LB and would only be internally accessible.19:54
kgriffshmm19:55
mpanettaI don't think the app should know anything, tis why I thought a separate app for health check would be good.19:55
mpanettaThat app would be responsible for auth.19:55
kgriffsbut then you are only checking whether the health app is "healthy", not the app itself19:55
kgriffswait19:56
kgriffsI think I see where you're going19:56
mpanettaNo cause the health app would do a queue post, and etc19:56
mpanettaif the queue post fails we return a 5xx19:56
mpanettaI would assume if the storage backend is bad on the node the queue post would fail.19:56
mpanettaIs that a poor assumption?19:57
zyuani'm implementing an alive() method for storage (which calls mongoclient's alive())19:57
zyuani think one app is enough, since if the app itself is down, you won't get a 200 response anyway19:58
kgriffsi think we should actually try to post something - that would be a deeper test, would it not?19:58
kgriffsand it isn't like this is going to be gazillions of pings per second or anything19:59
zyuanunless the db is physically broken, i don't see a need of something other than a no-op19:59
mpanettaNope, should be no more than a few a minute.19:59
kgriffsmpanetta: thing is, if you have a separate app, how does it talk to the localhost "real" app without going through auth? It would have to have its own valid cloud creds19:59
kgriffszyuan: there may be a failing disk or something that alive() isn't going to check20:00
kgriffsremember that this needs to be storage agnostic20:00
zyuandb connection can break of course20:00
zyuanfailing disk can not be checked with a queue posting or something20:01
zyuanbecause usually these requests don't go to disk20:01
zyuanyou need disk monitor services20:01
mpanettaDon't care about that20:01
mpanettawe only care if the queue service will respond20:02
zyuan"health", to me, only means "connection is good"20:02
zyuanyea20:02
mpanettaand by respond, I mean allow posting to a queue.20:02
mpanettaBeyond that, we don't worry about, disk failure is out of scope.20:02
zyuanafaic [posting to a queue] is just an implementation detail20:03
mpanettayes20:03
zyuanwhat mongo's alive() does is to select on connections20:03
kgriffshttp://d3j5vwomefv46c.cloudfront.net/photos/large/816932810.png?138247216320:03
zyuani believe that's enough20:03
kgriffszyuan: It's up to the devops guys, what they want20:04
kgriffsmpanetta, oz_akan_: how deep of a check do you need?20:04
mpanettaTechnically all we care about is that we can post to a queue, that is enough proof to me that the server is alive.20:04
mpanettaThe app (or whatever it ends up being) will only return a simple status code.20:05
kgriffsok, so the nice thing about that is we don't have to implement it differently for each storage driver20:05
mpanettaI think oz_akan_ has something else in mind for zenoss, but for the LB all we need is a go/nogo20:05
*** vkmc has quit IRC20:05
kgriffswe could just attempt calling the driver's post() method20:06
kgriffsI guess we would have to ensure the queue is created first too. :[20:07
zyuanit's create()20:07
kgriffsthe sharding driver would just call that for each driver under its control. I guess it is trickier for when sharding is not enabled. We currently go directly to the storage driver's method20:08
kgriffsah20:08
kgriffswe could just have a default implementation in the base class that does the check to try and post a message.20:08
kgriffsthe sharding driver would be the only one that would need to override it20:08
kgriffszyuan: ?20:09
zyuani feel uncomfortable about including a testing method in API20:09
mpanettakgriffs: Yeah of course, but that should be simple logic.20:09
zyuani'm in favor of a proof of working method...20:09
kgriffszyuan: messages controller defines post for messages20:09
mpanetta(The queue existing part)20:09
zyuani mean queues20:09
zyuanthey want queues20:09
kgriffsmpanetta: I thought you said post a message, not create a queue for the test?20:10
zyuanso , you see, different people want to test different parts. they show the evidence, but not proof20:10
mpanettakgriffs: Well, the queue has to exist to post to it, no? ;)20:10
kgriffsyeah20:10
mpanettaQueue creation should only occur once.20:11
mpanettaThis is why I am kind of for just having a health check client, it removes health check from the scope of marconi.20:11
kgriffsif the test is create and delete a queue then we would have to choose unique names each time for the queue20:11
kgriffsjust something to consider20:11
mpanettaDifferent people may have different ideas of health check20:11
kgriffsmpanetta: I'm still not convinced that is a real health check20:11
zyuanif you want test, then just do so20:11
zyuanand test auth at the same time20:12
mpanettakgriffs: Me either20:12
kgriffsif you aren't going through the workers serving the actual requests, then you can't know they are healthy20:12
zyuani don't see a reason why health means test20:12
mpanettakgriffs: Yes, we would only be hitting a single worker.20:12
kgriffszyuan: health check is for LB so it knows whether or not to stop sending traffic to a node20:12
mpanettakgriffs: Although worker health is a uwsgi issue, and it seems to handle that well.20:12
kgriffsimo, we should simulate the user as closely as possible for a health check20:13
mpanettaAt least with the very simple, "Kill some random workers and see if they respawn" test...20:13
zyuankgriffs: then health internally ping each shard, what's the problem?20:13
mpanettakgriffs: I agree20:13
kgriffswe aren't just testing the storage, we are testing the uwsgi20:13
kgriffss/testing/health-checking20:13
zyuanif you can20:13
zyuant get response from an endpoint, obviously your wsgi down20:14
mpanettaIt is basically an end-to-end check, but don't think of it so complex, it really is just a simple (is the system useable?) test I think.20:14
kgriffszyuan: that's the thing. The LB can't ping the endpoint using an Auth token - it isn't smart enough20:14
mpanettaYes, this LB is quite simple...20:15
kgriffsso we are trying to find a way to ping a real endpoint just like a user except without auth20:15
zyuankgriffs: open /health, that's what we're trying to do, don't it?20:15
zyuandoesn't it?20:15
kgriffsyeah, but that is behind auth right now. We could circumvent that within Marconi itself, but then anyone building up their own wsgi pipeline still has the problem.20:16
mpanettabut I thought just opening /health was basically a noop anyay?20:16
kgriffsso, I was trying to come up with a generic solution. Lets us ping real workers without Auth20:16
mpanetta*anyway20:16
kgriffsit is now20:16
kgriffsthat's why I created two bugs20:17
mpanettaAh, but it does not have to be?20:17
mpanettaOk20:17
kgriffsfirst step is to just fix the auth issue20:17
mpanettaI see now.20:17
mpanettadoes health have to be authed?20:17
kgriffssecond step is to do a deeper health check20:17
*** jdaggett has quit IRC20:17
mpanettaI guess we are worried about a DOS attack in that case...20:17
kgriffsmpanetta: I don't see why it does, although you may want to prevent end users from hitting it unless you know that it will be rate limited20:17
kgriffsor you know it returns sensitive internal data20:18
mpanettakgriffs: Ok, yeah that was my only concern.20:18
kgriffsright now it just returns an empty body20:18
mpanettaIt probably should stay that way...20:18
kgriffsthing is, if rate limiting depends on knowing the project ID, you won't have that unless you auth. :p20:19
mpanettaThere is no default rate limit?20:19
mpanettaA very low default rate limit would be ok...20:19
mpanettaPerhaps20:19
kgriffsyou have to have some way to scope/bucket the counters20:19
mpanettaAh, you do not have an 'unknown' or 'everything else' bin? ;)20:19
alcabreraGuys, I'm heading home. I'll be back online in a bit to finish rebasing the last of the patches. I've hit an annoying unit test issue where bootstrap.storage is *always* returning faulty_driver, so that's been slowing me down. :P20:20
kgriffswell, then someone could do a DDoS on us20:20
mpanettaEither way, I can still foresee an issue...20:20
alcabreraSee y'all in a bit.20:20
kgriffsjust flood us with health checks so the LB can't get in20:20
*** alcabrera has quit IRC20:20
mpanettakgriffs: That was my concern20:20
kgriffshmmm20:21
kgriffsI just realized my last diagram is bogus20:21
kgriffsif it's whitelisted, it is whitelisted for everybody20:21
kgriffsso, rbac check doesn't help20:21
mpanettaCrap, so back to the DDOS...20:21
mpanettaSeems like even more a reason for health check to be external...20:24
mpanettaAt least that way we could expose the end point to only a short list of IP's, the ones specific to the LB's.20:24
zyuani don't think so20:24
zyuanif you can DDOS health, you can DDOS auth as well20:25
mpanettaAuth requests are cached though aren't they?20:25
zyuanso you can't DDOS health; it currently does nothing and i'm trying to prevent it from doing too much20:26
mpanettaProblem is, doing nothing isn't very useful ;)20:27
zyuanif it proves the app is working, it's enough20:27
kgriffshttp://d3j5vwomefv46c.cloudfront.net/photos/large/816934453.png?138247358520:27
zyuanif you want to test, then add a special account for testing20:27
kgriffshow about that?20:27
kgriffsoh crap20:28
mpanettaBasically it looks to me like we have 2 paths, we have a very simple health check, that just says that marconi is listening, and is fast to reply, or we have something more complex, like check a queue which exercises the backend as well.20:28
kgriffsneeds to go through worker20:28
kgriffsblah20:28
kgriffsjust a moment20:28
kgriffscan LB hit a different port on the node to do health check?20:28
zyuanmpanetta: if you want that, open an account and post it in client20:28
mpanettano :(20:28
mpanettaLB is super dumb20:28
zyuanit's just correct that LB is dumb20:29
mpanettazyuan: That is what I am saying, maybe we should do that in a separate app?20:29
zyuanwhere you need that app?20:30
mpanettaon the same box as the web head20:30
zyuanbehind LB?20:30
zyuanif so, i don't agree20:30
mpanettathe request would have to be routed appropriately20:30
mpanettaYes behind the LB20:30
mpanettaIt would have to be, since the LB is running the check.20:30
zyuani don't think so, LB should not have a way to tell the app logic20:31
kgriffsi'm tired of uploading the pic20:31
kgriffshttps://docs.google.com/file/d/0BxZAkOZUwvFAWnU1aWJfdjJ6V1U/edit?usp=sharing20:31
kgriffsjust go there and open in draw.io20:31
kgriffs:p20:31
mpanettazyuan: Exactly20:31
mpanettaLB is dumb, will always be dumb.20:31
zyuanthen why do you want it to be behind LB?20:31
kgriffsso, that latest revision to the drawing would be nice if the LB were smart enough to go to a different port on the box20:31
zyuanbehind LB means LB can access this app20:31
kgriffswaaaait20:31
mpanettazyuan: to you, what is behind LB?20:31
kgriffscan't you make your router smart enough to go through the bastion for just /health ?20:32
kgriffsi mean, nginx can do stuff like that20:32
mpanettaTo me behind LB means on marconi side of LB, not client side.20:32
zyuanrequest -> LB -> here -> marconi20:32
mpanettaOk, yes it has to be where here is.20:32
zyuanif request -> LB -> health -> marconi20:32
mpanettaLB has to be able to access20:32
mpanettaYes that is exactly.20:32
zyuanthen LB can tell marconi's logic by accessing health20:32
mpanettaHow?20:32
zyuani don't agree with this.20:32
mpanettaAll the health endpoint does is return go/nogo20:33
zyuanbecause you want this health do real posting20:33
mpanettaInternally20:33
mpanettaAll the LB will see is 200 or 50020:33
kgriffshttps://docs.google.com/file/d/0BxZAkOZUwvFAWnU1aWJfdjJ6V1U/edit?usp=sharing20:33
zyuanreal posting is NG here to me.20:33
kgriffsi saved - not sure if it automagically updates for you guys20:33
zyuanrequest -> LB -> marconi20:33
zyuanand LB can also get 200 or xxx20:34
zyuan"is marconi alive" should be the information known by LB20:34
mpanettakgriffs: taking a look now.20:34
*** ayoung has quit IRC20:34
zyuannot "is marconi doing the right thing"20:34
mpanettazyuan: But, isn't alive "doing the right thing"?20:34
zyuanno20:34
zyuanalive means, physically good20:35
zyuanright means, logically good20:35
mpanettaBecause if it isn't 'doing the right thing', then the LB needs to drop it, else clients will see the issue.20:35
kgriffsby "alive" I should think we mean "the node can accept requests from users without 500's"20:35
mpanettaYes20:35
zyuanLB must not drop nodes because they are logically wrong.  LB should be dumb and not know what "logic" means20:35
kgriffs500's or layer 3 errors20:36
kgriffsnetwork link between LB and web heads is already handled by the LB20:36
mpanettazyuan: Why not?  It does for other services.20:36
kgriffswe need to just detect internal app health problems20:36
mpanettazyuan: If a server returns a 5xx, LB will drop.20:36
kgriffsso we can send the user somewhere else before they know anything is wrong20:36
zyuanthere are many cases a server returns 5xx20:36
zyuanand some of them i don't think LB should understand20:37
kgriffsthe health ping is just for checking health then in advance of user requests?20:37
zyuanso far yes, and what i20:37
kgriffsi mean, if the LB already watches for 500's then is the health check needed?20:37
mpanettakgriffs: Specifically it means can this web head serve user requests.20:37
zyuan'm trying to add is to check db connection as well20:38
zyuanmpanetta: so, you see20:38
mpanettakgriffs: If the health endpoint returns 5xx when something is broken then yeah.20:38
zyuanthere is a big gap between "can serve user requests" and "can create queue/messages"20:38
kgriffsmpanetta: no, i mean, if in the course of a user request, a 500 comes back, does LB take the node out of rotation?20:39
kgriffs(not a request to health)20:39
mpanettaBasically I point the LB to a specific endpoint and LB pings the endpoint checking for a response.  No response, LB drops, bad (5xx) LB drops.20:39
mpanettakgriffs: No, it won't see it.20:39
kgriffsif so, and user requests happen frequently enough, seems like you wouldn't need the health check20:39
mpanettaI wish it did that ;)20:39
kgriffsme too!20:39
mpanettaBut no, it only checks the endpoint we tell it to.20:39
kgriffsok, so what do you think about that latest design?20:40
mpanettaIt won't load :(20:40
kgriffsbah20:40
kgriffsstand by20:40
kgriffsg+20:40
zyuanlast word: if LB can tell whether user can create a queue or not, it's no longer a "load" balancer; it's a "marconi" balancer20:41
kgriffsmpanetta755 ?20:41
mpanettaTechnically the LB doesn't know squat.20:41
mpanettakgriffs: panetta.mike20:41
mpanettaAll LB knows is the endpoint we told it returns 500 or not20:41
mpanettaThe one issue that this has to avoid, is that the end user can contact the queue server and not be able to use the service because one of the web heads is malfunctioning, but we can't detect it.20:43
kgriffshttps://plus.google.com/hangouts/_/a260606dab37d907e09607e4dd107f51dd294bd5?hl=en20:43
mpanettaUltimately that is all we care about, if you want to get down to it.20:44
mpanettakgriffs: That image is closer to what I was thinking.20:46
mpanettaexcept the bastion app will reside on the same server as the worker(s) it is responsible for.20:46
mpanettaMainly because it has to... heh20:46
kgriffsyeah20:47
kgriffsso that big box is all localhost20:47
mpanettaOk that looks correct.20:47
*** jdaggett1 has joined #openstack-marconi20:48
kgriffscan the router be made to do that?20:48
mpanettaI believe so yes.20:48
kgriffsthe alternative I guess is to have a wsgi app that does it20:48
kgriffsthen it wouldn't depend on the router - work with any router20:48
mpanettaYes20:49
mpanettaIn our case the router is uwsgi heh20:49
mpanettaSo it would be a uwsgi app20:49
*** alcabrera has joined #openstack-marconi20:50
* alcabrera catches up20:50
kgriffsarg20:51
kgriffsjust thought of something20:51
kgriffsnevermind20:51
kgriffsrbac to the rescue20:51
mpanettaalcabrera: if you have the link, kgriffs has pretty pictures in the hangout :)20:52
kgriffshttps://plus.google.com/hangouts/_/a260606dab37d907e09607e4dd107f51dd294bd5?hl=en20:52
mpanettaMaybe I can convince oz_akan_ to come here too heh20:53
alcabreraI'll join up in a moment. :)20:54
* alcabrera is caught up20:54
kgriffsso, the idea is that we would have that health bastion middleware20:54
kgriffsyou configure it with an account which has a specific role that you can key off in RBAC middleware or oslo.policy20:55
*** vkmc has joined #openstack-marconi20:55
kgriffsso, for everything BUT /health, the bastion is pass-through20:55
kgriffsfor /health, it injects an X-Auth-Token20:55
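
The behaviour described here can be sketched as a small WSGI wrapper. Everything named below is hypothetical: HealthBastion is not an existing component, and token_provider stands in for however the configured service account would obtain a token.

```python
class HealthBastion(object):
    """Pass-through for everything except the health path (sketch only)."""

    def __init__(self, app, token_provider, path='/health'):
        self.app = app
        # token_provider is an assumed callable returning a token for the
        # specially-roled account the bastion is configured with.
        self.token_provider = token_provider
        self.path = path

    def __call__(self, environ, start_response):
        if environ.get('PATH_INFO') == self.path:
            # WSGI exposes the X-Auth-Token request header under this key.
            environ['HTTP_X_AUTH_TOKEN'] = self.token_provider()
        # All other requests reach the next layer untouched.
        return self.app(environ, start_response)
```

The bastion would sit ahead of auth in the pipeline, so the health request arrives at auth already carrying a token and RBAC can key off the account's role.
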
mpanettaHmm20:56
kgriffsthat way, we don't have random users DDoS'n us20:56
mpanettaAn optimization would be to make uwsgi only forward requests to /health to the bastion.20:56
kgriffs(since rate limiting is keyed off the tenant/project ID)20:56
mpanettaAh ok20:56
*** jdaggett1 has quit IRC20:57
kgriffsperformance-wise the passthrough should be almost as fast as the router20:57
kgriffsbut yeah, we could make it an external app20:57
kgriffsI was just thinking, putting the bastion into the pipeline means it would work with any kind of router/rproxy you put in front20:58
mpanettaHmm, ok.20:58
mpanettaOr, the health check 'app', could be the first example app for the client lib...  If you want others to be able to reuse it.20:59
mpanettaI donno.21:00
kgriffshmm21:00
mpanettaI really should draw up a low level diagram of how we are doing things now.21:00
mpanettaBasically each web head has 8 uwsgi instances running marconi, and a router uwsgi instance21:01
mpanettaThat selects the marconi instances based on some load balancing logic.21:01
kgriffscrap crap crap21:01
mpanetta?21:01
alcabrera?21:01
kgriffsany user can still hit /health unless somehow health bastion checks who is calling21:01
alcabrerayup21:02
mpanettayeah21:02
mpanettaThat is the biggest issue.21:02
mpanettaAnd I *think* we can handle it in the router, uwsgi has allow/deny rules I believe.21:02
alcabrerashort of having a magic key or something like that for the health bastion configuration, any user could hit that endpoint.21:02
mpanettaWe would have to whitelist the LB ips.21:02
kgriffslet's assume LB can only talk to the web head via internal network21:03
oz_akan_we can have an interesting url that none would know21:03
kgriffsmeaning, a user can't hit the web head directly21:03
oz_akan_/healthsosososdjj3334343421:03
kgriffsoh, nevermind21:03
mpanettasecurity through obscurity is a nono...21:03
kgriffslol21:03
mpanettaThat is true, LB can only talk to web head internally.21:04
*** malini_afk is now known as malini21:04
kgriffsbut it is bad mojo to put secrets in URL21:04
mpanettayes21:04
oz_akan_it is health check after all21:04
mpanettaYeah, but we are worried about health check becoming the achilles heel I think.21:05
kgriffsso, normally we don't care if a user hits /health21:05
kgriffshowever, the LB needs to hit it without auth21:05
kgriffshmmm21:06
kgriffsso we inject auth using a bastion21:06
kgriffsbut then rate limiting goes into a single bucket21:06
kgriffsopening us up for DDoS21:06
kgriffsunless21:06
kgriffsblah21:07
kgriffsnevermind21:07
alcabrerasuggestion - why not: rate limit -> bastion -> auth -> rbac -> app?21:07
kgriffsdoes LB do X-Forwarded-For?21:07
alcabreraDDoS protection by rate limiting?21:07
oz_akan_I thought we had a very simple solution already :)21:07
kgriffsrate limit depends on knowing project ID21:07
alcabrerahmmm21:08
alcabrerathat's fine :)21:08
kgriffsand we don't know that before auth21:08
oz_akan_I mean Mike's solution21:08
alcabreraWe fake the project ID21:08
alcabreraEveryone hitting health has the same project ID21:08
alcabreraHow about that?21:08
kgriffsthat is the problem21:08
kgriffswe don't want that21:08
mpanettaoz_akan_: I think my solution turns out to be complicated when we worry about DDOS.21:09
kgriffsallows non-admin users to mess with the limit counter21:09
kgriffsmpanetta: how about that X-Forwarded-For header?21:09
mpanettakgriffs: I don't know.21:09
openstackgerritDirk Mueller proposed a change to openstack/marconi: Start 2014.1 development  https://review.openstack.org/5321921:10
mpanettaI can't set headers in health check, but maybe the absence of them would be ok.21:10
kgriffsMy thinking was, if that header IS NOT present, we can assume it is the LB itself making the request to /health21:10
mpanettaAh good thought.21:10
alcabreracool - that sounds like it would work.21:11
mpanettaLet me read LB docs to see if it sets that header.21:11
kgriffsthen and only then would the bastion inject the auth. Alternatively, it could skip auth middleware21:12
alcabreraI'm passing on the pretty pictures. I'm pretty heads down on getting this patch rebased. :P21:12
mpanettaalcabrera: Ok :)21:12
kgriffsso, related thought21:13
kgriffsit would be nice if instead of hard-coding keystone auth into the marconi WSGI app...21:14
kgriffswe had a generic notion of a WSGI pipeline21:14
kgriffsso operators don't have to write their own app.py to use other middleware21:14
kgriffsthen we can have a solution to the auth issue that works for everybody21:15
kgriffsthe issue is that sometimes you want to put stuff *after* keystone auth21:15
kgriffsbut the current "auth strategy" approach only allows you to put stuff *before* auth21:15
mpanettaAh yes21:16
kgriffsso, you are forced to not use WSGI transport's auth support if you want to do that21:16
kgriffsand it leads to every operator having to reinvent app.py ad nauseam21:17
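The "generic notion of a WSGI pipeline" kgriffs describes could look roughly like the sketch below: an ordered list of middleware factories wrapped around the app, so an operator can place extra middleware before or after auth without writing a custom app.py. This is only an illustration; build_pipeline, tag and the 'demo.trace' key are hypothetical names, not Marconi or EOM code, and a real deployment would presumably load the factory list from configuration.

```python
# Hypothetical sketch of a configurable WSGI pipeline; none of these
# names come from Marconi or EOM.

def build_pipeline(app, middleware_factories):
    """Wrap app so the first factory listed sees each request first."""
    for factory in reversed(middleware_factories):
        app = factory(app)
    return app


def tag(name):
    """Return a factory for a toy middleware that records its name."""
    def factory(app):
        def middleware(environ, start_response):
            environ.setdefault('demo.trace', []).append(name)
            return app(environ, start_response)
        return middleware
    return factory


def demo_app(environ, start_response):
    """Toy app that reports the order in which middleware ran."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [','.join(environ.get('demo.trace', [])).encode('utf-8')]


def run(app):
    """Call a WSGI app with a minimal environ and return the body."""
    environ = {'REQUEST_METHOD': 'GET', 'PATH_INFO': '/'}
    return b''.join(app(environ, lambda status, headers: None))


# 'auth' runs first, so 'after_auth' can rely on whatever auth added.
pipeline = build_pipeline(demo_app, [tag('auth'), tag('after_auth')])
```

Because ordering is just list order, putting something after keystone auth becomes a config change rather than a new app.py.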
mpanettaBah, the docs seem lacking wrt set headers from LB21:17
kgriffscan you test21:17
mpanettaHmm21:17
kgriffsi mean, capture request headers with a test request through the LB?21:17
mpanettayeah...21:17
mpanettasec21:19
mpanettaI need to install tcpdump...21:20
mpanettaWell, the user agent is set to an interesting value...21:22
mpanettaYes, it sets X-Forwarded-For21:24
mpanettakgriffs: ^^21:25
kgriffsok, does the health check set that header as well?21:25
kgriffs(not sure what it would set it to!)21:25
mpanettaAnd X-Forwarded-Port as well...21:25
mpanettaIt does not set it for health check21:25
mpanettaOnly header that is set for that is agent id21:25
kgriffsw00t21:26
kgriffsFAN-TAST-IC21:26
mpanettaYep21:26
mpanettaStress level going down ;)21:26
kgriffsok, so only thing left to decide is whether to run a separate app and have a router rule or just stick it in the wsgi pipeline21:27
mpanettahttps://gist.github.com/anonymous/ff85fea856d008f312c521:28
kgriffsfrom my perspective, deploying an extra app on the box seems more complicated, but i could be wrong21:28
mpanettaCheck that for available headers21:28
kgriffsthanks!21:29
mpanettaEh, from my pov it is just configuring another uwsgi instance, so not much more difficult.21:29
mpanettaEither way, how it is done does not matter as much to me ;)21:29
kgriffsok21:29
kgriffshmm21:30
kgriffsseparate app would also require configuration for the loopback21:30
mpanettatrue21:30
*** jdaggett1 has joined #openstack-marconi21:31
kgriffsi'm thinking the wsgi middleware (worker option B) would be better and has a nice property of working with any kind of router21:31
kgriffsgunicorn, uwsgi, nginx, whatever21:31
mpanettaOk21:31
kgriffswithout having to use different configs, or maybe the router is too dumb anyway21:31
mpanettaI don't know about other implementations, but uwsgi is extremely powerful.21:31
mpanettaTo the point of confusion in some cases it seems...21:31
kgriffsindeed21:32
kgriffsready to write your first EOM contribution?21:33
kgriffs;)21:33
mpanettahehe sure lol21:34
kgriffsso, we need a thing that you can configure with a URI21:34
kgriffsif that URI matches, AND X-Forwarded-For is NOT present, then it should inject X-Auth-Token21:35
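The rule kgriffs states here (configured URI matches AND X-Forwarded-For is absent, then inject X-Auth-Token) can be sketched as a small WSGI middleware. This is a minimal illustration under assumptions, not the EOM implementation: HealthBastion, health_uri and auth_token are hypothetical names, and how the token is obtained from the configured account is left out.

```python
# Hypothetical sketch of the "health bastion" rule discussed above.
# Class and parameter names are illustrative, not taken from EOM.

class HealthBastion(object):
    """Pass-through middleware that authenticates LB health checks."""

    def __init__(self, app, health_uri, auth_token):
        self.app = app
        self.health_uri = health_uri
        self.auth_token = auth_token  # assumed to be obtained elsewhere

    def __call__(self, environ, start_response):
        is_health = environ.get('PATH_INFO') == self.health_uri
        # WSGI exposes the X-Forwarded-For header under this key.
        forwarded = 'HTTP_X_FORWARDED_FOR' in environ
        if is_health and not forwarded:
            # Only the LB's own health check arrives without the header.
            environ['HTTP_X_AUTH_TOKEN'] = self.auth_token
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    """Toy downstream app that echoes the token it received."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [environ.get('HTTP_X_AUTH_TOKEN', '').encode('utf-8')]


def call(app, path, headers=None):
    """Invoke a WSGI app with a minimal environ and return the body."""
    environ = {'REQUEST_METHOD': 'GET', 'PATH_INFO': path}
    environ.update(headers or {})
    return b''.join(app(environ, lambda status, hdrs: None))


wrapped = HealthBastion(demo_app, '/health', 'secret-token')
```

The check relies on two points from the discussion: the LB sets X-Forwarded-For on user traffic but not on its own health checks, and web heads are only reachable through the LB on the internal network.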
mpanettaHopefully oz_akan_ will grant me the time to do such a wonderful thing ;)21:35
kgriffsguess we should make an issue21:35
kgriffshold on21:35
* mpanetta holds21:35
mpanettaafk21:36
*** mpanetta is now known as mpanetta_afk21:36
oz_akan_I have to leave now, I will try to understand tomorrow why this needs to be a part of EOM and why mpanetta needs to invest time on this21:36
kgriffsheh21:37
oz_akan_even if it is eom, it has to have a token first.. I didn't follow the thread.. anyway.. talk to you tomorrow21:37
kgriffshe doesn't necessarily, but someone does21:37
kgriffsyeah, you have to configure it with account creds21:37
oz_akan_ok, let's catch up tomorrow, bye for now21:37
*** oz_akan_ has quit IRC21:37
kgriffsttfn21:37
alcabrerasetattr - it's what's been biting me all this time. Something about TestBaseFaulty makes it so that *all* tests were using the FaultyStorage driver. :/21:40
alcabreraStill digging into this21:40
alcabreraI noticed that changing the name of bootstrap.storage to bootstrap.kab was fixing all tests except for those involving the Faulty Storage driver.21:41
zyuanalcabrera: so you want some tests to use the Faulty driver or something?21:43
alcabreranah21:43
alcabreraThere's something weird going on that the existing FaultyTest that's affecting the rest of the tests.21:44
alcabrera**on with the ...21:44
mpanetta_afkalcabrera: Your brain is faster than your fingers :)21:49
alcabrerampanetta_afk: yup. :P21:50
*** mpanetta_afk is now known as mpanetta21:50
mpanettaI have that problem at times, it results in sentences missing bits of thought lol21:51
mpanettaAnyway, it is go home time for me.21:52
*** mpanetta has quit IRC21:53
alcabreracuriously, if I remove the setattrs from the FaultyDriver test setup, all tests now pass.21:53
zyuan...21:54
alcabrera*including* the faulty driver tests.21:54
zyuan!!!21:54
openstackzyuan: Error: "!!" is not a valid command.21:54
*** ayoung has joined #openstack-marconi21:57
zyuani want to help if you can't solve it by tomorrow21:57
*** malini is now known as malini_afk22:05
openstackgerritAlejandro Cabrera proposed a change to openstack/marconi: feat: integrate shard storage with transport  https://review.openstack.org/5099822:05
alcabrerasolved22:06
alcabreraI decided to promote 'faulty' to a setup.cfg entry point.22:06
alcabreraThen everything works without setattr magic22:06
alcabrerazyuan: ^22:06
alcabrerazyuan: thanks for the offer to help, though. :)22:06
*** tedross has quit IRC22:07
alcabrerazyuan, kgriffs: all patches in the admin api feature branch are rebased and ready for review.22:09
alcabreraI'm double-checking the queues' catalogue storage driver now22:09
alcabrera(separate branch)22:09
kgriffsnice work22:10
kgriffsI will check it out22:10
alcabrerathanks!22:10
kgriffsfyi, eom issue for that auth thingy22:12
kgriffshttps://github.com/racker/eom/issues/722:12
alcabrerakgriffs: cool - I'll tackle that one tomorrow morning. I need a change of pace. Waaaay too much rebasing lately. :P22:13
kgriffsok22:13
kgriffslet's sync up with the devops guys in the morning to finalize the design22:14
alcabrerasure thing22:14
alcabreraI'm our for the night. There's some pork chops just waiting to be cooked.22:15
alcabrerao/22:15
alcabrera*out22:15
*** alcabrera has left #openstack-marconi22:15
*** jdaggett1 has quit IRC22:16
*** jdaggett1 has joined #openstack-marconi22:17
*** jdaggett1 has left #openstack-marconi22:17
*** amitgandhi has quit IRC22:18
*** oz_akan_ has joined #openstack-marconi22:44
*** oz_akan_ has quit IRC22:49
*** reed has quit IRC23:15
*** reed has joined #openstack-marconi23:16
*** jcru has quit IRC23:25
*** malini_afk is now known as malini23:34
*** kgriffs is now known as kgriffs_afk23:37
*** amitgandhi has joined #openstack-marconi23:53
