*** tongl has joined #openstack-lbaas | 00:49 | |
*** mugsie has quit IRC | 00:51 | |
*** fnaval has quit IRC | 00:56 | |
*** yamamoto has quit IRC | 01:03 | |
*** yamamoto_ has joined #openstack-lbaas | 01:03 | |
*** leyal has quit IRC | 01:31 | |
*** leyal has joined #openstack-lbaas | 01:33 | |
*** bzhao has joined #openstack-lbaas | 01:39 | |
*** bbzhao has joined #openstack-lbaas | 01:48 | |
*** leitan has joined #openstack-lbaas | 02:06 | |
openstackgerrit | Lingxian Kong proposed openstack/octavia-tempest-plugin master: [WIP] Create scenario tests for listeners https://review.openstack.org/492311 | 02:14 |
*** SumitNaiksatam has joined #openstack-lbaas | 02:18 | |
*** aojea has joined #openstack-lbaas | 03:17 | |
*** aojea has quit IRC | 03:22 | |
openstackgerrit | Lingxian Kong proposed openstack/octavia-tempest-plugin master: [WIP] Create scenario tests for listeners https://review.openstack.org/492311 | 03:28 |
*** Yipei has joined #openstack-lbaas | 03:46 | |
*** ianychoi_ has joined #openstack-lbaas | 04:20 | |
*** ianychoi has quit IRC | 04:29 | |
*** m-greene_ has quit IRC | 04:32 | |
*** m-greene_ has joined #openstack-lbaas | 04:35 | |
*** sanfern has joined #openstack-lbaas | 04:38 | |
*** belharar has joined #openstack-lbaas | 04:40 | |
*** armax has joined #openstack-lbaas | 04:42 | |
*** Alex_Staf has joined #openstack-lbaas | 04:54 | |
*** leitan has quit IRC | 05:11 | |
openstackgerrit | Rajat Sharma proposed openstack/octavia master: Replace 'manager' with 'os_primary' and 'os_adm' with 'os_admin' https://review.openstack.org/478399 | 05:22 |
*** aojea has joined #openstack-lbaas | 05:42 | |
*** gcheresh_ has joined #openstack-lbaas | 05:44 | |
*** ltomasbo has quit IRC | 05:49 | |
openstackgerrit | Pradeep Kumar Singh proposed openstack/octavia master: Add flavor, flavor_profile table and their APIs https://review.openstack.org/486499 | 06:12 |
*** armax has quit IRC | 06:19 | |
*** Alex_Staf has quit IRC | 06:20 | |
*** ltomasbo has joined #openstack-lbaas | 06:21 | |
openstackgerrit | Bar RH proposed openstack/octavia master: [WIP] Assign bind_host ip address per amphora https://review.openstack.org/505158 | 06:27 |
*** eezhova has joined #openstack-lbaas | 06:36 | |
*** rcernin has joined #openstack-lbaas | 06:39 | |
*** armax has joined #openstack-lbaas | 06:41 | |
*** yamamoto_ has quit IRC | 06:41 | |
*** yamamoto has joined #openstack-lbaas | 06:45 | |
*** armax has quit IRC | 06:50 | |
*** yamamoto_ has joined #openstack-lbaas | 07:04 | |
*** yamamoto has quit IRC | 07:04 | |
*** yamamoto_ has quit IRC | 07:10 | |
*** tongl has quit IRC | 07:12 | |
*** yamamoto has joined #openstack-lbaas | 07:18 | |
*** eezhova has quit IRC | 07:19 | |
*** tesseract has joined #openstack-lbaas | 07:21 | |
*** Alex_Staf has joined #openstack-lbaas | 07:41 | |
*** eezhova has joined #openstack-lbaas | 07:42 | |
*** eezhova_ has joined #openstack-lbaas | 07:45 | |
*** eezhova has quit IRC | 07:47 | |
*** Yipei has left #openstack-lbaas | 08:40 | |
*** chlong has quit IRC | 08:49 | |
openstackgerrit | Pradeep Kumar Singh proposed openstack/octavia master: Add flavor, flavor_profile table and their APIs https://review.openstack.org/486499 | 08:51 |
openstackgerrit | Bar RH proposed openstack/octavia master: [WIP] Assign bind_host ip address per amphora https://review.openstack.org/505158 | 09:00 |
*** yamamoto has quit IRC | 09:19 | |
*** salmankhan has joined #openstack-lbaas | 09:23 | |
*** numans has quit IRC | 09:25 | |
*** numans has joined #openstack-lbaas | 09:28 | |
openstackgerrit | Pradeep Kumar Singh proposed openstack/octavia master: Add flavor, flavor_profile table and their APIs https://review.openstack.org/486499 | 09:30 |
openstackgerrit | Bar RH proposed openstack/octavia master: [WIP] Assign bind_host ip address per amphora https://review.openstack.org/505158 | 09:55 |
*** yamamoto has joined #openstack-lbaas | 10:20 | |
*** eezhova__ has joined #openstack-lbaas | 10:28 | |
*** eezhova__ has quit IRC | 10:28 | |
*** salmankhan has quit IRC | 10:29 | |
*** eezhova_ has quit IRC | 10:31 | |
*** atoth has quit IRC | 10:35 | |
openstackgerrit | Lingxian Kong proposed openstack/octavia-tempest-plugin master: Create scenario tests for listeners https://review.openstack.org/492311 | 10:38 |
*** salmankhan has joined #openstack-lbaas | 10:39 | |
*** apuimedo_ has joined #openstack-lbaas | 10:43 | |
*** apuimedo has quit IRC | 10:45 | |
*** apuimedo_ is now known as apuimedo | 10:45 | |
*** sanfern has quit IRC | 10:54 | |
*** eezhova has joined #openstack-lbaas | 11:20 | |
*** strigazi has quit IRC | 11:23 | |
*** strigazi has joined #openstack-lbaas | 11:24 | |
*** pcaruana has joined #openstack-lbaas | 11:27 | |
*** atoth has joined #openstack-lbaas | 11:29 | |
*** sanfern has joined #openstack-lbaas | 12:32 | |
nmagnezi | o/ | 12:44 |
openstackgerrit | OpenStack Proposal Bot proposed openstack/neutron-lbaas master: Updated from global requirements https://review.openstack.org/506638 | 12:48 |
openstackgerrit | OpenStack Proposal Bot proposed openstack/neutron-lbaas-dashboard master: Updated from global requirements https://review.openstack.org/504660 | 12:48 |
*** leitan has joined #openstack-lbaas | 12:56 | |
*** belharar has quit IRC | 12:57 | |
*** belharar has joined #openstack-lbaas | 12:58 | |
*** chlong has joined #openstack-lbaas | 13:29 | |
*** chlong has quit IRC | 13:31 | |
*** belharar has quit IRC | 13:34 | |
*** Alex_Staf has quit IRC | 13:36 | |
*** rtjure has quit IRC | 13:58 | |
*** sanfern has quit IRC | 13:59 | |
*** sanfern has joined #openstack-lbaas | 14:00 | |
*** rtjure has joined #openstack-lbaas | 14:03 | |
*** belharar has joined #openstack-lbaas | 14:08 | |
*** yamamoto has quit IRC | 14:10 | |
*** ipsecguy_ has joined #openstack-lbaas | 14:13 | |
*** ipsecguy has quit IRC | 14:14 | |
*** yamamoto has joined #openstack-lbaas | 14:15 | |
*** yamamoto has quit IRC | 14:20 | |
*** tongl has joined #openstack-lbaas | 14:27 | |
johnsom | o/ | 14:28 |
*** tongl has quit IRC | 14:30 | |
openstackgerrit | Merged openstack/neutron-lbaas-dashboard master: Updated from global requirements https://review.openstack.org/504660 | 14:34 |
*** dayou has quit IRC | 15:01 | |
*** longkb_ has joined #openstack-lbaas | 15:01 | |
*** bbzhao has quit IRC | 15:03 | |
*** bbzhao has joined #openstack-lbaas | 15:03 | |
*** yamamoto has joined #openstack-lbaas | 15:10 | |
*** gcheresh_ has quit IRC | 15:11 | |
*** chlong has joined #openstack-lbaas | 15:17 | |
*** eezhova has quit IRC | 15:18 | |
xgerman_ | o/ - not sure if I am able to make the meeting but feel free to summon me if needed ;-) | 15:33 |
johnsom | Ok | 15:35 |
*** tongl has joined #openstack-lbaas | 15:37 | |
*** rcernin has quit IRC | 15:43 | |
openstackgerrit | Bar RH proposed openstack/octavia master: [WIP] Assign bind_host ip address per amphora https://review.openstack.org/505158 | 15:49 |
johnsom | Hmmm, I think I just reproduced the "404" issue locally. I have a nova vm, but the interface isn't in the VM | 15:54 |
nmagnezi | johnsom, in case I don't make it to the meeting, I voted in the poll (the best available option for me is to revert to the old timing) | 15:58 |
nmagnezi | johnsom, oh, and hi :) | 15:58 |
johnsom | Ok, great, I was going to ping you to vote | 15:59 |
nmagnezi | johnsom, btw as for my plugin.sh (and more) patch, for some reason when I stack with this patch it fails to spawn vms with: {u'message': u"Host 'rdocloud-devstack2' is not mapped to any cell", u'code': 400, u'created': u'2017-09-27T11:31:42Z'} | 16:01 |
nmagnezi | so not sure if that's related, going to use a clean setup to restack | 16:01 |
johnsom | Ok, yeah, that sounds like a nova setup issue, probably unrelated | 16:02 |
nmagnezi | johnsom, btw in the story I asked you a question about that DVR comment you asked for | 16:02 |
nmagnezi | yup. i think the same. | 16:02 |
johnsom | Do you have a link to the story? | 16:02 |
*** atoth has quit IRC | 16:03 | |
nmagnezi | yes, sec | 16:03 |
johnsom | Thanks, trying to dig into why I have an instance with a missing network interface. Nova/neutron say it's there, but linux in the vm doesn't see it. | 16:04 |
nmagnezi | johnsom, https://storyboard.openstack.org/#!/story/2001183#comment-17461 | 16:04 |
*** gcheresh_ has joined #openstack-lbaas | 16:07 | |
johnsom | nmagnezi Thanks, commented | 16:08 |
nmagnezi | johnsom, thanks! was that issue resolved later? (I want to specify this in the comment) | 16:09 |
johnsom | It was fixed in the Pike release | 16:10 |
johnsom | Prior to Pike it has always been broken when using DVR | 16:10 |
nmagnezi | ack. thanks! | 16:13 |
nmagnezi | johnsom, btw I even fixed the nova commands, so I want those bonus points. | 16:13 |
johnsom | Cool! | 16:14 |
johnsom | 500+ | 16:14 |
*** gcheresh_ has quit IRC | 16:22 | |
*** tesseract has quit IRC | 16:24 | |
*** sshank has joined #openstack-lbaas | 16:39 | |
*** SumitNaiksatam has quit IRC | 16:40 | |
*** JudeC has joined #openstack-lbaas | 16:53 | |
*** yamamoto has quit IRC | 16:55 | |
*** dayou has joined #openstack-lbaas | 16:57 | |
*** longstaff has joined #openstack-lbaas | 16:58 | |
*** gans has joined #openstack-lbaas | 16:59 | |
*** rm_mobile has joined #openstack-lbaas | 16:59 | |
*** yamamoto has joined #openstack-lbaas | 16:59 | |
*** eezhova has joined #openstack-lbaas | 17:01 | |
*** longstaff has quit IRC | 17:07 | |
*** longstaff has joined #openstack-lbaas | 17:10 | |
*** pcaruana has quit IRC | 17:14 | |
*** yamamoto has quit IRC | 17:31 | |
*** yamamoto has joined #openstack-lbaas | 17:31 | |
*** gans has quit IRC | 17:32 | |
*** rm_mobile has quit IRC | 17:49 | |
*** sshank has quit IRC | 17:54 | |
*** jniesz has joined #openstack-lbaas | 18:00 | |
johnsom | jniesz Ok with next week or want to chat here? | 18:02 |
rm_work | wait, i just read the flavor spec and it said flavors were immutable and only set at create time :P | 18:02 |
rm_work | are we changing that? | 18:03 |
johnsom | Yeah, that is the current stance | 18:03 |
johnsom | I think the topic was a discussion about revisiting that. | 18:03 |
jniesz | yes because the question is how to move from one flavor to another | 18:04 |
johnsom | I think it's "possible", but I would like to see it working first... grin | 18:04 |
jniesz | i agree that an lb created under a flavor is immutable | 18:04 |
jniesz | but should be able to failover (deprovision / reprovision) lb to new flavor | 18:04 |
*** longstaff has quit IRC | 18:06 | |
jniesz | for example if we update glance image of a flavor | 18:06 |
jniesz | create a new flavor with a new glance image | 18:06 |
jniesz | and want to move all lb's under the old flavor over to that new flavor | 18:06 |
johnsom | It gets pretty strange if the provider is different across the flavors. I mean, I think it is possible, but definitely something I would want to tackle in the future when we have providers/drivers working | 18:06 |
johnsom | Well, today, glance images are tagged, so by updating the tag to point to the new image and then using the failover API you can accomplish that without changing the flavor | 18:08 |
jniesz | correct. Depending on how we implement glance images in flavor, that might be different | 18:09 |
jniesz | if we have different glance images for different flavors | 18:09 |
jniesz | flavor might point to glance image id | 18:09 |
jniesz | or need to support multiple tags for different images | 18:10 |
johnsom | We have deprecated pointing to image IDs I think... | 18:10 |
jniesz | right now we just look for single tag | 18:10 |
johnsom | I think we only support tags | 18:10 |
johnsom | Right, you could setup a tag per flavor if you want to manage it that way. | 18:11 |
johnsom | https://github.com/openstack/octavia/blob/master/octavia/common/config.py#L290 | 18:11 |
jniesz | yes, so amp_image_tag would have to move into flavor meta_info | 18:12 |
jniesz | from the config | 18:12 |
johnsom | Yes, that is definitely something that needs to happen | 18:12 |
johnsom | I think there are a few config settings that need to move to flavor, like topology, tag, nova flavor, etc. Mostly we parked things in config because we didn't have flavors. | 18:13 |
jniesz | yea, nova flavor is another useful one. | 18:14 |
jniesz | migrate from one flavor to another (vertical scaling) | 18:14 |
jniesz | it would be good to failover to make that happen similar to the way glance is handled with tags | 18:14 |
johnsom | It's another interesting one. Nova team is starting to work on hot-plug vcpu/ram | 18:15 |
*** pcaruana has joined #openstack-lbaas | 18:15 | |
*** salmankhan has quit IRC | 18:15 | |
openstackgerrit | Bar RH proposed openstack/octavia master: [WIP] Assign bind_host ip address per amphora https://review.openstack.org/505158 | 18:19 |
xgerman_ | yeah, we can move all that to flavor ;-) We still support image-ids but they are a bit weird when using | 18:19 |
rm_work | johnsom: harlowja says moving to jobboard requires us dropping oslo.messaging | 18:20 |
xgerman_ | Probably need to extend our failover mechanism, and one flavor->another might be a user op… | 18:20 |
harlowja | or at least parts of it... | 18:20 |
harlowja | the models just aren't the same (in that there is no messaging, lol) | 18:20 |
xgerman_ | mmh, can we keep our queue between API and worker? | 18:20 |
johnsom | hmmm, confused a bit by that | 18:21 |
xgerman_ | +1 | 18:21 |
johnsom | We use it in two places: | 18:21 |
johnsom | 1. API process to worker (prior to starting a task flow) | 18:21 |
rm_work | xgerman_: specifically the queue between API and worker is what needs to stop using it | 18:21 |
xgerman_ | mmh | 18:21 |
johnsom | 2. Sending stats/status over to neutron (outside task flow) | 18:21 |
rm_work | yeah that second one could continue | 18:21 |
rm_work | except, neutron-lbaas is on fire | 18:22 |
johnsom | Since API->CW is before we even launch an engine, how would it conflict? In reality I think the CW gets a lot smaller and the JB workers take on more of the stuff | 18:23 |
harlowja | https://openstack.nimeyo.com/83061/openstack-dev-oslo-mistral-saga-process-than-where-from-here also for homework/reading | 18:24 |
harlowja | quiz tomorrow | 18:25 |
rm_work | johnsom: because jobboard requires ack-after-work and oslo does ack-before-work apparently | 18:25 |
johnsom | Deal | 18:25 |
*** belharar has quit IRC | 18:25 | |
xgerman_ | so we need to run our own queue? Or does job board does that for us? Confused? | 18:26 |
harlowja | so it prob would be useful for me to tell u what a jobboard is (at least the zookeeper one) | 18:26 |
xgerman_ | guess @johnsom will write a summary paper for us and present :-) | 18:26 |
johnsom | rm_work Still don't see how that is a problem. It is a feature of this solution IMO | 18:26 |
harlowja | depending on people's zookeeper knowledge | 18:26 |
harlowja | i need 20 minutes class time to do said description | 18:26 |
xgerman_ | I know the keeper — | 18:26 |
harlowja | k | 18:27 |
harlowja | https://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html#ch_zkDataModel (for those who don't know) | 18:27 |
harlowja | watches, znodes, ephemeral nodes are important | 18:27 |
johnsom | Here comes the fire hose.... | 18:27 |
harlowja | ha | 18:27 |
harlowja | let me know when to fully firehose | 18:28 |
harlowja | johnsom ack-before-work means u can lose messages if worker processing messages crashes right after acking | 18:29 |
harlowja | so that's bad, especially in projects (mistral, octavia) that aren't retrying and ... | 18:29 |
harlowja | (retrying from the sender side) | 18:29 |
harlowja | (crash or software upgrade, or network blip or other) | 18:30 |
johnsom | Right, this is our current situation. | 18:31 |
xgerman_ | yep, but can’t we make the queue “durable” | 18:31 |
johnsom | I am envisioning.... (cue dreamland music) Our CW gets message off queue, fires up TF engine stuffs, ACKs queue as message has been handed off to a more durable system. | 18:32 |
rm_work | that is what i thought too, but apparently not how it works :/ | 18:32 |
*** sshank has joined #openstack-lbaas | 18:32 | |
*** sshank has quit IRC | 18:33 | |
*** sshank has joined #openstack-lbaas | 18:33 | |
harlowja | not with oslo.messaging, lol | 18:34 |
xgerman_ | fix it! | 18:34 |
harlowja | not likely | 18:34 |
*** sshank has quit IRC | 18:35 | |
harlowja | oslo.messaging doesn't really expose the acking to users | 18:35 |
harlowja | on purpose | 18:35 |
johnsom | So even "our CW gets the message off the queue, auto ACKs, fires up the TF engine" is better than we have today, and a pretty small failure window | 18:35 |
xgerman_ | +1 | 18:35 |
harlowja | that's already happening when u use oslo.messaging | 18:36 |
rm_work | welcome to my conversation with harlowja as of like 15 minutes ago | 18:36 |
xgerman_ | nice — so we fire off taskboard earlier? | 18:36 |
johnsom | So you are proposing to scrap the queue, launch the TF engine straight from the API provider driver | 18:36 |
xgerman_ | monoliths unite! | 18:36 |
harlowja | different kind of queue (if u can call it that) | 18:36 |
rm_work | I think we ... post a job to a board | 18:36 |
harlowja | a taskflow job 'queue' is a directory in zookeeper | 18:37 |
johnsom | The problem we have is that the "hard stuff" happens after the message is off the queue and we have started the TF | 18:37 |
rm_work | instead of put a call on a queue | 18:37 |
rm_work | it's basically the same kind of thing | 18:37 |
harlowja | so when u post to a job board, it creates entry in that directory | 18:37 |
harlowja | workers that are waiting for work are 'watching' that directory for entries being created | 18:37 |
xgerman_ | ok, zookeeper is the queue | 18:37 |
xgerman_ | got it… easy | 18:37 |
rm_work | so yeah we'd just replace the places where we post to oslo.messaging, and instead post the job to zookeeper in a slightly different format | 18:37 |
harlowja | then a worker (one of them) gets the 'entry' (via atomic blah blah) | 18:37 |
harlowja | then worker starts processing workflow described in job | 18:38 |
rm_work | and the workers look at zookeeper for jobs | 18:38 |
rm_work | instead of reading from oslo.messaging | 18:38 |
harlowja | if worker dies (at any point) it releases lock (ephemeral ...node) | 18:38 |
xgerman_ | yeah, then they lock and if they die it unlocks bla, bla | 18:38 |
harlowja | then another worker can see this has happened and try to take that job over | 18:38 |
harlowja | blah blah | 18:38 |
johnsom | Where in this does the TF engine get started? | 18:38 |
harlowja | post-job-claim | 18:38 |
harlowja | the interesting bit is that if u have TF engines persisting task state somewhere | 18:38 |
harlowja | that on worker death the next worker can try to figure out where the last worker finished | 18:39 |
johnsom | Yeah, that is the part we actually need | 18:39 |
harlowja | and pick it up there | 18:39 |
xgerman_ | now we are talking ;-) | 18:39 |
harlowja | of course, some projects don't give a shit about the persistence part | 18:39 |
johnsom | sub-flow durability | 18:39 |
harlowja | and just restart the whole damn thing (and just use the auto-worker transfer stuff) | 18:39 |
harlowja | depends on if u can restart the subflows or if it's just easier to start the whole thing over | 18:39 |
harlowja | (and then have tasks themselves check things and do nothing...) | 18:40 |
xgerman_ | could work for us but would be wasteful (unaccounted resources…) | 18:40 |
johnsom | Yeah, restart the whole thing is bad in most cases. We have this pesky VIP IP/port | 18:40 |
harlowja | i'd expect a task could check something before doing work no? | 18:40 |
harlowja | like check if VIP/IP/port already made, then do nothing | 18:40 |
xgerman_ | probably | 18:40 |
harlowja | if not already made, do something... | 18:40 |
harlowja | and repeat | 18:40 |
xgerman_ | but it gets more tough with VMs, etc. | 18:40 |
harlowja | sure | 18:41 |
harlowja | anyway, that's the idea | 18:41 |
harlowja | hose done | 18:41 |
harlowja | lol | 18:41 |
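To ground the firehose above, here is a rough sketch of the jobboard pattern using taskflow's zookeeper backend. Everything concrete in it (the host, the /octavia/jobs path, the job name, and the run_flow_for() helper) is made up for illustration; the taskflow calls are written from memory as a sketch, not copied from working code.

```python
import contextlib

from taskflow.jobs import backends as job_backends

# Illustrative board configuration (host and path are assumptions).
BOARD_CONF = {
    'board': 'zookeeper',
    'hosts': '127.0.0.1:2181',
    'path': '/octavia/jobs',
}

# Producer side (e.g. the API layer): post work to the board instead of
# casting it onto an oslo.messaging queue.
with contextlib.closing(job_backends.fetch('octavia-jobs', BOARD_CONF)) as board:
    board.connect()
    # post() creates an entry (a znode) under the board path; workers
    # watching that path see it appear.
    board.post('create-load-balancer', details={'loadbalancer_id': 'some-uuid'})

# Worker side (separate process): claim a job, run the flow it describes,
# and consume it on success.  If the worker dies mid-job, its ephemeral
# claim disappears and another worker can take the job over.
with contextlib.closing(job_backends.fetch('octavia-jobs', BOARD_CONF)) as board:
    board.connect()
    for job in board.iterjobs(only_unclaimed=True):
        board.claim(job, 'worker-1')
        try:
            run_flow_for(job)             # hypothetical: build + run the TF engine
            board.consume(job, 'worker-1')
        except Exception:
            board.abandon(job, 'worker-1')
```

The key difference from the oslo.messaging model is that the job is only consumed after the work finishes, so a crash between claim and consume hands the job to another worker rather than losing it.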
*** gcheresh_ has joined #openstack-lbaas | 18:41 | |
johnsom | You have such faith in the OpenStack API capabilities.... Odds are high we would walk every port in the system.... grin | 18:41 |
harlowja | i haven't (but could) transfer the same concepts to etcd | 18:41 |
harlowja | i just haven't | 18:41 |
harlowja | there is a limited redis driver for jobboard that is sorta similar (but not so good) | 18:41 |
harlowja | since redis doesn't support the same concepts natively | 18:41 |
harlowja | johnsom ha | 18:42 |
harlowja | don't do dumb things :-P | 18:42 |
xgerman_ | why can’t we use a graphDB - I heard they are all the rage now | 18:42 |
harlowja | ha | 18:42 |
johnsom | I mean if we are going to restart the whole flow, isn't there something lighter weight than zookeeper, etc? | 18:42 |
xgerman_ | like a durable queue? | 18:42 |
xgerman_ | after all the keeper is not officially part of OpenStack whereas etcd is | 18:43 |
harlowja | define light-weight | 18:43 |
harlowja | lol | 18:43 |
xgerman_ | so if we can avoid the keeper would be goodness | 18:43 |
johnsom | Yeah, relying on more external parts makes me ill | 18:43 |
harlowja | meh, u decide | 18:43 |
harlowja | u can hack all of this with db|rabbit|something else | 18:43 |
harlowja | but i'd rather not | 18:43 |
harlowja | with some polling threads and shit | 18:44 |
*** sshank has joined #openstack-lbaas | 18:44 | |
harlowja | enjoy that, ha | 18:44 |
harlowja | but ya, the jobboard stuff was before etcd got approved (3 months ago?) so ya... | 18:44 |
harlowja | is what it is ,ha | 18:44 |
xgerman_ | so we should at least use etcd since that’s now officially part of the kit whereas zookeeper isn't | 18:44 |
harlowja | go for it | 18:44 |
harlowja | i could prob do it, but it might be a useful thing for someone here | 18:44 |
harlowja | then u'll know wtf jobboards are better, haha | 18:44 |
xgerman_ | I guess we have our work cut out or wait for the K8 Octavia | 18:44 |
harlowja | at least now u know the concepts | 18:45 |
xgerman_ | yeah, might also be a non-issue, e.g. terraform checks if the LB appeared and if not errors out — so if we lose the message the user will know and can run again | 18:46 |
harlowja | sounds like shifting work to user | 18:46 |
xgerman_ | yep, it’s shitty | 18:46 |
harlowja | ie, the user is your retry decorator, lol | 18:46 |
harlowja | user-powered-decorators | 18:46 |
johnsom | We could just tell rm_work to never interrupt a controller | 18:47 |
xgerman_ | :-) | 18:47 |
johnsom | Get a bunker, UPS, generator, vault door | 18:47 |
xgerman_ | http://www.zerohedge.com/sites/default/files/images/user5/imageroot/2017/04/15/north-korea-missiles_0.png | 18:48 |
xgerman_ | just look that you build it outside those circles | 18:48 |
johnsom | Geez, three pages in the first doc alone... | 18:49 |
johnsom | I feel like harlowja gave us free candy (taskflow) and then ..... | 18:50 |
*** sshank has quit IRC | 18:56 | |
*** JudeC has quit IRC | 18:59 | |
*** pcaruana has quit IRC | 19:00 | |
*** gcheresh_ has quit IRC | 19:05 | |
nmagnezi | johnsom, o/ | 19:11 |
nmagnezi | johnsom, a question about https://review.openstack.org/#/c/505884/7 | 19:12 |
johnsom | o/ | 19:12 |
nmagnezi | johnsom, why do we have openstackclient in both places? | 19:12 |
johnsom | Yeah, I had that question too. Let me see if I can find the comments about this. | 19:13 |
nmagnezi | johnsom, so in the patch it looks like it bumps it in test-requirements, but strangely enough i don't see it in master: https://github.com/openstack/python-octaviaclient/blob/master/test-requirements.txt | 19:13 |
johnsom | If I remember right it has to do with some integrated tests | 19:13 |
nmagnezi | johnsom, ack. just tried to make sense of it for myself :) | 19:13 |
johnsom | Oh, interesting point | 19:14 |
johnsom | hmmm | 19:14 |
johnsom | https://review.openstack.org/#/c/487565/3/test-requirements.txt | 19:15 |
johnsom | So, we did remove it | 19:15 |
johnsom | The bot must be confused | 19:15 |
johnsom | Oh, we removed it after stable/pike was cut | 19:15 |
johnsom | That bot patch is against stable/pike | 19:16 |
nmagnezi | johnsom, oh, i missed that | 19:16 |
johnsom | Yeah, we can backport that if it is a problem for your packaging | 19:16 |
nmagnezi | johnsom, i should have waited with my vote.. :P | 19:17 |
*** sanfern has quit IRC | 19:17 | |
johnsom | Not too late to fix it | 19:17 |
nmagnezi | johnsom, indeed | 19:17 |
nmagnezi | johnsom, as for packaging, I didn't package the client yet (I plan to do so in a few weeks) | 19:19 |
nmagnezi | but since pike was already released i think we should just leave it there | 19:19 |
johnsom | Yep | 19:19 |
nmagnezi | if it ain't broken.. :) | 19:19 |
johnsom | Yep | 19:20 |
openstackgerrit | Nir Magnezi proposed openstack/octavia master: Fix a Python3 issue with start_stop_listener https://review.openstack.org/480919 | 19:20 |
nmagnezi | rm_work, i need some advice with this one ^. A little bird (with PTL wings) whispered to me that you know how to tackle those py3 issues :) | 19:22 |
*** eezhova has quit IRC | 19:23 | |
rm_work | yeah those are always fun | 19:23 |
rm_work | often you can run it through / test against six.text_type | 19:23 |
rm_work | i THINK in this case that might be safe | 19:24 |
rm_work | six.text_type("a") == six.text_type(b"a") | 19:25 |
*** sanfern has joined #openstack-lbaas | 19:25 | |
rm_work | ah yeah doesn't work directly in py3 | 19:26 |
rm_work | but you can test | 19:27 |
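For context, the py3 trap being discussed is roughly the following; the to_text() helper is just an illustrative sketch, not the actual fix in review 480919.

```python
import six

def to_text(value, encoding='utf-8'):
    # Normalize bytes vs. text before comparing.  On py2, "a" == b"a" is True;
    # on py3 it is False, and six.text_type(b"a") gives the repr "b'a'",
    # which is why the direct cast alone doesn't help.
    if isinstance(value, six.binary_type):
        return value.decode(encoding)
    return six.text_type(value)

assert to_text(b"a") == to_text(u"a")   # True on both py2 and py3
```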
harlowja | johnsom hahaha, free candy | 19:31 |
harlowja | i can only provide so much of the candy, the rest is up to u guys | 19:33 |
harlowja | i had a hard enough time just trying to get etcd|zookeeper into openstack as an accepted thing | 19:34 |
harlowja | (thankfully it now is) | 19:34 |
harlowja | ^ that blew my mind honestly (that it took that long) | 19:34 |
harlowja | especially for a cloud distributed system... | 19:34 |
harlowja | lol | 19:34 |
* harlowja slightly burnt out by that crap | 19:36 | |
rm_work | johnsom: do you remember what log level we get if we don't have debug=true? | 19:38 |
harlowja | i think i'm also not super-happy with how people used oslo.messaging; i think they didn't quite know what it really is doing (acking before work...) | 19:38 |
johnsom | Yeah, I am getting worn down by things breaking out from under us... Thus the worry of adding more | 19:38 |
harlowja | anyways, rant over | 19:38 |
johnsom | rm_work INFO I think | 19:39 |
rm_work | yeah I'm hoping that's true | 19:39 |
johnsom | That is what I see in one of my devstack VMs | 19:43 |
johnsom | Ok, back to nova/neutron fun with the case of the missing network interface | 19:43 |
rm_work | ganbatte | 19:52 |
openstackgerrit | Adam Harwell proposed openstack/octavia master: WIP: Floating IP Network Driver (spans L3s) https://review.openstack.org/435612 | 20:03 |
*** sshank has joined #openstack-lbaas | 20:05 | |
*** dayou has quit IRC | 20:07 | |
*** yamamoto has quit IRC | 20:08 | |
*** eezhova has joined #openstack-lbaas | 20:14 | |
*** sshank has quit IRC | 20:24 | |
*** sshank has joined #openstack-lbaas | 20:24 | |
*** yamamoto has joined #openstack-lbaas | 20:27 | |
*** yamamoto has quit IRC | 20:31 | |
*** ltomasbo has quit IRC | 20:34 | |
*** ltomasbo has joined #openstack-lbaas | 20:37 | |
johnsom | So, our 404 issue.... | 20:39 |
johnsom | Appears to be an issue inside the amp | 20:39 |
johnsom | If I force a PCI bus re-enumeration the interface pops up | 20:40 |
rm_work | hmmmmmmmmm | 20:40 |
rm_work | curiosity: i wonder if we switched the gates to centos amps if it'd show up :P | 20:41 |
rm_work | might be an ubuntu thing | 20:41 |
johnsom | Yeah. I collected a ton of nova/neutron logs then figured I would start poking things. | 20:41 |
rm_work | you said you managed to repro, but | 20:41 |
rm_work | *reliably*? or just once randomly | 20:41 |
johnsom | Once ever | 20:42 |
rm_work | T_T | 20:42 |
johnsom | But we saw it in the gates a bunch in Pike | 20:42 |
rm_work | yes, i remember | 20:42 |
rm_work | is it a race? | 20:42 |
rm_work | like sure you did a re-enum and it popped up | 20:42 |
rm_work | but maybe because nova didn't set it up in the time it was supposed to? | 20:42 |
johnsom | If so it's in the bowels of the linux kernel hot-plug systems.... | 20:43 |
johnsom | Oh, no, I had looked at the device list just before | 20:43 |
johnsom | https://www.irccloud.com/pastebin/5DeVTzTG/ | 20:43 |
*** aojea has quit IRC | 20:44 | |
johnsom | I ran it in the netns just to make sure there wasn't some strange netns magic going on (though PCI should not be masked by that) | 20:44 |
johnsom | I mean, forcing a rescan if we don't find the interface we expect is not harmful. | 20:45 |
openstackgerrit | Adam Harwell proposed openstack/octavia master: WIP: Floating IP Network Driver (spans L3s) https://review.openstack.org/435612 | 20:45 |
*** aojea has joined #openstack-lbaas | 20:49 | |
rm_work | johnsom: so, basically a quick workaround | 20:59 |
rm_work | maybe that's fine :/ | 20:59 |
rm_work | my standards have become a lot lower as someone trying to actually *get all this working reliably* | 21:00 |
openstackgerrit | Nir Magnezi proposed openstack/octavia master: Update devstack plugin and examples https://review.openstack.org/503638 | 21:01 |
*** Alex_Staf has joined #openstack-lbaas | 21:02 | |
johnsom | It's actually a pretty common practice when hot plugging things into linux hosts. I just thought we were long beyond needing it. | 21:02 |
rm_work | aah there was another admin endpoint i forgot to talk about in the meeting T_T | 21:02 |
rm_work | maintenance mode | 21:02 |
rm_work | a couple of things: | 21:03 |
rm_work | 1) I think i want to store the AZ/host that comes back from the nova polling | 21:04 |
rm_work | 2) make a new table for storing currently active maintenances (either an AZ or a Host) | 21:04 |
rm_work | 3) Make an endpoint to create/read/delete to that table | 21:05 |
rm_work | 4) have some logic that can failover amps off those AZ/hosts when a maintenance is set | 21:05 |
rm_work | jniesz: ^^ possibly relevant for you too | 21:06 |
rm_work | for #1 i mean, in the amphora table | 21:07 |
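Purely as a strawman for item 2 in the list above (nothing like this exists in Octavia today, and the discussion below argues part of it belongs outside Octavia anyway), a maintenance table could look something like this; every name here is hypothetical:

```python
import sqlalchemy as sa

from octavia.db import base_models

class AmphoraMaintenance(base_models.BASE, base_models.IdMixin):
    """Hypothetical model: one row per AZ or host currently in maintenance."""

    __tablename__ = 'amphora_maintenance'

    scope_type = sa.Column(sa.String(16), nullable=False)   # 'az' or 'host'
    scope_name = sa.Column(sa.String(255), nullable=False)  # AZ name or hostname
    created_at = sa.Column(sa.DateTime(), nullable=False)
```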
xgerman_ | mmh, I can see us switching off an AZ for maintenance, but actively failing stuff over - that should be left to the operator to run some script | 21:08 |
rm_work | hmmm | 21:08 |
xgerman_ | yeah, not every maintenance might need a failover - maybe just not schedule new amps for some time | 21:09 |
johnsom | Yeah, it seems like maintenance would stop health monitoring and block failovers | 21:10 |
johnsom | If I remember right, you are looking for something to evacuate an AZ??? | 21:11 |
xgerman_ | looks like it and I think that should be done outside Octavia | 21:11 |
xgerman_ | we just give you the building blocks | 21:11 |
johnsom | Yeah, some folks might want to live migrate too | 21:12 |
xgerman_ | +1 | 21:12 |
xgerman_ | also rm_work should write a spec — this is getting beyond what we can handle i irc ;-) | 21:15 |
*** eezhova has quit IRC | 21:15 | |
johnsom | Yeah, this one probably should have a spec | 21:15 |
*** ltomasbo has quit IRC | 21:16 | |
*** leitan has quit IRC | 21:16 | |
xgerman_ | I haven’t looked at our minutes but if we do a spec we might do it for all the proposed ones… so we have a proper record | 21:16 |
xgerman_ | aka an admin-api spec | 21:17 |
*** ltomasbo has joined #openstack-lbaas | 21:18 | |
rm_work | yeah prolly | 21:18 |
johnsom | https://www.irccloud.com/pastebin/CykDBtZM/ | 21:19 |
xgerman_ | root is not what it used to be? | 21:19 |
rm_work | open('/sys/bus/pci/rescan', 'w') | 21:20 |
rm_work | plz try | 21:20 |
johnsom | Ah, I'm a dork and forgot the 'w' | 21:20 |
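For reference, the trick being poked at here is the standard sysfs rescan knob; a minimal sketch (not the actual patch that follows):

```python
def rescan_pci_bus():
    # Writing "1" to this sysfs file asks the kernel to re-enumerate the PCI
    # bus, so a hot-plugged NIC that was missed shows up without a reboot.
    with open('/sys/bus/pci/rescan', 'w') as f:
        f.write('1')
```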
openstackgerrit | Michael Johnson proposed openstack/octavia master: Force PCI bus rescan if interface is not found https://review.openstack.org/507986 | 21:26 |
rm_work | so now our gates will be impervious to failure? :P | 21:27 |
*** yamamoto has joined #openstack-lbaas | 21:28 | |
tongl | Do we still track neutron-lbaas v2 bugs? I created a listener with a default pool, and also added a healthmonitor for this pool. However, when I tried to create a healthmonitor for the 2nd redirect_pool, it reports the exception "TypeError: unhashable type: 'dict'". | 21:29 |
tongl | Did anyone see this before in LBaaS v2? | 21:29 |
tongl | Error log: http://paste.openstack.org/show/622092/ | 21:30 |
* rm_work doesn't use neutron-lbaas | 21:31 | |
tongl | We still have to develop our driver to support neutron-lbaas :( | 21:33 |
johnsom | tongl neutron-lbaas bugs go here: https://storyboard.openstack.org/#!/project/906 | 21:33 |
johnsom | I will say, developers working on neutron-lbaas are few | 21:33 |
tongl | thanks | 21:34 |
*** yamamoto has quit IRC | 21:34 | |
johnsom | Is it the odd newline in "admin_state_up"? | 21:37 |
*** gcheresh_ has joined #openstack-lbaas | 21:39 | |
*** gcheresh_ has quit IRC | 21:43 | |
*** sshank has quit IRC | 21:47 | |
tongl | Another quick question: in Octavia, do we allow deleting a pool when l7policy redirect_to_pool is still referencing it? | 21:48 |
johnsom | I think the answer is no, but I have not tested it | 21:50 |
tongl | thx | 21:51 |
*** yamamoto has joined #openstack-lbaas | 21:57 | |
*** yamamoto has quit IRC | 21:57 | |
*** aojea has quit IRC | 21:58 | |
*** kbyrne has quit IRC | 22:00 | |
*** kbyrne has joined #openstack-lbaas | 22:03 | |
*** jniesz has quit IRC | 22:04 | |
rm_work | blegh, amps explode when i use single-create, not when i use normal create | 22:22 |
rm_work | because of the active-standby keepalived | 22:22 |
johnsom | Ugh | 22:22 |
rm_work | something with the initiation of it happening very early | 22:22 |
rm_work | logs look like this: | 22:22 |
rm_work | http://paste.openstack.org/show/622094/ | 22:24 |
rm_work | that's one amp | 22:24 |
rm_work | http://paste.openstack.org/show/622095/ | 22:26 |
rm_work | that's the other | 22:26 |
rm_work | so what seems to happen ... | 22:26 |
rm_work | http://paste.openstack.org/show/622096/ | 22:29 |
rm_work | hmmm | 22:29 |
rm_work | actually i am unsure if it's keepalived related | 22:29 |
*** sshank has joined #openstack-lbaas | 22:29 | |
rm_work | it actually fails over because of no health updates? | 22:29 |
rm_work | but it does so before we consider the create "done" | 22:30 |
rm_work | so it errors the amps because it's still PENDING_CREATE | 22:30 |
rm_work | not sure why it's not sending health messages | 22:30 |
johnsom | Ummm, it can't be told to failover from health messages until after it starts sending them. | 22:32 |
rm_work | yeah i mean it seems like it STOPS sending them | 22:32 |
rm_work | or something | 22:32 |
rm_work | it IS sending them now | 22:33 |
rm_work | blegh | 22:33 |
rm_work | I am still unsure what I think about being unable to failover while the LB is in a pending state | 22:34 |
johnsom | Should not happen, HM doesn't own the amp | 22:35 |
*** sshank has quit IRC | 22:37 | |
rm_work | aahh | 22:37 |
rm_work | 2017-09-27 15:37:28.083 7766 WARNING octavia.controller.healthmanager.update_db [-] Amphora 9912b285-4aee-4216-bc81-3e036db246cd health message reports 1 listeners when 0 expected | 22:37 |
rm_work | the messages come in but are ignored I think | 22:37 |
rm_work | because the listener count doesn't line up | 22:37 |
johnsom | correct, that is normal on startup | 22:38 |
rm_work | ok but it stays that way for a WHILE | 22:38 |
rm_work | i think because it's still in the process of setting up the single-create stuff so it hasn't persisted that yet? | 22:39 |
rm_work | but it has sent it to the amps/ | 22:39 |
rm_work | ? | 22:39 |
johnsom | Even so, it can't fail over due to HM because it should have never written an amphora_health record to the DB until the first listener comes up | 22:40 |
rm_work | yeah the first listener comes up on the amp | 22:40 |
rm_work | err | 22:41 |
rm_work | hmm so at SOME POINT the numbers had to match | 22:41 |
johnsom | I think we had a bug about that, that we really should be doing health of the amp before the listener, but you know, chicken/egg | 22:41 |
johnsom | The last attempt at that caused it to always failover | 22:42 |
johnsom | ha | 22:42 |
rm_work | i have an idea... | 22:42 |
rm_work | if len(listeners) == expected_listener_count: | 22:42 |
rm_work | ^^ replace that with | 22:42 |
rm_work | if len(listeners) == expected_listener_count and 'PENDING' not in lb.provisioning_status: | 22:43 |
rm_work | err sorry bad logic | 22:43 |
rm_work | one sec | 22:43 |
rm_work | if len(listeners) == expected_listener_count or 'PENDING' in lb.provisioning_status: | 22:43 |
johnsom | spares pool | 22:43 |
rm_work | erm | 22:43 |
rm_work | hold on we're USING lb.id here | 22:44 |
rm_work | ah i see | 22:44 |
rm_work | one sec | 22:44 |
*** sshank has joined #openstack-lbaas | 22:45 | |
rm_work | http://paste.openstack.org/show/622097/ | 22:46 |
rm_work | added lines 1, 6/7, edited 13 | 22:46 |
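For readers without the paste, the shape of the proposed guard is roughly the following; the repository lookup, helper names, and logging are assumptions standing in for the real health manager code:

```python
lb = loadbalancer_repo.get(session, id=lb_id)            # assumed lookup
listeners_match = len(listeners) == expected_listener_count
lb_still_building = lb.provisioning_status in (constants.PENDING_CREATE,
                                               constants.PENDING_UPDATE)

if listeners_match or lb_still_building:
    # Accept the heartbeat: either the listener counts line up, or the LB is
    # still mid-provisioning and a temporary mismatch is expected.
    update_amphora_health(session, amphora_id)            # hypothetical helper
else:
    LOG.warning('Amphora %s health message reports %d listeners when %d '
                'expected', amphora_id, len(listeners), expected_listener_count)
```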
*** leitan has joined #openstack-lbaas | 22:48 | |
rm_work | johnsom: ^^ | 22:50 |
johnsom | What about an LB created but no listener? | 22:51 |
johnsom | I don't remember when we start sending | 22:51 |
johnsom | Frankly I'm still not sure why we are doing this at all... | 22:51 |
rm_work | oh dear god, somehow i got an orphaned VM pair | 22:55 |
rm_work | i don't know how to track them down... | 22:55 |
*** yamamoto has joined #openstack-lbaas | 22:58 | |
*** rtjure has quit IRC | 23:01 | |
*** bcafarel has quit IRC | 23:03 | |
rm_work | k got it | 23:03 |
*** yamamoto has quit IRC | 23:05 | |
rm_work | yeah umm shit | 23:06 |
rm_work | so when i do single-create with active-standby, it dies before it ever finishes creating | 23:06 |
rm_work | (the amps go to error) | 23:06 |
johnsom | nova error? | 23:06 |
rm_work | and then when i try cleanup (delete the LB, which is ACTIVE) it seems to be leaving the VMs behind but deleting the amp record O_o | 23:07 |
rm_work | no, the amps failover but it's in PENDING_CREATE so they don't finish failover and go to ERROR | 23:07 |
rm_work | maybe we shouldn't put amps in ERROR if we try a failover but don't do it because of LB state | 23:07 |
rm_work | O_o | 23:08 |
*** tongl has quit IRC | 23:09 | |
openstackgerrit | Adam Harwell proposed openstack/octavia master: WIP: Floating IP Network Driver (spans L3s) https://review.openstack.org/435612 | 23:11 |
rm_work | ok that fix seems to fix my startup issue | 23:13 |
rm_work | the PENDING state is just too long for them and so it gets out an initial heartbeat somehow but then is blocked for too long T_T | 23:14 |
johnsom | I would be figuring out how that heartbeat is coming out | 23:14 |
*** bcafarel has joined #openstack-lbaas | 23:16 | |
rm_work | hmm | 23:18 |
rm_work | yeah the timing on this is crazy complex | 23:18 |
rm_work | johnsom: ok here's a question for you | 23:35 |
rm_work | LB is happy and ACTIVE | 23:35 |
rm_work | an amp dies | 23:35 |
rm_work | we notice, trigger failover, LB goes to PENDING_UPDATE | 23:35 |
rm_work | we finish the failover, mark it ACTIVE again | 23:35 |
rm_work | right? | 23:35 |
openstackgerrit | Michael Johnson proposed openstack/octavia master: Force PCI bus rescan if interface is not found https://review.openstack.org/507986 | 23:35 |
johnsom | Yes, it also has a health lock | 23:36 |
rm_work | now what happens if both of the amps behind an ACTIVE_STANDBY LB die within a few seconds? | 23:36 |
rm_work | one amp dies, we PENDING_UPDATE the LB | 23:36 |
rm_work | second amp ... failover fails | 23:36 |
rm_work | now think about once we have N-Active | 23:37 |
rm_work | I *really* don't think we can afford to lock the LB on failover <_< | 23:37 |
johnsom | But, it should be fine right, the second failed amp will stay failed, first one will be built, go back ACTIVE, HM will notice the second one is borked and start failover on it. | 23:39 |
johnsom | Right? | 23:39 |
rm_work | i don't think that works | 23:39 |
rm_work | i need to figure out why | 23:40 |
rm_work | but IME they stay ERROR forever | 23:40 |
johnsom | About a year ago it did, I tried deleting both and got it to work | 23:40 |
rm_work | i think possibly the revert doesn't properly unset the busy flag | 23:40 |
rm_work | i'll look again next time it happens | 23:40 |
rm_work | unfortunately my breakpoints keep getting messed up, so tempest finishes cleanup before i can investigate fully | 23:41 |
johnsom | That could be the case actually | 23:41 |
rm_work | but i just went through and found a ton of amps in the health table with busy-flag set | 23:41 |
rm_work | so there is obviously a case where it doesn't unset properly somewhere | 23:42 |
johnsom | Yeah, because that happens outside the flow | 23:42 |
johnsom | It happens in the HM during "stale" check (still hate that term) | 23:42 |
rm_work | MarkAmphoraHealthBusy | 23:43 |
rm_work | that doesn't have a revert action | 23:43 |
johnsom | so, failover error on revert should unset it | 23:43 |
johnsom | https://github.com/openstack/octavia/blob/master/octavia/db/repositories.py#L1079 | 23:43 |
rm_work | errr | 23:43 |
rm_work | AmphoraToErrorOnRevertTask doesn't | 23:44 |
rm_work | trying to find a task that does | 23:44 |
johnsom | Yeah, that is what I am saying, it probably should unset busy | 23:44 |
rm_work | ah k | 23:44 |
rm_work | you think in AmphoraToErrorOnRevertTask or as a revert on the MarkAmphoraHealthBusy task? | 23:44 |
johnsom | MarkAmphoraHealthBusy is on the new amp I think | 23:44 |
rm_work | err | 23:44 |
rm_work | hmmm | 23:45 |
rm_work | database_tasks.MarkAmphoraHealthBusy( | 23:45 |
rm_work | rebind={constants.AMPHORA: constants.FAILED_AMPHORA}, | 23:45 |
johnsom | Oh, no, it's there for the non-failed failover case | 23:45 |
rm_work | looks like it's on the old one | 23:45 |
rm_work | although I guess doing MarkAmphoraHealthBusy in the flow at all is redundant | 23:46 |
rm_work | because in order to RUN this flow on an amp, we've already set it, right? | 23:46 |
rm_work | so it's just... a noop | 23:46 |
johnsom | Well, either way, your case with PENDING will fail before it gets to MarkAmphoraHealthBusy | 23:46 |
johnsom | Well, if it isn't actually a failure but the failover API, it needs to be set | 23:46 |
rm_work | hmmm | 23:47 |
johnsom | failure case, it's a dup, failover case it's needed | 23:47 |
rm_work | ok | 23:47 |
rm_work | so maybe the revert should be there regardless | 23:47 |
rm_work | and then ALSO somewhere higher? | 23:47 |
*** sshank has quit IRC | 23:47 | |
johnsom | Yeah, I think AmphoraToErrorOnRevertTask should clear the busy flag | 23:47 |
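A minimal sketch of what clearing the flag on revert could look like, assuming the task has access to the usual repository attributes used by the database tasks; the real change would need to fit whichever task class it actually lands in:

```python
def revert(self, amphora, *args, **kwargs):
    # In addition to marking the amphora ERROR (the existing behaviour),
    # release the health-manager "busy" lock so a later failover attempt
    # isn't blocked by a stale flag.
    self.amphora_health_repo.update(db_apis.get_session(),
                                    amphora.id, busy=False)
```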
*** sshank has joined #openstack-lbaas | 23:52 | |
*** sshank has quit IRC | 23:53 | |
openstackgerrit | Adam Harwell proposed openstack/octavia master: WIP: Floating IP Network Driver (spans L3s) https://review.openstack.org/435612 | 23:57 |