Thursday, 2019-03-07

*** tosky has quit IRC [00:06]
*** kgiusti has left #openstack-oslo [00:34]
*** lbragstad has quit IRC [01:09]
*** rcernin has quit IRC [01:31]
*** rcernin has joined #openstack-oslo [01:32]
*** dave-mccowan has joined #openstack-oslo [02:18]
*** ducnv has joined #openstack-oslo [02:54]
*** ducnv has quit IRC [02:56]
*** ducnv has joined #openstack-oslo [02:56]
*** lbragstad has joined #openstack-oslo [03:11]
*** dave-mccowan has quit IRC [04:00]
*** bnemec has quit IRC [04:26]
*** ebbex has quit IRC [04:26]
*** cburgess has quit IRC [04:26]
*** stephenfin has quit IRC [04:26]
*** django has quit IRC [04:26]
*** sorrison has quit IRC [04:26]
*** bnemec has joined #openstack-oslo [04:53]
*** ebbex has joined #openstack-oslo [04:53]
*** cburgess has joined #openstack-oslo [04:53]
*** stephenfin has joined #openstack-oslo [04:53]
*** django has joined #openstack-oslo [04:53]
*** sorrison has joined #openstack-oslo [04:53]
*** rcernin has quit IRC [04:56]
*** rcernin has joined #openstack-oslo [05:02]
*** lbragstad has quit IRC [05:33]
*** jhesketh has quit IRC [05:53]
*** jhesketh has joined #openstack-oslo [05:54]
<openstackgerrit> Merged openstack/automaton master: add python 3.7 unit test job  https://review.openstack.org/610513 [06:27]
*** e0ne has joined #openstack-oslo [06:32]
*** lxkong has quit IRC [06:52]
*** Luzi has joined #openstack-oslo [06:54]
*** e0ne has quit IRC [07:17]
*** lxkong has joined #openstack-oslo [07:21]
*** e0ne has joined #openstack-oslo [07:21]
*** pcaruana has joined #openstack-oslo [07:25]
*** rcernin has quit IRC [07:27]
*** aojea has joined #openstack-oslo [07:33]
*** e0ne has quit IRC [08:04]
*** moguimar has joined #openstack-oslo [08:22]
*** tosky has joined #openstack-oslo [08:27]
*** e0ne has joined #openstack-oslo [10:25]
<openstackgerrit> Merged openstack/oslo.vmware master: SDRS recommendation for create VM  https://review.openstack.org/640639 [11:19]
<openstackgerrit> Merged openstack/oslo.vmware master: Return None if no suitable datastore is found  https://review.openstack.org/640640 [11:20]
*** a-pugachev has joined #openstack-oslo [11:25]
*** a-pugachev_ has joined #openstack-oslo [12:02]
*** a-pugachev has quit IRC [12:05]
*** a-pugachev_ is now known as a-pugachev [12:05]
*** raildo has joined #openstack-oslo [12:14]
*** a-pugachev has quit IRC [12:25]
*** a-pugachev has joined #openstack-oslo [12:28]
<openstackgerrit> Stephen Finucane proposed openstack/oslo.db stable/rocky: Resolve SAWarning in Query.soft_delete()  https://review.openstack.org/641650 [12:48]
<stephenfin> bnemec: When you're about, would you mind looking at https://review.openstack.org/641650 ? [12:49]
*** a-pugachev has quit IRC [12:57]
*** kgiusti has joined #openstack-oslo [13:34]
*** lbragstad has joined #openstack-oslo [13:48]
*** ansmith has joined #openstack-oslo [13:49]
*** moguimar has quit IRC [15:00]
*** Luzi has quit IRC [15:35]
*** a-pugachev has joined #openstack-oslo [15:45]
<openstackgerrit> Merged openstack/oslo.db stable/rocky: Resolve SAWarning in Query.soft_delete()  https://review.openstack.org/641650 [15:45]
*** a-pugachev_ has joined #openstack-oslo [15:56]
*** a-pugachev has quit IRC [15:58]
*** a-pugachev has joined #openstack-oslo [16:00]
*** a-pugachev_ has quit IRC [16:01]
<openstackgerrit> Stephen Finucane proposed openstack/oslo.db stable/queens: Resolve SAWarning in Query.soft_delete()  https://review.openstack.org/641724 [16:44]
*** e0ne has quit IRC [17:07]
*** a-pugachev has quit IRC [17:46]
*** irclogbot_2 has joined #openstack-oslo [18:02]
*** e0ne has joined #openstack-oslo [18:19]
*** e0ne has quit IRC [18:24]
*** e0ne has joined #openstack-oslo [18:29]
*** e0ne has quit IRC [18:33]
*** pcaruana has quit IRC [18:46]
*** pcaruana has joined #openstack-oslo [19:05]
<openstackgerrit> Ben Nemec proposed openstack/oslo.cache stable/rocky: Fix memcache pool client in monkey-patched environments  https://review.openstack.org/640500 [19:30]
*** pcaruana has quit IRC [19:33]
*** mnaser has joined #openstack-oslo [20:08]
<mnaser> dhellmann: mind if i pick your brain about a change you helped push..... 6 years ago? :) [20:08]
<dhellmann> mnaser: uhm, you can try. :-) [20:09]
<mnaser> https://review.openstack.org/#/c/63444/3/openstack/common/service.py L195 in service.py [20:09]
<mnaser> i'm noticing an issue with nova where a reload triggers really weird behaviour (troubleshot it a bit with the nova team) [20:09]
<mnaser> and i'm coming to the conclusion that https://github.com/openstack/oslo.service/blob/471bfe770808a0348fcce1e83753dcc414b361c1/oslo_service/service.py#L395-L396 should really be "if _is_sighup_and_daemon(signo):" [20:10]
<mnaser> because what that code does is: if not (sighup and running as a daemon), break. so a sighup reload falls through and results in self.restart() being called [20:10]
<dhellmann> is that not how the reload works? [20:11]
<mnaser> well, shouldn't the reload call self.reset() and not self.restart()? [20:11]
<mnaser> nova assumes that a reload (sighup) only calls reset() -- not a full service restart [20:12]
<mnaser> maybe dansmith can comment better about this, if he has time.. i've just kinda tried to slowly figure this out [20:12]
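
For context, the check being discussed sits in the launcher's wait loop. A simplified paraphrase of the oslo_service/service.py code of that era follows (not a verbatim copy; _is_daemon() is a real helper in the module, sketched further down in this log):

    import signal

    def _is_sighup_and_daemon(signo):
        # True only when the signal is SIGHUP *and* the process looks
        # like a daemon; any other signal returns False immediately.
        return signo == signal.SIGHUP and _is_daemon()

    # Inside ServiceLauncher.wait(), paraphrased:
    #
    #   while True:
    #       self.handle_signal()
    #       status, signo = self._wait_for_exit_or_signal()
    #       if not _is_sighup_and_daemon(signo):
    #           break          # any non-SIGHUP signal: leave the loop and exit
    #       self.restart()     # SIGHUP while "a daemon": stop/reset/start cycle
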
<dansmith> dhellmann: the oslo service code assumes that we should only do a reload-type action on sighup if we've self-daemonized, [20:13]
<dansmith> which is not really legit for a systemd-managed process, or anything running under a supervisor (like regular init with respawn, monit, etc) [20:13]
<dhellmann> I wonder if this logic predates that? [20:13]
<dansmith> predates the idea of running a service foreground-style under a supervisor? no, [20:14]
<dhellmann> bnemec: you've looked at the service code more recently than I have, I think [20:14]
<dansmith> but it probably does predate the uptick in systemd usage [20:14]
<dhellmann> yeah, systemd [20:14]
<dhellmann> I trust you both if you say it's wrong, but I don't know this lib well enough to actually confirm it if you're not sure [20:15]
<mnaser> i mean, i've only started running into this because our gates are failing with some odd behaviour [20:15]
<mnaser> our = OSA [20:15]
<dansmith> dhellmann: I disagree with the assertion made in the commit message of the code that added this, so... I'm pretty sure :D [20:15]
<dhellmann> let's get a fix posted then [20:17]
<dansmith> like, it's not that the code is broken, it's that the thing the code is (properly) doing is fundamentally naive, I think [20:17]
* bnemec passes the buck to zaneb :-) [20:17]
<mnaser> given this is breaking the openstack-ansible gates, i can volunteer to fix it if someone knows what needs to be done (and doesn't have the time to do it :<) [20:18]
* zaneb hides [20:18]
<dansmith> mnaser: honestly, I'd just revert the is-daemon check part and call it good [20:18]
<bnemec> We have a maybe-related bug too: https://bugs.launchpad.net/oslo.service/+bug/1794708 [20:18]
<openstack> Launchpad bug 1794708 in oslo.service "Service children uselessly restart on signals" [Undecided,New] [20:18]
<dansmith> but I'd defer to the osloians about notice, deprecation, etc [20:18]
<mnaser> also dansmith found a related bug in neutron [20:19]
<mnaser> where it just dies on a sighup too [20:19]
<dansmith> bnemec: yeah, we have another bug where someone is doing something wrong in nova because they assume this is just how it's supposed to work [20:19]
<dansmith> and that [20:19]
<mnaser> https://bugs.launchpad.net/neutron/+bug/1780139 [20:20]
<openstack> Launchpad bug 1780139 in neutron "Sending SIGHUP to neutron-server process causes it to hang" [Undecided,Triaged] - Assigned to Bernard Cafarelli (bcafarel) [20:20]
<mnaser> this one [20:20]
<mnaser> oh jeez [20:21]
<mnaser> dansmith: https://bugs.launchpad.net/nova/+bug/1715374 [20:21]
<openstack> Launchpad bug 1715374 in OpenStack Compute (nova) "Reloading compute with SIGHUP prevents instances from booting" [High,In progress] - Assigned to Ralf Haferkamp (rhafer) [20:21]
<mnaser> would ya look at that [20:21]
<dansmith> mnaser: didn't I say "I hope to god this is not really happening because ... bad things"? :) [20:21]
<dansmith> shutting down rpc and all [20:22]
<mnaser> well i guess we can reference that bug in the fix, but it's been around a while [20:22]
<dansmith> mnaser: I have oslo.service pulled and am working on it [20:22]
<mnaser> dansmith: is it safe to reclassify those bugs as "invalid" for nova (not sure about neutron, but at least nova) and add that they affect oslo.service instead? [20:23]
<bnemec> FWIW, it sounds like TripleO hit this as well and worked around it instead: https://bugs.launchpad.net/tripleo/+bug/1789680 [20:23]
<openstack> Launchpad bug 1789680 in tripleo "mistral MessagingTimeout correlates with containerized undercloud uptime" [Critical,Fix released] - Assigned to Dougal Matthews (d0ugal) [20:23]
<dansmith> I don't think so [20:23]
<dansmith> they're legit, they're just not bugs in nova [20:23]
<dansmith> bnemec: right, which is what was attempted in nova.. [20:23]
<dansmith> and maybe neutron too [20:23]
<mnaser> dansmith: right, so just move the bug to oslo.service rather than nova [20:24]
<dansmith> add oslo.service, I think [20:24]
<dansmith> so they affect both [20:25]
<mnaser> ok, updated https://bugs.launchpad.net/oslo.service/+bug/1715374 [20:25]
<openstack> Launchpad bug 1715374 in OpenStack Compute (nova) "Reloading compute with SIGHUP prevents instances from booting" [High,In progress] - Assigned to Ralf Haferkamp (rhafer) [20:25]
<dansmith> I assume that since that change was merged with zero tests (!) this should be easy ... [20:26]
<mnaser> guess by that logic you don't have to write any either :) [20:27]
<dansmith> le woot :) [20:27]
<dansmith> mnaser: can we do gate hackage to test this against your stuff before we merge? [20:28]
<mnaser> dansmith: i think so. i'm trying to think through whether it can work, because it's not a direct dependency (i.e. nova or neutron) but a dependency of one [20:28]
<dansmith> mriedem is a ninja with that stuff [20:29]
<bnemec> Wow, I still have a draft comment on https://review.openstack.org/#/c/63444 [20:29]
<mnaser> i think one way to hack around it is to make a dnm change to nova pointing requirements at that wip oslo.service change [20:30]
<bnemec> However, it was a commit message nitpick as opposed to anything useful. [20:30]
<mnaser> and then we should be able to clone nova with updated requirements and pull it down [20:30]
* mnaser wonders if there's an easy way to update requirements.txt to somehow point at a specific oslo change [20:30]
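
The usual answer here is Zuul's cross-repo Depends-On rather than editing requirements.txt: push a do-not-merge test change to nova whose commit message footer points at the oslo.service review, assuming the consuming jobs install oslo.service from source (e.g. devstack's LIBS_FROM_GIT=oslo.service). A hypothetical commit message:

    DNM: exercise oslo.service SIGHUP fix

    Depends-On: https://review.openstack.org/641793
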
<openstackgerrit> Dan Smith proposed openstack/oslo.service master: Revert assertion made about SIGHUP behavior for non-daemon services  https://review.openstack.org/641793 [20:32]
<dansmith> mnaser: ^ [20:32]
<dansmith> I tagged the oslo and nova bugs there. I think the others have been worked around, so probably not worth mentioning, but if others care, we can [20:32]
<mnaser> dansmith: shouldn't it be "if is_sighup(signo): break"? [20:33]
<mnaser> because we want to break out and not hit the self.restart()? [20:33]
<dansmith> isn't it? [20:33]
<mnaser> because _is_sighup_and_daemon evaluated to true when i ran it under systemd (i actually did a LOG.debug() and a systemd reload afterwards) [20:33]
<mnaser> it's "if not _is_sighup(signo): break", which means break out _only_ if it's not sighup, unless i'm really botting out [20:34]
<dansmith> oh, I was blindly reverting the and-daemon part, but it was "not sighup" before [20:34]
<mnaser> the daemon check seemed to be working ok, systemd was saying it was true, but if _is_sighup_and_daemon returns true on a reload, "if not true: break" => it won't break, and we hit self.restart() [20:35]
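
Why _is_sighup_and_daemon() can come back true under systemd: oslo.service decides "daemon-ness" by looking for a controlling terminal, so any process whose stdout isn't a tty -- including one supervised by systemd -- counts as a daemon. A rough sketch from memory, not the exact source:

    import errno
    import io
    import os
    import sys

    def _is_daemon():
        # A foreground process shares its process group with the
        # controlling terminal; no terminal at all (ENOTTY) => assume
        # daemon. Under systemd stdout is a pipe/journal fd, so this
        # returns True.
        try:
            return os.getpgrp() != os.tcgetpgrp(sys.stdout.fileno())
        except io.UnsupportedOperation:
            return True
        except OSError as err:
            if err.errno == errno.ENOTTY:
                return True
            raise
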
<dansmith> restart() is not what we want though, right? [20:35]
<mnaser> if we get a sighup i assume we _don't_ want a restart [20:35]
*** e0ne has joined #openstack-oslo [20:35]
<zaneb> I feel like if the 'not' were the wrong way around, we'd have noticed before now that no (maskable) signals other than SIGHUP work for killing the process... [20:36]
<mnaser> https://github.com/openstack/oslo.service/blob/master/oslo_service/service.py#L395 [20:36]
<dansmith> zaneb: right, that's what I was thinking [20:36]
<mnaser> from what i understand, that code breaks out only if it's NOT (sighup and daemon) [20:36]
<mnaser> so if you get a reload, the "not" check evaluates false, and it falls through to self.restart() [20:36]
<mnaser> and i assume self.restart() is not what we want, but self.reset()? [20:36]
<dansmith> mnaser: well, we did handle_signal, which calls the reset, I think [20:37]
<dansmith> we want to not run restart though, AFAIK [20:37]
<mnaser> https://github.com/openstack/oslo.service/blob/master/oslo_service/service.py#L778-L784 [20:37]
<mnaser> we call .stop() [20:37]
<mnaser> then .reset() [20:37]
<mnaser> so restart() is not an actual restart of the process, but a reset [20:38]
<mnaser> so we do need to call it, that code is right, but it seems like we are calling .stop() then .reset() [20:38]
<dansmith> stop where? from _reload_service? [20:38]
<dansmith> I'm actually not sure why we're not exiting in that now [20:38]
<mnaser> https://github.com/openstack/oslo.service/blob/master/oslo_service/service.py#L397 [20:39]
*** aojea has quit IRC [20:39]
<mnaser> into [20:39]
<mnaser> https://github.com/openstack/oslo.service/blob/master/oslo_service/service.py#L314 [20:39]
<mnaser> into [20:39]
<mnaser> https://github.com/openstack/oslo.service/blob/master/oslo_service/service.py#L780 [20:39]
<mnaser> which does [20:39]
<mnaser> https://github.com/openstack/oslo.service/blob/master/oslo_service/service.py#L760-L762 [20:39]
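
Stitching those links together, the SIGHUP path being traced looks roughly like the following chain -- paraphrased and simplified, with the approximate line numbers from the chat; see the linked source for the real thing:

    # ServiceLauncher.wait()                  ~L395-L397
    #     ...
    #     self.restart()                      # taken on SIGHUP-and-daemon
    #
    # Launcher.restart()                      ~L314
    #     self.conf.reload_config_files()
    #     self.services.restart()
    #
    # Services.restart()                      ~L778-L784
    #     self.stop()        # <-- the stop in question: kills the service
    #                        #     threads, and Service.stop() in consumers
    #                        #     like nova tears down rpc etc.
    #     self.done = event.Event()
    #     for svc in self.services:
    #         svc.reset()
    #         self.tg.add_thread(self.run_service, svc, self.done)
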
<dansmith> no, I'm saying: [20:40]
<dansmith> https://github.com/openstack/oslo.service/blob/master/oslo_service/service.py#L360 [20:40]
<dansmith> calls https://github.com/openstack/oslo.service/blob/master/oslo_service/service.py#L342 [20:40]
<dansmith> which raises SignalExit [20:40]
<dansmith> oh wait, I see [20:40]
<dansmith> it doesn't exit unless SystemExit [20:41]
<mnaser> yeah, but the stop still happens in self.restart(), which _really_ is a reset [20:41]
<mnaser> and self.restart() seems to be a self.stop() followed by a self.reset() [20:42]
<dansmith> not sure what you mean by that, but, [20:42]
<dansmith> if we break on that sighup thing, we'll exit the while loop and drop out, right? [20:42]
<mnaser> dansmith: yes [20:43]
<dansmith> which will bubble up to ProcessLauncher's while loop, [20:43]
<mnaser> but we don't actually want to break out, because without calling self.restart(), we won't reset [20:43]
<dansmith> which will also break, and then os._exit, right? [20:43]
<dansmith> I guess I'm confused about what you're suggesting needs to change, because just flipping the logic on "is not sighup" doesn't seem right [20:44]
<mnaser> dansmith: yes, i was wrong about that, i assumed self.restart() was a full service restart [20:45]
<mnaser> but in reality, self.restart() is a reset [20:45]
<mnaser> so all that logic is okay, but i think your assumption that Service.stop() is _not_ being called isn't true, because it is being called on resets [20:45]
<mnaser> because of https://github.com/openstack/oslo.service/blob/master/oslo_service/service.py#L778-L784 [20:46]
<dansmith> but restart() does a stop() [20:46]
<dansmith> mnaser: stop() can't be called while we're running or we tear down rpc [20:46]
<mnaser> and we call self.manager.cleanup_host() in stop() in nova's service [20:46]
<dansmith> right [20:46]
<mnaser> which sets self._events = None and then causes the breakage [20:47]
<mnaser> so either self.reset() needs to rebuild everything that self.stop() tore down, or we stop calling self.stop() in oslo.service and let services handle their own resets [20:47]
<dansmith> we have to make a change here that ends up with us calling Service.reset() without Service.stop() first, agreed? [20:47]
<mnaser> correct [20:47]
<dansmith> mnaser: it can't do the former, because that means we kill in-flight rpc calls [20:48]
<mnaser> right. it seems that right now, oslo's default behaviour is to call stop() on the service, then reset(), then start() [20:48]
<mnaser> if you ask me, that sounds like a normal restart.. [20:48]
<mnaser> oslo should just call reset() and do nothing else [20:49]
<dansmith> right.. reset should be pointless, [20:49]
<dansmith> although it's probably doing a reset because it doesn't re-create the Service() object, because of the sequencing [20:49]
<dansmith> right [20:49]
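
One aside: oslo.service's launch() already takes a restart_method argument, and its 'mutate' mode responds to SIGHUP by mutating config and resetting, without the stop()/start() cycle. Whether that fits here is exactly what is being debated, but for reference, a hypothetical usage sketch (conf and my_service stand in for the application's ConfigOpts and Service objects):

    from oslo_service import service

    # restart_method='mutate': on SIGHUP, run conf.mutate_config_files()
    # and Service.reset() instead of the stop()/reset()/start() cycle.
    launcher = service.launch(conf, my_service, workers=1,
                              restart_method='mutate')
    launcher.wait()
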
<dansmith> mnaser: I think there might be multiple paths in the same file here, [20:50]
<dansmith> because the ProcessLauncher version looks closer [20:50]
<dansmith> well, no, it kills all the children too [20:50]
<mnaser> yep [20:50]
<dansmith> but ProcessLauncher.wait() is replicating things from ServiceLauncher() [20:50]
<mnaser> i'm struggling to understand the why behind that [20:51]
<dansmith> mnaser: so, I haven't eaten anything all day yet and am pretty foggy and need to go do that.. you wanna take that patch and hack on it? [20:51]
<dansmith> yeah, that seems weird to me too [20:51]
<dansmith> I'm guessing it's an mpm sort of "processes or threads" kind of thing [20:51]
<mnaser> https://github.com/openstack/neutron/blob/master/neutron/service.py#L137-L146 [20:51]
<mnaser> https://github.com/openstack/neutron/blob/9a15084375a80df830cf50c1fb20908b5d7f822c/neutron/common/config.py#L106-L112 [20:52]
<mnaser> i'm gonna guess neutron makes the same assumption [20:52]
<mnaser> and that's why it dies, it has no rpc after a reload [20:52]
<dansmith> yeah [20:52]
<mnaser> ok, this will be interesting. a fix in oslo is probably the fundamental way, but i'll try to study the history behind the changes [20:52]
<mnaser> and see why we decided to .stop(), .reset(), .start() [20:52]
<dansmith> I think it has to be a fix in oslo, because we can't really prevent the stop (cleanly) from inside nova or neutron [20:53]
<mnaser> yeah, it's not like we'd be changing default behaviour either. i'll dig [20:53]
<dansmith> ack, thanks [20:54]
<openstackgerrit> Ben Nemec proposed openstack/oslo.log master: Test bionic for legacy job  https://review.openstack.org/641799 [20:55]
*** e0ne has quit IRC [21:14]
*** mtreinish has joined #openstack-oslo [21:30]
*** kgiusti has left #openstack-oslo [21:37]
*** ansmith has quit IRC [21:46]
*** e0ne has joined #openstack-oslo [22:01]
*** raildo has quit IRC [22:13]
*** a-pugachev has joined #openstack-oslo [22:21]
*** a-pugachev has quit IRC [22:22]
*** tosky has quit IRC [22:22]
*** a-pugachev has joined #openstack-oslo [22:22]
*** ansmith has joined #openstack-oslo [22:28]
*** tosky has joined #openstack-oslo [22:29]
*** e0ne has quit IRC [22:34]
*** e0ne has joined #openstack-oslo [22:34]
*** e0ne has quit IRC [22:36]
*** rcernin has joined #openstack-oslo [22:41]
<openstackgerrit> Ben Nemec proposed openstack/oslo.log master: Test bionic for legacy job  https://review.openstack.org/641799 [22:42]
*** purplerbot has quit IRC [23:26]
*** purplerbot has joined #openstack-oslo [23:26]
*** tosky has quit IRC [23:34]

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!