Wednesday, 2014-01-29

*** derekh has quit IRC  00:05
*** jcooley_ has quit IRC  00:10
*** derekh has joined #openstack-ironic  00:17
*** jcooley_ has joined #openstack-ironic  00:17
*** matsuhashi has joined #openstack-ironic  00:30
<openstackgerrit> Yongli He proposed a change to openstack/ironic: ironic clean up:kill few backslash of conductor  https://review.openstack.org/69760  00:38
*** bigjools_ is now known as bigjools  00:52
*** bigjools has quit IRC  00:53
*** bigjools has joined #openstack-ironic  00:53
*** jbjohnso has quit IRC  01:04
* NobodyCam waves night all  01:26
<rloo> night NobodyCam.  01:33
*** derekh has quit IRC  01:36
*** nosnos has joined #openstack-ironic  01:37
*** yongli has joined #openstack-ironic  01:51
<dkehn> dkehn_: waves back  01:52
*** yongli has quit IRC  01:53
*** yongli has joined #openstack-ironic  01:58
*** rloo has quit IRC  02:28
<openstackgerrit> Yongli He proposed a change to openstack/ironic: ironic clean up:kill few backslash of conductor  https://review.openstack.org/69760  02:31
*** aignatov_ is now known as aignatov  03:07
*** vkozhukalov has joined #openstack-ironic  03:17
*** matsuhashi has quit IRC  03:30
*** matsuhashi has joined #openstack-ironic  03:31
*** matsuhashi has quit IRC  03:35
*** jcooley_ has quit IRC  03:41
*** harlowja is now known as harlowja_away  04:16
<NobodyCam> lol will have to try this out http://www.urbandictionary.com/define.php?term=Python%20Bomb  04:38
*** aignatov is now known as aignatov_  04:41
<mrda> Hey all, looking for an ironic core to re +2 this patch - https://review.openstack.org/#/c/68852/  It's already approved, but I hit bug/1254890 a few times, but after re-checking it's ready for gate.  04:46
*** matsuhashi has joined #openstack-ironic  04:55
*** jcooley_ has joined #openstack-ironic  05:22
*** matsuhashi has quit IRC  05:34
*** matsuhashi has joined #openstack-ironic  05:35
*** nosnos has quit IRC  05:38
*** nosnos has joined #openstack-ironic  05:41
*** mrda is now known as mrda_away  05:41
*** jcooley_ has quit IRC  05:50
*** jcooley_ has joined #openstack-ironic  06:06
<openstackgerrit> Jenkins proposed a change to openstack/ironic: Imported Translations from Transifex  https://review.openstack.org/68024  06:07
*** jbjohnso has joined #openstack-ironic  06:08
*** vkozhukalov has quit IRC  06:13
*** jbjohnso has quit IRC  06:14
*** matsuhashi has quit IRC  06:23
*** matsuhas_ has joined #openstack-ironic  06:25
*** Haomeng|2 has joined #openstack-ironic  06:37
*** Haomeng has quit IRC  06:37
*** early has quit IRC  06:39
*** early has joined #openstack-ironic  06:47
*** jcooley_ has quit IRC  06:53
*** jcooley_ has joined #openstack-ironic  06:55
*** ndipanov has joined #openstack-ironic  07:16
*** coolsvap has joined #openstack-ironic  07:19
*** ifarkas has joined #openstack-ironic  07:41
*** jistr has joined #openstack-ironic  07:45
*** rwsu has quit IRC  07:46
*** rwsu has joined #openstack-ironic  07:50
*** romcheg has joined #openstack-ironic  07:54
*** vkozhukalov has joined #openstack-ironic  07:55
*** aignatov_ is now known as aignatov  07:59
*** romcheg has quit IRC  08:00
*** romcheg1 has joined #openstack-ironic  08:17
*** aignatov is now known as aignatov_  08:27
*** hstimer has quit IRC  08:27
*** jcooley_ has quit IRC  08:30
*** matsuhas_ has quit IRC  09:04
*** matsuhashi has joined #openstack-ironic  09:04
*** athomas has joined #openstack-ironic  09:11
*** aignatov_ is now known as aignatov  09:14
*** matsuhashi has quit IRC  09:17
*** derekh has joined #openstack-ironic  09:18
*** matsuhashi has joined #openstack-ironic  09:19
*** jcooley_ has joined #openstack-ironic  09:21
*** coolsvap_away has joined #openstack-ironic  09:33
*** coolsvap has quit IRC  09:34
*** jcooley_ has quit IRC  09:37
*** mdurnosvistov has joined #openstack-ironic  09:38
*** coolsvap has joined #openstack-ironic  09:45
*** coolsvap_away has quit IRC  09:48
*** matsuhashi has quit IRC  09:51
*** nosnos has quit IRC  09:51
*** martyntaylor has joined #openstack-ironic  10:01
*** jcooley_ has joined #openstack-ironic  10:09
*** jcooley_ has quit IRC  10:15
*** max_lobur_afk is now known as max_lobur  10:30
*** aignatov is now known as aignatov_  10:37
*** athomas has quit IRC  10:50
*** athomas has joined #openstack-ironic  10:56
*** jistr has quit IRC  10:59
*** lucasagomes has joined #openstack-ironic  11:05
*** coolsvap has quit IRC  11:15
*** lucasagomes has quit IRC  11:15
*** lucasagomes has joined #openstack-ironic  11:15
*** jistr has joined #openstack-ironic  11:19
*** zul has quit IRC  11:20
*** athomas has quit IRC  11:22
*** zul has joined #openstack-ironic  11:24
*** athomas has joined #openstack-ironic  11:31
<openstackgerrit> Mikhail Durnosvistov proposed a change to openstack/ironic: Get rid object model `dict` methods part 6  https://review.openstack.org/64336  11:39
<openstackgerrit> Mikhail Durnosvistov proposed a change to openstack/ironic: Get rid object model `dict` methods part 5  https://review.openstack.org/64278  11:39
*** aignatov_ is now known as aignatov  11:43
<openstackgerrit> Mikhail Durnosvistov proposed a change to openstack/ironic: Get rid object model `dict` methods part 6  https://review.openstack.org/64336  11:47
<openstackgerrit> Mikhail Durnosvistov proposed a change to openstack/ironic: Get rid object model `dict` methods part 5  https://review.openstack.org/64278  11:47
*** jcooley_ has joined #openstack-ironic  11:58
*** jcooley_ has quit IRC  12:08
*** dshulyak has joined #openstack-ironic  12:10
*** max_lobur is now known as max_lobur_afk  12:27
*** jcooley_ has joined #openstack-ironic  12:55
*** linggao has joined #openstack-ironic  13:03
*** jdob has joined #openstack-ironic  13:08
*** jcooley_ has quit IRC  13:09
<ifarkas> lucasagomes, ping  13:10
<lucasagomes> ifarkas, pong  13:10
<ifarkas> lucasagomes, hey, did you manage to debug the heat create-in-progress issue?  13:10
<lucasagomes> ifarkas, I didn't debug that; as my undercloud was up and running I just went on to test the next steps instead  13:11
<lucasagomes> I'm rerunning the tests now  13:11
<lucasagomes> I can try to take a look at the problem when it appears again  13:11
<ifarkas> lucasagomes, ok, let me know if that happens. I figured out another way to debug such an issue  13:12
<lucasagomes> ifarkas, right, will let u know  13:13
<ifarkas> I ran os-collect-config on the undercloud and checked its output  13:13
<lucasagomes> ifarkas, btw I had a problem with my seedvm today  13:13
<ifarkas> lucasagomes, oh, what was the issue?  13:13
<lucasagomes> http://paste.openstack.org/show/62088/  13:13
<lucasagomes> I'm taking a look at it now  13:13
<lucasagomes> happened twice http://paste.openstack.org/show/62094/  13:13
*** vkozhukalov has quit IRC  13:26
*** jistr is now known as jistr|english  13:30
*** vkozhukalov has joined #openstack-ironic  13:39
*** rloo has joined #openstack-ironic  13:47
*** jcooley_ has joined #openstack-ironic  13:49
*** jbjohnso has joined #openstack-ironic  13:56
*** jcooley_ has quit IRC  13:58
*** rloo has quit IRC  14:01
*** rloo has joined #openstack-ironic  14:02
*** rloo has quit IRC  14:02
*** rloo has joined #openstack-ironic  14:03
*** rloo has quit IRC  14:11
*** rloo has joined #openstack-ironic  14:12
*** aignatov is now known as aignatov_  14:19
*** matty_dubs|gone is now known as matty_dubs  14:20
*** aignatov_ is now known as aignatov  14:21
*** rloo has quit IRC  14:22
*** rloo has joined #openstack-ironic  14:22
*** max_lobur_afk is now known as max_lobur  14:23
*** rloo has quit IRC  14:36
*** rloo has joined #openstack-ironic  14:36
*** aignatov is now known as aignatov_  14:40
*** jistr|english is now known as jistr  14:41
*** rloo has quit IRC  14:42
*** rloo has joined #openstack-ironic  14:42
*** aignatov_ is now known as aignatov  14:43
*** rloo has quit IRC  14:47
*** rloo has joined #openstack-ironic  14:47
*** rloo has quit IRC  14:48
*** rloo has joined #openstack-ironic  14:48
*** vkozhukalov has quit IRC  14:49
*** rloo has quit IRC  14:50
*** rloo has joined #openstack-ironic  14:51
*** rloo has quit IRC  14:53
*** rloo has joined #openstack-ironic  14:53
*** vkozhukalov has joined #openstack-ironic  15:03
*** aignatov is now known as aignatov_  15:08
*** rloo has quit IRC  15:14
*** rloo has joined #openstack-ironic  15:14
*** zul has quit IRC  15:30
*** rloo has quit IRC  15:30
*** zul has joined #openstack-ironic  15:30
<romcheg1> Morning folks  15:31
*** aignatov_ is now known as aignatov  15:31
<max_lobur> morning Ironic!  15:32
<romcheg1> lucasagomes: I've seen you posted a manual on how to use the deploy interface with curl. Cannot find it  15:33
*** romcheg1 is now known as romcheg  15:33
<lucasagomes> romcheg, pong  15:33
<lucasagomes> lemme find the link  15:33
<lucasagomes> max_lobur, romcheg morning  15:33
<lucasagomes> romcheg, https://etherpad.openstack.org/p/IronicDeployDevstack ?  15:33
<romcheg> lucasagomes: Ah, exactly!  15:34
<romcheg> Thanks  15:34
<lucasagomes> romcheg, np  15:34
*** rloo has joined #openstack-ironic  15:34
<GheRivero> morning all  15:38
*** rloo has quit IRC  15:41
*** rloo has joined #openstack-ironic  15:42
*** lucasagomes is now known as lucas-hungry  15:44
*** rloo__ has joined #openstack-ironic  15:49
*** rloo has quit IRC  15:49
*** coolsvap has joined #openstack-ironic  15:49
<devananda> gmorning, all  15:52
*** jcooley_ has joined #openstack-ironic  15:53
<openstackgerrit> Mikhail Durnosvistov proposed a change to openstack/ironic: Get rid object model `dict` methods part 6  https://review.openstack.org/64336  15:55
<openstackgerrit> Max Lobur proposed a change to openstack/ironic: Fix JSONEncodedDict default values  https://review.openstack.org/68413  15:56
<NobodyCam> Good Morning Ironic, says the man moving slowly  15:59
<max_lobur> lol  16:00
<max_lobur> morning NobodyCam :)  16:00
<NobodyCam> morning max_lobur :)  16:01
*** rloo__ has quit IRC  16:01
*** rloo has joined #openstack-ironic  16:01
<devananda> NobodyCam: g'morning! I'm up before you again :)  16:01
<NobodyCam> ya. this tiny trailer is rough at night .. lol  16:02
<devananda> so folks, FYI, looks like the global gate is still having some serious problems  16:04
<NobodyCam> :(  16:06
<openstackgerrit> Max Lobur proposed a change to openstack/ironic: Fix JSONEncodedDict default values  https://review.openstack.org/68413  16:06
*** vkozhukalov has quit IRC  16:07
<NobodyCam> ifarkas: question on your last review of 66461 if you're around?  16:14
<dkehn> NobodyCam: yes it is: https://review.openstack.org/#/c/66071/ check-tempest-dsvm-ironic-postgres FAILURE  16:15
<NobodyCam> dkehn: ??? yes it is? gate having problems?  16:18
<dkehn> NobodyCam: can't hardly wait till code freeze  16:18
<devananda> i'm waiting on -infra to finish fixing what they're fixing, then i'm going to fire off a round of recheck/reverify  16:19
<dkehn> NobodyCam: where volume jumps up 40%  16:19
<NobodyCam> I saw the TripleO gate last night was way way backed up  16:20
*** aignatov is now known as aignatov_  16:25
* devananda looks at the history of https://review.openstack.org/#/c/69495/  16:25
<devananda> what are the chances of NOT hitting a non-deterministic failure? very, very low.  16:26
<rloo> devananda. What's the diff between reapproving vs doing a recheck (for https://review.openstack.org/#/c/69495/)?  16:32
*** lucas-hungry is now known as lucasagomes  16:34
<lucasagomes> morning NobodyCam devananda rloo  16:35
*** jcooley_ has quit IRC  16:37
*** jcooley_ has joined #openstack-ironic  16:37
<NobodyCam> morning lucasagomes  16:38
<max_lobur> lucasagomes, devananda, do you have some time to discuss background task cancellation today?  16:41
<lucasagomes> max_lobur, yes :)  16:41
*** jcooley_ has quit IRC  16:42
*** mdurnosvistov has quit IRC  16:45
<devananda> max_lobur: yes!  16:45
<rloo> afternoon lucasagomes ;)  16:46
<devananda> rloo: recheck -- jenkins will run the check tests again  16:46
<devananda> rloo: reverify -- jenkins will run the gate tests, and if it passes, merge to trunk  16:46
<max_lobur> cool  16:46
<devananda> rloo: a core member re-approving is a lazy way to reverify without tagging a bug  16:46
<max_lobur> so currently I can imagine two ways  16:46
<devananda> rloo: anyone can "recheck no bug" but "reverify no bug" doesn't work. and I didn't feel like trying to figure out which bug caused the failure /again/ on that patch  16:47
<rloo> devananda. thx. one of the many advs of being a core member ;)  16:47
<NobodyCam> speak softly a=but carry a big check mark :-p  16:48
<NobodyCam> s/a=//  16:48
<devananda> anyone around looking for low-hanging-fruit?  16:49
<max_lobur> 1. To maintain a thread pool and terminate the running thread somehow. Which is not a good pattern. For reference http://stackoverflow.com/questions/323972/is-there-any-way-to-kill-a-thread-in-python  16:50
<devananda> max_lobur: yes. killing a thread abruptly == NO.  16:50
<max_lobur> In this case we may have corrupted data in the conductor process and a corrupted node - somewhere in the middle of deployment or changing power state  16:50
<devananda> max_lobur: and quite likely, leaked memory/resources/etc  16:51
<max_lobur> devananda, true  16:51
<devananda> max_lobur: which is why proper SMP architecture uses signals and traps  16:51
<max_lobur> 2. To have a set of checkpoints within every long-running task. Every checkpoint is a point where the task can be stopped and rolled back  16:51
<max_lobur> I like this more  16:52
<lucasagomes> max_lobur, 2 sounds good  16:52
<devananda> 2 is half-way there. it may help but may not be enough  16:52
<lucasagomes> and opens up a space to use things like  16:52
<lucasagomes> taskflow for it  16:52
<devananda> think of the step "dd ${image} ${iscsi_target}"  16:52
<devananda> 2 will have to wait for that to finish  16:52
<max_lobur> yep  16:52
<max_lobur> if we want to keep our data consistent  16:53
<devananda> so this is good  16:53
<devananda> it's not an immediate interrupt  16:53
<max_lobur> we need to accept the fact that a task cannot be cancelled immediately  16:53
<devananda> right  16:53
<max_lobur> it can only be planned to be cancelled  16:53
<lucasagomes> +1  16:53
<devananda> ++  16:53
<lucasagomes> firmware updates, for e.g.  16:53
<lucasagomes> I don't see any problem in waiting for a task to finish before canceling the operation  16:54
<devananda> where task == some smaller unit of work  16:54
<lucasagomes> yup  16:54
<devananda> not the TaskManager instance  16:54
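A minimal sketch of the checkpoint idea being discussed here (all names are hypothetical, not existing ironic code): cancellation is requested from another thread, but only takes effect between units of work, never in the middle of one.

    import threading

    class TaskCancelled(Exception):
        """Raised at a checkpoint after cancellation was requested."""

    class CancellableTask(object):
        """A long-running task that can stop only at its checkpoints."""

        def __init__(self, steps):
            self._steps = steps              # ordered units of work (callables)
            self._cancel = threading.Event()

        def cancel(self):
            # Called from another thread; honoured at the next checkpoint,
            # never in the middle of a unit of work such as a dd.
            self._cancel.set()

        def _checkpoint(self):
            if self._cancel.is_set():
                raise TaskCancelled()

        def run(self):
            for step in self._steps:
                self._checkpoint()           # safe place to stop and clean up
                step()

A cancel() issued while a step is running lets that step finish; run() then raises TaskCancelled before the next step starts. Under eventlet the same pattern should apply to greenthreads, since the flag check is just a read.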
<lucasagomes> devananda, have u looked at taskflow?  16:54
<max_lobur> I think not only cancelling, but rolling back to the beginning, right?  16:54
<devananda> lucasagomes: yes, but not in a while. harlowja_away and I talked a lot when he started the project  16:54
<lucasagomes> do you think that we could benefit from it?  16:54
<lucasagomes> cinder uses it afaict  16:55
<devananda> lucasagomes: possibly, but I don't want to introduce that right now  16:55
<devananda> in Juno, maybe  16:55
<lucasagomes> devananda, right  16:55
<lucasagomes> makes sense  16:55
<devananda> lucasagomes: but bringing up proper task management // checkpoint&rollback // etc can not be done within our current framework  16:55
<devananda> lucasagomes: for that, i think we will need a third service  16:56
<lucasagomes> yea  16:56
<max_lobur> devananda, +  16:56
<devananda> API <-> TaskManager <-> ConductorManager  16:56
<lucasagomes> for the rolling back part it would make a lot of sense to use taskflow instead of implementing our own  16:56
<max_lobur> to make rollbacks possible in case the conductor has crashed  16:56
*** matty_dubs is now known as matty_dubs|lunch  16:56
<devananda> same thing applies to the ML thread about batching or coalescing requests  16:57
<max_lobur> right  16:57
<devananda> a third service which received the API requests and could coalesce them could then issue them all in a batch to the conductor  16:57
<devananda> as a single task  16:58
<lucasagomes> hmm  16:58
<lucasagomes> there's a service mistral I think  16:58
<lucasagomes> I think that's the name  16:58
<devananda> gantt, too  16:59
<lucasagomes> to provide task as a service or something like that  16:59
<lucasagomes> https://wiki.openstack.org/wiki/Mistral  16:59
<lucasagomes> I see  16:59
<devananda> that said, i'm looking quite far down the road  17:00
<max_lobur> so, right now, do you think it's possible/worth the time to try to implement it?  17:00
* devananda pulls his head out of the rabbit hole  17:00
<max_lobur> I was thinking about a rollback batch  17:00
<devananda> max_lobur: let's focus on interrupt for the moment  17:00
<devananda> without a separate service to coordinate things  17:00
<devananda> we will need to route the API request (interrupt what node XXX is doing) to the appropriate conductor  17:01
<devananda> - i think that can be done today  17:01
<max_lobur> devananda, yes  17:01
<devananda> and the greenthread that's doing $THING will need to be checking whether it has been signalled to stop  17:01
<max_lobur> using the reservation field right?  17:01
<devananda> well  17:01
<devananda> max_lobur: no  17:02
<devananda> max_lobur: the API service will use the hash_ring to find which conductor that $NODE is mapped to, then issue an RPC request to that conductor's bus  17:02
<max_lobur> ah, right  17:02
<devananda> there is a risk, if the ring changes, that the interrupt would not be possible  17:02
<devananda> because the message would be routed to a different conductor  17:03
<devananda> so we need to handle this and send back an error to the API  17:03
<devananda> we could use broadcasts... but i'm side tracking again  17:03
<devananda> within each interruptible method (eg, deploy)  17:04
<devananda> i see two options  17:04
<devananda> - do some "magic" in eventlet that will check for our interrupt signal between each LOC  17:04
<max_lobur> :)  17:05
<devananda> - add an explicit call at certain points in the method to check for the signal  17:05
*** jistr has quit IRC  17:05
<devananda> i prefer #2  17:05
<max_lobur> I like the second  17:05
* devananda mocks up something on pastebin  17:05
<max_lobur> devananda, lucasagomes so do we want only interruption, or interruption + rollback?  17:05
<lucasagomes> max_lobur, I would aim for interruption only at the moment  17:06
<lucasagomes> for rollback we would have to break the method doing the actual work  17:06
<lucasagomes> into atomic subclasses/methods  17:06
<lucasagomes> so we know how to roll back each class etc  17:06
<max_lobur> yea, rollback will probably introduce a lot of bugs as it won't be reliable, because of a possible conductor crash  17:07
<lucasagomes> I mean, both can be done but I would focus on the interruption first  17:07
<lucasagomes> and then if there's time we can start taking a look at rolling back  17:07
<lucasagomes> yea also, it's more things to test etc  17:07
*** linggao has quit IRC  17:07
<lucasagomes> s/each class/each task/g  17:08
<lucasagomes> max_lobur, +1  17:08
<max_lobur> I was thinking about rollback batches. E.g. each action like dd will add an opposite action to the rollback batch; then, if we want to cancel the task, we'll hit one of the checkpoints and run all the callables from the rollback batch in the opposite order  17:10
<max_lobur> but this will probably add a lot of code  17:10
<max_lobur> it's something like rewinding back  17:11
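A rough sketch of that rollback-batch idea, with hypothetical names: each completed action records its opposite, and cancelling replays the opposites in reverse order, like rewinding.

    class RollbackBatch(object):
        """Stack of 'opposite' actions, replayed in reverse on cancel."""

        def __init__(self):
            self._undo = []

        def record(self, undo_callable, *args, **kwargs):
            # Called right after an action (e.g. a dd) has succeeded.
            self._undo.append((undo_callable, args, kwargs))

        def rollback(self):
            # Replay the opposite actions in LIFO order -- the rewind.
            while self._undo:
                undo, args, kwargs = self._undo.pop()
                undo(*args, **kwargs)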
<devananda> http://paste.openstack.org/show/mSj9WnYdJvaICPCC2uvJ/  17:11
<devananda> including rollback -- but we can leave that out  17:11
<devananda> just an idea of how it might be done  17:11
* max_lobur looks  17:12
<devananda> max_lobur: the "if some_interrupt_was_received" step is not clear to me -- i'm hoping you have some insight into how we can do that with greenthreads  17:13
<devananda> max_lobur: eg, inject some change into a greenthread  17:13
<devananda> er, not what i meant to say  17:13
<max_lobur> hmm lemme think  17:15
<max_lobur> initially I thought about the DB  17:15
<max_lobur> but it will DDoS it  17:15
<devananda> yes  17:15
<devananda> won't scale  17:15
<max_lobur> need something better  17:15
<max_lobur> devananda, in your mockup the _rollback_second() will need to roll back first too, right?  17:16
<max_lobur> ah  17:16
<max_lobur> I see  17:16
<devananda> it does  17:16
<max_lobur> you called rollback first  17:16
<max_lobur> but what if we have first and second in different scopes  17:16
<max_lobur> in different methods  17:17
<devananda> so, as an aside, this is a very crude form of TaskFlow  17:17
<max_lobur> pls take a look at my comment about rollback batches (see above), what do you think?  17:18
<lucasagomes> devananda +1  17:18
<lucasagomes> max_lobur, https://github.com/openstack/cinder/blob/master/cinder/volume/flows/api/create_volume.py#L1678-L1681  17:18
* max_lobur looks how to pass something to a greenthread  17:18
<lucasagomes> https://github.com/openstack/cinder/blob/master/cinder/volume/flows/api/create_volume.py#L680-L698  17:18
<max_lobur> lucasagomes, right  17:19
<max_lobur> so they have execute and revert for each unit of work  17:19
<max_lobur> cool  17:20
<lucasagomes> I would just leave the rollback aside, or if we are going to implement it  17:20
<max_lobur> it's like DB migrations :)  17:20
<lucasagomes> let's not reinvent it and use something that does it already  17:20
<max_lobur> yes  17:20
<lucasagomes> max_lobur, exactly  17:20
<lucasagomes> like a transition  17:20
<max_lobur> yea this will require us to rewrite most of the code using the taskflow approach  17:21
<lucasagomes> yup  17:21
<devananda> so all that is doing a flow and handling rollback if the flow fails part-way  17:21
<devananda> what we're talking about is somewhat different  17:21
<devananda> sending a message to the flow FROM OUTSIDE to stop it  17:21
<devananda> harlowja_away: is ^ supported by taskflow today?  17:21
<lucasagomes> devananda, I think  17:22
<lucasagomes> I think that, hmm it would be just a trigger  17:22
<lucasagomes> I mean, the flow is running, each task has its execute() method  17:23
<lucasagomes> and the execute() method for each class can check whether the flow should stop or not  17:23
<lucasagomes> if yes, it raises an exception that will make taskflow roll it back  17:24
<lucasagomes> but I think it's too complex for now, I would go with a simple approach just to interrupt the current work  17:24
<lucasagomes> without rollback/taskflow etc  17:24
*** hemnafk is now known as hemna_  17:25
<lucasagomes> (note: I wasn't thinking about a third service managing the flow, I'm thinking about the conductor using the taskflow lib to create the flow inside it)  17:26
<max_lobur> so roughly, to send a signal to the thread we need to maintain some common collection of cancellation tokens - for example a dictionary global to each conductor process  17:27
<max_lobur> it should be {"greenthread_id": <True/False> - cancelled or not ...}  17:28
<max_lobur> then from within each check_signals() we'll need to get the current greenthread id  17:28
<max_lobur> go to that collection  17:28
<max_lobur> and look if it was cancelled  17:28
<max_lobur> this needs to be prototyped  17:29
<NobodyCam> max_lobur: are you thinking shared directory like nfs?  17:29
<max_lobur> since we will just read the collection from all the greenthreads it won't block  17:30
<max_lobur> and we'll write to it from only one thread  17:30
<max_lobur> that one which handles rpc requests  17:30
<max_lobur> well, not from one  17:30
<max_lobur> NobodyCam, no, I mean a usual Python dictionary, just global for a conductor  17:31
<max_lobur> somewhere on module level  17:31
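A sketch of that module-level collection, assuming eventlet greenthreads (the function and exception names here are made up):

    from eventlet import greenthread

    class TaskCancelled(Exception):
        pass

    # Global to the conductor process: greenthread id -> cancel requested?
    _CANCEL_REQUESTED = {}

    def register_current_thread():
        gt_id = id(greenthread.getcurrent())
        _CANCEL_REQUESTED[gt_id] = False
        return gt_id

    def request_cancel(gt_id):
        # Done by the greenthread that handles the RPC request.
        _CANCEL_REQUESTED[gt_id] = True

    def check_signals():
        # Called at each checkpoint inside the worker; a plain dict read,
        # so it never blocks the other greenthreads.
        if _CANCEL_REQUESTED.get(id(greenthread.getcurrent())):
            raise TaskCancelled()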
*** rloo__ has joined #openstack-ironic  17:31
*** rloo has quit IRC  17:31
<devananda> max_lobur: look at resource_manager.py  17:31
<NobodyCam> ahh /me misread :-p  17:31
*** rloo__ has quit IRC  17:32
<devananda> max_lobur: there is already a shared object for each node which maintains references back to every TaskManager which acquired that node  17:32
*** rloo has joined #openstack-ironic  17:32
<devananda> max_lobur: in the case of shared locks, there may be >1 greenthread in the conductor which has acquired the Node  17:32
<devananda> max_lobur: but only one with an exclusive lock  17:32
<devananda> max_lobur: i think this would be the right place to add a greenthread_id reference  17:33
<max_lobur> devananda, right, I'll take a look now  17:33
<max_lobur> we also will need to maintain greenthread_id<-->task_id somewhere, and maintain tasks in the db. So once the user has submitted a deployment, he will receive his task_id and will be able to use it to cancel the task, right? or are there other ways?  17:36
<max_lobur> how do we want to say exactly which background task we want to cancel? :)  17:36
<max_lobur> or will it be - cancel every active task for a particular node  17:37
<devananda> max_lobur: no  17:37
<devananda> max_lobur: we don't maintain a task id in the DB. and we don't support >1 task per node  17:37
<devananda> sorry for the confusion  17:37
<devananda> most tasks we use today require an exclusive lock  17:38
<NobodyCam> bbt...brb  17:38
<devananda> there are some which, in principle, may take a shared lock  17:38
<devananda> so we support shared locks too  17:38
<devananda> the exclusive lock is tracked in the DB -- just in case there is a node rebalance, the exclusive lock is maintained across conductors  17:38
<devananda> a shared lock is only tracked locally, within the memory state of the conductor. it doesn't need to be interrupted  17:39
<devananda> a shared lock is for eg. validate()  17:39
<devananda> exclusive is for eg. set_power_state or deploy  17:39
<devananda> max_lobur: so you'll want to find the greenthread_id of the task holding an exclusive lock  17:40
<max_lobur> yes :)  17:40
<max_lobur> I meant, for example, if the user submitted a deployment  17:40
<max_lobur> how will the command to cancel the deployment look?  17:41
<max_lobur> it will only have the node id right?  17:41
<devananda> yes  17:42
<devananda> actually this may not need to be exposed at all  17:42
<devananda> it should probably be internal to our RPC API  17:43
<devananda> or even just inside the conductor  17:43
<devananda> eg  17:43
<devananda> if a user wants to cancel the deploy they started, they should just issue undeploy  17:43
<devananda> yes?  17:43
<max_lobur> hmm  17:44
<devananda> if power_on is taking too long, what is more intuitive -- "ironic interrupt-node $NODE" or "ironic power-off $NODE"  17:44
<NobodyCam> ironic cancel-current-node-action $NODE  17:44
<max_lobur> I'd say interrupt  17:45
<max_lobur> NobodyCam, or so  17:45
<max_lobur> because if power on hangs  17:45
<lucasagomes> also, the interruption might be triggered internally  17:45
<lucasagomes> due to a timeout  17:45
*** matty_dubs|lunch is now known as matty_dubs  17:45
<max_lobur> why should I think that power off will succeed :)  17:45
<devananda> NobodyCam: think how this is going to be implemented in the nova driver  17:45
<devananda> nova may get a request to delete the instance which hasn't finished deploying yet  17:46
<devananda> because users are impatient :)  17:46
<max_lobur> lucasagomes, +  17:46
<max_lobur> hehe :)  17:46
<max_lobur> brb  17:46
<NobodyCam> post bbt walkies.. bbiafm  17:47
<lucasagomes> devananda, NobodyCam what needs to be set in nova.conf in order to use the ironic driver? http://paste.openstack.org/show/62119/  17:48
<lucasagomes> does that look correct? do I need to set something else  17:48
<lucasagomes> ?  17:48
<max_lobur> back  17:48
<devananda> lucasagomes: looks right??  17:49
<devananda> bbiafm  17:49
<lucasagomes> devananda, cheers :)  17:49
*** aignatov_ is now known as aignatov  17:50
* max_lobur saved the link to nova.conf :D  17:51
<lucasagomes> :)  17:52
*** jcooley_ has joined #openstack-ironic  17:53
*** harlowja_away is now known as harlowja  17:54
<harlowja> devananda u guys talking about flows :-P  17:55
<harlowja> devananda define outside :)  17:57
*** derekh has quit IRC  18:01
<max_lobur> harlowja, hi! yes we are :)  18:01
<harlowja> sweet  18:01
<harlowja> max_lobur from the above it seems u guys are building mini-workflows :-P  18:01
* harlowja maybe u are just discussing (not sure)  18:02
<max_lobur> harlowja, almost :) we're trying to make it possible to cancel background threads, using cancellation checkpoints set across the code  18:03
*** aignatov is now known as aignatov_  18:03
<max_lobur> we're not going to have rollbacks at this stage  18:03
<harlowja> k, so let me describe a little bit of how taskflow is doing this :)  18:04
<harlowja> it does have i think what u are describing  18:04
<harlowja> and it also has rollbacks  18:04
*** vkozhukalov has joined #openstack-ironic  18:04
<harlowja> so in taskflow, there's a concept of a task object (it has execute and revert methods)  18:05
<max_lobur> yep, lucasagomes posted an example  18:05
<harlowja> ah  18:05
<harlowja> u guys ahead of me :-P  18:05
<harlowja> haha  18:05
<max_lobur> heh :)  18:05
<harlowja> ok, so those are formed into larger structures (flows)  18:05
<harlowja> flows can describe data-flow dependencies or just other random dependencies (no cycles currently)  18:06
*** aignatov_ is now known as aignatov  18:06
<harlowja> all of that gets executed by an engine  18:06
<harlowja> the engine does a bunch of state-transitions  18:06
<harlowja> and manages the data-flow between tasks  18:07
<harlowja> it has suspend methods that are equivalent to your cancel  18:07
<harlowja> and at any time during running u can suspend it (which after the current tasks finish running, aka no preemption, the engine will stop running other tasks)  18:07
<devananda> me is back  18:07
<max_lobur> devananda, wb :)  18:08
<harlowja> so the suspending can be activated by another thread (if thats how this wants to be used)  18:08
<harlowja> u then also don't need 'cancellation checkpoints'  18:09
<max_lobur> harlowja, yea that's exactly what we need, but  18:09
<harlowja> since at every state-transition the engine does it will check if it has been suspended  18:09
<harlowja> *https://wiki.openstack.org/wiki/TaskFlow/States_of_Task_and_Flow  18:09
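A minimal sketch of what harlowja describes, using the taskflow API only approximately (the task names are invented): tasks carry execute/revert, get composed into a flow, and the engine running the flow can be suspended from another thread.

    from taskflow import engines
    from taskflow import task
    from taskflow.patterns import linear_flow

    class PowerOn(task.Task):
        def execute(self):
            pass    # turn the node on

        def revert(self, **kwargs):
            pass    # opposite action: power the node back off

    class WriteImage(task.Task):
        def execute(self):
            pass    # dd the image

        def revert(self, **kwargs):
            pass    # wipe the disk / clean up images, tftp config

    flow = linear_flow.Flow('deploy').add(PowerOn(), WriteImage())
    engine = engines.load(flow)
    # engine.run() in one thread; engine.suspend() from another makes the
    # engine stop after the currently running task finishes (no preemption).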
<max_lobur> at the current stage it may be overhead to reimplement all that we have using the taskflow framework  18:10
<max_lobur> harlowja, looks cool!  18:11
<devananda> harlowja: suspend ... or cancel?  18:11
<devananda> harlowja: we don't want to leave things dangling and resumable  18:11
<harlowja> max_lobur sure, of course, never easy to do refactoring :-P  18:11
<devananda> harlowja: this is for eg. an impatient user cancelled their deploy  18:11
<devananda> harlowja: or the deploy has stalled waiting for a callback from the hardware that never happened due to a network error  18:11
<devananda> harlowja: in either case, we want to stop the "task" at the next interruptible point  18:12
<harlowja> devananda so i think it would be fine to expose the immediate revert method then, which would stop then start reversion  18:12
<harlowja> thats currently not exposed as a public engine api, but could be  18:13
<devananda> max_lobur: also we do need some sort of cleanup  18:13
<devananda> max_lobur: even if it's not a full rollback, each method that can be interrupted needs to be able to be cleaned up  18:13
<lucasagomes> yea, taskflow looks promising for us  18:13
<devananda> harlowja: cool, thanks  18:13
<max_lobur> devananda, +  18:13
<harlowja> devananda np  18:13
<lucasagomes> but I don't think we will get it done for this cycle  18:13
<devananda> so folks - there's a lot of long-tail work we can see  18:13
<devananda> let's focus on the real bug in front of us :)  18:14
<lucasagomes> harlowja, thank you for the explanation  18:14
<harlowja> *isn't there always ;)  18:14
<lucasagomes> devananda, +2  18:14
<harlowja> lucasagomes np  18:14
*** martyntaylor has left #openstack-ironic  18:14
<devananda> https://bugs.launchpad.net/ironic/+bug/1270986  18:14
<devananda> and https://blueprints.launchpad.net/ironic/+spec/generic-timeouts  18:15
<devananda> and https://blueprints.launchpad.net/ironic/+spec/abort-deployment  18:15
<devananda> these are all related  18:16
<devananda> a deploy can be interrupted, broadly speaking, at three places  18:16
<devananda> *a deploy of the PXE driver ...  18:16
<devananda> during driver.deploy(), which finishes when the node powers on  18:17
<devananda> during the interim while the node PXE boots the deploy ramdisk and POSTs back  18:17
<devananda> during the driver.vendor._continue_deploy() phase, which dd's the image onto the node and power cycles it once again  18:17
<devananda> in step 2, there is currently nothing running inside of the ironic ConductorManager  18:18
<devananda> no lock held. nothing to interrupt.  18:18
<devananda> so we need a general timeout that is registered when deploy starts, can trigger at any point during all 3 stages, and is removed at the end of stage 3  18:20
<devananda> that timeout should fire off the same interrupt that a user could initiate  18:20
<devananda> and perform the same cleanup  18:20
<devananda> etc  18:20
<devananda> EOL  18:21
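A sketch of that general timeout, with hypothetical names: the deadline is registered when deploy starts, cleared at the end of stage 3, and a periodic task fires the same interrupt path a user-initiated cancel would.

    import time

    _DEADLINES = {}   # node_uuid -> absolute deadline, local to a conductor

    def register_timeout(node_uuid, seconds):
        # Called when deploy starts.
        _DEADLINES[node_uuid] = time.time() + seconds

    def clear_timeout(node_uuid):
        # Called at the end of stage 3.
        _DEADLINES.pop(node_uuid, None)

    def check_timeouts(interrupt_node):
        # Run from a periodic task. interrupt_node() should go through the
        # same path as a user cancel: power off + driver.deploy.clean_up().
        now = time.time()
        for node_uuid, deadline in list(_DEADLINES.items()):
            if now > deadline:
                clear_timeout(node_uuid)
                interrupt_node(node_uuid)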
<max_lobur> so at stages 1 and 3 we'll need to cancel the greenthread  18:22
<max_lobur> at 2 we'll need to clear the node state in the db and wipe the node, right?  18:22
*** hstimer has joined #openstack-ironic  18:22
* max_lobur meant the steps to interrupt deployment for each stage  18:23
<devananda> 1 and 3 need to interrupt a greenthread  18:24
<devananda> 2 and 3 need to power off the node  18:24
<devananda> all steps need to call driver.deploy.clean_up() to delete images, tftp config, etc  18:24
<max_lobur> currently we can't see from outside what stage we are at, right?  18:25
<max_lobur> e.g. our node state doesn't track the,  18:25
<max_lobur> *them  18:26
<devananda> we can look at node.provision_state  18:26
<devananda> yes  18:26
<devananda> actually no  18:27
<devananda> state == DEPLOYING for all 3  18:27
<max_lobur> sorry, I haven't dug into deployment much, I may ask stupid questions :)  18:27
<devananda> not a stupid question :)  18:28
<max_lobur> right, so once we distinguish between the stages of deployment  18:28
<max_lobur> we can take an appropriate action  18:28
<max_lobur> to interrupt it  18:28
<max_lobur> a periodic task may serve as a watchdog that will check timer expiration  18:29
<devananda> the cleanup logic should reside in the driver  18:29
<lucasagomes> maybe we should add a new state? for when the config files are just being created?  18:29
<lucasagomes> instead of using deploying for all of the tasks?  18:29
<devananda> and should be registered as a callback  18:29
<devananda> that's how it should track what to do for rollback, not by a dependency on node state  18:29
<devananda> lucasagomes: that way leads to problems  18:30
* devananda points to the pastebin he linked earlier  18:30
<devananda> http://paste.openstack.org/show/mSj9WnYdJvaICPCC2uvJ/  18:30
<devananda> note that all the logic for rolling back is encapsulated within do_long_thing  18:31
<devananda> at the interruptible points, a callback is set which knows how to clean up at that point  18:31
<devananda> i can flesh it out with deploy and continue_deploy if that helps make it clear  18:31
<NobodyCam> *flush even  18:32
<lucasagomes> hmm I see  18:32
<devananda> NobodyCam: no, flesh :) http://eggcorns.lascribe.net/english/349/flush/  18:34
<devananda> NobodyCam: unless we're chasing vermin and need to flush them out of hiding ;)  18:35
<max_lobur> :D  18:35
* devananda likes eggcorns  18:36
<NobodyCam> flesh: the soft substance consisting of muscle and fat that is found between the skin and bones of an animal or a human.  18:36
<NobodyCam> also (put weight on.)  18:37
<devananda> yep  18:37
<devananda> so my idea is too skinny. i need to flesh it out :)  18:37
<devananda> to add weight to it  18:37
<NobodyCam> ahh, /me learns new work usage  18:37
<lucasagomes> hah cool expression indeed  18:37
<NobodyCam> word even  18:37
<NobodyCam> :)  18:37
<devananda> max_lobur: so, i think we should implement the interrupt first  18:39
<NobodyCam> devananda: quick question on line # 4 for https://review.openstack.org/#/c/66461/4/elements/nova-ironic/os-refresh-config/configure.d/80-ironic-ssh-power-key  18:39
<max_lobur> ok, so I'll try to create a prototype of how we're going to cancel a greenthread that holds an exclusive lock  18:39
<max_lobur> we'll see how it looks  18:39
<NobodyCam> I added my nick to the todo but I'm not sure we can actually do it  18:39
<NobodyCam> images can be built outside of an ironic env. also ironic can support multiple power control modes  18:41
<NobodyCam> just wanted to check on your thought path when you initially added that  18:42
<devananda> max_lobur: steps seem to be 1) tell a greenthread to stop at the next checkpoint, 2) add some checkpoint hooks to conductor_manager.do_node_deploy and pxe.deploy.[prepare|deploy], and 3) have these checkpoints register a callback (it'll call pxe.deploy.clean_up, pxe.deploy.teardown, etc, depending on location)  18:42
<devananda> max_lobur: then we can add something to register timeouts and a periodic-task that checks timeouts and fires off interrupts  18:42
<devananda> max_lobur: sound about right?  18:42
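Step 3 of that plan might look roughly like this (a hypothetical helper, not existing code; pxe.deploy.clean_up etc. would be the real callbacks it registers): each checkpoint records the cleanup appropriate to where the deploy currently is, so an interrupt at that point knows what to undo.

    class Checkpoint(object):
        """Tracks the right cleanup callback for the deploy's current stage."""

        def __init__(self, cancel_requested):
            self._cancel_requested = cancel_requested   # injected predicate
            self._cleanup = None

        def reached(self, task, cleanup):
            # e.g. checkpoint.reached(task, pxe_deploy.clean_up)
            self._cleanup = cleanup
            if self._cancel_requested(task):
                self.interrupt(task)

        def interrupt(self, task):
            # Entered on a user cancel or from the timeout watchdog.
            if self._cleanup is not None:
                self._cleanup(task)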
<devananda> NobodyCam: looking  18:43
<devananda> NobodyCam: i don't see a problem with generating it all the time  18:43
<NobodyCam> it was the todo bail out I was questioning  18:44
<devananda> NobodyCam: the ssh power driver is part of trunk. it's unlikely that folks will deploy ironic and explicitly remove it from the codebase  18:44
<max_lobur> devananda, great plan, thanks for ordering things in my mind :D  18:44
<devananda> NobodyCam: right. I'd just remove the TODO  18:44
<NobodyCam> :) that's what I thought, just wanted to check and see if you had a plan I didn't think of  18:44
<NobodyCam> :)  18:44
<devananda> max_lobur: could you toss this up somewhere for future reference? Either on one of the existing BPs or send a summary to the ML  18:44
<max_lobur> I'll take a look for existing bps  18:45
<max_lobur> mailing threads become so long with time  18:45
<lucasagomes> maybe an etherpad?  18:46
<lucasagomes> devananda, http://paste.openstack.org/show/62126/  18:46
<devananda> https://blueprints.launchpad.net/ironic/+spec/generic-timeouts and https://blueprints.launchpad.net/ironic/+spec/abort-deployment and https://blueprints.launchpad.net/ironic/+spec/breaking-resource-locks  18:46
<devananda> we should probably consolidate those ...  18:46
<devananda> a bit of redundancy :)  18:46
<lucasagomes> devananda, I just looked at my nova.conf  18:47
<lucasagomes> it does have an entry  18:47
<lucasagomes> scheduler_driver = nova.scheduler.filter_scheduler.FilterScheduler  18:47
<max_lobur> devananda, seems we need a parent bp for those 3  18:49
<max_lobur> makes sense?  18:49
<devananda> max_lobur: or we invalidate one and leave the other 2  18:50
<devananda> actually, nvm  18:51
<max_lobur> which one?  18:51
<devananda> you're right  18:51
<max_lobur> deployment?  18:51
<max_lobur> ok :)  18:51
<devananda> breaking-resource-locks is not the same as abort-deployment  18:51
<devananda> it's also about breaking the lock of a dead conductor  18:51
<max_lobur> true  18:51
<max_lobur> I'll try to fill in the parent bp if you don't mind  18:53
<devananda> sure, thanks  18:56
<lucasagomes> devananda, NobodyCam have a minute? I'm trying to configure the ironic driver for nova with devstack (and implement it in devstack later, so that it will be used by our tests)  18:57
<lucasagomes> devananda, NobodyCam how would the scheduler pick an ironic node?  18:57
<devananda> lucasagomes: the nova-ironic driver exposes nodes to the scheduler  18:58
<lucasagomes> does this flow sound correct? register the node in ironic (with properties etc etc etc) -> create a flavor in nova -> issue nova boot  18:58
* max_lobur filling a bp  18:58
<NobodyCam> devananda: yes it does  18:58
<lucasagomes> devananda, right, via IronicHostManager?  18:58
<devananda> lucasagomes: you're missing a step: wait ~ 2 minutes for the nova scheduler to become aware of resources  18:58
<lucasagomes> devananda, ahnn  18:59
<lucasagomes> right right  18:59
<NobodyCam> yes there is a delay  18:59
<lucasagomes> devananda, is there any way I can check if it's aware of the resources?  18:59
<devananda> lucasagomes: no, via nova.ironic.driver:get_available_nodes  18:59
<devananda> lucasagomes: tail -f nova-compute.log  18:59
<devananda> you'll see the audit of available resources  18:59
<NobodyCam> devananda: ++  18:59
<devananda> when n-cpu becomes aware of the resources, it'll log it  18:59
<devananda> there is a way to check in the DB, but I forget it right now  19:00
<lucasagomes> devananda, NobodyCam gotcha  19:00
<lucasagomes> thanks  19:00
<NobodyCam> though I cheat and watch the ironic api log to see nova query for the node  19:00
<devananda> that ^ works too  19:00
<devananda> but watching for the resources in n-cpu is the same for baremetal and ironic :)  19:01
<NobodyCam> yes  19:01
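Putting the flow together as a rough shell sketch -- the exact client flags are only approximately those of clients from this era (compare the tripleo setup-baremetal script linked a bit further down), and the uuids are placeholders:

    # register the node and a port in ironic
    ironic node-create -d pxe_ssh
    ironic node-update $NODE_UUID add properties/cpus=1 \
        properties/memory_mb=512 properties/local_gb=10
    ironic port-create -n $NODE_UUID -a $MAC_ADDRESS

    # create a flavor and point it at the deploy kernel/ramdisk
    nova flavor-create baremetal auto 512 10 1
    nova flavor-key baremetal set "baremetal:deploy_kernel_id"=$DEPLOY_KERNEL_UUID
    nova flavor-key baremetal set "baremetal:deploy_ramdisk_id"=$DEPLOY_RAMDISK_UUID

    # wait ~2 minutes for the n-cpu resource audit, then
    nova boot --flavor baremetal --image $IMAGE_UUID test-node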
<matty_dubs> lucasagomes: Do you know if https://review.openstack.org/#/c/66925 is still needed for devstack+Ironic? Brant's comment sort of suggested otherwise, but I'm unsure.  19:08
<lucasagomes> right, I might have been doing something wrong cause it's not returning the nodes to nova /me will investigate  19:08
<lucasagomes> http://paste.openstack.org/show/62128/  19:09
<lucasagomes> matty_dubs, yea  19:09
<matty_dubs> thanks lucasagomes, will apply  19:09
<lucasagomes> matty_dubs, I will investigate whether we should add the version to the url  19:10
<lucasagomes> if it's not present  19:10
<lucasagomes> matty_dubs, but at the moment it's needed  19:10
<lucasagomes> NobodyCam, https://review.openstack.org/#/c/51328/12/nova/virt/ironic/driver.py L292  19:11
<lucasagomes> if power_state is None we are not going to use it  19:11
<lucasagomes> hmm  19:11
<NobodyCam> lucasagomes: yes a valid node will have a valid power state  19:12
<NobodyCam> only the fake driver will leave it at none  19:12
<NobodyCam> which is why I do ironic node-set-power-state $IRONIC_NODE_ID off  19:13
<lucasagomes> NobodyCam, right, we don't immediately set the power state when we create the node  19:13
<NobodyCam> for my testing with the fake driver  19:13
<lucasagomes> I see  19:13
<NobodyCam> ssh and ipmi will auto set it via a background task  19:13
<lucasagomes> yup yea  19:14
<lucasagomes> cheers :D  19:14
<NobodyCam> :)  19:14
<NobodyCam> brb make'n mo coffee  19:15
*** athomas has quit IRC  19:24
<lucasagomes> NobodyCam, can I submit reviews for the driver for nova? or do u prefer me to just comment on it?  19:31
<lucasagomes> https://review.openstack.org/#/c/51328/12/nova/virt/ironic/driver.py L563 causes it to fail, in unplug_vifs it's handling the wrong exception  19:32
<NobodyCam> humm in the try ?  19:33
<NobodyCam> ironic_exception.HTTPInternalServerError  19:33
<lucasagomes> NobodyCam, yup  19:34
<lucasagomes> related: https://review.openstack.org/#/c/68457/  19:34
<lucasagomes> http://paste.openstack.org/show/62130/  19:34
<max_lobur> devananda, we talked primarily about deployment cancellation. are there other similar tasks (firmware updates?)  19:34
<max_lobur> I'm thinking of the bp name cancel-long-running-tasks  19:34
<openstackgerrit> Devananda van der Veen proposed a change to openstack/ironic: Sanitize node.last_error message strings  https://review.openstack.org/64711  19:34
<max_lobur> node power on / off probably too  19:35
<max_lobur> if they hang  19:35
<devananda> max_lobur: deploy cancellation is the main need today. but yes, there will be others  19:35
<max_lobur> so is cancel-long-running-tasks ok?  19:36
<devananda> max_lobur: we already have a bp for generic-timeouts  19:36
<devananda> hm  19:37
<NobodyCam> humm do we have a keyerror  19:37
<max_lobur> its gist is "Individual timeouts should be configurable per-driver."  19:37
<max_lobur> As I understood  19:37
<devananda> yes  19:37
*** martyntaylor has joined #openstack-ironic  19:38
<max_lobur> maybe just task-cancellation  19:38
*** martyntaylor has left #openstack-ironic  19:38
<devananda> ah, so an umbrella might be make-tasks-interruptible  19:38
<devananda> or something  19:38
<devananda> yea  19:38
<lucasagomes> NobodyCam, actually it will be clientsideerror  19:38
<devananda> cause it's not just about long-running things, or timeouts  19:38
<max_lobur> make-tasks-interruptible sounds better  19:39
<lucasagomes> http code 400 instead of internalservererror http code 500  19:39
<devananda> max_lobur: ++  19:39
<max_lobur> ok, finishing  19:39
<lucasagomes> NobodyCam, I'll investigate that 1 sec  19:39
<NobodyCam> lucasagomes: ack  19:39
<openstackgerrit> Devananda van der Veen proposed a change to openstack/ironic: Sanitize node.last_error message strings  https://review.openstack.org/64711  19:40
<max_lobur> are we able to edit the bp body after it's created? I marked some unclear places with "?" and would like to remove them after one more round of discussion  19:41
<devananda> max_lobur: yes  19:42
<max_lobur> and is somebody other than the creator able to edit it  19:42
<openstackgerrit> Devananda van der Veen proposed a change to openstack/ironic: API: Add sample() method on Node  https://review.openstack.org/65536  19:42
<devananda> also, max_lobur, take a look at https://review.openstack.org/#/c/48198/2  19:44
<devananda> max_lobur: only the creator and PTL, i believe  19:44
<devananda> max_lobur: you can use the "specification URL" to point to a wiki or etherpad  19:44
<devananda> max_lobur: that is usually better than making many edits to the bp description  19:44
<max_lobur> right, will move the detailed description to an etherpad  19:45
<max_lobur> thx!  19:45
* max_lobur looking at the 48198  19:46
<devananda> yuriyz, max_lobur - we may want to revive https://review.openstack.org/#/c/48198/2 and incorporate this into the work max_lobur is starting for interruptions  19:46
*** mdurnosvistov has joined #openstack-ironic  19:46
* devananda trolls through the status:abandoned list  19:49
<devananda> erm, woops  19:49
<devananda> s/trolls/trawls/ :)  19:49
<max_lobur> :D  19:50
<lifeless> devananda: trolls sounds appropriate  19:50
<max_lobur> trolls may be acceptable too  19:50
<max_lobur> :D  19:50
<lifeless> o/  19:50
<NobodyCam> :)  19:50
<max_lobur> https://blueprints.launchpad.net/ironic/+spec/make-tasks-interruptible done  19:50
<NobodyCam> hey lifeless :)  19:51
<max_lobur> hi lifeless  19:51
<max_lobur> devananda, lucasagomes ^ the bp ref  19:51
<lucasagomes> max_lobur, cheers :D  19:52
<max_lobur> and I'm going to go home..  19:52
<devananda> max_lobur: thanks! g'night :)  19:52
<NobodyCam> have a good night max_lobur :)  19:52
<lucasagomes> max_lobur, g'night  19:52
<lucasagomes> NobodyCam, http://paste.openstack.org/show/62131/  19:53
<max_lobur> yep, see you tomorrow :)  19:53
<max_lobur> night Everyone!  19:53
*** max_lobur is now known as max_lobur_afk  19:53
* NobodyCam looks  19:53
<lucasagomes> NobodyCam, so, the kernel+ramdisk should be registered in nova?  19:54
<NobodyCam> lucasagomes: did you set deploy_kernel_id on the flavor?  19:54
<lucasagomes> NobodyCam, heh nop  19:54
<lucasagomes> I mean  19:54
<lucasagomes> I thought I would register it directly in ironic  19:54
<devananda> lucasagomes: looks like https://review.openstack.org/#/c/58266/2 may want to be restored?  19:54
<lucasagomes> devananda, ohhh +1  19:55
<lucasagomes> devananda, will restore that  19:55
<lucasagomes> devananda, thank u!  19:55
<lucasagomes> NobodyCam, do you have the command line you use to register the flavor handy?  19:56
<lucasagomes> I was just doing  19:56
<lucasagomes> nova flavor-create baremetal auto 512 10 1  19:56
<NobodyCam> lucasagomes: wait one sec  19:56
*** vkozhukalov has quit IRC  19:56
<NobodyCam> I may have spoke too soon  19:57
<lucasagomes> NobodyCam, right thanks  19:57
<NobodyCam> nope  19:57
<NobodyCam> lines 48 thru 52 of https://review.openstack.org/#/c/51328/12/nova/virt/ironic/ironic_driver_fields.py  19:58
<devananda> lucasagomes: same for https://review.openstack.org/#/c/61960/3  19:58
<NobodyCam> 'nova_object': 'flavor', 'object_field': 'extra_specs/baremetal:deploy_kernel_id'},  19:58
<lucasagomes> devananda, I think you already proposed a better solution for that  19:59
<devananda> ah, k  19:59
<lucasagomes> devananda, using the hashring to identify if the driver is available or not  19:59
<devananda> lucasagomes: ah yes, thanks!  20:00
<devananda> i should scroll all the way to the end of the bug report :)  20:00
<lucasagomes> ^^  20:00
<lucasagomes> NobodyCam, cheers  20:00
<NobodyCam> lucasagomes: https://github.com/openstack/tripleo-incubator/blob/master/scripts/setup-baremetal#L41-L44  20:00
<lucasagomes> NobodyCam, a-ha here we go!  20:01
<lucasagomes> ta much!  20:01
<lucasagomes> the pxe_root_gb I continue to set in ironic?  20:01
<NobodyCam> lucasagomes: lines 45-46 of the above ironic_driver_fields  20:02
<NobodyCam> that comes from the instance  20:02
<devananda> lucasagomes: pls see yuriyz' response on https://review.openstack.org/#/c/68018/2/ironic/api/controllers/v1/node.py  20:02
<devananda> lucasagomes: regarding your -1 on the patch  20:02
<lucasagomes> devananda, will take a look  20:02
<lucasagomes> NobodyCam, thanks  20:02
<openstackgerrit> Devananda van der Veen proposed a change to openstack/ironic: API validates driver name for both POST and PATCH  https://review.openstack.org/68018  20:03
<lucasagomes> devananda, hmm, true it makes sense, because the acquire will look at the driver right?  20:04
*** ndipanov has quit IRC  20:05
<devananda> max_lobur_afk: looks like this should be restored too: https://review.openstack.org/#/c/63904/  20:08
<devananda> lucasagomes: acquire?  20:08
<devananda> lucasagomes: ah. right.  20:08
<NobodyCam> brb  20:08
<devananda> lucasagomes: but no, i think because update_node isn't performed by the API for /any/ data  20:09
<lucasagomes> devananda, true  20:09
<devananda> lucasagomes: update_node will always need to be routed to a conductor. and if no conductor is currently alive and advertising support for that driver, there is no topic for it  20:09
<lucasagomes> devananda, true, but before, get_topic_for would use a generic topic  20:10
<lucasagomes> if the driver was invalid  20:10
<devananda> yes  20:10
<lucasagomes> so any conductor alive would be able to get that request and update the node  20:10
<devananda> which would allow any arbitrary conductor to work on it  20:10
<devananda> yep  20:10
<lucasagomes> independent of the driver it has  20:10
<lucasagomes> yea  20:10
<lucasagomes> hmm do you think it's a bad practice?  20:10
<devananda> which, if the conductor tried to load the driver (eg, in acquire()) would of course fail  20:10
<lucasagomes> right  20:11
<lucasagomes> that was what I was thinking  20:11
<lucasagomes> hmm  20:11
<devananda> iow, the old behavior was actually broken  20:12
<devananda> but since we aren't doing functional testing w/ multiple conductor instances w/ different drivers, we haven't hit that issue yet  20:12
<lucasagomes> ok fair, it was more an observation, cause it sounds a bit odd if a partial update fails because of an unrelated attribute  20:12
<devananda> well  20:12
<devananda> old way: update would fail for any attribute if driver not found  20:13
<devananda> but you'd get an error about driver-not-found even when updating node.properties  20:13
<devananda> new way: update will fail for any attribute if driver not found  20:13
<devananda> but you'll get a more sensible error :)  20:13
<lucasagomes> heh yea  20:13
<devananda> and the error is generated in the API, rather than in a random conductor  20:13
<lucasagomes> which is fair enough  20:14
<devananda> whether we should allow updates for nodes which have no active driver is a different question ;)  20:14
<lucasagomes> indeed  20:14
<lucasagomes> I thought the old way would allow us to update independent of whether the driver is available or not  20:15
<lucasagomes> but I might be wrong there  20:15
<devananda> i thought acquire() will load the driver  20:15
<devananda> and thus fail if eg, you try to update a node which has driver='foobar'  20:15
*** jdob has quit IRC  20:15
<lucasagomes> devananda, just took a look at the code, yea it will try to load the driver  20:17
<lucasagomes> and raise  20:17
<lucasagomes> raise exception.DriverNotFound(driver_name=driver_name)  20:17
<lucasagomes> in case it's not found  20:17
<devananda> ya  20:17
<lucasagomes> I was a bit confused because  20:17
<lucasagomes> in the acquire method we have the default argument  20:17
<lucasagomes> driver_name=None  20:17
<devananda> humm  20:17
<lucasagomes> but following it down, yea it will always try to load it  20:17
<lucasagomes>     :param driver_name: Name of Driver. Default: None  20:18
<lucasagomes> And then on the NodeManager:        driver_name = driver_name or self.node.get('driver')  20:18
<devananda> yes  20:19
<devananda> so  20:19
<lucasagomes> devananda, so yea the patch is correct, I will review it later  20:20
<lucasagomes> sorry for the confusion  20:20
<lucasagomes> (and for suggesting something that won't work)  20:20
<devananda> lucasagomes: http://git.openstack.org/cgit/openstack/ironic/commit/?id=425a4438  20:20
<devananda> I think this is to allow changing the driver_name  20:21
<devananda> and is needed if we want to move a node from one driver (on conductor A) to another driver (on conductor B)  20:21
<lucasagomes> right, hmm makes sense  20:23
<devananda> yea, that's it -- normally it uses self.node.get('driver'), except when a new driver_name is passed  20:24
<devananda> in which case the update_node method should have used the new driver_name when routing the RPC message  20:25
<devananda> and the conductor needs to acquire() the node with the new driver, not the current one  20:25
<devananda> :)  20:25
<devananda> s/:)// -- fingers movign too fast in the wrong window  20:26
<NobodyCam> for the next rev of the nova driver do you think I should switch 'except ironic_exception.HTTPInternalServerError' to 'except Exception'?  20:28
<lucasagomes> devananda, thanks!  20:28
<lucasagomes> http://paste.openstack.org/show/62134/  20:29
<lucasagomes> I've seen a lot of "Node is already locked by another process" when triggering the deploy  20:29
<lucasagomes> I think the periodic tasks are running too often  20:29
<lucasagomes> or we should somehow tell it to not acquire that node while it's being deployed  20:30
<NobodyCam> lucasagomes: do we check that the node's instance <> uuid?  20:31
<NobodyCam> that would tell us  20:31
<devananda> lucasagomes: oh, interesting  20:32
<lucasagomes> NobodyCam, +1 or the target_provision_state  20:32
<devananda> lucasagomes: we should put a retry in there  20:32
<lucasagomes> if it's active  20:32
<lucasagomes> devananda, +1  20:32
<devananda> retry in the nova-ironic driver  20:32
<devananda> since api calls may get transitory failures like this  20:32
<devananda> it's normal  20:32
<lucasagomes> cause the deploy might fail because the periodic task is running on the node  20:32
<lucasagomes> devananda, yup  20:32
<lucasagomes> we can check if the API returned conflict  20:33
<devananda> yep  20:33
<lucasagomes> and retry it  20:33
<devananda> yep  20:33
<devananda> it should only retry for certain HTTP codes, ofc  20:33
<lucasagomes> +1  20:33
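The retry might look something like this sketch, assuming the client exposes an HTTPConflict class alongside the HTTPInternalServerError mentioned above:

    import time

    from ironicclient import exc as ironic_exception  # assumed import path

    def call_with_retries(func, attempts=6, delay=2):
        # Retry only the transitory 409 ("node is already locked by
        # another process"); any other failure propagates immediately.
        for attempt in range(attempts):
            try:
                return func()
            except ironic_exception.HTTPConflict:
                if attempt == attempts - 1:
                    raise
                time.sleep(delay)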
<lucasagomes> NobodyCam, about the InternalServerError  20:33
<lucasagomes> it should be clienterror  20:33
<lucasagomes> cause in that case it's trying to remove an attribute that doesn't exist on that node  20:34
<lucasagomes> so the API should return 400 (client error)  20:34
<lucasagomes> bad request*  20:34
<devananda> i'm going to head home... afk for ~ an hour  20:34
<NobodyCam> :)  20:35
<lucasagomes> I'm going to take a break as well  20:35
<NobodyCam> devananda: did you go into the office today?  20:35
<devananda> NobodyCam: no. been working from the hill, tho  20:35
<NobodyCam> :) ahh  20:35
<NobodyCam> enjoy the walk (drive) home  20:35
<lucasagomes> NobodyCam, devananda g'night  20:35
<NobodyCam> night lucasagomes :)  20:35
<lucasagomes> devananda, safe drive back home  20:36
<lucasagomes> NobodyCam, thanks, and thanks for all the help with the deploy stuff  20:36
<NobodyCam> :)  20:36
<NobodyCam> lucasagomes: fyi I should have yet another nova driver up today  20:36
<lucasagomes> NobodyCam, good stuff  20:36
<lucasagomes> I will test that tomorrow  20:36
<NobodyCam> :)  20:36
<NobodyCam> :)  20:37
*** lucasagomes has quit IRC  20:37
<devananda> g'night, lucas!  20:40
<openstackgerrit> Jarrod Johnson proposed a change to stackforge/pyghmi: Fix keepalive behavior on broken Sessions  https://review.openstack.org/69967  20:43
<jbjohnso> I've moved on to long term testing with network failure injection to find issues  20:44
<jbjohnso> That one was interesting: an application with multiple sessions to multiple bmcs, with one of those sessions connecting successfully and then having connectivity lost, would cause it to lose its mind  20:45
<jbjohnso> the fun sorts of problems I get to enjoy by being more ambitious than ipmitool  20:45
<NobodyCam> :)  20:46
<jbjohnso> and the shortest git review I think I've ever done  20:46
*** coolsvap is now known as coolsvap_away  20:46
<NobodyCam> so if you have 5 connections and lose four, then the one that was still active goes wacko  20:46
<jbjohnso> NobodyCam, or if you had 5 concurrent sessions and lost just one  20:48
<jbjohnso> then the one bad one would basically starve the 4  20:48
<jbjohnso> well, except for SOL payloads  20:48
<NobodyCam> ahh :) good catch  20:48
<jbjohnso> which still carried on  20:48
<jbjohnso> but cpu usage would be high  20:48
<jbjohnso> man, going through two apache reverse proxy setups really visibly impacts the chunkiness of my remote console server  20:49
<jbjohnso> I should stop doing that..  20:49
<NobodyCam> lol  20:49
<jbjohnso> btw, was curious  20:50
<jbjohnso> so shellinabox is currently part of the strategy of things  20:50
<NobodyCam> add a squid cache that will help...LOL  20:50
<NobodyCam> :-p  20:50
<jbjohnso> so if I do show off this console server and I license it as Apache  20:50
<jbjohnso> how would you want to work the javascript logistics in the browser?  20:51
<jbjohnso> I used the shellinabox javascript sloppily (since I am a terrible web designer) with a trivial ajax-filled select and an iframe to change the console  20:52
<NobodyCam> browser like lynx?  20:52
<jbjohnso> ironically  20:52
<jbjohnso> lynx or links wouldn't work very well as a text console browser  20:52
<jbjohnso> since it doesn't do javascript  20:52
<jbjohnso> but you could run links  20:52
<jbjohnso> and then embed that in another browser  20:52
<jbjohnso> you just gave me an idea to demo the external application console plugin... what better console than lynx  20:53
*** rloo has quit IRC  20:54
*** rloo has joined #openstack-ironic  20:54
<NobodyCam> many times I do not even have access to gui browsers  20:54
<jbjohnso> well, all this stuff is catering to web people  20:54
<jbjohnso> the exact same console is available over non-http  20:54
<NobodyCam> so I think (as it is a text console) text browser support would be awesome  20:55
<jbjohnso> in fact, concurrent accesses from http and non-http can see each other type  20:55
<jbjohnso> it's just the non-http users don't notice latency as badly as the http users  20:55
<NobodyCam> ssh/telnet and http  20:55
<jbjohnso> well, it's a socket that's available over TLS or a unix domain socket  20:56
* NobodyCam remembers plugging 9600 bps modems into serial ports for remote console access  20:56
<jbjohnso> I remember real vt100s  20:57
<NobodyCam> try and find a computer with a db9(25) port now  20:57
<jbjohnso> a server or other?  20:57
<jbjohnso> server it's easy  20:57
* NobodyCam user ibm rs and hp mini's (mini took up a whole room :-p )  20:58
<NobodyCam> s/user/used/  20:58
<jbjohnso> I love how parallel cables are technically 'subminiature'  21:00
<jbjohnso> so tiny, miniature isn't small enough  21:00
<jbjohnso> and yet, gigantic by today's standards  21:01
<NobodyCam> :)  21:02
<openstackgerrit> A change was merged to stackforge/pyghmi: Fix keepalive behavior on broken Sessions  https://review.openstack.org/69967  21:05
*** jcooley_ has quit IRC  21:11
*** jcooley_ has joined #openstack-ironic  21:12
openstackgerritRuby Loo proposed a change to openstack/ironic: mock's return value for processutils.ssh_execute  https://review.openstack.org/6947921:15
NobodyCamlol got side tracked looking up the old debug command to setup mfm/rll drives ... congrats to seagate for keeping the info up. (fyi: ftp://ftp.seagate.com/techsuppt/controllers/st11m-r.txt)21:17
NobodyCamhow many people here remember mfm/rll hard drives21:17
* NobodyCam wounders21:17
rlooNobodyCam, what are you talking about? :-)21:25
NobodyCamlol :-P21:28
* NobodyCam feels old21:28
rloobut NobodyCam is young at heart!21:32
NobodyCam:)21:40
NobodyCamTy rloo21:40
jbjohnsosome people probably don't even remember IDE21:43
NobodyCamjbjohnso: lol wow am I that old... I missed punch cards but did use paper tape!21:45
jbjohnsoI vaguely remember messing with Xylogics controllers in Sun4 systems21:45
jbjohnsomainly because someone screwed around with them for no good reason and I spent a long time with tweezers unbending the pins on the controller21:46
jbjohnsosomeone wanted to reorganize the Sun4 VME bits into one super-duper Sun4 system with lots of disk in the student computer lab21:47
jbjohnsoI do remember that as the Sun4 systems went down one by one we salvaged parts; we ended up with a massive 128 MB of ram in one system, that was amazing...21:48
NobodyCamnice21:48
NobodyCamwe had a dec pdp-1121:49
jbjohnsoin that day, a 5U storage enclosure held precisely 1 drive: 540 Megabytes21:49
NobodyCamyep. I installed several on the hp I used to work on... >220 lbs each21:50
NobodyCamtook four of us21:50
jbjohnsothis is going to turn into a tech variant of the Four Yorkshiremen skit21:51
NobodyCamlol jb did you see the link I posted last night? Re: python bomb?21:53
NobodyCamhttp://www.urbandictionary.com/define.php?term=Python%20Bomb21:55
jbjohnsoheh21:57
jbjohnsohmm... wonder why it is the advertisements now decide to talk to me about lenovo so much...21:57
jbjohnsoguess the acquisition caused me to do things that the ad servers mistook for looking to buy lenovo rather than being bought by lenovo21:57
NobodyCamlol google is watching where you go!21:58
jbjohnsoI for one take some fascination in how tracking me changes my experience21:58
*** epim has joined #openstack-ironic21:58
jbjohnsook, I look up some auto part and I see ads for that auto part for a while, I'm with you...21:58
jbjohnsoI look at a site about speedboats, and then I see banner ads for *yachts*21:59
NobodyCamyep!21:59
jbjohnsoone, you've grossly overestimated my success in the world21:59
NobodyCamlol21:59
jbjohnsotwo, even if I were that successful, does a *banner* ad really make anyone buy a yacht...21:59
jbjohnso"oh yeah, might as well, where's the 'buy it now' button..."21:59
NobodyCamlol none of NobodyCam's credit cards would pass the "buy a Yacht now" check22:00
devanandafolks who want to land things --22:08
devanandain case you don't want to just "recheck no bug", this bug has been causing a lot of the failures: https://bugs.launchpad.net/nova/+bug/127338622:09
devanandapretty easy to tell if that's why jenkins failed it -- look in the test failure's kernel.log for this: http://paste.openstack.org/show/61869/22:10
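A small sketch of that triage step: pull the failed job's kernel.log and check for the signature from the paste. The log URL and signature string are placeholders to fill in, since neither is reproduced here.

    import urllib2  # Python 2, matching the tooling of the time

    LOG_URL = 'http://logs.openstack.org/REPLACE/ME/kernel.log'
    SIGNATURE = 'REPLACE WITH THE SIGNATURE FROM paste.openstack.org/show/61869'

    def hit_known_bug(log_url=LOG_URL, signature=SIGNATURE):
        """Return True if the known gate-bug signature appears in the log."""
        log = urllib2.urlopen(log_url).read()
        return signature in log

    if __name__ == '__main__':
        print('known bug -- recheck' if hit_known_bug()
              else 'different failure -- investigate')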
rloodevananda. so we should just keep doing 'recheck's until it wins the lottery?22:15
devanandarloo: probably not. that'll just further choke the queues. though, i'm doing that with the stevedore fix to try to unblock /everything/ else22:16
devanandarloo: but if you feel something needs to be rechecked/reverified, that bug seems to be causing most of our pain22:17
rloodevananda. ok, i think for important stuff (like the stevedore) it makes sense, but otherwise, I figured I'd just wait.22:17
rloodevananda. i'm even wondering whether it is worth reviewing right now.22:18
NobodyCambrb afternoon walkies22:22
*** aignatov is now known as aignatov_22:27
devanandaagordeev, romcheg - dont suppose either of you are around?22:34
devanandacould use your eyes on our tempest test suite22:34
devanandai suspect we're doing something we shouldn't, because we're hitting a neutron bug EVERY SINGLE TIME. and even Nova is not.22:34
devanandaahh22:36
devanandawe are testing tempest with neutron22:36
romchegdevananda: Hi, I'm semi-available here :)22:37
*** matty_dubs is now known as matty_dubs|gone22:38
NobodyCamHi romcheg :) How goes the revolution22:38
devanandaromcheg: https://review.openstack.org/7000122:38
romchegdevananda: But we need to test neutron integration as well...22:40
devanandaromcheg: today?22:40
romchegNo22:40
devanandaromcheg: right22:40
devanandaromcheg: and we can't land ANYTHING today because of bugs in neutron22:40
romchegNow it's not required22:40
romchegdevananda: that's what I was about to mention22:40
devanandaromcheg: even Nova does not have neutron enabled in their tempest tests22:40
devanandaromcheg: look at tempest-dsvm-full22:41
devanandafor example22:41
devanandait's not using neutron22:41
romchegYes, neutron's baas (bug as a service) works great. I had several problems with it while testing Ironic tests but they were not critical22:42
romchegI will support that patch to infra22:43
devanandaty22:47
* devananda wants to unblock our gate ...22:48
*** mrda_away is now known as mrda22:49
mrdamorning all22:49
NobodyCammorning mrda22:49
romchegNobodyCam: we had some good progress but there are a lot of things to do yet22:51
mrdahey NobodyCam, just wondering if you could +2, it's already approved, and it's passed check, it's just stuck and won't run gate until a core re-approves22:51
romchegNobodyCam: Ukraine drives to Europe http://cl.ly/image/3s222R1g3440 :)22:51
romchegNobodyCam: Thanks for your interest22:51
NobodyCamromcheg: :)22:51
mrdaSorry NobodyCam it's https://review.openstack.org/#/c/68852/22:51
NobodyCammrda: link22:51
NobodyCam:)22:51
mrdathnx22:52
NobodyCammrda: it is approved; you're held up on https://review.openstack.org/#/c/66078/222:52
devanandamrda: so. read scrollback if you want more history ...22:52
devanandamrda: tldr - gate has been stalled for a while. i'm trying to unblock it22:53
devanandamrda: ignore jenkins' -1's right now, too. same problem22:53
NobodyCamin this case the dep is not approved22:53
devanandamrda: also, rebase anything on top of 69495 if you want to have a chance of passing gate22:53
NobodyCamdevananda: so https://review.openstack.org/#/c/66078/2 is good to go now22:54
* devananda looks22:54
devanandayea22:54
mrdathanks, been reading scrollback - lots of discussion overnight22:54
devanandaas soon as gate is fixed :)22:54
NobodyCamahh I had already +2'd it22:55
devanandamrda: the discussion about interrupting a deploy is probably worth a read22:55
mrdadevananda: ok, thanks22:55
openstackgerritRuby Loo proposed a change to openstack/ironic: SSHPower driver raises IronicExceptions  https://review.openstack.org/6699023:00
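In the spirit of that review, a sketch of the pattern: catch the transport-level error and re-raise it as an Ironic exception so callers deal with one exception family. The helper name is made up, and the generic IronicException base is used purely for illustration; the actual patch may raise more specific classes.

    from ironic.common import exception
    from ironic.openstack.common import processutils

    def _run_ssh_command(ssh, cmd):
        """Illustrative helper: translate SSH failures into Ironic errors."""
        try:
            stdout, stderr = processutils.ssh_execute(ssh, cmd)
        except processutils.ProcessExecutionError as err:
            # Re-raise so callers never have to know about the SSH
            # transport underneath.
            raise exception.IronicException('ssh command failed: %s' % err)
        return stdout.strip()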
NobodyCamomg romcheg that's not near you is it?23:00
romchegThat's a burnt police bus in Kyiv23:00
NobodyCamwow23:00
devanandaromcheg: lol, nice ride :p23:01
romchegNobodyCam: Quite far from me but I hope people here in Kharkiv will become more active eventually...23:02
NobodyCamhttp://www.bbc.co.uk/news/world-europe-2588558823:03
devananda(not intending to be offensive. making light of very grave things is best, IMO, otherwise they are just depressing)23:03
NobodyCamthat does not look safe to /me23:03
NobodyCam:)23:03
romchegNobodyCam: the funniest one I've seen http://pbs.twimg.com/media/Be7W1zKCQAA0JpR.jpg:medium23:05
*** epim has quit IRC23:05
NobodyCamomg23:06
NobodyCam:)23:06
NobodyCami like this one: http://www.bbc.co.uk/news/world-europe-2592721223:06
*** rloo has quit IRC23:07
NobodyCamsorry it's the 5th picture23:07
*** rloo has joined #openstack-ironic23:08
NobodyCamthe link takes you to the first one23:08
*** rloo has quit IRC23:09
*** rloo has joined #openstack-ironic23:10
romchegNobodyCam: I think this one was in all world news http://www.popularresistance.org/wp-content/uploads/2013/12/Ukraine-man-playing-pianor-to-riot-police-Dec-7-2013.jpg23:11
NobodyCam:) romcheg lol Thats really good too23:11
*** rloo has quit IRC23:17
*** rloo has joined #openstack-ironic23:17
* NobodyCam wanders afk for a few min..23:22
*** mdurnosvistov has quit IRC23:24
*** epim has joined #openstack-ironic23:32
*** epim has quit IRC23:48
*** romcheg has left #openstack-ironic23:58
