*** dsneddon has joined #openstack-ironic | 00:21 | |
*** sdake has quit IRC | 00:26 | |
*** dsneddon has quit IRC | 00:26 | |
*** dustinc_ has joined #openstack-ironic | 00:29 | |
*** dustinc has quit IRC | 00:32 | |
*** dustinc_ has quit IRC | 00:33 | |
*** dsneddon has joined #openstack-ironic | 00:40 | |
*** sdake has joined #openstack-ironic | 00:46 | |
*** openstackgerrit has quit IRC | 00:52 | |
*** dustinc has joined #openstack-ironic | 00:53 | |
*** gyee has quit IRC | 01:07 | |
*** hwoarang has quit IRC | 01:09 | |
*** gyee has joined #openstack-ironic | 01:10 | |
*** hwoarang has joined #openstack-ironic | 01:13 | |
*** sdake has quit IRC | 01:14 | |
*** sdake has joined #openstack-ironic | 01:17 | |
*** sdake has quit IRC | 01:29 | |
*** sdake has joined #openstack-ironic | 01:32 | |
*** hamzy has joined #openstack-ironic | 01:39 | |
*** sdake has quit IRC | 01:39 | |
*** gyee has quit IRC | 01:47 | |
*** _fragatina has quit IRC | 02:02 | |
*** rloo has quit IRC | 02:40 | |
*** openstackgerrit has joined #openstack-ironic | 02:43 | |
openstackgerrit | Julia Kreger proposed openstack/ironic master: [WIP]: fast tracked deployment support https://review.openstack.org/635996 | 02:43 |
*** sdake has joined #openstack-ironic | 03:06 | |
*** hwoarang has quit IRC | 03:19 | |
*** hwoarang has joined #openstack-ironic | 03:20 | |
*** etingof has quit IRC | 03:41 | |
*** hwoarang has quit IRC | 04:00 | |
*** hwoarang has joined #openstack-ironic | 04:03 | |
*** sdake has quit IRC | 04:15 | |
*** penick has quit IRC | 05:24 | |
*** mkrai has joined #openstack-ironic | 05:37 | |
*** sdake has joined #openstack-ironic | 06:28 | |
*** jtomasek has joined #openstack-ironic | 06:32 | |
*** moshele has joined #openstack-ironic | 06:47 | |
*** gkadam has joined #openstack-ironic | 07:00 | |
*** sdake has quit IRC | 07:07 | |
*** hamdyk has joined #openstack-ironic | 07:24 | |
*** e0ne has joined #openstack-ironic | 07:36 | |
openstackgerrit | Merged openstack/ironic-python-agent master: Add secondary sorting by name when guessing root disk https://review.openstack.org/635239 | 07:41 |
*** rpittau has joined #openstack-ironic | 07:57 | |
arne_wiebalck | good morning ironic! | 07:57 |
rpittau | good morning ironic! o/ | 07:57 |
rpittau | first! xD | 07:58 |
arne_wiebalck | :-D | 07:59 |
*** etingof has joined #openstack-ironic | 08:01 | |
dtantsur | morning ironic | 08:17 |
rpittau | hey dtantsur :) | 08:18 |
*** amoralej|off is now known as amoralej | 08:40 | |
*** early has quit IRC | 08:54 | |
*** early has joined #openstack-ironic | 08:56 | |
iurygregory | morning everyone o/ | 08:57 |
*** fdegir has joined #openstack-ironic | 08:57 | |
rpittau | hi iurygregory :) | 08:58 |
*** tssurya has joined #openstack-ironic | 09:00 | |
*** early has quit IRC | 09:06 | |
*** e0ne has quit IRC | 09:06 | |
*** e0ne has joined #openstack-ironic | 09:07 | |
*** dougsz has joined #openstack-ironic | 09:07 | |
*** early has joined #openstack-ironic | 09:15 | |
*** sunnaichuan has joined #openstack-ironic | 09:16 | |
openstackgerrit | Hamdy Khader proposed openstack/python-ironicclient master: Add is-smartnic port attribute to port command https://review.openstack.org/629449 | 09:20 |
openstackgerrit | Hamdy Khader proposed openstack/python-ironicclient master: Add 'hostname' to port's local link connection https://review.openstack.org/628773 | 09:20 |
*** derekh has joined #openstack-ironic | 09:29 | |
openstackgerrit | Hamdy Khader proposed openstack/ironic master: [Follow Up] Expose is_smartnic in port API https://review.openstack.org/636575 | 09:30 |
*** e0ne has quit IRC | 09:34 | |
*** mkrai has quit IRC | 10:14 | |
*** ijpascual has joined #openstack-ironic | 10:19 | |
ijpascual | Hi, I am trying to deploy baremetal with ironic/bifrost on a Dell PowerEdge R640 server. Apparently, the RAID controller PERC H740P is not supported by IPA. Is this right? I have seen a commit from rpioso last year solving this issue, but I cannot get it working. | 10:27 |
*** MattMan has quit IRC | 10:28 | |
*** MattMan has joined #openstack-ironic | 10:29 | |
*** e0ne has joined #openstack-ironic | 10:46 | |
*** e0ne_ has joined #openstack-ironic | 10:50 | |
*** e0ne has quit IRC | 10:52 | |
openstackgerrit | Arkady Shtempler proposed openstack/ironic-tempest-plugin master: New test where instances are BM guest and VM https://review.openstack.org/636598 | 11:38 |
*** rpittau has quit IRC | 11:52 | |
*** e0ne has joined #openstack-ironic | 11:56 | |
*** e0ne has quit IRC | 11:57 | |
*** e0ne_ has quit IRC | 11:57 | |
*** hamzy has quit IRC | 11:59 | |
*** sdake has joined #openstack-ironic | 12:03 | |
*** e0ne has joined #openstack-ironic | 12:08 | |
*** _fragatina has joined #openstack-ironic | 12:11 | |
*** e0ne has quit IRC | 12:12 | |
*** _fragatina_ has joined #openstack-ironic | 12:12 | |
*** hamzy has joined #openstack-ironic | 12:13 | |
*** _fragatina has quit IRC | 12:16 | |
*** sdake has quit IRC | 12:16 | |
*** dougsz has quit IRC | 12:18 | |
*** e0ne has joined #openstack-ironic | 12:22 | |
*** rpittau has joined #openstack-ironic | 12:29 | |
*** dougsz has joined #openstack-ironic | 12:59 | |
larsks | I keep running into the situation in which baremetal nodes fail to boot because they're not getting a response from the dhcp server. The dhcp requests are reaching the host, the neutron-managed dnsmasq server is running, but it doesn't appear to be seeing the requests and it's not sending any replies. The configuration in /var/lib/ironic/*boot seems fine. | 12:59 |
larsks | Does this ring a bell for anybody? | 12:59 |
*** arne_wiebalck_ has joined #openstack-ironic | 13:00 | |
*** amoralej is now known as amoralej|lunch | 13:06 | |
*** trown|outtypewww is now known as trown | 13:06 | |
*** rh-jelabarre has joined #openstack-ironic | 13:06 | |
*** e0ne has quit IRC | 13:11 | |
openstackgerrit | Iury Gregory Melo Ferreira proposed openstack/ironic-tempest-plugin master: Run all defined jobs defined in check and gate https://review.openstack.org/636169 | 13:12 |
*** e0ne has joined #openstack-ironic | 13:17 | |
dtantsur | larsks: the causes can vary from invalid MACs on ports to bugs in Neutron :) | 13:20 |
dtantsur | the most common is networking misconfiguration though | 13:20 |
dtantsur | I guess tcpdumping DHCP traffic is the first reasonable step | 13:20 |
*** e0ne_ has joined #openstack-ironic | 13:20 | |
larsks | That's how I know the requests are making it to the host :). | 13:20 |
larsks | But yeah, I'm looking at neutron right now. | 13:20 |
*** e0ne has quit IRC | 13:23 | |
rpioso | ijpascual: I generally recall the issue. Please link the commit you mentioned. | 13:23 |
dtantsur | larsks: try finding its host file (dunno where it is on your system, maybe something /var/lib/neutron/..) | 13:26 |
dtantsur | that will at least tell you if neutron has prepared the DHCP environment right | 13:26 |
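dtantsur's suggestion above (check whether neutron wrote a dnsmasq host entry for the port) can be scripted. A hedged sketch: the state directory layout varies by distro, so it is taken as a parameter, and `find_mac_in_hostfiles` is an invented helper name, not neutron code.

```python
import glob
import os

def find_mac_in_hostfiles(basedir, mac):
    """Search dnsmasq 'host' files below a DHCP agent state directory
    for a MAC address.  An empty result suggests neutron never wrote
    a DHCP reservation for that port."""
    mac = mac.lower()
    matches = []
    # Neutron typically keeps one directory per network, each holding
    # a dnsmasq 'host' file with MAC,hostname,IP entries.
    for path in glob.glob(os.path.join(basedir, "*", "host")):
        with open(path) as f:
            if mac in f.read().lower():
                matches.append(path)
    return matches
```

If the port's MAC is missing from every host file, neutron never prepared the DHCP environment, which matches the "requests reach the host but dnsmasq never replies" symptom.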
e0ne_ | I'd appreciate it if anybody can review https://review.openstack.org/#/c/635493/ | 13:30 |
patchbot | patch 635493 - ironic-ui - Add ironic-ui integration tests - 2 patch sets | 13:30 |
*** root has joined #openstack-ironic | 13:33 | |
*** root is now known as w14161_1 | 13:33 | |
jroll | morning ironic | 13:37 |
w14161_1 | After a reboot, I need to re-run stack.sh before ironic commands work. Is there any way to run an ironic command directly without re-running stack.sh after a reboot or shutdown? For example, "openstack baremetal xxx" works after stack.sh runs successfully, but after a reboot "openstack baremetal xxx" fails with something like "HTTP 400"; if I re-run stack.sh, then everything is fine. | 13:38 |
*** e0ne_ has quit IRC | 13:40 | |
*** e0ne has joined #openstack-ironic | 13:45 | |
*** mjturek has joined #openstack-ironic | 13:46 | |
*** dnuka has joined #openstack-ironic | 13:47 | |
dnuka | good morning ironic | 13:47 |
*** e0ne_ has joined #openstack-ironic | 13:50 | |
*** e0ne has quit IRC | 13:51 | |
*** mjturek has quit IRC | 13:51 | |
*** rloo has joined #openstack-ironic | 13:52 | |
*** arne_wiebalck_ has quit IRC | 13:54 | |
larsks | dtantsur: here's a more ironic-related question: after rebooting the controller, we have working networking. Nodes going into the 'clean' state are successfully booting the ipa image, but then instead of performing the cleaning process they seem to hang around indefinitely in "clean wait" state. ipa on the node console is looping over "heartbeat successful/waiting for next heartbeat" messages. | 13:54 |
*** mjturek has joined #openstack-ironic | 13:55 | |
larsks | I don't see any errors in the logs; what could be going on here? | 13:55 |
*** mjturek has quit IRC | 13:59 | |
*** mjturek has joined #openstack-ironic | 14:00 | |
*** mjturek has quit IRC | 14:01 | |
*** mjturek has joined #openstack-ironic | 14:02 | |
*** arne_wiebalck_ has joined #openstack-ironic | 14:03 | |
*** e0ne_ has quit IRC | 14:04 | |
mjturek | morning dnuka | 14:05 |
dnuka | Hey mjturek o/ | 14:06 |
*** sdake has joined #openstack-ironic | 14:08 | |
*** sdake has quit IRC | 14:09 | |
*** sdake has joined #openstack-ironic | 14:10 | |
*** sdake has quit IRC | 14:13 | |
*** sdake has joined #openstack-ironic | 14:13 | |
*** tzumainn has joined #openstack-ironic | 14:14 | |
ijpascual | rpioso I refer to this one https://review.openstack.org/#/c/545184/ | 14:14 |
patchbot | patch 545184 - ironic - Fix iDRAC hardware type does not work with UEFI (MERGED) - 9 patch sets | 14:14 |
ijpascual | rpioso for some reason when deploying it is not able to locate the RAID virtual disk | 14:15 |
dtantsur | larsks: check the nodes are not in maintenance | 14:18 |
larsks | dtantsur: they are not, they are in "clean wait" state. | 14:19 |
dtantsur | morning jroll, dnuka | 14:19 |
dtantsur | larsks: well, the provision states are orthogonal to maintenance mode, you can be in both | 14:19 |
larsks | Oh, I see. Yes, they are in maintenance now. | 14:19 |
larsks | Does that happen automatically on a failure? | 14:20 |
dtantsur | larsks: it can happen for various reasons. check the maintenance_reason field on a node. | 14:20 |
dnuka | hi dtantsur :) | 14:20 |
larsks | dtantsur: "Timeout while cleaning" | 14:20 |
dnuka | morning jroll o/ | 14:21 |
dtantsur | larsks: okay, so previous cleaning failed. try moving the nodes out of maintenance and waiting a bit (if IPA still heartbeats) | 14:21 |
larsks | Let me run node maintenance unset and see what happens... | 14:21 |
*** sdake has quit IRC | 14:23 | |
larsks | dtantsur: that was it; maintenance was set because the networking issue earlier prevented the nodes from booting. Thanks. | 14:23 |
larsks | They seem to have all completed cleaning this time. | 14:24 |
dtantsur | cool! | 14:24 |
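The failure mode larsks just hit (a cleaning timeout sets maintenance, after which nodes heartbeat forever in "clean wait") can be captured in a small triage helper. The field names follow the Bare Metal API, but the helper itself is illustrative, not ironic code:

```python
def nodes_needing_maintenance_unset(nodes):
    """Pick out nodes parked in 'clean wait' with maintenance set:
    cleaning will never resume for them until maintenance is unset,
    no matter how faithfully IPA keeps heartbeating."""
    return [
        n["uuid"]
        for n in nodes
        if n["provision_state"] == "clean wait" and n["maintenance"]
    ]
```

For each UUID returned, checking `maintenance_reason` first (as dtantsur suggested) tells you whether an unset-and-retry is safe.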
larsks | Now to retry the boot-from-volume which is what I was originally working on :) | 14:24 |
*** arne_wiebalck_ has quit IRC | 14:24 | |
dnuka | dtantsur, thanks for the review, you guessed it right :) it happened when fixing the merge conflict. I will fix it. | 14:24 |
TheJulia | good morning | 14:25 |
* TheJulia sips coffee trying to wake up | 14:25 | |
dtantsur | morning TheJulia | 14:26 |
dtantsur | dnuka: happens :) | 14:26 |
dnuka | Hey TheJulia , good morning | 14:26 |
rpittau | hi TheJulia :) | 14:29 |
*** e0ne has joined #openstack-ironic | 14:29 | |
dnuka | hey rpittau o/ | 14:29 |
rpittau | hi dnuka :) | 14:30 |
*** e0ne_ has joined #openstack-ironic | 14:35 | |
*** e0ne has quit IRC | 14:35 | |
TheJulia | hjensas: by chance, will you be able to rebase https://review.openstack.org/#/c/631946 in the next day or two? | 14:37 |
patchbot | patch 631946 - ironic - API - Implement /events endpoint - 12 patch sets | 14:37 |
TheJulia | hamdyk: There are two smartnic related patches for python-ironicclient that need to be updated. Since the API merged, it would be good to revise those | 14:38 |
*** e0ne_ has quit IRC | 14:41 | |
iurygregory | morning TheJulia o/ | 14:41 |
dnuka | hi iurygregory o/ | 14:41 |
iurygregory | dnuka, o/ | 14:41 |
*** _fragatina_ has quit IRC | 14:42 | |
*** amoralej|lunch is now known as amoralej | 14:42 | |
openstackgerrit | Merged openstack/ironic master: [Follow Up] Expose is_smartnic in port API https://review.openstack.org/636575 | 14:43 |
*** hjensas has quit IRC | 14:46 | |
*** sthussey has joined #openstack-ironic | 14:47 | |
TheJulia | So if any cores could take about 45 minutes and review https://review.openstack.org/#/c/633052/, it would be greatly appreciated... else I'll have a grumpy dtantsur and I really don't want a grumpy dtantsur :) | 14:48 |
patchbot | patch 633052 - ironic - Support using JSON-RPC instead of oslo.messaging - 25 patch sets | 14:48 |
dtantsur | nobody wants a grumpy dtantsur :) | 14:49 |
dnuka | :) | 14:49 |
rpioso | ijpascual: That commit didn't resolve the issue you described. Rather, it added support for UEFI boot mode to the idrac h/w type (driver). | 14:50 |
*** e0ne has joined #openstack-ironic | 14:50 | |
dnuka | hey rpioso o/ | 14:50 |
rpioso | dnuka: Good morning :) | 14:51 |
dnuka | :) | 14:51 |
rpioso | ijpascual: I vaguely recall an issue testing against a PERC; however, I believe it was an H730. If memory serves, the CentOS ramdisk didn't have the driver. | 14:53 |
openstackgerrit | Mark Beierl proposed openstack/ironic master: Doc Updates to Dell iDrac Driver https://review.openstack.org/636644 | 14:54 |
* etingof is curious, does grumpy dtantsur look like ironic bear? | 14:54 | |
ijpascual | I see rpioso, thanks for the input! Which driver do you mean? | 14:55 |
*** e0ne_ has joined #openstack-ironic | 14:55 | |
dnuka | hey etingof o/ | 14:55 |
rpioso | I can confirm that in a downstream Queens-based release, IPA supports H740P. | 14:55 |
*** e0ne has quit IRC | 14:55 | |
e0ne_ | TheJulia: hi. could you please review https://review.openstack.org/#/c/635493/ once you have time? it's a simple set of ironic-ui integration tests to verify that it works with the latest horizon | 14:57 |
patchbot | patch 635493 - ironic-ui - Add ironic-ui integration tests - 2 patch sets | 14:57 |
dtantsur | etingof: grumpy dtantsur looks like this http://mirchudes.net/uploads/posts/2017-04/1493417242_manul.jpg | 14:58 |
TheJulia | e0ne_: done | 14:58 |
TheJulia | e0ne_: thanks! | 14:58 |
e0ne_ | TheJulia: thank you for the review! | 14:59 |
*** penick has joined #openstack-ironic | 14:59 | |
TheJulia | o/ penick | 14:59 |
rpioso | ijpascual: The ramdisk may contain an older megaraid_sas driver. | 15:01 |
etingof | that looks like the very last second before the catastrophic cat explosion | 15:01 |
etingof | dnuka, o/ | 15:01 |
rpioso | ijpascua: If you could get the dmesg output from the ramdisk, we could know for certain. | 15:02 |
larsks | I'm trying to 'openstack server reboot' a baremetal node. The request is getting as far as ironic-conductor, which says: "RPC change_node_power_state called for node 015954fa-c900-4798-8c04-808a1504fe35. The desired new state is rebooting." | 15:02 |
larsks | ...but it doesn't seem to be able to acquire an exclusive lock. | 15:02 |
rpioso | ijpascual: ^^^ | 15:02 |
larsks | What could be holding the lock and preventing the power state change? | 15:02 |
ijpascual | rpioso alright, I will grab a dmesg log and comeback :) thanks for your help | 15:03 |
rpioso | ijpascual: You're welcome! | 15:03 |
arne_wiebalck | etingof: For my education, what’s the reason/benefit behind failing ‘sloppy’ power sync nodes faster (and how much faster is faster)? | 15:04 |
larsks | ...and can I cancel the pending power state change and just reset the power myself? | 15:05 |
dnuka | hey arne_wiebalck o/ | 15:07 |
*** baha has joined #openstack-ironic | 15:07 | |
arne_wiebalck | hey dnuka o/ | 15:07 |
etingof | arne_wiebalck, oh, sorry, I forgot to respond in gerrit! I can only blame my coffee reserves exhaustion | 15:07 |
arne_wiebalck | etingof: no worries, I was just curious :) | 15:08 |
*** moshele has quit IRC | 15:09 | |
etingof | arne_wiebalck, so the idea is that if we have too many nodes to walk over them all in time, the unstable nodes would get better attention... | 15:09 |
penick | o/ Julia | 15:10 |
penick | hey hey :) | 15:10 |
arne_wiebalck | etingof: aren’t they all scanned (sequentially)? | 15:10 |
penick | Also howdy, arne_wiebalck! | 15:10 |
etingof | arne_wiebalck, so we'd hopefully fail them fast and get back on track with the rest of nodes | 15:10 |
arne_wiebalck | hey penick o/ | 15:10 |
openstackgerrit | Merged openstack/ironic-ui master: Add ironic-ui integration tests https://review.openstack.org/635493 | 15:11 |
etingof | arne_wiebalck, the nodes are power-synced sequentially in 8 threads (by default) | 15:11 |
dtantsur | penick: \o | 15:12 |
arne_wiebalck | etingof: so the sloppy nodes would basically make it to the top of the list | 15:12 |
penick | Hey dtantsur! | 15:12 |
etingof | arne_wiebalck, exactly, that's the idea | 15:12 |
openstackgerrit | Merged openstack/sushy-tools master: Add configurable libvirt firmware https://review.openstack.org/620605 | 15:12 |
arne_wiebalck | etingof: what I was wondering is how going through the list compares to the power scan interval | 15:12 |
arne_wiebalck | etingof: there would only be a speed-up when the scanning takes much longer than the configured interval, no? | 15:14 |
etingof | arne_wiebalck, I am thinking of the situation when we have so many nodes that they can't be power-synced within the periodic job run interval | 15:14 |
arne_wiebalck | etingof: right, that’s my thinking | 15:15 |
etingof | arne_wiebalck, in that case the next periodic job won't be started | 15:15 |
arne_wiebalck | etingof: got it | 15:15 |
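The scheme etingof describes (sloppy nodes sort to the top of the list, checks run in a bounded pool of 8 workers by default) can be sketched as below. The function names and data shapes are invented for illustration; ironic's real periodic task does considerably more.

```python
from concurrent.futures import ThreadPoolExecutor

def power_sync_pass(nodes, check_power, failures, workers=8):
    """One periodic power-sync pass.  Nodes that failed last time sort
    to the front, so even if the pass cannot cover the whole fleet
    within the interval, unstable nodes still get attention.
    'failures' maps node -> consecutive failure count (updated in place)."""
    ordered = sorted(nodes, key=lambda n: -failures.get(n, 0))

    def sync_one(node):
        try:
            check_power(node)        # e.g. a BMC power-status query
            failures.pop(node, None)
        except Exception:
            failures[node] = failures.get(node, 0) + 1

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(sync_one, ordered))
    return ordered
```

Note the point made above: if one pass overruns the interval, the next periodic run simply does not start, so prioritizing unstable nodes only helps within a single pass.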
ijpascual | I have the log rpioso, any preference on where I should upload it? | 15:15 |
*** hamdyk has quit IRC | 15:15 | |
openstackgerrit | Bob Fournier proposed openstack/ironic-inspector master: For ironic_inspector database, query using node_uuid only when adding entry https://review.openstack.org/636650 | 15:16 |
etingof | arne_wiebalck, I wonder if you have run into such a situation...? would be interesting to know if that trick makes any real sense... | 15:16 |
arne_wiebalck | etingof: we have currently around 1700 nodes | 15:16 |
arne_wiebalck | etingof: and the interval is set to 300 | 15:17 |
rpioso | ijpascual: Preferably in a form that doesn't require a download nor email attachment. | 15:17 |
etingof | arne_wiebalck, would you consider trying the latest (unreleased) ironic which does parallel power syncs? that might let you reduce power sync interval, hopefully | 15:18 |
* etingof considers that being a scientific experiment as opposed to testing software on the end users | 15:19 | |
* arne_wiebalck is thinking about it | 15:20 | |
arne_wiebalck | etingof: I’m not sure we’d need a power sync faster than 5 mins | 15:20 |
rpioso | dtantsur: Is there a preferred way to share dmesg output on the channel? | 15:20 |
dtantsur | rpioso: paste.openstack.org? | 15:21 |
arne_wiebalck | etingof: however, we will be integrating quite some additional nodes into ironic in the next months | 15:21 |
arne_wiebalck | etingof: so we may hit the 300 secs | 15:21 |
rpioso | ijpascual: ^^^ paste.openstack.org works for me :) | 15:21 |
rpioso | dtantsur: Thank you :) | 15:21 |
arne_wiebalck | arne_wiebalck: time(operator notices a node has power sync failures) >> 300 :-D | 15:23 |
etingof | arne_wiebalck, right, I'd be interested in knowing your experience with multi-threading power syncs | 15:23 |
TheJulia | arne_wiebalck: we've often gotten complaints from deployments where they are dealing with 2000-5000+ nodes and start encountering power sync issues. | 15:24 |
* TheJulia notices arne_wiebalck talking to himself and thinks he will fit right in to ironic. :) | 15:24 |
arne_wiebalck | TheJulia: definitely the root of all evil | 15:24 |
TheJulia | thinks | 15:24 |
arne_wiebalck | arne_wiebalck: power sync I mean :-D | 15:24 |
TheJulia | hehe | 15:24 |
arne_wiebalck | TheJulia: not TheJulia :-D | 15:25 |
arne_wiebalck | TheJulia: this is with the default interval? | 15:25 |
* etingof wonders when BMCs learn to emit power state change notifications | 15:25 | |
arne_wiebalck | TheJulia: and the customers require to know on a 60secs basis? | 15:25 |
arne_wiebalck | TheJulia: or they merely see power syncs run into each other? | 15:26 |
arne_wiebalck | TheJulia: and don’t really care about how quickly power sync failures are noticed | 15:26 |
TheJulia | arne_wiebalck: yeah, I remember speaking with an operator that was doing 10-15 minutes and did encounter an issue, I just don't precisely remember what it was | 15:26 |
TheJulia | they were seeing syncs basically never completing I believe | 15:27 |
arne_wiebalck | etingof: we can certainly give it a try and see how that changes the total time to do the power sync | 15:27 |
etingof | arne_wiebalck, would be awesome | 15:27 |
etingof | arne_wiebalck, prior to the latest changes, single-threaded ipmi call against a dead BMC could last for up to 300 sec | 15:30 |
ijpascual | rpioso hope this is ok as well: https://pastebin.com/wiL2yd0b | 15:30 |
ijpascual | Next time i will use the openstack one ;) | 15:31 |
etingof | arne_wiebalck, if we have a handful of dead BMCs, power information becomes quite outdated for the rest of the nodes | 15:31 |
arne_wiebalck | etingof: the power sync gets stuck? | 15:32 |
arne_wiebalck | etingof: ah, sorry missed your previous message | 15:33 |
etingof | arne_wiebalck, not fully stuck, but significantly delayed. e.g. ironic could block on dead BMCs for 300 * deadcount seconds before it deals with the rest of the fleet | 15:33 |
arne_wiebalck | etingof: I see … 300 seems quite generous, no? | 15:34 |
etingof | arne_wiebalck, it is the magic of ipmitool heuristics - when it hits a non-responsive IPMI server, it backs off its internal retry interval | 15:35 |
etingof | arne_wiebalck, on top of that, it treats timeouts as possible authentication failures so it tries some different auth suite or something | 15:36 |
etingof | arne_wiebalck, which adds up to the total ipmitool runtime | 15:36 |
arne_wiebalck | etingof: uh ... | 15:37 |
etingof | arne_wiebalck, so the latest changes in ironic essentially change two things: ironic kills ipmitool on *ironic* timeout and ironic runs ipmitool processes concurrently | 15:38 |
arne_wiebalck | etingof: I see. I was assuming the timeout came from ironic already and was passed to ipmitool. | 15:38 |
* arne_wiebalck wonders now how accurate the power state info of his nodes is | 15:39 | |
etingof | arne_wiebalck, that is correct, but ipmitool does not really respect the timeout | 15:39 |
arne_wiebalck | etingof: “nice” :-) | 15:40 |
arne_wiebalck | etingof: thanks a lot for the details! | 15:40 |
arne_wiebalck | etingof: I’ll try to give the concurrency patch a try and let you know | 15:41 |
arne_wiebalck | etingof: with holidays coming up, this may take a couple of weeks, though | 15:41 |
etingof | arne_wiebalck, you could find some relevant data points in the comment inside this patch -- https://review.openstack.org/#/c/607949/3/ironic/drivers/modules/ipmitool.py | 15:41 |
patchbot | patch 607949 - ironic - WIP: Avoid long-pending ipmitool processes - 3 patch sets | 15:41 |
arne_wiebalck | etingof: we should not delay ironic from taking over the world | 15:43 |
etingof | exactly! that's my only concern! | 15:43 |
arne_wiebalck | :-D | 15:43 |
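The fix etingof describes (ironic killing ipmitool on *ironic's* timeout rather than trusting ipmitool's own retry/backoff heuristics) boils down to a caller-side deadline. A minimal sketch with Python's subprocess module; `run_with_deadline` is a made-up helper name, and the real ironic code does more bookkeeping around retries and logging.

```python
import subprocess

def run_with_deadline(cmd, timeout):
    """Run an external command (think: one ipmitool invocation) but
    enforce the timeout from the caller's side.  subprocess.run()
    kills the child before raising TimeoutExpired, so a hung BMC
    call cannot pin a power-sync worker for minutes."""
    try:
        return subprocess.run(cmd, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None  # caller records this as a power-sync failure
```

Combined with running these calls concurrently, a handful of dead BMCs no longer delays power information for the rest of the fleet by `300 * deadcount` seconds.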
rpioso | ijpascual: That works well, too :-) I'm analyzing the dmesg output, along with a subject matter expert. | 15:46 |
ijpascual | Thank you very much rpioso! | 15:47 |
rpioso | np | 15:47 |
* dtantsur looks at devstack with bloody eyes | 15:47 | |
* TheJulia hands dtantsur a torch | 15:50 | |
rpittau | oO | 15:50 |
dtantsur | This is one of those moments when I'm glad I don't have an axe, otherwise I'd need a new monitor :D | 15:50 |
etingof | dtantsur should better return that torch back to TheJulia | 15:51 |
dtantsur | not until this devstack finally builds successfully | 15:52 |
dnuka | :) | 15:52 |
rpittau | that might require more than one torch :P | 15:52 |
*** gkadam has quit IRC | 15:53 | |
* arne_wiebalck read “until this devstack finally burns successfully” | 15:54 | |
dtantsur | ++++ | 15:54 |
dtantsur | and now folks I can tell you a downside of using clouds: you cannot kick a server when it does not behave :) | 15:54 |
etingof | long-distance relationship can be tough | 15:56 |
dtantsur | lol | 15:56 |
rpittau | "punch-as-a-service" could become a thing | 15:57 |
dtantsur | that could be an ironic feature | 15:57 |
dnuka | :D | 15:57 |
larsks | Hey folks; I've had this happen several times now: I'm booting a baremetal server with nova. It powers up, starts to boot, then Nova logs: "Instance shutdown by itself. Calling the stop API." And powers off the node. | 16:02 |
larsks | Nova is showing that the power_state seen from ironic is still "power on", so I'm not sure why it thinks the instance shut down. | 16:02 |
TheJulia | \o/ for race conditions | 16:04 |
TheJulia | hmm | 16:04 |
*** sdake has joined #openstack-ironic | 16:06 | |
larsks | TheJulia: was that for me? | 16:06 |
openstackgerrit | Ilya Etingof proposed openstack/sushy-tools master: Limit instances exposure https://review.openstack.org/616516 | 16:06 |
TheJulia | larsks: on a call with my team. It sounds like nova is seeing it down during one of the deployment transitions, and kills it after deployment | 16:06 |
TheJulia | larsks: a very detailed sequence of events from your logs would really help us | 16:07 |
*** ianychoi has quit IRC | 16:08 | |
larsks | TheJulia: nova logs (grep <instance_uuid> nova/*.log) https://termbin.com/ef9i, from ironic (grep <node_uuid> ironic/*.log): https://termbin.com/6wu5 | 16:09 |
dtrainor | I reinstalled my undercloud this morning thinking maybe I broke undercloud.conf while installing, in an attempt to fix this issue I'm having with trying to introspect this overcloud node. Looks like there's no change in my problem since yesterday - tcpdump from the undercloud shows the node attempting to grab an IP, but the undercloud never answers | 16:09 |
larsks | The only thing that stands out to me is that "Instance shutdown by itself. Calling the stop API. Current" error. | 16:10 |
*** mjturek has quit IRC | 16:10 | |
dtrainor | i went through the troubleshooting steps that hjensas helped me with yesterday and I still see only two IPv4 IPs and one IPv6 IP on br-ctlplane, hjensas said that there should be three IPs - undercloud_public_host, undercloud_admin_host, and (I forgot the other) | 16:11 |
*** w14161_1 has quit IRC | 16:11 | |
*** hjensas has joined #openstack-ironic | 16:12 | |
dtrainor | speak of the devil. | 16:12 |
dtantsur | etingof: it seems like we've broken something around vbmc in devstack, at least on centos. I'm seeing "BMC instance node-0 already running" | 16:13 |
*** mjturek has joined #openstack-ironic | 16:13 | |
dtrainor | hey there hjensas, suppose i can borrow some more of your time this morning? still got that issue going on where a node fails to grab an IP as it starts to pxeboot | 16:13 |
* dtantsur hopes to solve his problems with || true | 16:14 | |
TheJulia | dtantsur: +++++++ on || true | 16:15 |
*** dustinc has quit IRC | 16:16 | |
*** samc-bbc has joined #openstack-ironic | 16:17 | |
etingof | dtantsur, I have a guess - maybe that happens because with the latest vbmc we do not need to [re]start each vbmc domain, just start the master process. Maybe devstack is still doing things like 'vbmc start node-X'? | 16:21 |
dtantsur | etingof: it does indeed. I wonder why the gate is fine with it. | 16:21 |
etingof | dtantsur, maybe it's not treated as an error by the gate? | 16:21 |
TheJulia | if not already running, seems fine, if it has to start it goes *boom* | 16:21 |
dtantsur | TheJulia: well, it's not running initially. I don't know why it fails for me.. | 16:22 |
* dtantsur suspects black magic and holds his borrowed torch tighter | 16:22 | |
TheJulia | dtantsur: same for me on my xenial vm | 16:23 |
dtantsur | hmmm | 16:23 |
TheJulia | dtantsur: it is a magic torch, light or fire | 16:23 |
dtantsur | so, || true for the win? :) | 16:24 |
dtantsur | etingof: could you check if we can actually remove the start command now? | 16:24 |
etingof | dtantsur, from devstack? | 16:24 |
dtantsur | etingof: yeah | 16:24 |
etingof | dtantsur, I will take a look | 16:25 |
dtantsur | thnx | 16:25 |
openstackgerrit | Merged openstack/ironic master: Remove duplicated jobs and refactor jobs https://review.openstack.org/630100 | 16:26 |
etingof | dtantsur, alternatively, maybe we could change vbmc to ignore those start commands...? to retain backward compatibility | 16:26 |
dtantsur | up to you. I think I don't have enough background on what has changed. | 16:26 |
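etingof's backward-compatibility idea (have vbmc ignore redundant start commands) amounts to treating "already running" as success, which is safer than the blanket `|| true` workaround because genuine failures still surface. A hedged sketch; matching on the error string is an assumption for illustration, not vbmc's actual behaviour:

```python
import subprocess

def ensure_started(cmd):
    """Run a start command idempotently: exit code 0 is success, and
    an 'already running' complaint is treated as success too (the
    desired state is already reached).  Anything else still fails,
    unlike '|| true', which swallows every error."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode == 0:
        return True
    if "already running" in (proc.stdout + proc.stderr).lower():
        return True
    raise RuntimeError(proc.stderr.strip() or "start failed")
```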
*** dnuka has quit IRC | 16:33 | |
*** sdake has quit IRC | 16:46 | |
*** e0ne has joined #openstack-ironic | 16:47 | |
*** sdake has joined #openstack-ironic | 16:48 | |
*** e0ne_ has quit IRC | 16:48 | |
*** dustinc has joined #openstack-ironic | 16:53 | |
*** gyee has joined #openstack-ironic | 16:55 | |
*** sdake has quit IRC | 16:58 | |
*** _fragatina has joined #openstack-ironic | 16:59 | |
*** _fragatina has quit IRC | 17:00 | |
*** tssurya has quit IRC | 17:06 | |
NobodyCam | Good Morning Ironic'ers | 17:15 |
dtantsur | morning NobodyCam | 17:15 |
NobodyCam | hey hey dtantsur :) happy hump day! | 17:16 |
rpittau | hey NobodyCam :) | 17:16 |
*** dustinc has quit IRC | 17:20 | |
*** mjturek has quit IRC | 17:20 | |
openstackgerrit | Harald Jensås proposed openstack/ironic master: API - Implement /events endpoint https://review.openstack.org/631946 | 17:21 |
larsks | TheJulia: looking more at that "Instance shutdown by itself" error, it looks like Nova thinks that vm_power_state is power_state.SHUTDOWN...but that only happens if Ironic reports ironic_states.POWER_OFF...but the logs show ironic reporting power_state="power on" immediately afterwards. Is this just a timing issue of some sort? | 17:21 |
*** dustinc has joined #openstack-ironic | 17:22 | |
larsks | Looking at the ironic logs, ironic reports "Successfully set node power state to power on by power on" two seconds before Nova decides to shut things down. | 17:23 |
rpittau | bye all! good night! o/ | 17:23 |
TheJulia | larsks: yeah, that's it; it observed it mid-deploy | 17:23 |
openstackgerrit | Harald Jensås proposed openstack/python-ironicclient master: Add Events support https://review.openstack.org/345934 | 17:24 |
*** rpittau has quit IRC | 17:24 | |
TheJulia | I thought we had code to kind of guard against that update happening, at least during deployment.... | 17:24 |
*** dustinc has quit IRC | 17:24 | |
* TheJulia wonders if that changed | 17:24 | |
rpioso | ijpascual: We see the PERC H740 in the dmesg log -- [ 7.458186] pci 0000:18:00.0: [1000:0016] type 00 class 0x010400. However, there does not seem to be a driver for it, e.g., megaraid_sas. | 17:24 |
*** dustinc has joined #openstack-ironic | 17:24 | |
larsks | Which update? I mean, it makes sense that nova is checking the power state at this stage, but it is somehow checking too early or getting stale information or...something? | 17:25 |
*** e0ne has quit IRC | 17:25 | |
*** dustinc has quit IRC | 17:25 | |
*** dustinc_ has joined #openstack-ironic | 17:26 | |
larsks | It's odd though, because that "Triggering sync for uuid dcb4f055-cda4-4d61-ba8f-976645c4e92a _sync_power_states" happens *after* ironic reports that successful power on. | 17:26 |
rpioso | ijpascual: We don't know who created the ramdisk or how it was created. Perhaps they have to include megaraid_sas in it somehow or bump the version of it if it's already there. | 17:27 |
*** dustinc_ has quit IRC | 17:27 | |
*** dustinc has joined #openstack-ironic | 17:27 | |
*** dustinc has quit IRC | 17:27 | |
TheJulia | larsks: if memory serves, and I suspect jroll will remember better than me since he has looked at that code more than I, but I think that sync is actually from a list that could have slightly stale data | 17:27 |
*** dustinc has joined #openstack-ironic | 17:27 | |
*** dustinc has quit IRC | 17:28 | |
*** dustinc has joined #openstack-ironic | 17:28 | |
jroll | TheJulia: larsks: there's definitely a risk of a race condition there, iirc | 17:29 |
jroll | what version of nova/ironic is this? | 17:29 |
*** dustinc has quit IRC | 17:29 | |
*** dustinc has joined #openstack-ironic | 17:29 | |
larsks | jroll: This is a recent delorean...so, master from around 1/23. Ironic is at commit d057591. | 17:30 |
larsks | I seem to be hitting this issue pretty reliably. | 17:30 |
*** dustinc has quit IRC | 17:30 | |
jroll | nova master too? | 17:30 |
*** dustinc has joined #openstack-ironic | 17:30 | |
larsks | wait, sorry, that was the commit for the inspect client. ironic-api is at 4404292. | 17:30 |
larsks | Yeah, Nova from about the same time. | 17:31 |
jroll | ok | 17:31 |
*** _fragatina has joined #openstack-ironic | 17:31 | |
* jroll pokes around code | 17:31 | |
larsks | Nova-api is at ad842aa. | 17:31 |
*** hjensas has quit IRC | 17:33 | |
*** rloo has quit IRC | 17:33 | |
*** rloo has joined #openstack-ironic | 17:34 | |
*** dtantsur is now known as dtantsur|afk | 17:37 | |
dtantsur|afk | o/ | 17:37 |
larsks | jroll: I guess the call to self._sync_power_pool.spawn_n is asynchronous, so maybe Nova checks the power state before the sync has completed? | 17:37 |
jroll | larsks: yeah, that's my assumption. we use a short-lived cache for this: https://github.com/openstack/nova/blob/master/nova/virt/ironic/driver.py#L891 | 17:39 |
openstackgerrit | Ilya Etingof proposed openstack/sushy-tools master: Add memoization to expensive emulator calls https://review.openstack.org/612758 | 17:39 |
openstackgerrit | Bob Fournier proposed openstack/ironic-inspector master: Use processed bool as key in introspection_data DB table https://review.openstack.org/636692 | 17:40 |
jroll | larsks: so there's a small window where you could go in order of: deployment starts -> server shuts down for reboot -> cache updated -> deployment completes and server powered on -> power sync calls shutdown | 17:41 |
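The race jroll describes above can be sketched in a few lines of Python. This is purely illustrative (the class and function names are invented, not Nova's actual driver API): a short-lived power-state cache is refreshed while the node is mid-reboot, so the periodic sync task later acts on stale data and powers off a node that is really on.

```python
# Minimal sketch (hypothetical names) of the stale-cache race during an
# Ironic deployment. The cache catches the node "off" mid-reboot, so the
# later power sync trusts stale data and shuts the node down.

class PowerStateCache:
    """Caches the last-seen power state instead of querying Ironic."""

    def __init__(self):
        self._cache = {}

    def refresh(self, node_uuid, real_state):
        # In Nova this would happen when the node list is (re)fetched.
        self._cache[node_uuid] = real_state

    def get(self, node_uuid):
        return self._cache[node_uuid]


def sync_power_state(expected, cache, node_uuid):
    """Returns the action the periodic sync task would take."""
    if expected == "on" and cache.get(node_uuid) == "off":
        # This is the spurious shutdown larsks observed.
        return "shutdown"
    return "no-op"


cache = PowerStateCache()
# Timeline from the discussion above:
cache.refresh("node-1", "on")    # 1. deployment starts, node is on
cache.refresh("node-1", "off")   # 2. node reboots; cache sees it off
real_state = "on"                # 3. deployment completes, node powers on
action = sync_power_state("on", cache, "node-1")  # 4. sync reads stale cache
print(action)  # -> shutdown, even though the node is really powered on
```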
larsks | jroll: if it helps, I posted links earlier to nova + ironic logs around this event. | 17:41
jroll | larsks: yep, have them open | 17:41 |
TheJulia | Should we consider special casing cache creation if in deployment to just mark the machine power as on? | 17:41 |
jroll | larsks: this is the change where we started using a cache: https://review.openstack.org/#/c/602127/ | 17:42 |
patchbot | patch 602127 - nova - ironic: stop hammering ironic API in power sync loop (MERGED) - 2 patch sets | 17:42 |
jroll | TheJulia: I'm not sure, that just feels like piling on hacks | 17:44 |
TheJulia | the alternative is lie about the power state in our API | 17:44 |
TheJulia | There is a case where we do that... | 17:44 |
* TheJulia tries to remember | 17:44 | |
TheJulia | or we talked about it | 17:45 |
jroll | well, a less lie-ful option would be... | 17:45 |
jroll | https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L7613 | 17:45 |
jroll | if we make it here, since it's a wtf situation | 17:45 |
jroll | we could hit ironic and make real sure that's what happened | 17:45 |
jroll | e.g. call get_info(use_cache=False) or so | 17:46 |
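The double-check jroll proposes can be sketched like this. All names here are illustrative assumptions, not the actual Nova code: `fetch_fresh_state` stands in for something like `driver.get_info(instance, use_cache=False)`, re-reading the power state directly from Ironic before acting on an apparent mismatch.

```python
# Hedged sketch of the proposed fix: before correcting an apparent power
# mismatch, confirm it against fresh (uncached) data from Ironic.
# Names are illustrative, not Nova's real driver API.

def confirm_power_mismatch(cached_state, fetch_fresh_state):
    """Return True only if fresh data agrees the mismatch is real.

    fetch_fresh_state stands in for something like
    driver.get_info(instance, use_cache=False).
    """
    fresh = fetch_fresh_state()
    # If the fresh read disagrees with the cache, the cache was stale
    # and there is nothing to correct.
    return fresh == cached_state


# Stale-cache case: cache says "off", Ironic says "on" -> no shutdown.
assert confirm_power_mismatch("off", lambda: "on") is False
# Genuine case: both agree the node is off -> proceed with the sync.
assert confirm_power_mismatch("off", lambda: "off") is True
print("ok")
```

The cost is one extra Ironic API call, but only in the rare mismatch path, which avoids going back to "hammering the crap out of ironic api" as discussed below.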
*** mjturek has joined #openstack-ironic | 17:46 | |
jroll | I can put up a POC patch if someone wants to take it over if we go that way | 17:46 |
TheJulia | That actually seems like a much better idea if the nova folks are good with that. | 17:47 |
TheJulia | since it is a "double-check" of the state | 17:48 |
jroll | it's either that or go back to hammering the crap out of ironic api | 17:48 |
TheJulia | yeah, which is not ideal either | 17:48 |
TheJulia | Worst comes to worst, a poc patch would start the discussion in nova | 17:48 |
TheJulia | ++++ | 17:48 |
larsks | jroll: I'm happy to test out any patches. | 17:48 |
larsks | ...although since this only happens "sometimes" testing will be a PITA. | 17:49 |
jroll | heh, right | 17:49 |
* jroll will hack something | 17:49 | |
openstackgerrit | Ilya Etingof proposed openstack/sushy-tools master: Limit instances exposure https://review.openstack.org/616516 | 17:51 |
*** derekh has quit IRC | 17:55 | |
larsks | So, another timing question: when deleting a baremetal node (openstack server delete...), the instance stays in state ACTIVE even after the ironic state has moved to "cleaning". Should nova just let go as soon as the state != active? | 17:57 |
larsks | I ask because we're consistently seeing instances go to ERROR state when deleting them, so we have to delete them a second time, and I think it's because nova times out waiting on ironic. | 17:58 |
*** dougsz has quit IRC | 17:58 | |
larsks | Oh, no, look at that; it's a traceback... | 17:59 |
larsks | So, nova is able to successfully de-allocate the network, and then it logs: | 18:00 |
larsks | Conflict: Node a00696d5-32ba-475e-9528-59bf11cffea6 can not be updated while a state transition is in progress | 18:00 |
larsks | (logs @ https://termbin.com/qldx) | 18:00 |
jroll | larsks: TheJulia: something like this should do it (I haven't tested it, it may just blow up): https://review.openstack.org/636699 | 18:03 |
patchbot | patch 636699 - nova - ironic: check fresh data when sync_power_state doe... - 1 patch set | 18:03 |
jroll | it'll need a proper bug filed in launchpad | 18:04 |
larsks | jroll: I'll go file a bug. | 18:04 |
jroll | and I might not be able to finish it out this week. can come back to it later, or someone can take it over | 18:04 |
jroll | thanks larsks ! :) | 18:04 |
larsks | jroll: https://bugs.launchpad.net/nova/+bug/1815791 | 18:10 |
openstack | Launchpad bug 1815791 in OpenStack Compute (nova) "Race condition causes Nova to shut off a successfully deployed baremetal server" [Undecided,New] | 18:10 |
jroll | cool, adding to that patch | 18:10 |
jroll | thanks | 18:10 |
larsks | jroll: do you have any thoughts on the conflict in https://termbin.com/qldx? | 18:10 |
jroll | larsks: I haven't begun to think about that yet :) | 18:10 |
larsks | Fair enough. | 18:10 |
jroll | give me a few and I can try to look, it seems familiar | 18:12 |
larsks | Sure, no rush. | 18:12 |
*** moshele has joined #openstack-ironic | 18:17 | |
*** sdake has joined #openstack-ironic | 18:19 | |
openstackgerrit | Merged openstack/sushy master: Move to zuulv3 https://review.openstack.org/632692 | 18:22 |
*** e0ne has joined #openstack-ironic | 18:22 | |
*** ijw has joined #openstack-ironic | 18:23 | |
*** betherly has joined #openstack-ironic | 18:25 | |
TheJulia | larsks: networking removal during instance tear-down if my glance at that log is correct? | 18:29 |
*** betherly has quit IRC | 18:29 | |
larsks | TheJulia: I thought the networking removal was successful ("Took 1.71 seconds to deallocate network for instance") | 18:30 |
*** Chaserjim has quit IRC | 18:30 | |
*** dustinc has quit IRC | 18:30 | |
TheJulia | larsks: there are two parts to it, one in neutron and one in ironic... ironic's can be delayed... | 18:30 |
*** mjturek has quit IRC | 18:32 | |
larsks | TheJulia: you're correct; it looks like it may be coming from _try_deallocate_network | 18:32 |
larsks | Is this a case where Nova should just retry instead of immediately going to ERROR state? | 18:32 |
TheJulia | Yeah, both nova and ironic try to nuke out some network config in ironic | 18:32 |
TheJulia | either is allowed to win, nova will just try for a while | 18:32 |
TheJulia | tl;dr we should really suppress the exception | 18:33 |
larsks | ...except it's not; as soon as it throws this error, it sets the instance to error and gives up. | 18:33 |
TheJulia | wut?!? | 18:33 |
TheJulia | thats not right | 18:33 |
TheJulia | ugh | 18:33 |
larsks | TheJulia: "Successfully reverted task state from deleting on failure for instance." | 18:33 |
*** trown is now known as trown|lunch | 18:34 | |
larsks | And then it shows up as "ERROR" in "openstack server list". | 18:34 |
larsks | A subsequent "... server delete" will actually delete it. | 18:34 |
jungleboyj | Hey, couple of questions for you guys. | 18:34 |
*** moshele has quit IRC | 18:34 | |
TheJulia | melwitt: sorry to tag you, but does ^^^ surface any memories of anything? | 18:34 |
* TheJulia should have been less vague... changes in nova to related areas | 18:35 |
TheJulia | jungleboyj: shoot, expect delays in replies today | 18:35 |
jungleboyj | Ok. I think it is pretty simple TheJulia | 18:35 |
TheJulia | and does it involve cloning me? | 18:35 |
*** amoralej is now known as amoralej|off | 18:35 | |
jungleboyj | Does Ironic require MAC addresses for the imported baremetal nodes for PXE boot to reliably work? | 18:36
jroll | jungleboyj: yes | 18:36 |
jroll | PXE is a DHCP thing, and DHCP works on MAC addresses | 18:36 |
jroll | (inspector can automatically find mac addresses for you, but they are required) | 18:36 |
jungleboyj | jroll: Excellent. My team in China has been fighting with me over this for weeks. For some reason they see it work without MAC addresses, so they think, but it never works for me. | 18:37 |
jroll | heh | 18:37 |
*** sdake has quit IRC | 18:37 | |
jroll | jungleboyj: *technically* it can work with some other kind of address similar to a mac address that infiniband uses, but... yeah. | 18:37 |
jungleboyj | I am guessing that they are booting into an existing image or getting a cached result or something and not verifying it really works. | 18:38 |
TheJulia | sure sounds like it | 18:38 |
*** sunnaichuan has quit IRC | 18:38 | |
jungleboyj | It is a pet peeve for me as when I first started using Ironic I assumed they were right and spent a good few days beating my head against it. Sniffing network traffic and stuff. | 18:38 |
jungleboyj | Thank you for confirming my findings. | 18:39 |
jungleboyj | TheJulia: I am working on cloning. Will share when I get it working. :-) | 18:39 |
TheJulia | jungleboyj: make sure they don't have a mac after the fact... | 18:39 |
jungleboyj | TheJulia: What do you mean? | 18:39 |
TheJulia | jungleboyj: it _is_ possible, if they have modified the base config... for inspector to be quietly invoked and ports created | 18:39
TheJulia | but we don't test that in CI or anything | 18:40 |
jungleboyj | TheJulia: I don't think they did that. This is just using director to do RHOSP deployment. | 18:40 |
TheJulia | oh yeah, then yeah.. they absolutely need mac addresses if they are not explicitly doing an inspection | 18:40 |
jroll | larsks: do you have ironic conductor logs to go with those nova logs? | 18:41 |
jungleboyj | TheJulia: Thank you. My other questions are lower priority. Will ping you some time when you are less busy. | 18:41 |
larsks | jroll: sure :) | 18:41 |
jungleboyj | jroll: Thanks! | 18:41 |
larsks | One second... | 18:41 |
jroll | jungleboyj: feel free to ask, someone will answer when they get time :) | 18:41 |
TheJulia | jungleboyj: what jroll said :) | 18:42 |
jungleboyj | jroll: Ok, will do when I have a few minutes. Thanks. | 18:42 |
larsks | jroll: logs from slightly before/after the Conflict error in the nova logs (limited to just the baremetal node in question): https://termbin.com/8ebp | 18:43 |
jroll | thanks | 18:44 |
jroll | hm, looks like it should have been in CLEANWAIT | 18:45 |
openstackgerrit | Merged openstack/ironic-inspector master: Introspection data storage backend follow up https://review.openstack.org/632969 | 18:45 |
*** ijw has quit IRC | 18:46 | |
*** betherly has joined #openstack-ironic | 18:47 | |
*** betherly has quit IRC | 18:47 | |
larsks | jroll: should I open a bug for this, also? | 18:49 |
jroll | larsks: yeah, I think so | 18:49 |
jroll | it makes sense why this is breaking the way it is | 18:50 |
jroll | it just doesn't make sense how we got here | 18:50 |
jroll | or something | 18:50 |
* jroll might not be at full brain capacity right now | 18:50 | |
jroll | sorry, I've got nothing right now. :( | 18:51 |
*** sri_ has quit IRC | 18:52 | |
* jroll steps away for a bit | 18:52 | |
*** sri_ has joined #openstack-ironic | 18:52 | |
TheJulia | nodes going through deletion into cleanwait remain locked for a while. | 19:01 |
TheJulia | There is a specific reason that my brain is failing to recall at the moment | 19:01 |
jungleboyj | So the kind of generic question I had was if there have ever been proposals in the past for Ironic to be enhanced to do more system lifecycle management? Firmware updates, etc? | 19:03
jungleboyj | I know that that isn't the goal of Ironic, but just wondering if anyone has ever pursued it in the past. | 19:04 |
openstackgerrit | Mark Goddard proposed openstack/ironic master: WIP: Prepare for instance with power off https://review.openstack.org/636720 | 19:04 |
larsks | jroll: fyi, https://bugs.launchpad.net/nova/+bug/1815799 | 19:09 |
openstack | Launchpad bug 1815799 in OpenStack Compute (nova) "Attempting to delete a baremetal server places server into ERROR state" [Undecided,New] | 19:09 |
jroll | larsks: thanks | 19:10 |
jroll | jungleboyj: yeah, definitely has been proposed. some of it has been done - for example, the ilo driver has firmware update capabilities, but the admin has to specify the firmware to write and such | 19:16 |
jroll | e.g. https://storyboard.openstack.org/#!/story/1526216 | 19:16 |
jroll | we haven't done full blown lifecycle automation, though... it may or may not be in scope, but regardless is a ton of work with lots of weird corner cases and such | 19:18 |
jroll | I'd love to see a CMDB-ish thing which handles tracking lifecycle things, and tells ironic to do things like update firmware to version X | 19:18 |
larsks | Opinion question that I think was lost earlier: right now, it looks like Nova keeps a baremetal instance in the ACTIVE state after it's entered the cleaning step on the ironic side. (a) is this intentional, and (b) does it make sense? Or should Nova consider the server "deleted" as soon as it leaves the "active" state on the ironic side? | 19:21
jroll | ah, I thought that was superseded by the errors you found :) | 19:22
TheJulia | likewise, I'd prefer to keep ironic much more of a swiss army knife of sorts | 19:22 |
jroll | larsks: we do consider the nova instance gone as soon as it gets into a cleaning state, so not sure what's happening on your system. see https://github.com/openstack/nova/blob/master/nova/virt/ironic/driver.py#L1170-L1181 | 19:23 |
larsks | jroll: huh, okay. I'll take a closer look, then. | 19:23 |
larsks | I wonder if this is really just a symptom of that conflict thing. | 19:24 |
jroll | larsks: it might have something to do with... yeah | 19:24 |
jroll | I think it is, that exception is *just* before we return to the higher level in nova that marks the instance deleted | 19:24 |
larsks | Okay, that makes sense. | 19:24 |
jroll | the block I linked is inside here: https://github.com/openstack/nova/blob/master/nova/virt/ironic/driver.py#L1238 | 19:25 |
jroll | your exception is here: https://github.com/openstack/nova/blob/master/nova/virt/ironic/driver.py#L1245 | 19:25 |
jroll | and then we pop back up to the common nova layer | 19:25 |
larsks | Got it. | 19:25 |
*** ijw has joined #openstack-ironic | 19:26 | |
jroll | (which is here, if you're curious: https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2635 see the vm_state change shortly after) | 19:26 |
larsks | Thanks! This has all been very educational :) | 19:27 |
openstackgerrit | Mark Goddard proposed openstack/ironic master: Deploy templates: data model, DB API & objects https://review.openstack.org/627663 | 19:28 |
*** ijw has quit IRC | 19:28 | |
jroll | you're welcome :) | 19:31 |
openstackgerrit | Bob Fournier proposed openstack/ironic-inspector master: Use processed bool as key in introspection_data DB table https://review.openstack.org/636692 | 19:31 |
openstackgerrit | Bob Fournier proposed openstack/ironic-inspector master: Use processed bool as key in introspection_data DB table https://review.openstack.org/636692 | 19:33 |
larsks | If I create a volume from an image that specifies a separate kernel and ramdisk, those properties aren't preserved on the volume. I see that I can set ramdisk and kernel in a baremetal node's instance_info, but is there any way to either set them per-volume or to provide them when booting a new baremetal instance via nova? | 19:38
* jroll has no idea, but if they aren't preserved on the volume, that feels like a bug | 19:40 | |
larsks | I left the launchpad tab open this time...:) | 19:43 |
*** irclogbot_1 has quit IRC | 19:47 | |
*** e0ne has quit IRC | 19:47 | |
*** trown|lunch is now known as trown | 19:52 | |
*** bfournie has quit IRC | 19:53 | |
*** arne_wiebalck_ has joined #openstack-ironic | 19:56 | |
*** ijw has joined #openstack-ironic | 19:57 | |
*** irclogbot_1 has joined #openstack-ironic | 20:00 | |
*** ijw has quit IRC | 20:01 | |
jungleboyj | jroll: TheJulia Ok, so enhancements that would support more lifecycle management to Ironic would not be rejected. | 20:02 |
jungleboyj | Or seen as out of Ironic's scope. | 20:02 |
jroll | jungleboyj: some may be out of scope, but it's worth having the discussion | 20:03 |
jroll | can always file a short RFE and go from there | 20:03 |
TheJulia | jungleboyj: and driver enhancements for things like firmware/settings would not be rejected if they are already in ironic in other drivers | 20:04 |
*** _fragatina has quit IRC | 20:05 | |
jungleboyj | Awesome. That was a much more concise discussion than I expected. | 20:05 |
TheJulia | larsks: so partition/filesystem image? | 20:07 |
TheJulia | larsks: it sounds like the best route is burning into the image | 20:10 |
* TheJulia might be confused | 20:10 | |
larsks | TheJulia: partitioned images with a bootloader work fine. But we would also like to deliver kernel and ramdisk via pxe and just have the kernel mount root via is so. | 20:13 |
larsks | Arg, iscsi. | 20:13 |
larsks | Sorry phone | 20:13 |
larsks | In theory that would allow mounting root from any device supported by the kernel, not just iscsi. | 20:19 |
TheJulia | larsks: in "theory". In practice... rbd is problematic | 20:23 |
larsks | I wasn't even thinking specifically about that, but sure. | 20:24
TheJulia | realistically, booting from parameters on the node instance_info is after the image since the image... at least with the bfv work done so far, is geared to be agnostic | 20:24 |
TheJulia | so an on-image boot loader, which Ironic doesn't have the knowledge of or ability to modify as-is | 20:25
larsks | I think we're talking about slightly different things. I have to run off for a dr's appointment right now; let me get back to you | 20:29
melwitt | TheJulia: sorry for the delayed reply, but in the traceback I see that during ironic driver.destroy(), it's hitting the conflict 409 when it tries to _cleanup_deploy and subsequently _remove_instance_info_from_node. the self.ironicclient.call('node.update' experiences a conflict 409. if it's appropriate to retry in that case, it seems like something to add to the ironic driver destroy() method, IIUC | 20:50 |
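The retry melwitt suggests could look roughly like the sketch below. This is an assumption-laden illustration, not Nova's actual `destroy()` code: `Conflict` stands in for the ironicclient 409 exception, and `update` for the `node.update` call that fails while a state transition holds the node lock.

```python
# Illustrative retry loop for the 409 Conflict described above: Ironic
# rejects node.update while a state transition is in progress, so the
# caller backs off and retries instead of failing the whole delete.
# Names, attempt counts, and delays are assumptions for illustration.

import time


class Conflict(Exception):
    """Stands in for the ironicclient 409 Conflict exception."""


def update_with_retry(update, attempts=3, delay=0.01):
    for attempt in range(attempts):
        try:
            return update()
        except Conflict:
            if attempt == attempts - 1:
                raise  # out of retries; surface the conflict
            time.sleep(delay)  # node is locked; wait out the transition


# Simulate a node that is locked for the first call only.
calls = {"n": 0}

def flaky_update():
    calls["n"] += 1
    if calls["n"] == 1:
        raise Conflict("Node can not be updated while a state "
                       "transition is in progress")
    return "updated"

print(update_with_retry(flaky_update))  # -> updated
```

The alternative TheJulia raises below (not calling `node.update` at all during teardown) would avoid the conflict entirely, but as jroll notes it still matters for failed deployments where Ironic never reaches its own cleanup path.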
*** samc-bbc has quit IRC | 20:58 | |
*** jtomasek has quit IRC | 21:02 | |
TheJulia | larsks: possibly. Enjoy | 21:17 |
TheJulia | melwitt: Thanks. I suspect tolerating is likely fine. One of those things we mutually try to remove/clean-up. We could always consider removing the nova side since the contract is fairly set with ironic removing the field. Although... I've had people disagree from the other point of view *sigh* | 21:21 |
melwitt | TheJulia: not sure I understand. you mean an alternative approach being to do something on the ironic side to avoid returning 409 in the first place? | 21:23 |
TheJulia | melwitt: to tell nova not to try and perform the delete of the field anymore.... | 21:23 |
TheJulia | but I need to look at the code and validate that we can do that on the nova side | 21:23 |
TheJulia | I think we pop the fields on ironic's side as a very very very last step | 21:23 |
melwitt | delete the field == node.update? | 21:24 |
TheJulia | I may also have lost my mind, and may be rocking in the corner | 21:24 |
TheJulia | melwitt: yes | 21:24 |
jroll | TheJulia: I think we have that in _cleanup_deploy() to ensure it happens in exceptional cases (failed deployments, etc) where ironic might not hit that code path | 21:24 |
melwitt | ok | 21:24 |
TheJulia | jroll: oh, right | 21:24 |
TheJulia | good point, if it fails super early on | 21:24 |
* TheJulia goes back to rocking in the corner | 21:24 | |
*** ijw has joined #openstack-ironic | 21:30 | |
*** whoami-rajat has quit IRC | 21:32 | |
*** arne_wiebalck_ has quit IRC | 21:36 | |
*** _fragatina has joined #openstack-ironic | 21:45 | |
*** ijw has quit IRC | 21:51 | |
*** trown is now known as trown|outtypewww | 22:06 | |
*** ijw has joined #openstack-ironic | 22:16 | |
*** openstackgerrit has quit IRC | 22:22 | |
*** eandersson has quit IRC | 22:33 | |
*** ijw has quit IRC | 22:37 | |
*** ijw has joined #openstack-ironic | 22:38 | |
*** ijw has quit IRC | 22:42 | |
*** openstackgerrit has joined #openstack-ironic | 22:48 | |
openstackgerrit | Julia Kreger proposed openstack/ironic-inspector master: Allow processing power_off to be defined https://review.openstack.org/636774 | 22:48 |
openstackgerrit | Hamdy Khader proposed openstack/python-ironicclient master: Add is-smartnic port attribute to port command https://review.openstack.org/629449 | 22:54 |
openstackgerrit | Hamdy Khader proposed openstack/python-ironicclient master: Add 'hostname' to port's local link connection https://review.openstack.org/628773 | 22:54 |
*** eandersson has joined #openstack-ironic | 22:59 | |
openstackgerrit | Julia Kreger proposed openstack/ironic-inspector master: Add ironic API url to inspector IPA config https://review.openstack.org/636778 | 23:13 |
openstackgerrit | Julia Kreger proposed openstack/ironic master: fast tracked deployment support https://review.openstack.org/635996 | 23:24 |
*** hwoarang has quit IRC | 23:32 | |
*** hwoarang has joined #openstack-ironic | 23:33 | |
*** hwoarang has quit IRC | 23:48 | |
*** hwoarang has joined #openstack-ironic | 23:49 | |
*** w14161_1 has joined #openstack-ironic | 23:59 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!