Wednesday, 2019-02-13

*** dsneddon has joined #openstack-ironic00:21
*** sdake has quit IRC00:26
*** dsneddon has quit IRC00:26
*** dustinc_ has joined #openstack-ironic00:29
*** dustinc has quit IRC00:32
*** dustinc_ has quit IRC00:33
*** dsneddon has joined #openstack-ironic00:40
*** sdake has joined #openstack-ironic00:46
*** openstackgerrit has quit IRC00:52
*** dustinc has joined #openstack-ironic00:53
*** gyee has quit IRC01:07
*** hwoarang has quit IRC01:09
*** gyee has joined #openstack-ironic01:10
*** hwoarang has joined #openstack-ironic01:13
*** sdake has quit IRC01:14
*** sdake has joined #openstack-ironic01:17
*** sdake has quit IRC01:29
*** sdake has joined #openstack-ironic01:32
*** hamzy has joined #openstack-ironic01:39
*** sdake has quit IRC01:39
*** gyee has quit IRC01:47
*** _fragatina has quit IRC02:02
*** rloo has quit IRC02:40
*** openstackgerrit has joined #openstack-ironic02:43
openstackgerritJulia Kreger proposed openstack/ironic master: [WIP]: fast tracked deployment support  https://review.openstack.org/635996 02:43
*** sdake has joined #openstack-ironic03:06
*** hwoarang has quit IRC03:19
*** hwoarang has joined #openstack-ironic03:20
*** etingof has quit IRC03:41
*** hwoarang has quit IRC04:00
*** hwoarang has joined #openstack-ironic04:03
*** sdake has quit IRC04:15
*** penick has quit IRC05:24
*** mkrai has joined #openstack-ironic05:37
*** sdake has joined #openstack-ironic06:28
*** jtomasek has joined #openstack-ironic06:32
*** moshele has joined #openstack-ironic06:47
*** gkadam has joined #openstack-ironic07:00
*** sdake has quit IRC07:07
*** hamdyk has joined #openstack-ironic07:24
*** e0ne has joined #openstack-ironic07:36
openstackgerritMerged openstack/ironic-python-agent master: Add secondary sorting by name when guessing root disk  https://review.openstack.org/635239 07:41
*** rpittau has joined #openstack-ironic07:57
arne_wiebalckgood morning ironic!07:57
rpittaugood morning ironic! o/07:57
rpittaufirst! xD07:58
arne_wiebalck:-D07:59
*** etingof has joined #openstack-ironic08:01
dtantsurmorning ironic08:17
rpittauhey dtantsur :)08:18
*** amoralej|off is now known as amoralej08:40
*** early has quit IRC08:54
*** early has joined #openstack-ironic08:56
iurygregorymorning everyone o/08:57
*** fdegir has joined #openstack-ironic08:57
rpittauhi iurygregory :)08:58
*** tssurya has joined #openstack-ironic09:00
*** early has quit IRC09:06
*** e0ne has quit IRC09:06
*** e0ne has joined #openstack-ironic09:07
*** dougsz has joined #openstack-ironic09:07
*** early has joined #openstack-ironic09:15
*** sunnaichuan has joined #openstack-ironic09:16
openstackgerritHamdy Khader proposed openstack/python-ironicclient master: Add is-smartnic port attribute to port command  https://review.openstack.org/629449 09:20
openstackgerritHamdy Khader proposed openstack/python-ironicclient master: Add 'hostname' to port's local link connection  https://review.openstack.org/628773 09:20
*** derekh has joined #openstack-ironic09:29
openstackgerritHamdy Khader proposed openstack/ironic master: [Follow Up] Expose is_smartnic in port API  https://review.openstack.org/636575 09:30
*** e0ne has quit IRC09:34
*** mkrai has quit IRC10:14
*** ijpascual has joined #openstack-ironic10:19
ijpascualHi, I am trying to deploy baremetal with ironic/bifrost on a Dell PowerEdge R640 server. Apparently, the RAID controller PERC H740P is not supported by IPA. Is this right? I have seen a commit from rpioso last year solving this issue, but I cannot get it working.10:27
*** MattMan has quit IRC10:28
*** MattMan has joined #openstack-ironic10:29
*** e0ne has joined #openstack-ironic10:46
*** e0ne_ has joined #openstack-ironic10:50
*** e0ne has quit IRC10:52
openstackgerritArkady Shtempler proposed openstack/ironic-tempest-plugin master: New test where instances are BM guest and VM  https://review.openstack.org/636598 11:38
*** rpittau has quit IRC11:52
*** e0ne has joined #openstack-ironic11:56
*** e0ne has quit IRC11:57
*** e0ne_ has quit IRC11:57
*** hamzy has quit IRC11:59
*** sdake has joined #openstack-ironic12:03
*** e0ne has joined #openstack-ironic12:08
*** _fragatina has joined #openstack-ironic12:11
*** e0ne has quit IRC12:12
*** _fragatina_ has joined #openstack-ironic12:12
*** hamzy has joined #openstack-ironic12:13
*** _fragatina has quit IRC12:16
*** sdake has quit IRC12:16
*** dougsz has quit IRC12:18
*** e0ne has joined #openstack-ironic12:22
*** rpittau has joined #openstack-ironic12:29
*** dougsz has joined #openstack-ironic12:59
larsksI keep running into the situation in which baremetal nodes fail to boot because they're not getting a response from the dhcp server.  The dhcp requests are reaching the host, the neutron-managed dnsmasq server is running, but it doesn't appear to be seeing the requests and it's not sending any replies.  The configuration in /var/lib/ironic/*boot seems fine.12:59
larsksDoes this ring a bell for anybody?12:59
*** arne_wiebalck_ has joined #openstack-ironic13:00
*** amoralej is now known as amoralej|lunch13:06
*** trown|outtypewww is now known as trown13:06
*** rh-jelabarre has joined #openstack-ironic13:06
*** e0ne has quit IRC13:11
openstackgerritIury Gregory Melo Ferreira proposed openstack/ironic-tempest-plugin master: Run all defined jobs defined in check and gate  https://review.openstack.org/636169 13:12
*** e0ne has joined #openstack-ironic13:17
dtantsurlarsks: the causes can vary from invalid MACs on ports to bugs in Neutron :)13:20
dtantsurthe most common is networking misconfiguration though13:20
dtantsurI guess tcpdumping DHCP traffic is the first reasonable step13:20
*** e0ne_ has joined #openstack-ironic13:20
larsksThat's how I know the requests are making it to the host :).13:20
larsksBut yeah, I'm looking at neutron right now.13:20
*** e0ne has quit IRC13:23
rpiosoijpascual: I generally recall the issue. Please link the commit you mentioned.13:23
dtantsurlarsks: try finding its host file (dunno where it is on your system, maybe something /var/lib/neutron/..)13:26
dtantsurthat will at least tell you if neutron has prepared the DHCP environment right13:26
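For reference, a minimal shell sketch of the two checks suggested above. The interface name, the MAC, and the /var/lib/neutron path are assumptions based on typical defaults and may differ on a given deployment.

```
IFACE=eth1                       # provisioning/ctlplane interface (assumption)
NODE_MAC=52:54:00:12:34:56       # MAC of the node's PXE port (placeholder)

# 1. Confirm the DHCP requests reach the host and see whether any replies go out:
sudo tcpdump -i "$IFACE" -nn -e port 67 or port 68

# 2. Check that the neutron DHCP agent's dnsmasq host files know about this MAC:
sudo grep -ri "$NODE_MAC" /var/lib/neutron/dhcp/*/host
```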
e0ne_I'd appreciate it if anybody could review https://review.openstack.org/#/c/635493/ 13:30
patchbotpatch 635493 - ironic-ui - Add ironic-ui integration tests - 2 patch sets13:30
*** root has joined #openstack-ironic13:33
*** root is now known as w14161_113:33
jrollmorning ironic13:37
w14161_1After a reboot I need to re-run stack.sh before ironic commands work. Is there any way to run ironic commands directly after a reboot or shutdown without re-running stack.sh? For example, "openstack baremetal xxx" works after stack.sh has run successfully, but after a reboot "openstack baremetal xxx" fails with something like "http 400"; if I re-run stack.sh, then everything is fine.13:38
*** e0ne_ has quit IRC13:40
*** e0ne has joined #openstack-ironic13:45
*** mjturek has joined #openstack-ironic13:46
*** dnuka has joined #openstack-ironic13:47
dnukagood morning ironic13:47
*** e0ne_ has joined #openstack-ironic13:50
*** e0ne has quit IRC13:51
*** mjturek has quit IRC13:51
*** rloo has joined #openstack-ironic13:52
*** arne_wiebalck_ has quit IRC13:54
larsksdtantsur: here's a more ironic-related question: after rebooting the controller, we have working networking.  Nodes going into the 'clean' state are successfully booting the ipa image, but then instead of performing the cleaning process they seem to hang around indefinitely in "clean wait" state. ipa on the node console is looping over "heartbeat successful/waiting for next heartbeat" messages.13:54
*** mjturek has joined #openstack-ironic13:55
larsksI don't see any errors in the logs; what could be going on here?13:55
*** mjturek has quit IRC13:59
*** mjturek has joined #openstack-ironic14:00
*** mjturek has quit IRC14:01
*** mjturek has joined #openstack-ironic14:02
*** arne_wiebalck_ has joined #openstack-ironic14:03
*** e0ne_ has quit IRC14:04
mjturekmorning dnuka14:05
dnukaHey mjturek o/14:06
*** sdake has joined #openstack-ironic14:08
*** sdake has quit IRC14:09
*** sdake has joined #openstack-ironic14:10
*** sdake has quit IRC14:13
*** sdake has joined #openstack-ironic14:13
*** tzumainn has joined #openstack-ironic14:14
ijpascualrpioso I refer to this one https://review.openstack.org/#/c/545184/14:14
patchbotpatch 545184 - ironic - Fix iDRAC hardware type does not work with UEFI (MERGED) - 9 patch sets14:14
ijpascualrpioso for some reason when deploying it is not able to locate the RAID virtual disk14:15
dtantsurlarsks: check the nodes are not in maintenance14:18
larsksdtantsur: they are not, they are in "clean wait" state.14:19
dtantsurmorning jroll, dnuka14:19
dtantsurlarsks: well, the provision states are orthogonal to maintenance mode, you can be in both14:19
larsksOh, I see.  Yes, they are in maintenance now.14:19
larsksDoes that happen automatically on a failure?14:20
dtantsurlarsks: it can happen for various reasons. check the maintenance_reason field on a node.14:20
dnukahi dtantsur :)14:20
larsksdtantsur: "Timeout while cleaning"14:20
dnukamorning jroll o/14:21
dtantsurlarsks: okay, so previous cleaning failed. try moving the nodes out of maintenance and waiting a bit (if IPA still heartbeats)14:21
larsksLet me node maintenance unset and see what happens...14:21
*** sdake has quit IRC14:23
larsksdtantsur: that was it; the maintenance state was set because the networking issue earlier prevented the nodes from booting.  Thanks.14:23
larsksThey seem to have all completed cleaning this time.14:24
dtantsurcool!14:24
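For reference, the checks and the fix from this exchange as concrete commands; the node name is a placeholder and the syntax is the OSC baremetal plugin of that era.

```
# Why is the node in maintenance?
openstack baremetal node show node-0 -f value -c maintenance -c maintenance_reason

# Clear maintenance so the conductor resumes cleaning while IPA is still heartbeating:
openstack baremetal node maintenance unset node-0
```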
larsksNow to retry the boot-from-volume which is what I was originally working on :)14:24
*** arne_wiebalck_ has quit IRC14:24
dnukadtantsur, thanks for the review, you guessed it right :) it happened when fixing the merge conflict. I will fix it.14:24
TheJuliagood morning14:25
* TheJulia sips coffee trying to wake up14:25
dtantsurmorning TheJulia14:26
dtantsurdnuka: happens :)14:26
dnukaHey TheJulia , good morning14:26
rpittauhi TheJulia :)14:29
*** e0ne has joined #openstack-ironic14:29
dnukahey rpittau o/14:29
rpittauhi dnuka :)14:30
*** e0ne_ has joined #openstack-ironic14:35
*** e0ne has quit IRC14:35
TheJuliahjensas: by chance, will you be able to rebase https://review.openstack.org/#/c/631946 in the next day or two?14:37
patchbotpatch 631946 - ironic - API - Implement /events endpoint - 12 patch sets14:37
TheJuliahamdyk: There are two smartnic related patches for python-ironicclient that need to be updated. Since the API merged, it would be good to revise those14:38
*** e0ne_ has quit IRC14:41
iurygregorymorning TheJulia o/14:41
dnukahi iurygregory o/14:41
iurygregorydnuka, o/14:41
*** _fragatina_ has quit IRC14:42
*** amoralej|lunch is now known as amoralej14:42
openstackgerritMerged openstack/ironic master: [Follow Up] Expose is_smartnic in port API  https://review.openstack.org/636575 14:43
*** hjensas has quit IRC14:46
*** sthussey has joined #openstack-ironic14:47
TheJuliaSo if any cores would take about 45 minutes and review https://review.openstack.org/#/c/633052/, it would be greatly appreciated... else I'll have a grumpy dtantsur and I really don't want a grumpy dtantsur :)14:48
patchbotpatch 633052 - ironic - Support using JSON-RPC instead of oslo.messaging - 25 patch sets14:48
dtantsurnobody wants a grumpy dtantsur :)14:49
dnuka:)14:49
rpiosoijpascual: That commit didn't resolve the issue you described. Rather, it added support for UEFI boot mode to the idrac h/w type (driver).14:50
*** e0ne has joined #openstack-ironic14:50
dnukahey rpioso o/14:50
rpiosodnuka: Good morning :)14:51
dnuka:)14:51
rpiosoijpascual: I vaguely recall an issue testing against a PERC; however, I believe it was an H730. If memory serves, the CentOS ramdisk didn't have the driver.14:53
openstackgerritMark Beierl proposed openstack/ironic master: Doc Updates to Dell iDrac Driver  https://review.openstack.org/636644 14:54
* etingof is curious, does grumpy dtantsur look like ironic bear?14:54
ijpascualI see rpioso, thanks for the input! Which driver you mean?14:55
*** e0ne_ has joined #openstack-ironic14:55
dnukahey etingof o/14:55
rpiosoI can confirm that in a downstream Queens-based release, IPA supports H740P.14:55
*** e0ne has quit IRC14:55
e0ne_TheJulia: hi. could you please review https://review.openstack.org/#/c/635493/ once you have time? they're simple ironic-ui integration tests to verify that it works with the latest horizon14:57
patchbotpatch 635493 - ironic-ui - Add ironic-ui integration tests - 2 patch sets14:57
dtantsuretingof: grumpy dtantsur looks like this http://mirchudes.net/uploads/posts/2017-04/1493417242_manul.jpg 14:58
TheJuliae0ne_: done14:58
TheJuliae0ne_: thanks!14:58
e0ne_TheJulia: thank you for the review!14:59
*** penick has joined #openstack-ironic14:59
TheJuliao/ penick14:59
rpiosoijpascual: The ramdisk may contain an older megaraid_sas driver.15:01
etingofthat looks like the very last second before the catastrophic cat explosion15:01
etingofdnuka, o/15:01
rpiosoijpascua: If you could get the dmesg output from the ramdisk, we could know for certain.15:02
larsksI'm trying to 'openstack server reboot' a baremetal node.  The request is getting as far as ironic-conductor, which says: "RPC change_node_power_state called for node 015954fa-c900-4798-8c04-808a1504fe35. The desired new state is rebooting."15:02
larsks...but it doesn't seem to be able to acquire an exclusive lock.15:02
rpiosoijpascual: ^^^15:02
larsksWhat could be holding the lock and preventing the power state change?15:02
ijpascualrpioso alright, I will grab a dmesg log and comeback :) thanks for your help15:03
rpiosoijpascual: You're welcome!15:03
arne_wiebalcketingof: For my education, what’s the reason/benefit behind failing ‘sloppy’ power sync nodes faster (and how much faster is faster)?15:04
larsks...and can I cancel the pending power state change and just reset the power myself?15:05
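One way to answer that from the API, as a hedged sketch: the node's reservation field names the conductor currently holding the exclusive lock, and the other fields show what it is busy doing (UUID taken from the log above).

```
openstack baremetal node show 015954fa-c900-4798-8c04-808a1504fe35 \
    -f value -c reservation -c provision_state -c target_power_state
```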
dnukahey arne_wiebalck o/15:07
*** baha has joined #openstack-ironic15:07
arne_wiebalckhey dnuka o/15:07
etingofarne_wiebalck, oh, sorry, I forgot to respond in gerrit! I can only blame my coffee reserves exhaustion15:07
arne_wiebalcketingof: no worries, I was just curious :)15:08
*** moshele has quit IRC15:09
etingofarne_wiebalck, so the idea being that if we have too many nodes to walk over them all in time, the unstable nodes would get better attention...15:09
penicko/ Julia15:10
penickhey hey :)15:10
arne_wiebalcketingof: aren’t they all scanned (sequentially)?15:10
penickAlso howdy, arne_wiebalck!15:10
etingofarne_wiebalck, so we'd hopefully fail them fast and get back on track with the rest of nodes15:10
arne_wiebalckhey penick o/15:10
openstackgerritMerged openstack/ironic-ui master: Add ironic-ui integration tests  https://review.openstack.org/635493 15:11
etingofarne_wiebalck, the nodes are power-synced sequentially in 8 threads (by default)15:11
dtantsurpenick: \o15:12
arne_wiebalcketingof: so the sloppy nodes would basically make it to the top of the list15:12
penickHey dtantsur!15:12
etingofarne_wiebalck, exactly, that's the idea15:12
openstackgerritMerged openstack/sushy-tools master: Add configurable libvirt firmware  https://review.openstack.org/620605 15:12
arne_wiebalcketingof: what I was wondering is how going through the list compares to the power scan frequency15:12
arne_wiebalcketingof: s/frequency/interval/15:13
arne_wiebalcketingof: there would only be a speed up when the scanning is much longer than the configured interval, no?15:14
etingofarne_wiebalck, I am thinking of the situation when we have so many nodes that they can't be power-synced within the periodic job run interval15:14
arne_wiebalcketingof: right, that’s my thinking15:15
etingofarne_wiebalck, in that case the next periodic job won't be started15:15
arne_wiebalcketingof: got it15:15
ijpascualI have the log rpioso, any preference on where I should upload it?15:15
*** hamdyk has quit IRC15:15
openstackgerritBob Fournier proposed openstack/ironic-inspector master: For ironic_inspector database, query using node_uuid only when adding entry  https://review.openstack.org/636650 15:16
etingofarne_wiebalck, I wonder if you run into such situation...? would be interesting to know if that trick makes any real sense...15:16
arne_wiebalcketingof: we have currently around 1700 nodes15:16
arne_wiebalcketingof: and the interval is set to 30015:17
rpiosoijpascual: Preferably in a form that doesn't require a download nor email attachment.15:17
etingofarne_wiebalck, would you consider trying the latest (unreleased) ironic which does parallel power syncs? that might let you reduce power sync interval, hopefully15:18
* etingof considers that being a scientific experiment as opposed to testing software on the end users15:19
* arne_wiebalck is thinking about it15:20
arne_wiebalcketingof: I’m not sure we’d need a power sync fast than 5 mins15:20
arne_wiebalcketingof: faster15:20
rpiosodtantsur: Is there a preferred way to share dmesg output on the channel?15:20
dtantsurrpioso: paste.openstack.org?15:21
arne_wiebalcketingof: however, we will be integrating quite some additional nodes into ironic in the next months15:21
arne_wiebalcketingof: so we may hit th 300 secs15:21
rpiosoijpascual: ^^^ paste.openstack.org works for me :)15:21
rpiosodtantsur: Thank you :)15:21
arne_wiebalckarne_wiebalck: time(operator notices a node has power sync failures) >> 300 :-D15:23
etingofarne_wiebalck, right, I'd be interested in knowing your experience with multi-threading power syncs15:23
TheJuliaarne_wiebalck: we've often gotten complaints from deployments where they are dealing with 2000-5000+ nodes and start encountering power sync issues.15:24
* TheJulia notices arne_wiebalck talk to himself and think he will fit right in to ironic. :)15:24
arne_wiebalckTheJulia: definitely the root of all evil15:24
TheJuliathinks15:24
arne_wiebalckarne_wiebalck: power sync I mean :-D15:24
TheJuliahehe15:24
arne_wiebalckTheJulia: not TheJulia :-D15:25
arne_wiebalckTheJulia: this is with the default interval?15:25
* etingof wonders when BMCs learn to emit power state change notifications15:25
arne_wiebalckTheJulia: and the customers need to know on a 60-second basis?15:25
arne_wiebalckTheJulia: or they merely see power syncs run into each other?15:26
arne_wiebalckTheJulia: and don’t really care about how quickly power sync failures are noticed15:26
TheJuliaarne_wiebalck: yeah, I remember speaking with an operator that was doing 10-15 minutes and did encounter an issue, I just don't precisely remember what it was15:26
TheJuliathey were seeing syncs basically never completing, I believe15:27
arne_wiebalcketingof: we can certainly give it a try and see how that changes the total time to do the power sync15:27
etingofarne_wiebalck, would be awesome15:27
etingofarne_wiebalck, prior to the latest changes, single-threaded ipmi call against a dead BMC could last for up to 300 sec15:30
ijpascualrpioso hope this is ok as well: https://pastebin.com/wiL2yd0b15:30
ijpascualNext time i will use the openstack one ;)15:31
etingofarne_wiebalck, if we have a handful of dead BMCs, power information becomes quite outdated for the rest of the nodes15:31
arne_wiebalcketingof: the power sync gets stuck?15:32
arne_wiebalcketingof: ah, sorry missed your previous message15:33
etingofarne_wiebalck, not fully stuck, but significantly delayed. e.g. ironic could block on dead BMCs for 300 * deadcount seconds before it deals with the rest of the fleet15:33
arne_wiebalcketingof: I see … 300 seems quite generous, no?15:34
etingofarne_wiebalck, it is the magic of ipmitool heuristics - when it hits a non-responsive IPMI server, it backs off its internal retry interval15:35
etingofarne_wiebalck, on top of that, it treats timeouts as possible authentication failures so it tries some different auth suite or something15:36
etingofarne_wiebalck, which adds up to the total ipmitool runtime15:36
arne_wiebalcketingof: uh ...15:37
etingofarne_wiebalck, so the latest changes in ironic essentially change two things: ironic kills ipmitool on *ironic* timeout and ironic runs ipmitool processes concurrently15:38
arne_wiebalcketingof: I see. I was assuming the timeout came from ironic already and is passed to ipmitool.15:38
* arne_wiebalck wonders now how accurate the power state info of his nodes is15:39
etingofarne_wiebalck, that is correct, but ipmitool does not really respect the timeout15:39
arne_wiebalcketingof: “nice” :-)15:40
arne_wiebalcketingof: thanks a lot for the details!15:40
arne_wiebalcketingof: I’ll try to give the concurrency patch a try and let you know15:41
arne_wiebalcketingof: with holidays coming up, this may take a couple of weeks, though15:41
etingofarne_wiebalck, you could find some relevant data points in the comment inside this patch -- https://review.openstack.org/#/c/607949/3/ironic/drivers/modules/ipmitool.py 15:41
patchbotpatch 607949 - ironic - WIP: Avoid long-pending ipmitool processes - 3 patch sets15:41
arne_wiebalcketingof: we should not delay ironic from taking over the world15:43
etingofexactly! that's my only concern!15:43
arne_wiebalck:-D15:43
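For context, the knobs being discussed live in ironic.conf; a hedged example using crudini (any ini editor works). sync_power_state_interval is long-standing; sync_power_state_workers is the parallel power-sync option from the then-unreleased Stein code, so verify both names against the release actually in use.

```
crudini --set /etc/ironic/ironic.conf conductor sync_power_state_interval 300
crudini --set /etc/ironic/ironic.conf conductor sync_power_state_workers 8
systemctl restart openstack-ironic-conductor    # service name varies by distro
```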
rpiosoijpascual: That works well, too :-) I'm analyzing the dmesg output, along with a subject matter expert.15:46
ijpascualThank you very much rpioso!15:47
rpiosonp15:47
* dtantsur looks at devstack with bloody eyes15:47
* TheJulia hands dtantsur a torch15:50
rpittauoO15:50
dtantsurThis is one of those moments when I'm glad I don't have an axe, otherwise I'd need a new monitor :D15:50
etingofdtantsur should better return that torch back to TheJulia15:51
dtantsurnot until this devstack finally builds successfully15:52
dnuka:)15:52
rpittauthat might require more than one torch :P15:52
*** gkadam has quit IRC15:53
* arne_wiebalck read “until this devstack finally burns successfully”15:54
dtantsur++++15:54
dtantsurand now folks I can tell you a downside of using clouds: you cannot kick a server when it does not behave :)15:54
etingoflong-distance relationship can be tough15:56
dtantsurlol15:56
rpittau"punch-as-a-service" could become a thing15:57
dtantsurthat could be an ironic feature15:57
dnuka:D15:57
larsksHey folks; I've had this happen several times now: I'm booting a baremetal server with nova.  It powers up, starts to boot, then Nova logs: "Instance shutdown by itself. Calling the stop API."  And powers off the node.16:02
larsksNova is showing that the power_state seen from ironic is still "power on", so I'm not sure why it thinks the instance shut down.16:02
TheJulia\o/ for race conditions16:04
TheJuliahmm16:04
*** sdake has joined #openstack-ironic16:06
larsksTheJulia: was that for me?16:06
openstackgerritIlya Etingof proposed openstack/sushy-tools master: Limit instances exposure  https://review.openstack.org/616516 16:06
TheJulialarsks: on a call with my team. It sounds like nova is seeing it down during one of the deployment tranitions, and kills it after deployment16:06
TheJulialarsks: a very detailed sequence of events from your logs would really help us16:07
*** ianychoi has quit IRC16:08
larsksTheJulia: nova logs (grep <instance_uuid> nova/*.log) https://termbin.com/ef9i, from ironic (grep <node_uuid> ironic/*.log): https://termbin.com/6wu5 16:09
dtrainorI reinstalled my Undercloud this morning thinking maybe I broke an undercloud.conf while installing, in an attempt to fix this issue I'm having with trying to introspect this overcloud node.  looks like there's no change in my problem since yesterday - tcpdump from the undercloud shows the node attempting to grab an IP, but the undercloud never answers16:09
larsksThe only thing that stands out to me is that "Instance shutdown by itself. Calling the stop API. Current" error.16:10
*** mjturek has quit IRC16:10
dtrainori went through the troubleshooting steps that hjensas helped me with yesterday and I still see only two IPv4 IPs and one IPv6 IP on br-ctlplane, hjensas said that there should be three IPs - undercloud_public_host, undercloud_admin_host, and (I forgot the other)16:11
*** w14161_1 has quit IRC16:11
*** hjensas has joined #openstack-ironic16:12
dtrainorspeak of the devil.16:12
dtantsuretingof: it seems like we broken something around vbmc in devstack, at least on centos. I'm seeing "BMC instance node-0 already running"16:13
*** mjturek has joined #openstack-ironic16:13
dtrainorhey there hjensas, suppose i can borrow some more of your time this morning?  still got that issue going on where a node fails to grab an IP as it starts to pxeboot16:13
* dtantsur hopes to solve his problems with || true16:14
TheJuliadtantsur: +++++++ on || true16:15
*** dustinc has quit IRC16:16
*** samc-bbc has joined #openstack-ironic16:17
etingofdtantsur, I have a guess - maybe that happens because with the latest vbmc we do not need to [re]start each vbmc domain, just start the master process. Maybe devstack is still doing things like 'vbmc start node-X'?16:21
dtantsuretingof: it does indeed. I wonder why the gate is fine with it.16:21
etingofdtantsur, maybe it's not treated as an error by the gate?16:21
TheJuliaif not already running, seems fine, if it has to start it goes *boom*16:21
dtantsurTheJulia: well, it's not running initially. I don't know why it fails for me..16:22
* dtantsur suspects black magic and holds his borrowed torch tighter16:22
TheJuliadtantsur: same for me on my xenial vm16:23
dtantsurhmmm16:23
TheJuliadtantsur: it is a magic torch, light or fire16:23
dtantsurso, || true for the win? :)16:24
dtantsuretingof: could you check if we can actually remove the start command now?16:24
etingofdtantsur, from devstack?16:24
dtantsuretingof: yeah16:24
etingofdtantsur, I will take a look16:25
dtantsurthnx16:25
openstackgerritMerged openstack/ironic master: Remove duplicated jobs and refactor jobs  https://review.openstack.org/63010016:26
etingofdtantsur, alternatively, maybe we could change vbmc to ignore those start commands...? to retain backward compatibility16:26
dtantsurup to you. I think I don't have enough background on what has changed.16:26
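A sketch of the workaround under discussion, assuming a virtualbmc recent enough to ship the vbmcd master process; the domain name is a placeholder.

```
vbmcd || true               # make sure the master process is up (ignore "already running")
vbmc start node-0 || true   # tolerate "BMC instance node-0 already running" from older scripts
vbmc list                   # confirm each BMC is running and on which port
```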
*** dnuka has quit IRC16:33
*** sdake has quit IRC16:46
*** e0ne has joined #openstack-ironic16:47
*** sdake has joined #openstack-ironic16:48
*** e0ne_ has quit IRC16:48
*** dustinc has joined #openstack-ironic16:53
*** gyee has joined #openstack-ironic16:55
*** sdake has quit IRC16:58
*** _fragatina has joined #openstack-ironic16:59
*** _fragatina has quit IRC17:00
*** tssurya has quit IRC17:06
NobodyCamGood Morning Ironic'ers17:15
dtantsurmorning NobodyCam17:15
NobodyCamhey hey dtantsur :) happy hump day!17:16
rpittauhey NobodyCam :)17:16
*** dustinc has quit IRC17:20
*** mjturek has quit IRC17:20
openstackgerritHarald Jensås proposed openstack/ironic master: API - Implement /events endpoint  https://review.openstack.org/631946 17:21
larsksTheJulia: looking more at that "Instance shutdown by itself" error, it looks like Nova thinks that vm_power_state is power_state.SHUTDOWN...but that only happens if Ironic reports ironic_states.POWER_OFF...but the logs show ironic reporting power_state="power on" immediately afterwards. Is this just a timing issue of some sort?17:21
*** dustinc has joined #openstack-ironic17:22
larsksLooking at the ironic logs, ironic reports "Successfully set node power state to power on by power on" two seconds before Nova decides to shut things down.17:23
rpittaubye all! good night! o/17:23
TheJulialarsks: yeah, that's it, it observed it mid-deploy17:23
openstackgerritHarald Jensås proposed openstack/python-ironicclient master: Add Events support  https://review.openstack.org/345934 17:24
*** rpittau has quit IRC17:24
TheJuliaI thought we had code to kind of guard against that update happening, at least during deployment....17:24
*** dustinc has quit IRC17:24
* TheJulia wonders if that changed17:24
rpiosoijpascual: We see the PERC H740 in the dmesg log -- [    7.458186] pci 0000:18:00.0: [1000:0016] type 00 class 0x010400. However, there does not seem to be a driver for it, e.g., megaraid_sas.17:24
*** dustinc has joined #openstack-ironic17:24
larsksWhich update?  I mean, it makes sense that nova is checking the power state at this stage, but it is somehow checking too early or getting stale information or...something?17:25
*** e0ne has quit IRC17:25
*** dustinc has quit IRC17:25
*** dustinc_ has joined #openstack-ironic17:26
larsksIt's odd though, because that "Triggering sync for uuid dcb4f055-cda4-4d61-ba8f-976645c4e92a _sync_power_states" happens *after* ironic reports that successful power on.17:26
rpiosoijpascual: We don't know who created the ramdisk or how it was created. Perhaps they have to include megaraid_sas in it somehow or bump the version of it if it's already there.17:27
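A few quick checks one could run from the IPA ramdisk console (standard Linux tooling, assuming console or SSH access to the ramdisk) to confirm whether the megaraid_sas driver is present and bound:

```
dmesg | grep -i -e megaraid -e perc   # did the kernel see and claim the controller?
lsmod | grep megaraid_sas             # is the module loaded?
modinfo megaraid_sas                  # is the module shipped in the ramdisk at all, and which version?
lsblk                                 # does the RAID virtual disk show up as a block device?
```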
*** dustinc_ has quit IRC17:27
*** dustinc has joined #openstack-ironic17:27
*** dustinc has quit IRC17:27
TheJulialarsks: if memory serves, and I suspect jroll will remember better than me since he has looked at that code more than I, but I think that sync is actually from a list that could have slightly stale data17:27
*** dustinc has joined #openstack-ironic17:27
*** dustinc has quit IRC17:28
*** dustinc has joined #openstack-ironic17:28
jrollTheJulia: larsks: there's definitely a risk of a race condition there, iirc17:29
jrollwhat version of nova/ironic is this?17:29
*** dustinc has quit IRC17:29
*** dustinc has joined #openstack-ironic17:29
larsksjroll: This is a recent delorean...so, master from around 1/23.  Ironic is at commit d057591.17:30
larsksI seem to be hitting this issue pretty reliably.17:30
*** dustinc has quit IRC17:30
jrollnova master too?17:30
*** dustinc has joined #openstack-ironic17:30
larskswait, sorry, that was the commit for the inspect client. ironic-api is at 4404292.17:30
larsksYeah, Nova from about the same time.17:31
jrollok17:31
*** _fragatina has joined #openstack-ironic17:31
* jroll pokes around code17:31
larsksNova-api is at ad842aa.17:31
*** hjensas has quit IRC17:33
*** rloo has quit IRC17:33
*** rloo has joined #openstack-ironic17:34
*** dtantsur is now known as dtantsur|afk17:37
dtantsur|afko/17:37
larsksjroll: I guess the call to self._sync_power_pool.spawn_n is asynchronous, so maybe Nova checks the power state before the sync has completed?17:37
jrolllarsks: yeah, that's my assumption. we use a short-lived cache for this: https://github.com/openstack/nova/blob/master/nova/virt/ironic/driver.py#L891 17:39
openstackgerritIlya Etingof proposed openstack/sushy-tools master: Add memoization to expensive emulator calls  https://review.openstack.org/612758 17:39
openstackgerritBob Fournier proposed openstack/ironic-inspector master: Use processed bool as key in introspection_data DB table  https://review.openstack.org/636692 17:40
jrolllarsks: so there's a small window where you could go in order of: deployment starts -> server shuts down for reboot -> cache updated -> deployment completes and server powered on -> power sync calls shutdown17:41
larsksjroll: if it helps, I posted links earlier to nova + ironic logs around this event.17:41
jrolllarsks: yep, have them open17:41
TheJuliaShould we consider special casing cache creation if in deployment to just mark the machine power as on?17:41
jrolllarsks: this is the change where we started using a cache: https://review.openstack.org/#/c/602127/ 17:42
patchbotpatch 602127 - nova - ironic: stop hammering ironic API in power sync loop (MERGED) - 2 patch sets17:42
jrollTheJulia: I'm not sure, that just feels like piling on hacks17:44
TheJuliathe alternative is to lie about the power state in our API17:44
TheJuliaThere is a case where we do that...17:44
* TheJulia tries to remember17:44
TheJuliaor we talked about it17:45
jrollwell, a less lie-ful option would be...17:45
jrollhttps://github.com/openstack/nova/blob/master/nova/compute/manager.py#L7613 17:45
jrollif we make it here, since it's a wtf situation17:45
jrollwe could hit ironic and make real sure that's what happened17:45
jrolle.g. call get_info(use_cache=False) or so17:46
*** mjturek has joined #openstack-ironic17:46
jrollI can put up a POC patch if someone wants to take it over if we go that way17:46
TheJuliaThat actually seems like a much better idea if the nova folks are good with that.17:47
TheJuliasince it is a "double-check" of the state17:48
jrollit's either that or go back to hammering the crap out of ironic api17:48
TheJuliayeah, which is not ideal either17:48
TheJuliaWorst comes to worst, a poc patch would start the discussion in nova17:48
TheJulia++++17:48
larsksjroll: I'm happy to test out any patches.17:48
larsks...although since this only happens "sometimes" testing will be a PITA.17:49
jrollheh, right17:49
* jroll will hack something17:49
openstackgerritIlya Etingof proposed openstack/sushy-tools master: Limit instances exposure  https://review.openstack.org/616516 17:51
*** derekh has quit IRC17:55
larsksSo, another timing question: when deleting a baremetal node (openstack server delete...), the instance stays in state ACTIVE even after the ironic state has moved to "cleaning".  Should nova just let go as soon as the state != active?17:57
larsksI ask because we're consistently seeing instances go to ERROR state when deleting them, so we have to delete them a second time, and I think it's because nova times out waiting on ironic.17:58
*** dougsz has quit IRC17:58
larsksOh, no, look at that; it's a traceback...17:59
larsksSo, nova is able to successfully de-allocate the network, and then it logs:18:00
larsksConflict: Node a00696d5-32ba-475e-9528-59bf11cffea6 can not be updated while a state transition is in progress18:00
larsks(logs @ https://termbin.com/qldx)18:00
jrolllarsks: TheJulia: something like this should do it (I haven't tested it, it may just blow up): https://review.openstack.org/636699 18:03
patchbotpatch 636699 - nova - ironic: check fresh data when sync_power_state doe... - 1 patch set18:03
jrollit'll need a proper bug filed in launchpad18:04
larsksjroll: I'll go file a bug.18:04
jrolland I might not be able to finish it out this week. can come back to it later, or someone can take it over18:04
jrollthanks larsks ! :)18:04
larsksjroll: https://bugs.launchpad.net/nova/+bug/1815791 18:10
openstackLaunchpad bug 1815791 in OpenStack Compute (nova) "Race condition causes Nova to shut off a successfully deployed baremetal server" [Undecided,New]18:10
jrollcool, adding to that patch18:10
jrollthanks18:10
larsksjroll: do you have any thoughts on the conflict in https://termbin.com/qldx? 18:10
jrolllarsks: I haven't begun to think about that yet :)18:10
larsksFair enough.18:10
jrollgive me a few and I can try to look, it seems familiar18:12
larsksSure, no rush.18:12
*** moshele has joined #openstack-ironic18:17
*** sdake has joined #openstack-ironic18:19
openstackgerritMerged openstack/sushy master: Move to zuulv3  https://review.openstack.org/632692 18:22
*** e0ne has joined #openstack-ironic18:22
*** ijw has joined #openstack-ironic18:23
*** betherly has joined #openstack-ironic18:25
TheJulialarsks: networking removal during instance tear-down if my glance at that log is correct?18:29
*** betherly has quit IRC18:29
larsksTheJulia: I thought the networking removal was successful ("Took 1.71 seconds to deallocate network for instance")18:30
*** Chaserjim has quit IRC18:30
*** dustinc has quit IRC18:30
TheJulialarsks: there are two parts to it, one in neutron and one in ironic... ironic's can be delayed...18:30
*** mjturek has quit IRC18:32
larsksTheJulia: you're correct; it looks like it may be coming from _try_deallocate_network18:32
larsksIs this a case where Nova should just retry instead of immediately going to ERROR state?18:32
TheJuliaYeah, both nova and ironic try to nuke out some network config in ironic18:32
TheJuliaeither is allowed to win, nova will just try for a while18:32
TheJuliatl;dr we should really suppress the exception18:33
larsks...except it's not; as soon as it throws this error, it sets the instance to error and gives up.18:33
TheJuliawut?!?18:33
TheJuliathats not right18:33
TheJuliaugh18:33
larsksTheJulia: "Successfully reverted task state from deleting on failure for instance."18:33
*** trown is now known as trown|lunch18:34
larsksAnd then it shows up as "ERROR" in "openstack server list".18:34
larsksA subsequent "... server delete" will actually delete it.18:34
jungleboyjHey, couple of questions for you guys.18:34
*** moshele has quit IRC18:34
TheJuliamelwitt: sorry to tag you, but does ^^^ surface any memories of anything?18:34
* TheJulia shoudl have been less vague... changes in nova to related areas18:35
TheJuliajungleboyj: shoot, expect delays in replies today18:35
jungleboyjOk.  I think it is pretty simple TheJulia18:35
TheJuliaand does it involve cloning me?18:35
*** amoralej is now known as amoralej|off18:35
jungleboyjDoes Ironic require MAC addresses for the imported baremetal nodes for PXE boot to reliably work?18:36
jrolljungleboyj: yes18:36
jrollPXE is a DHCP thing, and DHCP works on MAC addresses18:36
jroll(inspector can automatically find mac addresses for you, but they are required)18:36
jungleboyjjroll: Excellent.  My team in China has been fighting with me over this for weeks.  For some reason they see it work without MAC addresses, so they think, but it never works for me.18:37
jrollheh18:37
*** sdake has quit IRC18:37
jrolljungleboyj: *technically* it can work with some other kind of address similar to a mac address that infiniband uses, but... yeah.18:37
jungleboyjI am guessing that they are booting into an existing image or getting a cached result or something and not verifying it really works.18:38
TheJuliasure sounds like it18:38
*** sunnaichuan has quit IRC18:38
jungleboyjIt is a pet peeve for me as when I first started using Ironic I assumed they were right and spent a good few days beating my head against it.  Sniffing network traffic and stuff.18:38
jungleboyjThank you for confirming my findings.18:39
jungleboyjTheJulia:  I am working on cloning.  Will share when I get it working.  :-)18:39
TheJuliajungleboyj: make sure they don't have a mac after the fact...18:39
jungleboyjTheJulia:  What do you mean?18:39
TheJuliajungleboyj: it _is_ possible, if they have modified the base config... for inspector to be quietly invoked and ports created18:39
TheJuliabut we don't test that in CI or anything18:40
jungleboyjTheJulia:  I don't think they did that.  This is just using director to do RHOSP deployment.18:40
TheJuliaoh yeah, then yeah.. they absolutely need mac addresses if they are not explicitly doing an inspection18:40
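A minimal enrollment sketch showing where the MAC goes; the driver choice and every value here are placeholders, and flavor/deploy-image setup is assumed elsewhere.

```
openstack baremetal node create --driver ipmi --name bm-node-0 \
    --driver-info ipmi_address=10.0.0.10 \
    --driver-info ipmi_username=admin \
    --driver-info ipmi_password=secret

# The PXE NIC's MAC is registered as a port on the node; DHCP/PXE matches on it:
openstack baremetal port create 52:54:00:aa:bb:cc \
    --node "$(openstack baremetal node show bm-node-0 -f value -c uuid)"
```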
jrolllarsks: do you have ironic conductor logs to go with those nova logs?18:41
jungleboyjTheJulia:  Thank you.  My other questions are lower priority.  Will ping you some time when you are less busy.18:41
larsksjroll: sure :)18:41
jungleboyjjroll: Thanks!18:41
larsksOne second...18:41
jrolljungleboyj: feel free to ask, someone will answer when they get time :)18:41
TheJuliajungleboyj: what jroll said :)18:42
jungleboyjjroll: Ok, will do when I have a few minutes.  Thanks.18:42
larsksjroll: logs from slightly before/after the Conflict error in the nova logs (limited to just the baremetal node in question): https://termbin.com/8ebp 18:43
jrollthanks18:44
jrollhm, looks like it should have been in CLEANWAIT18:45
openstackgerritMerged openstack/ironic-inspector master: Introspection data storage backend follow up  https://review.openstack.org/632969 18:45
*** ijw has quit IRC18:46
*** betherly has joined #openstack-ironic18:47
*** betherly has quit IRC18:47
larsksjroll: should I open a bug for this, also?18:49
jrolllarsks: yeah, I think so18:49
jrollit makes sense why this is breaking the way it is18:50
jrollit just doesn't make sense how we got here18:50
jrollor something18:50
* jroll might not be at full brain capacity right now18:50
jrollsorry, I've got nothing right now. :(18:51
*** sri_ has quit IRC18:52
* jroll steps away for a bit18:52
*** sri_ has joined #openstack-ironic18:52
TheJulianodes going through deletion into cleanwait remain locked for a while.19:01
TheJuliaThere is a specific reason that my brain is failing to recall at the moment19:01
jungleboyjSo the kind of generic question I had was whether there have ever been proposals in the past for Ironic to be enhanced to do more system lifecycle management?  Firmware updates, etc?19:03
jungleboyjI know that that isn't the goal of Ironic, but just wondering if anyone has ever pursued it in the past.19:04
openstackgerritMark Goddard proposed openstack/ironic master: WIP: Prepare for instance with power off  https://review.openstack.org/636720 19:04
larsksjroll: fyi, https://bugs.launchpad.net/nova/+bug/1815799 19:09
openstackLaunchpad bug 1815799 in OpenStack Compute (nova) "Attempting to delete a baremetal server places server into ERROR state" [Undecided,New]19:09
jrolllarsks: thanks19:10
jrolljungleboyj: yeah, definitely has been proposed. some of it has been done - for example, the ilo driver has firmware update capabilities, but the admin has to specify the firmware to write and such19:16
jrolle.g. https://storyboard.openstack.org/#!/story/1526216 19:16
jrollwe haven't done full blown lifecycle automation, though... it may or may not be in scope, but regardless is a ton of work with lots of weird corner cases and such19:18
jrollI'd love to see a CMDB-ish thing which handles tracking lifecycle things, and tells ironic to do things like update firmware to version X19:18
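For reference, the ilo firmware-update capability mentioned above is driven through manual cleaning; a hedged sketch follows. The step arguments are from memory of the ilo driver docs and should be checked there, all values are placeholders, and the node must be in the manageable state first.

```
openstack baremetal node clean bm-node-0 --clean-steps '[{
  "interface": "management",
  "step": "update_firmware",
  "args": {
    "firmware_update_mode": "ilo",
    "firmware_images": [{
      "url": "http://example.com/firmware/ilo_fw.bin",
      "checksum": "<checksum-of-the-file>",
      "component": "ilo"
    }]
  }
}]'
```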
larsksOpinion question that I think was lost earlier: right now, it looks like Nova keeps a baremetal instance in the ACTIVE state after it's entered the cleaning step on the ironic side.  (a) is this intentional, and (b) does it make sense?  Or should Nova consider the server "deleted" as soon as it leaves the "active" state on the ironic side?19:21
jrollah, I thought that was superceded by the errors you found :)19:22
TheJulialikewise, I'd prefer to keep ironic much more of a swiss army knife of sorts19:22
jrolllarsks: we do consider the nova instance gone as soon as it gets into a cleaning state, so not sure what's happening on your system. see https://github.com/openstack/nova/blob/master/nova/virt/ironic/driver.py#L1170-L1181 19:23
larsksjroll: huh, okay.  I'll take a closer look, then.19:23
larsksI wonder if this is really just a symptom of that conflict thing.19:24
jrolllarsks: it might have something to do with... yeah19:24
jrollI think it is, that exception is *just* before we return to the higher level in nova that marks the instance deleted19:24
larsksOkay, that makes sense.19:24
jrollthe block I linked is inside here: https://github.com/openstack/nova/blob/master/nova/virt/ironic/driver.py#L1238 19:25
jrollyour exception is here: https://github.com/openstack/nova/blob/master/nova/virt/ironic/driver.py#L1245 19:25
jrolland then we pop back up to the common nova layer19:25
larsksGot it.19:25
*** ijw has joined #openstack-ironic19:26
jroll(which is here, if you're curious: https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2635 see the vm_state change shortly after)19:26
larsksThanks! This has all been very educational :)19:27
openstackgerritMark Goddard proposed openstack/ironic master: Deploy templates: data model, DB API & objects  https://review.openstack.org/627663 19:28
*** ijw has quit IRC19:28
jrollyou're welcome :)19:31
openstackgerritBob Fournier proposed openstack/ironic-inspector master: Use processed bool as key in introspection_data DB table  https://review.openstack.org/636692 19:31
openstackgerritBob Fournier proposed openstack/ironic-inspector master: Use processed bool as key in introspection_data DB table  https://review.openstack.org/636692 19:33
larsksIf I create a volume from an image that specifies a separate kernel and ramdisk, those properties aren't preserved on the volume. I see that I can set ramdisk and kernel in a baremetal node's instance_info, but is there any way to either set them per-volume or to provide them when booting a new baremetal instance via nova?19:38
* jroll has no idea, but if they aren't preserved on the volume, that feels like a bug19:40
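A hedged workaround sketch for the case larsks describes: copy the kernel/ramdisk references onto the volume's image metadata, or set them directly in the node's instance_info. The option names here are from memory and worth verifying against the client versions in use; all names and IDs are placeholders.

```
# Re-attach the image properties to the boot volume:
openstack volume set my-boot-volume \
    --image-property kernel_id=<kernel-image-uuid> \
    --image-property ramdisk_id=<ramdisk-image-uuid>

# ...or set them per node in instance_info:
openstack baremetal node set bm-node-0 \
    --instance-info kernel=<kernel-image-uuid> \
    --instance-info ramdisk=<ramdisk-image-uuid>
```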
larsksI left the launchpad tab open this time...:)19:43
*** irclogbot_1 has quit IRC19:47
*** e0ne has quit IRC19:47
*** trown|lunch is now known as trown19:52
*** bfournie has quit IRC19:53
*** arne_wiebalck_ has joined #openstack-ironic19:56
*** ijw has joined #openstack-ironic19:57
*** irclogbot_1 has joined #openstack-ironic20:00
*** ijw has quit IRC20:01
jungleboyjjroll: TheJulia  Ok, so enhancements that would support more lifecycle management to Ironic would not be rejected.20:02
jungleboyjOr seen as out of Ironic's scope.20:02
jrolljungleboyj: some may be out of scope, but it's worth having the discussion20:03
jrollcan always file a short RFE and go from there20:03
TheJuliajungleboyj: and driver enhancements for things like firmware/settings would not be rejected if they are already in ironic in other drivers20:04
*** _fragatina has quit IRC20:05
jungleboyjAwesome.  That was a much more concise discussion than I expected.20:05
TheJulialarsks: so partition/filesystem image?20:07
TheJulialarsks: it sounds like the best route is burning into the image20:10
* TheJulia might be confused20:10
larsksTheJulia: partitioned images with a bootloader work fine. But we would also like to deliver kernel and ramdisk via pxe and just have the kernel mount root via is so.20:13
larsksArg, iscsi.20:13
larsksSorry phone20:13
larsksIn theory that would allow mounting root from any device supported by the kernel, not just iscsi.20:19
TheJulialarsks: in "theory". In practice... rbd is problematic20:23
larsksI wasn't even thinking specifically about that, but sure.20:24
TheJuliarealistically, booting from parameters on the  node instance_info is after the image since the image... at least with the bfv work done so far, is geared to be agnostic20:24
TheJuliaso on image boot loader, which Ironic doesn't have the knowledge of or ability to modify as-is20:25
larsksI think we're talking about slightly different things. I have to run off for a doctor's appointment right now; let me get back to you20:29
melwittTheJulia: sorry for the delayed reply, but in the traceback I see that during ironic driver.destroy(), it's hitting the conflict 409 when it tries to _cleanup_deploy and subsequently _remove_instance_info_from_node. the self.ironicclient.call('node.update' experiences a conflict 409. if it's appropriate to retry in that case, it seems like something to add to the ironic driver destroy() method, IIUC20:50
*** samc-bbc has quit IRC20:58
*** jtomasek has quit IRC21:02
TheJulialarsks: possibly. Enjoy21:17
TheJuliamelwitt: Thanks. I suspect tolerating is likely fine. One of those things we mutually try to remove/clean-up. We could always consider removing the nova side since the contract is fairly set with ironic removing the field. Although... I've had people disagree from the other point of view *sigh*21:21
melwittTheJulia: not sure I understand. you mean an alternative approach being to do something on the ironic side to avoid returning 409 in the first place?21:23
TheJuliamelwitt: to tell nova not to try and perform the delete of the field anymore....21:23
TheJuliabut I need to look at the code and validate that we can do that on the nova side21:23
TheJuliaI think we pop the fields on ironic's side as a very very very last step21:23
melwittdelete the field == node.update?21:24
TheJuliaI may also have lost my mind, and may be rocking in the corner21:24
TheJuliamelwitt: yes21:24
jrollTheJulia: I think we have that in _cleanup_deploy() to ensure it happens in exceptional cases (failed deployments, etc) where ironic might not hit that code path21:24
melwittok21:24
TheJuliajroll: oh, right21:24
TheJuliagood point, if it fails super early on21:24
* TheJulia goes back to rocking in the corner21:24
*** ijw has joined #openstack-ironic21:30
*** whoami-rajat has quit IRC21:32
*** arne_wiebalck_ has quit IRC21:36
*** _fragatina has joined #openstack-ironic21:45
*** ijw has quit IRC21:51
*** trown is now known as trown|outtypewww22:06
*** ijw has joined #openstack-ironic22:16
*** openstackgerrit has quit IRC22:22
*** eandersson has quit IRC22:33
*** ijw has quit IRC22:37
*** ijw has joined #openstack-ironic22:38
*** ijw has quit IRC22:42
*** openstackgerrit has joined #openstack-ironic22:48
openstackgerritJulia Kreger proposed openstack/ironic-inspector master: Allow processing power_off to be defined  https://review.openstack.org/636774 22:48
openstackgerritHamdy Khader proposed openstack/python-ironicclient master: Add is-smartnic port attribute to port command  https://review.openstack.org/629449 22:54
openstackgerritHamdy Khader proposed openstack/python-ironicclient master: Add 'hostname' to port's local link connection  https://review.openstack.org/628773 22:54
*** eandersson has joined #openstack-ironic22:59
openstackgerritJulia Kreger proposed openstack/ironic-inspector master: Add ironic API url to inspector IPA config  https://review.openstack.org/636778 23:13
openstackgerritJulia Kreger proposed openstack/ironic master: fast tracked deployment support  https://review.openstack.org/635996 23:24
*** hwoarang has quit IRC23:32
*** hwoarang has joined #openstack-ironic23:33
*** hwoarang has quit IRC23:48
*** hwoarang has joined #openstack-ironic23:49
*** w14161_1 has joined #openstack-ironic23:59
