*** pmannidi is now known as pmannidi|brb | 01:11 | |
*** pmannidi|brb is now known as pmannidi | 01:28 | |
*** pmannidi is now known as pmannidi|Lunch | 05:07 | |
*** pmannidi|Lunch is now known as pmannidi | 05:41 | |
arne_wiebalck | Good morning, Ironic! | 06:23 |
---|---|---|
jssfr | Good Morning! | 06:23 |
iurygregory | good morning arne_wiebalck jssfr and Ironic o/ | 06:38 |
arne_wiebalck | hey iurygregory o/ | 06:38 |
dtantsur | morning ironic | 08:57 |
iurygregory | good morning dtantsur | 08:57 |
kamlesh6808c | Hi Team, is pxe boot meant to be working with UEFI mode? | 09:59 |
dtantsur | kamlesh6808c: yes, although it requires different configuration (see the docs) | 10:00 |
kamlesh6808c | dtantsur: I have followed the https://docs.openstack.org/ironic/latest/install/configure-pxe.html#uefi-pxe-grub-setup doc for configuration; during the boot process it waits forever for the netboot image. Any suggestions to overcome this? | 10:03 |
dtantsur | kamlesh6808c: it may be a networking or firewall problem. Use tcpdump to trace all packets to where they are lost or not responded to. | 10:04 |
opendevreview | Dmitry Tantsur proposed openstack/ironic master: Log traceback in fail_on_error https://review.opendev.org/c/openstack/ironic/+/807105 | 10:17 |
dtantsur | janders: ^^^ | 10:17 |
opendevreview | Dmitry Tantsur proposed openstack/ironic master: DNM testing fail_on_error https://review.opendev.org/c/openstack/ironic/+/807107 | 10:18 |
janders | dtantsur a big +1 | 10:20 |
janders | :) | 10:20 |
iurygregory | dtantsur, tks for the review in the httpheaders patch =) | 10:22 |
kamlesh6808c | dtantsur: for bios boot mode it's working. | 10:56 |
kamlesh6808c | issue is with UEFI boot mode only | 10:57 |
opendevreview | Merged openstack/bifrost master: Add support for being dhcp relay target https://review.opendev.org/c/openstack/bifrost/+/804482 | 11:02 |
janders | iurygregory thank you for your comments in iDRAC / LC reset patch | 11:53 |
janders | hope to fix these up tomorrow | 11:53 |
janders | see you tomorrow Ironic o/ | 11:53 |
iurygregory | janders, take your time o/ bye! | 11:54 |
opendevreview | Merged openstack/ironic master: Revert "Allow reboot to hard disk following iso ramdisk deploy." https://review.opendev.org/c/openstack/ironic/+/805284 | 12:33 |
opendevreview | Merged openstack/ironic-python-agent master: Check the network burnin roles and partner https://review.opendev.org/c/openstack/ironic-python-agent/+/804092 | 12:43 |
opendevreview | Dmitry Tantsur proposed openstack/ironic master: Improve edge-case debugging for deployment and cleaning https://review.opendev.org/c/openstack/ironic/+/807105 | 13:59 |
iurygregory | dtantsur, I just noticed I didn't add any documentation about using the redfish vendor passthru subscriptions, do we have a specific place where we add this? | 14:35 |
dtantsur | iurygregory: probably the redfish docs | 14:49 |
dtantsur | https://docs.openstack.org/ironic/latest/admin/drivers/redfish.html | 14:49 |
iurygregory | dtantsur, ack | 14:55 |
sshnaidm | dtantsur, hey, wrt https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/806809 - I'm going on a long PTO starting tomorrow, so I won't be able to work on it soon, maybe stevebaker will want to fix it :) | 15:00 |
dtantsur | ack, enjoy your pto! | 15:03 |
iurygregory | we need ipa-builder changes before 13Sep (so we can request a release) | 15:04 |
TheJulia | we'll need to release regardless and treat it as bug fixes because they will need to end up going back to Wallaby | 15:09 |
dtantsur | Support for an OS that is not even released is hardly a bug fix :) | 15:10 |
dtantsur | but I won't block that | 15:10 |
TheJulia | decisions far above my pay grade | 15:10 |
dtantsur | well, here we make the decisions | 15:11 |
TheJulia | but honestly, the issues we're *aware* of are more like "oh, package changed" | 15:11 |
dtantsur | honestly, now you made me want to block it... | 15:11 |
iurygregory | oh god | 15:11 |
* dtantsur won't because he has more important things to care about | 15:11 | |
iurygregory | good morning TheJulia | 15:11 |
TheJulia | That is your prerogative, it is literally going to shoot your friends in the foot. | 15:12 |
dtantsur | Please take a step back. "Accept this change or our management will hurt us" is not an argument anyone should use. | 15:12 |
dtantsur | As I said, I'm not going to die on that hill, so please don't try to convince me otherwise. | 15:13 |
TheJulia | That's not what I suggested, we're going to be hampered and blocked | 15:13 |
TheJulia | Really, that's all. I have no desire to die on a hill either :) | 15:16 |
dtantsur | so make sure to approve that backport during your evening so that I don't see it :D | 15:16 |
iurygregory | hehehe | 15:17 |
iurygregory | omg | 15:17 |
TheJulia | lol | 15:17 |
iurygregory | dtantsur, you can close the irc client also | 15:17 |
iurygregory | and don't look at the irc logs | 15:17 |
TheJulia | lol | 15:17 |
TheJulia | iurygregory: So we arrange for tasty beer delivery.... | 15:18 |
dtantsur | :D | 15:18 |
TheJulia | See, he likes that idea :) | 15:18 |
dtantsur | I'll be camping in the woods at some point in 1-2 weeks | 15:18 |
dtantsur | you'll be able to land anything you want | 15:18 |
TheJulia | most excellent | 15:18 |
iurygregory | TheJulia, yeah, we can also disable the bot on the day so we don't know the changes that got merged LOL | 15:18 |
dtantsur | :) | 15:18 |
TheJulia | iurygregory: now that is just evil ;) | 15:19 |
dtantsur | I also intend to disable internet on my phone and only use it for navigation/emergency | 15:19 |
iurygregory | https://media.giphy.com/media/nJmROmxdUuLFC/giphy.gif | 15:21 |
dtantsur | what a PTL have we got this time :D | 15:22 |
TheJulia | one who posts awesome gifs of plotting evil?! | 15:23 |
dtantsur | wait, it wasn't a selfie? | 15:23 |
dtantsur | sorry | 15:23 |
TheJulia | lol | 15:23 |
iurygregory | we still plan to take over the world right? :D | 15:23 |
TheJulia | I think so yes | 15:23 |
iurygregory | dtantsur, LOL | 15:23 |
iurygregory | so I still need to follow some guidelines =D | 15:24 |
dtantsur | yeah, TheJulia is your World Takeover Liaison | 15:24 |
iurygregory | awesome \o/ | 15:24 |
dtantsur | I don't remember if this position was accepted by the TC or not... but we'll take them over anyway, so who cares? | 15:24 |
iurygregory | ++ | 15:24 |
TheJulia | dtantsur: we just overthrow everything and install cats as our supreme overlords | 15:24 |
TheJulia | That way, everyone can become crazy cat people | 15:25 |
dtantsur | I think from the point of view of cats the "install" part is a bit naive | 15:25 |
dtantsur | because it assumes they haven't been the overlords from the beginning | 15:25 |
TheJulia | lol, true | 15:25 |
dtantsur | but yeah, let's formalize it finally | 15:25 |
TheJulia | exactly! | 15:25 |
iurygregory | do we need a spec? | 15:25 |
iurygregory | or have this in our docs? :D | 15:26 |
dtantsur | let's start with an RFE | 15:26 |
iurygregory | makes sense | 15:26 |
TheJulia | rpioso: that bz we discussed yesterday. I took a look at the extra information supplied and compared it to the logs, and it appears the power sync state enforcement setting has been disabled, so the power off at the end of the configuration job is never followed by a power on | 15:29 |
TheJulia | well, we go to turn it on, and find the machine already on, and then later on it gets turned off by the config job | 15:29 |
TheJulia | and because the sync has been disabled, it just records that it is off and moves on | 15:29 |
TheJulia | but that is well after deployment has been completed. | 15:29 |
arne_wiebalck | bye everyone o/ | 15:48 |
* dtantsur sees a cleaning with empty steps and goes wtf | 15:49 | |
dtantsur | okay, cleaning is still broken with the ramdisk deploy. lovely. | 15:53 |
rpioso | TheJulia: We confirmed we do not modify [conductor] force_power_state_during_sync on the Director. That knob is not in /var/lib/config-data/puppet-generated/ironic/etc/ironic/ironic.conf. | 16:46 |
dtantsur | rloo: do we expect the normal agent-based cleaning to work for the anaconda deploy? | 16:49 |
rloo | dtantsur: yes. i don't recall the details, but i remember jay saying that. i believe it should be set up to use agent-based cleaning. | 16:51 |
dtantsur | rloo: well, it's likely just as broken as the ramdisk deploy | 16:51 |
rloo | ugh :-( | 16:51 |
dtantsur | so okay, I think I'll fix both | 16:51 |
rloo | thx! | 16:52 |
dtantsur | we have too many mixins in agent_base... | 16:52 |
rloo | (good and a bad thing...) | 16:52 |
rpioso | TheJulia: Take that back :-/ We do see it there. It is set to False. | 16:52 |
dtantsur | I thought it was a Director-wide setting, but I may be confusing something | 16:53 |
rpioso | TheJulia: Interestingly, the same configuration works with 14th Generation. | 16:54 |
dtantsur | rpioso, TheJulia, this setting was there since forever: https://opendev.org/openstack/instack-undercloud/src/tag/newton-eol/elements/puppet-stack-config/puppet-stack-config.yaml.template#L387 | 16:56 |
dtantsur | (this is newton!) | 16:56 |
*** sshnaidm is now known as sshnaidm|off | 17:01 | |
rpioso | dtantsur: Thank you! | 17:03 |
dtantsur | see you tomorrow folks | 17:22 |
opendevreview | Dmitry Tantsur proposed openstack/ironic master: Fix in-band cleaning in the ramdisk deploy https://review.opendev.org/c/openstack/ironic/+/807187 | 17:23 |
TheJulia | rpioso: weird it worked but it is basically a giant race condition since we have no way of really knowing what, and how long it is going to take. If it lines up with power sync periodic triggering, I can see it being a thing. | 19:43 |
TheJulia | rpioso: could the newer generation be recording some state from start of the job that we're colliding with maybe? | 19:51 |
rpioso | TheJulia: Hrm ... The deployment of the instance OS seems to be successful. It is on the root device, a RAID volume. Powering on the system from the BMC causes it to boot into that OS. That suggests the boot device was successfully set to persistently be the disk. All that said, I have not yet seen a record which confirms the baremetal node reached the active state, so ... | 19:57 |
TheJulia | rpioso: The information we were provided with has a baremetal node list reporting all nodes active, but also powered off | 19:57 |
TheJulia | the logs, when sifting through the workflow, jive with that. We observe the node turning itself off... basically. We also observed it turning itself on before that. | 19:58 |
rpioso | TheJulia: We're puzzled by the observations that the node turns itself on and then turns itself off. Do you believe it's doing that on its own? | 20:03 |
opendevreview | Iury Gregory Melo Ferreira proposed openstack/ironic master: Fix typo and add subscription docs https://review.opendev.org/c/openstack/ironic/+/807213 | 20:05 |
rpioso | TheJulia: When the boot device is set to persistently be the disk, a BMC configuration job is created. The idrac-wsman management interface implementation waits for that job to be scheduled before returning to the caller. Note that it does not do anything to affect the power state of the system. The expectation is that the job will run when it is powered on. The system should not autonomously power itself on. | 20:09 |
rpioso | TheJulia: Since [conductor] force_power_state_during_sync is set to false, I am unclear on the prospective race condition. Please elaborate. | 20:14 |
rpioso | TheJulia: Just after ironic successfully deploys the instance, is the bare metal node turned off while it is booting into the activated instance? In other words, might it not be finished booting when it is powered off in preparation for performing network configuration? | 20:26 |
rpioso | TheJulia: In other words, is the system abruptly powered off while booting into the instance OS for the first time? If so, that power action may be coincidental with the iDRAC processing the configuration job to persistently set the boot device to disk. The iDRAC may proceed to complete the job by powering the server back on and executing it. When the job has been completed, it would power the system off. | 20:35 |
stevebaker | morning | 20:36 |
TheJulia | rpioso: I'm fairly convinced based upon the log entries I've seen | 20:46 |
TheJulia | rpioso: we explicitly turn it off to change the network addresses. When we go to turn it back on, the machine is already back on | 20:46 |
TheJulia | rpioso: a little while later, the machine is back off again completely outside of our actions, | 20:47 |
TheJulia | of course, also after deployment | 20:47 |
TheJulia | The power on is intended to drive the machine to boot the workload | 20:47 |
TheJulia | I think it is a race because prior tests reportedly resulted in randomized behavior | 20:48 |
TheJulia | by pure luck, every node this most recent time | 20:48 |
TheJulia | and since the idrac powers the machine back off, nothing is there to power it back on since we're past the deployment process. | 20:49 |
rpioso | TheJulia: I was told all of the overcloud servers in 2 clusters failed in the same way. There has been no successful automated installation. | 20:50 |
TheJulia | rpioso: the bz comments I have state otherwise, that they were able to ping some machines on prior runs | 20:50 |
rpioso | TheJulia: Please remind me ... does ironic boot the newly deployed and activated node into the instance OS? | 20:52 |
TheJulia | rpioso: it does it by power on, but if a node is *already* in that power state..... you can imagine what happens then | 20:53 |
TheJulia | power up to a powered-on machine is an effective non-operation | 20:54 |
TheJulia | if the setting is set to True, then that would at least end up getting the power back on, as opposed to just recording "oh, the machine is off now, let me update that in my database" | 20:57 |
TheJulia | rpioso: I'd be happy to sift through the logs with you and show you what I'm seeing tomorrow | 20:58 |
TheJulia | I've been looking at issues and logs all day, so I'm about out of spoons | 20:59 |
rpioso | TheJulia: Thank you for that offer. I accept :-) | 20:59 |
TheJulia | rpioso: Okay, just first thing-ish tomorrow I'll ping you once I have coffee in hand | 21:00 |
* TheJulia allows morbid curiosity to take over and looks at another bug.... and finds a comment which gives her hope for the universe | 21:01 | |
rpioso | TheJulia: I wonder what would happen if the amount of time it took to complete this processing substantially changed: https://github.com/openstack/ironic/blob/stable/train/ironic/drivers/modules/ipxe.py#L281-L282 | 21:07 |
opendevreview | Steve Baker proposed openstack/ironic master: WIP use packaged grub efi for network boot https://review.opendev.org/c/openstack/ironic/+/806808 | 21:08 |
TheJulia | rpioso: that has always been asynchronous in nature | 21:10 |
TheJulia | Or it returns immediately now... | 21:10 |
TheJulia | https://github.com/openstack/ironic/blob/1bdee995837c2511e1513cc0f5ac24a0d60963e8/ironic/drivers/modules/agent_base_vendor.py#L675 is where we see the power suddenly toggle unexpectedly as we wrap things up | 21:10 |
TheJulia | which is like the major step after | 21:10 |
TheJulia | BUT you have a good point, maybe set boot device for wsman could have been blocking on 14g | 21:10 |
opendevreview | Steve Baker proposed openstack/ironic master: WIP use packaged grub efi for network boot https://review.opendev.org/c/openstack/ironic/+/806808 | 21:15 |
rpioso | TheJulia: The call I linked causes this to execute: https://github.com/openstack/ironic/blob/master/ironic/drivers/modules/drac/management.py#L326-L335. That waits for the iDRAC configuration job to be scheduled. | 21:15 |
rpioso | TheJulia: The logic has remained the same for a long time. | 21:15 |
TheJulia | well, it might have always been scheduled | 21:15 |
rpioso | TheJulia: When that code was added, we found the configuration job wasn't scheduled, so it wasn't executed when the system was powered on. It skipped right over it. The system did not boot from the desired device. | 21:17 |
TheJulia | yeah, same. The overall path and behavior really hasn't changed, but tomorrow, I'll show you what I saw in the logs and we can walk through it | 21:18 |
rpioso | TheJulia: +1 | 21:19 |
opendevreview | Steve Baker proposed openstack/ironic master: WIP use packaged grub efi for network boot https://review.opendev.org/c/openstack/ironic/+/806808 | 22:31 |
opendevreview | Steve Baker proposed openstack/bifrost master: WIP support grub network boot https://review.opendev.org/c/openstack/bifrost/+/807220 | 22:59 |
tonyb | Can node cleaning be disabled per-node? | 23:16 |
stevebaker | tonyb: yes, by setting the automated_clean attribute on the node | 23:21 |
stevebaker | ...to falsw | 23:22 |
stevebaker | false | 23:22 |
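
Editor's sketch for the last exchange (tonyb / stevebaker): one way the per-node `automated_clean` setting could be changed is a JSON Patch sent to the node endpoint of the Ironic REST API. The snippet only builds the request and sends nothing. The API URL, the node name `node-0`, and the helper names are hypothetical; the `add` operation and the microversion `1.47` (believed to be the version that introduced the field) are assumptions to check against the API reference for your release.

```python
import json


def build_automated_clean_patch(enabled: bool) -> list:
    """Return a JSON Patch document that sets a node's automated_clean field."""
    # Assumption: "add" sets the field whether or not it currently has a value.
    return [{"op": "add", "path": "/automated_clean", "value": enabled}]


def build_request(api_url: str, node: str, enabled: bool) -> dict:
    """Describe the PATCH request without sending it (hypothetical helper)."""
    return {
        "method": "PATCH",
        "url": f"{api_url.rstrip('/')}/v1/nodes/{node}",
        "headers": {
            "Content-Type": "application/json",
            # Assumption: automated_clean needs at least this microversion.
            "X-OpenStack-Ironic-API-Version": "1.47",
        },
        "body": json.dumps(build_automated_clean_patch(enabled)),
    }


# Hypothetical endpoint and node name, used only for illustration.
request = build_request("http://ironic.example.com:6385", "node-0", False)
print(request["method"], request["url"])
print(request["body"])
```

Setting the value to `False` disables automated cleaning for that one node only; other nodes keep following the conductor-wide setting.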
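
Editor's sketch for the power sync discussion between TheJulia and rpioso: a toy model of the behavior TheJulia describes for `[conductor] force_power_state_during_sync`. When the setting is true, a mismatch is resolved by driving the hardware back to the recorded state; when false, the new state is simply recorded. The `Node` class and `sync_power_state` function are hypothetical and greatly simplified. This is not Ironic's implementation.

```python
from dataclasses import dataclass

POWER_ON = "power on"
POWER_OFF = "power off"


@dataclass
class Node:
    recorded_state: str  # what the database believes
    actual_state: str    # what the BMC reports


def sync_power_state(node: Node, force_power_state_during_sync: bool) -> str:
    """Reconcile recorded and actual power state once (toy model)."""
    if node.actual_state == node.recorded_state:
        return "in sync"
    if force_power_state_during_sync:
        # Enforce the recorded state on the hardware.
        node.actual_state = node.recorded_state
        return "forced hardware back to " + node.recorded_state
    # Only update the database to match what the hardware reports.
    node.recorded_state = node.actual_state
    return "recorded " + node.actual_state


# Scenario from the log: deployment finished with the node on, then the
# BMC configuration job powers the machine off outside the conductor's control.
relaxed = Node(recorded_state=POWER_ON, actual_state=POWER_ON)
relaxed.actual_state = POWER_OFF
print(sync_power_state(relaxed, force_power_state_during_sync=False))

enforced = Node(recorded_state=POWER_ON, actual_state=POWER_ON)
enforced.actual_state = POWER_OFF
print(sync_power_state(enforced, force_power_state_during_sync=True))
```

In the first case the node ends up active but powered off, which matches the `baremetal node list` output mentioned in the log; in the second the model powers it back on.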