Thursday, 2021-09-02

*** pmannidi is now known as pmannidi|brb01:11
*** pmannidi|brb is now known as pmannidi01:28
*** pmannidi is now known as pmannidi|Lunch05:07
*** pmannidi|Lunch is now known as pmannidi05:41
arne_wiebalckGood morning, Ironic!06:23
jssfrGood Morning!06:23
iurygregorygood morning arne_wiebalck jssfr and Ironic o/06:38
arne_wiebalckhey iurygregory o/06:38
dtantsurmorning ironic08:57
iurygregorygood morning dtantsur 08:57
kamlesh6808cHi Team, is pxe boot meant to be working with UEFI mode?09:59
dtantsurkamlesh6808c: yes, although it requires different configuration (see the docs)10:00
kamlesh6808cdtantsur: I have followed the https://docs.openstack.org/ironic/latest/install/configure-pxe.html#uefi-pxe-grub-setup doc for configuration, but during the boot process it waits forever for the netboot image. Any suggestions on how to overcome this?10:03
dtantsurkamlesh6808c: it may be some networking or firewall problems. use tcpdump to trace all packets to where they are lost/not responded to.10:04
opendevreviewDmitry Tantsur proposed openstack/ironic master: Log traceback in fail_on_error  https://review.opendev.org/c/openstack/ironic/+/80710510:17
dtantsurjanders: ^^^10:17
opendevreviewDmitry Tantsur proposed openstack/ironic master: DNM testing fail_on_error  https://review.opendev.org/c/openstack/ironic/+/80710710:18
jandersdtantsur a big +110:20
janders:)10:20
iurygregorydtantsur, tks for the review in the httpheaders patch =)10:22
kamlesh6808cdtantsur: for bios boot mode it's working. 10:56
kamlesh6808cissue is with UEFI boot mode only10:57
opendevreviewMerged openstack/bifrost master: Add support for being dhcp relay target  https://review.opendev.org/c/openstack/bifrost/+/80448211:02
jandersiurygregory thank you for your comments in iDRAC / LC reset patch11:53
jandershope to fix these up tomorrow11:53
janderssee you tomorrow Ironic o/11:53
iurygregoryjanders, take your time o/ bye!11:54
opendevreviewMerged openstack/ironic master: Revert "Allow reboot to hard disk following iso ramdisk deploy."  https://review.opendev.org/c/openstack/ironic/+/80528412:33
opendevreviewMerged openstack/ironic-python-agent master: Check the network burnin roles and partner  https://review.opendev.org/c/openstack/ironic-python-agent/+/80409212:43
opendevreviewDmitry Tantsur proposed openstack/ironic master: Improve edge-case debugging for deployment and cleaning  https://review.opendev.org/c/openstack/ironic/+/80710513:59
iurygregorydtantsur, I just noticed I didn't add any documentation about using the redfish vendor passthru subscriptions, do we have a specific place where we add this? 14:35
dtantsuriurygregory: probably the redfish docs14:49
dtantsurhttps://docs.openstack.org/ironic/latest/admin/drivers/redfish.html14:49
iurygregorydtantsur, ack14:55
sshnaidmdtantsur, hey, wrt https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/806809 - I'm going on a long pto from tomorrow, so I won't be able to work on it soon, maybe stevebaker will want to fix it :)15:00
dtantsurack, enjoy your pto!15:03
iurygregorywe need ipa-builder changes before 13Sep (so we can request a release)15:04
TheJuliawe'll need to release regardless and treat it as bug fixes because they will need to end up going back to Wallaby15:09
dtantsurSupport for an OS that is not even released is hardly a bug fix :)15:10
dtantsurbut I won't block that15:10
TheJuliadecisions far above my pay grade15:10
dtantsurwell, here we make the decisions15:11
TheJuliabut honestly, the issues we're *aware* of are more like "oh, package changed"15:11
dtantsurhonestly, now you made me want to block it...15:11
iurygregoryoh god15:11
* dtantsur won't because he has more important things to care about15:11
iurygregorygood morning TheJulia 15:11
TheJuliaThat is your prerogative, it is literally going to shoot your friends in the foot. 15:12
dtantsurPlease take a step back. "Accept this change or our management will hurt us" is not an argument anyone should use.15:12
dtantsurAs I said, I'm not going to die on that hill, so please don't try to convince me otherwise.15:13
TheJuliaThat's not what I suggested, we're going to be hampered and blocked15:13
TheJuliaReally, that's all. I have no desire to die on a hill either :)15:16
dtantsurso make sure to approve that backport during your evening so that I don't see it :D15:16
iurygregoryhehehe15:17
iurygregoryomg15:17
TheJulialol15:17
iurygregorydtantsur, you can close the irc client also15:17
iurygregoryand don't look at the irc logs 15:17
TheJulialol15:17
TheJuliaiurygregory: So we arrange for tasty beer delivery....15:18
dtantsur:D15:18
TheJuliaSee, he likes that idea :)15:18
dtantsurI'll be camping in the woods at some point in 1-2 weeks15:18
dtantsuryou'll be able to land anything you want15:18
TheJuliamost excellent15:18
iurygregoryTheJulia, yeah, we can also disable the bot on the day so we don't know the changes that got merged LOL15:18
dtantsur:)15:18
TheJuliaiurygregory: now that is just evil ;)15:19
dtantsurI also intend to disable internet on my phone and only use it for navigation/emergency15:19
iurygregoryhttps://media.giphy.com/media/nJmROmxdUuLFC/giphy.gif15:21
dtantsurwhat a PTL have we got this time :D15:22
TheJuliaone who posts awesome gifs of plotting evil?!15:23
dtantsurwait, it wasn't a selfie?15:23
dtantsursorry15:23
TheJulialol15:23
iurygregorywe still plan to take over the world right? :D15:23
TheJuliaI think so yes15:23
iurygregorydtantsur, LOL15:23
iurygregoryso I still need to follow some guidelines =D15:24
dtantsuryeah, TheJulia is your World Takeover Liaison15:24
iurygregoryawesome \o/15:24
dtantsurI don't remember if this position was accepted by the TC or not... but we'll take them over anyway, so who cares?15:24
iurygregory++15:24
TheJuliadtantsur: we just overthrow everything and install cats as our supreme overlords15:24
TheJuliaThat way, everyone can become crazy cat people15:25
dtantsurI think from the point of view of cats the "install" part is a bit naive15:25
dtantsurbecause it assumes they haven't been the overlords from the beginning15:25
TheJulialol, true15:25
dtantsurbut yeah, let's formalize it finally15:25
TheJuliaexactly!15:25
iurygregorydo we need a spec?15:25
iurygregoryor have this in our docs? :D15:26
dtantsurlet's start with an RFE15:26
iurygregorymakes sense15:26
TheJuliarpioso: that bz we discussed yesterday. I took a look at the extra information supplied and compared it to the logs, and it appears the power sync state enforcement setting has been disabled, which causes the power off from the configuration job end result to never result in a power on15:29
TheJuliawell, we go to turn it on, and find the machine already on, and then later on it gets turned off by the config job15:29
TheJuliaand because the sync has been disabled, it just records that it is off and moves on15:29
TheJuliabut that is well after deployment has been completed.15:29
arne_wiebalckbye everyone o/15:48
* dtantsur sees a cleaning with empty steps and goes wtf15:49
dtantsurokay, cleaning is still broken with the ramdisk deploy. lovely.15:53
rpiosoTheJulia: We confirmed we do not modify [conductor] force_power_state_during_sync on the Director. That knob is not in /var/lib/config-data/puppet-generated/ironic/etc/ironic/ironic.conf.16:46
dtantsurrloo: do we expect the normal agent-based cleaning to work for the anaconda deploy?16:49
rloodtantsur: yes. i don't recall the details, but i remember jay saying that. i believe it should be set up to use agent-based cleaning.16:51
dtantsurrloo: well, it's likely just as broken as the ramdisk deploy16:51
rloough :-(16:51
dtantsurso okay, I think I'll fix both16:51
rloothx!16:52
dtantsurwe have too many mixins in agent_base...16:52
rloo(good and a bad thing...)16:52
rpiosoTheJulia: Take that back :-/ We do see it there. It is set to False.16:52
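A minimal sketch of how the knob discussed above could be checked from a copy of ironic.conf, using only the Python standard library. The helper name `force_power_state_enabled` and the sample text are hypothetical, and the fallback of True is an assumption about the upstream default, not something stated in this log.

```python
import configparser

# Hypothetical excerpt of an ironic.conf; only the option named in the chat is shown.
SAMPLE_CONF = """
[conductor]
force_power_state_during_sync = False
"""


def force_power_state_enabled(conf_text: str) -> bool:
    """Return the value of [conductor] force_power_state_during_sync."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    # Assumption: when the option is absent, treat it as enabled.
    return parser.getboolean(
        "conductor", "force_power_state_during_sync", fallback=True
    )


if __name__ == "__main__":
    print(force_power_state_enabled(SAMPLE_CONF))
```

To inspect a real deployment, the same function could be fed the text of the generated ironic.conf mentioned earlier, provided that file parses as plain INI.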
dtantsurI thought it was a Director-wide setting, but I may be confusing something16:53
rpiosoTheJulia: Interestingly, the same configuration works with 14th Generation.16:54
dtantsurrpioso, TheJulia, this setting was there since forever: https://opendev.org/openstack/instack-undercloud/src/tag/newton-eol/elements/puppet-stack-config/puppet-stack-config.yaml.template#L38716:56
dtantsur(this is newton!)16:56
*** sshnaidm is now known as sshnaidm|off17:01
rpiosodtantsur: Thank you!17:03
dtantsursee you tomorrow folks17:22
opendevreviewDmitry Tantsur proposed openstack/ironic master: Fix in-band cleaning in the ramdisk deploy  https://review.opendev.org/c/openstack/ironic/+/80718717:23
TheJuliarpioso: weird it worked but it is basically a giant race condition since we have no way of really knowing what, and how long it is going to take. If it lines up with power sync periodic triggering, I can see it being a thing.19:43
TheJuliarpioso: could the newer generation be recording some state from start of the job that we're colliding with maybe?19:51
rpiosoTheJulia: Hrm ... The deployment of the instance OS seems to be successful. It is on the root device, a RAID volume. Powering on the system from the BMC causes it to boot into that OS. That suggests the boot device was successfully set to persistently be the disk. All that said, I have not yet seen a record which confirms the baremetal node reached the active state, so ...19:57
TheJuliarpioso: The information we were provided with has a baremetal node list reporting all nodes active, but also powered off19:57
TheJuliathe logs, when sifting through the workflow, jive with that. We observe the node turning itself off... basically. We also observed it turning itself on before that.19:58
rpiosoTheJulia: We're puzzled by the observations that the node turns itself on and then turns itself off. Do you believe it's doing that on its own?20:03
opendevreviewIury Gregory Melo Ferreira proposed openstack/ironic master: Fix typo and add subscription docs  https://review.opendev.org/c/openstack/ironic/+/80721320:05
rpiosoTheJulia: When the boot device is set to persistently be the disk, a BMC configuration job is created. The idrac-wsman management interface implementation waits for that job to be scheduled before returning to the caller. Note that it does not do anything to affect the power state of the system. The expectation is that the job will run when it is powered on. The system should not autonomously power itself on. 20:09
rpiosoTheJulia: Since [conductor] force_power_state_during_sync is set to false, I am unclear on the prospective race condition. Please elaborate.20:14
rpiosoTheJulia: Just after ironic successfully deploys the instance, is the bare metal node turned off while it is booting into the activated instance? In other words, might it not be finished booting when it is powered off in preparation for performing network configuration?20:26
rpiosoTheJulia: In other words, is the system abruptly powered off while booting into the instance OS for the first time? If so, that power action may be coincidental with the iDRAC processing the configuration job to persistently set the boot device to disk. The iDRAC may proceed to complete the job by powering the server back on and executing it. When the job has been completed, it would power the system off.20:35
stevebakermorning20:36
TheJuliarpioso: I'm fairly convinced based upon the log entries I've seen20:46
TheJuliarpioso: we explicitly turn it off to change the network addresses. When we go to turn it back on, the machine is already back on20:46
TheJuliarpioso: a little while later, the machine is back off again completely outside of our actions,20:47
TheJuliaof course, also after deployment20:47
TheJuliaThe power on is intended to drive the machine to boot the workload20:47
TheJuliaI think it is a race because prior tests reportedly resulted in randomized behavior20:48
TheJuliaby pure luck, every node this most recent time20:48
TheJuliaand since the idrac powers the machine back off, nothing is there to power it back on since we're past the deployment process.20:49
rpiosoTheJulia: I was told all of the overcloud servers in 2 clusters failed in the same way. There has been no successful automated installation.20:50
TheJuliarpioso: the bz comments I have state otherwise, that they were able to ping some machines on prior runs20:50
rpiosoTheJulia: Please remind me ... does ironic boot the newly deployed and activated node into the instance OS?20:52
TheJuliarpioso: it does it by power on, but if a node is *already* in that power state..... you can imagine what happens then20:53
TheJuliapower up to a powered-on machine is effectively a non-operation20:54
TheJuliaif the setting is set to True, then that would at least end up getting the power back on, as opposed to just recording "oh, the machine is off now, let me update that in my database"20:57
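The difference between the two settings described above can be sketched as a toy model. This is not ironic's code: the `Node` class, the `sync_power_state` function, and the returned strings are all hypothetical, and only the enforce-versus-record behaviour is taken from the conversation.

```python
from dataclasses import dataclass


@dataclass
class Node:
    recorded_power: str  # what the database believes, e.g. "power on"
    actual_power: str    # what the BMC reports right now


def sync_power_state(node: Node, force: bool) -> str:
    """Toy model of one periodic power sync pass for a single node."""
    if node.actual_power == node.recorded_power:
        return "in sync"
    if force:
        # Enforce: push the hardware back to the recorded state.
        node.actual_power = node.recorded_power
        return "restored"
    # Do not enforce: accept the hardware state and update the record.
    node.recorded_power = node.actual_power
    return "recorded"


if __name__ == "__main__":
    # The BMC config job powered the machine off after deployment finished.
    node = Node(recorded_power="power on", actual_power="power off")
    print(sync_power_state(node, force=False), node)
```

With `force=False` the node ends up recorded as powered off and nothing turns it back on, which matches the active-but-powered-off nodes reported in the bug.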
TheJuliarpioso: I'd be happy to sift through the logs with you and show you what I'm seeing tomorrow20:58
TheJuliaI've been looking at issues and logs all day, so I'm about out of spoons20:59
rpiosoTheJulia: Thank you for that offer. I accept :-)20:59
TheJuliarpioso: Okay, just first thing-ish tomorrow I'll ping you once I have coffee in hand21:00
* TheJulia allows morbid curiosity to take over and looks at another bug.... and finds a comment which gives her hope for the universe21:01
rpiosoTheJulia: I wonder what would happen if the amount of time it took to complete this processing substantially changed: https://github.com/openstack/ironic/blob/stable/train/ironic/drivers/modules/ipxe.py#L281-L28221:07
opendevreviewSteve Baker proposed openstack/ironic master: WIP use packaged grub efi for network boot  https://review.opendev.org/c/openstack/ironic/+/80680821:08
TheJuliarpioso: that has always been asynchronous in nature21:10
TheJuliaOr it returns immediately now...21:10
TheJuliahttps://github.com/openstack/ironic/blob/1bdee995837c2511e1513cc0f5ac24a0d60963e8/ironic/drivers/modules/agent_base_vendor.py#L675 is where we see the power suddenly toggle unexpectedly as we wrap things up21:10
TheJuliawhich is like the major step after21:10
TheJuliaBUT you have a good point, maybe set boot device for wsman could have been blocking on 14g21:10
opendevreviewSteve Baker proposed openstack/ironic master: WIP use packaged grub efi for network boot  https://review.opendev.org/c/openstack/ironic/+/80680821:15
rpiosoTheJulia: The call I linked causes this to execute: https://github.com/openstack/ironic/blob/master/ironic/drivers/modules/drac/management.py#L326-L335. That waits for the iDRAC configuration job to be scheduled.21:15
rpiosoTheJulia: The logic has remained the same for a long time.21:15
TheJuliawell, it might have always been scheduled21:15
rpiosoTheJulia: When that code was added, we found the configuration job wasn't scheduled, so it wasn't executed when the system was powered on. It skipped right over it. The system did not boot from the desired device.21:17
TheJuliayeah, same. The overall path and behavior really hasn't changed, but tomorrow, I'll show you what I saw in the logs and we can walk through it21:18
rpiosoTheJulia: +121:19
opendevreviewSteve Baker proposed openstack/ironic master: WIP use packaged grub efi for network boot  https://review.opendev.org/c/openstack/ironic/+/80680822:31
opendevreviewSteve Baker proposed openstack/bifrost master: WIP support grub network boot  https://review.opendev.org/c/openstack/bifrost/+/80722022:59
tonybCan node cleaning be disabled per-node?23:16
stevebakertonyb: yes, by setting the automated_clean attribute on the node23:21
stevebaker...to falsw23:22
stevebakerfalse23:22
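A minimal sketch of what that per-node change might look like as a JSON patch document, built with the standard library only. The helper name `automated_clean_patch` is hypothetical, and the assumption that the document would be sent as the body of a node update request (with a sufficiently recent API microversion) is not confirmed anywhere in this log.

```python
import json


def automated_clean_patch(enabled: bool) -> str:
    """Build a JSON patch document that sets automated_clean on a node."""
    patch = [{"op": "add", "path": "/automated_clean", "value": enabled}]
    return json.dumps(patch)


if __name__ == "__main__":
    # Disable automated cleaning for one node.
    print(automated_clean_patch(False))
```

The "add" operation is used here so the same document works whether or not the attribute was previously set on the node.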
