Thursday, 2021-11-04

[00:40] *** pmannidi is now known as pmannidi|brb
[01:21] *** pmannidi|brb is now known as pmannidi
[01:48] *** pmannidi is now known as pmannidi|brb
[02:30] *** pmannidi|brb is now known as pmannidi
[07:18] <arne_wiebalck> Good morning janders and Ironic!
[08:00] <iurygregory> good morning janders arne_wiebalck and Ironic o/
[08:01] <arne_wiebalck> hey iurygregory o/
[08:04] <rpittau> good morning ironic! o/
[08:20] <rpittau> ooook something's off with ironic-standalone-ipa-src and metalsmith-integration-ipa-src, and it looks like ironic-standalone has become (more) unstable recently
[08:22] <opendevreview> Riccardo Pittau proposed openstack/ironic-python-agent stable/xena: Delete EFI boot entry duplicate labels first  https://review.opendev.org/c/openstack/ironic-python-agent/+/816489
[08:26] <iurygregory> good morning rpittau o/
[08:26] <rpittau> hey iurygregory :)
[08:28] <iurygregory> rpittau, well about ironic-standalone-ipa-src https://opendev.org/openstack/ironic-python-agent/commit/a67807b9b6017a8a1dc250bd7a7141e6728544cb
[08:29] <iurygregory> and seems like we have the job voting in ipa-builder... but not in IPA =)
[08:31] <rpittau> yeah but it's giving more issues recently, check the stats
[08:31] <rpittau> also the metalsmith job is voting
[08:32] <rpittau> I'm actually thinking of making them voting again
[08:34] <iurygregory> hummm 39/50 SUCCESS | 1/50 ABORTED | 1/50 TIMED_OUT | 8/50 FAILURE
[08:35] <iurygregory> for ironic-standalone-ipa-src
[08:35] <iurygregory> considering only ipa-builder repo
[08:36] <opendevreview> Riccardo Pittau proposed openstack/ironic-python-agent master: Use json for lsblk output  https://review.opendev.org/c/openstack/ironic-python-agent/+/775391
[08:36] <iurygregory> oh 1/50 CANCELED =)
[08:43] <rpittau> the error is during raid creation "DEBUG ironic_lib.utils [-] Command stderr is: "mdadm: cannot open /dev/vdb1: No such file or directory"
[08:45] <rpittau> the partition seems to be created
[08:47] <rpittau> maybe partition table not updated?
[08:48] <iurygregory> didn't we have a case like this in the past? (the partition table didn't get updated)
[08:48] <rpittau> yep, trying to remember
[08:49] * iurygregory wondering if arne_wiebalck remembers something
[08:50] <rpittau> we do run partx before calling mdadm https://opendev.org/openstack/ironic-python-agent/src/branch/master/ironic_python_agent/hardware.py#L2169
[08:50] <iurygregory> omg!
[08:51] <iurygregory> Necessary, if we want to avoid hitting an error when creating the mdadm array below 'mdadm: cannot open /dev/nvme1n1p1: No such file or directory'
[08:51] <iurygregory> :D
[08:52] <rpittau> yeeeeep
[08:55] <iurygregory> can we cry now or just later?
[08:55] *** pmannidi is now known as pmannidi|brb
[08:59] <rpittau> also the last commit was done to prevent such a case https://opendev.org/openstack/ironic-python-agent/commit/9d707e9f4bab40109b7e29df2136e86d65325ea3
[09:08] <arne_wiebalck> iurygregory: rpittau: yes
[09:08] <arne_wiebalck> there is an issue when the device nodes are not yet created
[09:09] <arne_wiebalck> I've put in 2 patches lately
[09:09] <arne_wiebalck> https://review.opendev.org/c/openstack/ironic-python-agent/+/816192
[09:10] <arne_wiebalck> https://review.opendev.org/c/openstack/ironic-python-agent/+/812470
[09:11] <iurygregory> https://review.opendev.org/c/openstack/ironic-python-agent/+/816192 I would say would probably help (but metalsmith failed /facepalm)
[09:11] <rpittau> right, so need to see why the metalsmith job is now failing
[09:11] <iurygregory> yeah
[09:11] <arne_wiebalck> the comment has the error message, but the fix is not sufficient
[09:13] <rpittau> ok, I see
[09:25] <rpittau> arne_wiebalck: I'm wondering if we should make rescan_device public, move that to hardware.py and call that directly instead of lines 2156-2170 with your change that adds the call to blockdev
[09:27] <arne_wiebalck> rpittau: sorry, which lines are these?
[09:28] * arne_wiebalck has a meeting now ...
[09:28] <rpittau> arne_wiebalck: https://opendev.org/openstack/ironic-python-agent/src/branch/master/ironic_python_agent/hardware.py#L2156
[09:28] <rpittau> maybe add that after the except on L2171
[09:35] <arne_wiebalck> rpittau: yes, I was thinking this as well
[09:35] <arne_wiebalck> rpittau: we should pull out the big sweep every time
[09:35] <rpittau> yeah
[09:36] <arne_wiebalck> acc. to the internet, all these calls may be necessary in different situations
[09:36] <arne_wiebalck> all of them work sometimes
[09:36] <arne_wiebalck> :-/
[09:36] <rpittau> exactly, and they're usually fast if everything's right
[09:36] <arne_wiebalck> yep
[09:37] <arne_wiebalck> we probably have more places where we call udev settle for similar reasons
[09:37] <arne_wiebalck> but these two are very similar already
[09:37] <arne_wiebalck> and already justify your proposal I think
[09:38] <rpittau> yes, it already makes sense for just those two
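
Editor's note: the "big sweep" idea discussed here (always run every rescan step instead of picking one) could look roughly like the sketch below. It is not the actual ironic-python-agent code. The function names, the `run` parameter, the `-u` flag for partx and the decision to collect failures rather than raise are all assumptions made for illustration; only the three tools themselves (partx, blockdev, udevadm settle) come from the discussion.

```python
import subprocess


def build_rescan_commands(device):
    """Return the sweep as a list of argv lists (hypothetical helper)."""
    return [
        ["partx", "-u", device],             # update the kernel's view of partitions
        ["blockdev", "--rereadpt", device],  # ask the kernel to re-read the partition table
        ["udevadm", "settle"],               # wait for udev to finish creating device nodes
    ]


def rescan_device(device, run=subprocess.run):
    """Run every step of the sweep; a failing step is recorded, not fatal.

    `run` is injectable so the sweep can be exercised without touching
    real block devices.
    """
    failures = []
    for cmd in build_rescan_commands(device):
        try:
            run(cmd, check=True, capture_output=True)
        except (subprocess.CalledProcessError, OSError) as exc:
            failures.append((cmd[0], exc))
    return failures
```

Collecting failures instead of stopping at the first one matches the remark that each call only helps in some situations, so one step failing should not prevent the others from running.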
[10:39] <dtantsur> morning ironic
[10:49] <iurygregory> morning dtantsur o/
[11:02] <opendevreview> Aija Jauntēva proposed x/sushy-oem-idrac master: Fix moved tests  https://review.opendev.org/c/x/sushy-oem-idrac/+/816641
[11:12] *** dviroel|rover|out is now known as dviroel|rover
[11:24] <dtantsur> does anyone know what happened with the metalsmith jobs? iurygregory, rpittau?
[11:24] <dtantsur> hmm, the statistics are not too bad https://zuul.openstack.org/builds?job_name=metalsmith-integration-ipa-src
[11:24] <dtantsur> maybe a temporary glitch
[11:25] <rpittau> could be just a glitch, I got worried because it failed 3 times in a row in https://review.opendev.org/c/openstack/ironic-python-agent/+/816192
[11:25] <iurygregory> dtantsur, yeah I agree, I also checked the history of the last 50 runs in zuul
[11:25] <dtantsur> InstanceDeployFailure: Failed to install a bootloader when deploying node c7c7e554-453a-4842-8438-756135047c11. Error: No partition with UUID 6b1e3209-4f48-46de-a022-a6ea69099155 found on device /dev/vda
[11:26] <rpittau> that
[11:26] <dtantsur> hmmm
[11:27] <iurygregory> maybe something specific in a given cloud? (I haven't checked if there is something common between the failures)
[11:43] * dtantsur fixes the double error handling in the meantime
[11:45] <dtantsur>   File "/home/dtantsur/Projects/ironic/.tox/py3/lib/python3.10/site-packages/eventlet/timeout.py", line 166, in wrap_is_timeout
[11:45] <dtantsur>     base.is_timeout = property(lambda _: True)
[11:45] <dtantsur> TypeError: cannot set 'is_timeout' attribute of immutable type 'TimeoutError'
[11:45] <dtantsur> DEAR EVENTLET!!1
[11:45] <dtantsur> has anyone tried running our unit tests on python 3.10?
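
Editor's note: the TypeError in the traceback can be reproduced without eventlet. The sketch below repeats the same assignment against the built-in `TimeoutError` and against a Python-level subclass. Python's documentation states that `socket.timeout` became an alias of `TimeoutError` in 3.10; whether that is the exact path eventlet hits here is an assumption, not something the log confirms. The helper and class names are made up for illustration.

```python
def try_mark_as_timeout(exc_type):
    """Attempt the assignment from the traceback; report whether it worked."""
    try:
        exc_type.is_timeout = property(lambda _: True)
    except TypeError:
        # Built-in (static) types reject new class attributes.
        return False
    return True


class CustomTimeout(TimeoutError):
    """A Python-level subclass, which does accept new class attributes."""
```

Calling `try_mark_as_timeout(TimeoutError)` fails, while the same call on `CustomTimeout` succeeds and instances then report `is_timeout` as true.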
[11:54] <rpittau> I did
[11:54] <rpittau> didn't see that though
[11:55] <rpittau> mmm I ran it before upgrading to F35, but it was py310 anyway
[11:56] <dtantsur> maybe F35 has something newer? dunno
[11:57] <rpittau> oh wow now I see that failing in ipa
[12:02] <rpittau> I have to split now, see you tomorrow folks! o/
[12:04] <dtantsur> see you
[12:04] <dtantsur> ehmm, it looks like lsblk actually loses partition UUID Oo
[12:04] <dtantsur> before re-reading: https://zuul.openstack.org/build/963864600d774599a59e835b60d3d0ef/log/controller/ironic-bm-logs/node-0_no_ansi_2021-11-03-20:17:45.log#2961
[12:04] <dtantsur> after: https://zuul.openstack.org/build/963864600d774599a59e835b60d3d0ef/log/controller/ironic-bm-logs/node-0_no_ansi_2021-11-03-20:17:45.log#2987
[12:05] <dtantsur> arne_wiebalck: can it be a side effect of rereadpt?
[12:06] <dtantsur> I suspect your patch may be breaking it
[12:08] <dtantsur> funnily, the lsblk output in deploy logs does contain the required UUID
[12:08] <dtantsur> I wonder if we should reorder partition read with udev settle? is it somehow asynchronous?
[12:25] <iurygregory> dtantsur, I just got the same error running ipa in 3.10 locally (F34)
[12:28] <dtantsur> sigh
[12:43] <TheJulia> good morning
[12:44] <arne_wiebalck> dtantsur: o.O (no idea)
[12:44] <dtantsur> morning TheJulia
[12:44] <arne_wiebalck> dtantsur: the lsblk calls are slightly different
[12:45] <arne_wiebalck> dtantsur: not sure this is relevant
[12:45] <dtantsur> arne_wiebalck: yeah, but they shouldn't return so different results
[12:46] * arne_wiebalck has to step away for 45mins, will have a look after
[12:46] <arne_wiebalck> dtantsur: not sure how re-reading the pt would make UUIDs disappear
[12:46] <dtantsur> yeah, sounds crazy
[12:47] <janders> see you tomorrow Ironic o/
[12:49] <arne_wiebalck> bye janders o/
[12:50] <arne_wiebalck> dtantsur: I can try to reproduce this on real hardware
[12:50] <arne_wiebalck> dtantsur: just add the lsblk calls before and after rescan_device()
[12:52] <opendevreview> Dmitry Tantsur proposed openstack/ironic master: Avoid handling a deploy failure twice  https://review.opendev.org/c/openstack/ironic/+/816684
[12:53] <arne_wiebalck> Good morning, TheJulia
[13:03] <opendevreview> Michal Nasiadka proposed openstack/ironic-python-agent master: Add LVM based image support to MD scenario  https://review.opendev.org/c/openstack/ironic-python-agent/+/816685
[13:06] <opendevreview> Michal Nasiadka proposed openstack/ironic-python-agent master: Add LVM based image support to MD scenario  https://review.opendev.org/c/openstack/ironic-python-agent/+/816685
[13:10] *** pmannidi|brb is now known as pmannidi
[13:21] <dtantsur> could I get a 2nd +2 on https://review.opendev.org/c/openstack/ironic-python-agent/+/815861 please?
[13:24] <TheJulia> I can look at outstanding changes on ipa a little later
[13:30] *** pmannidi is now known as pmannidi|AFK
[15:03] *** dviroel|rover is now known as dviroel|lunch|appt|afk
[15:14] <opendevreview> Merged openstack/ironic-python-agent master: Always include the oslo_log log file in ramdisk logs  https://review.opendev.org/c/openstack/ironic-python-agent/+/815861
[15:55] <opendevreview> Dmitry Tantsur proposed openstack/sushy master: Migrate System constants to enums  https://review.opendev.org/c/openstack/sushy/+/816717
[15:56] <opendevreview> Dmitry Tantsur proposed openstack/ironic-python-agent stable/xena: Always include the oslo_log log file in ramdisk logs  https://review.opendev.org/c/openstack/ironic-python-agent/+/816664
[15:58] <dtantsur> the migration to enums will break ironic unit tests, le sigh
[15:58] <opendevreview> Julia Kreger proposed openstack/ironic master: Test nova-compute fix  https://review.opendev.org/c/openstack/ironic/+/813264
[15:59] <dtantsur> what is worse, it will also break some actual code where we assume constants are strings. double sigh.
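
Editor's note: a short illustration of why moving string constants to enums can break callers that assume strings. The class and member names below are hypothetical and are not taken from sushy; the sketch only shows standard `enum` behaviour and one common mitigation (mixing in `str`), which may or may not be what the actual patch does.

```python
import enum


class PowerState(enum.Enum):
    """Plain enum: members no longer compare equal to their old string values."""
    ON = "on"
    OFF = "off"


class StrPowerState(str, enum.Enum):
    """Enum with a str mix-in: members still compare equal to plain strings."""
    ON = "on"
    OFF = "off"


def is_on_legacy(state):
    """Old-style caller that assumes the constant is the string 'on'."""
    return state == "on"
```

With these definitions `is_on_legacy(PowerState.ON)` quietly returns False, while `is_on_legacy(StrPowerState.ON)` and `is_on_legacy(PowerState.ON.value)` both return True.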
[16:05] <dtantsur> TheJulia: do we have a single reason to enable only "pxe", not "ipxe" by default?
[16:15] <opendevreview> Dmitry Tantsur proposed openstack/ironic master: Enable Redfish and iPXE by default  https://review.opendev.org/c/openstack/ironic/+/816721
[16:15] <dtantsur> anyway, here goes ^^^
[16:18] <TheJulia> dtantsur: none, change it!
[16:18] <TheJulia> :)
[16:19] <dtantsur> \o/
[16:27] * dtantsur looks at redfish code and wants to cry.. again
[16:44] <opendevreview> Julia Kreger proposed openstack/ironic-python-agent master: Fix UEFI record regex  https://review.opendev.org/c/openstack/ironic-python-agent/+/816723
[16:45] <TheJulia> le-sigh
[16:45] <opendevreview> Julia Kreger proposed openstack/ironic-python-agent master: Fix UEFI record regex  https://review.opendev.org/c/openstack/ironic-python-agent/+/816723
[17:02] <NobodyCam> good Morning Ironic folks
[17:03] <dtantsur> moar merge conflicts \o/
[17:03] <dtantsur> morning NobodyCam
[17:03] <NobodyCam> joy.. good morning dtantsur :) o/
[17:04] <arne_wiebalck> Good morning, NobodyCam o/
[17:04] <dtantsur> TheJulia: I assume colon is actually never there?
[17:04] <NobodyCam> hey hey arne_wiebalck :)
[17:04] <TheJulia> dtantsur: it is not
[17:04] <dtantsur> okie
[17:04] <NobodyCam> morning TheJulia
[17:04] <TheJulia> I must have just mentally gone "this makes sense!"
[17:05] <dtantsur> too much sense for EFI :)
[17:05] <TheJulia> and kept putting them into the samples whereas I developed the regex against data from a customer
[17:05] <TheJulia> but somewhere along the lines added the colon into it :\
[17:05] * TheJulia kicks self
[17:05] <dtantsur> my laptop doesn't have colons but does have stars
[17:05] <TheJulia> my laptop doesn't have stars :)
[17:05] <dtantsur> \o/
[17:05] <dtantsur> do you have a thinkpad as well?
[17:05] <TheJulia> I cracked open the efibootmgr code to see what it does
[17:06] <TheJulia> yup
[17:06] <dtantsur> \o/ consistency \o/
[17:06] <dtantsur> I'm glad they dropped U from UEFI :D
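
Editor's note: a minimal sketch of a boot-record pattern that follows what was said above: no colon after the entry number, and a star that may or may not be present (efibootmgr uses the star to mark active entries). This is not the regex from the patch under review; the pattern, the function name and the sample text in the usage note are illustrative assumptions.

```python
import re

# "Boot" + four hex digits, an optional "*", whitespace, then the rest of the line.
BOOT_RECORD = re.compile(r"^Boot([0-9A-Fa-f]{4})(\*?)\s+(.*)$")


def parse_boot_records(output):
    """Return (entry_id, is_active, description) tuples for each boot record."""
    records = []
    for line in output.splitlines():
        match = BOOT_RECORD.match(line.strip())
        if match:
            entry_id, star, description = match.groups()
            records.append((entry_id, star == "*", description))
    return records
```

Header lines such as `BootCurrent: 0001` or `BootOrder: 0001,0000` are skipped because "Boot" is not followed by four hex digits there, so only lines like `Boot0001* ubuntu` or `Boot0000 Windows Boot Manager` produce records.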
[17:07] <arne_wiebalck> dtantsur: I now finally have an image to test what happens with the partition UUIDs on my hardware ... let's see ...
[17:08] <dtantsur> on top of that, my EFI has 30 boot records
[17:08] <dtantsur> or so
[17:08] <arne_wiebalck> dtantsur: obvious question: what's the limit? :-D
[17:09] <dtantsur> obvious answer: vendor dependent? :) dunno actually
[17:09] <TheJulia> arne_wiebalck: AIUI, it is like 4 or 8kB of space
[17:09] <TheJulia> I *have* seen reports of 30+ records just not fitting into the space
[17:09] <arne_wiebalck> TheJulia: heh, that would make quite some entries :)
[17:09] <TheJulia> evil, misbehaving HBA adapters of evil
[17:10] <dtantsur> hehe, at least I hope nobody allows it to overflow into the neighboring regions :D
[17:10] <TheJulia> EvilHBA == HBA that adds undesired UEFI NVRAM entries
[17:10] <TheJulia> dtantsur: I don't remember, I *think* it silently fails
[17:11] <dtantsur> so good
[17:11] <arne_wiebalck> TheJulia has been at the edge of the world, it seems
[17:11] <TheJulia> almost wholesome sort of good
[17:11] <dtantsur> computers are cursed
[17:12] <JayF> Computers are the curse; people who work on them are the cursed ones.
[17:12] <JayF> and Ironic is the blackest grimoire of them all
[17:14] <dtantsur> :D
[17:14] <opendevreview> Dmitry Tantsur proposed openstack/ironic master: Fix RedfishManagement.get_mac_addresses and related functions  https://review.opendev.org/c/openstack/ironic/+/816726
[17:14] <dtantsur> janders: FYI ^^
[17:20] *** dviroel|lunch|appt|afk is now known as dviroel|rover
[17:37] <TheJulia> dtantsur: JayF: this is all in accordance with the prophecy, yes?
[17:37] <arne_wiebalck> dtantsur: I have some output ... and some questions :)
[17:38] <dtantsur> TheJulia: exactly
[17:38] <arne_wiebalck> dtantsur: I do lsblk before and after the call to blockdev
[17:38] <arne_wiebalck> dtantsur: the output differs
[17:39] <arne_wiebalck> dtantsur: but it seems to differ since the partition table also changed (since the image comes with a partition table)
[17:40] <arne_wiebalck> dtantsur: our image has 4 partitions, while before the image is dumped there are only 2
[17:40] <arne_wiebalck> dtantsur: so, at the moment I would say: yes there are no more partition UUIDs, but these are not the same partitions
[17:40] <dtantsur> O___o
[17:41] <arne_wiebalck> dtantsur: does that make sense or is it just too late for me ?
[17:41] <dtantsur> I'm on a call, but from a quick read it makes no sense
[17:41] <arne_wiebalck> we can discuss tomorrow
[17:42] <arne_wiebalck> I think it makes sense in the sense we make this call to get the new partitions
[17:47] <dtantsur> I think we read the partitions after writing the image, not before
[17:47] <dtantsur> in case of the CI there are no previous partitions?
[17:49] <arne_wiebalck> the issue for the blockdev call was that we write the image but we do not see the partitions the image brought
[17:49] <arne_wiebalck> in the log output I captured this was the case: first 2, then 4 partitions
[17:49] <arne_wiebalck> I guess I would need to wait for a case where I have 4 partitions before
[17:49] <arne_wiebalck> to see if the UUIDs disappear
[17:50] <arne_wiebalck> atm, the number of partitions changes and the new ones have no UUIDs
[17:50] <arne_wiebalck> the first part is what I want, the second I cannot explain atm
[17:51] <dtantsur> hmm, hold on, it's a partition image, it doesn't have its own partitions?
[17:51] <dtantsur> and whole disk images are not written on a partition?
[17:51] * dtantsur is confused
[17:51] <arne_wiebalck> the image is a whole disk image
[17:51] <dtantsur> okay, so it's not the same case as in the CI
[17:51] <arne_wiebalck> right
[17:52] <arne_wiebalck> CI is a partition image?
[17:52] <dtantsur> the metalsmith jobs do both. the whole disk image works (it does not REALLY need the partition UUID)
[17:53] <arne_wiebalck> oh, wait
[17:54] <arne_wiebalck> this is also iscsi on top
[17:54] <arne_wiebalck> my test node still has that
[17:54] <arne_wiebalck> not sure if that matters
[17:55] <arne_wiebalck> but the 2nd partition before the image is dumped is the config drive I would think
[17:57] <arne_wiebalck> hmm ... hmm ... since the UUID is also empty, I am wondering if that would break s/w RAID
[17:58] <arne_wiebalck> since that will need the uuid of the root fs
[18:01] <arne_wiebalck> and, yeah, since this is rescan_device rather than manage_uefi, it is now in the RAID code path as well
[18:02] <opendevreview> Arne Wiebalck proposed openstack/ironic master: [Trivial] Clarify conditions under which power recovery is attempted  https://review.opendev.org/c/openstack/ironic/+/816733
[18:05] <arne_wiebalck> I will test if s/w RAID still works ... isn't this fun :)
[18:06] <arne_wiebalck> ok, I will leave the details for tomorrow :-D, have a good night everyone o/
[18:38] <dtantsur> o/
[18:46] <opendevreview> Dmitry Tantsur proposed openstack/ironic master: Enable Redfish and iPXE by default  https://review.opendev.org/c/openstack/ironic/+/816721
[18:47] <arne_wiebalck> dtantsur: the UUIDs really disappear
[18:48] <arne_wiebalck> dtantsur: https://paste.opendev.org/show/810784/
[18:49] <arne_wiebalck> dtantsur: the deployment works and once the instance is booted, all UUIDs are back
[18:50] <arne_wiebalck> dtantsur: so maybe there is some other race where blockdev triggers the non-availability of the uuids for some time?
[18:50] <arne_wiebalck> *the deployment of a WDI on a s/w RAID seems to work
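
Editor's note: the before/after lsblk comparison described above can be automated along these lines. The sketch assumes the output was captured with `lsblk --json -o NAME,PARTUUID` and already parsed with `json.loads`, giving a `blockdevices` list with lowercase `name`, `partuuid` and optional `children` keys. The function names are hypothetical and nothing here is taken from the paste linked in the log.

```python
def partuuids(lsblk_json):
    """Flatten parsed lsblk JSON into a {device name: partuuid} mapping."""
    found = {}

    def walk(devices):
        for dev in devices:
            found[dev["name"]] = dev.get("partuuid")
            walk(dev.get("children", []))

    walk(lsblk_json.get("blockdevices", []))
    return found


def lost_partuuids(before, after):
    """Names that had a PARTUUID before the rescan but report none afterwards."""
    old, new = partuuids(before), partuuids(after)
    return sorted(name for name, uuid in old.items()
                  if uuid and not new.get(name))
```

A partition that is missing entirely from the second capture is also reported, which covers the case where the partition table itself changed between the two calls.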
[20:21] <stevebaker[m]> morning
[22:13] *** dviroel|rover is now known as dviroel|out
[23:37] <janders> good morning stevebaker[m] and Ironic o/
