Thursday, 2021-10-14

*** pmannidi is now known as pmannidi|brb04:06
*** pmannidi|brb is now known as pmannidi05:07
MikeCTZAtrying to get ironic working in openstack wallaby release deployed with Kolla, struggling with that for a while now, thought I'd pop here and see if people had ideas05:52
MikeCTZAthe problem I'm working on now is that the conductor is not talking to the iDRAC via IPMI; I get an error in the conductor logs like this05:52
MikeCTZA2021-10-14 07:49:32.449 7 WARNING ironic.drivers.modules.ipmitool [req-15453cc2-4b7a-483c-88df-5ab69fbc7dde - - - - -] IPMI Error encountered, retrying "ipmitool -I lanplus -H 10.102.30.31 -L ADMINISTRATOR -p 623 -U root -R 1 -N 5 -f /tmp/tmpd0vv0sjp power status" for node cac3653a-56ec-4e42-887b-28cc7b1f0372. Error: Unexpected error while running command.05:53
MikeCTZACommand: ipmitool -I lanplus -H 10.102.30.31 -L ADMINISTRATOR -p 623 -U root -R 1 -N 5 -f /tmp/tmpd0vv0sjp power status05:53
MikeCTZAExit code: 105:53
MikeCTZAStdout: ''05:53
MikeCTZAStderr: 'Error: Unable to establish IPMI v2 / RMCP+ session\n': oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.05:53
MikeCTZAI can do the command manually myself from inside the container so I'm a bit stumped as to where to look further05:53
arne_wiebalckGood morning, Ironic!06:26
MikeCTZAmorning arne, I wish I had ironic, still trying to get it operational06:29
arne_wiebalckHey MikeCTZA o/06:29
arne_wiebalckLooking at your issue, this is not uncommon :)06:29
arne_wiebalckHere's what I would try/check:06:29
arne_wiebalckIPMI version: "-I lanplus" vs "-I lan"06:30
arne_wiebalckconnectivity: can the conductor connect to the ipmi port of the node (try with telnet, for instance)06:31
MikeCTZAwe have lanplus so thats there, I can do the exact command (obv with the password created) in the container and it works so there is no comms issue06:31
arne_wiebalckok06:31
arne_wiebalckthe container is on the conductor?06:32
MikeCTZAyes06:32
arne_wiebalckcan the conductor connect to the BMC port?06:32
MikeCTZAwe use kolla-ansible for our openstack deployment so I can do this fine06:33
MikeCTZAdocker exec -it ironic_conductor ipmitool -I lanplus -H 10.102.30.31 -L ADMINISTRATOR -p 623 -U root -R 1 -N 5 -f /tmp/pass power status06:33
MikeCTZAand that returns Chassis Power is on as expected06:33
arne_wiebalckhmm06:34
arne_wiebalckso the node also reports "power_state" = None ?06:35
arne_wiebalckor error maybe06:36
MikeCTZAwith openstack command I see this06:36
MikeCTZAUploaded file: https://uploads.kiwiirc.com/files/1f23b798020415dab2291145a5b2b7ca/pasted.txt06:37
MikeCTZAopenstack baremetal node list -f json06:37
MikeCTZA[06:37
MikeCTZA  {06:37
MikeCTZA    "UUID": "cac3653a-56ec-4e42-887b-28cc7b1f0372",06:37
MikeCTZA    "Name": "A-08-36-mike",06:37
MikeCTZA    "Instance UUID": null,06:37
MikeCTZA    "Power State": "power on",06:37
MikeCTZA    "Provisioning State": "available",06:37
MikeCTZA    "Maintenance": false06:37
MikeCTZA  }06:37
MikeCTZA]06:37
arne_wiebalckok, but this shows the conductor can talk to the node06:37
arne_wiebalcksince it has recorded the power state as "power on"06:37
MikeCTZAagreed I just did a openstack baremetal node power off and now it shows as power off06:38
MikeCTZAso should I just assume that error is not quite right and I'm looking at something thats not really an issue06:38
arne_wiebalckok, here is another idea06:39
arne_wiebalckthe IPMI calls are retried as well and they are UDP06:39
arne_wiebalckso, maybe you just see an error, but the call is retried and then works06:39
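The retry behaviour described here can be sketched in a few lines of Python. This is only an illustration of the idea that a failed attempt gets logged while a later attempt succeeds; it is not Ironic's actual implementation, and the function name `run_with_retries` and its parameters are made up for this example.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=3, delay=1.0,
                     runner=subprocess.run, sleep=time.sleep):
    """Run cmd, retrying on a non-zero exit code.

    Returns (stdout, failures), where failures holds the stderr text of
    every attempt that failed before one finally succeeded.
    Hypothetical helper, not part of Ironic.
    """
    failures = []
    for attempt in range(1, attempts + 1):
        result = runner(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout.strip(), failures
        # A failed attempt is recorded (Ironic would log a warning here).
        failures.append(result.stderr.strip())
        if attempt < attempts:
            sleep(delay)
    raise RuntimeError(
        f"command failed after {attempts} attempts: {failures[-1]}"
    )
```

With a command that fails once and then works, the caller still gets the power status back, and the only trace of the problem is the recorded failure, which matches a warning in the log next to a correct power state.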
arne_wiebalckthe node is in available, so it went through cleaning (which means reboots, no?)06:40
arne_wiebalckwhen does the above error actually come up?06:40
MikeCTZAthe error seems to be every few minutes in the logs from what I can see06:41
MikeCTZAI'm pretty new to the whole ironic setup so I'm possibly doing something wrong06:41
arne_wiebalckevery few minutes, that is the power sync06:41
MikeCTZAour setup is far from simple so following the docs is not getting me quite to it all working06:41
arne_wiebalckironic gets the power state from all nodes every couple of minutes06:41
arne_wiebalcklet me check the default interval ...06:42
MikeCTZAin my test setup I have 3 controllers, 2 normal nova compute nodes and 1 test ironic host06:42
MikeCTZAonce we have it all working we will deploy ironic pretty wide to over 100 nodes in prod06:42
MikeCTZAwe have a data intensive astronomy/bioinformatics research cloud06:42
arne_wiebalcksounds great!06:43
arne_wiebalckcheck sync_power_state_interval in ironic.conf (default is 60s)06:43
arne_wiebalckthen have a look if that matches your interval06:43
MikeCTZAthats not set so it must be the default, I'll check that06:44
arne_wiebalckthen maybe change the interval to see if the error rate follows (but I am pretty sure it is the power sync)06:44
MikeCTZAright I'll try that out06:44
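A quick way to see which interval is in effect is to read the option and fall back to the default when it is unset. The sketch below assumes the option lives in the `[conductor]` section of ironic.conf and uses the 60 s default quoted above; the helper name `power_sync_interval` is made up for this example.

```python
import configparser

# Default quoted in the discussion above (seconds).
DEFAULT_SYNC_INTERVAL = 60


def power_sync_interval(conf_text):
    """Return the effective sync_power_state_interval from ironic.conf text.

    Assumes the option sits in the [conductor] section; falls back to the
    default when the section or the option is missing.
    """
    # interpolation=None: treat '%' literally; strict=False: tolerate duplicates.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return parser.getint(
        "conductor", "sync_power_state_interval",
        fallback=DEFAULT_SYNC_INTERVAL,
    )
```

If the spacing of the warnings in the conductor log follows this value after it is changed, the warnings most likely come from the periodic power sync.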
arne_wiebalckwe see this in our deployment as well, this is why there is a big retry loop around ipmi calls06:44
MikeCTZAmaybe I can paste a few commands and you can tell me if I'm doing something totally wrong in trying to get something going?06:45
arne_wiebalckI don't think errors bubble up, though ... 06:45
arne_wiebalcksure06:45
arne_wiebalckI will check if we see these errors ...06:46
MikeCTZAso this is what I've been doing in my testing to try to get something working06:47
MikeCTZAopenstack flavor create --ram 8192 --vcpus 1 --disk 10 mike-ironic-flav06:47
MikeCTZAopenstack flavor set --property resources:CUSTOM_BAREMETAL_MIKE=1 mike-ironic-flav06:47
MikeCTZAopenstack baremetal node create --driver ipmi --name A-08-36-mike --driver-info ipmi_port=623 --driver-info ipmi_username=root --driver-info 'ipmi_password=ourpassword' --driver-info ipmi_address=10.102.30.31 --resource-class baremetal-mike --property cpus=24 --property memory_mb=393216 --property local_gb=372 --property cpu_arch=x86_64 --driver-info deploy_ramdisk=`openstack image show deploy-initrd -f value -c id` --driver-info deploy_kernel=`openstack image show deploy-vmlinuz -f value -c id`06:47
MikeCTZAMCNODE=`openstack baremetal node show -f value -c uuid A-08-36-mike`06:47
MikeCTZAopenstack baremetal node validate $MCNODE06:47
MikeCTZAopenstack baremetal port create ec:f4:bb:d5:a6:12 --node $MCNODE --physical-network physnet206:47
MikeCTZAopenstack baremetal node validate $MCNODE06:47
MikeCTZAopenstack baremetal node show $MCNODE06:47
MikeCTZAopenstack baremetal node list # shows (none,enroll)06:47
MikeCTZAopenstack baremetal node manage $MCNODE06:47
MikeCTZAopenstack baremetal node list # shows (poweroff,manageable)06:47
MikeCTZAopenstack baremetal node provide $MCNODE06:47
MikeCTZAopenstack baremetal node list # shows (poweroff,available)06:47
MikeCTZAopenstack hypervisor stats show06:47
MikeCTZAopenstack hypervisor show $MCNODE06:47
MikeCTZAI had it at one stage where the show (last command) showed me stuff but currently it's reporting "No hypervisor with a name or ID"06:47
MikeCTZAI've added a few notes and some of the commands are repeated because I'm just making checks and mental notes as I go along06:48
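One thing worth checking in the commands above is that the flavor property and the node's resource class line up. As far as I know, the node's resource class is turned into a custom placement resource class by replacing runs of non-alphanumeric characters with underscores, upper-casing, and prefixing `CUSTOM_`. The sketch below mirrors that rule as an assumption; the helper names are made up, and names that already start with `CUSTOM_` are not handled.

```python
import re


def custom_resource_class(node_resource_class):
    """Map an Ironic node resource class to a placement-style custom class.

    Assumed rule: non-alphanumeric runs become '_', the result is
    upper-cased and prefixed with 'CUSTOM_'.
    """
    normalized = re.sub(r"[^0-9A-Za-z]+", "_", node_resource_class).upper()
    return "CUSTOM_" + normalized


def flavor_property(node_resource_class, amount=1):
    """Build the matching 'openstack flavor set --property' value."""
    return f"resources:{custom_resource_class(node_resource_class)}={amount}"
```

For the node created with `--resource-class baremetal-mike`, this yields `resources:CUSTOM_BAREMETAL_MIKE=1`, which is the property set on `mike-ironic-flav` above.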
arne_wiebalckMikeCTZA: for long code/command postings it is better to use sth like paste.openstack.org (easier to read and keeps IRC cleaner for others)06:49
MikeCTZAsorry will do, I'm new to IRC, not used it in 20+ yrs06:49
arne_wiebalckheh :-D06:49
arne_wiebalckyou're using Ironic with Nova (i.e. not standalone), right?06:51
arne_wiebalckyes (answering myself)06:52
MikeCTZAcorrect, we have a prod Ussuri cloud, our test is with Wallaby as we know there are improvements06:52
iurygregorygood morning Ironic 06:52
arne_wiebalckonce the node is in available, do you see the allocation candidate appear in placement?06:52
arne_wiebalckhey iurygregory o/06:52
MikeCTZAthis is "us" BTW: https://www.ilifu.ac.za/06:52
arne_wiebalckMikeCTZA: looks very interesting, welcome to Ironic!06:53
arne_wiebalckMikeCTZA: our deployment is at CERN: https://home.cern/06:54
MikeCTZAI saw from a google of who I was speaking to, I know about cern's setup so great to "meet you"06:54
MikeCTZAwe have about 120 hypervisors and about 8PB disk (ceph), most of the VMs we have are full nodes, so a single VM on a hypervisor, thus the need for ironic06:55
MikeCTZAwe use a Slurm cluster, we dont anticipate being like you maybe are, once our nodes are up they will be up and not spun up and down06:55
arne_wiebalcknice to meet you!06:57
arne_wiebalckthe setup sounds actually quite similar: Ironic, hypervisors, Ceph ... we use HTCondor for our batch system06:57
arne_wiebalckand we use Ironic for a number of reasons, the need for full node VMs is one of them06:57
MikeCTZAthats our main drive as it feels pointless having a VM with max resources hogging a node, we could get a bit more performance back06:58
arne_wiebalckyes, same here, there is a 5% virt tax we accepted until Ironic came along and provided us with an API to treat bm nodes the same as VMs06:58
MikeCTZAwe have a 10Gb Ethernet fabric and then a 50Gb/s Mellanox (switch has 100 ports but we split them) fabric which we use for inter device comms and backend stuff06:59
arne_wiebalckwe have 10GigE as well mostly (plus the BMC management network), then some nodes with 25Gig, some IB (for HPC)07:00
arne_wiebalckI just checked our logging for "your" error message07:00
arne_wiebalckI see it basically all the time :)07:01
MikeCTZAthe university HPC has IB, we didn't go that route, we just have Ethernet over that fabric. we also have a slower 1gig out of band network for the management (idracs etc), so sounds pretty similar07:01
arne_wiebalckI think these are failing commands which are retried.07:01
MikeCTZAthats "good" then and a red herring 07:01
arne_wiebalckI think, yes07:01
arne_wiebalckI spent some time some weeks ago to get these down a little07:02
MikeCTZAOK cool then I need to just move on and see whats what with the next steps 07:02
MikeCTZAthe day I fire up an ironic node is the day I jump for joy07:02
arne_wiebalckit seemed like the BMCs sometimes need a wakeup call07:02
arne_wiebalckMikeCTZA: it will be, yes ... TFW pings return success :)07:03
arne_wiebalckMikeCTZA: if you run into issue, do not hesitate to ask here, the Ironic community here on IRC is a great resource to get help07:05
arne_wiebalck*issues07:05
MikeCTZAI'd love to maybe do a zoom or chat sometime if you are willing so I can explain what we have and then maybe you can advise if we are on the right track, I'm not sure if that is overasking07:05
arne_wiebalcknot at all!07:05
arne_wiebalckthere is also the monthly bare metal SIG meeting where we try to gather the community on zoom to exchange experiences07:06
MikeCTZAthat would be amazing, let me know when suits you and we can set something up07:06
arne_wiebalckmaybe we could do a "show and tell" session07:06
MikeCTZAI have been catching the infra chats, so it would be good to get more involved, I'm planning to try to attend the Berlin openstack event in June 202207:06
arne_wiebalckawesome!07:07
arne_wiebalckquite some distance for you :)07:07
MikeCTZAindeed, but worth it I think, an ex colleague who set up our initial cloud attended a few in the past07:07
arne_wiebalckit is usually quite good to meet the community, i.e. devs but also other operators07:08
MikeCTZAfor sure, I'll come with loads of questions but maybe not as many once I'm over this hurdle07:09
rpittaugood morning ironic! o/07:43
iurygregorymorning rpittau o/07:47
rpittauhey iurygregory :)07:47
opendevreviewMerged openstack/ironic stable/xena: Fix iDRAC configuration mold docs  https://review.opendev.org/c/openstack/ironic/+/81149507:49
hjensasrpittau: dtantsur: do you use http_basic with inspector in metal3? I just opened https://storyboard.openstack.org/#!/story/2009295, not sure if it is inspector-client or something with TripleO undercloud "no keystone" config.09:21
iurygregoryhjensas, we do use http_basic with inspector in metal3/openshift09:34
iurygregorywe had some problems with the no keystone config but I think we already solved this09:34
opendevreviewManuel Schönlaub proposed openstack/sushy master: Add support for additional network resources.  https://review.opendev.org/c/openstack/sushy/+/81385009:36
iurygregoryTheJulia, dtantsur rpittau fyi https://review.opendev.org/c/openstack/releases/+/813971 =) 09:38
hjensasiurygregory: ok, I'll keep digging. Hopefully it is just a configuration issue in TripleO.09:43
iurygregoryhjensas, good luck!09:43
jssfrIs there a way to tell an IPA image to ignore all interfaces except one whose MAC has been given in the kernel command line?09:59
jssfrI have the issue that a node may obtain DHCP from multiple interfaces, but only one has the correct DNS server etc. for IPA to find the Ironic APIs10:00
jssfrI could pass the MAC via kernel commandline from iPXE10:00
jssfrif there was a way to restrict that :)10:00
-opendevstatus- NOTICE: zuul was stuck processing jobs and has been restarted. pending jobs will be re-enqueued10:02
dtantsurgood... morning?10:12
dtantsurjssfr: IPA itself does not DHCP, your image does10:13
dtantsure.g. the images we build use the DIB's dhcp-all-interfaces element, which, as you imagine, tries to do DHCP everywhere10:14
dtantsurI'm not sure there is something ready-to-use, but you can likely write your own DIB element, it's not too complicated10:14
iurygregorymorning dtantsur 10:45
jssfrdtantsur, okay, thanks10:46
jssfrI'll poke our image guy :)10:47
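The selection logic such a custom element would need is small. The sketch below parses a kernel command line for a MAC address and picks only the matching interface. The parameter name `ipa-dhcp-mac` is purely hypothetical (nothing in IPA or DIB is known to read it), and the helpers are illustrations, not an existing element.

```python
def mac_from_cmdline(cmdline, key="ipa-dhcp-mac"):
    """Return the lower-cased MAC passed as key=<mac>, or None if absent.

    The key name is a hypothetical choice for this sketch.
    """
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        if sep and name == key:
            return value.lower()
    return None


def interfaces_to_configure(interfaces, cmdline, key="ipa-dhcp-mac"):
    """Pick the interfaces that should run DHCP.

    interfaces maps interface name -> MAC address. When no MAC is given
    on the command line, every interface is returned (the behaviour of
    doing DHCP on all interfaces).
    """
    wanted = mac_from_cmdline(cmdline, key)
    if wanted is None:
        return sorted(interfaces)
    return sorted(
        name for name, mac in interfaces.items() if mac.lower() == wanted
    )
```

On a real system the command line would come from `/proc/cmdline` and the MAC addresses from `/sys/class/net/<iface>/address`.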
*** akahat is now known as akahat|afk11:54
opendevreviewMerged openstack/ironic master: Add and document high-level helpers for async steps  https://review.opendev.org/c/openstack/ironic/+/80729512:20
opendevreviewMerged openstack/ironic master: Add a helper for node-based periodics  https://review.opendev.org/c/openstack/ironic/+/81249512:20
opendevreviewMerged openstack/ironic master: node_periodics: encapsulate the interface class check  https://review.opendev.org/c/openstack/ironic/+/81345812:21
jssfr... or I'll just turn off the other DHCP servers. they're not needed anymore anyway.13:04
dtantsur++ it's a safer idea13:06
*** akahat|afk is now known as akahat13:10
TheJuliagood morning13:33
rpittaugood morning TheJulia :)13:33
iurygregorygood morning TheJulia =)13:39
dtantsurHi TheJulia, how's waking up this morning?13:40
TheJuliadtantsur: slow, and really thinking of just taking the day off13:49
dtantsurI had same thoughts today..13:49
dtantsurthe worst kind of hangover: when you haven't actually drunk any alcohol (for a while)13:49
TheJuliaheh13:49
iurygregoryouch =( this is the worst type of hangover14:02
JayFthe malaise is going around faster than covid the last 2 years14:05
JayFI have tomorrow off after a week where I haven't felt great either. Just gotta make it through the next 8.14:05
iurygregoryJayF, you can do it!14:06
JayFyou gotta point that !!!! energy at something :P14:07
TheJuliaoooh ahh my nova patch didn't fail on ironic stuffs *dances*14:07
JayFit's 7am-ish local time. Too early for a !14:07
TheJuliai think there is something to be said about what for many is basically long term isolation14:08
TheJuliataking reasonable precautions now is an isolation creation pattern14:09
NobodyCamGood morning ironic’ers14:10
TheJuliagood morning NobodyCam 14:10
NobodyCamYou're up way too early ;p14:11
iurygregorygood morning NobodyCam 14:11
NobodyCamMorning TheJulia and iurygregory14:11
JayFTheJulia: on Friday, we got to see the jr hockey team play... they did a whole video on how it's been 580 days since they had a game in person14:11
JayFgetting out, seeing that, being around people, just like a jolt of electricity14:12
TheJuliaJayF: I bet14:12
NobodyCamAny known issues with Ubuntu generic cloud image? Seeing images that boot up with no network14:14
TheJuliaNobodyCam: drivers? does it use networkmanager?14:14
dtantsurI *think* ubuntu/debian doesn't do DHCP by default (i.e. they don't have NM)14:17
dtantsurwhen building a debian-based IPA I had to add dhcp-all-interfaces (RH images work without it, if NM does the right thing)14:17
TheJuliaalso, networkmanager is just plain evil14:17
dtantsuralso, yes14:17
dtantsurbut it does fulfil the task of doing DHCP on a random interface (if that's the task you need)14:18
TheJulia++14:18
dtantsurNobodyCam: you probably need to use configdrive to tell cloud-init to DHCP the right stuff14:18
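For the configdrive suggestion, the network part is a `network_data.json` document. The sketch below builds a minimal one that asks for DHCP on a single NIC identified by MAC. The field names follow the OpenStack network metadata format as I understand it; the link and network ids are arbitrary placeholders, and the helper name is made up.

```python
import json


def dhcp_network_data(mac, link_id="nic0"):
    """Build a minimal network_data document requesting IPv4 DHCP.

    link_id and the network id are placeholders chosen for this sketch.
    """
    return {
        "links": [
            {"id": link_id, "type": "phy", "ethernet_mac_address": mac},
        ],
        "networks": [
            {"id": "network0", "type": "ipv4_dhcp", "link": link_id},
        ],
        "services": [],
    }


def to_json(network_data):
    """Serialize the document for inclusion in a configdrive."""
    return json.dumps(network_data, indent=2, sort_keys=True)
```

Whether cloud-init in a given image honours this depends on the image and its datasource configuration, so treat it as a starting point for testing.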
dtantsuron this positive note I'll probably go and lie down finally14:20
TheJuliaYeah, I'm going to put in for today/tomorrow and go do something else()14:20
TheJulianot that I can do something fun like drive into LA when I don't know when the highway will reopen14:20
NobodyCamThank you for the pointers.  I’ll take a look after I wake up. Could also be the mellanox cards. Though I thought Ubuntu supported them out of the box14:20
TheJuliaNobodyCam: oh, mellanox cards. Joy!14:20
dtantsuro/14:20
TheJuliaNobodyCam: could they be in IB mode?14:20
NobodyCamNight dtantsur14:21
NobodyCamI don’t believe so. Cent and rh working perfectly14:22
NobodyCam:dance14:22
NobodyCamOh did I say on arm (aarch64) system14:23
NobodyCamHehehe14:23
NobodyCamMore coffee is needed14:24
TheJuliadoes this aarch64 machine natively netboot?14:24
NobodyCamYes and the Ipmi actually works14:24
rpittaubye everyone, see you tomorrow o/14:27
iurygregorybye rpittau o/14:28
TheJuliaNobodyCam: nice14:30
NobodyCamNight rpittau14:34
opendevreviewIury Gregory Melo Ferreira proposed openstack/ironic-tempest-plugin master: Add stable/xena jobs  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/81401914:42
arne_wiebalckbye everyone o/16:06
opendevreviewAndrew Martins Carletti proposed openstack/ironic master: Unnecessary Pecan imports removed  https://review.opendev.org/c/openstack/ironic/+/81403816:32
*** dviroel|rover is now known as dviroel|rover|afk21:30
