*** pmannidi is now known as pmannidi|brb | 04:06 | |
*** pmannidi|brb is now known as pmannidi | 05:07 | |
MikeCTZA | trying to get ironic working in openstack wallaby release deployed with Kolla, struggling with that for a while now, thought I'd pop here and see if people had ideas | 05:52 |
---|---|---|
MikeCTZA | the problem I'm working on now is that the conductor is not talking to the iDRAC via IPMI, I get an error in the conductor logs like this | 05:52 |
MikeCTZA | 2021-10-14 07:49:32.449 7 WARNING ironic.drivers.modules.ipmitool [req-15453cc2-4b7a-483c-88df-5ab69fbc7dde - - - - -] IPMI Error encountered, retrying "ipmitool -I lanplus -H 10.102.30.31 -L ADMINISTRATOR -p 623 -U root -R 1 -N 5 -f /tmp/tmpd0vv0sjp power status" for node cac3653a-56ec-4e42-887b-28cc7b1f0372. Error: Unexpected error while running | 05:53 |
MikeCTZA | command. | 05:53 |
MikeCTZA | Command: ipmitool -I lanplus -H 10.102.30.31 -L ADMINISTRATOR -p 623 -U root -R 1 -N 5 -f /tmp/tmpd0vv0sjp power status | 05:53 |
MikeCTZA | Exit code: 1 | 05:53 |
MikeCTZA | Stdout: '' | 05:53 |
MikeCTZA | Stderr: 'Error: Unable to establish IPMI v2 / RMCP+ session\n': oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command. | 05:53 |
MikeCTZA | I can do the command manually myself from inside the container so I'm a bit stumped as to where to look further | 05:53 |
arne_wiebalck | Good morning, Ironic! | 06:26 |
MikeCTZA | morning arne, I wish I had ironic, still trying to get it operational | 06:29 |
arne_wiebalck | Hey MikeCTZA o/ | 06:29 |
arne_wiebalck | Looking at your issue, this is not uncommon :) | 06:29 |
arne_wiebalck | Here's what I would try/check: | 06:29 |
arne_wiebalck | ipmiversion: "-I lanplus" vs "-I lan" | 06:30 |
arne_wiebalck | connectivity: can the conductor connect to the ipmi port of the node (try with telnet, for instance) | 06:31 |
MikeCTZA | we have lanplus so that's there, I can do the exact command (obv with the password created) in the container and it works, so there is no comms issue | 06:31 |
arne_wiebalck | ok | 06:31 |
arne_wiebalck | the container is on the conductor? | 06:32 |
MikeCTZA | yes | 06:32 |
arne_wiebalck | can the conductor connect to the BMC port? | 06:32 |
MikeCTZA | we use kolla-ansible for our openstack deployment so I can do this fine | 06:33 |
MikeCTZA | docker exec -it ironic_conductor ipmitool -I lanplus -H 10.102.30.31 -L ADMINISTRATOR -p 623 -U root -R 1 -N 5 -f /tmp/pass power status | 06:33 |
MikeCTZA | and that returns Chassis Power is on as expected | 06:33 |
arne_wiebalck | hmm | 06:34 |
arne_wiebalck | so the node also reports "power_state" = None ? | 06:35 |
arne_wiebalck | or error maybe | 06:36 |
MikeCTZA | with openstack command I see this | 06:36 |
MikeCTZA | Uploaded file: https://uploads.kiwiirc.com/files/1f23b798020415dab2291145a5b2b7ca/pasted.txt | 06:37 |
MikeCTZA | openstack baremetal node list -f json | 06:37 |
MikeCTZA | [ | 06:37 |
MikeCTZA | { | 06:37 |
MikeCTZA | "UUID": "cac3653a-56ec-4e42-887b-28cc7b1f0372", | 06:37 |
MikeCTZA | "Name": "A-08-36-mike", | 06:37 |
MikeCTZA | "Instance UUID": null, | 06:37 |
MikeCTZA | "Power State": "power on", | 06:37 |
MikeCTZA | "Provisioning State": "available", | 06:37 |
MikeCTZA | "Maintenance": false | 06:37 |
MikeCTZA | } | 06:37 |
MikeCTZA | ] | 06:37 |
arne_wiebalck | ok, but this shows the conductor can talk to the node | 06:37 |
arne_wiebalck | since it has recorded the power state as "power on" | 06:37 |
MikeCTZA | agreed, I just did an openstack baremetal node power off and now it shows as power off | 06:38 |
MikeCTZA | so should I just assume that error is not quite right and I'm looking at something that's not really an issue? | 06:38 |
arne_wiebalck | ok, here is another idea | 06:39 |
arne_wiebalck | the ipmi calls are retried as well and they are udp | 06:39 |
arne_wiebalck | so, maybe you just see an error, but the call is retried and then works | 06:39 |
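The retry behaviour described here can be sketched roughly as follows. This is a minimal illustration, not Ironic's actual implementation: `run_with_retries`, its parameters, and the injectable `runner` are hypothetical names chosen for the example. The point is only that a warning can be logged on a failed attempt while the overall call still succeeds.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=3, delay=1.0, runner=subprocess.run):
    """Run cmd, retrying on a non-zero exit code; return stdout on success.

    Hypothetical helper for illustration only, not part of Ironic.
    """
    last_error = ""
    for attempt in range(1, attempts + 1):
        result = runner(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout
        last_error = result.stderr.strip()
        # A failed attempt is reported here, even if a later attempt succeeds.
        print(f"attempt {attempt}/{attempts} failed: {last_error}")
        if attempt < attempts:
            time.sleep(delay)
    raise RuntimeError(f"command failed after {attempts} attempts: {last_error}")
```

With this shape, a log full of "retrying" warnings and a correctly recorded power state are not contradictory: only the final `RuntimeError` would indicate a real failure.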
arne_wiebalck | the node is in available, so it went through cleaning (which means reboots, no?) | 06:40 |
arne_wiebalck | when does the above error actually come up? | 06:40 |
MikeCTZA | the error seems to be every few minutes in the logs from what I can see | 06:41 |
MikeCTZA | I'm pretty new to the whole ironic setup so I'm possibly doing something wrong | 06:41 |
arne_wiebalck | every few minutes, that is the power sync | 06:41 |
MikeCTZA | our setup is far from simple so following the docs is not getting me quite to it all working | 06:41 |
arne_wiebalck | ironic gets the power state from all nodes every couple of minutes | 06:41 |
arne_wiebalck | let me check the default interval ... | 06:42 |
MikeCTZA | in my test setup I have 3 controllers, 2 normal nova compute nodes and 1 test ironic host | 06:42 |
MikeCTZA | once we have it all working we will deploy ironic pretty wide to over 100 nodes in prod | 06:42 |
MikeCTZA | we have a data intensive astronomy/bioinformatics research cloud | 06:42 |
arne_wiebalck | sounds great! | 06:43 |
arne_wiebalck | check sync_power_state_interval in ironic.conf (default is 60s) | 06:43 |
arne_wiebalck | then have a look if that matches your interval | 06:43 |
MikeCTZA | that's not set so it must be the default, I'll check that | 06:44 |
arne_wiebalck | then maybe change the interval to see if the error rate follows (but I am pretty sure it is the power sync) | 06:44 |
MikeCTZA | right I'll try that out | 06:44 |
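One way to check whether the warnings follow the power sync interval is to measure the gaps between them in the conductor log. The sketch below assumes the timestamp layout seen in the log line quoted earlier; `warning_gaps` is a hypothetical helper, and the second and third sample lines are synthetic, made up purely for illustration.

```python
import re
from datetime import datetime

# Matches the leading "YYYY-MM-DD HH:MM:SS.mmm" of an oslo.log style line.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+")


def warning_gaps(lines, needle="IPMI Error encountered"):
    """Return the seconds between consecutive log lines containing needle."""
    stamps = []
    for line in lines:
        match = TIMESTAMP.match(line)
        if match and needle in line:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


# Only the first timestamp comes from the log above; the rest are synthetic.
sample = [
    "2021-10-14 07:49:32.449 7 WARNING ironic.drivers.modules.ipmitool IPMI Error encountered, retrying",
    "2021-10-14 07:50:32.501 7 WARNING ironic.drivers.modules.ipmitool IPMI Error encountered, retrying",
    "2021-10-14 07:51:33.120 7 WARNING ironic.drivers.modules.ipmitool IPMI Error encountered, retrying",
]
print(warning_gaps(sample))
```

If the gaps cluster around multiples of `sync_power_state_interval`, and move when that value is changed, the power sync is the likely source of the warnings.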
arne_wiebalck | we see this in our deployment as well, this is why there is a big retry loop around ipmi calls | 06:44 |
MikeCTZA | maybe I can paste a few commands and you can tell me if I'm doing something totally wrong in trying to get something going? | 06:45 |
arne_wiebalck | I don't think errors bubble up, though ... | 06:45 |
arne_wiebalck | sure | 06:45 |
arne_wiebalck | I will check if we see these errors ... | 06:46 |
MikeCTZA | so this is what I've been doing in my testing to try get something working | 06:47 |
MikeCTZA | openstack flavor create --ram 8192 --vcpus 1 --disk 10 mike-ironic-flav | 06:47 |
MikeCTZA | openstack flavor set --property resources:CUSTOM_BAREMETAL_MIKE=1 mike-ironic-flav | 06:47 |
MikeCTZA | openstack baremetal node create --driver ipmi --name A-08-36-mike --driver-info ipmi_port=623 --driver-info ipmi_username=root --driver-info 'ipmi_password=ourpassword' --driver-info ipmi_address=10.102.30.31 --resource-class baremetal-mike --property cpus=24 --property memory_mb=393216 --property local_gb=372 --property cpu_arch=x86_64 | 06:47 |
MikeCTZA | --driver-info deploy_ramdisk=`openstack image show deploy-initrd -f value -c id` --driver-info deploy_kernel=`openstack image show deploy-vmlinuz -f value -c id` | 06:47 |
MikeCTZA | MCNODE=`openstack baremetal node show -f value -c uuid A-08-36-mike` | 06:47 |
MikeCTZA | openstack baremetal node validate $MCNODE | 06:47 |
MikeCTZA | openstack baremetal port create ec:f4:bb:d5:a6:12 --node $MCNODE --physical-network physnet2 | 06:47 |
MikeCTZA | openstack baremetal node validate $MCNODE | 06:47 |
MikeCTZA | openstack baremetal node show $MCNODE | 06:47 |
MikeCTZA | openstack baremetal node list # shows (none,enroll) | 06:47 |
MikeCTZA | openstack baremetal node manage $MCNODE | 06:47 |
MikeCTZA | openstack baremetal node list # shows (poweroff,manageable) | 06:47 |
MikeCTZA | openstack baremetal node provide $MCNODE | 06:47 |
MikeCTZA | openstack baremetal node list # shows (poweroff,available) | 06:47 |
MikeCTZA | openstack hypervisor stats show | 06:47 |
MikeCTZA | openstack hypervisor show $MCNODE | 06:47 |
MikeCTZA | I had it at one stage where the show (last command) showed me stuff but currently it's reporting "No hypervisor with a name or ID" | 06:47 |
MikeCTZA | Ive added a few notes and some of the commands are repeated because I'm just making checks and mental notes as I go along | 06:48 |
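For the commands above, one thing worth checking is that the flavor property matches the node's resource class. As described in the Ironic flavor documentation, the node's resource class is upper-cased, punctuation is replaced with underscores, and `CUSTOM_` is prepended. The sketch below mirrors that rule; `flavor_resource_key` is a hypothetical helper, and edge cases (for example a class that already starts with `CUSTOM_`) are not handled.

```python
import re


def flavor_resource_key(resource_class):
    """Return the flavor property key expected for a node resource class.

    Illustrative only: upper-case the name, replace runs of
    non-alphanumeric characters with "_", and prepend "CUSTOM_".
    """
    normalized = re.sub(r"[^0-9A-Za-z]+", "_", resource_class).upper()
    return f"resources:CUSTOM_{normalized}"


print(flavor_resource_key("baremetal-mike"))
```

For the node created with `--resource-class baremetal-mike`, this yields the same key that was set on `mike-ironic-flav`, so that part of the setup looks consistent.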
arne_wiebalck | MikeCTZA: for long code/command postings it is better to use sth like paste.openstack.org (easier to read and keeps IRC cleaner for others) | 06:49 |
MikeCTZA | sorry, will do, I'm new to IRC, not used it in 20+ yrs | 06:49 |
arne_wiebalck | heh :-D | 06:49 |
arne_wiebalck | you're using Ironic with Nova (i.e. not standalone, right?) | 06:51 |
arne_wiebalck | yes (answering myself) | 06:52 |
MikeCTZA | correct, we have a prod Ussuri cloud, our test is with Wallaby as we know there are improvements | 06:52 |
iurygregory | good morning Ironic | 06:52 |
arne_wiebalck | once the node is in available, do you see the allocation candidate appear in placement? | 06:52 |
arne_wiebalck | hey iurygregory o/ | 06:52 |
MikeCTZA | this is "us" BTW: https://www.ilifu.ac.za/ | 06:52 |
arne_wiebalck | MikeCTZA: looks very interesting, welcome to Ironic! | 06:53 |
arne_wiebalck | MikeCTZA: our deployment is at CERN: https://home.cern/ | 06:54 |
MikeCTZA | I saw from a google of who I was speaking to, I know about cern's setup so great to "meet you" | 06:54 |
MikeCTZA | we have about 120 hypervisors and about 8PB disk (ceph), most of the VMs we have are full nodes, so a single VM on a hypervisor, thus the need for ironic | 06:55 |
MikeCTZA | we use a Slurm cluster, we don't anticipate being like you maybe are, once our nodes are up they will be up and not spun up and down | 06:55 |
arne_wiebalck | nice to meet you! | 06:57 |
arne_wiebalck | the setup sounds actually quite similar: Ironic, hypervisors, Ceph ... we use HTCondor for our batch system | 06:57 |
arne_wiebalck | and we use Ironic for a number of reasons, the need for full node VMs is one of them | 06:57 |
MikeCTZA | that's our main driver as it feels pointless having a VM with max resources hogging a node, we could get a bit more performance back | 06:58 |
arne_wiebalck | yes, same here, there is a 5% virt tax we accepted until Ironic came along and provided us with an API to treat bm nodes the same as VMs | 06:58 |
MikeCTZA | we have a 10Gb Ethernet fabric and then a 50Gb/s Mellanox (switch has 100 ports but we split them) fabric which we use for inter device comms and backend stuff | 06:59 |
arne_wiebalck | we have 10GigE as well mostly (plus the BMC management network), then some nodes with 25Gig, some IB (for HPC) | 07:00 |
arne_wiebalck | I just checked our logging for "your" error message | 07:00 |
arne_wiebalck | I see it basically all the time :) | 07:01 |
MikeCTZA | the university HPC has IB, we didn't go that route, we just have Ethernet over that fabric. we also have a slower 1gig out of band network for the management (idracs etc), so sounds pretty similar | 07:01 |
arne_wiebalck | I think these are failing commands which are retried. | 07:01 |
MikeCTZA | that's "good" then, and a red herring | 07:01 |
arne_wiebalck | I think, yes | 07:01 |
arne_wiebalck | I spent some time some weeks ago to get these down a little | 07:02 |
MikeCTZA | OK cool, then I need to just move on and see what's what with the next steps | 07:02 |
MikeCTZA | the day I fire up an ironic node is the day I jump for joy | 07:02 |
arne_wiebalck | it seemed like the BMCs sometimes need a wakeup call | 07:02 |
arne_wiebalck | MikeCTZA: it will be, yes ... TFW when pings return success :) | 07:03 |
arne_wiebalck | MikeCTZA: if you run into issue, do not hesitate to ask here, the Ironic community here on IRC is a great resource to get help | 07:05 |
arne_wiebalck | *issues | 07:05 |
MikeCTZA | I'd love to maybe do a zoom or chat sometime if you are willing so I can explain what we have and then maybe you can advise if we are on the right track, I'm not sure if that is overasking | 07:05 |
arne_wiebalck | not at all! | 07:05 |
arne_wiebalck | there is also the monthly bare metal SIG meeting where we try to gather the community on zoom to exchange experiences | 07:06 |
MikeCTZA | that would be amazing, let me know when suits you and we can set something up | 07:06 |
arne_wiebalck | maybe we could do a "show and tell" session | 07:06 |
MikeCTZA | I have been catching the infra chats and such, so it would be good to get more involved, I'm planning to try to attend the Berlin openstack event in June 2022 | 07:06 |
arne_wiebalck | awesome! | 07:07 |
arne_wiebalck | quite some distance for you :) | 07:07 |
MikeCTZA | indeed, but worth it I think, an ex colleague who setup our initial cloud attended a few in the past | 07:07 |
arne_wiebalck | it is usually quite good to meet the community, i.e. devs but also other operators | 07:08 |
MikeCTZA | for sure, I'll come with loads of questions but maybe not as many once I'm over this hurdle | 07:09 |
rpittau | good morning ironic! o/ | 07:43 |
iurygregory | morning rpittau o/ | 07:47 |
rpittau | hey iurygregory :) | 07:47 |
opendevreview | Merged openstack/ironic stable/xena: Fix iDRAC configuration mold docs https://review.opendev.org/c/openstack/ironic/+/811495 | 07:49 |
hjensas | rpittau: dtantsur: do you use http_basic with inspector in metal3? I just opened https://storyboard.openstack.org/#!/story/2009295, not sure if it is inspector-client or something with TripleO undercloud "no keystone" config. | 09:21 |
iurygregory | hjensas, we do use http_basic with inspector in metal3/openshift | 09:34 |
iurygregory | we had some problems with the no keystone config but I think we already solved this | 09:34 |
opendevreview | Manuel Schönlaub proposed openstack/sushy master: Add support for additional network resources. https://review.opendev.org/c/openstack/sushy/+/813850 | 09:36 |
iurygregory | TheJulia, dtantsur rpittau fyi https://review.opendev.org/c/openstack/releases/+/813971 =) | 09:38 |
hjensas | iurygregory: ok, I'll keep digging. Hopefully it is just a configuration issue in TripleO. | 09:43 |
iurygregory | hjensas, good luck! | 09:43 |
jssfr | Is there a way to tell an IPA image to ignore all interfaces except one whose MAC has been given in the kernel command line? | 09:59 |
jssfr | I have the issue that a node may obtain DHCP from multiple interfaces, but only one has the correct DNS server etc. for IPA to find the Ironic APIs | 10:00 |
jssfr | I could pass the MAC via kernel commandline from iPXE | 10:00 |
jssfr | if there was a way to restrict that :) | 10:00 |
-opendevstatus- NOTICE: zuul was stuck processing jobs and has been restarted. pending jobs will be re-enqueued | 10:02 | |
dtantsur | good... morning? | 10:12 |
dtantsur | jssfr: IPA itself does not DHCP, your image does | 10:13 |
dtantsur | e.g. the images we build use the DIB's dhcp-all-interfaces element, which, as you imagine, tries to do DHCP everywhere | 10:14 |
dtantsur | I'm not sure there is something ready-to-use, but you can likely write your own DIB element, it's not too complicated | 10:14 |
iurygregory | morning dtantsur | 10:45 |
jssfr | dtantsur, okay, thanks | 10:46 |
jssfr | I'll poke our image guy :) | 10:47 |
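A custom element along the lines suggested here could start from something like the sketch below, which picks the interface whose MAC was passed on the kernel command line. Everything in it is an assumption for illustration: the `bootmac=` parameter name is hypothetical (it would have to be added in the iPXE script), the helper names are made up, and the step of actually running DHCP on only that interface is left out.

```python
import os


def mac_from_cmdline(cmdline, key="bootmac"):
    """Return the MAC passed as key=<mac> on the kernel command line, or None.

    "bootmac" is a hypothetical parameter name, not something IPA defines.
    """
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        if sep and name == key:
            # Accept both aa-bb-cc-... and aa:bb:cc:... spellings.
            return value.lower().replace("-", ":")
    return None


def interface_for_mac(mac, sys_net="/sys/class/net"):
    """Return the name of the interface whose address equals mac, or None."""
    for iface in sorted(os.listdir(sys_net)):
        try:
            with open(os.path.join(sys_net, iface, "address")) as handle:
                if handle.read().strip().lower() == mac:
                    return iface
        except OSError:
            # Entries without a readable address file are skipped.
            continue
    return None
```

On a booted ramdisk this would be called with the contents of `/proc/cmdline`, and the returned interface name handed to whatever network setup the image uses.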
*** akahat is now known as akahat|afk | 11:54 | |
opendevreview | Merged openstack/ironic master: Add and document high-level helpers for async steps https://review.opendev.org/c/openstack/ironic/+/807295 | 12:20 |
opendevreview | Merged openstack/ironic master: Add a helper for node-based periodics https://review.opendev.org/c/openstack/ironic/+/812495 | 12:20 |
opendevreview | Merged openstack/ironic master: node_periodics: encapsulate the interface class check https://review.opendev.org/c/openstack/ironic/+/813458 | 12:21 |
jssfr | ... or I'll just turn off the other DHCP servers. they're not needed anymore anyway. | 13:04 |
dtantsur | ++ it's a safer idea | 13:06 |
*** akahat|afk is now known as akahat | 13:10 | |
TheJulia | good morning | 13:33 |
rpittau | good morning TheJulia :) | 13:33 |
iurygregory | good morning TheJulia =) | 13:39 |
dtantsur | Hi TheJulia, how's waking up this morning? | 13:40 |
TheJulia | dtantsur: slow, and really thinking of just taking the day off | 13:49 |
dtantsur | I had same thoughts today.. | 13:49 |
dtantsur | the worst kind of hangover: when you haven't actually drunk any alcohol (for a while) | 13:49 |
TheJulia | heh | 13:49 |
iurygregory | ouch =( this is the worst type of hangover | 14:02 |
JayF | the malaise is going around faster than covid the last 2 years | 14:05 |
JayF | I have tomorrow off after a week where I haven't felt great either. Just gotta make it through the next 8. | 14:05 |
iurygregory | JayF, you can do it! | 14:06 |
JayF | you gotta point that !!!! energy at something :P | 14:07 |
TheJulia | oooh ahh my nova patch didn't fail on ironic stuffs *dances* | 14:07 |
JayF | it's 7am-ish local time. Too early for a ! | 14:07 |
TheJulia | i think there is something to be said about what for many is basically long term isolation | 14:08 |
TheJulia | taking reasonable precautions now is an isolation creation pattern | 14:09 |
NobodyCam | Good morning ironic’ers | 14:10 |
TheJulia | good morning NobodyCam | 14:10 |
NobodyCam | You're up way too early ;p | 14:11 |
iurygregory | good morning NobodyCam | 14:11 |
NobodyCam | Morning TheJulia and iurygregory | 14:11 |
JayF | TheJulia: on Friday, we got to see the jr hockey team play... they did a whole video on how it's been 580 days since they had a game in person | 14:11 |
JayF | getting out, seeing that, being around people, just like a jolt of electricity | 14:12 |
TheJulia | JayF: I bet | 14:12 |
NobodyCam | Any known issues with the Ubuntu generic cloud image? Seeing images that boot up with no network | 14:14 |
TheJulia | NobodyCam: drivers? does it use networkmanager? | 14:14 |
dtantsur | I *think* ubuntu/debian doesn't do DHCP by default (i.e. they don't have NM) | 14:17 |
dtantsur | when building a debian-based IPA I had to add dhcp-all-interfaces (RH images work without it, if NM does the right thing) | 14:17 |
TheJulia | also, networkmanager is just plain evil | 14:17 |
dtantsur | also, yes | 14:17 |
dtantsur | but it does fulfil the task of doing DHCP on a random interface (if that's the task you need) | 14:18 |
TheJulia | ++ | 14:18 |
dtantsur | NobodyCam: you probably need to use configdrive to tell cloud-init to DHCP the right stuff | 14:18 |
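As a rough illustration of the configdrive suggestion, the network part can be expressed as a small `network_data` document asking for DHCP on one specific port. The layout below follows the OpenStack `network_data.json` format as I understand it; the helper name, the link and network ids, and the MAC address are placeholders, and whether the configdrive is built by Nova or passed to Ironic directly depends on the deployment.

```python
import json


def dhcp_network_data(mac, link_id="port0"):
    """Build a minimal network_data structure requesting DHCP on one NIC.

    Hypothetical helper; ids and layout are a sketch, not a verified schema.
    """
    return {
        "links": [
            {"id": link_id, "type": "phy", "ethernet_mac_address": mac},
        ],
        "networks": [
            {"id": "network0", "type": "ipv4_dhcp", "link": link_id},
        ],
        "services": [],
    }


# Placeholder MAC; replace with the address of the port that should get DHCP.
config_drive = {
    "meta_data": {},
    "network_data": dhcp_network_data("52:54:00:12:34:56"),
}
print(json.dumps(config_drive, indent=2))
```

cloud-init in the deployed image would then read this from the configdrive and only bring up the named link.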
dtantsur | on this positive note I'll probably go and lie down finally | 14:20 |
TheJulia | Yeah, I'm going to put in for today/tomorrow and go do something else() | 14:20 |
TheJulia | not that I can do something fun like drive into LA when I don't know when the highway will reopen | 14:20 |
NobodyCam | Thank you for the pointers. I’ll take a look after I wake up. Could also be the mellanox cards. Though I thought Ubuntu supported them out of the box | 14:20 |
TheJulia | NobodyCam: oh, mellanox cards. Joy! | 14:20 |
dtantsur | o/ | 14:20 |
TheJulia | NobodyCam: could they be in IB mode? | 14:20 |
NobodyCam | Night dtantsur | 14:21 |
NobodyCam | I don’t believe so. Cent and rh working perfectly | 14:22 |
NobodyCam | :dance | 14:22 |
NobodyCam | Oh, did I say this is on an arm (aarch64) system | 14:23 |
NobodyCam | Hehehe | 14:23 |
NobodyCam | More coffee is needed | 14:24 |
TheJulia | does this aarch64 machine natively netboot? | 14:24 |
NobodyCam | Yes and the Ipmi actually works | 14:24 |
rpittau | bye everyone, see you tomorrow o/ | 14:27 |
iurygregory | bye rpittau o/ | 14:28 |
TheJulia | NobodyCam: nice | 14:30 |
NobodyCam | Night rpittau | 14:34 |
opendevreview | Iury Gregory Melo Ferreira proposed openstack/ironic-tempest-plugin master: Add stable/xena jobs https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/814019 | 14:42 |
arne_wiebalck | bye everyone o/ | 16:06 |
opendevreview | Andrew Martins Carletti proposed openstack/ironic master: Unnecessary Pecan imports removed https://review.opendev.org/c/openstack/ironic/+/814038 | 16:32 |
*** dviroel|rover is now known as dviroel|rover|afk | 21:30 |