*** TxGirlGeek has joined #openstack-ironic | 00:09 | |
*** bobmel has joined #openstack-ironic | 00:13 | |
*** TxGirlGeek has quit IRC | 00:15 | |
*** bobmel has quit IRC | 00:17 | |
*** rloo has quit IRC | 00:40 | |
*** ricolin_ has joined #openstack-ironic | 00:44 | |
*** ricolin_ has quit IRC | 00:45 | |
*** cdearborn has quit IRC | 01:13 | |
*** yedongcan has joined #openstack-ironic | 01:28 | |
openstackgerrit | Merged openstack/ironic master: Refactor glance retry code to use retrying lib https://review.opendev.org/701078 | 01:32 |
---|---|---|
openstackgerrit | Kaifeng Wang proposed openstack/ironic master: Make qemu hook running with python3 https://review.opendev.org/700994 | 01:35 |
*** igordc has joined #openstack-ironic | 01:49 | |
*** Goneri has quit IRC | 02:19 | |
*** goldyfruit_ has joined #openstack-ironic | 02:35 | |
*** goldyfruit_ has quit IRC | 02:36 | |
*** goldyfruit_ has joined #openstack-ironic | 02:36 | |
*** igordc has quit IRC | 02:43 | |
*** bobmel has joined #openstack-ironic | 02:56 | |
*** bobmel has quit IRC | 03:00 | |
*** rcernin_ has joined #openstack-ironic | 03:13 | |
*** rcernin has quit IRC | 03:13 | |
*** mkrai has joined #openstack-ironic | 03:28 | |
*** rcernin_ has quit IRC | 03:56 | |
*** rcernin has joined #openstack-ironic | 04:00 | |
openstackgerrit | Julia Kreger proposed openstack/bifrost master: Skip status code check and make vagrant install quicker https://review.opendev.org/701067 | 04:22 |
TheJulia | mgoddard: re ^^^ so, when I run test-bifrost.sh locally, It puts my local test vm into a not great state.. but install.yaml works just fine and gets a working ironic deployment. :\ | 04:23 |
*** goldyfruit_ has quit IRC | 04:24 | |
*** goldyfruit_ has joined #openstack-ironic | 04:24 | |
*** mkrai has quit IRC | 04:29 | |
*** mkrai_ has joined #openstack-ironic | 04:29 | |
*** bobmel has joined #openstack-ironic | 04:44 | |
*** goldyfruit_ has quit IRC | 04:48 | |
*** bobmel has quit IRC | 04:49 | |
*** tzumainn has quit IRC | 04:53 | |
*** gyee has quit IRC | 05:00 | |
*** goldyfruit_ has joined #openstack-ironic | 05:13 | |
*** ociuhandu has joined #openstack-ironic | 05:30 | |
*** ociuhandu has quit IRC | 05:35 | |
*** pcaruana has joined #openstack-ironic | 06:00 | |
*** pcaruana has quit IRC | 06:14 | |
*** ileixe has joined #openstack-ironic | 06:53 | |
ileixe | Hello Ironic / | 06:53 |
ileixe | I have some question about ironic network | 06:53 |
*** rcernin has quit IRC | 06:54 | |
ileixe | is there any reference architecture to integrate Neutron vlan, flat network? | 06:54 |
ileixe | I tried to use flat provider network but it's quite difficult to scale | 06:54 |
ileixe | any advice? | 06:54 |
*** henriqueof has quit IRC | 07:02 | |
*** henriqueof1 has joined #openstack-ironic | 07:02 | |
*** jawad_axd has joined #openstack-ironic | 07:18 | |
*** bobmel has joined #openstack-ironic | 07:26 | |
*** bobmel has quit IRC | 07:31 | |
mkrai_ | ileixe, Hi, you can refer to https://docs.openstack.org/ironic/latest/admin/multitenancy.html | 07:35 |
kaifeng | hi ileixe, we have some reference architectures at https://docs.openstack.org/ironic/latest/install/refarch/index.html | 07:35 |
mkrai_ | kaifeng, Hi good morning | 07:36 |
kaifeng | it's not complete though. | 07:36 |
mkrai_ | And a very happy new year to all Ironicers :) | 07:36 |
kaifeng | Good morning mkrai_ o/ | 07:36 |
mkrai_ | kaifeng, I have question about sushy. Sushy sends Redfish APIs to BMC or some agent? | 07:37 |
arne_wiebalck | Good morning, ironic! | 07:41 |
kaifeng | mkrai_: I have bare knowledge on sushy, as I understand it, sushy talks to BMCs that support redfish APIs | 07:42 |
kaifeng | Good morning, arne_wiebalck o/ | 07:43 |
arne_wiebalck | Hey kaifeng o/ | 07:43 |
mkrai_ | arne_wiebalck, Hi, good morning o/ | 07:43 |
arne_wiebalck | Hey mkrai o/ | 07:43 |
mkrai_ | kaifeng, Thanks! | 07:46 |
kaifeng | mkrai_: sushy itself is not a service, it's a library consumed by services like ironic. | 07:47 |
mkrai_ | kaifeng, Yes I know that. I am just searching if there's a redfish agent in openstack | 07:48 |
kaifeng | mkrai_: no idea on redfish agent, what do we expect it do? | 07:50 |
ileixe | mkrai_, kaifeng: Thanks! I will check that | 07:52 |
arne_wiebalck | mkrai_: I understand it the same way kaifeng does. For specific redfish questions etingof is probably the best to talk to. | 07:54 |
ileixe | https://docs.openstack.org/ironic/latest/admin/multitenancy.html from the reference | 07:55 |
ileixe | does it say I need a special ML2 implementation? (Cisco/FUJITSU..) | 07:55 |
mkrai_ | kaifeng, configure the settings on the node, in simple terms the same redfish does with BMC | 07:56 |
mkrai_ | arne_wiebalck, thanks my understanding is same | 07:56 |
mkrai_ | ileixe, yes there are different ML2 drivers for different type of switch hardware | 07:57 |
ileixe | Oh.. is there any H/W list to support? | 07:57 |
kaifeng | ileixe: in case of vlan network you'll need some ml2 driver to apply switch config for you. | 07:58 |
ileixe | usually i never touched ToR, so I could not find the right way | 07:58 |
kaifeng | ileixe: I haven't used others, but networking-generic-switch works for us, it supports many vendor switches. | 07:58 |
ileixe | :) Thanks | 07:58 |
kaifeng | mkrai_: I am not sure I understand, redfish is a protocol.. | 07:59 |
kaifeng | sounds like firware management tool in the host machine? | 08:00 |
kaifeng | s/firware/firmware/ | 08:00 |
* kaifeng runs into a meeting | 08:01 | |
mkrai_ | kaifeng, yes something similar. | 08:02 |
*** iurygregory has joined #openstack-ironic | 08:02 | |
iurygregory | good morning Ironic | 08:03 |
kaifeng | hi iurygregory o/ | 08:04 |
iurygregory | kaifeng, o/ | 08:04 |
kaifeng | mkrai_: etingof might know :) | 08:04 |
mkrai_ | kaifeng, Ok will check with him, thanks :) | 08:05 |
mkrai_ | etingof, hi o/ | 08:05 |
*** tesseract has joined #openstack-ironic | 08:17 | |
*** pcaruana has joined #openstack-ironic | 08:29 | |
*** FlorianFa has quit IRC | 08:30 | |
*** aedc_ has quit IRC | 08:34 | |
*** rpittau|afk is now known as rpittau | 08:36 | |
rpittau | good morning ironic! o/ | 08:36 |
*** bobmel has joined #openstack-ironic | 08:40 | |
etingof | mkrai_, o/ | 08:41 |
etingof | mkrai_, redfish agent normally lives inside BMC. we have sushy-emulator that effectively implements kind of redfish BMC for testing purposes | 08:42 |
etingof | mkrai_, sushy is a redfish client that abstracts away some redfish protocol details into higher-level operations | 08:43 |
mkrai_ | etingof, Hi o/ | 08:43 |
etingof | mkrai_, ironic does all its red/fishy business via sushy | 08:43 |
mkrai_ | etingof, Ok so there's no redfish agent that runs on the baremetal nodes, right? | 08:44 |
etingof | mkrai_, it is! but this spaghetti monster is hiding inside BMC firmware | 08:45 |
etingof | mkrai_, redfish agent is not running on the server itself | 08:46 |
mkrai_ | etingof, Ok got it, thanks! | 08:46 |
*** priteau has joined #openstack-ironic | 08:47 | |
ileixe | For networking-generic-switch, I think it needs permission to control the switch. How do you guys handle this issue? I assume that cloud does not touch underlay network .. | 08:48 |
ileixe | We have strictly separated roles hm.. | 08:49 |
iurygregory | morning rpittau o/ | 08:52 |
rpittau | hey iurygregory :) | 08:52 |
*** tridde has quit IRC | 08:53 | |
*** trident has joined #openstack-ironic | 08:55 | |
*** jawad_axd has quit IRC | 08:57 | |
*** jawad_axd has joined #openstack-ironic | 09:03 | |
*** jawad_axd has quit IRC | 09:03 | |
*** dougsz has joined #openstack-ironic | 09:04 | |
*** dmellado has quit IRC | 09:07 | |
*** dmellado has joined #openstack-ironic | 09:08 | |
*** dougsz has quit IRC | 09:09 | |
*** lucasagomes has joined #openstack-ironic | 09:10 | |
kaifeng | ileixe: right, you need to provide credentials in the driver configuration file | 09:14 |
kaifeng | hey rpittau o/ | 09:14 |
rpittau | hey kaifeng :) | 09:14 |
*** dougsz has joined #openstack-ironic | 09:29 | |
*** derekh has joined #openstack-ironic | 09:36 | |
*** alexmcleod has joined #openstack-ironic | 09:38 | |
*** dtantsur|afk is now known as dtantsur | 09:40 | |
dtantsur | TheJulia: sorry, I forgot that I had a holiday yesterday. I hope you get better today! | 09:41 |
dtantsur | morning ironic | 09:41 |
iurygregory | morning dtantsur | 09:42 |
*** khansa has joined #openstack-ironic | 09:51 | |
*** aedc has joined #openstack-ironic | 09:54 | |
*** ociuhandu has joined #openstack-ironic | 10:00 | |
*** khansa has quit IRC | 10:01 | |
*** mkrai_ has quit IRC | 10:03 | |
*** dougsz has quit IRC | 10:05 | |
*** ociuhandu has quit IRC | 10:08 | |
*** afasano has joined #openstack-ironic | 10:09 | |
*** dougsz has joined #openstack-ironic | 10:22 | |
kaifeng | morning dtantsur o/ | 10:29 |
*** priteau has quit IRC | 10:37 | |
*** ociuhandu has joined #openstack-ironic | 10:39 | |
*** dougsz has quit IRC | 10:54 | |
*** ociuhandu has quit IRC | 10:54 | |
openstackgerrit | Iury Gregory Melo Ferreira proposed openstack/ironic-python-agent master: Avoid grub2-install when on UEFI boot mode https://review.opendev.org/696914 | 10:55 |
*** ociuhandu has joined #openstack-ironic | 10:55 | |
*** ociuhandu has quit IRC | 10:56 | |
*** ociuhandu has joined #openstack-ironic | 10:56 | |
*** mkrai_ has joined #openstack-ironic | 11:01 | |
*** ociuhandu has quit IRC | 11:02 | |
*** ociuhandu has joined #openstack-ironic | 11:05 | |
*** ociuhandu has quit IRC | 11:06 | |
*** ociuhandu has joined #openstack-ironic | 11:06 | |
*** dougsz has joined #openstack-ironic | 11:10 | |
*** ociuhandu has quit IRC | 11:11 | |
*** rpittau is now known as rpittau|bbl | 11:16 | |
*** khansa has joined #openstack-ironic | 11:18 | |
*** khansa has quit IRC | 11:21 | |
*** khansa has joined #openstack-ironic | 11:31 | |
*** yedongcan has left #openstack-ironic | 11:33 | |
*** khansa has quit IRC | 11:35 | |
*** khansa has joined #openstack-ironic | 11:36 | |
*** Lucas_Gray has joined #openstack-ironic | 11:40 | |
*** khansa has quit IRC | 11:42 | |
*** bobmel has quit IRC | 11:54 | |
*** aedc has quit IRC | 12:03 | |
*** hjensas|afk is now known as hjensas | 12:14 | |
openstackgerrit | Dmitry Tantsur proposed openstack/ironic-tempest-plugin master: Explicitly tear down software RAID after testing it https://review.opendev.org/683921 | 12:24 |
*** bfournie has quit IRC | 12:41 | |
*** khansa has joined #openstack-ironic | 12:43 | |
*** khansa has quit IRC | 12:47 | |
*** rachit7 has joined #openstack-ironic | 12:52 | |
*** sziviani has joined #openstack-ironic | 12:55 | |
*** khansa has joined #openstack-ironic | 12:56 | |
*** jawad_axd has joined #openstack-ironic | 12:58 | |
*** khansa has quit IRC | 13:02 | |
*** rh-jelabarre has joined #openstack-ironic | 13:05 | |
*** mkrai_ has quit IRC | 13:10 | |
*** goldyfruit_ has quit IRC | 13:20 | |
*** rpittau|bbl is now known as rpittau | 13:26 | |
*** rloo has joined #openstack-ironic | 13:26 | |
*** Lucas_Gray has quit IRC | 13:30 | |
*** bfournie has joined #openstack-ironic | 13:33 | |
openstackgerrit | Merged openstack/ironic master: Increasing BUILD_TIMEOUT value for multinode job https://review.opendev.org/698720 | 13:41 |
*** Goneri has joined #openstack-ironic | 13:42 | |
*** Lucas_Gray has joined #openstack-ironic | 13:42 | |
dtantsur | etingof, TheJulia, hey, can we setup something (a public call maybe?) to figure out the L3 spec directions? I feel like my comments are not getting addressed, and I think at this point the update has drifted too far. | 13:45 |
etingof | +1 for a call | 13:45 |
dking_desktop | TheJulia: Thank you very much! I was second guessing everything. Do you know what the best stable branch or commit is? | 13:45 |
*** Lucas_Gray has quit IRC | 13:46 | |
etingof | I am not quite done addressing your comments (with the latest spec revision some of them feel obsolete) | 13:46 |
etingof | but I am still at them | 13:46 |
*** priteau has joined #openstack-ironic | 13:46 | |
dtantsur | etingof: well, before you get too far with changing the spec, let's try to see where we want it to be | 13:47 |
dtantsur | I'm not entirely fond of the approach of injecting the file as is | 13:47 |
dtantsur | while it Gets The Job Done, it's not API, and I have a feeling we've been asked for an API | 13:48 |
etingof | dtantsur, yeah, I like second config-drive idea better | 13:48 |
dtantsur | my favourite approach is still to create a slim API. but I think the configdrive approach is at least easier (and does allow using cloud-init if desired). | 13:49 |
*** Lucas_Gray has joined #openstack-ironic | 13:49 | |
dtantsur | what I doubt, however, is the ramdisk side implementation | 13:49 |
dtantsur | if we declare we support whatever cloud-init supports, we're essentially locked into using cloud-init.. | 13:49 |
etingof | exactly | 13:49 |
etingof | that's why I try to keep ironic's hands out of network_data.json as much as possible... | 13:50 |
dtantsur | (to be clear: this problem affects both the current proposal and the configdrive proposal) | 13:50 |
dtantsur | etingof: the other way around: if you keep ironic's hands, you're locked into this problem | 13:50 |
dtantsur | * hands out | 13:50 |
dtantsur | because you have no way to say "oh, wait, we cannot do this here" | 13:50 |
dtantsur | and you have to support port bonding and whatever the current format does (or will) support | 13:51 |
dtantsur | (and you cannot validate for stupid mistakes like non-matching MAC addresses. people do this all the time.) | 13:51 |
openstackgerrit | Merged openstack/ironic master: Add logic to determine Ironic node is HW or not into configure_ironic_dirs https://review.opendev.org/671957 | 13:51 |
etingof | dtantsur, how a validating API would work for stand-alone ironic? perhaps all network info should come from the operator in this case | 13:53 |
etingof | if there is no neutron, then ironic's API would need to obtain MAC+IP+network+dns from the operator, then format network_data.json out of it? | 13:54 |
dtantsur | etingof: I mean sanity checks. More specifically that network_data: 1. is consistent with information in ironic (ports, etc), 2. is implemented by us (which has a not well defined meaning here) | 13:54 |
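The two sanity checks dtantsur lists could look roughly like the following. This is only a sketch under assumptions: the function name `check_network_data`, the "supported" type sets, and the shape of the input (the `links`/`networks` layout of OpenStack's `network_data.json`) are illustrative and are not taken from the spec under review.

```python
import ipaddress

# Assumption: a deliberately tiny "implemented by us" feature set.
SUPPORTED_LINK_TYPES = {"phy"}
SUPPORTED_NETWORK_TYPES = {"ipv4", "ipv6"}


def _norm_mac(mac):
    """Normalize a MAC address for comparison (case and separator)."""
    return mac.strip().lower().replace("-", ":")


def check_network_data(network_data, port_macs):
    """Return a list of human-readable problems; empty means no issue found.

    network_data: dict parsed from an operator-supplied network_data.json
    port_macs: MAC addresses of the node's ports as known to ironic
    """
    problems = []
    known_macs = {_norm_mac(m) for m in port_macs}
    link_ids = set()

    for link in network_data.get("links", []):
        link_id = link.get("id")
        link_ids.add(link_id)
        # Check 2: is the requested feature something we implement?
        if link.get("type") not in SUPPORTED_LINK_TYPES:
            problems.append("link %s: unsupported type %r"
                            % (link_id, link.get("type")))
        # Check 1: is the data consistent with the ports ironic knows about?
        mac = link.get("ethernet_mac_address")
        if mac is None or _norm_mac(mac) not in known_macs:
            problems.append("link %s: MAC %r does not match any port"
                            % (link_id, mac))

    for net in network_data.get("networks", []):
        net_id = net.get("id")
        if net.get("type") not in SUPPORTED_NETWORK_TYPES:
            problems.append("network %s: unsupported type %r"
                            % (net_id, net.get("type")))
        if net.get("link") not in link_ids:
            problems.append("network %s: refers to unknown link %r"
                            % (net_id, net.get("link")))
        try:
            ipaddress.ip_address(str(net.get("ip_address")))
        except ValueError:
            problems.append("network %s: invalid IP address %r"
                            % (net_id, net.get("ip_address")))

    return problems
```

Returning a list of problems rather than raising on the first one would let an API reject the request with a full explanation up front, instead of the operator discovering the mistake through a deploy timeout.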
*** ociuhandu has joined #openstack-ironic | 13:54 | |
dtantsur | and then, yes, we have neutron, so we need to somehow (?) converge the operator-provided and neutron-provided information | 13:55 |
*** Lucas_Gray has quit IRC | 13:55 | |
etingof | with stand-alone ironic, perhaps there's a little that ironic could possibly validate | 13:55 |
dtantsur | etingof: I disagree completely, I think the two points I mentioned are specifically important for standalone ironic | 13:56 |
dtantsur | because the operators can (and thus eventually will) provide incorrect information, while we assume that neutron is sane | 13:56 |
etingof | if neutron is in place, what kind of network info do we need from the operator? | 13:56 |
dtantsur | etingof: probably none? which doesn't prevent them from providing one though.. | 13:56 |
etingof | my understanding is that for neutron case the operator only needs to bind MAC to ironic port | 13:57 |
etingof | for non-neutron case, everything should come from the operator | 13:57 |
dtantsur | let's leave the neutron case alone for a while. it's pretty simple and on its own it does not require any API modifications. | 13:57 |
dtantsur | we're adding the new API specifically to target the standalone case | 13:58 |
dtantsur | and this is where we're contentious on 1. how to present this API, 2. which feature set to support | 13:58 |
*** Lucas_Gray has joined #openstack-ironic | 13:59 | |
etingof | right, with the design I am proposing, ironic is not involved at all - the operator prepares network_data.json and ironic blindly passes it over to ramdisk/cloud-init | 13:59 |
dtantsur | (honestly, my favourite UX would be $ openstack baremetal port set <port> --deploy-ip-address 1.2.3.4) | 13:59 |
dtantsur | etingof: ramdisk IS a part of ironic. ironic IS involved. | 13:59 |
dtantsur | and the API is part of ironic, so again, ironic IS involved. | 14:00 |
etingof | I am not sure about that ^ | 14:00 |
dtantsur | about what? | 14:00 |
*** dougsz has quit IRC | 14:01 | |
dtantsur | a decision to pass data in unverified is an API design decision, and we'll be responsible for it | 14:01 |
etingof | I am not sure that ramdisk is part of ironic because IPA is indeed part of ironic, however ramdisk OS bootstrapping is somewhat independent of ironic | 14:01 |
*** ociuhandu has quit IRC | 14:01 | |
dtantsur | etingof: IPA and the way to build ironic are part of ironic, both logically (who else?) and officially (through IPA-builder) | 14:01 |
dtantsur | there is absolutely no question in this matter. when we call this feature delivered, we'll deliver both the ironic and the ramdisk sides. | 14:02 |
dtantsur | and trust me, our customers see them as one feature | 14:02 |
dtantsur | or even: our customers don't care that there is a ramdisk side as long as it works. the question is now: how exactly will we make it work? | 14:02 |
* etingof is afk briefly | 14:06 | |
*** xXraphXx has quit IRC | 14:11 | |
*** ociuhandu has joined #openstack-ironic | 14:11 | |
*** dougsz has joined #openstack-ironic | 14:20 | |
*** pcaruana has quit IRC | 14:20 | |
etingof | dtantsur, then the way to go would be to have stand-alone ironic accepting the entire IP stack config (IP+network+gw+dns...)? | 14:21 |
*** etingof is now known as etingof|afk | 14:21 | |
dtantsur | etingof|afk: a good question that I want us to solve before we rush into a decision :) | 14:21 |
dtantsur | one thing is to support IPs+routes (the MVP we've been asked IIUC) | 14:22 |
dtantsur | another thing - to support everything that cloud-init supports | 14:22 |
dtantsur | I guess you're voting for the latter? | 14:22 |
*** ociuhandu has quit IRC | 14:22 | |
* TheJulia yawns | 14:23 | |
dtantsur | morning TheJulia | 14:25 |
*** pcaruana has joined #openstack-ironic | 14:32 | |
*** ociuhandu has joined #openstack-ironic | 14:33 | |
openstackgerrit | Dmitry Tantsur proposed openstack/ironic-tempest-plugin master: DNM Explicitly tear down software RAID after testing it https://review.opendev.org/683921 | 14:33 |
*** Lucas_Gray has quit IRC | 14:36 | |
*** ociuhandu has quit IRC | 14:42 | |
*** tzumainn has joined #openstack-ironic | 14:44 | |
*** etingof|afk is now known as etingof | 14:45 | |
* etingof is not voting to support the open-ended feature set of cloud-init | 14:45 | |
openstackgerrit | Iury Gregory Melo Ferreira proposed openstack/ironic-python-agent-builder master: Add efivar https://review.opendev.org/701374 | 14:49 |
*** jawad_axd has quit IRC | 14:49 | |
dtantsur | etingof: then we need to 1. define the feature set, 2. define what happens if an operator requests something outside of it | 14:49 |
etingof | even seemingly simple IP+routes config can get complicated once it invokes sophisticated iproute2 features... | 14:49 |
*** jawad_axd has joined #openstack-ironic | 14:50 | |
dtantsur | exactly. which to me seems an argument for limiting the feature set to the bare minimum. | 14:50 |
TheJulia | would it be helpful to have a higher bandwidth call on this topic? | 14:51 |
openstackgerrit | Julia Kreger proposed openstack/bifrost master: Skip status code check and make vagrant install quicker https://review.opendev.org/701067 | 14:52 |
dtantsur | TheJulia: that's the proposal I started this discussion with :) | 14:52 |
etingof | stepping back in our discussion, how badly do we want ironic to process/verify IP stack config? I understand that catching config bugs earlier improves overall usability. but there are incurred costs... | 14:52 |
dtantsur | etingof: which costs do you mean? It's certainly more coding, but this pays off in user experience. | 14:53 |
TheJulia | dtantsur: oh good :) | 14:53 |
dtantsur | especially since any failure during IP configuration means deploy timeout | 14:53 |
*** jawad_ax_ has joined #openstack-ironic | 14:53 | |
TheJulia | I would prefer we have to avoid explicitly blessing and testing every single configuration possibility | 14:53 |
TheJulia | which is why I'm all for a pass-through | 14:54 |
dtantsur | TheJulia: there is no way to avoid it: we have to write the ramdisk side. | 14:54 |
etingof | I mean, ironic would have to align with some third party tool such as cloud-init | 14:54 |
dtantsur | unless you suggest to rely on cloud-init | 14:54 |
etingof | the alternative (I have been proposing) is the pass-through | 14:54 |
dtantsur | etingof: it's up to us whether to rely on it or not | 14:54 |
TheJulia | dtantsur: I was honestly originally thinking glean since it is lightweight and we limit it to that | 14:54 |
*** Lucas_Gray has joined #openstack-ironic | 14:54 | |
dtantsur | TheJulia: we need to check if glean has the features we need | 14:55 |
*** jawad_axd has quit IRC | 14:55 | |
TheJulia | this is a good point | 14:55 |
TheJulia | put IP on an interface it definitely has | 14:55 |
dtantsur | and.. I still dislike the mode of operation where any error in a free-form JSON results in a deploy timeout with no explanation :( | 14:55 |
TheJulia | fancy networking with bridging and vlans, it likely does not | 14:55 |
TheJulia | if someone wants those features, they are welcome to propose such code, we shouldn't be trying to have an exhaustive list supported day 1 | 14:56 |
dtantsur | sure, I don't disagree with that | 14:56 |
etingof | my experience with ironic ramdisk is that I frequently have to fire up the console to troubleshoot | 14:56 |
etingof | I am trying to say that while early validation is indeed a good thing to have, it does not radically change anything | 14:57 |
dtantsur | TheJulia: looking at https://opendev.org/opendev/glean I cannot find its feature set. but the first glance looks promising. | 14:57 |
dtantsur | etingof: you're not the target customer ;) | 14:57 |
TheJulia | dtantsur: it is what infra uses for config drive processing | 14:58 |
*** jawad_ax_ has quit IRC | 14:58 | |
TheJulia | supers simple | 14:58 |
TheJulia | and we can blame mordred for all bugs | 14:58 |
TheJulia | s/supers/super/ | 14:59 |
dtantsur | OH, I'm sold now!! | 14:59 |
dtantsur | :) | 14:59 |
TheJulia | if that seriously sold you, I'm alarmed :) | 14:59 |
* etingof includes mordred dependency in the spec | 14:59 | |
dtantsur | TheJulia: I'm an openstacksdk core, I'm used to blaming mordred for things :D | 14:59 |
dtantsur | seriously though, glean seems to support a lot of features. not clear if it supports RHEL 8 though | 14:59 |
TheJulia | etingof: this is a true potential issue, but I also hear of grumbling about getting fixes or anything into cloud-init and it is so heavy weight | 15:00 |
dtantsur | I'm still strongly on the side of providing a schema on the ironic API side. it is our API, we're responsible for it (and for it to work). | 15:00 |
* etingof is not proposing cloud-init in ramdisk | 15:00 | |
TheJulia | dtantsur: they would surely accept patches and we have plenty of people they can trust to review the code... if they are willing to trust | 15:00 |
dtantsur | so, glean seems promising, we need to: 1. figure out the feature set, 2. figure out RHEL 8 status, 3. patch it to understand our CD format. | 15:01 |
etingof | + keep linking up ironic API with potentially expanding glean feature set | 15:02 |
etingof | s/linking/leveling/ | 15:02 |
TheJulia | ++ | 15:02 |
dtantsur | it's annoying, but it's less annoying than debugging a deploy timeout when the customer does not know what a virtual console is :) | 15:03 |
etingof | if it's an operator-managed network (i.e. not neutron-managed), it's probably more fragile on its own | 15:03 |
TheJulia | odds are, in the edge cases, neutron will know exactly nothing about it | 15:04 |
etingof | even if glean config looks clean, the ramdisk may still not show up on the network because of operator mistake in part of network config | 15:04 |
TheJulia | this is entirely possible | 15:04 |
dtantsur | sure, we cannot protect from anything. it doesn't mean we have to protect from nothing. | 15:04 |
TheJulia | anything and everything | 15:05 |
TheJulia | or just everything | 15:05 |
dtantsur | * everything | 15:05 |
etingof | right, but the cost of building this protection - is it justified in this case...? | 15:05 |
dtantsur | etingof: which cost again? | 15:05 |
dtantsur | it's like 1 JSON schema and a few dozens of lines of sanity check code | 15:05 |
etingof | ironic network config API maintenance | 15:05 |
TheJulia | what would the code really cost to perform some basic structural validations | 15:05 |
etingof | code is cheap, maintenance could be expensive | 15:06 |
TheJulia | write anything in flask and be ahead of the techdebt? *ducks* | 15:06 |
dtantsur | etingof: which maintenance? | 15:06 |
dtantsur | we're literally talking about 100-300 lines of Python code doing string manipulations | 15:07 |
etingof | leveling up ironic network config API with glean changes/bugs/features/deprecations | 15:07 |
dtantsur | well, that's what API means | 15:07 |
dtantsur | we can say "we support glean >= 3.5.6,<5.0.0" | 15:07 |
etingof | yeah, but it's not self-contained in this case | 15:07 |
dtantsur | if API is not defined, it's not API :) and we're working on bare metal API here. | 15:08 |
dtantsur | just imagine yourself on the customer side of this feature | 15:08 |
dtantsur | "we built an API that uses glean, go figure what it supports. no, we don't know how you've built the ramdisk." | 15:08 |
*** ociuhandu has joined #openstack-ironic | 15:08 | |
etingof | would it make sense to have some sort of config builder for glean that ensures valid outcome? | 15:09 |
etingof | as an alternative to user-built config validation | 15:10 |
TheJulia | what is minimally viable? | 15:10 |
*** rachit7 has quit IRC | 15:10 | |
dtantsur | etingof: realistically, we don't expect the part we need to ever change | 15:10 |
dtantsur | so the maintenance discussion here is a bit theoretic | 15:10 |
dtantsur | chances are high, we'll write it once and forever | 15:11 |
TheJulia | I think where maintenance idea is coming from continuing to carry forth and add more and I don't think we need to do that. We only need to do the initial lift | 15:11 |
dtantsur | (unless we expect the IP protocol family to get deprecated) | 15:11 |
TheJulia | again, bringing us to what is minimally viable for a user to be able to leverage the feature and not have to jump into IRC to ask for help or file support tickets | 15:11 |
dtantsur | ++ | 15:12 |
TheJulia | I heard global v6 traffic has gone down to like 12%... so I find V4 being deprecated a bit of a funny idea | 15:12 |
dtantsur | we'll retire by the time the potential deprecation reaches enterprises | 15:13 |
TheJulia | +++++++++++++ | 15:13 |
jroll | we know as well as anyone that deprecation never ensures death :) | 15:13 |
TheJulia | or have become goat farmers or something | 15:13 |
dtantsur | yeah, what's the new python 2 death date? :) | 15:13 |
dtantsur | April 1st? | 15:13 |
TheJulia | lol | 15:13 |
dtantsur | morning jroll | 15:13 |
TheJulia | good morning jroll | 15:13 |
jroll | lol | 15:13 |
jroll | oh, morning everyone :) | 15:13 |
dtantsur | TheJulia: I'd make cheese in Swiss Alps. that's my dream. | 15:13 |
jroll | ++ | 15:14 |
etingof | dtantsur, what's so special about that? | 15:14 |
TheJulia | dtantsur: We're actually pondering getting a laser and plasma cutter and starting a fabrication shop on the side.... | 15:14 |
TheJulia | well, I want the plasma cutter, summer just wants the laser cutter | 15:14 |
dtantsur | etingof: about what? I'm merely saying that the data format we're planning to use is quite stable. | 15:14 |
dtantsur | TheJulia: sounds like a lot of fun! :) | 15:15 |
TheJulia | (which is the same premise glean took) | 15:15 |
etingof | dtantsur, I've been wondering about your prospective farming career ;) | 15:15 |
dtantsur | aaaah | 15:16 |
dtantsur | :) | 15:16 |
rpittau | +1 to goat farming, sheep would be good too | 15:16 |
dtantsur | etingof: have you been to Swiss Alps? It's fantastic! And fresh cheese, mmmmm... :) | 15:16 |
TheJulia | Tree farming too... | 15:16 |
* dtantsur has just lost his (already negligible) desire to work today | 15:16 | |
rpittau | I expect glean working on rhel/centos 8 since it works on systemd envs | 15:17 |
etingof | dtantsur, yes, I got stuck in piles of manure there | 15:17 |
TheJulia | systemd != systemd-network | 15:17 |
dtantsur | heh | 15:17 |
TheJulia | or NetworkManager | 15:17 |
dtantsur | it uses NM, I think | 15:17 |
rpittau | oh it works on NetworkManager | 15:17 |
TheJulia | oh, goodie | 15:17 |
dtantsur | adapting glean to RHEL 8 may be a matter of a few switches to detect it | 15:18 |
dtantsur | or maybe someone can just try it on a VM? | 15:18 |
TheJulia | (or it maybe already does correctly) | 15:18 |
TheJulia | (all vms in use at the moment) | 15:18 |
dtantsur | I can fire up a VM after I get my afternoon tea | 15:18 |
* dtantsur goes afk exactly for that | 15:18 | |
rpittau | mmm tea | 15:19 |
TheJulia | ++ | 15:19 |
etingof | it used to look much simpler with just iproute2 dependency of ramdisk | 15:20 |
TheJulia | in my effort to fix bifrost to fix kolla... I ran into an issue where bionic basically keeps overriding route settings on a long lease where the route is changed. That wouldn't be a simple thing... so we have to kind of go above and beyond just interface settings because people have coded tools that disregard and discourage their use | 15:22 |
*** ociuhandu has quit IRC | 15:22 | |
dtantsur | etingof: the problem with API validation was there just as well | 15:33 |
dtantsur | the in-band deploy steps spec has got 2x +2, it's the right time to check it: https://review.opendev.org/#/c/696619/ | 15:35 |
patchbot | patch 696619 - ironic-specs - Add in-band deploy steps spec - 4 patch sets | 15:35 |
etingof | yes, but virtually no dependencies... | 15:36 |
dtantsur | is it good or bad? ;) | 15:36 |
dtantsur | a nice (or terrible) thing about glean: it seems to replace dhcp-all-interfaces | 15:37 |
*** TxGirlGeek has joined #openstack-ironic | 15:41 | |
dtantsur | TheJulia, etingof, fungi thinks that glean works with CentOS/RHEL 8. so we're good here. | 15:42 |
fungi | all our nodepool nodes in opendev use glean instead of cloud-init, mainly to avoid the giant mass of python dependencies cloud-init drags in | 15:44 |
dtantsur | that's what makes it appealing to us as well | 15:45 |
fungi | so any node type you see jobs use in opendev at least have our cloud provider use cases covered in glean | 15:45 |
fungi | (and that includes centos-8) | 15:46 |
* etingof noted that glean supports a subset of cloud-init features | 15:52 | |
*** Lucas_Gray has quit IRC | 15:54 | |
etingof | ...but what's supported is not seemingly documented | 15:55 |
dtantsur | yep | 15:56 |
* dtantsur ponders supporting the SSH key functionality | 15:56 | |
etingof | still, why would not we approach the problem from the different end - generate valid network_data.json from operator-supplied input? | 15:58 |
etingof | ...instead of building glean validator into ironic | 15:59 |
dtantsur | etingof: I don't mind it either, as long as we have a clearly defined API | 15:59 |
dtantsur | s/glean validator/openstack metadata validator/ | 15:59 |
etingof | do we still need any API in ironic if network_data.json is guaranteed to be sane? | 15:59 |
etingof | I'd say it's still a glean validator, or even a glean syntax subset validator | 16:00 |
dtantsur | etingof: we're working on a feature: "Create API to provide IP addresses to the ramdisk in DHCP-less environment" | 16:01 |
dtantsur | does it answer your question? | 16:01 |
dtantsur | or do you suggest concentrating only on the neutron case? | 16:02 |
etingof | is the API part also a requirement? could we describe the feature as "Make ironic ramdisk running in DHCP-less environment"? | 16:03 |
*** jawad_axd has joined #openstack-ironic | 16:04 | |
TheJulia | neutron would be a good intermediate stepping stone, but won't really be used in edge scenarios unless it is a far-flung, completely openstack-managed deployment, which I can only see in the likes of Oath | 16:04 |
etingof | no, I am not looking at neutron within the context of this conversation | 16:04 |
dtantsur | etingof: how do you provide IP addresses if you don't have an API for that? | 16:05 |
dtantsur | I see two options in the discussion: from neutron and from the API. do you see another option? | 16:05 |
*** ociuhandu has joined #openstack-ironic | 16:05 | |
etingof | so the thought I am trying to convey is this: (1) operator obtains network config for a node somehow (2) operator invokes a glean config builder that consumes operator-supplied network config and produces valid network_data.json file (3) ironic passes-through network_data.json to glean running inside ramdisk (w/o any validation or interpretation) | 16:07 |
dtantsur | etingof: (3) ironic passes through <-- this is API. API requires validation. | 16:07 |
dtantsur | or you can suggest passing the complete configdrive, which pushes the problem down the stream. | 16:07 |
dtantsur | (for the reasons that are still unclear to me) | 16:08 |
TheJulia | I feel like bifrost is soooo close... | 16:08 |
TheJulia | https://www.irccloud.com/pastebin/TRzoTfpL/ | 16:08 |
etingof | my thought is that if network_data.json is guaranteed to be well-formed (because it's validated by the building tool), then we arguably do not need another validation in ironic | 16:08 |
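A minimal sketch of the "building tool" idea from step (2) above: a small generator that turns operator-supplied values into a network_data.json document. The function name `build_network_data` is hypothetical, and the field layout (`links` / `networks` / `services` with `ip_address`, `netmask`, `routes`) is an assumption about the usual OpenStack network_data layout; which of these fields glean actually consumes is not confirmed in this discussion.

```python
import ipaddress
import json


def build_network_data(mac, ip_cidr, gateway, dns_servers=()):
    """Build a minimal network_data-style dict for one static IPv4 interface.

    Hypothetical helper: the key names are assumed, not taken from ironic.
    """
    iface = ipaddress.ip_interface(ip_cidr)  # raises ValueError on bad input
    gw = ipaddress.ip_address(gateway)
    if iface.version != 4 or gw.version != 4:
        raise ValueError("this sketch only handles IPv4")
    if gw not in iface.network:
        raise ValueError("gateway %s is not in %s" % (gw, iface.network))
    return {
        "links": [{
            "id": "link0",
            "type": "phy",
            "ethernet_mac_address": mac.lower(),
        }],
        "networks": [{
            "id": "network0",
            "type": "ipv4",
            "link": "link0",
            "ip_address": str(iface.ip),
            "netmask": str(iface.netmask),
            # Default route through the operator-supplied gateway.
            "routes": [{
                "network": "0.0.0.0",
                "netmask": "0.0.0.0",
                "gateway": str(gw),
            }],
        }],
        "services": [
            {"type": "dns", "address": str(ipaddress.ip_address(d))}
            for d in dns_servers
        ],
    }


if __name__ == "__main__":
    data = build_network_data(
        "52:54:00:AA:BB:CC", "192.0.2.10/24", "192.0.2.1", ["192.0.2.53"])
    print(json.dumps(data, indent=2))
```

A generator like this only guarantees that its own output is well-formed; it says nothing about files written by hand, which is the gap the validation argument in this discussion is about.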
openstackgerrit | Merged openstack/ironic master: Make qemu hook running with python3 https://review.opendev.org/700994 | 16:09 |
openstackgerrit | Merged openstack/ironic master: Explicitly use ipxe as boot interface for iPXE testing https://review.opendev.org/698146 | 16:09 |
dtantsur | etingof: guaranteed - by whom to whom? how do we know that? | 16:09 |
dtantsur | can we remove all validation from our API under the assumption that ironicclient/openstacksdk are written correctly? | 16:09 |
dtantsur | also, are you actually suggesting that we write a new tool to avoid writing 100-200 lines of string manipulations? | 16:10 |
etingof | well, if we code up a tool that would produce network_data.json under the same constraints as ironic network config API would impose on human-generated network_data.json... | 16:10 |
dtantsur | ... and users won't use it? | 16:11 |
dtantsur | more importantly, what are you saving here? you still have to do the same job, but now the validation is optional, because.. why? | 16:11 |
etingof | I am not counting in lines of code here. I am more looking to avoid ironic depending on glean | 16:11 |
dtantsur | why? | 16:11 |
dtantsur | what's wrong with using a tool that gets the job done? | 16:12 |
dtantsur | okay, you can get back to the iproute2 proposal, but you'll still need to validate the input | 16:12 |
TheJulia | Is the desire to expand the scope beyond purely deployment to instance operation? | 16:12 |
etingof | I am not against using glean in ramdisk | 16:12 |
dtantsur | TheJulia: we already have it... | 16:13 |
etingof | I am worried that pulling (undocumented) glean details into ironic code | 16:13 |
TheJulia | well, technically as long as someone attached a configuration drive.... | 16:13 |
etingof | ...may be expensive in the long run | 16:13 |
dtantsur | etingof: let's ask them to document it! or help them document it | 16:13 |
TheJulia | but we shouldn't be engineering the same thing again | 16:13 |
*** ociuhandu has quit IRC | 16:13 | |
TheJulia | etingof: bifrost was a super early user of it as well, so the bonds with the ironic community are strong :) | 16:14 |
*** ociuhandu has joined #openstack-ironic | 16:15 | |
etingof | so we are basically saying to ironic users that they need to produce network_data.json that is compliant with a subset of glean syntax? | 16:15 |
*** gyee has joined #openstack-ironic | 16:16 | |
etingof | then, as dtantsur said, we need to (1) establish what glean syntax is and (2) strip it to a bare minimum | 16:16 |
dtantsur | etingof: right. and we'll have to document it somewhere | 16:16 |
dtantsur | (or ask glean people to document, or document it for them) | 16:17 |
dtantsur | etingof: chances are non-zero, glean supports the whole network_data.json | 16:17 |
dtantsur | fungi: do you know ^^ or somebody who knows? | 16:17 |
etingof | dtantsur, they say it does not | 16:17 |
dtantsur | it might be outdated. the code looks quite sophisticated. | 16:17 |
etingof | > Please note that glean does not implement every feature listed. | 16:18 |
dtantsur | from configdrive - sure | 16:18 |
dtantsur | it doesn't seem to bother with vendor_data or all of metadata | 16:18 |
dtantsur | in any case, whether we use glean, cloud-init or iproute2, we'll need to define what API we support | 16:18 |
TheJulia | I suspect dtantsur's point is "what are the mechanics and use path (in terms of a human programming/interaction interface) to get from A to a booted ramdisk" | 16:19 |
etingof | I take it purely as a stripped down network_data.json schema (which is not documented as well, afaik) | 16:20 |
openstackgerrit | Dmitry Tantsur proposed openstack/ironic master: Add a missing versionadded for configdrive[vendor_data] https://review.opendev.org/701400 | 16:20 |
dtantsur | etingof: it may even be complete, let's see if we can get someone to confirm | 16:21 |
fungi | not entirely certain what the glean question is, but glean is basically focused on setting up network interface configuration at boot time using information provided in a configdrive and/or dhcp | 16:22 |
fungi | (it doesn't implement dhcp itself, but can configure the interface to use whatever the platform's provided dhcp solution is) | 16:22 |
fungi | oh, also ipv6 autoconfig | 16:23 |
TheJulia | That is good to know about v6 autoconfig | 16:23 |
etingof | dtantsur, so regarding the ironic network config API - is it going to leak all the details (IP, gw, dns...) to ironic REST API? or ironic osc should build network_data.json out of user-supplied options and pass it over to ironic as a blob? | 16:23 |
fungi | as for dhcp6, i'm not sure how much is supported (stateless vs stateful), in most cases it's been implemented piecemeal as we or other glean users encountered provider environments providing particular features | 16:24 |
dtantsur | fungi: does it support static IPv6? it's not clear from the code. | 16:24 |
TheJulia | etingof: no matter what we do, the data would need to be transmitted to the API. OSC is not our only API consumer. | 16:24 |
TheJulia | dtantsur: that might be a good feature to add if not... | 16:25 |
dtantsur | yep, especially for people here who work on metal3 ;) | 16:25 |
TheJulia | I think they can live with just slaac, but you never know. | 16:25 |
dtantsur | fungi: this looks suspicious: https://opendev.org/opendev/glean/src/branch/master/glean/cmd.py#L270-L271 | 16:25 |
fungi | there are also some nasty interactions between networkmanager and kernel ipv6 slaac autoconf where nm can refuse to set up v4 addresses if the kernel has already configured a v6 addy, i think we finally got that worked out but some of it may still be timing sensitive | 16:25 |
etingof | so perhaps leaking IP config details to ironic API is a way to go then | 16:25 |
fungi | dtantsur: not sure about static v6 either, would need to look in the docs/source. if it doesn't, i'm sure we'd be happy for that to get added | 16:26 |
dtantsur | okay, so this ^^ is another potential TODO | 16:27 |
dtantsur | fungi: for the context: for Edge deployments we're working on a way to configure IP addresses for our ramdisk without DHCP | 16:27 |
*** mbeierl has quit IRC | 16:27 | |
dtantsur | we suspect glean may help us on the ramdisk side. we'll need mostly static configuration + routes. | 16:27 |
etingof | +dns | 16:28 |
fungi | like i said, it's mostly grown support for situations as we've encountered them in the wild, and i don't think we have any providers for opendev who are using static v6 in configdrive (nova might not even put that in the configdrive data, i'm not certain) | 16:28 |
*** ociuhandu has quit IRC | 16:28 | |
*** mbeierl has joined #openstack-ironic | 16:28 | |
dtantsur | etingof: yep. of all these, only static IPv6 is questionable. | 16:28 |
dtantsur | fungi: makes sense, thanks! | 16:29 |
clarkb | it depends on the platform | 16:29 |
clarkb | we've added things that we've needed as we've added platforms | 16:30 |
TheJulia | a support matrix might be needed then, but that shouldn't be a big deal | 16:30 |
* TheJulia is on a matrix kick for some reason | 16:30 | |
JayF | Probably good to document that for glean in any case. | 16:30 |
dtantsur | exactly | 16:30 |
clarkb | I think part of the struggle there is it is nebulous | 16:31 |
dtantsur | yeah, from code it seems that static IPv6 is supported for networkd and debian, not supported for RH and gentoo | 16:31 |
clarkb | you add config to systemd-networkd | 16:31 |
clarkb | then see what comes out the other end | 16:31 |
clarkb | (similar with networkmanager) | 16:31 |
TheJulia | This is true, but if we at least point out high level cases, I think we're better off than nothing | 16:31 |
* dtantsur finds glean code quite readable btw | 16:31 | |
clarkb | I guess this has mostly been an issue with networkmanager | 16:32 |
clarkb | we configure ipv4 and ipv6 and then only get one or the other out the other end due to ancient bugs in network manager | 16:32 |
TheJulia | clarkb: ouch | 16:32 |
clarkb | (which we hope we've worked around at this point) | 16:32 |
fungi | right, working around the quirks of various network interface configuration frameworks across distros has been the biggest challenge | 16:33 |
etingof | re the prospective ironic api - I understand we are adding a select of IP config details as node/port properties, on per-interface basis for ramdisk and instance. is that about right? | 16:33 |
fungi | s/quirks/long unfixed bugs/ | 16:33 |
dtantsur | etingof: I'm quite against doing anything for the instance, we already have an API for that. | 16:33 |
etingof | that would be a bit inconsistent for the stand-alone ironic operator... | 16:34 |
dtantsur | etingof: I don't have a strong opinion on node vs ports as long as API is meaningful | 16:34 |
dtantsur | etingof: not really. we still accept a configdrive. | 16:34 |
TheJulia | per interface is what it should be because we need a MAC for lookup (unless we merge agent token and then add in an embedding "hey, this is your uuid, have a nice day") | 16:34 |
dtantsur | it's just that nova generates it behind the scenes, while in the standalone case the operator is responsible | 16:34 |
etingof | right, but to deploy a node entirely dhcpless, the operator will have to (1) configure ramdisk via api and (2) produce network_data.json somehow, wrap it into a config-drive and pass to ironic | 16:35 |
dtantsur | etingof: or just pass it to ironic to two places if it's completely the same | 16:35 |
TheJulia | So dmitry actually raises a very very very very good point | 16:36 |
TheJulia | on an edge site, you're not going to have a different network... you're likely going to provide the exact same configuration drive data for the deployment | 16:36 |
TheJulia | your internet or internal network or outbound path should be the same | 16:36 |
JayF | Should that be a configuration we make easy by default, given its less-than-ideal security implications? | 16:36 |
TheJulia | (which is why the agent token code is so important | 16:36 |
TheJulia | ) | 16:36 |
*** ociuhandu has joined #openstack-ironic | 16:37 | |
dtantsur | JayF: you mean, the complete feature? or using the same networking configuration? | 16:38 |
TheJulia | JayF: in this case it is for virtual media usage, we would just embed the data in the iso image being attached | 16:38 |
etingof | dtantsur, these two places - what are they? | 16:38 |
JayF | dtantsur: using the same network configuration | 16:38 |
dtantsur | etingof: for deployment - the one we're designing. for instances - https://docs.openstack.org/api-ref/baremetal/?expanded=change-node-provision-state-detail#change-node-provision-state (see network_data). | 16:38 |
TheJulia | JayF: we can't orchestrate changing network configuration around in the edge case. It is literally an active/standby node pair on a telephone pole with embedded radios doing magical stuff | 16:39 |
etingof | dtantsur, "pass it to ironic" - "it" stands for config-drive here? | 16:39 |
etingof | or network-data? | 16:40 |
dtantsur | etingof: for the final instance you can do either of these | 16:40 |
JayF | TheJulia: I must be missing something about this discussion then, because I didn't realize that was a requirement? I'll bow outta the conversation then... | 16:40 |
dtantsur | starting with Train you can pass only network_data itself | 16:40 |
JayF | That sounds like an awful requirement to have to meet :-| | 16:40 |
etingof | right, but for deployment we are feeding all config details to ironic via dedicated API node/port properties, right? | 16:41 |
TheJulia | JayF: not terribly awful, just the reaching consensus part has been... not fun | 16:41 |
dtantsur | etingof: via the provision state API (please see the link) | 16:41 |
JayF | I mean, if I were still closely involved and more well-read, I might ask the "should we" question... but I'm not, so I'm heading off IRC to go to code-work | 16:41 |
etingof | dtantsur, right | 16:42 |
dtantsur | (thinking aloud) in theory nothing prevents using Far Edge deployments with the 'neutron' networking for network separation.. | 16:42 |
TheJulia | JayF: nobody wants to roll a truck to re-image nodes, so from a capability standpoint | 16:42 |
*** xXraphXx has joined #openstack-ironic | 16:43 | |
TheJulia | dtantsur: for an environment with a managed switch fabric, sure, and I think in the neutron case that can be supported. the operator's "i need to reimage this server in the field that doesn't really have a real switch..." is where the original spec came from | 16:43 |
* etingof still worries that expanding provision state API with glean options is a bit more intrusive compared to an external network_data generation tool... | 16:45 | |
dtantsur | TheJulia: I would expect even a cheap switch from eBay to be programmable nowadays, but I may be terribly wrong.. | 16:45 |
JayF | dtantsur: that is pretty wrong :) | 16:45 |
etingof | ...and internally inconsistent ramdisk vs instance config | 16:45 |
dtantsur | etingof: wait, wait, we don't do that. | 16:45 |
dtantsur | JayF: #sadpanda | 16:45 |
JayF | dtantsur: I have a box of dumb switches in my closet, all purchased on amazon for <$50 | 16:45 |
*** ociuhandu has quit IRC | 16:46 | |
JayF | and you can absolutely buy switches marketed to business that don't have any management capability | 16:46 |
dtantsur | etingof: the provision state API accepts network_data now, already. it's done, we're not discussing changing that. | 16:46 |
dtantsur | JayF: that's why we cannot have nice things.. | 16:46 |
* TheJulia looks at the 4 port router/switch on her desk in the 3d printed case where the switch looks and reports that it can be configured, but doesn't actually support any running configuration | 16:46 | |
JayF | dtantsur: if your world was true, I could say "that's why we cannot have cheap things" :D | 16:46 |
etingof | dtantsur, then I do not understand how `openstack baremetal node --set-ip 1.2.3.4` would work | 16:46 |
dtantsur | etingof: I'm saying the opposite thing: let's not worry about final instances, only about the deployment process | 16:46 |
TheJulia | dtantsur: agreed, this is purely a deployment problem to solve | 16:47 |
dtantsur | etingof: more of $ openstack baremetal node set <node> --ramdisk-network-data /my/file.json | 16:47 |
etingof | well, that's not your ideal, is not it? ;) | 16:47 |
dtantsur | close to what you're proposing, just without instance_network_data and with some validation on the API side | 16:47 |
dtantsur | my ideal.. may never come to life. | 16:47 |
dtantsur | my ideal would be more of $ openstack baremetal port set <port> --network-data ipv4=1.2.3.4 --network-data ipv6=blah:blah::blah | 16:48 |
JayF | That's going to get really gross for super-complex setups. | 16:48 |
dtantsur | but I can see a point in accepting the whole network_data file: you can easily reuse it in the (existing) deployment API | 16:49 |
JayF | Unless you're going to limit Ironic's capabilities to less than what glean/network_data supports | 16:49 |
JayF | i.e. how would you model a bonded network interface ala what OnMetal used to use | 16:49 |
dtantsur | JayF: we're not sure we want to support super-complex setups, but fair enough | 16:49 |
dtantsur | did you use bonding for deploy ramdisks? | 16:49 |
JayF | We did not, but totally would've if I could've | 16:49 |
*** alexmcleod has quit IRC | 16:49 | |
dtantsur | JayF: interesting, why? | 16:50 |
etingof | dtantsur, let's consider the edge case where we have a single network with exactly the same IP config for deploy and instance | 16:50 |
etingof | dtantsur, wouldn't it be more operator-friendly to pass network config to ramdisk and instance in exactly the same way? | 16:50 |
etingof | e.g. network_data.json or config-drive | 16:50 |
JayF | dtantsur: failure semantics around switch failures, mainly. We could've had a better attempt at making control plane stay up in switch failure cases, or at a minimum, short outages on the switch wouldn't have put nodes in weird-ish states (that might have been related to our hacky long running ramdisk stuff though) | 16:51 |
dtantsur | etingof: it probably will, hence my doubts. If we accept network_data, then it can be the same file, but put into different places for deployment and for final instance. | 16:52 |
TheJulia | That moment when you can't find the screen beeping | 16:52 |
etingof | dtantsur, that's what is in the latest spec... | 16:52 |
dtantsur | etingof: yes, but we need to remove instance_network_data and add API validation | 16:52 |
dtantsur | (and probably settle on using glean as a backend) | 16:53 |
dtantsur | this ^^ is the minimal proposal to finish the spec. there can be options. | 16:53 |
dtantsur | JayF: ah, long running ramdisks (which we now have upstream btw) | 16:53 |
JayF | dtantsur: Yeah, but I suspect what you are doing now has to be better than what we were doing :P | 16:54 |
etingof | dtantsur, that all makes sense to me except my doubt wrt network_data validation vs generation debate | 16:54 |
dtantsur | JayF: I'd not be that confident | 16:54 |
TheJulia | \o/ success it was a hangouts chat | 16:54 |
dtantsur | etingof: I don't find "try and get deploy timeouts if you fail" a good failure model | 16:54 |
dtantsur | especially when it comes to silly things like wrong JSON format | 16:55 |
etingof | I see both approaches as formally equivalent for as long as the operator uses config generation tool | 16:55 |
dtantsur | etingof: they don't. just live with it, they won't use such a tool. | 16:56 |
dtantsur | tripleo people tried to create a networking template generator - people still write them by hand (and often enough wrongly) | 16:56 |
etingof | then they will fail at instance config | 16:56 |
etingof | anyway | 16:56 |
etingof | given that they might generate instance network_data by hand - we are not guarding that | 16:57 |
dtantsur | we don't verify instance configdrive because we're not on the other end. we don't know how it should look. | 16:57 |
dtantsur | for the ramdisk we are the other end | 16:58 |
* dtantsur wouldn't mind at least trying to verify the instance configdrive | 16:58 | |
etingof | I know, I am just noting that the end result is the same - either ramdisk or instance won't come up if the operator is messing with network_data.json | 16:58 |
dtantsur | Sure, the area of responsibility is different. We're responsible for the ramdisk, the operators are responsible for the final instance. | 16:59 |
*** lucasagomes has quit IRC | 17:00 | |
*** hwoarang has quit IRC | 17:01 | |
TheJulia | rpittau: re 700912, the env-setup script gets it I think.. | 17:02 |
rpittau | TheJulia: yeah, I was going to reply to my own comment :/ | 17:03 |
rpittau | TheJulia: one thing, running the test it seems python-ironicclient doesn't get installed and the test fails at openstack baremetal node list | 17:04 |
etingof | dtantsur, right, but still, in my opinion adding glean validator into ironic is barely justified in this case. given all pros and cons | 17:04 |
etingof | but anyway, let me update the spec accordingly | 17:04 |
dtantsur | etingof: it's not "glean validator", it's our API validator | 17:04 |
*** hwoarang has joined #openstack-ironic | 17:04 | |
dtantsur | whatever we accept must be validated as much as possible | 17:04 |
dtantsur | glean or anything else is an implementation detail | 17:04 |
etingof | that perhaps makes it even more hairy then | 17:05 |
dtantsur | you're even free to invent your own format (and convert it to glean behind the scenes) as long as it's documented and validated | 17:05 |
etingof | if we're gonna support more than one syntax | 17:05 |
*** pcaruana has quit IRC | 17:05 | |
dtantsur | we aren't. I'm merely pointing out that it doesn't matter for the API discussion. | 17:05 |
dtantsur | unless we're building something that we truly never ever interpret, we have to validate it | 17:06 |
etingof | you see, I am reluctant to teach ironic parsing a specific dialect of network_data.json | 17:06 |
dtantsur | (examples of something that we do not interpret are whole disk images or the instance configdrive) | 17:06 |
dtantsur | etingof: s/dialect/subset/. this has been your proposal since the beginning, no? the question was only "which subset" | 17:07 |
etingof | dtantsur, not really. the only thing that I had to account for is the need to merge multiple network_data.json files | 17:08 |
TheJulia | in my 1-on-1 atm | 17:08 |
etingof | but that need is gone if we attach network_data to ironic node instead of ironic port | 17:08 |
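For context on the merge that per-port attachment would have required, here is a rough sketch assuming each port carries its own network_data-style dict with `links`, `networks` and `services` lists. The function name `merge_network_data` and the collision rule (duplicate `id` values are an error) are illustrative assumptions, not something the spec or this discussion defines.

```python
def merge_network_data(documents):
    """Merge per-port network_data dicts into one node-level dict.

    Hypothetical helper: links and networks are concatenated, duplicate
    ids are rejected, and identical service entries are de-duplicated.
    """
    merged = {"links": [], "networks": [], "services": []}
    seen_ids = {"links": set(), "networks": set()}
    for doc in documents:
        for section in ("links", "networks"):
            for item in doc.get(section, []):
                if item["id"] in seen_ids[section]:
                    raise ValueError(
                        "duplicate %s id %r" % (section, item["id"]))
                seen_ids[section].add(item["id"])
                merged[section].append(item)
        for service in doc.get("services", []):
            if service not in merged["services"]:
                merged["services"].append(service)
    return merged
```

Attaching a single document to the node, as suggested in the line above, avoids having to define collision rules like these at all.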
*** hwoarang has quit IRC | 17:09 | |
dtantsur | etingof: if you'd like to support the complete network_data.json, we only need to add the missing parts to glean. | 17:09 |
etingof | ;-) | 17:09 |
dtantsur | still easier than implementing everything from the ground up | 17:09 |
rpittau | good night! o/ | 17:09 |
*** rpittau is now known as rpittau|afk | 17:09 | |
etingof | I am leaning towards making ironic (conductor) neutral to network config format | 17:10 |
*** hwoarang has joined #openstack-ironic | 17:10 | |
etingof | so it would be like end-to-end: the operator builds a well-formed network-data.json for ironic ramdisk via a tool, network_data.json passes through ironic conductor to ironic ramdisk for consumption | 17:11 |
dtantsur | s/well-formed// we cannot assume that | 17:11 |
etingof | the same network_data.json could work for instance as well | 17:12 |
* dtantsur reminds that we're still arguing about 100-200 lines of basic validation code...... | 17:12 | |
JayF | dtantsur: I'd be curious if that validation code would live in glean and be libraried-in, or if we'd write it ourselves. | 17:12 |
etingof | we can only assume network_data.json sanity based on operator's willingness to play by the rules i.e. to use config generating tool | 17:12 |
dtantsur | JayF: interesting point. given that format is not glean-specific, it could be just a JSON schema.. | 17:12 |
JayF | dtantsur: seems like it could be painful to require an ironic API validation change to support new features in network_data.json if glean was already updated to support them | 17:13 |
JayF | yeah, you're seeing what I'm getting at a little bit | 17:13 |
dtantsur | JayF: it's not the case we're facing, really. I don't even suggest we validate every aspect of the file. | 17:13 |
JayF | dtantsur: you just wanna make sure it's valid json and has a few keys you expect? | 17:13 |
dtantsur | more of "somebody typed ipaddress instead of ip_address" | 17:13 |
dtantsur | yeah | 17:13 |
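A minimal sketch of the kind of sanity check described above: confirm the input is valid JSON, has the expected shape, and flag misspelled keys such as `ipaddress` for `ip_address`. The key sets below are assumptions for illustration only, not an authoritative network_data.json schema, and `sanity_check` is a hypothetical name rather than ironic code.

```python
import difflib
import json

# Assumed key sets, for illustration only.
KNOWN_KEYS = {
    "links": {"id", "type", "ethernet_mac_address", "mtu"},
    "networks": {"id", "type", "link", "ip_address", "netmask",
                 "routes", "network_id"},
    "services": {"type", "address"},
}
REQUIRED_KEYS = {
    "links": {"id", "type"},
    "networks": {"id", "type", "link"},
    "services": {"type", "address"},
}


def sanity_check(raw):
    """Return a list of problems found in raw JSON text (empty if none)."""
    try:
        data = json.loads(raw)
    except ValueError as exc:
        return ["not valid JSON: %s" % exc]
    if not isinstance(data, dict):
        return ["top level must be a JSON object"]
    problems = []
    for section, value in data.items():
        if section not in KNOWN_KEYS:
            problems.append("unknown section %r" % section)
            continue
        if not isinstance(value, list):
            problems.append("%s must be a list" % section)
            continue
        for index, item in enumerate(value):
            where = "%s[%d]" % (section, index)
            if not isinstance(item, dict):
                problems.append("%s must be an object" % where)
                continue
            for key in sorted(set(item) - KNOWN_KEYS[section]):
                # Suggest the closest known key to catch simple typos.
                close = difflib.get_close_matches(
                    key, sorted(KNOWN_KEYS[section]), n=1)
                hint = " (did you mean %r?)" % close[0] if close else ""
                problems.append("%s: unknown key %r%s" % (where, key, hint))
            for key in sorted(REQUIRED_KEYS[section] - set(item)):
                problems.append("%s: missing key %r" % (where, key))
    return problems
```

This deliberately checks structure only. It cannot tell whether an address is reachable, and whether a caller rejects the request or merely reports the problems is a separate policy choice.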
JayF | dtantsur: I still think that's a little dangerous, because the spec for that json could change a name. Right now, AFAICT, it'd be possible to implement all this in Ironic without having Ironic (or IPA ramdisk) care about the specifics of what's inside that json, as long as glean knows how to deal with it | 17:14 |
dtantsur | JayF: yep, but it's barely a good API (and a failure will cost a lot) | 17:15 |
* etingof humbly reminds that ip_address field can still have a non-functional value that we can't catch | 17:15 | |
JayF | You both are kinda right, it's a crappy failure case that we can avoid some aspects of | 17:15 |
*** ociuhandu has joined #openstack-ironic | 17:15 | |
dtantsur | etingof: never in this discussion have I suggested to cover 100% of failures, hence I'm referring to sanity checking | 17:15 |
JayF | but also you can't prevent the most likely cause of those (actually badly-configured networking) | 17:15 |
JayF | I'll also ask: does this spec consider at all using this network data for rescue ramdisks? | 17:16 |
dtantsur | oh, we haven't talked about rescue much | 17:16 |
dtantsur | (see, that's why it's useful to have you in this discussion :) | 17:16 |
etingof | dtantsur, I know, I am just pointing out that there can be lesser sense in catching typos if semantic errors are likely and fatal | 17:16 |
JayF | It should just be able to use what deployment does, just s/deploy/rescue/ on the relevant added options :P | 17:16 |
dtantsur | etingof: inability to prevent all errors doesn't mean we shouldn't prevent any | 17:17 |
dtantsur | especially since the costs of validation are negligible | 17:17 |
JayF | dtantsur: etingof: I wonder if we have a good way to give non-fatal API warnings ... i.e. API user uploads a network_data.json that fails validation ... can we attempt to still set it on the node, but let the user know "we found potential issues with this file: X, Y, Z" | 17:17 |
etingof | so what's the difference between deploy/rescue and clean ramdisks? | 17:18 |
dtantsur | etingof: deploy and clean ramdisks are largely the same (ditto inspection) | 17:18 |
dtantsur | rescue is running on a user network, so it may potentially have different networking | 17:18 |
JayF | at OnMetal, for example, our rescue ramdisk was extremely similar to our deploy ramdisk, but with some useful tools added for users trying to rescue, and with the private firmware tools ripped out | 17:18 |
dtantsur | which is a huge can of worms if we talk about non-admin users, but let's ignore it | 17:18 |
JayF | oh man, rescue would be painful to support with this | 17:19 |
dtantsur | indeed | 17:19 |
JayF | because you'd have to clean up the "ramdisk" networking configuration before applying the "user" networking configuration | 17:19 |
JayF | which may not be that awful, glean might just do it by default, but it's a case that'd have to be explicitly supported and tested | 17:19 |
*** hwoarang has quit IRC | 17:20 | |
dtantsur | oh | 17:20 |
*** hwoarang has joined #openstack-ironic | 17:20 | |
dtantsur | etingof: what JayF is talking about is that rescue starts on the provisioning network, then switches to the internal one | 17:20 |
dtantsur | with the networking configuration changing mid-way | 17:20 |
dtantsur | oh, s/internal/user/ | 17:21 |
JayF | s/provisioning/rescue/ | 17:21 |
dtantsur | we cannot even express that in the currently proposed model | 17:21 |
etingof | how does it do the switch-over? restarts DHCP client? | 17:21 |
JayF | many deployers use the same network for provisioning and rescue, but it's not required | 17:21 |
dtantsur | etingof: I think so, but I'm not fluent in that code | 17:21 |
JayF | etingof: basically it does a callback to conductor, the conductor tells neutron to flip to user network, sets the state to "rescued" | 17:22 |
JayF | etingof: after performing that callback, and a short sleep, the ramdisk then applies the network configuration already populated on the configdrive | 17:22 |
etingof | JayF, thanks for the insight! | 17:23 |
JayF | it's ugly and horrible | 17:23 |
etingof | JayF, why does it start on provisioning network at all? | 17:23 |
JayF | and honestly worked way better than I expected in the real world :P | 17:23 |
etingof | i.e. rescue | 17:23 |
JayF | well, as noted above, it starts on the *rescue* network | 17:23 |
JayF | deployers can optionally make that the same network through config | 17:23 |
dtantsur | etingof: it needs a network that can access ironic API/TFTP/etc | 17:23 |
JayF | but think about multitenant cases: we can't ever have the user node, running user code, booted on a network that can hit the control plane | 17:24 |
etingof | right, but do we need to even start on rescue network with vmedia/dhcpless? | 17:24 |
dtantsur | etingof: you need to talk to ironic API anyway | 17:25 |
JayF | etingof: to get the rescue password from it :) | 17:25 |
dtantsur | now, multitenancy is probably not about the Edge case | 17:25 |
dtantsur | so these use cases may not overlap | 17:25 |
etingof | can't we embed it into iso? | 17:25 |
dtantsur | but we need to decide something about rescue | 17:25 |
JayF | no, and I'm not even saying you have to support that in the matrix for this feature or w/e | 17:25 |
*** ociuhandu has quit IRC | 17:25 | |
dtantsur | etingof: we can, but right now we don't. | 17:25 |
JayF | just noting that rescue should be considered and explicitly excluded rather than just forgotten :P | 17:25 |
dtantsur | JayF++ great catch | 17:26 |
* etingof is trying to avoid switching the networks | 17:26 | |
dtantsur | etingof: we can explicitly say that rescue is only supported in the limited way | 17:27 |
dtantsur | and then think what we can do about it outside of this spec | 17:27 |
JayF | I will say though, it sounds like rescue working could be crucial to the use case as I understand it | 17:27 |
etingof | yeah | 17:27 |
*** pcaruana has joined #openstack-ironic | 17:28 | |
dtantsur | JayF: yep, but the case TheJulia was talking about assumes same network for everything | 17:28 |
JayF | dtantsur: ah | 17:28 |
dtantsur | rescue will work like that | 17:28 |
dtantsur | multi-tenancy... we'll have to solve it somehow. | 17:28 |
JayF | I'm still about 30 months behind :P | 17:28 |
dtantsur | heh | 17:28 |
dtantsur | JayF: the Edge world is weird and changes every few months | 17:28 |
JayF | and really, I don't even work much upstream right now, but I love getting in these chats when they happen :) | 17:28 |
JayF | dtantsur: s/Edge// | 17:28 |
JayF | lol | 17:28 |
dtantsur | haha, true | 17:29 |
*** hamzy has quit IRC | 17:29 | |
dtantsur | JayF: we love seeing you here :) | 17:29 |
etingof | that's why ironic is taking over the world | 17:29 |
JayF | dtantsur: you know I work with Ruby/Jim at Verizon Media now, right? | 17:29 |
JayF | dtantsur: so I do work on Ironic, just haven't had the chance to work as much upstream (yet, I hope) | 17:29 |
dtantsur | I do, yes (\o/) | 17:29 |
JayF | You still at Red Hat? | 17:30 |
dtantsur | yeah, same stuff, a bit more k8s in my life | 17:30 |
dtantsur | aka http://metal3.io/ | 17:30 |
JayF | Nice. | 17:30 |
JayF | Honestly, I'm kinda glad I stepped away for a couple of years from OpenStack. It's very satisfying to leave, come back, and see all the stuff we all built together that much more mature. | 17:31 |
dtantsur | may be a nice surprise indeed :) | 17:31 |
JayF | Such as, for instance, a chat completely centered around the existence of a network_data.json file that didn't even exist until we all made it exist way back when | 17:31 |
dtantsur | :D | 17:31 |
*** ociuhandu has joined #openstack-ironic | 17:31 | |
dtantsur | and we're approaching supporting non-admin API users, who could imagine | 17:32 |
JayF | I thought the policy work enabled that years ago? | 17:32 |
dtantsur | JayF: I mean, ownership of nodes by non-admin users | 17:32 |
JayF | Oh, I guess you mean *multitenant aware* non-admin API users | 17:32 |
dtantsur | yeah | 17:32 |
JayF | yeah that makes sense, and is likely awesome for my current use cases | 17:33 |
dtantsur | (and leasing being proposed to ironic-specs) | 17:33 |
* dtantsur tries to remember when we last talked f2f | 17:33 | |
dtantsur | etingof: I suggest we make another iteration of the spec, incorporating all the feedback, and walk from there. I've been in this discussion for 4 hours, I need a break :) | 17:34 |
JayF | Austin, TX PTG? | 17:34 |
dtantsur | Austin was a Summit, wasn't it? I guess you mean ATL? | 17:34 |
JayF | dtantsur: if you're ever in the Seattle area, we can have a mini-reunion. At least Aeva is in this area too (although I haven't seen her in a long time either) | 17:34 |
JayF | maybe? I don't remember, it all runs together | 17:34 |
JayF | and there were a few months when I tried to forget it all after getting laid off | 17:35 |
dtantsur | yeah.. | 17:35 |
dtantsur | I'd love to come to that area, barring all the visa complexities | 17:35 |
dtantsur | (and my general dislike of the airplanes) | 17:35 |
JayF | I'm going to try to get to go to the Vancouver summit since it's driving distance for me | 17:36 |
tenbrae | Also planning to attend Vancouver summit, FWIW :) | 17:36 |
dtantsur | I don't have an idea if I go or not. Again, budgets, visas, flights :) | 17:36 |
*** ociuhandu has quit IRC | 17:37 | |
* dtantsur will go get some drin^Wrest now | 17:37 | |
*** dtantsur is now known as dtantsur|afk | 17:37 | |
dtantsur|afk | see you tomorrow! | 17:37 |
JayF | dtantsur|afk: o/ | 17:38 |
* TheJulia exits 1-on-1 | 17:38 | |
TheJulia | brrraaains | 17:38 |
*** ociuhandu has joined #openstack-ironic | 17:39 | |
TheJulia | so rescue networking, yes agree it has to be excluded, but could eventually be supported as long as we implement the agent token work or something like that so we can begin to add some additional layers of security | 17:40 |
*** aedc has joined #openstack-ironic | 17:41 | |
* TheJulia replaces dtantsur's rest with a drink | 17:43 | |
*** priteau has quit IRC | 17:45 | |
*** ociuhandu has quit IRC | 17:46 | |
*** jawad_axd has quit IRC | 17:53 | |
*** derekh has quit IRC | 18:00 | |
TheJulia | re vancouver summit, I'll very likely be there. We've been asked if we would bring the puppy too.... I'm not sure if that is possible. | 18:02 |
*** jawad_axd has joined #openstack-ironic | 18:04 | |
JayF | TheJulia: if you all drive up there, plan a stopover for smoked meats in Tacoma :D | 18:04 |
* TheJulia looks around for the spare braincells and thinks it is just time to go get some lunch and run errands | 18:04 | |
JayF | TheJulia: I even have a big enough parking spot for the dark bug | 18:04 |
JayF | *bus | 18:04 |
TheJulia | ohhh! | 18:04 |
TheJulia | And we have solar on it now | 18:04 |
TheJulia | so we don't need to run the generator as much | 18:04 |
JayF | Yeah, I'm 100% serious, just LMK | 18:04 |
*** bdodd has quit IRC | 18:04 | |
TheJulia | okay! | 18:04 |
etingof | dtantsur|afk yes, I will push another spec update | 18:06 |
TheJulia | dking_desktop: Sorry, I think I got your last message as I was falling asleep last night. You should be fine if you use stable/train. Depending on how you're triggering it, you may need to explicitly set the git_branch settings. A quick grep -r will find them since it is like package_<git_branch> | 18:07 |
TheJulia | okay, brain needs to disconnect for a little bit | 18:07 |
TheJulia | errands! | 18:07 |
dking_desktop | Thank you very much! | 18:09 |
*** dougsz has quit IRC | 18:16 | |
TheJulia | and no need to do errands, found the hidden bag of cat food | 18:20 |
*** afasano has quit IRC | 18:29 | |
*** afasano has joined #openstack-ironic | 18:30 | |
*** jawad_axd has quit IRC | 18:44 | |
*** trident has quit IRC | 18:48 | |
*** trident has joined #openstack-ironic | 18:49 | |
*** hwoarang has quit IRC | 19:10 | |
*** hwoarang has joined #openstack-ironic | 19:11 | |
*** jtomasek has quit IRC | 19:13 | |
*** bdodd has joined #openstack-ironic | 19:24 | |
*** dougsz has joined #openstack-ironic | 19:24 | |
*** hamzy has joined #openstack-ironic | 19:26 | |
*** dougsz has quit IRC | 19:27 | |
*** jawad_axd has joined #openstack-ironic | 20:03 | |
*** p0tr3c has quit IRC | 20:04 | |
*** jawad_axd has quit IRC | 20:09 | |
*** bfournie has quit IRC | 20:12 | |
*** Lucas_Gray has joined #openstack-ironic | 20:30 | |
*** pcaruana has quit IRC | 20:41 | |
*** Goneri has quit IRC | 21:00 | |
TheJulia | dtantsur|afk: https://storyboard.openstack.org/#!/story/2007076 filed :\ | 21:03 |
*** afasano has quit IRC | 21:08 | |
*** afasano has joined #openstack-ironic | 21:09 | |
*** mmethot has joined #openstack-ironic | 21:14 | |
*** rcernin has joined #openstack-ironic | 21:37 | |
*** rcernin has quit IRC | 21:37 | |
*** rcernin has joined #openstack-ironic | 21:38 | |
*** early has quit IRC | 21:55 | |
stevebaker | TheJulia: hey, I'm thinking of tackling the ironic pecan->flask conversion, but I should talk to someone first | 21:58 |
TheJulia | stevebaker: this is encouraged all around | 21:58 |
stevebaker | talking? yeah | 21:58 |
TheJulia | well, converting the api as well | 21:59 |
stevebaker | TheJulia: ok, so first questions. Is there upstream consensus to start this? Should it be a story or a blueprint? | 22:00 |
TheJulia | stevebaker: 1) there has long been consensus. 2) I think there is a story already. 3) Ironic doesn't use blueprints | 22:01 |
stevebaker | TheJulia: cool, I didn't find the story when I looked, but will try again | 22:02 |
TheJulia | searching myself | 22:03 |
TheJulia | https://storyboard.openstack.org/#!/story/2005806 | 22:03 |
stevebaker | TheJulia: \o/ | 22:04 |
TheJulia | hmm | 22:04 |
TheJulia | https://storyboard.openstack.org/#!/story/1651346 | 22:04 |
TheJulia | The latter one has rfe approved tagged | 22:05 |
stevebaker | TheJulia: huh, or is that one more about replacing WSME for validation, which is related but orthogonal? | 22:06 |
TheJulia | the general idea is get rid of pecan and wsme | 22:08 |
stevebaker | fair enough | 22:08 |
TheJulia | ironic-python-agent already got swapped over to flask and aside from a single bug, worked like a charm | 22:08 |
stevebaker | TheJulia: finally, I'll see how the implementation goes as far as the risk of the switchover breaking things. But my first attempt will be to have a few refactoring changes then a big-bang switchover change. | 22:10 |
TheJulia | stevebaker: a big single bang patch is super unlikely to ever merge | 22:11 |
TheJulia | from a review bandwidth standpoint, it is just going to crawl... whereas like one endpoint at a time should fly right through and allow for fewer conflicts/headaches | 22:11 |
TheJulia | the idea was always to do what keystone did, change a single endpoint, get happy with that, and then sweep through the rest of the api in rapid succession | 22:12 |
stevebaker | TheJulia: oh they did pecan->flask? | 22:12 |
*** early has joined #openstack-ironic | 22:12 | |
TheJulia | yup | 22:12 |
stevebaker | good to know, I'll check it out | 22:12 |
TheJulia | stevebaker: it was like two years ago now | 22:13 |
stevebaker | ok | 22:13 |
TheJulia | I'm sure morgan is someplace... if there are questions | 22:13 |
stevebaker | sweet | 22:14 |
TheJulia | okay maybe he is not on irc | 22:15 |
stevebaker | TheJulia: it looks like pre-flask they used paste.deploy plus their own internal wsgi fu https://docs.openstack.org/releasenotes/keystone/rocky.html#prelude | 22:18 |
TheJulia | actually, come to think of it. ipa was ?werkzeug? which is what flask is based on | 22:19 |
TheJulia | since we didn't need all of the extra stuff for that api | 22:19 |
stevebaker | ok | 22:19 |
*** bfournie has joined #openstack-ironic | 22:23 | |
*** afasano has quit IRC | 22:47 | |
*** afasano has joined #openstack-ironic | 22:47 | |
*** afasano has quit IRC | 23:04 | |
*** afasano has joined #openstack-ironic | 23:06 | |
*** hamzy has quit IRC | 23:13 | |
*** tesseract has quit IRC | 23:13 | |
*** goldyfruit_ has joined #openstack-ironic | 23:17 | |
*** rh-jelabarre has quit IRC | 23:26 | |
*** ociuhandu has joined #openstack-ironic | 23:30 | |
*** afasano has quit IRC | 23:31 | |
*** afasano has joined #openstack-ironic | 23:32 | |
*** ociuhandu has quit IRC | 23:35 | |
*** aedc has quit IRC | 23:36 | |
*** bdodd has quit IRC | 23:41 |
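
Editor's sketch, relating to the 22:11-22:12 exchange about converting one endpoint at a time instead of landing a single big-bang patch. One way such an incremental switchover could be staged is a small WSGI dispatcher that sends already-converted path prefixes to the new application and everything else to the old one. This is only an illustration under stated assumptions: `legacy_app`, `migrated_app`, `PrefixDispatcher`, `call`, and the `/v1/chassis` prefix are hypothetical names, it uses only the Python standard library rather than pecan or flask, and it is not taken from the ironic or keystone code.

```python
# Hypothetical sketch: route already-migrated path prefixes to a new WSGI app
# and everything else to the legacy app. All names are illustrative only.
from wsgiref.util import setup_testing_defaults


def legacy_app(environ, start_response):
    """Stand-in for the existing (e.g. pecan-based) WSGI application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"legacy"]


def migrated_app(environ, start_response):
    """Stand-in for the endpoints already converted to the new framework."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"migrated"]


class PrefixDispatcher:
    """Send requests under the listed prefixes to new_app, the rest to old_app."""

    def __init__(self, old_app, new_app, migrated_prefixes):
        self.old_app = old_app
        self.new_app = new_app
        self.migrated_prefixes = tuple(migrated_prefixes)

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        for prefix in self.migrated_prefixes:
            # Match the prefix itself or anything below it, but not "/v1/chassisx".
            if path == prefix or path.startswith(prefix + "/"):
                return self.new_app(environ, start_response)
        return self.old_app(environ, start_response)


def call(app, path):
    """Invoke a WSGI app in-process and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


# Start with a single converted endpoint; extend the list as more are migrated.
app = PrefixDispatcher(legacy_app, migrated_app, ["/v1/chassis"])
```

Under this arrangement each patch would only add one prefix to the list and the matching handlers, which keeps individual reviews small; once every prefix is migrated, the dispatcher and the legacy application could be dropped together.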