opendevreview | Miguel Lavalle proposed openstack/os-vif master: Fix race with DPDK and vhostuserclient mode https://review.opendev.org/c/openstack/os-vif/+/830103 | 00:36 |
---|---|---|
*** prometheanfire is now known as Guest2 | 04:59 | |
*** Guest2 is now known as prometheanfire | 05:09 | |
*** amoralej|off is now known as amoralej | 07:01 | |
bauzas | hola folks | 09:41 |
* bauzas had to resurrect his ZNC bouncer | 09:41 | |
gibi | bauzas: o/ | 09:41 |
bauzas | Uggla: looks like your karma helped my server to be back alive :) | 09:41 |
bauzas | just by a gchat ping, wow | 09:41 |
gibi | gmann: thanks I've started reading it this morning | 09:41 |
bauzas | gmann: will review it | 09:42 |
Uggla | bauzas, first time it happens to me! :) Most of the time I find bugs. ;) | 09:43 |
bauzas | it was fun | 09:43 |
bauzas | I was fighting for 2 hours with a rasppi not wanting to resurrect and like 1 min after your ping and like 10 reboots, it eventually rebooted | 09:44 |
bauzas | maybe a fsck | 09:44 |
bauzas | just take one day off and be prepared about a bad day :p | 09:45 |
Uggla | I have a question regarding: https://specs.openstack.org/openstack/nova-specs/specs/yoga/approved/libvirt-virtiofs-attach-manila-shares.html | 09:47 |
Uggla | Regarding the model we have share_mapping table and instance_uuid as a field. If I'm not wrong we have a one to one relationship between share and instance, correct ? | 09:50 |
Uggla | But a share can be attached to several instances. So the model may need to be changed. Am I wrong ? | 09:53 |
bauzas | hmmmm | 09:56 |
* bauzas looks | 09:56 | |
Uggla | I would say we need a share_uuid field on the instance table to have a 1 to n relation ? | 09:57 |
Uggla | I could update the spec, but I would like to know if I have understand it well. | 09:58 |
bauzas | Uggla: oh, I just read again the spec | 10:01 |
bauzas | Uggla: nothing prevents you to have a share_mapping table with multiple share_uuid records having the same UUID | 10:02 |
Uggla | so we will have several records for the same share with a diff uuid. | 10:03 |
Uggla | so we will have several records for the same share with a diff instance_uuid. | 10:03 |
bauzas | I guess lyarwood wanted to tell we would do the same like for BlockDeviceMapping table | 10:03 |
bauzas | where we have a UC be the mapping UUID itself | 10:04 |
bauzas | not the volume UUID | 10:04 |
bauzas | Uggla: https://github.com/openstack/nova/blob/master/nova/db/main/models.py#L590 | 10:04 |
bauzas | either way, the PK on the bdmmapping table is a single 'id' key | 10:05 |
kashyap | Some more documentation on it - https://docs.openstack.org/nova/latest/user/block-device-mapping.html | 10:05 |
bauzas | thanks kashyap | 10:06 |
gibi | gmann: thanks I've started reading the new PSs this morning | 10:08 |
Uggla | Ok thanks, so in that case, maybe the share_mapping table requires an index on instance_uuid ? | 10:12 |
bauzas | Uggla: this depends whether you want to query from this field | 10:13 |
bauzas | Uggla: but since you'll tag a FK on instances.uuid, it will automatically generate an index | 10:14 |
bauzas | :) | 10:14 |
Uggla | ok | 10:14 |
opendevreview | Pavlo Shchelokovskyy proposed openstack/nova master: Sort PCI devices by their address https://review.opendev.org/c/openstack/nova/+/830136 | 10:20 |
Uggla | Another question, looking at block device attachment, the api speaks directly to the compute api. Are there some specific rules that defines to speak through the conductor, or directly to the compute. Do we speak to the compute directly when we don't need to interact with the db at the first stage ? | 10:24 |
lyarwood | \o hey so yeah I thought I included a unique UUID per mapping between the instance and share? | 10:24 |
lyarwood | Uggla: the compute API isn't running on the compute, that's up in nova-api, the manager is the part running on the compute | 10:26 |
bauzas | lyarwood: I explained it to Uggla | 10:27 |
bauzas | https://specs.openstack.org/openstack/nova-specs/specs/yoga/approved/libvirt-virtiofs-attach-manila-shares.html#data-model-impact | 10:28 |
bauzas | Uggla: ^ | 10:28 |
lyarwood | cool thanks and sorry I don't have this IRC client open anymore, only just saw the email :) | 10:28 |
Uggla | lyarwood, agreed my question is more the path api --> compute vs api --> cond --> compute ? | 10:28 |
lyarwood | Uggla: we only use the latter for long running tasks iirc like rebuilds, resizes etc | 10:29 |
lyarwood | Uggla: device attachments that are simple async operations tend to go directly between the api and compute | 10:29 |
bauzas | ++ | 10:33 |
bauzas | also if we already know the compute | 10:33 |
Uggla | lyarwood, so the manila one drops down to the simple async op case, correct ? | 10:33 |
lyarwood | Yeah | 10:34 |
lyarwood | and tbh it should be a noop at the moment as the instance is shutoff anyway | 10:34 |
lyarwood | starting the instance will recreate it within the libvirt driver and the logic to attach the mount needs to live there initially | 10:35 |
lyarwood | until we support hot plug | 10:35 |
Uggla | lyarwood, ok thx. | 10:36 |
Uggla | thank you for the answers, I think I can continue. | 10:37 |
opendevreview | yuval proposed openstack/nova master: Lightbits LightOS driver https://review.opendev.org/c/openstack/nova/+/821606 | 11:00 |
opendevreview | yuval proposed openstack/nova master: Lightbits LightOS driver https://review.opendev.org/c/openstack/nova/+/821606 | 11:03 |
sean-k-mooney | gibi: am well the pci module also is virt dirver indepenent and we explictily call sysfs in many cases to avoid depening on the virt driver | 11:13 |
sean-k-mooney | gibi: so that is already an established patteren | 11:13 |
gibi | :/ | 11:14 |
gibi | so basically nova-compute cannot run on windows with hyperv | 11:14 |
gibi | or it can but never handle PCI devices | 11:14 |
sean-k-mooney | gibi: they dont support pci device management | 11:14 |
gibi | yeah I figure that they cannot right now | 11:14 |
sean-k-mooney | so ya on windows they would have ti impelent all the fucntions in a windows compatiable way | 11:14 |
gibi | interestingly whent he pf_interface_name was added for QoS that was added via the virt driver interface | 11:15 |
sean-k-mooney | in os-vif we added an indriection layer in teh few places we needed it | 11:15 |
sean-k-mooney | gibi: yes which i changed since that was unreliable | 11:15 |
sean-k-mooney | due to libvirt design choices | 11:15 |
sean-k-mooney | mainly the way the cache and how that interacts with udev | 11:16 |
sean-k-mooney | gibi so right now the neutron use of sysfs is still confied to the sriov path | 11:17 |
sean-k-mooney | so i dont think this usage breaks windows support since that was already not supported | 11:17 |
sean-k-mooney | gibi: are there other usages that you found that would be common | 11:18 |
gibi | sorry I meant parent_ifname | 11:18 |
gibi | that is still done via the virt driver interface https://github.com/openstack/nova/blob/master/nova/virt/libvirt/host.py#L1321 | 11:18 |
sean-k-mooney | ah there | 11:18 |
gibi | sean-k-mooney: the new smartnic feature introduced a set of new dependencies between nova's neturon code (called from the compute manager) and sysfs | 11:19 |
sean-k-mooney | well its calling into the pci module to do it | 11:19 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/virt/libvirt/host.py#L1322 | 11:19 |
sean-k-mooney | so that is not actully calling libvirt anymore | 11:19 |
gibi | yes but the call coming from the libvirt driver | 11:19 |
gibi | so the nova-compute does not depend on the sysfs in that case | 11:20 |
gibi | as the virtdirver hides it | 11:20 |
sean-k-mooney | yes this is genergating the virt independent view of the pci devices that is pass to the pci tracker | 11:20 |
sean-k-mooney | yes | 11:20 |
sean-k-mooney | gibi: so at one point i think i proposed storing the mac and vf number in addtion to the serial in the pci dev extra info | 11:21 |
sean-k-mooney | we could do that and remove the need to do this lookup entirly | 11:21 |
sean-k-mooney | in the neutorn module | 11:21 |
gibi | yeah my question is do we want two interfaces from compute manager towards the hypervisor host | 11:22 |
gibi | as today we have the virt interface and the pci_utils "interface" | 11:23 |
sean-k-mooney | thats fair i know os-brick also looks at the host filesystem | 11:23 |
sean-k-mooney | i guess that is hidden? | 11:23 |
gibi | do we call os-brick outside of the virt driver? | 11:24 |
sean-k-mooney | is os-brick only used via the virt dirver i assume so | 11:24 |
sean-k-mooney | ya i dont think we call it in the generic code | 11:24 |
sean-k-mooney | i was just trying to think what other indrections we have | 11:24 |
sean-k-mooney | i generally did not consider the nuetorn module to be part fo the compute manager | 11:24 |
sean-k-mooney | but it is call by it | 11:24 |
sean-k-mooney | i was thinking of it like the pci module | 11:25 |
gibi | I grepped now, os_brick only imported from under nova.virt (and in nova-manage)_ | 11:25 |
sean-k-mooney | ack | 11:25 |
sean-k-mooney | gibi: so i think we could file this as a bug and adress it without much extra work | 11:25 |
gibi | OK. I will file a bug. | 11:25 |
gibi | and probably I have too do the fixing as well as I need this to make the PF MAC address reporting to neutron | 11:26 |
gibi | as the current patch for that is also calling pci_utils from nova's neutron code instead of relying on the pci_info | 11:27 |
gibi | from the virt driver | 11:27 |
sean-k-mooney | ya it was an exsiting pattern but i agree the tight coupleing shoudl not exist between linux and the neutron module | 11:27 |
gibi | I think the pattern was established ~ 2015 when the first fix for the PF MAC address problem was added :D | 11:28 |
gibi | so everything is connected :D | 11:29 |
sean-k-mooney | there is one think to consider however | 11:29 |
sean-k-mooney | the pci tracker tracks pools of similar devices | 11:29 |
sean-k-mooney | so for PF mac and PF name that logicaly maps ok to pool fo the PF's VFs | 11:30 |
sean-k-mooney | the vf number will it can be tracked per device its not an atribute of the pool | 11:30 |
sean-k-mooney | so some of this info coudl be store in the db and some will have to be provided by the virt dirver per device | 11:31 |
sean-k-mooney | it can do that just by calling the existing fucntion when it creates the device object | 11:31 |
sean-k-mooney | but just pointing out that we might not be abel to hide this in the common pci module code | 11:32 |
sean-k-mooney | we might need to do this in the virt driver part | 11:32 |
sean-k-mooney | in any case it should be relitivly trivial to extend https://github.com/openstack/nova/blob/master/nova/objects/pci_device.py#L121 | 11:33 |
gibi | sean-k-mooney: currently the parent_ifname is part of the extra_info | 11:33 |
sean-k-mooney | in fact we could store the info in extra_info for backwards compatiablity without changing the object | 11:33 |
sean-k-mooney | ya | 11:33 |
gibi | yeah | 11:33 |
gibi | lets try that | 11:34 |
sean-k-mooney | so we can add the other field there and keep object compat and perhaps add properties for access | 11:34 |
gibi | yepp | 11:34 |
gibi | and that way I can still use that in my PF MAC address bugfix and keep it backportable | 11:34 |
sean-k-mooney | yes | 11:34 |
gibi | cool, thanks for the brainstroming | 11:35 |
* sean-k-mooney note to self always include a dict of random strings filed for "backporting" | 11:35 | |
gibi | :D | 11:35 |
gibi | yeah I feel the mental pain due to that json blobs in the db | 11:36 |
gibi | but the bugfix backporting pain is bigger | 11:36 |
gibi | so that won | 11:36 |
sean-k-mooney | honestly the only thing i woudl replace it with is a key value mapping table | 11:40 |
sean-k-mooney | like instance extra | 11:40 |
sean-k-mooney | * instance_system_metadata | 11:40 |
gibi | if the OVO model is just a Dict then changing the SQL model does not help architecturally | 11:40 |
sean-k-mooney | but that would not buy use much since we dont really query on the sub fileds | 11:41 |
sean-k-mooney | ya it doesnt really | 11:41 |
sean-k-mooney | if you work on this ping me and ill happily review i dont know if dmitriis woudl have time to work on it as a followup/techdebt cleanup | 11:42 |
gibi | sean-k-mooney: thanks. I will work on it as I need it for the PF MAC bugfix anyhow | 11:43 |
sean-k-mooney | :) ping me if you want me to review that too. speaking of which i should go look at the rest of your placement series | 11:44 |
sean-k-mooney | gibi: we defintly want to backport the PF fix if it ends up being posible by the way | 11:45 |
gibi | sean-k-mooney: I think it will be possible | 11:45 |
sean-k-mooney | i kind of feel like the backporatble fix and final fix would be different however | 11:45 |
gibi | sean-k-mooney: yeah, if you are done with the placement series then I will switch to that to fix the small thing in a followup | 11:45 |
gibi | sean-k-mooney: let's see about the backportability of the PF MAC | 11:46 |
sean-k-mooney | as in final fix would invovle multipel port bdingin but backporable would just unbind the port for the vm update the mac and reattach it or something like that | 11:46 |
gibi | sean-k-mooney: nope, the plan is to add the PF MAC to the binding:profile and let neutron do the overwrite of port.mac_addres based on that | 11:46 |
sean-k-mooney | oh right yes | 11:47 |
sean-k-mooney | you said that last week | 11:47 |
gibi | the neutron side is basically ready in https://review.opendev.org/c/openstack/neutron/+/829247 | 11:47 |
gibi | (I need to fix comments there) | 11:47 |
sean-k-mooney | are neutron ok to backport that? | 11:47 |
sean-k-mooney | or will we have to workaround that in the nova backport | 11:47 |
gibi | I think they are OK to backport that | 11:48 |
gibi | no issue was raise so far on that | 11:48 |
sean-k-mooney | ack then that makes our life simpler | 11:48 |
gibi | the nova side needs a redo as per the pci_utils / pci_info refactor https://review.opendev.org/c/openstack/nova/+/829248 | 11:48 |
sean-k-mooney | i was assuming they would not backport that change | 11:48 |
gibi | sean-k-mooney: at least they haven't indicated that it is not backportable | 11:48 |
gibi | I will double check with them | 11:49 |
sean-k-mooney | ack it looks pretty small and i dont think it would break existing usage | 11:49 |
sean-k-mooney | so it shoudl be safe to backport | 11:49 |
gibi | yepp that is my view too | 11:49 |
sean-k-mooney | its kind of a feature however but i would be look very favorably towards framing it as a bugfix as you have done in the release note | 11:50 |
sean-k-mooney | ill finish reviewing that since i have it open now | 11:50 |
gibi | yeah it is a bug in nova, that needs some featury thing from neutron :) | 11:51 |
sean-k-mooney | gibi: there is no way to detect this form the api right | 11:51 |
sean-k-mooney | e.g. that this is supproted | 11:51 |
gibi | it cannot be detected | 11:52 |
sean-k-mooney | just thining that on the nova side we could eventually move to only setting the PF port via this eventully | 11:52 |
gibi | but it is safe from upgrade | 11:52 |
gibi | yeah I left a note in the nove code to only remove the current PF MAC settings logic after couple of releases | 11:52 |
sean-k-mooney | perfect | 11:52 |
gibi | sean-k-mooney: btw the arch OVO backport is also crazyness https://review.opendev.org/c/openstack/nova/+/829989 but I have a totally differnt proposal as a possible direction as a comment within that patch | 12:00 |
sean-k-mooney | what exactly is that doing? | 12:00 |
gibi | so the Arch enum got new values | 12:01 |
gibi | due to the emulation support feature | 12:01 |
sean-k-mooney | right which should be added at the end | 12:01 |
gibi | and the ComputeNode object has a list of HVSpec objects with arch values | 12:01 |
sean-k-mooney | and then you would just raise an error if you try to backport | 12:01 |
gibi | HVSpec cannot backport itself | 12:02 |
gibi | the ComputeNode would need to backport it | 12:02 |
gibi | by removing too new HVSpec instances (with new arch) | 12:02 |
gibi | but OVO does not cooperate | 12:02 |
gibi | I even filled https://bugs.launchpad.net/oslo.versionedobjects/+bug/1961482 | 12:02 |
gibi | so the current patch up above works but ugly as hell | 12:02 |
sean-k-mooney | am well the compute node would not have the new values if it was not updated | 12:02 |
sean-k-mooney | and an instance object with the new values shoudl not be sendable to the compute node | 12:03 |
gibi | there is code path in live migration where an old compute loads the new compute from the DB | 12:03 |
sean-k-mooney | oh | 12:03 |
sean-k-mooney | thats a nasty edgecase | 12:04 |
gibi | grenade caught it | 12:04 |
* gibi pats grenade's head | 12:04 | |
sean-k-mooney | right ok so the list object would have to strip the new version instead of raise | 12:04 |
gibi | yes | 12:04 |
gibi | but it is really hard to do that with OVO | 12:04 |
gibi | (see the patch with a ton of comments) | 12:05 |
sean-k-mooney | i dont think we have had to do that in the past | 12:05 |
sean-k-mooney | at least i cant think of a case where this has come up | 12:05 |
gibi | my alternative proposal is not to publish HVSpec objects with new arch values until a min service version check is passed | 12:05 |
sean-k-mooney | most of the enums we have extended in the past have been only in teh instance object | 12:05 |
sean-k-mooney | ok ya that would work | 12:06 |
sean-k-mooney | we often dont enable feature until the cloud is fully upgraded | 12:06 |
sean-k-mooney | this woudl cause all the compute nodes to update the comptue node table once they are all upgraded | 12:06 |
gibi | I pulled dansmith into the discussion maybe he sees an easy fix in the OVO support for this case but if not then I will suggest chateaulav to add a service version check instead | 12:06 |
sean-k-mooney | well maybe not | 12:07 |
sean-k-mooney | we are extendign the values but that does not mean the compute node would use one of the new values | 12:07 |
gibi | today it uses in the gate | 12:07 |
gibi | I haven't checked it what enables all the new arch | 12:07 |
sean-k-mooney | its likely just the presence fo the qemu binary | 12:08 |
sean-k-mooney | on ubuntu it installs all of them by default | 12:08 |
sean-k-mooney | how is this reported in the compute node db recored by the way | 12:08 |
sean-k-mooney | ah supported_hv_specs | 12:09 |
gibi | yes, that one | 12:09 |
sean-k-mooney | i guess we dont wnat to defien that as the host native arch | 12:10 |
sean-k-mooney | since that would not really help if we had native hosts using one of the new options | 12:10 |
gibi | it seems the new arch coming from the host cababilities via libvirt | 12:10 |
sean-k-mooney | yes | 12:11 |
sean-k-mooney | those are dynmic based on which qemu binaries libvirt finds | 12:11 |
gibi | ahh I see | 12:11 |
gibi | the you are right about the binaries | 12:11 |
sean-k-mooney | technically its dynmaic based on which libvirt virt plugin too e.g. if you install the lxc or openvz plugins they also extend it but in this case its the presence of the riscv emulators | 12:12 |
sean-k-mooney | ignoring this for a miniute how close do you think the emulation work is for this cycle | 12:13 |
sean-k-mooney | i think the previous impression i heard was likely not this cycle | 12:13 |
sean-k-mooney | i am debating if defering the ricsv support woudl help or not | 12:13 |
sean-k-mooney | that woudl allow use to punt this enum change to next cycle | 12:14 |
sean-k-mooney | so basically im suggesting that perhaps its better for chateaulav to focus on the arm and mips supprot this cycle and we can adress teh riscv supprot next cycel btu if we dont think the code is likely to land before thruday which i gusse is unlikely it proably is not worth spliting it out | 12:16 |
gibi | sean-k-mooney: good point. I think the series is in an OK shape overall | 12:25 |
gibi | so if we could remove the OVO backport pain by not adding risc then we can land it this week I believe | 12:25 |
gibi | bauzas, melwitt: there is a gate bug on nova that is a duplicate of a gate bug in neutron https://bugs.launchpad.net/nova/+bug/1959349 https://bugs.launchpad.net/neutron/+bug/1940425 but if I mark the newer bug as duplicate then we loose the tracking of it | 12:26 |
gibi | bauzas, melwitt: is it OK if I mark it duplicate and then add nova as affected to the eariler neutron bug? | 12:26 |
sean-k-mooney | gibi: riscv was not orgianlly on there list i suggeted it be added since i tought that would be one of the more useful targets | 12:27 |
gibi | sean-k-mooney: OK, then I think it make sense to split it out | 12:27 |
sean-k-mooney | gibi: so defering that i dont think woudl compropise there usecase | 12:27 |
gibi | OK | 12:27 |
gibi | chateaulav: ^^ I know that it is again a change in direction but could you remove the riscv support from the series. that way we can avoid the ovo backport pain | 12:28 |
sean-k-mooney | chateaulav: ^ at the end fo the day the descion is yours | 12:28 |
gibi | sean-k-mooney: +1 | 12:28 |
chateaulav | Lol. Yeah I think that will be better overall gibi, I agree with sean-k-mooney that I can look at doing that for next cycle. It seems as though there are also some things in the work for the riscv firmware community that may be needed. Overall initial support will be arm, s390x, ppc, and potentially mips if I can solve the pci issue. So not bad, if you ask me, for initial release of emulation | 12:35 |
gibi | chateaulav: cool | 12:36 |
gibi | sean-k-mooney: filed the bug for the pci.utils networking.neutron coupling issue https://bugs.launchpad.net/nova/+bug/1961587 | 12:36 |
sean-k-mooney | ack just triaged it | 12:59 |
*** amoralej is now known as amoralej|lunch | 13:09 | |
gibi | thanks | 13:43 |
*** amoralej|lunch is now known as amoralek | 13:59 | |
opendevreview | Alexey Stupnikov proposed openstack/nova master: Add functional tests to reproduce bug #1960412 https://review.opendev.org/c/openstack/nova/+/830010 | 14:22 |
opendevreview | Alexey Stupnikov proposed openstack/nova master: Add functional tests to reproduce bug #1960412 https://review.opendev.org/c/openstack/nova/+/830010 | 14:25 |
sean-k-mooney | gibi: have i mentioned how much i hate the fact that gerrit forward ports comments now | 14:38 |
sean-k-mooney | its such a ux regressions | 14:38 |
gibi | it is only a problem iff we forget to close the comments when they are solved | 14:38 |
sean-k-mooney | liek these ones https://review.opendev.org/c/openstack/nova/+/821606/10/nova/virt/libvirt/volume/lightos.py | 14:39 |
sean-k-mooney | gibi: also this has been retoactivly enabeld for all reviews in the past | 14:39 |
gibi | yeah, that retroactive thing is a pain I agree | 14:39 |
sean-k-mooney | so if we lookup a revew becasue we are lookign at git blame we will now get all the unresoved ones where the feature did not exist | 14:40 |
sean-k-mooney | im going to ack most of the commend on that once i check them but to me this breaks the workflow since if the converation is happenign betwen reviewrs rather then just the revier and authour | 14:41 |
gibi | yeah | 14:44 |
gibi | not optimal but in the other hand it helps when people only fix a set of comments and forget the rest | 14:44 |
gibi | and push a new PS | 14:44 |
sean-k-mooney | yes and know. it can but over all i think this is a more harmful change then good | 14:45 |
gibi | It forces me to close comment when they fixed or answered | 14:45 |
gibi | I think in time we can learn to use it well | 14:45 |
sean-k-mooney | right but i dont think you should have to do that | 14:45 |
sean-k-mooney | i am hoping we will get the option to disable it personlly | 14:46 |
sean-k-mooney | its the frist time that gerrit has made a change that has made me consider if we shoudl continue to use gerrit longterm | 14:46 |
sean-k-mooney | to me the fact that comments were pined to reveriosn was one of the big benifits for using gerrit | 14:47 |
sean-k-mooney | this regresses that | 14:47 |
sean-k-mooney | i can see the beniftis too but to me in its current form the beiftis do not outway the downside | 14:48 |
gibi | hm, maybe a toggle in the UX to show / hide the old PS comments can be a compromise | 14:48 |
opendevreview | yuval proposed openstack/nova master: Lightbits LightOS driver https://review.opendev.org/c/openstack/nova/+/821606 | 14:48 |
sean-k-mooney | if it only forward ported the comment that refered to lines that were modified in the next revision i could proably live with that | 14:48 |
yuval_ | gibi sean-k-mooney thanks for the fast reply! | 14:49 |
sean-k-mooney | like its infurating that i have to load back up ps 8 to see why they refer too | 14:49 |
gibi | yuval_: will check the last version before end of my day | 14:50 |
yuval_ | thanks, I just updated as sean-k-mooney suggested the req file and lower constraints | 14:52 |
yuval_ | to see zuul pass | 14:52 |
yuval_ | I am now re-adding the lightbits ci to work on nova repo | 14:52 |
sean-k-mooney | yuval_: https://review.opendev.org/c/openstack/nova/+/821606/11/nova/virt/libvirt/volume/lightos.py#31 can you respond to this too | 14:53 |
yuval_ | yes, no problem | 14:53 |
dmitriis | sean-k-mooney, gibi: w.r.t. the above ping, I can chime in to help with reviews | 14:53 |
gibi | dmitriis: thanks | 14:54 |
gibi | dmitriis, sean-k-mooney: w.r.t. the sysfs pci_utils network.neutron coupling, I started looking into the refactor and I have to say that the whole resource tracker is infected with teh sysfs coupling via the pci device tracker | 14:55 |
sean-k-mooney | gibi: yes that is not surpriseing to be honest | 14:55 |
gibi | so this sean-k-mooney but then we are back to that the compute manager depends on sysfs | 14:56 |
sean-k-mooney | everything pci related is only supproted on linux today | 14:56 |
gibi | which is an acceptable limitation but the external depenency (sysfs) is not abstracted out | 14:56 |
gibi | and heavily tangled in | 14:56 |
sean-k-mooney | yes it is | 14:57 |
gibi | so I think today it would be extremely hard to add support to PCI for a non Linux host | 14:57 |
sean-k-mooney | it could certenly be factored more but i guess the expecation is you woudl add windows suport to the pci module if that ever came to be a thing | 14:57 |
gibi | hm | 14:58 |
gibi | yeah that would be an option too | 14:58 |
sean-k-mooney | gibi: well you would have to just implenet the supprot in the pci.utils module i | 14:58 |
gibi | I might have been overzealeus about this | 14:58 |
sean-k-mooney | its what we do in os-vif for what its worth https://github.com/openstack/os-vif/tree/master/os_vif/internal/ip | 14:58 |
sean-k-mooney | we have an ip module with a windows and linux impl | 14:59 |
gibi | yeah maybe that is a better direction | 14:59 |
gibi | maybe I just wish for a better defined pci.utils module | 14:59 |
gibi | where the interface is clear | 14:59 |
sean-k-mooney | ack | 15:00 |
sean-k-mooney | we kind of have grown it organicly over time | 15:00 |
gibi | yes, it is very organic :) | 15:00 |
gibi | like the init_host in the compute manager parses the whitelist config early but by that it already reading sysfs a lot :) | 15:00 |
sean-k-mooney | at some point i think we will want to unify mdev+pci+vdpa into a more uniform device tracker imple | 15:01 |
gibi | even if the compute manager itself does not use the whitelsit config at all | 15:01 |
gibi | sean-k-mooney: yeah that is also an interesting direction | 15:01 |
sean-k-mooney | gibi: parsing the whitelist does not hit sysfs | 15:01 |
gibi | it does | 15:01 |
sean-k-mooney | it should not | 15:01 |
gibi | yeah :d | 15:01 |
gibi | :D | 15:01 |
sean-k-mooney | parsing the whitelist does not look at the devices | 15:02 |
sean-k-mooney | it just reads the json and constucts the filter object | 15:02 |
sean-k-mooney | it should not make a single call to sysfs | 15:02 |
sean-k-mooney | https://github.com/openstack/nova/blob/0e0196d979cf1b8e63b9656358116a36f1f09ede/nova/pci/whitelist.py#L56-L83 | 15:02 |
gibi | https://github.com/openstack/nova/blob/0c31561792e0e13a9f8267e71fa484ab79957f04/nova/pci/devspec.py#L287 | 15:03 |
gibi | this is part of the Whitelist.__init__ codepath | 15:03 |
gibi | via PciDeviceSpec | 15:03 |
sean-k-mooney | it really should not me | 15:04 |
sean-k-mooney | https://github.com/openstack/nova/blob/0e0196d979cf1b8e63b9656358116a36f1f09ede/nova/pci/whitelist.py#L80 | 15:04 |
sean-k-mooney | is where you say the dep happens | 15:04 |
sean-k-mooney | that shoudl not be happening there | 15:05 |
sean-k-mooney | we shoudl only be asseting the valudes in the config are valid here | 15:05 |
gibi | yeah L80 eventually calls _init_dev_details in the PciDeviceSpec | 15:06 |
sean-k-mooney | i guess we just added the validation for remote managed here | 15:06 |
gibi | yes that is remote managed part now that calls out to sysfs | 15:06 |
sean-k-mooney | gibi: ok well the intent is we shoudl not hit sysfs in this code path | 15:06 |
sean-k-mooney | i had suggested moving the valdiation logic out of there and into the virt driver for remote-managed | 15:07 |
sean-k-mooney | perhaps we shoudl do that | 15:07 |
sean-k-mooney | everthing here https://github.com/openstack/nova/blob/0c31561792e0e13a9f8267e71fa484ab79957f04/nova/pci/devspec.py#L315-L351 shoudl really be in teh virt driver | 15:08 |
gibi | I think there is two places now that potentiall hit the remote managed validation path. 1) the init_host in the compute manager to early exit the compute service 2) the pci device tracker that cannot do anything with an invalid config as it is called from a periodic | 15:08 |
gibi | so we potentially need a 3rd place that is in the virt driver and part of the init sequence | 15:09 |
sean-k-mooney | init_host calls the driver | 15:09 |
sean-k-mooney | that si where this validation shoudl be done | 15:09 |
sean-k-mooney | in here https://github.com/openstack/nova/blob/0e0196d979cf1b8e63b9656358116a36f1f09ede/nova/virt/libvirt/driver.py#L707 | 15:10 |
gibi | yeah the confusion is that the validation is part of the whitelist config parsing and that config is compute level not libvirt level :/ | 15:11 |
sean-k-mooney | gibi: its not | 15:11 |
gibi | so it make sense to validate whitelist at the config level | 15:11 |
gibi | * compute level | 15:11 |
sean-k-mooney | well it is | 15:11 |
sean-k-mooney | validatign the whitelist is just ment to validate teh syntax | 15:11 |
sean-k-mooney | with some minor validation of the value in that product ids shoudl be in a certin range of allowed values | 15:12 |
sean-k-mooney | then avlidation of the semantics is ment to bve done in teh virt driver | 15:12 |
sean-k-mooney | i raised this in the reivew but i guess i did not see that code path. i did not think that was going to call those fucntion so that is a regression | 15:14 |
dmitriis | gibi: there's also the code path which uses interface names that calls out to sysfs | 15:15 |
sean-k-mooney | yes that is the qos code | 15:16 |
gibi | dmitriis: yes, there is a bunch of other coupling point. It is not just your patches introduced that | 15:16 |
gibi | sean-k-mooney: the qos code is going through the virt driver | 15:16 |
gibi | pci_info | 15:16 |
gibi | but there are other codes that goes to sysfs without the driver | 15:16 |
sean-k-mooney | gibi: yes | 15:16 |
sean-k-mooney | gibi: hum https://github.com/openstack/nova/blob/0e0196d979cf1b8e63b9656358116a36f1f09ede/nova/virt/hyperv/hostops.py#L194-L217 so hyperv apparently does support pci passhtough | 15:19 |
sean-k-mooney | but maybe not sriov | 15:19 |
sean-k-mooney | at least not with qos | 15:19 |
sean-k-mooney | well neutron sriov ports i mean | 15:19 |
sean-k-mooney | but we cant depend on sysfs in the whitelist parseing of it woudl break hyperv potentially | 15:20 |
sean-k-mooney | right now its guarded by the remote managed tag | 15:20 |
sean-k-mooney | so it wont actully break anything unless you set that in the whitelist | 15:20 |
sean-k-mooney | but this is much more fragile then i would like | 15:21 |
gibi | hyperv does not use the pci device tracker I assume | 15:21 |
sean-k-mooney | i think it does | 15:21 |
sean-k-mooney | but the current code pats only run if you set remote_manged=true | 15:22 |
sean-k-mooney | and we catch error for the interface lookup in the qos path | 15:22 |
sean-k-mooney | which you also said is called form teh virt driver so it would not trigger an error in hyperv | 15:22 |
gibi | they do a very different and limited device matching https://github.com/openstack/nova/blob/0e0196d979cf1b8e63b9656358116a36f1f09ede/nova/virt/hyperv/vmops.py#L438-L444 | 15:22 |
gibi | just vendor and product id | 15:23 |
sean-k-mooney | yes | 15:23 |
gibi | so they are not relying on the whitelist pcidevicespec matching logic | 15:23 |
sean-k-mooney | hyperv does not allow them to access the pci address | 15:23 |
sean-k-mooney | the have an indrict handel via a uuid if i rememebr correctly | 15:23 |
sean-k-mooney | so they cant use the devname or adress fields | 15:24 |
dmitriis | sean-k-mooney: yeah, with the current code, runtime checks would only be triggered if the tag is set to true | 15:24 |
sean-k-mooney | just vendor id and prduct id | 15:24 |
sean-k-mooney | gibi: dmitriis so right now the pci device are first loaded form teh hypervior via https://github.com/openstack/nova/blob/0e0196d979cf1b8e63b9656358116a36f1f09ede/nova/compute/manager.py#L1554-L1560 | 15:26 |
sean-k-mooney | the compute mangers pre start hook calls update_available_resource | 15:27 |
sean-k-mooney | which calls into driver which gets the avaible resouces | 15:27 |
sean-k-mooney | that is the entrypoint that should raise the exction that will kill the comptue service if the confguration is invlaide | 15:27 |
sean-k-mooney | https://github.com/openstack/nova/blob/0e0196d979cf1b8e63b9656358116a36f1f09ede/nova/compute/resource_tracker.py#L879 | 15:28 |
sean-k-mooney | is where the resouce tracker called the virt driver | 15:28 |
opendevreview | Jonathan Race proposed openstack/nova master: object/notification for Adds Pick guest CPU architecture based on host arch in libvirt driver support https://review.opendev.org/c/openstack/nova/+/828369 | 15:28 |
opendevreview | Jonathan Race proposed openstack/nova master: driver/secheduler/docs for Adds Pick guest CPU architecture based on host arch in libvirt driver support https://review.opendev.org/c/openstack/nova/+/822053 | 15:28 |
opendevreview | Jonathan Race proposed openstack/nova master: zuul-job for Adds Pick guest CPU architecture based on host arch in libvirt driver support https://review.opendev.org/c/openstack/nova/+/828372 | 15:28 |
sean-k-mooney | which gets the pci passthough info https://github.com/openstack/nova/blob/0e0196d979cf1b8e63b9656358116a36f1f09ede/nova/virt/libvirt/driver.py#L8984 | 15:29 |
gibi | sean-k-mooney: so the comment in https://github.com/openstack/nova/blob/0e0196d979cf1b8e63b9656358116a36f1f09ede/nova/compute/manager.py#L1388-L1396 is not valid in the sense that the pci device tracker could kill the compute service not just the periodic task | 15:29 |
sean-k-mooney | that is used to init the pci tracker here https://github.com/openstack/nova/blob/0e0196d979cf1b8e63b9656358116a36f1f09ede/nova/compute/resource_tracker.py#L761 | 15:30 |
sean-k-mooney | gibi: yes it should be able to kill both | 15:30 |
* sean-k-mooney clicks | 15:30 | |
sean-k-mooney | gibi: so yes loadign the whitelisth ehre kills things early if the syntax is wrong but the pci tracker can kill the agent and or the perodic later if it whishes by raising an excption | 15:31 |
gibi | https://review.opendev.org/c/openstack/nova/+/342301/2/nova/compute/manager.py this added the early parsing | 15:31 |
gibi | and at that time the pre_start_hook already called update_available_resource so then I don't see how that original bug happened | 15:32 |
sean-k-mooney | i dont think i reviewd that | 15:33 |
sean-k-mooney | so im not sure either | 15:33 |
gibi | it is ooold | 15:33 |
gibi | anyhow | 15:34 |
gibi | in summary | 15:34 |
gibi | we have plenty of coupling | 15:34 |
gibi | I will try to document that in the bug | 15:34 |
gibi | I opened | 15:34 |
gibi | but I'm not sure any more that I want to refactor everything before I fix the PF MAC update problem | 15:34 |
sean-k-mooney | actully i think this was invalid | 15:35 |
sean-k-mooney | https://bugs.launchpad.net/nova/+bug/1603034 | 15:35 |
sean-k-mooney | pci_passthrough_whitelist = [ {"devname": "hed1", "physical_network": "physnet1"},{"physical_network": "physnet1", "address": "*:04:00.0"},{"physical_network": "physnet2", "address": "*:04:00.1"}] | 15:35 |
sean-k-mooney | they were expecting an error because hed1 was not present | 15:35 |
sean-k-mooney | but the whitelist parsing shoudl not check that | 15:35 |
sean-k-mooney | so the change they have should not affect that behvioar | 15:35 |
gibi | still the whitelist.__init__ caught that based on the attached stack trace | 15:36 |
gibi | but did not kill the compute | 15:36 |
sean-k-mooney | it looks like it was found because of _init_dev_details | 15:36 |
gibi | yepp | 15:36 |
sean-k-mooney | so at that poitn it was doing the sysfs lookup | 15:37 |
gibi | yes | 15:37 |
sean-k-mooney | ya so that was actully already a bug | 15:37 |
sean-k-mooney | since it shoudl not call sysfs as we have been disucssing | 15:37 |
sean-k-mooney | gibi: ok so you are writign up a bug for this | 15:38 |
sean-k-mooney | and are going to try an capture this old context | 15:38 |
gibi | yes | 15:39 |
gibi | I try to write up what couplings I found | 15:39 |
gibi | the Whitelist.__init__ is one example | 15:39 |
gibi | I did the write up https://bugs.launchpad.net/nova/+bug/1961587/comments/2 | 16:08 |
opendevreview | Alexey Stupnikov proposed openstack/nova master: Run clean up calls when queued live migration is aborted https://review.opendev.org/c/openstack/nova/+/828570 | 16:11 |
*** amoralek is now known as amoralej|off | 16:43 | |
kashyap | Finally ... Linux kernel wants to consolidate /dev/random and /dev/urandom. Removes much confusion - https://lwn.net/SubscriberLink/884875/0c09a5929fef859c/ | 16:48 |
kashyap | "it would seem there are no huge barriers to removing the final distinction between /dev/random and /dev/urandom—other than the names, of course" | 16:48 |
opendevreview | Ghanshyam proposed openstack/nova master: Modify remaining APIs as per RBAC new guidelines https://review.opendev.org/c/openstack/nova/+/828994 | 16:53 |
opendevreview | Ghanshyam proposed openstack/nova master: Separate flavor extra specs policy for server APIs https://review.opendev.org/c/openstack/nova/+/829626 | 16:54 |
opendevreview | Ghanshyam proposed openstack/nova master: Complete phase-1 of RBAC community-wide goal https://review.opendev.org/c/openstack/nova/+/829866 | 16:54 |
dansmith | gibi: where did the compute node stuff go in this? https://review.opendev.org/c/openstack/nova/+/828369/20 | 16:56 |
gibi | dansmith: on a call, I will ping you soon | 16:56 |
dansmith | gibi: I was going to circle back and see how that was shaping up, but it looks like a lot was dropped in the latest rev so something much have changed | 16:56 |
dansmith | sure | 16:56 |
chateaulav | dansmith: see patchset 19 | 16:59 |
dansmith | chateaulav: what about it? no comments on there about a change in direction that I see | 17:00 |
dansmith | chateaulav: or do you mean ps20 was an accident? | 17:00 |
chateaulav | we discussed this morning about dropping initial riscv support in this cycle. I can make note in the review of the discussion from irc. | 17:01 |
dansmith | ah okay | 17:01 |
chateaulav | dansmith: i wanted to ensure code had no issues with the support removed | 17:02 |
dansmith | that's why I was asking here, figured there was some discussion not captured there | 17:02 |
chateaulav | totally | 17:02 |
chateaulav | dansmith: let me know if you have any questions. and also see the DNM from gibi for a more complete capture of the issue and related bug that was submitted. https://review.opendev.org/c/openstack/nova/+/829989 | 17:18 |
dansmith | chateaulav: I just commented on that | 17:18 |
* gibi is back | 17:25 | |
gibi | dansmith: chateaulav is correct we do not try to extend the Arch enum any more so the ComputeNode OVO backporting issue can be ignored now | 17:26 |
gibi | at least for the current release | 17:26 |
dansmith | yep, I'm caught up having seen your DNM | 17:26 |
gibi | dansmith: thanks for the comment on https://review.opendev.org/c/openstack/nova/+/829989 | 17:27 |
gibi | dansmith: the separate ListObject thing might work | 17:27 |
dansmith | worth a try I think | 17:27 |
dansmith | and in that case, the list object becomes more of a "this is what specs the remote side supports" which can have its own legit translation logic in the backport | 17:28 |
gibi | dansmith: so in the next release if we want riscv support then we can try that | 17:28 |
dansmith | ++ | 17:28 |
gibi | if it does not work then we can still make a minimum service version check on the new compute before publishing support for new archs | 17:28 |
dansmith | there could be a situation in the future where we supported arm and then it split into arml and armf, such that we need to translate those two into one or something | 17:28 |
dansmith | yep, for sure | 17:28 |
gibi | dansmith: ohh that arm thingy is a nice example where we need specific translation in the supported list | 17:29 |
dansmith | yeah | 17:29 |
gibi | OK I think this is settled | 17:29 |
gibi | thanks | 17:29 |
dansmith | cool | 17:30 |
gibi | chateaulav: I checked the first patch of the series it looks good to me. I will continue with the rest tomorrow | 17:33 |
chateaulav | awesome thanks gibi | 17:58 |
melwitt | gibi: feel free to mark the nova bug I reported as a duplicate | 17:59 |
melwitt | I didn't realize it was already out there | 17:59 |
*** carloss is now known as carloss|afk | 18:29 | |
*** carloss|afk is now known as carloss | 19:29 | |
opendevreview | Merged openstack/nova stable/xena: Clean up allocations left by evacuation when deleting service https://review.opendev.org/c/openstack/nova/+/816954 | 20:51 |
opendevreview | Merged openstack/nova master: trivial: Add a white space in an error message https://review.opendev.org/c/openstack/nova/+/823223 | 20:51 |
opendevreview | yuval proposed openstack/nova master: Lightbits LightOS driver https://review.opendev.org/c/openstack/nova/+/821606 | 20:53 |
opendevreview | Merged openstack/nova master: nova-next: Drop NOVA_USE_SERVICE_TOKEN from subnode https://review.opendev.org/c/openstack/nova/+/816740 | 20:57 |
opendevreview | Merged openstack/nova master: Vmware: Fix spelling in test https://review.opendev.org/c/openstack/nova/+/806348 | 20:58 |
opendevreview | Merged openstack/nova master: Correct test error https://review.opendev.org/c/openstack/nova/+/773634 | 20:58 |
opendevreview | Julia Kreger proposed openstack/nova master: Ironic - Handle instance/node host on rebalance https://review.opendev.org/c/openstack/nova/+/813897 | 22:32 |
opendevreview | Julia Kreger proposed openstack/nova master: Ironic - Don't query the API for instance counts https://review.opendev.org/c/openstack/nova/+/829613 | 22:33 |
opendevreview | Merged openstack/nova master: Raise InstanceNotFound on fkey constraint fail saving info cache https://review.opendev.org/c/openstack/nova/+/826942 | 23:11 |
opendevreview | Merged openstack/nova master: VmWare: Use of id shadows built-in function https://review.opendev.org/c/openstack/nova/+/806390 | 23:11 |
opendevreview | Merged openstack/nova stable/xena: Prevent leaked eventlets to send notifications https://review.opendev.org/c/openstack/nova/+/816487 | 23:11 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!