Monday, 2022-02-21

opendevreviewMiguel Lavalle proposed openstack/os-vif master: Fix race with DPDK and vhostuserclient mode  https://review.opendev.org/c/openstack/os-vif/+/83010300:36
*** prometheanfire is now known as Guest204:59
*** Guest2 is now known as prometheanfire05:09
*** amoralej|off is now known as amoralej07:01
bauzashola folks09:41
* bauzas had to resurrect his ZNC bouncer09:41
gibibauzas: o/09:41
bauzasUggla: looks like your karma helped my server to be back alive :)09:41
bauzasjust by a gchat ping, wow09:41
gibigmann: thanks I've started reading it this morning09:41
bauzasgmann: will review it 09:42
Ugglabauzas, first time it happens to me! :) Most of the time I find bugs. ;)09:43
bauzasit was fun09:43
bauzasI was fighting for 2 hours with a rasppi not wanting to resurrect and like 1 min after your ping and like 10 reboots, it eventually rebooted09:44
bauzasmaybe a fsck09:44
bauzasjust take one day off and be prepared about a bad day :p09:45
UgglaI have a question regarding: https://specs.openstack.org/openstack/nova-specs/specs/yoga/approved/libvirt-virtiofs-attach-manila-shares.html09:47
UgglaRegarding the model we have share_mapping table and instance_uuid as a field. If I'm not wrong we have a one to one relationship between share and instance, correct ?09:50
UgglaBut a share can be attached to several instances. So the model may need to be changed. Am I wrong ?09:53
bauzashmmmm09:56
* bauzas looks09:56
UgglaI would say we need a share_uuid field on the instance table to have a 1 to n relation ?09:57
UgglaI could update the spec, but I would like to know if I have understand it well.09:58
bauzasUggla: oh, I just read again the spec10:01
bauzasUggla: nothing prevents you to have a share_mapping table with multiple share_uuid records having the same UUID10:02
Ugglaso we will have several records for the same share with a diff uuid.10:03
Ugglaso we will have several records for the same share with a diff instance_uuid.10:03
bauzasI guess lyarwood wanted to tell we would do the same like for BlockDeviceMapping table10:03
bauzaswhere we have a UC be the mapping UUID itself10:04
bauzasnot the volume UUID10:04
bauzasUggla: https://github.com/openstack/nova/blob/master/nova/db/main/models.py#L59010:04
bauzaseither way, the PK on the bdmmapping table is a single 'id' key10:05
kashyapSome more documentation on it - https://docs.openstack.org/nova/latest/user/block-device-mapping.html10:05
bauzasthanks kashyap10:06
gibigmann: thanks I've started reading the new PSs this morning 10:08
UgglaOk thanks, so in that case, maybe the share_mapping table requires an index on instance_uuid ? 10:12
bauzasUggla: this depends whether you want to query from this field10:13
bauzasUggla: but since you'll tag a FK on instances.uuid, it will automatically generate an index10:14
bauzas:)10:14
Ugglaok10:14
opendevreviewPavlo Shchelokovskyy proposed openstack/nova master: Sort PCI devices by their address  https://review.opendev.org/c/openstack/nova/+/83013610:20
UgglaAnother question, looking at block device attachment, the api speaks directly to the compute api. Are there some specific rules that defines to speak through the conductor, or directly to the compute. Do we speak to the compute directly when we don't need to interact with the db at the first stage ?10:24
lyarwood\o hey so yeah I thought I included a unique UUID per mapping between the instance and share?10:24
lyarwoodUggla: the compute API isn't running on the compute, that's up in nova-api, the manager is the part running on the compute10:26
bauzaslyarwood: I explained it to Uggla10:27
bauzashttps://specs.openstack.org/openstack/nova-specs/specs/yoga/approved/libvirt-virtiofs-attach-manila-shares.html#data-model-impact10:28
bauzasUggla: ^10:28
lyarwoodcool thanks and sorry I don't have this IRC client open anymore, only just saw the email :)10:28
Ugglalyarwood, agreed my question is more the path api --> compute vs api --> cond --> compute ?10:28
lyarwoodUggla: we only use the latter for long running tasks iirc like rebuilds, resizes etc10:29
lyarwoodUggla: device attachments that are simple async operations tend to go directly between  the api and compute10:29
bauzas++10:33
bauzasalso if we already know the compute10:33
Ugglalyarwood, so the manila one drops down to the simple async op case, correct ?10:33
lyarwoodYeah10:34
lyarwoodand tbh it should be a noop at the moment as the instance is shutoff anyway10:34
lyarwoodstarting the instance will recreate it within the libvirt driver and the logic to attach the mount needs to live there initially10:35
lyarwooduntil we support hot plug 10:35
Ugglalyarwood, ok thx.10:36
Ugglathank you for the answers, I think I can continue.10:37
opendevreviewyuval proposed openstack/nova master: Lightbits LightOS driver  https://review.opendev.org/c/openstack/nova/+/82160611:00
opendevreviewyuval proposed openstack/nova master: Lightbits LightOS driver  https://review.opendev.org/c/openstack/nova/+/82160611:03
sean-k-mooneygibi: am well the pci module also is virt dirver indepenent and we explictily call sysfs in many cases to avoid depening on the virt driver11:13
sean-k-mooneygibi: so that is already an established patteren11:13
gibi:/11:14
gibiso basically nova-compute cannot run on windows with hyperv11:14
gibior it can but never handle PCI devices11:14
sean-k-mooneygibi: they dont support pci device management11:14
gibiyeah I figure that they cannot right now 11:14
sean-k-mooneyso ya on windows they would have ti impelent all the fucntions in a windows compatiable way11:14
gibiinterestingly whent he pf_interface_name was added for QoS that was added via the virt driver interface11:15
sean-k-mooneyin os-vif we added an indriection layer in teh few places we needed it11:15
sean-k-mooneygibi: yes which i changed since that was unreliable11:15
sean-k-mooneydue to libvirt design choices11:15
sean-k-mooneymainly the way the cache and how that interacts with udev11:16
sean-k-mooney gibi  so right now the neutron use of sysfs is still confied to the sriov path11:17
sean-k-mooneyso i dont think this usage breaks windows support since that was already not supported11:17
sean-k-mooneygibi: are there other usages that you found that would be common11:18
gibisorry I meant parent_ifname 11:18
gibithat is still done via the virt driver interface https://github.com/openstack/nova/blob/master/nova/virt/libvirt/host.py#L132111:18
sean-k-mooney ah there11:18
gibisean-k-mooney: the new smartnic feature introduced a set of new dependencies between nova's neturon code (called from the compute manager) and sysfs11:19
sean-k-mooneywell its calling into the pci module to do it11:19
sean-k-mooneyhttps://github.com/openstack/nova/blob/master/nova/virt/libvirt/host.py#L132211:19
sean-k-mooneyso that is not actully calling libvirt anymore11:19
gibiyes but the call coming from the libvirt driver11:19
gibiso the nova-compute does not depend on the sysfs in that case11:20
gibias the virtdirver hides it11:20
sean-k-mooneyyes this is genergating the virt independent view of the pci devices that is pass to the pci tracker11:20
sean-k-mooneyyes11:20
sean-k-mooneygibi: so at one point i think i proposed storing the mac and vf number in addtion to the serial in the pci dev extra info11:21
sean-k-mooneywe could do that and remove the need to do this lookup entirly11:21
sean-k-mooneyin the neutorn module11:21
gibiyeah my question is do we want two interfaces from compute manager towards the hypervisor host11:22
gibias today we have the virt interface and the pci_utils "interface"11:23
sean-k-mooneythats fair i know os-brick also looks at the host filesystem11:23
sean-k-mooneyi guess that is hidden?11:23
gibido we call os-brick outside of the virt driver?11:24
sean-k-mooneyis os-brick only used via the virt dirver i assume so11:24
sean-k-mooneyya i dont think we call it in the generic code11:24
sean-k-mooneyi was just trying to think what other indrections we have11:24
sean-k-mooneyi generally did not consider the nuetorn module to be part fo the compute manager11:24
sean-k-mooneybut it is call by it11:24
sean-k-mooneyi was thinking of it like the pci module11:25
gibiI grepped now, os_brick only imported from under nova.virt (and in nova-manage)_11:25
sean-k-mooneyack11:25
sean-k-mooneygibi: so i think we could file this as a bug and adress it without much extra work11:25
gibiOK. I will file a bug. 11:25
gibiand probably I have too do the fixing as well as I need this to make the PF MAC address reporting to neutron11:26
gibias the current patch for that is also calling pci_utils from nova's neutron code instead of relying on the pci_info11:27
gibifrom the virt driver11:27
sean-k-mooneyya it was an exsiting pattern but i agree the tight coupleing shoudl not exist between linux and the neutron module11:27
gibiI think the pattern was established ~ 2015 when the first fix for the PF MAC address problem was added :D11:28
gibiso everything is connected :D11:29
sean-k-mooneythere is one think to consider however11:29
sean-k-mooneythe pci tracker tracks pools of similar devices11:29
sean-k-mooneyso for PF mac and PF name that logicaly maps ok to pool fo the PF's VFs11:30
sean-k-mooneythe vf number will it can be tracked per device its not an atribute of the pool11:30
sean-k-mooneyso some of this info coudl be store in the db and some will have to be provided by the virt dirver per device11:31
sean-k-mooneyit can do that just by calling the existing fucntion when it creates the device object11:31
sean-k-mooneybut just pointing out that we might not be abel to hide this in the common pci module code11:32
sean-k-mooneywe might need to do this in the virt driver part11:32
sean-k-mooneyin any case it should be relitivly trivial to extend https://github.com/openstack/nova/blob/master/nova/objects/pci_device.py#L12111:33
gibisean-k-mooney: currently the parent_ifname is part of the extra_info 11:33
sean-k-mooneyin fact we could store the info in extra_info for backwards compatiablity without changing the object 11:33
sean-k-mooneyya11:33
gibiyeah11:33
gibilets try that11:34
sean-k-mooneyso we can add the other field there and keep object compat and perhaps add properties for access11:34
gibiyepp11:34
gibiand that way I can still use that in my PF MAC address bugfix and keep it backportable11:34
sean-k-mooneyyes11:34
gibicool, thanks for the brainstroming11:35
* sean-k-mooney note to self always include a dict of random strings filed for "backporting"11:35
gibi:D11:35
gibiyeah I feel the mental pain due to that json blobs in the db11:36
gibibut the bugfix backporting pain is bigger11:36
gibiso that won11:36
sean-k-mooneyhonestly the only thing i woudl replace it with is a key value mapping table11:40
sean-k-mooneylike instance extra11:40
sean-k-mooney* instance_system_metadata11:40
gibiif the OVO model is just a Dict then changing the SQL model does not help architecturally11:40
sean-k-mooneybut that would not buy use much since we dont really query on the sub fileds11:41
sean-k-mooneyya it doesnt really11:41
sean-k-mooneyif you work on this ping me and ill happily review i dont know if dmitriis woudl have time to work on it as a followup/techdebt cleanup11:42
gibisean-k-mooney: thanks. I will work on it as I need it for the PF MAC bugfix anyhow11:43
sean-k-mooney:) ping me if you want me to review that too. speaking of which i should go look at the rest of your placement series11:44
sean-k-mooneygibi: we defintly want to backport the PF fix if it ends up being posible by the way11:45
gibisean-k-mooney: I think it will be possible11:45
sean-k-mooneyi kind of feel like the backporatble fix and final fix would be different however11:45
gibisean-k-mooney: yeah, if you are done with the placement series then I will switch to that to fix the small thing in a followup11:45
gibisean-k-mooney: let's see about the backportability of the PF MAC 11:46
sean-k-mooneyas in final fix would invovle multipel port bdingin but backporable would just unbind the port for the vm update the mac and reattach it or something like that11:46
gibisean-k-mooney: nope, the plan is to add the PF MAC to the binding:profile and let neutron do the overwrite of port.mac_addres based on that11:46
sean-k-mooneyoh right yes11:47
sean-k-mooneyyou said that last week11:47
gibithe neutron side is basically ready in https://review.opendev.org/c/openstack/neutron/+/82924711:47
gibi(I need to fix comments there)11:47
sean-k-mooneyare neutron ok to backport that?11:47
sean-k-mooneyor will we have to workaround that in the nova backport11:47
gibiI think they are OK to backport that11:48
gibino issue was raise so far on that11:48
sean-k-mooneyack then that makes our life simpler11:48
gibithe nova side needs a redo as per the pci_utils / pci_info refactor https://review.opendev.org/c/openstack/nova/+/82924811:48
sean-k-mooneyi was assuming they would not backport that change11:48
gibisean-k-mooney: at least they haven't indicated that it is not backportable11:48
gibiI will double check with them11:49
sean-k-mooneyack it looks pretty small and i dont think it would break existing usage11:49
sean-k-mooneyso it shoudl be safe to backport11:49
gibiyepp that is my view too11:49
sean-k-mooneyits kind of a feature however but i would be look very favorably towards framing it as a bugfix as you have done in the release note11:50
sean-k-mooneyill finish reviewing that since i have it open now11:50
gibiyeah it is a bug in nova, that needs some featury thing from neutron :)11:51
sean-k-mooneygibi: there is no way to detect this form the api right11:51
sean-k-mooneye.g. that this is supproted11:51
gibiit cannot be detected11:52
sean-k-mooneyjust thining that on the nova side we could eventually move to only setting the PF port via this eventully11:52
gibibut it is safe from upgrade11:52
gibiyeah I left a note in the nove code to only remove the current PF MAC settings logic after couple of releases11:52
sean-k-mooneyperfect11:52
gibisean-k-mooney: btw the arch OVO backport is also crazyness https://review.opendev.org/c/openstack/nova/+/829989 but I have a totally differnt proposal as a possible direction as a comment within that patch12:00
sean-k-mooneywhat exactly is that doing?12:00
gibiso the Arch enum got new values12:01
gibidue to the emulation support feature12:01
sean-k-mooneyright which should be added at the end12:01
gibiand the ComputeNode object has a list of HVSpec objects with arch values12:01
sean-k-mooneyand then you would just raise an error if you try to backport12:01
gibiHVSpec cannot backport itself12:02
gibithe ComputeNode would need to backport it12:02
gibiby removing too new HVSpec instances (with new arch)12:02
gibibut OVO does not cooperate12:02
gibiI even filled https://bugs.launchpad.net/oslo.versionedobjects/+bug/196148212:02
gibiso the current patch up above works but ugly as hell12:02
sean-k-mooneyam well the compute node would not have the new values if it was not updated12:02
sean-k-mooneyand an instance object with the new values shoudl not be sendable to the compute node12:03
gibithere is code path in live migration where an old compute loads the new compute from the DB12:03
sean-k-mooneyoh12:03
sean-k-mooneythats a  nasty edgecase12:04
gibigrenade caught it12:04
* gibi pats grenade's head12:04
sean-k-mooneyright ok so the list object would have to strip the new version instead of raise12:04
gibiyes12:04
gibibut it is really hard to do that with OVO12:04
gibi(see the patch with a ton of comments)12:05
sean-k-mooneyi dont think we have had to do that in the past12:05
sean-k-mooneyat least i cant think of a case where this has come up12:05
gibimy alternative proposal is not to publish HVSpec objects with new arch values until a min service version check is passed12:05
sean-k-mooneymost of the enums we have extended in the past have been only in teh instance object12:05
sean-k-mooneyok ya that would work12:06
sean-k-mooneywe often dont enable feature until the cloud is fully upgraded12:06
sean-k-mooneythis woudl cause all the compute nodes to update the comptue node table once they are all upgraded12:06
gibiI pulled dansmith into the discussion maybe he sees an easy fix in the OVO support for this case but if not then I will suggest chateaulav to add a service version check instead12:06
sean-k-mooneywell maybe not12:07
sean-k-mooneywe are extendign the values but that does not mean the compute node would use one of the new values12:07
gibitoday it uses in the gate12:07
gibiI haven't checked it what enables all the new arch12:07
sean-k-mooneyits likely just the presence fo the qemu binary12:08
sean-k-mooneyon ubuntu it installs all of them by default12:08
sean-k-mooneyhow is this reported in the compute node db recored by the way12:08
sean-k-mooneyah supported_hv_specs12:09
gibiyes, that one12:09
sean-k-mooneyi guess we dont wnat to defien that as the host native arch12:10
sean-k-mooneysince that would not really help if we had native hosts using one of the new options12:10
gibiit seems the new arch coming from the host cababilities via libvirt12:10
sean-k-mooneyyes12:11
sean-k-mooneythose are dynmic based on which qemu binaries libvirt finds12:11
gibiahh I see 12:11
gibithe you are right about the binaries12:11
sean-k-mooneytechnically its dynmaic based on which libvirt virt plugin too e.g. if you install the lxc or openvz plugins they also extend it but in this case its the presence of the riscv emulators12:12
sean-k-mooneyignoring this for a miniute how close do you think the emulation work is for this cycle12:13
sean-k-mooneyi think the previous impression i heard was likely not this cycle12:13
sean-k-mooneyi am debating if defering the ricsv support woudl help or not12:13
sean-k-mooneythat woudl allow use to punt this enum change to next cycle12:14
sean-k-mooneyso basically im suggesting that perhaps its better for chateaulav to focus on the arm and mips supprot this cycle and we can adress teh riscv supprot next cycel btu if we dont think the code is likely to land before thruday which i gusse is unlikely it proably is not worth spliting it out12:16
gibisean-k-mooney: good point. I think the series is in an OK shape overall12:25
gibiso if we could remove the OVO backport pain by not adding risc then we can land it this week I believe12:25
gibibauzas, melwitt: there is a gate bug on nova that is a duplicate of a gate bug in neutron https://bugs.launchpad.net/nova/+bug/1959349 https://bugs.launchpad.net/neutron/+bug/1940425 but if I mark the newer bug as duplicate then we loose the tracking of it12:26
gibibauzas, melwitt: is it OK if I mark it duplicate and then add nova as affected to the eariler neutron bug?12:26
sean-k-mooneygibi: riscv was not orgianlly on there list i suggeted it be added since i tought that would be one of the more useful targets12:27
gibisean-k-mooney: OK, then I think it make sense to split it out12:27
sean-k-mooneygibi: so defering that i dont think woudl compropise there usecase12:27
gibiOK12:27
gibichateaulav: ^^ I know that it is again a change in direction but could you remove the riscv support from the series. that way we can avoid the ovo backport pain 12:28
sean-k-mooneychateaulav: ^ at the end fo the day the descion is yours12:28
gibisean-k-mooney: +112:28
chateaulavLol. Yeah I think that will be better overall gibi, I agree with sean-k-mooney that I can look at doing that for next cycle. It seems as though there are also some things in the work for the riscv firmware community that may be needed. Overall initial support will be arm, s390x, ppc, and potentially mips if I can solve the pci issue. So not bad, if you ask me, for initial release of emulation12:35
gibichateaulav: cool12:36
gibisean-k-mooney: filed the bug for the pci.utils networking.neutron coupling issue https://bugs.launchpad.net/nova/+bug/196158712:36
sean-k-mooneyack just triaged it 12:59
*** amoralej is now known as amoralej|lunch13:09
gibithanks13:43
*** amoralej|lunch is now known as amoralek13:59
opendevreviewAlexey Stupnikov proposed openstack/nova master: Add functional tests to reproduce bug #1960412  https://review.opendev.org/c/openstack/nova/+/83001014:22
opendevreviewAlexey Stupnikov proposed openstack/nova master: Add functional tests to reproduce bug #1960412  https://review.opendev.org/c/openstack/nova/+/83001014:25
sean-k-mooneygibi: have i mentioned how much i hate the fact that gerrit forward ports comments now14:38
sean-k-mooneyits such a ux regressions14:38
gibiit is only a problem iff we forget to close the comments when they are solved14:38
sean-k-mooneyliek these ones https://review.opendev.org/c/openstack/nova/+/821606/10/nova/virt/libvirt/volume/lightos.py14:39
sean-k-mooneygibi: also this has been retoactivly enabeld for all reviews in the past14:39
gibiyeah, that retroactive thing is a pain I agree14:39
sean-k-mooneyso if we lookup a revew becasue we are lookign at git blame we will now get all the unresoved ones where the feature did not exist14:40
sean-k-mooneyim going to ack most of the commend on that once i check them but to me this breaks the workflow since if the converation is happenign betwen reviewrs rather then just the revier and authour14:41
gibiyeah14:44
gibinot optimal but in the other hand it helps when people only fix a set of comments and forget the rest14:44
gibiand push a new PS14:44
sean-k-mooneyyes and know. it can but over all i think this is a more harmful change then good14:45
gibiIt forces me to close comment when they fixed or answered 14:45
gibiI think in time we can learn to use it well14:45
sean-k-mooneyright but i dont think you should have to do that14:45
sean-k-mooneyi am hoping we will get the option to disable it personlly14:46
sean-k-mooneyits the frist time that gerrit has made a change that has made me consider if we shoudl continue to use gerrit longterm14:46
sean-k-mooneyto me the fact that comments were pined to reveriosn was one of the big benifits for using gerrit14:47
sean-k-mooneythis regresses that14:47
sean-k-mooneyi can see the beniftis too but to me in its current form the beiftis do not outway the downside14:48
gibihm, maybe a toggle in the UX to show / hide the old PS comments can be a compromise14:48
opendevreviewyuval proposed openstack/nova master: Lightbits LightOS driver  https://review.opendev.org/c/openstack/nova/+/82160614:48
sean-k-mooneyif it only forward ported the comment that refered to lines that were modified in the next revision i could proably live with that14:48
yuval_gibi sean-k-mooney thanks for the fast reply!14:49
sean-k-mooneylike its infurating that i have to load back up ps 8 to see why they refer too14:49
gibiyuval_: will check the last version before end of my day14:50
yuval_thanks, I just updated as sean-k-mooney suggested the req file and lower constraints14:52
yuval_to see zuul pass14:52
yuval_I am now re-adding the lightbits ci to work on nova repo14:52
sean-k-mooneyyuval_: https://review.opendev.org/c/openstack/nova/+/821606/11/nova/virt/libvirt/volume/lightos.py#31 can you respond to this too14:53
yuval_yes, no problem14:53
dmitriissean-k-mooney, gibi: w.r.t. the above ping, I can chime in to help with reviews14:53
gibidmitriis: thanks14:54
gibidmitriis, sean-k-mooney: w.r.t. the sysfs pci_utils network.neutron coupling, I started looking into the refactor and I have to say that the whole resource tracker is infected with teh sysfs coupling via the pci device tracker14:55
sean-k-mooneygibi: yes that is not surpriseing to be honest14:55
gibiso this sean-k-mooney but then we are back to that the compute manager depends on sysfs14:56
sean-k-mooneyeverything pci related is only supproted on linux today14:56
gibiwhich is an acceptable limitation but the external depenency (sysfs) is not abstracted out14:56
gibiand heavily tangled in14:56
sean-k-mooneyyes it is14:57
gibiso I think today it would be extremely hard to add support to PCI for a non Linux host14:57
sean-k-mooneyit could certenly be factored more but i guess the expecation is you woudl add windows suport to the pci module if that ever came to be a thing14:57
gibihm14:58
gibiyeah that would be an option too14:58
sean-k-mooneygibi: well you would have to just implenet the supprot in the pci.utils module i14:58
gibiI might have been overzealeus about this14:58
sean-k-mooneyits what we do in os-vif for what its worth https://github.com/openstack/os-vif/tree/master/os_vif/internal/ip14:58
sean-k-mooneywe have an ip module with a windows and linux impl14:59
gibiyeah maybe that is a better direction14:59
gibimaybe I just wish for a better defined pci.utils module14:59
gibiwhere the interface is clear14:59
sean-k-mooneyack15:00
sean-k-mooneywe kind of have grown it organicly over time15:00
gibiyes, it is very organic :)15:00
gibilike the init_host in the compute manager parses the whitelist config early but by that it already reading sysfs a lot :)15:00
sean-k-mooneyat some point i think we will want to unify mdev+pci+vdpa into a more uniform device tracker imple15:01
gibieven if the compute manager itself does not use the whitelsit config at all15:01
gibisean-k-mooney: yeah that is also an interesting direction15:01
sean-k-mooneygibi: parsing the whitelist does not hit sysfs15:01
gibiit does15:01
sean-k-mooneyit should not15:01
gibiyeah :d15:01
gibi:D15:01
sean-k-mooneyparsing the whitelist does not look at the devices15:02
sean-k-mooneyit just reads the json and constucts the filter object15:02
sean-k-mooneyit should not make a single call to sysfs15:02
sean-k-mooneyhttps://github.com/openstack/nova/blob/0e0196d979cf1b8e63b9656358116a36f1f09ede/nova/pci/whitelist.py#L56-L8315:02
gibihttps://github.com/openstack/nova/blob/0c31561792e0e13a9f8267e71fa484ab79957f04/nova/pci/devspec.py#L28715:03
gibithis is part of the Whitelist.__init__ codepath15:03
gibivia PciDeviceSpec15:03
sean-k-mooney it really should not me15:04
sean-k-mooneyhttps://github.com/openstack/nova/blob/0e0196d979cf1b8e63b9656358116a36f1f09ede/nova/pci/whitelist.py#L8015:04
sean-k-mooneyis where you say the dep happens15:04
sean-k-mooneythat shoudl not be happening there15:05
sean-k-mooneywe shoudl only be asseting the valudes in the config are valid here15:05
gibiyeah L80 eventually calls _init_dev_details in the PciDeviceSpec15:06
sean-k-mooneyi guess we just added the validation for remote managed here15:06
gibiyes that is remote managed part now that calls out to sysfs15:06
sean-k-mooneygibi: ok well the intent is we shoudl not hit sysfs in this code path15:06
sean-k-mooneyi had suggested moving the valdiation logic out of there and into the virt driver for remote-managed15:07
sean-k-mooneyperhaps we shoudl do that15:07
sean-k-mooneyeverthing here https://github.com/openstack/nova/blob/0c31561792e0e13a9f8267e71fa484ab79957f04/nova/pci/devspec.py#L315-L351 shoudl really be in teh virt driver15:08
gibiI think there is two places now that potentiall hit the remote managed validation path. 1) the init_host in the compute manager to early exit the compute service 2) the pci device tracker that cannot do anything with an invalid config as it is called from a periodic15:08
gibiso we potentially need a 3rd place that is in the virt driver and part of the init sequence15:09
sean-k-mooneyinit_host calls the driver15:09
sean-k-mooneythat si where this validation shoudl be done15:09
sean-k-mooneyin here https://github.com/openstack/nova/blob/0e0196d979cf1b8e63b9656358116a36f1f09ede/nova/virt/libvirt/driver.py#L70715:10
gibiyeah the confusion is that the validation is part of the whitelist config parsing and that config is compute level not libvirt level :/15:11
sean-k-mooneygibi: its not15:11
gibiso it make sense to validate whitelist at the config level15:11
gibi* compute level15:11
sean-k-mooneywell it is15:11
sean-k-mooneyvalidatign the whitelist is just ment to validate teh syntax15:11
sean-k-mooneywith some minor validation of the value in that product ids shoudl be in a certin range of allowed values15:12
sean-k-mooneythen avlidation of the semantics is ment to bve done in teh virt driver15:12
sean-k-mooneyi raised this in the reivew but i guess i did not see that code path. i did not think that was going to call those fucntion so that is a regression15:14
dmitriisgibi: there's also the code path which uses interface names that calls out to sysfs15:15
sean-k-mooneyyes that is the qos code15:16
gibidmitriis: yes, there is a bunch of other coupling point. It is not just your patches introduced that15:16
gibisean-k-mooney: the qos code is going through the virt driver15:16
gibipci_info15:16
gibibut there are other codes that goes to sysfs without the driver15:16
sean-k-mooneygibi: yes15:16
sean-k-mooneygibi: hum https://github.com/openstack/nova/blob/0e0196d979cf1b8e63b9656358116a36f1f09ede/nova/virt/hyperv/hostops.py#L194-L217 so hyperv apparently does support pci passhtough15:19
sean-k-mooneybut maybe not sriov15:19
sean-k-mooneyat least not with qos15:19
sean-k-mooneywell neutron sriov ports i mean15:19
sean-k-mooneybut we cant depend on sysfs in the whitelist parseing of it woudl break hyperv potentially15:20
sean-k-mooneyright now its guarded by the remote managed tag15:20
sean-k-mooneyso it wont actully break anything unless you set that in the whitelist15:20
sean-k-mooneybut this is much more fragile then i would like15:21
gibihyperv does not use the pci device tracker I assume15:21
sean-k-mooneyi think it does15:21
sean-k-mooneybut the current code pats only run if you set remote_manged=true15:22
sean-k-mooneyand we catch error for the interface lookup in the qos path15:22
sean-k-mooneywhich you also said is called form teh virt driver so it would not trigger an error in hyperv15:22
gibithey do a very different and limited device matching https://github.com/openstack/nova/blob/0e0196d979cf1b8e63b9656358116a36f1f09ede/nova/virt/hyperv/vmops.py#L438-L44415:22
gibijust vendor and product id15:23
sean-k-mooneyyes15:23
gibiso they are not relying on the whitelist pcidevicespec matching logic15:23
sean-k-mooneyhyperv does not allow them to access the pci address15:23
sean-k-mooneythe have an indrict handel via a uuid if i rememebr correctly15:23
sean-k-mooneyso they cant use the devname or adress fields15:24
dmitriissean-k-mooney: yeah, with the current code, runtime checks would only be triggered if the tag is set to true15:24
sean-k-mooneyjust vendor id and prduct id15:24
sean-k-mooneygibi: dmitriis  so right now the pci device are first loaded form teh hypervior via https://github.com/openstack/nova/blob/0e0196d979cf1b8e63b9656358116a36f1f09ede/nova/compute/manager.py#L1554-L156015:26
sean-k-mooneythe compute mangers pre start hook calls update_available_resource15:27
sean-k-mooneywhich calls into driver which gets the avaible resouces15:27
sean-k-mooneythat is the entrypoint that should raise the exction that will kill the comptue service if the confguration is invlaide15:27
sean-k-mooneyhttps://github.com/openstack/nova/blob/0e0196d979cf1b8e63b9656358116a36f1f09ede/nova/compute/resource_tracker.py#L87915:28
sean-k-mooneyis where the resouce tracker called the virt driver15:28
opendevreviewJonathan Race proposed openstack/nova master: object/notification for Adds Pick guest CPU architecture based on host arch in libvirt driver support  https://review.opendev.org/c/openstack/nova/+/82836915:28
opendevreviewJonathan Race proposed openstack/nova master: driver/secheduler/docs for Adds Pick guest CPU architecture based on host arch in libvirt driver support  https://review.opendev.org/c/openstack/nova/+/82205315:28
opendevreviewJonathan Race proposed openstack/nova master: zuul-job for Adds Pick guest CPU architecture based on host arch in libvirt driver support  https://review.opendev.org/c/openstack/nova/+/82837215:28
sean-k-mooneywhich gets the pci passthough info https://github.com/openstack/nova/blob/0e0196d979cf1b8e63b9656358116a36f1f09ede/nova/virt/libvirt/driver.py#L898415:29
gibisean-k-mooney: so the comment in https://github.com/openstack/nova/blob/0e0196d979cf1b8e63b9656358116a36f1f09ede/nova/compute/manager.py#L1388-L1396 is not valid in the sense that the pci device tracker could kill the compute service not just the periodic task15:29
sean-k-mooneythat is used to init the pci tracker here https://github.com/openstack/nova/blob/0e0196d979cf1b8e63b9656358116a36f1f09ede/nova/compute/resource_tracker.py#L76115:30
sean-k-mooneygibi: yes it should be able to kill both15:30
* sean-k-mooney clicks15:30
sean-k-mooneygibi: so yes loadign the whitelisth ehre kills things early if the syntax is wrong but the pci tracker can kill the agent and or the perodic later if it whishes by raising an excption15:31
gibihttps://review.opendev.org/c/openstack/nova/+/342301/2/nova/compute/manager.py this added the early parsing15:31
gibiand at that time the pre_start_hook already called update_available_resource so then I don't see how that original bug happened15:32
sean-k-mooney i dont think i reviewd that15:33
sean-k-mooneyso im not sure either15:33
gibiit is ooold15:33
gibianyhow15:34
gibiin summary 15:34
gibiwe have plenty of coupling 15:34
gibiI will try to document that in the bug15:34
gibiI opened15:34
gibibut I'm not sure any more that I want to refactor everything before I fix the PF MAC update problem15:34
sean-k-mooneyactully i think this was invalid15:35
sean-k-mooneyhttps://bugs.launchpad.net/nova/+bug/160303415:35
sean-k-mooneypci_passthrough_whitelist = [ {"devname": "hed1", "physical_network": "physnet1"},{"physical_network": "physnet1", "address": "*:04:00.0"},{"physical_network": "physnet2", "address": "*:04:00.1"}]15:35
sean-k-mooneythey were expecting an error because hed1 was not present15:35
sean-k-mooneybut the whitelist parsing shoudl not check that15:35
sean-k-mooneyso the change they have should not affect that behvioar15:35
gibistill the whitelist.__init__ caught that based on the attached stack trace15:36
gibibut did not kill the compute15:36
sean-k-mooneyit looks like it was found because of _init_dev_details15:36
gibiyepp15:36
sean-k-mooneyso at that poitn it was doing the sysfs lookup15:37
gibiyes15:37
sean-k-mooneyya so that was actully already a bug15:37
sean-k-mooneysince it shoudl not call sysfs as we have been disucssing15:37
sean-k-mooneygibi: ok so you are writign up a bug for this15:38
sean-k-mooneyand are going to try an capture this old context15:38
gibiyes15:39
gibiI try to write up what couplings I found15:39
gibithe Whitelist.__init__ is one example15:39
gibiI did the write up https://bugs.launchpad.net/nova/+bug/1961587/comments/216:08
opendevreviewAlexey Stupnikov proposed openstack/nova master: Run clean up calls when queued live migration is aborted  https://review.opendev.org/c/openstack/nova/+/82857016:11
*** amoralek is now known as amoralej|off16:43
kashyapFinally ... Linux kernel wants to consolidate /dev/random and /dev/urandom.  Removes much confusion -  https://lwn.net/SubscriberLink/884875/0c09a5929fef859c/16:48
kashyap"it would seem there are no huge barriers to removing the final distinction between /dev/random and /dev/urandom—other than the names, of course"16:48
opendevreviewGhanshyam proposed openstack/nova master: Modify remaining APIs as per RBAC new guidelines  https://review.opendev.org/c/openstack/nova/+/82899416:53
opendevreviewGhanshyam proposed openstack/nova master: Separate flavor extra specs policy for server APIs  https://review.opendev.org/c/openstack/nova/+/82962616:54
opendevreviewGhanshyam proposed openstack/nova master: Complete phase-1 of RBAC community-wide goal  https://review.opendev.org/c/openstack/nova/+/82986616:54
dansmithgibi: where did the compute node stuff go in this? https://review.opendev.org/c/openstack/nova/+/828369/2016:56
gibidansmith: on a call, I will ping you soon16:56
dansmithgibi: I was going to circle back and see how that was shaping up, but it looks like a lot was dropped in the latest rev so something much have changed16:56
dansmithsure16:56
chateaulavdansmith: see patchset 1916:59
dansmithchateaulav: what about it? no comments on there about a change in direction that I see17:00
dansmithchateaulav: or do you mean ps20 was an accident?17:00
chateaulavwe discussed this morning about dropping initial riscv support in this cycle. I can make note in the review of the discussion from irc.17:01
dansmithah okay17:01
chateaulavdansmith: i wanted to ensure code had no issues with the support removed17:02
dansmiththat's why I was asking here, figured there was some discussion not captured there17:02
chateaulavtotally17:02
chateaulavdansmith: let me know if you have any questions. and also see the DNM from gibi for a more complete capture of the issue and related bug that was submitted. https://review.opendev.org/c/openstack/nova/+/829989 17:18
dansmithchateaulav: I just commented on that17:18
* gibi is back17:25
gibidansmith: chateaulav is correct we do not try to extend the Arch enum any more so the ComputeNode OVO backporting issue can be ignored now17:26
gibiat least for the current release17:26
dansmithyep, I'm caught up having seen your DNM17:26
gibidansmith: thanks for the comment on https://review.opendev.org/c/openstack/nova/+/829989 17:27
gibidansmith: the separate ListObject thing might work17:27
dansmithworth a try I think17:27
dansmithand in that case, the list object becomes more of a "this is what specs the remote side supports" which can have its own legit translation logic in the backport17:28
gibidansmith: so in the next release if we want riscv support then we can try that17:28
dansmith++17:28
gibiif it does not work then we can still make a minimum service version check on the new compute before publishing support for new archs17:28
dansmiththere could be a situation in the future where we supported arm and then it split into arml and armf, such that we need to translate those two into one or something17:28
dansmithyep, for sure17:28
gibidansmith: ohh that arm thingy is a nice example where we need specific translation in the supported list17:29
dansmithyeah17:29
gibiOK I think this is settled17:29
gibithanks17:29
dansmithcool17:30
gibichateaulav: I checked the first patch of the series it looks good to me. I will continue with the rest tomorrow17:33
chateaulavawesome thanks gibi17:58
melwittgibi: feel free to mark the nova bug I reported as a duplicate17:59
melwittI didn't realize it was already out there17:59
*** carloss is now known as carloss|afk18:29
*** carloss|afk is now known as carloss19:29
opendevreviewMerged openstack/nova stable/xena: Clean up allocations left by evacuation when deleting service  https://review.opendev.org/c/openstack/nova/+/81695420:51
opendevreviewMerged openstack/nova master: trivial: Add a white space in an error message  https://review.opendev.org/c/openstack/nova/+/82322320:51
opendevreviewyuval proposed openstack/nova master: Lightbits LightOS driver  https://review.opendev.org/c/openstack/nova/+/82160620:53
opendevreviewMerged openstack/nova master: nova-next: Drop NOVA_USE_SERVICE_TOKEN from subnode  https://review.opendev.org/c/openstack/nova/+/81674020:57
opendevreviewMerged openstack/nova master: Vmware: Fix spelling in test  https://review.opendev.org/c/openstack/nova/+/80634820:58
opendevreviewMerged openstack/nova master: Correct test error  https://review.opendev.org/c/openstack/nova/+/77363420:58
opendevreviewJulia Kreger proposed openstack/nova master: Ironic - Handle instance/node host on rebalance  https://review.opendev.org/c/openstack/nova/+/81389722:32
opendevreviewJulia Kreger proposed openstack/nova master: Ironic - Don't query the API for instance counts  https://review.opendev.org/c/openstack/nova/+/82961322:33
opendevreviewMerged openstack/nova master: Raise InstanceNotFound on fkey constraint fail saving info cache  https://review.opendev.org/c/openstack/nova/+/82694223:11
opendevreviewMerged openstack/nova master: VmWare: Use of id shadows built-in function  https://review.opendev.org/c/openstack/nova/+/80639023:11
opendevreviewMerged openstack/nova stable/xena: Prevent leaked eventlets to send notifications  https://review.opendev.org/c/openstack/nova/+/81648723:11

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!