Monday, 2022-07-25

bauzasmorning folks08:20
opendevreviewAmit Uniyal proposed openstack/nova master: For evacuation, ignore if task_state is not None  https://review.opendev.org/c/openstack/nova/+/84888608:43
opendevreviewAmit Uniyal proposed openstack/nova master: add regression test case for bug 1978983  https://review.opendev.org/c/openstack/nova/+/84910409:06
opendevreviewAmit Uniyal proposed openstack/nova master: For evacuation, ignore if task_state is not None  https://review.opendev.org/c/openstack/nova/+/84888609:06
auniyal_how should we write zuul recheck cmd11:31
auniyal_so I want to run zuul, recheck for 4 jobs11:32
sean-k-mooneyauniyal_: you can't, and that's by design11:47
sean-k-mooneyauniyal_: we do not allow individual jobs to be rechecked separately11:47
auniyal_okay, so we have to run all jobs, by giving recheck only?11:48
sean-k-mooneycorrect. you can trigger third party ci separately11:50
auniyal_thanks sean-k-mooney11:50
sean-k-mooneybut first party ci will run all jobs together11:50
auniyal_earlier I saw somewhere that we should not rerun all jobs if only 1 or 2 jobs fail, and that single jobs can be rerun using recheck <something> <job-name>, but I looked in https://zuul-ci.org/docs/zuul/latest/ and couldn't find it11:54
sean-k-mooneythe configuration is per pipeline and we explicitly do not allow that in openstack12:08
sean-k-mooneyzuul may support that but we do not allow that in openstack under the green check policy12:09
sean-k-mooneyall jobs in the check run must use the same revision of the code12:09
sean-k-mooneyif you recheck individual jobs that would not be the case12:09
bauzasfolks, we have a problem with tempest.api.compute.admin.test_live_migration.LiveMigrationTest.test_live_migration_with_trunk12:31
* bauzas tries to look how many checks we have a problem with it12:31
bauzashttps://opensearch.logs.openstack.org/_dashboards/app/discover?security_tenant=global#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-30h,to:now))&_a=(columns:!(_source),filters:!(),index:'94869730-aea8-11ec-9e6a-83741af3fdcd',interval:auto,query:(language:kuery,query:test_live_migration_with_trunk),sort:!())12:35
bauzaslooks like it's https://bugs.launchpad.net/nova/+bug/194042512:35
opendevreviewBalazs Gibizer proposed openstack/nova master: Poison /sys access via various calls in test  https://review.opendev.org/c/openstack/nova/+/84462712:35
opendevreviewBalazs Gibizer proposed openstack/nova master: Add compute restart capability for libvirt func tests  https://review.opendev.org/c/openstack/nova/+/85051012:35
opendevreviewBalazs Gibizer proposed openstack/nova master: Rename [pci]passthrough_whitelist to device_spec  https://review.opendev.org/c/openstack/nova/+/84383412:36
opendevreviewBalazs Gibizer proposed openstack/nova master: Rename exception.PciConfigInvalidWhitelist to PciConfigInvalidSpec  https://review.opendev.org/c/openstack/nova/+/84386112:36
opendevreviewBalazs Gibizer proposed openstack/nova master: Rename whitelist in tests  https://review.opendev.org/c/openstack/nova/+/84386212:36
opendevreviewBalazs Gibizer proposed openstack/nova master: Basics for PCI Placement reporting  https://review.opendev.org/c/openstack/nova/+/84618712:36
opendevreviewBalazs Gibizer proposed openstack/nova master: Extend device_spec with resource_class and traits  https://review.opendev.org/c/openstack/nova/+/84621812:36
opendevreviewBalazs Gibizer proposed openstack/nova master: Reject PCI dependent device config  https://review.opendev.org/c/openstack/nova/+/84643512:36
opendevreviewBalazs Gibizer proposed openstack/nova master: Reject mixed VF rc and trait config  https://review.opendev.org/c/openstack/nova/+/84643612:36
opendevreviewBalazs Gibizer proposed openstack/nova master: Ignore PCI devs with physical_network tag  https://review.opendev.org/c/openstack/nova/+/84621912:36
opendevreviewBalazs Gibizer proposed openstack/nova master: Reject devname based device_spec config  https://review.opendev.org/c/openstack/nova/+/84646612:36
opendevreviewBalazs Gibizer proposed openstack/nova master: Support [pci]device_spec reconfiguration  https://review.opendev.org/c/openstack/nova/+/84647012:36
opendevreviewBalazs Gibizer proposed openstack/nova master: Stop if tracking is disable after it was enabled before  https://review.opendev.org/c/openstack/nova/+/84700912:36
opendevreviewBalazs Gibizer proposed openstack/nova master: Move provider_tree RP creation to PciResourceProvider  https://review.opendev.org/c/openstack/nova/+/85054612:36
opendevreviewBalazs Gibizer proposed openstack/nova master: Allow enabling PCI tracking in Placement  https://review.opendev.org/c/openstack/nova/+/85046812:36
gibibauzas: I see 10 hits in the last 7 days: https://paste.opendev.org/show/b8BF5WsTcwJojALnC5J0/ so it has become a bit more frequent than when I reported that bug12:38
bauzasI found 226 hits from the last 7 days12:38
gibiI don't really know how to parse the opensearch query. did you just query for 'test_live_migration_with_trunk'? that is all the runs of the test case including when the test passed, isn't it?12:42
gibialso this 'from:now-30h,to:now' does not seem to be 7 days12:42
gibiI filtered for the nova-next job runs, so my number can be smaller than the global number for sure12:43
gibibauzas: that is probably closer to the 7 days query of all jobs https://opensearch.logs.openstack.org/_dashboards/app/discover?security_tenant=global#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-7d,to:now))&_a=(columns:!(_source),filters:!(),index:'94869730-aea8-11ec-9e6a-83741af3fdcd',interval:auto,query:(language:kuery,query:%22test_live_migration.py%22),sort:!())12:47
gibiI queried for "test_live_migration.py" so that it filters out passing test cases (the test case name is printed there but not the file)12:48
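A small sketch of how the time window in the two dashboard URLs above could be compared. It only reads the relative `time:(from:now-N<unit>,to:now)` part of the URL fragment; the helper name `window_hours` is hypothetical, and only the `h` and `d` units seen in this log are handled.

```python
import re

# Hours per unit; only the two units that appear in the quoted URLs.
_UNIT_HOURS = {"h": 1, "d": 24}


def window_hours(url):
    """Return the length, in hours, of a relative 'now-N<unit>' window."""
    m = re.search(r"time:\(from:now-(\d+)([hd]),to:now\)", url)
    if m is None:
        raise ValueError("no relative 'from:now-N<unit>,to:now' range found")
    return int(m.group(1)) * _UNIT_HOURS[m.group(2)]


# Fragments copied from the two queries quoted in the log.
first = window_hours("time:(from:now-30h,to:now))&_a=(columns:!(_source)")
second = window_hours("time:(from:now-7d,to:now))&_a=(columns:!(_source)")
print(first, second)
```

Comparing the two values makes it easy to see that the first query covered a much shorter window than 7 days.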
bauzasgibi: yeah, I just checked for the testname12:48
bauzasas the testname is only provided with a FAILURE12:48
gibinope12:48
gibithis is a passing test with testname {1} tempest.api.compute.admin.test_live_migration.LiveAutoBlockMigrationV225Test.test_live_migration_with_trunk [30.171614s] ... ok\12:48
bauzassee the buildrate12:49
bauzasit's 100%12:49
bauzasbut agreed, I could query it better12:49
gibidoes opensearch filter out SUCCESS runs automatically?12:51
gibior why do the passing runs not appear?12:51
gibithe string "test_live_migration_with_trunk" is in the job-output.txt for passing runs too12:51
gibiso something magic happens in opensearch to filter those12:51
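One way to avoid relying on the search tool's implicit filtering is to classify the per-test result lines directly. The sketch below assumes job-output lines shaped like the passing example quoted above (`{1} <test id> [30.171614s] ... ok`). The `FAILED` status string and the helper name `count_results` are assumptions for illustration; only `ok` appears in this log.

```python
import re
from collections import Counter

# Assumed line shape, based on the passing example quoted in the log:
#   {1} some.module.Class.test_name [30.171614s] ... ok
RESULT_RE = re.compile(
    r"\{\d+\}\s+(?P<test>\S+)\s+\[(?P<secs>[\d.]+)s\]\s+\.\.\.\s+(?P<status>\w+)"
)


def count_results(lines, test_suffix):
    """Count result statuses for tests whose id ends with test_suffix."""
    counts = Counter()
    for line in lines:
        m = RESULT_RE.search(line)
        if m and m.group("test").endswith(test_suffix):
            counts[m.group("status")] += 1
    return counts


sample = [
    "{1} tempest.api.compute.admin.test_live_migration."
    "LiveAutoBlockMigrationV225Test.test_live_migration_with_trunk"
    " [30.171614s] ... ok",
    # Hypothetical failing line; the status text is an assumption.
    "{0} tempest.api.compute.admin.test_live_migration."
    "LiveMigrationTest.test_live_migration_with_trunk [95.5s] ... FAILED",
    # Lines that merely mention the test name are ignored.
    "some unrelated log line mentioning test_live_migration_with_trunk",
]
counts = count_results(sample, "test_live_migration_with_trunk")
print(dict(counts))
```

Counting by status rather than by a plain substring match keeps passing runs from inflating the hit count.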
bauzasdunno, just testing this new tool 12:53
gibiI don't like magic :D12:53
bauzaswell, at least the failure rate seems high and not related to one specific job12:54
gibiyeah I don't think it is related to nova-next at all, I just needed a way to limit my query12:56
gibimy tool doesn't do a full search on all job results as that would require downloading all the job results and logs locally12:57
gibiand that is not feasible12:57
gibiI've just run a widened search on all the nova devstack based jobs, it is 16 hits in 7 days for me and it is hitting nova-next and nova-grenade-multinode in nova 12:58
*** mfo is now known as Guest596913:03
*** mfo_ is now known as mfo13:03
bauzasovs-hybrid-plug job too13:05
gibibauzas: good point. I missed that job in my config13:17
gibithis way I get 34 hits in the last 7 days filtered for nova jobs13:18
*** dasm|off is now known as dasm13:51
bauzasI'm getting mad at 194042514:58
bauzasbug 1940425 I mean14:58
bauzasgibi: wdyt we could do for https://bugs.launchpad.net/nova/+bug/194042515:07
bauzaswe already wait for 60secs15:07
bauzas?15:08
gibibauzas: would make sense pinging neutron folks15:08
bauzasfailure rate is so high that we can't merge things by now15:10
bauzasmmm, so melwitt had a thought on https://github.com/openstack/tempest/blob/97be23ea6402649652991983f3f2b85873eba4d8/tempest/api/compute/admin/test_live_migration.py#L285 maybe not required for all neutron backends15:33
bauzasslaweq: around ?15:36
melwitt^ it was sean-k-mooney's thought that I repeated :) sean's the one who knows tons of stuff about nova <--> neutron15:44
bauzasif I'm not getting it wrong, nova isn't the only one impacted15:45
bauzasthis is also hitting cinder15:46
sean-k-mooneymelwitt: so the basis of this is that for ovn we only plug one port into ovs. and the trunking is implemented as openflow rules on that port15:59
sean-k-mooneyfor ml2/ovs and ml2/linux bridge16:00
sean-k-mooneywe create an ovs or linux bridge per trunk port16:00
sean-k-mooneyin the ovs case each subport is created as a patch port pair between the br-int and br-trunk###16:00
sean-k-mooneyand the br-int side of the patch port is tagged with the neutron port uuid16:01
sean-k-mooneyso ml2/ovs reports the status of the port as up/active16:01
sean-k-mooneyonce it finishes wiring them up16:01
sean-k-mooneyfor ml2/ovn i have no idea if it bothers to report the subports as up since they don't exist on the dataplane level16:02
bauzassean-k-mooney: except the ovn-hybrid-plug job, this is also failing on nova-next16:02
melwittI think my brain just exploded from reading that16:02
bauzasovs-hybrid-plug, my bad16:02
bauzaswe also have the grenade-multinode which hits such bug16:03
sean-k-mooneybauzas: ack so it's failing regardless of how it's implemented16:03
gibithe interesting part that it is not 100% failure in nova-next, so in some case neutron does bring up the subport16:03
bauzasyup16:03
sean-k-mooneyit could be a timing thing16:03
bauzasthis looks a transient issue16:03
bauzaswe already wait for 60s16:03
gibibut 60sec is a lot 16:03
sean-k-mooneyneutron is not meant to send network-vif-plugged for the parent16:03
sean-k-mooneyuntil all the subports are set up16:03
sean-k-mooneymaybe it does not take that into account16:03
sean-k-mooneyand sends it only once the parent is set up16:04
sean-k-mooneymeaning we might be racing16:04
gibiyeah that makes sense16:04
sean-k-mooneynova only has one port attached to the vm so we only care about the parent16:04
gibithat will depend on how tempest was querying the state16:04
sean-k-mooneyi don't think the parent should really be active if the subports are not active16:05
melwittthis is the tempest test https://github.com/openstack/tempest/blob/97be23ea6402649652991983f3f2b85873eba4d8/tempest/api/compute/admin/test_live_migration.py#L28516:05
sean-k-mooneybut honestly that is probably an implementation detail that no one should depend on16:05
sean-k-mooneyi don't think this was defined in the specs16:05
sean-k-mooneyso the test might just be asserting stuff that is not required/guaranteed by the api16:05
sean-k-mooneyim pretty sure this was the relevant spec https://specs.openstack.org/openstack/neutron-specs/specs/newton/vlan-aware-vms.html16:07
sean-k-mooneygibi: melwitt  so ya reading that quickly the status of the subport is not defined in relation to port binding16:10
dansmithsean-k-mooney: on the event, it should send the event once the thing the port represents can pass traffic, right? so subport or not, we shouldn't get the alert until the traffic will flow16:10
dansmithelse we're wiring up to something we can't expect to get dhcp or other critical traffic through16:10
sean-k-mooneycorrect16:10
dansmithcalling the trunk up because one side is active isn't good enough16:10
sean-k-mooneywe should not get the network-vif-plugged for the trunk parent until everything is configured to allow all traffic to flow16:11
dansmithright, so depending on the backend implementation that may come depending on what wiring needs to happen16:11
sean-k-mooneybut they may not have implemented that dependent status check16:11
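The suspected race (the parent port reported active before its subports are wired up) can be illustrated with a small polling sketch. This is not the tempest or neutron implementation: `wait_for_ports_active`, `FakeBackend` and the `get_status` callable are hypothetical names, and the 60 second default only mirrors the wait mentioned in the discussion.

```python
import time


def wait_for_ports_active(get_status, port_ids, timeout=60.0, interval=1.0):
    """Poll until every port reports ACTIVE; return the ports still pending.

    get_status is a hypothetical callable mapping a port id to a status
    string such as "ACTIVE" or "DOWN". An empty set means success.
    """
    deadline = time.monotonic() + timeout
    pending = set(port_ids)
    while True:
        pending = {p for p in pending if get_status(p) != "ACTIVE"}
        if not pending:
            return set()
        if time.monotonic() >= deadline:
            return pending
        time.sleep(interval)


class FakeBackend:
    """Stand-in backend: each port turns ACTIVE after N status polls."""

    def __init__(self, active_after):
        self.active_after = active_after
        self.polls = {}

    def status(self, port_id):
        self.polls[port_id] = self.polls.get(port_id, 0) + 1
        ready = self.polls[port_id] >= self.active_after[port_id]
        return "ACTIVE" if ready else "DOWN"


# The parent is up on the first poll, the subport only on the third.
backend = FakeBackend({"parent": 1, "subport": 3})
still_down = wait_for_ports_active(
    backend.status, ["parent", "subport"], timeout=60.0, interval=0.0
)

# A subport that never comes up is reported once the timeout expires.
stuck = FakeBackend({"parent": 1, "subport": 10**9})
timed_out = wait_for_ports_active(
    stuck.status, ["parent", "subport"], timeout=0.0, interval=0.0
)
print(still_down, timed_out)
```

Checking only the parent would return after the first poll here, which is the window in which a test asserting on subport status could fail.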
opendevreviewBalazs Gibizer proposed openstack/nova master: Add more test coverage for devname base dev spec  https://review.opendev.org/c/openstack/nova/+/84462516:11
opendevreviewBalazs Gibizer proposed openstack/nova master: Extra tests for remote managed dev spec  https://review.opendev.org/c/openstack/nova/+/84462616:11
opendevreviewBalazs Gibizer proposed openstack/nova master: Unparent PciDeviceSpec from PciAddressSpec  https://review.opendev.org/c/openstack/nova/+/84449116:11
opendevreviewBalazs Gibizer proposed openstack/nova master: Fix PciAddressSpec descendants to call super.__init__  https://review.opendev.org/c/openstack/nova/+/84456516:11
opendevreviewBalazs Gibizer proposed openstack/nova master: Remove dead code from PhysicalPciAddress  https://review.opendev.org/c/openstack/nova/+/84462816:11
opendevreviewBalazs Gibizer proposed openstack/nova master: Clean up mapping input to address spec types  https://review.opendev.org/c/openstack/nova/+/84576516:11
dansmithright but that would be a neutron issue16:11
opendevreviewBalazs Gibizer proposed openstack/nova master: Remove unused PF checking from get_function_by_ifname  https://review.opendev.org/c/openstack/nova/+/84577516:11
opendevreviewBalazs Gibizer proposed openstack/nova master: Fix type annotation of pci.Whitelist class  https://review.opendev.org/c/openstack/nova/+/84578016:11
opendevreviewBalazs Gibizer proposed openstack/nova master: Move __str__ to the PciAddressSpec base class  https://review.opendev.org/c/openstack/nova/+/84578116:11
sean-k-mooneydansmith: yes it would16:11
dansmithack, just confirming ;)16:12
sean-k-mooneyso you're suggesting that marking it confirmed for nova on the bug is wrong and we should likely change it16:12
dansmithI dunno about that I just want to be clear that nova shouldn't be trying to interpret vif-plugged differently for trunks16:13
sean-k-mooneybauzas: gibi  by the way with the escalation and everything that happened in the last few days i have not been doing upstream bug triage this week sorry16:13
gibisean-k-mooney: no worries16:13
sean-k-mooneydansmith: agreed16:13
bauzassean-k-mooney: no worries at all, again, bug triage is just down any prio16:13
sean-k-mooneydansmith: nova should just care about the one port that is attached to the vm (the trunk)16:13
sean-k-mooneythe rest is up to neutron to care about16:14
dansmithyes16:14
sean-k-mooneydansmith: whether tempest should check this at all is probably TBD16:14
gibimaybe the tempest test verifies the system from neutron's perspective, hence the assert on the subport too16:14
dansmithsean-k-mooney: also true16:14
sean-k-mooneygibi: yes but in that case it is indicating that neutron is not correctly setting up the trunk16:15
gibiyes16:15
gibiI agree16:15
sean-k-mooneyi would suggest setting the nova part to incomplete for now16:15
sean-k-mooneyas it's not clear that nova should be doing anything it is not already doing16:15
gibiworks for me16:20
gibilater we can set it to invalid if it turns out only neutron needs a fix16:20
opendevreviewBilly Olsen proposed openstack/nova master: Handle mdev devices in libvirt 7.7+  https://review.opendev.org/c/openstack/nova/+/83897616:24
opendevreviewAmit Uniyal proposed openstack/nova master: For evacuation, ignore if task_state is not None  https://review.opendev.org/c/openstack/nova/+/84888619:30
*** melwitt_ is now known as melwitt21:10
*** dasm is now known as dasm|off21:25
