Monday, 2021-12-13

*** songwenping__ is now known as songwenping00:48
sdmitrievhttps://review.opendev.org/c/openstack/nova/+/760354/501:56
sdmitrievhttps://review.opendev.org/c/openstack/nova/+/710848/601:56
sdmitrievhttps://review.opendev.org/c/openstack/nova/+/710847/801:57
sdmitrievHello there, just rebased these commits above, would really appreciate if someone could review them01:57
sdmitrievI'm really interested in fixing that bug since deployments are pretty much affected 01:57
*** bhagyashris_ is now known as bhagyashris06:21
opendevreviewPierre-Samuel Le Stang proposed openstack/nova-specs master: Implements: blueprint soft-delete-instance-actions  https://review.opendev.org/c/openstack/nova-specs/+/82138708:47
gibibauzas, sean-k-mooney: this is my last week before vacation and I will not be back until M2 so I'd like to get over with the placement spec https://review.opendev.org/q/topic:any-traits-support this week. 09:42
bauzasgibi: ack good point09:42
gibiit was already approved a long time ago so I don't think it is controversioal09:42
bauzasand reminder that we have our spec review day tomorrow09:42
gibiyepp, tomorrow is good for me09:43
gibi:)09:43
gibias a trade I'm willing to look at many reviews this week as I don't want to start any new on my side just closing down open things09:50
gibiso hit me with reviews09:50
gibisean-k-mooney: I'm +2 on the recent version of the instance action softdelete spec https://review.opendev.org/c/openstack/nova-specs/+/82138711:03
gibibauzas: and easy stable/xena backport needing a second core (lyarwood is already on PTO)  https://review.opendev.org/q/topic:%2522bug/1945310%2522+branch:stable/xena11:11
bauzasgibi: done11:17
bauzasgibi: I'm just reviewing pslestang's spec already :)11:18
sean-k-mooneygibi: just reading back11:50
sean-k-mooneyill look at the placement one shortly11:50
sean-k-mooneygibi: i was pretty happy with the instance action spec so ill look at that again this morning since its been revised11:51
sean-k-mooneygibi: by the way i dong thave +2 rights on placemt/placemsnt spec directory since its in tree but ill review it anyway11:52
sean-k-mooneygibi: oh this is the spect that like yuou do required=in:TRAIT1,TRAIT211:56
*** dpawlik7 is now known as dpawlik11:56
opendevreviewWenping Song proposed openstack/nova master: Fill the AcceleratorRequestBindingFailed exception msg info  https://review.opendev.org/c/openstack/nova/+/81732611:57
gibisean-k-mooney: yes, that is in :TRAIT1,TRAIT212:07
sean-k-mooneygibi: i have one open qustion/suggestion for that spec12:07
sean-k-mooneygibi: https://review.opendev.org/c/openstack/placement/+/649992/5/doc/source/specs/yoga/approved/2005346-any-traits-in-allocation_candidates-query.rst#6812:09
sean-k-mooneyoh i see the last spec is for mixing which is what i was asking us to support12:10
sean-k-mooneythe proposal for mixing will work for placment but now for nova/glance12:11
sean-k-mooneyi tink just using ; is simpler12:12
gibisean-k-mooney: it is almost directly following how member_of is modelled in the placement API12:24
gibiso repeat means AND. OR expressed with in: prefix12:25
gibisure ';' could be used instead of repeat of the required query param but then we would create inconsistency with member_of12:26
opendevreviewPierre-Samuel Le Stang proposed openstack/nova-specs master: Implements: blueprint soft-delete-instance-actions  https://review.opendev.org/c/openstack/nova-specs/+/82138712:32
sean-k-mooneygibi: it would yes but the ofllow on proposal of using multiple &required=...&required=...12:33
sean-k-mooneycant be modled in flavors or images12:33
sean-k-mooneysince we cannot support multi opts and merge them12:33
sean-k-mooneyso we will need to have some other way to mix them in the flavor/image 12:33
sean-k-mooneywhich is why i think either requiring all in parmater to be at the end or using ; makes sense12:34
gibithe current falvor syntax is trait:HW_CPU_X86_AVX2=required , so it can extended to trait:inFOO,BAR=required12:35
opendevreviewMerged openstack/nova stable/xena: Reproduce bug 1945310  https://review.opendev.org/c/openstack/nova/+/81140512:35
gibialso trait:inFOO,BAR=required and trait:BAZ=required can be mixed12:36
sean-k-mooneyi guess that could work12:36
pslestanggibi: thanks for reviewing the spec, I pushed an other patch to fix all the syntax error you underligned12:36
gibithe flavor syntax is ugly12:36
gibibut that is a different story12:36
gibi :D12:36
sean-k-mooneygibi: this might be nicer trait:FOO,BAR=required_in12:36
gibisean-k-mooney: yeah, that could be another option12:37
gibithe point is that we have a way to modell repetition as the key contains the trait name12:37
gibipslestang: I will check soon12:37
sean-k-mooneygibi: ok the image/flavor represnetiaotn is out of scope of the placment spec anyway12:37
sean-k-mooneyso i think you have convinced me that this can work12:38
gibiyes, but you had a point, if the flavor have had required=FOO syntax then we would be in deep trouble12:38
gibiso it was a good excersize to see how flavor will fit12:38
sean-k-mooneyya i forgot we reversted it12:38
sean-k-mooneyand didn trait:*=required12:39
sean-k-mooneyill update my review with this conversation and change to a +112:40
gibisean-k-mooney: thanks12:41
gibipslestang: +212:41
sean-k-mooneypslestang: ill review that sortly after i grab a coffee12:43
sean-k-mooneyi unexpectly have meeting for the next 4 hours but i should have time to review your spec before i have to join them12:43
sean-k-mooneyits almost as if there is pending holidays that people will be taking on mass and everything is beeing cramed into this week :)12:44
gibisean-k-mooney: I hope that 4 hour meeting is some light gathering before the vacation time to chill out and slow down :)12:44
pslestanggibi: thanks12:44
plibeau3lyarwood: I have apply a part of proposition. I have put some question on patchset3. -> https://review.opendev.org/c/openstack/nova/+/82053112:44
pslestangsean-k-mooney: sure, don't worry12:44
gibipslestang: thank you for the nicely written spec!12:45
gibiplibeau3: lyarwood is on PTO already12:45
gibiplibeau3: so be prepared for slow response :)12:45
plibeau3gibi: oki :)12:46
opendevreviewMerged openstack/nova stable/xena: Query ports with admin client to get resource_request  https://review.opendev.org/c/openstack/nova/+/81140712:47
sean-k-mooneybauzas: unless you want to to specifically look at the instance-action sepc im also not +2 on it and was going to +w12:54
sean-k-mooneypslestang: thanks for adding the lines regardding restore12:55
pslestanggibi: sean-k-mooney thanks!13:08
gibibauzas: I've responded in https://review.opendev.org/c/openstack/nova/+/820549 , could you upgrade your vote to +2? :)13:29
bauzasgibi: sean-k-mooney: sorry, was afk for getting a 3rd jab14:04
gibibauzas: nice14:04
sean-k-mooneybauzas: no worries i just +w'd the instance action spec14:04
sean-k-mooneyi can porbly pull it out of the gate if you want to review?14:04
sean-k-mooneygibi and i are both +2 on it14:05
bauzassean-k-mooney: nah, eventually I just +wd it too14:07
bauzasI'm OK with the way14:07
bauzasso we wouldn't have any upgrade issues14:07
sean-k-mooneyack14:11
opendevreviewMerged openstack/nova-specs master: Implements: blueprint soft-delete-instance-actions  https://review.opendev.org/c/openstack/nova-specs/+/82138714:16
gibigmann: due to https://review.opendev.org/c/openstack/nova/+/821423 the AZ related tempest tests cannot be run in parallel with tests that are booting instances (you can see many example failures in the run on that patch). Is there an easy way to separate these tests out? 14:38
gibiI see fixtures.LockFixture('availability_zone') in these tests, but I guess that only protects the AZ test from each other14:39
gibiand it would be hard to add the same lock to all the other tests that boot instances14:39
gibigmann: I'm thinking about separating out the AZ tests to a separate tempest run command in all tox targets, but that is also a bit of a pain to do properly14:44
gmanngibi: looking 15:37
opendevreviewyuval proposed openstack/nova master: Lightbits LightOS driver  https://review.opendev.org/c/openstack/nova/+/82160616:15
jhartkopfHi there, I need another +2 on my patch: https://review.opendev.org/c/openstack/nova/+/813672 It's just a small fix. I'd appreciate if someone could take time for review.16:36
gmanngibi: I got the problem, its tempest test also need to modify as per your change. 16:46
gibigmann: what would be the good way to separate the AZ tests from the rest?16:47
gmanngibi: test is trying to remove the host which has servers (bug you are fixing). you can see remove host request  - https://zuul.opendev.org/t/openstack/build/5a9aad45498d45cda1eabd2b87b743d2/log/controller/logs/screen-n-api.txt#226816:47
gmannand delete server request is after that https://zuul.opendev.org/t/openstack/build/5a9aad45498d45cda1eabd2b87b743d2/log/controller/logs/screen-n-api.txt#256816:47
gmanngibi: I am fixing test to wait for server delete before it does remove host request16:48
gmannthere is no race condition here its just tests did not wait for server to be deleted before remove_host is called in cleanup16:48
gibigmann: OK, if that is in a single test case I can fix that up. But I think there are cases when we have two parallel tempest tests 1) does some AZ testing by moving host between aggregates 2) boot instance on host. If this two happens in parallel then the 1) will fail after the bugfix16:49
gmanngibi: I can check those if there is any such case. I think that is only test we have to add/remove host to agg with server16:50
gibiOK, I will gather the full picutre and fix up the cleanup issue16:50
gmanngibi: let me update test and add your change as depends-on to see if all pass. and later we can merge the temest fix before nova change as per bug fix procedure 16:50
gibigmann: sure, if you have time then I can appreciate the tempest fix16:51
gmannthose tempest test play on same host but we have only single tests doing server boot but I will check if we have any case like race condition you explained16:51
gmannsure16:51
gibigmann: I think test_aggregate_basic_ops failure in https://zuul.opendev.org/t/openstack/build/83aad9cdd5b849d48f4ffb463880d259/logs is a race16:53
gibias that test case does not boot a server16:53
gibibut it fails as probably other parallel test booted one16:53
gmanngibi: ohk, did not see that in latest run. let me check16:54
gmannwe should not need lock here instead test should check host with AZ etc carefully 16:55
gibigmann: so mean skip the AZ test if the host has instance? 16:58
gibior wait in the test until the host has no instnace?16:58
gmanngibi: skip like we do here https://github.com/openstack/tempest/blob/7e96c8e854386f43604ad098a6ec7606ee676145/tempest/api/compute/admin/test_aggregates.py#L22816:59
gibigmann: but would not that mean we sometimes run the test sometimes does not depending on which other parallel test runs at the same time 16:59
gibi?16:59
gibiso in theory we might lose test coverage by that17:00
gmanngibi: i mean skip if there is no such host in env. test creating instance has to be cleaned up so we can add wait or so 17:00
gmanngibi: ah, we do have lock there but not sure why race is happening https://github.com/openstack/tempest/blob/master/tempest/scenario/test_aggregates_basic_ops.py#L11917:02
gmanngibi: ohk but any other non agg test might be picking that host and creating server17:02
gibigmann: exactly ^^17:03
gibiso the lock only avoid two AZ test to race17:03
gmannyeah17:03
gibican we somehow make the AZ tests always run _after_ all the other tests?17:04
gmannand skip if host has server also does not solve the issue as server can be created in between17:04
gmanngibi: we can configure that way in our CI but as tempest run it is still bug that this test canot be run in parallal 17:05
sean-k-mooneywe do run the senario test after all the others in serial17:06
sean-k-mooneyso we could put the az test in that group17:07
sean-k-mooneyat least for tempest-full17:07
sean-k-mooneyhttps://github.com/openstack/tempest/blob/master/tox.ini#L112-L11317:07
sean-k-mooneywe could exclude the az test the same way and run them with the senario and slow tests17:07
sean-k-mooneyits not the ideal solution but its an option17:08
gmanngibi: one best we can do in tests 'check for host has no servers and then only add/remove tests' but that does not solve race completly  17:08
sean-k-mooneygmann: we donbt really want to make the dest depend on thing that are dynamic like that really17:09
gmanngibi: or we can add try except and if conflict error due to server present then skip the test otherwise pass17:09
gibisean-k-mooney: you mean AZ test would be a 3rd group?17:09
gibito separate them from all the others17:10
sean-k-mooneygibi: not entirly just put them with the senario tests17:10
sean-k-mooneywe could have them be a third group17:10
gmannotherwise we have only choice of remvoe the test17:10
sean-k-mooneybut only if we could run them in paralle in that group17:10
gmannsean-k-mooney: that does not solve the issue, it is just our CI fix17:10
gmannanyone running tempest will face the same issue 17:10
sean-k-mooneygmann: right but the only way to solve the issue it so use a lock really17:11
gibisean-k-mooney: does the scenario / show runs in a single thread? if so we can add the AZ test there but if those also run in parallel then the same race can happen there too17:11
gmannsean-k-mooney: we cannot use lock as it end up making all tests running serially 17:11
sean-k-mooneygibi: for tempest full its serical so one thread yes17:11
gmanngibi: yeah that is happening now.17:11
gmannscenario test also create server. 17:12
gibiyeah, locking means we basically need to lock agains any test that boots an instance, that is pretty restrictive. 17:12
gmannyeah17:12
sean-k-mooneygibi: not quite17:12
sean-k-mooneyyes we do but the locking is slightly different17:13
gmannit end up running tempest in serial 17:13
sean-k-mooneybasiclly  we need somihing like a reader writer lock17:13
sean-k-mooneythe aggreate tests need to grab the writer lock but the boot test need only a reader lock17:13
gmannI see these are good option if we want to keep tests - 1. test will 'check for host has no servers and then only add/remove tests'  2. add try except and if conflict error due to server present then skip the test otherwise pass17:14
sean-k-mooneyso the build test can run in parallel with the read lock until a aggreate test try to modify thing in which case it grabes the writer lock 17:14
gmannwe do have same kind of try except for image tests too for some cases, that is best tempest can test these APi race operations 17:16
gmannand which matches to the real operation scenario 17:16
sean-k-mooneyperhaps but i still think a reader/writere lock would be a good fit here17:16
gmannsean-k-mooney: but boot tests having reader lock will still create server on host as aggregate test having write lock. correct?17:17
gmannif we want to isolate them the it has to be same lock17:18
sean-k-mooneyit depend on the sematics of the lcok. you can have it perfer reads so that you cant get the write lock until all readers relese17:18
sean-k-mooneyhttps://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock17:18
gmannyeah which is same thing right. end up running in serial as per many current boot tests17:19
sean-k-mooneywell we have to run in serial17:19
gmannlet me try the above check and then we can see how it looks like17:19
sean-k-mooneywe can run in parallel only if aggreates are not being changed17:19
sean-k-mooneyif they are then only that test can happen as its global state17:20
sean-k-mooneywe actully need "Write-preferring RW lock" semantics if we were to use this correctly17:21
gibias we have a small amount of AZ test I still think separating them out to the end would be easier than addin a reader lock to each test that boots an instance17:22
gibiof course if such separation is possible17:22
sean-k-mooneygibi: yep it would be but it need to be done by teh caller of tempest17:22
sean-k-mooneywe can do it via tox with the regex support17:22
sean-k-mooneybut every client would have to do that17:23
gibisean-k-mooney: then we might need to invent someting inside tempest to order cases?17:23
sean-k-mooneyunless we add a new feature to tempest to tag things as "must  be serial"17:23
gibiyeah17:23
gibilike that17:23
sean-k-mooneyif we could do that as a decorator that would be nice17:23
gibiOK, so we have couple of options a) RW lock b) @serial decorator17:23
sean-k-mooneyi just dont know if we can. i suspect its posible17:23
sean-k-mooneyc) tox.ini for now17:24
gibiyeah temporarily tox.ini  but I don't really like that17:24
sean-k-mooneyits what we have done for senario tests for years17:24
gmannyeah we should not do that as other tempest user still will face the error17:24
sean-k-mooneyit works but you have to know to do it17:24
gibiI will sleep on this. I don't need to rush to land the nova bug fix so we have time17:25
gmannsean-k-mooney: we did that only for env/timeout restriction they can be run in parallel like we do in tempest-parallel job17:25
gibithanks for the input17:25
sean-k-mooneygmann: they can if you have enough resouces in the could yes17:25
gibiand I will read back if you still continue discussiing this17:25
gibi:)17:25
gmannsean-k-mooney: here we are making tempest cannot be run in parallel so we have to fix test not the way we run those17:25
sean-k-mooneywe do it because with our default concurance tehy wont fit in the ci vms17:26
sean-k-mooneygmann: yep i understand17:26
sean-k-mooneyhowever if this is a gate blocker it better to unblock the gate and then fix it properly17:26
gmanngibi:sean-k-mooney let me add test modification with checks and try/except. and later we can see if tagging a tests to run serial can be done or not17:26
sean-k-mooneyack17:26
gmannsean-k-mooney: this is not gate blokcer but change need with nova fix17:27
gmannnew bug fix in nova and so does tempest tests need fix17:27
sean-k-mooneythis is for stephens bug fix to prevent you updating aggreate when there are isntance on the host?17:29
gibisean-k-mooney: yes17:30
sean-k-mooneythat was previously ment to be blocked but i assume there was an edge case we missed17:30
gmannyeah17:30
gmannhttps://bugs.launchpad.net/nova/+bug/190777517:30
gibisean-k-mooney: https://review.opendev.org/c/openstack/nova/+/82142317:30
sean-k-mooneyok that was "fixed" for a different edgecase many releases ago17:30
sean-k-mooneyso im surpised tempest has not hit this already17:30
gibisean-k-mooney: what was actually fixed?17:33
gibisean-k-mooney: as this bug now has reproduction test17:33
gmannyeah, we never added restriction for add/remove host having server17:34
gmannthat is what tempest tests is doing17:34
sean-k-mooneyis_safe_to_update_az17:35
sean-k-mooneyhttps://github.com/openstack/nova/commit/8e19ef4173906da0b7c761da4de0728a2fd71e2417:35
sean-k-mooneyhttps://github.com/openstack/nova/commit/0ad5a64dc9ac4b1cfb8038f9171b6385fdf07f2817:35
gmannohk, no two different AZ while adding 17:35
sean-k-mooneywe currently block you form adding or removing the host to the host-aggreate that has the az metadata17:36
gmanndifferent AZ17:37
sean-k-mooneywell more when doign az update17:37
sean-k-mooneybut this code was ment to alos cater for this usecase17:37
gibihm the first one is aggregated update, that is different from add/remove host but yes that could race with instance boot17:38
gibithe second is about one host can only be one AZ case17:39
gibithat I think cannot race with instance boot17:39
sean-k-mooneygibi: well yes and no. ading or removign a host form an aggreate is really an update too in the more general sense17:39
sean-k-mooneymy point was that https://review.opendev.org/c/openstack/nova/+/821423/1/nova/compute/api.py#645817:40
sean-k-mooneyshould have been blocking this edgecacse already17:40
sean-k-mooneybut it does not hece the need for stephens change17:40
gibiOK, I really need to drop17:42
gibifor today17:42
gibiI will think about it more tomorrow o/17:42
*** tbachman_ is now known as tbachman17:55

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!