opendevreview | melanie witt proposed openstack/nova master: Add mock to avoid loading guestfs in unit test https://review.opendev.org/c/openstack/nova/+/862769 | 01:52 |
opendevreview | melanie witt proposed openstack/placement stable/ussuri: placement-status: check only consumers in allocation table https://review.opendev.org/c/openstack/placement/+/840703 | 02:16 |
*** tkajinam is now known as Guest299 | 04:46
sahid | o/ a gentle reminder if it's possible to get eyes on https://review.opendev.org/c/openstack/nova/+/858383 ? | 09:05 |
gibi | good morning | 09:30 |
Uggla | oh gibi, good morning and happy new year. | 10:00 |
gibi | Uggla: same to you too | 10:00 |
opendevreview | Balazs Gibizer proposed openstack/placement master: Make tox.ini tox 4.0.0 compatible https://review.opendev.org/c/openstack/placement/+/868418 | 10:01 |
gibi | bauzas, gmann: the os-vif tox4 fix is blocked by a test failure on master, confirmed by sean-k-mooney[m] last year. | 10:04
gibi | I fixed up the placement tox4 patch based on stephenfin's comments now | 10:04 |
bauzas | k | 10:05 |
bauzas | working on some reproducer downstream atm | 10:05 |
gibi | ack | 10:05 |
opendevreview | Konrad Gube proposed openstack/nova-specs master: Use extend volume completion action https://review.opendev.org/c/openstack/nova-specs/+/855490 | 12:31 |
*** dasm|off is now known as dasm | 13:50
sean-k-mooney | gibi: i have reviewed up to https://review.opendev.org/c/openstack/nova/+/854924/9 and am starting on it now. stephenfin is -1 on it, i think just for the docs/release notes; can you address those quickly? | 14:03
gibi | sean-k-mooney: looking... | 14:03 |
sean-k-mooney | i have approved everything before that so if you do fix it please avoid rebasing the previous patches :) | 14:04 |
gibi | sure :) | 14:08 |
opendevreview | Merged openstack/nova master: Support cold migrate and resize with PCI tracking in placement https://review.opendev.org/c/openstack/nova/+/854247 | 14:40 |
sean-k-mooney | gibi: stephenfin ok so at this point stephen and i have completed the review of the pci series. i have approved most of it including the followups so there are two bits left: 1) the docs/release note fixes noted above and 2) the refactor in the last patch, which is optional | 15:05
gibi | sean-k-mooney: thanks I will provide the fix for the followup today | 15:05
sean-k-mooney | so i think we can wrap this up this week | 15:05 |
sean-k-mooney | gibi: https://review.opendev.org/c/openstack/nova/+/854929/8 my suggestion for that is to make it a mixin instead and only mix it into the filters that need it | 15:07
gibi | I will think about that | 15:07 |
sean-k-mooney | ack i dont think its pressing in any case. if we did not merge the last patch it would have no ill effect one way or another | 15:08
gibi | yepp, this was the reason we moved it to the top | 15:08 |
sean-k-mooney | im going to go afk for a few mins chat in a bit | 15:08 |
gibi | so it does not block us | 15:08 |
gibi | ack | 15:08 |
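(An aside on the mixin suggestion above: a rough sketch, purely illustrative, of what mixing the candidate handling into only the filters that need it might look like. The class names, the allocation_candidates attribute, and the _candidate_supports helper are assumptions, not the code proposed in the review; only nova.scheduler.filters.BaseHostFilter is the real base class.)

```python
# illustrative sketch only; names other than BaseHostFilter are assumptions
from nova.scheduler import filters


class CandidateFilterMixin:
    """Shared helper for filters that prune allocation candidates."""

    def filter_candidates(self, host_state, keep_func):
        # keep only the allocation candidates the concrete filter accepts
        kept = [c for c in host_state.allocation_candidates if keep_func(c)]
        host_state.allocation_candidates = kept
        return kept


class SomeCandidateAwareFilter(CandidateFilterMixin, filters.BaseHostFilter):
    """Passes a host only if at least one usable candidate remains."""

    def host_passes(self, host_state, spec_obj):
        return bool(self.filter_candidates(
            host_state,
            lambda c: self._candidate_supports(c, spec_obj)))  # hypothetical
```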
opendevreview | Pierre-Samuel Le Stang proposed openstack/nova master: Reproducer test of bug #1999674 https://review.opendev.org/c/openstack/nova/+/867807 | 16:31 |
opendevreview | Pierre-Samuel Le Stang proposed openstack/nova master: Correctly reset instance task state in rebooting hard https://review.opendev.org/c/openstack/nova/+/867832 | 16:31 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Allow enabling PCI scheduling in Placement https://review.opendev.org/c/openstack/nova/+/854924 | 16:41 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Follow up for the PCI in placement series https://review.opendev.org/c/openstack/nova/+/855654 | 16:41 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Rename _to_device_spec_conf to _to_list_of_json_str https://review.opendev.org/c/openstack/nova/+/855648 | 16:41 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Reproduce PCI pool filtering bug https://review.opendev.org/c/openstack/nova/+/855649 | 16:41 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Strictly follow placement allocation during PCI claim https://review.opendev.org/c/openstack/nova/+/855650 | 16:41 |
opendevreview | Balazs Gibizer proposed openstack/nova master: FUP for the scheduler part of PCI in placement https://review.opendev.org/c/openstack/nova/+/862876 | 16:41 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Split ignored_tags in stats.py https://review.opendev.org/c/openstack/nova/+/867978 | 16:41 |
gibi | stephenfin, sean-k-mooney[m]: fixed up the comments and added a new reno ^^ | 16:43 |
opendevreview | Merged openstack/osc-placement stable/zed: Make tox.ini tox 4.0.0 compatible https://review.opendev.org/c/openstack/osc-placement/+/868722 | 17:39 |
dansmith | melwitt: sean-k-mooney: so I need to sanity check something with people for the stable compute node uuid stuff | 18:12 |
dansmith | ignoring the pep8 thing, I left one test failing on this patch: https://review.opendev.org/c/openstack/nova/+/863917/1 | 18:12 |
dansmith | because the test checks for a thing that (a) was part of upgrade stuff from long ago and (b) is somewhat incompatible with the new stuff | 18:13 |
dansmith | this is the test: https://github.com/openstack/nova/blob/master/nova/tests/functional/regressions/test_bug_1764556.py#L69-L144 | 18:14 |
dansmith | it's checking going from a deleted service/node with no uuid to re-creating a service with the same name, which generates a node uuid | 18:14 |
dansmith | bug is here: https://bugs.launchpad.net/nova/+bug/1764556 | 18:15 |
dansmith | fixed in stein, so the test is checking for things that could have happened in an upgrade _to_ stein, where you deleted a service/node before the upgrade and then re-created it with the same name after the upgrade | 18:16 |
dansmith | what I want to do is just drop that test early in the stable compute uuid set as no longer relevant, but since that's a big red flag, I want to make sure people are okay with that | 18:16 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Factor out a mixin class for candidate aware filters https://review.opendev.org/c/openstack/nova/+/854929 | 18:33 |
gibi | stephenfin, sean-k-mooney[m]: another stab at the candidate aware scheduler filter refactoring ^^ now with a mixin class | 18:33
sean-k-mooney | dansmith: sorry was still in calls ill read back in a bit | 18:46
sean-k-mooney | dansmith: correct me if im wrong but before cells v2 when the services were first added they just had an int id field and later we added a uuid field, not necessarily for cells v2 but its required by cells v2 for services to be unique since we cant rely on the id field being unique | 18:50
sean-k-mooney | i.e. the id field was an auto_increment int primary key | 18:51
sean-k-mooney | and since the services are in the cell db we could have two with the same id in different cell dbs | 18:51
sean-k-mooney | and the uuid was added to give us a globally unique id for each service | 18:51 |
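(A toy illustration of the point just made, not nova's actual model: each cell database hands out its own auto-increment ids, so id=1 can exist in two cells at once, and only the uuid column is globally unique.)

```python
# toy SQLAlchemy sketch, not the real nova.db services model
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Service(Base):
    __tablename__ = 'services'
    # unique only within a single cell database
    id = Column(Integer, primary_key=True, autoincrement=True)
    # added later; globally unique across cells
    uuid = Column(String(36), unique=True, nullable=True)
    host = Column(String(255))
    binary = Column(String(255))
```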
dansmith | that's one reason we added uuid on the compute node yeah. not sure if that's relevant here though. | 18:51 |
sean-k-mooney | well i mention that because test_instance_list_deleted_service_with_no_uuid | 18:52
sean-k-mooney | is there to test that upgrade case where the service has not had a uuid populated | 18:52 |
sean-k-mooney | thats the old upgrade behavior you mentioned | 18:52
sean-k-mooney | i was just confirming that | 18:52 |
sean-k-mooney | the test is also testing what happens if you delete and recreate the service but thats not really the point, its doing it to test the online migration | 18:54
sean-k-mooney | dansmith: i think im fine with dropping it given that it only really works pre-placement | 18:55
dansmith | the point of the test (AFAICT) is to test the case where you restart a compute after the upgrade, when you deleted it before the upgrade | 18:56 |
dansmith | good point on pre-placement, didn't even think about that | 18:56 |
sean-k-mooney | right which should not work | 18:56 |
sean-k-mooney | ``` 4. start a new service with the old hostname (still host1); this will | 18:57 |
sean-k-mooney | also create a new compute_nodes table record for that host/node | 18:57 |
sean-k-mooney | ``` | 18:57 |
sean-k-mooney | so if the host had allocations then one of two things would happen: either we delete all allocations when we deleted the service | 18:58
sean-k-mooney | and when the new rp is created with the new cn uuid we need to rebuild them | 18:58 |
sean-k-mooney | or the service delete would fail if you were on an older version of openstack because of the allocations | 18:59
dansmith | well, it couldn't have had allocations, because it didn't have a uuid before | 18:59 |
sean-k-mooney | if the RP is not removed the compute agent will get a rp conflict due to duplicate RP name with different uuids | 18:59 |
sean-k-mooney | oh right | 19:00 |
dansmith | the scenario is for computes that were created (and deleted) before we had CN uuids | 19:00 |
sean-k-mooney | is this compute node uuids or service uuids | 19:00 |
sean-k-mooney | i thought this test was about a compute service with no compute service uuid | 19:01
sean-k-mooney | not compute node uuid | 19:01 |
dansmith | the test is more focused on service, but the implication is what happens to the CN for us | 19:03 |
dansmith | because you don't delete computes, you delete services, which is what the bug is about | 19:03 |
dansmith | bug/test | 19:03 |
sean-k-mooney | right | 19:03 |
sean-k-mooney | so the problem is really the creation of the new compute node record | 19:04
melwitt | I'm pretty sure it's compute node uuid that was added | 19:04 |
dansmith | well, they were both added at one point, | 19:04 |
dansmith | but yes compute node was most recent (although still a long time ago) | 19:04 |
melwitt | hm, yeah. this is confusing | 19:05 |
dansmith | the "problem" for me is that the test relies on us creating a new compute node for the resurrected service | 19:05 |
dansmith | which will get a new auto-generated node uuid | 19:05 |
sean-k-mooney | compute service uuid was pike https://docs.openstack.org/nova/latest/reference/api-microversion-history.html#maximum-in-pike | 19:05 |
dansmith | but after I fix that to not happen, it ... doesn't :) | 19:05 |
dansmith | and fails because the compute node can't be re-created with the same uuid | 19:05 |
dansmith | I can make it create-or-undelete (and have locally) | 19:06 |
sean-k-mooney | dansmith: right so creating a new compute node record is wrong | 19:06 |
sean-k-mooney | well | 19:06 |
melwitt | this is the commit that added the test https://github.com/openstack/nova/commit/81f05f53d357a546c7f9a53cae6ef45b92e28bc1 | 19:06 |
dansmith | but that's a bit more change, out of sequence with the rest of the series, etc | 19:06 |
sean-k-mooney | if we still have a compute node record we should not be creating a new one | 19:06
sean-k-mooney | if it has been deleted then creating one makes sense | 19:06
sean-k-mooney | as its the same as the first time it was created | 19:06 |
dansmith | well, that's kinda the thing | 19:07 |
dansmith | we create a new one, but shouldn't, and if we create with the same uuid, the unique constraint will fail with the deleted one | 19:07
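(Sketching the two paths being weighed here, as an illustration only: because the compute_nodes unique constraint does not include the deleted column, blindly re-inserting a row with the old uuid collides with the soft-deleted row, so the alternative is a create-or-undelete flow roughly like the one below. The helper names are hypothetical, not nova's actual DB API.)

```python
# hypothetical create-or-undelete flow; helper names are illustrative only
from oslo_db import exception as db_exc


def compute_node_create_or_undelete(context, values):
    try:
        return compute_node_create(context, values)  # hypothetical helper
    except db_exc.DBDuplicateEntry:
        # a soft-deleted row with the same uuid still exists because
        # 'deleted' is not part of the unique constraint; restore it
        # in place instead of inserting a second row
        cn = compute_node_get_by_uuid(  # hypothetical helper
            context, values['uuid'], read_deleted='yes')
        cn.update(dict(values, deleted=0, deleted_at=None))
        cn.save()
        return cn
```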
melwitt | sean-k-mooney: the test is deleting the compute node record (implicitly) so that's why it expects to create a new record right? | 19:09 |
melwitt | afterwards | 19:09 |
dansmith | it doesn't expect to create a new compute node, it just assumes/relies on it happening | 19:09 |
sean-k-mooney | melwitt: i think so which is why its checking the hypervisors api to ensure its gone | 19:09
sean-k-mooney | dansmith: its expecting that a side effect of the service delete is that the compute node is removed | 19:10
sean-k-mooney | and the side effect of the restart is that a new one is created | 19:10
dansmith | my first thought was to make it undelete the compute, then assert the uuid is the same, but then I realized that the test is checking for a thing that can't have happened since stein, so it seems like not worth a bunch of monkeywork to keep asserting this | 19:10 |
sean-k-mooney | at least that is how im interpreting https://github.com/openstack/nova/blob/master/nova/tests/functional/regressions/test_bug_1764556.py#L98-L107 | 19:10
melwitt | ok, so what is different with the new change ... if a service and thus compute node are deleted and then nova-compute is started again, will it un-delete the existing compute node record? | 19:11 |
dansmith | it doesn't currently | 19:11 |
dansmith | after my series is done then it will | 19:11 |
melwitt | but your change will make it do that I mean? | 19:11 |
melwitt | ok | 19:11 |
sean-k-mooney | do we include deleted in the unique constraint for the cn table | 19:12
dansmith | sean-k-mooney: no, which is why it conflicts | 19:12 |
dansmith | sean-k-mooney: we hit that with some of the previous rename customer scenarios too if you recall | 19:12 |
sean-k-mooney | ack ok so either we undelete or we add it to the unique constraint | 19:12
dansmith | yes, and undelete is the right thing IMHO, but that's *after* this point in the series | 19:13 |
dansmith | and since this test is asserting something that can't be the case anymore, I want to nuke it :) | 19:13 |
sean-k-mooney | ya so either slap an expect fail on this or nuke it | 19:13 |
sean-k-mooney | im fine with the latter | 19:13 |
dansmith | I don't want to xfail it because I don't want to fix it later because I think it's no longer useful | 19:14 |
dansmith | but if I'm wrong, you (all) need to say so | 19:14 |
sean-k-mooney | well going forward we dont want to recreate the CN with a different uuid | 19:15
melwitt | sorry, I'm going back and trying to understand how that test is relying on a new compute node record | 19:15 |
melwitt | I know that it is but I can't see why when I look at it | 19:15 |
dansmith | melwitt: relying on the new compute node or relying on it being recreated in some way? | 19:15 |
melwitt | dansmith: the recreation | 19:16 |
dansmith | it relies on there being some compute node because it does a migration, which won't work without it | 19:16 |
sean-k-mooney | it migrate back to the host that was deleted | 19:16 |
dansmith | it doesn't care (or know) whether or not it's recreated or undeleted | 19:16 |
sean-k-mooney | just that it exists | 19:16 |
melwitt | ok, I guess I don't get why that wouldn't work with your code change | 19:16 |
dansmith | not even that it exists, just that it can migrate | 19:16 |
melwitt | if you are going to undelete it | 19:17 |
dansmith | I'm going to undelete it eventually, but not at patch #3 | 19:17 |
dansmith | but patch #3 is where we start getting the same uuid for compute nodes, which means it fails to blindly re-create the compute node because of the UC | 19:17 |
melwitt | ah ok. so this would be a "temporary" failure if we keep the test | 19:17 |
melwitt | in that it would work again after the undelete patch happens | 19:18 |
dansmith | it doesn't really matter that it works or doesn't, because I can fix the test or the code.. my point is it's not a case that can exist in real life (since stein) so I think it'd be better not to do that work for no reason | 19:18 |
dansmith | yeah | 19:18 |
dansmith | I could xfail it, and then at the end, unxfail it | 19:18 |
dansmith | but the latter will be "re-enable this test for a thing that can't happen anymore" :) | 19:19 |
melwitt | yeah I guess I'm thinking does it matter if a customer has deleted service records that are super old that have no uuids? | 19:19 |
sean-k-mooney | you could but we dont intend to support a compute service without a uuid in the compute agent that is going to be executing this code | 19:19
sean-k-mooney | not anymore anyway | 19:19 |
sean-k-mooney | melwitt: it would have to be pre pike | 19:19 |
dansmith | they would have to have a service that was deleted before stein, which remains in the database, which they re-started in antelope | 19:19 |
dansmith | I'll just move the undelete code into this patch | 19:20 |
dansmith | I thought this would be an easy conversation | 19:21 |
dansmith | moving it is easier :) | 19:21 |
melwitt | ok, just trying to think if there's any way something could break or we lose coverage if we delete it. sorry | 19:21
sean-k-mooney | the coverage we would be losing is asserting that a compute-agent in A can work with a compute service record that does not have a uuid | 19:21
melwitt | like, do we have some other test that makes sure you can delete the service and then restart nova-compute and then migrate and then assert it worked | 19:21 |
dansmith | it's theoretically losing some coverage I guess, but the only reason the test will pass after the undelete is because the compute node actually does have a uuid that I can undelete from | 19:22 |
dansmith | it's like, not a thing that could happen in the real world, | 19:22 |
melwitt | if we do, then we don't need this one | 19:22 |
dansmith | because their records would actually not have uuids like these fake test ones do | 19:23 |
melwitt | yeah sorry, I mean without consideration of the uuid. just covering that deleting a service and starting nova-compute again migrate still works | 19:23 |
melwitt | I agree that the uuid part of it is so old that we need not test for it | 19:24 |
sean-k-mooney | well the overall functionality that they were trying to test was InstanceListWithDeletedServicesTestCase | 19:25
dansmith | okay I don't think the bug actually has much to do with migrate, | 19:25 |
melwitt | I wasn't clear on whether the concept in general of deleting a service and then starting it again stuff will still work, if that is covered somewhere | 19:25 |
sean-k-mooney | ya i dont think so either | 19:26 |
dansmith | it's instance list, the test just uses migrate to generate some traffic and records I think | 19:26 |
sean-k-mooney | right so you could just delete the service | 19:26
sean-k-mooney | and then do an instance list | 19:26
dansmith | melwitt: tbh I think that's probably a risky thing to do right now, not sure if we claim to support it.. it's like we have service delete, we don't have undelete, but if you restart a service with the right name after deleting it, it'll come back from the dead, | 19:27 |
dansmith | which is actually a problem because of how we recreate compute nodes and potentially can have conflicts with the provider name in placement | 19:27 |
sean-k-mooney | it will mostly come back from the dead | 19:27
dansmith | because the name will be the same, but the uuid will be different (currently) | 19:27 |
sean-k-mooney | but not fully | 19:27 |
sean-k-mooney | right the uuid will be different and anything like pci claims will not be recreated | 19:28
sean-k-mooney | so it will come back in a broken state | 19:28 |
sean-k-mooney | unfortunately if our customers have shown us anything its possible to run in that broken state for an extended period of time without noticing | 19:29
melwitt | yeah. I mean like regression coverage that deleting the service and restarting nova-compute with the new undelete will remain working | 19:29 |
dansmith | heh yeah | 19:29 |
sean-k-mooney | melwitt: well it will actually work better than it does today | 19:29
melwitt | like is this test the only place we test this or is it covered somewhere else already and this test isn't providing anything new other than uuid checking | 19:30 |
sean-k-mooney | but that does not mean we technically support it today or should support it going forward | 19:30
dansmith | it sounds like melwitt wants a more generic test to validate that the de-zombification works today, even though it shouldn't be expected to, and that this series will not make it worse | 19:30 |
dansmith | yeah, that's my only complaint about writing that test, but perhaps I should just do it | 19:30 |
melwitt | so you're saying we do *not* support deleting a service and restarting nova-compute and having stuff still work? | 19:31
sean-k-mooney | melwitt: thats what im saying as an operator you should not expect that to work | 19:31 |
dansmith | agree, not sure if we're explicit about it though | 19:31 |
sean-k-mooney | if you do not use any pci/numa stuff or have no vms on it at the time it will work | 19:32
dansmith | also not defending that as a good thing :) | 19:32 |
melwitt | sean-k-mooney: that seems so unexpected to me. sorry, I just had no idea. I thought they're supposed to be able to do that if the hostname stays the same | 19:32 |
sean-k-mooney | there is no expectation that the compute node uuid would remain the same | 19:32
dansmith | melwitt: the reality is different I think | 19:32 |
melwitt | so if someone messes up and deletes a service and then says oops that was a mistake, then all those instances are expected not to work? | 19:32 |
sean-k-mooney | its a uuid4 and not based on the hostname/hypervisor_hostname | 19:32
melwitt | dang | 19:32 |
dansmith | you can't delete a service with instances on it | 19:32 |
melwitt | ok, so that saves it I guess? ok | 19:33 |
dansmith | saves it from the single-click-mega-fail, but.. :) | 19:33 |
sean-k-mooney | dansmith: are you sure | 19:33 |
dansmith | pretty sure | 19:33 |
sean-k-mooney | ok cause i know we have code to loop over the allocations in placement and delete them before we delete the placement rp when the compute service is deleted | 19:33
melwitt | just seems so harsh lol (if it were possible to delete the service while instances are on it) | 19:34 |
dansmith | yup | 19:34 |
sean-k-mooney | i guess that is just to prevent leaked allocations blocking the placement cleanup | 19:34
sean-k-mooney | ah https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/services.py#L269-L282 | 19:35 |
sean-k-mooney | we special case the nova-compute | 19:35 |
sean-k-mooney | so ya you cant delete it if it has instances | 19:36
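(Roughly what the special-casing linked above does, paraphrased rather than quoted: deleting a nova-compute service is rejected with a 409 while instances are still hosted on it. The variable names below are approximations of the linked services.py code, not a verbatim copy.)

```python
# paraphrase of the guard in nova/api/openstack/compute/services.py; not verbatim
if service.binary == 'nova-compute':
    instances = self.host_api.instance_get_all_by_host(context, service.host)
    if instances:
        msg = _('Unable to delete compute service that is hosting '
                'instances. Migrate or delete the instances first.')
        raise webob.exc.HTTPConflict(explanation=msg)
```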
melwitt | ok, well, if that's the case then I understand why and agree the test can be removed entirely. just seems so harsh, if what I was thinking were possible (and it is not possible bc we don't let you delete a service with instances mapped to it) | 19:36 |
sean-k-mooney | in which case provided the placement cleanup happens properly it does not really matter if the uuid changes in that case | 19:36
sean-k-mooney | or if we undelete | 19:36 |
sean-k-mooney | https://github.com/openstack/nova/commit/42f62f1ed2ad76829eb9d40a8b9646a523f6381f | 19:37 |
sean-k-mooney | melwitt: it was only blocked in rocky it looks like | 19:37
sean-k-mooney | https://bugs.launchpad.net/nova/+bug/1763183 | 19:38 |
melwitt | I think we (maybe I) backported it downstream | 19:38 |
sean-k-mooney | well it was backported upstream to pike | 19:38 |
melwitt | I just was not thinking about it or remembering it | 19:38 |
melwitt | ah ok | 19:38 |
sean-k-mooney | i remember being able to delete a compute service with instances at one point but i feel like that is just because i messed up my local devstack not because i planned to do it | 19:39
dansmith | melwitt: here are the most service-delete-y tests we have in functional/ https://github.com/openstack/nova/blob/master/nova/tests/functional/wsgi/test_services.py#L119 | 19:40 |
melwitt | yeah you used to be able to | 19:40 |
dansmith | none of them ensure we can start an instance on the resurrected service, | 19:40 |
dansmith | although they do restart the compute to make sure it comes back up | 19:40 |
dansmith | which is the thing sean-k-mooney and I laugh at outside a fake environment :P | 19:40
melwitt | I see, ok. thanks | 19:41 |
dansmith | melwitt: so your demand is me adding a test that a resurrected compute can fake boot a fake instance in a fake environment, and then I can delete this regression test, right? | 19:41
dansmith | (snarky on purpose, but serious) | 19:41 |
melwitt | sorry for the longer convo. I was very confused by the test and then I was erroneously thinking of an accidental service delete scenario | 19:41 |
dansmith | don't apologize | 19:42 |
melwitt | yeah, I said earlier I understand now and agree the test can be removed without loss of anything | 19:42 |
dansmith | the stuff I'm having to do in this set to make such a simple thing work is ridiculously incestuous | 19:42 |
dansmith | melwitt: well, I think adding a "and can boot something" thing to those ^ would make that a defensible position for me :) | 19:42 |
melwitt | I bet :\ | 19:43 |
melwitt | thanks for that 😂 | 19:44 |
sean-k-mooney | dansmith: a lot of that likely comes from how the fixtures made restarting compute services work in the past | 19:49
dansmith | yes, I'm well aware | 19:50 |
melwitt | dansmith: I agree adding a "and can boot something" to those existing tests is a nice thing to cover. but I don't expect it to have to be part of your series, to be clear | 19:50 |
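(A rough sketch of the extra coverage being discussed: delete the compute service, restart a compute with the same hostname, and check a server can still be booted. The helpers follow nova's functional-test conventions, but every name here is an assumption, not the test that actually gets written.)

```python
# illustrative functional-test sketch; helper and API names are assumptions
def test_resurrected_compute_can_boot(self):
    compute = self.start_service('compute', host='host1')

    services = self.admin_api.get_services()
    svc = [s for s in services
           if s['binary'] == 'nova-compute' and s['host'] == 'host1'][0]
    self.admin_api.api_delete('/os-services/%s' % svc['id'])
    compute.stop()

    # restart with the same hostname; the service record comes back
    self.start_service('compute', host='host1')

    # host1 is the only compute, so an ACTIVE server proves the
    # resurrected service can still schedule and boot instances
    server = self._create_server()
    self.assertEqual('ACTIVE', server['status'])
```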
sean-k-mooney | with the stable uuid series i am assuming you will have a functional test that starts with an empty db and starts a compute service with the uuid specified in a file | 19:51
sean-k-mooney | you could have a separate test that deletes it from the db and starts it again if you wanted | 19:51
sean-k-mooney | but ya i think we agreed on nuking the thing and moving on with your series | 19:52
melwitt | yes | 19:53 |
dansmith | well, I figure I need to add the other when I drop the regression test | 19:55 |
dansmith | there's something weird though about not seeing the provider get recreated after restarting the old compute, | 19:55 |
dansmith | although I see it happen in the logs | 19:55 |
sean-k-mooney | that happens after the periodic task runs although it also happens i think in init host | 19:56
dansmith | I see it created before I look for it | 19:57 |
dansmith | https://pastebin.com/isqJXnfW | 19:57 |
dansmith | first line is it being created in our db, then placement, then the last one is looking for it, but it's missing | 19:57 |
dansmith | I kinda wonder if there's a bug causing us to find the old deleted compute node before the new one, and then return nothing because it's deleted | 19:59 |
dansmith | hah | 20:02 |
dansmith | 2023-01-05 12:01:58,067 INFO [nova.api.openstack.compute.hypervisors] Unable to find service for compute node host1. The service may be deleted and compute nodes need to be manually cleaned up. | 20:02 |
dansmith | that's what happens when I try to list hypervisors with the old name after re-starting the service | 20:02 |
dansmith | the service object should be undeleted, a new compute node was created, | 20:02 |
dansmith | yet listing doesn't include *either* because of that ^ | 20:03 |
dansmith | melwitt: see what we mean now? :) | 20:03 |
melwitt | 😵💫 | 20:03 |
sean-k-mooney | could this be related to the cell mappings | 20:04 |
sean-k-mooney | in the api db | 20:04 |
sean-k-mooney | as in does discover_hosts need to be run | 20:05
dansmith | god I hope not | 20:06 |
sean-k-mooney | dansmith: by the way i do know that if the resource tracker is broken the compute service can show up in the compute service list but the compute node will not show up in the hypervisor list | 20:06
sean-k-mooney | so if you run the test with OS_DEBUG maybe there is something breaking in the restart | 20:06
sean-k-mooney | i only see info logs in the output you pasted so if this is from a functional test then you might need OS_DEBUG=1 | 20:07
dansmith | sure enough: Host 'host1' is not mapped to any cell | 20:08 |
sean-k-mooney | although if it was broken that way i would expect to see some tracebacks or Error logs so debug should not be required | 20:08
dansmith | OS_DEBUG changed lately btw | 20:09 |
dansmith | I used to set OS_DEBUG=y but that doesn't work anymore | 20:10 |
dansmith | is =1 the new magic? | 20:10 |
sean-k-mooney | i have always used 1 but not sure if/when that changed | 20:11 |
sean-k-mooney | i dont think its ever really been documented properly | 20:11
dansmith | yeah | 20:14 |
dansmith | =y generates an exception now | 20:14 |
sean-k-mooney | i assume its anything loosely equivalent to true in a c-like language | 20:16
sean-k-mooney | for the cell mappings stuff i dont think we normally run discover_hosts explicitly anywhere in our func tests | 20:17
dansmith | I didn't either, which is why it seems weird to me that it fails like that | 20:18 |
dansmith | maybe we insert the mapping in start but not in restart? | 20:18 |
dansmith | anyway, | 20:18 |
dansmith | I'll leave that as s #FIXME for later | 20:18 |
sean-k-mooney | it might be buried in some of the compute create code but ill admit i have never looked | 20:18
sean-k-mooney | you could always cheat with the conductor periodic if you needed to in the short term | 20:18 |
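(The "cheat" mentioned above would amount to turning on the host-discovery periodic in the test config; a minimal illustration, assuming the usual self.flags pattern in nova's tests. The option name [scheduler]discover_hosts_in_cells_interval is real; the value chosen here is arbitrary.)

```python
# enable periodic host discovery so a (re)created compute gets a host mapping
# without an explicit "nova-manage cell_v2 discover_hosts" run
self.flags(discover_hosts_in_cells_interval=60, group='scheduler')
```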
sean-k-mooney | anyway im going to call it a day soon | 20:19
dansmith | if I don't verify the new rp I'll make it pass | 20:19
sean-k-mooney | i dont see how the cell mappings stuff could impact the placement part by the way. what was the exception you got? | 20:20
sean-k-mooney | the cell mapping should only affect calling the compute service via rpc | 20:20
sean-k-mooney | so the rp thing must be something else | 20:21
dansmith | it impacts the placement stuff only in the verification in the tests, because we use hypervisors to find the rp uuid and then check the allocations | 20:53 |
dansmith | if I just don't do that validation (like other parts of the test) then I'm good | 20:53
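(For reference, the validation being skipped looks roughly like this: map the host to its resource provider uuid via the hypervisors API, then read that provider's allocations from placement. The helper names approximate nova's integrated_helpers and are assumptions here.)

```python
# illustrative sketch of the provider/allocation check; helper names assumed
rp_uuid = self._get_provider_uuid_by_host('host1')
allocs = self._get_allocations_by_provider_uuid(rp_uuid)
self.assertIn(server['id'], allocs)
```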
*** dasm is now known as dasm|off | 22:33 |