openstackgerrit | fupingxie proposed openstack/nova master: Delete allocations when it is re-allocated https://review.openstack.org/582899 | 00:46 |
---|---|---|
*** edmondsw has joined #openstack-placement | 01:11 | |
*** edmondsw has quit IRC | 01:15 | |
openstackgerrit | Zhenyu Zheng proposed openstack/nova master: Func test for improper cn local DISK_GB reporting https://review.openstack.org/583646 | 01:34 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in libvirt/test_driver.py (5) https://review.openstack.org/570842 | 01:43 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in libvirt/test_driver.py (6) https://review.openstack.org/571330 | 01:43 |
*** lei-zh has joined #openstack-placement | 01:58 | |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in libvirt/test_driver.py (7) https://review.openstack.org/571992 | 02:07 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in libvirt/test_driver.py (8) https://review.openstack.org/571993 | 02:09 |
openstackgerrit | yanpuqing proposed openstack/nova master: Rename auth_uri to www_authenticate_uri https://review.openstack.org/576820 | 02:19 |
*** tetsuro has joined #openstack-placement | 02:25 | |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (3) https://review.openstack.org/574104 | 02:36 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (4) https://review.openstack.org/574106 | 02:36 |
openstackgerrit | Merged openstack/nova master: Add regression test for bug 1781710 https://review.openstack.org/583339 | 02:38 |
openstack | bug 1781710 in OpenStack Compute (nova) "ServersOnMultiNodesTest.test_create_server_with_scheduler_hint_group_anti_affinity failing with "Servers are on the same host"" [High,In progress] https://launchpad.net/bugs/1781710 - Assigned to Matt Riedemann (mriedem) | 02:38 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (5) https://review.openstack.org/574110 | 02:53 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (6) https://review.openstack.org/574113 | 02:53 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (7) https://review.openstack.org/574974 | 02:54 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (8) https://review.openstack.org/575311 | 02:54 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (9) https://review.openstack.org/575581 | 02:54 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (10) https://review.openstack.org/576017 | 02:54 |
*** edmondsw has joined #openstack-placement | 02:59 | |
*** edmondsw has quit IRC | 03:04 | |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (11) https://review.openstack.org/576018 | 03:09 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (12) https://review.openstack.org/576019 | 03:09 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (13) https://review.openstack.org/576020 | 03:09 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (14) https://review.openstack.org/576027 | 03:10 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (15) https://review.openstack.org/576031 | 03:10 |
openstackgerrit | Zhenyu Zheng proposed openstack/nova master: Report 0 root_gb in resource tracker if instance is bfv. https://review.openstack.org/584204 | 03:25 |
*** lei-zh has quit IRC | 04:06 | |
*** edmondsw has joined #openstack-placement | 04:47 | |
*** edmondsw has quit IRC | 04:52 | |
openstackgerrit | Merged openstack/nova master: Merge server create for multiple-create extension https://review.openstack.org/580017 | 04:54 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (16) https://review.openstack.org/576299 | 04:55 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (17) https://review.openstack.org/576344 | 04:55 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (18) https://review.openstack.org/576673 | 04:55 |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: Merge extended availability zone response into server controller https://review.openstack.org/502859 | 05:12 |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: Merge config drive extension response into server controller https://review.openstack.org/584223 | 05:14 |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: Merge config drive extension response into server controller https://review.openstack.org/584223 | 05:22 |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: Merge extended server attributes extension response https://review.openstack.org/584590 | 05:23 |
*** lei-zh has joined #openstack-placement | 05:29 | |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: Merge extended server attributes extension response https://review.openstack.org/584590 | 05:42 |
openstackgerrit | jichenjc proposed openstack/nova master: add zvm into support matrix https://review.openstack.org/532720 | 06:14 |
openstackgerrit | jichenjc proposed openstack/nova master: Add zvm admin intro and hypervisor information https://review.openstack.org/533125 | 06:14 |
openstackgerrit | jichenjc proposed openstack/nova master: Add zvm CI information https://review.openstack.org/533512 | 06:14 |
openstackgerrit | Lei Zhang proposed openstack/nova master: Add method to get cpu traits https://review.openstack.org/560317 | 06:24 |
openstackgerrit | Zhenyu Zheng proposed openstack/nova master: Report 0 root_gb in resource tracker if instance is bfv. https://review.openstack.org/584204 | 06:25 |
openstackgerrit | Lei Zhang proposed openstack/nova master: Add method to get cpu traits https://review.openstack.org/560317 | 06:29 |
*** edmondsw has joined #openstack-placement | 06:35 | |
*** edmondsw has quit IRC | 06:39 | |
openstackgerrit | jichenjc proposed openstack/nova master: add zvm into support matrix https://review.openstack.org/532720 | 06:51 |
openstackgerrit | jichenjc proposed openstack/nova master: Add zvm admin intro and hypervisor information https://review.openstack.org/533125 | 06:51 |
openstackgerrit | jichenjc proposed openstack/nova master: Add zvm CI information https://review.openstack.org/533512 | 06:51 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (19) https://review.openstack.org/576676 | 07:00 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (20) https://review.openstack.org/576689 | 07:00 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (21) https://review.openstack.org/576709 | 07:00 |
*** tssurya has joined #openstack-placement | 07:14 | |
openstackgerrit | jichenjc proposed openstack/nova master: Add zvm admin intro and hypervisor information https://review.openstack.org/533125 | 07:24 |
openstackgerrit | jichenjc proposed openstack/nova master: Add zvm CI information https://review.openstack.org/533512 | 07:24 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (22) https://review.openstack.org/576712 | 07:29 |
openstackgerrit | huanhongda proposed openstack/nova master: WIP: hypervisor-stats shows wrong disk usages with shared storage https://review.openstack.org/149878 | 07:32 |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: Merge keypair extension response into server view builder https://review.openstack.org/584748 | 07:40 |
*** avolkov has joined #openstack-placement | 07:57 | |
*** avolkov has quit IRC | 07:59 | |
*** takashin has left #openstack-placement | 08:03 | |
*** edmondsw has joined #openstack-placement | 08:23 | |
*** edmondsw has quit IRC | 08:28 | |
openstackgerrit | Boxiang Zhu proposed openstack/nova stable/pike: Fix "instance snap min disk size err after resize instance" https://review.openstack.org/584770 | 08:40 |
*** e0ne has joined #openstack-placement | 08:43 | |
*** cdent has joined #openstack-placement | 09:39 | |
*** lei-zh has quit IRC | 09:50 | |
*** finucannot has quit IRC | 09:53 | |
*** stephenfin has joined #openstack-placement | 09:55 | |
*** edmondsw has joined #openstack-placement | 10:11 | |
*** edmondsw has quit IRC | 10:16 | |
*** bauzas has joined #openstack-placement | 10:20 | |
openstackgerrit | Chris Dent proposed openstack/nova master: [placement] Add /reshaper handler for POST https://review.openstack.org/576927 | 10:20 |
*** avolkov has joined #openstack-placement | 11:22 | |
openstackgerrit | Andrey Volkov proposed openstack/nova master: Docs: Add Placement to Nova system architecture https://review.openstack.org/584338 | 11:22 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: libvirt: Remove reference to transient domain when detaching devices https://review.openstack.org/584433 | 11:35 |
openstackgerrit | Andrey Volkov proposed openstack/nova master: Docs: Add Placement to Nova system architecture https://review.openstack.org/584338 | 11:54 |
openstackgerrit | Surya Seetharaman proposed openstack/nova master: Add queued_for_delete field to InstanceMapping object https://review.openstack.org/566795 | 11:54 |
openstackgerrit | Surya Seetharaman proposed openstack/nova master: Online migration tool for populating queued-for-delete https://review.openstack.org/582536 | 11:54 |
openstackgerrit | Surya Seetharaman proposed openstack/nova master: Update queued-for-delete from the ComputeAPI during deletion/restoration https://review.openstack.org/566813 | 11:54 |
openstackgerrit | Surya Seetharaman proposed openstack/nova master: Return a minimal construct for nova service-list when a cell is down https://review.openstack.org/584829 | 11:54 |
*** tetsuro has quit IRC | 12:04 | |
*** tetsuro has joined #openstack-placement | 12:09 | |
*** tetsuro has quit IRC | 12:10 | |
*** edmondsw has joined #openstack-placement | 12:13 | |
openstackgerrit | Lee Yarwood proposed openstack/nova master: libvirt: Wire up a force disconnect_volume flag https://review.openstack.org/584849 | 12:20 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: WIP libvirt: Forcibly disconnect volumes during post_live_migration https://review.openstack.org/584850 | 12:20 |
*** leakypipes is now known as jaypipes | 12:29 | |
openstackgerrit | huanhongda proposed openstack/nova master: hypervisor-stats shows wrong disk usages with shared storage https://review.openstack.org/149878 | 12:32 |
*** mriedem has joined #openstack-placement | 12:53 | |
*** lei-zh has joined #openstack-placement | 13:03 | |
cdent | gibi, efried, jaypipes : found a buglet in the database side of reshaper | 13:12 |
cdent | it currently assumes you'll never be clearing inventory fully | 13:13 |
cdent | brb | 13:13 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Transform missing delete notifications https://review.openstack.org/410297 | 13:13 |
efried | cdent: Clearing all inventories for a particular provider? | 13:16 |
*** lei-zh has quit IRC | 13:16 | |
*** lei-zh has joined #openstack-placement | 13:17 | |
cdent | efried: yeah | 13:18 |
cdent | end up with an IndexError | 13:18 |
efried | ouch. | 13:19 |
cdent | line 4087 of objects/resource_provider.py | 13:19 |
cdent | because it is assuming there will be a new_inv_list for the rp | 13:19 |
efried | Cool. | 13:19 |
efried | By which I mean, good find. | 13:19 |
cdent | this was revealved because gibi asked me to right another test about new consumers, and I decided oh, hey, let's move all the inventory back to the originals | 13:20 |
cdent | and boom | 13:20 |
gibi | cdent: nice catch | 13:20 |
cdent | is the transformer code already in the gate or can/should we stop it | 13:21 |
* cdent makes a bug, for history | 13:22 | |
gibi | cdent: it is on the gate | 13:22 |
gibi | cdent: but we can pull it | 13:22 |
cdent | doesn't really matter | 13:22 |
cdent | we can just fix it in a parent of my thing | 13:23 |
cdent | since there are no caller until my stuff merges | 13:23 |
gibi | cdent: OK, let's fix it in a followup | 13:23 |
*** takashin has joined #openstack-placement | 13:27 | |
cdent | gibi, efried, jaypipes: https://bugs.launchpad.net/nova/+bug/1783130 | 13:30 |
openstack | Launchpad bug 1783130 in OpenStack Compute (nova) "placement reshaper doesn't clearing all inventories for a resource provider" [Medium,Confirmed] | 13:30 |
cdent | efried: I think you had some notions about this when writing that version of that code. What do you think of my last paragraph in the bug summary? | 13:31 |
efried | cdent: Yeah, I remember noticing this and deliberately not using rp_uuid to retrieve the rp obj because we already had it in the inventory obj. | 13:35 |
efried | except we don't in this case. | 13:35 |
efried | It wouldn't hurt to build & maintain a cache of rp_uuid: rp obj from the first loop, including `except IndexError` => do a lookup. | 13:36 |
*** belmoreira has joined #openstack-placement | 13:38 | |
efried | cdent: Are you already working a fix? | 13:38 |
cdent | efried: no sir | 13:40 |
cdent | I went to check on sarah and her foot | 13:40 |
*** tetsuro has joined #openstack-placement | 13:42 | |
efried | I'll take it | 13:42 |
cdent | thanks | 13:42 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Send soft_delete from context manager https://review.openstack.org/476459 | 13:43 |
jaypipes | efried: if you could address it, that would be appreciated. I have to go to the doctor to address a lacerated cornea. :( | 13:45 |
efried | jaypipes: Golf swing go awry? Or a pug-related incident? | 13:45 |
jaypipes | efried: I can barely look at the computer screen for more than a few minutes at a time. | 13:45 |
cdent | yow, that sounds horrible jaypipes. I hope they can fix it quick. | 13:46 |
jaypipes | efried: neither. I've had a sinus infection for 2 weeks now. no idea how this tear on my cornea occurred sometime in the last day. | 13:46 |
jaypipes | cdent: me too, considering I need to leave for Ohio tomorrow. | 13:46 |
efried | Scheduler meeting in ten minutes in #openstack-meeting-alt | 13:50 |
openstackgerrit | Dan Smith proposed openstack/nova master: Online data migration for queued_for_delete flag https://review.openstack.org/584504 | 13:52 |
openstackgerrit | Boxiang Zhu proposed openstack/nova stable/pike: Fix "instance snap min disk size err after resize instance" https://review.openstack.org/584770 | 14:06 |
openstackgerrit | Chris Dent proposed openstack/nova master: [placement] Add /reshaper handler for POST https://review.openstack.org/576927 | 14:24 |
*** purplerbot has quit IRC | 14:41 | |
*** purplerbot has joined #openstack-placement | 14:41 | |
*** lei-zh has quit IRC | 14:42 | |
*** tetsuro has quit IRC | 14:48 | |
openstackgerrit | Kashyap Chamarthy proposed openstack/nova master: libvirt: Remove usage of migrateToURI{2} APIs https://review.openstack.org/567258 | 14:56 |
gibi | efried, cdent, jaypipes: the PUT /allocations/{consumer_uuid} API doc says that it can return 409 for concurrent inventory update: https://developer.openstack.org/api-ref/placement/#update-allocations | 15:04 |
*** takashin has left #openstack-placement | 15:04 | |
gibi | efried, cdent, jaypipes: but we are not providing any provider generation information in the request body (just consumer generation) | 15:05 |
gibi | efried, cdent, jaypipes: so how can it be an inventory conflit? | 15:05 |
* jaypipes heading to eye doctor now... back later | 15:06 | |
gibi | jaypipes: take care | 15:06 |
efried | gibi: We come into the handler, grab the rp objects to check usage, then another thread comes in and updates the inventory, then we proceed and do our allocation, then when we're done with our allocation, we go to increment the rp generation and bounce there. | 15:07 |
gibi | efried: but PUT /allocations/{consumer_uuid} bounces RP generation not just consumer generation? | 15:08 |
efried | right | 15:08 |
efried | it's kind of a statement of, "Capacity was fine when we started working on this, but then something happened asynchronously that might make that not be true anymore, so we need to fail." | 15:09 |
gibi | but this should not be different from client perspectice from a simple not enough inventory available error case | 15:11 |
gibi | I mean the client does not need to differentiate between no inventory before the call or no inventory at the end of the call, both is just no inventory | 15:12 |
*** belmoreira has quit IRC | 15:12 | |
gibi | I don't feel I can explain it clearly | 15:13 |
* cdent catches up | 15:13 | |
gibi | getting a 409 due to consumer_generation mismatch is a thing that the client of the placement should understand. Also the client should understand if there is no inventory available for the allocation request. But the client should not see a resource proider generation conflict during PUT /allocations/{consumer_uuid} | 15:14 |
gibi | as the client doesn't provided a resource provider generation in the request | 15:15 |
gibi | s/provided/provide/ | 15:15 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Report 0 root_gb in resource tracker if instance is bfv. https://review.openstack.org/584204 | 15:15 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Heal RequestSpec.is_bfv for legacy instances during moves https://review.openstack.org/583715 | 15:15 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Fix wonky reqspec handling in conductor.unshelve_instance https://review.openstack.org/583739 | 15:15 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add shelve/unshelve wrinkle to volume-backed disk func test https://review.openstack.org/584931 | 15:15 |
cdent | gibi: that's one of the reasons we have this: | 15:15 |
cdent | https://bugs.launchpad.net/nova/+bug/1719933 | 15:15 |
openstack | Launchpad bug 1719933 in OpenStack Compute (nova) "placement server needs to retry allocations, server-side" [Medium,Triaged] | 15:15 |
cdent | the rp generation conflicts that are happen are not relevant to the client-side as they are only used to represent the database encountering a situation where it cannot ensure integrity of the compare and swap operation | 15:16 |
gibi | cdent: thanks that TODO makes sens | 15:17 |
cdent | under high allocation load (against the same resource provider) that can happen and we really ought to retry more often server side before we let the client know | 15:17 |
gibi | cdent: today we have a client side retry for that | 15:17 |
gibi | cdent: I agree that is should be server side retry | 15:17 |
gibi | cdent: and now we will have to types of retry in client side | 15:18 |
gibi | cdent: 1) blind retry in case of RP conflict | 15:18 |
cdent | but it is still the case that even if we retried ten times server side, we could still never succeed, so for the docs to be complete they need to report that the rp generation can be the source of conflict | 15:18 |
efried | gibi: We're not rechecking capacity when that 409 happens. We don't *know* whether we're out of capacity; we just know something changed. | 15:18 |
* cdent nods | 15:18 | |
gibi | cdent: 2) potentially smarter retry for consumer_generation conflict | 15:18 |
efried | cdent: sanity check, placement db can now be configured via [api_database] or [placement_database], right? | 15:19 |
cdent | efried: correct. if the latter is not set the former will be used | 15:19 |
efried | nod | 15:19 |
efried | thx | 15:19 |
*** yikun has quit IRC | 15:19 | |
gibi | efried, cdent: thank I think I got it. I will try to differentiate between the two conflict on client side somehow | 15:19 |
efried | gibi: If there's no error code, there oughtta be. | 15:20 |
efried | gibi: Pretty sure the ConcurrentUpdate will have an error code associated with it. Unless we didn't merge that patch yet - cdent, you got that at your fingertips? | 15:20 |
efried | gibi: In any case, differentiating should be possible with error codes, which is their whole purpose. | 15:20 |
cdent | efried: you mean this one: https://review.openstack.org/#/c/581771/ | 15:20 |
cdent | (that was supposed to be a question) | 15:21 |
efried | cdent: Yup, its predecessor probably hit the path gibi is talking about, but that one is good too. gibi, feel like pushing ^ that one? | 15:21 |
gibi | I will review that patch right away | 15:21 |
gibi | as I don't want to build more on top of this code https://github.com/openstack/nova/blob/f1ec4ebe64179a0d2f8efbc236be58f8e0ff577a/nova/scheduler/client/report.py#L1665 | 15:22 |
efried | gibi: Oh, if you want to resolve that TODO, that would be awesome. | 15:23 |
*** yikun has joined #openstack-placement | 15:23 | |
efried | gibi: we implemented error codes specifically to be able to de-uglify code like that. | 15:23 |
gibi | cdent: do I see correctly that we only have a single error code CONCURRENT_UPDATE for every type of resources. | 15:25 |
gibi | ? | 15:25 |
* cdent looks at efried | 15:25 | |
cdent | for the time being, yes, because jay didn't think we needed different kinds | 15:25 |
efried | yes | 15:25 |
cdent | but during that discussion there were ways to change it later if needed | 15:25 |
gibi | then I cannot differentiate between consumer_generation conflict and resource provider conflict | 15:25 |
cdent | yeah, if you want to do that, we need to add the aforementioned new codes | 15:26 |
cdent | efried: do you remember which review that was? | 15:26 |
efried | cdent: I don't. Didn't we decide we would need a new microversion, or something, if we wanted to do that? | 15:26 |
cdent | the idea was that if we change the code on a well known (but "error") response, then yes, a new microversion would be ideal | 15:27 |
cdent | that is: if the code is used in flow control | 15:27 |
gibi | cdent, efried: before jumping into adding the new error codes, could you check https://review.openstack.org/#/c/583667/2/nova/scheduler/client/report.py@1614 to see if I really need to differentiate between rp conflict (do blind retry) and consumer conflict (raise from reportclient and let the caller handle)? | 15:28 |
* cdent looks | 15:28 | |
gibi | L1631 now implements the blind retry for RP conflict only | 15:29 |
efried | It looks like blind retry ought to work for consumer gen conflict too, because you're checking for existing allocs and loading them up. That payload will include the latest consumer gen. | 15:30 |
cdent | efried has a bit more state on how we want to deal with conflicts in the report client. But I can say that the blind retry for rp conflict is okay and desirable. | 15:31 |
cdent | yeah since you're adding the GET it's probably okay | 15:31 |
gibi | cdent: I agree that rp conflict can be handled with blind retry | 15:31 |
cdent | although it seems weird to me that all we do in response to a consumer gen is try again | 15:31 |
cdent | shouldn't we check more of the world? | 15:31 |
efried | right, we need to consider whether we would still want to replace the allocations wholesale. | 15:32 |
gibi | efried: blind retry for consumer conflict would work but would defeate the idea of consumer generation | 15:32 |
cdent | yeah what gibi just said | 15:32 |
efried | Like, do we need to revalidate whatever assumptions led us to construct the alloc_request param? | 15:32 |
efried | so we're all saying the same thing. | 15:32 |
cdent | brb | 15:32 |
efried | gibi: I think we need to look further up the call stack to see what's going into the building of the alloc_request param. | 15:33 |
gibi | efried: most of the callers of claim_resources expect a totally new consumer to be created, like in case of boot or migrate | 15:33 |
efried | gibi: So under what circumstances could we possibly enter the condition on L1608? | 15:33 |
gibi | efried: only in case of force move operations | 15:34 |
efried | I think the "doubled up" algorithm is obsolete since we implemented migration UUIDs. | 15:34 |
efried | isn't it? | 15:34 |
efried | i.e. it should be "impossible" to have existing allocs when we get here? | 15:34 |
efried | unless e.g. two migrations are happening at the same time? But isn't that synchronized at the conductor level or something? | 15:35 |
gibi | efried: this is still a thing nova.scheduler.utils.claim_resources_on_destination | 15:35 |
gibi | efried: which is called from nova.conductor.manager.ComputeTaskManager#_allocate_for_evacuate_dest_host | 15:36 |
gibi | efried: and the comment in https://github.com/openstack/nova/blob/bcbc1f9aeddb060513768489450c429bf53e1e46/nova/conductor/manager.py#L838 points to forced evacuation | 15:37 |
gibi | efried: I might be mistaken | 15:42 |
gibi | efried: let me dig a bit more | 15:43 |
gibi | efried: the https://github.com/openstack/nova/blob/0807b9408045db8cd674c84def8b7e1b27171377/nova/tests/functional/test_servers.py#L2312 test case hits that edge case in claim_resources | 15:49 |
efried | gibi: not having read the test yet - is it a real thing or contrived via the test code? | 15:50 |
*** lei-zh has joined #openstack-placement | 15:50 | |
gibi | efried: I think it is a real thing, nothing relevant is mocked in that functional test and the call stack coming from the compute/manager rebuild_instance I would expect | 15:51 |
gibi | s/coming from/starting at. | 15:52 |
gibi | efried: couple of functional test is affected by that code path and both is related to evacuation | 15:53 |
gibi | s/both/all/ | 15:54 |
efried | mriedem: Reviewed https://review.openstack.org/#/c/575912/ | 15:55 |
efried | It really ought to be trivial to swap out aggregate_add_host. | 15:55 |
efried | Is it that you're busy with other stuff? | 15:55 |
gibi | efried: it seems that with evacuation both the source and the dest host allocation is held by the instance | 16:00 |
efried | gibi: Okay, we don't use the "migration UUID" for that case? | 16:02 |
gibi | it doesn't seems so | 16:02 |
gibi | we do create the Migration object in the api | 16:02 |
gibi | but | 16:02 |
efried | isn't that problematic for quotas, or something? | 16:02 |
gibi | https://github.com/openstack/nova/blob/f1ec4ebe64179a0d2f8efbc236be58f8e0ff577a/nova/compute/api.py#L4502-L4504 | 16:03 |
mriedem | efried: i'm busy and we just can't really afford to push things out too far this week because of the gate | 16:04 |
efried | ight | 16:04 |
mriedem | so things that can be done in a follow up should be done in a follow up | 16:05 |
gibi | mriedem: do you know why evacuate does not use the migration uuid as the consumer of the source host allocation? | 16:05 |
mriedem | gibi: because during evac the source host is down and when the source compute comes back up (if it comes back up) we'll delete it's allocations | 16:05 |
mriedem | so leaving the allocations for the instance against the dead/down source compute shouldn't be a problem | 16:05 |
mriedem | dansmith: ^ keep me honest | 16:06 |
mriedem | *we'll delete its allocations for evacuated instances | 16:06 |
mriedem | which reminds me, lyarwood has a related bug fix here https://review.openstack.org/#/c/562284/ | 16:06 |
gibi | mriedem: it is not a resource allocation problem, but more like the only case left where we double-up allocations for the same consumer during claim_resources() | 16:06 |
mriedem | oh | 16:07 |
dansmith | yup, the migration uuid holding allocations is for the source anyway, and since we're effectively never going back to the source we don't need to keep track of them back there | 16:07 |
dansmith | (or shouldn't) | 16:07 |
dansmith | but yeah we shouldn't be doing the doubling if that's true | 16:07 |
mriedem | gibi: it just wasn't done for evac b/c we didn't need to and it'd be a non-trivial amount of work to handle it that way probably | 16:07 |
*** e0ne has quit IRC | 16:08 | |
gibi | mriedem, dansmith: thanks now I have to figure out which is a bigger work, keep the code for the double-up in claim_resources() or do the migartion_uuid thing for evac | 16:09 |
dansmith | gibi: I don't think the latter is a thing | 16:09 |
gibi | where keeping the double-up code really means making it work with consumer_generations | 16:09 |
dansmith | and I don't think the former was intentional right? | 16:09 |
dansmith | I think just replace the allocations with the new ones and move on | 16:09 |
mriedem | i have to imagine it's not that simple, | 16:10 |
mriedem | what happens if the source compute comes back up during the evac? | 16:10 |
mriedem | and/or if the evac on the dest host fails | 16:10 |
dansmith | it's going to delete the instance anyway | 16:10 |
mriedem | not if the evac failed | 16:10 |
dansmith | I thought I put the nail in that already? | 16:11 |
mriedem | b/c of stuff like this https://review.openstack.org/#/c/562284/ | 16:11 |
gibi | yeah, I guess evac failed then source compute recovered is a valid case | 16:11 |
dansmith | as I've said before, I don't think we should ever let the instance just restart on the evac source | 16:11 |
dansmith | because of shit like this | 16:11 |
dansmith | I think once you've evac'd we should continue with doing it, even if the first one fails | 16:11 |
mriedem | you'd have to rebuild the instance on the source host to recover it right? | 16:11 |
mriedem | so maybe this is a ptg topic too because this isn't something we should monkey with in rocky imo | 16:12 |
dansmith | in which case? | 16:12 |
dansmith | sounds good to me | 16:12 |
efried | mriedem: Mocking bugs on https://review.openstack.org/#/c/583715/ - I'll fast-approve if you want to fix 'em real quick. | 16:13 |
dansmith | having dealt with our instance ha people trying to automate this process, I think we need to stick to a very obvious workflow | 16:13 |
gibi | mriedem, dansmith : OK, I'm adding it to the ptg pad... | 16:13 |
mriedem | i'm saying, what happens if we remove the allocations for the source compute during evac scheduling but the evac fails - then the source compute comes back up, what do we do? | 16:13 |
mriedem | we have to think through that before we just make changes here | 16:13 |
dansmith | I think we should make the source never start the instance again, but yeah, let's talk about it at ptg | 16:13 |
mriedem | efried: i'll fix them quick | 16:13 |
* dansmith has to jump on a call anyway | 16:14 | |
mriedem | adding a cache to the end of that series atm | 16:14 |
mriedem | efried: ^ for the is_bfv flag in the RT | 16:14 |
gibi | mriedem, dansmith: thanks | 16:14 |
*** lei-zh has quit IRC | 16:15 | |
gibi | cdent, efried: as adding separated error codes would need a new microversion and we are in days before FF I either solve the retry issue without error codes or we punt the 1.28 bump to Stein | 16:23 |
efried | "I told you so" | 16:23 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Heal RequestSpec.is_bfv for legacy instances during moves https://review.openstack.org/583715 | 16:24 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Fix wonky reqspec handling in conductor.unshelve_instance https://review.openstack.org/583739 | 16:24 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add shelve/unshelve wrinkle to volume-backed disk func test https://review.openstack.org/584931 | 16:24 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Cache is_bfv check in ResourceTracker https://review.openstack.org/584962 | 16:24 |
jaypipes | cdent, efried: well, the good news is that it's not a lacerated cornea. bad news is that it's a bacterial corneal ulcer. | 16:25 |
efried | jaypipes: eyedrops and augmentin, you'll be fine in 36 hours. | 16:25 |
jaypipes | efried: waiting on the prescription to get filled now... | 16:26 |
* gibi leaves for today | 16:31 | |
efried | cdent: Sanity check me here... | 16:46 |
efried | https://review.openstack.org/#/c/582383/5/nova/api/openstack/placement/objects/resource_provider.py@4118 | 16:47 |
efried | I didn't have to "keep track" of anything. I can identify "providers for which all inventories are being deleted" right here by asking if new_inv_list is empty. | 16:48 |
efried | that ain't right. | 16:48 |
efried | the comment should say resource *classes*, not resource *providers*. | 16:49 |
efried | actually, the comment is totally bogus. | 16:50 |
efried | it should say "keep track of resource providers whose inventories are actually changing". Which... ought to be all of them, or you wouldn't have passed it in. | 16:50 |
efried | I'm going to delete the comment. | 16:50 |
jaypipes | efried: hold up. | 16:54 |
* efried holds up | 16:55 | |
*** tssurya has quit IRC | 16:55 | |
openstackgerrit | Merged openstack/nova master: perform reshaper operations in single transaction https://review.openstack.org/582383 | 16:57 |
openstackgerrit | Merged openstack/nova master: Refactor _heal_instances_in_cell https://review.openstack.org/577896 | 16:57 |
cdent | efried: I'm back, but pausing for jaypipes | 16:58 |
jaypipes | efried: so, looking closer at this, what about putting just this right before line 4098: | 17:01 |
cdent | efried: you do want to keep track here | 17:01 |
jaypipes | if new_inv_list: | 17:01 |
jaypipes | cdent, efried: that way you never set_inventory() on line 4098 for the providers you are going to remove all inventory for? | 17:02 |
cdent | you need to blow out sooner, around 4086, no? | 17:02 |
cdent | the problem isn't really with the logic in the code | 17:03 |
cdent | it's simply that you can't index None | 17:03 |
cdent | so when new_inv_list is none in the first block, skip it | 17:03 |
cdent | and when new_inv_list is None in the last block, use a differnt resource provider | 17:04 |
efried | I've solved the IndexError. | 17:04 |
efried | I was just in the neighborhood and trying to figure out that other thing. | 17:04 |
efried | but you're right; if the new_inv_list is empty, I can `continue` in the top loop. | 17:04 |
cdent | right, and your comment is wrong, so the neighbordhood looks different from hwat you were thinking, right? | 17:04 |
openstackgerrit | Elod Illes proposed openstack/nova stable/queens: Call generate_image_url only for legacy notification https://review.openstack.org/584969 | 17:05 |
cdent | we can't "skip the rest" as the comment says | 17:05 |
efried | right | 17:06 |
efried | The only optimization in the bottom would be if they passed in an inventory unchanged from its existing state, which would be silly of them, so no reason to optimize for it. | 17:07 |
cdent | ✔ | 17:08 |
jaypipes | ack | 17:12 |
jaypipes | that works for me. | 17:12 |
jaypipes | efried: lol, patch just merged. | 17:13 |
jaypipes | efried: only after what, like 7 days in the gate? | 17:13 |
efried | jaypipes: which? | 17:13 |
efried | The reshaper db? gibi just approved it this morning. | 17:14 |
jaypipes | efried: oh? I thought it had been approved for a lot longer than that. | 17:14 |
jaypipes | was there a new revision that crept in or something? | 17:14 |
efried | No. I +2ed it late last week, but gibi had only +1ed until he saw the consensus that we should merge it rather than waiting. | 17:15 |
jaypipes | ah, ack | 17:15 |
efried | cdent: After some creative rebasing, I'm failing only that bottom test, for a new reason: Inventory for 'VCPU' on resource provider 'fb48e253-50cb-4f98-b2fc-2b770a844d0b' in use. | 17:19 |
efried | Probably just forgot to remove that allocation in the test case, haven't looked yet. | 17:19 |
efried | yeah, ALT_RP_UUID still has allocations from a couple consumers. | 17:21 |
*** e0ne has joined #openstack-placement | 17:22 | |
cdent | efried: yeah, I didn't have a good way to actually test the test, so was winging it | 17:22 |
*** e0ne has quit IRC | 17:22 | |
cdent | efried: If you want to leave it, I can clean things up later (or tomorrow morning). Or feel free if you're feeling energetic. | 17:23 |
efried | cdent: Almost there. | 17:24 |
efried | I can't push this without yanking just-fix-it out of the gate. I'll do that and reapprove it. Only lose a couple hours of gate time. | 17:27 |
* cdent shrugs | 17:27 | |
efried | oh, never mind | 17:27 |
efried | it's almost merged. | 17:27 |
cdent | I'll check back in a couple hours | 17:27 |
efried | I'll just wait. | 17:27 |
efried | And work on the top of the series in the meantime. | 17:27 |
cdent | gotta do dinner etc | 17:28 |
efried | enjoy. | 17:28 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Use consumer generation in _heal_allocations_for_instance https://review.openstack.org/577905 | 18:10 |
mriedem | efried: ^ was a clean local rebase but gerrit apparently doesn't think it was a rebase | 18:10 |
mriedem | not sure why | 18:11 |
efried | ... | 18:11 |
mriedem | i'm just looking for re-approval | 18:11 |
efried | Bet it was this: https://review.openstack.org/#/c/577905/11/nova/scheduler/client/report.py@56 | 18:12 |
efried | mriedem: done. | 18:13 |
mriedem | thanks | 18:14 |
efried | 13 placement microversions from pike to queens. 13 from queens to rocky, assuming we don't do more, which seems reasonably likely. | 18:25 |
*** e0ne has joined #openstack-placement | 18:26 | |
openstackgerrit | Merged openstack/nova master: Rename auth_uri to www_authenticate_uri https://review.openstack.org/576820 | 18:27 |
openstackgerrit | sean mooney proposed openstack/nova master: fix disk_bus handeling https://review.openstack.org/584999 | 18:43 |
openstackgerrit | Merged openstack/nova master: Func test for improper cn local DISK_GB reporting https://review.openstack.org/583646 | 18:55 |
openstackgerrit | Merged openstack/nova master: [placement] disallow additional fields in allocations https://review.openstack.org/583907 | 18:56 |
*** cdent has quit IRC | 19:04 | |
efried | cdent's fix flushed out a bug in mriedem's test case | 19:31 |
mriedem | unpossible | 19:32 |
efried | mriedem: You're right, it was my bug. | 19:34 |
efried | mriedem: But lets me resolve a TODO of yours. | 19:34 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add method to get cpu traits https://review.openstack.org/560317 | 19:53 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: FakeLibvirtFixture: mock get_fs_info https://review.openstack.org/579201 | 19:53 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Blacklist greenlet 0.4.14 https://review.openstack.org/585016 | 19:53 |
*** edmondsw has quit IRC | 20:09 | |
*** e0ne has quit IRC | 20:50 | |
*** edmondsw has joined #openstack-placement | 20:56 | |
openstackgerrit | Jim Rollenhagen proposed openstack/nova master: ironic: add instance_uuid before any other spawn activity https://review.openstack.org/563722 | 20:59 |
openstackgerrit | Eric Fried proposed openstack/nova master: [placement] Add /reshaper handler for POST https://review.openstack.org/576927 | 21:03 |
openstackgerrit | Eric Fried proposed openstack/nova master: Make get_allocations_for_resource_provider sane https://review.openstack.org/584598 | 21:03 |
openstackgerrit | Eric Fried proposed openstack/nova master: Report client: Real get_allocs_for_consumer https://review.openstack.org/584599 | 21:03 |
openstackgerrit | Eric Fried proposed openstack/nova master: Report client: get_allocations_for_provider_tree https://review.openstack.org/584648 | 21:03 |
openstackgerrit | Eric Fried proposed openstack/nova master: reshaper: Look up provider if not in inventories https://review.openstack.org/585033 | 21:03 |
openstackgerrit | Eric Fried proposed openstack/nova master: Report client: _reshape helper, placement min bump https://review.openstack.org/585034 | 21:03 |
efried | cdent, jaypipes, mriedem, dansmith: Some material to get ahead on, review-wise. Client side isn't finished, but these pieces ^ are ready. | 21:04 |
efried | gibi, bauzas: ^ | 21:04 |
*** e0ne has joined #openstack-placement | 21:19 | |
*** e0ne has quit IRC | 21:21 | |
jaypipes | efried: rock on, thx | 21:27 |
* jaypipes goes back to applying drops to weeping eye | 21:27 | |
*** avolkov has quit IRC | 21:42 | |
openstackgerrit | Dan Smith proposed openstack/nova master: Online data migration for queued_for_delete flag https://review.openstack.org/584504 | 21:43 |
efried | ahcrap, I think I put 'em in the wrong order. | 21:44 |
efried | wtf did I do? | 21:45 |
efried | I think I need to comment out the failing tests in the microversion patch and then re-enable them in the bug fix. | 21:45 |
openstackgerrit | Dan Smith proposed openstack/nova master: Online data migration for queued_for_delete flag https://review.openstack.org/584504 | 21:48 |
openstackgerrit | Eric Fried proposed openstack/nova master: [placement] Add /reshaper handler for POST https://review.openstack.org/576927 | 21:50 |
openstackgerrit | Eric Fried proposed openstack/nova master: reshaper: Look up provider if not in inventories https://review.openstack.org/585033 | 21:50 |
openstackgerrit | Eric Fried proposed openstack/nova master: Make get_allocations_for_resource_provider sane https://review.openstack.org/584598 | 21:50 |
openstackgerrit | Eric Fried proposed openstack/nova master: Report client: Real get_allocs_for_consumer https://review.openstack.org/584599 | 21:50 |
openstackgerrit | Eric Fried proposed openstack/nova master: Report client: get_allocations_for_provider_tree https://review.openstack.org/584648 | 21:50 |
openstackgerrit | Eric Fried proposed openstack/nova master: Report client: _reshape helper, placement min bump https://review.openstack.org/585034 | 21:50 |
efried | fixed ^ | 21:50 |
efried | sorry if someone was in mid-review. | 21:51 |
*** edmondsw has quit IRC | 22:04 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Update queued-for-delete from the ComputeAPI during deletion/restoration https://review.openstack.org/566813 | 22:05 |
openstackgerrit | Merged openstack/nova master: Add VIFMigrateData object for live migration https://review.openstack.org/515423 | 22:10 |
efried | I could use a GET /resource_providers?uuid=in:[list of uuids] at this point. | 22:11 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: doc: link to CERN summit video about upgrading from cells v1 to v2 https://review.openstack.org/585044 | 22:16 |
mriedem | efried: like the force_hosts thing | 22:21 |
mriedem | or i thought we talked about that being an option for that | 22:21 |
mriedem | well, GET /allocation_candidates | 22:22 |
efried | mriedem: That was GET /allocation_candidates?uuid yeah | 22:22 |
efried | mriedem: Because "we" decided not to return a payload including updated generations from POST /reshaper, "we" now need to re-GET all the providers before we proceed with update_from_provider_tree. | 22:22 |
efried | And absent GET /rps?uuid=in:[list], we gotta do 'em one at a time. | 22:22 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: doc: link to AZ talk from the Rocky summit https://review.openstack.org/585045 | 22:25 |
mriedem | i don't really remember any specific discussion about that, | 22:26 |
mriedem | but the microversion is still there, so we could also add GET /rps?uuid=in:[list] if we wanted | 22:26 |
efried | mriedem: In the same microversion as POST /reshaper? That seems like an abuse of microversionage. | 22:28 |
mriedem | this is because the other operations in update_from_provider_tree are going to need the updated generations from POST /reshaper yeah? | 22:28 |
efried | correct. | 22:28 |
efried | Inventory updates will be skipped (because already done) but agg & trait updates need the new gens. | 22:28 |
mriedem | i know of at least one compute microversion that touched 2 things that were loosely related | 22:28 |
mriedem | https://docs.openstack.org/nova/latest/reference/api-microversion-history.html#id46 | 22:29 |
mriedem | why did "we" decide not to return generations in the response from POST /reshaper? | 22:29 |
mriedem | the request body requires generations right? | 22:30 |
mriedem | for POST /reshaper | 22:30 |
efried | yeah. | 22:31 |
efried | It wasn't a good reason. | 22:32 |
efried | I don't remember what it was. | 22:32 |
efried | or whether it was in review or IRC or what. | 22:32 |
efried | I remember conceding the point under protest, but not as much protest as usual, because we needed to get moving and it was going to be extra work. | 22:32 |
*** mriedem has quit IRC | 22:39 | |
openstackgerrit | Eric Fried proposed openstack/nova master: WIP: Compute: Handle reshaped provider trees https://review.openstack.org/576236 | 23:06 |
openstackgerrit | Eric Fried proposed openstack/nova master: WIP: Report client: update_from_provider_tree w/reshape https://review.openstack.org/585049 | 23:06 |
efried | mriedem, cdent, jaypipes, gibi, bauzas, dansmith: The series is complete now. Top two patches need test, but code side is ready for perusal ^ | 23:07 |
*** tetsuro has joined #openstack-placement | 23:19 | |
*** mriedem has joined #openstack-placement | 23:21 | |
*** edmondsw has joined #openstack-placement | 23:38 | |
*** edmondsw has quit IRC | 23:43 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Enhance doc to guide user to use nova user https://review.openstack.org/583115 | 23:48 |
*** mriedem has quit IRC | 23:50 | |
*** takashin has joined #openstack-placement | 23:52 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!