*** bauzas_ is now known as bauzas | 00:12 | |
*** bauzas_ is now known as bauzas | 03:11 | |
mikal | bauzas: I am fine with working on adding tests too. Without Kerbside deployed it would still be meaningful to have a functional test which called the console create call to get an auth token and then looked up connection details from that. | 06:39 |
---|---|---|
mikal | One wart there is that the openstacksdk doesn't actually support the console auth token API at the moment, so I need to add that too unless I am going to write a different client for that test. | 06:40 |
*** bauzas_ is now known as bauzas | 07:25 | |
bauzas | mikal: fine by me, we'll discuss this on the implementation then | 07:49 |
bauzas | sean-k-mooney: when you're up, could you please +2/+W https://review.opendev.org/c/openstack/os-traits/+/908966 ? | 08:52 |
bauzas | then I'll create a os-traits release patch | 08:52 |
sean-k-mooney | bauzas: done | 08:54 |
bauzas | ++ | 08:54 |
sean-k-mooney | i have been thinking about the cpu power management | 08:54 |
sean-k-mooney | i have a simple quick fix and then a longer term fix | 08:54 |
bauzas | all approved specs are now in the dalmatian etherpad | 08:54 |
bauzas | and Launchpad is now correct | 08:55 |
bauzas | sean-k-mooney: kk, tell me | 08:55 |
bauzas | I just need to go to a cycling shop in 30 mins | 08:55 |
sean-k-mooney | the simple quick fix is when nova-compute is stopped we can just power on all cores | 08:55 |
sean-k-mooney | and when we start up we can just cache the core info with the runonce decorator before we turn off any of the cores | 08:56 |
bauzas | it would be a behaviour change, but why not | 08:57 |
sean-k-mooney | the longer term solution is we may want to add a local key value store or some other form of data persistence | 08:58 |
sean-k-mooney | we have had a few features lately where having a very simple key value store available on the compute like https://docs.python.org/3/library/dbm.html | 08:59 |
sean-k-mooney | would have been useful | 08:59 |
sean-k-mooney | so we should discuss that at some point | 08:59 |
bauzas | for remembering the governor strategy ? | 09:00 |
sean-k-mooney | there are several other good options like sqlite | 09:00 |
sean-k-mooney | bauzas: not just for that | 09:00 |
sean-k-mooney | for data persistence on the compute node locally in general | 09:00 |
bauzas | then this is a way larger context but I don't disagree | 09:01 |
sean-k-mooney | bauzas: for example for storing information about the image in the image cache | 09:01 |
bauzas | I just think this is useless for just caching the governor state | 09:01 |
sean-k-mooney | bauzas: right, that's why i said long term for that approach | 09:01 |
sean-k-mooney | bauzas: right, i never mentioned those because i'm not suggesting we do | 09:01 |
sean-k-mooney | what we need to cache is strictly the numa info for the cores before we turn them off so that the numa topology blob has the correct numa node and socket | 09:02 |
bauzas | what we can also do is when the instance is about to spawn, we can doublecheck whether the core is poweroff if this is asking to get a governor | 09:02 |
bauzas | to set* | 09:02 |
sean-k-mooney | bauzas: this is not related to the governor | 09:02 |
sean-k-mooney | i'm not talking about that bug, i'm talking about the other one | 09:03 |
sean-k-mooney | the one where libvirt returns incorrect data when the core is off | 09:03 |
sean-k-mooney | which breaks numa affinity | 09:03 |
bauzas | hah | 09:03 |
sean-k-mooney | bauzas: i think i'm ok with the patch you proposed yesterday for the governor bug | 09:04 |
sean-k-mooney | at least for now | 09:04 |
bauzas | we can hold a little bit the resolution | 09:06 |
bauzas | as people want | 09:07 |
bauzas | honestly, that simple series became way more tricky than expected | 09:07 |
bauzas | because the kernel doesn't act like I was expecting | 09:07 |
bauzas | I just wish this was more documented | 09:08 |
opendevreview | Merged openstack/os-traits master: Add a new trait for AMD SEV-ES https://review.opendev.org/c/openstack/os-traits/+/908966 | 09:08 |
gibi | turning on cpus during nova-compute shutdown is leaky, what if the hypervisor is rebooted out of the blue? | 09:10 |
gibi | if we want to go this direction then it is safer to do the following in init_host: turn on all dedicated cores, collect governor data, turn off all unused dedicated cores. | 09:13 |
opendevreview | Fabian Wiesel proposed openstack/nova-specs master: Lazy Metadata Loading in order to Reduce Server Load https://review.opendev.org/c/openstack/nova-specs/+/922201 | 09:13 |
bauzas | I need to leave now but I'm cool with discussing this later on this afternoon | 09:15 |
sean-k-mooney | gibi: if the hypervisor is rebooted all the cores would be on, but we could turn them all on, gather the numa info once and then turn them off | 09:16 |
sean-k-mooney | when we start | 09:16 |
sean-k-mooney | so yes im fine with that in init_host | 09:16 |
sean-k-mooney | so the topology info of a core (socket, numa node, cluster, die) should not change while the agent is running | 09:17 |
sean-k-mooney | that would imply the physical hardware changed | 09:17 |
gibi | yeah I agree that we should be able to cache the topology at startup and not re-query it. | 09:18 |
sean-k-mooney | so if we can ensure we are collecting good data, (by turning on all the cores we manage), we can look that up once and stop doing it every 5 mins for no real value | 09:19 |
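The startup sequence discussed above (power on every dedicated core, read the topology once, power off the unused cores, then serve later lookups from memory) could be sketched roughly as follows. This is only an illustration under stated assumptions: `CoreTopology`, `TopologyCache`, `collect_once` and the `fake_read` helper are hypothetical names, not Nova or libvirt APIs, and the host is replaced by in-memory stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass(frozen=True)
class CoreTopology:
    """Topology facts that stay fixed while the agent runs (hypothetical type)."""
    socket: int
    numa_node: int


class TopologyCache:
    """Read per-core topology once, while every managed core is online."""

    def __init__(
        self,
        power_on: Callable[[int], None],
        power_off: Callable[[int], None],
        read_topology: Callable[[int], CoreTopology],
    ) -> None:
        self._power_on = power_on
        self._power_off = power_off
        self._read_topology = read_topology
        self._cache: Optional[Dict[int, CoreTopology]] = None

    def collect_once(
        self, dedicated: Set[int], in_use: Set[int]
    ) -> Dict[int, CoreTopology]:
        # Later calls return the cached result without touching the host.
        if self._cache is not None:
            return self._cache
        # 1. Bring every dedicated core online so the reads are trustworthy.
        for core in sorted(dedicated):
            self._power_on(core)
        # 2. Read the topology exactly once per core.
        cache = {core: self._read_topology(core) for core in sorted(dedicated)}
        # 3. Power down the dedicated cores that no instance is using.
        for core in sorted(dedicated - in_use):
            self._power_off(core)
        self._cache = cache
        return cache


# In-memory stand-ins for the host (assumptions, so the sketch runs anywhere).
online: Set[int] = set()
reads: List[int] = []


def fake_read(core: int) -> CoreTopology:
    if core not in online:
        raise RuntimeError(f"core {core} is offline, topology would be wrong")
    reads.append(core)
    return CoreTopology(socket=0, numa_node=core // 2)


cache = TopologyCache(online.add, online.discard, fake_read)
first = cache.collect_once({0, 1, 2, 3}, in_use={1})
second = cache.collect_once({0, 1, 2, 3}, in_use={1})
```

Doing the collection at startup rather than at shutdown also covers the unexpected-reboot case raised above, since nothing depends on a clean stop of the agent.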
sean-k-mooney | i think in the short term that is the solution to the bug, | 09:20 |
sean-k-mooney | in the long term, as i said to bauzas, i think we have use cases where nova could benefit from a persistent datastore of some kind that is local to the nova-compute agent | 09:20 |
sean-k-mooney | so we have a tool we can use to solve these type of problems in a more robust way | 09:21 |
gibi | yeah, we can discuss the persistent store for long term. | 09:21 |
sean-k-mooney | to be clear i'm also not opposed to using the filesystem as a db. that works very well and is simple to debug and understand, but i think we should gather some use cases, see what might be more appropriate, and talk about this more before or during the ptg | 09:22 |
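For reference, a local key value store built on the standard library `dbm` module linked earlier might look something like the sketch below. The helper names `save_record` and `load_record`, the `cpu:3` key and the JSON encoding are assumptions made for illustration; nothing here reflects a design that was agreed in the discussion.

```python
import dbm
import json
import os
import tempfile
from typing import Any


def save_record(path: str, key: str, value: Any) -> None:
    """Store a JSON-serialisable value under key (hypothetical helper)."""
    # "c" opens the database read-write and creates it if it is missing.
    with dbm.open(path, "c") as db:
        db[key.encode("utf-8")] = json.dumps(value).encode("utf-8")


def load_record(path: str, key: str, default: Any = None) -> Any:
    """Return the stored value for key, or default when the key is absent."""
    with dbm.open(path, "c") as db:
        raw = db.get(key.encode("utf-8"))
    return default if raw is None else json.loads(raw)


# Usage against a throwaway directory; a real agent would use a state path.
with tempfile.TemporaryDirectory() as tmp:
    store = os.path.join(tmp, "compute-state")
    save_record(store, "cpu:3", {"socket": 0, "numa_node": 1})
    restored = load_record(store, "cpu:3")
    missing = load_record(store, "cpu:99", default={})
```

Keys and values are stored as bytes, so the sketch encodes them explicitly instead of relying on backend-specific conversions.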
gibi | bauzas: I left a simple request in the power mgmt fix https://review.opendev.org/c/openstack/nova/+/924427 | 09:35 |
sean-k-mooney | gibi: yep i agree with % | 09:40 |
sean-k-mooney | ^ | 09:40 |
gibi | bauzas: and thanks for quickly jumping on this bug | 09:41 |
opendevreview | Sylvain Bauza proposed openstack/nova master: cpu: Only check governor type on online cores https://review.opendev.org/c/openstack/nova/+/924427 | 10:23 |
bauzas | sean-k-mooney: gibi: patch updated | 10:23 |
bauzas | once merged, I'll provide the backports to Antelope | 10:23 |
sean-k-mooney | the previous results were green and the change is trivial so +2 | 10:37 |
gibi | bauzas: thanks +A | 10:45 |
*** bauzas_ is now known as bauzas | 10:47 | |
bauzas | tkajinam: cores, fwiw os-traits release patch https://review.opendev.org/c/openstack/releases/+/924492 | 11:32 |
stephenfin | gmann: Lemme know when you're around. I'm not sure I understand what you mean in https://review.opendev.org/c/openstack/nova/+/915738 | 11:56 |
opendevreview | Stefan Hoffmann proposed openstack/nova master: wait for ovn network at migration https://review.opendev.org/c/openstack/nova/+/924500 | 11:56 |
songwenping_ | sean-k-mooney: hi, the cmd 'openstack server migration show f4078cee-71c0-4a47-a0e6-76b67ab625fa 845cca1c-ec99-43bd-a7e8-ce588c0e1dbf' raise exception 'In-progress live migration ddf01c0f-d0f3-469e-a320-ebcb6732420e is not found for server f4078cee-71c0-4a47-a0e6-76b67ab625fa.' | 12:00 |
songwenping_ | the migration status is failed | 12:00 |
songwenping_ | so we cannot get the detail of failed migration? | 12:01 |
opendevreview | Stephen Finucane proposed openstack/nova master: conf: Clarify '[api] response_validation help' text https://review.opendev.org/c/openstack/nova/+/924501 | 12:01 |
atmark | hello, we have a bunch of windows VMs that are boot from volume and got impacted by the crowdstrike incident today. As per doc, `Rescuing a volume-backed instance is not supported with this mode`. Is there any method to rescue a bfv instance? | 13:48 |
*** bauzas_ is now known as bauzas | 14:02 | |
sean-k-mooney | atmark: it depends on the release | 14:13 |
sean-k-mooney | rescue with boot from volume was added in ussuri https://specs.openstack.org/openstack/nova-specs/specs/ussuri/implemented/virt-bfv-instance-rescue.html | 14:15 |
atmark | sean-k-mooney: thought it wasn't supported. i'm on xena | 14:22 |
sean-k-mooney | you need to use this with the stable rescue feature i think, but it's been supported for several years now | 14:26 |
atmark | ok, i can't get it to work. I uploaded a cirros image with this property `--property hw_rescue_device=cdrom` | 14:32 |
atmark | then I tried `openstack --os-compute-api-version 2.87 server rescue --image e0b042a3-ad6a-4f6e-ad5d-f6f150b5287e a11283a0-55bb-46cb-b0be-b8c3df79a6cf` | 14:33 |
atmark | it returns ` cannot be rescued: Cannot rescue a volume-backed instance HTTP 404` | 14:34 |
atmark | s/404/400 | 14:34 |
sean-k-mooney | you need a newer api microversion and you need to set other image properties on the image | 14:34 |
opendevreview | Merged openstack/nova master: cpu: Only check governor type on online cores https://review.opendev.org/c/openstack/nova/+/924427 | 14:49 |
sean-k-mooney | atmark: https://docs.openstack.org/nova/latest/reference/api-microversion-history.html#maximum-in-ussuri-and-victoria | 14:53 |
sean-k-mooney | you need micro version 2.87 | 14:54 |
sean-k-mooney | and you should set hw_rescue_bus i recommend hw_rescue_bus=usb | 14:55 |
opendevreview | Sylvain Bauza proposed openstack/nova stable/2024.1: cpu: Only check governor type on online cores https://review.opendev.org/c/openstack/nova/+/924514 | 14:56 |
sean-k-mooney | atmark: these are the tempest tests in case that helps https://github.com/openstack/tempest/blob/1af21705c53bc9911ea467eaeee2bc12489a43ed/tempest/api/compute/servers/test_server_rescue.py#L259-L310 | 14:57 |
opendevreview | Sylvain Bauza proposed openstack/nova stable/2023.2: cpu: Only check governor type on online cores https://review.opendev.org/c/openstack/nova/+/924517 | 15:05 |
opendevreview | Sylvain Bauza proposed openstack/nova stable/2023.1: cpu: Only check governor type on online cores https://review.opendev.org/c/openstack/nova/+/924518 | 15:08 |
gibi | stephenfin: what would be the way to get an openstack sdk client not based on the service user in the config but based on the user token from a nova context? I tried naively https://paste.opendev.org/show/bnuA5WLIZvlJGR4cSHUt/ but it fails with `Jul 19 15:29:53 gibi-devstack-aio-jammy devstack@n-api.service[1814794]: ERROR nova.api.openstack.wsgi openstack.exceptions.NotSupported: The | 15:32 |
gibi | shared-file-system service for :None exists but does not have any supported versions. | 15:33 |
gibi | ` | 15:33 |
gibi | for the nova - manila integration we try to use the sdk to talk to manila but we want to make sure that the user who requests attaching a manila share from nova has the creds to read the share in the manila api | 15:34 |
*** bauzas_ is now known as bauzas | 15:50 | |
atmark | sean-k-mooney: Thanks. Got it working now. I set hw_rescue_device=cdrom and hw_rescue_bus=ide for ISO. Upstream doc https://docs.openstack.org/nova/latest/user/rescue.html#stable-device-instance-rescue says `or` instead of `and` https://docs.openstack.org/nova/latest/user/rescue.html#stable-device-instance-rescue | 16:57 |
*** bauzas_ is now known as bauzas | 19:18 | |
*** haleyb is now known as haleyb|out | 20:31 | |
*** bauzas_ is now known as bauzas | 22:43 |