Friday, 2024-07-19

*** bauzas_ is now known as bauzas00:12
*** bauzas_ is now known as bauzas03:11
mikalbauzas: I am fine with working on adding tests too. Without Kerbside deployed it would still be meaningful to have a functional test which called the console create call to get an auth token and then looked up connection details from that.06:39
mikalOne wart there is that the openstacksdk doesn't actually support the console auth token API at the moment, so I need to add that too unless I am going to write a different client for that test.06:40
*** bauzas_ is now known as bauzas07:25
bauzasmikal: fine by me, we'll discuss this on the implementation then07:49
bauzassean-k-mooney: when you're up, could you please +2/+W https://review.opendev.org/c/openstack/os-traits/+/908966 ?08:52
bauzasthen I'll create a os-traits release patch08:52
sean-k-mooneybauzas: done08:54
bauzas++08:54
sean-k-mooneyi have been thinking about the cpu power management 08:54
sean-k-mooneyi have a simple quick fix and then a longer term fix08:54
bauzasall approved specs are now in the dalmatian etherpad08:54
bauzasand Launchpad is now correct08:55
bauzassean-k-mooney: kk, tell me08:55
bauzasI just need to go to a cycling shop in 30 mins08:55
sean-k-mooneythe simple quick fix is when nova-compute is stopped we can just power on all cores08:55
sean-k-mooneyand when we start up we can just cache the core info with the runonce decorator before we turn off any of the cores08:56
bauzasit would be a behaviour change, but why not08:57
sean-k-mooneythe longer term solution is we may want to add a local key value store or some other form of data persistence08:58
sean-k-mooneywe have had a few features lately where having a very simple key value store available on the compute like https://docs.python.org/3/library/dbm.html08:59
sean-k-mooneywould have been useful08:59
sean-k-mooneyso we should discuss that at some point08:59
bauzasfor remembering the governor strategy ?09:00
sean-k-mooneythere are several other good options like sqlite 09:00
sean-k-mooneybauzas: not just for that09:00
sean-k-mooneyfor data persistence on the compute node locally in general09:00
bauzasthen this is a way larger context but I don't disagree09:01
sean-k-mooneybauzas: for example for storing information about the image in the image cache09:01
bauzasI just think this is useless for just caching the governor state09:01
sean-k-mooneybauzas: right, that's why i said long term for that approach09:01
sean-k-mooneybauzas: right, i never mentioned those because i'm not suggesting we do09:01
sean-k-mooneywhat we need to cache is strictly the numa info for the cores before we turn them off so that the numa topology blob has the correct numa node and socket09:02
bauzaswhat we can also do is when the instance is about to spawn, we can doublecheck whether the core is poweroff if this is asking to get a governor09:02
bauzasto set*09:02
sean-k-mooneybauzas: this is not related to the governor09:02
sean-k-mooneyi'm not talking about that bug, i'm talking about the other one09:03
sean-k-mooneythe one where libvirt returns incorrect data when the core is off09:03
sean-k-mooneywhich breaks numa affinity09:03
bauzashah09:03
sean-k-mooneybauzas: i think i'm ok with the patch you proposed yesterday for the governor bug09:04
sean-k-mooneyat least for now09:04
bauzaswe can hold a little bit the resolution 09:06
bauzasas people want09:07
bauzashonestly, that simple series became way more tricky than expected09:07
bauzasbecause the kernel doesn't act like I was expecting09:07
bauzasI just wish this was more documented09:08
opendevreviewMerged openstack/os-traits master: Add a new trait for AMD SEV-ES  https://review.opendev.org/c/openstack/os-traits/+/90896609:08
gibiturning on cpus during nova-compute shutdown is leaky, what if the hypervisor is rebooted out of the blue?09:10
gibiif we want to go in this direction then it is safer to do the following in init_host: turn on all dedicated cores, collect governor data, turn off all unused dedicated cores.09:13
opendevreviewFabian Wiesel proposed openstack/nova-specs master: Lazy Metadata Loading in order to Reduce Server Load  https://review.opendev.org/c/openstack/nova-specs/+/92220109:13
bauzasI need to leave now but I'm cool with discussing this later on this afternoon09:15
sean-k-mooneygibi: if the hypervisor is rebooted all the cores would be on, but we could turn them all on, gather the numa info once and then turn them off09:16
sean-k-mooneywhen we start09:16
sean-k-mooneyso yes im fine with that in init_host09:16
sean-k-mooneyso the topology info of a core (socket, numa node, cluster, die) should not change while the agent is running09:17
sean-k-mooneythat would imply the physical hardware changed09:17
gibiyeah I agree that we should be able to cache the topology at startup and not re-query it.09:18
sean-k-mooneyso if we can ensure we are collecting good data, (by turning on all the cores we manage), we can look that up once and stop doing it every 5 mins for no real value09:19
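The startup sequence agreed on above (turn on every managed core, read the topology once, turn the unused cores back off, then stop re-querying) can be sketched in Python as follows. This is only an illustrative sketch, not nova code: `CoreTopology`, `TopologyCache`, and the `set_online` / `read_topology` callables are hypothetical names standing in for whatever the compute agent would use to toggle cores and query the hypervisor.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass(frozen=True)
class CoreTopology:
    # Hypothetical container; a real one might also carry cluster and die.
    socket: int
    numa_node: int


class TopologyCache:
    """Collect per-core topology once at startup and serve it from memory."""

    def __init__(
        self,
        set_online: Callable[[int, bool], None],       # assumed helper
        read_topology: Callable[[int], CoreTopology],  # assumed helper
    ) -> None:
        self._set_online = set_online
        self._read_topology = read_topology
        self._cache: Optional[Dict[int, CoreTopology]] = None

    def init_host(
        self, dedicated: Iterable[int], in_use: Iterable[int]
    ) -> Dict[int, CoreTopology]:
        # Run the expensive collection only once per agent lifetime.
        if self._cache is not None:
            return self._cache
        cores = sorted(set(dedicated))
        # 1. Power on every core we manage so the readings are trustworthy.
        for core in cores:
            self._set_online(core, True)
        # 2. Read the topology while all cores are online.
        cache = {core: self._read_topology(core) for core in cores}
        # 3. Power off the dedicated cores that no instance is using.
        for core in sorted(set(cores) - set(in_use)):
            self._set_online(core, False)
        self._cache = cache
        return cache

    def lookup(self, core: int) -> CoreTopology:
        if self._cache is None:
            raise RuntimeError("init_host() has not been called yet")
        return self._cache[core]
```

Because the cache lives only in memory, it is rebuilt on every agent restart, which also covers the unexpected-reboot case gibi raised.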
sean-k-mooneyi think in the short term that is the solution to the bug,09:20
sean-k-mooneyin the long term as i said to bauzas i think we have use cases where nova could benefit from a persistent datastore of some kind that is local to the nova-compute agent09:20
sean-k-mooneyso we have a tool we can use to solve these types of problems in a more robust way09:21
gibiyeah, we can discuss the persistent store for long term.09:21
sean-k-mooneyto be clear i'm also not opposed to using the filesystem as a db. that works very well and is simple to debug and understand, but i think we should gather some use cases and see what might be more appropriate and talk about this more before or during the ptg09:22
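For a sense of how small the dbm option mentioned earlier is, here is a minimal Python sketch of a local key-value store built on the standard library `dbm` module, with values serialised as JSON. The function names, the file name, and the `image:...` key layout are assumptions made for illustration; nothing here reflects an agreed nova design.

```python
import dbm
import json
import os
import tempfile
from typing import Any, Optional


def store_value(path: str, key: str, value: Any) -> None:
    """Persist a JSON-serialisable value under key (hypothetical helper)."""
    # Flag "c" opens the database read-write and creates it if missing.
    with dbm.open(path, "c") as db:
        db[key.encode("utf-8")] = json.dumps(value).encode("utf-8")


def load_value(path: str, key: str, default: Optional[Any] = None) -> Any:
    """Return the stored value for key, or default when it is absent."""
    # "c" is used here too, so a first read creates an empty database.
    with dbm.open(path, "c") as db:
        raw_key = key.encode("utf-8")
        if raw_key not in db:
            return default
        return json.loads(db[raw_key])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as state_dir:
        db_path = os.path.join(state_dir, "compute-state")
        # Example key layout for image cache metadata (an assumption).
        store_value(db_path, "image:example", {"cached": True})
        print(load_value(db_path, "image:example"))
```

dbm stores keys and values as bytes, so anything structured needs an explicit serialisation step such as the JSON encoding used here.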
gibibauzas: I left a simple request in the power mgmt fix https://review.opendev.org/c/openstack/nova/+/92442709:35
sean-k-mooneygibi: yep i agree with %09:40
sean-k-mooney^09:40
gibibauzas: and thanks for quickly jumping on this bug09:41
opendevreviewSylvain Bauza proposed openstack/nova master: cpu: Only check governor type on online cores  https://review.opendev.org/c/openstack/nova/+/92442710:23
bauzassean-k-mooney: gibi: patch updated10:23
bauzasonce merged, I'll provide the backports to Antelope10:23
sean-k-mooneythe previous results were green and the change is trivial so +210:37
gibibauzas: thanks +A10:45
*** bauzas_ is now known as bauzas10:47
bauzastkajinam: cores, fwiw os-traits release patch https://review.opendev.org/c/openstack/releases/+/92449211:32
stephenfingmann: Lemme know when you're around. I'm not sure I understand what you mean in https://review.opendev.org/c/openstack/nova/+/91573811:56
opendevreviewStefan Hoffmann proposed openstack/nova master: wait for ovn network at migration  https://review.opendev.org/c/openstack/nova/+/92450011:56
songwenping_sean-k-mooney: hi, the cmd 'openstack server migration show f4078cee-71c0-4a47-a0e6-76b67ab625fa 845cca1c-ec99-43bd-a7e8-ce588c0e1dbf' raise exception 'In-progress live migration ddf01c0f-d0f3-469e-a320-ebcb6732420e is not found for server f4078cee-71c0-4a47-a0e6-76b67ab625fa.'12:00
songwenping_the migration status is failed12:00
songwenping_so we cannot get the detail of failed migration?12:01
opendevreviewStephen Finucane proposed openstack/nova master: conf: Clarify '[api] response_validation help' text  https://review.opendev.org/c/openstack/nova/+/92450112:01
atmarkhello, we have a bunch of windows VMs that are boot from volume and got impacted by the crowdstrike incident today. As per the doc, `Rescuing a volume-backed instance is not supported with this mode`.  Is there any method to rescue a bfv instance?13:48
*** bauzas_ is now known as bauzas14:02
sean-k-mooneyatmark: it depends on the release14:13
sean-k-mooneyrescue with boot from volume was added in ussuri https://specs.openstack.org/openstack/nova-specs/specs/ussuri/implemented/virt-bfv-instance-rescue.html14:15
atmarksean-k-mooney: thought it wasn't supported. i'm on xena14:22
sean-k-mooneyyou need to use this with the stable rescue feature i think, but it's been supported for several years now14:26
atmarkok, i can't get it to work. I uploaded a cirros image with this property `--property hw_rescue_device=cdrom`14:32
atmarkthen I tried  `openstack --os-compute-api-version 2.87 server rescue --image e0b042a3-ad6a-4f6e-ad5d-f6f150b5287e a11283a0-55bb-46cb-b0be-b8c3df79a6cf` 14:33
atmarkit returns ` cannot be rescued: Cannot rescue a volume-backed instance HTTP 404` 14:34
atmarks/404/40014:34
sean-k-mooneyyou need a newer api microversion and you need to set other image properties on the image14:34
opendevreviewMerged openstack/nova master: cpu: Only check governor type on online cores  https://review.opendev.org/c/openstack/nova/+/92442714:49
sean-k-mooneyatmark: https://docs.openstack.org/nova/latest/reference/api-microversion-history.html#maximum-in-ussuri-and-victoria14:53
sean-k-mooneyyou need micro version 2.8714:54
sean-k-mooneyand you should set hw_rescue_bus  i recommend hw_rescue_bus=usb 14:55
opendevreviewSylvain Bauza proposed openstack/nova stable/2024.1: cpu: Only check governor type on online cores  https://review.opendev.org/c/openstack/nova/+/92451414:56
sean-k-mooneyatmark: these are the tempest tests in case that helps https://github.com/openstack/tempest/blob/1af21705c53bc9911ea467eaeee2bc12489a43ed/tempest/api/compute/servers/test_server_rescue.py#L259-L31014:57
opendevreviewSylvain Bauza proposed openstack/nova stable/2023.2: cpu: Only check governor type on online cores  https://review.opendev.org/c/openstack/nova/+/92451715:05
opendevreviewSylvain Bauza proposed openstack/nova stable/2023.1: cpu: Only check governor type on online cores  https://review.opendev.org/c/openstack/nova/+/92451815:08
gibistephenfin: what would be the way to get an openstack sdk client not based on the service user in the config but based on the user token from a nova context? I tried naively https://paste.opendev.org/show/bnuA5WLIZvlJGR4cSHUt/ but it fails with `Jul 19 15:29:53 gibi-devstack-aio-jammy devstack@n-api.service[1814794]: ERROR nova.api.openstack.wsgi openstack.exceptions.NotSupported: The 15:32
gibishared-file-system service for :None exists but does not have any supported versions.15:33
gibi`15:33
gibifor the nova - manila integration we try to use the sdk to talk to manila but we want to make sure that the user who requests attaching a manila share from nova has the creds to read the share in the manila api15:34
*** bauzas_ is now known as bauzas15:50
atmarksean-k-mooney: Thanks. Got it working now. I set hw_rescue_device=cdrom and hw_rescue_bus=ide  for ISO.  Upstream doc https://docs.openstack.org/nova/latest/user/rescue.html#stable-device-instance-rescue says `or` instead of `and` https://docs.openstack.org/nova/latest/user/rescue.html#stable-device-instance-rescue  16:57
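The working combination reported above (both `hw_rescue_device` and `hw_rescue_bus` set on the rescue image, plus compute API microversion 2.87) can be summarised as two CLI invocations. The Python sketch below only assembles and prints those command lines; it does not talk to a cloud. The function name `rescue_commands` and the placeholder IDs are hypothetical, and the defaults mirror what atmark reported for an ISO (sean-k-mooney's recommendation was `hw_rescue_bus=usb`).

```python
import shlex
from typing import List


def rescue_commands(
    image_id: str,
    server_id: str,
    device: str = "cdrom",
    bus: str = "ide",
    microversion: str = "2.87",
) -> List[List[str]]:
    """Build the argv lists for a stable-device rescue (hypothetical helper)."""
    # Step 1: set both rescue properties on the rescue image.
    set_properties = [
        "openstack", "image", "set",
        "--property", f"hw_rescue_device={device}",
        "--property", f"hw_rescue_bus={bus}",
        image_id,
    ]
    # Step 2: rescue the volume-backed server with the required microversion.
    rescue = [
        "openstack", "--os-compute-api-version", microversion,
        "server", "rescue", "--image", image_id, server_id,
    ]
    return [set_properties, rescue]


if __name__ == "__main__":
    # Placeholder IDs; substitute real image and server UUIDs.
    for argv in rescue_commands("RESCUE_IMAGE_ID", "SERVER_ID"):
        print(shlex.join(argv))
```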
*** bauzas_ is now known as bauzas19:18
*** haleyb is now known as haleyb|out20:31
*** bauzas_ is now known as bauzas22:43
