Monday, 2022-11-21

opendevreview Jorhson Deng proposed openstack/nova master: Optimize the small pagesize in numa_fit_instance_to_host  https://review.opendev.org/c/openstack/nova/+/864812 10:14
opendevreview Jorhson Deng proposed openstack/nova master: Optimize the small pagesize in numa_fit_instance_to_host  https://review.opendev.org/c/openstack/nova/+/864812 10:52
opendevreview Jorhson Deng proposed openstack/nova master: Optimize the small pagesize in numa_fit_instance_to_host  https://review.opendev.org/c/openstack/nova/+/864812 11:37
admin1 i have a server with 1 vm .. i want to do maintenance on this server ... when i run openstack server migrate .. it tells me compute host X could not be found .. 12:16
admin1 i look into the logs and it has a diff uuid 12:17
admin1 if i want to delete this host, it says it has instances, clear it first 12:17
admin1 so i am in a bit of a catch-22 12:17
admin1 cannot fix without migrating, cannot migrate without fixing 12:17
sean-k-mooney admin1: it sounds like you changed the hostname on the server or the host value in the nova.conf 12:19
admin1 sean-k-mooney, i have always used the openstack-ansible playbook and never touched a manual setting 12:19
sean-k-mooney that's the only way the uuid would change 12:19
admin1 how would one fix this ? 12:20
sean-k-mooney you need to determine if the hostname changed first 12:29
sean-k-mooney but it likely will need db surgery if it did and you can't set it back 12:30
admin1 would changing the resource provider uuid for this hypervisor from old (non-existent) to new one help ? 12:36
admin1 i see that the UUID appears in only 1 field in the compute_nodes table 12:37
admin1 sean-k-mooney, i think the hostname changed from fqdn -> non-fqdn 12:46
admin1 hostname remained the same 12:46
admin1 sean-k-mooney, how to check own uuid ? 13:11
admin1 from the hypervisor 13:11
opendevreview Sahid Orentino Ferdjaoui proposed openstack/nova master: compute: enhance compute evacuate instance to support target state  https://review.opendev.org/c/openstack/nova/+/858383 13:19
opendevreview Sahid Orentino Ferdjaoui proposed openstack/nova master: api: extend evacuate instance to support target state  https://review.opendev.org/c/openstack/nova/+/858384 13:19
sahid o/ gibi sean-k-mooney I have added you change that you were looking for, hope that makes sense 13:21
sahid https://review.opendev.org/c/openstack/nova/+/858384/20/nova/api/openstack/compute/evacuate.py#104 13:21
sahid s/you/the 13:22
sean-k-mooney sahid: admin1 sorry was on a call downstream. sahid i'll try and take a look at your change in general later in the week but that section looks like i was expecting so i think that will be fine 13:39
sean-k-mooney admin1: that is unfortunate. the best way to resolve this issue would be to set the hostname back to the fqdn 13:39
admin1 i have one vm in this i need to migrate .. after that i can just delete/re-initialize it 13:39
sahid sean-k-mooney: no worries, thanks a lot for your return 13:40
sean-k-mooney can you check the instance.host value for that vm 13:40
sean-k-mooney admin1: the instance.host value is meant to match the host value in the nova.conf 13:41
sean-k-mooney if you have just one vm the simplest fix would be to set the nova.conf host value on that node to match the instance.host on the vm 13:41
sean-k-mooney then you should be able to cold migrate the vm 13:41
sean-k-mooney live migrate might also work depending on the vm. e.g. if you are using any special feature like sriov or cpu pinning then cold migration has a higher probability of working 13:42
admin1 i was not able to find hostname or name value in nova.conf 13:43
sean-k-mooney admin1: if it's not set the default is socket.gethostname() 13:43
sean-k-mooney admin1: https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.host 13:44
sean-k-mooney admin1: nova does not support changing the hostname because it corrupts our db. we have had a bad experience with customers doing this accidentally of late to the point that we are now working on detecting it and preventing the compute agent from starting when it happens https://review.opendev.org/q/topic:bp%252Fstable-compute-uuids 13:45
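The default mentioned above can be checked on the compute host itself. A minimal Python sketch, assuming (as stated in the chat) that an unset [DEFAULT]/host falls back to socket.gethostname(); the helper names default_nova_host and looks_like_fqdn are hypothetical and not part of nova:

```python
import socket


def default_nova_host() -> str:
    """Value nova would use when [DEFAULT]/host is unset (per the chat above)."""
    return socket.gethostname()


def looks_like_fqdn(name: str) -> bool:
    """Rough heuristic (an assumption): a name containing a dot is an FQDN."""
    return "." in name.strip(".")


if __name__ == "__main__":
    host = default_nova_host()
    kind = "FQDN" if looks_like_fqdn(host) else "short hostname"
    print(f"socket.gethostname() -> {host} ({kind})")
```

Comparing this value with the host recorded for the instance shows whether the name drifted between the short and the fully qualified form.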
admin1 Failed to create resource provider record in placement API for UUID 88b9b395-784f-4d78-8497-3d674f7dff64 .. Conflicting resource provider name: h20 already exists .. this is what I have 13:46
admin1 so question is where does this UUID come from ? 13:46
sean-k-mooney ah yes that makes sense 13:48
sean-k-mooney ok remove the host value 13:48
sean-k-mooney and update the instance.host for that one instance 13:48
sean-k-mooney the way the uuid is calculated today is we use the nova.conf host value to look for a compute service record with the same host value 13:49
sean-k-mooney *we look for a compute node record with the same host value not compute service 13:49
admin1 so wherever in database, h20 with old UUID appears, i need to just update it with the new 88b9b395-784f-4d78-8497-3d674f7dff64 uuid ? 13:50
opendevreview Alexey Stupnikov proposed openstack/nova stable/wallaby: [stable-only] Use os-brick from source in wallaby  https://review.opendev.org/c/openstack/nova/+/865134 13:55
sean-k-mooney admin1: no 14:04
sean-k-mooney you should leave the compute node alone and update the host value on the one instance that is affected 14:04
sean-k-mooney admin1: presumably it's the full fqdn currently right 14:04
sean-k-mooney and the host is now just the hostname not fqdn 14:04
sean-k-mooney on the compute node 14:05
sean-k-mooney so you need to make them match then migrate it 14:05
sean-k-mooney we use the instance.host to determine the rpc endpoint of the compute service that manages it 14:05
sean-k-mooney admin1: so if the compute service name changed and you have just one vm the shortest way to fix it is to update that one instance and then migrate it 14:06
admin1 right .. it's full fqdn, but the issue is the current hostname is also not able to register into placement .. it says Failed to create resource provider record in placement API for UUID 88b9b395-784f-4d78-8497-3d674f7dff64 .. Conflicting resource provider name: h20 already exists 14:06
admin1 so just update for this one instance, node to h20 instead of h20.fqdn 14:06
admin1 host does show h20 .. node shows h20.fqdn 14:07
sean-k-mooney ok so i need you to check a few things all of which should be the same 14:08
sean-k-mooney we need to check that the instance.host value and service.host value are the same. 14:08
sean-k-mooney the hypervisor hostname and placement RP name need to be the same 14:08
sean-k-mooney and the compute node uuid and placement uuid need to match 14:08
sean-k-mooney and the compute node host value must match the instance.host and service.host values 14:09
sean-k-mooney those are the 4 things that need to align. 14:09
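The four checks listed above can be written down as a small comparison. This is only a sketch: the values are assumed to have been collected by hand from the nova cell db and from placement, and the HostRecords class, its field names and find_mismatches are hypothetical helpers, not nova code:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HostRecords:
    """Hand-collected values for one compute host (hypothetical field names)."""
    instance_host: str        # host recorded on the affected instance
    service_host: str         # host of the nova-compute service record
    compute_node_host: str    # host of the compute node record
    hypervisor_hostname: str  # hypervisor hostname of the compute node
    placement_rp_name: str    # resource provider name in placement
    compute_node_uuid: str    # uuid of the compute node record
    placement_rp_uuid: str    # uuid of the resource provider in placement


def find_mismatches(r: HostRecords) -> list[str]:
    """Return one message per check from the chat that does not hold."""
    problems = []
    if r.instance_host != r.service_host:
        problems.append("instance.host != service.host")
    if r.hypervisor_hostname != r.placement_rp_name:
        problems.append("hypervisor hostname != placement RP name")
    if r.compute_node_uuid != r.placement_rp_uuid:
        problems.append("compute node uuid != placement uuid")
    if r.compute_node_host not in (r.instance_host, r.service_host) or (
        r.instance_host != r.compute_node_host
        or r.service_host != r.compute_node_host
    ):
        problems.append("compute node host != instance.host/service.host")
    return problems


if __name__ == "__main__":
    # Illustrative values only; "uuid-a" is a placeholder, not a real record.
    drifted = HostRecords("h20.fqdn", "h20", "h20", "h20", "h20",
                          "uuid-a", "uuid-a")
    for problem in find_mismatches(drifted):
        print(problem)
```

An empty result means the records agree with each other; it does not prove the db is otherwise healthy.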
sean-k-mooney nova does not support changing the hostname or the [DEFAULT]/host value after the agent is first started on a physical server 14:10
sean-k-mooney changing either will corrupt both the nova db and create issues in placement 14:10
admin1 "compute node uuid and placement uuid need to match" - where/how would I see those values ? 14:11
admin1 from the db ? 14:11
sean-k-mooney yep although you can actually get them from the api too 14:12
sean-k-mooney the placement uuid is just in the placement show output 14:12
sean-k-mooney the compute node uuid is in the hypervisor api if you use a new enough version 14:12
sean-k-mooney admin1: but yes you can get it in the cell db compute_nodes table 14:12
admin1 in placement, i already have h20 as 59cc8a37-cee4-4dbc-84bf-18f56366bb2d, and h20.fqdn as 7bf78a2d-ce88-4ba2-a5b5-27fa4479f887 .. but in the h20 nova-compute logs, it tries to register itself as 5fecf61b-feb6-4af4-82d9-e5f5245e6ae9 14:14
admin1 a grep of the whole database dump shows that that UUID ... 59c is only in 2 places ..... resource_providers and compute_nodes 14:17
admin1 so if I update those 2 tables with the new uuid 5f that the node is trying to register as instead of the 59 in the db, would it fix ? 14:17
sean-k-mooney admin1: that's because you have presumably already deleted the old compute service entry for the host 14:20
sean-k-mooney a safer approach might be to remove the resource providers for that host in placement 14:21
sean-k-mooney allow the compute service to start up and register itself 14:22
sean-k-mooney then make sure the instance aligns and then migrate it 14:22
sean-k-mooney admin1: i want to make it very clear however that the hostname changing is one of the most destructive things that can happen to the nova/placement dbs and is very non-trivial to recover from 14:23
sean-k-mooney if you remove the placement rp with/without the fqdn 14:23
sean-k-mooney it will allow the compute service to start 14:24
sean-k-mooney if you ensure the instance.host matches the running compute service you should then be able to manage it and migrate it 14:24
admin1 is there an api way to delete the entry from placement 14:24
admin1 instead of from db 14:24
admin1 cli way 14:24
sean-k-mooney yes if it has no allocations 14:24
sean-k-mooney https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-provider-delete 14:25
sean-k-mooney you might have to do https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-provider-allocation-delete to delete the allocation for the vm first 14:25
sean-k-mooney once the compute service is running and registered in placement again 14:26
sean-k-mooney update the instance.host value to match the running service 14:27
sean-k-mooney optionally run https://docs.openstack.org/nova/latest/cli/nova-manage.html#placement-heal-allocations 14:27
sean-k-mooney for the single instance 14:27
sean-k-mooney and then migrate it 14:27
sean-k-mooney heal allocations will restore the allocations in placement that you deleted to allow you to delete the resource provider 14:28
sean-k-mooney cold migration will fix the allocation on the destination in either case when you confirm the migration 14:28
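The sequence described above can be laid out as an ordered plan. The Python sketch below only builds and prints the steps; it runs nothing. The commands are the ones behind the links in the chat (osc-placement and nova-manage); recovery_plan and the placeholder uuids are hypothetical, and the steps without a command are manual:

```python
from typing import List, Optional, Tuple

# A step is a description plus an optional command line (None = manual step).
Step = Tuple[str, Optional[List[str]]]


def recovery_plan(instance_uuid: str, stale_rp_uuids: List[str]) -> List[Step]:
    """Ordered steps from the chat; nothing here is executed."""
    plan: List[Step] = [
        ("delete the vm's allocations so the provider can be removed",
         ["openstack", "resource", "provider", "allocation", "delete",
          instance_uuid]),
    ]
    for rp_uuid in stale_rp_uuids:
        plan.append(
            ("delete the stale resource provider",
             ["openstack", "resource", "provider", "delete", rp_uuid]))
    plan += [
        ("start nova-compute and let it register itself in placement", None),
        ("update instance.host to match the running service (manual)", None),
        ("optionally restore the deleted allocations",
         ["nova-manage", "placement", "heal_allocations",
          "--instance", instance_uuid]),
        ("cold migrate the vm, then confirm the migration",
         ["openstack", "server", "migrate", instance_uuid]),
    ]
    return plan


if __name__ == "__main__":
    for description, command in recovery_plan("<instance-uuid>", ["<rp-uuid>"]):
        print(f"- {description}")
        if command:
            print("    " + " ".join(command))
```

Keeping the manual steps in the same list makes the ordering explicit: the providers cannot be deleted while allocations exist, and healing allocations only makes sense once the compute service has registered again.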
admin1 UUID of the consumer -- is the UUID of the vm ? 14:29
sean-k-mooney yep 14:29
sean-k-mooney in this case at least 14:29
sean-k-mooney if you are cold migrating a vm it will also have a second allocation using the migration uuid 14:30
admin1 resource provider allocation show UUID (of h20) shows blank, but delete gives Resource provider has allocations 14:32
admin1 so there could be some more allocations in the old uuid .. but not present in the virsh list that i can see 14:33
sean-k-mooney were you able to delete the fqdn version 14:34
admin1 yes 14:34
admin1 fqdn one is gone 14:34
sean-k-mooney and the vm has the fqdn currently 14:34
sean-k-mooney if so can you check what virsh hostname outputs 14:34
admin1 in the instances.node, it's set to fqdn 14:35
sean-k-mooney is it the hostname or hostname.fqdn 14:36
sean-k-mooney or i guess hostname.domain 14:36
admin1 i was able to delete both now 14:36
sean-k-mooney oh ok good 14:37
sean-k-mooney so the compute agent should now be able to start 14:37
admin1 the GUI hypervisors showed the instances .. 14:37
admin1 it registered itself now .. 14:38
sean-k-mooney ya if you have db corruption like this it's hard to resolve 14:38
sean-k-mooney so if it's registered itself you just need to ensure the instance.host and service.host agree and you should be able to migrate 14:38
admin1 now when i try to migrate, using openstack server migrate, it says compute host h20 could not be found 14:39
admin1 so i guess it's trying to refer to some other h20 14:39
sean-k-mooney you see the compute service in openstack compute service list right and it's up 14:40
sean-k-mooney oh did you update the compute service mappings in the api db 14:40
admin1 not the last part .. 14:40
sean-k-mooney you need to run nova-manage cell_v2 discover_hosts i think 14:40
admin1 that would be from inside the nova venv ? 14:41
sean-k-mooney the new compute service record needs to be mapped to the correct cell 14:41
sean-k-mooney ideally from one of the controllers with db access 14:41
admin1 ok 14:41
admin1 from the os utils as admin, or as nova in the nova venv 14:41
sean-k-mooney nova-manage uses the credentials in the nova.conf 14:41
admin1 got it 14:42
sean-k-mooney so you should use the config the conductor uses 14:42
sean-k-mooney or ap 14:42
sean-k-mooney *or api 14:42
sean-k-mooney https://docs.openstack.org/nova/latest/cli/nova-manage.html#cell-v2-discover-hosts 14:42
*** dasm|off is now known as dasm 14:48
admin1 sean-k-mooney, thank you .. finally it's migrating to another host 14:50
sean-k-mooney admin1: once you have completed that and confirmed the migration 14:50
admin1 i read the spec .. having a uuid that is associated with the server and not associated with hostname will fix issues like this in future 14:50
sean-k-mooney i would advise checking if any other computes have had a host name change 14:50
sean-k-mooney admin1: that's not really what the spec is going to do 14:51
admin1 in my case, grafana showed fqdn while others were non-fqdn, so another colleague decided to change the fqdn to just hostname only and not full fqdn to make the graphs sane 14:51
sean-k-mooney admin1: the spec will record the compute node uuid in a file so we can detect if the hostname changes 14:51
sean-k-mooney admin1: ya if they did that everywhere they would have severely corrupted the db 14:52
sean-k-mooney you would need to cold migrate every workload on the affected hosts to resolve it 14:52
sean-k-mooney it's much much better to correct the hostname if no new instances have been created 14:52
admin1 yeah .. we normally ensure all is what is needed before deploying the first vm 14:54
admin1 but in this case, since the server was replaced and pxe put the fqdn, it slipped 14:54
sean-k-mooney ack 14:54
admin1 and it was corrected only after the first vm was deployed 14:54
sean-k-mooney i would advise auditing the rest to make sure there are no others in this state 14:55
sean-k-mooney the longer you have hosts like this the harder it is to fix 14:55
opendevreview Merged openstack/os-vif stable/zed: Move mtu update request into ovsdb transaction  https://review.opendev.org/c/openstack/os-vif/+/863993 15:13
*** tbachman_ is now known as tbachman 15:43
opendevreview sean mooney proposed openstack/nova-specs master: add spec for fqdn in hostname  https://review.opendev.org/c/openstack/nova-specs/+/862626 16:23
sean-k-mooney gibi: dansmith melwitt ^ that hopefully has addressed all the outstanding comments 16:24
admin1 as an operator, i would prefer to not have fqdn but just hostnames .. because when it comes to monitoring and graphs, with fqdn it clutters the whole graph .. h20.location.dev.cloud.domain.com where the location.dev.cloud.domain.com is redundant in all 16:26
sean-k-mooney admin1: that spec is for vms 16:26
sean-k-mooney and on the compute nodes you are free to use either 16:27
sean-k-mooney i prefer using hostnames on the compute nodes too 16:27
sean-k-mooney admin1: for what it's worth nova did not test and recommended against using fqdns for the compute node hostname for a very long time 16:30
sean-k-mooney some installers implemented it anyway and we kind of got stuck with supporting it 16:30
sean-k-mooney unfortunately if you want to use tls i think (not 100% sure) a fqdn is required for the certs 16:31
sean-k-mooney that is the main reason they started changing to FQDNs as far as i am aware 16:31
opendevreview sean mooney proposed openstack/nova stable/xena: Record SRIOV PF MAC in the binding profile  https://review.opendev.org/c/openstack/nova/+/864933 17:08
opendevreview sean mooney proposed openstack/nova stable/xena: Remove double mocking  https://review.opendev.org/c/openstack/nova/+/864934 17:08
opendevreview sean mooney proposed openstack/nova stable/xena: Remove double mocking... again  https://review.opendev.org/c/openstack/nova/+/864935 17:08
opendevreview sean mooney proposed openstack/nova stable/xena: Add compute restart capability for libvirt func tests  https://review.opendev.org/c/openstack/nova/+/864936 17:08
opendevreview sean mooney proposed openstack/nova stable/xena: enable blocked VDPA move operations  https://review.opendev.org/c/openstack/nova/+/864937 17:08
opendevreview Merged openstack/os-vif stable/yoga: Move mtu update request into ovsdb transaction  https://review.opendev.org/c/openstack/os-vif/+/863994 18:18
opendevreview Merged openstack/nova stable/zed: Handle "no RAM info was set" migration case  https://review.opendev.org/c/openstack/nova/+/860732 20:59
*** tbachman_ is now known as tbachman 21:04
opendevreview Merged openstack/nova master: Update contributor guide for 2023.1 Antelope  https://review.opendev.org/c/openstack/nova/+/858238 21:50
*** dasm is now known as dasm|off 22:01
opendevreview Ghanshyam proposed openstack/nova master: Update gate jobs as per the 2023.1 cycle testing runtime  https://review.opendev.org/c/openstack/nova/+/861111 22:08
opendevreview Ghanshyam proposed openstack/os-vif master: Update gate jobs as per the 2023.1 cycle testing runtime  https://review.opendev.org/c/openstack/os-vif/+/861468 22:08
