Thursday, 2021-09-09

gibigood morning07:12
bauzasgibi: good morning07:56
bauzaswas mostly off the grid yesterday due to a hell day with customer escalations07:57
gibinot much happened upstream yesterday here either08:01
bauzasgibi: in case you haven't seen it, i eventually found the problem with my prelude change08:10
bauzasone single line was badly indented by one char08:10
bauzasbut not the one which was returning the error08:11
bauzas...08:11
gibiyepp, I +2d the prelude yesterday. thank you for writing it08:13
kashyapbauzas: Then how did you discover it? If it's not the line that caused the error :-)08:19
bauzaskashyap: I still use my eyes in general08:20
kashyapHehe08:20
bauzasautomated coding isn't an option yet, I'm afraid08:20
bauzasI should ask elon to propose it08:20
kashyapbauzas: GitHUb has some ideas with their <clears-throat> "Copilot" 08:23
kashyapbauzas: But, look - https://www.techradar.com/news/github-autopilot-highly-likely-to-introduce-bugs-and-vulnerabilities-report-claims08:23
kashyap"researchers discover that nearly 40% of the code suggestions by GitHub’s Copilot tool are erroneous, from a security point of view"08:23
kashyapBut LOL, as if that's a surprise!08:24
bauzaskashyap: you get the same experience with Tesla Full Self Driving, which has been offered for a while now 08:24
bauzasthe only difference being that a car bug leads to some injury :D08:25
kashyapI don't drive; so ignore cars altogether :D  I'm more interested in long-distance electric bikes :D08:25
bauzasI switched 70% of my drives to be full electric, I'm chasing down the last 30% bits08:25
bauzasbut that will require hardware upgrade08:26
bauzaskashyap: excellent choice, btw. do you have government incentives for this like in France ?08:26
kashyapbauzas: I was told there are; but I need to dig in still08:28
kashyapbauzas: I got more energized about it after my recent hiking in the Alps.  08:29
bauzaskashyap: sure, but mountain e-bikes are way different 08:31
kashyapbauzas: Oh, sure08:31
kashyapI was not mixing them up.  Just noticing how people were doing long-distance rides there w/ e-bikes made me think about it more08:31
bauzaskashyap: https://mobilit.belgium.be/nl/mobiliteit/personenvervoer/fiets ;)08:32
* bauzas is glad that when clicking the nl button, this didn't go back to the homepage :)08:32
kashyapbauzas: Hehe, thank you.  Yeah, NL and FR are treated as first-class citizens on most web pages08:33
kashyapbauzas: I knew this existed, just didn't bother to investigate 08:33
* kashyap bookmarks08:33
bauzaskashyap: surprisingly, most of the incentives seem to be only for the Brussels and Wallonia regions08:36
bauzasafaicr, Ghent is in Flanders, right?08:36
kashyapbauzas: Yes, it is08:36
* bauzas suggests to relocate :D08:36
kashyapThat's very odd, though.08:36
kashyapbauzas: LOL, no, thank you 08:36
bauzasthere is a fun fact here08:37
bauzasif you want to buy an electric car, the gov gives you a discount of 7k€ if the car is below 45k€08:37
kashyapNice, that's quite a chunky amount08:38
bauzasbut, if you live in the Marseille area, the local city gov gives you an extra 5k08:38
bauzasso, lots of people are considering some way to address their primary location as some random Marseille place08:39
kashyapHeh08:40
gibithose are nice sums indeed08:40
gibihere we have a ~5k€ subsidy for full electric cars (and free parking)08:43
gibibut for cars < 35k€08:44
kashyapI do hope people are taking advantage of it; I guess it's a win-win08:47
bauzaskashyap: we do08:48
bauzaswe have two cars, one is full electric (a peugeot 208) and one is plugin-hybrid (a skoda superb)08:49
bauzaseven with the hybrid, which has 40km+ electric range, we try to make 100% of our drives electric08:49
kashyapNice; /me learned of plug-in hybrids for the first time08:50
bauzasas a consequence, we use the 208 (which is a compact car) for one-day drives around our region08:50
bauzasand we only take the superb for trips above 200 km which require lots of leg space08:50
bauzasI now hate refuelling08:51
bauzasthis is expensive and it stinks08:51
gibibauzas: if you feel the power, then could you please review https://review.opendev.org/c/openstack/placement/+/807014 I think (and CI thinks) it is good and fixing the transaction issue08:52
bauzashopefully next year, we'll change the superb to a new electric vehicle, because we tested long-range trips with intermediary recharges, and this works08:52
bauzasgibi: excellent point, I need to look at this one08:52
bauzasgibi: I also need to amend the vgpu doc 08:52
gibiand if you are at placement land then https://review.opendev.org/c/openstack/placement/+/807155 is simple and fixes a bug in consumer_types08:53
bauzasgibi: I guess melwitt addressed your excellent concern ?08:53
songwenping_hi, team, is there any way to delete a compute node other than nova service-delete?08:54
gibibauzas: yes, she added an independent transaction for re-reading the rp data in the retry loop, and it seems to work well08:54
bauzassongwenping_: you shouldn't delete the compute node entries 08:54
bauzassongwenping_: either the virt driver or the service deletion can do this08:54
songwenping_how virt driver works?08:55
kashyapsongwenping_: The bird's-eye view is this:08:57
kashyapnova-api (in coordination with nova-scheduler) --> nova-compute (virt driver) --> launches libvirtd --> launches QEMU08:57
kashyapBut you have to be more specific for people to answer :)08:57
songwenping_no, i mean in which scenarios does the virt driver delete the compute node?08:59
bauzassongwenping_: sorry, I need to jump off for 30 mins09:16
bauzasbut basically, the virt driver gives the inventories and the compute nodes to the RT which creates the necessary records09:16
bauzasRT : ResourceTracker09:16
bauzasas the RT is run by the nova-compute service, you need to delete the service09:17
bauzasthere is a tight relationship between an RPC service (the nova-compute manager) and the compute node record09:17
* bauzas needs to drop09:18
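(As an illustration of the workflow bauzas describes: deleting the nova-compute service record is the supported way to retire a compute node, and on reasonably recent releases it also cleans up the compute node record and its placement resource provider. The service id below is a placeholder.)

    openstack compute service list --service nova-compute
    openstack compute service delete <service-id>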
songwenping_hi, team, anybody knows why placement check source node resource when evacuate VM?11:15
sean-k-mooneyhow do you mean11:19
sean-k-mooneyas in, it can include the source host in the set of hosts returned11:19
sean-k-mooneythat would be because we do not currently filter that host out, just up hosts11:21
sean-k-mooneyhttps://github.com/openstack/nova/blob/master/nova/scheduler/request_filter.py#L241-L25411:21
sean-k-mooneywe do filter hosts by their disabled status however11:21
sean-k-mooneyso if you had disabled the host you are evacuating from then it would not be included in the placement query11:21
sean-k-mooneysongwenping_: the source node will be eliminated by the scheduler after the placement query so it does not really have a negative impact to include it11:22
opendevreviewMerged openstack/nova master: Support Cpu Compararion on Aarch64 Platform  https://review.opendev.org/c/openstack/nova/+/76392811:34
songwenping_sean-k-mooney: wait for a min, i am finding the placement code.11:41
songwenping_when we evacuate vm, placement will check_capacity_exceeded, https://github.com/openstack/placement/blob/master/placement/objects/allocation.py#L7311:44
songwenping_https://github.com/openstack/placement/blob/master/placement/objects/allocation.py#L120 contains source node provider id and dest node provider id.11:45
sean-k-mooneyyes12:05
sean-k-mooneywhen we make the allocation candidate request we do not exclude the host we are evacuating from12:06
sean-k-mooneyah i see12:06
sean-k-mooneythis should not have any negative effect12:07
sean-k-mooneywe are technically checking allocations for one additional host that we don't need placement to consider12:08
songwenping_sometimes there is some rubbish data in the allocation table, and this leads to evacuation failures due to this check.12:10
sean-k-mooneythat should just eliminate the host as an allocation candidate, no?12:12
sean-k-mooneyhave you filed a bug for this12:12
sean-k-mooneyplacement should not be made aware of evacuation or other lifecycle operations explicitly12:12
songwenping_have not filed a bug.12:13
sean-k-mooneyso i'm not sure we could proceed with any approach that required modification of placement to make it explicitly aware of evacuation12:13
sean-k-mooneybut we might be able to handle the error condition with corrupt data 12:13
sean-k-mooneyso that it would not fail12:13
gibisongwenping_: during evacuation the source node allocation of the VM is kept and the dest node allocation is added to it12:17
gibiso for an evacuated VM you will see allocations on the source and dest nodes _while_ the source compute is down12:18
gibiwhen the source compute node is recovered it will delete the allocation on the source node for the already evacuated VMs12:18
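(To see this state from the operator side, assuming the osc-placement plugin is installed, the consumer's allocations can be listed; the instance uuid is a placeholder. An evacuated VM that has not been cleaned up yet shows usage against two resource providers.)

    openstack resource provider allocation show <instance-uuid>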
sean-k-mooneygibi: for evacuate should we not be using the migration uuid for the evacuation12:19
sean-k-mooneyallocations12:19
sean-k-mooneylike we do for resize12:19
songwenping_gibi: yeah, this is the right workflow, but placement checks the source node capacity before the evacuation.12:20
gibisean-k-mooney: we never switched the evac workflow to migration allocations12:21
gibithere is todo in the code12:21
sean-k-mooneyah ok12:21
sean-k-mooneyi guess that would be the correct way to fix this then12:22
gibisongwenping_: do you have a pointer where nova checks the source node capacity during evac?12:23
sean-k-mooneyit's in placement https://github.com/openstack/placement/blob/master/placement/objects/allocation.py#L12012:23
songwenping_yes, it's placement checking.12:24
gibiwhat placement does during any kind of allocation update, is to replace the existing allocation of the consumer (VM in this case) with the new requested allocation12:25
gibiso if the node is not overallocated then the replace_all should not fail12:26
gibiif you got your node overallocated then I guess the allocation update can fail12:26
gibias placement will not allow you to overallocate12:27
songwenping_yes, i mean why do we check whether the source node is overallocated, as the vm will be evacuated to the dest node.12:27
gibiplacement does not know these things, what placement sees is a request to update the allocation of a consumer 12:27
gibiand the old allocation has resources on the source node, the new allocation has resource on the source node and on the dest node12:27
gibiplacement goes and replaces the whole old allocation with the new allocation12:28
gibiif you got your compute overallocated then simply removing the source allocation and adding it back at this step fails12:28
gibisean-k-mooney: btw, moving to migration allocation will not solve this as there we move the VM allocation to the migration allocation and that move will fail for the same reason12:29
gibiin short, if you overallocated your compute then placement will reject allocation updates on that compute12:29
gibiyou need to resolve the overallocation12:29
sean-k-mooneyit should reject new allocations12:29
gibino12:29
sean-k-mooneybut if we don't change the resources in a current one it should work, no?12:29
gibiplacement implements replace_all for allocation update12:30
gibiso it is a delete + create in the same transaction12:30
gibibut after the delete the compute is full12:30
gibiso create fails12:30
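(A minimal Python sketch of the semantics gibi describes, not the actual placement code in placement/objects/allocation.py; the names and data layout are made up for the illustration. It shows why rewriting even the same allocation fails on an overallocated provider.)

    def capacity_exceeded(total, reserved, allocation_ratio, used, requested):
        # simplified placement-style check: new usage must fit under
        # (total - reserved) * allocation_ratio
        return used + requested > (total - reserved) * allocation_ratio

    def replace_all(old_alloc, new_alloc, providers):
        # simplified view of PUT /allocations/{consumer}: drop the old
        # allocations and write the new ones in one transaction
        for rp, amount in old_alloc.items():
            providers[rp]["used"] -= amount
        for rp, amount in new_alloc.items():
            inv = providers[rp]
            if capacity_exceeded(inv["total"], inv["reserved"],
                                 inv["allocation_ratio"], inv["used"], amount):
                # an overallocated provider fails here even if the amount
                # being re-added equals what was just removed
                raise ValueError("over capacity on %s" % rp)
            inv["used"] += amount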
songwenping_on our production env, for some unknown reason the vm has allocations on two resource providers.12:30
gibisean-k-mooney: I mean I get that it would be nice to detect that the source node allocation does not change during the update and not delete + re-create it, but still placement does not do that logic12:33
sean-k-mooneywell i was hoping that a simple uuid update would not trigger this check12:34
gibisean-k-mooney: there is no way to update a consumer uuid12:34
gibisean-k-mooney: you update the allocation of a consumer or you create / delete consumers12:35
gibithere is no rename consumer12:35
gibiand there is no partial allocation update12:35
gibijust total one12:35
sean-k-mooneyack12:35
gibiprobably the easiest thing is to implement rename consumer, the partial allocation update feels hard12:36
sean-k-mooneyya so we would move the source allocation to the migration uuid and create a new allocation for the vm using its uuid12:37
gibithen we can reimplement the allocation move from VM -> migration with the rename12:37
sean-k-mooneythen have the dest delete the migration allocation after evac12:37
sean-k-mooneywhich will avoid leaking the allocation in placement if the source compute never comes back12:38
sean-k-mooneywell actually no12:38
sean-k-mooneywe have to be careful12:38
sean-k-mooneyto make sure if the evac fails we can just evac again12:38
sean-k-mooneyso the dest vm needs to have the migration uuid12:38
gibiyeah, I feel there is a reason why we kept the source allocation for the source compute to clean up12:38
sean-k-mooneyuntil it succeeds, then we can remove the source vm allocation and rename the migration allocation 12:39
gibihaving the migration uuid to allocate on the test is a surgery as today we just call the scheduler and that always uses the instance uuid to allocate12:39
gibis/test/dest/12:39
gibiall the moves are using the migration uuid on the dest so the scheduler don't have to be branched for moves12:40
gibisorry on the source12:40
sean-k-mooneyok12:40
sean-k-mooneywell we can just use our existing pattern12:40
sean-k-mooneybut rename would make it simpler12:40
gibirename would be needed to solve the above placement-reject-evac-as-source-is-overallocated issue12:41
sean-k-mooneyit might also be useful for blazar12:41
sean-k-mooneythis would obviously be a api change right12:42
gibiyepp12:42
sean-k-mooneytechnically there is no field change and we are just changing from a 400 to a 200 12:42
sean-k-mooneybut i assume that still needs a microversion bump12:42
sean-k-mooneyso not backportable?12:43
gibias the 400 wasn't caused by a bug, transforming that to a 200 is a microversion bump12:43
sean-k-mooneyok so i don't really see a way to fix this in code for existing releases then12:44
sean-k-mooneyoperators will just need to fix the RP inventories12:44
sean-k-mooneye.g. set capacity to max int or something12:44
sean-k-mooneythe compute node would fix it when it started back up12:44
gibibasically the operator needs to resolve the overallocation12:45
sean-k-mooneybut while its down you can use osc to manually update it12:45
gibieither by deleting allocations or by increasing inventory12:45
sean-k-mooneygibi: right but if the host is down they cant really do deletes12:45
gibiture12:45
gibitrue12:45
gibithen change the inventory via OSC12:45
gibithen12:46
sean-k-mooneyyep12:46
gibithat is the way12:46
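(Something along these lines with the osc-placement plugin should work while the compute is down; the uuid and numbers are placeholders and the exact flags should be checked against the osc-placement docs.)

    openstack resource provider inventory class set <rp-uuid> VCPU \
        --total 32 --allocation_ratio 16.0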
gibiand also investigate how you ended up in overallocation12:46
gibias placement should not allow that12:46
sean-k-mooneyit normally happens if you change things like cpu_dedicated_set or the amount of hugepages etc.12:47
sean-k-mooneyor actually more commonly the ram/disk/cpu allocation ratios12:48
sean-k-mooneyi'm sure there are other ways too but i have most often seen it due to operators changing config such that the current vms no longer fit12:49
gibihm, maybe we should add a WARNING for the compute log / placement log if there is overallocation detected so the admin will detect the misconfiguration12:50
sean-k-mooneyto the periodic 12:50
sean-k-mooneyupdate_available_resource when we recalculate the placement update12:51
sean-k-mooneyya we could12:51
sean-k-mooneyi'm not sure how spammy that would be but it does indicate they might need to heal allocations or otherwise investigate why12:52
gibisean-k-mooney: actually placement already has a warning14:07
gibisean-k-mooney: "WARNING placement.objects.resource_provider [None req-6f2253b9-a195-4bf9-8c7e-2a32271a8c0c admin admin] Resource provider 935b9ad6-d7d1-4b5a-bb49-022acbba7c72 is now over-capacity for VCPU"14:07
gibiwhen I set the allocation ratio to lower to induce overallocation14:07
opendevreviewBalazs Gibizer proposed openstack/placement master: DNM: extra logs to troubleshoot overcapacity  https://review.opendev.org/c/openstack/placement/+/80808314:15
sean-k-mooneyah nice15:33
*** abhishekk is now known as abhishekk|afk15:45
sean-k-mooneyalthough if the compute agent is down you might not see that15:47
gibielodilles_pto: there is probably a stable-only bug here https://bugs.launchpad.net/nova/+bug/1941819 but as it probably only affects stein and older, which are in EM, I don't think I will spend time fixing it. Maybe the bug author can try it.17:03
*** efried1 is now known as efried17:30
*** akekane__ is now known as abhishekk17:39
legochenhey nova experts, one question - can someone point me to the best practice for configuring a nova-scheduler filter to distribute VMs to user-specified cabinets? 18:40
dansmithuser-specified, meaning "at boot a user specifies where this should go" ?18:41
dansmithor did you mean user-specific?18:41
legochenuser-specified, meaning "at boot a user specifies where this should go" ? <= yes18:42
dansmithin general this is not a thing nova allows or intends to allow, with one exception: AZs18:42
dansmithusers can choose AZs, so if you want them to be able to choose, make AZs for them to specify18:42
dansmithyou can do an AZ per site, or per aisle, or per rack or something18:43
legochenFor example, I have multiple cabinets in the data center, and users want to distribute their VMs across different cabinets equally in order to avoid a SPOF of the ToR switch or power. 18:43
dansmiththat's what AZs are for18:44
legochenhmm, per cabinet per AZ seems not that reasonable to me :( 18:44
legochenI was thinking of configuring one aggregate per cabinet, and setting a property - cabinet=A for aggregate A, cabinet=B for aggregate B... then users can specify --hint cabinet=A while creating a VM. 18:49
dansmiththat's what AZs are for18:49
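(A sketch of what dansmith suggests, one host aggregate per cabinet exposed as an availability zone; the aggregate, host, and resource names are placeholders.)

    openstack aggregate create --zone cabinet-a cabinet-a
    openstack aggregate add host cabinet-a compute-01
    openstack server create --availability-zone cabinet-a \
        --image <image> --flavor <flavor> --network <net> test-vm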
opendevreviewxiaoxin yang proposed openstack/nova master: Secure boot requires SMM feature enabled  https://review.opendev.org/c/openstack/nova/+/80812619:18
*** slaweq1 is now known as slaweq19:19
