opendevreview | Ghanshyam proposed openstack/nova-specs master: Allow project admin to list hypervisors https://review.opendev.org/c/openstack/nova-specs/+/793011 | 00:06 |
---|---|---|
gmann | dansmith: sean-k-mooney melwitt gibi ^^ please review the 'project admin boot server on host' spec. I have proposed to modify the existing field but if that breaks users I am OK with adding a new field too. | 00:10 |
gmann | gibi: also please remove your procedural -2. | 00:10 |
gibi | gmann: I dropped the -2, I will review the spec later today | 06:22 |
opendevreview | Rajat Dhasmana proposed openstack/nova-specs master: Add spec for volume backed server rebuild https://review.opendev.org/c/openstack/nova-specs/+/809621 | 06:27 |
bauzas | good morning Nova | 07:03 |
*** brinzhang_ is now known as brinzhang | 07:26 | |
*** akekane_ is now known as abhishekk | 07:53 | |
stephenfin | lyarwood: Can I just say that our volume attachment APIs are exceptionally...weird? :-D There doesn't seem to be any way to fetch an attachment by its own ID (you have to use the server and volume IDs), and you need to pass a volume ID when updating an attachment, even if you're only changing the delete on termination behavior | 09:42 |
stephenfin | (the students we're mentoring are running into bugs with openstacksdk caused by, I think, misunderstandings of how that API is supposed to work) | 09:43 |
lyarwood | stephenfin: \o morning (just back online after a few days off sick) | 09:52 |
lyarwood | stephenfin: wait my APIs? ^_^ | 09:52 |
stephenfin | lyarwood: Oh, sorry to hear, hope you're feeling better /o\ | 09:53 |
lyarwood | stephenfin: They pre-date me buddy but you're right that there's no support for Nova's attachment UUID to be used to lookup things | 09:53 |
stephenfin | I wasn't blaming you but if you want to take ownership | 09:53 |
stephenfin | You just seemed like someone that would appreciate such comments 0:) | 09:53 |
lyarwood | stephenfin: Aye it's a valid RFE of sorts | 09:53 |
gibi | it is good to see that stephen is bringing back honest feedback about our API usability. | 10:06 |
gibi | *stephenfin | 10:07 |
gibi | I have no clue who has the time to fix it | 10:07 |
gibi | but still the feedback is appreciated | 10:07 |
stephenfin | It could be something for future students to work on. I'll mention it to diablo_rojo | 10:08 |
sean-k-mooney[m] | im really not sure why we have a volume attachments api in nova itself. to me the attachment is not something we should be exposing to users, we should just have the volume resource associated with the server | 10:09 |
sean-k-mooney[m] | we dont expose port bindings in our api. they exist in neutron but not in nova | 10:10 |
stephenfin | sean-k-mooney[m]: how would propose modifying e.g. the delete on termination behaviour for an attached volume in that scenario | 10:10 |
stephenfin | or doing volume swaps | 10:10 |
stephenfin | *how would you | 10:10 |
sean-k-mooney[m] | it would be an attribute on the volume | 10:10 |
sean-k-mooney[m] | and/or set on the cinder resource | 10:10 |
stephenfin | what about multi-attach volumes? | 10:10 |
sean-k-mooney[m] | not on novas | 10:10 |
sean-k-mooney[m] | you can have volume attachments, they should just not be part of the nova api | 10:11 |
stephenfin | this isn't a cinder thing, right? nova decides whether to delete $thing or not | 10:11 |
sean-k-mooney[m] | nova should have /server/uuid/volumes | 10:11 |
stephenfin | that's what we have :) | 10:11 |
sean-k-mooney[m] | that should just list the volumes not the attachments | 10:12 |
stephenfin | /server/{server_id}/os-volumes | 10:12 |
sean-k-mooney[m] | all other data should be in cinder | 10:12 |
stephenfin | actually, no, tell a lie: the API is '/servers/{server_id}/os-volume_attachments' | 10:13 |
sean-k-mooney[m] | conceptually if you are working with the nova api you should not be thinking in terms of volume attachments but server and volume resources | 10:13 |
stephenfin | https://docs.openstack.org/api-ref/compute/?expanded=list-volume-attachments-for-an-instance-detail#list-volume-attachments-for-an-instance | 10:13 |
sean-k-mooney[m] | really all nova should be tracking in its api is the volume uuid is associated with the server | 10:15 |
sean-k-mooney[m] | perhaps the status of the volume too | 10:15 |
stephenfin | the mount point? | 10:15 |
sean-k-mooney[m] | not the mount point; the tag yes, since that is a nova concept | 10:16 |
sean-k-mooney[m] | we dont actually guarantee the mount point and in libvirt it is just not correct | 10:16 |
sean-k-mooney[m] | if you specify the mount point name there is no way for libvirt to enforce it | 10:17 |
stephenfin | oh, I didn't know that | 10:17 |
stephenfin | we should _probably_ mention that in the API ref | 10:17 |
lyarwood | it is | 10:17 |
lyarwood | https://docs.openstack.org/api-ref/compute/?expanded=attach-a-volume-to-an-instance-detail#attach-a-volume-to-an-instance | 10:18 |
lyarwood | Name of the device such as, /dev/vdb. Omit or set this parameter to null for auto-assignment, if supported. If you specify this parameter, the device must not exist in the guest operating system. Note that as of the 12.0.0 Liberty release, the Nova libvirt driver no longer honors a user-supplied device name. This is the same behavior as if the device name parameter is not supplied on the request. | 10:18 |
stephenfin | lyarwood++ Sweet. I was looking at https://docs.openstack.org/api-ref/compute/?expanded=list-volume-attachments-for-an-instance-detail#list-volume-attachments-for-an-instance and thought it would be mentioned there also | 10:18 |
stephenfin | one place is good enough though, for sure | 10:18 |
lyarwood | and I've wanted to remove it entirely from the response but that's going to take reworking the entire attach flow between the API and compute to drop some useless RPC stuff | 10:19 |
lyarwood | tbh we could also list it in the GET docs | 10:19 |
sean-k-mooney[m] | stephenfin: so ya i dont think we should have the volume attachments api we have currently and i dont think we should mirror that for the manila shares going forward | 10:20 |
lyarwood | sean-k-mooney: I've updated the manila spec FWIW | 10:22 |
sean-k-mooney[m] | just opened it | 10:23 |
sean-k-mooney[m] | ill review it this morning | 10:23 |
sean-k-mooney[m] | i have a doctors appointment in an hour so i might loop back with you later | 10:23 |
sean-k-mooney[m] | did you see my comment regarding the vm memory | 10:24 |
sean-k-mooney[m] | oh you're going to require file backed memory, i almost feel like -2 for that | 10:25 |
lyarwood | yeah I've suggested going with the simple option for now and queuing the image property work for later on | 10:25 |
lyarwood | sure go ahead | 10:25 |
sean-k-mooney[m] | requiring hugepages i could live with | 10:25 |
sean-k-mooney[m] | file backed memory is not something we can schedule on today | 10:25 |
sean-k-mooney[m] | so there is no way to enforce it so the vm will just not be able to access the shares if it lands on a host without it | 10:26 |
lyarwood | well the compute would fail the request at that point | 10:27 |
lyarwood | late on but still | 10:27 |
lyarwood | we wouldn't have the attachment | 10:27 |
sean-k-mooney[m] | you're expecting a build failure. if its like normal vhost user | 10:27 |
sean-k-mooney[m] | it will boot but the connectivity wont work | 10:27 |
lyarwood | well no | 10:27 |
lyarwood | file backed memory is a configurable on the compute right? | 10:28 |
sean-k-mooney[m] | yes | 10:28 |
lyarwood | and with this spec we are only talking about a basic attach share flow | 10:28 |
lyarwood | if we support shelved it would make this harder to assert but either way | 10:28 |
lyarwood | during the attach or boot we'd be able to tell if the compute supported file backed or not | 10:28 |
sean-k-mooney[m] | i guess since this is not boot its not as bad | 10:29 |
lyarwood | if we support shelved that's more awkward, yeah | 10:30 |
sean-k-mooney[m] | well the other issue is | 10:30 |
sean-k-mooney[m] | as a normal user you cant tell if file backed memory is used | 10:31 |
sean-k-mooney[m] | so you dont know if it will work | 10:31 |
lyarwood | Yeah it's awkward for end users, admins would need file backed host aggregates for this to work I guess | 10:32 |
lyarwood | but without the image property stuff this is the best we can do in the short term tbh | 10:32 |
lyarwood | so it's either deliver something this cycle or back it up behind a pile of other work | 10:32 |
sean-k-mooney[m] | yes but the vm when it was booted did not request “must be able to attach shares” in any way | 10:33 |
sean-k-mooney[m] | well lets just say use file backed memory or hugepages | 10:33 |
sean-k-mooney[m] | and i can look at creating the new image/flavor extra spec in a separate spec | 10:34 |
sean-k-mooney[m] | i think it would be a good addition outside of this feature | 10:34 |
sean-k-mooney[m] | lyarwood: hugepages also solve the requirement for vhost-user and are user requestable today via flavor or image | 10:35 |
sean-k-mooney[m] | kashyap by the way do you recall i mentioned that file backed memory seemed to be not allocating memory from the file | 10:36 |
kashyap | sean-k-mooney[m]: Very vaguely :) | 10:36 |
kashyap | sean-k-mooney[m]: Can you refresh my memory, please? Is there a ticket/bug for this? | 10:37 |
sean-k-mooney[m] | kashyap i have been wondering the last day or two could that be related to tb-cache or something similar | 10:37 |
sean-k-mooney[m] | no i just deployed it at home to test it | 10:37 |
kashyap | Interesting. Can you share your guest XML + QEMU command-line to see if I can reproduce it | 10:37 |
sean-k-mooney[m] | then booted vms and could not oversubscribe my ram with OOM | 10:37 |
sean-k-mooney[m] | well ill have to reproduce it myself in a test environment | 10:38 |
sean-k-mooney[m] | but if i do i can share it with you | 10:38 |
sean-k-mooney[m] | lyarwood: are you setting up a deployment with file backed memory for your manila dev? | 10:38 |
lyarwood | yeah I plan to | 10:39 |
sean-k-mooney[m] | ok can you try to reverify the behavior | 10:39 |
* lyarwood nuked his original one last week before going off sick | 10:39 | |
lyarwood | yeah sure | 10:39 |
kashyap | lyarwood: Hope you're hale and hearty now | 10:40 |
kashyap | sean-k-mooney[m]: Yeah, that behaviour does sound like tb-cache thing | 10:40 |
lyarwood | kashyap: yup back to normal now thanks | 10:40 |
gibi | can I get a second core on this bugfix (bauzas and sean-k-mooney[m] are already positive on it) https://review.opendev.org/c/openstack/nova/+/813419 ? | 10:40 |
lyarwood | gibi: queued | 10:40 |
gibi | lyarwood: thanks! I'm glad you are back! | 10:41 |
sean-k-mooney[m] | basically i just tried to boot 6 8G vms on a host with 48G of ram and the 6th one triggered OOM | 10:41 |
sean-k-mooney[m] | gibi: my +1 disappeared at some point but its back on it. the config help text is now better than some of our dedicated docs :) | 10:46 |
gibi | sean-k-mooney[m]: thanks. bauzas pushed me to have proper config doc and even config value validation | 10:46 |
sean-k-mooney[m] | i probably would have skipped the validation because its easy to miss updating that if we add a new type | 10:47 |
sean-k-mooney[m] | but we will probably remember | 10:47 |
sean-k-mooney[m] | on the other hand i am seeing a lot of issues related to edge cases with the network vif plugged events | 10:48 |
sean-k-mooney[m] | which makes me think we need a systematic solution to this problem sooner rather than later | 10:48 |
sean-k-mooney[m] | i might see if i can revive the work to pass the driver from neutron to nova this cycle | 10:49 |
sean-k-mooney[m] | but without neutron telling us when the event is sent i feel like we will continue to have whack-a-mole issues | 10:50 |
gibi | lyarwood: sean-k-mooney[m]: feel free to add me as a reviewer | 11:11 |
lyarwood | sean-k-mooney: https://review.opendev.org/c/openstack/nova/+/811716 - can you also hit this again today please? | 12:28 |
gibi | lyarwood: I'm getting pretty confident that the kernel panic on stable/victoria in nova-live-migration job happens because we are live migrating a guest that is not booted fully up yet. When I added 30 sec sleep before the live migration then the problem disappeared (5/5 runs green) | 13:06 |
gibi | lyarwood: from the console log I see that without the sleep tempest would trigger live migration even 10 seconds before the guest fully booted | 13:07 |
gibi | you can see the run results here https://review.opendev.org/c/openstack/nova/+/817564 | 13:07 |
lyarwood | gibi: kk I was sure I tested my PINGABLE/SSHABLE series against it last week and it still failed | 13:07 |
gibi | so I think your idea to wait for pingable is a good direction | 13:07 |
gibi | hm, interesting | 13:07 |
lyarwood | let me look again | 13:07 |
lyarwood | https://review.opendev.org/c/openstack/nova/+/817636 | 13:08 |
lyarwood | I want to say that was against https://review.opendev.org/c/openstack/tempest/+/817635/2 | 13:08 |
lyarwood | I'm just cleaning the series up again now and can retest | 13:08 |
gibi | lyarwood: I don't see kernel panic in the runs of https://review.opendev.org/c/openstack/nova/+/817636, but there are other errors. Let's re-test it and see where we are | 13:10 |
opendevreview | Lee Yarwood proposed openstack/nova stable/victoria: DNM - Testing volume detach failures https://review.opendev.org/c/openstack/nova/+/817636 | 13:30 |
sean-k-mooney | lyarwood: yes will do | 13:42 |
lyarwood | thanks | 13:43 |
sean-k-mooney | ya ok im +1 on that, ill get to your spec after the call im on is over | 13:48 |
*** akekane_ is now known as abhishekk | 14:00 | |
opendevreview | Artom Lifshitz proposed openstack/nova master: DNM: Test token expiration during live migration https://review.opendev.org/c/openstack/nova/+/817778 | 14:20 |
opendevreview | Artom Lifshitz proposed openstack/nova master: DNM: Test token expiration during live migration https://review.opendev.org/c/openstack/nova/+/817778 | 14:28 |
bauzas | folks, in case you don't know, the OpenInfra keynote is starting in 8 mins | 14:52 |
gmann | gibi: thanks | 15:52 |
gibi | gmann: sorry I had no time to go back and properly review that today | 15:52 |
gibi | :/ | 15:52 |
gmann | no worry. | 15:53 |
whoami-rajat | lyarwood, around? | 16:13 |
lyarwood | whoami-rajat: hey yeah | 16:13 |
lyarwood | on a call but can chat | 16:13 |
whoami-rajat | hey | 16:13 |
whoami-rajat | ack | 16:14 |
whoami-rajat | so i don't have any issue with your suggestion of keeping the tried and tested way of nova doing the attachment update, but the team agreed on the other flow so i don't want to go back and forth | 16:14 |
lyarwood | yeah appreciate that, I wasn't at PTG so wasn't part of the discussions | 16:15 |
lyarwood | but as someone maintaining this area more than most I'm still against passing the connector around like this | 16:16 |
lyarwood | it also keeps the cinder implementation straight forward etc so it's a win win in my view | 16:16 |
lyarwood | if other nova-specs-cores are against this then they can speak out in the spec | 16:17 |
whoami-rajat | ack, makes sense to me as the code becomes easier to maintain and debug that way, not sure how much optimization that one less API call does | 16:18 |
whoami-rajat | i tried discussing the same with other cores in yesterday's nova meeting but we ran out of time | 16:19 |
whoami-rajat | so i will update the spec and see if people are against it and we require further discussion on it | 16:20 |
whoami-rajat | thanks lyarwood for your inputs | 16:22 |
lyarwood | Awesome thanks and yeah agree, it's avoiding a single c-api call from n-cpu but c-vol will still have to do the same work so it's a tiny optimisation | 16:22 |
opendevreview | Artom Lifshitz proposed openstack/nova master: DNM: Test token expiration during live migration https://review.opendev.org/c/openstack/nova/+/817778 | 16:44 |
opendevreview | Artom Lifshitz proposed openstack/nova master: DNM: Test token expiration during live migration https://review.opendev.org/c/openstack/nova/+/817778 | 17:39 |
lyarwood | gibi: https://42950ae1f17575ae9a7c-fe6c968e98fdbd85f0f135fdbc9bd3ed.ssl.cf2.rackcdn.com/817636/2/check/nova-live-migration/66b064b/testr_results.html - so that worked pretty well aside from the SG group DELETE blowing up but I can fix that up | 17:40 |
lyarwood | gibi: had to wait 32 seconds for sshd to start in the instance | 17:40 |
gibi | that seems aligned with the logs I collected. The kernel needed more than 10 seconds to boot up | 17:41 |
gibi | unfortunately there are no timestamps in the cloud init part | 17:41 |
gibi | but dhcp definitely needs seconds to finish | 17:41 |
sean-k-mooney | lyarwood: that being waiting for ssh/ping to work? or something else | 17:50 |
lyarwood | yeah ssh | 17:50 |
sean-k-mooney | cores can restore patches that are abandoned that belong to others right | 17:51 |
sean-k-mooney | i messaged the author via gerrit but if they dont respond in a few days i might ask ye to unabandon a change | 17:52 |
melwitt | sean-k-mooney: yeah cores can restore patches. which patch is it? I can do it | 18:01 |
sean-k-mooney | https://review.opendev.org/c/openstack/nova/+/741529 | 18:02 |
melwitt | done | 18:02 |
sean-k-mooney | thanks | 18:02 |
sean-k-mooney | since it was their idea i want to give credit where credit is due and reuse their patch rather than start a new one | 18:03 |
melwitt | makes sense ++ | 18:03 |
opendevreview | Lee Yarwood proposed openstack/nova-specs master: Repropose Add libvirt support for flavor and image defined ephemeral encryption https://review.opendev.org/c/openstack/nova-specs/+/810868 | 19:36 |
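
The 09:42 to 10:13 exchange about the volume attachment API can be illustrated with a small sketch. The path is the one quoted in the log, '/servers/{server_id}/os-volume_attachments'. The request body shape (the `volumeAttachment`, `volumeId` and `delete_on_termination` keys) is an assumption based on the compute API reference linked above and should be checked against it; the helper names are hypothetical, and no request is actually sent.

```python
import json


def attachment_path(server_id: str, volume_id: str) -> str:
    """Path of one attachment; it is keyed by server ID and volume ID,
    not by the attachment's own UUID (the point raised in the log)."""
    return f"/servers/{server_id}/os-volume_attachments/{volume_id}"


def delete_on_termination_update(server_id: str, volume_id: str, flag: bool):
    """Build (method, path, body) for changing delete-on-termination.

    Assumed body shape: the volume ID is repeated in the body even though
    only the flag changes, which is the oddity discussed above.
    """
    body = {
        "volumeAttachment": {
            "volumeId": volume_id,
            "delete_on_termination": flag,
        }
    }
    return "PUT", attachment_path(server_id, volume_id), json.dumps(body)


if __name__ == "__main__":
    method, path, body = delete_on_termination_update("srv-1", "vol-1", True)
    print(method, path)
    print(body)
```

The sketch only makes the shape of the complaint concrete: the volume ID appears both in the URL and in the body, and the attachment's own ID appears nowhere.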
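
The 13:06 and 17:40 messages describe replacing a fixed 30 second sleep with waiting until the guest is reachable over SSH before live migrating it. A minimal sketch of that idea, assuming a plain TCP connect to the SSH port is an acceptable readiness check, is below. This is not Tempest's implementation; `wait_for_port` and its parameters are hypothetical names used only for illustration.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0,
                  interval: float = 1.0) -> bool:
    """Poll until a TCP connection to (host, port) succeeds.

    Returns True as soon as a connection is accepted, or False once
    `timeout` seconds have passed without a successful connect.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening, e.g. sshd.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable or timed out: wait and try again.
            time.sleep(interval)
    return False


if __name__ == "__main__":
    # Hypothetical guest address; only proceed once sshd answers.
    if wait_for_port("192.0.2.10", 22, timeout=5.0):
        print("guest reachable, safe to start the live migration")
    else:
        print("guest not reachable within the timeout")
```

Compared with a fixed sleep, polling returns as soon as the guest is ready and fails explicitly when it never comes up, which matches the 32 second wait for sshd reported in the log.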