zigo | FYI, we were affected by nova-compute flapping, we applied https://review.opendev.org/c/openstack/nova/+/939317 and that solved the issue for us. Thanks to melwitt and others who worked on it. | 07:24 |
---|---|---|
opendevreview | Masahito Muroi proposed openstack/nova master: Use dict object for request_specs_dict in the _list_view https://review.opendev.org/c/openstack/nova/+/939658 | 08:06 |
sean-k-mooney | dansmith: i feel like you will care about https://etherpad.opendev.org/p/nova-2025.2-ptg#L577 | 13:04 |
sean-k-mooney | well, have an opinion | 13:04 |
sean-k-mooney | care implies a positive emotional sentiment that i don't think you will express toward the idea | 13:05 |
sahid | o/ | 13:06 |
sahid | We don't have any logs in the scheduler that give information about how the list is re-ordered by the weighers | 13:07 |
sean-k-mooney | we do at debug | 13:08 |
sean-k-mooney | we have the weights per host | 13:08 |
sean-k-mooney | per weigher | 13:08 |
dansmith | sean-k-mooney: that line is blank, which thing? | 13:36 |
frickler | hmm, the timeslider sadly doesn't show line numbers. the line's at #578 now, though | 14:00 |
sean-k-mooney | dansmith: I would quite like an equivalent to oVirt's VDSM hooking mechanism in Nova. | 14:00 |
sean-k-mooney | i.e. the request to have a way to add arbitrary hooks to modify the libvirt xml without changing nova | 14:01 |
dansmith | sean-k-mooney: okay I figured that might be it :) | 14:01 |
dansmith | bauzas: any chance you could try to review some of this OTU series? https://review.opendev.org/c/openstack/nova/+/944148 | 14:03 |
dansmith | the bottom two are just trivial setup for the larger one above | 14:03 |
bauzas_ | dansmith: I definitely want to | 14:03 |
dansmith | all of them are +2 already, so the bottom two could probably be sent to the gate without a lot of time investment :) | 14:03 |
dansmith | thanks | 14:03 |
sean-k-mooney | dansmith: +2w on the first patch +2 on the second with a question https://review.opendev.org/c/openstack/nova/+/944149/comment/133f1fc5_dabaf92d/ | 14:43 |
sean-k-mooney | i'm going to grab something to drink and kick off a devstack install but i'll try and take a look at the rest when i get back | 14:44 |
dansmith | sean-k-mooney: please see response | 14:44 |
dansmith | https://review.opendev.org/c/openstack/nova/+/943816/12/nova/compute/pci_placement_translator.py specifically L202 here :) | 14:45 |
sahid | sean-k-mooney: We don't have any logs like the ones for filters, e.g. "Filter Computefilter returned 2 host(s)". It would be good to know the weigher as well. For example, I have removed one of them and I have no idea whether it was actually removed or not | 14:51 |
sahid | I can see a line with WeighedHost but no detail about which weigher was used to order the list | 14:52 |
sean-k-mooney | sahid: https://zuul.opendev.org/t/openstack/build/95c4e84b168d4a79ba750797f28a81c7/log/controller/logs/screen-n-sch.txt#1010 | 15:11 |
sean-k-mooney | we have output per weigher per host both pre and post normalization and multiplying | 15:12 |
sean-k-mooney | sahid: again only at debug level like the filters | 15:12 |
sean-k-mooney | grep for "DEBUG nova.weights" in your logs | 15:13 |
sean-k-mooney | https://github.com/openstack/nova/blob/9d910ec4bf2a12baf3b5f0ec3bc41686413538fb/nova/weights.py#L136-L170 | 15:15 |
sean-k-mooney | sahid: it was added 4 years ago https://github.com/openstack/nova/commit/154ab7b2f9ad80fe432d2c036d5e8c4ee171897b | 15:16 |
sean-k-mooney | dansmith: that's not what i was asking, let me restate it | 15:17 |
dansmith | oh? okay | 15:17 |
sean-k-mooney | dansmith: do we need to invalidate the cache for all pci in placement RPs or just the ones that are also marked OTU | 15:17 |
sean-k-mooney | for normal devices we do not expect the reserved to change | 15:18 |
dansmith | sean-k-mooney: ah, so if you look at some of gibi's comments on the commit message, he noted that PCI-in-placement specifically calls out reserved as a thing that could be changed by the operator directly in placement, | 15:19 |
dansmith | and as such it's probably good that we invalidate our cache for all of those to make sure we're seeing the right info, even though technically only OTUs care at the moment | 15:19 |
sean-k-mooney | oh it does? ok i don't know why but if that's the existing behavior then the code you wrote makes sense | 15:19 |
dansmith | sean-k-mooney: yeah he provided a link to a comment in the code that specifically says that, and asked for me to add words to the commit message about it | 15:20 |
sean-k-mooney | i don't know what the original use case for allowing the reserved to be changed was but if that's the case then i'll upgrade to +w | 15:20 |
dansmith | so we _could_ potentially follow up with this and use the OTU trait as well to control invalidation, but I think he would argue it's best if we just don't so that future developers are clear that we won't get stuck in the same way I initially did until I added this :) | 15:21 |
sean-k-mooney | ah https://review.opendev.org/c/openstack/nova/+/944149/6//COMMIT_MSG#15 | 15:21 |
sean-k-mooney | ok i missed that when i looked at the commit | 15:21 |
gibi | sean-k-mooney: the idea behind it is that nova should own only as small a part of the inventory as it cares about. Before OTU it only cared about total and max_unit (as it is mandatory), so the rest of the fields were never set in the inventory of a PCI RP | 15:22 |
gibi | making it possible to set externally | 15:22 |
sean-k-mooney | dansmith: well as we discussed a few weeks ago i'm not really sure why we are caching at all | 15:22 |
sean-k-mooney | so im all for being safe with correctness here | 15:22 |
sean-k-mooney | even if it might be slightly slower | 15:22 |
sean-k-mooney | gibi: right and nova cares about reserved even without OTU | 15:23 |
sean-k-mooney | we report the inventory based on the devspec | 15:23 |
dansmith | well, for a big tree of lots of providers it probably is a real gain, but brings problems like this (as does all caching of course) | 15:23 |
sean-k-mooney | so you should never be changing reserved since that will make placement and the pci devices table be sort of out of sync | 15:23 |
sean-k-mooney | anyway, i don't want to boil the ocean, i was just wondering why we were not checking for "is pci in placement and one time use" rather than just "is pci in placement" | 15:24 |
dansmith | well, that patch is before OTU, so that's part of why :) | 15:25 |
sean-k-mooney | ya i was originally wondering if that was the reason and if you were going to modify it again in the next patch | 15:26 |
dansmith | nova is not currently reporting anything reserved for pci-in-placement providers, AFAIK (other than OTU in the next patch), so I think gibi's point is that if you want to reserve some VFs you would do it in placement directly | 15:26 |
dansmith | sean-k-mooney: yeah I understand now, I just didn't get that from your comment :) | 15:26 |
sean-k-mooney | no worries. so +2w on the first too. ill take a look at the rest once i have devstack deploying | 15:27 |
dansmith | thanks | 15:28 |
sean-k-mooney | dansmith: as an aside, you have a vexxhost label created, have you tried to use this in ci yet? | 15:33 |
sean-k-mooney | so the host i'm using is not the one that has igb support. you said in your testing the rng didn't crash the host vm? | 15:40 |
sean-k-mooney | if so i'll see if i can play with that as well | 15:40 |
dansmith | (on a call) | 15:49 |
melwitt | zigo: just fyi that we had a partial regression involving mdevs with that patch which was tracked and fixed here https://bugs.launchpad.net/nova/+bug/2098892 | 16:03 |
sean-k-mooney | dansmith: not urgent but you have a trivial bug in the unit test. https://review.opendev.org/c/openstack/nova/+/943816/comment/4d20bb05_6116bff7/ it does not actually break anything so it can be a follow up | 16:21 |
dansmith | sean-k-mooney: ah yeah will fix in a follow up.. | 16:26 |
sean-k-mooney | you're not actually building it into a tree or populating the pci tracker so technically it won't break anything. it just will be confusing if we read it in like a year so ya no need to respin but nice to fix | 16:27 |
dansmith | yep, spinning a follow-up now | 16:28 |
dansmith | sean-k-mooney: are you holding the +W until you test locally? | 16:29 |
dansmith | oh nm :) | 16:29 |
sean-k-mooney | i was until my devstack failed for the 3rd time | 16:30 |
dansmith | heh | 16:30 |
sean-k-mooney | there is a patch sitting in ci to make our new prometous plugin not fail if /etc/promethous exists and i keep forgetting i need to delete the dir before i restack | 16:33 |
sean-k-mooney | i should just checkout the fixed version but im manually testing a docs change for the default local.conf | 16:34 |
sean-k-mooney | * default of watcher + promethous | 16:36 |
* sean-k-mooney cant spell that and its annoying how much i have to type it now | 16:36 | |
melwitt | p8s 😛 | 16:39 |
sean-k-mooney | i rather unofrtunetly misclicked yesterday and added a commone misspelling of approch to my firefox? or garmmerly dictonatry instead of fixing it | 16:41 |
sean-k-mooney | i need to go fix that before i forget | 16:42 |
sean-k-mooney | intorspeciton that is also not a correct spelling.... how do i spellcheck my dictionary | 16:43 |
sean-k-mooney | i guess google | 16:43 |
sean-k-mooney | this is a very sean problem | 16:44 |
dansmith | heh | 16:44 |
sean-k-mooney | skipable also has too Ps | 16:45 |
melwitt | you're on a roll | 16:45 |
sean-k-mooney | i have custom dictionaries for jagon like podman or mixed-rhel | 16:46 |
sean-k-mooney | but i really shoudl just delete anything that a real word because it proably wrong | 16:46 |
sean-k-mooney | sudo mkdir /etc/prometheus | 16:48 |
sean-k-mooney | mkdir: cannot create directory '/etc/prometheus': File exists | 16:48 |
sean-k-mooney | ... attempt 5 | 16:48 |
sean-k-mooney | this time i'm just going to use the fixed version | 16:48 |
sahid | sean-k-mooney: thanks for the reference :-) we are running a version a bit too old I guess... | 16:48 |
sean-k-mooney | sahid: its a pretty trivial backport if thats an option for you | 16:49 |
sahid | no I don't think we need to backport it. we don't run in debug anyway, it was more about proving the change on UAT | 16:53 |
sahid | thank you sean-k-mooney | 16:53 |
-opendevstatus- NOTICE: The Gerrit service on review.opendev.org will be offline momentarily for a patch release update | 17:17 | |
sean-k-mooney | just as an fyi if you end up with passt or pasta installed on your system because of docker/podman or i think technically libvirt? not sure | 17:57 |
sean-k-mooney | https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/2065685/comments/7 | 17:57 |
sean-k-mooney | you may need to manually fix apparmor to fix your devstack | 17:58 |
sean-k-mooney | the issue is fixed just not on 24.04... | 17:58 |
sean-k-mooney | finally | 18:01 |
sean-k-mooney | Invalid [pci]device_spec config: one_time_use=true requires pci.report_in_placement to be enabled | 18:08 |
sean-k-mooney | well at least i confirmed that works | 18:08 |
sean-k-mooney | right... AttributeError: module 'os_traits' has no attribute 'HW_PCI_ONE_TIME_USE' | 18:09 |
sean-k-mooney | dansmith: so the db and placement all look good https://paste.opendev.org/show/bpBlYPAE5cvPwAK1pOuS/ i'm quite tired but this is almost set up so i'm going to quickly create an alias and flavor and see if i can boot a vm and delete it and reset it | 18:16 |
sean-k-mooney | then ill call it a day and rest | 18:17 |
dansmith | sean-k-mooney: cool | 18:22 |
sean-k-mooney | ... Uggla: the new alias examples we added for live-migratable devices are wrong | 18:29 |
sean-k-mooney | Uggla: you used `'` not `"` | 18:29 |
sean-k-mooney | they are json so they have to be double quotes | 18:29 |
dansmith | ooh, yeah I see | 18:30 |
sean-k-mooney | you get a really unintuitive error | 18:31 |
sean-k-mooney | BadRequestException: 400: Client Error for url: http://192.168.16.185/compute/v2.1/servers, Invalid PCI alias definition: Expecting property name enclosed in double quotes: line 1 column 2 (char 1) | 18:31 |
sean-k-mooney | i mean it makes sense but i don't expect end users to get that exception | 18:31 |
sean-k-mooney | when it's an error in your alias in the config file | 18:31 |
sean-k-mooney | its really a server side error not a client side one | 18:32 |
dansmith | ah yeah, exposing that to the user is probably not a great idea either | 18:33 |
sean-k-mooney | it might be because i did that as admin | 18:33 |
sean-k-mooney | ill check that after im done | 18:34 |
sean-k-mooney | https://paste.opendev.org/show/bwJuKNE7VW3NvAjnxiWr/ | 18:36 |
sean-k-mooney | so cool i was able to boot a vm | 18:36 |
sean-k-mooney | it reserved the device | 18:37 |
sean-k-mooney | and after deleting it i now get a no valid host | 18:37 |
dansmith | \o/ | 18:38 |
sean-k-mooney | and cool i can reset it and boot again | 18:40 |
dansmith | \o\ /o/ \o/ | 18:40 |
sean-k-mooney | i'm too tired this evening to review the final functional test patch but i'll do that on monday | 18:41 |
sean-k-mooney | i was testing with the reno patch just before it | 18:41 |
dansmith | no worries, good to have another confirmation, thanks | 18:41 |
dansmith | rest up for next week ;) | 18:41 |
sean-k-mooney | o/ | 18:43 |
dansmith | gmann: melwitt: this recently-added test seems to be failing a *lot* https://review.opendev.org/c/openstack/tempest/+/858885 | 19:03 |
dansmith | and it seems to be failing because the swap devices don't actually show up or disappear in the guest | 19:04 |
dansmith | probably because it doesn't wait for SSHABLE I'm guessing | 19:05 |
sean-k-mooney | that should not matter for resize since there is no hotplug | 19:05 |
sean-k-mooney | but i don't think it will auto swap on | 19:05 |
sean-k-mooney | so depending on what the test is doing it might not be stable | 19:06 |
sean-k-mooney | hum blkid looking for type swap | 19:06 |
sean-k-mooney | do we format the disk with a swap partition? | 19:07 |
dansmith | what I mean is, I'm not sure it's waiting for the instance to have actually been resized | 19:07 |
sean-k-mooney | oh | 19:07 |
sean-k-mooney | well that's potentially different | 19:07 |
dansmith | I guess the underlying stuff waits for the verify stage, so perhaps not | 19:08 |
dansmith | either way, seems to be quite flaky | 19:08 |
melwitt | sean-k-mooney: swap is formatted | 19:08 |
sean-k-mooney | ack | 19:08 |
sean-k-mooney | i thought it was but was not sure | 19:08 |
sean-k-mooney | i'm not sure why there are reboots in those tests | 19:08 |
dansmith | me either, but it's failing before the reboot | 19:09 |
sean-k-mooney | i get why it's creating a flavor but i'm not sure why that's part of the resize helper | 19:09 |
sean-k-mooney | i mean that's fine, just not what i expect from the function name | 19:10 |
dansmith | I also don't think the 2048_to_1024 test is doing anything particularly useful since it's not asserting the size of the device | 19:10 |
sean-k-mooney | https://github.com/openstack/tempest/blob/b1e168015316f3f73131957e9fb6abfd2fdc20f1/tempest/api/compute/base.py#L441-L454 should be confirming as you said | 19:11 |
dansmith | yeah | 19:11 |
sean-k-mooney | dansmith: these look like tests that were partly ported from the functional tests for the same thing | 19:12 |
sean-k-mooney | but those had more asserts | 19:12 |
dansmith | either way, I wonder if we should fast revert that.. it's been merged for less than 24 hours | 19:13 |
sean-k-mooney | probably, if it's flaky. i saw the ping to review a few days ago but didn't really have time to look at it | 19:15 |
dansmith | proposed https://review.opendev.org/c/openstack/tempest/+/946379 | 19:16 |
sean-k-mooney | its the remote command | 19:18 |
dansmith | looks like all the multicell jobs are failing this since it merged (which is only three amazingly) https://zuul.opendev.org/t/openstack/builds?job_name=nova-multi-cell&skip=0 | 19:18 |
sean-k-mooney | [20:16:32]❯ blkid | 19:18 |
sean-k-mooney | /dev/nvme0n1p3: UUID="9e5b21f3-dd4d-425a-b309-dcfbf585ad16" TYPE="crypto_LUKS" PARTUUID="82a4c723-1748-43f6-b82d-f051b8d3a6c5" | 19:19 |
sean-k-mooney | /dev/nvme0n1p1: UUID="F3DE-29C6" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="7208aa8f-8f44-432c-8163-693e4c1dcb8b" | 19:19 |
sean-k-mooney | /dev/nvme0n1p2: UUID="f84e29ae-bdd6-45a2-9590-f7d9fc71f90c" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="b2216aac-c4e1-4cf1-930b-37a879cb1cd3" | 19:19 |
sean-k-mooney | /dev/mapper/nvme0n1p3_crypt: UUID="0ebbc49c-51a0-4b8c-bdc1-702bf9d178b7" UUID_SUB="83086ee4-93b7-49c7-9915-0b8da9589da6" BLOCK_SIZE="4096" TYPE="btrfs" | 19:19 |
sean-k-mooney | ~ via 🐹 v1.21.11 via 🐍 v3.11.11 | 19:19 |
sean-k-mooney | [20:16:35]➜ lsblk | 19:19 |
sean-k-mooney | NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS | 19:19 |
sean-k-mooney | loop0 7:0 0 76.8M 1 loop | 19:19 |
sean-k-mooney | loop1 7:1 0 76.8M 1 loop | 19:19 |
sean-k-mooney | zram0 252:0 0 15.6G 0 disk [SWAP] | 19:19 |
sean-k-mooney | nvme0n1 259:0 0 953.9G 0 disk | 19:19 |
sean-k-mooney | ├─nvme0n1p1 259:1 0 1.1G 0 part /boot/efi | 19:19 |
sean-k-mooney | ├─nvme0n1p2 259:2 0 14.9G 0 part /boot | 19:19 |
sean-k-mooney | └─nvme0n1p3 259:3 0 937.9G 0 part | 19:19 |
sean-k-mooney | └─nvme0n1p3_crypt 253:0 0 937.8G 0 crypt /var/lib/docker/btrfs | 19:19 |
sean-k-mooney | /home | 19:19 |
sean-k-mooney | /swap | 19:19 |
sean-k-mooney | / | 19:19 |
sean-k-mooney | paste.o.o is down | 19:19 |
sean-k-mooney | blkid with no args does not always show swap | 19:19 |
sean-k-mooney | ah | 19:20 |
sean-k-mooney | so with sudo the swap disk shows up | 19:20 |
sean-k-mooney | but not without | 19:20 |
sean-k-mooney | at least initially | 19:21 |
sean-k-mooney | once i run blkid with sudo the non sudo one has the swap partition too | 19:21 |
sean-k-mooney | so it must be caching or something like that | 19:21 |
dansmith | I suppose I won't recheck my patches since this has failed twice in a row on the same thing | 19:23 |
sean-k-mooney | https://paste.centos.org/view/ebce235e | 19:23 |
dansmith | yeah weird, but... busybox | 19:24 |
sean-k-mooney | well the last two are the gnu versions | 19:24 |
sean-k-mooney | busybox apparently does not show anything as non root | 19:24 |
sean-k-mooney | which could also be the problem here since cirros is using busybox | 19:25 |
sean-k-mooney | which is why i checked it | 19:25 |
dansmith | makes me think that this test wasn't actually running in whatever tempest configs are required to merge | 19:25 |
dansmith | anyway, nothing much I can do about it now I guess | 19:27 |
dansmith | I'mma go get food | 19:27 |
sean-k-mooney | did you hit the revert button | 19:28 |
dansmith | pasted above: https://review.opendev.org/c/openstack/tempest/+/946379 | 19:28 |
sean-k-mooney | ack | 19:28 |
sean-k-mooney | ok commented on that too and now i need to go before the shops start to close o/ | 19:31 |
gmann | dansmith: ok, I did not check the multicell job but it was passing on the regular one. agree to merge the revert first and then we can fix the test | 19:42 |
dansmith | gmann: I'm really not sure why the multicell job would be special here, almost like it is missing a regex exclude or something | 20:11 |
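
Note on the alias quoting issue discussed at 18:29-18:31: the error text quoted there ("Expecting property name enclosed in double quotes") is the message Python's standard `json` module raises for a single-quoted object, so the behaviour can be reproduced outside nova. The sketch below is a minimal illustration only. The alias fields and values are hypothetical, and `parse_alias` is a helper made up for this example, not nova's parsing code.

```python
import json

# Hypothetical alias values for illustration only; not taken from the log.
single_quoted = "{'name': 'a1', 'vendor_id': '8086'}"
double_quoted = '{"name": "a1", "vendor_id": "8086"}'


def parse_alias(raw):
    """Return (parsed_dict, None) on success or (None, error_message) on failure."""
    try:
        return json.loads(raw), None
    except json.JSONDecodeError as exc:
        # JSON only accepts double-quoted strings, so single quotes fail here.
        return None, str(exc)


print(parse_alias(single_quoted))
print(parse_alias(double_quoted))
```

Running a check like this against an alias string before putting it in the config file would surface the quoting mistake before any API request hits it.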
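
Note on the weigher logging discussed at 15:11-15:13: the advice to grep for "DEBUG nova.weights" can be sketched in Python as a plain substring filter. This is only an illustration under stated assumptions. The function name is hypothetical, and the sample lines are invented placeholders, not real nova-scheduler output.

```python
def weigher_debug_lines(lines, marker="DEBUG nova.weights"):
    """Keep only the lines that contain the marker, like a fixed-string grep."""
    return [line.rstrip("\n") for line in lines if marker in line]


# Placeholder lines, NOT real nova-scheduler output.
sample = [
    "INFO nova.scheduler placeholder message\n",
    "DEBUG nova.weights placeholder weigher message\n",
    "DEBUG nova.filters placeholder filter message\n",
]

for line in weigher_debug_lines(sample):
    print(line)
```

Because the function accepts any iterable of strings, an open file object for the scheduler log can be passed in directly. As noted in the conversation, these messages only appear when the scheduler runs at debug level.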