*** dviroel|afk is now known as dviroel|out | 01:26 | |
opendevreview | Merged openstack/cinder master: mypy: set no_implicit_optional https://review.opendev.org/c/openstack/cinder/+/782260 | 03:12 |
opendevreview | Merged openstack/cinder master: Don't destroy existing backup by mistake on import https://review.opendev.org/c/openstack/cinder/+/839451 | 10:07 |
*** dviroel|out is now known as dviroel | 11:00 | |
opendevreview | Stephen Finucane proposed openstack/python-cinderclient master: Deprecate cinder CLI https://review.opendev.org/c/openstack/python-cinderclient/+/841725 | 12:13 |
opendevreview | Stephen Finucane proposed openstack/python-cinderclient master: docs: Update docs to reflect deprecation status https://review.opendev.org/c/openstack/python-cinderclient/+/841726 | 12:13 |
geguileo | rosmaita: whoami-rajat__ I see many commands missing in the table https://docs.openstack.org/python-openstackclient/latest/cli/decoder.html | 12:56 |
geguileo | There is no support for Active-Active, no support for dynamically getting or setting log levels | 12:57 |
geguileo | no way to get manageable lists | 12:57 |
geguileo | 2 quota commands missing | 12:57 |
rosmaita | geguileo: whoami-rajat__: guess we need to update the parity matrix doc | 12:57 |
geguileo | rosmaita: I mean, they are there, there is just no OSC command for them | 12:58 |
geguileo | no revert-to-snapshot support | 12:58 |
geguileo | we would need to go and make sure that we have 100% parity | 12:58 |
geguileo | and document that new features need to be added to OSC | 12:58 |
rosmaita | and i forgot the parity doc was stored on ethercalc, which has recently been decommissioned | 12:58 |
rosmaita | :( | 12:59 |
rosmaita | well, looks like we need a new parity doc, anyway | 12:59 |
geguileo | that page looks pretty good to me | 13:00 |
rosmaita | yes, but we need a gap document to keep track of what needs to be completed | 13:00 |
rosmaita | or however the PTL wants to do it, i'm flexible | 13:01 |
geguileo | we can probably copy/paste that table into a google sheet without any effort | 13:01 |
opendevreview | Stephen Finucane proposed openstack/cinder master: api-ref: Add docs for clusters https://review.opendev.org/c/openstack/cinder/+/795785 | 13:02 |
opendevreview | Stephen Finucane proposed openstack/cinder master: Add Python 3.10 functional jobs https://review.opendev.org/c/openstack/cinder/+/841753 | 13:02 |
opendevreview | Stephen Finucane proposed openstack/cinder master: WIP: tests: Add functional tests for cluster API https://review.opendev.org/c/openstack/cinder/+/841754 | 13:02 |
stephenfin | rosmaita: I split the functional tests out of that patch. They're miles from complete and I'd really like to see _something_ merged here, particularly given we just added support for this stuff to OSC ^ | 13:03 |
stephenfin | geguileo: Are these new commands? I updated that doc only last year https://review.opendev.org/c/openstack/python-openstackclient/+/792946 | 13:05 |
rosmaita | stephenfin: yeah, i got sidetracked while reviewing the doc patch and didn't get back to it ... i hope i have notes somewhere about what i was worried about | 13:05 |
rosmaita | stephenfin: the ones geguileo is talking about are not very new; only new thing added in yoga was volume reimage | 13:06 |
stephenfin | Maybe I missed something but iirc I just ran 'OS_VOLUME_API_VERSION=3.X cinder --help' (whatever X was at the time) and reformatted the output so I could cross-reference | 13:06 |
stephenfin | Hmm, I must have missed something so | 13:07 |
geguileo | stephenfin: the commands are in the doc | 13:07 |
geguileo | stephenfin: what's missing is the OSC command equivalent | 13:07 |
geguileo | rosmaita had said that he thought there was parity between the clients, and I was disagreeing | 13:08 |
stephenfin | Ah, gotcha. I see... | 13:08 |
stephenfin | group-create-from-src,,Creates a group from a group snapshot or a source group. | 13:08 |
stephenfin | manageable-list,,Lists all manageable volumes. | 13:08 |
stephenfin | quota-delete,,Delete the quotas for a tenant. | 13:08 |
stephenfin | quota-usage,,Lists quota usage for a tenant. | 13:08 |
stephenfin | and about 6 others | 13:08 |
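The rows pasted above look like CSV with an empty middle column wherever there is no OSC equivalent. A minimal sketch, assuming the decoder table is kept as `cinder-command,osc-command,description` rows (the column order and the sample `create` row are assumptions for illustration), of listing the remaining gaps with Python's `csv` module:

```python
import csv
import io

# Assumed layout: cinder command, OSC command (empty if missing), description.
# The first row is an illustrative mapped command; the rest are quoted above.
SAMPLE_ROWS = """\
create,volume create,Creates a volume.
group-create-from-src,,Creates a group from a group snapshot or a source group.
manageable-list,,Lists all manageable volumes.
quota-delete,,Delete the quotas for a tenant.
quota-usage,,Lists quota usage for a tenant.
"""


def missing_osc_equivalents(text):
    """Return the cinder commands whose OSC column is empty."""
    missing = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        cinder_cmd, osc_cmd = row[0].strip(), row[1].strip()
        if cinder_cmd and not osc_cmd:
            missing.append(cinder_cmd)
    return missing


print(missing_osc_equivalents(SAMPLE_ROWS))
```

Running the same function over the full decoder data would give the gap list that the parity discussion is asking for.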
stephenfin | geguileo: It's not parity, but it's _really_ close, especially now that the cluster stuff is merging. Minimal effort is needed to close the gap | 13:10 |
geguileo | stephenfin: yes, definitely we are a lot closer than I thought | 13:10 |
geguileo | I would say it justifies merging the cinder client deprecation patch | 13:11 |
geguileo | although I still don't like that OSC is slower when loading, though that's something we can work on | 13:11 |
stephenfin | The stuff that's remaining is mostly stuff I either don't understand (what's a "manageable" volume) or stuff I think might be covered by other commands though I'm not sure (again because I don't understand) | 13:11 |
geguileo | stephenfin: manageable volumes and snapshots are how we can bring volumes that already exist into Cinder | 13:12 |
geguileo | or make cinder "forget" about them | 13:12 |
stephenfin | Yeah, that one's complicated. It's partially because we use entrypoints (though that's got faster thanks to importlib_metadata) and partially because of cmd2 (via cliff) which loads a load of other crap like the GTK stuff for clipboard support | 13:12 |
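One part of the startup cost mentioned above is entry point discovery. A minimal sketch of timing that step with the standard library's `importlib.metadata`; it uses the `console_scripts` group only as a stand-in (substituting one of OSC's own plugin groups is left as an assumption), and it does not measure the cmd2/cliff imports at all. For those, `python -X importtime` gives a per-module breakdown.

```python
import time
from importlib.metadata import entry_points


def time_entry_point_scan(group):
    """Time discovery of all entry points in one group.

    Returns (elapsed_seconds, sorted_names). Handles both the dict-style
    result (Python 3.8/3.9) and the selectable API (Python 3.10+).
    """
    start = time.perf_counter()
    eps = entry_points()
    if hasattr(eps, "select"):
        selected = eps.select(group=group)
    else:
        selected = eps.get(group, [])
    names = sorted(ep.name for ep in selected)
    elapsed = time.perf_counter() - start
    return elapsed, names


elapsed, names = time_entry_point_scan("console_scripts")
print(f"found {len(names)} console_scripts entry points in {elapsed:.4f}s")
```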
geguileo | so if you unmanage a volume then cinder doesn't delete the actual volume in the backend, but marks the row in the DB as deleted | 13:13 |
stephenfin | oh, TIL | 13:13 |
geguileo | and if you manage it, you tell cinder that there's a volume in the array that you want to start managing (be the owner of the volume) | 13:14 |
geguileo | and the listing is so that it's easier to know which volumes are in the array (usually the specific pool) that are not yet managed by Cinder | 13:14 |
geguileo | same thing for snapshots | 13:14 |
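The manage/unmanage semantics described above can be sketched as a toy model. Everything here is hypothetical (`ToyCinder` and its methods are not Cinder's API); it only shows the relationship between what exists on the array, what Cinder manages, and what the manageable listing returns.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCinder:
    """Toy model of manage/unmanage; real Cinder uses its DB and a driver."""

    backend_volumes: set = field(default_factory=set)  # volumes on the array
    managed: set = field(default_factory=set)  # volumes Cinder owns

    def manageable_list(self):
        # Volumes present on the array that Cinder does not manage yet.
        return sorted(self.backend_volumes - self.managed)

    def manage(self, name):
        # Start owning a volume that already exists on the array.
        if name not in self.backend_volumes:
            raise ValueError(f"{name} does not exist on the backend")
        self.managed.add(name)

    def unmanage(self, name):
        # Cinder "forgets" the volume (its DB row is marked deleted),
        # but the volume itself stays on the array.
        self.managed.discard(name)


c = ToyCinder(backend_volumes={"vol-a", "vol-b"})
c.manage("vol-a")
print(c.manageable_list())  # ['vol-b']
c.unmanage("vol-a")
print(c.manageable_list())  # ['vol-a', 'vol-b']
```

Snapshots would follow the same pattern with a separate pair of sets.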
rosmaita | geguileo: take a look at https://review.opendev.org/c/openstack/cinder/+/815660/9#message-001fb6ded8a7fe210b5a061b45197180d72fcd80 , and hit 'recheck' if you feel lucky | 13:33 |
geguileo | rosmaita: I'm feeling lucky | 13:34 |
rosmaita | go for it, man!!! | 13:34 |
rosmaita | geguileo: lmk when you want to lay some CI job improvement ideas on me | 13:38 |
geguileo | rosmaita: we can start now | 13:41 |
geguileo | I believe we have at least 3 different problems: | 13:42 |
geguileo | 1- OOM kill of backup service | 13:42 |
geguileo | 2- timeout of backups | 13:42 |
geguileo | 3- no host found on scheduler | 13:42 |
geguileo | for #1 I believe there is something going on either with Python or we somehow are holding buffers in variables for too long (though I thought I had fixed that) | 13:43 |
geguileo | I wanted to deploy cinder-backup to do a memory profile, but I've been having troubles with the deployment | 13:43 |
geguileo | so we should probably wait a bit to do the analysis, because the backup driver is using swift, and the size of the chunks is 50MB | 13:45 |
geguileo | and there should be no reason why cinder-backup ends up using 4GB before it gets killed | 13:45 |
geguileo | so changing backup_swift_object_size config option probably won't help :-( | 13:46 |
rosmaita | right, i think in the tests it's only doing one backup at a time | 13:46 |
rosmaita | or at most 5 | 13:46 |
geguileo | I've seen a couple happening concurrently | 13:46 |
geguileo | because tempest is running multiple workers in parallel | 13:46 |
rosmaita | nothing that accounts for 4G though at 50MB chunks | 13:46 |
geguileo | exactly | 13:46 |
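A back-of-the-envelope version of the arithmetic above. The 50MB chunk size and the ~4GB kill point come from the discussion; `buffers_per_backup` is a hypothetical assumption (how many chunk-sized buffers one in-flight backup might hold at once), not a measured value.

```python
CHUNK_MB = 50  # swift chunk size mentioned above
OOM_LIMIT_MB = 4 * 1024  # roughly where cinder-backup was OOM killed


def expected_peak_mb(concurrent_backups, buffers_per_backup=2):
    """Rough peak if each backup holds a few chunk-sized buffers (assumed)."""
    return concurrent_backups * buffers_per_backup * CHUNK_MB


print(OOM_LIMIT_MB // CHUNK_MB)  # 81 chunks fit in 4GB
print(expected_peak_mb(5))  # 500 MB for 5 concurrent backups
```

Even with generous assumptions the expected peak is an order of magnitude below the kill point, which is why retained buffers look more likely than the chunk size itself.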
geguileo | so I have to properly investigate it | 13:47 |
geguileo | memory profiling, object relationship analysis, garbage collection status, etc | 13:47 |
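The kind of investigation listed above can start with the standard library's `tracemalloc` and `gc` modules. A minimal sketch; `leaky_chunks` is a hypothetical stand-in for code that keeps chunk buffers referenced, not anything from cinder-backup.

```python
import gc
import tracemalloc


def leaky_chunks(n, size):
    """Hypothetical stand-in for code that keeps chunk buffers alive."""
    kept = []
    for _ in range(n):
        kept.append(bytearray(size))  # still referenced, so never freed
    return kept


tracemalloc.start()
before = tracemalloc.take_snapshot()

held = leaky_chunks(10, 1024 * 1024)  # ten 1 MiB buffers

gc.collect()  # rule out garbage that simply has not been collected yet
after = tracemalloc.take_snapshot()

# Source lines whose allocations grew the most between the two snapshots.
for stat in after.compare_to(before, "lineno")[:3]:
    print(stat)

current, peak = tracemalloc.get_traced_memory()
print(f"current={current / 2**20:.1f} MiB peak={peak / 2**20:.1f} MiB")
tracemalloc.stop()
```

In a real service the two snapshots would be taken around one backup, so any line that still shows growth afterwards points at a buffer that outlives its chunk.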
rosmaita | or somebody does ... maybe we can convince the dev who posted the patch to change object size to help | 13:47 |
geguileo | oh, is there a patch to change the object size? | 13:47 |
* geguileo didn't know | 13:48 | |
rosmaita | i may be confusing cinder with glance, i think there's something posted | 13:48 |
rosmaita | i will look later | 13:48 |
geguileo | in any case, the only default that is big is backup_file_size, but afaik that was not used in the job I saw get OOM killed | 13:48 |
geguileo | for #2 I believe that one of the zuul jobs has changed a default from 300 to 196 or something like that | 13:49 |
geguileo | and it's crazy that it times out at 196 seconds and then the backup completes at 200 seconds... | 13:50 |
geguileo | rofl | 13:50 |
rosmaita | yeah, that's a killer | 13:50 |
geguileo | the default for tempest build_timeout configuration option in the volume group is 300 | 13:51 |
geguileo | so I don't know where that is being changed | 13:51 |
rosmaita | ok, i can look into that | 13:51 |
geguileo | then let me give you some additional info... | 13:51 |
rosmaita | you don't happen to have a link to that job? | 13:51 |
geguileo | on this patch https://review.opendev.org/c/openstack/os-brick/+/836059/3 | 13:52 |
geguileo | this job https://zuul.opendev.org/t/openstack/build/f82bc5be933b4fe9bf2cbca40f141939/logs | 13:52 |
geguileo | this test cinder_tempest_plugin.api.volume.test_volume_backup.VolumesBackupsTest.test_volume_snapshot_backup | 13:52 |
geguileo | backup id 7bae1276-1488-442b-a521-9785c58c75fd | 13:52 |
geguileo | error: backup 7bae1276-1488-442b-a521-9785c58c75fd failed to reach available status (current creating) within the required time (196 s). | 13:52 |
geguileo | start of the request in the backup service https://zuul.opendev.org/t/openstack/build/f82bc5be933b4fe9bf2cbca40f141939/log/controller/logs/screen-c-bak.txt#6661 | 13:53 |
geguileo | end: https://zuul.opendev.org/t/openstack/build/f82bc5be933b4fe9bf2cbca40f141939/log/controller/logs/screen-c-bak.txt#8015 | 13:53 |
rosmaita | thanks! | 13:53 |
geguileo | I looked into the test, the create_backup method, the client, and it uses client.build_timeout as the timeout | 13:54 |
geguileo | I looked at the default and it's 300, so somewhere this has been changed for our tempest jobs | 13:54 |
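The failure above is a polling loop giving up a few seconds before the backup finishes. A simplified sketch of that pattern (not tempest's actual waiter code; `wait_for_status`, `FakeClock` and `fake_backup` are hypothetical), with a fake clock so the 196s vs 300s comparison runs instantly. The error text is modeled on the log line quoted earlier, and the 200s completion time comes from the discussion.

```python
import time


class TimeoutException(Exception):
    pass


def wait_for_status(get_status, resource_id, status, build_timeout=300,
                    build_interval=1, clock=time.monotonic, sleep=time.sleep):
    """Poll until resource_id reaches status or build_timeout seconds pass."""
    start = clock()
    while True:
        current = get_status(resource_id)
        if current == status:
            return current
        if current == "error":
            raise RuntimeError(f"{resource_id} went to error")
        if clock() - start >= build_timeout:
            raise TimeoutException(
                f"{resource_id} failed to reach {status} status (current "
                f"{current}) within the required time ({build_timeout} s).")
        sleep(build_interval)


class FakeClock:
    """Deterministic clock: sleeping just advances the current time."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def fake_backup(clock, done_at=200):
    # A backup that becomes available ~200 s after creation.
    return lambda _id: "available" if clock() >= done_at else "creating"
```

With `build_timeout=196` the fake backup raises `TimeoutException`; with the default 300 it returns `"available"` at t=200, which is the argument for raising the timeout rather than serializing the tests.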
rosmaita | iirc, there may be another fixtures timeout that is kind of hidden | 13:55 |
geguileo | maybe... | 13:55 |
rosmaita | anyway, this is good info to trace through the job definitions and look for something | 13:56 |
geguileo | but for backups I think I would either increase the timeout or make backup tests execute serially | 13:56 |
geguileo | because if they are executed concurrently they may be too slow | 13:56 |
rosmaita | i think i'd be in favor of increasing the timeout, would rather see us test in parallel | 13:57 |
geguileo | me too | 13:58 |
rosmaita | ok, we'll take that approach first | 13:58 |
geguileo | rosmaita: https://zuul.opendev.org/t/openstack/build/f82bc5be933b4fe9bf2cbca40f141939/log/controller/logs/tempest_conf.txt#29 | 13:59 |
geguileo | 196 seconds timeout, I just don't know where that is being set | 13:59 |
rosmaita | cool, i should be able to track that down | 14:00 |
rosmaita | (it's still early in my time zone, i am optimistic this morning) | 14:01 |
geguileo | that's a good way to do Friday | 14:01 |
geguileo | finally, #3: the scheduler's "no host found" issues | 14:03 |
rosmaita | you made some suggestions about this but i have not followed up | 14:04 |
geguileo | yes, I would try increasing LVM's max_over_subscription_ratio to 40 or something like that | 14:06 |
geguileo | in the driver itself | 14:07 |
geguileo | and then in the defaults change backend_stats_polling_interval and periodic_interval both to something like 7 seconds | 14:07 |
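Why raising `max_over_subscription_ratio` can make "no host found" go away on a small thin-provisioned LVM pool: a simplified sketch of a thin-provisioning ratio check. This is an assumption-labeled illustration, not Cinder's actual capacity filter code, and the pool sizes are hypothetical.

```python
def passes_thin_capacity_check(total_gb, provisioned_gb, requested_gb,
                               max_over_subscription_ratio):
    """Simplified check (assumed, not Cinder's filter): the request fits if
    the provisioned-to-total ratio after placing it stays within the limit."""
    if total_gb <= 0:
        return False
    ratio_after = (provisioned_gb + requested_gb) / total_gb
    return ratio_after <= max_over_subscription_ratio


# Hypothetical small pool: 24 GB total, 480 GB already provisioned.
print(passes_thin_capacity_check(24, 480, 10, 20))  # False: ratio ~20.4
print(passes_thin_capacity_check(24, 480, 10, 40))  # True
```

The polling-interval change addresses the other half: the scheduler decides from the last reported stats, so shorter intervals mean provisioned capacity freed by deleted test volumes is seen sooner.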
rosmaita | ok | 14:07 |
rosmaita | which would be a good job to test this out? | 14:07 |
geguileo | I saw #1 and #2 happen on the same job | 14:08 |
geguileo | os-brick-src-tempest-lvm-lio-barbican | 14:09 |
rosmaita | ok | 14:10 |
rosmaita | i saw #3 in one of the ceph jobs, i thought | 14:10 |
raghavendrat | hi, it would be great if someone can have a look at https://review.opendev.org/c/openstack/cinder/+/824911 | 14:19 |
raghavendrat | It has one +2. Thanks. | 14:19 |
opendevreview | Eric Harney proposed openstack/cinder master: Use modern type annotation format for collections https://review.opendev.org/c/openstack/cinder/+/839987 | 14:37 |
*** dviroel is now known as dviroel|lunch | 15:12 | |
*** dviroel|lunch is now known as dviroel | 16:00 | |
opendevreview | Brian Rosmaita proposed openstack/cinder master: Increase swap size to 4GB https://review.opendev.org/c/openstack/cinder/+/841782 | 16:22 |
opendevreview | Rico Lin proposed openstack/cinder master: Add image_conversion_disable config https://review.opendev.org/c/openstack/cinder/+/839793 | 17:00 |
ricolin | rosmaita: updated accordingly, thanks for the nice wording :) | 17:01 |
opendevreview | Merged openstack/cinder master: Replace distutils with packaging in 3rd party drivers https://review.opendev.org/c/openstack/cinder/+/832130 | 19:12 |
*** dviroel is now known as dviroel|out | 21:57 |