Friday, 2022-05-13

*** dviroel|afk is now known as dviroel|out01:26
opendevreviewMerged openstack/cinder master: mypy: set no_implicit_optional  https://review.opendev.org/c/openstack/cinder/+/78226003:12
opendevreviewMerged openstack/cinder master: Don't destroy existing backup by mistake on import  https://review.opendev.org/c/openstack/cinder/+/83945110:07
*** dviroel|out is now known as dviroel11:00
opendevreviewStephen Finucane proposed openstack/python-cinderclient master: Deprecate cinder CLI  https://review.opendev.org/c/openstack/python-cinderclient/+/84172512:13
opendevreviewStephen Finucane proposed openstack/python-cinderclient master: docs: Update docs to reflect deprecation status  https://review.opendev.org/c/openstack/python-cinderclient/+/84172612:13
geguileorosmaita: whoami-rajat__ I see many commands missing in the table https://docs.openstack.org/python-openstackclient/latest/cli/decoder.html12:56
geguileoThere is no support for Active-Active, no support for dynamically getting or setting log levels12:57
geguileono way to get manageable lists12:57
geguileo2 quota commands missing12:57
rosmaitageguileo: whoami-rajat__: guess we need to update the parity matrix doc12:57
geguileorosmaita: I mean, they are there, there is just no OSC command for them12:58
geguileono revert-to-snapshot support12:58
geguileowe would need to go and make sure that we have 100% parity12:58
geguileoand document that new features need to be added to OSC12:58
rosmaitaand i forgot the parity doc was stored on ethercalc, which has recently been decommissioned12:58
rosmaita:(12:59
rosmaitawell, looks like we need a new parity doc, anyway12:59
geguileothat page looks pretty good to me13:00
rosmaitayes, but we need a gap document to keep track of what needs to be completed13:00
rosmaitaor however the PTL wants to do it, i'm flexible13:01
geguileowe can probably copy/paste that table into a google sheet without any effort13:01
opendevreviewStephen Finucane proposed openstack/cinder master: api-ref: Add docs for clusters  https://review.opendev.org/c/openstack/cinder/+/79578513:02
opendevreviewStephen Finucane proposed openstack/cinder master: Add Python 3.10 functional jobs  https://review.opendev.org/c/openstack/cinder/+/84175313:02
opendevreviewStephen Finucane proposed openstack/cinder master: WIP: tests: Add functional tests for cluster API  https://review.opendev.org/c/openstack/cinder/+/84175413:02
stephenfinrosmaita: I split the functional tests out of that patch. They're miles from complete and I'd really like to see _something_ merged here, particularly given we just added support for this stuff to OSC ^13:03
stephenfingeguileo: Are these new commands? I updated that doc only last year https://review.opendev.org/c/openstack/python-openstackclient/+/79294613:05
rosmaitastephenfin: yeah, i got sidetracked while reviewing the doc patch and didn't get back to it ... i hope i have notes somewhere about what i was worried about13:05
rosmaitastephenfin: the ones geguileo is talking about are not very new; only new thing added in yoga was volume reimage13:06
stephenfinMaybe I missed something but iirc I just ran 'OS_VOLUME_API_VERSION=3.X cinder --help' (whatever X was at the time) and reformatted the output so I could cross-reference13:06
stephenfinHmm, I must have missed something so13:07
geguileostephenfin: the commands are in the doc13:07
geguileostephenfin: what's missing is the OSC command equivalent13:07
geguileorosmaita: had said that he thought there was parity between the clients, and I was disagreeing13:08
stephenfinAh, gotcha. I see...13:08
stephenfingroup-create-from-src,,Creates a group from a group snapshot or a source group.13:08
stephenfinmanageable-list,,Lists all manageable volumes.13:08
stephenfinquota-delete,,Delete the quotas for a tenant.13:08
stephenfinquota-usage,,Lists quota usage for a tenant.13:08
stephenfinand about 6 others13:08
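The rows pasted above appear to be lines from the decoder table in the form `cinder command,OSC command,description`. An empty middle field means there is no OSC equivalent yet. The following is a minimal sketch for listing such gaps. It assumes that three-column layout; the actual source file and its column order are not confirmed here. SAMPLE is copied from the lines above, plus one illustrative row that does have an OSC mapping.

```python
from __future__ import annotations

import csv
import io

# Assumed layout: cinder_command,osc_command,description
# (an empty osc_command means there is no OSC equivalent yet).
SAMPLE = """\
create,volume create,Creates a volume.
group-create-from-src,,Creates a group from a group snapshot or a source group.
manageable-list,,Lists all manageable volumes.
quota-delete,,Delete the quotas for a tenant.
quota-usage,,Lists quota usage for a tenant.
"""


def find_parity_gaps(text: str) -> list[str]:
    """Return the cinder commands whose OSC column is empty."""
    gaps = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        cinder_cmd, osc_cmd = row[0].strip(), row[1].strip()
        if cinder_cmd and not osc_cmd:
            gaps.append(cinder_cmd)
    return gaps


if __name__ == "__main__":
    for cmd in find_parity_gaps(SAMPLE):
        print(cmd)
```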
stephenfingeguileo: It's not parity, but it's _really_ close, especially now that the cluster stuff is merging. Minimal effort is needed to close the gap13:10
geguileostephenfin: yes, definitely we are a lot closer than I thought13:10
geguileoI would say it justifies merging the cinder client deprecation patch13:11
geguileoalthough I still don't like that OSC is slower when loading, though that's something we can work on13:11
stephenfinThe stuff that's remaining is mostly stuff I either don't understand (what's a "manageable" volume) or stuff I think might be covered by other commands though I'm not sure (again because I don't understand)13:11
geguileostephenfin: manageable volumes and snapshots are the mechanism for bringing volumes that already exist into Cinder13:12
geguileoor make cinder "forget" about them13:12
stephenfinYeah, that one's complicated. It's partially because we use entrypoints (though that's got faster thanks to importlib_metadata) and partially because of cmd2 (via cliff) which loads a load of other crap like the GTK stuff for clipboard support13:12
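You can measure how much of the OSC start-up cost comes from entry-point discovery and how much from plugin imports by timing the two phases separately with `importlib.metadata`. This is a minimal sketch. The default group name `openstack.cli.extension` is an assumption about where OSC plugins register, and you can override it on the command line.

```python
from __future__ import annotations

import sys
import time
from importlib import metadata


def entry_points_for(group: str) -> list:
    """Return the entry points registered under `group`."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):  # Python 3.10+ selectable API
        return list(eps.select(group=group))
    return list(eps.get(group, []))  # Python 3.8/3.9: dict of group -> entries


def time_group(group: str) -> tuple[int, float, float]:
    """Time discovery and loading of one entry-point group.

    Returns (count, discovery_seconds, load_seconds).
    """
    start = time.perf_counter()
    eps = entry_points_for(group)
    discovered = time.perf_counter()
    for ep in eps:
        try:
            ep.load()  # imports the module that provides the plugin
        except Exception:
            pass  # a broken plugin should not abort the measurement
    loaded = time.perf_counter()
    return len(eps), discovered - start, loaded - discovered


if __name__ == "__main__":
    # Assumption: OSC plugins register under "openstack.cli.extension".
    group = sys.argv[1] if len(sys.argv) > 1 else "openstack.cli.extension"
    count, disc, load = time_group(group)
    print(f"{group}: {count} entry points, discovery {disc:.3f}s, load {load:.3f}s")
```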
geguileoso if you unmanage a volume then cinder doesn't delete the actual volume in the backend, but marks the row in the DB as deleted13:13
stephenfinoh, TIL13:13
geguileoand if you manage it, you tell cinder that there's a volume in the array that you want to start managing (be the owner of the volume)13:14
geguileoand the listing is so that it's easier to know which volumes are in the array (usually the specific pool) that are not yet managed by Cinder13:14
geguileosame thing for snapshots13:14
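The manage/unmanage semantics described above can be pictured with a toy model that tracks the array's contents separately from Cinder's DB rows. Everything here (`ToyCinder` and its methods) is hypothetical. It only mirrors the behaviour as explained in the chat, not Cinder's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyCinder:
    """Hypothetical model: `backend` is what exists on the array/pool,
    `db` maps volume name -> "deleted" flag for rows Cinder knows about."""

    backend: set[str] = field(default_factory=set)
    db: dict[str, bool] = field(default_factory=dict)

    def managed(self) -> set[str]:
        return {name for name, deleted in self.db.items() if not deleted}

    def manage(self, name: str) -> None:
        # Take ownership of a volume that already exists on the array.
        if name not in self.backend:
            raise ValueError(f"{name} does not exist on the backend")
        self.db[name] = False

    def unmanage(self, name: str) -> None:
        # Cinder "forgets" the volume: the DB row is marked deleted,
        # but the volume on the array is left untouched.
        if name not in self.managed():
            raise ValueError(f"{name} is not managed")
        self.db[name] = True

    def manageable_list(self) -> list[str]:
        # Volumes present on the array that Cinder does not manage.
        return sorted(self.backend - self.managed())
```

Unmanaging only flips the DB row. The volume therefore reappears in the manageable list and can be managed again.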
rosmaitageguileo: take a look at https://review.opendev.org/c/openstack/cinder/+/815660/9#message-001fb6ded8a7fe210b5a061b45197180d72fcd80 , and hit 'recheck' if you feel lucky13:33
geguileorosmaita: I'm feeling lucky13:34
rosmaitago for it, man!!!13:34
rosmaitageguileo: lmk when you want to lay some CI job improvement ideas on me13:38
geguileorosmaita: we can start now13:41
geguileoI believe we have at least 3 different problems:13:42
geguileo1- OOM kill of backup service13:42
geguileo2- timeout of backups13:42
geguileo3- no host found on scheduler13:42
geguileofor #1 I believe there is something going on either with Python or we are somehow holding buffers in variables for too long (though I thought I had fixed that)13:43
geguileoI wanted to deploy cinder-backup to do a memory profile, but I've been having trouble with the deployment13:43
geguileoso we should probably wait a bit to do the analysis, because the backup driver is using swift, and the size of the chunks is 50MB13:45
geguileoand there should be no reason why cinder-backup ends up using 4GB before it gets killed13:45
geguileoso changing the backup_swift_object_size config option probably won't help :-(13:46
rosmaitaright, i think in the tests it's only doing one backup at a time13:46
rosmaitaor at most 513:46
geguileoI've seen a couple happening concurrently13:46
geguileobecause tempest is running multiple workers in parallel13:46
rosmaitanothing that accounts for 4G though at 50MB chunks13:46
geguileoexactly13:46
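Here is a back-of-the-envelope check of why 50MB chunks should not add up to 4GB. The inputs are assumptions, not measurements: 5 concurrent backups (rosmaita's upper bound), 50MB swift objects, and a guessed 2 chunk-sized buffers alive per backup.

```python
def backup_buffer_estimate_mb(concurrent_backups: int, chunk_mb: int,
                              buffers_per_backup: int) -> int:
    """Rough upper bound on chunk buffers held at once across all backups."""
    return concurrent_backups * chunk_mb * buffers_per_backup


# Assumptions (not measured): at most 5 concurrent backups, 50 MB swift
# objects, and 2 chunk-sized buffers alive per backup (for example a raw
# chunk plus its compressed copy).
estimate_mb = backup_buffer_estimate_mb(5, 50, 2)
oom_mb = 4 * 1024  # roughly where the backup service was OOM killed
print(f"expected ~{estimate_mb} MB, observed ~{oom_mb} MB "
      f"({oom_mb / estimate_mb:.1f}x more)")
```

This prints `expected ~500 MB, observed ~4096 MB (8.2x more)`. Even with generous assumptions, most of the usage remains unexplained.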
geguileoso I have to properly investigate it13:47
geguileomemory profiling, object relationship analysis, garbage collection status, etc13:47
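For the memory-profiling step, the standard library's `tracemalloc` and `gc` modules are one possible starting point. This is a minimal sketch. `workload` is a hypothetical callable that stands in for whatever drives a backup in a test deployment.

```python
import gc
import tracemalloc


def profile(workload, top: int = 5):
    """Run workload() and report the source lines whose allocations grew most."""
    gc.collect()
    tracemalloc.start(25)  # keep up to 25 frames per allocation traceback
    before = tracemalloc.take_snapshot()
    result = workload()
    gc.collect()  # free unreachable cycles so only live memory is reported
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()

    stats = after.compare_to(before, "lineno")
    for stat in stats[:top]:
        print(stat)
    print("gc counts per generation:", gc.get_count())
    print("uncollectable objects:", len(gc.garbage))
    return result, stats
```

Grouping the diff by `"lineno"` attributes memory growth to source lines. That is usually enough to spot a buffer that stays referenced longer than expected.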
rosmaitaor somebody does ... maybe we can convince the dev who posted the patch to change object size to help13:47
geguileooh, is there a patch to change the object size?13:47
* geguileo didn't know13:48
rosmaitai may be confusing cinder with glance, i think there's something posted13:48
rosmaitai will look later13:48
geguileoin any case, the only default that is big is backup_file_size, but afaik that was not used in the job I saw get OOM killed13:48
geguileofor #2 I believe that one of the zuul jobs has changed a default from 300 to 196 or something like that13:49
geguileoand it's crazy that it times out at 196 and then at 200 seconds the backup completes...13:50
geguileorofl13:50
rosmaitayeah, that's a killer13:50
geguileothe default for tempest build_timeout configuration option in the volume group is 30013:51
geguileoso I don't know where that is being changed13:51
rosmaitaok, i can look into that13:51
geguileothen let me give you some additional info...13:51
rosmaitayou don't happen to have a link to that job?13:51
geguileoon this patch https://review.opendev.org/c/openstack/os-brick/+/836059/313:52
geguileothis job  https://zuul.opendev.org/t/openstack/build/f82bc5be933b4fe9bf2cbca40f141939/logs13:52
geguileothis test cinder_tempest_plugin.api.volume.test_volume_backup.VolumesBackupsTest.test_volume_snapshot_backup13:52
geguileobackup id 7bae1276-1488-442b-a521-9785c58c75fd13:52
geguileoerror:  backup 7bae1276-1488-442b-a521-9785c58c75fd failed to reach available status (current creating) within the required time (196 s).13:52
geguileostart of the request in the backup service  https://zuul.opendev.org/t/openstack/build/f82bc5be933b4fe9bf2cbca40f141939/log/controller/logs/screen-c-bak.txt#666113:53
geguileoend: https://zuul.opendev.org/t/openstack/build/f82bc5be933b4fe9bf2cbca40f141939/log/controller/logs/screen-c-bak.txt#801513:53
rosmaitathanks!13:53
geguileoI looked into the test, the create_backup method, the client, and it uses client.build_timeout as the timeout13:54
geguileoI looked at the default and it's 300, so somewhere this has been changed for our tempest jobs13:54
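The failure message above comes from a wait loop. The loop polls the backup status until it reaches `available` or `build_timeout` expires. Below is a simplified sketch of that kind of waiter; it is not tempest's actual code, and `wait_for_status`, `FakeClock`, and `backup_ready_after` are hypothetical. It shows why a backup that finishes at 200 s fails with a 196 s timeout but passes with the 300 s default.

```python
import time
from typing import Callable


def wait_for_status(get_status: Callable[[], str], wanted: str,
                    build_timeout: float = 300, build_interval: float = 1,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> float:
    """Poll get_status() until it returns `wanted`; return elapsed seconds.

    Raises TimeoutError once build_timeout seconds pass without success.
    """
    start = clock()
    while True:
        status = get_status()
        if status == wanted:
            return clock() - start
        if status == "error":
            raise RuntimeError(f"resource went to error (wanted {wanted})")
        if clock() - start >= build_timeout:
            raise TimeoutError(
                f"failed to reach {wanted} status (current {status}) "
                f"within the required time ({build_timeout} s)")
        sleep(build_interval)


class FakeClock:
    """Deterministic clock so the example runs instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


def backup_ready_after(clock: FakeClock, seconds: float) -> Callable[[], str]:
    """Simulated backup that stays 'creating' until `seconds` have passed."""
    return lambda: "available" if clock.time() >= seconds else "creating"
```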
rosmaitaiirc, there may be another fixtures timeout that is kind of hidden13:55
geguileomaybe...13:55
rosmaitaanyway, this is good info to trace through the job definitions and look for something13:56
geguileobut for backups I think I would either increase the timeout or make backup tests execute serially13:56
geguileobecause if they are executed concurrently they may be too slow13:56
rosmaitai think i'd be in favor of increasing the timeout, would rather see us test in parallel13:57
geguileome too13:58
rosmaitaok, we'll take that approach first13:58
geguileorosmaita: https://zuul.opendev.org/t/openstack/build/f82bc5be933b4fe9bf2cbca40f141939/log/controller/logs/tempest_conf.txt#2913:59
geguileo196 seconds timeout, I just don't know where that is being set13:59
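To find where a value like 196 comes from, it can help to list every section of the generated tempest.conf that sets `build_timeout` explicitly. This is a minimal sketch using `configparser`. The sample below is hypothetical and is not the job's actual file. `[DEFAULT]` is read as an ordinary section so that its values are not copied into every other section, which would hide where the option is really set.

```python
from __future__ import annotations

import configparser

VOLUME_DEFAULT = 300  # [volume] build_timeout default quoted in the discussion

# Hypothetical sample; not the contents of the job's tempest_conf.txt.
SAMPLE = """\
[DEFAULT]
debug = True

[volume]
build_timeout = 196
catalog_type = volumev3
"""


def build_timeouts(conf_text: str) -> dict[str, str]:
    """Map each section that literally sets build_timeout to its value."""
    parser = configparser.ConfigParser(
        interpolation=None, strict=False, default_section="__no_default__")
    parser.read_string(conf_text)
    return {s: parser.get(s, "build_timeout")
            for s in parser.sections() if parser.has_option(s, "build_timeout")}


def effective_volume_timeout(conf_text: str) -> int:
    """[volume] build_timeout if set explicitly, else the quoted default."""
    return int(build_timeouts(conf_text).get("volume", VOLUME_DEFAULT))
```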
rosmaitacool, i should be able to track that down14:00
rosmaita(it's still early in my time zone, i am optimistic this morning)14:01
geguileothat's a good way to do Friday14:01
geguileofinally, #3: the scheduler's "no host found" issues14:03
rosmaitayou made some suggestions about this but i have not followed up14:04
geguileoyes, I would try increasing LVM's max_over_subscription_ratio to 40 or something like that14:06
geguileoin the driver itself14:07
geguileoand then in the defaults change backend_stats_polling_interval and periodic_interval both to something like 7 seconds14:07
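Raising `max_over_subscription_ratio` helps with "no host found" errors because of how the scheduler treats thin-provisioned backends. It rejects a backend once provisioned capacity would exceed total capacity times the ratio. Below is a simplified sketch of that check, loosely modelled on Cinder's capacity filter. The function and the numbers are illustrative, and the real filter handles more cases, such as thick provisioning and "infinite" capacity.

```python
def thin_backend_passes(total_gb: float, free_gb: float, provisioned_gb: float,
                        requested_gb: float, max_over_subscription_ratio: float,
                        reserved_percentage: float = 0.0) -> bool:
    """Simplified thin-provisioning capacity check (illustrative only)."""
    if total_gb <= 0:
        return False
    free = free_gb - total_gb * reserved_percentage / 100.0
    # Would this request push the over-subscription ratio past the limit?
    provisioned_ratio = (provisioned_gb + requested_gb) / total_gb
    if provisioned_ratio > max_over_subscription_ratio:
        return False
    # Otherwise real free space is scaled by the ratio ("virtual" free space).
    return free * max_over_subscription_ratio >= requested_gb


# Hypothetical small LVM backend: 10 GB total, 5 GB free, 200 GB provisioned.
print(thin_backend_passes(10, 5, 200, 1, 20))  # ratio 20.1 > 20 -> rejected
print(thin_backend_passes(10, 5, 200, 1, 40))  # fits under a ratio of 40
```

When many tempest volumes are provisioned on a small LVM backend, the ratio check is likely what fails first. The shorter `backend_stats_polling_interval` and `periodic_interval` values help separately, presumably by keeping the scheduler's capacity numbers fresher.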
rosmaitaok14:07
rosmaitawhich would be a good job to test this out?14:07
geguileoI saw #1 and #2 happen on the same job14:08
geguileoos-brick-src-tempest-lvm-lio-barbican14:09
rosmaitaok14:10
rosmaitai saw #3 in one of the ceph jobs, i thought14:10
raghavendrathi, it would be great if someone can have a look at https://review.opendev.org/c/openstack/cinder/+/82491114:19
raghavendratIt has one +2. Thanks.14:19
opendevreviewEric Harney proposed openstack/cinder master: Use modern type annotation format for collections  https://review.opendev.org/c/openstack/cinder/+/83998714:37
*** dviroel is now known as dviroel|lunch15:12
*** dviroel|lunch is now known as dviroel16:00
opendevreviewBrian Rosmaita proposed openstack/cinder master: Increase swap size to 4GB  https://review.opendev.org/c/openstack/cinder/+/84178216:22
opendevreviewRico Lin proposed openstack/cinder master: Add image_conversion_disable config  https://review.opendev.org/c/openstack/cinder/+/83979317:00
ricolinrosmaita: updated accordingly, thanks for the nice wording :)17:01
opendevreviewMerged openstack/cinder master: Replace distutils with packaging in 3rd party drivers  https://review.opendev.org/c/openstack/cinder/+/83213019:12
*** dviroel is now known as dviroel|out21:57
