Thursday, 2023-08-03

<opendevreview> OpenStack Proposal Bot proposed openstack/cinder master: Imported Translations from Zanata  https://review.opendev.org/c/openstack/cinder/+/889998  04:09
<opendevreview> Katarina Strenkova proposed openstack/cinder-tempest-plugin master: Replace deprecated terms  https://review.opendev.org/c/openstack/cinder-tempest-plugin/+/889848  07:44
<opendevreview> Raghavendra Tilay proposed openstack/cinder master: HPE 3par: Fix issue seen during retype/migrate  https://review.opendev.org/c/openstack/cinder/+/887559  08:31
<raghavendrat> hi whoami-rajat: are you around?  12:15
<raghavendrat> whoami-rajat: regarding https://review.opendev.org/c/openstack/cinder/+/878684  12:55
<raghavendrat> It has two +2s. Whenever you get time, could you please have a look. Thanks.  12:55
<dansmith> rosmaita: my feeling is that "lvchange -ay" should be a pretty fast operation, no? IIRC, it's just creating a dm device, dm entries, and making it active  15:27
<dansmith> I am looking at a failure where a bunch of stuff grinds to a halt on a CI worker for a minute, and the thing that seems to correlate is running 'lvchange -ay' at the same time, which takes 60s  15:28
<rosmaita> dansmith: yes, i would expect that to be pretty fast, but apparently not  15:34
<dansmith> because it's 60s exactly, it smells of a deadlock/timeout thing, but it also doesn't fail  15:35
<dansmith> rosmaita: any pointers to who might be able to take a look at that?  15:48
<rosmaita> dansmith: can you give me a link?  15:52
<dansmith> rosmaita: in and about here: https://zuul.opendev.org/t/openstack/build/ed1d53dce8204b6e82c6dcedf335fb66/log/controller/logs/screen-c-vol.txt#6520  15:54
<rosmaita> ty  15:55
<dansmith> no activity for a minute or so before that, then some "this took too long" things around running that command, updating the service state etc  15:55
<dansmith> get_volume_stats took 134s, etc  15:55
<dansmith> so I'm wondering if that is because c-vol is hung up with long-running lvm commands,  15:57
<dansmith> or if the long-running db and lvm commands are a symptom of something somewhere else  15:57
<dansmith> that loopingcall thing that is complaining about report_state taking longer than the interval kinda makes me wonder  15:59
<dansmith> I see it running pretty often and complaining about an overage that almost looks like a good portion of the interval  15:59
<dansmith> how often is that supposed to run? certainly not multiple times per minute right?  16:00
<rosmaita> i think 6x a minute by default  16:02
<dansmith> really? what does it do? I assume update a service record in the database like nova?  16:03
<rosmaita> yeah, i think it's basically a heartbeat kind of thing  16:04
<dansmith> but to the database?  16:04
<rosmaita> yes, pretty sure  16:04
<dansmith> so we could probably slow that down in devstack right?  16:05
<dansmith> at times of high load, that's probably burning one of your threadpool workers all the time  16:05
<dansmith> and increasing load on the database  16:05
<dansmith> I guess nova's is pretty high as well, but I know that's one of the things people have to slow down at any kind of scale, sort of first thing  16:06
<rosmaita> i think it would only be an issue for devstack if there are tempest tests around the service status APIs  16:06
<dansmith> well, we can control the "what do we consider to be down" interval, which would go up as well  16:06
<dansmith> (we being nova)  16:06
<dansmith> if we set the report interval too slow and don't adjust the other, then tests will fail because we won't schedule to "down" services,  16:07
<dansmith> but if you bump both of them it'll balance out  16:07
<rosmaita> I think the option is [DEFAULT]/report_interval in cinder.conf  16:07
<dansmith> yeah, same for nova  16:07
<dansmith> and service_down_time?  16:07
<rosmaita> yeah, we have 60 as the default for service_down_time  16:08
<dansmith> yeah, so I'm thinking report_interval=60 and service_down_time=120  16:09
<dansmith> for both nova and cinder  16:09
<dansmith> it will be a small gain, granted, but if we're seeing it constantly not meeting the deadline, then there's really no point in keeping it so low  16:09
<rosmaita> i was thinking 120 and 720 but let's see what your change does  16:10
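
For reference, both options live in the [DEFAULT] section of cinder.conf and nova.conf. A minimal sketch of what the stanza might look like with the larger values rosmaita floated; the values actually used are whatever the devstack change linked below ends up setting:

    # cinder.conf (and similarly nova.conf) -- sketch only; assumes the
    # 120/720 values discussed above, not necessarily the final devstack ones
    [DEFAULT]
    report_interval = 120
    service_down_time = 720
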
<dansmith> your numbers are bigger, let's do that :P  16:10
<rosmaita> i will be interested to see what happens  16:11
<dansmith> rosmaita: yeah, this is just a minor tweak, but we'll see.. still interested in any diagnosis of the lvm stuff  16:16
<dansmith> rosmaita: https://review.opendev.org/c/openstack/devstack/+/890439  16:19
<dansmith> rosmaita: so back to that example,  16:22
<dansmith> cinder runs "lvs" which I assume is like updating its internal state or something,  16:23
<dansmith> and then nothing else happens for a minute,  16:23
<dansmith> which also correlates with the lvchange -ay delay  16:23
<dansmith> I'm wondering if cinder should be using an external lock to only run one lvm command at a time maybe?  16:24
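
A rough sketch of the external-lock idea dansmith is floating, using oslo.concurrency's file-based locks to serialize LVM shell-outs on a single host. This is not how cinder's LVM driver is actually written; the lock name and helper function are made up for illustration:

    # Hypothetical helper: serialize all LVM commands on this host behind one
    # file-based (external) lock, so e.g. 'lvs' and 'lvchange -ay' can't run
    # concurrently.  Real code would use cinder's configured root helper.
    from oslo_concurrency import lockutils, processutils

    @lockutils.synchronized('lvm-commands', external=True)
    def run_lvm_command(*cmd):
        # Blocks here until no other LVM command holds the lock.
        out, _err = processutils.execute(*cmd, run_as_root=True,
                                         root_helper='sudo')
        return out

    # e.g. run_lvm_command('lvchange', '-ay', 'stack-volumes-lvmdriver-1/volume-xyz')
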
<rosmaita> i have no idea, this is going to require some research  17:13
<dansmith> rosmaita: related, but.. I wonder if it would be better to use the ceph driver more than lvm as our baseline?  17:13
<dansmith> I would expect lvm to be much simpler and better performing, but I kinda think we see more failures using it for random reasons than ceph  17:14
<rosmaita> i don't know about that, guess we'll have to pull some data  17:16
<eharney> the lvs/vgs commands running for over a minute is bizarre... it looks like devstack already sets up lvm device filtering so i'm not sure what could cause that  20:02
<opendevreview> Eric Harney proposed openstack/cinder master: LVM: Use --readonly where possible for lvs/vgs  https://review.opendev.org/c/openstack/cinder/+/890460  20:24
<opendevreview> Eric Harney proposed openstack/os-brick master: LVM: Use --readonly where possible for lvs/vgs  https://review.opendev.org/c/openstack/os-brick/+/890288  20:25
<opendevreview> Eric Harney proposed openstack/cinder master: DNM: LVM: get debug output from lvs  https://review.opendev.org/c/openstack/cinder/+/890461  20:29
<eharney> some ideas ^  20:30
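
The --readonly flag mentioned in those patches tells lvs/vgs to read on-disk metadata without taking LVM locks (per the LVM man pages), which should keep a read-only stats query from sitting behind an in-flight lvchange. A hedged sketch of what such an invocation could look like; this is not the actual cinder/os-brick code, and the field list and devstack-style VG name are illustrative:

    # Sketch: query LVs without taking LVM locks.  Field names and the
    # VG name below are illustrative, not taken from the proposed patches.
    from oslo_concurrency import processutils

    out, _err = processutils.execute(
        'lvs', '--readonly', '--noheadings', '--unit=g',
        '-o', 'vg_name,name,size', 'stack-volumes-lvmdriver-1',
        run_as_root=True, root_helper='sudo')
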
