Monday, 2022-06-20

05:06 <opendevreview> Arun KV proposed openstack/cinder master: Reintroduce DataCore driver  https://review.opendev.org/c/openstack/cinder/+/836996
05:10 <opendevreview> Tushar Trambak Gite proposed openstack/cinder master: Reset state robustification for snapshot os-reset_status  https://review.opendev.org/c/openstack/cinder/+/804035
05:11 <opendevreview> Tushar Trambak Gite proposed openstack/cinder master: Reset state robustification for group os-reset_status  https://review.opendev.org/c/openstack/cinder/+/804735
05:11 <opendevreview> Tushar Trambak Gite proposed openstack/cinder master: Reset state robustification for group-snapshot os-reset_status  https://review.opendev.org/c/openstack/cinder/+/804757
05:11 <opendevreview> Tushar Trambak Gite proposed openstack/cinder master: Reset state robustification for backup os-reset_status  https://review.opendev.org/c/openstack/cinder/+/778193
10:22 <opendevreview> Tushar Trambak Gite proposed openstack/cinder master: Include volume type constraints in internal API  https://review.opendev.org/c/openstack/cinder/+/846146
11:28 <opendevreview> Alexander Malashenko proposed openstack/cinder master: Cinder displays the volume size provided by the driver, when creating the volume with enabled cache.  https://review.opendev.org/c/openstack/cinder/+/836973
11:30 <opendevreview> Alexander Malashenko proposed openstack/cinder master: Cinder displays the volume size provided by the driver, when creating the volume with enabled cache.  https://review.opendev.org/c/openstack/cinder/+/836973
12:05 *** dviroel_ is now known as dviroel
12:45 <tosky> whoami-rajat, geguileo: any idea on how to unbreak https://review.opendev.org/c/openstack/cinder/+/845799 (failure caused by https://review.opendev.org/c/openstack/os-brick/+/834604)?
12:46 <geguileo> tosky: that's the cinderlib failure, right?
12:46 <geguileo> it is
12:47 <tosky> maybe you discussed this already on Friday, but I'm catching up with emails
12:48 <geguileo> tosky: whoami-rajat told me about the failure and I started investigating
12:48 <geguileo> as I see it, the failure is caused by privsep using the host's libraries instead of the venv's
12:48 <geguileo> so fixing that would be the right fix
12:49 <geguileo> to unblock the gate I think we have 2 options
12:49 <geguileo> 1- change the two failing .zuul jobs (LVM & Ceph) on that stable branch to use os-brick from source
12:50 <geguileo> 2- change the cinderlib tox.ini in that branch to use the released os-brick version instead of the one from master
12:50 <geguileo> those are the 2 simple fixes that I think can unblock the gate
12:50 <geguileo> I'm currently looking into the proper solution
13:02 <tosky> I think 2 is the one I would personally choose first
13:12 <rosmaita> geguileo: fwiw, i agree with tosky about option #2
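[For context, option 2 amounts to roughly a one-line change in cinderlib's stable-branch tox.ini. A minimal sketch, assuming the branch currently pulls os-brick from git master; the section and deps lines here are illustrative, not the actual stable/wallaby file:]

```ini
# Hypothetical sketch of option 2: install the released os-brick.
[testenv]
deps =
    # constraints pin os-brick to the latest wallaby release
    -c{env:TOX_CONSTRAINTS_FILE:https://releases.openstack.org/constraints/upper/wallaby}
    -r{toxinidir}/requirements.txt
    -r{toxinidir}/test-requirements.txt
    # ...instead of installing it from source, e.g.:
    # -e git+https://opendev.org/openstack/os-brick@master#egg=os-brick
```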
13:13 <rosmaita> i think we need to have a cinder-project-deliverables CI review at the next midcycle
13:14 <rosmaita> i'm worried about not testing os-brick properly in the gate, but at the same time i don't want to get into the situation glance had, where a glance CI fix could not merge because glance_store CI was broken, and the glance_store CI fix could not merge because glance CI was broken
13:15 <rosmaita> luckily, we were able to make a devstack change that fixed both
13:15 <rosmaita> but it's not a good situation to be in
13:16 <whoami-rajat> agree with the majority here, we should modify cinderlib instead of the jobs consuming it; otherwise we might start facing similar issues in other jobs and end up patching jobs again and again
13:17 <whoami-rajat> rosmaita, I face that issue every cycle, is it because of the functional job we run on the glance_store gate?
13:18 <whoami-rajat> i.e. the glance functional job on the glance_store gate
13:18 <rosmaita> whoami-rajat: i am not sure, i haven't looked into it carefully
13:18 <whoami-rajat> ack
13:21 <whoami-rajat> geguileo, IIUC, we still have an issue with how cinderlib is using privsep in the gate, right? since the patch in os-brick stable/wallaby broke the cinderlib tests, if os-brick gets released with that patch, the cinderlib tests are going to break again, right?
13:21 <geguileo> whoami-rajat: cinderlib tests work fine on its own gate
13:21 <geguileo> they only break in the Cinder gate
13:21 <whoami-rajat> geguileo, i mean on the cinder gate
13:22 <whoami-rajat> yes
13:22 <geguileo> oh, I forgot an additional way to fix the issue: make an os-brick release
13:22 <geguileo> afaik the issue right now is caused by the host having an older version of os-brick than the one in the virtual env
13:23 <geguileo> cinderlib sees the os-brick version in the virtual env and calls privsep to execute the code
13:23 <geguileo> and privsep is using the older os-brick version (the one from pip on the host)
13:24 <geguileo> this can be: 1- because os-brick is not setting up privsep correctly   2- because cinderlib is somehow calling the privsep daemon that Cinder started
13:25 <geguileo> I don't think the second option is possible... because privsep should be generating a random new directory, and a new socket inside of it, each time
13:27 <whoami-rajat> I wasn't able to follow all the details, but it looks like we have an issue somewhere with our usage of privsep (probably os-brick, as you said)
13:36 <geguileo> whoami-rajat: no, not os-brick, this is 99.99% sure on the cinderlib side (code, config, job config, etc)
13:36 <geguileo> I am testing things locally and it works fine, and privsep called with rootwrap has the right sys.path to search for libraries (at least on my system)...
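[A rough diagnostic along the lines of what geguileo describes testing locally: compare the os-brick module the caller imports (the venv copy) against the one the privsep daemon imports (potentially the host copy). This is a sketch assuming oslo.privsep's PrivContext/entrypoint API and a configured privsep helper; it is not code from any of the patches discussed:]

```python
# diag_brick.py - sketch: where does os-brick live on each side of privsep?
import os_brick
from oslo_privsep import capabilities, priv_context

# Module-level context; pypath must point at this variable so the
# daemon can re-import it on its side.
diag = priv_context.PrivContext(
    'diag',
    cfg_section='diag',
    pypath=__name__ + '.diag',
    capabilities=[capabilities.CAP_SYS_ADMIN],
)

@diag.entrypoint
def brick_path_in_daemon():
    # Runs inside the privsep daemon, so this reflects *its* sys.path,
    # which is where a venv/host mismatch would show up.
    import os_brick as brick
    return brick.__file__

if __name__ == '__main__':
    print('caller imports os_brick from: %s' % os_brick.__file__)
    print('daemon imports os_brick from: %s' % brick_path_in_daemon())
```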
13:36 <whoami-rajat> ok, you said "1- because os-brick is not setting up privsep correctly", so I got confused
13:37 <geguileo> oh, yeah, that was a possibility 10 minutes ago    lol
13:38 <geguileo> but according to my local tests it doesn't look like that's the case...
13:38 <whoami-rajat> :D ack
13:39 <whoami-rajat> thanks for looking into it, till then we can have the workaround (or rather the "right way" to use libs in the gate, i.e. released ones) to at least unblock the wallaby gate
13:39 <whoami-rajat> geguileo, I hope you will be pushing the patch for it?
14:07 <opendevreview> Alexander Malashenko proposed openstack/cinder master: Cinder displays the volume size provided by the driver, when creating the volume with enabled cache.  https://review.opendev.org/c/openstack/cinder/+/836973
15:36 *** dviroel is now known as dviroel|lunch
16:43 *** dviroel|lunch is now known as dviroel
17:56 <opendevreview> Alexander Malashenko proposed openstack/cinder master: Cinder displays the volume size provided by the driver, when creating the volume with enabled cache.  https://review.opendev.org/c/openstack/cinder/+/836973
18:41 <hemna> geguileo, so it looks like there are really only 3 places where workers entries are created in all of cinder: the scheduler for volume create, and the volume rpcapi for delete_volume and create_snapshot.
18:41 <hemna> that's it
18:42 <hemna> I'm not sure I see the purpose of the workers table at this point, as it's really not used
18:42 <geguileo> iirc they are created only in the places related to the operations we can actually clean up
18:42 <geguileo> which are very few
18:42 <geguileo> I thought there were some more...
18:42 <hemna> yah, so I guess the workers table's purpose is for somehow cleaning up during the next start?
18:44 <geguileo> next start, and on a clustered service, so an operator can trigger the cleaning on other nodes
18:44 <geguileo> without race conditions
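[In other words, the workers table is a coordination mechanism: a cleanup process "claims" a row atomically so two nodes can never clean the same resource. A sketch of that compare-and-swap pattern; table and column names here are illustrative, only loosely modeled on cinder's actual schema:]

```python
from sqlalchemy import text

def claim_worker(conn, worker_id, dead_service_id, my_service_id):
    # The WHERE clause makes the takeover atomic: if another node
    # already claimed the row, no rows match, rowcount is 0, and we
    # skip this resource instead of cleaning it twice.
    result = conn.execute(
        text('UPDATE workers SET service_id = :me '
             'WHERE id = :wid AND service_id = :dead'),
        {'me': my_service_id, 'wid': worker_id, 'dead': dead_service_id},
    )
    return result.rowcount == 1
```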
18:49 <hemna> hrmm ok, so it's hard to see how I could use this table then
18:50 <hemna> I am looking for 2 things out of the proposal: 1) cinder top - to show in realtime(ish) what cinder is working on right now, and 2) a history of state changes for a volume for a particular request id.
18:51 <hemna> cinder top would simply loop forever, showing a list of active actions being taken on which volumes, and each action's progress.
18:52 <hemna> and the history is a log of what happened at the various steps in the process for actions on a volume.
18:52 <hemna> cinder top could be used to help me decide if I can safely bounce the cinder service.
18:52 <hemna> and the history would help find out wtf went wrong with actions on a volume... or at least where it bailed.
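[The "cinder top" idea hemna describes could be as simple as a polling loop over whatever table tracks in-flight operations. A hypothetical sketch; fetch_active_operations and its row shape are made up for illustration:]

```python
import time

def fetch_active_operations():
    # Placeholder: a real tool would query the workers/history table
    # for resources that are not yet in a stable state.
    return [('vol-123', 'create_volume', 'downloading', '18:50:12')]

def cinder_top(interval=2.0):
    while True:
        print('\x1b[2J\x1b[H', end='')  # ANSI: clear screen, cursor home
        print(f'{"VOLUME":<40} {"ACTION":<18} {"STATUS":<14} UPDATED')
        for vol, action, status, updated in fetch_active_operations():
            print(f'{vol:<40} {action:<18} {status:<14} {updated}')
        time.sleep(interval)
```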
18:53 <hemna> there are so many damn volume inconsistencies, it's a mountain of problems for me at this point.
18:55 <hemna> this is one of my many deployments:  https://paste.openstack.org/show/bsV8DWpSeh9eq0oDDOVJ/
18:56 <geguileo> the top could be done with the workers table (adding other operations there), which would also let us do proper cleanup for other states
18:56 <geguileo> and it would keep the query a lot faster (since that table does hard deletes, not soft ones)
18:56 <geguileo> hemna: ouch, ouch, ouch, 92 94 errors
18:57 <hemna> yes
18:58 <hemna> well, the workers table gets a new row for every change in the process of a volume
18:59 <hemna> it would be hard to show a live table
18:59 <geguileo> no, no, the same row is updated
18:59 <geguileo> that's why it doesn't help with your second objective
18:59 <geguileo> it helps with top, because it only shows what's ongoing
18:59 <hemna> hrmm, maybe the data I'm looking at is bogus then
19:00 <geguileo> but once a resource reaches a stable state it gets removed, so no history
19:00 <geguileo> that was done on purpose, to make sure it was performant
19:01 <geguileo> I think that using the same table for history and top would not perform well
19:01 <geguileo> because it would store a lot of records
19:02 <geguileo> but then again, something non-performant is 1000% better than nothing
19:02 <geguileo> so we could always use the history table to do top, and in the future move things to the workers table
19:02 <geguileo> by "things" I mean only the top part
19:02 <hemna> https://paste.openstack.org/show/boLvsDvkDRXnQYLgKpB2/
19:03 <hemna> yah I agree, those 2 features can't be in the same table
19:03 <geguileo> but we can do them in the same table "for now"
19:03 <hemna> but both the top and the history should have the request id in them too
19:04 <geguileo> history definitely, I'm thinking about top...
19:04 <hemna> the top table would be pretty small at any given point in time
19:04 <geguileo> if it's a different table, yes
19:04 <hemna> the history could be cleaned out by an external script via the soft delete mechanism
19:05 <geguileo> or we can add an endpoint to clean things older than a date or something
19:05 <geguileo> that way we don't have to do weird things with the soft delete
19:05 <hemna> yah, that's fine too
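[The "clean things older than a date" idea is essentially log rotation done with hard deletes, keeping the soft-delete machinery out of it entirely. A sketch, with a hypothetical volume_history table:]

```python
from datetime import datetime, timedelta
from sqlalchemy import text

def purge_history(conn, days=30):
    # Hard-delete history rows older than the cutoff; no soft delete,
    # so the table stays small and queries stay fast.
    cutoff = datetime.utcnow() - timedelta(days=days)
    conn.execute(
        text('DELETE FROM volume_history WHERE created_at < :cutoff'),
        {'cutoff': cutoff},
    )
```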
19:06 <geguileo> having the request id is ok in the top table, though I admit I don't quite see its usefulness (disadvantages of not maintaining a cloud)
19:08 <hemna> not a lot of activity going on at the moment, but I wrote this as a volume states tool: https://asciinema.org/a/znuSbH2hfxB5TPv71xTL1zXkC
19:09 <hemna> so the request_id is vital for me, because right now we push all the logs into kibana
19:09 <hemna> and when there are problems I have to search kibana for the specific error, then filter by request_id after I find the request_id that failed.
19:10 <hemna> being able to go into the DB and look things up by request_id would make filtering the history table easier for me, instead of filtering by volume id
19:10 <hemna> we do lots of stuff to the same volume, but a particular request_id is related to a specific action being taken on that volume that may have failed.
19:11 <hemna> e.g. I don't care about an extension of a particular volume from 3 days ago, but I do care about that volume's attach from 3 days ago.
19:11 <hemna> 2 different actions with 2 request_ids
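[Since one request_id corresponds to one action on a volume, a history table keyed by it isolates a single failed flow without wading through every operation that ever touched the volume. A sketch of that lookup; table and column names are hypothetical:]

```python
from sqlalchemy import text

def history_for_request(conn, request_id):
    # All state transitions recorded for one action on one volume.
    return conn.execute(
        text('SELECT created_at, action, old_status, new_status '
             'FROM volume_history '
             'WHERE request_id = :rid ORDER BY created_at'),
        {'rid': request_id},
    ).fetchall()
```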
19:16 <hemna> so what happens if I start stuffing other actions in the workers table?
19:16 <hemna> like attach, detach, extend, migration
19:17 <hemna> heh, every entry in the workers table is deleted=0;
19:18 <hemna> entries from 2018! :P
20:05 *** dviroel is now known as dviroel|afk
21:01 <opendevreview> Francesco Pantano proposed openstack/devstack-plugin-ceph master: Deploy with cephadm  https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/826484
21:05 *** dviroel|afk is now known as dviroel|out
