Wednesday, 2022-01-19

*** hemna6 is now known as hemna07:37
*** dasm|off is now known as dasm13:25
opendevreviewAlexey Stupnikov proposed openstack/nova master: Support use_multipath for NVME driver  https://review.opendev.org/c/openstack/nova/+/82394116:05
spatelFolks, i have glusterfs mounted on /var/lib/nova for my VMs' shared storage and everything is working, but when i delete a VM it doesn't clean up the files. i can still see the disk and other files, which take up a lot of space. is that normal for nova?17:37
sean-k-mooneyhum, technically we don't support mounting /var/lib/nova on glusterfs, however the shared file system support we have for mounting it on NFS should work with it17:38
sean-k-mooneythe way we detect a shared file system is by touching a file, which should just work with any shared file system17:39
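As a rough sketch of that detection idea (illustrative only, not nova's actual code, which is linked further down): the source host drops a uniquely named marker file into the instance directory and another host checks whether the same path is visible.

    # Minimal sketch of "touch a file" shared-storage detection; helper names are made up.
    import os
    import uuid

    def create_marker(instance_dir):
        # Run on the source host: create a uniquely named file in the shared directory.
        marker = os.path.join(instance_dir, '.shared_check_' + uuid.uuid4().hex)
        open(marker, 'w').close()
        return marker

    def marker_visible(marker):
        # Run on the other host: if the same path is visible, the storage is shared.
        return os.path.exists(marker)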
sean-k-mooneyin terms of deleting files17:39
sean-k-mooneywhen you delete the vm it should delete the vm disk and other files in general17:39
sean-k-mooneythe backing files for the guest image may stay in the image cache for a period of time but they will eventually get cleaned up17:40
sean-k-mooneyi wonder if this is like the ceph issue17:40
sean-k-mooneyhttps://docs.openstack.org/nova/latest/configuration/config.html#workarounds.ensure_libvirt_rbd_instance_dir_cleanup17:40
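For reference, that rbd-specific workaround is a boolean under [workarounds] in nova.conf on the computes; it only applies to the rbd image backend, so it is shown here for comparison rather than as a fix for gluster:

    [workarounds]
    ensure_libvirt_rbd_instance_dir_cleanup = True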
spatelhmm, why did you say you don't support glusterfs for /var/lib/nova?17:41
sean-k-mooneyspatel: because we don't officially support or test that upstream17:41
sean-k-mooneywe test /var/lib/nova on nfs even though we don't recommend using it17:42
spatelbut generally shared storage is just storage: you create files and delete files.. no matter what is in the backend17:42
sean-k-mooneybut we have never officially supported glusterfs as far as i recall17:42
sean-k-mooneyspatel: there are a bunch of issues with locks and move operations that the latency of shared storage can cause17:43
sean-k-mooneyso in general we do not support it17:43
spatelwe have 800TB of glusterfs and it would be bad if i mount that on foo server and then mount on 50 compute nodes 17:43
spatelThat is interesting... 17:44
sean-k-mooneyin general if you want to use glusterfs with openstack the only supported way to do it would be via cinder17:44
sean-k-mooneyit can work the way you have it deployed but it's not a tested configuration and was never explicitly supported17:44
spatelmaybe i am the first use case here.. :) i would still like to push it and try it out because we already have large storage which i would like to use.17:46
spatelif i see any issue (major issue) then i may switch to NFS 17:46
spatelIf i go with cinder then it's not going to be shared storage, correct? will live migration be supported with a cinder-based deployment?17:47
sean-k-mooneyspatel: well, just a word of warning: if we could remove support for this capability entirely we would17:48
sean-k-mooneywe don't really like supporting nfs-backed /var/lib/nova17:48
sean-k-mooneywe only maintain support because we have some old large clouds that use it, but this area of the code is not actively maintained17:48
spatelhmm! so in short no to shared storage except ceph. correct?17:49
sean-k-mooneybasically, and even for ceph it's not by using cephfs17:49
spatelceph rbd 17:49
sean-k-mooneythe problem is that when using shared storage like this we are actually using a local storage driver in nova17:49
sean-k-mooneye.g. images_type=qcow217:50
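In other words each compute is configured as if the storage were local; a sketch of what that looks like in nova.conf (values illustrative, assuming the gluster volume is mounted over the default instances path):

    [DEFAULT]
    # instance directories land under the gluster mount described above
    instances_path = /var/lib/nova/instances

    [libvirt]
    images_type = qcow2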
spateli know what you're saying.. mounting nova on shared storage can have all kinds of issues17:50
sean-k-mooneyand we have sprinkled some checks in random parts of the code to do something else if we detect it's on shared storage17:50
sean-k-mooneythe main ones come up around move operations and evacuate17:50
sean-k-mooneye.g. we need to make sure we don't overwrite the image if we do a cold/live migration17:51
sean-k-mooneyand don't recreate it if we evacuate17:51
sean-k-mooneyto prevent losing data17:51
spatelhmm 17:51
spatelso what options do i have in the current design with glusterfs?17:52
spatelcinder boot volume?17:52
sean-k-mooneyif the sync time between writing a file on one host and reading it on another is long enough we might not detect it's on shared storage, which can be bad17:52
sean-k-mooneycinder boot from volume would work, yes17:52
spateldoes cinder support glusterfs ?17:53
sean-k-mooneywhat you are doing can work, just be aware that there are dragons when you do this.17:53
sean-k-mooneyam, it used to17:53
spatelI understand. this cluster is not going to hold critical data. this is for crunching some data and giving dynamic results17:54
sean-k-mooneyam, it's available for cinder backup17:54
sean-k-mooneyhttps://docs.openstack.org/cinder/latest/drivers.html#glusterfsbackupdriver17:54
sean-k-mooneynot sure about cinder in general17:54
spatellet me do some research and see what i can do 17:55
sean-k-mooneyin general it looks like no17:56
spateloh boy :(17:56
sean-k-mooneywe do not have a volume driver in nova for gluster17:56
sean-k-mooneyhttps://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L176-L19117:56
sean-k-mooneyso it can be used for backups but not vms17:56
spatelI think i should try NFS now because folks are using it.. 17:57
spateli know you hate it but i need something to move forward and deal with issues later. :)17:57
spatelwe have a plan to buy dedicated ceph storage but that is not today, more like after a few months17:58
sean-k-mooneyack, the way we detect shared storage: https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L10893-L1091217:59
sean-k-mooneyin principle it should work with gluster17:59
sean-k-mooneybut i'm not sure why it would not delete the instance directory17:59
sean-k-mooneyalthough i don't think that is the only way we check18:00
sean-k-mooneyspatel: we have things like this18:00
sean-k-mooneyhttps://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1723-L173118:00
spatelsean-k-mooney does nova have any timer etc.. where it waits and deletes?18:01
spatelI am planning to turn on debug to see what is going on ?18:01
sean-k-mooneywell i think part of the delete will be done by libvirt18:02
sean-k-mooneythis is where we do some of the clean up https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L150218:02
sean-k-mooneyhttps://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L11641-L1169718:02
spatelhmm.. is there any special config which i may have missed18:02
sean-k-mooneylook for   LOG.info('Deletion of %s failed', remaining_path,18:03
sean-k-mooney                     instance=instance)18:03
sean-k-mooneyspatel: you could try https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L10619-L1064718:06
sean-k-mooneyit's controlled by instance_delete_interval18:06
sean-k-mooneyhttps://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.instance_delete_interval18:07
sean-k-mooneyspatel: so nova will try to delete the instance every 5 minutes and then retry up to 5 times18:08
sean-k-mooneyhttps://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.maximum_instance_delete_attempts18:08
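Those two options live under [DEFAULT] in nova.conf on the compute node; the values below are the defaults sean-k-mooney describes (retry every 5 minutes, up to 5 attempts):

    [DEFAULT]
    instance_delete_interval = 300
    maximum_instance_delete_attempts = 5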
spatellet me check that setting and wait..  i can turn on debug also 18:09
sean-k-mooneyspatel: check the compute agent log for that info message first18:09
sean-k-mooneyto confirm that nova failed to delete the instance files18:09
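Something like the following would surface that message, assuming the usual nova-compute log location (which varies by deployment):

    grep 'Deletion of' /var/log/nova/nova-compute.log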
spatelgive me a few minutes to collect some logs18:11
sean-k-mooneyno worries18:15
spatelsean-k-mooney as quick check i found this - https://paste.opendev.org/show/812229/18:28
spatellibvirt is trying to delete but it's failing.. very odd18:29
spatelnow going to turn on debug and see18:29
sean-k-mooneyya that is the message i was expecting18:30
sean-k-mooneyso like nfs https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L11079 it looks like deletion on gluster is also unreliable18:31
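The behaviour being described is roughly: try to remove the instance directory, and if anything is left behind, log the failure so the periodic task can retry later. A minimal sketch of that pattern (illustrative, not nova's actual delete_instance_files code):

    import logging
    import os
    import shutil

    LOG = logging.getLogger(__name__)

    def delete_instance_dir(path):
        # Best-effort removal; on laggy shared filesystems this can leave the dir behind.
        shutil.rmtree(path, ignore_errors=True)
        if os.path.exists(path):
            # Mirrors the "Deletion of %s failed" info message quoted above.
            LOG.info('Deletion of %s failed', path)
            return False
        return True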
spatelhttps://paste.opendev.org/show/812230/18:32
spatelThis is odd, when i tried to delete that by hand i got the above error18:33
sean-k-mooneyso ya looks like a gluster issue18:33
spatelbut it deleted files inside that directory 18:33
spatelI am going to talk to someone who owns that storage and see what is going on18:33
sean-k-mooneyack18:33
spatelThanks for checking18:34
sean-k-mooneyi'm not sure how gluster tracks files vs directories18:34
sean-k-mooneybut it may handle directory inodes differently18:34
spatelhttps://paste.opendev.org/show/812231/18:34
spatelsomething is wrong with gluster for sure.. i am able to delete files but not directories18:35
sean-k-mooneyack18:35
sean-k-mooneyya it could also be fuse18:35
sean-k-mooneyare you mounting glusterfs with fuse or using a kernel driver18:36
spatel10.10.217.21:gluster_vol2/voyager /mnt/glusterfs  glusterfs  defaults,backup-volfile-servers=10.10.217.22:10.10.217.23:10.10.217.24:10.10.217.25:10.10.217.2618:37
spatelfuse.. hmm ?18:37
spateli just installed the glusterfs-client RPM package and used the mount tool to mount it18:37
sean-k-mooneyi think there is a user space driver and a kernel driver for gluster like there is for ceph18:37
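A quick way to check is to look at the mount's filesystem type; the gluster FUSE client normally shows up as fuse.glusterfs (commands assume the /var/lib/nova mount point described earlier):

    findmnt -T /var/lib/nova
    # or
    grep gluster /proc/mounts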
spatellet me check.. that is a good point18:38
sean-k-mooneyhttps://bugzilla.redhat.com/show_bug.cgi?id=150899918:39
spatelReading the bug report, looks interesting18:43
sean-k-mooneyi have not looked at it closely but the title seemed relevant18:44
spateli am having the same issue but trying to understand how they fixed it. i don't control gluster so i'm not sure, but i can explain it to someone18:44
sean-k-mooneyi would start with your simple reproducer18:45
sean-k-mooneye.g. create the dir and show you can't delete it18:45
sean-k-mooneythat really should work18:45
sean-k-mooneyone thing to check: is it any directory or just ones at the root of the volume18:46
sean-k-mooneyi.e. does mkdir -p .../temp/mydata followed by rm -rf .../temp/mydata work18:47
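Spelled out, that reproducer would look something like this (using /var/lib/nova as the assumed gluster mount point from earlier in the conversation):

    mkdir -p /var/lib/nova/temp/mydata
    rm -rf /var/lib/nova/temp/mydata   # if this fails, the problem is in gluster/FUSE, not nova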
spatellet me check that18:51
spatelany directory in the tree, period18:52
sean-k-mooneyack but it can delete files18:52
sean-k-mooneythat is very odd indeed18:53
spateli am able to delete files no matter where they are located but not able to delete any dir18:53
spatelYes i can delete files.. 18:53
spatelnot directory 18:53
sean-k-mooneyya i have never seen that before honestly, unless the permissions of the folder vs the files are different18:53
sean-k-mooneybut the error implies it's an internal gluster issue, not a simple permissions one18:54
spatelI asked someone to take a look 18:54
spatelsean-k-mooney i have asked someone to take a look so hopefully they find something20:32
opendevreviewAlexey Stupnikov proposed openstack/nova master: Support use_multipath for NVME driver  https://review.opendev.org/c/openstack/nova/+/82394121:34
clarkbsean-k-mooney: fyi https://git.centos.org/rpms/systemd/c/3d3dc89fb25868e8038ecac8d5aef0603bdfaaa2?branch=c8s was recently committed. I don't know how/when/if that will become a package in the package repos but progress23:02
*** dasm is now known as dasm|off23:33
sean-k-mooney ack i think it should go through the koji automated build automatically once the commit lands in dist-git23:39
sean-k-mooneyclarkb: so i would expect that to show up relatively quickly once it's committed23:39
sean-k-mooneyclarkb: https://koji.mbox.centos.org/koji/buildinfo?buildID=20898 there was the attempted build23:45
sean-k-mooneylooks like it failed23:46
sean-k-mooney155/298 test-procfs-util                          FAIL             0.32s   killed by signal 6 SIGABRT23:49
sean-k-mooney――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――23:50
sean-k-mooneystderr:23:50
sean-k-mooneyCurrent system CPU time: 5month 4w 4h 23min 16.380000s23:50
sean-k-mooneyCurrent memory usage: 34.6G23:50
sean-k-mooneyCurrent number of tasks: 68123:50
sean-k-mooneykernel.pid_max: 4096023:50
sean-k-mooneykernel.threads-max: 103030923:50
sean-k-mooneyLimit of tasks: 4095923:50
sean-k-mooneyReducing limit by one to 40958…23:50
clarkb I guess a failed build is still progress23:50
sean-k-mooneyprocfs_tasks_set_limit: Permission denied23:50
sean-k-mooneyAssertion 'r >= 0 ? w == v - 1 : w == v' failed at ../src/test/test-procfs-util.c:59, function main(). Aborting.23:50
sean-k-mooney――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――23:50
sean-k-mooneyya not really sure why that failed to be honest23:50
sean-k-mooneythere is one failure out of around 300 build tests23:50
sean-k-mooneyso it's mostly fine23:51
sean-k-mooneythe build logs are here https://koji.mbox.centos.org/koji/taskinfo?taskID=33449023:51
sean-k-mooneyin case you're interested23:51
sean-k-mooneyhopefully it's just a buggy test and a rebuild will fix it, but in any case hopefully it will get addressed soon23:52
sean-k-mooneyi'm pretty sure i do not have an account that can retrigger that on that koji instance so i'll just have to wait and see, but i can link the failed build on the bugzilla bug23:53
clarkbya I'm not really in a hurry myself, more just trying to follow along since we get semi-regular questions about it. Though those have died down recently. I think people just know ping is broken now23:54
sean-k-mooneyack, https://bugzilla.redhat.com/show_bug.cgi?id=2037807#c10 i commented on the bug. i'll check it again tomorrow but at least people are aware of the issue23:58

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!