Tuesday, 2022-04-12

*** rlandy_ is now known as rlandy|out00:05
*** ministry is now known as __ministry02:35
*** Tengu_ is now known as Tengu08:26
*** rlandy|out is now known as rlandy10:31
*** rlandy is now known as rlandy|mtg13:03
*** dasm|off is now known as dasm13:25
*** dviroel is now known as dviroel|mtg14:15
fzzf[m]fungi: hi, when I use DIB build diskimage have error, log like... (full message at https://matrix.org/_matrix/media/r0/download/matrix.org/tHheIRSvyYwZKoxlWNObiPHh)15:20
fzzf[m] * clarkb: fungi: hi, when I use DIB build diskimage have error, log like... (full message at https://matrix.org/_matrix/media/r0/download/matrix.org/lrnCfSDQjOEuYYKoxHMYfqRf)15:20
fungifzzf[m]: it's hard to identify the cause without a lot more detail, but usually the reason is that some process run during package installation within the chroot didn't terminate or mounted something in a subtree of that filesystem which didn't get umounted15:24
fungiyou might try using df and lsof to figure out what's still using that mount15:25
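The df/lsof check suggested above can also be approximated in a script. The following is a minimal Python sketch, not part of diskimage-builder. The function names `mounts_under` and `holders_of` are hypothetical. It assumes a Linux host where `/proc/mounts` and `/proc/<pid>/fd` are readable (usually as root). It only looks at open file descriptors, so unlike lsof it will miss processes that hold the filesystem through a working directory or a memory mapping.

```python
import os


def mounts_under(mounts_text, root):
    """Return (device, mount_point) pairs at or below root.

    mounts_text is the content of /proc/mounts (or text in the same format).
    """
    root = root.rstrip("/") or "/"
    prefix = root.rstrip("/") + "/"
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        device, mount_point = fields[0], fields[1]
        if mount_point == root or mount_point.startswith(prefix):
            found.append((device, mount_point))
    return found


def holders_of(root, proc_root="/proc"):
    """Return {pid: [open paths]} for processes with open files at or below root."""
    root = root.rstrip("/") or "/"
    prefix = root.rstrip("/") + "/"
    holders = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            if target == root or target.startswith(prefix):
                holders.setdefault(int(entry), []).append(target)
    return holders
```

Run against the build's mount directory while the build is paused, `mounts_under(open("/proc/mounts").read(), path)` would list any submounts left behind, and `holders_of(path)` would list the process IDs that still have files open there.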
fzzf[m]fungi: That's one way, I'll try to check it. I have checked the diskimage-builder log. Will it show any useful information? thanks :) 15:29
fungifzzf[m]: probably, but you'll want the context of what on the system is still using that block device/filesystem first, and then you may be able to track down the reason for it in the log15:30
fzzf[m]fungi: fine. I get it. thanks :d15:31
fungiunfortunately, packages have a tendency to run maintscripts at installation which sometimes mount other things (especially virtual device trees under /dev) or leave processes going with open file handles (e.g., logging to something under /var)15:33
fzzf[m]fungi: Is there any way to avoid it? I get this umount error every time. I looked at the diskimage build log: the cirros image download completed, most elements completed their jobs, and then an error occurred at unmount, resulting in an unsuccessful final build. It also shows this at the end, but this should not be the reason for... (full message at https://matrix.org/_matrix/media/r0/download/matrix.org/hvtKLSzDDslWeFeiJnaBHmhZ)15:43
clarkbdib does set the apt settings to not run scripts iirc15:44
clarkbyou can interrupt the build process and then examine what processes have files open15:44
clarkbto do this you can put a line that is `bash` in the element script that runs at the end before failure15:45
fungialso keep in mind that the log entry about being unable to umount the block device is coming from the cleanup phase, so you may be encountering an error before that which is terminating the image build15:45
clarkbgood point15:45
fzzf[m]fungi: sry, I don't understand. do you mean error from cleanup phase15:49
clarkbfzzf[m]: no the cleanup phase runs after successful or errored builds. This means that if the cleanup phase fails it could be due to an earlier fail during the actual build15:50
fungifzzf[m]: dib tries to install/configure things in the image, then once it's done it cleans up after itself. if something goes wrong during the install/config phase, then it could leave things in a "dirty" state which dib is unable to properly clean up15:51
fungiso you'll want to look at the log entries prior to when cleanup started for the build to see if things were successful or whether there's some other failure you need to address15:51
fzzf[m]fungi: This sudo kpartx -d /dev/loop4 is before trap_cleanup. I haven't found any other errors... (full message at https://matrix.org/_matrix/media/r0/download/matrix.org/fYWaPwBDtZEWPNimhsqmnhOQ)15:57
fzzf[m]<clarkb> "to do this you can put a line..." <- do you mean add /bin/bash in the element script..15:58
clarkbfzzf[m]: yes if you do that then manually run the image build your terminal will enter that bash shell and you can interact with the system. Then when you exit that shell the build will continue15:59
fungiso it umounts /var/cache/nodepool/dib_tmp/dib_build.OuW0QXUa/mnt/ and then tries to delete /dev/loop4 but gets back an error that loop4p2 (a partition of the loop4 device) is still busy. what does the log say that filesystem was mounted on? or where does it say that partition was mounted?16:00
*** dviroel|mtg is now known as dviroel|lunch16:02
fzzf[m]clarkb: In this case, I need to edit element script,  manually build the diskimage, and set the env variable first, right? I used nodepool-builder to automatically build it before.16:03
*** rlandy|mtg is now known as rlandy16:03
clarkbyes if you want to do interactive debugging of the builds you need to do manual steps16:06
clarkbnodepool runs automatically as a daemon and retries in a loop and there isn't a good way to break into it from there16:07
fzzf[m]fungi: find this... (full message at https://matrix.org/_matrix/media/r0/download/matrix.org/qWMVqhpjsnVqOiGbrCvspUfO)16:08
fzzf[m]clarkb: okay, get it.16:09
fungifzzf[m]: is that a log from a different build? it's talking about a loop2p3 partition, not the loop4p2 your previous sample was complaining about being in use. switching between logs from multiple is just going to get confusing16:22
fungier, between logs from multiple image builds16:23
fzzf[m]fungi: sry. that's other. this is loop4p2 log.... (full message at https://matrix.org/_matrix/media/r0/download/matrix.org/MTLasbUvSFkzrPHoKXqURRub)16:29
opendevreviewsean mooney proposed openstack/project-config master: update Review-Priority lable for nova related projects  https://review.opendev.org/c/openstack/project-config/+/83759516:32
clarkbsean-k-mooney: re ^ why explicitly list +0 permissions?16:33
clarkbI don't think it hurts but also want to make sure I'm not missing anything16:34
fungiokay, so the /var/cache/nodepool/dib_tmp/dib_build.OuW0QXUa/mnt/ being umounted is where loop4p2 was mounted, that helps. i wonder if it's somehow acting implicitly as a "lazy umount" (-l/--lazy) so it's not actually umounted when the kpartx -d call happens16:34
sean-k-mooneyclarkb: we want the patch owner to be able to clear it but not request it16:36
sean-k-mooneyso we dont want the patch owner to be able to set +116:36
sean-k-mooneywe might relax that but that is why i put change owner +016:36
clarkbsean-k-mooney: right but everyone can always +016:36
clarkband that doesn't clear other people's votes16:36
sean-k-mooneyright but i was hoping that would override registered owner16:37
sean-k-mooneyah ok16:37
clarkboh no I don't think it will16:37
clarkbI see what you are saying now. Pretty sure that isn't how it will work16:37
sean-k-mooneyok 16:37
clarkbeveryone will be able to +116:37
sean-k-mooneywe technically don't need to enforce that i guess16:37
sean-k-mooneyya i guess that is ok16:38
sean-k-mooneywe can just discourage self use 16:38
sean-k-mooneywe had considered allowing self use but were not sure if it would be abused16:38
sean-k-mooneyclarkb: so you would suggest just dropping the change owner line16:39
clarkbsean-k-mooney: yes to avoid confusion16:39
sean-k-mooneyrunning tox locally i seem to have some failures16:39
sean-k-mooneyso i'll be respinning anyway16:39
sean-k-mooneythanks will do16:39
sean-k-mooneyclarkb: i'll respin that tomorrow, thanks for taking a look. i'll get sylvain as ptl to give it a review too to make sure he is happy and then remove -w when it's ready for review by infra or ping you here16:45
clarkbsounds good16:45
*** dmellado_ is now known as dmellado16:46
fzzf[m]fungi: start from line 81. seem like use umount -fl.  https://paste.opendev.org/show/btoz1JnttNjtouE58mau/16:49
clarkbfzzf[m]: is the filesystem that you are running on network provided? just noting the umount man page for --lazy indicates network filesystems may cause problems16:50
fungiwell, it's rather than -l/--lazy is there to counter problems with unresponsive network filesystems16:51
fungiit's rather that, i mean16:51
clarkbright but maybe the mount wasn't actually gone at the end because it was on a network fs? lazy unmounting allows you to ignore that but we aren't ignoring it later due to the loopback device handling16:52
clarkbanyway dib should probably not do a lazy umount given the later loopback device handling16:52
fungilazy umounting will also return even if there are still submounts16:53
fungiper the umount manpage16:53
fungi"A system reboot would be expected in near future if you’re going to use this option for [...] local filesystem with submounts."16:54
fungianyway, it's probably not great to be trying to delete block devices after lazy-umounting them, since there's no guarantee that the umount has completed by the time device deletion starts16:55
fungihard-umounting would solve that, but could lead to the process hanging indefinitely16:55
fungithis likely explains the loop device leaks we see on builders from time to time16:56
clarkbya it may be better to lazy unmount, then check in a loop with a timeout and if after say 5 minutes we still haven't unmounted then error16:57
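The "check in a loop with a timeout" idea can be sketched as a small polling helper. This is a minimal illustration in Python, not code from diskimage-builder. The name `wait_until` and its parameters are hypothetical. Because a lazy umount detaches the filesystem from the mount table right away, a check of `/proc/mounts` alone would not show whether the device is still busy. The sketch therefore leaves the "is it released yet?" test to a caller-supplied function.

```python
import time


def wait_until(done, timeout=300.0, interval=1.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll done() until it returns True.

    Raises TimeoutError if done() is still False once `timeout` seconds have
    passed. The default of 300 seconds matches the "say 5 minutes" figure
    from the discussion. sleep and clock are injectable to ease testing.
    """
    deadline = clock() + timeout
    while True:
        if done():
            return
        if clock() >= deadline:
            raise TimeoutError(f"condition not met after {timeout} seconds")
        sleep(interval)
```

One possible `done` function, stated here as an assumption rather than a tested recipe, would retry the `kpartx -d` call on the loop device and report success when it exits with status zero.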
*** dviroel|lunch is now known as dviroel16:58
dansmithdpawlik: https://af03dfc56dd1bea1c6a5-57b719e0009d4036c44d6542bd77bfc6.ssl.cf1.rackcdn.com/837139/11/check/tempest-full-py3/57baa39/controller/logs/performance.json20:10
dansmithclarkb: ^20:10
opendevreviewClark Boylan proposed openstack/project-config master: Remove geard graphing from zuul-status dashboard  https://review.opendev.org/c/openstack/project-config/+/83762120:26
opendevreviewClark Boylan proposed openstack/project-config master: Remove geard graphing from zuul-status dashboard  https://review.opendev.org/c/openstack/project-config/+/83762120:31
clarkbdansmith: might want to graph rabbit and mysql and etcd in the processes list? but that is looking pretty good20:34
dansmithah, yeah20:34
clarkbdansmith: is that ~ half a gig of memory just for privsep though? 20:34
dansmithno, ~50ish20:35
clarkb514330624 is the rss added together from your example and then divided by 1024^2 is ~49020:38
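The conversion above is easy to reproduce. A short Python check, using only the summed RSS figure quoted in the message (the helper name `bytes_to_mib` is made up for this example):

```python
def bytes_to_mib(num_bytes):
    """Convert a byte count to mebibytes (1 MiB = 1024**2 bytes)."""
    return num_bytes / 1024**2


total_privsep_rss = 514330624  # summed RSS in bytes, as quoted in the chat
print(round(bytes_to_mib(total_privsep_rss), 1))  # 490.5
```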
dansmithoh sorry, you mean total privsep usage, I see20:40
dansmithI thought you mean the individual ones, most of which are around 50mb20:41
dansmithbut yeah neutron is 100m on its own for some reason20:41
clarkbah yup20:41
clarkbdansmith: my hunch is they have more regexes/rules. I wonder if they all get precompiled for performance reasons but that means regardless of how the software is used we carry the memory cost of all the rules at all times20:42
dansmithmaybe, I was thinking more like they're doing large dumps of netlink outputs which inflate the heap and get proxied to the neutron process20:42
dansmithlike "iptables --line-numbers -L -nv"20:42
clarkbhrm ya, iptables can be chatty20:43
dansmithIIRC with privsep it should be mostly python code that is resident, not rootwrap-style rules right?20:43
clarkbI thought it still did rootwrap style rules, but I may be mistaken. I didn't follow that migration super closely20:44
dansmithokay, I didn't think so  but... me either20:44
dansmithalso that doesn't include the api call counts because I specified the wrong file, but it'll have those too20:44
dansmithalso added erlang (rabbit) and mysqld to the default process list20:44
clarkbetcd is probably worthwhile too since it is included by default. Though unsure if anything actually uses it at this point20:45
clarkbdansmith: looking at neutron/etc/neutron/rootwrap.d/rootwrap.filters it seems that there are path filters too which may imply privsep is reading and writing files too20:47
clarkbthat could explain buffer bloat too if files are large20:47
dansmithoh is their privsep using rootwrap instead of the native stuff?20:47
clarkbthe config files seem to use rootwrap paths at least20:48
dansmithI thought the native way privsep works is that the privsep binary is run as root (potentially using rootwrap for just that) and then it proxies actual python calls rpc-style to/from the parent20:49
dansmithso only one rootwrap rule should be needed if that's what you use20:49
dansmithcinder is using etcd via tooz, IIRC20:50
clarkbgot it re rootwrap and privsep. That sounds likely20:50
*** dviroel is now known as dviroel|out20:55
fungiafaik, all privsep rule evaluation should be taking place in python, however the migration was possible piecemeal, so if neutron still hasn't completed it then they may be doing both rootwrap and privsep21:12
clarkbwell it's mostly that the configs all seem to be in rootwrap files hinting at that. But maybe that was a compatibility thing21:17
fungii wouldn't be surprised if they never finished21:18
clarkblooks like c-bak is still a big consumer of memory too. I thought we had addressed that by not running c-bak since nothing was testing cinder backup in those installations21:31
clarkblooking at devstack's .zuul.yaml c-bak is explicitly enabled. I wonder if that disablement got lost in the d-g to zuul config shuffle /me tries to figure that out21:32
clarkblooks like we removed it from grenade in d-g21:34
dansmithfungi: right21:40
dansmithclarkb: I thought we did have some c-bak tests, but I could be wrong21:40
dansmithI disabled c-bak and swift in some other jobs specifically to get some memory back since we were ooming21:40
dansmith(and apparently c-bak requires swift)21:41
clarkbdansmith: ya looking at the c-bak logs it seems to be doing something. My memory may be related specifically to the grenade situation which likely doesn't check c-bak21:43
dansmithack21:44
*** rlandy is now known as rlandy|bbl22:35
*** dasm is now known as dasm|off22:55
