Tuesday, 2022-04-12

*** rlandy_ is now known as rlandy|out		00:05
*** ministry is now known as __ministry		02:35
*** Tengu_ is now known as Tengu		08:26
*** rlandy|out is now known as rlandy		10:31
*** rlandy is now known as rlandy|mtg		13:03
*** dasm|off is now known as dasm		13:25
*** dviroel is now known as dviroel|mtg		14:15
fzzf[m]	fungi: hi, when I use DIB build diskimage have error, log like... (full message at https://matrix.org/_matrix/media/r0/download/matrix.org/tHheIRSvyYwZKoxlWNObiPHh)	15:20
fzzf[m]	* clarkb: fungi: hi, when I use DIB build diskimage have error, log like... (full message at https://matrix.org/_matrix/media/r0/download/matrix.org/lrnCfSDQjOEuYYKoxHMYfqRf)	15:20
fungi	fzzf[m]: it's hard to identify the cause without a lot more detail, but usually the reason is that some process run during package installation within the chroot didn't terminate or mounted something in a subtree of that filesystem which didn't get umounted	15:24
fungi	you might try using df and lsof to figure out what's still using that mount	15:25
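A rough sketch of the kind of check fungi suggests, for readers who would rather script it than run lsof by hand. It assumes a Linux /proc layout and enough privilege (typically root) to read other processes' fd directories; the function name `pids_using_path` is hypothetical. It only finds open files, working directories, and root directories under the path, so it will not reveal submounts or in-kernel users of the device.

```python
import os


def pids_using_path(prefix, proc_root="/proc"):
    """Map PID -> set of paths under `prefix` that the process holds open.

    Looks at each process's cwd, root, and fd/* symlinks. Processes we are
    not allowed to inspect are silently skipped.
    """
    # Trailing separator so "/mnt" does not match "/mntfoo".
    prefix = os.path.join(os.path.abspath(prefix), "")
    users = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        base = os.path.join(proc_root, entry)
        links = [os.path.join(base, "cwd"), os.path.join(base, "root")]
        fd_dir = os.path.join(base, "fd")
        try:
            links += [os.path.join(fd_dir, fd) for fd in os.listdir(fd_dir)]
        except OSError:
            pass  # process exited or permission denied
        for link in links:
            try:
                target = os.readlink(link)
            except OSError:
                continue
            if (target + os.sep).startswith(prefix):
                users.setdefault(int(entry), set()).add(target)
    return users
```

Calling `pids_using_path("/var/cache/nodepool/dib_tmp")` while a build is paused would list any processes still holding files under the build directory.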
fzzf[m]	fungi: That's one way, I'll try to check it. I have checked the diskimage-builder log. Will any useful information be displayed there? thanks :)	15:29
fungi	fzzf[m]: probably, but you'll want the context of what on the system is still using that block device/filesystem first, and then you may be able to track down the reason for it in the log	15:30
fzzf[m]	fungi: fine. I get it. thanks :d	15:31
fungi	unfortunately, packages have a tendency to run maintscripts at installation which sometimes mount other things (especially virtual device trees under /dev) or leave processes going with open file handles (e.g., logging to something under /var)	15:33
fzzf[m]	fungi: Is there any way to avoid it? I get this umount error every time. I looked at the diskimage build log: the cirros image download completed, most elements completed their jobs, and then an error occurred during unmount, resulting in an unsuccessful final build. It also shows this at the end, but this should not be the reason for... (full message at https://matrix.org/_matrix/media/r0/download/matrix.org/hvtKLSzDDslWeFeiJnaBHmhZ)	15:43
clarkb	dib does set the apt settings to not run scripts iirc	15:44
clarkb	you can interrupt the build process and then examine what processes have files open	15:44
clarkb	to do this you can put a line that is `bash` in the element script that runs at the end before failure	15:45
fungi	also keep in mind that the log entry about being unable to umount the block device is coming from the cleanup phase, so you may be encountering an error before that which is terminating the image build	15:45
clarkb	good point	15:45
fzzf[m]	fungi: sry, I don't understand. do you mean error from cleanup phase	15:49
clarkb	fzzf[m]: no the cleanup phase runs after successful or errored builds. This means that if the cleanup phase fails it could be due to an earlier fail during the actual build	15:50
fungi	fzzf[m]: dib tries to install/configure things in the image, then once it's done it cleans up after itself. if something goes wrong during the install/config phase, then it could leave things in a "dirty" state which dib is unable to properly clean up	15:51
fungi	so you'll want to look at the log entries prior to when cleanup started for the build to see if things were successful or whether there's some other failure you need to address	15:51
fzzf[m]	fungi: This sudo kpartx -d /dev/loop4 is before trap_cleanup. I haven't found any other errors... (full message at https://matrix.org/_matrix/media/r0/download/matrix.org/fYWaPwBDtZEWPNimhsqmnhOQ)	15:57
fzzf[m]	<clarkb> "to do this you can put a line..." <- do you mean add /bin/bash in the element script..	15:58
clarkb	fzzf[m]: yes if you do that then manually run the image build your terminal will enter that bash shell and you can interact with the system. Then when you exit that shell the build will continue	15:59
fungi	so it umounts /var/cache/nodepool/dib_tmp/dib_build.OuW0QXUa/mnt/ and then tries to delete /dev/loop4 but gets back an error that loop4p2 (a partition of the loop4 device) is still busy. what does the log say that filesystem was mounted on? or where does it say that partition was mounted?	16:00
*** dviroel|mtg is now known as dviroel|lunch		16:02
fzzf[m]	clarkb: In this case, I need to edit element script, manually build the diskimage, and set the env variable first, right? I used nodepool-builder to automatically build it before.	16:03
*** rlandy|mtg is now known as rlandy		16:03
clarkb	yes if you want to do interactive debugging of the builds you need to do manual steps	16:06
clarkb	nodepool runs automatically as a daemon and retries in a loop and there isn't a good way to break into it from there	16:07
fzzf[m]	fungi: find this... (full message at https://matrix.org/_matrix/media/r0/download/matrix.org/qWMVqhpjsnVqOiGbrCvspUfO)	16:08
fzzf[m]	clarkb: okay, get it.	16:09
fungi	fzzf[m]: is that a log from a different build? it's talking about a loop2p3 partition, not the loop4p2 your previous sample was complaining about being in use. switching between logs from multiple is just going to get confusing	16:22
fungi	er, between logs from multiple image builds	16:23
fzzf[m]	fungi: sry. that's other. this is loop4p2 log.... (full message at https://matrix.org/_matrix/media/r0/download/matrix.org/MTLasbUvSFkzrPHoKXqURRub)	16:29
opendevreview	sean mooney proposed openstack/project-config master: update Review-Priority lable for nova related projects https://review.opendev.org/c/openstack/project-config/+/837595	16:32
clarkb	sean-k-mooney: re ^ why explicitly list +0 permissions?	16:33
clarkb	I don't think it hurts but also want to make sure I'm not missing anything	16:34
fungi	okay, so the /var/cache/nodepool/dib_tmp/dib_build.OuW0QXUa/mnt/ being umounted is where loop4p2 was mounted, that helps. i wonder if it's somehow acting implicitly as a "lazy umount" (-l/--lazy) so it's not actually umounted when the kpartx -d call happens	16:34
sean-k-mooney	clarkb: we want the patch owner to be able to clear it but not request it	16:36
sean-k-mooney	so we dont want the patch owner to be able to set +1	16:36
sean-k-mooney	we might relax that but that is why i put change owner +0	16:36
clarkb	sean-k-mooney: right but everyone can always +0	16:36
clarkb	and that doesn't clear other people's votes	16:36
sean-k-mooney	right but i was hoping that would override registered owner	16:37
sean-k-mooney	ah ok	16:37
clarkb	oh no I don't think it will	16:37
clarkb	I see what you are saying now. Pretty sure that isn't how it will work	16:37
sean-k-mooney	ok	16:37
clarkb	everyone will be able to +1	16:37
sean-k-mooney	we technically dont need to enforce that i guess	16:37
sean-k-mooney	ya i guess that is ok	16:38
sean-k-mooney	we can just discourage self use	16:38
sean-k-mooney	we had considered allowing self use but were not sure if it would be abused	16:38
sean-k-mooney	clarkb: so you would suggest just dropping the change owner line	16:39
clarkb	sean-k-mooney: yes to avoid confusion	16:39
sean-k-mooney	running tox locally i seem to have some failures	16:39
sean-k-mooney	so ill be respinning anyway	16:39
sean-k-mooney	thanks will do	16:39
sean-k-mooney	clarkb: ill respin that tomorrow thanks for taking a look. ill get sylvain as ptl to give it a review too to make sure he is happy and then remove -w when its ready for review by infra or ping ye here	16:45
clarkb	sounds good	16:45
*** dmellado_ is now known as dmellado		16:46
fzzf[m]	fungi: start from line 81. seem like use umount -fl. https://paste.opendev.org/show/btoz1JnttNjtouE58mau/	16:49
clarkb	fzzf[m]: is the filesystem that you are running on network provided? just noting the umount man page for --lazy indicates network filesystems may cause problems	16:50
fungi	well, it's rather than -l/--lazy is there to counter problems with unresponsive network filesystems	16:51
fungi	it's rather that, i mean	16:51
clarkb	right but maybe the mount wasn't actually gone at the end because it was on a network fs? lazy unmounting allows you to ignore that but we aren't ignoring it later due to the loopback device handling	16:52
clarkb	anyway dib should probably not do a lazy umount given the later loopback device handling	16:52
fungi	lazy umounting will also return even if there are still submounts	16:53
fungi	per the umount manpage	16:53
fungi	"A system reboot would be expected in near future if you’re going to use this option for [...] local filesystem with submounts."	16:54
fungi	anyway, it's probably not great to be trying to delete block devices after lazy-umounting them, since there's no guarantee that the umount has completed by the time device deletion starts	16:55
fungi	hard-umounting would solve that, but could lead to the process hanging indefinitely	16:55
fungi	this likely explains the loop device leaks we see on builders from time to time	16:56
clarkb	ya it may be better to lazy unmount, then check in a loop with a timeout and if after say 5 minutes we still haven't unmounted then error	16:57
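A minimal sketch of the "check in a loop with a timeout" idea, not dib's actual code. The chat does not say what should be checked; because a lazily detached mount disappears from the mount table immediately, this sketch assumes the check is simply retrying the device removal until it succeeds. The names `retry_until` and `remove_partition_mappings` are hypothetical, and it assumes `kpartx -d` exits non-zero while a partition is still busy.

```python
import subprocess
import time


def retry_until(action, timeout=300.0, interval=5.0,
                clock=time.monotonic, sleep=time.sleep):
    """Call action() until it returns True or `timeout` seconds have passed.

    Returns True on success, False if the deadline passed without success.
    """
    deadline = clock() + timeout
    while True:
        if action():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def remove_partition_mappings(device):
    """Assumed check: True once kpartx manages to drop the partition mappings."""
    result = subprocess.run(["sudo", "kpartx", "-d", device])
    return result.returncode == 0


# Example (not run here): give the lazy umount up to 5 minutes to finish.
# if not retry_until(lambda: remove_partition_mappings("/dev/loop4")):
#     raise RuntimeError("loop device still busy after timeout")
```

The clock and sleep functions are parameters only so the loop can be exercised without waiting in real time.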
*** dviroel|lunch is now known as dviroel		16:58
dansmith	dpawlik: https://af03dfc56dd1bea1c6a5-57b719e0009d4036c44d6542bd77bfc6.ssl.cf1.rackcdn.com/837139/11/check/tempest-full-py3/57baa39/controller/logs/performance.json	20:10
dansmith	clarkb: ^	20:10
opendevreview	Clark Boylan proposed openstack/project-config master: Remove geard graphing from zuul-status dashboard https://review.opendev.org/c/openstack/project-config/+/837621	20:26
opendevreview	Clark Boylan proposed openstack/project-config master: Remove geard graphing from zuul-status dashboard https://review.opendev.org/c/openstack/project-config/+/837621	20:31
clarkb	dansmith: might want to graph rabbit and mysql and etcd in the processes list? but that is looking pretty good	20:34
dansmith	ah, yeah	20:34
clarkb	dansmith: is that ~ half a gig of memory just for privsep though?	20:34
dansmith	no, ~50ish	20:35
clarkb	514330624 is the rss added together from your example and then divided by 1024^2 is ~490	20:38
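The arithmetic clarkb describes, written out as a tiny sketch. The only input taken from the chat is the summed RSS of 514330624 bytes; the per-process values are not given in the log, so none are invented here, and the helper name `bytes_to_mib` is hypothetical.

```python
def bytes_to_mib(num_bytes):
    """Convert a byte count to mebibytes (1 MiB = 1024**2 bytes)."""
    return num_bytes / 1024 ** 2


total_privsep_rss = 514330624  # summed RSS in bytes, as quoted in the chat
print(f"{bytes_to_mib(total_privsep_rss):.1f} MiB")  # prints "490.5 MiB"
```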
dansmith	oh sorry, you mean total privsep usage, I see	20:40
dansmith	I thought you mean the individual ones, most of which are around 50mb	20:41
dansmith	but yeah neutron is 100m on its own for some reason	20:41
clarkb	ah yup	20:41
clarkb	dansmith: my hunch is they have more regexes/rules. I wonder if they all get precompiled for performance reasons but that means regardless of how the software is used we carry the memory cost of all the rules at all times	20:42
dansmith	maybe, I was thinking more like they're doing large dumps of netlink outputs which inflate the heap and get proxied to the neutron process	20:42
dansmith	like "iptables --line-numbers -L -nv"	20:42
clarkb	hrm ya, iptables can be chatty	20:43
dansmith	IIRC with privsep it should be mostly python code that is resident, not rootwrap-style rules right?	20:43
clarkb	I thought it still did rootwrap style rules, but I may be mistaken. I didn't follow that migration super closely	20:44
dansmith	okay, I didn't think so but... me either	20:44
dansmith	also that doesn't include the api call counts because I specified the wrong file, but it'll have those too	20:44
dansmith	also added erlang (rabbit) and mysqld to the default process list	20:44
clarkb	etcd is probably worthwhile too since it is included by default. Though unsure if anything actually uses it at this point	20:45
clarkb	dansmith: looking at neutron/etc/neutron/rootwrap.d/rootwrap.filters it seems that there are path filters too which may imply privsep is reading and writing files too	20:47
clarkb	that could explain buffer bloat too if files are large	20:47
dansmith	oh is their privsep using rootwrap instead of the native stuff?	20:47
clarkb	the config files seem to use rootwrap paths at least	20:48
dansmith	I thought the native way privsep works is that the privsep binary is run as root (potentially using rootwrap for just that) and then it proxies actual python calls rpc-style to/from the parent	20:49
dansmith	so only one rootwrap rule should be needed if that's what you use	20:49
dansmith	cinder is using etcd via tooz, IIRC	20:50
clarkb	got it re rootwrap and privsep. That sounds likely	20:50
*** dviroel is now known as dviroel|out		20:55
fungi	afaik, all privsep rule evaluation should be taking place in python, however the migration was possible piecemeal, so if neutron still hasn't completed it then they may be doing both rootwrap and privsep	21:12
clarkb	well its mostly that the configs all seem to be in rootwrap files hinting at that. But maybe that was a compatibility thing	21:17
fungi	i wouldn't be surprised if they never finished	21:18
clarkb	looks like c-bak is still a big consumer of memory too. I thought we had addressed that by not running c-bak since nothing was testing cinder backup in those installations	21:31
clarkb	looking at devstack's .zuul.yaml c-bak is explicitly enabled. I wonder if that disablement got lost in the d-g to zuul config shuffle /me tries to figure that out	21:32
clarkb	looks like we removed it from grenade in d-g	21:34
dansmith	fungi: right	21:40
dansmith	clarkb: I thought we did have some c-bak tests, but I could be wrong	21:40
dansmith	I disabled c-bak and swift in some other jobs specifically to get some memory back since we were ooming	21:40
dansmith	(and apparently c-bak requires swift)	21:41
clarkb	dansmith: ya looking at the c-bak logs it seems to be doing something. My memory may be related specifically to the grenade situation which likely doesn't check c-bak	21:43
dansmith	ack	21:44
*** rlandy is now known as rlandy|bbl		22:35
*** dasm is now known as dasm|off		22:55
