*** ChanServ changes topic to "+CVE-2016_0728." | 17:52 | |
*** ChanServ changes topic to "CVE-2016-0728 http://www.openwall.com/lists/oss-security/2016/01/19/2" | 17:53 | |
fungi | here's trusty: http://www.ubuntu.com/usn/usn-2870-1/ (so we're looking for linux-image-3.13.0-76-generic_3.13.0-76.120) | 18:00 |
fungi | precise is unaffected since it's on too early of a kernel (the bug was introduced in 3.8, precise is 3.2) | 18:01 |
clarkb | we should be able to do a mass ansible apt-get update && apt-get install linux-image | 18:02 |
clarkb | then check the results and reboot | 18:02 |
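A playbook along the lines clarkb sketches here might look like the following; this is only an illustration (the package name and task layout are guesses, not what was actually run):

```yaml
# Sketch of the mass update described above: refresh the package
# lists, pull in the fixed kernel, and record the running kernel so
# results can be checked before scheduling reboots. The package
# name linux-image-generic is illustrative.
- hosts: all
  tasks:
    - name: apt-get update and install the patched kernel image
      shell: apt-get update && apt-get -y install linux-image-generic
    - name: note the currently running kernel (new one applies only after reboot)
      command: uname -r
```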
fungi | yep, and hopefully soonish if my sources.list change has propagated | 18:03 |
fungi | which it doesn't seem to have done yet | 18:03 |
fungi | puppetboard does not look so well | 18:04 |
clarkb | fungi: its possible mordred's ansible work has broken something | 18:04 |
fungi | 111 population, 111 nodes unreported in the last 1.5 hours | 18:04 |
clarkb | there is also linux-image-virtual | 18:04 |
clarkb | we should double check on what our VMs actually use | 18:04 |
fungi | ahh, i was going by uname -a | 18:05 |
mordred | o/ | 18:05 |
mordred | what? | 18:05 |
mordred | clarkb: oh - in puppetboard? | 18:05 |
clarkb | mordred: ya no reports for an hour and a half | 18:05
mordred | k. on it | 18:05 |
mordred | o - yes - that's because of the two patches that are trying to land | 18:06 |
fungi | on a rackspace instance, which isn't entirely representative of the state of the world granted since we have pypi mirrors elsewhere | 18:06 |
mordred | that haven't landed due to node starvation | 18:06 |
mordred | but I just saw their last set of tests start running in zuul status | 18:06 |
fungi | mordred: well, also it's not puppeting. my 267778 change to manage sources.list merged an hour ago and hasn't propagated | 18:07 |
mordred | right | 18:07 |
mordred | that's why they haven't reported | 18:07 |
mordred | because they aren't puppeting | 18:07 |
mordred | will be fixed as soon as that patch lands | 18:07 |
fungi | ahh, okay, just confirming it's not only missing reports, but actually not running | 18:07 |
fungi | okay, confirmed my sources.list change has propagated to review.o.o now | 18:37 |
clarkb | fungi: you should be able to do something like sudo ansible all -m shell -a 'apt-get update && apt-get install $package' | 18:41
clarkb | but check my ansible :) | 18:42 |
fungi | okay, just tested and confirmed that review and zuul (representatives of trusty and precise respectively) apt-get update successfully after the sources.list update got applied by puppet just now | 18:43 |
fungi | clarkb: we need to break that up by release though, right? or just accept that it will fail on not-trusty? | 18:44 |
clarkb | I was thinking accept it will fail or noop on not-trusty | 18:44
clarkb | if you write a proper playbook I think you can restrict it to where the fact gathering reports trusty | 18:45
clarkb | might be able to do that on the command line too | 18:45 |
fungi | i have about 10 minutes to prep for the weekly meeting, so i'll pick this back up in about 70 minutes ;) | 18:48 |
clarkb | http://docs.ansible.com/ansible/playbooks_conditionals.html#the-when-statement | 18:51 |
clarkb | so we could do something like when os_distro_release == "trusty" | 18:51 |
clarkb | (those values may not be what ansible uses) | 18:52 |
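As clarkb hedges, those names are placeholders; the facts Ansible actually gathers are `ansible_distribution` and `ansible_distribution_release`, so a minimal sketch of the conditional would read:

```yaml
# Sketch of the when-statement described above, using the real
# Ansible fact names. The package being installed is illustrative.
- name: run the upgrade only on trusty hosts
  command: apt-get -y install linux-image-generic
  when: ansible_distribution == "Ubuntu" and ansible_distribution_release == "trusty"
```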
fungi | okay, checking up on where we're at with this now | 21:03 |
fungi | apt-cache show on review.o.o indicates we have a linux-generic 3.13.0.76.82 pending | 21:05 |
fungi | but we apparently need 3.13.0-76.120 | 21:06 |
fungi | i installed 3.13.0.76.82 on etherpad-dev just to check it out, and /usr/share/doc/linux-image-3.13.0-76-generic/changelog.Debian.gz indicates that it is indeed the update for CVE-2016-0728/LP: #1534887 "KEYS: Fix keyring ref leak in join_session_keyring()" | 22:13 |
openstack | Launchpad bug 1534887 in linux (Ubuntu Xenial) "CVE-2016-0728" [High,Incomplete] https://launchpad.net/bugs/1534887 | 22:13 |
fungi | thanks openstack | 22:13 |
fungi | 3.13.0-76.120 may be the corresponding source package version | 22:13 |
clarkb | fungi: so thats just a package versioning confusion? | 22:13 |
fungi | hard to tell since the packages.u.c update pulse for today hasn't happened yet | 22:14 |
fungi | but yes, looks that way | 22:14 |
fungi | oh! | 22:15 |
fungi | should have done apt-cache show linux-image-3.13.0-76-generic not linux-generic | 22:16 |
fungi | that does actually indicate that 3.13.0-76.120 is the current version in the package list | 22:17 |
clarkb | if possible leave the nodepool restart to me and I can do it when I upgrade the service | 22:17 |
fungi | so i think that means we're ready to start kernel upgrades | 22:17 |
fungi | well, we need to make sure the new packages get installed everywhere before we even start planning what order we reboot them in, i think | 22:17 |
clarkb | ++ | 22:18 |
clarkb | fungi: did you see my ansible link? something like that should work great for ansibling the update and install | 22:18 |
fungi | yep, looks good | 22:19 |
fungi | also we have 33 nodes running trusty according to http://puppetboard.openstack.org/fact/operatingsystemrelease/14.04 | 22:19 |
fungi | i'll go ahead and give the package updates a shot | 22:20 |
clarkb | ok let me know if I can help with that | 22:20 |
fungi | also need to circle back around and see if red hat has posted rhel 7 package details for this yet | 22:21 |
clarkb | I can do that | 22:21 |
fungi | and whether the equivalents have made it into centos | 22:21 |
fungi | i couldn't find any earlier, but that was hours ago so hopefully they've updated the bug now | 22:21 |
clarkb | my initial check is coming up empty | 22:22 |
clarkb | I usually start at https://lists.centos.org/pipermail/centos-announce/2016-January/thread.html | 22:22 |
clarkb | but will check yum too just because | 22:22 |
fungi | mordred: if you have any input on ansibleisms for this, it's appreciated. i'm just going to start swinging wildly over here | 22:23 |
clarkb | I appreciate that the package name is "kernel" | 22:23 |
fungi | debian used to call it kernel-image until they started having hurd, kfreebsd, et cetera kernels in addition to linux | 22:23 |
fungi | then it became a little confusing so they did a package name transition to linux-image | 22:24 |
clarkb | we are currently running a few versions behind, going to check if the latest has our patch | 22:25
clarkb | nope latest is from the 6th | 22:25 |
clarkb | so I think we are still waiting for package update on centos | 22:25 |
fungi | clarkb: from the example you linked, looks like we want when: ansible_distribution == "Ubuntu" and ansible_distribution_major_version == "14.04" | 22:26 |
clarkb | fungi: that sounds about right | 22:26 |
clarkb | fungi: then just run a shell task to do apt-get update && apt-get install $package or dist-upgrade | 22:26 |
clarkb | or you can use the apt module and have it do those things | 22:26 |
clarkb | (though I find it easier to just shell for one offs like this) | 22:27 |
fungi | do i create a temporary role file to put this stuff in, or can it be fed as cli arguments? | 22:27 |
clarkb | I think you need a temp playbook | 22:27 |
clarkb | it may be possible to feed it all as cli arguments but the docs tend to be lacking on how to do that | 22:28 |
clarkb | so I just cave and make a yaml file so I can follow docs | 22:28
clarkb | a general playbook for "forcefully update packages now" might make sense though | 22:28 |
clarkb | then we can just run it next time | 22:28 |
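A reusable "forcefully update packages now" playbook like the one clarkb suggests could use the apt module instead of shelling out; a sketch (the filename and upgrade scope are assumptions):

```yaml
# Hypothetical update_packages.yaml: refresh the apt cache and
# dist-upgrade all Ubuntu hosts in one pass.
- hosts: all
  tasks:
    - name: apt-get update && apt-get dist-upgrade
      apt:
        update_cache: yes
        upgrade: dist
      when: ansible_distribution == "Ubuntu"
```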
fungi | No manual entry for ansible-playbook; See 'man 7 undocumented' for help when manual pages are not available. | 22:29 |
* fungi shakes cane again | 22:29 | |
clarkb | fungi: because we pip install I think | 22:29 |
fungi | and ansible-playbook --help doesn't work without sudo, wants to write to /var/log/ansible.log/var/log/ansible.log | 22:30 |
fungi | that's sort of scary | 22:30 |
* fungi needs to stop thinking like a sysadmin and use web searches for stack sexchange articles from people he has no reason to trust | 22:30 | |
clarkb | that may be a misconfiguration on our part | 22:30 |
clarkb | fungi: I like to google man whatever | 22:30 |
clarkb | google is the best mandb | 22:31 |
clarkb | ya log_path=/var/log/ansible.log is set in /etc/ansible/ansible.cfg | 22:31 |
fungi | however, why does it even try to open its log when invoked with only --help? that's the part i found scary | 22:32 |
clarkb | also why is it appending /var/log/ansible to our path :) | 22:35 |
fungi | clarkb: oh, that was my bouncing paste button. i need to fix that | 22:35 |
fungi | the message just said IOError: [Errno 13] Permission denied: '/var/log/ansible.log' | 22:35 |
clarkb | aha | 22:36 |
fungi | but sometimes my middle button registers two clicks when i press once at the moment | 22:36 |
fungi | the example i had used a -f 10 (from our workspace cleanup) so i was trying to figure out what that's doing, but found my answer in /usr/local/lib/python2.7/dist-packages/ansible/cli/playbook.py | 22:37 |
fungi | use the source, luke | 22:37 |
fungi | thought maybe it was a timeout or something, but no it's the short form of --forks | 22:37 |
fungi | so parallelism count | 22:37 |
clarkb | yup | 22:39 |
clarkb | default is 5 I think | 22:39
fungi | okay, so i have a file in my homedir called upgrade_kernel.yaml with this content: http://paste.openstack.org/show/484348 | 22:41 |
clarkb | looking | 22:41 |
clarkb | fungi: that looks about right, then you would do `ansible-playbook $pathtoyamlfile` | 22:42
clarkb | fungi: er you need a host spec first | 22:42 |
clarkb | fungi: ansible-playbook $host yamlfile | 22:42 |
clarkb | fungi: where all is an alias for all known hosts | 22:42 |
fungi | sudo ansible-playbook -f 10 all upgrade_kernel.yaml | 22:42 |
clarkb | ya | 22:42 |
fungi | trying that now | 22:43 |
clarkb | you can replace all with a specific fqdn if you want to test first | 22:43
fungi | i'll do it under screen | 22:43 |
fungi | oh, good idea | 22:43 |
mordred | heya - sorry - I was on the phone - do you still need help? | 22:46 |
clarkb | mordred: we should know shortly :) | 22:46 |
fungi | looks like ansible-playbook doesn't take a hostname parameter | 22:46 |
mordred | soooo | 22:46 |
fungi | it wants ANSIBLE_HOSTS envvar passed? | 22:46 |
mordred | that's not a playbook you have there | 22:46 |
mordred | one sec | 22:47 |
fungi | do i comma-separate multiples or... | 22:47 |
fungi | oh | 22:47 |
clarkb | oh can we ansible it instead | 22:47 |
fungi | did i mention swinging wildly over here? ;) | 22:47 |
clarkb | (I really hate the ansible terminology fwiw) | 22:47 |
clarkb | (granted I am sure puppet is no better I have just gotten used to it | 22:47 |
clarkb | fungi: you should just pass the name on the command line in place of all iirc | 22:48 |
mordred | http://paste.openstack.org/show/484349/ | 22:48 |
mordred | fungi, clarkb: ^^ try that | 22:48
clarkb | oh right this is the difference between playbook and the other command | 22:48 |
mordred | ansible-playbook name-of-yaml.yml | 22:48 |
clarkb | (which is part of my confusion I think, and ansible should just rm one of them) | 22:48 |
fungi | mmm black magic | 22:48 |
mordred | a playbook is a collection of plays - a play is the combination of host specifications with one or more tasks | 22:49 |
mordred | so a playbook by design is a thing that associates tasks with where you want to run them | 22:49 |
fungi | what's the strategy parameter? | 22:49 |
mordred | actually - that's a waste, you can remove it - there is only one task here | 22:49 |
mordred | but in general it says "don't wait for other hosts to finish the task you're on before proceeding to the next task" | 22:50
fungi | ahh, got it | 22:50 |
mordred | which in a case like this, is correct - you do not care that all hosts finish task one then move on to task two | 22:50 |
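A minimal illustration of the strategy setting mordred describes (the tasks here are hypothetical stand-ins):

```yaml
# With the default (linear) strategy, every host must finish
# "first task" before any host starts "second task"; with
# strategy: free, each host runs through its own task list as
# fast as it can, independently of the others.
- hosts: all
  strategy: free
  tasks:
    - name: first task
      command: /bin/true
    - name: second task
      command: /bin/true
```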
fungi | so i ansible-playbook this file still? | 22:50 |
fungi | and pass an envvar to indicate which host i want to test it on? | 22:50 |
fungi | at least that's what the manpage seems to imply | 22:51 |
mordred | fungi: yes | 22:51 |
mordred | no | 22:52 |
mordred | do --limit=$hostname | 22:52 |
mordred | fungi: ansible-playboot --limit=review-dev.openstack.org that-file.yaml | 22:52 |
fungi | sudo ansible-playbook -f 10 --limit=ask-staging.openstack.org upgrade_kernel.yaml | 22:52 |
mordred | yes | 22:52 |
fungi | thanks | 22:52 |
fungi | it's not liking the hosts: * line | 22:53 |
clarkb | fungi: use hosts all | 22:54 |
clarkb | I think | 22:54 |
fungi | yep, that seems to have worked | 22:54 |
mordred | oops. sorry | 22:54 |
fungi | wow, it claims to have run, but completed very quickly | 22:54 |
fungi | far too quickly given that this should have taken a minute or so | 22:55 |
fungi | oh, it says TASK [command]... skipping: [ask-staging.openstack.org] | 22:55 |
clarkb | fungi: the when may have failed | 22:55 |
mordred | yes. | 22:56 |
mordred | http://paste.openstack.org/show/484351/ | 22:57 |
fungi | indeed, it looks like it is actually running once i remove the when condition | 22:57 |
mordred | fungi: ^^ | 22:57 |
mordred | that way you can check to see what the values of the variables are there | 22:57 |
fungi | thanks | 22:58 |
mordred | you can also run "ansible ask-staging.openstack.org -m setup" and it will print all of the variables | 22:58 |
mordred | fungi: are you screening? | 22:58 |
fungi | "ansible_distribution_major_version": "14" | 22:58 |
fungi | mordred: yep | 22:59 |
mordred | well, there we go! | 22:59 |
mordred | (that's probably good enough for matching for this) | 22:59 |
fungi | so, er, i guess that's not the same as what facter does for major version | 22:59
clarkb | since we never did unicorn | 22:59 |
fungi | yeah, good enough | 22:59 |
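Putting the discovered fact values together, the working guard from this exchange would be (a sketch, with the fact values as reported above):

```yaml
# On Ubuntu 14.04 the gathered facts come out as:
#   ansible_distribution:               "Ubuntu"
#   ansible_distribution_major_version: "14"      (not "14.04")
# ansible_distribution_version holds the full "14.04" if that is
# ever needed instead.
when: ansible_distribution == "Ubuntu" and ansible_distribution_major_version == "14"
```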
fungi | `sudo ssh ask-staging.openstack.org dpkg -l linux-image-3.13.0-76-generic` indicates it's installed now | 23:00 |
mordred | woot! | 23:01 |
fungi | and rerunning the playbook with == "14" seems to not skip the host | 23:01 |
mordred | that's excellent | 23:01 |
fungi | so i think we're set to open it wide? | 23:01 |
clarkb | cool time to run it everywhere I think | 23:01 |
mordred | ++ | 23:01 |
fungi | so i expect etherpad-dev (which i tested manually earlier) and ask-staging to fail possibly as they've already got the new package | 23:02 |
fungi | also possibly some hosts (review-dev and some of our pypi mirrors) with puppet disabled | 23:02 |
fungi | anyway, it's running now | 23:02 |
fungi | seems to be working so far. i'll do some more spot checks after it finishes | 23:03 |
fungi | looks like it's skipping the right hosts | 23:06 |
fungi | also some errors which look like hosts where puppet has been disabled for a while so probably fon | 23:06 |
fungi | gah | 23:07 |
fungi | probably don't have my sources.list updates yet | 23:07 |
clarkb | hrm, maybe we just manually update sources.list if list is small and rerun? | 23:08 |
clarkb | or we fiure out if we can enable puppet again | 23:08 |
fungi | yeah, for example i have no idea when/why it was disabled on review-dev | 23:10 |
clarkb | I think that was part of the gerrit upgrade prep | 23:11 |
clarkb | we should get it puppeting again but that may be more than a 5 minute change | 23:11
*** ianw has joined #openstack-infra-incident | 23:12 | |
fungi | it was reenabled at some point after the gerrit upgrade | 23:12 |
fungi | and disabled again since | 23:12 |
fungi | mordred: what's the default timeout on ansible? it looks like it's just hanging for a few minutes now, and i suspect it's having trouble connecting to or hearing back from something | 23:14 |
mordred | fungi: it's going to look like it's hanging if it's running something - output is buffered til the end of task execution | 23:14 |
mordred | fungi: I have not noticed overly-long hang/timeouts when doing testing of puppet runs ... | 23:15 |
fungi | some of these are probably just taking a while | 23:15 |
mordred | yah | 23:15 |
Clint | i think it's like 5 or 10 minutes | 23:15 |
mordred | I wish the output was streamed - but output is a json blob, so I think streaming would be tricky | 23:15
fungi | like update-initrd churning through dozens of old kernel packages we never autoremoved | 23:15 |
mordred | fungi: oh. yeah. that'll take a minute | 23:16 |
clarkb | did the change to autoremove not merge | 23:16 |
clarkb | maybe I should dig that up | 23:16
fungi | it's still sitting. no new output for about 10 minutes | 23:22 |
fungi | mordred: ps claims that my ansible-playbook call has 10 defunct zombie children | 23:29 |
fungi | no, sorry, i mis-counted. just 9 | 23:30 |
fungi | but i worry they're never going to terminate and are actually indefinitely hung | 23:37 |
mordred | hrm | 23:39 |
mordred | fungi: well, worst-case with bailing is that you'll abort 10 apt-get install processes | 23:39 |
fungi | any clue whether i should just shoot the parent in the head and try again? keyboard interrupt? sigkill? or are there more graceful options? | 23:39 |
mordred | fungi: and have to apt-get -f install somewhere | 23:39 |
mordred | fungi: I'd just ctrl-c if it were me | 23:39 |
fungi | i actually looked at ps hoping it would mention which hosts it was communicating with for the currently lingering forks, but no such luck | 23:40 |
mordred | you could maybe kill the forst | 23:40 |
mordred | and see if you get good error from the parent | 23:40 |
mordred | fungi: forks. not forst | 23:40 |
mordred | fungi: don't kill the forst | 23:40 |
fungi | yeah, i'll kill the immediate parent of the forks but not the parent's parent | 23:41 |
fungi | oh, i missed, it's right there in the ssh command-line | 23:42 |
fungi | 15.126.140.7 (which seems to have no reverse dns?) | 23:42 |
fungi | that's pypi.region-b.geo-1.openstack.org | 23:42 |
mordred | no reverse dns in hpcloud | 23:43 |
fungi | right, which is why i checked that one first | 23:43 |
fungi | and also because it looked like an hpcloud ip address | 23:43 |
mordred | yay for 15. | 23:43 |
fungi | so the recap says these failed: afs01.dfw, afs01.ord, afsdb01, afsdb02, odsreg-test-corvus, pypi.region-b.geo-1, review-dev, test-mordred-config-drive | 23:45 |
jeblair | i think we can delete odsreg-test-corvus | 23:45 |
clarkb | those all sounds like VMs that may not have up to date sources.list | 23:45 |
fungi | er, test-mordred-config-drive was unreachable, not failed | 23:45 |
fungi | also there's a ab78618b-a1f4-4d0a-8aeb-56f7b688bcf4 which was unreachable | 23:46 |
fungi | the hostlist is dynamically generated from nova list such that i can just nova delete the trash instances, or does something else need updating in between? | 23:47 |
fungi | no idea what that uuid is, it's not in openstackci rax-dfw though | 23:47 |
fungi | nevermind, i was looking in the openstack tenant not openstackci | 23:48 |
fungi | it's the old release.slave. i'll clean that one up too | 23:48 |
fungi | it was unreachable because it's in shutdown state since after i replaced it | 23:49 |
fungi | jeblair: i've deleted odsreg-test-corvus.openstack.org (from rax-ord) too, thanks | 23:50 |
fungi | should we ignore the afs servers for now or reenable puppet on them or update them manually? | 23:51 |
fungi | i've already got review-dev sorted with zaro and reenabled | 23:51 |
jeblair | fungi: you're saying the afs fileservers are disabled in puppet? | 23:58 |
jeblair | fungi, mordred: i don't know why that would be | 23:58 |
fungi | jeblair: actually, they're not according to puppetboard | 23:59 |
mordred | fungi: test-mordred-config-drive can die | 23:59 |
fungi | so the update presumably failed on them for some other reason i'll debug in a bit | 23:59 |
mordred | fungi: if you delete instances, you want to delete the inventory cache too | 23:59 |
fungi | mordred: thanks, doing | 23:59 |
mordred | fungi: /var/cache/ansible-inventory/<tab> | 23:59 |
Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!