*** ChanServ changes topic to "+CVE-2016_0728." | 17:52 | |
*** ChanServ changes topic to "CVE-2016-0728 http://www.openwall.com/lists/oss-security/2016/01/19/2" | 17:53 | |
fungi | here's trusty: http://www.ubuntu.com/usn/usn-2870-1/ (so we're looking for linux-image-3.13.0-76-generic_3.13.0-76.120) | 18:00 |
fungi | precise is unaffected since it's on too early of a kernel (the bug was introduced in 3.8, precise is 3.2) | 18:01 |
clarkb | we should be able to do a mass ansible apt-get update && apt-get install linux-image | 18:02 |
clarkb | then check the results and reboot | 18:02 |
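A playbook along the lines clarkb sketches here might look like the following; this is only an illustration (the package name and task layout are guesses, not what was actually run):

```yaml
# Sketch of the mass update described above: refresh the package
# lists, pull in the fixed kernel, and record the running kernel so
# results can be checked before scheduling reboots. The package
# name linux-image-generic is illustrative.
- hosts: all
  tasks:
    - name: apt-get update and install the patched kernel image
      shell: apt-get update && apt-get -y install linux-image-generic
    - name: note the currently running kernel (new one applies only after reboot)
      command: uname -r
```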
fungi | yep, and hopefully soonish if my sources.list change has propagated | 18:03 |
fungi | which it doesn't seem to have done yet | 18:03 |
fungi | puppetboard does not look so well | 18:04 |
clarkb | fungi: its possible mordred's ansible work has broken something | 18:04 |
fungi | 111 population, 111 nodes unreported in the last 1.5 hours | 18:04 |
clarkb | there is also linux-image-virtual | 18:04 |
clarkb | we should double check on what our VMs actually use | 18:04 |
fungi | ahh, i was going by uname -a | 18:05 |
mordred | o/ | 18:05 |
mordred | what? | 18:05 |
mordred | clarkb: oh - in puppetboard? | 18:05 |
clarkb | mordred: ya no reports for an hour and a half | 18:05
mordred | k. on it | 18:05 |
mordred | o - yes - that's because of the two patches that are trying to land | 18:06 |
fungi | on a rackspace instance, which isn't entirely representative of the state of the world granted since we have pypi mirrors elsewhere | 18:06 |
mordred | that haven't landed due to node starvation | 18:06 |
mordred | but I just saw their last set of tests start running in zuul status | 18:06 |
fungi | mordred: well, also it's not puppeting. my 267778 change to manage sources.list merged an hour ago and hasn't propagated | 18:07 |
mordred | right | 18:07 |
mordred | that's why they haven't reported | 18:07 |
mordred | because they aren't puppeting | 18:07 |
mordred | will be fixed as soon as that patch lands | 18:07 |
fungi | ahh, okay, just confirming it's not only missing reports, but actually not running | 18:07 |
fungi | okay, confirmed my sources.list change has propagated to review.o.o now | 18:37 |
clarkb | fungi: you should be able to do something like sudo ansible all -m shell -a 'apt-get update && apt-get install $package' | 18:41
clarkb | but check my ansible :) | 18:42 |
fungi | okay, just tested and confirmed that review and zuul (representatives of trusty and precise respectively) apt-get update successfully after the sources.list update got applied by puppet just now | 18:43 |
fungi | clarkb: we need to break that up by release though, right? or just accept that it will fail on not-trusty? | 18:44 |
clarkb | I was thinking accept it will fail or noop on not-trusty | 18:44
clarkb | if you write a proper playbook I think you can restrict it to where the fact gathering reports trusty | 18:45
clarkb | might be able to do that on the command line too | 18:45 |
fungi | i have about 10 minutes to prep for the weekly meeting, so i'll pick this back up in about 70 minutes ;) | 18:48 |
clarkb | http://docs.ansible.com/ansible/playbooks_conditionals.html#the-when-statement | 18:51 |
clarkb | so we could do something like when os_distro_release == "trusty" | 18:51 |
clarkb | (those values may not be what ansible uses) | 18:52 |
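As clarkb hedges, those names are placeholders; the facts Ansible actually gathers are `ansible_distribution` and `ansible_distribution_release`, so a minimal sketch of the conditional would read:

```yaml
# Sketch of the when-statement described above, using the real
# Ansible fact names. The package being installed is illustrative.
- name: run the upgrade only on trusty hosts
  command: apt-get -y install linux-image-generic
  when: ansible_distribution == "Ubuntu" and ansible_distribution_release == "trusty"
```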
fungi | okay, checking up on where we're at with this now | 21:03 |
fungi | apt-cache show on review.o.o indicates we have a linux-generic 3.13.0.76.82 pending | 21:05 |
fungi | but we apparently need 3.13.0-76.120 | 21:06 |
fungi | i installed 3.13.0.76.82 on etherpad-dev just to check it out, and /usr/share/doc/linux-image-3.13.0-76-generic/changelog.Debian.gz indicates that it is indeed the update for CVE-2016-0728/LP: #1534887 "KEYS: Fix keyring ref leak in join_session_keyring()" | 22:13 |
openstack | Launchpad bug 1534887 in linux (Ubuntu Xenial) "CVE-2016-0728" [High,Incomplete] https://launchpad.net/bugs/1534887 | 22:13 |
fungi | thanks openstack | 22:13 |
fungi | 3.13.0-76.120 may be the corresponding source package version | 22:13 |
clarkb | fungi: so thats just a package versioning confusion? | 22:13 |
fungi | hard to tell since the packages.u.c update pulse for today hasn't happened yet | 22:14 |
fungi | but yes, looks that way | 22:14 |
fungi | oh! | 22:15 |
fungi | should have done apt-cache show linux-image-3.13.0-76-generic not linux-generic | 22:16 |
fungi | that does actually indicate that 3.13.0-76.120 is the current version in the package list | 22:17 |
clarkb | if possible leave the nodepool restart to me and I can do it when I upgrade the service | 22:17 |
fungi | so i think that means we're ready to start kernel upgrades | 22:17 |
fungi | well, we need to make sure the new packages get installed everywhere before we even start planning what order we reboot them in, i think | 22:17 |
clarkb | ++ | 22:18 |
clarkb | fungi: did you see my ansible link? something like that should work great for ansibling the update and install | 22:18 |
fungi | yep, looks good | 22:19 |
fungi | also we have 33 nodes running trusty according to http://puppetboard.openstack.org/fact/operatingsystemrelease/14.04 | 22:19 |
fungi | i'll go ahead and give the package updates a shot | 22:20 |
clarkb | ok let me know if I can help with that | 22:20 |
fungi | also need to circle back around and see if red hat has posted rhel 7 package details for this yet | 22:21 |
clarkb | I can do that | 22:21 |
fungi | and whether the equivalents have made it into centos | 22:21 |
fungi | i couldn't find any earlier, but that was hours ago so hopefully they've updated the bug now | 22:21 |
clarkb | my initial check is coming up empty | 22:22 |
clarkb | I usually start at https://lists.centos.org/pipermail/centos-announce/2016-January/thread.html | 22:22 |
clarkb | but will check yum too just because | 22:22 |
fungi | mordred: if you have any input on ansibleisms for this, it's appreciated. i'm just going to start swinging wildly over here | 22:23 |
clarkb | I appreciate that the package name is "kernel" | 22:23 |
fungi | debian used to call it kernel-image until they started having hurd, kfreebsd, et cetera kernels in addition to linux | 22:23 |
fungi | then it became a little confusing so they did a package name transition to linux-image | 22:24 |
clarkb | we are currently running a few versions behind, going to check if the latest has our patch | 22:25
clarkb | nope latest is from the 6th | 22:25 |
clarkb | so I think we are still waiting for package update on centos | 22:25 |
fungi | clarkb: from the example you linked, looks like we want when: ansible_distribution == "Ubuntu" and ansible_distribution_major_version == "14.04" | 22:26 |
clarkb | fungi: that sounds about right | 22:26 |
clarkb | fungi: then just run a shell task to do apt-get update && apt-get install $package or dist-upgrade | 22:26 |
clarkb | or you can use the apt module and have it do those things | 22:26 |
clarkb | (though I find it easier to just shell for one offs like this) | 22:27 |
fungi | do i create a temporary role file to put this stuff in, or can it be fed as cli arguments? | 22:27 |
clarkb | I think you need a temp playbook | 22:27 |
clarkb | it may be possible to feed it all as cli arguments but the docs tend to be lacking on how to do that | 22:28 |
clarkb | so I just cave and make a yaml file so I can follow docs | 22:28
clarkb | a general playbook for "forcefully update packages now" might make sense though | 22:28 |
clarkb | then we can just run it next time | 22:28 |
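A reusable "forcefully update packages now" playbook like the one clarkb suggests could use the apt module instead of shelling out; a sketch (the filename and upgrade scope are assumptions):

```yaml
# Hypothetical update_packages.yaml: refresh the apt cache and
# dist-upgrade all Ubuntu hosts in one pass.
- hosts: all
  tasks:
    - name: apt-get update && apt-get dist-upgrade
      apt:
        update_cache: yes
        upgrade: dist
      when: ansible_distribution == "Ubuntu"
```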
fungi | No manual entry for ansible-playbook; See 'man 7 undocumented' for help when manual pages are not available. | 22:29 |
* fungi shakes cane again | 22:29 | |
clarkb | fungi: because we pip install I think | 22:29 |
fungi | and ansible-playbook --help doesn't work without sudo, wants to write to /var/log/ansible.log/var/log/ansible.log | 22:30 |
fungi | that's sort of scary | 22:30 |
* fungi needs to stop thinking like a sysadmin and use web searches for stack sexchange articles from people he has no reason to trust | 22:30 | |
clarkb | that may be a misconfiguration on our part | 22:30 |
clarkb | fungi: I like to google man whatever | 22:30 |
clarkb | google is the best mandb | 22:31 |
clarkb | ya log_path=/var/log/ansible.log is set in /etc/ansible/ansible.cfg | 22:31 |
fungi | however, why does it even try to open its log when invoked with only --help? that's the part i found scary | 22:32 |
clarkb | also why is it appending /var/log/ansible to our path :) | 22:35 |
fungi | clarkb: oh, that was my bouncing paste button. i need to fix that | 22:35 |
fungi | the message just said IOError: [Errno 13] Permission denied: '/var/log/ansible.log' | 22:35 |
clarkb | aha | 22:36 |
fungi | but sometimes my middle button registers two clicks when i press once at the moment | 22:36 |
fungi | the example i had used a -f 10 (from our workspace cleanup) so i was trying to figure out what that's doing, but found my answer in /usr/local/lib/python2.7/dist-packages/ansible/cli/playbook.py | 22:37 |
fungi | use the source, luke | 22:37 |
fungi | thought maybe it was a timeout or something, but no it's the short form of --forks | 22:37 |
fungi | so parallelism count | 22:37 |
clarkb | yup | 22:39 |
clarkb | default is 5 I think | 22:39
fungi | okay, so i have a file in my homedir called upgrade_kernel.yaml with this content: http://paste.openstack.org/show/484348 | 22:41 |
clarkb | looking | 22:41 |
clarkb | fungi: that looks about right, then you would do `ansible-playbook $pathtoyamlfile` | 22:42
clarkb | fungi: er you need a host spec first | 22:42 |
clarkb | fungi: ansible-playbook $host yamlfile | 22:42 |
clarkb | fungi: where all is an alias for all known hosts | 22:42 |
fungi | sudo ansible-playbook -f 10 all upgrade_kernel.yaml | 22:42 |
clarkb | ya | 22:42 |
fungi | trying that now | 22:43 |
clarkb | you can replace all with a specific fqdn if you want to test first | 22:43
fungi | i'll do it under screen | 22:43 |
fungi | oh, good idea | 22:43 |
mordred | heya - sorry - I was on the phone - do you still need help? | 22:46 |
clarkb | mordred: we should know shortly :) | 22:46 |
fungi | looks like ansible-playbook doesn't take a hostname parameter | 22:46 |
mordred | soooo | 22:46 |
fungi | it wants ANSIBLE_HOSTS envvar passed? | 22:46 |
mordred | that's not a playbook you have there | 22:46 |
mordred | one sec | 22:47 |
fungi | do i comma-separate multiples or... | 22:47 |
fungi | oh | 22:47 |
clarkb | oh can we ansible it instead | 22:47 |
fungi | did i mention swinging wildly over here? ;) | 22:47 |
clarkb | (I really hate the ansible terminology fwiw) | 22:47 |
clarkb | (granted I am sure puppet is no better I have just gotten used to it | 22:47 |
clarkb | fungi: you should just pass the name on the command line in place of all iirc | 22:48 |
mordred | http://paste.openstack.org/show/484349/ | 22:48 |
mordred | fungi, clarkb: ^^ try that | 22:48
clarkb | oh right this is the difference between playbook and the other command | 22:48 |
mordred | ansible-playbook name-of-yaml.yml | 22:48 |
clarkb | (which is part of my confusion I think, and ansible should just rm one of them) | 22:48 |
fungi | mmm black magic | 22:48 |
mordred | a playbook is a collection of plays - a play is the combination of host specifications with one or more tasks | 22:49 |
mordred | so a playbook by design is a thing that associates tasks with where you want to run them | 22:49 |
fungi | what's the strategy parameter? | 22:49 |
mordred | actually - that's a waste, you can remove it - there is only one task here | 22:49 |
mordred | but in general it says "don't wait for other hosts to finish the task you're on before proceeding to the next task" | 22:50
fungi | ahh, got it | 22:50 |
mordred | which in a case like this, is correct - you do not care that all hosts finish task one then move on to task two | 22:50 |
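A minimal illustration of the strategy setting mordred describes (the tasks here are hypothetical stand-ins):

```yaml
# With the default (linear) strategy, every host must finish
# "first task" before any host starts "second task"; with
# strategy: free, each host runs through its own task list as
# fast as it can, independently of the others.
- hosts: all
  strategy: free
  tasks:
    - name: first task
      command: /bin/true
    - name: second task
      command: /bin/true
```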
fungi | so i ansible-playbook this file still? | 22:50 |
fungi | and pass an envvar to indicate which host i want to test it on? | 22:50 |
fungi | at least that's what the manpage seems to imply | 22:51 |
mordred | fungi: yes | 22:51 |
mordred | no | 22:52 |
mordred | do --limit=$hostname | 22:52 |
mordred | fungi: ansible-playboot --limit=review-dev.openstack.org that-file.yaml | 22:52 |
fungi | sudo ansible-playbook -f 10 --limit=ask-staging.openstack.org upgrade_kernel.yaml | 22:52 |
mordred | yes | 22:52 |
fungi | thanks | 22:52 |
fungi | it's not liking the hosts: * line | 22:53 |
clarkb | fungi: use hosts all | 22:54 |
clarkb | I think | 22:54 |
fungi | yep, that seems to have worked | 22:54 |
mordred | oops. sorry | 22:54 |
fungi | wow, it claims to have run, but completed very quickly | 22:54 |
fungi | far too quickly given that this should have taken a minute or so | 22:55 |
fungi | oh, it says TASK [command]... skipping: [ask-staging.openstack.org] | 22:55 |
clarkb | fungi: the when may have failed | 22:55 |
mordred | yes. | 22:56 |
mordred | http://paste.openstack.org/show/484351/ | 22:57 |
fungi | indeed, it looks like it is actually running once i remove the when condition | 22:57 |
mordred | fungi: ^^ | 22:57 |
mordred | that way you can check to see what the values of the variables are there | 22:57 |
fungi | thanks | 22:58 |
mordred | you can also run "ansible ask-staging.openstack.org -m setup" and it will print all of the variables | 22:58 |
mordred | fungi: are you screening? | 22:58 |
fungi | "ansible_distribution_major_version": "14" | 22:58 |
fungi | mordred: yep | 22:59 |
mordred | well, there we go! | 22:59 |
mordred | (that's probably good enough for matching for this) | 22:59 |
fungi | so, er, i guess that's not the same as what facter does for major version | 22:59
clarkb | since we never did unicorn | 22:59 |
fungi | yeah, good enough | 22:59 |
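Putting the discovered fact values together, the working guard from this exchange would be (a sketch, with the fact values as reported above):

```yaml
# On Ubuntu 14.04 the gathered facts come out as:
#   ansible_distribution:               "Ubuntu"
#   ansible_distribution_major_version: "14"      (not "14.04")
# ansible_distribution_version holds the full "14.04" if that is
# ever needed instead.
when: ansible_distribution == "Ubuntu" and ansible_distribution_major_version == "14"
```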
fungi | `sudo ssh ask-staging.openstack.org dpkg -l linux-image-3.13.0-76-generic` indicates it's installed now | 23:00 |
mordred | woot! | 23:01 |
fungi | and rerunning the playbook with == "14" seems to not skip the host | 23:01 |
mordred | that's excellent | 23:01 |
fungi | so i think we're set to open it wide? | 23:01 |
clarkb | cool time to run it everywhere I think | 23:01 |
mordred | ++ | 23:01 |
fungi | so i expect etherpad-dev (which i tested manually earlier) and ask-staging to fail possibly as they've already got the new package | 23:02 |
fungi | also possibly some hosts (review-dev and some of our pypi mirrors) with puppet disabled | 23:02 |
fungi | anyway, it's running now | 23:02 |
fungi | seems to be working so far. i'll do some more spot checks after it finishes | 23:03 |
fungi | looks like it's skipping the right hosts | 23:06 |
fungi | also some errors which look like hosts where puppet has been disabled for a while so probably fon | 23:06 |
fungi | gah | 23:07 |
fungi | probably don't have my sources.list updates yet | 23:07 |
clarkb | hrm, maybe we just manually update sources.list if list is small and rerun? | 23:08 |
clarkb | or we fiure out if we can enable puppet again | 23:08 |
fungi | yeah, for example i have no idea when/why it was disabled on review-dev | 23:10 |
clarkb | I think that was part of the gerrit upgrade prep | 23:11 |
clarkb | we should get it puppeting again but that may be more than a 5 minute change | 23:11
*** ianw has joined #openstack-infra-incident | 23:12 | |
fungi | it was reenabled at some point after the gerrit upgrade | 23:12 |
fungi | and disabled again since | 23:12 |
fungi | mordred: what's the default timeout on ansible? it looks like it's just hanging for a few minutes now, and i suspect it's having trouble connecting to or hearing back from something | 23:14 |
mordred | fungi: it's going to look like it's hanging if it's running something - output is buffered til the end of task execution | 23:14 |
mordred | fungi: I have not noticed overly-long hang/timeouts when doing testing of puppet runs ... | 23:15 |
fungi | some of these are probably just taking a while | 23:15 |
mordred | yah | 23:15 |
Clint | i think it's like 5 or 10 minutes | 23:15 |
mordred | I wish the output was streamed - but output is a json blob, so I think streaming would be tricky | 23:15
fungi | like update-initrd churning through dozens of old kernel packages we never autoremoved | 23:15 |
mordred | fungi: oh. yeah. that'll take a minute | 23:16 |
clarkb | did the change to autoremove not merge | 23:16 |
clarkb | maybe I should dig that up | 23:16
fungi | it's still sitting. no new output for about 10 minutes | 23:22 |
fungi | mordred: ps claims that my ansible-playbook call has 10 defunct zombie children | 23:29 |
fungi | no, sorry, i mis-counted. just 9 | 23:30 |
fungi | but i worry they're never going to terminate and are actually indefinitely hung | 23:37 |
mordred | hrm | 23:39 |
mordred | fungi: well, worst-case with bailing is that you'll abort 10 apt-get install processes | 23:39 |
fungi | any clue whether i should just shoot the parent in the head and try again? keyboard interrupt? sigkill? or are there more graceful options? | 23:39 |
mordred | fungi: and have to apt-get -f install somewhere | 23:39 |
mordred | fungi: I'd just ctrl-c if it were me | 23:39 |
fungi | i actually looked at ps hoping it would mention which hosts it was communicating with for the currently lingering forks, but no such luck | 23:40 |
mordred | you could maybe kill the forst | 23:40 |
mordred | and see if you get good error from the parent | 23:40 |
mordred | fungi: forks. not forst | 23:40 |
mordred | fungi: don't kill the forst | 23:40 |
fungi | yeah, i'll kill the immediate parent of the forks but not the parent's parent | 23:41 |
fungi | oh, i missed, it's right there in the ssh command-line | 23:42 |
fungi | 15.126.140.7 (which seems to have no reverse dns?) | 23:42 |
fungi | that's pypi.region-b.geo-1.openstack.org | 23:42 |
mordred | no reverse dns in hpcloud | 23:43 |
fungi | right, which is why i checked that one first | 23:43 |
fungi | and also because it looked like an hpcloud ip address | 23:43 |
mordred | yay for 15. | 23:43 |
fungi | so the recap says these failed: afs01.dfw, afs01.ord, afsdb01, afsdb02, odsreg-test-corvus, pypi.region-b.geo-1, review-dev, test-mordred-config-drive | 23:45 |
jeblair | i think we can delete odsreg-test-corvus | 23:45 |
clarkb | those all sounds like VMs that may not have up to date sources.list | 23:45 |
fungi | er, test-mordred-config-drive was unreachable, not failed | 23:45 |
fungi | also there's a ab78618b-a1f4-4d0a-8aeb-56f7b688bcf4 which was unreachable | 23:46 |
fungi | the hostlist is dynamically generated from nova list such that i can just nova delete the trash instances, or does something else need updating in between? | 23:47 |
fungi | no idea what that uuid is, it's not in openstackci rax-dfw though | 23:47 |
fungi | nevermind, i was looking in the openstack tenant not openstackci | 23:48 |
fungi | it's the old release.slave. i'll clean that one up too | 23:48 |
fungi | it was unreachable because it's in shutdown state since after i replaced it | 23:49 |
fungi | jeblair: i've deleted odsreg-test-corvus.openstack.org (from rax-ord) too, thanks | 23:50 |
fungi | should we ignore the afs servers for now or reenable puppet on them or update them manually? | 23:51 |
fungi | i've already got review-dev sorted with zaro and reenabled | 23:51 |
jeblair | fungi: you're saying the afs fileservers are disabled in puppet? | 23:58 |
jeblair | fungi, mordred: i don't know why that would be | 23:58 |
fungi | jeblair: actually, they're not according to puppetboard | 23:59 |
mordred | fungi: test-mordred-config-drive can die | 23:59 |
fungi | so the update presumably failed on them for some other reason i'll debug in a bit | 23:59 |
mordred | fungi: if you delete instances, you want to delete the inventory cache too | 23:59 |
fungi | mordred: thanks, doing | 23:59 |
mordred | fungi: /var/cache/ansible-inventory/<tab> | 23:59 |
Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!