pabelanger | #status log logstash-worker17.openstack.org now running ubuntu-trusty and processing requests | 00:01 |
clarkb | ok I am out now, will try to sort out why puppet ansible things are broken in the morning | 00:01 |
openstackstatus | pabelanger: finished logging | 00:03 |
pabelanger | wow, that was slow | 00:03 |
pabelanger | clarkb: Oh, I know the issue | 00:05 |
pabelanger | clarkb: it is because puppetmaster.o.o is in emergency file | 00:05 |
clarkb | oh? | 00:05 |
clarkb | ahahahahahaha | 00:05 |
clarkb | RIP | 00:05 |
pabelanger | Yup | 00:05 |
pabelanger | so, we can remove it | 00:05 |
pabelanger | since osic is back online | 00:05 |
pabelanger | jeblair: you okay with us removing puppetmaster.o.o from emergency? | 00:06 |
pabelanger | since osic reverted their SSL cert | 00:06 |
clarkb | makes sense to me | 00:09 |
pabelanger | okay, removing | 00:09 |
pabelanger | #status log puppetmaster.o.o removed from emergency file since OSIC is now back online | 00:10 |
openstackstatus | pabelanger: finished logging | 00:10 |
pabelanger | #status log logstash-worker18.openstack.org now running ubuntu-trusty and processing requests | 00:18 |
openstackstatus | pabelanger: finished logging | 00:18 |
*** baoli has quit IRC | 00:22 | |
*** baoli has joined #openstack-sprint | 00:23 | |
pabelanger | clarkb: booyah: http://logstash.openstack.org/#/dashboard | 00:25 |
pabelanger | #status log logstash-worker19.openstack.org now running ubuntu-trusty and processing requests | 00:32 |
openstackstatus | pabelanger: finished logging | 00:33 |
pabelanger | #status log logstash-worker20.openstack.org now running ubuntu-trusty and processing requests | 00:48 |
openstackstatus | pabelanger: finished logging | 00:48 |
pabelanger | \o/ | 00:48 |
pabelanger | very happy how well that went | 00:48 |
*** baoli has quit IRC | 01:29 | |
fungi | excellent job! | 01:30 |
*** baoli has joined #openstack-sprint | 01:31 | |
jhesketh | I'm going to take the old apps.openstack.org offline after snapshotting fyi | 02:43 |
*** rfolco has quit IRC | 02:56 | |
jhesketh | deleted | 03:02 |
-openstackstatus- NOTICE: Gerrit is going offline briefly to check possible filesystem corruption | 03:02 | |
*** ChanServ changes topic to "Gerrit is going offline briefly to check possible filesystem corruption" | 03:02 | |
anteaya | jhesketh: thank you | 03:07 |
*** anteaya has quit IRC | 03:08 | |
*** ChanServ changes topic to "Taking Infra servers running Precise and upgrading them to Trusty | https://wiki.openstack.org/wiki/VirtualSprints#Infra_Trusty_Upgrade" | 03:22 | |
-openstackstatus- NOTICE: after a quick check, gerrit and its filesystem have been brought back online and should be working again | 03:22 | |
*** baoli has quit IRC | 03:44 | |
*** baoli has joined #openstack-sprint | 03:45 | |
*** baoli has quit IRC | 03:50 | |
*** baoli has joined #openstack-sprint | 03:51 | |
*** yuikotakadamori has joined #openstack-sprint | 03:51 | |
*** baoli has quit IRC | 05:29 | |
-openstackstatus- NOTICE: zuul required a restart due to network outages. If your change is not listed on http://status.openstack.org/zuul/ and is missing results, please issue a 'recheck'. | 07:14 | |
*** yuikotakadamori has quit IRC | 10:01 | |
*** rfolco has joined #openstack-sprint | 11:18 | |
*** yolanda has quit IRC | 12:04 | |
*** yolanda has joined #openstack-sprint | 12:06 | |
*** yolanda has quit IRC | 12:41 | |
*** baoli has joined #openstack-sprint | 12:54 | |
*** baoli_ has joined #openstack-sprint | 12:56 | |
*** baoli has quit IRC | 12:59 | |
*** yolanda has joined #openstack-sprint | 13:20 | |
pabelanger | now that yak shaving is out of the way :) | 14:00 |
pabelanger | going to prep eavesdrop.o.o for launch | 14:00 |
pabelanger | should be able to start at 1600UTC | 14:00 |
pabelanger | clarkb: going to start looking at ES on ubuntu-trusty | 14:13 |
pabelanger | setting cluster.routing.allocation.enable to none | 14:14 |
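For reference, the allocation toggle pabelanger mentions is a standard Elasticsearch cluster setting; a minimal sketch, assuming the API is reachable on localhost:9200 (e.g. over the SSH tunnel mentioned later in this log):

```bash
# Disable shard allocation before taking a node out of the cluster,
# so ES does not immediately start re-replicating its shards elsewhere:
curl -s -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "none" }
}'
```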
clarkb | ok, I am not fully here yet but it's basically the upgrade process | 14:14 |
clarkb | pabelanger: those instances do use cinder volumes for the es data | 14:14 |
pabelanger | clarkb: Ya, that is what I am looking at now | 14:15 |
pabelanger | I've stopped elasticsearch on ES02 | 14:24 |
fungi | reminder: double-check any servers you've replaced to make sure you remembered to add reverse dns for both ipv4 and ipv6 addresses on each | 14:25 |
pabelanger | ++ | 14:26 |
fungi | (in other words, don't be a chump like me!) | 14:26 |
pabelanger | okay, dropping dns ttl on elasticsearch hosts to 5mins | 14:27 |
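One way to confirm the lowered TTL is being served (a sketch; any of the elasticsearch hosts above would do):

```bash
# The second field of each answer line is the TTL in seconds;
# it should drop toward 300 (5 minutes) as caches expire:
dig +noall +answer elasticsearch02.openstack.org A
dig +noall +answer elasticsearch02.openstack.org AAAA
```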
fungi | the paste-precise-backup snapshot completed overnight, so i'm going to delete the old halted instance now | 14:29 |
pabelanger | sorry, I've stopped elasticsearch on ES07 | 14:29 |
pabelanger | not ES02 | 14:29 |
fungi | 2 and 7 look a lot alike | 14:30 |
pabelanger | /facepalm | 14:30 |
pabelanger | no, I stopped ES02 | 14:30 |
pabelanger | I'm using ES07 as my SSH tunnel | 14:30 |
pabelanger | okay, need to step away for 5mins to let my brain recover | 14:30 |
pabelanger | and fetch some coffee | 14:30 |
fungi | just soak the brain in caffeine | 14:31 |
pabelanger | could use some help landing https://review.openstack.org/#/c/320642/ for elasticsearch migrations | 14:43 |
clarkb | approved | 14:45 |
pabelanger | danke | 14:45 |
pabelanger | Think I am going to put ES02 into shutdown, so I don't run into the detach issue again | 14:46 |
pabelanger | fungi: safe to start work on ES02? Want to make sure you were able to check the volume before I shutdown | 14:59 |
pabelanger | A quick poke on the server didn't show any errors | 14:59 |
fungi | pabelanger: if the volume is still mounted read/write and dmesg -T doesn't show any filesystem/block device errors from overnight, go ahead with it | 15:01 |
fungi | i haven't gotten that far down the list yet | 15:01 |
pabelanger | fungi: Yup, last logs are from Apr 4 | 15:01 |
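The pre-shutdown check fungi describes amounts to roughly the following (the mount path matches the one discussed later in this log; the grep patterns are just a starting point):

```bash
# Confirm the cinder volume is still mounted read/write:
mount | grep /var/lib/elasticsearch
# Scan the kernel log for filesystem/block-device errors, with timestamps:
dmesg -T | grep -iE 'error|ext4|xvd' | tail
```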
pabelanger | okay, placing ES02 into shutdown | 15:02 |
pabelanger | Hmm, looks like we're hitting our quota issue | 15:04 |
pabelanger | need to launch 60 GB Performance (performance2-60) | 15:04 |
pabelanger | clarkb: fungi: Are we okay with deleting each elasticsearch host first to recover quota? Then standing up the replacement server | 15:07 |
fungi | pabelanger: that may be the only way to go about it | 15:07 |
clarkb | pabelanger: yes I think its our only sane option | 15:07 |
pabelanger | okay, let me do that | 15:08 |
fungi | unless there are still some instances we need to clean up from other replacements | 15:08 |
pabelanger | I've already detached the volume | 15:08 |
fungi | i think i've deleted all the old instances i've replaced so far though | 15:08 |
pabelanger | Same | 15:08 |
pabelanger | okay, going to delete elasticsearch02.o.o | 15:08 |
clarkb | I have one 2gb instance that needs deletion but that's not enough for an es host | 15:10 |
clarkb | just make sure the replacement is big like the original. Java and ES use all the memory | 15:14 |
pabelanger | ack | 15:27 |
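On sizing: with the Debian/Ubuntu elasticsearch packaging of that era the heap was pinned via /etc/default/elasticsearch, so an undersized replacement flavor would strangle the JVM. A sketch (the value shown is an assumption, not the actual setting):

```bash
# Heap size is fixed at service start; rule of thumb was ~half of RAM:
grep ES_HEAP_SIZE /etc/default/elasticsearch
# e.g. ES_HEAP_SIZE=16g
```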
pabelanger | elasticsearch02.o.o online, I've enabled shard allocation again | 15:37 |
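And the counterpart to the earlier "none" setting, once the replacement node has rejoined:

```bash
# Re-enable shard allocation so replicas can be assigned to the new node:
curl -s -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "all" }
}'
```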
clarkb | pabelanger: were you able to have the launch machinery attach the existing volume or did you do that by hand? | 15:39 |
pabelanger | clarkb: I did it by hand this time | 15:39 |
pabelanger | I can try using launch-node.py for the next one | 15:39 |
pabelanger | I found 1 issue | 15:40 |
pabelanger | after I mounted the cinder volume, I had to chown -R elasticsearch: /var/lib/elasticsearch because the uid was not correct | 15:40 |
pabelanger | going to see if we can have puppet manage that | 15:41 |
clarkb | ah ya likely due to how we reserve a chunk of uids now | 15:41 |
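What "have puppet manage that" could look like, tried standalone; the resource parameters here are assumptions, not the change that eventually merged:

```bash
# Recursively enforce ownership on the mounted data dir; mode is left
# unmanaged on purpose (see the recurse discussion further down):
puppet apply -e 'file { "/var/lib/elasticsearch":
  ensure  => directory,
  owner   => "elasticsearch",
  group   => "elasticsearch",
  recurse => true,
}'
```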
pabelanger | running daughter down to school. Waiting for cluster to go green ATM | 15:53 |
pabelanger | #status log elasticsearch02.o.o upgraded to ubuntu-trusty and cluster is green | 16:20 |
openstackstatus | pabelanger: finished logging | 16:20 |
pabelanger | okay, moving on to ES03 | 16:24 |
clarkb | the others shouldn't require an apache restart for the proxy since we only proxy to 02 (that needs fixing but has been low priority) | 16:24 |
pabelanger | agreed | 16:25 |
pabelanger | clarkb: So, just confirming, I need to pass --volume and --mount-path to launch-node.py it seems | 16:27 |
clarkb | pabelanger: ya, I need to double check that the scripts do the right thing if the volume already has an fs | 16:28 |
pabelanger | good call | 16:28 |
clarkb | looks like they won't, if an fs already exists they noop | 16:29 |
clarkb | so thats not super useful | 16:29 |
pabelanger | okay | 16:29 |
pabelanger | I'll attach by hand again | 16:30 |
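The by-hand attach amounts to something like this (volume ID and device path are assumptions, and since the filesystem already exists on the volume there is no mkfs step):

```bash
# Attach the existing cinder volume to the replacement instance:
nova volume-attach elasticsearch03.openstack.org <volume-id> /dev/xvdb
# Mount it where ES expects its data:
mount /dev/xvdb /var/lib/elasticsearch
```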
pabelanger | anybody want to review: https://review.openstack.org/#/c/322242/ | 16:30 |
clarkb | so the scripts work if you attach a brand new volume but not for migrating volumes between instances, I can look at addressing that | 16:31 |
pabelanger | okay | 16:31 |
clarkb | pabelanger: do you know if puppet will touch the file mode on that dir and its children? | 16:32 |
pabelanger | clarkb: I believe just that directory | 16:32 |
clarkb | it uses the file mode of the file source if you don't explicitly add one to the file resource, not sure what it does with dirs on recurse | 16:32 |
pabelanger | but, I can update it to ensure => present to be safe | 16:32 |
pabelanger | TIL | 16:34 |
pabelanger | https://docs.puppet.com/puppet/latest/reference/type.html#file-attribute-recurse | 16:34 |
pabelanger | we need ensure => directory | 16:34 |
clarkb | pabelanger: ya ensure => directory is ok I think, I just don't want it to change the file modes on all those files due to some default behavior | 16:38 |
clarkb | (which it does do on proper files without modes set) | 16:38 |
pabelanger | Oh right, I misread what you were asking | 16:38 |
pabelanger | Ya, that's the reason I left mode off | 16:39 |
pabelanger | #status log elasticsearch03.o.o upgraded to ubuntu-trusty and cluster is green | 16:43 |
openstackstatus | pabelanger: finished logging | 16:43 |
clarkb | pabelanger: so puppet will not touch file modes in this case? | 17:00 |
pabelanger | clarkb: right, it will just noop on them | 17:00 |
pabelanger | that's how I've always understood it | 17:01 |
clarkb | ok +2 | 17:01 |
pabelanger | clarkb: So, I have a shard that is still unassigned. Is there any way to kick it to a host? | 17:07 |
pabelanger | "reason": "NODE_LEFT", | 17:07 |
clarkb | pabelanger: no, is it the only outstanding shard or are others recovering? | 17:07 |
clarkb | ES will only process a small number at a time (2 I think) | 17:08 |
pabelanger | clarkb: only outstanding | 17:08 |
pabelanger | others have settled | 17:08 |
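Handy endpoints for watching this (standard ES cat/health APIs; localhost:9200 again assumes the tunnel):

```bash
# Overall status plus counts of initializing/relocating/unassigned shards:
curl -s 'http://localhost:9200/_cluster/health?pretty'
# Per-shard view; anything not STARTED is still settling:
curl -s 'http://localhost:9200/_cat/shards' | grep -v STARTED
```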
clarkb | let me get a proxy running and will look | 17:08 |
pabelanger | actually, | 17:08 |
pabelanger | there are 2 purple ATM | 17:08 |
clarkb | ya it may be rebalancing before assigning that one shard | 17:09 |
pabelanger | okay | 17:09 |
clarkb | lets wait until those purple ones are done | 17:09 |
pabelanger | I'll hold off on moving to ES05 until it gets assigned | 17:09 |
clarkb | pabelanger: that unassigned shard does have its master copy on es07 so as long as you don't turn off es07 before it gets its replica up we should be ok | 17:10 |
clarkb | but it would be nice to see it recover | 17:10 |
pabelanger | sure, I don't mind waiting for a few minutes | 17:10 |
pabelanger | the upgrades are going smoothly | 17:10 |
clarkb | yay | 17:11 |
clarkb | if we turn off es07 in this state the cluster will go red because one shard is completely not available, it should go back to yellow once es07 is back up again though | 17:12 |
pabelanger | clarkb: woot | 17:19 |
pabelanger | here we go | 17:19 |
pabelanger | moved to recovering | 17:19 |
pabelanger | #status log elasticsearch04.o.o upgraded to ubuntu-trusty and cluster is green | 17:24 |
openstackstatus | pabelanger: finished logging | 17:24 |
pabelanger | #status log elasticsearch05.o.o upgraded to ubuntu-trusty and cluster is green | 17:47 |
openstackstatus | pabelanger: finished logging | 17:47 |
pabelanger | #status log elasticsearch06.o.o upgraded to ubuntu-trusty and cluster is green | 18:51 |
openstackstatus | pabelanger: finished logging | 18:51 |
clarkb | pabelanger: \o/ just one more to go? | 18:52 |
pabelanger | clarkb: indeed! | 18:52 |
clarkb | still waiting for puppet to update logstash.o.o | 18:52 |
pabelanger | need to pick up my daughter in 5mins, but should be able to hammer out ES07 once I get back | 18:53 |
clarkb | woot | 18:53 |
clarkb | #status log logstash.openstack.org upgraded to ubuntu trusty | 18:59 |
openstackstatus | clarkb: finished logging | 18:59 |
clarkb | that took entirely too much time but it is done now :) | 19:02 |
clarkb | looks like lists, zuul, static, wiki, and planet are the remaining "hard" upgrades | 19:11 |
clarkb | and eavesdrop, puppetdb, and es07 are the remainder that are possible today (maybe) | 19:12 |
pabelanger | #status log elasticsearch07.o.o upgraded to ubuntu-trusty and cluster is green | 19:33 |
openstackstatus | pabelanger: finished logging | 19:33 |
pabelanger | \o/ | 19:33 |
pabelanger | clarkb: Ya, we likely need to schedule lists / zuul / static. wiki is last on my list, I haven't looked at planet | 19:34 |
pabelanger | going to look at eavesdrop.o.o now | 19:35 |
clarkb | pleia2 was looking at planet and apparently trusty's version of the software is broken | 19:35 |
clarkb | pleia2 mentioned possibly doing the jump straight to xenial | 19:35 |
pabelanger | Ah | 19:38 |
pabelanger | I can poke at the puppet manifests next week, see what would be needed | 19:39 |
pabelanger | I suspect we'd want to move to puppetlabs-apache to be safe | 19:39 |
pabelanger | okay, shutting down eavesdrop.o.o to detach volume | 19:42 |
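The shutdown-first sequence (avoiding the detach issue mentioned earlier) is roughly the following; the mount point and volume ID are assumptions:

```bash
# Stop cleanly so the filesystem is quiesced before detaching:
ssh eavesdrop.openstack.org 'umount /srv && poweroff'
# Then detach the cinder volume from the halted instance:
nova volume-detach eavesdrop.openstack.org <volume-id>
```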
*** openstack has joined #openstack-sprint | 21:44 | |
pabelanger | okay, I have no idea how irc-meetings are updated on eavesdrop.o.o | 21:48 |
pabelanger | also, spacex sticks it again | 21:49 |
pabelanger | \o/ | 21:49 |
pabelanger | I can only think it's some crontab that was manually installed | 21:49 |
clarkb | pabelanger: I think it may be a jenkins job | 21:50 |
pabelanger | Oh | 21:50 |
pabelanger | that sounds right | 21:50 |
pabelanger | guess we need to trigger a job | 21:51 |