jhesketh | fungi: so with the new naming thing, is that meant to apply to dns too? e.g. I launch apps01.openstack.org, create a DNS record for apps01.openstack.org and use a CNAME for apps.openstack.org? | 00:15 |
---|---|---|
jhesketh | or do I just launch apps01.openstack.org as the server name and update the apps.openstack.org records? | 00:15 |
anteaya | jhesketh: I don't know the answer to the naming question | 00:27 |
anteaya | my nephew has arrived for tea | 00:27 |
anteaya | I may or may not be back online tonight | 00:27 |
fungi | jhesketh: i think cname, though we didn't explicitly discuss that part of the implementation | 00:35 |
fungi | jhesketh: though you likely want to make sure there are no fqdn-isms baked into puppet for the service you're replacing | 00:36 |
fungi | i'm leaning toward this being guidance going forward for new services we build out, but unless someone does the legwork to clean up hostname assumptions in our existing puppet we likely should continue doing replacements the old way when there's some question | 00:37 |
jhesketh | fungi: right, so if there are fqdn-isms (so to speak), how do you launch a new node? Do you do it with the same name as the existing one? | 00:42 |
fungi | yep | 00:43 |
fungi | and then just edit the dns records | 00:43 |
fungi | to switch to the replacement | 00:43 |
* jhesketh is embarrassed to admit he hasn't ever edited any of openstack's dns before... | 00:44 | |
jhesketh | fungi: what's the best way to do that.. through the web ui? (given the commands from dns.py are for creating new records rather than updating) | 00:45 |
fungi | jhesketh: yes, i end up doing dns changes through the rax web dashboard | 00:45 |
fungi | remember the tenant that domain is under is "openstack" not openstackci | 00:45 |
jhesketh | yeah I found them earlier | 00:46 |
fungi | login credentials for that are in our usual password list | 00:46 |
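For reference, the two approaches discussed above look roughly like this in BIND-style record form (addresses are placeholders; the actual edits happen in the Rackspace DNS dashboard, not a zone file):

```
; new-style: per-instance name, service name as a CNAME
apps01.openstack.org.  300  IN  A      203.0.113.10
apps01.openstack.org.  300  IN  AAAA   2001:db8::10
apps.openstack.org.    300  IN  CNAME  apps01.openstack.org.

; old-style replacement: keep the service name and just repoint its records
apps.openstack.org.    300  IN  A      203.0.113.10
apps.openstack.org.    300  IN  AAAA   2001:db8::10
```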
*** rfolco has joined #openstack-sprint | 00:49 | |
ianw | couple of jobs failed i think 2 hours ago with "timeout -s SIGINT 0 git fetch http://zm05.openstack.org/p/openstack-dev/devstack refs/zuul/master/Z6370fc143e874864a12f8061044c2f04" ... i'm gonna presume it was related to upgrades? | 01:01 |
ianw | e.g. http://logs.openstack.org/52/320152/2/check/gate-devstack-unit-tests/1b1e49a/console.html | 01:01 |
jhesketh | fungi: so just looking at the hostname stuff... if we did apps01.openstack.org we'd also need to update the site.pp to match there.. | 01:07 |
jhesketh | should we have something like "node /^apps\d*\.openstack\.org$/ {" for all the node definitions? | 01:07 |
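A minimal sketch of that site.pp idea; the regex is the one from the discussion, but the class and parameter names here are only illustrative:

```puppet
# matches apps.openstack.org as well as apps01.openstack.org, apps02..., etc.
node /^apps\d*\.openstack\.org$/ {
  class { 'openstack_project::apps_site':
    # hypothetical parameter: pass the service name in explicitly rather than
    # relying on $::fqdn, so the vhost stays apps.openstack.org on any replica
    vhost_name => 'apps.openstack.org',
  }
}
```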
*** baoli has joined #openstack-sprint | 01:09 | |
fungi | ianw: that sounds possible. pabelanger pretty consistently did a #status log of each of them so they should be timestamped on the infra status wiki | 01:12 |
fungi | jhesketh: yeah, though there may also be $::fqdn references in the corresponding class in system-config or in the puppet module instantiated by it | 01:13 |
fungi | you'll want to keep an eye out for that as well | 01:13 |
jhesketh | oh yeah, there's heaps of that | 01:13 |
jhesketh | most of those are vhost names so we should look at having the vhost name set in site.pp | 01:15 |
jhesketh | hmm, replacing $::fqdn is no small feat | 01:20 |
fungi | yep, that's why i say don't let the fqdn-isms in our puppet keep you from doing upgrades for now. people who have a strong interest in that refactoring effort can tackle it after we upgrade stuff, but the upgrades are more urgent than separating out all the hostname assumptions in our config management right now | 01:39 |
fungi | that was more or less the point of my second e-mail on that thread | 01:41 |
*** hieulq has quit IRC | 01:48 | |
jhesketh | yep that seems sensible... I looked at tackling the fqdn but I think upgrading the old way for now to move the sprint along is best | 01:49 |
fungi | i wound up taking the same route through the storyboard replacement build too | 01:50 |
anteaya | I'm out for the night | 02:01 |
anteaya | thanks jhesketh for taking on the apps.o.o operating system upgrade | 02:01 |
anteaya | :) | 02:01 |
anteaya | g'night | 02:01 |
jhesketh | anteaya: no trouble.. I've gotten side-tracked but will look at it this afternoon | 02:01 |
jhesketh | anteaya: have a good evening :-) | 02:01 |
anteaya | thank you :) | 02:02 |
anteaya | have a good day | 02:02 |
*** anteaya has quit IRC | 02:02 | |
*** hieulq has joined #openstack-sprint | 03:10 | |
*** baoli has quit IRC | 05:19 | |
*** ig0r_ has joined #openstack-sprint | 06:00 | |
*** ig0r_ has quit IRC | 06:17 | |
*** ig0r_ has joined #openstack-sprint | 08:20 | |
*** delatte has quit IRC | 12:12 | |
*** baoli_ has joined #openstack-sprint | 12:45 | |
*** ig0r__ has joined #openstack-sprint | 13:05 | |
*** ig0r_ has quit IRC | 13:08 | |
*** anteaya has joined #openstack-sprint | 13:24 | |
pabelanger | morning | 13:49 |
pabelanger | fungi: it looks like the volumes on graphite.o.o are reattached this morning. No updates on the support ticket jeblair created | 13:50 |
pabelanger | I'm going to try again to disconnect one | 13:50 |
fungi | thanks | 13:51 |
fungi | worth a shot | 13:51 |
pabelanger | will do 1 at a time this time | 13:52 |
pabelanger | 4b5d9f10-0427-4bfb-b45b-a4d682ac4ba5 detaching | 13:53 |
pabelanger | still detaching | 13:54 |
pabelanger | looks like same issue as yesterday | 13:57 |
pabelanger | I am going to move to status.o.o while we wait for RAX support, fungi anteaya jhesketh or anybody else. Do you mind +A https://review.openstack.org/#/c/320653/ | 14:03 |
anteaya | looking | 14:03 |
jhesketh | me too | 14:04 |
jhesketh | anteaya: Morning | 14:04 |
anteaya | jhesketh: hey there | 14:05 |
anteaya | what are you doing up? | 14:05 |
jhesketh | just checking on things before I head off | 14:05 |
jroll | pabelanger: what's the issue, volumes stuck detaching? ORD? | 14:05 |
anteaya | jhesketh: thank you, I haven't read back scroll yet, how did you do with apps.o.o? | 14:05 |
jhesketh | anteaya: I spun up a new 14.04 apps.openstack.org... I've left a few comments in the etherpad. Basically I think it's okay but would like somebody to take a quick look before doing the dns switchover | 14:06 |
jhesketh | probably good to get docaedo to take a look | 14:06 |
anteaya | jhesketh: fair enough, thank you | 14:06 |
pabelanger | jroll: Yes, volumes appear stuck when I detach them, this is in dfw | 14:06 |
anteaya | jhesketh: the etherpad has the new ip? | 14:06 |
jhesketh | yes | 14:06 |
jhesketh | anteaya: or http://104.239.149.86/ | 14:06 |
jroll | pabelanger: interesting. /me sees if this is a thing | 14:06 |
anteaya | jhesketh: thank you | 14:07 |
fungi | pabelanger: did you try halting (or rebooting) the server? | 14:07 |
pabelanger | fungi: I have not | 14:08 |
anteaya | jhesketh: great, thank you, I will share that with docaedo and get his approval | 14:08 |
jhesketh | anteaya: cool. I'm happy to switch the dns across, but maybe not right now | 14:08 |
anteaya | jhesketh: yup, understood thank you | 14:09 |
anteaya | jhesketh: have you lowered the ttl time on apps.o.o? | 14:09 |
anteaya | jhesketh: I pinged docaedo in -infra | 14:09 |
jhesketh | anteaya: I pinged docaedo earlier but he wasn't around... if you wouldn't mind checking there are no other api endpoints or anything that might be pointing to apps.o.o that'd be handy | 14:09 |
jhesketh | anteaya: oh, good point, I'll do that now | 14:10 |
anteaya | jhesketh: thanks | 14:10 |
jroll | pabelanger: if you have instance UUIDs I'm happy to dive in logs, too | 14:10 |
anteaya | no other api endpoints pointing to apps.o.o, I'd be happy to but would have to read or receive instruction as to how | 14:10 |
anteaya | as currently I don't know | 14:10 |
jhesketh | anteaya: any other aliases | 14:11 |
anteaya | morning jroll thanks for the help | 14:11 |
jroll | morning anteaya :) | 14:11 |
anteaya | jhesketh: I don't know how I would check that, I guess I will ask docaedo and hope he would know if there are | 14:11 |
pabelanger | jroll: sure: server 09696c29-3410-4baf-8813-05d0eb948be2 | 14:11 |
anteaya | jhesketh: in any case, leave it with me, enjoy some sleep | 14:12 |
jhesketh | anteaya: yeah, if you don't mind | 14:12 |
pabelanger | jroll: and thanks for helping | 14:12 |
anteaya | jhesketh: and thanks for your work here, much appreciated | 14:12 |
anteaya | jhesketh: I got it, sleep well | 14:12 |
jhesketh | anteaya: any idea what storage.apps.openstack.org might be? | 14:12 |
* jhesketh notes it in the dns records | 14:12 | |
anteaya | jhesketh: I personally have no idea, but will find out | 14:12 |
jroll | pabelanger: no problem, I'll see what I can find | 14:13 |
jhesketh | anteaya: thanks, much appreciated :-) | 14:13 |
anteaya | jhesketh: and I'm grateful for your work, see you tomorrow | 14:14 |
jhesketh | heh, it's like you're trying to get rid of me! | 14:14 |
jhesketh | not at all, it was my pleasure... sorry I didn't get it online, I just wanted some people to sanity check it first | 14:14 |
fungi | docaedo lives in portland oregon, so probably isn't online yet for the day | 14:15 |
anteaya | jhesketh: no no, not trying to get rid of you at all | 14:16 |
jhesketh | I'm teasing ;-) | 14:16 |
anteaya | jhesketh: but it must be what, midnight for you? | 14:16 |
jhesketh | correct | 14:16 |
anteaya | love your company, would keep you to 3am if I could | 14:16 |
anteaya | I miss having you around more | 14:16 |
anteaya | jhesketh: and I agree sanity checking is great | 14:17 |
jhesketh | would stay up until 3am if I could sleep in.. I'm much more productive at night | 14:17 |
anteaya | fungi: thank you | 14:17 |
jhesketh | naww, thanks | 14:17 |
anteaya | jhesketh: :) | 14:17 |
anteaya | jhesketh: well work it out with your wife and then stay up at night | 14:17 |
anteaya | that would work for me | 14:17 |
jhesketh | heh, I do some nights, it's more the morning meetings that constrain me... | 14:18 |
anteaya | ah yes | 14:18 |
jhesketh | (actually one of the reasons I am still up is my wife is on a night shift, so that works out well) | 14:18 |
anteaya | well get mikal to change his schedule too | 14:18 |
anteaya | oh nice | 14:18 |
anteaya | we can have mikal staying up all night too | 14:19 |
anteaya | would get to talk to him again | 14:19 |
anteaya | I miss him too | 14:19 |
jhesketh | heh, he's a morning person | 14:19 |
anteaya | dang | 14:19 |
jhesketh | but these meetings are with other rackers in the states or sometimes uk so it's also juggling timezones | 14:19 |
anteaya | ah I see | 14:19 |
anteaya | I thought it was your morning mikal meeting | 14:20 |
jhesketh | we have those too, but they are at 10am so they are easy | 14:20 |
anteaya | ah good | 14:23 |
jhesketh | okay, now I'm off | 14:32 |
jhesketh | night all! | 14:32 |
anteaya | night | 14:35 |
jroll | pabelanger: looks like a host thing, still not sure what's up but it looks specific to your instance or your host | 14:36 |
pabelanger | jroll: Hmm. Okay. Maybe I'll do what fungi suggests and stop the instances next time | 14:37 |
pabelanger | assuming it reattaches | 14:37 |
jroll | pabelanger: looks like it's going to retry for a couple hours :| | 14:38 |
pabelanger | jroll: what do you suggest? Should I be able to power down the instances now? or simply wait until the retries finish | 14:39 |
jroll | pabelanger: I'm not sure, I am far from a virt expert - trying to bug some people | 14:40 |
anteaya | docaedo has given the all clear to swap dns on apps.o.o if someone can do that | 14:47 |
anteaya | also don't change storage.apps.o.o | 14:48 |
anteaya | <docaedo> anteaya: yes, that storage DNS entry points to a rax cloudfiles instance, so should not be changed | 14:48 |
anteaya | and docaedo is not aware of any aliases | 14:49 |
anteaya | can I convince anyone to take up the dns switch for apps.o.o? | 15:16 |
*** yolanda has quit IRC | 15:20 | |
* anteaya delegates :) | 15:36 | |
anteaya | wrong channel | 15:36 |
* clarkb catches up on precise to trusty sprint | 16:08 | |
clarkb | fungi: are we supposed to boot things with digits always now and cname to them in dns with human names? | 16:09 |
fungi | clarkb: if possible that's the new exciting, i guess. but if you're replacing a server there's a good chance its existing manifest(s) assume the hostname in lots of places so we can punt on that and just do the old-style server names until someone has time to do that refactoring | 16:10 |
clarkb | ya the one I am worried about in the stack of hosts I would like to help with is logstash.o.o, I can look at its apache vhost config | 16:11 |
fungi | storyboard.o.o didn't pass the sniff test for me, so i just replaced it with another storyboard.o.o | 16:12 |
fungi | git grep fqdn turned up 4 instances of $::fqdn in the puppet-storyboard module and some in the system-config classes for the storyboard servers as well | 16:12 |
fungi | and i don't want to conflate the "is this thing working after moving to trusty?" problems with the "did i just screw up the puppet manifest?" ones | 16:13 |
clarkb | ya | 16:14 |
clarkb | fungi: the fixup is to listen on *:port and set servername to fqdn? | 16:17 |
fungi | clarkb: set servername to something which _isn't_ fqdn, passed in as a specific string constant to a class parameter instead | 16:22 |
clarkb | oh right because fqdn is different now | 16:22 |
fungi | VirtualHost *:port and ServerName are separate concerns | 16:22 |
clarkb | fungi: sort of, you can't safely colocate *'s on the same ports without servername iirc | 16:23 |
pabelanger | trying to launch status.o.o replacement again | 16:23 |
fungi | VirtualHost *:port ends up being necessary anyway, i think, because of that apache 2.4 behavior i found with the hostname mapping only to a loopback interface in /etc/hosts | 16:23 |
clarkb | fungi: maybe we should default to fqdn so that we continue to work on older hosts (if anyone else is using the current setup) | 16:27 |
fungi | yep | 16:28 |
pabelanger | wouldn't mind a review to fix apache on status.o.o replacement: https://review.openstack.org/#/c/321068/ | 16:31 |
fungi | in the case of the storyboard vhosts we were, luckily, already setting ServerName so the wildcard change to VirtualHost was sufficient to make this work on newer servers | 16:31 |
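A rough sketch of that pattern as an ERB vhost template, assuming the wrapping class takes a vhost_name parameter that defaults to $::fqdn (the real storyboard/logstash parameter names may differ):

```erb
# rendered by a class with: $vhost_name = $::fqdn  (overridable from site.pp)
<VirtualHost *:80>
  # '*' rather than the host's own name, which under apache 2.4 on trusty
  # may only resolve to a loopback entry in /etc/hosts
  ServerName <%= @vhost_name %>
  DocumentRoot /srv/static
</VirtualHost>
```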
clarkb | pabelanger: looking | 16:31 |
jroll | pabelanger: still got nothin on that volume thing :( | 16:32 |
pabelanger | jroll: :( | 16:32 |
clarkb | pabelanger: do we need to enable mod_version or whatever the module that does the if checks is called? | 16:32 |
pabelanger | jroll: I'll wait for it to reattach, then power off the server | 16:32 |
jroll | pabelanger: sounds good, if that doesn't work then O_o | 16:32 |
pabelanger | clarkb: I don't believe so. A quick test on the replacement server didn't require it | 16:33 |
clarkb | pabelanger: it may be enabled by default | 16:33 |
clarkb | pabelanger: we should probably explicitly enable it whenever we add a dep on it | 16:33 |
pabelanger | clarkb: sure, let me check | 16:33 |
clarkb | and it is mod_version | 16:33 |
pabelanger | clarkb: https://tickets.puppetlabs.com/browse/MODULES-1446 looks to be a builtin function now | 16:37 |
clarkb | pabelanger: but we support precise too | 16:38 |
clarkb | if its enabled there by default too I am less worried | 16:38 |
clarkb | I seem to recall we ran into this with the git0* hosts because centos6 to centos7 is a similar migration path for apache | 16:38 |
pabelanger | clarkb: that is true, let me see how to enable it for other OSes | 16:39 |
pabelanger | looks like include ::apache::mod::version should be enough | 16:41 |
pabelanger | Hmm, we are still using puppet-httpd | 16:41 |
pabelanger | let me see what is needed to migrate to puppet-apache | 16:41 |
pabelanger | https://review.openstack.org/#/c/321124/ fixes a race condition on status.o.o | 16:56 |
pabelanger | keeps bouncing on me | 16:56 |
fungi | clarkb: yeah, what we ran into with the centos6 migration is that the change to detect the apache version broke the centos6 production servers because it assumed mod_version was always present | 17:06 |
fungi | which apparently was the case on typical centos7 deployments or something | 17:06 |
fungi | so apache failed to start because the configuration explicitly referenced version directives in conditionals even though the module to provide them was absent | 17:07 |
pabelanger | we only need to worry about it on ubuntu-precise now | 17:09 |
pabelanger | centos7 and ubuntu-trusty and above have it baked in | 17:09 |
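The version conditionals in question are the usual 2.2/2.4 compatibility shim, which only parses if mod_version is available (hence the concern about precise and the old centos6 breakage):

```apache
<IfVersion >= 2.4>
  Require all granted
</IfVersion>
<IfVersion < 2.4>
  Order allow,deny
  Allow from all
</IfVersion>
```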
pabelanger | fungi: clarkb: mind reviewing https://review.openstack.org/#/c/321124/ | 17:28 |
pleia2 | lgtm | 17:29 |
pabelanger | pleia2: danke! | 17:30 |
pabelanger | pleia2: mind +A? | 17:31 |
pleia2 | yep, just saw that | 17:31 |
fungi | sorry, my primary computer picked this moment to overheat and power itself off, and then i discovered my alternate was frozen | 17:31 |
fungi | back now | 17:31 |
pleia2 | fungi: yeesh, quite a morning | 17:31 |
clarkb | fungi: did you apply ice to the alternate? | 17:32 |
fungi | clarkb: i'm allowing it to add to the thermal entropy of the room for a few minutes instead | 17:33 |
*** ig0r__ has quit IRC | 17:51 | |
pabelanger | fungi: okay, looks like volume has reattached | 17:59 |
fungi | pabelanger: yeah, try halting the server, and if that still doesn't work then we can try to openstack server reboot it | 18:00 |
fungi | hard reboot (whatever the osc syntax is for that anyway) | 18:00 |
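The osc syntax being alluded to, using the graphite server uuid pabelanger gave earlier:

```bash
# stop the instance first; if the volume still won't detach, try a hard reboot
openstack server stop 09696c29-3410-4baf-8813-05d0eb948be2
openstack server reboot --hard 09696c29-3410-4baf-8813-05d0eb948be2
```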
pabelanger | ack | 18:02 |
pabelanger | halting system | 18:02 |
pabelanger | fungi: okay, that worked. Both volumes are detached (available) | 18:05 |
pabelanger | and volumes attached to the new server | 18:07 |
pabelanger | \o/ | 18:07 |
fungi | excellent. i guess update/close the ticket | 18:07 |
pabelanger | fungi: sure. Do you mind stepping me through the attachment process for lvm? | 18:08 |
fungi | my guess is something about the old system wasn't responding to or updating metadata in the hypervisor in the way nova expected to indicate the volumes were unused | 18:08 |
fungi | i don't know exactly what signaling that process relies on normally | 18:08 |
fungi | pabelanger: sure, reattachment is easy | 18:09 |
fungi | pabelanger: see the openstack server volume attach syntax in our example guide | 18:09 |
fungi | after that, look in dmesg output, you'll likely see it discover the volume group on its own | 18:09 |
fungi | if lvs reports the existence of the logical volume, then you should be ready to mount it | 18:10 |
fungi | if not, we may need to vgchange -a y first | 18:10 |
fungi | or vgscan | 18:10 |
fungi | there are a few tools to help the kernel along if udev triggers don't automagically dtrt | 18:11 |
fungi | but in my experience this usually _just works_ | 18:11 |
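Put together, the reattachment sequence fungi describes looks roughly like this (the attach subcommand name varies with client version; the volume group/logical volume names follow the graphite example below):

```bash
# attach the cinder volume to the replacement server
# (older clients spell this "openstack server add volume", per the launch README)
openstack server add volume <server> <volume-uuid>

# the kernel normally discovers the LVM metadata on its own
dmesg | tail
lvs                      # the logical volume should be listed here

# if it is not, nudge LVM along
vgscan
vgchange -a y main

# then mount the filesystem
mount /dev/mapper/main-graphite /var/lib/graphite/storage
```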
pabelanger | lvs: http://paste.openstack.org/show/505433/ | 18:11 |
fungi | lgtm! | 18:12 |
pabelanger | and I see /dev/mapper/main-graphite | 18:12 |
pabelanger | okay, mounted! | 18:14 |
pabelanger | now to remember the fstab syntax | 18:15 |
pabelanger | looks like: | 18:15 |
pabelanger | /dev/mapper/main-graphite /var/lib/graphite/storage ext4 errors=remount-ro,noatime,barrier=0 0 2 | 18:15 |
pabelanger | okay, rebooting server to make sure things come back up properly | 18:16 |
pabelanger | then cutting over DNS | 18:16 |
anteaya | anyone who has the time to change the dns for apps.openstack.org to http://104.239.149.86/ that is the only thing left to do for that server | 18:20 |
pabelanger | DNS updated for graphite.o.o | 18:25 |
pabelanger | anteaya: I can do that now | 18:25 |
anteaya | pabelanger: thank you | 18:25 |
anteaya | note there is a dns entry for storage.apps.openstack.org, do nothing with that entry | 18:25 |
pabelanger | anteaya: There should also be an IPv6 address too. We'll need to update that | 18:26 |
pabelanger | anteaya: ack | 18:26 |
anteaya | thanks | 18:26 |
anteaya | I don't know how to get the ipv6 address for that server, sorry I didn't ask for it earlier | 18:26 |
pabelanger | np, I can find it quickly | 18:26 |
anteaya | thanks | 18:26 |
pabelanger | 2001:4800:7819:105:be76:4eff:fe04:70ae for your records | 18:27 |
anteaya | thank you | 18:27 |
pabelanger | okay, DNS changed | 18:27 |
anteaya | wonderful! | 18:27 |
anteaya | thanks pabelanger | 18:27 |
pabelanger | anteaya: the old server is still running. We should clean that up as soon as people are happy with the replacement | 18:28 |
anteaya | thanks, I just pinged docaedo in -infra | 18:28 |
fungi | similarly, i have the old storyboard.o.o halted, will be deleting it later today | 18:28 |
fungi | the new one seems to be in good shape per the denizens of #storyboard | 18:29 |
anteaya | yay | 18:29 |
pabelanger | okay, I restarted nodepoold to pick up the DNS change for graphite.o.o, looks to be logging correctly again | 18:35 |
pabelanger | we'll need to do the same for the other services | 18:35 |
anteaya | yay graphite | 18:36 |
pabelanger | So, shockingly, this is the output of iptables on graphite.o.o now | 18:37 |
pabelanger | http://paste.openstack.org/show/505440/ | 18:37 |
clarkb | pabelanger: does an iptables-persist or whatever it is called restart fix that? | 18:38 |
pabelanger | clarkb: YES | 18:39 |
clarkb | huh | 18:39 |
clarkb | I wonder if it is racing something at boot | 18:39 |
pabelanger | possible | 18:39 |
pabelanger | I should check other servers I upgraded | 18:39 |
pabelanger | okay, they all work as expected | 18:41 |
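The check/remedy amounts to something like this, assuming the trusty service name for the persistent-rules loader:

```bash
# ruleset comes up empty after the reboot
sudo iptables -L -n

# re-load the saved rules (kept under /etc/iptables/ by iptables-persistent)
sudo service iptables-persistent restart

sudo iptables -L -n    # should now show the expected rules again
```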
jeblair | i will see about fixing cacti | 18:43 |
pabelanger | off to pick up my daughter from school, then will try status.o.o again | 18:43 |
jeblair | oh, ha | 18:43 |
jeblair | i forgot to turn off debug logging, which is why the disk is full | 18:43 |
pleia2 | whoops | 18:44 |
jeblair | clarkb, pleia2: May 24 16:52:26 cacti puppet-user[26190]: (/Stage[main]/Apache/Apache::Vhost[default]/File[15-default.conf symlink]/ensure) created | 18:47 |
jeblair | so, puppet ensured that the default site was created... i can't imagine we actually wanted that to happen... | 18:47 |
jeblair | (this is why cacti.openstack.org returns the ubuntu default page) | 18:47 |
pleia2 | hrm | 18:50 |
pleia2 | looks like cacti is another we configure completely in system-config? | 18:51 |
jeblair | yep | 18:51 |
bkero | jeblair: that might be the default behavior for the apache module | 18:51 |
pleia2 | bkero: yeah, that's what I'm gathering | 18:52 |
jeblair | two questions: why didn't this break on the old host? how do we turn it off? | 18:52 |
bkero | Which shouldn't be a problem if you're doing name-based vhosts for the sites you actually care about | 18:52 |
pleia2 | could be apache 2.2 > 2.4 weirdness again | 18:53 |
jeblair | indeed, this is not an actual vhost | 18:53 |
jeblair | maybe we should make it a proper vhost | 18:53 |
pleia2 | yeah | 18:53 |
bkero | That sounds like the proper solution | 18:53 |
jeblair | hrm | 18:54 |
bkero | Hmm, it should be getting a proper vhost | 18:54 |
bkero | ::apache::vhost::custom { $::fqdn: | 18:54 |
bkero | hah | 18:54 |
jeblair | bkero: yeah, but we set custom content, which doesn't actually vhost | 18:54 |
bkero | jeblair: can you see if there's a file in /etc/apache2/sites-enabled/ for cacti? | 18:55 |
bkero | It should be triggered by http://localhost/cacti | 18:55 |
jeblair | bkero: yes, it looks like modules/openstack_project/templates/cacti<whatever> | 18:56 |
bkero | right | 18:56 |
jeblair | and cacti.openstack.org/cacti has been working | 18:56 |
jeblair | so that file is fine | 18:56 |
jeblair | i just wrapped it in a virtualhost on disk with a servername of cacti.openstack.org | 18:56 |
jeblair | and now cacti.openstack.org/ works | 18:57 |
jeblair | so i will puppet that up | 18:57 |
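The on-disk change being described amounts to wrapping the existing cacti config in something like:

```apache
<VirtualHost *:80>
  ServerName cacti.openstack.org
  # ... existing contents of the openstack_project cacti template go here
  #     unchanged (the /cacti aliases, directory stanzas, etc.) ...
</VirtualHost>
```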
bkero | cool | 18:57 |
* bkero prefers to avoid having to specify paths when a subdomain already exists | 18:57 | |
jeblair | this is an *old* server config :) | 18:57 |
bkero | I can feel the cobwebs | 18:58 |
jeblair | wow, it's not as old as i thought: https://review.openstack.org/#/c/14582/ it's a whole 1.25 years after gerrit... | 19:00 |
jeblair | we probably stood it up manually before then | 19:00 |
pleia2 | still before my time :) | 19:01 |
jeblair | oh, yes, in fact the first comment says as much | 19:01 |
jeblair | pleia2, bkero, clarkb: remote: https://review.openstack.org/321190 Add VirtualHost to cacti vhost | 19:07 |
bkero | jeblair: lgtm | 19:15 |
pabelanger | and back | 19:39 |
pabelanger | jeblair: I just noticed http://cacti.openstack.org/ landing page doesn't seem correct | 19:40 |
bkero | pabelanger: it will once the patch lands, I assume | 19:41 |
bkero | See earlier ^ | 19:41 |
pabelanger | bkero: sure, checking backscroll | 19:41 |
jeblair | pabelanger, bkero: yes, that patch addresses that | 19:41 |
pabelanger | Great, see it now | 19:41 |
pabelanger | launching status.o.o again | 19:58 |
jeblair | pabelanger: hrm, i'm not seeing some graphite data i'm expecting from zuul | 20:26 |
pabelanger | jeblair: which? | 20:27 |
pabelanger | jeblair: is it possible the service needs restarting to pick up the dns change? | 20:27 |
jeblair | pabelanger: oh hrm. nodepool has been restarted, and we have data from that | 20:27 |
jeblair | pabelanger: that could be it | 20:27 |
fungi | bkero: yeah, i think stats exports are going to whatever ip address zuul resolved from its config at start | 20:28 |
pabelanger | Ya, I did restart nodepool. But we need to schedule restarts for the other services | 20:28 |
fungi | er, jeblair ^ (sorry bkero) | 20:28 |
jeblair | well... i've been wondering if we want to restart zuul for https://review.openstack.org/318966 | 20:29 |
clarkb | yes the statsd lib does name resolution at import time iirc | 20:29 |
bkero | no worries | 20:29 |
fungi | bkero: you had changes up for the paste.o.o upgrade, right? i have bandwidth to tackle that one now | 20:31 |
bkero | fungi: I do! | 20:32 |
bkero | Lemme get the review link | 20:32 |
bkero | fungi: https://review.openstack.org/#/c/311235/ | 20:32 |
pabelanger | okay, moving to my next server. | 20:42 |
fungi | bkero: i think i see a problem with that change. see my inline comment | 20:45 |
pabelanger | going to do eavesdrop.o.o | 20:45 |
pabelanger | Hmm, this was going to be tricky | 20:46 |
pabelanger | need to rework the file system first | 20:46 |
fungi | oh, yeah eavesdrop has state in /var/lib | 20:48 |
fungi | and then we symlink into it from /srv | 20:48 |
bkero | fungi: ok, fixed | 20:48 |
pabelanger | yup | 20:48 |
fungi | pabelanger: we could make /var/lib/whatever a logical volume and /srv/whatever another volume | 20:49 |
fungi | and stick them in a main volume group on a cinder device | 20:49 |
fungi | bkero: coolness | 20:49 |
pabelanger | fungi: ya, I think that will be easiest. | 20:50 |
clarkb | hrm logstash.o.o will need some plumbing to use something other than fqdn in the vhost name | 20:50 |
pabelanger | fungi: so, 1 cinder volume, 2 partitions. I should be able to follow: http://docs.openstack.org/infra/system-config/sysadmin.html#adding-a-new-device but use 50% for parted command | 20:52 |
pabelanger | fungi: does that sound right? | 20:52 |
fungi | bkero: wrong name on that class parameter--it will likely fail tests anyway, but see inline new comment | 20:53 |
fungi | pabelanger: don't need two partitions. you can just create two logical volumes in the volume group you make | 20:54 |
bkero | fungi: ah, my bad. Fixed. | 20:55 |
pabelanger | fungi: gotcha. That's using lvcreate right? | 20:55 |
bkero | Hah, that's kind of curious. It should still work, but this is the default: https://github.com/openstack-infra/puppet-lodgeit/blob/master/manifests/site.pp#L12 | 20:58 |
bkero | So if we didn't set it in the lodgeit::site resource, it would do the right thing by default | 21:01 |
fungi | pabelanger: yep, just make sure to figure out how much space you want to assign to each lv so that you don't exceed the space available to the vg | 21:02 |
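A sketch of that layout with illustrative device, volume group, and LV names/sizes (the real mount points are eavesdrop's state under /var/lib and the published data under /srv):

```bash
# one 50GB cinder volume (shows up as e.g. /dev/xvdb) becomes a single PV/VG
pvcreate /dev/xvdb
vgcreate main /dev/xvdb

# two logical volumes rather than two partitions; keep some headroom in the VG
lvcreate -L 20G -n varlib main
lvcreate -L 25G -n srv main

mkfs.ext4 /dev/main/varlib
mkfs.ext4 /dev/main/srv

mount /dev/main/varlib /var/lib/<whatever>
mount /dev/main/srv    /srv/<whatever>
```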
pabelanger | fungi: okay, I created a 50GB SSD volume | 21:04 |
*** baoli_ has quit IRC | 21:04 | |
pabelanger | fungi: or I can go less | 21:04 |
pabelanger | we are only using about 10GB today | 21:04 |
fungi | pabelanger: for some reason i thought jeblair said 75gb was the smallest available size for ssd in rackspace | 21:05 |
jeblair | fungi: for sata | 21:05 |
fungi | ahh | 21:05 |
fungi | thought that was 100gb | 21:06 |
pabelanger | I am not sure what the smallest SSD is | 21:06 |
jeblair | the limit is smaller for ssd, but i don't recall | 21:06 |
jeblair | fungi: i did too, seems they lowered it? | 21:06 |
pabelanger | let me try 25GB | 21:06 |
fungi | bkero: oh, interesting. that's a strangely dysfunctional default whose memory i'd apparently repressed | 21:06 |
jeblair | (if i were to guess, i'd guess 50 for ssd) | 21:06 |
pabelanger | Invalid input received: 'size' parameter must be between 50 and 1024 (HTTP 400) (Request-ID: req-382a88ec-dfeb-44de-a84b-490cc92592ad) | 21:07 |
pabelanger | there we go | 21:07 |
pabelanger | 50 is the smallest | 21:07 |
pabelanger | jeblair wins! | 21:07 |
fungi | and just wait 'till you see what you've won! | 21:08 |
anteaya | a new car? | 21:11 |
bkero | Bees! | 21:15 |
fungi | a new car filled with bees | 21:20 |
bkero | http://www.theguardian.com/environment/2016/may/24/bee-swarm-clinging-to-car-boot-haverfordwest-wales | 21:20 |
bkero | Apparently that's what happens when a queen is accidentally trapped inside. | 21:20 |
fungi | (god save the queen?) | 21:21 |
pabelanger | okay, ready to mount the new volume on eavesdrop.o.o, but waiting until meetings are finished | 21:21 |
pabelanger | it would be cool to have a link on eavesdrop.o.o showing the current meetings in progress | 21:22 |
clarkb | I am going t owork on a new logstash.o.o shortly | 21:22 |
fungi | pabelanger: that's an awesome idea... i wonder if we should compute that from our schedule data or attempt to extract state info out of the bot | 21:24 |
fungi | leaning toward the former so that if there's no meeting we can see what/when the next meeting is supposed to be for each official channel | 21:24 |
pabelanger | fungi: parsing the bot would be interesting. But maybe we can use some JS magic to read the existing ical start times | 21:25 |
pabelanger | Looks like I have to wait a few hours for the mount | 21:27 |
pabelanger | I'll go grab some food and hang out with the family. I'll circle back later tonight to finish up | 21:28 |
fungi | awesome, thanks! | 21:29 |
clarkb | dropping TTL on logstash.o.o dns records now | 21:31 |
clarkb | current logstash.o.o is on a standard 2GB flavor, should I switch to performance or use the new "general" flavors | 21:38 |
clarkb | (anyone have an opinion? I lean towards using performance) | 21:38 |
fungi | i've confirmed that the ttl for paste.o.o is still at 5 minutes | 21:38 |
fungi | clarkb: i've been sticking with performance since it's what we've documented | 21:39 |
anteaya | clarkb: let's try performance | 21:39 |
clarkb | ok going to use performance 2GB which has the same memory and cpus | 21:39 |
anteaya | fungi: ah did we? | 21:39 |
fungi | anteaya: system-config launch/README example commands anyway | 21:40 |
fungi | i guess we're running a bit of a backlog in the check pipeline | 21:41 |
clarkb | just waiting on the change that tests the current logstash stuff on trusty. If that comes back green I'll start the boot | 21:56 |
anteaya | fungi: ah | 21:57 |
anteaya | fungi: yes, it appears we have a dearth of trusty nodes | 21:57 |
clarkb | hrm I might need to dig in and see if nodepool is happy | 22:27 |
clarkb | ENOTRUSTY | 22:27 |
clarkb | we have 450 ish used instances | 22:33 |
anteaya | that would make things unhappy | 22:33 |
anteaya | do we | 22:33 |
clarkb | which roughly lines up with our capacity if we are a couple clouds down | 22:34 |
* clarkb guesses bluebox and osic are still effectively off? | 22:34 | |
anteaya | I haven't heard anything to the contrary | 22:34 |
fungi | no, we got the fip cleanup patch online and working yesterday | 22:34 |
anteaya | ah wonderful | 22:35 |
fungi | though pabelanger spotted a bunch of stale used nodes not transitioning to delete i think? | 22:35 |
fungi | i wonder if that's an ongoing issue suddenly | 22:35 |
clarkb | fungi: the vast majority of instances in bluebox and osic are building | 22:36 |
fungi | oh, hrm | 22:36 |
clarkb | so I don't think its a state transition issue | 22:36 |
clarkb | in osic its 70:27 building:used | 22:36 |
clarkb | Forbidden: Quota exceeded for instances, ram: Requested 1, 8192, but already used 100, 819200 of 100, 819200 instances, ram (HTTP 403) (Request-ID: req-de1bc677-c2e6-45f1-b0f2-9ea776dd858b) | 22:37 |
clarkb | we do have 100 instances there, a bunch of them don't have fips | 22:39 |
clarkb | I wonder if we are just slowly going from building to ready due to fip situation | 22:39 |
clarkb | the rate for osic is set to .001 so that's not it. I wonder if we just have a really slow attach time period or a high fail rate? maybe racing against the cleanup cron | 22:42 |
clarkb | ok finally got the trusty tests to run, they passed so attempting to build new host now | 23:07 |
anteaya | yay | 23:08 |
pabelanger | fungi: clarkb: Ya, I cleaned up a bunch already today. Mostly because nodepoold was stopped for ~45mins | 23:12 |
pabelanger | must have been about 400 nodes leaked | 23:12 |
clarkb | OSError: [Errno 13] Permission denied: '/home/clarkb/.ansible/tmp' | 23:13 |
clarkb | am I doing this wrong? | 23:14 |
fungi | clarkb: you likely ran with sudo once and now you're running without? | 23:15 |
fungi | that happened to me once and so i blew away ~/.ansible | 23:15 |
clarkb | hrm ya | 23:16 |
* clarkb cleans up | 23:16 | |
fungi | then later discovered that there are a number of things still not working due to insufficient permissions if you try to launch-node.py as non-root anyway | 23:16 |
fungi | like hiera values not making it onto new servers | 23:16 |
clarkb | oh | 23:17 |
* clarkb prepares for it to fail again | 23:17 | |
clarkb | fungi: do I need to sudo -H it? | 23:17 |
fungi | clarkb: you probably also need to do something to keep sudo from filtering envvars if so? | 23:17 |
clarkb | oh ya | 23:18 |
fungi | i used an interactive root shell | 23:18 |
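The workaround being described, roughly (the checkout path and launch arguments are elided; see launch/README in system-config):

```bash
# clear the root-owned leftovers from the earlier sudo'd run
sudo rm -rf ~/.ansible

# use an interactive root shell so hiera values and environment survive
# (plain sudo filters the env, and non-root runs miss some permissions)
sudo -i

# now in the root shell, from the system-config checkout:
./launch/launch-node.py <server-name> ...
```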
clarkb | hrm puppet failed with exit code 6 but no logs printed | 23:25 |
clarkb | fungi: is ^ where you were saying you have to keep it and check syslog? | 23:26 |
fungi | yeah, add --keep | 23:27 |
fungi | and then ssh into the broken instance (if you can) and look in its syslog | 23:27 |
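Putting that together:

```bash
# keep the half-built server around on failure instead of auto-deleting it
./launch/launch-node.py <server-name> ... --keep

# then inspect the failed puppet run from the instance itself
ssh root@<new-instance-ip>
less /var/log/syslog    # the puppet output lands in syslog on ubuntu
```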
pabelanger | looks like meetings are done for a bit | 23:32 |
pabelanger | going to stop meetbot and move files to cinder volume | 23:33 |