mordred | crinkle: fwiw: delete from compute_nodes where id=21; | 00:00 |
crinkle | mordred: cool | 00:00 |
mordred | crinkle: I'm looking for pre-written scripts for doing the soft deletes | 00:00 |
mordred | crinkle: it seems to be ... tricky | 00:01 |
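A minimal sketch of the two database operations this thread converges on, assuming nova's kilo-era schema where rows are soft-deleted (flagged) rather than removed:

```sql
-- Mark an orphaned instance as deleted; this matches the statement
-- mordred runs later in this log:
update instances set task_state='deleted', deleted=1
    where uuid='<instance-uuid>';

-- Purging rows that are already soft-deleted then reduces to:
delete from instances where deleted != 0;
```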
nibalizer | 284468 | 00:01 |
nibalizer | logs ^^ | 00:01 |
*** cody-somerville has joined #openstack-sprint | 00:05 | |
*** dfflanders has quit IRC | 00:20 | |
*** cody-somerville has quit IRC | 00:21 | |
*** delatte has joined #openstack-sprint | 00:58 | |
*** delattec has quit IRC | 01:00 | |
*** yolanda_ has joined #openstack-sprint | 01:05 | |
*** yolanda has quit IRC | 01:18 | |
*** yolanda_ is now known as yolanda | 01:18 | |
*** sivaramakrishna has joined #openstack-sprint | 02:40 | |
*** baoli_ has quit IRC | 03:31 | |
*** baoli has joined #openstack-sprint | 03:40 | |
*** baoli has quit IRC | 03:42 | |
*** baoli has joined #openstack-sprint | 03:43 | |
*** baoli has quit IRC | 03:48 | |
*** yolanda has quit IRC | 04:28 | |
*** mrmartin has joined #openstack-sprint | 05:44 | |
*** mrmartin has quit IRC | 05:48 | |
*** cody-somerville has joined #openstack-sprint | 05:54 | |
pabelanger | sigh | 06:30 |
pabelanger | https://github.com/openstack/puppet-nova/blob/stable/kilo/manifests/init.pp#L603 | 06:30 |
pabelanger | kilo hardcodes /var/log/nova to 0750 | 06:31 |
pabelanger | which causes apache to 404 | 06:31 |
pabelanger | will hack on it in the morning | 06:31 |
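One possible workaround, sketched with Puppet's resource collector override syntax; the resource title and the 0755 mode are assumptions, not what eventually landed:

```puppet
# Reassert the log directory mode after the nova class declares it,
# so apache can serve files under /var/log/nova.
File <| title == '/var/log/nova' |> {
  mode => '0755',
}
```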
*** cody-somerville has quit IRC | 06:34 | |
*** mrmartin has joined #openstack-sprint | 06:38 | |
*** imcsk8 has quit IRC | 06:53 | |
*** imcsk8 has joined #openstack-sprint | 07:06 | |
*** AJaeger has quit IRC | 08:10 | |
*** mrmartin has quit IRC | 08:55 | |
*** trusted has joined #openstack-sprint | 09:08 | |
*** trusted has quit IRC | 09:10 | |
*** sivaramakrishna has quit IRC | 09:10 | |
*** sivaramakrishna has joined #openstack-sprint | 09:11 | |
*** sivaramakrishna is now known as Guest92287 | 09:11 | |
*** yolanda has joined #openstack-sprint | 09:19 | |
*** yolanda has quit IRC | 09:35 | |
*** mrmartin has joined #openstack-sprint | 09:44 | |
*** Guest92287 has quit IRC | 10:19 | |
*** _degorenko|afk is now known as degorenko | 10:30 | |
*** mrmartin has quit IRC | 10:34 | |
*** mrmartin has joined #openstack-sprint | 10:35 | |
*** mrmartin has joined #openstack-sprint | 10:47 | |
*** mrmartin has quit IRC | 11:12 | |
*** rfolco has joined #openstack-sprint | 11:53 | |
*** mrmartin has joined #openstack-sprint | 12:11 | |
*** NobodyCa1 has joined #openstack-sprint | 12:15 | |
*** NobodyCam has quit IRC | 12:17 | |
*** mrmartin has quit IRC | 12:43 | |
*** yolanda has joined #openstack-sprint | 12:45 | |
*** baoli has joined #openstack-sprint | 12:55 | |
*** baoli_ has joined #openstack-sprint | 13:26 | |
*** baoli has quit IRC | 13:28 | |
clarkb | anyone else doing pancakes? otherwise we are heading over soon | 14:33 |
fungi | clarkb: i am not. trying to catch up on some work and have been eating more than i'm used to this week | 14:38 |
fungi | though if anyone's driving to the airport this afternoon and willing to have a passenger, i'm in need of a ride. if not, i'll probably just arrange a cab | 14:39 |
clarkb | I am catching shuttle at noon | 14:42 |
clarkb | which is a bit early for you | 14:42 |
fungi | well, also i'm not booked on your shuttle, so they'd likely balk | 14:48 |
*** NobodyCa1 is now known as Nobodycam | 15:09 | |
*** Nobodycam is now known as NobodyCam | 15:10 | |
anteaya | morning | 15:13 |
*** dfflanders has joined #openstack-sprint | 15:19 | |
fungi | anteaya: how's minnesota? | 15:21 |
anteaya | much the same as fort collins | 15:21 |
anteaya | driving a rav4 | 15:21 |
anteaya | liking it | 15:21 |
anteaya | looks like devstack had a problem last night: http://lists.openstack.org/pipermail/openstack-dev/2016-February/087518.html | 15:22 |
anteaya | sdague and jhesketh worked on it: http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2016-02-25.log.html#t2016-02-25T11:55:42 | 15:22 |
anteaya | and ansible was pinned in devstack-gate: https://review.openstack.org/#/c/284652/2 | 15:22 |
anteaya | that is my summary thus far | 15:22 |
anteaya | and I understand that you got infra cloud connected to nodepool yesterday, yay! | 15:25 |
anteaya | \o/ | 15:25 |
anteaya | nice work | 15:25 |
*** dfflanders has quit IRC | 15:25 | |
fungi | it was exciting to see | 15:27 |
nibalizer | good morning | 15:27 |
fungi | even if that short-lived excitement was soon replaced by new bugs to fix | 15:28 |
clarkb | good morning. jeblair, pabelanger, and I are at the hpe lobby. anyone check into why nodepool can't boot things yet? | 15:28 |
fungi | i'm just about to check out of my room and will be there shortly | 15:29 |
anteaya | yay | 15:30 |
anteaya | fungi: new bugs, yay! | 15:30 |
anteaya | nibalizer clarkb morning | 15:30 |
*** yolanda has quit IRC | 15:32 | |
nibalizer | clarkb: i have done no investigating | 15:37 |
nibalizer | gonna head to the hpe office in a few | 15:37 |
clarkb | https://www.cloudbees.com/press/cloudbees-announces-first-jenkins-based-continuous-delivery-service-aws-and-openstack-private | 15:50 |
nibalizer | heya | 16:07 |
nibalizer | were at the security desk | 16:07 |
fungi | yep | 16:08 |
nibalizer | can has fetching? | 16:08 |
clarkb | pleia2: is on the way | 16:08 |
*** yolanda has joined #openstack-sprint | 16:09 | |
rcarrillocruz | https://review.openstack.org/#/c/284789/ | 16:23 |
rcarrillocruz | reviews pls, fix for the set project quotas playbook | 16:23 |
clarkb | file=/var/lib/nova/instances/979306d6-e169-4ba8-a343-b266b4f422c1/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none | 16:26 |
clarkb | compute029 doesn't have the updated nova.conf | 16:27 |
clarkb | so this may just be a matter of making sure puppet ran everywhere | 16:27 |
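The nova.conf setting being rolled out is presumably the libvirt disk cache mode discussed below (cache=none vs cache=unsafe); a minimal sketch of the relevant stanza:

```ini
[libvirt]
# Makes qemu open file-backed disks with cache=unsafe instead of
# cache=none, trading durability for I/O throughput.
disk_cachemodes = file=unsafe
```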
clarkb | mordred: are you around? new ansible may have broken infracloud puppeting. We don't run puppet against the infracloud hosts | 16:31 |
clarkb | https://jenkins05.openstack.org/job/gate-rally-dsvm-rally-cinder/848/console | 16:32 |
mordred | clarkb: uhm | 16:35 |
mordred | clarkb: I am here - would it be useful for me to dial in? | 16:36 |
clarkb | mordred: sure | 16:36 |
clarkb | we are currently talking about mid-term plans for the clouds | 16:37 |
clarkb | but you are welcome to join in | 16:37 |
mordred | neat | 16:37 |
clarkb | mordred: the other issue is remember that host that nodepool couldn't delete and you cleaned up? it seems to still count against quota | 16:38 |
mordred | clarkb: neat. I may need to delete more records | 16:39 |
clarkb | we bumped quota and are now booting nodes again successfully | 16:39 |
clarkb | but we have one nodes worth of extra quota counted against us and it does not exist | 16:39 |
* jeblair thinks we should just redeploy west ;) | 16:41 | |
clarkb | jeblair: how were you checking disk IO of running VM yesterday? I can start that up on compute029 | 16:42 |
jeblair | iotop | 16:46 |
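A hedged example of that kind of check (standard iotop flags; the qemu filter is an assumption):

```console
# Batch mode, only processes actually doing I/O, five samples 2s apart:
sudo iotop -o -b -n 5 -d 2 | grep qemu
```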
clarkb | we seem to do a consistent 1-2 megabyte per second on that VM | 16:47 |
clarkb | for writes | 16:47 |
jeblair | that's what we saw on 035 yesterday too | 16:47 |
clarkb | note 029 has not updated its libvirt config | 16:47 |
clarkb | because puppet isn't running on the infracloud things | 16:48 |
yolanda | infra-root, is it ok to restart gerrit? it's performing badly | 16:48 |
fungi | yolanda: fine by me | 16:48 |
*** rfolco has quit IRC | 16:56 | |
crinkle | okay I believe https://review.openstack.org/#/c/276375/ is reasonably safe to apply to west though it will cause a short downtime | 16:57 |
Clint | fungi: "ssh: Enable HashKnownHosts by default to try to limit the spread of ssh worms." | 16:58 |
*** rfolco has joined #openstack-sprint | 16:59 | |
*** rfolco has quit IRC | 16:59 | |
fungi | Clint: thanks! i feel like we should just set that globally across all our puppeted servers in fact | 17:00 |
Clint | i certainly do personally | 17:01 |
fungi | i understand the theory as to why it's sometimes a security improvement for personal systems (since it allows an attacker to get a list of other places you may have an account) but for our servers i don't see it being anything beyond a complication | 17:04 |
mordred | fungi: ++ | 17:04 |
clarkb | mordred: [DEPRECATION WARNING]: Using bare variables is deprecated. from puppet run all log | 17:06 |
Clint | an attacker can get that list of places from my shell history | 17:06 |
clarkb | I think that is new with new ansible | 17:06 |
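The warning refers to Ansible 2.0 deprecating bare variable names in loops; a minimal before/after sketch:

```yaml
# Deprecated (bare variable):
with_items: somelist

# Preferred (explicitly templated):
with_items: "{{ somelist }}"
```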
Clint | so all it does is irritate me | 17:06 |
yolanda | crinkle is it ok now to approve https://review.openstack.org/#/c/276375/ ? | 17:09 |
yolanda | if that lands i can remove management address from my patches | 17:09 |
clarkb | nibalizer: puppet did not run | 17:10 |
clarkb | but that's what it said, so it actually tried | 17:11 |
crinkle | yolanda: I think so | 17:11 |
crinkle | except i guess puppet isn't running ^ | 17:11 |
mordred | clarkb: https://review.openstack.org/284820 | 17:11 |
fungi | Clint: true. dat | 17:18 |
*** krtaylor has quit IRC | 17:21 | |
jeblair | http://docs.openstack.org/infra/system-config/sysadmin.html#ssh-access | 17:23 |
jeblair | crinkle: ^ | 17:23 |
crinkle | jeblair: ty | 17:23 |
jeblair | yolanda: can you see my comment on https://review.openstack.org/280326 ? | 17:26 |
*** sivaramakrishna has joined #openstack-sprint | 17:27 | |
yolanda | checking | 17:28 |
yolanda | yes i will fix | 17:29 |
crinkle | nibalizer: https://review.openstack.org/284830 | 17:30 |
nibalizer | ty | 17:30 |
yolanda | jeblair, amended | 17:30 |
jeblair | yolanda: thanks | 17:31 |
nibalizer | fungi: 268366 and 284830 | 17:33 |
nibalizer | sets up infracloud-root | 17:33 |
*** krtaylor has joined #openstack-sprint | 17:33 | |
jeblair | mordred, nibalizer, yolanda: can we chat about https://review.openstack.org/281892 and https://review.openstack.org/243399 ? | 17:35 |
clarkb | nibalizer: Could not find data item openstackci_infracloud_password in any Hiera data file and no default supplied at /opt/system-config/production/manifests/site.pp:1188 on node controller00.hpuswest.ic.openstack.org | 17:37 |
clarkb | the rally job succeeded \o/ | 17:52 |
clarkb | so while a little slow we are producing good results | 17:52 |
crinkle | \o/ | 17:57 |
yolanda | jeblair, patches fixed and waiting for review again | 17:58 |
*** sivaramakrishna has quit IRC | 18:04 | |
*** degorenko is now known as _degorenko|afk | 18:04 | |
Clint | https://jenkins01.openstack.org/job/gate-tempest-dsvm-neutron-full/21745/console aborted | 18:06 |
fungi | Clint: did the change which triggered it get a new patchset, or was it in the gate and a change ahead of it failed so testing for it was restarted as new job runs? | 18:09 |
Clint | fungi: no new patchset, no explicit dependency | 18:11 |
fungi | was it for an approved change? | 18:12 |
mordred | sensu_metrics_check: name=swift-sla-metrics plugin=metrics-os-api.py | 18:13 |
mordred | args='-S swift --scheme {{ monitoring.graphite.host_prefix }}' | 18:13 |
nibalizer | i have fixed up controller00's puppet i think | 18:13 |
yolanda | jeblair so we have port 80 in west, but it is not needed | 18:13 |
yolanda | rcarrillocruz sent a change to remove for west | 18:13 |
rcarrillocruz | i pushed a change to disable that for west | 18:13 |
rcarrillocruz | y | 18:13 |
jeblair | yolanda, crinkle: yes -- so let's make it the same | 18:13 |
jeblair | pabelanger: are you doing any logs on the baremetal nodes yet? | 18:14 |
yolanda | jeblair so i think the way is to not use 80, and remove from west, rather than adding to east | 18:14 |
jeblair | pabelanger: or just the controller for now? | 18:14 |
Clint | fungi: got approved ~1h ago | 18:14 |
pabelanger | jeblair: just controllers, but easy to enable on controllers | 18:14 |
pabelanger | err | 18:14 |
pabelanger | computes | 18:15 |
jeblair | pabelanger: okay, so if you're not doing it on baremetal for now, then it may be safe to turn off port 80 on baremetal | 18:15 |
jeblair | crinkle: sound right to you? | 18:15 |
mordred | https://github.com/blueboxgroup/ursula-monitoring/tree/master/sensu/plugins | 18:15 |
pabelanger | jeblair: Ya, a matter of just saying we want it and updating a line in puppet | 18:15 |
mordred | https://github.com/blueboxgroup/ursula-monitoring/blob/master/sensu/plugins/check-glance-store.py | 18:16 |
crinkle | jeblair: yolanda I don't see anything running on port 80...I'm wondering why it was there, nginx is running on 8080 | 18:16 |
mordred | https://github.com/blueboxgroup/ursula-monitoring/blob/master/sensu/plugins/metrics-nova.py | 18:16 |
rcarrillocruz | probably a copy pasta from other definition | 18:16 |
crinkle | could be | 18:17 |
rcarrillocruz | based as a template for baremetal | 18:17 |
fungi | Clint: what change was it? my connection is not well suited to browsing the jenkins webui | 18:17 |
crinkle | jeblair: yolanda I think it should be safe to remove it | 18:17 |
Clint | fungi: https://review.openstack.org/#/c/271599 | 18:18 |
fungi | Clint: oh, it's a cinder change. it's almost certainly still in the gate and a change ahead of it in the integrated queue failed and caused a gate reset | 18:19 |
mordred | https://github.com/blueboxgroup/ursula-monitoring/tree/master/collectd/plugins/openstack | 18:20 |
fungi | Clint: http://status.openstack.org/zuul/ would tell you for sure (i similarly can't easily look at that from this machine) | 18:20 |
Clint | fungi: yup, thanks | 18:21 |
crinkle | this might be useful for the compute nodes https://collectd.org/wiki/index.php/Plugin:virt | 18:23 |
crinkle | since sometimes the controller lies about what the computes are doing | 18:23 |
jeblair | mordred: https://review.openstack.org/281892 and https://review.openstack.org/243399 | 18:26 |
yolanda | infra-root, there is an extra +2 needed for patches on infra-cloud apart from jeblair, care reviewing those ? | 18:26 |
mordred | https://galaxy.ansible.com/openstack-infra/puppet/ btw | 18:35 |
clarkb | nibalizer: has puppet updated the libvirt settings on compute029 yet? | 18:36 |
yolanda | jeblair, resent | 18:36 |
clarkb | if that improves performance we can probably merge your change to bump max-servers | 18:37 |
nibalizer | clarkb: checking | 18:37 |
fungi | nibalizer: why is 268366 using depends-on instead of being rebased onto 284830 (they're in the same git repo) | 18:38 |
nibalizer | clarkb: yes it ran | 18:39 |
clarkb | nibalizer: if you look in /etc/nova/nova.conf is something=file=unsafe present | 18:39 |
clarkb | ? | 18:39 |
clarkb | if so check the currently running VM for its qcow2 disk line in ps | 18:40 |
nibalizer | i dont see a vm in 029 | 18:40 |
clarkb | scheduler is probably moving them around now | 18:40 |
fungi | is anyone driving to the denver airport this afternoon/evening i can bum a ride from? i don't really care what time... my flight doesn't take off until after midnight so i expect to just work from the airport for a while anyway | 18:46 |
fungi | if no, not a big deal, just don't want to waste money on cab fare needlessly | 18:46 |
crinkle | yolanda: jeblair https://review.openstack.org/284872 | 18:51 |
pabelanger | jeblair: mordred: I asked the trystack team to publish their grafana dashboard tooling too: https://github.com/trystack/trystack-collectd as another example of how to get the data | 18:53 |
yolanda | crinkle, pleia2, or anyone interested, do you want to join a skype call about move of the servers in an hour ? | 18:54 |
nibalizer | clarkb: hrm | 18:54 |
jeblair | pabelanger: cool, and yeah, more collectd | 18:54 |
crinkle | yolanda: yes | 18:54 |
nibalizer | so according to openstack server show the vm is on 029 | 18:54 |
nibalizer | but when i ps on 029 it does not exist | 18:54 |
nibalizer | so i wonder if we lost another one | 18:54 |
pleia2 | yolanda: I assumed we'd do it here in the room with the phone thing | 18:54 |
clarkb | nibalizer: if it updated the hostname when it restarted then that seems likely | 18:54 |
nibalizer | clarkb: yea the vm doesn't ping | 18:54 |
pleia2 | we can use the skype phone number on the conference a call device | 18:54 |
yolanda | i had an invite with skype so we need to change the method | 18:55 |
yolanda | ah ok | 18:55 |
clarkb | so we may want a controlled puppet run everywhere | 18:55 |
pleia2 | I don't use skype, I always use the phone numbers :) | 18:55 |
clarkb | then redo mordreds cleanups | 18:55 |
yolanda | let's do with skype phone | 18:55 |
crinkle | nibalizer: yep nova hypervisor-list | grep compute029 has one up one down | 18:55 |
nibalizer | gg | 18:55 |
clarkb | skype stopped working on linux this week too | 18:55 |
clarkb | so ya bad skype | 18:55 |
nibalizer | our mirror pings | 18:55 |
nibalizer | because its a good mirror | 18:56 |
*** AJaeger has joined #openstack-sprint | 18:56 | |
clarkb | I am on shuttle now so service may be spotty but ya just getting it everywhere then cleaning up may get us past this | 18:59 |
fungi | pleia2: yolanda: and we can dial the conference phone into pbx.openstack.org if there's more than one person not in the room who wants to be on the call | 19:00 |
crinkle | yolanda: https://review.openstack.org/#/c/277605/5/manifests/site.pp | 19:06 |
yolanda | jeblair, crinkle, can you revisit https://review.openstack.org/260018 ? fixed rabbit port to use ssl | 19:09 |
*** baoli_ has quit IRC | 19:11 | |
yolanda | fungi ok i'll share phone number and conf and we can do it on the room, sounds good | 19:22 |
nibalizer | ed78ffbb-ce70-43a1-8c5d-b0bbbe7d8174 | devstack-trusty-infracloud-west-8311836 | ACTIVE | public=15.184.55.16 | 19:27 |
yolanda | fungi thanks for approving, i'll look at puppetmaster to babysit | 19:32 |
clarkb | nibalizer: is that a new host with better cache? | 19:36 |
nibalizer | clarkb: no thats an orphaned node | 19:37 |
nibalizer | that needs the mordred cleanup funtime | 19:37 |
clarkb | ah | 19:37 |
clarkb | we should write up a proposal for nova to do that | 19:40 |
clarkb | possibly a no really just delete from db please | 19:40 |
clarkb | and maybe we can identify compute hosts with unique id set in config? | 19:41 |
clarkb | then this only changes if we explicitly change it in config | 19:41 |
clarkb | I wonder how something like mesos does that | 19:42 |
jeblair | or generate a uuid and store it in host-local storage on boot? | 19:42 |
jeblair | (first boot) | 19:42 |
jeblair | (and if you blow away your host, you get another uuid and you need to clean up the old one, but so what?) | 19:43 |
clarkb | ya as long as it is more explicit than hostname | 19:44 |
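A minimal sketch of jeblair's first-boot idea, assuming a host-local state file (the path and helper name are hypothetical):

```python
import os
import uuid

STATE_FILE = '/var/lib/compute-host-id'  # hypothetical location

def host_id():
    """Return a stable per-host UUID, generating it on first boot."""
    if not os.path.exists(STATE_FILE):
        with open(STATE_FILE, 'w') as f:
            f.write(str(uuid.uuid4()))
    with open(STATE_FILE) as f:
        return f.read().strip()
```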
*** baoli has joined #openstack-sprint | 19:49 | |
*** krotscheck is now known as krotscheck_dcm | 19:50 | |
*** baoli has quit IRC | 19:51 | |
nibalizer | https://review.openstack.org/#/c/284820/ | 20:02 |
nibalizer | infra root should be able to do that | 20:02 |
clarkb | nibalizer we can manually remove that instance from nodepool db too, since it is gone cloud side | 20:02 |
nibalizer | clarkb: ok | 20:02 |
nibalizer | would i have to stop nodepool? | 20:02 |
clarkb | that should boot a new thing with cache hopefully | 20:02 |
clarkb | nibalizer: no | 20:02 |
nibalizer | we're doing a conf call to determine the future of the gear | 20:02 |
clarkb | ah | 20:02 |
clarkb | just thinking it would be good to discover if cache mode fixes performance with io | 20:03 |
clarkb | but incapable of doing anything myself so don't worry about it | 20:03 |
jeblair | i wonder if "missing hardware" might explain some of the nodes that are unresponsive :) | 20:08 |
crinkle | afaik there are only two that are unresponsive https://etherpad.openstack.org/p/infra-cloud-inventory-status | 20:10 |
crinkle | others might not be fully deployed or have other miscellaneous errors | 20:10 |
mordred | it'd be great if we could get ipv6 in the new ecopod | 20:10 |
mordred | :) | 20:10 |
clarkb | +1 | 20:11 |
mordred | and also maybe not so much with the tagged vlans | 20:11 |
mordred | AND | 20:11 |
mordred | if they're unracking everything anyway | 20:11 |
mordred | maybe they can do battery remediation | 20:11 |
clarkb | tagged is fine if we can have two nics | 20:12 |
mordred | yah | 20:12 |
clarkb | noe that we grok vlans eith glean and friemds | 20:12 |
mordred | what? | 20:13 |
clarkb | now that we grok vlans | 20:14 |
clarkb | the trouble before was around interface setup when needing to tag | 20:14 |
clarkb | but thats mostly solved and leaves us with neutron funkyness on one interface | 20:15 |
rcarrillocruz | folks | 20:25 |
rcarrillocruz | for the ansible role to create resources baseline | 20:25 |
rcarrillocruz | what are the resources that are common to all clouds | 20:26 |
rcarrillocruz | trying to model it | 20:26 |
rcarrillocruz | flavors are common to all clouds i assume | 20:26 |
rcarrillocruz | ? | 20:26 |
rcarrillocruz | anything else? | 20:26 |
rcarrillocruz | images | 20:26 |
rcarrillocruz | ... | 20:26 |
clarkb | flavors images users/projects/domains | 20:28 |
rcarrillocruz | that's it? | 20:28 |
rcarrillocruz | thanks | 20:28 |
clarkb | the mirror maybe but I think thats a layer above | 20:28 |
rcarrillocruz | yeah | 20:28 |
mordred | crinkle (and everyone else) this: http://paste.openstack.org/show/488271 is a pseudo-code first draft of the delete script | 20:29 |
clarkb | networks | 20:29 |
clarkb | and subnets | 20:29 |
mordred | I took the previous script which was using pt-archiver to move things to shadow tables | 20:30 |
mordred | and am rewriting it ot just delete things | 20:30 |
mordred | because screw that | 20:30 |
mordred | also, it'll now take a server argument which is the UUID of a server to purge from the db | 20:30 |
mordred | I've also got a dump of the nova db from west on my laptop so that I can practice a few times :) | 20:31 |
rcarrillocruz | clarkb: ? | 20:31 |
rcarrillocruz | i had the impression you get the netowrks and subnets created for you on public cloud providers | 20:31 |
rcarrillocruz | so it's not really common to all clousd? | 20:32 |
mordred | it depends | 20:32 |
mordred | some clouds create them for you, some do not | 20:32 |
rcarrillocruz | ok | 20:32 |
rcarrillocruz | tell you what | 20:32 |
mordred | same with flavors - some clouds we create them, some we do not | 20:32 |
rcarrillocruz | i'll model it in a yaml | 20:32 |
mordred | yah | 20:32 |
mordred | ++ | 20:32 |
mordred | I think that's a great first step | 20:32 |
rcarrillocruz | with a dict as 'common_cloud_resources" | 20:32 |
mordred | images is a thing we want to be the same everywhere | 20:32 |
rcarrillocruz | those get created oin all clouds defined | 20:32 |
rcarrillocruz | then in a cloud/region basis we can have specific additional resources | 20:32 |
mordred | yah | 20:32 |
rcarrillocruz | not mutually exclusive | 20:33 |
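A sketch of the layout rcarrillocruz describes, with resources common to every cloud plus per-cloud extras (all names and values hypothetical):

```yaml
common_cloud_resources:
  flavors:
    - {name: nodepool-8g, ram: 8192, vcpus: 8}
  images:
    - ubuntu-trusty
clouds:
  infracloud-west:
    extra_resources:
      networks:
        - {name: public, external: true}
```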
anteaya | oh good news rcarrillocruz and clarkb, dougwig and kevinbenton were sitting close to me at lunch and are willing to listen to you tell them your use case involving default security groups | 20:35 |
anteaya | I suggested next week might be a good time for a chat | 20:35 |
anteaya | and recommended they try to find you in infra | 20:36 |
rcarrillocruz | nice anteaya | 20:38 |
clarkb | denver is an airport where pre would be useful | 20:38 |
rcarrillocruz | mordred: fwiw, can't wait to see this coming: https://github.com/ansible/ansible/issues/13262 | 20:38 |
rcarrillocruz | include with_items workaround is nasty :/ | 20:39 |
Clint | clarkb: they don't have pre? | 20:39 |
clarkb | I dont | 20:39 |
Clint | ohhh | 20:39 |
clarkb | portland + pre is not super useful | 20:39 |
clarkb | lines what lines | 20:39 |
clarkb | denver on the other hand | 20:39 |
clarkb | also there are special baggage claims for skis | 20:42 |
*** _degorenko|afk has quit IRC | 20:42 | |
*** _degorenko|afk has joined #openstack-sprint | 20:43 | |
rcarrillocruz | folks, can you +2+A https://review.openstack.org/#/c/284801/ , had to rebase | 20:44 |
rcarrillocruz | thx fungi | 20:47 |
pabelanger | So, crinkle was mentioning that our puppet-infracloud jobs in the gate were pretty slow. Did we want to review the jobs and possibly trim them? EG: setup -nv for centos / fedora and/or drop beaker jobs? | 20:49 |
pabelanger | since we depend on bare-precise / bare-trusty and are limited by rax | 20:50 |
crinkle | pabelanger: in this case they were waiting to run puppet-lint and puppet-syntax jobs | 20:51 |
rcarrillocruz | mordred, clarkb: http://paste.openstack.org/show/488275/ | 20:52 |
pabelanger | crinkle: right, which is on bare-trusty nodes (rax only). i think we could talk with fungi and convert -infra to bindep jobs (which use ubuntu-trusty nodes) and run on all clouds now | 20:53 |
fungi | pabelanger: the challenge i think will be that the plan so far has been to switch jobs over on a per-job-template basis (since we need to force them to specific node types), so we'd need to use slightly different templates just for the jobs using bindep if we're talking about splitting some projects to this and not others which used the same job-templates | 20:55 |
pabelanger | fungi: right | 20:56 |
fungi | that also complicates the unwinding once we finish migrating | 20:57 |
fungi | so would want to make sure we track what can be recombined into common job-templates again in the end | 20:57 |
AJaeger | fungi, yeah - I had to duplicate the tox job for moving the manual jobs in 283445 | 21:02 |
fungi | pabelanger: also i have a feeling we're about to be not landing many puppet-infracloud changes soon for at least some weeks while all the hardware is offline | 21:02 |
anteaya | clarkb: yes denver is an airport where pre would be useful | 21:02 |
AJaeger | and right now our systems are really busy - 380 changes in check queue... | 21:03 |
pabelanger | fungi: agreed. Just looking at optimizations | 21:03 |
fungi | clarkb: current discussion is around whether we should be doing 802.3ad or similar and then 802.1q over that. you almost certainly have some input here | 21:13 |
crinkle | could someone check on the puppetmaster and see if the last puppet run on controller00 was successful? it looks like puppet has run on the controller but not on the computes so the computes are trying to reach a rabbitmq that has moved ports | 21:14 |
fungi | clarkb: as opposed to putting different native vlans on the two uplinks | 21:14 |
crinkle | o.0 Feb 25 21:20:30 controller00 puppet-user[10150]: Could not find data item elasticsearch_nodes in any Hiera data file and no default supplied at /opt/system-config/production/manifests/site.pp:8 on node controller00.hpuswest.ic.openstack.org | 21:21 |
crinkle | root@controller00:~# grep elasticsearch_nodes /opt/system-config/production/hiera/common.yaml | 21:22 |
crinkle | elasticsearch_nodes: | 21:22 |
crinkle | aha http://git.openstack.org/cgit/openstack-infra/ansible-puppet/tree/defaults/main.yml#n15 | 21:25 |
*** cody-somerville has joined #openstack-sprint | 21:25 | |
nibalizer | cody-somerville: http://paste.openstack.org/show/488279/ | 21:25 |
yolanda | infra-root , are you familiar with error | 21:27 |
yolanda | fatal: [controller00.hpuseast.ic.openstack.org]: FAILED! => {"changed": false, "disabled": false, "error": true, "failed": true, "invocation": {"module_args": {"environment": null, "facter_basename": "ansible", "facts": null, "logdest": "syslog", "manifest": "/opt/system-config/production/manifests/site.pp", "puppetmaster": null, "show_diff": false, | 21:27 |
yolanda | "timeout": "30m"}, "module_name": "puppet"}, "msg": "puppet did not run", "rc": 1, "stderr": "", "stdout": "", "stdout_lines": []} | 21:27 |
yolanda | how can i better debug it ? | 21:28 |
crinkle | yolanda: please see what i pasted above ^ | 21:28 |
crinkle | the last ansible-puppet change broke the hiera config | 21:28 |
yolanda | crinkle there was a typo on the flag, i fixed with https://review.openstack.org/284936 | 21:29 |
yolanda | but still fails for me | 21:30 |
fungi | nibalizer: i'm reading through lspci... two gigabit interfaces, a mellanox 10g interface... i wonder what the fourth nic is? | 21:30 |
yolanda | but yes , apart from that there is something with hieradata now | 21:31 |
crinkle | yolanda: the hiera.yaml is pointing the hiera data at :datadir: "/etc/puppet/hieradata" | 21:31 |
clarkb | fungi: neutron wants its own logical interface | 21:31 |
clarkb | fungi: so link aggregation doesn't really help | 21:31 |
clarkb | you can do link aggregation but you still need two logical interfaces | 21:31 |
yolanda | nibalizer, can you check the hiera paths in ansible-puppet? looks like they are not really matching our needs | 21:34 |
clarkb | I think the ideal would be 10Gbe for neutron, and 1gig or better for control | 21:34 |
clarkb | and on the controller use 10Gbe for image stuff | 21:34 |
clarkb | actually you want 10Gbe for all the things because images | 21:35 |
fungi | clarkb: sounds like these have a 10gbe mellanox interface and then a couple of 1gbe nics | 21:35 |
fungi | the hardware in west anyway | 21:35 |
yolanda | mmm | 21:35 |
yolanda | :yaml: | 21:35 |
yolanda | :datadir: "/opt/system-config/" | 21:35 |
yolanda | needs to be that, not /etc/puppet/hieradata... | 21:35 |
clarkb | looks like the unhappy VM is still in nodepool and still unhappy | 21:35 |
Clint | fungi: if you up all 4 interfaces, how many of them have link? | 21:35 |
clarkb | mordred: ^ you are working on cleaning it out of cloud? has that happened yet? | 21:36 |
fungi | also i have a feeling the ipmi boot isn't going to work over the 10gbe nic | 21:36 |
clarkb | mordred: not sure if nodepool needs encouragement or if it isn't expected to work yet | 21:36 |
fungi | Clint: two | 21:36 |
fungi | eth0 and eth1 have no carrier, eth2 and eth3 have carrier | 21:37 |
fungi | after ip link set up | 21:37 |
Clint | fungi: and which one are we using? | 21:37 |
fungi | eth2 | 21:38 |
crinkle | could someone make recommendations on https://review.openstack.org/284939 ? | 21:38 |
Clint | so if eth3 is shared with the ilo port that might make sense | 21:38 |
yolanda | crinkle nibalizer https://review.openstack.org/#/c/284942 | 21:40 |
fungi | Clint: yep, that's what i'm suspecting | 21:41 |
jeblair | Speed: 10000Mb/s | 21:41 |
jeblair | from ethtool eth2 in compute035 | 21:41 |
fungi | Clint: because i see the ilo show up in lspci | 21:41 |
yolanda | and i need https://review.openstack.org/284936 and https://review.openstack.org/284938 to land as well | 21:42 |
fungi | jeblair: did you manually install ethtool? i was only finding mii-tool installed on compute001 | 21:42 |
jeblair | fungi: yep | 21:42 |
mordred | clarkb: I'm still working on the script | 21:43 |
yolanda | infra-root, to fix puppet ^ | 21:43 |
crinkle | yolanda: I +1'd all your things and abandoned my thing | 21:44 |
yolanda | thx | 21:44 |
yolanda | i did some tests changing that parameters on my controller and puppet ran | 21:45 |
jhesketh | Howdy | 21:48 |
nibalizer | yolanda: commented on 284938/1 | 21:48 |
jeblair | jhesketh: o/ | 21:48 |
nibalizer | yolanda: crinkle is the problems with ansible-puppet why the controller00 is failing puppet? | 21:49 |
crinkle | nibalizer: yes | 21:49 |
nibalizer | cool | 21:49 |
crinkle | and most likely everything in infra is failing | 21:49 |
nibalizer | crinkle: suprisingly not | 21:50 |
nibalizer | probably because we do not have manage_true | 21:50 |
crinkle | ah | 21:50 |
nibalizer | er manage_config: true | 21:50 |
jeblair | manage_config=true but yeah | 21:50 |
nibalizer | in the other playbooks | 21:51 |
nibalizer | 0 controller00 puppet-user[11898]: Could not find data item elasticsearch_nodes in any Hiera data file and no default supplied | 21:53 |
nibalizer | crinkle: yolanda | 21:53 |
crinkle | right that is what led to this | 21:53 |
crinkle | there is a hiera.yaml that points hiera data dir to /etc/puppet/hieradata | 21:53 |
jeblair | (is elasticsearch_nodes something that controller00 should be attempting to lookup?) | 21:54 |
crinkle | jeblair: it's in the public common.yaml | 21:54 |
nibalizer | the problem is 284942 | 21:54 |
nibalizer | we need that | 21:54 |
crinkle | jeblair: http://git.openstack.org/cgit/openstack-infra/system-config/tree/manifests/site.pp#n8 | 21:55 |
clarkb | http://www.meetup.com/OpenStack-Colorado/events/228594900/ if you want to learn about magnum and kolla | 21:57 |
Clint | alas, i have a prior engagement | 21:58 |
nibalizer | https://review.openstack.org/#/c/284954/ | 21:58 |
clarkb | mordred: would it be a bad idea to remove that node fomr the nodepool db so that nodepool can boot another machine? | 21:59 |
clarkb | I think we should've applied the new nova compute config everywhere at this point | 21:59 |
yolanda | infra-root can you review https://review.openstack.org/#/c/284942/ ? | 21:59 |
mordred | clarkb: sure. which node is it? | 21:59 |
clarkb | 8311836 with uuid ed78ffbb-ce70-43a1-8c5d-b0bbbe7d8174 | 22:00 |
*** delatte has quit IRC | 22:00 | |
mordred | clarkb: done | 22:01 |
mordred | clarkb: I have not yet figured out the quota thing | 22:01 |
clarkb | mordred: done meaning nova no longer knows about it? | 22:01 |
jeblair | clarkb: puppet has been broken; i'm not sure if we know whether it has been applied everywhere | 22:01 |
clarkb | jeblair: rgr | 22:01 |
mordred | clarkb: I have done the same db things to that node as I did to the previous one | 22:02 |
clarkb | mordred: woot thanks | 22:02 |
mordred | update instances set task_state='deleted', deleted=1 where uuid= 'ed78ffbb-ce70-43a1-8c5d-b0bbbe7d8174'; | 22:02 |
mordred | for what it's worth | 22:02 |
yolanda | infra-root, i need reviews again for https://review.openstack.org/284938, i refactored | 22:02 |
jeblair | clarkb: where's the setting? | 22:02 |
clarkb | jeblair: /etc/nova/nova.conf grep for 'unsafe' | 22:02 |
jeblair | clarkb: it is applies everywhere | 22:03 |
jeblair | applied | 22:03 |
clarkb | cool | 22:04 |
mordred | clarkb: how many instances does nova think are in use now? | 22:04 |
jeblair | (confirmed with ansible adhoc grep) | 22:04 |
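The ad-hoc check jeblair mentions would look something like this (the host pattern is an assumption):

```console
ansible 'compute*' -m shell -a "grep unsafe /etc/nova/nova.conf"
```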
clarkb | mordred: nova? or nodepool? | 22:04 |
mordred | either | 22:04 |
clarkb | nodepool says 0 | 22:04 |
crinkle | one (the mirror) | 22:04 |
crinkle | from nova's perspective | 22:04 |
mordred | that's in a different project | 22:04 |
mordred | | 2016-02-24 20:20:24 | 2016-02-25 18:12:57 | NULL | 17 | 894a11e0a16a4c29bb8b884c1c70bf2c | instances | 2 | 0 | NULL | 0 | 7dbe0f121e424a74be2eed25399e2c75 | | 22:04 |
mordred | | 2016-02-24 20:20:24 | 2016-02-25 18:12:57 | NULL | 18 | 894a11e0a16a4c29bb8b884c1c70bf2c | ram | 16384 | 0 | NULL | 0 | 7dbe0f121e424a74be2eed25399e2c75 | | 22:04 |
mordred | | 2016-02-24 20:20:24 | 2016-02-25 18:12:57 | NULL | 19 | 894a11e0a16a4c29bb8b884c1c70bf2c | cores | 32 | 0 | NULL | 0 | 7dbe0f121e424a74be2eed25399e2c75 | | 22:04 |
mordred | that's the quota usage for the nodepool project | 22:04 |
mordred | I think I'd like to set instance, ram and core usage to 0 | 22:04 |
mordred | I have now done that | 22:05 |
mordred | I think quotas should be correct now | 22:05 |
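Zeroing the stale usage counters shown above is presumably a direct update to nova's quota_usages table; a sketch using the project id from the paste:

```sql
update quota_usages set in_use = 0
    where project_id = '894a11e0a16a4c29bb8b884c1c70bf2c'
      and resource in ('instances', 'ram', 'cores');
```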
clarkb | the next VM booted should use cache unsafe on its boot disk | 22:07 |
mordred | woot | 22:07 |
clarkb | you can confirm this by running a ps -elf on that process | 22:07 |
clarkb | and part of the command is the string used for the root disk | 22:07 |
clarkb | was saying cache=none should say cache=unsafe | 22:07 |
Clint | clarkb: so it hasn't yet today? | 22:08 |
clarkb | no because puppet wasn't running on the computes | 22:08 |
clarkb | then it ran and it broke the single VM which made nodepool stop making new ones | 22:09 |
Clint | ahh | 22:09 |
fungi | fragility, thy name is openstack | 22:11 |
clarkb | LaunchStatusException: Server ca1453ce-23bb-41e7-ba8c-9e44415f0064 for node id: 8316914 status: ERROR | 22:12 |
clarkb | so thats neat | 22:12 |
mordred | mmm | 22:12 |
mordred | that's a great status | 22:12 |
crinkle | clarkb: run nova hypervisor-list | 22:12 |
jeblair | clarkb: yeah, rabbitmq is broken because of the puppet break | 22:12 |
clarkb | aha | 22:12 |
jeblair | i think all puppet-fixing patches have been aprvd | 22:13 |
jeblair | but only just now | 22:13 |
nibalizer | 284938 | 22:14 |
nibalizer | 284954 | 22:15 |
Clint | full of lies | 22:15 |
jeblair | 936, 938 and 942 | 22:15 |
*** delatte has joined #openstack-sprint | 22:25 | |
yolanda | infra-root, i need https://review.openstack.org/#/c/284447/ | 22:30 |
yolanda | https://review.openstack.org/#/c/284447/ , for controller to work | 22:31 |
*** delattec has joined #openstack-sprint | 22:33 | |
mordred | https://review.openstack.org/284969 | 22:35 |
mordred | clarkb, crinkle: ^^ | 22:35 |
mordred | that should be usable to delete an instance from the db - and to clean out soft-deleted records | 22:35 |
yolanda | crinkle, i fixed... and also https://review.openstack.org/#/c/284881/ needs to land to properly set neutron in west | 22:36 |
*** delatte has quit IRC | 22:36 | |
crinkle | yolanda: commented on 881 | 22:37 |
mordred | perhaps we should just make a cron to install that script and run it every 15 minutes? | 22:39 |
mordred | we pretty much never want to keep actual deleted records around | 22:39 |
crinkle | puppet is still broken, hiera.yaml still has wrong datadir | 22:40 |
crinkle | oh, possibly because 284936 merged after the last puppet run | 22:41 |
fungi | i now have a cab booked departing to den from the hilton garden inn at 5:30pm local if anyone needs to share a ride | 22:46 |
jeblair | mordred: yeah, add it to puppet-infracloud... :| | 22:47 |
crinkle | puppet looks fixed | 22:50 |
* crinkle crosses fingers | 22:50 | |
jeblair | crinkle: \o/ | 22:50 |
yolanda | infra-root, this change https://review.openstack.org/#/c/284447/ is blocking addition of controller in useast... can you review ? | 22:56 |
crinkle | puppet successfully ran on controller00, was it able to start on the computes? | 22:58 |
crinkle | nibalizer: yolanda ^ | 22:58 |
crinkle | my reading of puppetboard makes it look like controller hasn't submitted a report in a couple of hours | 22:59 |
yolanda | and this https://review.openstack.org/284881 needs to land first | 23:00 |
*** delatte has joined #openstack-sprint | 23:01 | |
crinkle | I don't think 881 and 447 will pass puppet apply separately, I think they need to be in the same change | 23:02 |
*** delattec has quit IRC | 23:03 | |
yolanda | infra-root, i abandoned 881 and added west to https://review.openstack.org/284447 | 23:08 |
nibalizer | crinkle: we failed to post facts | 23:10 |
nibalizer | so no | 23:10 |
nibalizer | im debugging | 23:10 |
crinkle | nibalizer: :'( | 23:10 |
nibalizer | weeping uncontrollably | 23:12 |
fungi | thanks yolanda | 23:12 |
fungi | nibalizer: *sniffle* | 23:12 |
yolanda | jeblair, can you add another +2 to 284447 | 23:18 |
jeblair | nibalizer: https://review.openstack.org/284346 | 23:19 |
mordred | jeblair: shouldn't it reference remote_puppet_adhoc? | 23:22 |
mordred | oh. piddle. that didn't land yet because linters | 23:23 |
mordred | silly me - I got indentation of the shellscript wrong | 23:24 |
mordred | https://review.openstack.org/#/c/284352/ | 23:24 |
yolanda | thx jeblair | 23:25 |
yolanda | jeblair http://paste.openstack.org/show/488292/ | 23:38 |
rcarrillocruz | huh | 23:38 |
rcarrillocruz | mordred: there's no os_domain module | 23:38 |
rcarrillocruz | ? | 23:38 |
* rcarrillocruz sad panda | 23:38 | |
mordred | rcarrillocruz: https://github.com/ansible/ansible-modules-extras/blob/devel/cloud/openstack/os_keystone_domain.py | 23:39 |
rcarrillocruz | aha! | 23:39 |
rcarrillocruz | thanks | 23:39 |
mordred | a domain is an admin-only thing, so it gets prefixed with the name of the service | 23:39 |
rcarrillocruz | any reason for 'keystone' in middle | 23:40 |
rcarrillocruz | ah, see the pattern | 23:40 |
rcarrillocruz | something i see there's missing is an os_quota thing | 23:40 |
rcarrillocruz | ? | 23:40 |
rcarrillocruz | in case, would that be os_server_quota | 23:40 |
rcarrillocruz | os_port_quota | 23:40 |
rcarrillocruz | etc | 23:40 |
rcarrillocruz | or os_quota with the resource passed in as param | 23:41 |
rcarrillocruz | thinking on potential work streams for myself | 23:41 |
rcarrillocruz | mordred: ^ | 23:43 |
mordred | hrm | 23:44 |
mordred | I think probably os_nova_quota | 23:44 |
mordred | again - it's an admin thing, so you konw you're setting nova quotas | 23:44 |
rcarrillocruz | erm, yeah | 23:45 |
mordred | I think we could also have the ansible modules do validation that the project_id you're passing in is a valid project_id | 23:45 |
crinkle | have we put together an etherpad yet for our networking request? and/or could someone start that? | 23:49 |
crinkle | clarkb: can you review and help expand on the bottom of https://etherpad.openstack.org/p/mitaka-infra-midcycle so we can send it as an email to the dc ops team? | 23:53 |
*** krtaylor has quit IRC | 23:58 |