*** suzhengwei has joined #openstack-masakari | 02:03 | |
*** suzhengwei has quit IRC | 05:17 | |
*** suzhengwei_ has joined #openstack-masakari | 05:32 | |
*** brinzhang has quit IRC | 06:11 | |
*** brinzhang has joined #openstack-masakari | 06:12 | |
*** brinzhang has quit IRC | 06:14 | |
*** brinzhang has joined #openstack-masakari | 06:15 | |
openstackgerrit | jacky06 proposed openstack/masakari master: [wip] Remove six https://review.opendev.org/727106 | 08:29 |
---|---|---|
noonedeadpunk | hey everyone. anybody here and active and ready to chat?:) | 09:46 |
suzhengwei_ | I'm here. | 09:57 |
suzhengwei_ | noonedeadpunk: :) | 10:01 |
noonedeadpunk | suzhengwei_: oh, cool. So I've recently upgraded openstack deployment to train within masakari and realized that some behaviour changed a while ago with https://review.opendev.org/#/c/621535/ | 10:02 |
noonedeadpunk | so the thing is that we have different real hostnames and hostnames provisioned by nova hypervisors | 10:03 |
noonedeadpunk | Like in hypervisor list we do have hostname like `compute01.domain.com`, but in nova compute service list this compute node is marked as `compute01` - which is eventually correct hostname of this node | 10:03 |
noonedeadpunk | And so we find ourselves unable to add new host to masakari | 10:03 |
noonedeadpunk | And corosync is also configured to use "real" hostname | 10:04 |
noonedeadpunk | Like changing hostnames for all hosts is possible but uh, unwanted I'd say:) | 10:04 |
noonedeadpunk | suzhengwei_: so was wondering about thoughts regarding this. I kinda was thinking about possible solutions. Among them (except changing hostnames and re-adding all compute nodes to all services) were adding a config option to masakari to either check or not if compute is in hypervisors | 10:06 |
noonedeadpunk | and checking not only against nova hypervisors but also against nova services with type compute | 10:07 |
noonedeadpunk | as we have correct naming being discovered there | 10:07 |
suzhengwei_ | I didn't meet such problem. | 10:10 |
suzhengwei_ | Do you have several hypervisors in one host? | 10:10 |
noonedeadpunk | nope only one | 10:11 |
suzhengwei_ | I don't know how they have different names in nova. Can you give some details? | 10:15 |
noonedeadpunk | so like these are my outputs: http://paste.openstack.org/show/793430/ | 10:15 |
noonedeadpunk | so it seems that nova have different ways of discovering of hostnames | 10:16 |
noonedeadpunk | so like for nova-compute they get real hostname, but for hypervisor list they do use socket.getfqdn() which is taken from hosts file... | 10:17 |
suzhengwei_ | what's it in /etc/hostname, and did you config hostname in nova.conf? | 10:20 |
suzhengwei_ | If not config, it will use socket.gethostname() as the hostname. | 10:23 |
noonedeadpunk | no, hostname is not configured in nova.conf. in hosts records like this http://paste.openstack.org/show/793432/ | 10:23 |
noonedeadpunk | suzhengwei_: so the thing is, that I think that hypervisors use socket.getfqdn() while services use socket.gethostname() | 10:24 |
noonedeadpunk | http://paste.openstack.org/show/793433/ | 10:25 |
suzhengwei_ | I didn't find "getfqdn" in nova project. Would the hypervisor hostname be passed from libvirt? | 10:27 |
noonedeadpunk | oh, that's possible... | 10:28 |
noonedeadpunk | but I don't have it set explicitly anywhere | 10:28 |
noonedeadpunk | like in other region where socket.gethostname() and socket.getfqdn() return the same result things are good | 10:29 |
noonedeadpunk | actually I can possibly just re-create corosync cluster and add fqdn naming there and re-add all nodes to masakari | 10:34 |
noonedeadpunk | suzhengwei_: another question was about if there's any option to prevent split brain? Like if one host can't reach other ones (and actually it's the one who is unhealthy) - it marks all the rest hosts as down, while these all hosts mark this one as down. All in all we end up in situation where all hosts are marked for maintenance and disabled in nova | 10:37 |
noonedeadpunk | (dunno maybe after upgrade smth changed as it was the issue for orcky) | 10:37 |
noonedeadpunk | *rocky | 10:37 |
suzhengwei_ | socket.getfqdn, get fully qualified domain name from name. | 10:38 |
suzhengwei_ | socket.gethostname, return the current host name. | 10:38 |
noonedeadpunk | yeah, I get that:) just being unable to add host to masakari because these two differ, which means different names in nova hypervisors and services and ends up not being able to add host to masakari with current naming | 10:40 |
noonedeadpunk | and both of them are actually valid so I'm not sure there's some reason not to allow them | 10:41 |
noonedeadpunk | as eventually it matters what name is configured in corosync and nova compute service but not hypervisor | 10:42 |
noonedeadpunk | as corosync should match nova service name to mark node as disabled during event | 10:42 |
noonedeadpunk | but not hypervisor one | 10:42 |
suzhengwei_ | This is a meaningful discovery. Maybe other projects upon nova would have the same problem. | 10:44 |
suzhengwei_ | I can give you a suggestion about split brain. Power off the host before evacuating instances. This patch is not implemented in masakari. | 10:49 |
noonedeadpunk | yeah, but the main problem is that "failed" host marks all healthy ones as down... So it would be really nice to have some quorum thing... | 10:55 |
noonedeadpunk | actually all projects are pretty ok:) except masakari. | 10:57 |
noonedeadpunk | Btw, during discussion I start having concerns that we do right thing by verifying added node against hypervisors while it's supposed to be verified against nova service list https://opendev.org/openstack/masakari/src/branch/master/masakari/compute/nova.py#L247-L261 | 10:58 |
suzhengwei_ | Would you like to give out a patch? | 11:03 |
noonedeadpunk | yep, sure. Just was trying to double check if I'm right as may miss smth out | 11:05 |
suzhengwei_ | Good. Could you join us at the next irc project meeting? We can talk about this issue and quorum things with other people. | 11:06 |
noonedeadpunk | when do you have next meeting? | 11:15 |
suzhengwei_ | Maybe next Tuesday at 04:00 UTC in #openstack-meeting. | 11:16 |
*** suzhengwei_ has quit IRC | 11:39 | |
*** vishalmanchanda has quit IRC | 12:07 | |
*** vishalmanchanda has joined #openstack-masakari | 12:41 | |
*** openstackstatus has quit IRC | 13:53 | |
*** openstackstatus has joined #openstack-masakari | 13:54 | |
*** ChanServ sets mode: +v openstackstatus | 13:54 | |
*** brinzhang_ has joined #openstack-masakari | 14:01 | |
*** brinzhang has quit IRC | 14:04 | |
*** brinzhang_ has quit IRC | 14:06 | |
openstackgerrit | jacky06 proposed openstack/masakari master: [wip] Remove six https://review.opendev.org/727106 | 15:21 |
openstackgerrit | jacky06 proposed openstack/masakari master: Remove six https://review.opendev.org/727106 | 15:30 |
openstackgerrit | jacky06 proposed openstack/masakari-dashboard master: Update hacking for Python3 https://review.opendev.org/727248 | 15:53 |
openstackgerrit | jacky06 proposed openstack/masakari-monitors master: Remove six https://review.opendev.org/727252 | 15:56 |
*** vishalmanchanda has quit IRC | 16:57 | |
*** vishalmanchanda has joined #openstack-masakari | 18:04 | |
*** vishalmanchanda has quit IRC | 20:37 |
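
The hostname mismatch discussed above (a short name in the nova compute service list, an FQDN in the hypervisor list) can be reproduced with the two standard-library calls named in the log. The sketch below is only an illustration, not masakari or nova code: `local_names`, `short_name` and `host_is_known` are hypothetical helpers, and accepting a host that appears in either list is just the idea floated in the discussion, not an implemented behaviour.

```python
import socket


def local_names():
    """Return (gethostname(), getfqdn()) for the machine running this code.

    On a node whose hosts file maps the short name to an FQDN, the two
    values can differ, e.g. 'compute01' vs 'compute01.domain.com'.
    """
    return socket.gethostname(), socket.getfqdn()


def short_name(name):
    """Return the part of a hostname before the first dot."""
    return name.split(".", 1)[0]


def host_is_known(name, service_hosts, hypervisor_hosts):
    """Hypothetical check: accept a host if it matches either list exactly.

    service_hosts stands in for the names from the compute service list,
    hypervisor_hosts for the names from the hypervisor list.
    """
    return name in set(service_hosts) | set(hypervisor_hosts)
```

Calling `local_names()` on a compute node shows whether the two names diverge there. With the names from the log, `host_is_known("compute01", ["compute01"], ["compute01.domain.com"])` accepts the short name, whereas a check against the hypervisor list alone would reject it.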