Tuesday, 2020-05-12

*** suzhengwei has joined #openstack-masakari [02:03]
*** suzhengwei has quit IRC [05:17]
*** suzhengwei_ has joined #openstack-masakari [05:32]
*** brinzhang has quit IRC [06:11]
*** brinzhang has joined #openstack-masakari [06:12]
*** brinzhang has quit IRC [06:14]
*** brinzhang has joined #openstack-masakari [06:15]
<openstackgerrit> jacky06 proposed openstack/masakari master: [wip] Remove six https://review.opendev.org/727106 [08:29]
<noonedeadpunk> hey everyone. anybody here and active and ready to chat? :) [09:46]
<suzhengwei_> I'm here. [09:57]
<suzhengwei_> noonedeadpunk: :) [10:01]
<noonedeadpunk> suzhengwei_: oh, cool. So I've recently upgraded an openstack deployment with masakari to train and realized that some behaviour changed a while ago with https://review.opendev.org/#/c/621535/ [10:02]
<noonedeadpunk> so the thing is that we have different real hostnames and hostnames provisioned by nova hypervisors [10:03]
<noonedeadpunk> Like in hypervisor list we do have a hostname like `compute01.domain.com`, but in nova compute service list this compute node is marked as `compute01` - which is actually the correct hostname of this node [10:03]
<noonedeadpunk> And so we find ourselves unable to add a new host to masakari [10:03]
<noonedeadpunk> And corosync is also configured to use the "real" hostname [10:04]
<noonedeadpunk> Like changing hostnames for all hosts is possible but uh, unwanted I'd say :) [10:04]
<noonedeadpunk> suzhengwei_: so was wondering about thoughts regarding this. I kinda was thinking about possible solutions. Among them (besides changing hostnames and re-adding all compute nodes to all services) were adding a config option to masakari to control whether it checks that the compute is in hypervisors [10:06]
<noonedeadpunk> and checking not only against nova hypervisors but also against nova services with type compute [10:07]
<noonedeadpunk> as we have correct naming being discovered there [10:07]
<suzhengwei_> I haven't met such a problem. [10:10]
<suzhengwei_> Do you have several hypervisors in one host? [10:10]
<noonedeadpunk> nope only one [10:11]
<suzhengwei_> I don't know how they have different names in nova. Can you give some details? [10:15]
<noonedeadpunk> so like these are my outputs: http://paste.openstack.org/show/793430/ [10:15]
<noonedeadpunk> so it seems that nova has different ways of discovering hostnames [10:16]
<noonedeadpunk> so like for nova-compute they get real hostname, but for hypervisor list they do use socket.getfqdn() which is taken from hosts file... [10:17]
<suzhengwei_> what's in /etc/hostname, and did you configure the hostname in nova.conf? [10:20]
<suzhengwei_> If it's not configured, it will use socket.gethostname() as the hostname. [10:23]
<noonedeadpunk> no, hostname is not configured in nova.conf. the hosts file has records like this http://paste.openstack.org/show/793432/ [10:23]
<noonedeadpunk> suzhengwei_: so the thing is, that I think that hypervisors use socket.getfqdn() while services use socket.gethostname() [10:24]
<noonedeadpunk> http://paste.openstack.org/show/793433/ [10:25]
<suzhengwei_> I didn't find "getfqdn" in nova project. Would the hypervisor hostname be passed from libvirt? [10:27]
<noonedeadpunk> oh, that's possible... [10:28]
<noonedeadpunk> but I don't have it set explicitly anywhere [10:28]
<noonedeadpunk> like in other region where socket.gethostname() and socket.getfqdn() return the same result things are good [10:29]
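
The two lookups mentioned above can be compared directly on a compute node. This is only a minimal sketch using the Python standard library; the function names `hostname_views` and `names_match` are made up for illustration, and whether nova or libvirt actually derives the hypervisor name this way is not confirmed in the discussion.

```python
import socket


def hostname_views():
    """Return (short_name, fqdn) as this machine reports them."""
    short_name = socket.gethostname()  # the current host name
    fqdn = socket.getfqdn()            # resolved via the hosts file or DNS
    return short_name, fqdn


def names_match(short_name, fqdn):
    """True when both lookups agree, as in the region where things are good."""
    return short_name == fqdn


if __name__ == "__main__":
    short_name, fqdn = hostname_views()
    print("gethostname():", short_name)
    print("getfqdn():    ", fqdn)
    print("same name:    ", names_match(short_name, fqdn))
```

On a node where the hosts file maps the address to `compute01.domain.com` first, the two values would be expected to differ, which is the situation described in this log.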
<noonedeadpunk> actually I can possibly just re-create corosync cluster and add fqdn naming there and re-add all nodes to masakari [10:34]
<noonedeadpunk> suzhengwei_: another question was whether there's any option to prevent split brain? Like if one host can't reach the other ones (and actually it's the one that is unhealthy) - it marks all the other hosts as down, while all these hosts mark this one as down. All in all we end up in a situation where all hosts are marked for maintenance and disabled in nova [10:37]
<noonedeadpunk> (dunno maybe after upgrade smth changed as it was the issue for orcky) [10:37]
<noonedeadpunk> *rocky [10:37]
<suzhengwei_> socket.getfqdn, get fully qualified domain name from name. [10:38]
<suzhengwei_> socket.gethostname, return the current host name. [10:38]
<noonedeadpunk> yeah, I get that :) just being unable to add host to masakari because these two differ, which means different names in nova hypervisors and services and ends up with not being able to add host to masakari with current naming [10:40]
<noonedeadpunk> and both of them are actually valid so I'm not sure there's some reason not to allow them [10:41]
<noonedeadpunk> as in the end what matters is the name configured in corosync and nova compute service but not hypervisor [10:42]
<noonedeadpunk> as corosync should match nova service name to mark node as disabled during event [10:42]
<noonedeadpunk> but not hypervisor one [10:42]
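
The mismatch described above can be illustrated with plain data. This is a hedged sketch, not masakari code: `find_compute_host` is a hypothetical helper, and the dict keys (`binary`, `host`, `hypervisor_hostname`) are modelled on the fields nova shows in its service and hypervisor listings, with sample values taken from the names quoted in this log.

```python
def find_compute_host(name, services, hypervisors):
    """Look a host name up in both listings (hypothetical plain-dict records).

    Returns (found_in_services, found_in_hypervisors).
    """
    in_services = any(
        s["binary"] == "nova-compute" and s["host"] == name for s in services
    )
    in_hypervisors = any(h["hypervisor_hostname"] == name for h in hypervisors)
    return in_services, in_hypervisors


# Sample records, assuming the naming described in the discussion.
services = [{"binary": "nova-compute", "host": "compute01"}]
hypervisors = [{"hypervisor_hostname": "compute01.domain.com"}]

# The corosync / nova service name is known to services but not hypervisors.
print(find_compute_host("compute01", services, hypervisors))
```

A check that only consults the hypervisor listing would reject `compute01` here, even though that is the name corosync and the nova compute service use.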
<suzhengwei_> This is a meaningful discovery. Maybe other projects built on nova would have the same problem. [10:44]
<suzhengwei_> I can give you a suggestion about split brain. Power off the host before evacuating instances. This patch is not implemented in masakari. [10:49]
<noonedeadpunk> yeah, but the main problem is that "failed" host marks all healthy ones as down... So it would be really nice to have some quorum thing... [10:55]
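
The "quorum thing" asked for above could look roughly like the following. This is a sketch of a generic majority rule only, under the assumption that each node knows the full cluster membership; `peers_to_mark_down` is a hypothetical name, and nothing here reflects how masakari or corosync actually behave.

```python
def peers_to_mark_down(cluster, reachable):
    """Return the peers this node may report as down.

    cluster:   names of all nodes, including this one
    reachable: names this node can currently see, including itself

    A node that sees no more than half of the cluster has no quorum and
    reports nothing, so an isolated node cannot flag the healthy majority.
    """
    cluster = set(cluster)
    visible = set(reachable) & cluster
    has_quorum = len(visible) > len(cluster) // 2
    if not has_quorum:
        return set()
    return cluster - visible


cluster = {"compute01", "compute02", "compute03"}
# Isolated node: sees only itself, so it reports nobody.
print(peers_to_mark_down(cluster, {"compute01"}))
# Healthy majority: sees two of three, so it reports the missing node.
print(peers_to_mark_down(cluster, {"compute02", "compute03"}))
```

With this rule the unhealthy host in the scenario above stays silent, while the remaining hosts still report it as down.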
<noonedeadpunk> actually all projects are pretty ok :) except masakari. [10:57]
<noonedeadpunk> Btw, during the discussion I started having doubts that we do the right thing by verifying the added node against hypervisors while it's supposed to be verified against nova service list https://opendev.org/openstack/masakari/src/branch/master/masakari/compute/nova.py#L247-L261 [10:58]
<suzhengwei_> Would you like to submit a patch? [11:03]
<noonedeadpunk> yep, sure. Just was trying to double check if I'm right as may miss smth out [11:05]
<suzhengwei_> Good. Could you join us at the next irc project meeting? We can talk about this issue and quorum things with other people. [11:06]
<noonedeadpunk> when do you have next meeting? [11:15]
<suzhengwei_> Maybe next Tuesday at 04:00 UTC in #openstack-meeting. [11:16]
*** suzhengwei_ has quit IRC [11:39]
*** vishalmanchanda has quit IRC [12:07]
*** vishalmanchanda has joined #openstack-masakari [12:41]
*** openstackstatus has quit IRC [13:53]
*** openstackstatus has joined #openstack-masakari [13:54]
*** ChanServ sets mode: +v openstackstatus [13:54]
*** brinzhang_ has joined #openstack-masakari [14:01]
*** brinzhang has quit IRC [14:04]
*** brinzhang_ has quit IRC [14:06]
<openstackgerrit> jacky06 proposed openstack/masakari master: [wip] Remove six https://review.opendev.org/727106 [15:21]
<openstackgerrit> jacky06 proposed openstack/masakari master: Remove six https://review.opendev.org/727106 [15:30]
<openstackgerrit> jacky06 proposed openstack/masakari-dashboard master: Update hacking for Python3 https://review.opendev.org/727248 [15:53]
<openstackgerrit> jacky06 proposed openstack/masakari-monitors master: Remove six https://review.opendev.org/727252 [15:56]
*** vishalmanchanda has quit IRC [16:57]
*** vishalmanchanda has joined #openstack-masakari [18:04]
*** vishalmanchanda has quit IRC [20:37]
