opendevreview | Merged openstack/ansible-role-systemd_service stable/zed: Ensure daemon is reloaded on socket change https://review.opendev.org/c/openstack/ansible-role-systemd_service/+/871751 | 00:21 |
noonedeadpunk | another vote would be preferable for https://review.opendev.org/c/openstack/ansible-role-systemd_service/+/871752 | 08:25 |
noonedeadpunk | Going to do another bump today as all patches for OSSA-2023-002 have landed | 08:26 |
dokeeffe85 | Morning, could anyone give me a pointer as to what I should look for after this: https://paste.openstack.org/show/bpq4ew3P61rfwp5OuqgS/ I thought I had installed octavia properly but following the cookbook guide I get that | 09:13 |
noonedeadpunk | dokeeffe85: have you run haproxy-install.yml after adding octavia to the inventory? | 09:15 |
dokeeffe85 | ah noonedeadpunk, sorry my bad. I'll do that now | 09:18 |
dokeeffe85 | Too early for this :) Thanks noonedeadpunk that did it | 09:23 |
noonedeadpunk | sweet ) | 09:23 |
noonedeadpunk | I usually also forget to run the haproxy role when adding new services, to be frank, so it's a common thing to forget :) | 09:23 |
noonedeadpunk | thankfully, that error you've pasted is quite explicit about what is wrong ;) | 09:24 |
noonedeadpunk | jrosser: andrewbonney: it would be great if you could check this bug: https://bugs.launchpad.net/openstack-ansible/+bug/2003921 as I'm quite far from using proxies... | 11:24 |
*** dviroel|out is now known as dviroel | 11:32 | |
*** dviroel_ is now known as dviroel | 11:45 | |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-ceph_client master: Improve regexp for fetching nove secret from files https://review.opendev.org/c/openstack/openstack-ansible-ceph_client/+/871819 | 12:49 |
jrosser | noonedeadpunk: i can look at that but probably not today - but we do have a proxy CI job which is working, so i'm not sure what is going on there | 12:59 |
ierdem | Hi, when I try to cold migrate VMs via cli by specifying the destination host, it throws an exception after the first migrate: "No valid host was found" -no more details, just this message-, but the destination host has enough resources. Does nova-scheduler cause this? If so, how can I force it to migrate more than one VM to the same host? Thanks for all your assistance. (I have kolla-ansible stein-eol) | 13:25 |
noonedeadpunk | jrosser: yeah, I guess no rush here as even a workaround was proposed. It's just hard for me to evaluate what they've run into and why that workaround is needed. I assume it might be during debootstrap, as I had to override lxc_hosts_container_build_command, for example, to include the path to gpg | 13:33 |
noonedeadpunk | but I have a fully isolated env, and have no idea what might be needed for a proxy there | 13:33 |
mgariepy | don't you need to set the global_environment_variables with the various proxies? | 13:37 |
mgariepy | https://github.com/openstack/openstack-ansible-openstack_hosts/blob/master/templates/environment.j2 | 13:38 |
noonedeadpunk | well, there's an alternative approach as well, due to pam.d limitations on no_proxy length | 13:38 |
noonedeadpunk | or smth like that | 13:38 |
noonedeadpunk | so you can provide some vars that do the same, but during runtime only | 13:38 |
noonedeadpunk | sorry, my real knowledge is quite abstract as I've never had to run that kind of setup | 13:39 |
mgariepy | if the network is restricted you will mostly need to have the proxy for every command. you need to have it configured to wget stuff as well. | 13:39 |
noonedeadpunk | here's the doc https://docs.openstack.org/openstack-ansible/latest/user/limited-connectivity/index.html#other-proxy-configuration | 13:41 |
noonedeadpunk | `deployment_environment_variables` | 13:42 |
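For reference, the doc linked above describes both variables; a minimal sketch of what this might look like in `user_variables.yml` (the proxy address and no_proxy contents are placeholders):

```yaml
# Persistent variant, written to /etc/environment on every host (subject to the
# pam_env line-length limit on long no_proxy values):
# global_environment_variables:
#   http_proxy: "http://proxy.example.com:3128"
#   https_proxy: "http://proxy.example.com:3128"
#   no_proxy: "localhost,127.0.0.1,{{ internal_lb_vip_address }},{{ external_lb_vip_address }}"

# Runtime-only variant, exported just while the Ansible tasks run:
deployment_environment_variables:
  http_proxy: "http://proxy.example.com:3128"
  https_proxy: "http://proxy.example.com:3128"
  no_proxy: "localhost,127.0.0.1,{{ internal_lb_vip_address }},{{ external_lb_vip_address }}"
```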
mgariepy | no_proxy length 1024 .. oof | 13:43 |
mgariepy | i'm at 1006 LOL | 13:43 |
dokeeffe85 | Hi all, sorry to bother you again. I installed octavia and ran haproxy as noonedeadpunk pointed out this morning but I cannot create a loadbalancer successfully. https://paste.openstack.org/show/bAM1QuK7DZGgj1wQtvjS/ | 14:41 |
noonedeadpunk | well, it looks like it's a neutron issue in the first place? | 14:43 |
noonedeadpunk | Are you able to spawn any other VM from that network? Or attach port from it to some VM? | 14:43 |
dokeeffe85 | Let me try | 14:44 |
noonedeadpunk | Also worth checking logs for neutron agent on compute | 14:44 |
dokeeffe85 | Will do that too | 14:44 |
noonedeadpunk | Also `physnet2` should not be used as a flat network anywhere | 14:45 |
noonedeadpunk | IIRC | 14:45 |
noonedeadpunk | as then it will be part of the bridge and neutron might fail to manage it | 14:45 |
anskiy | hello! Is it possible to have some of the services in a group (I'm trying to do this with the `storage_hosts` group) deployed in LXC and the others directly on metal? | 15:15 |
noonedeadpunk | sure, it's possible | 15:16 |
noonedeadpunk | anskiy: https://docs.openstack.org/openstack-ansible/latest/reference/inventory/configure-inventory.html#deploying-directly-on-hosts | 15:17 |
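A minimal sketch of the per-host option from that doc, which lets hosts in the same group mix LXC and metal deployments; host names and IPs are placeholders, and the exact group name depends on your config:

```yaml
# openstack_user_config.yml
storage_hosts:
  storage1:
    ip: 172.29.236.111        # services for this host are deployed as usual (LXC where applicable)
  storage2:
    ip: 172.29.236.112
    no_containers: true       # everything assigned to this host runs directly on metal
```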
spatel | noonedeadpunk are you guys using cinder-backup service? what provider do you use? | 15:20 |
spatel | NFS or ceph ? | 15:20 |
noonedeadpunk | we don't now, but I'd use S3/swift | 15:21 |
spatel | We have a very small ceph cluster and it doesn't have an S3 service | 15:22 |
spatel | it's a 3-node ceph | 15:22 |
spatel | to run S3 i need an S3 gateway on a dedicated node, correct? | 15:22 |
spatel | This is what i configured to point cinder-backup to ceph - https://paste.opendev.org/show/bPKkDsAxiRdcNll2WSuR/ | 15:24 |
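That paste is not reproduced here; as a rough sketch, pointing cinder-backup at ceph through OSA overrides usually looks something like the following in `user_variables.yml`. The `cinder_service_backup_program_enabled` name is assumed from the os_cinder role, and the pool/user names are placeholders; the `backup_*` keys are standard cinder Ceph backup driver options:

```yaml
cinder_service_backup_program_enabled: true    # assumed toggle for deploying the cinder-backup service
cinder_cinder_conf_overrides:
  DEFAULT:
    backup_driver: cinder.backup.drivers.ceph.CephBackupDriver
    backup_ceph_conf: /etc/ceph/ceph.conf
    backup_ceph_user: cinder-backup             # placeholder cephx user
    backup_ceph_pool: backups                   # placeholder pool name
```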
opendevreview | Merged openstack/ansible-role-zookeeper stable/zed: Add configuration option for native Prometheus exporter https://review.opendev.org/c/openstack/ansible-role-zookeeper/+/871753 | 15:28 |
opendevreview | Merged openstack/ansible-role-systemd_service stable/yoga: Ensure daemon is reloaded on socket change https://review.opendev.org/c/openstack/ansible-role-systemd_service/+/871752 | 15:53 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/zed: Bump OSA for stable/zed to cover CVE-2022-47951 https://review.opendev.org/c/openstack/openstack-ansible/+/871830 | 16:03 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/yoga: Bump OSA for stable/yoga to cover CVE-2022-47951 https://review.opendev.org/c/openstack/openstack-ansible/+/871834 | 16:41 |
noonedeadpunk | spatel: well, I'd say everything that can do incremental backups is good enough. But I definitely would avoid NFS | 17:20 |
spatel | Hmm! | 17:20 |
spatel | Do you guys create VMs with cinder volumes? or let me ask this way.. what is the best approach here? | 17:21 |
noonedeadpunk | Yes, we're moving from ephemerals to cinder volumes at the moment | 17:36 |
noonedeadpunk | I'm not sure about the reasons why you might want to use ephemerals, other than keeping them for local storage | 17:37 |
noonedeadpunk | ie, get a couple of drives in raid and utilize them for CI runners | 17:37 |
noonedeadpunk | As with nova handling the drives you need to have so many more flavors to cover demand | 17:38 |
noonedeadpunk | for actually no good reason imo | 17:38 |
jrosser | i wish this was all more transparent to mix | 17:39 |
jrosser | like making CI runners with local storage and other VMs with BFV seems unnecessarily hard | 17:40 |
jrosser | making/mixing | 17:40 |
spatel | noonedeadpunk agreed on using volume-based VMs | 17:41 |
spatel | But how do the flavor disk size and the volume size work together? | 17:41 |
spatel | does volume override flavor disk size? | 17:41 |
noonedeadpunk | yes it does | 17:43 |
noonedeadpunk | but eventually you should just have a disk size of 0 in the flavor for bfv | 17:43 |
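To illustrate the point about flavors for boot-from-volume, a hypothetical snippet using the openstack.cloud Ansible collection; cloud, flavor, image and network names are placeholders:

```yaml
- hosts: localhost
  tasks:
    - name: Flavor with no root disk, intended for boot-from-volume instances
      openstack.cloud.compute_flavor:
        cloud: mycloud
        name: m1.bfv.medium
        vcpus: 2
        ram: 4096
        disk: 0                       # root disk size comes from the volume, not the flavor

    - name: Boot from a new 40 GB volume; the volume size is what counts here
      openstack.cloud.server:
        cloud: mycloud
        name: bfv-test
        flavor: m1.bfv.medium
        image: ubuntu-22.04
        boot_from_volume: true
        volume_size: 40
        network: private
```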
noonedeadpunk | jrosser: well... you can spawn cinder-volume on each compute I guess :D | 17:43 |
noonedeadpunk | not sure it's easier or better though.... | 17:44 |
spatel | if you run cinder-volume with local storage, what happens if the server dies - you will have an outage, correct? | 17:47 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/xena: Bump OSA for stable/zed to cover CVE-2022-47951 https://review.opendev.org/c/openstack/openstack-ansible/+/871839 | 17:47 |
noonedeadpunk | with local storage you will have an outage, yes. But well, it's what ephemeral literally means | 17:48 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/xena: Bump OSA for stable/xena to cover CVE-2022-47951 https://review.opendev.org/c/openstack/openstack-ansible/+/871839 | 17:48 |
spatel | But if you run nova+ceph then you will have the option to migrate VMs | 17:49 |
noonedeadpunk | I think these things are for different purposes | 17:49 |
spatel | nova+ceph+ephemeral i meant (without volume) | 17:49 |
noonedeadpunk | As you won't get low-latency with ceph | 17:49 |
spatel | Yes.. that is what i am doing: all local storage without ceph | 17:50 |
spatel | I am exploring other options for a new cloud, to run all VMs on central ceph | 17:50 |
spatel | This cloud is for a different purpose | 17:50 |
spatel | I want backups of volumes also | 17:50 |
spatel | That is why I'm testing the cinder-backup option, to understand what will fit best here | 17:51 |
noonedeadpunk | Or well. You can, if you do caching to a local nvme, so ceph will consider the write committed and applied once it gets written to the local nvme, and then it will move the data asynchronously to the OSDs | 17:52 |
noonedeadpunk | but well, that has quite the same drawbacks as a local drive imo... | 17:52 |
spatel | local nvme for caching in ceph? | 17:53 |
noonedeadpunk | yeah.... | 17:53 |
spatel | never heard of that | 17:53 |
spatel | How does it work? | 17:53 |
noonedeadpunk | If I'm not mistaken, I'm talking about https://docs.ceph.com/en/octopus/rbd/rbd-persistent-cache/ | 17:55 |
noonedeadpunk | but likely I am mistaken and it's smth different | 17:57 |
spatel | Hmm.. so local compute node disk (nvme) will be part of ceph in that case ? | 17:57 |
spatel | It doesn't make sense :( | 17:57 |
spatel | Looks like they are talking about the ceph cache tiering method | 17:58 |
noonedeadpunk | sorry, it's https://docs.ceph.com/en/quincy/rbd/rbd-persistent-write-log-cache/ | 17:58 |
spatel | It's deprecated and its use is discouraged. | 17:58 |
noonedeadpunk | but yes. you're using local drive on compute node for caching writes to "improve" latency | 17:59 |
noonedeadpunk | as ceph will ack commit once data gets to local drive | 17:59 |
noonedeadpunk | so basically you get latency and throughput of local disk with ceph | 18:00 |
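For reference, a sketch of the client-side options from the doc linked above, expressed here as ceph-ansible's `ceph_conf_overrides` (if ceph.conf is managed some other way, the same keys go into the `[client]` section); the cache path and size are placeholders:

```yaml
ceph_conf_overrides:
  client:
    rbd_plugins: pwl_cache
    rbd_persistent_cache_mode: ssd                        # "rwl" is the PMEM variant
    rbd_persistent_cache_path: /var/lib/ceph/pwl-cache    # directory on the compute node's local NVMe
    rbd_persistent_cache_size: 10737418240                # 10 GiB
```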
noonedeadpunk | but yes, with quite a risk of getting data corrupted in case of the compute going down or a disk failure... | 18:01 |
noonedeadpunk | But it might be better than just a local disk.... But again, I don't really trust this thing | 18:01 |
spatel | agreed! | 18:02 |
noonedeadpunk | though our storage folks are eager to get it running asap :D | 18:02 |
spatel | This kind of solution looks fancy but comes with troubleshooting cost | 18:02 |
spatel | it makes operator life hell :) | 18:02 |
spatel | share your case study with us :) | 18:03 |
noonedeadpunk | well, while testing in the sandbox, things look amazing | 18:04 |
noonedeadpunk | But I really don't want this to go out of the sandbox to be frank | 18:04 |
spatel | It works great because it's a single machine, but when it comes down to multiple nodes it would be interesting to see | 18:05 |
spatel | but idea sounds great | 18:05 |
spatel | worth testing out | 18:06 |
noonedeadpunk | Well, I don't think things go south with more nodes, given that you still maintain throughput and latency on the ceph side and scale in time | 18:08 |
noonedeadpunk | But in case of incidents I'm not sure what's gonna happen with data that was not transferred | 18:09 |
noonedeadpunk | Like compute failure | 18:09 |
noonedeadpunk | anyway | 18:10 |
jrosser | read cache looks nice though | 18:13 |
jrosser | I expect that could have a significant reduction in network and osd io for some cases | 18:14 |
noonedeadpunk | yeah, true | 18:23 |
jrosser | and the write cache being able to apply to a specific pool would be perhaps ok | 18:25 |
jrosser | somehow that feels like it might be an actual solution to having something that feels like local storage, on a different volume type | 18:26 |
noonedeadpunk | yep, that is true | 18:27 |
noonedeadpunk | But I guess you should also then somehow prevent evacuating VMs that are on this pool | 18:27 |
noonedeadpunk | as if it's just a compute that's caught a kernel panic or smth - it's way better to wait for it rather than repair all the filesystems | 18:28 |
noonedeadpunk | or databases | 18:28 |
noonedeadpunk | And I still have scars from the same folks pushing for the tiering cache, which ended up the way all cache tiering ends | 18:31 |
noonedeadpunk | so no trust in write cache + ceph combo | 18:31 |
noonedeadpunk | but yes, in tests fio has shown just the same results for iops/throughput as for the same ssd being used as a local drive | 18:33 |
moha7 | "Flat type for provider network is not recommended for production"; Why? This is the question that the network team asked me. | 20:04 |
moha7 | Compared to the vlan type, is it only because of future expansion issues that you suggest the network should not be set up as a flat type, or are there other reasons? | 20:06 |
jrosser | moha7: to add an extra flat type network later is a massive job with a lot of config changes on all hosts and in OSA config, and then in neutron. some reasonable risk involved. | 20:19 |
jrosser | moha7: an extra vlan type network is: 1) your network team enables that vlan on the trunk 2) you issue an openstack cli command to create the new neutron network, job done | 20:20 |
jrosser | there is no redeployment and no config changes anywhere, no services restarted or adjustment of physical interfaces in your hosts | 20:20 |
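To illustrate step 2, a hypothetical task using the openstack.cloud collection, equivalent to a single `openstack network create` command; the physnet name and VLAN ID are placeholders:

```yaml
- hosts: localhost
  tasks:
    - name: New vlan provider network -- no host or OSA config changes required
      openstack.cloud.network:
        cloud: mycloud
        name: vlan-200-net
        provider_network_type: vlan
        provider_physical_network: physnet1   # must already be in the hosts' provider mappings
        provider_segmentation_id: 200
        shared: true
```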
moha7 | +1 | 20:21 |
moha7 | In the stage env, our controllers and compute servers have two 2-port 10G network cards and one 4-port 1G network card. Here's the design I'm going to ask for from the network team: | 20:22 |
moha7 | 10G+10G -> bond0 ---> for Ceph | 20:22 |
moha7 | 2nd 10G ports -> bond1 ---> for the self-service network (br-vxlan) | 20:22 |
moha7 | Provider network (br-vlan): one 1G port | 20:22 |
moha7 | Management network (br-mgmt): one 1G port | 20:23 |
jrosser | you still need some vlans | 20:23 |
moha7 | API (external_haproxy_lb): one 1G port | 20:23 |
moha7 | Log (ELK): one 1G port | 20:23 |
moha7 | I'm going to fit each br-x with an interface | 20:24 |
moha7 | Does the above plan look reasonable? | 20:25 |
jrosser | mapping br-<x> to a physical interface is ok perhaps | 20:27 |
jrosser | but what will you do for octavia when osa wants br-lbaas or ironic wanting br-bmaas on the controllers? | 20:28 |
moha7 | You mean if we use vlans, we will handle new features more easily in the future, right? Hmm, yeah | 20:31 |
moha7 | Is it the right decision that I am giving the tenant network (br-vxlan) much more bandwidth than the external network? (20G vs 1G) | 20:34 |
jrosser | well that entirely depends on your workload requirements | 20:35 |
jrosser | I think also it is unbalanced to have bonds for some things and not for br-mgmt, as that is critical to the whole control plane | 20:36 |
jrosser | anywhere you only have one port and not a bind, consider the impact of the network team doing a firmware upgrade: your links are down at that time | 20:37 |
jrosser | *bond | 20:37 |
jrosser | then you want your out-of-band/ipmi etc dedicated port (shared ports can be difficult) | 20:40 |
moha7 | it's true. For the production environment, all networks will be redundant. We currently have a limited number of cards in stock. These cards will be paired as soon as the server team provides them. | 20:40 |
jrosser | and also consider which of these many interfaces is the one you pxeboot from and connect with the deploy host ssh | 20:40 |
jrosser | personally I would have fewer individual ports, bundle stuff together on a trunk more and use the 1G port for deployment / monitoring | 20:42 |
jrosser | there’s no user impact if a non redundant port for you to ssh in on goes down | 20:42 |
moha7 | That means, as the deploy host speaks to the nodes on the mgmt range, the br-mgmt interface should be set as the pxeboot interface, right? | 20:43 |
jrosser | and you can absolutely have a different setup on the controller vs the computes | 20:43 |
jrosser | as the requirements might well be different | 20:43 |
jrosser | only if you want it to :) | 20:44 |
jrosser | I have a separate interface for deploy/ssh/pxeboot/monitoring that's not redundant, but that's just suiting my environment | 20:44 |
jrosser | and the mgmt network is elsewhere on a bond | 20:45 |
jrosser | but really, keep it as simple as possible whilst meeting your requirements for security, HA, maintainability, flexibility, performance etc….. | 20:50 |
jrosser | I know that’s very easy to say, and reality is it’s a tough problem with many aspects to consider | 20:51 |
*** dviroel is now known as dviroel|out | 20:58 | |
moha7 | Thank you jrosser for help and all the tips | 21:06 |