Thursday, 2023-01-26

opendevreviewMerged openstack/ansible-role-systemd_service stable/zed: Ensure daemon is reloaded on socket change  https://review.opendev.org/c/openstack/ansible-role-systemd_service/+/87175100:21
noonedeadpunkanother vote would be preferable for https://review.opendev.org/c/openstack/ansible-role-systemd_service/+/87175208:25
noonedeadpunkGoing to do another bump today as all patches for OSSA-2023-002 have landed08:26
dokeeffe85Morning, could anyone give me a pointer as to what I should look for after this: https://paste.openstack.org/show/bpq4ew3P61rfwp5OuqgS/ I thought I had installed octavia properly but following the cookbook guide I get that09:13
noonedeadpunkdokeeffe85: have you run haproxy-install.yml after adding octavia to the inventory?09:15
dokeeffe85ah noonedeadpunk, sorry my bad. I'll do that now09:18
dokeeffe85Too early for this :) Thanks noonedeadpunk that did it09:23
noonedeadpunksweet )09:23
noonedeadpunkI usually also forget to run the haproxy role to be frank when adding new services, so it's a common thing to forget :)09:23
noonedeadpunkthankfully, that error you've pasted is quite explicit about what is wrong ;)09:24
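
For context, the fix above is the usual OSA pattern: declare the new service hosts in the inventory, then re-run the haproxy play so a frontend/backend for the Octavia API exists before the service play runs. A minimal sketch, with host names and IPs that are purely illustrative and not taken from this log:

    # openstack_user_config.yml - illustrative hosts/IPs
    octavia-infra_hosts:
      infra1:
        ip: 172.29.236.11
      infra2:
        ip: 172.29.236.12
      infra3:
        ip: 172.29.236.13

    # then, roughly:
    #   openstack-ansible haproxy-install.yml
    #   openstack-ansible os-octavia-install.yml
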
noonedeadpunkjrosser: andrewbonney: it would be great if you could check this bug: https://bugs.launchpad.net/openstack-ansible/+bug/2003921 as I'm quite far from using proxies...11:24
*** dviroel|out is now known as dviroel11:32
*** dviroel_ is now known as dviroel11:45
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-ceph_client master: Improve regexp for fetching nova secret from files  https://review.opendev.org/c/openstack/openstack-ansible-ceph_client/+/87181912:49
jrossernoonedeadpunk: i can look at that, but probably not today - but we do have a proxy CI job which is working so i'm not sure what is going on there12:59
ierdemHi, when I try to cold migrate VMs via cli by specifying the destination host, it throws an exception after the first migration: "No valid host was found" -no more details, just this message-, but the destination host has enough resources. Does nova-scheduler cause this? If so, how can I force it to migrate more than one VM to the same host? Thanks for all your assistance. (I have kolla-ansible stein-eol)13:25
noonedeadpunkjrosser: yeah, I guess no rush here as even a workaround was proposed. It's just hard for me to evaluate what they've run into and why that workaround is needed. I assume it might be during debootstrap. As I had to override lxc_hosts_container_build_command for example, to include the path to gpg 13:33
noonedeadpunkbut I have a fully isolated env, and have no idea what could be needed for a proxy there 13:33
mgariepydon't you need to set the global_environment_variables with the various proxies?13:37
mgariepyhttps://github.com/openstack/openstack-ansible-openstack_hosts/blob/master/templates/environment.j213:38
noonedeadpunkwell, there's an alternative approach as well due to pam.d limitations on no_proxy length13:38
noonedeadpunkor smth like that13:38
noonedeadpunkso you can provide some vars to do the same but during runtime only13:38
noonedeadpunksorry, my real knowledge is quite abstract as I never had to run that kind of setup13:39
mgariepyif the network is restricted you will need the proxy for mostly every command. you need to have it configured to wget stuff as well.13:39
noonedeadpunkhere's the doc https://docs.openstack.org/openstack-ansible/latest/user/limited-connectivity/index.html#other-proxy-configuration13:41
noonedeadpunk`deployment_environment_variables`13:42
mgariepyno_proxy length 1024 .. oof13:43
mgariepyi'm at 1006 LOL13:43
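
For reference, the two mechanisms mentioned above look roughly like this in user_variables.yml (proxy URL, addresses and values are illustrative). global_environment_variables ends up persisted in /etc/environment on the hosts, which is where the pam_env line-length limit on no_proxy bites; deployment_environment_variables is only exported for the duration of the Ansible run:

    # user_variables.yml - illustrative values only
    # persisted into /etc/environment on all hosts
    global_environment_variables:
      http_proxy: "http://proxy.example.com:3128"
      https_proxy: "http://proxy.example.com:3128"
      no_proxy: "localhost,127.0.0.1,172.29.236.101"

    # exported only while the deployment tasks run
    deployment_environment_variables:
      http_proxy: "http://proxy.example.com:3128"
      https_proxy: "http://proxy.example.com:3128"
      no_proxy: "localhost,127.0.0.1,{{ internal_lb_vip_address }},{{ external_lb_vip_address }}"
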
dokeeffe85Hi all, sorry to bother you again. I installed octavia and ran haproxy as noonedeadpunk pointed out this morning but I cannot create a loadbalancer successfully. https://paste.openstack.org/show/bAM1QuK7DZGgj1wQtvjS/14:41
noonedeadpunkwell, it looks like it's a neutron issue in the first place?14:43
noonedeadpunkAre you able to spawn any other VM from that network? Or attach port from it to some VM?14:43
dokeeffe85Let me try14:44
noonedeadpunkAlso worth checking logs for neutron agent on compute14:44
dokeeffe85Will do that too14:44
noonedeadpunkAlso `physnet2` should not be used as a flat network anywhere14:45
noonedeadpunkIIRC14:45
noonedeadpunkas then it will be part of the bridge and neutron might fail to manage it14:45
anskiyhello! Is it possible to have some of the services in a group (I'm trying to do this with the `storage_hosts` group) deployed in LXC and the others directly on metal?15:15
noonedeadpunksure, it's possible15:16
noonedeadpunkanskiy: https://docs.openstack.org/openstack-ansible/latest/reference/inventory/configure-inventory.html#deploying-directly-on-hosts15:17
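
A rough sketch of what the linked document describes, with made-up host names: most of the group can stay in LXC while individual hosts are flipped to bare metal with no_containers, or a single component can be forced onto metal everywhere via an env.d override:

    # openstack_user_config.yml - illustrative
    storage_hosts:
      storage1:                # services for this host run in LXC containers
        ip: 172.29.236.21
      storage2:                # everything on this host runs directly on metal
        ip: 172.29.236.22
        no_containers: true

    # /etc/openstack_deploy/env.d/cinder.yml - alternative, per-component override
    container_skel:
      cinder_volumes_container:
        properties:
          is_metal: true
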
spatelnoonedeadpunk are you guys using cinder-backup service?  what provider do you use?15:20
spatelNFS or ceph ?15:20
noonedeadpunkwe don't right now, but I'd use S3/swift15:21
spatelWe have very small ceph storage and it doesn't have an S3 service15:22
spatelit's a 3-node ceph 15:22
spatelto run S3 I need an S3 gateway on a dedicated node, correct?15:22
spatelThis is what i configured to point cinder-backup to ceph - https://paste.opendev.org/show/bPKkDsAxiRdcNll2WSuR/ 15:24
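
The paste itself is not reproduced here, but a cinder-backup-to-ceph setup in OSA is commonly expressed as cinder.conf overrides along these lines; treat this as a hedged sketch, with the pool and cephx user names being assumptions rather than anything from the paste:

    # user_variables.yml - illustrative sketch, not the contents of the paste
    cinder_service_backup_program_enabled: true   # os_cinder role toggle for the backup service
    cinder_cinder_conf_overrides:
      DEFAULT:
        backup_driver: cinder.backup.drivers.ceph.CephBackupDriver
        backup_ceph_conf: /etc/ceph/ceph.conf
        backup_ceph_user: cinder-backup
        backup_ceph_pool: backups
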
opendevreviewMerged openstack/ansible-role-zookeeper stable/zed: Add configuration option for native Prometheus exporter  https://review.opendev.org/c/openstack/ansible-role-zookeeper/+/87175315:28
opendevreviewMerged openstack/ansible-role-systemd_service stable/yoga: Ensure daemon is reloaded on socket change  https://review.opendev.org/c/openstack/ansible-role-systemd_service/+/87175215:53
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible stable/zed: Bump OSA for stable/zed to cover CVE-2022-47951  https://review.opendev.org/c/openstack/openstack-ansible/+/87183016:03
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible stable/yoga: Bump OSA for stable/yoga to cover CVE-2022-47951  https://review.opendev.org/c/openstack/openstack-ansible/+/87183416:41
noonedeadpunkspatel: well, I'd say everything that can do incremental backups is good enough. But I definitely would avoid nfs17:20
spatelHmm! 17:20
spatelDo you guys create VMs with cinder volumes? or let me ask this way.. what is the best approach here? 17:21
noonedeadpunkYes, we're moving from ephemerals to cinder volumes at the moment17:36
noonedeadpunkI'm not sure about the reasons why you might want to use ephemerals, rather than keeping them for local storage17:37
noonedeadpunkie, get a couple of drives in raid and use them for CI runners17:37
noonedeadpunkAs with nova handling the drives you need to have so many more flavors to cover demand17:38
noonedeadpunkfor actually no good reason imo17:38
jrosseri wish this was all more transparent to mix17:39
jrosserlike mixing CI runners with local storage and other VMs with BFV seems unnecessarily hard17:40
spatelnoonedeadpunk agreed to use volume-based VMs17:41
spatelBut how do flavor disk size and volume size work together? 17:41
spateldoes the volume override the flavor disk size?17:41
noonedeadpunkyes it does17:43
noonedeadpunkbut eventually you should just have a disk size of 0 in the flavor for bfv17:43
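
A rough illustration of that point about flavors (names, sizes and the use of the openstack.cloud Ansible collection are all assumptions): the boot-from-volume flavor carries 0 GB of root disk, and the volume size is picked per server at boot time, so one flavor covers many disk sizes:

    # illustrative only - assumes the openstack.cloud collection is installed
    - name: Create a disk-less flavor for boot-from-volume instances
      openstack.cloud.compute_flavor:
        cloud: mycloud
        name: m1.bfv.medium
        vcpus: 4
        ram: 8192
        disk: 0          # root disk comes from the cinder volume, not the flavor

    - name: Boot a server from a new volume sized independently of the flavor
      openstack.cloud.server:
        cloud: mycloud
        name: bfv-test
        flavor: m1.bfv.medium
        image: ubuntu-22.04
        network: private
        boot_from_volume: true
        volume_size: 80  # GB, chosen per server rather than baked into the flavor
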
noonedeadpunkjrosser: well... you can spawn cinder-volume on each compute I guess :D17:43
noonedeadpunknot sure it's easier or better though....17:44
spatelif you run cinder-volume with local storage, what happens if the server dies - you will have an outage, correct?17:47
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible stable/xena: Bump OSA for stable/zed to cover CVE-2022-47951  https://review.opendev.org/c/openstack/openstack-ansible/+/87183917:47
noonedeadpunkwith local storage you will have an outage, yes. But well. It's what ephemeral literally means 17:48
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible stable/xena: Bump OSA for stable/xena to cover CVE-2022-47951  https://review.opendev.org/c/openstack/openstack-ansible/+/87183917:48
spatelBut if you run nova+ceph then you will have the option to migrate VMs17:49
noonedeadpunkI think these things are for different purposes17:49
spatelnova+ceph+ephemeral i meant  (without volume)17:49
noonedeadpunkAs you won't get low-latency with ceph17:49
spatelYes.. that is what I am doing, all local storage without ceph 17:50
spatelI am exploring another option for a new cloud: to run all VMs on central ceph 17:50
spatelThis cloud is for a different purpose 17:50
spatelI want backups of volumes also 17:50
spatelThat is why I am testing the cinder-backup option to understand what will fit best here17:51
noonedeadpunkOr well. You can, if you do caching to a local nvme, so ceph will consider the write committed and applied once it gets written to the local nvme, and then it will move the data asynchronously to the OSDs 17:52
noonedeadpunkbut well, that has quite the same drawbacks as a local drive imo...17:52
spatellocal nvme for caching in ceph? 17:53
noonedeadpunkyeah....17:53
spatelnever heard of that17:53
spatelHow does it work?17:53
noonedeadpunkIf I'm not mistaken, I'm talking about https://docs.ceph.com/en/octopus/rbd/rbd-persistent-cache/17:55
noonedeadpunkbut likely I am mistaken and it's smth different17:57
spatelHmm.. so local compute node disk (nvme) will be part of ceph in that case ?17:57
spatelIt doesn't make sense :(17:57
spatelLooks like they are talking about the ceph cache tiering method 17:58
noonedeadpunksorry, it's https://docs.ceph.com/en/quincy/rbd/rbd-persistent-write-log-cache/17:58
spatelIt's deprecated and its use is discouraged.17:58
noonedeadpunkbut yes. you're using a local drive on the compute node for caching writes to "improve" latency17:59
noonedeadpunkas ceph will ack the commit once data gets to the local drive17:59
noonedeadpunkso basically you get the latency and throughput of a local disk with ceph18:00
noonedeadpunkbut yes, with quite a risk of getting data corrupted in case of the compute going down or a disk failure...18:01
noonedeadpunkBut it might be better than just a local disk.... But again, I don't really trust this thing18:01
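
For reference, the client-side options behind the feature discussed above look roughly like the following, based on the linked quincy documentation; the cache path and size are illustrative, and expressing them as a ceph_conf_overrides block is just the ceph-ansible convention, not necessarily how this deployment manages ceph.conf:

    # sketch only - [client] options for the RBD persistent write-log cache
    ceph_conf_overrides:
      client:
        rbd_plugins: pwl_cache
        rbd_persistent_cache_mode: ssd                       # write-log cache on a local SSD/NVMe
        rbd_persistent_cache_path: /var/lib/ceph/pwl-cache   # local NVMe mount, illustrative
        rbd_persistent_cache_size: 10G
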
spatelagreed! 18:02
noonedeadpunkthough our storage folks are eager to get it running asap :D18:02
spatelThis kind of solution looks fancy but comes with a troubleshooting cost18:02
spatelit makes the operator's life hell :)18:02
spatelshare your case study with us :)18:03
noonedeadpunkwell, while testing in the sandbox, things look amazing18:04
noonedeadpunkBut I really don't want this to go out of the sandbox to be frank18:04
spatelIt works great because it's a single machine, but when it comes down to multiple nodes it would be interesting to see 18:05
spatelbut idea sounds great 18:05
spatelworth testing out18:06
noonedeadpunkWell, I don't think things go south with more nodes, given that you still maintain throughput and latency on the ceph side and scale in time18:08
noonedeadpunkBut in case of incidents I'm not sure what's gonna happen with data that was not transferred18:09
noonedeadpunkLike compute failure18:09
noonedeadpunkanyway18:10
jrosserread cache looks nice though18:13
jrosserI expect that could have a significant reduction in network and osd io for some cases18:14
noonedeadpunkyeah, true18:23
jrosserand the write cache being able to apply to a specific pool would be perhaps ok18:25
jrossersomehow that feels like it might be an actual solution to having something that feels like local storage, on a different volume type18:26
noonedeadpunkyep, that is true18:27
noonedeadpunkBut I guess you should also then somehow prevent evacuating VMs that are on this pool18:27
noonedeadpunkas if it's just a compute that's caught a kernel panic or smth - it's way better to wait for it rather than repair all the filesystems18:28
noonedeadpunkor databases18:28
noonedeadpunkAnd I still have scars from the same folks pushing for a tiering cache, which ended up the way all cache tiering ends18:31
noonedeadpunkso no trust in the write cache + ceph combo18:31
noonedeadpunkbut yes, in tests fio has shown just the same results for iops/throughput as for the same ssd being used as a local drive18:33
moha7"Flat type for provider network is not recommended for production"; Why? This is the question that the network team asked me.20:04
moha7Compared to the vlan type, is it only because of future expansion issues that you suggest the network should not be set up as flat, or are there other reasons?20:06
jrossermoha7: adding an extra flat type network later is a massive job with a lot of config changes on all hosts and in the OSA config, and then in neutron. some reasonable risk involved.20:19
jrossermoha7: an extra vlan type network is 1) your network team enables that vlan on the trunk 2) you issue an openstack cli command to create the new neutron network, job done20:20
jrosserthere is no redeployment and no config changes anywhere, no services restarted or adjustment of physical interfaces in your hosts20:20
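
The reason the vlan case is this cheap is that the physical network and its vlan range only have to be declared once in the OSA provider_networks config; after that, every new network within the range is a single `openstack network create` call with the vlan type and a segmentation id. A rough sketch of such an entry, with bridge, interface, range and agent binding values that are illustrative:

    # openstack_user_config.yml - illustrative values
    global_overrides:
      provider_networks:
        - network:
            container_bridge: br-vlan
            container_type: veth
            container_interface: eth11
            type: vlan
            range: "101:200"      # any vlan id in this range can become a neutron network
            net_name: physnet1
            group_binds:
              - neutron_linuxbridge_agent
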
moha7+120:21
moha7In the stage env, our controllers and compute servers have two 2-port 10G network cards and one 4-port 1G network card. Here's the design I'm going to ask for from the network team:20:22
moha710G+10G -> bond0 ---> for Ceph20:22
moha72nd pair of 10G ports -> bond1 ---> for the self-service network (br-vxlan)20:22
moha7Provider network (br-vlan): one 1G port20:22
moha7Management network (br-mgmt): one 1G port20:23
jrosseryou still need some vlans20:23
moha7API (external_haproxy_lb): one 1G port20:23
moha7Log (ELK): one 1G port20:23
moha7I'm going to give each br-x its own interface20:24
moha7Does the above plan look reasonable?20:25
jrossermapping br-<x> to a physical interface is ok perhaps20:27
jrosserbut what will you do for octavia when osa wants br-lbaas, or for ironic wanting br-bmaas on the controllers?20:28
moha7You mean if we use vlans, we will handle new features more easily in the future, right? Hmm, yeah20:31
moha7Is it the right decision that I am giving the tenant network (br-vxlan) much more bandwidth than the external network? (20G vs 1G)20:34
jrosserwell that entirely depends on your workload requirements20:35
jrosserI think also it is unbalanced to have bonds for some things and not for br-mgmt, as that is critical to the whole control plane20:36
jrosseranywhere you only have one port and not a bond, consider the impact of the network team doing a firmware upgrade - your links are down during that time20:37
jrosserthen you want your out-of-band/ipmi etc dedicated port (shared ports can be difficult)20:40
moha7it's true. For the production environment, all networks will be redundant. We currently have a limited number of cards in stock. These cards will be paired as soon as the server team provides them.20:40
jrosserand also consider which of these many interfaces is the one you pxeboot from and connect to with the deploy host over ssh20:40
jrosserpersonally I would have fewer individual ports, bundle stuff together on a trunk more and use the 1G port for deployment / monitoring20:42
jrosserthere's no user impact if a non-redundant port for you to ssh in on goes down20:42
moha7That means, as the deployer host speaks to the nodes on the mgmt range, the br-mgmt interface should be set as the pxeboot interface, right?20:43
jrosserand you can absolutely have a different setup on the controller vs the computes20:43
jrosseras the requirements might well be different20:43
jrosseronly if you want it to :)20:44
jrosserI have a separate interface for deploy/ssh/pxeboot/monitoring that's not redundant, but that's just suiting my environment20:44
jrosserand the mgmt network is elsewhere on a bond20:45
jrosserbut really, keep as simple as possible whilst meeting your requirements for security, HA, maintainability, flexibility, performance etc…..20:50
jrosserI know that's very easy to say, and the reality is it's a tough problem with many aspects to consider20:51
*** dviroel is now known as dviroel|out20:58
moha7Thank you jrosser for help and all the tips21:06
