opendevreview | Merged openstack/ansible-role-systemd_service stable/zed: Ensure daemon is reloaded on socket change https://review.opendev.org/c/openstack/ansible-role-systemd_service/+/871751 | 00:21 |
noonedeadpunk | another vote would be preferable for https://review.opendev.org/c/openstack/ansible-role-systemd_service/+/871752 | 08:25 |
noonedeadpunk | Going to do another bump today as all patches for OSSA-2023-002 have landed | 08:26 |
dokeeffe85 | Morning, could anyone give me a pointer as to what I should look for after this: https://paste.openstack.org/show/bpq4ew3P61rfwp5OuqgS/ I thought I had installed octavia properly but following the cookbook guide I get that | 09:13 |
noonedeadpunk | dokeeffe85: have you run haproxy-install.yml after adding octavia to the inventory? | 09:15 |
dokeeffe85 | ah noonedeadpunk, sorry my bad. I'll do that now | 09:18 |
dokeeffe85 | Too early for this :) Thanks noonedeadpunk that did it | 09:23 |
noonedeadpunk | sweet ) | 09:23 |
noonedeadpunk | I usually also forget to run the haproxy role when adding new services, to be frank, so it's a common thing to forget :) | 09:23 |
noonedeadpunk | thankfully, that error you've pasted is quite explicit about what is wrong ;) | 09:24 |
noonedeadpunk | jrosser: andrewbonney: it would be great if you could check this bug: https://bugs.launchpad.net/openstack-ansible/+bug/2003921 as I'm quite far from using proxies... | 11:24 |
*** dviroel|out is now known as dviroel | 11:32 | |
*** dviroel_ is now known as dviroel | 11:45 | |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-ceph_client master: Improve regexp for fetching nove secret from files https://review.opendev.org/c/openstack/openstack-ansible-ceph_client/+/871819 | 12:49 |
jrosser | noonedeadpunk: i can look at that but probably not today - but we do have a proxy CI job which is working, so i'm not sure what is going on there | 12:59 |
ierdem | Hi, when I try to cold migrate VMs via cli by specifying the destination host, it throws an exception after the first migrate: "No valid host was found" -no more details, just this message-, but the destination host has enough resources. Does nova-scheduler cause this? If so, how can I force it to migrate more than one VM to the same host? Thanks for all your assistance. (I have kolla-ansible stein-eol) | 13:25 |
noonedeadpunk | jrosser: yeah, I guess no rush here as even a workaround was proposed. It's just hard for me to evaluate what they've run into and why that workaround is needed. I assume it might be during debootstrap, as I had to override lxc_hosts_container_build_command, for example, to include the path to gpg | 13:33 |
noonedeadpunk | but I have a fully isolated env, and have no idea what might be needed for a proxy there | 13:33 |
mgariepy | don't you need to set the global_environment_variables with the various proxies? | 13:37 |
mgariepy | https://github.com/openstack/openstack-ansible-openstack_hosts/blob/master/templates/environment.j2 | 13:38 |
noonedeadpunk | well, there's an alternative approach as well, due to pam.d limitations on no_proxy length | 13:38 |
noonedeadpunk | or smth like that | 13:38 |
noonedeadpunk | so you can provide some vars that do the same, but during runtime only | 13:38 |
noonedeadpunk | sorry, my real knowledge is quite abstract as I've never had to run that kind of setup | 13:39 |
mgariepy | if the network is restricted you will mostly need to have the proxy for every command. you need to have it configured to wget stuff as well. | 13:39 |
noonedeadpunk | here's the doc https://docs.openstack.org/openstack-ansible/latest/user/limited-connectivity/index.html#other-proxy-configuration | 13:41 |
noonedeadpunk | `deployment_environment_variables` | 13:42 |
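For reference, the doc linked above describes both variables; a minimal sketch of what this might look like in `user_variables.yml` (the proxy address and no_proxy contents are placeholders):

```yaml
# Persistent variant, written to /etc/environment on every host (subject to the
# pam_env line-length limit on long no_proxy values):
# global_environment_variables:
#   http_proxy: "http://proxy.example.com:3128"
#   https_proxy: "http://proxy.example.com:3128"
#   no_proxy: "localhost,127.0.0.1,{{ internal_lb_vip_address }},{{ external_lb_vip_address }}"

# Runtime-only variant, exported just while the Ansible tasks run:
deployment_environment_variables:
  http_proxy: "http://proxy.example.com:3128"
  https_proxy: "http://proxy.example.com:3128"
  no_proxy: "localhost,127.0.0.1,{{ internal_lb_vip_address }},{{ external_lb_vip_address }}"
```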
mgariepy | no_proxy length 1024 .. oof | 13:43 |
mgariepy | i'm at 1006 LOL | 13:43 |
dokeeffe85 | Hi all, sorry to bother you again. I installed octavia and ran haproxy as noonedeadpunk pointed out this morning but I cannot create a loadbalancer successfully. https://paste.openstack.org/show/bAM1QuK7DZGgj1wQtvjS/ | 14:41 |
noonedeadpunk | well, it looks like it's a neutron issue in the first place? | 14:43 |
noonedeadpunk | Are you able to spawn any other VM from that network? Or attach port from it to some VM? | 14:43 |
dokeeffe85 | Let me try | 14:44 |
noonedeadpunk | Also worth checking logs for neutron agent on compute | 14:44 |
dokeeffe85 | Will do that too | 14:44 |
noonedeadpunk | Also `physnet2` should not be used as a flat network anywhere | 14:45 |
noonedeadpunk | IIRC | 14:45 |
noonedeadpunk | as then it will be part of the bridge and neutron might fail to manage it | 14:45 |
anskiy | hello! Is it possible to have some of the services in a group (I'm trying to do this with the `storage_hosts` group) deployed in LXC and the others directly on metal? | 15:15 |
noonedeadpunk | sure, it's possible | 15:16 |
noonedeadpunk | anskiy: https://docs.openstack.org/openstack-ansible/latest/reference/inventory/configure-inventory.html#deploying-directly-on-hosts | 15:17 |
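A minimal sketch of the per-host option from that doc, which lets hosts in the same group mix LXC and metal deployments; host names and IPs are placeholders, and the exact group name depends on your config:

```yaml
# openstack_user_config.yml
storage_hosts:
  storage1:
    ip: 172.29.236.111        # services for this host are deployed as usual (LXC where applicable)
  storage2:
    ip: 172.29.236.112
    no_containers: true       # everything assigned to this host runs directly on metal
```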
spatel | noonedeadpunk are you guys using cinder-backup service? what provider do you use? | 15:20 |
spatel | NFS or ceph ? | 15:20 |
noonedeadpunk | we don't now, but I'd use S3/swift | 15:21 |
spatel | We have a very small ceph cluster and it doesn't have an S3 service | 15:22 |
spatel | it's a 3-node ceph | 15:22 |
spatel | to run S3 i need an S3 gateway on a dedicated node, correct? | 15:22 |
spatel | This is what i configured to point cinder-backup to ceph - https://paste.opendev.org/show/bPKkDsAxiRdcNll2WSuR/ | 15:24 |
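That paste is not reproduced here; as a rough sketch, pointing cinder-backup at ceph through OSA overrides usually looks something like the following in `user_variables.yml`. The `cinder_service_backup_program_enabled` name is assumed from the os_cinder role, and the pool/user names are placeholders; the `backup_*` keys are standard cinder Ceph backup driver options:

```yaml
cinder_service_backup_program_enabled: true    # assumed toggle for deploying the cinder-backup service
cinder_cinder_conf_overrides:
  DEFAULT:
    backup_driver: cinder.backup.drivers.ceph.CephBackupDriver
    backup_ceph_conf: /etc/ceph/ceph.conf
    backup_ceph_user: cinder-backup             # placeholder cephx user
    backup_ceph_pool: backups                   # placeholder pool name
```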
opendevreview | Merged openstack/ansible-role-zookeeper stable/zed: Add configuration option for native Prometheus exporter https://review.opendev.org/c/openstack/ansible-role-zookeeper/+/871753 | 15:28 |
opendevreview | Merged openstack/ansible-role-systemd_service stable/yoga: Ensure daemon is reloaded on socket change https://review.opendev.org/c/openstack/ansible-role-systemd_service/+/871752 | 15:53 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/zed: Bump OSA for stable/zed to cover CVE-2022-47951 https://review.opendev.org/c/openstack/openstack-ansible/+/871830 | 16:03 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/yoga: Bump OSA for stable/yoga to cover CVE-2022-47951 https://review.opendev.org/c/openstack/openstack-ansible/+/871834 | 16:41 |
noonedeadpunk | spatel: well, I'd say everything that can do incremental backups is good enough. But I definitely would avoid NFS | 17:20 |
spatel | Hmm! | 17:20 |
spatel | Do you guys create VMs with cinder volumes? or let me ask this way.. what is the best approach here? | 17:21 |
noonedeadpunk | Yes, we're moving from ephemerals to cinder volumes at the moment | 17:36 |
noonedeadpunk | I'm not sure about the reasons why you might want to use ephemerals, other than keeping them for local storage | 17:37 |
noonedeadpunk | ie, get a couple of drives in raid and utilize them for CI runners | 17:37 |
noonedeadpunk | As with nova handling the drives you need to have so many more flavors to cover demand | 17:38 |
noonedeadpunk | for actually no good reason imo | 17:38 |
jrosser | i wish this was all more transparent to mix | 17:39 |
jrosser | like making CI runners with local storage and other VMs with BFV seems unnecessarily hard | 17:40 |
jrosser | making/mixing | 17:40 |
spatel | noonedeadpunk agreed on using volume-based VMs | 17:41 |
spatel | But how do the flavor disk size and the volume size work together? | 17:41 |
spatel | does volume override flavor disk size? | 17:41 |
noonedeadpunk | yes it does | 17:43 |
noonedeadpunk | but eventually you should just have a disk size of 0 in the flavor for bfv | 17:43 |
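To illustrate the point about flavors for boot-from-volume, a hypothetical snippet using the openstack.cloud Ansible collection; cloud, flavor, image and network names are placeholders:

```yaml
- hosts: localhost
  tasks:
    - name: Flavor with no root disk, intended for boot-from-volume instances
      openstack.cloud.compute_flavor:
        cloud: mycloud
        name: m1.bfv.medium
        vcpus: 2
        ram: 4096
        disk: 0                       # root disk size comes from the volume, not the flavor

    - name: Boot from a new 40 GB volume; the volume size is what counts here
      openstack.cloud.server:
        cloud: mycloud
        name: bfv-test
        flavor: m1.bfv.medium
        image: ubuntu-22.04
        boot_from_volume: true
        volume_size: 40
        network: private
```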
noonedeadpunk | jrosser: well... you can spawn cinder-volume on each compute I guess :D | 17:43 |
noonedeadpunk | not sure it's easier or better though.... | 17:44 |
spatel | if you run cinder-volume with local storage, what happens if the server dies - you will have an outage, correct? | 17:47 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/xena: Bump OSA for stable/zed to cover CVE-2022-47951 https://review.opendev.org/c/openstack/openstack-ansible/+/871839 | 17:47 |
noonedeadpunk | with local storage you will have an outage, yes. But well, it's what ephemeral literally means | 17:48 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/xena: Bump OSA for stable/xena to cover CVE-2022-47951 https://review.opendev.org/c/openstack/openstack-ansible/+/871839 | 17:48 |
spatel | But if you run nova+ceph then you will have the option to migrate VMs | 17:49 |
noonedeadpunk | I think these things are for different purposes | 17:49 |
spatel | nova+ceph+ephemeral i meant (without volume) | 17:49 |
noonedeadpunk | As you won't get low-latency with ceph | 17:49 |
spatel | Yes.. that is what i am doing: all local storage without ceph | 17:50 |
spatel | I am exploring other options for a new cloud, to run all VMs on central ceph | 17:50 |
spatel | This cloud is for a different purpose | 17:50 |
spatel | I want backups of volumes also | 17:50 |
spatel | That is why I'm testing the cinder-backup option, to understand what will fit best here | 17:51 |
noonedeadpunk | Or well. You can, if you do caching to a local nvme, so ceph will consider the write committed and applied once it gets written to the local nvme, and then it will move the data asynchronously to the OSDs | 17:52 |
noonedeadpunk | but well, that has quite the same drawbacks as a local drive imo... | 17:52 |
spatel | local nvme for caching in ceph? | 17:53 |
noonedeadpunk | yeah.... | 17:53 |
spatel | never heard of that | 17:53 |
spatel | How does it work? | 17:53 |
noonedeadpunk | If I'm not mistaken, I'm talking about https://docs.ceph.com/en/octopus/rbd/rbd-persistent-cache/ | 17:55 |
noonedeadpunk | but likely I am mistaken and it's smth different | 17:57 |
spatel | Hmm.. so local compute node disk (nvme) will be part of ceph in that case ? | 17:57 |
spatel | It doesn't make sense :( | 17:57 |
spatel | Looks like they are talking about the ceph cache tiering method | 17:58 |
noonedeadpunk | sorry, it's https://docs.ceph.com/en/quincy/rbd/rbd-persistent-write-log-cache/ | 17:58 |
spatel | It's deprecated and its use is discouraged. | 17:58 |
noonedeadpunk | but yes. you're using local drive on compute node for caching writes to "improve" latency | 17:59 |
noonedeadpunk | as ceph will ack commit once data gets to local drive | 17:59 |
noonedeadpunk | so basically you get latency and throughput of local disk with ceph | 18:00 |
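For reference, a sketch of the client-side options from the doc linked above, expressed here as ceph-ansible's `ceph_conf_overrides` (if ceph.conf is managed some other way, the same keys go into the `[client]` section); the cache path and size are placeholders:

```yaml
ceph_conf_overrides:
  client:
    rbd_plugins: pwl_cache
    rbd_persistent_cache_mode: ssd                        # "rwl" is the PMEM variant
    rbd_persistent_cache_path: /var/lib/ceph/pwl-cache    # directory on the compute node's local NVMe
    rbd_persistent_cache_size: 10737418240                # 10 GiB
```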
noonedeadpunk | but yes, with quite a risk of getting data corrupted in case of the compute going down or a disk failure... | 18:01 |
noonedeadpunk | But it might be better than just a local disk.... But again, I don't really trust this thing | 18:01 |
spatel | agreed! | 18:02 |
noonedeadpunk | though our storage folks are eager to get it running asap :D | 18:02 |
spatel | This kind of solution looks fancy but comes with troubleshooting cost | 18:02 |
spatel | it makes operator life hell :) | 18:02 |
spatel | share your case study with us :) | 18:03 |
noonedeadpunk | well, while testing in the sandbox, things look amazing | 18:04 |
noonedeadpunk | But I really don't want this to go out of the sandbox to be frank | 18:04 |
spatel | It works great because it's a single machine, but when it comes down to multiple nodes it would be interesting to see | 18:05 |
spatel | but idea sounds great | 18:05 |
spatel | worth testing out | 18:06 |
noonedeadpunk | Well, I don't think things go south with more nodes, given that you still maintain throughput and latency on the ceph side and scale in time | 18:08 |
noonedeadpunk | But in case of incidents I'm not sure what's gonna happen with data that was not transferred | 18:09 |
noonedeadpunk | Like compute failure | 18:09 |
noonedeadpunk | anyway | 18:10 |
jrosser | read cache looks nice though | 18:13 |
jrosser | I expect that could have a significant reduction in network and osd io for some cases | 18:14 |
noonedeadpunk | yeah, true | 18:23 |
jrosser | and the write cache being able to apply to a specific pool would be perhaps ok | 18:25 |
jrosser | somehow that feels like it might be an actual solution to having something that feels like local storage, on a different volume type | 18:26 |
noonedeadpunk | yep, that is true | 18:27 |
noonedeadpunk | But I guess you should also then somehow prevent evacuating VMs that are on this pool | 18:27 |
noonedeadpunk | as if it's just a compute that's caught a kernel panic or smth - it's way better to wait for it rather than repair all the filesystems | 18:28 |
noonedeadpunk | or databases | 18:28 |
noonedeadpunk | And I still have scars from the same folks pushing for the tiering cache, which ended up the way all cache tiering ends | 18:31 |
noonedeadpunk | so no trust in write cache + ceph combo | 18:31 |
noonedeadpunk | but yes, in tests fio has shown just the same results for iops/throughput as for the same ssd being used as a local drive | 18:33 |
moha7 | "Flat type for provider network is not recommended for production"; Why? This is the question that the network team asked me. | 20:04 |
moha7 | Compared to the vlan type, is it only because of future expansion issues that you suggest the network should not be set up as a flat type, or are there other reasons? | 20:06 |
jrosser | moha7: to add an extra flat type network later is a massive job with a lot of config changes on all hosts and in OSA config, and then in neutron. some reasonable risk involved. | 20:19 |
jrosser | moha7: an extra vlan type network is: 1) your network team enables that vlan on the trunk 2) you issue an openstack cli command to create the new neutron network, job done | 20:20 |
jrosser | there is no redeployment and no config changes anywhere, no services restarted or adjustment of physical interfaces in your hosts | 20:20 |
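To illustrate step 2, a hypothetical task using the openstack.cloud collection, equivalent to a single `openstack network create` command; the physnet name and VLAN ID are placeholders:

```yaml
- hosts: localhost
  tasks:
    - name: New vlan provider network -- no host or OSA config changes required
      openstack.cloud.network:
        cloud: mycloud
        name: vlan-200-net
        provider_network_type: vlan
        provider_physical_network: physnet1   # must already be in the hosts' provider mappings
        provider_segmentation_id: 200
        shared: true
```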
moha7 | +1 | 20:21 |
moha7 | In the stage env, our controllers and compute servers have two 2-port 10G network cards and one 4-port 1G network card. Here's the design I'm going to ask for from the network team: | 20:22 |
moha7 | 10G+10G -> bond0 ---> for Ceph | 20:22 |
moha7 | 2nd 10G ports -> bond1 ---> for the self-service network (br-vxlan) | 20:22 |
moha7 | Provider network (br-vlan): one 1G port | 20:22 |
moha7 | Management network (br-mgmt): one 1G port | 20:23 |
jrosser | you still need some vlans | 20:23 |
moha7 | API (external_haproxy_lb): one 1G port | 20:23 |
moha7 | Log (ELK): one 1G port | 20:23 |
moha7 | I'm going to fit each br-x with an interface | 20:24 |
moha7 | Does the above plan look reasonable? | 20:25 |
jrosser | mapping br-<x> to a physical interface is ok perhaps | 20:27 |
jrosser | but what will you do for octavia when osa wants br-lbaas or ironic wanting br-bmaas on the controllers? | 20:28 |
moha7 | You mean if we use vlans, we will handle new features more easily in the future, right? Hmm, yeah | 20:31 |
moha7 | Is it the right decision that I am giving the tenant network (br-vxlan) much more bandwidth than the external network? (20G vs 1G) | 20:34 |
jrosser | well that entirely depends on your workload requirements | 20:35 |
jrosser | I think also it is unbalanced to have bonds for some things and not for br-mgmt, as that is critical to the whole control plane | 20:36 |
jrosser | anywhere you only have one port and not a bind, consider the impact of the network team doing a firmware upgrade: your links are down at that time | 20:37 |
jrosser | *bond | 20:37 |
jrosser | then you want your out-of-band/ipmi etc dedicated port (shared ports can be difficult) | 20:40 |
moha7 | it's true. For the production environment, all networks will be redundant. We currently have a limited number of cards in stock. These cards will be paired as soon as the server team provides them. | 20:40 |
jrosser | and also consider which of these many interfaces is the one you pxeboot from and connect with the deploy host ssh | 20:40 |
jrosser | personally I would have fewer individual ports, bundle stuff together on a trunk more and use the 1G port for deployment / monitoring | 20:42 |
jrosser | there’s no user impact if a non redundant port for you to ssh in on goes down | 20:42 |
moha7 | That means, as the deploy host speaks to the nodes on the mgmt range, the br-mgmt interface should be set as the pxeboot interface, right? | 20:43 |
jrosser | and you can absolutely have a different setup on the controller vs the computes | 20:43 |
jrosser | as the requirements might well be different | 20:43 |
jrosser | only if you want it to :) | 20:44 |
jrosser | I have a separate interface for deploy/ssh/pxeboot/monitoring that's not redundant, but that's just suiting my environment | 20:44 |
jrosser | and the mgmt network is elsewhere on a bond | 20:45 |
jrosser | but really, keep it as simple as possible whilst meeting your requirements for security, HA, maintainability, flexibility, performance etc….. | 20:50 |
jrosser | I know that’s very easy to say, and reality is it’s a tough problem with many aspects to consider | 20:51 |
*** dviroel is now known as dviroel|out | 20:58 | |
moha7 | Thank you jrosser for help and all the tips | 21:06 |