Friday, 2023-05-26

opendevreviewMerged openstack/openstack-ansible master: Run healthcheck-openstack from utility host  https://review.opendev.org/c/openstack/openstack-ansible/+/883496  00:18
opendevreviewMerged openstack/openstack-ansible-os_nova master: Install libvirt-deamon for RHEL systems  https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/884415  01:25
NeilHanlonjrosser: https://review.opendev.org/c/openstack/diskimage-builder/+/884452  02:57
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_nova stable/2023.1: Install libvirt-deamon for RHEL systems  https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/884379  07:18
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_nova stable/zed: Install libvirt-deamon for RHEL systems  https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/884380  07:18
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_nova stable/yoga: Install libvirt-deamon for RHEL systems  https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/884381  07:18
derekokeeffe85Morning all, any pointers as to what's causing this: TASK [Get list of repo packages] ************************************************************************************************************************************07:37
derekokeeffe85fatal: [infra1_utility_container-221a4415]: FAILED! => {"changed": false, "content": "", "elapsed": 0, "msg": "Status code was -1 and not [200]: Request failed: <urlopen error [Errno 111] Connection refused>", "redirected": false, "status": -1, "url": "http://10.37.110.100:8181/constraints/upper_constraints_cached.txt"}07:37
MrR8181 is swift isn't it? what release? The answer is right there though, connection was refused, is the container/bridge up?07:42
derekokeeffe85MrR: yep the container is up and I can attach to it; I'm not sure what 8181 is tbh. I don't see anything in the logs on the container either. Usually it's something I have failed to configure or configured wrong in the networking, so you're probably right. What bridge would affect that?07:52
derekokeeffe85Actually, is there documentation showing which containers rely on which bridges? That always seems to be a problem for me07:53
MrRI was referring to the port, but my bad, 8181 is the repo server port (it's early and I haven't even had my coffee yet)07:56
MrRhttps://docs.openstack.org/openstack-ansible/latest/user/prod/example.html  07:56
MrRshows a production example of networking07:56
derekokeeffe85haha that's ok :) There's the log I just got when it failed07:56
derekokeeffe85May 26 07:56:00 infra1-utility-container-221a4415 ansible-ansible.legacy.uri[3508]: Invoked with url=http://10.37.110.100:8181/constraints/upper_constraints_cached.txt return_content=True force=False http_agent=ansible-httpget use_proxy=True validate_certs=True force_basic_auth=False use_gssapi=False body_format=raw method=GET follow_redirects=safe status_code=[200] timeout=30 headers={} remote_src=False unredirected_headers=[] 07:56
derekokeeffe85unsafe_writes=False url_username=None url_password=NOT_LOGGING_PARAMETER client_cert=None client_key=None dest=None body=None src=None creates=None removes=None unix_socket=None ca_path=None mode=None owner=None group=None seuser=None serole=None selevel=None setype=None attributes=None07:56
noonedeadpunknah 8181 is a repo_container07:58
MrRI'm not an openstack dev btw, i'm just a guy that got some help and has learnt along the way to fix a lot of problems haha07:58
noonedeadpunkI think everyone here is like that07:59
derekokeeffe85I'm on a journey too :)07:59
noonedeadpunkso, in repo container there should be nginx running07:59
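For reference, a minimal sketch of that check (the repo container name used below is an example taken from this log; run on the infra host that holds the container and adjust to your inventory):

    # Sketch: confirm nginx is up and listening inside the repo container
    lxc-ls -f | grep repo_container
    lxc-attach -n infra1_repo_container-23e3ab6f -- systemctl status nginx --no-pager
    lxc-attach -n infra1_repo_container-23e3ab6f -- ss -lntp | grep 8181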
MrRwhat branch are you on and at what stage is this failing? i'm assuming bootstrap or setup hosts as you can't really get any further without the repo working07:59
noonedeadpunkI guess on utilit-isntall.yml08:00
noonedeadpunk*utility-install.yml08:00
derekokeeffe85I bug noonedeadpunk and jrosser all the time. No, it's setup-infrastructure.yml; setup-hosts ran through yesterday after fixing the br-storage issue08:00
noonedeadpunkutility-install.yml is one of the last pieces of setup-infrastructure https://opendev.org/openstack/openstack-ansible/src/branch/master/playbooks/setup-infrastructure.yml#L23  08:01
derekokeeffe8526.1.108:01
noonedeadpunkI'd propose to manually re-run repo-install.yml08:01
noonedeadpunkand see how it goes08:01
derekokeeffe85Will try that08:02
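A sketch of that step, assuming the usual /opt/openstack-ansible checkout on the deploy host:

    # Sketch: re-run only the repo server playbook, then retest the URL that failed
    cd /opt/openstack-ansible/playbooks
    openstack-ansible repo-install.yml
    curl -v --head http://10.37.110.100:8181/constraints/upper_constraints_cached.txt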
noonedeadpunkBut Connection refused... I wonder if that could be coming from haproxy....08:02
noonedeadpunkso might be worth checking if repo_server backend is considered UP in haproxy as well08:02
MrRyeah i just found it, i'll admit i haven't jumped into all the playbooks, only the ones that have broken on me haha08:03
noonedeadpunkMrR: thanks for stepping in btw and trying to help out!08:03
MrRhey i might have just burnt this all down if you guys hadn't helped, it's only right i try and do the same for others08:04
derekokeeffe85repo-install completed but during the run of the playbook I got this: TASK [systemd_mount : Set the state of the mount] ********************************************************************08:04
derekokeeffe85fatal: [infra1_repo_container-23e3ab6f]: FAILED! => {"changed": false, "cmd": "systemctl reload-or-restart $(systemd-escape -p --suffix=\"mount\" \"/var/www/repo\")", "delta": "0:00:00.041776", "end": "2023-05-26 08:03:13.896337", "msg": "non-zero return code", "rc": 1, "start": "2023-05-26 08:03:13.854561", "stderr": "Job failed. See \"journalctl -xe\" for details.", "stderr_lines": ["Job failed. See \"journalctl -xe\" for details."], 08:04
derekokeeffe85"stdout": "", "stdout_lines": []}08:04
derekokeeffe85and in the container log this: May 26 08:03:13 infra1-repo-container-23e3ab6f systemd[1]: Reload failed for Auto mount for /var/www/repo.08:04
noonedeadpunkyeah, this one is "fine" as we have block/rescue there08:05
noonedeadpunkSo, if you just `curl http://10.37.110.100:8181/constraints/upper_constraints_cached.txt -v --head` ?08:08
noonedeadpunkwhat will the result be?08:09
derekokeeffe85curl http://10.37.110.100:8181/constraints/upper_constraints_cached.txt -v --head  08:09
derekokeeffe85*   Trying 10.37.110.100:8181...08:09
derekokeeffe85* TCP_NODELAY set08:09
derekokeeffe85* connect to 10.37.110.100 port 8181 failed: Connection refused08:09
derekokeeffe85* Failed to connect to 10.37.110.100 port 8181: Connection refused08:09
derekokeeffe85* Closing connection 0  08:09
noonedeadpunkosa-cores, let's quickly land https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/884379 :)08:09
derekokeeffe85curl: (7) Failed to connect to 10.37.110.100 port 8181: Connection refused08:09
noonedeadpunkand what does haproxy say about repo?08:10
noonedeadpunkecho "show stat" | nc -U /run/haproxy.stat | grep repo  08:10
noonedeadpunkderekokeeffe85: also please use paste.openstack.org for providing outputs :)08:10
noonedeadpunkIRC is not really designed for that08:11
derekokeeffe85Sorry noonedeadpunk will do08:11
noonedeadpunkno worries :)08:11
noonedeadpunkit's just tricky to read in a native irc client, at the very least08:12
derekokeeffe85No worries. Em sorry for sounding completely stupid but where do I run that command? Which container?08:13
MrRon the host08:14
noonedeadpunkwith haproxy on it08:14
MrRyou may need sudo/su08:14
derekokeeffe85https://paste.openstack.org/show/bW3DdqnStlBjGaRU0xsT/  08:16
noonedeadpunkthere's also the hatop utility that should be present to help manage haproxy (just in case)08:17
noonedeadpunkare you running that as root?08:17
derekokeeffe85Yep08:17
noonedeadpunkis haproxy even alive?08:17
derekokeeffe85on infra1 (controller)08:17
derekokeeffe85yep https://paste.openstack.org/show/bqfvdEsjeY55YM4lbrYq/  08:18
noonedeadpunkum08:19
noonedeadpunkthis is not healthy08:19
noonedeadpunkthese `can not bind` errors are likely the root cause08:20
derekokeeffe85should I blow away and re deploy?08:21
noonedeadpunkwhat distro is that?08:21
derekokeeffe85ubuntu 22.04.2, Jammy08:22
noonedeadpunkhuh08:22
derekokeeffe85The OS I'm deploying on?08:22
noonedeadpunkcan you do smth like `journalctl -xn -u haproxy` ?08:22
noonedeadpunkso it looks like haproxy can not bind to ports for some reason08:25
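A short sketch of narrowing down a bind failure like that, run as root on the haproxy/controller host (haproxy typically fails to bind either because another process already owns the port, or because the bind address — here the keepalived VIP — is not present on any interface):

    # Sketch: why can't haproxy bind?
    journalctl -x -n 50 -u haproxy --no-pager   # the exact bind error and address
    ss -lntp | grep -E ':(80|443|8181)\b'       # is another process already on the port?
    ip -br addr | grep 10.37.110.100            # is the VIP actually assigned to an interface?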
opendevreviewMerged openstack/openstack-ansible-os_nova stable/2023.1: Install libvirt-deamon for RHEL systems  https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/884379  08:25
derekokeeffe85I think I found it noonedeadpunk, I had a copy-and-paste error on the interface for haproxy_keepalived_internal_vip_cidr. Thanks for talking me through it, and you too MrR08:27
noonedeadpunkaha, yes, the IPs can not be the same for the internal and external VIPs08:28
derekokeeffe85I had the netplans saved for easier redeployment and it had an error, same with br-storage yesterday08:28
derekokeeffe85Thanks again08:28
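For anyone hitting the same thing, a sketch of sanity-checking those settings on the deploy host (the external address and interface names in the comments are placeholders, not values from this deployment):

    # Sketch: the internal and external keepalived VIPs/interfaces must differ
    grep -E '^haproxy_keepalived_(internal|external)_(vip_cidr|interface):' \
        /etc/openstack_deploy/user_variables.yml
    # expected shape (example values):
    # haproxy_keepalived_external_vip_cidr: "203.0.113.10/32"
    # haproxy_keepalived_internal_vip_cidr: "10.37.110.100/32"
    # haproxy_keepalived_external_interface: br-ext
    # haproxy_keepalived_internal_interface: br-mgmt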
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Bump SHAs for OpenStack-Ansible 27.0.0.rc1  https://review.opendev.org/c/openstack/openstack-ansible/+/884203  08:34
noonedeadpunkI hope this is final now ^08:34
derekokeeffe85That sorted that issue :)08:47
MrRnoonedeadpunk seems my logging issue has eased, i'm guessing it was down to having senlin/trove etc in a broken state; only (i say only!) 2.5GB of logs per node in the last 12-18 hours, much better than the 15-20GB a day at least08:51
noonedeadpunkwell, an operational cluster is indeed quite log-intensive10:21
noonedeadpunkprobably we should implement a variable to control log verbosity, for example to set it to warning10:21
damiandabrowskiall tls-backend patches passed CI and are ready for review10:28
damiandabrowskihttps://review.opendev.org/q/topic:tls-backend+status:open  10:28
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible master: Fix repo url in healthcheck-infrastructure.yml  https://review.opendev.org/c/openstack/openstack-ansible/+/884445  12:24
jrosserthis needs another vote https://review.opendev.org/c/openstack/openstack-ansible/+/884203  15:06
admin1is my vote eligible :D ? 15:07
admin1 the changes look straightforward to me .. 15:09
NeilHanlonadmin1: you're of course welcome to add your vote and review :) but each change needs at least two core reviewers' reviews, too :) we always encourage more people to review, though15:11
NeilHanlonjrosser: looking now15:11
jrosserNeilHanlon: this creates a SHA in openstack-ansible repo that we will then use as the branch point for antelope15:13
NeilHanlonjrosser: roger. it looks good to me :) 15:14
NeilHanlonbtw jrosser, not sure if you saw my patch for DIB to build `kernel-64k` rocky 9 images15:20
jrosserah yes i did - thanks for looking at that15:20
jrosseri think we will be able to have a go next week with it15:20
NeilHanlonawesome! glad to hear it15:21
MrRquick question, for horizon is there a specific way to customize it (logo etc) via override files/directories or am i doing it directly in the horizon container? 15:31
NeilHanlonMrR: https://docs.openstack.org/horizon/latest/configuration/customizing.html  15:38
MrRI did find that but it assumes direct access to the files; obviously i can just connect to the container, but then changes would be lost with upgrades. What i'm asking is whether i can put them in a file or user_variables etc 15:42
jrosserMrR: the "reference" is pretty much always defaults/main.yml in the relevant ansible role15:43
jrosserthere you will find this https://github.com/openstack/openstack-ansible-os_horizon/blob/master/defaults/main.yml#L377-L388  15:43
jrosserand you can also have an entirely custom theme if you want to15:44
MrRgreat thanks, a custom theme seems like more work than i'm willing to put in right now but i'll keep it in mind15:46
MrRi pretty much only have ironic to contemplate now and some network hardening then i may be done15:47
MrRwith ironic, is a separate host that can't run compute necessary, or is there a way around that? For testing i'd guess a few VMs would do but may not do well in production15:47
jrosserironic is bare metal deployment, so really for production you'd need a use case requiring that sort of facility15:49
jrosserand sufficient suitable nodes to make it viable15:49
jrosserfor example we use ironic today in a test lab environment, which would otherwise have been manually configured bare metal servers15:50
MrRthe idea was to have it ready to automatically provision future hardware into the stack15:50
jrosserbut with ironic we can define and build/tear down the lab with terraform15:50
MrRright now this is 3 servers, and ironic seemed like a way to add 10 more without a headache every time15:51
jrosseryou can do that too :)15:51
jrosserthough i would say that *everything* about ironic is configurable15:52
jrosserand in a practical situation the intersection of your use case / the hardware you have / the many bugs in everything / etc leaves actually only a small subset of things that are workable15:53
adi_hi15:53
MrRso is a dedicated ironic machine needed or could i get away with it being a vm in openstack? Obviously this would be down if the instance was down but once 3 more servers are added an ironic node would be viable15:54
adi_i am seeing this error in my horizon container, is this any kind of bug  in xena15:54
adi_[Thu May 25 14:35:26.734806 2023] [mpm_event:notice] [pid 840:tid 140502761360448] AH00489: Apache/2.4.41 (Ubuntu) configured -- resuming normal operations [Thu May 25 14:35:26.734995 2023] [core:notice] [pid 840:tid 140502761360448] AH00094: Command line: '/usr/sbin/apache2' [Thu May 25 14:35:28.023081 2023] [mpm_event:notice] [pid 840:tid 140502761360448] AH00491: caught SIGTERM, shutting down [Thu May 25 14:35:28.095876 2023] [mpm_event:notice] 15:54
adi_CLI works fine15:54
jrosserMrR: the ironic service deploys bare metal machines by PXEbooting them for you and doing lifecycle management15:55
jrosserit's not really something that you can replace with a VM (outside an artificial CI setup)15:55
jrosseradi_: you will need to look through the logs more to see why that has happened15:56
jrosserfor example, is it the OOM killer?15:56
jrosserSIGTERM has to have come from somewhere15:56
adi_it is coming from here15:57
adi_OpenSSL/1.1.1f mod_wsgi/4.6.8 Python/3.815:57
adi_H00292: Apache/2.4.41 (Ubuntu) OpenSSL/1.1.1f mod_wsgi/4.6.8 Python/3.8 configured15:57
jrossercan you please use a paste service for debug logs15:58
MrRthe bare metal machines will come, it's just that in my current environment i have 3. The idea was for ironic to provision the next node that gets added; if that requires a dedicated machine it can wait, which means i'm almost done until i start adding more machines, the first of which will now be an ironic node to make future expansion much smoother 15:58
jrosseradi_: really i don't know what that means unfortunately15:58
jrosserMrR: you can use ironic to deploy the next node15:59
jrosserthe ironic service runs on your existing controllers15:59
jrosserMrR: i recently added a full example for LXC deployment of ironic to the docs https://docs.openstack.org/openstack-ansible-os_ironic/latest/configure-lxc-example.html  16:00
MrRnot yet i can't, as all 3 machines are currently compute, or can i run ironic alongside compute on the same node? i was under the impression i can't16:00
jrossersorry i thought you meant 3 controllers16:00
jrosser"run ironic" this is confusing :)16:01
jrosseryou run the ironic service on your controllers16:01
jrosserit then PXEboots some other servers for you, as needed16:01
jrosser"run ironic alongside compute on the same node" <- this cannot be16:01
MrRyeah my bad, i converged for initial testing, as nodes are added things will become less converged16:01
jrosseras by definition ironic will PXEboot the node and wipe the disks / deploy a new OS16:02
MrRi'm also not distinguishing ironic and the ironic api which probably isn't helping clarification16:03
jrosseradi_: you have shown the message that apache made when it received SIGTERM. the OOM killer will typically put information into syslog for example16:03
MrRadi_ can you ping your haproxy external vip ip? check your haproxy config in user_variables if you can't16:07
jrosseradi_: ^ this is a good point - your CLI access from the utility container will be through the internal vip, but horizon access will be the external vip16:11
adi_ok16:13
adi_The IP is pingable, that part i know. The issue is not that the page never opens; it sometimes opens, and sometimes it doesn't16:14
jrosserthe only way is to do systematic debugging in the horizon container16:14
jrosserremember that there is a loadbalancer, so requests are round-robin between the horizon backends16:15
jrosserso if one of N is broken you can easily see this "works sometimes / broken sometimes" behaviour16:15
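A sketch of checking that from the haproxy host (field 18 of the haproxy CSV stats is the server status; the backend address in the second command is hypothetical and should be taken from the stats output, and http vs https depends on whether backend TLS is enabled in your deployment):

    # Sketch: list horizon backend status as haproxy sees it
    echo "show stat" | nc -U /run/haproxy.stat | awk -F, '/horizon/ {print $1, $2, $18}'
    # then hit one backend directly, bypassing the load balancer (address is an example)
    curl -ksI https://172.29.239.99/auth/login/ | head -n 1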
adi_i debugged yesterday. There were errors; i set horizon_enable_designate_ui: False in user_variables, and there are no errors after that16:15
MrRdo any other services periodically fail/timeout?16:16
adi_apache also comes up clean. Sometimes the horizon page opens and the login page is in a very bad shape, all the icons are here and there16:16
MrRyou said you have enough memory, but do you have enough cpu power/cores? Do you have enough free hdd space? What are you trying to run on what hardware? it still sounds like a performance/hardware issue to me, especially if it's sporadic16:18
MrRand I've seen a lot of performance issues; at one point i tried to run a full stack on a 10 year old machine with 32GB ram, it was extremely painful!16:19
jrosseradi_: you did not tell us at all why you need the designate setting16:20
jrosserplease remember we are not familiar with your deployment, i have no idea why you would need to disable the designate UI16:21
adi_hi jrosser, i can remove that, because i completely removed designate initially16:37
jrosserso is that related to the horizon troubles?16:38
adi_it was there initially because it was showing the apache2 status, but when i removed designate it was gone; i just added this out of curiosity, i can remove that16:40
adi_actually i have a few queries about ansible: when i did a git checkout at 24.0.0, it failed on the task "parallel repo"16:41
adi_when i did bootstrap-ansible and the gpg keys16:41
adi_Can it be a problem? Because then i did a minor version upgrade, just to see how it goes, and there the bootstrap-ansible does not show any error; maybe it only bootstraps the changes from the OLD16:42
jrosserwell, 24.0.0 would be the very first release of that branch, and probably has bugs16:42
adi_the minor upgrade was clean, no errors16:43
adi_only the horizon issue, the playbook is clean16:43
jrosserit is possible that there were bugs fixed in the git clone process16:46
jrosserbut remember that the git clone retrieves the ansible roles, not the code for horizon16:47
adi_yeah i know16:47
adi_roles are important16:47
adi_but this horizon is a pain; in my other prod env everything also comes up clean16:48
adi_but when you move from one project to another, it shows a gateway timeout16:49
adi_every IP is reachable, memcached is fine16:49
MrRis there a reason you're using 24.0.0 and not 26.1.1? I definitely had problems when deploying 24/25 that i haven't had in 2616:51
adi_my openstack env  is extensively used16:52
adi_until i've proved the POC, i can not upgrade16:52
adi_so i was testing from xena to yoga first16:52
adi_starting from scratch is easy16:52
jrosserMrR: please do file bugs if you find any16:53
adi_but the upgrade needs to come out clean, we can not start from scratch every time16:53
jrosserthe stable branches should be good16:53
adi_ok16:53
jrosseradi_: but it is a good question, do you really use 24.0.0 or the latest tag of xena?16:54
MrRmost of the bugs i've found are already patched for the next release; for some others it was debatable whether i was the cause16:54
adi_in my test i am on 24.6.016:54
adi_i am planning to upgrade to 25.2.016:55
adi_yoga , once this horizon is fixed16:55
jrosserbugfixes do get backported, so do let us know if we've missed something16:55
jrosseradi_: have you tried to reproduce this in an all-in-one?16:55
adi_i can try to, but i know that if i go from scratch no issues will come up16:56
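Roughly, reproducing in an all-in-one on a throwaway VM looks like this (a sketch only; it assumes Ubuntu 20.04 for Xena-era tags and uses the tag mentioned above):

    # Sketch: stand up an all-in-one at the same tag on a disposable VM
    git clone https://opendev.org/openstack/openstack-ansible /opt/openstack-ansible
    cd /opt/openstack-ansible
    git checkout 24.6.0
    scripts/bootstrap-ansible.sh
    scripts/bootstrap-aio.sh
    cd playbooks
    openstack-ansible setup-hosts.yml setup-infrastructure.yml setup-openstack.yml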
opendevreviewMerged openstack/openstack-ansible master: Bump SHAs for OpenStack-Ansible 27.0.0.rc1  https://review.opendev.org/c/openstack/openstack-ansible/+/884203  17:29
NeilHanlon🥳17:46
noonedeadpunksweet :)18:39
admin1adi_ i have one cluster that i moved from rocky -> 26.1.1 .. and in between changed from ceph-ansible -> cephadm 18:48
admin1i think i hit the redis not present in gnocchi issue again 18:49
admin1i will have more data on monday 18:49
noonedeadpunkiirc there was a patch that allowed to install it?18:50
admin1yeah .. i recall this being addressed and fixed .. but that was like a 100 deployments ago .. a new one needed to do the same 18:50
admin1gnocchi uses redis for cache and then ceph for metrics 18:50
noonedeadpunkwe have also var like `gnocchi_storage_redis_url`18:50
noonedeadpunk(and gnocchi_incoming_redis_url)18:51
admin1it's the driver that goes missing 18:51
noonedeadpunkbut yes, this setup makes the most sense performance-wise to me as well18:51
admin1i recall going into the venv and manually doing pip install redis to move things ahead18:51
noonedeadpunkthough I wish gnocchi supported zookeeper as an incoming driver...18:51
noonedeadpunkso now packages are being added with that https://opendev.org/openstack/openstack-ansible-os_gnocchi/src/branch/master/defaults/main.yml#L177-L181  18:52
noonedeadpunkso if you set gnocchi_incoming_driver to redis - it should get in18:53
admin1gnocchi_incoming_driver: redis        -- i have this set 18:56
admin1hmm.. i also have  gnocchi_conf_overrides:  =>  incoming:  =>  driver: redis, redis_url: redis://172.29.236.111:6379  18:56
admin1is the override not required anymore ? 18:57
admin1i see 18:57
admin1all i need is gnocchi_storage_redis_url and gnocchi_incoming_driver18:57
admin1maybe it was due to the overrides .. 18:57
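A sketch of verifying the driver actually landed after re-running the gnocchi playbook (the venv path follows the usual OSA layout, with the release version in the directory name; run on a host carrying gnocchi api/metricd):

    # Sketch: confirm the redis client and rendered config
    ls -d /openstack/venvs/gnocchi-*                      # confirm the venv and its version suffix
    /openstack/venvs/gnocchi-*/bin/pip show redis         # the redis client must be installed here
    grep -A2 '^\[incoming\]' /etc/gnocchi/gnocchi.conf    # rendered incoming driver/redis_url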
admin1why would this error come ? tag 26.1.1     22.04 jammy  .. fatal: [ams1h2 -> ams1c1_repo_container-75a86909(172.29.239.37)]: FAILED! => {"attempts": 5, "changed": false, "msg": "No package matching '{'name': 'ubuntu-cloud-keyring', 'state': 'present'}' is available"} 20:12
admin1it did not come in the 1st run ..  from the 2nd run, it starts to come 20:12
admin1TASK [python_venv_build : Install distro packages for wheel build] ************************************************************************************************************************20:12
admin1this is in the neutron playbook .. playbooks before this seem fine 20:20
admin1issue seems to be only in the os-neutron playbook .. rest are moving along fine 20:25
jrossertry looking at the output of `apt policy` for that package20:59
jrosseradmin1: ^21:00
jrosseryou can see here that package should be trivial to install https://packages.ubuntu.com/search?suite=all&searchon=names&keywords=ubuntu-cloud-keyring  21:01
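A sketch of that check inside the container the failed task was delegated to (the container name is taken from the error above; run on the infra node that hosts it):

    # Sketch: inspect apt state where the apt module actually ran
    lxc-attach -n ams1c1_repo_container-75a86909 -- apt policy ubuntu-cloud-keyring
    lxc-attach -n ams1c1_repo_container-75a86909 -- apt-get update   # if this errors, the package lookup can fail too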
admin1the package is already there and the newest version .. 21:14
opendevreviewMerged openstack/openstack-ansible master: Fix repo url in healthcheck-infrastructure.yml  https://review.opendev.org/c/openstack/openstack-ansible/+/884445  21:25
jrosser admin1: if you could paste that failed task with -vvvv it would be interesting21:57
