Wednesday, 2022-08-10

*** ysandeep|out is now known as ysandeep01:23
*** ysandeep is now known as ysandeep|afk02:44
*** ysandeep|afk is now known as ysandeep04:57
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-ceph_client master: Provide opportunity to define cluster_name  https://review.opendev.org/c/openstack/openstack-ansible-ceph_client/+/85258805:07
*** prometheanfire is now known as Guest19407:00
*** ysandeep is now known as ysandeep|afk07:09
*** Guest194 is now known as Guest20007:28
evrardjpnoonedeadpunk: I disagree, the fact is that the config should be radically different, IMO07:34
evrardjpelse I would not even try this07:35
evrardjpand I don't think that for your case you need variable generation. There are plenty of places in the OSA case where a template is the better choice07:36
evrardjpbut let's try the PoC see how far that goes07:36
*** ysandeep|afk is now known as ysandeep07:37
*** Adri2000_ is now known as Adri200008:29
evrardjpOk I am at the end of the time I have for the PoC, and I see that this is a positive improvement, yet too marginal to be worth the risks of failed migrations. 08:43
evrardjpThe result completely removed the variables from osa/inventory/group_vars/haproxy, reconfigured the role to use an external role, and put all the "desired state" into the deployer node08:43
evrardjpit would load the role to reconfigure haproxy if necessary,  in each playbook, with include_role tasks_from to allow for a reconfiguration of a frontend/backend live 08:45
evrardjpI need a quick patch on the upstream role for it08:45
evrardjpI had two ways to reconfigure using my external role08:46
evrardjpthe first way was to generate a series of vars (using set_facts) that would give a proper config.  Sadly this becomes very convoluted when configuring haproxy from a non haproxy play08:47
evrardjpthe other way was to template directly from the existing OSA role and use my other role to reload the configuration/handle the state. This is relatively good in terms of code cleanup, but only brings marginal improvements over the whole configuration08:47
evrardjpa mix of those two models could give great results, but at the risk of the clarity and complexity during an upgrade.08:48
evrardjpI think that noonedeadpunk's patch on 'interface' and maybe future changes in the templating are "good enough" for a majority of OSA users.08:50
evrardjpfor people who want something different,  I am sure my role can deliver it,  if you think it "from scratch".  Now it's a tad late for OSA for the marginal improvements08:50
evrardjpSo there you go,  1 day flushed :)08:51
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-ceph_client master: Do not delegate facts when fetching keyrings  https://review.opendev.org/c/openstack/openstack-ansible-ceph_client/+/85271408:53
jrosser_urgh centos-9-stream jobs are broken09:05
*** ysandeep is now known as ysandeep|lunch09:34
snadgeim running into an issue deploying a yoga install on ubuntu 20.04.4.. https://pastebin.com/24UuhBPX09:51
snadgeim aware of this issue: https://bugs.launchpad.net/openstack-ansible/+bug/1943978 .. and have installed that patch09:51
noonedeadpunksnadge: from the error it seems that haproxy does not see any alive nginx on repo_containers09:54
noonedeadpunkso have a feeling that repo-install.yml has failed previously09:54
snadgeoddly.. wgetting that file seems to work.. perhaps something is bouncing up and down09:55
noonedeadpunkyeah, with centos seems we got bad timing for repo updates...09:58
snadgeive seen this message a few times "backend repo_all-back has no server available!" 09:58
noonedeadpunk*infra mirrors sync09:58
noonedeadpunksnadge: well yes, that would explain 50309:58
snadgei wonder why it did that.. one of the playbooks must have made it go to lunch09:58
noonedeadpunkor well, that's is the reason of 503)09:58
snadgethen haproxy has marked it nonresponsive or whatever09:59
opendevreviewJean-Philippe Evrard proposed openstack/openstack-ansible master: Cleanup useless variables  https://review.opendev.org/c/openstack/openstack-ansible/+/85256310:06
snadgeyeah something is causing the repo server to drop out 10:10
snadgebut the problem seems intermittent10:11
mrfmmm, what text editor do the containers have?10:12
mrfnano vi?10:12
snadgecinder-volume is crashing in a loop saying access denied to user cinder .. using password yes.. it seems like a mysql error?10:20
snadgethis is on the controller which runs all the containers plus galera etc10:21
snadgei wonder if thats whats loading the system up and causing the repo server to drop out10:21
mrfwhy i cant edit from host the /var/lib/mysql/grastate.dat in the path of rootfs /var/lib/lxc/controller1_galera_container-45ce6c70/rootfs/var/lib/mysql ??10:43
mrfsolved it finally, used sed... to replace the bootstrap10:46
noonedeadpunkI have a problem in my sandbox. `internal endpoint for volumev3 service in az-poc region not found` https://paste.openstack.org/show/bC3upHmpHotyE5PyKhnG/10:51
noonedeadpunkwtf is that....10:51
noonedeadpunkmrf: /var/lib/mysql is a bind mount inside the container. So you should check for actual path on the host10:52
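To find where a bind-mounted container path such as /var/lib/mysql really lives on the host, one option is to read the `lxc.mount.entry` lines (fstab layout: source, target, type, options, ...) in the container's config, normally /var/lib/lxc/<container>/config. This is a minimal sketch, assuming the mount targets are written relative to the container rootfs; the host path in the usage sample is hypothetical.

```python
from pathlib import PurePosixPath


def host_path_for(container_path, config_text):
    """Map a path inside an LXC container to its host-side bind-mount source.

    Scans lxc.mount.entry lines (fstab layout: source target fstype options ...)
    and assumes targets are relative to the container rootfs. Returns None
    when no bind mount covers the path.
    """
    wanted = PurePosixPath("/" + container_path.lstrip("/"))
    best = None
    for raw in config_text.splitlines():
        key, sep, value = raw.partition("=")
        if not sep or key.strip() != "lxc.mount.entry":
            continue
        fields = value.split()
        if len(fields) < 4 or "bind" not in fields[3].split(","):
            continue
        source = PurePosixPath(fields[0])
        target = PurePosixPath("/" + fields[1].lstrip("/"))
        # Keep the most specific mount target that contains the wanted path.
        if target == wanted or target in wanted.parents:
            if best is None or len(target.parts) > len(best[1].parts):
                best = (source, target)
    if best is None:
        return None
    source, target = best
    return str(source / wanted.relative_to(target))


# Hypothetical config excerpt; check the real file for your galera container.
sample = (
    "lxc.rootfs.path = dir:/var/lib/lxc/controller1_galera_container-45ce6c70/rootfs\n"
    "lxc.mount.entry = /openstack/controller1_galera_container-45ce6c70/mysql "
    "var/lib/mysql none bind,create=dir 0 0\n"
)
print(host_path_for("/var/lib/mysql/grastate.dat", sample))
```

Editing the file at the returned host path, rather than under the container's rootfs, is what noonedeadpunk's hint points to.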
noonedeadpunkinside token stanza for me catalog is weird indeed. It's somehow filtered I would say10:53
noonedeadpunkok, wtf https://paste.openstack.org/show/bPjXX6jQFm0WWa0yh4Ka/11:04
*** ysandeep|lunch is now known as ysandeep11:06
jrosser_mrf: there won't be an editor in the containers, they are as minimal as practical. you can install vim or whatever if you need it11:09
jrosser_snadge: wget from the haproxy node and also check the haproxy log will be useful11:10
jrosser_all interaction with the repo hosts will be via the loadbalancer so it's important to find why that is unstable11:10
jrosser_mrf: if you are having database trouble then we have some docs here https://docs.openstack.org/openstack-ansible/latest/admin/maintenance-tasks.html#galera-cluster-maintenance11:13
mrfyeah i already solved it with a sed... forcing one to bootstrap11:15
*** dviroel|out is now known as dviroel11:24
noonedeadpunkso, basically catalog is taken from your auth, and endpoints from a separate API request11:39
noonedeadpunkwhy not everything is returned during token generation then...11:39
noonedeadpunkok, I know what is that12:07
snadgemaybe im running out of tcp ports or something stupid on the host? a whole bunch of servers went down at the same time this time12:58
snadgeit seems haproxy logs to /dev/log which is just the main journal13:01
snadgeit crashes during keystone setup in setup_openstack.. and now its just bailing saying it cant find "/var/www/repo/os-releases/25.0.0/ubuntu-20.04-x86_64/requirements/keystone-25.0.0-constraints.txt"13:05
snadgeso i have to blow away the keystone container and just reinstall that part? i got stuck in this loop last time13:06
snadgei knew i shouldn't have used version 25 :(13:06
snadgehow do i rebuild that file?13:14
mrfCould not find the requested service aodh-api: host"  mmm for aodh we just need in the yml the metering-alarm_hosts no?13:29
*** ysandeep is now known as ysandeep|break13:30
jrosser_snadge: that sounds like you still have problems with the repo server13:34
jrosser_i am not sure re-creating the keystone container is going to help13:35
snadgeyeah because i've done this once already, i need to try and find out why its happening13:35
jrosser_i think also you are installing 25.0.0 tag, which would not include any bugfixes that have been applied to yoga since the first release13:35
snadgei will check13:35
mrfstable/yoga git download the 25.0.0 tag13:36
mrfsame happens to me13:36
jrosser_no :)13:36
mrfyes13:36
jrosser_stable/yoga is the head of the branch13:36
mrfin my deploy i read 100% 25.0.013:36
mrfand i git the stable/yoga13:36
snadgeit is set to 25.0.013:37
snadgehow do i change it to the latest yoga?13:37
mrfim re running the install of aodh containers will check the tag, but im 99% sure that it show 25.0.0 for stable/yoga13:38
snadgethere is b1, rc1 and rc213:38
jrosser_beta1, release candidate 1 and 213:38
snadgethey will be older then? .. oh you are suggesting trying the dev branch13:39
jrosser_i don't know what that means13:39
jrosser_stable/yoga is a branch13:39
jrosser_25.x.x are tags that mark points in the history of that branch13:39
snadgeah okay that makes sense now.. so if i want some fixes that have been done since 25.0.0 i can switch to stable/yoga13:40
NeilHanlonsnadge: does this visualization help, or no?  https://drop1.neilhanlon.me/irc/uploads/ae91b2a8fb5663f5/image.png 13:41
snadgei need to figure out why the repo server crashes during keystone install.. but it gets jammed, and i have to blow away the keystone container to start again13:43
jrosser_the only thing to note is when you switch to checking out stable/yoga the installed version will become something like 25.1.0.dev3313:43
jrosser_snadge: can you paste some more debug about what is happening?13:43
jrosser_installing keystone should not affect the repo server, it is helpful if we can debug it13:44
snadgewell now im at the point where i have the second error that the constraint file is missing13:45
snadgeso i have to blow it all away to get it to crash the repo server.. and even then, i probably wont know why13:45
mrfjrosser how to checkout the installed version?13:45
mrfany file in openstack_ git contains version string?13:46
jrosser_mrf: it is templated into the top of /usr/local/bin/openstack-ansible13:46
jrosser_snadge: we can help debug if you like13:46
snadgethat would be great, its real late here but i wouldn't mind progressing past this block at least13:47
jrosser_there are standard debug things to try, like wget the same file several times13:47
jrosser_the loadbalancer will hit each repo server in turn so if you get 1-in-3 type succeed/fail then you know that the contents of the repo servers are not synchronised13:48
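jrosser_'s "wget the same file several times" check can be made a little more systematic by tallying the results. A minimal sketch: `fetch_status` and `summarise` are hypothetical helpers, and the URL is a placeholder for a file you know should exist on the repo server. With several repo backends, a mixed result suggests unsynchronised content; with a single repo container (as in snadge's case below), a mixed result instead points at the backend itself dropping in and out.

```python
import urllib.error
import urllib.request
from collections import Counter


def fetch_status(url, timeout=5.0):
    """One GET through the load balancer: the HTTP status, or None if nothing came back."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 from nginx, 503 from haproxy
    except OSError:
        return None  # refused, reset or timed out (URLError is an OSError too)


def _label(status):
    return "no response" if status is None else str(status)


def summarise(statuses):
    """Classify repeated fetches of the same file."""
    if not statuses:
        raise ValueError("need at least one result")
    counts = Counter(statuses)
    ok = counts.get(200, 0)
    if ok == len(statuses):
        return "consistent: every request succeeded"
    if ok == 0:
        detail = ", ".join(f"{_label(s)} ({n}x)" for s, n in counts.items())
        return f"consistent failure: {detail}"
    return f"intermittent: {ok}/{len(statuses)} succeeded"


# Usage (placeholder URL):
# print(summarise([fetch_status("http://<internal-vip>:<port>/<file>") for _ in range(9)]))
```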
mrfexport OSA_VERSION="25.0.0" 13:48
jrosser_mrf: that installation is the result of `git checkout 25.0.0`13:48
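Since the deployed version is templated into /usr/local/bin/openstack-ansible as a line like mrf's `export OSA_VERSION="25.0.0"`, it can be read back and classified. This is a sketch whose regexes are based only on the strings seen in this log: a bare X.Y.Z is a release tag, while an X.Y.Z.devN suffix (like the 25.0.1.dev3 mrf gets after switching to stable/yoga) is a development build that precedes X.Y.Z.

```python
import re
from pathlib import Path

# Matches lines such as: export OSA_VERSION="25.0.0"
_VERSION_LINE = re.compile(r'^\s*export\s+OSA_VERSION="([^"]+)"', re.MULTILINE)
_RELEASE = re.compile(r"^\d+\.\d+\.\d+$")          # e.g. 25.0.0 (a tag)
_DEV = re.compile(r"^(\d+\.\d+\.\d+)\.dev\d+$")     # e.g. 25.0.1.dev3 (a branch)


def version_from_text(text):
    match = _VERSION_LINE.search(text)
    if match is None:
        raise ValueError("no OSA_VERSION line found")
    return match.group(1)


def installed_version(wrapper="/usr/local/bin/openstack-ansible"):
    return version_from_text(Path(wrapper).read_text())


def describe(version):
    if _RELEASE.match(version):
        return f"release tag {version}"
    dev = _DEV.match(version)
    if dev:
        return f"development build preceding {dev.group(1)}"
    return f"unrecognised version string {version}"


# Usage on a deploy host:
# print(describe(installed_version()))
```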
snadgethere is only one repo container.. so it shouldn't even really need haproxy?13:49
snadgethis is a fairly small install13:49
mrffrom my cli history "575  git clone -b stable/yoga https://opendev.org/openstack/openstack-ansible /opt/openstack-ansible"13:49
jrosser_mrf: are you re-running `scripts/bootstrap-ansible.sh` each time you change the checkout of openstack-ansible to deploy?13:52
mrfi never changed :( is the first time we use ansible for deploy openstack... 13:52
mrfit's just a virtual environment for testing it13:53
jrosser_if you change from tag 25.0.0 to stable/yoga then you really should re-run the bootstrap script13:53
mrfre-bootstraped and changed to export OSA_VERSION="25.0.1.dev3"13:57
jrosser_did you git fetch?14:00
snadgecan i just rebuild the repo container?14:12
jrosser_you can re-run the playbook for it, no problem14:16
jrosser_you can also delete/re-create it completely14:16
jrosser_but i will add that /var/www/repo/os-releases/25.0.0/ubuntu-20.04-x86_64/requirements/keystone-25.0.0-constraints.txt" is a file created during the keystone playbook, not when the repo server is built14:17
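To check whether the keystone playbook got as far as producing that file, it helps to know where to look. The layout below is inferred from the single path quoted above (release, then distro and architecture, then a requirements directory), so treat it as an assumption for other services and releases; `constraints_path` is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Layout inferred from the single path quoted in this log; an assumption, not a spec.
REPO_ROOT = PurePosixPath("/var/www/repo/os-releases")


def constraints_path(service, release, distro, arch):
    """Where <service>-<release>-constraints.txt is expected on the repo server."""
    return (REPO_ROOT / release / f"{distro}-{arch}" / "requirements"
            / f"{service}-{release}-constraints.txt")


print(constraints_path("keystone", "25.0.0", "ubuntu-20.04", "x86_64"))
```

Running `Path(...).exists()` on the result inside the repo container shows whether the file was ever written.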
snadgeokay i just need to figure out why the playbook isn't creating that file and putting it into the repo then14:19
snadgei dont know which playbook it is, i can only assume it thinks its already done and skipping it or something14:33
*** ysandeep|break is now known as ysandeep14:34
snadgeit just happened again, and i couldn't figure out why.. i desperately tried turning the timeout for haproxy way up14:41
snadgebut it didnt help14:41
jrosser_it would really help to see pastes of the log14:41
jrosser_because i don't know if you are talking about 404 or 50314:42
snadgethe ansible playback log is about all i have to go on14:42
snadgehaproxy logs to the journal on the controller i think, and it doesnt say much other than down, i simply dont know where to look14:43
jrosser_ok14:43
jrosser_and is it down?14:44
jrosser_haproxy is checking for something very specific being present, not just that the socket accepts a connection14:44
jrosser_so "down" can mean network problems, nginx not running, the file it's checking being absent.....14:45
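jrosser_'s list of reasons for "down" can be illustrated with a small probe that reports which failure it hit. This is not how haproxy implements its checks. It is a sketch assuming an HTTP check that requests one specific path; the real path and port are whatever `/etc/haproxy/conf.d/repo_all` configures. It treats 2xx and 3xx as healthy, which matches haproxy's default for `option httpchk`.

```python
import http.client
import socket


def probe(host, port, path, timeout=3.0):
    """Mimic an HTTP health check and report *why* a backend would look down."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
    except socket.timeout:
        return "DOWN: timed out (routing problem, firewall, or an IP/ARP conflict?)"
    except ConnectionRefusedError:
        return "DOWN: connection refused (nothing listening, e.g. nginx stopped?)"
    except OSError as err:
        return f"DOWN: network error ({err})"
    except http.client.HTTPException as err:
        return f"DOWN: malformed HTTP response ({err})"
    finally:
        conn.close()
    if 200 <= status < 400:
        return f"UP: HTTP {status}"
    return f"DOWN: HTTP {status} (server answers, but the checked file is missing or failing?)"


# Usage (placeholders): print(probe("<repo container IP>", <port>, "<check path>"))
```

A timeout, a refused connection and a 404 all show up as "down" in haproxy, but each points somewhere different: the network, the web server process, or the repo contents.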
*** ysandeep is now known as ysandeep|out14:47
mgariepysnadge, look at the haproxy config to see what it's looking for for that service14:47
jrosser_`cat /etc/haproxy/conf.d/repo_all`14:48
snadgeyep so i can connect to it with wget and it works14:51
snadgeits up at the moment14:51
snadgebut the last error i got was this14:51
snadgehttps://pastebin.com/MR0EtzRP14:51
snadgeand now if i run the keystone install again, it will just say that constraints file is missing.. instead of failing during that python_venv_build step as above14:56
jrosser_and the loadbalancer now has the repo server being down?14:57
snadgehow do i show the haproxy status14:59
jrosser_` hatop -s /var/run/haproxy.stat`15:00
jrosser_and `journalctl -u haproxy` for the log15:00
snadgerepo_all-back is up15:01
jrosser_you can also follow the log for a service with `journalctl -fu haproxy` to watch it in real-time15:01
jrosser_it has the feeling of ARP trouble where something else has the same IP tbh15:02
snadgerepo_all-back times out.. then comes back a few minutes later15:04
snadgeyeah this sounds like ip conflict, you're right15:04
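To put numbers on the flapping seen in `journalctl -u haproxy`, a sketch that counts DOWN transitions per server and the "has no server available!" events (that second message appears verbatim earlier in this log). The `Server <backend>/<server> is UP|DOWN` wording is an assumption about typical haproxy log lines, and the sample lines below are hypothetical; adjust the patterns to what your journal shows.

```python
import re
from collections import defaultdict

# Assumed message shapes; the EMPTY one is quoted in this log.
STATE = re.compile(r"Server (?P<backend>[\w.-]+)/(?P<server>[\w.-]+) is (?P<state>UP|DOWN)")
EMPTY = re.compile(r"backend (?P<backend>[\w.-]+) has no server available!")


def count_flaps(lines):
    """Return (DOWN count per server, 'no server available' count per backend)."""
    downs = defaultdict(int)
    outages = defaultdict(int)
    for line in lines:
        if (m := STATE.search(line)) and m["state"] == "DOWN":
            downs[f"{m['backend']}/{m['server']}"] += 1
        elif m := EMPTY.search(line):
            outages[m["backend"]] += 1
    return dict(downs), dict(outages)


sample = [  # hypothetical journal lines
    "Aug 10 15:03:01 ctl haproxy[123]: Server repo_all-back/repo1 is DOWN, reason: Layer4 timeout",
    "Aug 10 15:03:01 ctl haproxy[123]: backend repo_all-back has no server available!",
    "Aug 10 15:06:12 ctl haproxy[123]: Server repo_all-back/repo1 is UP, reason: Layer7 check passed",
]
print(count_flaps(sample))
```

Feeding it the output of `journalctl -u haproxy --no-pager` shows how often the single repo backend drops out.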
jrosser_the container IP are allocated randomly from the CIDR for the management network15:09
snadgeit will be the haproxy ip.. 172.29.236.101 .. a penny has dropped15:09
jrosser_this is important settings https://github.com/openstack/openstack-ansible/blob/master/etc/openstack_deploy/openstack_user_config.yml.example#L91-L9515:10
jrosser_places where you have your own routers, or you take IP from for the mgmt bridges on hosts need to be excluded from the range available to containers15:11
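This can be checked offline before running the playbooks: expand the `used_ips` entries, compute what is left of the management CIDR for containers, and flag any static address (haproxy VIP, host br-mgmt addresses, routers) that still falls inside that pool. The sketch assumes each `used_ips` entry is either a single address or a `"first,last"` range, as in the example `openstack_user_config.yml`; 172.29.236.0/22 is an assumed management CIDR and the reservation shown is hypothetical.

```python
import ipaddress


def parse_used_ips(entries):
    """Expand used_ips entries ("a.b.c.d" or "first,last") into a set of addresses."""
    used = set()
    for entry in entries:
        first, _, last = entry.partition(",")
        start = ipaddress.ip_address(first.strip())
        end = ipaddress.ip_address((last or first).strip())
        if end < start:
            raise ValueError(f"reversed range: {entry}")
        used.update(ipaddress.ip_address(n) for n in range(int(start), int(end) + 1))
    return used


def container_pool(cidr, used_ips):
    """Host addresses in the CIDR that remain available for container allocation."""
    used = parse_used_ips(used_ips)
    return [ip for ip in ipaddress.ip_network(cidr).hosts() if ip not in used]


def unreserved(cidr, static_ips, used_ips):
    """Static addresses that a container could still be given, i.e. potential conflicts."""
    pool = set(container_pool(cidr, used_ips))
    return [ip for ip in static_ips if ipaddress.ip_address(ip) in pool]


cidr = "172.29.236.0/22"                   # assumed management CIDR
used = ["172.29.236.1,172.29.236.50"]      # hypothetical reservation
print(unreserved(cidr, ["172.29.236.11", "172.29.236.101"], used))
```

Here 172.29.236.11 is already reserved, but 172.29.236.101 (the haproxy address snadge mentions) is still in the pool; adding it to `used_ips` removes the conflict.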
*** dviroel is now known as dviroel|lunch15:38
snadgeive shut off the vms that could have conflicted with that haproxy ip address.. i've added all of the br-mgmt static addresses to the used_ips list15:41
*** Guest200 is now known as prometheanfire15:59
snadgei customised haproxy.cfg, shouldn't that get overwritten?16:14
jrosser_it will get overwritten if you re-run the haproxy playbook16:16
jrosser_the sum total of the haproxy config file is made by glueing together all the generated parts  in /etc/haproxy/haproxy.cfg16:17
jrosser_oops /etc/haproxy/conf.d i mean16:17
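The "glueing together" jrosser_ describes means hand edits to /etc/haproxy/haproxy.cfg are lost whenever the file is regenerated from the conf.d fragments. A sketch of that assembly, assuming fragments are concatenated in sorted filename order, which is the idea behind Ansible's `assemble` module; whether the haproxy role does exactly this is an assumption, and `assemble` here is a hypothetical stand-in, not the role's code.

```python
from pathlib import Path


def assemble(fragment_dir, dest):
    """Concatenate every regular file in fragment_dir, in sorted name order, into dest."""
    fragments = sorted(p for p in Path(fragment_dir).iterdir() if p.is_file())
    Path(dest).write_text("".join(p.read_text() for p in fragments))
    return [p.name for p in fragments]


# Usage: preview without touching the live file.
# print(assemble("/etc/haproxy/conf.d", "/tmp/haproxy.cfg.preview"))
```

A lasting change therefore belongs in the variables that generate the fragment (for example the repo_all backend's timeout), not in haproxy.cfg itself.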
snadgeis that part of hosts, inf or openstack playbook16:17
snadgeim part way through inf now.. i started again after hopefully resolving any potential ip conflict issues16:18
jrosser_it's in infrastructure16:18
jrosser_setup-infrastructure.yml is just a list of other playbooks to call16:18
jrosser_you can do them by hand individually as/when you need16:18
*** dviroel|lunch is now known as dviroel16:47
snadgehmm, it never seemed to overwrite my customisation but apparently it doesn't matter and its going past the keystone setup now16:52
snadgeall i did was increase the timeout for the repo_all-back from 12000 to 12000016:55
snadgebut of course all that did was make it take longer, and it was probably an arp conflict like you said16:55
snadgehorizon is next.. im so excited to see the gui this time, even though I know at least cinder won't be working.. thats a minor technicality :P18:46
snadgelooks like its working, i'll get a few hours sleep and fix the storage and networking tomorrow.. thanks again jrosser19:21
*** dviroel is now known as dviroel|out21:48
opendevreviewMerged openstack/openstack-ansible-lxc_hosts stable/xena: Prevent lxc.service from being restarted on package update  https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/85249822:19
