Thursday, 2020-10-15

recycleherojrosser: I see ur name for this all over the place. BTW I set my first debug right00:08
recycleheroOct 15 03:31:40 infra1-utility-container-3e3911b0 ansible-openstack.cloud.os_project[12159]: Invoked with cloud=default state=present name=service description=Keystone Identity Service domain_id=default endpoint_type=admin validate_certs=True interface=admin wait=True timeout=180 properties={} enabled=True auth_type=None auth=NOT_LOGGING_PARAMETER region_name=None availability_zone=None00:08
recycleheroca_cert=None client_cert=None client_key=NOT_LOGGING_PARAMETER api_timeout=None00:08
recycleherotask [OS_keystone: Add service project]00:08
recycleheroerror: openstacksdk required00:08
recycleherothe log is from the utility container00:08
recycleheroI added a debug and saw that regardless of what I set keystone_service_setup_host: "{{ groups['utility_all'][0] }}" to, it is the utility container00:10
recycleheroI can't proceed with setup-openstack00:11
recycleheroit's my restore-attempt deployment00:11
*** gillesMo has joined #openstack-ansible00:11
recycleherojrosser: I think it didn't respect my --regen switch when recreating the tokens!00:15
*** macz_ has joined #openstack-ansible00:15
*** rf0lc0 has joined #openstack-ansible00:18
*** macz_ has quit IRC00:20
*** MickyMan77 has quit IRC00:42
*** MickyMan77 has joined #openstack-ansible00:42
*** MickyMan77 has quit IRC00:51
*** gyee has quit IRC01:04
openstackgerritMerged openstack/openstack-ansible-galera_server stable/train: Bump galera version  https://review.opendev.org/75748301:05
*** rf0lc0 has quit IRC01:06
*** macz_ has joined #openstack-ansible01:09
*** macz_ has quit IRC01:14
*** MickyMan77 has joined #openstack-ansible01:20
*** NewJorg has quit IRC01:28
*** MickyMan77 has quit IRC01:29
*** cshen has joined #openstack-ansible01:36
*** cshen has quit IRC01:40
*** spatel has joined #openstack-ansible01:44
*** MickyMan77 has joined #openstack-ansible02:03
*** MickyMan77 has quit IRC02:11
*** NewJorg has joined #openstack-ansible02:45
*** MickyMan77 has joined #openstack-ansible02:46
*** MickyMan77 has quit IRC02:55
*** MickyMan77 has joined #openstack-ansible03:35
*** MickyMan77 has quit IRC04:19
*** MickyMan77 has joined #openstack-ansible04:20
*** evrardjp has quit IRC04:33
*** evrardjp has joined #openstack-ansible04:33
*** MickyMan77 has quit IRC04:35
*** spatel has quit IRC04:38
*** MickyMan77 has joined #openstack-ansible04:39
*** nurdie has quit IRC04:43
*** MickyMan77 has quit IRC04:48
*** MickyMan77 has joined #openstack-ansible04:50
*** MickyMan77 has quit IRC05:00
*** MickyMan77 has joined #openstack-ansible05:03
*** MickyMan77 has quit IRC05:07
*** MickyMan77 has joined #openstack-ansible05:10
*** MickyMan77 has quit IRC05:16
*** miloa has joined #openstack-ansible06:02
*** jbadiapa has joined #openstack-ansible06:38
*** andrewbonney has joined #openstack-ansible07:01
*** MickyMan77 has joined #openstack-ansible07:05
openstackgerritDmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible-os_magnum master: Use openstack_service_*uri_proto vars by default  https://review.opendev.org/41068107:21
*** rpittau|afk is now known as rpittau07:22
jrossermorning07:25
*** cshen has joined #openstack-ansible07:26
noonedeadpunkmorning07:26
noonedeadpunkjrosser: did you get the idea of https://review.opendev.org/#/c/758207 vs https://review.opendev.org/#/c/737221/ ?07:27
*** yolanda__ has joined #openstack-ansible07:27
noonedeadpunkand what does the first one do at all?07:28
masterpeIsn't it better to look at the number of cores/CPUs that are available?07:28
*** maharg101 has joined #openstack-ansible07:29
masterpeto determine the number of threads?07:29
jrosseri am not sure i understand the second patch07:30
jrosserhttps://review.opendev.org/#/c/758207/ <- that one07:30
noonedeadpunktechnically the second patch is yours, but I guess you don't get the other one (which is the same for me)07:30
noonedeadpunkwell Adri2000 posted a comment on 737221 and that's how I paid attention to it07:31
jrosserthe sed looks weird07:32
noonedeadpunkand what `ANSIBLE_FORKS_VALUE` is - I have no idea07:32
jrosserno, me neither07:32
*** cloudnull has quit IRC07:33
noonedeadpunkok, good, then it's not me not seeing something obvious :) or at least not me only07:33
jrossermy number of 20 was kind of arbitrary07:33
jrosserbased on the recommendations for AIO cpus really07:34
noonedeadpunkbut an AIO kind of requires way more CPUs than needed for just a deploy host?07:34
*** cloudnull has joined #openstack-ansible07:34
jrosserbut it's based on the maximum number of containers in the control plane07:34
jrosseryeah but it's deploy host threads though, and i'm not sure that means fully utilised CPUs07:35
jrosserit's number of parallel tasks07:35
noonedeadpunklike I have 2 cpus for some deploy host :p07:35
jrosserright, but if you run 20 tasks in parallel which each take N seconds on the target you dont need N cpus on the deploy host at 100% to do that?07:36
jrosser^ confused use of N there, sorry07:36
jrosserright, but if you run 20 tasks in parallel which each take N seconds on the target you dont need 20 cpus on the deploy host at 100% to do that?07:36
noonedeadpunkdepends on how many seconds each task takes, honestly07:36
noonedeadpunkin terms of threads and cpu cycles07:37
jrosserhmm well maybe we need something dynamic then07:38
noonedeadpunkand what really disturbs me is the number of ssh sessions here. I know we talked about that, but we need to guarantee that we do increase them07:38
noonedeadpunkor deployers will07:38
jrosserwhat i saw was with AIO (8cpu 8G) it would run the tasks in several batches particularly for lots of containers on the controller07:39
jrosserand you could get a good speedup by increasing the forks to make it do just one batch07:39
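The batch effect jrosser describes can be sketched with a little shell arithmetic (the numbers below are illustrative, not from a real deployment): a task over H hosts with F forks runs in ceil(H / F) batches, so raising forks can collapse several batches into one.

```shell
# A task over H hosts with F forks runs in ceil(H / F) batches,
# so bumping forks reduces the number of sequential batches.
batches() { echo $(( ($1 + $2 - 1) / $2 )); }

batches 35 5    # e.g. 35 containers at the old default of 5 forks -> 7
batches 35 20   # the same containers at 20 forks -> 2
```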
noonedeadpunkand for aio we have https://opendev.org/openstack/openstack-ansible/src/branch/master/tests/bootstrap-aio.yml#L69 which is not applied to regular deployments07:40
noonedeadpunkand we also need to adjust this https://opendev.org/openstack/openstack-ansible/src/branch/master/doc/source/admin/maintenance-tasks/ansible-modules.rst#user-content-ansible-forks07:40
jrossertbh that feels out of date info07:41
jrosserbut if this is difficult we can leave it, or just make an optimisation for CI07:42
noonedeadpunkwell, partially07:42
Adri2000in https://review.opendev.org/#/c/758207/ ANSIBLE_FORKS_VALUE is a placeholder in openstack-ansible.rc ... and that is replaced by the actual value07:42
noonedeadpunkand what is that value?07:42
*** tosky has joined #openstack-ansible07:42
Adri2000it's computed in scripts-library.sh07:43
Adri2000my patch assumes we don't remove https://review.opendev.org/#/c/737221/2/scripts/scripts-library.sh07:43
noonedeadpunkwell, it doesn't have ANSIBLE_FORKS_VALUE07:43
noonedeadpunkand according to http://codesearch.openstack.org/?q=ANSIBLE_FORKS_VALUE&i=nope&files=&repos= this var is introduced with that patch07:44
Adri2000`sed -i "s|ANSIBLE_FORKS_VALUE|${ANSIBLE_FORKS}|g" /usr/local/bin/openstack-ansible.rc` < the ANSIBLE_FORKS_VALUE placeholder in openstack-ansible.rc is replaced by ${ANSIBLE_FORKS} which is computed in scripts-library.sh07:44
* jrosser confused07:44
Adri2000this change https://review.opendev.org/#/c/758207/3/scripts/openstack-ansible.rc makes sure the ANSIBLE_FORKS env var is defined to either a user defined ANSIBLE_FORKS env var or to ANSIBLE_FORKS_VALUE which will be an actual number (it will have been replaced by bootstrap-ansible.sh)07:46
Adri2000this change https://review.opendev.org/#/c/758207/3/scripts/bootstrap-ansible.sh replaces ANSIBLE_FORKS_VALUE by the value ${ANSIBLE_FORKS} which is computed in scripts-library.sh (https://review.opendev.org/#/c/737221/2/scripts/scripts-library.sh)07:46
noonedeadpunkI still didn't get why in the world we need an ANSIBLE_FORKS_VALUE env var07:47
Adri2000there is no ANSIBLE_FORKS_VALUE env var07:47
noonedeadpunkand why we need to replace something that does not exist?07:47
noonedeadpunkwell, and you introduce it here ? https://review.opendev.org/#/c/758207/3/scripts/openstack-ansible.rc07:48
Adri2000`export ANSIBLE_FORKS="${ANSIBLE_FORKS:-ANSIBLE_FORKS_VALUE}"` will become e.g. `export ANSIBLE_FORKS="${ANSIBLE_FORKS:-10}"` after you've run bootstrap-ansible07:48
noonedeadpunkby defining it as default for ANSIBLE_FORKS?07:48
Adri2000I define an ANSIBLE_FORKS env var07:48
Adri2000just look at the line before, it's the same model: `export ANSIBLE_PYTHON_INTERPRETER="${ANSIBLE_PYTHON_INTERPRETER:-OSA_ANSIBLE_PYTHON_INTERPRETER}"`07:48
Adri2000OSA_ANSIBLE_PYTHON_INTERPRETER is a placeholder as well07:49
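Adri2000's placeholder model can be reproduced standalone (a sketch; the temp file path and the value 10 stand in for the real rc file and the number computed in scripts-library.sh):

```shell
# bootstrap-ansible.sh sed-substitutes the ANSIBLE_FORKS_VALUE token in the
# rc file with a computed number; afterwards the rc file defaults
# ANSIBLE_FORKS to that number unless the user exports their own value.
rc=$(mktemp)
echo 'export ANSIBLE_FORKS="${ANSIBLE_FORKS:-ANSIBLE_FORKS_VALUE}"' > "$rc"

ANSIBLE_FORKS=10   # stand-in for the value computed in scripts-library.sh
sed -i "s|ANSIBLE_FORKS_VALUE|${ANSIBLE_FORKS}|g" "$rc"
cat "$rc"          # -> export ANSIBLE_FORKS="${ANSIBLE_FORKS:-10}"
```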
noonedeadpunkbut we have this var defined https://opendev.org/openstack/openstack-ansible/src/branch/master/scripts/bootstrap-ansible.sh#L5907:49
jrosserimho this is confusing because of the overloading of ANSIBLE_FORKS07:49
Adri2000yes07:49
Adri2000this https://opendev.org/openstack/openstack-ansible/src/branch/master/scripts/bootstrap-ansible.sh#L46 defines ${ANSIBLE_FORKS}07:50
jrosserto make this clean the only place that ANSIBLE_FORKS should be defined is in the .rc file07:50
jrosserand we should have a different var OSA_ANSIBLE_FORKS calculated in scripts library which becomes the default value, based on the the deploy host CPUs07:50
noonedeadpunk+107:51
Adri2000that's fine to me. I believe my patch works as is, but I can understand you find the use of names confusing. I kind of wanted to make the patch as small as possible to fix the actual problem I had. (I had to use -f X on each openstack-ansible run to define the number of forks)07:52
jrossermaking the code more obvious is a good thing, so if the patch is a bit bigger then thats totally OK07:54
noonedeadpunkanother option for me would be just dropping ANSIBLE_FORKS_VALUE and related sed to it07:54
noonedeadpunkbut I like jrosser's idea07:54
noonedeadpunkbut actually, I think we should decide if we probably want just some static bump07:55
noonedeadpunkas I like this idea as well, except the nuance with MaxSessions07:55
noonedeadpunkbut considering that it will be pretty easy to override in bashrc...07:56
noonedeadpunkmaybe we should really just increase number of forks for aio?07:57
jrossersimple is good too07:58
noonedeadpunkbut yeah, again, with my 2 cores I can really run way more threads than 207:58
jrossergood question08:02
noonedeadpunkso I see 3 roads - leave it based on the amount of cpus (which is not so effective considering that we probably should multiply it) and make use of ANSIBLE_FORKS, add a MaxSessions bump for sshd on the deploy host somehow, or just set a fixed number like 10?08:02
noonedeadpunkwell, starting to really use ANSIBLE_FORKS will be applicable anyway08:03
jrosseryes, because we can't even make an AIO/CI special case without that08:03
noonedeadpunkso https://review.opendev.org/#/c/758207/  might be a good and backportable shot08:04
jrosserfwiw i am using LXDs for deploy hosts, a bunch of them on the same machine08:04
jrosserso they all have all the host CPUs should they need them08:04
jrosseryes i think https://review.opendev.org/#/c/758207/ is good, if the variable names get regularised to OSA_.... for things replaced in the rc file08:06
noonedeadpunkwell, we actually need to bump ssh sessions just for lxc hosts08:06
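The sshd bump being discussed (review 758364) amounts to raising OpenSSH's per-connection session cap on the LXC hosts; the value below is illustrative, not necessarily the one from the patch:

```
# /etc/ssh/sshd_config on the LXC host (OpenSSH's default MaxSessions is 10)
MaxSessions 100
```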
Adri2000jrosser: will prepare and push that change right now so you can both have a look08:08
jrosserrecyclehero: the openstacksdk is installed into the utility container venv here https://github.com/openstack/openstack-ansible/blob/master/playbooks/utility-install.yml#L17608:09
jrosserrecyclehero: the list of things which get installed into the utility venv is here https://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/group_vars/utility_all.yml#L7508:10
jrosserrecyclehero: handlers run at the end of a play, but only if the task which notifies them has a status of 'changed'08:11
openstackgerritAdrien Cunin proposed openstack/openstack-ansible master: Actually use ANSIBLE_FORKS in openstack-ansible.rc  https://review.opendev.org/75820708:11
*** CeeMac has joined #openstack-ansible08:12
noonedeadpunkI wish we dropped these seds08:24
*** yolanda__ has quit IRC08:24
*** yolanda__ has joined #openstack-ansible08:24
openstackgerritDmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible-lxc_hosts master: Increase amount of MaxSessions  https://review.opendev.org/75836408:25
openstackgerritDmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible master: Increase default ansible forks from 5 to 20  https://review.opendev.org/73722108:30
openstackgerritDmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible master: Increase default ansible forks from 5 to 20  https://review.opendev.org/73722108:30
* noonedeadpunk still can't understand 758207 and why we set default of ANSIBLE_FORKS to undefined OSA_ANSIBLE_FORKS ;(08:33
noonedeadpunkthat's kind of what we're doing http://paste.openstack.org/show/799066/08:36
recycleheromorning08:36
noonedeadpunkah ok08:36
recycleherojrosser: so I should see openstacksdk in /openstack/venvs/utility-21.0.1/bin inside the openstack container08:36
noonedeadpunkin case we have ANSIBLE_FORKS defined we will replace with actual number08:36
recyclehero*utility container08:37
jrosseropenstacksdk is the name of the python package08:37
recycleherofor some reason it isn't present on the utility container08:38
recycleheroor maybe a symlink isnt created08:39
recycleherosome of these tasks have run_once on them08:40
recycleherocould it be causing this08:40
jrosserhow have you confirmed that openstacksdk is not present?08:41
recycleherotask [OS_keystone: Add service project] error08:42
recycleheroit says openstack sdk required08:42
jrosseryou said you have some trouble with deploying the utility container08:44
jrosserif thats somehow not worked then the rest of the roles are not going to work08:44
recycleherocan I delete the container and redeploy it?08:45
jrosseryou can do that, yes08:47
recycleherothe link is there openstack -> /openstack/venvs/utility-21.0.1/bin/openstack08:47
recycleherobut the actual bin ain't08:47
recycleheroso delete with lxc and run utility-install08:49
jrosseryou could try re-running the utility playbook with -e venv_rebuild=yes08:49
recycleherodoh, too late. msg": "Destination /var/lib/lxc/infra1_utility_container-3e3911b0/config does not exist !08:51
recycleheroI think I should do setup-host too08:51
jrosserrecyclehero: there is a playbook specifically to create the containers, and you can use --limit to make it very specific which will speed things up considerably08:56
jrosserit's worth spending some time understanding what's inside the setup-*.yml playbooks, because all of the more granular things can be called directly08:57
recycleheroI will go with limit first to see how it works then I will check that out. thanks08:58
jrosserhttps://docs.openstack.org/openstack-ansible/latest/admin/maintenance-tasks.html#destroy-and-recreate-containers09:01
MickyMan77which openstack-ansible version is the latest stable for use with CentOS 8 ?09:09
jrosserMickyMan77: that would be 21.1.0 on the ussuri branch09:11
jrossernoonedeadpunk: we need to fix the uwsgi role https://review.opendev.org/#/c/758108/09:17
openstackgerritJonathan Rosser proposed openstack/openstack-ansible-os_tempest master: Fix tempest init logic  https://review.opendev.org/75339309:26
openstackgerritJonathan Rosser proposed openstack/openstack-ansible-galera_server master: Update galera to 10.5.6  https://review.opendev.org/74210509:29
openstackgerritJonathan Rosser proposed openstack/openstack-ansible-lxc_hosts master: Increase amount of MaxSessions  https://review.opendev.org/75836409:31
recycleherowhen does password setting take place? like placement service in keystone09:31
jrosserhttps://github.com/openstack/openstack-ansible-os_placement/blob/master/tasks/main.yml#L87-L11509:33
*** Nick_A has quit IRC09:34
openstackgerritMerged openstack/openstack-ansible-os_placement stable/train: Trigger service restart  https://review.opendev.org/75774509:45
openstackgerritMerged openstack/openstack-ansible-os_cinder stable/ussuri: Trigger uwsgi restart  https://review.opendev.org/75771210:02
openstackgerritErik Berg proposed openstack/openstack-ansible stable/ussuri: WIP/DNM: Upgrade ceph to octopus during run-upgrade.sh to ussuri  https://review.opendev.org/75838210:07
*** admin0 has quit IRC10:27
*** yolanda__ is now known as yolanda10:52
recycleheroguys how do you deploy without getting logged out as an effect of the security hardening of the hosts? nohup, output redirection?11:08
recycleheroI deploy from infra111:11
openstackgerritMerged openstack/ansible-role-uwsgi master: Add vars file for ubuntu bionic  https://review.opendev.org/75810811:11
recycleherowhen it takes a while for example on wheel builds I get logged out either on ssh or localhost login11:13
*** dave-mccowan has joined #openstack-ansible11:16
openstackgerritDmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible master: Fix infra jobs  https://review.opendev.org/75839911:20
openstackgerritDmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible-galera_server master: Update galera to 10.5.6  https://review.opendev.org/74210511:21
*** yann-kaelig has joined #openstack-ansible11:32
ebbexrecyclehero: i deploy from a vm with screen, so i'm not sure, but you can set "security_rhel7_session_timeout: 0" in user_variables.yml, and "ServerAliveInterval 300" in your .ssh/config11:33
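ebbex's first suggestion as a config fragment (the variable name is quoted from the log; path per the usual OSA layout):

```yaml
# /etc/openstack_deploy/user_variables.yml
# 0 disables the shell session timeout applied by the hardening role
security_rhel7_session_timeout: 0
```

The second suggestion, `ServerAliveInterval 300`, goes in `~/.ssh/config` on the machine you ssh *from*.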
openstackgerritMerged openstack/openstack-ansible-os_nova master: Enable notifications when Designate is enabled  https://review.opendev.org/75790411:37
openstackgerritMerged openstack/openstack-ansible-os_nova stable/ussuri: Simplify scheduler filter additions  https://review.opendev.org/75785811:37
openstackgerritErik Berg proposed openstack/openstack-ansible-os_nova stable/train: Simplify scheduler filter additions  https://review.opendev.org/75840411:43
recycleheroebbex: thanks, I changed the ServerAliveInterval.11:46
recycleherobut it gets overwritten by ansible!11:53
openstackgerritDmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible-os_nova stable/ussuri: Remove backported release note  https://review.opendev.org/75840611:54
jrosserrecyclehero: ebbex uses a seperate deploy host " i deploy from a vm" so i beleive he was referring to that host, not infra111:55
recycleheroit actually didn't overwrite ServerAliveInterval in sshd_config, but with it set to 0 I still got disconnected. I should look for something like a session timeout in the playbook and reverse it. it kicks me out in the middle of deployment11:57
jrosserif you are deploying from one of the target hosts then you'll need to set variables that change what the hardening role does11:57
ebbexrecyclehero: ServerAliveInterval 300 in .ssh/config on your local machine. The one you ssh *from* into infra1.11:58
jrosserebbex: i think infra1 is the deploy host here11:58
ebbexyeah, and he's getting disconnected from infra1 right?11:59
jrosserwon't the session timeout always win?11:59
ebbexsession timeout will win yes.12:00
jrosseri think your advice to adjust security_rhel7_session_timeout is whats needed12:00
ebbexcause he's deploying from a host that gets deployed to?12:00
jrossereither way i think?12:01
recycleherojrosser: yes, but I am on debian.12:01
recycleheroyes it's common as I don't see anything distro-specific in the role12:02
ebbexhe probably needs both. one to prevent session timeout on infra1, and two to prevent ssh disconnects from infra1 to his computer.12:02
openstackgerritGeorgina Shippey proposed openstack/ansible-role-systemd_service master: Greater flexibility to timer templating  https://review.opendev.org/75840812:05
openstackgerritAdrien Cunin proposed openstack/openstack-ansible-os_nova stable/ussuri: Enable notifications when Designate is enabled  https://review.opendev.org/75841112:09
recycleheroit's well hardened! declare -rx TMOUT="600" - can't easily override it (read-only). I am going cuckoo :D12:09
openstackgerritAdrien Cunin proposed openstack/openstack-ansible-os_nova stable/train: Enable notifications when Designate is enabled  https://review.opendev.org/75841212:09
*** cshen has quit IRC12:09
openstackgerritAdrien Cunin proposed openstack/openstack-ansible-os_nova stable/stein: Enable notifications when Designate is enabled  https://review.opendev.org/75841312:09
recycleheroat least now I know I should press enter every 10 minutes.12:11
ebbexthat's what setting security_rhel7_session_timeout is for, no?12:12
ebbexrecyclehero: ^12:12
*** rf0lc0 has joined #openstack-ansible12:15
openstackgerritErik Berg proposed openstack/openstack-ansible-os_nova stable/train: Simplify scheduler filter additions  https://review.opendev.org/75840412:17
recycleheroebbex: I wanted to do that in place. I am on it now12:22
jrossernoonedeadpunk: i think for stable branch octavia jobs we are really stuck like this https://review.opendev.org/#/c/672556/12:31
jrosseri don't think there is a suitable amphora image for that openstack release12:31
noonedeadpunkjrosser: let's just disable those tests then?12:34
jrosseryep - that would do it, the patch makes sense otherwise12:35
noonedeadpunkbtw, any thoughts about https://review.opendev.org/#/c/752059/ ?12:35
jrosseryeah - i think we are having similar issues12:35
jrosserjust not got round to check12:36
noonedeadpunkgot it12:36
noonedeadpunk(I was just pinged in the related bug report about what state it is in)12:36
jrosseri'll see if we can test it but it's not going to be immediate12:37
jrosserwe try to collect logs with journalbeat only on the hosts, and i think some were missing12:40
jrosserwhich could be broken mounts12:40
*** macz_ has joined #openstack-ansible12:40
openstackgerritJonathan Rosser proposed openstack/openstack-ansible-os_tempest master: Fix tempest init logic  https://review.opendev.org/75339312:43
*** macz_ has quit IRC12:45
*** cshen has joined #openstack-ansible12:57
openstackgerritJonathan Rosser proposed openstack/openstack-ansible-os_octavia stable/rocky: Save iptables rules for all Debian derivative operating systems  https://review.opendev.org/67255613:04
openstackgerritJonathan Rosser proposed openstack/openstack-ansible-lxc_hosts master: Increase amount of MaxSessions  https://review.opendev.org/75836413:05
MickyMan77Finally, :), I have a working openstack farm except for the network part. I can see dhcp request from the instances on the br-vlan interface when i do a tcpdump.13:09
MickyMan77The instances do not get any IP address on the NIC.  ----> http://paste.openstack.org/show/799080/13:09
jrosserMickyMan77: there are some quite comprehensive checklists here https://docs.openstack.org/openstack-ansible/latest/admin/troubleshooting.html13:22
openstackgerritMerged openstack/openstack-ansible-os_nova stable/ussuri: Remove backported release note  https://review.opendev.org/75840613:25
openstackgerritJonathan Rosser proposed openstack/openstack-ansible-os_ironic master: Updated from OpenStack Ansible Tests  https://review.opendev.org/75553613:30
*** nurdie has joined #openstack-ansible13:36
openstackgerritMerged openstack/openstack-ansible-lxc_hosts stable/train: copy the actual keyring  https://review.opendev.org/73162613:37
*** sshnaidm has quit IRC13:54
openstackgerritJonathan Rosser proposed openstack/openstack-ansible master: Switch from ansible-base + collections to ansible package  https://review.opendev.org/75843113:59
*** sshnaidm has joined #openstack-ansible14:00
jrossergoodness me how can ansible galaxy be *so* unreliable14:21
jrosserlike 80% of our jobs are failing14:21
*** macz_ has joined #openstack-ansible14:28
*** gshippey has joined #openstack-ansible14:29
* noonedeadpunk in ansible contributor summit and going to rent about that14:32
noonedeadpunk*rant14:32
*** macz_ has quit IRC14:33
noonedeadpunkjrosser: do you have bug report you've written to somewhere handy?14:34
jrosserhmm just need to find it!14:36
jrossernoonedeadpunk: https://github.com/ansible/galaxy/issues/230214:37
noonedeadpunkthanks!14:37
*** rgogunskiy has joined #openstack-ansible14:44
*** miloa has quit IRC14:53
*** macz_ has joined #openstack-ansible15:01
Adri2000would be happy to get a few more opinions on https://review.opendev.org/#/c/729533/ - I think the topic basically boils down to how we define "data" in the context of OSA LXC containers. I have always assumed that OSA LXC containers' "data" were the directories bind mounted from /openstack/... into the LXC containers, such as /var/lib/mysql/ for galera containers. which means that there15:03
Adri2000is no actual "data" for most containers (galera being the main exception); i.e. it's possible to completely destroy/delete and then recreate from scratch (well as long as the OSA inventory is there) most of the containers.15:03
openstackgerritMerged openstack/openstack-ansible master: Remove glance-registry from docs  https://review.opendev.org/73979415:11
openstackgerritDmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible master: Make collections installation more reliable  https://review.opendev.org/75845415:22
jrosser^ nice :)15:23
jrossernoonedeadpunk: i was also thinking about the job failures we get downloading upper-constraints.txt, we do that a lot on each run15:24
jrosserthat could be recovered from the requirements git repo on the CI node and maybe we put it in /openstack/requirements/<sha>/upper-constraints.txt or something15:26
jrosserthen change the url to file://....15:26
noonedeadpunkwell, I think we can set requirements as required project?15:26
noonedeadpunkand then zuul will get it - the only thing we need is to update a single variable in CI?15:27
jrosserright - but we set a specific sha and we'd need a specific step to extract just the right version of the file15:27
noonedeadpunkwell, yes...15:27
jrosserand putting it in /openstack/... makes it also be inside all the lxc :)15:27
jrosserit's nowhere near as bad as the galaxy thing but maybe the next most frequent thing that breaks15:28
recycleherothis python_venv_build behaves very strangely. sometimes on upgrades it wants to retry 5 times for the version, even after some failed runs which should have them locally.15:30
recycleheroand now this15:30
recycleherofatal: [infra1_horizon_container-9cb968e5 -> 172.29.239.138]: FAILED! => {"changed": false, "msg": "file not found: /var/www/repo/os-releases/21.0.1/horizon-21.0.1-constraints.txt"}15:30
recycleherothis is the task: but I think if I rerun it disappear15:31
recycleheroTASK [python_venv_build : Slurp up the constraints file for later re-deployment] ****15:31
openstackgerritDmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible master: Make collections installation more reliable  https://review.opendev.org/75845415:32
*** gyee has joined #openstack-ansible15:39
openstackgerritDmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible master: Make collections installation more reliable  https://review.opendev.org/75845415:44
*** rpittau is now known as rpittau|afk15:52
*** dave-mccowan has quit IRC15:53
*** dave-mccowan has joined #openstack-ansible15:57
recycleheroI am blaming the https connection to releases.openstack.org15:57
recycleherowhat is the variable to make it try like crazy rather than aborting the deployment after 5 tries?15:59
jrosserthere has been an outage at releases.openstack.org this afternoon16:03
jrosserit should be back now16:04
recycleheronot very stable16:04
recycleherobut this one is sticky: TASK [python_venv_build : Slurp up the constraints file for later re-deployment]16:04
recycleherofatal: [infra1_horizon_container-9cb968e5 -> 172.29.239.138]: FAILED! => {"changed": false, "msg": "file not found: /var/www/repo/os-releases/21.0.1/horizon-21.0.1-constraints.txt"}16:04
recycleherowhat should I do?16:05
openstackgerritDmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible master: Make collections installation more reliable  https://review.opendev.org/75845416:07
recycleheroI don't even know what Slurp means16:09
recycleheroBut should it be an address like this? https://opendev.com/openstack/requirements/raw/21.0.1/horizon-21.0.1-constraints.txt16:18
recycleheroit's not correct, I am trying to put it in there manually16:19
*** spatel has joined #openstack-ansible16:21
*** MickyMan77 has quit IRC16:25
openstackgerritDmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible master: Fix upgrade jobs for bind-to-mgmt  https://review.opendev.org/75846116:25
recycleheroignore_errors: yes :(16:29
openstackgerritDmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible-galera_server stable/stein: Bump galera version  https://review.opendev.org/75846216:30
openstackgerritDmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible-galera_client stable/stein: Bump galera version  https://review.opendev.org/75846416:31
jamesdentonjrosser with your haproxy and baremetal efforts, have you seen a need to override openstack_service_bind_address for uwsgi-based services?16:35
openstackgerritDmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible-os_magnum master: Fix linter errors  https://review.opendev.org/75556916:55
openstackgerritDmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible master: Fix upgrade jobs for bind-to-mgmt  https://review.opendev.org/75846116:58
jrosserjamesdenton: no i've not - if you're having to override that maybe we have a wrong default somewhere17:02
jrosserdo you have an example?17:02
jamesdentonwell, i have haproxy running on the same node as ironic-api (on baremetal). default for uwsgi host ip is 0.0.0.0, but haproxy is already listening on the same ports17:03
jrosserthe patch that enabled bind-to-mgmt should have set openstack_service_bind_address to something like {{ management_address }} iirc17:04
jamesdentoni might not have that patch17:04
noonedeadpunkhttps://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/group_vars/all/all.yml#L3817:07
noonedeadpunkit's only in master iirc17:07
jamesdentonbueno. i overrode to ansible_host, but looks good17:08
jamesdentonrunning ussuri here17:08
jamesdentonthank you17:08
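For reference, the override under discussion looks roughly like this in user_variables.yml (a sketch for releases where the uwsgi default is still 0.0.0.0; on master this is already the default per all.yml, and jamesdenton used ansible_host instead):

```yaml
# /etc/openstack_deploy/user_variables.yml
# Bind uwsgi services to the management address instead of 0.0.0.0,
# so haproxy on the same node can keep the public ports.
openstack_service_bind_address: "{{ management_address }}"
```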
noonedeadpunkjamesdenton: and you're running U on metal?17:08
noonedeadpunkI guess you're if asking:)17:09
noonedeadpunkit kind of means that we've failed with our plan to do a CI-only thing jrosser17:09
jamesdentonkinda sorta. this is actually my home lab setup, which started as a rocky? environment a couple of years ago and has been upgraded. slowly moving lxc->baremetal17:10
jamesdentonwe run Stein on baremetal in prod, but have dedicated haproxy nodes there17:10
noonedeadpunkjamesdenton: well, I think we will need some migration plan then....17:10
noonedeadpunkas what I did was https://review.opendev.org/#/c/758461/17:11
noonedeadpunk(not sure it's working at all at the moment)17:11
noonedeadpunkbut not suitable for prod for sure17:11
jamesdentonor stein environments are greenfield, fwiw17:11
jamesdenton*our17:12
jamesdentonmy migration plan has been to copy the env.d files to /etc/openstack_deploy/env.d per service, set baremetal=true, remove the existing lxc container from inventory, regenerate inventory, and redeploy the respective service. But these oddities show up, like haproxy. i'd have to be much more methodical about it17:13
noonedeadpunkWell, I was meaning bare metal deployment with external haproxy/tons of overrides on U to V bare metal with our bind-to-mgmt17:15
noonedeadpunkand bad thing about that is ppl might have tons of different ways to workaround....17:16
jamesdentonagreed17:16
noonedeadpunkofftop - dunno how to comment that (it's just beginning of the work) https://github.com/ansible-collections/cloud.roles17:17
recycleheroguys in the repo container /var/www/repo/os-releases/21.0.1 every service has 4 files but horizon is missing the constraints one?17:18
recyclehero.17:18
jamesdentoni'm curious as to which ton of overrides you're referring to for external haproxy17:18
noonedeadpunkI was referring to overrides in case of internal haproxy)17:19
jamesdentonoh ok17:19
recycleherowhat are these .txt files17:19
recyclehero?17:19
jamesdentonyeah, well, like you said, who knows how many people were/are actually doing that?17:19
jamesdentoni kinda feel like if you go off the reservation, you're on your own, to an extent. But if there's a "sanctioned" architecture and migration plan, that's the one you test against17:20
*** andrewbonney has quit IRC17:20
noonedeadpunkwell yes, fair17:20
recycleheroopenstack-ansible repo-install.yml -e "venv_rebuild=True"17:29
recycleherowould this help?17:29
noonedeadpunkrecyclehero: help with what?17:31
noonedeadpunkyou have error installing horizon?17:31
recycleheroyes17:32
recycleheroit complains about a file missing17:32
recycleherohorizon-21.0.1-constraints.txt17:32
recycleheroit should be in the repo server, and I checked its there for them all except horizon.17:33
noonedeadpunkopenstack-ansible os-horizon-install.yml -e "venv_rebuild=True"17:33
recycleherothen continue with setup-openstack ?17:34
noonedeadpunkwell either this or manually run rest of playbooks for services that you want to deploy17:37
noonedeadpunkhttps://opendev.org/openstack/openstack-ansible/src/branch/master/playbooks/setup-openstack.yml#L25-L5017:38
noonedeadpunkbut for core I think horizon is one of the last ones17:38
recycleheronoonedeadpunk: btw is this in 21.1.0?17:41
recycleherohttps://review.opendev.org/#/c/751724/17:43
noonedeadpunkum...... it's not but I was pretty sure that it is...17:44
noonedeadpunkah wait17:44
*** MickyMan77 has joined #openstack-ansible17:48
noonedeadpunkrecyclehero: yep, it's included17:49
noonedeadpunkhttps://opendev.org/openstack/openstack-ansible/src/tag/21.1.0/ansible-role-requirements.yml#L4417:49
recycleherogreat, how did you find out that this commit is included in that lxc_hosts version?17:53
noonedeadpunkwell by the commit SHA17:57
noonedeadpunkyou may see that for stable/ussuri that sha is exactly this commit https://opendev.org/openstack/openstack-ansible-lxc_hosts/commits/branch/stable/ussuri17:58
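The SHA check noonedeadpunk describes can be scripted. A minimal sketch, assuming an ansible-role-requirements.yml-style file (the file contents and SHAs below are made up for illustration — point it at the real file from the tag you care about, then confirm ancestry with git in the role checkout):

```shell
# Build a sample requirements file; in practice this would be the
# ansible-role-requirements.yml from the openstack-ansible tag.
cat > /tmp/arr-sample.yml <<'EOF'
- name: lxc_hosts
  src: https://opendev.org/openstack/openstack-ansible-lxc_hosts
  version: 1234abcd
- name: galera_server
  src: https://opendev.org/openstack/openstack-ansible-galera_server
  version: 5678efgh
EOF

# Print the pinned version for a given role name.
# awk remembers the last "name:" seen and emits the matching "version:".
pinned_sha() {
  awk -v role="$1" '
    /name:/    { name = $NF }
    /version:/ { if (name == role) print $NF }
  ' /tmp/arr-sample.yml
}

pinned_sha lxc_hosts   # prints 1234abcd
# On a deploy host you could then verify a commit is included with e.g.:
#   git -C /etc/ansible/roles/lxc_hosts merge-base --is-ancestor <commit> HEAD
```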
recycleherogot it thanks17:59
MickyMan77Hi, I ran into a problem with cloud-init and SSH keys. "no authorized SSH keys fingerprints found for user debian" --> http://paste.openstack.org/show/799088/18:00
MickyMan77is there any easy way to fix it?18:00
fridtjof[m]Hey, I'm encountering a lot of instability in an environment. (especially when creating instances) nova-api-wsgi is losing connection to rabbitmq a lot - rabbit is closing the connections due to missing heartbeats, which would indicate that nova-api-wsgi is somehow failing to send those properly?18:01
fridtjof[m]any ideas?18:01
fridtjof[m]this env is on train 20.1.718:01
spatelMickyMan77: did you try other distro?18:01
MickyMan77same issue with debian and centos 818:02
fridtjof[m]whoa, i just got a huge exception in the log18:03
spatelMickyMan77: I think neutron metadata services provide those function so make sure its up and running - https://docs.openstack.org/nova/latest/user/metadata.html18:03
fridtjof[m]http://paste.openstack.org/show/799093/18:04
recycleheroMickyMan77: these comes to my mind 1- check metadata 2-maybe write some key directly to the volume using libguestfs tools and inspect more18:06
jrosserMickyMan77: you did create/upload an ssh keypair?18:06
MickyMan77i did upload the key.18:07
spatelfridtjof[m]: i had that kind of issue when i was using F5 load-balancer and found TCP timeout setting was different on F5 and it was closing connection18:07
*** djhankb has quit IRC18:07
fridtjof[m]this is a plain haproxy setup unfortunately18:08
fridtjof[m]just one infra node18:08
jrosserMickyMan77: this is not good 87.370969] cloud-init[419]: 2020-10-15 17:45:00,488 - util.py[WARNING]: No active metadata service found18:08
noonedeadpunkfridtjof[m]: well I'm wondering if everything with network is ok, and considering it's ok, then I think it's worth checking if you have some rabbit queue overflowed with unread messages18:08
MickyMan77I will take a look at metadata service18:09
spatelyes, network issue, mtu (could be) or an unhealthy MQ.18:09
noonedeadpunkyou can check that with `rabbitmqctl -p /nova list_queues | egrep -v "0$"` (but vhosts are nova,cinder,glance,neutron,etc)18:09
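noonedeadpunk's filter can be wrapped in a small loop over the vhosts. One caveat worth noting: the bare `"0$"` pattern also hides queues with a backlog ending in 0 (e.g. 10 messages); anchoring on whitespace before the zero is safer. A sketch, with sample `list_queues`-style output standing in for a live broker (queue names and counts below are made up):

```shell
# Stand-in for `rabbitmqctl -p /nova list_queues` output:
# queue name, then message count.
sample_queues() {
  cat <<'EOF'
compute.node1 0
scheduler 42
conductor 0
notifications.info 1337
EOF
}

# Keep only queues with a non-zero backlog. "[[:space:]]0$" drops
# exact zeros but keeps counts like 10 or 1337, unlike a plain "0$".
backlogged() {
  grep -Ev '[[:space:]]0$'
}

sample_queues | backlogged
# On a real controller, loop over the service vhosts instead:
#   for v in nova cinder glance neutron heat; do
#     echo "== /$v =="
#     rabbitmqctl -p "/$v" list_queues | grep -Ev '[[:space:]]0$'
#   done
```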
fridtjof[m]network shouldn't be an issue, the containers are all on the same host in this case18:09
noonedeadpunkcan you have OOM? :)18:09
fridtjof[m]sorry for the dumb question, but how do I check rabbitmq's health?18:10
openstackgerritMerged openstack/openstack-ansible-os_octavia stable/rocky: Save iptables rules for all Debian derivative operating systems  https://review.opendev.org/67255618:10
fridtjof[m]ah18:10
noonedeadpunkfridtjof[m]: well, there's dashboard and cli util and...18:10
fridtjof[m]128GB on the infra host, oom shouldn't be a problem18:10
MickyMan77is the website https://docs.openstack.org/ down ?18:11
noonedeadpunkwell yeah18:11
fridtjof[m]hm, queues are empty18:11
spatelis this AIO deployment?18:11
noonedeadpunkMickyMan77: yep :(18:11
fridtjof[m]not quite, it's one infra + storage host, and two compute hosts18:12
fridtjof[m]what i find weird is that rabbitmq gives me this log output a lot:18:12
fridtjof[m]2020-10-15 18:09:11.722 [error] <0.17759.2> closing AMQP connection <0.17759.2> (10.1.70.227:50126 -> 10.1.70.29:5671 - uwsgi:9290:3e5c3b51-cd0f-4527-9035-cb29d21c23fd):18:12
fridtjof[m]missed heartbeats from client, timeout: 60s18:12
fridtjof[m]227 is the nova-api container18:12
spatelfridtjof[m]: it's normal I believe, I am seeing this in my network very randomly (I believe it's kind of a bug)18:13
spatelrun tcpdump on port 567118:13
fridtjof[m]it correlates with nova-api constantly logging this: http://paste.openstack.org/show/799094/18:14
noonedeadpunkfridtjof[m]: oh, and you're running rabbit with ssl?18:14
fridtjof[m]yeah, i'm just wondering if it's a load + timeout issue for my problems18:14
spatelthat is not good error message18:14
fridtjof[m]whatever the default in OSA is18:15
noonedeadpunkI think by default we don't use ssl for rabbit/mysql as for now18:15
noonedeadpunkbut exception is thrown by ssl module at the end18:16
noonedeadpunkah, we use ssl by default18:17
fridtjof[m]hm, i'll change nova-api to use 5672 without ssl then18:19
noonedeadpunkalso, what I used to run to fix rabbit - openstack-ansible playbooks/rabbitmq-install.yml -e rabbitmq_upgrade=true18:20
noonedeadpunkI think you can just change this for nova...18:20
noonedeadpunkthis re-creates queues so might fix things if rabbit starts acting wierdly18:21
fridtjof[m]i did that yesterday (as part of a minor upgrade), but it didn't really help18:21
fridtjof[m]oh, oops. I rebooted the entire host, not the container18:22
fridtjof[m]that was dumb18:23
spatelnoonedeadpunk: why do we need to use -e rabbitmq_upgrade=true ?18:33
fridtjof[m]alright, seems the reboot kind of helped18:35
fridtjof[m]the rabbitmq related issues are gone (for now, at least)18:35
fridtjof[m]but my base problem is still there >_>18:35
jrosserjamesdenton: with the haproxy/metal/bind-to-mgmt theres kind of two things in play18:36
jrosserwithout the bind-to-mgmt patches all the services were bound to 0.0.0.0 so thats the first thing that needs cleaning up, and those changes were a precursor to landing the haproxy+metal patch18:37
jrosserhowever for the prod deploys you mention where haproxy was on separate nodes then that wouldn't have been an issue, so i would think that the changes we have made in V might not be so impactful there18:38
fridtjof[m]alright, i can pin down at least this issue now - creating an instance on an external network times out because network binding fails18:38
fridtjof[m]and i can see a steady stream of exceptions on one compute  http://paste.openstack.org/show/799095/18:39
fridtjof[m]"permission denied" sounds like the agent is misconfigured?18:39
noonedeadpunkthis sounds like missing sudo18:39
jrosserbut..... if there are deploys where great effort has been done to make haproxy co-exist with a metal deploy infra node, thats where the upgrade might be more tricky18:40
noonedeadpunkdo you have sudo binary on compute?18:40
fridtjof[m]sudo is present18:40
fridtjof[m]yeah18:40
*** djhankb has joined #openstack-ansible18:40
noonedeadpunkthen probably some command/path is missing from /etc/neutron/rootwrap.d/18:42
noonedeadpunkbut um, there are tons of stuff in there18:42
noonedeadpunkso if there's a command it tries to execute before that stack trace, it's probably worth enabling debug and restarting the service to see on which exact command it fails to gain permissions18:43
fridtjof[m]trying that18:45
fridtjof[m]debug output doesn't give me the exact command line :/18:50
fridtjof[m]i'll resort to adding some log statements now lol18:50
recycleheroguys I want to restore mariadb in a 2-node env. 1 controller, so a 1-node galera cluster.18:56
recycleheroI am planning to18:56
recyclehero1- stop mariadb 2- restore from backup 3- /etc/init.d/mysql start --wsrep-new-cluster18:57
recycleherosounds okay?18:58
recycleherohttps://docs.openstack.org/openstack-ansible/ussuri/admin/maintenance-tasks.html#galera-cluster-recovery18:59
recycleherorecovering primary component link is broken on this page19:00
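recyclehero's single-node bootstrap plan usually has one extra prerequisite: galera refuses `--wsrep-new-cluster` unless `grastate.dat` says `safe_to_bootstrap: 1`. A hedged sketch of that step — the snippet operates on a temp copy so it can run anywhere; on a real node the file lives in the mariadb datadir (e.g. /var/lib/mysql/grastate.dat), and the sample contents below are illustrative:

```shell
# Temp stand-in for /var/lib/mysql/grastate.dat
GRASTATE=/tmp/grastate.dat
cat > "$GRASTATE" <<'EOF'
# GALERA saved state
version: 2.1
uuid:    00000000-0000-0000-0000-000000000000
seqno:   -1
safe_to_bootstrap: 0
EOF

# Flip safe_to_bootstrap so mysqld will accept --wsrep-new-cluster.
allow_bootstrap() {
  sed -i 's/^safe_to_bootstrap: 0$/safe_to_bootstrap: 1/' "$1"
}

allow_bootstrap "$GRASTATE"
grep safe_to_bootstrap "$GRASTATE"   # safe_to_bootstrap: 1
# Then, on the real node:
#   systemctl stop mariadb          # or: /etc/init.d/mysql stop
#   <restore the datadir from backup>
#   galera_new_cluster              # or: /etc/init.d/mysql start --wsrep-new-cluster
```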
fridtjof[m]huh, it's on both compute nodes though.19:00
fridtjof[m]they both get "permission denied"19:01
*** cshen has quit IRC19:17
*** cshen has joined #openstack-ansible19:17
*** yann-kaelig has quit IRC19:18
MickyMan77jrosser: when I check the meta via haproxy I do get info.. --> http://paste.openstack.org/show/799098/19:19
MickyMan77jrosser: but when I tcpdump the br-vlan nic I can't see any outgoing traffic from the newly created instances to the meta service.19:20
*** gregwork has quit IRC19:26
*** cshen has quit IRC19:26
fridtjof[m]okay, it's trying to add a link local ipv6 address to a brq interface...? and that gets a "permission denied"19:30
fridtjof[m]looks like it's not a permission problem after all.19:32
fridtjof[m]root@compute2-CP6NY03:~# ip a add fe80::ac59:20ff:fe4c:8cff/64 dev brqf7424189-aa19:32
fridtjof[m]RTNETLINK answers: Permission denied19:32
fridtjof[m]i replicated what it's trying to do19:32
fridtjof[m]okay, seems like that address already exists on eth1219:33
fridtjof[m]I configured my environment according to https://docs.openstack.org/openstack-ansible/train/user/prod/example.html19:35
*** MickyMan77 has quit IRC19:48
fridtjof[m]okay, found the cause for the exception, but I have no idea why that is the case20:07
fridtjof[m]when I launch an instance attached to flat provider network (wired up on compute hosts through the br-vlan-veth/eth12 pair), a bridge "brq<guid>" gets created, eth12 and the VM's tap adapter get added to it20:08
fridtjof[m]then neutron-linuxbridge-agent is trying to add a link local ipv6 to the bridge, but /proc/sys/net/ipv6/conf/brqf7424189-aa/disable_ipv6 is 120:09
fridtjof[m]and that's where the "permission denied" is coming from20:09
fridtjof[m]now, the question remains: why is that set to 1, and for what is it adding a link local v6 address to that adapter anyway?20:10
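The check-and-flip fridtjof did by hand can be sketched as below. A bridge whose `disable_ipv6` knob is 1 rejects any IPv6 address add with "permission denied" (EACCES from RTNETLINK). The snippet uses a temp file standing in for the real /proc path so it can run anywhere, and the bridge name in the comment is taken from fridtjof's paste:

```shell
# Temp stand-in for /proc/sys/net/ipv6/conf/<bridge>/disable_ipv6
KNOB=/tmp/disable_ipv6
echo 1 > "$KNOB"   # 1 = IPv6 disabled on the interface

# Setting the knob to 0 re-enables IPv6, after which adding a
# link-local address to the bridge succeeds.
enable_ipv6_on() {
  echo 0 > "$1"
}

enable_ipv6_on "$KNOB"
cat "$KNOB"   # 0
# Real-world equivalent on the compute node:
#   sysctl -w net.ipv6.conf.brqf7424189-aa.disable_ipv6=0
```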
Adri2000fridtjof[m]: hello, I just read through quickly, and that sounds like an issue I had very recently and spent a day debugging with the help of neutron developers... have a look at https://bugs.launchpad.net/neutron/+bug/1899141 and https://review.opendev.org/#/c/757107/20:13
openstackLaunchpad bug 1899141 in neutron "Linuxbridge agent NetlinkError: (13, 'Permission denied') after Stein upgrade" [Medium,In progress] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)20:13
fridtjof[m]> Can you add debug logs in the "add_ip_address" method20:15
fridtjof[m]oh my god, that's exactly what i ended up doing20:16
jamesdentonjrosser the changes you made will be helpful in my scenario, with haproxy on the controller nodes. thank you20:16
fridtjof[m]i wish i would've known earlier, i spent half my evening on this ;_;20:16
fridtjof[m]it's the exact same issue, thank you Adri200020:16
Adri2000yw :)20:17
*** spatel has quit IRC20:25
*** jeh has joined #openstack-ansible20:35
*** nurdie has quit IRC20:38
*** nurdie has joined #openstack-ansible21:09
*** cshen has joined #openstack-ansible21:12
*** cshen has quit IRC21:17
*** jbadiapa has quit IRC21:17
recycleherothe "Create the neutron provider network facts" task works when neutron_provider_networks is not defined21:30
recycleheroso it means I can't make a change to the network physical mappings?21:30
recycleheroreally simple question, how to set host vars? I want to set neutron_provider_networks per host. I am looking for host_vars21:39
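To answer recyclehero's question in sketch form: in OSA, per-host overrides go in a YAML file named after the host under /etc/openstack_deploy/host_vars/. A minimal sketch — the /tmp prefix, hostname, and the keys inside neutron_provider_networks are illustrative assumptions, not a verified schema:

```shell
# Stand-in for /etc/openstack_deploy/host_vars/ so this runs anywhere
HOST_VARS=/tmp/openstack_deploy/host_vars
mkdir -p "$HOST_VARS"

# Vars in compute1.yml apply only to the inventory host "compute1"
cat > "$HOST_VARS/compute1.yml" <<'EOF'
# hypothetical per-host override; key structure is illustrative
neutron_provider_networks:
  network_types: "vlan"
  network_vlan_ranges: "physnet1:100:200"
  network_mappings: "physnet1:br-vlan"
EOF

grep network_mappings "$HOST_VARS/compute1.yml"
```

Group-wide overrides work the same way via group_vars/<group>.yml next to host_vars.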
*** MickyMan77 has joined #openstack-ansible21:44
*** gshippey has quit IRC21:48
*** rh-jelabarre has quit IRC22:00
*** yann-kaelig has joined #openstack-ansible22:46
*** macz_ has quit IRC23:03
*** cshen has joined #openstack-ansible23:13
*** cshen has quit IRC23:17
*** tosky has quit IRC23:23

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!