Tuesday, 2022-07-26

01:55 *** ysandeep|out is now known as ysandeep
03:21 *** ysandeep is now known as ysandeep|afk
05:03 *** ysandeep|afk is now known as ysandeep
07:38 <jrosser> morning
08:02 *** ysandeep is now known as ysandeep|afk
08:07 <noonedeadpunk> morning!
09:10 <damiandabrowski> hi!
09:10 *** tosky_ is now known as tosky
11:05 *** dviroel__ is now known as dviroel
11:39 <mgariepy> good morning everyone
12:30 *** ysandeep|afk is now known as ysandeep
14:44 <mgariepy> hmm ? ERROR! couldn't resolve module/action 'openstack.cloud.os_auth'. This often indicates a misspelling, missing collection, or incorrect module path.
14:47 <opendevreview> Marc Gariépy proposed openstack/openstack-ansible master: Fix cloud.openstack.auth module  https://review.opendev.org/c/openstack/openstack-ansible/+/851038
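For context, a minimal sketch of the kind of rename the error above points at, assuming a task previously used the legacy os_auth name; the task name and placement are illustrative, not copied from the actual patch:

```yaml
# Illustrative only; the real change is in review 851038.
- hosts: localhost
  tasks:
    # Old, no-longer-resolvable name:
    # - openstack.cloud.os_auth:
    #     cloud: default
    - name: Authenticate against the cloud to obtain a token
      openstack.cloud.auth:
        cloud: default
      register: _auth_result
```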
14:49 <noonedeadpunk> do we need to backport that? ^
14:50 <jrosser> whoops
14:50 <mgariepy> maybe we need to start managing the version of the modules we install ?
14:50 <jrosser> i think the zuul stuff is set up now to run the infra jobs when those playbooks are touched
14:50 <jrosser> we didn't do that before and could merge broken stuff
14:52 <jrosser> huh interesting https://github.com/openstack/openstack-ansible/blob/master/zuul.d/jobs.yaml#L267-L277
14:53 <jrosser> yes so we don't actually run that anywhere in CI?
14:55 *** dviroel is now known as dviroel|lunch
14:56 <opendevreview> Marc Gariépy proposed openstack/openstack-ansible master: Set the number of threads for processes to 2  https://review.opendev.org/c/openstack/openstack-ansible/+/850942
14:59 <jrosser> also why are we running stream-9 distro jobs
14:59 <jrosser> that looks like a mistake
15:00 *** ysandeep is now known as ysandeep|out
15:00 <noonedeadpunk> #startmeeting openstack_ansible_meeting
15:01 <opendevmeet> Meeting started Tue Jul 26 15:00:59 2022 UTC and is due to finish in 60 minutes.  The chair is noonedeadpunk. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01 <opendevmeet> The meeting name has been set to 'openstack_ansible_meeting'
15:01 <noonedeadpunk> #topic office hours
15:01 <mgariepy> right on time :D
15:02 <noonedeadpunk> so for infra jobs yeah, we probably should also add ansible-role-requirements and ansible-collection-requirements
15:02 <noonedeadpunk> to "files"
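A hedged sketch of what adding those paths to the job's file matchers could look like; the job name and the existing entries are placeholders, not the actual contents of zuul.d/jobs.yaml:

```yaml
# Illustrative only: also trigger the infra job when the requirement
# files change, so broken pins cannot merge untested.
- job:
    name: openstack-ansible-infra   # placeholder job name
    files:
      - ^playbooks/.*
      - ^ansible-role-requirements\.yml$
      - ^ansible-collection-requirements\.yml$
```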
15:03 <noonedeadpunk> stream-9 distro jobs - well, I intended to fix them one day
15:03 <noonedeadpunk> they're NV as of today, so why not. But I had a dependency on smth to get them working, can't really recall
15:04 * noonedeadpunk fixed alarm clock :D
15:04 <damiandabrowski> :D
15:04 <opendevreview> Jonathan Rosser proposed openstack/openstack-ansible master: Remove centos-9-stream distro jobs  https://review.opendev.org/c/openstack/openstack-ansible/+/851041
15:05 <jrosser> o/ hello
15:05 <noonedeadpunk> Will try to spend some time on them maybe next week
15:07 <noonedeadpunk> oh, ok, so the reason why distro jobs are failing is that zed packages are not published yet
15:07 <noonedeadpunk> and we need openstacksdk 0.99 with our collection versions
15:09 <noonedeadpunk> So. I was looking today at our lxc jobs. And wondering if we should try switching to overlayfs instead of dir by default?
15:09 <noonedeadpunk> maybe it's more a ptg topic though
15:10 <jrosser> is there a benefit?
15:10 <noonedeadpunk> But I guess as of today, all supported distros (well, except focal - do we support it?) have kernel 3.13 by default
15:10 <noonedeadpunk> Well, yes? You don't copy root each time, but just use cow and snapshot?
15:11 <noonedeadpunk> I have no idea if it's working at all tbh as of today, as I haven't touched that part for real
15:11 <noonedeadpunk> but you should save quite some space with that on controllers
15:12 <jrosser> i think we have good tests for all of these, so it should be easy to see
15:12 <noonedeadpunk> Each container is ~500MB
15:12 <noonedeadpunk> And diff would be like 20MB tops
15:12 <noonedeadpunk> Nah, more. Image is 423MB
15:12 <noonedeadpunk> So we would save 423MB for each container that runs
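If the overlayfs idea is pursued, a minimal sketch of the override in user_variables.yml, assuming the lxc_container_backing_store variable still accepts 'overlayfs' as in earlier releases (worth verifying against the current lxc_container_create role defaults):

```yaml
# user_variables.yml (sketch): share a read-only base image between
# containers and keep only per-container deltas on the overlay,
# instead of copying the full rootfs for every container ('dir').
lxc_container_backing_store: overlayfs
```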
15:13 <noonedeadpunk> Another thing that we totally forgot to mention during the previous PTG is getting rid of dash-separated groups and using underscores only
15:13 <noonedeadpunk> We've been postponing this for quite a while now, to have that said...
15:14 <noonedeadpunk> But it might not be that big a chunk of work.
15:14 <noonedeadpunk> I will try to look into that, but not now for sure
15:15 <noonedeadpunk> Right now I'm trying to work on the AZ concept and what needs to be done from the OSA side to get this working. Will publish a scenario to docs once done if there are no objections
15:15 <noonedeadpunk> env.d overrides are quite massive as of today, to have that said
15:16 <jrosser> i wonder if we prune journals properly
15:16 <opendevreview> Marc Gariépy proposed openstack/openstack-ansible master: Fix cloud.openstack.* modules  https://review.opendev.org/c/openstack/openstack-ansible/+/851038
15:16 <jrosser> i am already using COW on my controllers (zfs) and the base size of the container is like tiny compared to the total
15:17 <noonedeadpunk> ah. well, then there's no benefit in overlayfs for you :D
15:18 <noonedeadpunk> But I kind of wonder if there are consequences to using overlayfs with other bind-mounts, as it kind of changes the representation of these from what I read
15:18 <jrosser> i guess i mean that the delta doesn't always stay small
15:18 <jrosser> it will be when you first create them
15:19 <noonedeadpunk> well, the delta would basically be the venvs size I guess?
15:20 <jrosser> for $time-since-reinstall on one i'm looking at 25G for 13 containers even with COW
15:20 <noonedeadpunk> as logs and journals and databases are passed into the container and not root
15:20 <noonedeadpunk> that is kind of weird?
15:28 <mgariepy> wow.. https://github.com/ansible/ansible/issues/78344
15:29 <jrosser> huh
15:30 <jrosser> they're going to complain next about the slightly unusual use of the setup module, i can just feel it
15:30 <mgariepy> let's make the ControlPersist 4 hours.. ;p
15:30 <mgariepy> what good responses..
15:31 <jrosser> noonedeadpunk: in my containers /openstack is nearly 1G for the keystone container, each keystone venv is ~200M so for a couple of upgrades this adds up pretty quick
15:32 <noonedeadpunk> btw, retention of old venvs is a good topic
15:32 <noonedeadpunk> should we create some playbook at least for the ops repo for that?
15:35 <jrosser> it's pretty coupled to the inventory so might be hard for the ops repo?
15:36 <jrosser> mgariepy: i will reproduce with pipelining=False and show that it handles -13 in that case
15:36 <jrosser> i'm sure i saw it doing that
15:37 <mgariepy> hmm i haven't been able to reproduce it with pipelining false.
15:37 <noonedeadpunk> because it re-tries :)
15:38 <mgariepy> no
15:38 <mgariepy> because scp catches it to transfer the file.
15:38 <mgariepy> then reactivates the error
15:38 <mgariepy> error / socket
15:40 <jrosser> well - question is if it throws AnsibleControlPersistBrokenPipeError and handles it properly when pipelining is false
15:40 <mgariepy> when pipelining is false
15:40 <mgariepy> it will transfer the file.
15:40 <mgariepy> if there is an error it most likely ends up being exit(255).
15:40 <mgariepy> then it retries.
15:40 <mgariepy> no ?
15:41 <jrosser> let's test it
15:41 <mgariepy> you never catch the same race with pipelining false.
15:41 <jrosser> no, but my code that was printing p.returncode was showing what was happening though
15:42 <jrosser> anyway, let's meet some more then look at that
15:44 <jrosser> we did a Y upgrade in the lab last week
15:44 <jrosser> which went really quite smoothly, just a couple of issues
15:44 <spotz_> nice
15:45 <jrosser> galaxy seems to have some mishandling of http proxies so we had to override all the things to be https://github....
15:46 <jrosser> also lxc_hosts_container_build_command is not taking lxc_apt_mirror into account
15:46 <anskiy> noonedeadpunk: there is already one: https://opendev.org/openstack/openstack-ansible-ops/src/branch/master/ansible_tools/playbooks/cleanup-venvs.yml
15:47 <jrosser> but the biggest trouble was from `2022-07-22 08:04:35 upgrade lxc:all 1:4.0.6-0ubuntu1~20.04.1 1:4.0.12-0ubuntu1~20.04.1`
15:47 <noonedeadpunk> anskiy: ah, well :)
15:48 <noonedeadpunk> oh, well, it would restart all containers
15:48 <jrosser> yeah, pretty much
15:48 <jrosser> and i think setup-hosts does that across * at the same time
15:49 <noonedeadpunk> what I had to do with that was systemctl stop lxc, systemctl mask lxc, then update the package, unmask and reboot
15:49 <noonedeadpunk> if we pass package_state: latest, then yeah...
15:49 <jrosser> i don't know if we want to manage that a bit more
15:50 <jrosser> but that might be a big surprise for people expecting the upgrades to not hose everything at once
15:51 <noonedeadpunk> I bet this is causing troubles https://opendev.org/openstack/openstack-ansible/src/branch/master/scripts/run-upgrade.sh#L186
15:53 <jrosser> we also put that in the documentation https://opendev.org/openstack/openstack-ansible/src/branch/master/doc/source/admin/upgrades/minor-upgrades.rst#L52
15:53 <noonedeadpunk> So we would need to add serial here https://opendev.org/openstack/openstack-ansible/src/branch/master/playbooks/containers-lxc-host.yml#L24 and override it in the script
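A hedged sketch of what a configurable serial on that play could look like; the play header, role name and variable name below are assumptions mirroring the intent of the later review 851049, not its actual contents:

```yaml
# playbooks/containers-lxc-host.yml (sketch): make the batch size
# overridable so an upgrade cannot restart containers on every
# LXC host at the same time. Variable name is an assumption.
- name: Basic lxc host setup            # illustrative play header
  hosts: lxc_hosts
  serial: "{{ lxc_hosts_serial | default('100%') }}"
  roles:
    - role: lxc_hosts
```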
15:53 <noonedeadpunk> ok, that is a good point.
15:53 <jrosser> well i don't know if serial is the right thing to do
15:54 <noonedeadpunk> are you thinking about putting lxc on hold?
15:55 <noonedeadpunk> I really can't recall how it's done on centos though
15:57 <noonedeadpunk> but yes, I do agree we need to fix that for ppl
15:58 <noonedeadpunk> not upgrading packages - well, it's also a thing. Maybe we can adjust docs not to mention `-e package_state=latest` as smth "default" but as smth that can be added if you want to update packages, but with some precautions
15:58 <noonedeadpunk> As at the very least this also affects neutron, as ovs being upgraded also causes disturbances
15:59 <jrosser> andrewbonney is back tomorrow - i'm sure we have some use of unattended upgrades here with config to prevent $bad-things but i can't find where that is just now
16:00 <noonedeadpunk> well, we just disabled unattended upgrades...
16:00 <jrosser> yeah i might be confused here, but it is the same sort of area
16:01 <noonedeadpunk> so my intention when I added `-e package_state=latest` was - you're planning an upgrade and thus a maintenance. During maintenance interruptions happen.
16:01 <noonedeadpunk> But in the case of LXC you would need to re-bootstrap galera at the very least
16:01 <noonedeadpunk> as it all goes down at the same time
16:01 <jrosser> right, it's not at all self-healing just by running some playbooks
16:01 *** dviroel|lunch is now known as dviroel
16:01 <jrosser> which is kind of what people expect
16:02 <noonedeadpunk> and what I see we can do - either manage serial, then restarting lxc will likely self-heal as others are alive, or put some packages on hold and add another flag to upgrade LXC to make it intentional
16:03 <noonedeadpunk> as I don't think we are able to pin the version
16:04 <noonedeadpunk> ok, let's wait for andrewbonney then to make a decision
16:04 <noonedeadpunk> #endmeeting
16:04 <opendevmeet> Meeting ended Tue Jul 26 16:04:06 2022 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
16:04 <opendevmeet> Minutes:        https://meetings.opendev.org/meetings/openstack_ansible_meeting/2022/openstack_ansible_meeting.2022-07-26-15.00.html
16:04 <opendevmeet> Minutes (text): https://meetings.opendev.org/meetings/openstack_ansible_meeting/2022/openstack_ansible_meeting.2022-07-26-15.00.txt
16:04 <opendevmeet> Log:            https://meetings.opendev.org/meetings/openstack_ansible_meeting/2022/openstack_ansible_meeting.2022-07-26-15.00.log.html
16:08 <noonedeadpunk> oh, I can recall that we somehow prevented package triggers from running for galera I guess.
16:09 <noonedeadpunk> and we already have the lxc_container_allow_restarts variable
16:09 <noonedeadpunk> https://opendev.org/openstack/openstack-ansible-galera_server/src/branch/master/tasks/galera_install_apt.yml#L65-L71
16:15 <jrosser> mgariepy: without pipelining https://paste.opendev.org/show/bfdqBx2uOhGOr8LJRMGV/
16:17 <jrosser> and with -vvv https://paste.opendev.org/show/bWCoeQuDgiFeR8msOpop/
16:17 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Allow to provide serial for lxc_hosts  https://review.opendev.org/c/openstack/openstack-ansible/+/851049
16:20 <jrosser> and with pipelining enabled https://paste.opendev.org/show/baJ8I2xKlAiovdwHvniM/
16:29 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-lxc_hosts master: Prevent lxc.service from being restarted on package update  https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/851071
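One possible shape for that kind of guard, shown only as a sketch (the actual change in review 851071 may take a different approach entirely): hold the lxc package so that a `package_state: latest` run cannot restart lxc.service behind the deployer's back.

```yaml
# Sketch only: hold the lxc package on apt-based hosts so a routine
# package upgrade cannot restart lxc.service (and with it every
# container) unexpectedly; releasing the hold stays a deliberate step.
- name: Hold the lxc package to avoid surprise container restarts
  ansible.builtin.dpkg_selections:
    name: lxc
    selection: hold
  when: ansible_facts['pkg_mgr'] == 'apt'
```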
16:30 <noonedeadpunk> huh, how does it re-try then....
16:34 <jrosser> i don't know
16:43 <jrosser> mgariepy: noonedeadpunk the other thing i totally don't understand is why this always happens with the setup module
16:44 <jrosser> or is it really 'always', or are we just unlucky in the structure of our playbooks that T>ControlPersist has elapsed before we try to gather facts on the repo hosts
17:02 <mgariepy> i think it's reproducible with another module.
17:03 <mgariepy> but it would need to not exit 255..
17:03 <mgariepy> we probably can trigger it with the command module also
17:06 <jrosser> perhaps there is something specific with the setup module
17:06 <mgariepy> maybe.
17:07 <mgariepy> but as long as we get the same race and an exit other than 255 it should be failing.
17:10 <mgariepy> the fun part is to make the other modules take as much time as setup takes :D
17:11 <jrosser> ok here we go with the file: module https://paste.opendev.org/show/bFNbYfNTxDRBENHh6h9S/
17:11 <mgariepy> how nice :D
17:16 <mgariepy> https://github.com/ansible/ansible/blob/e4087baa835df6046950a6579d2d2b56df75d12b/lib/ansible/plugins/connection/ssh.py#L1235
17:17 <mgariepy> checkrc=False
17:18 <mgariepy> so if it's scp or sftp it does not care about the exit status and retries if so
17:18 <jrosser> here is another case where we get `muxclient: master hello exchange failed` https://paste.opendev.org/show/b3xllr0pREQ3vDvpyoax/
17:38 <jrosser> i am still not able to make a MODULE FAILURE with anything else though
17:41 <mgariepy> the file module with pipelining - does it do the same as the setup module ?
17:41 <spatel> jamesdenton around?
17:42 <mgariepy> or does it transfer the file in another way ?
17:42 <jrosser> it's possible it's different, just look at the number of SSH steps in my paste
17:42 <jrosser> i'm just trying something else like package:
17:43 <mgariepy> some votes please :D https://review.opendev.org/c/openstack/openstack-ansible/+/851038
17:49 <mgariepy> also any comments on : https://review.opendev.org/c/openstack/openstack-ansible/+/850942/ ?
17:50 <mgariepy> i think having it at 1 makes the process not respond to haproxy and then it disconnects the endpoint server.
17:51 <jrosser> package: MODULE FAILURE https://paste.opendev.org/show/bGFkEBoURUovv3ktl9JJ/
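Until the upstream issue is resolved, a possible deployer-side mitigation is simply to stretch ControlPersist (as joked about above) and/or disable pipelining for the affected hosts. A sketch for group_vars; the 4h value is an arbitrary example, not a recommendation, and it only narrows the race rather than fixing it:

```yaml
# group_vars/all.yml (sketch): keep the SSH control socket alive much
# longer than the default so the master is less likely to expire
# mid-task, and fall back to non-pipelined transfers, which retry on
# exit(255) and so hide the breakage.
ansible_ssh_common_args: "-o ControlMaster=auto -o ControlPersist=4h"
ansible_pipelining: false
```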
17:51 <jrosser> mgariepy: is it really threads - i always find this sooo confusing with python
17:52 <mgariepy> let's rewrite openstack in go.
17:52 <mgariepy> simple deployment, real threads. and so on.
17:54 <jrosser> i think what i mean is i don't know if that actually does what you expect :)
17:56 <mgariepy> so should we set the number of processes to 2 instead ?
17:56 <mgariepy> i assume that with 2 threads it should be a bit better at responding to multiple requests.
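For reference, the kind of override being discussed, sketched with keystone as an example; the variable names are assumptions based on the os_keystone role defaults and differ per service role, so they should be checked there rather than taken from here:

```yaml
# user_variables.yml (sketch): give each uWSGI worker more than one
# thread so a single slow request is less likely to make the service
# stop answering haproxy health checks. Variable names are assumptions.
keystone_wsgi_processes: 2
keystone_wsgi_threads: 2
```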
17:58 <jrosser> noonedeadpunk: ^ do you understand how this works?
17:58 <mgariepy> arf.. meeting now
19:22 <noonedeadpunk> I wonder what's wrong with linters actually. seems they always fail
19:28 <noonedeadpunk> um, not sure I got where this leads...
19:29 <mgariepy> the linter is fixed in the other patch.
19:30 <noonedeadpunk> (last was regarding python threads)
19:38 <mgariepy> i saw a few failures when tempest was running where the services were just not responding.
19:39 <mgariepy> not 100% sure having 2 threads will fix 100% of the cases. but maybe.
19:39 <mgariepy> maybe i could try to take the time to test if it actually helps. but i'm kinda short on time for the remainder of the week.
20:14 <opendevreview> Merged openstack/openstack-ansible master: Fix cloud.openstack.* modules  https://review.opendev.org/c/openstack/openstack-ansible/+/851038
21:28 *** anskiy1 is now known as anskiy
23:29 *** dviroel is now known as dviroel|out
