Tuesday, 2022-04-05

jrossernoonedeadpunk: odd errors here https://review.opendev.org/c/openstack/openstack-ansible/+/83637807:42
jrosseri wonder what changed there07:42
jrossercan we merge this https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/83637707:43
noonedeadpunkmornings07:56
noonedeadpunkbasically lxc fails07:57
noonedeadpunkthat sounds like some issue with our connection plugin definition07:58
noonedeadpunkI think we use relative path import there or some nasty thing that likely has changed07:58
noonedeadpunkI wonder if that's `AnsiballZ - Ensure we use the full python package in the module cache filename to avoid a case where collections: is used to execute a module via short name, where the short name duplicates another module from ansible.builtin or another collection that was executed previously.`08:00
noonedeadpunkhttps://github.com/ansible/ansible/blob/stable-2.12/changelogs/CHANGELOG-v2.12.rst#v2-12-408:00
jrosserso maybe thats because we duplicate the name of the ssh connection plugin with ours?08:08
noonedeadpunkbut at the same time we provide full path....08:21
noonedeadpunkhttps://opendev.org/openstack/openstack-ansible/src/branch/master/scripts/openstack-ansible.rc#L5008:21
noonedeadpunkbut I'm 99.9% sure it has smth to do with connection plugin this way or another. Will spawn aio then to test this out08:22
*** ysandeep is now known as ysandeep|lunch08:28
opendevreviewMerged openstack/openstack-ansible-plugins master: Fix detection of Rocky Linux for ssh_keypairs role  https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/83637708:29
*** ysandeep|lunch is now known as ysandeep09:00
opendevreviewMerged openstack/openstack-ansible-openstack_hosts stable/wallaby: Use correct system.conf.d permissions  https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/83633909:03
jrossernoonedeadpunk: as far as i can see it is trying to ssh to the container rather than the physical host https://paste.opendev.org/show/bl3xVqKH6LFaOljc82ig/09:09
noonedeadpunkyup..09:10
noonedeadpunkwhich makes me think that our connection plugin is simply not used for some reason09:10
jrosserit is certainly using the strategy plugin as some of those messages come from that09:12
opendevreviewMerged openstack/openstack-ansible-openstack_hosts stable/xena: Use correct system.conf.d permissions  https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/83633809:20
noonedeadpunkok, it's 2.12.3 that broke us eventually09:21
noonedeadpunk`ssh connection now uses more correct host source as play_context can ignore loop/delegation variations.`09:22
noonedeadpunk`gather_facts action now handles the move of base connection plugin types into collections to add/prevent subset argument correctly`09:22
noonedeadpunksounds like smth related to the issue:)09:23
jrosserhttps://github.com/ansible/ansible/commit/be19863e44cc6b78706147b25489a73d7c8fbcb509:25
noonedeadpunkmmm, yeah...09:27
noonedeadpunkbut! I think it's indeed just facts gathering09:28
noonedeadpunknah, disregard that09:28
jrosseri think this is where we override the target https://github.com/openstack/openstack-ansible-plugins/blob/master/plugins/connection/ssh.py#L414-L41709:29
jrosserbut in the ansible patch they add a whole bunch of `self.host = self.get_option('host') or self._play_context.remote_addr`09:30
noonedeadpunkso we basically should unset 'host' from options as well I guess09:31
jrosseractually that is the only change they make to the ssh plugin09:33
noonedeadpunkbut hm, shouldn't self.host be returned for get_option('host')09:34
noonedeadpunkok, no, it doesn't09:35
opendevreviewMerged openstack/openstack-ansible-repo_server master: Use /run/nginx.pid  https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/83637409:36
noonedeadpunkjrosser: ok, I have patch I think09:39
jrosserjust from the debug message we can see that its using self.host `<172.29.238.145> ESTABLISH SSH CONNECTION FOR USER: root`09:40
jrosserthe IP in < > is self.host09:41
jrosserand self._play_context.remote_addr does indeed contain the thing we want09:43
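The lookup order discussed above can be illustrated with a small sketch. It is not the real Ansible plugin: `PlayContext` and `SketchConnection` are hypothetical stand-ins, and the physical host address is made up. Only the expression `self.get_option('host') or self._play_context.remote_addr` and the container address come from the log. The point is that once the `host` option is set, overriding only `remote_addr` on the play context no longer changes where the connection goes.

```python
class PlayContext:
    """Hypothetical stand-in holding only the field discussed in the log."""

    def __init__(self, remote_addr):
        self.remote_addr = remote_addr


class SketchConnection:
    """Hypothetical stand-in for an ssh connection plugin, not the Ansible class."""

    def __init__(self, play_context, options):
        self._play_context = play_context
        self._options = dict(options)

    def get_option(self, name):
        return self._options.get(name)

    def set_option(self, name, value):
        self._options[name] = value

    def resolve_host(self):
        # Same expression as quoted from the ansible-core change in the log.
        return self.get_option('host') or self._play_context.remote_addr


def demo():
    container_ip = "172.29.238.145"  # address seen in the debug message
    physical_ip = "172.29.236.100"   # assumption: made-up physical host address

    ctx = PlayContext(container_ip)
    conn = SketchConnection(ctx, {"host": container_ip})

    # Overriding only the play context: the 'host' option still wins.
    ctx.remote_addr = physical_ip
    before = conn.resolve_host()

    # Overriding the option as well redirects the connection.
    conn.set_option("host", physical_ip)
    after = conn.resolve_host()
    return before, after


if __name__ == "__main__":
    print(demo())
```

Under these assumptions the first lookup still returns the container address, which matches the symptom of ssh going to the container rather than the physical host.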
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-plugins master: Define physical_host in options  https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/83658509:43
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-plugins master: Define physical_host in options  https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/83658509:45
noonedeadpunkjrosser: ^ this works nicely in aio....09:46
noonedeadpunklooks like nasty hook though09:46
noonedeadpunkbut I printed self.get_option('host') and it was container address09:47
noonedeadpunkI dunno if self.host is used anywhere down the line though....09:47
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-plugins master: Define physical_host in options  https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/83658509:50
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Update ansible-core to 2.12.4  https://review.opendev.org/c/openstack/openstack-ansible/+/83637809:50
jrosserworks here too - we get to see if it also works on 2.12.2 with 83658509:52
noonedeadpunkI think on 2.12.2 it just uses context, but not 100% sure09:55
noonedeadpunkat least after core downgrade locally things remain working09:56
jrosseryes it looks good10:00
opendevreviewJonathan Rosser proposed openstack/openstack-ansible-repo_server stable/xena: Use /run/nginx.pid  https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/83659310:04
opendevreviewJonathan Rosser proposed openstack/openstack-ansible-repo_server stable/wallaby: Use /run/nginx.pid  https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/83659410:04
opendevreviewJonathan Rosser proposed openstack/openstack-ansible-repo_server stable/victoria: Use /run/nginx.pid  https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/83659510:04
jrossernginx pid is not the fix for stable branches it seems https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/83659412:15
opendevreviewJonathan Rosser proposed openstack/openstack-ansible stable/xena: Connect openstack_pki_regen_ca variable to pki role  https://review.opendev.org/c/openstack/openstack-ansible/+/83401712:24
opendevreviewMerged openstack/openstack-ansible-os_tempest stable/xena: Set py_modules to an empty list in setup.py  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/83631512:40
noonedeadpunkhm12:41
noonedeadpunkwondering if it is for master as well...12:42
noonedeadpunkoh12:42
noonedeadpunkbut it fails for buster12:42
noonedeadpunkso likely buster is different12:42
opendevreviewMerged openstack/openstack-ansible-os_tempest stable/wallaby: Set py_modules to an empty list in setup.py  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/83616912:45
opendevreviewMerged openstack/openstack-ansible-os_tempest stable/victoria: Set py_modules to an empty list in setup.py  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/83633012:45
jrosseroh no more patches get zuul -2 because of early +2+W13:02
opendevreviewMerged openstack/openstack-ansible-plugins master: Define physical_host in options  https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/83658513:03
mgariepywhat ?13:03
jrossermgariepy: see what happened here https://review.opendev.org/c/openstack/openstack-ansible/+/83637813:04
jrosserthats super unhelpful behaviour13:04
mgariepyso the workflow would be to +2 wait for zuul, then +w ?13:05
jrosseri think it's because it has an unmerged depends-on in a different zuul queue13:05
jrosserbut i don't really see why that would warrant a -213:05
jrosserit used to just do nothing at that point13:05
mgariepyso cross repo depend is broken ?13:06
jrosserbut the +2+W of someone else would be retained, so the reviewer didnt have to wait for the dependent patch to merge13:06
noonedeadpunklet's go to #openstack-infra maybe to discuss that? As I tried to understand but I can't13:06
jrosseri asked yesterday but perhaps we need to ask again13:06
jrossernoonedeadpunk: do wo have a PTG etherpad?13:08
jrosser*we13:09
noonedeadpunkhttps://etherpad.opendev.org/p/osa-Z-ptg13:09
jrosseri expect PTG week is a tricky week to get answers on this stuff13:18
noonedeadpunkwell, yes... but I have lack of ideas I believe.... or well, lack of time to think about adding new cool stuff:)13:46
mgariepytoday is not quite a good day for me. i'll be running out of power on my ups in like 30-60 minutes for my internet :( so i won't be able to be there for ptg today.13:50
noonedeadpunkdoh....14:03
noonedeadpunkthat's a bummer14:03
mgariepythey always tell the worst case for work on the power lines but it's 10 am and they say it will be back for 8 pm.. so i don't expect much.14:05
opendevreviewJonathan Rosser proposed openstack/openstack-ansible-os_senlin master: Updated from OpenStack Ansible Tests  https://review.opendev.org/c/openstack/openstack-ansible-os_senlin/+/83572114:20
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Define zuul queue  https://review.opendev.org/c/openstack/openstack-ansible/+/83665714:23
jrossernoonedeadpunk: do we think that is correct? ^14:23
jrosserif all our changes get queued together and one at the head of the queue fails, doesnt that invalidate the whole queue14:24
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Use integrated queue for project  https://review.opendev.org/c/openstack/openstack-ansible/+/83665814:24
jrosserwhere actually pretty much most of our repos can merge patches independently14:24
jrosserafaik this is why we see giant queues for some projects in zuul status, but ours stay short and have good throughput because they don't break each other14:25
noonedeadpunkoh, indeed, you're right....14:27
noonedeadpunkI read that different way at first14:27
jrossermy understanding of this is a little lightweight though14:28
jrosseri think a lot of the style in openstack is driven by mono-repo or single project oriented things14:29
jrosserand we really do something a bit different to that14:29
noonedeadpunkthis just wasn't stated anywhere in docs this way:)14:30
noonedeadpunkAnd `Any projects which interact with each other in tests should be part of the same shared queue in order to ensure that they don’t merge changes which break the others.` sounded like indeed smth we should have had14:31
jrosseryeah, though i was thinking about that14:31
jrosserand the logical conclusion is that there could only ever be one queue14:31
jrosserbecause we might legitimately want to depends-on SDK, or nova, or requirements14:31
jrosseror anything really14:31
jrosserso thats when i start to get confused about what the actual concept is here14:32
noonedeadpunkwell, true...14:32
opendevreviewMerged openstack/openstack-ansible-repo_server stable/xena: Use /run/nginx.pid  https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/83659314:32
noonedeadpunkAnd indeed fungi said "their testing has to be reset if there's a failure for a change ahead of the current change" which leads that you're right and it would be mess if we use them I guess. Like any patch failure would lead to invalidation of others....14:33
jrosseroh yes just imagine trying to merge a proposal bot set of changes14:33
fungidepends on whether your projects often fail gate jobs after passing in check14:33
jrosserit would be impossible14:34
fungizuul's dependent pipelines are optimized for rapidly merging changes, so jobs with a high rate of spurious failures unrelated to the changes being tested tend to slow things up as a result14:35
jrosserfungi: is the assumption there that it is rapid merging to the same repo?14:35
noonedeadpunkand likely we even wanted to push it here instead of all projects https://opendev.org/openstack/project-config/src/branch/master/zuul.d/projects.yaml14:35
fungiif your jobs usually only fail when there's something wrong with the change itself, then things should merge very quickly with dependent queues14:35
fungirapid merging to all repos, but yes14:36
jrosseri think we are mostly getting failure due to $external-random-thing that we have no control over (ansible-galaxy gives error)14:36
noonedeadpunkwe tend to have intermittent issues related to some infra/resource related stuff14:36
jrosseror our current 'noise floor' of things that we've not got to the bottom of occasional errors yet14:37
noonedeadpunkor OOM :)14:37
jrosserhah14:37
fungii wonder if we could improve the galaxy issues with some caching. i know tripleo has zuul directly provide a number of galaxy modules as git checkouts in their jobs14:37
jrosserwe do that for everything we can already14:37
jrosserbut some collections do not contain the right stuff to allow local installs to work14:38
noonedeadpunktripleo installs roles as python packages, so dunno what you're talking about :)14:38
fungiahh, okay i couldn't remember if that had already been explored14:38
fungii probably confused the two in that case14:38
jrosserthough really i guess this discussion is all a result of our cross-queue trouble14:38
fungii guess it was openstack-ansible installing galaxy modules from git checkouts14:38
jrosserand if conceptually we are doing the right thing or not14:38
noonedeadpunkwe do that in gates, yes14:39
noonedeadpunkwherever we can14:39
fungisome background: the cross-project dependent queuing concept in zuul originated early in openstack's development, because we wanted to make sure that changes to cinder or cinderclient didn't break volume attachments through the nova api, for example, so we tested changes for all of them together in an integrated queue using devstack/tempest to make sure they remained interoperable at14:40
fungieach change before merging it14:40
dmsimardo/ I'm not at the openstack PTG but feel free to reach out if you have ansible or ara questions14:40
jrosserdmsimard: versions in galaxy.yml should be mandatory in the git repos :)14:41
jrossernot added after-the-fact as something is published to galaxy - otherwise you can't install from git source14:41
fungiso if a change to cinder's api was queued ahead of a change to nova's volume handling and the former broke the latter, then that prevented the latter from merging. however if the former change couldn't pass its own tests, the latter would be re-tested without it and merged if it passed on its own14:41
fungithat way we could test both changes in parallel with the assumption they would both pass their tests, and only need to restart testing if that assumption was untrue14:42
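The speculative testing fungi describes can be sketched as a toy simulation. This is not Zuul code: the function name, the change names, and the way failures are modelled (`broken_alone`, `broken_by`) are all assumptions for illustration. Each round tests every pending change in parallel on top of everything ahead of it; the first failure is dropped and everything behind it is retested without it.

```python
def simulate_dependent_queue(queue, broken_alone=(), broken_by=None):
    """Toy model of a dependent (gate) queue.

    queue:        ordered list of change names, head first
    broken_alone: changes that fail their own tests regardless of what is ahead
    broken_by:    dict mapping a change to the set of changes that break it
                  when they are ahead of it in the tested state
    Returns (merged, rejected, rounds).
    """
    broken_alone = set(broken_alone)
    broken_by = broken_by or {}
    merged, rejected, rounds = [], [], 0
    pending = list(queue)

    def fails(change, ahead):
        return change in broken_alone or bool(broken_by.get(change, set()) & set(ahead))

    while pending:
        rounds += 1
        # Test all pending changes "in parallel", each assuming everything
        # ahead of it (already merged or still queued) will merge.
        results = [not fails(c, merged + pending[:i]) for i, c in enumerate(pending)]
        if all(results):
            merged.extend(pending)
            pending = []
        else:
            first_bad = results.index(False)
            # Changes ahead of the failure were tested on a valid state: merge them.
            merged.extend(pending[:first_bad])
            rejected.append(pending[first_bad])
            # Everything behind the failure is reset and retested next round.
            pending = pending[first_bad + 1:]
    return merged, rejected, rounds


if __name__ == "__main__":
    # The cinder change fails on its own; nova is retested without it and merges.
    print(simulate_dependent_queue(["cinder-api", "nova-volume"],
                                   broken_alone={"cinder-api"}))
    # The cinder change passes but breaks nova; nova is prevented from merging.
    print(simulate_dependent_queue(["cinder-api", "nova-volume"],
                                   broken_by={"nova-volume": {"cinder-api"}}))
```

In this model a failure at the head costs an extra round for everything queued behind it, which is the reset cost raised earlier in the discussion.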
jrosserdmsimard: so specifically ansible.netcommon ansible.utils openvswitch.openvswitch don't do this, which means they cant be installed with type: git in a collection requirements file14:44
dmsimardjrosser: oh ? can you show me an example ?14:44
* dmsimard looks in the meantime14:44
jrosserhere is our collection requirements file https://github.com/openstack/openstack-ansible/blob/master/ansible-collection-requirements.yml14:45
dmsimardjrosser: oh I see what you mean14:45
jrosserand the ones at the bottom cannot come from git, so cannot use cached repos in CI, so contribute to our CI failure noise floor14:45
dmsimardyou're right -- I think that's because they insert it dynamically at build/publish time14:45
dmsimardbut it's definitely worth talking about14:45
jrosserif the process of applying the tag also committed the version, that would be cool14:46
jrosserbut i guess that can be tricky as you want the repo structure always to be 'master' or 'devel' or whatever14:46
noonedeadpunkdmsimard: eventually Paul was quite picky in terms of not adding version to galaxy.yml for whatever reason14:47
dmsimardI wonder if we could make the argument that it's a bug in the ansible-galaxy CLI (as in, if there's no version it's not a fatal error)14:47
dmsimardnoonedeadpunk: Paul has moved on and ain't maintaining those anymore14:47
noonedeadpunkah14:47
noonedeadpunkwell, eventually galaxy not failing anymore14:47
noonedeadpunkbut it shows collection version as "*"14:47
noonedeadpunkwhich is not helpful either14:48
* jrosser rechecked a patch for galaxy 502 error today14:48
noonedeadpunkbut super valid from ansible-galaxy perspective14:48
noonedeadpunkdmsimard: fwiw I created a bug https://github.com/ansible-collections/openvswitch.openvswitch/issues/9414:48
noonedeadpunkbut not sure if worth spreading it across every collection....14:49
dmsimardjrosser: those 502 errors are a plague14:49
jrosserright, and its hours and hours of wasted CPU time in an openstack context14:49
dmsimardI see a lot of HTTP 429's too14:49
jrosserthe collections are cached on the CI nodes for some of them14:49
jrosserdmsimard: in our jobs we pre-process the collections requirement file and re-write the entries that have local clones of the collections, like this https://github.com/openstack/openstack-ansible/blob/master/scripts/get-ansible-collection-requirements.yml#L50-L6714:51
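The pre-processing jrosser mentions could look roughly like the sketch below. It is not the linked playbook: the function names, the `/home/zuul/src` clone root, and the host/path layout of the local clones are assumptions. Entries with `type: git` that have a local clone get their `name` rewritten to a `git+file://` URL; everything else is left untouched and still comes from Galaxy.

```python
from urllib.parse import urlparse


def rewrite_entry(entry, clone_root, has_clone):
    """Return a copy of one requirements entry, pointed at a local clone if any.

    entry:      dict with at least 'name', optionally 'type' and 'version'
    clone_root: directory holding cached clones (assumed layout: <root>/<host>/<path>)
    has_clone:  callable telling whether a local clone exists at a path
    """
    if entry.get("type") != "git":
        return dict(entry)  # Galaxy-sourced entries cannot use the cache
    parsed = urlparse(entry["name"])
    local_path = f"{clone_root}/{parsed.netloc}{parsed.path}"
    if not has_clone(local_path):
        return dict(entry)
    rewritten = dict(entry)
    rewritten["name"] = f"git+file://{local_path}"
    return rewritten


def rewrite_requirements(collections, clone_root, has_clone):
    return [rewrite_entry(e, clone_root, has_clone) for e in collections]


if __name__ == "__main__":
    # Hypothetical requirements content and cache state, for illustration only.
    collections = [
        {"name": "https://github.com/ansible-collections/ansible.posix",
         "type": "git", "version": "1.3.0"},
        {"name": "ansible.netcommon", "version": "2.5.1"},
    ]
    cached = {"/home/zuul/src/github.com/ansible-collections/ansible.posix"}
    for entry in rewrite_requirements(collections, "/home/zuul/src", cached.__contains__):
        print(entry)
```

The second entry in the example stays on Galaxy, which is the case described above for collections that cannot be installed from git.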
dmsimardouch14:52
jrosserit's that big a deal relying on upstream galaxy being reliable that such measures are needed14:54
jrosserwhilst i understand that there is 'policy' that things are installed with the galaxy backend / api / whatever, reality says otherwise14:55
jrosserand i would also like an option on ansible-galaxy which can install from git and retain the git metadata14:55
jrosseras the developer workflow for collections is pretty terrible right now14:55
dmsimardI will hunt that git version one for now but it would be helpful if you could organize and write down some of these papercuts somewhere that I can easily share them with the right folks14:56
dmsimardfwiw I reproduced it with ansible.netcommon and it ain't even a helpful error message: ERROR! Collection artifact at '/home/dmsimard/.ansible/tmp/ansible-local-24116076pwl8md3/tmpyr5shbyf/ansible.ne-lp9xq87k/ansible.netcommon' is not a valid tar file.14:58
jrosser:)14:58
noonedeadpunkJust in case - we're about to start in https://www.openstack.org/ptg/rooms/essex15:02
* jrosser was just wondering where to find the schedule15:02
jrosserkinda absent from https://www.openstack.org/ptg/15:02
noonedeadpunkhttps://ptg.opendev.org/ptg.html15:03
damiandabrowski[m]I also had a problem with that :D 15:03
jrosserdoh15:03
*** ysandeep is now known as ysandeep|out16:03
dmsimardnoonedeadpunk, jrosser: I spent some time reproducing the galaxy.yml version issue and asking folks about it -- in my testing, even though it says "Installing 'ansible.netcommon:*'", it installs the correct version (I tried 2.5.1 and 2.6.1)20:19
dmsimardI've tested with the latest version of ansible-core (2.12.4) though20:20
dmsimardThe error I got earlier was because I had set "source: git" instead of "type: git" so it was my fault20:20
jrosserdmsimard: interesting20:24
jrosserwe have a patch in flight to upgrade to 2.12.4 already20:25
jrosserthose tags all exist in ansible.netcommon git repo so maybe now it's more tolerant of missing version info in galaxy.yml20:26
opendevreviewMerged openstack/openstack-ansible-tests stable/xena: Add ansible.utils collection requirement  https://review.opendev.org/c/openstack/openstack-ansible-tests/+/83636120:32
*** dviroel is now known as dviroel|out20:36
dmsimardjrosser: I can try to reproduce with earlier versions to find out whether it was fixed at some point20:42
opendevreviewMerged openstack/openstack-ansible-rabbitmq_server stable/wallaby: Verify if hosts file already managed with OSA  https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/83616721:40
opendevreviewMerged openstack/openstack-ansible-repo_server master: Use ssh_keypairs role to generate keys for repo sync  https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/82710021:50
dmsimardjrosser: seems to work with the latest 2.11 too (2.11.10)23:08
dmsimarddoesn't work with 2.9 (but also neither does ansible.posix with type: git)23:10
dmsimardinterestingly enough, ansible-base 2.10 does print an actual warning: [WARNING]: Collection at '/home/dmsimard/.ansible/tmp/ansible-local-2454180er65d81l/tmpr_cryqol/ansible.netcommon' does not have a valid version set, falling back to '*'. Found version: 'None'23:11
dmsimardthat warning doesn't exist in 2.11 and 2.1223:12
