Friday, 2022-07-22

*** akahat is now known as akahat|ruck05:00
*** ysandeep|out is now known as ysandeep05:05
*** chkumar|rover is now known as chandankumar05:05
*** arxcruz is now known as arxcruz|rover06:54
noonedeadpunkthat is _suuuper_ interesting read07:15
noonedeadpunkbut right now we use 2.12.6 and kind of see the issue in CI?07:16
opendevreviewDmitriy Rabotyagov proposed openstack/ansible-config_template master: Use release-ansible-collections from project-config  https://review.opendev.org/c/openstack/ansible-config_template/+/85066607:38
*** ysandeep is now known as ysandeep|lunch07:44
*** ysandeep|lunch is now known as ysandeep09:42
mgariepyhmm. sad news.. i cannot reproduce with only the ssh plugins on a fresh jammy container11:44
mgariepyhttps://paste.openstack.org/show/bdoLY0fLP7dEHxaFfOGo/12:11
mgariepyhmm.12:12
mgariepyhttps://zuul.opendev.org/t/openstack/build/4211b9d532a247029df82a57cd7e2fa3/log/job-output.txt#3512:13
mgariepyhmm running a 500 loop on 2.12.6 aio12:21
mgariepyho.. just took a bit longer..12:23
jrosseris that the same? in a zuul job i don't think i've ever seen stdout/stderr have anything other than ""12:30
mgariepyhttps://zuul.opendev.org/t/openstack/build/4211b9d532a247029df82a57cd7e2fa3/log/job-output.txt#1362112:30
mgariepythats xena.. ansible 2.11.612:30
mgariepyin the same AIO i can reproduce the error on 2.12.6, 2.12.7 and 2.13.212:32
mgariepylooks like our connection plugin is causing issue12:32
mgariepyi do run the same tests in 2 containers (1 jammy, 1 focal) on my laptop running 200 long loops and it doesn't crash12:33
mgariepyin my aio, ansible 2.12.7+ does crash in less than 20 loops most of the time12:34
mgariepy2.12.6 took 96 iterations12:34
mgariepylooks to me like it's a race on the control socket12:35
mgariepywith our connection plugin12:36
mgariepyoops ERROR! Exceeded maximum object depth. This may have been caused by excessive role recursion. maximum recursion depth exceeded12:50
mgariepyjrosser, i think it's the -vvvv that changes it.12:54
mgariepythat's without -vvv : https://paste.openstack.org/show/bySeiRKYlalcPruPPYgR/12:57
*** ysandeep is now known as ysandeep|afk13:07
mgariepyi think it has to do with the retry decorator but my python is not quite good enough to try to fix that.13:11
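For readers unfamiliar with the pattern mgariepy mentions, here is a minimal sketch of a retry decorator in Python. It is not the code from Ansible's ssh.py or from openstack-ansible-plugins; the names `retry`, `attempts`, `delay` and `flaky_connect` are hypothetical and only illustrate the general idea.

```python
import functools
import time


def retry(attempts=3, delay=0.0, exceptions=(ConnectionError,)):
    """Retry the wrapped callable up to `attempts` times on `exceptions`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    time.sleep(delay)
        return wrapper
    return decorator


calls = {"count": 0}


@retry(attempts=3)
def flaky_connect():
    # Hypothetical stand-in for an ssh call that fails twice, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("control socket went away")
    return "ok"
```

A decorator like this only protects the function it wraps, which is why it matters which method in a subclassed connection plugin carries the decorator.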
spateljamesdenton around?13:20
spatelI am stuck here in ovn-bgp-agent deployment, any idea what is wrong here - https://paste.opendev.org/show/b0jSgXWlkkkws8Wp18Xz/13:20
*** ysandeep|afk is now known as ysandeep13:31
*** ysandeep is now known as ysandeep|mtg13:31
jamesdentonis that the latest FRR?13:53
spateljamesdenton yes.. 7.2.1 14:03
jamesdentoni don't think that's the latest?14:03
spatelIf you look at openstack-neutron channel i am already talking to one of the developers14:03
jamesdentoni see14:03
spatelmade little progress but stuck again in OVS schema related stuff14:03
jamesdentonkk14:03
spatelThis is latest error - https://paste.opendev.org/show/bYxy4cBWygOQ21JgEJZb/14:04
spatel Error starting thread.: AssertionError  - not sure what is that14:05
jamesdentoni guess it didn't like: assert table_name in schema.tables14:06
spateli am running latest devstack version of openstack and ovn 20.03 14:07
spatellatest version of ovn is 22.03 (LTS)14:08
spateljust curious, maybe that is the issue because i am on a much older version of ovn14:10
jamesdentonquite possible14:15
spateljamesdenton i talked to one of the developers and he suggested going back 5 commits in ovn-bgp-agent (he believes it's an lb related issue)14:28
spatelgit question how do i check out specific SHA in git :) 14:28
spatellet me google it out14:29
noonedeadpunk`git checkout $SHA`? :D14:29
spatelhelp is here let me try14:29
mgariepynoonedeadpunk, how is your python ? 14:34
noonedeadpunkyou literally had an answer in your question14:34
noonedeadpunkmgariepy: meh, I'd say quite average. why?14:34
mgariepyour plugin seems to cause issue 14:34
mgariepyour ssh plugins**14:34
mgariepyfrom what i understand it's only a subclass of the original one. but it doesn't seem to work correctly14:35
noonedeadpunkyeah, we're basically parenting it. I wonder if it's indeed the ssh plugin and not our strategy thing?14:37
noonedeadpunkbut yes, likely you're right about connection plugin14:38
noonedeadpunkmgariepy: so the way to reproduce is just loop for gather facts in aio?14:39
mgariepyi think it's only a matter of delegating the tasks. 14:40
mgariepyfrom the error i get it's not the setup module that fails but the ssh connection14:40
mgariepythis: https://paste.openstack.org/show/b8sxbUflXJJXF7PaiXZw/14:43
mgariepyrunning it without -vvvv does crash faster14:43
mgariepythe debug adds some delay that makes the code not crashing as much :D14:44
noonedeadpunkand since you run against just aio, this is not reproducible once you don't use our connection plugin?14:45
mgariepylol my play name is not great 14:45
mgariepyi've not been able to14:46
noonedeadpunkdamn,.... I can't really dig into that right now despite it's super interesting14:46
mgariepyas a workaround/ugly hack we could always set the ControlPersist to something like 600s14:47
mgariepythe default is 60s14:48
mgariepyso i guess it should happen 10x less..14:49
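The workaround discussed above amounts to overriding the ssh arguments Ansible uses. A minimal sketch, assuming the override goes through the `ANSIBLE_SSH_ARGS` environment variable and reuses the `-C -o ControlMaster=auto` options seen elsewhere in this log; the helper names `ssh_args` and `env_with_control_persist` are hypothetical.

```python
import os


def ssh_args(control_persist_seconds):
    """Build an ssh argument string with the given ControlPersist timeout."""
    return (
        "-C -o ControlMaster=auto "
        f"-o ControlPersist={control_persist_seconds}s"
    )


def env_with_control_persist(seconds, base_env=None):
    """Return a copy of the environment with ANSIBLE_SSH_ARGS overridden."""
    env = dict(os.environ if base_env is None else base_env)
    env["ANSIBLE_SSH_ARGS"] = ssh_args(seconds)
    return env
```

With 600s instead of 60s the idle master connection would expire a tenth as often, which is the basis of the "10x less" guess; it would hide the race rather than fix it.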
noonedeadpunkmgariepy: well, that looks quite different from what we have https://github.com/ansible/ansible/blob/devel/lib/ansible/plugins/connection/ssh.py#L462-L53814:49
noonedeadpunkand basically, exec_command is not covered with retry, as what's calling it is covered14:50
mgariepyyes but we don't overwrite it.14:50
noonedeadpunkit's the only place where retry is used? https://opendev.org/openstack/openstack-ansible-plugins/src/branch/master/plugins/connection/ssh.py#L42214:50
noonedeadpunkso I have some concerns if we should have our retry implementation nowadays, as it seems to be covered now....14:51
mgariepyi think it's only for the lxc exect stuff14:51
mgariepyhttps://opendev.org/openstack/openstack-ansible-plugins/src/branch/master/plugins/connection/ssh.py#L503-L50714:52
mgariepyshould call https://github.com/ansible/ansible/blob/devel/lib/ansible/plugins/connection/ssh.py#L117814:52
mgariepywith its decorator14:52
mgariepynot ours14:52
noonedeadpunkyeah14:53
mgariepyunless the issue is present in the ssh plugin but our code adds just enough delay to crash it?14:53
noonedeadpunkuh, let me try it out to reproduce :D14:54
mgariepyit's quite puzzling haha14:54
noonedeadpunkexport ControlPersist is essential for that, right?14:55
mgariepyit does make the issue reproduce frequently14:55
noonedeadpunkhm......15:00
noonedeadpunkon loop 50 and works fine so far....15:00
mgariepywith our plugin?15:00
noonedeadpunkyup15:00
mgariepywithout -vvv ?15:00
noonedeadpunkwithout any -v15:00
mgariepywhat version of ansible ?15:01
noonedeadpunkno freaking idea :D Could be 2.13.215:01
mgariepyhaha15:01
mgariepylol15:01
noonedeadpunkwill tell you once it finishes15:01
noonedeadpunkyes, 2.13.215:02
mgariepyi dont get up to 10 on my server.15:02
mgariepystupid race condition15:02
mgariepyyour server is either too fast or too slow ;p15:02
noonedeadpunksounds like it also depends on performance of hardware15:02
mgariepyif it's race it sure is 15:03
noonedeadpunkalready on 11015:03
mgariepyyou did bootstrap-ansible.sh ?15:04
noonedeadpunksure15:06
noonedeadpunkcan place your key....15:06
mgariepywhat is your hw ?15:06
noonedeadpunknot sure what's underneath, but virtualized as Haswell-noTSX-IBRS15:07
noonedeadpunkand have 6 cores/12Gb RAM on VM15:07
mgariepy4 core / 30gb ram 15:07
mgariepyon ceph, sata ssds 15:07
noonedeadpunkah, ceph with nvmes :)15:08
mgariepyi'm too poor for that haha15:09
noonedeadpunkbut still, ssds should be fine...15:09
noonedeadpunkso I can not reproduce that....15:12
mgariepyhttps://launchpad.net/~mgariepy/+sshkeys15:14
noonedeadpunkhardware is shitty - E5-2680 v315:14
jrosserI can try this later on a xeon gold type cpu15:16
mgariepyit fails.15:20
noonedeadpunkyeah, have these as well15:20
noonedeadpunkwhaat15:20
mgariepymust be me.. 15:20
noonedeadpunknow I really do wonder if that could be some SSH env you bring in ?15:21
noonedeadpunkok, mgariepy, tell me what you did, step by step :D15:22
mgariepyfirst time 11.15:22
noonedeadpunkI'm right now on 4015:22
mgariepyssh ubuntu@yourvm15:22
mgariepysudo -i15:22
mgariepycd /home/ubuntu15:22
mgariepyexport ANSIBLE_SSH_ARGS="${ANSIBLE_SSH_ARGS:-'-C -o ControlMaster=auto -o ControlPersist=2s'}"15:22
mgariepyopenstack-ansible loop-test.yml15:23
mgariepyi'm at 51 on the second run15:23
mgariepynot failed yet.15:23
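The "which iteration did it fail on" bookkeeping in this thread can be sketched as a small wrapper. In the chat the loop appears to live inside `loop-test.yml` itself; this sketch instead reruns an arbitrary command, and the function name `run_until_failure` is hypothetical.

```python
import subprocess
import sys


def run_until_failure(cmd, max_iterations, env=None):
    """Run `cmd` repeatedly; return the 1-based iteration that failed, or None."""
    for iteration in range(1, max_iterations + 1):
        result = subprocess.run(cmd, env=env, capture_output=True)
        if result.returncode != 0:
            return iteration
    return None


if __name__ == "__main__":
    # Harmless demo command; swap in the real playbook invocation.
    failed_at = run_until_failure([sys.executable, "-c", "pass"], 5)
    print("no failure" if failed_at is None else f"failed at iteration {failed_at}")
```

Passing something like `["openstack-ansible", "loop-test.yml"]` as `cmd`, with an `env` that sets a short `ControlPersist`, would approximate the steps listed above.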
*** dviroel is now known as dviroel|lunch15:25
*** ysandeep|mtg is now known as ysandeep|out15:26
*** elenalindq_ is now known as elenalindq15:27
*** johnsom_ is now known as johnsom15:27
mgariepylooks like this time it won't fail15:30
noonedeadpunkit's annoying15:31
mgariepywell it doesn't fail that often.. but often enough to see it a couple of times per day in the CI..15:32
noonedeadpunknah, I can't reproduce anywhere....15:58
mgariepyo16:01
mgariepywell. it sucks16:01
mgariepyfucking race..16:01
noonedeadpunkok, just caught it16:07
noonedeadpunkand again o_O16:07
mgariepyouf now you know i'm not lying16:08
noonedeadpunklol, but now it's being reproduced each and every time16:09
mgariepyhave you seen it in your vm ?16:10
noonedeadpunkin your16:10
mgariepyso i was lucky :D i had just the right combination of hw16:11
mgariepyyour vm just did it.. at iteration 159..16:13
*** dviroel|lunch is now known as dviroel16:42
noonedeadpunkI think it's not our plugin that is broken, but rather some changes to ansible one that are16:43
noonedeadpunkAs once I `export ANSIBLE_TRANSPORT="ssh"` it still re-occurred16:44
noonedeadpunkBut, once I did `export ANSIBLE_TRANSPORT="paramiko"` I can't reproduce again16:44
noonedeadpunkSo likely we just need to adjust our connection plugin to leverage paramiko rather than ssh....16:44
noonedeadpunkor well. just make another one16:45
mgariepyit's been broken for a while.16:45
noonedeadpunkparamiko?16:45
mgariepyansible 16:45
mgariepyit still has the issue on xena.16:45
noonedeadpunkah. well. they use smart as default. and I can recall reading in release notes that they will prefer paramiko when they can16:46
mgariepynot sure how far we need to test to see.16:46
noonedeadpunklikely that was for ansible-core 2.1216:46
noonedeadpunkjust to be exact https://paste.openstack.org/show/bwqxhLNtCJroaaBgsgOJ/16:48
mgariepyif you want to change the ansible version in that vm you can 16:50
noonedeadpunkbasically, paramiko does not support ControlPersist16:51
jrosseriirc paramiko doesnt understand anything in ~/.ssh/config16:52
noonedeadpunkyou're right here16:54
jrosserthat already causes us issues with the ansible networking modules16:55
noonedeadpunkanyway I can totally reproduce it even without our connection plugin17:01
noonedeadpunkSo it's smth else....17:01
jrosseroh thats good17:02
noonedeadpunkis it?:)17:02
jrosserwell, we would have no hope making a bug report using our connection plugin17:03
noonedeadpunkyeah. but I can't say I catch the issue when ansible is installed inside a venv17:03
noonedeadpunkand not sourcing our .rc17:03
mgariepymight be interaction with our inventory script ?17:04
noonedeadpunkI passed inventory17:05
noonedeadpunkAlso I would say ansible from venv is kind of faster17:05
noonedeadpunkso maybe also bunch of vars...17:05
jrosserthats actually more tech debt we have - the script should be converted to an inventory plugin17:05
spateljamesdenton ovn-bgp-agent working :) 17:06
spatelI am able to advertise vm fips in EVPN fabric 17:06
spatelOnly Floating IPs are getting advertised, not the VM original address.. 17:07
noonedeadpunkbut totally smth out of our .rc makes ansible fail17:12
noonedeadpunkas once I fully source it issue gets reproduced17:12
mgariepystrip 1 by 1 env from it >?17:12
noonedeadpunkyeah, but I'm populating instead ;)17:12
mgariepywouldn't it be faster the other way around ?17:13
mgariepylol17:13
noonedeadpunkand I've commented out connection plugin before sourcing to be sure it's not it17:13
mgariepyANSIBLE_SSH_PIPELINING ?17:15
noonedeadpunkI _think_ it wasn't it17:15
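The "populate one variable at a time" approach described above can be sketched as follows. The helper `find_culprits` and the `fails_with` callback are hypothetical; in practice `fails_with` would run the reproducer with the given environment and report whether it failed.

```python
def find_culprits(candidate_env, fails_with):
    """Return the names of variables that trigger a failure on their own.

    `candidate_env` maps variable names to values; `fails_with` is a callable
    that takes an env dict and returns True if the run fails with it.
    """
    culprits = []
    for name, value in candidate_env.items():
        if fails_with({name: value}):
            culprits.append(name)
    return culprits
```

Testing variables in isolation misses failures that need two or more settings together, and with a race that only shows up after many iterations a single clean run proves little, so each check would need to loop long enough to be meaningful.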
mgariepyhttps://xkcd.com/1722/17:21
mgariepylol17:21
noonedeadpunk:D17:22
mgariepythank god it's friday ! :) haha17:22
noonedeadpunkdamn, now I can't reproduce at all again :facepalm:17:26
noonedeadpunkwhat I did though, I stupidly dropped facts for aio1 from 17:28
*** dviroel is now known as dviroel|afk17:28
noonedeadpunkinstead of moving them somewhere else17:28
mgariepydo you think it's related ?17:32
jamesdentonspatel awesome! can't wait to read about it :)17:38
spatelYes.. i will blog it out..17:39
jamesdentoni am very behind in fun stuff17:39
spatelcurrently figuring out how to advertise tenant network17:39
spatelprovider network is working fine.. 17:39
spatelIssue was OVN version, newer version of OVN tries to find Load_Balancer schema table which does not exist in older OVN 17:41
spatelI moved back 5 commits and that fixed my issue. 17:41
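The failure mode spatel describes matches the `assert table_name in schema.tables` check jamesdenton quoted earlier. A minimal sketch of that check and of a more forgiving alternative; the `Schema` class and both functions are hypothetical stand-ins, not the actual ovn-bgp-agent or OVS library code, and `Port_Binding` is used only as an illustrative table name.

```python
class Schema:
    """Hypothetical stand-in for an OVSDB schema object."""

    def __init__(self, tables):
        self.tables = set(tables)


def register_table(schema, table_name):
    # Mirrors the quoted check: raises AssertionError if the table is absent.
    assert table_name in schema.tables
    return table_name


def split_by_presence(schema, wanted):
    """Split `wanted` into tables the schema has and tables it lacks."""
    present = [name for name in wanted if name in schema.tables]
    missing = [name for name in wanted if name not in schema.tables]
    return present, missing
```

Reporting the missing table by name, rather than a bare `AssertionError`, would have pointed at the OVN version mismatch more directly.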
jamesdentonahh good to know17:41
jamesdentonand that FRR error? with the running-config?17:41
spatelIt was code issue https://opendev.org/x/ovn-bgp-agent/src/branch/master/ovn_bgp_agent/privileged/vtysh.py#L26-L2817:42
spatelThis is modified code - https://paste.opendev.org/show/bDvYGmrG21HDh8xRH0K2/17:42
spatelI will submit patch or ask dev to see what is going on17:42
spatelcommand should be split 17:43
spateljamesdenton i can see my floating ip getting advertised in BGP fabric - https://paste.opendev.org/show/bSqLjbj3DFcKG2u2E66e/17:45
spatelTrying to find out how to advertise tenant network 17:46
opendevreviewMerged openstack/openstack-ansible-haproxy_server stable/xena: Don't restrict haproxy tunable options  https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/85048018:21
mgariepy ANSIBLE_SSH_PIPELINING=True seems to be related.18:23
mgariepyjrosser, noonedeadpunk ^^18:23
jrosser`When pipelining is enabled, Ansible does not save the module to a temporary file on the client. Instead it pipes the module to the remote python interpreter's stdin`18:29
mgariepyso if the socket drops mid-air it fails..18:30
mgariepyobviously :D18:30
jrosseris that -EPIPE?18:31
jrosserthe -1318:31
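On jrosser's question: if the -13 is a process return code as reported by Python's `subprocess` module (an assumption, since the paste is not reproduced here), a negative value means the process was killed by that signal number rather than an errno. On Linux signal 13 is SIGPIPE, while the errno EPIPE is 32. A small sketch for decoding such codes; `describe_returncode` is a hypothetical helper.

```python
import errno
import signal


def describe_returncode(returncode):
    """Describe a subprocess-style return code."""
    if returncode < 0:
        try:
            return f"killed by {signal.Signals(-returncode).name}"
        except ValueError:
            return f"killed by unknown signal {-returncode}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    print(describe_returncode(-signal.SIGPIPE))
    print("EPIPE errno value:", errno.EPIPE)
```

Either way both point at the same event: something wrote to a pipe or socket whose other end had already gone away.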
mgariepyhttps://paste.openstack.org/show/bdoLY0fLP7dEHxaFfOGo/18:31
jrosserwhy would it drop it though - thats odd18:34
jrosser`the backgrounded master connection will automatically terminate after it has remained idle (with no client connections) for the specified time`18:34
mgariepybecause we do not use it often.18:34
mgariepysince we mostly probably interact with containers via lxc and not ssh. except when we delegate ?18:34
jrosserdoes that ssh to the container with delegate?18:36
jrosserit doesn't still realise that the container is local and not use ssh....?18:36
jrosserperhaps i'm confused here too tbh18:36
jrosseror perhaps i mean it does ssh to aio1 and then uses the special stuff in the lxc aware connection plugin18:37
mgariepyhow do we look at ara report with the sqlite now ?18:38
jrosseroh well thats difficult18:38
mgariepylol18:38
jrosserdo you have a build result handy?18:38
mgariepyhttps://a7e4a39e8fb82330ec44-84709aab2060acc1565f07c661aff448.ssl.cf5.rackcdn.com/846123/2/check/openstack-ansible-deploy-aio_lxc-ubuntu-focal/4211b9d/logs/ara-report/index.html18:39
jrosserdo you have the link to the job?18:39
mgariepyhttps://zuul.opendev.org/t/openstack/build/4211b9d532a247029df82a57cd7e2fa3/log18:39
mgariepyi can reproduce with the playbook without osa :D18:48
jrosser\o/18:48
mgariepyho. wait.. not completely lol18:49
mgariepylol :(18:49
mgariepyi'm getting tired haha18:49
mgariepyif i disalbe pipelining. it does not fail on openstack-ansible run . 18:51
mgariepydisable* wow can't type either18:51
mgariepyi can reproduce on my laptop now tho.. 18:57
mgariepyhave a great weekend guys19:06
noonedeadpunksorry I had to drop this for today :(19:14
noonedeadpunkhopefully will have some time on monday or during weekends19:14
noonedeadpunkwhat if we replace ANSIBLE_SSH_PIPELINING with ANSIBLE_PIPELINING.... I bet nothing will change actually...19:18
spateljamesdenton figured out how to expose tenant network in BGP :)20:01
spatelstill getting some strange error which i need to figure out before i close this loop20:02
*** qwebirc59942 is now known as batman20:20
*** batman is now known as r98geh31rt20:20
manowarriorHi all, looking to cross reference osa periodic tasks with https://docs.openstack.org/operations-guide/ops-advanced-configuration.html#implementing-periodic-tasks. I see https://docs.openstack.org/openstack-ansible/latest/admin/maintenance-tasks.html but that doesn't seem to cover cron stuff eg: image cache pruning, etc. Anyone know a good place to check as a new osa user?21:11
