Thursday, 2020-12-10

*** tosky has quit IRC00:02
*** maharg101 has joined #openstack-ansible00:20
*** macz_ has quit IRC00:25
*** maharg101 has quit IRC00:26
*** rfolco has joined #openstack-ansible00:26
*** cshen has joined #openstack-ansible00:30
*** cshen has quit IRC00:35
*** cshen has joined #openstack-ansible01:15
*** rfolco has quit IRC01:17
*** cshen has quit IRC01:19
*** ierdem has quit IRC01:31
*** maharg101 has joined #openstack-ansible02:21
*** maharg101 has quit IRC02:26
*** priteau has quit IRC03:03
*** cshen has joined #openstack-ansible03:15
*** cshen has quit IRC03:19
*** openstackgerrit has quit IRC03:22
*** gyee has quit IRC04:00
*** maharg101 has joined #openstack-ansible04:22
*** maharg101 has quit IRC04:27
*** spatel has joined #openstack-ansible04:36
*** cshen has joined #openstack-ansible05:15
*** dasp has quit IRC05:18
*** cshen has quit IRC05:20
*** evrardjp has quit IRC05:33
*** evrardjp has joined #openstack-ansible05:33
*** dasp has joined #openstack-ansible05:34
*** spatel has quit IRC05:51
*** cloudnull has quit IRC05:56
*** cloudnull has joined #openstack-ansible05:57
*** rpittau|afk has quit IRC06:11
*** mnaser has quit IRC06:11
*** mnaser has joined #openstack-ansible06:12
*** rpittau|afk has joined #openstack-ansible06:12
*** pcaruana has joined #openstack-ansible06:19
*** evrardjp has quit IRC06:49
*** evrardjp_ has joined #openstack-ansible06:49
*** pto has quit IRC06:56
*** pto_ has joined #openstack-ansible06:56
*** cshen has joined #openstack-ansible07:07
*** cshen has quit IRC07:12
*** cshen has joined #openstack-ansible07:14
*** openstackgerrit has joined #openstack-ansible07:15
openstackgerritDmitriy Rabotyagov proposed openstack/openstack-ansible-openstack_hosts master: Fix libsystemd version for Centos  https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/76603007:15
*** cshen has quit IRC07:18
*** sep has joined #openstack-ansible07:20
*** cshen has joined #openstack-ansible07:22
*** cshen has quit IRC07:26
*** maharg101 has joined #openstack-ansible07:44
*** jbadiapa has joined #openstack-ansible07:48
*** miloa has joined #openstack-ansible07:50
*** macz_ has joined #openstack-ansible07:51
*** macz_ has quit IRC07:55
*** rgogunskiy has joined #openstack-ansible07:58
*** andrewbonney has joined #openstack-ansible08:11
*** miloa has quit IRC08:11
*** miloa has joined #openstack-ansible08:12
*** miloa has quit IRC08:13
*** miloa has joined #openstack-ansible08:14
*** rpittau|afk is now known as rpittau08:17
*** cshen has joined #openstack-ansible08:17
openstackgerritDmitriy Rabotyagov proposed openstack/openstack-ansible-openstack_hosts master: Fix libsystemd version for Centos  https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/76603008:21
*** tosky has joined #openstack-ansible08:47
*** spatel has joined #openstack-ansible08:52
*** spatel has quit IRC08:56
*** pto has joined #openstack-ansible09:09
*** pto_ has quit IRC09:13
openstackgerritJames Gibson proposed openstack/openstack-ansible-os_keystone master: Add security.txt file hosting to keystone  https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/76643709:27
*** macz_ has joined #openstack-ansible09:35
*** macz_ has quit IRC09:40
*** macz_ has joined #openstack-ansible10:33
openstackgerritJames Gibson proposed openstack/openstack-ansible-os_keystone master: Add security.txt file hosting to keystone  https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/76643710:34
*** macz_ has quit IRC10:38
*** rgogunskiy has quit IRC10:45
*** rgogunskiy has joined #openstack-ansible10:47
*** ierdem has joined #openstack-ansible11:16
*** avagi has joined #openstack-ansible11:20
*** avagi has quit IRC11:21
*** avagi has joined #openstack-ansible11:22
*** rfolco has joined #openstack-ansible11:54
*** mike44333 has quit IRC12:17
openstackgerritJames Gibson proposed openstack/openstack-ansible master: Add security.txt to haproxy frontend  https://review.opendev.org/c/openstack/openstack-ansible/+/76645712:24
*** rfolco has quit IRC12:38
*** rfolco has joined #openstack-ansible12:42
*** rgogunskiy has quit IRC12:48
openstackgerritMarc Gariépy proposed openstack/openstack-ansible-repo_server master: Fix order for removing nginx file.  https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/76625712:50
mgariepygood morning12:51
noonedeadpunko/12:51
admin0\o12:52
mgariepyanything interesting this morning ?12:53
avagihi,12:53
avagimay I ask for help concerning keystone installation?12:53
avagiStarting from yesterday I cannot install keystone, because the keystone-21.2.0-constraints.txt file is not existing in the repo container.12:53
noonedeadpunkis it centos?:)12:53
avagiyes12:53
noonedeadpunkwell.... with 8.3 things got broken. It was a great rhel demo of how centos can be considered "stable" now12:54
noonedeadpunkhttps://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/766030 should work12:54
noonedeadpunkoh, btw, jrosser - it just passed (with disabled centos metal test)12:55
avagithanks, I am going to check ....12:55
noonedeadpunkand centos metal passing here https://zuul.opendev.org/t/openstack/build/6ddee2acf46e49e9b50dff6db5e6d63e12:56
mgariepyhttps://centos.rip/12:56
noonedeadpunkhaha12:56
mgariepylol12:57
mgariepynot funny haha12:57
noonedeadpunkwell, funny until you use centos in prod12:58
mgariepyyep.12:58
mgariepyi'm very glad i'm not using centos.12:58
admin0+112:59
*** sshnaidm has quit IRC13:09
*** sshnaidm has joined #openstack-ansible13:09
*** priteau has joined #openstack-ansible13:10
openstackgerritAndrew Bonney proposed openstack/openstack-ansible master: Ensure kuryr repo is available within CI images  https://review.opendev.org/c/openstack/openstack-ansible/+/76576513:14
*** zigo has joined #openstack-ansible13:51
*** spatel has joined #openstack-ansible13:55
spatelnoonedeadpunk: https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/765906  you already have this fix in your patch so we don't need this one right?13:59
noonedeadpunkspatel: yeah, sorry, we had to squash commits somehow to fix everything in one patch14:01
spatelno worry, i will abandon it to clean up14:02
noonedeadpunkyeah, thanks, it really helped a lot, since we just took your code14:02
spatel+114:03
spatelwhat is the status of 8.3 and victoria at present ?14:03
spatelare we going to merge 8.3 with victoria release?14:03
noonedeadpunkI was about to make branching when 8.3 released14:03
noonedeadpunkyes, totally14:03
noonedeadpunkotherwise all stuff is just stuck14:04
noonedeadpunkas CI is broken14:04
noonedeadpunkmoreover, we need to backport fix to U14:04
openstackgerritDmitriy Rabotyagov proposed openstack/openstack-ansible master: Ensure kuryr repo is available within CI images  https://review.opendev.org/c/openstack/openstack-ansible/+/76576514:07
*** newtim has joined #openstack-ansible14:14
jrosserit looks like these patches stand a chance to merge14:15
jrosserproblems with CI nodes being 'error' though14:15
noonedeadpunkyeah. so annoying....14:16
openstackgerritDmitriy Rabotyagov proposed openstack/openstack-ansible-openstack_hosts master: Make CentOS 8 metal voting again  https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/76642514:17
openstackgerritDmitriy Rabotyagov proposed openstack/openstack-ansible-openstack_hosts master: Make CentOS 8 metal voting again  https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/76642514:17
*** SecOpsNinja has joined #openstack-ansible14:21
SecOpsNinjahi to all. is there a way to force the haproxy let's encrypt renew process?  from what i found it is not using cron jobs14:25
SecOpsNinjai used for the first one https://docs.openstack.org/openstack-ansible/latest/user/security/ssl-certificates.html#letsencrypt-certificates14:26
jrosserSecOpsNinja: can you explain "the first one"?14:29
jrosseryou mean the first set of variables given there, as you already have horizon deployed?14:31
SecOpsNinjajroll,  the first was created correctly using that info, but because i have multiple haproxy with non-distributed storage the renew process didn't happen in the correct one, and now that i have removed all the other haproxy i lost the renewed certificate. I tried to run "certbot renew" on the only remaining haproxy node and am trying to see why Hook command "/etc/letsencrypt/renewal-hooks/pre/haproxy-pre" returned error code 12414:32
SecOpsNinjafrom what i was seeing in the haproxy ansible role we aren't using cronjobs and are using the hooks of certbot itself for this. trying to see why this doesn't renew14:33
* jroll does the periodic mis-ping wave to jrosser :P14:33
SecOpsNinjathis is because i'm getting this "Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1056)')"14:33
jrosserjroll: o/14:33
jroll\o14:33
jrosserSecOpsNinja: ok, so shared storage is not required for letsencrypt14:34
SecOpsNinjait is if you are using multiple haproxys in infra hosts like i was14:34
jrossereach haproxy node is responsible for renewing its own certificate, certbot runs independently on each one14:34
*** pto has quit IRC14:35
SecOpsNinjayep the service could run on any node, so the certificate would be only on the one that renewed and not on the others14:35
jrosserthe nodes are responsible for their own certificates14:35
jrosserthere is nothing shared, ever14:35
SecOpsNinjabecause there isn't any way to synchronize all haproxys with the correct certificate unless it is copied to all or using a distributed filesystem like nfs, ceph, glusterfs,...14:36
SecOpsNinjabut the public endpoint is using this let's encrypt cert14:36
SecOpsNinjaand what happened is that it renewed in a different haproxy that wasn't master and the haproxy only had the expired one for the public endpoint14:37
SecOpsNinjanow i only have 1 haproxy and glance until i resolve the problem of distributed storage like ceph14:37
jrosserthere is no synchronisation of the certificates14:37
jrosserby deliberate design each haproxy (one or more) renews its own certificate independently14:38
*** tobberydberg has quit IRC14:38
jrosserkeepalived is pointing the external VIP only ever to one of those14:38
SecOpsNinjawhy deliberate design?14:38
jrosserbecause synchronising the certs is difficult14:39
*** cshen has quit IRC14:39
jrosserit is easier to make it like this14:39
*** tobberydberg has joined #openstack-ansible14:39
jrosserthere is an extra problem that you never know which haproxy has the external VIP14:40
jrosserso you would have huge complexity making that one do the renewal and then distribute to the others14:40
jrosserit is not known after deployment which is the active haproxy14:40
SecOpsNinjayep the only solution would be a shared storage for these certificates if only the master one renews14:40
SecOpsNinjasupposedly you don't define one as master and the others as slaves?14:41
jrosserkeepalived / vrrp makes that choice14:41
SecOpsNinjayep but you can define the one with higher priority... i think14:41
jrosserabsolutely, that can be done14:42
SecOpsNinjabut yeah i will keep it in mind next time i try to reactivate multiple haproxys14:42
jrosserthe http-01 challenge is difficult14:42
jrosserbecause it needs to hit the challenge URL at the VIP14:43
SecOpsNinjabut i think the solution would be the configuration of a shared drive  (for the storage of the /etc/letsencrypt) if multiple haproxys are used14:43
jrosseri'm not sure tbh, because that needs a filesystem style storage which otherwise does not exist in the deployment, and would become a SPOF14:44
jrosseri am running the OSA haproxy role with HA letsencrypt in two production environments and also re-using it outside of OSA14:44
jrosserit's working really solidly there14:45
jrosserSecOpsNinja: there is a special backend in haproxy config which routes the http-01 challenge from the VIP to the particular haproxy that is renewing its cert14:47
jrosserthat allows all of them to renew even though the VIP is at a specific haproxy14:48
SecOpsNinjaSPOF?14:48
jrossersingle-point-of-failure14:48
jrosserso we use haproxy itself to route the renewal challenge to the right certbot14:48
jrosserthe pre-hook is needed to stand up a temporary http server on the right port to swing the haproxy backend over to the node that is renewing14:49
jrosserotherwise there is a race condition14:49
SecOpsNinjabut for that do you need to define a specific haproxy node to do only the renew, or is it something outside of haproxy?14:49
jrosseri'm not really following14:50
jrossereach haproxy is using cron to run certbot renew14:50
SecOpsNinjasorry, regarding the special backend14:50
jrossercertbot cron runs on one of the haproxy nodes14:50
SecOpsNinjai didn't find that crontab job14:51
jrosserthat starts a temporary server on the backend renewal port using python, in the pre-hook14:51
SecOpsNinjabut i will try to find it14:51
SecOpsNinjayep, i saw that the pre-hook creates the http server14:51
jrosserall the other haproxy nodes notice the backend port is up and direct http-01 challenges to that port14:51
jrosserthe one with the VIP receives the challenge and so the challenge goes to whichever haproxy node is renewing14:52
jrosserthe cron job is actually a systemd timer i think which comes with the ubuntu certbot package14:52
SecOpsNinjayep but if the master changes the problem happens because the renewed certificate is in another node14:52
SecOpsNinjaok didn't check the system cronjob :D14:53
jrosserit does not matter which the master is14:53
jrosserbecause all of the haproxy renew their certificates all the time14:53
SecOpsNinjaif you have 5 haproxy all of them are going to request 5 renewals of the same certificate?14:53
SecOpsNinjaok that is a way to do it if you don't hit the let's encrypt api limits of requests and renewals14:54
SecOpsNinjabut from what i found yesterday, only one of the haproxys renewed the certificate and the others were still using the expired one in /etc/letsencrypt14:54
SecOpsNinjathat is why i asked about the workflow14:55
jrosserrenewals do not count against the rate limit but are subject to 5 a week for duplicates14:57
jrosserif you were to add the unique fqdn of each haproxy instance as an extra --domain then they would not be duplicate and not subject to that limit14:57
SecOpsNinjaah ok nice, didn't know that about renewals15:00
SecOpsNinjain meantime i was able to find the problem, it was another parent haproxy that is not pointing to the correct ports :(15:01
jrosserthe design used is this one https://serversforhackers.com/c/letsencrypt-with-haproxy15:01
jrosserbut extended to multiple haproxy15:01
SecOpsNinjajrosser, thanks for all the info :D15:01
jrosserno problem :: anytime15:01
jrosser:) even15:02
*** ericzolf has joined #openstack-ansible15:02
jrosserthe whole thing is a balance really - renewal in one place only comes with some different complexities, like how to make it resilient15:02
jrosserthis is just a different more distributed approach15:03
SecOpsNinjaafter resolving the problem of creating the vms caused by duplicated glance with unsynchronized storage, i think the next step would be to install ceph or glusterfs as a backend for all the services so i have fewer spofs15:03
SecOpsNinjabut that will probably be a new year resolution :)15:04
jrosseralso if you need to work a lot on haproxy/LE add --staging to this https://github.com/openstack/openstack-ansible-haproxy_server/blob/master/defaults/main.yml#L9515:05
jrosserthen the rate limit is basically removed15:05
jrosserbut the certs are not valid, when you are happy, remove the flag and re-issue from the production LE endpoint15:06
SecOpsNinjayep i used staging sometimes so i don't mess with the prod api limits15:06
SecOpsNinjabut thanks anyway for the tip15:06
SecOpsNinjastill learning a lot about the structure of all the roles in openstack-ansible and there are a lot of them :D15:07
*** ierdem has quit IRC15:27
mnasernoonedeadpunk, jrosser you might be interested in the current topic on #openstack-tc15:36
*** macz_ has joined #openstack-ansible16:01
*** pcaruana has quit IRC16:10
noonedeadpunkfolks, can we vote please for https://review.opendev.org/c/openstack/openstack-ansible/+/766244 ?16:19
mgariepynoonedeadpunk, done.16:21
noonedeadpunkjrosser: ?:)16:21
noonedeadpunkthanks mgariepy!16:21
andrewbonneyGot there first :)16:21
noonedeadpunkawesome, thanks16:21
jrosserahha it is done16:21
openstackgerritMarc Gariépy proposed openstack/openstack-ansible-haproxy_server master: Add haproxy_frontend_only and haproxy_raw feature.  https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/76650416:21
mgariepyany comments on that ^^16:22
noonedeadpunkexcept you could use single release note?:)16:22
noonedeadpunkas it's a yaml list16:22
noonedeadpunkbut whatever16:23
mgariepylol yes sure haha16:23
openstackgerritMarc Gariépy proposed openstack/openstack-ansible-haproxy_server master: Add haproxy_frontend_only and haproxy_raw feature.  https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/76650416:25
jrosserhaproxy_raw is interesting i think, even if we like a haproxy_http_request as well16:38
jrosseras you can't use config_template with haproxy config it's difficult to insert arbitrary stuff into front/back end that the template does not support16:38
jrosserthe way that the security.txt patches were done was influenced a bit by what was allowed in the template16:39
mgariepymy idea is just to be able to set config whenever the haproxy version supports it without needing to modify the template.16:40
mgariepyi'll re-upload soonish.16:40
mgariepyjust stuck in a meeting right now.16:41
*** cshen has joined #openstack-ansible17:09
*** cshen has quit IRC17:13
admin0jrosser, i think this has passed -- https://review.opendev.org/c/openstack/neutron/+/765408 .. will it be in the next update and if yes, when is that update coming ?17:17
*** cshen has joined #openstack-ansible17:36
*** ericzolf has quit IRC17:40
*** jbadiapa has quit IRC17:57
*** rpittau is now known as rpittau|afk18:04
jrosseradmin0: that will be in the next osa ussuri tag, not sure when as that can be only after we fix all this centos8.3 mess18:05
admin0jrosser, in that case, i can override the neutron with this Change-Id: Icfcf8c5406cfdc47fabf012e82ed56c345a73af8  ? somewhere ? i don't recall the exact steps to do it18:10
*** miloa has quit IRC18:24
*** CeeMac has joined #openstack-ansible18:34
jrosseradmin0: here are the instructions https://docs.openstack.org/openstack-ansible/latest/user/source-overrides/index.html18:41
openstackgerritMarc Gariépy proposed openstack/openstack-ansible-haproxy_server master: Add haproxy_frontend_only and haproxy_raw feature.  https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/76650418:46
openstackgerritMarc Gariépy proposed openstack/openstack-ansible-haproxy_server master: Add haproxy_frontend_only and haproxy_raw feature.  https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/76650418:50
mgariepyouf. the workflow in the new gerrit is something lo.18:53
mgariepylol.18:53
mgariepylet me know if you have other comments.18:54
mgariepynoonedeadpunk, i would prefer to keep the haproxy_frontend_raw input instead of adding a haproxy_http_request as it will be easier to change the output with this key.18:55
mgariepythe output of the template that is.18:55
SecOpsNinjayep for some reason the certbot.service installed by the haproxy_server role is getting error 124 in the pre hook18:55
SecOpsNinjatrying to understand what is failing...18:55
ThiagoCMCNew CentOS: https://rockylinux.org ? lol18:56
mgariepyThiagoCMC, centos.rip is wayy better ;p ahah18:56
SecOpsNinjaThiagoCMC,  are you talking about this https://arstechnica.com/gadgets/2020/12/centos-shifts-from-red-hat-unbranded-to-red-hat-beta/ ?18:57
mgariepynoonedeadpunk, if you want to add stuff like : https://www.haproxy.com/blog/four-examples-of-haproxy-rate-limiting/18:57
ThiagoCMCmgariepy, LOL18:57
ThiagoCMCSecOpsNinja, pretty much18:58
SecOpsNinjayep i stopped using centos a long time ago and am now using debian stable or sometimes ubuntu18:59
ThiagoCMCMe too... Using Debian since 1998 and Ubuntu since 2006 (desktops)19:00
ThiagoCMCNever liked RH-based distros, way too complicated, especially.. Hmm... Upgrades.19:00
mgariepythe issue with centos i have is that if you need vim you need an extra repo, and another one for nano.. that's just annoying.19:01
ThiagoCMCExactly. And unsafe.19:01
ThiagoCMCDebian repo is huge, everything is in there19:01
ThiagoCMCsupported, stable, tested...19:01
mgariepyi have been bitten once or twice by a major pkg update in epel.19:02
mgariepyit was not fun.19:02
spateladmin0: did you verify this patch - https://review.opendev.org/c/openstack/neutron/+/76540819:03
ThiagoCMCSaw that happening too, in previous jobs... People fear upgrades because they stick with CentOS... Only if they knew Debian lol19:03
ThiagoCMCspatel, I'm applying that patch manually in my cloud. Totally required for me!19:04
spatelThiagoCMC: let me know if it stops spitting logs.19:04
spatelit looks very ugly19:04
ThiagoCMCYes, logs are clean now19:05
spatelThis patch should be high priority to merge, not sure why it's not getting enough traction19:05
jrosserSecOpsNinja: paste the certbot errors if you think it might help19:06
spatelThiagoCMC: did you edit files to apply patch or use OSA way to push out branch/commit?19:06
ThiagoCMCOn launchpad, people say that it only affects CentOS but it affects Ubuntu as well. I sent a message there too.19:06
jrosserthe pre-hook should be runnable by hand i think to test it19:06
ThiagoCMCspatel, `vim` FTW19:06
spatel:)19:06
SecOpsNinjayep running the pre script manually it runs ok; it seems something in certbot is trying to get some info from that command, but i will paste19:07
jrosserwhen you run the pre-hook you should also be able to see the backend become active with hatop or in the haproxy journal19:07
SecOpsNinjapart of the renew in verbose mode http://paste.openstack.org/show/800949/19:08
jrosseris this on ubuntu or debian?19:08
SecOpsNinjait's debian and the haproxy part is running fine because i can see the service starting in all backends and coming up...19:09
SecOpsNinjadebian 10 i believe19:09
SecOpsNinjaand i have installed using distro19:10
openstackgerritMarc Gariépy proposed openstack/openstack-ansible-haproxy_server master: Add haproxy_frontend_only and haproxy_raw feature.  https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/76650419:10
SecOpsNinjai will try removing the pre hook and starting it manually and see if the hook still fails19:10
kleiniHow do you handle vm.overcommit_memory? I still have default of 0, OpenStack thinks, 420G out of 512G are used, actually free are 158G and qemu gets "Cannot allocate memory" when probing for capabilities, so I can not spawn any new VMs on that machine although memory still seems to be available.19:11
SecOpsNinjayep the certbot is having a strange error http://paste.openstack.org/show/800951/19:13
jrosserthat is odd19:15
jrosserwhich log are the 404 from?19:15
kleiniThiagoCMC: I have one system that I initially installed with Debian 1.3 (Bo) and upgraded all the way up to Buster. The underlying hardware had to be replaced several times and the filesystem evolved from ext2, ext3, reiserfs, ext4 now finally to ZFS. But I never had to "reinstall" it.19:15
admin0spatel, i am trying to override neutron to use that commit-id and then do a os-neutron run to validate that patch19:16
admin0or you mean do it manually ?19:16
admin0like edit the file (by hand) :D19:16
admin0jrosser, spatel  - this way right ? https://gist.github.com/a1git/79d019baa855f7ef8d9ba0b47166ba6219:19
admin0i need to put that in openstack_services.yml file19:19
spatelvia commit-id19:19
openstackgerritMarc Gariépy proposed openstack/openstack-ansible-haproxy_server master: Add haproxy_frontend_only and haproxy_raw feature.  https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/76650419:20
spatelif you verify that process then let me know i will also push out19:20
spateladmin0: looks good to me https://gist.github.com/a1git/79d019baa855f7ef8d9ba0b47166ba6219:20
spatelyou can put that in user_variables.yml also19:21
admin0oh19:21
admin0:)19:21
admin0did not know that19:21
jrosserSecOpsNinja: "The exit code is 124. This is the value timeout uses to indicate the program was terminated using SIGTERM"19:21
spateladmin0: check this out - https://docs.openstack.org/openstack-ansible-os_neutron/latest/19:21
SecOpsNinjayep it seems the problem could be in the webroot in the default certbot renew.... checking their documentation about that19:22
spatelit has all those variables which you can overwrite using user_variables.yml file19:22
jrosserSecOpsNinja: the pre-hook uses "timeout" which may terminate the script like that19:23
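The 124 status discussed above can be reproduced outside certbot. This is only a minimal sketch, assuming the GNU coreutils `timeout` command is on the PATH; `exit_code_under_timeout` is a hypothetical helper name and is not part of the haproxy_server role or of certbot.

```python
import shutil
import subprocess


def exit_code_under_timeout(command, seconds):
    """Run `command` under coreutils `timeout` and return the exit status.

    Hypothetical helper for illustration only. When the time limit is hit
    (and --preserve-status is not given), `timeout` itself exits with 124,
    which is the code certbot reported for the pre-hook.
    """
    if shutil.which("timeout") is None:
        raise RuntimeError("coreutils `timeout` not found on PATH")
    result = subprocess.run(["timeout", str(seconds), *command])
    return result.returncode
```

Calling `exit_code_under_timeout(["sleep", "5"], 1)` should come back with 124 because `sleep` outlives the one-second limit, while a command that finishes in time passes its own status through. So a hook that wraps a deliberately short-lived server in `timeout` can report 124 even though nothing actually went wrong inside it.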
admin0spatel, what do i run after that / the whole setup-everything ?19:24
admin0i think hosts is not needed19:24
admin0only infra and os-neutron ?19:24
SecOpsNinjajrosser,  yep i raised the timeout but it seems that is not the problem atm (despite causing the error 124); running the pre hook separately from certbot, it seems to be a problem with the webroot files http://paste.openstack.org/show/800951/19:24
jrosserit's not webroot19:25
jrosserit uses the certbot built in web server19:25
ThiagoCMCkleini, riiight?! Try to upgrade CentOS 6 to 7!  lol19:25
spateljust run neutron playbook19:25
spateladmin0: os-neutron-install.yml playbook is enough19:26
spatelIt will detect new branch and re-build / re-install neutron19:27
SecOpsNinjajrosser,  sorry?  from what i understand the pre-hook only starts a web server, through python, and only after that will certbot create the acme challenge files to be exposed through the webserver, and that is not happening atm19:27
spateljrosser: is that correct?19:28
jrosserspatel: yes I think it’s basically like a minor upgrade19:29
spatel+119:33
*** maharg101 has quit IRC19:33
spatelTechnically we only need this patch on the LinuxBridge agent, right? not for neutron-server.19:38
spatelBut good to keep everything consistent across the board19:39
jrosserSecOpsNinja: certbot is run in standalone mode https://github.com/openstack/openstack-ansible-haproxy_server/blob/master/tasks/haproxy_ssl_letsencrypt.yml#L7219:39
jrosserin that mode it has an internal web server which serves the challenge response, we are not using webroot19:40
jrosserbecause on the haproxy node there is no web server19:40
jrosserhaproxy also cannot serve static content19:40
jrosserwe must be able to support deployments where haproxy has dedicated nodes, so this must be self contained19:40
jrosserthe pre-hook is there to give haproxy health checks sufficient time to detect which haproxy node is running certbot, by there being a valid http server on port 888819:41
SecOpsNinjayep i understand that but don't understand the standalone workflow, because atm the responses for those files are 404 from the python webserver, and that is why this does not renew (not thinking about the error 124 problem atm)19:42
jrosseri did ask which log the 404 was from.....19:42
admin0spatel,  WARNING: Did not find branch or tag 'Icfcf8c5406cfdc47..19:43
jrosserSecOpsNinja: the python webserver should run for 5 seconds before certbot19:43
jrosserit is never expected to handle the challenge19:44
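The "hold a port open so the health checks can see it" idea described above could look roughly like the following. It is a sketch under assumptions and not the role's actual pre-hook: `temporary_http_server` and `hold_port_open` are hypothetical names, the handler answers every GET with an empty 200, and it binds to 127.0.0.1 by default, whereas a real hook would listen on whatever address the haproxy backend checks. Port 8888 and the 5 seconds are the values mentioned in the conversation.

```python
import contextlib
import http.server
import threading
import time


class QuietHandler(http.server.BaseHTTPRequestHandler):
    """Answer every GET with an empty 200; it never serves challenge files."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        # Keep the hook output quiet.
        pass


@contextlib.contextmanager
def temporary_http_server(port, bind="127.0.0.1"):
    """Run a throwaway HTTP server for the duration of the `with` block."""
    server = http.server.ThreadingHTTPServer((bind, port), QuietHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        # Yield the real port so that port=0 (pick any free port) also works.
        yield server.server_address[1]
    finally:
        server.shutdown()
        server.server_close()
        thread.join()


def hold_port_open(port, seconds, bind="127.0.0.1"):
    """Keep the port answering for `seconds`, then release it."""
    with temporary_http_server(port, bind) as bound_port:
        time.sleep(seconds)
    return bound_port
```

Something like `hold_port_open(8888, 5)` mirrors the behaviour described: the port answers for a few seconds so load balancer health checks can mark the node as up, then it is released so certbot's own standalone server can take over and answer the real challenge.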
SecOpsNinjaok i do need to check how standalone works, and i will revert the change, but yep i still don't understand why this is failing19:45
spatelhmm19:45
SecOpsNinjagoing to read the certbot documentation because i don't understand why this is failing, and i reached the prod rate limit: produced an unexpected error: urn:ietf:params:acme:error:rateLimited19:50
spateladmin0: BRB19:50
jrosserSecOpsNinja: what's kind of weird is that the pre-hook is run much later than i would expect in your certbot log19:51
admin0jrosser,to get this, is it Icfcf8c5406cfdc47fabf012e82ed56c345a73af8 or 2207b885449667a7bc377f427b9123165223dbde as the neutron_git_install_branch ?19:51
admin0to get https://review.opendev.org/c/openstack/neutron/+/76540819:51
admin0or is it not there yet due to the zuul status in the end19:51
jrosserwell neither of those things.....19:52
openstackgerritMarc Gariépy proposed openstack/openstack-ansible-haproxy_server master: Add haproxy_frontend_only and haproxy_raw feature.  https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/76650419:52
jrosseryou need the git SHA of the commit on stable/ussuri19:52
SecOpsNinjajrosser, one second, i will post the full log of certbot renew --standalone --staging --break-my-certs -vv in http://paste.openstack.org/show/800954/19:53
admin0oh from here ? https://opendev.org/openstack/neutron/commits/branch/stable/ussuri  -- i don't see it merged yet19:53
admin0last update is 3 weeks ago19:53
admin0so it has passed review .. but is yet to be merged to the actual branches19:53
admin0due to some CI error19:53
jrosseradmin0: the patch here is not merged at all https://review.opendev.org/c/openstack/neutron/+/76540819:56
jrosserit fails CI19:56
jrosserSecOpsNinja: you will get a 503 from haproxy when it does not think there is a backend available to serve the request19:57
jrosserin one terminal watch hatop/haproxy journal and in the other run the cert issuance19:58
jrosseryou should see the letsencrypt backend come up in the haproxy log, if you don't there is something wrong between haproxy and the detection of port 8888 being active19:59
SecOpsNinjajrosser,  sorry but now i'm lost. if we only use the python server for 5 seconds before starting the service, to fool haproxy that a web service is active, how is the certonly standalone mode going to expose anything to outside requests? atm when certbot brings up its standalone service the haproxy check is already triggering. i will set a bigger fall threshold in the parent haproxy to see if it stays up and running longer20:00
jrossercertbot standalone mode has an internal webserver inside certbot20:00
jrossercertbot itself responds to the request20:01
mgariepyreno is not fun.20:02
SecOpsNinjaok but the parent haproxy (not the haproxy in the osa infra node) still has the connection up and running when certbot has already failed20:02
jrosseri do not understand 'parent'20:03
openstackgerritMarc Gariépy proposed openstack/openstack-ansible-haproxy_server master: Add haproxy_frontend_only and haproxy_raw feature.  https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/76650420:03
*** andrewbonney has quit IRC20:05
SecOpsNinjajrosser, sorry i have 2 haproxy: one parent (in the gateway of the company) and a child one (installed in the osa infra host).20:06
jrosseroh this is new information! :)20:06
SecOpsNinjafrom what i'm seeing, the check in the osa haproxy is inter 12000 rise 1 fall 2, and in the parent one it is rise 1 and fall 10 with the default 1 sec20:07
jrosserthen i think careful attention to the pre-hook timeout being greater than the total of the two haproxy 'rise' times plus some for luck would be needed20:08
SecOpsNinjanow with the same setting in both haproxy the channel is not open in time to trigger the haproxy check20:08
SecOpsNinjayep going to test that20:08
jrossertwo haproxy like this increases the uncertainty time i think20:08
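The rule of thumb above (pre-hook timeout greater than the sum of the two haproxy 'rise' times, plus some for luck) is simple arithmetic. A hedged sketch follows, assuming that one haproxy needs roughly `inter * rise` to mark a backend as up and that the two layers add up; the function names and the 2-second margin are illustrative assumptions, and the inter/rise values are the ones quoted in the conversation.

```python
def detection_time_ms(inter_ms, rise):
    """Approximate time for one haproxy to mark a backend up:
    `rise` consecutive successful checks, one every `inter_ms`."""
    return inter_ms * rise


def suggested_prehook_timeout_s(checks, margin_s=2.0):
    """Sum the detection time of every haproxy layer and add a safety margin.

    `checks` is a list of (inter_ms, rise) pairs, one per haproxy layer.
    The margin is an arbitrary illustrative value ("some for luck").
    """
    total_ms = sum(detection_time_ms(inter, rise) for inter, rise in checks)
    return total_ms / 1000 + margin_s


# Values quoted in the chat: OSA haproxy "inter 12000 rise 1",
# parent haproxy "rise 1" with a 1 second interval.
print(suggested_prehook_timeout_s([(12000, 1), (1000, 1)]))  # 15.0
```

With those numbers the estimate is about 15 seconds, which is one way to see why chaining two load balancers stretches the window the temporary server has to cover.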
SecOpsNinjathe parent one has a very big fall count so it's not a problem (the connection goes up when the python webserver starts and only goes down long after the certbot failed error)20:10
SecOpsNinjagoing to check osa haproxy settings and try to tune it20:10
jrosserif the timeout is not quite long enough you might find sometimes it works / sometimes fails depending on exactly what time the renewal runs vs. the haproxy checks20:10
jrosserwe had exactly this without the timeout at all and it was very unreliable20:10
jrosserthe timeout got added later to make it robust20:11
SecOpsNinjaok i will make a few tests to see what is making the 50320:11
mgariepyshould we abandon old patches (like 2yo one) ?20:16
SecOpsNinjajrosser,  yep there is something strange happening in osa haproxy because hatop stops getting data when certbot is running (the parent haproxy still has its connection open by the time certbot fails)20:17
jrosserand you allow port 80 http through from the parent haproxy?20:18
SecOpsNinjathose broadcast messages are annoying like hell when you try to follow cli tools outputs heheh20:18
SecOpsNinjayep20:18
jrosserthe other thing you can do is have a terminal open on a host on the internet20:18
SecOpsNinjai tested running only the python webserver to check outside communication and it works20:18
jrosserand you can curl the .well-known/acme-challenge endpoint without the hashed filename (because you don't know what it will be)20:19
jrosseryou can see that change from 503 to 404 and so on as the different phases of this happen20:19
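The curl check described above can also be scripted. A minimal sketch using only the Python standard library follows; the function names and the hostname in the usage note are hypothetical, and it only reports HTTP status codes (a refused connection or timeout will still raise).

```python
import time
import urllib.error
import urllib.request


def probe_status(url, timeout=5.0):
    """Return the HTTP status code for url, including 4xx/5xx codes."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the code itself is what we want to watch.
        return err.code


def watch(url, attempts=30, delay_s=1.0):
    """Print a timestamped status code once per delay_s seconds."""
    for _ in range(attempts):
        print(time.strftime("%H:%M:%S"), probe_status(url))
        time.sleep(delay_s)
```

For example, `watch("http://cloud.example.com/.well-known/acme-challenge/")` run from a host on the internet during a renewal should show the 503 to 404 transitions as the backend comes up and goes away again.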
SecOpsNinjayep osa  haproxy reload20:20
SecOpsNinjalet me see if i can find the logs regarding the lets encrypt api20:21
SecOpsNinjaok i think i understand the problem20:24
spateladmin0: any luck20:25
admin0spatel, i had to do it manually20:27
admin0but that error is gone20:27
admin0i was able to remove and add and test like 10 loadbalancers create and delete20:27
spatelmanually=hand edit?20:27
admin0:(20:28
admin0spatel,  https://opendev.org/openstack/neutron/commits/branch/stable/ussuri20:28
admin0nothing has landed in the last 3 weeks there20:28
admin0until that commit id lands there, we are out of luck to do it via any sort of automation20:28
admin0well, we can ansible copy :)20:28
SecOpsNinjajrosser,  for some reason the haproxy in osa is reloading after the python webserver times out (5s), and in that time the request in the parent haproxy gets 503 because the child haproxy is reloading20:28
spatelinteresting admin020:29
jrosserthat is very odd20:29
SecOpsNinjayep true :D20:29
jrosseradmin0: you can fork the repo on github (it's literally one click) then apply the patch yourself to stable/ussuri20:29
jrosserpoint to your own github URL and your own git SHA20:30
jrosserSecOpsNinja: really the only thing that should reload haproxy is the renewal hook20:30
*** avagi has quit IRC20:30
SecOpsNinjaeven if a backend fails it cannot reload all servers in haproxy, that's very stupid20:31
SecOpsNinjaand it explains the very strange errors that i was having in my multiple haproxy nodes....20:31
jrosseri am not sure how to determine what made it reload20:32
jrossersystemctl status will not be helpful?20:32
SecOpsNinjachecking the journal of haproxy, it always reloads after the lets encrypt backend goes down20:33
jrosserdoes it do that even if you run the pre-hook by hand?20:34
SecOpsNinjai do need to troubleshoot this but at least we found the cause of the non validation :D20:34
admin0jrosser,  i have not tried it yet ..  the last time i applied a patch was for https ssl from cloudnull .. years ago :)20:34
admin0i think i did not make notes and already forgot20:34
SecOpsNinjajrosser,  i will check that with only the python server20:35
admin0but copying files via ansible also seems OK to me ..  they will be gone in the next ansible upgrade anyway20:35
spateljrosser: forking neutron github would be good idea20:35
admin0as it will be in a new venv20:35
admin0so no hassle of doing a fork and then patching and again touching the variables and remembering them20:35
admin0download 2 files, ansible copy to the compute, restart network-agents and you are good until you run the playbooks again/next upgrade20:36
admin0in my opinion20:36
SecOpsNinjajrosser,  nope. with only the python webserver the haproxy is not reloading so it can be the certbot standalone internal service ... very strange indeed....20:36
spateladmin0: that is a good idea to just create new playbook call post-neutron-patch.yml and run along with other playbook to patch it until we have fix in branch20:37
jrosseryou could move the renewal hook aside so that cannot be run20:37
SecOpsNinjajrosser,  i will try that. and will report after going something to eat :D but again thanks for all your help troubleshooting this. i will report in 1h probably20:39
jrossersure no problem, it is getting late here too20:39
jrosserreally interested to know what is going on though :)20:39
*** rfolco has quit IRC21:34
*** yann-kaelig has joined #openstack-ansible21:35
admin0spatel, done magnum already ?21:51
admin0i failed ..21:51
admin0might do trove next before moving back to magnum again21:51
spatelwhat is the issue with magnum?21:52
spatelI did magnum deployment on my lab but not in production21:52
spatelI can't run it in production because we used vlan base provider and k8s doesn't fit in that design21:53
admin0the issue with magnum is that there is no issue in installation .... and the master (instance comes up) .. but its not marked up in the script and never reaches the point where the nodes are created and the magic happens22:11
*** avagi has joined #openstack-ansible22:18
SecOpsNinjajrosser,  i finally was able to find the cause of all problems :D22:34
SecOpsNinjafirst we don't need the pre hook to start the python webserver because certbot standalone already does that22:34
SecOpsNinjathe reload was caused by the post hook that reloads haproxy after concatenating the lets encrypt cert for haproxy22:35
SecOpsNinjathe problem in the renew was in /etc/letsencrypt/renewal/*.conf where the defined ip was different from the one expected by haproxy22:36
jrosserwithout the pre-hook the renewal is unreliable because of a race condition between haproxy healthchecks and certbot22:36
SecOpsNinjawhen i did the certbot renew -vv --http-01-address 172.30.100.253 --http-01-port 8888 it renewed it22:37
SecOpsNinjai reduced the haproxy check to inter 1 rise 1 fall 10 and it works fine22:38
SecOpsNinjatomorrow i will check why it put the wrong ip in the *conf file but at least it's renewed :D22:39
SecOpsNinjasecond i need to find if haproxy needs a full reload when some cert changes22:40
jrosserinteresting22:40
jrosserthe reload is necessary iirc to pick up the new certificate22:40
SecOpsNinjayep but at least we need to change the certbot cron job to not reload haproxy if the cert wasn't changed22:41
SecOpsNinjaIn HAProxy 2.1 (Nov 2019), a new feature allows you to change TLS certificates without requiring a reload: https://www.haproxy.com/blog/dynamic-ssl-certificate-storage-in-haproxy/22:42
SecOpsNinjathat could be interesting, to avoid some unnecessary reloads and dropped connections22:42
jrosseri wonder if we are using the right hook22:45
jrosserrenewal vs. deploy22:45
SecOpsNinjai will put a 0 timeout in the pre hook and see what happens, and try to troubleshoot the root cause of the wrong ip in the lets encrypt *.conf file22:46
SecOpsNinjabut now i can rest well knowing i solved the problem :)22:47
jrosserperhaps this is wrong https://github.com/openstack/openstack-ansible-haproxy_server/blob/master/tasks/haproxy_ssl_letsencrypt.yml#L9922:48
jrosserand the path should be /etc/letsencrypt/renewal-hooks/deploy/haproxy-renew instead22:48
SecOpsNinjai need to check the differences between both of them22:49
SecOpsNinjahttps://github.com/certbot/certbot/issues/593522:50
jrosserhttps://certbot.eff.org/docs/using.html?highlight=hook#renewing-certificates22:50
jrosserthat talks about all the different hook dirs22:50
SecOpsNinja If you want your hook to run only after a successful renewal, use --deploy-hook in a command like this.22:51
SecOpsNinjain https://certbot.eff.org/docs/using.html#renewing-certificates22:51
SecOpsNinjaWhen Certbot detects that a certificate is due for renewal, --pre-hook and --post-hook hooks run before and after each attempt to renew it. If you want your hook to run only after a successful renewal, use --deploy-hook in a command like this.22:52
SecOpsNinjaso yeah we need to change it to a deploy hook for the concat and reload22:52
jrosseri think this is relevant22:52
jrosser"You can also specify hooks by placing files in subdirectories of Certbot’s configuration directory. Assuming your configuration directory is /etc/letsencrypt, any executable files found in /etc/letsencrypt/renewal-hooks/pre, /etc/letsencrypt/renewal-hooks/deploy, and /etc/letsencrypt/renewal-hooks/post will be run as pre, deploy, and post hooks respectively when any certificate is renewed with the renew22:53
jrossersubcommand"22:53
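As a rough illustration of the deploy-hook approach discussed above, here is a minimal sketch of an executable that could live in /etc/letsencrypt/renewal-hooks/deploy. It is not the script the haproxy_server role ships. Certbot exports `RENEWED_LINEAGE` to deploy hooks; the output path and function names below are hypothetical, and the haproxy reload step is deliberately left out.

```python
import os
from pathlib import Path


def build_haproxy_pem(lineage_dir, out_path):
    """Concatenate fullchain.pem and privkey.pem into one file for haproxy."""
    lineage = Path(lineage_dir)
    combined = (lineage / "fullchain.pem").read_bytes()
    combined += (lineage / "privkey.pem").read_bytes()
    out = Path(out_path)
    out.write_bytes(combined)
    out.chmod(0o600)  # the combined file contains the private key
    return out


def main():
    # Certbot sets RENEWED_LINEAGE only when a certificate was actually
    # renewed, so this does nothing on "not yet due for renewal" runs.
    lineage = os.environ.get("RENEWED_LINEAGE")
    if not lineage:
        return 1
    # Hypothetical destination; use whatever path haproxy is configured with.
    build_haproxy_pem(lineage, "/etc/haproxy/ssl/letsencrypt.pem")
    # Reloading haproxy would follow here (left out of this sketch).
    return 0
```

A real hook script would end with something like `raise SystemExit(main())`. Because deploy hooks run only after a successful renewal, the concat and reload would no longer happen on every renewal attempt.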
SecOpsNinjaand possibly add an option for newer versions of haproxy to update the ssl cert dynamically without reloading the whole service22:53
jrosseryes, that would end up being distro specific22:53
jrosserare you running an haproxy version that can do that?22:54
SecOpsNinjalet me check that22:55
SecOpsNinjabecause i'm using the debian version i'm on 1.8.19 but they already have 2.2.6 in backports22:57
jrosserok, i can certainly look at making patch to fix the reload-when-not-renewing thing tomorrow22:58
jrossernice work finding that btw22:58
SecOpsNinjait took a day but the cause was found22:58
jrosser:/ apologies, thanks for persisting with it though22:59
SecOpsNinjai don't like my system to have strange behaviours :D22:59
SecOpsNinjaand i was getting strange errors with haproxy disconnecting, which is a problem because it's used by openstack services, so i had to try to find the cause :P23:00
jrosseroh yes indeed23:00
jrosserand if we can do reload-less new certificates with 2.x that will be great23:01
SecOpsNinjayep i will try to check that but it will probably be at the end of the year or in january23:01
SecOpsNinjai don't think i have time to check that now but it will be on my todo list and a way to contribute to the project23:02
SecOpsNinjaok i will go rest now but again thanks for all the help troubleshooting this23:02
*** SecOpsNinja has left #openstack-ansible23:10
*** spatel has quit IRC23:17
*** maharg101 has joined #openstack-ansible23:30
*** maharg101 has quit IRC23:35
*** rfolco has joined #openstack-ansible23:41
*** yann-kaelig has quit IRC23:48
*** tosky has quit IRC23:53
