*** bhavikdbavishi has joined #softwarefactory | 02:06 | |
*** bhavikdbavishi has quit IRC | 03:18 | |
*** bhavikdbavishi has joined #softwarefactory | 04:06 | |
*** raukadah is now known as chandankumar | 05:09 | |
jangutter | hi, I did an upgrade last night from 3.0 to 3.2 (!!!) and I'm getting weird errors in zuul-scheduler.log ERROR gear.Server: Exception in connect loop: ssl.SSLError: [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:645) | 08:19 |
*** jpena|off is now known as jpena | 08:48 | |
jangutter | I found a hint: reverting https://softwarefactory-project.io/r/#/c/11591/ silenced the warnings (unencrypting the gearman comms). I wonder if my localCA needs to be regenerated? | 09:08 |
*** bhavikdbavishi has quit IRC | 09:32 | |
*** bhavikdbavishi has joined #softwarefactory | 09:32 | |
*** sshnaidm is now known as sshnaidm|off | 10:00 | |
*** bhavikdbavishi has quit IRC | 10:46 | |
*** bhavikdbavishi has joined #softwarefactory | 11:29 | |
*** jpena is now known as jpena|lunch | 12:51 | |
tristanC | jangutter: the SSL cert should have been auto-generated; are all the packages up-to-date? (e.g. yum update) | 12:56 |
jangutter | tristanC: Lemme check! Thanks. | 12:56 |
jangutter | tristanC: No packages marked for update. I am using my own certs for the reverse proxy though, would that make a difference? | 12:57 |
jangutter | tristanC: sfconfig.yaml:network.tls_cert_file and friends. (thanks for that, btw!) | 12:58 |
tristanC | jangutter: hum, maybe, so the gearman.crt and key are generated with https://softwarefactory-project.io/cgit/software-factory/sf-config/tree/sfconfig/components.py#n69 | 12:59 |
tristanC | using the CA from /var/lib/software-factory/bootstrap-data/certs/localCA.pem | 13:01 |
jangutter | tristanC: yeah, what's weird to me is I'm getting "WRONG_VERSION_NUMBER"... I'm not a TLS expert, but could it be the localCA.pem is still pretty old and doesn't have new ciphers? | 13:04 |
tristanC | jangutter: does this command return "OK": openssl verify -CAfile /etc/zuul/ssl/localCA.pem /etc/zuul/ssl/gearman.crt ? | 13:05 |
jangutter | tristanC: where else is localCA used? (i.e. would it break stuff if I go delete it and recreate it?) | 13:05 |
jangutter | tristanC: /etc/zuul/ssl/gearman.crt: OK | 13:05 |
tristanC | jangutter: and "rpm -q rh-python35-python-gear" says rh-python35-python-gear-0.13.0-1.el7.noarch right? | 13:08 |
jangutter | Oooh! nope | 13:09 |
jangutter | 0.12 | 13:09 |
tristanC | actually, that's normal, this version of python-gear should work | 13:10 |
tristanC | so yeah, maybe the localCA is too old, though we didn't have such an issue on our SF deployments that got upgraded from 3.0 | 13:11 |
jangutter | The backtrace has: ssl_version=ssl.PROTOCOL_TLSv1 | 13:11 |
tristanC | which is what python-gear is using | 13:13 |
jangutter | from what I can gather WRONG_VERSION_NUMBER seems to be a generic error (you also get it if you're trying to do tls on unencrypted links). | 13:15 |
tristanC | yes, so in SF-3.2, the zuul-scheduler gearman service is now protected by TLS | 13:16 |
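The WRONG_VERSION_NUMBER symptom can be reproduced without Zuul: it is what a TLS client typically reports when the other end answers in plain text. Below is a minimal Python sketch using only the standard library. `plaintext_server` is a hypothetical stand-in for a gearman listener that was never restarted with its TLS settings, and `probe_tls` is a hypothetical helper that attempts a handshake with certificate checks deliberately disabled, so it only answers "does the peer speak TLS at all?". The exact OpenSSL reason string may vary between OpenSSL versions.

```python
import socket
import ssl
import threading
from typing import Optional


def plaintext_server(listener: socket.socket) -> None:
    """Hypothetical stand-in for a gearman listener still running without TLS."""
    conn, _ = listener.accept()
    with conn:
        try:
            conn.recv(4096)  # the client's TLS ClientHello; meaningless to this server
            conn.sendall(b"ERR unknown command\n")  # plain text, not a TLS record
            while conn.recv(4096):  # hold the connection open until the client drops it
                pass
        except OSError:
            pass


def probe_tls(host: str, port: int, timeout: float = 5.0) -> Optional[ssl.SSLError]:
    """Attempt a TLS handshake; return None on success, else the ssl.SSLError.

    Connection refused or timeouts are not TLS errors and propagate as OSError.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # diagnostic only: we just want to know
    ctx.verify_mode = ssl.CERT_NONE  # whether the peer speaks TLS at all
    with socket.create_connection((host, port), timeout=timeout) as raw:
        try:
            with ctx.wrap_socket(raw, server_hostname=host):
                return None
        except ssl.SSLError as exc:
            return exc


if __name__ == "__main__":
    listener = socket.create_server(("127.0.0.1", 0))
    threading.Thread(target=plaintext_server, args=(listener,), daemon=True).start()
    err = probe_tls("127.0.0.1", listener.getsockname()[1])
    print(f"handshake failed: {err!r}" if err is not None else "peer speaks TLS")
    listener.close()
```

Pointed at the scheduler's gearman port (4730, per the netstat/ss checks in this log), a `None` result would suggest the listener speaks TLS, while an `SSLError` would be consistent with a stale plaintext process like the one found here.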
tristanC | jangutter: does this work: "echo status | /usr/local/bin/gearman-client" | 13:17 |
jangutter | Should be libexec, let me check. | 13:17 |
jangutter | I'm restarting zuul with the TLS back in place. | 13:18 |
tristanC | jangutter: this script should be copied to /usr/local/bin | 13:19 |
jangutter | Nope, I only see: cgit-config-generator.py resources2repoxplorer.py resources.sh in there. | 13:19 |
tristanC | jangutter: another thing to check is if the gearman service has been restarted, it should be a scheduler child process, you can get its pid using "sudo netstat -nepal | grep 4730.*LISTEN" | 13:19 |
jangutter | Yep, ss -nlp | grep 4730 shows it listening. | 13:20 |
tristanC | jangutter: that's odd, did you run "sfconfig" after the upgrade? | 13:20 |
jangutter | yep, multiple times afterwards in fact. | 13:20 |
jangutter | I checked the libexec path is in the ansible scripts shipped with sf-3.2 btw. | 13:21 |
jangutter | oh. Zuul's not starting now :-( | 13:22 |
jangutter | zapping the TLS config made it work again (phew). | 13:23 |
tristanC | jangutter: my bad, the script is indeed copied to libexec now, sorry it's late here :) | 13:23 |
jangutter | Hey, this is definitely not a priority! Thanks very much for helping. | 13:24 |
tristanC | jangutter: could you paste the error you get? e.g. the line before might be helpful | 13:24 |
jangutter | I tried to find what exactly triggered it but failed. Lemme paste the error into pastebin | 13:25 |
jangutter | https://pastebin.com/RHQtgDAm | 13:26 |
tristanC | jangutter: and could it be that the gearman process (the one listening on 4730) didn't get restarted with the TLS settings? | 13:27 |
jangutter | Hmmmm... let me check that theory! I thought the gearman process was forked off the zuul-scheduler service. | 13:27 |
jangutter | OK, the 4730 port goes dead if I stop zuul-scheduler. | 13:28 |
jangutter | If I re-enable TLS, it pauses at: INFO zuul.ConfigLoader: Loading configuration from /etc/opt/rh/rh-python35/zuul/main.yaml | 13:29 |
tristanC | jangutter: if it pauses there, it likely means it is waiting for the executor/merger to perform merge tasks over gearman | 13:31 |
tristanC | jangutter: perhaps try to restart rh-python35-zuul-executor now | 13:31 |
tristanC | jangutter: it seems that if the scheduler reaches "Loading configuration", then it managed to connect to gearman | 13:32 |
jangutter | systemctl restart zuul-executor took loooong | 13:33 |
tristanC | jangutter: it could be that the services didn't get restarted properly, especially when going from 3.0 to 3.2 | 13:33 |
jangutter | Ah failing: AttributeError: 'MergeJob' object has no attribute 'updated' | 13:33 |
jangutter | zuul-scheduler is now failing on an exception. | 13:33 |
jangutter | I'm restarting it... looks fine thus far. | 13:34 |
jangutter | INFO zuul.Scheduler: Full reconfiguration complete | 13:34 |
jangutter | Bingo, thanks! | 13:35 |
jangutter | looks like "turning it off, then turning it on" made it work! | 13:35 |
tristanC | jangutter: the updated attribute error is a known benign issue: see https://review.openstack.org/#/c/633259/ | 13:35 |
tristanC | jangutter: yeah, I guess the service didn't get restarted as expected; before 3.2, the upgrade used to stop everything, do the upgrade, and start everything | 13:36 |
tristanC | jangutter: in 3.2, sf-config tries to be smarter: it should only restart a service if that service's package got updated | 13:37 |
jangutter | If I went from 3.0 to 3.1, I might not have noticed. | 13:37 |
tristanC | jangutter: but this may not work well if sfconfig process failed | 13:37 |
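The 3.2 restart policy described above can be sketched as a diff of package versions. This is a hypothetical Python illustration, not sf-config's code: the `SERVICE_PACKAGES` mapping (including the `rh-python35-zuul` package name) and the `"old"`/`"new"` version strings are made up. It also shows one plausible way a failed run leaves stale processes behind: if packages were upgraded but the run aborted before restarting, a re-run sees no version change and restarts nothing.

```python
from __future__ import annotations

# Hypothetical mapping from service to the packages it runs from.
SERVICE_PACKAGES: dict[str, list[str]] = {
    "rh-python35-zuul-scheduler": ["rh-python35-zuul", "rh-python35-python-gear"],
    "rh-python35-zuul-executor": ["rh-python35-zuul", "rh-python35-python-gear"],
}


def services_to_restart(before: dict[str, str], after: dict[str, str],
                        service_packages: dict[str, list[str]]) -> list[str]:
    """Return the services whose packages changed version between two snapshots."""
    changed = {pkg for pkg, version in after.items() if before.get(pkg) != version}
    return sorted(svc for svc, pkgs in service_packages.items()
                  if any(pkg in changed for pkg in pkgs))


if __name__ == "__main__":
    # First run: the zuul package is upgraded, so both services qualify.
    print(services_to_restart({"rh-python35-zuul": "old"},
                              {"rh-python35-zuul": "new"}, SERVICE_PACKAGES))
    # -> ['rh-python35-zuul-executor', 'rh-python35-zuul-scheduler']

    # Re-run after a run that upgraded packages but failed before restarting:
    # both snapshots already say "new", so nothing gets restarted.
    print(services_to_restart({"rh-python35-zuul": "new"},
                              {"rh-python35-zuul": "new"}, SERVICE_PACKAGES))
    # -> []
```

The pre-3.2 "stop everything, upgrade, start everything" approach avoids this failure mode at the cost of restarting services that did not change, which is why a full reboot (as suggested next) is a reasonable way to get back to a known state.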
tristanC | jangutter: perhaps you should restart the instance too, to make sure everything is running the right version, and perhaps update the kernel too | 13:38 |
jangutter | echo status | /usr/libexec/software-factory/gearman-client is pausing though. | 13:38 |
jangutter | But, thanks I think I'll do some rebooting. | 13:39 |
jangutter | have a great weekend! | 13:39 |
tristanC | jangutter: gearman-client doesn't exit iirc; it should print the list of jobs and end with a single ".\n" | 13:39 |
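For the status check, the gearman administrative protocol (as documented for gearmand) answers `status` with one line per registered function, with FUNCTION, TOTAL, RUNNING and AVAILABLE_WORKERS separated by tabs, and ends the list with a line holding a single ".". A reader that waits for that terminator will look stuck while the server is slow, which fits the "quiet" gearman-client seen here. A small Python sketch of the parsing side, assuming that format; `FunctionStatus`, `parse_status` and `reply_complete` are hypothetical helper names, not part of gear or the gearman-client script, and the function names in the sample reply are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FunctionStatus:
    name: str
    total: int    # jobs queued for this function, including running ones
    running: int  # jobs currently being run by a worker
    workers: int  # workers registered as able to run this function


def reply_complete(buffer: bytes) -> bool:
    """True once the buffer ends with the lone '.' terminator line."""
    return buffer == b".\n" or buffer.endswith(b"\n.\n")


def parse_status(reply: str) -> list[FunctionStatus]:
    """Parse a complete admin 'status' reply into one record per function."""
    lines = reply.split("\n")
    if "." not in lines:
        raise ValueError("incomplete reply: no terminating '.' line")
    records = []
    for line in lines[: lines.index(".")]:
        if not line:
            continue
        name, total, running, workers = line.split("\t")
        records.append(FunctionStatus(name, int(total), int(running), int(workers)))
    return records


if __name__ == "__main__":
    # Illustrative reply; function names and counts are made up.
    sample = "merger:merge\t0\t0\t1\nexecutor:execute\t2\t1\t1\n.\n"
    assert reply_complete(sample.encode())
    for fs in parse_status(sample):
        print(f"{fs.name}: {fs.total} queued, {fs.running} running, {fs.workers} workers")
```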
jangutter | yeah, it's just quiet. | 13:40 |
jangutter | Gimme a sec, let me run the openssl manually. | 13:40 |
jangutter | nope, I'll do more digging. | 13:41 |
tristanC | alright, i'll leave now, let me know if reboot helped | 13:42 |
jangutter | weird, gearman-client works, but it's just reallly slooow. | 13:49 |
jangutter | rebooting. | 13:50 |
*** jpena|lunch is now known as jpena | 13:54 | |
*** bhavikdbavishi has quit IRC | 13:55 | |
jangutter | tristanC: thanks again, after a reboot, all the services seem to be much happier now. | 13:55 |
*** chandankumar is now known as raukadah | 14:26 | |
*** bhavikdbavishi has joined #softwarefactory | 15:20 | |
*** jangutter has quit IRC | 16:58 | |
*** bhavikdbavishi has quit IRC | 17:11 | |
*** bhavikdbavishi has joined #softwarefactory | 17:17 | |
sfbender | Javier Peña created DLRN master: Do not fallback to master on branches starting with rhos- https://softwarefactory-project.io/r/15121 | 17:47 |
*** jpena is now known as jpena|off | 18:03 | |
*** irclogbot_3 has joined #softwarefactory | 18:11 | |
*** bhavikdbavishi has quit IRC | 18:46 | |
*** rfolco|rover has quit IRC | 19:21 | |
*** irclogbot_3 has quit IRC | 19:48 | |
*** irclogbot_3 has joined #softwarefactory | 20:03 | |
*** rfolco has joined #softwarefactory | 20:37 | |
*** irclogbot_3 has quit IRC | 21:37 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!