*** AJaeger has quit IRC | 06:10 | |
*** AJaeger has joined #openstack-infra-incident | 06:31 | |
*** rosmaita has joined #openstack-infra-incident | 11:15 | |
*** rlandy has joined #openstack-infra-incident | 12:14 | |
dmsimard | ohai, etherpad.openstack.org is running Ubuntu 14.04 with an outdated version of etherpad. There was a release today which fixes arbitrary code execution: http://blog.etherpad.org/2018/04/07/important-release-1-6-4/ | 13:53 |
---|---|---|
dmsimard | The file timestamps are up to date in /opt/etherpad but the git log dates back to 2015 (as do some of the settings and configuration files) | 13:54 |
dmsimard | I'll try and see if I can get 1.6.4 to work off of a new 16.04 VM | 13:55 |
pabelanger | http://git.openstack.org/cgit/openstack-infra/puppet-etherpad_lite/tree/manifests/init.pp has the ability to run develop version, but unsure why we are pinned to cc9f88e7ed4858b72feb64c99beb3e13445ab6d9 | 13:57 |
fungi | well that's fun. i used to be the one to receive security@etherpad.org e-mail and work with the author on embargoed disclosure with some of their downstream stakeholders | 14:07 |
dmsimard | etherpad01.o.o spinning up on 16.04, we'll see how it goes | 14:07 |
fungi | i guess i'm not any longer (it's been a couple years since they had any security fixes though so maybe not entirely surprising) | 14:07 |
fungi | is etherpad-dev.o.o not running latest? | 14:08 |
dmsimard | I haven't looked, let me see | 14:08 |
dmsimard | fungi: oh, yeah, etherpad-dev runs the latest version | 14:08 |
fungi | /opt/etherpad-lite/etherpad-lite has 1fdb01fd759133b4da001dc5e233420a14cd8d59 checked out | 14:09 |
fungi | from today | 14:09 |
fungi | on etherpad-dev | 14:09 |
dmsimard | well, that's good -- we know 1.6.4 works with our current deployment setup, that's something | 14:09 |
fungi | looks like node has been running since january, so i'm going to do a service restart | 14:10 |
fungi | just to make sure we're testing the latest version | 14:10 |
pabelanger | web is down for me on etherpad-dev | 14:11 |
fungi | yeah, it's restarting | 14:11 |
fungi | takes a few minutes, if memory serves | 14:11 |
pabelanger | cool | 14:11 |
corvus | looks like the version was pinned just because that was the latest develop version at the time. so i think it was just us being conservative with versions on the production server. if develop works on -dev, it should be fine to upgrade. | 14:11 |
pabelanger | great | 14:12 |
fungi | yeah, that's been our pattern in the past. test latest version on e-dev.o.o, if it's good then roll to that version on e.o.o | 14:12 |
fungi | though it's taking a while to start, so it _may_ not be happy | 14:12 |
fungi | gonna check logs here in a sec | 14:12 |
dmsimard | np, etherpad01.o.o is still spinning up on 16.04 (I screwed up and had to restart) | 14:13 |
dmsimard | I wonder if there's any SQL migrations ? | 14:13 |
fungi | init: etherpad-lite pre-start process (10176) terminated with status 1 | 14:13 |
dmsimard | I got this when running launch, didn't seem fatal: http://paste.openstack.org/raw/718739/ | 14:15 |
fungi | /var/log/eplite/error.log says "Ensure that all dependencies are up to date... If this is the first time you have run Etherpad please be patient." | 14:15 |
dmsimard | not sure of the impact | 14:15 |
dmsimard | fungi: maybe we update etherpad but not its deps ? | 14:17 |
fungi | i'm checking to see what deps those might be | 14:17 |
dmsimard | fungi: I think this only ever ran once: https://github.com/openstack-infra/puppet-etherpad_lite/blob/master/manifests/init.pp#L93-L106 | 14:17 |
dmsimard | because of the "creates" | 14:17 |
dmsimard | ah doh \| Apr 9 14:16:48 etherpad01 puppet-user[20797]: Could not find data item etherpad_ssl_cert_file_contents in any Hiera data file and no default supplied at /opt/system-config/production/manifests/site.pp:454 on node etherpad01.openstack.org | 14:18 |
fungi | dmsimard: yeah, i expect we should rerun installDeps.sh | 14:19 |
fungi | in /opt/etherpad-lite/etherpad-lite on etherpad-dev.o.o i'm running `sudo -u eplite env HOME=/var/log/eplite ./bin/installDeps.sh` | 14:22 |
fungi | it's churning through pulling in the deps now | 14:23 |
dmsimard | fungi: lgtm | 14:23 |
fungi | it eventually spewed this error: http://paste.openstack.org/show/718740 | 14:24 |
fungi | i think it may simply not like that the cert we're using there is self-signed | 14:24 |
dmsimard | fungi: wait | 14:25 |
dmsimard | HOME=/var/log/eplite ? | 14:25 |
dmsimard | should that not be like /opt/etherpad-lite or something ? | 14:25 |
fungi | https://git.openstack.org/cgit/openstack-infra/puppet-etherpad_lite/tree/manifests/init.pp#n98 | 14:25 |
fungi | i think it just wants somewhere writeable by the eplite user | 14:26 |
dmsimard | huh | 14:26 |
dmsimard | okay | 14:26 |
dmsimard | re-spinning etherpad01 on 16.04 with fixed hiera things | 14:27 |
fungi | seeing if maybe that error was non-fatal, trying again to start etherpad-lite service | 14:27 |
fungi | trying to manually start the service here's what i'm getting: http://paste.openstack.org/show/718741/ | 14:32 |
fungi | i wonder if it should instead be looking in /opt/etherpad-lite/etherpad-lite/src/node_modules/ | 14:33 |
fungi | though i don't see a ep_etherpad-lite subdir under there either | 14:33 |
dmsimard | there's one in | 14:34 |
dmsimard | oh, huh | 14:34 |
dmsimard | etherpad.o.o is different | 14:34 |
dmsimard | etherpad.o.o has a node_modules at the root of /opt/etherpad-lite/etherpad-lite | 14:35 |
dmsimard | and then you have /opt/etherpad-lite/etherpad-lite/node_modules/ep_etherpad-lite -> ../src | 14:35 |
dmsimard | I don't see that on etherpad-dev | 14:35 |
fungi | i wonder if it wouldn't be simpler to just blow away /opt/etherpad-lite entirely and re-kick puppet | 14:36 |
dmsimard | on etherpad-dev ? worth a shot. I'd simply rename the /opt/etherpad-lite directory before we're sure it's the good way though | 14:36 |
fungi | sure, can do that too. in case we want to compare content | 14:36 |
corvus | qq, why are we upgrading the server? | 14:37 |
fungi | moved it to /opt/etherpad-lite.old_2018-04-09 | 14:37 |
corvus | the operating system, that is | 14:37 |
fungi | i don't know, nor do i expect it to be necessary | 14:38 |
fungi | it was the rabbit hole i found people in when i got here | 14:38 |
fungi | i'm focusing on just getting latest etherpad code to deploy safely on existing servers | 14:38 |
fungi | re-kicking puppet on etherpad-dev now | 14:39 |
dmsimard | corvus: no particular reason, I figured I might as well take the opportunity to upgrade it, someone had already added xenial in site.pp | 14:40 |
fungi | i'd prefer if we could focus on fixing the security vulnerability in the fastest way possible, but i don't object to people looking into upgrading the operating system once this is behind us | 14:41 |
fungi | fatal: [etherpad-dev.openstack.org]: FAILED! => {"changed": false, "failed": true, "msg": "/usr/bin/timeout -s 9 30m /usr/bin/puppet apply /opt/system-config/production/manifests/site.pp --logdest syslog --environment 'production' --no-noop --detailed-exitcodes failed with return code: 6", "rc": 6, "stderr": "", "stdout": "", "stdout_lines": []} | 14:42 |
fungi | yeah, it's hitting that same "npm ERR! Error: CERT_UNTRUSTED" | 14:43 |
frickler | not sure whether that has been mentioned yet, there are two patches still in progress related to etherpad on xenial | 14:43 |
frickler | https://review.openstack.org/528156 and https://review.openstack.org/528130 | 14:43 |
dmsimard | frickler: ok thanks, let's leave xenial for later then | 14:43 |
fungi | looking for a quick workaround now for what i expect is related to the self-signed cert on the dev server | 14:44 |
dmsimard | fungi: we could generate a letsencrypt cert ? | 14:44 |
fungi | we could. will that be faster? | 14:44 |
dmsimard | Do we generate/manage letsencrypt certs anywhere right now ? | 14:45 |
fungi | may not actually be the server's cert it's complaining about | 14:45 |
dmsimard | can either pattern off of that or do a manual certbot challenge | 14:45 |
fungi | looks like this is more likely due to outdated trust set in older node.js releases | 14:46 |
fungi | https://github.com/npm/npm/issues/20191 | 14:46 |
dmsimard | ah so we need the patches that frickler mentioned | 14:46 |
frickler | the notes in https://etherpad.openstack.org/p/infra-sprint-xenial-upgrades seem to indicate that one needs at least node 6.x for current etherpad versions | 14:46 |
fungi | yeah, ubuntu bug 1760840 | 14:47 |
openstack | Ubuntu bug 1760840 in npm (Ubuntu) "npm contains hardcoded certificate, so npm is not working anymore.." [Undecided,New] https://launchpad.net/bugs/1760840 | 14:47 |
frickler | see also the "in progress" section there for ethercalc | 14:47 |
fungi | looks like it was reported less than a week ago | 14:47 |
fungi | i'm trying the patch from that bug as a temporary workaround, mostly to make sure it's the actual (and only) problem we're facing | 14:50 |
fungi | cleared out /opt/etherpad-lite and am re-kicking puppet now | 14:51 |
mnaser | fungi: apt-get install ssl-cert and you will find /etc/ssl/{certs,private}/ssl-cert-snakeoil.pem | 14:51 |
mnaser | lazy fool proof way of getting self signed certs D: | 14:52 |
dmsimard | Wow that's a lot of hardcoded certificates | 14:53 |
fungi | that seems to have worked | 14:54 |
fungi | mnaser: yep, that's what we already do in the puppet-etherpad_lite module if no certs are provided | 14:54 |
fungi | puppet applied without error after applying the patch to config-defs.js | 14:55 |
fungi | etherpad-lite start/running, process 18909 | 14:55 |
dmsimard | fungi: where is that config-defs file running ? | 14:55 |
dmsimard | or located, rather | 14:56 |
fungi | however it looks like it's probably crashing immediately | 14:56 |
fungi | dmsimard: /usr/share/npm/node_modules/npmconf/config-defs.js | 14:56 |
dmsimard | thanks | 14:56 |
fungi | looks like it's gone into a classic etherpad spawn->crash->respawn->crash->respawn->... loop | 14:57 |
fungi | this is what i find repeating in the error log for each time it starts: http://paste.openstack.org/show/718742/ | 14:59 |
dmsimard | I have to step away for dentist appointment, be back in a bit | 14:59 |
fungi | i'm going to try manually starting the service again in the foreground | 15:00 |
fungi | this looks to be the next problem: http://paste.openstack.org/show/718744/ | 15:01 |
fungi | and `/usr/local/bin/node --version` does indeed report v0.10.25 | 15:02 |
fungi | yeah, that's a symlink to /usr/bin/nodejs because we tell it to use the system package per http://git.openstack.org/cgit/openstack-infra/system-config/tree/modules/openstack_project/manifests/etherpad_dev.pp#n9 | 15:05 |
fungi | so system-config change 526978 and puppet-etherpad_lite change 528156 try to address that though we'll also need to patch the etherpad and etherpad_dev classes in system-config to use something other than "system" for the $nodejs_version parameter | 15:12 |
fungi | 528156 also has the other change frickler linked (528130) as a parent | 15:13 |
fungi | so i'm reviewing those now | 15:13 |
fungi | oh, nevermind, 528156 is the parent of 528130 not the other way around | 15:17 |
corvus | fungi: it looks like the stack is blocked by 528625 | 15:19 |
fungi | yeah, i just finished reviewing that one | 15:19 |
corvus | lgtm too | 15:19 |
fungi | other than the line setting the homedir in that exec resource being somewhat redundant now, it seems fine | 15:19 |
fungi | though ultimately it's 526978 we need and then another system-config change on top of it | 15:20 |
fungi | i've approved that one and am writing the change to openstack_project::etherpad_dev now | 15:21 |
fungi | https://review.openstack.org/559767 Use nodejs 6.x on etherpad-dev.o.o | 15:24 |
corvus | clarkb, pabelanger: ^ | 15:31 |
* clarkb catches up | 15:33 | |
pabelanger | same | 15:35 |
clarkb | I've approved nodejs update change | 15:36 |
fungi | thanks, i'll clean up etherpad-dev so it'll get retried | 15:40 |
fungi | backing out the patch from bug 1760840 as well | 15:41 |
openstack | bug 1760840 in npm (Ubuntu) "npm contains hardcoded certificate, so npm is not working anymore.." [Undecided,New] https://launchpad.net/bugs/1760840 | 15:41 |
fungi | removed /opt/etherpad-lite and /var/log/eplite/.npm too | 15:42 |
clarkb | fungi: so I make sure I'm up to speed, we are updating nodejs on -dev so that we can deploy latest etherpad-lite there to fix a bug. Once we show that is working we'll do similar with production? | 15:48 |
fungi | clarkb: 100% correct | 15:48 |
fungi | wow, jobs for system-config changes really do seem to take a while | 16:09 |
corvus | looks like it depends on the provider | 16:13 |
corvus | sometimes they take 50% longer | 16:13 |
fungi | i'm going to need to disappear in ~30 minutes for an appointment, but expect this shouldn't be hard to iterate on | 16:17 |
fungi | looks like they just merged | 16:17 |
fungi | clarkb just pointed out to me that our production deployment is probably only actually impacted by the third bullet on http://blog.etherpad.org/2018/04/07/important-release-1-6-4/ | 16:22 |
fungi | since it's on a random commit somewhere after 1.5.0 but prior to 1.6.0 | 16:23 |
fungi | which rules out the first bullet | 16:23 |
fungi | and using mysql for the pad store | 16:23 |
fungi | which rules out the second bullet | 16:23 |
fungi | so depending on how worried we are about people being able to extract content from pads whose names they don't know, this is likely not super urgent? | 16:24 |
corvus | indeed. the obscurity of random etherpad data is nice (to share private drafts, etc). but hopefully folks have all used that with a grain of salt and not relied on it not being discovered. | 16:24 |
corvus | might be nice to finish upgrading it today, but probably not stop-the-world urgent | 16:25 |
fungi | right, as in i'm happy to continue poking at it, but not so worried about lunch-appointment-induced delays | 16:25 |
fungi | and we can probably move remaining discussion back to #openstack-infra | 16:25 |
clarkb | ya I'm going to approve the dib change for bionic dns now that this is less urgent (I didn't want dib distracting us but I think we don't have to ensure this is done immediately before lunch as fungi puts it ) | 16:27 |
fungi | clarkb: there's a pending zuul scheduler restart | 16:28 |
clarkb | that should be fine, can just recheck or reapprove if it doesn't make it in before that | 16:28 |
fungi | yeah | 16:29 |
dmsimard | sorry about the commotion, didn't realize our etherpad was so out of date :p | 16:29 |
clarkb | I've become somewhat allergic to updating it because every update meant bugfixes at the summit in the middle of summiting | 16:29 |
clarkb | but we have updated for bug fixes as necessary so fine to continue updating | 16:29 |
fungi | dmsimard: well, it's still semi-critical for etherpad-dev, ironically | 16:30 |
fungi | but as long as we keep it offline there until we're running the latest version, shouldn't be a huge concern | 16:30 |
dmsimard | I was looking at the couple patches we needed.. looks like we have an unrelated failure in https://review.openstack.org/#/c/528626/ "ArgumentError: Could not find declared class ::drush::git::drush at /etc/puppet/modules/drupal/manifests/drush.pp:32 on node" | 16:33 |
dmsimard | for groups.o.o | 16:33 |
dmsimard | The parent patch dates back to december, I'll try a rebase. | 16:34 |
dmsimard | oh, wait, that's in puppet-etherpad, not system config... it's already up to date. | 16:34 |
dmsimard | meh, "fatal: unable to access 'https://git.drupal.org/project/puppet-drush/': Failed to connect to git.drupal.org port 443: Connection timed out" let's try that again. | 16:37 |
clarkb | dmsimard: what is the relationship between etherpad and drupal? | 16:37 |
clarkb | I'm not confused | 16:37 |
clarkb | *now | 16:37 |
dmsimard | clarkb: https://review.openstack.org/#/c/528626/ | 16:37 |
dmsimard | clarkb: was failing a puppet apply job on a drupal thing because the puppet module installation failed for drush | 16:38 |
clarkb | ah it's the noop apply job running against all the nodes in site.pp | 16:38 |
dmsimard | yeah.. | 16:38 |
dmsimard | I was confused too :) | 16:38 |
clarkb | dmsimard: https://git.drupal.org/project/puppet-drush/ is a 404 I'm guessing it moved | 16:43 |
dmsimard | clarkb: the http is just not set up right, git clone works | 16:44 |
dmsimard | (I tested it) | 16:44 |
clarkb | ah | 16:44 |
clarkb | ya confirmed works via git clone | 16:47 |
-openstackstatus- NOTICE: zuul was restarted to update to the latest code; please recheck any changes uploaded within the past 10 minutes | 16:51 | |
*** rlandy has quit IRC | 21:55 |
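
The node version mismatch that came up in the log (`/usr/local/bin/node --version` reporting v0.10.25, against the sprint notes suggesting current etherpad needs at least node 6.x) could be caught with a small pre-flight check. What follows is a minimal Python sketch and not anything that was run during the incident; the function names and the default minimum major version of 6 are assumptions made for illustration.

```python
import re
import subprocess


def parse_node_version(text):
    """Parse a string such as 'v0.10.25' into a tuple of ints like (0, 10, 25)."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized node version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(version_text, minimum_major=6):
    """Return True when the major version is at least minimum_major.

    The default of 6 is an assumption taken from the '6.x' remark in the log.
    """
    return parse_node_version(version_text)[0] >= minimum_major


def installed_node_version(node_path="/usr/local/bin/node"):
    """Ask the node binary for its version string (path as quoted in the log)."""
    result = subprocess.run(
        [node_path, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

On a host like etherpad-dev before the nodejs change landed, `meets_minimum(installed_node_version())` would be expected to come back false, which is the condition the later system-config changes were meant to fix.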