-openstackstatus- NOTICE: Jobs in gate are failing with POST_FAILURE. Infra roots are investigating | 07:43 | |
*** ChanServ changes topic to "Jobs in gate are failing with POST_FAILURE. Infra roots are investigating" | 07:43 | |
-openstackstatus- NOTICE: logs.openstack.org has corrupted disks, it's being repaired. Please avoid rechecking until this is fixed | 08:23 | |
*** ChanServ changes topic to "logs.openstack.org has corrupted disks, it's being repaired. Please avoid rechecking until this is fixed" | 08:23 | |
mordred | fungi: JOY | 13:09 |
fungi | oh! my ssh retries have finally paid off | 13:09 |
fungi | the server is sort of responsive again | 13:10 |
mordred | oh good | 13:10 |
fungi | though free -m is still taking a while | 13:10 |
mordred | fungi: should we stop zuul to stop things from trying to upload logs? | 13:10 |
fungi | not sure yet | 13:11 |
mordred | or maybe pause all the nodepool providers instead, so that zuul still grabs the event queues | 13:11 |
* mordred stands by | 13:11 | |
fungi | yeah, free confirms all 8gb ram and about 4gb out of 8gb of swap in use | 13:15 |
fungi | no find commands in the process table at least | 13:15 |
*** yolanda has joined #openstack-infra-incident | 13:15 | |
yolanda | hi fungi, so you are going to restart the server? | 13:16 |
fungi | i'm trying to see if i can get the top memory offender processes | 13:16 |
fungi | yolanda: no, since it seems to sort of be working again | 13:16 |
yolanda | we went with that this morning, but it was causing us to lose access to all the other static sites, so we switched to a manual fsck | 13:16 |
yolanda | fungi, i got a POST failure now, even with log volume not being mounted. I wonder if the logs that are written on the root volume are exhausting it? | 13:17 |
fungi | i'm going to put it in the emergency list on the puppetmaster to start | 13:18 |
fungi | yeah, so i think the apache wsgi for the logs site | 13:18 |
fungi | working on disabling that site in apache | 13:20 |
fungi | tons of apache processes using very high memory in the table, so probably os-loganalyze killing us here | 13:22 |
fungi | the fsck itself is using about 6gb ram and we need every bit we can muster | 13:22 |
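A sketch of the sort of one-liner used to spot the top memory offenders; the exact command isn't recorded in the log, this is just the generic procps approach:

```sh
# list the processes using the most resident memory (RSS, in KB), largest
# first; handy for spotting the fat apache/os-loganalyze workers
ps -eo pid,user,rss,vsz,comm --sort=-rss | head -n 15
```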
yolanda | fungi, i wonder if we could split those volumes somehow | 13:26 |
yolanda | if having such a big volume is a problem | 13:26 |
*** AJaeger_ is now known as AJaeger | 13:28 | |
fungi | okay, i disabled that vhost but also stopped apache temporarily | 13:31 |
fungi | memory pressure has reduced _greatly_ | 13:32 |
fungi | and system load has dropped from over 100 to around 2 | 13:32 |
fungi | starting apache again now with the logs vhost still out of the config set | 13:33 |
yolanda | apache killing us! | 13:33 |
fungi | yeah, os-loganalyze is a resource hog, but that's not normally a problem when it's the only thing really running | 13:34 |
fungi | but when we need most of the available ram for fsck, it gets ugly | 13:35 |
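The vhost disable/re-enable dance described above would look roughly like this on a Debian/Ubuntu host; the site name here is an assumption, not taken from the log:

```sh
sudo a2dissite logs.openstack.org   # assumed site name; drops the logs vhost from the config set
sudo service apache2 stop           # stop apache entirely to shed memory while fsck runs
# ...once memory pressure and load have receded...
sudo service apache2 start          # bring the remaining vhosts back up, logs vhost still disabled
```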
fungi | next step, we're probably going to need to ditch the logs we've been spooling to /srv/static while /srv/static/logs is offline | 13:38 |
fungi | /srv/static is almost full now | 13:38 |
fungi | this is why i've usually just intentionally broken log uploading in the past since we don't have enough room to buffer them for the duration of the fsck | 13:40 |
fungi | any opinions? should we try moving them into /opt temporarily instead? | 13:40 |
yolanda | sometimes when i was hitting that, i was just letting logs upload onto the root volume, then copying them to the right one. But yes, it looks like disk space is a risk now. It has never taken this long for me to recover from an fsck | 13:41 |
fungi | well, we probably have 30-60 minutes before it fills up depending on how quickly activity ramps up | 13:41 |
fungi | i'm going to take the opportunity for a quick break to finish my morning routine and ingest some sustenance | 13:42 |
fungi | i think puppet must have reenabled the logs site before it noticed the server is in emergency disabled state | 14:07 |
fungi | working to undo it again | 14:07 |
fungi | if my shell prompt ever comes back | 14:08 |
*** jeblair has joined #openstack-infra-incident | 14:25 | |
fungi | still waiting for my shell to return to responsiveness | 14:26 |
*** EmilienM has joined #openstack-infra-incident | 14:29 | |
*** jroll has quit IRC | 14:32 | |
*** jroll has joined #openstack-infra-incident | 14:34 | |
*** yolanda has quit IRC | 14:43 | |
fungi | and still waiting. it's been stuck waiting to return the results of a df -h for over 35 minutes now | 14:43 |
mordred | fungi: wow | 14:45 |
fungi | once i regain control of that shell, sudo service apache2 stop is happening | 14:46 |
*** bnemec has joined #openstack-infra-incident | 14:50 | |
jeblair | somehow snmp is still getting data | 14:51 |
fungi | yeah, starting to wonder if df was hanging because of disk i/o so i just spammed a bunch of ctrl-c to see if it would stop trying | 15:02 |
fungi | hasn't yet though | 15:07 |
fungi | okay, it just came back and i stopped apache for a moment | 15:16 |
fungi | next highest memory user besides fsck.ext4 is rsyslogd | 15:17 |
fungi | roughly 25 sshd processes eating up quite a bit though | 15:17 |
fungi | and we're at about 90% used on /srv/static now (88g out of 99g) | 15:18 |
jeblair | maybe about 2/3 jenkins ssh. 1/3 us. | 15:18 |
fungi | i've started apache back up again and am going to watch it to see if i can tell if it's really the culprit | 15:19 |
jeblair | fungi: with or without the logs vhost? | 15:19 |
fungi | turns out the logs.o.o vhost hadn't gotten reenabled | 15:19 |
jeblair | oh huh | 15:19 |
fungi | so i don't think it was responsible for the second load blow-up (which was more sustained but also the load average didn't climb as high) | 15:19 |
fungi | right now 5-minute load has dropped below 2.0 even with apache back up (sans logs.o.o vhost still) | 15:20 |
jeblair | fungi: maybe that was just fsck? | 15:22 |
fungi | entirely possible, though i see a bunch of 00-header and sshd and xe-update-something spiking way up in the load now | 15:28 |
jeblair | it's going unresponsive for me | 15:29 |
fungi | trying to get a better idea of what all is suddenly cropping up in the process list (and why) | 15:29 |
fungi | yeah, my ps is hanging again | 15:29 |
jeblair | fungi: i think 'xe-' is xen something | 15:30 |
fungi | yeah, i saw several of these pop up relatively high on cpu percentage in top just as the system load began climbing again | 15:31 |
*** yolanda has joined #openstack-infra-incident | 15:32 | |
mordred | such a fan of in-guest utilities doing things | 15:32 |
fungi | xe-update-guest-attrs: Tool for writing arbitrary values to the Xen Store | 15:33 |
fungi | Writes arbitrary values to the Xen Store for hypervisor communication. | 15:33 |
jeblair | though to be fair, bash and sshd processes had a higher cpu percentage than even fsck, likely because they were stuck on io. | 15:33 |
fungi | right, those are likely popping up due to high iowait | 15:33 |
fungi | my terminay just got a byte in edgewise | 15:35 |
fungi | er, terminal | 15:35 |
fungi | fsck is at 84.8% complete at this point, btw (in case i'm the only one with a working screen -x there) | 15:36 |
jeblair | that seems about what it was at when i looked 15 minutes ago | 15:37 |
yolanda | it has only progressed about 10% over the last few hours | 15:38 |
jeblair | 85.2% now | 15:41 |
*** clarkb has joined #openstack-infra-incident | 15:45 | |
clarkb | fungi: so for the spool disk you've just got the logs fs unmounted so the "base" fs is covering the deeper repos? | 15:49 |
clarkb | s/repos/dirs/ | 15:49 |
fungi | clarkb: yeah, normally i've disabled the jenkins ssh key to prevent log uploads (breaks jobs for the duration of course), but this time the volume was just unmounted and logs have been let to accumulate in the parent directory's filesystem | 15:51 |
fungi | i'm still not sure how we're going to go about cleanly merging them into the logs volume (rsync?), and we're likely to run out of room to keep doing this pretty soon anyway | 15:52 |
clarkb | ya rsync is what we use for the copy and mount back in old location dance for test instances. Granted this is a fair bit more data | 15:53 |
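The "copy and mount back in the old location" dance clarkb mentions amounts to roughly the following once the repaired volume can be remounted; the spool directory name and LVM device path are assumptions for illustration only:

```sh
mv /srv/static/logs /srv/static/logs-spool            # set the spooled logs aside (assumed name)
mount /dev/main/logs /srv/static/logs                 # remount the repaired logs volume (assumed device)
rsync -a /srv/static/logs-spool/ /srv/static/logs/    # fold the spooled logs back into the volume
```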
fungi | hanging for me again. it just began to spike back up on load and i tried to catch what/where these run-parts scripts are | 15:58 |
jeblair | fungi: /etc/update-motd.d? | 16:09 |
jeblair | http://manpages.ubuntu.com/manpages/xenial/man5/update-motd.5.html | 16:10 |
jeblair | Executable scripts in /etc/update-motd.d/* are executed by pam_motd(8) as the root user at each login, and this information is concatenated in /var/run/motd. The order of script execution is determined by the run-parts(8) --lsbsysinit option (basically alphabetical order, with a few caveats). | 16:10 |
jeblair | fungi: so every time jenkins logs in, that runs | 16:10 |
fungi | wow | 16:10 |
fungi | yep, those are the scripts i was seeing the names of popping up over and over | 16:11 |
fungi | and also explains all the lsb_release calls i was seeing | 16:11 |
jeblair | one of the scripts tries to see which filesystems need fscking, which it occurs to me could be kinda slow right now. though i think it only looks at mounted filesystems. | 16:12 |
fungi | i may temporarily purge the update-motd package when i get control of my terminal again and see if that helps at all | 16:13 |
jeblair | fungi: or just move the scripts out of the way? | 16:14 |
fungi | or just move the scripts out of that dir temporarily | 16:14 |
fungi | heh, yeah, that | 16:14 |
fungi | i've pushed that through my tty, so when my shell wakes back up those should temporarily move to my homedir | 16:18 |
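What got pushed through the tty is presumably something along these lines (paths assumed); moving the scripts out of /etc/update-motd.d/ keeps pam_motd from running them at every jenkins ssh login, without purging the package:

```sh
mkdir -p /root/update-motd.d.disabled
mv /etc/update-motd.d/* /root/update-motd.d.disabled/
# to restore later: mv /root/update-motd.d.disabled/* /etc/update-motd.d/
```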
fungi | there it goes! (well, at least the shell prompt updated, the mv hasn't returned just yet) | 16:34 |
fungi | and now it's applied | 16:36 |
fungi | maybe that'll cut down a little bit of the performance degradation at each iowait spike | 16:37 |
clarkb | for a second I thought you were going to say fsck was done | 16:40 |
jeblair | erm, something is going on with the screen session that i don't understand. | 16:46 |
fungi | as far as how it's updating? | 16:48 |
fungi | also 5-minute load average is back up over 40 again | 16:49 |
fungi | tons of defunct sshd processes | 16:49 |
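A generic way to see how many zombie processes are piling up, and for which commands (not a command recorded in the log):

```sh
# STAT beginning with Z marks a zombie/defunct process
ps -eo stat,comm | awk '$1 ~ /^Z/' | sort | uniq -c | sort -rn
```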
jeblair | i guess it was something with my terminal. i've dropped and logged in again. | 16:50 |
fungi | okay, most of the defunct sshds have cleared again | 16:50 |
jeblair | 87.4% | 16:50 |
fungi | and system load seems to be dropping again | 16:50 |
*** pabelanger has joined #openstack-infra-incident | 16:50 | |
fungi | and now it's freezing on me again. seems to just be all over the place | 16:51 |
jeblair | has anyone done the math yet? at current rate it looks to me like 6.7 hours till fsck is finished? | 17:05 |
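For reference, that 6.7-hour figure checks out against the progress numbers reported earlier in the log:

```
85.2% at 15:41  ->  87.4% at 16:50  :  2.2 points in 69 min ≈ 0.032 points/min
remaining at 16:50: 100 - 87.4 = 12.6 points
12.6 / 0.032 ≈ 395 min ≈ 6.6 hours
```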
pabelanger | does it make sense to disable log publishing from jobs themselves? | 17:22 |
jeblair | fungi: it looks like /srv/static is at 100% | 17:24 |
fungi | yep. should we relocate the current content into /opt? or just delete? | 17:25 |
jeblair | fungi: opt isn't big enough | 17:25 |
jeblair | fungi: what about docs-draft? | 17:25 |
fungi | ahh, yeah, 68gb free | 17:25 |
fungi | oh, yep docs-draft has lots free | 17:26 |
fungi | i can get started doing that now | 17:26 |
fungi | once the terminal is responding again | 17:26 |
jeblair | that's going to be a problem :| | 17:26 |
fungi | once my shell wakes up, it will create a dir called /srv/static/docs-draft/logs-tmp and mv /srv/static/logs/* into there | 17:29 |
fungi | hopefully the extra i/o doesn't just make matters worse | 17:29 |
clarkb | do we know what caused the underlying volume issues? was that scheduled rax maintenance? | 17:33 |
fungi | yes, the scheduled maintenance | 17:37 |
fungi | hours of iscsi connectivity outage for one of the cinder volumes | 17:38 |
fungi | the mv is (ostensibly) under way now | 17:42 |
fungi | watch us get to 99% complete and then fanatical support "helpfully" reboots the instance for us because they see it heavily swapping | 17:44 |
clarkb | (thinking out loud here) if the osuosl stuff works out, maybe we want to carve off some of that to run a ceph cluster and host logs there? | 17:46 |
clarkb | or even just load up a node with 10TB drives and lvm it like we do today | 17:46 |
fungi | i still expect moving it to afs without a replica and sharding the tree across different servers would work | 17:47 |
clarkb | ya, we'd just need to figure out a sharding scheme that scales as we go (whereas with ceph the theory at least is you just keep adding disks and the underlying object store nature of ceph figures that out for you) | 17:49 |
clarkb | you'd also get replication with ceph | 17:49 |
jeblair | i think either would be an improvement :) | 17:49 |
clarkb | yes indeed. Just trying to think beyond the "must run on cloud" constraint and if that helps anything | 17:52 |
clarkb | build a few backblaze type fileservers, call it a day | 17:53 |
clarkb | (of course then the problem becomes that when a drive fails we have to drive to corvallis and replace it) | 17:53 |
jeblair | that's a lovely drive for me! | 17:54 |
fungi | scenic, at least? | 17:55 |
jeblair | yes, past vacaville | 17:57 |
clarkb | my drive is mostly flat and full of cows | 17:57 |
clarkb | but its not too long and I've done it before | 17:57 |
fungi | vaca is latin for cow | 17:58 |
clarkb | jeblair: side roading around the rogue river is awesome | 17:58 |
clarkb | especially in late spring before it gets too hot | 17:58 |
*** tbarron has joined #openstack-infra-incident | 17:58 | |
fungi | ouch, my mv is getting plenty of "cannot create hard link" failures out of kolla-kubernetes-deploy job logs | 18:00 |
fungi | hopefully they're in the minority | 18:00 |
jeblair | fungi: i think the throughput has slowed considerably :/ | 18:44 |
fungi | jeblair: i suspect that's because of the mv which is still proceeding | 18:44 |
jeblair | fungi: do you have a read on available space in /srv/static? | 18:45 |
fungi | no, we don't seem to be tracking that in cacti that i could tell | 18:45 |
jeblair | oh i meant progress on the mv | 18:45 |
fungi | i'll see if the screen session is responsive enough to get me another window and run df | 18:45 |
fungi | i figured i could infer the mv progress from cacti if we had been tracking it there | 18:46 |
fungi | oh, though maybe i can figure it out from the docs-draft in cacti | 18:46 |
fungi | looks like we're maybe 60% complete | 18:48 |
jeblair | cool, so we're at least able to keep above water | 18:48 |
fungi | er, sorry, 60% to go, probably 40% complete | 18:49 |
pabelanger | question, is there any volume we could move to files.o.o? Which leave us with a spare volume on static.o.o for maintenance rotation moving forward? | 18:49 |
fungi | hard to be sure since there are some gaps in the cacti responses | 18:49 |
fungi | pabelanger: when i looked before, adding all the lvms other than the one for logs together came out to less than one cinder volume | 18:50 |
fungi | oh, except for docs-draft. moving that might do it | 18:50 |
fungi | i forgot i had ruled it out because of basically also being logs | 18:50 |
clarkb | we may consider it just for simplicity of management | 18:51 |
jeblair | pabelanger: by files.o.o you mean afs? | 18:51 |
pabelanger | jeblair: not in this cause, just adding another cinder volume to the server. As I understand it, we are at a hard limited for attaching another volume on static.o.o? | 18:52 |
pabelanger | case* | 18:52 |
jeblair | pabelanger: well, files.o.o is principally an afs server | 18:52 |
fungi | df -h says /srv/static is 11% free at the moment | 18:52 |
jeblair | i don't think we should start attaching cinder volumes to what is designed to be a load-balanced web head. | 18:53 |
pabelanger | jeblair: okay. I wasn't sure which other server would make sense to help migrate volumes off of static.o.o | 18:53 |
fungi | probably a new server | 18:53 |
pabelanger | ack | 18:54 |
pabelanger | static02? | 18:54 |
fungi | maybe even a server in vexxhost? ;) | 18:54 |
fungi | i hear they have some awesome storage | 18:54 |
jeblair | yeah. either a new server, or afs. doc-draft might be a nice smaller test case for that. | 18:54 |
fungi | how about a new server _and_ afs? | 18:54 |
jeblair | (or ceph) | 18:55 |
fungi | or ceph-backed afs ;) | 18:55 |
jeblair | any of those 3 options gives us a volume slot to pvmove in the future | 18:55 |
fungi | yup | 18:55 |
jeblair | (though, i do wonder about our ability to actually execute a pvmove of that scale while running and still pruning directories) | 18:56 |
fungi | moving some of the other stuff from static.o.o to files01 (by way of afs) probably does make sense though... tarballs, static content for sites like governance, specs and security... | 18:56 |
pabelanger | we've talked about optimizing our directory structure, as everybody is likely aware, to help with pruning | 18:57 |
jeblair | most of those could be done like docs and get RO redundancy (not tarballs probably) | 18:57 |
fungi | jeblair: probably to pvmove a logs.o.o volume i'd kill/disable the pruning and compression cronjob, then do the pvmove at a time when activity is lower (e.g., a weekend) | 18:57 |
jeblair | ya | 18:58 |
fungi | might still be a little slowish, but would probably work out fine | 18:58 |
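The pvmove approach fungi describes is the standard LVM online-migration dance; a minimal sketch, assuming a volume group named "main" and made-up device names:

```sh
pvcreate /dev/xvdf           # initialize the newly attached (larger) cinder volume
vgextend main /dev/xvdf      # add it to the volume group
pvmove /dev/xvde /dev/xvdf   # migrate all extents off the old volume while filesystems stay mounted
vgreduce main /dev/xvde      # remove the old volume from the group
pvremove /dev/xvde           # wipe its lvm label before detaching the old cinder volume
```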
clarkb | I wonder if ceph backed volumes need to take as many downtimes | 18:58 |
clarkb | (thinking of vexxhost option) | 18:58 |
pabelanger | speaking of pmove, reviews welcome: https://review.openstack.org/#/c/449232/ | 18:58 |
clarkb | in theory since you have at least one replica for all ceph things, you just replace as you go | 18:58 |
fungi | pvmove is how i originally replaced all our 0.5tb cinder volumes with 1tb volumes on static.o.o | 18:58 |
clarkb | and then we wouldn't need to worry on our end | 18:59 |
pabelanger | docs I followed for eavesdrop.o.o | 18:59 |
fungi | clarkb: mnaser was talking to me about that actually, said they do seamless/uninterrupted maintenance on their ceph storage | 18:59 |
clarkb | fungi: ya thats what I expected, its one of the reasons why ceph is so shiny to ops | 19:00 |
clarkb | the cost is lower IO throughput, but that seems well worth it | 19:00 |
fungi | pabelanger: yeah, irc logs might also make sense in afs, though they're not really that huge anyway (11gb currently in our meetbot volume) | 19:01 |
jeblair | is vexxhost offering enough space for logs? | 19:01 |
fungi | jeblair: i didn't mention a number to him, but i wouldn't be surprised if that was a fine amount | 19:01 |
clarkb | also we may be able to get even more disk than we currently have there | 19:02 |
jeblair | (because all of these ideas are good and doable, but take engineering time; a straight move to vexxhost would be simpler) | 19:02 |
clarkb | +++++++++++++++ | 19:02 |
clarkb | that could be an excellent intermediate stopgap | 19:03 |
jeblair | "rsync and check back in 3 weeks :)" | 19:03 |
clarkb | or even just put a line in the sand | 19:03 |
clarkb | and document how to modify urls by hand if you need old things | 19:03 |
fungi | i'm asking him the hypothetical with actual numbers right now | 19:03 |
clarkb | (and in the mean time double copy) | 19:03 |
jeblair | i'm pretty sure we can have mod_rewrite do the transition for us seamlessly | 19:04 |
fungi | if we were already sharding by date, we could just do apache redirects until the old server content ages out on its own | 19:04 |
jeblair | (if file not exist on new server, proxy from old server) | 19:04 |
clarkb | jeblair: good point | 19:04 |
fungi | oh, or maybe even if we aren't i guess | 19:04 |
fungi | as you described there | 19:04 |
clarkb | and then if old server gives 404 it really doesn't exist. I like that its simple | 19:04 |
pabelanger | that would be cool actually | 19:05 |
jeblair | yeah, rewritecond has a "-f is regular file" test | 19:05 |
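The "serve locally if the file exists, otherwise proxy to the old server" idea jeblair describes could be expressed roughly like this in the new server's vhost (the old-server hostname is an assumption, and mod_proxy must be enabled for the [P] flag):

```apache
RewriteEngine On
# if the requested path is neither a regular file nor a directory on the new
# server, proxy the request to the old logs server instead of 404ing
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-d
RewriteRule ^/(.*)$ http://old-logs.openstack.org/$1 [P,L]
```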
fungi | he said shouldn't be a problem (we do need to update the infra donors list to include vexxhost with planet.o.o going in there anyway) | 19:05 |
fungi | also he says 12tb could be done as a single block storage volume in their environment, no problem | 19:06 |
jeblair | nice :) | 19:06 |
clarkb | would they be comfortable with doing more? | 19:06 |
clarkb | because thats the other scaling issue we face right now | 19:06 |
clarkb | (it's not a regression if not, so not a deal breaker) | 19:07 |
jeblair | well, i'm hoping with zuulv3 we can start scaling back down again. once we can start putting caps on how much logs jobs store. | 19:07 |
clarkb | the "looming" thing I worry about is with containers becoming a popular packaging format it means that packages are now 300MB large instead of 3MB and that adds up fast | 19:07 |
clarkb | but yes if we can reign that in becomes less of an worry | 19:07 |
pabelanger | container problems | 19:08 |
jeblair | fungi: er, did fsck stop? | 19:08 |
jeblair | dogs: |================================================= - 87.2% | 19:09 |
jeblair | Unconnected directory inode 222462491 (/46/437546/4/experimental/gate-tripleo-ci | 19:09 |
jeblair | logs: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY. (i.e., without -a or -p options) | 19:09 |
jeblair | what's "-a"? | 19:09 |
clarkb | my man page doesn't say | 19:09 |
jeblair | automatic repair | 19:11 |
fungi | clarkb: i expect they would, i think what we should do though is get them on the donors page as soon as planet01.o.o is running in production, start working on a transition of logs.o.o to there, and see what kind of quota they're happy with pushing that up to. even if we start at just enough quota for what we're storing now, i think it's an improvement | 19:11 |
fungi | jeblair: yikes | 19:12 |
clarkb | fungi: that makes sense | 19:12 |
fungi | jeblair: argh, so we weren't running with -y? | 19:12 |
pabelanger | is the redirect rule on our current logs.o.o server, or the new one? | 19:12 |
jeblair | fungi: before we start again... | 19:12 |
pabelanger | which way would we proxy things? | 19:12 |
clarkb | pabelanger: new one | 19:12 |
jeblair | fungi: maybe we should take a moment and think if there's anything we can do to improve the situation while fsck isn't killing the system | 19:13 |
clarkb | pabelanger: because over time traffic to the old one would die off | 19:13 |
pabelanger | clarkb: ya, going to play with rules. since I've already spent a few hours on it today | 19:13 |
pabelanger | clarkb: ya, make sense | 19:13 |
jeblair | fungi: maybe finish that mv for one... then maybe symlink logs into the docs-draft volume? | 19:13 |
fungi | jeblair: yeah, i'm looking at the fsck.ext4 manpage for ideas | 19:14 |
fungi | jeblair: also, sure the symlink sounds like a swell idea | 19:14 |
fungi | jeblair: i was reading earlier about setting fsck.ext4 to use a scratch space on disk instead of ram | 19:15 |
fungi | though that might be just as bad as (or worse than) swapping | 19:15 |
jeblair | i need to grab lunch, biab. | 19:17 |
fungi | l | 19:17 |
fungi | er, k | 19:17 |
clarkb | fungi: ya I'd worry about that especially since we aren't OOMing right? | 19:17 |
fungi | `man e2fsck.conf` has some interesting options | 19:17 |
clarkb | so worst case we hit disk via swap anyway, or best case we get the speed of memory | 19:18 |
fungi | okay, so after reading the e2fsck and e2fsck.conf manpages cover to cover, i don't have any great ideas other than we should just run with -y instead of -a so it will fix what can be fixed and if we lose some files, oh well | 19:22 |
clarkb | that works for me, | 19:23 |
fungi | i've pretty much always fsck'd this volume -y anyway because what else am i going to do to any of the corruption options it prompts about besides hit y and hope for the best? | 19:23 |
clarkb | ya | 19:23 |
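The difference between the two runs comes down to the e2fsck flags: -p/-a (preen) bails out on anything it can't fix safely, while -y answers yes to every repair prompt. A sketch of the restart, with the device path assumed since the log never names it:

```sh
fsck.ext4 -y /dev/main/logs   # answer "yes" to all repair prompts; LV path is an assumption
```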
pabelanger | is the plan to create a symlink for logs now? to help reduce pressure on volume? | 19:25 |
fungi | and of course finishing the data relocation off of /srv/static into /srv/static/docs-draft yeah | 19:25 |
pabelanger | thanks | 19:27 |
fungi | i'm going to abort the current mv of files from /srv/static/logs/* into /srv/static/docs-draft/docs-tmp/, and then in a single command line rename /srv/static/logs to /srv/static/logs.0 and symlink /srv/static/logs to docs-draft/logs, then start rsync'ing the contents of /srv/static/logs.0 into /srv/static/docs-draft/logs-tmp | 19:31 |
fungi | after which we can delete /srv/static/logs.0 and start the fsck back up with -y | 19:32 |
fungi | also this will make it easier to get the spooled logs back into the logs volume once we mount it again | 19:33 |
pabelanger | ack | 19:34 |
jeblair | fungi: ++ | 19:37 |
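The single command line fungi describes would be roughly the following (paths as stated in the plan; the exact docs-draft target may have differed in practice):

```sh
mv /srv/static/logs /srv/static/logs.0 && ln -s /srv/static/docs-draft/logs /srv/static/logs
```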
fungi | rsync -av /srv/static/logs.0/ /srv/static/docs-draft/logs-tmp | 19:37 |
fungi | that look right? | 19:37 |
fungi | i always second-guess myself on trailing-slash situations with rsync | 19:38 |
jeblair | fungi: i want a / on the end of logs-tmp. so gimme a min to check myself. | 19:38 |
fungi | heh | 19:38 |
fungi | i'm going by the `rsync -av /src/foo /dest` is equivalent to `rsync -av /src/foo/ /dest/foo` example in the rsync manpage | 19:39 |
clarkb | ya thats my reading. (btw the keyword to search for is "trailing") | 19:39 |
clarkb | (its easier for me to remember to search for trailing than to just remember the actual behavior apparently) | 19:39 |
jeblair | what's the diff for trailing slash on dest? | 19:40 |
clarkb | iirc dest doesn't matter | 19:40 |
jeblair | yeah, that's what i'm seeing. | 19:41 |
jeblair | so, lgtm. :) | 19:41 |
jeblair | ("we're both right!") | 19:41 |
fungi | was looking real quick to see if there was a good middle ground between normal execution and -v | 19:42 |
fungi | because that may slow it down if it outputs every single filename to the console | 19:42 |
fungi | any opinions? | 19:42 |
fungi | should i redirect it to a file instead? | 19:42 |
fungi | (best of both worlds since you can tail the file if you want to without terminal buffering slowing down the application)? | 19:43 |
jeblair | fungi: i'm okay just omitting it... we can df and strace | 19:44 |
fungi | wfm | 19:44 |
clarkb | write to file is still gonna block | 19:44 |
clarkb | so ya | 19:44 |
fungi | there goes | 19:44 |
fungi | it's in the same root screen session | 19:44 |
fungi | i'll keep an eye on this so i can clear out the source tree and then get the fsck started back up with -y this time as quickly as possible | 19:45 |
-openstackstatus- NOTICE: lists.openstack.org will be offline from 20:00 to 23:00 UTC for planned upgrade maintenance | 19:58 | |
fungi | i think the rsync is nearing completion since the used space on the docs-draft volume is about 110gb higher than it was earlier | 20:09 |
fungi | we'll find out shortly i guess | 20:10 |
fungi | and done! | 20:13 |
fungi | still over a tb available on the docs-draft volume too | 20:14 |
pabelanger | nice | 20:14 |
fungi | running it once more in case there were outstanding writes under it when i initially started | 20:15 |
jeblair | w00t | 20:15 |
fungi | hopefully this goes much more quickly | 20:15 |
fungi | and then i'll delete everything out of logs.0 before restarting the fsck | 20:15 |
fungi | and that's done too | 20:30 |
fungi | cleaning up | 20:30 |
fungi | starting fsck back up now, this time with -y | 20:41 |
fungi | still in the same root screen session | 20:42 |
fungi | i'm not holding my breath that this will go much faster | 20:42 |
jeblair | now that's past, i'd like to lighten the mood by pointing out that the stray characters in the screen window made it look like we were fscking the "dogs" filesystem. | 20:42 |
jeblair | i'm still giggling about that. | 20:43 |
jeblair | 19:09 < jeblair> dogs: |================================================= - 87.2% | 20:43 |
*** bnemec is now known as beekneemech | 20:49 | |
fungi | yup, i saw the same | 20:55 |
fungi | i think someone hit their d key at some point during the earlier fsck (maybe even me) | 20:55 |
fungi | probably the result of a mistyped screen detach attempt | 20:55 |
-openstackstatus- NOTICE: The upgrade maintenance for lists.openstack.org has been completed and it is back online. | 21:50 | |
clarkb | fungi: how goes the current fsck? | 22:17 |
clarkb | and would you be offended if I did go do yardwork while still fscking? | 22:18 |
jeblair | Pass 2: Checking directory structure | 22:19 |
jeblair | i guess we're looking at this finishing tomorrow morning? | 22:19 |
fungi | clarkb: not much point in sticking around for now | 22:21 |
fungi | yeah, i'll keep an eye on it while i'm awake, maybe see if jhesketh or ianw don't mind keeping tabs on it on their saturday, but will otherwise probably not reenable things until i wake up tomorrow | 22:22 |
*** ianw has joined #openstack-infra-incident | 22:24 | |
clarkb | I will be sure to check in tomorrow when I wake | 22:25 |
ianw | sorry i did not think my screen session was active so that's why there's ~'s in there | 22:28 |
ianw | i thought it would be finished! | 22:28 |
fungi | well, it was originally restarted with -p (-a really, but same thing) | 22:29 |
fungi | so when it hit an error it couldn't safely fix, it aborted (around 90% complete) | 22:29 |