*** hamalq has quit IRC | 00:37 | |
*** diablo_rojo has quit IRC | 03:11 | |
*** diablo_rojo has joined #opendev-meeting | 04:38 | |
*** hamalq has joined #opendev-meeting | 04:49 | |
*** hamalq has quit IRC | 05:15 | |
*** diablo_rojo has quit IRC | 10:08 | |
*** hamalq has joined #opendev-meeting | 16:31 | |
*** hamalq has quit IRC | 16:32 | |
*** hamalq has joined #opendev-meeting | 16:32 | |
*** diablo_rojo has joined #opendev-meeting | 16:34 | |
clarkb | Have we ended up with people being able to meet today? | 19:00 |
---|---|---|
clarkb | I've been distracted with parental duties (yay birthdays) | 19:00 |
clarkb | if more than just myself raise their hands as being around I can do a quick informal meeting catch up thing | 19:02 |
fungi | oh, indeed, i forgot | 19:02 |
fungi | i was going to try and get a nap in before 04:00, since there's also more openstack tc sessions at 13:00 i wanted to cover | 19:02 |
corvus | o/ | 19:03 |
fungi | but i don't need to nap just yet ;) | 19:03 |
corvus | i'm around, but don't have much to say | 19:03 |
clarkb | #startmeeting infra | 19:03 |
fungi | i guess we could talk about etherpad lower-casing, but that's not especially urgent | 19:03 |
openstack | Meeting started Tue Jun 2 19:03:35 2020 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:03 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:03 |
*** openstack changes topic to " (Meeting topic: infra)" | 19:03 | |
openstack | The meeting name has been set to 'infra' | 19:03 |
clarkb | if nothing else we'll record that we had nothing to say | 19:03 |
clarkb | #topic Announcements | 19:04 |
*** openstack changes topic to "Announcements (Meeting topic: infra)" | 19:04 | |
clarkb | This week the PTG is happening so we are a bit distracted | 19:04 |
clarkb | for that reason we'll have a shorter less formal meeting | 19:04 |
clarkb | #topic Open Discussion | 19:04 |
*** openstack changes topic to "Open Discussion (Meeting topic: infra)" | 19:04 | |
clarkb | Meetpad's room urls are case insensitive due to xmpp limitations | 19:06 |
clarkb | this has created a small amount of confusion with the mapping onto etherpad urls as etherpad urls are case sensitive | 19:06 |
corvus | fungi looked at the database and it looks like people get confused by that all the time | 19:06 |
AJaeger | o/ | 19:06 |
clarkb | the workaround we are using is to use lower case urls in both and renaming pads if necessary | 19:06 |
fungi | ahh, yeah, so i did a bit of analysis on what it might look like if we wanted to lower-case all pad names (and presumably set up redirects) | 19:07 |
corvus | so it seems like making etherpad case-insensitive in general would solve this jitsi issue for the future as well as prevent some etherpad-only mistakes | 19:07 |
fungi | we have a bit over 2k pads (out of roughly 60k) which would need to have case-insensitivity collisions resolved | 19:07 |
fungi | however, etherpad also has a great feature where if anyone connects to a new pad it saves that initial revision with just the intro text | 19:08 |
corvus | i probably am responsible for 5k empty pads :) | 19:08 |
fungi | so i did some comparisons of checksums of all the pad contents and found that if we removed those and also pads which are blank, that leaves more like 500 we'd need to look through | 19:08 |
fungi | still a lot, but not insurmountable | 19:09 |
corvus | brainstorming how to resolve collisions: we could rename one of them to something with a suffix (eg "-case") and ... if it's not too hard to edit via the api... prepend a note at the top saying "if you're looking for $otherpadname it has been moved to $newurl" ? | 19:09 |
fungi | if nothing else, we might take the checksum comparisons as a good opportunity to clean up the db. roughly half of our pads are blank or contain just the intro text | 19:09 |
corvus | fungi: if we cleanup the blank pads, we could cron that weekly or something too | 19:10 |
fungi | yeah, well, intro text only pads for sure | 19:10 |
clarkb | we can delete pads via the api right? | 19:10 |
clarkb | so that bit at least should be straightforward | 19:10 |
fungi | i'm not 100% sure removing blank pads is a good idea, because it's possible someone blanked them in a fit of vandalism, and if we delete them then we can't get them back (excepting from our database backups) | 19:11 |
fungi | but there's fewer of those | 19:11 |
corvus | sorry i meant intro | 19:11 |
fungi | the ones which are all intro text are obviously fair game for sure | 19:11 |
fungi | i also found a surprising number which are intro text plus an abiword error message | 19:11 |
fungi | also we have something like 5 variations of intro text floating around as it's changed over time | 19:12 |
corvus | there's an "appendText()" method, as well as "setText()"... so adding a redirect message seems plausible | 19:12 |
fungi | but basically anything over 20 identical checksums (after stripping leading/trailing whitespace) is trash, i verified the texts manually | 19:13 |
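Editor's note: the checksum comparison fungi describes can be sketched roughly as below. This is a minimal illustration, not the script that was actually run. The `pads` mapping of pad name to text, the function names, and the choice of SHA-256 are all assumptions; the log only says that checksums were taken after stripping leading and trailing whitespace and that groups of more than 20 identical checksums turned out to be trash.

```python
import hashlib
from collections import defaultdict


def checksum(text):
    """Hash pad text after stripping leading/trailing whitespace.

    SHA-256 is an assumption; the log does not name the hash used.
    """
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()


def group_by_checksum(pads):
    """Map each checksum to the list of pad names sharing that content.

    `pads` is a hypothetical dict of pad name -> pad text.
    """
    groups = defaultdict(list)
    for name, text in pads.items():
        groups[checksum(text)].append(name)
    return dict(groups)


def boilerplate_candidates(pads, threshold=20):
    """Return names of pads whose content is shared by more than
    `threshold` pads, e.g. pads that only hold the intro text."""
    names = []
    for members in group_by_checksum(pads).values():
        if len(members) > threshold:
            names.extend(members)
    return sorted(names)
```

Any candidate list produced this way would still need the manual review of the texts that fungi mentions before anything is deleted.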
corvus | i'm not sure how that deals with formatting (perhaps the setHTML() method would be necessary?) but it's at least something to look into | 19:13 |
corvus | any other brainstorms about how to resolve the conflicts? | 19:14 |
fungi | as for the actually empty pads, i could probably do a bit of analysis on revision count. most of them probably just deleted the intro text and that was it | 19:14 |
fungi | i mean, chances are a lot of the remaining 500 name collisions are also trash, i just haven't had time to take a look | 19:15 |
corvus | yeah, but if more than like 20 of them are real, it may be easier to automate the whole thing | 19:16 |
corvus | (after all, if we automate it, and nobody notices, it's no big deal :) | 19:16 |
clarkb | corvus: other ideas if they are the result of people mistakenly using case improperly we could merge them somehow and keep the lower case version going forward | 19:17 |
clarkb | if they are distinct then your idea seems sane | 19:17 |
corvus | like concat them? yeah | 19:17 |
corvus | oh, i guess fungi implied an option that i didn't quite pick up on too -- | 19:18 |
clarkb | ya concat is probably simplest | 19:18 |
corvus | rename one pad, and add an .htaccess entry for that one | 19:18 |
corvus | (that does a redir) | 19:18 |
corvus | that would work for people visiting etherpad.o.o directly, but wouldn't address confusion for folks arriving via meetpad | 19:19 |
clarkb | right I think we want to force etherpad to do lower case too? | 19:20 |
clarkb | at least that was what I was assuming we wanted then it would avoid confusion there and mismatched behavior with jitsi | 19:20 |
fungi | yeah, i wondered if we should make etherpad just redirect to lower-case padnames (if that's possible) | 19:20 |
AJaeger | can we ensure that future new pads are all lowercase? | 19:20 |
corvus | yeah, i think in all cases, we have etherpad redirect to lower case | 19:20 |
fungi | that avoids people creating new problem pads | 19:21 |
clarkb | we can enforce that with apache | 19:21 |
clarkb | (I think) | 19:21 |
corvus | then the question is for conflicts, do we a) move and add a note to the pad (optional: add a specific redirect for the moved pad); b) concat. | 19:21 |
corvus | clarkb: yeah, would be a simple mod_rewrite redirect | 19:22 |
corvus | fungi: do you have a list of collisions? | 19:22 |
fungi | yep! | 19:22 |
fungi | i didn't post it publicly since i don't know if anyone was relying on some random pad names to not be discoverable | 19:23 |
corvus | if we remain concerned about that, that may eliminate the idea of having a .htaccess list for specific pad name redirects | 19:23 |
fungi | it's also just a python script i can rerun to regenerate, but takes around an hour due to the number of queries | 19:24 |
corvus | fungi: ~fungi/collisions.yaml ? | 19:24 |
fungi | checking, but if that's got 504 entries then yes | 19:24 |
fungi | yeah, that looks like it | 19:25 |
fungi | that's the collisions which would remain if we cleaned up empty and intro text pads | 19:25 |
corvus | ah nice, there's some linkfarm spam there | 19:25 |
fungi | anyway, just wanted to strike up the discussion when it wasn't a weekend | 19:28 |
fungi | scripting stuff against the etherpad rest api is not hard, and it's well-documented | 19:29 |
fungi | so we could certainly consider periodic cleanup by checksum, for example | 19:29 |
corvus | spot checking these, i feel pretty confident that only one of the two of each of these is going to be important | 19:29 |
corvus | so far, they're either both linkspam, or one was clearly "the wrong one" | 19:29 |
fungi | that's my suspicion as well, they just weren't going to be as easy to mechanically identify | 19:29 |
clarkb | cool in that case we should be able to delete the bad one, then rename if necessary, and set up redirects in apache? | 19:30 |
corvus | we could use a simple rubric to determine the "better" of the two to rename | 19:30 |
fungi | so my hope is that we wouldn't need any fancy per-pad redirects or breadcrumbs | 19:30 |
corvus | well, if we go through all 500 -- do we want to? | 19:30 |
corvus | i guess if we got a bunch of folks doing it, we could probably knock it out pretty quickly | 19:31 |
fungi | i suppose we could do it in batches (the rest api would still let us get to the redirected originals) | 19:31 |
fungi | so we could still set up the mass redirect and renaming while we worked through the collisions | 19:31 |
fungi | but we'd presumably want to go through the cleanup first, at least before bulk moving | 19:32 |
corvus | here's what i'm thinking: if we want to delete one of the pads (or rename it to a non-public name), we'll need to go through manually and identify which one to keep. but if, instead, we went with one of the options above (concat or rename with link) we could use a simple length rubric to 'guess' which is the best one, so we can make that the one that people land on by default. essentially, | 19:32 |
corvus | rename that to be the lowercase one if it isn't already. | 19:32 |
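Editor's note: a rough sketch of the collision grouping and the "simple length rubric" corvus suggests. The function names and the `texts` mapping are hypothetical, and the tie-breaking rule (prefer a name that is already lowercase) is an added assumption that was not discussed in the meeting.

```python
from collections import defaultdict


def find_collisions(names):
    """Group pad names that differ only by case.

    Returns a dict of lowercased name -> sorted list of the original
    names, keeping only groups with more than one member.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return {key: sorted(members)
            for key, members in groups.items() if len(members) > 1}


def pick_keeper(candidates, texts):
    """Guess which colliding pad people should land on by default.

    The pad with the longest stripped text wins. On a tie, a name that
    is already lowercase is preferred (an assumption), then the name
    itself is used so the result is deterministic.
    """
    return max(
        candidates,
        key=lambda n: (len(texts[n].strip()), n == n.lower(), n),
    )
```

The keeper would then be renamed to the lowercase name if it is not already, with the other pad concatenated or moved as discussed above.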
clarkb | for link spam we might be able to identify those based on content? eg just a list of urls? | 19:33 |
corvus | probably so? that might prune it a bit more | 19:33 |
fungi | plan sounds reasonable, also yes spotting pads which are just lists of urls may also be scriptable | 19:33 |
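Editor's note: one way the "just a list of urls" check might be scripted. This is a heuristic sketch only. The regular expression, the `min_lines` and `ratio` parameters, and the function name are assumptions chosen for illustration, and real linkspam may well need different thresholds.

```python
import re

# A line counts as a URL line if, after optional list markers such as
# "-", "*" or "1.", it consists of a single http(s) URL.
URL_LINE = re.compile(r"^[-*\d.)\s]*https?://\S+$")


def looks_like_url_list(text, min_lines=3, ratio=0.9):
    """Return True if nearly every non-blank line of the pad is a bare URL.

    `min_lines` avoids flagging short pads that merely hold a link or two;
    `ratio` is the fraction of non-blank lines that must be URL lines.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < min_lines:
        return False
    url_lines = sum(1 for line in lines if URL_LINE.match(line))
    return url_lines / len(lines) >= ratio
```

Pads flagged this way are only candidates; a legitimate pad of reference links would match too, so a human check is still needed.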
corvus | also, some of the linkspam may actually have identical content | 19:34 |
fungi | yeah, some does, you can find the checksum analyses in checksums.yaml | 19:35 |
corvus | okay, maybe we can putz with this the rest of this week and see if we can prune the list a bit, then send an email out next week with a suggested plan | 19:36 |
fungi | wfm | 19:36 |
fungi | i at least wanted to be sure this was something we felt we ought to do | 19:37 |
clarkb | ya it seems doable | 19:37 |
clarkb | and considering people were having collision issues previously seems like a good idea meetpad or not | 19:37 |
fungi | a bunch of the url-heavy examples i'm looking at don't seem to be linkfarms for search engine purposes | 19:38 |
fungi | they instead seem to be mazes of link obfuscation and url proxies | 19:38 |
corvus | yeah, i was a little ambivalent about doing it just to "fix" meetpad, but i'm increasingly convinced it's a Good Idea | 19:40 |
corvus | fungi: yeah, there's some interesting stuff in there, the purpose of which i don't fully understand | 19:40 |
corvus | aww, i just found someone's gerrit http password :/ | 19:41 |
corvus | clarkb: i think we may be out of topics :) | 19:49 |
clarkb | agreed | 19:49 |
clarkb | #endmeeting | 19:49 |
*** openstack changes topic to "Incident management and meetings for the OpenDev sysadmins; normal discussions are in #opendev" | 19:49 | |
openstack | Meeting ended Tue Jun 2 19:49:27 2020 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:49 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-06-02-19.03.html | 19:49 |
clarkb | thank you everyone | 19:49 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-06-02-19.03.txt | 19:49 |
openstack | Log: http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-06-02-19.03.log.html | 19:49 |
clarkb | we'll be back to our regularly scheduled programming next week | 19:49 |
fungi | thanks clarkb! | 19:50 |