A big problem on Reddit is that the same content is submitted again and again (a practice known as reposting).
A vocal minority always comments that an image has already been posted many times, sometimes earlier that same day, week, or month.
So I started thinking about how I could make a Reddit Enhancement Suite extension that hides reposts.
I can see a few ways this could work.
But first, why?
Pros
- Reposts suck. Content on Reddit should be fresh, and seeing the same submission over and over lowers the quality of the site.
Cons
- Reposts are a real part of Reddit. Just because something is reposted doesn’t mean the new thread won’t have an interesting addition to the discussion that a user has never seen before.
- I remember when I used to block the imgur domain using Reddit Enhancement Suite: the extension just hid those submissions, so whenever I went to the front page I would see only two actual posts, because almost everything was an imgur link. A repost-hiding extension could be just as counter-productive if it ends up hiding most of the front page because most submissions are reposts.
One Possible Implementation
- Record every front page post over a long period (a month or so, to start) in a data store (take your pick, Mongo or SQL).
- The extension checks each post’s URL against the database and, if the URL already exists, hides the submission.
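A minimal sketch of that check, assuming an SQLite table stands in for the data store (Mongo would work just as well) and assuming a few hypothetical URL normalization rules, so that http vs. https, a leading www., or a trailing slash doesn’t let a repost slip through. `RepostStore`, `normalize_url`, and `visible_posts` are illustrative names, not part of Reddit Enhancement Suite.

```python
import sqlite3
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to a comparison key (hypothetical rules, tune as needed).

    Drops the scheme, a leading 'www.', the fragment, and a trailing slash,
    and lowercases the host, so trivial variants of one link compare equal.
    """
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


class RepostStore:
    """Hypothetical store of every front page URL seen so far."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        # PRIMARY KEY gives the column an index, so lookups don't scan the table.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS seen (url TEXT PRIMARY KEY, first_seen TEXT)"
        )

    def record(self, url: str, first_seen: str) -> None:
        # INSERT OR IGNORE keeps only the earliest sighting of each URL.
        with self.conn:
            self.conn.execute(
                "INSERT OR IGNORE INTO seen (url, first_seen) VALUES (?, ?)",
                (normalize_url(url), first_seen),
            )

    def is_repost(self, url: str) -> bool:
        row = self.conn.execute(
            "SELECT 1 FROM seen WHERE url = ?", (normalize_url(url),)
        ).fetchone()
        return row is not None


def visible_posts(store: RepostStore, urls: list[str]) -> list[str]:
    """Return the submissions the extension would leave visible."""
    return [u for u in urls if not store.is_repost(u)]


if __name__ == "__main__":
    store = RepostStore()
    store.record("https://imgur.com/abc123", "2014-01-05")
    print(visible_posts(store, ["http://www.imgur.com/abc123/", "https://imgur.com/xyz789"]))
    # ['https://imgur.com/xyz789']
```

Keeping only the first sighting means the store also remembers when a link first hit the front page, which could be used to hide only reposts within some time window.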
Issues:
- Collecting the data can take a while, but I found other people who have been archiving Reddit data.
- Looking up a particular URL in the data store can be slow. Hashing helps, but with enough URLs collisions will eventually happen, so a hash match still has to be confirmed against the full URL.
- One way to make lookups faster is to split the URLs across several hash tables so each check only has to search one of them. I remember that some data stores reverse the host name to make this easier: subdomain.vvohra.com is stored as com.vvohra.subdomain, so the .com’s, .net’s, etc. can be stored on different servers, which speeds up checking.
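Here is a rough sketch combining those two ideas, assuming an in-memory dict per shard stands in for a separate server. Keys are reversed host names, the shard is picked from the top-level domain, and each shard is indexed by a short fingerprint hash. Because two different URLs can share a fingerprint, a lookup confirms the full key before reporting a repost. `ShardedUrlIndex`, `stable_hash`, the shard count, and the 16-bit fingerprint size are hypothetical choices for illustration.

```python
import hashlib
from urllib.parse import urlsplit


def stable_hash(text: str) -> int:
    # Python's built-in hash() is randomized per process for str,
    # so use SHA-256 to get the same value on every run and machine.
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")


def reversed_host_key(url: str) -> str:
    """'http://subdomain.vvohra.com/page' -> 'com.vvohra.subdomain/page'"""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return ".".join(reversed(host.split("."))) + parts.path


class ShardedUrlIndex:
    """Hypothetical index that splits URLs across several tables by TLD."""

    def __init__(self, num_shards: int = 4, fingerprint_bits: int = 16) -> None:
        # Each shard maps a short fingerprint -> set of full keys that share it.
        self.shards: list[dict[int, set[str]]] = [{} for _ in range(num_shards)]
        # Deliberately small so collisions are easy to demonstrate.
        self.mask = (1 << fingerprint_bits) - 1

    def _locate(self, url: str):
        key = reversed_host_key(url)
        tld = key.split(".", 1)[0]  # e.g. 'com' or 'net'
        # Every key with the same TLD lands in the same shard
        # (with few shards, several TLDs can still share one).
        shard = self.shards[stable_hash(tld) % len(self.shards)]
        return shard, stable_hash(key) & self.mask, key

    def add(self, url: str) -> None:
        shard, fingerprint, key = self._locate(url)
        shard.setdefault(fingerprint, set()).add(key)

    def contains(self, url: str) -> bool:
        shard, fingerprint, key = self._locate(url)
        # A matching fingerprint could be a collision, so confirm with the full key.
        return key in shard.get(fingerprint, set())


if __name__ == "__main__":
    index = ShardedUrlIndex()
    index.add("http://subdomain.vvohra.com/page")
    print(index.contains("http://subdomain.vvohra.com/page"))  # True
    print(index.contains("http://vvohra.net/page"))  # False
```

Storing the full key next to the fingerprint costs extra space, but it turns a collision from a wrongly hidden post into one extra string comparison.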
Existing Solutions
Karma Decay shows which images have been reposted. I think it works similarly to TinEye (reverse image search), so it should be able to match copies hosted on different domains and resized copies. Karma Decay also seems to have an extension that lets a user see when a submission has duplicates.
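I don’t know what Karma Decay or TinEye actually do internally. To illustrate how a resized copy can still match, here is a sketch of an average hash (aHash), a simple perceptual hash swapped in for this example: shrink the image to 8x8 cells, mark each cell as brighter or darker than the mean, and compare the resulting 64-bit fingerprints by Hamming distance. It assumes the image is already decoded into a grayscale list of rows at least 8 pixels on each side; `average_hash`, `looks_like_repost`, and the 5-bit threshold are hypothetical choices.

```python
def downscale(pixels: list[list[float]], size: int = 8) -> list[list[float]]:
    """Block-average a grayscale image (list of rows) to size x size.

    Assumes the image is at least `size` pixels high and wide, so no block is empty.
    """
    h, w = len(pixels), len(pixels[0])
    small = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        row = []
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(pixels: list[list[float]], size: int = 8) -> int:
    """One bit per cell: 1 if the cell is brighter than the mean of all cells."""
    cells = [v for row in downscale(pixels, size) for v in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for v in cells:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions where the two fingerprints differ."""
    return bin(a ^ b).count("1")


def looks_like_repost(img_a, img_b, threshold: int = 5) -> bool:
    return hamming(average_hash(img_a), average_hash(img_b)) <= threshold


if __name__ == "__main__":
    def checkerboard(n, cell, invert=False):
        return [[255 if ((x // cell + y // cell) % 2 == 0) != invert else 0
                 for x in range(n)] for y in range(n)]

    original = checkerboard(64, 8)
    resized = checkerboard(32, 4)  # same picture at half the size
    inverted = checkerboard(64, 8, invert=True)
    print(looks_like_repost(original, resized))   # True
    print(looks_like_repost(original, inverted))  # False
```

Because the fingerprint comes from a tiny, averaged thumbnail, scaling an image up or down barely changes it, which is the property a repost checker needs to catch resized copies.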
Another extension I found was Repost Blocker, which seems to be exactly what I brainstormed. From its documentation, it appears to use Karma Decay to check for reposts.
Just a short brainstorming activity, gotta think some more!