Another Argulator feature that never was

One Argulator feature I was, at one point, absolutely convinced that I needed was a web page cache/archive. The trouble is that one often wants to link to other sources of information when making statements (for example to cite a source or back up some assertion). On the web the natural way to do that is to link to a URL elsewhere on the web. But URLs are fickle things - the resource they point to can change with time or disappear altogether (something I've always tried to avoid happening with this site - I believe pretty much every reenigne.org URL that was ever valid is still valid today and points to the same information, even if it has been tweaked a bit over the years).

If you're referring to a URL in an Argulator statement, though, that's a problem because a statement's meaning is never supposed to change once it's been created. If it could change, you'd be putting words into the mouths of all the people who have expressed an opinion on that statement. So I wanted a way to make sure that web pages mentioned in Argulator statements never changed or disappeared, which meant bringing them under the control of the Argulator server. The idea was that when someone mentioned a URL, the Argulator server would fetch that URL, along with the graphics and CSS required to make the page display properly, just as if it were a web browser. Then it would store these files on the server and show its cached copy (much like Google's cache or the Internet Archive does) when a link is clicked.
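
To make the "just as if it were a web browser" step concrete, here is a minimal sketch in Python of how a server might work out which extra files a page needs. It was never part of Argulator; the names `ResourceCollector` and `resources_needed` are made up for illustration, and it only looks at images and linked stylesheets, ignoring things a real browser would also chase (stylesheet `@import` rules, fonts, frames and so on).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collect absolute URLs of images and stylesheets referenced by a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            # Resolve relative paths against the URL the page was fetched from.
            self.resources.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.resources.append(urljoin(self.base_url, attrs["href"]))


def resources_needed(html, base_url):
    """Return the list of sub-resource URLs found in the given HTML text."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.resources
```

Each URL returned would then have to be fetched and stored alongside the page, and the stored page rewritten to point at the local copies.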

But that turns out to be surprisingly difficult, and opens up a whole cannery of worms. Security is a big problem. To ensure nothing bad is ever served from the Argulator site we need to strip out all executable content (script, Java, ActiveX, Flash, XUL, Silverlight etc.) from the pages, which means parsing all the CSS and tag attributes. After that treatment the page might not render at all (especially if it contains things like protection from ad-blockers). Such a cache should also respect the directive and refuse to cache pages whose authors do not want them cached.
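
As a rough illustration of what stripping executable content involves, here is a sketch in Python using only the standard library. It is not something Argulator ever contained, and it is nowhere near a safe sanitizer: the blocked-tag list, the `Sanitizer` class and the `sanitize` function are all assumptions made for the example, and it does not touch CSS at all, which is exactly the part that makes the real problem hard.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical policy for this sketch; a real cache would need a far longer list.
BLOCKED_CONTAINERS = {"script", "object", "applet", "iframe"}
BLOCKED_VOID = {"embed"}  # no closing tag, so nothing to skip over
URL_ATTRS = {"href", "src"}


class Sanitizer(HTMLParser):
    def __init__(self):
        # Keep character references as written so text passes through unchanged.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a blocked element

    def _emit_start(self, tag, attrs):
        if tag in BLOCKED_VOID or self.skip_depth:
            return
        kept = []
        for name, value in attrs:
            if name.startswith("on"):  # onclick, onload, ...
                continue
            if name in URL_ATTRS and value and value.strip().lower().startswith("javascript:"):
                continue
            kept.append(name if value is None else f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join([tag] + kept) + ">")

    def handle_starttag(self, tag, attrs):
        if tag in BLOCKED_CONTAINERS:
            self.skip_depth += 1
        else:
            self._emit_start(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        if tag not in BLOCKED_CONTAINERS:
            self._emit_start(tag, attrs)

    def handle_endtag(self, tag):
        if tag in BLOCKED_CONTAINERS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag not in BLOCKED_VOID:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")


def sanitize(html):
    """Return the HTML with blocked elements and script-bearing attributes removed."""
    sanitizer = Sanitizer()
    sanitizer.feed(html)
    sanitizer.close()
    return "".join(sanitizer.out)
```

Even this toy version drops comments and doctypes as a side effect, which hints at why a sanitized page may no longer render the way the original did.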

Then what do we do if someone requests we take down a cached page (maybe it contains illegal content, or the content owner complains on copyright grounds)? We don't want to create work for ourselves, but on the other hand we don't want to make it too easy for people to sabotage statements they disagree with by getting that statement's sources removed.

I considered the possibility of just linking to pages on the Internet Archive but that may stifle discussions of current events since the IA doesn't serve any pages less than 6 months old. But the idea of linking to an external site instead of keeping the pages on Argulator got me thinking. Such a cache may be useful for other things besides Argulator, so maybe the cache should be a separate (but associated) site. Realizing that such a site would be useful in its own right got me wondering why nobody had done that before, which made me realize that maybe they had. A few Google searches later I found several sites which did exactly that. I then realized that Argulator should stick to its core competency and leave the web caching to the experts. Statements can link to any URL anywhere on the internet, but for the sake of stability it is recommended to link to one of these caching services. I hope Argulatists will realize that URLs pointing elsewhere might not have the same contents as they had when the statement was originally created, and take such statements with the appropriate amount of salt.
