Deleting from the database

Continuing on from yesterday's post: a commenter asked what happens if you want to delete something from the database irrevocably. I was going to reply with a comment, but my reply got kind of long and it's an important concept, so I decided to make a post out of it.

I have thought about this a bit and I think there are two reasons why you would want to delete something irrevocably. One is to save disk space (if the thing you want to delete is very big and you're sure you won't need it again). Current storage capacities are sufficiently high that this is a much less common problem than it used to be (especially for programmers - I suppose astronomers and particle physicists tend to have larger chunks of data, though).

The other reason is if the data should never have been checked into the version control system in the first place - for example if you accidentally checked your tax return into your company's source control repository.

Either way, any reasonable VCS should be able to cope with such requirements. To prevent abuse, this would probably have to be done by the owner or administrator of the system storing the data. I know at Microsoft there was a process for this (I never had to use it myself, but I saw an embarrassing bug report which had had its edit history erased so I know it can be done).

This is one area where distributed version control has a bit of a disadvantage - you have to contact everyone who has downloaded a copy of the repository and persuade them to erase the unwanted data and pass on the message to everybody who might have downloaded it from them and so on. For a popular project, this might be as impossible as unpublishing something that's been published on the internet.

As for the technical details of obliterating from the accumulate/cache database in particular, it's easy to do (as long as you can remove data from the accumulate table), except for any changes made on top of the undesirable change. These have to be either reparented or deleted. If there have been a lot of changes on top of the unwanted one, reparenting them might be very labour-intensive and deleting them might be undesirable, particularly if those changes are to data you're trying to remove.
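To make the two options concrete, here is a rough sketch in Python. It is not the actual accumulate/cache schema: the `Change` record, the dictionary standing in for the accumulate table, and both function names are made up for illustration. All it assumes is that each change records which change it was made on top of.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Change:
    parent: Optional[str]  # id of the change this one was made on top of
    payload: str           # stand-in for whatever the change actually stores


def children_of(table: Dict[str, Change], change_id: str) -> List[str]:
    """Ids of the changes made directly on top of change_id."""
    return [cid for cid, change in table.items() if change.parent == change_id]


def obliterate_and_reparent(table: Dict[str, Change], change_id: str) -> None:
    """Remove one change and attach its children to its parent instead."""
    removed = table.pop(change_id)
    for cid in children_of(table, change_id):
        # Only the link is fixed here. Adjusting each child's payload so it
        # still makes sense on the new parent is the labour-intensive part.
        table[cid].parent = removed.parent


def obliterate_with_descendants(table: Dict[str, Change], change_id: str) -> List[str]:
    """Remove a change and everything built on top of it."""
    removed = []
    stack = [change_id]
    while stack:
        current = stack.pop()
        stack.extend(children_of(table, current))
        del table[current]
        removed.append(current)
    return removed
```

The first function keeps the later work but leaves you to fix up every child by hand. The second is trivial to run but throws away everything that was built on the unwanted change.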
