[UPDATED JULY 2022]
This morning I was watching a politics program on TV that mentioned a particular website and how information on it had seemingly been removed.
It made me wonder why they didn’t use a couple of basic tools to find an older version of that site.
Here’s a very basic guide to finding “deleted” content on the Internet.
Can deleted websites be recovered?
The internet has existed since 1983, and vast amounts of data and content have been exchanged through it. It is inevitable that older websites will be deleted over time. The question is, can deleted websites be recovered? The simple answer is yes.
There are web archives on the internet that store publicly listed internet content. By publicly listed, we mean anything that is not protected by a password or encryption. The idea of these web archives, fundamentally, is to create a system similar to a traditional library. Information, historical content, and Creative Commons content are made available to anyone who needs it. Per their user guidelines, they follow the rules provided by national libraries in the U.S. and can therefore be considered legal.
But retrieving that content to create a newer website is a whole different ballgame. Searching through and collecting archived data can take time, and transferring that data to your newly built website is equally time-consuming. It is also important to understand that not all content on the internet has been archived, because web crawlers have their limits. So there is a chance that not everything can be recovered. In fact, the oldest archived content from the internet dates back only to 1996, when web archiving began. The chance of finding content older than this is slim to none.
How can I find old deleted Internet pages?
Search Engine Cache
The big three search engines (Google, Yahoo!, Bing) – and probably some others – all store cached versions of pages.
It’s very easy to see if a cached page is available. Just look below the individual search result for either the “Cached” or “Cached page” link.
This is most useful when you know the page has changed in the past few days (or even hours).
It’s not easy to determine exactly when the last copy of the page was stored, as search engine spiders vary in their frequency of visits to any website, but it’s good to use when content has been static for a while and is then removed.
It’s also worth noting that generally only the text is cached. Images are pulled directly from the live web page if they are still available; otherwise they simply show as blank.
If an image is changed but its name remains the same, you will likely see the replacement image, so this method can’t be relied upon to view older images.
For Google, Yahoo and Bing, use the site: prefix (e.g. site:domain.com) to list the pages the search engine holds for a website.
To search for the latest cached copy of a specific URL, you can simply paste that URL into the search box (e.g. domain.com/somepage.htm).
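If you run the same site: search often, the search URL can be built in a few lines. This is only a rough sketch: the function name site_search_url and the SEARCH_ENDPOINTS table are made up for illustration, and the query-string layouts are simply what each engine's public search page appears to use, so they could change.

```python
from urllib.parse import urlencode

# Assumed layout of each engine's public search URL: (base address, query parameter).
# These are illustrative and not taken from any official API documentation.
SEARCH_ENDPOINTS = {
    "google": ("https://www.google.com/search", "q"),
    "bing": ("https://www.bing.com/search", "q"),
    "yahoo": ("https://search.yahoo.com/search", "p"),
}


def site_search_url(engine, domain, terms=""):
    """Build a search URL that limits results to one domain via the site: prefix."""
    base, param = SEARCH_ENDPOINTS[engine]
    # "site:domain.com" on its own lists indexed pages; extra terms narrow the results.
    query = f"site:{domain} {terms}".strip()
    return f"{base}?{urlencode({param: query})}"


# Print one ready-to-open URL per engine for the example domain.
for name in SEARCH_ENDPOINTS:
    print(site_search_url(name, "domain.com"))
```

Paste any of the printed URLs into a browser and it should behave the same as typing site:domain.com into that engine's search box.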
Wayback Machine
The Wayback Machine at archive.org lets you view various snapshots of websites that are at least six months to a year old.
This is a great resource to use when you are interested in seeing how a web page used to look, even if the site no longer exists.
There are a lot of advanced options, but often it’s enough just to enter the URL of a website into the search box and see what dates come up.
The website is a little flaky sometimes and doesn’t always return results, but it’s a great way of seeing old pages that either no longer exist, or have had a makeover.
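If you would rather check for a snapshot from a script than from the search box, archive.org also offers a small availability lookup that returns JSON. The sketch below assumes that endpoint (https://archive.org/wayback/available) and its response layout stay as they are at the time of writing; the function names are my own, and the sample payload is a made-up example of the response shape, not a real capture.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Wayback Machine availability endpoint (assumed unchanged at the time of reading).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the lookup URL; timestamp is an optional YYYYMMDD-style string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp  # ask for the snapshot closest to this date
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(payload):
    """Return the archived URL from a decoded response, or None if nothing is stored."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def find_snapshot(page_url, timestamp=None):
    """Query archive.org over the network and return the closest snapshot URL, if any."""
    with urlopen(availability_url(page_url, timestamp), timeout=10) as response:
        return closest_snapshot(json.load(response))


# Illustrative payload only, showing the shape the parser expects.
sample = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20060101000000/http://domain.com/somepage.htm",
            "timestamp": "20060101000000",
            "status": "200",
        }
    }
}
print(closest_snapshot(sample))
```

Calling find_snapshot("domain.com/somepage.htm", "20060101") makes the real request. Since the site can be flaky, it is worth wrapping that call in your own error handling and retrying later if it fails.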
Find old versions of websites
Library of Congress Internet Archives
This website is great for content from well-known websites.
Old Web Today