
Using Archived Web Content in Your Research

Overview

Researchers are increasingly aware that content published on the web can change or disappear at any time. This guide identifies the leading archives of web content and provides context about these projects, and about web archiving in general, so that researchers can make informed use of the available resources.

Below are a number of tools that can be used to save single pages or to create your own personal web archives. Other pages of this guide highlight a number of existing web archives and provide some context about the technical challenges of web archiving.

Web Archiving Tools

The Internet Archive's Wayback Machine is the largest archive of the World Wide Web, covering more than 946 billion pages (as of August 2025), dating back as far as 1996. Pages in the Wayback Machine are captured repeatedly over time. Many pages are available in hundreds or thousands of versions, corresponding to the content of the page on different dates.
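To see which versions of a page the Wayback Machine holds, the Internet Archive exposes a CDX server at https://web.archive.org/cdx/search/cdx. With `output=json`, it returns a list whose first row names the fields (such as `timestamp` and `original`) and whose remaining rows are individual captures. The sketch below is a minimal, offline illustration under that assumption: it builds a query URL and turns a response body into clickable snapshot links. The helper names are hypothetical, and the endpoint and field names should be checked against the Internet Archive's current CDX documentation before use.

```python
import json
from urllib.parse import urlencode

# Wayback Machine CDX endpoint (assumed; confirm against current IA docs).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(target: str, year_from: str = "", year_to: str = "") -> str:
    """Build a CDX query URL that asks for JSON output for one page."""
    params = {"url": target, "output": "json"}
    if year_from:
        params["from"] = year_from  # earliest capture year to include
    if year_to:
        params["to"] = year_to      # latest capture year to include
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_json(body: str) -> list[dict]:
    """Turn a CDX JSON response (header row + capture rows) into dicts."""
    rows = json.loads(body)
    if not rows:
        return []  # no captures recorded for this URL
    header, *captures = rows
    return [dict(zip(header, row)) for row in captures]


def snapshot_links(captures: list[dict]) -> list[str]:
    """Build a replayable Wayback Machine link for each capture."""
    return [
        f"https://web.archive.org/web/{c['timestamp']}/{c['original']}"
        for c in captures
    ]
```

To use it, fetch the URL returned by `cdx_query_url("example.com")` with any HTTP client and pass the response body to `parse_cdx_json`; each resulting link opens one dated version of the page.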


How to Save a Page on the Wayback Machine

There are several methods available to save a page to the Wayback Machine. The most straightforward method is to use the submission form on the Internet Archive website.

Simply navigate to the Wayback Machine submission page, https://web.archive.org/save, paste the URL of the page you want to save in the blank field, and click SAVE PAGE. There is also a SAVE PAGE submission field on the Wayback Machine homepage, on the far right.
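The same request can be scripted when you have many pages to save. The sketch below assumes that appending a page's address to https://web.archive.org/save/ and requesting it with a plain GET triggers a capture. This pattern has historically worked for anonymous users, but it may be rate-limited or changed, and the Internet Archive also offers an authenticated Save Page Now API for heavier use. The function names and the User-Agent string are hypothetical.

```python
import urllib.request

SAVE_ENDPOINT = "https://web.archive.org/save/"  # assumed URL pattern


def save_request_url(target: str) -> str:
    """Append the page URL to the Save Page Now endpoint."""
    if not target.startswith(("http://", "https://")):
        target = "https://" + target  # assume HTTPS when no scheme is given
    return SAVE_ENDPOINT + target


def submit_for_archiving(target: str, timeout: float = 120.0) -> str:
    """Ask the Wayback Machine to capture `target`; return the final URL.

    Saving can take minutes, so the timeout is generous. Network errors
    and rate limiting surface as urllib exceptions.
    """
    request = urllib.request.Request(
        save_request_url(target),
        headers={"User-Agent": "research-archiver-sketch/0.1"},  # hypothetical
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # After redirects, this is typically the new snapshot's address.
        return response.geturl()
```

Submitting pages one at a time, with a pause between requests, is gentler on the service than firing many requests at once.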


Fig 1. Screenshot of the Wayback Machine Save Page Now submission form.

It may take several minutes for the page to save. When saving finishes, you will be given a link to the permanently archived version of the page. Note that the full snapshot link is https://web.archive.org followed by the string shown on the success page.
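Snapshot links generally follow the pattern https://web.archive.org/web/<timestamp>/<original URL>, where the timestamp is 14 digits in the form YYYYMMDDhhmmss. This is useful for citation, because the link itself records when the capture was made. The sketch below, which assumes that pattern and treats the timestamp as UTC, completes the string from the success page and reads back the capture date and original address. The helper names and the example path are hypothetical.

```python
import re
from datetime import datetime, timezone

WAYBACK_HOST = "https://web.archive.org"

# Assumed snapshot path shape: /web/<14-digit timestamp>/<original URL>
SNAPSHOT_PATH = re.compile(r"^/web/(\d{14})/(.+)$")


def full_snapshot_link(path: str) -> str:
    """Prefix the string shown on the success page with the Wayback host."""
    path = path.strip()
    if not path.startswith("/"):
        path = "/" + path
    return WAYBACK_HOST + path


def describe_snapshot(link: str) -> tuple[datetime, str]:
    """Split a snapshot link into its capture time (UTC) and original URL."""
    if not link.startswith(WAYBACK_HOST):
        raise ValueError(f"not a Wayback Machine link: {link}")
    match = SNAPSHOT_PATH.match(link[len(WAYBACK_HOST):])
    if match is None:
        raise ValueError(f"unrecognised snapshot path: {link}")
    stamp, original = match.groups()
    captured = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return captured, original
```

For example, `full_snapshot_link("/web/20250801123456/https://example.com/")` yields a complete link, and `describe_snapshot` on that link reports a capture at 12:34:56 UTC on 1 August 2025 of https://example.com/.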

Fig 2. Screenshot of the Wayback Machine Save Page Now success screen, with a red arrow pointing to the link to the archived page.


Limitations

You may find that a page you are interested in is not available in the Wayback Machine, or that the content seems incomplete. There are a few reasons why this might be the case.

  • The Wayback Machine may not have been aware of the site's existence at that time. The archive uses a web crawler, similar to those used by search engines, which follows the links on a page to identify other pages that can be archived. This approach can miss pages that are not linked from other pages. Using the Save Page Now function can help the crawler find pages that it might not be aware of otherwise.
  • The site's owner may forbid web crawlers in general, or the Wayback Machine's crawler in particular, from accessing the page.
  • There may be technical features of the page that the Wayback Machine cannot capture. These often include embedded video or interactive components of a page.
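The first two limitations can be illustrated with a toy crawler. It follows links breadth-first from a starting page, so a page that nothing links to is never discovered, and it consults a robots.txt file before capturing each page. The sketch below runs on a hypothetical, in-memory site and uses Python's standard `urllib.robotparser`. The crawler name `ia_archiver` is an assumption (it is the token historically associated with the Internet Archive), and the real Wayback Machine's handling of robots.txt has varied over time.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

HOST = "https://example.org"  # hypothetical site

# Hypothetical site structure: each page maps to the pages it links to.
SITE = {
    "/": ["/about.html", "/blog/"],
    "/about.html": ["/"],
    "/blog/": ["/blog/post-1.html", "/private/drafts.html"],
    "/blog/post-1.html": [],
    "/private/drafts.html": [],
    "/orphan.html": [],  # exists, but no other page links to it
}

# The site owner excludes one crawler from everything under /private/.
ROBOTS_TXT = [
    "User-agent: ia_archiver",
    "Disallow: /private/",
]


def crawl(seeds, site, robots, agent="ia_archiver"):
    """Breadth-first link following that skips pages robots.txt disallows."""
    captured = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        if not robots.can_fetch(agent, HOST + page):
            continue  # the owner has excluded this crawler from the page
        captured.append(page)
        for link in site.get(page, []):
            if link not in seen:  # queue each newly discovered page once
                seen.add(link)
                queue.append(link)
    return captured
```

Starting from "/" only, the crawl captures the home, about, blog, and post pages. It skips /private/drafts.html because of robots.txt and never finds /orphan.html. Adding the orphan page as an extra seed, as Save Page Now effectively does, brings it into the archive.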

Perma.cc is a service developed and maintained by the Harvard Law School Library that helps legal scholars and courts create links to web citations that will never break or disappear. When a user creates a Perma.cc link, Perma.cc creates a permanent link to an archived record of the page and its content. Even if the original source is no longer available, the archived page will continue to be accessible through the Perma.cc link. This is a very helpful tool to prevent link rot and preserve the integrity of legal citations on the web.

Note that content saved in Perma.cc can be found only through its direct link; without the exact Perma.cc link, you will not be able to locate the archived page.

The organization Webrecorder offers a suite of tools that can be used to save web pages locally, create cloud-based web archives, or replay an archived web file in the browser (learn more about web archive filetypes elsewhere in this guide).

  • ArchiveWeb.Page is a browser extension for Chrome that allows the user to save single pages to their browser's local storage. (See the ArchiveWeb.Page user guide for instructions on installing and using the extension.)

  • ReplayWeb.page is a browser-based player for archived websites. Upload WARC, WACZ, CDX, or HAR files to view and interact with archived pages, such as those captured by ArchiveWeb.Page.

  • Browsertrix is a cloud-hosted, paid web archiving service that provides convenient features such as scheduled multi-page crawls and the ability to designate "profiles" containing login information for social media or other accounts, whose content would otherwise be uncapturable by services such as the Wayback Machine.
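The files these tools produce and replay use standard formats. A WARC file is a sequence of records, each consisting of a "WARC/1.x" version line, named header fields (such as WARC-Type and WARC-Target-URI), a blank line, and a content block whose size is given by Content-Length. The sketch below reads records from an uncompressed or gzipped WARC held in memory, to show what ReplayWeb.page is working with. It is deliberately simplified: it assumes UTF-8 headers, exact-case field names, and no folded header lines. WACZ files are ZIP packages that bundle WARCs with indexes, and Webrecorder's warcio library handles real-world files properly.

```python
import gzip
from dataclasses import dataclass


@dataclass
class WarcRecord:
    version: str             # e.g. "WARC/1.1"
    headers: dict[str, str]  # named fields such as WARC-Type, WARC-Target-URI
    block: bytes             # the captured content (Content-Length bytes)


def read_warc_records(data: bytes) -> list[WarcRecord]:
    """Split a WARC file's bytes into records (simplified reader)."""
    if data[:2] == b"\x1f\x8b":
        # .warc.gz files usually hold one gzip member per record;
        # gzip.decompress handles concatenated members in one call.
        data = gzip.decompress(data)
    records = []
    pos = 0
    while pos < len(data):
        end_of_head = data.find(b"\r\n\r\n", pos)
        if end_of_head == -1:
            raise ValueError("truncated WARC record header")
        version, *fields = data[pos:end_of_head].decode("utf-8").split("\r\n")
        if not version.startswith("WARC/"):
            raise ValueError(f"not a WARC record: {version!r}")
        headers = {}
        for field in fields:
            name, _, value = field.partition(":")  # split at the first colon
            headers[name.strip()] = value.strip()
        start = end_of_head + 4
        length = int(headers["Content-Length"])
        records.append(WarcRecord(version, headers, data[start:start + length]))
        # Each record ends with two CRLFs after its content block.
        pos = start + length + 4
    return records
```

Listing each record's WARC-Target-URI is a quick way to check which pages a capture from ArchiveWeb.Page actually contains before replaying it.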

Archive-It is a web archiving service created by the Internet Archive that helps institutions archive and provide access to cultural heritage on the web. Archive-It works with 1,200 partner organizations, including BC Law as well as academic, state, and public libraries, museums, and historical societies. Collections often include archived news articles, blogs, social media, and other websites about topics of interest, so this resource can be useful when researching the specific topics covered by these institutions' digital collections. You can browse the site or run a full-text search by collecting organization, collection, site, or page text. Examples of collections include the #blacklivesmatter Web Archive, a Boston Marathon Bombing collection, and a collection about the 2013 Supreme Court hearings on DOMA/Prop 8.