This page summarizes the Digital Projects Unit’s recent web archiving activities.
The CyberCemetery is an ongoing program in which we collect and store the final web content of U.S. government agencies and commissions that agency staff, librarians, or other interested parties have identified as “dead.” We make the content freely and permanently available to the public through the CyberCemetery collection in the UNT Digital Library. As of 2017, the CyberCemetery contained the web sites and publications of more than 100 defunct governmental units. This collection is provided through a partnership between the UNT Libraries, the National Archives and Records Administration (NARA), and the U.S. Government Publishing Office (GPO), as part of the Federal Depository Library Program.
End-of-Term Presidential Harvest
For this project, we worked with the Library of Congress, the Internet Archive, the California Digital Library, and the U.S. Government Printing Office to harvest sites that were likely to change quickly at the end of President Bush’s administration on January 20, 2009. The web sites are available in the searchable End-of-Term Archive. A two-year grant from the Institute of Museum and Library Services (IMLS) funded research into classifying the End-of-Term Archive using the Superintendent of Documents (SuDocs) Classification System.
2012 and 2016
We again worked with our partners to harvest sites at the end of President Obama’s first and second terms.
UNT developed the Nomination Tool to allow communities of subject specialists to recommend URLs for upcoming web harvests. Recent projects with the International Internet Preservation Consortium (IIPC) community include Winter Olympics 2014, Nelson Mandela Archive, and Papal Transition. Of more local interest to North Texas, we run monthly crawls of materials pertaining to the expansion of Interstate Highway 35.
UNT Domain Harvests
Since 2005, we have crawled our own UNT domain and its subdomains twice per year, near the end of the fall and spring semesters. These harvests normally include any pages whose URLs contain unt.edu or one of its subdomains, such as library.unt.edu. We can manually add or exclude domains on request. The harvested content is persistently stored in our repository, but it is not publicly available at this time. Members of the UNT community may gain access from on-campus UNT IP addresses or through the VPN.
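The scope rule described above (unt.edu and its subdomains, plus manually added or excluded domains) can be illustrated with a small sketch. This is not our actual crawler configuration. The function name `in_scope` and the example include and exclude lists are hypothetical, and the sketch assumes that the rule is applied to the host name of each URL rather than to the full URL string.

```python
from urllib.parse import urlsplit

# Hypothetical values for illustration only; not the real crawl settings.
BASE_DOMAIN = "unt.edu"
EXTRA_INCLUDED = {"example.org"}             # manually added domains (assumed)
EXCLUDED = {"excluded-example.unt.edu"}      # manually excluded domains (assumed)


def _matches(host, domain):
    """Return True if host equals domain or is a subdomain of it."""
    return host == domain or host.endswith("." + domain)


def in_scope(url):
    """Decide whether a URL falls inside the harvest scope."""
    host = (urlsplit(url).hostname or "").lower()
    if not host:
        return False
    # Manual exclusions take precedence over every other rule.
    if any(_matches(host, d) for d in EXCLUDED):
        return False
    # The base domain and all of its subdomains are in scope by default.
    if _matches(host, BASE_DOMAIN):
        return True
    # Otherwise the host must belong to a manually added domain.
    return any(_matches(host, d) for d in EXTRA_INCLUDED)
```

Matching on the host name with a leading dot keeps look-alike hosts such as notunt.edu out of scope, which a plain substring test on the URL would let through.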