Version 1.0, August 2007
Introduction
Responsibility for storing and providing user access to new University of North Texas (UNT) Electronic Theses and Dissertations (ETDs) is moving from Academic Computing Services to the UNT Libraries. The first section of this document provides background on ETDs at UNT and elsewhere, followed by a description of current local practice. Finally, it offers recommendations for an optimal environment to ensure long-term access to and preservation of UNT’s ETDs. For the current status of the project, see ETD Progress Report. If you have questions or comments regarding this document or the ETD project in general, please contact Mark E. Phillips at mark.phillps@unt.edu or Daniel Gelaw Alemneh at daniel.alemneh@unt.edu.
Scope
Development of a viable workflow requires balancing the needs of multiple stakeholders. The preliminary analysis presented in this document is based only on UNT manuals and guideline documents, and its scope is limited to the UNT Libraries’ projected stewardship role. In view of these limitations, all ETD stakeholders will test the workflow based on this document’s recommendations for feasibility. We anticipate an ETD pilot in Fall 2007, with eventual campus-wide implementation of the full ETD workflow.
Background
Theses and dissertations represent a wealth of content created by
university students in the degree-seeking process. Historically, the UNT
Libraries housed two copies of each paper-based UNT thesis or
dissertation, depositing one copy in the University Archives and placing
one copy for use in the Libraries’ general collections.
Among the first five American universities that required ETDs for
graduation, UNT began accepting theses and dissertations in electronic
format in 1999. This switch to an electronic delivery system of
scholarly output fundamentally changed the way these documents were
handled and stored. The ETDs were loaded onto UNT Academic Computing
Services servers and the UNT Libraries provided bibliographic access
through the Libraries’ online catalog.
Current Trends
New technology for digital interchange provides opportunities for
extensive dissemination of graduate students’ scholarly work. Over the
past few years, institutional ETD programs have become the norm, not the
exception.
An ETD program provides processes, standards, software that automates
functions, and a digital infrastructure that facilitates access and
preservation. Commentators (ETD 2007) agree that implementation of
comprehensive strategies for placing a collection of institutional
intellectual output on the Web requires some changes in institutional
policies and practices. It also needs the support of a wide range of
stakeholders on campuses. These include graduate students, faculty
members, libraries, graduate schools, and, in some cases, commercial
publishers and other external players.
In the early 2000s, the first wave of ETDs was usually stored as a part
of digital library collections administered by university libraries.
These collections later served as the foundation for institutional
repositories. By extending their existing objectives and by working
together at the state, national, and international levels, university
libraries are now playing a vital role in ensuring permanent and
persistent access to this indigenous knowledge base.
Current ETD Workflow at UNT [prior to Fall 2007]
Currently [prior to Fall 2007], ETDs are handled at the University of North Texas in the following workflow:
- Degree candidates submit their ETDs to the Graduate School in the Portable Document Format (PDF).
- The Graduate Reader reviews the files for formatting errors and permissions issues and ensures that candidates supply any needed corrections or documentation. When the files meet all the requirements of the Graduate School, the Graduate Reader approves them.
- Two copies of each approved PDF are made:
  - UNT version
    i. A file folder is created with the student's name, as “lastname_firstname”. The student folder is saved in either an “Open” or “Restricted” folder. (An open thesis/dissertation will be available to the entire Internet community. A restricted thesis/dissertation will be limited to use by those with a valid UNT login. According to the Graduate School’s electronic document filing form, all ETDs are openly available unless compelling reasons exist to restrict the document.)
    ii. The thesis or dissertation file is named either “thesis” or “dissertation” and saved in the student folder. The file is saved with protections that prevent copy/pasting and/or printing, either in whole or in part.
    iii. An index page (to be used by catalogers) is created using Microsoft FrontPage. The file is named “index” and also saved in the student folder.
  - ProQuest copy: The file is saved as “lastname-f” with no protections. Theses are saved in the Theses folder; dissertations are stored in the Dissertations folder.
At the end of the semester, after the Registrar has formally closed the semester (usually 6-8 weeks after commencement), the Graduate School distributes the approved files:
- UNT files are loaded onto Academic Computing Services (ACS) file server; hard copies of title pages/abstracts are sent to the UNT Libraries’ catalogers. Once all files have been cataloged, ACS transfers the files for the semester over to its Web server. The PDF files are delivered password-protected so that users are not allowed to print or copy text from them. The Graduate School controls the password and also stores a copy of each ETD that is not password-protected.
- ProQuest copies are burned to CD and sent via FedEx (along with all accompanying material) to ProQuest.
Rarely, UNT’s Vice President for Technology Transfer directs that certain theses or dissertations must be completely locked down due to patent or proprietary concerns. These ETDs are not released to the UNT Libraries, nor sent to ProQuest, for a period of one year. Each year, over a period of three years, the VP is responsible for letting the Graduate Reader know whether the lock-down should continue for another year. At the end of three years, all lock-downs are released.
Diagram of the File/Folder Structure
The ETD files are stored in a directory structure which follows this convention:
Table 1: File and folder structure in the existing ETD directory
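The naming rules described in the workflow above can be sketched in a few lines of Python. This is only an illustration of the stated conventions, not the Graduate School's actual tooling: the root folder names (`etd`, `proquest`), the lowercase spelling of names, and the `.pdf` and `.html` extensions are assumptions made for the example.

```python
from pathlib import PurePosixPath


def unt_copy_paths(last_name, first_name, doc_type, restricted, root="etd"):
    """Build the folder and file paths for the UNT version of one ETD.

    `root` is a hypothetical top-level folder; lowercase names and file
    extensions are assumptions, not documented practice.
    """
    if doc_type not in ("thesis", "dissertation"):
        raise ValueError("doc_type must be 'thesis' or 'dissertation'")
    access_folder = "Restricted" if restricted else "Open"
    student = f"{last_name}_{first_name}".lower()  # "lastname_firstname"
    folder = PurePosixPath(root) / access_folder / student
    return {
        "folder": folder,
        "document": folder / f"{doc_type}.pdf",  # file named "thesis" or "dissertation"
        "index": folder / "index.html",          # index page used by catalogers
    }


def proquest_copy_path(last_name, first_name, doc_type, root="proquest"):
    """Build the path for the unprotected ProQuest copy ("lastname-f")."""
    if doc_type not in ("thesis", "dissertation"):
        raise ValueError("doc_type must be 'thesis' or 'dissertation'")
    folder = "Theses" if doc_type == "thesis" else "Dissertations"
    file_name = f"{last_name}-{first_name[0]}".lower() + ".pdf"
    return PurePosixPath(root) / folder / file_name


paths = unt_copy_paths("Woods", "Christopher", "dissertation", restricted=True)
print(paths["document"])
print(proquest_copy_path("Woods", "Christopher", "dissertation"))
```

Keeping the access level in the folder path mirrors the “Open”/“Restricted” split described above, so access rules can be applied per folder rather than per file.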
Description of Metadata
In the current [prior to Fall 2007] practice, metadata for each ETD exists in two places: in the Libraries’ online catalog, and with the ETD itself in the form of an HTML file with an active link to the text of the ETD.
Online Catalog
The Libraries’ online catalog provides both a description of each ETD and an access link to the ETD’s full text. The regular display visible to catalog users is based on MARC (MAchine-Readable Cataloging) records (see figure 1). The creation of standardized MARC records for ETDs ensures that researchers worldwide will be able to locate ETDs not only in the UNT online catalog, but also through consortial catalogs such as WorldCat.
Figure 1: Sample UNT Libraries ETD catalog record display
You can also view the regular display at the UNT Libraries Catalog page. Table 2 shows the MARC display format. See also the actual MARC display at the UNT Libraries Catalog page.
Table 2: MARC display
Metadata Associated With the ETD
Metadata that is stored with the ETD in the form of an HTML file provides an abstract of and key facts about the thesis/dissertation. This information is used by catalogers to help create MARC records for the Libraries’ online catalog. The HTML file is named “index” and also saved in the student folder. It contains the following information in a table format:
LABEL | DESCRIPTION | EXAMPLE |
---|---|---|
Author's Name | Student responsible for the creation of the thesis/dissertation [last name, first name, middle initial] | Woods, Christopher P |
Document Type | Type of Resources [Controlled vocabularies of: Dissertation or Thesis] | Dissertation |
Title | Title of the thesis/dissertation. [Title Information exactly as it appears on the document] | A Transcription of Op. 94 Morceau de Concert, by Camille Saint-Saëns For Solo Bass Trombone and Brass Ensemble |
Degree | Degree Information [Controlled vocabularies of all UNT degrees] | Doctor of Musical Arts |
Major | Degree Information. [Can be from a controlled list] | Performance |
Committee | Name of Committee Members, including Major Professor (thesis/dissertation advisor) | Vern Kagarice, Major Professor; Gene Cho; Brian Bowman; Graham Phipps; Thomas Clark |
Keywords | Subject [One or more subject values denoting the discipline and/or area for the given thesis/dissertation] | Camille Saint-Saëns, Morceau de Concert, bass trombone, brass ensemble |
Graduation Date | Month (in English) and Year | May 2001 |
Availability | [Controlled vocabularies of: Open or Restricted] | restricted |
Abstract | [Brief description of the content of the thesis/dissertation] | (Abstract supplied by the author) The transcription is an addition to the repertoire for brass ensemble and bass trombone. Consideration is given to the nineteenth-century orchestration treatises of Berlioz and Strauss as well as the twentieth-century texts of Erik Leidzén, Walter Piston, and Samuel Adler. The transcription process is shaped by the principles of these writers. The score is contained in the appendix. |
Files: | Link to the PDF file | dissertation.pdf |
Special Conditions | If any | . |
Table 3: Description of sample metadata HTML file associated with a UNT ETD
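Because two of the fields in Table 3 use controlled vocabularies, a record taken from an index page could be checked before cataloging. The sketch below is a hypothetical illustration: the choice of which labels count as required is an assumption made for the example, and the sample values are copied from Table 3.

```python
# Controlled vocabularies named in Table 3 (compared case-insensitively).
CONTROLLED = {
    "Document Type": {"dissertation", "thesis"},
    "Availability": {"open", "restricted"},
}

# Assumption for this sketch: these labels must always carry a value.
REQUIRED = (
    "Author's Name", "Document Type", "Title", "Degree",
    "Major", "Graduation Date", "Availability",
)


def check_index_metadata(record):
    """Return a list of problems found in one index-page record (a dict)."""
    problems = []
    for label in REQUIRED:
        if not record.get(label, "").strip():
            problems.append(f"missing value for {label!r}")
    for label, allowed in CONTROLLED.items():
        value = record.get(label, "").strip().lower()
        if value and value not in allowed:
            problems.append(
                f"{label!r} must be one of {sorted(allowed)}, got {record[label]!r}"
            )
    return problems


SAMPLE = {
    "Author's Name": "Woods, Christopher P",
    "Document Type": "Dissertation",
    "Title": "A Transcription of Op. 94 Morceau de Concert, by Camille "
             "Saint-Saëns For Solo Bass Trombone and Brass Ensemble",
    "Degree": "Doctor of Musical Arts",
    "Major": "Performance",
    "Graduation Date": "May 2001",
    "Availability": "restricted",
}

print(check_index_metadata(SAMPLE))
```

The function returns an empty list for a record with no problems, so it can be used directly as a pass/fail check.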
Desired Situation (Recommendations)
The following sections describe the desired environment for storage and preservation of and access to UNT’s ETDs. Based on these recommendations, we will develop a workflow for placing ETDs in the UNT Libraries’ Digital Collections (DC) operated by the Digital Projects Unit (DPU).
Environment
The Keystone Digital Library System serves as a framework for the creation, management and public display of digital objects collected by the UNT Libraries and housed in the UNT Libraries’ Digital Collections. This framework is also used as the primary development framework for all other digital collections managed by the Libraries’ Digital Projects Unit. Other projects include The Portal to Texas History℠ and the Congressional Research Service Reports Archive. Combined, the Digital Projects Unit manages over 30,000 digital objects consisting of over 210,000 files. We have developed processes and workflows to manage and preserve large collections of digital objects, and the metadata housed in these systems is a key component.
Metadata
Metadata for the ETDs should be created in a way that supports the international standards set by the Networked Digital Library of Theses and Dissertations (NDLTD) as well as the published standards set by the Texas Digital Library (TDL), of which UNT is a member. As can be seen from the sample description in Table 3, the existing metadata as received from the Graduate School lacks some metadata elements (such as the degree information and degree grantor institution name) which are important for resource sharing at the national and international levels. In light of these new requirements, the UNT Libraries Metadata schema currently used in the Libraries’ Digital Collections would need to be modified to comply with TDL recommendations. Further developing the elements in the ETD metadata will facilitate wider access to the ETDs through various retrieval systems including the Libraries’ Digital Collections, the Libraries’ online catalog, and search engines such as Google and Yahoo. Wider access to ETDs will in turn increase the visibility of UNT and its scholarship.
Files
The Libraries will store the ETDs in the Digital Collections system which is built on the Keystone Digital Library System framework. We will store all metadata in XML files in the system with references to the presentation PDF files that are stored on the display servers. Archival copies of all PDFs which make up each ETD will reside in the Libraries’ Digital Archive with required preservation metadata.
By storing the files in these systems, we will be able to respond to changes in technology that would otherwise affect the accessibility of the ETD files. Moreover, we can create reports based on characteristics of the ETDs themselves.
Files stored in the Libraries’ Digital Collections and ultimately placed in the Libraries’ Digital Archive should be stored and made available with the fewest possible proprietary and software-based rights management mechanisms enabled. Rights management decisions should instead be made at the system level, where they control the type of access that is available for the ETDs.
Services
The Digital Projects Unit has developed various services which make use of the data stored in the Libraries’ Digital Collections. These services include full-text and fielded keyword and phrase searching, collection and subject level browsing, and syndication services such as RSS and Atom feeds. Because the Digital Collections are searchable using the SRU and OpenSearch protocols, the UNT Libraries’ Digital Collections can be included in federated search systems.
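To illustrate how a federated search system might query an SRU endpoint, the following sketch builds a `searchRetrieve` request URL using standard SRU 1.1 parameters. The base URL is a placeholder, not the address of the Digital Collections service, and whether a given server supports the `dc.title` index is an assumption.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, cql_query, max_records=10):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,            # CQL, e.g. dc.title = "trombone"
        "maximumRecords": max_records,
    }
    return f"{base_url}?{urlencode(params)}"


# https://example.org/sru is a hypothetical endpoint used only for illustration.
print(sru_search_url("https://example.org/sru", 'dc.title = "trombone"', max_records=5))
```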
The Digital Collections’ metadata is harvestable using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). Many groups, including Google, OCLC, OAIster, and the NDLTD, use this protocol to harvest metadata records for inclusion in their search systems. We are working toward the creation of Sitemaps to allow other search engines such as Yahoo and Microsoft’s Live Search to crawl and index the Digital Collections content.
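As a rough illustration of what a harvester sends, the sketch below builds an OAI-PMH `ListRecords` request. The verb and the `metadataPrefix`, `set`, and `from` arguments come from the OAI-PMH specification; the base URL and the set name `etd` are hypothetical and do not describe the actual Digital Collections configuration.

```python
from urllib.parse import urlencode


def oai_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL.

    oai_dc (unqualified Dublin Core) is the metadata format that every
    OAI-PMH repository is required to support.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec    # selective harvesting by set
    if from_date:
        params["from"] = from_date  # e.g. "2007-08-01" for incremental harvests
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and set name, for illustration only.
print(oai_list_records_url("https://example.org/oai", set_spec="etd", from_date="2007-08-01"))
```

The `from` argument lets a harvester request only records changed since its last visit, which keeps repeat harvests small.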
All content placed in the Libraries’ Digital Collections benefits from development projects carried out in the Digital Projects Unit. We are planning several user studies to identify ways to create and refine interfaces to enhance access to the various collections held by the Libraries.
The Digital Collections system incorporates stable URLs, sometimes referred to as permanent URLs. Users will be able to cite a thesis or dissertation with confidence that others will be able to find that document in the system at a later date.
We will also develop new features specifically for the ETD collection. For example, for the ETDs in the Digital Collections we plan to provide “citations on-the-fly” in several formats commonly used by our students.
Summary
Responsibility for storing and providing user access to current UNT ETDs is moving from UNT Academic Computing Services to the UNT Libraries. As depicted in figure 2, the UNT Libraries will house the ETD files in the Libraries’ Digital Collections and Digital Archive.
Figure 2: UNT Libraries ETDs By Type
With the exception of “Problem in Lieu of Thesis”, we will continue to catalog ETDs and link to the texts from the Libraries’ online catalog. We will modify the metadata accompanying ETD files to meet TDL standards and provide appropriate access through both the Libraries’ Digital Collections and the TDL Repository. By maintaining the ETDs in the Libraries’ well-established systems, we will be able to respond to changes in technology and ensure long-term preservation of the files. Users will benefit from searching, browsing, syndication services, and regular enhancements available in the Digital Collections.
In the next phase of this project we will develop a workflow detailing the specific steps that the Libraries will follow to receive, store, describe, monitor, preserve, and provide access to UNT’s ETDs.
Resources
- Congressional Research Service Reports (CRS)
- ETD 2007, Added Values to E-theses, Uppsala 13th-16th, 2007
- Networked Digital Library of Theses and Dissertations (NDLTD)
- Open Archives Initiative
- Open Archives Initiative Protocol for Metadata Harvesting
- OpenSearch.org
- The Portal to Texas History℠
- Sample Restricted ETD Metadata
- Sample Open ETD Metadata
- Sitemaps.org
- SRU Search/Retrieval via URL
- Texas Digital Library Repository
- UNT Doctoral Degree Requirements
- UNT Dissertation and Thesis Manual
- UNT Libraries Descriptive Metadata
- UNT Libraries Digital Collections
- UNT Libraries Digital Projects Unit
- The University of North Texas Library Catalog
- Yahoo
Appendices
Appendix-1 Metadata for UNT ETDs
(For complete recommendations and implementation, see ETD at the UNT.)
ETD Metadata Element Outline
- Title
  - type
- Creator
  - name
  - type
  - role
  - information
- Contributor
  - name
  - type
  - role
  - information
- Publisher
  - name
  - place
  - information
- Date
  - originalCreationDate
  - digitalCreationDate
- Language
- Description
  - contentDescription
  - physicalDescription
- Subject
  - authority
- Primary Source
- Coverage
  - placeName
  - timePeriod
  - date
  - dateRange
  - startDateRange
  - endDateRange
- Source
- Relation
- Collection
- Institution
- Rights
  - access
  - license
  - holder
  - rightsStatement
- Resource Type
- Format
- Identifier
  - type
- Degree
  - name
  - level
  - discipline
  - department
  - grantor
- Note
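To give a feel for how a record following this outline might be stored as XML, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It is not the UNT Libraries' schema: the root tag `etdRecord`, the camel-case tag names, and the choice to render sub-parts as XML attributes are all assumptions, and only a few elements of the outline are included. Sample values are taken from Table 3.

```python
import xml.etree.ElementTree as ET

# Sub-parts per element, copied from the outline above (subset only).
OUTLINE = {
    "Title": ["type"],
    "Creator": ["name", "type", "role", "information"],
    "Degree": ["name", "level", "discipline", "department", "grantor"],
    "Resource Type": [],
}


def tag_for(element):
    """Hypothetical tag convention: 'Resource Type' -> 'resourceType'."""
    words = element.split()
    return words[0].lower() + "".join(w.capitalize() for w in words[1:])


def build_record(entries):
    """Build an XML tree from (element, text, parts) tuples.

    `parts` maps a sub-part name from the outline to its value; sub-parts
    are written as attributes (an assumption made for this sketch).
    """
    root = ET.Element("etdRecord")  # hypothetical root tag
    for element, text, parts in entries:
        if element not in OUTLINE:
            raise ValueError(f"unknown element: {element}")
        unknown = set(parts) - set(OUTLINE[element])
        if unknown:
            raise ValueError(f"unknown sub-parts for {element}: {sorted(unknown)}")
        node = ET.SubElement(root, tag_for(element), dict(parts))
        node.text = text
    return root


record = build_record([
    ("Creator", None, {"name": "Woods, Christopher P"}),
    ("Degree", None, {"name": "Doctor of Musical Arts", "discipline": "Performance"}),
    ("Resource Type", "Dissertation", {}),
])
print(ET.tostring(record, encoding="unicode"))
```

Rejecting unknown elements and sub-parts at build time is one way to keep stored records aligned with the outline as it evolves.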
Appendix-2 UNT-ETD Metadata to MARC Crosswalk Specification
UNTL-ETD Element | MARC Element Description | Remark |
---|---|---|
Title: | 245a | (246 for alternatives & 242 for translation) |
Creator: | 100a | |
Contributor: | 720a | 720e (for role) |
Publisher: | 260b | 260a (for place) |
Date: | 008 positions 7-10 | |
Language: | 008 positions 35-37 | |
Description: | 520a | |
Subject: | 653a | |
Primary Source: | --- | |
Coverage: | 651 or 690 | |
Source: | --- | |
Relation: | --- | |
Collection: | --- | |
Institution: | --- | |
Rights: | 540 | |
Resource Type: | 655 | leader 6&7 (As text objects, 6 set to 'a' and as monographs 'm' in 7) |
Format: | 856q | |
Identifier: | 856u | |
Degree: | 502a | |
Note: | 504 | (for note 5xx) |
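The crosswalk above lends itself to a simple lookup table. The following sketch shows one way the mapping might be applied in code; the input record shape (element name mapped to a list of values) and the output tuples are assumptions for illustration, and elements that need special handling (Date and Language in the 008 field, Coverage with its two possible targets) are deliberately left out.

```python
# Straightforward field/subfield targets copied from the crosswalk table.
MARC_CROSSWALK = {
    "Title": "245a",
    "Creator": "100a",
    "Contributor": "720a",
    "Publisher": "260b",
    "Description": "520a",
    "Subject": "653a",
    "Rights": "540",
    "Resource Type": "655",
    "Format": "856q",
    "Identifier": "856u",
    "Degree": "502a",
    "Note": "504",
}


def to_marc_fields(record):
    """Map {element: [values]} to a list of (tag, subfield, value) tuples.

    Elements with no MARC equivalent in the table (for example Source or
    Relation) are skipped. The subfield is None when the table gives only
    a three-digit tag.
    """
    fields = []
    for element, values in record.items():
        target = MARC_CROSSWALK.get(element)
        if target is None:
            continue
        tag, subfield = target[:3], target[3:] or None
        for value in values:
            fields.append((tag, subfield, value))
    return fields


print(to_marc_fields({
    "Creator": ["Woods, Christopher P"],
    "Degree": ["Doctor of Musical Arts"],
    "Source": ["not mapped"],
}))
```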
Appendix-3 UNT-ETD Metadata Crosswalks to NDLTD and TDL
UNTL-ETD Element | NDLTD | TDL |
---|---|---|
Title: | Title (and alternative) | Title Information |
Creator: | Creator | Name of Author |
Contributor: | Contributor (and role) | Name of Thesis Advisor & Committee Members |
Publisher: | Publisher | --- |
Date: | Date | Original Information |
Language: | Language | Language |
Description: | Description | Abstract |
Subject and Keywords: | Subject | Subject |
Primary source: | --- | --- |
Coverage: | Coverage | Subject |
Source: | --- | --- |
Relation: | --- | --- |
Collection: | --- | --- |
Institution: | --- | --- |
Rights Management: | Rights | --- |
Resource type: | Type | Type of resources |
Format: | Format | Physical Description |
Identifier: | Identifier | Identifier, (and Location) |
Metadata Information: | --- | Record Information |
Note: | (Description-Note) | --- |
[Degree Information - Name] | Degree (name, level, discipline, grantor) | Degree Information |
[Degree Information - Level] | Degree (name, level, discipline, grantor) | Degree Information |
[Degree Information - Discipline] | Degree (name, level, discipline, grantor) | Degree Information |
[Degree Information - Degree Grantor] | Degree (name, level, discipline, grantor) | Name of Degree Grantor |