Remote access to the Cambridge Genizah Collections: Facsimiles and subject-based data

This paper describes a project carried out at the Cambridge University Library by a team consisting of Ellis Weinberger, research assistant in the Taylor-Schechter Genizah Research Unit, Matthew Bernstein, World Wide Web officer for the University Library, and Les Goodey, photographer in the Photographic Department of the Library.

This paper discusses the activities of the Taylor-Schechter Genizah Research Unit; the reasons we wished to provide remote access; our goals when starting the project; the resources that were readily available; the steps we took to carry out the project; and our plans for the future of the project.

The work of the Unit

Over the years those responsible for the Taylor-Schechter Genizah Collection at the Cambridge University Library have performed many tasks to make possible the exploitation of the resources of the Genizah for the betterment of learning. Such work has enabled many gifted and dedicated scholars, from Jacob Mann and Louis Ginzberg to Shelomo Dov Goitein and Moshe Gil, to pursue their research in the Genizah. The scholars have come from many countries, and the Genizah has supplied them with source material in history, palaeography, biblical and rabbinic literature, and an enormous variety of other subjects.

The Library stored the fragments in glass sheets, preserved them in Melinex, and microfilmed the Collection. Fragments were classified to some extent at the time of preservation, and written handlists with descriptions of some of the fragments are on the shelves of the Manuscript Reading Room. A number of catalogues, containing the work of noted scholars who deciphered and described the fragments, have been published by the Cambridge University Press for Cambridge University Library, and more are being prepared. The catalogues deal with some of the variety of subjects covered by the manuscripts, such as liturgical poems, Hebrew Bible manuscripts, Aramaic translations of the Bible, Arabic legal documents, and medical manuscripts. A bibliography of published works on the fragments has been produced, and the second volume of the bibliography is in preparation. The Taylor-Schechter Genizah Research Unit was established in 1974 to preserve and exploit the Taylor-Schechter Genizah Collection. At all times the goal of the Unit has been to facilitate the work of researchers for the advancement of scholarship.

The importance of developing remote access

The Unit felt that web browser access to images of the manuscripts would be a useful tool for scholars. Web access to information resources is seen by funding bodies as part of a service that an archive needs to provide. Remote access would diminish the need for scholars to travel to Cambridge, reduce wear on the manuscripts, and allow more than one scholar at a time to view a manuscript in detail. The form of access that we provide in this way is faster and cheaper for the scholar than obtaining a microfilm, and, more importantly, provides better image quality. The web site was set up in order to expand the role of the Unit in providing access to information from the Collection and information about the Collection. This can be seen as another tool, among many tools being provided by the Unit, to help scholars find and use all the dispersed information produced about Genizah documents.

The Taylor-Schechter Genizah Research Unit's World Wide Web site sets forth a broad overview of the manuscripts in the Taylor-Schechter Genizah Collection, with selected manuscripts annotated in great detail. The web site contains a summary of subjects covered by the Collection, and bibliographies relating to the Collection in general and a few manuscripts in particular. The web site also presents the history of the Collection and describes the work of the Unit. The web site was initiated by Dr Stefan Reif, the Director of the Unit, and first edited by Dr Douglas de Lacey, now the computing officer for the School of Arts and Humanities at the University of Cambridge.

Our goals when initiating the project

One of my tasks within the Unit is the maintenance and improvement of our web site. I saw a way of enhancing the speed of digitizing manuscripts, using the available resources of the Unit. This was also intended to ease the lot of scholars searching for manuscripts dealing with their research topics. Up to this time, scholars have been obliged either to search through the printed or manuscript catalogues, or to know of previous work done in their field in order to find suitable manuscripts, or personally to examine large numbers of manuscripts. There is no detailed catalogue for most of the fragments, since the comprehensive cataloguing of manuscripts at the time of shipping and preservation was not possible, due to the large number of documents. There are around 140,000 documents, containing in total at least 500,000 leaves of varying sizes, ranging from those leaves that cover a desk to those that barely cover the palm of the hand. The fragments are fragile and subject to widely varying levels of fading, staining, and insect damage. The quality of the handwriting varies to a great extent. These are some of the factors making study of the fragments difficult.

Our resources when starting the project

In order to enrich the Genizah on-line database project, we looked at the assets readily accessible to the Unit. We have the catalogues produced by scholars in the Unit. We have straightforward access to the manuscripts. We are fortunate to have the services of Mr Les Goodey, from the University Library's Photographic Department, which has made a serious investment in digital imaging technology. This means that we can obtain high quality images produced by a professional photographer.

The Genizah Research Unit engages, for the portion of his post devoted to the Unit, Mr Matthew Bernstein, who is the Library Web Officer. Mr Bernstein has extensive experience in network management, server management, and interactive web page design.

After considering our assets, I proposed to Dr Stefan Reif, the Director of the Unit, that building the capability to search our catalogues and then display the images of the manuscripts would be the most efficient and effective method to use our resources and fulfil the promise of a Genizah on-line database.

The steps we took to carry out the project

I extracted from our data archives the text used in the typesetting of Professor Michael Klein's catalogue. This catalogue describes and classifies Genizah fragments containing Aramaic translations of the Bible. We planned the structure of the search engine, and decided on the best way to present the information to the scholar. Mr Bernstein then coded a fast and efficient search engine, in the Perl scripting language.

The search engine enables anyone to find specific fragments by using catalogue information already produced by the Genizah Research Unit for the Klein catalogue. One of the problems we have to overcome is the extremely complicated structure of the classmarks of the manuscripts. A pseudo-code description of the structure of the classmarks runs to over 4 single-spaced pages of A4 paper.

The search engine takes the search terms provided by the scholar and returns the Klein catalogue records that contain the greatest number of search terms. It starts displaying results even before the search is completed. The display script converts the Hebrew text from the transliteration used in typesetting to a Hebrew web font at the moment the record is displayed on the screen.
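The conversion of transliterated Hebrew at display time can be illustrated with a short sketch. It is written in Python rather than in the Perl of the actual display script, and it is an illustration only: the one-character-per-letter scheme in LETTERS is hypothetical (the transliteration used in typesetting the catalogue is not reproduced here), the function name to_hebrew is invented for the example, and the output is assumed to be Unicode text rather than a dedicated Hebrew web font.

```python
# Hypothetical ASCII scheme: one character per Hebrew consonant.
# This is NOT the transliteration used in typesetting the catalogue.
LETTERS = {
    "'": "א", "b": "ב", "g": "ג", "d": "ד", "h": "ה", "w": "ו",
    "z": "ז", "x": "ח", "T": "ט", "y": "י", "k": "כ", "l": "ל",
    "m": "מ", "n": "נ", "s": "ס", "`": "ע", "p": "פ", "c": "צ",
    "q": "ק", "r": "ר", "S": "ש", "t": "ת",
}

# Letters that take a different shape at the end of a word.
FINALS = {"כ": "ך", "מ": "ם", "נ": "ן", "פ": "ף", "צ": "ץ"}


def to_hebrew(text: str) -> str:
    """Convert space-separated transliterated words to Hebrew letters.

    Characters that are not in LETTERS are passed through unchanged.
    """
    words = []
    for word in text.split(" "):
        letters = [LETTERS.get(ch, ch) for ch in word]
        # Use the final form only for the last letter of a longer word.
        if len(letters) > 1 and letters[-1] in FINALS:
            letters[-1] = FINALS[letters[-1]]
        words.append("".join(letters))
    return " ".join(words)


if __name__ == "__main__":
    print(to_hebrew("trgwm br'Syt"))
```

Because the conversion is a simple table lookup, it can be applied as each record is written out, so the stored catalogue text does not need to be altered. A word followed directly by punctuation would not receive its final letter form in this sketch; a fuller version would need to separate punctuation first.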

Using the search engine, fragments can be found by searching for any combination of words or classmarks in the text. A large number of subject access points to the fragments has thus been produced. This enables scholars to find manuscripts of interest to their field. For example, scholars can look for types of script, text from various books of the Bible, or stylistic details of the manuscripts.
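The behaviour of the search, in which records containing the greatest number of search terms are returned first and results appear before the whole search is finished, can be sketched as follows. The sketch is in Python and is not the Perl script itself; the record layout (a mapping from classmark to catalogue text), the function names, and the sample records with their "EXAMPLE" classmarks are all assumptions made for the illustration. It makes one pass over the records for each possible number of matching terms, which is only one of several ways such an ordering might be obtained.

```python
from typing import Dict, Iterator, List, Tuple

# Invented sample records; the classmarks and texts are placeholders,
# not entries from the Klein catalogue.
SAMPLE_RECORDS: Dict[str, str] = {
    "EXAMPLE-1": "Targum Onqelos, Genesis; oriental square script",
    "EXAMPLE-2": "Targum to Exodus; vellum",
    "EXAMPLE-3": "Palestinian Targum, Genesis; paper",
}


def count_matches(text: str, terms: List[str]) -> int:
    """Count how many of the search terms occur in the record text."""
    lowered = text.lower()
    return sum(1 for term in terms if term in lowered)


def ranked_search(records: Dict[str, str], query: str) -> Iterator[Tuple[str, int]]:
    """Yield (classmark, number of matching terms), best matches first.

    The first pass yields records containing every term, the next pass
    those containing one term fewer, and so on. Because this is a
    generator, the caller can display each result as soon as it is found.
    """
    terms = sorted(set(query.lower().split()))
    for wanted in range(len(terms), 0, -1):
        for classmark, text in records.items():
            if count_matches(text, terms) == wanted:
                yield classmark, wanted


if __name__ == "__main__":
    for classmark, score in ranked_search(SAMPLE_RECORDS, "targum genesis script"):
        print(classmark, score)
```

With the sample data, the record matching all three terms is produced first, followed by the record matching two terms and then the record matching one. Repeating the scan for each level trades some extra work for the ability to show the best matches immediately.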

Mr Goodey, using many years of experience gained as a photographer and digital image editor, produced consistently high quality image files tailored to the project specifications. Producing finely exposed and focused images, with the proper density and colour calibration, requires specialised technical ability. Mr Bernstein coded an image display facility in Perl, which enables images of manuscripts included in the catalogue to be examined.

The images can be viewed at varying levels of magnification, simply by clicking on the image displayed on the screen. One can also, at any given level of magnification, move about the image of the manuscript, simply by clicking on the thumbnail image. Only the specific area of the image requested is sent to the scholar. This means that a file of under 35 kilobytes is sent for each request, and this should easily be received by scholars with slow network connections.
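The idea of sending only the requested area can be made concrete with a small calculation. The following Python sketch (not the Perl display facility itself) works out which rectangle of the full-resolution image corresponds to a click on the thumbnail at a given magnification. The function name crop_box, the meaning given to the zoom parameter, the viewing window of 400 by 300 pixels, and the image sizes in the example are assumptions chosen for the illustration.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]


def crop_box(
    full_size: Tuple[int, int],
    thumb_size: Tuple[int, int],
    click: Tuple[int, int],
    zoom: float,
    view_size: Tuple[int, int] = (400, 300),
) -> Box:
    """Return (left, top, right, bottom) in full-resolution pixels.

    zoom is the displayed size as a fraction of full resolution, so
    zoom=1.0 shows the scan pixel for pixel and zoom=0.5 at half size.
    click is the position clicked on the thumbnail, in thumbnail pixels.
    """
    full_w, full_h = full_size
    thumb_w, thumb_h = thumb_size

    # Map the thumbnail click to a centre point on the full image.
    centre_x = click[0] * full_w / thumb_w
    centre_y = click[1] * full_h / thumb_h

    # Area of the full image that fills the viewing window at this zoom,
    # never larger than the image itself.
    region_w = min(full_w, round(view_size[0] / zoom))
    region_h = min(full_h, round(view_size[1] / zoom))

    # Centre the region on the click, then keep it inside the image.
    left = min(max(0, round(centre_x - region_w / 2)), full_w - region_w)
    top = min(max(0, round(centre_y - region_h / 2)), full_h - region_h)
    return left, top, left + region_w, top + region_h


if __name__ == "__main__":
    # A 4000 x 3000 scan with a 200 x 150 thumbnail, clicked in the middle.
    print(crop_box((4000, 3000), (200, 150), (100, 75), zoom=1.0))
```

Only the pixels inside the returned rectangle need to be scaled to the viewing window and transmitted, which is why the size of each response depends on the window and not on the size of the full scan.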

The value of the images is enriched, since they are combined with descriptive cataloguing and on-line search facilities. The value of the catalogue text is also enhanced, since the on-line images are a very large collection of facsimiles. We are on our way to providing images of both sides of every leaf of each document catalogued in the Genizah Series.

The structure to store and enable access to the images is in place, and the only constraints on making available images of all the fragments catalogued by Klein (over 1500) are the time and cost of digitizing. This year we are able to provide access to the Klein catalogue, thanks to permission from the Cambridge University Press, and next year we hope to provide access to the Isaacs catalogue of medical and paramedical manuscripts.

Future plans for the project

We plan the addition of bibliographical information for each catalogue record, based upon the bibliography already published by the Unit. We are examining possible improvements to the search engine and display facility, which will provide more advanced search facilities and image enhancement. Finally, the Unit is planning, in cooperation with Professor Mark Cohen, from Princeton University, to offer the facility to view transcriptions of texts, based upon the work of Professor Shelomo Dov Goitein, as well as our own images.

In summary, we have used the resources available to us to produce, in a few months, a tool that we feel is useful for scholars. The database is currently on-line, at, and we welcome suggestions to help us improve the utility of the project.


