1. Introduction

History, according to McCrank (2001, p. 15), has been seen ‘to elicit fact, tell what actually happened, and to interpret events in the past for the present’. The raw materials, on which historical fact is based, are usually written records or artifacts, and different kinds of raw materials and research questions often correspond to different approaches and methods to historical research. Methods in historical research include the biographical method, the comparative method, the ideal typical method, understanding as a method, the causal method, discourse analysis, the microhistorical method, the psychoanalytical method, and the quantitative method (Lorenz, 2015, p. 133). These methods can be roughly classified into two types: the qualitative approach and the quantitative approach.

Drawing on the discipline of book history, this paper introduces bibliographical sources for quantitative translation history, presents techniques for retrieving, analyzing and presenting bibliographic data, reviews some existing quantitative studies related to translation history, and describes some of our own preliminary findings.

2. Qualitative history and quantitative history

In a general sense, the qualitative approach includes, among others, narrative research and the case study method (see Creswell, 2012). Narrative, in the case of history, is also the traditional and predominant writing style; it is used for conveying a sense of time or showing chronological development (Shep, 2005), and is often viewed as being the defining characteristic of historical writings (Wallace & Van Fleet, 2012). Case study research involves the study of a specific case, and tries to present an in-depth understanding of the case (Creswell, 2012). As a method, it is commonly used in translation historical research. Yet, according to Moretti (2005, p. 4), history is not a sum of individual cases; it is a collective system that should be grasped as a whole. Rundle (2012, p. 236) warns that historical translation research is ‘in danger of accumulating a vast archive of heterogeneous case studies that no translation scholar can realistically have the expertise to understand or appreciate as a whole’.

Quantitative history entails the use of numeric data and statistical data analysis in the study of history. It has been successful in addressing questions about long-term historical patterns of change and in revealing the aggregate context and structures of history (Anderson, 2007). In the past decades, there has been a considerable tension between qualitative and quantitative approaches in historical research. Nowadays, it is acknowledged by many researchers that different approaches are necessary for different research questions (see Bode & Osborne, 2015), and historical research would benefit from the blending of these two approaches. As Eliot (2002, p. 284) says, without qualitative case studies, we would ‘miss the texture and taste of [ … ] humanity’; without quantitative studies, the individual case studies would lack ‘a context to confer on their details a proper significance’. To date, few studies (e.g. Šajkevič, 1992) in translation history have adopted the quantitative approach, though many translation researchers (e.g. Pym, 1998) are aware of the potential of quantitative methods in, for example, leading to new research questions, conclusions and perspectives.

In recent years, developments in computational power, the availability of digitized sources, and the emergence of digital humanities have provided us with unprecedented research opportunities in using quantitative methods to study translation history. Coincidentally or not, at the 22nd International Congress of Historical Sciences, held in August 2015, one of the four themes was ‘Digital Turn in History’.

3. Book history and translation history

Books are the primary source for historical research. They are used to disseminate ideas, record memories and create narratives, hence the statements ‘books make history’ (Eliot & Rose, 2007, p. 1) and ‘without books there is no history’ (Howard-Hill, 2007, p. 18). In the field of the history of the book (or simply book history), which has become an independent discipline in the past two decades, the word ‘book’ refers to ‘virtually any piece of written or printed text that has been multiplied, distributed, or in some way made public’ (Eliot & Rose, 2007, p. 2). In his oft-cited essay, ‘What is the History of Books?’, Darnton (1982) proposes a six-stage model for studying book history (alternatively called a communications circuit of the book), which runs from the author to the publisher, the printer, the shipper, the bookseller and the reader. The focus in the model is on people who made, distributed and read books. In response to Darnton’s model, Adams and Barker (1993) shift attention from the people to the book itself, and stress the importance of bibliography. They put forward a model of book history which includes five events in the life of a book (publishing, manufacture, distribution, reception, and survival) and four outside pressures (intellectual influences; political, legal and religious influences; commercial pressures; and social behavior and taste) that impact the cycle. Some research questions in book history are (Darnton, 2007; Eliot & Rose, 2007): how books came into being, how they reached readers, what readers made of them and where and how they were translated.

It should be easy for translation researchers to notice the resemblance and connections between translation history and book history. Translation history has sought to ‘account for the circulation and canonization of texts via transformation and transfer’, and more recently has turned to explore ‘the networks of agents involved, the technologies with which translations are produced, and their reception and impact’ (O’Sullivan, 2012, p. 131). In Method in Translation History, Pym (1998, p. 5) encourages researchers to ask ‘who translated what, how, where, when, for whom and with what effect’. As advocated by Bachleitner (2009), the systems approach of book history should be applied to the study of translation history. Pym (2009; see also Chesterman, 2009) recommends that researchers first study translators and then texts. This is consistent with Darnton’s views. Yet, one may wonder whether this translator-focused approach to translation history is appropriate, since there are many stakeholders (e.g. editors, censors and critics, besides those noted earlier) in the life cycle of a translation, and in most modern cases, publishers rather than translators occupy a more central position.

Records of publishers, booksellers and printers, as well as bibliographies and library catalogues, provide rich sources for quantitative book history (or bibliometrics). Measurable factors include: number of titles published, publisher, date of publication, place of publication, author, number of book reviews, recommended retail price, sales figures, book format, subjects, genres, language and others. For example, Munch-Petersen (1981), in an article entitled Bibliometrics and Fiction, analyzed quantitatively the distribution of authors of prose fiction translated into Danish in the period 1800–1899. Jon Orwant of Google Books in 2010 mapped Library of Congress subject headings against dates of publication in Google’s catalog of books (1600–2010), and produced visualizations showing what kinds of book were popular at different periods of history (see Witmore & Valenza, 2012 for the charts).

In the book publishing industry, the major book categories are: textbooks, adult trade books, children’s books, technical, scientific and professional books and general reference books (Greco, Milliot, & Wharton, 2014). Trade books are published for the general public and ‘used primarily for entertainment and information’ (Rath, 1995, p. 16). They include literary fiction, poetry and drama, non-fiction (e.g. biographies, histories, travel guides), religious books, etc. For translation researchers, trade books are the focus of their attention. It has been known that in the US book market, the number of books in translation accounts for about 3% of the total (Venuti, 2008), and according to Ban (2015), nonfiction is the largest category, while literary works represent less than one-third of the books in translation published. Of course, in different countries, there exist huge differences in publishing developments, as well as in the status that the book has (Kovač, 2004). Abel (2005) identifies eight major worldwide trends in the publishing industry in recent decades, including, for example, the continuing increase in the number of new titles issued, the globalization of reading interests and the explosion of electronic products. These trends carry implications for translation historical research.


4. Bibliographies, catalogs and metadata

According to Shep (2005, p. 163), the historical research process consists of four phases: (1) identifying and locating relevant sources; (2) assessing the nature and value of these sources; (3) interpreting the evidence found in the sources; and (4) communicating the interpretation in written form. The four phases basically correspond to Boonstra, Breure, and Doorn’s (2006) life cycle of historical information, which consists of creation, enrichment, editing, retrieval, analysis and presentation. The main reason for the paucity of quantitative historical studies relates to sources, and one major source for translation historical research is bibliographies of translated books.

Bibliography, as defined by Encyclopædia Britannica (2012), is ‘the systematic cataloging, study, and description of written and printed works, especially books’. It is either ‘the listing of works according to some system’ or ‘the study of works as tangible objects’ (Encyclopædia Britannica, 2012). The two aspects of meaning correspond to different types of bibliography: descriptive or enumerative bibliography, and analytical or critical bibliography (see, e.g., Harmon, 1998; Stokes, 2011). Enumerative (or systematic) bibliography is concerned with assembling information about individual works into a logical arrangement; the results can be a universal bibliography (including everything published in a subject field), a national bibliography (listing everything published in a given country), a trade bibliography (for aiding the book trade), or a subject bibliography (related to a specific topic) (Connaway & Powell, 2010, p. 255). Descriptive bibliography differs from enumerative in that it requires much more detailed descriptions of the works (including, e.g., details about binding and other physical aspects) (Stokes, 2003).

A catalog is ‘an organized set of bibliographic records that represent the holdings of a particular collection and/or resources accessible in a particular location’ (Taylor, 2006, p. 6). More often than not, it is a library catalog. The library catalog is one of the many forms of bibliography (Hanson & Daily, 2003). Based on its purpose, it has several types, such as public catalog (e.g. British Library Public Catalog), which serves the patron, and union catalog (e.g. OCLC’s WorldCat), which is a combined library catalog describing the collections of many libraries.

In library and information science, bibliographies and catalogs are bibliographic tools for people to exercise bibliographic control over portions of the bibliographic universe (Taylor, 2006, p. 6). The uses of bibliographic control include finding entities that correspond to the user’s search criteria, identifying and selecting an entity and obtaining access to an entity (Taylor, 2006, p. 6). In short, it facilitates bibliographic record retrieval. The key to fulfilling these uses is metadata.

Metadata, according to the US National Information Standards Organization (NISO, 2004, p. 1), is ‘structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource’. Cataloging is the process of creating metadata. There are various cataloging standards, including data structure standards, data value standards, data content standards and data format/technical exchange standards (Gilliland, 2008, p. 3). Data structure standards define categories of data or fields that make up a record. Two examples in this category are MARC (Machine-Readable Cataloging) standards and Dublin Core Metadata Element Set. Data value standards, such as Library of Congress Subject Headings and Sears List of Subject Headings, are controlled vocabularies that are used to populate metadata element sets.


Data content standards are guidelines for the format and syntax of the data values that are used to populate metadata elements. Examples in this category include Anglo-American Cataloguing Rules (AACR), Resource Description and Access (RDA) and International Standard Bibliographic Description (ISBD). Data format/technical exchange standards, such as MARC 21 and MARCXML, are metadata standards expressed in machine-readable form. For the sake of space, only two of the standards are introduced in greater detail here.

The Dublin Core Metadata Element Set, which came into being in Dublin, Ohio, in 1995, is a vocabulary of 15 elements for use in resource description. The 15 elements are: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage and Rights. In 2003, the Dublin Core became an ISO standard (ISO 15836). The International Standard Bibliographic Description (ISBD) is a set of rules produced by the International Federation of Library Associations and Institutions (IFLA) to create a bibliographic description in a standard, human-readable form that is internationally acceptable. It requires that the nine areas of description be recorded in a specific sequence (see IFLA, 2011), as follows: 0. Content form and media type; 1. Title and statement of responsibility; 2. Edition area; 3. Material or type of resource; 4. Publication, production, distribution, etc.; 5. Material description area; 6. Series and multipart monographic resource; 7. Note; and 8. Resource identifier and terms of availability. Each area is composed of multiple elements. For example, in the Note area, 7.2 concerns Note on edition and bibliographic history, 7.2.4 is Note on relationships to other resources, and one of its sub-elements is Note on translations. In Note on translations, phrases like ‘Translation of:’, ‘Original title:’ and ‘translated from’ are used.
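To make the idea of a data structure standard more concrete, the following is a minimal sketch of a record restricted to the 15 Dublin Core elements. The function name `make_dc_record` is hypothetical, the sample values are taken from the record shown in Table 1, and placing the translator under Contributor is an illustrative assumption rather than a rule stated in the standard.

```python
# The 15 Dublin Core elements listed above, in lower case for use as keys.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

def make_dc_record(**fields):
    """Build a record, rejecting any key that is not a Dublin Core element."""
    unknown = sorted(set(fields) - set(DC_ELEMENTS))
    if unknown:
        raise ValueError("not Dublin Core elements: " + ", ".join(unknown))
    # Keep the elements in the canonical order given in DC_ELEMENTS.
    return {name: fields[name] for name in DC_ELEMENTS if name in fields}

# Values from Table 1; mapping the translator to "contributor" is an assumption.
record = make_dc_record(
    title="A comparative study of the origins of ethical thought",
    creator="Sekine, Seizō",
    contributor="Wakabayashi, Judy",
    publisher="Rowman & Littlefield Publishers",
    date="2005",
    language="eng",
)
```

Because the element set has no dedicated translator element, a sketch like this has to decide where translator names go, which is one reason translation-specific fields in other sources are valuable.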

5. Bibliographical sources for translation history

Bibliographical sources for translation history include union catalogs, trade catalogs, national library catalogs, subject bibliographies and official records. For quantitative analysis, bibliographic records cannot simply be handpicked from a database (e.g. an online bookstore); the database should usually support batch retrieval of records that are amenable to quantitative analysis. In most cases, bibliographic tools are needed to facilitate record retrieval.

5.1. Z39.50 and bibliographic tools

Z39.50 is an international standard (ISO 23950) defining a protocol for computer-to-computer information retrieval. It makes it possible ‘for a user in one system to search and retrieve information from other computer systems (that have also implemented Z39.50) without knowing the search syntax that is used by those other systems’ (Library of Congress, 2001). The syntax of the Z39.50 protocol allows for complex queries. The default exchange format for bibliographic records in Z39.50 is MARC 21. Both MARC 21 and Z39.50 are maintained by the Library of Congress, and are widely used in library environments.

Bibliographic management tools are software programs for scholars and authors to retrieve references from databases and build and format bibliographies. Mainstream bibliographic tools include EndNote, RefWorks, Zotero and others. They usually support direct search in Z39.50 databases via connection files. At the time of writing, EndNote (by Thomson Reuters) provides connection files that allow a user to search in one of over 4000 specific library catalogs and other information databases (including, e.g., the Library of Congress and the British Library), and hundreds of import filters for helping to import data downloaded from a library or other information provider into EndNote. Bibliographic records can be retrieved or imported to EndNote and other bibliographic tools, subject to further manipulation, such as sorting, editing, counting and finding duplicates. EndNote (version X7) provides various fields related to translation (including Title, Translator, Original Publication, Translated Author, Translated Title, Language and others) in its template for the reference type of Book.

Some bibliographical sources make their datasets available in CSV (Comma-Separated Values) format. Researchers can use utilities or applications such as OpenRefine to analyze the data.
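As an illustration of the kind of counting such tools perform, the following is a minimal sketch in Python using only the standard library. The sample data and the column names (`title`, `year`, `original_language`) are hypothetical and do not reflect the layout of any particular source.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a downloaded CSV dataset;
# the column names are assumptions, not those of any real source.
SAMPLE_CSV = """title,year,original_language
Book A,2009,Chinese
Book B,2010,Chinese
Book C,2010,Japanese
Book D,2010,Chinese
"""

def count_by_column(csv_text, column):
    """Count how many rows share each value of the given column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column].strip() for row in reader)

titles_per_year = count_by_column(SAMPLE_CSV, "year")
titles_per_language = count_by_column(SAMPLE_CSV, "original_language")
```

For a real file, the string buffer would be replaced by an opened file object; the counting step stays the same.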

5.2. Trade catalogs and Bowker’s Books In Print

Trade catalogs (or bibliographies), according to Feipel (1916) in Elements of Bibliography, constitute the great bulk of commercial bibliography, and often are models of accuracy and compilation. They usually provide information mainly for booksellers about books in print or planned for publication, and are particularly concerned with publishers, prices and availability of books. Bowker’s Books In Print is a notable example.

R.R. Bowker Company (‘Bowker’) provides bibliographic information on published works to the book trade. It is the exclusive US agency for issuing ISBNs (International Standard Book Numbers). One of Bowker’s publications is Books In Print, a bibliographic database that contains over 20 million global titles (in print, out of print and forthcoming), including books, ebooks, audio books and multimedia titles (Bowker, 2015). It is available in two subscription levels: (1) the United States Edition contains US publications; (2) the Global Edition covers US, UK, Canadian, European and Australian publications. The database offers multiple search options. The Advanced Search page lists about 20 metadata elements (or fields), including Author/Contributor, Title, Subject, Publisher, Type (Fiction, Non-fiction), Status (In Print, Out of Print, Forthcoming), Audience (Trade, Young Adult, Scholarly & Professional, etc.), Format (Book, Audio, etc.), Country of Publication, Language, Original Language, Date Range, Dewey Range and others. Elements that are especially interesting for the purpose of compiling a bibliography of translated books are Language, Original Language, Country of Publication and Date Range. For example, to retrieve records of books translated from Chinese into English and published in the US during 2010, one can enter Chinese in the Original Language section, select English in the Language section, United States in the Country of Publication section and Book-All in the Format section, and finally enter 2010-01-01 to 2010-12-31 in the Date Range section before clicking the Search button.

In this case, 119 records are retrieved. Further options are available for the user to refine results by Format (Paperback, Hardback, Electronic book text), Author, Status, Availability, Price range, Publication Date and Subjects (BISAC, BIC, Sears). It is worth mentioning that the paperback, hardback and eBook editions of a book have different ISBNs. In the 119 records, there are 75 paperbacks, 38 hardbacks, and six eBooks. As a book may have two or three editions (or formats), there may exist duplicates of the same book in the results. The Subjects section lists the number of books in each subject, for example, China (33), Fiction (25), Short Stories (10), according to the Sears List of Subject Headings.

The retrieved records can be exported by selecting the records and downloading them. The export format can be ASCII (full), and options to include Annotations, Reviews, Bios, Stock Availability and Publisher Information can be selected. The procedure to import the records into EndNote is simple: go to File>Import>File, choose the file exported from Books In Print, select Books In Print (Bowker) in the Import Option section, Import All in the Duplicates section and Unicode (UTF-8) in the Text Translation section, and click Import. Duplicates can then be identified by going to References>Find Duplicates. It appears that there are 13 duplicates (based on Title) in the 119 records. An examination of the records shows that the books are indeed translations from Chinese into English.
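Title-based duplicate detection of this kind can also be sketched outside a reference manager. The snippet below is an illustrative approximation, not a description of EndNote's own matching logic; the record layout, the placeholder ISBN values and the function names are hypothetical.

```python
from collections import defaultdict

def normalize_title(title):
    """Lower-case a title and collapse whitespace so trivial variants match."""
    return " ".join(title.lower().split())

def find_title_duplicates(records):
    """Group records by normalized title; keep only groups with 2+ records."""
    groups = defaultdict(list)
    for record in records:
        groups[normalize_title(record["title"])].append(record)
    return {title: recs for title, recs in groups.items() if len(recs) > 1}

# Hypothetical records: one book in two formats (each with its own ISBN)
# plus one unrelated book.
records = [
    {"title": "A Sample Novel", "format": "Paperback", "isbn": "ISBN-1"},
    {"title": "A  sample novel", "format": "Hardback", "isbn": "ISBN-2"},
    {"title": "Another Book", "format": "Paperback", "isbn": "ISBN-3"},
]
duplicates = find_title_duplicates(records)
unique_title_count = len({normalize_title(r["title"]) for r in records})
```

Counting unique normalized titles rather than records gives the number of distinct books, which is usually the figure of interest when editions in several formats are listed separately.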

A record in Books In Print consists of nearly 40 fields, including Translator. Such enriched metadata makes it a very valuable resource for translation history researchers.

5.3. National bibliographies and the Library of Congress

National bibliographies are ‘cumulated records of a nation’s publishing output’ (Jahns, 2012, p. 1), and are usually the most comprehensive record of publications in a country. Most of them are published under the auspices of the national library or other governmental agencies charged with such responsibility (Penka, 2011). For example, the British National Bibliography is provided by the British Library; China National Bibliography has been released by China National Depository Library (rather than the National Library of China), which is the Chinese agency for issuing ISBNs. National libraries obtain their books and other materials by various means. The Library of Congress (LC), for instance, according to its website, obtains material by purchase, exchange, gift, transfer from other government agencies and, above all, by legal deposit. The US passed the Copyright Law in 1870, which obligates the owner of copyright in a published work to deposit two free copies with the Copyright Office for the Library of Congress’s collections. Many countries have passed a similar law: for example, Sweden in 1661, Poland in 1780, the UK in 1911, Australia in 1968 and China in 2002.

The CD-ROM version of the annual China National Bibliography includes all kinds of books published in China during that year, including textbooks, trade books, children’s books, professional books and others. Take the 2011 volume as an example. The Bibliography contains about 30 metadata elements (or fields), including First Statement of Responsibility (diyi zeren shuoming), which follows the pattern of ‘(abbreviation of the name of the author’s country) author name’, and Remaining Data (qiyu zeren shuoming), which follows ‘translator’s name [etc.] yi’. These patterns allow one to search for yi in the Remaining Data field, and the records of all translated books are returned. One can also search for records of books published by a specific publisher (e.g. Shanghai Translation Publishing House). The returned records can be exported to Microsoft Excel or EndNote for further manipulation.
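Once records have been exported, the two patterns described above can be matched programmatically. The following is a minimal sketch under stated assumptions: the exact punctuation of the exported fields is not documented here, so the regular expressions accept both half-width and full-width parentheses and both the romanized marker yi and the character 译; the sample names are placeholders, and the function name is hypothetical.

```python
import re

# Assumed shapes, following the patterns described above:
#   first statement:     "(country abbreviation) author name"
#   remaining statement: "translator name(s) yi"   (yi = 译)
COUNTRY_AUTHOR = re.compile(
    r"^\s*[((]\s*(?P<country>[^))]+?)\s*[))]\s*(?P<author>.+?)\s*$"
)
# A romanized "yi" must be a separate word, so that a name such as
# "Zhang Ziyi" is not mistaken for a translator statement.
TRANSLATOR = re.compile(r"^\s*(?P<translator>.+?)(?:\s*译|\s+yi)\s*$")

def parse_responsibility(first_statement, remaining_statement):
    """Return (country, author, translator); unmatched parts are None."""
    country = author = translator = None
    match = COUNTRY_AUTHOR.match(first_statement)
    if match:
        country, author = match.group("country"), match.group("author")
    match = TRANSLATOR.match(remaining_statement)
    if match:
        translator = match.group("translator")
    return country, author, translator

# Placeholder values, not taken from real records.
romanized = parse_responsibility("(US) Author Name", "Wang Li yi")
in_script = parse_responsibility("(美) 作者", "王力 译")
not_translated = parse_responsibility("Author Name", "Zhang Ziyi")
```

A record whose remaining statement yields no translator is treated as not translated, which mirrors the search for yi described above but inherits its limits: translations catalogued without the marker would be missed.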

The Library of Congress has been actively involved in standards activities related to bibliographic description and to search and retrieval, such as Z39.50, the MARC standards and Search/Retrieve via URL (SRU). Its list of subject headings has been adopted by many libraries (e.g. the British Library). In an evaluation of three different kinds of bibliographic databases (online library catalogs, Global Books in Print and online bookshops) in 2004, Somers and Nieuwenhuysen found that the ‘Library of Congress is an outstanding source of bibliographic information’ (Somers & Nieuwenhuysen, 2004, p. 41).

For translation history researchers, unfortunately, the Library of Congress does not have a collection of translated books, and it seems that there is no clean way to do a comprehensive search that would result in records for all of the books in translation in the Library’s collection (personal communication with Library of Congress in August 2015). In a Library of Congress bibliographic record, there is no field called Translator or Original Language. Table 1 shows part of a Library of Congress record of a translated book.

One can see that field Main Title contains ‘translated by’ in the record. An examination of other records shows that similar introductory phrases may be used in the Main Title field or other fields, including ‘translation of’, ‘translated as’, ‘Title translated’, ‘translation by’, ‘translation from’, ‘translation into’, ‘translated from’ and ‘translated into’. The most common introductory phrases are ‘translated by’, ‘translation of’ and ‘translated from’. One reason for such inconsistency is that conventions for cataloging translated materials have changed considerably over time. Uniform Title refers to a title assigned to a work which has appeared under varying titles. For example, a Chinese novel may have several translations in English, and they can use the same chosen uniform title. This field in MARC 21 Format usually includes the title of the source text and the target language. However, it seems that only a small portion of the Library of Congress records for translated books have a ‘Uniform Title’.
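Given a set of exported records, the introductory phrases listed above can be searched for across all text fields. The snippet below is a minimal sketch, assuming each record is available as a simple mapping from field name to text; the function name is hypothetical, and the sample record reuses the Main Title from Table 1.

```python
import re

# Introductory phrases listed above, matched case-insensitively.
PHRASES = [
    "translated by", "translation of", "translated from", "translated as",
    "title translated", "translation by", "translation from",
    "translation into", "translated into",
]
PHRASE_RE = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in PHRASES) + r")\b",
    re.IGNORECASE,
)

def find_translation_phrases(record):
    """Return {field: [phrases found]} for fields containing any phrase."""
    hits = {}
    for field, value in record.items():
        found = [m.group(1).lower() for m in PHRASE_RE.finditer(value)]
        if found:
            hits[field] = found
    return hits

record = {
    "Personal Name": "Sekine, Seizō.",
    "Main Title": ("A comparative study of the origins of ethical thought: "
                   "Hellenism and Hebraism/Seizō Sekine; "
                   "translated by Judy Wakabayashi."),
}
hits = find_translation_phrases(record)
```

Such phrase matching can only flag candidates. As noted above, cataloging conventions have varied, so a record without any of these phrases may still describe a translation.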

The Language Code field follows ISO 639–2 Code (e.g. eng for English, chi for Chinese), and contains codes for two languages in the case of translated books. That two language codes appear in the field Language Code alone does not necessarily mean that the work is a translation. In this case, based on the MARC Tags of this record (which appears on the same webpage) for the language code (i.e. 041 1_ |a eng |h jpn), eng is the target language, while jpn is the source language. If the MARC Tags for the language code are 041 0_ |a eng |a jpn, it indicates that the item is not a translation and includes text in English and Japanese, although the Language Code would still be ‘eng jpn’ in the Full Record. In the MARC 21 Format for Bibliographic Data, each field in the record is identified by a three-digit numeric code. For example, MARC defines field 041 as Language Code, which is further divided into subfields using a single letter or number designation (e.g. ‘a’ for language code of text, ‘h’ for language code of original). Field 245 is Title Statement; its subfield code ‘a’ refers to Title and ‘c’ Statement of responsibility. Translator (together with author, editor, etc.) is part of Statement of responsibility.
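The reading of field 041 described above can be expressed as a short routine. This is a minimal sketch that parses the display form used in this section (tag, indicators, then subfields introduced by a vertical bar); the function names are hypothetical, and real MARC data would normally be read with a dedicated library rather than by splitting strings.

```python
def parse_marc_line(line):
    """Split a display-style MARC line into tag, indicators and subfields."""
    head, _, rest = line.partition("|")
    parts = head.split()
    tag = parts[0]
    indicators = parts[1] if len(parts) > 1 else "__"
    subfields = []
    for chunk in rest.split("|"):
        chunk = chunk.strip()
        if chunk:
            # First character is the subfield code, the rest is its value.
            subfields.append((chunk[0], chunk[1:].strip()))
    return tag, indicators, subfields

def describe_language_field(line):
    """Interpret a 041 field: translation flag, text and original languages."""
    tag, indicators, subfields = parse_marc_line(line)
    if tag != "041":
        raise ValueError("not a Language Code (041) field")
    # First indicator: 1 = is or includes a translation, 0 = does not;
    # anything else is treated here as "unknown" (None).
    flag = {"1": True, "0": False}.get(indicators[0])
    return {
        "is_translation": flag,
        "target": [value for code, value in subfields if code == "a"],
        "source": [value for code, value in subfields if code == "h"],
    }

translated = describe_language_field("041 1_ |a eng |h jpn")
bilingual = describe_language_field("041 0_ |a eng |a jpn")
```

Both example lines would appear as ‘eng jpn’ in the Full Record display, yet only the first is flagged as a translation, which is why the indicator and subfield codes matter more than the displayed language codes.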

Table 1. Part of a Library of Congress record of a translated book.

Field              Value
Personal Name      Sekine, Seizō.
Uniform Title      Rinri shisō no genryū. English.
Main Title         A comparative study of the origins of ethical thought: Hellenism and Hebraism/Seizō Sekine; translated by Judy Wakabayashi.
Published/Created  Lanham, Md.: Rowman & Littlefield Publishers, c2005.
Subjects           Ethics–History.
                   Jewish ethics–History.
LC Classification  BJ71 .S45 2005
Language Code      eng jpn

Figure 1. A search configuration in EndNote for retrieving translated books from the Library of Congress.

EndNote users can connect to the Library of Congress catalog and search and retrieve records. The search configuration shown in Figure 1 is supposed to retrieve books translated from Chinese into English and published in 2010. Forty-nine results were returned, and over half of the translated books were published in Beijing.

By replacing Uniform Title: English in the configuration with Language Code: eng, 113 records were retrieved. Among the results, some books were translated from English into Chinese and the others vice versa, and most of the books were published in Beijing. If Language Code: eng chi was used, no results were returned.

In their book-length study of contemporary world fiction, Dilevko, Dali, and Garbutt (2011) used the web version of Library of Congress Classification (subscription required) as a basis for finding major languages and literatures of the world, and found the call number ranges helpful in retrieving works originally written in specific languages of interest by individual authors. However, their initial searches took a year, and manual elimination had to be performed.

Compared with Bowker’s Books In Print, it appears that the current Library of Congress catalog is not very favorable for translation researchers to quickly build a bibliography of translated books. It is possible that one may find a way to grab records of translated books from the Library of Congress catalog using some utilities or applications based on MARC field 041 for Language Code (e.g. 041 1_ |a eng |h chi).

5.4. Union catalogs and OCLC WorldCat

As mentioned earlier, a union catalog is a combined library catalog describing the holdings of many participating libraries. The geographical area covered by a union catalog varies from local to multinational. For example, OhioLINK is a consortium of Ohio’s college and university libraries and the State Library of Ohio. It serves 91 institutions and intends to provide easy access to information and rapid delivery of library materials throughout the state. The Universal Short Title Catalogue (USTC) is a collective database of all the books published in Europe in the fifteenth and sixteenth centuries; it encompasses about 350,000 titles located in over 5000 libraries worldwide.

Figure 2. The search interface of WorldCat.

OCLC’s WorldCat is the largest online union catalog in the world. Each of its records contains a bibliographic description of an item and a list of institutions that hold the item (see Jordan, 2010). According to its website and Wikipedia, WorldCat currently contains over 347 million bibliographic records (more than half of which are for non-English content), representing the collections of 72,000 libraries in 170 countries and territories. Free search of the catalog is allowed on its public website. However, WorldCat through the FirstSearch service (library subscription required) includes extra features such as advanced search, ‘Find similar items’, more export options, and links to published reviews and excerpts.

WorldCat is based on MARC 21 and the Anglo-American Cataloging Rules, 2nd edition. Thus, searching WorldCat is similar to searching the Library of Congress catalog, with one exception: WorldCat does not support EndNote connection. Figure 2 shows the search interface of WorldCat.

As this interface supports only three keyword fields at the same time, one needs to do more searches using other introductory phrases mentioned in the previous section. Of course, these searches are not exhaustive. Godby, Wang, and Mixter (2015, p. 95) suggest using a text mining approach, and their ‘text mining software has successfully detected translators for an extra 2.7 million records’ in WorldCat. Search results can be exported to EndNote, RefWorks or Text file for further manipulation.

5.5. Subject bibliographies

Bibliographies devoted to a single subject constitute the largest portion of bibliographic publication (Sweetland, 2001, p. 87). They may be retrospective or current, annotated or unannotated. Compared with subject bibliographies, library catalogs and trade catalogs are often less comprehensive on a particular subject (Sweetland, 2001, p. 87). At present, there exist various subject bibliographies of translated books.

The most well-known translation bibliography is probably UNESCO’s Index Translationum, which was created in 1932. Printed volumes of this title cover 1932–1940, with a gap from April 1940 to 1947, then resuming for 1948–1986. There is a two-volume Cumulative Index to English Translations 1948–1968 (G.K. Hall, 1973), covering that portion of the Index Translationum. The records were computerized in 1979. The online version contains ‘cumulative bibliographical information on books translated and published in about one hundred of the UNESCO Member States between 1979 and 2009 and totals more than 2,000,000 entries in all disciplines’ (UNESCO, 2014). This database’s search interface includes fields such as Original Language, Target Language, Country and Year Range. There is no export option for the search results, though it is possible to design a filter for importing them into EndNote. The Index’s data come from the bibliography centers or national libraries in the participating countries. According to Šajkevič (1992, p. 67), the numbers provided on its website do not match those in the UNESCO Statistical Yearbook, and the Index contains some omissions.

The Renaissance Cultural Crossroads Catalogue, compiled at the University of Warwick, is, according to its website, a searchable and annotated list of all translations into and out of all languages printed in England, Scotland and Ireland, and of all translations out of all languages into English printed abroad before 1641. Its search interface includes such fields as Author, Translator, Original Language, Target Language, Year and Subject.

Three Percent, a resource prepared by the University of Rochester, provides detailed lists of translated titles published each year in the US since 2008. It was ‘named after the oft-cited statistic (first established by Bowker) that only 3% of books published in the U.S. are translations’ (Three Percent, 2015). Unlike the Index Translationum, Three Percent focuses only on fiction and poetry. So far, eight annual translation databases in Excel format have been released, each of which includes several spreadsheets. The spreadsheet Titles, for example, has over 10 column headings such as Title, Author, Translator, Genre, Year, Original Language and Source Country. Statistics involving source languages, countries and publishers are provided. Records in Three Percent are collected from many catalogs and from information supplied directly by publishers.

5.6. Official records and statistics

Quantitative data on translated books can be derived from official records and statistics held by government agencies. Of special interest to translation researchers are annual statistics on book trade and book copyright trade, which are directly related to translation. Such statistics can be used directly in quantitative translation history or be used to calibrate the numbers obtained from bibliographies.

For example, China has recently started to release annual country-specific Basic Statistics on Copyright Import and Export, which include the following categories: books, audio products, video products, electronic publications, software, films, TV programs and others. Figure 3 shows the annual Chinese (mainland) book copyright trade statistics from 2001 to 2013. It should be noted that over 10% of the titles in the Chinese (mainland) book copyright trade are imported from Taiwan, Hong Kong, Singapore and Macau, while about 30% of the titles are exported to these four Chinese-speaking regions; in such cases, no translation would be required.

Figure 3. Chinese (mainland) book copyright trade from 2001 to 2013.

Besides Books In Print, Bowker also publishes The Library and Book Trade Almanac (formerly The Bowker Annual) and provides annual title output statistics free of charge.

6. Uses of bibliographic data

Data collection is followed by data analysis and presentation. Common quantitative data analysis techniques include, among others, the frequency distribution, measures of central tendency and of dispersion, time series analysis, and correlation analysis (see, e.g., Floud, 2006). The frequency distribution is a method of presenting data in which a table displays the frequency of occurrence of each of the values. A time series is a sequence of values of a measure taken at successive points in time and arranged in chronological order (see Figure 3). As a time series displays ‘a combination of long-term growth or decline and short-term fluctuations’ (Feinstein & Thomas, 2002, p. 21), time series analysis is of particular interest to historians. Correlations are used when researchers want to know about the relationship between two variables (e.g. GNP and the annual number of translated titles), and, more specifically, whether there is a relationship, how strong it is and what form it takes (Floud, 2006).
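The techniques named above can be illustrated with a minimal Python sketch. All data below are invented for illustration, and the helper names (frequency_distribution, pearson_r) are our own assumptions rather than part of any tool discussed in this paper. The sketch builds a frequency distribution by source language, derives a yearly time series, summarizes it with measures of central tendency and dispersion, and computes a Pearson correlation between two annual series.

```python
from collections import Counter
from statistics import mean, median, pstdev

# Hypothetical records: (publication year, source language) pairs.
records = [
    (2001, "English"), (2001, "French"), (2002, "English"),
    (2002, "English"), (2003, "German"), (2003, "English"),
]

def frequency_distribution(values):
    """Return a mapping from each distinct value to its count."""
    return dict(Counter(values))

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (var_x * var_y) ** 0.5

# Frequency distribution of source languages.
by_language = frequency_distribution(lang for _, lang in records)

# Time series: number of titles per year, in chronological order.
titles_per_year = sorted(frequency_distribution(y for y, _ in records).items())

# Central tendency and dispersion of the yearly counts.
counts = [n for _, n in titles_per_year]
summary = {"mean": mean(counts), "median": median(counts), "stdev": pstdev(counts)}

# Hypothetical annual series: an economic indicator and translated titles.
indicator = [1.0, 2.0, 3.0, 4.0]
translated = [10, 20, 30, 40]
r = pearson_r(indicator, translated)
```

A coefficient near 1 or -1 indicates a strong linear relationship and a value near 0 a weak one; as with any correlation, it says nothing by itself about causation.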

For the presentation of quantitative data, the graphs that are frequently used in historical research are histograms, line charts, bar charts, pie charts and others. Moretti (2005) advocates in the study of literary history the use of graphs from quantitative history, maps from geography and trees from evolutionary theory. In the fields of digital humanities and big data, one focus is on the visualization of data, which can help researchers see patterns, extract new meaning from the messy data and visually communicate the findings (Frankel & Reid, 2008). Also, data visualization can ‘be understood as a speculative and intuitive process that provokes researchers to new questions, thus functioning as a problem-generating rather than problem-solving strategy’ (Bode & Osborne, 2015, p. 233). There are some user-friendly tools for analyzing and visualizing historical and bibliographic data, such as Zotero (with its plugin Paper Machines), OmniViz and RefViz (together with EndNote), ThemeRiver and Gephi.

Different techniques and tools are necessary for different research questions. As noted in Section 3, research questions in book history revolve around books (or authors), publishers, distribution of books, readers (and reviewers) and translators. Some basic research questions are: Who wrote them? Who translated them? Who published and sold the translated books? How expensive were they? Who was reading them? What was their appeal and popularity compared with non-translated titles? How were they reviewed? (see Suarez, 2015, p. 204). Bibliographic data can help address such questions. Since the metadata in bibliographic records contain specific fields such as subjects (e.g. religion), genres (e.g. fiction), source language, target language, date of publication and place of publication, more specific questions can be asked, for example, how many translated books were published in a given country in a given year. Darnton (1982) and Adams and Barker (1993) both mentioned that political, legal, economic, social, intellectual and religious influences can impact book history. These influences and trends contribute to our analysis and interpretation of quantitative data.

In the remainder of this section, we review some existing quantitative studies related to translation history, and present some of our own preliminary findings. Some of the subsection titles are borrowed from Šajkevič (1992), a pioneering study in this area based on UNESCO’s Index Translationum.

6.1. General output of translated books

According to Šajkevič (1992), the proportion of translated books in the total book output in the world stayed stable at some 9% with slight variation. Kovač (2002), however, found that there were significant differences in this respect among European countries.

In countries such as Germany, for example, around 15 percent of total book output is translations. In France, translations represent around 10 percent of the total book output. In Italy, similar to Hungary or Slovenia, translations represent between one-third and one-quarter of the total book output. In all these countries, those originally in English represent between 60–80 percent of the translations. On the other hand, in 1990, only 2.4 percent of the total British output of books was translations; in the United States it amounted to 2.96 percent. (Kovač, 2002, p. 49)

Based on the numbers retrieved from Bowker’s Books in Print, Figure 4 shows the time series of the numbers of translated books from Chinese into English and published in the USA from 1981 to 2010. Note that the numbers include both paperbacks and hardbacks.

In her MA thesis, Dong (2008) explored literary works translated into Chinese and published between 1950 and 2008. From the NBINet (National Bibliographic Information Network) databases in Taiwan, she retrieved 32,159 bibliographic records, and found that the annual number of translated literary works increased steadily.

The Index Translationum provides on its website translation-related statistics in many aspects, such as ‘Evolution of translations in a given country’. Figure 5, for example, displays the annual number of translated books published in the USA from 1979 to 2008.

6.2. Original languages

English has undoubtedly been the most translated language in the world. Using data from the Index Translationum, Figure 6 shows the proportion of English as original language in the total sum of translated books in the world at large over the years. In 1979, the proportion of English as original language was 40%; in 1990, it stood at 50%; since 1995, it appears that the proportion has been around 60%.

Figure 4. The annual number of translated books from Chinese into English and published in the USA.

Figure 7 compares English, French, German and Russian as original languages in terms of the annual number of translated books. Overall, the annual number of books translated from English grew consistently and sharply from 1979 to 2007. French, German and Russian are among the top four original languages, according to statistics from the Index Translationum. Over the years, French and German exhibit a steady and similar trend. Russian shows a notable decline after 1991, when the Soviet Union dissolved.

Figure 5. The annual number of translated books published in the USA from 1979 to 2008.


Figure 6. The proportion of English as original language in translated books worldwide.

German, French, Spanish, English and Japanese have been the top five target languages, and correspondingly, Germany, Spain, France and Japan have been the top four countries in terms of published translated books. Dong (2008) found in her study that the top five source countries of literary works translated into Chinese were the USA, UK, France, Germany and Russia.

6.3. Themes and subjects

Subject is one of the 15 elements in the Dublin Core Metadata Element Set. As data value standards, Library of Congress Subject Headings and Sears List of Subject Headings are widely adopted by bibliographies and catalogs such as the Library of Congress and Bowker’s Books In Print, as well as Index Translationum. These make it possible for researchers to analyze the proportion of various subject and thematic groups.

In 2008, one of the authors of this article retrieved from the Library of Congress catalog via EndNote the records of translated books published in the USA during 1900–2007, and ‘translated’ was used as a search word. In EndNote, duplicates were deleted; records of such reference types as Audiovisual Material, Manuscript, Thesis and Generic were deleted; records with no indication of publication date or place of publication, or with publication place not in the USA, were discarded. The whole process took about two days. In the end, a total of 57,011 bibliographic records were kept in an EndNote library ready to be analyzed. With the caveat that using ‘translated’ as a search word is not a perfect way to retrieve records of translated books (see Section 5.3), a theme distribution based on Subjects and Title was mapped out using OmniViz, which can work seamlessly with EndNote. Figure 8 is a chart for the period of 1931–1940. This theme distribution illustrates that ‘life’ (as in The Wisdom of Life) was a paramount theme in the 1930s, followed by ‘Germany’, ‘woman’, ‘biography’ and ‘love’. Table 2 presents the theme distribution for each decade from 1900 to 2007.
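The cleaning steps described above were carried out by hand in EndNote. The following Python fragment is only a hedged sketch of the same logic applied to exported records. The record keys ('title', 'year', 'place', 'country', 'ref_type') and the rule that two records are duplicates when title, year and place coincide are assumptions made for illustration; they do not reproduce EndNote’s own duplicate detection.

```python
# Reference types discarded in the procedure described above.
EXCLUDED_TYPES = {"Audiovisual Material", "Manuscript", "Thesis", "Generic"}

def clean_records(records, country="USA"):
    """Deduplicate and filter a list of bibliographic record dicts.

    Each record is assumed (hypothetically) to carry the keys
    'title', 'year', 'place', 'country' and 'ref_type'.
    """
    # Step 1: drop duplicates, keeping the first occurrence.
    seen, unique = set(), []
    for rec in records:
        key = ((rec.get("title") or "").strip().lower(),
               rec.get("year"), rec.get("place"))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    # Step 2: drop unwanted reference types.
    typed = [r for r in unique if r.get("ref_type") not in EXCLUDED_TYPES]
    # Step 3: drop records lacking a date or place, or published elsewhere.
    return [r for r in typed
            if r.get("year") and r.get("place") and r.get("country") == country]

# Invented sample records for illustration.
sample = [
    {"title": "The Wisdom of Life", "year": 1935, "place": "New York",
     "country": "USA", "ref_type": "Book"},
    {"title": "the wisdom of life ", "year": 1935, "place": "New York",
     "country": "USA", "ref_type": "Book"},       # duplicate
    {"title": "A Dissertation", "year": 1936, "place": "Boston",
     "country": "USA", "ref_type": "Thesis"},     # excluded type
    {"title": "Undated Work", "year": None, "place": "Chicago",
     "country": "USA", "ref_type": "Book"},       # no date
    {"title": "Printed Abroad", "year": 1937, "place": "London",
     "country": "UK", "ref_type": "Book"},        # not in the USA
]
cleaned = clean_records(sample)
```

Scripting the filters makes the inclusion criteria explicit and repeatable, which matters because, as discussed in Section 7, different counting decisions lead to different totals.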


Figure 7. The annual number of books translated from English, German, French, and Russian.

According to a survey1 jointly conducted in 2012 by the Translators Association of China and Beijing Foreign Studies University, 9763 Chinese books had been translated into various foreign languages during 1980–2009, and the top three subject groups were Chinese history and geography, Chinese political and legal writings and Chinese arts, culture, science and education. Each of the three accounted for over 20%, compared with 10% for Chinese literature and less than 1% for Marxist-Leninist and Mao Zedong thought writings.

6.4. Translated authors

Based on statistics from the Index Translationum, the top 10 translated authors in the world are Agatha Christie, Jules Verne, William Shakespeare, Enid Blyton, Danielle Steel, Vladimir Lenin, Hans Christian Andersen, Stephen King and Jacob Grimm.

Figure 8. Themes of the translated books published in the USA during 1931–1940.


Table 2
Theme distribution of translated books published in the USA (1900–2007).

Period       Themes
1900–1910    life
1911–1920    life, war
1921–1930    life, story, religion
1931–1940    life, German/woman, story, biography/Spain
1941–1950    life, world war, philosophy
1951–1960    Life/God, Modern/poems, Mathematics/adventure, Story/church
1961–1970    Old/Testament/New, philosophy/chemical, travel/sculpt, Christianity/religion
1971–1980    poems/animal/night, France/England, theology/sociology, Christianity
1981–1990    poems/love, story/China, contemporary/West
1991–2000    life/death/woman, philosophy/Europe, France/war, tale/fairy, love/travel
2001–2007    life/death/guide, story/collect/night, write/law/war, story/collect, human/myth

When studying the frequency distribution of the productivity of authors, three laws (or hypotheses) are often discussed and applied. Lotka’s Law (Lotka, 1926) states that the number of authors making n contributions is about 1/n² of those making one, and that the proportion of all authors that make a single contribution is about 60%. Coile (1977) later argued that Lotka’s Law was based on observations of the scientific productivity of chemists and physicists, and did not apply to the humanities. Munch-Petersen (1981) studied the application of Lotka’s Law to the distribution of authors of prose fiction translated into Danish in the period 1800–1899. He concluded that ‘Lotka’s law does not fit exactly to my material, but different ways of counting the document units can bring my material closer to or farther away from Lotka’s theoretical model’ (Munch-Petersen, 1981, p. 9). Dong (2008) found that, overall, Lotka’s Law was not applicable to her case of authors of literary works translated into Chinese.
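As a hedged illustration of how Lotka’s Law might be checked against bibliographic data, the sketch below computes the expected number of authors with n titles from the number of single-title authors, and tabulates an observed distribution for comparison. The author names and counts are invented, and the function names are our own. The figure of about 60% follows from the inverse-square form: the share of single-contribution authors is 1/(1 + 1/4 + 1/9 + ...) = 6/π², or roughly 0.61.

```python
import math
from collections import Counter

def lotka_expected(single_title_authors, max_n):
    """Expected number of authors with n titles under the 1/n^2 form."""
    return {n: single_title_authors / n ** 2 for n in range(1, max_n + 1)}

def observed_distribution(titles_per_author):
    """Map n to the number of authors with exactly n titles."""
    return dict(Counter(titles_per_author.values()))

# Theoretical share of authors with a single contribution: 6 / pi^2.
single_share = 6 / math.pi ** 2

# Invented example: number of translated titles per author.
titles_per_author = {"Author A": 1, "Author B": 1, "Author C": 1,
                     "Author D": 1, "Author E": 2, "Author F": 4}
observed = observed_distribution(titles_per_author)
expected = lotka_expected(observed[1], max_n=4)
```

Comparing observed with expected for each n (for example with a goodness-of-fit test) shows how closely a given body of translated authors follows the model. As Munch-Petersen’s remark suggests, the result depends on how the document units are counted.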

Price’s law (Price, 1963) states that, in a given field, ‘the number of prolific authors is equal to approximately the square root of the total number of authors in the field’, and ‘the prolific authors account for about half the publications in the field’ (Diodato, 1994, p. 131). The 80/20 law (also known as the Pareto principle) asserts that ‘a minority of causes, inputs or effort usually lead to a majority of the results, outputs or rewards’ (Koch, 1998, p. 4). For instance, according to the Consumer Research Study on Book Purchasing 2001 (by the Book Industry Study Group), books in three subject categories (i.e. popular fiction, non-fiction religious and cooking-crafts) accounted for 74% of all books belonging to one of 11 subject categories. Dong (2008) found that neither Price’s law nor the 80/20 law applied to the distribution of authors’ or translators’ productivity in her data.
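Both laws can be tested with a few lines of code once the number of titles per author (or translator) is known. The sketch below is a minimal illustration with invented counts; the function names, and the choice to round the square root to the nearest integer, are our own assumptions.

```python
import math

def price_law_share(counts):
    """Share of all titles produced by the top sqrt(N) of N authors."""
    if not counts or sum(counts) == 0:
        raise ValueError("need at least one author with at least one title")
    ranked = sorted(counts, reverse=True)
    k = max(1, round(math.sqrt(len(ranked))))
    return sum(ranked[:k]) / sum(ranked)

def top_fraction_share(counts, fraction=0.2):
    """Share of all titles produced by the top `fraction` of authors."""
    if not counts or sum(counts) == 0:
        raise ValueError("need at least one author with at least one title")
    ranked = sorted(counts, reverse=True)
    k = max(1, math.ceil(fraction * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)

# Invented example: titles per author for nine authors.
counts = [10, 5, 3, 1, 1, 1, 1, 1, 1]
price_share = price_law_share(counts)      # top 3 of 9 authors
pareto_share = top_fraction_share(counts)  # top 2 of 9 authors
```

Price’s law predicts a value near 0.5 for the first measure and the 80/20 law a value near 0.8 for the second; clear departures from these values, as in Dong’s (2008) data, indicate that the law does not hold for the material at hand.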

6.5. Publishers of translated books

Publishers play an important role in modern translation history. For example, Mai Jia, a Chinese writer, published his espionage novel Decoded in China in 2002. According to him, that novel was rejected 17 times by Chinese publishers. In 2014, it was translated into English by Olivia Milburn and Christopher Payne, and published by Penguin and FSG. The publishers organized promotional activities lasting eight months before launching Decoded in 20 countries. All major English and American newspapers (e.g. The Guardian, New York Times) published reviews of the novel. This attests to the importance of publishers in the success of a translated book.


The key problems a publishing house faces include: how to select the right books, how to reach the target readers, how to survive economically and how to gain cultural prestige (De Glas, 1986, p. 59). These involve economic, cultural, political and other factors. When it comes to publishing translations, different publishers have different policies and different interests in book subjects. A comprehensive analysis of a publisher’s list, as well as its statement of editorial policies, should help illuminate the translation history of a publishing house.

It is interesting to note that the top three publishers of translated books in China, according to the Index Translationum, are China Machine Press, the Publishing House of Electronics Industry and Tsinghua University Press, all of which focus on translations of technical, scientific and professional books and textbooks rather than trade books. This is probably consistent with the situation in the USA. According to Casper and Rubin (2013, p. 702), textbooks and scientific and technical books dominated the US export book trade in the 1950s and 1960s, and the sale of translation rights increased dramatically. Two literary publishing houses have made it to the Chinese top 10 list: Shanghai Translation Publishing House and People’s Literature Publishing House.

6.6. Distribution of translated books

Political relations between countries, economic relations (especially the international book market) and cultural exchanges determine the mode of circulation of texts in the world, and one of them often plays a dominant role in the circulation and distribution of translated books in a country during a specific historical period (Heilbron, 1999; Heilbron & Sapiro, 2007). In order to study the distribution of translated books, one may turn to sales figures (provided by publishers or booksellers), library holdings and library loan data.

The number and geographic distribution of libraries holding a given title indicate the distribution of a book. In this respect, WorldCat provides information on library holdings, which can be retrieved and counted. The book Wolf Totem (authored by Jiang Rong, translated by Howard Goldblatt), for instance, was available in 900 libraries worldwide at the time of writing. The number of library loans reflects the appeal and popularity of a book and the reading habits of the population.

7. Limitations and future of bibliographic data

Bibliography-based quantitative methods can be used to verify or question an impression, find trends and look for patterns, and can carry great persuasive power in historical research. And yet, data sources are always imperfect (Weedon, 2007). For instance, in Section 6.3, we mentioned that a survey found 9763 Chinese books had been translated into other languages during 1980–2009. However, the number provided in the Index Translationum is 13,615. Data quality problems have various causes. According to Greco (2005, p. 361), R.R. Bowker used data extracted from Bowker’s American Book Publishing Record, supplemented by data generated from their Paperbound Books in Print for many years. Later, Bowker found that these figures reflected only those books catalogued by the Library of Congress, and much of the output generated by small publishers and self-publishers had not been included. Thus, Bowker revamped its existing cataloguing system to capture more reliable statistical data on total US title output. As a result, in its statistics, the book title output in the USA was 68,175 in 1996 and 119,262 in 1997. When counting the number of translated books, one has to decide whether to count reprints, new editions, retranslations, volumes and paperback versions of hardcovers. Naturally, different decisions will lead to different numbers. In addition, a lack of standards or rules can cause inconsistency issues. For instance, in the Library of Congress catalog, a publisher called William Morrow and Company has the following variants: W. Morrow & Company, W. Morrow and Company, W. Morrow and Co., Morrow, and Morrow. This makes it difficult to retrieve information reliably on publishers.
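One way to mitigate such inconsistency before counting is to reduce each publisher string to a normalized key and map the key to a preferred form. The following is a minimal sketch under stated assumptions: the list of corporate words and the small authority table are hand-made and hypothetical, and a real project would need a much larger table and manual checking.

```python
import re

# Words and initials ignored when building the comparison key (assumed list).
CORPORATE_WORDS = {"and", "company", "co", "inc", "ltd"}

# Hypothetical hand-made authority table: normalized key -> preferred form.
AUTHORITY = {
    "morrow": "William Morrow and Company",
    "william morrow": "William Morrow and Company",
}

def name_key(name):
    """Lowercase, unify '&' with 'and', and drop initials and corporate words."""
    text = name.lower().replace("&", " and ")
    tokens = re.findall(r"[^\W\d_]+", text)  # runs of letters only
    core = [t for t in tokens if t not in CORPORATE_WORDS and len(t) > 1]
    return " ".join(core)

def canonical_publisher(name):
    """Return the preferred form if the key is known, else the cleaned input."""
    return AUTHORITY.get(name_key(name), name.strip())

variants = ["William Morrow and Company", "W. Morrow & Company",
            "W. Morrow and Company", "W. Morrow and Co.", "Morrow"]
normalized = {canonical_publisher(v) for v in variants}
```

With this table all five variants collapse into a single publisher, while names not found in the table are left unchanged so that no record is silently merged.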

The issue of data quality is often intertwined with the issue of data availability. For instance, in the Library of Congress catalog, there is no such field as Translator, Translated Title or Source Language. In the Dublin Core Metadata Initiative metadata terms, there are no such terms, either; one term related to translation is Alternative, which refers to an alternative name for the resource. The reason why it is not labeled Translated Title is that some resources are multilingual (e.g. UN releases and EU releases) and it is impossible (mostly for the sake of political correctness) to say which is the original and which is the translation. Given that most bilingual or multilingual resources are official documents, not translated books (in a narrow sense), we advocate establishing terms like Translated Title in such standards or adopting a standard containing such terms. More granular categories will lead to catalogs of much better quality. The Library of Congress has been working on the Bibliographic Framework Initiative (BIBFRAME), which aims to replace the MARC formats and provide greater granularity and easier reuse of the data. We hope they take into account the case of translations. In addition to the issue of formats, data provision is critical. Donahaye (2012) recommends that publishers provide full details on translations using a predefined template.

As most translated books have been published in the last 100 years or so, records in most bibliographical sources are often limited to modern times. This may be inconvenient (though not impossible) for quantitative studies of translation history of the past. From another perspective, however, historical studies on translation ‘originate from present concerns and not from any “idle” curiosity about past peoples and their lives’ (Gürçağlar, 2013, p. 134), and intend to ‘express, address and try to solve problems affecting our own situation’ (Pym, 1998, p. x). Seen in this light, trends and patterns in modern times discovered by bibliography-based historical research are in a good position to help policy-makers in the field of language and culture.

Bibliographic data sources, no matter how flawed or distorted, provide enough material for translation historians to construct a general picture of translation history, ‘something comparable to the early maps of the New World, which showed the contours of the continents, even though they did not correspond very well to the actual landscape’ (Darnton, 2002, p. 240). Patterns and regularities in translation history will be better discerned and interpreted with bibliographies and the emerging big data analytic techniques.



