In May 2001, Tim Berners-Lee, director of the World Wide Web Consortium and inventor of the Web, identified the next “killer app” of technology as “the Semantic Web.” This is simply e-text made intelligent by XML-encoded semantic hyperlinks to what Berners-Lee terms “ontologies,” collections of word definitions and semantic rules that enable software agents to travel the Web with an artificial understanding of the “meaning” of both what they search for and what they search in. The TAPoR project represents the first major proposal in Canada to exploit the Web as an intelligent medium for text, where “text” means any record of human communication that can be digitally represented. In addition to providing a framework for computing in textual and related fields, it represents an opportunity for the community of humanities computing scholars in Canada to seize the initiative by developing standards, tools, and methods that will set the pace for the next several decades and enable Canadian scholars to participate as equals on the international stage as the standards evolve.
The centres that TAPoR brings together into a critical research mass work on traditional subjects such as Shakespeare, history, law, Canadian poetry, aboriginal languages and culture, Canadian and worldwide co-operative movements, women’s writing, and lexicons of early English. These centres enrich Canada’s culture and educational system by bringing new content to the Web, but they also play a role in Berners-Lee’s vision of the Semantic Web. They do research at the interface of semantic analysis (the study of meaning) and computer technology. We call this nexus text analysis, although, thanks to the computer, the “texts” that scholars need to analyze cover a variety of fields and cross media, as when we connect digitized performances of a Shakespeare play to historical information. At the core are metadata for resource discovery and encoding standards, technical fields in which the New Brunswick TAPoR centre is a world leader and in which most centres are adept practitioners in special subfields. These methods have obvious benefits for the knowledge society, the knowledge economy, and a connected Canada.
As TAPoR researchers develop SGML text-encoding languages, for whatever type of text, they create metadata, an essential ingredient in making e-texts understandable by future software agents. Computer-based lexicons like DOE and LEME at Toronto, the women writers textbase at Alberta, and the law corpus at Montreal, among others, become valuable properties in this perspective. They belong to Berners-Lee’s “ontologies,” which run the semantic engine that software agents will consult in the next few decades. For centuries, the humanities have studied meaning in genteel poverty, using printed texts that were sometimes rare and fragile, which forced the scholar to go to the source. Now the time has come when computers can bring the sources to literary and linguistic scholars, regardless of where they are, and, for that matter, to the general public. Humanists are now being called again to put their skills in knowledge representation and language analysis to use for the technical advances of society at large, just as they did in the last knowledge explosion, when print technologies were perfected. Partnered with computer, information, and cognitive sciences, they will help engineer text tools for the common weal.
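To make concrete how encoded metadata lets software find texts, here is a minimal sketch in Python. It is an illustration only: the element names (`corpus`, `text`, `meta`, `author`, `genre`), the sample records, and the `find_texts` helper are all hypothetical and do not reproduce any TAPoR centre's actual encoding scheme.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoded corpus: each <text> carries a <meta> block.
# Element names and values are invented for illustration.
CORPUS = """
<corpus>
  <text id="t1">
    <meta><author>Author A</author><genre>poetry</genre></meta>
    <body>First sample text.</body>
  </text>
  <text id="t2">
    <meta><author>Author A</author><genre>drama</genre></meta>
    <body>Second sample text.</body>
  </text>
  <text id="t3">
    <meta><author>Author B</author><genre>poetry</genre></meta>
    <body>Third sample text.</body>
  </text>
</corpus>
"""


def find_texts(corpus_xml, **criteria):
    """Return the ids of texts whose metadata fields equal every criterion."""
    root = ET.fromstring(corpus_xml.strip())
    matches = []
    for text in root.iter("text"):
        meta = text.find("meta")
        if meta is None:
            continue  # skip texts that carry no metadata block
        if all(meta.findtext(field) == value for field, value in criteria.items()):
            matches.append(text.get("id"))
    return matches


if __name__ == "__main__":
    print(find_texts(CORPUS, genre="poetry"))
    print(find_texts(CORPUS, genre="poetry", author="Author B"))
```

The point of the sketch is that the query never reads the body of a text: the program selects documents purely from the metadata that the encoders supplied.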
Though not well known outside humanities computing, text analysis studies the cognitive foundations of language. For example, it supplies the stylistic profiles used to identify the authors of anonymous or disputed texts, as when the anonymous journalist who wrote Primary Colors was identified. This technology also enables forensic researchers to attach responsibility for a document to its true author, helps to establish copyright, and shows promise for improving the security with which a document passes over the Internet, by demonstrating a probability that the supposed author is the actual one. Researchers at Alberta, Victoria, and elsewhere practice computational stylistics, but the interconnectedness of language and thought, as a subject, belongs to humanities researchers in all fields.
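A stylistic profile of the kind described above can be sketched, in very simplified form, as the relative frequencies of common function words, compared across texts by cosine similarity. This is an assumption made for illustration, not the method used in any particular attribution case; the word list and the `profile`, `cosine_similarity`, and `most_likely_author` helpers are hypothetical, and real studies rely on much longer word lists and much larger writing samples.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately short function-word list (illustration only).
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "but", "with"]


def profile(text):
    """Return the relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def cosine_similarity(p, q):
    """Cosine of the angle between two profiles (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def most_likely_author(disputed, candidates):
    """Return the candidate whose writing sample is closest in profile.

    `candidates` maps an author name to a sample of that author's writing.
    """
    target = profile(disputed)
    return max(
        candidates,
        key=lambda name: cosine_similarity(target, profile(candidates[name])),
    )
```

A similarity score of this kind ranks candidates rather than proving authorship, which matches the passage's framing of attribution as a matter of probability.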
Large digital libraries or corpora, searchable by reference to their metadata, appear in every TAPoR centre. These translate the intellectual and cultural heritage of Canadians to a new medium. They include the Montreal legal corpus, the large co-op studies and aboriginal (audio and text) collections at Victoria, and the Canadian poetry and drama initiatives at New Brunswick. All this is original Web content, but to make it work in the world, TAPoR researchers are partnering with computer and information scientists. For example, the Toronto centre attaches human-computer interface research to the development of these corpora. By mapping text characteristics to the cognitive behaviour of those who use the texts, TAPoR researchers can build improved interface designs that will contribute to digital library research.
Traditionally, text-analysis methods have benefited research in the language industries (translation services, word-processing companies, software producers, and language instruction firms). That will certainly continue. With the growth of intercontinental free trade, however, Canada needs a high-powered text-oriented research portal that can integrate the country’s researchers so that they, like their counterparts in Europe and the United States, have a solid base from which to engage in international collaborations. TAPoR assembles well-known researchers in very different fields who have, in some cases, only just become aware of one another.
