Canadian News Sources (WE1S Area of Focus Report)

Report by Annie Schmalstig

Final Version Created June 2018


Schmalstig, Annie. “Canadian News Sources.” WhatEvery1Says Project, July 3, 2018.

1. Overview

What is your area(s) of focus?

Canadian (English language) news sources, although I’ve also compiled a list of French-language sources on Trello, in case we want to establish a French-language sub-corpus later on.

Why is this area of focus important to the WE1S corpus?

In order to be representative, we are trying to collect English-language sources from outside the United States.

2. Source Scoping Process

How have you been selecting sources for the WE1S corpus? (e.g. collecting from particular databases, using “impact” lists, etc.)

Wikipedia has an extremely comprehensive list of all newspapers in Canada, including English-language and French-language papers; daily, weekly, and monthly national and regional newspapers; metro bulletins; and college newspapers. The main focus was first to collect metadata for daily, national, English-language papers. A Canadian-American graduate student was consulted about local and regional Canadian news sources; he noted that newspapers published in Ontario, even if local, were for all intents and purposes national newspapers, as half of the population of Canada lives in Ontario. It was difficult to gauge representativeness based on how many newspapers should be collected from each region of Canada, but in practice, those collected regionally were those that were actually available through university databases. Most sources were available, via the University of Miami library system, through ProQuest Global Newsstream and LexisNexis Academic (although the U of Miami library is soon switching to Nexis Uni).

If you are using external links to guide your selection of sources, include links here and indicate who produced them, for what purpose the list was produced, and any potential bias issues involved.

I used the Wikipedia list of newspapers in Canada. This list, as mentioned above, was extremely comprehensive, listing newspapers with various political biases, ethnic focuses, languages, distribution methods, levels of “brow,” and national, local, regional, or university affiliation. The list was not organized by circulation numbers, although these were (sometimes) listed. The list was edited by multiple Wikipedia authors, and it notes, like many Wikipedia pages, that the list is incomplete and suggests Wikipedia readers can help by expanding it. Some sources cited to compile the list include pages from News Media Canada on Canadian newspaper circulation and ownership.

3. Corpus Representativeness

How representative do you think your corpus is? (“representativeness” can be interpreted and addressed in a number of ways, so tailor it to be the most productive for your area.)

Sources collected so far are largely regional, daily newspapers. As discussed above, the number of sources collected from the list of local and regional newspapers is based on what is available in the online databases, rather than a representative number (by population or circulation) of newspapers from each region of Canada. There are still very few independent Canadian news sources, or anything other than newspapers (another RA has collected a handful of Canadian magazines). There are many French-language newspapers in Canada, but these have not been collected yet. Nor have ethnically- or gender-focused newspapers, although these are on the Wikipedia list of all Canadian newspapers.

What challenges in achieving representation have you encountered?

It was difficult to determine in theory whether to focus on collecting a certain number of sources from each region of Canada (especially since roughly half of all Canadians live in Ontario, and many sources from Quebec are in French), or on sources with a certain level of circulation. In practice, I have simply begun collecting all sources that are available via the U of Miami online library system, as these are somewhat limited (especially for newspapers from small towns).

Provide a tally breakdown of the various facets of sources in your area of focus that WE1S is considering as possible measures of overall corpus “representativeness” (for example, by source or media type, nationality, region, political orientation, identification with specific racial, ethnic, and gender audiences, etc.)

National: 2
Regional: 33


Ontario: 16
British Columbia: 7
Alberta: 4
New Brunswick: 3
Manitoba: 2
Nova Scotia: 1
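
As a quick consistency check on the tally above, the following minimal Python sketch confirms that the provincial counts add up to the regional total and computes each province's share of the regional sources collected so far. It is not part of the WE1S workflow; the variable names are illustrative assumptions, and the counts are copied directly from the tally.

```python
# Counts copied from the tally above; names are illustrative only.
regional_total = 33
by_province = {
    "Ontario": 16,
    "British Columbia": 7,
    "Alberta": 4,
    "New Brunswick": 3,
    "Manitoba": 2,
    "Nova Scotia": 1,
}

# The provincial counts should account for every regional source.
province_sum = sum(by_province.values())
assert province_sum == regional_total

# Share of regional sources per province, as a percentage rounded to one decimal.
shares = {
    name: round(100 * count / regional_total, 1)
    for name, count in by_province.items()
}

for name, share in shares.items():
    print(f"{name}: {share}%")
```

By this count, Ontario accounts for just under half of the regional sources collected so far.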

4. Reflections

What challenges or difficulties have you encountered in the source selection or collection process? Do you anticipate any challenges emerging from your work going forward?

The biggest selection challenge, as discussed above, was to determine how many sources to collect from each region. Again, this was resolved by simply collecting all available sources. A collection challenge was that many newspapers were only available via databases, like Newsbank World News Research Collection, which required a further level of access, or only displayed articles in PDF format, which we are not able to use for topic modeling. Other databases, including ProQuest Global Newsstream, claimed to have coverage of a newspaper, but only had digital files of articles for a few years, usually in the 1990s and early 2000s. It was also difficult to gauge political orientation and level of “brow” for many of these publications, as such classifications were not noted on either the Wikipedia page or the online database.

5. Research Scan

Conduct some preliminary research on the questions or challenges that you provided in sections three and four.

It was difficult to find sources that explicitly addressed the question of how to select, for topic modeling, a corpus that represents all Canadian interests, regions, and nationalities. However, several studies discuss methodologies for collecting smaller-scale media samples to study select issues or groups. Some researchers chose to study only prominent national newspapers, while others developed their own lists based on highest circulation numbers. The Globe and Mail and the National Post were often mentioned, and while one or two studies also included the French-language publication Le Devoir, most used exclusively English-language newspapers.

Have other scholars reflected on these issues? Are there publications that address these problems? Has research been conducted on how to overcome these challenges or at least acknowledge them productively?

There are several published articles that discuss how their authors selected certain newspapers and ranges of articles with which to study various issues. A study on “Chronic Disease Coverage in Canadian Aboriginal Newspapers” (by Hoffman-Goetz et al., in the Journal of Health Communication, 8.5, 2003) identified, at that time, 31 Aboriginal newspapers in Canada, and randomly selected 14 for keyword searches related to illnesses in that community. An article in Journalism Studies by Mike Gasher (“The View from Here: A News-flow Study of the On-line Editions of Canada’s National Newspapers,” 8.2, 2007) studied patterns in the circulation of news items online through content analysis. He chose to limit his study to daily newspapers “because of the number and variety of them available on the Web, because their news packages tend to be more extensive than those of either magazines or radio or TV broadcasters, and, most importantly, because by moving on-line, newspapers break the physical and geographical restraints inherent to the production and distribution of heavy, bulky hard-copy newspapers.” The papers chosen were the Globe and Mail, the National Post, and Le Devoir (published in Montreal and serving primarily readers in Quebec). Jennifer Ellen Good’s research on “The Framing of Climate Change in Canadian, American, and International Newspapers: A Media Propaganda Model Analysis” (in Canadian Journal of Communication, 33.2, 2008) compared climate change coverage using keyword searches in LexisNexis. The newspapers were selected largely by highest circulation numbers. For American papers, the author used a list generated by LexisNexis of English-language newspapers that were listed in the top 50 in circulation of the Editor & Publisher Year Book.

The list of Canadian newspapers was created by the author, as there was not a list of highest in circulation on LexisNexis, and was based on “the largest-circulation English newspaper from each province and territory (as determined by the Canadian Newspaper Association, and based on LexisNexis’ availability) and the addition of two national newspapers.” Harald Bauder’s article on the “Immigration Debate in Canada: How Newspapers Reported, 1996-2004” (in Journal of International Migration and Integration, 9.3, Sep 2008, DOI:10.1007/s12134-008-0062-z) discusses his research on Canadian media representations of immigration and their effects on debate, policy, and law. He used topoi analysis (created by linguists Boke (2000) and Wengeler (1995, 2003)) to “identify distinct models of argumentation embedded in a given text or text passage, organize these topoi, and analyze their occurrence in a systematic manner”; the author used a list of 16 topoi related to immigration that had previously been identified in an earlier German study by Martin Wengeler. The newspapers Bauder used were the Vancouver Sun, Calgary Herald, The Toronto Star, the National Post, and Ottawa Citizen; they were chosen “because they are among Canada’s largest, most influential and reputable daily newspapers, they represent a range of perspectives, and they are based in the provinces of Canada (Ontario, British Columbia, and Alberta) that receive–with the exception of Quebec–the most immigrants” (and so would be most pertinent for his research questions). He also noted, however, that these were the newspapers that could be searched electronically from 1996-2004 using the search engine Canadian Newsstand.
Finally, an article by Wendy Naava Smolash in University of Toronto Quarterly (78.2, Spring 2009, “Mark of Cain(ada): Racialized Security Discourse in Canada’s National Newspapers”) outlined how she used literary analysis techniques to compare two of Canada’s national newspapers’ coverage and racialization of terrorist incidents in Canada. While her methodology did not involve topic modeling, it did identify the Globe and Mail and the National Post as two major, national Canadian news sources.

6. Additional Comments/Reflections

Include any other issues or questions that you have encountered that may not fit into any of the above categories.

Eventually we will need to decide whether to include a French-language sub-corpus, as we are considering a Spanish-language one. It might be difficult to get a full picture of Canadian news media’s representation of “humanities” without also collecting articles from the many French-language publications, or at least Le Devoir.