United Kingdom News Sources (WE1S Area of Focus Report)

Report by Abigail Droge, Jennifer Hessler, Samantha Wallace, and Scott Kleinman


Abigail Droge, Jennifer Hessler, Scott Kleinman, and Samantha Wallace. “United Kingdom News Sources.” WhatEvery1Says Project, http://we1s.ucsb.edu. July 3, 2018. http://we1s.ucsb.edu/united-kingdom-news-sources/.

1. Overview

What is your area(s) of focus?

Our area of focus is news sources in English from the United Kingdom (England, Scotland, Wales, and Northern Ireland).

Why is this area of focus important to the WE1S corpus?

This focus is important for the WE1S corpus because the UK has historically exerted enormous power on the shape of the humanities in the Western imagination. The institutions put in place by the British Empire have had, for better or for worse, a forceful influence on the way that different social groups — in many countries — can access resources and experiences related to the arts. A strong representation of UK media from the 1980s onward will position us well in terms of understanding the current challenges and opportunities that face the humanities now. Such a focus will also provide a strong base to build from, should we decide to extend our corpus historically (for example, back to the 19th century moment when many academic institutions began to take the forms we now recognize).

2. Source Scoping Process

How have you been selecting sources for the WE1S corpus? (e.g. collecting from particular databases, using “impact” lists, etc.)

Our selection process began with a starter kit list, generated by two of Alan’s British contacts, which provided a helpful overview of major newspapers to target first. To expand this initial set of suggestions, we compiled the very extensive list of all UK papers available through LexisNexis. (We limited our search by “Newspapers” and “UK,” which produced 736 results.) We have been working to hand-categorize this list according to breakdowns such as “regional papers,” “national papers,” “political affiliations,” “tabloids,” etc., in order to understand as fully as possible what kinds of sources are accessible to us.

If you are using external lists to guide your selection of sources, include links here and indicate who produced them, for what purpose the list was produced, and any potential bias issues involved.

Our original lists were produced by two individuals (personal friends and colleagues) and therefore necessarily represent the inherent bias of single opinions. Nonetheless, these lists provided a wide range of sources in terms of “brow” level, geography, and political affiliation, instilling confidence that a large cross-section of social experience was represented. The list of sources available from LexisNexis carries the bias of a large institutional database which sees fit to preserve and make accessible certain sources and not others, though the fact that 736 sources are represented ensures that we are at least achieving exposure to many publications. We have also been able to cross-reference both the starter kit and LexisNexis lists with a Wikipedia list of UK newspapers by circulation figures, which gives us at least a rough estimate of popularity and influence.

Our categorization of publications by political affiliation is based on their official political party endorsements for 2017. We originally found this information on a Wikipedia page titled “Endorsements in the United Kingdom general election, 2017.” We then verified these endorsements using a variety of more official secondary sources. It remains an open question how to categorize publications that do not endorse the same political party each year. Also, we only have political endorsement information for a small number of national publications; we do not yet have this information for smaller or regional publications.


Original starter kit lists (which also include links to additional articles about political bias)

LexisNexis list of publications 

Political endorsements list

UK newspapers by circulation 

3. Corpus Representativeness

How representative do you think your corpus is? (“Representativeness” can be interpreted and addressed in a number of ways, so tailor it to be most productive for your area.)

We have been working to arrange newspapers by region of publication, and, once the ongoing categorization of sources available through LexisNexis is completed, our corpus will be very representative in terms of geography. Many local papers, from all parts of the UK, are present in the corpus. We have also been working to assemble a list of papers by political affiliation, representing both right- and left-wing perspectives. We have both online and print sources and papers of all “brow” levels.

What challenges in achieving representativeness have you encountered?

The UK’s relatively small size means that political and cultural differences within the country can be quite distinct at a local level, making regionality an important mechanism for categorizing and interpreting our data. Geographical/regional categorization has generated several methodological questions for us. The first issue is that the seemingly simple division between “national” and “regional” papers becomes complicated when we recognize that many of the papers we regard as “national” are centered in London and thus necessarily carry with them a regional bias. It is also possible that publications based outside of London could be widely enough circulated to be considered national, though we have not categorized any such examples. Right now, our categorization of sources as “national” versus “regional” is based on how they describe themselves, but this process of categorization is ongoing.

The publications that we have designated as “regional” are categorized based on where they are published, not on where they circulate, which in some cases might be quite a different metric. Adding circulation data (both in terms of hard numbers and in terms of location) to our selection logic is an ongoing need. Acquiring circulation data will help us assign publications to geographical regions more empirically and make more informed decisions about which publications to include in topic models. (Right now we have operated on the assumption that the most widely circulated newspapers are also those that receive the most hits for our search terms in LexisNexis, but of course this may not be the case.) The region categorization is based on historic counties, which adds its own challenges, as many prominent UK cities overlap county territories. But in other cases, the county categories make a lot of sense, since many regional publications describe themselves as serving areas within certain county borders. Wales, Scotland, and Northern Ireland are not yet categorized by county; in our categorizations, they were considered “macro regions.” We have also grappled with the question of whether the UK group should include a sub-corpus from the Republic of Ireland (due to similarity of location and perhaps cultural orientation), or whether the Republic of Ireland should remain in the larger European group.

In terms of the representation of diversity, many of the newspapers collected so far appear to have a predominantly white-male viewpoint, which means that we are missing a strong body of sources that would speak to specific ethnic and gender communities. To achieve political diversity, we have relied so far upon data that shows which political parties a paper has formally endorsed. This metric, however, does not necessarily correspond to what political perceptions and biases adhere to a given publication (either from the writers or from the readers).

We have also struggled with how best to measure diversity in terms of socio-economic class. While some papers seem at first glance to fall easily into categories such as “high-brow” or “tabloid,” it is much more difficult and problematic to align these genres with a particular class of readership. In other words, we find it important to avoid the easy elision between “sensational” and “lower class” or between “intellectual” and “higher class.” In order to nuance these categories, we would need to find data about circulation and readership explicitly in terms of economic status, which we have yet to see. The general price of each paper could serve as a possible stand-in, but again with the risk of oversimplifying a complex issue of class, access, and taste-making.

Provide a tally breakdown of the various facets of sources in your area of focus that WE1S is considering as possible measures of overall corpus “representativeness” (for example, by source or media type, nationality, region, political orientation, identification with specific racial, ethnic, and gender audiences, etc.).

These numbers represent the sources that we have already collected. (The number that we have categorized, but have not yet collected, is higher.)

Overall publications in the UK: 62


Scotland: 10

Wales: 4

Northern Ireland: 5

England: 43

Geography by Region:

National: 17

Durham: 2

Derbyshire and Yorkshire: 3

Lancashire: 2

Lincoln: 1

Norfolk and Suffolk: 1

Warwick, Worcester, Stafford, Cheshire: 2

Oxford: 1

Nottingham: 2

Leicester: 1

Dorset and Devon: 2

Gloucester and Somerset: 2

Hampshire: 2

Essex: 1

London: 2

Kent: 1

Sussex: 1

Political Affiliation:


Conservative Party: 7

Labour Party: 4


Scottish National Party: 2

Labour Party: 1 
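A quick way to check that the geographic tallies above are internally consistent is sketched below. This is an illustrative check only, not part of the WE1S tooling: the counts are copied from the tally, the variable names are hypothetical, and the sketch assumes that the "Geography by Region" figures (including "National") break down the English publications.

```python
# Counts copied from the tally above; all names here are illustrative.
by_nation = {
    "Scotland": 10,
    "Wales": 4,
    "Northern Ireland": 5,
    "England": 43,
}

# Assumption: the regional breakdown covers the English publications only.
england_by_region = {
    "National": 17,
    "Durham": 2,
    "Derbyshire and Yorkshire": 3,
    "Lancashire": 2,
    "Lincoln": 1,
    "Norfolk and Suffolk": 1,
    "Warwick, Worcester, Stafford, Cheshire": 2,
    "Oxford": 1,
    "Nottingham": 2,
    "Leicester": 1,
    "Dorset and Devon": 2,
    "Gloucester and Somerset": 2,
    "Hampshire": 2,
    "Essex": 1,
    "London": 2,
    "Kent": 1,
    "Sussex": 1,
}

# Sum each breakdown so it can be compared with the reported totals.
uk_total = sum(by_nation.values())
england_total = sum(england_by_region.values())

print("UK total from nations:", uk_total)
print("England total from regions:", england_total)
print("Regions match England count:", england_total == by_nation["England"])
```

Under that assumption, the nation counts add up to the reported 62 publications and the regional counts add up to the 43 reported for England.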

What challenges or difficulties have you encountered in the source selection or collection process? Do you anticipate any challenges emerging from your work going forward?

The largest challenge for us has been making headway through the sheer number of sources available. Whereas many teams have struggled to find any accessible sources, the UK team suffers from the opposite problem: an embarrassment of riches. The categorization process of newspapers available from LexisNexis is ongoing and will require more research time in future.

We have also experienced a gap between the sources that LexisNexis claims to have and those that seem to be accessible to us. This gap has meant that some of the most widely circulated British papers (like The Sun and The London Times) have been unavailable. Our topic models therefore cannot be fully representative until these important high-impact sources are included.  

Another ongoing question is whether to treat Sunday versions of papers differently from weekday versions (for instance, whether to give them more weight in a topic model). Our current suggestion is that Sunday papers should simply be folded in with their weekday counterparts and receive equal treatment. It is important to note when collecting sources in LexisNexis, however, that Sunday versions will often be listed as separate sources that need to be queried individually. (For instance, The Daily Telegraph and The Sunday Telegraph are listed as two different papers with two different Source IDs.) Similarly, online versions of papers are listed separately from print versions, and an open question is whether their content differs enough to justify including both in the same topic model (as we are currently doing).
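The folding of Sunday editions into their weekday counterparts could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual collection workflow: the mapping, the "Telegraph" label, the record layout, and the sample articles are all hypothetical, and real LexisNexis Source IDs are deliberately not shown.

```python
from collections import defaultdict

# Hypothetical mapping from separately listed source names to one publication label.
EDITION_TO_PUBLICATION = {
    "The Daily Telegraph": "Telegraph",
    "The Sunday Telegraph": "Telegraph",
}


def fold_editions(articles, mapping=EDITION_TO_PUBLICATION):
    """Group article records by publication, folding mapped editions together.

    Sources that are not in the mapping keep their own name as the label.
    """
    grouped = defaultdict(list)
    for article in articles:
        publication = mapping.get(article["source"], article["source"])
        grouped[publication].append(article)
    return dict(grouped)


# Made-up records, for illustration only.
sample = [
    {"source": "The Daily Telegraph", "title": "Example weekday article"},
    {"source": "The Sunday Telegraph", "title": "Example Sunday article"},
    {"source": "Example Regional Paper", "title": "Example regional article"},
]

folded = fold_editions(sample)
for publication, items in folded.items():
    print(publication, len(items))
```

Each source would still need to be queried separately; the mapping only records which results are treated as one publication afterwards. Keeping that decision in a single table makes it easy to revisit, for example if online and print versions later need to be modeled separately.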

5. Research Scan

Conduct some preliminary research on the questions or challenges that you provided in sections three and four.

Have other scholars reflected on these issues? Are there publications that address these problems? Has research been conducted on how to overcome these challenges or at least acknowledge them productively?

The most immediately relevant research tool for us is the preliminary information collected by the WE1S team in 2015, particularly on the question of whether to collect on “the arts” or “humanities” (see section 6).

6. Additional Comments/Reflections

Include any other issues or questions that you have encountered that may not fit into any of the above categories.

A large question that remains to be decided fully is whether to use the search term of “humanities” or “the arts” when querying British sources. Almost universally, “the arts” gets more hits in publications than “humanities,” so searching for the former would automatically give us more data to work with. However, it is an ongoing question for our team as to whether “the arts” means the same thing in Britain as we would mean by the “humanities” in the US. The two terms can also exist ambiguously in tandem. For instance, CRASSH at Cambridge University stands for “Centre for Research in the Arts, Social Sciences and Humanities,” leading to the question of distinction between the first and last categories. A high priority for future research is to generate parallel topic models for “the arts” and “humanities” for target sets of publications in order to determine whether the two behave similarly or differently and which might be most productive for extended study. The tension between the two terms in British discourse might also prompt us to be more specific about what we mean by “humanities” in other geographical regions, even close to home. For instance, when we use the term “arts” in the US, do we generally mean the act of artistic creation, which creations would then be studied by the “humanities”? If so, do we want to exclude or include the former in our analysis?

For the purposes of the summer research camp, we have also minimized collections on “liberal arts.” Continued research is needed on this decision.

Other future directions of research include:

1) Extending our corpus historically to reflect the changing meanings, governance, and accessibility of the humanities in 19th- and 20th-century British sources. It is our hunch that Victorian sources in particular would provide a fascinating view of the process by which the humanities began to crystallize into academic disciplines and institutional departments, a process with a particularly important bearing on access to the humanities for different socio-economic classes. In general, it seems important to make sure that the WE1S project’s emphasis on diversity, currently focused on ethnicity and gender, also includes a focus on class. British sources, particularly historical ones, would be well poised to help us explore this fundamental question.

2) Collaborating with other teams to extend and frame our corpus. For instance, it would be interesting to compare UK topic models to those generated for Ireland by the Europe team, and we could also expand our corpus by including UK TV sources collected by the Broadcast team. We would also love to join forces with the Diversity Team in order to find more UK publications that would speak to diverse communities and social groups.