Bibliography – Text Classification

Selected DH research and resources bearing on, or utilized by, the WE1S project.

Kwak, Haewoon, Jisun An, and Yong-Yeol Ahn. “A Systematic Media Frame Analysis of 1.5 Million New York Times Articles from 2000 to 2017.” arXiv:2005.01803 [Cs], 2020. http://arxiv.org/abs/2005.01803.

Hvitfeldt, Emil, and Julia Silge. Supervised Machine Learning for Text Analysis in R. Emil Hvitfeldt and Julia Silge, 2020. https://smltar.com/.

Ford, Clay. “The Wilcoxon Rank Sum Test.” University of Virginia Library Research Data Services + Sciences, 2017. https://data.library.virginia.edu/the-wilcoxon-rank-sum-test/.

Lijffijt, Jefrey, Terttu Nevalainen, Tanja Säily, Panagiotis Papapetrou, Kai Puolamäki, and Heikki Mannila. “Significance Testing of Word Frequencies in Corpora.” Digital Scholarship in the Humanities 31, no. 2 (2016): 374–97. https://doi.org/10.1093/llc/fqu064.

Long, Hoyt, and Richard Jean So. “Literary Pattern Recognition: Modernism between Close Reading and Machine Learning.” Critical Inquiry 42, no. 2 (2016): 235–67. https://doi.org/10.1086/684353.

Freitas, Alex A. “Comprehensible Classification Models: A Position Paper.” ACM SIGKDD Explorations Newsletter 15, no. 1 (2014): 1–10. https://doi.org/10.1145/2594473.2594475.

Hand, David J. “Classifier Technology and the Illusion of Progress.” Statistical Science 21, no. 1 (2006): 1–14. https://doi.org/10.1214/088342306000000060.

Sebastiani, Fabrizio. “Machine Learning in Automated Text Categorization.” ACM Computing Surveys (CSUR) 34, no. 1 (2002): 1–47. https://doi.org/10.1145/505282.505283.