Bibliography – Text Analysis

Sorted by date (with abstracts)

Selected DH research and resources bearing on, or utilized by, the WE1S project.

Hvitfeldt, Emil, and Julia Silge. Supervised Machine Learning for Text Analysis in R. Emil Hvitfeldt and Julia Silge, 2020. https://smltar.com/.
Abstract: [First paragraph:] Modeling as a statistical practice can encompass a wide variety of activities. This book focuses on supervised or predictive modeling for text, using text data to make predictions about the world around us. We use the tidymodels framework for modeling, a consistent and flexible collection of R packages developed to encourage good statistical practice.

Rogers, Anna, Olga Kovaleva, and Anna Rumshisky. “A Primer in BERTology: What We Know about How BERT Works.” arXiv:2002.12327 [Cs], 2020. http://arxiv.org/abs/2002.12327.
Abstract: Transformer-based models are now widely used in NLP, but we still do not understand a lot about their inner workings. This paper describes what is known to date about the famous BERT model (Devlin et al. 2019), synthesizing over 40 analysis studies. We also provide an overview of the proposed modifications to the model and its training regime. We then outline the directions for further research.

Yang, Yiwei, Eser Kandogan, Yunyao Li, Prithviraj Sen, and Walter S. Lasecki. “A Study on Interaction in Human-in-the-Loop Machine Learning for Text Analytics.” In IUI Workshops 2019. Los Angeles: ACM, 2019. https://www.semanticscholar.org/paper/A-Study-on-Interaction-in-Human-in-the-Loop-Machine-Yang-Kandogan/03a4544caed21760df30f0e4f417bbe361c29c9e.
Abstract: Machine learning (ML) models are often considered “blackboxes” as their internal representations fail to align with human understanding. While recent work attempted to expose the inner workings of ML models they do not allow users to interact directly with the model. This is especially problematic in domains where labeled data is limited as such the generalizability of ML models becomes questionable. We argue that the fundamental problem of generalizibility could be addressed by making ML models explainable in abstractions and expressions that make sense to users and by allowing them to interact with the model to assess, select, and build on. By involving humans in the process this way, we argue that the cocreated models will be more generalizable as they extrapolate what ML learns from few data when expressed in higher level abstractions that humans can verify, update, and expand based on their domain expertise. In this paper, we introduce RulesLearner that expresses MLmodel as rules on top of semantic linguistic structures in disjunctive normal form. RulesLearner allows users to interact with the patterns learned by the ML model, e.g. add and remove predicates, examine precision and recall, and construct a trusted set of rules. We conducted a preliminary user study which suggests that (1) rules learned by ML are explainable and (2) co-created model is more generalizable (3) providing rules to experts improves overall productivity, with fewer people involved, with less expertise. Our findings link explainability and interactivity to generalizability, as such suggest that hybrid intelligence (human-AI) methods offer great potential.

“The Programming Historian.” Programming Historian, 2019. https://programminghistorian.org/.
Abstract: We publish novice-friendly, peer-reviewed tutorials that help humanists learn a wide range of digital tools, techniques, and workflows to facilitate research and teaching.

Ford, Clay. “The Wilcoxon Rank Sum Test.” University of Virginia Library Research Data Services + Sciences, 2017. https://data.library.virginia.edu/the-wilcoxon-rank-sum-test/.

Steinskog, Asbjørn, Jonas Therkelsen, and Björn Gambäck. “Twitter Topic Modeling by Tweet Aggregation.” In Proceedings of the 21st Nordic Conference of Computational Linguistics, 77–86. Gothenburg: Linköping University Electronic Press, 2017. https://www.semanticscholar.org/paper/Twitter-Topic-Modeling-by-Tweet-Aggregation-Steinskog-Therkelsen/89735b06ee5d7bcb469ddc619022bbc9f2443f02.
Abstract: Conventional topic modeling schemes, such as Latent Dirichlet Allocation, are known to perform inadequately when applied to tweets, due to the sparsity of short documents. To alleviate these disadvantages, we apply several pooling techniques, aggregating similar tweets into individual documents, and specifically study the aggregation of tweets sharing authors or hashtags. The results show that aggregating similar tweets into individual documents significantly increases topic coherence.

Algee-Hewitt, Mark, Sarah Allison, Marissa Gemma, Ryan Heuser, Franco Moretti, and Hannah Walser. Canon/Archive: Large-Scale Dynamics in the Literary Field. Vol. 11. Stanford Literary Lab Pamphlets. Stanford, CA: Stanford Literary Lab, 2016. https://litlab.stanford.edu/LiteraryLabPamphlet11.pdf.
Abstract: [First paragraph]: Of the novelties introduced by digitization in the study of literature, the size of the archive is probably the most dramatic: we used to work on a couple of hundred nineteenth-century novels, and now we can analyze thousands of them, tens of thousands, tomorrow hundreds of thousands. It’s a moment of euphoria, for quantitative literary history: like having a telescope that makes you see entirely new galaxies. And it’s a moment of truth: so, have the digital skies revealed anything that changes our knowledge of literature?

Lijffijt, Jefrey, Terttu Nevalainen, Tanja Säily, Panagiotis Papapetrou, Kai Puolamäki, and Heikki Mannila. “Significance Testing of Word Frequencies in Corpora.” Literary and Linguistic Computing 31, no. 2 (2016): 374–97. https://doi.org/10.1093/llc/fqu064.
Abstract: Finding out whether a word occurs significantly more often in one text or corpus than in another is an important question in analysing corpora. As noted by Kilgarriff (Language is never, ever, ever, random, Corpus Linguistics and Linguistic Theory, 2005; 1(2): 263–76.), the use of the χ² and log-likelihood ratio tests is problematic in this context, as they are based on the assumption that all samples are statistically independent of each other. However, words within a text are not independent. As pointed out in Kilgarriff (Comparing corpora, International Journal of Corpus Linguistics, 2001; 6(1): 1–37) and Paquot and Bestgen (Distinctive words in academic writing: a comparison of three statistical tests for keyword extraction. In Jucker, A., Schreier, D., and Hundt, M. (eds), Corpora: Pragmatics and Discourse. Amsterdam: Rodopi, 2009, pp. 247–69), it is possible to represent the data differently and employ other tests, such that we assume independence at the level of texts rather than individual words. This allows us to account for the distribution of words within a corpus. In this article we compare the significance estimates of various statistical tests in a controlled resampling experiment and in a practical setting, studying differences between texts produced by male and female fiction writers in the British National Corpus. We find that the choice of the test, and hence data representation, matters. We conclude that significance testing can be used to find consequential differences between corpora, but that assuming independence between all words may lead to overestimating the significance of the observed differences, especially for poorly dispersed words. We recommend the use of the t-test, Wilcoxon rank-sum test, or bootstrap test for comparing word frequencies across corpora.

Long, Hoyt, and Richard Jean So. “Literary Pattern Recognition: Modernism between Close Reading and Machine Learning.” Critical Inquiry 42, no. 2 (2016): 235–67. https://doi.org/10.1086/684353.
Abstract: [First paragraph:] The title of this essay announces its core ambition: to propose a model of reading literary texts that synthesizes familiar humanistic approaches with computational ones. In recent years, debates over the use of computers to interpret literature have been fierce. On one side, scholars such as Franco Moretti, Matthew Jockers, Matthew Wilkens, and Andrew Piper defend the deployment of sophisticated machine techniques, like topic modeling and network analysis, to expose macroscale patterns of language and form culled from massive digitized literary corpora. On the other side, scholars such as Alexander Galloway, David Golumbia, Tara McPherson, and Alan Liu, who work in the field of New Media Studies, have criticized machine techniques for reducing the complexity of literary texts to mere “data” or for being incommensurable with the goals of critical theory. Here we move beyond this impasse by modeling a form of literary analysis that, rather than leveraging one mode of reading against another, s
ynthesizes%20humanistic%20and%20computational%20approaches%20into%20what%20we%20call%20literary%20pattern%20recognition.%22%2C%22date%22%3A%222016%22%2C%22section%22%3A%22%22%2C%22partNumber%22%3A%22%22%2C%22partTitle%22%3A%22%22%2C%22DOI%22%3A%2210.1086%5C%2F684353%22%2C%22citationKey%22%3A%22long2016%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Fwww.journals.uchicago.edu%5C%2Fdoi%5C%2F10.1086%5C%2F684353%22%2C%22PMID%22%3A%22%22%2C%22PMCID%22%3A%22%22%2C%22ISSN%22%3A%220093-1896%2C%201539-7858%22%2C%22language%22%3A%22en%22%2C%22collections%22%3A%5B%5D%2C%22dateModified%22%3A%222026-02-16T21%3A25%3A40Z%22%2C%22tags%22%3A%5B%7B%22tag%22%3A%22DH%20Distant%20reading%22%7D%2C%7B%22tag%22%3A%22Text%20Analysis%22%7D%2C%7B%22tag%22%3A%22Text%20classification%22%7D%5D%7D%7D%2C%7B%22key%22%3A%22CHLQDCP3%22%2C%22library%22%3A%7B%22id%22%3A2133649%7D%2C%22meta%22%3A%7B%22lastModifiedByUser%22%3A%7B%22id%22%3A4770172%2C%22username%22%3A%22lcthomas%22%2C%22name%22%3A%22Lindsay%20Thomas%22%2C%22links%22%3A%7B%22alternate%22%3A%7B%22href%22%3A%22https%3A%5C%2F%5C%2Fwww.zotero.org%5C%2Flcthomas%22%2C%22type%22%3A%22text%5C%2Fhtml%22%7D%7D%7D%2C%22creatorSummary%22%3A%22Danesh%20et%20al.%22%2C%22parsedDate%22%3A%222015-06%22%2C%22numChildren%22%3A1%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%201.35%3B%20padding-left%3A%201em%3B%20text-indent%3A-1em%3B%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%26gt%3BDanesh%2C%20Soheil%2C%20Tamara%20Sumner%2C%20and%20James%20H.%20Martin.%20%26%23x201C%3BSGRank%3A%20Combining%20Statistical%20and%20Graphical%20Methods%20to%20Improve%20the%20State%20of%20the%20Art%20in%20Unsupervised%20Keyphrase%20Extraction.%26%23x201D%3B%20In%20%26lt%3Bi%26gt%3BProceedings%20of%20the%20Fourth%20Joint%20Conference%20on%20Lexical%20and%20Computational%20Semantics%26lt%3B%5C%2Fi%26gt%3B%2C%20117%26%23x2013%3B26.%20Denver%2C%20Colorado%3A%20Association%20for%20Computational%
20Linguistics%2C%202015.%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.18653%5C%2Fv1%5C%2FS15-1013%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.18653%5C%2Fv1%5C%2FS15-1013%26lt%3B%5C%2Fa%26gt%3B.%20%26lt%3Ba%20title%3D%26%23039%3BCite%20in%20RIS%20Format%26%23039%3B%20class%3D%26%23039%3Bzp-CiteRIS%26%23039%3B%20data-zp-cite%3D%26%23039%3Bapi_user_id%3D2133649%26amp%3Bitem_key%3DCHLQDCP3%26%23039%3B%20href%3D%26%23039%3Bjavascript%3Avoid%280%29%3B%26%23039%3B%26gt%3BCite%26lt%3B%5C%2Fa%26gt%3B%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22conferencePaper%22%2C%22title%22%3A%22SGRank%3A%20Combining%20Statistical%20and%20Graphical%20Methods%20to%20Improve%20the%20State%20of%20the%20Art%20in%20Unsupervised%20Keyphrase%20Extraction%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Soheil%22%2C%22lastName%22%3A%22Danesh%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Tamara%22%2C%22lastName%22%3A%22Sumner%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22James%20H.%22%2C%22lastName%22%3A%22Martin%22%7D%5D%2C%22abstractNote%22%3A%22Keyphrase%20extraction%20is%20a%20fundamental%20technique%20in%20natural%20language%20processing.%20It%20enables%20documents%20to%20be%20mapped%20to%20a%20concise%20set%20of%20phrases%20that%20can%20be%20used%20for%20indexing%2C%20clustering%2C%20ontology%20building%2C%20auto-tagging%20and%20other%20information%20organization%20schemes.%20Two%20major%20families%20of%20unsupervised%20keyphrase%20extraction%20algorithms%20may%20be%20characterized%20as%20statistical%20and%20graph-based.%20We%20present%20a%20hybrid%20statistical-graphical%20algorithm%20that%20capitalizes%20on%20the%20heuristics%20of%20both%20families%20of%20algorithms%20and%20is%20able%20to%20outperform%20the%20state%20of%20the%20art%20in%20unsupervised%20keyphrase%20extraction%20on%20s
everal%20datasets.%22%2C%22proceedingsTitle%22%3A%22Proceedings%20of%20the%20Fourth%20Joint%20Conference%20on%20Lexical%20and%20Computational%20Semantics%22%2C%22conferenceName%22%3A%22%2ASEM-SemEval%202015%22%2C%22date%22%3A%222015-06%22%2C%22eventPlace%22%3A%22%22%2C%22DOI%22%3A%2210.18653%5C%2Fv1%5C%2FS15-1013%22%2C%22ISBN%22%3A%22%22%2C%22citationKey%22%3A%22danesh2015%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Fwww.aclweb.org%5C%2Fanthology%5C%2FS15-1013%22%2C%22ISSN%22%3A%22%22%2C%22language%22%3A%22en%22%2C%22collections%22%3A%5B%5D%2C%22dateModified%22%3A%222026-02-16T21%3A25%3A40Z%22%2C%22tags%22%3A%5B%7B%22tag%22%3A%22Text%20Analysis%22%7D%5D%7D%7D%2C%7B%22key%22%3A%22MFJ7D494%22%2C%22library%22%3A%7B%22id%22%3A2133649%7D%2C%22meta%22%3A%7B%22lastModifiedByUser%22%3A%7B%22id%22%3A4770172%2C%22username%22%3A%22lcthomas%22%2C%22name%22%3A%22Lindsay%20Thomas%22%2C%22links%22%3A%7B%22alternate%22%3A%7B%22href%22%3A%22https%3A%5C%2F%5C%2Fwww.zotero.org%5C%2Flcthomas%22%2C%22type%22%3A%22text%5C%2Fhtml%22%7D%7D%7D%2C%22creatorSummary%22%3A%22Algee-Hewitt%20and%20McGurl%22%2C%22parsedDate%22%3A%222015%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%201.35%3B%20padding-left%3A%201em%3B%20text-indent%3A-1em%3B%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%26gt%3BAlgee-Hewitt%2C%20Mark%2C%20and%20Mark%20McGurl.%20%26lt%3Bi%26gt%3BBetween%20Canon%20and%20Corpus%3A%20Six%20Perspectives%20on%2020th-Century%20Novels%26lt%3B%5C%2Fi%26gt%3B.%20Vol.%208.%20Stanford%20Literary%20Lab%20Pamphlets.%20Stanford%2C%20CA%3A%20Stanford%20Literary%20Lab%2C%202015.%20%26lt%3Ba%20class%3D%26%23039%3Bzp-ItemURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Flitlab.stanford.edu%5C%2FLiteraryLabPamphlet8.pdf%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Flitlab.stanford.edu%5C%2FLiteraryLabPamphlet8.pdf%26lt%3B%5C%2Fa%26gt%3B.%20%26lt%3Ba%20title%3D%26%23039%3BCite%20in%20R
IS%20Format%26%23039%3B%20class%3D%26%23039%3Bzp-CiteRIS%26%23039%3B%20data-zp-cite%3D%26%23039%3Bapi_user_id%3D2133649%26amp%3Bitem_key%3DMFJ7D494%26%23039%3B%20href%3D%26%23039%3Bjavascript%3Avoid%280%29%3B%26%23039%3B%26gt%3BCite%26lt%3B%5C%2Fa%26gt%3B%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22book%22%2C%22title%22%3A%22Between%20Canon%20and%20Corpus%3A%20Six%20Perspectives%20on%2020th-Century%20Novels%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Mark%22%2C%22lastName%22%3A%22Algee-Hewitt%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Mark%22%2C%22lastName%22%3A%22McGurl%22%7D%5D%2C%22abstractNote%22%3A%22%5BBeginning%20of%20pamphet%3A%5D%20Of%20the%20many%2C%20many%20thousands%20of%20novels%20and%20stories%20published%20in%20English%20in%20the%2020th%20century%2C%20which%20group%20of%20several%20hundred%20would%20represent%20the%20most%20reasonable%2C%20interesting%2C%20and%20useful%20subset%20of%20the%20whole%3F%5Cn%5CnThis%20was%20the%20difficult%20question%20posed%20to%20researchers%20in%20the%20Stanford%20Literary%20Lab%20when%20they%20decided%20to%20move%20ahead%20with%20plans%20to%20create%20a%20fully%20digitized%20corpus%20of%2020th-century%20fiction.%22%2C%22date%22%3A%222015%22%2C%22originalDate%22%3A%22%22%2C%22originalPublisher%22%3A%22%22%2C%22originalPlace%22%3A%22%22%2C%22format%22%3A%22%22%2C%22ISBN%22%3A%22%22%2C%22DOI%22%3A%22%22%2C%22citationKey%22%3A%22algee-hewitt2015%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Flitlab.stanford.edu%5C%2FLiteraryLabPamphlet8.pdf%22%2C%22ISSN%22%3A%22%22%2C%22language%22%3A%22en%22%2C%22collections%22%3A%5B%5D%2C%22dateModified%22%3A%222026-02-16T21%3A25%3A40Z%22%2C%22tags%22%3A%5B%7B%22tag%22%3A%22Canons%20as%20paradigm%22%7D%2C%7B%22tag%22%3A%22Corpus%20representativeness%22%7D%2C%7B%22tag%22%3A%22DH%20Digital%20humanities%22%7D%2C%7B%22tag%22%3A%22DH%20Distant%20reading%22%7D%2C%7B%22tag%22%3A%22Huma
nities%22%7D%2C%7B%22tag%22%3A%22Text%20Analysis%22%7D%5D%7D%7D%2C%7B%22key%22%3A%227VRN7NE5%22%2C%22library%22%3A%7B%22id%22%3A2133649%7D%2C%22meta%22%3A%7B%22lastModifiedByUser%22%3A%7B%22id%22%3A4770172%2C%22username%22%3A%22lcthomas%22%2C%22name%22%3A%22Lindsay%20Thomas%22%2C%22links%22%3A%7B%22alternate%22%3A%7B%22href%22%3A%22https%3A%5C%2F%5C%2Fwww.zotero.org%5C%2Flcthomas%22%2C%22type%22%3A%22text%5C%2Fhtml%22%7D%7D%7D%2C%22creatorSummary%22%3A%22Underwood%22%2C%22parsedDate%22%3A%222015%22%2C%22numChildren%22%3A1%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%201.35%3B%20padding-left%3A%201em%3B%20text-indent%3A-1em%3B%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%26gt%3BUnderwood%2C%20Ted.%20%26%23x201C%3BSeven%20Ways%20Humanists%20Are%20Using%20Computers%20to%20Understand%20Text.%26%23x201D%3B%20%26lt%3Bi%26gt%3BThe%20Stone%20and%20the%20Shell%26lt%3B%5C%2Fi%26gt%3B%20%28blog%29%2C%202015.%20%26lt%3Ba%20class%3D%26%23039%3Bzp-ItemURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Ftedunderwood.com%5C%2F2015%5C%2F06%5C%2F04%5C%2Fseven-ways-humanists-are-using-computers-to-understand-text%5C%2F%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Ftedunderwood.com%5C%2F2015%5C%2F06%5C%2F04%5C%2Fseven-ways-humanists-are-using-computers-to-understand-text%5C%2F%26lt%3B%5C%2Fa%26gt%3B.%20%26lt%3Ba%20title%3D%26%23039%3BCite%20in%20RIS%20Format%26%23039%3B%20class%3D%26%23039%3Bzp-CiteRIS%26%23039%3B%20data-zp-cite%3D%26%23039%3Bapi_user_id%3D2133649%26amp%3Bitem_key%3D7VRN7NE5%26%23039%3B%20href%3D%26%23039%3Bjavascript%3Avoid%280%29%3B%26%23039%3B%26gt%3BCite%26lt%3B%5C%2Fa%26gt%3B%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22blogPost%22%2C%22title%22%3A%22Seven%20ways%20humanists%20are%20using%20computers%20to%20understand%20text.%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Ted%22%2
C%22lastName%22%3A%22Underwood%22%7D%5D%2C%22abstractNote%22%3A%22%5BThis%20is%20an%20updated%20version%20of%20a%20blog%20post%20I%20wrote%20three%20years%20ago%2C%20which%20organized%20introductory%20resources%20for%20a%20workshop.%20Getting%20ready%20for%20another%20workshop%20this%20summer%2C%20I%20glanced%20back%20at%20the%20old%20%5Cu2026%22%2C%22blogTitle%22%3A%22The%20Stone%20and%20the%20Shell%22%2C%22date%22%3A%222015%22%2C%22DOI%22%3A%22%22%2C%22citationKey%22%3A%22underwood2015%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Ftedunderwood.com%5C%2F2015%5C%2F06%5C%2F04%5C%2Fseven-ways-humanists-are-using-computers-to-understand-text%5C%2F%22%2C%22ISSN%22%3A%22%22%2C%22language%22%3A%22en%22%2C%22collections%22%3A%5B%5D%2C%22dateModified%22%3A%222026-02-16T21%3A25%3A40Z%22%2C%22tags%22%3A%5B%7B%22tag%22%3A%22DH%20Digital%20humanities%22%7D%2C%7B%22tag%22%3A%22Text%20Analysis%22%7D%5D%7D%7D%2C%7B%22key%22%3A%22J4ZFPMW6%22%2C%22library%22%3A%7B%22id%22%3A2133649%7D%2C%22meta%22%3A%7B%22lastModifiedByUser%22%3A%7B%22id%22%3A4770172%2C%22username%22%3A%22lcthomas%22%2C%22name%22%3A%22Lindsay%20Thomas%22%2C%22links%22%3A%7B%22alternate%22%3A%7B%22href%22%3A%22https%3A%5C%2F%5C%2Fwww.zotero.org%5C%2Flcthomas%22%2C%22type%22%3A%22text%5C%2Fhtml%22%7D%7D%7D%2C%22creatorSummary%22%3A%22De%20Bolla%22%2C%22parsedDate%22%3A%222013%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%201.35%3B%20padding-left%3A%201em%3B%20text-indent%3A-1em%3B%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%26gt%3BDe%20Bolla%2C%20Peter.%20%26lt%3Bi%26gt%3BThe%20Architecture%20of%20Concepts%3A%20The%20Historical%20Formation%20of%20Human%20Rights%26lt%3B%5C%2Fi%26gt%3B.%20New%20York%3A%20Fordham%20University%20Press%2C%202013.%20%26lt%3Ba%20title%3D%26%23039%3BCite%20in%20RIS%20Format%26%23039%3B%20class%3D%26%23039%3Bzp-CiteRIS%26%23039%3B%20data-zp-cite%3D%26%23039%3Bapi_user_id%3D21
33649%26amp%3Bitem_key%3DJ4ZFPMW6%26%23039%3B%20href%3D%26%23039%3Bjavascript%3Avoid%280%29%3B%26%23039%3B%26gt%3BCite%26lt%3B%5C%2Fa%26gt%3B%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22book%22%2C%22title%22%3A%22The%20architecture%20of%20concepts%3A%20the%20historical%20formation%20of%20human%20rights%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Peter%22%2C%22lastName%22%3A%22De%20Bolla%22%7D%5D%2C%22abstractNote%22%3A%22The%20Architecture%20of%20Concepts%20proposes%20a%20radically%20new%20way%20of%20understanding%20the%20history%20of%20ideas.%20Taking%20as%20its%20example%20human%20rights%2C%20it%20develops%20a%20distinctive%20kind%20of%20conceptual%20analysis%20that%20enables%20us%20to%20see%20with%20precision%20how%20the%20concept%20of%20human%20rights%20was%20formed%20in%20the%20eighteenth%20century.%5Cn%5CnThe%20first%20chapter%20outlines%20an%20innovative%20account%20of%20concepts%20as%20cultural%20entities.%20The%20second%20develops%20an%20original%20methodology%20for%20recovering%20the%20historical%20formation%20of%20the%20concept%20of%20human%20rights%20based%20on%20data%20extracted%20from%20digital%20archives.%20This%20enables%20us%20to%20track%20the%20construction%20of%20conceptual%20architectures%20over%20time.%5Cn%5CnHaving%20established%20the%20architecture%20of%20the%20concept%20of%20human%20rights%2C%20the%20book%20then%20examines%20two%20key%20moments%20in%20its%20historical%20formation%3A%20the%20First%20Continental%20Congress%20in%201775%20and%20the%20publication%20of%20Tom%20Paine%5Cu2019s%20Rights%20of%20Man%20in%201792.%20Arguing%20that%20we%20have%20yet%20to%20fully%20understand%20or%20appreciate%20the%20consequences%20of%20the%20eighteenth-century%20invention%20of%20the%20concept%20%5Cu201crights%20of%20man%2C%5Cu201d%20the%20final%20chapter%20addresses%20our%20problematic%20contemporary%20attempts%20to%20leverage%20human%20rights%20as%20the%20most%2
0efficacious%20way%20of%20achieving%20universal%20equality.%22%2C%22date%22%3A%222013%22%2C%22originalDate%22%3A%22%22%2C%22originalPublisher%22%3A%22%22%2C%22originalPlace%22%3A%22%22%2C%22format%22%3A%22%22%2C%22ISBN%22%3A%22978-0-8232-5438-5%20978-0-8232-5439-2%22%2C%22DOI%22%3A%22%22%2C%22citationKey%22%3A%22debolla2013%22%2C%22url%22%3A%22%22%2C%22ISSN%22%3A%22%22%2C%22language%22%3A%22en%22%2C%22collections%22%3A%5B%5D%2C%22dateModified%22%3A%222026-02-16T21%3A25%3A40Z%22%2C%22tags%22%3A%5B%7B%22tag%22%3A%22DH%20Cultural%20analytics%22%7D%2C%7B%22tag%22%3A%22DH%20Digital%20humanities%22%7D%2C%7B%22tag%22%3A%22DH%20Distant%20reading%22%7D%2C%7B%22tag%22%3A%22Text%20Analysis%22%7D%5D%7D%7D%2C%7B%22key%22%3A%22Z8FM9G3X%22%2C%22library%22%3A%7B%22id%22%3A2133649%7D%2C%22meta%22%3A%7B%22lastModifiedByUser%22%3A%7B%22id%22%3A4770172%2C%22username%22%3A%22lcthomas%22%2C%22name%22%3A%22Lindsay%20Thomas%22%2C%22links%22%3A%7B%22alternate%22%3A%7B%22href%22%3A%22https%3A%5C%2F%5C%2Fwww.zotero.org%5C%2Flcthomas%22%2C%22type%22%3A%22text%5C%2Fhtml%22%7D%7D%7D%2C%22creatorSummary%22%3A%22Baker%20et%20al.%22%2C%22parsedDate%22%3A%222008%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%201.35%3B%20padding-left%3A%201em%3B%20text-indent%3A-1em%3B%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%26gt%3BBaker%2C%20Paul%2C%20Costas%20Gabrielatos%2C%20Majid%20KhosraviNik%2C%20Micha%26%23x142%3B%20Krzy%26%23x17C%3Banowski%2C%20Tony%20McEnery%2C%20and%20Ruth%20Wodak.%20%26%23x201C%3BA%20Useful%20Methodological%20Synergy%3F%20Combining%20Critical%20Discourse%20Analysis%20and%20Corpus%20Linguistics%20to%20Examine%20Discourses%20of%20Refugees%20and%20Asylum%20Seekers%20in%20the%20UK%20Press.%26%23x201D%3B%20%26lt%3Bi%26gt%3BDiscourse%20%26amp%3B%20Society%26lt%3B%5C%2Fi%26gt%3B%2019%2C%20no.%203%20%282008%29%3A%20273%26%23x2013%3B306.%20%26lt%3Ba%20class%3D%26%23039%3B
zp-ItemURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.1177%5C%2F0957926508088962%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.1177%5C%2F0957926508088962%26lt%3B%5C%2Fa%26gt%3B.%20%26lt%3Ba%20title%3D%26%23039%3BCite%20in%20RIS%20Format%26%23039%3B%20class%3D%26%23039%3Bzp-CiteRIS%26%23039%3B%20data-zp-cite%3D%26%23039%3Bapi_user_id%3D2133649%26amp%3Bitem_key%3DZ8FM9G3X%26%23039%3B%20href%3D%26%23039%3Bjavascript%3Avoid%280%29%3B%26%23039%3B%26gt%3BCite%26lt%3B%5C%2Fa%26gt%3B%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22journalArticle%22%2C%22title%22%3A%22A%20useful%20methodological%20synergy%3F%20Combining%20critical%20discourse%20analysis%20and%20corpus%20linguistics%20to%20examine%20discourses%20of%20refugees%20and%20asylum%20seekers%20in%20the%20UK%20press%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Paul%22%2C%22lastName%22%3A%22Baker%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Costas%22%2C%22lastName%22%3A%22Gabrielatos%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Majid%22%2C%22lastName%22%3A%22KhosraviNik%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Micha%5Cu0142%22%2C%22lastName%22%3A%22Krzy%5Cu017canowski%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Tony%22%2C%22lastName%22%3A%22McEnery%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Ruth%22%2C%22lastName%22%3A%22Wodak%22%7D%5D%2C%22abstractNote%22%3A%22This%20article%20discusses%20the%20extent%20to%20which%20methods%20normally%20associated%20with%20corpus%20linguistics%20can%20be%20effectively%20used%20by%20critical%20discourse%20analysts.%20Our%20research%20is%20based%20on%20the%20analysis%20of%20a%20140-million-word%20corpus%20of%20British%20news%20articles%20about%20refugees%2C%20asylum%20seekers%2C%20immigrants%20and%20migrants%20%28collectively%20RASIM%29.%20We%20discus
s%20how%20processes%20such%20as%20collocation%20and%20concordance%20analysis%20were%20able%20to%20identify%20common%20categories%20of%20representation%20of%20RASIM%20as%20well%20as%20directing%20analysts%20to%20representative%20texts%20in%20order%20to%20carry%20out%20qualitative%20analysis.%20The%20article%20suggests%20a%20framework%20for%20adopting%20corpus%20approaches%20in%20critical%20discourse%20analysis.%22%2C%22date%22%3A%222008%22%2C%22section%22%3A%22%22%2C%22partNumber%22%3A%22%22%2C%22partTitle%22%3A%22%22%2C%22DOI%22%3A%2210.1177%5C%2F0957926508088962%22%2C%22citationKey%22%3A%22baker2008%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Fdoi.org%5C%2F10.1177%5C%2F0957926508088962%22%2C%22PMID%22%3A%22%22%2C%22PMCID%22%3A%22%22%2C%22ISSN%22%3A%220957-9265%22%2C%22language%22%3A%22en%22%2C%22collections%22%3A%5B%5D%2C%22dateModified%22%3A%222026-02-16T21%3A25%3A40Z%22%2C%22tags%22%3A%5B%7B%22tag%22%3A%22Corpus%20linguistics%20as%20paradigm%22%7D%2C%7B%22tag%22%3A%22Text%20Analysis%22%7D%5D%7D%7D%2C%7B%22key%22%3A%22IC8UJ7SR%22%2C%22library%22%3A%7B%22id%22%3A2133649%7D%2C%22meta%22%3A%7B%22lastModifiedByUser%22%3A%7B%22id%22%3A4770172%2C%22username%22%3A%22lcthomas%22%2C%22name%22%3A%22Lindsay%20Thomas%22%2C%22links%22%3A%7B%22alternate%22%3A%7B%22href%22%3A%22https%3A%5C%2F%5C%2Fwww.zotero.org%5C%2Flcthomas%22%2C%22type%22%3A%22text%5C%2Fhtml%22%7D%7D%7D%2C%22creatorSummary%22%3A%22Sebastiani%22%2C%22parsedDate%22%3A%222002%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%201.35%3B%20padding-left%3A%201em%3B%20text-indent%3A-1em%3B%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%26gt%3BSebastiani%2C%20Fabrizio.%20%26%23x201C%3BMachine%20Learning%20in%20Automated%20Text%20Categorization.%26%23x201D%3B%20%26lt%3Bi%26gt%3BACM%20Computing%20Surveys%20%28CSUR%29%26lt%3B%5C%2Fi%26gt%3B%2034%2C%20no.%201%20%282002%29%3A%201%26%23x2013%3B47.%20%26lt%3Ba%20
class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.1145%5C%2F505282.505283%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.1145%5C%2F505282.505283%26lt%3B%5C%2Fa%26gt%3B.%20%26lt%3Ba%20title%3D%26%23039%3BCite%20in%20RIS%20Format%26%23039%3B%20class%3D%26%23039%3Bzp-CiteRIS%26%23039%3B%20data-zp-cite%3D%26%23039%3Bapi_user_id%3D2133649%26amp%3Bitem_key%3DIC8UJ7SR%26%23039%3B%20href%3D%26%23039%3Bjavascript%3Avoid%280%29%3B%26%23039%3B%26gt%3BCite%26lt%3B%5C%2Fa%26gt%3B%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22journalArticle%22%2C%22title%22%3A%22Machine%20learning%20in%20automated%20text%20categorization%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Fabrizio%22%2C%22lastName%22%3A%22Sebastiani%22%7D%5D%2C%22abstractNote%22%3A%22The%20automated%20categorization%20%28or%20classification%29%20of%20texts%20into%20predefined%20categories%20has%20witnessed%20a%20booming%20interest%20in%20the%20last%2010%20years%2C%20due%20to%20the%20increased%20availability%20of%20documents%20in%20digital%20form%20and%20the%20ensuing%20need%20to%20organize%20them.%20In%20the%20research%20community%20the%20dominant%20approach%20to%20this%20problem%20is%20based%20on%20machine%20learning%20techniques%3A%20a%20general%20inductive%20process%20automatically%20builds%20a%20classifier%20by%20learning%2C%20from%20a%20set%20of%20preclassified%20documents%2C%20the%20characteristics%20of%20the%20categories.%20The%20advantages%20of%20this%20approach%20over%20the%20knowledge%20engineering%20approach%20%28consisting%20in%20the%20manual%20definition%20of%20a%20classifier%20by%20domain%20experts%29%20are%20a%20very%20good%20effectiveness%2C%20considerable%20savings%20in%20terms%20of%20expert%20labor%20power%2C%20and%20straightforward%20portability%20to%20different%20domains.%20This%20survey%20discusses%20the%20main%20approaches%20to%20text%20categorization%2
0that%20fall%20within%20the%20machine%20learning%20paradigm.%20We%20will%20discuss%20in%20detail%20issues%20pertaining%20to%20three%20different%20problems%2C%20namely%2C%20document%20representation%2C%20classifier%20construction%2C%20and%20classifier%20evaluation.%22%2C%22date%22%3A%222002%22%2C%22section%22%3A%22%22%2C%22partNumber%22%3A%22%22%2C%22partTitle%22%3A%22%22%2C%22DOI%22%3A%2210.1145%5C%2F505282.505283%22%2C%22citationKey%22%3A%22sebastiani2002%22%2C%22url%22%3A%22http%3A%5C%2F%5C%2Fdl.acm.org%5C%2Fdoi%5C%2F10.1145%5C%2F505282.505283%22%2C%22PMID%22%3A%22%22%2C%22PMCID%22%3A%22%22%2C%22ISSN%22%3A%220360-0300%2C%201557-7341%22%2C%22language%22%3A%22en%22%2C%22collections%22%3A%5B%5D%2C%22dateModified%22%3A%222026-02-16T21%3A25%3A40Z%22%2C%22tags%22%3A%5B%7B%22tag%22%3A%22Data%20science%22%7D%2C%7B%22tag%22%3A%22Machine%20learning%22%7D%2C%7B%22tag%22%3A%22Text%20Analysis%22%7D%2C%7B%22tag%22%3A%22Text%20classification%22%7D%5D%7D%7D%5D%7D

Hvitfeldt, Emil, and Julia Silge. Supervised Machine Learning for Text Analysis in R. Emil Hvitfeldt and Julia Silge, 2020. https://smltar.com/.

Rogers, Anna, Olga Kovaleva, and Anna Rumshisky. “A Primer in BERTology: What We Know about How BERT Works.” arXiv:2002.12327 [Cs], 2020. http://arxiv.org/abs/2002.12327.

Yang, Yiwei, Eser Kandogan, Yunyao Li, Prithviraj Sen, and Walter S. Lasecki. “A Study on Interaction in Human-in-the-Loop Machine Learning for Text Analytics.” In IUI Workshops 2019. Los Angeles: ACM, 2019. https://www.semanticscholar.org/paper/A-Study-on-Interaction-in-Human-in-the-Loop-Machine-Yang-Kandogan/03a4544caed21760df30f0e4f417bbe361c29c9e.

“The Programming Historian.” Programming Historian, 2019. https://programminghistorian.org/.

Ford, Clay. “The Wilcoxon Rank Sum Test.” University of Virginia Library Research Data Services + Sciences, 2017. https://data.library.virginia.edu/the-wilcoxon-rank-sum-test/.

Steinskog, Asbjørn, Jonas Therkelsen, and Björn Gambäck. “Twitter Topic Modeling by Tweet Aggregation.” In Proceedings of the 21st Nordic Conference of Computational Linguistics, 77–86. Gothenburg: Linköping University Electronic Press, 2017. https://www.semanticscholar.org/paper/Twitter-Topic-Modeling-by-Tweet-Aggregation-Steinskog-Therkelsen/89735b06ee5d7bcb469ddc619022bbc9f2443f02.

Algee-Hewitt, Mark, Sarah Allison, Marissa Gemma, Ryan Heuser, Franco Moretti, and Hannah Walser. Canon/Archive: Large-Scale Dynamics in the Literary Field. Vol. 11. Stanford Literary Lab Pamphlets. Stanford, CA: Stanford Literary Lab, 2016. https://litlab.stanford.edu/LiteraryLabPamphlet11.pdf.

Lijffijt, Jefrey, Terttu Nevalainen, Tanja Säily, Panagiotis Papapetrou, Kai Puolamäki, and Heikki Mannila. “Significance Testing of Word Frequencies in Corpora.” Digital Scholarship in the Humanities 31, no. 2 (2016): 374–97. https://doi.org/10.1093/llc/fqu064.

Long, Hoyt, and Richard Jean So. “Literary Pattern Recognition: Modernism between Close Reading and Machine Learning.” Critical Inquiry 42, no. 2 (2016): 235–67. https://doi.org/10.1086/684353.

Danesh, Soheil, Tamara Sumner, and James H. Martin. “SGRank: Combining Statistical and Graphical Methods to Improve the State of the Art in Unsupervised Keyphrase Extraction.” In Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics, 117–26. Denver, Colorado: Association for Computational Linguistics, 2015. https://doi.org/10.18653/v1/S15-1013.

Algee-Hewitt, Mark, and Mark McGurl. Between Canon and Corpus: Six Perspectives on 20th-Century Novels. Vol. 8. Stanford Literary Lab Pamphlets. Stanford, CA: Stanford Literary Lab, 2015. https://litlab.stanford.edu/LiteraryLabPamphlet8.pdf.

Underwood, Ted. “Seven Ways Humanists Are Using Computers to Understand Text.” The Stone and the Shell (blog), 2015. https://tedunderwood.com/2015/06/04/seven-ways-humanists-are-using-computers-to-understand-text/.

De Bolla, Peter. The Architecture of Concepts: The Historical Formation of Human Rights. New York: Fordham University Press, 2013.

Baker, Paul, Costas Gabrielatos, Majid KhosraviNik, Michał Krzyżanowski, Tony McEnery, and Ruth Wodak. “A Useful Methodological Synergy? Combining Critical Discourse Analysis and Corpus Linguistics to Examine Discourses of Refugees and Asylum Seekers in the UK Press.” Discourse & Society 19, no. 3 (2008): 273–306. https://doi.org/10.1177/0957926508088962.

Sebastiani, Fabrizio. “Machine Learning in Automated Text Categorization.” ACM Computing Surveys (CSUR) 34, no. 1 (2002): 1–47. https://doi.org/10.1145/505282.505283.