Bibliography – Data Science

Selected DH research and resources bearing on, or utilized by, the WE1S project.


Smith, Gary, and Jay Cordes. The Phantom Pattern Problem: The Mirage of Big Data. First edition. Oxford; New York, NY: Oxford University Press, 2020.
Koenzen, Andreas, Neil Ernst, and Margaret-Anne Storey. “Code Duplication and Reuse in Jupyter Notebooks.” ArXiv:2005.13709 [Cs], 2020. http://arxiv.org/abs/2005.13709.
Chattopadhyay, Souti, Ishita Prasad, Austin Z. Henley, Anita Sarma, and Titus Barik. “What’s Wrong with Computational Notebooks? Pain Points, Needs, and Design Opportunities.” In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 1–12. CHI ’20. Honolulu, HI, USA: Association for Computing Machinery, 2020. https://doi.org/10.1145/3313831.3376729.
DePratti, Roland. “Jupyter Notebooks versus a Textbook in a Big Data Course.” Journal of Computing Sciences in Colleges 35, no. 8 (2020): 208–20. https://dl.acm.org/doi/abs/10.5555/3417639.3417658.
Willis, Alistair, Patricia Charlton, and Tony Hirst. “Developing Students’ Written Communication Skills with Jupyter Notebooks.” In Proceedings of the 51st ACM Technical Symposium on Computer Science Education, 1089–95. SIGCSE ’20. Portland, OR, USA: Association for Computing Machinery, 2020. https://doi.org/10.1145/3328778.3366927.
Thylstrup, Nanna Bonde, ed. Uncertain Archives: Critical Keywords for Big Data. Cambridge, Massachusetts: The MIT Press, 2020.
Kwak, Haewoon, Jisun An, and Yong-Yeol Ahn. “A Systematic Media Frame Analysis of 1.5 Million New York Times Articles from 2000 to 2017.” ArXiv:2005.01803 [Cs], 2020. http://arxiv.org/abs/2005.01803.
Munro, Robert. Human-in-the-Loop Machine Learning. Shelter Island, New York: Manning, 2020. https://www.manning.com/books/human-in-the-loop-machine-learning.
Wang, April Yi, Anant Mittal, Christopher Brooks, and Steve Oney. “How Data Scientists Use Computational Notebooks for Real-Time Collaboration.” Association for Computing Machinery, 2019. https://doi.org/10.1145/3359141.
Rule, Adam, Amanda Birmingham, Cristal Zuniga, Ilkay Altintas, Shih-Cheng Huang, Rob Knight, Niema Moshiri, et al. “Ten Simple Rules for Writing and Sharing Computational Analyses in Jupyter Notebooks.” PLOS Computational Biology 15, no. 7 (2019): e1007007. https://doi.org/10.1371/journal.pcbi.1007007.
Pandey, Parul. Interpretable Machine Learning, 2019. https://towardsdatascience.com/interpretable-machine-learning-1dec0f2f3e6b.
“Big Data Technologies: A Survey.” Journal of King Saud University - Computer and Information Sciences 30, no. 4 (2018): 431–48. https://doi.org/10.1016/j.jksuci.2017.06.001.
Kery, Mary Beth, Marissa Radensky, Mahima Arya, Bonnie E. John, and Brad A. Myers. “The Story in the Notebook: Exploratory Data Science Using a Literate Programming Tool.” In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, 1–11. CHI ’18. Montreal QC, Canada: Association for Computing Machinery, 2018. https://doi.org/10.1145/3173574.3173748.
Narkhede, Sarang. Understanding Confusion Matrix, 2018. https://towardsdatascience.com/understanding-confusion-matrix-a9ad42dcfd62.
Kleinman, Scott, Mark D. LeBlanc, and Michael Drout. Hierarchical Clustering, 2018. http://scalar.usc.edu/works/lexos/hierarchical-clustering?path=manual.
Randles, Bernadette M., Irene V. Pasquetto, Milena S. Golshan, and Christine L. Borgman. “Using the Jupyter Notebook as a Tool for Open Science: An Empirical Study.” In 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL), 1–2, 2017. https://doi.org/10.1109/JCDL.2017.7991618.
Project Jupyter. “Project Jupyter: Computational Narratives as the Engine of Collaborative Data Science.” Medium, 2017. https://blog.jupyter.org/project-jupyter-computational-narratives-as-the-engine-of-collaborative-data-science-2b5fb94c3c58.
Ruchansky, Natali, Sungyong Seo, and Yan Liu. “CSI: A Hybrid Deep Model for Fake News Detection.” In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 797–806. CIKM ’17. Singapore, Singapore: Association for Computing Machinery, 2017. https://doi.org/10.1145/3132847.3132877.
Wang, William Yang. “‘Liar, Liar Pants on Fire’: A New Benchmark Dataset for Fake News Detection.” ArXiv:1705.00648 [Cs], 2017. http://arxiv.org/abs/1705.00648.
Mützel, Sophie. “Facing Big Data: Making Sociology Relevant.” Big Data & Society 2, no. 2 (2015): 2053951715599179. https://doi.org/10.1177/2053951715599179.
Hashem, Ibrahim Abaker Targio, Ibrar Yaqoob, Nor Badrul Anuar, Salimah Mokhtar, Abdullah Gani, and Samee Ullah Khan. “The Rise of ‘Big Data’ on Cloud Computing: Review and Open Research Issues.” Information Systems 47 (2015): 98–115. https://doi.org/10.1016/j.is.2014.07.006.
“Beyond the Hype: Big Data Concepts, Methods, and Analytics.” International Journal of Information Management 35, no. 2 (2015): 137–44. https://doi.org/10.1016/j.ijinfomgt.2014.10.007.
Ahonen, Pertti. “Institutionalizing Big Data Methods in Social and Political Research.” Big Data & Society 2, no. 2 (2015): 2053951715591224. https://doi.org/10.1177/2053951715591224.
Conroy, Niall J., Victoria L. Rubin, and Yimin Chen. “Automatic Deception Detection: Methods for Finding Fake News.” Proceedings of the Association for Information Science and Technology 52, no. 1 (2015): 1–4. https://doi.org/10.1002/pra2.2015.145052010082.
Lazer, David, Ryan Kennedy, Gary King, and Alessandro Vespignani. “The Parable of Google Flu: Traps in Big Data Analysis.” Science 343, no. 6176 (2014): 1203–5. https://doi.org/10.1126/science.1248506.
Kitchin, Rob. “Big Data, New Epistemologies and Paradigm Shifts.” Big Data & Society 1, no. 1 (2014): 2053951714528481. https://doi.org/10.1177/2053951714528481.
Kitchin, Rob. The Data Revolution: Big Data, Open Data, Data Infrastructures & Their Consequences. Los Angeles, California: SAGE Publications, 2014.
Chen, C. L. Philip, and Chun-Yang Zhang. “Data-Intensive Applications, Challenges, Techniques and Technologies: A Survey on Big Data.” Information Sciences 275 (2014): 314–47. https://doi.org/10.1016/j.ins.2014.01.015.
Bail, Christopher. The Cultural Environment: Measuring Culture With Big Data, 2014. https://www.researchgate.net/publication/260705893_The_Cultural_Environment_Measuring_Culture_With_Big_Data.
Chen, Min, Shiwen Mao, and Yunhao Liu. “Big Data: A Survey.” Mobile Networks and Applications 19, no. 2 (2014): 171–209. https://doi.org/10.1007/s11036-013-0489-0.
Wu, Xindong, Xingquan Zhu, Gong-Qing Wu, and Wei Ding. “Data Mining with Big Data.” IEEE Transactions on Knowledge and Data Engineering 26, no. 1 (2014): 97–107. https://doi.org/10.1109/TKDE.2013.109.
Burscher, Björn, Daan Odijk, Rens Vliegenthart, Maarten de Rijke, and Claes H. de Vreese. “Teaching the Computer to Code Frames in News: Comparing Two Supervised Machine Learning Approaches to Frame Analysis.” Communication Methods and Measures 8, no. 3 (2014): 190–206. https://doi.org/10.1080/19312458.2014.937527.
Richards, Neil M., and Jonathan H. King. “Three Paradoxes of Big Data.” Stanford Law Review 66 (2013). https://www.stanfordlawreview.org/online/privacy-and-big-data-three-paradoxes-of-big-data/.
Ward, Jonathan Stuart, and Adam Barker. “Undefined By Data: A Survey of Big Data Definitions.” ArXiv:1309.5821 [Cs], 2013. http://arxiv.org/abs/1309.5821.
Sagiroglu, Seref, and Duygu Sinanc. “Big Data: A Review.” In 2013 International Conference on Collaboration Technologies and Systems (CTS), 42–47, 2013. https://doi.org/10.1109/CTS.2013.6567202.
Labrinidis, Alexandros, and H. V. Jagadish. “Challenges and Opportunities with Big Data.” VLDB Endowment, 2012. https://doi.org/10.14778/2367502.2367572.
Drout, Michael, and Leah Smith. How to Read a Dendrogram, 2012. https://wheatoncollege.edu/wp-content/uploads/2012/08/How-to-Read-a-Dendrogram-Web-Ready.pdf.
Boyd, Danah, and Kate Crawford. “Six Provocations for Big Data.” SSRN Scholarly Paper. Rochester, NY: Social Science Research Network, 2011. https://papers.ssrn.com/abstract=1926431.
Peng, Roger D. “Reproducible Research in Computational Science.” Science 334, no. 6060 (2011): 1226–27. https://doi.org/10.1126/science.1213847.
Sebastiani, Fabrizio. “Machine Learning in Automated Text Categorization.” ACM Computing Surveys (CSUR) 34, no. 1 (2002): 1–47. https://doi.org/10.1145/505282.505283.
Miller, M. Mark. “Frame Mapping and Analysis of News Coverage of Contentious Issues.” Social Science Computer Review 15, no. 4 (1997): 367–78. https://doi.org/10.1177/089443939701500403.