Corpus Approaches to Language in Social Media (Routledge Advances in Corpus Linguistics) [1 ed.] 1032125705, 9781032125701

This book showcases the unique possibilities of corpus linguistic methodologies in engaging with and analysing language


Language: English; Pages: 398 [399]; Year: 2023

Table of contents:
Cover
Half Title
Series
Title
Copyright
Dedication
Contents
Preface/Acknowledgements
1 Introduction
Setting the stage
Interconnecting with the digital
Digital humanities as practices of interconnections
More than numbers: studying cognition and society through corpus approaches
Scope and structure of this book
About the companion website
References
2 Social media as digital research data
The impact of the digital on cognition and society
Open source
Copyright and ethics
Copyright issues
Ethical issues
The characteristics of a corpus
More than text: corpus metadata, textual markup, and annotation
Metadata
Evaluating metadata
Textual markup
Annotations
References
3 Fundamentals of corpus linguistics
Corpus tools
The building blocks of corpus linguistics
Type, token, lemma
Frequencies and frequency lists
Dispersion
Concordances and key-word-in-context
Collocations
Keywords
Stoplist
Advancements in corpus linguistics
A corpus approach perspective on sentiment analysis and topic modelling
References
4 Imagining the data: corpus design
Setting up the working environment
Command-line interface and virtual programming environments
A note about programming languages
CSV, XML and HTML, JSON
CSV
XML and HTML
JSON
Preserving the data
Internet Archive and the Wayback Machine
WARC format
git
Working with digital textual data
Unicode, UTF-8, character encodings
Regular expressions
Towards data collection
References
5 Creating the data: corpus collection
Collecting the data: general remarks
Crawling and scraping web data
APIs
General purpose scrapers
#LancsBox
Archivebox
Trafilatura
The coding way: BeautifulSoup
Platform-specific scrapers
Twitter
Instagram
Facebook
YouTube
Data processing
Dates, time, and Unix time
Text normalisation
PDF, Word, images
Detecting the language(s) used in a text
Emoticons and emojis
Hashtags
Other elements
Annotations
Verticalised format
Exploring the collected data
Cleaning and formatting the data
References
6 Case studies
Analysing crypto-drug market fora
Background
Context
Corpus design
Data processing
Corpus analysis
Analysing the language of far-right groups on Twitter and Facebook
Background
Context
Corpus design
Data processing
Corpus analysis
The communicative modus operandi of online child sexual groomers
Background
Context
Corpus design
Data processing
Corpus analysis
References
7 Conclusion
A broad view of corpus approaches
References
Appendix
Index

Author / Uploaded
Matteo Di Cristofaro

