Corpus Approaches to Language in Social Media (Routledge Advances in Corpus Linguistics) [1 ed.]
1032125705, 9781032125701
This book showcases the unique possibilities of corpus linguistic methodologies in engaging with and analysing language
English
Pages 398
Year 2023
Table of contents:
Cover
Half Title
Series
Title
Copyright
Dedication
Contents
Preface/Acknowledgements
1 Introduction
Setting the stage
Interconnecting with the digital
Digital humanities as practices of interconnections
More than numbers: studying cognition and society through corpus approaches
Scope and structure of this book
About the companion website
References
2 Social media as digital research data
The impact of the digital on cognition and society
Open source
Copyright and ethics
Copyright issues
Ethical issues
The characteristics of a corpus
More than text: corpus metadata, textual markup, and annotation
Metadata
Evaluating metadata
Textual markup
Annotations
References
3 Fundamentals of corpus linguistics
Corpus tools
The building blocks of corpus linguistics
Type, token, lemma
Frequencies and frequency lists
Dispersion
Concordances and key-word-in-context
Collocations
Keywords
Stoplist
Advancements in corpus linguistics
A corpus approach perspective on sentiment analysis and topic modelling
References
4 Imagining the data: corpus design
Setting up the working environment
Command-line interface and virtual programming environments
A note about programming languages
CSV, XML and HTML, JSON
CSV
XML and HTML
JSON
Preserving the data
Internet Archive and the Wayback Machine
WARC format
git
Working with digital textual data
Unicode, UTF-8, character encodings
Regular expressions
Towards data collection
References
5 Creating the data: corpus collection
Collecting the data: general remarks
Crawling and scraping web data
APIs
General purpose scrapers
#LancsBox
ArchiveBox
Trafilatura
The coding way: BeautifulSoup
Platform-specific scrapers
Twitter
Instagram
Facebook
YouTube
Data processing
Dates, time, and Unix time
Text normalisation
PDF, Word, images
Detecting the language(s) used in a text
Emoticons and emojis
Hashtags
Other elements
Annotations
Verticalised format
Exploring the collected data
Cleaning and formatting the data
References
6 Case studies
Analysing crypto-drug market fora
Background
Context
Corpus design
Data processing
Corpus analysis
Analysing the language of far-right groups on Twitter and Facebook
Background
Context
Corpus design
Data processing
Corpus analysis
The communicative modus operandi of online child sexual groomers
Background
Context
Corpus design
Data processing
Corpus analysis
References
7 Conclusion
A broad view of corpus approaches
References
Appendix
Index