An interpretable classifier for detecting political leaning in online news, built on the Log-Odds Ratio with Informative Dirichlet Prior (LOR-IDP), a method that scores each term by how strongly it is associated with one side. The project combines AllSides bias labels with a dataset of over 22,000 articles sourced from the NewsCatcher API.
Project outline
This project applies a statistical NLP approach to political bias classification: the Log-Odds Ratio with Informative Dirichlet Prior (LOR-IDP). The method compares how often each word appears in left-leaning and right-leaning text, smooths those counts with a prior drawn from background word frequencies, and yields a per-word score that can be inspected directly.
The work follows the treatment in Jurafsky & Martin’s Speech and Language Processing and builds directly on the methodological framework of Monroe et al. (as cited in Jurafsky & Martin). By combining this method with AllSides labels and a large corpus from NewsCatcher, the project delivers both transparent model outputs and empirical insight into the linguistic markers of political bias.
Key Features:
- Feature engineering from news article headlines
- Interpretable classifier with feature weight visualisation (top-ranked left/right terms)
- Integration of AllSides and NewsCatcher datasets
- Methodology grounded in canonical NLP and statistical text analysis research
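
As a rough illustration of the scoring step behind the features above, here is a minimal sketch of the LOR-IDP z-score as formulated by Monroe et al. It is not the project's actual code. The function names (`tokenize`, `log_odds_idp`, `top_terms`, `classify`), the whitespace tokenizer, the choice of the pooled corpus counts as the prior, the sum-of-scores decision rule, and the placeholder headlines are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(headline):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return headline.lower().split()


def log_odds_idp(left_counts, right_counts, prior_counts):
    """Return {word: z-score}. Positive leans left, negative leans right."""
    n_left = sum(left_counts.values())
    n_right = sum(right_counts.values())
    a0 = sum(prior_counts.values())  # total prior mass
    scores = {}
    for word, a_w in prior_counts.items():
        y_l = left_counts.get(word, 0)
        y_r = right_counts.get(word, 0)
        # Smoothed log-odds of the word within each group.
        lo_left = math.log((y_l + a_w) / (n_left + a0 - y_l - a_w))
        lo_right = math.log((y_r + a_w) / (n_right + a0 - y_r - a_w))
        # Approximate variance of the log-odds difference.
        variance = 1.0 / (y_l + a_w) + 1.0 / (y_r + a_w)
        scores[word] = (lo_left - lo_right) / math.sqrt(variance)
    return scores


def top_terms(scores, k=3):
    """Return the k most left-leaning and k most right-leaning (word, z) pairs."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]


def classify(headline, scores):
    """Hypothetical decision rule: sum the z-scores of known tokens."""
    total = sum(scores.get(tok, 0.0) for tok in tokenize(headline))
    if total > 0:
        return "left"
    if total < 0:
        return "right"
    return "undetermined"


# Placeholder headlines, invented for this sketch (not project data).
left_headlines = ["alpha policy debate", "alpha alpha reform"]
right_headlines = ["beta policy debate", "beta beta reform"]

left_counts = Counter(t for h in left_headlines for t in tokenize(h))
right_counts = Counter(t for h in right_headlines for t in tokenize(h))
# Assumption: the pooled counts of both groups serve as the informative prior.
prior_counts = left_counts + right_counts

scores = log_odds_idp(left_counts, right_counts, prior_counts)
top_left, top_right = top_terms(scores, k=1)
print("most left-leaning:", top_left)
print("most right-leaning:", top_right)
print(classify("alpha reform", scores))
```

Because every prediction is a sum of per-word scores, the same `scores` dictionary that drives classification can be plotted directly as the ranked left/right term chart. Words used equally by both groups score zero and drop out of the ranking.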