Political Bias Classification with Dirichlet Priors

An interpretable classifier that detects political leaning in online news using the Log-Odds Ratio with Informative Dirichlet Prior (LOR-IDP), a method whose per-term weights can be inspected directly. The project combines AllSides bias labels with a dataset of over 22,000 articles sourced from the NewsCatcher API.

Project outline

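This project applies a statistical NLP approach to political bias classification: the Log-Odds Ratio with Informative Dirichlet Prior (LOR-IDP). The method compares how often each term appears in text from differently labelled sources, smooths the counts with a prior drawn from background word frequencies, and yields a score per term, so the features driving the model can be read off directly.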

The work is grounded in Jurafsky & Martin’s Speech and Language Processing, a standard NLP textbook, and builds directly on the methodological framework of Monroe et al. (as cited in Jurafsky & Martin). By combining these foundations with modern data sources, including AllSides labels and a large corpus from NewsCatcher, the project delivers both transparent model outputs and empirical insight into linguistic markers of political bias.
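To make the method concrete, here is a minimal sketch of the log-odds ratio with an informative Dirichlet prior, following the formulation of Monroe et al. as presented in Jurafsky & Martin. It is not the project's actual code. The function name, the whitespace tokeniser, and the toy headlines are all hypothetical, and using the pooled counts of both groups as the prior is an assumption (a larger background corpus is another common choice).

```python
import math
from collections import Counter


def log_odds_dirichlet(counts_a, counts_b, prior):
    """Return {term: z-score}; positive values lean towards group A.

    counts_a, counts_b: term counts for the two groups being compared.
    prior: background term counts used as Dirichlet pseudo-counts.
    """
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    a0 = sum(prior.values())  # total mass of the prior
    scores = {}
    for term, a_w in prior.items():
        if a_w <= 0:
            continue  # a term needs a positive pseudo-count
        y_a = counts_a.get(term, 0) + a_w
        y_b = counts_b.get(term, 0) + a_w
        # Difference of smoothed log-odds between the two groups
        delta = (math.log(y_a / (n_a + a0 - y_a))
                 - math.log(y_b / (n_b + a0 - y_b)))
        # Approximate variance of that difference
        variance = 1.0 / y_a + 1.0 / y_b
        scores[term] = delta / math.sqrt(variance)
    return scores


# Toy headlines, invented purely for illustration (not project data)
left_headlines = [
    "workers demand fair wages",
    "climate action now",
    "workers rally for climate",
]
right_headlines = [
    "tax cuts boost growth",
    "border security now",
    "tax relief for families",
]

counts_left = Counter(t for h in left_headlines for t in h.lower().split())
counts_right = Counter(t for h in right_headlines for t in h.lower().split())
prior = counts_left + counts_right  # assumption: pooled counts as the prior

z = log_odds_dirichlet(counts_left, counts_right, prior)
ranked = sorted(z.items(), key=lambda kv: kv[1], reverse=True)
print("most left-leaning terms:", ranked[:3])
print("most right-leaning terms:", ranked[-3:])
```

Dividing by the estimated standard deviation turns each raw log-odds difference into a z-score, which keeps rare terms from dominating the ranking simply because their counts are noisy.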

Key Features:

Feature engineering from news article headlines
Interpretable classifier with feature weight visualisation (top-ranked left/right terms)
Integration of AllSides and NewsCatcher datasets
Methodology grounded in canonical NLP and statistical text analysis research
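
As an illustration of how per-term weights can drive an interpretable classifier, the sketch below scores a headline by summing the weights of the terms it contains and reports each term's contribution. This is one simple possibility, not necessarily the decision rule used in the project. The function name, the sign convention (positive means left), the threshold, and the weight values are all hypothetical.

```python
def classify_headline(headline, weights, threshold=0.0):
    """Label a headline from summed term weights.

    weights: {term: score}, where positive is assumed to mean left-leaning.
    Returns (label, total score, list of (term, weight) contributions).
    """
    tokens = headline.lower().split()
    contributions = [(t, weights[t]) for t in tokens if t in weights]
    score = sum(w for _, w in contributions)
    if score > threshold:
        label = "left"
    elif score < -threshold:
        label = "right"
    else:
        label = "undetermined"
    return label, score, contributions


# Made-up weights for illustration only (not results from the project)
example_weights = {"workers": 1.2, "climate": 1.1, "tax": -1.3, "border": -0.9}

label, score, parts = classify_headline("Workers rally over climate", example_weights)
print(label, round(score, 2), parts)
```

Because the prediction is just a sum, the returned contributions show exactly which terms pushed a headline towards one label, which is the same information a chart of top-ranked left/right terms conveys for the whole model.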

📄 Read the full report

💻 Read code (GitHub)