The Rise of Central Bank Talk:
Essays in Central Bank Communication and Independence

Thesis Defense

Lauren Leek

7 October, 2025

Thesis Committee

Prof. Simon Hix – European University Institute (supervisor)

Prof. Waltraud Schelkle – European University Institute (co-supervisor)

Prof. Kenneth Benoit – Singapore Management University

Prof. Ana Carolina Garriga – University of Essex

In the headlines

FT headline on climate and central banks

& beyond the headlines

Puzzle

How do central banks reconcile their independent status with extensive public communication that goes beyond their narrow mandates?

Argument

Independence ≠ one‑off grant; requires continuous responsiveness and coordination.

Standard view

Communication = neutral instrument for information transmission and accountability (e.g., Woodford, Geraats, etc.).

⇛

Thesis view

Communication is dynamically shaped by independence (ch2) and deployed to address perceived challenges to independence (ch3 & ch4).

Addition: Not only central banks embedded in advanced democracies. (Some central banks in advanced economies seem to be experiencing problems typical of emerging-economy central banks.)

Methods & data

Overview of Chapters & Methods

Chapter	Data	Empirical Method	Text-as-Data
2 Independence → Communication	All speeches (1997–2023)	Staggered DiD; Instrumental Variable	Gemini; Sentence cls.
3 NCB Agenda Influence	NCB + ECB speeches (1997–2023)	Sequence / Markov; Cross-Sectional Timeseries	BERTopic; Topics & diffusion
4 Policy Linkages	All speeches (1997–mid‑2023)	Descriptive	GPT‑3.5; Regime cls. Validation

How Central Bank Independence Shapes Monetary Policy Communication: A Large Language Model Application

* Published as: Leek, L., & Bischl, S. (2025). How Central Bank Independence Shapes Monetary Policy Communication: A Large Language Model Application. European Journal of Political Economy, 87, 102668. doi.org/10.1016/j.ejpoleco.2025.102668

Theory

Main

H1: More independence → less inflation talk.

Main

H2: More independence → more financial pressure talk.

Mechanism

H3: Independence shifts which pressures are addressed in communication.

Scope Condition

H4: Effects strongest in democracies and advanced economies.

Methods

LLM: Google Gemini sentence classification
DiD: Staggered Difference-in-Differences on 100 CBs (1997–2023) and augmented
IV: Inverse distance-weighted lagged world CBI

Findings: strongest in advanced democracies

Free from government ≠ free from markets

Who Sets the Agenda? The Role of National Central Banks in the Eurosystem*

* R&R at Journal of European Public Policy.

Hypotheses

H1 First Movers

NCBs introduce novel topics first

H2 Public Salience

ECB uptake ↑ with national salience

H3 Economic Pressure

Responsiveness ↑ with asymmetric pressures

ECB not extra responsive under pressure

Scope conditions

🔴 Crisis ↑ responsiveness

👥 Coalitions ↑ stronger

📍 Post‑Sintra ↓ reduced

Alternative Explanations

❌ Issue Linkage

✓ Within‑Topic Variance

Key Insight

ECB responsiveness is conditional, and a topic's audience dictates the desired degree of unity.

Introducing Textual Measures of Central Bank Policy Linkages Using GPT‑3.5

Policy Regimes

💪 Monetary Dominance

🏛 Fiscal Dominance

💼 Financial Dominance

🤝 Monetary–Fiscal Coordination

🤝 Monetary–Financial Coordination

Classification Pipeline

Validation & Prompt Engineering

Manual Validation

1k sentences / 3 coders (IRR 76–84%). Accuracy 62–85%; macro F1 0.36–0.83.

Prompt Choices

Temp=0; 10 sentences/prompt; context window for disambiguation.
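
A minimal sketch of the batching described above, assuming each prompt carries ten target sentences plus a few neighbouring sentences for disambiguation. The function name `build_batches`, the `context` parameter and its default of two sentences are illustrative assumptions, not the thesis pipeline.

```python
from typing import Dict, List


def build_batches(sentences: List[str], batch_size: int = 10,
                  context: int = 2) -> List[Dict[str, List[str]]]:
    """Split a speech into prompt-sized batches with surrounding context.

    batch_size=10 follows the slide; context=2 is an assumed value.
    """
    batches = []
    for start in range(0, len(sentences), batch_size):
        end = min(start + batch_size, len(sentences))
        batches.append({
            # sentences the model is asked to classify
            "targets": sentences[start:end],
            # neighbouring sentences shown only for disambiguation
            "before": sentences[max(0, start - context):start],
            "after": sentences[end:end + context],
        })
    return batches


speech = [f"Sentence {i}." for i in range(1, 24)]  # 23 toy sentences
batches = build_batches(speech)
```

Each batch would then be rendered into one prompt and sent with temperature 0, so that repeated runs return the same labels as far as the model allows.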

Compute / Tokens

Efficient token usage; GPT‑4 & Gemini benchmarking checks.

Dominance / Coordination Over Time

Trends in monetary dominance, fiscal/financial coordination and dominance over time

Cross‑Country Variation

Advanced vs Emerging

Democracy vs Autocracy

Crisis Effects

Theoretical contributions

Ch 2 ✓ Financial instability as a persistent challenge for central bank independence - beyond macroprudential turn literature
Ch 3 ✓ Even/especially the most independent central banks are responsive - beyond Cukierman
Ch 4 ✓ Fiscal and financial pressures through either coordination or dominance (policy‑linkages) - beyond Sargent and Wallace
⚠ Note: not a test of the intentions of governors behind communicating

Methods & Data Contributions

Ch 2 & 4 ✓ Novel LLM-based methods for detecting policy pressures in central bank communication
Ch 2–4 ✓ Comprehensive dataset with enriched meta-data (retrieved via LLMs)
Ch 3 ✓ New measure of responsiveness for central banks
Ch 2–4 ✓ Interactive website with dashboards and insights

Four Broader Implications

1. Dynamic Independence

CBI is an ongoing negotiation despite statutes.

2. Geographic Heterogeneity

Different mechanisms in emerging vs. advanced economies and currency areas/pegged currencies - context matters.

3. Government Coordination

Macroeconomic policy coordination - link with actual policies (fiscal & financial).

4. LLM usage

LLM Reproducibility (bias, environmental costs)

Appendix: Introduction

Analytical framework

CBI Map

World Avg CBI

Topic Diversity

Speaker Position

Audiences Over Time

Appendix: Chapter 2

Distribution of Speeches

Classification Results

Face validation regimes over time

Confusion Matrices

Dataset comparison map

Gemini fine-tune settings

Final Model Configurations

Parameter	Final Value
Optimization Settings
Epochs	7
Learning rate	0.0005
Batch size	2
Prompt Engineering
Sentences per prompt	5
Temperature	0
Format instructions	yes
Dataset Composition
Synthetic sentences	no
Upsample factor	0
Randomise epochs	yes

LLM Validation: Gemini Fine-Tune Performance

Validation Against Human Ground Truth

Metric	GPT-3.5	GPT-4	Fine-tune
Accuracy	0.64	0.79	0.81
F1 (weighted)	0.69	0.78	0.79
F1 (macro)	0.35	0.40	0.47
Precision (macro)	0.33	0.48	0.49
Recall (macro)	0.43	0.40	0.45

Best Performance

Fine-tuned Gemini achieves highest scores across all metrics. Macro F1 (0.47) particularly strong for minority categories.

Confusion Matrix Insights

Both models struggle with:
• Financial & fiscal dominance
• Differentiating dominance from coordination

Gemini advantages:
• More conservative assignments
• Higher precision (fewer false positives)
• Closer alignment with human coders

Model Comparison

GPT-3.5: Over-assigns dominance/coordination
Gemini fine-tune: More "none" assignments, fewer bad mistakes

Validation Details

Ground truth: 3 human coders, 1,000 sentences
Training: 300 sentences
Holdout: 700 sentences for testing
Same scheme as Leek & Bischl (2024)

Bottom Line: Fine-tuned Gemini outperforms all tested models. Higher precision means fewer erroneous dominance assignments - critical for reliable policy regime measurement.

External validation regimes

Events Map

Empirical Analysis: Staggered Difference-in-Differences

Main Event Study Specification $$\psi^m_{ict} = \sum_{k=-5}^{k = -2} \beta_k D^k_{ct} + \sum_{k=0}^{k = 12} \beta_k D^k_{ct} + \mu_c + \theta_t + \epsilon_{ict}$$

$\psi^m_{ict}$: Dominance indicator
$D^k_{ct}$: Event indicator
$\beta_k$: Treatment effects (ATT)
$\mu_c, \theta_t$: Fixed effects

Endpoint Binning Strategy

$$D^{k}_{ct} = \begin{cases} \sum_{j=-\infty}^{-5} d_{c(t-j)} & \text{if } k = -5 \\ d_{c(t-k)} & \text{if } -5 < k < 12 \\ \sum_{j=12}^{\infty} d_{c(t-j)} & \text{if } k = 12 \end{cases}$$
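
The binning rule can be sketched for a single country-year as follows. This is an illustrative reading of the specification, assuming one event year per country; the function name `event_time_dummies` and its arguments are hypothetical.

```python
from typing import Dict, Optional

K_MIN, K_MAX = -5, 12  # event window from the specification


def event_time_dummies(year: int, event_year: Optional[int]) -> Dict[int, int]:
    """Return D^k for k = K_MIN..K_MAX for one country-year."""
    dummies = {k: 0 for k in range(K_MIN, K_MAX + 1)}
    if event_year is None:          # never-treated country: all zeros
        return dummies
    k = year - event_year           # time relative to the event
    k = max(K_MIN, min(K_MAX, k))   # bin everything beyond the endpoints
    dummies[k] = 1
    return dummies


print(event_time_dummies(2020, 2000)[12])  # 20 years after: binned at k = 12
```

The k = -1 dummy is kept in the returned dictionary; at estimation it would be dropped so that effects are measured relative to the year before the event.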

Design Features

Pre-treatment coefficients test parallel trends. Treatment effects relative to β₋₁ = 0. Standard errors clustered at country level.

Specification Details

Treatment: Binary (d_ct ∈ {0,1}), min. increase 0.05
Events: Largest change per country only
Robustness: Gardner (2024) two-stage estimator
Tests: Wald tests for leveling & pre-trends
Window: 1985-2023, staggered CBI adoption

Subgroup analysis

Augmented Event Study with Interactions $$\psi^m_{ict} = \sum_{k=-5}^{k = -2} \beta_k D^k_{ct} + \sum_{k=0}^{k = 12} \beta_k D^k_{ct} + \sum_{j=2}^{j=J} \sum_{k=0}^{k = 12} \delta_{jk} D^k_{ct} S^j_{ct} + \mu_c + \theta_t +\epsilon_{ict}$$

$S^j_{ct}$: Subgroup dummies
$\delta_{jk}$: Interaction coefficients

Effect Aggregation

Use observation-weighted average of post-treatment coefficients to avoid TWFE bias from heterogeneous treatment effects.
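
As a worked illustration of the aggregation step, the sketch below averages post-treatment coefficients using the number of observations at each event time as weights. The weighting scheme is an assumed reading of "observation-weighted", and the coefficients and counts are toy values, not estimates from the thesis.

```python
from typing import Dict


def weighted_post_effect(betas: Dict[int, float], n_obs: Dict[int, int]) -> float:
    """Average the post-treatment coefficients (k >= 0), weighted by observations."""
    post = [k for k in betas if k >= 0]
    total = sum(n_obs[k] for k in post)
    return sum(betas[k] * n_obs[k] for k in post) / total


# Toy event-study output: event time -> coefficient / observation count
toy_betas = {-2: 0.01, 0: 0.02, 1: 0.04, 2: 0.06}
toy_counts = {-2: 50, 0: 100, 1: 60, 2: 40}
avg_effect = weighted_post_effect(toy_betas, toy_counts)
```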

Key Findings by Subgroups

Political Regime

Democracies: Larger, more significant effects
Autocracies: Constrained communication freedom

Economic Development

Advanced: Strong financial dominance increase
Emerging: Weaker effects, high baseline dominance

Heterogeneity Results: Effect Magnitudes by Subgroups

	Monetary Dominance	Financial Dominance
Baseline
Full sample	-0.1607***	0.0548***
Supervision Capabilities
Low	-0.1532**	0.0642***
Medium	-0.1373***	0.0446*
High	-0.2504***	0.0279
Political System
Autocracy	-0.1161**	0.0495**
Democracy	-0.1747***	0.0595***
Monetary Sovereignty
Full sovereignty	-0.1417***	0.0556***
Union or peg	-0.2268***	0.0244
Economic Development
Emerging & Developing	-0.0644*	0.0021
Advanced	-0.2229***	0.0774***
Mandates
Non-conflicting with price stability	-0.1479***	0.0542***
Conflicting objectives	-0.2440***	0.0588***

Note: Standard errors omitted. * p < 0.1; ** p < 0.05; *** p < 0.01. Effects are larger in democracies than in autocracies and in advanced than in emerging economies, consistent with Hypothesis 4.

Instrumental Variable Approach

Addressing Dynamic Endogeneity

IV approach circumvents biases from time-varying omitted variables and reverse causality (CB communication influencing independence). Uses 2SLS with diffusion-based instruments.

Second Stage (2SLS) $$\psi^m_{ict} = \rho\, \widetilde{\psi}^m_{ict} + \beta_1\,\textrm{CBI}_{ct} + \beta_2\, \Delta \pi_{ct} + \beta_3\, \Delta u_{ct} + \theta_t + \mu_c + \epsilon_{ict}$$

$\widetilde{\psi}^m_{ict}$: Lagged DV (avg. 25 speeches)
$\Delta\pi_{ct}, \Delta u_{ct}$: Inflation & unemployment changes

Three Instruments

1. Inverse distance-weighted world CBI average
2. Electoral democracy in 10 nearest neighbors
3. Domestic judicial independence
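
The first instrument can be sketched as an inverse-distance-weighted average of other countries' lagged CBI. The weights 1/distance, the function name `idw_world_cbi` and the toy countries, scores and distances are assumptions for illustration; the thesis may use a different distance measure or decay.

```python
from typing import Dict, Tuple


def idw_world_cbi(country: str,
                  cbi_lagged: Dict[str, float],
                  distance_km: Dict[Tuple[str, str], float]) -> float:
    """Inverse-distance-weighted average of other countries' lagged CBI."""
    num = den = 0.0
    for other, cbi in cbi_lagged.items():
        if other == country:        # exclude the country's own CBI
            continue
        w = 1.0 / distance_km[(country, other)]
        num += w * cbi
        den += w
    return num / den


# Toy data: lagged CBI scores and distances from country "A"
toy_cbi = {"A": 0.3, "B": 0.8, "C": 0.4}
toy_dist = {("A", "B"): 100.0, ("A", "C"): 400.0}
instrument_a = idw_world_cbi("A", toy_cbi, toy_dist)
```

Nearby country B receives four times the weight of distant country C, so the instrument for A sits close to B's score.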

Heterogeneity Results

Democracies: Larger, more significant effects

Advanced economies: Strong financial dominance ↑

Emerging economies: High baseline dominance limits effects

Methodological Notes

Lagged DV: 25-speech average maximizes log-likelihood
Controls: Inflation/unemployment changes from Fed literature

Identification: Diffusion processes drive CBI adoption
Caveat: Exclusion restriction potentially violated by bidirectional diffusion

Instrumental Variable Results: Complete 2SLS Estimation

	First Stage		2SLS Effect on Dominance
Dependent Variables:	CBI		Monetary	Financial
Model:	(1)	(2)	(3)	(4)
Variables
CBI (instrumented)	—	—	-0.7524*	0.6921**
	—	—	(0.4365)	(0.2969)
Monetary dominance 25 prior speeches	0.0078	—	0.4952***	—
	(0.0129)	—	(0.0727)	—
Financial dominance 25 prior speeches	—	-0.0009	—	0.3247***
	—	(0.0376)	—	(0.0567)
Δ Unemployment rate	0.0001	0.0001	-0.0016	0.0022
Neighbour's electoral democracy₋₁	0.2376	0.2448	—	—
Independence judiciary	0.0056**	0.0053*	—	—
Fixed Effects
Country	✓	✓	✓	✓
Year	✓	✓	✓	✓
Fit Statistics
R²	0.97593	0.97591	0.21513	0.11198
Observations	12,205	12,205	12,205	12,205

Instrument Validity Tests

1. Instrument Relevance

F-statistics:
• Financial dominance: 318.5
• Monetary dominance: 332.3
Result: Well above rule-of-thumb (F > 10)

2. Over-identification Tests

Sargan test p-values:
• Financial dominance: 0.08
• Monetary dominance: 0.06
Result: Fail to reject exogeneity at the 5% level (p > 0.05), though both tests reject at the 10% level

Caveat: LATE vs ATE

Over-identification assumes instruments identify same parameter. Different instruments may yield different LATEs due to heterogeneous treatment effects.

3. Single Instrument Robustness

Using only world CBI diffusion:
• Monetary: -0.7995* (vs -0.7524*)
• Financial: 0.7325** (vs 0.6921**)
Result: Estimates very similar

4. Instrument Correlation

Democracy vs Judicial Independence:
R² = 0.062 (correlation ≈ 25%)
Result: Instruments not redundant

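2SLS Matrix Form $$\hat\beta_{\rm 2SLS} = (X'P_Z X)^{-1}X'P_Z y, \qquad P_Z = Z(Z'Z)^{-1}Z'$$

With as many instruments as endogenous regressors this reduces to $(Z'X)^{-1}Z'y$.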

Controls W included in both X and Z (self-instrument)
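
A small stdlib-only sketch of two-stage least squares for one endogenous regressor and several instruments, run as two successive least-squares fits. The data are a constructed toy example in which an unobserved confounder `u` biases OLS; none of the names or numbers come from the thesis, and the sketch returns point estimates only (standard errors from a manual second stage would be wrong).

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(rows: Matrix, y: Vector) -> Vector:
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def two_sls(x: Vector, y: Vector, instruments: List[Vector]) -> float:
    """2SLS slope for one endogenous regressor, with an intercept."""
    z_rows = [[1.0] + [z[i] for z in instruments] for i in range(len(x))]
    gamma = ols(z_rows, x)                                # first stage
    x_hat = [sum(g * v for g, v in zip(gamma, row)) for row in z_rows]
    return ols([[1.0, xh] for xh in x_hat], y)[1]         # second stage


# Toy data: two instruments, one confounder orthogonal to both
z1 = [1, -1, 1, -1, 1, -1, 1, -1]
z2 = [1, 1, -1, -1, 1, 1, -1, -1]
u = [1, 1, 1, 1, -1, -1, -1, -1]             # unobserved confounder
x = [a + b + c for a, b, c in zip(z1, z2, u)]
y = [2 * xi + 3 * ui for xi, ui in zip(x, u)]  # true effect of x is 2

beta_ols = ols([[1.0, xi] for xi in x], y)[1]
beta_2sls = two_sls(x, y, [z1, z2])
```

In this toy design OLS picks up the confounder and returns a slope of 3, while the two-stage fit recovers the true slope of 2.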

Validity Assessment: Strong instruments (F > 300), exogeneity not rejected, robust to single instrument, low instrument correlation. Used as additional check for DiD analysis.

Mechanisms

Robustness Checks

1. Parallel Trends Violations

Linear country-specific trends
Conditional parallel trends with controls
Result: Quantitatively similar estimates

2. Treatment Definition Flexibility

Multiple treatments per country allowed
Continuous intensity & direction variation
Result: Patterns remain unchanged

3. Heterogeneity-Robust Estimators

Borusyak et al. (2024) imputation
Sun & Abraham (2021) cohort effects
Cengiz et al. (2019) stacked DiD
Result: Main findings confirmed

4. Sample Construction

Full event window requirements
One-year anticipation pre-dating
CBI sub-dimension definitions
Result: Robust across variations

5. Alternative CBI Measures

LVAU indicator (Garriga 2025)
Cukierman et al. (1992) dimensions
Result: Strong correlation, comparable estimates

6. Falsification Test

Randomization procedure: effects lie in extreme tails of placebo distributions (<1% probability under null).

Bottom Line: Results robust across estimators, sample definitions, CBI measures, and treatment specifications. Placebo tests confirm effects are not artifacts.

Event Study Robustness

Alternative Event Study

Continuous Event Study

Event study CBI disaggregated

Placebo Randomization

Placebo Event studies

Alternative Explanations

1. Global Financial Crisis Effects

Concern: Effects driven by post-2008 global trends
Test: Cohort-specific effects by independence year
Result: Effects positive & significant pre-crisis

2. Euro Area Confounding

Concern: Euro adoption (1998) & SSM (2014)
Test: Exclude euro area / add ECB as control
Result: Robust when euro area excluded

3. Independence Precedent

Concern: Epistemic community effects from first increase
Test: Use only first independence change per country
Result: Similar patterns observed

4. Audience Targeting

Concern: Shift toward financial market audiences
Test: Event studies with audience indicators
Result: Audiences change little; effects hold within audiences

5. Supervision Mandate Changes

Placebo test: Replace CBI with banking supervision changes. No significant association with financial dominance found.

Data Sources

Masciandaro & Romelli (2018) supervision data
CBIE index policy dimension III (Romelli 2024)
Randomization tests following Miller et al. (2021)

Conclusion: Effects are not artifacts of global trends, euro area dynamics, audience shifts, or supervisory mandate changes.

Alt explanations I

Alt explanations II

Effect on audiences

Placebo Test: Supervision Mandate Changes

Question: Are effects driven by changes in supervisory powers rather than CBI?
Approach: Replace CBI events with banking supervision changes

Supervision Change	Two-way Fixed Effects	Gardner et al. (2024)
Increase in Banking Supervision
Masciandaro & Romelli (2018)	-0.0297	-0.0122
	(0.0188)	(0.0145)
CBI Policy Q3 (Romelli, 2024)	0.0093	0.0017
	(0.0159)	(0.0126)
Decrease in Banking Supervision
Masciandaro & Romelli (2018)	-0.0083	0.0195
	(0.0338)	(0.0272)
CBI Policy Q3 (Romelli, 2024)	0.0026	0.0323
	(0.0326)	(0.0261)

Placebo Test Result

No significant association between banking supervision changes and financial dominance communication. Effects are due to CBI changes, not supervisory mandate shifts.

Heterogeneous Effects Regarding CBI Level

	Monetary Dominance	Financial Dominance
Baseline
Full sample	-0.1607*** (0.0554)	0.0548*** (0.0200)
CBI level before independence
Low	-0.1464** (0.0606)	0.0474* (0.0243)
Medium	-0.1728** (0.0759)	0.0741*** (0.0266)
High	-0.1082* (0.0654)	0.0371 (0.0253)

Methodological Notes

Stratification: Countries divided by pre-independence CBI index: Low (CBI < 0.5), Medium (0.5 ≤ CBI ≤ 0.6), High (CBI > 0.6)
Sample distribution: 12 high-CBI, 24 medium-CBI, 36 low-CBI independence events. Never-treated: 7 low-CBI, 4 medium-CBI, 6 high-CBI countries
Key finding: Effects strongest for medium-CBI countries, weakest for already high-CBI countries

CBI level

Appendix: Chapter 3

Workflow BERT

Topics validation

Methods Overview

BERTopic Model Pipeline

Core Pipeline Steps

1 Document Embeddings

all-MiniLM-L6-v2 generates 384-dimensional vectors

2 Text Vectorization

CountVectorizer with n-grams (2-4), stopword filtering

3 Dimensionality Reduction

UMAP (5 components, cosine distance)

4 Topic Clustering

BERTopic with c-TF-IDF refinement
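
The c-TF-IDF refinement in step 4 can be illustrated with a stdlib-only sketch. It uses the class-based weighting tf × log(1 + A / f) described in the BERTopic documentation, where A is the average number of words per cluster and f the frequency of a word across all clusters. Whitespace tokenization and the toy clusters are assumptions; the actual pipeline uses a CountVectorizer with 2-4-grams and stopword filtering.

```python
import math
from collections import Counter
from typing import Dict, List


def c_tf_idf(clusters: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Class-based TF-IDF: each cluster is treated as one long document."""
    counts = {c: Counter(" ".join(docs).lower().split())
              for c, docs in clusters.items()}
    total: Counter = Counter()
    for cnt in counts.values():
        total.update(cnt)
    avg_words = sum(total.values()) / len(counts)   # A: mean words per cluster
    return {
        c: {w: tf * math.log(1 + avg_words / total[w]) for w, tf in cnt.items()}
        for c, cnt in counts.items()
    }


# Toy clusters standing in for grouped speech segments
toy = {
    "climate": ["green bonds climate risk", "climate transition plans"],
    "payments": ["instant payments system", "digital euro payments"],
}
weights = c_tf_idf(toy)
top_climate = max(weights["climate"], key=weights["climate"].get)
top_payments = max(weights["payments"], key=weights["payments"].get)
```

The highest-weighted terms per cluster then serve as the topic's keyword representation.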

Quality Features

• Efficiency: Distilled BERT model
• Noise reduction: Multi-word filtering
• Coherence: Two-step outlier removal
• Analysis: Temporal trend extraction

Markov Chain Analysis: Measuring Responsiveness

First-Mover Identification

Threshold criteria:
• Topic discussed in ≥25% of speech
• ≥3 other CBs discuss topic in subsequent year
Purpose: Identify main topics with follow-up (not noise)

Transition Count $$m_{ab \mid k} = \mathrm{count}\{t : CB_t = a, CB_{t+1} = b, T_t = k\}$$

How often bank a is followed by bank b for topic k

Conditional Transition Probability $$P_{ab \mid k} = \frac{m_{ab \mid k}}{\displaystyle \sum_{c \in \mathcal{B}} m_{ac \mid k}}$$

Row-stochastic matrix: each row sums to 1
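
The two formulas above can be sketched as a count-then-normalize routine. The input format (a date-ordered list of bank and dominant-topic pairs), the function name `transition_probabilities` and the toy sequence are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Speech = Tuple[str, str]  # (central bank, dominant topic), in date order


def transition_probabilities(speeches: List[Speech], topic: str
                             ) -> Dict[str, Dict[str, float]]:
    """Estimate P_{ab|k}: bank a speaking on topic k is followed by bank b."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for (bank_a, topic_a), (bank_b, _) in zip(speeches, speeches[1:]):
        if topic_a == topic:               # condition on T_t = k
            counts[bank_a][bank_b] += 1    # m_{ab|k}
    return {
        a: {b: n / sum(row.values()) for b, n in row.items()}
        for a, row in counts.items()
    }


toy_sequence = [
    ("BdF", "climate"), ("ECB", "climate"), ("DNB", "climate"),
    ("ECB", "monetary"), ("BdF", "climate"), ("ECB", "climate"),
    ("BdF", "climate"), ("DNB", "climate"),
]
probs = transition_probabilities(toy_sequence, "climate")
```

In the toy sequence, two of the three climate speeches by BdF are followed by an ECB speech, so the BdF to ECB entry is 2/3 and each row sums to 1.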

Time-Varying Implementation

30-day moving window to:
• Smooth short-term fluctuations
• Capture communication cycles
• Ensure sufficient observations
• Exclude non-stationary effects

Sliding Window Probabilities $$m_{ab \mid k}^{(w)} = \mathrm{count}\{t \in \text{window } w : CB_t = a, CB_{t+1} = b, T_t = k\}$$ $$P_{ab \mid k}^{(w)} = \frac{m_{ab \mid k}^{(w)}}{\displaystyle \sum_{c \in \mathcal{B}} m_{ac \mid k}^{(w)}}$$

Markov Property

Next state depends only on present state. Enables measurement of direct responsiveness between central banks.

Robustness: Alternative windows tested (7, 14, 60, 90 days). Uses highest topic probability from BERTopic. Discrete time implementation for simplicity.

Regression Models: ECB Responsiveness to NCBs

Model A: Baseline Responsiveness $$EB_{t,p} = \alpha_p + EB_{t-1,p} + \beta_1 NCB_{t-1,p} + \epsilon_{t,p}$$

Model B: Topic Heterogeneity $$EB_{t,p} = \alpha_p + EB_{t-1,p} + \beta_1 NCB_{t-1,p} + \sum_{j=2}^{T} \beta_{2,j} (D_{p,j} \times NCB_{t-1,p}) + \epsilon_{t,p}$$

Model C: Pressure Interactions $$\begin{aligned} EB_{t,p} = & \alpha_p + EB_{t-1,p} + \beta_1 NCB_{t-1,p} + \gamma C_t \\ & + \sum_{j=2}^{T} \beta_{2,j} (D_{p,j} \times NCB_{t-1,p}) \\ & + \sum_{j=2}^{T} \beta_{3,j} (C_t \times NCB_{t-1,p} \times D_{p,j}) \\ & + \sum_{j=2}^{T} \beta_{4,j} (C_t \times D_{p,j}) + \epsilon_{t,p} \end{aligned}$$

Three-way interactions test pressure effects (H2, H3)

Pressure Proxies (Ct)

Economic: Real GDP growth, inflation
Public: ECB trust (Eurobarometer), Google Trends
Salience: "ECB" search interest (0-100 scale)

Index Construction $$I_{b,q,t} = \frac{1}{|S_{b,q}|} \sum_{s \in S_{b,q}} p_{s,t}$$

Average proportion of topic t across all speeches $S_{b,q}$ given by bank b in quarter q
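
A short sketch of the index, reading b as the bank, q as the quarter, t as the topic and p as the topic proportion of a speech. The record layout (dictionaries with `bank`, `quarter` and `topics` keys) and the toy values are assumptions for illustration.

```python
from typing import Dict, List, Optional


def topic_index(speeches: List[Dict], bank: str, quarter: str,
                topic: str) -> Optional[float]:
    """Mean proportion of `topic` over a bank's speeches in one quarter."""
    shares = [s["topics"].get(topic, 0.0)   # speeches without the topic count as 0
              for s in speeches
              if s["bank"] == bank and s["quarter"] == quarter]
    return sum(shares) / len(shares) if shares else None


toy_speeches = [
    {"bank": "ECB", "quarter": "2021Q1", "topics": {"climate": 0.4, "monetary": 0.6}},
    {"bank": "ECB", "quarter": "2021Q1", "topics": {"monetary": 1.0}},
    {"bank": "ECB", "quarter": "2021Q2", "topics": {"climate": 0.2}},
]
climate_q1 = topic_index(toy_speeches, "ECB", "2021Q1", "climate")
```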

Technical Details

Controls: GDP growth, inflation (Fed literature)
Standard errors: Panel-corrected (Beck & Katz 2006)
Counterfactuals: Contemporaneous, no correlation, reverse causality
Robustness: Lead specifications test reverse causality

Progressive Testing: Model A → general responsiveness | Model B → topic differences | Model C → pressure conditions. Three counterfactuals examined.

NCBs first movers

ECB direct responsive

NCBs lead 3 month lags

Permutations

Permutation-Based Validation of Transition Patterns

Research Question

"If topic sequence was completely random, how often would we see patterns at least as extreme as observed?"

Permutation Procedure

1. Dichotomization: Speeches >25% topic share → Low/High intensity (median cutoff)
2. Transition matrices: Four first-order Markov probabilities
3. Null distribution: 100 permutations shuffling topic intensities
4. Preservation: Marginal distributions, speech counts, seasonality
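
The logic of the procedure can be sketched with a simplified statistic: the share of consecutive speeches that stay in the same intensity band. This is an illustrative stand-in, not the thesis implementation. Plain shuffling preserves only the Low/High marginal counts, not seasonality; the function names, the seed and the toy sequence are assumptions.

```python
import random
from typing import List


def continuation_share(states: List[str]) -> float:
    """Share of consecutive speeches that stay in the same intensity band."""
    pairs = list(zip(states, states[1:]))
    return sum(a == b for a, b in pairs) / len(pairs)


def permutation_p_value(states: List[str], n_perm: int = 100,
                        seed: int = 0) -> float:
    """Share of shuffled sequences at least as persistent as the observed one."""
    rng = random.Random(seed)
    observed = continuation_share(states)
    shuffled = states[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)          # keeps the Low/High marginal counts
        if continuation_share(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)   # add-one correction avoids p = 0


toy_states = ["L"] * 10 + ["H"] * 10   # toy, strongly persistent sequence
p_value = permutation_p_value(toy_states)
```

With 100 permutations the smallest attainable p-value is 1/101, so a finer resolution would need more shuffles.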

Global Test Result

χ² test rejects randomness at p < 0.001. Sequence far from "noise machine."

Observed vs Expected Patterns

Within-band continuations: Markedly higher than null
Cross-band jumps: Suppressed below expectation

Topic-Specific Patterns

Core macro topics: Strongest excess probabilities (monetary policy, economic indicators, financial markets)
Digital/supervision: Mixed patterns, spikes in high-pressure moments
National economy: Aligns with null (ECB avoids domestic debates)

Conclusion: Transition structure statistically and substantively distinct from random sequencing. Intensity conditioned by strategic topic salience.

Topic Regrouping

Coefficient Plot Crisis

Coefficient Plot Leads

Coefficient Plot Half Yearly

Crisis Topic Transitions Panel

Speeches frequency

Topic word clouds I

Topic word clouds II

Pre and post crisis coefficient plots

Figure Climate Transition

Figure Crisis Transition

Transition Matrix Robustness

Impact of NCB Topic × Sintra Interaction on ECB Responsiveness

Topic	FR	IT	ES	DE	NL
Monetary Policy	-0.336 (0.280)	-0.332 (0.238)	0.039 (0.209)	0.188 (0.212)	0.045 (0.250)
Economic Indicators	-0.221 (0.149)	0.344 (0.860)	-1.534 (0.959)	-0.275 (0.199)	-0.203 (0.292)
Financial Markets	-0.143 (0.091)	-0.742** (0.310)	0.144 (0.480)	-0.355** (0.147)	0.270 (0.196)
Banking Supervision	0.200 (0.248)	-1.183*** (0.404)	-0.039 (0.115)	0.030 (0.161)	0.055 (0.074)
Digital Finance	0.029 (0.227)	-0.098 (0.142)	-0.183 (0.129)	-0.001 (0.163)	-0.013 (0.064)
International Economics	-0.016 (0.183)	-0.266 (0.284)	0.160 (0.584)	-0.635** (0.297)	0.129 (0.230)
Crisis Management	0.045 (0.096)	-0.128 (0.263)	-0.655 (1.322)	0.127 (0.173)	-0.297 (0.380)
Climate	-0.078 (0.403)	0.040* (0.023)	-0.230* (0.120)	0.451 (0.522)	-0.032 (0.096)
Payment Systems	-0.053 (0.076)	0.320 (0.270)	1.121** (0.525)	0.061 (0.064)	-0.158 (0.386)
National Economy	-0.512** (0.240)	0.053 (0.035)	0.008 (0.036)	-0.012 (0.044)	-0.692** (0.310)

Note:

Coefficients and clustered standard errors for the interaction term NCB Topic × Sintra. Standard errors in parentheses. * p < 0.1, ** p < 0.05, *** p < 0.01.

Coalition

Coalition Topic Transition

Markov Topic transitions

Topic unity Climate

Topic unity Monetary

Appendix: Chapter 4

Classification Examples

Classification	Example
Monetary Dominance	"...monetary policy...without prejudice to our primary mandate of safeguarding price stability." (ECB, 2021) Key: Price stability mandate above other priorities.
Fiscal Dominance	"...funds could be spend toward the direct purchase of debt..." (Argentina CB, 2008) Key: Direct debt purchase over price stability.
Financial Dominance	"...flexible provision of liquidity contained market participants' concerns..." (BoJ, 2002) Key: Accommodating financial markets.
Monetary-fiscal Coordination	"...new work stream on monetary-fiscal interactions..." (ECB, 2020) Key: Direct monetary-fiscal interactions.
Monetary-financial Coordination	"...market participants...work together...for ongoing financial stability." (Fed, 2017) Key: Coordination for financial stability.

Confusion Matrices

Crisis

Democracy

GDP PPP

Inflation

Normative by Group

Normative by Topic

Polarization

Accuracy Tokens L1 L2

Accuracy Tokens L3

Spreads

Stability and Uncertainty

Temperature Accuracy L1 L2

Temperature L3

Topics

Addressing Class Imbalance in Classifier Performance

Class Imbalance Challenge

Problem: High error rates in low-frequency categories
Solution: Focus on minority categories of interest (fiscal/financial dominance)

Validation Metrics

Five standard metrics used:
• Accuracy • F1 macro • F1 weighted
• Precision • Recall
Key focus: Macro average F1 score

Performance Formulas $$\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}$$ $$\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}$$ $$\text{F1} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

Why Macro F1?

Unweighted average of per-class F1 scores → sensitive to performance in minority categories (fiscal/financial dominance).
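
A stdlib-only sketch of the metric, showing why macro F1 reacts to a single missed minority-class sentence while accuracy barely moves. The labels are toy data; in practice scikit-learn's `f1_score(y_true, y_pred, average="macro")` provides the same quantity.

```python
from typing import Dict, List


def per_class_f1(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """F1 score for every label that appears in truth or prediction."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[label] = (2 * precision * recall / (precision + recall)
                         if precision + recall else 0.0)
    return scores


def macro_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy validation set: 8 "none" sentences, 2 "fiscal"; one fiscal sentence missed
truth = ["none"] * 8 + ["fiscal"] * 2
pred = ["none"] * 8 + ["none", "fiscal"]
accuracy = sum(t == p for t, p in zip(truth, pred)) / len(truth)
macro = macro_f1(truth, pred)
```

Here accuracy is 0.9, while macro F1 is about 0.80 because the minority class contributes an F1 of only 2/3.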

Target Categories

Fiscal dominance: Relatively infrequent
Financial dominance: Relatively infrequent
Priority: Accurate detection of these minority classes

Implementation

Scikit-learn package for metric calculation
Macro F1 used for prompt selection
Not interested in 'none' category performance
Focus on actionable policy regime detection

Model Architectures for Central Bank Communication Classification

🔍 Encoder-Only Models

Examples: BERT, RoBERTa, FinBERT
Architecture: Bidirectional attention for contextual understanding

✅ Advantages

• Superior classification accuracy
• Fast inference for regime categorization
• Domain adaptation (FinBERT for finance)
• Computational efficiency

❌ Disadvantages

No text generation capability
Fixed input length constraints
Single-task optimization only

📝 Decoder-Only Models

Examples: GPT-4, Claude, Gemini
Architecture: Autoregressive generation with causal attention

✅ Advantages

• Flexible output + explanations
• Few-shot learning capability
• Long context handling
• Reasoning explanations

❌ Disadvantages

High computational cost
Inconsistent classification outputs
Hallucination risks
Complex fine-tuning requirements

🔄 Encoder-Decoder Models

Examples: T5, BART, Flan-T5
Architecture: Bidirectional encoder + autoregressive decoder

✅ Advantages

• Best of both worlds
• Structured output formatting
• Task versatility
• Controllable generation

❌ Disadvantages

Higher model complexity
Memory intensive training
Sophisticated training required
May be overkill for simple tasks

💡 Recommendation

Encoder-only models (especially FinBERT) optimal for pure classification tasks. Use decoder-only when explanations needed. Encoder-decoder offers flexibility but may be unnecessary for regime categorization.

Classification Shares: Human Coders vs GPT-3.5

Classification	Validation (Humans)	Validation (GPT-3.5)	Full Dataset
NA	0.0%	0.0%	0.1%
Financial dominance	3.8%	10.7%	9.7%
Fiscal dominance	1.2%	2.6%	2.9%
Monetary dominance	6.1%	10.1%	10.1%
Monetary-financial coordination	10.9%	14.7%	14.8%
Monetary-fiscal coordination	2.5%	4.5%	3.6%
None	75.5%	57.4%	58.9%

GPT-3.5 "Yea-Sayer" Bias

Over-assigns dominance/coordination labels at expense of "none" category. Gemini model (Chapter 2) shows reduced bias.

Macro F1 Advantage

Sensitive to changes in minority categories we care about most (fiscal/financial dominance).

Website & Replication

Replication: GitHub
Dashboards: centralbanktalk.eu