Undercurrent Events


A Methodological Audit of Hill & Haughey's "Shaky Political Science Misses Mark on Ranked Choice Voting"


Response to: Steven Hill and Paul Haughey, “Shaky political ‘science’ misses mark on ranked choice voting,” SSRN Working Paper (November 2025). https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5238675


Summary

Steven Hill and Paul Haughey published a 38-page paper on SSRN in November 2025 reviewing 41 studies on ranked choice voting. Their thesis: critical studies are methodologically deficient, while supportive studies reflect genuine political science. They single out researchers by name, accuse them of “sloppy work,” and devote an entire section to attacking a specific political scientist’s tweets and reform preferences.

This audit examines the papers Hill & Haughey attack at the source, rather than through their characterizations. The finding is unambiguous: the studies Hill attacks are, on average, more methodologically rigorous than the studies he praises. The attacked studies more frequently use formal causal identification, larger samples, and more sophisticated statistical techniques. The praised studies more frequently rely on descriptive case studies, advocacy-organization reports, and single-election snapshots. Hill & Haughey dismiss surveys as invalid when they produce unfavorable results, then cite surveys approvingly when they produce favorable ones. They demand peer review from critics, then fail to disclose — or fail to know — when the papers they attack are peer-reviewed, including in the most prestigious journal in the discipline. They present an advocacy brief co-authored by the co-founder of FairVote, citing six FairVote reports as counter-evidence, while accusing others of operating in an “unvirtuous circle” of conflicted citation.

This is not a methodological critique. It is a prosecutor’s brief wearing footnotes.


I. The papers Hill attacks: what they actually do

McCarty (2024), “Minority Electorates and Ranked Choice Voting”

Hill says: “Sloppy work that is poor political science.” Claims McCarty “invented two new metrics” to “cherry-pick” results.

The paper: Uses cast vote records from the NYC Board of Elections and voter demographic data from L2 at the electoral district level (N ≈ 5,600 districts). Multivariate regressions with borough fixed effects and robust standard errors across multiple NYC Democratic primary races and Alaska’s 2022 elections. The “adjusted exhaustion” measure strips out voters who ranked a finalist first — because they mechanically cannot exhaust — isolating the behavioral question of truncation among voters whose preferred candidates were eliminated. This is standard methodological refinement, not cherry-picking. Hill also claims McCarty “failed to mention” NYC’s diverse electoral outcomes; McCarty’s narrower point — that differential exhaustion patterns exist — is logically compatible with diverse winners. Both things can be simultaneously true.
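To make the adjustment concrete, here is a minimal sketch of the logic as described above. This is not McCarty's actual code, and the ballot representation (an ordered list of candidate ids) is an assumption:

```python
# Minimal sketch of raw vs. adjusted ballot exhaustion, following the
# description above. The ballot representation is an assumption, not
# McCarty's actual data structure.

def exhaustion_rates(ballots, finalists):
    """ballots: list of rankings; finalists: set of final-round candidates."""
    exhausted = [b for b in ballots if not set(b) & finalists]
    raw_rate = len(exhausted) / len(ballots)

    # Adjusted denominator: drop voters who ranked a finalist first,
    # since their ballots mechanically cannot exhaust.
    at_risk = [b for b in ballots if b and b[0] not in finalists]
    adj_exhausted = [b for b in at_risk if not set(b) & finalists]
    adj_rate = len(adj_exhausted) / len(at_risk) if at_risk else 0.0
    return raw_rate, adj_rate
```

The adjusted denominator isolates truncation behavior among the only voters who could exhaust in the first place, which is what makes it a refinement of the standard measure rather than a novelty metric.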

Where Hill has a partial point: McCarty’s claim “I am not aware of any study that finds a boost in turnout associated with switching to RCV from plurality voting” is selective — Kimball & Anthony found a 32.7-point reduction in drop-off between rounds.

Hill’s accuracy: ~35%.


McDaniel (2016), “Writing the Rules to Rank the Candidates”

Hill says: “A textbook example of poor political science with little redeeming value.”

The paper: Peer-reviewed in the Journal of Urban Affairs (three anonymous reviewers). Uses geographically weighted ecological inference (GWR-EI) models — a recognized technique, developed by Calvo & Escolar (2003), that accounts for spatial autocorrelation — with multilevel mixed-effects regression, controlling for incumbency, candidate race, population age, education, income, racial diversity, and cross-level interactions. Hill claims McDaniel “failed to mention that the 2011 mayoral election did not have any Black candidates.” This is false — McDaniel’s Table 1 explicitly codes candidate race by election, and the absence of a Black candidate in 2011 is central to his racial group interest theory framework. Hill is correct that two low-competitiveness elections limit the causal claims and that national turnout trends aren’t controlled for.

Hill’s accuracy: ~30%.


Atkinson, Foley & Ganz (2024), “Beyond the Spoiler Effect”

Hill says: “Too removed from reality to have any value.”

The paper: Explicitly finds that IRV has better centripetal force than plurality. In low-polarization electorates, IRV’s C-force is 0.517 vs. plurality’s 0.374. The paper’s actual claim is that IRV falls short of Condorcet, especially in high-polarization electorates — precisely where reformers promote it. Hill frames this as “RCV increases polarization,” which inverts the paper’s finding relative to the status quo. The 60% vs. 99% Condorcet-winner calibration gap is a legitimate concern about the model’s external validity.

Hill’s accuracy: ~40%.


Buisseret & Prato (2023), “Politics Transformed?”

Hill says: “Dense mathematical model,” “too complex and convoluted to describe.”

The paper: Game-theoretic model from Harvard and Columbia identifying precise equilibrium conditions under which RCV encourages broad vs. base-targeting strategies. Finds that in high-apathy or high-polarization environments — precisely the U.S. contexts where RCV is promoted — RCV can intensify base-targeting more than plurality. The mechanism: under RCV, a candidate can win by beating a similar candidate to inherit their second preferences, creating incentives to differentiate rather than converge. This models strategic candidate behavior that observational studies cannot capture. Hill responds with one mayoral race as a counterexample — comparing an existence proof to a conditional theorem.
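A toy tally makes the mechanism visible. This is an illustration of second-preference inheritance under instant-runoff voting with invented numbers, not Buisseret & Prato's formal model:

```python
# Toy illustration of second-preference inheritance under IRV.
# Hypothetical electorate: A and B are similar "base" candidates,
# C is the opponent. Numbers are invented for illustration.
from collections import Counter

ballots = (
    [["A", "B"]] * 28    # A-first voters list B second
  + [["B", "A"]] * 24    # B-first voters list A second
  + [["C"]] * 48         # C voters rank no one else
)

def irv(ballots):
    """Bare-bones instant-runoff tally: eliminate the last-place
    candidate and transfer ballots until someone holds a majority."""
    active = {c for b in ballots for c in b}
    while True:
        tally = Counter(next((c for c in b if c in active), None)
                        for b in ballots)
        tally.pop(None, None)          # drop exhausted ballots
        leader, votes = tally.most_common(1)[0]
        if votes * 2 > sum(tally.values()):
            return leader, dict(tally)
        active.remove(min(tally, key=tally.get))

print(irv(ballots))
# Round 1: C 48, A 28, B 24 -- no majority, B eliminated.
# Round 2: A 52 (inheriting B's seconds) beats C 48.
# A's winning move was outpolling the similar candidate B, not
# converging toward C's voters: the differentiation incentive above.
```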

Hill’s accuracy: ~15%.


Colner (2024), “Running Toward Rankings”

Hill says: Listed as a New America web-hosted study; author “clearly did not possess enough local knowledge.”

The paper: Published in the American Journal of Political Science — the single most prestigious journal in the discipline. Preregistered difference-in-differences design with genetic matching across 273 cities and 6,000+ elections over three decades. Finds transient, low-quality candidate entry with no effect on diversity. Hill’s San Francisco public financing critique is a legitimate local-knowledge point that doesn’t invalidate a 273-city study with city fixed effects. Hill either didn’t know or didn’t disclose that this paper was published in AJPS. Calling an AJPS paper “not very illuminating” while praising FairVote descriptive reports as credible science is an indefensible asymmetry.
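For readers unfamiliar with the design, a minimal two-way fixed-effects sketch of this kind of difference-in-differences estimate follows. It is a generic illustration with hypothetical column names, and it omits the preregistration and genetic-matching steps of the actual study:

```python
# Generic two-way fixed-effects difference-in-differences sketch.
# Column names (city, year, rcv, outcome) are hypothetical; Colner's
# actual specification and data differ.
import statsmodels.formula.api as smf

def did_estimate(df):
    """df: one row per city-election; rcv = 1 in city-years after adoption."""
    model = smf.ols("outcome ~ rcv + C(city) + C(year)", data=df)
    # Cluster standard errors by city, the level at which treatment varies.
    return model.fit(cov_type="cluster", cov_kwds={"groups": df["city"]})
```

The city fixed effects are what make a single-city omitted variable, like San Francisco's public financing, a noise term rather than a fatal confound.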

Hill’s accuracy: ~10%.


Pettigrew & Radley (2025), “Overvotes, Overranks, and Skips”

Hill says: Claims were “baseless and largely out of context,” relying on Parry & Kidd’s rebuttal as his primary evidence. Hill presents this as his single most rigorous specific critique — the one resting on a “third-party scholarly rebuttal” rather than just his own assertions.

The paper: Published in Political Behavior — one of the top three journals in the subfield — after peer review. Analyzes 3 million cast vote records across Alaska, Maine, New York City, and San Francisco, representing over three-quarters of all Americans living in RCV jurisdictions. Develops a systematic typology of ballot mismarks (overvotes, overranks, skips) and finds a 4.8% mismark rate, with a 0.53% final-round rejection rate vs. 0.04% for non-RCV races on the same ballot. Includes precinct-level demographic analysis showing mismarks are significantly higher in precincts with more Black voters, more Hispanic voters, more residents below the poverty line, and fewer residents with bachelor’s degrees. Replication data and code are publicly available. No external funding disclosed.
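A rough sketch of how the three mismark categories can be coded from a ballot grid follows. The definitions here are common-sense readings of the category names, not necessarily P&R's exact coding rules (those are in their replication materials), and the grid representation is an assumption:

```python
# Sketch of the three mismark categories named above. Coding rules are
# plausible readings of the category names, not P&R's exact definitions.

def classify_mismarks(columns):
    """columns[i] = set of candidate ids marked at ranking position i."""
    flags = set()
    if any(len(col) > 1 for col in columns):
        flags.add("overvote")   # two or more candidates at one ranking
    marks = [c for col in columns for c in col]
    if len(marks) > len(set(marks)):
        flags.add("overrank")   # one candidate marked at several rankings
    used = [bool(col) for col in columns]
    if any(not u and any(used[i + 1:]) for i, u in enumerate(used)):
        flags.add("skip")       # a blank ranking followed by a used one
    return flags

print(classify_mismarks([{"B"}, set(), set(), set(), {"A"}]))  # {'skip'}
```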

Hill’s counter-evidence, the Parry & Kidd “rebuttal”: A blog post on the website of the Institute for Mathematics and Democracy at Utah Valley University. Not peer-reviewed. Contains no original data analysis. Its key counterpoint cites FairVote’s “17% more votes make a difference” report. What the Parry & Kidd post gets wrong:

It attacks a claim Pettigrew & Radley don’t make. Parry & Kidd accuse P&R of “exaggerating the issue over 10 times” because most mismarked ballots are counted. But P&R explicitly report both numbers — 4.8% mismarks and 0.53% rejection rate — and clearly distinguish between them throughout the paper. P&R never claim 1 in 20 ballots was rejected. They claim 1 in 20 was mismarked, which is what the data show.

It confuses machine readability with voter comprehension. Parry & Kidd’s core theoretical move is that skips and overranks are “processable” by the instant-runoff algorithm and therefore shouldn’t count as errors. But P&R’s point is that mismarks are evidence of voter confusion, not that they invalidate ballots. A voter who ranks candidate B first, skips rankings 2-4, and ranks candidate A fifth almost certainly didn’t understand the ballot. The algorithm handles it; the voter didn’t intend it. Parry & Kidd’s argument that “we should not be in the business of second guessing voter intent” is precisely the analytical abdication P&R are trying to correct.
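For concreteness, this is roughly how an instant-runoff tally processes that ballot, using the grid representation from the sketch above. It is a generic illustration; handling rules vary by jurisdiction (some, for example, stop counting a ballot after two consecutive skipped rankings):

```python
# Generic illustration of skip handling in an IRV tally: blank ranking
# columns are passed over, so the ballot above counts cleanly even though
# its marking pattern suggests the voter misunderstood the grid.

def effective_ranking(columns):
    """Compress a marked grid (one set of marks per ranking column)
    into the ordered list of candidates the tally actually uses."""
    return [next(iter(col)) for col in columns if len(col) == 1]

print(effective_ranking([{"B"}, set(), set(), set(), {"A"}]))
# -> ['B', 'A']: processable by the algorithm, invisible as confusion.
```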

It ignores the demographic findings — the most important results in the paper. P&R show mismark rates are significantly higher in precincts with more Black and Hispanic voters, more poverty, and less education. If skips and overranks were just “political expression” as Parry & Kidd claim, they wouldn’t concentrate systematically among the least-educated, lowest-income voters. The demographic gradient is the equity signal. Parry & Kidd don’t address it.

The “10x a small number is still small” argument is normative, not analytical. Whether 0.53% vs. 0.04% is “acceptable” is a policy judgment. P&R don’t claim RCV should be abolished; they say these findings “raise key questions.” The analytically important result is that the gap is consistent across all four jurisdictions and both high-salience and low-salience races.

The time-series argument undermines Parry & Kidd. They claim initial error increases will “fade with experience.” But P&R include San Francisco’s time series from 2007-2022: rejection rates declined from 2008-2011, then plateaued for a decade and never dropped to non-RCV levels. Even after 18 years of use, SF’s RCV rejection rates remain significantly higher than non-RCV races on the same ballot.

The “17% more votes” counterpoint is a FairVote report, not an independent finding. Whether RCV’s benefits outweigh its costs is a separate question from whether mismarks are more common. P&R answer the measurement question. Parry & Kidd redirect to the policy question using advocacy data.

What Parry & Kidd get right: The distinction between processable mismarks and ballot-invalidating overvotes is a useful taxonomic point. But it is an interpretive contribution, not a methodological demolition — and presenting it as the latter, as Hill does, is what’s misleading.

The asymmetry in full: A peer-reviewed paper in Political Behavior with 3 million observations, public replication data, and no external funding — attacked on the strength of a non-peer-reviewed blog post with no original data analysis, citing FairVote reports. This is Hill’s strongest specific critique. It is also his most revealing.

Hill’s accuracy: ~20%.


II. The papers Hill praises: what they actually do

Reilly, Lublin & Wright (2023), “Alaska’s New Electoral System” — Study 5

Peer-reviewed in the California Journal of Politics and Policy. Funded by Unite America (reform advocacy). A descriptive case study of Alaska’s first RCV election cycle with no causal identification strategy, no regression, no counterfactual, and no control group. Qualitative interpretation of a single election cycle. Methodologically less rigorous than every attacked study assessed above.

Donovan, Tolbert & Gracey (2016), “Campaign Civility” — Study 7

Peer-reviewed in Electoral Studies. A survey study — the methodology Hill dismisses throughout the paper when it produces unfavorable results. Uses 2,432 respondents across matched cities with ordered logistic regression, jackknifed samples, and coarsened exact matching (CEM). Genuinely well-designed. But its acceptance exposes the asymmetry: Hill treats surveys as valid when they find RCV reduces negativity and invalid when they find voter confusion. The authors also explicitly acknowledge the endogeneity problem: cities that adopt RCV may have a more civil political culture already. The survey cannot distinguish “RCV caused civility” from “civil cities adopted RCV.”

FairVote Reports — Studies 11, 13, 22, 25, 38, 41

Six reports from the primary national advocacy organization for RCV, co-founded by Hill himself. None peer-reviewed. No causal identification. Descriptive tabulations and exit polls. Cited as credible counter-evidence while Hill devotes Section VI to arguing that an academic research consortium creates an “unvirtuous circle” of conflicted citation.

Kimball & Anthony (2016) — Study 34

A conference paper — not published in a peer-reviewed journal. Hill’s entire Section VI attacks SSRN and non-peer-reviewed venues for “diluting academic rigor.” He never mentions that this paper occupies the same methodological tier as the papers he dismisses.

Coll (2021) — Study 35

Peer-reviewed in Politics and Governance. A YouGov online survey of 1,000 respondents. Hill praises it for finding no demographic disparities. But it’s structurally almost identical to the Ntounias Mechanical Turk study Hill ridicules as Study 14: both are online platform surveys of respondents engaging with ranked-choice scenarios. The difference is that Coll finds favorable results.


III. The rigor comparison

Peer review.
Studies Hill attacks: McCarty, no; McDaniel, yes (Journal of Urban Affairs); Colner, yes (AJPS); Atkeson, yes (SSQ); Pettigrew & Radley, yes (Political Behavior); Buisseret & Prato, working paper (Harvard/Columbia).
Studies Hill praises: Reilly et al., yes (California Journal of Politics and Policy); Donovan et al., yes (Electoral Studies); Coll, yes (Politics and Governance); Parry & Kidd, no (blog post); six FairVote reports, no; Kimball & Anthony, no (conference paper).

Causal identification.
Studies Hill attacks: McCarty, ecological regression with borough fixed effects; Colner, preregistered difference-in-differences with genetic matching; Buisseret & Prato, formal equilibrium analysis; Pettigrew & Radley, 3 million cast vote records with demographic analysis.
Studies Hill praises: Reilly et al., none (descriptive case study); FairVote, none (tabulation); Parry & Kidd, none (no original data); Donovan et al., survey matching with CEM.

Sample / scope.
Studies Hill attacks: Colner, 273 cities and 6,000+ elections; McCarty, 5,600+ electoral districts; Pettigrew & Radley, 3 million ballots covering 78% of all US RCV voters; Buisseret & Prato, formal model.
Studies Hill praises: Reilly et al., 4 statewide races in 1 election; FairVote reports, variable; Parry & Kidd, zero original data; Donovan et al., 2,432 respondents across 10 cities.

Advocacy funding.
Studies Hill attacks: McCarty, Center for Election Confidence (anti-RCV); Colner, New America / Arnold Ventures; Pettigrew & Radley, none disclosed.
Studies Hill praises: Reilly et al., Unite America (pro-reform); the six FairVote reports, FairVote (co-founded by Hill); Parry & Kidd, Institute for Mathematics and Democracy (pro-RCV institute); RepresentWomen, pro-RCV.

IV. The Section VI ad hominem

Hill & Haughey devote an entire section to attacking my tweets, my shift from RCV advocacy to fusion voting, and my role at New America. The section does not engage with any substantive argument I have made for why proportional representation or fusion voting might be structurally superior to single-winner RCV. The actual intellectual project — that multi-winner systems, by changing the party structure, produce better outcomes than single-winner tweaks that leave the two-party duopoly intact — goes entirely unaddressed. Hill treats conclusions as evidence of bias rather than engaging the reasoning behind them.


V. What Hill gets right

The comparative benchmark argument. Many RCV-critical studies fail to compare RCV against plurality on the same metrics. If you show RCV has ballot exhaustion problems, you should also show whether plurality produces “wasted votes” at similar or higher rates. This is a legitimate and important methodological standard. Hill applies it inconsistently — never comparing RCV to other reforms like fusion, approval, STAR, or PR — but the principle is sound.


VI. Conclusion

Hill & Haughey’s paper applies an asymmetric standard so systematic it constitutes intellectual dishonesty. It demands peer review from critics while failing to disclose when attacked papers are peer-reviewed — including in the most prestigious journal in the discipline. It dismisses surveys as invalid when they produce unfavorable results while citing surveys approvingly when they support RCV. It attacks an academic research consortium for producing “an unvirtuous circle” of flawed citations while building its own case on six reports from an advocacy organization co-founded by the lead author. It presents a non-peer-reviewed blog post with no original data as a credible rebuttal of a 3-million-ballot study published in Political Behavior.

The studies Hill attacks are more rigorous than the studies he praises by every standard metric: peer review, causal identification, sample size, and replication transparency. The double standard is not incidental. It is the paper’s organizing principle.