Source M: Cook Political Partisan Voting Index (PVI) Processing

Overview

This notebook processes Cook Political Partisan Voting Index (PVI) data to generate the Source M dataset used in the Bridge Grades methodology to measure the partisan lean of each U.S. congressional district relative to the national average.

Source M: Cook Political Partisan Voting Index (PVI) - Measures district-level partisan lean for House members and state-level partisan lean for senators

The PVI values generated here will be merged into the master district-level dataset for final Bridge Grade calculations, providing context for evaluating bipartisan behavior relative to district characteristics.

Data Sources#

Input Files#

Cook PVI 1997-2025.xlsx - House district PVI data from Cook Political (paid subscription)
2025 PVI States.csv - State-level PVI data for Senate races
119th_Congress_*.csv - Congressional metadata with bioguide IDs and district assignments

Data Source Details#

Source: Cook Political (paid subscription)
Congress: 119th U.S. Congress
Download Date: April 17, 2025 (Data updates periodically. Last checked: August 8, 2025)
Coverage: Partisan Voting Index for all House districts and Senate states
Original Processing: PVI Pre-Processing.ipynb

Outputs#

Source M: House PVI#

File: bridge_grade_source_m_house_pvi.csv

Columns:

Name: Legislator’s full name
Chamber: “House” for all records
bioguide_id: Unique legislator identifier
Party: Legislator’s political party
Cook_PVI_Party: District’s partisan lean (D/R)
Cook_PVI_Number: Numeric PVI value (0 if party mismatch)
PVI_Party_Diff: Boolean flag for party mismatches

Interpretation: Higher PVI numbers indicate stronger partisan lean in the district. Zero values indicate party mismatches where the legislator represents a district leaning toward the opposite party.

Source M: Senate PVI#

File: bridge_grade_source_m_senate_pvi.csv

Columns:

Name: Senator’s full name
Chamber: “Senate” for all records
bioguide_id: Unique legislator identifier
Party: Senator’s political party
PVI_Party: State’s partisan lean (D/R)
PVI_Number: Numeric PVI value (0 if party mismatch)
PVI_Party_Diff: Boolean flag for party mismatches

Interpretation: Higher PVI numbers indicate stronger partisan lean in the state. Zero values indicate party mismatches where the senator represents a state leaning toward the opposite party.

Technical Requirements#

Dependencies#

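pandas: Data manipulation and analysis
numpy: Numerical operations
glob: File pattern matching to locate the congressional metadata CSV
scipy.stats: Statistical functions (imported but not used in current version)
seaborn, matplotlib: Plotting libraries (imported but not used in current version)
warnings: Warning suppression for pandas operations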

Performance Notes#

State abbreviation mapping is comprehensive for all 50 states plus DC
District key generation handles “At Large” districts as district 1
Party mismatch detection prevents inappropriate PVI scoring
All original PVI data is preserved with adjustments clearly flagged

Data Quality#

Data Integrity Notes#

PVI data is sourced from Cook Political, a respected political analysis firm
Party mismatch adjustments ensure fair evaluation of bipartisan behavior
State-district key generation handles edge cases (At Large districts, zero-padding)
Both House and Senate data are processed consistently

Key Features#

House Coverage: All 431 House districts with PVI data
Senate Coverage: All 100 Senate seats with state-level PVI data
Party Mismatch Handling: Automatic zeroing of PVI scores for opposing lean
Standardized Format: Consistent output structure for both chambers

PVI Interpretation#

PVI Scale: Measures how much more Democratic or Republican a district/state is compared to the nation
Lean Direction: Stored as a party letter (D or R) in Cook_PVI_Party (House) or PVI_Party (Senate)
Lean Magnitude: Stored as a non-negative integer in Cook_PVI_Number (House) or PVI_Number (Senate); the output files do not use signed values
Zero Values: Indicate party mismatches or neutral districts
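Cook publishes each PVI as a label such as R+27 or D+5. When a single signed number is more convenient (for sorting or plotting, say), the label can be converted. The sketch below is illustrative only: it assumes the convention that Democratic lean is positive and Republican lean is negative, and the name signed_pvi is hypothetical rather than a column in the Source M outputs.

```python
import pandas as pd

# Sample labels taken from the House table shown in this notebook
labels = pd.Series(["R+27", "D+5", "D+13", "R+1"])

# Split each label into lean party and magnitude
parts = labels.str.split("+", n=1, expand=True)
party = parts[0]
magnitude = pd.to_numeric(parts[1], errors="coerce")

# Assumed convention: Democratic lean is positive, Republican lean is negative
sign = party.map({"D": 1, "R": -1})
signed_pvi = (sign * magnitude).astype(int)

print(signed_pvi.tolist())  # [-27, 5, 13, -1]
```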

Notebook Walkthrough: Source M - Cook Political Partisan Voting Index (PVI)#

This notebook prepares the Source M dataset—Cook Political Partisan Voting Index (PVI)—used in the Bridge Grades methodology to measure the partisan lean of each U.S. congressional district relative to the national average.

Source M: Cook Political Partisan Voting Index (PVI)
Origin: Obtained via paid subscription from Cook Political
Date downloaded: April 03, 2025 (Data updates periodically. Last checked: August 8, 2025)

The PVI values generated here will be merged into the master district-level dataset for final Bridge Grade calculations.

import pandas as pd
import numpy as np
from scipy.stats import norm
import seaborn as sns
import matplotlib.pyplot as plt
import glob

import warnings
from pandas.errors import SettingWithCopyWarning

# Silence pandas' SettingWithCopyWarning for the column assignments below
warnings.simplefilter(action='ignore', category=SettingWithCopyWarning)

Load & Parse PVI Values#

We read the Excel sheet, split the "2025 Cook PVI" column into party and numeric components, and cast the numeric part to integer.

# Load raw Cook PVI for the 119th Congress (2025–26)
house_pvi = pd.read_excel(
    '../Data/Source M/Input files/Cook PVI 1997-2025.xlsx',
    sheet_name='119 (25-26)'
)

# Split "2025 Cook PVI" into Party and Number
house_pvi[['PVI_Party','PVI_Number']] = (
    house_pvi['2025 Cook PVI']
    .str.split('+', n=1, expand=True)
)

# Convert numeric part to integer; values that cannot be parsed become 0
house_pvi['PVI_Number'] = pd.to_numeric(house_pvi['PVI_Number'], errors='coerce').fillna(0).astype(int)

house_pvi.head(10)

	State	Number	Member	Party	2025 Cook PVI	PVI_Party	PVI_Number
0	Alabama	1	Barry Moore	R	R+27	R	27
1	Alabama	2	Shomari Figures	D	D+5	D	5
2	Alabama	3	Mike Rogers	R	R+23	R	23
3	Alabama	4	Robert Aderholt	R	R+33	R	33
4	Alabama	5	Dale Strong	R	R+15	R	15
5	Alabama	6	Gary Palmer	R	R+20	R	20
6	Alabama	7	Terri Sewell	D	D+13	D	13
7	Alaska	AL	Nick Begich	R	R+6	R	6
8	Arizona	1	David Schweikert	R	R+1	R	1
9	Arizona	2	Eli Crane	R	R+7	R	7
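With the split used above, a label that contains no "+" would end up whole in PVI_Party with a PVI_Number of 0. The sketch below shows one way to keep such rows distinguishable. It assumes that neutral districts are labeled "EVEN" in the source sheet, which should be checked against the actual file; the PVI_Is_Even column is hypothetical and not part of the Source M outputs.

```python
import pandas as pd

# Two labels from the table above plus an assumed neutral label
labels = pd.Series(["R+27", "D+5", "EVEN"])

# Extract party letter and magnitude only when the label matches "X+N"
parsed = labels.str.extract(r"^(?P<PVI_Party>[DR])\+(?P<PVI_Number>\d+)$")
parsed["PVI_Number"] = pd.to_numeric(parsed["PVI_Number"]).fillna(0).astype(int)

# Hypothetical flag so neutral rows are not confused with zeroed mismatches
parsed["PVI_Is_Even"] = labels.str.upper().eq("EVEN")

print(parsed)
```

Here the neutral row keeps a missing PVI_Party rather than the string "EVEN", so a later party comparison still flags it, but the extra flag records why.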

Load & Parse Senate PVI#

We read the CSV of state‐level PVI, split the "2025 PVI" column into party lean and numeric components, and convert the numeric part to integer. The unrounded "Raw PVI" column is left untouched.

# Load raw Senate PVI by state
senate_pvi = pd.read_csv(
    '../Data/Source M/Input files/2025 PVI States.csv'
)

# Split "2025 PVI" into Lean Party and Number
senate_pvi[['PVI_Party','PVI_Number']] = (
    senate_pvi['2025 PVI']
    .str.split('+', n=1, expand=True)
)

# Convert numeric part to integer, default 0 on errors
senate_pvi['PVI_Number'] = pd.to_numeric(
    senate_pvi['PVI_Number'], errors='coerce'
).fillna(0).astype(int)

senate_pvi.head(10)

	State	2025 PVI	Raw PVI	Rank (D to R)	PVI_Party	PVI_Number
0	Alabama	R+15	R+14.81	44	R	15
1	Alaska	R+6	R+6.46	32	R	6
2	Arizona	R+2	R+2.06	27	R	2
3	Arkansas	R+15	R+15.31	46	R	15
4	California	D+12	D+11.52	6	D	12
5	Colorado	D+6	D+5.96	14	D	6
6	Connecticut	D+8	D+8.08	8	D	8
7	Delaware	D+8	D+8.01	9	D	8
8	D.C.	D+44	D+43.6	1	D	44
9	Florida	R+5	R+5.39	29	R	5
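Because the file carries both a rounded label ("2025 PVI") and an unrounded one ("Raw PVI"), the two can be cross-checked before use. The following is a minimal sketch using five rows copied from the table above; the helper split_pvi, the Consistent column, and the 0.5 tolerance are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Rows copied from the state-level table above
sample = pd.DataFrame({
    "State": ["Alabama", "Alaska", "Arizona", "California", "Colorado"],
    "2025 PVI": ["R+15", "R+6", "R+2", "D+12", "D+6"],
    "Raw PVI": ["R+14.81", "R+6.46", "R+2.06", "D+11.52", "D+5.96"],
})

def split_pvi(col):
    """Split 'X+N' labels into (party letter, numeric magnitude)."""
    parts = col.str.split("+", n=1, expand=True)
    return parts[0], pd.to_numeric(parts[1], errors="coerce")

party, number = split_pvi(sample["2025 PVI"])
raw_party, raw_number = split_pvi(sample["Raw PVI"])

# A row is consistent when the lean matches and the rounded value is within 0.5 of the raw value
sample["Consistent"] = (party == raw_party) & ((number - raw_number).abs() <= 0.5)

print(sample[["State", "2025 PVI", "Raw PVI", "Consistent"]])
```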

Prepare party-state-district keys#

We add a party abbreviation that will later be compared against the PVI lean, and a common join key in the format “XX-DD” (for example, “CA-12”). We:

Abbreviate each party to its initial (R, D, or I).
Map full state names to USPS abbreviations in each DataFrame.
Create a zero-padded district string, treating “At Large” as district 1.

# Read in the 119th Congress data with bioguide ids
files = sorted(glob.glob("../Data/Source C-D-E-F/Input files/119th_Congress_*.csv"))
latest = files[-1]
df_119 = pd.read_csv(latest)
# Copy the selection so that new columns are added to an independent DataFrame
meta_data = df_119[["Name", "bioguide_id", "Party", "Chamber", "State", "District"]].copy()

# Party_Abbr is R for Republican, D for Democratic, I for Independent
meta_data["Party_Abbr"] = meta_data["Party"].replace({
    "Republican": "R",
    "Democratic": "D",
    "Independent": "I"
})

meta_data.head(10)

	Name	bioguide_id	Party	Chamber	State	District	Party_Abbr
0	Mark B. Messmer	M001233	Republican	House	Indiana	8.0	R
1	Delia C. Ramirez	R000617	Democratic	House	Illinois	3.0	D
2	Tim Sheehy	S001232	Republican	Senate	Montana	NaN	R
3	Ben Ray Luján	L000570	Democratic	Senate	New Mexico	NaN	D
4	Josh Hawley	H001089	Republican	Senate	Missouri	NaN	R
5	Peter Welch	W000800	Democratic	Senate	Vermont	NaN	D
6	Bernie Moreno	M001242	Republican	Senate	Ohio	NaN	R
7	LaMonica McIver	M001229	Democratic	House	New Jersey	10.0	D
8	Chrissy Houlahan	H001085	Democratic	House	Pennsylvania	6.0	D
9	Ashley Moody	M001244	Republican	Senate	Florida	NaN	R

# State name → USPS abbreviations

state_abbr = {
    'Alabama': 'AL', 'Alaska': 'AK', 'Arizona': 'AZ', 'Arkansas': 'AR', 'California': 'CA',
    'Colorado': 'CO', 'Connecticut': 'CT', 'Delaware': 'DE', 'Florida': 'FL', 'Georgia': 'GA',
    'Hawaii': 'HI', 'Idaho': 'ID', 'Illinois': 'IL', 'Indiana': 'IN', 'Iowa': 'IA',
    'Kansas': 'KS', 'Kentucky': 'KY', 'Louisiana': 'LA', 'Maine': 'ME', 'Maryland': 'MD',
    'Massachusetts': 'MA', 'Michigan': 'MI', 'Minnesota': 'MN', 'Mississippi': 'MS', 'Missouri': 'MO',
    'Montana': 'MT', 'Nebraska': 'NE', 'Nevada': 'NV', 'New Hampshire': 'NH', 'New Jersey': 'NJ',
    'New Mexico': 'NM', 'New York': 'NY', 'North Carolina': 'NC', 'North Dakota': 'ND', 'Ohio': 'OH',
    'Oklahoma': 'OK', 'Oregon': 'OR', 'Pennsylvania': 'PA', 'Rhode Island': 'RI', 'South Carolina': 'SC',
    'South Dakota': 'SD', 'Tennessee': 'TN', 'Texas': 'TX', 'Utah': 'UT', 'Vermont': 'VT',
    'Virginia': 'VA', 'Washington': 'WA', 'West Virginia': 'WV', 'Wisconsin': 'WI', 'Wyoming': 'WY',
    'District of Columbia': 'DC'
}

meta_data['State_Abbr'] = meta_data['State'].str.strip().map(state_abbr)
house_pvi['State_Abbr'] = house_pvi['State'].str.strip().map(state_abbr)
senate_pvi['State_Abbr'] = senate_pvi['State'].str.strip().map(state_abbr)

# Build "State_District" key with zero-padded two digits
def make_sd(df, state_col, dist_col):

    # Convert district to numeric; non-numeric ("AL"), missing, and 0 values all become 1 (at-large)
    dist_num = pd.to_numeric(df[dist_col], errors='coerce').fillna(1).replace(0, 1).astype(int)

    # Zero‐pad to two digits
    dist_str = dist_num.apply(lambda d: f"{d:02d}")
    return df[state_col].astype(str) + '-' + dist_str

meta_data['State_District'] = make_sd(meta_data, 'State_Abbr', 'District')
house_pvi['State_District'] = make_sd(house_pvi,  'State_Abbr', 'Number')

house_pvi.head()

	State	Number	Member	Party	2025 Cook PVI	PVI_Party	PVI_Number	State_Abbr	State_District
0	Alabama	1	Barry Moore	R	R+27	R	27	AL	AL-01
1	Alabama	2	Shomari Figures	D	D+5	D	5	AL	AL-02
2	Alabama	3	Mike Rogers	R	R+23	R	23	AL	AL-03
3	Alabama	4	Robert Aderholt	R	R+33	R	33	AL	AL-04
4	Alabama	5	Dale Strong	R	R+15	R	15	AL	AL-05
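To make the edge cases concrete, the sketch below runs the same key-building logic on a small hypothetical frame. The function body mirrors make_sd above; the demo rows are made up for illustration.

```python
import pandas as pd

def make_sd(df, state_col, dist_col):
    # Non-numeric, missing, and 0 districts all become 1 (at-large)
    dist_num = pd.to_numeric(df[dist_col], errors="coerce").fillna(1).replace(0, 1).astype(int)
    # Zero-pad to two digits
    dist_str = dist_num.apply(lambda d: f"{d:02d}")
    return df[state_col].astype(str) + "-" + dist_str

# Hypothetical rows: a numbered district, an "AL" label, a missing value, and a "0"
demo = pd.DataFrame({
    "State_Abbr": ["AL", "AK", "MT", "CA", "WY"],
    "District": ["1", "AL", None, "12", "0"],
})

keys = make_sd(demo, "State_Abbr", "District").tolist()
print(keys)  # ['AL-01', 'AK-01', 'MT-01', 'CA-12', 'WY-01']
```

Note that a row with a missing district (as for senators in meta_data) also receives an "-01" key, which is why the House merge below filters on Chamber afterwards.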

meta_data.head()

	Name	bioguide_id	Party	Chamber	State	District	Party_Abbr	State_Abbr	State_District
0	Mark B. Messmer	M001233	Republican	House	Indiana	8.0	R	IN	IN-08
1	Delia C. Ramirez	R000617	Democratic	House	Illinois	3.0	D	IL	IL-03
2	Tim Sheehy	S001232	Republican	Senate	Montana	NaN	R	MT	MT-01
3	Ben Ray Luján	L000570	Democratic	Senate	New Mexico	NaN	D	NM	NM-01
4	Josh Hawley	H001089	Republican	Senate	Missouri	NaN	R	MO	MO-01
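Names that are missing from the state_abbr dictionary map to NaN without raising an error. For instance, the Senate table above lists "D.C." while the dictionary keys the District as "District of Columbia". That row has no senators, so the outputs are unaffected, but a quick check along these lines can surface similar gaps before merging. The dictionary below is an abbreviated stand-in, and the names state_abbr_sample and unmapped are hypothetical.

```python
import pandas as pd

# Abbreviated stand-in for the full state_abbr dictionary
state_abbr_sample = {"Delaware": "DE", "Florida": "FL", "District of Columbia": "DC"}

# Names as they might appear in an input file (note "D.C." and trailing whitespace)
states = pd.Series(["Delaware", "D.C.", "Florida "])

mapped = states.str.strip().map(state_abbr_sample)

# Collect every name that failed to map
unmapped = sorted(states[mapped.isna()].str.strip().unique())
print(unmapped)  # ['D.C.']
```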

Merge & Filter for House#

We left-join the PVI onto our meta_data on State_District, then keep House members only and rename columns for clarity.

# Merge on State_District
temp = meta_data.merge(
    house_pvi[['State_District','PVI_Party','PVI_Number']],
    on='State_District',
    how='left'
)

# Keep only House chamber
source_pvi_house = temp.query("Chamber=='House'").copy()

# Rename for consistency
source_pvi_house.rename(columns={
    'PVI_Party':'Cook_PVI_Party',
    'PVI_Number':'Cook_PVI_Number'
}, inplace=True)

# Flag members whose party differs from the district's PVI lean
source_pvi_house['PVI_Party_Diff'] = (
    source_pvi_house['Party_Abbr'] != source_pvi_house['Cook_PVI_Party']
)

# Zero out numeric PVI where party mismatches
mask = source_pvi_house['PVI_Party_Diff']
source_pvi_house.loc[mask, 'Cook_PVI_Number'] = 0
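A left join keeps members whose State_District key finds no PVI row, leaving their PVI columns empty; those rows are then flagged as mismatches. The sketch below shows how pandas' indicator and validate options can expose such cases. The member names and the "ZZ-99" key are hypothetical and exist only to demonstrate an unmatched row.

```python
import pandas as pd

# Hypothetical members; "ZZ-99" is a deliberately unmatched key
members = pd.DataFrame({
    "Name": ["Member A", "Member B", "Member C"],
    "State_District": ["AL-01", "AL-02", "ZZ-99"],
})
pvi = pd.DataFrame({
    "State_District": ["AL-01", "AL-02"],
    "PVI_Party": ["R", "D"],
    "PVI_Number": [27, 5],
})

merged = members.merge(
    pvi,
    on="State_District",
    how="left",
    validate="many_to_one",  # raises if a district key repeats in the PVI table
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)

unmatched = merged.loc[merged["_merge"] == "left_only", "State_District"].tolist()
print(unmatched)  # ['ZZ-99']
```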

Finalize & Export#

Select only the columns needed for Bridge Grades and save to CSV.

final_cols = [
    'Name','Chamber','bioguide_id','Party',
    'Cook_PVI_Party','Cook_PVI_Number','PVI_Party_Diff'
]
source_pvi_house[final_cols].to_csv(
    '../Data/Source M/Output files/bridge_grade_source_m_house_pvi.csv',
    index=False
)

source_pvi_house[final_cols].head(10)

	Name	Chamber	bioguide_id	Party	Cook_PVI_Party	Cook_PVI_Number	PVI_Party_Diff
0	Mark B. Messmer	House	M001233	Republican	R	18	False
1	Delia C. Ramirez	House	R000617	Democratic	D	17	False
7	LaMonica McIver	House	M001229	Democratic	D	27	False
8	Chrissy Houlahan	House	H001085	Democratic	D	6	False
10	Robert Menendez	House	M001226	Democratic	D	15	False
11	Valerie P. Foushee	House	F000477	Democratic	D	23	False
12	Shri Thanedar	House	T000488	Democratic	D	22	False
13	Jimmy Patronis	House	P000622	Republican	R	18	False
14	Randy Fine	House	F000484	Republican	R	14	False
15	Hillary J. Scholten	House	S001221	Democratic	D	4	False

Prepare Senator Metadata & Merge#

We filter our meta_data to Senate members and merge the state-level PVI on the full state name. The party initial (Party_Abbr) was already derived in the key-preparation step above.

sen_meta = meta_data.query("Chamber=='Senate'").copy()

# Merge PVI onto senators by State
senate_merge = sen_meta.merge(
    senate_pvi[['State','PVI_Party','PVI_Number']],
    on='State',
    how='left'
)

Flag & Adjust Party Mismatches#

If a senator’s party letter differs from the Cook‐PVI lean, we zero out the PVI number so they aren’t credited for an opposing lean.

# Flag mismatches between senator’s party and PVI lean
senate_merge['PVI_Party_Diff'] = (
    senate_merge['Party_Abbr'] != senate_merge['PVI_Party']
)

# Zero‐out PVI_Number where mismatch is True
mask = senate_merge['PVI_Party_Diff']
senate_merge.loc[mask, 'PVI_Number'] = 0
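One consequence of this comparison is that a party abbreviation of "I" never equals a "D" or "R" lean, so Independents are always flagged and zeroed. The source does not say whether that is intended. The sketch below illustrates the behavior on hypothetical rows.

```python
import pandas as pd

# Hypothetical senators: a match, a mismatch, and an Independent
senators = pd.DataFrame({
    "Name": ["Senator A", "Senator B", "Senator C"],
    "Party_Abbr": ["R", "D", "I"],
    "PVI_Party": ["R", "R", "D"],
    "PVI_Number": [10, 5, 17],
})

# Same flag-and-zero logic as in the cell above
senators["PVI_Party_Diff"] = senators["Party_Abbr"] != senators["PVI_Party"]
senators.loc[senators["PVI_Party_Diff"], "PVI_Number"] = 0

print(senators[["Name", "Party_Abbr", "PVI_Party", "PVI_Number", "PVI_Party_Diff"]])
```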

Finalize & Export#

Select the necessary columns and save the CSV for Source M – Senate.

# Select output fields
final_cols = ['Name','Chamber','bioguide_id','Party','PVI_Party','PVI_Number','PVI_Party_Diff']
source_pvi_senate = senate_merge[final_cols]

# Export
source_pvi_senate.to_csv(
    '../Data/Source M/Output files/bridge_grade_source_m_senate_pvi.csv',
    index=False
)

senate_merge[final_cols].head(10)

	Name	Chamber	bioguide_id	Party	PVI_Party	PVI_Number	PVI_Party_Diff
0	Tim Sheehy	Senate	S001232	Republican	R	10	False
1	Ben Ray Luján	Senate	L000570	Democratic	D	4	False
2	Josh Hawley	Senate	H001089	Republican	R	9	False
3	Peter Welch	Senate	W000800	Democratic	D	17	False
4	Bernie Moreno	Senate	M001242	Republican	R	5	False
5	Ashley Moody	Senate	M001244	Republican	R	5	False
6	John R. Curtis	Senate	C001114	Republican	R	11	False
7	Jon Husted	Senate	H001104	Republican	R	5	False
8	Eric Schmitt	Senate	S001227	Republican	R	9	False
9	Angela D. Alsobrooks	Senate	A000382	Democratic	D	15	False
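Before handing the file to downstream steps, a few invariants implied by the processing above can be asserted. This is a minimal sketch on three rows copied from the output table; the checks dictionary and its keys are illustrative and not part of the original notebook.

```python
import pandas as pd

# Three rows copied from the Senate output above
out = pd.DataFrame({
    "bioguide_id": ["S001232", "L000570", "H001089"],
    "PVI_Party": ["R", "D", "R"],
    "PVI_Number": [10, 4, 9],
    "PVI_Party_Diff": [False, False, False],
})

checks = {
    # Each legislator should appear once
    "unique_ids": out["bioguide_id"].is_unique,
    # Magnitudes are never negative
    "non_negative": bool((out["PVI_Number"] >= 0).all()),
    # Every flagged mismatch must have been zeroed
    "mismatch_zeroed": bool((out.loc[out["PVI_Party_Diff"], "PVI_Number"] == 0).all()),
}

print(checks)
```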
