Source M: Cook Political Partisan Voting Index (PVI) Processing#
Overview
This notebook processes Cook Political Partisan Voting Index (PVI) data to generate the Source M dataset used in the Bridge Grades methodology to measure the partisan lean of each U.S. congressional district relative to the national average.
Source M: Cook Political Partisan Voting Index (PVI) - Measures partisan lean at the district level for the House and at the state level for the Senate
The PVI values generated here will be merged into the master district-level dataset for final Bridge Grade calculations, providing context for evaluating bipartisan behavior relative to district characteristics.
Data Sources#
Input Files#
Cook PVI 1997-2025.xlsx - House district PVI data from Cook Political (paid subscription)
2025 PVI States.csv - State-level PVI data for Senate races
119th_Congress_*.csv - Congressional metadata with bioguide IDs and district assignments
Data Source Details#
Source: Cook Political (paid subscription)
Congress: 119th U.S. Congress
Download Date: April 17, 2025 (Data updates periodically. Last checked: August 8, 2025)
Coverage: Partisan Voting Index for all House districts and Senate states
Original Processing:
PVI Pre-Processing.ipynb
Outputs#
Source M: House PVI#
File: bridge_grade_source_m_house_pvi.csv
Columns:
Name: Legislator’s full name
Chamber: “House” for all records
bioguide_id: Unique legislator identifier
Party: Legislator’s political party
Cook_PVI_Party: District’s partisan lean (D/R)
Cook_PVI_Number: Numeric PVI value (0 if party mismatch)
PVI_Party_Diff: Boolean flag for party mismatches
Interpretation: Higher PVI numbers indicate stronger partisan lean in the district. Zero values indicate party mismatches where the legislator represents a district leaning toward the opposite party.
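The zeroing rule can be illustrated with a small sketch. The rows below are hypothetical values chosen for illustration, not records from the Cook data; only the column names follow the output file.

```python
import pandas as pd

# Hypothetical rows: two members aligned with their district's lean, one not
demo = pd.DataFrame({
    "Party_Abbr": ["R", "D", "D"],
    "Cook_PVI_Party": ["R", "D", "R"],
    "Cook_PVI_Number": [12, 7, 3],
})

# Flag rows where the member's party differs from the district's lean
demo["PVI_Party_Diff"] = demo["Party_Abbr"] != demo["Cook_PVI_Party"]

# Zero out the PVI number on flagged rows only
demo.loc[demo["PVI_Party_Diff"], "Cook_PVI_Number"] = 0
print(demo)
```

The third row ends up with a PVI number of 0 and a True flag, while the first two keep their original values.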
Source M: Senate PVI#
File: bridge_grade_source_m_senate_pvi.csv
Columns:
Name: Senator’s full name
Chamber: “Senate” for all records
bioguide_id: Unique legislator identifier
Party: Senator’s political party
PVI_Party: State’s partisan lean (D/R)
PVI_Number: Numeric PVI value (0 if party mismatch)
PVI_Party_Diff: Boolean flag for party mismatches
Interpretation: Higher PVI numbers indicate stronger partisan lean in the state. Zero values indicate party mismatches where the senator represents a state leaning toward the opposite party.
Technical Requirements#
Dependencies#
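pandas: Data manipulation and analysis
numpy: Numerical operations
glob: Locating the 119th Congress metadata CSV
scipy.stats: Statistical functions (imported but not used in current version)
seaborn, matplotlib: Plotting libraries (imported but not used in current version)
warnings: Warning suppression for pandas operations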
Performance Notes#
State abbreviation mapping is comprehensive for all 50 states plus DC
District key generation handles “At Large” districts as district 1
Party mismatch detection prevents inappropriate PVI scoring
All original PVI data is preserved with adjustments clearly flagged
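As a worked example of the key format, the sketch below builds a single “XX-DD” key from a state abbreviation and a district value. The helper name `district_key` is hypothetical and operates on scalars for readability; the notebook itself applies the same rule column-wise.

```python
import pandas as pd

def district_key(state_abbr, district):
    """Hypothetical scalar helper mirroring the notebook's key rule."""
    # Non-numeric labels such as "AL" (at large) become NaN
    num = pd.to_numeric(district, errors="coerce")
    # Missing, non-numeric, and 0 districts are all treated as district 1
    if pd.isna(num) or num == 0:
        num = 1
    # Zero-pad the district number to two digits
    return f"{state_abbr}-{int(num):02d}"

print(district_key("AK", "AL"))
print(district_key("IN", 8.0))
print(district_key("CA", "12"))
```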
Data Quality#
Data Integrity Notes#
PVI data is sourced from Cook Political, a respected political analysis firm
Party mismatch adjustments ensure fair evaluation of bipartisan behavior
State-district key generation handles edge cases (At Large districts, zero-padding)
Both House and Senate data are processed consistently
Key Features#
House Coverage: All 431 House districts with PVI data
Senate Coverage: All 100 Senate seats with state-level PVI data
Party Mismatch Handling: Automatic zeroing of PVI scores for opposing lean
Standardized Format: Consistent output structure for both chambers
PVI Interpretation#
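PVI Scale: Measures how much more Democratic or Republican a district/state is compared to the nation
Lean Direction: Stored as a party letter (D or R) in Cook_PVI_Party (House) or PVI_Party (Senate), following Cook's "D+5" / "R+7" notation
Numeric Values: Non-negative magnitude of the lean; the number is parsed from the text after the "+", so the output files contain no negative values
Zero Values: Indicate party mismatches or districts with no parseable lean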
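The output files store the lean as a party letter plus a number. If a single signed value is more convenient downstream, one possible conversion is sketched below. The helper `signed_pvi`, the `PVI_Signed` column, and the sign convention (Democratic positive, Republican negative) are assumptions for illustration and are not part of the notebook.

```python
import pandas as pd

def signed_pvi(party, number):
    """Hypothetical helper: combine lean letter and magnitude into one signed value."""
    # Assumed convention: D -> +1, R -> -1, anything else -> 0
    sign = party.map({"D": 1, "R": -1}).fillna(0)
    return (sign * number).astype(int)

# Illustrative values only
demo = pd.DataFrame({
    "PVI_Party": ["D", "R", "EVEN"],
    "PVI_Number": [12, 15, 0],
})
demo["PVI_Signed"] = signed_pvi(demo["PVI_Party"], demo["PVI_Number"])
print(demo)
```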
Notebook Walkthrough: Source M - Cook Political Partisan Voting Index (PVI)#
This notebook prepares the Source M dataset—Cook Political Partisan Voting Index (PVI)—used in the Bridge Grades methodology to measure the partisan lean of each U.S. congressional district relative to the national average.
Source M: Cook Political Partisan Voting Index (PVI)
Origin: Obtained via paid subscription from Cook Political
Date downloaded: April 03, 2025 (Data updates periodically. Last checked: August 8, 2025)
The PVI values generated here will be merged into the master district-level dataset for final Bridge Grade calculations.
import glob
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from pandas.errors import SettingWithCopyWarning
from scipy.stats import norm

# Silence the SettingWithCopyWarning raised by the column assignments below
warnings.simplefilter(action='ignore', category=SettingWithCopyWarning)
Load & Parse PVI Values#
We read the Excel sheet, split the "2025 Cook PVI" column into party and numeric components, and cast the numeric part to integer.
# Load raw Cook PVI for the 119th Congress (2025–26)
house_pvi = pd.read_excel(
'../Data/Source M/Input files/Cook PVI 1997-2025.xlsx',
sheet_name='119 (25-26)'
)
# Split "2025 Cook PVI" into Party and Number
house_pvi[['PVI_Party','PVI_Number']] = (
house_pvi['2025 Cook PVI']
.str.split('+', n=1, expand=True)
)
# Convert numeric part to integer; values that cannot be parsed become 0
house_pvi['PVI_Number'] = pd.to_numeric(house_pvi['PVI_Number'], errors='coerce').fillna(0).astype(int)
house_pvi.head(10)
index | State | Number | Member | Party | 2025 Cook PVI | PVI_Party | PVI_Number |
---|---|---|---|---|---|---|---|
0 | Alabama | 1 | Barry Moore | R | R+27 | R | 27 |
1 | Alabama | 2 | Shomari Figures | D | D+5 | D | 5 |
2 | Alabama | 3 | Mike Rogers | R | R+23 | R | 23 |
3 | Alabama | 4 | Robert Aderholt | R | R+33 | R | 33 |
4 | Alabama | 5 | Dale Strong | R | R+15 | R | 15 |
5 | Alabama | 6 | Gary Palmer | R | R+20 | R | 20 |
6 | Alabama | 7 | Terri Sewell | D | D+13 | D | 13 |
7 | Alaska | AL | Nick Begich | R | R+6 | R | 6 |
8 | Arizona | 1 | David Schweikert | R | R+1 | R | 1 |
9 | Arizona | 2 | Eli Crane | R | R+7 | R | 7 |
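The fallback to 0 matters for any entry that contains no "+". The sketch below runs the same split on a few hypothetical strings, assuming a district with no lean is written as "EVEN" (an assumption about the input file, not confirmed by the rows shown above).

```python
import pandas as pd

# Hypothetical inputs; "EVEN" is an assumed label for a district with no lean
raw = pd.Series(["R+27", "D+5", "EVEN"], name="2025 Cook PVI")

# Same split as the notebook: at most one split on the literal "+"
parts = raw.str.split("+", n=1, expand=True)
parts.columns = ["PVI_Party", "PVI_Number"]

# A value without "+" has no numeric part, so it is coerced to NaN and then 0
parts["PVI_Number"] = (
    pd.to_numeric(parts["PVI_Number"], errors="coerce").fillna(0).astype(int)
)
print(parts)
```

Under this assumption, such a row keeps "EVEN" in PVI_Party, so the later party comparison would flag it as a mismatch.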
Load & Parse Senate PVI#
We read the CSV of state‐level PVI, split the rounded "2025 PVI" column into party lean and numeric components, and convert the numeric part to integer. The unrounded "Raw PVI" column is left unchanged.
# Load raw Senate PVI by state
senate_pvi = pd.read_csv(
'../Data/Source M/Input files/2025 PVI States.csv'
)
# Split "2025 PVI" into Lean Party and Number
senate_pvi[['PVI_Party','PVI_Number']] = (
senate_pvi['2025 PVI']
.str.split('+', n=1, expand=True)
)
# Convert numeric part to integer, default 0 on errors
senate_pvi['PVI_Number'] = pd.to_numeric(
senate_pvi['PVI_Number'], errors='coerce'
).fillna(0).astype(int)
senate_pvi.head(10)
index | State | 2025 PVI | Raw PVI | Rank (D to R) | PVI_Party | PVI_Number |
---|---|---|---|---|---|---|
0 | Alabama | R+15 | R+14.81 | 44 | R | 15 |
1 | Alaska | R+6 | R+6.46 | 32 | R | 6 |
2 | Arizona | R+2 | R+2.06 | 27 | R | 2 |
3 | Arkansas | R+15 | R+15.31 | 46 | R | 15 |
4 | California | D+12 | D+11.52 | 6 | D | 12 |
5 | Colorado | D+6 | D+5.96 | 14 | D | 6 |
6 | Connecticut | D+8 | D+8.08 | 8 | D | 8 |
7 | Delaware | D+8 | D+8.01 | 9 | D | 8 |
8 | D.C. | D+44 | D+43.6 | 1 | D | 44 |
9 | Florida | R+5 | R+5.39 | 29 | R | 5 |
Prepare party-state-district keys#
Before merging, we add a one-letter party code to the member metadata and build a common join key in the format “XX-DD” (for example, “CA-12”). We:
Add a Party_Abbr column holding the first letter of each member's party (R, D, or I).
Map full state names to USPS abbreviations in each DataFrame.
Create a zero-padded district string, treating “At Large” as district 1.
# Read in the 119th Congress data with bioguide ids
files = sorted(glob.glob("../Data/Source C-D-E-F/Input files/119th_Congress_*.csv"))
latest = files[-1]
df_119 = pd.read_csv(latest)
# Copy the selection so the column assignments below modify an independent DataFrame
meta_data = df_119[["Name", "bioguide_id", "Party", "Chamber", "State", "District"]].copy()
# Party_Abbr is R if the party is Republican, D if Democratic, I if Independent
meta_data["Party_Abbr"] = meta_data["Party"].replace({
"Republican": "R",
"Democratic": "D",
"Independent": "I"
})
meta_data.head(10)
index | Name | bioguide_id | Party | Chamber | State | District | Party_Abbr |
---|---|---|---|---|---|---|---|
0 | Mark B. Messmer | M001233 | Republican | House | Indiana | 8.0 | R |
1 | Delia C. Ramirez | R000617 | Democratic | House | Illinois | 3.0 | D |
2 | Tim Sheehy | S001232 | Republican | Senate | Montana | NaN | R |
3 | Ben Ray Luján | L000570 | Democratic | Senate | New Mexico | NaN | D |
4 | Josh Hawley | H001089 | Republican | Senate | Missouri | NaN | R |
5 | Peter Welch | W000800 | Democratic | Senate | Vermont | NaN | D |
6 | Bernie Moreno | M001242 | Republican | Senate | Ohio | NaN | R |
7 | LaMonica McIver | M001229 | Democratic | House | New Jersey | 10.0 | D |
8 | Chrissy Houlahan | H001085 | Democratic | House | Pennsylvania | 6.0 | D |
9 | Ashley Moody | M001244 | Republican | Senate | Florida | NaN | R |
# State name → USPS abbreviations
state_abbr = {
'Alabama': 'AL', 'Alaska': 'AK', 'Arizona': 'AZ', 'Arkansas': 'AR', 'California': 'CA',
'Colorado': 'CO', 'Connecticut': 'CT', 'Delaware': 'DE', 'Florida': 'FL', 'Georgia': 'GA',
'Hawaii': 'HI', 'Idaho': 'ID', 'Illinois': 'IL', 'Indiana': 'IN', 'Iowa': 'IA',
'Kansas': 'KS', 'Kentucky': 'KY', 'Louisiana': 'LA', 'Maine': 'ME', 'Maryland': 'MD',
'Massachusetts': 'MA', 'Michigan': 'MI', 'Minnesota': 'MN', 'Mississippi': 'MS', 'Missouri': 'MO',
'Montana': 'MT', 'Nebraska': 'NE', 'Nevada': 'NV', 'New Hampshire': 'NH', 'New Jersey': 'NJ',
'New Mexico': 'NM', 'New York': 'NY', 'North Carolina': 'NC', 'North Dakota': 'ND', 'Ohio': 'OH',
'Oklahoma': 'OK', 'Oregon': 'OR', 'Pennsylvania': 'PA', 'Rhode Island': 'RI', 'South Carolina': 'SC',
'South Dakota': 'SD', 'Tennessee': 'TN', 'Texas': 'TX', 'Utah': 'UT', 'Vermont': 'VT',
'Virginia': 'VA', 'Washington': 'WA', 'West Virginia': 'WV', 'Wisconsin': 'WI', 'Wyoming': 'WY', 'District of Columbia':'DC'
}
meta_data['State_Abbr'] = meta_data['State'].str.strip().map(state_abbr)
house_pvi['State_Abbr'] = house_pvi['State'].str.strip().map(state_abbr)
senate_pvi['State_Abbr'] = senate_pvi['State'].str.strip().map(state_abbr)
# Build "State_District" key with zero-padded two digits
def make_sd(df, state_col, dist_col):
    # Convert district to numeric; non-numeric ("AL"), missing, and 0 all become district 1
    dist_num = pd.to_numeric(df[dist_col], errors='coerce').fillna(1).replace(0, 1).astype(int)
    # Zero-pad to two digits
    dist_str = dist_num.apply(lambda d: f"{d:02d}")
    return df[state_col].astype(str) + '-' + dist_str
meta_data['State_District'] = make_sd(meta_data, 'State_Abbr', 'District')
house_pvi['State_District'] = make_sd(house_pvi, 'State_Abbr', 'Number')
house_pvi.head()
index | State | Number | Member | Party | 2025 Cook PVI | PVI_Party | PVI_Number | State_Abbr | State_District |
---|---|---|---|---|---|---|---|---|---|
0 | Alabama | 1 | Barry Moore | R | R+27 | R | 27 | AL | AL-01 |
1 | Alabama | 2 | Shomari Figures | D | D+5 | D | 5 | AL | AL-02 |
2 | Alabama | 3 | Mike Rogers | R | R+23 | R | 23 | AL | AL-03 |
3 | Alabama | 4 | Robert Aderholt | R | R+33 | R | 33 | AL | AL-04 |
4 | Alabama | 5 | Dale Strong | R | R+15 | R | 15 | AL | AL-05 |
meta_data.head()
index | Name | bioguide_id | Party | Chamber | State | District | Party_Abbr | State_Abbr | State_District |
---|---|---|---|---|---|---|---|---|---|
0 | Mark B. Messmer | M001233 | Republican | House | Indiana | 8.0 | R | IN | IN-08 |
1 | Delia C. Ramirez | R000617 | Democratic | House | Illinois | 3.0 | D | IL | IL-03 |
2 | Tim Sheehy | S001232 | Republican | Senate | Montana | NaN | R | MT | MT-01 |
3 | Ben Ray Luján | L000570 | Democratic | Senate | New Mexico | NaN | D | NM | NM-01 |
4 | Josh Hawley | H001089 | Republican | Senate | Missouri | NaN | R | MO | MO-01 |
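Because state names that are missing from the mapping silently become NaN (and then "nan-01" keys), it may be worth listing them before merging. The sketch below uses a truncated, hypothetical mapping and made-up rows to show the check. The "D.C." spelling mirrors the Senate table shown earlier, whose label differs from the "District of Columbia" key in the mapping.

```python
import pandas as pd

# Truncated mapping for the demo; the notebook's state_abbr covers every state
state_abbr_demo = {"Alabama": "AL", "District of Columbia": "DC"}

# Hypothetical rows, including stray whitespace and an alternate D.C. spelling
demo = pd.DataFrame({"State": ["Alabama", " Alabama ", "D.C."]})
demo["State_Abbr"] = demo["State"].str.strip().map(state_abbr_demo)

# Collect the state labels that did not map to an abbreviation
unmapped = sorted(demo.loc[demo["State_Abbr"].isna(), "State"].unique())
print(unmapped)
```

In the notebook this particular label does not affect the outputs, since the Senate merge joins on the full state name rather than on State_Abbr.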
Merge & Filter for House#
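We left-join the PVI onto our meta_data, then isolate House members only, rename columns for clarity, flag party mismatches, and zero out the PVI number where the member's party differs from the district's lean.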
# Merge on State_District
temp = meta_data.merge(
house_pvi[['State_District','PVI_Party','PVI_Number']],
on='State_District',
how='left'
)
# Keep only House chamber
source_pvi_house = temp.query("Chamber=='House'").copy()
# Rename for consistency
source_pvi_house.rename(columns={
'PVI_Party':'Cook_PVI_Party',
'PVI_Number':'Cook_PVI_Number'
}, inplace=True)
# Flag differences
source_pvi_house['PVI_Party_Diff'] = (
source_pvi_house['Party_Abbr'] != source_pvi_house['Cook_PVI_Party']
)
# Zero out numeric PVI where party mismatches
mask = source_pvi_house['PVI_Party_Diff']
source_pvi_house.loc[mask, 'Cook_PVI_Number'] = 0
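A left join keeps members whose key finds no PVI row, and their missing lean then compares as a mismatch and is zeroed. A minimal sketch of one way to surface such rows, using pandas' `validate` and `indicator` merge options on hypothetical data (the names and keys below are invented for illustration):

```python
import pandas as pd

# Hypothetical member and PVI tables; "ZZ-01" has no PVI row on purpose
members = pd.DataFrame({
    "Name": ["Member A", "Member B", "Member C"],
    "State_District": ["AL-01", "AK-01", "ZZ-01"],
})
pvi = pd.DataFrame({
    "State_District": ["AL-01", "AK-01"],
    "PVI_Party": ["R", "R"],
    "PVI_Number": [27, 6],
})

# validate raises if a district key appears more than once in the PVI table;
# indicator adds a _merge column recording where each row was found
checked = members.merge(
    pvi, on="State_District", how="left",
    validate="many_to_one", indicator=True,
)
unmatched = checked.loc[checked["_merge"] == "left_only", "Name"].tolist()
print(unmatched)
```

An empty `unmatched` list would suggest that every member received a PVI value from the join rather than a default.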
Finalize & Export#
Select only the columns needed for Bridge Grades and save to CSV.
final_cols = [
'Name','Chamber','bioguide_id','Party',
'Cook_PVI_Party','Cook_PVI_Number','PVI_Party_Diff'
]
source_pvi_house[final_cols].to_csv(
'../Data/Source M/Output files/bridge_grade_source_m_house_pvi.csv',
index=False
)
source_pvi_house[final_cols].head(10)
index | Name | Chamber | bioguide_id | Party | Cook_PVI_Party | Cook_PVI_Number | PVI_Party_Diff |
---|---|---|---|---|---|---|---|
0 | Mark B. Messmer | House | M001233 | Republican | R | 18 | False |
1 | Delia C. Ramirez | House | R000617 | Democratic | D | 17 | False |
7 | LaMonica McIver | House | M001229 | Democratic | D | 27 | False |
8 | Chrissy Houlahan | House | H001085 | Democratic | D | 6 | False |
10 | Robert Menendez | House | M001226 | Democratic | D | 15 | False |
11 | Valerie P. Foushee | House | F000477 | Democratic | D | 23 | False |
12 | Shri Thanedar | House | T000488 | Democratic | D | 22 | False |
13 | Jimmy Patronis | House | P000622 | Republican | R | 18 | False |
14 | Randy Fine | House | F000484 | Republican | R | 14 | False |
15 | Hillary J. Scholten | House | S001221 | Democratic | D | 4 | False |
Prepare Senator Metadata & Merge#
We filter our meta_data to Senate members and merge the state-level PVI on state name. The one-letter party code (Party_Abbr) was already derived in the key-preparation step.
sen_meta = meta_data.query("Chamber=='Senate'").copy()
# Merge PVI onto senators by State
senate_merge = sen_meta.merge(
senate_pvi[['State','PVI_Party','PVI_Number']],
on='State',
how='left'
)
Flag & Adjust Party Mismatches#
If a senator’s party letter differs from the Cook‐PVI lean, we zero out the PVI number so they aren’t credited for an opposing lean.
# Flag mismatches between senator’s party and PVI lean
senate_merge['PVI_Party_Diff'] = (
senate_merge['Party_Abbr'] != senate_merge['PVI_Party']
)
# Zero‐out PVI_Number where mismatch is True
mask = senate_merge['PVI_Party_Diff']
senate_merge.loc[mask, 'PVI_Number'] = 0
Finalize & Export#
Select the necessary columns and save the CSV for Source M – Senate.
# Select output fields
final_cols = ['Name','Chamber','bioguide_id','Party','PVI_Party','PVI_Number','PVI_Party_Diff']
source_pvi_senate = senate_merge[final_cols]
# Export
source_pvi_senate.to_csv(
'../Data/Source M/Output files/bridge_grade_source_m_senate_pvi.csv',
index=False
)
senate_merge[final_cols].head(10)
index | Name | Chamber | bioguide_id | Party | PVI_Party | PVI_Number | PVI_Party_Diff |
---|---|---|---|---|---|---|---|
0 | Tim Sheehy | Senate | S001232 | Republican | R | 10 | False |
1 | Ben Ray Luján | Senate | L000570 | Democratic | D | 4 | False |
2 | Josh Hawley | Senate | H001089 | Republican | R | 9 | False |
3 | Peter Welch | Senate | W000800 | Democratic | D | 17 | False |
4 | Bernie Moreno | Senate | M001242 | Republican | R | 5 | False |
5 | Ashley Moody | Senate | M001244 | Republican | R | 5 | False |
6 | John R. Curtis | Senate | C001114 | Republican | R | 11 | False |
7 | Jon Husted | Senate | H001104 | Republican | R | 5 | False |
8 | Eric Schmitt | Senate | S001227 | Republican | R | 9 | False |
9 | Angela D. Alsobrooks | Senate | A000382 | Democratic | D | 15 | False |
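Before handing either file to the next stage, a few invariants implied by the steps above can be checked. The sketch below is one possible check, not part of the notebook: the function name `check_source_m` is hypothetical, and the demo rows are invented. It works for both chambers because the numeric column name is passed in.

```python
import pandas as pd

def check_source_m(df, number_col):
    """Hypothetical sanity check; returns a list of problems (empty if none)."""
    problems = []
    # Each legislator should appear once
    if df["bioguide_id"].duplicated().any():
        problems.append("duplicate bioguide_id")
    # The numeric part is a magnitude, so it should never be negative
    if (df[number_col] < 0).any():
        problems.append("negative PVI number")
    # Flagged mismatches should have been zeroed
    if (df.loc[df["PVI_Party_Diff"], number_col] != 0).any():
        problems.append("mismatch row with non-zero PVI")
    return problems

# Invented rows: the second one violates the zeroing rule
demo = pd.DataFrame({
    "bioguide_id": ["X000001", "X000002"],
    "PVI_Number": [10, 4],
    "PVI_Party_Diff": [False, True],
})
print(check_source_m(demo, "PVI_Number"))
```

For the House file the same function would be called with "Cook_PVI_Number" as the column name.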