Source M: Cook Political Partisan Voting Index (PVI) Processing

Overview

This notebook processes Cook Political Partisan Voting Index (PVI) data to generate the Source M dataset used in the Bridge Grades methodology to measure the partisan lean of each U.S. congressional district relative to the national average.

  • Source M: Cook Political Partisan Voting Index (PVI) - Measures district-level partisan lean for the House and state-level partisan lean for the Senate

The PVI values generated here will be merged into the master district-level dataset for final Bridge Grade calculations, providing context for evaluating bipartisan behavior relative to district characteristics.

Data Sources

Input Files

  • Cook PVI 1997-2025.xlsx - House district PVI data from Cook Political (paid subscription)

  • 2025 PVI States.csv - State-level PVI data for Senate races

  • 119th_Congress_*.csv - Congressional metadata with bioguide IDs and district assignments

Data Source Details

  • Source: Cook Political (paid subscription)

  • Congress: 119th U.S. Congress

  • Download Date: April 17, 2025 (Data updates periodically. Last checked: August 8, 2025)

  • Coverage: Partisan Voting Index for all House districts and Senate states

  • Original Processing: PVI Pre-Processing.ipynb


Outputs

Source M: House PVI

File: bridge_grade_source_m_house_pvi.csv

Columns:

  • Name: Legislator’s full name

  • Chamber: “House” for all records

  • bioguide_id: Unique legislator identifier

  • Party: Legislator’s political party

  • Cook_PVI_Party: District’s partisan lean (D/R)

  • Cook_PVI_Number: Numeric PVI value (0 if party mismatch)

  • PVI_Party_Diff: Boolean flag for party mismatches

Interpretation: Higher PVI numbers indicate stronger partisan lean in the district. Zero values indicate party mismatches where the legislator represents a district leaning toward the opposite party.

Source M: Senate PVI

File: bridge_grade_source_m_senate_pvi.csv

Columns:

  • Name: Senator’s full name

  • Chamber: “Senate” for all records

  • bioguide_id: Unique legislator identifier

  • Party: Senator’s political party

  • PVI_Party: State’s partisan lean (D/R)

  • PVI_Number: Numeric PVI value (0 if party mismatch)

  • PVI_Party_Diff: Boolean flag for party mismatches

Interpretation: Higher PVI numbers indicate stronger partisan lean in the state. Zero values indicate party mismatches where the senator represents a state leaning toward the opposite party.


Technical Requirements

Dependencies

  • pandas: Data manipulation and analysis

  • numpy: Numerical operations

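  • scipy.stats: Statistical functions (imported but not used in current version)

  • seaborn, matplotlib: Plotting libraries (imported but not used in the code shown here)

  • glob: Locating the 119th_Congress_*.csv metadata file

  • warnings: Warning suppression for pandas operations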

Performance Notes

  • State abbreviation mapping is comprehensive for all 50 states plus DC

  • District key generation handles “At Large” districts as district 1

  • Party mismatch detection prevents inappropriate PVI scoring

  • All original PVI data is preserved with adjustments clearly flagged


Data Quality

Data Integrity Notes

  • PVI data is sourced from Cook Political, a respected political analysis firm

  • Party mismatch adjustments ensure fair evaluation of bipartisan behavior

  • State-district key generation handles edge cases (At Large districts, zero-padding)

  • Both House and Senate data are processed consistently

Key Features

  • House Coverage: All 431 House districts with PVI data

  • Senate Coverage: All 100 Senate seats with state-level PVI data

  • Party Mismatch Handling: Automatic zeroing of PVI scores for opposing lean

  • Standardized Format: Consistent output structure for both chambers

PVI Interpretation

  • PVI Scale: Measures how much more Democratic or Republican a district/state is compared to the nation

  • Party Letter: D indicates a Democratic lean and R a Republican lean relative to the national average

  • Number: Non-negative size of the lean in percentage points; the output files store the direction in the party column rather than as a sign

  • Zero Values: Indicate party mismatches or neutral districts
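
Some downstream analyses prefer a single signed number over a party letter plus a magnitude. The following is a minimal sketch of one way to collapse a label into such a value. It assumes a convention of Democratic lean as positive and Republican lean as negative, and it assumes neutral districts are labelled "EVEN"; `signed_pvi` is a hypothetical helper and is not part of the notebook.

```python
def signed_pvi(label: str) -> int:
    """Convert a label such as 'R+27', 'D+5', or 'EVEN' to a signed integer.

    Assumed convention (not taken from the notebook): D is positive, R is negative.
    """
    label = label.strip().upper()
    if label == "EVEN":  # assumed spelling for a neutral district
        return 0
    party, _, number = label.partition("+")
    if party not in {"D", "R"} or not number.isdigit():
        raise ValueError(f"Unrecognized PVI label: {label!r}")
    return int(number) if party == "D" else -int(number)


print([signed_pvi(x) for x in ["R+27", "D+5", "EVEN"]])  # [-27, 5, 0]
```

The Source M output files themselves keep the two-column form, so a helper like this would only be applied to the raw labels, before any party-mismatch zeroing.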


Notebook Walkthrough: Source M - Cook Political Partisan Voting Index (PVI)

This notebook prepares the Source M dataset—Cook Political Partisan Voting Index (PVI)—used in the Bridge Grades methodology to measure the partisan lean of each U.S. congressional district relative to the national average.

  • Source M: Cook Political Partisan Voting Index (PVI)

  • Origin: Obtained via paid subscription from Cook Political

  • Date downloaded: April 03, 2025 (Data updates periodically. Last checked: August 8, 2025)

The PVI values generated here will be merged into the master district-level dataset for final Bridge Grade calculations.

import glob
import warnings

import pandas as pd
import numpy as np
from scipy.stats import norm
import seaborn as sns
import matplotlib.pyplot as plt
from pandas.errors import SettingWithCopyWarning

# Silence pandas' SettingWithCopyWarning for column assignments on DataFrame slices
warnings.simplefilter(action='ignore', category=SettingWithCopyWarning)

Load & Parse PVI Values

We read the Excel sheet, split the "2025 Cook PVI" column into party and numeric components, and cast the numeric part to integer.

# Load raw Cook PVI for the 119th Congress (2025–26)
house_pvi = pd.read_excel(
    '../Data/Source M/Input files/Cook PVI 1997-2025.xlsx',
    sheet_name='119 (25-26)'
)
# Split "2025 Cook PVI" into Party and Number
house_pvi[['PVI_Party','PVI_Number']] = (
    house_pvi['2025 Cook PVI']
    .str.split('+', n=1, expand=True)
)
# Convert numeric part to integer; unparseable values become NaN, then 0
house_pvi['PVI_Number'] = pd.to_numeric(house_pvi['PVI_Number'], errors='coerce').fillna(0).astype(int)
house_pvi.head(10)
   State    Number  Member            Party  2025 Cook PVI  PVI_Party  PVI_Number
0  Alabama  1       Barry Moore       R      R+27           R          27
1  Alabama  2       Shomari Figures   D      D+5            D          5
2  Alabama  3       Mike Rogers       R      R+23           R          23
3  Alabama  4       Robert Aderholt   R      R+33           R          33
4  Alabama  5       Dale Strong       R      R+15           R          15
5  Alabama  6       Gary Palmer       R      R+20           R          20
6  Alabama  7       Terri Sewell      D      D+13           D          13
7  Alaska   AL      Nick Begich       R      R+6            R          6
8  Arizona  1       David Schweikert  R      R+1            R          1
9  Arizona  2       Eli Crane         R      R+7            R          7
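
To make the parsing step concrete, here is a small sketch on toy labels rather than the subscription data. It shows what the split-and-coerce logic does with a label that has no "+" separator. The "EVEN" label is an assumption about how a neutral district might appear in the sheet.

```python
import pandas as pd

# Toy labels; "EVEN" is an assumed example of a label without a "+" separator.
labels = pd.Series(["R+27", "D+5", "EVEN"], name="2025 Cook PVI")

parts = labels.str.split("+", n=1, expand=True)
parts.columns = ["PVI_Party", "PVI_Number"]

# A label with no "+" has no number part; it is coerced to NaN and then set to 0.
parts["PVI_Number"] = (
    pd.to_numeric(parts["PVI_Number"], errors="coerce").fillna(0).astype(int)
)
print(parts)
```

Under this logic such a row keeps the whole label in PVI_Party and gets PVI_Number 0, so it would later compare as a party mismatch for any legislator.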

Load & Parse Senate PVI

We read the CSV of state‐level PVI, split the "2025 PVI" column into party lean and numeric components, and convert the numeric part to integer.

# Load raw Senate PVI by state
senate_pvi = pd.read_csv(
    '../Data/Source M/Input files/2025 PVI States.csv'
)
# Split "2025 PVI" into Lean Party and Number
senate_pvi[['PVI_Party','PVI_Number']] = (
    senate_pvi['2025 PVI']
    .str.split('+', n=1, expand=True)
)
# Convert numeric part to integer, default 0 on errors
senate_pvi['PVI_Number'] = pd.to_numeric(
    senate_pvi['PVI_Number'], errors='coerce'
).fillna(0).astype(int)
senate_pvi.head(10)
   State        2025 PVI  Raw PVI  Rank (D to R)  PVI_Party  PVI_Number
0  Alabama      R+15      R+14.81  44             R          15
1  Alaska       R+6       R+6.46   32             R          6
2  Arizona      R+2       R+2.06   27             R          2
3  Arkansas     R+15      R+15.31  46             R          15
4  California   D+12      D+11.52  6              D          12
5  Colorado     D+6       D+5.96   14             D          6
6  Connecticut  D+8       D+8.08   8              D          8
7  Delaware     D+8       D+8.01   9              D          8
8  D.C.         D+44      D+43.6   1              D          44
9  Florida      R+5       R+5.39   29             R          5
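
The state file carries both a rounded label ("2025 PVI") and an unrounded one ("Raw PVI"). As a quick consistency check, the sketch below rebuilds the rounded label from the raw one and compares the two. It uses four rows copied from the preview; the `matches_rounded_raw` column name is made up for illustration.

```python
import pandas as pd

# Rows copied from the preview of the state-level file.
sample = pd.DataFrame({
    "State": ["Alabama", "Alaska", "California", "D.C."],
    "2025 PVI": ["R+15", "R+6", "D+12", "D+44"],
    "Raw PVI": ["R+14.81", "R+6.46", "D+11.52", "D+43.6"],
})

# Split the raw label, round the numeric part, and rebuild a "P+N" label.
raw = sample["Raw PVI"].str.split("+", n=1, expand=True)
rounded_label = raw[0] + "+" + pd.to_numeric(raw[1]).round().astype(int).astype(str)

sample["matches_rounded_raw"] = rounded_label == sample["2025 PVI"]
print(sample)
```

All four sampled rows agree. Values that fall exactly on .5 could differ depending on the rounding rule the publisher uses, so a mismatch there would not necessarily indicate a data error.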

Prepare party-state-district keys

We first add a one-letter party key that is later compared against the PVI lean. We also need a common join key with the format “XX-DD” (for example, “CA-12”). We:

  1. Add a party abbreviation based on the first letter of the party name.

  2. Map the full state names to USPS abbreviations in all three DataFrames.

  3. Create a district string with zero padding, treating “At Large” as district 1.

# Read in the 119th Congress data with bioguide ids
files = sorted(glob.glob("../Data/Source C-D-E-F/Input files/119th_Congress_*.csv"))
latest = files[-1]
df_119 = pd.read_csv(latest)
# .copy() makes meta_data an independent DataFrame rather than a slice of df_119
meta_data = df_119[["Name", "bioguide_id", "Party", "Chamber", "State", "District"]].copy()

# Party_Abbr is "R" for Republican, "D" for Democratic, "I" for Independent
meta_data["Party_Abbr"] = meta_data["Party"].replace({
    "Republican": "R",
    "Democratic": "D",
    "Independent": "I"
})
meta_data.head(10)
   Name              bioguide_id  Party       Chamber  State         District  Party_Abbr
0  Mark B. Messmer   M001233      Republican  House    Indiana       8.0       R
1  Delia C. Ramirez  R000617      Democratic  House    Illinois      3.0       D
2  Tim Sheehy        S001232      Republican  Senate   Montana       NaN       R
3  Ben Ray Luján     L000570      Democratic  Senate   New Mexico    NaN       D
4  Josh Hawley       H001089      Republican  Senate   Missouri      NaN       R
5  Peter Welch       W000800      Democratic  Senate   Vermont       NaN       D
6  Bernie Moreno     M001242      Republican  Senate   Ohio          NaN       R
7  LaMonica McIver   M001229      Democratic  House    New Jersey    10.0      D
8  Chrissy Houlahan  H001085      Democratic  House    Pennsylvania  6.0       D
9  Ashley Moody      M001244      Republican  Senate   Florida       NaN       R
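
The abbreviation step relies on `Series.replace`, which leaves any label it does not recognise unchanged. A short sketch of how that differs from `Series.map`, using a made-up fourth party label ("Libertarian") that is not taken from the metadata file:

```python
import pandas as pd

party = pd.Series(["Republican", "Democratic", "Independent", "Libertarian"])
abbr = {"Republican": "R", "Democratic": "D", "Independent": "I"}

replaced = party.replace(abbr)  # unknown labels pass through unchanged
mapped = party.map(abbr)        # unknown labels become NaN

print(replaced.tolist())      # ['R', 'D', 'I', 'Libertarian']
print(mapped.isna().tolist())  # [False, False, False, True]
```

Either way, an unrecognised label would not equal "D" or "R", so such a member would be flagged as a party mismatch in the later comparison.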
# State name → USPS abbreviations

state_abbr = {
    'Alabama': 'AL', 'Alaska': 'AK', 'Arizona': 'AZ', 'Arkansas': 'AR', 'California': 'CA',
    'Colorado': 'CO', 'Connecticut': 'CT', 'Delaware': 'DE', 'Florida': 'FL', 'Georgia': 'GA',
    'Hawaii': 'HI', 'Idaho': 'ID', 'Illinois': 'IL', 'Indiana': 'IN', 'Iowa': 'IA',
    'Kansas': 'KS', 'Kentucky': 'KY', 'Louisiana': 'LA', 'Maine': 'ME', 'Maryland': 'MD',
    'Massachusetts': 'MA', 'Michigan': 'MI', 'Minnesota': 'MN', 'Mississippi': 'MS', 'Missouri': 'MO',
    'Montana': 'MT', 'Nebraska': 'NE', 'Nevada': 'NV', 'New Hampshire': 'NH', 'New Jersey': 'NJ',
    'New Mexico': 'NM', 'New York': 'NY', 'North Carolina': 'NC', 'North Dakota': 'ND', 'Ohio': 'OH',
    'Oklahoma': 'OK', 'Oregon': 'OR', 'Pennsylvania': 'PA', 'Rhode Island': 'RI', 'South Carolina': 'SC',
    'South Dakota': 'SD', 'Tennessee': 'TN', 'Texas': 'TX', 'Utah': 'UT', 'Vermont': 'VT',
    'Virginia': 'VA', 'Washington': 'WA', 'West Virginia': 'WV', 'Wisconsin': 'WI', 'Wyoming': 'WY',
    'District of Columbia': 'DC',
}

meta_data['State_Abbr'] = meta_data['State'].str.strip().map(state_abbr)
house_pvi['State_Abbr'] = house_pvi['State'].str.strip().map(state_abbr)
senate_pvi['State_Abbr'] = senate_pvi['State'].str.strip().map(state_abbr)
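
Because `.map` returns NaN for names missing from the dictionary, it can be worth listing which names failed to map. The sketch below does this on toy data with a three-entry excerpt of the mapping. The state-level preview spells the district as "D.C." while the dictionary key is 'District of Columbia', so that row would receive no abbreviation. The Senate merge joins on the full State name, so this should not affect the Senate output.

```python
import pandas as pd

# Three-entry excerpt; the notebook's dictionary covers all 50 states plus DC.
state_abbr = {"Delaware": "DE", "Florida": "FL", "District of Columbia": "DC"}

states = pd.Series(["Delaware", "D.C.", " Florida "], name="State")
abbr = states.str.strip().map(state_abbr)

# Names that found no match in the dictionary
unmapped = states[abbr.isna()].str.strip().tolist()
print(unmapped)  # ['D.C.']
```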
# Build "State_District" key with zero-padded two digits
def make_sd(df, state_col, dist_col):

    # Convert district strings to numeric, default 1 for non‐numeric (at‐large)
    dist_num = pd.to_numeric(df[dist_col], errors='coerce').fillna(1).replace(0, 1).astype(int)

    # Zero‐pad to two digits
    dist_str = dist_num.apply(lambda d: f"{d:02d}")
    return df[state_col].astype(str) + '-' + dist_str

meta_data['State_District'] = make_sd(meta_data, 'State_Abbr', 'District')
house_pvi['State_District'] = make_sd(house_pvi,  'State_Abbr', 'Number')
house_pvi.head()
   State    Number  Member           Party  2025 Cook PVI  PVI_Party  PVI_Number  State_Abbr  State_District
0  Alabama  1       Barry Moore      R      R+27           R          27          AL          AL-01
1  Alabama  2       Shomari Figures  D      D+5            D          5           AL          AL-02
2  Alabama  3       Mike Rogers      R      R+23           R          23          AL          AL-03
3  Alabama  4       Robert Aderholt  R      R+33           R          33          AL          AL-04
4  Alabama  5       Dale Strong      R      R+15           R          15          AL          AL-05
meta_data.head()
   Name              bioguide_id  Party       Chamber  State       District  Party_Abbr  State_Abbr  State_District
0  Mark B. Messmer   M001233      Republican  House    Indiana     8.0       R           IN          IN-08
1  Delia C. Ramirez  R000617      Democratic  House    Illinois    3.0       D           IL          IL-03
2  Tim Sheehy        S001232      Republican  Senate   Montana     NaN       R           MT          MT-01
3  Ben Ray Luján     L000570      Democratic  Senate   New Mexico  NaN       D           NM          NM-01
4  Josh Hawley       H001089      Republican  Senate   Missouri    NaN       R           MO          MO-01
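
The key-building rule can be checked in isolation. The sketch below repeats the same logic as `make_sd` on a toy frame covering a numeric district, an at-large label ("AL" in the Number column), and a missing district like the ones senators have.

```python
import pandas as pd


def make_sd(df, state_col, dist_col):
    # Non-numeric or missing districts become 1; a district coded 0 is also treated as 1
    dist_num = (
        pd.to_numeric(df[dist_col], errors="coerce").fillna(1).replace(0, 1).astype(int)
    )
    dist_str = dist_num.apply(lambda d: f"{d:02d}")  # zero-pad to two digits
    return df[state_col].astype(str) + "-" + dist_str


toy = pd.DataFrame({
    "State_Abbr": ["AL", "AK", "CA", "MT"],
    "District": [1, "AL", 12, None],
})
keys = make_sd(toy, "State_Abbr", "District").tolist()
print(keys)  # ['AL-01', 'AK-01', 'CA-12', 'MT-01']
```

Note that a senator with no district also receives an "XX-01" key, as in the MT-01 row of the preview. This is harmless as long as Senate rows are filtered out after the House merge, which is what the next step does.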

Merge & Filter for House

We left-join the PVI onto meta_data, then isolate House members only and rename columns for clarity.

# Merge on State_District
temp = meta_data.merge(
    house_pvi[['State_District','PVI_Party','PVI_Number']],
    on='State_District',
    how='left'
)

# Keep only House chamber
source_pvi_house = temp.query("Chamber=='House'").copy()

# Rename for consistency
source_pvi_house.rename(columns={
    'PVI_Party':'Cook_PVI_Party',
    'PVI_Number':'Cook_PVI_Number'
}, inplace=True)
# Flag differences
source_pvi_house['PVI_Party_Diff'] = (
    source_pvi_house['Party_Abbr'] != source_pvi_house['Cook_PVI_Party']
)

# Zero out numeric PVI where party mismatches
mask = source_pvi_house['PVI_Party_Diff']
source_pvi_house.loc[mask, 'Cook_PVI_Number'] = 0
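
A left join silently keeps members whose key finds no PVI row. Their PVI party is then NaN, which compares as unequal to any party letter, so they would be flagged as mismatches and zeroed. One way to surface such rows is sketched below using the `validate` and `indicator` options of `DataFrame.merge`, on toy data with a made-up key ("ZZ-01").

```python
import pandas as pd

members = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "State_District": ["AL-01", "AL-02", "ZZ-01"],  # "ZZ-01" is a made-up key
})
pvi = pd.DataFrame({
    "State_District": ["AL-01", "AL-02"],
    "PVI_Party": ["R", "D"],
    "PVI_Number": [27, 5],
})

merged = members.merge(
    pvi,
    on="State_District",
    how="left",
    validate="many_to_one",  # raises if a district key appears twice in pvi
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)
unmatched = merged.loc[merged["_merge"] == "left_only", "State_District"].tolist()
print(unmatched)  # ['ZZ-01']
```

An empty `unmatched` list would suggest that every member found a district row. A non-empty one points to keys worth inspecting before relying on the zeroed values.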

Finalize & Export

Select only the columns needed for Bridge Grades and save to CSV.

final_cols = [
    'Name','Chamber','bioguide_id','Party',
    'Cook_PVI_Party','Cook_PVI_Number','PVI_Party_Diff'
]
source_pvi_house[final_cols].to_csv(
    '../Data/Source M/Output files/bridge_grade_source_m_house_pvi.csv',
    index=False
)
source_pvi_house[final_cols].head(10)
    Name                 Chamber  bioguide_id  Party       Cook_PVI_Party  Cook_PVI_Number  PVI_Party_Diff
0   Mark B. Messmer      House    M001233      Republican  R               18               False
1   Delia C. Ramirez     House    R000617      Democratic  D               17               False
7   LaMonica McIver      House    M001229      Democratic  D               27               False
8   Chrissy Houlahan     House    H001085      Democratic  D               6                False
10  Robert Menendez      House    M001226      Democratic  D               15               False
11  Valerie P. Foushee   House    F000477      Democratic  D               23               False
12  Shri Thanedar        House    T000488      Democratic  D               22               False
13  Jimmy Patronis       House    P000622      Republican  R               18               False
14  Randy Fine           House    F000484      Republican  R               14               False
15  Hillary J. Scholten  House    S001221      Democratic  D               4                False

Prepare Senator Metadata & Merge

We filter meta_data to Senate members, reuse the Party_Abbr column created earlier, and merge on the full state name.

sen_meta = meta_data.query("Chamber=='Senate'").copy()
# Merge PVI onto senators by State
senate_merge = sen_meta.merge(
    senate_pvi[['State','PVI_Party','PVI_Number']],
    on='State',
    how='left'
)

Flag & Adjust Party Mismatches

If a senator’s party letter differs from the Cook‐PVI lean, we zero out the PVI number so they aren’t credited for an opposing lean.

# Flag mismatches between senator’s party and PVI lean
senate_merge['PVI_Party_Diff'] = (
    senate_merge['Party_Abbr'] != senate_merge['PVI_Party']
)

# Zero‐out PVI_Number where mismatch is True
mask = senate_merge['PVI_Party_Diff']
senate_merge.loc[mask, 'PVI_Number'] = 0
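
The flag-and-zero rule is easy to trace on a few toy rows. The sketch below uses placeholder senator names and covers a matching party, an opposing party, and an independent. Since the PVI lean in the previews is only ever D or R, a Party_Abbr of "I" cannot match it, so as the comparison is written an independent is flagged whatever the state's lean.

```python
import pandas as pd

# Placeholder rows, not taken from the metadata file
senators = pd.DataFrame({
    "Name": ["Senator A", "Senator B", "Senator C"],
    "Party_Abbr": ["R", "D", "I"],
    "PVI_Party": ["R", "R", "D"],
    "PVI_Number": [10, 10, 17],
})

# Same two steps as the notebook: flag, then zero out flagged rows
senators["PVI_Party_Diff"] = senators["Party_Abbr"] != senators["PVI_Party"]
senators.loc[senators["PVI_Party_Diff"], "PVI_Number"] = 0
print(senators)
```

Only the first row keeps its PVI number. The second and third rows are flagged and set to 0.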

Finalize & Export

Select the necessary columns and save the CSV for Source M – Senate.

# Select output fields
final_cols = ['Name','Chamber','bioguide_id','Party','PVI_Party','PVI_Number','PVI_Party_Diff']
source_pvi_senate = senate_merge[final_cols]

# Export
source_pvi_senate.to_csv(
    '../Data/Source M/Output files/bridge_grade_source_m_senate_pvi.csv',
    index=False
)
senate_merge[final_cols].head(10)
   Name                  Chamber  bioguide_id  Party       PVI_Party  PVI_Number  PVI_Party_Diff
0  Tim Sheehy            Senate   S001232      Republican  R          10          False
1  Ben Ray Luján         Senate   L000570      Democratic  D          4           False
2  Josh Hawley           Senate   H001089      Republican  R          9           False
3  Peter Welch           Senate   W000800      Democratic  D          17          False
4  Bernie Moreno         Senate   M001242      Republican  R          5           False
5  Ashley Moody          Senate   M001244      Republican  R          5           False
6  John R. Curtis        Senate   C001114      Republican  R          11          False
7  Jon Husted            Senate   H001104      Republican  R          5           False
8  Eric Schmitt          Senate   S001227      Republican  R          9           False
9  Angela D. Alsobrooks  Senate   A000382      Democratic  D          15          False
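
Before handing either output file to the master dataset, a few invariants implied by the steps above can be checked. This is a minimal sketch rather than part of the notebook: `check_source_m` is a hypothetical helper, and the toy rows reuse two records from the Senate preview.

```python
import pandas as pd


def check_source_m(df, number_col):
    """Return a list of problems found in a Source M table (hypothetical helper)."""
    problems = []
    if df["bioguide_id"].duplicated().any():
        problems.append("duplicate bioguide_id")
    if (df.loc[df["PVI_Party_Diff"], number_col] != 0).any():
        problems.append("mismatch row with non-zero PVI")
    if (df[number_col] < 0).any():
        problems.append("negative PVI")
    return problems


toy = pd.DataFrame({
    "bioguide_id": ["S001232", "L000570"],
    "PVI_Party_Diff": [False, False],
    "PVI_Number": [10, 4],
})
print(check_source_m(toy, "PVI_Number"))  # []
```

For the House file the same helper would be called with "Cook_PVI_Number" as the number column.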