Source A-B: Legislator and Sponsorship Data Processing#
Overview#
This notebook processes bill sponsorship data from the 119th U.S. Congress to generate two foundational datasets for the Bridge Grades methodology:
Source A: Authors of Bills with Cross Party Sponsors - Ranks legislators by how many of their sponsored bills received bipartisan cosponsorship
Source B: Ranked People who Cosponsor Bills - Ranks legislators by how often they cosponsored bills from the opposite party
These datasets serve as key inputs for computing individual Bridge Grades by analyzing patterns of bipartisan legislative collaboration.
Data Sources#
Input Files#
US_119_bill_sponsorships.csv - Bill sponsorship records from the 119th Congress (Plural Policy)
plural_legislators_retired_with_bioguide.csv - Historical legislator biographical data
plural_legislators_with_bioguide.csv - Current legislator biographical data
Data Source Details#
Source: Plural Policy
Congress: 119th U.S. Congress
Download Date: August 8, 2025
Coverage: Bill sponsorship records and biographical details of current and historical legislators
Outputs#
Source A: Authors of Bills with Cross Party Sponsors#
File: bridge_grade_source_a_cross_party_supported_bills.csv
Columns:
primary_name: Name of the primary sponsor
primary_bioguide_id: Unique legislator identifier
num_bills_with_cross_party_cosponsors: Count of sponsored bills that received cross-party cosponsorship
rank: Ranking based on that count
Interpretation: Legislators near the top of the ranking authored more bills that attracted opposite-party support.
Source B: Ranked People who Cosponsor Bills#
File: bridge_grade_source_b_cross_party_cosponsors.csv
Columns:
name: Name of the cosponsor
bioguide_id: Unique legislator identifier
num_cross_party_cosponsored_bills: Count of bills cosponsored from the opposite party
rank: Ranking based on the cross-party cosponsorship count
Interpretation: Legislators near the top of the ranking cosponsored more bills from the opposite party, indicating stronger bipartisan engagement.
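The computation behind this file can be sketched on toy data (the names, parties, and bill IDs below are invented for illustration): each cosponsorship row is joined to its bill's primary sponsor, and rows where the two parties differ are counted per cosponsor.

```python
import pandas as pd

# Toy sponsorship records (invented for illustration only)
records = pd.DataFrame({
    "bill_id":        ["b1", "b1", "b2", "b2", "b2"],
    "name":           ["A",  "B",  "C",  "B",  "A"],
    "party":          ["D",  "R",  "R",  "R",  "D"],
    "classification": ["primary", "cosponsor", "primary", "cosponsor", "cosponsor"],
})

# Party of each bill's primary sponsor
primary = (records[records["classification"] == "primary"]
           [["bill_id", "party"]].rename(columns={"party": "primary_party"}))

# Join cosponsors to their bill's author and flag cross-party cases
cosponsors = records[records["classification"] == "cosponsor"].merge(primary, on="bill_id")
cross = cosponsors[cosponsors["party"] != cosponsors["primary_party"]]

# Count cross-party cosponsorships per legislator
counts = cross.groupby("name").size().sort_values(ascending=False)
print(counts.to_dict())  # → {'A': 1, 'B': 1}
```

Here B's cosponsorship of b2 does not count (same party as the author), which is exactly the distinction the full pipeline below makes at scale.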
Technical Requirements#
Dependencies#
pandas: Data manipulation and analysis
rapidfuzz: Fuzzy string matching
tqdm: Progress tracking
plotly.express: Data visualization
Performance Notes#
Fuzzy matching is computationally intensive but necessary for data quality
Conservative matching threshold balances accuracy with coverage
All original records are preserved for transparency
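To see why token-order-insensitive scoring suits legislator names, here is a minimal conceptual stand-in for rapidfuzz's fuzz.token_sort_ratio built on the standard library's difflib (its scores differ slightly from rapidfuzz's Levenshtein-based ones, but the behavior is the same in spirit):

```python
import difflib

def token_sort_ratio(a: str, b: str) -> float:
    """Conceptual stand-in for rapidfuzz.fuzz.token_sort_ratio:
    lowercase, sort the whitespace tokens, then compare the joined strings."""
    norm = lambda s: " ".join(sorted(s.lower().split()))
    return 100 * difflib.SequenceMatcher(None, norm(a), norm(b)).ratio()

# Sorting tokens makes the score order-insensitive:
print(token_sort_ratio("Fetterman John", "John Fetterman"))  # 100.0
# A middle initial lowers the score but stays well above the 67 threshold:
print(token_sort_ratio("John K. Fetterman", "John Fetterman") > 67)  # True
```

This is why a conservative threshold still recovers most "middle initial" and "nickname" variants while leaving genuinely different names for manual review.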
Notebook Walkthrough: Preprocessing for Bridge Grades: Legislator and Sponsorship Data#
This notebook prepares the input data used to generate two foundational resources in the Bridge Grades methodology:
Source A: Authors of Bills with Cross Party Sponsors
Source B: Ranked People who Cosponsor Bills
These resources serve as the basis for computing individual Bridge Grades by analyzing patterns of bipartisan legislative collaboration in the U.S. Congress.
The datasets processed here correspond to the 119th U.S. Congress and were downloaded from the public data portal at Plural Policy.
Date downloaded: August 8, 2025
Data includes: Bill sponsorship records and biographical details of both current and historical legislators.
Download instructions:
Scroll down to United States in the list (United States 119th Congress), or use the direct download link on the portal.
Go to US –> 119 –> US_119_bill_sponsorships.csv
# Install and import required libraries
!pip install rapidfuzz --quiet
import pandas as pd
import numpy as np
from rapidfuzz import process, fuzz
from tqdm.notebook import tqdm
import plotly.express as px
# Uncomment to run from colab in google drive
# # Load bill sponsorship records (includes both sponsors and cosponsors)
# sponsorships = pd.read_csv('/content/drive/MyDrive/Bridge Grades Snapshot August/Data/Source A-B/Input files/US_119_bill_sponsorships.csv', encoding="latin1")
# # Read in legislators data from plural
# plural_legislators_current = pd.read_csv("/content/drive/MyDrive/Bridge Grades Snapshot August/Data/Source A-B/Input files/plural_legislators_with_bioguide.csv")
# plural_legislators_retired = pd.read_csv("/content/drive/MyDrive/Bridge Grades Snapshot August/Data/Source A-B/Input files/plural_legislators_retired_with_bioguide.csv")
# #combine retired and current legislators into a single dataframe
# plural_legislators = pd.concat([plural_legislators_current, plural_legislators_retired])
# # Drop legislators without a bioguide_id in the plural_legislators file
# plural_legislators = plural_legislators[plural_legislators["bioguide_id"].notna()]
# Run from local machine
# Load bill sponsorship records (includes both sponsors and cosponsors)
sponsorships = pd.read_csv('../Data/Source A-B/Input files/US_119_bill_sponsorships.csv', encoding="latin1")
# Read in legislators data from plural
plural_legislators_current = pd.read_csv("../Data/Source A-B/Input files/plural_legislators_with_bioguide.csv")
plural_legislators_retired = pd.read_csv("../Data/Source A-B/Input files/plural_legislators_retired_with_bioguide.csv")
#combine retired and current legislators into a single dataframe
plural_legislators = pd.concat([plural_legislators_current, plural_legislators_retired])
# Drop legislators without a bioguide_id in the plural_legislators file
plural_legislators = plural_legislators[plural_legislators["bioguide_id"].notna()]
Filter and Prepare Legislator Records#
# Do some variable renaming to facilitate the merge
plural_legislators = plural_legislators.rename(columns={"id": "person_id"}) #rename id to person_id (the column name in sponsorships file)
plural_legislators["party"].value_counts(dropna=False) # Check for missing values in the party column
party
Republican 360
Democratic 352
Independent 4
Name: count, dtype: int64
plural_legislators["role_type"].value_counts(dropna=False) # Check for missing values in the role_type column
role_type
lower 641
upper 75
Name: count, dtype: int64
# Create a new column in plural_legislators called type: "sen" if the role_type is upper, "rep" if the role_type is lower
plural_legislators["type"] = plural_legislators["role_type"].apply(lambda x: "sen" if x == "upper" else "rep")
# Check the values of the type column
plural_legislators["type"].value_counts()
type
rep 641
sen 75
Name: count, dtype: int64
# Give a state and district column to the plural_legislators file
us_state_abbrev = {
'Alabama': 'AL', 'Alaska': 'AK', 'Arizona': 'AZ', 'Arkansas': 'AR',
'California': 'CA', 'Colorado': 'CO', 'Connecticut': 'CT', 'Delaware': 'DE',
'Florida': 'FL', 'Georgia': 'GA', 'Hawaii': 'HI', 'Idaho': 'ID',
'Illinois': 'IL', 'Indiana': 'IN', 'Iowa': 'IA', 'Kansas': 'KS',
'Kentucky': 'KY', 'Louisiana': 'LA', 'Maine': 'ME', 'Maryland': 'MD',
'Massachusetts': 'MA', 'Michigan': 'MI', 'Minnesota': 'MN', 'Mississippi': 'MS',
'Missouri': 'MO', 'Montana': 'MT', 'Nebraska': 'NE', 'Nevada': 'NV',
'New Hampshire': 'NH', 'New Jersey': 'NJ', 'New Mexico': 'NM', 'New York': 'NY',
'North Carolina': 'NC', 'North Dakota': 'ND', 'Ohio': 'OH', 'Oklahoma': 'OK',
'Oregon': 'OR', 'Pennsylvania': 'PA', 'Rhode Island': 'RI', 'South Carolina': 'SC',
'South Dakota': 'SD', 'Tennessee': 'TN', 'Texas': 'TX', 'Utah': 'UT',
'Vermont': 'VT', 'Virginia': 'VA', 'Washington': 'WA', 'West Virginia': 'WV',
'Wisconsin': 'WI', 'Wyoming': 'WY'
}
plural_legislators = plural_legislators.rename(columns={"district": "compounded_district"})
# Create district and state columns in the plural_legislators file
def extract_state_and_district(row):
district_val = row["compounded_district"]
type_val = row["type"]
if type_val == "rep" and "-" in str(district_val):
state, district_number = district_val.split("-", 1)
return pd.Series([state, district_number])
elif type_val == "sen":
# Map full name to abbreviation
state_abbrev = us_state_abbrev.get(district_val, np.nan)
return pd.Series([state_abbrev, np.nan])
else:
return pd.Series([np.nan, np.nan])
# Apply the function to the DataFrame
plural_legislators[["state", "district"]] = plural_legislators.apply(extract_state_and_district, axis=1)
Merge bioguides to sponsorships#
# Merge the sponsorships with the plural legislators to get the bioguide_id into the sponsorships file
sponsorships_with_bioguide = pd.merge(
sponsorships,
plural_legislators[["bioguide_id", "person_id"]],
on="person_id",
how="left"
)
# How many records were matched to a bioguide_id?
print("Matched: ", sponsorships_with_bioguide[sponsorships_with_bioguide["bioguide_id"].notna()].shape[0])
# How many records are missing a bioguide_id?
print("Not matched: ", sponsorships_with_bioguide[sponsorships_with_bioguide["bioguide_id"].isna()].shape[0])
Matched: 85302
Not matched: 3874
Now let’s isolate the non-matched sponsorships and list the unique legislators that were not matched.
non_matched_sponsorships = sponsorships_with_bioguide[sponsorships_with_bioguide["bioguide_id"].isna()]
unmatched_names = non_matched_sponsorships["name"].unique()
unmatched_names = unmatched_names.tolist()
unmatched_names
['Christopher S. Murphy',
'John K. Fetterman',
'Ben R. Luján',
'John F. Reed',
'Randy K. Weber',
'Mark B. Messmer',
'William R. Timmons',
'John J. McGuire',
'Robert F. Onder',
'Addison P. McDowell',
'Donald S. Beyer',
'Marlin A. Stutzman',
'Jennifer A. Kiggans',
'Nicholas J. Begich',
'Craig A. Goldman',
'Riley M. Moore',
'Thomas H. Kean',
'Eugene Simon Vindman',
'Gilbert Ray Cisneros',
'Robert P. Bresnahan',
'Herbert C. Conaway',
'Janelle S. Bynum',
'John W. Mannion',
'Abraham J. Hamadeh',
'Sam T. Liccardo',
'David J. Taylor',
'Luz M. Rivas',
'Pablo Jose Hernández',
'Kimberlyn King-Hinds',
'Jacklyn Rosen',
'Ben R. Lujan',
'David McCormick',
'J. D. Vance']
Here we use fuzzy matching to propose the best match for each unmatched name; we then check the matches manually and insert the bioguide_ids by hand where needed, for accuracy.
name_df = plural_legislators # We will see if we can find the bioguide_id from the name column in the plural_legislators file
# Column in your DataFrame to match against
name_column = "name"
# Dictionary to store best matches
best_matches = {}
for name in unmatched_names:
match, score, _ = process.extractOne(
name,
name_df[name_column],
scorer=fuzz.token_sort_ratio # You can experiment with others like fuzz.ratio
)
best_matches[name] = (match, score)
# Convert to DataFrame for easier viewing
import pandas as pd
match_df = pd.DataFrame([
{"unmatched_name": k, "matched_name": v[0], "score": v[1]}
for k, v in best_matches.items()
])
# Optional: Filter for high confidence matches
high_confidence_matches = match_df[match_df["score"] > 67]  # Keep only matches scoring above the 67 threshold
# Merge to get bioguide_ids from name_df
high_confidence_with_bioguide = high_confidence_matches.merge(
name_df[[name_column, "bioguide_id"]],
left_on="matched_name",
right_on=name_column,
how="left"
)
# Optional: drop the duplicate name column
high_confidence_with_bioguide = high_confidence_with_bioguide.drop(columns=[name_column])
# View the result
print(high_confidence_with_bioguide)
unmatched_name matched_name score bioguide_id
0 Christopher S. Murphy Chris Murphy 72.727273 M001169
1 John K. Fetterman John Fetterman 90.322581 F000479
2 Ben R. Luján Ben Ray Luján 76.923077 L000570
3 Randy K. Weber Randy Weber 88.000000 W000814
4 Mark B. Messmer Mark Messmer 88.888889 M001233
5 William R. Timmons William Timmons 90.909091 T000480
6 John J. McGuire John McGuire 88.888889 M001239
7 Addison P. McDowell Addison McDowell 91.428571 M001240
8 Donald S. Beyer Don Beyer 75.000000 B001292
9 Marlin A. Stutzman Marlin Stutzman 90.909091 S001188
10 Jennifer A. Kiggans Jen Kiggans 73.333333 K000399
11 Nicholas J. Begich Nick Begich 68.965517 B001323
12 Craig A. Goldman Craig Goldman 89.655172 G000601
13 Riley M. Moore Riley Moore 88.000000 M001235
14 Thomas H. Kean Tom Kean 72.727273 K000398
15 Eugene Simon Vindman Eugene Vindman 82.352941 V000138
16 Gilbert Ray Cisneros Gil Cisneros 75.000000 C001123
17 Robert P. Bresnahan Rob Bresnahan 81.250000 B001327
18 Herbert C. Conaway Herb Conaway 80.000000 C001136
19 Janelle S. Bynum Janelle Bynum 89.655172 B001326
20 John W. Mannion John Mannion 88.888889 M001231
21 Abraham J. Hamadeh Abe Hamadeh 68.965517 H001098
22 Sam T. Liccardo Sam Liccardo 88.888889 L000607
23 David J. Taylor Dave Taylor 76.923077 T000490
24 Luz M. Rivas Luz Rivas 85.714286 R000620
25 Pablo Jose Hernández Pablo José Hernández 87.804878 H001103
26 Kimberlyn King-Hinds Kim King-Hinds 82.352941 K000404
27 Jacklyn Rosen Jacky Rosen 91.666667 R000608
28 Ben R. Lujan Ben Ray Luján 80.000000 L000570
29 David McCormick Dave McCormick 89.655172 M001243
30 J. D. Vance JD Vance 73.684211 V000137
Warning
It is very important that you check the results above manually and make sure everyone on this list is matched correctly!
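One cheap programmatic guard that can complement (not replace) the manual check is to flag any fuzzy match whose surname disagrees with the unmatched name. This is a sketch, not part of the original workflow:

```python
def surnames_agree(a: str, b: str) -> bool:
    """Crude guard: compare the final whitespace token of two names, case-insensitively.
    Accents and suffixes (Jr., III) are NOT handled, so treat a False as 'review me',
    not as a definitive mismatch."""
    return a.split()[-1].lower() == b.split()[-1].lower()

print(surnames_agree("John K. Fetterman", "John Fetterman"))  # True
print(surnames_agree("John F. Reed", "John Rose"))            # False: flag for review
```

Rows flagged False (including accent variants like "Lujan" vs "Luján") would go straight to the manual review pile.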
manual_bioguide_map = dict(zip(
high_confidence_with_bioguide["unmatched_name"],
high_confidence_with_bioguide["bioguide_id"]
))
manual_bioguide_map
{'Christopher S. Murphy': 'M001169',
'John K. Fetterman': 'F000479',
'Ben R. Luján': 'L000570',
'Randy K. Weber': 'W000814',
'Mark B. Messmer': 'M001233',
'William R. Timmons': 'T000480',
'John J. McGuire': 'M001239',
'Addison P. McDowell': 'M001240',
'Donald S. Beyer': 'B001292',
'Marlin A. Stutzman': 'S001188',
'Jennifer A. Kiggans': 'K000399',
'Nicholas J. Begich': 'B001323',
'Craig A. Goldman': 'G000601',
'Riley M. Moore': 'M001235',
'Thomas H. Kean': 'K000398',
'Eugene Simon Vindman': 'V000138',
'Gilbert Ray Cisneros': 'C001123',
'Robert P. Bresnahan': 'B001327',
'Herbert C. Conaway': 'C001136',
'Janelle S. Bynum': 'B001326',
'John W. Mannion': 'M001231',
'Abraham J. Hamadeh': 'H001098',
'Sam T. Liccardo': 'L000607',
'David J. Taylor': 'T000490',
'Luz M. Rivas': 'R000620',
'Pablo Jose Hernández': 'H001103',
'Kimberlyn King-Hinds': 'K000404',
'Jacklyn Rosen': 'R000608',
'Ben R. Lujan': 'L000570',
'David McCormick': 'M001243',
'J. D. Vance': 'V000137'}
# Fill missing bioguide_id in sponsorships_with_bioguide
sponsorships_with_bioguide["bioguide_id"] = sponsorships_with_bioguide.apply(
lambda row: manual_bioguide_map.get(row["name"], row["bioguide_id"]),
axis=1
)
# How many records are missing a bioguide_id?
print("Matched: ", sponsorships_with_bioguide[sponsorships_with_bioguide["bioguide_id"].notna()].shape[0])
print("Not matched: ", sponsorships_with_bioguide[sponsorships_with_bioguide["bioguide_id"].isna()].shape[0])
Matched: 88932
Not matched: 244
# Now let's check the legislators whose best match fell at or below the 67 threshold
low_confidence_matches = match_df[match_df["score"] <= 67]  # use <= so a score of exactly 67 is not silently dropped
low_confidence_matches
unmatched_name | matched_name | score | |
---|---|---|---|
3 | John F. Reed | John Rose | 66.666667 |
8 | Robert F. Onder | Robert Aderholt | 66.666667 |
Note
We will now search manually for these legislators in the plural_legislators file and insert the bioguide_ids manually.
Here you can check the list of retired members from OpenStates, and check Wikipedia for possible alternative names.
manual_bioguide_ids = {
"John F. Reed": "R000122", # Is listed as Jack Reed but it's the same person
"Robert F. Onder": "O000177", # Is listed as Bob Onder but it's the same person
# ... fill in the rest manually
#"J. D. Vance": "V001234"
}
# Apply the mapping to fill missing bioguide_id
sponsorships_with_bioguide["bioguide_id"] = sponsorships_with_bioguide.apply(
lambda row: manual_bioguide_ids.get(row["name"], row["bioguide_id"]),
axis=1
)
# How many records are missing a bioguide_id?
print("Matched: ", sponsorships_with_bioguide[sponsorships_with_bioguide["bioguide_id"].notna()].shape[0])
print("Not matched: ", sponsorships_with_bioguide[sponsorships_with_bioguide["bioguide_id"].isna()].shape[0])
Matched: 89176
Not matched: 0
# Final dataset is the full list of sponsorships, now with a bioguide_id, name, and chamber for every record
sponsorships_with_bioguide.head()
id | name | entity_type | organization_id | person_id | bill_id | primary | classification | bioguide_id | |
---|---|---|---|---|---|---|---|---|---|
0 | 272e17aa-0d95-48af-aeb0-2b105b35596f | Cory Booker | person | NaN | ocd-person/d0bb2662-40b6-52d2-94e6-2bf9d588c954 | ocd-bill/2b2697ce-cd17-4fd1-ade6-61100214e5a8 | False | cosponsor | B001288 |
1 | 0596b61b-2452-4043-864d-41f62748526a | Brian Schatz | person | NaN | ocd-person/bd24fe60-5756-5d23-bcbf-4734f366c257 | ocd-bill/53dce010-ef5e-4fff-b537-f5d78072da94 | False | cosponsor | S001194 |
2 | 3715e1c7-54a5-441f-b004-fe41d97724be | Roger F. Wicker | person | NaN | ocd-person/8fe2b037-b8ff-5161-9bb4-196e9f1d9dca | ocd-bill/53dce010-ef5e-4fff-b537-f5d78072da94 | True | primary | W000437 |
3 | c943434a-5029-49c2-b38b-31b651b7c38c | Tina Smith | person | NaN | ocd-person/f82067c3-a257-54ae-b6bd-ebe5178372e7 | ocd-bill/b3542be0-643b-47a0-af7d-359831b79dc7 | False | cosponsor | S001203 |
4 | c0c4a9c7-bf64-4ce3-93fb-63335272ef8a | Jeff Merkley | person | NaN | ocd-person/96291b83-ea59-594a-97cf-630189e0b8ed | ocd-bill/b3542be0-643b-47a0-af7d-359831b79dc7 | False | cosponsor | M001176 |
Insert Party and chamber#
# Now let's use the bioguide_id to pull party, type, state, and district from the plural_legislators file
merged_sponsorships = pd.merge(
sponsorships_with_bioguide,
plural_legislators[["bioguide_id", "party", "type", "state", "district"]],
on="bioguide_id",
how="left"
)
merged_sponsorships
id | name | entity_type | organization_id | person_id | bill_id | primary | classification | bioguide_id | party | type | state | district | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 272e17aa-0d95-48af-aeb0-2b105b35596f | Cory Booker | person | NaN | ocd-person/d0bb2662-40b6-52d2-94e6-2bf9d588c954 | ocd-bill/2b2697ce-cd17-4fd1-ade6-61100214e5a8 | False | cosponsor | B001288 | Democratic | sen | NJ | NaN |
1 | 0596b61b-2452-4043-864d-41f62748526a | Brian Schatz | person | NaN | ocd-person/bd24fe60-5756-5d23-bcbf-4734f366c257 | ocd-bill/53dce010-ef5e-4fff-b537-f5d78072da94 | False | cosponsor | S001194 | Democratic | sen | HI | NaN |
2 | 3715e1c7-54a5-441f-b004-fe41d97724be | Roger F. Wicker | person | NaN | ocd-person/8fe2b037-b8ff-5161-9bb4-196e9f1d9dca | ocd-bill/53dce010-ef5e-4fff-b537-f5d78072da94 | True | primary | W000437 | Republican | rep | MS | 1 |
3 | c943434a-5029-49c2-b38b-31b651b7c38c | Tina Smith | person | NaN | ocd-person/f82067c3-a257-54ae-b6bd-ebe5178372e7 | ocd-bill/b3542be0-643b-47a0-af7d-359831b79dc7 | False | cosponsor | S001203 | Democratic | sen | MN | NaN |
4 | c0c4a9c7-bf64-4ce3-93fb-63335272ef8a | Jeff Merkley | person | NaN | ocd-person/96291b83-ea59-594a-97cf-630189e0b8ed | ocd-bill/b3542be0-643b-47a0-af7d-359831b79dc7 | False | cosponsor | M001176 | Democratic | sen | OR | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
89171 | 38397786-595a-4685-9516-6c53777073cb | Ted Budd | person | NaN | ocd-person/67c3f0bb-7182-51b8-98d0-253144992bfe | ocd-bill/57efa6a1-400d-4ec3-8cf8-5109b4708bd0 | False | cosponsor | B001305 | Republican | rep | NC | 13 |
89172 | 4dc9b53c-a378-4c53-a223-ca8a5e126161 | James C. Justice | person | NaN | ocd-person/2f9a1274-1a6a-4a78-b253-01ec02f20eee | ocd-bill/57efa6a1-400d-4ec3-8cf8-5109b4708bd0 | False | cosponsor | J000312 | Republican | sen | WV | NaN |
89173 | 085171c5-5a35-4759-851e-cad314664552 | Pete Ricketts | person | NaN | ocd-person/e1b9405a-e695-5bc5-b3b0-6cdc1415f53c | ocd-bill/57efa6a1-400d-4ec3-8cf8-5109b4708bd0 | False | cosponsor | R000618 | Republican | sen | NE | NaN |
89174 | a4b34c09-6375-4197-af04-09d53c682006 | Ted Cruz | person | NaN | ocd-person/c4d35f64-f25b-5888-9fcb-f3cb97df500e | ocd-bill/57efa6a1-400d-4ec3-8cf8-5109b4708bd0 | True | primary | C001098 | Republican | sen | TX | NaN |
89175 | e40b1beb-0049-4128-9d6e-23334f3879bf | Mike Lee | person | NaN | ocd-person/461257ff-088f-5edd-b342-4652ce1c111f | ocd-bill/6d5519d3-e11e-4e67-9265-ae194575c0ff | True | primary | L000577 | Republican | sen | UT | NaN |
89176 rows × 13 columns
# Check for missing values on the bioguide_id, type, party, state, and district (if type == "rep")
print(merged_sponsorships["bioguide_id"].isna().sum())
print(merged_sponsorships["type"].isna().sum())
print(merged_sponsorships["party"].isna().sum())
print(merged_sponsorships["state"].isna().sum())
print(merged_sponsorships["district"][merged_sponsorships["type"] == "rep"].isna().sum())
0
0
0
0
0
About merged_sponsorships
The merged_sponsorships DataFrame consolidates all sponsorship records from the original dataset (US_119_bill_sponsorships.csv) and augments them with legislator information when a match is found.
This dataset includes:
The original sponsor and bill information (name, bill_id, classification, etc.)
Legislator attributes (bioguide_id, party, type, state, district) for matched sponsors
All legislators in the sponsorship data should now have a bioguide_id.
This ensures that no data is lost. It serves as the foundational dataset for constructing Source A (Authors of Bills with Cross Party Sponsors) and Source B (Ranked People who Cosponsor Bills) in the Bridge Grades methodology.
Check independents#
merged_sponsorships.columns
Index(['id', 'name', 'entity_type', 'organization_id', 'person_id', 'bill_id',
'primary', 'classification', 'bioguide_id', 'party', 'type', 'state',
'district'],
dtype='object')
merged_sponsorships[merged_sponsorships['party']== 'Independent']['name'].unique()
array(['Bernard Sanders', 'Bernie Sanders', 'Angus S. King'], dtype=object)
Warning
Check that the legislators above caucused with the Democratic Party before running the code below. These legislators vote and organize with the Democrats, so they should not be credited with cross-party engagement for sponsoring alongside the Democratic Party. The set of Independents may change in the future, so please revise this part of the code accordingly.
merged_sponsorships['party'] = merged_sponsorships['party'].replace('Independent', 'Democratic')
Construct Cross-Party Sponsorship Ranking (Source A)#
This section calculates how many bills each primary sponsor introduced that received support from at least one cosponsor of a different political party.
To ensure accurate and complete results:
We filter out records without a declared party (required for identifying bipartisan collaboration).
We validate the structure of sponsorship data, including missing values and unmatched relationships.
We retain identifiers (bioguide_id) for both sponsors and cosponsors, to preserve analytical flexibility.
The final output ranks each legislator who served as a primary sponsor based on the number of bills they sponsored that attracted cross-party support.
# Filter out rows with missing party information and report exclusion stats
total_records = len(merged_sponsorships)
missing_party = merged_sponsorships['party'].isna().sum()
pct_missing_party = (missing_party / total_records) * 100 if total_records > 0 else 0
print(f"Rows without party: {missing_party} of {total_records} ({pct_missing_party:.2f}%)")
# Drop rows without party affiliation (cannot evaluate bipartisanship)
df = merged_sponsorships.dropna(subset=['party'])
Rows without party: 0 of 89176 (0.00%)
df.shape[0]
89176
Separate primary sponsors and cosponsors#
We isolate the primary sponsors and cosponsors into two separate DataFrames, ensuring that both include the sponsor name, party, and bioguide ID.
Primary sponsors are then renamed to distinguish them clearly when merged with cosponsor data.
# Select primary sponsors
primary_df = df[df['classification'] == 'primary'][['bill_id', 'name', 'party', 'bioguide_id']].copy()
# Select cosponsors (with bioguide_id retained)
cosponsor_df = df[df['classification'] == 'cosponsor'][['bill_id', 'name', 'party', 'bioguide_id']].copy()
# Rename primary sponsor columns to avoid conflict after merge
primary_df = primary_df.rename(columns={
'name': 'primary_name',
'party': 'primary_party',
'bioguide_id': 'primary_bioguide_id'
})
Merge cosponsors with their primary sponsor and flag cross-party cases#
Each cosponsor is merged with their bill’s primary sponsor. We validate how many cosponsors are unmatched (bills without a primary sponsor), and we flag cases where the cosponsor and sponsor are from different parties.
# Merge cosponsors with primary sponsors on bill_id
merged_df = cosponsor_df.merge(primary_df, on='bill_id', how='left')
# Report how many cosponsors are linked to bills without a recorded primary sponsor
no_primary = merged_df['primary_name'].isna().sum()
total_cosponsors = len(cosponsor_df)
pct_no_primary = (no_primary / total_cosponsors) * 100 if total_cosponsors > 0 else 0
print(f"Cosponsors without linked primary sponsor: {no_primary} of {total_cosponsors} ({pct_no_primary:.2f}%)")
# Flag whether the cosponsor is from a different party (ignoring rows with missing primary_party)
merged_df['different_party'] = (
merged_df['primary_party'].notna() &
(merged_df['party'] != merged_df['primary_party'])
)
Cosponsors without linked primary sponsor: 0 of 80345 (0.00%)
Count and rank bills with cross-party cosponsors#
We count the number of bills each primary sponsor introduced that had at least one cross-party cosponsor. Primary sponsors with zero such bills are retained in the ranking with a count of zero.
# Get all unique primary sponsors
all_primary_sponsors = primary_df[['primary_name', 'primary_bioguide_id']].drop_duplicates()
# Identify bills with at least one cross-party cosponsor
bills_with_diff_party = merged_df[merged_df['different_party'] == True][
['bill_id', 'primary_name', 'primary_bioguide_id']
].drop_duplicates()
# Count number of such bills per primary sponsor
cross_party_counts = bills_with_diff_party.groupby(
['primary_name', 'primary_bioguide_id']
).size().reset_index(name='num_bills_with_cross_party_cosponsors')
# Merge with all primary sponsors to include those with zero
result = all_primary_sponsors.merge(
cross_party_counts,
on=['primary_name', 'primary_bioguide_id'],
how='left'
)
result['num_bills_with_cross_party_cosponsors'] = result['num_bills_with_cross_party_cosponsors'].fillna(0).astype(int)
# Sort and rank
ranked_result = result.sort_values(
by='num_bills_with_cross_party_cosponsors',
ascending=False
).reset_index(drop=True)
ranked_result['rank'] = ranked_result.index + 1
# Preview final result
display(ranked_result.head(10))
primary_name | primary_bioguide_id | num_bills_with_cross_party_cosponsors | rank | |
---|---|---|---|---|
0 | Amy Klobuchar | K000367 | 51 | 1 |
1 | Bill Cassidy | C001075 | 49 | 2 |
2 | John Cornyn | C001056 | 48 | 3 |
3 | Chuck Grassley | G000386 | 43 | 4 |
4 | Brian K. Fitzpatrick | F000466 | 42 | 5 |
5 | Marsha Blackburn | B001243 | 41 | 6 |
6 | Gary Peters | P000595 | 38 | 7 |
7 | Catherine Cortez Masto | C001113 | 34 | 8 |
8 | Jeanne Shaheen | S001181 | 31 | 9 |
9 | Zachary Nunn | N000193 | 29 | 10 |
Save ranked_result to csv files#
# Uncomment to save to google drive
# ranked_result.to_csv("/content/drive/MyDrive/Bridge Grades Snapshot August/Data/Source A-B/Output files/bridge_grade_source_a_cross_party_supported_bills.csv", index=False)
# Save to local machine
ranked_result.to_csv("../Data/Source A-B/Output files/bridge_grade_source_a_cross_party_supported_bills.csv", index=False)
About this file
The bridge_grade_source_a_cross_party_supported_bills.csv file contains a ranked list of legislators who served as primary sponsors during the 119th Congress, based on the number of bills they introduced that received support from at least one cosponsor of a different political party.
Each row represents a unique primary sponsor and includes:
primary_name: The name of the primary sponsor
primary_bioguide_id: A unique identifier for the legislator
num_bills_with_cross_party_cosponsors: The number of sponsored bills that received cross-party cosponsorship
rank: The sponsor’s rank based on that count (rank 1 indicates the most bipartisan engagement)
This dataset constitutes Source A in the Bridge Grades methodology:
“Authors of Bills with Cross Party Sponsors” — a signal of bipartisan openness based on the cosponsorship behavior of others toward the legislator’s initiatives.
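As a toy illustration of the file's schema (values invented, not taken from the real output), the rank column is simply 1..N assigned after a descending sort on the count:

```python
import pandas as pd

# Invented example rows mirroring the Source A schema
toy = pd.DataFrame({
    "primary_name": ["X", "Y", "Z"],
    "primary_bioguide_id": ["A000001", "A000002", "A000003"],
    "num_bills_with_cross_party_cosponsors": [5, 9, 0],
})

# Sort descending on the count, then number the rows starting at 1
toy = (toy.sort_values("num_bills_with_cross_party_cosponsors", ascending=False)
          .reset_index(drop=True))
toy["rank"] = toy.index + 1
print(list(toy["primary_name"]))  # → ['Y', 'X', 'Z']
print(list(toy["rank"]))          # → [1, 2, 3]
```

Note that primary sponsors with zero qualifying bills still appear, just at the bottom of the ranking.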
Construct Cross-Party Cosponsorship Ranking (Source B)#
This section generates a ranking of legislators based on the number of bills they cosponsored where the primary sponsor was from a different political party.
The output represents Source B in the Bridge Grades methodology:
“Ranked People who Cosponsor Bills” — a measure of bipartisan engagement based on a legislator’s willingness to support legislation authored by the opposing party.
The data used here comes from the merged_sponsorships
DataFrame produced earlier in this notebook.
Step 1: Remove rows without party affiliation#
Cosponsorship records without party information cannot be used to assess bipartisan behavior. We remove these rows and report how many are excluded.
# Count and drop rows with missing party
total_rows = len(merged_sponsorships)
missing_party_count = merged_sponsorships['party'].isna().sum()
pct_missing = (missing_party_count / total_rows) * 100 if total_rows > 0 else 0
print(f"Rows without party: {missing_party_count} of {total_rows} ({pct_missing:.2f}%)")
df = merged_sponsorships.dropna(subset=['party'])
Rows without party: 0 of 89176 (0.00%)
Step 2: Join cosponsors with party of the primary sponsor#
We isolate primary sponsors and cosponsors, then merge cosponsors with the party of the legislator who introduced the bill. We also check if any cosponsor entries cannot be linked to a primary sponsor.
# Get party of bill authors (primary sponsors)
primary_df = df[df['classification'] == 'primary'][['bill_id', 'party']].copy()
primary_df = primary_df.rename(columns={'party': 'primary_party'})
# Get cosponsor records with identifying information
cosponsor_df = df[df['classification'] == 'cosponsor'][['bill_id', 'name', 'party', 'bioguide_id']].copy()
# Merge cosponsors with primary sponsor party
merged_df = cosponsor_df.merge(primary_df, on='bill_id', how='left')
# Report cosponsors without primary sponsor info
no_primary_link = merged_df['primary_party'].isna().sum()
total_cosponsors = len(cosponsor_df)
pct_no_primary = (no_primary_link / total_cosponsors) * 100 if total_cosponsors > 0 else 0
print(f"Cosponsors without linked primary sponsor: {no_primary_link} of {total_cosponsors} ({pct_no_primary:.2f}%)")
# Flag cross-party sponsorship only when both parties are known
merged_df['different_party'] = (
merged_df['primary_party'].notna() &
(merged_df['party'] != merged_df['primary_party'])
)
Cosponsors without linked primary sponsor: 0 of 80345 (0.00%)
Step 3: Count and rank cross-party cosponsorships#
We identify all legislators who cosponsored at least one bill from the opposing party. Those who did not are still included in the final ranking with a count of zero.
# All unique cosponsors
all_cosponsors = cosponsor_df[['name', 'bioguide_id']].drop_duplicates()
# Filter to cross-party cosponsorship cases
cross_party_df = merged_df[merged_df['different_party'] == True][['name', 'bioguide_id', 'bill_id']].drop_duplicates()
# Count number of cross-party cosponsorships per legislator
cross_party_counts = cross_party_df.groupby(
['name', 'bioguide_id']
).size().reset_index(name='num_cross_party_cosponsored_bills')
# Merge to include all cosponsors (even those with 0 cross-party cases)
result = all_cosponsors.merge(
cross_party_counts,
on=['name', 'bioguide_id'],
how='left'
)
result['num_cross_party_cosponsored_bills'] = result['num_cross_party_cosponsored_bills'].fillna(0).astype(int)
# Sort and rank
ranked_result_b = result.sort_values(
by='num_cross_party_cosponsored_bills',
ascending=False
).reset_index(drop=True)
ranked_result_b['rank'] = ranked_result_b.index + 1
# Preview top-ranked cross-party cosponsors
display(ranked_result_b.head(10))
name | bioguide_id | num_cross_party_cosponsored_bills | rank | |
---|---|---|---|---|
0 | Brian K. Fitzpatrick | F000466 | 433 | 1 |
1 | Donald G. Davis | D000230 | 240 | 2 |
2 | Jimmy Panetta | P000613 | 167 | 3 |
3 | Eugene Simon Vindman | V000138 | 161 | 4 |
4 | Josh Gottheimer | G000583 | 160 | 5 |
5 | Michael Lawler | L000599 | 153 | 6 |
6 | Thomas R. Suozzi | S001201 | 130 | 7 |
7 | Don Bacon | B001298 | 126 | 8 |
8 | Amy Klobuchar | K000367 | 117 | 9 |
9 | Angie Craig | C001119 | 108 | 10 |
Save ranked_result_b to csv files#
# Uncomment to save to google drive
# ranked_result_b.to_csv("/content/drive/MyDrive/Bridge Grades Snapshot August/Data/Source A-B/Output files/bridge_grade_source_b_cross_party_cosponsors.csv", index=False)
# Save to local machine
ranked_result_b.to_csv("../Data/Source A-B/Output files/bridge_grade_source_b_cross_party_cosponsors.csv", index=False)
Visual Summary of Cross-Party Activity and Data Quality#
This section presents three key visualizations to complement the quantitative analysis:
Top Cross-Party Cosponsors (Source B): Legislators ranked by how often they cosponsored bills from the opposite party.
Top Authors of Cross-Party Supported Bills (Source A): Legislators ranked by the number of their sponsored bills that received bipartisan support.
Missing Values Overview: A chart highlighting columns with missing information, including the source and reasoning behind those gaps.
These charts help illustrate not only patterns of bipartisan behavior, but also the integrity and limitations of the underlying data.
ordered_purples = ['#3f007d', '#54278f', '#6a51a3', '#807dba', '#9e9ac8', '#bcbddc']
# Bar chart — Source A: Top Cross-Party Bill Authors
top_authors = ranked_result.sort_values(
by="num_bills_with_cross_party_cosponsors", ascending=False
).head(15)
fig_authors = px.bar(
top_authors,
x="num_bills_with_cross_party_cosponsors",
y="primary_name",
orientation="h",
color="num_bills_with_cross_party_cosponsors",
color_continuous_scale=ordered_purples[::-1],
title="Top 15 Legislators by Cross-Party Support to Their Bills (Source A)"
)
fig_authors.update_layout(
xaxis_title="Number of Bills with Cross-Party Cosponsors",
yaxis_title="Primary Sponsor",
yaxis=dict(autorange="reversed"),
plot_bgcolor="#ffffff",
paper_bgcolor="#ffffff",
coloraxis_showscale=False,
margin=dict(l=100, r=20, t=60, b=40)
)
fig_authors.show()