Source A-B: Legislator and Sponsorship Data Processing#

Overview

This notebook processes bill sponsorship data from the 119th U.S. Congress to generate two foundational datasets for the Bridge Grades methodology:

  • Source A: Authors of Bills with Cross Party Sponsors - Ranks legislators by how many of their sponsored bills received bipartisan cosponsorship

  • Source B: Ranked People who Cosponsor Bills - Ranks legislators by how often they cosponsored bills from the opposite party

These datasets serve as key inputs for computing individual Bridge Grades by analyzing patterns of bipartisan legislative collaboration.

Data Sources#

Input Files#

  • US_119_bill_sponsorships.csv - Bill sponsorship records from the 119th Congress (Plural Policy)

  • plural_legislators_retired_with_bioguide.csv - Historical legislator biographical data

  • plural_legislators_with_bioguide.csv - Current legislator biographical data

Data Source Details#

  • Source: Plural Policy

  • Congress: 119th U.S. Congress

  • Download Date: August 8, 2025

  • Coverage: Bill sponsorship records and biographical details of current and historical legislators


Outputs#

Source A: Authors of Bills with Cross Party Sponsors#

File: bridge_grade_source_a_cross_party_supported_bills.csv

Columns:

  • primary_name: Name of the primary sponsor

  • primary_bioguide_id: Unique legislator identifier

  • num_bills_with_cross_party_cosponsors: Count of sponsored bills with bipartisan support

  • rank: Ranking based on bipartisan support count

Interpretation: Higher rankings indicate legislators whose bills attract more cross-party support, suggesting bipartisan appeal.

Source B: Ranked People who Cosponsor Bills#

File: bridge_grade_source_b_cross_party_cosponsors.csv

Columns:

  • name: Name of the cosponsor

  • bioguide_id: Unique legislator identifier

  • num_cross_party_cosponsored_bills: Count of bills cosponsored from opposite party

  • rank: Ranking based on cross-party cosponsorship count

Interpretation: Higher rankings indicate legislators who more actively support legislation from the opposite party, showing bipartisan engagement.


Technical Requirements#

Dependencies#

  • pandas: Data manipulation and analysis

  • rapidfuzz: Fuzzy string matching

  • tqdm: Progress tracking

  • plotly.express: Data visualization

Performance Notes#

  • Fuzzy matching is computationally intensive but necessary for data quality

  • Conservative matching threshold balances accuracy with coverage

  • All original records are preserved for transparency


Notebook Walkthrough: Preprocessing for Bridge Grades: Legislator and Sponsorship Data#

This notebook prepares the input data used to generate two foundational resources in the Bridge Grades methodology:

  • Source A: Authors of Bills with Cross Party Sponsors

  • Source B: Ranked People who Cosponsor Bills

These resources serve as the basis for computing individual Bridge Grades by analyzing patterns of bipartisan legislative collaboration in the U.S. Congress.

The datasets processed here correspond to the 119th U.S. Congress and were downloaded from the public data portal at Plural Policy.

  • Date downloaded: August 8, 2025

  • Data includes: Bill sponsorship records and biographical details of both current and historical legislators.

  • Download instructions:

  1. Scroll down to United States on the List (United States 119th Congress). Or download directly Here.

  2. Go to US –> 119 –> US_119_bill_sponsorships.csv

# Install and import required libraries

!pip install rapidfuzz --quiet

import pandas as pd
from rapidfuzz import process, fuzz
from tqdm.notebook import tqdm
import plotly.express as px
import pandas as pd
import numpy as np
# Uncomment to run from colab in google drive
# # Load bill sponsorship records (includes both sponsors and cosponsors)
# sponsorships = pd.read_csv('/content/drive/MyDrive/Bridge Grades Snapshot August/Data/Source A-B/Input files/US_119_bill_sponsorships.csv', encoding="latin1")

# # Read in legislators data from plural
# plural_legislators_current = pd.read_csv("/content/drive/MyDrive/Bridge Grades Snapshot August/Data/Source A-B/Input files/plural_legislators_with_bioguide.csv")
# plural_legislators_retired = pd.read_csv("/content/drive/MyDrive/Bridge Grades Snapshot August/Data/Source A-B/Input files/plural_legislators_retired_with_bioguide.csv")

# #combine retired and current legislators into a single dataframe
# plural_legislators = pd.concat([plural_legislators_current, plural_legislators_retired])

# # Drop legislators without a bioguide_id in the plural_legislators file
# plural_legislators = plural_legislators[plural_legislators["bioguide_id"].notna()]
# Run from local machine
# Load bill sponsorship records (includes both sponsors and cosponsors)
sponsorships = pd.read_csv('../Data/Source A-B/Input files/US_119_bill_sponsorships.csv', encoding="latin1")

# Read in legislators data from plural
plural_legislators_current = pd.read_csv("../Data/Source A-B/Input files/plural_legislators_with_bioguide.csv")
plural_legislators_retired = pd.read_csv("../Data/Source A-B/Input files/plural_legislators_retired_with_bioguide.csv")

#combine retired and current legislators into a single dataframe
plural_legislators = pd.concat([plural_legislators_current, plural_legislators_retired])

# Drop legislators without a bioguide_id in the plural_legislators file
plural_legislators = plural_legislators[plural_legislators["bioguide_id"].notna()]

Filter and Prepare Legislator Records#

# Do some variable renaming to facilitate the merge
plural_legislators = plural_legislators.rename(columns={"id": "person_id"}) #rename id to person_id (the column name in sponsorships file)
plural_legislators["party"].value_counts(dropna=False) # Check for missing values in the party column
party
Republican     360
Democratic     352
Independent      4
Name: count, dtype: int64
plural_legislators["role_type"].value_counts(dropna=False) # Check for missing values in the role_type column
role_type
lower    641
upper     75
Name: count, dtype: int64
# Create a new column in plural_legislators called Chamber that is sen if the role_type is upper and House if the role_type is lower
plural_legislators["type"] = plural_legislators["role_type"].apply(lambda x: "sen" if x == "upper" else "rep")
# Check the values of the type column
plural_legislators["type"].value_counts()
type
rep    641
sen     75
Name: count, dtype: int64
# Give a state and district column to the plural_legislators file
us_state_abbrev = {
    'Alabama': 'AL', 'Alaska': 'AK', 'Arizona': 'AZ', 'Arkansas': 'AR',
    'California': 'CA', 'Colorado': 'CO', 'Connecticut': 'CT', 'Delaware': 'DE',
    'Florida': 'FL', 'Georgia': 'GA', 'Hawaii': 'HI', 'Idaho': 'ID',
    'Illinois': 'IL', 'Indiana': 'IN', 'Iowa': 'IA', 'Kansas': 'KS',
    'Kentucky': 'KY', 'Louisiana': 'LA', 'Maine': 'ME', 'Maryland': 'MD',
    'Massachusetts': 'MA', 'Michigan': 'MI', 'Minnesota': 'MN', 'Mississippi': 'MS',
    'Missouri': 'MO', 'Montana': 'MT', 'Nebraska': 'NE', 'Nevada': 'NV',
    'New Hampshire': 'NH', 'New Jersey': 'NJ', 'New Mexico': 'NM', 'New York': 'NY',
    'North Carolina': 'NC', 'North Dakota': 'ND', 'Ohio': 'OH', 'Oklahoma': 'OK',
    'Oregon': 'OR', 'Pennsylvania': 'PA', 'Rhode Island': 'RI', 'South Carolina': 'SC',
    'South Dakota': 'SD', 'Tennessee': 'TN', 'Texas': 'TX', 'Utah': 'UT',
    'Vermont': 'VT', 'Virginia': 'VA', 'Washington': 'WA', 'West Virginia': 'WV',
    'Wisconsin': 'WI', 'Wyoming': 'WY'
}
plural_legislators = plural_legislators.rename(columns={"district": "compounded_district"})
# Create district and state columns in the plural_legislators file
def extract_state_and_district(row):
    district_val = row["compounded_district"]
    type_val = row["type"]

    if type_val == "rep" and "-" in str(district_val):
        state, district_number = district_val.split("-", 1)
        return pd.Series([state, district_number])
    elif type_val == "sen":
        # Map full name to abbreviation
        state_abbrev = us_state_abbrev.get(district_val, np.nan)
        return pd.Series([state_abbrev, np.nan])
    else:
        return pd.Series([np.nan, np.nan])
# Apply the function to the DataFrame
plural_legislators[["state", "district"]] = plural_legislators.apply(extract_state_and_district, axis=1)

Merge bioguides to sponsorships#

# Merge the sponsorships with the plural legislators to get the bioguide_id into the sponsorships file
sponsorships_with_bioguide = pd.merge(
    sponsorships,
    plural_legislators[["bioguide_id", "person_id"]],
    on="person_id",
    how="left"
)
# How many records are missing a bioguide_id?
print("Matched: ", sponsorships_with_bioguide[sponsorships_with_bioguide["bioguide_id"].notna()].shape[0])
# How many records are missing a person_id?
print("Not matched: ", sponsorships_with_bioguide[sponsorships_with_bioguide["bioguide_id"].isna()].shape[0])
Matched:  85302
Not matched:  3874

Now let’s save the non-matched sponsorships and save the unique legislators that were not matched

non_matched_sponsorships = sponsorships_with_bioguide[sponsorships_with_bioguide["bioguide_id"].isna()]
unmatched_names = non_matched_sponsorships["name"].unique()
unmatched_names = unmatched_names.tolist()
unmatched_names
['Christopher S. Murphy',
 'John K. Fetterman',
 'Ben R. Luján',
 'John F. Reed',
 'Randy K. Weber',
 'Mark B. Messmer',
 'William R. Timmons',
 'John J. McGuire',
 'Robert F. Onder',
 'Addison P. McDowell',
 'Donald S. Beyer',
 'Marlin A. Stutzman',
 'Jennifer A. Kiggans',
 'Nicholas J. Begich',
 'Craig A. Goldman',
 'Riley M. Moore',
 'Thomas H. Kean',
 'Eugene Simon Vindman',
 'Gilbert Ray Cisneros',
 'Robert P. Bresnahan',
 'Herbert C. Conaway',
 'Janelle S. Bynum',
 'John W. Mannion',
 'Abraham J. Hamadeh',
 'Sam T. Liccardo',
 'David J. Taylor',
 'Luz M. Rivas',
 'Pablo Jose Hernández',
 'Kimberlyn King-Hinds',
 'Jacklyn Rosen',
 'Ben R. Lujan',
 'David McCormick',
 'J. D. Vance']

Here we will use fuzzy matching to get the best matches and then manually check the matches and insert the bioguide_ids manually for more accuracy.

name_df = plural_legislators # We will see if we can find the bioguide_id from the name column in the plural_legislators file

# Column in your DataFrame to match against
name_column = "name"

# Dictionary to store best matches
best_matches = {}

for name in unmatched_names:
    match, score, _ = process.extractOne(
        name,
        name_df[name_column],
        scorer=fuzz.token_sort_ratio  # You can experiment with others like fuzz.ratio
    )
    best_matches[name] = (match, score)

# Convert to DataFrame for easier viewing
import pandas as pd
match_df = pd.DataFrame([
    {"unmatched_name": k, "matched_name": v[0], "score": v[1]}
    for k, v in best_matches.items()
])

# Optional: Filter for high confidence matches
high_confidence_matches = match_df[match_df["score"] > 67] # Set the threshold to 67 to get the best matches
# Merge to get bioguide_ids from name_df
high_confidence_with_bioguide = high_confidence_matches.merge(
    name_df[[name_column, "bioguide_id"]],
    left_on="matched_name",
    right_on=name_column,
    how="left"
)

# Optional: drop the duplicate name column
high_confidence_with_bioguide = high_confidence_with_bioguide.drop(columns=[name_column])

# View the result
print(high_confidence_with_bioguide)
           unmatched_name          matched_name      score bioguide_id
0   Christopher S. Murphy          Chris Murphy  72.727273     M001169
1       John K. Fetterman        John Fetterman  90.322581     F000479
2           Ben R. Luján         Ben Ray Luján  76.923077     L000570
3          Randy K. Weber           Randy Weber  88.000000     W000814
4         Mark B. Messmer          Mark Messmer  88.888889     M001233
5      William R. Timmons       William Timmons  90.909091     T000480
6         John J. McGuire          John McGuire  88.888889     M001239
7     Addison P. McDowell      Addison McDowell  91.428571     M001240
8         Donald S. Beyer             Don Beyer  75.000000     B001292
9      Marlin A. Stutzman       Marlin Stutzman  90.909091     S001188
10    Jennifer A. Kiggans           Jen Kiggans  73.333333     K000399
11     Nicholas J. Begich           Nick Begich  68.965517     B001323
12       Craig A. Goldman         Craig Goldman  89.655172     G000601
13         Riley M. Moore           Riley Moore  88.000000     M001235
14         Thomas H. Kean              Tom Kean  72.727273     K000398
15   Eugene Simon Vindman        Eugene Vindman  82.352941     V000138
16   Gilbert Ray Cisneros          Gil Cisneros  75.000000     C001123
17    Robert P. Bresnahan         Rob Bresnahan  81.250000     B001327
18     Herbert C. Conaway          Herb Conaway  80.000000     C001136
19       Janelle S. Bynum         Janelle Bynum  89.655172     B001326
20        John W. Mannion          John Mannion  88.888889     M001231
21     Abraham J. Hamadeh           Abe Hamadeh  68.965517     H001098
22        Sam T. Liccardo          Sam Liccardo  88.888889     L000607
23        David J. Taylor           Dave Taylor  76.923077     T000490
24           Luz M. Rivas             Luz Rivas  85.714286     R000620
25  Pablo Jose Hernández  Pablo José Hernández  87.804878     H001103
26   Kimberlyn King-Hinds        Kim King-Hinds  82.352941     K000404
27          Jacklyn Rosen           Jacky Rosen  91.666667     R000608
28           Ben R. Lujan         Ben Ray Luján  80.000000     L000570
29        David McCormick        Dave McCormick  89.655172     M001243
30            J. D. Vance              JD Vance  73.684211     V000137

Warning

It is very important that you check the resuls above manually and make sure everyone on this list is matched correctly!

manual_bioguide_map = dict(zip(
    high_confidence_with_bioguide["unmatched_name"],
    high_confidence_with_bioguide["bioguide_id"]
))

manual_bioguide_map
{'Christopher S. Murphy': 'M001169',
 'John K. Fetterman': 'F000479',
 'Ben R. Luján': 'L000570',
 'Randy K. Weber': 'W000814',
 'Mark B. Messmer': 'M001233',
 'William R. Timmons': 'T000480',
 'John J. McGuire': 'M001239',
 'Addison P. McDowell': 'M001240',
 'Donald S. Beyer': 'B001292',
 'Marlin A. Stutzman': 'S001188',
 'Jennifer A. Kiggans': 'K000399',
 'Nicholas J. Begich': 'B001323',
 'Craig A. Goldman': 'G000601',
 'Riley M. Moore': 'M001235',
 'Thomas H. Kean': 'K000398',
 'Eugene Simon Vindman': 'V000138',
 'Gilbert Ray Cisneros': 'C001123',
 'Robert P. Bresnahan': 'B001327',
 'Herbert C. Conaway': 'C001136',
 'Janelle S. Bynum': 'B001326',
 'John W. Mannion': 'M001231',
 'Abraham J. Hamadeh': 'H001098',
 'Sam T. Liccardo': 'L000607',
 'David J. Taylor': 'T000490',
 'Luz M. Rivas': 'R000620',
 'Pablo Jose Hernández': 'H001103',
 'Kimberlyn King-Hinds': 'K000404',
 'Jacklyn Rosen': 'R000608',
 'Ben R. Lujan': 'L000570',
 'David McCormick': 'M001243',
 'J. D. Vance': 'V000137'}
# Fill missing bioguide_id in sponsorships_with_bioguide
sponsorships_with_bioguide["bioguide_id"] = sponsorships_with_bioguide.apply(
    lambda row: manual_bioguide_map.get(row["name"], row["bioguide_id"]),
    axis=1
)
# How many records are missing a bioguide_id?
print("Matched: ", sponsorships_with_bioguide[sponsorships_with_bioguide["bioguide_id"].notna()].shape[0])
print("Not matched: ", sponsorships_with_bioguide[sponsorships_with_bioguide["bioguide_id"].isna()].shape[0])
Matched:  88932
Not matched:  244
# Now let's check for the legislators that were not matched, with a score lower than 67
low_confidence_matches = match_df[match_df["score"] < 67]
low_confidence_matches
unmatched_name matched_name score
3 John F. Reed John Rose 66.666667
8 Robert F. Onder Robert Aderholt 66.666667

Note

We will now search manually for these legislators in the plural_legislators file and insert the bioguide_ids manually.

Here you can check the list of retired members from OpenStates, and check wikipedia for possible other names.

manual_bioguide_ids = {
    "John F. Reed": "R000122", # Is listed as Jack Reed but it's the same person
    "Robert F. Onder": "O000177", # Is listed as Bob Onder but it's the same person
    # ... fill in the rest manually
    #"J. D. Vance": "V001234"
}

# Apply the mapping to fill missing bioguide_id
sponsorships_with_bioguide["bioguide_id"] = sponsorships_with_bioguide.apply(
    lambda row: manual_bioguide_ids.get(row["name"], row["bioguide_id"]),
    axis=1
)
# How many records are missing a bioguide_id?
print("Matched: ", sponsorships_with_bioguide[sponsorships_with_bioguide["bioguide_id"].notna()].shape[0])
print("Not matched: ", sponsorships_with_bioguide[sponsorships_with_bioguide["bioguide_id"].isna()].shape[0])
Matched:  89176
Not matched:  0
# Final dataset is a list of sponsorships with all of the bioguide_id, name, and chamber
sponsorships_with_bioguide.head()
id name entity_type organization_id person_id bill_id primary classification bioguide_id
0 272e17aa-0d95-48af-aeb0-2b105b35596f Cory Booker person NaN ocd-person/d0bb2662-40b6-52d2-94e6-2bf9d588c954 ocd-bill/2b2697ce-cd17-4fd1-ade6-61100214e5a8 False cosponsor B001288
1 0596b61b-2452-4043-864d-41f62748526a Brian Schatz person NaN ocd-person/bd24fe60-5756-5d23-bcbf-4734f366c257 ocd-bill/53dce010-ef5e-4fff-b537-f5d78072da94 False cosponsor S001194
2 3715e1c7-54a5-441f-b004-fe41d97724be Roger F. Wicker person NaN ocd-person/8fe2b037-b8ff-5161-9bb4-196e9f1d9dca ocd-bill/53dce010-ef5e-4fff-b537-f5d78072da94 True primary W000437
3 c943434a-5029-49c2-b38b-31b651b7c38c Tina Smith person NaN ocd-person/f82067c3-a257-54ae-b6bd-ebe5178372e7 ocd-bill/b3542be0-643b-47a0-af7d-359831b79dc7 False cosponsor S001203
4 c0c4a9c7-bf64-4ce3-93fb-63335272ef8a Jeff Merkley person NaN ocd-person/96291b83-ea59-594a-97cf-630189e0b8ed ocd-bill/b3542be0-643b-47a0-af7d-359831b79dc7 False cosponsor M001176

Insert Party and chamber#

# Now let's use the bioguide_ids to get the party, type, state and districtis from the plural_legislators file via the bioguide_id
merged_sponsorships = pd.merge(
    sponsorships_with_bioguide,
    plural_legislators[["bioguide_id", "party", "type", "state", "district"]],
    on="bioguide_id",
    how="left"
)
merged_sponsorships
id name entity_type organization_id person_id bill_id primary classification bioguide_id party type state district
0 272e17aa-0d95-48af-aeb0-2b105b35596f Cory Booker person NaN ocd-person/d0bb2662-40b6-52d2-94e6-2bf9d588c954 ocd-bill/2b2697ce-cd17-4fd1-ade6-61100214e5a8 False cosponsor B001288 Democratic sen NJ NaN
1 0596b61b-2452-4043-864d-41f62748526a Brian Schatz person NaN ocd-person/bd24fe60-5756-5d23-bcbf-4734f366c257 ocd-bill/53dce010-ef5e-4fff-b537-f5d78072da94 False cosponsor S001194 Democratic sen HI NaN
2 3715e1c7-54a5-441f-b004-fe41d97724be Roger F. Wicker person NaN ocd-person/8fe2b037-b8ff-5161-9bb4-196e9f1d9dca ocd-bill/53dce010-ef5e-4fff-b537-f5d78072da94 True primary W000437 Republican rep MS 1
3 c943434a-5029-49c2-b38b-31b651b7c38c Tina Smith person NaN ocd-person/f82067c3-a257-54ae-b6bd-ebe5178372e7 ocd-bill/b3542be0-643b-47a0-af7d-359831b79dc7 False cosponsor S001203 Democratic sen MN NaN
4 c0c4a9c7-bf64-4ce3-93fb-63335272ef8a Jeff Merkley person NaN ocd-person/96291b83-ea59-594a-97cf-630189e0b8ed ocd-bill/b3542be0-643b-47a0-af7d-359831b79dc7 False cosponsor M001176 Democratic sen OR NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ...
89171 38397786-595a-4685-9516-6c53777073cb Ted Budd person NaN ocd-person/67c3f0bb-7182-51b8-98d0-253144992bfe ocd-bill/57efa6a1-400d-4ec3-8cf8-5109b4708bd0 False cosponsor B001305 Republican rep NC 13
89172 4dc9b53c-a378-4c53-a223-ca8a5e126161 James C. Justice person NaN ocd-person/2f9a1274-1a6a-4a78-b253-01ec02f20eee ocd-bill/57efa6a1-400d-4ec3-8cf8-5109b4708bd0 False cosponsor J000312 Republican sen WV NaN
89173 085171c5-5a35-4759-851e-cad314664552 Pete Ricketts person NaN ocd-person/e1b9405a-e695-5bc5-b3b0-6cdc1415f53c ocd-bill/57efa6a1-400d-4ec3-8cf8-5109b4708bd0 False cosponsor R000618 Republican sen NE NaN
89174 a4b34c09-6375-4197-af04-09d53c682006 Ted Cruz person NaN ocd-person/c4d35f64-f25b-5888-9fcb-f3cb97df500e ocd-bill/57efa6a1-400d-4ec3-8cf8-5109b4708bd0 True primary C001098 Republican sen TX NaN
89175 e40b1beb-0049-4128-9d6e-23334f3879bf Mike Lee person NaN ocd-person/461257ff-088f-5edd-b342-4652ce1c111f ocd-bill/6d5519d3-e11e-4e67-9265-ae194575c0ff True primary L000577 Republican sen UT NaN

89176 rows × 13 columns

# Check for missing values on the bioguide_id, type, party, state, and district (if type == "rep")
print(merged_sponsorships["bioguide_id"].isna().sum())
print(merged_sponsorships["type"].isna().sum())
print(merged_sponsorships["party"].isna().sum())
print(merged_sponsorships["state"].isna().sum())
print(merged_sponsorships["district"][merged_sponsorships["type"] == "rep"].isna().sum())
0
0
0
0
0

About merged_sponsorships

The merged_sponsorships DataFrame consolidates all sponsorship records from the original dataset (US_119_bill_sponsorships.csv) and augments them with legislator information when a match is found.

This dataset includes:

  • The original sponsor and bill information (name, bill_id, classification, etc.).

  • Legislator attributes (bioguide_id, party, type, state, district) for matched sponsors.

  • All legislators in the sponsorship data should now have a bioguide_id.

This ensures that no data is lost. It serves as the foundational dataset for constructing Source A (Authors of Bills with Cross Party Sponsors) and Source B (Ranked People who Cosponsor Bills) in the Bridge Grades methodology.

Check independents#

merged_sponsorships.columns
Index(['id', 'name', 'entity_type', 'organization_id', 'person_id', 'bill_id',
       'primary', 'classification', 'bioguide_id', 'party', 'type', 'state',
       'district'],
      dtype='object')
merged_sponsorships[merged_sponsorships['party']== 'Independent']['name'].unique()
array(['Bernard Sanders', 'Bernie Sanders', 'Angus S. King'], dtype=object)

Warning

Check that legislators above aligned with the democratic party before running the code below. These legislators are factually democratic and should not be given rewards for sponsoring with the democratic party. Independents might change in the future, so please revise this part of the code

merged_sponsorships['party'] = merged_sponsorships['party'].replace('Independent', 'Democratic')

Construct Cross-Party Sponsorship Ranking (Source A)#

This section calculates how many bills each primary sponsor introduced that received support from at least one cosponsor of a different political party.

To ensure accurate and complete results:

  • We filter out records without a declared party (required for identifying bipartisan collaboration).

  • We validate the structure of sponsorship data, including missing values and unmatched relationships.

  • We retain identifiers (bioguide_id) for both sponsors and cosponsors, to preserve analytical flexibility.

The final output ranks each legislator who served as a primary sponsor based on the number of bills they sponsored that attracted cross-party support.

# Filter out rows with missing party information and report exclusion stats
total_records = len(merged_sponsorships)
missing_party = merged_sponsorships['party'].isna().sum()
pct_missing_party = (missing_party / total_records) * 100 if total_records > 0 else 0

print(f"Rows without party: {missing_party} of {total_records} ({pct_missing_party:.2f}%)")

# Drop rows without party affiliation (cannot evaluate bipartisanship)
df = merged_sponsorships.dropna(subset=['party'])
Rows without party: 0 of 89176 (0.00%)
df.shape[0]
89176

Separate primary sponsors and cosponsors#

We isolate the primary sponsors and cosponsors into two separate DataFrames, ensuring that both include the sponsor name, party, and bioguide ID.

Primary sponsors are then renamed to distinguish them clearly when merged with cosponsor data.

# Select primary sponsors
primary_df = df[df['classification'] == 'primary'][['bill_id', 'name', 'party', 'bioguide_id']].copy()

# Select cosponsors (with bioguide_id retained)
cosponsor_df = df[df['classification'] == 'cosponsor'][['bill_id', 'name', 'party', 'bioguide_id']].copy()

# Rename primary sponsor columns to avoid conflict after merge
primary_df = primary_df.rename(columns={
    'name': 'primary_name',
    'party': 'primary_party',
    'bioguide_id': 'primary_bioguide_id'
})

Merge cosponsors with their primary sponsor and flag cross-party cases#

Each cosponsor is merged with their bill’s primary sponsor. We validate how many cosponsors are unmatched (bills without a primary sponsor), and we flag cases where the cosponsor and sponsor are from different parties.

# Merge cosponsors with primary sponsors on bill_id
merged_df = cosponsor_df.merge(primary_df, on='bill_id', how='left')

# Report how many cosponsors are linked to bills without a recorded primary sponsor
no_primary = merged_df['primary_name'].isna().sum()
total_cosponsors = len(cosponsor_df)
pct_no_primary = (no_primary / total_cosponsors) * 100 if total_cosponsors > 0 else 0

print(f"Cosponsors without linked primary sponsor: {no_primary} of {total_cosponsors} ({pct_no_primary:.2f}%)")

# Flag whether the cosponsor is from a different party (ignoring rows with missing primary_party)
merged_df['different_party'] = (
    merged_df['primary_party'].notna() &
    (merged_df['party'] != merged_df['primary_party'])
)
Cosponsors without linked primary sponsor: 0 of 80345 (0.00%)

Count and rank bills with cross-party cosponsors#

We count the number of bills each primary sponsor introduced that had at least one cross-party cosponsor. Primary sponsors with zero such bills are retained in the ranking with a count of zero.

# Get all unique primary sponsors
all_primary_sponsors = primary_df[['primary_name', 'primary_bioguide_id']].drop_duplicates()

# Identify bills with at least one cross-party cosponsor
bills_with_diff_party = merged_df[merged_df['different_party'] == True][
    ['bill_id', 'primary_name', 'primary_bioguide_id']
].drop_duplicates()

# Count number of such bills per primary sponsor
cross_party_counts = bills_with_diff_party.groupby(
    ['primary_name', 'primary_bioguide_id']
).size().reset_index(name='num_bills_with_cross_party_cosponsors')

# Merge with all primary sponsors to include those with zero
result = all_primary_sponsors.merge(
    cross_party_counts,
    on=['primary_name', 'primary_bioguide_id'],
    how='left'
)
result['num_bills_with_cross_party_cosponsors'] = result['num_bills_with_cross_party_cosponsors'].fillna(0).astype(int)

# Sort and rank
ranked_result = result.sort_values(
    by='num_bills_with_cross_party_cosponsors',
    ascending=False
).reset_index(drop=True)

ranked_result['rank'] = ranked_result.index + 1

# Preview final result
display(ranked_result.head(10))
primary_name primary_bioguide_id num_bills_with_cross_party_cosponsors rank
0 Amy Klobuchar K000367 51 1
1 Bill Cassidy C001075 49 2
2 John Cornyn C001056 48 3
3 Chuck Grassley G000386 43 4
4 Brian K. Fitzpatrick F000466 42 5
5 Marsha Blackburn B001243 41 6
6 Gary Peters P000595 38 7
7 Catherine Cortez Masto C001113 34 8
8 Jeanne Shaheen S001181 31 9
9 Zachary Nunn N000193 29 10

Save ranked_result to csv files#

# Uncomment to save to google drive
# ranked_result.to_csv("/content/drive/MyDrive/Bridge Grades Snapshot August/Data/Source A-B/Output files/bridge_grade_source_a_cross_party_supported_bills.csv", index=False)

# Save to local machine
ranked_result.to_csv("../Data/Source A-B/Output files/bridge_grade_source_a_cross_party_supported_bills.csv", index=False)

About this file

The bridge_grade_source_a_cross_party_supported_bills.csv DataFrame contains a ranked list of legislators who served as primary sponsors during the 119th Congress, based on the number of bills they introduced that received support from at least one cosponsor of a different political party.

Each row represents a unique primary sponsor and includes:

  • primary_name: The name of the primary sponsor.

  • primary_bioguide_id: A unique identifier for the legislator.

  • num_bills_with_cross_party_cosponsors: The number of sponsored bills that received cross-party cosponsorship.

  • rank: The sponsor’s rank based on that count (higher values indicate more bipartisan engagement).

This dataset constitutes Source A in the Bridge Grades methodology:
“Authors of Bills with Cross Party Sponsors” — a signal of bipartisan openness based on the cosponsorship behavior of others toward the legislator’s initiatives.

Construct Cross-Party Cosponsorship Ranking (Source B)#

This section generates a ranking of legislators based on the number of bills they cosponsored where the primary sponsor was from a different political party.

The output represents Source B in the Bridge Grades methodology:

“Ranked People who Cosponsor Bills” — a measure of bipartisan engagement based on a legislator’s willingness to support legislation authored by the opposing party.

The data used here comes from the merged_sponsorships DataFrame produced earlier in this notebook.

Step 1: Remove rows without party affiliation#

Cosponsorship records without party information cannot be used to assess bipartisan behavior. We remove these rows and report how many are excluded.

# Count and drop rows with missing party
total_rows = len(merged_sponsorships)
missing_party_count = merged_sponsorships['party'].isna().sum()
pct_missing = (missing_party_count / total_rows) * 100 if total_rows > 0 else 0

print(f"Rows without party: {missing_party_count} of {total_rows} ({pct_missing:.2f}%)")

df = merged_sponsorships.dropna(subset=['party'])
Rows without party: 0 of 89176 (0.00%)

Step 2: Join cosponsors with party of the primary sponsor#

We isolate primary sponsors and cosponsors, then merge cosponsors with the party of the legislator who introduced the bill. We also check if any cosponsor entries cannot be linked to a primary sponsor.

# Get party of bill authors (primary sponsors)
primary_df = df[df['classification'] == 'primary'][['bill_id', 'party']].copy()
primary_df = primary_df.rename(columns={'party': 'primary_party'})

# Get cosponsor records with identifying information
cosponsor_df = df[df['classification'] == 'cosponsor'][['bill_id', 'name', 'party', 'bioguide_id']].copy()

# Merge cosponsors with primary sponsor party
merged_df = cosponsor_df.merge(primary_df, on='bill_id', how='left')

# Report cosponsors without primary sponsor info
no_primary_link = merged_df['primary_party'].isna().sum()
total_cosponsors = len(cosponsor_df)
pct_no_primary = (no_primary_link / total_cosponsors) * 100 if total_cosponsors > 0 else 0

print(f"Cosponsors without linked primary sponsor: {no_primary_link} of {total_cosponsors} ({pct_no_primary:.2f}%)")

# Flag cross-party sponsorship only when both parties are known
merged_df['different_party'] = (
    merged_df['primary_party'].notna() &
    (merged_df['party'] != merged_df['primary_party'])
)
Cosponsors without linked primary sponsor: 0 of 80345 (0.00%)

Step 3: Count and rank cross-party cosponsorships#

We identify all legislators who cosponsored at least one bill from the opposing party. Those who did not are still included in the final ranking with a count of zero.

# All unique cosponsors
all_cosponsors = cosponsor_df[['name', 'bioguide_id']].drop_duplicates()

# Filter to cross-party cosponsorship cases
cross_party_df = merged_df[merged_df['different_party'] == True][['name', 'bioguide_id', 'bill_id']].drop_duplicates()

# Count number of cross-party cosponsorships per legislator
cross_party_counts = cross_party_df.groupby(
    ['name', 'bioguide_id']
).size().reset_index(name='num_cross_party_cosponsored_bills')

# Merge to include all cosponsors (even those with 0 cross-party cases)
result = all_cosponsors.merge(
    cross_party_counts,
    on=['name', 'bioguide_id'],
    how='left'
)
result['num_cross_party_cosponsored_bills'] = result['num_cross_party_cosponsored_bills'].fillna(0).astype(int)

# Sort and rank
ranked_result_b = result.sort_values(
    by='num_cross_party_cosponsored_bills',
    ascending=False
).reset_index(drop=True)

ranked_result_b['rank'] = ranked_result_b.index + 1

# Preview top-ranked cross-party cosponsors
display(ranked_result_b.head(10))
name bioguide_id num_cross_party_cosponsored_bills rank
0 Brian K. Fitzpatrick F000466 433 1
1 Donald G. Davis D000230 240 2
2 Jimmy Panetta P000613 167 3
3 Eugene Simon Vindman V000138 161 4
4 Josh Gottheimer G000583 160 5
5 Michael Lawler L000599 153 6
6 Thomas R. Suozzi S001201 130 7
7 Don Bacon B001298 126 8
8 Amy Klobuchar K000367 117 9
9 Angie Craig C001119 108 10

Save ranked_result_b to csv files#

# Uncomment to save to google drive
# ranked_result_b.to_csv("/content/drive/MyDrive/Bridge Grades Snapshot August/Data/Source A-B/Output files/bridge_grade_source_b_cross_party_cosponsors.csv", index=False)

# Save to local machine
ranked_result_b.to_csv("../Data/Source A-B/Output files/bridge_grade_source_b_cross_party_cosponsors.csv", index=False)

Visual Summary of Cross-Party Activity and Data Quality#

This section presents three key visualizations to complement the quantitative analysis:

  1. Top Cross-Party Cosponsors (Source B): Legislators ranked by how often they cosponsored bills from the opposite party.

  2. Top Authors of Cross-Party Supported Bills (Source A): Legislators ranked by the number of their sponsored bills that received bipartisan support.

  3. Missing Values Overview: A chart highlighting columns with missing information, including the source and reasoning behind those gaps.

These charts help illustrate not only patterns of bipartisan behavior, but also the integrity and limitations of the underlying data.

ordered_purples = ['#3f007d', '#54278f', '#6a51a3', '#807dba', '#9e9ac8', '#bcbddc']
# Bar chart — Source A: Top Cross-Party Bill Authors
top_authors = ranked_result.sort_values(
    by="num_bills_with_cross_party_cosponsors", ascending=False
).head(15)

fig_authors = px.bar(
    top_authors,
    x="num_bills_with_cross_party_cosponsors",
    y="primary_name",
    orientation="h",
    color="num_bills_with_cross_party_cosponsors",
    color_continuous_scale=ordered_purples[::-1],
    title="Top 15 Legislators by Cross-Party Support to Their Bills (Source A)"
)

fig_authors.update_layout(
    xaxis_title="Number of Bills with Cross-Party Cosponsors",
    yaxis_title="Primary Sponsor",
    yaxis=dict(autorange="reversed"),
    plot_bgcolor="#ffffff",
    paper_bgcolor="#ffffff",
    coloraxis_showscale=False,
    margin=dict(l=100, r=20, t=60, b=40)
)

fig_authors.show()