Source A-B: Legislator and Sponsorship Data Processing#
Overview#
This notebook processes bill sponsorship data from the 119th U.S. Congress to generate two foundational datasets for the Bridge Grades methodology:
Source A: Authors of Bills with Cross Party Sponsors - Ranks legislators by how many of their sponsored bills received bipartisan cosponsorship
Source B: Ranked People who Cosponsor Bills - Ranks legislators by how often they cosponsored bills from the opposite party
These datasets serve as key inputs for computing individual Bridge Grades by analyzing patterns of bipartisan legislative collaboration.
Data Sources#
Input Files#
US_119_bill_sponsorships.csv - Bill sponsorship records from the 119th Congress (Plural Policy)
plural_legislators_retired_with_bioguide.csv - Historical legislator biographical data
plural_legislators_with_bioguide.csv - Current legislator biographical data
Data Source Details#
Source: Plural Policy
Congress: 119th U.S. Congress
Download Date: August 8, 2025
Coverage: Bill sponsorship records and biographical details of current and historical legislators
Outputs#
Source A: Authors of Bills with Cross Party Sponsors#
File: bridge_grade_source_a_cross_party_supported_bills.csv
Columns:
primary_name: Name of the primary sponsor
primary_bioguide_id: Unique legislator identifier
num_bills_with_cross_party_cosponsors: Count of sponsored bills that received cross-party cosponsorship
rank: Ranking based on that count
Interpretation: Legislators near the top of the ranking authored more bills that attracted opposite-party support.
Source B: Ranked People who Cosponsor Bills#
File: bridge_grade_source_b_cross_party_cosponsors.csv
Columns:
name: Name of the cosponsor
bioguide_id: Unique legislator identifier
num_cross_party_cosponsored_bills: Count of bills cosponsored from the opposite party
rank: Ranking based on the cross-party cosponsorship count
Interpretation: Legislators near the top of the ranking cosponsored more bills from the opposite party, indicating stronger bipartisan engagement.
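The computation behind this file can be sketched on toy data (the names, parties, and bill IDs below are invented for illustration): each cosponsorship row is joined to its bill's primary sponsor, and rows where the two parties differ are counted per cosponsor.

```python
import pandas as pd

# Toy sponsorship records (invented for illustration only)
records = pd.DataFrame({
    "bill_id":        ["b1", "b1", "b2", "b2", "b2"],
    "name":           ["A",  "B",  "C",  "B",  "A"],
    "party":          ["D",  "R",  "R",  "R",  "D"],
    "classification": ["primary", "cosponsor", "primary", "cosponsor", "cosponsor"],
})

# Party of each bill's primary sponsor
primary = (records[records["classification"] == "primary"]
           [["bill_id", "party"]].rename(columns={"party": "primary_party"}))

# Join cosponsors to their bill's author and flag cross-party cases
cosponsors = records[records["classification"] == "cosponsor"].merge(primary, on="bill_id")
cross = cosponsors[cosponsors["party"] != cosponsors["primary_party"]]

# Count cross-party cosponsorships per legislator
counts = cross.groupby("name").size().sort_values(ascending=False)
print(counts.to_dict())  # → {'A': 1, 'B': 1}
```

Here B's cosponsorship of b2 does not count (same party as the author), which is exactly the distinction the full pipeline below makes at scale.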
Technical Requirements#
Dependencies#
pandas: Data manipulation and analysis
rapidfuzz: Fuzzy string matching
tqdm: Progress tracking
plotly.express: Data visualization
Performance Notes#
Fuzzy matching is computationally intensive but necessary for data quality
Conservative matching threshold balances accuracy with coverage
All original records are preserved for transparency
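To see why token-order-insensitive scoring suits legislator names, here is a minimal conceptual stand-in for rapidfuzz's fuzz.token_sort_ratio built on the standard library's difflib (its scores differ slightly from rapidfuzz's Levenshtein-based ones, but the behavior is the same in spirit):

```python
import difflib

def token_sort_ratio(a: str, b: str) -> float:
    """Conceptual stand-in for rapidfuzz.fuzz.token_sort_ratio:
    lowercase, sort the whitespace tokens, then compare the joined strings."""
    norm = lambda s: " ".join(sorted(s.lower().split()))
    return 100 * difflib.SequenceMatcher(None, norm(a), norm(b)).ratio()

# Sorting tokens makes the score order-insensitive:
print(token_sort_ratio("Fetterman John", "John Fetterman"))  # 100.0
# A middle initial lowers the score but stays well above the 67 threshold:
print(token_sort_ratio("John K. Fetterman", "John Fetterman") > 67)  # True
```

This is why a conservative threshold still recovers most "middle initial" and "nickname" variants while leaving genuinely different names for manual review.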
Notebook Walkthrough: Preprocessing for Bridge Grades: Legislator and Sponsorship Data#
This notebook prepares the input data used to generate two foundational resources in the Bridge Grades methodology:
Source A: Authors of Bills with Cross Party Sponsors
Source B: Ranked People who Cosponsor Bills
These resources serve as the basis for computing individual Bridge Grades by analyzing patterns of bipartisan legislative collaboration in the U.S. Congress.
The datasets processed here correspond to the 119th U.S. Congress and were downloaded from the public data portal at Plural Policy.
Date downloaded: August 8, 2025
Data includes: Bill sponsorship records and biographical details of both current and historical legislators.
Download instructions:
Scroll down to United States in the list (United States 119th Congress), or use the direct download link on the portal.
Go to US –> 119 –> US_119_bill_sponsorships.csv
# Install and import required libraries
!pip install rapidfuzz --quiet
import pandas as pd
import numpy as np
from rapidfuzz import process, fuzz
from tqdm.notebook import tqdm
import plotly.express as px
# Uncomment to run from colab in google drive
# # Load bill sponsorship records (includes both sponsors and cosponsors)
# sponsorships = pd.read_csv('/content/drive/MyDrive/Bridge Grades Snapshot August/Data/Source A-B/Input files/US_119_bill_sponsorships.csv', encoding="latin1")
# # Read in legislators data from plural
# plural_legislators_current = pd.read_csv("/content/drive/MyDrive/Bridge Grades Snapshot August/Data/Source A-B/Input files/plural_legislators_with_bioguide.csv")
# plural_legislators_retired = pd.read_csv("/content/drive/MyDrive/Bridge Grades Snapshot August/Data/Source A-B/Input files/plural_legislators_retired_with_bioguide.csv")
# #combine retired and current legislators into a single dataframe
# plural_legislators = pd.concat([plural_legislators_current, plural_legislators_retired])
# # Drop legislators without a bioguide_id in the plural_legislators file
# plural_legislators = plural_legislators[plural_legislators["bioguide_id"].notna()]
# Run from local machine
# Load bill sponsorship records (includes both sponsors and cosponsors)
sponsorships = pd.read_csv('../Data/Source A-B/Input files/US_119_bill_sponsorships.csv', encoding="latin1")
# Read in legislators data from plural
plural_legislators_current = pd.read_csv("../Data/Source A-B/Input files/plural_legislators_with_bioguide.csv")
plural_legislators_retired = pd.read_csv("../Data/Source A-B/Input files/plural_legislators_retired_with_bioguide.csv")
#combine retired and current legislators into a single dataframe
plural_legislators = pd.concat([plural_legislators_current, plural_legislators_retired])
# Drop legislators without a bioguide_id in the plural_legislators file
plural_legislators = plural_legislators[plural_legislators["bioguide_id"].notna()]
Filter and Prepare Legislator Records#
# Do some variable renaming to facilitate the merge
plural_legislators = plural_legislators.rename(columns={"id": "person_id"}) #rename id to person_id (the column name in sponsorships file)
plural_legislators["party"].value_counts(dropna=False) # Check for missing values in the party column
party
Republican 360
Democratic 352
Independent 4
Name: count, dtype: int64
plural_legislators["role_type"].value_counts(dropna=False) # Check for missing values in the role_type column
role_type
lower 641
upper 75
Name: count, dtype: int64
# Create a new column in plural_legislators called type: "sen" if the role_type is upper, "rep" if the role_type is lower
plural_legislators["type"] = plural_legislators["role_type"].apply(lambda x: "sen" if x == "upper" else "rep")
# Check the values of the type column
plural_legislators["type"].value_counts()
type
rep 641
sen 75
Name: count, dtype: int64
# Give a state and district column to the plural_legislators file
us_state_abbrev = {
'Alabama': 'AL', 'Alaska': 'AK', 'Arizona': 'AZ', 'Arkansas': 'AR',
'California': 'CA', 'Colorado': 'CO', 'Connecticut': 'CT', 'Delaware': 'DE',
'Florida': 'FL', 'Georgia': 'GA', 'Hawaii': 'HI', 'Idaho': 'ID',
'Illinois': 'IL', 'Indiana': 'IN', 'Iowa': 'IA', 'Kansas': 'KS',
'Kentucky': 'KY', 'Louisiana': 'LA', 'Maine': 'ME', 'Maryland': 'MD',
'Massachusetts': 'MA', 'Michigan': 'MI', 'Minnesota': 'MN', 'Mississippi': 'MS',
'Missouri': 'MO', 'Montana': 'MT', 'Nebraska': 'NE', 'Nevada': 'NV',
'New Hampshire': 'NH', 'New Jersey': 'NJ', 'New Mexico': 'NM', 'New York': 'NY',
'North Carolina': 'NC', 'North Dakota': 'ND', 'Ohio': 'OH', 'Oklahoma': 'OK',
'Oregon': 'OR', 'Pennsylvania': 'PA', 'Rhode Island': 'RI', 'South Carolina': 'SC',
'South Dakota': 'SD', 'Tennessee': 'TN', 'Texas': 'TX', 'Utah': 'UT',
'Vermont': 'VT', 'Virginia': 'VA', 'Washington': 'WA', 'West Virginia': 'WV',
'Wisconsin': 'WI', 'Wyoming': 'WY'
}
plural_legislators = plural_legislators.rename(columns={"district": "compounded_district"})
# Create district and state columns in the plural_legislators file
def extract_state_and_district(row):
district_val = row["compounded_district"]
type_val = row["type"]
if type_val == "rep" and "-" in str(district_val):
state, district_number = district_val.split("-", 1)
return pd.Series([state, district_number])
elif type_val == "sen":
# Map full name to abbreviation
state_abbrev = us_state_abbrev.get(district_val, np.nan)
return pd.Series([state_abbrev, np.nan])
else:
return pd.Series([np.nan, np.nan])
# Apply the function to the DataFrame
plural_legislators[["state", "district"]] = plural_legislators.apply(extract_state_and_district, axis=1)
Merge bioguides to sponsorships#
# Merge the sponsorships with the plural legislators to get the bioguide_id into the sponsorships file
sponsorships_with_bioguide = pd.merge(
sponsorships,
plural_legislators[["bioguide_id", "person_id"]],
on="person_id",
how="left"
)
# How many records were matched to a bioguide_id?
print("Matched: ", sponsorships_with_bioguide[sponsorships_with_bioguide["bioguide_id"].notna()].shape[0])
# How many records are missing a bioguide_id?
print("Not matched: ", sponsorships_with_bioguide[sponsorships_with_bioguide["bioguide_id"].isna()].shape[0])
Matched: 85302
Not matched: 3874
Now let’s isolate the non-matched sponsorships and list the unique legislators that were not matched.
non_matched_sponsorships = sponsorships_with_bioguide[sponsorships_with_bioguide["bioguide_id"].isna()]
unmatched_names = non_matched_sponsorships["name"].unique()
unmatched_names = unmatched_names.tolist()
unmatched_names
['Christopher S. Murphy',
'John K. Fetterman',
'Ben R. Luján',
'John F. Reed',
'Randy K. Weber',
'Mark B. Messmer',
'William R. Timmons',
'John J. McGuire',
'Robert F. Onder',
'Addison P. McDowell',
'Donald S. Beyer',
'Marlin A. Stutzman',
'Jennifer A. Kiggans',
'Nicholas J. Begich',
'Craig A. Goldman',
'Riley M. Moore',
'Thomas H. Kean',
'Eugene Simon Vindman',
'Gilbert Ray Cisneros',
'Robert P. Bresnahan',
'Herbert C. Conaway',
'Janelle S. Bynum',
'John W. Mannion',
'Abraham J. Hamadeh',
'Sam T. Liccardo',
'David J. Taylor',
'Luz M. Rivas',
'Pablo Jose Hernández',
'Kimberlyn King-Hinds',
'Jacklyn Rosen',
'Ben R. Lujan',
'David McCormick',
'J. D. Vance']
Here we use fuzzy matching to propose the best match for each unmatched name; we then check the matches manually and insert the bioguide_ids by hand where needed, for accuracy.
name_df = plural_legislators # We will see if we can find the bioguide_id from the name column in the plural_legislators file
# Column in your DataFrame to match against
name_column = "name"
# Dictionary to store best matches
best_matches = {}
for name in unmatched_names:
match, score, _ = process.extractOne(
name,
name_df[name_column],
scorer=fuzz.token_sort_ratio # You can experiment with others like fuzz.ratio
)
best_matches[name] = (match, score)
# Convert to DataFrame for easier viewing
import pandas as pd
match_df = pd.DataFrame([
{"unmatched_name": k, "matched_name": v[0], "score": v[1]}
for k, v in best_matches.items()
])
# Optional: Filter for high confidence matches
high_confidence_matches = match_df[match_df["score"] > 67]  # Keep only matches scoring above the 67 threshold
# Merge to get bioguide_ids from name_df
high_confidence_with_bioguide = high_confidence_matches.merge(
name_df[[name_column, "bioguide_id"]],
left_on="matched_name",
right_on=name_column,
how="left"
)
# Optional: drop the duplicate name column
high_confidence_with_bioguide = high_confidence_with_bioguide.drop(columns=[name_column])
# View the result
print(high_confidence_with_bioguide)
unmatched_name matched_name score bioguide_id
0 Christopher S. Murphy Chris Murphy 72.727273 M001169
1 John K. Fetterman John Fetterman 90.322581 F000479
2 Ben R. Luján Ben Ray Luján 76.923077 L000570
3 Randy K. Weber Randy Weber 88.000000 W000814
4 Mark B. Messmer Mark Messmer 88.888889 M001233
5 William R. Timmons William Timmons 90.909091 T000480
6 John J. McGuire John McGuire 88.888889 M001239
7 Addison P. McDowell Addison McDowell 91.428571 M001240
8 Donald S. Beyer Don Beyer 75.000000 B001292
9 Marlin A. Stutzman Marlin Stutzman 90.909091 S001188
10 Jennifer A. Kiggans Jen Kiggans 73.333333 K000399
11 Nicholas J. Begich Nick Begich 68.965517 B001323
12 Craig A. Goldman Craig Goldman 89.655172 G000601
13 Riley M. Moore Riley Moore 88.000000 M001235
14 Thomas H. Kean Tom Kean 72.727273 K000398
15 Eugene Simon Vindman Eugene Vindman 82.352941 V000138
16 Gilbert Ray Cisneros Gil Cisneros 75.000000 C001123
17 Robert P. Bresnahan Rob Bresnahan 81.250000 B001327
18 Herbert C. Conaway Herb Conaway 80.000000 C001136
19 Janelle S. Bynum Janelle Bynum 89.655172 B001326
20 John W. Mannion John Mannion 88.888889 M001231
21 Abraham J. Hamadeh Abe Hamadeh 68.965517 H001098
22 Sam T. Liccardo Sam Liccardo 88.888889 L000607
23 David J. Taylor Dave Taylor 76.923077 T000490
24 Luz M. Rivas Luz Rivas 85.714286 R000620
25 Pablo Jose Hernández Pablo José Hernández 87.804878 H001103
26 Kimberlyn King-Hinds Kim King-Hinds 82.352941 K000404
27 Jacklyn Rosen Jacky Rosen 91.666667 R000608
28 Ben R. Lujan Ben Ray Luján 80.000000 L000570
29 David McCormick Dave McCormick 89.655172 M001243
30 J. D. Vance JD Vance 73.684211 V000137
Warning
It is very important that you check the results above manually and make sure everyone on this list is matched correctly!
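One cheap programmatic guard that can complement (not replace) the manual check is to flag any fuzzy match whose surname disagrees with the unmatched name. This is a sketch, not part of the original workflow:

```python
def surnames_agree(a: str, b: str) -> bool:
    """Crude guard: compare the final whitespace token of two names, case-insensitively.
    Accents and suffixes (Jr., III) are NOT handled, so treat a False as 'review me',
    not as a definitive mismatch."""
    return a.split()[-1].lower() == b.split()[-1].lower()

print(surnames_agree("John K. Fetterman", "John Fetterman"))  # True
print(surnames_agree("John F. Reed", "John Rose"))            # False: flag for review
```

Rows flagged False (including accent variants like "Lujan" vs "Luján") would go straight to the manual review pile.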
manual_bioguide_map = dict(zip(
high_confidence_with_bioguide["unmatched_name"],
high_confidence_with_bioguide["bioguide_id"]
))
manual_bioguide_map
{'Christopher S. Murphy': 'M001169',
'John K. Fetterman': 'F000479',
'Ben R. Luján': 'L000570',
'Randy K. Weber': 'W000814',
'Mark B. Messmer': 'M001233',
'William R. Timmons': 'T000480',
'John J. McGuire': 'M001239',
'Addison P. McDowell': 'M001240',
'Donald S. Beyer': 'B001292',
'Marlin A. Stutzman': 'S001188',
'Jennifer A. Kiggans': 'K000399',
'Nicholas J. Begich': 'B001323',
'Craig A. Goldman': 'G000601',
'Riley M. Moore': 'M001235',
'Thomas H. Kean': 'K000398',
'Eugene Simon Vindman': 'V000138',
'Gilbert Ray Cisneros': 'C001123',
'Robert P. Bresnahan': 'B001327',
'Herbert C. Conaway': 'C001136',
'Janelle S. Bynum': 'B001326',
'John W. Mannion': 'M001231',
'Abraham J. Hamadeh': 'H001098',
'Sam T. Liccardo': 'L000607',
'David J. Taylor': 'T000490',
'Luz M. Rivas': 'R000620',
'Pablo Jose Hernández': 'H001103',
'Kimberlyn King-Hinds': 'K000404',
'Jacklyn Rosen': 'R000608',
'Ben R. Lujan': 'L000570',
'David McCormick': 'M001243',
'J. D. Vance': 'V000137'}
# Fill missing bioguide_id in sponsorships_with_bioguide
sponsorships_with_bioguide["bioguide_id"] = sponsorships_with_bioguide.apply(
lambda row: manual_bioguide_map.get(row["name"], row["bioguide_id"]),
axis=1
)
# How many records are missing a bioguide_id?
print("Matched: ", sponsorships_with_bioguide[sponsorships_with_bioguide["bioguide_id"].notna()].shape[0])
print("Not matched: ", sponsorships_with_bioguide[sponsorships_with_bioguide["bioguide_id"].isna()].shape[0])
Matched: 88932
Not matched: 244
# Now let's check the legislators whose best match fell at or below the 67 threshold
low_confidence_matches = match_df[match_df["score"] <= 67]  # use <= so a score of exactly 67 is not silently dropped
low_confidence_matches
unmatched_name | matched_name | score | |
---|---|---|---|
3 | John F. Reed | John Rose | 66.666667 |
8 | Robert F. Onder | Robert Aderholt | 66.666667 |
Note
We will now search manually for these legislators in the plural_legislators file and insert the bioguide_ids manually.
Here you can check the list of retired members from OpenStates, and check Wikipedia for possible alternative names.
manual_bioguide_ids = {
"John F. Reed": "R000122", # Is listed as Jack Reed but it's the same person
"Robert F. Onder": "O000177", # Is listed as Bob Onder but it's the same person
# ... fill in the rest manually
#"J. D. Vance": "V001234"
}
# Apply the mapping to fill missing bioguide_id
sponsorships_with_bioguide["bioguide_id"] = sponsorships_with_bioguide.apply(
lambda row: manual_bioguide_ids.get(row["name"], row["bioguide_id"]),
axis=1
)
# How many records are missing a bioguide_id?
print("Matched: ", sponsorships_with_bioguide[sponsorships_with_bioguide["bioguide_id"].notna()].shape[0])
print("Not matched: ", sponsorships_with_bioguide[sponsorships_with_bioguide["bioguide_id"].isna()].shape[0])
Matched: 89176
Not matched: 0
# Final dataset is the full list of sponsorships, now with a bioguide_id, name, and chamber for every record
sponsorships_with_bioguide.head()
id | name | entity_type | organization_id | person_id | bill_id | primary | classification | bioguide_id | |
---|---|---|---|---|---|---|---|---|---|
0 | 272e17aa-0d95-48af-aeb0-2b105b35596f | Cory Booker | person | NaN | ocd-person/d0bb2662-40b6-52d2-94e6-2bf9d588c954 | ocd-bill/2b2697ce-cd17-4fd1-ade6-61100214e5a8 | False | cosponsor | B001288 |
1 | 0596b61b-2452-4043-864d-41f62748526a | Brian Schatz | person | NaN | ocd-person/bd24fe60-5756-5d23-bcbf-4734f366c257 | ocd-bill/53dce010-ef5e-4fff-b537-f5d78072da94 | False | cosponsor | S001194 |
2 | 3715e1c7-54a5-441f-b004-fe41d97724be | Roger F. Wicker | person | NaN | ocd-person/8fe2b037-b8ff-5161-9bb4-196e9f1d9dca | ocd-bill/53dce010-ef5e-4fff-b537-f5d78072da94 | True | primary | W000437 |
3 | c943434a-5029-49c2-b38b-31b651b7c38c | Tina Smith | person | NaN | ocd-person/f82067c3-a257-54ae-b6bd-ebe5178372e7 | ocd-bill/b3542be0-643b-47a0-af7d-359831b79dc7 | False | cosponsor | S001203 |
4 | c0c4a9c7-bf64-4ce3-93fb-63335272ef8a | Jeff Merkley | person | NaN | ocd-person/96291b83-ea59-594a-97cf-630189e0b8ed | ocd-bill/b3542be0-643b-47a0-af7d-359831b79dc7 | False | cosponsor | M001176 |
Insert Party and chamber#
# Now let's use the bioguide_id to pull party, type, state, and district from the plural_legislators file
merged_sponsorships = pd.merge(
sponsorships_with_bioguide,
plural_legislators[["bioguide_id", "party", "type", "state", "district"]],
on="bioguide_id",
how="left"
)
merged_sponsorships
id | name | entity_type | organization_id | person_id | bill_id | primary | classification | bioguide_id | party | type | state | district | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 272e17aa-0d95-48af-aeb0-2b105b35596f | Cory Booker | person | NaN | ocd-person/d0bb2662-40b6-52d2-94e6-2bf9d588c954 | ocd-bill/2b2697ce-cd17-4fd1-ade6-61100214e5a8 | False | cosponsor | B001288 | Democratic | sen | NJ | NaN |
1 | 0596b61b-2452-4043-864d-41f62748526a | Brian Schatz | person | NaN | ocd-person/bd24fe60-5756-5d23-bcbf-4734f366c257 | ocd-bill/53dce010-ef5e-4fff-b537-f5d78072da94 | False | cosponsor | S001194 | Democratic | sen | HI | NaN |
2 | 3715e1c7-54a5-441f-b004-fe41d97724be | Roger F. Wicker | person | NaN | ocd-person/8fe2b037-b8ff-5161-9bb4-196e9f1d9dca | ocd-bill/53dce010-ef5e-4fff-b537-f5d78072da94 | True | primary | W000437 | Republican | rep | MS | 1 |
3 | c943434a-5029-49c2-b38b-31b651b7c38c | Tina Smith | person | NaN | ocd-person/f82067c3-a257-54ae-b6bd-ebe5178372e7 | ocd-bill/b3542be0-643b-47a0-af7d-359831b79dc7 | False | cosponsor | S001203 | Democratic | sen | MN | NaN |
4 | c0c4a9c7-bf64-4ce3-93fb-63335272ef8a | Jeff Merkley | person | NaN | ocd-person/96291b83-ea59-594a-97cf-630189e0b8ed | ocd-bill/b3542be0-643b-47a0-af7d-359831b79dc7 | False | cosponsor | M001176 | Democratic | sen | OR | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
89171 | 38397786-595a-4685-9516-6c53777073cb | Ted Budd | person | NaN | ocd-person/67c3f0bb-7182-51b8-98d0-253144992bfe | ocd-bill/57efa6a1-400d-4ec3-8cf8-5109b4708bd0 | False | cosponsor | B001305 | Republican | rep | NC | 13 |
89172 | 4dc9b53c-a378-4c53-a223-ca8a5e126161 | James C. Justice | person | NaN | ocd-person/2f9a1274-1a6a-4a78-b253-01ec02f20eee | ocd-bill/57efa6a1-400d-4ec3-8cf8-5109b4708bd0 | False | cosponsor | J000312 | Republican | sen | WV | NaN |
89173 | 085171c5-5a35-4759-851e-cad314664552 | Pete Ricketts | person | NaN | ocd-person/e1b9405a-e695-5bc5-b3b0-6cdc1415f53c | ocd-bill/57efa6a1-400d-4ec3-8cf8-5109b4708bd0 | False | cosponsor | R000618 | Republican | sen | NE | NaN |
89174 | a4b34c09-6375-4197-af04-09d53c682006 | Ted Cruz | person | NaN | ocd-person/c4d35f64-f25b-5888-9fcb-f3cb97df500e | ocd-bill/57efa6a1-400d-4ec3-8cf8-5109b4708bd0 | True | primary | C001098 | Republican | sen | TX | NaN |
89175 | e40b1beb-0049-4128-9d6e-23334f3879bf | Mike Lee | person | NaN | ocd-person/461257ff-088f-5edd-b342-4652ce1c111f | ocd-bill/6d5519d3-e11e-4e67-9265-ae194575c0ff | True | primary | L000577 | Republican | sen | UT | NaN |
89176 rows × 13 columns
# Check for missing values on the bioguide_id, type, party, state, and district (if type == "rep")
print(merged_sponsorships["bioguide_id"].isna().sum())
print(merged_sponsorships["type"].isna().sum())
print(merged_sponsorships["party"].isna().sum())
print(merged_sponsorships["state"].isna().sum())
print(merged_sponsorships["district"][merged_sponsorships["type"] == "rep"].isna().sum())
0
0
0
0
0
About merged_sponsorships
The merged_sponsorships DataFrame consolidates all sponsorship records from the original dataset (US_119_bill_sponsorships.csv) and augments them with legislator information when a match is found.
This dataset includes:
The original sponsor and bill information (name, bill_id, classification, etc.)
Legislator attributes (bioguide_id, party, type, state, district) for matched sponsors
All legislators in the sponsorship data should now have a bioguide_id.
This ensures that no data is lost. It serves as the foundational dataset for constructing Source A (Authors of Bills with Cross Party Sponsors) and Source B (Ranked People who Cosponsor Bills) in the Bridge Grades methodology.
Check independents#
merged_sponsorships.columns
Index(['id', 'name', 'entity_type', 'organization_id', 'person_id', 'bill_id',
'primary', 'classification', 'bioguide_id', 'party', 'type', 'state',
'district'],
dtype='object')
merged_sponsorships[merged_sponsorships['party']== 'Independent']['name'].unique()
array(['Bernard Sanders', 'Bernie Sanders', 'Angus S. King'], dtype=object)
Warning
Check that the legislators above caucused with the Democratic Party before running the code below. These legislators vote and organize with the Democrats, so they should not be credited with cross-party engagement for sponsoring alongside the Democratic Party. The set of Independents may change in the future, so please revise this part of the code accordingly.
merged_sponsorships['party'] = merged_sponsorships['party'].replace('Independent', 'Democratic')
Construct Cross-Party Sponsorship Ranking (Source A)#
This section calculates how many bills each primary sponsor introduced that received support from at least one cosponsor of a different political party.
To ensure accurate and complete results:
We filter out records without a declared party (required for identifying bipartisan collaboration).
We validate the structure of sponsorship data, including missing values and unmatched relationships.
We retain identifiers (bioguide_id) for both sponsors and cosponsors, to preserve analytical flexibility.
The final output ranks each legislator who served as a primary sponsor based on the number of bills they sponsored that attracted cross-party support.
# Filter out rows with missing party information and report exclusion stats
total_records = len(merged_sponsorships)
missing_party = merged_sponsorships['party'].isna().sum()
pct_missing_party = (missing_party / total_records) * 100 if total_records > 0 else 0
print(f"Rows without party: {missing_party} of {total_records} ({pct_missing_party:.2f}%)")
# Drop rows without party affiliation (cannot evaluate bipartisanship)
df = merged_sponsorships.dropna(subset=['party'])
Rows without party: 0 of 89176 (0.00%)
df.shape[0]
89176
Separate primary sponsors and cosponsors#
We isolate the primary sponsors and cosponsors into two separate DataFrames, ensuring that both include the sponsor name, party, and bioguide ID.
Primary sponsors are then renamed to distinguish them clearly when merged with cosponsor data.
# Select primary sponsors
primary_df = df[df['classification'] == 'primary'][['bill_id', 'name', 'party', 'bioguide_id']].copy()
# Select cosponsors (with bioguide_id retained)
cosponsor_df = df[df['classification'] == 'cosponsor'][['bill_id', 'name', 'party', 'bioguide_id']].copy()
# Rename primary sponsor columns to avoid conflict after merge
primary_df = primary_df.rename(columns={
'name': 'primary_name',
'party': 'primary_party',
'bioguide_id': 'primary_bioguide_id'
})
Merge cosponsors with their primary sponsor and flag cross-party cases#
Each cosponsor is merged with their bill’s primary sponsor. We validate how many cosponsors are unmatched (bills without a primary sponsor), and we flag cases where the cosponsor and sponsor are from different parties.
# Merge cosponsors with primary sponsors on bill_id
merged_df = cosponsor_df.merge(primary_df, on='bill_id', how='left')
# Report how many cosponsors are linked to bills without a recorded primary sponsor
no_primary = merged_df['primary_name'].isna().sum()
total_cosponsors = len(cosponsor_df)
pct_no_primary = (no_primary / total_cosponsors) * 100 if total_cosponsors > 0 else 0
print(f"Cosponsors without linked primary sponsor: {no_primary} of {total_cosponsors} ({pct_no_primary:.2f}%)")
# Flag whether the cosponsor is from a different party (ignoring rows with missing primary_party)
merged_df['different_party'] = (
merged_df['primary_party'].notna() &
(merged_df['party'] != merged_df['primary_party'])
)
Cosponsors without linked primary sponsor: 0 of 80345 (0.00%)
Count and rank bills with cross-party cosponsors#
We count the number of bills each primary sponsor introduced that had at least one cross-party cosponsor. Primary sponsors with zero such bills are retained in the ranking with a count of zero.
# Get all unique primary sponsors
all_primary_sponsors = primary_df[['primary_name', 'primary_bioguide_id']].drop_duplicates()
# Identify bills with at least one cross-party cosponsor
bills_with_diff_party = merged_df[merged_df['different_party'] == True][
['bill_id', 'primary_name', 'primary_bioguide_id']
].drop_duplicates()
# Count number of such bills per primary sponsor
cross_party_counts = bills_with_diff_party.groupby(
['primary_name', 'primary_bioguide_id']
).size().reset_index(name='num_bills_with_cross_party_cosponsors')
# Merge with all primary sponsors to include those with zero
result = all_primary_sponsors.merge(
cross_party_counts,
on=['primary_name', 'primary_bioguide_id'],
how='left'
)
result['num_bills_with_cross_party_cosponsors'] = result['num_bills_with_cross_party_cosponsors'].fillna(0).astype(int)
# Sort and rank
ranked_result = result.sort_values(
by='num_bills_with_cross_party_cosponsors',
ascending=False
).reset_index(drop=True)
ranked_result['rank'] = ranked_result.index + 1
# Preview final result
display(ranked_result.head(10))
primary_name | primary_bioguide_id | num_bills_with_cross_party_cosponsors | rank | |
---|---|---|---|---|
0 | Amy Klobuchar | K000367 | 51 | 1 |
1 | Bill Cassidy | C001075 | 49 | 2 |
2 | John Cornyn | C001056 | 48 | 3 |
3 | Chuck Grassley | G000386 | 43 | 4 |
4 | Brian K. Fitzpatrick | F000466 | 42 | 5 |
5 | Marsha Blackburn | B001243 | 41 | 6 |
6 | Gary Peters | P000595 | 38 | 7 |
7 | Catherine Cortez Masto | C001113 | 34 | 8 |
8 | Jeanne Shaheen | S001181 | 31 | 9 |
9 | Zachary Nunn | N000193 | 29 | 10 |
Save ranked_result to csv files#
# Uncomment to save to google drive
# ranked_result.to_csv("/content/drive/MyDrive/Bridge Grades Snapshot August/Data/Source A-B/Output files/bridge_grade_source_a_cross_party_supported_bills.csv", index=False)
# Save to local machine
ranked_result.to_csv("../Data/Source A-B/Output files/bridge_grade_source_a_cross_party_supported_bills.csv", index=False)
About this file
The bridge_grade_source_a_cross_party_supported_bills.csv file contains a ranked list of legislators who served as primary sponsors during the 119th Congress, based on the number of bills they introduced that received support from at least one cosponsor of a different political party.
Each row represents a unique primary sponsor and includes:
primary_name: The name of the primary sponsor
primary_bioguide_id: A unique identifier for the legislator
num_bills_with_cross_party_cosponsors: The number of sponsored bills that received cross-party cosponsorship
rank: The sponsor’s rank based on that count (rank 1 indicates the most bipartisan engagement)
This dataset constitutes Source A in the Bridge Grades methodology:
“Authors of Bills with Cross Party Sponsors” — a signal of bipartisan openness based on the cosponsorship behavior of others toward the legislator’s initiatives.
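As a toy illustration of the file's schema (values invented, not taken from the real output), the rank column is simply 1..N assigned after a descending sort on the count:

```python
import pandas as pd

# Invented example rows mirroring the Source A schema
toy = pd.DataFrame({
    "primary_name": ["X", "Y", "Z"],
    "primary_bioguide_id": ["A000001", "A000002", "A000003"],
    "num_bills_with_cross_party_cosponsors": [5, 9, 0],
})

# Sort descending on the count, then number the rows starting at 1
toy = (toy.sort_values("num_bills_with_cross_party_cosponsors", ascending=False)
          .reset_index(drop=True))
toy["rank"] = toy.index + 1
print(list(toy["primary_name"]))  # → ['Y', 'X', 'Z']
print(list(toy["rank"]))          # → [1, 2, 3]
```

Note that primary sponsors with zero qualifying bills still appear, just at the bottom of the ranking.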
Construct Cross-Party Cosponsorship Ranking (Source B)#
This section generates a ranking of legislators based on the number of bills they cosponsored where the primary sponsor was from a different political party.
The output represents Source B in the Bridge Grades methodology:
“Ranked People who Cosponsor Bills” — a measure of bipartisan engagement based on a legislator’s willingness to support legislation authored by the opposing party.
The data used here comes from the merged_sponsorships
DataFrame produced earlier in this notebook.
Step 1: Remove rows without party affiliation#
Cosponsorship records without party information cannot be used to assess bipartisan behavior. We remove these rows and report how many are excluded.
# Count and drop rows with missing party
total_rows = len(merged_sponsorships)
missing_party_count = merged_sponsorships['party'].isna().sum()
pct_missing = (missing_party_count / total_rows) * 100 if total_rows > 0 else 0
print(f"Rows without party: {missing_party_count} of {total_rows} ({pct_missing:.2f}%)")
df = merged_sponsorships.dropna(subset=['party'])
Rows without party: 0 of 89176 (0.00%)
Step 2: Join cosponsors with party of the primary sponsor#
We isolate primary sponsors and cosponsors, then merge cosponsors with the party of the legislator who introduced the bill. We also check if any cosponsor entries cannot be linked to a primary sponsor.
# Get party of bill authors (primary sponsors)
primary_df = df[df['classification'] == 'primary'][['bill_id', 'party']].copy()
primary_df = primary_df.rename(columns={'party': 'primary_party'})
# Get cosponsor records with identifying information
cosponsor_df = df[df['classification'] == 'cosponsor'][['bill_id', 'name', 'party', 'bioguide_id']].copy()
# Merge cosponsors with primary sponsor party
merged_df = cosponsor_df.merge(primary_df, on='bill_id', how='left')
# Report cosponsors without primary sponsor info
no_primary_link = merged_df['primary_party'].isna().sum()
total_cosponsors = len(cosponsor_df)
pct_no_primary = (no_primary_link / total_cosponsors) * 100 if total_cosponsors > 0 else 0
print(f"Cosponsors without linked primary sponsor: {no_primary_link} of {total_cosponsors} ({pct_no_primary:.2f}%)")
# Flag cross-party sponsorship only when both parties are known
merged_df['different_party'] = (
merged_df['primary_party'].notna() &
(merged_df['party'] != merged_df['primary_party'])
)
Cosponsors without linked primary sponsor: 0 of 80345 (0.00%)
Step 3: Count and rank cross-party cosponsorships#
We identify all legislators who cosponsored at least one bill from the opposing party. Those who did not are still included in the final ranking with a count of zero.
# All unique cosponsors
all_cosponsors = cosponsor_df[['name', 'bioguide_id']].drop_duplicates()
# Filter to cross-party cosponsorship cases
cross_party_df = merged_df[merged_df['different_party'] == True][['name', 'bioguide_id', 'bill_id']].drop_duplicates()
# Count number of cross-party cosponsorships per legislator
cross_party_counts = cross_party_df.groupby(
['name', 'bioguide_id']
).size().reset_index(name='num_cross_party_cosponsored_bills')
# Merge to include all cosponsors (even those with 0 cross-party cases)
result = all_cosponsors.merge(
cross_party_counts,
on=['name', 'bioguide_id'],
how='left'
)
result['num_cross_party_cosponsored_bills'] = result['num_cross_party_cosponsored_bills'].fillna(0).astype(int)
# Sort and rank
ranked_result_b = result.sort_values(
by='num_cross_party_cosponsored_bills',
ascending=False
).reset_index(drop=True)
ranked_result_b['rank'] = ranked_result_b.index + 1
# Preview top-ranked cross-party cosponsors
display(ranked_result_b.head(10))
name | bioguide_id | num_cross_party_cosponsored_bills | rank | |
---|---|---|---|---|
0 | Brian K. Fitzpatrick | F000466 | 433 | 1 |
1 | Donald G. Davis | D000230 | 240 | 2 |
2 | Jimmy Panetta | P000613 | 167 | 3 |
3 | Eugene Simon Vindman | V000138 | 161 | 4 |
4 | Josh Gottheimer | G000583 | 160 | 5 |
5 | Michael Lawler | L000599 | 153 | 6 |
6 | Thomas R. Suozzi | S001201 | 130 | 7 |
7 | Don Bacon | B001298 | 126 | 8 |
8 | Amy Klobuchar | K000367 | 117 | 9 |
9 | Angie Craig | C001119 | 108 | 10 |
Save ranked_result_b to csv files#
# Uncomment to save to google drive
# ranked_result_b.to_csv("/content/drive/MyDrive/Bridge Grades Snapshot August/Data/Source A-B/Output files/bridge_grade_source_b_cross_party_cosponsors.csv", index=False)
# Save to local machine
ranked_result_b.to_csv("../Data/Source A-B/Output files/bridge_grade_source_b_cross_party_cosponsors.csv", index=False)
Visual Summary of Cross-Party Activity and Data Quality#
This section presents three key visualizations to complement the quantitative analysis:
Top Cross-Party Cosponsors (Source B): Legislators ranked by how often they cosponsored bills from the opposite party.
Top Authors of Cross-Party Supported Bills (Source A): Legislators ranked by the number of their sponsored bills that received bipartisan support.
Missing Values Overview: A chart highlighting columns with missing information, including the source and reasoning behind those gaps.
These charts help illustrate not only patterns of bipartisan behavior, but also the integrity and limitations of the underlying data.
ordered_purples = ['#3f007d', '#54278f', '#6a51a3', '#807dba', '#9e9ac8', '#bcbddc']
# Bar chart — Source A: Top Cross-Party Bill Authors
top_authors = ranked_result.sort_values(
by="num_bills_with_cross_party_cosponsors", ascending=False
).head(15)
fig_authors = px.bar(
top_authors,
x="num_bills_with_cross_party_cosponsors",
y="primary_name",
orientation="h",
color="num_bills_with_cross_party_cosponsors",
color_continuous_scale=ordered_purples[::-1],
title="Top 15 Legislators by Cross-Party Support to Their Bills (Source A)"
)
fig_authors.update_layout(
xaxis_title="Number of Bills with Cross-Party Cosponsors",
yaxis_title="Primary Sponsor",
yaxis=dict(autorange="reversed"),
plot_bgcolor="#ffffff",
paper_bgcolor="#ffffff",
coloraxis_showscale=False,
margin=dict(l=100, r=20, t=60, b=40)
)
fig_authors.show()