Source C-D-E-F: App Communications Calculations Data Processing#

Overview

This notebook processes public communication data from legislators in the 119th U.S. Congress to generate four foundational datasets for the Bridge Grades methodology:

  • Source C: Bipartisan Communication Sum - Total count of bipartisan communications per legislator

  • Source D: Bipartisan Communication Percentage - Share of total communications that are bipartisan per legislator

  • Source E: Personal Attack Sum - Total count of personal attacks per legislator

  • Source F: Personal Attack Percentage - Share of total communications that are personal attacks per legislator

These datasets serve as key inputs for computing individual Bridge Grades by analyzing patterns of bipartisan rhetoric and divisive communication in public statements.
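To make the four definitions concrete, here is a minimal sketch on made-up data. The bioguide IDs, the toy flags, and the `source_c` through `source_f` column names are illustrative assumptions, not values or names from the real files; the flags are assumed to be 0/1 per communication.

```python
import pandas as pd

# Hypothetical toy data: one row per communication, flags assumed to be 0/1.
toy = pd.DataFrame({
    "bioguide_id": ["X000001", "X000001", "X000001", "X000001", "Y000002", "Y000002"],
    "outcome_bipartisanship": [1, 0, 1, 0, 0, 0],
    "attack_personal": [0, 0, 0, 1, 1, 1],
})

# Sources C and E are per-legislator sums; the count is the denominator for D and F.
summary = toy.groupby("bioguide_id").agg(
    communication_count=("outcome_bipartisanship", "size"),
    source_c=("outcome_bipartisanship", "sum"),
    source_e=("attack_personal", "sum"),
)

# Sources D and F are the sums expressed as a percentage of all communications.
summary["source_d"] = summary["source_c"] / summary["communication_count"] * 100
summary["source_f"] = summary["source_e"] / summary["communication_count"] * 100

print(summary)
```

In this toy case the first legislator has 4 communications, 2 of them bipartisan (50%) and 1 a personal attack (25%).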

Data Sources#

Input Files#

  • 2025.csv - Raw communication data from Americas Political Pulse

  • 119th_Congress_*.csv - Bioguide ID mapping for 119th Congress legislators

Data Source Details#

  • Source: Americas Political Pulse

  • Congress: 119th U.S. Congress

  • Download Date: August 8, 2025

  • Coverage: All forms of public communication by legislators including:

    • Floor speeches

    • Newsletters

    • Press releases

    • Tweets/X posts


Outputs#

Source C: Bipartisan Communication Sum#

Column: outcome_bipartisanship in bridge_grades_source_cdef_app_communication.csv

Description: Total count of communications flagged as bipartisan per legislator.

Interpretation: Higher values indicate legislators who engage in more bipartisan rhetoric across all their public communications.

Source D: Bipartisan Communication Percentage#

Column: outcome_bipartisanship_pct in bridge_grades_source_cdef_app_communication.csv

Description: Percentage of total communications that are flagged as bipartisan per legislator.

Interpretation: Higher percentages indicate legislators whose communication style is more consistently bipartisan relative to their total output.
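The difference between the sum (Source C) and the percentage (Source D) can be seen in two rows of the notebook's own preview output: Marsha Blackburn and Mark Alford each have 76 bipartisan communications, but very different total volumes. The sketch below only redoes that arithmetic; the variable names are illustrative.

```python
# Values copied from the preview table in this notebook.
blackburn_bipartisan, blackburn_total = 76, 2331
alford_bipartisan, alford_total = 76, 1493

# Same Source C value, different Source D values.
blackburn_pct = blackburn_bipartisan / blackburn_total * 100
alford_pct = alford_bipartisan / alford_total * 100

print(round(blackburn_pct, 2), round(alford_pct, 2))
```

Identical sums therefore map to roughly 3.26% and 5.09%, which is why the methodology keeps both measures.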

Source E: Personal Attack Sum#

Column: attack_personal in bridge_grades_source_cdef_app_communication.csv

Description: Total count of communications flagged as personal attacks per legislator.

Interpretation: Higher values indicate legislators who engage in more personal attacks across all their public communications.

Source F: Personal Attack Percentage#

Column: attack_personal_pct in bridge_grades_source_cdef_app_communication.csv

Description: Percentage of total communications that are flagged as personal attacks per legislator.

Interpretation: Higher percentages indicate legislators whose communication style is more consistently divisive relative to their total output.

Complete Output Dataset#

File: bridge_grades_source_cdef_app_communication.csv

Columns:

  • bioguide_id: Unique legislator identifier

  • full_name: Legislator’s full name

  • communication_count: Total number of communications

  • attack_personal: Sum of personal attacks (Source E)

  • outcome_bipartisanship: Sum of bipartisan communications (Source C)

  • policy: Sum of policy-focused communications

  • attack_personal_pct: Percentage of personal attacks (Source F)

  • outcome_bipartisanship_pct: Percentage of bipartisan communications (Source D)

  • policy_pct: Percentage of policy-focused communications
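A consumer of this file may want to confirm the column layout before using it. The following is a small sketch of such a check; `check_schema` is a hypothetical helper, and the in-memory `sample` frame (built from the first row of the notebook's preview) stands in for reading the real CSV with `pd.read_csv`.

```python
import pandas as pd

# Column names as listed in this section.
EXPECTED_COLUMNS = [
    "bioguide_id", "full_name", "communication_count",
    "attack_personal", "outcome_bipartisanship", "policy",
    "attack_personal_pct", "outcome_bipartisanship_pct", "policy_pct",
]

def check_schema(df: pd.DataFrame) -> list:
    """Return the expected columns that are missing from df (hypothetical helper)."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]

# Stand-in for the output file: the first row of the notebook's preview.
sample = pd.DataFrame([{
    "bioguide_id": "A000055", "full_name": "Robert Aderholt",
    "communication_count": 256, "attack_personal": 3.0,
    "outcome_bipartisanship": 10.0, "policy": 148.0,
    "attack_personal_pct": 1.171875, "outcome_bipartisanship_pct": 3.90625,
    "policy_pct": 57.8125,
}])

missing = check_schema(sample)
print(missing)
```

An empty list means every documented column is present.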


Technical Requirements#

Dependencies#

  • pandas: Data manipulation and analysis

  • numpy: Numerical operations

  • matplotlib.pyplot: Data visualization

  • re: Regular expressions for text processing

Performance Notes#

  • Aggregation operations are efficient using pandas groupby functions

  • Percentage calculations provide normalized metrics for fair comparison

  • All original communication records are preserved for transparency


Data Quality#

Data Integrity Notes#

  • Communication data is automatically collected from multiple public sources

  • Flags are applied consistently across all communication types

  • Bioguide ID matching ensures reliable legislator identification

  • Percentage calculations normalize for differences in communication volume
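One way to verify the bioguide ID matching is an outer merge with pandas' `indicator=True` option, which labels each row by where it was found. This is a sketch under assumptions: the notebook does not show such a check, and the IDs, names, and counts below are invented.

```python
import pandas as pd

# Hypothetical roster lookup (bioguide_id, full_name).
lookup = pd.DataFrame({
    "bioguide_id": ["X000001", "Y000002", "Z000003"],
    "full_name": ["Ann Example", "Bob Sample", "Cy Placeholder"],
})

# Hypothetical per-legislator communication summary.
summary = pd.DataFrame({
    "bioguide_id": ["X000001", "Y000002", "W000009"],
    "communication_count": [10, 5, 7],
})

# The _merge column reports 'left_only', 'right_only', or 'both' for each row.
merged = lookup.merge(summary, on="bioguide_id", how="outer", indicator=True)

no_communications = merged.loc[merged["_merge"] == "left_only", "bioguide_id"].tolist()
not_in_roster = merged.loc[merged["_merge"] == "right_only", "bioguide_id"].tolist()

print(no_communications, not_in_roster)
```

IDs in `no_communications` are roster members with no rows in the communication data; IDs in `not_in_roster` appear in the communication data but not in the roster.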

Key Metrics#

  • Communication Volume: Varies significantly by legislator

  • Flagging Consistency: Applied uniformly across all communication types

  • Coverage: Includes all major forms of public communication

  • Timeliness: Data reflects current 119th Congress activity


Notebook Walkthrough: Preprocessing for Bridge Grades: App Communications Calculations Data#

This notebook prepares the input data used to generate 4 foundational resources in the Bridge Grades methodology:

  • Source C: Sums the outcome_bipartisanship column for each legislator in the communication file.

  • Source D: Divides each legislator's outcome_bipartisanship sum by that legislator's total number of communications in the file, expressed as a percentage.

  • Source E: Sums the attack_personal column for each legislator in the communication file.

  • Source F: Divides each legislator's attack_personal sum by that legislator's total number of communications in the file, expressed as a percentage.

These resources serve as the basis for computing individual Bridge Grades by analyzing patterns of bipartisan rhetoric and personal attacks in legislators' public communications.

The datasets processed here correspond to the 119th U.S. Congress and were downloaded from the public data portal at Americas Political Pulse.

  • Date downloaded: August 8, 2025

  • Data includes: all forms of public communication by legislators in the House and Senate, collected automatically, including:

    • Floor speeches

    • Newsletters

    • Press releases

    • Tweets/X

  • Download instructions:

    1. Go to the US-OFFICIALS button on the left side of the page and click Download, or click Here.

    2. Go to US –> Download 2025 –> 2025-04-24 app comm raw.csv.csv

# Uncomment to run from colab in google drive

# from google.colab import drive
# drive.mount('/content/drive')
# Import required libraries
import pandas as pd
import numpy as np
import glob
# Load raw communications data
df = pd.read_csv('../Data/Source C-D-E-F/Input files/2025.csv', low_memory=False)

# Read in the 119th Congress data with bioguide ids
files = sorted(glob.glob("../Data/Source C-D-E-F/Input files/119th_Congress_*.csv"))
latest = files[-1]
df_119 = pd.read_csv(latest)
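The pattern above takes the last path after sorting, which is the most recent file only if the file names sort chronologically, and `files[-1]` raises an `IndexError` when nothing matches. A slightly more defensive variant might look like the sketch below; `latest_match` is a hypothetical helper, and the dated file names in the demo are invented for illustration.

```python
import glob
import os
import tempfile

def latest_match(pattern: str) -> str:
    """Return the lexicographically last path matching pattern (hypothetical helper)."""
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(f"No files match {pattern!r}")
    return matches[-1]

# Demo in a temporary directory with made-up, date-suffixed file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("119th_Congress_20250101.csv", "119th_Congress_20250808.csv"):
        open(os.path.join(tmp, name), "w").close()
    picked = os.path.basename(latest_match(os.path.join(tmp, "119th_Congress_*.csv")))

print(picked)
```

The explicit error message makes a missing input file easier to diagnose than a bare index error.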

Bipartisanship processing#

This step turns the raw 119th Congress bioguide file into a clean lookup by concatenating first_name and last_name into full_name and retaining only bioguide_id and full_name.

df_119.columns
df_bioguide = df_119.copy()
df_bioguide['full_name'] = df_bioguide['first_name'] + ' ' + df_bioguide['last_name']
df_bioguide = df_bioguide[['bioguide_id', 'full_name']]
df_bioguide.head()
bioguide_id full_name
0 M001233 Mark Messmer
1 R000617 Delia Ramirez
2 S001232 Tim Sheehy
3 L000570 Ben Luján
4 H001089 Josh Hawley

Processing and Aggregation of APP Communication Data#

This function distills the raw APP communications data into per‐legislator summaries by counting total messages, aggregating personal attacks, bipartisan outcomes, and policy mentions, and then computing corresponding percentage rates for 2025.

def clean_communications_app(df):
    # Keep only necessary columns
    df = df[['bioguide_id', 'attack_personal', 'outcome_bipartisanship', 'policy', 'first_name', 'last_name']].copy()

    # Count total number of communications per legislator
    df['communication_count'] = df.groupby('bioguide_id')['bioguide_id'].transform('count')

    # Create full_name column from first and last name
    df['full_name'] = df['first_name'].str.strip() + ' ' + df['last_name'].str.strip()

    # Aggregate total flags by legislator
    df = df.groupby(['bioguide_id', 'full_name', 'communication_count'], as_index=False).agg({
        'attack_personal': 'sum',
        'outcome_bipartisanship': 'sum',
        'policy': 'sum'
    })

    # Compute percentage-based indicators
    df['attack_personal_pct'] = df['attack_personal'] / df['communication_count'] * 100
    df['outcome_bipartisanship_pct'] = df['outcome_bipartisanship'] / df['communication_count'] * 100
    df['policy_pct'] = df['policy'] / df['communication_count'] * 100

    return df


# Clean and sort APP communication dataset for year 2025
df_comm_app_2025 = (
    clean_communications_app(df)
    .sort_values(by='bioguide_id', ascending=True)
    .reset_index(drop=True)
)
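One behavior worth keeping in mind with this kind of aggregation: pandas `groupby` drops groups whose key contains a missing value by default (`dropna=True`), so a legislator whose `full_name` evaluates to NaN would silently disappear from a grouping that includes `full_name`. The sketch below demonstrates the effect on made-up rows; whether the real data contains missing names is not something this notebook shows.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the second legislator has a missing name.
toy = pd.DataFrame({
    "bioguide_id": ["X000001", "X000001", "Y000002"],
    "full_name": ["Ann Example", "Ann Example", np.nan],
    "policy": [1, 0, 1],
})

keys = ["bioguide_id", "full_name"]

# Default behavior: the group with a NaN key is dropped.
default = toy.groupby(keys, as_index=False)["policy"].sum()

# dropna=False keeps it.
kept = toy.groupby(keys, as_index=False, dropna=False)["policy"].sum()

print(len(default), len(kept))
```

Comparing the number of unique `bioguide_id` values before and after aggregation is a quick way to detect this.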
# Preview the first 20 rows
df_comm_app_2025.head(20)
bioguide_id full_name communication_count attack_personal outcome_bipartisanship policy attack_personal_pct outcome_bipartisanship_pct policy_pct
0 A000055 Robert Aderholt 256 3.0 10.0 148.0 1.171875 3.906250 57.812500
1 A000148 Jake Auchincloss 903 76.0 99.0 647.0 8.416390 10.963455 71.650055
2 A000369 Mark Amodei 288 1.0 20.0 200.0 0.347222 6.944444 69.444444
3 A000370 Alma Adams 285 3.0 12.0 171.0 1.052632 4.210526 60.000000
4 A000371 Pete Aguilar 123 4.0 3.0 80.0 3.252033 2.439024 65.040650
5 A000372 Rick Allen 657 1.0 34.0 399.0 0.152207 5.175038 60.730594
6 A000375 Jodey Arrington 963 13.0 40.0 600.0 1.349948 4.153686 62.305296
7 A000379 Mark Alford 1493 12.0 76.0 659.0 0.803751 5.090422 44.139317
8 A000380 Gabe Amo 937 52.0 56.0 642.0 5.549626 5.976521 68.516542
9 A000381 Yassamin Ansari 246 10.0 6.0 133.0 4.065041 2.439024 54.065041
10 A000382 Angela Alsobrooks 356 23.0 23.0 168.0 6.460674 6.460674 47.191011
11 B000490 Sanford Bishop 430 1.0 22.0 142.0 0.232558 5.116279 33.023256
12 B000668 Cliff Bentz 229 2.0 7.0 152.0 0.873362 3.056769 66.375546
13 B000740 Stephanie Bice 602 2.0 41.0 370.0 0.332226 6.810631 61.461794
14 B000825 Lauren Boebert 455 10.0 27.0 278.0 2.197802 5.934066 61.098901
15 B001230 Tammy Baldwin 1258 48.0 100.0 903.0 3.815580 7.949126 71.780604
16 B001236 John Boozman 824 1.0 103.0 566.0 0.121359 12.500000 68.689320
17 B001243 Marsha Blackburn 2331 99.0 76.0 1405.0 4.247104 3.260403 60.274560
18 B001257 Gus Bilirakis 1744 6.0 94.0 862.0 0.344037 5.389908 49.426606
19 B001260 Vern Buchanan 733 8.0 61.0 577.0 1.091405 8.321965 78.717599
# Save to csv
df_comm_app_2025.to_csv('../Data/Source C-D-E-F/Output files/bridge_grades_source_cdef_app_communication.csv', index=False)
# Check that all legislators are present
df_comm_app_2025.shape[0]
537
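Beyond the row count, a few cheap consistency checks can be run on the aggregated table. The sketch below is one possible set, not part of the original notebook: `sanity_check` is a hypothetical helper, and `sample` reuses two rows from the preview above in place of `df_comm_app_2025`.

```python
import pandas as pd

FLAG_COLUMNS = ["attack_personal", "outcome_bipartisanship", "policy"]

def sanity_check(df: pd.DataFrame) -> list:
    """Return a list of human-readable problems found in df (hypothetical helper)."""
    problems = []
    if df["bioguide_id"].duplicated().any():
        problems.append("duplicate bioguide_id")
    for flag in FLAG_COLUMNS:
        # A flag total can never exceed the number of communications.
        if (df[flag] > df["communication_count"]).any():
            problems.append(f"{flag} exceeds communication_count")
        # Percentages must lie between 0 and 100.
        if not df[f"{flag}_pct"].between(0, 100).all():
            problems.append(f"{flag}_pct outside 0-100")
    return problems

# Two rows copied from the preview output.
sample = pd.DataFrame({
    "bioguide_id": ["A000055", "A000148"],
    "communication_count": [256, 903],
    "attack_personal": [3.0, 76.0],
    "outcome_bipartisanship": [10.0, 99.0],
    "policy": [148.0, 647.0],
    "attack_personal_pct": [1.171875, 8.416390],
    "outcome_bipartisanship_pct": [3.906250, 10.963455],
    "policy_pct": [57.812500, 71.650055],
})

problems = sanity_check(sample)
print(problems)
```

An empty list indicates that none of these checks failed for the rows supplied.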