Source C-D-E-F: App Communications Calculations Data Processing


Overview

This notebook processes public communication data from legislators in the 119th U.S. Congress to generate four foundational datasets for the Bridge Grades methodology:

Source C: Bipartisan Communication Sum - Total count of bipartisan communications per legislator
Source D: Bipartisan Communication Percentage - Share of total communications that are bipartisan per legislator
Source E: Personal Attack Sum - Total count of personal attacks per legislator
Source F: Personal Attack Percentage - Share of total communications that are personal attacks per legislator

These datasets serve as key inputs for computing individual Bridge Grades by analyzing patterns of bipartisan rhetoric and divisive communication in public statements.

Data Sources#

Input Files#

2025.csv - Raw communication data from Americas Political Pulse
119th_Congress_*.csv - Bioguide ID mapping for 119th Congress legislators

Data Source Details#

Source: Americas Political Pulse
Congress: 119th U.S. Congress
Download Date: August 8, 2025
Coverage: All forms of public communication by legislators including:
- Floor speeches
- Newsletters
- Press releases
- Tweets/X posts

Outputs#

Source C: Bipartisan Communication Sum#

Column: outcome_bipartisanship in bridge_grades_source_cdef_app_communication.csv

Description: Total count of communications flagged as bipartisan per legislator.

Interpretation: Higher values indicate legislators who engage in more bipartisan rhetoric across all their public communications.

Source D: Bipartisan Communication Percentage#

Column: outcome_bipartisanship_pct in bridge_grades_source_cdef_app_communication.csv

Description: Percentage of total communications that are flagged as bipartisan per legislator.

Interpretation: Higher percentages indicate legislators whose communication style is more consistently bipartisan relative to their total output.

Source E: Personal Attack Sum#

Column: attack_personal in bridge_grades_source_cdef_app_communication.csv

Description: Total count of communications flagged as personal attacks per legislator.

Interpretation: Higher values indicate legislators who engage in more personal attacks across all their public communications.

Source F: Personal Attack Percentage#

Column: attack_personal_pct in bridge_grades_source_cdef_app_communication.csv

Description: Percentage of total communications that are flagged as personal attacks per legislator.

Interpretation: Higher percentages indicate legislators whose communication style is more consistently divisive relative to their total output.

Complete Output Dataset#

File: bridge_grades_source_cdef_app_communication.csv

Columns:

bioguide_id: Unique legislator identifier
full_name: Legislator’s full name
communication_count: Total number of communications
attack_personal: Sum of personal attacks (Source E)
outcome_bipartisanship: Sum of bipartisan communications (Source C)
policy: Sum of policy-focused communications
attack_personal_pct: Percentage of personal attacks (Source F)
outcome_bipartisanship_pct: Percentage of bipartisan communications (Source D)
policy_pct: Percentage of policy-focused communications
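The column list above implies a few properties that can be checked mechanically: all nine columns exist, each legislator appears once, and every percentage equals its count divided by communication_count times 100. The following is a minimal sketch of such a check, not part of the original notebook; the function name check_output_schema is hypothetical, and the sample row is copied from the output preview shown later in this notebook.

```python
import numpy as np
import pandas as pd

# Column names as listed in this section
EXPECTED_COLUMNS = [
    "bioguide_id", "full_name", "communication_count",
    "attack_personal", "outcome_bipartisanship", "policy",
    "attack_personal_pct", "outcome_bipartisanship_pct", "policy_pct",
]


def check_output_schema(df):
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
        return problems  # later checks need every column
    if df["bioguide_id"].duplicated().any():
        problems.append("duplicate bioguide_id rows")
    for flag in ("attack_personal", "outcome_bipartisanship", "policy"):
        # Each percentage should equal count / total * 100
        expected = df[flag] / df["communication_count"] * 100
        if not np.allclose(df[f"{flag}_pct"], expected):
            problems.append(f"{flag}_pct is inconsistent with {flag}")
    return problems


# One row copied from the output preview (Robert Aderholt)
sample = pd.DataFrame([{
    "bioguide_id": "A000055", "full_name": "Robert Aderholt",
    "communication_count": 256, "attack_personal": 3.0,
    "outcome_bipartisanship": 10.0, "policy": 148.0,
    "attack_personal_pct": 1.171875,
    "outcome_bipartisanship_pct": 3.906250,
    "policy_pct": 57.812500,
}])
problems = check_output_schema(sample)
print(problems)
```

In practice the check would presumably be run on the DataFrame read back from bridge_grades_source_cdef_app_communication.csv.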

Technical Requirements#

Dependencies#

pandas: Data manipulation and analysis
numpy: Numerical operations
matplotlib.pyplot: Data visualization
re: Regular expressions for text processing
glob: File pattern matching, used to locate the 119th_Congress_*.csv mapping file

Performance Notes#

Aggregation relies on pandas groupby operations, which compute the per-legislator sums efficiently
Percentage calculations provide normalized metrics for fair comparison
All original communication records are preserved for transparency

Data Quality#

Data Integrity Notes#

Communication data is automatically collected from multiple public sources
Flags are applied consistently across all communication types
Bioguide ID matching ensures reliable legislator identification
Percentage calculations normalize for differences in communication volume
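As a worked illustration of the last point, the sketch below recomputes the bipartisan percentage for three legislators whose counts appear in the output preview later in this notebook. Mark Alford and Marsha Blackburn both have 76 bipartisan communications, yet their rates differ (about 5.09% and 3.26%) because their total volumes differ. The variable names are illustrative only.

```python
import pandas as pd

# Counts copied from the output preview in this notebook
rows = pd.DataFrame({
    "full_name": ["Mark Alford", "Marsha Blackburn", "John Boozman"],
    "communication_count": [1493, 2331, 824],
    "outcome_bipartisanship": [76, 76, 103],
})

# Same formula as Source D: bipartisan count / total communications * 100
rows["outcome_bipartisanship_pct"] = (
    rows["outcome_bipartisanship"] / rows["communication_count"] * 100
)

print(rows.round(2))
```

The raw sum (Source C) rewards volume, while the percentage (Source D) puts legislators with very different output on the same scale, which is presumably why both are kept.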

Key Metrics#

Communication Volume: Varies significantly by legislator
Flagging Consistency: Applied uniformly across all communication types
Coverage: Includes all major forms of public communication
Timeliness: Data reflects current 119th Congress activity

Notebook Walkthrough: Preprocessing for Bridge Grades: App Communications Calculations Data#

This notebook prepares the input data used to generate 4 foundational resources in the Bridge Grades methodology:

Source C: Sums the outcome_bipartisanship column for each legislator in the communication file.
Source D: Divides each legislator's outcome_bipartisanship sum by that legislator's total number of communications in the file, expressed as a percentage.
Source E: Sums the attack_personal column for each legislator in the communication file.
Source F: Divides each legislator's attack_personal sum by that legislator's total number of communications in the file, expressed as a percentage.
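The four definitions above can be sketched on a small toy table. This is an illustration under stated assumptions rather than the notebook's own code: the bioguide IDs are made up, the source_c through source_f column names are hypothetical labels, and the flags are assumed to be 0/1 values per communication.

```python
import pandas as pd

# Toy data: one row per communication, with made-up IDs and 0/1 flags
toy = pd.DataFrame({
    "bioguide_id": ["X000001"] * 4 + ["Y000002"] * 2,
    "outcome_bipartisanship": [1, 0, 0, 1, 0, 0],
    "attack_personal": [0, 1, 0, 0, 0, 1],
})

summary = toy.groupby("bioguide_id", as_index=False).agg(
    # "size" counts rows per group, whichever column it is applied to
    communication_count=("outcome_bipartisanship", "size"),
    source_c=("outcome_bipartisanship", "sum"),  # bipartisan sum
    source_e=("attack_personal", "sum"),         # personal attack sum
)

# Percentages: sum divided by total communications, times 100
summary["source_d"] = summary["source_c"] / summary["communication_count"] * 100
summary["source_f"] = summary["source_e"] / summary["communication_count"] * 100

print(summary)
```

For the first toy legislator, 2 of 4 communications are bipartisan and 1 of 4 is a personal attack, so the percentages come out to 50 and 25.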

These resources serve as the basis for computing individual Bridge Grades by analyzing patterns of bipartisan rhetoric and personal attacks in legislators' public communications.

The datasets processed here correspond to the 119th U.S. Congress and were downloaded from the public data portal at Americas Political Pulse.

Date downloaded: August 8, 2025
Data includes: All forms of public communication by legislators in the House and Senate, collected automatically, including:
- Floor speeches
- Newsletters
- Press releases
- Tweets/X posts
Download instructions:
1. On the left side of the portal, find the US-OFFICIALS button and click Download.
2. Go to US -> Download 2025 -> 2025-04-24 app comm raw.csv.csv

# Uncomment to run from colab in google drive

# from google.colab import drive
# drive.mount('/content/drive')


# Import required libraries
import pandas as pd
import numpy as np
import glob

# Load the raw communications data
df = pd.read_csv('../Data/Source C-D-E-F/Input files/2025.csv', low_memory=False)

# Read in the 119th Congress data with bioguide ids
files = sorted(glob.glob("../Data/Source C-D-E-F/Input files/119th_Congress_*.csv"))
latest = files[-1]
df_119 = pd.read_csv(latest)
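The cell above picks the last path in sorted order, which raises an IndexError if no file matches the pattern. A slightly more defensive variant might look like the sketch below. The helper name latest_matching_file is hypothetical, and the date-style file name suffixes in the demo are an assumption: sorted order is lexicographic, so the last match is the most recent one only if the suffixes sort chronologically.

```python
import glob
import os
import tempfile


def latest_matching_file(pattern):
    """Hypothetical helper: return the last path, in sorted order, matching a glob pattern."""
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(f"No files match pattern: {pattern}")
    return matches[-1]


# Demo in a temporary directory with made-up, date-suffixed file names
with tempfile.TemporaryDirectory() as tmp:
    for name in ("119th_Congress_2025-01-03.csv", "119th_Congress_2025-08-08.csv"):
        open(os.path.join(tmp, name), "w").close()
    picked = latest_matching_file(os.path.join(tmp, "119th_Congress_*.csv"))
    picked_name = os.path.basename(picked)

print(picked_name)
```

A clear FileNotFoundError that names the pattern is usually easier to diagnose than an IndexError when the input folder is empty or the working directory is wrong.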

Bioguide ID processing#

This step turns the raw 119th Congress bioguide file into a clean lookup table by combining first_name and last_name into a full_name column and retaining only bioguide_id and full_name.

df_119.columns

df_bioguide = df_119.copy()
df_bioguide['full_name'] = df_bioguide['first_name'] + ' ' + df_bioguide['last_name']
df_bioguide = df_bioguide[['bioguide_id', 'full_name']]
df_bioguide.head()

	bioguide_id	full_name
0	M001233	Mark Messmer
1	R000617	Delia Ramirez
2	S001232	Tim Sheehy
3	L000570	Ben Luján
4	H001089	Josh Hawley

Processing and Aggregation of APP Communication Data#

This function distills the raw APP communications data into per‐legislator summaries by counting total messages, aggregating personal attacks, bipartisan outcomes, and policy mentions, and then computing corresponding percentage rates for 2025.

def clean_communications_app(df):
    # Keep only necessary columns
    df = df[['bioguide_id', 'attack_personal', 'outcome_bipartisanship', 'policy', 'first_name', 'last_name']].copy()

    # Count total number of communications per legislator
    df['communication_count'] = df.groupby('bioguide_id')['bioguide_id'].transform('count')

    # Create full_name column from first and last name
    df['full_name'] = df['first_name'].str.strip() + ' ' + df['last_name'].str.strip()

    # Aggregate total flags by legislator
    df = df.groupby(['bioguide_id', 'full_name', 'communication_count'], as_index=False).agg({
        'attack_personal': 'sum',
        'outcome_bipartisanship': 'sum',
        'policy': 'sum'
    })

    # Compute percentage-based indicators
    df['attack_personal_pct'] = df['attack_personal'] / df['communication_count'] * 100
    df['outcome_bipartisanship_pct'] = df['outcome_bipartisanship'] / df['communication_count'] * 100
    df['policy_pct'] = df['policy'] / df['communication_count'] * 100

    return df


# Clean and sort APP communication dataset for year 2025
df_comm_app_2025 = (
    clean_communications_app(df)
    .sort_values(by='bioguide_id', ascending=True)
    .reset_index(drop=True)
)
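One behavior worth checking before relying on the aggregated table: pandas groupby drops rows whose grouping key is missing by default (dropna=True). In the function above, a row with a missing bioguide_id, first_name, or last_name would therefore not contribute to any legislator's totals. The sketch below counts such rows on toy data; the helper name count_rows_with_missing_keys and the toy values are hypothetical.

```python
import pandas as pd


def count_rows_with_missing_keys(df, key_columns=("bioguide_id", "first_name", "last_name")):
    """Hypothetical helper: count rows where any grouping-related column is missing."""
    return int(df[list(key_columns)].isna().any(axis=1).sum())


# Toy data with made-up IDs: one row lacks an ID, one lacks a first name
toy = pd.DataFrame({
    "bioguide_id": ["X000001", None, "Y000002"],
    "first_name": ["Ann", "Bo", None],
    "last_name": ["Lee", "Kim", "Ng"],
    "attack_personal": [0, 1, 0],
})

n_missing = count_rows_with_missing_keys(toy)

# Default groupby silently skips the row whose bioguide_id is missing
grouped = toy.groupby("bioguide_id", as_index=False)["attack_personal"].sum()

print(n_missing, len(grouped))
```

Running the count on the raw communications DataFrame before aggregation would show whether any records are being excluded this way.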

# Preview the first 20 rows
df_comm_app_2025.head(20)

	bioguide_id	full_name	communication_count	attack_personal	outcome_bipartisanship	policy	attack_personal_pct	outcome_bipartisanship_pct	policy_pct
0	A000055	Robert Aderholt	256	3.0	10.0	148.0	1.171875	3.906250	57.812500
1	A000148	Jake Auchincloss	903	76.0	99.0	647.0	8.416390	10.963455	71.650055
2	A000369	Mark Amodei	288	1.0	20.0	200.0	0.347222	6.944444	69.444444
3	A000370	Alma Adams	285	3.0	12.0	171.0	1.052632	4.210526	60.000000
4	A000371	Pete Aguilar	123	4.0	3.0	80.0	3.252033	2.439024	65.040650
5	A000372	Rick Allen	657	1.0	34.0	399.0	0.152207	5.175038	60.730594
6	A000375	Jodey Arrington	963	13.0	40.0	600.0	1.349948	4.153686	62.305296
7	A000379	Mark Alford	1493	12.0	76.0	659.0	0.803751	5.090422	44.139317
8	A000380	Gabe Amo	937	52.0	56.0	642.0	5.549626	5.976521	68.516542
9	A000381	Yassamin Ansari	246	10.0	6.0	133.0	4.065041	2.439024	54.065041
10	A000382	Angela Alsobrooks	356	23.0	23.0	168.0	6.460674	6.460674	47.191011
11	B000490	Sanford Bishop	430	1.0	22.0	142.0	0.232558	5.116279	33.023256
12	B000668	Cliff Bentz	229	2.0	7.0	152.0	0.873362	3.056769	66.375546
13	B000740	Stephanie Bice	602	2.0	41.0	370.0	0.332226	6.810631	61.461794
14	B000825	Lauren Boebert	455	10.0	27.0	278.0	2.197802	5.934066	61.098901
15	B001230	Tammy Baldwin	1258	48.0	100.0	903.0	3.815580	7.949126	71.780604
16	B001236	John Boozman	824	1.0	103.0	566.0	0.121359	12.500000	68.689320
17	B001243	Marsha Blackburn	2331	99.0	76.0	1405.0	4.247104	3.260403	60.274560
18	B001257	Gus Bilirakis	1744	6.0	94.0	862.0	0.344037	5.389908	49.426606
19	B001260	Vern Buchanan	733	8.0	61.0	577.0	1.091405	8.321965	78.717599

# Save to csv
df_comm_app_2025.to_csv('../Data/Source C-D-E-F/Output files/bridge_grades_source_cdef_app_communication.csv', index=False)

# Count the legislators in the output (compare against the expected roster size)
df_comm_app_2025.shape[0]
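The row count alone does not say which legislators, if any, are absent. One way to find them is a left merge of the roster against the aggregated table with indicator=True, as in the sketch below. This is not part of the original notebook: the function name find_missing_legislators is hypothetical and the toy IDs and names are made up. In this notebook, the roster would presumably be df_bioguide and the summary df_comm_app_2025.

```python
import pandas as pd


def find_missing_legislators(roster, summary):
    """Hypothetical helper: roster rows whose bioguide_id is absent from summary."""
    merged = roster.merge(
        summary[["bioguide_id"]].drop_duplicates(),
        on="bioguide_id",
        how="left",
        indicator=True,  # adds a _merge column: "both" or "left_only"
    )
    missing = merged.loc[merged["_merge"] == "left_only", ["bioguide_id", "full_name"]]
    return missing.reset_index(drop=True)


# Toy roster and summary with made-up IDs
roster = pd.DataFrame({
    "bioguide_id": ["X000001", "Y000002", "Z000003"],
    "full_name": ["Ann Lee", "Bo Kim", "Cy Ng"],
})
summary = pd.DataFrame({
    "bioguide_id": ["X000001", "Z000003"],
    "communication_count": [4, 2],
})

missing = find_missing_legislators(roster, summary)
print(missing)
```

An empty result would indicate that every legislator in the roster has at least one recorded communication.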