Source C-D-E-F: App Communications Calculations Data Processing#
Overview
This notebook processes public communication data from legislators in the 119th U.S. Congress to generate four foundational datasets for the Bridge Grades methodology:
Source C: Bipartisan Communication Sum - Total count of bipartisan communications per legislator
Source D: Bipartisan Communication Percentage - Share of total communications that are bipartisan per legislator
Source E: Personal Attack Sum - Total count of personal attacks per legislator
Source F: Personal Attack Percentage - Share of total communications that are personal attacks per legislator
These datasets serve as key inputs for computing individual Bridge Grades by analyzing patterns of bipartisan rhetoric and divisive communication in public statements.
Data Sources#
Input Files#
2025.csv - Raw communication data from Americas Political Pulse
119th_Congress_*.csv - Bioguide ID mapping for 119th Congress legislators
Data Source Details#
Source: Americas Political Pulse
Congress: 119th U.S. Congress
Download Date: August 8, 2025
Coverage: All forms of public communication by legislators including:
Floor speeches
Newsletters
Press releases
Tweets/X posts
Outputs#
Source C: Bipartisan Communication Sum#
Column: outcome_bipartisanship in bridge_grades_source_cdef_app_communication.csv
Description: Total count of communications flagged as bipartisan per legislator.
Interpretation: Higher values indicate legislators who engage in more bipartisan rhetoric across all their public communications.
Source D: Bipartisan Communication Percentage#
Column: outcome_bipartisanship_pct in bridge_grades_source_cdef_app_communication.csv
Description: Percentage of total communications that are flagged as bipartisan per legislator.
Interpretation: Higher percentages indicate legislators whose communication style is more consistently bipartisan relative to their total output.
Source E: Personal Attack Sum#
Column: attack_personal in bridge_grades_source_cdef_app_communication.csv
Description: Total count of communications flagged as personal attacks per legislator.
Interpretation: Higher values indicate legislators who engage in more personal attacks across all their public communications.
Source F: Personal Attack Percentage#
Column: attack_personal_pct in bridge_grades_source_cdef_app_communication.csv
Description: Percentage of total communications that are flagged as personal attacks per legislator.
Interpretation: Higher percentages indicate legislators whose communication style is more consistently divisive relative to their total output.
Complete Output Dataset#
File: bridge_grades_source_cdef_app_communication.csv
Columns:
bioguide_id: Unique legislator identifier
full_name: Legislator’s full name
communication_count: Total number of communications
attack_personal: Sum of personal attacks (Source E)
outcome_bipartisanship: Sum of bipartisan communications (Source C)
policy: Sum of policy-focused communications
attack_personal_pct: Percentage of personal attacks (Source F)
outcome_bipartisanship_pct: Percentage of bipartisan communications (Source D)
policy_pct: Percentage of policy-focused communications
Technical Requirements#
Dependencies#
pandas: Data manipulation and analysis
numpy: Numerical operations
matplotlib.pyplot: Data visualization
re: Regular expressions for text processing
Performance Notes#
Aggregation operations are efficient using pandas groupby functions
Percentage calculations provide normalized metrics for fair comparison
All original communication records are preserved for transparency
Data Quality#
Data Integrity Notes#
Communication data is automatically collected from multiple public sources
Flags are applied consistently across all communication types
Bioguide ID matching ensures reliable legislator identification
Percentage calculations normalize for differences in communication volume
Key Metrics#
Communication Volume: Varies significantly by legislator
Flagging Consistency: Applied uniformly across all communication types
Coverage: Includes all major forms of public communication
Timeliness: Data reflects current 119th Congress activity
Notebook Walkthrough: Preprocessing for Bridge Grades: App Communications Calculations Data#
This notebook prepares the input data used to generate 4 foundational resources in the Bridge Grades methodology:
Source C: Sums the outcome_bipartisanship column for each legislator in the communication file.
Source D: Divides each legislator's outcome_bipartisanship sum by that legislator's total number of communications in the file, expressed as a percentage.
Source E: Sums the attack_personal column for each legislator in the communication file.
Source F: Divides each legislator's attack_personal sum by that legislator's total number of communications in the file, expressed as a percentage.
These resources serve as the basis for computing individual Bridge Grades by analyzing patterns of bipartisan rhetoric and personal attacks in the public communications of members of the U.S. Congress.
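The four calculations described above can be illustrated on a toy table. This is a minimal sketch, not the notebook's own code: the legislator IDs and flag values are invented, the flags are assumed to be 0/1 indicators with one row per communication, and the `source_c` through `source_f` column names are hypothetical labels used only here.

```python
import pandas as pd

# Toy communication records: one row per communication, with 0/1 flags.
# The IDs and values are made up purely for illustration.
toy = pd.DataFrame({
    "bioguide_id": ["X000001", "X000001", "X000001", "X000001",
                    "X000002", "X000002"],
    "outcome_bipartisanship": [1, 0, 1, 0, 0, 0],
    "attack_personal": [0, 0, 0, 1, 1, 1],
})

# Per-legislator totals: row count, bipartisan sum (C), attack sum (E).
summary = toy.groupby("bioguide_id", as_index=False).agg(
    communication_count=("attack_personal", "size"),
    source_c=("outcome_bipartisanship", "sum"),
    source_e=("attack_personal", "sum"),
)

# Percentages (D and F): each sum divided by the legislator's total count.
summary["source_d"] = summary["source_c"] / summary["communication_count"] * 100
summary["source_f"] = summary["source_e"] / summary["communication_count"] * 100

print(summary)
```

In this toy table, the first legislator has 4 communications, 2 of them bipartisan and 1 a personal attack, giving 50% for Source D and 25% for Source F.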
The datasets processed here correspond to the 119th U.S. Congress and were downloaded from the public data portal at Americas Political Pulse.
Date downloaded: August 8, 2025
Data includes: All forms of public communication by legislators in the House and Senate are automatically collected, including:
Floor speeches
Newsletters
Press releases
Tweets/X
Download instructions:
In the left sidebar, select US-OFFICIALS and click the download button, or use the direct download link.
Go to US –> Download 2025 –> 2025-04-24 app comm raw.csv.csv
# Uncomment to run from colab in google drive
# from google.colab import drive
# drive.mount('/content/drive')
Mounted at /content/drive
# Import required libraries
import pandas as pd
import numpy as np
import glob
# Load raw communications data
df = pd.read_csv('../Data/Source C-D-E-F/Input files/2025.csv', low_memory=False)
# Read in the 119th Congress data with bioguide ids
files = sorted(glob.glob("../Data/Source C-D-E-F/Input files/119th_Congress_*.csv"))
latest = files[-1]
df_119 = pd.read_csv(latest)
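The loading cell above takes the last path from a sorted `glob` result. Two things are worth noting: `files[-1]` raises an `IndexError` when nothing matches, and sorting is lexicographic, so it picks the newest file only if the part matched by `*` sorts chronologically (for example an ISO-style date, which is an assumption since the naming scheme is not stated here). A minimal sketch of a guarded version follows; the helper name `latest_matching_file` is hypothetical.

```python
import glob


def latest_matching_file(pattern):
    """Return the lexicographically last path matching `pattern`.

    Raises FileNotFoundError with a readable message instead of the
    IndexError that `files[-1]` would produce on an empty match.
    """
    files = sorted(glob.glob(pattern))
    if not files:
        raise FileNotFoundError(f"No files match pattern: {pattern}")
    return files[-1]
```

In the notebook this could be called as `latest_matching_file("../Data/Source C-D-E-F/Input files/119th_Congress_*.csv")` before passing the result to `pd.read_csv`.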
Bipartisanship processing#
This step turns the raw 119th Congress bioguide file into a clean lookup by combining first_name and last_name into full_name and retaining only bioguide_id and full_name.
df_119.columns
df_bioguide = df_119.copy()
df_bioguide['full_name'] = df_bioguide['first_name'] + ' ' + df_bioguide['last_name']
df_bioguide = df_bioguide[['bioguide_id', 'full_name']]
df_bioguide.head()
 | bioguide_id | full_name |
---|---|---|
0 | M001233 | Mark Messmer |
1 | R000617 | Delia Ramirez |
2 | S001232 | Tim Sheehy |
3 | L000570 | Ben Luján |
4 | H001089 | Josh Hawley |
Processing and Aggregation of APP Communication Data#
This function distills the raw APP communications data into per‐legislator summaries by counting total messages, aggregating personal attacks, bipartisan outcomes, and policy mentions, and then computing corresponding percentage rates for 2025.
def clean_communications_app(df):
    # Keep only necessary columns
    df = df[['bioguide_id', 'attack_personal', 'outcome_bipartisanship', 'policy', 'first_name', 'last_name']].copy()
    # Count total number of communications per legislator
    df['communication_count'] = df.groupby('bioguide_id')['bioguide_id'].transform('count')
    # Create full_name column from first and last name
    df['full_name'] = df['first_name'].str.strip() + ' ' + df['last_name'].str.strip()
    # Aggregate total flags by legislator
    df = df.groupby(['bioguide_id', 'full_name', 'communication_count'], as_index=False).agg({
        'attack_personal': 'sum',
        'outcome_bipartisanship': 'sum',
        'policy': 'sum'
    })
    # Compute percentage-based indicators
    df['attack_personal_pct'] = df['attack_personal'] / df['communication_count'] * 100
    df['outcome_bipartisanship_pct'] = df['outcome_bipartisanship'] / df['communication_count'] * 100
    df['policy_pct'] = df['policy'] / df['communication_count'] * 100
    return df
# Clean and sort APP communication dataset for year 2025
df_comm_app_2025 = (
    clean_communications_app(df)
    .sort_values(by='bioguide_id', ascending=True)
    .reset_index(drop=True)
)
# Preview the first 20 rows
df_comm_app_2025.head(20)
 | bioguide_id | full_name | communication_count | attack_personal | outcome_bipartisanship | policy | attack_personal_pct | outcome_bipartisanship_pct | policy_pct |
---|---|---|---|---|---|---|---|---|---|
0 | A000055 | Robert Aderholt | 256 | 3.0 | 10.0 | 148.0 | 1.171875 | 3.906250 | 57.812500 |
1 | A000148 | Jake Auchincloss | 903 | 76.0 | 99.0 | 647.0 | 8.416390 | 10.963455 | 71.650055 |
2 | A000369 | Mark Amodei | 288 | 1.0 | 20.0 | 200.0 | 0.347222 | 6.944444 | 69.444444 |
3 | A000370 | Alma Adams | 285 | 3.0 | 12.0 | 171.0 | 1.052632 | 4.210526 | 60.000000 |
4 | A000371 | Pete Aguilar | 123 | 4.0 | 3.0 | 80.0 | 3.252033 | 2.439024 | 65.040650 |
5 | A000372 | Rick Allen | 657 | 1.0 | 34.0 | 399.0 | 0.152207 | 5.175038 | 60.730594 |
6 | A000375 | Jodey Arrington | 963 | 13.0 | 40.0 | 600.0 | 1.349948 | 4.153686 | 62.305296 |
7 | A000379 | Mark Alford | 1493 | 12.0 | 76.0 | 659.0 | 0.803751 | 5.090422 | 44.139317 |
8 | A000380 | Gabe Amo | 937 | 52.0 | 56.0 | 642.0 | 5.549626 | 5.976521 | 68.516542 |
9 | A000381 | Yassamin Ansari | 246 | 10.0 | 6.0 | 133.0 | 4.065041 | 2.439024 | 54.065041 |
10 | A000382 | Angela Alsobrooks | 356 | 23.0 | 23.0 | 168.0 | 6.460674 | 6.460674 | 47.191011 |
11 | B000490 | Sanford Bishop | 430 | 1.0 | 22.0 | 142.0 | 0.232558 | 5.116279 | 33.023256 |
12 | B000668 | Cliff Bentz | 229 | 2.0 | 7.0 | 152.0 | 0.873362 | 3.056769 | 66.375546 |
13 | B000740 | Stephanie Bice | 602 | 2.0 | 41.0 | 370.0 | 0.332226 | 6.810631 | 61.461794 |
14 | B000825 | Lauren Boebert | 455 | 10.0 | 27.0 | 278.0 | 2.197802 | 5.934066 | 61.098901 |
15 | B001230 | Tammy Baldwin | 1258 | 48.0 | 100.0 | 903.0 | 3.815580 | 7.949126 | 71.780604 |
16 | B001236 | John Boozman | 824 | 1.0 | 103.0 | 566.0 | 0.121359 | 12.500000 | 68.689320 |
17 | B001243 | Marsha Blackburn | 2331 | 99.0 | 76.0 | 1405.0 | 4.247104 | 3.260403 | 60.274560 |
18 | B001257 | Gus Bilirakis | 1744 | 6.0 | 94.0 | 862.0 | 0.344037 | 5.389908 | 49.426606 |
19 | B001260 | Vern Buchanan | 733 | 8.0 | 61.0 | 577.0 | 1.091405 | 8.321965 | 78.717599 |
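One design point in `clean_communications_app` is that it groups on `bioguide_id`, `full_name`, and `communication_count` together. If the same `bioguide_id` ever appeared with differently spelled names, that grouping would emit one row per spelling, and rows with a missing name would be dropped by pandas' default `groupby` behavior. Whether the raw file contains such cases is not known from this notebook. The sketch below shows a possible alternative that groups on `bioguide_id` alone and keeps the first non-null name; the function name `summarize_by_id` and the toy data are hypothetical.

```python
import pandas as pd


def summarize_by_id(df):
    """Aggregate flags per bioguide_id only (hypothetical alternative)."""
    flags = ["attack_personal", "outcome_bipartisanship", "policy"]
    names = df["first_name"].str.strip() + " " + df["last_name"].str.strip()
    out = (
        df.assign(full_name=names)
        .groupby("bioguide_id", as_index=False)
        .agg(
            full_name=("full_name", "first"),  # first non-null name per ID
            communication_count=("attack_personal", "size"),  # rows per ID
            attack_personal=("attack_personal", "sum"),
            outcome_bipartisanship=("outcome_bipartisanship", "sum"),
            policy=("policy", "sum"),
        )
    )
    for col in flags:
        out[f"{col}_pct"] = out[col] / out["communication_count"] * 100
    return out


# Toy data with an invented ID whose name is spelled two different ways.
toy = pd.DataFrame({
    "bioguide_id": ["X000001", "X000001", "X000001", "X000002"],
    "first_name": ["Ben", "Ben", "Ben ", "Ann"],
    "last_name": ["Luján", "Lujan", "Luján", "Lee"],
    "attack_personal": [1, 0, 0, 0],
    "outcome_bipartisanship": [0, 1, 1, 0],
    "policy": [1, 1, 0, 1],
})

result = summarize_by_id(toy)
print(result)
```

With this grouping the toy legislator `X000001` yields a single row covering all three communications, whereas grouping on the name as well would split it into two rows.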
# Save to csv
df_comm_app_2025.to_csv('../Data/Source C-D-E-F/Output files/bridge_grades_source_cdef_app_communication.csv', index=False)
# Check that all legislators are present
df_comm_app_2025.shape[0]
537
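A row count alone does not show which legislators, if any, are missing. As a hedged sketch, the aggregated table can be compared against the bioguide lookup built earlier by taking set differences of the IDs. The helper name `coverage_report` is hypothetical, and the toy frames below use invented IDs.

```python
import pandas as pd


def coverage_report(summary, lookup, key="bioguide_id"):
    """Compare the IDs in two frames and list those found in only one."""
    summary_ids = set(summary[key].dropna())
    lookup_ids = set(lookup[key].dropna())
    return {
        "missing_from_summary": sorted(lookup_ids - summary_ids),
        "not_in_lookup": sorted(summary_ids - lookup_ids),
    }


# Toy frames standing in for the aggregated output and the lookup.
toy_summary = pd.DataFrame({"bioguide_id": ["X000001", "X000002"]})
toy_lookup = pd.DataFrame({"bioguide_id": ["X000001", "X000002", "X000003"]})

report = coverage_report(toy_summary, toy_lookup)
print(report)
```

In the notebook, the equivalent call would be `coverage_report(df_comm_app_2025, df_bioguide)`; two empty lists would indicate that both tables cover the same set of legislators.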