# **Current Legislators Data Collection: Congress.gov API**

```{admonition} Overview
:class: tip

This notebook is the **foundational data collection step** for the Bridge Grades methodology: it extracts current member information from the official Congress.gov API. The resulting legislator roster is the backbone for all subsequent data processing and analysis.

The notebook generates the master legislator dataset (`119th_Congress_20250809.csv`) that contains biographical information, party affiliations, and unique identifiers (bioguide_id) for all current members of the 119th U.S. Congress.
```

## **Data Sources**

### **Input Files**
- **Congress.gov API** - Official congressional member data via REST API
- **API Documentation:** https://api.congress.gov/

### **Data Source Details**
- **Source:** [Congress.gov API](https://api.congress.gov/)
- **Congress:** 119th U.S. Congress
- **Collection Date:** August 9, 2025
- **Coverage:** All current House Representatives and Senators with voting rights

---

## **Outputs**

### **Master Legislator Dataset**
**File:** `119th_Congress_*.csv`

**Columns:**
- `bioguide_id`: Unique legislator identifier (primary key for all Bridge Grades processing)
- `Name`: Full name of the legislator
- `first_name`: Legislator's first name
- `middle_name`: Legislator's middle name (if available)
- `last_name`: Legislator's last name
- `nickname`: Common nickname (if available)
- `Chamber`: House or Senate
- `State`: State or territory represented
- `District`: Congressional district (House only)
- `Party`: Political party affiliation
- `start_year`: Year term began
- `end_year`: Year term ends (null for current members)
- `update_date`: Last update timestamp from API
- `image_url`: Official congressional photo URL

**Data Quality Notes:**
- **Voting Members Only:** Excludes non-voting territorial delegates
- **Current Members:** Filters to active legislators only
- **Complete Coverage:** Covers every filled seat among the 435 House districts and 100 Senate seats; vacant seats have no row, so a snapshot can contain fewer than 535 records
- **Standardized Format:** Consistent naming and formatting across all records

---

## **Technical Requirements**

### **Dependencies**
- `requests`: API communication
- `pandas`: Data manipulation and analysis
- `time`: Rate limiting for API calls

### **API Configuration**
- **Rate Limiting:** 0.2 second delay between requests to respect API limits
- **Pagination:** Handles large datasets with offset-based pagination
- **Error Handling:** Stops paginating when a response has no `members` key; members without a photo get the placeholder `No Image Available`

---

## **Data Processing Pipeline**

### **Step 1: API Data Collection**
- Fetches all current members from Congress.gov API
- Handles pagination to collect complete dataset
- Implements rate limiting to respect API constraints

### **Step 2: Data Normalization**
- Flattens nested JSON structure into tabular format
- Extracts term information and biographical data
- Standardizes image URL extraction

### **Step 3: Data Filtering and Cleaning**
- Removes non-voting territorial delegates
- Filters to current members only
- Parses name components for consistent formatting

### **Step 4: Output Generation**
- Exports clean dataset to CSV format
- Validates data completeness and quality
- Provides summary statistics

---

## **Usage in Bridge Grades Pipeline**

This dataset serves as the **master roster** for all subsequent Bridge Grades processing:

1. **Source A-B Processing:** Provides bioguide_id matching for bill sponsorship data
2. **Source C-D-E-F Processing:** Enables legislator identification in communication data
3. **Source M-N Processing:** Links PVI and ideology data to specific legislators
4. **Final Scoring:** Serves as the foundation for all Bridge Grade calculations

**Critical Role:** Without this dataset, no other Bridge Grades processing can occur, as it provides the essential legislator identification and biographical context required for all analysis.

## **Notebook Walkthrough: Current Legislators Data Collection**

This notebook demonstrates the complete process of collecting current congressional member data from the official Congress.gov API. The resulting dataset serves as the foundation for all Bridge Grades analysis by providing essential legislator identification and biographical information.

**Key Steps:**
1. **API Configuration:** Set up authentication and endpoint parameters
2. **Data Collection:** Fetch all current members with pagination handling
3. **Data Processing:** Normalize JSON structure and extract key fields
4. **Data Cleaning:** Filter to voting members and standardize formatting
5. **Output Generation:** Export clean dataset for use in other notebooks

**Expected Runtime:** 2-3 minutes (due to API rate limiting)


In [2]:
# Import libraries
import requests
import pandas as pd
import time

## **API Configuration and Authentication**

Before making API calls, we need to configure the authentication and endpoint parameters. The Congress.gov API requires an API key for access.

```{warning}
**API Key Security**
Replace the placeholder API key with your own key from https://api.congress.gov/sign-up/. Never commit API keys to version control.
```
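
One way to follow this advice is to keep the key out of the notebook entirely and read it from an environment variable. The sketch below is an illustration, not part of the original notebook: the variable name `CONGRESS_API_KEY` and the helper `load_api_key` are assumptions chosen for the example.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ,
                 var_name: str = "CONGRESS_API_KEY") -> str:
    """Return the API key stored under `var_name`, failing loudly if it is missing.

    Both the helper and the variable name are hypothetical, for illustration only.
    """
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable before running.")
    return key


# Demonstration with an in-memory mapping instead of the real environment
demo_env = {"CONGRESS_API_KEY": "demo-key-not-real"}
headers = {"X-API-Key": load_api_key(demo_env)}
print(headers)
```

In a real run, calling `load_api_key()` with no arguments would read from the process environment, so the key never appears in the saved notebook.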

### **Configuration Parameters**
- **API Endpoint:** 119th Congress member endpoint
- **Rate Limiting:** 0.2 second delay between requests
- **Pagination:** 250 records per page (API maximum)


## **Data Collection Process**

This section implements the API call loop to collect all current members of Congress. The process handles pagination automatically and includes rate limiting to respect API constraints.

### **Collection Strategy**
- **Pagination Handling:** Uses offset-based pagination to collect all records
- **Rate Limiting:** 0.2 second delay between requests to avoid hitting API limits
- **Error Handling:** Stops collecting as soon as a response has no `members` key (end of data or an error response)
- **Progress Tracking:** Prints progress updates for each batch of records


In [None]:
# Set up the API call
API_KEY = "YOUR_API_KEY"  # Placeholder: replace with your own key from https://api.congress.gov/sign-up/
BASE_URL = "https://api.congress.gov/v3/member/congress/119/" # Change Congress number to get different Congresses
LIMIT = 250  # max per page
offset = 0
all_members = []
headers = {
    "X-API-Key": API_KEY
}

In [4]:
# Run the API call to get the current members
while True:
    params = {
        "currentMember": "true",
        "limit": LIMIT,
        "offset": offset
    }

    response = requests.get(BASE_URL, headers=headers, params=params)
    data = response.json()

    if "members" not in data:
        print("No more data or error in response.")
        break

    members = data["members"]
    if not members:
        break

    all_members.extend(members)
    print(f"Fetched {len(members)} members (offset {offset})")

    offset += LIMIT
    time.sleep(0.2)  # avoid rate-limiting


Fetched 250 members (offset 0)
Fetched 250 members (offset 250)
Fetched 37 members (offset 500)
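
The offset-based loop above can also be written as a small generator, which makes the stopping rule easy to test without touching the network. This is a minimal sketch under stated assumptions: `paginate` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the real `requests.get(...).json()` call by slicing a made-up roster of 537 records (the count printed above).

```python
from typing import Callable, Iterator


def paginate(fetch_page: Callable[[int, int], dict], limit: int = 250) -> Iterator[dict]:
    """Yield records page by page until a page has no 'members' (missing or empty)."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        members = page.get("members") or []
        if not members:
            return
        yield from members
        offset += limit


# Hypothetical stand-in for the HTTP request: 537 made-up records
FAKE_ROSTER = [{"bioguideId": f"X{i:06d}"} for i in range(537)]


def fake_fetch(offset: int, limit: int) -> dict:
    return {"members": FAKE_ROSTER[offset:offset + limit]}


members = list(paginate(fake_fetch))
print(len(members))
```

With a real fetch function, the `time.sleep(0.2)` delay would belong inside that function so that the generator itself stays free of side effects.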


## **Data Normalization and Processing**

The raw API response contains nested JSON structures that need to be flattened into a tabular format suitable for analysis. This section handles the data transformation and cleaning process.

### **Normalization Steps**
1. **JSON Flattening:** Convert nested JSON to flat DataFrame structure
2. **Image URL Extraction:** Extract image URLs from nested depiction objects
3. **Column Standardization:** Rename columns to match expected format
4. **Data Type Conversion:** Ensure proper data types for analysis


In [5]:
# Result of the API call
all_members

[{'bioguideId': 'M001233',
  'depiction': {'attribution': 'Image courtesy of the Member',
   'imageUrl': 'https://www.congress.gov/img/member/677448630b34857ecc909125_200.jpg'},
  'district': 8,
  'name': 'Messmer, Mark B.',
  'partyName': 'Republican',
  'state': 'Indiana',
  'terms': {'item': [{'chamber': 'House of Representatives',
     'startYear': 2025}]},
  'updateDate': '2025-07-14T14:25:50Z',
  'url': 'https://api.congress.gov/v3/member/M001233?format=json'},
 {'bioguideId': 'R000617',
  'depiction': {'attribution': 'Image courtesy of the Member',
   'imageUrl': 'https://www.congress.gov/img/member/684c2356333714e4aee2e1fd_200.jpg'},
  'district': 3,
  'name': 'Ramirez, Delia C.',
  'partyName': 'Democratic',
  'state': 'Illinois',
  'terms': {'item': [{'chamber': 'House of Representatives',
     'startYear': 2023}]},
  'updateDate': '2025-06-13T13:48:04Z',
  'url': 'https://api.congress.gov/v3/member/R000617?format=json'},
 {'bioguideId': 'S001232',
  'depiction': {'attributio

In [6]:
# Convert Json to DataFrame
df = pd.json_normalize(
    all_members,
    record_path=["terms", "item"],
    meta=[
        "bioguideId",
        "name",
        "state",
        "district",
        "partyName",
        "updateDate",
        #"url",
        "depiction" # Path to the nested imageUrl
    ],
    errors='ignore'
)

In [7]:
# Pull the image URL out of each nested 'depiction' dict; members without one get a placeholder
df['imageUrl'] = df['depiction'].str.get('imageUrl').fillna('No Image Available')

# Drop the now-redundant 'depiction' column
df = df.drop(columns=['depiction'])
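
To make explicit what the `pd.json_normalize` call and the image-URL step do together, here is a plain-Python sketch of the same flattening: one output row per term, with member-level fields repeated on each row. It is an illustration rather than the notebook's implementation. The function name `flatten_terms` is hypothetical, the first sample record is the Messmer record shown above, and the second record (`X000001`) is made up to show a member with two terms and no `depiction`.

```python
def flatten_terms(members):
    """Return one flat dict per (member, term) pair."""
    rows = []
    for member in members:
        # Missing 'depiction' falls back to the same placeholder the notebook uses
        image = (member.get("depiction") or {}).get("imageUrl", "No Image Available")
        for term in member.get("terms", {}).get("item", []):
            rows.append({
                "chamber": term.get("chamber"),
                "startYear": term.get("startYear"),
                "endYear": term.get("endYear"),  # absent for an ongoing term
                "bioguideId": member.get("bioguideId"),
                "name": member.get("name"),
                "state": member.get("state"),
                "district": member.get("district"),
                "partyName": member.get("partyName"),
                "imageUrl": image,
            })
    return rows


sample = [
    {"bioguideId": "M001233", "name": "Messmer, Mark B.", "state": "Indiana",
     "district": 8, "partyName": "Republican",
     "depiction": {"imageUrl": "https://www.congress.gov/img/member/677448630b34857ecc909125_200.jpg"},
     "terms": {"item": [{"chamber": "House of Representatives", "startYear": 2025}]}},
    # Hypothetical record: two terms, no 'depiction'
    {"bioguideId": "X000001", "name": "Doe, Jane", "state": "Ohio",
     "partyName": "Independent",
     "terms": {"item": [
         {"chamber": "House of Representatives", "startYear": 2009, "endYear": 2021},
         {"chamber": "Senate", "startYear": 2021},
     ]}},
]

rows = flatten_terms(sample)
print(len(rows))
```

Because a member with several terms produces several rows, the later `endYear` filter is what reduces the table back to one row per current member.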

## **Data Filtering and Cleaning**

This section applies critical filters to ensure we only include voting members of Congress in our final dataset. We exclude non-voting territorial delegates and focus on current members only.

### **Filtering Criteria**
- **Current Members Only:** Filter to members with null `endYear` (active legislators)
- **Voting Members:** Exclude non-voting territorial delegates
- **Data Validation:** Verify expected counts and data quality

```{note}
**Territorial Delegates**
The following territories have non-voting delegates: American Samoa, Guam, Northern Mariana Islands, Puerto Rico, Virgin Islands, and District of Columbia. These are excluded from Bridge Grades analysis as they do not have full voting rights in Congress.
```


In [8]:
df

|  | chamber | startYear | endYear | bioguideId | name | state | district | partyName | updateDate | imageUrl |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | House of Representatives | 2025 |  | M001233 | Messmer, Mark B. | Indiana | 8 | Republican | 2025-07-14T14:25:50Z | https://www.congress.gov/img/member/677448630b... |
| 1 | House of Representatives | 2023 |  | R000617 | Ramirez, Delia C. | Illinois | 3 | Democratic | 2025-06-13T13:48:04Z | https://www.congress.gov/img/member/684c235633... |
| 2 | Senate | 2025 |  | S001232 | Sheehy, Tim | Montana |  | Republican | 2025-06-07T10:30:29Z | https://www.congress.gov/img/member/677d8231fd... |
| 3 | House of Representatives | 2009 | 2021.0 | L000570 | Luján, Ben Ray | New Mexico |  | Democratic | 2025-06-03T13:18:42Z | https://www.congress.gov/img/member/l000570_20... |
| 4 | Senate | 2021 |  | L000570 | Luján, Ben Ray | New Mexico |  | Democratic | 2025-06-03T13:18:42Z | https://www.congress.gov/img/member/l000570_20... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 592 | Senate | 2009 |  | B001267 | Bennet, Michael F. | Colorado |  | Democratic | 2025-03-09T12:42:12Z | https://www.congress.gov/img/member/b001267_20... |
| 593 | House of Representatives | 1999 | 2013.0 | B001230 | Baldwin, Tammy | Wisconsin |  | Democratic | 2025-03-09T12:42:12Z | https://www.congress.gov/img/member/b001230_20... |
| 594 | Senate | 2013 |  | B001230 | Baldwin, Tammy | Wisconsin |  | Democratic | 2025-03-09T12:42:12Z | https://www.congress.gov/img/member/b001230_20... |
| 595 | House of Representatives | 2003 | 2019.0 | B001243 | Blackburn, Marsha | Tennessee |  | Republican | 2025-03-09T12:42:12Z | https://www.congress.gov/img/member/b001243_20... |


### **Process Data**

In [9]:
# if endYear is null, then the member is current
df_current = df[df["endYear"].isnull()].reset_index(drop=True)

In [10]:
# total legislators
df_current.shape[0]

537
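
The `endYear` filter relies on each current member having exactly one open-ended term. The sketch below shows one way to check that assumption on flattened rows. It is plain Python, so an open term is represented by `None` rather than pandas `NaN`; the function name `unexpected_open_terms` and the IDs `Z999999` and `Y888888` are hypothetical, while the `M001233` and `L000570` rows mirror the table above.

```python
from collections import Counter


def unexpected_open_terms(rows):
    """Map bioguideId -> number of open terms, for members whose count is not exactly 1."""
    open_counts = Counter(r["bioguideId"] for r in rows if r["endYear"] is None)
    all_ids = {r["bioguideId"] for r in rows}
    return {bid: open_counts[bid] for bid in sorted(all_ids) if open_counts[bid] != 1}


rows = [
    {"bioguideId": "M001233", "endYear": None},
    {"bioguideId": "L000570", "endYear": 2021},  # earlier House term
    {"bioguideId": "L000570", "endYear": None},  # current Senate term
    {"bioguideId": "Z999999", "endYear": None},  # hypothetical: two open terms
    {"bioguideId": "Z999999", "endYear": None},
    {"bioguideId": "Y888888", "endYear": 2019},  # hypothetical: no open term
]

problems = unexpected_open_terms(rows)
print(problems)
```

In the run shown here the API returned 537 members and the filter kept 537 rows, which is consistent with one open term per member.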

## **Name Processing and Standardization**

The Congress.gov API returns names in "Last, First Middle" format, but we need to standardize this for consistent use across the Bridge Grades pipeline. This section parses and reformats name components.

### **Name Processing Steps**
1. **Parse Name Components:** Extract first, middle, and last names
2. **Extract Nicknames:** Identify nicknames enclosed in quotes
3. **Create Standard Format:** Convert to "First Middle Last" format
4. **Clean Whitespace:** Remove extra spaces and formatting issues

```{warning}
**Name Format Assumptions**
The parsing logic assumes names follow the "Last, First Middle" format. Names with unusual formatting may require manual review.
```


In [11]:
# Check for legislators that are non-voting members in territories that have no voting rights (should be 6 members)
list_of_territories = ["American Samoa", "Guam", "Northern Mariana Islands", "Puerto Rico", "Virgin Islands", "District of Columbia"]
non_voting_members = df_current[df_current["state"].isin(list_of_territories)]
non_voting_members

|  | chamber | startYear | endYear | bioguideId | name | state | district | partyName | updateDate | imageUrl |
|---|---|---|---|---|---|---|---|---|---|---|
| 34 | House of Representatives | 2015 |  | P000610 | Plaskett, Stacey E. | Virgin Islands | 0 | Democratic | 2025-04-28T13:04:28Z | https://www.congress.gov/img/member/116_dg_vi_... |
| 106 | House of Representatives | 2025 |  | H001103 | Hernández, Pablo Jose | Puerto Rico | 0 | Democratic | 2025-04-28T13:04:25Z | https://www.congress.gov/img/member/67742d980b... |
| 213 | House of Representatives | 2025 |  | K000404 | King-Hinds, Kimberlyn | Northern Mariana Islands | 0 | Republican | 2025-04-28T13:04:21Z | https://www.congress.gov/img/member/67742f0a0b... |
| 300 | House of Representatives | 2023 |  | M001219 | Moylan, James C. | Guam | 0 | Republican | 2025-04-28T13:04:18Z | https://www.congress.gov/img/member/m001219_20... |
| 380 | House of Representatives | 1991 |  | N000147 | Norton, Eleanor Holmes | District of Columbia | 0 | Democratic | 2025-04-28T13:04:16Z | https://www.congress.gov/img/member/116_dg_dc_... |
| 423 | House of Representatives | 2015 |  | R000600 | Radewagen, Aumua Amata Coleman | American Samoa | 0 | Republican | 2025-04-28T13:04:14Z | https://www.congress.gov/img/member/r000600_20... |


In [12]:
# Remove non-voting members from territories that have no voting rights in congress
df_current = df_current[~df_current['state'].isin(list_of_territories)]

# total legislators without territories
df_current.shape[0]

531
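
The counts printed so far can be reconciled with a short piece of arithmetic. The sketch below uses only numbers that appear in this notebook (537 current members, 6 non-voting delegates) plus the 435 House and 100 Senate voting seats. Reading the remaining gap as vacant seats is an assumption, since the notebook does not check vacancies directly.

```python
HOUSE_VOTING_SEATS = 435
SENATE_SEATS = 100

current_members_returned = 537  # row count after the endYear filter
non_voting_delegates = 6        # rows matched by list_of_territories

voting_members = current_members_returned - non_voting_delegates
unfilled_seats = HOUSE_VOTING_SEATS + SENATE_SEATS - voting_members

print(voting_members)  # 531
print(unfilled_seats)  # 4 (assumed to be vacancies at collection time)
```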

In [13]:
# Split the name column (e.g. "Messmer, Mark B.") into first_name, middle_name, last_name
df_current = df_current.copy()
df_current["first_name"] = df_current["name"].str.split(",").str[1].str.split(" ").str[1]
df_current["middle_name"] = df_current["name"].str.split(",").str[1].str.split(" ").str[2]
df_current["last_name"] = df_current["name"].str.split(",").str[0]

# get nickname column, nicknames are in the name column surrounded by "
df_current["nickname"] = df_current["name"].str.extract(r'"(.*?)"')

# Take original name column and reorder so that Messmer, Mark B. becomes Mark B. Messmer
df_current["Name"] = df_current["name"].str.split(",").str[1] + " " + df_current["name"].str.split(",").str[0]

# Strip whitespaces from the Name column
df_current["Name"] = df_current["Name"].str.strip()

# uncomment and run the following line if you want to check that the name columns are correct
#df_current[["name", "first_name", "middle_name", "last_name", "nickname", "Name"]].to_csv("first_name.csv", index=False)
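
The positional splits above work when a name has the plain "Last, First Middle" shape. As a hedged alternative, the sketch below parses a single name with a regular expression for the nickname and `str.partition` for the comma. It is not the notebook's method and behaves differently in two ways: the quoted nickname is removed before splitting (so it cannot land in `middle_name` or in `Name`), and every given-name token after the first is kept as the middle name. The function `parse_name` and the example `Doe, Jane "JD"` are hypothetical.

```python
import re

NICKNAME = re.compile(r'"(.*?)"')


def parse_name(raw):
    """Parse 'Last, First Middle "Nick"' into its components (hypothetical helper)."""
    match = NICKNAME.search(raw)
    nickname = match.group(1) if match else None
    cleaned = NICKNAME.sub("", raw)            # drop the quoted nickname
    last, _, rest = cleaned.partition(",")     # text before / after the first comma
    given = rest.split(",")[0].split()         # ignore anything after a second comma
    first = given[0] if given else None
    middle = " ".join(given[1:]) or None
    return {
        "first_name": first,
        "middle_name": middle,
        "last_name": last.strip(),
        "nickname": nickname,
        "Name": " ".join(given + [last.strip()]),
    }


print(parse_name("Messmer, Mark B."))
print(parse_name('Doe, Jane "JD"'))  # made-up name to exercise the nickname path
```

Applied to a DataFrame, a helper like this could be used with `df_current["name"].apply(parse_name)`, at the cost of being slower than the vectorised string methods.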

## **Final Dataset Preparation and Export**

This section prepares the final dataset for export by standardizing column names and selecting the relevant fields for the Bridge Grades pipeline.

### **Final Processing Steps**
1. **Column Standardization:** Rename columns to match expected format
2. **Field Selection:** Choose only relevant columns for Bridge Grades
3. **Data Validation:** Verify final dataset completeness
4. **Export Preparation:** Format data for CSV export

### **Expected Output**
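- **Total Records:** 531 legislators (537 current members returned by the API minus 6 non-voting delegates; this is 4 fewer than the 435 House + 100 Senate voting seats, presumably because of vacancies at collection time)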
- **Key Fields:** bioguide_id, Name, Chamber, State, District, Party
- **Data Quality:** All records should have complete bioguide_id values


In [14]:
# Shorten "House of Representatives" to "House"; "Senate" is left unchanged
df_current["chamber"] = df_current["chamber"].replace({"House of Representatives": "House"})

In [15]:
# Select columns of interest
df_current_selected = df_current[["bioguideId", "Name", "first_name", "middle_name", "last_name", "nickname", 
                                  "chamber", "state", "district", "partyName", "startYear", "endYear", 
                                  "updateDate", "imageUrl"]]

In [16]:
# Rename columns so that they match previous "119th Congress" files
# (columns not listed here already have their final names)
df_current_selected = df_current_selected.rename(columns={"bioguideId": "bioguide_id",
                                    "chamber": "Chamber",
                                    "state": "State",
                                    "district": "District",
                                    "partyName": "Party",
                                    "startYear": "start_year",
                                    "endYear": "end_year",
                                    "updateDate": "update_date",
                                    "imageUrl": "image_url"})

In [17]:
df_current_selected

|  | bioguide_id | Name | first_name | middle_name | last_name | nickname | Chamber | State | District | Party | start_year | end_year | update_date | image_url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M001233 | Mark B. Messmer | Mark | B. | Messmer |  | House | Indiana | 8 | Republican | 2025 |  | 2025-07-14T14:25:50Z | https://www.congress.gov/img/member/677448630b... |
| 1 | R000617 | Delia C. Ramirez | Delia | C. | Ramirez |  | House | Illinois | 3 | Democratic | 2023 |  | 2025-06-13T13:48:04Z | https://www.congress.gov/img/member/684c235633... |
| 2 | S001232 | Tim Sheehy | Tim |  | Sheehy |  | Senate | Montana |  | Republican | 2025 |  | 2025-06-07T10:30:29Z | https://www.congress.gov/img/member/677d8231fd... |
| 3 | L000570 | Ben Ray Luján | Ben | Ray | Luján |  | Senate | New Mexico |  | Democratic | 2021 |  | 2025-06-03T13:18:42Z | https://www.congress.gov/img/member/l000570_20... |
| 4 | H001089 | Josh Hawley | Josh |  | Hawley |  | Senate | Missouri |  | Republican | 2019 |  | 2025-05-28T10:30:24Z | https://www.congress.gov/img/member/h001089_20... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 532 | B001236 | John Boozman | John |  | Boozman |  | Senate | Arkansas |  | Republican | 2011 |  | 2025-03-09T12:42:13Z | https://www.congress.gov/img/member/b001236_20... |
| 533 | B001261 | John Barrasso | John |  | Barrasso |  | Senate | Wyoming |  | Republican | 2007 |  | 2025-03-09T12:42:12Z | https://www.congress.gov/img/member/b001261_20... |
| 534 | B001267 | Michael F. Bennet | Michael | F. | Bennet |  | Senate | Colorado |  | Democratic | 2009 |  | 2025-03-09T12:42:12Z | https://www.congress.gov/img/member/b001267_20... |
| 535 | B001230 | Tammy Baldwin | Tammy |  | Baldwin |  | Senate | Wisconsin |  | Democratic | 2013 |  | 2025-03-09T12:42:12Z | https://www.congress.gov/img/member/b001230_20... |


In [18]:
# Save the CSV with the collection date in the filename
import datetime

# Current date as YYYYMMDD
current_date = datetime.datetime.now().strftime("%Y%m%d")

congress_number = 119 # change if different Congress

df_current_selected.to_csv(f"{congress_number}th_Congress_{current_date}.csv", index=False)

In [22]:
# For automation: downstream scripts can pick up the most recent roster file like this.
# Keeping dated files gives a simple version history of the current-members list.
import glob

files = sorted(glob.glob("119th_Congress_*.csv"))
latest = files[-1]


In [23]:
files

['119th_Congress_20250717.csv', '119th_Congress_20250809.csv']

In [27]:
files[-2]

'119th_Congress_20250717.csv'

In [24]:
latest

'119th_Congress_20250809.csv'
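
Sorting the filenames works because the date suffix is written as `YYYYMMDD`, so alphabetical order matches chronological order. The sketch below makes that dependency explicit by parsing the date out of each filename and returning the two most recent snapshots. It is an illustration under assumptions: `snapshot_date` and `two_most_recent` are hypothetical helpers, and the pattern only accepts names shaped like the two files listed above.

```python
import re
from datetime import datetime

# Matches names such as "119th_Congress_20250809.csv"
PATTERN = re.compile(r"^\d+th_Congress_(\d{8})\.csv$")


def snapshot_date(filename):
    """Extract the collection date encoded in a snapshot filename."""
    match = PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unexpected snapshot filename: {filename}")
    return datetime.strptime(match.group(1), "%Y%m%d").date()


def two_most_recent(filenames):
    """Return (previous, latest) snapshot filenames, ordered by encoded date."""
    ordered = sorted(filenames, key=snapshot_date)
    if len(ordered) < 2:
        raise ValueError("Need at least two snapshots to compare.")
    return ordered[-2], ordered[-1]


previous, newest = two_most_recent(
    ["119th_Congress_20250809.csv", "119th_Congress_20250717.csv"]
)
print(previous, newest)
```

Parsing the date also rejects stray files that match the glob but do not carry a valid date, which a plain `sorted()` would silently include.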

```python
# Compare the latest roster snapshot (df_new) with the previous one (df_old)
# to see which members joined and which ones left.
# Requires pd and non_voting_members from the cells above.
import glob

# Snapshot filenames end in YYYYMMDD, so lexicographic order is chronological
files = sorted(glob.glob("119th_Congress_*.csv"))
latest = files[-1]

# Previous snapshot (hard-coded here; files[-2] would select it automatically)
df_old = pd.read_csv("119th_Congress_20250717.csv")

# Latest snapshot
df_new = pd.read_csv(latest)

# Remove any non-voting members from df_old using their bioguideId
non_voting_members_bioguide = non_voting_members['bioguideId'].tolist()
df_old = df_old[~df_old['bioguide_id'].isin(non_voting_members_bioguide)]

# New members: in df_new but not in df_old
new_members = df_new[~df_new['bioguide_id'].isin(df_old['bioguide_id'])]

# Departed members: in df_old but not in df_new
departed_members = df_old[~df_old['bioguide_id'].isin(df_new['bioguide_id'])]

# Print summary
print(f"🆕 New members: {len(new_members)}")
print(new_members['Name'].tolist())
print(f"❌ Departed members: {len(departed_members)}")
print(departed_members['Name'].tolist())
```
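
The comparison above is an anti-join on `bioguide_id` in both directions. The same logic can be expressed with plain sets, which is convenient for a quick check or a unit test. This is a minimal sketch: `compare_rosters` is a hypothetical helper and the IDs are made up.

```python
def compare_rosters(old_ids, new_ids):
    """Return the IDs that joined and the IDs that departed between two snapshots."""
    old_set, new_set = set(old_ids), set(new_ids)
    return {
        "joined": sorted(new_set - old_set),    # in the new snapshot only
        "departed": sorted(old_set - new_set),  # in the old snapshot only
    }


# Made-up IDs for illustration
old_snapshot = ["A000001", "B000002", "C000003"]
new_snapshot = ["B000002", "C000003", "D000004"]

changes = compare_rosters(old_snapshot, new_snapshot)
print(changes)
```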