Recap of Congressional Data Task Force Meeting on December 12, 2024

The Congressional Data Task Force met in its typical forum setting for the first time since June, because the sixth Congressional Hackathon served as its fall quarterly meeting in September. The video of the event, presenter slides, and agenda are available at the Legislative Branch Innovation Hub. It’s worth noting that the Congressional Hackathon is now an annual event.

The congressional staff directory recommended by the House Select Committee on the Modernization of Congress received its first batch of users last month, representatives from the office of the Chief Administrative Officer shared. About 100 staff are participating in a private beta test of what’s called LegiDex, including its mass email function, which makes it possible to send targeted emails to specific staffers organized by issue area, party, state, title, and so on. Directory data comes from multiple sources, including daily payroll data, and includes about 30 different role titles the team developed. Staff can edit their offices’ information manually, including the issue areas covered by legislative staff. Integration of committee assignments is planned.
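To make the targeted email idea concrete, here is a minimal sketch of filtering directory records by the attributes mentioned above. It is not LegiDex's actual data model or API: the `StaffRecord` fields, the `select_recipients` function, and the sample entries are all hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StaffRecord:
    """Hypothetical directory entry; field names are illustrative only."""
    name: str
    email: str
    party: str
    state: str
    title: str
    issue_areas: frozenset = field(default_factory=frozenset)


def select_recipients(directory, *, issue_area=None, party=None, state=None, title=None):
    """Return emails of records matching every filter that was supplied."""
    emails = []
    for rec in directory:
        if issue_area is not None and issue_area not in rec.issue_areas:
            continue
        if party is not None and rec.party != party:
            continue
        if state is not None and rec.state != state:
            continue
        if title is not None and rec.title != title:
            continue
        emails.append(rec.email)
    return emails


# Invented placeholder records, not real staff.
SAMPLE_DIRECTORY = [
    StaffRecord("Staffer A", "a@example.gov", "D", "CA", "Legislative Assistant",
                frozenset({"health", "technology"})),
    StaffRecord("Staffer B", "b@example.gov", "R", "TX", "Legislative Director",
                frozenset({"energy"})),
    StaffRecord("Staffer C", "c@example.gov", "R", "CA", "Legislative Assistant",
                frozenset({"technology"})),
]

tech_staff_in_ca = select_recipients(SAMPLE_DIRECTORY, issue_area="technology", state="CA")
```

Filters combine with AND, so each added attribute narrows the list. A real system would also need permission checks and rate limits, which this sketch leaves out.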

At the moment, LegiDex is available only to House users, as the team needs personnel data from the Senate and some legislative branch agencies for full functionality across the legislative branch. Staff in the Senate and the Congressional Budget Office can see a limited demonstration of the platform, however, and will gain full use of it once they share their data. The intention is to make it available to everyone across the legislative branch.

CAO also is migrating HouseNet to the AWS cloud and will turn off the old site in a few weeks. The move is intended to improve access from mobile devices and integrate better with other systems, including LegiDex. 

CAO also demonstrated Persona, a tool the House Digital Service developed internally to help it better understand its users’ needs, pain points, and context within the congressional system. CAO staff interviewed members and staff to develop sample profiles of the type of work done daily across member offices, leadership offices, and committees. The tool also displays organizational charts for personas within those offices, all to help legislative branch staff better understand the complexity and relationship networks within the House.

The Secretary of the Senate’s office is nearing completion of a report from a working group studying access to and preservation of congressional video. The group has settled on a cloud infrastructure platform and has developed frameworks for addressing long-term preservation of what will be a considerable amount of data. One of its intentions is to create an archive of past Senate floor proceedings and make it available to integrate into other information sources. There are no plans, however, for a repository for older videos. The working group has not decided whether to share the report publicly.

The Secretary’s office also completed converting the old Capitol Bells app, which alerts users to updates in the House and Senate legislative call systems run by the Architect of the Capitol, into an API. Users on the Senate intranet can see a description of what the bells mean and receive alerts on things like adjournment. It’s only available to the Senate at the moment, but the office intends to make it available to other congressional data partners and the public in time.

Display of roll call votes on Congress.gov is receiving a speed upgrade as the Clerk of the House has authorized the site to consume chamber vote data. This authorization also means that the same data will be in the Congress.gov API. The Clerk’s API updates every 15 minutes during votes, so the updates won’t come in real time, but within 30 minutes at most, the Clerk’s Office explained. 
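The "within 30 minutes at most" figure fits a two-stage refresh, where the delay is the wait for the upstream update plus the wait for the next downstream poll. The sketch below illustrates that arithmetic. The 15-minute upstream interval comes from the meeting; the downstream polling interval and offset are assumptions, since Congress.gov's schedule was not described, and the function names are hypothetical.

```python
import math


def next_tick(t, interval, offset=0.0):
    """Earliest scheduled time at or after t on the grid offset, offset + interval, ..."""
    k = math.ceil((t - offset) / interval)
    return offset + k * interval


def visible_delay(vote_time, upstream_interval=15.0, poll_interval=15.0, poll_offset=0.0):
    """Minutes between a vote and its appearance downstream.

    upstream_interval: how often the source API refreshes (15 minutes per the meeting).
    poll_interval, poll_offset: ASSUMED downstream polling schedule, for illustration.
    """
    published = next_tick(vote_time, upstream_interval)        # source API refreshes
    seen = next_tick(published, poll_interval, poll_offset)    # downstream site polls
    return seen - vote_time


# Near-worst case: the vote lands just after an upstream refresh, and the
# downstream poll fires just before the next refresh is published.
near_worst = visible_delay(0.1, poll_offset=14.9)
```

Under these assumptions each stage adds less than one interval of waiting, so the total stays under 30 minutes. A shorter downstream polling interval would tighten the bound.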

The Clerk’s Office also is launching a new internal committee portal for tracking committee activity, including votes. The office is starting with a system of unique identifiers for individual committee votes, to be used in an electronic tally sheet for roll call votes.

To help meet a statutory requirement to track and report annually on expired and expiring appropriations authorizations, the Congressional Budget Office has developed an LLM process to shorten an extremely time-consuming task. Currently, CBO staff have to search both public laws and appropriations bills and track down individual appropriations manually to compile the report. A team leveraged the Clerk’s Comparative Print Suite to identify changes in the US Code and trained the LLM on sections relevant to authorization language to highlight relevant public laws for the report. The team is now working to develop a prototype, potentially with the help of Amazon’s Generative AI Innovation Center or universities. This project came directly from the second yearly internal CBO hackathon last August.

The Government Publishing Office has passed the halfway point in digitizing the Congressional Serial Set volumes, the nearly 16,000 bound books that collect the records of each Congress. GPO has broken these massive volumes into individual reports and other documents so users do not have to comb through hundreds of pages in a specific volume. Nearly 72,000 congressional reports and 36,000 documents, journals, rules, and manuals have been digitized from 8,500 volumes. It’s unclear, however, whether GPO can prioritize more recent volumes for digitization.

In December, GPO also released code to provide access to the US Statutes at Large from 2002 back to 1789 in USLM XML on GovInfo and via API. It also will start making XML and graphics files from the collection available on GovInfo. The process of posting all XML files will proceed incrementally to ensure quality control and likely will take a few years to complete.

GPO marked the one-year anniversary of its digital collection of congressionally mandated reports, which now includes 550 titles from 70 federal organizations. The Congressionally Mandated Reports Act requires reports mandated to Congress and specific committees to be submitted to GPO in a digital format. Because the Clerk does not receive, and is not required to compile, a list of reports required by committees, GPO is learning about the scope of the collection as it goes along.

A working group of House, Senate, Library of Congress, and GPO staff has launched a project to model House and Senate committee and conference reports in USLM XML going forward. It has created a sample data set and will post progress on schema and samples on a GPO GitHub repository.

GPO also announced it has launched the first user acceptance testing phase for its XPub bill drafting platform.

Finally, GPO shared that it has digitized the congressional pictorial directories dating back to 1951.

The Congress.gov team reported it is on track to meet the March 2025 mandate to publish Congressional Research Service reports on the website. HTML, PDF files, and metadata will be available for the reports in the Congress.gov API. This is the first time HTML will be released publicly.

The Library of Congress also announced a victory for legislative branch interoperability: Congress.gov now links to the GovInfo collection of statute compilations and to GPO files for public laws and the Statutes at Large.

The Library declined to say whether the report on the September public meeting would be made publicly available as appropriators directed.

The Clerk’s office indicated it would work to make information about new members of Congress for the start of the 119th Congress publicly available as soon as possible. Of particular note for data users, the subcommittee codes will be released in the XML file. Under the release schedule for unofficial member-elect data, the PDF of new members will be released tomorrow and the XML file the following week.

Congratulations and thank you to Wade Ballou, who is retiring from serving as the House’s legislative counsel.