The Congressional Data Task Force convened March 20, benefiting from a nice venue upgrade inside the Capitol Visitor Center. These quarterly meetings highlight the great collaborative work taking place behind the scenes across legislative branch offices to unlock the enormous amount of information about what Congress is doing and has done in the past. It’s work that levels the playing field for members of Congress, legislative staff, and the public in terms of situational awareness and deeper institutional knowledge, which is why we think it’s so important.
Summaries of previous CDTF meetings can be found on the website using the “Congressional Data Task Force” tag.
Government Publishing Office
The team digitizing the Congressional Serial Set, compilations of House and Senate documents and reports from each Congress, has about 4,500 volumes left to go. It has completed volumes from the entire 20th century (the 56th through 106th Congresses). The team also completed adding Supreme Court cases and decisions up to the latest published volume in 2016.
GPO’s Jon Quandt shared several other developments on GovInfo, which recently celebrated its 10th anniversary. The site now links to congressional hearings related to bills under the related documents tab. GPO, the Office of the Federal Register, and the National Archives hit another major milestone by completing implementation of the USLM XML format for the digitized volumes of the historical U.S. Statutes at Large collection.
The latest release notes on GovInfo, published after the meeting, provide additional details on these updates.
GPO also is working on several new collaborative projects, notably an effort with the Congressional Budget Office to format some of its projection models into USLM XML. Expanding use of the USLM schema makes it possible to exchange information among various legislative branch systems, ensuring interoperability that expands what tools can deliver for users.
Millions of documents on GovInfo will be affected by the major web browsers’ plans to cease support for XSLT 1.0, a language for transforming XML documents into formatted web pages. XSLT 1.0 dates to 1999 and carries security risks. The deprecation is scheduled for November. The GovInfo team will spend the summer getting ahead of this deadline, including by implementing server-side transformations that will maintain the user experience of accessing documents.
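To illustrate what a server-side transformation means in practice, here is a minimal sketch in which the server converts XML to HTML before sending it, so the browser never needs its own XSLT engine. Python's standard library has no XSLT processor, so a small hand-written transform stands in for a stylesheet. The element names (`document`, `title`, `para`) and the `render_html` function are hypothetical and do not reflect GovInfo's schemas or code.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical sample document; not an actual GovInfo format.
SAMPLE_XML = """<document>
  <title>Sample Report</title>
  <para>First paragraph.</para>
  <para>Second &amp; final paragraph.</para>
</document>"""


def render_html(xml_text):
    """Convert the XML to an HTML page on the server (stand-in for an XSLT stylesheet)."""
    root = ET.fromstring(xml_text)
    title = escape(root.findtext("title", default="Untitled"))
    # Escape text so characters such as "&" stay valid in the HTML output.
    paragraphs = "".join(
        f"<p>{escape(p.text or '')}</p>" for p in root.findall("para")
    )
    return (
        "<!DOCTYPE html><html><head>"
        f"<title>{title}</title></head><body>"
        f"<h1>{title}</h1>{paragraphs}</body></html>"
    )


html_page = render_html(SAMPLE_XML)
print(html_page)
```

Because the browser receives finished HTML, the page looks the same to the reader whether or not the browser still supports XSLT.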
In the Q&A, Quandt said that the GovInfo team was working on methods for extracting data from committee documents for better access, but that multiple entities in the legislative branch each hold a piece of the exchange of information found in such documents.
House Chief Administrative Officer
Responding to recommendations of the House Select Committee on Modernization, the CAO has posted new guidance on the House website for vendors interested in selling technology services to the House. The Committee on House Administration has set a number of standards for vendors to protect House data and speech and debate rights, but new entrants into the chamber marketplace often were unaware of them, leading to product rejections that could have been avoided.
Most notably, the new guidance includes information on cloud services. CAO also has created a new unsolicited tech pitch page for services and products that includes the option to upload a five-minute demonstration video. We’ll be very interested in seeing what they receive, how member and committee staff are brought into the process of vetting pitches, and the potential benefit to the Congressional Hackathon.
In collaboration with leadership offices, CAO also has launched the first ever downloadable and subscribable voting calendar for House.gov that dynamically updates when votes and pro forma sessions are taking place. The calendar, a collaborative effort of the House Digital Service and the House Clerk, can be embedded in members’ and organizations’ websites and even customized to match web page color palettes. See the sidebar of the legislative activity page on House.gov for the links.
Finally, CAO announced it had brought control of all House committee videos under its in-house studio to simplify their custody. In addition to processing the videos, the studio will provide them to Congress.gov, which requires matching each YouTube or Vimeo post with the hearing ID. We think this change will boost accessibility for the videos going forward.
Linking video data to hearings in the metadata remains a challenge, however, for making past recordings useful to Congress.gov. It doesn’t sound like the CAO is going to devote resources to doing so, focusing on hearings as they come. Fortunately, AGI is planning to launch a tech project to use AI to identify likely matches between hearings and video automatically.
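As a rough illustration of the matching problem, the sketch below pairs videos with hearings held on the same date by comparing titles with plain string similarity. This is a simple non-AI baseline, not AGI's planned approach. The record fields (`id`, `date`, `title`, `url`), the ID format, and the 0.6 threshold are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase and collapse whitespace so formatting differences matter less."""
    return " ".join(title.lower().split())


def match_videos(hearings, videos, threshold=0.6):
    """Map each video URL to the most similar same-date hearing ID, or None."""
    results = {}
    for video in videos:
        best_id, best_score = None, 0.0
        for hearing in hearings:
            if hearing["date"] != video["date"]:
                continue  # only compare records from the same day
            score = SequenceMatcher(
                None, normalize(hearing["title"]), normalize(video["title"])
            ).ratio()
            if score > best_score:
                best_id, best_score = hearing["id"], score
        # Below the threshold, leave the video unmatched for human review.
        results[video["url"]] = best_id if best_score >= threshold else None
    return results


# Hypothetical records for illustration only.
hearings = [
    {"id": "H-001", "date": "2025-03-04", "title": "Oversight of Agency Data Practices"},
    {"id": "H-002", "date": "2025-03-04", "title": "Budget Request for Fiscal Year 2026"},
]
videos = [
    {"url": "https://example.com/v/1", "date": "2025-03-04",
     "title": "Hearing: Oversight of Agency Data Practices"},
    {"url": "https://example.com/v/2", "date": "2025-03-05",
     "title": "Budget Request for Fiscal Year 2026"},
]

results = match_videos(hearings, videos)
print(results)
```

Treating low-scoring pairs as unmatched keeps uncertain links out of the metadata until a person can confirm them.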
House Clerk
Now that the Legislative Branch Data Map has come together through a partnership between the House Digital Service and the Congressional Data Coalition, the Clerk is adding Data Catalog Vocabulary (DCAT) standards for the datasets. The descriptors in these standards make data catalogs published on the Web interoperable and easier to discover.
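For readers unfamiliar with DCAT, the sketch below builds one dataset description as JSON-LD using terms from the W3C DCAT, Dublin Core, and FOAF vocabularies. The dataset identifier, title, publisher, and URL are hypothetical placeholders, not entries from the actual Data Map, and the `make_dcat_dataset` helper is our own illustration.

```python
import json

# Namespace IRIs for the W3C DCAT, Dublin Core Terms, and FOAF vocabularies.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def make_dcat_dataset(identifier, title, description, publisher, download_url, media_type_iri):
    """Build one DCAT dataset description as a JSON-LD dictionary."""
    return {
        "@context": CONTEXT,
        "@type": "dcat:Dataset",
        "dct:identifier": identifier,
        "dct:title": title,
        "dct:description": description,
        "dct:publisher": {"@type": "foaf:Agent", "foaf:name": publisher},
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:downloadURL": {"@id": download_url},
                "dcat:mediaType": {"@id": media_type_iri},
            }
        ],
    }


# Hypothetical values for illustration only.
record = make_dcat_dataset(
    identifier="example-dataset",
    title="Example Legislative Dataset",
    description="Placeholder description of a legislative branch dataset.",
    publisher="Example Office",
    download_url="https://example.gov/data/example-dataset.xml",
    media_type_iri="https://www.iana.org/assignments/media-types/application/xml",
)
print(json.dumps(record, indent=2))
```

Because every catalog uses the same property names, a harvester can read the title, publisher, and download link from any DCAT catalog without custom parsing.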
In the Clerk’s Office presentation, Kirsten Gullickson also highlighted two findings of a recent collaborative legislative drafting study the Clerk and House Office of Legislative Counsel delivered to the Legislative Branch Appropriations Subcommittee. Clerk Kevin McCumber mentioned them during the recent budget hearing: a member drafting portal so offices can track the status of requests made to the Office of Legislative Counsel (cutting email ping-pong); and a lightweight editor member staff could use to make changes to non-legal text of bill drafts themselves.
Secretary of the Senate
Senate Webmaster Arin Shapiro shared that the new Senate website is in development and is aiming for launch over the summer recess.
Enhanced information about Senate committees is now available on Congress.gov, including assignments listed on each member page and membership (including for subcommittees) on committee pages. Senate committee schedules also are available on Congress.gov. Internal users can track the full text of amendments in machine-readable format.
The Secretary of the Senate also continues to work with House partners to bring the comparative print suite to the full Senate, building on a current pilot program. Shapiro’s team has received useful feedback from Senate participants.
Library of Congress
The FY 2026 annual report for Congress.gov, discussed later in this recap, also listed the team’s priority initiatives for this fiscal year. The Senate committee information integration was one. Others include:
- Adding Senate days in session
- Searchable statute compilations
- Automating the workflow of adding official titles to bills engrossed in the House
- Improving access to Senate amendment texts from 2001 to 2015
- Automating the workflow of curating Senate resolution texts
- Fulfilling a Select Committee on Modernization recommendation by establishing a data source with the Clerk’s Office and the House Rules Committee to add House Rules amendments to members’ profile pages on Congress.gov
- Continued congressional and public client interviews on user experience
The list also included providing a timeline for fulfilling a ModCom recommendation that Congress.gov provide information on related bills by February 1, 2026. The presentation did not include details about this timeline, so it’s worth following up about this in the next meeting.
The next public forum for Congress.gov will be the afternoon of September 24.
Civil Society
George Mason professor Jennifer Victor demonstrated a project that captures her research into members’ participation in the caucus system and the networks it creates among members. Victor and her teams of student assistants built a database of all caucus membership between the 103rd and 116th Congresses using bound copies of Leadership Directories. The data can be queried via her Caucus Explorer platform, which also can generate network matrices for members and caucuses.
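As an illustration of what a network matrix means here, the sketch below turns a list of member-caucus pairs into a co-membership matrix, where each cell counts the caucuses two members share. The records are invented, and this is a generic construction rather than a description of how Caucus Explorer computes its matrices.

```python
# Hypothetical membership records as (member, caucus) pairs; not real data.
memberships = [
    ("Member A", "Caucus X"),
    ("Member A", "Caucus Y"),
    ("Member B", "Caucus X"),
    ("Member B", "Caucus Y"),
    ("Member C", "Caucus Y"),
]


def co_membership_matrix(records):
    """Return sorted member names and a matrix of shared-caucus counts."""
    members = sorted({member for member, _ in records})
    caucuses_of = {member: set() for member in members}
    for member, caucus in records:
        caucuses_of[member].add(caucus)
    # Cell [i][j] is the number of caucuses members i and j have in common;
    # the diagonal is each member's own caucus count.
    matrix = [
        [len(caucuses_of[a] & caucuses_of[b]) for b in members]
        for a in members
    ]
    return members, matrix


members, matrix = co_membership_matrix(memberships)
for name, row in zip(members, matrix):
    print(name, row)
```

Swapping the roles of members and caucuses in the same function would give the caucus-by-caucus matrix, where each cell counts shared members.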
Users can download the data, making this, to our knowledge, the best and perhaps only comprehensive source of caucus membership information available. Unbeknownst to us at the time, our own work rounds out Victor’s data: we compiled membership in major ideological caucuses for the 117th through 119th Congresses. That dataset is also available within GovTrack.
Joining from Portugal, Daniel Schuman shared AGI’s latest work to support the legislative data and technology infrastructure of Congress. First, he revealed a crowdsourced list, which we started compiling last month, of active technology projects within or related to the legislative branch. View or contribute to the list at this link.
We also highlighted some relevant public witness testimony submitted to the House Legislative Branch Appropriations Subcommittee last week, including:
- AGI’s request for funding for 1 FTE to support the CDTF
- Joe Eannello on updating the CRS Appropriations Status table to include bill text and report language at the earliest point made publicly available
- Nick Hart, Data Foundation, on funding & accessibility of the Legislative Branch Data Technology Map
- JD Rackey of BPC on funding for the Modernization Initiatives Account; Lorelei Kelly, Public Good Group on funding for MIA & Clerk
- Michael Stern on improving information about the Bipartisan Legal Advisory Group on the House Office of General Counsel website
- Jim Townsend, Levin Center, on GPO hosting IG reports
- Haiman Wong of R Street, Sean Vitka of Demand Progress, and Daniel Schuman of AGI on strengthening congressional cybersecurity
Daniel also shared news that AGI is launching several tech projects for the good of the legislative branch, including:
- Transforming appropriations bills and reports into data
- Tracking changes in appropriations report language
- Automatically identifying reporting requirements in bill text or report language
Building off our engagement with Bússola Tech in Brazil, Daniel shared the Inter-Parliamentary Union’s use cases for AI in parliaments compilation.
Finally, Daniel shared a reminder that the “Data Skills for Congress” professional certificate program, a free online training series, will be offered again this summer from June 28 through August 27. It is sponsored by the Goldman School of Public Policy at the University of California, Berkeley and supported by USAFacts.
Questions for next time
As we mentioned in the First Branch Forecast recap of this meeting, some presentations raised questions that we’ll revisit in the next meeting on June 11.
In FY 2023, the House Appropriations Committee accepted the Clerk of the House’s assessment that creating a publicly available lobbying disclosure system that included unique identifiers for individuals would require an overhaul of the system, and it provided $1.4 million to do so. The Clerk’s Office and Secretary of the Senate have been working on the system, but going on four years later, the Senate still has no timeline for rolling out its unique identifiers. It’s not fair to blame the technical staff for some hang-up that appears to be out of their control, but a delay this long is puzzling.
At a CDTF meeting last fall, the Library of Congress teased the release of an annual report for Congress.gov for FY 2026. Thursday, it was revealed that the report is available on the Congress.gov website (here it is), which we very much appreciate. The Library is doing well to be responsive to the needs of users of the site, which is a significant lift given that it saw more than 82 million visits and 176 million page views in FY 2025, huge leaps in both metrics from the previous year. The Library has held public forums to gather feedback since 2020 and has gathered more through interviews.
The report lists suggestions from its feedback repository, which the Library reviews with the other members of the CDTF to determine feasibility and level of priority. As the report notes, significant planning and coordination goes into the editing and publishing processes that make access to information on Congress.gov possible. Its list of user requests, however, only indicates whether an idea would be “impacted by current upstream system limitations” or not, without much context for what these challenges may be.
To be certain, some requests and feedback from users that may seem straightforward are actually complicated or are caught up in congressional rules (see: committees owning their data). We have some concern, however, that the Library might be too cautious about aligning upstream systems for information that it has its own resources to gather and that likely already exists somewhere.
Take, for example, the process for posting appropriations-related materials on Congress.gov. At the height of appropriations season, subcommittees often circulate bill text and report language via email to interested parties on Friday afternoons or post them on websites asynchronously with committee hearings or markups. They often come in a rush, leaving members and the public who do not subscribe to expensive alert services or are not checking obsessively themselves at a disadvantage when engagement is critical in the process. The Library currently waits, however, until materials have been sent to and published by GPO, which is often after the committee has approved the text, too late to affect the outcome.
CRS is tasked with publishing the appropriations status tables that appear on Congress.gov, which are extremely helpful one-stop resources for a lot of material from a dozen subcommittees times two chambers. CRS analysts almost certainly are keeping track of sporadic releases, or at least could be with minimal time investment. If CRS posted that material to Congress.gov when it first becomes available, the tables would be far more valuable as a free, public resource. Accordingly, AGI has proposed that appropriators include report language in the FY 2027 bill requesting that CRS immediately post materials released by subcommittees and full committees.
