Recap of Bulk Data Task Force Meeting on July 14, 2021

The Bulk Data Task Force met on July 14, 2021, for its first quarterly meeting since October 2019, just before the COVID pandemic began. The virtual meeting included presentations from the House of Representatives, the Library of Congress, GPO, the Senate, and Demand Progress Education Fund. Video of the two-hour proceedings is available here, and slides from the presentations are available on GPO’s Innovation Hub. More than 100 people pre-registered to attend the meeting.

At the end of the meeting, Deputy Clerk of the House Bob Reeves announced he would retire at the end of August 2021. He has run the task force since its inception in 2012. Kirsten Gullickson will succeed him in that role. Congratulations to both. We wish to express our appreciation to Deputy Clerk Reeves for all of his hard work and many accomplishments.

During the Q&A, someone asked about funding for these projects. The answer was that the budgets include funding for the projects, but not every project has its own dedicated funding.

The Library of Congress will host the next meeting of the Bulk Data Task Force.

The House of Representatives

Public health emergency: In response to the public health emergency, the House of Representatives implemented electronic submission of legislative documents (bills and resolutions, committee reports, and other materials), as well as proxy voting. Electronic submission of bills and resolutions (the e-hopper) will be upgraded in the coming months. More than 1,400 proxy voting letters are online.

Co-sponsor Senate bills: Beginning with the new Congress, Members of the House can show their support for Senate bills. You can see this support online; just look for a link to “supporting House Members.” See here. During the Q&A, it was made clear that the co-sponsorship data for Senate bills is not yet in the bulk data, and they hope to post more information about that on GitHub.

New bioguide website. The House launched a new bioguide website, available here. It lets you download the data in a structured format and is served entirely over HTTPS.

House modernization reports, i.e., reports required by the Select Committee on the Modernization of Congress, are available online on the Committee on House Administration’s website. Many of these reports cover items the Clerk is working on. Note: we routinely cover newly released reports in our newsletter, the First Branch Forecast.

Very large bills. Legislation in the House is getting longer, and the Clerk has processed some very long bills. The FY 2021 appropriations bill, for example, clocked in at 5,432 pages.

Comparative Print Project. The Clerk’s office provided a demonstration of the comparative print project. This new tool, which several offices inside the House are building, will show in real time how an amendment would change a bill, the differences between two bills, and how a proposed bill would change current law. It uses natural language processing to interpret legislative language.

The comparative print project’s goal is not just to show the amendments, but to show them in a way that makes the most sense to the user. This helps increase trust in the legislative process, helps Members visualize their changes, and allows them to negotiate with greater confidence. Go to the 10:45 mark in the video to see the presentation.

The goal is to make the comparative print project available House-wide by the end of the year. The next task is to make its components available to data partners. No information was given on whether the public will have access to the tool or its code.

Q&A. A question was raised about the lag between a hearing and when its transcript becomes available to the public. The answer was that preparing transcripts can take quite a while. We note that the House could publish unofficial transcripts with a fast turnaround, which would give the public timely access to the information.

The Library of Congress

The Library of Congress announced that the next virtual public forum on its legislative data services will take place on September 2nd. Users are also encouraged to provide feedback to the Library of Congress at

The Library reported on a number of smaller but still significant improvements to its flagship website. They are:

  • Alerts for committee activity
  • An automatic citation tool, with MLA and Chicago Manual of Style formats
  • The ability to contact your Member: a search box where you can easily enter your address to look up your Member
  • Consolidated saved-search email alerts: you can now sign in to your account, choose to consolidate your email alerts, and receive a single alert instead of many
  • Access to enacted laws from 1951 forward (but not as structured data)
  • A “Listen to this page” feature
  • A searchable help center
  • Access to historical debates of Congress (the bound Congressional Record) going back to the 1920s, with the goal of going back to 1873 (this also is not structured data)
  • Access to the committee schedule, with hearing transcripts back to 2001
  • About 33,000 bills from the 6th to 32nd Congress, added to Congress.gov by moving the information (in PDF format) over from the outdated Century of Lawmaking website

Q&A. A question was raised about how quickly bill texts are published on Congress.gov. It appears that bills can take quite a while after they are submitted to become available online. The answer was that there is a processing backlog in which many bills can get stuck, and that xPub and other technical solutions can help reduce it. A follow-up question asked how long it should take for a bill to become available online and how the backlog would be addressed. The response was that they will look further into this.

The Library of Congress also avoided answering questions about publishing historical CRS reports online and publishing current reports as HTML, saying only that it receives that feedback regularly. In addition, the Library would not share its views on the feasibility of public requests made at its first virtual public forum. It also would not summarize, or share, the report it provided to appropriators on the feasibility and appropriateness of meeting those requests. Nor did it give a timeline for when these questions might be answered.

Government Publishing Office

GPO presented on GovInfo, its main repository website, and also discussed the transformation from its old publication system to a fully digital workflow and new formats. More information on their resources is available here.

Among the improvements to GovInfo are bill status XML bulk data back to the 108th Congress (2003 forward), a dedicated RSS feed for bills, and statute compilations in USLM XML, which were previewed at the meeting and have since been published. GPO has also added “citation search patterns” that users can apply to filter the various categories of information GovInfo holds. There is also a new embeddable GovInfo search box.

We also want to highlight the API available from GovInfo. We haven’t had a chance to try it yet. GPO asks that requests for new features be submitted on its GitHub repository. Also, the packages available through the API all make clear that they are not subject to copyright. (Thank you!)

Upcoming GPO projects include digitizing the Congressional Serial Set back to the 15th Congress. The Serial Set contains House and Senate Documents and House and Senate Reports, bound by session of Congress. The first tranche of documents will start becoming available in the fall of 2021.

GPO provided a demonstration of xPub, a new system for publishing bills and laws. GPO is moving to a fully digital workflow in which it can accept content in any form. Much of GPO’s work is still paper based, but GPO is working to change that.

One feature I’m particularly excited about is a new display for the ASCII versions of documents. Currently, hard line returns break up the flow of the text. Soon these files will be displayed as responsive HTML, so the text will reflow to fit the screen. xPub will also render tables better and improve how insertions and deletions are shown. It is expected to be easier to integrate with authoring and editing tools and to produce PDF files and digital products that are Section 508 compliant.

The goal is to have xPub production-ready by the end of the year. After that, GPO just has to decide when to fully switch over. The public is invited to review and comment on the new xPub format here.

Secretary of the Senate

The Secretary of the Senate had several updates, but I did not catch them all. They include changes to the Senator contact page, a data table of contact information for Senators, changes to how the search page works, and new URLs for roll call votes.

Daniel Schuman, Demand Progress Education Fund

I gave a demonstration of our new BillMap tool, which makes it possible to track legislative ideas within a single Congress and across multiple Congresses. Here is a link to my presentation.