On September 6th, the Library of Congress announced it launched a beta version of its Congress.gov API. While APIs for legislative data aren’t new for the Legislative branch — see, for example, the Government Publishing Office’s API — this is a pretty big deal. For the reason why, it’s helpful to know a little history.
In 1995, the Library of Congress and GPO launched THOMAS.LOC.GOV, a website containing bills under consideration by the 104th Congress, including information about their status. This is what it looked like in 1997.
In the prior Congress, the Democratic House of Representatives had been working to publish legislative information on their own, but ran out of time. Speaker Gingrich and the Republican revolution swept into power in the 104th Congress, and they mandated a legislative information site be quickly created. This wasn’t entirely a new idea: back in 1988 the now-defunct Office of Technology Assessment issued a report entitled “Informing the Nation: Federal Information Dissemination in an Electronic Age” that, in chapter 8, addressed electronic dissemination of congressional information.
There was a disagreement over where the new Congressional information website would be housed, with the result that the Library of Congress and the Government Publishing Office jointly shared publication responsibilities. The Library was responsible for the website interface and some of the data, while GPO was responsible for publishing much of the information. (The House Clerk’s office and Senate Secretary’s office also contribute data, particularly the legislative actions on bills.) In fact, ultimately two legislative information websites managed by the Library were created: THOMAS, for the public, and LIS, for congressional users.
This new THOMAS website was great for the time, but the Library and the Government Publishing Office did not publish the legislative status data behind the website. So if you wanted to analyze the whole set of legislation before Congress or track the status of a bill, you could not. Members of the public first asked the Library to publish the information as data, a request the Library flatly refused (for more than a decade!). So members of the public started scraping the data, which is a difficult and time-consuming process.
Dr. Josh Tauberer, who started scraping the information in the early 2000s because he was interested in tracking a particular bill, created GovTrack.us so that others could follow the legislation as well. And, more importantly for this story, he began publishing the information in a structured data format, so that others did not need to build screen-scrapers, but instead could use his data. This created an entire collaborative ecosystem around legislative data that persists to this day.
While the THOMAS website was modern in 1995, the technology behind it grew increasingly out-of-date. The Library front office showed no interest in improving THOMAS or modernizing its systems. By the mid-2000s, GovTrack was obviously superior to THOMAS, with better user design, more content, and greater responsiveness to its audience.
At the same time, the public kept agitating for the Library of Congress to publish the legislative bill text, summary, and status information as data, a request that was repeatedly refused. In 2007, a bipartisan coalition of organizations once again pushed for that information to be made publicly available, in the groundbreaking Open House Project Report, but no progress was made.
Ultimately, around 2012, those of us in civil society made a breakthrough. Previously, when Democrats would advocate for open data, the Library would prompt Republicans to oppose it. When we helped encourage Republicans and Democrats to work together in the House, the Library would work through the Senate to stop it. But, around 2012, we had a legislative moment.
Rep. Darrell Issa was ready to offer an amendment to an appropriations bill that would have mandated open access to legislative data. The Library of Congress front office squeezed House leadership hard, and leadership offered a compromise: we’ll create a working group to study the question, the Bulk Data Task Force.
This was an effort by some to send it to committee to die. Yet, to my surprise and that of many others, the Bulk Data Task Force heard from civil society, considered the issue fairly, and recommended that the legislative information behind THOMAS be published as structured data. Specifically, as structured and bulk data — which means you can download it all at once.
What opponents had failed to anticipate was that there were many players inside the Legislative branch who wanted modernization, and the Bulk Data Task Force brought many of them together and established a network. (A few of them had already been cooperating since the establishment of the Legislative Branch XML Working Group in the late 90s, which set standards for the XML used to draft legislation.) That Bulk Data Task Force continues to meet today, with internal and external stakeholders, and was renamed the Legislative Data Task Force in June 2022. It continues to be a driving force for modernization. (We are the civil society analogue to the LDTF, and this website contains the history of our efforts to engage with that process!)
The GPO began publishing legislative information online as bulk data, which they still do almost a decade later even as they continue to expand their offerings. Shortly before some legislative information was first published as data, the Library began working on a replacement for THOMAS, which ultimately became Congress.gov. The beta for Congress.gov was launched in 2012, and THOMAS was shut down in 2016. The Library also began the slow path to eliminating LIS, the internal congressional legislative information website.
So what does all this have to do with the Congress.gov API? Publication of legislative data in bulk is great for skilled developers. But often you don’t want to download all the bills in a particular Congress, just a specific bill. If you’ll pardon the metaphor, bulk access is like purchasing a bag of rice, while an API is like obtaining a single grain. Or, when you ask a question, bulk data is like receiving an encyclopedia in response, while an API is like getting a specific answer.
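For developers, the difference shows up in the request itself. Here is a minimal Python sketch that only builds the two kinds of URLs and makes no network calls. The base URLs, the path layouts, and the `format` and `api_key` query parameters are my assumptions about how the Congress.gov API and GPO's bulk data repository are organized, and the helper names are hypothetical; check the official documentation before relying on any of it.

```python
from urllib.parse import urlencode

# Assumed base URLs and path layouts (not taken from this post); verify them
# against the official Congress.gov API and GPO bulk data documentation.
API_BASE = "https://api.congress.gov/v3"
BULK_BASE = "https://www.govinfo.gov/bulkdata/BILLSTATUS"


def single_bill_url(congress, bill_type, number, api_key):
    """Hypothetical helper: API-style URL for one specific bill (one grain of rice)."""
    query = urlencode({"format": "json", "api_key": api_key})
    return f"{API_BASE}/bill/{congress}/{bill_type.lower()}/{number}?{query}"


def bulk_bills_url(congress, bill_type):
    """Hypothetical helper: bulk-style URL for every bill of one type (the whole bag)."""
    t = bill_type.lower()
    return f"{BULK_BASE}/{congress}/{t}/BILLSTATUS-{congress}-{t}.zip"


if __name__ == "__main__":
    # "YOUR_KEY" is a placeholder, not a working credential.
    print(single_bill_url(117, "HR", 3076, "YOUR_KEY"))
    print(bulk_bills_url(117, "HR"))
```

The first URL asks for one bill and would return a small answer; the second points at a single archive holding the status files for every House bill in that Congress, which you would then have to unpack and search yourself.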
We knew the Library had built an API for exchanging data with its Legislative branch partners. But for many years it would not acknowledge that an API existed, and then there was resistance to making a public-facing API. People inside the Library had multiple perspectives about whether to do this, often split along departmental lines.
Civil society continued to push for a Congress.gov API. Appropriators listened and encouraged work along these lines, and there were now others who shared this view throughout the Legislative branch, including in parts of the Library. The issue continued to be raised in multiple venues, including at the Library of Congress’s 2020 and 2021 virtual public forums on Congress.gov.
Thanks to external requests, support throughout the Legislative branch, and forward-thinkers inside the Library of Congress, the Library took up the mantle and announced they would create an API. Back in June, they asked the public to help test it, which is an excellent example of collaboration.
So here we are. The Library of Congress, GPO, and other Legislative branch stakeholders continue to improve the quantity of legislative information made available to the public, close publication gaps, and publish more and more legislative information in ways that are accessible to both developers and the general public.
This is a good news story. And the launch of the Congress.gov API is very good news.