US Statutes at Large: Essential to understanding our laws and legislative history

One of the benefits of the Congressional Data Coalition has been our ability to collaborate on projects of mutual interest. CDC members recognize that reusable, cleaned-up legislative information, especially the laws themselves, is essential for both the legislative data community and the public. Unfortunately, at least some information will likely not be provided by Congress or will not be provided in a timely manner.

Almost 3½ years ago, in November 2010, GPO and the Library of Congress were authorized by the Joint Committee on Printing to make the following three document sets available on the Internet: Statutes at Large, the Congressional Record (1878-1998), and the Constitution of the United States: Analysis and Interpretation (CONAN). Quoting from the JCP letter: “These are key primary research sources, essential to understanding our laws and legislative history, and they should all be readily available online in electronic format.”

So far, volumes 65 through 124 (1951-2010) of the Statutes at Large and PDF files only of CONAN have been published by the Legislative Branch per the November 2010 authorization.

Why are the Statutes at Large important?

The United States Statutes at Large is the legal and permanent evidence of all the laws enacted during a session of Congress (1 U.S.C. 112). Every law, public and private, is published in the order of its passage. The set contains treaties and international agreements before 1948, concurrent resolutions, proposed and ratified amendments to the Constitution, and proclamations by the President. Pretty much the whole enchilada – and before you ask about the Constitution, yes, volume 1 includes the Declaration of Independence, the Articles of Confederation, and the Constitution of the United States.

But isn’t the US Code the law? Only a subset of the laws in the Statutes at Large are contained within the U.S. Code and many of those laws have been modified by subsequent laws to the point that the original language is difficult to discern. Hundreds of laws have been enacted that never made it into the United States Code. For example, of the 440 laws enacted in 1949, 235 made it into the US Code.

The importance of Internet accessibility to the laws enacted before 1951 should be obvious. The Law Revision Counsel (the organization responsible for putting together the U.S. Code), in its Table of Acts Cited by Popular Name, has identified almost 2,100 laws that were enacted before 1951. Searching legislative text from the 112th Congress (2011-2012) shows that the past is not completely forgotten: about 6 percent of Statutes at Large citations reference pre-1951 volumes.
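A count like this can be approximated with a simple pattern search over bill text. The sketch below is illustrative only: the regular expression, the function name, and the sample sentence are assumptions made for this example, not the method behind the 6 percent figure. It treats any volume numbered below 65 as a pre-1951 volume, since volume 65 is the first one published online under the 2010 authorization.

```python
import re

# Matches citations of the form "64 Stat. 798" (volume, "Stat.", page).
# Assumption: this one pattern is enough for illustration; real bill text
# contains more citation variants than this.
STAT_CITATION = re.compile(r"\b(\d{1,3})\s+Stat\.\s+(\d+)")

# Volumes 65 and later (1951 onward) are the ones already online officially.
FIRST_OFFICIAL_ONLINE_VOLUME = 65


def pre_1951_share(text):
    """Return (pre-1951 citation count, total citation count, share)."""
    volumes = [int(m.group(1)) for m in STAT_CITATION.finditer(text)]
    total = len(volumes)
    older = sum(1 for v in volumes if v < FIRST_OFFICIAL_ONLINE_VOLUME)
    share = older / total if total else 0.0
    return older, total, share


# Made-up sample sentence used only to exercise the function.
sample = (
    "as amended by section 2 (80 Stat. 250), and the Act of June 25, 1948 "
    "(62 Stat. 869), and further amended (124 Stat. 119; 49 Stat. 620)."
)
print(pre_1951_share(sample))
```

Run over a full set of bills instead of the sample string, the same function would give a rough share of citations that point to volumes not yet officially online.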

So, while some of these laws are cited in current bills, in 2014 they remain officially available only as paper documents. Unofficially, scanned versions of the volumes are available at the Constitution Society’s website, but until now these volume files had not been broken down into individual laws, treaties, Presidential proclamations, etc.

Making more laws available

Starting in January 2014, the Congressional Data Coalition and citizens joined together to make the individual laws and other documents of the US Statutes at Large available as discrete PDF files. We’re a little over halfway through the initiative, but we need volunteers to help with the final push.

Rather than attempting to produce a full-text table of contents for each volume, as GPO did for the post-1950 volumes, we’ve extracted the page number where each component (public law, resolution, etc.) begins by reusing the OCRed text from the PDF files. We then crowdsource the proofreading and correcting of the data, which is where we need your help. Once the simple table of contents is completed, software extracts the individual PDF files for each sub-document. The software to do all this is open source and available online.
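The step from a simple table of contents to individual files comes down to turning a list of start pages into page ranges. The following is a minimal sketch of that idea, not the project’s actual software: the `Entry` class, the `page_ranges` function, and the sample labels are hypothetical. It assumes an item runs until the page where the next item begins, and that this shared page belongs to both items because a new law can start partway down a page.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    label: str       # e.g. a public law or resolution identifier
    start_page: int  # page of the volume file where the item begins


def page_ranges(entries, last_page):
    """Turn start pages into inclusive (label, first_page, last_page) ranges.

    Assumption: each item ends on the page where the next item begins, so a
    page shared by two items appears in both ranges.
    """
    ordered = sorted(entries, key=lambda e: e.start_page)
    ranges = []
    for i, entry in enumerate(ordered):
        if i + 1 < len(ordered):
            end = ordered[i + 1].start_page
        else:
            end = last_page  # the final item runs to the end of the volume
        ranges.append((entry.label, entry.start_page, end))
    return ranges


# Hypothetical proofread table of contents for a 9-page volume.
toc = [
    Entry("Chapter 1", 1),
    Entry("Chapter 2", 3),
    Entry("Chapter 3", 3),
    Entry("Chapter 4", 7),
]
for label, first, last in page_ranges(toc, last_page=9):
    print(label, first, last)
```

Each resulting range could then be handed to a PDF library to write one file per sub-document. This is also why a single wrong start page matters: it shifts the boundaries of two neighbouring files, which is what the proofreading step is meant to catch.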

As of April 2014, volumes 28 through 64 (1893-1951) have been processed. We’ve also begun extracting the text from the tables of contents in the volume files and combining it with the simple table of contents data being used to create the files (sort of like a final QA check). By combining the two data sources (the text from the tables of contents along with the public law number and stat page data), we’ve been able to build more usable tables of contents. See the U. S. Statutes at Large Pre-1951 Directory.

The future of legislative data collaboration

Our approach has combined crowd-sourcing, manual editing, and automated processes. We’ve received help from a variety of outstanding volunteers. In two months, we have expanded the availability of laws by 50 years and over 15,000 acts, treaties, and international agreements.

Similar approaches should be strongly considered for publishing other historical documents on the Internet. The elephant in the room, of course, is the Congressional Record, which has been published since 1873 but is available on the Internet only back to 1994. As software developers, both inside and outside of government, we should be thinking in terms of how crowdsourcing can help us build the necessary document repositories for the 21st century.

Our role, as the Congressional Data Coalition, includes supporting public initiatives that provide improved legislative information for ourselves and the public. Tom Bruce, Director of the Legal Information Institute at Cornell, said it eloquently in his hangout session when he talked about the dream of having an open-access Westlaw or LexisNexis with layered access to information, providing legal and legislative services housed under many roofs: a federation of services and data.

We should not shy away from identifying data anomalies and providing corrected data in a fully transparent and constructive way to support the public need for accurate and timely legislative information. It might not seem that having all of these laws as discrete files on the Internet would mean much. We’ve lived without them as discrete electronic files for a long time without any apparent problems. My hope for now is that these documents will extend our electronic legislative library so that our history can be read and referenced over the Internet.

Please consider helping our effort and volunteering along with us. Special thanks to Owen Ambur, Daniel Schuman, Sara S. Frug, Joe Jerome, and Matt Steinberg.

Op-ed: To make Congress more accountable, make it more open

Daniel Schuman and I have a new op-ed on legislative data in The Hill:

Nearly two decades ago, Congress began publishing some of its activities online, revolutionizing access to essential public information. The system was called THOMAS, after our third president. Managed by the Library of Congress, it aimed to serve as a central hub to find bills and resolutions, the Congressional Record, committee reports, treaties and so on. There’s no doubt that, for 1995, this was a huge leap forward.

While technology has changed a lot since the mid-1990s, the quality of data coming from Congress has not kept up. Complicating matters, the THOMAS website is set to be retired and replaced by the end of 2014. Once this happens, applications that have been developed with this data, and that are used extensively by congressional members and their staff, interest groups and citizens, will stop working.

Read the rest here.

4/04 Day panel and happy hour

The Congressional Data Coalition is hosting a panel and happy hour on April 4 (“4/04 Day”) sponsored by the R Street Institute and CREW, and organized with help from other CDC members. Register for the panel here: And for the happy hour here:

Here’s the event description and list of participants:

Two decades ago Congress began publishing on the Internet, revolutionizing public access to legislative information. While the technology we use has evolved since the 90s, Congress has not always kept up.

In the meantime, public-minded entrepreneurs have used congressional data to alert people to important bills, keep track of votes, and put citizens in contact with their elected representatives.

While some modernization has occurred, further improvements to Congress’s publishing methods are urgently needed. And with the looming switch away from THOMAS, many organizations and individuals who depend on the available data will be left in the dark, endangering the public’s ability to see what its government is doing.

Join our panel of data experts for a discussion of how far we have come, and how far we have yet to go, to fulfill the promise of lawmaking in the open.

Jim Harper, Global Policy Counsel, Bitcoin Foundation; Senior Fellow, Cato Institute (Moderator)
Steve Dwyer, Digital Director & Policy Advisor to Rep. Steny Hoyer
Josh Tauberer, Founder, GovTrack.us
Nick Schaper, Senior Vice President, Engage
Kirsten Gullickson, Senior Systems Analyst, Clerk of the House of Representatives

Seeking Google Fellow to help run CDC

The R Street Institute (a libertarian think tank headquartered in Washington, DC) is accepting applications for its 2014 Google Policy Fellowship in technology policy. R Street’s Google Fellow will have the opportunity to contribute op-eds and blog posts, work on overseeing coalition efforts, and help manage digital and social media projects relating to tech policy. In addition, our Google Fellow will have the opportunity to help run the Congressional Data Coalition and coordinate efforts between members.

Apply here for R Street’s position. Read the full terms and conditions here.

Other CDC members offering the fellowship include EFF and TechFreedom.

Congressional Data Coalition writes to House appropriators

In a new letter to the House this month, we joined 18 other organizations and individuals in calling for access to the legislative data on bill status that Congress has but won’t share.

The letter was sent by the new Congressional Data Coalition, formed this month of citizens, public interest groups, trade associations, and businesses who champion greater governmental transparency through improved public access to and long-term preservation of congressional information.

Video and presentation from BDTF

In case you weren’t able to attend in person, here’s a video and presentation from Monday’s Bulk Data Task Force meeting.

Congressional data: A primer for non-geeks

When the legislative branch started publishing bills and resolutions online in 1995, it was heralded as a revolution in government transparency. The public at last had easy and immediate access to the text of legislation and citizens could better hold their elected officials accountable.

But technology has progressed quite a distance since then, and the internet has become a far more dynamic platform for information. Given the massive quantity of content produced on a daily basis by the government alone, transparency can no longer be defined by the existence of information as text. Instead, transparency should be defined as the accessibility and usability of information as data.

So what exactly is the difference between text and data? Text is a series of static characters and words formatted to be presented in a certain way on a page. Humans are able to derive meaning from text by reading and understanding it, but computers can do little more than store it or display it for a human to read. Data is different. Data consists of pieces of information linked to identifiers and variables. When data is published in certain formats, computers are able to automatically find and compile specific, identified pieces of information. This saves humans countless hours and improves the accuracy and completeness of the information they gather.
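As a small illustration of the distinction, here is one fact about the Freedom of Information Act written first as text and then as data. This is only a sketch: the field names and the `signed_in_year` helper are hypothetical and do not reflect any official schema.

```python
from datetime import date

# As text: a person can read this, but a program only sees a string of characters.
text = "S. 1160 in the 89th Congress was signed on July 4, 1966 and became Pub.L. 89-487."

# As data: each piece of information is tied to a named field.
# Field names are hypothetical, chosen for this example.
records = [
    {
        "bill": "S. 1160",
        "congress": 89,
        "signed": "1966-07-04",
        "public_law": "89-487",
    },
]


def signed_in_year(records, year):
    """Return the public law numbers of records signed in the given year."""
    return [
        r["public_law"]
        for r in records
        if date.fromisoformat(r["signed"]).year == year
    ]


print(signed_in_year(records, 1966))
```

With the text version, answering "which laws were signed in 1966?" means reading every sentence. With the data version, the same question is a short filter that works the same way over one record or many thousands.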

To simplify, think about the difference between text and data in terms of information in the news. When a major weather event occurs, media outlets typically release articles about how the weather anecdotally impacted local individuals. These articles are text. Media outlets also release numbers and charts that show the temperatures and wind speeds over the same period of time. This is data. While the text may detail the effect of the event through an (implicitly or explicitly) editorialized lens, the data provides concrete information that consumers can apply widely and compile with other datasets to derive new meaning.

So what would this text-to-data transform mean for Congress? It would mean releasing official documents, membership information, committee reports, and other pieces of legislative information in formats ready for computer processing; establishing authoritative identifiers for the many entities involved in governmental processes; and having that content available for download from a machine-crawlable location.
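To see why authoritative identifiers matter, consider two sources that describe the same members but spell their names differently. The example below is a sketch with made-up records: the identifiers, names, committees, and the `join_on_id` function are all hypothetical.

```python
# Two hypothetical sources describing the same members. The names differ
# between sources, but both carry the same (made-up) identifier.
votes_source = [
    {"member_id": "X000001", "name": "Doe, Jane", "votes_cast": 120},
    {"member_id": "X000002", "name": "Roe, Richard", "votes_cast": 98},
]
committee_source = [
    {"member_id": "X000001", "name": "Jane Q. Doe", "committee": "Example Committee A"},
    {"member_id": "X000002", "name": "Rep. Richard Roe", "committee": "Example Committee B"},
]


def join_on_id(left, right, key="member_id"):
    """Merge records that share an identifier; unmatched records are dropped."""
    right_by_id = {r[key]: r for r in right}
    merged = []
    for row in left:
        match = right_by_id.get(row[key])
        if match is not None:
            # Fields from `left` win when both sources define the same field.
            merged.append({**match, **row})
    return merged


for row in join_on_id(votes_source, committee_source):
    print(row["member_id"], row["name"], row["committee"], row["votes_cast"])
```

Matching on the names alone would find nothing in common between these two lists, while matching on the shared identifier lines the records up exactly. That is the practical payoff of having one authoritative identifier per entity.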

Despite notable effort in some quarters, Congress unfortunately has not kept up with these data demands. As of now, legislative data is difficult to access, outdated, and low quality. Private developers who build software around this data must inefficiently pull from multiple sources, amplifying the potential for errors, inconsistencies, and inaccurate information. While recent improvements in data accessibility have been encouraging developments (including various new bulk data downloads from the Government Printing Office), they do not adequately support today’s data processing capabilities.

Not only do developers deserve better, the public deserves better.

The Congressional Data Coalition seeks to bring together programmers, data scientists, activists, and policy experts of all ideologies to encourage Congress to improve the accessibility and usability of legislative data. This will allow developers and civic hackers to provide consumers with more authoritative, reliable, and timely information on the goings-on of Congress, leading to a more transparent government and a better informed public.

It is time for Congress to free its data.


The original text of the Freedom of Information Act

The Freedom of Information Act was enacted twice, and the one that we know and celebrate is, technically, not the one that became law. This early history of FOIA provides an interesting case study in the complexities of the codification of our federal statutes.

What we commonly consider the Freedom of Information Act, S. 1160 in the 89th Congress, was signed by President Johnson on July 4, 1966. It became Pub.L. 89–487 / 80 Stat. 250. Its effective date was one year later on July 4, 1967, and in fact it never became law: it was repealed before its effective date. More on that below.

More on counting laws and discrepancies in the Resume of Congressional Activity

After my last post yesterday about Congress incorrectly counting the new laws in 2013, Daniel Schuman (of CREW) suggested that I look at previous installments of the Resume of Congressional Activity to see if there were other long-standing discrepancies in these historical counts of the number of laws passed by each Congress.

I went through each of the PDFs listed at… and compared the totals by Congress (a Congress is a two-year period of legislative activity), and then compared those totals to other sources.

Timeline of US legislative documents and data

  • The message, “What hath God wrought?” sent later by “Morse Code” from the old Supreme Court chamber in the United States Capitol to Samuel Morse’s partner in Baltimore, officially opened the completed telegraph line of May 24, 1844. (1)
  • The private firm, Little, Brown, and Company, began publishing the Statutes at Large under authority granted by a joint resolution of the 28th Congress. (1)
  • Charles Lanman, an author and former secretary to Daniel Webster, assembled the first collection of biographies of former and sitting Members for his Dictionary of Congress. (1)