Half a trillion unaccounted for on federal spending transparency website

The federal government can’t find $619 billion on the website it built six years ago to give a transparent account of its spending activities. USA Today’s Gregory Korte has the full story:

A government website intended to make federal spending more transparent is missing at least $619 billion from 302 federal programs, a government audit has found. And the data that does exist is wildly inaccurate, according to the Government Accountability Office, which looked at 2012 spending data. Only 2% to 7% of spending data on USASpending.gov is “fully consistent with agencies’ records,” according to the report….OMB spokesman Jamal Brown said the administration is already working to improve the data.

The website is currently maintained by the Office of Management and Budget, and had an initial budget of $15 million.

Hat tip to AEI’s Arthur Brooks for tweeting the story.

House Concludes Third Annual Legislative Data and Transparency Conference

(Cross-posted from CREW)

Last week, the House of Representatives held its third annual Legislative Data and Transparency Conference. The full-day symposium, which took place in the U.S. Capitol, featured speakers from inside and outside government who discussed efforts to make more legislative information available to the public, particularly in machine-readable formats.

The event was sponsored by the Committee on House Administration and included staff from House leadership offices (of both parties), the Clerk’s office, the Government Printing Office, the Library of Congress, the Office of Law Revision Counsel, the Office of Legislative Counsel, and other personal and committee offices. In addition, a number of outside groups made presentations, including the Congressional Data Coalition, a consortium of civic organizations, civic hackers, businesses, trade associations, librarians, and others who support better public access to legislative data. The event was live-streamed, and video will be made available on the Committee on House Administration’s website.

While the event was jam-packed with interesting information, three items particularly stood out.

First, the conference itself is the ongoing manifestation of the House of Representatives’ collective efforts to make its activities more open and transparent to the public. For proof, one merely needs to look to the series of annual transparency conferences, the ongoing meetings of the Legislative Bulk Data Task Force, the recent Legislative Branch Appropriations Bill, the creation of docs.house.gov, the ongoing upgrades to rules.house.gov, the updated version of the U.S. Code, and so on.

Second, real progress is being made on one of the thorniest but singularly important issues: Is it possible to show, in real-time, how an amendment would change a bill and how draft legislation would change the law? For a number of technical reasons, building a solution to these questions is particularly difficult in the U.S. Congress. However, the House has made real progress in doing just that. For the details, watch the HOLC/OLRC Modernization and Next Steps presentation given by Ralph Seep, Sandra Strokoff, and Harlan Yu.

Finally, there is a growing sense of partnership and camaraderie among people who are working to make legislative information more widely available regardless of whether they are inside or outside government. Sometimes the work of people outside government is paving the way for innovations inside government. Other times, efforts by those inside government allow those of us on the outside to build clever new services and tools. In many respects, there is a real give-and-take. This is what progress looks like.

Transparency and Legislative Data Happy Hour

On behalf of the Congressional Data Coalition, you are invited to a Transparency and Legislative Data Happy Hour this upcoming Thursday, May 29, from 5ish to 7. We will get started right after the House of Representatives’ 2014 Legislative Data and Transparency Conference ends.

Location: Bullfeathers, on Capitol Hill, just south of the Cannon House Office Building, 410 1st Street SE Washington, DC

We will provide (very) light hors d’oeuvres and have a spot towards the back of the bar.

Please let us know you’re coming by RSVPing below (or go here).

Democracy and open data: are the two linked?

Are democracies better at practicing open government than less free societies? To find out, I analyzed the 70 countries profiled in the Open Knowledge Foundation’s Open Data Index and compared the rankings against the 2013 Global Democracy Rankings. As a tenet of open government in the digital age, open data practices serve as one indicator of an open government. Overall, there is a strong relationship between democracy and transparency.

Using data collected in October 2013, the top ten countries for openness include the usual bastion-of-democracy suspects: the United Kingdom, the United States, mainland Scandinavia, the Netherlands, Australia, New Zealand and Canada.

There are, however, some noteworthy exceptions. Germany ranks lower than Russia and China. All three rank well above Lithuania. Egypt, Saudi Arabia and Nepal all beat out Belgium. The chart (below) shows the democracy ranking of these same countries from 2008-2013 and highlights the obvious inconsistencies in the correlation between democracy and open data for many countries.

[Chart: democracy rankings of the countries named above, 2008-2013]
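One way to put a number on how closely two rankings agree is a rank correlation. The sketch below is not the analysis behind this post; it is a minimal illustration of Spearman's rank correlation, and the country labels and ranks in it are hypothetical placeholders rather than values from the Open Data Index or the Global Democracy Ranking.

```python
def ranks(values):
    """Return 1-based ranks for values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: the Pearson correlation of the two rank lists."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return covariance / (var_x * var_y) ** 0.5


# Hypothetical ranks for five placeholder countries (not real index data).
countries = ["A", "B", "C", "D", "E"]
open_data_rank = [1, 2, 3, 4, 5]
democracy_rank = [2, 1, 3, 5, 4]

rho = spearman(open_data_rank, democracy_rank)
```

A value near 1 means the two rankings mostly agree, a value near 0 means they are unrelated, and the exceptions discussed above are the countries that pull the figure down.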

There are many reasons for such inconsistencies. The implementation of open-government efforts – for instance, opening government data sets – often can be imperfect or even misguided. Drilling down to some of the data behind the Open Data Index scores reveals that even countries that score very well, such as the United States, have room for improvement. For example, the judicial branch generally does not publish data and houses most information behind a paywall. The status of legislation and amendments introduced in Congress also is often unavailable in machine-readable form.

As internationally recognized markers of political freedom and technological innovation, open government initiatives are appealing political tools for politicians looking to gain prominence in the global arena, regardless of whether or not they possess a real commitment to democratic principles. In 2012, Russia made a public push to cultivate open government and open data projects that was enthusiastically endorsed by American institutions. In a June 2012 blog post summarizing a Russian “Open Government Ecosystem” workshop at the World Bank, one World Bank consultant professed the opinion that open government innovations “are happening all over Russia, and are starting to have genuine support from the country’s top leaders.”

Given the Russian government’s penchant for corruption, cronyism, violations of press freedom and increasing restrictions on public access to information, the idea that it was ever committed to government accountability and transparency is dubious at best. This was confirmed by Russia’s May 2013 withdrawal of its letter of intent to join the Open Government Partnership. As explained by John Wonderlich, policy director at the Sunlight Foundation:

While Russia’s initial commitment to OGP was likely a surprising boon for internal champions of reform, its withdrawal will also serve as a demonstration of the difficulty of making a political commitment to openness there.

Which just goes to show that, while a democratic government does not guarantee open government practices, a government that regularly violates democratic principles may be an impossible environment for implementing open government.

A cursory analysis of the ever-evolving international open data landscape reveals three major takeaways:

  1. Good intentions for government transparency in democratic countries are not always effectively realized.
  2. Politicians will gladly pay lip-service to the idea of open government without backing up words with actions.
  3. The transparency we’ve established can go away quickly without vigilant oversight and enforcement.

Congress at a Glance

What is Congress doing this week? The answer to this question—an assortment of hearings and markups in the House and Senate—is surprisingly difficult to find. A few publications sell this information to congressional insiders with money to burn, but only recently has a comprehensive free source of this information become available.

The privately-run congressional website GovTrack just began publishing a committee meetings calendar for all hearings and markups scheduled in the House or Senate, updated daily. This calendar levels the playing field for small non-profits and private citizens otherwise not able to afford comprehensive scheduling information.

Both Senate and House rules require nearly all committees to publish committee scheduling information a week in advance (three days for some House meetings). For a while now, the Senate aggregated the scheduling information in one place both in human-readable and machine-readable formats, but the House buried information on multiple committee webpages, often in PDFs, except for a listing of the upcoming day’s events.

With the House’s launch of its impressive new website, docs.house.gov, users can obtain information about that chamber’s activities as soon as it is scheduled. In fact, docs.house.gov goes further than the Senate website and contains relevant committee documents such as witness testimony and legislation about to be considered on the House floor. The House Rules Committee also has vast amounts of data about amendments offered for consideration on the floor.

All this means that it is now possible to combine House and Senate data to get a fuller picture of what is happening in committees across the legislative branch. (A few entities, such as Senate Appropriators, don’t have to follow these publication rules.) One would expect Congress’ flagship legislative information website, Congress.gov, to combine this information into one helpful, public-facing list, but that is not yet the case.
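To make the idea of combining the two chambers' schedules concrete, here is a minimal sketch. It assumes each chamber's listings have already been loaded into simple records. The field names (`when`, `committee`, `topic`) and the sample entries are hypothetical and do not reflect the actual formats published by docs.house.gov or the Senate.

```python
from datetime import datetime


def unified_calendar(house_meetings, senate_meetings):
    """Tag each meeting with its chamber and return one list sorted by start time."""
    combined = (
        [dict(meeting, chamber="House") for meeting in house_meetings]
        + [dict(meeting, chamber="Senate") for meeting in senate_meetings]
    )
    return sorted(combined, key=lambda m: datetime.fromisoformat(m["when"]))


# Hypothetical sample records, for illustration only.
house = [
    {"when": "2014-06-10T14:00", "committee": "Example House Committee", "topic": "Markup"},
    {"when": "2014-06-10T09:30", "committee": "Another House Committee", "topic": "Hearing"},
]
senate = [
    {"when": "2014-06-10T10:00", "committee": "Example Senate Committee", "topic": "Hearing"},
]

calendar = unified_calendar(house, senate)
```

The hard part in practice is the step this sketch skips: getting each chamber's listings into a common shape in the first place, which is only feasible when both publish them in machine-readable form.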

Traditionally, civic activists have led on congressional technology issues, with their innovations slowly leaking over into official practice. One could imagine a central list of upcoming hearings and markups that contains links to live and archived video, committee documents, witness lists, and other useful information, all in one place.

Until then, GovTrack’s unified list of committee activities has transformed civic data published by Congress into something everyone can use.

Cross-posted from CREW.

Congressional Data Coalition asks Senate to publish legislative info in digital formats

Earlier today, the Congressional Data Coalition submitted testimony to the Senate Appropriations Committee on improving public access to legislative information. The coalition made two requests.

First, we asked the Senate to concur with legislative language passed by the House of Representatives and direct the secretary of the Senate to work to implement bulk access to bill status information. Second, we requested that the Senate authorize the Library of Congress and the Government Printing Office to publish bill summary information in bulk in the same fashion as does the House of Representatives.

The Congressional Data Coalition previously had submitted testimony to House appropriators requesting bulk access to bill status information. While this recommendation was not adopted in subcommittee, an amendment to this effect offered by Rep. Mike Quigley, D-Ill., was adopted by the full committee in early April and passed by the House of Representatives yesterday.

The House of Representatives has led in making legislative information available to the public in digital formats. We hope the Senate will engage in efforts to ensure the public has access to congressional activities in a manner befitting our modern technological age.

The letter was co-authored by Citizens for Responsibility and Ethics in Washington (CREW) and Civic Impulse, LLC, on behalf of the Congressional Data Coalition. It was signed by the Data Transparency Coalition, Legisworks.org, the National Priorities Project, the OpenGov Foundation, OpenTheGovernment.org, the R Street Institute, the Sunlight Foundation, WashingtonWatch.com, Jerry Hall of eCitizens.org and GovAlert.me, Molly Schwartz of the R Street Institute and Gregory Slater.

Federal News Radio interview on the Congressional Data Coalition

Earlier today, Emily Kopp of Federal News Radio interviewed Congressional Data Coalition chairman Daniel Schuman about the launch of the coalition and a recent victory in the House of Representatives. Listen here.

Public access vs. open access

“Doesn’t Congress already make its information publicly accessible?”

That’s the question I hear most frequently when I tell people about the Congressional Data Coalition’s mission to get Congress to provide open access to its data. “Open access” is a complicated and loaded term in the digital information world, but at its core it involves three main components:

  1. The ability to find the data.
  2. The ability to use the data.
  3. The ability to repurpose the data.

Truly achieving open access to congressional data will require more than just posting the information online: the information has to be in the correct format. Presenting data as gobs of text is seriously problematic because machines cannot read it.

In today’s information ecosystem, publishing information that cannot be parsed and read by machines is like building an all-terrain vehicle that can only drive straight forward. It might be able to get you where you need to go, but only if your destination lies straight ahead. And it completely defeats the purpose of being able to drive off-road.

So what can congressional data that is machine-readable do that facilitates open access?

  1. Finding the data: Search engines can search for the content stored within documents.
  2. Using the data: A variety of programs can access and display the data. Mobile apps can provide to-the-minute updates, APIs can scrape it and immediately display it on another website, programs can download it into spreadsheets, etc.
  3. Repurposing the data: Data can be run through programs that display it in charts, graphs, or elegant visualizations. Journalists and engaged citizens can also get timely access to the data that informs their output and ideas.
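A small sketch may help show what "using" and "repurposing" look like once data is structured. The XML below is a made-up format with placeholder bills, not the schema Congress actually publishes. The point is only that a few lines of standard-library Python can turn structured records into rows for a spreadsheet, which is not possible with a scanned page or an undifferentiated block of text.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical structure and placeholder values, for illustration only.
SAMPLE_XML = """\
<bills>
  <bill number="H.R. 0000">
    <title>Example Act</title>
    <sponsor>Rep. Example</sponsor>
    <status>Referred to committee</status>
  </bill>
  <bill number="S. 0000">
    <title>Another Example Act</title>
    <sponsor>Sen. Example</sponsor>
    <status>Passed Senate</status>
  </bill>
</bills>
"""

FIELDS = ["number", "title", "sponsor", "status"]


def parse_bills(xml_text):
    """Read each <bill> element into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return [
        {
            "number": bill.get("number"),
            "title": bill.findtext("title"),
            "sponsor": bill.findtext("sponsor"),
            "status": bill.findtext("status"),
        }
        for bill in root.iter("bill")
    ]


def to_csv(rows):
    """Repurpose the parsed records as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


bills = parse_bills(SAMPLE_XML)
spreadsheet_text = to_csv(bills)
```

The same parsed records could just as easily feed a search index, a mobile alert, or a chart.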

Plus there is a multitude of extra benefits. Machine-readable data is more accessible to people with disabilities because screen readers can read it. It is also easier to preserve because the data is not dependent on the software we use to access it, which will most likely become obsolete within the next ten years (remember floppy disks? WordPerfect?). Because laws passed by Congress can remain in effect for decades, we have to keep the data that allows us to put those laws in context.

Big step for public access to legislation

Earlier today, the House of Representatives’ Appropriations Committee made a major move towards improving public access to legislative information. In layman’s terms, the committee said that by the beginning of the next Congress information about the disposition of bills—where they are in the legislative process and who authored or co-sponsored the legislation—will be published in a way that computers can easily process, and thus can be easily reused by apps and websites.

Many Americans access legislative information through third-party sites. This change in publication policy will help guarantee that accurate, timely, and complete legislative information is directly available from the official source. Congress already publishes the text of legislation in a structured format that is downloadable in bulk.

The committee specifically directed the Clerk of the House to work with the Librarian of Congress and the Public Printer to publish bill status information for bulk data downloads by the beginning of the next Congress. This has been a long-standing request of the public interest community and was the subject of a recent letter sent by CREW and GovTrack.us on behalf of the newly formed Congressional Data Coalition.

The report language came at the behest of Rep. Mike Quigley (D-IL), who recommended the committee adopt this language in its report. His recommendation was the culmination of many years of hard work by legislative transparency advocates in both parties, including (but not limited to) Speaker John Boehner (R-OH), Majority Leader Eric Cantor (R-VA), Minority Leader Steny Hoyer (D-MD), and Reps. Darrell Issa (R-CA), Mike Quigley (D-IL), Mike Honda (D-CA), and Ander Crenshaw (R-FL).

In June 2012, Speaker Boehner, Majority Leader Cantor, and then-Legislative Branch Appropriations Subcommittee Chairman Ander Crenshaw issued a letter on the occasion of the establishment of a Legislative Bulk Data Task Force charged with looking into improved public access to legislative information, stating “our goal is to provide bulk access to legislative information to the American people without further delay.” Rep. Issa had offered an amendment to put that requirement into law, but withdrew it pending the report of the Task Force. In its December 2013 report, the Task Force recommended “that it be a priority for Legislative Branch agencies to publish legislative information in XML and provide bulk access to that data.” While the issue was not raised during the recent Legislative Branch Appropriations Subcommittee hearings, Ranking Member Debbie Wasserman Schultz (D-FL) singled out Rep. Quigley at the full committee hearing for making the recommendation.

With the report language in the final committee report, it is unclear what additional action, if any, is necessary to put it into effect. The House Appropriations Committee has tremendous sway over legislative branch agencies, who may spring to comply even in the absence of floor action in the House. The Senate, in its own committee report, may not address the issue (thus perhaps giving tacit approval) or may expressly agree or disagree to bulk publication of bill status information. Indeed, the Senate’s Legislative Branch Appropriations Subcommittee is still reviewing its appropriation bill, having met just yesterday.

Regardless, today’s action in the House is a significant win for transparency. Public interest advocates have been fighting for bulk access to legislative information at least since May 2007, and the House has now put its full weight on the side of legislative transparency.

Here is the report language:

The Committee request that the Clerk of the House, the Librarian of Congress and the Public Printer work together to make available to the public through Congress.gov or FDsys bulk data downloads of bill status by the beginning of the next Congress.

US Statutes at Large: Essential to understanding our laws and legislative history

One of the benefits of the Congressional Data Coalition has been our ability to collaborate on projects of mutual interest. CDC members recognize that reusable, cleaned-up legislative information, especially the laws themselves, is essential for both the legislative data community and the public. Unfortunately, at least some information will likely not be provided by Congress or will not be provided in a timely manner.

Almost 3½ years ago, in November 2010, GPO and the Library of Congress were authorized by the Joint Committee on Printing to make the following three document sets available on the Internet: Statutes at Large, the Congressional Record (1878-1998), and the Constitution of the United States: Analysis and Interpretation (CONAN). Quoting from the JCP letter: “These are key primary research sources, essential to understanding our laws and legislative history, and they should all be readily available online in electronic format.”

So far, volumes 65 through 124 (1951-2010) of the Statutes at Large and PDF files only of CONAN have been published by the Legislative Branch per the November 2010 authorization.

Why are the Statutes at Large important?

The United States Statutes at Large is the legal and permanent evidence of all the laws enacted during a session of Congress (1 U.S.C. 112). Every law, public and private, is published in the order of its passage. The set contains treaties and international agreements before 1948, concurrent resolutions, proposed and ratified amendments to the Constitution, and proclamations by the President. Pretty much the whole enchilada – and before you ask about the Constitution, yes, volume 1 includes the Declaration of Independence, the Articles of Confederation, and the Constitution of the United States.

But isn’t the U.S. Code the law? Only a subset of the laws in the Statutes at Large are contained within the U.S. Code, and many of those laws have been modified by subsequent laws to the point that the original language is difficult to discern. Hundreds of laws have been enacted that never made it into the U.S. Code. For example, of the 440 laws enacted in 1949, only 235 (about 53 percent) made it into the U.S. Code.

The importance of Internet accessibility to the laws enacted before 1951 should be obvious. The Law Revision Counsel (the organization responsible for putting together the U.S. Code), in its Table of Acts Cited by Popular Name, has identified almost 2,100 laws that were enacted before 1951. Searching legislative text from the 112th Congress (2011-2012) shows that the past is not completely forgotten: about 6 percent of Statutes at Large citations reference pre-1951 volumes.
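For readers curious how such a share might be measured, here is a rough sketch. It is not the search actually used for the figure above. It assumes citations appear in the common "volume Stat. page" form and treats any volume below 65 as pre-1951, following the split between volumes 64 and 65 described in this post. The sample sentence is invented for illustration.

```python
import re

# Matches citations such as "49 Stat. 620": a volume, "Stat.", then a page.
STAT_CITATION = re.compile(r"\b(\d{1,3})\s+Stat\.\s+(\d+)")

# Assumption taken from this post: volumes 65 and later cover 1951 onward.
FIRST_POST_1950_VOLUME = 65


def pre_1951_share(text):
    """Return the fraction of Statutes at Large citations that cite volumes below 65."""
    volumes = [int(match.group(1)) for match in STAT_CITATION.finditer(text)]
    if not volumes:
        return 0.0
    older = sum(1 for volume in volumes if volume < FIRST_POST_1950_VOLUME)
    return older / len(volumes)


# Invented sample text, for illustration only.
sample = (
    "Section 2 of the Act (49 Stat. 620) is amended, and the provisions "
    "of 88 Stat. 1896 and 110 Stat. 56 shall apply."
)
share = pre_1951_share(sample)
```

A pattern this simple will miss citations written in other styles, so any real count built this way should be read as an estimate.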

So, while some of these laws are cited in current bills, in 2014 they remain officially available only as paper documents. Unofficially, scanned versions of the volumes are available at the Constitution Society’s website, but until now these volume files had not been broken down into individual laws, treaties, presidential proclamations, etc.

Making more laws available

Starting in January 2014, the Congressional Data Coalition and citizens joined together to make the individual laws and other documents of the US Statutes at Large available as discrete PDF files. We’re a little over halfway through the initiative, but we need volunteers to help with the final push.

Rather than attempting to produce a full-text table of contents for each volume as was accomplished by GPO for the post-1950 volumes, we’ve extracted the page number where each component (public law, resolution, etc.) begins by reusing the OCRed text from the constitution.org PDF files. We then crowdsource the proofreading and correcting of the data which is where we need your help. Once the simple table of contents is completed, software extracts the individual PDF files for each sub-document. The software to do all this is open source and available online.
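The step from a simple table of contents to individual files comes down to turning a list of starting pages into page ranges. The sketch below is not the project's own software. It is a minimal illustration with hypothetical labels and page numbers, and it assumes that each component runs through the page on which the next one begins, since a new law can start partway down a page.

```python
def page_ranges(start_pages, last_page):
    """Turn (label, start_page) pairs, listed in page order, into (label, start, end).

    Assumption: a component ends on the page where the next one starts,
    because two components can share a page. The final component runs to
    last_page.
    """
    ranges = []
    for index, (label, start) in enumerate(start_pages):
        if index + 1 < len(start_pages):
            end = start_pages[index + 1][1]
            if end < start:
                raise ValueError(f"start pages out of order at {label!r}")
        else:
            end = last_page
        ranges.append((label, start, end))
    return ranges


# Hypothetical table of contents for a ten-page volume.
toc = [("Chap. 1", 1), ("Chap. 2", 3), ("Chap. 3", 3)]
extracted = page_ranges(toc, last_page=10)
```

Each resulting range can then be handed to whatever PDF tool does the actual splitting. The proofreading matters because a single wrong starting page shifts the boundaries of two neighboring documents.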

As of April 2014, volumes 28 through 64 (1893-1951) have been processed. We’ve also begun extracting the text from the tables of contents in the volume files and combining it with the simple table of contents data being used to create the files (sort of like a final QA check). By combining the two data sources (the text from the tables of contents along with the public law number and stat page data), we’ve been able to build more usable tables of contents. See the U.S. Statutes at Large Pre-1951 Directory.

The future of legislative data collaboration

Our approach has combined crowd-sourcing, manual editing, and automated processes. We’ve received help from a variety of outstanding volunteers. In two months, we have expanded the availability of laws by 50 years and over 15,000 acts, treaties, and international agreements.

Similar approaches should be strongly considered for publishing other historical documents on the Internet. The best example of the elephant in the room of course is the Congressional Record – only available on the Internet back to 1994 but published since 1873. As software developers, both inside and outside of government, we should be thinking in terms of how crowdsourcing can help us build the necessary document repositories for the 21st century.

Our role, as the Congressional Data Coalition, includes supporting public initiatives that provide improved legislative information for ourselves and the public. Tom Bruce, Director of the Cornell Law Information Institute, said it eloquently in his hangout session when he talked about the dream of having an open-access Westlaw or LexisNexis with layered access to information providing legal/legislative services housed under many roofs – a federation of services and data.

We should not shy away from identifying data anomalies and providing corrected data in a fully transparent and constructive way to support the public need for accurate and timely legislative information. It might not seem that having all of these laws as discrete files on the Internet would mean much. We’ve lived without them as discrete electronic files for a long time without any apparent problems. My hope for now is that these documents will extend our electronic legislative library so that our history can be read and referenced over the Internet.

Please consider helping our effort and volunteering along with us at http://legisworks.org/sal. Special thanks to Owen Ambur, Daniel Schuman, Sara S. Frug, Joe Jerome, and Matt Steinberg.