A Biased Yet Reliable Guide to Sources of Information and Data About Congress

Big Picture

1/ There are big gaps in the data story

2/ Even when there’s data, it may not tell the whole story

  • Info about Congress isn’t entirely reliable, even when it is official, e.g., the Congressional Record (“revise and extend”)
  • Congress historically is a paper-based institution, driven by people with agendas, and it has inconsistent archival practices, e.g., GPO established in 1860, National Archives created in 1934
  • Its institutions are built to solve a particular problem, not work for all time. Plus there are a lot of turf wars, e.g., the former THOMAS.gov
  • Analyses, even by experts, can be unreliable because of the source data or unexpected actions. See, e.g., the CRS report on the number of staff in an office (done by counting phone numbers) or the various supplementals

3/ The people who dogfood the data, such as Josh Tauberer at GovTrack, Derek Willis formerly of ProPublica, and OpenSecrets, are often forced to build more reliability and usability into the data than is available from official sources.

4/ This presentation is idiosyncratic and focuses on particular use cases. Major topics include:

  • Federal spending information
  • Oversight and accountability
  • Legislation
  • Congressional committees
  • Information about Congress
  • Money in politics and ethics
  • Other interesting and important stuff

Federal spending information

The current White House Budget proposal, including the useful Budget Appendix (which has the proposed bill text and top level request), is available from the OMB’s website, along with historical tables https://www.whitehouse.gov/omb/budget/

  • For the plain-language explanation of what an agency is looking for, search for “Congressional Justification” at the agency’s website. Older justifications tend to disappear, so use the Wayback Machine to look for them. Starting in FY24, the new Congressional Budget Justification Transparency Act requires them to be published online as PDF and data, in one place, at https://www.usaspending.gov/agency, within two weeks of submission to Congress, and requires keeping track of which ones are not available. As a backup, you can FOIA the agency, ask the relevant congressional committee for a copy, or perhaps find it in the committee’s hearing report. Starting this year, the CBJs should also contain a list of items recommended by GAO or the IG for agency implementation that are not completed.
  • Understanding what appropriators actually consider and enact into law is an art, not a science. Congress.gov publishes status tables at https://crsreports.congress.gov/AppropriationsStatusTable that will give you the House and Senate bill text, committee reports adopted in either chamber, final language, and the joint explanatory statement that accompanies the final bill. If you want to know what statements were submitted to the committee or the nature of the discussion, you’ll need to look at the committee webpage if it’s recent. If it’s within the last decade or so, you can find it at docs.house.gov. Videos of the House proceedings and some of the Senate proceedings are available on their webpages. Alas, there is no line-item representation of the spending levels in any particular appropriation bill and no list of reports mandated by that law.
  • A new law, the Access to Congressionally Mandated Reports Act, requires the GPO to compile a list of all reports due to Congress from federal agencies as well as many of the reports themselves. It should be implemented some time in 2024. In addition, the Clerk of the House maintains a partial list of reports due to Congress, entitled “REPORTS TO BE MADE TO CONGRESS COMMUNICATION FROM THE CLERK.” Google for it. This lets you know what’s due by each agency, but it only includes reports due to either chamber, not the committees or subcommittees.
  • Implementation of spending decisions enacted by Congress should ultimately show up at one point or another in USASpending.gov, including grants and contracts, as well as the prime recipients and subrecipients.

Oversight and Accountability

  • The testimony of people who testify before Congressional committees can usually be found on the committee website. Occasionally those websites are “refreshed” and that information is lost. You can use the Wayback Machine to find older testimony. The written testimony should be gathered and included in the committee report on the hearing, if there is one, which is available from GPO.
  • GPO will often have a transcript of those hearings. Remember, however, that people who testify before the committee, including committee members, can often update and change their words, so the transcript may not be verbatim. If there’s video, you may wish to check that, or use a paid transcription service.
  • Most, but not all, federal inspector general reports are available from Oversight.gov, which is a website maintained by the Council of Inspectors General. Older reports won’t be up there, and some IGs do not comply or have not made their back catalogs available. You can FOIA the IG reports and, if they’re old enough, you may find them at the Center for Legislative Archives at NARA, which can give them to you if they were submitted to Congress (when enough time has passed).
  • The GAO publishes its reports on its website https://www.gao.gov/, although some reports are restricted: they are listed there but not published (https://www.gao.gov/reports-testimonies/restricted). However, GAO provides a FOIA-like process and you can request the reports.
  • CBO scores are published on its website https://www.cbo.gov/
  • The Law Library of Congress publishes excellent legal reports on foreign activities on its website. https://www.loc.gov/collections/publications-of-the-law-library-of-congress/about-this-collection/
  • The Congressional Research Service, after much prodding, is publishing some of its reports, focused on domestic legal and policy issues, on its website. https://crsreports.congress.gov/ It’s not a very good or complete website. We have twice as many reports at https://www.everycrsreport.com/ and our search is much better. We also publish data behind the reports when we have it. If you need old reports you can ask your member of Congress to request them, or there are paid services that may have them (for a not insignificant fee).
  • The Government Publishing Office has millions of federal documents, including potentially what you’re looking for. https://www.govinfo.gov/ The website is a bit tricky to use, so asking a government documents librarian to help is often a great idea. GPO also maintains an API for its content https://api.data.gov/docs/gpo/
  • A number of civil society organizations can be pretty helpful on this score. It’s often worth a call to the Project on Government Oversight.
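Since several of the items above come down to "use the Wayback Machine," here is a minimal Python sketch of building a lookup for the archived copy of a page closest to a given date. It assumes the Internet Archive's public availability endpoint (https://archive.org/wayback/available, with `url` and `timestamp` query parameters). The function name and the committee address are hypothetical, made up for illustration.

```python
from urllib.parse import urlencode

# Assumption: the Internet Archive's public "availability" endpoint.
WAYBACK_AVAILABILITY = "https://archive.org/wayback/available"


def wayback_lookup_url(page_url, timestamp=None):
    """Build a query URL asking for the archived snapshot closest to a date.

    page_url  -- the page that has gone missing (e.g., old committee testimony)
    timestamp -- optional target date as YYYYMMDD; omit it for the latest snapshot
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return WAYBACK_AVAILABILITY + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical committee page, used only to show the shape of the query.
    print(wayback_lookup_url("https://committee.example.gov/hearings", "20150101"))
```

Fetching the resulting URL should return a small JSON document describing the closest snapshot, if one exists. Check the Internet Archive's documentation before relying on the response format.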

Legislation

  • Congress.gov is the official source for legislative data and the website is pretty good looking, a significant improvement over THOMAS. Its bill text goes back only to the mid-1990s, although it has (often in PDF format) enacted laws going back centuries. It now has a public-facing API. https://api.congress.gov/ Note that Congress.gov is not the repository of the data, but (largely) an interface to information held elsewhere. Also, Congress.gov is usually at least a day behind legislative activity (i.e., bill introductions), and at times the lag can be significant.
  • Most legislative text and documents live at the Government Publishing Office, which maintains the robust govinfo.gov. GPO also publishes bill text and other information in bulk. https://www.govinfo.gov/bulkdata. GPO also has an API. https://api.govinfo.gov/docs/.
  • For an alternative experience for navigating legislative text with different tools and enhanced data, govtrack.us is a great resource. (https://www.govtrack.us/) They no longer publish their data in bulk because they won the battle to get Congress to publish much of this data.
  • If you want real-time access to legislation to be considered in the House or the contents of congressional hearings, https://docs.house.gov/ is for you. The section “bills to be considered on the floor,” which also is published in XML, gives you all the bills set for floor consideration this week. The section “committee repository” gives you the last 10 or 15 years’ worth of committee information. Also, don’t miss out on the House Rules Committee website, https://rules.house.gov/, which includes all bills that go before the Rules Committee prior to a vote on the floor, including the text of every offered amendment, what’s adopted in the Rules Committee, and more. It’s also published as XML.
  • The Senate is behind the times when it comes to legislation to be considered on the floor. Your best bet to see amendments and bill text is the Congressional Record, although the bill text will eventually show up on Congress.gov. They are working to update Congress.gov to show the amendment text as well.
  • If you’re looking for video of the proceedings, this is an iffy proposition. Lars at the Lincoln Network created a central repository of all available links to Senate committee video going back about 20 years. https://www.senatecommitteehearings.com/transcripts
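For the Congress.gov API mentioned above, here is a rough Python sketch of composing a request for a single bill. The path layout (`/v3/bill/{congress}/{type}/{number}`) and the `format` and `api_key` parameters reflect my understanding of the version 3 documentation, so treat them as assumptions and check https://api.congress.gov/ before use. The function name and the key shown are placeholders.

```python
from urllib.parse import urlencode

# Assumption: version 3 of the Congress.gov API lives under this root.
API_ROOT = "https://api.congress.gov/v3"


def bill_url(congress, bill_type, number, api_key):
    """Compose a request URL for one bill.

    congress  -- the Congress number, e.g. 118
    bill_type -- chamber/type code such as "hr" or "s" (lowercased here)
    number    -- the bill number
    api_key   -- your own key; the value used below is a placeholder
    """
    path = f"{API_ROOT}/bill/{congress}/{bill_type.lower()}/{number}"
    return path + "?" + urlencode({"format": "json", "api_key": api_key})


if __name__ == "__main__":
    print(bill_url(118, "HR", 1, "YOUR_KEY"))
```

Because Congress.gov can lag behind floor activity, anything time-sensitive built on this should probably cross-check docs.house.gov.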

Congressional Committees

  • In theory, GPO has all the transcripts and reports of congressional committees. In practice, transcripts of proceedings can be doctored by the committees and some are kept secret, and committees can be slow to publish reports and some fail to do so entirely. GPO is likely your best bet. But if you want this info and it’s not available, the Center for Legislative Archives at the National Archives may have unpublished material, and you can sometimes find this stuff in the archives of retired members of Congress maintained at various educational institutions.
  • House committees do publish committee roll call votes, but they don’t do it in a central place. You need to look for the PDF file on each committee’s website. The House is putting together a central repository that should have all the roll call vote info for each committee, plus a place to submit testimony, more info about the witnesses, etc., but that’s a work in progress.
  • The House and Senate do publish roll call votes on the floor. House roll call votes are online here, https://clerk.house.gov/Votes, and the data is also available in XML. The key for identifying members is the Bioguide ID (https://www.congress.gov/help/field-values/member-bioguide-ids). The Senate roll call votes are here, https://www.senate.gov/legislative/votes_new.htm, but instead of using the Bioguide ID, their XML uses the LIS ID. The LIS IDs are published here. (https://www.senate.gov/about/senator-lookup.xml). If you need a crosswalk between the Bioguide and LIS IDs, let me know.
  • A great resource (at least for House committees) is the activity report, which they publish at the end of each Congress and which lists everything they did: every hearing, every vote, every markup, etc. Here is an example. https://www.congress.gov/116/crpt/hrpt718/CRPT-116hrpt718.pdf
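On the Bioguide/LIS crosswalk: one way to build your own is from a list of member records that carry both identifiers. The Python sketch below assumes records shaped like the legislator files in the United States Project on GitHub, where each person has an `id` block with `bioguide` and, for senators, `lis` keys. That shape is my understanding of those files, not something guaranteed here. The records and IDs in the sample are made up.

```python
def build_crosswalk(legislators):
    """Return a dict mapping LIS ID -> Bioguide ID.

    Records that lack either identifier (e.g., House members, who have
    no LIS ID in this assumed layout) are skipped.
    """
    crosswalk = {}
    for person in legislators:
        ids = person.get("id", {})
        lis = ids.get("lis")
        bioguide = ids.get("bioguide")
        if lis and bioguide:
            crosswalk[lis] = bioguide
    return crosswalk


# Made-up records for illustration only; these are not real member IDs.
SAMPLE = [
    {"id": {"bioguide": "X000001", "lis": "S901"}, "name": {"last": "Example"}},
    {"id": {"bioguide": "X000002"}, "name": {"last": "Sample"}},  # no LIS ID
]

if __name__ == "__main__":
    print(build_crosswalk(SAMPLE))
```

With a mapping like this you can translate the LIS IDs in Senate vote XML into Bioguide IDs and join them against House-style data.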

Information about Congress

  • We’ve published many of the support office and agency budget justifications on our GitHub page: https://github.com/DanielSchuman/Policy/wiki/Congressional-Budget-Justifications. They’re not all publicly available. This is what was published by the House concerning the agencies that report to it. I have no idea how to get the ones sent to the Senate.
  • Brookings Vital Statistics has information about congressional staff in the offices and agencies going back decades. This information is not entirely reliable but it is what everyone cites. https://www.brookings.edu/multi-chapter-report/vital-statistics-on-congress/
  • The Congressional Research Service has a series of reports about committee staff levels. These should be taken with a grain of salt, as some of them were conducted by counting the names listed in the congressional phone directory, which is not an entirely reasonable indicator. https://www.everycrsreport.com/reports/R43946.html
  • Zach Graves at the Lincoln Network and I have scoured the budget justifications for the last 40 years and built our own dataset on staff numbers, but we have not yet published the data. In our view, it is more reliable than the other data sets.
  • The House Statement of Disbursements https://www.house.gov/the-house-explained/open-government/statement-of-disbursements and Senate Secretary Report (SOPOEA) https://www.senate.gov/legislative/common/generic/report_secsen.htm have every expenditure in the House and in the Senate. The last half-dozen years of the House are published as spreadsheets; the last dozen years are published as PDFs, and the rest is in book format somewhere. The House is in the process of improving how they publish data in their reports. The Senate looks to continue publishing as PDFs. The reports online only go back so far, but they go back further in the House Office of Public Disclosure https://disclosures-clerk.house.gov/ and the Senate Office of Public Records https://www.senate.gov/legislative/opr.htm. Also, some libraries have digitized these back to the 1970s.
  • If you want more info about the Capitol Police, it’s probably just easier to call me. They do have a statement of expenditure, but it’s hard to find (and mostly missing). There’s also a misconduct database of limited utility. This is a good starting point. https://firstbranchforecast.com/2021/01/06/a-primer-on-the-capitol-police-what-we-know-from-two-years-of-research/
  • Many of the agencies file testimony if/when they testify before the legislative branch appropriations subcommittee. It’s often perfunctory, but sometimes there’s useful stuff. The House testimony can be found for the last decade on docs.house.gov. The Senate ones are published (usually) on the committee webpages. This is not a great system. If you want to go back in time, you can either go to your local government document librarian (often at local public libraries) or the Law Library of Congress, which have free access to paid services that can get you this stuff.
  • The Executive Branch, which tracks federal employees, does track the overall number of employees in the legislative branch. I think it’s the Department of Labor or Census. The last time I pulled the data it was a pain, but it is there.

Money in Politics and Ethics

Other interesting and important stuff

  • Tons of tools and scrapers for information about Congress are available from the United States Project GitHub repository. https://github.com/unitedstates. They are largely maintained by civil society. It also has many of the various forms of unique identifiers for members of Congress, etc. You really should go here because it addresses many of the needs you will have for data, identifiers, tools to extract information, etc.
  • The Legislative Branch Innovation Hub, maintained collaboratively by GPO, has tons of info about Congressional data and data standards. https://usgpo.github.io/innovation/ It also links to the useful XML working group. https://xml.house.gov/
  • Derek Willis is a fount of useful technology and data about Congress, with many of the items maintained by ProPublica. The ProPublica Congress API has tons of info about members of Congress. https://projects.propublica.org/api-docs/congress-api/ The Represent tool is a useful way of searching press statements made by members of Congress. https://projects.propublica.org/represent/
  • There’s a useful tool to look at what members of Congress say in their emails to constituents, DC Inbox. https://www.dcinbox.com/
  • The Congressional Bioguide is terse, but they do link to official photos and have a unique ID for every member of Congress. https://bioguide.congress.gov/
  • There’s a surprisingly useful House phone directory with good download capabilities. https://directory.house.gov/. Congress is in the process of exploring whether to create a Congress-wide phone directory (including issue areas).