Blog Post: Highlights from the June 2023 Congressional Data Task Force Meeting

Introduction

The Congressional Data Task Force meeting held on June 22, 2023, brought several significant updates and announcements in the realm of congressional data management and accessibility. Video and slides are available here.

Key Personnel Changes

  • Clerk Johnson’s Resignation: Announced her departure, effective June 30.
  • Deputy Clerk Kevin McCumber: Sworn in as acting Clerk, officially taking office in July.

Historical Insight

  • Workload Increase: The number of bills in the 117th Congress rose to 11,461, reflecting an increase of 1,000 bills over the previous congress.

Reports from the Library of Congress

Updates by Kimberly Fergusson, LOC:

  • Feedback: Encouraged feedback through the site’s feedback form.
  • Congress.gov API: Everything on Congress.gov will soon be available from the Congress.gov API. A changelog on GitHub walks through the API milestones and is a good place to submit updates.
  • Future Plans: Announced a public forum at the Library of Congress in September.
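
For readers who want to try the API, here is a minimal sketch that builds the request URL for a single bill. It assumes the version 3 endpoint layout described in the API's GitHub documentation (`/v3/bill/{congress}/{type}/{number}`) and that you have signed up for an API key. The helper name `bill_url` and the placeholder key are illustrative, not part of the API.

```python
from urllib.parse import urlencode

# Assumed base URL for version 3 of the Congress.gov API.
BASE_URL = "https://api.congress.gov/v3"


def bill_url(congress: int, bill_type: str, number: int, api_key: str) -> str:
    """Build the URL for one bill's JSON record (hypothetical helper)."""
    # Query parameters: ask for JSON and pass the caller's API key.
    query = urlencode({"format": "json", "api_key": api_key})
    return f"{BASE_URL}/bill/{congress}/{bill_type.lower()}/{number}?{query}"


if __name__ == "__main__":
    # "DEMO_KEY" is a placeholder; substitute your own key before fetching.
    print(bill_url(117, "HR", 3076, "DEMO_KEY"))
```

The function only assembles the URL. Fetching it (for example with `urllib.request.urlopen`) is left out so the sketch runs without network access.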

Updates by Robert Brammer, LOC:

Government Accountability Office (GAO) Innovations

Andrew Kurtzman, Assistant Director, Innovation Lab:

  • Project Sia: A platform for Congressional Activity Monitoring, which consolidates information sources for easier accessibility and analysis. Specifically, surfaced GAO reports that are relevant to congressional hearings (identified the top 3), also identified emerging areas of congressional interest (especially when there were no matches). Lead: Andrew Kurtzman
  • Ran from September 2020 to 2022. Working to release a new version internally to GAO.
  • Issue: too many information sources – were scraping the committee websites manually. Needed a single information source that covered all committees.
  • Major use cases: what is Congress saying; gauge congressional interest; monitor hearings; help identify areas of emerging interest
  • Future Developments: Working on integrating the Congress.gov API for broader data access. It covers much but not all committee information, with limitations in the scope of information. Interested in other information, such as press releases, which are unavailable on Congress.gov. Hope to parse PDF/XML versions of legislation to identify mandates for GAO to do work.
  • Took 6 data scientists over six months.
  • GAO currently publishes their reports online as PDFs, but may consider publishing them in another format, such as plain text, to be able to conduct searches.
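
The idea of scanning legislation for GAO mandates can be sketched in a few lines. This is an illustration only, not GAO's method: the sample XML is a simplified, made-up fragment, the `section`, `header` and `text` element names are assumptions, and the pattern is a guess at how a mandate might be worded.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Simplified, made-up bill fragment; real bill XML is richer than this.
SAMPLE_BILL = """
<bill>
  <section><header>Short title</header>
    <text>This Act may be cited as the Example Act.</text></section>
  <section><header>Study on data access</header>
    <text>Not later than 1 year after enactment, the Comptroller General
    shall submit to Congress a report on data access.</text></section>
</bill>
"""

# Assumed wording for a mandate: the actor's name followed by "shall".
MANDATE = re.compile(
    r"(Comptroller General|Government Accountability Office)\s+shall",
    re.IGNORECASE,
)


def find_mandates(xml_text: str) -> List[str]:
    """Return the headers of sections whose text appears to direct GAO to act."""
    root = ET.fromstring(xml_text)
    hits = []
    for section in root.iter("section"):
        # Join all text in the section and collapse whitespace before matching.
        body = " ".join("".join(section.itertext()).split())
        if MANDATE.search(body):
            hits.append(section.findtext("header", default="(no header)"))
    return hits


if __name__ == "__main__":
    print(find_mandates(SAMPLE_BILL))
```

A phrase match like this would miss mandates worded differently and would need human review. It is meant only to show why machine-readable bill text makes this kind of search practical.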

Government Publishing Office (GPO) Updates

Lisa LaPlant and Amanda Dunn, GPO:

  • GovInfo Expansion: Added 700 House and Senate hearings from 1946-1982; House reports from the 94th congress serial set (1975-1976)
  • Coming soon: GPO API and USLM 2.0.x: GPO has had a public API for years, and is currently developing a search service API – the ability to perform searches and get reports back in JSON format. Hope to get a sample working this summer. For example, could do bill comparisons and get machine readable search results back.
  • USLM: USLM 2.0.x schema is moving out of draft status. Look for sample amendment files soon.
  • XPUB: Upcoming release of XPUB system – will have congressional bills and public laws. Responsive HTML format for bills is coming soon.

ACCESS TO CONGRESSIONALLY MANDATED REPORTS ACT

Presentation by Amanda Dunn, ACMRA’s project manager

  • ACMRA Guidance recently released; agencies must certify they are complying and provide an agency point of contact
  • Implementation deadline is 180 days after enactment, which fell on June 21, 2023
  • Expect to have the portal live by 12/23/23 and available to the public
  • Reports will be required as PDFs as well as open formats such as XLS and TXT
  • Using GPO’s content management system, askGPO, to manage the receipt of the documents
  • Working on a new CMR collection on govinfo
  • Working to identify early adopters to validate the process
  • Outreach? Working on a table of reports to track agency submissions. Unclear whether GPO will reach out to agencies that do not submit on time. Still considering how outreach would work.
  • API? As we design what metadata we collect, will have info available through GPO’s API
  • A lot of functionality will be possible as we move the suite of publications into an XML-based workflow.
  • Says agency could choose to withhold a report or redact it (note: this is an incorrect answer)

Working Groups and Projects

Congressional Staff Directory – Steve Dwyer:

  • Exploring the creation of a comprehensive legislative branch staff directory, including legislative issue coverage.
  • We view this as a data problem: trying to find or gather data that does not exist
  • Will soon provide a report on implementation of this project
  • Broad vision with an incremental plan: create a modern data graph to make better use of this data for us and (hopefully) for the public as well

Digitization of Congressional Documents – Kimberly Fergusson:

  • Efficiency in Digitization: Collaboration efforts to digitize historical materials while avoiding duplication of work. Digitization of historical materials is labor intensive

Congressional Video Preservation – Arin Shapiro:

  • Video Accessibility: Efforts to standardize the transmittal of congressional proceedings and enhance video accessibility.
  • Have a draft report – hope to finalize it in the next few months
  • Getting close in the Senate to delivering committee URLs to the LC for videos

Legislative Branch XML Technical Working Group – Kirsten:

  • Progress in XML integration for legislative data.

Senate Update – Arin Shapiro:

  • Getting close in the Senate to delivering committee URLs to the LC for videos.
  • There are multiple sources for committee meeting proceedings. Daily Digest will agree to add additional information to their source, which allows us to replace our internal calendar of committee events so we have one source of information. This allows us to get back to one calendar
  • New video: Just released a new player for floor proceedings with enhanced closed captioning capabilities; also updated for committees
  • Closed captioning: Exploring ways to make closed captioning available for all hearings. Trying to make unofficial closed captioning available in near real time. This is not the same as transcripts, but should be available same day. The more modern video feed allows for the separation of text from the video, but requires coding skills to separate the data feeds. No plan for the Senate to make the text available for the feed.

Notable Developments and Discussions

  • Clerk Report (Kirsten Gullickson): Ongoing projects include the LIMS project and the comparative print suite. Hope to release the comparative print suite House-wide soon. Still working on a centralized committee portal, but nothing to report at the moment. Still talking with the Senate about lobbyist disclosures and unique IDs.
  • House Digital Service (Ken Ward): Introduction of a committee deconfliction tool for scheduling. Officially launched to all 32 committees at the end of March. Looking to add caucus events. Floor schedule information is coming from an internal API – note there is no authoritative source for what’s scheduled on the House floor.

Public presentation on committee.report by Daniel at Demand Progress Education Fund

  • Automatically transforms committee reports into ePUB formats
  • Available here

Prior Meetings for which we’ve published a summary

2023: March 2023 CDTF Meeting | June CDTF Meeting | September LC Virtual Public Forum | September Hackathon 5.0 | December CDTF Meeting (scheduled)

2022: December 2022 | September CDTF Meeting | September LC Virtual Public Forum | June CDTF | March BDTF | April Hackathon

2021: July BDTF | September LC Virtual Public Forum

2020: September LC Virtual Public Forum

2019: July BDTF | October BDTF

2018: February 2018 (available upon request) | June LDTC | November BDTF

2017: April BDTF (available upon request) | June BDTF (available upon request) | December Hackathon

2016: May BDTF | June LDTC (and this)

2015: May LDTC | October Hackathon

2014: February BDTF | June LDTC | December BDTF

2013: February BDTF | May LDTC

2012: April LDTC

A Biased Yet Reliable Guide to Sources of Information and Data About Congress

Big Picture

1/ There are big gaps in the data story

2/ Even when there’s data, it may not tell the whole story

  • Info about Congress isn’t entirely reliable, even when it is official, e.g., the Congressional Record (“revise and extend”)
  • Congress historically is a paper-based institution, driven by people with agendas, and it has inconsistent archival practices, e.g. GPO established in 1860, National Archives created in 1934
  • Its institutions are built to solve a particular problem, not work for all time. Plus there’s a lot of turf wars, e.g., the former THOMAS.gov
  • Analyses, even by experts, can be unreliable because of the source data or unexpected actions. See, e.g. CRS report on the number of staff in an office (done by counting phone numbers) or the various supplementals

3/ The people who dogfood the data, such as Josh Tauberer at GovTrack, Derek Willis formerly of ProPublica, and OpenSecrets, are often forced to build more reliability and usability into the data than official sources provide.

4/ This presentation is idiosyncratic and focuses on particular use cases. Major topics include:

  • Federal spending information
  • Oversight and accountability
  • Legislation
  • Congressional committees
  • Information about Congress
  • Money in politics and ethics
  • Other interesting and important stuff

House Publishes More Earmarks Request Data, Which We Enhance

At the end of last week, the House Appropriations Committee published all earmark requests for FY 2024 on the committee’s website, including publishing them as a spreadsheet. This is great and welcome news. For the first time, the appropriations spreadsheet separated member names into different columns and included state, district, party, and recipient address. This makes the information significantly more usable. Thank you.

In fact, it’s so usable, we spent a little time over the weekend making it even more robust. We enhanced their spreadsheet by adding bioguide IDs for each member, appropriations subcommittee codes, a standardized recipient address (with help from ChatGPT), and extracted the recipient state and zip code. We have been playing around with using the AI to categorize whether the recipient entity is a non-profit or a governmental entity. We can imagine a lot of use cases for this cleaned-up data.
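
A minimal sketch of what a state and ZIP extraction step could look like follows. It assumes each address ends with a two-letter state abbreviation followed by a five-digit ZIP code (optionally ZIP+4). The function name and the sample addresses are made up for illustration.

```python
import re
from typing import Optional, Tuple

# Assumes the address ends with "<STATE> <ZIP>" or "<STATE> <ZIP>-<PLUS4>".
STATE_ZIP = re.compile(r"\b([A-Z]{2})\s+(\d{5})(?:-\d{4})?\s*$")


def extract_state_zip(address: str) -> Optional[Tuple[str, str]]:
    """Return (state, five-digit ZIP) from the end of an address, or None."""
    match = STATE_ZIP.search(address.strip())
    if match is None:
        # The address does not follow the assumed pattern; leave it for review.
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    # Made-up sample rows, not taken from the earmarks spreadsheet.
    for row in ["123 Main Street, Springfield, IL 62701", "PO Box 12"]:
        print(row, "->", extract_state_zip(row))
```

Returning `None` for addresses that do not fit the pattern keeps bad guesses out of the data and flags rows that need a manual look.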

The spreadsheet is available online here. We are continuing to tinker with it.


Congressional Data Task Force Meeting Set For June 22, 2023

The next Congressional Data Task Force Meeting is set for June 22, 2023, from 2:00 – 4:00 pm ET.

The meeting will take place in hybrid format. You must register online here, at which point you’ll be prompted to indicate whether you want to attend virtually or in person. If you attend in-person, the meeting will take place in the House Longworth Building, room B-248/B-249.


Notes from the Congressional Hackathon on April 6, 2022

(Everyone is welcome to add edits/comments. Document created by Daniel Schuman at Demand Progress daniel@demandprogress.org)


Recap: Congressional Data Task Force December 2022 Meeting

The newly renamed Congressional Data Task Force met virtually on December 13, 2022. Resources on the event, including a video of the proceedings, slides from the clerk and slides from GPO, are available on the Innovation Hub here.


Congressional Data Task Force Meeting Announced for March 14, 2023

The next Congressional Data Task Force Meeting is scheduled for 2-4 PM ET on Tuesday, March 14, 2023.

To register, use the following link: https://ushr.webex.com/weblink/register/r10370082ab6d3795e3bd3a105f7c717d.


House of Reps Publishes Unofficial Member Data for 118th Congress

In advance of the start of the 118th Congress, the House of Representatives published resources on members of the House on the Clerk’s webpage on December 30, 2022. The resources include:

To download the information, go to the Clerk’s page > Member Information > look to the column on the far right entitled “Additional Resources.” I’ve included a screenshot below.

Screenshot of Member Information Screen from the House Clerk’s website

Advisory Committee on the Records of Congress Meeting Set for December 5, 2022

The Advisory Committee on the Records of Congress announced its semi-annual meeting will be held on December 5, 2022, from 10 a.m. to 12 p.m. ET at the Government Publishing Office. Back in June, we had requested that these meetings include a virtual component, but the notice apparently provides for in-person attendance only and the meetings are not otherwise recorded. We have reached out again to request a virtual option for those who cannot attend in person.


Improving the House Statement of Disbursements: Feedback Requested

The House of Representatives wants to improve how the Statements of Disbursements are published as data and they are asking for your help and input. A summary of how we got to this point is immediately below. Skip to the bottom if you want to share your views on how the Statements of Disbursements should be published, including reviewing a sample data set that contains the House’s proposal as well as a link to where you can provide feedback.
