The Congressional Data Task Force Continues Efforts to Modernize Congressional Tech, Including Itself

The Congressional Data Task Force announced significant legislative branch technology modernization efforts at their second quarter meeting on June 21, 2022, starting with a name change from the Bulk Data Task Force in recognition of the expanded scope of the working group as it goes into its second decade of existence.

We have a full report on what happened at the two-hour meeting below, but here are some highlights:

The public will be able to obtain information about legislative activities via a new Congress.gov API, slated for release in September. The Congressional Data Task Force came into existence to address the public’s need for legislative information in structured data formats, which was addressed through the publication of that information in bulk. The creation of a public-facing API, announced by the Library of Congress, will make it easier for people to access exactly the information they need, which has long been a request from civil society. To test the API now, (1) email lawoutreach@loc.gov; and (2) use that same email address to sign up for the beta API key at api.data.gov. The Library also repeated an announcement first made by the Librarian of Congress when she testified before the Legislative Branch Appropriations Committee that it will hold its third virtual public forum on legislative data on September 21st at 1:30.

The public will soon have improved access to individual Senate amendments, and 6,500 committee prints going back to 1955 are being added, the Government Publishing Office announced. In addition, GPO continues to improve its APIs, which provide access to all sorts of useful documents, by making it easier to download more than 10,000 items at one time. As part of improved access to individual Senate amendments (and many other documents), GPO has created a predictable URL link structure. GPO also reiterated a previous announcement that it will significantly improve how it generates bill text, which will improve how the text is displayed and will include all the metadata you could hope for.

Videos of floor proceedings in the Senate, and hopefully committee proceedings as well, are the focus of a new collaborative project to preserve and improve the public availability of these and other videos, including improved integration with other congressional data, announced by the Secretary of the Senate. In addition, the Senate is working to significantly improve how it transmits legislative information to the Library of Congress, which should improve timeliness and flexibility in how that information is presented. Perhaps this means that information about Senate floor proceedings, which at times is not available prior to a vote, will be published in improved formats and speedily available to all.

The comparative print project, which shows how a draft amendment would change a bill and a draft bill would change the law, will be rolled out House-wide in the near future. In addition, the Clerk of the House has been authorized to completely redo the lobbying disclosure system, which will make it possible to track each lobbyist by their unique identifier. The Clerk has launched an updated eHopper, which is a tool that allows Members to submit legislation electronically. Finally, the Clerk is working to improve information about Members, which will make it easier to have timely access to their committee assignments.

Civil society announced (1) the online publication of code for APIs for the BillMap project, which makes it possible to track the ideas inside legislation across multiple Congresses (the code for the similarity API, the similarity and title API, and the similarity algorithms is available in online repositories); and (2) a new website that makes it possible to find information from Senate committee proceedings over the last few decades (repository here).

The next meeting of the Congressional Data Task Force will take place in August or September. Video of these proceedings and announcements of the next will be available on the Congressional Data Task Force’s Innovation Hub.

Prior meetings for which we’ve published a summary:
2022: March BDTF | April Hackathon
2021: July BDTF | December LC Virtual Public Forum
2020: September LC Virtual Public Forum
2019: July BDTF | October BDTF
2018: February 2018 (available upon request) | June LDTC | November BDTF
2017: April BDTF (available upon request) | June BDTF (available upon request) | December Hackathon
2016: May BDTF | June LDTC (and this)
2015: May LDTC | October Hackathon
2014: February BDTF | June LDTC | December BDTF
2013: February BDTF | May LDTC
2012: April LDTC

=-=-=-=-=-=-=-=-=-=-=-

Welcome from Kirsten Gullickson

  • Bulk Data Task Force now named the Congressional Data Task Force as of June 21, 2022. See letter from the House Admin Committee directing the name change dated June 16, 2022. Change supported by the Clerk of the House and SCOMC recommendation 47.
  • Work goes beyond bulk data.
  • Task force info available at https://usgpo.github.io/innovation

Since the last meeting

  • 4/6/2022 Congressional Hackathon 4.0
  • 4/28/2022 SCOMC hearing on modernizing the legislative process
  • Last week: internapalooza; district caseworker training

Remarks from Yuri Beckelman, House Select Committee on the Modernization of Congress

  • Hearing on Thursday (6/23/22) on Congress and Technology. Focus: improving start-up ecosystem, tackling system challenges and opportunities, and onboarding tech talent + recruitment & retention.
  • Later hearings are set on a customer-friendly Congress and on the future of technology in Congress.
  • Mentions committee mark on leg branch approps bill: $10 million modernization fund. Includes tools to receive feedback from committee members to committee chairs; match legislators with others who share similar interests; congressional directory; simplify co-sponsorship process. Also supports certifications for congressional staff.
  • Member services staffer: Ananda Bhatia. Can send ideas through the website.
  • Legislative branch appropriations bill also establishes a House Intern Office and more funds for interns.

Remarks from Stephen Dwyer, Majority Leader Hoyer’s Office, on the hackathon

  • Still working on a post-hackathon report, hope to release soon.

Presentation by Daniel Schuman on BillMap

Presentation from Lars at the Lincoln Network

  • www.senatecommitteehearings.com
  • Problem statement: no central website for searching across all Senate committee hearings
  • Scraper goes through every committee website and creates a spreadsheet for each individual event
  • Scraper Repository: https://github.com/Leschonander/SenatevideoScraper
  • Contains links to the videos of the proceedings. Uploading all the hearing videos to the Internet Archive to create a central repository
  • Problems Lars faced: (1) repeatedly IP banned from the Senate; (2) committee websites are not standardized; (3) committee table displays are not standardized; (4) no agreement; (5) inconsistent methods to link to testimony; (6) no URL schemas; (7) no consistency across witness names
  • Approximately 2,300 videos online so far.
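
As a rough illustration of problem (7) and of the per-event spreadsheet output described above, here is a minimal sketch of cleaning up witness names before writing rows. It is not taken from the SenatevideoScraper repository; the honorific list, the column names, and both function names are hypothetical.

```python
import csv
import io
import re

# Hypothetical list of leading titles; real committee pages vary far more.
_HONORIFICS = re.compile(
    r"^(the honorable|hon\.|mr\.|mrs\.|ms\.|dr\.)\s+", re.IGNORECASE
)


def normalize_witness(name: str) -> str:
    """Collapse whitespace and strip leading honorifics from a witness name."""
    cleaned = " ".join(name.split())
    # Loop so that stacked titles such as "Hon. Dr." are all removed.
    while True:
        stripped = _HONORIFICS.sub("", cleaned)
        if stripped == cleaned:
            return cleaned
        cleaned = stripped


def events_to_csv(events: list[dict]) -> str:
    """Write one CSV row per hearing; the column names are assumptions."""
    fields = ["committee", "date", "title", "video_url", "witnesses"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for event in events:
        row = dict(event)
        row["witnesses"] = "; ".join(
            normalize_witness(w) for w in event["witnesses"]
        )
        writer.writerow(row)
    return buffer.getvalue()
```

Normalizing names at write time keeps the raw scrape separate from the cleaned output, so the rules can be revised without re-crawling committee sites.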

Conversation:

  • Managing the names that Members of the House and Senate use for themselves is a challenge.

Presentation from Carl Malamud at Public.Resource.Org

  • Focus on state regulations and state codes — working with Cornell’s Legal Information Institute, Justia, and Fastcase, and the Internet Archive — posting the regulations for all 50 states in XML on the Internet Archive, at Cornell, and at Justia.
  • Bring the state regulations into an open source content management system, INDIGO. Allows you to see redlines, to see how the code has changed. Also has an API. Currently 6 states are loaded; goal is to have all 50 by the end of the year.
  • Want edicts of government made more broadly available.

Conversation:

  • Kirsten requests a demo at the next meeting. Carl would be pleased to do a demo.
  • Using INDIGO to have a native USLM version that’s better suited to state regulatory and legislative codes

General Q&A for public presenters:

Q: Question on availability of data for congressional proceedings?

Arin:

  • There’s a predictable pattern to committee video URL. For a committee that is having more than 1 hearing on a calendar day
  • The combined congressional calendar on congress.gov comes from the Daily Digest, maintained within the Office of the Secretary of the Senate. As long as the committees report the meeting to the Daily Digest, it will show up on congress.gov. At times, committees hold hearings or post them on their individual websites without informing the Daily Digest. The Daily Digest is the official source.
  • We are hoping to further expand these hearings both as structured data from the daily digest and also to have other ways to access the videos.
  • As to public resources: use congress.gov combined committee calendar or Senate XML.gov

Kirsten (Clerk): will look at whether it’s possible to create a structured data version of all hearings and markups.

Lisa LaPlant: GPO finds the RSS feeds to be helpful, including the event IDs. (RSS feeds don’t go back in time.)

Q: Lorelei Kelly is interested in how to capture information arising from the public petitioning congress, perhaps at the Internet Archive.

Carl: Anyone can establish an account with the Internet Archive, upload data, and manage the collection.

Presentation from Library of Congress, Congress.gov team: Abby Weiss

  • Didn’t fully absorb this presentation.
  • Legislation text search form: implemented special operators for searching.
  • Updated “presented to president”
  • Improved committee alert emails

Presentation from Robert Brammer, Library of Congress & Congress.gov

  • Planned release of Beta Congress.gov API targeted for September 2022
  • Will contain data available on Congress.gov in machine-readable format
  • REST API presented in hierarchical browse format (not search).
  • Goals: protect congress.gov from scrapers; facilitate access to complete and accurate congressional data (including even more bill data); provide a useful and extensible resource to Congress and the public
  • Have a GitHub repo with Java and XXX sample repo
  • To test it: (1) email lawoutreach@loc.gov; (2) use that same email address to sign up for beta API key at api.data.gov
  • GitHub space will be opened up to the public in the fall.
  • September 21st, at 1:30, another Library of Congress public forum. Hold a listening session to better serve legislative information needs and also to present on latest updates.
  • Can get data results as JSON or XML
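
To make the "hierarchical browse" idea concrete, here is a minimal sketch of how a client might assemble a request URL from path segments and ask for JSON or XML. The base URL, the `format` and `api_key` query parameter names, and the example path are assumptions for illustration, not details confirmed at the meeting; check the Library's documentation once the beta is available.

```python
from urllib.parse import urlencode

# Assumed base URL; not stated in the presentation.
BASE_URL = "https://api.congress.gov/v3"


def build_url(path_segments, api_key, fmt="json"):
    """Build a browse-style URL by walking down a hierarchy of segments.

    The parameter names "format" and "api_key" are assumptions.
    """
    if fmt not in ("json", "xml"):
        raise ValueError("format must be 'json' or 'xml'")
    path = "/".join(str(segment).strip("/") for segment in path_segments)
    query = urlencode({"format": fmt, "api_key": api_key})
    return f"{BASE_URL}/{path}?{query}"


if __name__ == "__main__":
    # Hypothetical hierarchy: collection, then Congress, then bill type.
    print(build_url(["bill", 117, "hr"], "YOUR_API_KEY"))
```

The sketch only builds the URL and makes no network call; the key itself would come from the api.data.gov sign-up described above.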

Presentation from Lisa LaPlant, GPO

  • June 2022 release highlights (deployed this week): (1) link service for individual Senate amendments; (2) improved access to hearing addenda; (3) support for digitized committee prints; (4) API enhancements.
  • Link service: individual Senate amendments are now their own documents in the Congressional Record and can be linked to directly. This release adds a link service for all individual amendments that follows a predictable structure, providing a predictable way to link to certain kinds of documents and legislation.
  • Improved access to hearing addenda: Many recent hearings have had addenda. Standardized the process so that all addenda that are available ahead of the hearing will be packaged with the hearing.
  • New support for digitized committee prints: adding 6,500 prints from House and Senate committees from multiple congresses, going back to 1955.
  • API enhancements: adding “offsetMark” parameter to API requests so that a query can return more than 10k results. Also adding additional fields (including hearing number) for granules.
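
A parameter like “offsetMark” usually signals cursor-style paging: each response hands back a marker that the client passes on the next request, which avoids the limits of numeric offsets. The sketch below shows that general pattern only. The `"*"` starting value and the `results` and `nextOffsetMark` field names are assumptions, and `fetch_page` is a hypothetical callable standing in for the real HTTP request.

```python
from typing import Callable, Iterator, Optional


def iterate_results(
    fetch_page: Callable[[str], dict], start_mark: str = "*"
) -> Iterator[dict]:
    """Yield every item by following cursor marks from page to page.

    fetch_page(mark) is assumed to return a dict with a "results" list and
    a "nextOffsetMark" string (or None on the last page).
    """
    mark: Optional[str] = start_mark
    seen = set()
    # Stop when there is no next mark, or if a mark repeats (loop guard).
    while mark is not None and mark not in seen:
        seen.add(mark)
        page = fetch_page(mark)
        yield from page.get("results", [])
        mark = page.get("nextOffsetMark")
```

Passing the fetch function in as an argument keeps the paging logic testable without network access.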

Presentation from Matt Landgraf, United States Legislative Markup and XPUB

  • USLM: Goal: to ensure the model of USLM XML is interoperable within the legislative ecosystem. Have completed initial modeling for most major bill versions. Now modeling amendments and pre-introduced bills.
  • XPUB: New HTML format for bills and public laws (was demoed at the last meeting). Replaces plain text files. Is optimized for any device. Uses modern, easy to read fonts. Includes metadata in HTML tags for easy re-use by data providers. Also shows text that’s been deleted/inserted, and the metadata describes whether the text has been inserted/deleted. Includes significant amounts of other metadata concerning the legislation as well.
  • Shooting to release the responsive HTML format in the later part of this year. Need to ensure the congress.gov website can handle that format. Will have a more specific timeframe at the next meeting in the fall.

Presentation from Arin Shapiro, Webmaster, Office of the Secretary of the Senate

  • (1) How Senate clerk information is transmitted to the Library of Congress. Replacing the 20-year-old process with a modern, granular approach with greater possibility for timeliness and data formats. Should finish up by the end of this year or the start of next year.
  • (2) New project has just gotten started concerning preservation of floor proceeding videos and hopefully other videos, such as committee meetings. Working to update processes with Senate archivist, National Archives, to review what has been done and to create greater access to these pieces of information and allow for the integration of other information concerning congressional data.

Conversation:

  • Looking to improve generally how video information is preserved and made publicly available.

Presentation from Veneice Smith, Office of the Clerk of the House

  • Presentation on the secure email system recently developed for electronic submission of bills. In April 2021, released eHopper: a web-based solution with rules, built-in logic, for submission of legislation. (ehopper.house.gov) Used human-centered design to guide the submission process. 1/3 of submissions are now coming directly from the eHopper website. (See PopVox discussion from eHopper)
  • Congressional redistricting: working with data partners. Preparing for the change from the 117th to the 118th Congress, especially with respect to the predec

Presentation from Kirsten Gullickson, Office of the Clerk of the House of Representatives

  • Getting ready to release comparative print suite House-wide
  • Getting ready to release an update to LIMS (Legislative Information Management System, which is most of the information about the status of bills).
  • Had issued an RFI for (1) a committee vote database, and (2) a common committee scheduling tool
  • Getting ready to rebuild/rewrite the Lobby Disclosure System in a way that will address unique IDs for lobbyists
  • Will start to rewrite the Member Information System (MIS), which holds information about Members. Will address some of the delay in when the XML file of member data is updated. When a Member certifies a resolution electing a member to a committee, once certified by the Clerk that data can be published on House.gov.

End Q&A

Q: How do you know that information published online is authentic and has not been changed?

A (Lisa LaPlant): You can look at the hash for the file and compare the official SHA-256 hash value against the hash of the copy republished elsewhere.
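
The comparison described in that answer can be sketched with Python's standard library. This is a minimal illustration, not GPO's tooling: the function names are hypothetical, and the published hash value is assumed to be a hex string copied from the official source.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hex):
    """True if the local copy hashes to the officially published value."""
    expected = published_hex.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)
```

If the two values differ by even one character, the republished file is not byte-for-byte identical to the official one.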

Closing comments/ slides

  • Future meetings will alternate between civil society and congressional stakeholders presenting first
  • Will post meeting video and all slides at Innovation Hub https://usgpo.github.io
  • Next meeting will be in August or September from 2-4 PM EST. Likely on a Tuesday or Thursday.
  • Question for Kirsten? Kirsten.Gullickson@mail.house.gov