Recap: Congressional Data Task Force December 2022 Meeting

The newly renamed Congressional Data Task Force met virtually on December 13, 2022. Resources on the event, including a video of the proceedings and slides from the Clerk and from GPO, are available on the Innovation Hub here.

This meeting covered updates on:

  • Publishing information about newly-elected members of Congress
  • The Senate task force on making bill text available prior to votes
  • Modernizing lobbying disclosure data to include unique IDs for lobbyists
  • LIMS modernization
  • Efforts to build a central committee repository, including publishing votes and tracking committee witnesses

In her opening remarks and overview, the Clerk’s Kirsten Gullickson indicated that the Clerk’s quarterly reports are available on GPO’s Innovation Hub website, here.

Change of Congress

The House Clerk’s Veneice Smith discussed “change of Congress” activities, as Congress rolls over from the 117th to the 118th Congress. She explained that Member-elect data may be released by the Clerk late this month, and that official member data will be released on the opening day of the 118th. In addition, committee and subcommittee names will be on the Clerk’s House website; please note that the subcommittee codes may change. Finally, the Clerk’s office will now include in its data model the ability to track, collect, and publish social media accounts.

The Secretary of the Senate’s Arin Shapiro explained that the Senate doesn’t publish preliminary data about members-elect, and that information about the new members would be available on the first day of the new Senate.

Bioguide websites will be updated on opening day.

Statement of Disbursements

The CAO’s Bob Barnett provided an update on efforts to modernize the Statement of Disbursements, including providing a model of what the data would look like. This is a reprise of the presentation in September; scroll down for the CAO’s publication of slides and a sample spreadsheet here. The goal is to have much of this information from the CAO in what’s released in February.

Among the feedback was a request for improved documentation of the SODs. Once the congressional transition is completed, the CAO will package the requests from the public regarding upgrading the SODs and propose them to the committee, making recommendations as appropriate.

There was a useful discussion of the various requests.

  • Crosswalk between every org code and the entity to which it refers: no technical problems with publishing this. A few might be sensitive for security reasons.
  • Crosswalk between the BOC and the type of expenditure to which it refers: shouldn’t be a problem. It’s published internally (will need to check).
  • Crosswalk between vendor IDs and the entities to which they refer: technically doable. There may be vendors that have never received a payment or have not received one for a long time. No obvious issues.
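
To illustrate why these crosswalks matter to data users, here is a minimal sketch of how a published crosswalk could be applied to disbursement rows. Everything in it is an assumption for illustration: the field names (`org_code`, `boc`, `amount`), the sample codes, and the labels are hypothetical and are not taken from the CAO's files.

```python
def apply_crosswalk(rows, crosswalk, code_field, label_field):
    """Return copies of rows with a readable label looked up from a crosswalk.

    Codes missing from the crosswalk are labeled "UNKNOWN" rather than dropped,
    so gaps in the lookup table stay visible to the analyst.
    """
    labeled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        new_row[label_field] = crosswalk.get(row[code_field], "UNKNOWN")
        labeled.append(new_row)
    return labeled


# Hypothetical sample data; real SOD column names and codes may differ.
disbursements = [
    {"org_code": "ORG001", "boc": "2100", "amount": "150.00"},
    {"org_code": "ORG999", "boc": "2600", "amount": "42.50"},
]
org_crosswalk = {"ORG001": "Example Member Office"}
boc_crosswalk = {
    "2100": "Example travel category",
    "2600": "Example supplies category",
}

labeled = apply_crosswalk(disbursements, org_crosswalk, "org_code", "org_name")
labeled = apply_crosswalk(labeled, boc_crosswalk, "boc", "boc_description")
```

Without the lookup tables, the second row's `ORG999` cannot be resolved to an office, which is the gap the requested documentation would close.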

Request to add additional data to the file

  • Vendor IDs for individuals receiving funding: would have to create a separate ID so as not to disclose PII. Is technically doable. No obvious security problems.
  • Provide more clarity on individuals’ titles and roles; is it possible to standardize roles or titles? The offices are autonomous and titles can be a form of compensation. The CAO can’t do this itself, but will include it in the request.
  • Can you differentiate between DC and district staff? The financial systems don’t have this information.
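
On the request for IDs for individuals, one common way to create a separate identifier that does not expose PII is a keyed hash. The sketch below uses HMAC-SHA256 from the Python standard library. It is one possible approach under stated assumptions, not the CAO's method: the function name, the `P-` prefix, the ID length, and the sample inputs are all hypothetical.

```python
import hashlib
import hmac


def pseudonymous_id(internal_identifier: str, secret_key: bytes, length: int = 12) -> str:
    """Derive a stable public ID from an internal identifier using a keyed hash.

    The same input and key always give the same ID, so payments to one person
    can be linked across files, while the ID cannot be reversed or recomputed
    without the secret key. (Hypothetical scheme, for illustration only.)
    """
    normalized = internal_identifier.strip().lower().encode("utf-8")
    digest = hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()
    return "P-" + digest[:length]


# Hypothetical key and identifiers; a real key would be generated and stored securely.
key = b"example-secret-key-not-for-production"
id_a = pseudonymous_id("Jane Q. Example|internal-0001", key)
id_b = pseudonymous_id("John R. Sample|internal-0002", key)
```

Truncating the digest keeps the IDs short but raises the chance of collisions, so a real scheme would need to choose the length with the number of individuals in mind.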

Ken Ward, House Digital Service

Hiring new staff. Looking at the constituent mail data analysis system and at creating a legislative branch directory (with the issue areas on which people work).

Arin Shapiro, Office of the Secretary of the Senate

The Secretary of the Senate is still working with the LC to provide a new means for the Senate to disseminate information to them through the API. They are modernizing the format and delivery mechanisms of Secretary of the Senate information to the LC. The project should wrap up by the end of 2023. There will be no publicly visible result, but it will make matters easier in the background.

The Senate is planning to change the way it provides videos. It is working to consolidate some resources and create a uniform system for the Senate and for end users, leveraging what’s already in place for committee hearings. A technical working group has been meeting since April and is getting ready to make recommendations. Some video information will become available on

Working to make the activities that have occurred on the Senate floor more visible as XML, with links to where you can get more information about what’s happening. This will allow you to look at the data legislative day by legislative day and integrate it into other systems.

Working to improve alerting for when the Senate is in session. Currently using the floor schedule maintained by Senate Library staff, which now has a JSON version. The JSON will tell you where to find the video, including the expected URL.

For floor activity updates, the is unofficial and likely faster, as the floor update activity XML isn’t available until the next day.

The best way to access current committee videos is on the committee websites, which are under the jurisdiction of the committees. Senate committee videos don’t currently make it to, but they’re trying to build the fields and systems to transfer the information. There is a datastream between the Senate and to deliver the hearing information, but the additional fields need to be added. Also, down the road, they hope to have video directly accessible from or NARA.

Government Publishing Office

Currently digitizing the Statutes at Large back to 1789, with all the files digitized as structured data.

Expect to update the API to a v3 data structure on December 20, 2022. This will add links to House roll endpoints in XML and more.

Expect to roll out a new version of the responsive HTML format you see on GovInfo with the 2023 release. It looks good.

For 2023, trying to complete the modeling activity for the remaining bill versions.

A lot of the work on the items in the roadmap is not linear. Much work is going on with respect to committee prints as well as getting the Congressional Record into a better format.

Committee transcripts are on the roadmap to put into USLM as part of the committee documents. Most committees are not publishing unedited transcripts, which can create significant delays.

In 2023, hope to have the API out of beta and to release collections of committee meetings, transcripts, committee prints, House communication requirements, and the bound Congressional Record.

House Clerk & OLC

An excellent demo of the comparative print project, which shows how an amendment changes a bill or a bill changes a law. Two-thirds of the comparative print project has been released: document-to-document comparisons and changes to existing law. Still working on the amendment impact program, the third module, to be released in 2023.

The Clerk was unable to give a definitive answer about making the API, data, or methods behind the comparative print project publicly available.

There are plans to standardize committee vote templates and to offer them to committees.

Prior meetings for which we’ve published a summary:

2022: September CDTF Meeting | September LC Virtual Public Forum | June CDTF | March BDTF | April Hackathon

2021: July BDTF | September LC Virtual Public Forum

2020: September LC Virtual Public Forum

2019: July BDTF | October BDTF

2018: February 2018 (available upon request) | June LDTC | November BDTF

2017: April BDTF (available upon request) | June BDTF (available upon request) | December Hackathon

2016: May BDTF | June LDTC (and this)

2015: May LDTC | October Hackathon

2014: February BDTF | June LDTC | December BDTF

2013: February BDTF | May LDTC

2012: April LDTC