Congressional Data Task Force Meeting on December 19, 2023

The Congressional Data Task Force held its third quarterly meeting on December 19, 2023, in the Longworth House Office Building. The agenda, video, and slides are available here. Next year’s meetings are tentatively scheduled for: March 19, June 6, and December 12, 2024.

Highlights and Key Takeaways

  • The GPO successfully launched the first phase of the Congressionally Mandated Reports website, which publishes on GPO’s website many reports required by law to be provided to Congress. The reports contain a significant amount of metadata, which makes it easy for users to find what they are looking for. A number of significant features, including a list of all reports due to Congress, remain to be built.
  • The CDTF published its recommendations to House appropriators urging the House to adopt a federated data governance framework; to support existing streams for data publishing; to support current and new initiatives in support of data exchanges between the Legislative branch data stakeholders; and to support the cross-organizational efforts embodied in the Congressional Data Task Force.
  • We are still waiting for the outstanding congressional video preservation report, required by appropriators. Meanwhile, the Senate is using a new system provided by the Library of Congress to provide video and accompanying information (think closed captions) to the LC and public. In addition, the LC is working to improve committee data on congress.gov, including incorporating Senate video information in its committee event pages, with the hope for links to Senate video to be available in its API as soon as Q1.
  • The Comparative Print Project is ready from a technological perspective to be shared more widely across the legislative branch, so any remaining considerations on use are not technical (except for authenticated logins). In other words, the technology is mature but there may be other considerations regarding its use.
  • The legislative branch staff directory is awaiting funding so that it can go into an active building phase.
  • House Digital Service is supporting the development of multiple tools: it is building a staff pay dashboard, is ready to pilot its Flagtrack prototype in January, and is preparing development work for casework data sharing.
  • The Senate appears to be stuck regarding publishing bills prior to votes. Specifically, it is still awaiting resolution of policy issues concerning whether and how to make bills, reports, and amendments available online in that chamber and for the public prior to a floor vote.
  • Legislative support agencies are finding ways to explore and make use of AI.
  • It will soon be time for feedback on USLM for some bills and amendments as the modeling is as complete as it can be.
  • The Lobbying Disclosure project is still awaiting resolution of an approach in the Senate.
  • The House’s Committees website project is awaiting funding and is ready to begin work once it is received.
  • CBO has a number of important datasets on GitHub.

Key Recommendations of the CDTF

As part of the convening, the CDTF alluded to Recommendation C of a report about “House Information Websites” that CDTF sent to House appropriators on June 30, 2023. They have made the relevant section of that report available online here. Among its key recommendations:

  • The House should adopt a federated data governance framework
  • The House should adopt strategies that continue to support the existing publishing streams and organizational responsibilities related to House data as it flows to Congress.gov and GovInfo
  • The House should continue to support current and future initiatives, including USLM and improvements in the data exchange between the Clerk and Congress.gov, as well as modernizing several data sources and improvements in the data exchange between the Senate and Congress.gov

In addition, the CDTF wrote the House should support works in progress, cross-organizational efforts, and working groups of the task force. These include:

  • The Congressional Video Preservation and Access Technical Working Group
  • The Legislative Branch XML Technical Working Group
  • The Working Group on Digitization of Congressional Documents
  • The Legislative Branch Data Interchange Working Group
  • The Working Group on a legislative branch-wide Online Staff Directory

The Agenda

The meeting’s agenda was as follows:

  • Welcome
  • Report from GPO on the Congressionally Mandated Reports Collection
  • Panel Discussion on Artificial Intelligence
  • Reports and Updates from the CDTF Working Groups and Legislative Branch Organizations
  • Q&A

Congressionally Mandated Reports – Amanda Dunn from GPO

Amanda Dunn presented on GPO’s new Congressionally Mandated Reports website.

At the end of last year, Congress directed (1) GPO to create a website that contains many of the reports it requires agencies to submit to Congress, and (2) agencies to provide the reports to GPO. There’s also online guidance on how agencies are to submit the reports, including appropriate metadata, as well as a webpage for agencies to submit the reports. As of December 12, 2023, GPO had launched the website with more than 100 reports.

Lots of information is gathered about the reports, including the date they’re submitted, the date they’re required to be submitted, which agency is submitting the report, the committees that receive the report, the topics they cover, and so on. This allows users to sort through the data in many different ways: by date, by committee, by topic, by agency, and so on.

The reports are submitted as PDFs and also in other formats. They can be downloaded in bulk via API. It’s also possible to download a table of all the submitted reports at this website. Looking at the download, there is a minor issue in that some characters (such as apostrophes) are being misrendered. It would also be useful to have a unique agency ID (like a Treasury code ID) and not just the agency name.
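
The apostrophe problem described above looks like a common encoding mismatch, in which UTF-8 text is decoded as Windows-1252. The following is a minimal sketch under that assumption; the actual cause in GPO’s download was not confirmed at the meeting, and the function name `repair_mojibake` and the sample title are hypothetical.

```python
def repair_mojibake(text: str) -> str:
    """Undo one round of UTF-8-read-as-Windows-1252 garbling, if present."""
    try:
        # Re-encode to the bytes the wrong decoder saw, then decode as UTF-8.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not garbled in this particular way; leave it untouched.
        return text


# Hypothetical report title as it might appear in a garbled download.
garbled = "Agency\u00e2\u20ac\u2122s Annual Report"
print(repair_mojibake(garbled))
```

Text that is already correct falls through the `except` branch or round-trips unchanged, so the function can be applied to every cell of a downloaded table without first checking which cells are affected.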

As a consequence of the legislative process, some reports are not required to be published online either because of the originating agency or because of the committee to which it is submitted. I asked a question about what happens when a required report goes to committees A, B, and C, but committee C’s reports aren’t published online. My view is that the report should be published anyway because reports for Committees A and B are public. I did not get a clear answer to that question.

GPO is in the process of answering many questions from agencies and there’s a lot of back and forth.

All in all, the publication of these reports online is a tremendous boon for Congress, agencies, and the public. Many reports that would be available by FOIA are now proactively being reviewed and published online. This will allow committees to see the reports submitted to them, which heretofore has been quite difficult.

While the reports do implicate executive communications, at the moment the Library is not working towards adding the Congressionally Mandated Reports to the executive communication information included on Congress.gov.

Artificial Intelligence Panel Discussion

The House Administration Committee’s Jessica Smith introduced a panel discussion on AI. In her opening remarks, she expressed a hope that Artificial Intelligence is a force multiplier in the Legislative branch. It can help to manage casework and constituent requests; draft communications for public or internal messaging; and help every single person in an office work faster and smarter. AI can help Congress be more effective and efficient. (Note: House Administration Committee’s AI Flash Reports are here.)

Panelists included:

  • Laurie Allen, Chief of the Digital Innovation Division, Library of Congress – see LC Lab’s AI page
  • Mark Hadley, Deputy Director, Congressional Budget Office – see CBO’s transparency page
  • Sam Musa, Chief Information Officer, Government Publishing Office – see GPO’s AI page
  • Steve Dwyer, Senior Advisor, Chief Administrative Officer of the House of Representatives – see HDS’s LinkedIn page
  • Raymond Woeller, the Senate Chief Technology Officer, was invited but was unable to participate due to a scheduling conflict.

I’m not going to reproduce each panelist’s response to the three questions asked, but instead highlight new or interesting information.

What are you doing with respect to emerging technologies and AI?

  • Sam Musa at GPO: Created an AI policy, governance committee, and are working on an AI strategy.
  • Laurie Allen at LC: The Library has been experimenting with AI for 4 years. Currently 3 formal experiments: on copyright; ML for catalog generation; and for subject classification and bill summaries for Congress.gov. Established an AI working group that reports to the technology strategy board.
  • Steve Dwyer at CAO: Used our existing cloud capabilities to let staff try the various LLM tools, obtaining authorization for use of ChatGPT Plus. Established a working group with more than 200 staffers; more than 100 offices have signed up for licenses and half have provided feedback. Offices find ChatGPT surprisingly useful. Also pushing forward with a use case inventory, completed a governance assessment, and moved the work into more formal policies.
  • Mark Hadley at CBO: Thought about governance; set up a process to evaluate different technologies available to help CBO, including a working group. We are still at the experimentation stage of what could be useful for our work. For example: we need help with coding, and are very excited about GitHub Copilot.

What are some of the most significant opportunities presented by AI?

  • Laurie Allen at LC: Significant opportunities: (1) to direct support to Congress, such as working with bigger data sources, analysis across varying fields. (2) Public access to information about Congress, beyond legislative information to all info held by the LC – provide insight into legislation and everything else. (3) There’s the potential to model trustworthy provision of authentic information in transformative ways – with a focus on making sure it’s built on authentic information.
  • Sam Musa at GPO: The ability to process large amounts of data quickly, including in different sources and formats; and to find patterns and trends and to correlate info. For example, GPO’s Publicly Identifiable Information Tool has an AI component to check documents for PII. AI can also help automation to streamline team tasks & manage information flow.
  • Mark Hadley at CBO: AI provides a significant opportunity to boost the productivity of analysts and to answer questions from Congress more quickly. It is particularly helpful for writing code – especially in dealing with legacy code, and translating code from one language to another. Another significant use is to help with accessibility for persons with disabilities, such as accurately describing the contents of a picture or a table. It can also help with translation into other languages, although there are concerns about the accuracy and security of using AI.
  • Steve Dwyer at CAO: Increases efficiency across government, especially in addressing low-level and redundant work. Excited about the possibility of LLMs trained on specialized datasets, such as the documents that relate to a Member of Congress or a committee. Can also help with information overload when LLMs are trained on a dataset covering a particular set of issues.

How do you ensure transparency and authenticity in the use of AI and other emerging technologies? What should the CDTF be considering around cross-organizational collaboration on matters like data governance, shared technologies, and upskilling?

  • Steve Dwyer at CAO: There are inherent problems with LLMs on bias, so the government must be transparent in how it uses AI and discloses that it is doing so. It has a great potential for training people – for upskilling – to help teach about new matters. On data governance, improving AI is dependent upon more open access to data sources to build these tools.
  • Sam Musa at GPO: Need clear ethical guidelines and standards for use of AI and to develop a regulatory framework that addresses issues on data privacy, security, accountability, and transparency. Data sources must be valid and under positive control. List of GPO’s use cases at gpo.gov/AI.
  • Laurie Allen at LC: We created a use case inventory. At LC Labs, we are working to document experiments and document quality standards and benchmarks. Which models tend to work well? What standards are we using? We can imagine building benchmark data sets to test models more consistently.
  • Mark Hadley at CBO: We have to be as transparent as we can possibly be; I’m all in on transparency. We need as much rigor in our products as possible. AI’s greatest role is at the start of our work; our rigorous fact-checking process then makes sure the result is accurate and reliable.

Closing thoughts?

  • Mark Hadley at CBO: AI fundamentally changes career development. People who use AI will be used to reviewing the work of another intelligence, which means they’ll be used to it when they become management. Also, it allows us to take away the routine tasks so staff can focus on the exciting work.
  • Laurie Allen at LC: Partnership is key, we need to learn from each other. The LC has a worksheet on how to think about various AI use cases.
  • Sam Musa at GPO: AI brings many risks and we need to keep the risk at an acceptable level.
  • Steve Dwyer at CAO: It’s surprising how quickly everyone is moving. Technology has great potential.

WORKING GROUP REPORTS

Legislative Branch XML Working Group (Matt Landgraff at GPO) – update on USLM

The modeling for some bills and amendments has gone as far as it can, so the next step is to put up samples on GPO’s bulk repository to test it out. Look for a new version of the USLM schema.

GPO plans to take a fresh look at the USLM roadmap early in the new year to make enhancements and updates. (Note GPO’s October 13, 2023 report on USLM is online here.)

Legislative Branch Staff Directory – Steve Dwyer at CAO

Report on the staff directory was submitted in June. The working group meets every other week with Senate partners to advance the project; it has also met with all the major legislative branch agencies about obtaining appropriate data.

The next step is to obtain funding to build out the work.

The hardest part of the effort is the collection and organization of the data. It will be challenging to have real time data about staff. Once the project has been provided funding, they are ready to go into a more active phase.

The door is open to anyone inside or outside Congress who has worked on staff directories or staffer data and has advice about how to do this best.

Congressional Video Preservation – Arin Shapiro, Senate Sergeant at Arms

The Congressional Video Preservation working group has been working for the last two years with the House, LC, and NARA on how to provide video of Senate proceedings to the LC and NARA and to provide public access.

The group is in the final phases of the report, which they hope to have completed in March. (Note that they said they had hoped to have the report in the update from last June.) The recommendations are leaning towards keeping all the current methods of submitting data and then augmenting the data availability on Congress.gov – including connecting video of proceedings to event pages on Congress.gov.

Starting tomorrow the Senate will be using a new system provided by the Library of Congress to provide video and information to the LC. This mechanism is also allowing for the development of a file-based workflow to transfer videos and metadata. (It’s unclear to me what the previous transfer mechanism was, although at one point it was DVDs.)

The Working Group on Digitization of Congressional Documents

No report. Will have an update in March.

The Legislative Branch Data Interchange Working Group – Kimberly Ferguson, CRS

No report but a quick update. We are replacing a 40-year-old data exchange mode.

Comments are welcome at the Library of Congress’s API GitHub page. As a reminder, the Library of Congress API went official this past September. If you want to know what’s changed, look at the changelog.

The Congress.gov API is beginning to share and distribute committee meeting information: not just previously published transcripts, but also information about what’s happening in the future, integrating meeting pages with pre-submitted documents. This will allow for marrying committee videos with meeting information and making it available through the API.

REPORTS FROM CONGRESSIONAL OFFICES

Secretary of the Senate – Arin Shapiro

The Senate is rolling out a new architecture for senate.gov that combines reference and art & history. (It may be at this webpage.)

The Senate is nearing completion of a legislative exchange effort to provide data to the Library of Congress in a new format. The old format was nearly 40 years old.

Clerk of the House – Andy Doyle, Director of Legislative Applications

The Comparative Print Project has completed the integration work to allow other legislative branch organizations to use the tool. At some point, it is anticipated to provide authenticated logins across the legislative branch using existing infrastructure to authenticate people. (No information provided on whether stakeholders in the Senate would be able to use the tool). (The Clerk’s October 13, 2023 report on the comparative print project is here.)

Lobbying disclosure project is focused on modernizing disclosures to include unique IDs for lobbyists, which will address duplicate accounts. (This has been ongoing for some time). The House is working closely with the Senate and the foundation of the project will be led by the Senate. The House-specific requirements will be built on top of the Senate’s system. (The October 13, 2023 report on the lobbying disclosure project is here.)
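
To illustrate why unique IDs address duplicate accounts, here is a minimal sketch using made-up filings. The field names (`lobbyist_id`, `name`, `client`) and the ID format are assumptions for illustration only and do not reflect the actual House or Senate disclosure schema.

```python
from collections import defaultdict

# Hypothetical filings: the same person appears under two name spellings.
filings = [
    {"lobbyist_id": "L-001", "name": "Jane Q. Doe", "client": "Acme"},
    {"lobbyist_id": "L-001", "name": "Jane Doe", "client": "Widgets Inc"},
    {"lobbyist_id": "L-002", "name": "John Roe", "client": "Acme"},
]


def group_clients(records, key):
    """Collect each record's client under the value of the chosen key."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record[key]].append(record["client"])
    return dict(grouped)


by_name = group_clients(filings, "name")        # splits one person in two
by_id = group_clients(filings, "lobbyist_id")   # keeps one entry per person
print(len(by_name), "name entries vs.", len(by_id), "ID entries")
```

Grouping by name treats the two spellings as different people, while grouping by the ID merges them into a single record with both clients.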

Made requests for funding to (1) fund the committee work project, including modernizing vote data and referral management, and looking forward to starting some of the work in the upcoming year; (2) modernize legislative drafting tools, especially collaborative drafting. Both of these projects were requested by the Modernization Committee. (The October 13, 2023 report on modernizing committee vote data is here.)

LIMS: modernization is ongoing; delivering capabilities to the LRC and the bill clerks.

LC Data Exchange: Modernizing the data exchange with the Library (just like the Senate). With a few minor details, everything now is in modern formats for daily feeds. The House is working to publish a more detailed map of what information is being exchanged.

House Digital Service – Ken Ward, CAO

Calendar deconflict: HDS previously presented on its deconflict project, which showed when members have overlapping committee meetings, including when hearings have not yet been publicly announced. Building a new feature that shows when they have conflicting meetings for those proceedings that are publicly announced and will provide a notification to the users. This will be shipped in 2024 to staff and will allow them to subscribe to committee feeds in the MS calendar.
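
The core check behind a deconflict tool of this kind can be sketched as an interval-overlap test. This is an illustrative sketch only, not HDS’s implementation; the meeting names, times, and the `find_conflicts` function are hypothetical.

```python
from datetime import datetime
from itertools import combinations


def find_conflicts(meetings):
    """Return pairs of meeting names whose time ranges overlap."""
    conflicts = []
    for a, b in combinations(meetings, 2):
        # Two intervals overlap when each one starts before the other ends.
        if a["start"] < b["end"] and b["start"] < a["end"]:
            conflicts.append((a["name"], b["name"]))
    return conflicts


# Hypothetical calendar for one member on a single day.
member_calendar = [
    {"name": "Committee A hearing",
     "start": datetime(2024, 3, 19, 10, 0), "end": datetime(2024, 3, 19, 12, 0)},
    {"name": "Committee B markup",
     "start": datetime(2024, 3, 19, 11, 30), "end": datetime(2024, 3, 19, 13, 0)},
    {"name": "Committee C hearing",
     "start": datetime(2024, 3, 19, 14, 0), "end": datetime(2024, 3, 19, 15, 0)},
]

print(find_conflicts(member_calendar))
```

Comparing every pair is quadratic in the number of meetings, which is fine for one member’s daily calendar; a larger dataset would more likely sort meetings by start time first.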

Staff salary info: providing support to payroll and benefits team at CAO to provide summary staffer salary information. This would make it possible for employers to understand typical staff pay and benefits broken out by position, duration, etc. HDS is building the dashboard. This will be available via FinMart, an internal financial reporting tool.

Flagtrack: Getting ready to pilot this prototype. Have worked with various offices throughout the legislative branch (from supply to AOC to HIR) to build a tracker of where flags are in the request process. Should be available as soon as January.

Hackathon: With support from CAO, can expect the congressional hackathon to occur on a more regular schedule.

Casework: Doing lots of user research with CMS vendors to get samples of casework data. The purpose is to provide a view into aggregated casework data. Getting ready to start the development work in early 2024.

Congressional Budget Office – Kevin Perese, Senior Advisor, Data and Transparency

Kevin Perese of CBO gave an excellent presentation on the datasets CBO is publishing on its website. CBO has an internal policy on releasing code (section 7.17) – releasing data and computer code is part of its transparency efforts. The slides from his presentation are here. CBO started publishing repositories on GitHub in 2020.

The four repositories are:

1. Eval-projections: centralized data for baseline projections – CSV files ready for machine access – look at accuracy of CBO budget projections, outlays, past revenue, etc. The code replicates CBO’s analysis published in four major reports available on its website

  • The Accuracy of CBO’s Budget Projections for Fiscal Year 2023
  • An Evaluation of CBO’s Projections of Outlays From 1984 to 2021
  • An Evaluation of CBO’s Past Revenue Projections
  • An Evaluation of CBO’s Past Deficit and Debt Projections

2. Captax: large-scale model that produces CBO’s estimates of effective marginal tax rates on capital from new investments – as a CSV

3. Means_tested_transfers_imputations: data for household income analyses

4. Debtwelfare: overlapping generations model – how government debt shifts risk across generations (Python)

Questions and Answers

~ Constitutional Authority Statement ~

Q: Can Congress.gov link to the constitutional authority statements?
A: It already does.

~ Index ~

Q: Bill history from Congress record cites the index, but I don’t see bill numbers in the index list. How would you find a bill in the index list?
A: It’s unclear what the questioner is referring to. Perhaps it’s actually referring to hyperlinks to the bills in the actions tab for legislation.

~ Senate Bills and Amendments Online Prior to Floor Votes ~

Q: In December 2021 a coalition of organizations wrote to the Senate asking it to make legislation, amendments, and committee reports available online prior to a final vote on the floor. At times, legislation will show up on Congress.gov days or weeks after it has passed the Senate. (The Senate’s internal system, known as ATS, doesn’t make the full text of amendments available to staff, but has a page limit, and only publishes that information as a scanned file.) What is the status of the Senate’s efforts to create better bill, amendment, and report text availability prior to votes?
A: (Arin): We are trying to find the right way to thread this needle. Both political parties want this, but there are a lot of considerations concerning the processes that currently exist and why they exist. There may be good reasons why the processes are what they are, but we want to take advantage of advancements in technology. We will come up with a solution to address availability but there’s no timeline by which we will do so.

~ Updating committee info on Congress.gov ~

Q: Timeline for improvements for committee information on Congress.gov?
A: (Kimberly): We are working in real time towards improving committee data on congress.gov, covering videos, transcripts, documents, etc. Right now we have developers who are trying to figure out how to display Senate committee video on congress.gov. We may see URLs in the API for congress.gov for Senate committee video in Q1. Further out – perhaps in 2024, but it could be much later – there will be data files for video closed captioning. We are currently in an exploratory phase and the rate of action is unpredictable.

In addition, we are looking to add historical committee information. The updated data exchanges have committee codes that were invented by House and Senate staff starting in 1973. GPO and LC are working on historical digitization projects. We are creating/updating historical committee authority records. These authority records have already been created by Library cataloguers, but they need to be moved into the data exchanges.

CLOSING REMARKS

Dates for next year:

March 19, June 6, December 12

A number of leg branch entities are hiring. Look to the websites:

House: https://www.house.gov/employment/positions-with-other-house-organizations

Senate: https://www.senate.gov/visiting/employment.htm

  • Secretary of the Senate: https://saa.csod.com/ux/ats/careersite/1/home?c=saa&cfdd[0][id]=2&cfdd[0][options][0]=3
  • SAA: https://saa.csod.com/ux/ats/careersite/1/home?c=saa

GPO: https://www.usajobs.gov/Search/?a=LP00

LOC: https://loc.gov/careers/

Prior Meetings for which we’ve published a summary

2023: March 2023 CDTF Meeting | June CDTF Meeting | September LC Virtual Public Forum | September Hackathon 5.0 | December CDTF Meeting

2022: December 2022 | September CDTF Meeting | September LC Virtual Public Forum | June CDTF | March BDTF | April Hackathon

2021: July BDTF | September LC Virtual Public Forum

2020: September LC Virtual Public Forum

2019: July BDTF | October BDTF

2018: February 2018 (available upon request) | June LDTC | November BDTF

2017: April BDTF (available upon request) | June BDTF (available upon request) | December Hackathon

2016: May BDTF | June LDTC (and this)

2015: May LDTC | October Hackathon

2014: February BDTF | June LDTC | December BDTF

2013: February BDTF | May LDTC

2012: April LDTC