Recap of the September 2022 Congressional Data Task Force Meeting

The Congressional Data Task Force provided a series of legislative branch technology updates at its third quarter meeting on September 29, 2022, including a CAO presentation on adding metadata to Statements of Disbursement, a recap of the Library of Congress Virtual Public Forum, updates on the comparative print project and E-Hopper, updates on how the Congress.gov API beta is handling committee codes, and more. 

A full report on what happened at the two-hour meeting follows; here are some highlights:

The Chief Administrative Officer (CAO) announced it is adding metadata to Statements of Disbursement (SODs) to make the files easier to analyze. The CAO is obligated to produce these statements 60 days after the end of each quarter. The statements are published as human-readable PDFs (and, for the past six years, as CSV and Excel files), but the Committee on House Administration (CHA) asked the CAO to enhance the data so that third parties can more easily download the spreadsheets and analyze them with automated tools. New columns will provide unique identifiers for five existing columns: organization, program, sort subtotal description, description, and vendor name.

The Library of Congress recapped its recent Virtual Public Forum (which we wrote about here), including the announcement about the new Congress.gov API, several Congress.gov enhancements, and panels on legislative data standards. 

The House Clerk’s office provided a helpful update on its comparative print project, E-Hopper, and the Legislative Information Management System (LIMS). The E-Hopper application, which began in 2019 right before the pandemic, has replaced the physical hopper as the primary way for staffers to submit legislative documents. This shift moved the burden from Member offices to bill clerks. The old system required bills to be printed one at a time; the new solution allows bulk printing, which is saving immense amounts of time and resources in the Clerk’s office.

There was a thorough discussion of how the Congress.gov API handles committee codes, which are essential to understanding committee makeup and information over time. Congressional committees, especially standing committees, have had different names over time (think of the Committee on Resources versus the Committee on Natural Resources). Committee codes resolve these conflicts and distinguish committees from one another over time. From the House perspective, if the House Rules don’t change the standing committee names, the House Clerk will use the same codes for the same committees. However, the Clerk will not issue subcommittee codes until the committees have held their organizational meetings and voted on their committee rules, which sometimes doesn’t happen until May or later.

The next meeting of the Congressional Data Task Force will take place on Tuesday, December 13, 2022. Video of these proceedings and an announcement of the next meeting will be available on the Congressional Data Task Force’s Innovation Hub.

Prior meetings for which we’ve published a summary:
2022: September LC Virtual Public Forum | June CDTF | March BDTF | April Hackathon
2021: July BDTF | September LC Virtual Public Forum
2020: September LC Virtual Public Forum
2019: July BDTF | October BDTF
2018: February 2018 (available upon request) | June LDTC | November BDTF
2017: April BDTF (available upon request) | June BDTF (available upon request) | December Hackathon
2016: May BDTF | June LDTC (and this)
2015: May LDTC | October Hackathon
2014: February BDTF | June LDTC | December BDTF
2013: February BDTF | May LDTC
2012: April LDTC

=-=-=-=-=-=-=-=-=-=-=-

Welcome from Kristen Gullickson

  • Review of who is part of the task force.
  • Adding a digital layer to a centuries-old paper process.
  • Being online means more than being electronic. The goal is also to provide structured data in a dynamic way.
    • Three characteristics that documents share
      • Presentation: how the content looks
      • Structure: how the content is organized
      • Semantics: what the content means
  • Continued excitement about the release of the Congress.gov API.
  • Upcoming release of responsive HTML for bill text. 
  • Work being done regarding USLM and data modeling. 
  • Task force info available at https://usgpo.github.io/innovation

Since Last Meeting

  • 9/21/22 virtual forum on Congress.gov
    • Release of Congress.gov API
    • Congress.gov is ten years old
  • Introduction of H.Res. 1331, the latest Modernization Committee resolution

Kimberly Ferguson – Congress.gov

  • Congress.gov virtual public forum. 
  • Video will be available very soon. Will be shared via Congress.gov notification email.
  • Three-hour Library of Congress Virtual Forum
    • Top ten Congress.gov enhancement list
      • Congress.gov releases enhancements on a three-week cycle.
    • Congress.gov API beta release.
      • Decades in the making. 
      • The majority of communications will come through the LC GitHub workspace.
    • Q – Do amendments include House and Senate amendments?
      • A – It is important to understand the difference between metadata and the full text of materials. Right now the amendments data include metadata for House and Senate floor amendments: actions, descriptions, sponsors, etc. Amendments happen in committees as well, but those are not included yet. Work is underway to release full text for Senate amendments through the amendments endpoint.
    • Two data partner panels
      • 1st: House, Senate, GPO, Law Library projects and Q&A
      • 2nd: Legislative Data Standards and USLM
    • Updates from Constitution Annotated and CBO. 

Clerk of the House – Andrew Doyle and Kristen Gullickson

  • Product updates
    • Comparative print suite: produces a track-changes-style comparison. Working toward getting this out House-wide by the end of the year.
    • E-Hopper and LIMS (Legislative Information Management System) – LIMS is the primary way to provide data to data partners. It was developed in the 1980s and is being moved to modern cloud infrastructure.
      • Andrew Doyle – LIMS is critical to the legislative process; the majority of the data originates from LIMS. We are in the middle of a large modernization effort to deliver a modern user experience. The E-Hopper application, which began in 2019, replaces the physical hopper. Both provide applications for staffers to submit legislative documents and tools for bill clerks to accept and process those submissions. Legislative documents are born digital and are incorporated into all the products and platforms.
        • Kristen – in terms of volume, can you remember how many cosponsors we have?
          • Andy – Gets into six figures. Close to 100,000. Huge workload for bill clerks. Vast majority of bills are submitted electronically. 
        • Kristen – When we brought this on at the start of the pandemic, we shifted the burden from Member offices to bill clerks. Our bill clerks print it all out. The older solution printed one bill at a time; the new solution allows bulk printing, which saves so much time. How much time are we saving?
          • Andy – Staff are saving immense amounts of time. 
        • Kristen – Question around electronic cosponsors.
          • Andy – Entry isn’t electronic yet, but we are looking into ways to support that. Right now just the document is electronic. 
  • Congressional Redistricting
    • Gain one
      • Five – CO, FL, MT, NC, OR
    • Gain two
      • One – TX
    • Stay the same
      • 37
    • Lose one
      • Seven – CA, IL, MI, NY, OH, PA, WV
  • Unique Identifiers
    • Members – the House uses Bioguide IDs (e.g., A000370)
    • House committees/subcommittees – committee codes (e.g., AG00)
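To make the two identifier formats above concrete, here is a minimal Python sketch that checks values shaped like the examples A000370 and AG00. The patterns are assumptions inferred from those two examples, not a published specification, and treating a trailing "00" as the full committee is likewise an assumption for illustration.

```python
import re

# Assumed formats inferred from the examples above (A000370, AG00):
# a Bioguide ID is one uppercase letter followed by six digits, and a
# House committee code is two uppercase letters followed by two digits.
BIOGUIDE_RE = re.compile(r"[A-Z]\d{6}")
HOUSE_COMMITTEE_RE = re.compile(r"([A-Z]{2})(\d{2})")


def is_bioguide_id(value: str) -> bool:
    """True if the whole string matches the assumed Bioguide ID shape."""
    return BIOGUIDE_RE.fullmatch(value) is not None


def parse_house_committee_code(value: str):
    """Split a code like AG00 into (prefix, is_full_committee), or None if malformed.

    Assumption: "00" denotes the full committee; other digits denote subcommittees.
    """
    m = HOUSE_COMMITTEE_RE.fullmatch(value)
    if m is None:
        return None
    return m.group(1), m.group(2) == "00"


print(is_bioguide_id("A000370"))           # True
print(parse_house_committee_code("AG00"))  # ('AG', True)
```

Keying records on identifiers like these, rather than on names, is what lets data partners join Member and committee data reliably across Congresses.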

Kimberly Ferguson – How the Congress.gov API is handling committee codes

  • Committee codes are important. The same committee may have had different names over time, so codes are needed to resolve the conflicts and differentiate committees.
  • With the Congress.gov API available, the committee authority data managed by Congress.gov is now available to everyone.
  • Also look at the committee name history section on Congress.gov and the committee endpoint section on GitHub. Communicate with us about the API through GitHub.
    • Kristen – From the House perspective, if the House Rules Cmte doesn’t change the standing committee names, we will use the same codes for the same committees. However, we will not issue subcommittee codes until the committees have held their organizational meetings and voted on their rules.
    • It takes the Members a while to come up with their new plan and agree on who sits on which committee, especially in years when the majority changes. You will see patterns of this in the committee history section of Congress.gov. Subcommittees are often not continuing bodies and are used only in a particular Congress. Sometimes a particular subcommittee is not named until as late as June.
    • Arin Shapiro – I support everything that has been said. It takes a while on the Senate side as well, because an organizing resolution must be passed. Staff are always laser-focused on these systems so that the information can be published as soon as it is available.
  • Q – Is it possible to find all House and Senate committee names in one place? Are they easy to find?
    • Kimberly – We have the names on the committee history page, but if you want to see everything in one place, you need to come to the API.
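Because records are keyed by committee code rather than by name, a renamed committee's history stays linked. The minimal Python sketch below shows the idea using a hypothetical code (XX00) and Congress ranges that are illustrative only; authoritative codes and dates should come from the committee name history on Congress.gov or the API's committee endpoint.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NameSpan:
    first_congress: int
    last_congress: Optional[int]  # None means the name is still in use
    name: str


# Hypothetical committee code and illustrative Congress ranges; verify
# real codes and dates against the committee name history on Congress.gov.
HISTORY = {
    "XX00": [
        NameSpan(104, 109, "Committee on Resources"),
        NameSpan(110, None, "Committee on Natural Resources"),
    ],
}


def name_in_congress(code: str, congress: int) -> Optional[str]:
    """Return the name a committee code carried in a given Congress, or None if unknown."""
    for span in HISTORY.get(code, []):
        last = span.last_congress if span.last_congress is not None else congress
        if span.first_congress <= congress <= last:
            return span.name
    return None


# Activity stored under the code stays linked across the rename.
print(name_in_congress("XX00", 108))  # Committee on Resources
print(name_in_congress("XX00", 117))  # Committee on Natural Resources
```

The same pattern explains the House practice described above: as long as a standing committee keeps its code, only the name spans need to change when the Rules change a name.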

Statement of Disbursement (SOD) Presentation – Bob Barrett of the CAO

  • Adding metadata to make the files easier to analyze
  • Obligated to produce these 60 days after the end of the quarter, and we use all of this time. Currently in PDF, which is human readable. Six years ago, we also began producing them as CSV files and Excel spreadsheets.
  • CHA asked us to enhance the data. The primary purpose is to make it easier for third parties to download the spreadsheets and perform analysis using automated tools.
  • Third parties have reported difficulty performing operations that require a “group by” ability or referential integrity. 
  • The SOD fields in question are text fields, which can and do include spaces and punctuation. Adding corresponding “ID fields” with unique values will enable third parties to perform automated operations with greater confidence.
  • New columns will be added to the SOD DETAIL TRANSACTIONS file. The new columns will provide unique identifiers for the following five columns in the current files:
    • Organization – This field is a combination of the funding year and the Office code / DeptID.
      • The funding year is LY for MCL offices and FY for all other offices.
    • Program – This field represents the Program Code in the PeopleSoft Chart of Accounts.
    • Sort Subtotal Description – This field is the high-level category of the BOC.
    • Description – This is the BOC.
    • Vendor Name – This is pulled from the vendor address. Vendors can have multiple addresses.
  • Can’t provide unique identifiers for everything. 
    • Employees – provide information.
    • Transfers – payments within the House.
    • Citibank – Provides purchase and travel cards. All Citibank transactions will have the ID. There is no vendor ID for the merchant, only a description.
    • Masked employee payments
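The “group by” difficulty described above can be sketched in a few lines of Python. The column names (vendor_name, vendor_id, amount) and all values are hypothetical, not the CAO's actual file layout; the point is that free-text names with spacing and punctuation variants split one vendor into several groups, while a stable ID keeps its rows together.

```python
from collections import defaultdict

# Hypothetical SOD-style detail rows; the column names and values are
# illustrative only, not the CAO's actual file layout or data.
rows = [
    {"vendor_name": "ACME, INC.", "vendor_id": "V001", "amount": 120.00},
    {"vendor_name": "Acme Inc", "vendor_id": "V001", "amount": 80.00},
    {"vendor_name": "ACME INC ", "vendor_id": "V001", "amount": 50.00},
    {"vendor_name": "Beta Supply Co.", "vendor_id": "V002", "amount": 30.00},
]


def total_by(rows, key):
    """Sum the amount column for each distinct value of `key`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["amount"]
    return dict(totals)


by_name = total_by(rows, "vendor_name")  # text variants split one vendor into 3 groups
by_id = total_by(rows, "vendor_id")      # the ID groups all of its rows together

print(len(by_name), len(by_id))  # 4 2
print(by_id)                     # {'V001': 250.0, 'V002': 30.0}
```

The same reasoning applies to referential integrity: a join on an ID column matches exactly, whereas a join on a name column silently misses rows whose text differs by a comma or trailing space.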

Question and Answer:

  • Q – Can you tell us more about the LY v. FY? Member offices and committees are on FY while support offices are on LY. 
    • A – LY corresponds with the legislative session, starting on Jan. 1 or 3. Next week will be FY 23, while it will still be LY 22.
  • Q – How do CRs affect the data?
    • A – These are disbursements and we are required to report it. If we have a shutdown, we identify essential personnel. 
  • Q – By not having a unique employee ID, it may be difficult to tell if someone is the same person or not. Would they ever change this and assign these unique IDs?
    • A – I don’t know, but I can ask. We can’t use the code in the system because that would expose the employees, but maybe something else can be used.
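The LY/FY distinction in the answer above can be sketched as two small Python functions. The federal fiscal year runs from October 1 through September 30; the January 3 start of the legislative year is an assumption based on the “Jan 1 or 3” remark and is exposed as a parameter.

```python
from datetime import date


def fiscal_year(d: date) -> int:
    """Federal fiscal year: FY N runs from Oct. 1 of year N-1 through Sept. 30 of year N."""
    return d.year + 1 if d.month >= 10 else d.year


def legislative_year(d: date, start_day: int = 3) -> int:
    """Legislative year, assuming it begins on January `start_day` (default: the 3rd);
    earlier January dates count toward the prior legislative year."""
    if d.month == 1 and d.day < start_day:
        return d.year - 1
    return d.year


week_after_meeting = date(2022, 10, 6)  # one week after the September 29, 2022 meeting
fy = fiscal_year(week_after_meeting)
ly = legislative_year(week_after_meeting)
print(f"FY {fy % 100:02d}, LY {ly % 100:02d}")  # FY 23, LY 22
```

This reproduces the point made in the answer: the week after the meeting falls in FY 23 but still in LY 22.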

Senate Congressional Video – Arin Shapiro, Office of the Secretary of the Senate

  • Two brief updates that we are working on. 
  • Congressional video project
    • In the technical working group phase, with participants from the Senate, House, GPO, LC, and Archives. The goal is to establish a new means of providing high-quality video to LC and Archives for everyone, with better access for Senate Members and staff as well. Getting material up in a timely manner poses many challenges. Rich metadata needs to be available to build a robust archive. This is an ongoing project that will not be finished by the end of the year; findings will be reported back to the larger group.
  • Modernizing data exchange
    • Project with LC, SAA, Senate Clerks
    • Machine- and human-readable data in XML. Files will be updated daily, section by section, to make the data more sophisticated, to account for things such as the removal of information, and to allow information to be delivered in a more timely and accurate manner.

USLM – Lisa LaPlant of GPO

  • Updated text styling and logo
  • Interim Resume of Congressional Activity added to the link service for the Resume of Congressional Activity
  • Access to Senate introductory statements in the Congressional Record
    • Used to be grouped together and separated by dashes. Now divided individually and easier to find and access.
  • Updated related documents and API linking bills to the Congressional Record.
    • Slides can be found on GPO Github
  • Related documents API from Congressional Record to bills and public laws
  • YouTube video URLs in MODS metadata for a subset of congressional hearings.
  • Digitized statutes at large and USLM conversion
    • GPO awarded a multiyear contract to digitize the remaining Statutes at Large volumes and create USLM XML files for the volumes.
    • Digitize 110 volumes from 1789 to 1950, including creation of PDF and USLM XML files.
    • Convert the digitized Statutes at Large volumes from 1951 to 2022 that are currently available on GovInfo to USLM XML.
  • GPO and the Depository Library Council are hosting the largest annual (virtual) gathering of Federal depository librarians and colleagues in the country. Three days of knowledge, information-sharing, and enrichment.

USLM and XPUB – Matt Landgraf of GPO

  • Modeling additional bill versions
    • Ensure that the modeled USLM XML is interoperable within the legislative ecosystem

Question and Answer

  • Q – Will USLM be publicly available for amendments?
    • A – Similar to the answer Kimberly gave about amendments earlier. We continue to make process improvements.
  • Q – Will you leave access to generation one on the website?
    • A – Yes we will continue to provide access to it. There may be a day where things are natively authored in USLM, but we aren’t close to this. 

Validating USLM in LEXA – John Pollock, Office of the Secretary of the Senate

  • Changing data standards
    • Ensure the new USLM (United States Legislative Markup) standard will not break our existing tools and processes
    • Bridge current data format to USLM. 
    • Sometimes described as “changing the wheel on a moving car without crashing.”
  • LEXA (Legislative Editor in XML Application)
    • LEXA is an XML editing application used in the Senate
  • Validating USLM in an Editorial Environment
    • All of the current bill XML elements have been properly represented in USLM.
    • USLM structures are compatible with current formats for editing. 

Closing Question and Answer

  • Q – Hearing transcripts in XML?
    • A – Kristen – We will eventually move to committee reports, then the Congressional Record, then hearing transcripts. Timeline?
    • A – Matt – No timeline. Knee deep in the amendments piece of it right now. Need to get through the next couple of meetings to find out the scope.