The Recap: Library of Congress 2021 Virtual Public Forum

On September 2, 2021, the Library of Congress held the second of two virtual public fora on its role in providing access to legislative information, as directed by congressional appropriators in FY2020. (For reference, language requiring the proceedings was included in the committee report accompanying the FY2020 Legislative Branch Appropriations Bill.) We summarized the 2020 forum here.

The event was well attended (the Library noted over one hundred RSVPs), and several participants voiced their appreciation and recommended continuing the practice. During Q&A, the Library stated that it had no plans in place for future virtual public events, but indicated that we’d “hear more” about any such plans after the video was published. (No news so far.)

Ahead of the 2020 forum, the Congressional Data Coalition submitted more than two dozen recommendations regarding the Library’s legislative information services. They fell into five conceptual groupings: (1) Publish Information As Data; (2) Put the Legislative Process in Context; (3) Integrate Information from Multiple Sources; (4) Publish Archival Information; (5) Collaborate with the Public.

This year, civil society recommendations and audience questions circled around these same themes. A letter submitted to the Library by the Policy Agendas Project called for “greater publication of documents and data by the Library of Congress on Congress.gov.” The letter, which was signed by 18 political scientists, specifically recommended that the Library make historic bill text available online; review and publish CRS reports from the CRSX archive; collect congressionally mandated executive branch agency reports; publish all hearing information and committee reports from 1970 forward; and adopt the Policy Agendas Project’s coding system. The signers also endorsed our letter from last year.

During the public discussion period, we noticed particular interest in making Congress.gov more accessible; using Congress.gov to further congressional oversight; and enabling Congress.gov to track related legislation, or ‘legislative memes.’ (On this last point, we note that our new tool, BillMap, enables tracking of legislative memes — we’re still working on it and feedback is welcome. One of our developers wrote about how we track legislative memes and assess when bills may be related to one another here.)

As usual, participant questions were especially productive. We’ve incorporated notable questions throughout our summary. LOC posted a video of the entire event here.

PRESENTATIONS

Kate Zwaard, the Library’s Director of Digital Strategy, moderated the event. Zwaard stated that the Library was there in “listening mode,” and introduced Bud Barton, the Library’s departing Chief Information Officer, whom she thanked for all his work over the years.

Departing leadership. Bud Barton noted that LOC is situated as a purveyor — not an owner — in the legislative information system, and is therefore limited by how Congress chooses to provide information. [We note that the Library actually does internally generate a significant amount of legislative information.] However, Barton also noted that the Library is uniquely positioned to gather user feedback on the existing information system and translate that feedback into its own updates. (Congress.gov, for example, is updated every three weeks.) The Library can also communicate users’ needs to its data partners.

A message from our product owner. Andrew Weber, product owner of Congress.gov, spoke about several ongoing modernization projects for Congress.gov, some of which were based on suggestions made at last year’s public forum. One of the bigger projects facing Congress.gov is making all historical bill text available online. Public and private laws from 1951 to present are now available on Congress.gov, as are US founding documents/Century of Lawmaking materials and transcripts of 11 Congresses, going back to the 107th Congress. (More has been published since then.) Weber stated that LOC only has transcripts dating back to 2001. Holes in the digital historical legislative record are a significant issue — Margaret Wood elaborated on where these absences exist later on — and could be a good candidate for a major crowdsourcing effort, like the one we saw a few years back for transcribing Law Library of Congress reports that couldn’t be OCRed.

Weber also touched on efforts to make Congress.gov more accessible. A new tool that reads transcripts to users is a welcome improvement. Other user-friendly updates from the past year include improved search filters; a built-in constituent feedback tool for legislation; and a help center. Weber noted continued efforts to improve the mobile functionality (30-50% of traffic is mobile!) for Congress.gov.

Bound Congressional Record. Robert Brammer from the Law Library of Congress discussed LOC interns’ efforts to make the Bound Congressional Record more discoverable by breaking up these massive PDFs. (That’s a lot of work.) The Record is currently available dating back to 1921. Brammer also highlighted the Law Library’s monthly webinars on legal research and the Library’s collections.

LOC’s accessibility review and improvements were discussed by Natalie Buda Smith and Fred Simonton. They noted the accessibility advantages of HTML format. A recent accessibility audit of Congress.gov identified the need to improve the visual accessibility of the site, including color contrast and form elements, and the general importance of providing information in multiple formats. To improve digital accessibility and searchability, the Library must tackle its troves of PDFs and publish their content as data.

Congressional Web Archive. Abbie Grotke spoke about the Congressional Web Archive, a repository for the public digital record that began actively collecting legislative materials in the 107th Congress and collects on an ongoing monthly basis. Grotke explained that the archive functions like the Wayback Machine; she described its goal as replicating “the look and feel of the websites archived,” not necessarily preserving archival content. The Congressional Web Archive includes House and Senate committee websites as well as member websites from both chambers. During the Q&A, Grotke spoke about ongoing efforts to download and publish members’ activities on social media as data, which are frustrated by inadequate capture tools. Grotke said there’s an ongoing project to better capture members’ online activities, and mentioned that the National Archives and Records Administration (NARA) does this every two years.

Grotke announced that the archives from the 115th and 116th Congresses were about to be released (it looks like they’re up now), leading us to wonder about the delay between data collection and publication to the web archive. We also wonder how that process can be supported and whether there’s a role for the Advisory Committee on the Records of Congress.

Century of Lawmaking. Margaret Wood of the Law Library of Congress reminded the audience to “ask a librarian!” and spoke about the Century of Lawmaking (CoL), a cache of historic legislative data spanning 1774 to 1875, including laws, congressional journals, bills and resolutions, and materials related to the Constitutional Convention and the ratification of the Constitution. Wood stressed that CoL is outdated, and detailed the efforts to migrate some of its data to Congress.gov, which has better search capacity. Currently, users can search metadata of legislation from the 6th through 42nd Congresses on Congress.gov.

There are certain holes in the dataset (for example, the record is incomplete for the 12th and 16th Congresses), and bills need to be retroactively numbered somehow for usability, since the House did not sequentially number bills until the 15th Congress and the Senate did not sequentially number bills until the 30th Congress. Wood mentioned that bills from the 12th and 16th Congresses are available on microfiche and could theoretically be digitized, and identified filling these gaps as a current goal of the Library.

Digitization of the US Serial Set is the goal of LLOC and GPO’s multi-year collaborative effort discussed by Jay Sweany (LLOC) and Suzanne Ebanues (GPO). Digitizing all 16,000 volumes of the US Congressional Serial Set, dating back to the first volume from 1817, is expected to take another three years to complete. LLOC and GPO’s joint production website is now live and features a selection of serial set volumes from the 69th Congress, the 82nd Congress, and several 19th Century Congresses. In Q&A, LLOC explained that these Congresses were not selected for historical or political relevancy, and said that the digitization will proceed chronologically, meaning that the most recent volumes will be digitized last. We’d rather have the most recent volumes first, as they’re more likely to be immediately relevant; we look forward to the project’s completion.

Lisa LaPlant of GPO demonstrated that portions of the US Serial Set are now discoverable on govinfo.gov via search, browse, and API. The documents are available in full and as individual documents. LaPlant explained that govinfo.gov is an ISO-certified repository and is integrated with XPub, GPO’s new XML composition system.

Help Center resources. The final presenter was Kimberly Ferguson of LOC, who gave a live demo of Congress.gov’s help center. It presents resources in multiple formats, including videos, essays, and searchable help content, along with a contact button that provides a line to a law librarian.

QUESTIONS AND ANSWERS

Q&A. We have worked to summarize the gist of the Q&A, but this is not a verbatim transcript. We have reorganized the items conceptually.

LOC’s mission

Q: Does the Library plan to improve how its collections can be used to study and track Congressional oversight efforts — for example, by creating searches to more easily access hearings; adding specific resources for conducting oversight in the Help Center; adding oversight materials to the CoL; and linking oversight reports to specific hearings?

A: No. The Library does not consider oversight one of its primary goals.

Q: Is the library supporting Congress or committees in a virtual legislative drafting process currently? Could Congress.gov evolve into a place where we have more robust identities as digital citizens to participate in oversight, hearings, drafting, etc.?

A: No; we see our role as guiding constituents to their members of Congress.

Updating LC materials

Q: Does the Library include records from the Office of Congressional Workplace Rights?

A: No.

Q: There seems to be a long lag between bill introduction and getting the text up on Congress.gov. How is this being addressed?

A: Once a bill’s introduction appears in the Congressional Record or its metadata appears on Congress.gov, the bill text should follow soon after. On the Senate side there are sometimes delays in enrollments. The quality assurance timeline is also a limiting factor, and simply takes longer for bigger bills. (We note that at times the delays can be significant.)

Amendments

Q: When will we be able to see authored amendments to omnibus bills?

A: Unclear. Kirsten Gullickson noted that on the House side, adopted amendments must be added to docs.house.gov and some must be included on rules.house.gov.

Q: Unlike House amendments, Senate amendments are not available on Congress.gov. How can this info be integrated?

A: No answer to this question.

Related legislation

Q: Is it possible for members to provide feedback to the Library on which bills they consider ‘related’?

A: There are a number of different channels through which the Library receives related bill data. The House Clerk and House committees can transmit related bill data to the Library; on the Senate side, the Office of the Secretary of the Senate and Senate committees also have this ability. In many cases, the LOC employees who write bill summaries are the ones who designate related bills.

Q: Could there be a process by which members ask the relevant committee to identify related legislation information to Congress.gov?

A: It’s complicated.

Congressional Research Service

Q: It’s great that CRS reports are now up on the site but unfortunate that the system doesn’t publish the information in data formats (only PDF) and it also doesn’t include reports from before 1997. Have you considered using full text HTML to improve usability of the info and publishing the full CRS digital archive, which exists going back to 1970?

A: Thank you for the feedback. (The Library previously indicated it would not make any change with respect to access to CRS reports, despite encouragement from Congress, unless it is directed to do so.)

Data standards

Q: If committees start building their own E-hoppers or repositories, what should they be doing now so the content and structure can be metabolized by Congress.gov, and not weaponized in the data environment?

A: There are no good data standards in place for Congressional hearings.

Q: It’s very valuable to have more member information on Congress.gov, including the committees on which they serve. Wondering when that would be available?

A: Another data standardization problem. It’s not always available from official congressional sources.

Uploading hearing videos

Q: The Library mentioned its efforts to upload hearing data dating back to 2001; does the Library have plans for uploading committee hearing data from before that? (The questioner noted there’s some data going back to the 1870s.)

A: We will go back as far as we have good data.

Q: During last year’s forum, your team mentioned that a human has to enter each video’s ID tag. Has there been any progress made in creating an automated entry system for all videos?

A: Kirsten Gullickson is responsible for the unique IDs now assigned to House committee meeting announcements, and Arin Shapiro for those on the Senate side. The Library is thrilled about those. There is not yet a way to automatically attach the unique ID to the video, the announcement, and the hearing transcript. Over the past summer, volunteers have been stitching this metadata together. The Library wants to figure out how committees can produce these materials so that they can be associated downstream in an automated way. The future is bright.

Q: On the House side, many of the videos are hosted on YouTube and linked to on Congress.gov. Some of the historical videos serve as repositories. It’s not clear that this is happening on the Senate side. YouTube will create an automatic transcription of proceedings. Hearing transcripts sometimes take a year or more. Is the Library holding House and Senate proceedings? Would the Library consider creating automated transcripts?

A: This is something we’ve been looking into, but there is no plan in place to implement it. Senate floor proceedings have closed captioning, which could be leveraged to achieve what you are asking for.

Update: video from the forum has now been posted here.

This recap was written by Izzi Olive and Daniel Schuman.