This past week we upgraded our EveryCRSReport website, which as of this writing contains approximately 17,700 reports. By comparison, the official CRS website has only around 8,500 reports. With this upgrade, we will continue to provide you with the most up-to-date CRS reports as well as an extensive archive.
What was the upgrade and why did we make it?
Back in March, when it was no longer safe for Members of Congress to go to their offices, our supply of CRS reports was disrupted. Over the last month, we built a scraper to obtain the reports published on the official public-facing CRS reports website. Those reports are now included in our archive and are updated on a regular basis. We are also converting the PDFs into HTML. Because it is very difficult to convert a PDF to HTML cleanly, our HTML versions contain a number of formatting errors. But we've done the best we can.
The official public website offers CRS reports only as PDFs. CRS also maintains a separate internal website, available only to congressional staff, that offers the reports in both PDF and HTML formats.
We publish both the PDF and HTML versions of the reports, which originate with the official sources. There are three major reasons why we publish the reports as HTML. First, the HTML versions are more SEO-friendly, so they are easier to find with an internet search. Second, unlike PDFs, they reflow automatically on mobile devices, so you can read them on a phone or tablet, and they can be converted and downloaded as ePub files. Finally, HTML makes it easier for us to analyze the files and show you the differences between versions of the same report.
(We have asked CRS to publish the reports as HTML on their website, but they have strongly indicated they will not do so unless specifically directed by Congress. We believe this is not a question of legal authority, since the law allows them to publish the reports as HTML, but rather reflects their preference absent direction from legislators.)
Plans going forward
Should circumstances change and we start receiving the HTML versions of the reports again, we will resume publishing the official HTML versions in place of our imperfect conversions from PDF.
We hope that CRS will start publicly releasing the text of CRS reports in a more accessible format than PDF, just as they already make it available internally. This could take the form of publishing the reports as HTML on their website, attaching a data file to each PDF, or publishing the data as a bulk data repository (like the one on GPO's website). We found publishing HTML versions simple and inexpensive to implement on our website, and since CRS already publishes the reports internally in HTML, it may be just as easy for them.
As mentioned above, it may take congressional direction to prompt the Library to publish CRS reports as HTML. We and many others raised concerns about the Library of Congress's implementation memo as it prepared to launch its CRS reports website, but those concerns were not resolved at the time.
We have included a fix for this in our recommendations to Legislative Branch appropriators. (We also think the Library should look at publishing some non-confidential historical CRS reports, which we outlined in this recommendation.)
Fortunately, the House Legislative Branch Appropriations Committee has given us significant reason for hope. In its FY 2021 committee report, it requests reports on both of these issues:
Access to Archival Materials: The Committee requests that with-in 60 days of enactment of this Act the CRS provide a report to the Committee evaluating the possibility of publication of CRS reports contained in its CRSX archive, specifically examining the feasibility, cost, and benefits of integrating all or a subset of the reports online. This analysis should include an assessment of the utility to the public and Congress of online access to the reports.
Alternate Format for Public Reports: The Library is requested to provide to the Committee within 60 days of enactment of this Act a report describing the process, timeframe and costs of making available to the public all currently available non-confidential CRS Reports in HTML format rather than PDF, or a successor format when appropriate. The Committee understands that CRS already publishes reports on its internal website in HTML. Making this change in format for external audiences would facilitate the use and re-use of the information contained in the reports.
Because the Legislative Branch Appropriations bill is significantly delayed this year, it may be some time before the underlying bill becomes law and these directions to CRS come into effect. There’s also the possibility that the House Administration Committee or Senate Rules Committee acts first and directs CRS to address these issues.
We would much prefer for everyone to get CRS reports from the official source, i.e., the Library of Congress, and not from us. But until the non-confidential reports are available from the Library in HTML and the Library addresses access to legacy reports, we will continue to provide this service.