Toronto Data Workshop (TDW) – Accessible Investigative Journalism: Navigating Canada’s Largest Corpus of Government Documents – Sept 20, 2024
“Open By Default” (OBD), a dataset from the Investigative Journalism Foundation, is Canada’s largest collection of government documents, comprising over 4.5 million pages of Access to Information and Privacy (ATIP) requests and corresponding government documentation. This project enhanced data capture using optical character recognition (OCR), improved search performance through Large Language Model (LLM) […]