Chase Davis
I’m thrilled to see so many news organizations — in some cases with the help of key funding partners — investing in a concerted exploration of generative AI and what it means for the mission and business of journalism.
Our industry has an unfortunate history of having disruptive emerging technologies inflicted upon it, putting us on the defensive and forcing us to begrudgingly adapt. But this time around, it feels like we’re taking some welcome initiative. Fantastic.
From The New York Times and The Washington Post to nonprofit news to chains like Hearst and McClatchy to independent regional outfits like ours, it seems every news organization with the means to do so is launching small teams to experiment with these technologies. Good on all of us. I can’t wait to see what comes of it.
My prediction for 2025 is that many of these teams will be focused on immediate, real-world problems and quick wins. Between the demands of grant funding and the speculative nature of these investments by media companies, there will naturally be some pressure to show visible successes quickly. And that's great! There's plenty of low-hanging fruit, and it's a natural place to start.
That said, my hope is that we don’t stop there. The world of “AI” is vast and includes much more than the generative models that have captured the public imagination. We can do more than build wrappers around tools like ChatGPT, valuable as they might be. We should also find the time and energy to invest in working from first principles and trying to break some new ground.
In that spirit, I thought I’d highlight a few areas of research and development that, at least to me, seem worthy of our industry’s attention. Perhaps they would make good areas for collaboration, either among ourselves or with academic and industry partners. But at the very least I offer them as food for thought as we begin to explore this space in earnest heading into 2025:
Benchmarking and evaluation: A model might pass the bar exam, solve complex math problems, or ace the many other benchmarks commonly used to determine the state of the art, but we still don't have a great way to measure how well it can accomplish the tasks that matter to us as journalists. Sachita Nishal, Charlotte Li, and Nick Diakopoulos at Northwestern highlighted this challenge in a paper they published earlier this year, which ends with a call for further research and industry collaboration. On a smaller, project-by-project level, tools like Braintrust can also be helpful.
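For a sense of what that project-by-project evaluation can look like in practice, here is a minimal sketch in Python. The classify_topic function and the tiny golden set are hypothetical stand-ins for whatever model, prompt, and hand-labeled examples a newsroom actually has; the point is simply to score outputs against human judgments and track the results.

```python
# Minimal evaluation harness: score a model's topic labels against a small
# hand-labeled "golden" set and report overall and per-topic accuracy.
from collections import defaultdict

def classify_topic(headline: str) -> str:
    """Hypothetical stand-in for the model or prompt under test."""
    return "politics" if "council" in headline.lower() else "sports"

golden_set = [
    {"headline": "City council approves new housing ordinance", "topic": "politics"},
    {"headline": "Wild rally in the third period to beat the Blues", "topic": "sports"},
    {"headline": "School board election heads to a recount", "topic": "politics"},
]

def evaluate(examples):
    correct = 0
    per_topic = defaultdict(lambda: [0, 0])  # topic -> [correct, total]
    for ex in examples:
        hit = classify_topic(ex["headline"]) == ex["topic"]
        correct += hit
        per_topic[ex["topic"]][0] += hit
        per_topic[ex["topic"]][1] += 1
    print(f"Overall accuracy: {correct / len(examples):.2f}")
    for topic, (right, total) in sorted(per_topic.items()):
        print(f"  {topic}: {right}/{total}")

evaluate(golden_set)
```

Dedicated evaluation tools build far more infrastructure around this loop, but the core idea is the same: a fixed set of examples, human labels, and a score you can track over time as prompts and models change.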
Small and domain-specific models: Many of us have wondered what might happen if we trained a large language model on the archive of a news organization. The problem is, compared to the internet-scale datasets used to train foundation models, a single archive is just a drop in the bucket. Here we might borrow from medicine, law, and finance, which have built small, domain-specific models of their own. Research suggests these smaller models can outperform large foundation models on certain benchmarks, and companies have begun to take note. What might that mean for us?
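To make the idea a bit more concrete, here is a rough sketch of what continued pretraining of a small open model on archive text could look like, using the Hugging Face transformers and datasets libraries. The model name, the placeholder archive_texts list, and the hyperparameters are illustrative assumptions rather than recommendations, and a real effort would involve far more data, cleaning, and evaluation.

```python
# Rough sketch: adapt a small open model to a news archive via continued
# pretraining. Everything here is a placeholder for illustration only.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

MODEL_NAME = "gpt2"  # stand-in for any small open model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Assume `archive_texts` is a list of full story texts pulled from your CMS.
archive_texts = ["Story one full text ...", "Story two full text ..."]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

dataset = Dataset.from_dict({"text": archive_texts}).map(
    tokenize, batched=True, remove_columns=["text"]
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="archive-model", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```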
Explainability and interpretability: Especially in our profession, it's hard to trust the output of a model when even its creators can't fully explain how it works. Model explainability, particularly for neural networks, has been an active area of research for some time (including some notable contributions by New York Times alum Shan Carter). Even at a basic level, being able to understand and explain how these models work can help us help our audience make sense of them. Andrej Karpathy's work helped the Times do as much in 2023.
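As a small, hands-on illustration of that basic level of understanding, the sketch below uses a small open model (GPT-2, purely as a stand-in) to show the next-token probabilities that sit underneath everything these systems do; it assumes the torch and transformers libraries are installed.

```python
# Peek at a language model's next-token distribution for a short prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The mayor announced a plan to"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # scores for the next token only

probs = torch.softmax(logits, dim=-1)
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(idx))!r}: {float(p):.3f}")
```

Showing, token by token, that the model is ranking plausible continuations goes a long way toward demystifying these systems for colleagues and readers alike.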
Metadata generation: There has been some interesting research showing that, in certain contexts, large language models can match or exceed the performance of humans on complex classification tasks: assigning, say, topics, sentiment, or structural attributes to a news story. If language models can be trusted to provide reasonably accurate metadata about our journalism, it opens up a wealth of opportunities to analyze our stories in different ways, or to augment our product offerings. Maybe you want to extract all the evergreen stories from your archive, or figure out whether explainers perform better than more traditional story forms in certain contexts. Companies like Overtone and SmartOcto are already doing some of this, but news organizations could take the idea further on their own.
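As a sketch of what that could look like (the model name, taxonomy, and prompt are illustrative assumptions, not an endorsement of any particular approach), here is a small Python example that asks a hosted model to tag a story against a controlled vocabulary and parses the result as structured metadata. A real pipeline would validate the output against a schema and spot-check samples against human labels.

```python
# Hedged sketch of LLM-assisted metadata generation: tag a story with a
# topic, sentiment, and story form drawn from a controlled vocabulary.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

TAXONOMY = {
    "topic": ["politics", "sports", "business", "arts", "weather"],
    "sentiment": ["positive", "neutral", "negative"],
    "story_form": ["breaking", "explainer", "feature", "investigation"],
}

def tag_story(text: str) -> dict:
    prompt = (
        "Return a JSON object with keys 'topic', 'sentiment', and "
        f"'story_form', choosing each value from: {json.dumps(TAXONOMY)}.\n\n"
        f"Story:\n{text}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

print(tag_story("The city council voted 5-2 on Tuesday to expand the bus network."))
```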
Courtesy: https://www.niemanlab.org