New York Times Data Scientist Tells Students to Prioritize Client Communications
Chris Wiggins, chief data scientist at The New York Times, presented on his field Feb. 25 at the Jersey City branch of the Ying Wu College of Computing.
Technology is important, but to be a good data scientist it's more important to have strong communication skills, explained Wiggins, who holds a Ph.D. in theoretical physics and joined the Times in 2014 on sabbatical from his applied mathematics professorship at Columbia University.
"There are extra pains associated with trying to take a real-world problem, reframe it as machine learning, execute the machine learning and then communicate the results back to somebody in a way they can use," Wiggins said. It needs to be understood by "people who don't do calculus," he noted.
Before data scientists begin working on the math, it's vital that they know what problem their clients want to solve, ensure the data quality and understand the organization's current practices. "You can't just hire someone to come in and do artificial intelligence from scratch," he said, or you'll end up with dissatisfied clients.
However, he said, you also have to stay true to what the numbers honestly say. He cited the old joke among data analysts that if you torture numbers enough, they'll confess to anything.
At the Times, Wiggins' work is used for three kinds of problems: descriptive, predictive and decisive. Those are his plain-English names for the machine-learning categories of unsupervised learning, supervised learning and reinforcement learning.
Problems he has worked on include detecting which reader behaviors lead to canceled subscriptions and studying which news topics evoke which kinds of feelings. He often uses his favorite tool, Scikit-learn, an open-source library for the Python programming language. These projects are important because the Times is trying to move away from presenting stories based solely on readers' interests, in order to minimize privacy concerns, he said.
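For readers curious what a predictive problem of this kind can look like in Scikit-learn, here is a minimal sketch. It is not Wiggins' model or Times data: the two behavior features (`visits_per_week`, `days_since_last_visit`), the synthetic labels and the choice of logistic regression are all assumptions made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_readers = 1000

# Hypothetical reader-behavior features (synthetic, not real subscriber data).
visits_per_week = rng.poisson(5, n_readers)
days_since_last_visit = rng.exponential(7, n_readers)
X = np.column_stack([visits_per_week, days_since_last_visit])

# Synthetic labels: by construction, readers who visit less often and have
# been away longer are more likely to cancel.
logit = -1.0 - 0.3 * visits_per_week + 0.2 * days_since_last_visit
canceled = rng.random(n_readers) < 1 / (1 + np.exp(-logit))

# Supervised learning: fit on labeled examples, evaluate on held-out readers.
X_train, X_test, y_train, y_test = train_test_split(
    X, canceled, test_size=0.25, random_state=0
)
model = LogisticRegression().fit(X_train, y_train)

# Estimated probability of cancellation for each held-out reader.
cancel_probability = model.predict_proba(X_test)[:, 1]
accuracy = model.score(X_test, y_test)

print("held-out accuracy:", round(accuracy, 3))
print("coefficients (visits, days away):", model.coef_[0])
```

The signs of the fitted coefficients are the part that can be explained to someone who does not do calculus: a negative weight on visits suggests frequent readers are less likely to cancel, and a positive weight on days away suggests lapsed readers are more likely to.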
Wiggins is strictly on the business side of the company, not news reporting, but his team does sometimes assist journalists. He gave an example of helping a reporter understand data about air bag deployments in car accidents, which led to product recalls and ultimately the Japanese manufacturer Takata Corp. going bankrupt.
Wiggins' speech was the second from a Times data expert in as many days at NJIT. Data visualization editor Sarah Almukhtar, an NJIT alumna, lectured the day before at the Hillier College of Architecture and Design. Her message, in a twist of irony, was for people in creative fields to better understand technical subjects.