Episode 10: Out of the AI Winter
“We’re not going to have machine-humans any time soon”
Sachin and Andrew discuss AI with Matt Carrigan, a machine learning engineer at Parse.ly. AI is a rising field right now, but what exactly is machine learning and how does it relate to news and what we pay attention to online? Plus: Cambridge Analytica, GDPR, and +1/-1 on returning to flip phones.
Industry news that caught our attention:
0:40 – Discussing the Facebook and Cambridge Analytica scandal.
5:45 – GDPR and how it relates to the privacy concerns at the forefront of the Cambridge Analytica story.
Data that held our attention:
9:40 – What’s the connection between “machine learning” and “artificial intelligence”?
12:43 – How Matt’s background in genetics and academia led him to an interest in big data, machine learning, and now natural language processing, or NLP.
15:36 – De-jargoning NLP and understanding how machine learning uses data to process language.
Matt: “Modern machine learning has been enabled almost entirely by the huge amounts of data the Internet has produced.”
21:00 – How Matt applies NLP to data sets through article clustering, which groups stories together by event or news item.
Matt: “One of the big questions here is: What does it actually mean to say that two articles are about the same thing? It’s something humans do very intuitively. Despite the progress NLP has made, humans will have a much deeper understanding of the meaning of a text than a machine learning system right now will have. So when we look at it from a machine learning perspective, we have to say, what defines a story?”
22:57 – Why Donald Trump is a tricky NLP “entity” (a proper noun or object shared across multiple stories), and other challenges of article clustering.
28:55 – Beyond article clustering, what other problems could machine learning solve? Matt discusses sentiment analysis and what machine learning could tell us about other angles within the orbit of larger news stories, with the Black Panther film as a theoretical example.
Matt: “It could be very interesting to track, then, not just is the internet paying attention to something, but is it paying positive attention, negative attention—what is it that the internet is saying about a thing right now? Rather than just ‘attention,’ you could go deeper into kinds of attention.”
Matt: “A journalist or someone writing an article could realize that they would be inside one of the broader clusters but they might not know that there are these other sub-clusters, these other topics within the orbit that they are in, that people are talking about. […] If people are paying a lot of attention to the story, is there a particular sub-story that’s actually what people are interested in? […] All of these are things that NLP could reveal. Modern machine learning is essentially the finding of structure within data, at the 100-mile overhead view, and I think this has huge potential to do exactly that, to basically see where you are in the great map of articles and internet attention and see what is going on around you, to give you reconnaissance information about the spot that you’re writing about.”
34:01 – How machine learning augments but doesn’t replace human capability.
+1 or -1? Quick takes:
36:01 – Returning to flip phones
40:25 – Disconnecting from tech on vacation and weekends