OpenAI’s latest natural language processing (NLP) model, GPT-3, is an astonishing feat. The tool can generate poems, short stories, songs, and technical specs that pass as human creations.
But as cool as it is, GPT-3 doesn’t actually understand what it’s creating. AI needs to demonstrate a deeper level of comprehension to gain our trust.
Enter Diffbot
To address this issue, the machine-learning company Diffbot is building an AI that reads every page on the entire public web, in multiple languages, extracting as many facts as it can.
Rather than using this info to train a language model like GPT-3, Diffbot turns it into a series of 3-part factoids that relate one thing to another: subject, verb, and object.
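To make the idea concrete, here’s a minimal sketch (in Python, and not Diffbot’s actual code or schema) of how facts stored as subject-verb-object triples might be represented and queried. The relation names and query helper are illustrative only, drawn from facts mentioned in this article:

```python
from collections import namedtuple

# A hypothetical 3-part factoid: subject, verb (relation), object.
Fact = namedtuple("Fact", ["subject", "verb", "obj"])

# A tiny, illustrative "knowledge graph" built from such triples.
facts = [
    Fact("Diffbot", "was_founded_by", "Mike Tung"),
    Fact("GPT-3", "was_created_by", "OpenAI"),
    Fact("DuckDuckGo", "uses", "Diffbot"),
]

def query(subject=None, verb=None, obj=None):
    """Return every fact matching whichever fields were specified."""
    return [
        f for f in facts
        if (subject is None or f.subject == subject)
        and (verb is None or f.verb == verb)
        and (obj is None or f.obj == obj)
    ]

# Example: everything this toy graph "knows" about Diffbot.
print(query(subject="Diffbot"))
```

Because every fact is a discrete triple rather than text buried in a statistical model, it can be inspected, corrected, and traced back to a source.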
This approach creates a more accurate knowledge graph
Beyond the subject-verb-object paradigm, Diffbot founder Mike Tung tells The Hustle that his startup is building an AI system that consumes information the way humans do.
Among other parameters, it weighs: 1) trustworthiness (“Did this come from an official source, or social media?”) and 2) recency (“Is this information stale?”), and it shows where on the web each fact it surfaces came from.
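Here’s a hedged sketch of how that kind of provenance metadata might be attached to each factoid. The field names, placeholder URL, and staleness rule are assumptions for illustration, not Diffbot’s actual schema:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class SourcedFact:
    """A subject-verb-object factoid plus illustrative provenance fields."""
    subject: str
    verb: str
    obj: str
    source_url: str           # where on the web the fact was extracted
    is_official_source: bool  # e.g. company site vs. social media
    last_seen: date           # used to judge whether the fact is stale

fact = SourcedFact(
    subject="GPT-3",
    verb="was_created_by",
    obj="OpenAI",
    source_url="https://example.com/article",  # placeholder, not a real source
    is_official_source=True,
    last_seen=date(2020, 9, 1),
)

def is_stale(f: SourcedFact, today: date) -> bool:
    """Crude staleness check: flag facts not re-confirmed in the past year."""
    return (today - f.last_seen).days > 365
```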
The startup already has ~400 paying customers
Diffbot is the only US company (aside from Google and Microsoft) crawling the entire web, and the knowledge graph it’s building is being deployed across various industries:
- DuckDuckGo uses it to create Google-like answer boxes
- Snapchat uses it to extract highlights from news pages
- Adidas and Nike use it to find counterfeits
What’s next for Diffbot?
Making it easy to pull information from its knowledge graph into popular business tools like Excel, Google Sheets, and Salesforce.