This AI startup is reading the entire internet non-stop. Why?

Diffbot is training its AI to be more trustworthy by reading the internet nonstop and creating a system that ensures the accuracy of its data.


September 8, 2020

Photo Credit: Donald Iain Smith

OpenAI’s latest natural language processing (NLP) model, GPT-3, is an astonishing feat. The tool can generate poems, short stories, songs, and technical specs that can pass as human creations.

But as cool as it is, GPT-3 doesn’t actually understand what it’s creating. AI needs to demonstrate a deeper level of comprehension to gain our trust.

Enter Diffbot 

To address this issue, the machine-learning company Diffbot is building an AI that reads every page on the entire public web, in multiple languages, extracting as many facts as it can.

Rather than using this info to train a language model like GPT-3, Diffbot turns it into a series of 3-part factoids, each relating one thing to another: subject, verb, and object.
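To make the idea concrete, here is a minimal Python sketch of 3-part factoids collected into a tiny graph. It is an illustration only, not Diffbot's actual code or data model: the names `Fact` and `build_graph` are made up for this example, and the sample facts are simply restated from this article.

```python
from collections import defaultdict
from typing import NamedTuple


class Fact(NamedTuple):
    """One 3-part factoid: subject, verb, object (hypothetical type)."""
    subject: str
    verb: str
    obj: str


def build_graph(facts):
    """Index facts by subject so everything known about one thing is a single lookup."""
    graph = defaultdict(list)
    for fact in facts:
        graph[fact.subject].append((fact.verb, fact.obj))
    return dict(graph)


# Illustrative facts only, restated from this article.
facts = [
    Fact("Mike Tung", "founded", "Diffbot"),
    Fact("Diffbot", "crawls", "the public web"),
    Fact("DuckDuckGo", "uses", "Diffbot"),
]

graph = build_graph(facts)
print(graph["Diffbot"])
```

Storing each fact as a small, fixed-shape record is what lets separate facts be linked together by the things they mention.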

This approach creates a more accurate knowledge graph

In addition to the subject-verb-object paradigm, Diffbot’s founder Mike Tung tells The Hustle that his startup is building an AI system that consumes information like humans do.

Among other parameters, it considers 1) trustworthiness (“Did this come from an official source, or social media?”) and 2) up-to-date-ness (“Is this information stale?”), and it lets you see where on the web each fact comes from.
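One way to picture those parameters is a fact that carries its source along with it. The Python sketch below is an assumption-heavy illustration: the `SourcedFact` fields, the `score` formula, the weights, and the example company and URLs are all invented here, and the article does not say how Diffbot actually weighs trust or freshness.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SourcedFact:
    subject: str
    verb: str
    obj: str
    source_url: str   # where on the web the fact was found
    official: bool    # hypothetical flag: official source vs. social media
    seen_on: date     # when the fact was last observed


def score(fact, today, max_age_days=365):
    """Hypothetical ranking: official sources count more, stale facts count less."""
    trust = 1.0 if fact.official else 0.5
    age_days = (today - fact.seen_on).days
    freshness = max(0.0, 1.0 - age_days / max_age_days)
    return trust * freshness


def best_fact(candidates, today):
    """Pick the highest-scoring fact among conflicting candidates."""
    return max(candidates, key=lambda f: score(f, today))


# Made-up conflicting facts about a made-up company.
today = date(2020, 9, 8)
official = SourcedFact("Acme", "is headquartered in", "Springfield",
                       "https://example.com/about", True, date(2020, 8, 1))
social = SourcedFact("Acme", "is headquartered in", "Shelbyville",
                     "https://example.com/social-post", False, date(2020, 9, 1))

winner = best_fact([official, social], today)
print(winner.obj, "from", winner.source_url)
```

Because each fact keeps its `source_url`, a reader can always trace the winning answer back to the page it came from.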

The startup already has ~400 paying customers

Diffbot is the only US company (aside from Google and Microsoft) crawling the entire web, and the knowledge graph it’s building is being deployed across various industries:

  • DuckDuckGo uses it to create Google-like answer boxes 
  • Snapchat uses it to extract highlights from news pages 
  • Adidas and Nike use it to find counterfeits

What’s next for Diffbot?  

The plan is to make it easy to use information from its knowledge graph in popular business tools like Excel, Google Sheets, and Salesforce.
