This AI startup is reading the entire internet non-stop. Why?

Diffbot is training its AI to be more trustworthy by reading the internet nonstop and creating a system that ensures the accuracy of its data.

Photo Credit: Donald Iain Smith

OpenAI’s latest natural language processing (NLP) model, GPT-3, is an astonishing feat. The tool can generate poems, short stories, songs, and technical specs that can pass as human creations.

But as cool as it is, GPT-3 doesn’t actually understand what it’s creating. AI needs to demonstrate a deeper level of comprehension to gain our trust.

Enter Diffbot 

To address this issue, the machine-learning company Diffbot is building an AI that reads every page on the entire public web, in multiple languages, extracting as many facts as it can.

Rather than using this info to train a language model like GPT-3, Diffbot turns it into a series of 3-part factoids, each relating one thing to another: subject, verb, and object.
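To make the subject-verb-object idea concrete, here is a minimal sketch of how such 3-part factoids could be stored and looked up. The `Triple` and `KnowledgeGraph` names are hypothetical and chosen for illustration; this is not Diffbot's actual data model or API.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One 3-part factoid: subject, verb, object (hypothetical structure)."""
    subject: str
    verb: str
    obj: str


class KnowledgeGraph:
    """Toy in-memory store that indexes triples by their subject."""

    def __init__(self):
        self._by_subject = defaultdict(set)

    def add(self, triple):
        # A set silently ignores exact duplicates of the same fact.
        self._by_subject[triple.subject].add(triple)

    def facts_about(self, subject):
        # Sorted so the result order is deterministic.
        return sorted(self._by_subject.get(subject, set()))


graph = KnowledgeGraph()
graph.add(Triple("Mike Tung", "founded", "Diffbot"))
graph.add(Triple("Diffbot", "crawls", "the public web"))
graph.add(Triple("Diffbot", "builds", "a knowledge graph"))

for fact in graph.facts_about("Diffbot"):
    print(fact.subject, fact.verb, fact.obj)
```

Because every fact has the same shape, answering "what do we know about X?" becomes a lookup rather than a text-generation problem.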

This approach creates a more accurate knowledge graph

Beyond the subject-verb-object paradigm, Diffbot’s founder, Mike Tung, tells The Hustle that his startup is building an AI system that consumes information the way humans do.

Among other parameters, it takes into consideration things like 1) trustworthiness (“Did this come from an official source, or social media?”) and 2) up-to-date-ness (“Is this information stale?”). It also keeps track of sources, so you can see where on the web each fact it generates came from.
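One way to picture how trustworthiness and freshness could be combined is sketched below. Everything here is an assumption made for illustration: the `SourcedFact` fields, the trust weights, the half-life decay, and the example.com URLs are hypothetical, and the article does not describe how Diffbot actually scores facts.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical trust weights per source type (not Diffbot's real values).
SOURCE_TRUST = {"official": 1.0, "news": 0.7, "social_media": 0.3}


@dataclass(frozen=True)
class SourcedFact:
    statement: str
    source_url: str    # kept so a reader can trace the fact back to the web
    source_type: str
    observed_on: date


def confidence(fact, today, half_life_days=365):
    """Trust weight multiplied by a freshness factor that halves every half-life."""
    trust = SOURCE_TRUST.get(fact.source_type, 0.1)  # unknown sources score low
    age_days = max((today - fact.observed_on).days, 0)
    freshness = 0.5 ** (age_days / half_life_days)
    return trust * freshness


today = date(2020, 9, 1)
fresh_official = SourcedFact(
    "Diffbot has about 400 paying customers",
    "https://example.com/press", "official", today,
)
stale_social = SourcedFact(
    "Diffbot has about 400 paying customers",
    "https://example.com/post", "social_media", today - timedelta(days=365),
)

for fact in (fresh_official, stale_social):
    print(fact.source_url, round(confidence(fact, today), 2))
```

In this sketch the same statement scores higher when it comes from a recent official page than from a year-old social post, which mirrors the two questions quoted above.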

The startup already has ~400 paying customers

Diffbot is the only US company (aside from Google and Microsoft) crawling the entire web, and the knowledge graph it’s building is being deployed across various industries:

  • DuckDuckGo uses it to create Google-like answer boxes 
  • Snapchat uses it to extract highlights from news pages 
  • Adidas and Nike use it to find counterfeits

What’s next for Diffbot?  

Making it easy to use information from its knowledge graph in popular business tools like Excel, Google Sheets, and Salesforce.
