January 7th, 2025

added

Diffbot GraphRAG LLM

by Kris Negulescu

Recently, large language models (LLMs) have been trained on ever more data, driving up both parameter counts and the computing power required. But what if, instead of feeding the model more data, we deliberately trained it to rely less on its pretraining data and more on its ability to find external knowledge?

To test this idea, we fine-tuned Llama 3.3 70B to be an expert tool user of a real-time Knowledge Graph API, providing the first open-source implementation of a GraphRAG system that outperforms Google Gemini and ChatGPT. To learn more, see: https://github.com/diffbot/diffbot-llm-inference/.

December 2nd, 2024

added

type:CompanyReport

by Kris Negulescu

Company Reports are now available in the Knowledge Graph as type:CompanyReport, and in company profiles in LeadGraph. There are two primary types of reports available:

SEC Filings
Documents found on a company’s website, such as earnings call transcripts, annual reports, and ESG reports

Initial coverage focuses on the top 1,000 publicly traded companies in the United States.
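As a rough illustration of how the new type could be queried, here is a minimal sketch that builds a DQL search URL for type:CompanyReport. Only the type name comes from this entry; the endpoint and the parameter names (`type`, `query`, `size`, `token`) are assumptions that should be checked against the Knowledge Graph API documentation, and `company_report_query_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed Knowledge Graph search endpoint; confirm against the official docs.
DQL_ENDPOINT = "https://kg.diffbot.com/kg/v3/dql"


def company_report_query_url(token: str, size: int = 10) -> str:
    """Build (but do not send) a DQL search URL for CompanyReport entities."""
    params = {
        "type": "query",                 # assumed parameter name
        "query": "type:CompanyReport",   # type name taken from this entry
        "size": size,                    # assumed parameter name
        "token": token,                  # assumed parameter name
    }
    return f"{DQL_ENDPOINT}?{urlencode(params)}"


url = company_report_query_url("YOUR_TOKEN", size=5)
print(url)
```

The helper only constructs the URL, so it can be inspected before any request is made.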

November 12th, 2024

Scheduled Maintenance Window, Thurs. Nov. 14th

by Kris Negulescu

Please note: there will be a scheduled maintenance window on Thursday, November 14th, from 10 am PST until 2 pm PST, including approximately 50 minutes of downtime.

The Diffbot DevOps team will be using this time to upgrade some of the underlying infrastructure supporting global crawls. To ensure the rapid restoration of uptime and stability of the platform, we are performing this upgrade during ordinary business hours.

During this maintenance window, all updates to the Diffbot Knowledge Graph will be paused. Access to Organization and Person data will continue. Access to Article data will be limited: you will not be able to download new article data from the graph for your sources, nor will you be able to access article data crawled more than 5 months ago. All other graph data types, including Products and Events, will be inaccessible.

Other Diffbot services will continue to be operational, including Crawlbot, all Extraction APIs, the Bulk Extract API, the Natural Language API, and the Enhance APIs.

October 15th, 2024

improved

EventAPI Enhancements

by Kris Negulescu

We have:

Added categories for events.
Improved rule handling for event location, date, title, and description.
Improved title, image, start and end date, and timezone extraction.
Added support for extraction of location from maps as well as text.

September 3rd, 2024

added

News monitoring now available on LeadGraph

by Kris Negulescu

News on LeadGraph is a new feature that allows anyone to monitor breaking news for key risk and opportunity signals.

Monitor key business events such as:

New Products
Partnerships
Mergers & Acquisitions
Executive Hires
Funding
Private Equity
Layoffs

For more personalized monitoring, you can also curate keyword and company lists. Unlike Diffbot APIs, LeadGraph is currently accessible only via trial. Reach out to us for access.

July 30th, 2024

improved

Improved video normalization for HTML5

by Kris Negulescu

In July, we improved normalization of HTML5 videos and enhanced support for video controls and for audio elements in podcasts.

June 21st, 2024

added

Try the Diffbot NL API: Natural Language Processing

by Kris Negulescu

Extract entities (e.g., people, organizations, products) and data about them (e.g., sentiment, relationships) from raw text. Try it via the API (see Introduction to Natural Language API) or the Dashboard: https://app.diffbot.com/natural-language/
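For a sense of what an API call might look like, here is a minimal sketch that prepares a Natural Language API request for a piece of raw text. The endpoint URL, the `fields` values, and the JSON body keys (`content`, `lang`, `format`) are assumptions rather than details from this entry, and `build_nl_request` is a hypothetical helper; the Introduction to Natural Language API guide is the place to confirm the actual request shape.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed Natural Language API endpoint; confirm against the official docs.
NL_ENDPOINT = "https://nl.diffbot.com/v1/"


def build_nl_request(text, token, fields=("entities", "sentiment", "facts")):
    """Prepare (but do not send) a POST request asking for the given fields."""
    query = urlencode({"fields": ",".join(fields), "token": token})
    # Assumed body keys: the raw text plus its language and format.
    payload = {"content": text, "lang": "en", "format": "plain text"}
    return Request(
        f"{NL_ENDPOINT}?{query}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_nl_request("Diffbot fine-tuned Llama 3.3 70B.", "YOUR_TOKEN")
print(request.get_method(), request.full_url)
```

With a valid token, the prepared request could be sent with `urllib.request.urlopen(request)` and the response body parsed with `json.loads`.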

May 24th, 2024

deprecated

Removing Organization.industries & Organization.diffbotClassification on January 7th

by Kris Negulescu

Please note: we plan to deprecate Organization.industries and Organization.diffbotClassification on January 7th, 2025. Instructions for migrating to the updated Organization.categories are available here:
https://docs.diffbot.com/docs/industries-to-categories-migration-guide
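Before migrating, it may help to find which saved queries still reference the two deprecated fields. The sketch below assumes queries are stored as DQL strings in which the fields appear as `industries` or `diffbotClassification`, followed by `:` or `.`; `find_deprecated_fields` is a hypothetical helper, and the rewrite itself should follow the migration guide.

```python
import re

# Field names taken from this notice; the DQL spelling is an assumption.
DEPRECATED_FIELD = re.compile(r"\b(industries|diffbotClassification)(?=[.:])")


def find_deprecated_fields(query: str) -> list[str]:
    """Return the deprecated field names a query uses, in order, without repeats."""
    return list(dict.fromkeys(DEPRECATED_FIELD.findall(query)))


saved_queries = [
    'type:Organization industries:"Software Companies"',
    'type:Organization categories.name:"Software Companies"',
]
for q in saved_queries:
    print(find_deprecated_fields(q), q)
```

This is a plain text match, so a field name that appears inside a quoted value would also be flagged; treat the result as a list of queries to review rather than a definitive audit.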

May 24th, 2024

improved

Updated Organization Ontology: Organization.categories

by Kris Negulescu

We've made significant improvements to our industries classifier. This v2 classifier includes over 200 additional fine-grained categories of companies. While most of the existing categories are unchanged, you'll want to review the updated labels. The new industry taxonomy, now called Organization Categories, is documented here: https://docs.diffbot.com/docs/organization-industries

May 14th, 2024

added

Try the Natural Language API Without Code

by Jerome Choo

Did you know Diffbot has a Natural Language API? It can extract not only entities but also the relationships between them from any body of text.

Use the NL API to build Knowledge Graphs from large corpora, or highly personalized news monitoring pipelines.

And starting right now, you can explore the Natural Language API in the dashboard without code. Give it a go at https://app.diffbot.com/natural-language/