Diffeo Dynamic Knowledge Discovery for All-Source Fusion


The Diffeo dynamic knowledge discovery system helps analysts rapidly uncover relevant content that expands their perspective on entities of interest as they assemble dossiers and explore networks of related entities. Basis Technology’s advanced linguistics platform is a key component of Diffeo’s cross-lingual discovery solution. Unlike a traditional keyword search engine, the Diffeo system applies natural language technologies to the user’s in-progress working notes to build an algorithmic model of the notes’ subject. Using this model, it scans the available source documents and brings the analyst results with greater coverage and accuracy than keyword-based search provides.

This new approach offers an alternative discovery path for finding key linkages across semi-structured data fast enough to be relevant in cyber time. The system is tuned to uncover relationships in cyber-oriented reporting, supply chain risk management (SCRM), and open source intelligence (OSINT) content, with the goal of helping analysts see information that they did not initially realize was available for retrieval, i.e., unknown unknowns. It adapts to the user’s current focus by continually retraining its text mining models, applying state-of-the-art machine learning trained through the analysts’ natural interactions.

This machine-in-the-loop approach allows algorithms to “hang in there” as users pursue deep networks of entities and long query sessions, achieving human-level accuracy at machine scale.

In this talk, I will demonstrate semi-structured identifier chaining through private reports and open source data gathered by automated metasearch, which accesses multiple layers of the open and dark web.

By uncovering networks of related entities across open and dark web content, these algorithms support a new form of machine-assisted research in which the system recommends content for users to incorporate into their working notes. Instead of requiring users to craft complex queries in a pattern specification language, this system infers the utility of text passages and entity mentions from available source texts by automatically comparing them with the natural language in the working notes. The simple act of citing a source document provides active learning feedback that simultaneously sharpens the models’ representation of a user’s interests and intents and broadens the system’s ability to traverse farther into the data to find new elements that the user did not yet know about.
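
The feedback loop described above can be illustrated with a toy sketch. This is not Diffeo's implementation: plain bag-of-words cosine similarity stands in for the system's learned models, and the names NotesModel, rank, and cite, as well as the sample notes and passages, are hypothetical and chosen only for illustration. The sketch shows how citing one passage can pull a second, previously unrelated passage into view.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class NotesModel:
    """Hypothetical stand-in for a model built from working notes."""

    def __init__(self, notes):
        # The profile is simply the term counts of the notes so far.
        self.profile = Counter(tokenize(notes))

    def score(self, passage):
        return cosine(self.profile, Counter(tokenize(passage)))

    def rank(self, passages):
        """Return (score, passage) pairs, best match first."""
        scored = [(self.score(p), p) for p in passages]
        return sorted(scored, key=lambda pair: pair[0], reverse=True)

    def cite(self, passage):
        """Assumed feedback step: fold a cited passage into the profile."""
        self.profile.update(tokenize(passage))


# Made-up example data, not drawn from any real report.
passages = [
    "Acme Corp signed a supplier contract with Foo Ltd",
    "Foo Ltd shares a registered address with Bar Shipping",
    "Weather forecast for the weekend",
]
model = NotesModel("Acme Corp supplier in Shenzhen")
top_score, top_passage = model.rank(passages)[0]
model.cite(top_passage)  # citing the best match extends the profile
for score, passage in model.rank(passages):
    print(f"{score:.2f}  {passage}")
```

In this toy version, the passage about Foo Ltd and Bar Shipping shares no terms with the initial notes, so it scores zero until the first passage is cited; after that, the overlap on "Foo Ltd" gives it a positive score. A production system would presumably rely on entity resolution and trained models rather than raw term overlap.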