Monday, October 13, 2008

Finding Meaning One Word at a Time

When analyzing textual data, there is a tendency to go narrow (within a document) rather than wide (across sets of documents within a corpus). Historically, going wide and deep has been expensive in processing power and human resources. No longer. Blue Ant, which incorporates the iQuest methodology, adds speed and accuracy to the investigation process.

When analyzing a corpus of documents for meaning, a user must go beyond what is contained in individual documents and investigate how documents are related by common terms, concepts, ideas and meaning. Sometimes meaning appears only in the relationships among a set of documents and is not discernible in any individual document. Thus, an analytic tool must be able to guide the user into seeing document relationships. At a fundamental level, these relationships are the terms that are common among any given set of documents. Term-level relationships can then be built up into relationships of concepts, ideas and meaning.
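To make the idea of term-level relationships concrete, here is a minimal sketch in Python. It is not how Blue Ant or the iQuest methodology works internally; it simply links every pair of documents by the words they share. The function names, the tiny stopword list and the sample memos are all assumptions made up for illustration.

```python
import re
from itertools import combinations

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = frozenset(
    {"the", "a", "an", "of", "to", "in", "at", "on", "and", "is", "was", "were"}
)

def terms(text):
    """Return the set of lowercase word tokens in text, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def shared_terms(docs):
    """Map each pair of document ids to the set of terms they have in common."""
    term_sets = {doc_id: terms(text) for doc_id, text in docs.items()}
    links = {}
    for a, b in combinations(sorted(term_sets), 2):
        common = term_sets[a] & term_sets[b]
        if common:  # keep only pairs that actually share a term
            links[(a, b)] = common
    return links

# Invented sample documents; no single memo tells the whole story.
docs = {
    "memo1": "A truck was rented in Springfield.",
    "memo2": "Pallets were delivered to a warehouse in Springfield.",
    "memo3": "The warehouse lease names the driver of the truck.",
}

links = shared_terms(docs)
for pair, common in sorted(links.items()):
    print(pair, sorted(common))
```

Each memo on its own says little, but the shared terms ("springfield", "truck", "warehouse") connect all three. A real tool would go well beyond raw word overlap, which is only the lowest rung of the ladder described above.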

An example of this might be an intelligence analyst looking for preliminary indications of a terrorist event. It is extremely unlikely that the planned event will be fully described in a single document. But it is possible that pieces of information about the event are spread over many documents in the global data available to the analyst. If the analysis tool can guide the analyst to these related documents, the analyst may be able to piece together the planned action before the event occurs.

 A very important tenet of the iQuest model is that meaning is found not only in individual documents but also, often more importantly, in the relationships among documents.

 


Sunday, October 12, 2008

Finding Meaning in Unstructured Data

Unstructured data is all the rage. There is so much of it, in company repositories and on the web. Making sense of it is the challenge at hand. Making sense of it without bias is a massive problem. I have been working on the problem for four years now and believe I have found a reasonable approach. My company, iQuest Analytics, is releasing a grammatically driven search and discovery engine called Blue Ant. Blue Ant leverages the indexed content of enterprise information repositories, using a process that extracts grammatical relationships, such as parts of speech, through natural language processing and semantic analysis. Blue Ant lets the data speak for itself. Blue Ant elevates patterns within the content, creating dynamic, contextually linked data. I believe that no other unstructured data analysis tool can accurately make this claim.