
How to Find Alpha Needles in Unstructured Data Haystacks

Traders Magazine Online News, September 12, 2017

Barry Star

The challenges of working with unstructured data came into sharp focus last year when a self-driving car was involved in a fatal crash on a Florida highway. The NTSB ultimately found that the driver-assistance system was not at fault. Federal regulators did warn, however, that drivers can rely on these systems to handle only some of the situations that occur on the roads.


To get the Wall Street equivalent, just change a few words in the regulators’ warning. Traders can only rely on these systems to handle some of the situations that occur in the markets.

While not as catastrophic as a collision with an 18-wheeler, a bad trade based on a poor assessment of unstructured data could cost a firm millions.

That said, there is untold value and potential alpha tied up in the vast amounts of unstructured data now available. The key question, therefore, is how best to identify it, extract it, and leverage it to improve trading and investment results.

The Nature and Scope of the Unstructured Data Problem

While definitions vary, unstructured data is content that doesn’t follow a pre-defined data model. Unlike the tidy and consistent rows, columns, and formats of structured data, unstructured data is all that messy and free-form content that people generate, mostly intended for use by other people.

It’s the free-form material in e-mail and text messages, blogs, videos, podcasts, chat sessions, and all kinds of marketing material, including web sites, marketing and sales collateral, white papers, slide presentations, and more. Essentially, it’s all stuff that doesn’t fit nicely into databases and spreadsheets.

Gartner and other market research firms agree that unstructured data comprises the lion’s share of most organizations’ informational assets – somewhere in the 80% range. The problem is that we’re still doing a poor job of mining the value out of this huge category of data.

Options for Moving Forward

Many Wall Street firms want to leverage various forms of unstructured data to generate profits and avoid losses. They want to wring the value out of that data to gain a clearer picture of their markets, spot patterns and anticipate developments more effectively, and take faster action to seize opportunities and sidestep risks.

These companies have three choices. They can build the software themselves; they can take a shortcut that skips the hard work of translation and analytics and instead uses keywords to try to gauge sentiment; or they can buy or subscribe to a third-party system. Let’s look at the pros and cons of each option.

The “build” option can work if the company’s unstructured data initiative is narrowly focused and of limited scope. A good in-house development team can create and piece together all the taxonomy, text parsing functionality, analytics and metadata management functions required for a focused application.
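
To make the keyword shortcut mentioned above more concrete, here is a minimal sketch in Python. The keyword lists, the function name, and the sample headlines are all hypothetical, chosen only for illustration; they are not taken from any real trading system. The sketch simply counts positive and negative keywords, which is roughly what "using keywords to attempt to gauge sentiment" amounts to.

```python
import re

# Hypothetical keyword lists, chosen only for illustration.
POSITIVE = {"beat", "upgrade", "growth", "record", "strong"}
NEGATIVE = {"miss", "downgrade", "loss", "lawsuit", "weak"}


def keyword_sentiment(text: str) -> int:
    """Return (positive keyword hits) minus (negative keyword hits)."""
    # Lowercase the text and split it into simple word tokens.
    words = re.findall(r"[a-z']+", text.lower())
    positive_hits = sum(word in POSITIVE for word in words)
    negative_hits = sum(word in NEGATIVE for word in words)
    return positive_hits - negative_hits


# Made-up sample headlines, not real news.
headlines = [
    "Company posts record growth after analyst upgrade",
    "Company avoids loss, shrugs off lawsuit",
]
scores = [keyword_sentiment(headline) for headline in headlines]
print(scores)  # [3, -2]
```

The second headline is arguably good news, yet it scores negative because the counter sees "loss" and "lawsuit" without the surrounding context. That is the kind of poor assessment a keyword-only approach can produce.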
