MoneyScience Blog

Sifting for Sentiment - Interview with DataSift's Robert Passarella

Fri, 22 Feb 2013 05:11:00 GMT

In a second interview from last year's Battle of the Quants event in London, I sat down for coffee with Robert Passarella, the Managing Director of the Financial Institutions Group at DataSift. In a wide-ranging conversation, we talked about DataSift's Sentiment Analysis technology and the changing media landscape, touched on the impact of Big Data in the financial space, and addressed the skeptical view that Sentiment Analysis is being unduly hyped.

Robert is a former VP at Morgan Stanley and JPMorgan, and was a Managing Director at Bear Stearns before joining Dow Jones as VP responsible for the Dow Jones Institutional Markets Business. In his role at DataSift, Robert is responsible for product strategy, business development and building the business for financial services clients.

Jacob Bettany: Thanks very much for joining me today, Rob. Perhaps you can start by giving me a brief overview of DataSift and your technology.

Robert Passarella: The way I look at it, there are two companies in DataSift. One is the Big Data side of what we do which is the infrastructure itself. Our Founder Nick Halstead also founded Tweetmeme, so he's the creator of the Retweet button, and did a lot of work cataloguing and searching tweets. What grew out of that was expertise with Hadoop - we probably run the largest Hadoop cluster in Europe. The reason we use Hadoop is the power and the scalability it provides to our infrastructure to support what we do on the other side, which is the data.

We ingest and aggregate data from different social data providers, the best known of which is Twitter, and we then filter and enrich (augment) that data. We process that data in real time and add what we call 'augmentations' - so we'll add sentiment to what comes in, along with a lot of metadata. For example, when you take a Tweet, which is 140 characters, there is probably more metadata associated with the 140 characters than information in the message itself. On top of Twitter profiles, Retweets and number of followers, we also add sentiment, Klout score, NLP (Natural Language Processing), trending data, and more. And this all gets processed in real time - that's thousands of tweets per second.

JB: So you have access to the firehose?

RP: Yes, we have access to the full Twitter firehose in real time and we also store it historically. So the reason we have the Big Data infrastructure built on Hadoop is not only to process this data, but also to recall it - and we can recall it very quickly.

We have developed a filtering language called CSDL (Curated Stream Definition Language), which is a tool we use to filter and normalise the data, so a single query can include multiple data sources. Let’s say you wanted to run keyword searches on topical things: you create a real time API and filter across Facebook, message boards, NewsCred (which is a provider we have with access to more than 3000 data sources) and Twitter. Let’s say you wanted to look for anything related to Apple product information, and you have a set of queries around Apple products, people at Apple, a news story - even an Amazon listing - we would provide that as a single data file. That provides a lot of flexibility and power, and you can turn the resource on or off as you like.

The model for DataSift is to deliver the solution as Software as a Service (SaaS), which enables people to pay for only what they consume. So if you know exactly what you want to do and create filter searches for that, there are two component costs, a content cost and a processing cost - or what we call a DPU (Data Processing Unit).

Because the problem with social media when it comes to companies in finance is finding the information, we took it from a different point of view. We've created filters to find the content which is company related, tag it as company related, and provide that feed at a fixed price on a monthly basis, much like a news feed. So we're really trying to reach a professional market.

The folks in finance really do expect to be sold a product which is at a fixed cost, which is repeatable and on a model they are accustomed to. So today we're doing all of the dollar-sign tickers, which will be provided as a feed, as the first step, and then we're layering on top of that, sentiment, all the link mechanics, and descriptive metadata about the people who are posting the Tweets. So if they've identified themselves somewhere as being an analyst, a trader or a broker, that's also included - and there are hierarchies, so there are media and journalists as well. We're starting to label that out so that it's much easier for people who want to process content, and look at company-related stuff coming through Twitter.

The next step really is that we're starting to - for the S&P 500 - tag things which don't have a dollar-sign, and we're aiming by 2013 to do the same for the Russell 3000. In this case, we’re actually tagging the companies, so if people are mentioning product names, people, sources, the name of a company, and all the extended metadata, they can feel confident that they are getting high quality information about what people are saying about companies, all in one stream.
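As a rough illustration of the two tagging steps described above (dollar-sign tickers first, then mentions without a dollar-sign), here is a minimal Python sketch. It is not DataSift's implementation and does not use CSDL; the `ALIASES` table, the `tag_companies` function and the simple keyword matching are all assumptions made for the example.

```python
import re

# A cashtag is assumed here to be a dollar sign followed by 1-5 letters, e.g. $AAPL.
# The lookbehind avoids matching in the middle of a word or after another "$".
CASHTAG_RE = re.compile(r"(?<![A-Za-z0-9_$])\$([A-Za-z]{1,5})\b")

# Hypothetical alias table mapping tickers to company, product or brand names.
# A real system would need a far larger, curated list and disambiguation.
ALIASES = {
    "AAPL": ["apple", "iphone", "ipad"],
    "MSFT": ["microsoft", "windows"],
}


def tag_companies(text):
    """Return the sorted tickers a message appears to mention."""
    # Step 1: explicit dollar-sign tickers.
    tickers = {match.upper() for match in CASHTAG_RE.findall(text)}

    # Step 2: messages without a dollar-sign, matched on whole-word aliases.
    lowered = text.lower()
    for ticker, names in ALIASES.items():
        if any(re.search(r"\b" + re.escape(name) + r"\b", lowered) for name in names):
            tickers.add(ticker)

    return sorted(tickers)


print(tag_companies("Long $aapl into earnings"))
print(tag_companies("The new iPhone looks great"))
print(tag_companies("Paid $100 for lunch"))
```

Plain keyword matching like this is naive: an alias such as "windows" will also match messages that have nothing to do with the company, which is one reason the name-based step is harder than the dollar-sign step.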

JB: Where you have investors with portfolios that are constantly evolving and developing, are you able to provide a service that adapts as their requirements adapt?

RP: Well, you can look at it in two ways. Where people want to use the SaaS model, they really have a lot more control ultimately over what they do, because they can change their APIs, they can change the functionality. On the flip side, what we're finding, specifically for finance, is that people don't want extraneous data - they don't want the whole firehose - they have clear expectations. So for instance they want to look at - what I would call - the link capital that's contained inside social media. When you think about Twitter, 25-30% of messages contain some kind of link, and it seems to be that it's linking to a news story, or a primary source, and to me that's very powerful - as you're watching, almost in real time, how a story or a theme or a topic explodes - and that happens a lot in finance. People will point to an article, there will be a response, comment or opinion on that, and you'll see a ripple effect and diffusion. That really does have an effect, in some cases, on the market psychology.

We had a conference yesterday talking about sentiment, and sometimes people talk about it as if it's a new thing, but sentiment has been around forever, since we've had 24 hour cable news shows that do business, and try and find out what traders are thinking and how they feel. Right here we have the perfect laboratory now, with social media, where people who are involved in markets are telling you exactly how they feel, so we have that edge and can aggregate and actually do something quantitative with that data.

JB: I get that Twitter is really popular and we’re obviously both evangelists for it as a tool, but surely there’s a big population of the investing community who's not engaged with Twitter. Aren't you therefore getting a very self-selecting viewpoint on the markets from the Twitterati? Or do you see their viewpoints as representative of the trading community as a whole?

RP: I do see that, and this goes back to a point we try to make a lot, which is that one thing doesn't serve all. There's no one magic bullet to solving a problem. Twitter has a place right now, and it's a growing place, but it's early days - which is good. We're seeing adoption right now from the places you would expect, from Media Organisations, who are using it as an open news feed. Information you would normally see on a 'real-time' feed that you would pay for, you're starting to see on Twitter. On the flip side, if you're employing a strategy based on sentiment, you're going to want to use multiple sources, and Twitter is a growing one, but news is clearly another one.

For us, that's where NewsCred comes in. They aggregate the news from all the big content providers that are out there, something like 3000+ sources and they're growing.

The way the market has changed is interesting. News is online for the most part, but it's shifted. The market for terminal-based, professional-based news is shrinking. God isn't making any more traders or market professionals. With the financial crisis, the number of people who actually view real-time news on a platform has shrunk. What's happened is that there are still plenty of people who want real time news, but they have moved to another marketplace. The news media has understood this, if you look. Everyone is out there trying to figure out exactly how to drive traffic to their websites - and what they're finding out is that the best way is to leverage social media. So I'm watching, as if in slow motion, every media company turning to Twitter and asking how they can use it to drive traffic, asking how they can get people in this open forum to come back to their website, where they could get paid either from advertising or from a subscription. This is very similar to what happened with RSS. To me, it's a natural progression. Twitter will become more and more professional.

JB: A lot of what you've talked about feeds into the idea of Big Data which is obviously something of a buzzword at the moment. Do you think this new environment demands a new set of skills?

RP: Absolutely. In finance, Quants are changing too! I had this conversation with the guys at Wilmott magazine: they are becoming data scientists. It used to be that it was hard to get data, but these days data is not the problem. The problem is actually finding tools to give you value, and to find things in the data. So Quants are becoming data scientists, and indeed, data scientists are becoming quants.

JB: When you started with DataSift, was finance your target market, or did you have a broader view of the applications of this technology?

RP: I actually came on board in the last few months to focus particularly on finance, but DataSift actually has a much larger business right now that goes across Media, Consumer Products, anyone who wants to use any of the tools to analyse the data that's out there, especially from Twitter.

I was just out at a large consumer product company which is very interested in figuring out how their products and campaigns are doing. They're very much thinking about their strategy on Facebook, and how that plays to Twitter, and who's linking, and how hashtags perform, so DataSift's core business grew up around that. What naturally happened is that DataSift started getting a lot of inbound inquiries from financial firms, hedge funds and others saying, we want to examine this data, we want to use this data, so it made sense to start thinking about building a financial vertical, and that's how I got involved. Being here at this conference, we've been very busy talking to people and I've been pleasantly surprised by the applications that people are considering.

JB: There are going to be people who approach sentiment analysis with an element of scepticism. I mean, there's always been data, and there have always been problems understanding that data. People might suggest that there seems to be some hype around Sentiment Analysis as an idea. Is there a risk that providers like DataSift are making a bigger deal out of this than is justified? What's the difference now?


RP: I think sometimes it's easier to look at these things, not while you're in the middle of it, but in retrospect. So for instance I can remember specific meetings at Morgan Stanley in 1995 trying to explain to Senior Management why we need a website. Senior Management would look at me and say, ‘but isn't that for retail companies? Why would we, Morgan Stanley, want a public presence on the internet, that's just nuts!’ And I would lay out what it meant for client interaction, and talk about firewalls and password protection and explain how we could take all that data that we ship everywhere - like research reports, customer information - and actually save money as well as making it easier for the customer to get what they want 24-7.

In this case, quite clearly there's something going on. The number of Tweets is increasing geometrically every 9 months. There's a huge amount of information being exchanged here, and you could argue that a lot of it is Justin Bieber, One Direction, Lady Gaga, what I had for breakfast, but there's a lot of good signal as well, and the greatest part is that the same API which sends that out is available for anyone to use with the right tools and technology.

That to me is what's amazing: that open API that's out there called Twitter, which allows people to communicate in real time with massive numbers of people who wish to follow, can be utilised in many different ways. That's exactly where we were with the internet when it was opening up to business in '94 or '95.

JB: Of course Twitter are closing down the API, or at least they're changing the rules, is that a problem for DataSift?

RP: No, we have access to the full Twitter firehose, and one of the ways to get that information is through us. So I understand what Twitter is trying to do, I understand the business model, which I think is valid for them, but the flip side for us is that we're a platform that's available to anybody. We have the power and the scalability to handle the data loads that are coming through. For professional use, and use in a business context, getting it from us is a great choice because you get the SLAs and support that Twitter may not be able or willing to offer in the context of providing that API.

JB: So you have a particular licence that allows you to do that?

RP: Right, that's one of the keys that we have. We're one of just three Twitter Certified Data Reseller partners in the world that are allowed to do that. It's nice to have Twitter say, we'd like you to do this, it helps.

JB: Following the release of the social feeds for the financial services industry, what's next?

RP: Now that we've laid a foundation, it will be to continuously add more data to this and make it, in my opinion, a true newsfeed coming from social media. So we want anyone in the business world who is interested in knowing what is being said about companies, or products, or themes in social media to get a feed like this from us. So in the roadmap we're asking, how can we handle macro? How can we handle commodities? How do we make this accessible to more people?

JB: Thanks very much for joining me Rob. I'll very much look forward to connecting with you again in the future to find out how things work out.

RP: No problems.
