What's Your Prior?

Data Hunters: how traders use data to beat the markets

April 23, 2021 Damian Handzy Season 1 Episode 8

Data is the lifeblood of investment research and of making investments that beat the market. This episode's guests are all data experts:


Speaker 1:

Making money by investing boils down to just one thing: having access to information before others do. A hedge fund client of mine once told me that back in the late seventies and early eighties, long before the internet, his commodity trading firm hired accounting majors all over the US to go into farmers' fields and, with the farmers' permission, place small devices that the firm had custom made to record things like temperature, humidity, soil acidity, and rainfall: essentially everything you need to know to accurately estimate how much of each crop will grow that year. Every two weeks, the college kids would swap out a disc full of the information and snail-mail it back to the home office, where they pored over it. That was data nobody else had, and they used it to decide how much of each agricultural future to buy or sell. Today, he bemoaned, everyone is walking around with access to that kind of information about every market in the world in their pockets. It's much harder today to beat the market than it was before this explosion.

Speaker 2:

Hello, and welcome to What's Your Prior?, the podcast for the adaptive investor, with your host, Damian Handzy.

Speaker 1:

This episode is actually part one of a two-part series on what I think of as the lifeblood of investment management: data. In this first part, we're going to explore how investment managers extract useful information from data that already exists, or from data that still needs to be generated, and the challenges associated with that effort. In the next episode, we'll explore it the other way around: how the industry goes about gathering and packaging quality data to answer questions that everyone wants answered, because they believe those answers contain nuggets of gold in the form of higher returns. In that next episode we'll focus on ESG, but for both of these episodes I'm fortunate enough to have three leading industry data experts to explore the topics with me. My guests for both episodes are Lisa Connor, Jeremy Boshed, and Bill Haney. Now, can I ask each of you to please briefly introduce yourselves?

Speaker 3:

I've worked in market data for 25 years, at least, and currently I'm the North America client services representative for RIMES Technologies.

Speaker 2:

Hi, thanks for having me, Damian. Bill Haney here. I've spent my career in information, data, analytics, and workflow, serving the holy trinity of capital markets: issuers, intermediaries, and investors.

Speaker 3:

Hi, this is Jeremy. Really glad to join you all. I've spent the majority of my time at three different startups over the last decade, and I spent about four years in data, specifically at Bloomberg doing alternative data. Now I'm at a fintech lender that uses high-speed data to help underwrite consumer goods companies; it's called the make growth.

Speaker 1:

All right, thanks. It really is a pleasure to have each of you on the podcast, and I'm looking forward to today's conversation. So today we're talking about the challenges of extracting alpha from the available data, especially with the massive increase in the amount of data we now have. So: what is the state of the industry today? What data can firms actually use to their advantage, given that it seems everyone has access to all this data?

Speaker 2:

I'll jump in with a bigger theme, but I'll let the experts take it from there. I think you're absolutely right: there's been an explosion of data. If you're an active manager and you wake up in the morning saying, I need a unique advantage, but then you look out at the landscape of available information sets, your job is hard. And by the way, this is true whether you're fundamental, because even fundamental managers look at enormous amounts of data, as well as quantamental, as well as the extreme quants like Two Sigma and WorldQuant, which have, you know, enormous amounts of resources. And that data is not clean; it's messy. It has all sorts of problems with it, whether that's time series problems or tagging problems. So your job couldn't be harder than in the story you told, just because of the sheer quantity of things you have to look at. It's created real problems in doing our job, which is to make those managers efficient and insightful.

Speaker 4:

I think it's not only a question, Bill, of efficient and insightful, but of serving them up what they need in a format they can use, so they don't have to wonder whether there's additional data on the periphery that would give them that extra edge. You don't want the alpha generators, the people making revenue for your firm, questioning whether there's additional data they need or additional data they're missing. So you need to build something that delivers complete and accurate data, but is flexible enough to add to when additions are needed.

Speaker 1:

Okay, I'm liking this. I've heard a few things so far that really resonate with me. First, Bill, I heard you mention that even fundamental managers pore over lots of data. I've got to tell you, at Investment Metrics, part of what we do is factor analysis, and the amount of fundamental data that goes into measuring factor exposures and factor returns, and figuring out how much of a fund's returns are due to each factor, is monumental. Each month we calculate 130 different sub-factors across 75,000 global stocks and over 650,000 different funds and benchmarks. That adds up to about 10 terabytes of market data and seven terabytes of our own calculated data. And Lisa, to your point, all of that data needs to be timely, accurate, and, as you rightly pointed out, complete; we can't have gaps in that data. It adds up to a staggering amount of work that most people never see. Jeremy, what do you think about all this?
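
To make the factor attribution idea concrete, here is a minimal sketch of a cross-sectional factor regression: estimate factor returns from stock returns and exposures, then attribute a fund's return to its exposures. The factor names, data, and numbers are illustrative assumptions, not Investment Metrics' actual methodology.

```python
# Illustrative only: toy cross-sectional factor regression with made-up data.
import numpy as np

rng = np.random.default_rng(0)
n_stocks, factors = 5000, ["value", "momentum", "quality", "size"]

exposures = rng.standard_normal((n_stocks, len(factors)))   # standardized exposures
stock_returns = rng.normal(0.005, 0.05, n_stocks)           # one month of stock returns

# Estimate factor returns: regress the cross-section of returns on exposures.
X = np.column_stack([np.ones(n_stocks), exposures])         # intercept + exposures
coef, *_ = np.linalg.lstsq(X, stock_returns, rcond=None)
factor_returns = dict(zip(["market"] + factors, coef))

# Attribute a hypothetical fund's return: exposures times estimated factor returns.
fund_exposures = {"value": 0.6, "momentum": -0.2, "quality": 0.3, "size": 0.1}
attribution = {f: fund_exposures[f] * factor_returns[f] for f in factors}
print(factor_returns)
print(attribution, "explained:", factor_returns["market"] + sum(attribution.values()))
```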

Speaker 3:

I think I would add a few different elements to this: data discovery, gluing the data together, getting the ingredients into the box. All of that comes with a price, and a very heavy price, given the speed and veracity of the data that we all carry around in our phones and can scrape from the internet. So I think deriving useful insights and analytics is obviously the most important part of this stack, and it really comes back to the size of the institution, the commitment to spend on tools and assets, and also the framework of the firm. If an asset manager is looking at very long-duration trades, they're a little less focused on nowcasting, a little less focused on the intra-quarter activity. If they're top-down, maybe they can survive with a little bit less data, and it's more about narratives. Whereas if they're much more bottoms-up in highly liquid markets, in very highly liquid equities, they're going to need everything. If you're nowcasting a highly liquid momentum stock, you're going to need every piece of data you can get. You're going to need the web scraping; there are at least five very expensive data sets you would need to trade that ably if you're a short-duration trader. If you're a long-duration trader, some will still say, "I love it and I'll pay any price for it," but if you have a three-to-five-year outlook, that's perfectly fine. So I think it swings back to the nature of the firm, the duration of the trades, the commitment to budgeting, and then, back to your point, Lisa, on process: gluing it all together and actually getting some analytics out of it.

Speaker 2:

You raise a really interesting point, Jeremy, about commitment. I have found a very wide range of financial and data commitment, even amongst the largest firms, you know, the top 25 global asset managers. For example, even when they create centralized teams to ingest, stage, and serve data up to the investment teams and their analysts, they really struggle to commit the number of resources required to go deep around one issuer, as you just mentioned. And so I think, with fee compression and all sorts of expense austerity, forget the pandemic, even since the great financial crisis, ramping up to the level of commitment required has been a struggle. So Jeremy, how do you think asset managers think about the economics of the data: what's worth getting and what's not?

Speaker 3:

What I often found when you talk to a quantamental type of shop is that you talk to the data hunter, and they sort of say: okay, if your data set costs me $200,000 a year, how much ROI can I meaningfully derive out of it? Do I think this is a three-x, is it a five-x? Will I lose my job over it if I don't have it, or will my team be underserved in a way that's meaningful to them and I'll get in trouble for it? Or, if I overextend myself and spend too much money, am I basically on the hook? So I think part of the job of the data hunter is to find things that have never been found, while at the same time keeping the integrity of the job and, frankly, keeping their own job. If somebody wins on a trade, it's "I always knew Carvana was fantastic." If somebody loses on a trade, they can say, well, I really didn't have the right information, or we overpaid for the wrong information, or somebody on the team signed up for a data feed that wasn't able to stay consistent. You know, a lot of times data vendors, when they get in trouble, can create data, or falsify data, or do whatever they need to do so that the API doesn't get shut off, because the worst thing in the life of a data vendor is when their APIs go down; if they go down too often, they can be shut off for good. And then the data procurement team is saying, well, I didn't have a backup for that data set, or we only buy one credit card data set, we don't have three in a mosaic, and if one of them goes down, or one of them ends up on the wrong side of a privacy legal representation, then all of a sudden I don't have the data that my team needs to trade. There's a lot that goes on in this chain of custody of finding the data, procuring the data, and getting paid on that data, and a lot of sensitivity around keeping your job, which I understand very well. So there's a lot of stress that goes into this whole supply chain.
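
That "three-x or five-x" calculus is, at its core, a simple expected-value calculation. Here is a hedged back-of-the-envelope sketch; the cost, alpha, hit-rate, and AUM figures are invented for illustration, not anyone's actual budget.

```python
# Toy data-purchase ROI estimate. All numbers are invented for illustration.
def dataset_roi(license_cost, integration_cost, aum, expected_alpha_bps, prob_it_works):
    """Expected ROI multiple on the total cost of owning a data set."""
    total_cost = license_cost + integration_cost
    expected_pnl = aum * (expected_alpha_bps / 10_000) * prob_it_works
    return expected_pnl / total_cost

# A $200k/year feed plus $100k of integration work, on a $500m book,
# hoped to add 3 bps of alpha with a 60% chance of actually panning out.
multiple = dataset_roi(200_000, 100_000, 500_000_000, 3, 0.6)
print(f"expected ROI multiple: {multiple:.1f}x")  # ~0.3x, so probably not worth it
```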

Speaker 4:

I think, Jeremy, you raised a lot of interesting points. Procuring the data and negotiating the fees has obviously been a hot topic across all parts of the business, right? Because it's the front office that's actually paying for it, so they're asking, what's the value add for all of this content that you're laying at my feet and enabling me to use? But the other thing is that search, to use your term, by the data hunters: that search for the niche service. You may be willing to pay a lot if you can be sure it's going to be the differentiator for your trading strategy. So there's a lot of complexity to that. You, with your Carvana example, outlined a lot of different data types: foot traffic, credit card data. How do you knit that to traditional financial market data sets and make it usable without adding ten different IT layers and all the infrastructure you need to do that? I think it's not only the cost associated with the data, but the cost to maintain that data and deliver it in a usable format. So there are additional layers of cost that a firm needs to be able to accept and say, this is what's going to put us head and shoulders above our competitors.

Speaker 1:

Okay. I've worked in analytics firms that have all the data in one massive curated database, and I've also worked in shops where the data is spread over many different databases and database types, right? Some SQL, some key-value stores, data lakes, sometimes hyper-normalized data, duplicated copies, edited data, just a mess. So how do we make this efficient? I mean, is it even worth trying?

Speaker 4:

Yeah, I think that's the key, right? How do you stitch it together and deliver it? That's a problem everyone in the business is trying to solve. We've all heard about alternative data and alternative data opportunities for years now, but how you stitch that together is something everyone in the business is struggling with right now.

Speaker 3:

I think Lisa brings up an excellent point. If you look at the different constituents, you have exchanges like LSE, and even Liquidnet, buying into data and analytics companies to try to form that chain, and I think some of that is very specific to market tick data and pricing data. That's one way to attack it. Or you have the alternative data basket; some people call it external data, some people just call it data, training data, whatever you want to call it, but non-traditional financial data can often come in various forms, and it can be very hard to link it all together. So I see point solutions that I like; I think Snowflake, the data catalog type of companies, and even Databricks will eventually converge on trying to get all this data into at least the same place. But, as Bill touched on, the concordance of linking back to tickers is a monumental task. Now you have more than 10,000 tickers, right, with the advent of SPACs and all the other new public instruments, and trying to link all of this unique data back to a public instrument tends to be either top-down or bottoms-up. I've got companies that I advise that do very specific things. One company, Particle One, actually does commodities knowledge graphs and links that exposure to public companies: what is the coal or cobalt exposure of, say, Tesla? These are the types of things that that particular company is trying to map. So there are bottoms-up point solutions, there are top-down cloud infrastructure solutions, and then you have companies like Crux meaningfully working through this as a connectors business for the Two Sigmas and Citis of the world. It's somewhere between all these boundary layers that you find the answer, if you're willing to spend the time and willing to spend the money.
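
The "concordance" problem described here, linking an alternative data record back to a tradable identifier, typically reduces to joining through a mapping table. Below is a minimal sketch; the table, aliases, and identifiers are hypothetical placeholders, not any vendor's actual symbology.

```python
# Hypothetical concordance join: map messy entity names from an alt-data feed to tickers.
import pandas as pd

# Alternative data often arrives keyed on free-text entity names.
alt_data = pd.DataFrame({
    "entity_name": ["Tesla Inc.", "tesla motors", "Carvana Co", "Unknown Metals Ltd"],
    "signal": [0.8, 0.7, -0.3, 0.1],
})

# A concordance table maps known aliases to a stable identifier and ticker (placeholders).
concordance = pd.DataFrame({
    "alias": ["tesla inc", "tesla motors", "carvana co"],
    "entity_id": ["ID_0001", "ID_0001", "ID_0002"],
    "ticker": ["TSLA", "TSLA", "CVNA"],
})

alt_data["alias"] = alt_data["entity_name"].str.lower().str.rstrip(".")
linked = alt_data.merge(concordance, on="alias", how="left")

# Unmatched rows are where the real cost lives: they need review, not silent dropping.
unmatched = linked[linked["ticker"].isna()]
print(linked[["entity_name", "ticker", "signal"]])
print(f"{len(unmatched)} of {len(alt_data)} records failed to map")
```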

Speaker 1:

Jeremy, I started the entire episode with that story about the commodity hedge fund that sourced its own data, which today we would call alternative data, right? And that's an area you've worked in extensively, including your time at Bloomberg. Just for the listeners, the quintessential example of alternative data that I recall when it first came out was parking lot statistics, right: satellite data, or Twitter feeds, used to estimate the number of people actually in the stores, so that you can get a sense of whether that particular store, if it's a public company, is gaining or losing customers. So, things that are not in the typical data set: where does that fit into this entire discussion, Jeremy?

Speaker 3:

One of my observations about alternative data from my time at Bloomberg is that I probably met with 300 companies or so in three years, and we ended up partnering with about 10% of them. Most of them would fail some layer of compliance: do they own the data? How did they get it from the web? Where was the sourcing of it? Is it unique? Do they own the collateral? Is it consistent? Can it be delivered in a format that makes sense? There's so much that can happen from point A to point B in that chain of custody that new data sets are very, very challenging to use. So sometimes I harken it back to market maturity. There's a big difference between equities data, crypto data, and frontier market data, and linking that back to the opportunity set. Coinbase, for example, if you read the [inaudible], they'll say they actually make 50 basis points on every trade; if Bitcoin's at $60,000, Coinbase may be making about $300 every time a Bitcoin trades. Whereas, as you can guess, on the NASDAQ and NYSE or CME, getting that, you know, fraction of a penny on each trade is very exciting. So I think there's so much opportunity in some of the frontier markets and new asset classes that it allows for some sloppiness around this whole process, but with common equities or very mature assets, every dollar counts, every penny counts. So understanding even the compute price of Amazon rendering you the data, you know, that drip, everything counts. In the mature markets it's a super competitive business, and in the newer markets you're allowed a little bit of flexibility, because there's enough volatility and enough uncertainty around bid-ask spreads.
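
For reference, the take-rate arithmetic behind that comparison is straightforward. Here is a quick sketch using the round numbers from the conversation; the equity exchange fee below is a placeholder order of magnitude, not a quoted rate.

```python
# Back-of-the-envelope take-rate comparison using round numbers from the discussion.
btc_price = 60_000
crypto_fee_bps = 50                          # 50 basis points per trade
crypto_take = btc_price * crypto_fee_bps / 10_000
print(f"retail crypto venue: ~${crypto_take:.0f} per 1 BTC trade")        # ~$300

# A mature equity exchange earns a tiny fraction of a cent per share instead.
shares, fee_per_share = 100, 0.0003          # placeholder fee, order of magnitude only
print(f"mature equity venue: ~${shares * fee_per_share:.2f} per 100-share trade")
```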

Speaker 2:

Let me give you an example of what Lisa and Jeremy were just talking about from my most recent gig with Credit Benchmark, which offered consensus, bank-sourced credit ratings. Here's the challenge: depending on the investment use case, momentum, versus trying to find that sweet spot between names falling from investment grade down into high yield, or the reverse, rising stars, versus defining quality, versus doing sector-level analysis, you have very different reference data, tagging, cleansing, and history needs. And if you go for the superset of those use cases, the data challenge is enormous, and as a provider of data you don't necessarily know where to start, right? So you need a lot of help from technology. In the sector space, for example, most investors want to see private company data, because that drives where the sector is headed, but you can of course only invest via public instruments, in the stuff that's tradable. It gets really complex as you try to solve everybody's needs at the outset [inaudible].