The Quagmire of Backtesting Alternative Data
Introduction
In the high-stakes world of finance, where hedge funds manage billions of dollars and compete at the millisecond, alternative data has emerged as a pivotal edge in the relentless pursuit of alpha. As an alternative data provider, I sit at the forefront of this financial revolution. But the journey from data to buy decision is anything but straightforward. In this edition of The Data Observatory, we will discuss the intricate dance between data vendors and hedge funds during the backtesting phase of the sales cycle—a phase as critical as it is cumbersome.
The backtesting process, where historical data is run through investment strategies to gauge future performance, is a cornerstone of hedge fund evaluation. For data vendors, it's a rigorous vetting ritual that tests not only the data's quality but also its predictive power. We find ourselves in a paradoxical scenario: despite the industry's rapid adoption of cutting-edge technologies, inefficiencies abound in backtesting practices. The fusion of massive datasets, complex algorithms, and fierce competition creates a bottleneck that is stubbornly resistant to the very innovation it seeks.
Our interactions with hedge funds are shrouded in a mix of secrecy and urgency. The challenge is to prove that our data can not only withstand their stringent evaluation but also add value in a market saturated with information. As vendors, we stand on the front lines, deciphering codes of market movement and investor sentiment, packaging this into neat datasets ready for scrutiny.
However, the inefficiency of the process is not just a vendor's lament but an industry-wide puzzle. Is it a mere consequence of the intricate checks and balances that underpin financial market transactions? Or is it an artifact of a system resistant to change? While the answer may not be straightforward, the quest for efficiency persists. Could a technological solution streamline this process, or are we destined to navigate this labyrinth indefinitely?
Significant Gains in Scouting and Ingestion
The emergence of firms like Neudata, Eagle Alpha, and Battlefin has dramatically changed the equation. By specializing in "data scouting," these companies have created a symbiotic environment where funds and data vendors are brought into alignment with greater ease. Their role extends beyond mere matchmaking; they provide a valuable service by vetting the data for quality and applicability, ensuring that only the most pertinent datasets reach fund managers.
Moreover, these firms have transformed networking in the alternative data space with their conferences and events, providing platforms for data vendors and buyers to connect. These gatherings have become critical hubs for collaboration, discussion, and deal-making, effectively removing the barriers that once made the search for valuable data so daunting.
These strategic partnerships and events facilitate the initial connection between data providers and interested funds, setting the stage for more effective data ingestion and utilization. The service provided by these companies is not just about connecting two parties but also about creating an ecosystem where data can be exchanged, evaluated, and implemented with unprecedented speed and relevance. As a result, the path from data discovery to integration is now shorter and more direct, which is a vital step towards reducing the overall backtesting cycle.
The industry's approach to handling alternative data has also undergone a radical transformation, reminiscent of a rapidly maturing technology sector. Not so long ago, the ingestion of alternative datasets into hedge fund analytics platforms was a Herculean task. Back then, the use of cloud big data technologies and warehousing was only beginning to gain traction. The landscape was vastly different: robust technical documentation from data vendors was considered a luxury rather than a necessity, and building custom Extract, Transform, Load (ETL) processes for each dataset was the norm for funds. This tedious beginning was the first of many bottlenecks that contributed to the notoriously slow pace of the backtesting cycle.
The progress since then reflects a significant shift toward efficiency and user-friendliness. Today, we witness a burgeoning practice among data vendors to equip clients with comprehensive technical documentation. The documentation has become akin to that provided by SaaS API companies, reflecting a standard of clarity and accessibility once absent in the sector. Moreover, the provision of custom ETL code by vendors for smooth data ingestion signifies a leap toward reducing complexity and accelerating the onboarding process.
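To give a sense of what vendor-supplied ingestion code can look like, here is a minimal sketch assuming a hypothetical daily CSV delivery; the file name, column names, and schema are illustrative assumptions, not any real feed.

```python
import pandas as pd

# Hypothetical ingestion script of the kind a vendor might ship with its docs:
# load a daily CSV delivery, normalize it, and write Parquet for the warehouse.
# File and column names below are assumptions for illustration only.

RAW_FILE = "vendor_feed_2024-01-31.csv"          # assumed delivery file
OUTPUT_FILE = "vendor_feed_2024-01-31.parquet"   # warehouse-friendly output


def ingest(raw_path: str, out_path: str) -> pd.DataFrame:
    # Parse dates on read and standardize column casing.
    df = pd.read_csv(raw_path, parse_dates=["as_of_date"])
    df = df.rename(columns=str.lower)

    # Basic hygiene: drop exact duplicates and rows missing key fields.
    df = df.drop_duplicates()
    df = df.dropna(subset=["ticker", "as_of_date", "signal_value"])

    # Cast the metric column explicitly so downstream joins behave predictably.
    df["signal_value"] = df["signal_value"].astype(float)

    # Persist in a columnar format (requires pyarrow or fastparquet).
    df.to_parquet(out_path, index=False)
    return df


if __name__ == "__main__":
    panel = ingest(RAW_FILE, OUTPUT_FILE)
    print(panel.head())
```

Even a simple script like this, shipped alongside clear documentation, spares a fund from reverse-engineering the delivery format on its own.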
This progression is bolstered by the advent of companies such as Eagle Alpha and Crux Informatics, which bridge the gap for funds lacking in-house expertise. They play a critical role in fine-tuning the initial stages of backtesting, ensuring that data vendors can offer optimized and actionable datasets from the get-go.
The maturation of cloud data warehousing solutions has also been a game-changer. Platforms like Snowflake, Databricks, and AWS Redshift have not only become more sophisticated but also more cost-effective, drastically altering the data storage and processing landscape. The introduction of native data-sharing capabilities within these platforms has introduced a new paradigm—eliminating the need for cumbersome ETL tasks and facilitating direct access to ready-to-query datasets.
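To make the "ready-to-query" point concrete, here is a minimal sketch of what consuming a vendor dataset through a warehouse share might look like, using the Snowflake Python connector as one example; the account, credentials, database, and table names are placeholders, not real identifiers.

```python
import snowflake.connector  # pip install "snowflake-connector-python[pandas]"

# Hypothetical example: a fund querying a vendor dataset exposed through a
# Snowflake data share. No files move and no ETL runs; the table is simply
# there to query. All identifiers below are placeholders.
conn = snowflake.connector.connect(
    account="fund_account",      # assumed account identifier
    user="analyst",              # assumed user
    password="********",         # use proper secret management in practice
    warehouse="ANALYTICS_WH",    # assumed compute warehouse
    database="VENDOR_SHARE_DB",  # database created from the vendor's share
    schema="PUBLIC",
)

try:
    cur = conn.cursor()
    # The shared table is ready to query directly.
    cur.execute(
        "SELECT as_of_date, ticker, signal_value "
        "FROM DAILY_SIGNALS "
        "WHERE as_of_date >= '2024-01-01'"
    )
    df = cur.fetch_pandas_all()  # pull results straight into a DataFrame
    print(df.head())
finally:
    conn.close()
```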
Emerging specialists like Bobsled further exemplify this evolution, providing streamlined data-sharing services that enable interoperability across different clouds and platforms, ensuring that data can be as mobile and agile as the markets themselves.
Despite these advancements in the initial stages of the backtesting process, the question remains: Have we significantly shortened the sales cycle? The ingestion and integration of data are indeed faster and smoother, but does this translate to a quicker path from data evaluation to trading decisions?
The Paradox of Progress
The landscape of alternative data for hedge funds has indeed been reshaped by technology and market evolution, leading to an onboarding process that is almost as smooth as silk. Anecdotally, the narrative within our industry points to a marked improvement in how data vendors introduce their offerings into hedge fund operations. As someone deeply entrenched in this process, I can personally vouch for the leaps made towards efficiency: data integration that once sprawled across weeks can now be executed almost instantly.
The reduction in onboarding time is undeniably a triumph of technological ingenuity and the maturation of the supporting ecosystem. Data warehouses and sharing technologies have not only slashed the time required to transfer data but have also made it significantly easier to align with the complex and varied infrastructure of hedge funds. The result is an onboarding experience that aligns with the high-velocity world where these funds operate—a testament to the industry's ability to evolve with its challenges.
Yet, this streamlined entry point into the hedge fund's analytical process has not translated into shorter backtesting cycles. Instead, it has unveiled a curious phenomenon: the time saved on data transfer and integration has been reallocated to more rigorous and extensive testing phases. Hedge funds, unburdened by previous logistical constraints, are channeling their resources into deeper, more comprehensive analytical exercises.
This shift suggests that funds are placing a higher premium on certainty and depth over speed, a choice that reflects the nuanced demands of a competitive and data-driven market. The ability to perform more thorough testing is a double-edged sword; it offers the potential for better-informed decisions but also extends the evaluation period, which can frustrate data vendors and funds alike, both eager to see the data in action.
The Intractable Challenge of Backtesting
We've examined how alternative data vendors have revolutionized the onboarding process for hedge funds, achieving a level of swiftness and ease that was previously unimaginable. Yet, the subsequent phase—the evaluation of the data through backtesting—remains an enduring challenge, a complex puzzle that has resisted the same level of streamlining and speed optimization. This section will delve into the intricacies that make the backtesting exercise a particularly hard nut to crack despite advancements in data provision services.
Backtesting stands out as a critical endeavor for funds to assess the practical value of data beyond the glossy veneer of vendor presentations. While vendors have progressed towards providing measures of forecast error and predictability, these assessments are ultimately promotional tools designed to underscore the data's strengths (I can vouch for this as a vendor 😉). It's a natural part of the sales process: vendors are inclined to showcase their data in the best light possible, often through objective-looking studies that, while rigorous, are meant to persuade as much as they are to inform.
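For context, the headline numbers in such vendor studies are usually simple to reproduce. Below is a minimal sketch of two common ones, forecast error and a directional hit rate; the column names and toy figures are purely illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical sketch of headline metrics a vendor study might report:
# forecast error against realized figures and a directional hit rate.
# Column names and toy data are assumptions for illustration.


def forecast_accuracy(df: pd.DataFrame) -> dict:
    errors = df["forecast"] - df["actual"]

    # Mean absolute error and MAPE summarize the magnitude of misses.
    mae = errors.abs().mean()
    mape = (errors.abs() / df["actual"].abs()).mean()

    # Hit rate: how often the forecast called the direction of the surprise
    # (actual vs. consensus) correctly.
    hit_rate = (
        np.sign(df["forecast"] - df["consensus"])
        == np.sign(df["actual"] - df["consensus"])
    ).mean()

    return {"mae": mae, "mape": mape, "hit_rate": hit_rate}


if __name__ == "__main__":
    toy = pd.DataFrame({
        "consensus": [100.0, 50.0, 80.0, 120.0],
        "forecast":  [104.0, 48.0, 83.0, 118.0],
        "actual":    [105.0, 47.0, 79.0, 121.0],
    })
    print(forecast_accuracy(toy))
```

Numbers like these are easy to present favorably, which is exactly why funds treat them as a starting point rather than a verdict.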
Hedge funds, armed with healthy skepticism, approach such vendor-supplied literature with caution, conscious that while these studies are valuable, they are also crafted to sell. Consequently, funds invariably undertake their own independent analysis, duplicating effort to ensure that the data is genuinely objective and relevant to their specific strategies. It's a testament to their due diligence and the high stakes involved that, despite receiving forecast-accuracy metrics, fund analysts will still dissect and test the data against their proprietary models and investment hypotheses.
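As a minimal sketch of the kind of independent check an analyst might run, consider ranking the vendor signal each day, trading a top-versus-bottom-quintile spread, and tracking a rank information coefficient; the column names ("date", "ticker", "signal", "fwd_return"), quintile construction, and pandas layout are assumptions rather than any fund's actual methodology.

```python
import pandas as pd

# Hypothetical independent check: daily long-short quintile spread on the
# vendor signal, plus a per-date Spearman rank correlation (rank IC) between
# the signal and forward returns. Column names are assumptions.


def long_short_backtest(panel: pd.DataFrame, n_buckets: int = 5) -> pd.Series:
    def daily_spread(day: pd.DataFrame) -> float:
        # Bucket names by signal rank within the day's cross-section.
        buckets = pd.qcut(day["signal"].rank(method="first"),
                          n_buckets, labels=False)
        top = day.loc[buckets == n_buckets - 1, "fwd_return"].mean()
        bottom = day.loc[buckets == 0, "fwd_return"].mean()
        return top - bottom

    # One long-short spread per date; cumulating these approximates a P&L path.
    return panel.groupby("date").apply(daily_spread)


def rank_ic(panel: pd.DataFrame) -> pd.Series:
    # Spearman rank correlation between signal and forward return, per date.
    return panel.groupby("date").apply(
        lambda day: day["signal"].corr(day["fwd_return"], method="spearman")
    )
```

A consistently positive spread or rank IC would typically prompt deeper, strategy-specific testing rather than an immediate purchase decision.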
The differentiation in fund strategies adds another layer of complexity. Each fund has its own set of strategies, risk appetites, and investment mandates, which can range from highly systematic to deeply fundamental approaches, from aggressive to conservative risk profiles, and from long-only to a mix of long and short positions. These variances are not just broad strokes differentiating one fund from another; they often exist within a single organization, where diverse "pods" or teams may operate under different mandates and with varying investment philosophies.
This diversity means that the utility of a single dataset can vary wildly from one context to another, making it impossible for vendors to provide a one-size-fits-all solution. The tailor-made nature of backtesting is its strength, allowing funds to apply data to their unique strategies and constraints, but it's also a source of inefficiency and prolonged evaluation times. What's beneficial for one pod could be irrelevant for another, even within the same fund.
The question remains: Is there a way to optimize this bespoke and labor-intensive process? Some suggest that shared standards or benchmarks could help streamline evaluations, but the deeply personalized nature of investment strategies defies such standardization. In the search for optimization, the industry might look toward a middle ground, leveraging technology and shared metrics where possible while acknowledging and accommodating the need for individualized testing where necessary.
The Role of Neutral Platforms in Streamlining Backtesting
As the backtesting quandary persists, a new breed of technology vendors has stepped into the fray, proposing solutions that add layers of neutrality and efficiency to the evaluation process. Platforms like Maiden Century and Exabel have been at the forefront of this initiative, offering hedge funds turnkey access to pre-vetted data sets through user-friendly web interfaces, complete with pre-calculated backtests and forecasts. These platforms have been designed with the aim of cutting through the salesmanship of data vendors by providing an independent performance appraisal of data sets.
The appeal of these services lies in their promise of neutrality: they act as third-party evaluators, devoid of vested interests in the data being sold. By doing so, they provide an objective baseline from which funds can begin their analysis, potentially saving time and resources that would otherwise be spent on initial data cleaning, processing, and preliminary testing. Furthermore, their ready-to-use forecasts and backtest results mean that funds can quickly gauge a data set's historical effectiveness against market movements without committing to a full-scale, in-house backtesting operation from the outset.
Despite these advantages, the question of whether such platforms can truly shorten the backtesting cycle for hedge funds is complex. On one hand, instant access to analyzed data sets can undoubtedly eliminate some of the early-stage legwork, allowing funds to proceed directly to more nuanced aspects of testing that are specific to their strategies and constraints. On the other hand, the final verdict on a data set's utility is not solely determined by historical performance metrics. Hedge funds must still ascertain the data's predictive power and alignment with their investment models, a task that often requires bespoke analysis and cannot be entirely outsourced or automated.
Moreover, the depth and rigor of backtesting demanded by hedge funds may not be fully addressed by third-party platforms, which have to balance the breadth of their offerings with the depth of analysis. While such platforms can provide a helpful starting point, they may not cater to the granular needs of every fund's strategy, particularly when it comes to the long-tail events and outlier scenarios that funds must account for in their risk assessments.
While technology vendors like Maiden Century and Exabel contribute valuable tools to the backtesting toolkit, they are part of a broader solution rather than a panacea. Hedge funds may find these platforms useful in hastening certain steps of the backtesting process, but the inherent complexity and customization required in backtesting seem likely to keep it time-consuming. The sections that follow explore whether these neutral platforms can evolve to bridge the gap further and what additional innovations or industry shifts may be necessary to truly revolutionize the backtesting cycle.
Confronting the Time Dilemma in Backtesting with Technology
The conundrum of backtesting in the data-driven investment world can, at its core, be distilled into issues of time and trust. The decision to purchase and integrate a new data set into the investment process typically rests on the shoulders of the very individuals who operate at the heart of the market's pulse: quants, analysts, and portfolio managers (PMs). Their endorsement is pivotal, as they are the ones to spot the alpha-creating potential of new data. Yet, herein lies a significant challenge: the same individuals are already stretched thin, juggling the demands of an intense and dynamic market environment with the critical task of data evaluation.
The quandary is further compounded by the consideration of expanding teams to alleviate the workload. While distributing the responsibility of backtesting across more team members might seem like a straightforward solution, it runs the risk of inflating operational costs and complexity. This approach is financially intensive and, in practice, reserved for only the largest funds.
Trust is another critical dimension. When the reliability of data analysis directly influences an analyst's success and decision-making, there is a natural inclination for these professionals to be intimately involved in the data evaluation process. They are unlikely to relinquish this responsibility to others or to unverified automated systems.
Addressing the time challenge necessitates an innovative technological approach. If history has shown us anything, it's that technology excels in amplifying human efficiency. Recalling Steve Jobs' analogy of the computer as a "bicycle for the mind," we find a fitting metaphor for the role of technology in enhancing human capabilities. In the context of backtesting, this could translate into the development of more sophisticated, automated tools that can assimilate and analyze large data sets with minimal human oversight.
Such tools would need to offer a high degree of transparency and customizability to gain the trust of quants and PMs. They should act not as replacements but as extensions of the human experts, taking over the labor-intensive components of data testing while leaving the final, nuanced judgments to the discretion of the professionals. Only then can the full potential of these technological aides be harnessed to compress the timeframes of backtesting without compromising on the thoroughness required for making informed investment decisions.
In essence, the future of backtesting in hedge fund operations may well depend on the industry's ability to develop and adopt such advanced technologies. These tools must be able to navigate the unique complexities of financial data while offering the speed and efficiency that modern markets demand. By doing so, they will not only solve the time problem but also fortify the crucial foundation of trust upon which all investment decisions are built.