RIAs Are Being Sold a Data Playbook. It Doesn’t Work.
If you’re waiting for your data to be perfectly in order before deploying AI, you’re falling behind.
I’ve spent the better part of the past decade building Data & AI teams in the compliance-heavy education space, serving millions of users. Now I work under the hood with some of the largest RIAs in the country. I see the same mistakes being made, but they're not the ones you’d expect.
Picture this:
Your doctor calls to walk you through your lab results. They gladly report that your liver function, blood sugar, and cholesterol are all within normal ranges. By every standard measure, you’re healthy.
But are you?
True, based on those numbers, there’s nothing that’s obviously wrong. But they don’t tell you whether you can climb three consecutive flights of stairs without stopping, keep up with your kids at the park, or train for that half marathon you signed up for in a moment of unexpected confidence. Ultimately, they measure your body against a “typical” reference range, but they don’t measure what your body can actually do.
Medicine learned this the hard way. For decades, clinical trials relied on what researchers called surrogate endpoints, which are effectively lab values that were assumed to stand in for real patient outcomes. If a drug lowered cholesterol, surely it would reduce heart attacks. If it shrank a tumor, it’d extend life.
In the mid-2000s, Pfizer spent $800 million developing torcetrapib, a cholesterol drug that performed exceptionally well on paper: it raised “good” HDL by 72% and lowered “bad” LDL by 25%. Their CEO called it “one of the most important compounds of our generation.” Then a clinical trial of 15,000 patients showed a 58% increase in all-cause mortality. Every biomarker suggested promising results, and yet patients were dying faster. Pfizer killed the drug shortly thereafter.
Torcetrapib isn't an outlier, either. One recent analysis found that 86% of cancer drugs approved on the basis of surrogate endpoints haven't been shown to improve actual patient outcomes years after approval. The surrogate measurements keep getting better, but the outcomes that actually matter don't follow.
It feels like the data quality industry is in its surrogate endpoint era.
In wealth management, most firms are under the impression that their data needs to be perfectly in order before they can put it to work. Whether they've implemented formal testing or not, the instinct is the same: organize and clean it up first, use it later. This process, often managed by decades-old consultancies, can take upwards of a year. Meanwhile, the core firm KPIs, workflow automations, and strategic decisions that data is supposed to enable aren't getting any closer.
–
Most data quality frameworks converge on the same handful of dimensions: accuracy, completeness, consistency, and timeliness. These are reasonable things to measure, and they catch real problems (stale data, null values in critical columns, etc.). But notice what all these dimensions have in common: they measure the data against itself.
These frameworks ask whether the data conforms to its own schema or rules. They have nothing to say about whether anyone can actually do anything useful with it.
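To make the distinction concrete, here is roughly what that kind of testing looks like in practice. This is a minimal sketch in pandas; the file, column names, and thresholds are invented for illustration, not taken from any particular stack. Notice that every check compares the data to its own schema and freshness expectations, and none of them asks whether a report could actually be produced from it.

```python
import pandas as pd

# Hypothetical holdings extract; the file and column names are illustrative.
holdings = pd.read_parquet("holdings.parquet")

checks = {
    # Completeness: critical identifiers are populated
    "no_null_account_ids": holdings["account_id"].notna().all(),
    # Accuracy (of a sort): values fall in a plausible range
    "market_values_non_negative": (holdings["market_value"] >= 0).all(),
    # Consistency: no duplicate position rows per account, security, and as-of date
    "no_duplicate_positions": not holdings.duplicated(
        subset=["account_id", "security_id", "as_of_date"]
    ).any(),
    # Timeliness: the newest data is less than two days old
    "data_is_fresh": pd.Timestamp.now() - pd.to_datetime(holdings["as_of_date"]).max()
    < pd.Timedelta(days=2),
}

for name, passed in checks.items():
    print(f"{name}: {'PASS' if passed else 'FAIL'}")
```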
Your organization can have a pristine data warehouse: every test passing, fresh data across all of your sources, and duplicates resolved—but still fail to produce anything that is actually useful. Most firms today struggle to automate basic workflows, reconcile portfolio performance across custodians, or integrate an acquired firm’s book of business without months of manual cleanup. The surrogate endpoints are promising on paper, but the patient isn’t any better off.
And to be clear, it’s not as if the people building these frameworks don’t understand this. SYNQ, a data observability platform, published a framework arguing that data quality metrics should be organized around data products rather than at the individual asset level. I think that’s precisely the right instinct, but their recommended metrics are still coverage scores and SLA pass rates.
They ultimately reorganized where the tests live without changing what the tests measure. It’s a better organized blood panel, but it’s still just a blood panel.
–
Most organizations default to treating data quality as a prerequisite. Something you have to achieve before you can start using the data in earnest. Standardize it, validate it, and then put it to work. I think this gets the sequencing backwards.
Data quality isn’t a prerequisite to using your data; it’s an outcome of using your data.
If you want to know whether your data is good enough, try to use it. Build that report. Reconcile those portfolios. Try to run the analysis that’s supposed to inform next quarter’s growth strategy. The answer to “Is our data quality sufficient?” isn’t whether you pass a battery of helpful but insufficient “quality” tests; it’s whether you were able to make the decision you needed to make quickly and with confidence.
And when you can’t, when the report doesn’t tie out or the reconciliation falls apart, you don’t have vague “data quality” issues. You have a specific problem to go fix that’s far more useful than adherence to a framework. And what’s more, you got there faster than any data quality roadmap would have taken you.
–
Consider a typical scenario: an RIA wants to generate quarterly performance reports for their clients. Their data quality tests indicate that everything is healthy: schema tests are passing, the data is fresh, and critical fields are reliably complete, with no null values.
However, when someone actually tries to generate the reports, things don’t quite work as expected. The first attempt reveals that performance figures don’t tie out across custodians because each one calculates time-weighted returns differently, and the employee who set them up never standardized the methodology. Data quality tests don’t catch this because the values are mechanically accurate; the data is simply unusable for the report.
The fix is straightforward once you dig into the problem: define a canonical return methodology to normalize the data across systems. But notice that you only discovered the problem by actually trying to generate the report, not by relying on a generic effort to “clean up your data.”
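To illustrate what “canonical” can mean here: if every custodian can supply period-end market values and net external cash flows, the firm can compute one geometrically linked time-weighted return the same way for all of them, instead of trusting each custodian’s own arithmetic. This is a sketch under assumed conventions (flows treated as occurring at the start of each sub-period); the field names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class SubPeriod:
    begin_value: float  # market value at the start of the sub-period
    end_value: float    # market value at the end of the sub-period
    net_flow: float     # external contributions (+) or withdrawals (-), assumed at period start

def time_weighted_return(periods: list[SubPeriod]) -> float:
    """Geometrically link sub-period returns into a single canonical TWR.

    Sub-periods are assumed to break at every external cash flow, with the
    flow treated as occurring at the start of the sub-period it opens.
    """
    growth = 1.0
    for p in periods:
        start_capital = p.begin_value + p.net_flow
        if start_capital == 0:
            raise ValueError("Cannot compute a return over a zero-value sub-period")
        growth *= p.end_value / start_capital
    return growth - 1.0

# The same quarter, expressed once in the canonical methodology:
quarter = [
    SubPeriod(begin_value=1_000_000, end_value=1_030_000, net_flow=0),
    SubPeriod(begin_value=1_030_000, end_value=1_110_000, net_flow=50_000),  # client deposit
    SubPeriod(begin_value=1_110_000, end_value=1_095_000, net_flow=0),
]
print(f"Quarterly TWR: {time_weighted_return(quarter):.2%}")
```

Whether a flow counts at the start or the end of the day is exactly the kind of quiet methodological choice the custodians were making differently.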
The second attempt gets further along. The returns can be reconciled across custodians, but a handful of accounts have obviously incorrect asset allocations. It turns out the CRM classifications were entered inconsistently by different advisors years ago: some tagged accounts by strategy name, others by model portfolio, and a few by something entirely idiosyncratic. The data is “complete,” but the values don't mean the same thing.
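The remedy is usually unglamorous: an explicit crosswalk from whatever labels advisors typed into the CRM over the years to a single canonical classification, plus a loud failure for anything that doesn’t map. The labels below are invented for illustration.

```python
# Hypothetical crosswalk from legacy CRM labels to a canonical allocation model.
CANONICAL_ALLOCATION = {
    "Growth 80/20": "Aggressive Growth",
    "Moderate Growth": "Growth",
    "Model B - Balanced": "Balanced",
    "Income Focus": "Conservative Income",
    # ...one entry per legacy label discovered while building the report
}

def canonical_allocation(raw_label: str) -> str:
    label = raw_label.strip()
    if label not in CANONICAL_ALLOCATION:
        # Surface the gap instead of silently reporting a wrong allocation.
        raise KeyError(f"Unmapped CRM classification: {raw_label!r}")
    return CANONICAL_ALLOCATION[label]
```

The important design choice is the failure mode: an unmapped label stops the report instead of quietly producing a wrong asset allocation.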
By the third attempt, the reports work. More importantly, the firm now understands things about its data that no data quality test would have surfaced. They understand where their systems silently disagree, which human processes have introduced inconsistencies, and what assumptions were baked into their data from years ago that no longer hold.
The attempt itself is diagnostic. Each time you hit a wall, you're not just solving the problem in front of you but also learning something about your data that you couldn't have known otherwise. A quality test tells you whether a value is present; attempting to use the data is the only surefire way to tell if it means what you think it means. Over time, you stop encountering the same classes of problems because you've built a genuine understanding of where your data comes from, what its shortcomings are, and why. Not because you implemented more tests, but because you used your data for something that mattered and paid attention to what went wrong.
–
The data quality industry has spent years building increasingly sophisticated ways to measure the vital signs of data. That work is important, but it’s incomplete. The next step isn’t better measurement; it’s just using your data to actually do things.
Stop trying to test your data into shape and start trying to use it.

