Data lake that consolidated 12 source systems into one operational view

To maintain client confidentiality, the company and industry in this case study have been anonymized. The underlying solution is the same.

The problem

After the claims-payment automation work, Halberd brought us back for a different problem entirely: data that was technically available but operationally invisible.

Halberd Mutual already had the data. Twelve source systems were generating it every day: claims management, claims financial, policy admin, billing, the vendor portal, adjuster timekeeping, the litigation system, the document store, reinsurance management, producer admin, the rating engine, and statutory reporting. None of them talked to each other. Reporting ran on spreadsheets stitched together by hand and emailed out once a month.

That meant adjuster workload patterns were invisible until someone burned out. Vendor SLA breaches showed up months after the fact in retrospective reviews. Reserve drift on certain claim types only got caught at year-end actuarial review. The carrier was operating on data that was already old by the time anyone looked at it.

What we built

We built a consolidated data platform that ingests every source system on a defined refresh cadence (most near-real-time, all within 24 hours) and lands the raw data in an S3-based lake, catalogued via AWS Glue and queried through Athena. dbt models apply consistent business logic across sources, and the resulting marts are exposed through Metabase dashboards for operations, claims, finance, and executive leadership.
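To make the refresh-cadence commitment concrete, here is a minimal Python sketch of a freshness check. It compares each source's last successful load against its target cadence, capped at the 24-hour ceiling. The source names, cadence values, and the `stale_sources` function are hypothetical illustrations, not the platform's actual configuration. In practice the load timestamps would come from pipeline metadata rather than a hard-coded dict, and dbt's built-in `dbt source freshness` command covers similar ground.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical per-source target cadences (illustrative values only).
TARGET_CADENCE = {
    "claims_management": timedelta(minutes=15),
    "billing": timedelta(hours=1),
    "statutory_reporting": timedelta(hours=24),
}

# Hard ceiling from the case study: every source refreshes within 24 hours.
MAX_STALENESS = timedelta(hours=24)


def stale_sources(last_loaded: dict[str, datetime], now: datetime) -> list[str]:
    """Return sources whose last load is older than their cadence (capped at 24h)."""
    stale = []
    for source, cadence in TARGET_CADENCE.items():
        allowed = min(cadence, MAX_STALENESS)
        loaded_at = last_loaded.get(source)
        # A source with no recorded load counts as stale.
        if loaded_at is None or now - loaded_at > allowed:
            stale.append(source)
    return sorted(stale)


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
last = {
    "claims_management": now - timedelta(minutes=5),  # within its 15-minute cadence
    "billing": now - timedelta(hours=3),              # past its 1-hour cadence
}
print(stale_sources(last, now))  # ['billing', 'statutory_reporting']
```

Treating a missing timestamp as stale is a deliberate choice. A source that silently stops loading should surface as a failure, not disappear from the check.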

The dashboards aren’t just visualizations of existing data. They model insurance-specific operational signals: adjuster utilization vs claim complexity, vendor SLA adherence vs payment timeliness, fee schedule deviation patterns, reserve drift by claim segment, subrogation recovery gaps, premium audit pipeline aging. The patterns that used to require an analyst-week to find now surface automatically when the data refreshes.
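As an illustration of one of these signals, the sketch below computes reserve drift by claim segment. It assumes drift is defined as total current incurred minus total initial reserve, relative to the initial reserve. That definition, the `Claim` fields, the segment names, and the 10% threshold are all hypothetical; the case study does not specify how its models define drift. A persistently positive value for a segment is the kind of systematic under-prediction described in the results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Claim:
    segment: str             # hypothetical segmentation key, e.g. claim type
    initial_reserve: float   # reserve originally set on the claim
    current_incurred: float  # latest incurred estimate (paid + outstanding)


def reserve_drift_by_segment(claims: list[Claim]) -> dict[str, float]:
    """Relative drift per segment: (total incurred - total initial) / total initial."""
    initial: dict[str, float] = defaultdict(float)
    incurred: dict[str, float] = defaultdict(float)
    for c in claims:
        initial[c.segment] += c.initial_reserve
        incurred[c.segment] += c.current_incurred
    return {
        seg: (incurred[seg] - initial[seg]) / initial[seg]
        for seg in initial
        if initial[seg] > 0  # skip segments with no baseline reserve
    }


def under_reserved(drift: dict[str, float], threshold: float = 0.10) -> list[str]:
    """Segments whose incurred has outgrown the initial reserve by more than threshold."""
    return sorted(seg for seg, d in drift.items() if d > threshold)


claims = [
    Claim("auto_bodily_injury", 10_000, 13_000),
    Claim("auto_bodily_injury", 5_000, 5_500),
    Claim("property", 8_000, 7_600),
]
drift = reserve_drift_by_segment(claims)
print(under_reserved(drift))  # ['auto_bodily_injury']
```

Aggregating totals before dividing, rather than averaging per-claim ratios, keeps a few small claims with large percentage swings from dominating a segment's signal.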

For executive leadership, we built rollup views across business lines and source systems: total operational cost trends, claims-expense leakage indicators, growth signals by producer segment. Each view drills down to the underlying transactions in the source system itself, so questions get answered without an analyst in the middle.
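One way to keep a rollup drillable is to carry each contributing transaction's source-system key alongside the aggregate. The Python sketch below shows that idea for a cost rollup by business line. The `Txn` and `RollupCell` types, the field names, and the sample identifiers are hypothetical. In the actual platform this would be a dbt mart feeding Metabase, not in-memory Python.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Txn:
    business_line: str
    source_system: str
    source_txn_id: str  # hypothetical key back into the originating system
    cost: float


@dataclass
class RollupCell:
    total_cost: float = 0.0
    # (source_system, source_txn_id) pairs behind this cell, kept for drill-down
    members: list[tuple[str, str]] = field(default_factory=list)


def rollup(txns: list[Txn]) -> dict[str, RollupCell]:
    """Sum cost per business line while retaining pointers to the source rows."""
    cells: dict[str, RollupCell] = defaultdict(RollupCell)
    for t in txns:
        cell = cells[t.business_line]
        cell.total_cost += t.cost
        cell.members.append((t.source_system, t.source_txn_id))
    return dict(cells)


txns = [
    Txn("commercial_auto", "billing", "B-1001", 1200.0),
    Txn("commercial_auto", "vendor_portal", "V-0042", 300.0),
    Txn("workers_comp", "billing", "B-1002", 800.0),
]
view = rollup(txns)
print(view["commercial_auto"].total_cost)  # 1500.0
print(view["commercial_auto"].members)     # [('billing', 'B-1001'), ('vendor_portal', 'V-0042')]
```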

Results

The platform consolidates 12 source systems and serves 30+ dashboards in continuous production use. In the first quarter after rollout, the dashboards surfaced more than $3M in annualized expense leakage that the existing reporting could not see: vendor billing patterns that crossed fee schedules in subtle ways, claim types with systematic reserve under-prediction, adjuster workload concentrations that drove cycle-time blowups.

More importantly: leadership stopped waiting for month-end reports to see what was happening in the business. The data is fresh enough to act on, the joins are tested and trusted, and operational decisions that used to require an analyst-week now happen in the room.
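"Tested joins" typically means referential-integrity checks between models. dbt ships a generic `relationships` test for this purpose: every key in a child model must exist in its parent. The Python sketch below mimics that check on plain rows so the logic is visible. The `orphaned_keys` function and the claims/payments sample data are hypothetical and are not taken from the project.

```python
from typing import Any, Iterable


def orphaned_keys(
    child_rows: Iterable[dict[str, Any]],
    child_key: str,
    parent_rows: Iterable[dict[str, Any]],
    parent_key: str,
) -> list[Any]:
    """Return child key values with no matching parent row (a join that would drop data)."""
    parent_keys = {row[parent_key] for row in parent_rows}
    return sorted(
        {
            row[child_key]
            for row in child_rows
            # Null keys are left to a separate not-null check.
            if row[child_key] is not None and row[child_key] not in parent_keys
        }
    )


claims = [{"claim_id": "C1"}, {"claim_id": "C2"}]
payments = [
    {"payment_id": 1, "claim_id": "C1"},
    {"payment_id": 2, "claim_id": "C3"},   # references a claim that does not exist
    {"payment_id": 3, "claim_id": None},
]
print(orphaned_keys(payments, "claim_id", claims, "claim_id"))  # ['C3']
```

An empty result means an inner join between the two models will not silently drop child rows. That property is what lets a dashboard total be trusted without re-reconciling it by hand.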
