Automating DAC8 Reporting: Build vs Buy for CASPs in 2026

DAC8 reporting from 2026 is a recurring data pipeline, not a one-off filing. What to automate, where manual judgement stays, and how to decide build vs buy for the collection, aggregation, and output layers.
Wag3s Team: editorial team specializing in Web3 finance, crypto tax, and DAO operations. Based in Zurich, Switzerland.

Reviewed by Wag3s Editorial Team — verified against Council Directive (EU) 2023/2226 and European Commission DAC8 guidance · Last reviewed May 2026

DAC8 reporting is not a form you fill in once a year. It is a recurring data pipeline that has to capture, normalize, and aggregate a full year of activity per reportable user, then emit a schema-valid report. This article covers the mechanics of that pipeline: the three layers it decomposes into, what to automate at each, where human judgement has to stay, and how to make the build-vs-buy call. Adjacent articles drill into the pieces: the exact reportable fields, the XML schema and templates, and the penalty exposure that makes data integrity the priority.

The build-vs-buy picture in five points

  • DAC8 is a pipeline, not a filing. Data collection had to be live from 1 January 2026.
  • It decomposes into three layers: identity and tax residency (Layer 1), transaction aggregation (Layer 2), output formatting (Layer 3).
  • Layer 2 and Layer 3 automate well. Layer 1 verification and discrepancy-investigation judgement do not fully automate.
  • The riskiest layer to get wrong is Layer 2 data integrity: well-formatted but incomplete reports.
  • On build vs buy: buy the regulatory-commodity layers, own the part tied to your unique transaction data.

Why "automation" is the wrong frame on its own

The instinct is to ask "which tool generates the DAC8 XML?" That is the last 10% of the problem. The report-generation step is largely deterministic once the inputs are correct. The work — and the risk — is upstream: getting complete, correctly attributed, correctly aggregated data into the generator.

A pipeline that automates output but is fed incomplete transaction data produces a report that passes schema validation, files on time, and is wrong. That is a data-integrity failure, the highest-frequency DAC8 penalty exposure (see DAC8 penalties). Automation that accelerates a defective process only produces defects faster.

The three layers, and what automates

Layer 1 — Identity and tax residency

Collect and verify legal name, date of birth, address, jurisdiction(s) of tax residence, TIN per jurisdiction, and a verified self-certification (see DAC8 data collected).

  • Automatable: capture flows, TIN format validation, re-certification reminders, registry lookups where APIs exist.
  • Not fully automatable: resolving conflicting residency indicia, judging an ambiguous self-certification, handling refusals. These need a documented human decision.
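The split between automatable checks and human judgement can be sketched as a triage step: accept a profile automatically only when the indicia agree, and queue everything else for a documented human decision. This is a minimal sketch, not a vendor API; the field names (`certified_residences`, `address_country`, `phone_country`) and the `ReviewDecision` type are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class UserProfile:
    user_id: str
    certified_residences: set[str]        # country codes from the self-certification
    address_country: str | None = None    # indicium: residential address
    phone_country: str | None = None      # indicium: phone prefix country
    tins: dict[str, str] = field(default_factory=dict)  # jurisdiction -> TIN


@dataclass
class ReviewDecision:
    user_id: str
    needs_human_review: bool
    reasons: list[str]


def triage(profile: UserProfile) -> ReviewDecision:
    """Collect reasons a profile cannot be auto-accepted (hypothetical rules)."""
    reasons: list[str] = []
    certified = profile.certified_residences
    if not certified:
        reasons.append("missing or refused self-certification")
    indicia = (("address", profile.address_country), ("phone", profile.phone_country))
    for name, country in indicia:
        # An indicium pointing outside the certified residences is a conflict.
        if country and certified and country not in certified:
            reasons.append(f"{name} country {country} conflicts with certified residence")
    for jurisdiction in sorted(certified):
        if not profile.tins.get(jurisdiction):
            reasons.append(f"no TIN recorded for {jurisdiction}")
    return ReviewDecision(profile.user_id, bool(reasons), reasons)
```

The function only decides whether a human needs to look; it deliberately does not resolve the conflict itself, because that resolution is the judgement the list above says cannot be fully automated.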

Layer 1 is the hardest to fully automate because verification is a judgement, and a wrong reportable-user determination propagates into every subsequent layer.

Layer 2 — Transaction aggregation

Normalize all activity across chains and venues, attribute it per user, and produce the annual aggregates (acquired/disposed vs fiat and vs crypto, transfers).

  • Automatable: ingestion, normalization, per-user per-asset aggregation, counterparty categorization rules.
  • Not fully automatable: classification edge cases (NFT scope — see DAC8 and NFTs; EMT vs CBDC vs tokenized security — see DAC8 and the digital euro) and discrepancy investigation.

This is the layer most tied to your specific data, and the riskiest to get wrong.

Layer 3 — Output formatting

Emit the report in the required XML schema for exchange via the EU Common Communication Network. The EU schema aligns with the OECD's CARF/CRS XML schemas published to support cross-authority transmission.

  • Automatable: almost entirely, once Layer 2 outputs are correct and the target schema is fixed.
  • Not fully automatable: per-Member-State variations where national transposition adds fields (see DAC8 transposition by country).

Layer 3 is the cheapest to automate and the least risky — precisely because it is downstream of the layers that carry the judgement.

Build vs buy, per layer

| Layer | Typical decision | Rationale |
| --- | --- | --- |
| Layer 1 (identity/residency) | Buy | High regulatory specificity, commodity engineering, KYC vendors specialize here |
| Layer 2 (aggregation) | Own/build the normalization tied to your data; buy the aggregation engine | This is your unique data; the engine is reusable, the connectors are yours |
| Layer 3 (output) | Buy | Deterministic, schema-driven, low differentiation |

The recurring anti-pattern is trying to buy a single end-to-end tool. The three layers need different specialists; one product rarely does all three well. The other anti-pattern is building everything in-house — Layers 1 and 3 are commodity regulatory plumbing where vendors are ahead and staying ahead.

The defensible architecture for most CASPs: a KYC/self-certification vendor (Layer 1), a transaction-data platform that owns your chain/venue normalization (Layer 2), and a reporting-output tool (Layer 3), with documented human checkpoints at the judgement points.

Where vendors fit

  • Sumsub — Layer 1: self-certification capture, TIN verification, re-certification flags.
  • TaxBit — Layer 3 (and parts of Layer 2 output): generating DAC8/CARF-shaped reports.
  • Cryptio — Layer 2: normalizing fragmented on-chain and exchange data with retained lineage.

The mapping is almost one tool per layer, which is why DAC8 automation is a stack-design problem, not a tool-selection problem.

Where Wag3s sits: the Layer 2 foundation

Layer 2 is the part of the pipeline tied to your unique transaction data, and the part this article flags as both the hardest to buy whole and the riskiest to get wrong. Wag3s Ledger is built for exactly that layer:

  • Multi-chain normalization across 20+ chains and exchange APIs (see multi-chain reconciliation)
  • Per-user, per-asset annual aggregation matching the DAC8 reportable categories
  • Counterparty categorization and instrument-level tagging (EMT vs CBDC vs tokenized security)
  • Retained per-transaction lineage so any aggregate is auditable and discrepancies are investigable
  • Clean outputs that feed a Layer 3 reporting tool or in-house XML generation

It does not file the report or make the reportable-user determinations — those judgement points and the submission stay with your compliance team and counsel. Wag3s supplies the audited substance they file on. See the Wag3s Ledger product page for module details.


Step-by-step: building a DAC8 pipeline for a mid-size CASP

A CASP operating on 5 EVM chains with 80,000 reportable users, processing roughly 3 million transactions per year, needs a pipeline that is reliable enough to withstand audit but not over-engineered to the point of being undeliverable by the 30 September 2027 deadline for FY 2026 data. The following architecture is illustrative.

Layer 1 — Identity and tax residency (buy). Select a KYC and self-certification vendor (e.g. Sumsub or a peer) with a DAC8-ready self-certification module. The module should capture: legal name, date of birth, address, tax residence jurisdiction(s), and TIN per jurisdiction. It should send re-certification reminders at least 30 days before annual deadlines and flag users where TIN validation returns an error. Build internal procedures for the human review of flagged cases — the vendor handles the infrastructure; the compliance team handles the edge cases.

Layer 2 — Transaction aggregation (own the normalization; buy the engine where possible). This is the hardest layer to buy entirely, because the normalization is specific to the CASP's data. For each supported chain and exchange venue, the CASP's engineering team must produce a normalization adapter that converts raw chain data (or exchange API data) into a canonical transaction record with: user identifier, transaction type (acquired vs disposed vs transferred), asset type (crypto-asset, EMT, tokenized security), quantity, fair value in EUR at the transaction date, and counterparty type (CASP, non-CASP, unhosted wallet).
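A canonical record with the fields listed above, plus one adapter, might look like the following. This is a sketch under stated assumptions: the raw payload keys (`side`, `account_id`, `base_asset`, `qty`, `price_eur`, `ts`, `fill_id`) describe a hypothetical exchange API, the source name `exchange_x` is made up, and the two lineage fields (`source`, `source_ref`) are an addition for auditability rather than something the field list requires.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal
from enum import Enum


class TxType(Enum):
    ACQUIRED = "acquired"
    DISPOSED = "disposed"
    TRANSFERRED = "transferred"


class AssetType(Enum):
    CRYPTO_ASSET = "crypto_asset"
    EMT = "emt"
    TOKENIZED_SECURITY = "tokenized_security"


class CounterpartyType(Enum):
    CASP = "casp"
    NON_CASP = "non_casp"
    UNHOSTED_WALLET = "unhosted_wallet"


@dataclass(frozen=True)
class CanonicalTx:
    user_id: str
    tx_type: TxType
    asset: str
    asset_type: AssetType
    quantity: Decimal
    value_eur: Decimal            # fair value in EUR at the transaction date
    counterparty: CounterpartyType
    executed_at: datetime
    source: str                   # lineage: which venue or chain produced the record
    source_ref: str               # lineage: identifier of the raw record


def normalize_exchange_fill(raw: dict, asset_types: dict[str, AssetType]) -> CanonicalTx:
    """Adapter for a hypothetical exchange fill payload."""
    qty = Decimal(raw["qty"])
    return CanonicalTx(
        user_id=raw["account_id"],
        tx_type={"buy": TxType.ACQUIRED, "sell": TxType.DISPOSED}[raw["side"]],
        asset=raw["base_asset"],
        asset_type=asset_types[raw["base_asset"]],  # KeyError means an unclassified asset
        quantity=qty,
        value_eur=qty * Decimal(raw["price_eur"]),
        counterparty=CounterpartyType.CASP,         # assumption: fills on the CASP's own venue
        executed_at=datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
        source="exchange_x",
        source_ref=raw["fill_id"],
    )
```

Using `Decimal` rather than floats keeps quantities and EUR values exact, and letting an unknown asset raise instead of defaulting to a type avoids silently misclassified records.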

Once normalized, the aggregation step (summing per user, per asset, per year, per reportable category) is deterministic and can use an off-the-shelf aggregation engine or a SQL-based data warehouse. At a summary level, the reporting categories are: (a) aggregate consideration and units for acquisitions and disposals against fiat; (b) aggregate consideration and units for crypto-to-crypto exchanges; (c) aggregate transfers to unhosted wallets (unit count and value); (d) retail payment transactions. Maintain each category separately in the aggregation model, and keep acquisitions and disposals as separate totals within (a) and (b).
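The aggregation itself is a group-by. The sketch below assumes each normalized record already carries a category label assigned upstream; the `Tx` shape and the label strings such as `disposal_fiat` are hypothetical, and a SQL `GROUP BY` over the same key would be equivalent.

```python
from collections import defaultdict
from decimal import Decimal
from typing import NamedTuple


class Tx(NamedTuple):
    user_id: str
    asset: str
    year: int
    category: str       # hypothetical label, e.g. "disposal_fiat"
    units: Decimal
    value_eur: Decimal


def aggregate(txs):
    """Sum units, EUR value, and transaction count per (user, asset, year, category)."""
    totals = defaultdict(
        lambda: {"units": Decimal(0), "value_eur": Decimal(0), "tx_count": 0}
    )
    for tx in txs:
        bucket = totals[(tx.user_id, tx.asset, tx.year, tx.category)]
        bucket["units"] += tx.units
        bucket["value_eur"] += tx.value_eur
        bucket["tx_count"] += 1
    return dict(totals)
```

Because the category is part of the key, totals for different categories can never be merged by accident, which is the separation the paragraph above asks for.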

Layer 3 — Output formatting (buy). The XML schema for DAC8 reporting follows the EU Common Communication Network specifications, which closely mirror the OECD CARF XML schema. A reporting-output tool should accept the Layer 2 aggregates, map them to the correct DAC8 XML elements, and validate the output against the schema. This step is entirely deterministic once the inputs are correct. Most schema errors at this stage are caused by missing or malformed Layer 2 data (a null TIN, an unsupported asset type), not by the output tool itself.
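Since most errors at this stage trace back to upstream data, a pre-flight check on the rows handed to the output tool can surface them before schema validation does. The sketch below is illustrative only: the row keys (`tin`, `asset_type`, `value_eur`) and the supported-type set are assumptions, not DAC8 schema element names, and a real rule set would need to allow for documented cases where no TIN is issued.

```python
SUPPORTED_ASSET_TYPES = {"crypto_asset", "emt", "tokenized_security"}  # assumed set


def preflight(rows):
    """Return (row_index, problem) pairs for rows likely to fail output mapping."""
    problems = []
    for index, row in enumerate(rows):
        if not row.get("tin"):
            problems.append((index, "missing TIN"))
        if row.get("asset_type") not in SUPPORTED_ASSET_TYPES:
            problems.append((index, f"unsupported asset type: {row.get('asset_type')!r}"))
        if row.get("value_eur") is None:
            problems.append((index, "missing EUR value"))
    return problems
```

An empty result does not prove the report is complete; it only shows that the rows present are well formed.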

Human checkpoints. Three mandatory human review points:

  • After Layer 1: review the list of users classified as reportable vs non-reportable, and spot-check 50–100 cases for plausibility.
  • After Layer 2 aggregation but before output: run a statistical review of the aggregates. Check for users with implausibly high or low aggregate values, zero-value aggregates for active users, or categories that appear empty when activity is known to exist.
  • After Layer 3 output but before filing: confirm the filing covers the correct reporting period and the correct Member State, and that the schema validation passes without errors.
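The second checkpoint lends itself to a small script that prepares a review list for the human reviewer. The sketch below is one possible approach, not a prescribed method: it flags values far from the median by a configurable factor, and the input shapes and the default `ratio` of 100 are assumptions chosen for illustration.

```python
from statistics import median


def review_flags(aggregates, active_users, expected_categories, ratio=100):
    """Build a review list from aggregates keyed by (user_id, category) -> EUR value.

    active_users and expected_categories are sets; ratio is an illustrative threshold.
    """
    flags = []
    users_with_value = {user for (user, _), value in aggregates.items() if value > 0}
    for user in sorted(active_users - users_with_value):
        flags.append(("zero_aggregate_for_active_user", user))
    seen_categories = {cat for (_, cat), value in aggregates.items() if value > 0}
    for category in sorted(expected_categories - seen_categories):
        flags.append(("empty_category", category))
    positive = [value for value in aggregates.values() if value > 0]
    if positive:
        mid = median(positive)
        for (user, category), value in sorted(aggregates.items()):
            # Flag values more than `ratio` times above or below the median.
            if value > 0 and (value > mid * ratio or value < mid / ratio):
                flags.append(("implausible_value", f"{user}/{category}"))
    return flags
```

The output is a worklist, not a verdict: a flagged user may be a genuine high-volume trader, which is exactly what the human review is there to decide.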

Common automation failures and how to prevent them

Missing exchange or chain. A new exchange was integrated mid-year but its API was not added to the normalization pipeline until Q3. Transactions executed between go-live and the pipeline integration are missing from the aggregates. Prevention: maintain a complete inventory of data sources; require that every new integration is added to the DAC8 pipeline on the same day it goes live for users.
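One way to enforce the inventory rule is a scheduled check that compares each source's go-live date with the first date the pipeline ingested from it. This is a minimal sketch; the source names and both input mappings are hypothetical.

```python
from datetime import date


def coverage_gaps(inventory, pipeline_start):
    """Find sources whose ingestion started late or never started.

    inventory:      source name -> date the source went live for users
    pipeline_start: source name -> first date ingested by the DAC8 pipeline
    Returns source name -> (live_from, ingested_from or None).
    """
    gaps = {}
    for source, live_from in inventory.items():
        ingested_from = pipeline_start.get(source)
        if ingested_from is None or ingested_from > live_from:
            gaps[source] = (live_from, ingested_from)
    return gaps


if __name__ == "__main__":
    inventory = {"exchange_a": date(2026, 1, 1), "exchange_b": date(2026, 6, 15)}
    pipeline_start = {"exchange_a": date(2026, 1, 1), "exchange_b": date(2026, 8, 1)}
    for source, (live, ingested) in coverage_gaps(inventory, pipeline_start).items():
        print(f"{source}: live {live}, ingested from {ingested}")
```

Each reported gap defines the exact window that needs a historical backfill.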

Wrong asset classification. A stablecoin pegged to EUR is classified as a crypto-asset rather than an EMT in the normalization layer. EMTs (electronic money tokens) are reported differently under DAC8. Prevention: maintain an asset classification register tied to the normalization layer; review it when new assets are listed.
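A classification register can be as simple as a lookup that refuses to guess. In the sketch below the register contents, the `EURX` symbol, and the `UnclassifiedAssetError` name are all hypothetical; the point is that an unregistered asset stops the pipeline rather than falling through to a default type.

```python
class UnclassifiedAssetError(LookupError):
    """Raised when an asset has no entry in the classification register."""


# Hypothetical register: symbol -> classification agreed by compliance.
REGISTER = {
    "BTC": "crypto_asset",
    "EURX": "emt",  # made-up EUR-pegged stablecoin, classified as an EMT
}


def classify(symbol, register):
    """Return the registered classification, or fail loudly if none exists."""
    try:
        return register[symbol]
    except KeyError:
        raise UnclassifiedAssetError(
            f"{symbol} is not in the classification register; "
            "classify it before normalizing its transactions"
        ) from None
```

Wiring the listing process to require a register entry means the review happens when an asset is listed, not when the annual report is assembled.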

User merge failure. A user changes their email address and the system creates a duplicate account. Transactions are split across two user records; the DAC8 aggregate shows two low-activity users instead of one high-activity one. Prevention: user identity resolution (linking all accounts for the same individual) must be done before aggregation, not after.
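Identity resolution before aggregation can be sketched as a union-find over accounts that share a stable, verified identifier. The identifier strings below (a TIN, a hashed document number) are assumptions; email is deliberately left out because it is the attribute that changed in the failure described above.

```python
def resolve_identities(accounts):
    """Map each account to a canonical account for the same individual.

    accounts: account_id -> set of verified identifiers (hypothetical values).
    Returns account_id -> smallest account_id in its linked cluster.
    """
    parent = {account: account for account in accounts}

    def find(account):
        # Walk to the cluster root, compressing the path along the way.
        while parent[account] != account:
            parent[account] = parent[parent[account]]
            account = parent[account]
        return account

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller id as root so the result is deterministic.
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep

    first_seen = {}
    for account, identifiers in accounts.items():
        for identifier in identifiers:
            if identifier in first_seen:
                union(account, first_seen[identifier])
            else:
                first_seen[identifier] = account
    return {account: find(account) for account in accounts}
```

Aggregating on the canonical id instead of the raw account id turns the two low-activity records into one, and the links are transitive: accounts A and C are merged if each shares an identifier with B.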

Self-certification gap. Users who registered before the DAC8 self-certification requirement was implemented never completed a tax-residency self-certification. They are in the trading population but absent from Layer 1. Prevention: backfill self-certification for all pre-DAC8 users; send reminders during a grace period and restrict trading for users who still have not completed it when the period ends, consistent with the CASP's terms of service and legal advice.
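Finding the gap is a set difference between the trading population and the certified population; what happens next is policy. The sketch below assumes a 60-day grace period and three action labels purely for illustration; the actual period and consequences are for the CASP's terms of service and counsel to set.

```python
from datetime import date, timedelta


def backfill_actions(trading_users, certified_users, first_reminded, today, grace_days=60):
    """Decide the next step for each trading user without a self-certification.

    first_reminded: user_id -> date of the first reminder, if one was sent.
    grace_days is an illustrative default, not a regulatory figure.
    """
    actions = {}
    for user in sorted(set(trading_users) - set(certified_users)):
        reminded = first_reminded.get(user)
        if reminded is None:
            actions[user] = "send_first_reminder"
        elif today - reminded > timedelta(days=grace_days):
            actions[user] = "restrict_trading"
        else:
            actions[user] = "remind_again"
    return actions
```

Running this against the full trading population, rather than only new sign-ups, is what catches the pre-DAC8 users.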


Editorial disclaimer
This article is informational and does not constitute legal, tax, or technology-procurement advice. Confirm reporting obligations and tooling fit with qualified counsel and your own technical assessment.