The Attribution Problem

This post makes several references to double-entry ledgering. If you aren’t familiar with traditional ledgering methods you may want to skim the Wikipedia article before reading.

What is the attribution problem?

The attribution problem is an issue that arises because double-entry systems rely on authoritative sums for balances, and because addition is commutative: once amounts are summed, the total says nothing about which amounts went in or in what order. The problem can be summarized as:

Money is only deterministically traceable through a single money movement.

You might be tempted to extrapolate from this. If you can trace money through one movement then if you trace any money back, one movement at a time, you’ll eventually have a deterministic trail! Unfortunately, this approach falls short. It’s an insidious issue as it won’t cause problems until the system scales up, but it’s present even in simple systems. Let’s delve into an example.

A liquid analogy

Before we dive into the illustration we’re going to set up an analogy. You can think of a traditional double-entry account as a bucket filled with liquid. You can put any label you want on the bucket, but when you pour new liquid in, it mixes with what is already there. This is because any new money that enters an account is summed into a balance. Once they are part of the balance, the individual amounts are indistinguishable from each other.

For this example, transactions are any liquid added to or removed from a bucket, and balances are the volume of liquid in each bucket. This mimics the two primary pieces of data in a double-entry system: transactions and balances.

The problem, illustrated

To illustrate the attribution problem, we’ll use three buckets. We’ll call them In, Held, and Out. Our In bucket will be wherever we get the liquid from. Our Out bucket will be customers asking for a drink. We’ll start by making 4 deposits into the Held bucket:

1 cup of vodka
1 cup of water
1 cup of kool-aid
1 cup of lime juice

Alright, now we’ve got a balance of 4 cups of liquid in our Held bucket. We’re ready to start handing it out to thirsty people!
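The setup so far can be sketched in a few lines of Python. This is only an illustration of the analogy, not any real ledger’s API: the `Account` class and its fields are hypothetical names. The transaction log keeps the labels, but the balance the system treats as authoritative is just a number.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical double-entry style account: a transaction log plus a summed balance."""
    name: str
    balance: int = 0
    transactions: list = field(default_factory=list)

    def deposit(self, label: str, cups: int) -> None:
        # The label survives in the log, but the balance only grows by a number.
        self.transactions.append((label, cups))
        self.balance += cups

    def withdraw(self, label: str, cups: int) -> None:
        if cups > self.balance:
            raise ValueError("withdrawal exceeds balance")
        # Nothing here says WHICH deposit the cups came from.
        self.transactions.append((label, -cups))
        self.balance -= cups


held = Account("Held")
for liquid in ["vodka", "water", "kool-aid", "lime juice"]:
    held.deposit(liquid, 1)

print(held.balance)  # 4
```

Note that `withdraw` has no way to say which deposit it is drawing from. That gap is what the rest of this post is about.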

The first customer

Our first customer of the day is a middle aged man who asks for a cup to drink. We pour a cup from our Held bucket into his cup.

Now to ask the question that is the crux of the attribution problem:

The attribution problem: What did we just serve our guest?

The answer to this in our analogy is pretty complicated. It depends on fluid dynamics: what is the relative density of each liquid? How long have they been mixed together? Was the mixture shaken, stirred, or agitated? We could presumably test a sample of the cup we poured and determine the percentage of the mixture, but that’s going to take time and tools we probably don’t have.

How could you do it in a double entry system though? You can’t take a sample of a balance… every number is perfectly interchangeable. It turns out, this is just not possible thanks to the mathematical principles that govern sums.

In most accounting systems this kind of attribution is handled by a generic after-the-fact accounting rule that attempts to fix the problem by making an uninformed, but consistent guess. In other words, we can’t possibly tell what we served the customer, so after we pour the glass we’ll just decide what was in it, and go with that. The most simplistic and most common rule is FIFO (first in, first out) which assumes that the first money in is the first money out.

We’ll apply that rule here and claim we served our first customer 1 cup of straight vodka, much to their delight. Problem solved!
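As a rough sketch, a FIFO rule applied after the fact might look like the following. The function name `fifo_attribute` and the `(label, cups)` tuple shape are assumptions made for this example. The point is that the rule never inspects the cup; it just consumes the oldest deposits on paper.

```python
from collections import deque


def fifo_attribute(deposits, withdrawal_cups):
    """Label a withdrawal by consuming the oldest deposits first.

    deposits: list of (label, cups) tuples in arrival order.
    Returns (attribution, remaining_deposits).
    """
    remaining = deque(deposits)
    attribution = []
    needed = withdrawal_cups
    while needed > 0:
        if not remaining:
            raise ValueError("withdrawal exceeds balance")
        label, cups = remaining.popleft()
        taken = min(cups, needed)
        attribution.append((label, taken))
        needed -= taken
        if cups > taken:
            # Put back the unconsumed part of a partially used deposit.
            remaining.appendleft((label, cups - taken))
    return attribution, list(remaining)


deposits = [("vodka", 1), ("water", 1), ("kool-aid", 1), ("lime juice", 1)]
first_cup, left = fifo_attribute(deposits, 1)
print(first_cup)  # [('vodka', 1)]
```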

The second customer

With our first customer happily served we welcome our second guest: a 5-year-old child. The child asks for a drink, pleadingly holding out an empty glass. So we pour them a cup.

The attribution problem: What did we just serve our guest?

That pesky question again. One easily solved, though. We decided (without verification) that the cup we served our first customer was 1 cup of vodka, so there is no alcohol left in the bucket… right? Right!?

No rational person would consider pouring from our Held bucket into the child’s glass ethical. Yet according to the standard FIFO approach to attribution, it is supposedly safe. A drunk 5-year-old is a scary prospect, but, if you can believe it, there is actually something even crazier (though less ethically questionable) about this situation.

The true evil of the attribution problem

Here is the dark underbelly of the attribution problem.

Maybe you are a proponent of FIFO and think our first customer got 1 cup of vodka. If so: you are right.

Maybe you think the second cup must have contained some vodka and we served alcohol to a child. If so: you are also right.

Maybe you think we served the first customer 1 cup of lime juice, and the child 1 cup of kool-aid. If so: you are still right.

All of these claims are demonstrably true in equal measure. So are any of the other combinations of liquids we have in the bucket. This means that it’s possible to have a perfectly valid understanding of your finances that doesn’t map in any way to the actions taken or business rules used by your system. That’s insane!
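One way to see this is to enumerate the stories and compare what the ledger would record for each. The sketch below is a simplification: it only considers whole-cup stories (fractional mixes add infinitely more), and `ledger_view` is a made-up helper that mimics a ledger recording accounts and amounts but no liquid labels.

```python
from itertools import permutations

deposits = ["vodka", "water", "kool-aid", "lime juice"]

# Each story is (what customer 1 got, what customer 2 got).
stories = list(permutations(deposits, 2))


def ledger_view(story):
    """What a double-entry ledger records for a story: accounts and amounts only."""
    entries = [("In", "Held", 1) for _ in deposits]
    entries += [("Held", "Out", 1) for _ in story]
    return tuple(entries)


distinct_views = {ledger_view(story) for story in stories}

print(len(stories))         # 12
print(len(distinct_views))  # 1
```

Twelve different stories produce exactly one ledger, so the ledger alone cannot tell you which story happened.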

Note: This is why forensic accounting exists as a profession. Trying to reverse engineer what actually happened from a view of double entry accounts is very hard, and requires extensive corroborating information. Money launderers love this property of standard accounting systems. A lot of money laundering is trying to find holes where the way you understand money movement doesn’t match the actual actions being taken.

Does it matter?

So what if you can’t tell how money actually went through your system? I can use after-the-fact rules like FIFO to account for it all. Why should I even care about deterministic attribution?

This is a valid question. FIFO and rules like it have been the standard for accounting for a century or so. How could the existing system be broken? Let’s go back to our example above with the liquids, but give them some more real world context.

What if Lime juice was money from a stolen credit card? Once you know that it was bad money, wouldn’t you want to know who you paid it to? Being able to identify who is profiting from fraud is a key component to stopping it. Might be useful.
What if customer 1 lives in China? Chinese regulations say they can only receive money from Chinese bank accounts. What if Kool-aid is from a Chinese bank, and the others are not? Can we prove we complied with the law?
What if Vodka is from customer 1’s credit card, and they are doing a charge-back on the money after each time they get paid? That might be nice to know too.
What if the Water is a monetary grant that can only be used for funding science programs? Can you legally give it to customer 1?

These are complicated scenarios, but they are all real world problems. Deterministic attribution gives you the tools to solve them. If you are a domestic-only business, and work in a low fraud industry you might be able to get away with vague attribution for a while. Eventually your business will need to answer questions just like these, either for compliance reasons or even just to be able to better understand the flow of money through your systems.
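To make the scenarios concrete, here is a sketch of the kind of questions you can answer once every outbound payment is linked to its source. The `links` records, their `(source, recipient, cups)` shape, and the helper names are all hypothetical. A plain double-entry balance gives you no such records to query.

```python
# Hypothetical attribution records: (source label, recipient, cups).
links = [
    ("vodka", "customer 1", 1),
    ("water", "customer 2", 1),
]


def recipients_of(source, links):
    """Who was paid with money from `source`? Useful once a source turns out to be bad."""
    return sorted({recipient for src, recipient, _ in links if src == source})


def sources_for(recipient, links):
    """Which sources funded payments to `recipient`?"""
    return sorted({src for src, rcp, _ in links if rcp == recipient})


def complies(recipient, allowed_sources, links):
    """True if everything `recipient` received came from an allowed source."""
    return set(sources_for(recipient, links)) <= set(allowed_sources)


print(recipients_of("vodka", links))                 # ['customer 1']
print(complies("customer 1", {"kool-aid"}, links))   # False
```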

Common solutions

Ok, this is a potentially big problem, but how do you solve it?

Here we’ll talk about some common ways to solve this problem and what to expect if you pursue them.

Smaller accounts

(Sometimes called subledgers)

In our example we could solve the attribution problem in a pretty straightforward way: make a new account for each type of liquid. This approach works, but it has some serious drawbacks you’ll need to be ready to deal with.

Never ending: There is always another level of detail, or a new dimension that you’ll want to understand your money based on. This approach will be a source of constant maintenance for the life of your company.
Discoverability: Now that each logical account is broken down into a bunch of smaller accounts, you’ll need a way to find all the correct small accounts for the balance you want.
Transactionality: With tiny accounts you’ll likely be viewing and operating on aggregated balances. That means the promise of a “single, lockable row” for your balance no longer holds true. You’ll need a locking strategy for multi-account withdrawals.
Conflicting perspectives: Eventually your accounts will need to be so specific that you’ll end up with two conflicting opinions of how it should be modeled. There isn’t a way to address this with double entry, so be ready to pick the one that is closest to your core money-making business process.
Routing: You’ll need to make sure that deposits and transfers get routed to the correct small account. Once you have decomposed your accounts several times the routing for an individual action can start getting really complex.
Fix-forward only: You can’t decompose accounts once they have a balance. Each time you create smaller accounts you’ll need a plan for flushing the existing account, and routing new deposits to the correct smaller account.
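A minimal sketch of the smaller-accounts approach, assuming sub-accounts are keyed by a `(logical account, liquid)` pair; the function names are illustrative only. It hints at several of the drawbacks above: every deposit and withdrawal has to be routed to the right key, and the logical balance is now an aggregate over many rows rather than a single lockable one.

```python
from collections import defaultdict

# One balance per (logical account, liquid) pair instead of one per account.
sub_balances = defaultdict(int)


def deposit(account, liquid, cups):
    # Routing: the caller must know which sub-account this deposit belongs to.
    sub_balances[(account, liquid)] += cups


def withdraw(account, liquid, cups):
    key = (account, liquid)
    if sub_balances[key] < cups:
        raise ValueError(f"insufficient {liquid} in {account}")
    sub_balances[key] -= cups


def logical_balance(account):
    # Discoverability: the "real" balance is a sum over every matching sub-account.
    return sum(cups for (acct, _), cups in sub_balances.items() if acct == account)


for liquid in ["vodka", "water", "kool-aid", "lime juice"]:
    deposit("Held", liquid, 1)

withdraw("Held", "lime juice", 1)  # we now know exactly what was served
print(logical_balance("Held"))     # 3
```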

Transaction ledger

This approach uses a secondary data model other than just the balance. It keeps track of the inbound transactions and matches them to outbound transactions in order to patch around the attribution problem.

Transaction	Deposit Transaction	Remaining Balance
T1	Vodka	~~1 cup~~
T2	Water	~~1 cup~~
T3	Kool-aid	1 cup
T4	Lime juice	1 cup

This approach is pretty solid, actually. It solves the attribution problem by deterministically enforcing FIFO (or whatever attribution rule you want to use). The main downside is that the transaction ledger becomes the real source of truth. The balance row is no longer authoritative, and doesn’t much matter. If there is a conflict between the balance and the transaction ledger, the transaction ledger wins because it has significantly more detail than the balance. If you already built an account system, and you are trying to bolt on a solution to the attribution problem, this is the route we would suggest.

However, this still won’t be easy. You need to design the system carefully and be ready to handle the problems listed here.

Multiple balances: A transaction ledger makes each deposit (credit) transaction its own sub-balance of the account. You’ll need to be able to track each of those balances, and make sure they are all atomically updated when a withdrawal happens. Additionally, you’ll need to decide if you’ll try to keep the overall account balance in sync, or if you’ll compute it from the sub-balances every time.
Double-sided attribution: It’s not enough to just keep track of the sub-balances; you also need a way to map the deposit transactions to the withdrawal transactions that drew money from them. There are a number of approaches for this, and all of them are a bit tricky.
Debt: This approach has trouble dealing with negative balances because it’s expecting a one way flow of money. It’s possible to make it work for debt, but it’s going to be hard. If you are going to use this approach consider using specific accounts that have positive balances, but represent debt, instead of negative balance debt.
Accounting rules: The transaction ledger approach fixes attribution by applying accounting rules when money moves (in the example above we used the FIFO accounting rule). This is what most businesses start with, but it often evolves, particularly where regulatory compliance is involved. Using this approach your underlying system will need to either change every time the accounting rules change for that account type, or be flexible enough to support a wide range of rules.
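The pieces above can be sketched together as follows. This is one possible shape under stated assumptions, not a reference design: `Lot`, `fifo`, and `withdraw` are hypothetical names, the attribution rule is passed in as a plain function, and the in-memory list stands in for rows that a real system would update inside a single database transaction.

```python
from dataclasses import dataclass


@dataclass
class Lot:
    """One deposit transaction and the part of it not yet withdrawn."""
    txn_id: str
    label: str
    remaining: int


def fifo(lots):
    """Attribution rule: oldest deposit first (lots are kept in arrival order)."""
    return [lot for lot in lots if lot.remaining > 0]


def withdraw(lots, attributions, withdrawal_id, cups, rule=fifo):
    """Draw `cups` from lots in the order chosen by `rule`, recording each link."""
    candidates = rule(lots)
    if sum(lot.remaining for lot in candidates) < cups:
        raise ValueError("insufficient balance")
    needed = cups
    for lot in candidates:
        if needed == 0:
            break
        taken = min(lot.remaining, needed)
        lot.remaining -= taken
        needed -= taken
        # Double-sided attribution: (deposit, withdrawal, amount).
        attributions.append((lot.txn_id, withdrawal_id, taken))


lots = [
    Lot("T1", "vodka", 1),
    Lot("T2", "water", 1),
    Lot("T3", "kool-aid", 1),
    Lot("T4", "lime juice", 1),
]
attributions = []
withdraw(lots, attributions, "W1", 1)  # first customer
withdraw(lots, attributions, "W2", 1)  # second customer

print([(lot.txn_id, lot.remaining) for lot in lots])
# [('T1', 0), ('T2', 0), ('T3', 1), ('T4', 1)]
print(attributions)
# [('T1', 'W1', 1), ('T2', 'W2', 1)]
print(sum(lot.remaining for lot in lots))  # 2
```

After two withdrawals the lots match the table above, and the account balance is computed from the sub-balances instead of being stored separately. Swapping `rule` for a different function changes the accounting rule without touching `withdraw`.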

Product-level ledger

This approach tends to be where most businesses start. The order system, billing system, or some other product model is responsible for “the truth” of the financials, and then they record it in the financial system. The financial system is expected to blindly accept any information the product system gives it, and if there is ever a conflict the product system is used as the source of truth to resolve it.

This is a pretty straightforward, and business-first approach. It’s exceedingly popular amongst start-ups for exactly that reason, but it has some serious drawbacks.

Your product engineers are not financial engineers. They are optimizing for speed, flexibility, and shipping new features. They will happily use mutable data models, non-transactional writes, and processes that don’t maintain financial integrity. It’s not because they are bad engineers, it’s because product work has really explicit priorities, and they aren’t the same as finance.
It will slow down your product team. Your financial team (CFO, accountants, etc.) needs a really specific view of the world. They need to attribute and account for money in a GAAP-focused world. Using this approach your product team will get a consistent barrage of requirements from finance to make sure that they can understand correctly what is happening. This is going to bog down your product team with constraints they don’t care about or understand. This issue might take a while to show up, but when you reach a funding round where you need to show evidence of your financials or get big enough to have to pass financial audits it will appear with a vengeance. You’ll also have to make the entire history of your system compliant when that happens, so be really careful with this approach.
You’ll need constant system reconciliation. The financial system needs to stay up to date on a monthly basis at an absolute minimum, and using this approach the two systems will never be completely in sync. You’ll need to build systems for reconciling the two. Even if the rule for reconciling is “always trust the product model,” you’ll find this is a fairly complicated process. Be ready for data pipelining and anomaly detection to be major areas of focus.
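At its very simplest, the reconciliation step might look like the sketch below. The order IDs, the integer-cents amounts, and the `reconcile` function are made up for this example; a real pipeline would also have to handle timing differences, currencies, and partial records.

```python
def reconcile(product_totals, ledger_totals):
    """Return {key: (product_amount, ledger_amount)} wherever the two systems disagree."""
    mismatches = {}
    for key in sorted(product_totals.keys() | ledger_totals.keys()):
        product_amount = product_totals.get(key, 0)
        ledger_amount = ledger_totals.get(key, 0)
        if product_amount != ledger_amount:
            mismatches[key] = (product_amount, ledger_amount)
    return mismatches


# Hypothetical per-order totals in integer cents.
product = {"order-1": 1000, "order-2": 2500, "order-3": 400}
ledger = {"order-1": 1000, "order-2": 2000}

print(reconcile(product, ledger))
# {'order-2': (2500, 2000), 'order-3': (400, 0)}
```

Even with an “always trust the product model” rule, each mismatch still has to be turned into a correcting entry in the financial system, which is where most of the complexity lives.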

The solution

It is nearly impossible to avoid the attribution problem in a double-entry system. Blockchain ledgers have better traceability, as they can trace money back to the most recent merge, but still suffer from the attribution problem for merges. The only surefire way to avoid the problem is by using a ledger that guarantees deterministic attribution.

String Theory is the only modern ledger system with provable attribution for all money in the system. We’ll discuss the properties of deterministic attribution and its uses in the next article.