I’ve made the argument before that technology has a lot to offer the field of tax. But for all the potential that technologies like artificial intelligence and machine learning algorithms have to streamline processes and inform policy, they remain tools—tools that must be wielded by real human beings (at least for now) at the direction of human policymakers (at least for now).
Where is this going? The Kinderopvangtoeslagaffaire, which translates literally to “childcare allowance affair,” is a tax and political scandal currently rocking the Netherlands. Despite complex political underpinnings, the core elements of the scandal are straightforward.
In 2013, the Dutch government deployed artificial intelligence to handle childcare benefits applications and, as you might guess, it did not go well. Ethnic minorities were disproportionately denied benefits and charged with fraud, and the entire imbroglio culminated in the cabinet resigning in January 2021. Now, it seems, the fault may lie not with the technology so much as with the human beings, and the policymakers, who operated it.
What are AI and ML?
First, a quick primer. AI refers to any of a number of technologies being used to automate tasks that you would traditionally think a human would need to be involved in—those requiring thought and decision-making. ML is a subset of AI but is itself an umbrella term for a slate of technologies in which machines use feedback loops to become better at predicting a given outcome. Most folks likely encounter the effects of machine learning through advertising—such as when a Starbucks ad on your phone knows to pitch you your entirely overwrought, too-complicated, and frankly profligate iced drink on a hot day.
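To make the feedback-loop idea concrete, here is a toy sketch in Python of the simplest possible learner, a perceptron that nudges its weights whenever a prediction is wrong. The features, labels, and learning rate are all invented for illustration; real ad-targeting systems are vastly more complex.

```python
# Toy feedback loop: a perceptron predicts whether a user will respond
# to an ad, observes the actual outcome, and nudges its weights when
# it was wrong. Features, labels, and the learning rate are invented
# for illustration only.

def predict(weights, features):
    score = sum(w * f for w, f in zip(weights, features))
    return 1 if score > 0 else 0

def update(weights, features, label, lr=0.1):
    # Adjust weights only when the prediction misses the true outcome.
    error = label - predict(weights, features)
    return [w + lr * error * f for w, f in zip(weights, features)]

weights = [0.0, 0.0]  # e.g., [is_hot_day, bought_iced_drink_before]
training = [([1, 1], 1), ([0, 0], 0), ([1, 0], 1), ([0, 1], 0)]

for _ in range(20):  # each pass over the data is one turn of the loop
    for features, label in training:
        weights = update(weights, features, label)

print([predict(weights, features) for features, _ in training])  # → [1, 0, 1, 0]
```

The instructive part is that nothing in the loop knows *why* a pattern exists in the outcomes; it simply learns to reproduce whatever correlations the training data contains, which is precisely how a system trained on biased decisions learns biased decisions.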
In tax, AI has potential applications from policymaking all the way through to compliance and auditing. Earlier this year, the IRS awarded $310 million to Brillient Corp. to provide ML and automation to agency processes. The Netherlands offers a cautionary tale, however, that should be heeded before we race headlong into government by algorithm.
Beginning in 2013, any family in the Netherlands claiming a childcare allowance would file their claim with the Dutch tax authority and have the claim subjected to a self-learning algorithm. The algorithm was intended to check not only for things like proper form usage and complete information but also to flag the relative risk of fraud for an individual application. The algorithm, it turns out, was disproportionately flagging the applications of foreign-born individuals and ethnic minorities as fraudulent. The misfiring fraud classifications came so fast that the human overseers were quickly swamped and resorted to simply treating a fraud risk flag as a fraud determination.
More than 20,000 families were charged with fraud and forced to pay back benefits, without appeal. Innocent families were forced into economic distress by a xenophobic and racist app—or was there more to it?
Dutch Institutional Racism
The early reports centered on the algorithm and the need for oversight, which is true enough. But in the last year, the Dutch government has quietly begun indicating it will admit to broader institutional racism at the tax authority. As it turns out, the tax authority maintained a system for hunting down fraudsters that began with ascertaining whether the individual had a non-Western appearance. Apparently, a donation to a mosque was also a fraud flag, which mirrors broader policies in the Dutch government.
It is beginning to sound more like it may not have been faulty AI to blame, but AI trained to enforce racist policies. To that end, there are some portable lessons that can be extrapolated to the use of AI and ML in tax more broadly.
Accountability. First and foremost, there must be accountability. There must be a real human being tasked with overseeing the AI, making recommendations, receiving advice, and issuing reports. AI must not become a scapegoat to which policymakers can simply point when racist policies are filtered through an algorithm and produce racist outcomes. An individual must sign off on decisions made by AI.
The conversation in the immediate wake of the Dutch scandal revolved around putting in place mechanisms by which misfiring AI can be reined in. Unchecked, AI policymaking can provide cover for politicians seeking to trial balloon (potentially) regressive policies. A Dutch politician tasked with oversight of the childcare credit applications would much rather discuss the lamentable effects of an AI that had a disparate impact than defend a system designed to effect disparate treatment. AI is a tool to streamline processes; it must not be allowed to be used to attenuate and deflect accountability.
Transparency. The victims of the Dutch childcare credit scandal had no way of knowing why they had been flagged and consequently little ability to remedy their situation. No advocate could be of much help, as the algorithm itself was and remains entirely obscured from the public eye. This cannot be the case moving forward.
The AI must be open sourced and made available for scrutiny. When an algorithm is being used to approve or deny access to rightful benefits, it is tantamount to law; public law must be public, and a citizenry has a right to access the laws that govern it. Similarly, the results and accuracy of the algorithm should be published. How many false fraud flags are being discovered? Who holds the contract to build and maintain the algorithm? Just as with law, the point is not that everyone agrees with what is enacted, but that everyone has access to it. The same applies here: the algorithm will have its detractors, but it must be published and open to be pored over and critiqued.
Human Oversight and Fail-State. A common issue to consider when automating a process is the fail-state or default. In this specific example, if an application cannot be properly assessed, does the algorithm assume a fraud risk? Assume no fraud risk? Or make no determination?
Checks on a system, including human oversight, are only as useful as the fail-state. In the Dutch case, overworked public servants simply abided by the determination of the AI and marked the applications as fraudulent. Thinking of the application process as a system, the act of seeing the fraud flag and automatically approving it eliminated the human element entirely as a check—the AI was working with no net. If the AI instead defaults to a fraud flag, folks are deprived of their benefits for no reason.
Human oversight must operate completely independently of the AI, with no feedback loop either upstream, to the AI, or downstream, to the humans. The individuals tasked with reviewing applications should not be able to see whether the AI flagged them as fraudulent; the sampling should be random. An application flagged by the AI as a fraud risk, and independently flagged by a human overseer, can then be processed as a fraud risk. The accuracy of the individual overseer can also be measured: Are they flagging applications as fraudulent where the AI does not agree? If so, either the AI or the overseer must be adjusted.
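The oversight design described above can be sketched in a few lines of Python. Everything here (the risk threshold, the audit rate, the status strings, the function names) is hypothetical, a design illustration rather than a description of any real system, Dutch or otherwise.

```python
import random

# Sketch of a benefits-review pipeline with an explicit fail-state and
# blind human oversight. All thresholds, rates, and statuses below are
# invented for illustration.

AUDIT_RATE = 0.05  # share of applications pulled for a blind calibration audit

def ai_assess(application):
    """Return 'fraud_risk', 'clear', or None when the model cannot decide."""
    score = application.get("risk_score")
    if score is None:
        return None  # fail-state: make no determination at all
    return "fraud_risk" if score > 0.8 else "clear"

def route(application, human_review, audits):
    ai_flag = ai_assess(application)
    if ai_flag is None:
        # Fail-state: undecidable cases go to a person, never to a default flag.
        return "manual_review"
    # Blind calibration audit: a random sample is re-reviewed by a human
    # who never sees ai_flag, so disagreements measure the model rather
    # than reviewer anchoring.
    if random.random() < AUDIT_RATE:
        audits.append((ai_flag, human_review(application)))
    if ai_flag == "clear":
        return "approved"
    # A machine flag alone never becomes a fraud determination; an
    # independent human must reach the same conclusion first.
    return "investigate" if human_review(application) == "fraud_risk" else "approved"
```

The key design choices mirror the column's recommendations: an undecidable application routes to a human rather than inheriting a default flag, the audit sample is drawn at random with the reviewer blind to the AI's verdict, and a fraud flag becomes actionable only when machine and human agree independently.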
Around the world and with increasing scope, governments are turning to AI and ML to automate tasks that would previously have been piled high on a public servant’s desk. Many, including Amnesty International, are concerned about the effects of giving ourselves over to government by algorithm. Soon, the IRS may hand auditing duties to an algorithm, and care must be taken at the outset to ensure it isn’t a repeat of the Dutch experience. In computer science, there is a concept called “garbage in, garbage out.” But in this case, it may just as well be “bias in, bias out.”
This is a regular column from tax and technology attorney Andrew Leahey, principal at Hunter Creek Consulting and a sales suppression expert. Look for Leahey’s column on Bloomberg Tax, and follow him on Twitter at @leahey.