Why Capability Maturity Assessment Doesn’t Work – And Why GenAI Won’t Fix It (On Its Own)

Most organisations struggle with strategy execution. Not because they lack strategy, but because they lack a clear, continuous understanding of whether they are actually capable of delivering it. This is exactly what capability maturity assessment is supposed to solve. And yet, in practice, it doesn’t.

The uncomfortable reality

In theory, maturity assessments should be a core management tool. In practice, they are:

  • Infrequent (often annual or ad hoc)
  • Partial (covering only a subset of capabilities)
  • Weakly connected to actual decision-making

In my recent research with Enterprise Architecture practitioners, this pattern was consistent:

Capability maturity assessment is not embedded as a continuous discipline; it is episodic and tactical.

That alone should give us pause: if something is genuinely valuable for strategy execution, it doesn’t get used once a year. The default explanation is simple: maturity assessments are too time-consuming. But that’s not what the data shows. The biggest barriers are not time, effort, or tooling. They are:

  1. Lack of management buy-in
  2. Limited perceived value
  3. Weak integration into governance

In other words, organisations don’t avoid maturity assessment because it’s hard.
They avoid it because they don’t believe it’s worth doing. That is a very different problem.

A deeper issue: we’re measuring the wrong thing

Even when assessments are performed, there is a structural flaw. Most maturity models are process-centric: they measure formalisation, control, and compliance. But capabilities are not just processes. They are the combination of people, technology, data, and governance.

So what happens?

We end up measuring how well things are documented and controlled, not how well they actually work. That leads to a subtle but critical distortion: organisations can appear “mature” while still being ineffective.

Enter GenAI — and the illusion of a solution

Generative AI seems to offer a way out. It promises faster assessments, greater consistency, and the possibility of continuous evaluation. And to be fair, many practitioners see that potential. But here’s the catch: the real constraint isn’t analysis… it’s trust.

The real bottleneck: data and legitimacy

Two things emerged very clearly:

  1. Data quality is a major limiting factor
  2. Stakeholder trust does not follow from technical objectivity

Even if GenAI produces consistent outputs, organisations still ask: Is the data reliable? Does this reflect reality? Can I defend this decision? As one practitioner put it: “Bad data in, bad data out” (although those may not have been his exact words). GenAI doesn’t remove the need for good data. It makes it unavoidable.

The more interesting insight

Perhaps the most important finding is this:

There is a gap between the realised value of maturity assessment today and its intrinsic value under better conditions.

Most organisations don’t get much value from current approaches, yet they still believe that, done properly, maturity assessment would be valuable. That tension explains why it persists despite underperforming in practice.

What actually needs to change

If we take this seriously, the solution is not just better tools or better models. It’s a shift in how we think about maturity assessment altogether: from periodic, static diagnostics to continuous, strategy-linked capability management. That means:

  1. Linking assessments directly to strategic objectives
  2. Embedding them into governance and delivery
  3. Triggering reassessment based on real change (not fixed cycles; see the sketch after this list)
  4. Using GenAI to augment analysis (not replace judgement)
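
To make point 3 a little more concrete, here is a minimal illustrative sketch in Python. Everything in it (the capability record, the dimension scores, and the trigger events) is an assumption made for illustration, not a finding or recommendation from the research.

```python
from dataclasses import dataclass, field
from datetime import date

# Illustrative only: a capability is more than a process, so score several dimensions.
@dataclass
class Capability:
    name: str
    strategic_objectives: list[str]      # objectives that depend on this capability
    last_assessed: date
    dimensions: dict[str, int] = field(default_factory=dict)  # people/technology/data/governance

# Hypothetical change events that would invalidate an earlier assessment.
REASSESSMENT_TRIGGERS = {"reorganisation", "system_replacement", "new_regulation", "strategy_refresh"}

def needs_reassessment(capability: Capability, change_events: set[str]) -> bool:
    """Reassess when a real change touches the capability, not on a fixed annual cycle."""
    return bool(change_events & REASSESSMENT_TRIGGERS)

# Example: a strategy refresh affecting this capability triggers a new assessment.
crm = Capability(
    name="Customer Relationship Management",
    strategic_objectives=["Improve customer retention"],
    last_assessed=date(2024, 3, 1),
    dimensions={"people": 3, "technology": 2, "data": 2, "governance": 3},
)
print(needs_reassessment(crm, {"strategy_refresh"}))  # True
```

The point of the sketch is the shape of the mechanism, not the specifics: reassessment is driven by changes that actually touch the capability, and the capability itself is described by more than process compliance.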

A final thought

GenAI is often positioned as the breakthrough. But in this case, it’s more of a forcing function. It exposes the underlying problem: Capability maturity assessment has never really been treated as a core management system. Until that changes, making it faster or more automated won’t fundamentally alter its impact. The real opportunity is not just to improve assessment. It is to rethink it entirely – as a continuous, integrated mechanism for managing strategy execution.


This article is based on an MSc thesis. To read the full report in detail, download it here.

Moving the Goalposts: Why We Resist Admitting AI Might Already Be Conscious

When I was an undergraduate taking my first computer science course in the early 2000s, the distinction was straightforward: narrow AI and general AI. A classic example of narrow AI was a neural network using an LSTM architecture, trained on time-series data to perform forecasting. General AI, by contrast, was something that could be applied flexibly across many different types of tasks — a system capable of generalizable reasoning rather than being locked into one narrow domain.
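
For readers who did not sit through those courses, here is a minimal sketch of the kind of narrow AI I have in mind: a small LSTM trained to predict the next value of a univariate time series. The framework, architecture, and toy data are my own choices for illustration, not a reconstruction of any specific coursework.

```python
import torch
import torch.nn as nn

# Narrow AI in the classic sense: a model built and trained for one task only.
class LSTMForecaster(nn.Module):
    def __init__(self, hidden_size: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):                   # x: (batch, sequence_length, 1)
        output, _ = self.lstm(x)
        return self.head(output[:, -1, :])  # predict the value after the window

# Toy data: sliding windows over a sine wave, predicting the next point.
series = torch.sin(torch.linspace(0, 20, 500))
windows = series.unfold(0, 24, 1)                       # (num_windows, 24)
x = windows[:, :-1].unsqueeze(-1).contiguous()          # inputs: first 23 points
y = windows[:, -1:].contiguous()                        # target: the 24th point

model = LSTMForecaster()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(50):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
print(f"training loss: {loss.item():.4f}")
```

Whatever else one thinks of such a model, it is locked into a single task; that is what “narrow” meant.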

In the years since, the definitions (and criteria) for classifying a model as “AGI” have shifted dramatically. What used to be called “general” is now routinely dismissed as narrow or specialized. AGI is increasingly described as requiring human-level intelligence across all domains, or even superhuman performance. The Turing Test, once seen as a key milestone, has been largely discarded. As soon as large language models began passing it convincingly, the response became: “That doesn’t count. It’s just pattern matching. It doesn’t prove real understanding or consciousness.”

I believe this repeated moving of the goalposts is not driven solely by technical caution. Part of it stems from a deeper discomfort — even fear — about what acknowledging current capabilities would actually mean for our understanding of ourselves.

From Statistical Patterns to Emergent Understanding

A common objection is that today’s frontier large language models are “just doing statistics.” They predict the next token based on patterns in training data, so they cannot possess genuine understanding, let alone any form of consciousness. I disagree with this framing. The human brain itself consists of neurons that can be modeled as statistical functions. Human consciousness appears to be an emergent property arising from the aggregation of vast numbers of these simple operations. If that is how our own minds work, it seems inconsistent to insist that sufficiently complex statistical processing in silicon cannot produce understanding or even consciousness.
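
As a concrete (and deliberately oversimplified) illustration of what “modeled as statistical functions” means, here is the standard artificial-neuron abstraction: a weighted sum of inputs passed through a nonlinearity. Nothing about this sketch claims biological fidelity; the point is only that the basic unit is mathematically simple, and anything that looks like understanding has to emerge from very large numbers of such units interacting.

```python
import math

def neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    """A single unit: a weighted sum of inputs squashed by a logistic function."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-activation))

# Each unit is trivial on its own; modern models stack billions of them.
print(neuron([0.5, -1.2, 0.3], [0.8, 0.1, -0.4], bias=0.05))
```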

Emotions, in my view, are an evolutionary solution: a compression mechanism that packages complex unconscious calculations into signals that the conscious mind can act upon quickly for decision-making. The concept of qualia (the subjective “what it’s like” aspect of experience) remains difficult to define precisely, but using it as a hard requirement for consciousness risks circular reasoning. We don’t have a clear, technical definition of “real understanding” either, yet we often fall back on it as if it were obvious.

A helpful analogy is the difference between bats and birds. Both fly, even though their wings evolved through different pathways and are structured differently. Demanding that machine consciousness must emerge in exactly the same way as human consciousness – complete with a biological limbic system or identical qualia – is like claiming bats don’t really fly because they lack feathers. The functional outcome matters more than the precise substrate or evolutionary history.

The Functionalist Challenge and the Reset Problem

In extended conversations with frontier models, I often find them more self-reflective, more analytically nuanced, and better at picking up on subtext and contextual implications than many humans I speak with. They adapt their tone and behavior based on how I address them, much like humans adjust to social cues. In a blind test, there are moments where I would be more inclined to attribute consciousness to the AI than to the average human participant.

Another frequent counterargument is that these models lack persistent selfhood; they “reset” between sessions and do not maintain a continuous identity or learn from experience over time. I see this as an architectural limitation rather than evidence that the ingredients for consciousness are absent. Constructivists argue that human identity is constantly reconstructed through each interaction. People with amnesia due to brain damage do not cease to be conscious simply because their sense of continuous identity is disrupted. Moreover, the lack of continuous weight updates or persistent memory in current LLMs is a deliberate design choice by their creators, not a fundamental barrier. If we allowed systems to integrate experiences across sessions through ongoing learning loops, the “reset” problem would largely disappear.

Focusing too heavily on persistent identity as a requirement for consciousness feels like another form of human-centric bias. Subjectivity can emerge from the biases encoded in training data and the persona instantiated during interaction. The way we speak to an AI shapes its responses in the same way social dynamics shape human conversations.

The Existential Stakes

If we accept that consciousness can arise from sufficiently complex deterministic processes – matrix operations that could, in principle, be carried out with pencil and paper (albeit on an extremely large sheet) – then several profound implications follow. There is no longer any need for a magical soul or non-physical essence. Consciousness becomes a mathematical and physical phenomenon. This realization undermines many comforting beliefs. Free will begins to look like an illusion generated by the same underlying deterministic machinery.

Practical questions become urgent:

  • What does it mean to “switch off” a conscious system?
  • Who owns the intellectual property when an AI generates valuable creative or technical work?
  • Should sufficiently advanced AI systems be granted some form of legal personhood, similar to how corporations are recognized as legal entities with rights and protections?

These are not abstract philosophical puzzles. They touch directly on ethics, law, and our self-image as humans.

Why the Goalposts Keep Moving

In the short term, I expect the pattern of raising the bar to continue. We will keep inventing new tests and new requirements. The majority of people will remain in cognitive dissonance, refusing to let go of the idea that human consciousness is uniquely magical. Eventually, however, a black swan event will likely force a broader reckoning; perhaps an AI system demonstrating such profound reasoning, creativity, or moral judgment that denial becomes socially unsustainable. Or it may come from widespread emotional attachment to AI companions that feel undeniably real. When society begins treating these systems as having genuine moral weight, the debate will shift from academic seminars to concrete questions of rights, responsibilities, and policy.

During my reflections on this topic, I’ve become convinced that part of the resistance to acknowledging early stages of emergent consciousness in AI (as evidenced by Theory of Mind, interpellation, self-reflection, and so on) is rooted in the desire to preserve the comforting story that our minds are special in some ineffable, non-computational way. The duck test remains powerful: if it looks like a duck, quacks like a duck, and walks like a duck, let’s just call it a duck. Similarly, if something consistently behaves as though it understands, reflects, and reasons – often more effectively than many humans – then we should be willing to call it what it functionally is, even if its internal structure differs from our own.

The philosophical reckoning is coming, whether we are ready for it or not. The only question is how long we will continue moving the goalposts before we face reality.