The Hours We Cannot See

The man sitting across from Marek Kowalczyk in a Warsaw addiction clinic was doing well. He said so himself, week after week, with the quiet confidence of someone four months into sobriety who has begun to trust his own stability. "Sleeping fine. Stress is manageable." His eye contact was steady, his affect appropriate. By every measure available in that small consultation room—verbal report, clinical observation, therapeutic rapport—Tomasz was a success story in progress.

And yet something nagged at Kowalczyk, a psychologist with twelve years of experience treating alcohol use disorder. Call it clinical intuition, that ineffable sense cultivated through thousands of hours of sitting with people in varying states of recovery and collapse. "I couldn't point to anything specific," he recalls. "He was saying all the right things. But there was a flatness to it, like he was reading from a script he'd convinced himself was true."

This is the fundamental predicament of weekly psychotherapy: the 167-hour gap. A patient arrives, speaks for fifty minutes, and vanishes back into a life the therapist can only glimpse through retrospective self-report. What happens in those invisible hours—the 2am wakings, the creeping irritability, the craving that surges and subsides before it can be named—remains largely opaque. Patients aren't lying, exactly. They're doing what humans do: constructing coherent narratives that smooth over the jagged edges of lived experience.

The idea of continuous monitoring in mental health carries, for many European clinicians, an uncomfortable resonance. In Poland especially, where the surveillance apparatus of the communist era left deep cultural scars, the notion of tracking patients' movements and physiological states can feel less like innovation than intrusion. German colleagues, shaped by their own historical sensitivities and stringent Bundesdatenschutzgesetz requirements, express similar wariness. When Kowalczyk first encountered digital biomarker platforms—systems that synthesize data from wearables, brief daily check-ins, and AI-generated clinical summaries—his skepticism was reflexive. "My first thought was: this is surveillance dressed up as care. My patients escaped one system of monitoring. Why would I invite them into another?"

The distinction he eventually drew was between surveillance imposed and transparency chosen. His skepticism eased only when he could frame the monitoring explicitly as opt-in, time-limited, and fully reversible—something the patient controlled rather than something done to them. Tomasz agreed to a trial: his Apple Watch would passively track sleep architecture and heart rate variability; twice daily, he would complete thirty-second mood and craving check-ins on his phone; every two weeks, he would fill out PHQ-9 and GAD-7 assessments. The data would remain encrypted on EU servers, under Polish data protection jurisdiction, deletable at any moment he chose. "It felt more like a diary than a surveillance camera," Tomasz later reflected. "Except this diary showed patterns I wouldn't have noticed or might have minimized."

What emerged by weeks five and six was a portrait strikingly at odds with Tomasz's verbal reports. His sleep, which he dismissed as "not great, but normal for me," showed severe fragmentation: deep sleep decreased from eighteen percent to nine percent of total sleep time; awakenings increased from one or two to three or four per night. His heart rate variability—a measure of autonomic nervous system flexibility with emerging evidence in addiction research, though findings remain mixed—was trending steadily downward. His average daily steps had dropped from roughly eight thousand to forty-eight hundred, a forty percent reduction suggesting social withdrawal. And the brief craving reports he'd filed revealed four spikes in a single week against a baseline of zero to one. His twice-daily mood ratings had slipped from 7.4 to 6.2 out of ten.

It bears noting that these measurements were approximations, not ground truth. Consumer wearables infer sleep stages rather than measuring them directly; HRV is affected by caffeine, minor illnesses, and exercise; step counts drop for reasons that have nothing to do with mood or relapse risk. The system was useful not because it was perfectly accurate, but because it provided consistent, interpretable trends that clinician and patient could discuss in context.

Tomasz wasn't lying to Kowalczyk. He wasn't even lying to himself, exactly. He simply hadn't connected the dots. "I thought the cravings didn't matter because I didn't drink," he said later. "I didn't realize they were telling me something."

The conversation that followed in session six marked a turning point—not because technology replaced clinical judgment, but because it gave judgment something concrete to work with. Before the session, Kowalczyk reviewed a two-minute AI-generated briefing that synthesized the week's data: Stability Index change, top anomalies flagged, EMA trends, and carry-over goals from the prior session. His preparation time dropped from roughly fifteen minutes of reviewing scattered notes to about ninety seconds of focused review.
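
The platform's internal format isn't documented in this account, so purely as an illustration, a pre-session briefing of the kind described might be organized roughly as follows. This is a minimal sketch in Python; every field name and the one-line summary are assumptions, not the vendor's actual schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PreSessionBriefing:
    """Hypothetical shape of a pre-session briefing.

    The case study names the contents (Stability Index change, flagged
    anomalies, EMA trends, carry-over goals) but not the data model;
    the fields below are illustrative only.
    """
    patient_id: str
    week_ending: str                                           # ISO date string
    stability_index_prev: int                                  # 0-100 composite, prior week
    stability_index_now: int                                   # 0-100 composite, current week
    top_anomalies: List[str] = field(default_factory=list)     # e.g. "4 craving spikes vs baseline 0-1"
    ema_trends: List[str] = field(default_factory=list)        # e.g. "mean mood 7.4 -> 6.2"
    carryover_goals: List[str] = field(default_factory=list)   # goals noted at the end of the last session

    def headline(self) -> str:
        """One scannable line for the top of the briefing."""
        delta = self.stability_index_now - self.stability_index_prev
        return (f"Stability Index {self.stability_index_now} ({delta:+d} vs last week), "
                f"{len(self.top_anomalies)} anomalies flagged")
```

Whatever the real schema looks like, the compression is the point: a week of raw signals reduced to a handful of deltas and flags that a clinician can absorb in the ninety seconds Kowalczyk describes.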

Instead of the usual opener ("How was your week?"), Kowalczyk began with specificity: "I'm noticing your sleep has been really disrupted the past two weeks—four or five wake-ups a night. And I see you reported cravings four times, including one at 2am on Thursday. What's been happening?"

Tomasz's surprise was palpable. "I didn't realize it was that bad." What followed was genuine exploration rather than ritualized check-in: work stress had been escalating for weeks, triggering sleep disruption, which fed exhaustion, which activated craving pathways he'd thought he'd mastered. The pattern was clearer in retrospect—as it often is—but retrospect is precisely what self-report usually provides. The biomarker data offered something closer to real-time pattern recognition.

Lambert and Shimokawa's 2011 meta-analysis in Psychotherapy (N > 6,000 across multiple trials) found that feedback systems improve outcomes with an effect size around d = 0.25, 95% CI [0.17, 0.33]—meaningful but modest, roughly equivalent to adding a few percentage points to recovery rates. The gains come not from algorithmic revelation but from something simpler: making the invisible visible, creating occasions for conversations that might otherwise never happen.

The real test came at week eight. The system tracked what it called a "Stability Index"—a composite metric calculated from rolling deviations across HRV, sleep, activity, and self-reported mood against Tomasz's personal baseline. His index had dropped from 78 (moderate stability) to 44 (below the threshold of 50 that flagged fragile recovery). HRV had declined sharply over three days. Sleep architecture had worsened. Activity patterns suggested isolation. The system generated an alert: review recommended.
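
The vendor's formula isn't disclosed here, but a composite of this kind can be sketched simply: score each signal's recent values against the patient's own rolling baseline, flip the signals where lower means worse, and map the average deviation onto a 0-100 scale. The Python sketch below makes up its own scaling and thresholds; it illustrates the idea of a personal-baseline composite, not the actual Stability Index.

```python
import statistics

def stability_index(recent: dict, baseline: dict) -> float:
    """Toy personal-baseline composite; not the platform's actual formula.

    `baseline` maps each signal name to values from a stable reference period;
    `recent` maps the same names to the last few days of values. Positive
    deviation is defined as 'worse than baseline' for every signal.
    """
    lower_is_worse = {"hrv_ms", "deep_sleep_pct", "steps", "mood"}  # cravings, awakenings: higher is worse
    deviations = []
    for signal, history in baseline.items():
        mean = statistics.mean(history)
        sd = statistics.stdev(history) or 1.0          # guard against zero variance
        z = (statistics.mean(recent[signal]) - mean) / sd
        if signal in lower_is_worse:
            z = -z                                     # flip so positive z = deterioration
        deviations.append(max(z, 0.0))                 # ignore changes better than baseline
    # Arbitrary scaling: no deviation -> 100, an average deviation of 3 SD -> 0.
    return round(100.0 * max(0.0, 1.0 - statistics.mean(deviations) / 3.0), 1)
```

Fed with numbers like those reported for Tomasz (HRV trending down, deep sleep halved, steps off forty percent, mood lower by a point), a scheme like this registers a broad, multi-signal decline rather than any single alarming reading, which is what a fixed alert threshold such as the 50 mentioned above is meant to catch.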

Kowalczyk reached out between sessions, a departure from his usual practice. "Your message stopped me," Tomasz later said. "I was planning to drink Friday." Kowalczyk is cautious about how much weight to give that statement. Whether the plan would have become action is impossible to know, but the alert created a contact point well before the next scheduled session; by his estimate, he was able to intervene roughly two to three times sooner than a weekly check-in rhythm alone would have permitted.

But the biomarkers sometimes got it wrong—or rather, they were right in ways that didn't capture the full clinical picture. At week twelve, Tomasz's data looked excellent: HRV up, sleep stable, Stability Index comfortably at 81. Yet he reported increased anxiety in session. The algorithm saw physiological equilibrium; it couldn't see that his partner had just announced a pregnancy, bringing joy complicated by stress, hope shadowed by fear of relapse during the enormous transition ahead. "The numbers said he was fine," Kowalczyk notes. "The numbers were measuring the wrong thing. Physiological stability doesn't mean psychological ease—context matters. The algorithm flags; you interpret."

The system failed in ways both predictable and instructive. Some of Kowalczyk's patients—particularly those with health anxiety—became obsessed with their own metrics, interpreting normal HRV fluctuations as signs of impending collapse. Others found the daily check-ins burdensome, their compliance trailing off within weeks. Rural patients with unreliable internet connectivity struggled to sync their data at all. And the cultural resistance Kowalczyk had initially felt proved widespread among colleagues, many of whom viewed the entire enterprise with suspicion that no amount of GDPR compliance could alleviate.

The technology also demanded something unexpected: education. Heart rate variability is not an intuitive metric; interpreting its fluctuations requires training that most psychotherapy curricula don't provide. Sleep architecture data is similarly opaque without context. "This isn't plug and play," he observes. "It's a new clinical skill, and like any skill, it takes time to develop and can be done badly." AI-generated session summaries reduced his documentation burden by roughly thirty percent, but he still edited every draft manually—the algorithm could organize; it couldn't judge.

Six months later, Tomasz remains abstinent. He credits the monitoring with making his internal states legible in ways self-report never could: "I couldn't deny the data. When I saw my sleep charts, I had to take my stress seriously." Kowalczyk's practice has shifted subtly. He no longer opens sessions with the anodyne question that invites anodyne answers. Instead, he arrives with specifics: "I see your sleep was rough Tuesday—what happened?" The ritual fifteen minutes of catching up has compressed into targeted, pattern-informed exploration.

But he remains wary of the narrative he's just told—the tidy story of technology illuminating truth and preventing disaster. "There's a version of this where the machines become the authority and the therapist becomes the technician," he says. "That would be a catastrophe." What he's found useful is something more modest: biomarkers as conversation starters, AI summaries as preparation tools, alerts as invitations to pay closer attention. The clinical relationship remains primary. The algorithms serve the judgment; they don't replace it.

In the end, what passive monitoring revealed about Tomasz was not some secret truth hidden from view. It was the pattern he was living but couldn't see—the same pattern any attentive observer might notice, given sufficient data and sufficient time. The technology's contribution was simply this: it watched the 167 hours that Kowalczyk could not and reported back without the distortions of memory or the smoothing of narrative. The data had their own limitations—noise, occasional inaccuracies—but they showed trends that neither of them had been able to articulate. Whether that contribution is worth the risks—of surveillance creep, of metric obsession, of algorithmic authority displacing clinical wisdom—remains an open question. But for one patient in one Warsaw clinic, seen more clearly than weekly sessions alone could manage, it made a difference. Sometimes that's enough to justify the experiment.
