If you manage Azure Virtual Desktop, you’ve fought FSLogix issues. Profile containers that won’t attach. Start menus that crash on login. Black screens that leave users staring at nothing while the helpdesk queue fills up. Sign-out hangs that lock VHDs and cascade into the next user’s session.

Every AVD admin has been there. The problem isn’t a lack of documentation; Microsoft’s FSLogix docs are thorough. The problem is that when a P1 hits at 8:47 AM and 200 users can’t work, you need a structured process, not a documentation hub.

That’s why we built and open-sourced the FSLogix Troubleshooting Runbook Template.

Why a Runbook Template?

Most organizations troubleshoot FSLogix the same way: someone with tribal knowledge fixes it, and nobody writes down what they did. When that person is on vacation, the next incident takes three times as long.

A runbook changes that. It gives every tier of support, L1 through L3, a repeatable, evidence-first workflow. It standardizes log collection, diagnosis, and remediation so the fix doesn’t depend on who’s on call.

We built ours over months of real-world AVD engagements, then decided to open-source it so other teams can use it as a starting point for their own processes.

What’s Inside the Runbook

The template is a single comprehensive markdown file (~2,300 lines) with inline Mermaid diagrams that render directly on GitHub. It’s designed to be forked, customized to your environment, and used as a living operational document.

Incident Triage & Decision Matrix

A quick-reference lookup table maps symptoms to actions. For each symptom (Start menu crash, black screen, explorer loop, mass outage), it gives three actions in priority order and a clear escalation trigger. L1/L2 techs can start resolving without reading the full document.

Diagnostic Workflow

A visual flowchart (Mermaid) walks through the decision tree:

  1. Scope: single user or pool-wide?
  2. FSLogix attach status: did the container mount?
  3. Branch: if attach failed, diagnose infrastructure (storage, permissions, AV, auth). If attach succeeded, diagnose the profile/AppX/shell layer.
  4. Remediate → Verify → Document
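
To make the branching concrete, here is a minimal Python sketch of the four steps above. It is an illustration only, not code from the runbook: the `Incident` fields, the function names, and the "more than one affected user means pool-wide" rule are all assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    SINGLE_USER = "single user"
    POOL_WIDE = "pool-wide"


@dataclass
class Incident:
    # Hypothetical fields, chosen only for this illustration.
    affected_users: int
    container_attached: bool


def classify_scope(incident: Incident) -> Scope:
    # Step 1: scope. Assumption: more than one affected user is pool-wide.
    if incident.affected_users > 1:
        return Scope.POOL_WIDE
    return Scope.SINGLE_USER


def diagnostic_areas(incident: Incident) -> list:
    # Steps 2 and 3: branch on whether the container attached.
    if not incident.container_attached:
        return ["storage", "permissions", "AV", "auth"]
    return ["profile", "AppX", "shell"]


def triage(incident: Incident) -> dict:
    # Step 4 is the same on every branch: remediate, verify, document.
    return {
        "scope": classify_scope(incident).value,
        "diagnose": diagnostic_areas(incident),
        "then": ["remediate", "verify", "document"],
    }
```

Calling `triage(Incident(affected_users=200, container_attached=False))` sends the investigation to the infrastructure layer first, which matches the flowchart's attach-failed branch.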

Remediation Playbooks

Each playbook targets a specific root cause with exact commands:

  • Locked VHD: find and close SMB handles (on-prem Get-SmbOpenFile and Azure Files Close-AzStorageFileHandle), enable CleanupInvalidSessions, tune retry settings.
  • Permissions: Microsoft-recommended icacls ACLs, Azure RBAC assignments, AccessNetworkAsComputerObject warnings.
  • Azure AD Kerberos failures: klist validation, storage account auth config, Conditional Access interference diagnosis, Debug-AzStorageAccountAuth usage.
  • Container full: SizeInMBs expansion, space consumer analysis, compaction validation.
  • AV/EDR interference: full exclusion list (processes, drivers, paths, extensions), Defender PowerShell commands, Intune deployment guidance.
  • Start menu / AppX failures: crash event correlation, package re-registration, InstallAppxPackages validation, AppLocker checks.
  • Sign-out hangs: handle identification, process-specific remediation (SearchHost, OneDrive, Defender), CleanupInvalidSessions.
  • Cloud Cache failures: service diagnostics, local disk sizing, provider sync validation, split-brain recovery.
  • Profile corruption: decision tree (repair vs. recreate vs. restore), chkdsk, registry hive repair, backup patterns.
  • MSIX App Attach conflicts, Citrix coexistence, Windows Update regressions, profile migration between storage backends.

Automated Scripts

  • Case bundle collector: one PowerShell script that gathers FSLogix logs, event logs, registry exports, network diagnostics, Kerberos state, and profile state into a zip file for escalation.
  • Health-check script: deploy via Intune Proactive Remediations or a scheduled task to continuously validate FSLogix service status, AV exclusions, storage connectivity, and error rates.
  • Targeted triage queries: quick PowerShell one-liners to find shell crashes, profile service errors, and orphaned profiles.

KQL Queries for Azure Monitor

Pre-built queries for Log Analytics:

  • FSLogix attach failure counts by host
  • Profile load time trends (average, max, P95 by hour)
  • Temp profile creation events
  • Start menu / shell crash correlation

Plus a recommended alert configuration table (storage capacity, latency, IOPS throttling, FSLogix errors, temp profiles).
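
The runbook's own queries are written in KQL and are not reproduced here. To show what the load-time trend aggregation listed above computes, here is a rough Python sketch that buckets samples by hour and reports average, max, and P95. The input shape (ISO timestamp plus load time in seconds), the function names, and the nearest-rank percentile method are assumptions for this example; a Log Analytics query may calculate percentiles differently.

```python
import math
from collections import defaultdict
from datetime import datetime


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def load_time_trends(samples):
    """Group (ISO timestamp, seconds) samples by hour and summarize each hour."""
    by_hour = defaultdict(list)
    for timestamp, seconds in samples:
        # Truncate the timestamp to the start of its hour.
        hour = datetime.fromisoformat(timestamp).replace(
            minute=0, second=0, microsecond=0
        )
        by_hour[hour].append(seconds)
    return {
        hour: {
            "avg": sum(values) / len(values),
            "max": max(values),
            "p95": p95(values),
        }
        for hour, values in sorted(by_hour.items())
    }
```

Tracking P95 alongside the average matters because a handful of very slow logons can hide behind a healthy-looking mean.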

Operational Framework

Beyond troubleshooting, the runbook includes:

  • Severity classification matrix with SLA targets
  • Known good baseline: reference values for load times, container sizes, storage latency, and error rates
  • Capacity planning: IOPS estimation tables, Cloud Cache disk sizing formula, storage growth projections
  • Redirections.xml guidance: annotated safe exclusions and a “dangerous exclusions” table of paths you must never exclude
  • Deployment checklist: 16-point pre-go-live validation
  • DR procedures: Azure Files soft delete, snapshot recovery, ANF failover, recovery decision matrix
  • Escalation paths: when to escalate, to whom, with what artifacts
  • User communication templates: outage notification and self-service guidance
  • Playbook exercise log: track which playbooks have been tested and when

Who Should Use This

  • AVD administrators who want a structured troubleshooting process instead of ad-hoc fixes
  • Support teams (L1–L3) who need consistent, repeatable workflows across all skill levels
  • AVD architects designing new deployments who want a pre-built operational framework
  • MSPs and consultants who manage AVD for multiple clients and need a reusable template

The runbook is intentionally written as a template. Fork it, replace the placeholder values with your environment specifics (storage paths, group names, SLA targets), and customize the playbooks for your tooling.

How We Built It

This runbook was developed through a combination of real-world AVD engagements and AI-assisted engineering workflows. We used AI coding agents to accelerate the research, cross-reference Microsoft’s documentation, generate PowerShell scripts, and build KQL queries, then validated everything against production environments.

The result is a document that would have taken weeks to compile manually, delivered in days. It’s the kind of force multiplication that small teams need to punch above their weight in enterprise IT.

Get the Runbook

The full runbook is free and open-source on GitHub:

→ Big-Hat-Group-Inc/FSLogixRunbookTemplate

Fork it. Customize it. Use it as the foundation for your AVD support processes. If you find issues or want to contribute improvements, pull requests are welcome.

If your team needs help building out AVD operational processes, optimizing FSLogix performance, or designing resilient profile storage, get in touch. This is what we do.


Big Hat Group helps enterprises design, deploy, and operate Microsoft cloud infrastructure, including Azure Virtual Desktop, Windows 365, and the AI-powered tooling that makes IT teams more effective. Learn more about our services.