Best Practices & Real-World Recipes
This chapter provides practical solutions for common Grafana operational headaches. The goal is not just to show a command, but to show when to use it, what success looks like, and what to check when the workflow goes sideways.
Who It Is For
- Operators who want proven workflow patterns instead of starting from a blank shell.
- Teams standardizing migration, recovery, and review playbooks.
- People who need success criteria and failure checks beside each example.
Primary Goals
- Turn common operational problems into reusable playbooks.
- Show what “good output” looks like before you continue.
- Call out when a lane or workflow is the wrong fit for the job.
Before / After
- Before: operator examples looked like an unstructured command dump.
- After: each recipe shows the problem, the safe path, the expected result, and the failure points to check first.
What success looks like
- You can copy a recipe and understand which step is safe to try first.
- You know what result would prove the workflow worked.
- You know which failure should stop the flow before live mutation.
Failure checks
- If the recipe assumes the wrong source lane, stop and adjust before continuing.
- If the output or preview differs from the intended result, verify the staging inputs before you mutate live state.
- If the example feels too thin for your task, switch to the command reference for exact flags.
Recipe 1: Promoting Dashboards (Dev -> Prod)
Problem: Exporting from Dev and importing to Prod often fails due to hardcoded organization IDs, folder context, or source-environment datasource UIDs.
Before: Promotion means manually cleaning export files, guessing which lane to use, and discovering environment-specific metadata problems during import.
After: Promotion becomes a lane choice plus one replay path that is easier to review and explain.
Solution: Use the prompt/ lane for a clean promotion handoff.
- Export from Dev:
grafana-util dashboard export --output-dir ./dev-assets
- Locate Clean Source: Use files in ./dev-assets/prompt/. These have environment-specific metadata stripped.
- Import to Prod:
grafana-util dashboard import --input-dir ./dev-assets/prompt --url https://prod-grafana --replace-existing
Use this when: the source and target environments share dashboard intent, but you do not want to replay every source-specific field literally.
Do not use this when: you are performing raw backup/replay or disaster recovery. In that case, start from raw/, not prompt/.
Success looks like:
- the import lands without source-environment-only metadata causing conflicts
- target dashboards bind to the intended target datasources and folders
- the resulting live dashboards need minimal cleanup after import
If it fails, check:
- whether the target datasource UIDs or names actually exist
- whether the chosen lane should have been raw/ instead of prompt/
- whether your credential scope can see the target org or folder
Recipe 2: Auditing Dependencies Before Import
Problem: Importing a dashboard without its required datasource results in broken panels and misleading "successful" imports.
Before: Import appears to succeed, but the first real signal is a broken dashboard after replay.
After: Missing datasource dependencies are visible before import, while there is still time to fix mapping or target inventory.
Solution: Run a pre-import inspection.
# Generate a report of all required datasources in your export tree
grafana-util dashboard analyze --input-dir ./backups/raw --input-format raw --output-format dependency
What to check: Ensure every datasource listed in the dependency report exists in your target Grafana's datasource list.
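One way to make this comparison mechanical is a small set difference over two plain-text lists. The sketch below assumes you have already written the UIDs from the dependency report to one file and the UIDs from the target Grafana to another, one UID per line. Both files and the missing_datasources helper are hypothetical and not part of grafana-util.

```shell
# missing_datasources REQUIRED_FILE TARGET_FILE
# Prints every UID listed in REQUIRED_FILE that is absent from TARGET_FILE.
# Both inputs are assumed to hold one datasource UID per line.
missing_datasources() {
    required_sorted=$(mktemp) || return 2
    target_sorted=$(mktemp) || return 2
    # comm needs sorted input; -u also removes duplicate UIDs
    sort -u "$1" > "$required_sorted"
    sort -u "$2" > "$target_sorted"
    # -23 suppresses lines unique to the target and lines common to both,
    # leaving only the UIDs that the target is missing
    comm -23 "$required_sorted" "$target_sorted"
    rm -f "$required_sorted" "$target_sorted"
}
```

An empty result means every required datasource exists in the target. Any UID that is printed is a dependency to fix or remap before import.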
Use this when: you are preparing an import, validating a promotion bundle, or checking whether a dashboard export is portable enough for another environment.
Success looks like:
- every required datasource UID is present in the target
- missing dependencies are known before import time
- you can explain which dashboards are blocked and why
If it fails, check:
- whether the target environment uses different datasource naming or UID conventions
- whether you exported the correct lane
- whether the target credentials can list the datasources you expect
Recipe 3: Mass Tagging/Renaming (Surgical Patching)
Problem: You need to add a tag such as ManagedBySRE to many dashboards at once without hand-editing every file.
Before: A simple bulk change turns into manual JSON edits or a risky script with no preview.
After: The patch stays mechanical, reviewable, and previewed before live replay.
Solution: Use patch-file in a loop, then preview the result before replaying it.
# Add the tag to every raw dashboard file in place; stop looping at the first failure.
for file in ./dashboards/raw/*.json; do
  [ -e "$file" ] || continue  # skip the literal pattern when no files match
  grafana-util dashboard patch-file --input "$file" --tag "ManagedBySRE" --output "$file" || break
done
# Preview the import before any live replay.
grafana-util dashboard import --input-dir ./dashboards/raw --replace-existing --dry-run --table
Use this when: the structural change is local and mechanical, and you want to keep the update reviewable.
Do not use this when: the patch logic is so complex that a loop hides too much risk, or when the right answer depends on live discovery rather than local artifacts.
Success looks like:
- the modified files still review cleanly in Git
- repeated patching does not create unexpected drift
- the follow-on import is still previewed with --dry-run before live execution
If it fails, check:
- whether your loop is patching the right lane and file set
- whether the patch should have targeted prompt/ rather than raw/
- whether the import should be previewed first with --dry-run
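To check for drift from repeated patching, one option is to keep an untouched copy of the export and list which files a patch pass actually changed. This is a minimal sketch that assumes two directories holding the same file names, one copied before patching and one after. The changed_files helper and both directory arguments are hypothetical.

```shell
# changed_files ORIGINAL_DIR PATCHED_DIR
# Prints the name of each .json file in ORIGINAL_DIR whose counterpart in
# PATCHED_DIR differs byte for byte, or is missing.
changed_files() {
    for original in "$1"/*.json; do
        [ -e "$original" ] || continue   # skip the literal pattern when nothing matches
        name=$(basename "$original")
        # cmp -s is silent and exits non-zero on any difference or a missing file
        cmp -s "$original" "$2/$name" || printf '%s\n' "$name"
    done
}
```

After a first patch pass, the list should match the files you meant to touch. If you snapshot the patched files and run the same patch again, an empty list suggests the patch is idempotent and will not create drift on reruns.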
Recipe 4: Verifying Alert Routing Logic
Problem: Complex notification policies make it hard to know where an alert will land.
Solution: Use preview-route to simulate matches.
# Simulate which receiver an alert with these labels would reach.
grafana-util alert preview-route \
--desired-dir ./alerts/desired \
--label service=order \
--severity critical
Goal: Verify that the receiver in the output matches your intended Slack channel or PagerDuty service.
Use this when: labels or notification policies are changing and you want a deterministic answer before anyone assumes the route is correct.
Success looks like:
- the resolved receiver matches the intended destination
- labels that should distinguish critical paths actually do
- route previews are reviewed before a plan/apply step
If it fails, check:
- whether the labels in the preview match the labels your rules will actually emit
- whether the desired alert files and notification policies are in sync
- whether the issue is route logic versus rule classification
Expert Tips
- UID consistency: Always define stable uids in your JSON. Do not rely on incremental ids.
- Dry-run everything: Use --dry-run to see the ACTION=update vs ACTION=create preview before making live changes.
- Git integration: Only commit the raw/ and desired/ directories to Git. These are your canonical sources.
- Credential reality check: Before blaming the recipe, verify that the chosen credential can really see the org, folder, or admin surface you are operating on.
- Role split: Use the handbook for workflow choice and the command reference when you need the exact flags for one step.
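If you want a quick tally from a dry-run preview, something like the following can help. It assumes you saved the preview to a file and that each planned change appears on a line containing ACTION=<name>, as the dry-run tip describes. The exact layout, especially with --table, may differ, so treat count_actions as a hypothetical helper to adapt.

```shell
# count_actions PREVIEW_FILE ACTION_NAME
# Prints how many lines of a saved dry-run preview contain ACTION=<ACTION_NAME>.
# Assumption: the preview marks each planned change with an ACTION=... token.
count_actions() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps the helper usable in scripts that run under "set -e"
    grep -c -- "ACTION=$2" "$1" || true
}
```

Comparing the create and update counts with what you expect is a cheap sanity check: an unexpected number of creates during a run that was meant to replace existing dashboards is worth investigating before the live import.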