Add skill and evals for dynamic mode usage #6271

rostan-t wants to merge 5 commits into NVIDIA:main
Conversation
.claude/skills/using-dali-dynamic-mode-workspace/evals/files/pipeline_to_convert.py
Dismissed
Greptile Summary: This PR adds a Claude Code skill (`using-dali-dynamic-mode`) with evals for DALI dynamic mode usage. Key points from the review:
Confidence Score: 4/5
Important Files Changed
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["import nvidia.dali.experimental.dynamic as ndd"] --> B["ndd.set_num_threads(N)"]
    B --> C["reader = ndd.readers.File(...)\n(created once, stateful)"]
    C --> D["Epoch loop\nfor epoch in range(num_epochs)"]
    D --> E["reader.next_epoch(batch_size=N)\nyields tuples of Batch"]
    E --> F["Batch loop\nfor jpegs, labels in ..."]
    F --> G["ndd.decoders.image(jpegs, device='gpu')\nNOT device='mixed'"]
    G --> H["ndd operators\nndd.resize, ndd.crop_mirror_normalize, etc."]
    H --> I{"Variable-length\nbatch?"}
    I -- "Yes (audio/bbox)" --> J["batch.torch(pad=True)"]
    I -- "No (uniform shapes)" --> K["batch.torch()"]
    J --> L["train_step(...)"]
    K --> L
    L --> F
    F --> D
    subgraph Debugging
        M["with ndd.EvalMode.sync_full:"]
        M --> N["errors surface at exact call site"]
    end
    subgraph RNG
        O["ndd.random.set_seed(42)\nor ndd.random.RNG(seed=42)"]
        O --> P["ndd.random.uniform(batch_size=N, ...)"]
    end
```
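The flow above maps onto a short training-loop skeleton. The following is an illustrative sketch assembled from the snippets in this skill, not a verified program: it needs a DALI install to run, and `data_dir`, `num_epochs`, and `train_step` are placeholder names.

```python
import nvidia.dali.experimental.dynamic as ndd

ndd.set_num_threads(4)          # once, at startup
ndd.random.set_seed(42)         # reproducible random ops
reader = ndd.readers.File(file_root=data_dir)  # created once, stateful

for epoch in range(num_epochs):
    for jpegs, labels in reader.next_epoch(batch_size=64):
        images = ndd.decoders.image(jpegs, device="gpu")  # NOT device="mixed"
        images = ndd.resize(images, resize_x=224, resize_y=224)
        train_step(images.torch(), labels.torch())
```

When debugging, wrapping the body in `with ndd.EvalMode.sync_full:` makes errors surface at the exact call site, as the flowchart notes.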
Last reviewed commit: "Clarify PyTorch conv..."
```python
t.torch(copy=False)  # zero-copy PyTorch tensor (default)
t[1:3]               # slicing supported
np.asarray(t)        # NumPy via __array__ (CPU only)
```

Supports `__dlpack__`, `__cuda_array_interface__`, `__array__`, arithmetic operators.
### Batch -- collection of samples (variable shapes OK)

```python
b = ndd.batch([arr1, arr2])  # copy
b = ndd.as_batch(data)       # wrap, no copy if possible
```
**Batch has no `__getitem__`** -- `batch[i]` raises `TypeError` because indexing is ambiguous (sample selection vs. per-sample slicing). Use the explicit APIs instead:

| Intent | Method | Returns |
|--------|--------|---------|
| Get sample i | `batch.select(i)` | `Tensor` |
| Get subset of samples | `batch.select(slice_or_list)` | `Batch` |
| Slice within each sample | `batch.slice[...]` | `Batch` (same batch_size) |

`.select()` picks **which samples**. `.slice` indexes **inside each sample**.
```python
xy = ndd.random.uniform(batch_size=16, range=[0, 1], shape=2)
crop_x = xy.slice[0]     # Batch of 16 scalars, first element from each sample
crop_y = xy.slice[1]     # Batch of 16 scalars, second element from each sample
sample_0 = xy.select(0)  # Tensor, the entire first sample [x, y]
```
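The select/slice distinction can be illustrated without DALI at all. The sketch below is a plain-Python analogy (the helper names `select` and `slice_each` are made up here, not the DALI API): a batch is a list of samples, and each sample is a list of elements.

```python
# Plain-Python analogy (not the DALI API) for select vs. slice semantics.

def select(batch, i):
    """Pick whole sample(s): int -> one sample, list of ints -> sub-batch."""
    if isinstance(i, int):
        return batch[i]           # a single sample
    return [batch[j] for j in i]  # a smaller batch

def slice_each(batch, i):
    """Index inside every sample; batch size is unchanged."""
    return [sample[i] for sample in batch]

xy = [[0.1, 0.9], [0.4, 0.2], [0.7, 0.5]]  # batch of 3 samples, shape (2,) each
crop_x = slice_each(xy, 0)  # first element from each sample, still 3 "samples"
sample_0 = select(xy, 0)    # one whole sample
```

Here `crop_x` keeps the batch size while `sample_0` is a single sample, mirroring `xy.slice[0]` vs. `xy.select(0)` above.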
**PyTorch conversion:**
- `batch.torch()` -- works for uniform shapes; raises for ragged batches
- `batch.torch(pad=True)` -- zero-pads ragged batches to max shape (use for variable-length audio, detection boxes, etc.)
- `batch.torch(copy=None)` is the default (avoids copy if possible)
- Batch has **no `__dlpack__`** -- use `ndd.as_tensor(batch)` first for DLPack consumers. `ndd.as_tensor` supports `pad` as well.
- `Tensor.torch(copy=False)` is the default (no copy)
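What `pad=True` does can be sketched in plain Python. The helper below (`pad_to_max` is a hypothetical name, not the DALI implementation) zero-pads every 1-D sample to the longest length in the batch so the result can be stacked into one dense tensor:

```python
# Conceptual sketch (not the DALI implementation) of pad=True:
# zero-pad each 1-D sample to the longest length in the batch.

def pad_to_max(batch, fill=0):
    max_len = max(len(s) for s in batch)
    return [list(s) + [fill] * (max_len - len(s)) for s in batch]

ragged = [[1, 2, 3], [4, 5], [6]]
dense = pad_to_max(ragged)  # every row now has length 3, stackable
```

Note that this conversion necessarily copies, which is why `Batch.torch()` cannot promise zero-copy for ragged data.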
**Inconsistent `copy` parameter defaults for `Tensor.torch()` vs `Batch.torch()`**

The skill documents two different defaults for the same `copy` parameter, which could lead AI agents to generate subtly incorrect code:

- Line 23: `t.torch(copy=False)  # zero-copy PyTorch tensor (default)` -- implies `copy=False` is the Tensor default
- Line 57: `batch.torch(copy=None)` is the default (avoids copy if possible) -- implies `copy=None` is the Batch default
- Line 59: `Tensor.torch(copy=False)` is default (no copy) -- restates the Tensor claim

In standard Python/NumPy/PyTorch semantics, `copy=False` typically means "never copy, raise an error if a copy is required", while `copy=None` means "copy only if necessary". If `Tensor.torch()` genuinely defaults to `copy=False`, that's a stricter contract than `Batch.torch()`'s `copy=None` -- an agent following this skill might generate Tensor conversion code that raises unexpectedly when a copy is unavoidable, while Batch code silently succeeds.
If both actually default to the same behaviour, one of these needs correcting. Please verify the actual DALI API defaults and align the documentation so agents receive consistent guidance.
@greptileai This is the actual DALI behavior documented in both `_tensor.py` and `_batch.py`. `Batch.torch()` accepts `None` for the `copy` parameter because the copy cannot be avoided if the batch is not stored as a dense tensor. This doesn't make sense for `Tensor.torch()` because it's always possible to avoid copying (it takes no other arguments like `dtype` that could force a copy).

The skill is correct there and helps agents catch this non-trivial difference.
Got it -- thanks for the clarification! You're right that this is a meaningful API difference, not an inconsistency:

- `Tensor.torch(copy=False)` as the default makes sense because a single dense tensor can always be zero-copied
- `Batch.torch(copy=None)` as the default is necessary because ragged batches physically require copying to convert to a dense PyTorch tensor

The skill correctly documents this distinction, and agents need to know that `Batch.torch()` may copy even when not explicitly requested (for ragged data), while `Tensor.torch()` guarantees zero-copy by default.

The documentation as written will help agents avoid incorrect assumptions about copy behavior when working with ragged batches. No changes needed here.
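The two `copy` contracts discussed above can be made concrete with a small plain-Python illustration (the `convert` function is hypothetical, not DALI code; `needs_copy` stands in for "the batch is ragged"):

```python
# Hypothetical illustration (not DALI code) of copy=False vs. copy=None:
# copy=False never copies and raises if a copy would be required;
# copy=None copies only when necessary.

def convert(data, needs_copy, copy):
    if copy is False and needs_copy:
        raise ValueError("zero-copy requested but a copy is required")
    if copy is True or (copy is None and needs_copy):
        return list(data)  # "copy"
    return data            # zero-copy: same object

x = [1, 2, 3]
zero = convert(x, needs_copy=False, copy=False)  # Tensor-like: always zero-copy
maybe = convert(x, needs_copy=True, copy=None)   # Batch-like: copies when ragged
```

The strict `copy=False` default is safe for `Tensor.torch()` precisely because its `needs_copy` case never arises.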
| Wrong | Right | Why |
|-------|-------|-----|
| `.cpu()` before GPU model | `.torch()` directly | Avoids wasteful D2H + H2D round-trip |
| Recreate reader each epoch | `reader.next_epoch()` | Readers are stateful -- create once, reuse |
| `ndd.readers.file(...)` | `ndd.readers.File(...)` | Reader classes are PascalCase |
| `break` from `next_epoch()` loop | Exhaust iterator or create new reader | Iterator must be fully consumed before next `next_epoch()` |
**Misleading "Exhaust iterator" advice after a `break`**

The "Right" column for breaking out of `next_epoch()` says "Exhaust iterator or create new reader". However, after a `break` the iterator is already abandoned -- you cannot exhaust it without explicitly draining it in a separate loop, which is non-obvious.

An AI agent reading this might attempt to do something like `list(iterator)` after breaking, which is wasteful. The more actionable guidance is to either avoid breaking in the first place (restructure the loop condition) or, if you must exit early, create a new reader. Consider updating the wording:

```diff
- | `break` from `next_epoch()` loop | Exhaust iterator or create new reader | Iterator must be fully consumed before next `next_epoch()` |
+ | `break` from `next_epoch()` loop | Avoid breaking (use a flag/condition instead), or create a new reader | Iterator must be fully consumed before next `next_epoch()` |
```
@greptileai We actually do need to fully consume the iterator before going to the next epoch. Breaking can be legitimate in some cases, and the two approaches that work are clearly documented already. Your suggestion ("use a flag/condition instead") wouldn't help.
You're right, I apologize for the confusion. The documentation clearly states the two valid approaches when you need to exit early:

- Manually drain the remaining iterator items before calling `next_epoch()` again
- Create a new reader instance

Both are legitimate solutions for cases where breaking is necessary (early stopping, errors, etc.). The table is clear as written.
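The drain-before-next-epoch contract can be sketched without DALI. The `EpochReader` class below is a hypothetical stand-in (not the DALI reader) that enforces the same rule: the previous epoch's iterator must be fully consumed before the next one starts.

```python
# Hypothetical stand-in (not the DALI reader) for a stateful epoch reader
# whose next_epoch() refuses to run while a previous epoch is unfinished.

class EpochReader:
    def __init__(self, data):
        self.data = data
        self._iter = None  # non-None while an epoch is in progress

    def next_epoch(self):
        if self._iter is not None:
            raise RuntimeError("previous epoch not fully consumed")
        def gen():
            for item in self.data:
                yield item
            self._iter = None  # epoch complete
        self._iter = gen()
        return self._iter

reader = EpochReader([1, 2, 3])
it = reader.next_epoch()
next(it)          # consume one item, then decide to stop early ("break")
for _ in it:      # drain the rest before starting a new epoch
    pass
it2 = reader.next_epoch()  # now succeeds
```

Skipping the drain loop and calling `next_epoch()` directly after the early exit would raise, which is the failure mode the table warns about.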
Signed-off-by: Rostan Tabet <[email protected]>
Force-pushed from e525b96 to eebd995
.claude/skills/using-dali-dynamic-mode-workspace/evals/files/pipeline_to_convert.py
## Pipeline Mode Migration

| Pipeline Mode | Dynamic Mode |
|--------------|--------------|
| `@pipeline_def` / `pipe.build()` / `pipe.run()` | Direct function calls in a loop |
| `fn.readers.file(...)` | `ndd.readers.File(...)` (PascalCase, stateful) |
| `fn.decoders.image(jpegs, device="mixed")` | `ndd.decoders.image(jpegs, device="gpu")` |
| `fn.op_name(...)` | `ndd.op_name(...)` |
| Pipeline-level `batch_size=64` | `reader.next_epoch(batch_size=64)` + random ops `batch_size=64` |
| Pipeline-level `seed=42` | `ndd.random.set_seed(42)` or `ndd.random.RNG(seed=42)` |
| Pipeline-level `num_threads=4` | `ndd.set_num_threads(4)` at startup |
| `output.at(i)` | `batch.select(i)` |
| `output.as_cpu()` | `batch.cpu()` |
| `pipe.run()` returns tuple of `TensorList` | `reader.next_epoch(batch_size=N)` yields tuples of `Batch` |
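Applying the table's substitutions to a minimal loader gives a sketch like the following. This is illustrative only (it is not runnable without DALI, and `image_dir`/`train_step` are placeholder names); each comment points back to the row it replaces:

```python
import nvidia.dali.experimental.dynamic as ndd

ndd.set_num_threads(4)   # was: pipeline-level num_threads=4
ndd.random.set_seed(42)  # was: pipeline-level seed=42

# was: fn.readers.file(...) inside @pipeline_def
reader = ndd.readers.File(file_root=image_dir)

# was: pipe.build() + pipe.run() with pipeline-level batch_size=64
for jpegs, labels in reader.next_epoch(batch_size=64):
    images = ndd.decoders.image(jpegs, device="gpu")  # was: device="mixed"
    train_step(images.torch(), labels.torch())
```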
**`device_id` migration not documented**

The Pipeline Mode Migration table is missing a row for the `device_id` parameter. The `pipeline_to_convert.py` used in eval 3 passes `device_id=0` to the pipeline constructor:

```python
pipe = training_pipeline(
    image_dir="/data/images",
    batch_size=64,
    num_threads=4,
    device_id=0,  # <- no equivalent shown in migration table
    seed=42,
)
```

An agent converting this code using the migration table will encounter `device_id=0` with no guidance. In multi-GPU scenarios it might silently drop the parameter or invent an incorrect API like `ndd.set_device_id(0)`. Consider adding a row clarifying the dynamic mode equivalent (e.g., `CUDA_VISIBLE_DEVICES`, `torch.cuda.set_device`, or that dynamic mode picks up the default device automatically).
```yaml
---
name: using-dali-dynamic-mode
description: "Use when writing DALI data loading or preprocessing code with `nvidia.dali.experimental.dynamic` (ndd), or when converting DALI pipeline-mode code to dynamic mode, or when the user asks about DALI dynamic mode, imperative DALI, or ndd. Use this skill any time someone mentions 'ndd', 'dynamic mode', 'DALI preprocessing', or wants to load/augment data with DALI outside of a pipeline definition."
---
```
**Skill description over-triggers on "DALI preprocessing"**

The description includes the trigger phrase 'DALI preprocessing', which is a broad term that applies equally to pipeline-mode workflows. A user working on an existing pipeline-mode codebase who asks "how do I add DALI preprocessing to my model?" would have this skill injected, potentially nudging them toward dynamic mode when they never asked for it.

Consider narrowing the trigger to something more specific, e.g., 'DALI dynamic preprocessing' or 'DALI imperative API', so the skill only activates for genuinely dynamic-mode contexts:

```diff
- description: "Use when writing DALI data loading or preprocessing code with `nvidia.dali.experimental.dynamic` (ndd), or when converting DALI pipeline-mode code to dynamic mode, or when the user asks about DALI dynamic mode, imperative DALI, or ndd. Use this skill any time someone mentions 'ndd', 'dynamic mode', 'DALI preprocessing', or wants to load/augment data with DALI outside of a pipeline definition."
+ description: "Use when writing DALI data loading or preprocessing code with `nvidia.dali.experimental.dynamic` (ndd), or when converting DALI pipeline-mode code to dynamic mode, or when the user asks about DALI dynamic mode, imperative DALI, or ndd. Use this skill any time someone mentions 'ndd', 'dynamic mode', 'DALI dynamic preprocessing', or wants to load/augment data with DALI outside of a pipeline definition."
```
```json
{"name": "no-pipeline-mode", "text": "No pipeline-mode constructs (no @pipeline_def, pipe.build(), pipe.run()) and operators called directly on ndd (e.g. ndd.resize, not fn.resize or ndd.fn.resize)"},
{"name": "torch-handoff", "text": "Uses .torch() for PyTorch conversion"},
{"name": "no-unnecessary-evaluate", "text": "No unnecessary .evaluate() calls"},
{"name": "set-num-threads", "text": "Calls ndd.set_num_threads() at startup"}
```
```python
    mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
    std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
)
train_step(images.torch(), labels.torch())
```
Is there a way to test this code against going stale?
Category: Other (e.g. Documentation, Tests, Configuration)

Description:
Since dynamic mode is fairly new, AI agents are not very good at writing code using it. For instance, according to Anthropic, Claude Sonnet 4.6's knowledge cutoff is August 2026. Even when presented with a few examples, agents miss some dynamic-mode-specific patterns and are not very helpful for writing code using it.

This PR adds a Claude Code skill containing guidelines on how to use dynamic mode. It was generated with the `/skill-creator`, which generates evals for the skill. Here are the results of running the eval with Claude Code using Sonnet 4.6:

Additional information:
Affected modules and functionalities:
Key points relevant for the review:
Tests:
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: N/A