Includes some useful examples below.
Enabling Dynamic Control Flow in CUDA Graphs with Device Graph Launch
By Sally Stevenson, Stephen Jones and Fred Oh
CUDA Graphs significantly reduce the overhead of launching a large batch of user operations by defining them as a task graph, which may be launched in a single operation. Knowing the workflow upfront enables the CUDA driver to apply various optimizations, which cannot be performed when launching through a stream model.
However, this performance comes at the cost of flexibility: if the full workflow is not known in advance, then GPU execution must be interrupted to return to the CPU to make a decision.
CUDA device graph launch solves this problem by enabling a task graph to be performantly launched from a running GPU kernel, based on data that is determined at run time. CUDA device graph launch offers two distinct launch modes—fire and forget, and tail launch—to enable a wide range of applications and use.
This post demonstrates how to use device graph launch and the two launch modes. It features the example of a device-side work scheduler, which decompresses files for data processing. ... '
No comments:
Post a Comment