Cuda.Stream
CUDA streams are independent FIFO schedules for CUDA tasks, allowing them to potentially run in parallel. See: Stream Management.
Stores a stream pointer and manages lifetimes of kernel launch arguments. See CUstream.
val sexp_of_t : t -> Sexplib0.Sexp.t
val mem_alloc : t -> size_in_bytes:int -> Deviceptr.t
See cuMemAllocAsync.
The pointer is finalized using cuMemFreeAsync.
val mem_free : t -> Deviceptr.t -> unit
See cuMemFreeAsync.
val memcpy_H_to_D_unsafe :
dst:Deviceptr.t ->
src:unit Ctypes.ptr ->
size_in_bytes:int ->
t ->
unit
See cuMemcpyHtoDAsync.
val memcpy_H_to_D :
?host_offset:int ->
?length:int ->
dst:Deviceptr.t ->
src:('a, 'b, 'c) Stdlib.Bigarray.Genarray.t ->
t ->
unit
Copies the bigarray (or an interval of it) into device memory asynchronously. host_offset and length are given in numbers of elements, not bytes. See memcpy_H_to_D_unsafe.
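Because host_offset and length count elements, the byte interval of a copy depends on the bigarray's element kind. The following is a minimal sketch of that conversion using only the standard Bigarray module. The helper interval_in_bytes is hypothetical and not part of Cuda.Stream. It also assumes that an omitted length means "everything after host_offset", which the documentation above does not state.

```ocaml
(* Hypothetical helper, not part of Cuda.Stream: maps an interval given in
   elements to a (byte offset, byte count) pair for a bigarray. *)
let interval_in_bytes ?(host_offset = 0) ?length src =
  let elem_bytes = Bigarray.kind_size_in_bytes (Bigarray.Genarray.kind src) in
  (* Total number of elements across all dimensions. *)
  let total = Array.fold_left ( * ) 1 (Bigarray.Genarray.dims src) in
  (* Assumption: a missing length covers the rest of the array. *)
  let length = match length with Some l -> l | None -> total - host_offset in
  if host_offset < 0 || length < 0 || host_offset + length > total then
    invalid_arg "interval_in_bytes: interval out of bounds";
  (host_offset * elem_bytes, length * elem_bytes)

let () =
  (* A 4 x 8 array of 32-bit floats: 32 elements of 4 bytes each. *)
  let src =
    Bigarray.Genarray.create Bigarray.float32 Bigarray.c_layout [| 4; 8 |]
  in
  let offset_bytes, size_bytes =
    interval_in_bytes ~host_offset:8 ~length:16 src
  in
  Printf.printf "offset=%d bytes, size=%d bytes\n" offset_bytes size_bytes
```

Here the interval of 16 elements starting at element 8 corresponds to 64 bytes starting at byte 32.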
type kernel_param =
  | Tensor of Deviceptr.t
  | Int of int  (** Passed as C int. *)
  | Size_t of Unsigned.size_t
  | Single of float  (** Passed as C float. *)
  | Double of float  (** Passed as C double. *)
Parameters to pass to a kernel.
val sexp_of_kernel_param : kernel_param -> Sexplib0.Sexp.t
val no_stream : t
The NULL stream, which is the main synchronization stream of a device. Manages lifetimes of the corresponding kernel launch parameters.
val launch_kernel :
Module.func ->
grid_dim_x:int ->
?grid_dim_y:int ->
?grid_dim_z:int ->
block_dim_x:int ->
?block_dim_y:int ->
?block_dim_z:int ->
shared_mem_bytes:int ->
t ->
kernel_param list ->
unit
See cuLaunchKernel.
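launch_kernel leaves the choice of grid and block dimensions to the caller. For a one-dimensional kernel over n elements, a common CUDA convention is to fix block_dim_x and round the number of blocks up so that every element is covered. The sketch below shows only that arithmetic. The helper grid_dim_for is hypothetical and not part of Cuda.Stream, and the kernel itself is assumed to bounds-check its index, since the last block may extend past n.

```ocaml
(* Hypothetical helper, not part of Cuda.Stream: the smallest number of
   blocks of size block_dim that covers n elements (ceiling division). *)
let grid_dim_for ~n ~block_dim =
  if n < 0 || block_dim <= 0 then invalid_arg "grid_dim_for";
  (n + block_dim - 1) / block_dim

let () =
  let n = 1000 in
  let block_dim_x = 256 in
  let grid_dim_x = grid_dim_for ~n ~block_dim:block_dim_x in
  (* 4 blocks of 256 threads give 1024 threads for 1000 elements. *)
  Printf.printf "grid_dim_x=%d block_dim_x=%d\n" grid_dim_x block_dim_x
```

The resulting values would then be passed as the ~grid_dim_x and ~block_dim_x arguments of launch_kernel, with the element count handed to the kernel as an Int parameter so it can guard out-of-range threads.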
val memcpy_D_to_H_unsafe :
dst:unit Ctypes.ptr ->
src:Deviceptr.t ->
size_in_bytes:int ->
t ->
unit
See cuMemcpyDtoHAsync.
val memcpy_D_to_H :
?host_offset:int ->
?length:int ->
dst:('a, 'b, 'c) Stdlib.Bigarray.Genarray.t ->
src:Deviceptr.t ->
t ->
unit
Copies from device memory into the bigarray (or an interval of it) asynchronously. host_offset and length are given in numbers of elements, not bytes. See memcpy_D_to_H_unsafe and cuMemcpyDtoHAsync.
val memcpy_D_to_D :
?kind:('a, 'b) Stdlib.Bigarray.kind ->
?length:int ->
?size_in_bytes:int ->
dst:Deviceptr.t ->
src:Deviceptr.t ->
t ->
unit
Copies between two memory positions on the same device asynchronously. The size to copy can be given either in bytes via size_in_bytes, or in numbers of elements via kind and length. Provide either both kind and length, or just size_in_bytes. See cuMemcpyDtoDAsync.
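The "either both kind and length, or just size_in_bytes" rule can be pictured as a small resolution step that yields one byte count. The sketch below is an illustration using only the standard Bigarray module. The helper resolve_size_in_bytes is hypothetical, and rejecting every other argument combination is an assumption, not documented behavior of memcpy_D_to_D or memcpy_peer.

```ocaml
(* Hypothetical helper, not part of Cuda.Stream: turns the optional size
   arguments into a single byte count. *)
let resolve_size_in_bytes ?kind ?length ?size_in_bytes () =
  match (size_in_bytes, kind, length) with
  | Some size, None, None -> size
  | None, Some kind, Some length ->
      (* Element count times the byte width of the element kind. *)
      Bigarray.kind_size_in_bytes kind * length
  | _ ->
      (* Assumption: any other combination is treated as a caller error. *)
      invalid_arg
        "resolve_size_in_bytes: provide either kind and length, or size_in_bytes"

let () =
  let by_elements = resolve_size_in_bytes ~kind:Bigarray.float64 ~length:10 () in
  let by_bytes = resolve_size_in_bytes ~size_in_bytes:128 () in
  Printf.printf "by_elements=%d by_bytes=%d\n" by_elements by_bytes
```

Ten float64 elements resolve to 80 bytes, while an explicit size_in_bytes is passed through unchanged.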
val memcpy_peer :
?kind:('a, 'b) Stdlib.Bigarray.kind ->
?length:int ->
?size_in_bytes:int ->
dst:Deviceptr.t ->
dst_ctx:Context.t ->
src:Deviceptr.t ->
src_ctx:Context.t ->
t ->
unit
Copies between memory positions on two different devices asynchronously. The size to copy can be given either in bytes via size_in_bytes, or in numbers of elements via kind and length. Provide either both kind and length, or just size_in_bytes. See cuMemcpyPeerAsync.
See CUmemAttach_flags.
val sexp_of_attach_mem : attach_mem -> Sexplib0.Sexp.t
val attach_mem : t -> Deviceptr.t -> int -> attach_mem -> unit
val create : ?non_blocking:bool -> ?lower_priority:int -> unit -> t
Lower lower_priority numbers represent higher priorities; the default is 0. See cuStreamCreateWithPriority.
The stream value is finalized using cuStreamDestroy. This is meant to be safe without needing to set the proper context.
See cuStreamGetCtx.
val get_id : t -> Unsigned.uint64
See cuStreamGetId.
val is_ready : t -> bool
Returns false when the query status is CUDA_ERROR_NOT_READY, and true when it is CUDA_SUCCESS. See cuStreamQuery.
val synchronize : t -> unit
Waits until all of the stream's tasks are completed. See cuStreamSynchronize.
val memset_d8 : Deviceptr.t -> Unsigned.uchar -> length:int -> t -> unit
See cuMemsetD8Async.
val memset_d16 : Deviceptr.t -> Unsigned.ushort -> length:int -> t -> unit
length is in number of elements. See cuMemsetD16Async.
val memset_d32 : Deviceptr.t -> Unsigned.uint32 -> length:int -> t -> unit
length is in number of elements. See cuMemsetD32Async.
val total_unreleased_unfinished_delimited_events : t -> int * int * int
Debug information about delimited events carried by the stream: total, unreleased (i.e. not destroyed), unfinished.