Cudajit.Stream
CUDA streams are independent FIFO schedules for CUDA tasks, allowing them to potentially run in parallel. See: Stream Management.
Stores a stream pointer and manages lifetimes of kernel launch arguments. See CUstream.
val sexp_of_t : t -> Sexplib0.Sexp.t
val mem_alloc : t -> size_in_bytes:int -> Deviceptr.t
See cuMemAllocAsync.
The pointer is finalized using cuMemFreeAsync.
val mem_free : t -> Deviceptr.t -> unit
See cuMemFreeAsync.
val memcpy_H_to_D_unsafe :
dst:Deviceptr.t ->
src:unit Ctypes.ptr ->
size_in_bytes:int ->
t ->
unit
See cuMemcpyHtoDAsync.
val memcpy_H_to_D :
?host_offset:int ->
?length:int ->
dst:Deviceptr.t ->
src:('a, 'b, 'c) Stdlib.Bigarray.Genarray.t ->
t ->
unit
Copies the bigarray (or an interval of it) into device memory asynchronously. host_offset and length are in numbers of elements. See memcpy_H_to_D_unsafe.
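Because host_offset and length count elements rather than bytes, the byte range that gets transferred depends on the element kind of the bigarray. The sketch below uses only the standard Bigarray module to illustrate that conversion. byte_range is a hypothetical helper, not part of Cudajit, and its bounds check is an assumption about sensible behavior rather than a description of what the library does.

```ocaml
(* Hypothetical helper, not part of Cudajit: converts an element interval
   of a bigarray into a byte offset and a byte count. *)
let byte_range ~host_offset ~length arr =
  let elem_bytes = Bigarray.kind_size_in_bytes (Bigarray.Genarray.kind arr) in
  let total_elems = Array.fold_left ( * ) 1 (Bigarray.Genarray.dims arr) in
  (* Assumed check: the interval must lie inside the bigarray. *)
  if host_offset < 0 || length < 0 || host_offset + length > total_elems then
    invalid_arg "byte_range: interval outside of the bigarray";
  (host_offset * elem_bytes, length * elem_bytes)

let () =
  (* 4 x 8 single-precision elements, 4 bytes each. *)
  let arr =
    Bigarray.Genarray.create Bigarray.float32 Bigarray.c_layout [| 4; 8 |] in
  let offset_bytes, size_bytes = byte_range ~host_offset:8 ~length:16 arr in
  Printf.printf "offset = %d bytes, size = %d bytes\n" offset_bytes size_bytes
```

With a float32 bigarray, skipping 8 elements and copying 16 corresponds to a 32-byte offset and a 64-byte transfer.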
type kernel_param =
| Tensor of Deviceptr.t
| Int of int (* Passed as C int. *)
| Size_t of Unsigned.size_t
| Single of float (* Passed as C float. *)
| Double of float (* Passed as C double. *)
Parameters to pass to a kernel.
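Single and Double both carry an OCaml float, which is a 64-bit double, so a Single parameter has to be narrowed to 32 bits before it reaches the kernel. The sketch below uses only the standard library to show what that narrowing can do to a value. round_to_single is a hypothetical helper, and it assumes the narrowing follows the usual round-to-nearest C conversion.

```ocaml
(* Hypothetical helper, not part of Cudajit: rounds a 64-bit OCaml float to
   the nearest 32-bit float and widens it back, which is the value a C float
   can represent. *)
let round_to_single (x : float) : float =
  Int32.float_of_bits (Int32.bits_of_float x)

let () =
  (* 0.5 is exactly representable in single precision; 0.1 is not. *)
  Printf.printf "0.5 survives: %b\n" (round_to_single 0.5 = 0.5);
  Printf.printf "0.1 survives: %b\n" (round_to_single 0.1 = 0.1)
```

If a kernel declares a double argument, passing Double avoids this loss; for float arguments, the host value is only accurate to single precision.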
val sexp_of_kernel_param : kernel_param -> Sexplib0.Sexp.t
val no_stream : t
The NULL stream, which is the main synchronization stream of a device. It manages the lifetimes of the corresponding kernel launch parameters.
val launch_kernel :
Module.func ->
grid_dim_x:int ->
?grid_dim_y:int ->
?grid_dim_z:int ->
block_dim_x:int ->
?block_dim_y:int ->
?block_dim_z:int ->
shared_mem_bytes:int ->
t ->
kernel_param list ->
unit
See cuLaunchKernel.
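In cuLaunchKernel, the grid dimensions are counted in blocks and the block dimensions in threads. For a one-dimensional kernel over n elements, a common convention is to round the block count up so that every element is covered. The sketch below shows that arithmetic; blocks_for is a hypothetical helper, not part of Cudajit.

```ocaml
(* Hypothetical helper, not part of Cudajit: number of blocks needed so that
   grid_dim_x * block_dim_x >= n (ceiling division). *)
let blocks_for ~n ~block_dim_x =
  if n < 0 || block_dim_x <= 0 then
    invalid_arg "blocks_for: n must be >= 0 and block_dim_x must be > 0";
  (n + block_dim_x - 1) / block_dim_x

let () =
  let n = 1000 and block_dim_x = 256 in
  let grid_dim_x = blocks_for ~n ~block_dim_x in
  Printf.printf "grid_dim_x = %d for n = %d, block_dim_x = %d\n"
    grid_dim_x n block_dim_x
```

The resulting values would be passed as ~grid_dim_x and ~block_dim_x. Since the last block may extend past n, the kernel itself is expected to check its index against n.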
val memcpy_D_to_H_unsafe :
dst:unit Ctypes.ptr ->
src:Deviceptr.t ->
size_in_bytes:int ->
t ->
unit
See cuMemcpyDtoHAsync.
val memcpy_D_to_H :
?host_offset:int ->
?length:int ->
dst:('a, 'b, 'c) Stdlib.Bigarray.Genarray.t ->
src:Deviceptr.t ->
t ->
unit
Copies from device memory into the bigarray (or an interval of it) asynchronously. host_offset and length are in numbers of elements. See memcpy_D_to_H_unsafe and cuMemcpyDtoHAsync.
val memcpy_D_to_D :
?kind:('a, 'b) Stdlib.Bigarray.kind ->
?length:int ->
?size_in_bytes:int ->
dst:Deviceptr.t ->
src:Deviceptr.t ->
t ->
unit
Copies between two memory positions on the same device asynchronously. The size to copy can optionally be provided in numbers of elements via kind and length. Provide either both kind and length, or just size_in_bytes. See cuMemcpyDtoDAsync.
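The "either both kind and length, or just size_in_bytes" rule can be sketched as a small resolution function over the three optional arguments. resolve_size_in_bytes is a hypothetical helper written for illustration; how Cudajit actually reports an invalid combination is not stated here, so the Invalid_argument below is an assumption.

```ocaml
(* Hypothetical helper, not part of Cudajit: turns the three optional size
   arguments into a single byte count. *)
let resolve_size_in_bytes ?kind ?length ?size_in_bytes () =
  match size_in_bytes, kind, length with
  | Some bytes, None, None -> bytes
  | None, Some kind, Some length ->
      (* Element count times the byte width of one element. *)
      length * Bigarray.kind_size_in_bytes kind
  | _ ->
      (* Assumed behavior for any other combination. *)
      invalid_arg
        "resolve_size_in_bytes: provide either both kind and length, \
         or just size_in_bytes"

let () =
  let by_elems =
    resolve_size_in_bytes ~kind:Bigarray.float32 ~length:256 () in
  let by_bytes = resolve_size_in_bytes ~size_in_bytes:1024 () in
  Printf.printf "by elements: %d bytes, by bytes: %d bytes\n" by_elems by_bytes
```

Both calls describe the same 1024-byte copy: 256 float32 elements at 4 bytes each.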
val memcpy_peer :
?kind:('a, 'b) Stdlib.Bigarray.kind ->
?length:int ->
?size_in_bytes:int ->
dst:Deviceptr.t ->
dst_ctx:Context.t ->
src:Deviceptr.t ->
src_ctx:Context.t ->
t ->
unit
Copies between memory positions on two different devices asynchronously. The size to copy can optionally be provided in numbers of elements via kind and length. Provide either both kind and length, or just size_in_bytes. See cuMemcpyPeerAsync.
See CUmemAttach_flags.
val sexp_of_attach_mem : attach_mem -> Sexplib0.Sexp.t
val attach_mem_of_sexp : Sexplib0.Sexp.t -> attach_mem
val attach_mem : t -> Deviceptr.t -> int -> attach_mem -> unit
val create : ?non_blocking:bool -> ?lower_priority:int -> unit -> t
Lower lower_priority numbers represent higher priorities; the default is 0. See cuStreamCreateWithPriority.
The stream value is finalized using cuStreamDestroy. This is safe without needing to set the proper context.
See cuStreamGetCtx.
val get_id : t -> Unsigned.uint64
See cuStreamGetId.
val is_ready : t -> bool
Returns false when the query status is CUDA_ERROR_NOT_READY, and true if it is CUDA_SUCCESS. See cuStreamQuery.
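Since is_ready does not block, it can be polled while the host does other work, in contrast to a blocking wait. The sketch below shows one possible polling loop. It is deliberately independent of Cudajit: poll_until_ready is a hypothetical helper, and the is_ready argument is assumed to be a closure such as fun () -> Cudajit.Stream.is_ready stream, replaced here by a stand-in counter so the example runs without a GPU.

```ocaml
(* Hypothetical helper, not part of Cudajit: calls [on_idle] between
   readiness queries and returns how many times it was called. *)
let poll_until_ready ~is_ready ~on_idle =
  let rec loop polls =
    if is_ready () then polls
    else begin
      on_idle ();
      loop (polls + 1)
    end
  in
  loop 0

let () =
  (* Stand-in for a stream: reports ready on the fourth query. *)
  let remaining = ref 3 in
  let is_ready () =
    if !remaining = 0 then true else (decr remaining; false) in
  let idle_rounds = poll_until_ready ~is_ready ~on_idle:(fun () -> ()) in
  Printf.printf "host did %d rounds of other work\n" idle_rounds
```

When there is no useful host work to overlap, a blocking wait on the stream is usually simpler than a polling loop.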
val synchronize : t -> unit
Waits until a stream's tasks are completed. See cuStreamSynchronize.
val memset_d8 : Deviceptr.t -> Unsigned.uchar -> length:int -> t -> unit
See cuMemsetD8Async.
val memset_d16 : Deviceptr.t -> Unsigned.ushort -> length:int -> t -> unit
length is in number of elements. See cuMemsetD16Async.
val memset_d32 : Deviceptr.t -> Unsigned.uint32 -> length:int -> t -> unit
length is in number of elements. See cuMemsetD32Async.
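Because length counts elements, the same buffer needs a different length depending on which memset variant fills it: elements are 1, 2, or 4 bytes wide for the d8, d16, and d32 variants respectively. The sketch below converts a buffer size in bytes to the element count for a given width. memset_length is a hypothetical helper, not part of Cudajit, and rejecting sizes that are not a multiple of the element width is an assumption made for the example.

```ocaml
(* Hypothetical helper, not part of Cudajit: element count to pass as
   ~length when filling [size_in_bytes] bytes with elements that are
   [elem_bytes] wide (1 for d8, 2 for d16, 4 for d32). *)
let memset_length ~elem_bytes ~size_in_bytes =
  if elem_bytes <> 1 && elem_bytes <> 2 && elem_bytes <> 4 then
    invalid_arg "memset_length: elem_bytes must be 1, 2 or 4";
  (* Assumed check: the buffer must hold a whole number of elements. *)
  if size_in_bytes < 0 || size_in_bytes mod elem_bytes <> 0 then
    invalid_arg "memset_length: size is not a multiple of the element width";
  size_in_bytes / elem_bytes

let () =
  let size_in_bytes = 4096 in
  List.iter
    (fun elem_bytes ->
      Printf.printf "d%d: length = %d\n" (8 * elem_bytes)
        (memset_length ~elem_bytes ~size_in_bytes))
    [ 1; 2; 4 ]
```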