Memory Management

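By default, CuPy uses a memory pool for memory allocation. The memory pool significantly improves performance by reducing the overhead of memory allocation and CPU/GPU synchronization.
In general, a memory pool manages a set of memory blocks that the program can allocate, access, and free at run time. In CuPy, blocks that are freed stay cached in the pool for reuse rather than being returned to the device immediately.
There are two different memory pools in CuPy:
Device memory pool (GPU device memory), which is used for GPU memory allocations.
Pinned memory pool (non-swappable CPU memory), which is used during CPU-to-GPU data transfer.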
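
To make the pooling idea concrete, here is a toy sketch in pure Python. It is not CuPy's implementation: the class name ToyPool, its internals, and the 512-byte rounding granularity are all assumptions chosen for illustration. It only mimics the bookkeeping behind used_bytes(), total_bytes(), and free_all_blocks().

```python
class ToyPool:
    """Toy model of a caching memory pool (hypothetical, not CuPy code)."""

    ROUND = 512  # assumed allocation granularity in bytes

    def __init__(self):
        self._free = {}   # rounded block size -> number of cached free blocks
        self._used = 0    # bytes currently handed out to the program
        self._total = 0   # bytes held by the pool (in use + cached)

    def malloc(self, size):
        # Round the request up to a multiple of ROUND.
        rounded = -(-size // self.ROUND) * self.ROUND
        if self._free.get(rounded, 0) > 0:
            self._free[rounded] -= 1   # reuse a cached block
        else:
            self._total += rounded     # stands in for a real device allocation
        self._used += rounded
        return rounded                 # the "block" is just its size here

    def free(self, block):
        # The block goes back to the pool's cache, not to the device.
        self._used -= block
        self._free[block] = self._free.get(block, 0) + 1

    def free_all_blocks(self):
        # Release every cached (unused) block.
        for size, count in self._free.items():
            self._total -= size * count
        self._free.clear()

    def used_bytes(self):
        return self._used

    def total_bytes(self):
        return self._total


pool = ToyPool()
block = pool.malloc(400)
print(pool.used_bytes(), pool.total_bytes())  # 512 512
pool.free(block)
print(pool.used_bytes(), pool.total_bytes())  # 0 512
pool.free_all_blocks()
print(pool.used_bytes(), pool.total_bytes())  # 0 0
```

The point of the cache is that a second malloc(400) after free() would reuse the existing block instead of triggering a new allocation.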

Memory Pool Operations

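A memory pool instance provides statistics about memory allocation.
To access the default memory pool instances, use
cupy.get_default_memory_pool() and
cupy.get_default_pinned_memory_pool().
You can also free all unused memory blocks held in a memory pool.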

import cupy
import numpy

mempool = cupy.get_default_memory_pool()
pinned_mempool = cupy.get_default_pinned_memory_pool()

# Create an array a_cpu on the CPU.
# NumPy allocates 400 bytes on the CPU (not managed by the CuPy memory pool).
a_cpu = numpy.ndarray(100, dtype=numpy.float32)
print(a_cpu.nbytes)                      # 400

print(mempool.used_bytes())              # 0
print(mempool.total_bytes())             # 0
print(pinned_mempool.n_free_blocks())    # 0

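# Transfer the array from the CPU to the GPU.
# This allocates 400 bytes from the device memory pool and another 400 bytes
# from the pinned memory pool.
# The allocated pinned memory is released once the transfer is complete.
# Note that the actual allocation size may be rounded up to a larger value
# (e.g., 512) than the requested size (400).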
a = cupy.array(a_cpu)
print(a.nbytes)                          # 400
print(mempool.used_bytes())              # 512
print(mempool.total_bytes())             # 512
print(pinned_mempool.n_free_blocks())    # 1

# When the array is no longer used, the allocated device memory is released
# and kept in the pool for future reuse.
a = None  # (or `del a`)
print(mempool.used_bytes())              # 0
print(mempool.total_bytes())             # 512
print(pinned_mempool.n_free_blocks())    # 1

# You can also clear the memory pool by calling `free_all_blocks`.
mempool.free_all_blocks()
pinned_mempool.free_all_blocks()
print(mempool.used_bytes())              # 0
print(mempool.total_bytes())             # 0
print(pinned_mempool.n_free_blocks())    # 0

Limiting GPU Memory Usage

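You can hard-limit the amount of GPU memory that can be allocated by setting the CUPY_GPU_MEMORY_LIMIT environment variable.
You can also set the limit (or override the value specified via the environment variable) using cupy.cuda.MemoryPool.set_limit().
This way, you can use a different limit for each GPU device.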

# Set the hard-limit to 1 GiB:
#   $ export CUPY_GPU_MEMORY_LIMIT="1073741824"

# You can also specify the limit in fraction of the total amount of memory
# on the GPU. If you have a GPU with 2 GiB memory, the following is
# equivalent to the above configuration.
#   $ export CUPY_GPU_MEMORY_LIMIT="50%"

import cupy
print(cupy.get_default_memory_pool().get_limit())  # 1073741824

mempool = cupy.get_default_memory_pool()

with cupy.cuda.Device(0):
    mempool.set_limit(size=1024**3)  # 1 GiB

with cupy.cuda.Device(1):
    mempool.set_limit(size=2*1024**3)  # 2 GiB

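Note that CUDA allocates some GPU memory outside of the memory pool. Depending on usage, this can amount to a few hundred megabytes, and it is not counted toward the limit.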
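
The arithmetic behind the two forms of CUPY_GPU_MEMORY_LIMIT (an absolute byte count or a percentage of the GPU's total memory) can be sketched as follows. The helper resolve_limit is hypothetical and written only for this illustration; it is not how CuPy parses the variable internally.

```python
GIB = 1024 ** 3


def resolve_limit(spec, total_bytes):
    """Hypothetical helper: convert "1073741824" or "50%" into a byte count."""
    spec = spec.strip()
    if spec.endswith("%"):
        # Percentage of the total memory available on the GPU.
        return int(total_bytes * float(spec[:-1]) / 100)
    # Otherwise the value is taken as an absolute number of bytes.
    return int(spec)


# On a GPU with 2 GiB of memory, both settings give the same 1 GiB limit.
print(resolve_limit("1073741824", 2 * GIB))  # 1073741824
print(resolve_limit("50%", 2 * GIB))         # 1073741824
```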

Changing the Memory Pool

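You can use your own memory allocator instead of the default memory pool by passing the allocator function to
cupy.cuda.set_allocator() / cupy.cuda.set_pinned_memory_allocator().

The allocator function must take one argument (the requested size in bytes) and return a
cupy.cuda.MemoryPointer / cupy.cuda.PinnedMemoryPointer.
You can even disable the default memory pool with the code below. Be sure to do this before any other CuPy operations.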

import cupy

# Disable memory pool for device memory (GPU)
cupy.cuda.set_allocator(None)

# Disable memory pool for pinned memory (CPU).
cupy.cuda.set_pinned_memory_allocator(None)
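
As a rough sketch of the allocator contract described above (one size argument in, one memory pointer out), the following wraps an existing allocation function so that each request is logged. The names make_logging_allocator and install_logging_allocator are hypothetical and exist only for this example. The wrapper itself is plain Python; the CuPy-specific part assumes a working GPU and imports cupy only when called.

```python
def make_logging_allocator(malloc, log=print):
    """Wrap an allocator function so that every request is logged first.

    `malloc` must take a size in bytes and return a memory pointer,
    for example the `malloc` method of a cupy.cuda.MemoryPool.
    """
    def allocator(size):
        log(f"requesting {size} bytes")
        return malloc(size)
    return allocator


def install_logging_allocator():
    """Route CuPy device allocations through a logged private pool.

    Call this before any other CuPy operation. Requires CuPy and a GPU.
    """
    import cupy  # imported lazily so the wrapper above works without a GPU

    pool = cupy.cuda.MemoryPool()
    cupy.cuda.set_allocator(make_logging_allocator(pool.malloc))
    return pool
```

Because the custom allocator still delegates to a MemoryPool, freed blocks remain cached and reusable; only the logging is added.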