Deep Learning Primitives (CUDA® Deep Neural Network library™ (cuDNN))
High-performance building blocks for deep neural network applications, including convolutions, activation functions, and tensor transformations. The cuDNN product page can be found here. The cuDNN documentation page can be found here.
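To make "convolutions" and "activation functions" concrete, here is a minimal plain-Python sketch of the two operations as they are commonly defined in deep learning. It is not the cuDNN API and makes no use of the GPU. The function names `conv2d_valid` and `relu` and the sample data are hypothetical, chosen only for illustration.

```python
# Illustrative reference semantics only; this is NOT the cuDNN API.

def conv2d_valid(image, kernel):
    """2D cross-correlation (the usual deep learning 'convolution'),
    with no padding and a stride of 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            # Slide the kernel window over the image and accumulate products.
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def relu(feature_map):
    """Rectified linear unit: clamp negative values to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Hypothetical 3x3 input and 2x2 kernel.
image = [
    [5.0, 1.0, 2.0],
    [0.0, 3.0, 4.0],
    [6.0, 0.0, 1.0],
]
kernel = [
    [1.0, 0.0],
    [0.0, -1.0],
]

feature_map = conv2d_valid(image, kernel)  # [[2.0, -3.0], [0.0, 2.0]]
activated = relu(feature_map)              # [[2.0, 0.0], [0.0, 2.0]]
```

A GPU library performs the same arithmetic over batched, multi-channel tensors; the nested loops here exist only to show what is being computed.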
Deep Learning Inference Engine (TensorRT™)
High-performance deep learning inference runtime for production deployment. The TensorRT product page can be found here. The TensorRT documentation page can be found here.
Deep Learning for Video Analytics (NVIDIA DeepStream™ SDK)
High-level C++ API and runtime for GPU-accelerated transcoding and deep learning inference. The DeepStream SDK product page is located here.
Linear Algebra (CUDA® Basic Linear Algebra Subroutines library™ (cuBLAS))
GPU-accelerated BLAS functionality that delivers 6x to 17x faster performance than CPU-only BLAS libraries. The cuBLAS product page is located here. The cuBLAS documentation page is located here.
Sparse Matrix Operations (NVIDIA CUDA® Sparse Matrix library™ (cuSPARSE))
GPU-accelerated linear algebra subroutines for sparse matrices that deliver up to 8x faster performance than CPU BLAS (MKL), ideal for applications such as natural language processing. The cuSPARSE product page is located here. The cuSPARSE documentation is located here.
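As a rough illustration of what a sparse matrix routine computes, the sketch below stores a matrix in compressed sparse row (CSR) form and multiplies it by a dense vector. It is plain Python, not the cuSPARSE API. The helper names `dense_to_csr` and `csr_matvec` and the sample matrix are hypothetical.

```python
# Illustrative reference semantics only; this is NOT the cuSPARSE API.

def dense_to_csr(dense):
    """Convert a dense row-major matrix to CSR arrays:
    non-zero values, their column indices, and per-row offsets."""
    values, col_indices, row_offsets = [], [], [0]
    for row in dense:
        for col, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_indices.append(col)
        # Each offset marks where the next row starts in `values`.
        row_offsets.append(len(values))
    return values, col_indices, row_offsets


def csr_matvec(values, col_indices, row_offsets, x):
    """Compute y = A @ x, touching only the stored non-zero entries."""
    y = []
    for r in range(len(row_offsets) - 1):
        acc = 0.0
        for k in range(row_offsets[r], row_offsets[r + 1]):
            acc += values[k] * x[col_indices[k]]
        y.append(acc)
    return y


# Hypothetical 3x4 matrix with four non-zero entries.
dense = [
    [4.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 2.0, 0.0],
    [0.0, 3.0, 0.0, 0.0],
]
x = [1.0, 2.0, 3.0, 4.0]

values, col_indices, row_offsets = dense_to_csr(dense)
y = csr_matvec(values, col_indices, row_offsets, x)  # [8.0, 6.0, 6.0]
```

The work is proportional to the number of non-zero entries rather than to the full matrix size, which is why sparse formats suit workloads such as natural language processing, where most entries are zero.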
Multi-GPU Communication (NVIDIA® Collective Communications Library™ (NCCL))
Collective communication routines, such as all-gather, reduce, and broadcast, that accelerate multi-GPU deep learning training on up to eight GPUs. The NCCL product page is located here. The NCCL documentation is located here.
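The semantics of the three collectives named above can be sketched by simulating each GPU as a Python list, one buffer per rank. This is not the NCCL API and performs no real communication. The function names, the choice of four ranks, and the gradient values are assumptions made for illustration.

```python
# Illustrative reference semantics only; this is NOT the NCCL API.
# Each inner list stands in for one GPU's (rank's) buffer.

def broadcast(buffers, root):
    """Every rank receives a copy of the root rank's buffer."""
    return [list(buffers[root]) for _ in buffers]


def reduce_sum(buffers, root):
    """The root rank receives the element-wise sum of all buffers;
    the other ranks keep their own data."""
    total = [sum(vals) for vals in zip(*buffers)]
    return [total if rank == root else list(buf)
            for rank, buf in enumerate(buffers)]


def all_gather(buffers):
    """Every rank receives the concatenation of all buffers, in rank order."""
    gathered = [v for buf in buffers for v in buf]
    return [list(gathered) for _ in buffers]


# Hypothetical per-rank gradients on four simulated GPUs.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

reduced = reduce_sum(grads, root=0)   # rank 0 now holds [16.0, 20.0]
synced = broadcast(reduced, root=0)   # every rank holds [16.0, 20.0]
gathered = all_gather(grads)          # every rank holds all eight values
```

In data-parallel training, a reduce followed by a broadcast is one way to give every GPU the same summed gradient before the weight update.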
The Deep Learning SDK requires the CUDA® Toolkit™, which offers a comprehensive development environment for building new GPU-accelerated deep learning algorithms and dramatically increasing the performance of existing applications.