There are several ways to use the XXL versions of ProtT5:
- Use a GPU with large memory, such as an NVIDIA Quadro RTX 8000 or NVIDIA A100, with or without half precision.
- Use a GPU with less memory after quantizing the model, which makes the model 3x-4x smaller:
  https://pytorch.org/docs/stable/quantization.html
- Convert the model to ONNX, quantize it, and use the CPU rather than the GPU for inference:
  https://github.com/agemagician/ProtTrans/tree/master/Embedding/Onnx
- Parallelize the model across multiple smaller GPUs:
  https://huggingface.co/transformers/model_doc/t5.html#transformers.T5Model.parallelize
You can, of course, combine several of the above approaches.
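As a minimal sketch of the half-precision and quantization options above, shown on a toy stand-in model (ProtT5-XXL itself is far too large for a quick demo; with the real model you would load it via `transformers` and apply the same calls):

```python
import torch
import torch.nn as nn

# Toy stand-in for a large encoder layer stack.
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))

# Half precision: halves the memory of every float32 weight.
fp16_model = nn.Sequential(
    nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024)
).half()

# Dynamic quantization: stores nn.Linear weights as int8 (roughly 4x
# smaller) and dequantizes them on the fly during CPU inference.
int8_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# The quantized variant still maps a 1024-dim input to a 1024-dim output.
out = int8_model(torch.randn(1, 1024))
```

The same `half()` / `quantize_dynamic` calls apply to a loaded ProtT5 model; how much memory they save in practice depends on which layers dominate the model.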