Serving Quantized LLMs on NVIDIA H100 Tensor Core GPUs
databricks
JANUARY 30, 2024
Quantization is a technique for making machine learning models smaller and faster. We quantize Llama2-70B-Chat, producing an equivalent-quality model that generates 2.2x more tokens per second.
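To make the idea concrete, here is a minimal sketch of one common quantization scheme: symmetric (absmax) rounding of floating-point weights to 8-bit integer codes. It is an illustration only, and not necessarily the scheme used for Llama2-70B-Chat here. The function names `quantize_int8` and `dequantize` and the sample weights are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric (absmax) quantization of floats to integer codes in [-127, 127].

    Illustrative assumption: one scale for the whole list. Real systems
    typically use per-channel or per-group scales.
    """
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude onto 127; avoid dividing by zero for all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from integer codes."""
    return [c * scale for c in codes]


# Hypothetical sample weights, chosen only for illustration.
weights = [0.12, -0.5, 0.33, 1.0, -0.98]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# The rounding error per weight is bounded by half of one quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each weight is stored in 8 bits instead of 16 or 32, which is where the memory savings come from. Whether the model's output quality survives that rounding has to be checked empirically.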