The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines
Paper • 2408.01050
Inference engines, quantization, serving stacks, and perf tooling. Reference list for deployment and latency/cost work.
Display benchmark evaluation data for LLMs
VLMEvalKit Evaluation Results Collection
VLMEvalKit Eval Results in video understanding benchmark