New Show Hacker News story: Show HN: Made a batching LLM API for a project. Mistral 200 tk/s on RTX 3090

December 26, 2023

Show HN: Made a batching LLM API for a project. Mistral 200 tk/s on RTX 3090
2 by muttled | 0 comments on Hacker News.
I was running into a vLLM bug that affected multi-GPU setups, and I needed a stand-in while it was being fixed: something that used the same API format but performed better than the API in text-generation-webui. It's very rough, and I'm not a coder by trade, but it's very fast once you have many simultaneous connections.
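
To show why batching helps most under many simultaneous connections, here is a minimal sketch of request batching with asyncio. It is not the author's code: `fake_generate`, `Batcher`, `MAX_BATCH`, and `MAX_WAIT_S` are hypothetical names and values chosen for illustration, and the model call is replaced by a stand-in that just echoes its prompts.

```python
import asyncio

MAX_BATCH = 8       # assumed cap on prompts per model call
MAX_WAIT_S = 0.01   # assumed pause that lets more requests queue up


def fake_generate(prompts):
    """Stand-in for one batched model call (hypothetical, not a real LLM)."""
    return [f"echo: {p}" for p in prompts]


class Batcher:
    def __init__(self):
        self.queue = asyncio.Queue()
        self.batch_sizes = []  # recorded so the effect of batching is visible

    async def submit(self, prompt):
        """Called once per incoming request; returns when its batch is done."""
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((prompt, fut))
        return await fut

    async def run(self):
        """Group queued requests and serve each group with a single call."""
        while True:
            # Block until at least one request is waiting.
            batch = [await self.queue.get()]
            # Pause briefly so concurrent requests can join this batch.
            await asyncio.sleep(MAX_WAIT_S)
            while len(batch) < MAX_BATCH and not self.queue.empty():
                batch.append(self.queue.get_nowait())
            outputs = fake_generate([prompt for prompt, _ in batch])
            self.batch_sizes.append(len(batch))
            # Hand each caller the output that matches its own prompt.
            for (_, fut), out in zip(batch, outputs):
                fut.set_result(out)


async def demo(n_clients=20):
    """Simulate n_clients simultaneous connections against one batcher."""
    batcher = Batcher()
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(
        *(batcher.submit(f"prompt {i}") for i in range(n_clients))
    )
    worker.cancel()
    return results, batcher.batch_sizes


if __name__ == "__main__":
    results, sizes = asyncio.run(demo())
    print(len(results), "responses served in", len(sizes), "model calls")
```

With a single connection every batch has size one and nothing is gained. The benefit appears only when several requests are waiting at once, because they then share one model call instead of each paying for its own.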
