Bonsai 1.7B Hits 442 Tokens Per Second on M4 Max: Ternary Weight Efficiency in Practice
A ternary-weight 1.7B-parameter model reaches 442 tokens per second (T/s) on Apple's M4 Max, showing how ultra-compact weight encoding translates into real-world on-device inference speed.
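To give a rough sense of how compact ternary weights are, here is a minimal sketch that computes raw weight storage for a 1.7-billion-parameter model at several bit widths. It assumes exactly 1.7e9 parameters and ignores embeddings, scale factors, and file metadata. The 2-bit packed and log2(3) rows are generic reference points for ternary values ({-1, 0, +1}), not Bonsai's documented storage format.

```python
import math

# Assumption: parameter count taken from the model name (1.7B). Real checkpoints
# also store embeddings, per-block scales, and metadata, which this ignores.
NUM_PARAMS = 1.7e9


def weight_bytes(num_params: float, bits_per_weight: float) -> float:
    """Return raw weight storage in bytes for a given bits-per-weight figure."""
    return num_params * bits_per_weight / 8


# Generic encodings for comparison; the ternary rows are illustrative assumptions.
ENCODINGS = {
    "fp16": 16.0,
    "int4": 4.0,
    "ternary, 2-bit packed": 2.0,
    "ternary, information-theoretic (log2 3)": math.log2(3),
}

for name, bits in ENCODINGS.items():
    gigabytes = weight_bytes(NUM_PARAMS, bits) / 1e9
    print(f"{name:42s} {bits:6.3f} bits/weight -> {gigabytes:5.2f} GB")
```

Single-stream decoding generally reads every weight once per generated token, so a smaller footprint usually means less memory traffic per token. This sketch only covers storage size; it does not model bandwidth or reproduce the 442 T/s figure.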