Daniel Viglione

Microsoft’s Phi-2 is only 2.7 billion parameters, yet it outperforms Meta Llama 90B on MongoDB aggregation pipelines because it was specifically trained on quality textbook data and fine-tuned for the exact task we need. It also has a modest memory footprint: at half precision (FP16) it occupies only about 5GB of VRAM on my GeForce RTX, and I am even willing to accept 4-bit quantization for this.

That matters for GuideWell Chat and the 1000+ users who depend on it: we need a model fine-tuned and purpose-built for generating MongoDB pipeline queries from natural language, a task where much larger models like Meta Llama 4/3.2 often hallucinate. Don't get me wrong: Scout's 109B parameters (spread across 16 experts) are good enough for a general-purpose model, but not for the specific task needed here.

In contrast, Magistral Small is 24B, and yes, it has been fine-tuned for multi-step chain of thought, where it can outperform Llama 4. But what about the cost-benefit analysis? Magistral is looking at roughly 48GB of VRAM, where the Phi-2 Mongo model needs only 5-6GB. The latter serves a crucial use case for a product going live to production in weeks, used by 1000+ enterprise users.
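The VRAM figures above follow directly from parameter count times bytes per parameter (weights only; activations and KV cache add overhead on top). A quick back-of-the-envelope sketch:

```python
# Rough VRAM estimate for model weights alone. Real usage is somewhat
# higher because of activations, KV cache, and framework overhead.
def weight_vram_gb(params_billions: float, bits_per_param: int) -> float:
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9  # decimal gigabytes

phi2_fp16 = weight_vram_gb(2.7, 16)       # Phi-2 at half precision
phi2_int4 = weight_vram_gb(2.7, 4)        # Phi-2 at 4-bit quantization
magistral_fp16 = weight_vram_gb(24, 16)   # Magistral Small at half precision

print(f"Phi-2 FP16:     {phi2_fp16:.1f} GB")   # ~5.4 GB
print(f"Phi-2 4-bit:    {phi2_int4:.2f} GB")   # ~1.35 GB
print(f"Magistral FP16: {magistral_fp16:.0f} GB")  # ~48 GB
```

That is why the 5-6GB figure for Phi-2 and the 48GB figure for Magistral line up: the ~0.5GB gap between the 5.4GB estimate and the quoted 5GB is just runtime overhead versus raw weight size.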
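Whatever model you pick, hallucination risk argues for validating the generated pipeline before it ever reaches MongoDB. Here is a minimal sketch of such a guard; `validate_pipeline` and the `KNOWN_STAGES` whitelist are hypothetical names (and the whitelist covers only a subset of real aggregation stages), so treat this as an illustration, not production code:

```python
import json

# Partial whitelist of MongoDB aggregation stage operators. A real guard
# would cover the full set; this subset is just for illustration.
KNOWN_STAGES = {
    "$match", "$group", "$project", "$sort", "$limit", "$skip",
    "$unwind", "$lookup", "$count", "$addFields", "$facet",
}

def validate_pipeline(raw: str) -> list:
    """Parse model output and sanity-check it before execution.

    Catches the two most common hallucination failure modes:
    malformed JSON and invented stage operators.
    """
    pipeline = json.loads(raw)  # raises ValueError on invalid JSON
    if not isinstance(pipeline, list):
        raise ValueError("pipeline must be a JSON array of stages")
    for stage in pipeline:
        if not (isinstance(stage, dict) and len(stage) == 1):
            raise ValueError(f"each stage must be a one-key object: {stage}")
        op = next(iter(stage))
        if op not in KNOWN_STAGES:
            raise ValueError(f"unknown stage operator: {op}")
    return pipeline

# Hypothetical model output for "total amount per member, top 5":
generated = ('[{"$group": {"_id": "$memberId", "total": {"$sum": "$amount"}}},'
             ' {"$sort": {"total": -1}}, {"$limit": 5}]')
pipeline = validate_pipeline(generated)
```

Only after the guard passes would the parsed list be handed to the driver's `aggregate()` call, which keeps a hallucinated stage from ever hitting the database.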