This data was originally featured in the December 11th, 2024 newsletter found here: INBOX INSIGHTS, December 11, 2024: Organizational Health, AI Models
In this week’s Data Diaries, let’s build on the 5P framework Katie reminded us of in the cold open. When it comes to AI tools, you have a bounty of choices; on Hugging Face, the model “app store” of the world, there are over a million different AI models to choose from, and virtually all of them are free of cost.
With that much choice, how do you decide what to use?
This is relevant because, as the world and governments change, many folks want more control over what data they hand to AI, especially to third-party services, and they have growing concerns about reliability and availability.
To no one’s surprise, you can choose an AI model using the Trust Insights 5P framework, starting with Purpose.
- Purpose: What tasks is the AI model supposed to accomplish? Many models are specialized for a specific task, or perform certain tasks better than others.
- People: Do you have the technical skills necessary to implement certain types of models? If you’re working on Apple hardware, for example, your people should know what MLX is.
- Process: How will people use the models? What tasks will they be performing, and how granular is your documentation for those tasks?
- Platform: Based especially on your purpose and performance, what models best fit your goals? You’ll need to account for things like the computational power required.
- Performance: What will be your benchmarks for success? You’ll have many, from accomplishing the task overall to granular performance metrics like tokens per second and time to first token.
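Those granular metrics are easy to measure yourself with a few timestamps. Here’s a minimal sketch, assuming a hypothetical `fake_stream` generator standing in for whatever streaming API your model actually exposes:

```python
import time

def measure_performance(token_stream):
    """Measure time to first token (TTFT) and tokens per second for
    any generator that yields tokens as the model produces them."""
    start = time.perf_counter()
    first_token_time = None
    count = 0
    for _ in token_stream:
        if first_token_time is None:
            # Latency until the model emits its first token.
            first_token_time = time.perf_counter() - start
        count += 1
    elapsed = time.perf_counter() - start
    return {
        "time_to_first_token_s": first_token_time,
        "tokens_per_second": count / elapsed if elapsed > 0 else 0.0,
    }

# Hypothetical stand-in for a real model's streaming output.
def fake_stream(n=50, delay=0.001):
    for _ in range(n):
        time.sleep(delay)
        yield "token"

print(measure_performance(fake_stream()))
```

Swap `fake_stream` for your model server’s streaming call and you have a consistent yardstick for comparing models on your own hardware.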
It’s very easy to get lost in the dizzying array of techno-jargon around AI models, with obscure-sounding terms like iMatrix quants and GGUF vs vLLM. All of that is important, but it comes well after working through the 5Ps broadly.
Let’s review the current state of models quickly so that you have a sense of at least what’s out there. This is just a tiny fraction of all the different options.
Foundation/Frontier Language Models
These are the best in class; they are either hosted by someone else or require substantial investment in hardware to operate yourself.
- OpenAI o1: great at complex thinking
- Anthropic Claude Sonnet 3.5: great at writing and content creation
- Google Gemini 1.5 Exp 1206: great at handling huge datasets
- Meta Llama 3.1 405B: great for general purpose running in your own hardware
Language Models for Individual Use
- Meta Llama 3.3 70B: If you have a beefy laptop (think fully-loaded MacBook Pro M4 Max) or a server, this is the current best in class for following instructions carefully
- Alibaba Qwen QwQ 32B: A midrange reasoning model that runs on a gaming-class laptop and reasons well
- Alibaba Qwen 2.5 Coder 32B: The current best in class for writing code, also runs on a gaming-class laptop
- Cydonia Magnum 22B: A creative writing model that is very fluent at writing and bad at everything else
- Mistral Small 22B: A solid general purpose model for midrange computers
- Mistral Nemo 12B: A surprisingly good, small model for lower-end computers
Image Generation Models
- Black Forest Labs Flux-1 Dev: The current best in class image creation model you can run on your own hardware
Transcription Models
- Distil-Whisper Large V3: The fastest distilled version of OpenAI’s Whisper model: six times faster than OpenAI’s version, 49% smaller, and with the same accuracy. Great if you want to process a lot of audio without racking up big bills
Bear in mind that these models are all engines; just as you don’t drive down the road sitting on an engine block, neither do you use these models without infrastructure around them. Another time, we’ll talk about how to put these models into production.