AI/ML Feature Evaluation Framework
How to decide when to build an AI-powered feature, scope the problem, and choose between build, buy, or API.
When to Use This Framework
Use this when asked: "Should we add AI to this product?", "How would you build a recommendation system?", "Walk me through how you'd approach an ML-powered feature." It is also the right starting point for any AI product design question.
Step 1: Clarify the Problem Before Touching AI
AI is a solution, not a goal. Many candidates jump to model architecture before establishing whether AI is even the right tool.
Ask first:
Only after answering these should you evaluate whether ML is warranted.
Step 2: Is AI the Right Approach?
Not every problem needs machine learning. Use this checklist to decide.
AI is a good fit when:
Stick with rules or simpler logic when:
Classic PM trap: Building a model when a well-tuned heuristic would work 90% as well with 10% of the complexity.
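A cheap way to check for this trap is to score a rule-based baseline on a labeled evaluation set before committing to a model. The Python sketch below is illustrative only. It assumes a hypothetical keyword rule for comment spam, a tiny made-up evaluation set, and a placeholder model score. The 0.9 ratio mirrors the 90% figure above and is not a standard threshold.

```python
def keyword_rule(text: str) -> int:
    """Hypothetical heuristic: flag a comment as spam (1) if it contains a known phrase."""
    spam_phrases = ("free money", "click here", "buy now")
    lowered = text.lower()
    return int(any(phrase in lowered for phrase in spam_phrases))


def accuracy(preds: list[int], labels: list[int]) -> float:
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def heuristic_is_good_enough(baseline_score: float, model_score: float,
                             ratio: float = 0.9) -> bool:
    """True if the baseline reaches at least `ratio` of the model's score."""
    return baseline_score >= ratio * model_score


# Tiny hypothetical evaluation set: (comment text, is_spam)
eval_set = [
    ("Click here for free money!", 1),
    ("Great article, thanks for sharing", 0),
    ("Buy now, limited offer", 1),
    ("I disagree with point two", 0),
    ("Check my profile for deals", 1),  # spam the rule misses
]
labels = [y for _, y in eval_set]
baseline_acc = accuracy([keyword_rule(text) for text, _ in eval_set], labels)
model_acc = 0.85  # placeholder for a prototype model's score, not a measured result

print(baseline_acc, heuristic_is_good_enough(baseline_acc, model_acc))
```

For imbalanced problems like spam, compare precision and recall rather than raw accuracy. Raw accuracy can look high even when most spam slips through.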
Step 3: Assess Data Readiness
AI models are only as good as their training data. Evaluate four dimensions:
- Volume: Do you have enough labeled examples? As a rough rule, supervised classification needs at least thousands of labeled examples; complex tasks (NLP, vision) need far more.
- Quality: Is the data accurate, consistent, and representative of real-world inputs?
- Recency: Is the data fresh enough to reflect current user behavior?
- Bias: Does the data reflect the diversity of your actual user population, or does it systematically underrepresent certain groups?
If data readiness is low, the right PM move is to invest in data infrastructure first — not to skip straight to model development.
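A quick audit script can partly measure three of the four dimensions. Quality usually needs manual spot checks or label-agreement reviews instead. This Python sketch assumes each labeled example carries a label, a collection date, and a user-segment field. The field names, the 90-day recency window, and the population shares are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class LabeledExample:
    label: str      # e.g. "spam" or "not_spam"
    created: date   # when the example was collected
    segment: str    # hypothetical user segment, e.g. region


def readiness_report(examples, today, max_age_days, population_share):
    """Summarize volume, recency, and representation of a labeled dataset."""
    n = len(examples)
    cutoff = today - timedelta(days=max_age_days)
    segment_counts = Counter(e.segment for e in examples)
    return {
        # Volume: total examples and per-class counts (watch for rare classes).
        "total": n,
        "per_label": dict(Counter(e.label for e in examples)),
        # Recency: share of examples collected within the window.
        "recent_fraction": sum(e.created >= cutoff for e in examples) / n,
        # Bias: dataset share / population share; < 1 means under-represented.
        "representation": {
            seg: (segment_counts.get(seg, 0) / n) / share
            for seg, share in population_share.items()
        },
    }


data = [
    LabeledExample("spam", date(2024, 5, 20), "NA"),
    LabeledExample("not_spam", date(2024, 5, 25), "NA"),
    LabeledExample("not_spam", date(2023, 1, 10), "NA"),
    LabeledExample("spam", date(2024, 4, 1), "EU"),
]
report = readiness_report(
    data,
    today=date(2024, 6, 1),
    max_age_days=90,
    population_share={"NA": 0.5, "EU": 0.3, "APAC": 0.2},  # hypothetical user mix
)
print(report)
```

In this toy data, APAC users make up a hypothetical 20% of the user base but none of the examples. That is the kind of gap to close before training.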
Step 4: Build vs. Buy vs. API
For most product teams, the choice is not whether to build a model from scratch — it is which tier of the AI stack to own.
Build (Train Your Own Model)
When to choose: You have a highly differentiated use case, proprietary data is your moat, or off-the-shelf models cannot hit your accuracy bar.
Cost: Highest. Requires ML engineers, data infrastructure, ongoing retraining, and monitoring.
Examples: Google's search ranking, Netflix recommendations, Spotify's Discover Weekly.
Buy (Acquire or License)
When to choose: You need deep, specialized capability quickly and are willing to take a dependency on a vendor.
Cost: High upfront, but faster to deliver than building.
Examples: Acquiring a specialized AI startup rather than building the capability internally.
API / Foundation Model
When to choose: Your use case is within the capability of a general-purpose model (GPT, Claude, Gemini), and you do not need proprietary data advantages.
Cost: Lowest to start, but variable at scale. Usage-based pricing means API spend grows with traffic, so project costs at expected volume before committing.
Examples: Adding summarization, classification, or generation features using an LLM API.
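A back-of-the-envelope projection makes the scaling risk concrete. Under per-token pricing, monthly spend is roughly requests × (input tokens × input price + output tokens × output price). The Python sketch assumes prices quoted per million tokens. The rates and token counts are placeholders, not any vendor's actual rate card.

```python
def monthly_api_cost(requests_per_day, input_tokens, output_tokens,
                     usd_per_m_input, usd_per_m_output, days=30):
    """Estimate monthly spend for a per-token-priced LLM API.

    Prices are per million tokens; every argument is an assumption to fill in.
    """
    per_request = (input_tokens * usd_per_m_input
                   + output_tokens * usd_per_m_output) / 1_000_000
    return requests_per_day * days * per_request


# Hypothetical placeholder prices and token counts.
for daily in (1_000, 100_000, 10_000_000):
    cost = monthly_api_cost(daily, input_tokens=1_000, output_tokens=200,
                            usd_per_m_input=1.0, usd_per_m_output=4.0)
    print(f"{daily:>12,} requests/day -> ${cost:,.0f}/month")
```

At these placeholder rates, spend scales linearly from about $54 to about $540,000 per month as daily volume grows 10,000-fold.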
The PM's job is to frame this as a make-vs-buy tradeoff: differentiation, data moat, speed, cost, and risk tolerance.
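One way to make this tradeoff explicit is a weighted decision matrix. This is a general prioritization technique, not something specific to this framework. In the Python sketch, every weight and 1–5 score is a hypothetical judgment call. Higher always means more favorable for the team; for example, a high "cost" score means cheaper.

```python
CRITERIA = ("differentiation", "data_moat", "speed", "cost", "risk")


def rank_options(weights, options):
    """Rank options by weighted average score (higher is better)."""
    total_weight = sum(weights[c] for c in CRITERIA)
    scored = [
        (name, sum(weights[c] * scores[c] for c in CRITERIA) / total_weight)
        for name, scores in options.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical 1-5 scores; higher = more favorable (e.g. "cost": 5 means cheapest,
# "risk": 5 means best fit with the team's risk tolerance).
options = {
    "build": {"differentiation": 5, "data_moat": 5, "speed": 1, "cost": 1, "risk": 2},
    "buy":   {"differentiation": 4, "data_moat": 3, "speed": 3, "cost": 2, "risk": 3},
    "api":   {"differentiation": 2, "data_moat": 1, "speed": 5, "cost": 4, "risk": 3},
}
# Hypothetical weights for two teams with different priorities.
speed_first = {"differentiation": 1, "data_moat": 1, "speed": 3, "cost": 2, "risk": 1}
moat_first = {"differentiation": 3, "data_moat": 3, "speed": 1, "cost": 1, "risk": 1}

print(rank_options(speed_first, options))
print(rank_options(moat_first, options))
```

With speed-heavy weights, the API option ranks first (3.625 vs. 2.875 for buy and 2.125 for build). Shifting the weight toward differentiation and data moat flips the ranking toward build. The value is in surfacing which weights the team actually holds.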
Step 5: Scope the ML Problem
Once you have decided to build, translate the user problem into an ML problem statement.
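Define:
- Input: the signals available to the model at prediction time.
- Output: what the model produces, such as a score, label, or ranking.
- Task: the ML task type, such as binary classification.
- Feedback: where ground-truth labels come from for evaluation and retraining.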
Example: "We want to reduce spam in comments. Input: comment text + author history. Output: spam probability score (0–1). Task: binary classification. Feedback: user reports + human review labels."
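Writing the problem statement down as a structured record keeps the team honest about which parts are still undefined. This Python sketch encodes the spam example above. The class and its `missing_fields` helper are hypothetical and not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MLProblemStatement:
    """Hypothetical template for translating a user problem into an ML problem."""
    user_problem: str
    inputs: tuple[str, ...]    # signals available at prediction time
    output: str                # what the model emits
    task: str                  # ML task type
    feedback: tuple[str, ...]  # sources of ground-truth labels

    def missing_fields(self) -> list[str]:
        """Names of fields left empty, i.e. parts of the problem not yet scoped."""
        return [name for name, value in vars(self).items() if not value]


spam = MLProblemStatement(
    user_problem="Reduce spam in comments",
    inputs=("comment text", "author history"),
    output="spam probability score (0-1)",
    task="binary classification",
    feedback=("user reports", "human review labels"),
)
# A draft with no label source is not ready: there is nothing to train or evaluate on.
draft = MLProblemStatement(
    user_problem="Reduce spam in comments",
    inputs=("comment text",),
    output="spam probability score (0-1)",
    task="binary classification",
    feedback=(),
)
print(spam.missing_fields(), draft.missing_fields())
```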
Step 6: Define the Launch Strategy
AI features need a different launch playbook than traditional software.
- Shadow mode: Run the model in parallel with the existing system, log predictions, and compare — before exposing output to users.
- Staged rollout: Start with a small percentage of traffic, monitor closely, expand only when metrics hold.
- Human-in-the-loop: For high-stakes decisions, require human review before the model's output is acted on.
- Fallback: Always have a rule-based or manual fallback for when the model's confidence is below a threshold.
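These four controls compose naturally in a single decision function. This Python sketch assumes a model that returns a (label, confidence) pair, an existing rule-based system, and deterministic hash bucketing of user IDs for the staged rollout. Users outside the rollout get shadow-mode logging only. All function names, thresholds, and the toy model are hypothetical.

```python
import hashlib


def in_rollout(user_id: str, percent: int) -> bool:
    """Deterministically place a user in one of 100 buckets; same user, same bucket."""
    bucket = int(hashlib.sha256(user_id.encode("utf-8")).hexdigest(), 16) % 100
    return bucket < percent


def decide(user_id, text, model, rules, *, rollout_percent, min_confidence,
           high_stakes, shadow_log):
    """Return (label, source) for one request, applying the launch controls."""
    rule_label = rules(text)
    model_label, confidence = model(text)
    if not in_rollout(user_id, rollout_percent):
        # Shadow mode: record the model's prediction, but act on the existing system.
        shadow_log.append((user_id, rule_label, model_label))
        return rule_label, "rules"
    if confidence < min_confidence:
        # Fallback: low-confidence predictions defer to the rule-based system.
        return rule_label, "fallback"
    if high_stakes:
        # Human-in-the-loop: the model's label is only a proposal pending review.
        return model_label, "needs_review"
    return model_label, "model"


# Toy stand-ins for the existing rules and the new model (hypothetical).
def toy_rules(text):
    return "spam" if "free money" in text.lower() else "ok"


def toy_model(text):
    return ("spam", 0.95) if "click" in text.lower() else ("ok", 0.55)


log = []
common = dict(model=toy_model, rules=toy_rules, min_confidence=0.8, shadow_log=log)
r1 = decide("user-1", "Click for free money", rollout_percent=0, high_stakes=False, **common)
r2 = decide("user-2", "Nice post", rollout_percent=100, high_stakes=False, **common)
r3 = decide("user-3", "Click this link", rollout_percent=100, high_stakes=True, **common)
r4 = decide("user-4", "Click this link", rollout_percent=100, high_stakes=False, **common)
agreement = sum(rule == pred for _, rule, pred in log) / len(log)
print(r1, r2, r3, r4, agreement)
```

Hash-based bucketing keeps each user's experience stable as the rollout grows, since raising `percent` only adds buckets. The agreement rate between rules and model in the shadow log is one signal for deciding when to expand.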
Common Mistakes to Avoid
- Jumping to AI before validating the user problem
- Underestimating data readiness as a blocker
- Choosing "build" by default when an API would ship faster and work equally well
- Treating the model as the product — users care about outcomes, not architecture
- Skipping shadow mode and going straight to production
- Not planning for model decay as data distributions shift over time
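One common way to watch for decay is to compare production model scores (or an input feature) against their distribution at launch. The Python sketch uses the Population Stability Index (PSI), one of several drift metrics. The bin edges, the epsilon floor for empty bins, and the score samples are assumptions for illustration. A PSI above roughly 0.25 is often cited as a rule-of-thumb signal of significant shift, but the right alert threshold depends on the product.

```python
import bisect
import math


def bin_proportions(values, edges):
    """Share of values in each bin defined by sorted `edges`; out-of-range values go to the end bins."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect.bisect_right(edges, v) - 1
        counts[min(max(i, 0), len(counts) - 1)] += 1
    return [c / len(values) for c in counts]


def population_stability_index(baseline, current, edges, eps=1e-6):
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    psi = 0.0
    for expected, actual in zip(bin_proportions(baseline, edges),
                                bin_proportions(current, edges)):
        # Floor empty bins at eps so the logarithm stays finite.
        expected, actual = max(expected, eps), max(actual, eps)
        psi += (actual - expected) * math.log(actual / expected)
    return psi


edges = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
# Hypothetical spam-score samples at launch and in a later week.
launch_scores = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
recent_scores = [0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.92, 0.95, 0.97, 0.99]

drift = population_stability_index(launch_scores, recent_scores, edges)
print(f"PSI = {drift:.2f}")
```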