Separating Surface Variation from Conceptual Variation in LLM Outputs
Across 3,300 LLM outputs, I measured three distinct layers of variation. Models rephrase freely while recycling the same concepts and arguments. Temperature doesn't help. A few structural prompt techniques do.
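To make the distinction between rephrasing and recycled concepts concrete, here is a minimal sketch. It is not the measurement pipeline behind the numbers above: the sample outputs, the hand-assigned concept tags, and the helper names `jaccard` and `mean_pairwise_distance` are all assumptions made for illustration. It scores surface variation as word-level Jaccard distance and conceptual variation as Jaccard distance over concept tags.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two collections; 1.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_distance(items):
    """Average (1 - Jaccard) over all unordered pairs of items."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical outputs, each paired with hand-assigned concept tags.
# A real study would need an actual concept-extraction step here.
outputs = [
    ("Remote work boosts productivity by cutting commute time.",
     {"productivity", "commute"}),
    ("Skipping the daily commute lets employees get more done.",
     {"productivity", "commute"}),
    ("People accomplish more at home since they avoid travelling in.",
     {"productivity", "commute"}),
]

# Surface layer: how different the wording is.
surface = mean_pairwise_distance([text.lower().split() for text, _ in outputs])

# Conceptual layer: how different the underlying ideas are.
conceptual = mean_pairwise_distance([tags for _, tags in outputs])

print(f"surface variation:    {surface:.2f}")
print(f"conceptual variation: {conceptual:.2f}")
```

In this toy set the three sentences share almost no words, so the surface score is high, while the concept tags are identical, so the conceptual score is zero. That gap is the pattern the summary describes.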