because it encodes semantics.
Please enlighten me as to how. I admit I don't know all the internals of the transformer model, but from what I know it encodes purely syntactic information, i.e. which token is most likely to come next given a syntactic context window.
How does it encode semantics? What is the semantics that it encodes? I doubt they have a denotational or operational semantics of natural language; I don't think something like that even exists, so it has to be some smaller model. Actually, it would be enlightening if you could tell me at least what the semantic domain here is, because I don't think there's any naturally obvious choice for that.
Are they a serious researcher in ML with insights into some of the most interesting and complicated intersections of computer science and analytical mathematics, or a promptfondler that earns 3x the former's salary for a nebulous AI startup that will never create anything of value to society? Read on to find out!