Grok3 has a lot more info on the Q posts and, with the "thinking" tag, does a decent job with analysis. Grok2 would regularly get the post content wrong and seemed to base most of its analysis on random Twitter posts. Grok3 is a pretty big improvement.
Here's a discussion on the AF1->Q0 callsign change from 2018. It was able to access the raw ADS-B data and also interpret screen images when coming up with conclusions.