Well, I assume you've got to make or commission a good model for whatever framework you're going to use.
For mocap, I think they use a thing called Perception Neuron ($1,000+) and use recorded voice to drive the simulated lip movement.
The part I don't know for sure is what they use to put it all together. I think it might have been Unity?
very low T. have some high T.