Like your work, but I'd suppose it'd be hard to gather them all, short of capping/cutting them yourself. Why not link to/screen the original message from an archive like the q_raw.txt, or both?
>capping/cutting them yourself
>that's what i am doing right now.
Probably for the best, as this way you'll have time zone consistency.
>my original thought.
>but i didn't like looking at a plain text pic.
Didn't mean a plain text pic, meant HTML code that would screen the orig (copyable/searchable) message text (like in a window/frame or so).
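A minimal sketch of what such a box could look like, generated with Python here. The function name `message_box`, the class name `orig-message` and the inline style are made up for illustration, not anything your page already has. The text is escaped, so it shows up literally and stays copyable/searchable.

```python
from html import escape


def message_box(message_text, source_url):
    """Return an HTML snippet that shows the text in a scrollable box."""
    # escape() turns <, >, & and quotes into entities so the text displays as-is.
    return (
        '<div class="orig-message">\n'  # hypothetical class name
        f'  <pre style="max-height: 20em; overflow: auto;">{escape(message_text)}</pre>\n'
        f'  <a href="{escape(source_url, quote=True)}">original post</a>\n'
        '</div>'
    )
```

That covers the "or both" case from above: the text itself plus a link back to wherever the original is archived.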
One could also wonder if, to make it easier, it could be possible to extract the message part directly in HTML from saved/backed up HTML pages of all the threads, and simply insert it into your webpage.
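A rough sketch of that extraction, using only Python's standard `html.parser`. The big assumption is that every message sits in an element with a known class; the name `message` below is a placeholder, so check the saved pages for the real one. Badly broken markup could also throw the nesting count off.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class MessageExtractor(HTMLParser):
    """Collect the inner HTML of every element carrying a given class."""

    def __init__(self, message_class="message"):  # class name is an assumption
        super().__init__(convert_charrefs=False)  # keep entities like &amp; as written
        self.message_class = message_class
        self.depth = 0       # nesting depth inside a message, 0 means outside
        self.parts = []      # pieces of the message being collected
        self.messages = []   # finished messages as HTML strings

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.parts.append(self.get_starttag_text())
            if tag not in VOID_TAGS:
                self.depth += 1
        elif self.message_class in (dict(attrs).get("class") or "").split():
            self.depth = 1   # entering a message element

    def handle_startendtag(self, tag, attrs):
        if self.depth:       # self-closing tag such as <br/>
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.depth or tag in VOID_TAGS:
            return
        self.depth -= 1
        if self.depth == 0:  # the message element itself just closed
            self.messages.append("".join(self.parts).strip())
            self.parts = []
        else:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth:
            self.parts.append(f"&#{name};")


def extract_messages(page_html, message_class="message"):
    """Return the inner HTML of all message elements found in one saved page."""
    parser = MessageExtractor(message_class)
    parser.feed(page_html)
    parser.close()
    return parser.messages
```

Feed it the contents of each saved thread page and it hands back one HTML string per post, ready to be inserted into the webpage.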
That's a nice idea. This way (tagging them with like keyword tags) one could also produce maps & statistics of words and connections between them and such.
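For the statistics part, a small sketch of what that could look like once the message texts are available as plain strings. The tokenizer is deliberately crude, and "connection" is taken to mean "both words appear in the same message", which is only one possible definition.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_stats(messages):
    """Return (word counts, counts of word pairs sharing a message)."""
    word_counts = Counter()
    pair_counts = Counter()
    for text in messages:
        words = tokenize(text)
        word_counts.update(words)
        # Each unordered pair is counted once per message it appears in.
        pair_counts.update(combinations(sorted(set(words)), 2))
    return word_counts, pair_counts
```

`word_counts.most_common(20)` would then give a quick top list, and `pair_counts` could feed a map of which words show up together.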
I am not saying it would not be tedious to do, but (in Firefox) you can always:
– right-click on a post, click Inspect Element,
– on the left side in the window hover over the line that makes the complete post light up (pic1) and
– left-click on that line/element and choose "Edit As HTML"
– Copy the text in the lower window, and pasta it into some file you save as e.g. test.html
– Open that file with a browser, and you'll see the result of what you've done (pic2)
Not sure if you could pasta the copied html code into your webpage at some point … just as a suggestion, however …
Ok, I'll give it a shot with the extraction of the original code . . .
Image-links in the raw HTML could be problematic in the original code, and I am not sure if I can manage to get an accurate and consistent result, as I won't be doing it via Firefox (as described above) – but we'll see.
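One guess at what could go wrong with the image links: relative paths stop resolving once the fragment sits on another site. Assuming that is the problem, a sketch like this rewrites `src`/`href` values against the address the page was saved from. The regex is a shortcut, not a real HTML parser, and it only handles quoted attribute values.

```python
import re
from urllib.parse import urljoin

# Matches src="..." or href="..." with double or single quotes.
LINK_ATTR = re.compile(r"""\b(src|href)=(["'])(.*?)\2""", re.IGNORECASE)


def absolutize_links(fragment, base_url):
    """Rewrite relative src/href values in an HTML fragment to absolute URLs."""
    def fix(match):
        attr, quote, url = match.groups()
        # urljoin leaves URLs that are already absolute unchanged.
        return f"{attr}={quote}{urljoin(base_url, url)}{quote}"
    return LINK_ATTR.sub(fix, fragment)
```

Here `base_url` is the original address of the thread page, which has to be known or recorded when the page is saved.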
Kek, yes.
Once we have the HTML code extracted accurately, one could search for/mark all kinds of lists of words and stuff.
In this case one would first have to define what'd be "uncommon" (and then fill a list of those uncommons)
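One possible definition, purely as a starting point: a word counts as "uncommon" if it is not on a reference list of common words. The list below is a tiny stand-in, and both `COMMON_WORDS` and `uncommon_words` are made-up names; a real run would load a much longer list from a file.

```python
import re

# Tiny stand-in list; a real run would load a much larger common-word list.
COMMON_WORDS = {"the", "a", "is", "to", "and", "of", "in", "for", "you"}


def uncommon_words(messages, common=COMMON_WORDS):
    """Return the sorted words that are not in the common-word list."""
    found = set()
    for text in messages:
        found.update(re.findall(r"[a-z']+", text.lower()))
    return sorted(found - set(common))
```

A frequency cut-off (say, words appearing in only a handful of messages) would be another way to define it.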
However, one step after the other . . .