posted by [identity profile] gravitrue.livejournal.com at 09:59pm on 2007-12-01
> I've got an idea how to collect the comments posted to all the LJ-codebase
> sites and reproduce them at my own web site (though I need to estimate the
> impact it'll have in terms of both disk space and traffic). I'm not sure
> mirroring comments between the various journalling sites will work.

Really? It's a bitch of a problem; neither the API nor the RSS/Atom feeds provide comments, and I'm not sure the API supports reading locked posts; the backup functions certainly will not let me copy stuff from other folks' journals, even if I have permission to read it. Screen scraping works if all you want is a mirror, but if you want stuff parsed as messages, well, if you find working *nix code that does this, I seriously want a copy.
 
posted by [identity profile] dglenn.livejournal.com at 09:27pm on 2007-12-02
Comments posted to my entries by other users are emailed to me, so getting timestamp/user-id/subject/text of those is easy. Getting which avatar they used isn't. Nor is getting my own comments sucked into the mirror automagically, since getting one's own comments appears to be a paid-user-only feature (and until/unless 6A makes some significant and visible changes, I can't see giving them any more of my money).

Getting in the habit of always posting comments via OpenID as a user on a different site will help, since those comments count as "not from myself" to LJ and do get mailed. A less than optimal solution, 'cause at some point I'll forget to do it, of course.
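For the emailed-comment route, here is a rough sketch of pulling timestamp/user-id/subject/text out of one notification using only Python's standard `email` module. The layout of LJ's notification mail isn't specified here, so the caller supplies `user_pattern`, a hypothetical regex for whatever line in the body names the commenter; `parse_notification` and `first_text_part` are illustrative names, not anything LJ provides.

```python
import re
from email import message_from_string
from email.utils import parsedate_to_datetime


def first_text_part(msg):
    """Return the first text/plain part of a message as a str ('' if none)."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)  # bytes, transfer-decoded
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return ""


def parse_notification(raw, user_pattern):
    """Extract the fields a comment mirror would need from one raw email.

    user_pattern is an ASSUMPTION: a regex with one capture group that
    matches the commenter's name somewhere in the body text.
    """
    msg = message_from_string(raw)
    text = first_text_part(msg)
    match = re.search(user_pattern, text, re.MULTILINE)
    date_header = msg["Date"]
    return {
        "timestamp": parsedate_to_datetime(date_header) if date_header else None,
        "subject": msg["Subject"] or "",
        "user": match.group(1) if match else None,
        "text": text,
    }
```

Feeding it each message from a mailbox (say, via the standard `mailbox` module) would give records ready to store; the avatar would still be missing, as noted above.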

If I have to resort to screen-scraping (which I'll have to make infrequent enough to not attract attention / cause problems), there are some useful clues in the HTML that LJ sends that can be used to parse out comments separately, though I haven't looked closely enough to see whether it'll be easy to preserve the threading ... and when there are enough comments to collapse threads, getting the collapsed ones will be a PITA.
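Keeping the scraping infrequent could be as simple as forcing a minimum gap between page fetches. A minimal sketch, assuming a made-up `Throttle` helper (the clock and sleep functions are injectable only so it can be tested without actually waiting):

```python
import time


class Throttle:
    """Enforce a minimum number of seconds between successive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        if self._last is not None:
            remaining = self.min_interval - (self.clock() - self._last)
            if remaining > 0:
                self.sleep(remaining)
        self._last = self.clock()
```

Calling `throttle.wait()` before each page request (with, for example, `Throttle(60)` for one fetch a minute) spaces the requests out; what interval counts as "not attracting attention" is a guess either way.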

Note also that taking comments at my own site and propagating them out to the LJ-style sites ain't gonna be easy either -- I probably won't even attempt that.

I was planning on implementing whatever parts of this I manage on only my own journals, so copying from other people's journals won't be an issue. Getting locked posts could be -- I can have my posting client deal with the originally-posted version of an entry, but if I want the central mirror to reflect edits I've made later via LJ's web interface, not being able to get locked posts via the API would be a stumbling block. Not that I post many locked entries (I think I'm up to three now, in all the time I've been blogging), but I could well wind up having reasons to do so in the future.

So far the parts of the solution that exist are incomplete and in my head; I would not be too terribly surprised (just a little) to discover that filling in the gaps is more difficult than I'd guessed.
