posted by [identity profile] dglenn.livejournal.com at 02:56am on 2005-12-02
Oh, and as for how 'sort' and/or 'uniq' failed me earlier: in the web stats post, there are a few URLs that appear twice when they should have been summed into one line, and there were a few more that I fixed by hand. (I don't think it was a matter of inconsistent trailing spaces, but I've not bothered to recreate all my steps to make sure. I'll certainly watch out for those next time and hope it all works right. Or obtain a proper web stats analysis program.)
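For what it's worth, here is a minimal sketch of how a stray trailing space would produce exactly this kind of split count. The sample URLs and file names are made up for illustration, not taken from the real logs, and the `LC_ALL=C` setting is only a guess at keeping the sort order predictable.

```shell
# Hypothetical sample: the second entry carries a trailing space.
workdir=$(mktemp -d)
printf '%s\n' '/index.html' '/index.html ' '/about.html' > "$workdir/urls.txt"

# Counted as-is, the two /index.html variants stay on separate lines.
raw_lines=$(LC_ALL=C sort "$workdir/urls.txt" | uniq -c | wc -l | tr -d ' ')

# With trailing whitespace stripped first, they collapse into one line.
clean_lines=$(sed 's/[[:space:]]*$//' "$workdir/urls.txt" | LC_ALL=C sort | uniq -c | wc -l | tr -d ' ')

echo "lines out of uniq -c: $raw_lines raw, $clean_lines cleaned"
```

If the cleaned count comes out lower than the raw one on the real data, trailing whitespace was at least part of the problem.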
 
posted by [identity profile] eviltomble.livejournal.com at 11:18pm on 2005-12-03
Hmm, so there are... I suppose if you still have the source data, the thing to do would be to check the output of sort before it gets fed to uniq -c, and see if things appear out of place there. If not, try getting a diff of the lines out of uniq -c before they go into the next sort (I presume you used a sort on the initial numeric field at the end?). Getting the difference between lines might be fiddly though.
Maybe "od" or similar might be of more use? Personally, though, I wouldn't want to resort to somebody else's program; I'd just have to spend time learning the thing and would likely find it isn't what I'm after.
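Roughly what I have in mind, as a sketch only: keep the intermediate sort output, look for URLs that show up on more than one line of the uniq -c output once trailing whitespace is ignored, then dump those lines through od. The file names and sample data below are invented for the example.

```shell
# Hypothetical input: two visually identical URLs, one ending in a tab.
workdir=$(mktemp -d)
printf '/a.html\n/a.html\t\n/b.html\n' > "$workdir/urls.txt"

# Step 1: keep the output of sort so it can be inspected before uniq sees it.
LC_ALL=C sort "$workdir/urls.txt" > "$workdir/sorted.txt"

# Step 2: count, then drop the counts and trailing whitespace and list any
# URL that still occupies more than one line of the uniq -c output.
uniq -c "$workdir/sorted.txt" > "$workdir/counted.txt"
suspects=$(sed 's/^ *[0-9]* //; s/[[:space:]]*$//' "$workdir/counted.txt" | LC_ALL=C sort | uniq -d)

# Step 3: dump the suspect lines byte by byte; od -c prints a tab as \t
# and a carriage return as \r, so invisible differences become visible.
dump=""
if [ -n "$suspects" ]; then
    dump=$(grep -F -- "$suspects" "$workdir/sorted.txt" | od -c)
fi
echo "$dump"
```

An empty suspects list would mean the duplicates come from somewhere other than trailing whitespace, and it would be back to diffing the lines by hand.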
 
posted by [identity profile] dglenn.livejournal.com at 06:03pm on 2005-12-05
Oh, I still have the source data, of course, but the obsessive spell broke and I wandered away from the problem. When I get around to doing the same thing with the full logs (a few years' worth), I'll want to make sure it all sorts correctly then.

I'm wondering whether a 3-dimensional graph can be arranged in a way that'll show me interesting things about the relative popularity of different pages over time, rather than just looking pretty. (Maybe if I limit the graph to the few most popular pages, instead of trying to crowd all of them in there...)
