[WTF?! Need line-eater-food to prevent LJ and/or this posting
client from trying to interpret the first word with a colon after the
first <P> tag as a formatting command? Never noticed that before...]
Background: There was my old site at www.radix.net (not to be
confused with my ancient (and long dead) site at digex.net)
which has been replaced with a directory tree full of "this page has
moved" messages with meta-refresh redirects to www.dglenn.org. And
there's my main site at www.dglenn.org, which is getting most of its
hits via those redirects at www.radix.net, not directly. And there's
a backup/working-copy of my pages on www.panix.com that a couple of
search engines find instead of www.radix.net (and no search engines
seem to list www.dglenn.org first of the three, based on traffic
analysis).
Apparently part of the problem causing so much traffic to continue
to hit my www.radix.net ex-pages is that I used meta-refresh tags on
www.radix.net -- if I'd used .htaccess to generate 301 responses,
search engines would have moved their attention from www.radix.net
to www.dglenn.org by now ... but I wanted that few-seconds delay for
human readers to see the "this page has moved" notice and (I hoped)
update their bookmarks or links to my pages from their own sites.
The decision was made partly from ignorance of how to generate 301
codes, but I probably would have made the same choice in the interest
of prioritizing human readers over robotic ones. Lately, as described
here in passing, I've been trying to notify other webmasters personally
via email (do I bother sending a postcard to the ones who list a postal
address but no email address?) pointing out that my site moved eleven
months ago and asking them to update their links to me.
It may be time to say, "the humans have had plenty of time to see
the 'page moved' message; now it's the robots' turn and I should start
using 301 redirects" ... that's the smaller sanity check I'm asking for.
(Users would get redirected to the right place, but unless they think
to examine the address bar of their browsers they might not notice that
it was because of a redirect, not because my content was still where
they'd bookmarked it.)
The bigger sanity check concerns my www.panix.com pages, which I
consider a mirror of www.dglenn.org. I don't want to take those pages
down just yet, but I'd rather a) search engines link to the main site
instead of there, and b) search engines and maintained-index sites not
think that I'm trying to inflate my hit count by duplicating content.
I'd like most of the not-search-engine-mediated traffic to move to
www.dglenn.org as well, but that feels a little less critical right
now. (It'll matter if I wind up needing to reclaim that disk space
to stay within my quota at panix, or if I decide to use the two web
spaces to hold completely different sites.)
At the moment I'm finding it useful to have a copy of my pages on
a publicly accessible server where I have shell access so I can make
quick edits with vi and see the effects right away, or stick temporary
files there that I plan to delete a few days later. Spot-maintenance is
easier with shell access than via ftp. If I 301 those pages to send
the search engines to my main site, I lose all the advantages of having
a spare copy for tinkering.
But it turns out that unlike Digex and Radix, Panix gives me PHP on
their web server. So the idea that came to my muddled-by-lack-of-sleep
brain was this: convince the server to treat .html pages as though they
were .php ones (by way of .htaccess), and then have PHP on each page
(via includes) check the user-agent string for known spiders, returning
a 301:go-to-www.dglenn.org message to a spider, but a 200 (OK) and the
normal page contents to any other visitor.
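To make the decision logic concrete, here's a rough sketch of what that include would have to decide. It's written in Python purely for illustration; the real thing would be PHP running on the Panix server. The spider list and the function names are made up for the example, and it assumes a page's path relative to the site root is the same on both servers, which may not hold for the actual directory layout.

```python
# Illustrative sketch only: the real include would be PHP on the Panix server.
MAIN_SITE = "http://www.dglenn.org"  # canonical site named in the post

# Hypothetical, incomplete list of substrings found in crawler user-agent strings.
KNOWN_SPIDERS = ("googlebot", "slurp", "msnbot")


def is_spider(user_agent):
    """Return True if the user-agent string contains a known spider token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in KNOWN_SPIDERS)


def choose_response(user_agent, path):
    """Return (status, headers): a 301 to the main site for spiders, else 200.

    Assumes `path` is identical on the mirror and the main site.
    """
    if is_spider(user_agent):
        return 301, {"Location": MAIN_SITE + path}
    # Anyone else gets the normal page contents with a plain 200.
    return 200, {}
```

A spider asking for /index.html would be sent to the same path on www.dglenn.org, while an ordinary browser would get the page as usual. Matching on substrings means any spider missing from the list is treated as a human visitor.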
If I also force .html to be interpreted as .php on www.dglenn.org,
I wouldn't even have to strip the code before ftp'ing edited pages
over to the main site -- I'd just leave an innocuous include file
on that server instead of the one that has the spider-detect code in
it. (This would also eliminate the need to swap, by hand, the
StatCounter bug carrying the Panix project-codes for the dglenn.org
version in each page I upload.)
So is this a reasonable approach, or does it put me on a road that
ends with cackling, lightning rods, glassware full of bubbling liquids,
and a horde of peasants with torches and pitchforks beating on my door?