eftychia: Me in kilt and poofy shirt, facing away, playing acoustic guitar behind head (Default)
posted by [personal profile] eftychia at 06:28pm on 2007-10-20

[WTF?! Need line-eater-food to prevent LJ and/or this posting client from trying to interpret the first word with a colon after the first <P> tag as a formatting command? Never noticed that before...]

Background: There was my old site at www.radix.net (not to be confused with my ancient (and long dead) site at digex.net), which has been replaced with a directory tree full of "this page has moved" messages with meta-refresh redirects to www.dglenn.org. And there's my main site at www.dglenn.org, which is getting most of its hits via those redirects at www.radix.net, not directly. And there's a backup/working-copy of my pages on www.panix.com that a couple of search engines find instead of www.radix.net (and no search engines seem to list www.dglenn.org first of the three, based on traffic analysis).

Apparently part of the problem causing so much traffic to continue to hit my www.radix.net ex-pages is that I used meta-refresh tags on www.radix.net -- if I'd used .htaccess to generate 301 responses, search engines would have moved their attention from www.radix.net to www.dglenn.org by now ... but I wanted that few-seconds delay for human readers to see the "this page has moved" notice and (I hoped) update their bookmarks or links to my pages from their own sites. The decision was made partly from ignorance of how to generate 301 codes, but I probably would have made the same choice in the interest of prioritizing human readers over robotic ones. Lately, as described here in passing, I've been trying to notify other webmasters personally via email (do I bother sending a postcard to the ones who list a postal address but no email address?) pointing out that my site moved eleven months ago and asking them to update their links to me.
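For comparison, the meta-refresh approach is a tag like `<meta http-equiv="refresh" content="5; url=http://www.dglenn.org/page.html">` in each page's head, while the .htaccess version (a sketch, assuming Apache with mod_alias available and a hypothetical page path) returns a real 301 that spiders act on:

```apache
# .htaccess on www.radix.net -- a 301 tells search engines to transfer
# the listing (and any link credit) to the new URL immediately.
# The old and new paths here are hypothetical examples.
Redirect permanent /~dglenn/page.html http://www.dglenn.org/page.html
```

The trade-off described above is exactly this: the 301 is instant and invisible to humans, while the meta-refresh gives readers a few seconds to see the "moved" notice but leaves spiders indexing the old address.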

It may be time to say, "the humans have had plenty of time to see the 'page moved' message; now it's the robots' turn and I should start using 301 redirects" ... that's the smaller sanity check I'm asking for. (Users would get redirected to the right place, but unless they think to examine the address bar of their browsers they might not notice that it was because of a redirect, not because my content was still where they'd bookmarked it.)

The bigger sanity-check regards my www.panix.com pages, which I consider a mirror of www.dglenn.org. I don't want to take those pages down just yet, but I'd rather a) search engines link to the main site instead of there, and b) search engines and maintained-index sites not think that I'm trying to inflate my hit count by duplicating content. I'd like most of the not-search-engine-mediated traffic to move to www.dglenn.org as well, but that feels a little less critical right now. (It'll matter if I wind up needing to reclaim that disk space to stay within my quota at panix, or if I decide to use the two web spaces to hold completely different sites.)

At the moment I'm finding it useful to have a copy of my pages on a publicly-accessible server where I have shell access so I can make quick edits with vi and see the effects right away, or stick temporary files there that I plan to delete a few days later. Spot-maintenance is easier with shell access than via ftp. If I 301 those pages to send the search engines to my main site, I lose all the advantages of having a spare copy for tinkering.

But it turns out that unlike Digex and Radix, Panix gives me PHP on their web server. So the idea that came to my muddled-by-lack-of-sleep brain was this: convince the server to treat .html pages as though they were .php ones (by way of .htaccess), and then have PHP on each page (via includes) check the user-agent string for known spiders, returning a 301:go-to-www.dglenn.org response to a spider, but a 200 (OK) and the normal page contents to any other visitor.
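A minimal sketch of that include, assuming a hypothetical filename and a deliberately incomplete list of spider user-agent substrings; it would have to be pulled in at the top of each page, before any output is sent:

```php
<?php
// spider_redirect.inc -- hypothetical include name.
// If the visitor looks like a known crawler, send a permanent redirect
// to the same path on the main site; otherwise fall through and let
// the page render normally (200 OK).
$spiders = array('Googlebot', 'Slurp', 'msnbot', 'Teoma');  // partial list, an assumption
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

foreach ($spiders as $s) {
    if (stripos($ua, $s) !== false) {
        header('HTTP/1.1 301 Moved Permanently');
        header('Location: http://www.dglenn.org' . $_SERVER['REQUEST_URI']);
        exit;  // don't emit the page body to the spider
    }
}
?>
```

The obvious maintenance cost is keeping the spider list current; any crawler not on the list keeps indexing the mirror.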

If I also force .html to be interpreted as .php on www.dglenn.org, I wouldn't even have to strip the code before ftp'ing edited pages over to the main site -- I'd just leave an innocuous include file on that server instead of the one that has the spider-detect code in it. (This would also eliminate the need to replace the StatCounter bug with the Panix project-codes in it with the dglenn.org version by hand in each page I upload.)
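On Apache, routing .html files through PHP is usually one AddHandler (or AddType, depending on how the host built mod_php) line in .htaccess; the exact handler name varies by installation, so this is a sketch to check against the host's own documentation:

```apache
# Treat .html files as PHP -- the handler name here is the common one,
# but it depends on the server's PHP configuration.
AddHandler application/x-httpd-php .html
```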

So is this a reasonable approach, or does it put me on a road that ends with cackling, lightning rods, glassware full of bubbling liquids, and a horde of peasants with torches and pitchforks beating on my door?

There are 10 comments on this entry.
geekosaur: orange tabby with head canted 90 degrees, giving impression of "maybe it'll make more sense if I look at it this way?" (Default)
posted by [personal profile] geekosaur at 11:03pm on 2007-10-20
(1) Seems reasonable to me that it's been long enough for people to get the message; you can only support old sites for so long.

(2) It being duplicate content, I'd just use robots.txt to tell spiders to go away.
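For reference, the shut-the-spiders-out-entirely version of robots.txt is just:

```
User-agent: *
Disallow: /
```

(It has to live at the document root of the server, which turns out to be the catch.)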
posted by [personal profile] geekosaur at 11:05pm on 2007-10-20
Oh, and how many readers are you going to confuse with the line-eater comment? :)
 
posted by [identity profile] dglenn.livejournal.com at 12:30am on 2007-10-21
<innocent look>

<tries to look younger>
redbird: closeup of me drinking tea, in a friend's kitchen (Default)
posted by [personal profile] redbird at 11:13pm on 2007-10-20
I was also going to suggest robots.txt. On radix and maybe also the panix duplicates.
 
posted by [identity profile] dglenn.livejournal.com at 12:24am on 2007-10-21
If I'm reading the docs right, to use robots.txt I need access to the root of the web server's document tree. At both Radix and Panix the highest directory I have write access to is ~dglenn/.

I think I could use a robot-directed meta tag to tell the spiders to stay out, at least for whichever search engines' spiders understand that tag ... But if I have a choice I'd like to direct the spiders to dglenn.org to a) make sure that search engine sees the corresponding page there, and b) to get the trail that led the spider there to count as a link to dglenn.org if possible.
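The per-page form of that directive, which doesn't need document-root access, is a meta tag in each page's head:

```html
<!-- keeps the page out of the index, but (unlike a 301) passes no
     credit or trail on to the corresponding dglenn.org page -->
<meta name="robots" content="noindex, nofollow">
```

That limitation is exactly the a) and b) drawback described above.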

If my scheme is crazy, settling for using the robot meta tag seems like a reasonable plan B.
posted by [personal profile] geekosaur at 01:34am on 2007-10-21
Ah, yeh, that'd make sense.
posted by [personal profile] redbird at 11:16pm on 2007-10-20
I just went to look at dglenn.org, via the link at the top of this page. Why do you insist on showing us stuff in what appears to be six-point type? (I tried telling Firefox not to let sites use their own fonts, and it made no difference, except when I came back to this LJ page.)

I realize that wasn't what you were asking about, but it does seem relevant.
 
posted by [identity profile] dglenn.livejournal.com at 12:50am on 2007-10-21
Hmm. All the font changes I used (as I recall) are supposed to be relative, not absolute, so that they'd be reasonable in proportion to the reader's comfortable default font. But I've noticed that different browsers seem to have different ideas of what "make it smaller" means.

Major chunks of my site haven't been edited (other than fixing links here and there) since before I started learning about CSS; I'm not sure whether it would work better or worse using style directives instead of <font> tags to do the relative scaling. The wee type is supposed to be stuff I wanted findable but not distracting, and/or stuff that was repeated on page after page that I didn't want to eat up a lot of real estate with. Given that monitor sizes have changed a lot since I coded most of that, I guess it's time for me to overhaul the design. (Which would also be a good time to switch to using PHP on the server, since I finally can, instead of what I'd been doing before for major edits: edit a PHP page offline and generate a static HTML page from it to upload. If I don't change too much, I can do it all by editing my header and footer include files, but properly I ought to do more than that to make use of CSS.)
 
posted by [identity profile] deor.livejournal.com at 04:45pm on 2007-10-21
Yeah; if you change to CSS and use relative font sizes in em, for instance, instead of flat <font size="-x">, I think it would work better. I can see it fine in Opera, but in Firefox I get the same problem mentioned above.
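To illustrate the difference (class name hypothetical): `<font size="-2">` shrinks by a browser-dependent step scale, which is why the same page comes out readable in one browser and six-point in another, while CSS states the ratio to the reader's default directly:

```css
/* fine print scaled relative to the reader's default font size;
   the ratio is explicit rather than left to each browser's size table */
.smallprint { font-size: 0.85em; }   /* roughly replaces <font size="-1"> */
```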
 
posted by [identity profile] marnanel.livejournal.com at 11:33pm on 2007-10-20
really, I can't see why 301s aren't always the way to go.
