Perlshop 4 - Dancing with Search Engines

Overview

OK Sparky, here's the deal.

You want to have all of your Perlshop catalog web pages spidered, indexed, and otherwise processed by every search engine in the world. To do this, search engines have to be able to treat the pages just like any static web page. Obviously, Perlshop catalog pages don't work that way, and there are a couple of problems that make this a bit of a messy situation.

Problem 1:
By default, Perlshop expects the catalog page files to be stored under your cgi directory. Search engines need web pages to be stored under your html directory, just like any other static web page. How can this conflict be resolved?

Problem 2:
Once a search engine has processed a given page, you'll get a link to that page at that search engine. This is great, except that the link will point straight at the catalog page file. A Perlshop catalog page won't work if it's loaded like a regular web page. There'd be no shopping cart, no Order ID value, and no server side page processing. How can you make a direct link to one of your catalog pages work properly?

First, Some Basics

For the following discussion, I'm going to be using the directory structures we use here at Waverider Systems and at the various businesses we support. Most on-line businesses use a similar structure, and so most of you will understand this stuff right off. Those of you using different systems will have to figure this out for yourself as you go along.

Our systems are Unix based. Home directories are of the form /home/youraccount. All files that are intended for public access are stored in a subdirectory called public_html. All CGI programs are stored in a subdirectory called cgi-bin.

When you type our URL (http://www.waveridersystems.com) into your web browser, what you're really doing is requesting the file /home/waveridr/public_html/index.html from our web server.

When you type the URL http://www.waveridersystems.com/beer.html into your web browser, what you're really doing is requesting the file /home/waveridr/public_html/beer.html from our web server.

When you type the URL http://www.waveridersystems.com/cgi-bin/report.cgi into your web browser, what you're really doing is requesting that the program stored as /home/waveridr/cgi-bin/report.cgi be executed by our web server, with the program output being sent back to you.
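If it helps to see that mapping spelled out, here's a rough sketch of it in JavaScript. The function name resolveRequest and the HOME constant are made up for illustration. A real web server does this through its own configuration, so treat this as a picture of the idea and not as how any particular server works.

```javascript
// Hypothetical account home directory, matching the examples above.
const HOME = '/home/waveridr';

// Map a URL on this site to the file the web server would serve or run.
// Illustration only: real servers decide this from their own configuration.
function resolveRequest(url) {
  const path = new URL(url).pathname;   // e.g. "/cgi-bin/report.cgi"

  // Anything under /cgi-bin/ is a program to execute, not a file to send.
  if (path.startsWith('/cgi-bin/')) {
    return { kind: 'execute', file: HOME + path };
  }

  // A bare site URL falls back to index.html.
  const page = path === '/' ? '/index.html' : path;
  return { kind: 'serve', file: HOME + '/public_html' + page };
}

console.log(resolveRequest('http://www.waveridersystems.com/beer.html'));
```

Notice that nothing under cgi-bin is ever sent back as a plain file, which is exactly why a catalog page stored there is invisible to a search spider.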

Solving Problem 1

Let's talk about catalog pages. Let's say we have a catalog page called "beer.html", which allows you to buy a beer on-line and send it to me. Perlshop needs to have access to this file, and so it's going to be stored as /home/waveridr/cgi-bin/catalog/beer.html. This file is not stored under the public_html subdirectory, and so browsers and search spiders have no way to get to it. How can we keep Perlshop happy, and at the same time have this file be directly visible to the outside world? Wouldn't it be great if we could keep all of our catalog files under the public_html directory? Even better, wouldn't it be cool if we could be organized and keep all of our catalog page files in a catalog subdirectory under the public_html directory?

Not only would it be very cool if we could do this, but it turns out that there are at least three ways to do it.

For the sake of the following discussion, let's assume that all Perlshop catalog page files are going to be kept in the directory /home/waveridr/public_html/catalog. This means that a browser will be able to access them using a URL like http://www.waveridersystems.com/catalog/beer.html. The specific information for your own account will of course be different.

Solution 1:

The first solution is simple, inefficient, and not recommended. I'll show it to you first, and then hope you don't ever have to use it.

You make duplicates of all of your catalog page files.
You keep your working copies in /home/waveridr/cgi-bin/catalog.
You keep your search engine visible copies in /home/waveridr/public_html/catalog.

As I said, ugly, inefficient, very simple to pull off, and I hope you never have to do it this way.

Solution 2:

This solution should work for any platform. You can alter the ps.cfg file setting called $catalog_directory so that it points to /home/youraccount/public_html/catalog.

The default setting is:

$catalog_directory = $curr_dir . 'catalog';

Your new setting would be:

$catalog_directory = '/home/waveridr/public_html/catalog';

Now you can use the same catalog page directory for all purposes. Please be aware that some web server security configurations restrict which directories a CGI program is allowed to read, and that can prevent this method from working correctly.

Solution 3:

This final solution is for Unix systems only (I'm hoping someone out there will provide me with a Windows NT equivalent). We'll be using a simple Unix construct called a symbolic link.

What is a symbolic link, you ask? A symbolic link is a way to create a make believe subdirectory that looks, tastes, and feels like a real subdirectory. In reality, the symbolic link just points to a real directory someplace else. In other words, a symbolic link is an alias. A piece of misdirection. A pseudonym. A fake. It's not real, but the server thinks it is.

The first thing you'll do is make a real subdirectory at /home/waveridr/public_html/catalog. All of your catalog page files go in here.

You will have no real cgi-bin catalog directory. Instead, you'll have a symbolic link that points to the /home/waveridr/public_html/catalog directory. Every reference to file /home/waveridr/cgi-bin/catalog/beer.html will really be a reference to /home/waveridr/public_html/catalog/beer.html. All you have to do is create the symbolic link. Unix will do everything else for you. It's magic!

How do you create a symbolic link? I'm so glad you asked. You'll need Unix shell access for this, using Telnet or SSH or whatever works for you. At the Unix command line, you'll use the ln command as follows:

ln -s /home/youraccount/public_html/catalog /home/youraccount/cgi-bin/catalog

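This tells the server to create a symbolic link called /home/youraccount/cgi-bin/catalog, and to make it be an alias for /home/youraccount/public_html/catalog. The whole command goes on one line, with the real directory first and the link name second. If a real directory already exists at /home/youraccount/cgi-bin/catalog, move it out of the way first, or ln will create the link inside that directory instead. Now Perlshop can access the catalog page files just as it expects to, and your catalog pages are visible to the outside world.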
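If you'd like to watch the alias trick work before touching your real account, here's a small sketch you can run with Node.js on a Unix box. It builds a throwaway copy of the directory layout in a temp directory, creates the same kind of link that ln -s would, and reads beer.html through it. The temp directory name and file contents are invented for the demo. Nothing here is part of Perlshop.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Build a throwaway account layout in a temp directory (hypothetical names).
const home = fs.mkdtempSync(path.join(os.tmpdir(), 'youraccount-'));
const realCatalog = path.join(home, 'public_html', 'catalog');
const cgiBin = path.join(home, 'cgi-bin');
fs.mkdirSync(realCatalog, { recursive: true });
fs.mkdirSync(cgiBin);
fs.writeFileSync(path.join(realCatalog, 'beer.html'),
                 '<title>Send Dave a Beer</title>');

// Same effect as: ln -s <home>/public_html/catalog <home>/cgi-bin/catalog
const linkPath = path.join(cgiBin, 'catalog');
fs.symlinkSync(realCatalog, linkPath);

// The link itself is only an alias...
const isLink = fs.lstatSync(linkPath).isSymbolicLink();

// ...but reading through it reaches the one real file.
const viaLink = fs.readFileSync(path.join(linkPath, 'beer.html'), 'utf8');
const sameFile =
  fs.realpathSync(path.join(linkPath, 'beer.html')) ===
  fs.realpathSync(path.join(realCatalog, 'beer.html'));

console.log({ isLink, sameFile, viaLink });
// The temp directory is left in place so you can poke around in it.
```

There is only one copy of beer.html on disk. The cgi-bin path and the public_html path are just two names for it, which is the whole point of this solution.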

Solving Problem 2

OK, so now your catalog web pages are visible to the outside world. Search engines can spider them and go away happy. Browsers can access them, but they can't make them work. This is because Perlshop gets no chance to process the pages before the browser gets them. No shopping cart, no Order ID, no fun at all. So what can we do to make this work right?

You need to make your catalog pages smart. They need to be able to figure the situation out for themselves. If they were called by Perlshop, then they allow themselves to be loaded normally. If they were called from the outside, then they need to force the browser to reload them through Perlshop. A little JavaScript code placed in the <head> section of your catalog page can make this happen with a minimum of fuss.

Here is the code I'm talking about:

<head>

<!-- Catalog page file beer.html -->

<title>Waverider Systems - Send Dave a Beer</title>

<script type="text/javascript" language="JavaScript">
// Record the Perlshop Order ID value
var orderid = '!ORDERID!';

// If the Order ID value starts with a '!' symbol, then
// the page has not been processed by Perlshop.
if (orderid.charAt(0) == '!')
{
	// Force the browser to reload the page via Perlshop.
	top.document.location.replace("http://www.WaveriderSystems.com/cgi-bin/perlshop.cgi?ACTION=enter&thispage=beer.html&ORDER_ID=!ORDERID!");
}
</script>

<!-- Other head stuff -->

</head>

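This JavaScript code will be executed by the browser as the page is being loaded. If the Order ID value still starts with a '!' symbol, then Perlshop never replaced the !ORDERID! placeholder, and the JavaScript code will cause the browser to load the given Perlshop URL instead. If Perlshop did process the page, the check does nothing and the page loads normally. From that point on, the customer will receive a normal shopping experience.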
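Typing the page name into every catalog file by hand gets old fast. Here's one possible way to split the check into two small functions so the same script can be pasted into any catalog page. The names needsPerlshop and perlshopUrl are my own invention, and the sketch assumes every catalog page sits directly in the catalog directory, so the file name alone is what goes in thispage. Check that against your own setup before using it.

```javascript
// True when Perlshop has NOT processed the page: the Order ID is still
// the literal placeholder, which starts with a '!' symbol.
function needsPerlshop(orderId) {
  return orderId.charAt(0) === '!';
}

// Build the Perlshop entry URL for the page at the given URL path.
// Assumption: the page file lives directly in the catalog directory,
// so only the file name is passed as thispage.
function perlshopUrl(pathname, scriptUrl, orderId) {
  const page = pathname.substring(pathname.lastIndexOf('/') + 1);
  return scriptUrl +
    '?ACTION=enter' +
    '&thispage=' + encodeURIComponent(page) +
    '&ORDER_ID=' + encodeURIComponent(orderId);
}

// In a catalog page's <head>, the two would be used like this
// (browser only, so it is shown here as a comment):
//
//   var orderid = '!ORDERID!';
//   if (needsPerlshop(orderid)) {
//     top.document.location.replace(perlshopUrl(
//       location.pathname,
//       'http://www.WaveriderSystems.com/cgi-bin/perlshop.cgi',
//       orderid));
//   }
```

For beer.html this produces the same URL as the hand written version above, so Perlshop sees exactly the request it would have seen before.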