
In my day-to-day life, I work with dozens of different clients, each of whom has - in some form - their own way of handling the structure and naming conventions for their sites. In almost every case, these conventions are not standardized in any written policy, but rather simply evolve from what the client thinks is best. In a lot of instances, things work out OK with these unwritten conventions, but it always seems that the more "spoons in the pot", or the longer the site exists before being redesigned, the greater the chance for entropy in the site's structure. This results in difficulty in maintaining the site and even difficulty developing new content. Usability, as I say, isn't just for the end user, but also for those of us who have to develop and maintain web sites, because if we have problems doing our jobs, that can't make things easier for the user.
Unfortunately, there is no established convention anywhere for structure and naming of web sites and, considering the plethora of different technologies which can go into building a site - each with its own structure and naming practices - outlining one of your own can seem rather difficult. PHP, for example, tends to use underscores as a delimiter between words in function names: "in_array" is a function to determine if an item is located in a specific array. JavaScript, on the other hand, tends to use what's known as camel casing - the capitalization of conjoined words as a form of delimiting, e.g. "getElementById". Finally, HTML seems to follow a convention which relies on abbreviations and acronyms to name elements and attributes, in a quest for brevity (sometimes at the cost of clarity). Below, I hope to offer my own opinion on naming and structure to make site development and maintenance easier.
All names for anything on the site, be it a file name, directory name, HTML element ID or class, programming variable, function, method, or class should make sense when seen. The name must be reflective of what the item does or, in the case of a publicly viewable file, what information the file contains or what object the item portrays. The name must be instantly recognizable while being as terse as possible. You should err on the side of clarity rather than brevity and any abbreviations or acronyms should be clear enough to be usable.
In order to reduce confusion for yourself and others developing the site, and also for the site's users, all file and directory names should be defined in lowercase only. In my work, I have found that mixing case, capitalization, and use of uppercase for naming files and directories is one of the main causes of developer-caused 404 ("File Not Found") errors. When the person who creates the site defines a directory name of "Images" and others maintaining the site expect graphics to be found in the "images" directory, this can cause major problems throughout the entire site. While it is true that everyone should be checking their work, the entire problem could be avoided simply by agreeing to always name files and directories in lowercase only.
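As a rough illustration of how the lowercase-only rule could be checked, here is a minimal PHP sketch. The function name find_mixed_case_paths and the sample paths are hypothetical, invented for this example; a real audit would read the paths from the site's directory tree.

```php
<?php
// Hypothetical helper: return every path that contains an uppercase letter.
function find_mixed_case_paths(array $paths): array
{
    $offenders = [];
    foreach ($paths as $path) {
        // A path breaks the lowercase-only rule if lowercasing it changes it.
        if ($path !== strtolower($path)) {
            $offenders[] = $path;
        }
    }
    return $offenders;
}

// Sample paths are invented for illustration.
$paths = ['images/logo.png', 'Images/banner.jpg', 'styles/main.css', 'about/Staff.php'];
print_r(find_mixed_case_paths($paths));
```

Running a check like this before publishing would flag "Images/banner.jpg" and "about/Staff.php" while they are still cheap to rename.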
Avoid using underscore delimiters on publicly viewable files and directories. Many in the SEO community have espoused the use of hyphens and underscores as word separators and, until recently, this advice has been semi-misguided. Recently, however, it has been reported that hyphens and underscores are going to be treated as keyword delimiters in the URL by Google. For usability purposes, however, I still must recommend against using underscores in publicly viewable file and directory names. Underscores, when displayed in email and documents, end up disappearing, obscured by the underline which automatically appears on the URL. Laypersons may think that the obscured underscore is a space and type it into their browser's address bar as such, resulting in a 404 error and leaving them unable to access the resource they're seeking.
In the case of non-viewable files and directories, just pick a delimiter and stick to it religiously. Whether you use "-" or "_" as your delimiter does not really matter so long as you stick to one method only. Personally, I use underscores on these types of files, which also serves as a reminder to me that what's inside them isn't meant to be directly accessed by the outside world.
Do not, under any circumstances, use spaces in file names of any kind. Windows and Unix-based operating systems both allow spaces within file names, but a literal space is not a valid character in a URL: it has to be encoded as "%20". A link or typed address that contains an unencoded space can be truncated or misread, and the request may then result in a 404 error. While some may respond that their browser will convert (encode) the spaces to "%20", this is a browser-based error correction mechanism and you should not rely upon it as a means of circumventing shortcomings within your naming convention.
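To make the point about spaces concrete, the sketch below first shows what PHP's rawurlencode does to a name containing a space, then offers one possible way to derive a public name that needs no encoding at all. The helper to_public_name and the sample file names are assumptions made for this example, and the choice of hyphens follows the recommendation above for publicly viewable names.

```php
<?php
// Hypothetical helper: turn a working title into a public file or directory name.
function to_public_name(string $title): string
{
    $name = strtolower(trim($title));
    // Collapse runs of whitespace and underscores into a single hyphen.
    $name = preg_replace('/[\s_]+/', '-', $name);
    // Drop anything that is not a lowercase letter, digit, dot, or hyphen.
    $name = preg_replace('/[^a-z0-9.\-]+/', '', $name);
    // Remove any hyphens left dangling at either end.
    return trim($name, '-');
}

// How a space has to be written inside a URL.
echo rawurlencode('annual report.pdf'), "\n";

// A name that avoids the problem entirely.
echo to_public_name('Annual Report_Final.pdf'), "\n";
```

The first line prints "annual%20report.pdf" and the second prints "annual-report-final.pdf".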
Establish a practice of using a single file extension for all publicly viewable pages on the site. For instance, if your site is driven by a PHP backend, use .php as the only extension, even if some pages contain only static HTML. In cases where a PHP-driven page is a rarity, consider using an Apache handler to process .html files as PHP by adding the following line to an .htaccess file:
AddHandler application/x-httpd-php .html
This practice also has security benefits, as it does not disclose any platform information to hackers and script kiddies who, having discovered that your site runs on a PHP server, could otherwise narrow down their attack.
For all files, ensure that like files always contain the same exact extension. For example, for JPEG images, choose either .jpg or .jpeg and religiously use only the extension you've chosen for that type of file.
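One way to catch competing extensions is to group the ones that name the same file type and report any type that appears under more than one. This is only a sketch: the alias table EXTENSION_GROUPS, the function find_mixed_extensions, and the sample file names are all hypothetical.

```php
<?php
// Hypothetical alias table: each entry lists extensions that name the same file type.
const EXTENSION_GROUPS = [
    'jpeg' => ['jpg', 'jpeg'],
    'html' => ['htm', 'html'],
];

// Return, per file type, the competing extensions actually found in $files.
function find_mixed_extensions(array $files): array
{
    $seen = [];
    foreach ($files as $file) {
        $ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));
        foreach (EXTENSION_GROUPS as $type => $aliases) {
            if (in_array($ext, $aliases, true)) {
                // Record each distinct extension once per file type.
                $seen[$type][$ext] = true;
            }
        }
    }

    $mixed = [];
    foreach ($seen as $type => $exts) {
        // More than one extension for a type means the convention was broken.
        if (count($exts) > 1) {
            $mixed[$type] = array_keys($exts);
        }
    }
    return $mixed;
}

// Sample file names are invented for illustration.
print_r(find_mixed_extensions(['images/a.jpg', 'images/b.jpeg', 'index.html']));
```

For the sample list, only the JPEG type is reported, because both .jpg and .jpeg are in use.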
Some developers have the practice of defining scripts and template files with extensions like .inc or .tpl. Such a practice is a very big security risk. If the server is not given explicit instructions on how to deal with files having those extensions, the files could be served as plain text if they are requested directly. Imagine, for a moment, that you have a file located at /htdocs/includes/db.inc and that file contains settings for connecting to your database. Hackers can now read your connection details simply by typing http://www.yourdomain.com/includes/db.inc into their browser's address bar.
To solve this problem you can do one (or more) of the following:
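Deny direct requests for these files in the Apache configuration (for example, in an .htaccess file):
<FilesMatch "\.inc$">
Order allow,deny
Deny from all
</FilesMatch>
Add a check to the top of each include file that redirects any direct request:
// Redirect if this file was requested directly rather than included.
if (basename(__FILE__) == basename($_SERVER['PHP_SELF'])) {
    header("Location: /403.php");
    exit;
}
By using one of the above methods to ensure security, you can also add a sensible file extension to delineate PHP files which have special purposes, such as includes or templates, from files that are publicly viewed pages.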
One of the things that often leads to maintenance headaches for developers is the intermingling of site assets, such as images and scripts, in the same directories as the web site's pages themselves. To reduce this headache, site assets should be placed in directories with other similar assets. For example, all style sheets should be placed in a directory called "styles", all images should be placed in a directory called "images", and so forth. In instances where like assets can be further separated into logical groups, you should do so. For example, imagine an e-commerce site which has two subdirectories in its images folder: "template", which holds images shared throughout the site's template, and "products", which holds the images of the site's products for sale.
One exception to the above is when site assets are not shared throughout most (or even the majority) of the site. For example, if your site contains a newsletters section in which none of the newsletters' images are used anywhere else in the site, then perhaps placing those images within the newsletters space would make the best sense, so that http://www.example.com/newsletters/ is the URL to the newsletters and all newsletter assets are within that same directory.
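For assets that are shared across the site, the grouping rule can be written down as a simple lookup from file type to directory. The sketch below is one possible form. The "styles" and "images" directories come from the example above, while the "scripts" directory, the constant ASSET_DIRECTORIES, and the function asset_directory are assumptions made for illustration.

```php
<?php
// Hypothetical mapping from file extension to shared asset directory.
const ASSET_DIRECTORIES = [
    'css' => 'styles',
    'js'  => 'scripts', // assumed directory name, not taken from the article
    'png' => 'images',
    'jpg' => 'images',
    'gif' => 'images',
];

// Return the directory a shared asset belongs in, or null for files
// (such as pages) that are not covered by the mapping.
function asset_directory(string $file): ?string
{
    $ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));
    return ASSET_DIRECTORIES[$ext] ?? null;
}

var_dump(asset_directory('main.css'));
var_dump(asset_directory('about.php'));
```

Keeping the mapping in one place means a new asset type only has to be assigned a home once, rather than decided afresh by each person working on the site.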
When it comes to publicly viewable pages, they should be placed in a directory structure that reflects the structure of the information itself - again, grouping similar information together. At the same time, take care that you do not use unnecessary subdirectories. Create subdirectories only when you anticipate that the site's growth will make them necessary. For instance:
Think carefully about your directory structure, as once published, it is difficult to change.
Regardless of whether you agree with the recommendations I've made above, if you're in a situation where more than one person will be working on the site, you really should write your conventions down and distribute them among all of the people involved in the site's development and maintenance. I recommend that your structure and naming document adhere to the conventions outlined in RFC 2119, "Key words for use in RFCs to Indicate Requirement Levels". This document - short, by RFC standards - was authored in 1997 and outlines certain keywords which indicate the requirements for adherence to the established standard. By using words like "must" and "should", your naming and structure document can clearly delineate between items that are absolutely required and items that are merely recommended.
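To show how RFC 2119 keywords separate requirements from recommendations, here is a small sketch that sorts policy statements by the keyword they contain. The function requirement_level, the three labels it returns, and the sample rules are hypothetical. The keywords themselves are those defined in RFC 2119, which notes that they are often written in capitals.

```php
<?php
// Classify one rule from a naming policy by the RFC 2119 keyword it contains.
// The labels "required", "recommended", and "optional" are illustrative only.
function requirement_level(string $rule): ?string
{
    // Match capitalized keywords only, so ordinary prose is not misread as a rule.
    // "MUST NOT" and "SHALL NOT" are absolute too, so they also count as required.
    if (preg_match('/\b(MUST|SHALL|REQUIRED)\b/', $rule)) {
        return 'required';
    }
    if (preg_match('/\b(SHOULD|RECOMMENDED)\b/', $rule)) {
        return 'recommended';
    }
    if (preg_match('/\b(MAY|OPTIONAL)\b/', $rule)) {
        return 'optional';
    }
    // No keyword found: the statement carries no stated requirement level.
    return null;
}

// Sample rules are invented for illustration.
var_dump(requirement_level('File and directory names MUST be lowercase.'));
var_dump(requirement_level('Public file names SHOULD NOT contain underscores.'));
```

A statement that returns null is a hint that the policy document has not yet said how binding that rule is.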
By determining clear and sensible internal standards for the naming and structure of documents in your site, you can help ensure a higher level of overall quality for your site. This, in turn, will benefit not only you but also anyone else who develops and maintains the site. This will trickle down to the users as well, because sensible and clear names for the pages of your site will help its visitors understand where they are on the site and may also have benefits to SEO.