ritter.vg
making the site
the goal
This site is designed to be as minimalistic as possible in terms of maintenance.
  1. Necessities
    1. As close to web standards as feasible
    2. Easy updating
    3. Accessible for those without Javascript or Cookies
    4. Keep the back button and bookmarking working
  2. Do Not Want
    1. Fancy Content Management System with a database backend and security holes
    2. Any database at all
    3. More Server Side code than absolutely necessary
revision 1
22 Mar 2009 12:42 EST

Revision 1 of the site accomplished most of those goals, but the site was neither crawlable nor accessible to those without javascript. In fact, most of the site ran off javascript.

View the source, and you'll see most of revision 1. Each page of content is just an HTML file that is loaded into the main content area via javascript. It uses the jQuery History plugin to load content and maintain the back button. Each link looked like this:

<a href="#contact" class="history">contact</a>

I could just apply something similar to the below to every a.history element, but that wouldn't apply to any links that I load via AJAX. The Live Query plugin let me write something that would apply to links loaded via AJAX as well.

$("a.history").livequery('click', function(){
	var hash = this.href;
	hash = hash.replace(/^.*#/, '');
	$.history.load(hash);
	return false;
	});

And this bit kept bookmarking working:

function pageload(hash) {
	if(hash)
		$("#main").load(hash + ".html");
	else
		$("#main").load('news.html');
	}

I used a few other bits. The tooltips came from the jQuery Tooltips Plugin. The syntax highlighting came from SyntaxHighlighter. I had tried a few other syntax highlighters, but this was the only one that played nicely with loading content via AJAX.

revision 2

This worked, and it worked very well. It only had two drawbacks - search engines couldn't index it, and people browsing without javascript (who I have a very soft spot in my heart for) couldn't view the pages. I could remedy both of them in one fell swoop.

This article outlines a method for doing it; but I came up with a more home-grown solution. The first part was to ensure that each link went to an actual page when javascript was disabled. I changed all the links to point to a real page, and changed the javascript url rewriting. Even though the click event returns false (meaning you won't go to the page), I rewrite the urls back to the old method (with the hash) because the click event isn't fired when users choose to open in a new tab or window from the right-click menu, which inadvertently threw them into non-javascript mode.

<a href="n.php?page=contact" class="history">contact</a>

$("a.history").livequery(function(){
        var hash = this.href;
        hash = hash.replace(/.+php\?page=/, '');
        this.href = '#' + hash;
        });

$("a.history").livequery('click', function(){
	$("#main").html('<div style="text-align:center">Loading...</div>');
	var hash = this.href;
	hash = hash.replace(/^.*#/, '');
	$.history.load(hash);
	return false;
	});
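
To make the rewrite concrete, here is a minimal sketch of the same href-to-hash transformation, written in Python rather than the jQuery above. The function name is hypothetical, and unlike the original regex it leaves links that don't point at n.php untouched (that behaviour is my assumption, not something the site did).

```python
import re

def page_url_to_hash(href):
    """Turn '.../n.php?page=contact' into '#contact'.

    Hypothetical helper: links without 'php?page=' are returned unchanged.
    """
    match = re.search(r"php\?page=(.*)$", href)
    if match is None:
        return href
    return "#" + match.group(1)

print(page_url_to_hash("http://ritter.vg/n.php?page=contact"))
```

Right-clicking a rewritten link and opening it in a new tab then lands on the hash URL, which is the javascript version of the page.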

The real magic of course is just what n.php does. It has to fopen and send back the html - I'm certainly not going to rewrite my pages.

//Load Index
$f = fopen("index.html", 'r');
$index = "";
while($line = fread($f, 2048))
     $index .= $line;
fclose($f);

//Get content
$filename = !empty($_GET['page']) ? $_GET['page'] : "news";
if(file_exists($filename.'.html') && $filename != "index" && strpos($filename, ".") === false)
{
  $f = fopen($filename.'.html', 'r');
  $content = "";
  while ($line = fread($f, 2048))
    $content .= $line;
  fclose($f);
}
else
{
  //Hack attempt or missing page: insert nothing rather than leave $content undefined
  $content = "";
}

//Replace Content
$index = str_replace("<!-- content here -->", $content, $index);

//Remove script elements
//"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." - Jamie Zawinski
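$s = strpos($index, '<script type="text/javascript"');
while ($s !== false)
{
  $e = strpos($index, '</script>', $s); //search from the opening tag, not the top of the page
  if ($e === false) break;
  $index = substr_replace($index, "", $s, $e + strlen('</script>') - $s);
  $s = strpos($index, '<script type="text/javascript"', $s);
}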

//Done
echo $index;
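
The two steps n.php performs on the template (drop the content in, then cut out the script elements by plain string searching) can be sketched in Python as below. This is an illustration under assumptions, not a port of the real file: the function names are hypothetical, and only the marker and tag strings come from the code above.

```python
def strip_scripts(html,
                  open_tag='<script type="text/javascript"',
                  close_tag="</script>"):
    """Remove every element that starts with open_tag, through its close_tag."""
    start = html.find(open_tag)
    while start != -1:
        # Look for the closing tag only after the opening one.
        end = html.find(close_tag, start)
        if end == -1:
            break  # unterminated script element: leave the rest alone
        html = html[:start] + html[end + len(close_tag):]
        start = html.find(open_tag, start)
    return html

def render(template, content, marker="<!-- content here -->"):
    """Insert content at the marker, then strip the script elements."""
    return strip_scripts(template.replace(marker, content))

template = ('<html><script type="text/javascript" src="site.js"></script>'
            '<body><!-- content here --></body></html>')
print(render(template, "<p>hi</p>"))
```

Matching on the exact opening string is what makes the "funny spacing" trick possible: a tag written with different spacing is simply not found.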

This will do almost everything - except redirect the user to n.php if they don't have javascript enabled. I could turn the index page into a server-side script. But I won't - I still want the entire site run out of HTML - not PHP or another server-side language. We're keeping that minimal.

The only way to redirect a user without javascript in HTML is a META Refresh. That's not valid HTML - but I don't care. Usability and functionality must override a dogmatic adherence to an arbitrary standard. Life isn't fair, get used to it.

<noscript><meta http-equiv="refresh" content="0;URL=n.php" /></noscript>

(Oh, and make sure you strip that bit out in the server-side file, or n.php will keep refreshing to itself.)
Feel free to turn off javascript and test it out, or you can visit it directly. It won't (shouldn't) render with any javascript anyway. This has been changed - to view the javascript-disabled page you will need to turn javascript off.

The very last thing I did was consolidate my javascript and css files. Previously I had 2 CSS Files and 5 JS files - now it is 1 of each, for a total of 4 requests per page (index, content, JS, and CSS). The syntax highlighter still loads a javascript file for each brush set used on a page however, so for this page it's 3 more javascript files (xml, javascript, and php). Maybe in the future I'll write a quick PHP script to concat them on the fly.

revision 3
07 Apr 2009 16:29:00 EST

Today I realized that Google was indexing my non-javascript pages. And that anyone getting here via Google was going to get the non-javascript experience. Now while I think that experience works - it's not as nice, and more importantly, I don't test in it very much. So I added just a little bit of code to try javascript and redirect them if it works.

$content = "\t" . '<div style="text-align:center">You\'re browsing without Javascript! ' .
        'If you have no idea what that means, you should ask your technical friend about it.<br />' .
        'Otherwise - kudos.  The website <em>should</em> work.  If it doesn\'t, please ' .
        '<a href="n.php?page=contact" class="history">contact me</a> and let me know.</div>' . "\n" .
        '<script  type="text/javascript">location.href="http://ritter.vg/#' . $_GET['page'] . '";</script >' . "\n" . 
		//funny spacing to avoid the matching =/
        $content;

You'll notice I'm using a horribly hackish way to avoid the later script matching - extra spaces. Eh.

The last bit is that previously, it would remove all script elements, so you wouldn't get some annoying noscript bar telling you that it blocked script elements. Now you will get that bar. I figured people using NoScript are used to it, and the improvement for search engine visitors is worth it.

revision 4
19 Apr 2009 14:29:00 EST

I have been a hypocrite. I proclaim that I will not read anything unless it comes in RSS form but my own site does not provide an RSS feed. So, how do I implement an RSS feed when I have no database or anything of the sort? The answer is hackery!

Step 1 was to create the RSS feed and get all the news into it. I had to do a little research and send the result through some validators, but eventually I got it and it looked like this:

<?xml version="1.0"?>
<rss version="2.0">
<channel>

<title>ritter.vg</title>
<link>http://ritter.vg</link>
<description>Personal weblog and homepage of Tom Ritter.  A smash and grab approach to technology.</description>

        <item><guid>http://ritter.vg/#=15</guid><title>
                authentication poc
        </title>
        <pubDate>
                07 Apr 2009 16:29:00 EST
        </pubDate>
        <description><![CDATA[
                <p>I added a new <a href="http://ritter.vg/n.php?page=code_poc" class="history">Proof of Concept</a> - this one on an authentication idea I had for lost passwords.  Secret Questions suck, picking your own secret question sucks.  Filling out a form of 100 items really sucks.  Picking a few questions to answer out of 100 questions sucks because you have to read them all.  But if we organize the questions in a way that they're very easy to "skim" we can present the user or attacker with 100 choices of questions to answer.</p>

                <p>I also changed around some styles to try and make the site more readable.</p>
        ]]></description></item>
</channel></rss>

So now I had an RSS Feed ready to ingest at Google Reader, and it worked fine. But I now had the problem of having to get my news posts to show up in three locations: the RSS Feed, the javascript version, and the non-javascript version. According to the RSS Specs, I also had to provide a permalink for each individual post I made. The solution to this problem was a script that would parse the RSS feed and produce HTML. I could have parsed HTML to produce RSS, but that seemed more difficult.

I looked at my options, and while I could have downloaded and parsed the feed in javascript for javascript enabled clients, it would have grown linearly with time. And it didn't solve the problem for non-javascript clients. So I made a PHP script that had a few modes. I generally don't like using any server-side code on this site, but the two PHP scripts I've made so far would be pretty easy to port to any other language (I hope). The script could:

  1. By default display the 5 most recent news posts. (example no longer functional)
  2. Be in "plus mode" where it would show the c most recent news posts. This lets people go back in time and view old posts. (example no longer functional)
  3. Be put in "permalink mode" where you will view the nth post. (example no longer functional)

This script returns only the HTML, so it must be loaded into the page dynamically via AJAX. This code was replaced in index.html. You can see how you enter permalink mode (prefacing with a '=') and "plus mode" (prefacing with a '-'). I wanted the plus mode preface to be a plus sign, but that got url encoded and didn't play nicely. And if you're wondering why I used indexOf instead of simply [0], it's because IE doesn't let you index into strings.

function pageload(hash) {
   $("#main").html('<div style="text-align:center">Loading...</div>');
   if(hash.indexOf('=') == 0)
     $("#main").load("shownews.php?n" + hash);
   else if(hash.indexOf('-') == 0)
     $("#main").load("shownews.php?c=" + hash);
   else if(hash && hash != 'news')
     $("#main").load(hash + ".html");
   else
     $("#main").load("shownews.php");
}
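
The routing rules in pageload boil down to a small lookup from hash to URL. Here is a Python sketch of that mapping, purely for illustration: the function name is hypothetical, and where the original passes the leading '-' through and relies on shownews.php taking the absolute value, this version strips the prefix itself.

```python
def route_hash(fragment):
    """Map a URL hash (without the '#') to the resource pageload would fetch."""
    if fragment.startswith("="):
        # permalink mode: '=15' means the 15th post
        return "shownews.php?n=" + fragment[1:]
    if fragment.startswith("-"):
        # "plus mode": '-10' means the 10 most recent posts
        return "shownews.php?c=" + fragment[1:]
    if fragment and fragment != "news":
        # a plain content page such as 'contact'
        return fragment + ".html"
    # empty hash or 'news': the default news listing
    return "shownews.php"

for fragment in ("=15", "-10", "contact", "news", ""):
    print(repr(fragment), "->", route_hash(fragment))
```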

For the non-javascript page I rearranged the code and used this. I had significant problems capturing the output of the script. I tried all of the output buffering techniques, and none worked. I looked around and found a comment in this Stack Overflow question that suggested doing a curl request pretending to be a client. So I used the handy built-in PHP function to do that.

if($filename == "news" || $filename[0] == '-' || $filename[0] == '=')
{
  $app = "";
  if($filename[0] == '-') $app = "?c=" . substr($filename, 1);
  elseif($filename[0] == '=') $app = "?n=" . substr($filename, 1);

  $content = file_get_contents('http://ritter.vg/shownews.php' . $app);
}

So the last thing to do is show you the script that parses the RSS Feed and produces HTML to output to the browser.

<?php

//Load all the XML
$f = fopen("news.xml", 'r');
$all = "";
while($line = fread($f, 2048))
     $all .= $line;
fclose($f);

//Get count
$c = !empty($_GET['c']) ? abs(intval($_GET['c'])) : 5;
$n = !empty($_GET['n']) ? abs(intval($_GET['n'])) : 0;


$index=$firstindex=0;
//What mode are we?  Specific item, or the c most recent
if($n > 0)
{//Specific Item - find that specific item
  $firstindex = stripos($all, '<item><guid>http://ritter.vg/#=' . $n . '<');
  $index = stripos($all, '<item><guid>http://ritter.vg/#=' . ($n-1) . '<');
  $all = substr($all, $firstindex, $index - $firstindex);
}
else
{//c most recent
  //Find the index of the 1st and nth item.
  //Isn't this so much more efficient than parsing the XML as XML?
  for($i=0;$i<=$c;$i++)
    {
      $previndex = $index;
      $index = stripos($all, '<item>', $index) + 1;
      if($index < $previndex) { $index = $previndex; break; }//stripos was wrapping around, annoyingly so kick out
      $firstindex = ($i == 0) ? $index : $firstindex;
    }
  $all = substr($all, $firstindex - 1, $index - $firstindex - 1);
}

//Replace entities with HTML
$search = array(
                '<item>'
                , '</item>'
                , '<title>'
                , '</title>'
                , '<description><![CDATA['
                , ']]></description>'
                , '<pubDate>'
                , '</pubDate>'
                , '<guid>http://ritter.vg/#='
                , '</guid>'
                );
$replace = array(
                 ''
                 , ''
                 , '<div class="title">'
                 , '</div>'
                 , '<div class="section">'
                 , '</div>'
                 , '<div class="tagline">'
                 , '</div>'
                 , '<!-- "'
                 , '" -->'
                 );

$all = str_replace($search, $replace, $all);

//Done
echo $all;
if($n <= 0)//Add a view more depending on mode
     echo '<div class="showmore"><a href="n.php?page=-' . ($c + 5) . '" class="history">5 More...</a></div>';
?>
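
For comparison with the string slicing above, here is a hedged sketch of the same three modes (default, "plus", permalink) that parses the feed as XML. It is written in Python with the standard library, not the PHP the site actually ran; the function names and the two-item sample feed are made up for the example, and only the element names, guid format and CSS classes come from the code above.

```python
import xml.etree.ElementTree as ET
from html import escape

SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
<title>ritter.vg</title>
<item><guid>http://ritter.vg/#=2</guid><title>second post</title>
<pubDate>07 Apr 2009 16:29:00 EST</pubDate>
<description><![CDATA[<p>Newer entry.</p>]]></description></item>
<item><guid>http://ritter.vg/#=1</guid><title>first post</title>
<pubDate>22 Mar 2009 12:42:00 EST</pubDate>
<description><![CDATA[<p>Older entry.</p>]]></description></item>
</channel></rss>"""

def item_to_html(item):
    """Render one <item> with the same CSS classes the PHP script emits."""
    title = escape(item.findtext("title", "").strip())
    date = escape(item.findtext("pubDate", "").strip())
    body = item.findtext("description", "").strip()  # already HTML (CDATA)
    return ('<div class="title">%s</div>\n'
            '<div class="tagline">%s</div>\n'
            '<div class="section">%s</div>' % (title, date, body))

def render_news(feed_xml, count=5, number=0):
    """number > 0 selects one post by guid; otherwise show the first count."""
    items = list(ET.fromstring(feed_xml).iter("item"))
    if number > 0:
        wanted = "http://ritter.vg/#=%d" % number
        items = [i for i in items if i.findtext("guid", "").strip() == wanted]
    else:
        items = items[:count]
    return "\n".join(item_to_html(i) for i in items)

print(render_news(SAMPLE_FEED, count=1))
```

The sketch assumes, as the feed above does, that the newest item comes first in the file.
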
revision 5
17 Jan 2010 20:43:00 EST

This is getting pretty ridiculous, right? I'm putting more effort into writing my site than actually accomplishing anything. Anyway, in the current incarnation, the site isn't search indexable, and it doesn't allow comments. I looked at moving to Movable Type, but I got so many javascript errors in the admin panel it was completely unusable, so after a lot of agonizing, I decided to (again) stay away from CMS's and just roll it myself.

First I rewrote n.php to combine.php which writes out the content page you request inserted into the template.

<?php
//common.php
class blogpost
{
  public $slug;
  public $title;
  public $pubdate;
  public $content;

  function __construct($slug)
  {
    $this->slug = $slug;
    
    $file_contents = get_file_contents("blog/" . $slug);

    $firstnewline = stripos($file_contents, "\n");
    $this->title = substr($file_contents, 0, $firstnewline);

    $secondnewline = stripos($file_contents, "\n", $firstnewline+1);
    $this->pubdate = substr($file_contents, $firstnewline + 1, $secondnewline - $firstnewline - 1);

    $this->content = substr($file_contents, $secondnewline);
  }

  public function toHTML()
  {
    return '<div class="title"><a href="/blog-' . $this->slug . '">' . $this->title . "</a></div>\n" .
      '<div class="tagline">' . $this->pubdate . "</div>\n" .
      $this->content;
  }
}

class slug
{
  public $title;
  public $filename;

  function __construct($line)
  {
    $line = trim($line);
    $this->title = trim(substr($line, 0, stripos($line, ":")));
    $this->filename = trim(substr($line, stripos($line, ':')+1));
  }
}

function make_blog_entry($slug)
{
  $entry = new blogpost($slug);
  return $entry;

}

function get_file_contents($slug)
{
  $content = "";
  $f = fopen($slug, 'r');
  while ($line = fread($f, 2048))
    $content .= $line;
  fclose($f);
  return $content;
}

function retrieve_slug_array()
{
  $index = get_file_contents('blog/directory.txt');
  //Collect into a new array: unset() inside a for loop shrinks count() and skips trailing lines
  $slugs = array();
  foreach(explode("\n", $index) as $line)
    if(trim($line) != "")
      $slugs[] = new slug($line);
  return $slugs;
}
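
The file formats common.php relies on are simple enough to show in a few lines. The Python sketch below is an illustration under assumptions, not part of the site: I'm inferring from the blogpost and slug classes that a post file is "title, newline, date, newline, content" and that each line of blog/directory.txt is "title: filename". The function names and sample strings are hypothetical.

```python
def parse_post(text):
    """Split a post file into title (line 1), pubdate (line 2) and content.

    Raises ValueError if the file has fewer than two newlines.
    """
    title, pubdate, content = text.split("\n", 2)
    return {"title": title.strip(), "pubdate": pubdate.strip(), "content": content}

def parse_directory(text):
    """Parse 'title: filename' lines, skipping blank ones."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, like the slug class does.
        title, _, filename = line.partition(":")
        entries.append((title.strip(), filename.strip()))
    return entries

post = parse_post("hello\n17 Jan 2010 20:43:00 EST\n<p>body</p>\n<p>more</p>")
print(post["title"], "|", post["pubdate"])
print(parse_directory("First post: first.html\n\nSecond: second.html\n"))
```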

<?php
//combine.php
require("common.php");

$index = get_file_contents("index.html");
$notfoundmessage = '<br /><br /><div style="text-align:center">File Not Found.<br /><br /><img src="resources/images/sad.jpg" /></div>';
$content = "";

$filename = !empty($_GET['page']) ? trim(strval($_GET['page'])) : "index";
$filename = str_replace("/", "", $filename);
$filename = str_replace("\\", "", $filename);
$filename = str_replace(".", "", $filename);
if($filename == "news") $filename = "index";
$filename = strtolower($filename);

if($filename == "index")
  {
    $posts = retrieve_slug_array();
    for($i=0; $i<5 && $i<count($posts); $i++)
      $content .= make_blog_entry($posts[$i]->filename)->toHTML() . "\n\n";
  }
else if($filename == "archive")
  {
    $posts = retrieve_slug_array();
    $content .= '<div class="title">post archive</div>' . "\n";
    $content .= "\t<ul>\n";
    for($i=0; $i<count($posts); $i++)
      $content .= "\t\t" . '<li><a href="/blog-' . $posts[$i]->filename . '">' . $posts[$i]->title . '</a></li>' . "\n";
    $content .= "\t</ul>\n";
  }
else if(substr($filename, 0, 5) === "blog-")
  {
    $filename = substr($filename, 5); //drop only the leading "blog-", not later occurrences in the slug
    if(file_exists("blog/" . $filename . ".html"))
      $content = make_blog_entry($filename . ".html")->toHTML();
    else
      $content = $notfoundmessage;
  }
else if(file_exists("content/" . $filename.'.html'))
  $content = get_file_contents("content/" . $filename.'.html');
else 
  $content = $notfoundmessage;


//Replace Content
$index = str_replace("<!-- ¤ -->", $content, $index);

//Done
echo $index;

After that, I used Apache's URL rewriting to make URLs look much simpler:

RewriteEngine on
RewriteRule ^/$ combine.php [L]
RewriteRule ^([^/.]+)\.html$ combine.php?page=$1 [L]
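
To see what the page rule matches, here is the same pattern exercised with Python's re module. This is only a sketch: it assumes the dot before "html" is meant literally (so it is escaped here), that the path arrives without a leading slash as it does in a per-directory context, and that an empty path or "/" stands for the site root. The function name is hypothetical.

```python
import re

# One or more characters that are neither '/' nor '.', followed by '.html'.
PAGE_RULE = re.compile(r"^([^/.]+)\.html$")

def rewrite(path):
    """Return the internal target for a request path, or None if no rule applies."""
    if path in ("", "/"):
        return "combine.php"
    match = PAGE_RULE.match(path)
    if match:
        return "combine.php?page=" + match.group(1)
    return None

for path in ("", "contact.html", "blog-foo.html", "resources/site.css", "a.b.html"):
    print(repr(path), "->", rewrite(path))
```

Excluding '/' and '.' from the captured name means nested paths and multi-dot names fall through to the real files on disk.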

The last part was manually rewriting all the links (there weren't many), and splitting the blog posts out of news.xml into individual files (I'll never get the hang of emacs). You'll notice there still aren't comments - you're right. I hope to add them soon, I really want them, but nothing fits the model of how I want comments to behave, so I probably need to roll my own system, again.

revision 6
21 Jan 2010 19:43:00 EST

I've finally done it. I've finally put comments into the blog. It was more difficult than it seemed, because I have a strict view on how comments should work:

I think I've managed to accomplish all of those, with only one major trade-off. Comments are loaded via AJAX after the page is loaded - this keeps my philosophy of avoiding databases wherever possible, degrading gracefully, and staying as fast as possible. The trade-off is that comments are not search-engine indexable. I think that's fine for now, although I'm going to put some thought into it.

I use markdown to style your comments, so you can put code, bold, italics, links and make your comments look pretty. The actual implementation is mostly mod_python. Here's most of the code:

import MySQLdb,markdown
from markdownDisableImages import DisableImagesExtension
from mod_python import apache,util

class comment:
    def __init__(self, tup):
        null,self.submitted,null,self.author,self.email,self.website,comment,null = tup
        self.author, self.email, self.website = tuple([i.replace("\"", "\\\"") for i in \
                                                [self.author, self.email, self.website]])
        md = markdown.Markdown(safe_mode="escape", extensions=[DisableImagesExtension()])
        self.htmlcomment = md.convert(comment)
        self.htmlcomment = self.htmlcomment.replace("\"", "\\\"")
        self.htmlcomment = self.htmlcomment.replace("\n", "\\n")
        self.htmlcomment = self.htmlcomment.replace("\r", "\\r")
    def toJSON(self):
        import hashlib
        from cgi import escape
        return """{
            "author" : "%s",
            "email" : "%s",
            "website" : "%s",
            "comment" : "%s",
            "submitted" : "%s"
            }""" % \
                (escape(self.author), hashlib.md5(self.email).hexdigest(), self.website, self.htmlcomment, self.submitted)
class commentCollection(list):
    def toJSON(self):
        return """{ "comments" : [""" + \
            ','.join([i.toJSON() for i in self]) \
            + "] }"

def getDB():
    return MySQLdb.connect(...)
def isCaptchaValid(challenge, response):
    pass #omitted for brevity

def handler(req):
    req.form = util.FieldStorage(req, keep_blank_values=1)
    if req.path_info[-3:] == "get":
        return getComments(req)
    elif req.path_info[-6:] == "submit":
        return addComment(req)
    else:
        req.status = apache.HTTP_NOT_FOUND
        return apache.OK

def addComment(req):
    from urlparse import urlparse,urlunparse
    from urllib import quote

    postid = req.form['postid'].strip() if 'postid' in req.form else ''
    author = req.form['name'].strip() if 'name' in req.form else ''
    email = req.form['email'].strip() if 'email' in req.form else ''
    website = req.form['website'].strip() if 'website' in req.form else ''
    comment = req.form['comment'].strip() if 'comment' in req.form else ''
    challenge = req.form['challenge'].strip() if 'challenge' in req.form else ''
    response = req.form['response'].strip() if 'response' in req.form else ''
    ip = req.connection.remote_ip

    if len(ip) == 0 or len(postid) == 0 or len(author) == 0 or len(email) == 0 or len(comment) == 0 or len(challenge) == 0 or len(response) == 0:
        return addError(req, "Bad Request")

    if len(website) > 0:
        urlinfo = urlparse(website)
        if urlinfo.scheme != "http" and urlinfo.scheme != "https":
            return addError(req, "Bad Website")
        website = quote(website, safe="%/:=&?~#+!$,;'@()*[]")

    if not isCaptchaValid(challenge, response):
        return addError(req, "Incorrect Captcha")

    db = getDB()
    cursor = db.cursor()
    cursor.execute("INSERT INTO comments(postid, name, email, website, comment, ip) VALUES (%s, %s, %s, %s, %s, INET_ATON(%s))", \
                    (postid, author, email, website, comment, ip))
    return getComments(req)

def addError(req, msg):
    req.content_type = 'application/json'
    req.send_http_header()
    req.write("{ \"error\" : \"" + msg + "\"}")
    return apache.OK

def getComments(req):
    postid = req.form['postid'] if 'postid' in req.form else ''
    db = getDB()
    cursor = db.cursor()
    cursor.execute("SELECT * FROM comments WHERE postid = %s ORDER BY submitted ASC", (postid,))
    results = cursor.fetchall()
    comments = commentCollection()
    for row in results:
        comments.append(comment(row))
    req.content_type = 'application/json'
    req.send_http_header()
    req.write(comments.toJSON())
    return apache.OK

### DisableImagesExtension
import markdown
from markdown import etree

class DisableImagesExtension(markdown.Extension):
    def extendMarkdown(self, md, md_globals):
        md.treeprocessors.add('disableImages', DisableImages(md), '_end')

class DisableImages(markdown.treeprocessors.Treeprocessor):
    def descendRemove(self, element):
        for i in list(element): #iterate over a copy; removing while iterating skips siblings
            if i.tag == 'img':
                element.remove(i)
            else:
                self.descendRemove(i)
    def run(self, root):
        self.descendRemove(root)
        return root

You'll notice I'm serializing to JSON myself - I don't like dependencies. I'm also using markdown.py's extension architecture to strip out any images from comments.
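
When serializing JSON strings by hand, the order of the replacements matters: backslashes have to be escaped before the quotes and newlines, or the backslashes added by the later steps get doubled. The sketch below is my own illustration, not the code the site runs; the helper name is hypothetical, and the standard library's json module is used only to check that the hand-escaped string round-trips.

```python
import json

def escape_json_string(value):
    """Escape a string for use between double quotes in JSON.

    Covers only backslash, double quote, newline and carriage return;
    other control characters (tabs, for instance) are not handled.
    """
    return (value.replace("\\", "\\\\")  # must come first
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))

tricky = 'say "hi" \\ then\nnewline\r'
encoded = '{ "comment" : "%s" }' % escape_json_string(tricky)

# A real JSON parser should give back exactly the original text.
print(json.loads(encoded)["comment"] == tricky)
```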

Now one thing you might be thinking is "There's no way this is secure. It's just a hodge-podge of code, messy, no structure. There's a hole here." Well, you're right, it is a hodge-podge of code, it is messy, there is little structure. But there aren't any holes. I don't think. But I'm willing to put my money where my mouth is. Exploit my comment system and I'll pay you $20. Now what qualifies as an exploit is persistent javascript, html, or css injection (so it affects other people), or sql injection. If you manage to get it to produce a javascript error, but not an exploit, I'll consider that worth a dessert or a snack. And of course you'll get bragging rights and I'll credit you here.

revision 7
22 Oct 2012 20:42 EST

In anticipation of upgrading to Apache 2.4, I upgraded the comments from mod_python to mod_wsgi. It wasn't terribly difficult, as the comments code was basically a Hello World app anyway. The $20 challenge still stands. I also removed the CAPTCHA, since I don't think there's a need for it. I can put it back if necessary. I also turned off Content Security Policy until Chrome and FF get their shit together. I was hitting some bugs in their implementations.
