A collection of thoughts related to the challenges of software engineering

September 27th, 2010

I thought I took enough care of myself to further postpone this day...

I have a balanced diet. I exercise at least four hours per week. I don’t smoke. I certainly do not fry my brain with illicit substances. I sleep regularly, well and enough. Well except when I have to wake up at three in the morning to catch a plane back to Paris and lose my bag in the taxi. Ah! The stress! That reminds me I’ll have to buy myself a G550 when Bureau 14 rules the world.

For the record, Zeus tried to strike me with lightning several times out of jealousy. I had to climb Olympus for a private “exhibit” of my Wing Tsun. No wonder I don’t have time for programming with all these delays. Also for the record, the game God of War 3 is based on my story.

Well, guess what? Despite all of this, I’m obviously a brain-dead old man as I reach the extremely advanced age of 33.

What the hell am I talking about? Google Instant! I mean what the Hell?! I feel like I’m using software for hyperactive kids drinking liters of a homemade mix of cocaine and caffeine.

First of all I don’t think the spider is crawling the web according to my search requests, so don’t call it real time search thank you very much. In the finance industry we call the stock streams real-time although they have at least 15 minutes of delay, but we have the right to because we’re bankers and bankers have no soul.

Second, Google Instant is extremely confusing. I'm not a designer, but I’m pretty sure it’s a big design mistake to display partial – and potentially unwanted – results in the same place as normal results.

Last but not least, I don’t understand how this is supposed to “save time”. All it does is request searches for items I’m not interested in. Despite me being an old crank, I’m quite able to press the enter key when needed! It’s a waste of bandwidth and CPU time if anything. Visual feedback gives you the illusion of speed, like a progress bar reduces the impression of waiting, but that’s all it is: an illusion.

The heart of the problem is that it’s not just an illusion, it’s also a distraction. It’s not real-time search, it’s real-time advertisement. It feels like watching American TV: there are so many commercials that after a while you forget what you were watching in the first place.

It's pretty clear to me that the purpose of this feature is not to ease my searches; it’s to tease me with links. The purpose of Google Instant is to increase Google’s revenues.

To that I say: fair enough. Google isn’t a philanthropic institution and, as an entrepreneur, I fully understand that they do everything they can to maximize profits. As a user, however, I’m dissatisfied.

The feature of Google I like the most is accurate and fast search. As long as they keep improving it, that keeps me happy. Although to make me really happy, I’d love them to stop that data-based design nonsense and hire top-notch designers to make something of their websites. I'll be 100% honest, however, and admit that it would only be icing on the cake. (That being said, I think there is a market for a web application that does a better job of presenting Google’s results than Google.)

In the end, I really don’t understand Google Instant. What am I missing? Am I just too old to get it? Or is Google Instant just too nerdy?

September 13th, 2010

When was the last time you laughed at work? Yesterday? Last week? Last month? You don’t even remember?

Do not make the mistake of thinking that a place where you regularly laugh is a place where nothing gets done. Quite the contrary.

Serious work requires – amongst other things – concentration, cohesion and fluent communication. Laughter is an outstanding catalyst for all these things.

If you want a spark of creativity to light up a team of competent and intelligent people, mirth is what you need. The thrill and energy that fill you when you share a good laugh are an incredible creativity booster.

Amusement is very good for bonding as well. Is there any need to demonstrate that you like working with people with whom you have fun?

Finally, humor encourages direct communication. You’ll feel more comfortable discussing mistakes and errors without unnecessary ado when the mood is joyful.

If we were pirates on a ship we would sing merry songs to achieve this result, but geeks in an office prefer to crack dark jokes.

When I interview people, I do my best to pin down their ability to get along with the rest of the team and, to be honest with you, I test their wit.

Using humor during interviews has the incredible side effect of detecting flexibility and (some sort of) cleverness. For example, asking a candidate whether he tends to stab colleagues with whom he disagrees opens the door to a very unusual but effective test of personality. You might even add that a past as a serial killer is generally a deterrent to recruitment.

We’re looking for competent, bright and insightful people that will resist pressure and are fun to be around. Of course we’re not hiring comedians as we want – first of all – people to be able to write outstanding software.

When the team is under pressure because of an angry customer, a flurry of bugs has magically appeared in the repository (really, it can’t be me!) and everybody is working too much, being around people with whom you can blow off steam is a much better perk than free gourmet food or complimentary massages.

To be completely honest with you, there’s also the thing related to the fact that we can’t afford the free gourmet food.

When I order software online, I'm always surprised to see my confirmation email and my registration code come without an invoice.

Invoices are generally useless for individuals, but they are a necessity for companies.

They are required because your local doom kommando fiscal administration will ask for them as proof that you did indeed spend the money. And before they ask for them, your accountant will, because if you want him to put his John Hancock on your annual report, you'd better make sure it's pretty clean of vague, non-business-related entries such as "DB9 leasing" and "Casino losses".

The invoice must have the name of the company, its registration number, an address (including the country), the name of the customer, the name of the customer's company (if any), the list of items with a clear and concise description for each, a price and finally the taxes and a grand total. Don't scratch your head over the descriptions: as long as the word "software" is in there, the accountant will understand what it's all about. Export the invoice in a printable format (PDF is a de facto standard) and attach it to all your confirmation emails.

What I just said might sound extremely silly, but of the ten software licenses I bought, only three came with a valid invoice. Generally speaking, I'm more inclined to do business with companies that make my life easier.

Make sure you have all the details of your business process straight.

By the way, Hi! How are you doing? Long time no see! Here, we're going great and are working very hard on our first product!

Our website isn't critical, in the sense that, if the website goes down, our operations remain unaffected. However, lately, the website was becoming extremely slow and it was turning into bad publicity.

Over time I grew tired of this slowness until, one day, I decided to address the issue once and for all. I bought a dedicated server, installed FreeBSD, nginx and everything needed to have our corporate website and the blog running, and decommissioned the old server.

Unfortunately, you may have noticed that this last week, the quality of service was pretty low. This is entirely my fault as I switched to the new server too quickly.

Nevertheless, we're now pretty satisfied with the responsiveness and resource consumption of the new production environment.

This post is a recollection of the steps I went through last week.

Gearing up

We generally favor a BSD setup for servers. We've had very good experience with Windows Server 2008 and Ubuntu Server Edition but whenever we can, we go BSD because that's what l33t d00dz do.

Ideally we would have opted for OpenBSD, but OpenBSD dedicated hosting is extremely hard to come by, and we wanted to have provider support for the OS. We didn't want to get a random Linux box and bootstrap an OpenBSD installation on it. If we ever had a problem with the box, support would have answered "please use a supported operating system".

FreeBSD remains an outstanding server operating system, and we happily leased a FreeBSD box with much more disk space and computing power than we need.

When considering which webserver to use, this is a no-brainer: if you want speed, you choose nginx.

As we use WordPress for the blog, it means we'll also have to install MySQL.

FreeBSD

FreeBSD isn't OpenBSD when it comes to security, but the default security configuration is pretty decent if you opt for a minimal install.

The first good reflex is to set up PF to only allow incoming connections to the SSH server and the web server. It can also be used to normalize the incoming packets and protect against spoofing.

In case you were wondering what PF is, well, put simply, PF is the best IP filter available. Its strength comes from its clear and unambiguous grammar and its linear complexity. What do I mean by linear complexity? If you have a simple setup, your configuration file will be small and simple. If you have more advanced needs, the grammar enables you to go very deep into the details. It's one of those products you try once and never look back.

Without any further ado, here is a minimal PF configuration file with normalization, antispoofing and only ssh, http and https incoming traffic allowed:

# external interface (assumption: replace em0 with your own)
ext_if = "em0"
# skip the loopback
set skip on lo
# packet normalization
scrub in
# block everything in
block in
pass out keep state
# antispoofing
antispoof quick for { lo $ext_if }
# allow ssh and web in
pass in on $ext_if proto tcp to ($ext_if) port \
    { ssh, http, https } flags S/SA modulate state

A good security habit - which is not limited to FreeBSD - is to set up a non-root account, disable ssh root login (an option in /etc/ssh/sshd_config) and use sudo and su when privileges are needed (your additional account needs to be in the wheel group for this to work).
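
The root login setting is easy to double-check from a script. The following is only a minimal sketch: the helper name root_login_disabled is made up for the example, it assumes the usual PermitRootLogin keyword, it only looks at the first occurrence of that keyword (the one sshd uses) and it ignores Match blocks.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the given sshd_config explicitly
# sets "PermitRootLogin no". Commented-out lines are skipped because their
# first field starts with '#'. Match blocks are not taken into account.
root_login_disabled() {
    awk 'tolower($1) == "permitrootlogin" { print tolower($2); exit }' "$1" |
        grep -qx 'no'
}

# Example: check the live configuration, if there is one.
if [ -r /etc/ssh/sshd_config ]; then
    if root_login_disabled /etc/ssh/sshd_config; then
        echo "root login over SSH is disabled"
    else
        echo "root login over SSH is NOT explicitly disabled"
    fi
fi
```

An "explicitly disabled" answer is the conservative one: if the keyword is missing, the result depends on the defaults your sshd was built with.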

SSH login should be done only via a key, and the passwords for all the accounts need to be extremely strong (more than 100 bits of entropy). Using SSH keys for login makes it possible to use extremely strong passwords.
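
To give an idea of what 100 bits of entropy means in practice, here is a rough sketch of the arithmetic. It assumes every character of the password is drawn uniformly and independently from a fixed alphabet, which is only true for randomly generated passwords; the function name min_password_length is invented for the example.

```shell
#!/bin/sh
# Hypothetical helper: smallest password length that reaches a target
# entropy, assuming each character is drawn uniformly and independently
# from an alphabet of the given size.
#   entropy in bits = length * log2(alphabet size)
min_password_length() {
    awk -v bits="$1" -v symbols="$2" 'BEGIN {
        exact = bits / (log(symbols) / log(2))
        len = int(exact)
        if (len < exact) len++      # round up
        print len
    }'
}

min_password_length 100 62   # letters and digits: prints 17
```

Passwords of that length are unpleasant to type, which is one more argument for key-based logins.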

It is possible to set up the SSH server to listen on a port other than 22, but it might be incompatible with your local firewall policy. If you do that, make sure that you have a way to access your box should something go wrong.

A good sanity check is that when you run the "netstat -an" command, you should only see the SSH server and the web server listening to the outside. All other servers must be bound to localhost (or better, a Unix socket). I know we set up a firewall to protect against that kind of attack, but good security is achieved through redundancy.
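
The netstat check can be turned into a small filter. This is a sketch under assumptions: it expects the local address in the fourth column and the word LISTEN in the last one, which is how "netstat -an" prints TCP sockets on the FreeBSD and Linux systems I have in mind, and the name exposed_listeners is made up.

```shell
#!/bin/sh
# Hypothetical helper: reads `netstat -an` output on stdin and prints the
# local address of every TCP socket that listens on something other than
# the loopback. Column positions are an assumption about netstat's output.
exposed_listeners() {
    awk '$NF == "LISTEN" && $4 !~ /^(127\.|::1[.:]|localhost[.:])/ { print $4 }'
}

# Usage on the server:
#   netstat -an | exposed_listeners
```

Anything printed beyond the ports you meant to open deserves a second look.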

Recompiling the kernel and getting rid of everything you don't use is a nice final touch. Why have IPv6 when your server is IPv4 only? If your machine got rooted via IPv6, would you be able to notice it easily?

Recompiling the kernel on BSD requires creating a configuration file, processing it with config(8) and then running make. It's a bit less user-friendly than on Linux (as far as kernel compilation friendliness goes).

nginx

nginx is an excellent webserver, with a very low memory footprint and very high capability. I've spent a great deal of time inside the bowels of this software and although I regret the choice of old-school C (over template-intensive C++), I really think it's a nice piece of software. It's clean, rigorous and consistent. Our industry needs more software of this quality.

nginx has an extremely good security history and is delivered secure by default. That doesn't mean the default configuration cannot be improved.

The most obvious improvements are to limit the number of connections per client, set up more aggressive timeouts and limit the size of the buffers a client may fill. This offers some minimal protection against DoS. Tread carefully, however, as these settings can have a bad impact on performance.

You can restrict things further by only allowing GET, HEAD and POST commands, denying all those weird HTTP requests you never use.
Last but not least, you will want to hide the nginx version you are using: there's no need to make exploit lookup easier, is there?

In the http section you will therefore want to add the following:

# hide nginx version
server_tokens off;
# limit connections (nginx older than 1.1.8 used limit_zone instead)
limit_conn_zone $binary_remote_addr zone=slimits:5m;
limit_conn slimits 20;
# set up restricting client buffers
client_header_buffer_size 4k;
client_max_body_size 5M;
large_client_header_buffers 2 8k;
# set up aggressive timeouts
client_body_timeout   10;
client_header_timeout 10;
keepalive_timeout     5 5;
send_timeout          10;

and in the server section:

# only allow GET, HEAD and POST
if ($request_method !~ ^(GET|HEAD|POST)$ )
{
    return 444;
}

Using a dedicated partition for the content you serve is a possibility. It enables you to prohibit features such as setuid bits and even executable permissions at the partition level. It also keeps uploads from filling up the rest of the disk, should your users be able to upload content.

The drawback of using a dedicated partition is that it makes space (re)allocation more difficult.
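
If you go the dedicated partition route, it is worth verifying that the restrictions are really in effect. Here is a minimal sketch, assuming the partition is mounted with the nosuid and noexec options and that "mount" prints lines of the form "device on mountpoint (options)"; the helper name and the /usr/local/www path are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical helper: reads `mount` output on stdin and succeeds if the
# given mount point is mounted with both nosuid and noexec.
mounted_nosuid_noexec() {
    awk -v mp="$1" '
        $2 == "on" && $3 == mp && /nosuid/ && /noexec/ { ok = 1 }
        END { exit !ok }
    '
}

# Usage on the server (the path is an assumption):
#   mount | mounted_nosuid_noexec /usr/local/www && echo "hardened"
```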

MySQL

If you ask me, I'll tell you that relational databases are overused. But we didn't program WordPress, and WordPress only works with MySQL, so we need to install it on the box.

Set up a strong password for the MySQL root account and add another account that only has access to the databases your web server will use. Never use the root account to access your tables from a web server: this is a major security risk.

If for any reason you lose control of your web server, or someone manages to extract the login credentials, they will only be able to access your web server's database. With the root account, the attacker would - for example - be able to create accounts and even lock you out of your databases!
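
As an illustration of the dedicated account, here is a small sketch that prints the SQL statements for a database and a user restricted to it. The function name wp_db_sql and the database and user names are placeholders, and the values are pasted into the statements without any escaping, so only use simple names and a password without quotes.

```shell
#!/bin/sh
# Hypothetical helper: prints the SQL that creates a database and a user
# who can only touch that database. Arguments are NOT escaped.
wp_db_sql() {
    db=$1
    user=$2
    pass=$3
    printf "CREATE DATABASE %s;\n" "$db"
    printf "CREATE USER '%s'@'localhost' IDENTIFIED BY '%s';\n" "$user" "$pass"
    printf "GRANT ALL PRIVILEGES ON %s.* TO '%s'@'localhost';\n" "$db" "$user"
}

# Usage on the server (names are placeholders):
#   wp_db_sql wordpress wp_user 'a-long-random-password' | mysql -u root -p
```

Printing the statements first, instead of running them straight away, leaves you a chance to read them before they reach the server.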

Additionally, you need to make sure that MySQL only listens on a Unix socket (if the MySQL server sits on the same machine as nginx); this is done with the option "--skip-networking".

I strongly recommend against installing phpMyAdmin or any equivalent software on production machines.

PHP

On FreeBSD, PHP comes by default with a decent security configuration, including the Suhosin patch, which is much better than safe mode.

Nevertheless, it's good to disable all the extensions you're not going to use. Not only will this increase stability and decrease memory usage, it will also reduce the attack surface.

Install PHP 5 from the ports, where you will be able to activate FastCGI. This is important as nginx doesn't have native PHP support à la Apache.

Configuring the PHP FastCGI server

The PHP FastCGI server exits after serving a certain number of requests. Since PHP isn't a very stable parser, preventing it from running for a long time increases resilience. The drawback is that something has to restart the server every time it exits.

We will use daemontools to keep the FastCGI server up, but any process monitor will do.

It's advised to use spawn-fcgi to run your PHP FastCGI server, as it will enable you to set finer privileges and it works very well with daemontools.

Here is a run script example for supervise:

#!/bin/sh
export PHP_FCGI_CHILDREN=3
export PHP_FCGI_MAX_REQUESTS=250
exec /usr/local/bin/spawn-fcgi -n -s /tmp/php_fcgi.socket \
    -u www -U www -g www \
    -- /usr/local/bin/php-cgi

You can see that we bind the FastCGI server to a Unix socket instead of a TCP port.

Adding PHP support to nginx

nginx needs to be configured to hand PHP parsing over to the FastCGI server. You just need to add the following to your server section:

index index.php;

log_not_found off;

location ~ \.php$
{
    # return 404 for PHP files that do not exist
    try_files $uri =404;
    fastcgi_index   index.php;
    fastcgi_param   SCRIPT_FILENAME
        $document_root$fastcgi_script_name;
    include         fastcgi_params;
    # same unix socket as in the run script above
    fastcgi_pass    unix:/tmp/php_fcgi.socket;
}

While we are at it, we will add the required lines to make WordPress work nicely. We need to make sure that WordPress' crafted URLs are not interpreted as 404 by the nginx server:

location /blog
{
    try_files $uri $uri/ @wp_blog;
}

location @wp_blog
{
    include         fastcgi_params;
    fastcgi_param   SCRIPT_FILENAME
        $document_root/blog/index.php;
    fastcgi_param   QUERY_STRING q=$uri&$args;
    fastcgi_pass    unix:/tmp/php_fcgi.socket;
    fastcgi_param   SCRIPT_NAME /blog/index.php;
}

Note that you need to replace "/blog" with the actual path to your blog, and that fastcgi_pass requires the hostname and port of your FastCGI server (127.0.0.1:9000 by default; in our case we use a Unix socket).

Now that your web server is set up, you might want to be able to monitor its activity. You can do this with nginx-rrd. I won't go too much into the details, just make sure you don't make your status page world readable.

Installing WordPress

Despite the recurring security issues of WordPress, we didn't want to switch to another engine. WordPress is full featured, very well supported and well, Matt already did an incredible layout for the blog...

Making WordPress work with nginx is anything but straightforward. The good news is that we've already done the hardest part.

Before installing WordPress, create a dedicated database and user with MySQL, as using root to access a shared database is a very bad idea (bis repetita placent).

Once you have your database credentials working, you can install WordPress by following the infamous 5-Minute install procedure.

WordPress only requires you to edit one configuration file: wp-config.php. You will also see that four entries, named AUTH_KEY, SECURE_AUTH_KEY, LOGGED_IN_KEY and NONCE_KEY, require a passphrase. Generate a cryptographically strong passphrase with the following command:

dd if=/dev/random bs=32 count=1 | sha256

The last line - the hexadecimal number - is your secure passphrase. Now your friends will look at you differently.
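
Since four passphrases are needed, the command can be wrapped in a loop. This is a sketch, not the procedure we followed: it reads /dev/urandom and hex-encodes the bytes with od instead of piping /dev/random through sha256, purely because od is available on more systems, and the function name wp_keys is made up.

```shell
#!/bin/sh
# Hypothetical helper: prints one define() line per WordPress key, each
# with 32 random bytes encoded as 64 hexadecimal digits.
wp_keys() {
    for key in AUTH_KEY SECURE_AUTH_KEY LOGGED_IN_KEY NONCE_KEY; do
        secret=$(head -c 32 /dev/urandom | od -An -tx1 | tr -dc '0-9a-f')
        printf "define('%s', '%s');\n" "$key" "$secret"
    done
}

wp_keys
```

The four lines it prints can be pasted over the matching entries in wp-config.php.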

For our setup we have only one mandatory plugin: the nginx compatibility one. Should you wish to increase performance, WP Super Cache is highly recommended.

Installing memcached?

If you run a high-traffic web site, you will need to install a cache server.

memcached is supported by WordPress through this plugin. I do not know how it cooperates with WP Super Cache, but I submit the two are mutually exclusive.

At Bureau 14, we don't use memcached for the very simple reason we've developed our own high-performance cache engine that runs faster and scales better. It also comes with more fur. Don't worry, we won't keep this software to ourselves for ever: we will run a closed beta this summer. Should you wish to know more, feel free to subscribe to our mailing list.

Using memcached (or any cache engine) for static files generally yields no performance improvement. The reason is that nginx already caches static files.

Closing words

Although this setup is not as straightforward as the classic Linux/Apache combo, I truly think you will find the trouble to be worth it.

We surely do.

Do you need help tuning and securing your servers? We can help.

April 15th, 2010

The server is going up and down as we tune it for maximum performance. We apologize for the poor service quality.