A collection of thoughts related to the challenges of software engineering

stay connected

April 11th, 2010

It's really a shame for a company that does its best to provide its customers with fast, reliable and clean software to have a very slow web server...

We had to fix that.

We've spent a whole week end configuring a dedicated server to host the official company's website as well as this blog.

Of course it runs FreeBSD and nginx!

March 27th, 2010

This morning was one of these mornings when you wake up with that kind of strange idea only sleep can bring you. It's a concept related to software optimization at the assembly level.

Before I explore this idea with you, I'd like to spend few paragraphs to talk about the vast and inspiring field of assembly optimization.

Optimizing code

Assembly optimization is such a huge topic that I won't even try to be exhaustive. This brief tour should help you understand where the idea comes from. Experts will forgive the generalities and approximations of this quick overview. Should you wish to know more, feel free to read the abundant and excellent documentation provided by Intel.

The natural approach

This first and most straightforward one consists in replacing a set of instructions by an equivalent, but faster, set of instructions.

An example of this is when you write

1
xor    eax, eax

instead of

1
mov    eax, 0

on the IA32.

These instructions both set the eax register to 0, except that the former is faster (This xor register, register instruction also has got other special powers I will tell you later about...)

Optimizing is in that case just a matter of search and replace, but unfortunately this technique will not lead you very far.

On our heavy-pipelined processors, you will additionally do everything you can to make the most of branch prediction.

Optimizing branches

Branch prediction is a set of heuristics built inside your CPU to avoid flushing the pipeline when branching. It determines the most likely branch when reaching a conditional instruction and preloads the guessed address. Unexplored branches are predicted using static heuristics, explored branches using statistics.

Making the most of branch predictions means disposing conditions in a way favorable for these heuristics, or even avoid branching in the first place through the use of conditional instructions. Unrolling loops is another way to tackle this as it replaces the loop by the equivalent repeated instructions. Unrolling has however to be done reasonably, as excessive use of this technique may result in a larger code that no longer fits in the trace cache.

Putting branches aside, arranging instructions also has got a great impact of performances, because of stalling.

Avoiding stalls

Nota bene : the following example oversimplifies the issue as it overlooks - amongst other things - parallelization on processors featuring multiple treatment units. In other words, rearranging the above instructions might not always yield a performance improvement, but it nevertheless shows the underlying concept.

1
2
3
4
mov    eax, 3
mov    ebx, 4
add    ebx, eax ; addition of 3 + 4
xor    ecx, ecx

The addition can only occur when the ebx register is ready, in other words, the addition will have to wait for the register to be fully set before doing anything. That's a stall. If you notice the instruction on ecx, you will see there is no dependence .

If you place the operation on ecx before the add, you have a potential performance gain because the instruction can execute right away without waiting for the previous one to finish. Sometimes dependencies are more subtle and are based on the flags. This is why - for example - it is generally recommended to avoid instructions such as inc and dec that only affect flags partially.

I earlier wrote that the xor register, register instruction has got some special powers. It can indeed be used to break the dependencies we just examined. When the processor encounters this instruction, it will discard all dependencies related to the register you reset, preventing stalls.

And so much more...

Memory layout is another major topic that is very hard to sum up in few words, but I need to at least name it. Just know that aligning instructions and data on cache boundaries can yield a significant performance improvement in optimizing loads and avoiding false sharing (very important for multithreaded code).

And then, you have also the matter of mixing floating point and single instructions multiple data (SIMD) instructions, partial register stalls, vectorization, stack optimization, store forwarding, locality enhancement...

When you realize the amount of techniques that can be used to optimize a code fragment, finding the ideal (or close to ideal) set of parameters for your processor is extremely difficult. This is where we go back to instrumentation, without instrumentation, you (or most likely the compiler) optimize a priori for your given architecture. With instrumentation, you make decisions based on hard evidence.

You need to instrument representative runs of your program to make it worthwhile, and sometimes, there isn't much the optimizer can do to improve the speed of your software.

This morning's idea

What about using natural selection to optimize a routine?

If you inject random optimizations in thousands of instances of a routine, benchmark them, discard the slower ones and keep working on the faster ones, will it eventually output the fastest possible implementation?

Obviously the following requirements would have to be met:

  • The routine needs to be testable for correctness to a reasonable extent as optimizations might result in modified behavior
  • It must be possible to generate a representative set of inputs in order to evaluate performance realistically
  • Enough computing power to mutate, spawn and test the routine as waiting one month for the result might not be acceptable...

The routine mutator would need to know a comprehensive set of optimization techniques for the given architecture, but it could also simply inject random opcodes and let the test phase discard non-working versions.

The approach can also be used to generate the smallest possible program, or any feature that can easily and objectively be measured.

When would that approach be reasonable? Probably for fundamental software bricks, such as an operating system scheduler, the pager or some driver's routine.

That and God of War.

March 16th, 2010

The BoostCon is a Boost centric conference, but it's actually much more than that. It's probably the best C++ conference in the world.

A bold statement?

Contrary to what the name of the conference might imply, it's more C++ centric than Boost centric. A lot of talks revolve around the boost libraries, but there's also content related to the language itself and the challenges we all face.

The advantage of the Boost tropism is that the topic stays on "real world issues". No endless debate about how to phrase a new feature in the C++ standard, no endless theoretical speech about a problem that only exists when you program in a zero gravity environment on a computer that might be invented in 2030 if LSD gets legalized.

I was at BoostCon 2009, not only the content was accurate, advanced and of high quality, but it was extremely relevant.

"Spot on" is actually the term that comes to my mind.

Would it be only for the keynote by Andrei Alexandrescu and the introduction to software transactional memory I would have been happy to do the trip.

You know these conferences where you have some days filled with "so-so" talks, and you go to that talk about optimizing memory usage on a system you've never heard of because all the other talks look horrid, and the guy just read his slides with a vocoder tone, and you start firing up your laptop and suddenly the conference room becomes a very expensive WiFi spot.

Well BoostCon is nothing like it.

What's more bound to happen is that you're going to regret you still don't master mitosis, because you'd really like to learn more about smart pointers and Spirit, and both talks happen at the same time, and they are both given by really great people.

You'll fly back with the head full of new ideas to tackle the issues you've left at home (I'm not talking about how to get the exoskeleton in S.T.A.L.K.E.R.: Зов Припяти, because it's really just a matter of harvesting artifacts like there's no tomorrow), rested and energized by the intense intellectual maelstrom.

The conference is located in Aspen, Colorado.

I'll nitpick and say although the site and the city in itself are really a great, great place to have a conference, it's a bit inconvenient if you live outside the USA. From my European point of view, American conferences located in Boston or New-York are be more convenient.

If you're serious about C++, fly your team to the BoostCon. It's an outstanding investment.

Registration is just one click away

January 16th, 2010

I hope you like the new theme! You can thank Matt Moore for this incredible design.

Of course, that doesn't mean we're not working hard on our top-secret super-duper project... We should be able to make an announcement before spring.

January 3rd, 2010

In battle, use direct forces to match the enemy and use indirect forces to win the enemy.
-Sun Tzu

A trick to protect yourself from burglars is to have in an obvious - but not too obvious - place a reasonable amount of cash.

An opportunity burglar - not the one after your Matisse - is looking for easy money. When the front door is locked, most of them give up. The most resilient may break a window and perhaps pick a simple lock, but don't expect anything fancy.

Once in, there are only two objectives:

  1. Steal something small and valuable (I'm not sure that carrying around your 46" television is of any interest to the average thief);
  2. Get the hell out.

Which means that leaving an envelope in a drawer near the entrance containing a plausible amount of money (let's say 200 €) is a form of insurance. The thief will think he found your stash, decide that 200 € for one night is pretty decent and consider objective #2. Your other valuables are protected.

This is how you want to design your copy protection.

Buy a cheap over-the-counter copy protection. Let the cracker beat it. Let him think he won. The real protection isn't there.

Add your custom integrity checks within the software (don't use the copy protection you bought and make sure the compiler will not factorize your checks). Make the software unusable after a reasonable amount of time. Example: randomly overwrite a memory region after 200 clicks.

How to write code integrity checks? Read a code segment, hash it and make sure it's what you expect. In C/C++ it's just a matter of using the function's address. Feel free to review our related consulting services.

Users will come to your forums, mail support and complain about the crashes. You got yourself a pretty decent opportunity to sell a legitimate license to pirates. Don't worry, if people like your software and the price is right, they'll buy it. People who don't aren't potential customers in the first place.

It is paramount to evaluate with accuracy and fairness how much piracy costs you. One pirated license does not equal one lost sale, it's an order of magnitude less.

When you spend time implementing and testing copy protection, you're not implementing new features or fixing bugs. Keep in mind you could be driving your motorbike along those wonderful curvy roads. Wait. That's not what you're doing. You're implementing some sort of copy protection mechanism because a student with too much free time is going to reverse engineer your software and write a key generator for it.

Remember that your goal is to sell software and be profitable. For each software there is an optimal amount of copy protection. Sometimes this amount is zero. Obvious case: free software, less obvious case: software that needs to gain notoriety.

Adding more copy protection than needed is a huge waste of time and money, notwithstanding the annoyance you may cause to your legitimate users. If you played games with overzealous copy protections, you know what I mean.

See piracy as competition. See it as the lowest segment of your price grid. You want to make sure it's hard to find a working illegitimate copy, hard enough to make buying a license interesting. It must be so much easier to buy the software from you than getting the illegal copy that your customers will quickly dismiss the latter.

You cannot prevent piracy, but you can prevent it from harming your business.