Last month was quite busy - I’ve started a new project called Cosette, which is a brand new chess engine written in C# for .NET Core platform. It’s not my first project of this kind (a few years ago I made Proxima b 2.0 (C#), together with even older Proxima b (C++)), so using the gained experience I can finally write a few words about performance tips and tricks, especially in C# language.
My very first engine, Proxima b, used array-based board representation - basically, there was a Board class which contained a two-dimensional char array field. If the specified cell contained a letter, it was a sign that we have some piece on this field. This is probably the simplest way to represent board state, but at the same time, very inefficient when generating moves. Let’s see at this method:
|
|
This is the code used for sliding pieces when we want to generate available moves. Let’s look at a queen moves generator now:
|
|
It’s not hard to find that it takes a lot of time to get these moves - and remember that we talk about processing millions of boards per second. That’s why array-based representation is a very unfortunate thing for high-performance solutions. This is where bitboards gain the advantage - we store our state using 64-bit types like ulong
, which allows using extremally fast bit arithmetic instructions like and
, or
and xor
. What’s more, using Magic Bitboards, we are able to precalculate moves for sliding pieces and probe them using this simple code:
|
|
Way more simple, right? I don’t want to explain the whole algorithm, because it’s quite complicated to understand and this is not the point of this article. Chess Programming Wiki have excellent articles about Bitboards and Magic Bitboards. A bit similar thing is with king and knight moves, where precalculating and probing moves is trivial (because they have very limited movement).
C# has an excellent feature called stackalloc which allows making an array on the stack instead of the heap. This is a very important difference because we have to remember that Garbage Collector’s work is very undesired during a search of the best move (GC has to stop the thread and analyze all heap objects to find which of them are no longer used). In Cosette, I use stackalloc expression mainly to make arrays containing generated moves and their values (used later for sorting).
|
|
Because it’s a part of negamax search, this piece of code is called very frequently - so allocating array on heap every time and cleaning it then would be a real nightmare for overall performance. Don’t do it.
We have a list of generated moves, so let’s think about how to execute them. There are two main approaches:
In Proxima b and Proxima b 2.0 (C#), I was using the first approach due to my small experience with chess engines. It worked quite well and was simple to do, but both memory allocation on heap and copying data was a real bottleneck - even with using fast Buffer.BlockCopy in Proxima b 2.0. In Cosette, I’ve decided to try the second approach to reduce Garbage Collector pressure, which is a significant change in the engine architecture compared to the previous ones. First, let’s see at the Move structure:
|
|
The whole structure takes 32 bits, and what’s more important, it contains only the most needed information. That’s because:
So what things are missing? I’ve decided to not store a piece that has been captured (this can be obtained when necessary during search or evaluation), there is no also an indicator if the move generates check or checkmate. Lack of the first one makes a bit more difficult to do an undo operation, so the engine has to store this information somewhere else. That’s why board state in Cosette contains a set of stacks to store information which would be lost when making moves:
|
|
When making or undoing moves, these stacks are pushing or popping values, depending on the situation. FastStack is a fast alternative for Stack class - it has a fixed size which prevents from reallocating array during a search. Because we know what values are expected and how much, so it’s not a problem.
MakeMove and UndoMove in Cosette are a bit complex, so again, I won’t go into implementation details. The main point of this chapter is: if possible, go with Make-Undo schema, because of the reduced pressure of Garbage Collector and time saved on the memory allocating/copying.
Since .NET Core 3.0, we have finally able to use BMI functions directly (a few months ago I wrote an article about them: Performance of Bit Manipulation Instructions (BMI)). Because Bitboards and Magic Bitboards heavily depends on them, we can take an advantage from using them like this:
|
|
It’s crucial to remember that not every processor supports BMI instructions - so having both versions is a good solution. During the tests, I’ve got about 10% better performance using BMI so it’s worth try at least.
This is the first part of the series about chess engines written in C# - next time, I will write something about transposition tables, and hopefully other things. Cosette is still in early development and lacks a lot of optimization, so hopefully there will be a ton of good knowledge to present. And in the meantime, feel free to play with my engine using lichess platform, or visit the GitHub repository.