This time we use MSVC intrinsics that are
C wrappers for Intel assembler 'bsf' instruction.
The speed up in node count is around 3%, probably
it does not worth the effort. Anyway this patch
can be useful at least for documentation purposes.
This optimization covers 32 bit systems only.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Under Linux we have a segfault after a random time,
about a couple of minutes while running the benchmark.
This happens both with gcc and icc, and both with O2
and O3 optimizations.
Disable for Linux until we understand what's the deal.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
This final version is a little bit faster then
previous patch and is a bit cleaned up also.
On 32 bit x86 pop_1st_bit is now more then
two times faster then the original one that
is optimized for 64 bit processors.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Operations on 64 bits Bitboard types are slow
on x86 compiled with gcc, so optimize this case.
BTW profiling shows that pop_1st_bit() is a
veeery performance critical path!
Signed-off-by: Marco Costalba <mcostalba@gmail.com>