An old idea retested at SPRT(0, 3) with 60+0.05 TC:
LLR: 2.95 (-2.94,2.94) [0.00,3.00]
Total: 98872 W: 15549 L: 15123 D: 68200
This is a very small elo increase patch so it really
stresses the limits of fishtest.
bench: 8596156
It seems to intorduce a regression when tested
with 3 threads at 15+0.05:
ELO: -2.26 +-2.2 (95%) LOS: 2.4%
Total: 30000 W: 4813 L: 5008 D: 20179
bench: 8331357
Tested setting FakeSplit to true and running
./stockfish bench 128 2
There is a different signature with and without
the patch so it affects functionality but
only in SMP case.
bench: 8331357
SMP case is very tricky and raises an assert in stage_moves():
assert(stage == KILLERS_S1 || stage == QUIETS_1_S1 || stage == QUIETS_2_S1)
So rewrite the code to just return moves[] when we are sure
we are in quiet moves stages.
Also rename stage_moves to quiet_moves to reflect that.
No functional change (but needs testing in SMP case)
Use MovePicker moves[] to access already tried
quiet moves. A bit of care shall be taken
to avoid calling stage_moves() when we are still
at ttMove stage, because moves are yet to be
generated. Actually our staging move generation
makes this code a bit more tricky than what I'd
like, but removing an ausiliary redundant
array like quietsSearched[] is a good thing.
Idea by DiscoCheck
bench: 9355734
Use the newly introduced LineBB[] to simplify this
super hot-path function.
Verified with perft we don't have any speed regression, although
the number of squares removed is less than before in case of
contact check.
Insipred by DiscoCheck implementation.
Perft numbers are the same, but we have an harmless functional
change due to reorder of moves, because now some illegal moves
are no more detected at generation time, but in the search.
bench: 8331357
This seems a die hard idea :-)
Passed both short TC
LLR: 2.97 (-2.94,2.94) [-1.50,4.50]
Total: 17485 W: 3307 L: 3156 D: 11022
And long TC
LLR: 2.97 (-2.94,2.94) [0.00,6.00]
Total: 38181 W: 6002 L: 5729 D: 26450
bench: 8659830
Actually, it is not used, as both arrays have the
same values. Some local tests in either direction
showed no improvement.
Also some minor corrections in the comments.
No functional change.
Previously some squares could be "incorrectly" awarded
to a pinned piece.
e.g. in 3k4/1q6/3b4/3Q4/8/5K2/B7/8 b - - 0 1 the black
bishop get 4 squares too many and the white queen gets 6.
Passed both short TC.
LLR: 2.97 (-2.94,2.94) [-1.50,4.50]
Total: 4871 W: 934 L: 817 D: 3120
And long TC:
LLR: 2.96 (-2.94,2.94) [0.00,6.00]
Total: 38968 W: 6113 L: 5837 D: 27018
bench: 9282549
But compensate by reducing rook and queen
value by 53 = (160 / 3)
Material imbalances are affected as follows:
Red. Major Rook Queen Total
QRR +160 -2*53 -53 +1
QR +160 -53 -53 +54
RR +160 -2*53 0 +54
R 0 -53 0 -53
Q 0 0 -53 -53
so that the imbalance changes by at most 54 + 53 = 107 units.
This corresponds to appromximately 3.5cp in the final evaluation.
Verified with fixed number 40000 games at both short
and long TC it does not regress.
Short TC 15+0.05
ELO: 1.93 +-2.1 (95%) LOS: 96.6%
Total: 40000 W: 7520 L: 7298 D: 25182
Long TC 60+0.05
ELO: -0.33 +-1.9 (95%) LOS: 36.5%
Total: 39663 W: 6067 L: 6105 D: 27491
bench: 6703846
As, Gary (that analyzed the bug) says:
SF does not print a PV when the original best move fails low,
we hit our time allowance, and stop the search. The output from
the SF search is below. It was failing low on Ne1 at depth 34.
Then, we get bestmove Qd3, but no PV change.
info depth 34 seldepth 45 score cp 38 upperbound nodes 483484489 nps 15464575 time 31264 multipv 1 pv f3e1 h5h4 e1d3 h4g3 f2g3 a6f6 f1f6 e7f6 d1a4 f6e7 a1f1 d8f8 a4b3 b7b6 b3c2 f7f6 c2a4 h3g5 b2b3 g5f7 a4c6 f7d6 h1g2 f6f5 e4f5 d6f5
info depth 34 seldepth 45 score cp 38 upperbound nodes 483484489 nps 15464575 time 31264 multipv 1 pv f3e1 h5h4 e1d3 h4g3 f2g3 a6f6 f1f6 e7f6 d1a4 f6e7 a1f1 d8f8 a4b3 b7b6 b3c2 f7f6 c2a4 h3g5 b2b3 g5f7 a4c6 f7d6 h1g2 f6f5 e4f5 d6f5
info depth 34 seldepth 47 score cp 30 upperbound nodes 2112334132 nps 17255517 time 122415 multipv 1 pv f3e1 h5h4 d1a4 a6f6 e1d3 d8f8 a4c2 h4g3 f2g3 f6f1 a1f1 h7g8 b2b3 f7f6 a3a4 b7b6
info depth 34 seldepth 47 score cp 30 upperbound nodes 2112334132 nps 17255517 time 122415 multipv 1 pv f3e1 h5h4 d1a4 a6f6 e1d3 d8f8 a4c2 h4g3 f2g3 f6f1 a1f1 h7g8 b2b3 f7f6 a3a4 b7b6
info nodes 18235667001 time 969824
bestmove e2d3 ponder c8d7
Looking at the code, if we hit Signals.stop, we return from id_loop
before printing any PV. It is possible for us to have resorted the
RootMove list though, which will change the move that is actually
played.
No functional change.
1/ eval margin and gains removed:
16bit are now free on TT entries, due to the removal of eval margin. may be useful
in the future :) gains removed: use instead by Value(128). search() and qsearch()
are now consistent in this regard.
2/ futility_margin()
linear formula instead of complex (log(depth), movecount) formula.
3/ unify pre & post futility pruning
pre futility pruning used depth < 7 plies, while post futility pruning used
depth < 4 plies. Now it's always depth < 7.
Tested with fixed number of games both at short TC:
ELO: 0.82 +-2.1 (95%) LOS: 77.3%
Total: 40000 W: 7939 L: 7845 D: 24216
And long TC
ELO: 0.59 +-2.0 (95%) LOS: 71.9%
Total: 40000 W: 6876 L: 6808 D: 26316
bench 7243575
1/ eval margin and gains removed:
- gains removed by Value(128): search() and qsearch() now behave consistently!
2/ futility_margin()
- testing showed that there is no added value in this weird (log(depth), movecount)
formula, and a much simpler linear formula is just as good. In fact, it is most
likely better, as it is not yet optimally tuned.
- the new simplified formula also means we get rid of FutilityMargins[], its
initialization code, and more importantly ss->futilityMoveCount, and the hacky
code that updates it throughout the search().
- the current formula gives negative futility margins, and there is a hidden interaction
between the move coutn pruning formula and the futility margin one: what happens is
that MCP is supposed to be triggered before we use the non-sensical negative futility
margins.
3/ unify pre & post futility pruning
- pre futility pruning (what SF calls value based pruning) used depth < 7 plies,
while post futility pruning (what SF calls static null move pruning) used depth < 4 plies.
- also the condition depth < 7 in pre futility pruning was not obvious, and it seemd
to be depth < 16 (futility_margin() returns an infinite value when depth >= 7).
Tested with fixed number of games both at short TC:
ELO: 0.82 +-2.1 (95%) LOS: 77.3%
Total: 40000 W: 7939 L: 7845 D: 24216
And long TC
ELO: 0.59 +-2.0 (95%) LOS: 71.9%
Total: 40000 W: 6876 L: 6808 D: 26316
bench: 10206576
RedundantRook and RedundantQueen replaced by simple
variable RedundantMajor. Also the SameColor coefficient
for Queen<->Queen has been set by definition to 0.
The remaining 5 parameters:
LinearCoefficients[ROOK]
LinearCoefficients[QUEEN]
QuadraticCoefficientsSameColor[ROOK][ROOK]
QuadraticCoefficientsSameColor[QUEEN][ROOK]
RedundantMajor
are sufficient to equate the material imbalances for the
5 common material configurations of R, RR, Q, QR and QRR
to any desired values simultaneously.
With the chosen parameters there should be no functional
change unless one side has more than 2 rooks or more
than 1 queen. For example bench from the start position
using the commands:
./stockfish
go depth 16
produces identical output except for one extra node
in the last iteration.
bench: 8198094
Coefficients for Bishop<->BishopPair and Bishop<->Bishop
are also pretty much redundant. By altering the values
in LinearCoefficients[] these coefficients can be zeroed
without changing the imbalance calculations in any position
with less than 3 bishops for one side.
bench: 7995098
First coefficient in the SameColor array does an
equivalent job when folded into the LinearCoefficients
array.
All of the diagonal terms in the OppositeColor array
are redundant due to cancellation.
No functional change.
In case we find a very good move after a
troubled start, we don't return immediately
anymore.
Tested directly at long TC where it passed:
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 13910 W: 2397 L: 2228 D: 9285
bench: 7995098
And remove a complex (and broken) formula.
Indeed previous code was broken in case of TC with big
time increments where available_time() was too similar
to total time yielding to many time losses, so for instance:
go wtime 2600 winc 2600
info nodes 4432770 time 2601 <-- time forfeit!
maximum search time = 2530 ms
available_time = 2300 ms
For a reference and further details see:
https://groups.google.com/forum/?fromgroups=#!topic/fishcooking/dCPAvQDcm2E
Speed tested with bench disabling timer alltogheter vs timer set at
max resolution, showed we have no speed regressions both in single
core and when using all physical cores.
No functional change.
A combo of two patches that failed SPRT with score
higher than 50% but togheter they succeed:
SPRT at 60+0.05
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 7312 W: 1276 L: 1139 D: 4897
bench: 8029334
If the game got late enough that move_importance(currentPly) * slowMover / 100
rounds to 0, then we ended up dividing 0 by 0 when only looking 1 move ahead.
This apparently caused the search to almost immediately abort and Stockfish
would blunder in long games. So convert thisMoveImportance to a double.
No functional change.
This seems more a material imbalance topic,
anyhow test is good and so patch is applied
as is.
Passed both short TC:
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 17391 W: 3548 L: 3393 D: 10450
And long TC:
LLR: 3.00 (-2.94,2.94) [0.00,6.00]
Total: 34660 W: 5972 L: 5700 D: 22988
bench: 8291883
Dumb down a bit the code and trade some possible
speed (but this is far from hot path anyhow) for
some added readability for the layman.
No functional change.
The normalising transformation is computed all at
once by the helper function get_flip_sq and then
applied immediately to the relevant squares as soon
as they are loaded from the position class.
bench: 8350690
New formula mathces the old formula until d = 45
Test code:
int main() {
for(int d=1; d<=45; d++)
{
int a = int(log(double(d * d) / 2) / log(2.0) + 1.001);
int b = int(2.9 * log(double(d)));
if (a != b) std::cout << d << std::endl;
}
return 0;
}
bench: 8455956
This is the first chain bonus version
from Ralph that also passed both
Short TC:
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 23460 W: 4727 L: 4556 D: 14177
And long TC:
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 31858 W: 5497 L: 5240 D: 21121
And performed better against current
committed version, always at 60secs:
LLR: -2.94 (-2.94,2.94) [-3.00,3.00]
Total: 26301 W: 4477 L: 4580 D: 17244
This test was done by Leonid.
bench: 8455956
After 40K games at 60 secs, result is still
not clear, but not a regression against SF 4
After
ELO: 50.11 +-2.1 (95%) LOS: 100.0%
Total: 40000 W: 10547 L: 4817 D: 24636
Before
ELO: 49.51 +-2.1 (95%) LOS: 100.0%
Total: 40000 W: 10483 L: 4821 D: 24696
So re-apply the patch to avoid to
special-case this one.
bench: 7403882
It was never updated !
Currently it only affects evaluate_passed_pawns()
and in particularly the rule to increase the bonus
if we have more non-pawn pieces. We could simply use
popcount() instead and avoid the little slowdown
in put_piece() and remove_piece(), but this would
leave a very subtle and tricky hole where people
are forced to remember that pos.count<ALL_PIECES>()
does not work. This is not obvious and so dangerous.
Thanks to Ronald de Man for spotting this.
bench: 7931424
Due to a strange issue (bug?) the ternary
operator does not return a BitCountType for
icc, so revert to the expression.
The same patch was already applied in
9749f1f14c
Thanks to NssY Wanyonyi for pointing out
this.
No functional change.
This reverts commit 4bc2374450 for
two reasons.
First regression testing shows almost equal
score:
Before the patch:
ELO: 49.75 +-2.5 (95%) LOS: 100.0%
Total: 27205 W: 7113 L: 3244 D: 16848
After the patch:
ELO: 48.87 +-2.9 (95%) LOS: 100.0%
Total: 20860 W: 5478 L: 2563 D: 12819
Second, and more sensible to me, this patch
increases safe check bonuses to 4 times their
original value (!) and considering:
- Values were already well tuned
- Values are highly critical
- King safety is highly critical, very TC
dependent and very difficult to test
- Our testing coverage is partial (self-testing,
blitz times)
I think is better to be safe than sorry and so
I revert the patch.
bench: 8440524
Passed both short TC:
LLR: 2.95 (-2.94,2.94) [-1.50,4.50]
Total: 10466 W: 2087 L: 1953 D: 6426
And long TC:
LLR: 2.96 (-2.94,2.94) [0.00,6.00]
Total: 26334 W: 4540 L: 4310 D: 17484
And also proved stronger than a slightly
different patch, also succesful against master:
https://github.com/mcostalba/Stockfish/commit/dc6830a3b4ed12
But losing against current one in a match
at 60secs with SPRT [-3, 3]:
LLR: -2.96 (-2.94,2.94) [-3.00,3.00]
Total: 44484 W: 7360 L: 7463 D: 29661
bench: 9160831
Also the drawing criteria has been slightly loosened.
It now detects a draw if the king is ahead of all the
pawns and on the same file or the adjacent file.
bench: 7700683
This lost position 8/8/3q4/8/5k2/2P1R3/2K2P2/8 w - - 0 1
was previously evaluated as a draw.
The king and rook need to be correctly placed with
respect to the _same_ pawn.
(Note also that the check for the pawn being on RANK_2
in the old version is redundant: it must be on RANK_2 if
it hopes to protect a rook on RANK_3)
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Good both at short TC:
LLR: 2.95 (-2.94,2.94) [-1.50,4.50]
Total: 5448 W: 1133 L: 1012 D: 3303
And at long TC:
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 40509 W: 6836 L: 6541 D: 27132
bench: 7700683
Don't need a struct here. Speed test shows
result is teh same. Moreover RKISS is used
mainly at startup to compute magics, so
prefer to keep it simple...RKISS ;-)
Also some assorted triviality while there.
No functional change.
These two changes go in opposite directions and it
seems that the combination is stronger than original.
Here are the positive tests at various TC:
15+0.05
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 24561 W: 4946 L: 4772 D: 14843
60+0.05
LLR: 2.96 (-2.94,2.94) [0.00,6.00]
Total: 15259 W: 2598 L: 2423 D: 10238
40/30
LLR: 2.96 (-2.94,2.94) [-3.00,3.00]
Total: 2570 W: 527 L: 422 D: 1621
Unfortunately there is also a bad result
with one sec time increment that needs
to be further investigated:
12+1
LLR: -2.97 (-2.94,2.94) [-3.00,3.00]
Total: 2694 W: 438 L: 543 D: 1713
bench: 8340585
Rationale:
- Speed of double and float is about the same (not on the hot path anyway)
- Double makes code prettier (no need to write 1.0f, just 1.0)
- Only practical advantage of float is to use less memory, but since we never
store large arrays of double, we don't care.
No functional change.
Increase bench default depth from 12 to 13 and
add 15 new endgame positions to have broader
coverage and also more reliable nps calulcation
used for fishtest framework.
Due to the new endgame positions, where nps is higher,
the total nps is increased of about 15%.
Thanks to Lucas and Jörg for the suggestions.
No functional change, but bench number is now:
bench: 8336338
For some users -stack_size,0x4000 does not work,
so revert for now.
osX 10.6.8
gcc version 4.7.3 (MacPorts gcc47 4.7.3_2)
g++: error: unrecognized command line option '-stack_size,0x4000'
make[2]: *** [stockfish] Error 1
make[1]: *** [gcc-profile-make] Error 2
make: *** [profile-build] Error 2
No functional change.
This reverts commit 800410eef1 and instead increases
stack size.
I went through the old emails with Daylen that reported the
crash issue on Mac OS X and was fixed by 0049d3f337.
It was reported default stack size for a thread in Mac OS X is 8
megabytes while the patch that we are reverting allows to reduce
stack size at max of about 217KB, so the reason for the crash was
only marginal in MAX_MOVES value. On those emails Daylen also
hinted how to increase stack size for Mac OS X to 16MB.
So prefer to increase stack size to 16MB instad of re-inventing
the wheel and do our home grown stack as we did with the patch
that we are now reverting (it will remain anyhow in git history
for documentation purposes).
No functional change.
Unify extensions between PV and not PV nodes
and remove all but check extensions.
This is a simplification so tested at fixed number
of games where proved to not regress.
About 45k games at 15+0.05
ELO: 1.23 +-2.0 (95%) LOS: 88.5%
Total: 45643 W: 9107 L: 8946 D: 27590
About 45k games at 60+0.05
ELO: 1.07 +-1.8 (95%) LOS: 87.8%
Total: 46786 W: 7728 L: 7584 D: 31474
bench: 3172206
If the uci option 'Best Book Move' is set to true the lookup still
returns a move at random instead of the move with the highest
weight.
No functional change.
This should be enough for any legal position, even
the handcrafted ones, like the one presented by Reuven:
1Q5R/4Q1K1/B1Q5/B4Q2/N2Q4/pQ4Q1/pn2Q3/krQ4R w - -
Where currently we crash. This reverts the patch
0049d3f337 of 8/4/2012 where stack
was shrinked due to crashes while in deep analysys.
No functional change.
This greately reduces stack usage and is a
prerequisite for next patch.
Verified with 40K games both in single and SMP
case that there are no regressions.
No functional change.
This is an even safer setup proposed and tested
by Alexandre Meirelles.
Regression testing of 40K games at 10+0.05 show
result is stable both against current master:
ELO: -0.29 +-2.2 (95%) LOS: 39.7%
Total: 40000 W: 8010 L: 8043 D: 23947
and again original master (the one with smallest
time parameters):
ELO: 1.71 +-2.2 (95%) LOS: 93.8%
Total: 40000 W: 8325 L: 8128 D: 23547
Alexandre verified with LittleBlitzer time losses are
greately reduced with this setup:
Games Completed = 2100 of 3000 (Avg game length = 35.745 sec)
Settings = RR/128MB/15000ms+50ms/M 1000cp for 12 moves, D 150 moves/
Time = 39200 sec elapsed, 16800 sec remaining
1. Stockfish 190913 1091.5/2100 803-720-577 (L: m=313 t=1 i=0 a=406) (D: r=278 i=91 f=136 s=8 a=64) (tpm=212.5 d=14.75 nps=925427)
2. Houdini 2.0 w32 1008.5/2100 720-803-577 (L: m=250 t=299 i=0 a=254) (D: r=278 i=91 f=136 s=8 a=64) (tpm=204.1 d=12.04 nps=1326351)
No functional change.
Goes in the direction of avoiding time losses and seems
equivalent after almost 40K games at super fast TC of 10+0.05
ELO: 2.61 +-2.2 (95%) LOS: 99.1%
Total: 39869 W: 8258 L: 7959 D: 23652
No functional change.
Goes in the direction of avoiding time losses and seems
equivalent after almost 40K games at super fast TC of 10+0.05
ELO: 2.41 +-2.3 (95%) LOS: 98.1%
Total: 37222 W: 7843 L: 7585 D: 21794
No functional change.
The ideal setting for super-blitz might be something like:
"Emergency Base Time" = 50
"Emergency Move Time" = 5
This would give a total emergency time buffer of:
50 + 40 * 5 = 250 ms
This setup replaces the previous half cooked hack
"Don't blunder under extreme time pressure".
Test results are very good at super blitz, but keep good even
at 60 secs.
At 5+0.05
ELO: 24.30 +-2.4 (95%) LOS: 100.0%
Total: 37802 W: 10060 L: 7420 D: 20322
At 15+0.05
ELO: 13.41 +-2.9 (95%) LOS: 100.0%
Total: 22271 W: 4853 L: 3994 D: 13424
At 60+0.05
ELO: 5.30 +-3.2 (95%) LOS: 99.9%
Total: 16000 W: 2897 L: 2653 D: 10450
No functional change.
Instead of current code, give a bonus according to the frontmost
square among candidate + passed pawns.
This is a big simplification that removes a lot of accurate code
substituting it with a statistically based one using the common
'bonus' scheme, leaving to the search to sort out the details.
Results are equivalent but code is much less and, as an added bonus,
we now store candidates bitboard in pawns hash and allow this
info to be used in evaluation. This paves the way to possible
candidate pawns evaluations together with all the other pieces,
as we do for passed.
Patch passed short TC
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 16927 W: 3462 L: 3308 D: 10157
Then failed (quite quickly) at long TC
LLR: -2.95 (-2.94,2.94) [0.00,6.00]
Total: 8451 W: 1386 L: 1448 D: 5617
But when ran with a conclusive 40K fixed games at 60 secs it proved
almost equivalent to original one.
ELO: 1.08 +-2.0 (95%) LOS: 85.8%
Total: 40000 W: 6739 L: 6615 D: 26646
bench: 3884003
Now that we use pre-increment on enums, it
make sense, for code style uniformity, to
swith to pre-increment also for native types,
although there is no speed difference.
No functional change.
ENABLE_OPERATORS_ON has incorrect definitions of
post-increment and post-decrement operators.
In particularly the returned value is the variable
already incremented/decremented, while instead they
should return the variable _before_ inc/dec.
This has no real effect because are only used in loops
and where the returned value is never used, neverthless
it is wrong. The fix would be to copy the variable to a
dummy, then inc/dec the variable, then return the dummy.
So instead, rename to pre-increment that can be implemented
without the dummy, actually the current implementation
it is already the correct pre-increment, with the only change
to return a reference (an l-value) and not a copy, so
to properly mimic the pre-increment on native integers.
Spotted by Kojirion.
No functional change.
We always attempt to keep at least this emergencyBaseTime
at clock. But if available time is very low it means that
we will force ourself to play immediately to satisfy the
emergencyBaseTime constrain and so leading to blunders.
Patch is good at short and very short TC (15secs and 5secs respectively)
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 26590 W: 5426 L: 5245 D: 15919
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 5767 W: 1397 L: 1268 D: 3102
Instead seems has no influence at longer TC (60 secs)
LLR: -2.96 (-2.94,2.94) [0.00,6.00]
Total: 79862 W: 13623 L: 13339 D: 52900
So it is committed to have a broader testing but is
to be consider still EXPERIMENTAL and can be reverted
easily.
No functional change.
In case we have less then 10ms to think as soon as
we wake up the timer, it immediately fires and calls
check_time() where due to condition:
elapsed > TimeMgr.maximum_time() - 2 * TimerResolution
the stop flag is set and search returns immediately, without
actually search anything.
Here the somewhat hacky fix is to start the timer after
at least one iteration as been completed.
No functional change.
The possible maximum mobility cardinality (plus one in case of
zero squares available) is:
- Knights: max. 8 squares -> max. 9 entries
- Bishops: max. 13 squares -> max. 14 entries
- Rooks: max. 14 squares -> max. 15 entries
- Queen: max. 27 squares -> max. 28 entries
So remove the extra entries in the table.
Spotted by Dariusz Orzechowski.
No functional change.
Union of
- LMR >= 3 plies from Gary tests.stockfishchess.org/tests/view/522522960ebc595d328fcafd
- allows() tweak from Reuven tests.stockfishchess.org/tests/view/5225fa1c0ebc595d328fcb53
Both passed Step I and failed Step II.
Instead this union passed both short TC:
LLR: 2.95 (-2.94,2.94)
Total: 14525 W: 3063 L: 2874 D: 8588
And long TC
LLR: 2.94 (-2.94,2.94)
Total: 31075 W: 5566 L: 5308 D: 20201
bench: 4238160
Idea is sound but implementation is partial. Ryan and Joona noticed that
we leave an hole in material table. Also we got another report by an user
of an odd behaviour. Namely, if you start stockfish and from the prompt
give 'bench' you get 3453941, then if you run again bench you get 3453940.
The reason is that two different positions with the same number of pieces,
but one with a bishop pair and another without have the same material key.
But after Eelco patch also different material imbalance and this yields
to this issue.
Restesting at long TC shows the patch does not really contribute at
ELO improvement. Actually patch failed at long TC.
LLR: -2.97 (-2.94,2.94)
Total: 23109 W: 4104 L: 4092 D: 14913
So revert.
bench: 3453945
Prefer pos.bishop_pair() to pos.count<BISHOP>(WHITE) > 1
because the first checks that the two bishops are on
different color squares.
Although the change seems to kick in only in very rare cases,
quite surprisingly it was able to pass SPRT test at short TC.
LLR: 2.95 (-2.94,2.94)
Total: 39818 W: 8174 L: 7956 D: 23688
bench: 3453941
This was thought to be a draw but the bishops generally win. However,
it takes up to 66 moves. The position in the diagram was thought to be
a draw for over one hundred years, but tablebases show that White wins
in 45 moves. All of the long wins go through this type of semi-fortress
position. It takes several moves to force Black out of the temporary
fortress in the corner; then precise play with the bishops prevents Black
from forming the temporary fortress in another corner (Nunn 1995:265ff).
Before computer analysis, Speelman listed this position as unresolved,
but "probably a draw" (Speelman 1981:109).
bench: 3453945
STANDALONE-TOOLCHAIN.html in Android NDK says:
It is recommended to use the -mthumb compiler flag to force the generation
of 16-bit Thumb-1 instructions (the default being 32-bit ARM ones).
If you want to target the 'armeabi-v7a' ABI, you will need ensure that the
following two flags are being used:
CFLAGS='-march=armv7-a -mfloat-abi=softfp'
Note: The first flag enables Thumb-2 instructions, and the second one
enables H/W FPU instructions while ensuring that floating-point
parameters are passed in core registers, which is critical for
ABI compatibility. Do *not* use these flags separately!
Thanks to Peter Osterlund for pointout this doc and for showing me
an example Makefile to follow.
No functional change.
Becuase castle is coded as "king captures the rook"
the to_sq(move), A1/8 or H1/8 is empty after the move,
leading to assert assert(p != NO_PIECE) in color_of().
Teach allows() asserts about castle and fix the crash.
Bug reported by Ryan Takker and tracked down by Tom Vijlbrief.
No functional change.
If our rook is behind a passed pawn, all
squares are defended.
One of the longest tests to pass !
Passed both short TC
LLR: 2.97 (-2.94,2.94)
Total: 44560 W: 9518 L: 9281 D: 25761
And long TC
LLR: 2.96 (-2.94,2.94)
Total: 61348 W: 11618 L: 11192 D: 38538
bench: 3787694
With
position fen 7k/8/8/8/8/7P/6K1/7B w - - 0 1
go depth 25
The evaluation at depth 22 is not draw as it should be. The reason is that
when search reaches the position 8/6kP/8/8/8/3B4/6K1/8 w - - 0 1 if white plays
h8R or h8N then we get a position that is a "KNOWN_WIN" and is _not_ a check, so
futility pruning in qsearch kicks in and black may think that it is "futile"
to reply Kxh8 since, according to the logic of the code, it cannot raise the score
back towards a draw.
bench: 4728533
The case of two lone kings on the board is already considered
by the "No pawns" scaling factor rules in material.cpp as is
KBK and KNK.
Moreover we had a small leak in endgames map because for
KK endgame it happens white and black material keys are the
same (both equal to zero), so when adding the black endgame in
Endgames::add() we were overwriting the already exsisting
white one, leading to a memory leak found by Valgrind.
So remove the endgames althogheter and rely on scaling
to correctly set the endgames value to a draw.
No functional change.
Instead of classical flags, throw an
exception when we want to immediately halt
the search. Currently only one type
is used for both UCI stop and threads
cut off.
No functional change.
Idea originated from a post of Don Dailey
on talkchess and reported by Eelco.
This is the last succesful attempt of a long
series of trials (as usually happens, the
'idea' alone is not enough).
Passed both short 15secs TC
LLR: 2.97 (-2.94,2.94)
Total: 7629 W: 1645 L: 1515 D: 4469
And long 60secs TC
LLR: 2.96 (-2.94,2.94)
Total: 10218 W: 1932 L: 1775 D: 6511
bench: 4944581
With the new automatic setting of split depth
instead of a default, the user no longer needs
guidance on setting the split point.
Also threads now defaults to one.
No functional change.
Search::RootColor is a global parameter set
before to start a search, it is not something
trace() should change.
This patch allows to add trace() calls, for
debugging, inside search itself without altering
the bench, and also ensures that the values
returned by trace() and evaluate() are fully
equivalent.
No functional change.
The rounding formula is different between
positive and negative scores due to the
GrainSize/2 term that is asymmetric.
So use truncation instead of rounding. This
guarantees that evaluation is rounded to zero
in the same way for both positive and negative
scores.
Found with position's flip
bench: 4634244
Because VALUE_NONE is 30002, it happens that
after a check the next move is never an improving
one.
After this patch bench signature is independent from
VALUE_NONE actual value.
bench: 4303194
Apply to LMR the same Eelco's idea
applied to move count pruning.
This is the result of a series of
attempts started by Thomas Kolarik.
Passed both short TC
LLR: 2.95 (-2.94, 2.94)
Total: 5675 W: 1241 L: 1117 D: 3317
And long TC:
LLR: 2.95 (-2.94, 2.94)
Total: 8748 W: 1689 L: 1539 D: 5520
bench: 4356801
Not a real functional change, but bench changed due to different piecelist
reordering. To verify it a temporary my canonicalize_rooks function was
written as follows. It just ensures that the rook on the "smaller" square
is listed first.
void Position::canonicalize_rooks(Color c)
{
if (pieceCount[c][ROOK] == 2)
{
Square s0 = pieceList[c][ROOK][0];
Square s1 = pieceList[c][ROOK][1];
if (s0 > s1)
{
pieceList[c][ROOK][0] = s1;
pieceList[c][ROOK][1] = s0;
index[s0] = 1;
index[s1] = 0;
}
}
}
With this both bench and the test on Chess960 positions
./stockfish bench 128 1 8 Chess960.epd file > /dev/null
Gives same result.
bench: 4424151
Set threads number always to 1 at startup and let the
user explicitly to chose the number of threads.
Also preserve the useful behavior of automatically set
"Min Split Depth" according to the requested threads,
indeed this parameter is too technical for a casual user,
so, when left to zero, we set it on a sensible value.
No functional change
The new Position methods add_piece, move_piece, and remove_piece
now manage the member variables pieceList, pieceCount, and index,
and 9 blocks of code in Position that used to manipulate those
data structures by hand now call the new methods.
There is a slightly slowdown (< 1%) on Clang and on perft,
but the cleanup compensates the little speed loss.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Introduce ThreadBase struct that is search
agnostic and just handles low level stuff,
and derive all the other specialized classes
form here.
In particular TimerThread does not hinerits
anymore all the search related stuff from Thread.
Also some renaming while there.
Suggested by Steven Edwards
No functional change.
At thread creation start_routine() is called
and from there the virtual function idle_loop()
because we do this inside Thread c'tor, where the
virtual mechanism is disabled, it could happen that
the base class idle_loop() is called instead.
The issue happens with TimerThread and MainThread
where, at launch, start_routine calls
Thread::idle_loop instead of the derived ones.
Normally this bug is hidden because c'tor finishes
before start_routine() is actually called in the
just created execution thread, but on some platforms
and in some cases this is not guaranteed and the
engine hangs.
Reported by Ted Wong on talkchess
No functional change.
The endgame king + minor vs king is erroneusly
detected as king + minor vs king + minor
Here the fix is to detect king + minor earlier,
in particular to add these trivial cases to
endgame evaluation functions.
Spotted by Reuven Peleg
bench: 4727133
And #ifdef instead of #if defined
This is more standard form (see for example iostream file).
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
A big simplification and removing of useless code.
Finished at 50% both at short TC (with SPRT) than
at long TC at fixed number of games:
ELO: -0.14 +-3.4 (95%) LOS: 46.8%
Total: 15206 W: 2836 L: 2842 D: 9528
bench: 5059948
This reverts commit 4b3a0fdab0.
As Gary says: " It failed when I tried it at long TC previously, and only
barely passed this time. Some anecdotal evidence is that it hurts vs other
engines as well (the Lightspeed rating list showed a 16 elo drop from previous
best version - still +- 5 error bars on both, but that's still significant)"
I also agree that if we have some doubts (like in this case) it is better to
be safe than sorry.
bench: 4615572
Reduces the influence of PSQT for entries such as
the extended center and the h-file.
Passed both short TC test:
LLR: 2.95 (-2.94,2.94)
Total: 23919 W: 5207 L: 5029 D: 13683
And long TC one:
LLR: 2.96 (-2.94,2.94)
Total: 5762 W: 1108 L: 974 D: 3680
Bench: 4617880
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
This very speed critical code was full of clever (!)
tricks and subtle details.
So I have rewritten it in a more straithforward way
and, as very often happens, result is even faster
than original.
No functional change.
This reverts commit 3e95800814
For some reason it fails the short TC test:
LLR: -2.96 (-2.94,2.94)
Total: 20033 W: 4214 L: 4265 D: 11554
bench: 4769737
It is somewhat unbilievable but our SEE is broken !
If the first SEE move is a king capture and square is
defended then SEE continues instead of breaking.
The bug shows only on normal SEE, not see_sign() so
probing with a:
dbg_hit_on_c(slIndex==1, captured == KING);
reports just a tiny:
Total 3465656 Hits 6646 hit rate (%) 0
Bug was there since Retire seeValues[] and move PieceValue[] out of Position of 26/6/2011 (!)
although for some reason didn't show immediately, indeed the
bougous patch was a "No functional change" (!!)
bench: 4699504
It is somewhat unbilievable but our SEE is broken !
If the first SEE move is a king capture and square is
defended then SEE continues instead of breaking.
The bug shows only on normal SEE, not see_sign() so
probing with a:
dbg_hit_on_c(slIndex==1, captured == KING);
reports just a tiny:
Total 3465656 Hits 6646 hit rate (%) 0
Bug was there since 351ef5c85b of 26/6/2011 (!)
although for some reason didn't show immediately, indeed the
bougous patch was a "No functional change" (!!)
bench: 4793754
Without patch we have 333198 nps, with patch 334249.
A very small +0.3%, not a lot manily becuase this is a
side path that is taken very few times.
Anyhow idea is correct becuase first 'quick' condition
has an hit rate of about 95%.
No functional change.
But still keep the same original
margin for score.
Passed both short TC test
LR: 2.95 (-2.94,2.94)
Total: 3710 W: 845 L: 726 D: 2139
And long TC
LLR: 2.95 (-2.94,2.94)
Total: 57859 W: 10939 L: 10532 D: 36388
bench: 4769737
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Partially revert previous patch and use
unlikey() just as code annotation.
Actually it is better to rely on a profiler for branch prediction:
http://blog.man7.org/2012/10/how-much-do-builtinexpect-likely-and.html
"In fact, even when only one in ten thousand values is nonzero,
we're still at only roughly the break-even point"
No functional change,
It is somewhat redundant and could make SF
name too long, so use just Version, in case
of a signature build Version will be set to
'sig-xxx' otherwise, if left empty, we fall
back on usual date stamp.
No functional change.
When compiling with:
make signature-build ARCH=xxx COMP=xxx
After binary has been roduced, it will be run to
get the signature 'stockfish bench' and this
number will be used as Version, so that it
will be easy to track the original sources
from a binary.
No functinal change.
Due to a strange issue (bug?) the ternary
operator does not return a BitCountType for
icc, so revert to the expression used
before bcbc9bfd1f
No functional change.
Here speed up is the name of the game.
Speed up is gained:
- Removing the useless enoughMaterial code
- Limiting trapped rook evaluation to where it counts
Tested at long TC:
LLR: 2.97 (-2.94,2.94)
Total: 10061 W: 1948 L: 1790 D: 6323
bench: 4558173
Thanks to Don, Miguel, Louis and the other people
of talkchess forum for the suggestion:
http://www.talkchess.com/forum/viewtopic.php?t=48612
Also sync polyglot.ini with current UCI options
No functional change.
Unfortunatly a reverse test at long TC failed:
master^ vs master
LLR: 1.37 (-2.94,2.94)
Total: 33682 W: 6294 L: 6071 D: 21317
So becuase short TC score is 50% there is a good
possibility patch is not scalable.
So revert it.
bench: 4507288
Do not use threat move to detect the condition. This
let us to retire the big allows() function.
Test at short TC was within 50% score:
LLR: -2.95 (-2.94,2.94)
Total: 38272 W: 7941 L: 7940 D: 22391
To be verified with reverse long TC
bench: 4191565
Here the main difference is that now we center
aspiration window on last returned score. This allows
to simplify handling of mate scores.
We have done a reversed SPRT tests, where we wanted to
verify if master is stronger than this patch.
Long TC: master vs this patch (reverse test)
LLR: -2.95 (-2.94,2.94)
Total: 37992 W: 7012 L: 6920 D: 24060
bench: 4507288
Temporary revert aspiration window patch
so to be visible to everybody: it will be
re-applied with next patch
No functional change (together with next one)
Here the main difference is that now we center
aspiration window on last returned score. This allows
to simplify handling of mate scores.
We have done a reversed SPRT tests, where we wanted to
verify if master is stronger than this patch.
Long TC: master vs this patch (reverse test)
LLR: -2.95 (-2.94,2.94)
Total: 37992 W: 7012 L: 6920 D: 24060
bench: 4507288
A simplification of the 'dangerous' definition.
Seems neutral at reverse test at long TC
master vs patch
LLR: -2.96 (-2.94,2.94)
Total: 16974 W: 3122 L: 3139 D: 10713
bench: 4689029
Function calloc() already initializes memory to
zero, so avoid calling clear() afterwards.
Also some renaming while there (inspired by DiscoCheck).
No functional change.
Here we skip the call to pos.attacks_from<ROOK>(s) in the 98%
of cases, testing the first 2 members first. Unfortunatly
code is a bit triky and not clear. So we give up to the
speed optimization in exchange of more code clarity.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
So when we are doing a LMR search at the parent ALL node.
This patch didn't prove stronger at 60" TC
LLR: -2.97 (-2.94,2.94)
Total: 22398 W: 4070 L: 4060 D: 14268
But, first, it scores at 50%, second (and most important for me) the opposite,
i.e. normal reduction when parent node is not reduced, seems very bad:
LLR: -2.95 (-2.94,2.94)
Total: 7036 W: 1446 L: 1534 D: 4056
According to Don, this idea of increased reduction of CUT nodes
works because if parent node is reduced, missing a cut-off due to
reduced depth search (meaning position is somehow tricky) forces
a full depth research at parent node, giving due insight in this
set of sensible positions.
IOW if we expect a node to fail-high at depth n, then we assume it
should fail-high also at depth n-1, if this doesn't happen it means
position is tricky enough to deserve a research at depth n+1.
bench: 4687419
We got a good result from this tweak, in line with
what was already found by Don Dailey.
At short TC:
LLR: 2.95 (-2.94,2.94)
Total: 13097 W: 2742 L: 2598 D: 7757
At long TC:
LLR: 2.97 (-2.94,2.94)
Total: 7281 W: 1408 L: 1265 D: 4608
bench: 5108393
Follow Don Dailey definition of cut/all node:
"If the previous node was a cut node, we consider this an ALL node.
The only exception is for PV nodes which are a special case of ALL nodes.
In the PVS framework, the first zero width window searched from a PV
node is by our definition a CUT node and if you have to do a re-search
then it is suddenly promoted to a PV nodes (as per PVS search) and only
then can the cut and all nodes swap positions. In other words, these
internal search failures can force the status of every node in the subtree
to swap if it propagates back to the last PV nodes."
http://talkchess.com/forum/viewtopic.php?topic_view=threads&p=519741&t=47577
With this definition we have an hit rate higher than 90% on:
if (!PvNode && depth > 4 * ONE_PLY)
dbg_hit_on_c(cutNode, (bestValue >= beta));
And an hit rate of just 28% on:
if (!PvNode && depth > 4 * ONE_PLY)
dbg_hit_on_c(!cutNode, (bestValue >= beta));
No functional change.
Fix was wrong becuase search starts from ss+1,
code is a bit tricky here, so rewrite in a way
to be more easy to read and understand.
Spotted by Eelco.
No functional change.
In case of we pick a sub-optimal move be
sure to print this, and not the best one
on seach log file.
Bug spotted by Guenther Demetz.
No functional change.
Search is started after setting a position and
issuing UCI 'go' command. Then if we stop the search
and call 'go' again without setting a new position it
is assumed that the previous setup is preserved, but
this is not the case because what happens is that
SetupStates is reset to NULL, leading to a crash as
soon as RootPos.is_draw() is called because st->previous
is now stale.
UCI protocol is not very clear about requiring that a
position is setup always before launching a search,
so here we easy the life of GUI developers assuming
that the current state is preserved after returning
from a 'stop' command.
Bug reported by Gregor Cramer.
No functional change.
A small number of tests with simulated
annealing at 15s indicated these values
may be better
And this is verified at long 60+0.05 TC
LLR: 2.95 (-2.94,2.94)
Total: 40658 W: 7821 L: 7501 D: 25336
bench: 4931544
This patch is the sum of:
- Grainsize of 4 instead of 8
- Removing "depth < DEPTH_ZERO"
- Change DEPTH_QS_RECAPTURES = -5 to -7
All the patches individually failed to pass SPRT but scored
around 50%.
Together they pass easily short TC:
LLR: 2.96 (-2.94,2.94)
Total: 4429 W: 964 L: 844 D: 2621
And with some difficult long TC of 60+0.05:
LLR: 2.95 (-2.94,2.94)
Total: 64133 W: 11968 L: 11532 D: 40633
bench: 4821467
Add MOVE_NONE at the tail, this allows to loop
across MoveList checking for *it != MOVE_NONE,
and because *it is used imediately after compiler
is able to reuse it.
With this small patch perft speed increased of 3%
And it is also a semplification !
No functional change.
Most of the time we cut-off earlier, at captures, so this
results in useless work.
There is a small functionality change becuase 'ss' can change
from MovePicker c'tor to when killers are tried due, for
instance, to singular search.
bench: 4603795
Very good at long 60"+0.05 TC
LLR: 2.95 (-2.94,2.94)
Total: 5954 W: 1151 L: 1016 D: 3787
[edit: slightly changed form original patch to avoid useless loop
across killers when killer is MOVE_NONE]
bench: 4327405
Performed more or less well at short TC
LLR: 2.95 (-2.94,2.94)
Total: 50517 W: 9815 L: 9574 D: 31128
And a bit better at long TC
LLR: 2.96 (-2.94,2.94)
Total: 15564 W: 2805 L: 2624 D: 10135
bench: 4375253
It seems that do not limiting checking the
trapped rook only on rank 1 improves the
score.
At long TC
LLR: 2.97 (-2.94,2.94)
Total: 6581 W: 1346 L: 1204 D: 4031
bench: 4985012
Don't update refutation table in case of
previous move is MOVE_NULL or MOVE_NONE
and don't try refutation if is already
a killer move.
Pass both short TC
LLR: 2.96 (-2.94,2.94)
Total: 4310 W: 953 L: 869 D: 2488
And long one
LLR: 2.95 (-2.94,2.94)
Total: 6707 W: 1254 L: 1184 D: 4269
bench: 4785954
Very good result both at short TC 15+0.05
LLR: 2.95 (-2.94,2.94)
Total: 2803 W: 596 L: 483 D: 1724
And at long TC 60+0.05
LLR: 2.95 (-2.94,2.94)
Total: 2862 W: 548 L: 431 D: 1883
bench: 4329221
Unortunatly we have no guarantee that the call to
operator~(Color c) is resolved at compile time.
Perhaps the solution would be to use C++11 const_expr,
but for now simply use the good old-style ternary operator
that works as expected.
No functional change.
A rook is trapped if on rank 1 as is the king.
Currently the condition aloows for the rook
to be also in front of the pawns as long
as king is on first rank.
Verified with short TC test:
LLR: -1.71 (-2.94,2.94)
Total: 23234 W: 4317 L: 4317 D: 14600
Here what it counts is that after 23K games
result is equal.
bench: 4696542
When it is already defined(_WIN32).
According to Microsoft documentation:
http://msdn.microsoft.com/en-us/library/b0084kay.aspx
_WIN32 Defined for applications for Win32 and Win64. Always defined.
_WIN64 Defined for applications for Win64.
Patch suggested by Joona.
No functional change.
This info is normally printed together with
PV info in uci_pv() but when search is stopped,
for instance when max search time is reached,
uci_pv is not called and we miss this bits.
Suggested by gravy_train
No functional change.
But this time do not play with pointers, in
particular do not assume that size_t is an
unsigned type of the same width as pointers.
This code should be fully portable.
No functional change.
This fixes an assert while testing with debug on.
Assert was due to static null pruning returning value
eval - futility_margin(depth, (ss-1)->futilityMoveCount)
That was sometimes higher than VALUE_INFINITE triggering
an assert at the caller site.
Because eval con be equal to ttValue and anyhow is read from
TT that can be corrupted in SMP case, we need to sanity
check it before to use.
bench: 4176431
Let TT clusters (16*4=64 bytes) to hold on a singe cache line.
This avoids the need for the double prefetch.
Original patches by Lucas and Jean-Francois that has also tested
on his AMD FX:
BIG HASHTABLE
./stockfish bench 1024 1 18 > /dev/null
Before:
1437642 nps
1426519 nps
1438493 nps
After:
1474482 nps
1476375 nps
1475877 nps
SMALL HASHTABLE
./stockfish bench 128 1 18 > /dev/null
Before:
1435207 nps
1435586 nps
1433741 nps
After:
1479143 nps
1471042 nps
1472286 nps
No functional change.
Crash is due to uninitialized ss->futilityMoveCount that
when happens to be negative, yields to an out of range
access in futility_margin().
Bug is subtle because it shows itself only in SMP case.
Indeed in single thread mode we only use the
Stack ss[MAX_PLY_PLUS_2];
Allocated at the begin of id_loop() and due to pure
(bad) luck, it happens that for all the MAX_PLY_PLUS_2
elements, ss[i].futilityMoveCount >= 0
Note that the patch does not prevent futilityMoveCount
to be overwritten after, for instance singular search
or null verification, but to keep things readable and
because the effect is almost unmeasurable, we here
prefer a slightly incorrect but simpler patch.
bench: 4311634
Instead of a pointer. This should fix the issue of
remaining with a stale pointer when for instance calling
IID, but also null search verification, singular search
and razoring where we call search with the same ss
pointer. In this case ss->ei is overwritten in the
search() call and upon returning remains stale.
This patch could have a performance hit because Eval::Info
is big (176 bytes) and during splitting we copy 4 ss entries.
On the good side, this patch is a clean solution.
Proposed by Gary.
No functional change.
Allow to use EvalInfo struct, populated by
evaluation(), in search.
In particular we allocate Eval::Info on the stack
and pass a pointer to this to evaluate().
Also add to Search::Stack a pointer to Eval::Info,
this allows to reference eval info of previous/next
nodes.
WARNING: Eval::Info is NOT initialized and is populated
by evaluate(), only if the latter is called, and this
does not happen in all the code paths, so care should be
taken when accessing this struct.
No functional change.
The idea is to fail high more easily in static
null test if in the parent node we are already
very deep in the move list, so the propability
to fail high there is very low.
[edit: I have slightly changed the functionality
moving
ss->futilityMoveCount = moveCount;
At the end of the pruning code, this should not affect
ELO in anyway, but makes code more natural and logic]
Test with SPRT is good at 15+0.05
LLR: 2.96 (-2.94,2.94)
Total: 50653 W: 10024 L: 9780 D: 30849
And at 60+0.05
LLR: 2.97 (-2.94,2.94)
Total: 40799 W: 7227 L: 6921 D: 26651
bench: 4530093
When we use sysconf(_SC_NPROCESSORS_ONLN) to get number of
cores, we have to include sysconf library that is unistd.h
Sometimes it happens to work just becuase unistd.h indirectly
included by some other libraries, but not always.
Reported and fixed by Eyal BD
No functional change.
The idea is to penalize a bishop in case of
its pawns are on the same colored squares.
Good at short 15"+0.05 TC
LLR: 2.95 (-2.94,2.94)
Total: 4252 W: 925 L: 806 D: 2521
And at longer 60"+0.05 TC
LLR: 2.95 (-2.94,2.94)
Total: 15006 W: 2743 L: 2564 D: 9699
bench: 5274705
It seems stronger both at fast 15+0.05 TC with fixed game number test:
ELO: 2.74 +-2.7 (95%) LOS: 97.6%
Total: 24000 W: 4698 L: 4509 D: 14793
And also at long 60+0.05 TC with SPRT
LLR: 3.05 (-2.94,2.94)
Total: 38986 W: 6845 L: 6547 D: 25594
bench: 5157061
Use an optinal argument instead of a template
parameter. Interestingly, not only is simpler,
but also faster, perhaps due to less L1 instruction
cache pressure because we don't duplicate the very
used SEE code path.
No functional change.
Rationale:
- Current settings seem to make engine *significantly* weaker in analysis mode.
- In practice this setting only has effect when king safety scores are high.
- Even in analysis mode its far more important to know if one side is getting mated,
rather than get evaluation correct with 1cp accuracy.
No functional change
According to Jean-Paul this setup should be stronger
than default.
And SPRT test seems to confirm it:
At fast TC 15"+0.05
ELO: 3.33 +-2.7 (95%) LOS: 99.2%
Total: 25866 W: 5461 L: 5213 D: 15192
At longer TC 60"+0.05
ELO: 7.27 +-5.0 (95%) LOS: 99.8%
Total: 6544 W: 1212 L: 1075 D: 4257
bench: 5473339
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Increasing depth limit to 10 plies seems stronger
after 16K games at 15"+0.05 (ELO: +3.56) and also
repeating the test at 60"+0.05 TC:
ELO: 2.08 +-3.1 (95%) LOS: 90.9%
Total: 16000 W: 2641 L: 2545 D: 10814
Moreover setting the limit to 12 is proved stronger
then limit set to 10 by direct SPRT test at 15"+0.05:
ELO: 2.56 +-2.0 (95%) LOS: 99.5%
Total: 46568 W: 9240 L: 8897 D: 28431
So we directly set the limit to 12, the strongest setup.
bench: 4361224
Master IID formula is depth / 2
Previous patch is depth - 4 * ONE_PLY
This one is the middle way:
(dept/2 + depth-4*ONE_PLY)/2 -> depth-2*ONE_PLY-depth/4
After 16000 games at 60+0.05 th 1
ELO: 4.08 +-3.1 (95%) LOS: 99.5%
Total: 16000 W: 2742 L: 2554 D: 10704
bench: 4781239
Same idea of 5af8179647
in qsearch() but applied to search()
After 15500 games at 15+0.05
ELO: 4.48 +-3.4 (95%) LOS: 99.5%
Total: 15500 W: 3061 L: 2861 D: 9578
bench: 4985829
Always before pruning the move, it's important to check that:
bestValue > VALUE_MATED_IN_MAX_PLY
See example position:
8/2p1p3/P1NpP3/3k4/1P1BN3/2P1P3/2Q5/6K1 w - - 0 1
http://support.stockfishchess.org/discussions/problems/268-wrong-declaring-a-forced-mate-in-3-moves
This problem was present in 2.3.1, then it was fixed by my patch.
After 24000 games at 15+0.05
ELO: 2.40 +-4.4 (95%) LOS: 95.7%
Total: 24000 W: 4774 L: 4608 D: 14618
bench: 4465997
This reverts commit a24da071f0
Seems a regression when tested against 2.3.1
With this patch, have after 20000 games at 60+0.05, we have
ELO: 13.42 +-4.8 (95%) LOS: 100.0%
Total: 20000 W: 3746 L: 2974 D: 13280
Instead with the patch reverted:
ELO: 16.62 +-4.8 (95%) LOS: 100.0%
Total: 20000 W: 3816 L: 2860 D: 13324
Although we are within error bounds here we take the conservative
approach of not introducing changes that are not proved stronger
It doesn't mean that the change shall be weaker, simply that we
don't want to take any risk.
No functional change.
Here the rational seems to be that if after one try easy
move detection fails then the easy move is not so easy :-)
After 15563 games at 60+0.05
ELO: 3.04 +-5.5 (95%) LOS: 97.0%
Total: 15563 W: 2664 L: 2528 D: 10371
No functional change.
Increase MaxRatio to use more time when in trouble.
After 16000 games at 60+0.05
ELO: 4.89 +-5.4 (95%) LOS: 99.9%
Total: 16000 W: 2700 L: 2475 D: 10825
No functional change.
This seems good at short TC controls.
After 10000 games at 20+0.05
ELO: 9.56 +-6.8 (95%) LOS: 100.0%
Total: 10000 W: 1949 L: 1674 D: 6377
Testing at long TC and regression testing is still
ongoing. So this is a bit speculative commit and
could be reverted in the future.
Also re-testing at long TC the SEE pruning in PV nodes
seems less effective (perhaps even a regression, but
still ongoing) so disabled for now.
bench: 4968764
This reverts commit 0d68b523a3.
After easy move semplification this machinery is not
needed anymore (because of we don't need to know if a
root move is a recapture)
No functional change.
Detect a move as easy only if it is the only one ;-)
or if is much better than remaining ones after we
have spent 20% of search time.
Tests are ongoing, but it seems this semplification
stands. Anyhow it is experimental for now and could
be reverted/improved with further work Gary is
testing right now.
No functional change.
After previous patch if split point master is
waiting for job and "Use Sleeping Threads" is
false (our condition for official releases) then
it will lock/unlock splitPoint mutex in a super
tight loop badly affecting performance.
Rewrite the code to lock only when we are about
to finish. Note that race condition on slavesMask
is anyhow fixed.
No functional change.
There is no point searching a move that is forced.
It wastes time while allowing computer opponents to
fill hash with 100% accuracy.
[edit: Condition moved together with "easy move" ones]
Bench identical: 4922272
We detect an easy move as a recapture with an
high margin on the second best move.
Unfortunatly the recapture detection is broken
becuase we identify as a recapture any move that
follows an opponent's previous capture !
This patch fix the logic to correctly detect a
real re-capture.
No functional change.
Note that we read shared data without lock
protection, so code is theoretically prone to
torn reads. But, first splitPoint pointer
never changes, and alpha is of integer type so
it is read in a single DWORD access.
No functional change.
This patch is actually the sum of two contributions that
have been tested independently:
1) Pruning of negative SEE moves in PV
After 10000 games at 20+0.05
ELO: 5.18 +-7 (95%) LOS: 99.2%
Total: 10000 W: 1952 L: 1803 D: 6245
2) Remove of bestValue > VALUE_MATED_IN_MAX_PLY condition
After 23000 games at 20+0.05
ELO: 1.63 +-4 (95%) LOS: 88.1%
Total: 23000 W: 4232 L: 4124 D: 14644
The whole patch as been re-tested at long TC with positive results:
After 10000 games at 60+0.05
ELO: 4.31 +-7 (95%) LOS: 98.3%
Total: 10000 W: 1765 L: 1641 D: 6594
kf = (kf == FILE_A) ? kf++ : ....
is tricky becuase kf is updated twice and it happens
to do the right thing just by accident.
Rewrite in a better way.
Spotted by pdimov
No functional change.
Give a bonus if a bishop can pin a piece or can
give a discovered check through an x-ray attack.
Seems good after 24000 games at 15"+0.05 (single thread):
ELO: 12.30 +- 99%: 5.79 95%: 4.40 LOS: 100.00%
Total: 24000 W: 4931 L: 4082 D: 14987
bench: 4917064
Rename startPosPly to gamePly and increment/decrement
the variable after each do/undo move. This adds a little
overhead in do_move() but we will need to have the
game ply during the search for the next patches now
under test.
Currently we don't increment gamePly in do_null_move()
becuase it is not needed at the moment. Could change
in the future.
As a nice side effect we can now remove an hack in
startpos_ply_counter().
No functional change.
Another trick, along the same lines of previous
patch. This time we first check positions with
white side to move that, becuase we start with
pawn on rank 7, are easily classified as wins,
then black ones.
Number of cycles reduced to 15 !
Becuase now it is faster we can remove a lot of
code to detect theoretical draws. We will calculate
them anyhow, although a bit slower, but the speed
up trick more than compensates it.
Verified that generated bitbases match original ones.
No functional change.
Change the way the index is coded so that
now looping from 0 to IndexMax generates
the pawns from RANK_7 down to RANK2.
Becuase positions with pawns at RANK_7
are easily classified as wins/draws, this
small trick allows to reduce the number
of needed iterations from 30 down to 26!
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Rename to insertion_sort so to avoid confusion
with std::sort, also move it to movepicker.cpp
and use the bit slower std::stable_sort in
search.cpp where it is used in not performance
critical paths.
No functional change.
We can encode the ClusterSize directly in the
hashMask, this allows to skip the left shift.
There is no real change, but bench number is now
different because instead of using the lowest order
bits of the key to index the start of the cluster,
now we don't use the last two lsb bits that are
always set to zero (cluster size is 4). So for
instance, if 10 bits are used to index the cluster,
instead of bits [9..0] now we use bits [11..2].
This changes the positions that end up in the same
cluster affecting TT hits and so bench is different.
Also some renaming while there.
bench: 5383795
Do a 32bit bitwise 'and' instead of a 64bit
subtract and bitwise 'and'.
This is possible because even in the biggest
hash table case (8GB) the number of entries
is 2^29 so storable in an unsigned int.
No functional change.
Save the current active position in each Thread
instead of keeping a centralized array in struct
SplitPoint.
This allow to skip a memset() call at each split.
No functional change.
Obsolete renmant of when position was directly
passed to the search instead of being copied
for the main thread as is now.
From Jundery.
No functional change.
The syntax splitPoints() should force the compiler to
value-initialize the array and because there is no
user defined c'tor it falls back on zero-initialization.
Unfortunatly this is broken in MSVC compilers, because
value initialization for non-POD types is not supported,
so left splitPoints un-initialized and add in split()
initialization of slavesPositions, that is the only
member not already set at split time.
This fixes an assert under MSVC when running with
more than one thread.
Spotted and reported by Jundery.
No functional change.
It is a draw if pawns are on G or B files, weaker pawn is
on rank 7 and bishop can't attack the pawn.
No functional change (because it is very rare and does not appear in bench)
To return a pointer to the available
thread instead of a bool. This allows
to simplify the core loop in split().
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
This function "returns" two values: bestValue and bestMove
Instead of returning one and passing as pointer the other
be consistent and pass as pointers both.
No functional change.
Currently a ttMove is reduced with ss->reduction = DEPTH_ZERO,
so it is actually not reduced (as it should be), but the
trick works just becuase it happens that ttMove is the first
to be tried and
reduction(depth, 1)
Always returns zero. So explicitly forbid reduction of ttMove
in the LMR condition. This is much clear and self-documented.
No functional change.
Surprisingly this rare case was not considered
when scoring a capture.
Also take in account that in the promotion case
we gain a new piece (typically a queen) but we
lose the promoting pawn.
These small issues were present since Glaurung times!
Found while browsing DiscoCheck sources
bench: 5400063
Handling of History and Gains is almost the same, with
the exception of the update logic, so unify both
classes under a single Stats struct.
No functional change.
And handle the castle directly in do/undo_move().
This allow to greatly simplify the code.
Here the beast is the nasty Chess960 that is
really tricky to get it right because could be
that 'from' and 'to' squares are the same or
that king's 'to' square is rook's 'from' square.
Anyhow should work: verified on all Chess960
starting positions.
No functional and no speed change also in Chess960.
Use a more traditional approach, along the same lines
of do_move().
It is true that we copy more in do_null_move(), but we
save the work in undo_null_move(). Speed test shows the
new code to be even a bit faster.
No functional change.
We have only one call place so inline its content.
BTW, function is already declared as FORCE_INLINE.
Also some small refactoring while there.
No functional change.
When a thread is allocated a bit is set in slavesMask.
This bit corresponds to the thread's index field that,
because it happens to be the position in the threads
array, eventually it is equal to the loop index 'i'.
But instead of relying on this 'coincidence', explicitly
use the 'idx' field so to clarify slavesMask usage.
Backported from c++11 branch.
No functional change.
Intel Compiler has 'invented' this pearl:
warning #1476: field uses tail padding of a base class
Just becuase we have subclassed MainThread and added
the field 'bool thinking'.
Pure nosense. Silence the warning.
No functional change.
Fix again TimerThread::idle_loop() to prevent a
theoretical race with 'exit' flag in ~Thread().
Indeed in Thread d'tor we raise 'exit' and then
call notify() that is lock protected, so we
have to check again for 'exit' before going to
sleep in idle_loop().
Also same change in Thread::idle_loop() where we
now check for 'exit' before to go to sleep.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Great code simplification: - instead do not futility
prune threat refutations. allows_move() is therefore removed.
4000 games at 50,000 nodes/move:
1085-989-1926 [51.2%] LOS=98.3%
4000 games in 10"+0.1"
756-751-2493 [50.1%] LOS=55.1%
EDIT: I have retested the patch of Lucas in a slightly different form
(without pruning in PvNode) and test mre or less confirms that
60 lines of code are totally unuseful:
After 6195 games at 15"+0.05"
1333 - 1325 - 3537 ELO 0
bench 5140990
Silly logic bug introduced in dda7de17e7
Timer thread, when msec = 0, instead of going
to sleep, calls check_time() in an endless loop.
Spotted and reported by snino64 due to abnormally
high CPU usage.
No functional change.
Subclass MainThread and TimerThread and declare
idle_loop() virtual. This allow us to cleanly
remove a good bunch of hacks, relying on C++
polymorphism to do the job.
No functional change.
Rename it is_finished and use it only in main
thread to signal search is finished. This allows
us to simplify the complex SMP logic.
Ultra tricky patch: deep test is required under
wide conditions like pondering on and option
"Use Sleeping Threads" set to false.
No functional change.
This reverts commit 869c924410
I misunderstood here. Actually it can happen that
thread is created but still not entered idle_loop
and at the same time start_searching() is called.
Becuase 'do_sleep' is set start_searching() will
set it to false and start the search, but when,
at last, the thread enters idle_loop(), resets
the flag and goes to sleep: not what we want.
Revert the hack waiting for a better solution
in the next patches.
No functional change.
Finally we can now merge the 'ponderhit' case with
'stop' and 'quit'.
The patches have been done step by step to help debugging
becuase this is really tricky code.
No functional change.
Reset Limits.ponder only if search continue, but if
we are going to stop the search there is no need
(and is also confusing) to clear the 'ponder' flag.
This mimics the behaviour upon rceiving 'stop' when
pondering.
No functional change.
In SAN notation when two pieces of the same type can
move to a given destination square, a disambiguation
additional info (like starting file) shall be added
to the SAN move.
If one of the two pieces is pinned, the corresponding
move _could_ be illegal and in this case disambiguation
is not needed. But to be pinned alone it is not enough
to deduce that the move is illegal, for instance in this
position:
R3rk2/2r6/8/8/8/8/8/K7 b - - 0 1
The move Rc8 is ambiguous although the rook in e8 is pinned
and the correct SAN notation should be Rcc8.
No functional change.
Don't wait for the search to finish after a 'stop'
command, but keep processing the GUI input if any.
Also explicitly wake up the main thread (that could be
sleeping) after a 'stop' or 'quit' command and do not
rely on wait_for_search_finished() doing it for us.
This patch cleans up the code and functions's definitions,
but it is risky and needs a good test under different
conditions to be sure it does not introduces hungs up.
No functional change.
Revert patch c581b7ea36
Seems a regression after testing from Gary:
ELO: 7.24 +- 99%: 17.03 95%: 12.93
LOS: 97.86%
Wins: 439 Losses: 381 Draws: 1962
And mine:
After 5410 games at 15"+0.05
Wins: 936 Losses: 1141 Draws: 3333 ELO -13
Moreover we know that there is a regression in the range
of patches which include the fromNull patch.
Probably this is not the only regression since 2.3.1 and
perhaps the idea under fromNull is good, but at the moment,
while in deep regression hunting, better to be on the safe
side and revert it entirely.
My guess on why this is a regression is that using the
negated evaluation of previous ply in case of null search
fails to take in account the king safety asymmetry between
the two colors. This is of course just a guess.
bench 5503830
They are not self-describing and create a lot of user
requests about them.
Given that the values are already well tuned there
is no need to expose them as UCI options.
No functional change.
Sometimes is faster, but not always and on very long mates
produces strange scores probably due to truncation of PV
artifacts.
So simply perform normal search also in case of UCI 'mate x'
command, with the only difference that when a mate in x is
found search returns immediately.
No functional change.
Now that we can call 'bench' command also from
interactive terminal it makes no more sense to
exit the application if the user types a wrong
file name.
No functional change.
Since &-ing with SpaceMask restricts the set to the home
half of the board, it is possible to use just one popcount
instead of 2 by shifting "safe" to the other half of the
board. This gives a small speedup especially on systems
where hardware popcount is not available.
Patch kindly sent by Richard Vida.
No functional change.
It is now possible to run SF on a 'mate in x' testsuite.
For instance in case of a file with fen strings of
positions with mate in 10 we can now 'bench' on it:
stockfish bench 128 1 10 mate_in_10.epd mate
No functional change.
Following a user request I added the handling of UCI:
go mate x
Currently we just return from a PV node if x moves have been
done. Probably not the best approach. I have looked at Fruit/Toga
sources and there is even simpler: engine falls back on a fixed
depth search.
No functional change.
And return on using TT as backing store for position
evaluations.
Tests (even on single thread) show eval cache was a regression.
In multi thread result should be even worst because eval cache
is a per-thread struct, while TT is shared.
After 4957 games at 15"+0.05 (single thread)
eval cache vs master 969 - 1093 - 2895 -9 ELO
So previous reported result of +18 ELO was probably due to an
issue in the testing framework (a bug in cutechess-cli) that
has been fixed in the meanwhile.
bench: 5386711
In case of null search at low depths returns a fail low
due to a threat then, rather than return beta-1 (to cause
a re-search at full depth in the parent node), we set a flag
threatExtension = true (false by default) that will cause
moves that prevent the threat to be extended of one ply
in the following search.
Idea and patch is by Lucas Braesch.
Lucas also did the tests:
1500 games in 5"+0.05":
SF_threatExtension vs SF_20121222: 366 - 331 - 803 [51.2%] LOS=90.8%
3000 games in 10"+0.1":
SF_threatExtension vs SF_20121222: 610 - 559 - 1831 [50.8%] LOS=93.2%
Tests confirmed by Gary after 10570 games,
ELO: 2.79 +- 99%: 8.72 95%: 6.63
LOS: 94.08%
Wins: 1523 Losses: 1438 Draws: 7607
And finally by me at 15"+0.05, single thread, 3824 games
threatExtension vs master 768 - 692 - 2364 +7 ELO
bench 4918443
This is difficult code becuase a bug here could lead
to very subtle crashes in case of SMP games where we
have TT move corruption due to concurrent access.
Anyhow I have fully verified te code throwing at it
random moves. It shoudl work.
No functional change.
Test by Joona prooves the new feature don't value 70 added lines of code.
Grand totals after 10040 games (crashes: 0) for tt_both
master_9edc7 - 6a93488_6a934: 1756 - 1688 - 6596 ELO +2 (+- 2.7)
Confirmed by test of Gary:
After 8680 games:
ELO: 0.80 +- 99%: 9.62 95%: 7.31
LOS: 65.38%
Wins: 1288 Losses: 1268 Draws: 6130
Thanks a lot to both for testing it !!!
bench 5149248
This is more complex than what I'd like but I
was unable to split in small chunks.
Here we add 2 slots to TTEntry (valueUpper and depthUpper)
so that sizeof(TTEntry) returns to the original 16 bytes
and we can pack exactly 4 entries in a 64 bytes cache line.
Now we save an upper bound score alongside a lower (exact)
score. The idea is to increase TT cut-offs rates becuase
there is now an higher probability for a node to use TT info.
This patch is highly experimental and probably needs further
steps as is hinted by an unrealistic bench number:
bench: 2022385
In almost all cases we already know in advance that
color_of() argument is different from NO_PIECE.
So avoid the check for NO_PIECE in color_of() and
test at caller site in the very few places where
this case could occur.
As a nice side effect, this patch fixes a (bogus)
warning under some versions of gcc.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Use an eval cache instead of TT to store node
position evaluations.
It is already an improvment and, because it frees
two TT entry slots, paves the way to extend TT to
store both upper and lower bounds.
After 4855 games, single thread, 15"+0.05
Mod vs Orig 1165 -920 - 2770 ELO +18
bench: 5149248
And document why this is an hard limit. It
seems for some (lucky) people 32 threads
are not enough.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
In search(), after we evalute the position, in case there
isn't any TT entry we create one with just the evaluation
score.
This patches removes that code. The reason becuase the patch
deserves a single commit it is becuase introduces a (very small)
functional change due to the fact that the total number of
TT stores is less now and this slightly alters the TT hits
of our benchmark.
bench: 4983262
Rely fully on eval cache. Note that we still save eval
info to TT, this is not needed at this moment and will be
removed in future patches. We keep it so to have a "non
functional change" patch.
No functional change.
With this patch series we want to introduce a per-thread
evaluation cache to store node evaluation and do not
rely anymore on the TT table for this.
This patch just introduces the infrastructure.
No functional change.
In case of a RootNode or a SpNode move has
been already checked for legality so we can
skip a redundant check.
Spotted by Frank Genot.
No functional change.
In qsearch we should update the bestValue as we do
in case of futilityValue < beta, also when pruning
moves with non-positive see.
Spotted by Lucas Braesch
Bench: 5695710
Send the PV lines to GUI only once at the end of the
PV search loop or just in case of long searches.
We need to sync also sending of "currmove" info to
avoid sending info on current move without first
informing the GUI on the PV line we are searching on.
No functional change.
At this point we have already verified (value > alpha)
and this implies, in case of a non-PV node, where search
window size is zero, that value >= beta.
This is not so self-evident, so document the code with
an assert condition.
No functional change.
Let the caller to decide where to redirect (cout or cerr) the
ASCII representation of the position. Rename the function to
reflect this.
Renamed also from_fen() and to_fen() to set() and fen() respectively.
No functional change.
In case a PvNode node has a static evaluation above alpha but
no available moves we want to flag the node as BOUND_EXACT,
not as BOUND_UPPER as is currently.
The behaviour was recently introduced with patch d471c49700
of 3/10/2012
Spotted by Hongzhi Cheng.
bench: 5558464
Both Lucas re-test and Jean-Francois confirrm it
is a regression.
Here Jean-Francois's results after 3600 games :
Score of 96d3b1c92b vs 3b87314: 690 - 729 - 2181 [0.495] 3600
ELO: -3.86 +- 99%: 14.94 95%: 11.35
LOS: 15.03%
Wins: 690 Losses: 729 Draws: 2181 Total: 3600
Bench: 5404066
Don't prune and eventually extend check moves of type
DISCO_CHECK (pun intended, Lucas will understand :-) ).
Patch from Lucas Braesch that has also tested it:
Result: 879-661-2137, score=52.96%, LOS=100.00% (time 10"+0.1")
I have started a verification test right now.
bench: 6004966
From Jean-Francois's
Final result after 5000 games :
Score of c581b7e vs a878312: 1163 - 970 - 2867 [0.519] 5000
ELO: 13.35 +- 99%: 12.71 95%: 9.65
LOS: 100.00%
Wins: 1163 Losses: 970 Draws: 2867 Total: 5000
From me
After 3266 games at 20"+0,05
Score of c581b7e vs a878312: 612 - 607 - 2047
So no regression at longer TC and perhaps a little gain at
fast TC.
In this case we try a rather drastic
approach: we simply don't futility prune
in qsearch when arriving from a null move.
So we save evaluating and also save to mess
with eval margins at all because margin is used
only in futility.
Also accuracy should not be affected, actually it
improves because we don't prune anything anymore.
bench: 5404066
Performs well at very short TC of 40/4+0.05 (courtesy of Jean-Francois):
Wins: 2503 Losses: 2146 Draws: 5581 Total: 10230 +12 ELO
But is poor at longer TC of 20"+0.05
Wins: 321 Losses: 373 Draws: 1141 Total: 1808 -10 ELO
The patch was clearly a tradoff between speed and accuracy and
the most interesting part of it are test results that can be
commented as follows:
- A short TC is very sensible to any speed increase
- A longer TC is more sensible to accuracy and less to speed
So a patch that does not change speed is suitable to be tested at
short TC, while a speed/accuracy compromise patch is IMO better to
be tested at longer TC to verify loss of accuracy can be tolerated.
In this case the revert is only temporary. We will come back again
once we will be able to preserve the evaluation margin.
bench: 5809010
Reuse the evaluation of the parent with inverted sign and
set margin to zero (this is an hack!).
This is done only in qsearch where almost 15% of calls are
from a null move. In normal search the number of nodes where
(ss-1)->currentMove == MOVE_NULL
is almost zero and so there is no need of using this trick.
The big advantage of this patch is a speed-up due to skipped
evaluate() calls, that are very costly.
Functionality is of course affected and we will need to proper
test it later. For now we just register a 3-4% speed up.
Suggested by Hongzhi Cheng.
bench: 5051328
When testing if a move blocks the threat path there is no
reason to require the threat to be a slider. Indeed threat
can be a double pawn push like in this example:
r1bq1rk1/ppp1np1p/4n1p1/3p4/3P2Q1/2P1B3/PPBN2PP/R4RK1 w - - 0 16
Where white's move Rf6 blocks the threat f5.
As a nice side effect we can retire the now useless helper
piece_is_slider().
This patch kicks in only very rare cases, indeed the bench is
still the same!
bench: 5809010
There is no guarantee that split() consumes all the node's
moves. Indeed split() can return without performing any job
for instance because MAX_SPLITPOINTS_PER_THREAD is reached
or becuase no available threads are found (this latter case
is much more common).
So search must continue in those cases and we cannot force
exiting from move's loop.
Bug introduced by 1ac417edb8 of 5/10/2012
Spotted by Frank Genot.
No functional change.
If a previous move attacks the king (with the piece
of the threat move removed) then must be a discovered
check, otherwise it means that first move gave check
and we were not able to do a null move.
Also renamed stuff to better document the function's
context.
No functional change.
When testing if a piece is moving through the squares
vacated by a previous move there is no reason to require
the piece to be a slider, indeed we can have a double
pawn push like in this example:
r1q2rk1/2p1bppp/2Pp4/pN5b/Q1P1p3/4B2P/PP1R1PP1/1K5R w - - 3 18
Where black's move f5 is connected to previous move Be7 that
frees the path.
Or we can have a castle move:
r1bqkb1r/pppp1ppp/2n1pn2/1B6/4P3/2N2N2/PPPP1PPP/R1BQK2R b KQkq - 5 1
Where a previous move Bb5 allows the white to castle king side.
This time patch is mine ;-)
new bench: 5809010
We send to GUI multi-pv info after each cycle,
not just once at the end of the PV loop. This is
because at high depths a single root search can
be very slow and we want to update the gui as
soon as we have a new PV score.
Idea is good but implementation is broken because
sort() takes as arguments a pointer to the first
element and one past the last element.
So fix the bug and rename sort arguments to better
reflect their meaning.
Another hit by Hongzhi Cheng. Impressive!
No functional change.
When checking if the moving piece p1 in a previous
move m1 defends the destination square of a move m2
we have to use the occupancy with the from square of
m2 removed so to take in account the case in which
f2 will block an x-ray attack from p1.
For instance in this position:
r2k3r/p1pp1pb1/qn3np1/1N2P3/1p3P2/2B5/PPP3QP/R3K2R b KQ - 1 9
The move eXf6 is connected to the previous move Bc3 that
defends the destination square f6.
With this patch we have about 10% more moves detected as
'connected'. Anyhow the absolute number is very low, about
4000 more moves out of 6M nodes searched.
Another issue spotted by Hongzhi "Hawk Eye" Cheng ;-)
new bench: 5757373
On Intel, perhaps due to 'lea' instruction this way of
zeroing the lsb of *b seems faster than a shift+negate.
On perft (where any speed difference is magnified) I
got a 6% speed up on my Intel i5 64bit.
Suggested by Hongzhi Cheng.
No functional change.
Compiler complies that 'cnt' is initialized but
unused (in !CheckThreeFold case). Moving the
definition of 'cnt'out of the loop seems to do
the trick.
No functional change.
Instead of use a variable so to resolve many conditions
already at compile time. In quiesce is also where we
have most of the InCheck nodes and is one of the most
performance critical code paths.
Speed up of 1.5% with Clang and 1% with gcc
Suggested by Hongzhi Cheng.
No functional change.
When checking if a move defends the threatened piece
we correctly remove from the occupancy bitboard the
moved piece. This patch removes from the occupancy also
the threatening piece so to consider the cases of moves
that defend the threatened piece x-raying through the
threat move.
As example in this position:
r3k2r/p1ppqp2/Bn4p1/3p1n2/4P1N1/5Q1P/PPP2P1P/R3K2R w KQkq - 1 10
The threat black move is dxe4. With this patch we include
(and so don't prune) white's Bb7 that would be pruned otherwise.
The number of affected position is very low, around 1% of cases,
so we don't expect ELO changes, neverthless this is the logical
and natural thing to do.
Patch suggested by Hongzhicheng.
new bench: 5323798
ReducedStateInfo is a redundant struct that is also
prone to errors, indeed must be updated any time is
updated StateInfo. It is a trick to partial copy a
StateInfo object in do_move().
This patch takes advantage of builtin macro offsetof()
to directly calculate the number of quad words to copy.
Note that we still use memcpy to do the actual job of
copying the (48 bytes) of data.
Idea by Richard Vida.
No functional and no performance change.
Only in not performance critical code like pretty_pv(),
otherwise continue to use the good old C-style arrays
like in extract/insert PV where I have done some code
refactoring anyhow.
No functional change.
Silly typo (introduced in e304db9d1e) completely
messed up move notation in case of promotions causing
"Illegal move" warning in cutechess-cli.
Reported by Jörg Oster.
No functional change.
In multi-threads runs with debug on we experience some
asserts due to the fact that TT access is intrinsecally
racy and its contents cannot be always trusted so must
be validated before to be used and this is what the
patch does.
No functional case.
And restore old behaviour of not returning from a RootNode
without updating RootMoves[].
Also renamed is_draw() template parameters to reflect a
'positive' logic (Check instead of Skip) that is easier
to follow.
New bench: 5312693
When signal 'stop' is raised we return bestValue
that could be still set at -VALUE_INFINITE and
this triggers an assert. Fix it by returning
a value we know for sure is not +-VALUE_INFINITE.
Reported by 平岡拓也 Hiraoka.
No functional change.
Note that the actual pickup is done in the class
d'tor so to be sure it is always triggered, even
in case of a sudden exit due to a 'stop' signal.
No functional change.
Function move_to_san() requires the Position to be
passed by referenced because a do/undo move is done
inside the function to detect a possible mate and to
add to the san string the corresponding '#' suffix.
Instead of passing a copy of current position pass
directly the original position object after const
casting it. This has the advantage to avoid a costly
Position copy, on the down side a bench test could
report different searched nodes if print(move) is
used, due to the additionals do_move() calls.
No functional change.
Existing Makefile is buggy for PowerPC, it has no
SSE, yet it is given it if Prefetch is enabled,
because it isn't ARMv7.
Patch from Matthew Brades.
No functional change.
We can have ttValue == VALUE_NONE when we use a TT
slot to just save a position static evaluation, but
in this case we also save DEPTH_NONE so to avoid
using the ttValue in search. This happens to work,
but due to a number of lucky and tricky cases that
we now documnet through a bunch of asserts and a
little change to value_from_tt() that has no real
effect but clarifing the code.
No functional change.
On some platforms 128 MB of RAM for TT is too much,
so run 'bench' with the default 32 MB size.
No functional change although of course now 'bench'
reports a different number: 5545018
Use popcount() instead in the only calling place.
It is used only at initialization so there is no
speed regression and anyhow even initialization
itself is not slowed down: magic bitboard setup
stays around 175 msec on my slow 32bit Core Duo.
No functional change.
Simplify the code and doing this introduce a couple
of (very small) functional changes:
- Always compare to depth even in "mate value" condition
- TT cut-off in qsearch also in case of PvNode, as in search
Verified against regression with 2500 games at 30"+0.05
on 2 threads: 451 - 444 - 1602
Functional changed: new bench is 5544977
As Joona says: "The problem is that when doing full
window search (-VALUE_INFINITE, VALUE_INFINITE), and
pruning all the moves will return fail low which is
mate score, because only clause touching alpha is
"mate distance pruning". So we are returning mate score
although we are just pruning all the moves. In reality
there probably is no mate in sight.
Bug spotted and fixed by Joona.
Implement lsb/msb using armv7 assembly instructions.
msb is the easiest one, using a gcc intrinsic that generates
code using the ARM's clz instruction. lsb is also using this
clz instruction, but with the help of ARM's 'rbit' (bit
reversing) instruction. This leads to a >2% speed gain.
I also renamed 'arm-32' to the more meaningfull 'armv7' in the Makefile
No functional change.
Port to qsearch() the same changes we recently
added to search().
Overall this search refactoring series shows
almost 2% speed up on gcc compile.
No functional change.
When using the Makefile (as for the mingw case),
IS_64BIT and USE_BSFQ are already set with
ARCH=x86-64 and do not need to be redefined.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
First disable Contempt Factor during analysis, then
calculate the modified draw score from the point of
view of the player, so from the point of view of
RootPosition color.
Thanks to Ryan Taker for suggesting the fixes.
No functional change.
These kind of arch specific code is really nasty
to make it right becuase you need to verify on
all the platforms.
Now should compile properly also on ARM
Reported by Jean-Francois.
No functional change.
In particular on ARM processors. Original patch by
Jean-Francois, sligtly modified by me to preserve
the meaning of NO_PREFETCH flag.
Verified with gcc, clang and icc that prefetch instruction
is correctly created.
No functional change.
A simplification and also a small speed-up of
about 1% mainly due to reducing calls to
thisThread->cutoff_occurred().
Worst case split point recovering time after a
cut-off occurred is limited to 3 msec on my slow
PC, and usually is below 1 msec, so it seems safe
to remove the cutoff_occurred() check.
No functional change.
This is very crude and very basic: simply in case
of a draw for repetition or 50 moves rule return
a negative score instead of zero according to the
contempt factor (in centipawns). If contempt is
positive engine will try to avoid draws (to use
with weaker opponents), if negative engine will
try to draw. If zero (default) there are no changes.
No functional change.
Use a value related to PawnValue instead.
This is a different patch from previous one because
could affect game play and skill levels, although
in a mostly unmeasurable way. Indeed thresold has
been raised so easy move is a bit harder to trigger
and skill level is a bit more prone to blunders.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Show the real value in the code, not hide it
behind a variable name, especially when there
is only once occurence.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Extend for an extra half-ply in case the node is (probably)
going to fail high. In this case the added overhead is limited.
A novelity is the way this patch has been tested: Always in
self-play but with a much longer TC to allow the singular
extension to fully kick in and also (my impression) to have
less noisy results.
Ater 1015 games on my QUAD at 60"+0.05
Mod vs Orig 173 - 150 - 692 ELO +8
Handle also the SMP case. This has been quite tricky, not
trivial to enforce the node limit in SMP case becuase
with "helpful master" concept we can have recursive split
points and we cannot lock them all at once so there is the
risk of counting the same nodes more than once.
Anyhow this patch should be race free and counted nodes are
correct.
No functional change.
As long as isPvMove (renamed to pvMove) is set after
legality check, we can postpone legality even in PV case.
Patch aligns the PV case with the common non-pv one.
No functional change.
Patch and tuning by Gary Linscott from an idea of Ryan Taker.
Double tested by Gary:
Wins: 3390 Losses: 2972 Draws: 11323
LOS: 99.999992%
ELO: 8.213465 +- 99%: 6.746506 95%: 5.124415
Win%: 51.181792 +- 99%: 0.969791 95%: 0.736740
And by me:
After 5612 games 1255 1085 3271 +11 ELO
We have a crash with this position:
rkqbnnbr/pppppppp/8/8/8/8/PPPPPPPP/RKQBNNBR w HAha -
What happens is that even if we are castling QUEEN_SIDE,
in this case we have kfrom (B8) < kto (C8) so the loop
that checks for attackers runs forever leading to a crash.
The fix is to check for (kto > kfrom) instead of
Side == KING_SIDE, but this is slower in the normal case of
ortodhox chess, so rewrite generate_castle() to handle the
chess960 case as a template parameter and allow the compiler
to optimize out the comparison in case of normal chess.
Reported by Ray Banks.
Broken by commit a44c5cf4f7 of 3 /12 / 2011 that
was labeled "No functional change" because our 'bench'
test didn't triggered that particular endgame. Indeed
we need to run a specific bench on a set of endgames
position when touching endgame.cpp because normal bench
does not cover endgames properly.
Found by MSVC 2012 code analyzer.
It seems Intel is unable to properly workout templates with 'static'
storage specifier.
Workaround using an anonymous namespace instead.
No functional change.
Currently we exit the loop when
abs(bestValue) >= VALUE_KNOWN_WIN
but there is no logical reason for this. It seems more
natural to re-search again with full open window.
This has practically no impact in most cases, we have a
'no functional change' running 'bench' command.
The trick here is to check for legality only in the
(rare) cases we have pinned pieces or a king move
or an en-passant.
This trick is able to increase the speed of perft
of more then 20%!
No functional change.
Simply reshuffling the code inverting the condition in next_attacker()
yields a miraculous speed up of more than 3% under gcc!
On my laptop a bench run goes from 320Knps to 330Knps
No functional change.
This allows to reduce the scanning for new X-ray attacks
according to the capturing piece type.
It seems to be just a very small speed increase in MSVC 64
bit and gcc 32 bit, I guess cache issues value more than some
instruction less to execute (as usual).
No functional change.
It is very difficult and risky to assure
that a running thread doesn't access a global
variable. This is currently true, but could
change in the future and we don't want to rely
on code that works 'by accident'. The threads
are still running when ThreadPool destructor is
called (after main() returns) and this could
lead to crashes if a thread accesses a global
that has been already freed. The solution is to
use an exit() function and call it while we are
still in main(), ensuring global variables are
still alive at threads termination time.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
When many threds concurrently print you need to serialize
the access to std::cout to avoid output lines are intermixed
with the contents of each thread.
This is not strictly needed at the moment because
only main thread prints out, although some ad-hoc
test could trigger UCI::loop() printing while searching.
Anyhow we want to lift this pretty avoidable constrain
also as a prerequisite for future work.
This patch just introduces the support, next one will enable
the serialization.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Before the search we setup the starting position doing all the
moves (sent by GUI) from start position to the position just
before to start searching.
To do this we use a set of StateInfo records used by each
do_move() call. These records shall be kept valid during all
the search because repetition draw detection uses them to back
track all the earlier positions keys. The problem is that, while
searching, the GUI could send another 'position' command, this
calls set_position() that clears the states! Of course a crash
follows shortly.
Before searching all the relevant parameters are copied in
start_searching() just for this reason: to fully detach data
accessed during the search from the UCI protocol handling.
So the natural solution would be to copy also the setup states.
Unfortunatly this approach does not work because StateInfo
contains a pointer to the previous record, so naively copying and
then freeing the original memory leads to a crash.
That's why we use two std::auto_ptr (one belonging to UCI and another
to Search) to safely transfer ownership of the StateInfo records to
the search, after we have setup the root position.
As a nice side-effect all the possible memory leaks are magically
sorted out for us by std::auto_ptr semantic.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Should be more clear from where the 'magic' numbers
come from.
Also bit of reformat while there.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Instead of just size(). Although code is longer,
should be more immediate to understand when reading.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
To mimics C++11 std::mutex and std::condition_variable,
also rename locks and condition variables to be more
uniform across the classes.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Reduce of one instruction. It seems a tad faster on
the profiler now. Very slightly but anyhow it is a
code semplification.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
This better mimics std::vector::operator[] and
fixes a warning with MSVC 64bit.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Further reduce redundancy in move generation.
Veirifed no speed regression on MSVC, Clang and gcc.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Greatly simplify these very performace critical functions.
Amazingly we don't have any speed regression actually under
MSVC we have the same assembly for generate_moves() !
In generate_direct_checks() 'target' is calculated only
once being a loop invariant.
On Clang there is even a slight speed up.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Are used by Position but do not belong to that class,
there is only one instance of them (that's why were
defined as static), so move to a proper namespace instead.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Check we are the last slave of the split point
before to wake up the master. This should avoid
spurious wakes up.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
We can detect the split point master also from within idle_loop,
so we can call the function without parameters and remove an
overloaded member hack in Thread class.
Note that we don't need to take a lock around curSplitPoint
when entering idle_loop() because if we are the master then
curSplitPoint cannot change under our feet (because is_searching
is set and so we cannot be reallocated), if we are a slave
we enter idle_loop() only upon Thread creation and in that case
is always splitPointsCnt == 0. This is true even in the very rare
case that curSplitPoint != NULL, if we have been already allocated
even before entering idle_loop().
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Directly use the underlying std::map instead and avoid
a useless inheritance.
As a nice side-effect Options global object has now a
default c'tor avoiding possible issues with globals
initializations.
Suggested by Rein Halbersma.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Templetize MovePicker::next_move() member function instead. It
is easier and we also avoid the forwarding of MovePicker() c'tor
arguments in the common case.
Suggested by Rein Halbersma.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
First change: If Haiku is host platform, change
installation prefix to /boot/common/bin
Second change: Only link in pthreads if Haiku isn't
host platform.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
And group there all the formatting functions but
uci_pv() that requires access to search.cpp variables.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Use the helpers to format the PV info but without
writing to output stream (file or cout). Message
formatting and sending are two logically different
task.
Incidentaly reintroduce the pretty_pv() name,
from Glaurung memories :-)
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Simplifies the code and seems more natural.
We have a very small fucntional change becuase now
at PV nodes castles are extended one ply anyhow.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
With warning level 4 MSVC complains that a default
assignment operator could not be generated due to
member 'file' is a reference (warning C4512).
Use a pointer instead of a reference and move
struct Tie outisde class Logger while there.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
It seems more accurate: lsb is clear while 'first
bit' depends from where you look at the bitboard.
And fix compile in case of 64 bits platforms that
do not use BSFQ intrinsics.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Instead of -O4 option that does not work with both mingw and
Linux gcc (tested with Clang 3.1).
As reported by Reed Kotler:
Turns out that -O4 is not a valid option for clang unless you have
the proper gold linker and plugins built. That's because -O4 enables
LTO, which writes out bitcode files during the compile, and then loads
those and optimizes them during the link phase.
It requires a linker that supports LLVM's LTO. There is a plugin for
Gold available as part of LLVM.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
We have a signed integer here so let the return type
take in account that.
Found by Clang with -Weverything option.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Return a reference instead of void so to enable
chained assignments like
"p = q = Position(...);"
Suggested by Rein Halbersma.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Although a (little) functional change, we have no ELO change
but formula it is now more clear.
After 13019 games at 30"+0.05
Mod vs Orig 2075 - 2088 - 8856 ELO 0 (+- 3.4)
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
It seems the standard behaviour as implemented
in most engines although UCI protocol does not
specify what to do upon "ucinewgame" command.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Calculate the attacks only for the piece to disambiguate,
not for all.
Also some reformatting while there.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
At endgame time push the king near his pawns (actually
one of them).
Original idea is from Critter (although slightly different),
implementation is mine and is completely different from the
original, in particular it is different the algorithm to
compute the minimum distance from pawns.
After 19895 games at 15"+0.05
Mod vs Orig 3638 - 3248 - 13009 ELO +7
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Error is due to ambiguous overloading of operator^
because we have both the built-in operator^(int, int)
and the user defined operator^(Bitboard, Square).
This error does not trigger when using Makefile becuase,
due to luck, the user defined operator^(Bitboard, Square)
happens to be always defined _after_ opposite_colors() so
that compiler does not claim. But in case of Microsoft MSVC
we don't have a Makefile and the order of files compilation
is chosen by the compiler (in an unpredictable way). So it
could still happen that error is not detected (as in my case),
but in another case the order of compilation of the files could
be so that at some point both operator^ were defined before
opposite_colors() and this triggers the error.
The fix is much simpler than the explanation :-)
Reported by Quocvuong82.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Currently when we fail to open a book file, for instance
if it doesn't exsist, we leave Book::open() with ifstream
failbit set. If then the book file is added, we correctly
open it at next attempt, but failbit is still set so that
after opening we exit because ifstream::good() returns false.
The fix is to reset failbit upon exiting.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
In case a book entry has 'count' field set to 0
we crash. Spotted by Clang's static analyzer.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
So to be fully in sync with StateInfo, and move struct
to position.h, just below StateInfo.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Revision 2aac860db3 of 27 / 4 / 2012
changed can_castle() signatures from bool to int and
this broke the code that calculates polyglot hash key
in book.cpp
Instead of directly fixing the code we prefer to change
castling rights definitions to align to the polyglot ones
(as we did in previous patch). After this step we can simply
take internal castle rights as they are and use them
directly to calculate polyglot book hash key, as we do
in this patch that fixes the regression.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
To be aligned with PolyGlot book castle right definitions.
This will be used by next patch.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Early classify as known draws the positions
where white king is trapped on the rook file.
Suggested by Dan Honeycutt.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
After extensive testing (I was off line this week and let
the test go on) it seems this change is useless:
After 33968 games on a QUAD (4 threads) at 15"+0.05
Mod vs Orig 5425 - 5550 - 22993 ELO -1 (+-2.1)
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Only in case of promotion we care about an upper case
promotion piece char, so std::transform() is overkill
for the task.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Assumption: Junior sends promotions according to the side to move (ucase/lcase).
Fact: Stockfish generally handles promotion lcase.
Patch: Handling position fen input moves always with lcase promotions.
Ported back by Portfish. No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
It seems ADL lookup is broken with the STLPort library. Peter says:
The compiler is gcc 4.4.3, but I don't know how many patches they
have applied to it. I think gcc has had support for Koenig lookup
a long time. I think the problem is the type of the vector iterator.
For example, line 272 in search.cpp:
if (bookMove && count(RootMoves.begin(), RootMoves.end(), bookMove))
gives the error:
jni/stockfish/search.cpp:272: error: 'count' was not declared in this scope
Here RootMoves is:
std::vector<RootMove> RootMoves;
If std::vector<T>::iterator is implemented as T*, then Koenig lookup
would fail because RootMove* is not in namespace std.
I compile with the stlport implementation of STL, which in its vector
class has:
typedef value_type* iterator;
I'm not sure if this is allowed by the C++ standard. I did not find
anything that says the iterator type must belong to namespace std.
The consensus in this thread
http://compgroups.net/comp.lang.c++.moderated/argument-dependent-lookup/433395
is that the stlport iterator type is allowed.
Report and patch by Peter Osterlund.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Let first argument to be the 'color'. This allows to align
pos.pieces() signatures with piece_list(), piece_count() and
all the other functions where 'color' argument is passed as
first one.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Hex representation doesn't add any value in those cases.
Preserve hex representation where more self-documenting
for instance for binary masks values.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
It seems to increase SMP performances. To note that
this patch goes in the opposite direction of "Active
reparenting" where we try to reparent an idle slave
as soon as possible. Instead here we prefer to keep
it idle instead of splitting on a shallow / near the
leaves node.
After 11550 games on a QUAD (4 threads) at 15"+0.05
Mod vs Orig 1972 - 1752 - 7826 ELO +6 (+-3.6)
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Set optimization level to 4 and get a 2.564% faster binary:
Stockfish (Clang, Level 4) bench:
$ make build ARCH=osx-x86-64 COMP=clang
(Clang does not support PGO)
Average of 4 trials:
Total time (ms): 5137.5
Nodes searched: 5631135
Nodes/second: 1096084.5
Stockfish (Clang, Level 3) bench:
$ make build ARCH=osx-x86-64 COMP=clang
(Clang does not support PGO)
Average of 4 trials:
Total time (ms): 5269.25
Nodes searched: 5631135
Nodes/second: 1068679.25
Stockfish (GCC, PGO) bench:
$ make profile-build ARCH=osx-x86-64
Average of 4 trials:
Total time (ms): 5286
Nodes searched: 5631135
Nodes/second: 1065292.25
Suggestion and performance tests by Daylen Yang.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
A PvNode that givesCheck has been already granted
an extension of ONE_PLY in previous condition.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Makefile modified to support the clang compiler under Mac.
This was tested using clang 4 under Mountain Lion, but should
also work fine under Lion and possibly under Snow Leopard.
It requires the 'Xcode 4.x Command Line Tools' to be installed.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Since revision 374c9e6b63
we use also castling information to calculate king safety.
So before to reuse the cached king safety score we have to
veify not only king position, but also castling rights are
the same of the pre-calculated ones.
This is a very subtle bug, found only becuase even after
previous patch, consecutives runs of 'bench' _still_ showed
different numbers. Pawn tables are not cleared between 'bench'
runs and in the second run a king safety score, previously
evaluated under some castling rights, was reused by another
position with different castling rights instead of being
recalculated from scratch.
Bug spotted and tracked down by Balint Pfliegel
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Now that we can call bench multiple times
from command prompt we need to ensure searched
nodes remain constant across different runs.
Spotted by Blint Pfliegel.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
After 6K games at 60" + 0.1 on QUAD with 4 threads
this implementation fails to show a measurable increase,
result is well within error bar.
Perhaps with 8 or more threads resut is better but we
don't have the hardware to test. So retire for now and
in case re-add in the future if it proves good on big
machines.
The only good news is that we don't have a regression and
implementation is stable and bug-free, so could be reused
somewhere in the future.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
The check for detecting when a split point has all the
slaves still running is done with:
slavesMask == allSlavesMask
When a thread reparents, slavesMask is increased, then, if
the same thread finishes, because there are no more moves,
slavesMask returns to original value and the above condition
returns to be true. So that the just finished thread immediately
reparents again with the same split point, then starts and
then immediately exits in a tight loop that ends only when a
second slave finishes, so that slavesMask decrements and the
condition becomes false. This gives a spurious and anomaly
high number of faked reparents.
With this patch, that rewrites the logic to avoid this pitfall,
the reparenting success rate drops to a more realistical 5-10%
for 4 threads case.
As a side effect note that now there is no more the limit of
maxThreadsPerSplitPoint when reparenting.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Check for a cutoff occurred also high in
the tree and not only at current split
point.
This avoids some more wasted reparenting.
No functional chnage.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
It is more correct given what the function does. In
particular single_bit() returns true also in case of
empty bitboards.
Of course also the usual renaming while there :-)
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Instead of reparenting to oldest split point, try to reparent
to latest. The nice thing here is that we can use the YBWC
helpful master condition to allow the reparenting of a split
point master as long as is reparented to one of its slaves.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
And update master->splitPointsCnt under lock
protection. Not stricly necessary because
single_bit() condition takes care of false
positives anyhow, but it is a bit tricky and
moving under lock is the most natural thing
to do to avoid races with "reparenting".
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
In Young Brothers Wait Concept (YBWC) available slaves are
booked by the split point master, then start to search below
the assigned split point and, once finished, return in idle
state waiting to be booked by another master.
This patch introduces "Active Reparenting" so that when a
slave finishes its job on the assigned split point, instead
of passively waiting to be booked, searches a suitable active
split point and reprents itselfs to that split point. Then
immediately starts to search below the split point in exactly
the same way of the others split point's slaves. This reduces
to zero the time waiting in idle loop and should increase
scalability especially whit many (8 or more) cores.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Apart from the semplification it is now more clear that
the actual Tempo added was half of the indicated score.
This is becuase at start compute_psq_score() added half
Tempo unit and in do_move() white/black were respectively
adding/subtracting one Tempo unit.
Now we have directly halved Tempo constant and everything
is more clear.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
It is still enabled during fixed limit search so to
use it during fixed depth/nodes/time matches.
Bug reported by Daylen.
No functional changes.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Shrinking from [16] to [2][2] is able to speedup
perft of start position of almost 5% !
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Shrink dimensions of the biggest stack consumers arrays.
In particular movesSearched[] can be safely shrinked
without any impact on strenght or risk of crashing.
Also MAX_PLY can be reverted to 100 with almost no impact
so to limit search recursion and hence stack allocation.
A different case is for MAX_MOVES (used by Movepicker's
moves[]), because we know that do exsist some artificial
position with about 220 legal moves, so in those cases SF
will crash. Anyhow these cases are never found in games.
An open risk remains perft, especially run above handcrafted
positions.
This patch originates from a report by Daylen that found
SF crashing on his Mac OS X 10.7.3 while in deep analysys
on the following position:
8/3Q1pk1/5p2/4r3/5K2/8/8/8 w - - 0 1
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Now a fen file with Chess960 positions is
correctly parsed. But it is mandatory to set
"UCI_Chess960" option _before_ to call bench.
Note that this was not needed/possible before
adding the possibility to call 'bench' from
command prompt.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Now that we can call bench on current position
we can directly use it to perform our perft.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Now that we can call bench from command prompt
has a sense to teach bench to run the current
set position. To do this is enough to call bench
with 'current' as fen source parameter.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
After extensive test Gary says:
"So, after 16k games at 10"+1" on an i7, the undefended rook test
looks to be not good (albeit by a very small margin).
3063 - 3093 - 9844 (-1).
I doubt that is causing the regression, but even so, it looks like
it's not worth keeping, and we can go back to the simpler undefended
minors check."
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Unfortunatly accessing thread local variable
is much slower than object data (see previous
patch log msg), so we have to revert to old code
to avoid speed regression.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Much faster then pthread_getspecific() but still a
speed regression against the original code.
Following are the nps on a bench:
Position
454165
454838
455433
tls
441046
442767
442767
ms (Win)
450521
447510
451105
ms (pthread)
422115
422115
424276
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
After we release the SplitPoint lock the master, suppose
is main thread, can safely return and if a "quit" command
is pending, main thread exits and associated Thread object
is freed. So when we access master->is_searching a crash
occurs.
I have never found such a race that is of course very rare
becuase assumes that from lock releasing we go to sleep for
a time long enough for the main thread to end the search and
return. But you can never know, and anyhow a race is a race.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
So to avoid a crash when setting the moves in
UCI "position startpos moves ...." command.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
But use the newly introduced local storage
for this. A good code semplification and also
the correct way to go.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Use thread local storage to store a pointer to the thread we
are running on. This will allow to remove thread info from
Position class.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
With this change sources are fully endianess
independent, so we can simplify the Makefile.
Somewhat surprisingly we don't have any speed
regression !
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Erroneusly adds default positions to the fens
loaded from external file.
Bug introduced in adb71b8096
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
The 2 overload functions map() accept a pointer to
EndgameBase<Value> or a pointer to EndgameBase<ScaleFactor>.
Because Endgame<E> is derived from one of them we can
directly use a pointer to this class to resolve the
overload as is needed in Endgames::add().
Also made class Endgames fully parametrized and no more
hardcoded to the types (Value or ScaleFactor) of endgames
stored.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
A std::set (that is a rb_tree) seems really
overkill to store at most a handful of moves
and nothing in the common case.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
It is possible to start with 'stockfish', then from
command prompt type 'bench' and SF will do what you expect.
Old behaviour is anyhow preserved. As a bonus we can now
start from command line any UCI command understood by
Stockfish. The difference is that after execution of a
command from arguments SF quits, while at the end of the
same command from prompt SF stays in UCI loop.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Allows some code semplification and avoids directly
allocation and managing heap memory.
Also the usual renaming while there.
No functional change and no speed regression.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
In particualr before to wake up main thread that
could take some random time. Until we don't reset
search time we are not able to correctly track
the elapsed search time and this can be dangerous
under extreme time pressure.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
We need to wake up main thread if it is sleeping
waiting for stop or ponderhit, so we cannot skip
calling wait_for_search_finished().
Found by Othello1984.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
In this case SF stop searching and goes sleeping
waiting for a stop / ponderhit before to return
best move. So when a "stop" arrives we need to wake
up the main thread again.
Another regression introduced by 3aa471f2a9,
hopefully the last one.
Thanks to Otello1984 to reporting this.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Renamed stuff and added comments. The aim is to make more
readable, at least by me ;-) , this newly added part of code.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Incredible typo from my side!
The 2 tables are completely different, one counts 1s the
other returns the msb position. Even more incredible
the 'stockfish bench' command returns the same number
of nodes!!!
Spotted by Justin Blanchard.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Add more detailed pawn shelter/storm evaluation
After 10670 games at 10"+0.05
Mod vs Orig 2277 - 1941 - 6452 ELO +11 !!!
The first real increase since 2.2.2, congratulations Gary !!!
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Fixes a not so rare crash (once every 100 games)
newly introduced. Unfortunatly I am still not
able to figure out why :-(
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
There is no need to "invent" different names
from the original UCI parameters.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Penalty for undefended rook
Almost no change at longer TC, but perhaps there
is a tiny increase....
After 17522 games at 10"+0.05
Mod vs Orig 3064 - 2967 - 11491 ELO +2
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
When quitting we should avoid RootPosition to be
destroyed while threads are still running, leading
to a crash. In case of a "stop" or "ponderhit"
command there is no need for the UI thread to wait.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
And add final touches to this long patch series.
All the series has been verified against regression with
20K games at fast TC.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
We have a clash with start_fn defined both as a
Thread memeber and as a function pointer type in
pthread_create().
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
We cannot set do_sleep flag of main thread before
"bestmove" is sent to GUI, otherwise GUI could send
immediately the next "go" command that triggers
start_thinking() and because do_sleep is set UI
thread resets the flag to launch a new search. But
when shortly after main thread returns to main_loop()
flag is incorrectly reset and main thread goes to sleep
hanging the engine.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
We store pointers instead of Thread objects because
Thread is not copy-constructible nor copy-assignable
and default ones are not suitable. So we cannot store
directly in a std::vector.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Associate platform OS thread to the Thread class instead of
creating it from ThreadsManager.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Split the data allocation, now done (mostly once)
in read_uci_options(), from the wake up and sleeping
of the slave threads upon entering/exiting the search.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
It seems is the cause of strange and rare hangs
reported by some users where Stockfish stops
responding to GUI. It is not clear why but for
the moment revert the patch.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Not correct warning about use of an uninitialized
variable. The warning is not correct becuase we can
never reach the warned code when in SpNode, anyhow
the fix is simple.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
A somewhat tricky function pointer cast allows us
to move the platform specifics to lock.h, the cast
is tricky because return type is not the same of the
casted function in Linux (for Windows return type is
a DWORD that is a long) but this should not be a
problem as long as the size is the same;
From: http://stackoverflow.com/questions/188839/function-pointer-cast-to-different-signature
"OpenSSL was only casting functions pointers to
other function types taking and returning the same
number of values of the same exact sizes, and this
(assuming you're not dealing with floating-point)
happens to be safe across all the platforms and
calling conventions I know of. However, anything
else is potentially unsafe."
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
This should avoid some aliasing issues
with TT table access.
After 3913 games at 10"+0.05
Mod vs Orig 662 - 651 - 2600 ELO +0 (+- 6.4)
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Even if not under attack. This seems to be good
especially on openings.
After 12112 games at 10"+0.05
Mod vs Orig 2175 - 1997 - 7940 ELO +5 (+- 3.7)
[Patch series from Gary, little edited by me]
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
We need splitted Tie classes because MSVC stream library
takes a lock on buffer both on reading and on writing and
this causes an hang because, while searching, the I/O
thread is locked on getline() and when main thread is
trying to std::cout() something it blocks on the same
lock waiting for I/O thread getting some input and
releasing the lock.
The solution is to use separated streambuf objects for
cin and cout.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
A big code simplification and cruft removing, make
Logger class a singleton and fully self conteined.
Also add direction indicators (">>" and "<<") to
better differentiate input and output lines in the
log file.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Using "o" as a parameter with the on_xxx(const UICOption& o)
functions is a bit dangerous because of confusion with "0".
Suggested by Rein Halbersma.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Some trial was needed to find the correct recipe but now
we log both stdin and stdout to file "io_log.txt".
Link http://spec.winprog.org/streams/ was very useful
to understand the details of iostreams implementation.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
By means of "Use Debug Log" UCI option it is possible to toggle
the logging of std::cout to file "out.txt" while preserving
the usual output to stdout. There is zero overhead when logging
is disabled and we achieved this without changing a single line
of exsisting code, in particular we still use std::cout as usual.
The idea and part of the code comes from this article:
http://groups.google.com/group/comp.lang.c++/msg/1d941c0f26ea0d81
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
In particular before initialization. So that SF
seems more snappy at startup.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
As it was in Glaurung times. Also rearranged order
so that byTypeBB[0] is accessed before byTypeBB[x]
to be more cache friendly. It seems there is even
a small speedup.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Also some microoptimizations, were there from ages
but hidden: the renaming suddendly made them visible!
This is a good example of how better naming lets you write
better code. Naming is really a kind of black art!
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Stick to UCI protocol that says:
* by default all the opening book handling is done by the GUI,
but there is an option for the engine to use its own book
("OwnBook" option, see below)
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
UCI protocol it is not clear about what the engine
should be supposed to do when "ucinewgame" is
received. Stockfish simply sets the position to
start FEN, but it is redundant becuase the GUI always
resends the position after "ucinewgame" command, so
it seems we can safely ignore that command.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
When a button fires UCIOption::operator=() is called and from
there the on_change() function. Now it happens that in case of
a button the on_change() function resets option's value to
"false" triggering again UCIOption::operator=() that calls again
on_change() and so on in an endless loop that is experienced
by the user as an application hang.
Rework the button logic to fix the issue and also be more clear
about how button works.
Reported by several people working with Scid and tracked down
to the "Clear Hash" UCI button by Steven Atkinson.
Bug recently introduced by 2ef5b4066e.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Now we are forced to just use C++ iostream becuase
buffers are independent and using C library functions
like printf() or scanf() could yield to issues.
Speed up of about 1%.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Result of t.time * 1000 should be a 64 bit value, not an int.
Bug reported by several users.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Fine tune newly introduced pinner bonus score:
After 34696 games at 2"+0.05
Mod vs Orig 7474 - 7087 - 20135 ELO +3 (+- 2.4)
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
So to be done only once at startup and in the (unlikely)
cases that a relevant UCI parameter is changed, instead
of doing it at the beginning of each search.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Introduce 'on change' actions that are triggered as soon as
an UCI option is changed by the GUI. This allows to set hash
size before to start the game, helpful especially on very fast
TC and big TT size.
As a side effect remove the 'button' type option, that now
is managed as a 'check' type.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Self-documenting code instead of a tricky
bitwise tweak, not known by everybody.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Add a bonus if a slider is pinning an enemy piece.
Idea from Critter.
After 27443 games at 2"+0.05
Mod vs Orig 5900 - 5518 - 16025 ELO +4 (+- 2.7)
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Introduce and use a new Time class designed after
QTime, from Qt framework. Should be a more clear and
self documented code.
As an added benefit we now use 64 bits internally to get
millisecs from system time. This avoids to wrap around
to 0 every 2^32 milliseconds, which is 49.71 days.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Visual Studio 11 is worried that shift result could
overflow an integer, this is impossible becuase max
value of the shift is 4, but compiler cannot know it.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Pre compute castle path so to quickly test
for impeded rule.
This speeds up perft on starting position
of more than 2%.
No functional change
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Fix this warning with MSVC 64 bits:
warning C4244: '=' : conversion from 'std::streampos' to 'size_t',
possible loss of data
Point is that std::streampos could be negative, while size_t
is always non-negative.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
And introduce SPlitPoint bestMove to pass back the
best move after a split point.
This allow to define as const the search stack passed
to split.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
It is a prerequisite for next patch and simplifies
the function. testing at ultra fast TC shows no
regression.
After 24302 games at 2"+0.05
Mod vs Orig 5122 - 5038 - 13872 ELO +1 (+- 2.9)
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Reverse the meaning of castleRightsMask[sq] so that now
is stored the castling right that will be removed in
case a move starts from or arrives to sq square. This
allows to simplify the code.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Instead of by square. This is a more conventional
approach, as reported also in:
http://chessprogramming.wikispaces.com/Zobrist+Hashing
We shrink zobEp[] from 64 to 8 keys at the cost of an extra
'and 7' at runtime to get the file out of the ep square.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
We shouldn't need lock protection to increment
splitPointsCnt and set curSplitPoint of masterThread.
Anyhow because this code is very tricky and prone to
races bound the change in a single patch.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
When updating castleRights in do_move() perform only one
64bit xor with zobCastle[] instead of two.
The trick here is to define zobCastle[] keys of composite
castling rights as a xor combination of the keys of the
single castling rights, instead of 16 independent keys.
Idea from Critter although implementation is different.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Because TT table is shared tte->move() could change
under our feet, in particular we could validate
tte->move() then the move is changed by another
thread and we call pos.do_move() with a move different
from the original validated one !
This leads to a very rare but reproducible crash once
every about 20K games.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
There is no need to limit the maximum ply searched to
100, with deep exclusion search extensions we could
reach it even with much smaller search depths.
The only drawback is an increase in stack usage, but
is limited mainly to id_loop(), in particular the
recursive search() functions are not affected.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Replace a 64 bit 'and' by two 32 bits ones and
use unsigned instead of int.
This simple patch increases perft speed of 6% on
my Intel Core 2 Duo !
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
But only when needed, after a split point. This behaviour
does not apply when useSleepingThreads is false, becuase
in this case threads are not woken up at split points so
must be already running.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Rule says should be reset only after a capture and/or
a pawn move.
This incredible bug was here since Glaurung times !
Spotted by Kiriakos.
No functional change in the test bench because we
don't reach the 50 moves limits.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
With default value of 100 no change in regard of current
behaviour. Increasing the value makes SF to think a
longer time for each move. Decreasing the value makes SF
to move faster.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
This method belongs to Thread, not to ThreadsManager.
Reshuffle stuff in thread.cpp while there.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Release split point lock before to wake up
master thread. This seems to increase speed
in case "sleeping threads" are used:
After 7792 games with 4 threads at very fast TC (2"+0.05)
Mod vs Orig 1722 - 1627 - 4443 ELO +4 (+- 5.1)
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
When allocating a slave we set both is_searching
and splitPoint under lock protection.
Unfortunatly the order in which the variables are
set is not defined. This article was very clarifying:
http://software.intel.com/en-us/blogs/2007/11/30/volatile-almost-useless-for-multi-threaded-programming/
So when in idle loop we test for is_searching and then
access splitPoint, it could happen that splitPoint is still
not updated leading to a possible crash.
Fix the race lock protecting splitPoint access.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
With current code we could raise bestValue above beta,
not what is intended for.
Spotted by Richard Vida.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Simplify and streamline the code. Verified all the
resulting bitbases are not changed.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Fix an issue where the log file stores an incorrect +0.00
eval after a search has been stopped.
Bug reported by Ajedrecista.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
This allows to retire ClearMaskBB[] and use just
one SquareBB[] array to set and clear a bit.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
With this we should compeltely remove the need
of installing third party POSIX threads library
when compiling with mingw-gcc under Windows.
Spotted by Trung Tu.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
It is not clear the advantage and we don't want
to risk of introducing regressions on this
very critical parameter. So revert to old limit.
After 16003 games
Mod vs Orig 2496 - 2421 - 11086 ELO +1 (+-3)
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Apart from some renaming the biggest change
is the retire of split_point_finished()
replaced by slavesMask flags. As a side
effect we now take also split point lock
when allocation available threads.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Instead of Posix threads. This seems to fix time
losses of the gcc compiled version for Windows.
The patch replaces the MSVC specific _MSC_VER flag
with _WIN32 and _WIN64 that are defined both by
MSVC and mingw-gcc.
Workaround found by Jim Ablett.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Instead of by SEE. Almost no ELO change but it is
a bit easier and is a more natural choice given
that good captures are ordered in the same way.
After 10424 games
Mod vs Orig 1639 - 1604 - 7181 ELO +1 (+-3.8)
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
pass references (Windows style) instead of
pointers (Posix style) as function arguments.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
The previous line, LDFLAGS += $(CXXFLAGS), does not make sense, and
breaks profile-build, thus changing it into: LDFLAGS += -flto.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Integrate TT_MOVE step into the first state. This allows to
avoid the first call to next_phase() in case of a TT move.
And use overflow detection instead of the bunch of STOP_XX
states to detect end of moves.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
In case of a PvNode could happen that alpha == beta - 1,
for instance in case the same previous node was visited
with same beta during a non-pv search, the node failed low
and stored beta-1 in TT. Then the node is searched again
in PV mode, TT value beta-1 is retrieved and updates alpha
that now happens to be beta-1.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Almost no functional change because multiple recaptures
to same square are very rare, but neverthless it seems
the correct thing to do.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
They are almost the same, if the function arguments would
have been the same they could even be integrated.
Also a bit of renaming while there.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
And increase LMR limit. Tests show no change ELO wise,
but we prefer to take the risk to commit anyhow becuase
is a 'prune reducing' patch.
After 10749 games
Mod vs Orig: 1670 - 1676 - 7403 ELO 0 (+-3.7)
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Skip calling promotion generation functions in
the very common case of no possible promotion
evasion. Also retire generate_pawn_captures()
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Fix an incorrect warning: 'bm may be used uninitialized'
with the old but still commonly used gcc 4.4
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Now that we don't support anymore popcount detection
at runtime this target is obsolete.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Use $(CXX) instead of assuming compiler name is 'gcc'
Spotted by Louis Zulli.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
All the piece dependant data is passed now as
function arguments so that the code is exactly
the same for bishop and rook.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
It is more correct and specific. Another naming
improvement while reading Critter sources.
No functional changes.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.