in affine transform for AVX512/AVX2/SSSE3
The idea is to initialize sum with the first element instead of zero.
This saves one add_epi32 and one setzero SIMD instruction per output dimension:
sum = 0; for i = 1 to n sum += a[i] ->
sum = a[1]; for i = 2 to n sum += a[i]
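A minimal scalar sketch of the same rewrite, for illustration only: the function names are hypothetical, and plain int32 values stand in for the SIMD accumulators of the real affine transform. Both versions give the same result; the second skips the zero initialisation and one addition, at the cost of requiring at least one element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Baseline: zero the accumulator, then add every element (n additions).
std::int32_t sum_from_zero(const std::int32_t* a, std::size_t n) {
    std::int32_t sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += a[i];
    return sum;
}

// Variant: seed the accumulator with the first element, which removes the
// zeroing step and one addition (n - 1 additions). Requires n >= 1.
std::int32_t sum_from_first(const std::int32_t* a, std::size_t n) {
    assert(n >= 1);
    std::int32_t sum = a[0];
    for (std::size_t i = 1; i < n; ++i)
        sum += a[i];
    return sum;
}
```

In the vectorised code each a[i] would be a register of partial products rather than a single integer, so the saving is one add and one setzero per output dimension.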
STC:
LLR: 2.95 (-2.94,2.94) {-0.25,1.25}
Total: 69048 W: 7024 L: 6799 D: 55225
Ptnml(0-2): 260, 5175, 23458, 5342, 289
https://tests.stockfishchess.org/tests/view/5faf2cf467cbf42301d6aa06
closes https://github.com/official-stockfish/Stockfish/pull/3227
No functional change.
This is a follow-up of the recent qsearch pruning patch in
a260c9a8a2
We now use the same guard condition (testing that we already have a defense with
a score better than a TB loss) for all pruning heuristics in qsearch().
This allows some pruning when in check, but in a controlled way to ensure that
no wrong mate scores appear.
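The guard can be sketched as follows. This is a simplified model, not the actual qsearch() code: the score constants are assumed values loosely modelled on Stockfish's, and can_prune() and search_moves() are hypothetical helpers where precomputed scores stand in for searching each move. The point it illustrates is that a move may be pruned only once bestValue already exceeds a TB loss, so a position in check is never reported as mated merely because all its evasions were pruned.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed score constants for this sketch.
constexpr int VALUE_MATE = 32000;
constexpr int MAX_PLY = 246;
constexpr int VALUE_TB_WIN_IN_MAX_PLY = VALUE_MATE - 2 * MAX_PLY;
constexpr int VALUE_TB_LOSS_IN_MAX_PLY = -VALUE_TB_WIN_IN_MAX_PLY;

// Hypothetical shared guard: prune only if a defense better than a TB loss is known.
bool can_prune(int bestValue) {
    return bestValue > VALUE_TB_LOSS_IN_MAX_PLY;
}

// Hypothetical move loop: scores[i] stands in for the search result of move i,
// prunable[i] marks moves some heuristic (futility, SEE, ...) would like to skip.
int search_moves(const std::vector<int>& scores,
                 const std::vector<bool>& prunable,
                 int startValue) {
    assert(scores.size() == prunable.size());
    int bestValue = startValue;
    for (std::size_t i = 0; i < scores.size(); ++i) {
        if (prunable[i] && can_prune(bestValue))
            continue;  // safe: we already have a non-losing defense
        bestValue = std::max(bestValue, scores[i]);
    }
    return bestValue;
}
```

When in check, the loop starts from a mated score, so the first evasion is always searched even if a heuristic flags it; only later moves can be pruned.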
Tested with Elo-gaining bounds:
STC:
LLR: 2.97 (-2.94,2.94) {-0.25,1.25}
Total: 22632 W: 2433 L: 2264 D: 17935
Ptnml(0-2): 98, 1744, 7487, 1865, 122
https://tests.stockfishchess.org/tests/view/5fa59405936c54e11ec99515
LTC:
LLR: 2.94 (-2.94,2.94) {0.25,1.25}
Total: 105432 W: 4965 L: 4648 D: 95819
Ptnml(0-2): 85, 4110, 44011, 4423, 87
https://tests.stockfishchess.org/tests/view/5fa5b609936c54e11ec9952a
closes https://github.com/official-stockfish/Stockfish/pull/3221
Bench: 3578092
For the feature transformer the code is analogous to AVX2, since it could easily be adapted to the wider SIMD registers.
For the smaller affine transforms that have 32 byte stride we keep 2 columns in one zmm register. We also unroll more aggressively so that in the end we have to do 16 parallel horizontal additions on ymm slices each consisting of 4 32-bit integers. The slices are embedded in 8 zmm registers.
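The "two columns per zmm register" idea can be modelled in scalar code. This is only an illustrative emulation under stated assumptions, not the AVX-512 implementation: the Zmm type and both helper functions are hypothetical, the int16 saturation of the real multiply-add instructions is ignored, and the 16-way unrolling and exact reduction order of the patch are not reproduced. The input is conceptually repeated in both 256-bit halves, each half holds the weights of one output, and the final horizontal addition reduces each half separately.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kInputDims = 32;  // 32-byte stride, as described above

// Scalar stand-in for a zmm register: 16 int32 lanes = 512 bits.
using Zmm = std::array<std::int32_t, 16>;

// Lanes 0..7 use the weights of the first output, lanes 8..15 those of the
// second; each lane sums 4 uint8 x int8 products (a maddubs + madd pair in
// the real code, without saturation here).
Zmm multiply_two_rows(const std::uint8_t* input,
                      const std::int8_t* row0,
                      const std::int8_t* row1) {
    assert(input != nullptr && row0 != nullptr && row1 != nullptr);
    Zmm acc{};
    for (std::size_t lane = 0; lane < 16; ++lane) {
        const std::int8_t* w = lane < 8 ? row0 : row1;
        const std::size_t base = (lane % 8) * 4;  // 8 lanes x 4 bytes = kInputDims
        for (std::size_t k = 0; k < 4; ++k)
            acc[lane] += static_cast<std::int32_t>(input[base + k]) * w[base + k];
    }
    return acc;
}

// Horizontal addition done separately on each 256-bit half, plus the biases.
std::array<std::int32_t, 2> reduce_halves(const Zmm& acc,
                                          std::int32_t bias0,
                                          std::int32_t bias1) {
    std::int32_t s0 = bias0;
    std::int32_t s1 = bias1;
    for (std::size_t lane = 0; lane < 8; ++lane) {
        s0 += acc[lane];
        s1 += acc[lane + 8];
    }
    return {s0, s1};
}
```

Packing two 32-byte weight rows into one 64-byte register keeps every lane of the wider register busy even though a single row only fills half of it.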
These changes provide a speedup of about 1.5% for AVX-512 builds.
Closes https://github.com/official-stockfish/Stockfish/pull/3218
No functional change.
Not spending any search time when there is a single legal move is not beneficial from
a strength point of view, and this special case can easily be removed:
STC:
LLR: 2.93 (-2.94,2.94) {-1.25,0.25}
Total: 22472 W: 2458 L: 2357 D: 17657
Ptnml(0-2): 106, 1733, 7453, 1842, 102
https://tests.stockfishchess.org/tests/view/5f926cbc81eda81bd78cb6df
LTC:
LLR: 2.94 (-2.94,2.94) {-0.75,0.25}
Total: 37880 W: 1736 L: 1682 D: 34462
Ptnml(0-2): 22, 1392, 16057, 1448, 21
https://tests.stockfishchess.org/tests/view/5f92a26081eda81bd78cb6fe
The advantage of using the normal time management for a single legal move is that the scores
reported for that move are reasonable; not searching leads to artifacts during games
(see e.g. https://tcec-chess.com/#div=sf&game=96&season=19).
The disadvantage of using the normal time management for a single legal move is that thinking
times can be unnaturally long, making it 'painful to watch' in online tournaments.
This patch uses normal time management, but caps the used time to 500ms.
This should lead to reasonable scores while being hardly perceptible.
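A minimal sketch of the capping rule, assuming times are measured in milliseconds. The TimePoint alias and the allotted_time() helper are hypothetical names for illustration, not the actual time management code; only the 500 ms cap comes from the patch description.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

using TimePoint = std::int64_t;  // milliseconds (assumed unit)

// Hypothetical helper: 'optimumTime' is whatever the normal time management
// computed; with a single legal move it is additionally capped at 500 ms.
TimePoint allotted_time(TimePoint optimumTime, int legalMoveCount) {
    assert(legalMoveCount >= 1);
    constexpr TimePoint kSingleMoveCap = 500;  // ms, from the patch description
    if (legalMoveCount == 1)
        return std::min(optimumTime, kSingleMoveCap);
    return optimumTime;
}
```

The cap only ever shortens the normal budget, so a forced move still gets a short real search (and hence a meaningful score) without a noticeable delay.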
closes https://github.com/official-stockfish/Stockfish/pull/3195
closes https://github.com/official-stockfish/Stockfish/pull/3183
variant of a patch suggested by SFisGOD
No functional change.