Commit Graph

55 Commits

Author SHA1 Message Date
Tomasz Sobczyk
5db46d0c82 Verify whether there is a network being used during training. 2020-10-17 08:44:38 +09:00
Tomasz Sobczyk
0494adeb2c Move nnue evaluation stuff from evaluate.h to nnue/evaluate_nnue.h 2020-10-15 20:37:03 +09:00
noobpwnftw
d865159bd6 Fix variable initialization in test commands 2020-09-29 17:30:08 +08:00
noobpwnftw
a8b502a975 Merge remote-tracking branch 'remotes/origin/master'
Bench: 3618595
2020-09-29 17:09:14 +08:00
noobpwnftw
c065abdcaf Use incremental updates more often
Use incremental updates for accumulators for up to 2 plies.
Do not copy accumulator. About 2% speedup.

Passed STC:
LLR: 2.95 (-2.94,2.94) {-0.25,1.25}
Total: 21752 W: 2583 L: 2403 D: 16766
Ptnml(0-2): 128, 1761, 6923, 1931, 133
https://tests.stockfishchess.org/tests/view/5f7150cf3b22d6afa5069412

closes https://github.com/official-stockfish/Stockfish/pull/3157

No functional change
2020-09-28 16:54:35 +02:00
noobpwnftw
5e8a49f7f2 Restore lambda and gradient function post-merge and minor fixes.
bench: 3788313
2020-09-26 12:55:02 +09:00
noobpwnftw
9827411b7c Merge remote-tracking branch 'remotes/nodchip/master' into trainer 2020-09-24 21:45:28 +08:00
noobpwnftw
5be8b573be Merge remote-tracking branch 'remotes/origin/master' into trainer 2020-09-23 19:02:27 +08:00
noobpwnftw
411adab149 Merge remote-tracking branch 'remotes/nodchip/master' into trainer 2020-09-23 18:29:30 +08:00
Stéphane Nicolet
9a64e737cf Small cleanups 12
- Clean signature of functions in namespace NNUE
- Add comment for countermove based pruning
- Remove bestMoveCount variable
- Add const qualifier to kpp_board_index array
- Fix spaces in get_best_thread()
- Fix indention in capture LMR code in search.cpp
- Rename TtmemDeleter to LargePageDeleter

Closes https://github.com/official-stockfish/Stockfish/pull/3063

No functional change
2020-09-21 10:41:10 +02:00
Sami Kiminki
485d517c68 Add large page support for NNUE weights and simplify TT mem management
Use TT memory functions to allocate memory for the NNUE weights. This
should provide a small speed-up on systems where large pages are not
automatically used, including Windows and some Linux distributions.

Further, since we now have a wrapper for std::aligned_alloc(), we can
simplify the TT memory management a bit:

- We no longer need to store separate pointers to the hash table and
  its underlying memory allocation.
- We also get to merge the Linux-specific and default implementations
  of aligned_ttmem_alloc().

Finally, we'll enable the VirtualAlloc code path with large page
support also for Win32.

STC: https://tests.stockfishchess.org/tests/view/5f66595823a84a47b9036fba
LLR: 2.94 (-2.94,2.94) {-0.25,1.25}
Total: 14896 W: 1854 L: 1686 D: 11356
Ptnml(0-2): 65, 1224, 4742, 1312, 105

closes https://github.com/official-stockfish/Stockfish/pull/3081

No functional change.
2020-09-21 08:43:48 +02:00
Tomasz Sobczyk
d4737819cd Fix castling rights feature encoding. 2020-09-20 20:10:03 +09:00
noobpwnftw
26f63fe741 Merge remote-tracking branch 'remotes/origin/master' into trainer 2020-09-19 03:38:37 +08:00
noobpwnftw
a47a3bfc7c Merge remote-tracking branch 'remotes/nodchip/master' into trainer 2020-09-19 02:14:17 +08:00
syzygy1
8b8a510fd6 Use tiling to speed up accumulator refreshes and updates
Perform the update and refresh operations tile by tile in a local
array of vectors. By selecting the array size carefully, we
achieve that the compiler keeps the whole array in vector registers.

Idea and original implementation by @sf-x.

STC: https://tests.stockfishchess.org/tests/view/5f623eec912c15f19854b855
LLR: 2.94 (-2.94,2.94) {-0.25,1.25}
Total: 4872 W: 623 L: 477 D: 3772
Ptnml(0-2): 14, 350, 1585, 450, 37

LTC: https://tests.stockfishchess.org/tests/view/5f62434e912c15f19854b860
LLR: 2.94 (-2.94,2.94) {0.25,1.25}
Total: 25808 W: 1565 L: 1401 D: 22842
Ptnml(0-2): 23, 1186, 10332, 1330, 33

closes https://github.com/official-stockfish/Stockfish/pull/3130

No functional change
2020-09-17 17:24:52 +02:00
Tomasz Sobczyk
d33e7a9b07 Remove conditional compilation on EVAL_LEARN 2020-09-12 16:19:24 +02:00
Joost VandeVondele
e0a9860708 Upgrade CI distro, remove special cases, fix one more warning 2020-09-10 08:15:15 +09:00
nodchip
7bd4688747 Remove compile warnings. 2020-09-09 23:02:39 +09:00
noobpwnftw
84ba591118 Merge branch 'master' into trainer 2020-09-09 20:19:13 +08:00
noobpwnftw
675d336ebb Merge branch 'master' into trainer 2020-09-09 16:08:49 +08:00
nodchip
4206a1edd0 Renamed parameters to avoid shadowing other parameters. 2020-09-09 10:26:42 +09:00
nodchip
1864845811 Commented out unused parameters. 2020-09-09 10:26:42 +09:00
nodchip
1d00d00241 Removed ENABLE_TEST_CMD macro. 2020-09-09 10:26:42 +09:00
nodchip
a6013557f2 Removed EVAL_NNUE macro. 2020-09-09 10:26:42 +09:00
noobpwnftw
d25657c439 Merge branch 'master' into trainer 2020-09-09 08:43:12 +08:00
noobpwnftw
d21424c8d3 test 2020-09-09 07:31:22 +08:00
syzygy1
fc27d158c0 Bug fix in do_null_move() and NNUE simplification.
This fixes #3108 and removes some NNUE code that is currently not used.

At the moment, do_null_move() copies the accumulator from the previous
state into the new state, which is correct. It then clears the "computed_score"
flag because the side to move has changed, and with the other side to move
NNUE will return a completely different evaluation (normally with changed
sign but also with different NNUE-internal tempo bonus).

The problem is that do_null_move() clears the wrong flag. It clears the
computed_score flag of the old state, not of the new state. It turns out
that this almost never affects the search. For example, fixing it does not
change the current bench (but it does change the previous bench). This is
because the search code usually avoids calling evaluate() after a null move.

This PR corrects do_null_move() by removing the computed_score flag altogether.
The flag is not needed because nnue_evaluate() is never called twice on a position.

This PR also removes some unnecessary {}s and inserts a few blank lines
in the modified NNUE files in line with SF coding style.

Resulf ot STC non-regression test:
LLR: 2.95 (-2.94,2.94) {-1.25,0.25}
Total: 26328 W: 3118 L: 3012 D: 20198
Ptnml(0-2): 126, 2208, 8397, 2300, 133
https://tests.stockfishchess.org/tests/view/5f553ccc2d02727c56b36db1

closes https://github.com/official-stockfish/Stockfish/pull/3109

bench: 4109324
2020-09-08 22:53:17 +02:00
Joost VandeVondele
6e8f82ad76 Fix small CI failures
1) Only access UCI option if defined
2) disable -Werror for now.
3) disable a few target that don't have _mm_malloc.
4) Add profile-learn target, with small speedup.
5) just test on Linux + gcc (skip macOS, unclear openblas, skip linux+clang, unclear omp/std::filesystem).
2020-09-08 09:14:49 +09:00
nodchip
4cc98d80f8 Replaced the utility function to create a directory to std::filesystem. 2020-09-07 18:56:41 +09:00
Joost VandeVondele
edbbc1a4df Remove some warnings 2020-09-07 09:20:47 +09:00
Stéphane Nicolet
406979ea12 Embed default net, and simplify using non-default nets
covers the most important cases from the user perspective:

It embeds the default net in the binary, so a download of that binary will result
in a working engine with the default net. The engine will be functional in the default mode
without any additional user action.

It allows non-default nets to be used, which will be looked for in up to
three directories (working directory, location of the binary, and optionally a specific default directory).
This mechanism is also kept for those developers that use MSVC,
the one compiler that doesn't have an easy mechanism for embedding data.

It is possible to disable embedding, and instead specify a specific directory, e.g. linux distros might want to use
CXXFLAGS="-DNNUE_EMBEDDING_OFF -DDEFAULT_NNUE_DIRECTORY=/usr/share/games/stockfish/" make -j ARCH=x86-64 profile-build

passed STC non-regression:
https://tests.stockfishchess.org/tests/view/5f4a581c150f0aef5f8ae03a
LLR: 2.95 (-2.94,2.94) {-1.25,-0.25}
Total: 66928 W: 7202 L: 7147 D: 52579
Ptnml(0-2): 291, 5309, 22211, 5360, 293

closes https://github.com/official-stockfish/Stockfish/pull/3070

fixes https://github.com/official-stockfish/Stockfish/issues/3030

No functional change.
2020-08-29 21:56:00 +02:00
nodchip
f7bc4e6e45 Fixed compilation errors. 2020-08-29 00:56:05 +09:00
nodchip
906c18eb46 Merge branch 'master' of github.com:official-stockfish/Stockfish into nnue-player-merge-2020-08-28
# Conflicts:
#	README.md
#	src/Makefile
#	src/search.cpp
#	src/types.h
#	src/uci.cpp
#	src/ucioption.cpp
2020-08-28 11:26:11 +09:00
syzygy1
9b4967071e Remove EvalList
This patch removes the EvalList structure from the Position object and generally simplifies the interface between do_move() and the NNUE code.

The NNUE evaluation function first calculates the "accumulator". The accumulator consists of two halves: one for white's perspective, one for black's perspective.

If the "friendly king" has moved or the accumulator for the parent position is not available, the accumulator for this half has to be calculated from scratch. To do this, the NNUE node needs to know the positions and types of all non-king pieces and the position of the friendly king. This information can easily be obtained from the Position object.

If the "friendly king" has not moved, its half of the accumulator can be calculated by incrementally updating the accumulator for the previous position. For this, the NNUE code needs to know which pieces have been added to which squares and which pieces have been removed from which squares. In principle this information can be derived from the Position object and StateInfo struct (in the same way as undo_move() does this). However, it is probably a bit faster to prepare this information in do_move(), so I have kept the DirtyPiece struct. Since the DirtyPiece struct now stores the squares rather than "PieceSquare" indices, there are now at most three "dirty pieces" (previously two). A promotion move that captures a piece removes the capturing pawn and the captured piece from the board (to SQ_NONE) and moves the promoted piece to the promotion square (from SQ_NONE).

An STC test has confirmed a small speedup:

https://tests.stockfishchess.org/tests/view/5f43f06b5089a564a10d850a
LLR: 2.94 (-2.94,2.94) {-0.25,1.25}
Total: 87704 W: 9763 L: 9500 D: 68441
Ptnml(0-2): 426, 6950, 28845, 7197, 434

closes https://github.com/official-stockfish/Stockfish/pull/3068

No functional change
2020-08-26 07:11:26 +02:00
mstembera
701b2427bd Support VNNI on 256bit vectors
due to downclocking on current chips (tested up to cascade lake)
supporting avx512 and vnni512, it is better to use avx2 or vnni256
in multithreaded (in particular hyperthreaded) engine use.
In single threaded use, the picture is different.

gcc compilation for vnni256 requires a toolchain for gcc >= 9.

closes https://github.com/official-stockfish/Stockfish/pull/3038

No functional change
2020-08-24 12:03:04 +02:00
syzygy1
cc9d503dde Skip the alignment bug workaround for Clang
Clang-10.0.0 poses as gcc-4.2:

$ clang++ -E -dM - </dev/null | grep GNUC

This means that Clang is using the workaround for the alignment bug of gcc-8
even though it does not have the bug (as far as I know).

This patch should speed up AVX2 and AVX512 compiles on Windows (when using Clang),
because it disables (for Clang) the gcc workaround we had introduced in this commit:
875183b310

closes https://github.com/official-stockfish/Stockfish/pull/3050

No functional change.
2020-08-23 23:09:31 +02:00
Stéphane Nicolet
81d716f5cc Reformat code in little-endian patch
Reformat code and rename the function to "read_little_endian()" in the recent
commit by Ronald de Man for support of big endian systems.

closes https://github.com/official-stockfish/Stockfish/pull/3016

No functional change
-----

Recommended net: https://tests.stockfishchess.org/api/nn/nn-82215d0fd0df.nnue
2020-08-17 12:15:57 +02:00
syzygy1
72dc7a5c54 Assume network file is in little-endian byte order
This patch fixes the byte order when reading 16- and 32-bit values from the network file on a big-endian machine.

Bytes are ordered in read_le() using unsigned arithmetic, which doesn't need tricks to determine the endianness of the machine. Unfortunately the compiler doesn't seem to be able to optimise the ordering operation, but reading in the weights is not a time-critical operation and the extra time it takes should not be noticeable.

Big endian systems are still untested with NNUE.

fixes #3007

closes https://github.com/official-stockfish/Stockfish/pull/3009

No functional change.
2020-08-16 21:10:26 +02:00
mstembera
6eb186c97e Try to match relative magnitude of NNUE eval to classical
The idea is that since we are mixing NNUE and classical evals matching their magnitudes closer allows for better comparisons.

STC https://tests.stockfishchess.org/tests/view/5f35a65411a9b1a1dbf18e2b
LLR: 2.94 (-2.94,2.94) {-0.50,1.50}
Total: 9840 W: 1150 L: 1027 D: 7663
Ptnml(0-2): 49, 772, 3175, 855, 69

LTC https://tests.stockfishchess.org/tests/view/5f35bcbe11a9b1a1dbf18e47
LLR: 2.93 (-2.94,2.94) {0.25,1.75}
Total: 44424 W: 2492 L: 2294 D: 39638
Ptnml(0-2): 42, 2015, 17915, 2183, 57

also corrects the location to clamp the evaluation (non-function on bench).

closes https://github.com/official-stockfish/Stockfish/pull/3003

bench: 3905447
2020-08-14 16:39:52 +02:00
mstembera
dd63b98fb0 Add support for VNNI
Adds support for Vector Neural Network Instructions (avx512), as available on Intel Cascade Lake

The _mm512_dpbusd_epi32() intrinsic (vpdpbusd instruction) is taylor made for NNUE.

on a cascade lake CPU (AWS C5.24x.large, gcc 10) NNUE eval is at roughly 78% nps of classical
(single core test)

bench 1024 1 24 default depth:
target 	classical 	NNUE 	ratio
vnni 	2207232 	1725987 	78.20
avx512 	2216789 	1671734 	75.41
avx2 	2194006 	1611263 	73.44
modern 	2185001 	1352469 	61.90

closes https://github.com/official-stockfish/Stockfish/pull/2987

No functional change
2020-08-13 07:39:52 +02:00
Joost VandeVondele
992f549ae7 Restrict avx2 hack to windows target
this workaround is possibly rather a windows & gcc specific problem. See e.g.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54412#c25

on Linux with gcc 8 this patch brings roughly a 8% speedup.
However, probably needs some testing in the wild.

includes a workaround for an old msys make (3.81) installation (fixes #2984)

No functional change
2020-08-11 23:35:02 +02:00
mstembera
f46c73040c Fix AVX512 build with older compilers
avoids an intrinsic that is missing in gcc < 10.

For this target, might trigger another gcc bug on windows that
requires up-to-date gcc 8, 9, or 10, or usage of clang.

Fixes https://github.com/official-stockfish/Stockfish/issues/2975

closes https://github.com/official-stockfish/Stockfish/pull/2976

No functional change
2020-08-11 08:17:03 +02:00
Fanael Linithien
21df37d7fd Provide vectorized NNUE code for SSE2 and MMX targets
This patch allows old x86 CPUs, from AMD K8 (which the x86-64 baseline
targets) all the way down to the Pentium MMX, to benefit from NNUE with
comparable performance hit versus hand-written eval as on more modern
processors.

NPS of the bench with NNUE enabled on a Pentium III 1.13 GHz (using the
MMX code):
  master: 38951
  this patch: 80586

NPS of the bench with NNUE enabled using baseline x86-64 arch, which is
how linux distros are likely to package stockfish, on a modern CPU
(using the SSE2 code):
  master: 882584
  this patch: 1203945

closes https://github.com/official-stockfish/Stockfish/pull/2956

No functional change.
2020-08-10 19:17:57 +02:00
mstembera
f948cd008d Cleanup and optimize SSE/AVX code
AVX512 +4% faster
AVX2 +1% faster
SSSE3 +5% faster

passed non-regression STC:
STC https://tests.stockfishchess.org/tests/view/5f31249f90816720665374f6
LLR: 2.96 (-2.94,2.94) {-1.50,0.50}
Total: 17576 W: 2344 L: 2245 D: 12987
Ptnml(0-2): 127, 1570, 5292, 1675, 124

closes https://github.com/official-stockfish/Stockfish/pull/2962

No functional change
2020-08-10 14:38:17 +02:00
mstembera
875183b310 Workaround using unaligned loads for gcc < 9
despite usage of alignas, the generated (avx2/avx512) code with older compilers needs to use
unaligned loads with older gcc (e.g. confirmed crash with gcc 7.3/mingw on abrok).

Better performance thus requires gcc >= 9 on hardware supporting avx2/avx512

closes https://github.com/official-stockfish/Stockfish/pull/2969

No functional change
2020-08-10 11:12:35 +02:00
Joost VandeVondele
651ec3b31e Revert "Avoid special casing for MinGW"
This reverts commit a6e89293df.

The offending setup has been found as gcc/mingw 7.3 (on Ubuntu 18.04).

fixes https://github.com/official-stockfish/Stockfish/issues/2963

closes https://github.com/official-stockfish/Stockfish/issues/2968

No functional change.
2020-08-10 07:28:19 +02:00
nodchip
4260ed0c7f Merge branch 'master' of github.com:official-stockfish/Stockfish into nnue-player-merge 2020-08-10 08:52:55 +09:00
nodchip
4f97d3446d Cleaned up source code. 2020-08-10 08:52:34 +09:00
Dariusz Orzechowski
a6e89293df Avoid special casing for MinGW
after some testing, no version of MinGW/gcc has been found where this code is still necessary.
Probably older code (pre-c++17?)

closes https://github.com/official-stockfish/Stockfish/pull/2891

No functional change
2020-08-09 23:49:14 +02:00
nodchip
22b85810fe Re-added the code to skip loading a net file. 2020-08-08 19:04:08 +09:00