Commit Graph

135 Commits

Author SHA1 Message Date
Tomasz Sobczyk
50df3a7389 fix annoying warning 2020-12-22 09:24:26 +09:00
Joost VandeVondele
b49fd3ab30 Add -lstdc++fs to the link line of gcc
older versions of gcc (<8.1) need this, even if they accept -std=c++17

with this patch, the code can be run on fishtest again,
at least by the majority of workers (fishtest doesn't require c++17 to be available)

See e.g.
https://tests.stockfishchess.org/tests/view/5fcfbf801ac1691201888235

Bench: 3820648
2020-12-09 08:40:34 +09:00
Tomasz Sobczyk
fafb9557a8 Get train loss from update_parameters. 2020-12-02 08:56:20 +09:00
Tomasz Sobczyk
256c4b55ec Properly apply gradient norm clipping after it's scaled in the update_parameters. 2020-12-02 08:56:20 +09:00
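The ordering that commit 256c4b55ec describes (scale the gradient first, then clip its norm) could look roughly like the sketch below. It assumes L2-norm clipping, which the commit message does not state. The names `scale_then_clip`, `scale`, and `max_norm` are hypothetical and not taken from the trainer code.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: scales the gradient in place, then clips its L2 norm
// to max_norm. Returns the norm of the gradient after clipping.
inline double scale_then_clip(std::vector<double>& grad, double scale, double max_norm) {
    double sum_sq = 0.0;
    for (double& g : grad) {
        g *= scale;        // scaling is applied first
        sum_sq += g * g;   // the norm is measured on the scaled values
    }
    const double norm = std::sqrt(sum_sq);
    if (norm <= max_norm || norm == 0.0)
        return norm;       // already within bounds, nothing to clip

    const double k = max_norm / norm;
    for (double& g : grad)
        g *= k;            // shrink uniformly so the norm equals max_norm
    return max_norm;
}
```

Clipping before scaling would bound the wrong quantity, since the later scaling could push the norm back over the limit.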
Tomasz Sobczyk
539bd2d1c8 Replace the old loss/grad calculation completely. 2020-12-02 08:56:20 +09:00
Tomasz Sobczyk
b71d1e8620 Pass the new loss function to update_parameters 2020-12-02 08:56:20 +09:00
Tomasz Sobczyk
1322a9a5fd Prevent false sharing of num_calls counter in the shared input trainer. Fix current_operation not being local to the executing thread. 2020-11-30 08:54:53 +09:00
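A common way to address both issues named in commit 1322a9a5fd is sketched below. This is an illustration, not the patch itself. `PaddedCounter` and `kCacheLineSize` are hypothetical names, the 64-byte line size is an assumption, and the type of `current_operation` is guessed.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Assumed cache-line size; 64 bytes is typical on x86-64 but not guaranteed.
constexpr std::size_t kCacheLineSize = 64;

// Hypothetical counter type: alignas pads each instance to a full cache line,
// so threads updating neighbouring counters do not invalidate each other's lines.
struct alignas(kCacheLineSize) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

static_assert(sizeof(PaddedCounter) == kCacheLineSize, "one counter per cache line");
static_assert(alignof(PaddedCounter) == kCacheLineSize, "counter starts on a line boundary");

// thread_local gives every thread its own copy, so the variable needs no
// synchronization and cannot be overwritten by another thread.
thread_local int current_operation = 0;
```

The padding trades a little memory for the removal of cross-core cache-line traffic on a frequently updated counter.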
Tomasz Sobczyk
2aa7f5290e Fix comparison of integers with different signedness. 2020-11-30 08:54:53 +09:00
Tomasz Sobczyk
a97b65eaef Fix compilation error with USE_BLAS 2020-11-30 08:54:53 +09:00
Tomasz Sobczyk
622e0b14c2 Remove superfluous example shuffling. Shuffling now only happens on reading. 2020-11-30 08:54:53 +09:00
Tomasz Sobczyk
34510dd08a Remove used examples asynchronously. 2020-11-30 08:54:53 +09:00
Tomasz Sobczyk
0bee8fef64 Don't unnecessarily copy the batch part. 2020-11-30 08:54:53 +09:00
Tomasz Sobczyk
e954b14196 Prefetch weights for feature transformer backprop to shared cache. 2020-11-30 08:54:53 +09:00
Tomasz Sobczyk
49b2dcb1f3 Preallocate memory for unique_features. Keep the training_features temporary buffer as a thread_local so we reuse the storage. 2020-11-30 08:54:53 +09:00
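The storage-reuse idea in commit 49b2dcb1f3 can be sketched as follows. `FeatureList` and `scratch_features` are hypothetical names used only for illustration; the real `training_features` buffer may be shaped differently.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-example list of feature indices.
using FeatureList = std::vector<int>;

// Returns a per-thread scratch buffer. clear() keeps the capacity, so after
// the first call on a thread, later calls reuse the same heap storage.
inline FeatureList& scratch_features(std::size_t expected_size) {
    thread_local FeatureList buffer;
    buffer.clear();
    buffer.reserve(expected_size);  // no-op once the capacity is large enough
    return buffer;
}
```

Because the buffer is `thread_local`, no locking is needed, but the returned reference must not be handed to another thread.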
Tomasz Sobczyk
1c8495b54b Remove handwritten saxpy because compilers optimize the second loop anyway. 2020-11-30 08:54:53 +09:00
Tomasz Sobczyk
15c528ca7b Prepare feature transformer learner. 2020-11-30 08:54:53 +09:00
Tomasz Sobczyk
a3c78691a2 Prepare input slice trainer. 2020-11-30 08:54:53 +09:00
Tomasz Sobczyk
401fc0fbab Prepare clipped relu trainer. 2020-11-30 08:54:53 +09:00
Tomasz Sobczyk
cc11375f6d Skeleton for new evaluate learner 2020-11-30 08:54:53 +09:00
Tomasz Sobczyk
0d4b803b08 Prepare trainer affine transform. 2020-11-30 08:54:53 +09:00
noobpwnftw
0b2ae6cb64 Merge remote-tracking branch 'remotes/official/master' into merge 2020-11-28 06:47:04 +08:00
MaximMolchanov
7615e3485e Calculate sum from first elements
in affine transform for AVX512/AVX2/SSSE3

The idea is to initialize sum with the first element instead of zero.
This saves one add_epi32 and one setzero SIMD instruction for each output dimension.

sum = 0; for i = 1 to n sum += a[i] ->
sum = a[1]; for i = 2 to n sum += a[i]

STC:
LLR: 2.95 (-2.94,2.94) {-0.25,1.25}
Total: 69048 W: 7024 L: 6799 D: 55225
Ptnml(0-2): 260, 5175, 23458, 5342, 289
https://tests.stockfishchess.org/tests/view/5faf2cf467cbf42301d6aa06

closes https://github.com/official-stockfish/Stockfish/pull/3227

No functional change.
2020-11-25 21:10:13 +01:00
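The transformation described in commit 7615e3485e can be shown in scalar form. In the actual patch each term is a SIMD vector rather than a single integer; the function names below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Baseline: start from zero and add every element.
inline std::int32_t sum_from_zero(const std::int32_t* a, std::size_t n) {
    std::int32_t sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += a[i];
    return sum;
}

// Variant: seed the accumulator with the first element, which removes the
// zero initialization and one addition. Requires n >= 1.
inline std::int32_t sum_from_first(const std::int32_t* a, std::size_t n) {
    assert(n >= 1);
    std::int32_t sum = a[0];
    for (std::size_t i = 1; i < n; ++i)
        sum += a[i];
    return sum;
}
```

Both functions return the same value for any non-empty input, which is consistent with the commit being marked as no functional change.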
Stéphane Nicolet
027626db1e Small cleanups 13
No functional change
2020-11-23 22:20:32 +01:00
noobpwnftw
c29554a120 Merge remote-tracking branch 'remotes/official/master' into master
Bench: 3597730
2020-11-23 04:27:12 +08:00
JWmer
3975fc9c0d Update half_relative_ka.cpp 2020-11-22 07:45:39 +09:00
JWmer
b0429237a8 Update half_ka.cpp 2020-11-22 07:45:39 +09:00
JWmer
ea70e378cd Update a.cpp 2020-11-22 07:45:39 +09:00
JWmer
be4cd56146 Update half_kp.cpp 2020-11-22 07:45:39 +09:00
JWmer
021f47b00e Update half_relative_kp.cpp 2020-11-22 07:45:39 +09:00
JWmer
36c801699f Update k.cpp 2020-11-22 07:45:39 +09:00
JWmer
5b3e9b0eb3 Update p.cpp 2020-11-22 07:45:39 +09:00
JWmer
c04c5b6658 Update nnue_common.h 2020-11-22 07:45:39 +09:00
JWmer
b27c51b5cf Delete k-p-cr-ep_256x2-32-32.h 2020-11-22 07:45:39 +09:00
JWmer
72fee2f7a4 Delete k-p-cr_256x2-32-32.h 2020-11-22 07:45:39 +09:00
JWmer
d9dcdc2b73 Delete k-p_256x2-32-32.h 2020-11-22 07:45:39 +09:00
Tomasz Sobczyk
691da3bdad Add more information for factorizers at the start of training. 2020-11-14 18:47:22 +09:00
Tomasz Sobczyk
4e1653d53a Fix reliance on transitive includes for factorizers in trainer feature transformer. Add a file that includes all factorizers. 2020-11-14 12:35:12 +09:00
Tomasz Sobczyk
ba35c88ab8 AVX-512 for smaller affine and feature transforms.
For the feature transformer the code is analogous to AVX2 since there was room for easy adaptation to wider SIMD registers.

For the smaller affine transforms that have a 32-byte stride we keep 2 columns in one zmm register. We also unroll more aggressively so that in the end we have to do 16 parallel horizontal additions on ymm slices, each consisting of four 32-bit integers. The slices are embedded in 8 zmm registers.

These changes provide about 1.5% speedup for AVX-512 builds.

Closes https://github.com/official-stockfish/Stockfish/pull/3218

No functional change.
2020-11-07 16:49:49 +01:00
Tomasz Sobczyk
3f6451eff7 Manually align arrays on the stack
as a workaround to issues with overaligned alignas() on stack variables in gcc < 9.3 on Windows.

closes https://github.com/official-stockfish/Stockfish/pull/3217

fixes #3216

No functional change
2020-11-04 19:52:42 +01:00
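One way to align a stack array by hand, as commit 3f6451eff7 describes, is to over-allocate and round the pointer up. The sketch below is an assumption about the general technique, not a copy of the patch. The helper and constant names are illustrative.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative helper: rounds ptr up to the next multiple of Alignment,
// which must be a power of two.
template <std::size_t Alignment, typename T>
inline T* align_ptr_up(T* ptr) {
    static_assert((Alignment & (Alignment - 1)) == 0, "Alignment must be a power of two");
    const auto addr = reinterpret_cast<std::uintptr_t>(ptr);
    return reinterpret_cast<T*>((addr + (Alignment - 1)) & ~std::uintptr_t(Alignment - 1));
}

// Over-allocates a plain stack array by one alignment unit, then uses the
// aligned pointer into it instead of relying on alignas() for the array.
inline bool demo_aligned_stack_buffer() {
    constexpr std::size_t kAlign = 64;
    constexpr std::size_t kCount = 256;
    constexpr std::size_t kRawCount = kCount + kAlign / sizeof(std::int32_t);

    std::int32_t raw[kRawCount];
    std::int32_t* buf = align_ptr_up<kAlign>(raw);

    buf[0] = 1;            // first usable element
    buf[kCount - 1] = 2;   // last usable element still lies inside raw

    return reinterpret_cast<std::uintptr_t>(buf) % kAlign == 0
        && buf + kCount <= raw + kRawCount;
}
```

The cost is at most one extra alignment unit of stack space per array, in exchange for not depending on the compiler honouring large `alignas()` values.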
Tomasz Sobczyk
75e06a1c89 Optimize affine transform for SSSE3 and higher targets.
A non-functional speedup. Unroll the loops going over
the output dimensions in the affine transform layers by
a factor of 4 and perform 4 horizontal additions at a time.
Instead of doing naive horizontal additions on each vector
separately use hadd and shuffling between vectors to reduce
the number of instructions by using all lanes for all stages
of the horizontal adds.

passed STC of the initial version:
LLR: 2.95 (-2.94,2.94) {-0.25,1.25}
Total: 17808 W: 1914 L: 1756 D: 14138
Ptnml(0-2): 76, 1330, 5948, 1460, 90
https://tests.stockfishchess.org/tests/view/5f9d516f6a2c112b60691da3

passed STC of the final version after cleanup:
LLR: 2.95 (-2.94,2.94) {-0.25,1.25}
Total: 16296 W: 1750 L: 1595 D: 12951
Ptnml(0-2): 72, 1192, 5479, 1319, 86
https://tests.stockfishchess.org/tests/view/5f9df5776a2c112b60691de3

closes https://github.com/official-stockfish/Stockfish/pull/3203

No functional change
2020-11-02 19:41:17 +01:00
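The "4 horizontal additions at a time" idea in commit 75e06a1c89 can be modelled portably. The sketch below emulates the lane behaviour of the SSSE3 intrinsic `_mm_hadd_epi32` with a plain array so that it runs on any target. `Vec4`, `hadd`, and `hadd4` are hypothetical names, and the sketch uses only hadd steps, whereas the commit also mentions shuffling between vectors.

```cpp
#include <array>
#include <cstdint>

// Stand-in for one 128-bit register holding four 32-bit integers.
using Vec4 = std::array<std::int32_t, 4>;

// Models the lane behaviour of _mm_hadd_epi32(a, b): adjacent pairs of a fill
// the low two lanes, adjacent pairs of b fill the high two lanes.
inline Vec4 hadd(const Vec4& a, const Vec4& b) {
    return {a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]};
}

// Reduces four accumulators together: lane i of the result is the sum of all
// lanes of s_i. Three hadd steps replace four separate per-vector reductions.
inline Vec4 hadd4(const Vec4& s0, const Vec4& s1, const Vec4& s2, const Vec4& s3) {
    return hadd(hadd(s0, s1), hadd(s2, s3));
}
```

Every lane carries useful data at each stage, which is where the instruction savings over reducing each vector separately come from.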
Tomasz Sobczyk
987b6c98d4 Move the observed feature collection to the threaded part now that it can be done safely. 2020-11-01 11:02:44 +09:00
Tomasz Sobczyk
e8907bcfc4 Replace omp in trainer_feature_transformer 2020-10-31 11:54:03 +09:00
Tomasz Sobczyk
db1b33d4ac Optimize trainer clipped relu propagate 2020-10-31 11:52:51 +09:00
Tomasz Sobczyk
b5714c4084 Parallelize input slice trainer backprop. 2020-10-31 11:52:26 +09:00
Tomasz Sobczyk
941897ff2c Optimize trainer clipped relu backpropagate. 2020-10-31 11:50:12 +09:00
Tomasz Sobczyk
c96743c5bd Optimize feature transformer backpropagation stats. 2020-10-31 11:49:29 +09:00
Tomasz Sobczyk
2c10b1babc Optimize feature transformer clipped relu. 2020-10-31 11:48:02 +09:00
Tomasz Sobczyk
a56d8124d8 Replace non-blas parts of trainers with our own blas-like routines. 2020-10-31 08:36:58 +09:00
Tomasz Sobczyk
ee0917a345 Pass ThreadPool to update_parameters, propagate, and backpropagate. 2020-10-29 09:21:19 +09:00
Tomasz Sobczyk
f1e96cab55 Align trainer arrays to cache line. 2020-10-29 09:12:50 +09:00