Architecture:
The diagram of the "SFNNv4" architecture:
https://user-images.githubusercontent.com/8037982/153455685-cbe3a038-e158-4481-844d-9d5fccf5c33a.png
The most important architectural changes are the following:
* 1024x2 [activated] neurons are pairwise, elementwise multiplied (not quite pairwise due to implementation details, see diagram), which introduces a non-linearity that exhibits similar benefits to previously tested sigmoid activation (quantmoid4), while being slightly faster.
* The following layer therefore has half as many inputs, which we compensate for by doubling the number of outputs. It is possible that reducing the number of outputs might be beneficial (we had it as low as 8 before). The layer is now 1024->16.
* The 16 outputs are split into 15 and 1. The 1-wide output is added to the network output (after some necessary scaling due to quantization differences). The 15-wide part is activated and follows the usual path through a set of linear layers. The additional 1-wide output is at least neutral, has shown a slightly positive trend in training compared to networks without it (all 16 outputs through the usual path), and possibly allows an additional stage of lazy evaluation to be introduced in the future.
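The pairwise multiplication can be pictured with a minimal sketch (illustrative only, not the Stockfish code; the exact pairing of neurons differs in the real implementation for the reasons shown in the diagram):

```python
# Illustrative sketch of the pairwise-multiply non-linearity: clipped
# activations are multiplied in pairs, halving the width while introducing
# a sigmoid-like non-linearity. Pairing choice here is an assumption.

def clipped_relu(x):
    # Activations are clipped to [0, 1] before the multiplication.
    return max(0.0, min(1.0, x))

def pairwise_multiply(activations):
    # Pair element i with element i + n and multiply: 2n inputs -> n outputs.
    n = len(activations) // 2
    a = [clipped_relu(x) for x in activations]
    return [a[i] * a[i + n] for i in range(n)]

out = pairwise_multiply([0.5, 2.0, -1.0, 0.25])
# 4 inputs produce 2 outputs: [0.0, 0.25]
```

Because both factors live in [0, 1], the product saturates near the ends of the range, which is what gives the quantmoid4-like behaviour mentioned above at lower cost.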
Additionally, the inference code was rewritten and no longer uses a recursive implementation. This was necessitated by the splitting of the 16-wide intermediate result into two, which was impossible to do in the old implementation without ugly hacks. This is hopefully for the better overall.
First session:
The first session trained a network from scratch (random initialization). The exact trainer used was slightly older than the one used in the second session, but this should not have a measurable effect. The purpose of this session is to establish a strong base network for the second session. Small deviations in strength do not harm learnability in the second session.
The training was done using the following command:
python3 train.py \
/home/sopel/nnue/nnue-pytorch-training/data/nodes5000pv2_UHO.binpack \
/home/sopel/nnue/nnue-pytorch-training/data/nodes5000pv2_UHO.binpack \
--gpus "$3," \
--threads 4 \
--num-workers 4 \
--batch-size 16384 \
--progress_bar_refresh_rate 20 \
--random-fen-skipping 3 \
--features=HalfKAv2_hm^ \
--lambda=1.0 \
--gamma=0.992 \
--lr=8.75e-4 \
--max_epochs=400 \
--default_root_dir ../nnue-pytorch-training/experiment_$1/run_$2
Every 20th net was saved and its playing strength measured against some baseline at 25k nodes per move with pure NNUE evaluation (modified binary). The exact setup is not important as long as it's consistent. The purpose is to sift good candidates from bad ones.
The dataset can be found at https://drive.google.com/file/d/1UQdZN_LWQ265spwTBwDKo0t1WjSJKvWY/view
Second session:
The second training session started from the best network (as determined by strength testing) from the first session. It is important that training is resumed from a .pt model and NOT a .ckpt model. The conversion can be performed directly using serialize.py.
The LR schedule was modified to use gamma=0.995 instead of gamma=0.992 and LR=4.375e-4 instead of LR=8.75e-4 to flatten the LR curve and allow for longer training. The training was then running for 800 epochs instead of 400 (though it's possibly mostly noise after around epoch 600).
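The effect of flattening the schedule can be seen with a quick sketch (assuming a plain per-epoch exponential decay lr0 * gamma^epoch, which is how these gamma values are conventionally applied; the exact scheduler is not specified here):

```python
# Sketch of an exponential LR schedule: lr_at_epoch = lr0 * gamma ** epoch.
# The decay form is an assumption based on the --gamma and --lr flags above.

def lr_at_epoch(lr0, gamma, epoch):
    return lr0 * gamma ** epoch

first  = lr_at_epoch(8.75e-4,  0.992, 400)  # first-session schedule at its end
second = lr_at_epoch(4.375e-4, 0.995, 400)  # second-session schedule, same epoch

# Despite starting at half the LR, the second schedule decays more slowly and
# retains a higher LR late in training, which is what permits 800 epochs.
```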
The training was done using the following command:
python3 train.py \
/data/sopel/nnue/nnue-pytorch-training/data/T60T70wIsRightFarseerT60T74T75T76.binpack \
/data/sopel/nnue/nnue-pytorch-training/data/T60T70wIsRightFarseerT60T74T75T76.binpack \
--gpus "$3," \
--threads 4 \
--num-workers 4 \
--batch-size 16384 \
--progress_bar_refresh_rate 20 \
--random-fen-skipping 3 \
--features=HalfKAv2_hm^ \
--lambda=1.0 \
--gamma=0.995 \
--lr=4.375e-4 \
--max_epochs=800 \
--resume-from-model /data/sopel/nnue/nnue-pytorch-training/data/exp295/nn-epoch399.pt \
--default_root_dir ../nnue-pytorch-training/experiment_$1/run_$run_id
In particular, note that we now use lambda=1.0 instead of lambda=0.8 (as in previous nets), because tests show that the WDL-skipping introduced by vondele performs better with lambda=1.0. Nets were saved every 20th epoch. In total 16 runs were made with these settings, and the best nets were chosen according to playing strength at 25k nodes per move with pure NNUE evaluation; these are the 4 nets that have been put on fishtest.
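For context, lambda in the nnue-pytorch trainer interpolates between the search-score target and the game result. A simplified sketch (the real loss works in win-probability space with its own scaling; the 1/400 sigmoid constant below is purely illustrative):

```python
# Simplified sketch of how lambda blends the two training targets.
# Assumption: real nnue-pytorch converts both to win-probability space with
# engine-specific scaling; this constant is illustrative only.
import math

def blended_target(search_score_cp, game_result_wdl, lam):
    # search_score_cp: engine score in centipawns; game_result_wdl: 0, 0.5 or 1.
    p_score = 1.0 / (1.0 + math.exp(-search_score_cp / 400.0))  # score -> [0, 1]
    return lam * p_score + (1.0 - lam) * game_result_wdl

# With lambda = 1.0 the game result is ignored entirely, so the net trains
# purely toward the search score, which pairs well with WDL-skipping.
t = blended_target(100, 1.0, 1.0)
```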
The dataset can be found either at ftp://ftp.chessdb.cn/pub/sopel/data_sf/T60T70wIsRightFarseerT60T74T75T76.binpack in its entirety (the download might be painfully slow because it is hosted in China) or can be assembled in the following way:
Get the 5640ad48ae/script/interleave_binpacks.py script.
Download T60T70wIsRightFarseer.binpack https://drive.google.com/file/d/1_sQoWBl31WAxNXma2v45004CIVltytP8/view
Download farseerT74.binpack http://trainingdata.farseer.org/T74-May13-End.7z
Download farseerT75.binpack http://trainingdata.farseer.org/T75-June3rd-End.7z
Download farseerT76.binpack http://trainingdata.farseer.org/T76-Nov10th-End.7z
Run python3 interleave_binpacks.py T60T70wIsRightFarseer.binpack farseerT74.binpack farseerT75.binpack farseerT76.binpack T60T70wIsRightFarseerT60T74T75T76.binpack
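The interleaving step can be pictured as a size-weighted random merge of the inputs into one output file. A rough sketch of the idea (a hypothetical simplification: the real interleave_binpacks.py merges binpack chunks and preserves the binpack framing, not individual items):

```python
# Rough sketch of interleaving several data sources into one stream.
# Hypothetical simplification of what an interleave script does.
import random

def interleave(streams, rng=random.Random(42)):
    # Draw the next item from a randomly chosen non-empty stream, weighted by
    # how many items each stream still holds, so the mix stays proportional.
    streams = [list(s) for s in streams]
    out = []
    while any(streams):
        weights = [len(s) for s in streams]
        i = rng.choices(range(len(streams)), weights=weights)[0]
        out.append(streams[i].pop(0))
    return out

merged = interleave([[1, 2, 3], ["a", "b"]])
# merged contains all five items in a randomized but proportional order
```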
Tests:
STC: https://tests.stockfishchess.org/tests/view/6203fb85d71106ed12a407b7
LLR: 2.94 (-2.94,2.94) <0.00,2.50>
Total: 16952 W: 4775 L: 4521 D: 7656
Ptnml(0-2): 133, 1818, 4318, 2076, 131
LTC: https://tests.stockfishchess.org/tests/view/62041e68d71106ed12a40e85
LLR: 2.94 (-2.94,2.94) <0.50,3.00>
Total: 14944 W: 4138 L: 3907 D: 6899
Ptnml(0-2): 21, 1499, 4202, 1728, 22
closes https://github.com/official-stockfish/Stockfish/pull/3927
Bench: 4919707
/*
  Stockfish, a UCI chess playing engine derived from Glaurung 2.1
  Copyright (C) 2004-2022 The Stockfish developers (see AUTHORS file)

  Stockfish is free software: you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation, either version 3 of the License, or
  (at your option) any later version.

  Stockfish is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with this program.  If not, see <http://www.gnu.org/licenses/>.
*/

// Code for calculating NNUE evaluation function

#include <iostream>
#include <set>
#include <sstream>
#include <iomanip>
#include <fstream>
#include <cstring>   // std::memset
#include <cstdio>    // sprintf
#include <optional>  // std::optional

#include "../evaluate.h"
#include "../position.h"
#include "../misc.h"
#include "../uci.h"
#include "../types.h"

#include "evaluate_nnue.h"
namespace Stockfish::Eval::NNUE {

  // Input feature converter
  LargePagePtr<FeatureTransformer> featureTransformer;

  // Evaluation function
  AlignedPtr<Network> network[LayerStacks];

  // Evaluation function file name
  std::string fileName;
  std::string netDescription;

  namespace Detail {

  // Initialize the evaluation function parameters
  template <typename T>
  void initialize(AlignedPtr<T>& pointer) {

    pointer.reset(reinterpret_cast<T*>(std_aligned_alloc(alignof(T), sizeof(T))));
    std::memset(pointer.get(), 0, sizeof(T));
  }

  template <typename T>
  void initialize(LargePagePtr<T>& pointer) {

    static_assert(alignof(T) <= 4096, "aligned_large_pages_alloc() may fail for such a big alignment requirement of T");
    pointer.reset(reinterpret_cast<T*>(aligned_large_pages_alloc(sizeof(T))));
    std::memset(pointer.get(), 0, sizeof(T));
  }

  // Read evaluation function parameters
  template <typename T>
  bool read_parameters(std::istream& stream, T& reference) {

    std::uint32_t header;
    header = read_little_endian<std::uint32_t>(stream);
    if (!stream || header != T::get_hash_value()) return false;
    return reference.read_parameters(stream);
  }

  // Write evaluation function parameters
  template <typename T>
  bool write_parameters(std::ostream& stream, const T& reference) {

    write_little_endian<std::uint32_t>(stream, T::get_hash_value());
    return reference.write_parameters(stream);
  }

  }  // namespace Detail
  // Initialize the evaluation function parameters
  void initialize() {

    Detail::initialize(featureTransformer);
    for (std::size_t i = 0; i < LayerStacks; ++i)
      Detail::initialize(network[i]);
  }

  // Read network header
  bool read_header(std::istream& stream, std::uint32_t* hashValue, std::string* desc)
  {
    std::uint32_t version, size;

    version    = read_little_endian<std::uint32_t>(stream);
    *hashValue = read_little_endian<std::uint32_t>(stream);
    size       = read_little_endian<std::uint32_t>(stream);
    if (!stream || version != Version) return false;
    desc->resize(size);
    stream.read(&(*desc)[0], size);
    return !stream.fail();
  }

  // Write network header
  bool write_header(std::ostream& stream, std::uint32_t hashValue, const std::string& desc)
  {
    write_little_endian<std::uint32_t>(stream, Version);
    write_little_endian<std::uint32_t>(stream, hashValue);
    write_little_endian<std::uint32_t>(stream, desc.size());
    stream.write(&desc[0], desc.size());
    return !stream.fail();
  }

  // Read network parameters
  bool read_parameters(std::istream& stream) {

    std::uint32_t hashValue;
    if (!read_header(stream, &hashValue, &netDescription)) return false;
    if (hashValue != HashValue) return false;
    if (!Detail::read_parameters(stream, *featureTransformer)) return false;
    for (std::size_t i = 0; i < LayerStacks; ++i)
      if (!Detail::read_parameters(stream, *(network[i]))) return false;
    return stream && stream.peek() == std::ios::traits_type::eof();
  }

  // Write network parameters
  bool write_parameters(std::ostream& stream) {

    if (!write_header(stream, HashValue, netDescription)) return false;
    if (!Detail::write_parameters(stream, *featureTransformer)) return false;
    for (std::size_t i = 0; i < LayerStacks; ++i)
      if (!Detail::write_parameters(stream, *(network[i]))) return false;
    return (bool)stream;
  }
  // Evaluation function. Perform differential calculation.
  Value evaluate(const Position& pos, bool adjusted) {

    // We manually align the arrays on the stack because with gcc < 9.3
    // overaligning stack variables with alignas() doesn't work correctly.

    constexpr uint64_t alignment = CacheLineSize;
    int delta = 7;

#if defined(ALIGNAS_ON_STACK_VARIABLES_BROKEN)
    TransformedFeatureType transformedFeaturesUnaligned[
      FeatureTransformer::BufferSize + alignment / sizeof(TransformedFeatureType)];

    auto* transformedFeatures = align_ptr_up<alignment>(&transformedFeaturesUnaligned[0]);
#else
    alignas(alignment)
      TransformedFeatureType transformedFeatures[FeatureTransformer::BufferSize];
#endif

    ASSERT_ALIGNED(transformedFeatures, alignment);

    const std::size_t bucket = (pos.count<ALL_PIECES>() - 1) / 4;
    const auto psqt = featureTransformer->transform(pos, transformedFeatures, bucket);
    const auto positional = network[bucket]->propagate(transformedFeatures);

    // Give more value to positional evaluation when adjusted flag is set
    if (adjusted)
        return static_cast<Value>(((128 - delta) * psqt + (128 + delta) * positional) / 128 / OutputScale);
    else
        return static_cast<Value>((psqt + positional) / OutputScale);
  }

  struct NnueEvalTrace {
    static_assert(LayerStacks == PSQTBuckets);

    Value psqt[LayerStacks];
    Value positional[LayerStacks];
    std::size_t correctBucket;
  };

  static NnueEvalTrace trace_evaluate(const Position& pos) {

    // We manually align the arrays on the stack because with gcc < 9.3
    // overaligning stack variables with alignas() doesn't work correctly.

    constexpr uint64_t alignment = CacheLineSize;

#if defined(ALIGNAS_ON_STACK_VARIABLES_BROKEN)
    TransformedFeatureType transformedFeaturesUnaligned[
      FeatureTransformer::BufferSize + alignment / sizeof(TransformedFeatureType)];

    auto* transformedFeatures = align_ptr_up<alignment>(&transformedFeaturesUnaligned[0]);
#else
    alignas(alignment)
      TransformedFeatureType transformedFeatures[FeatureTransformer::BufferSize];
#endif

    ASSERT_ALIGNED(transformedFeatures, alignment);

    NnueEvalTrace t{};
    t.correctBucket = (pos.count<ALL_PIECES>() - 1) / 4;
    for (std::size_t bucket = 0; bucket < LayerStacks; ++bucket) {
      const auto materialist = featureTransformer->transform(pos, transformedFeatures, bucket);
      const auto positional = network[bucket]->propagate(transformedFeatures);

      t.psqt[bucket] = static_cast<Value>( materialist / OutputScale );
      t.positional[bucket] = static_cast<Value>( positional / OutputScale );
    }

    return t;
  }
  static const std::string PieceToChar(" PNBRQK  pnbrqk");


  // format_cp_compact() converts a Value into (centi)pawns and writes it in a buffer.
  // The buffer must have capacity for at least 5 chars.
  static void format_cp_compact(Value v, char* buffer) {

    buffer[0] = (v < 0 ? '-' : v > 0 ? '+' : ' ');

    int cp = std::abs(100 * v / PawnValueEg);
    if (cp >= 10000)
    {
        buffer[1] = '0' + cp / 10000; cp %= 10000;
        buffer[2] = '0' + cp / 1000; cp %= 1000;
        buffer[3] = '0' + cp / 100;
        buffer[4] = ' ';
    }
    else if (cp >= 1000)
    {
        buffer[1] = '0' + cp / 1000; cp %= 1000;
        buffer[2] = '0' + cp / 100; cp %= 100;
        buffer[3] = '.';
        buffer[4] = '0' + cp / 10;
    }
    else
    {
        buffer[1] = '0' + cp / 100; cp %= 100;
        buffer[2] = '.';
        buffer[3] = '0' + cp / 10; cp %= 10;
        buffer[4] = '0' + cp / 1;
    }
  }
  // format_cp_aligned_dot() converts a Value into (centi)pawns and writes it in a buffer,
  // always keeping two decimals. The buffer must have capacity for at least 7 chars.
  static void format_cp_aligned_dot(Value v, char* buffer) {

    buffer[0] = (v < 0 ? '-' : v > 0 ? '+' : ' ');

    double cp = 1.0 * std::abs(int(v)) / PawnValueEg;
    sprintf(&buffer[1], "%6.2f", cp);
  }
  // trace() returns a string with the value of each piece on a board,
  // and a table for (PSQT, Layers) values bucket by bucket.

  std::string trace(Position& pos) {

    std::stringstream ss;

    char board[3*8+1][8*8+2];
    std::memset(board, ' ', sizeof(board));
    for (int row = 0; row < 3*8+1; ++row)
      board[row][8*8+1] = '\0';

    // A lambda to output one box of the board
    auto writeSquare = [&board](File file, Rank rank, Piece pc, Value value) {

      const int x = ((int)file) * 8;
      const int y = (7 - (int)rank) * 3;
      for (int i = 1; i < 8; ++i)
         board[y][x+i] = board[y+3][x+i] = '-';
      for (int i = 1; i < 3; ++i)
         board[y+i][x] = board[y+i][x+8] = '|';
      board[y][x] = board[y][x+8] = board[y+3][x+8] = board[y+3][x] = '+';
      if (pc != NO_PIECE)
        board[y+1][x+4] = PieceToChar[pc];
      if (value != VALUE_NONE)
        format_cp_compact(value, &board[y+2][x+2]);
    };

    // We estimate the value of each piece by doing a differential evaluation from
    // the current base eval, simulating the removal of the piece from its square.
    Value base = evaluate(pos);
    base = pos.side_to_move() == WHITE ? base : -base;

    for (File f = FILE_A; f <= FILE_H; ++f)
      for (Rank r = RANK_1; r <= RANK_8; ++r)
      {
        Square sq = make_square(f, r);
        Piece pc = pos.piece_on(sq);
        Value v = VALUE_NONE;

        if (pc != NO_PIECE && type_of(pc) != KING)
        {
          auto st = pos.state();

          pos.remove_piece(sq);
          st->accumulator.computed[WHITE] = false;
          st->accumulator.computed[BLACK] = false;

          Value eval = evaluate(pos);
          eval = pos.side_to_move() == WHITE ? eval : -eval;
          v = base - eval;

          pos.put_piece(pc, sq);
          st->accumulator.computed[WHITE] = false;
          st->accumulator.computed[BLACK] = false;
        }

        writeSquare(f, r, pc, v);
      }

    ss << " NNUE derived piece values:\n";
    for (int row = 0; row < 3*8+1; ++row)
        ss << board[row] << '\n';
    ss << '\n';

    auto t = trace_evaluate(pos);

    ss << " NNUE network contributions "
       << (pos.side_to_move() == WHITE ? "(White to move)" : "(Black to move)") << std::endl
       << "+------------+------------+------------+------------+\n"
       << "|   Bucket   |  Material  | Positional |   Total    |\n"
       << "|            |   (PSQT)   |  (Layers)  |            |\n"
       << "+------------+------------+------------+------------+\n";

    for (std::size_t bucket = 0; bucket < LayerStacks; ++bucket)
    {
      char buffer[3][8];
      std::memset(buffer, '\0', sizeof(buffer));

      format_cp_aligned_dot(t.psqt[bucket], buffer[0]);
      format_cp_aligned_dot(t.positional[bucket], buffer[1]);
      format_cp_aligned_dot(t.psqt[bucket] + t.positional[bucket], buffer[2]);

      ss <<  "|  " << bucket    << "        "
         << " |  " << buffer[0] << "  "
         << " |  " << buffer[1] << "  "
         << " |  " << buffer[2] << "  "
         << " |";
      if (bucket == t.correctBucket)
          ss << " <-- this bucket is used";
      ss << '\n';
    }

    ss << "+------------+------------+------------+------------+\n";

    return ss.str();
  }
  // Load eval, from a file stream or a memory stream
  bool load_eval(std::string name, std::istream& stream) {

    initialize();
    fileName = name;
    return read_parameters(stream);
  }

  // Save eval, to a file stream or a memory stream
  bool save_eval(std::ostream& stream) {

    if (fileName.empty())
      return false;

    return write_parameters(stream);
  }

  /// Save eval, to a file given by its name
  bool save_eval(const std::optional<std::string>& filename) {

    std::string actualFilename;
    std::string msg;

    if (filename.has_value())
        actualFilename = filename.value();
    else
    {
        if (currentEvalFileName != EvalFileDefaultName)
        {
            msg = "Failed to export a net. A non-embedded net can only be saved if the filename is specified";

            sync_cout << msg << sync_endl;
            return false;
        }
        actualFilename = EvalFileDefaultName;
    }

    std::ofstream stream(actualFilename, std::ios_base::binary);
    bool saved = save_eval(stream);

    msg = saved ? "Network saved successfully to " + actualFilename
                : "Failed to export a net";

    sync_cout << msg << sync_endl;
    return saved;
  }

}  // namespace Stockfish::Eval::NNUE