Initial documentation for learn, gensfen, convert, and binpack.
src/docs/binpack.md (new file, +42 lines)
# Binpack

Binpack is a binary training data storage format designed to take advantage of chains of positions that differ by a single move. This makes it very good at compactly storing data generated from real games (as opposed to, for example, random positions sourced from an opening book).

It is currently implemented as a single-header library in `extra/nnue_data_binpack_format.h`.

A rough description of the format follows, in a BNF-like notation.
```
[[nodiscard]] std::uint16_t signedToUnsigned(std::int16_t a) {
    std::uint16_t r;
    std::memcpy(&r, &a, sizeof(std::uint16_t));
    if (r & 0x8000) r ^= 0x7FFF; // flip value bits if negative
    r = (r << 1) | (r >> 15);    // store sign bit at bit 0
    return r;
}

file := <block>*
block := BINP<chain>*
chain := <stem><movetext>
stem := <pos><move><score><ply_and_result><rule50> (32 bytes)
pos := https://github.com/Sopel97/nnue_data_compress/blob/master/src/chess/Position.h#L1166 (24 bytes)
move := https://github.com/Sopel97/nnue_data_compress/blob/master/src/chess/Chess.h#L1044 (2 bytes)
score := signedToUnsigned(score) (2 bytes, big endian)
ply_and_result := ply bitwise_or (signedToUnsigned(result) << 14) (2 bytes, big endian)
rule50 := rule_50_counter (2 bytes, big endian)
    // Using 2 bytes here is a small defect inherited from an old version, kept
    // to avoid breaking backwards compatibility. Effectively one byte is left
    // free for future use, because rule50 always fits in one byte.

movetext := <count><move_and_score>*
count := number of plies in the movetext (2 bytes, big endian). Can be 0.
move_and_score := <encoded_move><encoded_score> (~2 bytes)
encoded_move := a variable-length move encoding; see
    https://github.com/Sopel97/nnue_data_compress/blob/master/src/compress_file.cpp#L827
    and https://github.com/Sopel97/chess_pos_db/blob/master/docs/bcgn/variable_length.md
encoded_score := a variable-width encoding (https://en.wikipedia.org/wiki/Variable-width_encoding)
    with a block size of 4 bits + 1 extension bit.
    The encoded value is signedToUnsigned(-prev_score - current_score)
    (scores are always seen from the perspective of the side to move in <pos>,
    hence the '-' before prev_score).
```
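To read these fields back, a decoder applies the inverse transform and unpacks `ply_and_result`. The sketch below is for illustration only and is not the library's actual decoding code (see `extra/nnue_data_binpack_format.h` for that); the helper names are invented here, and the 2-byte fields are assumed to have already been byte-swapped from big endian.

```cpp
#include <cstdint>
#include <cstring>

// Inverse of signedToUnsigned above: move the sign bit back to bit 15,
// then undo the value-bit flip for negative numbers.
[[nodiscard]] std::int16_t unsignedToSigned(std::uint16_t r) {
    r = (r >> 1) | (r << 15);
    if (r & 0x8000) r ^= 0x7FFF;
    std::int16_t a;
    std::memcpy(&a, &r, sizeof(std::int16_t));
    return a;
}

// Unpack ply_and_result: the low 14 bits hold the ply,
// the top 2 bits hold signedToUnsigned(result).
struct PlyAndResult {
    std::uint16_t ply;
    std::int16_t  result; // -1, 0, or 1
};

[[nodiscard]] PlyAndResult unpackPlyAndResult(std::uint16_t v) {
    return { static_cast<std::uint16_t>(v & 0x3FFF),
             unsignedToSigned(static_cast<std::uint16_t>(v >> 14)) };
}
```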
src/docs/convert.md (new file, +15 lines)
# Convert

`convert` allows conversion of training data between any of `.plain`, `.bin`, and `.binpack`.

As with all commands in Stockfish, `convert` can be invoked either from the command line (as `stockfish.exe convert ...`) or from the interactive prompt.

The syntax of this command is as follows:
```
convert from_path to_path [append]
```

`from_path` is the path to the file to convert from. The type of the data is deduced from its extension (one of `.plain`, `.bin`, `.binpack`).
`to_path` is the path to the output file. The type of the data is deduced from its extension. If the file does not exist it is created.

The last argument is optional. If it is not specified the output file is truncated prior to any writes. If the last argument is `append` the converted training data is appended to the end of the output file.
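For example, the following invocations (with illustrative file names) first convert a `.plain` file to `.binpack` and then append the contents of a `.bin` file to the same output:

```
convert train_games.plain train_games.binpack
convert more_games.bin train_games.binpack append
```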
src/docs/gensfen.md (new file, +57 lines)
# Gensfen

The `gensfen` command generates training data from self-play in a manner that suits training better than traditional games. It introduces random moves to diversify openings, allows reduced pruning, allows disabling the transposition table for less interference between searches, and supports fixed-depth evaluation.

As with all commands in Stockfish, `gensfen` can be invoked either from the command line (as `stockfish.exe gensfen ...`, though this is not recommended because it is not possible to specify UCI options before `gensfen` executes) or from the interactive prompt.

`gensfen` takes named parameters in the form `gensfen param_1_name param_1_value param_2_name param_2_value ...`.

Currently the following options are available:
`depth` - minimum depth of evaluation of each position. Default: 3.

`depth2` - maximum depth of evaluation of each position. If not specified then the same as `depth`.

`nodes` - the number of nodes to use for evaluation of each position. This number is multiplied by the number of PVs of the current search. It does NOT override the `depth` and `depth2` options. If specified then whichever of the depth or node limits is reached first applies.

`loop` - the number of training data entries to generate. 1 entry == 1 position.

`output_file_name` - the name of the file to output to. If the extension is missing or doesn't match the selected training data format the right extension is appended.

`eval_limit` - evaluations with a higher absolute value than this will not be written and will terminate the self-play game. Should not exceed 10000, which is VALUE_KNOWN_WIN, but is only hard-capped at mate in 2 (~30000).

`random_move_minply` - the minimum ply at which a random move may be executed instead of a move chosen by search.

`random_move_maxply` - the maximum ply at which a random move may be executed instead of a move chosen by search.

`random_move_count` - maximum number of random moves in a single self-play game.

`random_move_like_apery` - either 0 or 1. If 1 then a random king move will be followed by a random king move from the opponent, whenever possible, with 50% probability.
`random_multi_pv` - the number of PVs used for determining the random move. If not specified then a truly random move is chosen. If specified then a multiPV search is performed and the random move is one of the moves chosen by that search.

`random_multi_pv_diff` - makes the multiPV random move selection consider only moves that are at most `random_multi_pv_diff` worse than the next best move. Default: 30000 (all multiPV moves).

`random_multi_pv_depth` - the depth to use for the multiPV search for the random move. Defaults to `depth2`.

`write_minply` - minimum ply for which a training data entry will be emitted.

`write_maxply` - maximum ply for which a training data entry will be emitted.

`save_every` - the number of training data entries per file. If not specified then there will always be exactly one file. If specified there may be more than one file generated (each having at most `save_every` training data entries) and each file will have a unique number attached.

`random_file_name` - if specified then the output filename will be chosen randomly. Overrides `output_file_name`.

`use_draw_in_training_data_generation` - either 0 or 1. If 1 then training data from drawn games will be emitted too. Default: 0.

`write_out_draw_game_in_training_data_generation` - deprecated, alias for `use_draw_in_training_data_generation`.

`detect_draw_by_consecutive_low_score` - either 0 or 1. If 1 then drawn games will be adjudicated when the score remains 0 for at least 8 plies after ply 80. Default: 0.

`use_game_draw_adjudication` - deprecated, alias for `detect_draw_by_consecutive_low_score`.

`detect_draw_by_insufficient_mating_material` - either 0 or 1. If 1 then positions with insufficient mating material will be adjudicated as draws. Default: 0.

`sfen_format` - format of the training data to use. Either `bin` or `binpack`. Default: `bin`.

`seed` - seed for the PRNG. Can be either a number or a string. If it's a string then its hash will be used. If not specified then the current time will be used.
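A typical session, run from the interactive prompt so that UCI options can be set first, might look like the sketch below. The option names are taken from the list above, but the specific values are illustrative only, not recommendations.

```
setoption name Threads value 8
setoption name Hash value 1024
isready
gensfen depth 8 loop 100000000 eval_limit 10000 write_minply 5 random_move_count 5 sfen_format binpack output_file_name trainingdata seed 42
```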
src/docs/learn.md (new file, +92 lines)
# Learn

The `learn` command allows training a network from training data.

As with all commands in Stockfish, `learn` can be invoked either from the command line (as `stockfish.exe learn ...`, though this is not recommended because it is not possible to specify UCI options before `learn` executes) or from the interactive prompt.

`learn` takes named parameters in the form `learn param_1_name param_1_value param_2_name param_2_value ...`. Unrecognized parameters form a list of paths to training data files.

Currently the following options are available:
`bat` - the size of a minibatch, in multiples of 10000. This is the number of positions between weight updates. Default: 1000 (meaning a minibatch size of 1000000).

`targetdir` - path to the directory from which training data will be read. All files in this directory are read sequentially. If not specified then only the list of files from the positional arguments will be used. If specified then files from the given directory will be used after the explicitly specified files.

`loop` - the number of times to loop over all training data.

`basedir` - the base directory for the paths. Default: "" (current directory).

`batchsize` - same as `bat` but not scaled by 10000.

`lr` - initial learning rate. Default: 1.

`use_draw_games_in_training` - either 0 or 1. If 1 then draws will be used in training too. Default: 0.

`use_draw_in_training` - deprecated, alias for `use_draw_games_in_training`.

`use_draw_games_in_validation` - either 0 or 1. If 1 then draws will be used in validation too. Default: 0.

`use_draw_in_validation` - deprecated, alias for `use_draw_games_in_validation`.

`skip_duplicated_positions_in_training` - either 0 or 1. If 1 then a small hashtable will be used to try to eliminate duplicated positions from training. Default: 0.

`use_hash_in_training` - deprecated, alias for `skip_duplicated_positions_in_training`.

`winning_probability_coefficient` - the coefficient used when converting evaluations to winning probabilities. If in doubt, leave it at the default. Default: 1.0 / PawnValueEg / 4.0 * std::log(10.0).

`use_wdl` - either 0 or 1. If 1 then the evaluations will be converted to win/draw/loss percentages prior to learning on them. (This slightly changes the gradient because eval has a different derivative than wdl.) Default: 0.

`lambda` - value in range [0..1]. 1 means that only the evaluation is used for learning, 0 means that only the game result is used. Values in between interpolate between the two contributions. See `lambda_limit` for when this is applied. Default: 0.33.

`lambda2` - value in range [0..1]. 1 means that only the evaluation is used for learning, 0 means that only the game result is used. Values in between interpolate between the two contributions. See `lambda_limit` for when this is applied. Default: 0.33.

`lambda_limit` - the maximum absolute score value for which `lambda` is used as opposed to `lambda2`. For positions with an absolute evaluation higher than `lambda_limit`, `lambda2` is used. Default: 32000 (so always `lambda`).
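The interplay of `lambda`, `lambda2`, and `lambda_limit` can be summarized by the following sketch. This is not the trainer's actual code, only an illustration of the interpolation described above; the function and variable names are invented here.

```cpp
// How a single position's teacher signal is blended, per the description above:
// lambda applies to quiet scores, lambda2 to scores beyond lambda_limit.
double blendTeacherSignal(double eval_term,    // contribution derived from the search evaluation
                          double result_term,  // contribution derived from the game result
                          double abs_eval,     // |evaluation| of the position
                          double lambda, double lambda2, double lambda_limit) {
    const double l = (abs_eval <= lambda_limit) ? lambda : lambda2;
    // l == 1 -> only evaluation, l == 0 -> only game result, in between -> interpolation.
    return l * eval_term + (1.0 - l) * result_term;
}
```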
`reduction_gameply` - the minimum ply after which positions won't be skipped. Positions at plies below this value are skipped with a probability that decreases linearly with the ply (reaching 0 at `reduction_gameply`). Default: 1.

`eval_limit` - positions with an absolute evaluation higher than this will be skipped. Default: 32000 (nothing is skipped).

`save_only_once` - this is a modifier, not a parameter; no value follows it. If specified then only one network file will be generated.

`no_shuffle` - this is a modifier, not a parameter; no value follows it. If specified then data within a batch won't be shuffled.

`nn_batch_size` - batch size used for learning. Default: 1000.

`newbob_decay` - the learning rate will be multiplied by this factor every time a net is rejected (in other words, it controls LR drops). Default: 1.0 (no LR drops).

`newbob_num_trials` - determines after how many consecutive rejected nets the training process will be terminated. Default: 2.

`nn_options` - passes messages directly to the network evaluation code. Rarely needed and largely undocumented.

`eval_save_interval` - every `eval_save_interval` positions the network will be saved and either accepted or rejected (in which case an LR drop follows). Default: 1000000000 (1B). Values in the 10M-100M range are commonly used.

`loss_output_interval` - every `loss_output_interval` positions, fitness statistics are displayed. Default: `batchsize`.

`validation_set_file_name` - path to the file with training data to be used for validation (loss computation and move accuracy).

`seed` - seed for the PRNG. Can be either a number or a string. If it's a string then its hash will be used. If not specified then the current time will be used.
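As with `gensfen`, `learn` is typically run from the interactive prompt. A sketch of a session follows; the file names and values are illustrative only.

```
setoption name Threads value 8
isready
learn targetdir trainingdata loop 10 batchsize 1000000 eval_limit 3000 lambda 0.5 nn_batch_size 1000 newbob_decay 0.5 eval_save_interval 100000000 loss_output_interval 1000000 validation_set_file_name val.binpack
```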
## Legacy subcommands and parameters

### Convert

`convert_plain`
`convert_bin`
`interpolate_eval`
`check_invalid_fen`
`check_illegal_move`
`convert_bin_from_pgn-extract`
`pgn_eval_side_to_move`
`convert_no_eval_fens_as_score_zero`
`src_score_min_value`
`src_score_max_value`
`dest_score_min_value`
`dest_score_max_value`

### Shuffle

`shuffle`
`buffer_size`
`shuffleq`
`shufflem`
`output_file_name`