Read NNUE net faster

Load feature transformer weights in bulk on little-endian machines.
This is in particular useful to test new nets with c-chess-cli,
see https://github.com/lucasart/c-chess-cli/issues/44

```
$ time ./stockfish.exe uci

Before : 0m0.914s
After  : 0m0.483s
```

No functional change
This commit is contained in:
Tomasz Sobczyk
2021-06-09 11:21:55 +02:00
committed by Stéphane Nicolet
parent 559942d64d
commit b84fa04db6
4 changed files with 81 additions and 39 deletions

View File

@@ -66,9 +66,10 @@ std::ostream& operator<<(std::ostream&, SyncCout);
#define sync_cout std::cout << IO_LOCK
#define sync_endl std::endl << IO_UNLOCK
// `ptr` must point to an array of size at least
// `sizeof(T) * N + alignment` bytes, where `N` is the
// number of elements in the array.
// align_ptr_up() : get the first aligned element of an array.
// ptr must point to an array of size at least `sizeof(T) * N + alignment` bytes,
// where N is the number of elements in the array.
template <uintptr_t Alignment, typename T>
T* align_ptr_up(T* ptr)
{
@@ -78,6 +79,12 @@ T* align_ptr_up(T* ptr)
return reinterpret_cast<T*>(reinterpret_cast<char*>((ptrint + (Alignment - 1)) / Alignment * Alignment));
}
// IsLittleEndian : true if and only if the binary is compiled on a little endian machine
static inline const union { uint32_t i; char c[4]; } Le = { 0x01020304 };
static inline const bool IsLittleEndian = (Le.c[0] == 4);
template <typename T>
class ValueListInserter {
public: