Finite Field based on GPU
Recently I've been interested in finite field operations, so I decided to implement a few fields in SYCL DPC++, targeting accelerators (specifically GPGPUs).
This repository currently holds arithmetic operations for two finite fields, accompanied by benchmarks on both CPU and GPGPU.
F(2 ** 32)
F(2 ** 64 - 2 ** 32 + 1)
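To make the second field concrete, here is a minimal host-side sketch of arithmetic modulo p = 2 ** 64 - 2 ** 32 + 1. It is not the code from this repository: the names `ff_add`, `ff_sub`, `ff_mul`, `ff_pow` and `ff_inv` are hypothetical, and the reduction simply relies on the `unsigned __int128` extension available in clang++ / g++ rather than on any device-friendly reduction trick.

```cpp
#include <cassert>
#include <cstdint>

// Prime modulus p = 2^64 - 2^32 + 1 (illustrative constant name)
constexpr uint64_t P = 0xFFFFFFFF00000001ULL;

// (a + b) mod p; the sum is widened to 128 bits so it cannot wrap
inline uint64_t ff_add(uint64_t a, uint64_t b) {
  unsigned __int128 s = static_cast<unsigned __int128>(a) + b;
  return static_cast<uint64_t>(s % P);
}

// (a - b) mod p, assuming a, b < p
inline uint64_t ff_sub(uint64_t a, uint64_t b) {
  return a >= b ? a - b : P - (b - a);
}

// (a * b) mod p via a full 128-bit product
inline uint64_t ff_mul(uint64_t a, uint64_t b) {
  unsigned __int128 m = static_cast<unsigned __int128>(a) * b;
  return static_cast<uint64_t>(m % P);
}

// a^e mod p by square-and-multiply
inline uint64_t ff_pow(uint64_t a, uint64_t e) {
  uint64_t result = 1;
  uint64_t base = a % P;
  while (e > 0) {
    if (e & 1) result = ff_mul(result, base);
    base = ff_mul(base, base);
    e >>= 1;
  }
  return result;
}

// Multiplicative inverse via Fermat's little theorem: a^(p-2) mod p, a != 0
inline uint64_t ff_inv(uint64_t a) {
  assert(a % P != 0 && "zero has no multiplicative inverse");
  return ff_pow(a, P - 2);
}
```

Because 2 ** 64 is congruent to 2 ** 32 - 1 modulo this prime, `ff_mul(1ULL << 32, 1ULL << 32)` evaluates to `0xFFFFFFFF`, which is a handy sanity check for any faster reduction routine.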
I've also written the following implementations, along with benchmark results on CPU and GPU.
You need make, clang-format and dpcpp / clang++ installed.
$ lsb_release -d
Description: Ubuntu 20.04.3 LTS
$ dpcpp --version
Intel(R) oneAPI DPC++/C++ Compiler 2021.3.0 (2021.3.0.20210619)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2021.3.0/linux/bin
make # JIT kernel compilation on *default* device, for AOT read below
./run
DEVICE=cpu make # still JIT, but in runtime use CPU
DEVICE=gpu make # still JIT, but in runtime use GPU
DEVICE=host make # still JIT, but in runtime use HOST
make clean
make format
The steps above compile kernels just-in-time (JIT). If the target device is already known, it's better to compile them ahead-of-time (AOT), which saves some time at runtime, though the compiled binary then becomes device specific.
I provide an AOT kernel compilation recipe for CPUs using avx2 instructions. You can check whether your CPU supports them.
lscpu | grep -i avx
DEVICE=cpu make aot_cpu
DEVICE=gpu make aot_gpu
If you have some other hardware, consider taking a look at the AOT compilation guidelines and making the necessary changes in the Makefile.
Targeting Nvidia GPU with CUDA backend:
To target an Nvidia GPU, run
DEVICE=gpu make cuda
so that the benchmark suite is compiled for the CUDA backend.
I ran the benchmark suite on both Intel CPU / GPU and Nvidia GPU; the results are kept here 👇
Intel CPU/ GPU
Nvidia GPU
F(2 ** 32)
F(2 ** 64 - 2 ** 32 + 1)
F(2 ** 64 - 2 ** 32 + 1)
F(2 ** 64 - 2 ** 32 + 1)
You can run basic test cases using
# set variable to runtime target device
DEVICE=cpu|gpu|host make test
There's another set of randomised test cases, which checks results obtained from my prime field implementation against another finite field implementation, a Python module named galois.
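The idea behind such randomised tests is to compute the same operation in two independent ways and require that they agree. The sketch below illustrates that idea in plain C++ rather than with galois, and it is not the repository's test harness: `mul_ref`, `mul_alt`, `add_mod` and `cross_check` are hypothetical names, and the "second implementation" is a deliberately simple double-and-add multiplication.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>

// Prime modulus p = 2^64 - 2^32 + 1 (illustrative constant name)
constexpr uint64_t P = 0xFFFFFFFF00000001ULL;

// Reference multiplication: full 128-bit product, then reduce with %
inline uint64_t mul_ref(uint64_t a, uint64_t b) {
  return static_cast<uint64_t>((static_cast<unsigned __int128>(a) * b) % P);
}

// (a + b) mod p for a, b < p, written so that nothing wraps around 2^64
inline uint64_t add_mod(uint64_t a, uint64_t b) {
  uint64_t c = P - b;              // c is in [1, P]
  return a >= c ? a - c : a + b;   // a < c implies a + b < P
}

// Independent multiplication: double-and-add using only modular additions
inline uint64_t mul_alt(uint64_t a, uint64_t b) {
  uint64_t acc = 0;
  while (b > 0) {
    if (b & 1) acc = add_mod(acc, a);
    a = add_mod(a, a);
    b >>= 1;
  }
  return acc;
}

// Draw `rounds` random pairs in [0, p) and count disagreements
inline std::size_t cross_check(std::size_t rounds, uint64_t seed) {
  std::mt19937_64 rng(seed);
  std::uniform_int_distribution<uint64_t> dist(0, P - 1);
  std::size_t mismatches = 0;
  for (std::size_t i = 0; i < rounds; ++i) {
    uint64_t a = dist(rng);
    uint64_t b = dist(rng);
    if (mul_ref(a, b) != mul_alt(a, b)) ++mismatches;
  }
  return mismatches;
}
```

A call such as `cross_check(1000, 42)` should report zero mismatches; any non-zero count points at a bug in one of the two routines. Fixing the seed keeps a failing run reproducible.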
To run those, I suggest you first compile the shared object using
# set variable to runtime target device
DEVICE=cpu|gpu|host make genlib
After that you can follow next steps here.