test_suite fails when using GPU @v3.5.3
On master@3f7738f6e7f71d13125dd4a5ecbd22cdc7f0e98d, test_suite fails when the GPU is used.
Detailed logs:
[SHTns 3.5.3] built Apr 25 2023, 23:30:50, id: v3.5.3*,avx2,ishioka
Lmax=7, Mmax*Mres=3, Mres=1, Nlm=26 [1 threads, orthonormalized]
=> using FFTW : Mmax=3, Nphi=8, Nlat=64 (data layout : phi_inc=64, theta_inc=1)
=> using Gauss nodes
Sum of weights = 2 + 4.44089e-16 (should be 2)
Applying quadrature rule to 3/2.x^2 = 1 + 1.11022e-16 (should be 1)
Applying quadrature rule to 3/4.sin2(theta) = 1 + -5.55112e-17 (should be 1)
+ polar optimization threshold = 1.0e-10
cuda GPU #0 "NVIDIA GeForce RTX 3050 Laptop GPU" found (warp size = 32, compute capabilities = 8.6).
have a theta contiguous layout
work-area size: 512 nlat*nphi = 0
float work-area size: 512 nlat*nphi = 0
+ GPU #0 successfully initialized.
+ SHT accuracy = 1.83e-15
=> SHTns is ready.
line #124 : FAIL (|3.11 - 3.54| = 0.433)
line #125 : FAIL (|1 - 0.0205| = 0.983)
line #126 : FAIL (|1 - 0.317| = 0.686)
line #128 : FAIL (|-0.0216 - -1.45| = 1.43)
line #129 : FAIL (|0.0174 - 0.434| = 0.417)
line #130 : FAIL (|-0.000763 - 0.56| = 0.561)
line #131 : FAIL (|-0.00201 - -0.056| = 0.054)
line #135 : FAIL (|-0.179 - 0| = 0.179)
line #135 : FAIL (|0.168 - 0| = 0.168)
line #135 : FAIL (|-0.17 - 0| = 0.17)
line #135 : FAIL (|0.00779 - 0| = 0.00779)
line #135 : FAIL (|-0.0478 - 0| = 0.0478)
line #135 : FAIL (|0.00174 - 0| = 0.00174)
line #136 : FAIL (|0.0384 - 0| = 0.0384)
line #135 : FAIL (|0.0114 - 0| = 0.0114)
line #136 : FAIL (|0.00477 - 0| = 0.00477)
line #135 : FAIL (|-0.00789 - 0| = 0.00789)
line #136 : FAIL (|0.0191 - 0| = 0.0191)
line #135 : FAIL (|0.0112 - 0| = 0.0112)
line #136 : FAIL (|-0.0171 - 0| = 0.0171)
line #135 : FAIL (|-0.000607 - 0| = 0.000607)
line #136 : FAIL (|0.0015 - 0| = 0.0015)
line #135 : FAIL (|0.00456 - 0| = 0.00456)
line #136 : FAIL (|-0.00562 - 0| = 0.00562)
line #135 : FAIL (|0.0119 - 0| = 0.0119)
line #136 : FAIL (|-0.00721 - 0| = 0.00721)
line #135 : FAIL (|0.00979 - 0| = 0.00979)
line #136 : FAIL (|-0.0148 - 0| = 0.0148)
line #135 : FAIL (|0.00696 - 0| = 0.00696)
line #136 : FAIL (|-0.00794 - 0| = 0.00794)
line #135 : FAIL (|-0.00555 - 0| = 0.00555)
line #136 : FAIL (|0.00401 - 0| = 0.00401)
line #135 : FAIL (|0.00221 - 0| = 0.00221)
line #136 : FAIL (|-0.00261 - 0| = 0.00261)
line #135 : FAIL (|-0.01 - 0| = 0.01)
line #136 : FAIL (|0.00293 - 0| = 0.00293)
line #135 : FAIL (|-0.0106 - 0| = 0.0106)
line #136 : FAIL (|0.00554 - 0| = 0.00554)
line #135 : FAIL (|-0.00216 - 0| = 0.00216)
line #136 : FAIL (|0.00134 - 0| = 0.00134)
line #135 : FAIL (|-0.00794 - 0| = 0.00794)
line #136 : FAIL (|0.00367 - 0| = 0.00367)
line #135 : FAIL (|0.00309 - 0| = 0.00309)
line #136 : FAIL (|-0.000917 - 0| = 0.000917)
line #151 : FAIL (|1.03 - 0| = 1.03)
** ERROR: 45 tests out of 75 FAILED**
printf(COLOR_WRN "** ANALYSE TESTS **" COLOR_END "\n");
const int nlat = 64;
const int nphi = 8;
const int lmax = 7;
shtns_cfg sht = shtns_init(sht_orthonormal | sht_auto | SHT_ALLOW_GPU, lmax, 3, 1, nlat, nphi);
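One way to narrow this down might be to run the same transform twice, once with SHT_ALLOW_GPU and once without, and compare the two outputs value by value. Below is a minimal sketch of such a comparison, assuming both results are available as plain arrays of double. The helpers `max_abs_diff` and `count_mismatches` are hypothetical names for illustration and are not part of SHTns. The tolerance is also an assumption and should be chosen to match the expected accuracy.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not part of SHTns): largest absolute element-wise
 * difference between two arrays of n doubles. Returns 0 when n == 0.
 * Note: NaN entries are not reported by this function. */
double max_abs_diff(const double *a, const double *b, size_t n)
{
    double worst = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double d = fabs(a[i] - b[i]);
        if (d > worst)
            worst = d;
    }
    return worst;
}

/* Hypothetical helper: number of entries whose absolute difference is not
 * within tol. The negated comparison makes a NaN difference count as a
 * mismatch as well. */
size_t count_mismatches(const double *a, const double *b, size_t n, double tol)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; ++i) {
        if (!(fabs(a[i] - b[i]) <= tol))
            ++bad;
    }
    return bad;
}
```

Calling `count_mismatches(cpu_out, gpu_out, n, tol)` on the two result arrays (names assumed here) would show whether only some entries differ or all of them do, which may help tell a layout problem from a wrong result.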