calculation error

Message boards : Number crunching : calculation error

Ian&Steve C.

Joined: 13 Jan 25
Posts: 2
Credit: 7,500
RAC: 84
Message 954 - Posted: 13 Jan 2025, 15:31:30 UTC - in response to Message 953.  
Last modified: 13 Jan 2025, 15:36:00 UTC

Also, your Linux (maybe Windows too, I haven't checked) v1.02 app is broken for multi-GPU setups: all tasks seem to be hard-coded to run on GPU0.

And even stranger, when you abort the extra tasks running on a GPU, the one(s) that were meant to keep running immediately error out with an illegal memory access error.
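For context, a BOINC GPU app is expected to use the device the client assigns to it rather than always opening GPU 0. The client typically conveys that assignment through the `gpu_device_num` field of init_data.xml and, for backwards compatibility, a `--device N` command-line argument. A minimal Python sketch of the argument-parsing half follows, for illustration only (in the CUDA app the resulting index would be passed to `cudaSetDevice`); `assigned_gpu` is a hypothetical name and this is not the project's code.

```python
import argparse


def assigned_gpu(argv):
    """Return the GPU index this task was assigned, or 0 if none was given.

    Assumes the assignment arrives as `--device N`; any other
    command-line arguments are ignored.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--device", type=int, default=None)
    known, _unrelated = parser.parse_known_args(argv)
    # Default to GPU 0 only when no assignment was passed; hard-coding 0
    # instead would put every task on the same GPU.
    return 0 if known.device is None else known.device
```

For example, `assigned_gpu(["--nthreads", "1", "--device", "1"])` returns 1, so a second task on a two-GPU host would land on the second card.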
fzs600

Joined: 25 Jun 20
Posts: 9
Credit: 104,176,691
RAC: 1,466,624
Message 955 - Posted: 13 Jan 2025, 16:35:33 UTC - in response to Message 939.  

It appears that only hosts with 560 or 565 drivers have been successful.

NVIDIA NVIDIA GeForce RTX 2080 Ti (4095MB) driver: 550.99 OpenCL: 3.0
Linux Ubuntu
Ubuntu 22.04.5 LTS [6.8.0-51-generic|libc 2.35]
No error


13 Jan 2025, 14:21:38 UTC Completed and validated 6,873.84 6,873.84 2,500.00 1.20 Find seeds with zero villages within a radius v1.02 (cuda)
x86_64-pc-linux-gnu
boysanic
Project administrator
Project developer

Joined: 15 Jun 20
Posts: 32
Credit: 101,415,555
RAC: 110,632
Message 957 - Posted: 13 Jan 2025, 20:31:34 UTC

Hi everyone,

We've released v1.03 of this app to address the following:

1. Compiled directly for each GPU architecture level, increasing binary size but improving speed (~36% improvement measured on a Titan X Maxwell)
2. Potential multi-GPU fix
3. More debug text to help with GPU assignment bug troubleshooting
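Item 1 presumably means building architecture-specific machine code (SASS) for each targeted GPU generation, for example with one nvcc `-gencode arch=compute_XY,code=sm_XY` pair per architecture, instead of relying on just-in-time compilation of PTX; that reading is an assumption, not something the post states. Each extra cubin enlarges the fat binary, which fits the note about binary size. The Python sketch models the standard rule for which embedded cubin a GPU can load: code built for `sm_XY` runs on a GPU of compute capability X.z with z >= Y (same major version); otherwise the driver falls back to embedded PTX. The `BUILT` target list is hypothetical, and the rule ignores architecture-specific variants such as `sm_90a`.

```python
# Hypothetical build targets, as (major, minor) compute capabilities.
BUILT = [(5, 0), (5, 2), (6, 1), (7, 0), (7, 5), (8, 6)]


def pick_cubin(device_cc, cubin_archs=BUILT):
    """Best embedded cubin a device can run directly, or None.

    A cubin for sm_XY is binary-compatible with compute capability X.z
    when z >= Y. None means the driver would have to JIT-compile
    embedded PTX instead (or fail if no PTX was shipped).
    """
    major, minor = device_cc
    compatible = [(M, m) for (M, m) in cubin_archs if M == major and m <= minor]
    return max(compatible) if compatible else None
```

With these targets, a Titan X Maxwell (5.2) loads the `sm_52` cubin directly, while an RTX 4090 (8.9) would run the `sm_86` one.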


If you're having issues with tasks not being assigned to the correct GPUs, please post the contents of your stderr.txt (you can find it in the slot directory the task is running in, before the task is aborted or finishes) and we'll be happy to help.
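As a convenience for the request above, here is a minimal Python sketch that collects each slot's stderr.txt from a BOINC data directory. It assumes the usual layout `<data dir>/slots/<N>/stderr.txt`; the data directory itself varies by install (Debian/Ubuntu packages commonly use /var/lib/boinc-client, Windows commonly C:\ProgramData\BOINC), so pass yours explicitly, and reading it may need the BOINC user's permissions. Run it while the task is still going, since the slot is cleaned up afterwards. `slot_stderr` is a hypothetical helper, not part of BOINC.

```python
from pathlib import Path


def slot_stderr(data_dir):
    """Map slot number -> contents of that slot's stderr.txt (only slots that have one)."""
    slots = Path(data_dir) / "slots"
    found = {}
    if not slots.is_dir():
        return found
    for slot in slots.iterdir():
        err = slot / "stderr.txt"
        # Slot directories are named 0, 1, 2, ...; skip anything else.
        if slot.is_dir() and slot.name.isdigit() and err.is_file():
            found[int(slot.name)] = err.read_text(errors="replace")
    return dict(sorted(found.items()))
```

Usage: `for slot, text in slot_stderr("/var/lib/boinc-client").items(): print(f"--- slot {slot} ---\n{text}")`.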

Thanks again for your patience!
boysanic
Project administrator
Project developer

Joined: 15 Jun 20
Posts: 32
Credit: 101,415,555
RAC: 110,632
Message 958 - Posted: 13 Jan 2025, 21:23:33 UTC

Hi again,

1.04 has been released with a potential multi-GPU fix!

Please let us know if you continue to have this problem (where all tasks run on a single GPU when multiple are available).
Sabroe_SMC

Joined: 3 May 21
Posts: 7
Credit: 98,730,706
RAC: 2,001,540
Message 962 - Posted: 14 Jan 2025, 13:59:32 UTC - in response to Message 958.  
Last modified: 14 Jan 2025, 14:04:44 UTC

10827366 5088882 13 Jan 2025, 23:00:52 UTC 14 Jan 2025, 13:34:45 UTC Completed, waiting for validation 6,866.32 6,866.32 pending 1.20 Find seeds with zero villages within a radius v1.04 (cuda)
x86_64-pc-linux-gnu
10826618 5088635 13 Jan 2025, 21:41:33 UTC 14 Jan 2025, 11:40:20 UTC Completed and validated 6,821.69 6,821.69 2,500.00 1.20 Find seeds with zero villages within a radius v1.04 (cuda)
x86_64-pc-linux-gnu

The run time of the WUs has increased from about 4800 sec to more than 6800 sec.
I don't like that...
These are from my RTX 2080 Ti graphics card.

10301538 5089110 13 Jan 2025, 20:22:15 UTC 13 Jan 2025, 23:29:03 UTC Completed and validated 4,857.88 4,858.48 2,500.00 1.20 Find seeds with zero villages within a radius v1.02 (cuda)
x86_64-pc-linux-gnu
10306032 5091357 13 Jan 2025, 19:02:58 UTC 13 Jan 2025, 22:08:07 UTC Completed and validated 4,848.94 4,849.30 2,500.00 1.20 Find seeds with zero villages within a radius v1.02 (cuda)
x86_64-pc-linux-gnu
10303172 5089927 13 Jan 2025, 17:43:37 UTC 13 Jan 2025, 20:47:14 UTC Completed and validated 4,847.93 4,848.74 2,500.00 1.20 Find seeds with zero villages within a radius v1.02 (cuda)
x86_64-pc-linux-gnu

The same graphics card one day before. What are you doing there?????
It only happens under Linux.
boysanic
Project administrator
Project developer

Joined: 15 Jun 20
Posts: 32
Credit: 101,415,555
RAC: 110,632
Message 965 - Posted: 14 Jan 2025, 18:48:30 UTC - in response to Message 962.  

Hello,

We've pushed another update (1.05) that may address the speed change.

Please let me know if this helps.

Thanks!
zombie67 [MM]

Joined: 24 Jun 20
Posts: 26
Credit: 652,109,541
RAC: 7,054,492
Message 966 - Posted: 15 Jan 2025, 0:08:57 UTC

FWIW, the tasks sped up for me with 1.04. With 1.05, times are now back to the slower run times of 1.00-1.03 with
Sabroe_SMC

Joined: 3 May 21
Posts: 7
Credit: 98,730,706
RAC: 2,001,540
Message 967 - Posted: 15 Jan 2025, 1:33:29 UTC - in response to Message 965.  

No, times have gotten a little better but not much. You should look here:

10828885 5089390 14 Jan 2025, 21:25:47 UTC 15 Jan 2025, 1:22:07 UTC Completed, validation inconclusive 6,023.90 6,023.90 pending 1.20 Find seeds with zero villages within a radius v1.05 (cuda)
x86_64-pc-linux-gnu
10828965 5090361 14 Jan 2025, 19:28:11 UTC 14 Jan 2025, 23:41:45 UTC Completed, validation inconclusive 6,069.65 6,069.65 pending 1.20 Find seeds with zero villages within a radius v1.05 (cuda)
x86_64-pc-linux-gnu
10828187 5090353 14 Jan 2025, 17:36:46 UTC 14 Jan 2025, 22:00:39 UTC Completed and validated 6,816.11 6,817.87 2,500.00 1.20 Find seeds with zero villages within a radius v1.04 (cuda)
x86_64-pc-linux-gnu
10828702 5088638 14 Jan 2025, 15:46:11 UTC 14 Jan 2025, 20:06:58 UTC Completed and validated 6,833.38 6,834.77 2,500.00 1.20 Find seeds with zero villages within a radius v1.04 (cuda)
x86_64-pc-linux-gnu
10307964 5092323 14 Jan 2025, 13:17:37 UTC 14 Jan 2025, 19:13:09 UTC Completed, waiting for validation 6,825.91 6,828.04 pending 1.20 Find seeds with zero villages within a radius v1.04 (cuda)
x86_64-pc-linux-gnu
10826777 5088587 14 Jan 2025, 0:41:19 UTC 14 Jan 2025, 16:19:16 UTC Completed and validated 6,825.03 6,825.03 2,500.00 1.20 Find seeds with zero villages within a radius v1.04 (cuda)
x86_64-pc-linux-gnu
10827366 5088882 13 Jan 2025, 23:00:52 UTC 14 Jan 2025, 13:34:45 UTC Completed, waiting for validation 6,866.32 6,866.32 pending 1.20 Find seeds with zero villages within a radius v1.04 (cuda)
x86_64-pc-linux-gnu
10826618 5088635 13 Jan 2025, 21:41:33 UTC 14 Jan 2025, 11:40:20 UTC Completed and validated 6,821.69 6,821.69 2,500.00 1.20 Find seeds with zero villages within a radius v1.04 (cuda)
x86_64-pc-linux-gnu
10301538 5089110 13 Jan 2025, 20:22:15 UTC 13 Jan 2025, 23:29:03 UTC Completed and validated 4,857.88 4,858.48 2,500.00 1.20 Find seeds with zero villages within a radius v1.02 (cuda)
x86_64-pc-linux-gnu
10306032 5091357 13 Jan 2025, 19:02:58 UTC 13 Jan 2025, 22:08:07 UTC Completed and validated 4,848.94 4,849.30 2,500.00 1.20 Find seeds with zero villages within a radius v1.02 (cuda)
x86_64-pc-linux-gnu
10303172 5089927 13 Jan 2025, 17:43:37 UTC 13 Jan 2025, 20:47:14 UTC Completed and validated 4,847.93 4,848.74 2,500.00 1.20 Find seeds with zero villages within a radius v1.02 (cuda)
x86_64-pc-linux-gnu
10304044 5090363 13 Jan 2025, 16:44:27 UTC 13 Jan 2025, 19:26:25 UTC Completed and validated 4,853.22 4,853.22 2,500.00 1.20 Find seeds with zero villages within a radius v1.02 (cuda)
x86_64-pc-linux-gnu
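To put numbers on the slowdown, this Python snippet averages the run times (the first of the two time columns, in seconds) from the task list above, grouped by app version, and compares each version with v1.02. The values are copied from the list; nothing else is assumed.

```python
from statistics import mean

# Run times (s) from the task list in message 967 (RTX 2080 Ti, Linux).
RUN_TIMES = {
    "1.02": [4857.88, 4848.94, 4847.93, 4853.22],
    "1.04": [6816.11, 6833.38, 6825.91, 6825.03, 6866.32, 6821.69],
    "1.05": [6023.90, 6069.65],
}


def slowdown_vs(baseline, times=RUN_TIMES):
    """Mean run time of each version divided by the baseline version's mean."""
    base = mean(times[baseline])
    return {ver: mean(ts) / base for ver, ts in times.items()}


for ver, ratio in slowdown_vs("1.02").items():
    print(f"v{ver}: mean {mean(RUN_TIMES[ver]):.0f} s, {100 * (ratio - 1):+.0f}% vs v1.02")
# v1.02: mean 4852 s, +0% vs v1.02
# v1.04: mean 6831 s, +41% vs v1.02
# v1.05: mean 6047 s, +25% vs v1.02
```

So on this card v1.04 took roughly 41% longer per task than v1.02, and v1.05 recovered part of that (about 25% longer), which matches the report that times got "a little better".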
Henk Haneveld

Joined: 24 Jun 20
Posts: 6
Credit: 1,715,763
RAC: 0
Message 968 - Posted: 15 Jan 2025, 8:20:37 UTC - in response to Message 965.  

Before you start working on speed optimization, may I suggest that you first fix the errors in the app?

I get a Computation error with message:
GPUassert: the launch timed out and was terminated (code 702) main.cu 4862

This is on Windows 10, BOINC 8.02, with this GPU:
NVIDIA GeForce GTX 750 Ti (driver version 566.36, CUDA version 12.7, compute capability 5.0, 2048MB, 2048MB available, 1388 GFLOPS peak)
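The `GPUassert` lines quoted in this thread carry a CUDA runtime error code plus a source location in main.cu. This Python sketch pulls those fields out of a stderr.txt and names the codes seen here: 702 is `cudaErrorLaunchTimeout` (the kernel ran too long while a timeout was enforced, typically by the display watchdog), 999 is `cudaErrorUnknown`, and 700 is `cudaErrorIllegalAddress`, the "illegal memory access" reported earlier on multi-GPU hosts. The regular expression is written against the message format quoted in this thread and is an assumption about the app's exact output.

```python
import re

# A few values of the CUDA runtime's cudaError_t enum.
CUDA_ERRORS = {
    700: "cudaErrorIllegalAddress",  # "an illegal memory access was encountered"
    702: "cudaErrorLaunchTimeout",   # "the launch timed out and was terminated"
    999: "cudaErrorUnknown",         # "unknown error"
}

# Matches lines like: GPUassert: <message> (code 702) main.cu 4862
GPUASSERT = re.compile(
    r"GPUassert: (?P<msg>.+?) \(code (?P<code>\d+)\) (?P<file>\S+) (?P<line>\d+)"
)


def parse_gpuassert(text):
    """Return (code, enum name, source file, line) for each GPUassert line in a log."""
    results = []
    for m in GPUASSERT.finditer(text):
        code = int(m["code"])
        results.append((code, CUDA_ERRORS.get(code, "unrecognized"), m["file"], int(m["line"])))
    return results
```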

Thanks
boysanic
Project administrator
Project developer

Joined: 15 Jun 20
Posts: 32
Credit: 101,415,555
RAC: 110,632
Message 969 - Posted: 15 Jan 2025, 9:00:38 UTC

Hi everyone,

I'm going to push version 1.06, which reverts a change we made to the maxrregcount argument passed to NVIDIA's compiler.
It sped up certain cards (Titan X Maxwell, RTX 3090, V100, GTX 1060 tested), but the RTX 4090, RTX 2080 Ti, and seemingly the GTX 750 Ti have been negatively impacted.
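One common reason a register cap helps some GPUs and hurts others: `maxrregcount` limits registers per thread, and fewer registers let more warps be resident on each SM (higher occupancy), but values set too low force spills to slower local memory. How much occupancy there is to gain depends on each architecture's thread limit. This simplified Python model estimates register-limited resident threads per SM; it assumes 65,536 registers per SM and warp-level allocation in units of 256 (true for compute capability 5.0 through 9.0) and ignores shared memory and block-size limits. It is an illustration, not a diagnosis of this app.

```python
import math

REGS_PER_SM = 65536   # 32-bit registers per SM (compute capability 5.0 through 9.0)
ALLOC_UNIT = 256      # registers are allocated per warp in chunks of this size
WARP = 32


def max_resident_threads(regs_per_thread, max_threads_per_sm):
    """Register-limited resident threads per SM (ignores shared memory and block limits)."""
    regs_per_warp = math.ceil(regs_per_thread * WARP / ALLOC_UNIT) * ALLOC_UNIT
    warps = REGS_PER_SM // regs_per_warp
    return min(warps * WARP, max_threads_per_sm)


# Max resident threads per SM: 2048 on Maxwell, Pascal and Volta; 1024 on Turing (RTX 2080 Ti).
for regs in (32, 64, 128):
    print(regs, max_resident_threads(regs, 2048), max_resident_threads(regs, 1024))
# 32 2048 1024
# 64 1024 1024
# 128 512 512
```

In this model a Turing SM already reaches its 1,024-thread ceiling at 64 registers per thread, so capping registers lower cannot raise its occupancy further, while any resulting spills still cost time.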

The error on the GTX 750 Ti is just a timeout: there is a limit to how long a single kernel wave can run. Since we changed the maximum register count, it's possible this slowed the weaker GTX 750 Ti down enough to exceed that limit.
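On Windows, a GPU that also drives a display is subject to the TDR watchdog, whose default timeout (TdrDelay) is 2 seconds; a kernel that runs longer is killed, which surfaces as error 702. A common way to stay under such a limit is to size each launch from the card's measured throughput. This Python sketch does that arithmetic; the throughput input, the 25% safety margin, and the function names are hypothetical, and it is not how the project actually sizes its work.

```python
def seeds_per_launch(seeds_per_second, budget_s=2.0, safety=0.25, minimum=1):
    """Largest batch expected to finish within a fraction of a watchdog budget.

    seeds_per_second: measured kernel throughput on this GPU (hypothetical unit).
    budget_s: watchdog limit in seconds (2 s matches the Windows TDR default).
    safety: fraction of the budget one launch may use.
    """
    if seeds_per_second <= 0:
        raise ValueError("throughput must be positive")
    return max(minimum, int(seeds_per_second * budget_s * safety))


def launches_needed(total_seeds, batch):
    """Number of kernel launches needed to cover total_seeds in batches of `batch`."""
    return -(-total_seeds // batch)  # ceiling division
```

A slower card gets a proportionally smaller batch and more launches, so each individual launch stays well inside the limit even though the total work is unchanged.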

Hopefully this addresses both issues.

Thanks!
Henk Haneveld

Joined: 24 Jun 20
Posts: 6
Credit: 1,715,763
RAC: 0
Message 970 - Posted: 15 Jan 2025, 9:33:48 UTC - in response to Message 969.  
Last modified: 15 Jan 2025, 9:50:52 UTC

Thanks for the attempt to fix the problem on the GTX 750 Ti, but it still fails, now with a different error:

GPUassert: unknown error (code 999) main.cu 4870

Also, my screen went completely black for a few seconds. It is starting to look like the app is too demanding for my GPU.
boysanic
Project administrator
Project developer

Joined: 15 Jun 20
Posts: 32
Credit: 101,415,555
RAC: 110,632
Message 986 - Posted: 17 Jan 2025, 0:48:55 UTC

Hi everyone,

I'm currently rolling out a new version of the loneseed application (1.07) with the following changes/improvements:

1. Smaller batch sizes when we detect a GPU with the Maxwell architecture
(Resolves the roughly 500 instances where older 900- and 700-series GPUs cannot complete the work because the kernel runs too long and is killed by the timeout)
2. More robust error handling for CUDA functions before work begins
(Should help us diagnose the remaining segmentation faults some users are getting)
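Item 1 keys the batch size on the GPU's architecture, which can be read from the compute capability the CUDA runtime reports (the `major`/`minor` fields of `cudaDeviceProp`); Maxwell parts report 5.x, for example the GTX 750 Ti above reports 5.0. A minimal Python sketch of that selection follows; the batch sizes and names are hypothetical placeholders, not the project's values.

```python
# Architecture family by compute-capability major version.
ARCH_BY_MAJOR = {3: "Kepler", 5: "Maxwell", 6: "Pascal", 7: "Volta/Turing", 8: "Ampere/Ada", 9: "Hopper"}

DEFAULT_BATCH = 1 << 24  # hypothetical batch for newer GPUs
MAXWELL_BATCH = 1 << 20  # hypothetical smaller batch to keep each kernel short


def arch_name(cc):
    """Architecture family for a (major, minor) compute capability."""
    return ARCH_BY_MAJOR.get(cc[0], "unknown")


def batch_size_for(cc):
    """Use the smaller batch on Maxwell (5.x) and anything older."""
    return MAXWELL_BATCH if cc[0] <= 5 else DEFAULT_BATCH
```

A fixed per-architecture size is simple, but a slow card within the family can still exceed the watchdog limit; sizing from measured throughput is the more adaptive alternative.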

Please let us know if you encounter any issues using this new version.


Thank you!
Henk Haneveld

Joined: 24 Jun 20
Posts: 6
Credit: 1,715,763
RAC: 0
Message 988 - Posted: 17 Jan 2025, 8:31:54 UTC - in response to Message 986.  

No, it is still not working on the GTX 750 Ti; the error is:
GPUassert: the launch timed out and was terminated (code 702) main.cu 4885

But the problem with a black screen is gone.