calculation error

fzs600
Joined: 25 Jun 20
Posts: 9
Credit: 104,176,691
RAC: 1,623,405
Message 915 - Posted: 12 Jan 2025, 6:10:29 UTC

On Linux (Ubuntu), all WUs end in a calculation error:
<core_client_version>7.18.1</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)</message>
<stderr_txt>
No checkpoint to load
SIGSEGV: segmentation violation
Stack trace (6 frames):
../../projects/minecraftathome.com_minecrafthome/loneliest-cuda_1.00.bin(+0x2ffdf)[0x6072344fefdf]
/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x738ed7242520]
../../projects/minecraftathome.com_minecrafthome/loneliest-cuda_1.00.bin(+0x1f09b)[0x6072344ee09b]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x738ed7229d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x738ed7229e40]
../../projects/minecraftathome.com_minecrafthome/loneliest-cuda_1.00.bin(+0x1f6ee)[0x6072344ee6ee]

Exiting...

</stderr_txt>
]]>
ID: 915
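For anyone reading the log above: the three numbers in `process exited with code 193 (0xc1, -63)` are one value shown three ways, namely decimal, hexadecimal, and reinterpreted as a signed 8-bit integer. A minimal Python sketch of that arithmetic follows; `describe_exit_code` is a name made up for this illustration and is not part of BOINC.

```python
def describe_exit_code(code: int) -> str:
    """Format an exit status like the client log does: decimal, hex, signed byte."""
    low_byte = code & 0xFF  # only the low 8 bits are shown
    # Reinterpret the byte as a signed 8-bit value (two's complement).
    signed = low_byte - 256 if low_byte >= 128 else low_byte
    return f"{code} (0x{low_byte:x}, {signed})"


print(describe_exit_code(193))  # 193 (0xc1, -63)
```

The number by itself only says the process died abnormally; the useful detail is the `SIGSEGV` line and stack trace in the stderr section.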
boysanic
Project administrator
Project developer
Joined: 15 Jun 20
Posts: 32
Credit: 101,415,555
RAC: 110,632
Message 916 - Posted: 12 Jan 2025, 6:28:19 UTC - in response to Message 915.  

I'm looking into this still.

It seems to be happening with many users on Linux. My Debian 12 system seems fine though.

What CUDA version are you running? nvidia-smi should tell you.
ID: 916
Tamagoch
Joined: 12 Sep 24
Posts: 2
Credit: 6,412,500
RAC: 1,530
Message 917 - Posted: 12 Jan 2025, 7:04:37 UTC

I'm seeing a 100% error rate on an old GTX 950 with 2GB of VRAM. My other, newer 6GB cards seem to be running fine (nothing has finished yet). Is there perhaps a minimum memory size requirement?
ID: 917
fzs600
Joined: 25 Jun 20
Posts: 9
Credit: 104,176,691
RAC: 1,623,405
Message 918 - Posted: 12 Jan 2025, 7:16:26 UTC - in response to Message 916.  

NVIDIA-SMI 550.120 Driver Version: 550.120 CUDA Version: 12.4
ID: 918
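If you want to pull those two numbers out of the nvidia-smi banner in a script, something like the Python sketch below should work. It only assumes the banner looks like the line posted above; `parse_smi_header` is a hypothetical helper name. Note that the "CUDA Version" nvidia-smi prints is the highest CUDA version the installed driver supports, not the version of any toolkit installed on the machine.

```python
import re

# Matches the "Driver Version: X  CUDA Version: Y" part of the nvidia-smi banner.
HEADER = re.compile(
    r"Driver Version:\s*(?P<driver>[\d.]+)\s+CUDA Version:\s*(?P<cuda>[\d.]+)"
)


def parse_smi_header(text: str):
    """Return (driver_version, cuda_version) as strings, or None if not found."""
    match = HEADER.search(text)
    if match is None:
        return None
    return match.group("driver"), match.group("cuda")


banner = "NVIDIA-SMI 550.120 Driver Version: 550.120 CUDA Version: 12.4"
print(parse_smi_header(banner))  # ('550.120', '12.4')
```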
boysanic
Project administrator
Project developer
Joined: 15 Jun 20
Posts: 32
Credit: 101,415,555
RAC: 110,632
Message 919 - Posted: 12 Jan 2025, 7:37:17 UTC - in response to Message 917.  

2GB should be enough, I think. VRAM utilization on my Titan X shows around 1.2GB for this program.

But if other things are using memory as well, that could be problematic.

I've definitely seen cards with 6+GB error out but it's unclear if it's the same error as you're experiencing.
ID: 919
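To rule memory in or out on a given host, one option is to compare free VRAM against the roughly 1.2GB figure mentioned above. The Python sketch below is only an illustration: the 1200 MiB threshold is an approximation of that figure, the helper names are hypothetical, and the sample readings are made up. The `nvidia-smi` query flags used in `query_gpus` are the standard CSV query options.

```python
import subprocess

REQUIRED_MIB = 1200  # rough stand-in for the ~1.2GB usage reported above (assumption)


def parse_memory_csv(csv_text: str):
    """Parse 'used, total' lines (MiB, no header, no units) into (used, total) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows


def has_headroom(used_mib: int, total_mib: int, required_mib: int = REQUIRED_MIB) -> bool:
    """True if the free memory on the card is at least required_mib."""
    return total_mib - used_mib >= required_mib


def query_gpus():
    """Ask nvidia-smi for per-GPU memory use; needs an NVIDIA driver installed."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_memory_csv(result.stdout)


# Made-up readings: a 2GB card with 900 MiB in use, and a mostly idle 6GB card.
sample = "900, 2048\n300, 6144\n"
for used, total in parse_memory_csv(sample):
    print(total, has_headroom(used, total))
# 2048 False
# 6144 True
```

On a real host, replace the sample loop with `query_gpus()`. A card that fails this check is merely a suspect; it does not prove memory is the cause of the errors.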
klepel
Joined: 11 Mar 21
Posts: 2
Credit: 145,179,901
RAC: 83,964
Message 920 - Posted: 12 Jan 2025, 9:54:06 UTC

All workunits fail on my 3 GPUs. Linux: GTX 1650, GTX 1660 Ti; Windows: GTX 970.
ID: 920
Tamagoch
Joined: 12 Sep 24
Posts: 2
Credit: 6,412,500
RAC: 1,530
Message 921 - Posted: 12 Jan 2025, 10:19:08 UTC
Last modified: 12 Jan 2025, 10:20:02 UTC

Okay, I'm also getting 100% errors with a Quadro P3200 (6GB).
The GTX 1660 and the RTX cards are doing fine, but I'm still waiting for validation to be sure.

I will test some older cards later to see whether these errors have anything in common.

P.S. I run Windows only for GPU computing.
ID: 921
Drago75
Joined: 13 Oct 20
Posts: 6
Credit: 28,421,591
RAC: 299,632
Message 922 - Posted: 12 Jan 2025, 11:41:11 UTC - in response to Message 921.  

All three of my Linux hosts produce errors right at the start. I have two RTX 3060 Ti cards and one RTX 3070 Ti, running Ubuntu 20.04.6 LTS and Mint with driver version 535. Only my Windows host with an RTX 4080 returns valid results. I think there may be a glitch in the Linux version of the app.
ID: 922
Drago75
Joined: 13 Oct 20
Posts: 6
Credit: 28,421,591
RAC: 299,632
Message 923 - Posted: 12 Jan 2025, 12:00:05 UTC - in response to Message 922.  

I reverted to driver version 470. No help...
ID: 923
mmonnin
Joined: 25 Jun 20
Posts: 4
Credit: 157,859,345
RAC: 1,397,818
Message 925 - Posted: 12 Jan 2025, 15:58:52 UTC

Same segfault on Linux as already shown.

On Windows 10 I get access violations:
-1073741819 (0xC0000005) STATUS_ACCESS_VIOLATION
https://minecraftathome.com/minecrafthome/result.php?resultid=10297424
ID: 925
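The two numbers on the Windows line above are the same 32-bit value: `-1073741819` is the signed reading and `0xC0000005` the unsigned one, which is the form `STATUS_ACCESS_VIOLATION` is normally looked up under. A short Python sketch of the conversion (`ntstatus_hex` is a hypothetical helper name):

```python
def ntstatus_hex(signed_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as unsigned and format it as hex."""
    return f"0x{signed_code & 0xFFFFFFFF:08X}"


print(ntstatus_hex(-1073741819))  # 0xC0000005
```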
mikey
Joined: 28 Jun 20
Posts: 15
Credit: 121,890,580
RAC: 1,859,847
Message 930 - Posted: 12 Jan 2025, 18:04:12 UTC - in response to Message 925.  
Last modified: 12 Jan 2025, 18:17:43 UTC

ME TOO:

<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)</message>
<stderr_txt>
No checkpoint to load
SIGSEGV: segmentation violation
Stack trace (5 frames):
../../projects/minecraftathome.com_minecrafthome/loneliest-cuda_1.00.bin(+0x2ffdf)[0x5580f7119fdf]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f8e4239b420]
../../projects/minecraftathome.com_minecrafthome/loneliest-cuda_1.00.bin(+0x1f09b)[0x5580f710909b]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f8e42064083]
../../projects/minecraftathome.com_minecrafthome/loneliest-cuda_1.00.bin(+0x1f6ee)[0x5580f71096ee]

Exiting...

</stderr_txt>
]]>

I am running CUDA 12.2 on that Linux PC.

At least one of my Windows PCs has CUDA version 12.6.
ID: 930
boysanic
Project administrator
Project developer
Joined: 15 Jun 20
Posts: 32
Credit: 101,415,555
RAC: 110,632
Message 933 - Posted: 12 Jan 2025, 19:33:12 UTC

Hi folks,

We're cooking up a new build of this right now.

It should be out today and will hopefully fix both the Windows and Linux issues.
It should also give us additional debug information from the CUDA driver about the errors it's hitting.

Thanks for your patience!
ID: 933
Skillz
Joined: 23 May 21
Posts: 3
Credit: 151,926,759
RAC: 3,751,949
Message 937 - Posted: 13 Jan 2025, 3:18:33 UTC

I'm getting this error on Linux:

<core_client_version>7.20.5</core_client_version>
<![CDATA[
<message>
process exited with code 222 (0xde, -34)</message>
<stderr_txt>
stndalone gpuindex 0 
No checkpoint to load
GPUassert: the provided PTX was compiled with an unsupported toolchain. (code 222) main.cu 4847

</stderr_txt>
]]>


3070 Ti GPU on Linux.
ID: 937
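The `GPUassert` line above carries four pieces of information: the CUDA error string, the numeric error code, and the source file and line that caught it. As far as I can tell from the CUDA runtime headers, code 222 is `cudaErrorUnsupportedPtxVersion`, which usually means the PTX was generated by a compiler newer than the installed driver supports. Below is a small Python sketch for pulling those fields out of a stderr dump; the pattern is inferred from this single log line and `parse_gpuassert` is a hypothetical name, so treat it as an assumption about the format.

```python
import re

# Pattern inferred from the one GPUassert line in this thread (assumption).
GPUASSERT = re.compile(
    r"GPUassert:\s*(?P<message>.+?)\s*\(code (?P<code>\d+)\)\s+"
    r"(?P<file>\S+)\s+(?P<line>\d+)"
)


def parse_gpuassert(text: str):
    """Return the message, code, file and line of the first GPUassert, or None."""
    match = GPUASSERT.search(text)
    if match is None:
        return None
    return {
        "message": match.group("message"),
        "code": int(match.group("code")),
        "file": match.group("file"),
        "line": int(match.group("line")),
    }


log = ("GPUassert: the provided PTX was compiled with an unsupported toolchain. "
       "(code 222) main.cu 4847")
print(parse_gpuassert(log)["code"])  # 222
```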
Keith Myers
Joined: 8 Mar 21
Posts: 59
Credit: 313,205,473
RAC: 3,968,971
Message 939 - Posted: 13 Jan 2025, 4:44:22 UTC

It appears that only hosts with 560 or 565 drivers have been successful.
ID: 939
EDU Enthusiasts of Digital Uni...
Joined: 9 Sep 24
Posts: 2
Credit: 31,122,500
RAC: 74,036
Message 942 - Posted: 13 Jan 2025, 4:54:29 UTC

Just a note: the latest production branch is 550.142.
https://www.nvidia.com/en-us/drivers/unix/

This kills my ability to compute until support for 550 is added.
ID: 942
boysanic
Project administrator
Project developer
Joined: 15 Jun 20
Posts: 32
Credit: 101,415,555
RAC: 110,632
Message 945 - Posted: 13 Jan 2025, 5:07:25 UTC

Hey everyone,

I appreciate your patience as we work through this.

I wasn't aware that 560+ wasn't "production ready". We had targeted CUDA 12.6 as that was the latest available from NVIDIA (12.7 is available but in a "beta" state, so we did not use that).

For now, I will re-target our compilation and push out a new version targeting CUDA 12.2, as that is supported by driver 535 or above and many folks posting today seem to use that version.

Stay tuned, I'll have it pushed out soon!
ID: 945
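A rough way to see why retargeting helps, assuming the failures come from the driver having to JIT-compile PTX (which is what the "unsupported toolchain" message earlier in the thread suggests): a driver can only compile PTX produced by a toolkit no newer than the CUDA version the driver itself reports. The Python sketch below encodes that rule. The driver table holds only the pairs mentioned in this thread (550 reports 12.4, 535 supports 12.2); the 560 entry is inferred from the reports of success and is an assumption, and `can_jit_ptx` is a hypothetical name.

```python
# Driver branch -> highest CUDA version it reports, as (major, minor).
# 535 and 550 come from posts in this thread; 560 is an inferred assumption.
DRIVER_CUDA = {535: (12, 2), 550: (12, 4), 560: (12, 6)}


def can_jit_ptx(driver_branch: int, toolkit: tuple):
    """True/False if the driver can compile PTX from this toolkit; None if unknown."""
    driver_cuda = DRIVER_CUDA.get(driver_branch)
    if driver_cuda is None:
        return None  # branch not in the table, so no claim either way
    return driver_cuda >= toolkit  # tuples compare major first, then minor


for branch in (535, 550, 560):
    print(branch, can_jit_ptx(branch, (12, 6)), can_jit_ptx(branch, (12, 2)))
# 535 False True
# 550 False True
# 560 True True
```

Under this model a build targeting 12.6 only runs on the 560 branch, while a build targeting 12.2 runs on all three, which matches what people have reported so far.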
Keith Myers
Joined: 8 Mar 21
Posts: 59
Credit: 313,205,473
RAC: 3,968,971
Message 947 - Posted: 13 Jan 2025, 5:11:59 UTC - in response to Message 945.  

Thank you. Much more reasonable.
ID: 947
Landjunge
Joined: 26 Jun 20
Posts: 7
Credit: 291,433,198
RAC: 7,430,158
Message 949 - Posted: 13 Jan 2025, 5:43:28 UTC - in response to Message 945.  

Thanks!
ID: 949
Skillz
Joined: 23 May 21
Posts: 3
Credit: 151,926,759
RAC: 3,751,949
Message 951 - Posted: 13 Jan 2025, 9:39:22 UTC

Tasks are completing now. I was really dreading updating to 560+ on my GPUs. So glad I do not have to.
ID: 951
Ian&Steve C.
Joined: 13 Jan 25
Posts: 2
Credit: 7,500
RAC: 84
Message 953 - Posted: 13 Jan 2025, 14:30:28 UTC - in response to Message 945.  

CUDA 12.6 and R560 are production ready, but something odd seems to be going on with CUDA 12.6 and the forward compatibility it's supposed to have. Even the 12.6.3 documentation states that the minimum required driver is 525.60.13 for all CUDA 12.x builds, but I've seen more than one instance where 12.6 isn't adhering to this minor version forward compatibility the way 12.1-12.5 do.

Is it possible that your 12.6 build used some feature that was new in 12.6, thus breaking forward compatibility?

Is your application open source?
ID: 953