AMD RX 5700 XT will not validate
Author | Message |
---|---|
Joined: 24 Jun 20 Posts: 85 Credit: 207,156 RAC: 0 |
None of the tasks run on my AMD RX 5700 XT have validated so far, not even one paired against an AMD RX 580. Since I am so far the only RX 5700 XT around, it's difficult to put a finger on the cause. As I said in the News thread, everyone else's tasks appear to use CPU plus GPU, whereas mine run only on the GPU and produce no output about seeds, or anything else, in stderr.txt. I am running BOINC 7.16.7 on Windows 10 x64 with Radeon drivers 20.5.1. I was on 20.3.1 before, with the same result. |
Joined: 15 Jun 20 Posts: 74 Credit: 19,537,761 RAC: 0 |
Which AMD drivers are you on? The trouble with this problem is that it is extremely hard to reproduce reliably. I have seen other RX 5700 XT machines produce tasks with normal output (as coded into the binary) and also tasks with empty output, just like yours. One such machine had the same kaktwoos-cl version and the same task length for both the good and the bad outputs. AMD has officially confirmed that there was a printf bug in their OpenCL drivers that may have been patched in 20.5.1. We print to stderr, and that same code works flawlessly on some AMD GPUs and on all Nvidia machines. https://stackoverflow.com/questions/62545440/rootcausing-or-working-around-possible-amd-compiler-bug https://community.amd.com/thread/244452 In the meantime, I had someone whose RX 5700 XT machine behaved unreliably (successes followed by empty outputs within a period of a few hours) run a modified binary, and it has since reported 8 tasks in a row that can be validated. That compares with runs of two invalid or two validated tasks in a row before. I have not (myself) seen any RX 580s or RX Vegas fail this way with 1.12 yet, but I will keep going over the logs. |
Joined: 24 Jun 20 Posts: 85 Credit: 207,156 RAC: 0 |
Hy wrote: Which AMD drivers are you on? Jord wrote: Radeon drivers 20.5.1 -> was 20.3.1 but with those the same thing. |
Joined: 24 Jun 20 Posts: 85 Credit: 207,156 RAC: 0 |
Apropos, something I touched on in the News thread that has probably been snowed under: the kaktwoos application runs at Low priority in Windows Task Manager, even inside the wrapper, while the wrapper itself runs at Normal priority. Since we run on the GPU, you should try setting the priority to below_normal (case 2, Low priority at the Wrapper App). Commands: https://boinc.berkeley.edu/trac/wiki/WrapperApp (I also asked why the OpenCL app runs in a wrapper at all. Is it written in such an obscure language that it can't run natively?) |
Joined: 14 Jun 20 Posts: 78 Credit: 1,321,619 RAC: 0 |
Apropos, something I touched on in the News thread but is probably snowed under: We used the wrapper initially because none of our developers had BOINC experience, so it flattened the learning curve and made it easier to get started. Now, of course, we realise the wrapper is more hassle than it's worth for these apps and seriously limits us. So a coming change will likely remove it altogether in favour of implementing the BOINC API in the app. |
Joined: 15 Jun 20 Posts: 74 Credit: 19,537,761 RAC: 0 |
Some really good news: I managed to do a bunch of fixes to our kaktwoos-cl code to make it C++ compliant, and finally compiled it with the boinc_api headers included and the calls required to interface with BOINC. I tested some of its functionality, and it appears to behave like a BOINC application while still working in standalone mode. I have already added all the code required to interface with the checkpointing system, and the progress indicator in BOINC will be updated using the old internal percentage of seeds worked through. All program output now goes to the right place without question, so that may finally fix the unusual AMD empty-output bugs. I have also tested suspend / resume, and it appears to work as intended: the checkpoint auto-save plus the manual call from BOINC restores the proper progress % (calculated as the number of seeds processed so far divided by the number of seeds in the range) and resumes the GPU properly according to the logs. More testing will need to be done, of course, and we will need to plan the migration away from the wrapper to this new kaktwoos-cl-boinc. Multi-GPU support is roughly 95% finished as well. I've replaced all the old calls with calls to BOINC's OpenCL device ID function instead. I just need to verify it on a system where I know manual device selection was working. |
Joined: 24 Jun 20 Posts: 85 Credit: 207,156 RAC: 0 |
Yay, finally! I have credit: the new OpenCL 2.0 AMD application ran a task, and it validated correctly against one done on an Nvidia GTX 960. CPU time is still just tens of seconds, unlike the Nvidia app, where CPU time is almost as long as the GPU run time. Now I wonder whether the Nvidia app even runs on the GPU rather than on the CPU. But I can't check that, as I don't have an Nvidia GPU. ;-) https://minecraftathome.com/minecrafthome/workunit.php?wuid=1262529 output: <core_client_version>7.16.7</core_client_version> <![CDATA[ <stderr_txt> Received work unit: 265366080896324 Data: n1: 119, n2: 615, n3: 103, di: 2, ch: 12 boinc gpu 0 gpuindex: 0 No checkpoint to load Speed: 41.08m/s Done Processed 100000000000 seeds in 2434.086000 seconds Found seeds: 01:31:45 (4404): called boinc_finish(0) </stderr_txt> (And checkpointing works great!) |
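
For anyone curious how the progress and checkpoint bookkeeping discussed in this thread might fit together, here is a minimal sketch. It is not the actual kaktwoos-cl-boinc code: the helper names (`fraction_done`, `seeds_per_second`, `save_checkpoint`, `load_checkpoint`) and the one-number checkpoint file format are assumptions made for illustration. In a real BOINC build the fraction would presumably be handed to `boinc_fraction_done()`, and the save would be guarded by `boinc_time_to_checkpoint()` and followed by `boinc_checkpoint_completed()`. Those calls are left out so the sketch compiles without the BOINC headers.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical helper: fraction of the seed range processed so far, in [0, 1].
// This is seeds processed divided by seeds in range, not the other way round.
double fraction_done(std::uint64_t seeds_processed, std::uint64_t seeds_in_range) {
    assert(seeds_in_range > 0 && seeds_processed <= seeds_in_range);
    return static_cast<double>(seeds_processed) / static_cast<double>(seeds_in_range);
}

// Hypothetical helper: throughput in seeds per second.
double seeds_per_second(std::uint64_t seeds_processed, double elapsed_seconds) {
    assert(elapsed_seconds > 0.0);
    return static_cast<double>(seeds_processed) / elapsed_seconds;
}

// Assumed checkpoint format: a single decimal number, the count of seeds
// already processed. A production app would typically write to a temporary
// file and swap it in, so an interrupted write cannot corrupt the checkpoint.
bool save_checkpoint(const std::string& path, std::uint64_t seeds_processed) {
    std::ofstream out(path, std::ios::trunc);
    if (!out) return false;
    out << seeds_processed << '\n';
    out.flush();
    return static_cast<bool>(out);
}

// Returns false and leaves seeds_processed untouched when the file is missing
// or unreadable, which corresponds to the "No checkpoint to load" case.
bool load_checkpoint(const std::string& path, std::uint64_t& seeds_processed) {
    std::ifstream in(path);
    std::uint64_t value = 0;
    if (!(in >> value)) return false;
    seeds_processed = value;
    return true;
}
```

As a sanity check on the stderr output above, `seeds_per_second(100000000000ULL, 2434.086)` comes out at a little over 41.08 million seeds per second, which agrees with the reported "Speed: 41.08m/s".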