Message boards :
Number crunching :
Are the size estimates accurate?
Joined: 13 Jul 20 | Posts: 3 | Credit: 128,216 | RAC: 0
Doing the math of size estimate / time to complete, I come back with a result of roughly 18 TFLOP/s or 18 TIOP/s per GPU. 1080 Tis don't run that fast. Is the size estimate accurate? And is the project doing floating-point or integer math?
Joined: 24 Jun 20 | Posts: 85 | Credit: 207,156 | RAC: 0
How do you do the math for the size estimate or time? You can see the FLOPs estimate of a task by checking the properties of a running task in BOINC Manager's Advanced view, Tasks tab. The latest I've seen was <rsc_fpops_est>30000000000000000.000000</rsc_fpops_est>. And you can see how BOINC calculates the run time for tasks by enabling rr_simulation in the Event Log options... menu.
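If you'd rather not click through the Manager, the same estimates also appear in the client's state file. A minimal sketch, assuming a `client_state.xml` in the current directory (adjust the path to your actual BOINC data directory):

```python
# Illustrative sketch: pull <rsc_fpops_est> values out of a BOINC
# client_state.xml. The default path below is an assumption -- point it
# at wherever your BOINC data directory actually lives.
import xml.etree.ElementTree as ET

def fpops_estimates(client_state_path="client_state.xml"):
    """Return {workunit_name: rsc_fpops_est} for all workunits in the file."""
    root = ET.parse(client_state_path).getroot()
    estimates = {}
    for wu in root.iter("workunit"):
        name = wu.findtext("name")
        est = wu.findtext("rsc_fpops_est")
        if name and est:
            estimates[name] = float(est)
    return estimates
```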
Joined: 13 Jul 20 | Posts: 3 | Credit: 128,216 | RAC: 0
I don't think you understood my question. The size of the computation is being reported as 3x10^16 operations. The run time for one of the work units I completed last night was 1703.89 seconds. 3x10^16 / 1703.89 = 17,606.77 G-op/s, i.e. ~17.6 T-op/s. My graphics cards are physically incapable of running that fast (2 IPC * 1.911 GHz max boost * 3584 CUDA cores = 13,698.048 G-op/s, ~13.7 T-op/s theoretical maximum), so I am wondering how accurate the estimate of the size of the work units is.
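The comparison above can be written out explicitly, with units attached. This just redoes the arithmetic from the post (the estimate and run time come from the thread; the 1080 Ti peak is ops-per-clock times boost clock times CUDA cores):

```python
# Implied throughput: server's operation estimate / wall-clock run time.
# Theoretical peak: 2 ops/clock * 1.911 GHz boost * 3584 CUDA cores (1080 Ti).
fpops_est = 3e16        # <rsc_fpops_est> reported for the task
run_time = 1703.89      # seconds for one completed work unit

implied = fpops_est / run_time    # operations per second
peak = 2 * 1.911e9 * 3584         # theoretical maximum, operations per second

print(f"implied: {implied / 1e12:.2f} T-op/s")  # ~17.61 T-op/s
print(f"peak:    {peak / 1e12:.2f} T-op/s")     # ~13.70 T-op/s
print(f"ratio:   {implied / peak:.2f}x")        # ~1.29x -- above theoretical peak
```

The implied rate coming out above the card's theoretical peak is exactly why the estimate looks suspicious.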
Joined: 12 Jul 20 | Posts: 2 | Credit: 167,441 | RAC: 0
Just so you know, 1.911 GHz is 1'911'000'000 Hz, so the actual result is 2 * 1'911'000'000 * 3'584 = 13'698'048'000'000, which is ~13.7 T-op/s. But when taking into account things like warps, it can quite easily reach an effective 18 TFLOPS or TIOPS, with the biggest bottleneck being how fast you can feed the operations and data through VRAM.
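For the record, multiplying that product out exactly:

```python
# Worked check of the 1080 Ti peak-throughput product:
# 2 ops/clock * 1.911 GHz boost * 3584 CUDA cores.
peak_ops = 2 * 1_911_000_000 * 3_584
print(f"{peak_ops:,} ops/s")  # 13,698,048,000,000 -> ~13.7 T-op/s
```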
Joined: 15 Jun 20 | Posts: 74 | Credit: 19,537,761 | RAC: 0
Absolutely, we're VRAM-latency and core-clock bound. Bandwidth doesn't matter as much, from what I saw very early on (I was surprised to see the same speed going from 480 GB/s down to 78 GB/s on HBM2). GDDR6 GPUs with worse compute than the previous generation (AMD RX 5700 vs. my RX Vega 56 OC) perform the same or 25-40% faster, despite similar gaming performance and, in some situations, worse compute performance than the older cards. So we have to deal with NVIDIA Turing (GDDR6), Pascal (GDDR5/X), and even Maxwell (GDDR5), plus AMD RDNA Navi (GDDR6), GCN Vega (HBM2), and GCN Polaris (GDDR5), all affecting our time/TFLOPS-per-task estimates. There's also some in-task variance for some sets of seeds/inputs due to cactus placement optimization.
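The compute-bound vs. memory-bound distinction being discussed here can be made concrete with a roofline-style break-even calculation. The numbers below are rough 1080 Ti spec-sheet values chosen for illustration, not measurements from this project:

```python
# Roofline-style break-even: a kernel is memory-bandwidth-bound when its
# arithmetic intensity (ops per byte moved) falls below peak_ops / bandwidth.
# Illustrative spec-sheet values for a 1080 Ti, not project measurements.
peak_ops = 13.7e12    # theoretical peak, ops/s
bandwidth = 484e9     # GDDR5X bandwidth, bytes/s

break_even = peak_ops / bandwidth  # ops per byte needed to be compute-bound
print(f"{break_even:.1f} ops/byte")  # ~28.3 -- below this, memory is the limit
```

Note this only captures bandwidth limits; latency-bound workloads (scattered reads, poor locality) can run well below both roofs.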
Joined: 12 Jul 20 | Posts: 2 | Credit: 167,441 | RAC: 0
When I said "putting into VRAM" I meant locality, not bandwidth.
Joined: 13 Jul 20 | Posts: 3 | Credit: 128,216 | RAC: 0
If you want to pretend I'm an idiot and explain computer architecture and third-grade mathematics to me, the least you could do is get your arithmetic right. I'm done here.