Imagine square matrices a, b, and result, each of size 1000 by 1000. It follows that n, m, and p below will each be 1000.
int n = a.length;    // rows of a
int m = b[0].length; // columns of b
int p = a[0].length; // columns of a (and rows of b)
Imagine further that you are aiming to create roughly 10 times as many tasks as there are available processors.
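// Only the i loop is forked; the j and k loops run sequentially inside each i iteration.
join_void_fork_loop(0, n, (i) -> {
    for (int j = 0; j < m; j++) {
        for (int k = 0; k < p; k++) {
            result[i][j] += a[i][k] * b[k][j];
        }
    }
});

// The i and j loops are forked; the k loop runs sequentially for each (i, j) cell.
join_void_fork_loop(0, n, (i) -> {
    join_void_fork_loop(0, m, (j) -> {
        for (int k = 0; k < p; k++) {
            result[i][j] += a[i][k] * b[k][j];
        }
    });
});

// The i, j, and k loops are all forked; every k iteration adds into the same result[i][j].
join_void_fork_loop(0, n, (i) -> {
    join_void_fork_loop(0, m, (j) -> {
        join_void_fork_loop(0, p, (k) -> {
            result[i][j] += a[i][k] * b[k][j];
        });
    });
});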
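The loops above depend on join_void_fork_loop, which is not defined here. As a rough sketch only, the first two loops can be made runnable with a hypothetical stand-in that forks one task per index and returns once all of them have finished. The stand-in, the class name ForkLoopSketch, the method names multiplyParallelRows and multiplyParallelCells, and the use of double matrices are all assumptions for illustration, not the actual library implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveAction;
import java.util.function.IntConsumer;

public class ForkLoopSketch {
    // ASSUMPTION: hypothetical stand-in for join_void_fork_loop. It creates one
    // task per index in [min, maxExclusive) and returns after all tasks finish.
    public static void join_void_fork_loop(int min, int maxExclusive, IntConsumer body) {
        List<RecursiveAction> tasks = new ArrayList<>();
        for (int index = min; index < maxExclusive; index++) {
            final int i = index;
            tasks.add(new RecursiveAction() {
                @Override
                protected void compute() {
                    body.accept(i);
                }
            });
        }
        // Forks the tasks and waits for every one of them to complete.
        ForkJoinTask.invokeAll(tasks);
    }

    // Only the i loop is forked: each task fills one row of result.
    public static double[][] multiplyParallelRows(double[][] a, double[][] b) {
        int n = a.length;
        int m = b[0].length;
        int p = a[0].length;
        double[][] result = new double[n][m];
        join_void_fork_loop(0, n, (i) -> {
            for (int j = 0; j < m; j++) {
                for (int k = 0; k < p; k++) {
                    result[i][j] += a[i][k] * b[k][j];
                }
            }
        });
        return result;
    }

    // The i and j loops are forked: each inner task fills one cell of result.
    public static double[][] multiplyParallelCells(double[][] a, double[][] b) {
        int n = a.length;
        int m = b[0].length;
        int p = a[0].length;
        double[][] result = new double[n][m];
        join_void_fork_loop(0, n, (i) -> {
            join_void_fork_loop(0, m, (j) -> {
                for (int k = 0; k < p; k++) {
                    result[i][j] += a[i][k] * b[k][j];
                }
            });
        });
        return result;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        // Both print [[19.0, 22.0], [43.0, 50.0]]
        System.out.println(Arrays.deepToString(multiplyParallelRows(a, b)));
        System.out.println(Arrays.deepToString(multiplyParallelCells(a, b)));
    }
}
```

In both methods each cell of result is written by exactly one task, so no two tasks update the same element. The third loop is left out of the sketch on purpose: under this stand-in, its k tasks would all update the same result[i][j] concurrently.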
For the code below:
join_void_fork_loop(0, n, (i) -> { doSomething1d(i); });
For the code below:
join_void_fork_loop(0, n, (i) -> {
    join_void_fork_loop(0, m, (j) -> {
        doSomething2d(i, j);
    });
});
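To weigh the one-level and two-level loops above against a target of roughly 10 tasks per processor, it can help to tally how many tasks each shape creates. The sketch below assumes that every iteration of join_void_fork_loop becomes exactly one task, which the text here does not state. The class and method names are hypothetical.

```java
public class TaskCountSketch {
    // ASSUMPTION: one task per iteration of join_void_fork_loop.

    // One-level loop over i: n tasks.
    public static long tasksForOneLevel(long n) {
        return n;
    }

    // Two-level loop over i and j: n outer tasks, each forking m inner tasks.
    public static long tasksForTwoLevels(long n, long m) {
        return n + n * m;
    }

    // Target from the text: roughly 10 times the number of processors.
    public static long targetTasks(int processors) {
        return 10L * processors;
    }

    public static void main(String[] args) {
        int processors = Runtime.getRuntime().availableProcessors();
        System.out.println("target:    " + targetTasks(processors));
        System.out.println("one level: " + tasksForOneLevel(1000));
        System.out.println("two level: " + tasksForTwoLevels(1000, 1000));
    }
}
```

Under that assumption, with n = m = 1000 the one-level loop creates 1,000 tasks and the two-level loop creates 1,001,000, while the target on, say, 8 processors would be 80.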