In our experiments with Spark on a 64-core machine with 512 GB of RAM, Spark stops scaling beyond about 8 cores (roughly a 6x speedup). Our hypothesis is that the central garbage collector becomes a choke point that prevents further parallelism. This is unavoidable unless you give each thread large chunks of memory and use tricks such as thread-local memory management to avoid a central bottleneck.
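
To make the "large chunks of memory per thread" idea concrete, here is a minimal Java sketch, not Spark code and not what we ran in the experiments. The `Arena` class and `sumOfSquares` method are hypothetical names invented for illustration. Each task makes one large allocation up front and then hands out space from it with a bump pointer, so the hot loop creates no short-lived objects for the collector to track. Note that the slab still lives on the JVM heap, so this only lowers allocation pressure; it does not take the collector out of the picture.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadLocalArenaSketch {

    /** Hypothetical per-task arena: one big array allocated once, carved up by a bump pointer. */
    static final class Arena {
        private final long[] slab;
        private int top = 0;

        Arena(int capacity) {
            slab = new long[capacity];
        }

        /** Reserves n slots and returns the offset of the first one. */
        int alloc(int n) {
            if (top + n > slab.length) {
                throw new IllegalStateException("arena exhausted");
            }
            int start = top;
            top += n;
            return start;
        }

        long[] slab() {
            return slab;
        }
    }

    /** Sums v*v for v in [0, threads * itemsPerThread), one contiguous range per task. */
    public static long sumOfSquares(int threads, int itemsPerThread) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int base = t * itemsPerThread;
                parts.add(pool.submit(() -> {
                    // The only allocation in this task: one large chunk owned by this task alone.
                    Arena arena = new Arena(itemsPerThread);
                    int off = arena.alloc(itemsPerThread);
                    long[] buf = arena.slab();

                    // Hot loop works on primitives inside the slab; no per-item objects.
                    for (int i = 0; i < itemsPerThread; i++) {
                        long v = base + i;
                        buf[off + i] = v * v;
                    }
                    long sum = 0;
                    for (int i = 0; i < itemsPerThread; i++) {
                        sum += buf[off + i];
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> f : parts) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumOfSquares(4, 1000));
    }
}
```

The design choice is that threads never share an allocator in the hot path: the only cross-thread coordination is collecting the per-task results at the end. Whether this removes the scaling ceiling we observed is untested here; it only illustrates the kind of trick meant above.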