Do JVMs create significant overhead in distributed/parallel processing?

If a distributed computing framework spins up nodes to run Java/Scala operations, it has to include a JVM in every container. For example, every Map and Reduce task spawns its own JVM.

How does the efficiency of this instantiation compare to spinning up containers for languages like Python?
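To make the comparison concrete, here is a minimal sketch of how the process-level part of that cost could be measured: it times how long a bare runtime takes to start and exit. It assumes `java` is on the PATH (it is skipped otherwise), the helper name `median_startup_seconds` is made up for illustration, and it only covers runtime startup, not container image pulls or framework scheduling.

```python
import shutil
import subprocess
import sys
import time


def median_startup_seconds(cmd, runs=5):
    """Return the median wall-clock time to launch `cmd` and wait for it to exit."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard output so terminal I/O does not skew the timing.
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=True,
        )
        samples.append(time.perf_counter() - start)
    samples.sort()
    return samples[len(samples) // 2]


if __name__ == "__main__":
    # Both commands start the runtime, do essentially nothing, and exit.
    candidates = {
        "python": [sys.executable, "-c", "pass"],
        "java": ["java", "-version"],  # assumption: a JDK/JRE is installed
    }
    for name, cmd in candidates.items():
        if shutil.which(cmd[0]) is None:
            print(f"{name}: not installed, skipped")
            continue
        print(f"{name}: {median_startup_seconds(cmd) * 1000:.1f} ms")
```

The numbers depend heavily on the machine, the JVM build, and disk caching, so this is only a starting point. It also says nothing about JIT warm-up once real work begins.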

I've heard that, much like Alpine Linux is only a few MB, there are stripped-down JVMs, but there must still be a cost. Yet Scala is a first-class citizen in Spark, and MapReduce (MR) is written in Java.


By: StackOverFlow - 6 days ago
