Common Spark exception: java.util.concurrent.TimeoutException: Futures timed out


The following error occurred while running a Spark on YARN job:

Caused by: java.util.concurrent.TimeoutException: Futures timed out after 60s

See this Stack Overflow question: https://stackoverflow.com/questions/41123846/why-does-join-fail-with-java-util-concurrent-timeoutexception-futures-timed-ou

This happens because Spark tries to do a Broadcast Hash Join and one of the DataFrames is very large, so sending it consumes much time.

You can:

  1. Set a higher spark.sql.broadcastTimeout to increase the timeout – spark.conf.set("spark.sql.broadcastTimeout", newValueForExample36000)
  2. persist() both DataFrames, then Spark will use a Shuffle Join

So you can do either of the following (both options are sketched below):

  1. Increase the value of spark.sql.broadcastTimeout;
  2. persist() both DataFrames;
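A minimal Scala sketch of both options, assuming an existing SparkSession named `spark`; the app name, input paths, and join key are hypothetical placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("broadcast-timeout-demo") // hypothetical app name
  .getOrCreate()

// Option 1: raise the broadcast timeout (value is in seconds).
spark.conf.set("spark.sql.broadcastTimeout", 36000L)

// Option 2: per the Stack Overflow answer, persisting both sides of
// the join leads Spark to plan a shuffle join instead of broadcasting.
val left  = spark.read.parquet("/data/left").persist()  // hypothetical path
val right = spark.read.parquet("/data/right").persist() // hypothetical path
val joined = left.join(right, Seq("id"))                // hypothetical key
joined.count() // forces execution
```

The timeout can also be supplied at submit time with --conf spark.sql.broadcastTimeout=36000 instead of being set in code.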

In addition to increasing spark.sql.broadcastTimeout or calling persist() on both DataFrames, you may also try the following (see the sketch after this list):

  1. Disable broadcast joins by setting spark.sql.autoBroadcastJoinThreshold to -1;
  2. Increase driver memory by setting spark.driver.memory to a higher value.
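A short sketch of these two settings, under the same assumptions as above. Note that spark.driver.memory only takes effect if set before the driver JVM starts, so it belongs in spark-submit (or spark-defaults.conf) rather than in runtime code; the 8g value is only an illustration:

```scala
import org.apache.spark.sql.SparkSession

// Driver memory must be supplied at launch time, e.g.:
//   spark-submit --driver-memory 8g \
//     --conf spark.sql.autoBroadcastJoinThreshold=-1 \
//     your-app.jar
val spark = SparkSession.builder()
  .appName("no-broadcast-demo") // hypothetical app name
  .config("spark.sql.autoBroadcastJoinThreshold", "-1") // never auto-broadcast
  .getOrCreate()

// The threshold can also be changed per session at runtime:
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1L)
```

With the threshold at -1, Spark will not choose a broadcast hash join on its own, so the query falls back to a shuffle-based join regardless of table size estimates.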


Copyright notice: This is an original article by CSDN blogger 「我在北國不背鍋」, licensed under the CC 4.0 BY-SA agreement. Please include a link to the original source and this notice when reposting.

Original link: https://blog.csdn.net/weixin_44455388/article/details/101286428