How do I tune the number of Spark executors, cores per executor, and executor memory?

Nice programing

nicepro 2020. 11. 29. 12:18

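Where do you start when tuning the parameters mentioned above? Do you start with executor memory and derive the number of executors, or start with cores and derive the number of executors? I followed the link. I got the high-level idea, but I am still not sure how to start and arrive at a final conclusion.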

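The following answer covers the three main aspects mentioned in the title: number of executors, executor memory, and number of cores. There may be other parameters, such as driver memory, that this answer does not cover but that I would like to add in the near future.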

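Case 1 hardware: 6 nodes, each with 16 cores and 64 GB RAM

Each executor is a JVM instance, so a single node can host multiple executors.

The OS and Hadoop daemons need 1 core and 1 GB of RAM up front, so 15 cores and 63 GB RAM are available on each node.

Start with how to choose the number of cores: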

Number of cores = number of concurrent tasks an executor can run

So we might think that more concurrent tasks per executor will give better performance. But the commonly cited rule of thumb is that
an executor running more than 5 concurrent tasks performs poorly. So stick to 5.

This number comes from what a single executor can handle, not from how many cores the machine has. So the number 5 stays the same
even if the node has twice as many (32) cores.

Number of executors:

Next, with 5 cores per executor and 15 available cores per node, we get 15/5 =
3 executors per node.

So with 6 nodes and 3 executors per node, we get 18 executors. One of those 18 slots (a Java process) is needed for the ApplicationMaster (AM) in YARN, which leaves 17 executors.

This 17 is the number we pass to Spark with --num-executors when running spark-submit.
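
The executor count above is plain integer arithmetic. A minimal Python sketch, assuming one core per node is reserved for the OS and Hadoop daemons and one executor slot goes to the YARN ApplicationMaster; the function and parameter names are illustrative and not part of Spark:

```python
def total_executors(nodes, cores_per_node, cores_per_executor=5,
                    reserved_cores=1, am_slots=1):
    """Return (executors_per_node, executors_to_request) for a static setup."""
    usable_cores = cores_per_node - reserved_cores      # 16 - 1 = 15
    per_node = usable_cores // cores_per_executor       # 15 // 5 = 3
    # One executor-sized slot is left for the YARN ApplicationMaster.
    return per_node, nodes * per_node - am_slots


per_node, requested = total_executors(nodes=6, cores_per_node=16)
print(per_node, requested)  # 3 17
```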

Memory for each executor:

From the step above, we have 3 executors per node, and the available RAM is 63 GB.

So the memory for each executor is 63/3 = 21 GB.

However, a small memory overhead must also be counted when determining the full memory request to YARN for each executor.
The formula for that overhead is max(384 MB, .07 * spark.executor.memory). (Recent Spark releases use a default factor of .10 rather than .07; the walkthrough here keeps .07.)

Calculating that overhead: .07 * 21 (here 21 is 63/3 as calculated above)
                            = 1.47

Since 1.47 GB > 384 MB, the overhead is 1.47 GB.
Subtract that from the 21 GB above: 21 - 1.47 = 19.53, rounded down to 19 GB

So executor memory: 19 GB
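
The memory step can be sketched in the same way. This assumes the .07 factor and 384 MB floor quoted above and rounds the result down to whole gigabytes; the helper name is hypothetical:

```python
import math


def executor_memory_gb(usable_ram_gb, executors_per_node,
                       overhead_factor=0.07, min_overhead_gb=0.384):
    """Heap size to pass as --executor-memory, in whole GB."""
    share = usable_ram_gb / executors_per_node                 # 63 / 3 = 21
    overhead = max(min_overhead_gb, overhead_factor * share)   # max(0.384, 1.47)
    return math.floor(share - overhead)                        # floor(19.53) = 19


print(executor_memory_gb(63, 3))  # 19
```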

Final numbers: 17 executors, 5 cores per executor, 19 GB executor memory

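Case 2 hardware: the same 6 nodes, each with 32 cores and 64 GB RAM

5 stays the same for good concurrency.

Number of executors per node = 32/5 ~ 6

So total executors = 6 * 6 nodes = 36. The final number is 36 - 1 for the AM = 35.

Executor memory: with 6 executors per node, 63/6 ~ 10. The overhead is .07 * 10 = 700 MB. Rounding the overhead up to 1 GB gives 10 - 1 = 9 GB.

Final numbers: 35 executors, 5 cores per executor, 9 GB executor memory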
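
The final numbers from either case map directly onto spark-submit options. A small sketch that only builds the argument list and does not launch anything; --num-executors, --executor-cores, and --executor-memory are the standard spark-submit flags, while the helper name and the app.py path are placeholders:

```python
def submit_args(num_executors, executor_cores, executor_memory_gb,
                app="app.py"):
    """Build a spark-submit argument list for a static allocation on YARN."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--num-executors", str(num_executors),
        "--executor-cores", str(executor_cores),
        "--executor-memory", f"{executor_memory_gb}G",
        app,  # placeholder application file
    ]


# Case 2: 35 executors, 5 cores each, 9 GB heap each.
print(" ".join(submit_args(35, 5, 9)))
```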

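Case 3

The scenarios above start by taking the number of cores as fixed and then move on to the number of executors and the memory.

Now, for the first case, suppose we decide that we do not need 19 GB and that 10 GB is enough. Then the numbers are as follows:

Cores: 5. Number of executors per node = 3

At this stage, the first calculation would lead to 21 GB and then 19 GB. But since we decided 10 GB is fine (assuming a small overhead), we cannot simply switch the number of executors per node to 6 (as in 63/10). With 6 executors per node and 5 cores each, we would need 30 cores per node when only 16 are there. So we also need to change the number of cores per executor.

Calculating again:

The magic number 5 becomes 3 (any number less than or equal to 5). With 3 cores per executor and 15 available cores, we get 5 executors per node. So (5 * 6 - 1) = 29 executors.

Memory is then 63/5 ~ 12. The overhead is 12 * .07 = .84 GB, rounded up to 1 GB, so executor memory is 12 - 1 = 11 GB.

Final numbers: 29 executors, 3 cores per executor, 11 GB executor memory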
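
The trade-off in Case 3 is easier to see when the whole range of core counts is laid out. The sketch below lists what each choice of cores per executor yields on the Case 1 hardware, under the same assumptions as the worked cases: 1 core and 1 GB reserved per node, one executor slot for the AM, a .07 overhead factor with a 384 MB floor, and results rounded down. The function name is illustrative only:

```python
import math


def plan(nodes, cores_per_node, ram_per_node_gb, cores_per_executor,
         overhead_factor=0.07, min_overhead_gb=0.384):
    """Return (total_executors, executor_memory_gb) for one core setting."""
    per_node = (cores_per_node - 1) // cores_per_executor    # 1 core reserved
    share = (ram_per_node_gb - 1) / per_node                  # 1 GB reserved
    overhead = max(min_overhead_gb, overhead_factor * share)
    return nodes * per_node - 1, math.floor(share - overhead)


for cores in range(5, 0, -1):
    executors, memory_gb = plan(6, 16, 64, cores)
    print(f"cores={cores} executors={executors} memory={memory_gb}G")
# cores=5 gives (17, 19) and cores=3 gives (29, 11), matching Cases 1 and 3.
```

Fewer cores per executor means more, smaller executors; whether that helps depends on how much memory each task actually needs.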

Dynamic Allocation:

Note: If dynamic allocation is enabled, the number of executors has no upper bound by default (spark.dynamicAllocation.maxExecutors defaults to infinity). This means a Spark application can eat up all of the cluster's resources if it needs them. So on a cluster where other applications are running and also need cores for their tasks, make sure you set limits at the cluster level. That is, you can allocate a specific number of cores in YARN based on user access: for example, create a spark_user and give that user minimum and maximum cores. These limits govern sharing between Spark and the other applications that run on YARN.

spark.dynamicAllocation.enabled - When this is set to true, we need not specify the number of executors. The reason is below:

The static number we give at spark-submit applies for the entire job duration. With dynamic allocation, however, there are different stages:

What to start with:

The initial number of executors (spark.dynamicAllocation.initialExecutors) to start with.

How many:

Then, based on load (pending tasks), how many to request. This eventually approaches the number we would otherwise give statically at spark-submit. Once the initial number of executors is set, the count moves between the minimum (spark.dynamicAllocation.minExecutors) and the maximum (spark.dynamicAllocation.maxExecutors).

When to ask or give:

When do we request new executors (spark.dynamicAllocation.schedulerBacklogTimeout) - When tasks have been pending for this long, Spark requests executors. The number of executors requested in each round increases exponentially from the previous round. For instance, an application adds 1 executor in the first round, and then 2, 4, 8, and so on in subsequent rounds. At some point, the maximum above comes into play.

When do we give away an executor (spark.dynamicAllocation.executorIdleTimeout) - When an executor has been idle for this long, it is released.
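
The exponential ramp-up can be illustrated with a toy simulation. This is not Spark's scheduler code; it only mimics the doubling described above, starting from a hypothetical initial count and capping at a hypothetical maxExecutors value:

```python
def ramp_up(initial, max_executors, rounds):
    """Executor totals after each request round while a backlog persists."""
    totals, current, to_add = [], initial, 1
    for _ in range(rounds):
        current = min(max_executors, current + to_add)  # cap at the maximum
        totals.append(current)
        to_add *= 2                                     # 1, 2, 4, 8, ...
    return totals


print(ramp_up(initial=2, max_executors=17, rounds=5))  # [3, 5, 9, 17, 17]
```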

Please correct me if I missed anything. The above is my understanding based on the blog I shared in the question and some online resources. Thank you.

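Also, depending on your use case, an important config parameter is:

spark.memory.fraction (the fraction of (heap space - 300 MB) used for execution and storage); see http://spark.apache.org/docs/latest/configuration.html#memory-management.

If you don't use cache/persist, you can set it to 0.1 so that most of the memory is left for your program. Keep in mind that this fraction also covers execution memory (shuffles, joins, sorts, aggregations), so a very low value can cause more spilling to disk.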

If you use cache/persist, you can check the memory taken by:

// Sum of (max memory - remaining memory) available for caching over all block managers, in GB
sc.getExecutorMemoryStatus.map(a => (a._2._1 - a._2._2)/(1024.0*1024*1024)).sum
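
To see what changing spark.memory.fraction does, the definition quoted above can be applied to a concrete heap. A sketch assuming a 19 GB executor heap and the 300 MB reservation from the definition, with 0.6 as the comparison value (as far as I know the default in recent Spark releases; treat it as an assumption). The function name is made up for illustration:

```python
def unified_memory_mb(heap_mb, memory_fraction, reserved_mb=300):
    """Memory shared by execution and storage: (heap - 300 MB) * fraction."""
    return (heap_mb - reserved_mb) * memory_fraction


heap_mb = 19 * 1024
for fraction in (0.6, 0.1):
    unified = unified_memory_mb(heap_mb, fraction)
    print(f"fraction={fraction}: {unified:.0f} MB for execution and storage, "
          f"{heap_mb - 300 - unified:.0f} MB left for user objects")
```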

Do you read data from HDFS or from HTTP?

Again, tuning depends on your use case.

Reference URL: https://stackoverflow.com/questions/37871194/how-to-tune-spark-executor-number-cores-and-executor-memory
