Describe the bug
Hey, I experienced a significant performance drop in connected components (cc) computation after updating from 0.9.4 to the latest version. The first iteration of cc is a little slower, while for consecutive iterations the time can increase from several minutes to around half an hour. In the Spark UI, the CPU usage of each executor is almost 0 while it is very high on the driver. For one stage, all the executors finish their tasks in seconds while the total time can be half an hour. This might be caused by the algorithm updates or by the switch from writing to Parquet to using checkpoints.
To Reproduce
Steps to reproduce the behavior:
- ...
- ...
- ...
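A minimal timing sketch that could serve as a starting point for a reproduction. This is not the actual workload from the report: the ring graph, the vertex count, and the checkpoint path are all placeholder assumptions, and `ring_edges`, `timed`, and `run_cc` are hypothetical helper names. It only assumes the standard `GraphFrame(vertices, edges)` constructor and `connectedComponents()` with default arguments.

```python
import time


def ring_edges(n):
    """Edges of a single n-vertex ring (placeholder data, not the real graph)."""
    return [(i, (i + 1) % n) for i in range(n)]


def timed(fn):
    """Run fn() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def run_cc(spark, n=1000, checkpoint_dir="/tmp/cc-checkpoints"):
    """Time one connectedComponents() run; needs pyspark and graphframes installed."""
    # Imported lazily so the helpers above can be used without Spark.
    from graphframes import GraphFrame

    # connectedComponents() requires a checkpoint directory; this path is an assumption.
    spark.sparkContext.setCheckpointDir(checkpoint_dir)
    vertices = spark.createDataFrame([(i,) for i in range(n)], ["id"])
    edges = spark.createDataFrame(ring_edges(n), ["src", "dst"])
    graph = GraphFrame(vertices, edges)
    # count() forces evaluation so the timing covers the whole job.
    return timed(lambda: graph.connectedComponents().count())
```

Calling `run_cc(spark)` on 0.9.4 and on the latest version with the same Spark session settings should give two elapsed times to compare; a synthetic ring may well not trigger the slowdown, so the real graph would need to be substituted.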
Expected behavior
System [please complete the following information]:
- OS: e.g. [Ubuntu 18.04]
- Python Version (if applied): [e.g. Python 3.8]
- Spark / PySpark version: Spark 3.5.4
- GraphFrames version: [e.g. graphframes-0.9.0]
Component
Additional context
Are you planning on creating a PR?