Is your feature request related to a problem? Please describe.
At the moment in Pregel, when we construct triplets, we always take all columns of the source and destination vertices. This creates a huge dataset in memory, especially for algorithms that have a large state (cycle detection, future random walks, etc.).
In practice, we do not always need the full state, only part of it. For example, in the Rocha-Thatte algorithm, it is enough to have only the source vertex's sequences on each triplet.
Describe the solution you would like
I would like to have an API like:
requiredSrcColumns(col: Column, cols: Column*)
requiredDstColumns(col: Column, cols: Column*)
and, at the step of generating triplets, to select only the required columns instead of the whole Pregel state of both the src and dst vertices.
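To make the intent concrete, here is a minimal sketch of the pruning idea in plain Scala, without Spark. It is not the GraphFrames implementation: `Row`, `Edge`, `Triplet`, `project`, and `triplets` are hypothetical names used only for illustration, vertex rows are modeled as simple maps, and the example assumes the `id` column always has to be kept so that triplets can still be joined to edges.

```scala
object TripletPruningSketch {
  // Hypothetical stand-in for a vertex row: column name -> value.
  type Row = Map[String, Any]

  final case class Edge(src: Long, dst: Long)
  final case class Triplet(src: Row, dst: Row, edge: Edge)

  // Keep only the requested columns; "id" is always retained (assumption).
  def project(row: Row, required: Set[String]): Row =
    row.filter { case (name, _) => name == "id" || required.contains(name) }

  // Build triplets carrying only the required src/dst columns
  // instead of the whole vertex state.
  def triplets(
      vertices: Seq[Row],
      edges: Seq[Edge],
      requiredSrc: Set[String],
      requiredDst: Set[String]): Seq[Triplet] = {
    val byId: Map[Long, Row] =
      vertices.map(v => v("id").asInstanceOf[Long] -> v).toMap
    for {
      e <- edges
      s <- byId.get(e.src).toSeq // skip edges whose src vertex is missing
      d <- byId.get(e.dst).toSeq // skip edges whose dst vertex is missing
    } yield Triplet(project(s, requiredSrc), project(d, requiredDst), e)
  }
}
```

With `requiredSrc = Set("sequences")` and an empty `requiredDst`, each triplet would carry only `id` and `sequences` on the source side and only `id` on the destination side, which matches the Rocha-Thatte case described above.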
Bonus: update existing Pregel-based algorithms to explicitly provide only the required columns (based on the context of sendToSrc and sendToDst).
Bonus 2: provide PySpark Classic / Connect APIs.
Component
Additional context
Are you planning on creating a PR?