Is your feature request related to a problem? Please describe.
At the moment, all edge attributes are packed into a struct and persisted (!):
    val edges = graph.edges
      .select(col(SRC).alias("edge_src"), col(DST).alias("edge_dst"), struct(col("*")).as(EDGE))
      .repartition(col("edge_src"))
      .persist(intermediateStorageLevel)
While the built-in algorithms work around this by explicitly selecting only the required columns, it would be nice to add an API that lets end users specify the required columns.
Describe the solution you would like
requiredEdgeColumns: if specified, only the listed edge columns are selected. If it is empty, struct(col("*")).as(EDGE) should be removed from edges entirely. The current behavior is closer to a bug or unspecified behavior: a struct containing SRC and DST is always added and persisted (!), which duplicates data that is already available as edge_src and edge_dst.
Component
Additional context
While this may look like a breaking change, to me it is more like fixing unspecified behavior for the very rare case where someone (wrongly) uses Pregel.edge(SRC) instead of Pregel.src(ID). If requiredEdgeColumns is empty, I would like to drop EDGE from edges entirely.
By default, requiredEdgeColumns should include all edge columns except SRC and DST, and if there are no additional columns, the EDGE struct should not be created.
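To make the proposed rules concrete, here is a minimal sketch of the column-selection logic in plain Scala, without Spark. The object name EdgeColumnSelection, the helper edgeStructColumns, and the use of Option for "not specified" are all hypothetical and only illustrate the idea; they are not an existing GraphFrames API. It also assumes that an explicitly provided list is used as-is.

```scala
// Hypothetical helper, not part of GraphFrames: decides which edge columns
// would be packed into the EDGE struct under the proposed rules.
object EdgeColumnSelection {
  // Column names as used by GraphFrames for edge endpoints.
  val SRC = "src"
  val DST = "dst"

  /** Returns the columns to pack into the EDGE struct,
    * or None when no struct should be created at all.
    *
    * @param allEdgeColumns      all columns of the edges DataFrame
    * @param requiredEdgeColumns None = not specified (use the default),
    *                            Some(cols) = explicit user choice (assumption: used as-is)
    */
  def edgeStructColumns(
      allEdgeColumns: Seq[String],
      requiredEdgeColumns: Option[Seq[String]] = None): Option[Seq[String]] = {
    // Default: every edge column except SRC and DST.
    val selected = requiredEdgeColumns.getOrElse(
      allEdgeColumns.filterNot(c => c == SRC || c == DST))
    // Nothing to carry: skip the EDGE struct entirely.
    if (selected.isEmpty) None else Some(selected)
  }
}
```

With this sketch, an edges DataFrame that has only src and dst yields None, so nothing beyond edge_src and edge_dst would be persisted, while an explicit Some(Seq("weight")) keeps just that one attribute.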
We could mention this in the release notes for the rare case where someone still relies on the old EDGE contents for any reason (very unlikely, in my opinion).
Are you planning on creating a PR?