feat(bigframes): Add numpy ufunc support to col expressions #16554
Code Review
This pull request introduces support for NumPy universal functions (ufuncs) in BigFrames by implementing the __array_ufunc__ method in the Expression class. It also refactors binary operation logic into a helper function _as_bf_expr and adds unit tests to verify the new functionality. Feedback was provided regarding the use of a non-standard type hint for the method parameter and an issue in the unit tests where non-standard pandas API calls were used to compute expected results.
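The dispatch mechanism described above can be sketched roughly as follows. This is a simplified illustration, not the BigFrames implementation: `Expr`, `col`, and `_as_text` are hypothetical stand-ins, and the expression is modeled as a plain string rather than a real expression tree. It only shows how NumPy hands a ufunc call over to an object that defines `__array_ufunc__`, including the reflected case where the scalar comes first.

```python
import numpy as np


class Expr:
    """Hypothetical, simplified column expression (not the BigFrames class)."""

    def __init__(self, text: str):
        self.text = text

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only plain calls such as np.add(a, b) are handled here; reductions,
        # accumulations, and keyword arguments like out= are declined.
        if method != "__call__" or kwargs:
            return NotImplemented
        args = ", ".join(_as_text(value) for value in inputs)
        return Expr(f"{ufunc.__name__}({args})")


def _as_text(value) -> str:
    # Hypothetical helper: render an operand, whether expression or scalar.
    return value.text if isinstance(value, Expr) else repr(value)


def col(name: str) -> Expr:
    # Hypothetical constructor standing in for a column reference.
    return Expr(f"col({name!r})")


sqrt_expr = np.sqrt(col("float64_col"))
radd_expr = np.add(2.4, col("float64_col"))  # scalar on the left still dispatches
print(sqrt_expr.text)
print(radd_expr.text)
```

Because NumPy consults `__array_ufunc__` on every operand, the same method covers both `np.add(expr, 2.4)` and `np.add(2.4, expr)`, which is presumably why the tests in this PR include a `radd_const` case.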
            return strings.StringMethods(self)

        def __array_ufunc__(
            self, ufunc: numpy.ufunc, method: __builtins__.str, *inputs, **kwargs
Using __builtins__.str as a type hint is non-standard and potentially fragile. It is recommended to use the built-in str type directly.
    -        self, ufunc: numpy.ufunc, method: __builtins__.str, *inputs, **kwargs
    +        self, ufunc: numpy.ufunc, method: str, *inputs, **kwargs
References
- Standard Python type hinting practices (PEP 484) recommend using built-in types like 'str' directly instead of accessing them through 'builtins'.
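One caveat worth checking before applying this suggestion: the diff context shows a method returning `strings.StringMethods(self)` immediately above, which suggests (as an assumption, since the full class is not visible here) that the class defines its own `str` accessor. In that case a bare `str` inside the class body would refer to the accessor rather than the built-in type. A minimal sketch of an alternative spelling via the standard `builtins` module, using a hypothetical `Expression` class and `describe` method:

```python
import builtins
import typing


class Expression:
    """Hypothetical stand-in for a class that exposes a `.str` accessor."""

    @property
    def str(self) -> typing.Any:
        return "string-accessor"

    # Inside this class body the bare name `str` now refers to the property
    # defined above, so the hint is spelled through the `builtins` module.
    def describe(self, method: builtins.str) -> builtins.str:
        return f"method={method}"


hints = typing.get_type_hints(Expression.describe)
print(hints)
```

Importing `builtins` explicitly is the documented way to reach built-in names, whereas `__builtins__` is a CPython implementation detail that may be either a module or a dict depending on where the code runs.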
    pd_kwargs = {
        "sqrt": np.sqrt(pd.col("float64_col")),  # type: ignore
        "add_const": np.add(pd.col("float64_col"), 2.4),  # type: ignore
        "radd_const": np.add(2.4, pd.col("float64_col")),  # type: ignore
        "add_cols": np.add(pd.col("float64_col"), pd.col("int64_col")),  # type: ignore
    }
The pd_kwargs dictionary uses pd.col, which is not a standard pandas API. To correctly verify the BigFrames implementation against pandas, the expected results should be computed using standard pandas column access on scalars_pandas_df. Additionally, standard pandas assign does not support BigFrames Expression objects. To ensure dictionary keys remain sorted without manual effort, the dictionary should be programmatically sorted.
    -    pd_kwargs = {
    -        "sqrt": np.sqrt(pd.col("float64_col")),  # type: ignore
    -        "add_const": np.add(pd.col("float64_col"), 2.4),  # type: ignore
    -        "radd_const": np.add(2.4, pd.col("float64_col")),  # type: ignore
    -        "add_cols": np.add(pd.col("float64_col"), pd.col("int64_col")),  # type: ignore
    -    }
    +    pd_kwargs = dict(sorted({
    +        "sqrt": np.sqrt(scalars_pandas_df["float64_col"]),
    +        "add_const": np.add(scalars_pandas_df["float64_col"], 2.4),
    +        "radd_const": np.add(2.4, scalars_pandas_df["float64_col"]),
    +        "add_cols": np.add(scalars_pandas_df["float64_col"], scalars_pandas_df["int64_col"]),
    +    }.items()))
References
- To ensure dictionary keys remain sorted without manual effort, programmatically sort the dictionary instead of relying on manual ordering in the code.
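For reference, the `dict(sorted(....items()))` idiom in the suggestion behaves as sketched below. The integer values are placeholders standing in for the computed columns; only the key names come from the test under review.

```python
# Placeholder values; in the test these would be the computed columns.
kwargs = {"sqrt": 1, "add_const": 2, "radd_const": 3, "add_cols": 4}

# sorted() orders the (key, value) pairs by key, and dict() preserves that
# insertion order, so iteration follows the alphabetical key order.
sorted_kwargs = dict(sorted(kwargs.items()))
print(list(sorted_kwargs))
```

Since keyword arguments keep the dictionary's order when unpacked with `**`, sorting both the expected and the actual dictionaries the same way keeps the resulting column order aligned.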
PR created by the Librarian CLI to initialize a release. Merging this PR will auto trigger a release.

Librarian Version: v0.13.0
Language Image: us-central1-docker.pkg.dev/cloud-sdk-librarian-prod/images-prod/python-librarian-generator@sha256:234b9d1f2ddb057ed7ac6a38db0bf8163d839c65c6cf88ade52530cddebce59e

<details><summary>bigframes: v2.40.0</summary>

## [v2.40.0](bigframes-v2.39.0...bigframes-v2.40.0) (2026-05-13)

### Features

* Add `bigframes.execution_history` API to track BigQuery jobs (#16588) ([fa20a74](fa20a740))

  ```python
  import bigframes.pandas as bpd

  bpd.options.compute.enable_execution_history = True
  df = bpd.read_gbq("my_table")
  # ... perform operations ...
  history = bpd.execution_history
  print(history.jobs)  # Access BigQuery job details for executed queries
  ```

* Implement `ai.similarity` and `ai.embed` for text embeddings and semantic similarity (#16771, #16759) ([d4afa2c](d4afa2c8), [fcb4579](fcb4579b))

  ```python
  import bigframes.pandas as bpd

  # Generate embeddings
  df["embeddings"] = bpd.bigquery.ai.embed(df["text_col"])
  # Compute similarity
  df["similarity"] = bpd.bigquery.ai.similarity(df["embeddings_a"], df["embeddings_b"])
  ```

* Support `hparam_range` and `hparam_candidates` parameters for hyperparameter tuning in model creation (#16640) ([ca47835](ca47835c))
* Update `ai.score`, `ai.classify` and `ai.if_` parameters to match their SQL equivalents (#16919, #16990, #16857) ([9f42fe1](9f42fe14), [e9c52b1](e9c52b12), [f3cb4ad](f3cb4ad0))
* Support unstable sorting in `sort_values` and `sort_index` (#16665) ([bbdeb70](bbdeb70f))
* Support loading Avro and ORC data formats (#16555) ([6d46cba](6d46cba3))
* Add NumPy ufunc support directly on column expressions (#16554) ([2f792ab](2f792abd))

### Bug Fixes

* Fix bugs compiling ambiguous ids and in subqueries (#16617) ([479e44d](479e44dd))
* BigFrames respects bq default region (#16933) ([ef9945a](ef9945a5))
* avoid views when querying BigLake tables from SQL cells (#16562) ([fdd3e0d](fdd3e0de))
* avoid `copy` argument warning in `to_pandas` (#16917) ([fe5245b](fe5245b8))

### Performance Improvements

* Improve write api upload throughput (#16641) ([ef856b0](ef856b04))

### Documentation

* Add docs to the to_csv methods of dataframe and series (#16570) ([a8fccef](a8fccefd))

</details>