gh-127750: Fix singledispatchmethod caching (v2) #128648
eendebakpt wants to merge 10 commits into python:main from
```python
import weakref  # see comment in singledispatch function
self._method_cache = weakref.WeakKeyDictionary()

def __set_name__(self, obj, name):
    self.attrname = name
```
Check cached_property.__set_name__; it does more than this, and some of that might be needed here as well.
Hmm. The additions there prevent something like this:
```python
from dataclasses import dataclass
from functools import singledispatchmethod


@dataclass(frozen=True)
class A:
    value: int

    @singledispatchmethod
    def dispatch(self, x):
        return id(self)

    renamed_dispatch = dispatch  # allowed? if so, how should it behave?
```
The corresponding test for cached_property is in cpython/Lib/test/test_functools.py, line 3315 at commit 34e840f.
But on current main, renaming is allowed for singledispatchmethod.
I am not sure what the desired behavior is here (or why).
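The rename question arises because class creation calls __set_name__ once for every class attribute bound to the same descriptor object, so renamed_dispatch = dispatch triggers a second call. A minimal sketch with a hypothetical RecordingDescriptor (not part of functools) that only records those calls:

```python
class RecordingDescriptor:
    """Hypothetical descriptor that records every name it is bound to."""

    def __init__(self):
        self.names = []

    def __set_name__(self, owner, name):
        # type.__new__ calls this for each class attribute referring to this
        # object, so an alias produces a second call with the new name.
        self.names.append((owner.__name__, name))

    def __get__(self, obj, objtype=None):
        return self


class A:
    dispatch = RecordingDescriptor()
    renamed_dispatch = dispatch  # same object, second name


print(A.dispatch.names)  # [('A', 'dispatch'), ('A', 'renamed_dispatch')]
```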
If this implementation is desirable, perhaps someone who knows more about this can comment later.
As far as I know, the only reason cached properties can't be renamed is that the cache is keyed by the attribute's name.
Allowing a rebind would disconnect the cached property from its cached value.
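A small illustration of that keying with the real functools.cached_property (the Circle and Renamed classes are hypothetical): the computed value lands in the instance __dict__ under the attribute name, and binding the same cached_property to a second name is rejected when the class is created.

```python
from functools import cached_property


class Circle:
    """Hypothetical example class."""

    def __init__(self, radius):
        self.radius = radius
        self.computations = 0  # counts how often the getter really runs

    @cached_property
    def area(self):
        self.computations += 1
        return 3.0 * self.radius ** 2  # toy formula; only the caching matters


c = Circle(2)
c.area
c.area
print(c.computations)   # 1: the second access is served from the cache
print(vars(c)["area"])  # 12.0: cached under the attribute name "area"

# Binding the same cached_property to a second name is refused.
try:
    class Renamed:
        @cached_property
        def a(self):
            return 1

        b = a
except (TypeError, RuntimeError) as exc:
    # Python 3.12+ raises the TypeError directly; older versions wrap it in
    # a RuntimeError raised while calling __set_name__.
    rename_error = exc
```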
Actually, I think you might want to either ignore renames or do something along these lines (ignoring error handling):
```python
if self.attrname:
    cache[name] = cache.pop(self.attrname)
self.attrname = name
```
As far as I know, each binding shares the same instance of the descriptor, so as long as the cache key is constant, it should work no matter how many times it's been renamed.
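One reading of the constant-key idea, sketched with a hypothetical RenameTolerantCache descriptor (not the PR's code): the value is cached under a fixed private key instead of the attribute name, so every name bound to the same descriptor instance shares one cache entry. The assumed trade-off is that __get__ runs on every access, because the instance entry no longer shadows the descriptor.

```python
class RenameTolerantCache:
    """Hypothetical descriptor: caches a per-instance value under a fixed key,
    so binding it under several names is harmless."""

    def __init__(self, func):
        self.func = func
        # Assumption: this private key never collides with a real attribute.
        self.key = "_cached_" + func.__name__

    def __set_name__(self, owner, name):
        # Renames are tolerated: the cache key does not depend on `name`.
        self.attrname = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        cache = obj.__dict__
        if self.key not in cache:
            cache[self.key] = self.func(obj)
        return cache[self.key]


class B:
    def __init__(self):
        self.calls = 0

    @RenameTolerantCache
    def value(self):
        self.calls += 1
        return 42

    alias = value  # second name for the same descriptor instance


b = B()
print(b.value, b.alias, b.calls)  # 42 42 1
```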
> Allowing a rebind would disconnect the cached property from its cached value.
This is kind of the same situation.
If renaming is allowed, the method would simply be cached under the last attrname. The drawback is a small risk of leaving unused cached methods behind.
I think the most straightforward option might be to copy cached_property.__set_name__. It does seem a sensible restriction. It comes at the expense of flexibility, but personally, I have never run into that TypeError.
Also, it will be easier to make changes or improvements later if the two implementations that use the same caching approach are aligned.
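For comparison, copying the check from cached_property.__set_name__ would look roughly like this. GuardedDescriptor is a hypothetical stand-in; only the __set_name__ body follows the stdlib logic, and the error message is adapted.

```python
class GuardedDescriptor:
    """Hypothetical descriptor whose __set_name__ mirrors the rename check
    in functools.cached_property."""

    def __init__(self, func):
        self.func = func
        self.attrname = None

    def __set_name__(self, owner, name):
        if self.attrname is None:
            self.attrname = name
        elif name != self.attrname:
            raise TypeError(
                f"Cannot assign the same {type(self).__name__} to two "
                f"different names ({self.attrname!r} and {name!r})."
            )

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return self.func.__get__(obj, objtype)  # ordinary bound method


class Ok:
    @GuardedDescriptor
    def dispatch(self, x):
        return x


try:
    class Aliased:
        @GuardedDescriptor
        def dispatch(self, x):
            return x

        renamed_dispatch = dispatch
except (TypeError, RuntimeError) as exc:
    # TypeError on Python 3.12+, wrapped in RuntimeError on older versions.
    alias_error = exc
```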
```python
if self._method_cache is not None:
    self._method_cache[obj] = _method
if cache is not None:
    cache[self.attrname] = _method
```
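The instance-level caching in this hunk can be sketched as follows. CachingDispatchMethod is a simplified, hypothetical stand-in, not the PR's actual diff: because it defines only __get__ (a non-data descriptor), storing the dispatching callable in obj.__dict__ under the attribute name means later lookups find the instance entry and skip __get__. The closure also captures obj, which is the loop raised in the review comment on this hunk.

```python
from functools import singledispatch


class CachingDispatchMethod:
    """Simplified, hypothetical stand-in for singledispatchmethod that caches
    the dispatching callable in the instance __dict__."""

    def __init__(self, func):
        self.dispatcher = singledispatch(func)
        self.attrname = None

    def register(self, cls, method=None):
        return self.dispatcher.register(cls, func=method)

    def __set_name__(self, owner, name):
        self.attrname = name

    def __get__(self, obj, cls=None):
        dispatch = self.dispatcher.dispatch

        def _method(*args, **kwargs):
            # Pick the implementation from the type of the first argument.
            impl = dispatch(args[0].__class__)
            return impl.__get__(obj, cls)(*args, **kwargs)

        if obj is not None:
            # Non-data descriptor: this instance entry now shadows the
            # descriptor, so later lookups of obj.<attrname> skip __get__.
            # Note the closure above also references obj (a reference cycle).
            obj.__dict__[self.attrname] = _method
        return _method


class Formatter:
    @CachingDispatchMethod
    def fmt(self, arg):
        return f"object: {arg!r}"

    @fmt.register(int)
    def _(self, arg):
        return f"int: {arg}"


f = Formatter()
print(f.fmt(3))          # int: 3
print(f.fmt("x"))        # object: 'x'
print("fmt" in vars(f))  # True
```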
Doesn't this create a reference loop? obj refers to cache, cache refers to _method, and _method refers to a cell that refers to obj.
Yes. But once there are no external references to obj anymore, the garbage collector removes the objects. (The cache lives on obj, not on the singledispatchmethod itself or on the class.)
On current main, the caching is done on the singledispatchmethod, which keeps the generated methods alive.
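A sketch of that loop, assuming CPython's reference counting and a hypothetical Obj class: dropping the last external reference does not free the object; only a run of the cyclic garbage collector does. The automatic collector is paused during the demonstration so the result is deterministic.

```python
import gc
import weakref


class Obj:
    """Hypothetical instance type."""


def make_cycle():
    obj = Obj()

    def _method():
        return obj  # closure cell -> obj

    obj.__dict__["method"] = _method  # obj -> __dict__ -> _method
    return obj


was_enabled = gc.isenabled()
gc.disable()  # keep automatic collections out of the demonstration
try:
    obj = make_cycle()
    ref = weakref.ref(obj)
    del obj
    alive_after_del = ref() is not None      # refcounting cannot free a cycle
    gc.collect()
    alive_after_collect = ref() is not None  # the cyclic GC reclaims it
finally:
    if was_enabled:
        gc.enable()

print(alive_after_del, alive_after_collect)  # True False
```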
Yes, the current situation is worse: it creates strong references singledispatchmethod -> _method -> obj.
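The problem on current main is a general pitfall of weakref.WeakKeyDictionary: the dictionary holds its values strongly, so a value that references its own key keeps that key alive for as long as the dictionary exists, even across gc.collect(). A minimal sketch with a hypothetical bind helper:

```python
import gc
import weakref


class Obj:
    """Hypothetical instance type."""


def bind(obj):
    def _method():
        return obj  # the cached value strongly references its key

    return _method


cache = weakref.WeakKeyDictionary()
obj = Obj()
cache[obj] = bind(obj)
ref = weakref.ref(obj)
del obj
gc.collect()
print(ref() is not None, len(cache))  # True 1: the entry never expires
```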
Relying on the garbage collector is not good. This particular loop can be broken by using a weak reference to obj instead of obj itself. But a reference from a bound method to its object should be strong; otherwise some code will not work (there was a similar issue with TemporaryFile).
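Both sides of that trade-off in one sketch (hypothetical weak_bind helper, CPython reference counting assumed): closing over weakref.ref(obj) removes the cycle, so the instance is freed as soon as its last strong reference goes away, but a "method" saved elsewhere then stops working, whereas an ordinary bound method keeps its instance alive.

```python
import weakref


class Obj:
    """Hypothetical instance type."""

    def greet(self):
        return "hello"


def weak_bind(obj):
    """Hypothetical helper: call obj.greet() through a weak reference."""
    obj_ref = weakref.ref(obj)

    def _method():
        target = obj_ref()
        if target is None:
            raise ReferenceError("instance was already garbage collected")
        return target.greet()

    return _method


obj = Obj()
obj.__dict__["cached"] = weak_bind(obj)  # no strong cycle any more
saved = obj.cached
ref = weakref.ref(obj)
del obj
freed = ref() is None  # True on CPython: refcounting alone frees it
try:
    saved()
    weak_call_ok = True
except ReferenceError:
    weak_call_ok = False  # the saved "method" outlived its instance

obj2 = Obj()
normal = obj2.greet
del obj2
print(freed, weak_call_ok, normal())  # True False hello
```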
I am not sure how much this optimization saves. Are there other ways to achieve the same speedup without creating reference loops?
Closing in favor of #130008
Version based on an idea from @dg-pb in #127839. This version
__hash__/__eq__
Regression in Django with singledispatchmethod on models #127750
There is still a cache (stored on the object instances). Quick benchmark (Windows, non-PGO build):
(Note that the alternative to this PR is not to keep main as it is, but to revert #107148.)
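A minimal timeit harness for comparing singledispatchmethod call overhead with a plain method call, offered as a hypothetical sketch rather than the author's benchmark script (the Example class is illustrative, and absolute numbers depend on the machine and build):

```python
import timeit
from functools import singledispatchmethod


class Example:
    """Hypothetical class used only for timing."""

    @singledispatchmethod
    def dispatch(self, arg):
        return "object"

    @dispatch.register
    def _(self, arg: int):
        return "int"

    def plain(self, arg):
        return "plain"


obj = Example()
n = 100_000
t_dispatch = timeit.timeit(lambda: obj.dispatch(1), number=n)
t_plain = timeit.timeit(lambda: obj.plain(1), number=n)
print(f"singledispatchmethod: {t_dispatch / n * 1e9:.0f} ns/call")
print(f"plain method:         {t_plain / n * 1e9:.0f} ns/call")
```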