
taps.filter

Filter

Bases: Protocol

Filter protocol.

A Filter is a callable object (e.g., a function) used by the Engine. It takes an object as input and returns a boolean indicating whether the object should be transformed by the Engine's data Transformer.
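Because Filter is a Protocol, any callable with a matching shape should satisfy it structurally, without inheriting from anything. The sketch below defines a local stand-in protocol (an assumption modeled on the description above, not imported from taps) and two conforming filters: a plain function and a class with `__call__`. The names `is_large_bytes` and `PrefixFilter` are hypothetical.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Filter(Protocol):
    """Local stand-in for taps.filter.Filter (assumed call signature)."""

    def __call__(self, obj: Any) -> bool: ...


def is_large_bytes(obj: Any) -> bool:
    """Hypothetical function filter: pass bytes objects of 1 KiB or more."""
    return isinstance(obj, bytes) and len(obj) >= 1024


class PrefixFilter:
    """Hypothetical class filter: pass strings starting with a given prefix."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix

    def __call__(self, obj: Any) -> bool:
        return isinstance(obj, str) and obj.startswith(self.prefix)


# Both satisfy the protocol structurally; no inheritance is required.
filters: list[Filter] = [is_large_bytes, PrefixFilter('data:')]
print(is_large_bytes(b'x' * 2048))        # True
print(PrefixFilter('data:')('data:abc'))  # True
print(isinstance(is_large_bytes, Filter))  # True
```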

FilterConfig

Bases: BaseModel, ABC

Abstract Filter plugin configuration.

Parameters:

  • name (str) –

    Filter name.

get_filter abstractmethod

get_filter() -> Filter

Create a filter from the configuration.

Source code in taps/filter/_protocol.py
@abc.abstractmethod
def get_filter(self) -> Filter:
    """Create a filter from the configuration."""
    ...
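A new filter plugin would presumably pair a Filter with a FilterConfig subclass whose `get_filter()` builds it. The sketch below uses a stdlib `abc.ABC` plus `dataclass` as a stand-in for the real pydantic `BaseModel` base so it runs without dependencies; `MaxLengthFilterConfig` and its `max_length` field are hypothetical and not part of taps.

```python
import abc
from dataclasses import dataclass
from typing import Any, Callable

# Stand-in for the Filter protocol: any callable taking an object, returning bool.
Filter = Callable[[Any], bool]


class FilterConfig(abc.ABC):
    """Stdlib stand-in for taps' FilterConfig (the real one is a pydantic BaseModel)."""

    name: str

    @abc.abstractmethod
    def get_filter(self) -> Filter:
        """Create a filter from the configuration."""
        ...


@dataclass
class MaxLengthFilterConfig(FilterConfig):
    """Hypothetical plugin config: pass objects whose len() is at most max_length."""

    name: str = 'max-length'
    max_length: int = 100

    def get_filter(self) -> Filter:
        limit = self.max_length

        def _filter(obj: Any) -> bool:
            # Objects without __len__ are rejected rather than raising.
            try:
                return len(obj) <= limit
            except TypeError:
                return False

        return _filter


config = MaxLengthFilterConfig(max_length=3)
filter_ = config.get_filter()
print(filter_('abc'), filter_('abcd'), filter_(42))  # True False False
```

Keeping configuration (plain, serializable fields) separate from the callable it produces is what lets a config be parsed from a file or command line before any filter object exists.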

NeverFilterConfig

Bases: FilterConfig

NeverFilter plugin configuration.

Parameters:

  • name (Literal[str], default: 'never' ) –

    Filter name.

get_filter

get_filter() -> Filter

Create a filter from the configuration.

Source code in taps/filter/_simple.py
def get_filter(self) -> Filter:
    """Create a filter from the configuration."""
    return NeverFilter()

NeverFilter

Filter that never lets objects pass through.

Example

from taps.filter import NeverFilter

filter_ = NeverFilter()
assert not filter_('value')  # always false
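Since a filter's return value decides whether an object is transformed, a NeverFilter effectively disables transformation. The sketch below shows an assumed minimal implementation alongside a hypothetical `maybe_transform` helper standing in for the Engine's decision; neither is the actual taps code.

```python
from typing import Any, Callable


class NeverFilter:
    """Sketch of a filter that rejects every object (assumed implementation)."""

    def __call__(self, obj: Any) -> bool:
        return False


def maybe_transform(
    obj: Any,
    filter_: Callable[[Any], bool],
    transform: Callable[[Any], Any],
) -> Any:
    """Hypothetical stand-in for the Engine: transform obj only if it passes."""
    return transform(obj) if filter_(obj) else obj


data = [1, 'two', b'three']
results = [maybe_transform(x, NeverFilter(), repr) for x in data]
print(results)  # [1, 'two', b'three'] (nothing was transformed)
```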

ObjectSizeFilter

ObjectSizeFilter(
    *, min_bytes: int = 0, max_bytes: float = math.inf
)

Object size filter.

Checks if the size of an object, computed using sys.getsizeof(), is at least the minimum size and at most the maximum size (both bounds are inclusive).

Warning

sys.getsizeof() does not count the size of objects referred to by the main object.
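To see what the warning means in practice, the sketch below compares `sys.getsizeof()` with a hypothetical recursive helper, `deep_getsizeof`, that also counts the contents of common built-in containers. The helper is illustrative only and is not part of taps; it does not traverse custom classes.

```python
import sys
from typing import Any, Optional


def deep_getsizeof(obj: Any, _seen: Optional[set[int]] = None) -> int:
    """Hypothetical helper: recursively sum sys.getsizeof() over common containers.

    Only lists, tuples, sets, frozensets, and dicts are traversed; other
    objects (including custom classes) are counted shallowly.
    """
    seen = set() if _seen is None else _seen
    if id(obj) in seen:
        return 0  # Count shared or cyclic references only once.
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(
            deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
            for k, v in obj.items()
        )
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    return size


payload = ['x' * 10_000]
print(sys.getsizeof(payload))   # small: only the list object itself
print(deep_getsizeof(payload))  # over 10,000: includes the contained string
```

A list holding one large string therefore looks tiny to ObjectSizeFilter, which matters when choosing `min_bytes`.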

Example
from taps.filter import ObjectSizeFilter

filter_ = ObjectSizeFilter(min_bytes=100)
assert not filter_('small')
assert filter_('large' * 100)

Parameters:

  • min_bytes (int, default: 0 ) –

    Minimum size threshold (inclusive) to pass through the filter.

  • max_bytes (float, default: inf ) –

    Maximum size threshold (inclusive) to pass through the filter.

Source code in taps/filter/_object.py
def __init__(
    self,
    *,
    min_bytes: int = 0,
    max_bytes: float = math.inf,
) -> None:
    self.min_bytes = min_bytes
    self.max_bytes = max_bytes
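The constructor above only stores the bounds; the check itself happens when the filter is called. The sketch below is an assumed `__call__` consistent with the documented inclusive thresholds. It is named `ObjectSizeFilterSketch` to make clear it is not the library's source.

```python
import math
import sys
from typing import Any


class ObjectSizeFilterSketch:
    """Sketch of ObjectSizeFilter's check, assuming inclusive bounds on sys.getsizeof()."""

    def __init__(self, *, min_bytes: int = 0, max_bytes: float = math.inf) -> None:
        self.min_bytes = min_bytes
        self.max_bytes = max_bytes

    def __call__(self, obj: Any) -> bool:
        # Shallow size only: referenced objects are not counted.
        return self.min_bytes <= sys.getsizeof(obj) <= self.max_bytes


filter_ = ObjectSizeFilterSketch(min_bytes=100)
print(filter_('small'))        # False
print(filter_('large' * 100))  # True

# Bounds are inclusive, so an object exactly at the threshold passes.
exact = sys.getsizeof('small')
print(ObjectSizeFilterSketch(min_bytes=exact, max_bytes=exact)('small'))  # True
```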

ObjectSizeFilterConfig

Bases: FilterConfig

ObjectSizeFilter plugin configuration.

Parameters:

  • name (Literal[str], default: 'object-size' ) –

    Filter name.

  • min_size (int, default: 0 ) –

    Minimum object size in bytes.

  • max_size (float, default: inf ) –

    Maximum object size in bytes.

get_filter

get_filter() -> Filter

Create a filter from the configuration.

Source code in taps/filter/_object.py
def get_filter(self) -> Filter:
    """Create a filter from the configuration."""
    return ObjectSizeFilter(
        min_bytes=self.min_size,
        max_bytes=self.max_size,
    )

ObjectTypeFilter

ObjectTypeFilter(
    *types: type, patterns: Sequence[str] | None = None
)

Object type filter.

Checks if an object is of a certain type using isinstance() or by pattern matching against the name of the type.

Example
from taps.filter import ObjectTypeFilter

filter_ = ObjectTypeFilter(int, str)
assert filter_(42)
assert filter_('value')
assert not filter_(3.14)

Parameters:

  • types (type, default: () ) –

    Types to check.

  • patterns (Sequence[str] | None, default: None ) –

    Regex-compatible patterns to compare against the name of the object's type.

Source code in taps/filter/_object.py
def __init__(
    self,
    *types: type,
    patterns: Sequence[str] | None = None,
) -> None:
    self.types = types
    self.patterns = tuple(patterns if patterns is not None else [])
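The sketch below shows one plausible `__call__` combining the two checks described above. The exact pattern rule is an assumption: here each pattern is applied with `re.search()` to `type(obj).__name__`, and the real implementation may match differently (for example, against a qualified name or with `re.match()`).

```python
import re
from typing import Any, Optional, Sequence


class ObjectTypeFilterSketch:
    """Sketch of ObjectTypeFilter.

    Assumption: patterns are applied with re.search() to the object's type
    name (type(obj).__name__); the real matching rule may differ.
    """

    def __init__(self, *types: type, patterns: Optional[Sequence[str]] = None) -> None:
        self.types = types
        self.patterns = tuple(patterns if patterns is not None else [])

    def __call__(self, obj: Any) -> bool:
        # isinstance() also accepts subclasses (e.g., bool passes for int).
        if self.types and isinstance(obj, self.types):
            return True
        name = type(obj).__name__
        return any(re.search(p, name) for p in self.patterns)


by_type = ObjectTypeFilterSketch(int, str)
print(by_type(42), by_type('value'), by_type(3.14))  # True True False


# Name patterns let a filter target a type without importing its module.
class DataFrame:  # hypothetical local class standing in for a third-party type
    pass


by_name = ObjectTypeFilterSketch(patterns=[r'^ndarray$', r'Frame'])
print(by_name([1, 2]))       # False ('list' matches neither pattern)
print(by_name(DataFrame()))  # True ('DataFrame' contains 'Frame')
```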

ObjectTypeFilterConfig

Bases: FilterConfig

ObjectTypeFilter plugin configuration.

Parameters:

  • name (Literal[str], default: 'object-type' ) –

    Filter name.

  • patterns (List[str] | None, default: None ) –

    List of patterns to match against type names.

get_filter

get_filter() -> Filter

Create a filter from the configuration.

Source code in taps/filter/_object.py
def get_filter(self) -> Filter:
    """Create a filter from the configuration."""
    return ObjectTypeFilter(patterns=self.patterns)

PickleSizeFilter

PickleSizeFilter(
    *, min_bytes: int = 0, max_bytes: float = math.inf
)

Pickled object size filter.

Checks if the size of an object, computed as the length of its pickled bytes, is at least the minimum size and at most the maximum size (both bounds are inclusive).

Warning

Pickling large objects can take significant time, so this filter type is only recommended when the cost of the data transformation (e.g., communication or storage) is significantly greater than the cost of serializing the objects.

Example
from taps.filter import PickleSizeFilter

filter_ = PickleSizeFilter(min_bytes=100)
assert not filter_('small')
assert filter_('large' * 100)

Parameters:

  • min_bytes (int, default: 0 ) –

    Minimum size threshold (inclusive) to pass through the filter.

  • max_bytes (float, default: inf ) –

    Maximum size threshold (inclusive) to pass through the filter.

Source code in taps/filter/_object.py
def __init__(
    self,
    *,
    min_bytes: int = 0,
    max_bytes: float = math.inf,
) -> None:
    self.min_bytes = min_bytes
    self.max_bytes = max_bytes
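Unlike sys.getsizeof(), a pickled size covers everything reachable from the object, which is why it is more accurate and also more expensive. The sketch below assumes the check is `len(pickle.dumps(obj))` with the default pickle protocol and inclusive bounds; `PickleSizeFilterSketch` is a hypothetical name, not the taps source.

```python
import math
import pickle
import sys
from typing import Any


class PickleSizeFilterSketch:
    """Sketch of PickleSizeFilter, assuming inclusive bounds on len(pickle.dumps(obj))."""

    def __init__(self, *, min_bytes: int = 0, max_bytes: float = math.inf) -> None:
        self.min_bytes = min_bytes
        self.max_bytes = max_bytes

    def __call__(self, obj: Any) -> bool:
        # Serializing the whole object graph is what makes this filter costly.
        size = len(pickle.dumps(obj))
        return self.min_bytes <= size <= self.max_bytes


payload = ['x' * 10_000]
print(sys.getsizeof(payload) < 10_000)  # True: shallow size misses the string
print(PickleSizeFilterSketch(min_bytes=10_000)(payload))  # True: pickle includes it
```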

PickleSizeFilterConfig

Bases: FilterConfig

PickleSizeFilter plugin configuration.

Parameters:

  • name (Literal[str], default: 'pickle-size' ) –

    Filter name.

  • min_size (int, default: 0 ) –

    Minimum object size in bytes.

  • max_size (float, default: inf ) –

    Maximum object size in bytes.

get_filter

get_filter() -> Filter

Create a filter from the configuration.

Source code in taps/filter/_object.py
def get_filter(self) -> Filter:
    """Create a filter from the configuration."""
    return PickleSizeFilter(
        min_bytes=self.min_size,
        max_bytes=self.max_size,
    )