Design Document

Project: arrayops

Rust-backed acceleration for Python array.array


1. Purpose & Goals

Problem

Python’s built-in array.array:

  • Is memory-efficient and C-compatible

  • Supports zero-copy buffers

  • Lacks high-level operations (map, filter, reduce, math ops)

  • Is slow when iterated in Python

NumPy solves this but is:

  • Heavyweight

  • Multi-dimensional (often unnecessary)

  • Overkill for scripting, ETL, and systems tooling

Goal

Create a lightweight Rust extension that:

  • Operates directly on array.array

  • Uses zero-copy buffer access

  • Provides fast, safe, numeric operations

  • Avoids introducing a new array type

Non-Goals

  • Replacing NumPy

  • Multi-dimensional arrays

  • Arbitrary Python object arrays

  • Dynamic method injection into array.array


2. Target Users

  • Systems / ETL Python developers

  • Binary protocol authors

  • Streaming data pipelines

  • Performance-sensitive scripts

  • Users who want speed without NumPy


3. High-Level Architecture

┌──────────────┐
│   Python     │
│              │
│ array.array  │  ← unchanged
│ arrayops/_ao.py │
└──────┬───────┘
       │ buffer protocol
       ▼
┌──────────────┐
│    Rust      │
│  (PyO3)      │
│              │
│ typed loops  │
│ SIMD / par   │
└──────────────┘

4. Public Python API

Module layout

import arrayops as ao

Core operations

ao.map(arr, fn)                -> array
ao.map_inplace(arr, fn)        -> None

ao.filter(arr, predicate)      -> array

ao.reduce(arr, fn, initial)    -> scalar

ao.sum(arr)                    -> scalar
ao.mean(arr)                   -> float
ao.min(arr), ao.max(arr)

ao.scale(arr, factor)          -> None
ao.clip(arr, min, max)         -> None

Type restrictions

  • Only numeric array.array types:

    • b, B, h, H, i, I, l, L

    • f, d

  • No object arrays

  • No mixed types

Invalid typecodes → TypeError


5. Rust Implementation Strategy

Tooling

  • Rust

  • PyO3

  • maturin

  • Optional: rayon (parallelism), packed_simd or std::simd

Buffer access

use pyo3::buffer::PyBuffer;

let buffer = PyBuffer::<i32>::get(py, &py_array)?;
let slice = unsafe { buffer.as_slice()? };
  • ✔️ Zero-copy

  • ✔️ Typed

  • ✔️ Safe lifetime enforcement

Type dispatch

Rust side dispatch by typecode:

match typecode {
    'i' => process::<i32>(buffer),
    'f' => process::<f32>(buffer),
    'd' => process::<f64>(buffer),
    _ => Err(TypeError)
}

Each kernel is monomorphized → fast loops.


6. Python Callable Handling

Two execution paths

A. Fast path (no Python calls)

ao.scale(arr, 1.5)
ao.sum(arr)
  • Pure Rust loop

  • No GIL per element

  • SIMD-friendly

B. Callback path (slower, flexible)

ao.map(arr, lambda x: x * x)
  • GIL held

  • Python callable invoked per element

  • Still faster than Python iteration due to C-level loop


7. Safety Guarantees

Memory Safety

  • Bounds-checked slices: All array access is bounds-checked through Rust’s safe APIs

  • No unsafe pointer arithmetic: We use PyO3’s safe buffer protocol APIs exclusively

  • No reallocation during in-place ops: In-place operations modify existing buffers without reallocation

  • Panic boundaries: Rust panics are caught and converted to Python exceptions

Security Guarantees

  • Input validation: All inputs are validated before processing (type, typecode, properties)

  • Error message security: Error messages are informative but don’t leak sensitive information

  • Minimal unsafe code: Only dead code (allocator module) uses unsafe, and it’s well-documented

  • Dependency security: Dependencies are regularly audited for vulnerabilities

See Security Documentation for comprehensive security information.


8. Performance Characteristics

Operation

Python list

array.array

arrayops

Iteration

Fast

Slower

❌ (avoid)

Sum

Slow

Slow

🚀

Map

Slow

Slow

🚀

Binary IO

In-place ops

🚀


9. Error Handling

Condition

Behavior

Unsupported typecode

TypeError

Non-contiguous buffer

BufferError

Python callback error

Propagate exception

Overflow (ints)

Python semantics


10. Packaging & Distribution

Build

maturin build
maturin publish

Targets

  • Linux (manylinux)

  • macOS (universal2)

  • Windows


11. Testing Strategy

Rust

  • Unit tests per kernel

  • Property tests (quickcheck)

  • Overflow behavior validation

Python

  • pytest

  • Parity tests vs Python list

  • Buffer safety tests

  • Large array stress tests


12. Documentation Plan

  • README with:

    • Motivation

    • API examples

    • Performance benchmarks

    • Comparison with NumPy

  • Cookbook examples:

    • Binary parsing

    • ETL pipelines

    • Streaming stats


13. Future Extensions

  • SIMD auto-vectorization

  • Parallel execution (rayon)

  • memoryview support

  • Optional NumPy interop

  • Arrow buffer interop


14. Risks & Mitigations

Risk

Mitigation

Reinventing NumPy

Scope control

Callback overhead

Encourage fast paths

API creep

Minimal surface

Platform builds

maturin CI


15. Summary

arrayops would:

  • Fill a real gap between array.array and NumPy

  • Leverage Rust for safety + speed

  • Enable high-performance numeric scripts with zero dependencies