Design Document
Project: arrayops
Rust-backed acceleration for Python array.array
1. Purpose & Goals
Problem
Python’s built-in array.array:
Is memory-efficient and C-compatible
Supports zero-copy buffers
Lacks high-level operations (map, filter, reduce, math ops)
Is slow when iterated in Python
NumPy solves this but is:
Heavyweight
Multi-dimensional (often unnecessary)
Overkill for scripting, ETL, and systems tooling
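To make the trade-off concrete, the sketch below uses only the standard library to show what array.array already offers and where it stops. The variable names are illustrative and not part of any proposed API.

```python
from array import array

values = array("i", range(1000))

# Compact storage: every element occupies `itemsize` bytes in one contiguous block.
payload_bytes = values.itemsize * len(values)

# Zero-copy access: a memoryview shares the array's memory, so a write
# through the view is visible in the array itself.
view = memoryview(values)
view[0] = 42
view.release()

# No element-wise arithmetic: `*` repeats the sequence instead of scaling it.
repeated = values * 2

# Scaling therefore needs a Python-level loop that boxes every element.
scaled = array("i", (v * 2 for v in values))
```

The last statement is the kind of loop the proposed extension is meant to replace with a native one.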
Goal
Create a lightweight Rust extension that:
Operates directly on array.array
Uses zero-copy buffer access
Provides fast, safe, numeric operations
Avoids introducing a new array type
Non-Goals
Replacing NumPy
Multi-dimensional arrays
Arbitrary Python object arrays
Dynamic method injection into array.array
2. Target Users
Systems / ETL Python developers
Binary protocol authors
Streaming data pipelines
Performance-sensitive scripts
Users who want speed without NumPy
3. High-Level Architecture
┌──────────────────┐
│ Python           │
│                  │
│ array.array      │ ← unchanged
│ arrayops/_ao.py  │
└────────┬─────────┘
         │ buffer protocol
         ▼
┌──────────────────┐
│ Rust             │
│ (PyO3)           │
│                  │
│ typed loops      │
│ SIMD / par       │
└──────────────────┘
4. Public Python API
Module layout
import arrayops as ao
Core operations
ao.map(arr, fn) -> array
ao.map_inplace(arr, fn) -> None
ao.filter(arr, predicate) -> array
ao.reduce(arr, fn, initial) -> scalar
ao.sum(arr) -> scalar
ao.mean(arr) -> float
ao.min(arr), ao.max(arr)
ao.scale(arr, factor) -> None
ao.clip(arr, min, max) -> None
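The intended semantics of these signatures can be pinned down with a pure-Python reference model. This is only a sketch under assumptions: the `ref_` names are hypothetical, the functions are not the Rust implementation, and details such as whether results keep the input typecode are guesses based on the `-> array` annotations above.

```python
from array import array
from functools import reduce


def ref_map(arr, fn):
    """Return a new array of the same typecode with fn applied to each element."""
    return array(arr.typecode, (fn(x) for x in arr))


def ref_map_inplace(arr, fn):
    """Overwrite each element with fn(element); the length never changes."""
    for i, x in enumerate(arr):
        arr[i] = fn(x)


def ref_filter(arr, predicate):
    """Return a new array holding the elements for which predicate is true."""
    return array(arr.typecode, (x for x in arr if predicate(x)))


def ref_reduce(arr, fn, initial):
    """Fold the array from left to right, starting from initial."""
    return reduce(fn, arr, initial)


def ref_clip(arr, lo, hi):
    """Clamp every element into the closed interval [lo, hi], in place."""
    for i, x in enumerate(arr):
        arr[i] = min(max(x, lo), hi)


data = array("d", [1.0, -2.0, 3.5])
squares = ref_map(data, lambda x: x * x)
positives = ref_filter(data, lambda x: x > 0)
total = ref_reduce(data, lambda acc, x: acc + x, 0.0)
```

A model like this can double as the oracle for parity tests once native kernels exist.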
Type restrictions
Only numeric array.array types: b, B, h, H, i, I, l, L, f, d
No object arrays
No mixed types
Invalid typecodes → TypeError
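A minimal sketch of the validation step these rules imply, written in Python for illustration. The names `SUPPORTED_TYPECODES` and `validate` are hypothetical; the real check would presumably live on the Rust side.

```python
from array import array

# The ten typecodes listed above; anything else is rejected.
SUPPORTED_TYPECODES = frozenset("bBhHiIlLfd")


def validate(arr):
    """Return (typecode, itemsize) for a supported array, else raise TypeError."""
    if not isinstance(arr, array):
        raise TypeError(f"expected array.array, got {type(arr).__name__}")
    if arr.typecode not in SUPPORTED_TYPECODES:
        raise TypeError(f"unsupported typecode {arr.typecode!r}")
    return arr.typecode, arr.itemsize


typecode, itemsize = validate(array("d", [1.0, 2.0]))
```

Under this list, `array("q", ...)` (signed long long) would be rejected even though it is numeric, which may or may not be intended.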
5. Rust Implementation Strategy
Tooling
Rust
PyO3
maturin
Optional: rayon (parallelism), packed_simd or std::simd
Buffer access
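use pyo3::buffer::PyBuffer;
use pyo3::exceptions::PyBufferError;
// The exact signature depends on the PyO3 version; recent releases take a &Bound<'_, PyAny>.
let buffer = PyBuffer::<i32>::get(&py_array)?;
// as_slice returns None for non-contiguous buffers; no unsafe block is required.
let slice = buffer.as_slice(py).ok_or_else(|| PyBufferError::new_err("buffer is not C-contiguous"))?;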
✔️ Zero-copy
✔️ Typed
✔️ Safe lifetime enforcement
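What the Rust side receives through the buffer protocol can be inspected from Python with a memoryview. The sketch below uses only the standard library. It also shows a lifetime guarantee that CPython itself provides: an array cannot be resized while a buffer export is alive.

```python
from array import array

arr = array("i", [1, 2, 3, 4])

with memoryview(arr) as view:
    # Metadata that a buffer consumer can use to pick a typed kernel.
    info = {
        "format": view.format,
        "itemsize": view.itemsize,
        "ndim": view.ndim,
        "shape": view.shape,
        "contiguous": view.contiguous,
        "readonly": view.readonly,
    }
    # Resizing while the buffer is exported is refused by array.array.
    try:
        arr.append(5)
        resize_blocked = False
    except BufferError:
        resize_blocked = True

# Once the view is released, the array can grow again.
arr.append(5)
```

This is why a native loop that holds the buffer for its whole duration cannot have the memory moved out from under it by a resize.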
Type dispatch
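The Rust side dispatches on the typecode:
match typecode {
    'i' => process::<i32>(buffer),
    'f' => process::<f32>(buffer),
    'd' => process::<f64>(buffer),
    // The remaining integer typecodes follow the same pattern.
    // PyTypeError comes from pyo3::exceptions.
    _ => Err(PyTypeError::new_err("unsupported typecode")),
}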
Each kernel is monomorphized → fast loops.
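One detail the dispatch has to handle is that some typecodes do not have a fixed width: `l` and `L` are 4 bytes on some platforms and 8 on others. The sketch below, in Python for illustration, derives the fixed-width Rust type name from the typecode and the platform's item size. The function `rust_type_for` and the lookup tables are hypothetical helpers, not part of the design.

```python
from array import array

_SIGNED = {1: "i8", 2: "i16", 4: "i32", 8: "i64"}
_UNSIGNED = {1: "u8", 2: "u16", 4: "u32", 8: "u64"}
_FLOAT = {4: "f32", 8: "f64"}
_SUPPORTED = frozenset("bBhHiIlLfd")


def rust_type_for(typecode):
    """Map an array.array typecode to a fixed-width Rust type name."""
    if typecode not in _SUPPORTED:
        raise TypeError(f"unsupported typecode {typecode!r}")
    size = array(typecode).itemsize  # platform-dependent for some codes
    if typecode in "fd":
        return _FLOAT[size]
    if typecode.islower():
        return _SIGNED[size]
    return _UNSIGNED[size]


mapping = {code: rust_type_for(code) for code in "bBhHiIlLfd"}
```

Dispatching on the pair (typecode, itemsize) rather than on the typecode alone would keep the kernels correct across platforms.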
6. Python Callable Handling
Two execution paths
A. Fast path (no Python calls)
ao.scale(arr, 1.5)
ao.sum(arr)
Pure Rust loop
No GIL per element
SIMD-friendly
B. Callback path (slower, flexible)
ao.map(arr, lambda x: x * x)
GIL held
Python callable invoked per element
Expected to beat a hand-written Python loop because iteration and element conversion run in native code, although the cost of each Python call remains
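The difference between the two paths can be felt with a standard-library analogue. This is not a measurement of arrayops: the built-in `sum` stands in for a native kernel, and `functools.reduce` with a lambda stands in for the callback path. Timings are printed rather than asserted because they vary by machine.

```python
import timeit
from array import array
from functools import reduce

data = array("d", (float(i) for i in range(100_000)))

# Analogue of the fast path: the loop runs in C with no Python call per element.
fast_total = sum(data)

# Analogue of the callback path: one Python function call per element.
callback_total = reduce(lambda acc, x: acc + x, data, 0.0)

fast_seconds = timeit.timeit(lambda: sum(data), number=5)
callback_seconds = timeit.timeit(
    lambda: reduce(lambda acc, x: acc + x, data, 0.0), number=5
)
print(f"built-in sum: {fast_seconds:.4f}s, per-element callback: {callback_seconds:.4f}s")
```

Both paths compute the same result; only the per-element cost differs, which is the reason to prefer dedicated operations such as `ao.sum` or `ao.scale` when one exists.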
7. Safety Guarantees
Memory Safety
Bounds-checked slices: All array access is bounds-checked through Rust’s safe APIs
No unsafe pointer arithmetic: We use PyO3’s safe buffer protocol APIs exclusively
No reallocation during in-place ops: In-place operations modify existing buffers without reallocation
Panic boundaries: Rust panics are caught and converted to Python exceptions
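The "no reallocation" property can be observed from Python with `array.buffer_info()`, which reports the address of the underlying buffer. The sketch below uses a plain Python loop as a stand-in for an in-place kernel, so it illustrates the property rather than testing the extension.

```python
from array import array

arr = array("d", [1.0, 2.0, 3.0])
address_before, length_before = arr.buffer_info()

# Stand-in for an in-place kernel: overwrite elements without changing the length.
for i in range(len(arr)):
    arr[i] = arr[i] * 2.0

address_after, length_after = arr.buffer_info()
buffer_moved = address_before != address_after
```

A buffer-safety test for the real in-place operations could make the same comparison around a call such as `ao.scale`.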
Security Guarantees
Input validation: All inputs are validated before processing (type, typecode, properties)
Error message security: Error messages are informative but don’t leak sensitive information
Minimal unsafe code: Only dead code (allocator module) uses unsafe, and it’s well-documented
Dependency security: Dependencies are regularly audited for vulnerabilities
See Security Documentation for comprehensive security information.
8. Performance Characteristics
| Operation | Python list | array.array | arrayops |
|---|---|---|---|
| Iteration | Fast | Slower | ❌ (avoid) |
| Sum | Slow | Slow | 🚀 |
| Map | Slow | Slow | 🚀 |
| Binary IO | ❌ | ✅ | ✅ |
| In-place ops | ❌ | ✅ | 🚀 |
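The qualitative entries for the two baseline columns can be checked with a small timing harness. The sketch below uses only the standard library; `best_of` is a hypothetical helper, and no numbers are quoted because results depend on the machine and Python version.

```python
import timeit
from array import array


def best_of(fn, repeat=3, number=5):
    """Return the best per-call time in seconds over `repeat` runs."""
    return min(timeit.repeat(fn, repeat=repeat, number=number)) / number


n = 50_000
as_list = list(range(n))
as_array = array("i", range(n))

timings = {
    "list iteration": best_of(lambda: [x for x in as_list]),
    "array iteration": best_of(lambda: [x for x in as_array]),
    "list sum": best_of(lambda: sum(as_list)),
    "array sum": best_of(lambda: sum(as_array)),
}
for label, seconds in timings.items():
    print(f"{label:>16}: {seconds * 1e6:9.1f} µs")
```

The same harness could later time the arrayops column by adding entries that call the extension.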
9. Error Handling
| Condition | Behavior |
|---|---|
| Unsupported typecode | TypeError |
| Non-contiguous buffer | BufferError |
| Python callback error | Propagate exception |
| Overflow (ints) | Python semantics |
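The sketch below shows how plain array.array behaves for three of these conditions today, which is one plausible reading of "Python semantics". The helper `ref_map_inplace` is a hypothetical pure-Python stand-in for the native operation. Whether a failed in-place operation should leave the array partially updated, as happens here, is a design question this document does not settle.

```python
from array import array


def ref_map_inplace(arr, fn):
    """Pure-Python stand-in for an in-place map."""
    for i, x in enumerate(arr):
        arr[i] = fn(x)


# Overflow: storing 200 into a signed byte raises OverflowError.
small = array("b", [50, 100])
try:
    ref_map_inplace(small, lambda x: x * 2)
    overflow_error = None
except OverflowError as exc:
    overflow_error = exc
# The first element was already doubled before the second one failed.
state_after_overflow = list(small)

# Callback error: the exception raised by the callable propagates unchanged.
try:
    ref_map_inplace(array("i", [1]), lambda x: x // 0)
    callback_error = None
except ZeroDivisionError as exc:
    callback_error = exc

# Non-contiguous buffer: a strided memoryview is the typical source.
strided = memoryview(array("i", range(6)))[::2]
is_contiguous = strided.contiguous
```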
10. Packaging & Distribution
Build
maturin build
maturin publish
Targets
Linux (manylinux)
macOS (universal2)
Windows
11. Testing Strategy
Rust
Unit tests per kernel
Property tests (quickcheck)
Overflow behavior validation
Python
pytest
Parity tests vs Python list
Buffer safety tests
Large array stress tests
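A parity test against Python lists could look like the sketch below. It is written with plain assertions so it runs without pytest. The names `check_sum_parity` and `INT_RANGES` are hypothetical, and the built-in `sum` is passed as the candidate only because the extension does not exist yet.

```python
import random
from array import array

# Value ranges that are valid for these typecodes on every platform.
INT_RANGES = {
    "b": (-128, 127),
    "B": (0, 255),
    "h": (-32768, 32767),
    "H": (0, 65535),
}


def check_sum_parity(candidate_sum, typecode, trials=50, seed=0):
    """Compare candidate_sum on arrays against sum() on equivalent lists."""
    rng = random.Random(seed)
    lo, hi = INT_RANGES[typecode]
    for _ in range(trials):
        values = [rng.randint(lo, hi) for _ in range(rng.randint(0, 200))]
        expected = sum(values)
        actual = candidate_sum(array(typecode, values))
        assert actual == expected, (typecode, actual, expected)
    return trials


# Stand-in candidate: replace `sum` with the native implementation under test.
checked = {code: check_sum_parity(sum, code) for code in INT_RANGES}
```

Seeding the generator keeps failures reproducible, and random lengths starting at zero cover the empty-array case.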
12. Documentation Plan
README with:
Motivation
API examples
Performance benchmarks
Comparison with NumPy
Cookbook examples:
Binary parsing
ETL pipelines
Streaming stats
13. Future Extensions
SIMD auto-vectorization
Parallel execution (rayon)
memoryview support
Optional NumPy interop
Arrow buffer interop
14. Risks & Mitigations
| Risk | Mitigation |
|---|---|
| Reinventing NumPy | Scope control |
| Callback overhead | Encourage fast paths |
| API creep | Minimal surface |
| Platform builds | maturin CI |
15. Summary
arrayops would:
Fill a real gap between array.array and NumPy
Leverage Rust for safety + speed
Enable high-performance numeric scripts with zero dependencies