Roadmap

This document outlines the planned features and improvements for the arrayops project.

βœ… Completed

Core Operations

  • [x] Sum operation - Fast summation for all numeric array types

  • [x] Scale operation - In-place scaling with type-safe factor application

  • [x] Full test coverage - 100% coverage for both Python and Rust code

  • [x] Type stubs for mypy - Complete type annotations for static type checking

Phase 1: Additional Operations

  • [x] map(arr, fn) -> array - Apply function to each element, return new array

    • Support for Python callables (lambda, functions)

    • Type preservation (input type determines output type)

    • Performance target: 10-50x faster than Python list comprehension

  • [x] map_inplace(arr, fn) -> None - Apply function in-place

    • Modify array elements without allocation

    • Same callable support as map

    • Performance target: 5-20x faster than Python loop

  • [x] filter(arr, predicate) -> array - Return new array with filtered elements

    • Support for Python callable predicates

    • Preserve original array type

    • Handle empty results gracefully

    • Performance target: 10-30x faster than Python list comprehension

  • [x] reduce(arr, fn, initial=None) -> scalar - Fold array with binary function

    • Support for Python callables

    • Optional initial value

    • Type inference for return value

    • Performance target: 10-40x faster than Python functools.reduce

Phase 2: Performance Optimizations

  • [x] Parallel execution (Rayon) - Parallel processing for large arrays

    • Feature flag: parallel - Enable with --features parallel or maturin build --features parallel

    • Thread-safe buffer access via Vec extraction

    • Threshold-based parallelization (10,000 elements for sum/reduce, 5,000 for scale)

    • Implemented for: sum, scale operations

    • Performance: Near-linear speedup on multi-core systems

  • [x] SIMD infrastructure - Framework for SIMD optimizations

    • Feature flag: simd - Enable with --features simd

    • Infrastructure in place for future SIMD implementation

    • Note: Full SIMD implementation pending std::simd API stabilization

    • When implemented: 2-4x additional speedup expected

🚧 In Progress

No items currently in progress

πŸ“‹ Planned Features

Phase 3: Interoperability (Medium Priority)

  • [x] NumPy array support - Operate on numpy.ndarray objects

    • Zero-copy access via NumPy’s buffer protocol

    • Support for contiguous 1D arrays

    • Type conversion handling

    • Performance: Match or exceed NumPy’s built-in operations

    • Optional dependency (only import when NumPy available)

    • Returns numpy.ndarray for map and filter operations

  • [x] Python memoryview support - Work with memoryview objects

    • Buffer protocol access

    • Type inference from memoryview format

    • Support for both read-only and writable memoryviews

    • Use case: Binary data processing, network protocols

    • In-place operations require writable memoryviews

Phase 4: Advanced Features

  • [x] Statistical Operations - Statistical analysis functions

    • [x] mean(arr) -> float - Arithmetic mean

    • [x] min(arr) -> scalar - Minimum value

    • [x] max(arr) -> scalar - Maximum value

    • [x] std(arr) -> float - Standard deviation (population)

    • [x] var(arr) -> float - Variance (population)

    • [x] median(arr) -> scalar - Median value

  • [x] Element-wise Operations - Binary array operations

    • [x] add(arr1, arr2) -> array - Element-wise addition

    • [x] multiply(arr1, arr2) -> array - Element-wise multiplication

    • [x] clip(arr, min, max) -> None - In-place clipping to range

    • [x] normalize(arr) -> None - In-place normalization to [0, 1]

  • [x] Array Manipulation - Array transformation operations

    • [x] reverse(arr) -> None - In-place reversal

    • [x] sort(arr) -> None - In-place sorting (for numeric types)

    • [x] unique(arr) -> array - Return unique elements (sorted)

Advanced Features (Post-Phase 4)

  • [x] Zero-copy slicing - slice(arr, start=None, end=None) -> memoryview

    • Returns zero-copy memoryview of array slice

    • Works with all supported input types (array.array, numpy.ndarray, memoryview, Arrow)

    • No data copying - view shares memory with original array

  • [x] Lazy evaluation - lazy_array(arr) -> LazyArray

    • Chain operations without intermediate allocations

    • Supports map() and filter() operations

    • Execution deferred until collect() is called

    • More memory-efficient for complex operation chains

  • [x] Arrow buffer interop - Support for Apache Arrow buffers/arrays

    • Automatic detection of pyarrow.Buffer, pyarrow.Array, and pyarrow.ChunkedArray

    • All operations work transparently with Arrow arrays

    • Returns Arrow arrays when Arrow input is used

    • Optional dependency (requires pyarrow to be installed)

πŸ”¬ Research & Exploration

Potential Enhancements

  • [x] Arrow buffer interop - Support Apache Arrow memory format βœ…

  • [x] Zero-copy slicing - Return views instead of copies where possible βœ…

  • [x] Lazy evaluation - Chain operations without intermediate allocations βœ…

  • [ ] Custom allocators - Support for specialized memory pools (infrastructure in place)

  • [ ] GPU acceleration - Optional CUDA/OpenCL support for very large arrays

API Design Considerations

  • [ ] Method chaining - Consider fluent API: arr.map(fn).filter(pred).sum()

  • [ ] Iterator protocol - Support Python iteration efficiently

  • [ ] Context managers - Resource management for parallel operations

  • [ ] Async support - Async/await for I/O-bound operations

πŸ“Š Success Metrics

Performance Targets

  • Sum: 100x faster than Python (βœ… achieved)

  • Scale: 50x faster than Python (βœ… achieved)

  • Map: 10-50x faster than list comprehension

  • Filter: 10-30x faster than list comprehension

  • Parallel ops: Near-linear speedup on 4-8 core systems

Quality Targets

  • Maintain 100% test coverage

  • Zero memory safety issues

  • Full mypy type checking compliance

  • Comprehensive documentation with examples

πŸ—“οΈ Timeline

Q1 2024

  • [x] Complete Phase 1 (map, filter, reduce operations)

  • [x] Complete Phase 2 (parallel execution infrastructure, SIMD framework)

Q2 2024

  • [x] Implement parallel execution with rayon (completed in Q1 2024)

  • [ ] Complete full SIMD optimizations (infrastructure in place, pending API stabilization)

  • [x] NumPy interop (completed in Q1 2024)

  • [x] Memoryview support (completed in Q1 2024)

Q3 2024

  • [x] Complete NumPy and memoryview support (completed in Q1 2024)

  • [x] Statistical operations (completed in Q4 2024)

  • [x] Performance benchmarking suite (completed)

Q4 2024

  • [x] Advanced features (element-wise ops, array manipulation) (completed)

  • [x] API polish and documentation (completed)

  • [x] Arrow buffer interop (completed)

  • [x] Zero-copy slicing (completed)

  • [x] Lazy evaluation (completed)

  • [ ] Version 1.0 release candidate

Q1 2025

  • [ ] Version 1.0 release - Official stable release

  • [ ] Post-release bug fixes and stability improvements

  • [ ] Performance profiling and optimization pass

  • [ ] Community feedback integration

  • [ ] Enhanced error messages and diagnostics

Q2 2025

  • [ ] Iterator protocol optimization for efficient Python iteration

  • [ ] Method chaining API design and prototype

  • [ ] Advanced SIMD optimizations (platform-specific tuning)

  • [ ] Performance regression testing infrastructure

  • [ ] Community benchmarks and case studies

  • [ ] Custom allocator support for specialized memory pools (infrastructure in place)

  • [ ] Async/await support for I/O-bound operations

  • [ ] Context managers for parallel operation resource management

  • [ ] Extended statistical operations (percentiles, quantiles)

  • [ ] Multi-dimensional array support research (if demand exists)

  • [ ] Ecosystem integration (pandas, polars compatibility layers)

Q4 2025

  • [ ] GPU acceleration research - CUDA/OpenCL feasibility study

  • [ ] Advanced array manipulation (reshape, transpose concepts)

  • [ ] Streaming/chunked processing for very large arrays

  • [ ] Memory-mapped file support

  • [ ] Performance optimization pass based on real-world usage

  • [ ] Version 2.0 planning and design

  • [ ] Community workshop and conference presentations

🀝 Contributing to the Roadmap

We welcome contributions! If you’re interested in implementing any of these features:

  1. Check existing issues and pull requests

  2. Open an issue to discuss the approach

  3. Follow the contribution guidelines in README.md

  4. Ensure 100% test coverage

  5. Update documentation

Priority will be given to:

  • Features with clear use cases

  • Performance improvements

  • Interoperability enhancements

  • Bug fixes and stability improvements

πŸ“ Notes

  • All features should maintain backward compatibility

  • Performance is a primary concern - new features should not regress existing operations

  • Type safety is critical - all operations must validate inputs

  • Documentation and examples are required for all new features


Last updated: Phase 4 completed + Advanced features (Arrow interop, Zero-copy slicing, Lazy evaluation)