The Apache Arrow project is a standard for representing columnar data for in-memory processing, which has a different set of trade-offs compared to on-disk storage. In-memory, access is much faster and processes optimize for CPU throughput by paying attention to cache locality, pipelining, and SIMD instructions.
Posted September 12, 2017
The Apache Arrow project is a standard for representing data for in-memory processing.Hardware evolves rapidly. Because Apache Arrow was designed to benefit many different types of software in a wide range of hardware environments, the project team focused on making the work "future-proof," which meant anticipating changes to hardware over the next decade.
Posted June 14, 2017