in-memory-analytics-with-apache-arrow - Harry's Computer Science Notes and Blog

# In-Memory Analytics with Apache Arrow

* Exercises at https://github.com/Harry-Kwon/arrow-sandbox

# skim notes

## Section 1

### ch1 arrow memory format

- **array**: list of values with a known length and the same type
- **record batch**: group of equal-length **arrays** and a schema
- **slot**: value in an **array** specified by index

### ch2 key arrow specs

- read from file system, amazon s3, hdfs, csv
  * parallelized csv read
- pandas + arrow
- **chunked array**: wrapper around a group of arrow **arrays** of the same data type
  * incrementally build up an array without reallocating or copying the existing chunks
- **table**: holds one or more **chunked arrays** and a schema
  * analogous to **record batches**
- sliced buffers for working in parallel
- FFI C headers for zero-copy sharing between languages