Zarr (data format)
Zarr is an open standard for storing large multidimensional array data. It specifies a protocol and data format, and is designed to be "cloud ready", including support for random access, by dividing data into subsets referred to as chunks.[1][2] Zarr can be used from many programming languages, including Python, Java, JavaScript, C++, Rust and Julia.[3] It has been used by organizations such as Google and Microsoft to publish large datasets.[4][5] Early versions of Zarr were first released in 2015 by Alistair Miles.[6][7]

Zarr is designed to support high-throughput distributed I/O on different storage systems, which is a common requirement in cloud computing. Multiple read operations, or multiple write operations, can be performed efficiently on a Zarr array in parallel.[8]

Format description

The main data structure in Zarr is the multidimensional array. To allow parallel access, each array is stored and accessed as a grid of so-called "chunks". The actual data format on disk depends on the compressor and storage plugins selected by the user.[8]

Zarr's design was influenced by that of HDF5, and so it includes similar features for metadata and grouping: arrays can be grouped into named hierarchies, and they can also be annotated with key-value metadata stored alongside the array.[8]

Applications

For bioimaging such as microscopy, a consortium called the Open Microscopy Environment (OME) created a format called "OME-Zarr", based on Zarr with some discipline-specific extensions.[9] Similarly, Zarr is being used to publish weather and satellite data[10] and energy data,[11] among others.
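The chunk grid mentioned in the format description can be illustrated with a small sketch. The code below does not use any Zarr library and is not the Zarr specification's algorithm; it is a minimal illustration, assuming a regular grid of equally sized chunks, of how an element index maps to a chunk and how a read of a region touches only some chunks. The function names (chunk_grid_shape, locate, chunks_for_region) and the example shapes are hypothetical and chosen for illustration only.

```python
from itertools import product
from math import ceil


def chunk_grid_shape(array_shape, chunk_shape):
    """Number of chunks along each dimension (edge chunks may be partial)."""
    return tuple(ceil(n / c) for n, c in zip(array_shape, chunk_shape))


def locate(index, chunk_shape):
    """Split an element index into (chunk coordinate, offset inside that chunk)."""
    chunk = tuple(i // c for i, c in zip(index, chunk_shape))
    offset = tuple(i % c for i, c in zip(index, chunk_shape))
    return chunk, offset


def chunks_for_region(start, stop, chunk_shape):
    """Coordinates of every chunk overlapping the half-open region [start, stop)."""
    ranges = [
        range(lo // c, (hi - 1) // c + 1)
        for lo, hi, c in zip(start, stop, chunk_shape)
    ]
    return list(product(*ranges))


# Illustrative (assumed) shapes: a 10,000 x 10,000 array in 1,000 x 1,000 chunks.
shape = (10_000, 10_000)
chunks = (1_000, 1_000)

print(chunk_grid_shape(shape, chunks))
print(locate((2_500, 9_999), chunks))
print(chunks_for_region((0, 0), (1_500, 1_000), chunks))
```

Because each chunk can be stored and fetched independently, a reader that needs only a small region has to retrieve only the chunks returned by a lookup of this kind, and separate workers can process disjoint chunks at the same time.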