.. vim: set fileencoding=utf-8 :

.. Copyright (c) 2016 Idiap Research Institute, http://www.idiap.ch/          ..
.. Contact: beat.support@idiap.ch                                             ..
..                                                                            ..
.. This file is part of the beat.core module of the BEAT platform.            ..
..                                                                            ..
.. Commercial License Usage                                                   ..
.. Licensees holding valid commercial BEAT licenses may use this file in      ..
.. accordance with the terms contained in a written agreement between you     ..
.. and Idiap. For further information contact tto@idiap.ch                    ..
..                                                                            ..
.. Alternatively, this file may be used under the terms of the GNU Affero     ..
.. Public License version 3 as published by the Free Software and appearing   ..
.. in the file LICENSE.AGPL included in the packaging of this file.           ..
.. The BEAT platform is distributed in the hope that it will be useful, but   ..
.. WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY ..
.. or FITNESS FOR A PARTICULAR PURPOSE.                                       ..
..                                                                            ..
.. You should have received a copy of the GNU Affero Public License along     ..
.. with the BEAT platform. If not, see http://www.gnu.org/licenses/.          ..


.. _developerguide-io:

===============
 Inputs/Outputs
===============

.. _developerguide-io-introduction:

Introduction
------------

The requirements for the platform when reading/writing data are:

* Ability to manage large and complex data
* Portability to allow the use of heterogeneous environments

Based on our experience and on these requirements, we investigated the use of
HDF5. Unfortunately, HDF5 is not convenient for handling structures such as
arrays of variable-size elements, for instance, arrays of strings. Therefore,
we decided to rely on our own binary format.

.. _developerguide-io-strategy:

Binary Format
-------------

Our binary format does *not* contain information about the format of the data
itself, and it is hence necessary to know this format a priori. This means that
the format cannot be inferred from the content of a file.

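
As a small illustration of why the format must be known a priori, the sketch
below decodes the same four bytes under three different assumed formats. It
uses only the standard ``struct`` module. The variable names are hypothetical
and the bytes are made up for the example, not taken from a platform file.

```python
import struct

# Four bytes as they might appear in a data file (here: float32 1.0,
# little-endian). Nothing in the bytes themselves says what they mean.
raw = struct.pack("<f", 1.0)

# The same bytes read under three different assumed formats.
as_float32 = struct.unpack("<f", raw)[0]
as_uint32 = struct.unpack("<I", raw)[0]
as_int16_pair = struct.unpack("<hh", raw)

print(as_float32, as_uint32, as_int16_pair)
```

All three reads succeed without error, so only external knowledge of the
format can tell which interpretation is the intended one.
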

We rely on the following fundamental C-style formats:

* int8
* int16
* int32
* int64
* uint8
* uint16
* uint32
* uint64
* float32
* float64
* complex64 (first real value, and then imaginary value)
* complex128 (first real value, and then imaginary value)
* bool (written as a byte)
* string

An element of such a basic format is written in the C-style way, using
little-endian byte ordering.

In addition, dataformats always consist of arrays or dictionaries of such
fundamental formats or compound formats.

An array of elements is saved as follows. First, the shape of the array is
saved using a *uint64* value for each dimension. Next, the elements of the
array are saved in C-style order.

A dictionary of elements is saved as follows. First, the keys are sorted
lexicographically. Then, the values associated with these keys are saved in
that order.

The platform is data-driven and always processes chunks of data. Therefore,
data are always written in chunks, each chunk being preceded by a
text-formatted header indicating the start and end indices, followed by the
size (in bytes) of the chunk.

In the Python backend of the platform, this binary format has been
successfully implemented using the ``struct`` module.
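
The rules above can be sketched with the ``struct`` module as follows. This is
a minimal illustration, not the platform's actual implementation: the helper
names (``pack_scalar``, ``pack_array``, ``pack_dict``, ``pack_chunk``) and the
``FORMATS`` table are hypothetical. The sketch assumes that dictionary keys
themselves are not written, and that the chunk header is three space-separated
decimal numbers ending with a newline; the text above does not specify either
detail. The ``string`` format is left out because its encoding is not
described here.

```python
import struct

# struct codes for the fixed-size fundamental formats, little-endian ("<").
# Illustrative table only; not taken from the platform's source code.
FORMATS = {
    "int8": "<b", "int16": "<h", "int32": "<i", "int64": "<q",
    "uint8": "<B", "uint16": "<H", "uint32": "<I", "uint64": "<Q",
    "float32": "<f", "float64": "<d", "bool": "<?",
}


def pack_scalar(fmt, value):
    """Encode one element of a fundamental format."""
    # Complex numbers: real part first, then imaginary part.
    if fmt == "complex64":
        return struct.pack("<ff", value.real, value.imag)
    if fmt == "complex128":
        return struct.pack("<dd", value.real, value.imag)
    return struct.pack(FORMATS[fmt], value)


def pack_array(fmt, shape, flat_values):
    """Encode an array: one uint64 per dimension, then elements in C order.

    `flat_values` must already be flattened in C (row-major) order.
    """
    header = b"".join(struct.pack("<Q", dim) for dim in shape)
    return header + b"".join(pack_scalar(fmt, v) for v in flat_values)


def pack_dict(field_formats, values):
    """Encode a dictionary: values only, in lexicographic key order.

    Assumption: keys are not written, since the reader knows the format.
    """
    return b"".join(
        pack_scalar(field_formats[key], values[key])
        for key in sorted(field_formats)
    )


def pack_chunk(start, end, payload):
    """Prefix a payload with a text header (header layout is assumed)."""
    header = "{} {} {}\n".format(start, end, len(payload)).encode("ascii")
    return header + payload


# Example: a 2x3 uint8 array wrapped in a chunk covering indices 0 to 4.
array_bytes = pack_array("uint8", (2, 3), [1, 2, 3, 4, 5, 6])
chunk = pack_chunk(0, 4, array_bytes)
```

A reader would reverse these steps with ``struct.unpack``, using the same
format description that was used for writing.
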