HDF5 data storage

Vahana uses the Hierarchical Data Format version 5 (HDF5) as its file format for storing simulations on disk, using the HDF5.jl library. HDF5.jl in turn uses a C library, which is either installed together with HDF5.jl or provided by the system. Please check the HDF5.jl documentation for details.

If the provided library supports Parallel HDF5, it is used automatically. Parallel HDF5 has the advantage that all processes can write to a single file; without it, multiple files are created for a single (parallel) simulation. The Vahana API is the same in either case, so the difference is only visible to the user when looking into the h5 directory with a file manager or via the shell. In the current Vahana version, however, Parallel HDF5 has the disadvantage that the files are not compressed (see also set_compression).

Write

To write to an HDF5 file, the file must be attached to a Vahana simulation. Normally this happens automatically the first time a write function like write_snapshot is called. All subsequent write_* calls then add further datasets to the file.

Vahana.set_hdf5_path - Function
set_hdf5_path(path::String)

Specify the path that is used to save and read the HDF5 files. If the directory does not exist, it is created the first time a file is written or a read is attempted.

See also create_h5file!

Vahana.write_snapshot - Function
write_snapshot(sim::Simulation, [comment::String = "", ignore = []])

Writes the current state of the simulation sim to the attached HDF5 file. comment can be used to identify the snapshot via list_snapshots. ignore is a list of agent and/or edge types that should not be written.

See also create_h5file!

Vahana.create_h5file! - Function
create_h5file!(sim::Simulation, [filename = sim.filename; overwrite = sim.overwrite_file])

The canonical way to create an HDF5 file is to call one of the write_ functions like write_snapshot. If sim does not already have an HDF5 file attached, such a file is then created automatically using the filename specified as a keyword in create_simulation or, if this keyword was not given, the model name. Sometimes, however, it can be useful to control this manually, e.g. after a call to copy_simulation.

The filename argument can be used to specify a filename other than sim.filename. If overwrite is true, existing files with this name will be overwritten. If it is false, the filename is automatically extended by an increasing 6-digit number, so that existing files are not overwritten.

By default, the files are created in an h5 subfolder, and this is created in the current working directory. However, the path can also be set with the function set_hdf5_path.

In the case that an HDF5 file was already created for the simulation sim, this will be closed.

create_h5file! can only be called after finish_init!.

See also close_h5file!, write_agents, write_edges, write_globals, read_agents!, read_edges!, read_globals, read_snapshot! and list_snapshots

Vahana.close_h5file! - Function
close_h5file!(sim::Simulation)

Closes the HDF5 file attached to the simulation sim.

Be aware that a subsequent call to one of the write_ functions like write_snapshot will automatically create a new file and, depending on the overwrite_file argument of create_simulation, may also overwrite the closed file.

Besides write_snapshot, there are also some more fine-grained write functions:

Vahana.write_agents - Function
write_agents(sim::Simulation, [types])

Writes the current agent state to the attached HDF5 file. If only the agents of a subset of agent types are to be written, this subset can be specified via the optional types argument.

Vahana.write_edges - Function
write_edges(sim::Simulation, [types])

Writes the current edge states to the attached HDF5 file. If only the edges of a subset of edge types are to be written, this subset can be specified via the optional types argument.

Vahana.write_globals - Function
write_globals(sim::Simulation, [fields])

Writes the current global values to the attached HDF5 file. If only a subset of the fields is to be written, this subset can be specified via the optional fields argument.

Read

In the normal use case, we just call write_snapshot(sim, "snapshot description") (assuming sim is a Vahana simulation). To read such a snapshot, we can then run another script that creates the same model (see create_model) and simulation (see create_simulation) and then calls read_snapshot!(sim). read_snapshot! can also read a parallel simulation into a single (REPL) process; in that case the distributed graph is merged into a single one.

Vahana.read_snapshot! - Function
read_snapshot!(sim::Simulation, [name::String; transition = typemax(Int64), comment = "", writeable = false, ignore_params = false])
read_snapshot!(sim::Simulation, nr::Int64; [transition = typemax(Int64), comment = "", writeable = false, ignore_params = false])

Read a complete snapshot from a file into the simulation sim. If name is given, the snapshot is read from the file with this filename from the h5 subfolder of the current working directory. Otherwise the filename from the create_simulation call is used.

If the overwrite_file argument of create_simulation is set to false, so that the file names are supplemented with a number, the number of the intended file can be specified via the nr argument.

By default, the last written snapshot is read. The transition keyword allows earlier versions to be read as well. Alternatively, the comment that was specified when write_snapshot was called can be passed via the comment keyword to read the corresponding snapshot.

If writeable is set to true, the file is also attached to the simulation, and subsequent write_ functions like write_snapshot will append to the file.

If ignore_params is set to true, the parameters of sim will not be changed.

Returns false when no snapshot was found.

Here, too, there are some more fine-grained read functions:

Vahana.read_params - Function
read_params(filename::String, T::DataType)
read_params(sim::Simulation, T::DataType)
read_params(sim::Simulation, nr::Int64, T::DataType)
read_params(filename::String)

Read the parameters from an HDF5 file. If filename is given, the parameters are read from the file with this filename from the h5 subfolder of the current working directory.

If a simulation sim is given instead, the filename from the create_simulation call is used.

If the overwrite_file argument of create_simulation is set to false, so that the file names are supplemented with a number, the number of the intended file can be specified via the nr argument.

If the DataType T of the parameters passed to create_simulation is specified, the result is an instance of T. Otherwise it is a Dict with the fields of the written parameter type as keys.

Vahana.read_globals - Function
read_globals(filename::String, T::DataType; [ transition = typemax(Int64), comment = "" ])
read_globals(sim::Simulation, T::DataType; [ transition = typemax(Int64), comment = "" ])
read_globals(sim::Simulation, nr::Int64, T::DataType; [ transition = typemax(Int64), comment = "" ])
read_globals(filename::String; [ transition = typemax(Int64), comment = "" ])

Read the global values from an HDF5 file. If filename is given, the globals are read from the file with this filename from the h5 subfolder of the current working directory.

If a simulation sim is given instead, the filename from the create_simulation call is used.

If the overwrite_file argument of create_simulation is set to false, so that the file names are supplemented with a number, the number of the intended file can be specified via the nr argument.

If the DataType T of the argument globals of create_simulation used for the simulation is specified, the result is an instance of T. Otherwise it is a Dict with the fields of the written Globals type as keys.

By default, the last written globals are read. The transition keyword allows earlier versions to be read as well. Alternatively, the comment that was specified when write_snapshot was called can be passed via the comment keyword to read the corresponding snapshot.

See also File storage for details.

Vahana.read_agents! - Function
read_agents!(sim::Simulation, [name::String = sim.filename; transition = typemax(Int64), types::Vector{DataType}, comment = "" ])
read_agents!(sim::Simulation, nr::Int64; [transition = typemax(Int64), types::Vector{DataType}, comment = "" ])

Read the agents from an HDF5 file into the simulation sim. If name is given, the agents are read from the file with this filename from the h5 subfolder of the current working directory, or from the subfolder set with set_hdf5_path. Otherwise the filename from the create_simulation call is used.

If the overwrite_file argument of create_simulation is set to false, so that the file names are supplemented with a number, the number of the intended file can be specified via the nr argument.

By default, the last written agents are read. The transition keyword allows earlier versions to be read as well. Alternatively, the comment that was specified when write_snapshot was called can be passed via the comment keyword to read the corresponding snapshot.

If only the agents of a subset of agent types are to be read, this subset can be specified via the optional types argument.

When the agents from a distributed simulation are read into a single-threaded simulation, the IDs of the agents are modified. read_agents! returns a dictionary that contains the ID mapping.

Vahana.read_edges! - Function
read_edges!(sim::Simulation, [name::String = sim.filename; idmapfunc = identity, transition = typemax(Int64), types::Vector{DataType}, comment = "" ])
read_edges!(sim::Simulation, nr::Int64; [ idmapfunc = identity, transition = typemax(Int64), types::Vector{DataType}, comment = "" ])

Read the edges from an HDF5 file into the simulation sim. If name is given, the edges are read from the file with this filename from the h5 subfolder of the current working directory. Otherwise the filename from the create_simulation call is used.

If the overwrite_file argument of create_simulation is set to false, so that the file names are supplemented with a number, the number of the intended file can be specified via the nr argument.

By default, the last written edges are read. The transition keyword allows earlier versions to be read as well. Alternatively, the comment that was specified when write_snapshot was called can be passed via the comment keyword to read the corresponding snapshot.

If only the edges of a subset of edge types are to be read, this subset can be specified via the optional types argument.

When the agents from a distributed simulation are read into a single-threaded simulation, the IDs of the agents are modified. idmapfunc must be a function that returns the new agent ID for a given old agent ID. read_agents! returns a Dict{AgentID, AgentID} that can be used for this via idmapfunc = (key) -> idmapping[key], where idmapping is such a Dict.
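
The ID translation described here can be illustrated without Vahana. The following is a minimal sketch in plain Julia: UInt64 stands in for AgentID (an assumption made so the snippet runs on its own), and idmapping and old_edges are hypothetical sample data, not values produced by Vahana.

```julia
# A mapping of the documented shape (old ID => new ID).
# UInt64 is only a stand-in for Vahana's AgentID type in this sketch.
idmapping = Dict{UInt64, UInt64}(UInt64(1) => UInt64(10),
                                 UInt64(2) => UInt64(11))

# The closure form given in the text: look up the new ID for an old ID.
idmapfunc = (key) -> idmapping[key]

# Hypothetical edges as (from, to) pairs that still use the old IDs.
old_edges = [(UInt64(1), UInt64(2)), (UInt64(2), UInt64(1))]

# Translate both endpoints of every edge to the new IDs.
new_edges = [(idmapfunc(from), idmapfunc(to)) for (from, to) in old_edges]
```

With Vahana itself, the same closure would be passed through the idmapfunc keyword of read_edges!, with idmapping being the value returned by read_agents!.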

For postprocessing, for example, it is possible to read the simulation state without defining a model and simulation. Just import Vahana, use set_hdf5_path to specify where the simulations are stored, and use:

Vahana.read_agents - Function
read_agents(filename::String, type; transition = typemax(Int64))

Read the agent states of type (which can be a DataType, String or Symbol) from an HDF5 file with the name filename. The agents are read from the h5 subfolder of the current working directory, or from the subfolder set with set_hdf5_path.

By default, the last written agents are read. The transition keyword allows earlier versions to be read as well. Alternatively, the comment that was specified when write_snapshot was called can be passed via the comment keyword to read the corresponding snapshot.

Returns a vector of agent states.

Vahana.read_edges - Function
read_edges(filename::String, type; transition = typemax(Int64), comment = "")

Read the edge states of type (which can be a DataType, String or Symbol) from an HDF5 file with the name filename. The edge states are read from the h5 subfolder of the current working directory, or from the subfolder set with set_hdf5_path.

By default, the last written edge states are read. The transition keyword allows earlier versions to be read as well. Alternatively, the comment that was specified when write_snapshot was called can be passed via the comment keyword to read the corresponding snapshot.

Returns a vector of edge states.

You can also read the parameters and globals of a simulation in the same way, using the methods of read_params and read_globals that do not take a DataType (these return Dicts of parameters/globals).

Transition

All read_* functions have a keyword called transition. If it is not set, the most recently stored data of a type is read. Via this keyword it is also possible to read an earlier state of the simulation (or a part of it), assuming it was written multiple times.

Vahana counts internally how many times the function apply! is called (in the current Vahana implementation this is stored in a field called num_transitions of the simulation). When a new dataset is created by a write_* call, this information is stored with the dataset.

When read_snapshot! is called, Vahana looks for the highest num_transitions value that is less than or equal to the transition keyword for the types to be read. Since the default value of the keyword is typemax(Int64), the newest dataset is read by default.
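
The selection rule can be sketched in a few lines of plain Julia. This is only an illustration of the rule as described above, not Vahana's implementation; select_transition and the stored list are hypothetical names introduced for the example.

```julia
# Hypothetical transition counts at which a dataset was written.
stored = [0, 5, 10, 20]

# Pick the highest stored transition that does not exceed the requested one.
# Returns nothing when no stored dataset qualifies.
function select_transition(stored, requested)
    candidates = filter(t -> t <= requested, stored)
    return isempty(candidates) ? nothing : maximum(candidates)
end

newest  = select_transition(stored, typemax(Int64))  # default keyword value
earlier = select_transition(stored, 12)              # falls back to 10
exact   = select_transition(stored, 5)               # an exact match is used
missing_case = select_transition(stored, -1)         # nothing qualifies
```

Because every stored count is less than or equal to typemax(Int64), the default always selects the newest dataset.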

For snapshots the list_snapshots function returns a list of all stored snapshots in the file.

Vahana.list_snapshots - Function
list_snapshots(name::String)
list_snapshots(sim::Simulation)
list_snapshots(sim::Simulation, nr::Int64)

List all snapshots of an HDF5 file. If name is given, the snapshots from the file with this filename are returned. Otherwise the filename from the create_simulation call is used.

If the overwrite_file argument of create_simulation is set to false, so that the file names are supplemented with a number, the number of the intended file can be specified via the nr argument.

Returns a vector of tuples, where the first element is the transition number for which a snapshot was saved, and the second element is the comment given in the write_snapshot call.
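
A result of this shape is easy to search. The following sketch uses a hand-written vector in place of a real list_snapshots result (the entries are made up for illustration) and a hypothetical helper transition_for that looks up the transition number belonging to a comment.

```julia
# Hypothetical data in the documented shape: (transition, comment) tuples.
snapshots = [(0, "initial state"), (10, "after warm-up"), (50, "final")]

# Return the transition number of the first snapshot with the given comment,
# or nothing if no snapshot carries that comment.
function transition_for(snapshots, comment)
    idx = findfirst(s -> s[2] == comment, snapshots)
    return idx === nothing ? nothing : snapshots[idx][1]
end

warmup  = transition_for(snapshots, "after warm-up")
unknown = transition_for(snapshots, "does not exist")
```

The number found this way could then be passed as the transition keyword of one of the read functions.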

Metadata

It's possible to attach metadata to the parameters and globals of a simulation.

Vahana.write_metadata - Function
write_metadata(sim::Simulation, type::Union{Symbol, DataType}, field::Symbol, key::Symbol, value)

Attach metadata to a field of an agent or edge type, to a field of the globals or params struct (see create_simulation), or to a raster (in that case field must be the name of the raster). type must be an agent or edge type, :Global, :Param or :Raster. Metadata is stored as key-value pairs, so that multiple items of different types can be attached to a single field.

See also: read_metadata

Vahana.read_metadata - Function
read_metadata(sim::Simulation, type::Union{Symbol, DataType}[, field::Symbol, key::Symbol ])
read_metadata(filename::String, type::Union{Symbol, DataType}[, field::Symbol, key::Symbol ])

Read metadata for a field of an agent or edge type, for a field of the globals or params struct (see create_simulation), or for a raster (in that case field must be the name of the raster). type must be an agent or edge type, :Global, :Param or :Raster. Metadata is stored as key-value pairs. Multiple items of different types can be attached to a single field; a single item can be retrieved via the key parameter. If key is not set (or set to Symbol()), a Dict{Symbol, Any} with the complete metadata of this field is returned.

See also: write_metadata

Vahana.read_sim_metadata - Function
read_sim_metadata(sim::Simulation, [ key::Symbol = Symbol() ])
read_sim_metadata(filename::String, [ key::Symbol = Symbol() ])

Read metadata for a simulation or from the file filename. Metadata is stored as key-value pairs. If key is not set (or set to Symbol()), a Dict{Symbol, Any} with the complete metadata of the simulation is returned.

The following metadata is stored automatically:

  • simulation_name
  • model_name
  • date (in the format "yyyy-mm-dd hh:mm:ss")
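
If the stored date is needed as a DateTime, it can be parsed with Julia's Dates standard library. The sketch below assumes the value is available as a String in the layout listed above; the stamp value is made up. Note that in Julia's format codes lowercase mm means month while uppercase MM means minutes, so the documented layout corresponds to "yyyy-mm-dd HH:MM:SS".

```julia
using Dates

# Julia format codes for the documented layout:
# yyyy = year, mm = month, dd = day, HH = hour, MM = minute, SS = second.
fmt = dateformat"yyyy-mm-dd HH:MM:SS"

# Hypothetical timestamp in the documented layout.
stamp = "2024-03-05 14:07:09"

# String -> DateTime and back again.
parsed = DateTime(stamp, fmt)
formatted = Dates.format(parsed, fmt)
```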

See also: write_metadata

Restrictions and Workarounds

Exactly which data structures can be stored in and read from an HDF5 file depends on the HDF5.jl implementation. For example, before v0.16.15 Tuples were not supported.

The following functions are workarounds for some current restrictions.

Vahana.create_enum_converter - Function
create_enum_converter()

The HDF5.jl library does not support Enums as fields of structs that should be stored. This function adds this support, but as this involves type piracy, it must be enabled explicitly.
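
The general idea behind such a workaround is that an Enum value has an integer representation that can be stored instead. The sketch below shows only this round trip in plain Julia; it is not Vahana's converter, and the Health enum is a hypothetical example type.

```julia
# Hypothetical enum used only for this sketch.
@enum Health susceptible infected recovered

# Enum -> integer: the value that could be stored in place of the enum.
stored = Int(infected)

# Integer -> enum: restore the original value after reading.
restored = Health(stored)
```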

Vahana.create_namedtuple_converter - Function
create_namedtuple_converter(T::DataType)

The HDF5.jl library does not support the storage of nested structs, but structs can have NamedTuples as fields. This function creates a convert function from a struct to a corresponding NamedTuple (and also the other way around), so after calling this for a type T, T can be the type of an agent/edge/param/global field.
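
The two directions of such a conversion can be sketched in plain Julia. This is an illustration of the idea only, not the code generated by create_namedtuple_converter; Position, to_namedtuple and from_namedtuple are hypothetical names.

```julia
# Hypothetical struct that would be nested inside an agent or edge type.
struct Position
    x::Float64
    y::Float64
end

# Struct -> NamedTuple with the same field names and values.
to_namedtuple(p::T) where {T} =
    NamedTuple{fieldnames(T)}(ntuple(i -> getfield(p, i), fieldcount(T)))

# NamedTuple -> struct, passing the values positionally to the constructor.
from_namedtuple(::Type{T}, nt::NamedTuple) where {T} = T(values(nt)...)

nt = to_namedtuple(Position(1.0, 2.0))
p  = from_namedtuple(Position, nt)
```

The positional reconstruction relies on the NamedTuple keeping the struct's field order, which holds here because it is built from fieldnames(T).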

Vahana.create_string_converter - Function
create_string_converter(add_show_method::Bool = true)

The HDF5.jl library (version 0.17.2) does not support InlineStrings or StaticStrings in structs. Standard Strings are also not suitable for agent and edge types, as these must be bits types.

To address this limitation, create_string_converter generates conversion methods between String and SVector{N, UInt8} instances (from the StaticArrays package). Here, N represents the maximum number of bytes that can be stored, which for Unicode strings may exceed the character count due to variable-length encoding.

For example, you could create a struct with a fixed-size string field:

using StaticArrays  # provides SVector

struct Foo
    foo::SVector{20, UInt8}  # room for up to 20 bytes
end

You can then construct a Foo instance using a regular string: Foo("abc").
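
The difference between bytes and characters, and the idea of a fixed-size byte container, can be sketched without any packages. The code below is not Vahana's converter: it uses a plain NTuple instead of SVector so that it runs without StaticArrays, and the zero padding as well as the names to_fixed_bytes and from_fixed_bytes are assumptions made for this illustration.

```julia
name = "Zo\u00eb"            # "Zoë", written with an escape for the ë
nchars = length(name)        # number of characters
nbytes = ncodeunits(name)    # number of UTF-8 bytes; ë needs two of them

# Copy the UTF-8 bytes into a tuple of exactly N bytes, padding with zeros
# (the padding scheme is an assumption of this sketch).
function to_fixed_bytes(s::String, ::Val{N}) where {N}
    bytes = codeunits(s)
    length(bytes) <= N || throw(ArgumentError("string needs more than $N bytes"))
    return ntuple(i -> i <= length(bytes) ? bytes[i] : 0x00, Val(N))
end

# Drop the zero padding and rebuild the String.
from_fixed_bytes(t::NTuple{N, UInt8}) where {N} =
    String(UInt8[b for b in t if b != 0x00])

fixed = to_fixed_bytes(name, Val(20))
roundtrip = from_fixed_bytes(fixed)
```

A 3-byte container would be too small for this 3-character name, which is why N has to be chosen in bytes rather than characters.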

If add_show_method is set to true (the default), show methods are also defined for these SVectors. Because a value may then look like a String in Julia's REPL while actually being an SVector, the output includes an annotation after the value itself that marks it as a UInt8 vector.

Example Model

The tutorials in the documentation do not include examples of the file storage functionality, but this model is a good example. It also demonstrates checkpointing (resuming a simulation after an interruption) and working with initial snapshots (the initialized simulation state is stored after the graph structure has been constructed and distributed to the different processes).