HDF5 data storage
Vahana uses the Hierarchical Data Format version 5 (HDF5) as the file format for storing simulations on disk, using the HDF5.jl library. HDF5.jl in turn relies on the HDF5 C library, which is either installed together with HDF5.jl or can be provided by the system. Please check the HDF5.jl documentation for details.
If the provided library supports Parallel HDF5, it is used automatically. With Parallel HDF5, all processes write to a single file. Without it, a single (parallel) simulation creates multiple files. The Vahana API is the same in both cases, so the user only sees this difference when looking into the h5 directory with a file manager or via the shell. However, in the current Vahana version, using Parallel HDF5 has one disadvantage: the files are not compressed (see also set_compression).
Write
To write to an HDF5 file, the file must be attached to a Vahana simulation. Normally this happens automatically the first time a write function like write_snapshot is called. All subsequent write_* calls then add further datasets to the file.
Vahana.set_hdf5_path
set_hdf5_path(path::String)
Specify the path that is used to save and read the HDF5 files. If the directory does not exist, it is created the first time a file is written or a read is attempted.
See also create_h5file!
Vahana.write_snapshot
write_snapshot(sim::Simulation, [comment::String = "", ignore = []])
Writes the current state of the simulation sim to the attached HDF5 file. The comment identifies the snapshot in the output of list_snapshots. You can also read this snapshot back with read_snapshot! by passing the same string as that function's comment keyword.
ignore is a list of agent and/or edge types that should not be written.
See also create_h5file!, read_snapshot!
Vahana.create_h5file!
create_h5file!(sim::Simulation, [filename = sim.filename; overwrite = sim.overwrite_file])
The canonical way to create an HDF5 file is to call one of the write_* functions like write_snapshot. If sim does not already have an HDF5 file attached, such a file is then created automatically. Its name is the filename given as a keyword to create_simulation or, if this keyword was not given, the model name. Sometimes it can be useful to control this manually, e.g. after a call to copy_simulation.
The filename argument can be used to specify a filename other than sim.filename. If overwrite is true, an existing file with this name will be overwritten. If it is false, the filename is automatically extended by an increasing 6-digit number, so that existing files are not overwritten.
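The numbering rule can be illustrated with a minimal, stdlib-only sketch. The helper next_free_filename and the pattern "<name>_<nnnnnn>.h5" are hypothetical and only illustrate the idea of a zero-padded, increasing 6-digit suffix. Vahana's actual file names may be formatted differently.

```julia
# Hypothetical helper illustrating the "increasing 6-digit number" rule.
# The pattern "<base>_<nnnnnn>.h5" is an assumption, not Vahana's real format.
function next_free_filename(base::AbstractString, existing::AbstractVector{<:AbstractString})
    n = 1
    while true
        # lpad(1, 6, '0') yields "000001"
        candidate = string(base, "_", lpad(n, 6, '0'), ".h5")
        candidate in existing || return candidate
        n += 1
    end
end

existing = ["mymodel_000001.h5", "mymodel_000002.h5"]
println(next_free_filename("mymodel", existing))  # mymodel_000003.h5
```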
By default, the files are created in an h5 subfolder of the current working directory. However, the path can also be set with the function set_hdf5_path.
If an HDF5 file was already created for the simulation sim, this file will be closed.
create_h5file! can only be called after finish_init!.
See also close_h5file!, write_agents, write_edges, write_globals, read_agents!, read_edges!, read_globals, read_snapshot! and list_snapshots
Vahana.close_h5file!
close_h5file!(sim::Simulation)
Closes the HDF5 file attached to the simulation sim.
Be aware that a subsequent call to one of the write_* functions like write_snapshot will automatically create a new file. Depending on the overwrite_file argument of create_simulation, this new file may also overwrite the closed file.
Besides write_snapshot, there are also some more fine-grained write functions:
Vahana.write_agents
write_agents(sim::Simulation, [types])
Writes the current agent state to the attached HDF5 file. If only the agents of a subset of agent types are to be written, this subset can be specified via the optional types argument.
Vahana.write_edges
write_edges(sim::Simulation, [types])
Writes the current edge states to the attached HDF5 file. If only the edges of a subset of edge types are to be written, this subset can be specified via the optional types argument.
Vahana.write_globals
write_globals(sim::Simulation, [fields])
Writes the current global values to the attached HDF5 file. If only a subset of the fields is to be written, this subset can be specified via the optional fields argument.
Read
In the normal use case, we just call write_snapshot(sim, "snapshot description") (assuming sim is a Vahana simulation). To read such a snapshot, we can then run another script that creates the same model (see create_model) and simulation (see create_simulation) and then calls read_snapshot!(sim). read_snapshot! can also read a parallel simulation into a single (REPL) process; in that case the distributed graph is merged into a single one.
Vahana.read_snapshot!
read_snapshot!(sim::Simulation, [name::String; transition = typemax(Int64), comment = "", writeable = false, ignore_params = false])
read_snapshot!(sim::Simulation, nr::Int64; [transition = typemax(Int64), comment = "", writeable = false, ignore_params = false])
Read a complete snapshot from a file into the simulation sim. If name is given, the snapshot is read from the file with this filename in the h5 subfolder of the current working directory. Otherwise, the filename from the create_simulation call is used.
If the overwrite_file argument of create_simulation is set to false, the file names are supplemented with a number. In that case, the number of the intended file can be specified via the nr argument.
By default, the last written snapshot is read. The transition keyword also allows reading earlier versions. Alternatively, you can pass the comment that was given to write_snapshot via the comment keyword to read the corresponding snapshot.
If writeable is set to true, the file is also attached to the simulation, and subsequent write_* functions like write_snapshot will append to the file.
If ignore_params is set to true, the parameters of sim will not be changed.
Returns false when no snapshot was found.
Here, too, there are some more fine-grained read functions:
Vahana.read_params
read_params(filename::String, T::DataType)
read_params(sim::Simulation, T::DataType)
read_params(sim::Simulation, nr::Int64, T::DataType)
read_params(filename::String)
Read the parameters from an HDF5 file. If filename is given, the parameters are read from the file with this filename in the h5 subfolder of the current working directory.
If a simulation sim is given instead, the filename from the create_simulation call is used.
If the overwrite_file argument of create_simulation is set to false, the file names are supplemented with a number. In that case, the number of the intended file can be specified via the nr argument.
If T is specified (the DataType of the params argument of create_simulation used for the simulation), the result is an instance of T. Otherwise, it is a Dict with the fields of the written parameter type as keys.
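If the parameters were read as a Dict, a generic conversion can turn them back into a struct later. The minimal sketch below works for plain immutable structs. It assumes the Dict keys are the field names as Symbols and the values already have matching types; Vahana's Dict may use a different key type. ModelParams and dict_to_struct are hypothetical names.

```julia
# Hypothetical parameter struct standing in for a model's params type.
struct ModelParams
    beta::Float64
    num_agents::Int
end

# Sketch: rebuild a struct from a Dict keyed by its field names.
# Assumes Symbol keys; the values are passed positionally in field order.
function dict_to_struct(::Type{T}, d::AbstractDict) where {T}
    T((d[f] for f in fieldnames(T))...)
end

d = Dict{Symbol, Any}(:beta => 0.2, :num_agents => 100)
p = dict_to_struct(ModelParams, d)
```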
Vahana.read_globals
read_globals(filename::String, T::DataType; [ transition = typemax(Int64), comment = "" ])
read_globals(sim::Simulation, T::DataType; [ transition = typemax(Int64), comment = "" ])
read_globals(sim::Simulation, nr::Int64, T::DataType; [ transition = typemax(Int64), comment = "" ])
read_globals(filename::String; [ transition = typemax(Int64), comment = "" ])
Read the global values from an HDF5 file. If filename is given, the globals are read from the file with this filename in the h5 subfolder of the current working directory.
If a simulation sim is given instead, the filename from the create_simulation call is used.
If the overwrite_file argument of create_simulation is set to false, the file names are supplemented with a number. In that case, the number of the intended file can be specified via the nr argument.
If T is specified (the DataType of the globals argument of create_simulation used for the simulation), the result is an instance of T. Otherwise, it is a Dict with the fields of the written Globals type as keys.
By default, the last written globals are read. The transition keyword also allows reading earlier versions. Alternatively, you can pass the comment that was given to write_snapshot via the comment keyword to read the corresponding snapshot.
See also File storage for details.
Vahana.read_agents!
read_agents!(sim::Simulation, [name::String = sim.filename; transition = typemax(Int64), types::Vector{DataType}, comment = "" ])
read_agents!(sim::Simulation, nr::Int64; [transition = typemax(Int64), types::Vector{DataType}, comment = "" ])
Read the agents from an HDF5 file into the simulation sim. If name is given, the agents are read from the file with this filename in the h5 subfolder of the current working directory, or in the folder set with set_hdf5_path. Otherwise, the filename from the create_simulation call is used.
If the overwrite_file argument of create_simulation is set to false, the file names are supplemented with a number. In that case, the number of the intended file can be specified via the nr argument.
By default, the last written agents are read. The transition keyword also allows reading earlier versions. Alternatively, you can pass the comment that was given to write_snapshot via the comment keyword to read the corresponding snapshot.
If only the agents of a subset of agent types are to be read, this subset can be specified via the optional types argument.
When the agents of a distributed simulation are read into a single-threaded simulation, the IDs of the agents are modified. read_agents! returns a dictionary that contains the ID mapping.
Vahana.read_edges!
read_edges!(sim::Simulation, [name::String = sim.filename; idmapfunc = identity, transition = typemax(Int64), types::Vector{DataType}, comment = "" ])
read_edges!(sim::Simulation, nr::Int64; [ idmapfunc = identity, transition = typemax(Int64), types::Vector{DataType}, comment = "" ])
Read the edges from an HDF5 file into the simulation sim. If name is given, the edges are read from the file with this filename in the h5 subfolder of the current working directory. Otherwise, the filename from the create_simulation call is used.
If the overwrite_file argument of create_simulation is set to false, the file names are supplemented with a number. In that case, the number of the intended file can be specified via the nr argument.
By default, the last written edges are read. The transition keyword also allows reading earlier versions. Alternatively, you can pass the comment that was given to write_snapshot via the comment keyword to read the corresponding snapshot.
If only the edges of a subset of edge types are to be read, this subset can be specified via the optional types argument.
When the agents of a distributed simulation are read into a single-threaded simulation, the IDs of the agents are modified. idmapfunc must be a function that returns the new agent ID for a given old agent ID. read_agents! returns a Dict{AgentID, AgentID} that can be used for this via idmapfunc = (key) -> idmapping[key], where idmapping is such a Dict.
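A short stdlib-only sketch makes the idmapfunc closure concrete. In a Vahana session, idmapping would be the Dict returned by read_agents!, and the closure would be passed to read_edges! as the idmapfunc keyword. Here, plain Int IDs and a hand-written edge list stand in for Vahana's AgentID values and stored edges. Both are hypothetical.

```julia
# Stand-in for the Dict{AgentID, AgentID} returned by read_agents!;
# plain Ints replace Vahana's AgentID type for illustration.
idmapping = Dict{Int, Int}(101 => 1, 102 => 2, 103 => 3)

# The closure passed as idmapfunc: old agent ID in, new agent ID out.
idmapfunc = (key) -> idmapping[key]

# Hypothetical (from, to) pairs as stored before remapping.
old_edges = [(101, 102), (102, 103)]
new_edges = [(idmapfunc(a), idmapfunc(b)) for (a, b) in old_edges]
```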
For postprocessing, for example, you can read the simulation state without defining a model and simulation. Just import Vahana, use set_hdf5_path to specify where the simulations are stored, and use:
Vahana.read_agents
read_agents(filename::String, type; transition = typemax(Int64))
Read the agent states of type (which can be a DataType, String or Symbol) from an HDF5 file with the name filename. The agents are read from the h5 subfolder of the current working directory, or from the folder set with set_hdf5_path.
By default, the last written agents are read. The transition keyword also allows reading earlier versions. Alternatively, you can pass the comment that was given to write_snapshot via the comment keyword to read the corresponding snapshot.
Returns a vector of agent states.
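Because the result is an ordinary Julia vector of agent-state structs, the analysis itself needs no Vahana-specific tools. In the sketch below, the Resident type and its fields are hypothetical, and the hand-written vector stands in for what a call like read_agents("mymodel", Resident) might return.

```julia
# Hypothetical agent type; in practice this is one of your model's agent types.
struct Resident
    age::Int
    infected::Bool
end

# Stand-in for the vector of agent states returned by read_agents.
agents = [Resident(34, true), Resident(51, false), Resident(27, true)]

# Ordinary Julia reductions over the agent states.
share_infected = count(a -> a.infected, agents) / length(agents)
mean_age = sum(a.age for a in agents) / length(agents)
```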
Vahana.read_edges
read_edges(filename::String, type; transition = typemax(Int64), comment = "")
Read the edge states of type (which can be a DataType, String or Symbol) from an HDF5 file with the name filename. The edge states are read from the h5 subfolder of the current working directory, or from the folder set with set_hdf5_path.
By default, the last written edge states are read. The transition keyword also allows reading earlier versions. Alternatively, you can pass the comment that was given to write_snapshot via the comment keyword to read the corresponding snapshot.
Returns a vector of edge states.
You can also read the parameters and globals of a simulation in the same way. Use the methods of read_params and read_globals that take only a filename and no DataType; these return Dicts of the parameters/globals.
Transition
All read_* functions except read_params have a keyword called transition. If it is not set, the last stored data of a type is always read. With this keyword, you can also read a previous state of the simulation (or a part of it), provided that it was written multiple times.
Vahana internally counts how many times the function apply! is called. In the current Vahana implementation, this count is stored in a field of the simulation called num_transitions. When a write_* call creates a new dataset, this count is stored with the dataset.
When read_snapshot! is called, Vahana looks for the highest num_transitions value that is less than or equal to the transition keyword for the types to be read. Since the default value of this keyword is typemax(Int64), the newest dataset is read by default.
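The lookup rule fits in a few lines. The following is a minimal sketch of the documented selection, not Vahana's implementation. It assumes a hypothetical list of the num_transitions values stored with the datasets of one type, and it returns nothing when no dataset qualifies.

```julia
# Sketch of the documented rule: pick the highest stored num_transitions
# value that is <= the requested transition (default: newest dataset).
function select_transition(stored::AbstractVector{<:Integer}, transition::Integer = typemax(Int64))
    candidates = filter(t -> t <= transition, stored)
    isempty(candidates) ? nothing : maximum(candidates)
end

stored = [0, 10, 20, 30]            # hypothetical num_transitions of written datasets
select_transition(stored)           # 30, because of the typemax default
select_transition(stored, 25)       # 20
select_transition(stored, 10)       # 10
```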
For snapshots, the list_snapshots function returns a list of all snapshots stored in the file.
Vahana.list_snapshots
list_snapshots(name::String)
list_snapshots(sim::Simulation)
list_snapshots(sim::Simulation, nr::Int64)
List all snapshots of an HDF5 file. If name is given, the snapshots from the file with this filename are returned. Otherwise, the filename from the create_simulation call is used.
If the overwrite_file argument of create_simulation is set to false, the file names are supplemented with a number. In that case, the number of the intended file can be specified via the nr argument.
Returns a vector of tuples. The first element of each tuple is the transition number at which a snapshot was saved, and the second element is the comment given in the write_snapshot call.
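Since the result is a vector of (transition, comment) tuples, plain tuple destructuring is enough to inspect it. In this sketch, the snapshots vector is a hypothetical stand-in for a list_snapshots result, and transition_for is an illustrative helper, not part of Vahana.

```julia
# Hypothetical return value of list_snapshots: (transition, comment) tuples.
snapshots = [(0, "initial"), (50, "after burn-in"), (100, "final")]

# Print a small overview of the stored snapshots.
for (transition, comment) in snapshots
    println(lpad(transition, 6), "  ", comment)
end

# Look up the transition number of the snapshot written with a given comment.
function transition_for(snaps, comment::AbstractString)
    idx = findfirst(s -> s[2] == comment, snaps)
    idx === nothing ? nothing : snaps[idx][1]
end

transition_for(snapshots, "after burn-in")  # 50
```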
Metadata
It is possible to attach metadata to a simulation as a whole. You can also attach it to the fields of its parameters, globals, agent and edge types, and to rasters.
Vahana.write_metadata
write_metadata(sim::Simulation, type::Union{Symbol, DataType}, field::Symbol, key::Symbol, value)
Attach metadata to a field of an agent or edge type, to a field of the globals or params struct (see create_simulation), or to a raster (in that case, field must be the name of the raster). type must be an agent or edge type, :Global, :Param or :Raster. Metadata is stored as key, value pairs, so that multiple pieces of data of different types can be attached to a single field.
See also: read_metadata
Vahana.write_sim_metadata
write_sim_metadata(sim::Simulation, key::Symbol, value)
Attach additional metadata to a simulation.
See also: read_sim_metadata
Vahana.read_metadata
read_metadata(sim::Simulation, type::Union{Symbol, DataType}[, field::Symbol, key::Symbol ])
read_metadata(filename::String, type::Union{Symbol, DataType}[, field::Symbol, key::Symbol ])
Read metadata for a field of an agent or edge type, for a field of the globals or params struct (see create_simulation), or for a raster (in that case, field must be the name of the raster). type must be an agent or edge type, :Global, :Param or :Raster. Metadata is stored as key, value pairs. Multiple pieces of data of different types can be attached to a single field, and a single piece of the metadata can be retrieved via the key parameter. If key is not set (or set to Symbol()), a Dict{Symbol, Any} with the complete metadata of this field is returned.
See also: write_metadata
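The convention for the key parameter, where Symbol() means "everything for this field", can be modeled with nested Dicts. This sketch only mirrors the documented return behavior. The metadata contents and the helper lookup_metadata are hypothetical, and Vahana stores metadata in the HDF5 file, not in a Dict like this one.

```julia
# Hypothetical in-memory model of per-field metadata: field => (key => value).
metadata = Dict{Symbol, Dict{Symbol, Any}}(
    :beta => Dict{Symbol, Any}(:description => "infection rate", :unit => "1/day"),
)

# Mirrors the documented semantics: Symbol() returns the whole Dict of a field,
# any other key returns the single value stored under that key.
function lookup_metadata(md, field::Symbol, key::Symbol = Symbol())
    fieldmd = md[field]
    key == Symbol() ? fieldmd : fieldmd[key]
end

lookup_metadata(metadata, :beta, :unit)  # "1/day"
```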
Vahana.read_sim_metadata
read_sim_metadata(sim::Simulation, [ key::Symbol = Symbol() ])
read_sim_metadata(filename::String, [ key::Symbol = Symbol() ])
Read the metadata of a simulation, or of the simulation stored in the file filename. Metadata is stored as key, value pairs. If key is not set (or set to Symbol()), a Dict{Symbol, Any} with the complete metadata of the simulation is returned.
The following metadata is stored automatically:
The following metadata is stored automatically:
- simulation_name
- model_name
- date (in the format "yyyy-mm-dd hh:mm:ss")
See also: write_sim_metadata
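If the automatically stored date should be processed further, the Dates standard library can parse it. This sketch assumes that the date value (e.g. from read_sim_metadata(sim, :date)) comes back as a String in the documented format. The example value is hypothetical. Note that Dates format strings use HH for hours, MM for minutes and SS for seconds, while lowercase mm means months.

```julia
using Dates

# Hypothetical value in the documented "yyyy-mm-dd hh:mm:ss" format.
date_str = "2024-03-15 14:05:09"

# In Dates format strings, HH = hour, MM = minute, SS = second (mm = month).
created = DateTime(date_str, dateformat"yyyy-mm-dd HH:MM:SS")
```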
Restrictions and Workarounds
The exact data structures that can be stored in and read from an HDF5 file depend on the HDF5.jl implementation. For example, Tuples were not supported before v0.16.15.
The following functions are workarounds for three current restrictions.
Vahana.create_enum_converter
create_enum_converter()
The HDF5.jl library does not support Enums as fields of structs that should be stored. This function adds this support. Because it involves type piracy, the support must be enabled explicitly.
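Enums can be represented in a storage format without enum support because every enum instance has an underlying integer value. The sketch below shows this round trip with a hypothetical Health enum. It illustrates the general idea only and is not necessarily how create_enum_converter is implemented. In a Vahana model, you would still call create_enum_converter() explicitly before writing data.

```julia
# Hypothetical enum used as a field type of an agent.
@enum Health susceptible infected recovered

# Round trip through the underlying integer value (instances start at 0).
code = Int(infected)      # 1
back = Health(code)       # infected
```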
Vahana.create_namedtuple_converter
create_namedtuple_converter(T::DataType)
The HDF5.jl library does not support the storage of nested structs, but structs can have NamedTuples as fields. This function creates convert functions from a struct to a corresponding NamedTuple (and the other way around). After calling it for a type T, T can be used as the type of an agent/edge/param/global field.
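The struct-to-NamedTuple mapping itself is easy to sketch with the standard library. The helpers to_namedtuple and from_namedtuple and the Position type below are hypothetical. They show the idea behind such a conversion but do not reproduce the methods that create_namedtuple_converter generates. The reverse direction assumes that the default positional constructor accepts the fields in declaration order.

```julia
# Hypothetical nested struct that should become a field of an agent type.
struct Position
    x::Float64
    y::Float64
end

# Struct -> NamedTuple with the same field names and values.
to_namedtuple(v::T) where {T} = NamedTuple{fieldnames(T)}(ntuple(i -> getfield(v, i), fieldcount(T)))

# NamedTuple -> struct, relying on the default positional constructor.
from_namedtuple(::Type{T}, nt::NamedTuple) where {T} = T(nt...)

nt = to_namedtuple(Position(1.0, 2.5))   # (x = 1.0, y = 2.5)
p = from_namedtuple(Position, nt)
```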
Vahana.create_string_converter
create_string_converter(add_show_method::Bool = true)
The HDF5.jl library (version 0.17.2) does not support InlineStrings or StaticStrings in structs. Standard Strings are also not suitable for agent and edge types, as these must be bits types.
To address this limitation, create_string_converter generates conversion methods between String and SVector{N, UInt8} instances (from the StaticArrays package). Here, N is the maximum number of bytes that can be stored. For Unicode strings, the number of bytes may exceed the number of characters because UTF-8 is a variable-length encoding.
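Why N counts bytes rather than characters, and how a fixed-width byte field can hold a string, can be shown with the standard library alone. In this sketch, NTuple{N, UInt8} serves as a stand-in for SVector{N, UInt8}. The helpers to_fixed and from_fixed are hypothetical and do not reproduce Vahana's generated methods. Zero bytes are assumed to mark the padding, so strings containing NUL characters are not handled.

```julia
# Byte length vs character count: "ü" needs two bytes in UTF-8.
s = "Müller"
nchars = length(s)   # 6 characters
nbytes = sizeof(s)   # 7 bytes

# Sketch of a fixed-width encoding: pad the UTF-8 bytes with zeros up to N.
# NTuple{N, UInt8} stands in for SVector{N, UInt8} from StaticArrays.
function to_fixed(::Val{N}, str::String) where {N}
    bytes = codeunits(str)
    length(bytes) <= N || throw(ArgumentError("string needs $(length(bytes)) bytes, capacity is $N"))
    ntuple(i -> i <= length(bytes) ? bytes[i] : 0x00, N)
end

# Decode by dropping the zero padding (assumes no embedded NUL bytes).
from_fixed(t::NTuple{N, UInt8}) where {N} = String(UInt8[b for b in t if b != 0x00])

fixed = to_fixed(Val(8), s)
```

A capacity of 6 would be too small for "Müller" even though the string has only six characters.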
For example, you could create a Person struct with a fixed-size name field:
using StaticArrays, Vahana
Vahana.create_string_converter()
struct Person
    name::SVector{20, UInt8}
end
You can then construct a Person instance using a regular string: Person("abc").
If add_show_method is set to true (the default), show methods are also defined for these SVectors. In Julia's REPL, such a value could otherwise be mistaken for a String although it is actually an SVector. To avoid this confusion, the output appends an annotation after the value that marks it as a UInt8 vector.
Example Model
The tutorials in the documentation do not include examples of the file storage functionality, but this model is a good example. It also demonstrates checkpointing (resuming a simulation after an interruption) and working with initial snapshots (the initialized simulation state is stored after the graph structure has been constructed and distributed to the different processes).