Data Conversion Tutorial#

up4 is built on the HDF5 data format, so data must be converted to HDF5 before it can be used. Two file types can currently be converted: csv and vtk. The vtk support covers both the legacy ASCII format (.vtk) and the modern, XML-based formats.

CSV#

CSV files can be converted in the following manner:

import up4
up4.Converter.csv(
    'path/to/data.csv',     # Path to the csv file
    'output.hdf5',          # filename to write
    columns = [0,1,2,3],    # Columns to read, pointing to the time, x-, y-, z-positions
    delimiter = ',',        # Delimiter used in the csv file
    header = True,          # Does the csv file have a header?
    comment = '#',          # Comment character
    vel = True,             # Do you want to calculate the velocity
    interpolate = True,     # Do you want to interpolate the data
    radius = 0.1,           # Radius of the particle
)

It is strongly recommended to interpolate the data, because parts of up4 rely on constant spacing between data points. Interpolation also helps to remove the effect of sample rate when comparing datasets acquired with different techniques. If the csv file contains velocity information, you can read it in by extending the columns list from 4 to 7 elements, pointing to t, x, y, z, vx, vy, vz.
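To make the idea of constant spacing concrete, here is a minimal sketch of linear resampling onto a uniform time grid. It is not up4's implementation. The helper name resample_uniform and the sample values are hypothetical and only illustrate what interpolating irregularly sampled positions means.

```python
def resample_uniform(times, values, dt):
    """Linearly interpolate samples onto a grid with constant spacing dt.

    Hypothetical helper for illustration; `times` must be strictly increasing.
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two samples of equal length")
    # Number of whole steps that fit in the sampled interval
    n_steps = int((times[-1] - times[0]) / dt + 1e-9)
    grid = [times[0] + i * dt for i in range(n_steps + 1)]
    resampled = []
    j = 0  # index of the left neighbour in the original samples
    for t in grid:
        # Advance until times[j] <= t <= times[j + 1]
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        t0, t1 = times[j], times[j + 1]
        weight = (t - t0) / (t1 - t0)
        resampled.append(values[j] + weight * (values[j + 1] - values[j]))
    return grid, resampled


# Irregularly sampled x-positions (made-up values)
t = [0.0, 0.1, 0.4, 1.0]
x = [0.0, 1.0, 4.0, 10.0]
grid, x_uniform = resample_uniform(t, x, dt=0.25)
```

After resampling, every pair of neighbouring points in grid is separated by the same dt, whatever the original sample rate was.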

VTK#

The vtk reader handles both legacy .vtk files, as generated by DEM engines such as LIGGGHTS, and modern XML-based files, as generated by other DEM engines such as Lethe.

These different filetypes require different conversion methods, which up4 dispatches to based on the file extension given in the filter argument.

Warning

The file extension in the filter must match the file extension of the files you are trying to convert. If you are trying to convert a .vtu file, the filter must be set to .vtu. If you are trying to convert a .vtk file, the filter must be set to .vtk.
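A quick pre-flight check can catch a mismatch before conversion starts. The sketch below uses only the standard library. The helper name check_extension is hypothetical and not part of up4.

```python
from pathlib import Path


def check_extension(files, extension):
    """Raise if any file does not end with the expected extension.

    Hypothetical helper for illustration; `extension` includes the dot, e.g. '.vtu'.
    """
    mismatched = [f for f in files if Path(f).suffix != extension]
    if mismatched:
        raise ValueError(f"files not ending in {extension!r}: {mismatched}")
    return list(files)


# Made-up filenames; nothing is read from disk
vtu_files = check_extension(["out/run_1.vtu", "out/run_2.vtu"], ".vtu")

try:
    check_extension(["out/run_1.vtu", "out/run_2.vtk"], ".vtu")
    mixed_ok = True
except ValueError:
    mixed_ok = False
```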

Legacy VTK#

There are two ways to convert vtk files: with up4.Converter.vtk, which requires a list of filenames as input, or with up4.Converter.vtk_from_folder, which reads all vtk files in a folder and converts them to hdf5.

The list of filenames passed to up4.Converter.vtk must be sorted in natural, not lexicographical, order. That is, numbers in a filename must be compared as numbers rather than character by character, so that file_2 comes before file_10. For files generated by LIGGGHTS, the sub-files it writes (any with boundingBox in the name) must also be removed. The code below shows how to do this:

from glob import glob
from natsort import natsorted
files = glob('path/to/folder/*.vtk')
files = [f for f in files if "boundingBox" not in f] # remove LIGGGHTS sub-files
files = natsorted(files)
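To see why natural ordering matters, the following standard-library sketch compares the two orderings on a few made-up filenames. The natural_key helper is a hypothetical stand-in for what natsort does and covers only simple cases.

```python
import re


def natural_key(name):
    """Split a filename into alternating text and integer chunks.

    Hypothetical helper: re.split with a capture group puts the digit runs
    at the odd indices, so those are compared as integers.
    """
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


names = ["dump_10.vtk", "dump_2.vtk", "dump_boundingBox_2.vtk", "dump_1.vtk"]
names = [n for n in names if "boundingBox" not in n]  # drop LIGGGHTS sub-files

lexicographic = sorted(names)           # dump_10 sorts before dump_2
natural = sorted(names, key=natural_key)  # dump_2 sorts before dump_10
```

With the lexicographic order, step 10 would be read before step 2 and the resulting trajectories would jump back and forth in time.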

Conversion can then be done with:

import up4
up4.Converter.vtk(
    files,                 # Sorted list of filenames
    1e-5,                  # timestep of the simulation
    'output.hdf5',         # filename to write
    r"(\d+).vtk",         # regex to extract the timestep from the filename
)

The regex passed to up4.Converter.vtk extracts the timestep from each filename, so it must contain a capture group that matches the digits of the timestep. The default regex should work in most cases.
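The sketch below shows what such a pattern captures, using Python's re module purely for illustration. up4 may evaluate the pattern with a different regex engine. The filename, the helper step_from_filename, and the step-times-timestep conversion are assumptions. The pattern here escapes the dot and anchors the end, which is slightly stricter than the one used in the examples above.

```python
import re

# One capture group of digits, followed by the literal extension
pattern = re.compile(r"(\d+)\.vtk$")


def step_from_filename(filename):
    """Return the integer step number captured from a filename (hypothetical helper)."""
    match = pattern.search(filename)
    if match is None:
        raise ValueError(f"no timestep found in {filename!r}")
    return int(match.group(1))


step = step_from_filename("post/dump_12000.vtk")  # made-up filename
# Assumption: physical time is the step number multiplied by the simulation timestep
time = step * 1e-5
```

If a filename contains several groups of digits, this pattern picks the one directly before the extension.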

The function up4.Converter.vtk_from_folder is a wrapper around up4.Converter.vtk and can be used as follows:

import up4
up4.Converter.vtk_from_folder(
    'path/to/folder',       # Path to the folder containing the vtk files
    1e-5,                   # timestep of the simulation
    'output.hdf5',          # filename to write
    r"(\d+).vtk",          # regex to extract the timestep from the filename
)

The field name arguments that follow the filter argument default to LIGGGHTS naming conventions, but can be changed to match the field names in the vtk files you are converting:

import up4
up4.Converter.vtk(
    files,                 # Sorted list of filenames
    1e-5,                  # timestep of the simulation
    'output.hdf5',         # filename to write
    r"(\d+).vtk",         # regex to extract the timestep from the filename
    filter = '.vtk',       # File extension
    velocity_field_name = "Velocity", # Name of the velocity field in the vtk files
)

Modern VTK#

The converter for the modern VTK formats supports unstructured grid (.vtu) and polydata (.vtp) files. You will likely need to specify the names of the velocity, radius, id and type fields in the files you are converting, as the defaults follow LIGGGHTS conventions. The radius_field_name and diameter_field_name arguments are mutually exclusive, so only one is needed. The end result is the same either way, because diameter values are converted to the radius values that up4 uses internally. If diameter_field_name is set, it takes precedence regardless of the value of radius_field_name.
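The precedence between the two arguments can be pictured with the sketch below. It is not up4's code. The function resolve_radii, the dictionary of fields, and the values are hypothetical and only mirror the behaviour described above.

```python
def resolve_radii(fields, radius_field_name=None, diameter_field_name=None):
    """Return per-particle radii from either a diameter or a radius field.

    Hypothetical helper: a diameter field, when named, wins over the radius field.
    """
    if diameter_field_name is not None:
        return [d / 2.0 for d in fields[diameter_field_name]]
    if radius_field_name is None:
        raise ValueError("either a radius or a diameter field name is required")
    return list(fields[radius_field_name])


# Made-up per-particle fields
fields = {"Radius": [1.0, 2.0], "Diameter": [4.0, 6.0]}

from_radius = resolve_radii(fields, radius_field_name="Radius")
from_diameter = resolve_radii(
    fields, radius_field_name="Radius", diameter_field_name="Diameter"
)
```

Here from_diameter ignores the "Radius" field entirely, which is why setting both arguments is unnecessary.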

As with the legacy VTK converter, up4 can either convert a naturally sorted list of files (here with a .vtu extension) or look inside a folder and pick out the necessary files itself. Sorting a list of .vtu files and converting them can be done as follows:

from glob import glob
from natsort import natsorted
files = glob('path/to/folder/*.vtu')
files = natsorted(files)

import up4
up4.Converter.vtk(
    files,                 # Sorted list of filenames
    1e-5,                  # timestep of the simulation
    'output.hdf5',         # filename to write
    r"(\d+).vtu",         # regex to extract the timestep from the filename
    velocity_field_name = "Velocity", # Name of the velocity field in the vtk files
    radius_field_name = "Radius", # Name of the radius field in the vtk files
    id_field_name = "id", # Name of the id field in the vtk files
    type_field_name = "type", # Name of the type field in the vtk files
)

As in the legacy case, up4.Converter.vtk_from_folder is a wrapper around up4.Converter.vtk and can be used as follows:

import up4
up4.Converter.vtk_from_folder(
    'path/to/folder',       # Path to the folder containing the vtk files
    1e-5,                   # timestep of the simulation
    'output.hdf5',          # filename to write
    r"(\d+).vtu",          # regex to extract the timestep from the filename
    velocity_field_name = "Velocity", # Name of the velocity field in the vtk files
    diameter_field_name = "Diameter", # Name of the diameter field in the vtk files
    id_field_name = "id", # Name of the id field in the vtk files
    type_field_name = "type", # Name of the type field in the vtk files
)

Dataset Statistics#

Once you have generated your hdf5 file, you can read it in with the up4.Data class. Passing the resulting object to print gives a summary of the dataset, which may look like the following:

import up4
data = up4.Data('output.hdf5')
print(data)

"""
Dimensions of the system:
     x -0.07-->0.06
     y 0.00-->0.13
     z -0.09-->0.01
The max time of this set is : 2.00
Number of Particles: 1
Mean velocity of: 0.44 m/s
Minimum velocity 0.03 m/s
Maximum Velocity 0.74 m/s
"""