Basic Dataset Manipulation

This section describes the operations available on the dask-ms Dataset, which re-implements the basic operations present on xarray Datasets. These should be sufficient for application developers.

Those wanting extended xarray Dataset functionality can simply install xarray, as the standard dask-ms xds_from* and xds_to* functions return and accept xarray Datasets.

Datasets logically group arrays together into a single structure.

Dataset Variables

Variables are represented by a tuple of two or three values of the form (dims, array[, attrs]).

  • dims is a dimension schema. The first entry in dims must always be "row".

  • array should be a dask or numpy array.

  • attrs is optional and should be a dictionary containing metadata.

IMAGING_WEIGHT = (("row", "chan"), np.zeros((10, 16)), {"keywords": "test"})
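The rules above can be checked with a small helper before a variable is handed to a Dataset. This is only a sketch: check_variable is a hypothetical function written for this illustration and is not part of dask-ms, and the choice to accept a bare string for dims is an assumption of the sketch.

```python
import numpy as np


def check_variable(name, var):
    """Validate a (dims, array[, attrs]) tuple (hypothetical helper, not dask-ms API)."""
    if len(var) not in (2, 3):
        raise ValueError(f"{name}: expected 2 or 3 entries, got {len(var)}")

    dims, array = var[0], var[1]
    attrs = var[2] if len(var) == 3 else {}

    # ("row") is just the string "row", so normalise it to a one-element tuple
    if isinstance(dims, str):
        dims = (dims,)

    # The first dimension must always be "row"
    if dims[0] != "row":
        raise ValueError(f"{name}: first dimension must be 'row', got {dims[0]!r}")

    # One dimension name is needed per array axis
    if len(dims) != array.ndim:
        raise ValueError(f"{name}: {len(dims)} dims given for a {array.ndim}-d array")

    if not isinstance(attrs, dict):
        raise TypeError(f"{name}: attrs must be a dictionary")

    return tuple(dims), array, attrs


IMAGING_WEIGHT = (("row", "chan"), np.zeros((10, 16)), {"keywords": "test"})
dims, array, attrs = check_variable("IMAGING_WEIGHT", IMAGING_WEIGHT)

# Map each dimension name to its size
dim_sizes = dict(zip(dims, array.shape))
```

Here dim_sizes pairs each dimension name with the matching axis length of the array, which is a convenient way to confirm that the schema and the array agree.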

Creating Datasets

Set up imports and define some dimension chunks and sizes:

import dask.array as da
import numpy as np
from daskms import Dataset
# Define a chunking schema
chunks = {'row': (2, 2, 2, 2, 2), 'chan': (16, 16), "corr": (4,)}
# Figure out dimension sizes
row = sum(chunks['row'])
chan = sum(chunks['chan'])
corr = sum(chunks['corr'])
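To make the relationship between a chunking schema and the dimension sizes concrete, the sketch below derives the sizes and the starting row of each row chunk using only the standard library. The names sizes and row_starts are introduced here for illustration and do not appear in dask-ms.

```python
from itertools import accumulate

# Same chunking schema as above: each tuple lists the chunk lengths of one dimension
chunks = {'row': (2, 2, 2, 2, 2), 'chan': (16, 16), 'corr': (4,)}

# A dimension's size is the sum of its chunk lengths
sizes = {dim: sum(lengths) for dim, lengths in chunks.items()}

# Each row chunk starts where the previous chunks end
row_ends = list(accumulate(chunks['row']))
row_starts = [0] + row_ends[:-1]

print(sizes)       # {'row': 10, 'chan': 32, 'corr': 4}
print(row_starts)  # [0, 2, 4, 6, 8]
```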

Next, create some dask arrays to place on the Dataset:

# Define a data descriptor array
ddid = da.ones(row, chunks=chunks['row'])

# Define some visibilities
vis_chunks = (chunks['row'], chunks['chan'], chunks['corr'])
data = (da.random.random((row, chan, corr), chunks=vis_chunks) +
        da.random.random((row, chan, corr), chunks=vis_chunks)*1j)

Next, create the Dataset from a dictionary of variables of the form {name: (dims, array[, attrs])}.

The Dataset can also be assigned coordinates and attributes via the coords and attrs arguments to the constructor.

Note

The ROWID coordinate is not normally assigned when creating a Dataset from scratch and is shown here only to illustrate how to set coordinates. See Updating/Appending Rows for further information on standard use of the ROWID array.

# Data Variable dictionary
data_vars = {
    'DATA_DESC_ID': (("row",), ddid, {'keywords': 'test'}),
    'DATA': (("row", "chan", "corr"), data)}

# Row id array, defined here only to illustrate coordinates
rowid = da.arange(row, chunks=chunks['row'])

# Coordinate dictionary
coords = {'ROWID': (("row",), rowid)}

# Create the dataset
ds = Dataset(data_vars, attrs={'observer': 'hugo'}, coords=coords)

Modifying Datasets

We can assign new variables to our Dataset:

bitflag = da.ones((row, chan, corr), chunks=vis_chunks)

new_ds = ds.assign(BITFLAG=(("row", "chan", "corr"), bitflag))
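The call above binds the result to new_ds rather than modifying ds in place, which matches the xarray convention where assign returns a new object. The sketch below illustrates that pattern with a toy class and numpy arrays. ToyDataset is a hypothetical stand-in written for this example; it is not the dask-ms Dataset and makes no claim about how dask-ms implements assign internally.

```python
import numpy as np


class ToyDataset:
    """Hypothetical stand-in for a Dataset (not the dask-ms implementation)."""

    def __init__(self, data_vars):
        # Store a private copy of the {name: (dims, array)} mapping
        self.data_vars = dict(data_vars)

    def assign(self, **kwargs):
        # Merge into a copy so that the original object is left untouched
        merged = dict(self.data_vars)
        merged.update(kwargs)
        return ToyDataset(merged)


row, chan, corr = 10, 32, 4
vis = np.zeros((row, chan, corr), dtype=np.complex128)
ds = ToyDataset({"DATA": (("row", "chan", "corr"), vis)})

bitflag = np.ones((row, chan, corr))
new_ds = ds.assign(BITFLAG=(("row", "chan", "corr"), bitflag))

original_names = sorted(ds.data_vars)
new_names = sorted(new_ds.data_vars)
```

After the call, ds still holds only DATA while new_ds holds both DATA and BITFLAG, so earlier references to ds remain valid.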