pygsti.data¶
A subpackage holding data set objects and supporting analysis objects.
Submodules¶
Package Contents¶
Classes¶
An association between Circuits and outcome counts, serving as the input data for many QCVV protocols.

A collection of DataSets that hold data for the same circuits.

A comparison between multiple data, presumably taken in different contexts.

An association between Circuits and arbitrary data.

A set of statistical hypothesis tests on a set of null hypotheses.
Functions¶

Creates a DataSet using the probabilities obtained from a model.

Creates a DataSet which merges certain outcomes in an input DataSet.

Creates a dictionary appropriate for use with :func:`aggregate_dataset_outcomes`.

Creates a dictionary appropriate for use with :func:`aggregate_dataset_outcomes`.

Creates a DataSet that is the restriction of dataset to sectors_to_keep.

Trims a

Creates a

Generate a fake RPE DataSet using the probabilities obtained from a model.
 class pygsti.data.DataSet(oli_data=None, time_data=None, rep_data=None, circuits=None, circuit_indices=None, outcome_labels=None, outcome_label_indices=None, static=False, file_to_load_from=None, collision_action='aggregate', comment=None, aux_info=None)¶
Bases:
object
An association between Circuits and outcome counts, serving as the input data for many QCVV protocols.
The DataSet class associates circuits with counts or time series of counts for each outcome label, and can be thought of as a table with gate strings labeling the rows and outcome labels and/or time labeling the columns. It is designed to behave similarly to a dictionary of dictionaries, so that counts are accessed by:
count = dataset[circuit][outcomeLabel]
in the time-independent case, and in the time-dependent case, for integer time index i >= 0,
outcomeLabel = dataset[circuit][i].outcome
count = dataset[circuit][i].count
time = dataset[circuit][i].time
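The two access modes can be imitated with plain Python built-ins (an illustrative emulation, not pyGSTi itself; `Record` is a hypothetical stand-in for the objects a time-dependent row yields):

```python
from collections import namedtuple

# Hypothetical stand-in for a time-dependent DataSet row entry.
Record = namedtuple("Record", ["outcome", "count", "time"])

# Time-independent view: circuit -> {outcome label: count}
dataset = {("Gx", "Gy"): {("0",): 45, ("1",): 55}}
count = dataset[("Gx", "Gy")][("1",)]  # look up a single outcome count

# Time-dependent view: circuit -> list of records, indexed by time step i
ts_dataset = {("Gx", "Gy"): [Record(("0",), 45, 0.0), Record(("1",), 55, 0.0)]}
rec = ts_dataset[("Gx", "Gy")][1]      # i = 1
outcome, count2, time = rec.outcome, rec.count, rec.time
```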
 Parameters
oli_data (list or numpy.ndarray) – When static == True, a 1D numpy array containing outcome label indices (integers), concatenated for all sequences. Otherwise, a list of 1D numpy arrays, one array per gate sequence. In either case, this quantity is indexed by the values of circuit_indices or the index of circuits.
time_data (list or numpy.ndarray) – Same format as oli_data except stores floating-point timestamp values.
rep_data (list or numpy.ndarray) – Same format as oli_data except stores integer repetition counts for each “data bin” (i.e. (outcome,time) pair). If all repetitions equal 1 (“single-shot” timestamped data), then rep_data can be None (no repetitions).
circuits (list of (tuples or Circuits)) – Each element is a tuple of operation labels or a Circuit object. Indices for these strings are assumed to ascend from 0. These indices must correspond to the time series of spam-label indices (above). Only specify this argument OR circuit_indices, not both.
circuit_indices (ordered dictionary) – An OrderedDict with keys equal to circuits (tuples of operation labels) and values equal to integer indices associating a row/element of counts with the circuit. Only specify this argument OR circuits, not both.
outcome_labels (list of strings or int) – Specifies the set of spam labels for the DataSet. Indices for the spam labels are assumed to ascend from 0, starting with the first element of this list. These indices will associate each element of the time series with a spam label. Only specify this argument OR outcome_label_indices, not both. If an int, specifies that the outcome labels should be those for a standard set of this many qubits.
outcome_label_indices (ordered dictionary) – An OrderedDict with keys equal to spam labels (strings) and value equal to integer indices associating a spam label with given index. Only specify this argument OR outcome_labels, not both.
static (bool) –
 When True, create a read-only, i.e. “static” DataSet which cannot be modified. In this case you must specify the time-series data, circuits, and spam labels.
 When False, create a DataSet that can have time series data added to it. In this case, you only need to specify the spam labels.
file_to_load_from (string or file object) – Specify this argument and no others to create a static DataSet by loading from a file (just like using the load(…) function).
collision_action ({"aggregate","overwrite","keepseparate"}) – Specifies how duplicate circuits should be handled. “aggregate” adds duplicate-circuit counts to the same circuit’s data at the next integer timestamp. “overwrite” only keeps the latest given data for a circuit. “keepseparate” tags duplicate circuits by setting the .occurrence ID of added circuits that are already contained in this data set to the next available positive integer.
comment (string, optional) – A user-specified comment string that gets carried around with the data. A common use for this field is to attach to the data details regarding its collection.
aux_info (dict, optional) – A user-specified dictionary of per-circuit auxiliary information. Keys should be the circuits in this DataSet and values should be Python dictionaries.
 __iter__(self)¶
 __len__(self)¶
 __contains__(self, circuit)¶
Test whether data set contains a given circuit.
 Parameters
circuit (tuple or Circuit) – A tuple of operation labels or a Circuit instance which specifies the circuit to check for.
 Returns
bool – whether circuit was found.
 __hash__(self)¶
Return hash(self).
 __getitem__(self, circuit)¶
 __setitem__(self, circuit, outcome_dict_or_series)¶
 __delitem__(self, circuit)¶
 _get_row(self, circuit)¶
Get a row of data from this DataSet.
 Parameters
circuit (Circuit or tuple) – The gate sequence to extract data for.
 Returns
_DataSetRow
 _set_row(self, circuit, outcome_dict_or_series)¶
Set the counts for a row of this DataSet.
 Parameters
circuit (Circuit or tuple) – The gate sequence to extract data for.
outcome_dict_or_series (dict or tuple) – The outcome count data, either a dictionary of outcome counts (with keys as outcome labels) or a tuple of lists. In the latter case this can be a 2-tuple: (outcome-label-list, time-stamp-list) or a 3-tuple: (outcome-label-list, time-stamp-list, repetition-count-list).
 Returns
None
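The three accepted row formats can be sketched with a pure-Python normalizer (an illustration of the convention above, not pyGSTi code; the helper name and the 0.0 default timestamp are assumptions):

```python
def normalize_outcome_data(outcome_dict_or_series):
    """Hypothetical helper: coerce the three accepted row formats into
    parallel (labels, times, reps) lists."""
    if isinstance(outcome_dict_or_series, dict):
        # Dict of outcome counts: assume one shared 0.0 timestamp.
        labels = list(outcome_dict_or_series.keys())
        reps = [outcome_dict_or_series[lbl] for lbl in labels]
        times = [0.0] * len(labels)
    elif len(outcome_dict_or_series) == 2:    # (labels, times): single-shot
        labels, times = outcome_dict_or_series
        reps = [1] * len(labels)
    else:                                     # (labels, times, reps)
        labels, times, reps = outcome_dict_or_series
    return list(labels), list(times), list(reps)

labels, times, reps = normalize_outcome_data({("0",): 3, ("1",): 7})
labels2, times2, reps2 = normalize_outcome_data(([("0",), ("1",)], [0.0, 1.0]))
```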
 keys(self)¶
Returns the circuits used as keys of this DataSet.
 Returns
list – A list of Circuit objects which index the data counts within this data set.
 items(self)¶
Iterator over (circuit, timeSeries) pairs.
Here circuit is a tuple of operation labels and timeSeries is a _DataSetRow instance, which behaves similarly to a list of spam labels whose index corresponds to the time step.
 Returns
_DataSetKVIterator
 values(self)¶
Iterator over _DataSetRow instances corresponding to the time series data for each circuit.
 Returns
_DataSetValueIterator
 property outcome_labels(self)¶
Get a list of all the outcome labels contained in this DataSet.
 Returns
list of strings or tuples – A list where each element is an outcome label (which can be a string or a tuple of strings).
 property timestamps(self)¶
Get a list of all the (unique) timestamps contained in this DataSet.
 Returns
list of floats – A list where each element is a timestamp.
 gate_labels(self, prefix='G')¶
Get a list of all the distinct operation labels used in the circuits of this dataset.
 Parameters
prefix (str) – Filter the circuit labels so that only elements beginning with this prefix are returned. None performs no filtering.
 Returns
list of strings – A list where each element is an operation label.
 degrees_of_freedom(self, circuits=None, method='present_outcomes1', aggregate_times=True)¶
Returns the number of independent degrees of freedom in the data for the circuits in circuits.
 Parameters
circuits (list of Circuits) – The list of circuits to count degrees of freedom for. If None then all of the DataSet’s strings are used.
method ({'all_outcomes1', 'present_outcomes1', 'tuned'}) – How the degrees of freedom should be computed. ‘all_outcomes1’ takes the number of circuits and multiplies this by the total number of outcomes (the length of what is returned by outcome_labels()) minus one. ‘present_outcomes1’ counts on a per-circuit basis the number of present (usually = nonzero) outcomes recorded minus one. ‘tuned’ should be the most accurate, as it accounts for low-N “Poisson bump” behavior, but it is not the default because it is still under development. For timestamped data, see aggregate_times below.
aggregate_times (bool, optional) – Whether counts that occur at different times should be tallied separately. If True, then even when counts occur at different times degrees of freedom are tallied on a per-circuit basis. If False, then counts occurring at distinct times are treated as independent of those at any other time, and are tallied separately. So, for example, if aggregate_times is False and a data row has 0- and 1-counts of 45 & 55 at time=0 and 42 and 58 at time=1, this row would contribute 2 degrees of freedom, not 1. It can sometimes be useful to set this to False when the DataSet holds coarse-grained data, but usually you want this to be left as True (especially for time-series data).
 Returns
int
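The ‘present_outcomes1’ counting rule, with and without time aggregation, can be sketched in plain Python (an illustrative emulation of the rule described above, not pyGSTi’s implementation; the row representation is hypothetical):

```python
def dof_present_outcomes(row, aggregate_times=True):
    """Sketch of 'present_outcomes1' counting for one circuit.
    `row` is a list of (outcome_label, time, count) triples (illustrative)."""
    if aggregate_times:
        # Tally distinct present outcomes over all times: N_outcomes - 1.
        outcomes = {lbl for (lbl, _t, _n) in row}
        return len(outcomes) - 1
    # Tally per timestamp: sum over times of (N_outcomes_at_t - 1).
    by_time = {}
    for lbl, t, _n in row:
        by_time.setdefault(t, set()).add(lbl)
    return sum(len(s) - 1 for s in by_time.values())

# The docstring's example: 45 & 55 at time=0 and 42 & 58 at time=1.
row = [("0", 0.0, 45), ("1", 0.0, 55), ("0", 1.0, 42), ("1", 1.0, 58)]
```

With aggregate_times=True this row contributes 1 degree of freedom; with aggregate_times=False it contributes 2, matching the example above.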
 _collisionaction_update_circuit(self, circuit)¶
 _add_explicit_repetition_counts(self)¶
Build internal repetition counts if they don’t exist already.
This method is usually unnecessary, as repetition counts are almost always built as soon as they are needed.
 Returns
None
 add_count_dict(self, circuit, count_dict, record_zero_counts=True, aux=None, update_ol=True)¶
Add a single circuit’s counts to this DataSet
 Parameters
circuit (tuple or Circuit) – A tuple of operation labels specifying the circuit or a Circuit object
count_dict (dict) – A dictionary with keys = outcome labels and values = counts
record_zero_counts (bool, optional) – Whether zero-counts are actually recorded (stored) in this DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.
aux (dict, optional) – A dictionary of auxiliary meta information to be included with this set of data counts (associated with circuit).
update_ol (bool, optional) – This argument is for internal use only and should be left as True.
 Returns
None
 add_count_list(self, circuit, outcome_labels, counts, record_zero_counts=True, aux=None, update_ol=True, unsafe=False)¶
Add a single circuit’s counts to this DataSet
 Parameters
circuit (tuple or Circuit) – A tuple of operation labels specifying the circuit or a Circuit object
outcome_labels (list or tuple) – The outcome labels corresponding to counts.
counts (list or tuple) – The counts themselves.
record_zero_counts (bool, optional) – Whether zero-counts are actually recorded (stored) in this DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.
aux (dict, optional) – A dictionary of auxiliary meta information to be included with this set of data counts (associated with circuit).
update_ol (bool, optional) – This argument is for internal use only and should be left as True.
unsafe (bool, optional) – True means that outcome_labels is guaranteed to hold tupletype outcome labels and never plain strings. Only set this to True if you know what you’re doing.
 Returns
None
 add_count_arrays(self, circuit, outcome_index_array, count_array, record_zero_counts=True, aux=None)¶
Add the outcomes for a single circuit, formatted as raw data arrays.
 Parameters
circuit (Circuit) – The circuit to add data for.
outcome_index_array (numpy.ndarray) – An array of outcome indices, which must be values of self.olIndex (which maps outcome labels to indices).
count_array (numpy.ndarray) – An array of integer (or sometimes floating point) counts, one corresponding to each outcome index (element of outcome_index_array).
record_zero_counts (bool, optional) – Whether zero counts (zeros in count_array) should be stored explicitly or not stored and inferred. Setting to False reduces the space taken by data sets containing lots of zero counts, but makes some objective function evaluations less precise.
aux (dict or None, optional) – If not None a dictionary of userdefined auxiliary information that should be associated with this circuit.
 Returns
None
 add_cirq_trial_result(self, circuit, trial_result, key)¶
Add a single circuit’s counts — stored in a Cirq TrialResult — to this DataSet
 Parameters
circuit (tuple or Circuit) – A tuple of operation labels specifying the circuit or a Circuit object. Note that this must be a PyGSTi circuit — not a Cirq circuit.
trial_result (cirq.TrialResult) – The TrialResult to add
key (str) – The string key of the measurement. Set by cirq.measure.
 Returns
None
 add_raw_series_data(self, circuit, outcome_label_list, time_stamp_list, rep_count_list=None, overwrite_existing=True, record_zero_counts=True, aux=None, update_ol=True, unsafe=False)¶
Add a single circuit’s counts to this DataSet
 Parameters
circuit (tuple or Circuit) – A tuple of operation labels specifying the circuit or a Circuit object
outcome_label_list (list) – A list of outcome labels (strings or tuples). An element’s index links it to a particular time step (i.e. the ith element of the list specifies the outcome of the ith measurement in the series).
time_stamp_list (list) – A list of floating point timestamps, each associated with the single corresponding outcome in outcome_label_list. Must be the same length as outcome_label_list.
rep_count_list (list, optional) – A list of integer counts specifying how many outcomes of type given by outcome_label_list occurred at the time given by time_stamp_list. If None, then all counts are assumed to be 1. When not None, must be the same length as outcome_label_list.
overwrite_existing (bool, optional) – Whether to overwrite the data for circuit (if it exists). If False, then the given lists are appended (added) to existing data.
record_zero_counts (bool, optional) – Whether zero-counts (elements of rep_count_list that are zero) are actually recorded (stored) in this DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.
aux (dict, optional) – A dictionary of auxiliary meta information to be included with this set of data counts (associated with circuit).
update_ol (bool, optional) – This argument is for internal use only and should be left as True.
unsafe (bool, optional) – When True, don’t bother checking that outcome_label_list contains tupletype outcome labels and automatically upgrading strings to 1tuples. Only set this to True if you know what you’re doing and need the marginally faster performance.
 Returns
None
 _add_raw_arrays(self, circuit, oli_array, time_array, rep_array, overwrite_existing, record_zero_counts, aux)¶
 update_ol(self)¶
Updates the internal outcomelabel list in this dataset.
Call this after calling add_count_dict(…) or add_raw_series_data(…) with update_ol=False.
 Returns
None
 add_series_data(self, circuit, count_dict_list, time_stamp_list, overwrite_existing=True, record_zero_counts=True, aux=None)¶
Add a single circuit’s counts to this DataSet
 Parameters
circuit (tuple or Circuit) – A tuple of operation labels specifying the circuit or a Circuit object
count_dict_list (list) – A list of dictionaries holding the outcome-label:count pairs for each time step (times given by time_stamp_list).
time_stamp_list (list) – A list of floating point timestamps, each associated with an entire dictionary of outcomes specified by count_dict_list.
overwrite_existing (bool, optional) – If True, overwrite any existing data for the circuit. If False, add the count data with the next non-negative integer timestamp.
record_zero_counts (bool, optional) – Whether zero-counts (elements of the dictionaries in count_dict_list that are zero) are actually recorded (stored) in this DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.
aux (dict, optional) – A dictionary of auxiliary meta information to be included with this set of data counts (associated with circuit).
 Returns
None
 aggregate_outcomes(self, label_merge_dict, record_zero_counts=True)¶
Creates a DataSet which merges certain outcomes in this DataSet.
Used, for example, to aggregate a 2-qubit, 4-outcome DataSet into a 1-qubit, 2-outcome DataSet.
 Parameters
label_merge_dict (dictionary) – The dictionary whose keys define the new DataSet outcomes, and whose items are lists of input DataSet outcomes that are to be summed together. For example, if a two-qubit DataSet has outcome labels “00”, “01”, “10”, and “11”, and we want to “aggregate out” the second qubit, we could use label_merge_dict = {‘0’:[‘00’,’01’],’1’:[‘10’,’11’]}. When doing this, however, it may be better to use :func:`filter_qubits` which also updates the circuits.
record_zero_counts (bool, optional) – Whether zero-counts are actually recorded (stored) in the returned (merged) DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.
 Returns
merged_dataset (DataSet object) – The DataSet with outcomes merged according to the rules given in label_merge_dict.
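The merge rule can be sketched in plain Python on a single circuit’s count dictionary (an illustrative stand-in for the method above, not pyGSTi code; the helper name is hypothetical):

```python
def merge_counts(counts, label_merge_dict):
    """Sketch: each new outcome's count is the sum of the counts of the
    old outcomes mapped to it by label_merge_dict."""
    return {new: sum(counts.get(old, 0) for old in olds)
            for new, olds in label_merge_dict.items()}

# "Aggregate out" the second qubit, as in the label_merge_dict example above.
counts = {"00": 10, "01": 5, "10": 2, "11": 3}
merged = merge_counts(counts, {"0": ["00", "01"], "1": ["10", "11"]})
```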
 aggregate_std_nqubit_outcomes(self, qubit_indices_to_keep, record_zero_counts=True)¶
Creates a DataSet which merges certain outcomes in this DataSet.
Used, for example, to aggregate a 2-qubit, 4-outcome DataSet into a 1-qubit, 2-outcome DataSet. This assumes that outcome labels are in the standard format whereby each qubit corresponds to a single ‘0’ or ‘1’ character.
 Parameters
qubit_indices_to_keep (list) – A list of integers specifying which qubits should be kept, that is, not aggregated.
record_zero_counts (bool, optional) – Whether zero-counts are actually recorded (stored) in the returned (merged) DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.
 Returns
merged_dataset (DataSet object) – The DataSet with outcomes merged.
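For standard bit-string outcome labels, keeping a subset of qubits amounts to marginalizing the labels; a plain-Python sketch (illustrative only, helper name hypothetical):

```python
def keep_qubits(counts, qubit_indices_to_keep):
    """Sketch: project standard bit-string outcome labels onto the kept
    qubit indices, summing counts whose projected labels coincide."""
    out = {}
    for label, n in counts.items():
        new = "".join(label[i] for i in qubit_indices_to_keep)
        out[new] = out.get(new, 0) + n
    return out

# Keep only qubit 0 of a 2-qubit data row.
marginal = keep_qubits({"00": 10, "01": 5, "10": 2, "11": 3}, [0])
```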
 add_auxiliary_info(self, circuit, aux)¶
Add auxiliary meta information to circuit.
 Parameters
circuit (tuple or Circuit) – A tuple of operation labels specifying the circuit or a Circuit object
aux (dict, optional) – A dictionary of auxiliary meta information to be included with this set of data counts (associated with circuit).
 Returns
None
 add_counts_from_dataset(self, other_data_set)¶
Append another DataSet’s data to this DataSet
 Parameters
other_data_set (DataSet) – The dataset to take counts from.
 Returns
None
 add_series_from_dataset(self, other_data_set)¶
Append another DataSet’s series data to this DataSet
 Parameters
other_data_set (DataSet) – The dataset to take time series data from.
 Returns
None
 property meantimestep(self)¶
The mean timestep, averaged over the timestep for each circuit and over circuits.
 Returns
float
 property has_constant_totalcounts_pertime(self)¶
True if the data for every circuit has the same number of total counts at every data collection time.
This will return True even if the total number of counts differs between circuits (i.e., after aggregating over time), as long as every circuit has the same total counts per time step (this will happen when the number of time steps varies between circuits).
 Returns
bool
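The check this property performs can be sketched in plain Python (an illustrative emulation under an assumed nested-dict data layout, not pyGSTi’s internal representation):

```python
def constant_totalcounts_pertime(dataset):
    """Sketch: `dataset` maps circuit -> {time: {outcome: count}}.  True
    when every (circuit, time) bin has one and the same total count."""
    totals = {sum(bin_.values())
              for bins in dataset.values() for bin_ in bins.values()}
    return len(totals) <= 1

# 100 shots per time step everywhere, even though c2 has twice the
# aggregated total (two time steps instead of one).
ds = {"c1": {0.0: {"0": 40, "1": 60}},
      "c2": {0.0: {"0": 55, "1": 45}, 1.0: {"0": 30, "1": 70}}}
ds_bad = {"c1": {0.0: {"0": 40, "1": 60}}, "c2": {0.0: {"0": 10}}}
```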
 property totalcounts_pertime(self)¶
Total counts per time, if this is constant over times and circuits.
When that doesn’t hold, an error is raised.
 Returns
float or int
 property has_constant_totalcounts(self)¶
True if the data for every circuit has the same number of total counts.
 Returns
bool
 property has_trivial_timedependence(self)¶
True if all the data in this DataSet occurs at time 0.
 Returns
bool
 __str__(self)¶
Return str(self).
 to_str(self, mode='auto')¶
Render this DataSet as a string.
 Parameters
mode ({"auto","timedependent","timeindependent"}) – Whether to display the data as timeseries of outcome counts (“timedependent”) or to report peroutcome counts aggregated over time (“timeindependent”). If “auto” is specified, then the timeindependent mode is used only if all time stamps in the DataSet are equal to zero (trivial time dependence).
 Returns
str
 truncate(self, list_of_circuits_to_keep, missing_action='raise')¶
Create a truncated dataset comprised of a subset of the circuits in this dataset.
 Parameters
list_of_circuits_to_keep (list of (tuples or Circuits)) – A list of the circuits for the new returned dataset. If a circuit is given in this list that isn’t in the original data set, missing_action determines the behavior.
missing_action ({"raise","warn","ignore"}) – What to do when a string in list_of_circuits_to_keep is not in the data set (raise a KeyError, issue a warning, or do nothing).
 Returns
DataSet – The truncated data set.
 time_slice(self, start_time, end_time, aggregate_to_time=None)¶
Creates a DataSet by aggregating the counts within the [start_time, end_time) interval.
 Parameters
start_time (float) – The starting time.
end_time (float) – The ending time.
aggregate_to_time (float, optional) – If not None, a single timestamp to give all the data in the specified range, resulting in a time-independent DataSet. If None, then the original timestamps are preserved.
 Returns
DataSet
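The slicing rule (half-open interval, optional re-stamping) can be sketched on a flat list of time-series triples (an illustration, not pyGSTi code; the data layout is assumed):

```python
def time_slice(series, start_time, end_time, aggregate_to_time=None):
    """Sketch: keep (outcome, time, count) triples with
    start_time <= t < end_time, optionally re-stamping them all."""
    kept = [(o, t, n) for (o, t, n) in series if start_time <= t < end_time]
    if aggregate_to_time is not None:
        kept = [(o, aggregate_to_time, n) for (o, _t, n) in kept]
    return kept

series = [("0", 0.0, 45), ("1", 0.5, 55), ("0", 1.0, 42)]
sliced = time_slice(series, 0.0, 1.0, aggregate_to_time=0.0)
```

Note the t=1.0 triple is excluded because the interval is half-open.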
 split_by_time(self, aggregate_to_time=None)¶
Creates a dictionary of DataSets, each of which is an equal-time slice of this DataSet.
The keys of the returned dictionary are the distinct timestamps in this dataset.
 Parameters
aggregate_to_time (float, optional) – If not None, a single timestamp to give all the data in each returned data set, resulting in time-independent `DataSet` objects. If None, then the original timestamps are preserved.
 Returns
OrderedDict – A dictionary of DataSet objects whose keys are the timestamp values of the original (this) data set in sorted order.
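The grouping can be sketched on a flat list of time-series triples (an illustrative emulation, not pyGSTi code; the data layout is assumed):

```python
from collections import OrderedDict

def split_by_time(series):
    """Sketch: group (outcome, time, count) triples into one slice per
    distinct timestamp, keyed by timestamp in sorted order."""
    slices = OrderedDict()
    for t in sorted({t for (_o, t, _n) in series}):
        slices[t] = [(o, t2, n) for (o, t2, n) in series if t2 == t]
    return slices

slices = split_by_time([("0", 1.0, 42), ("1", 0.0, 55), ("0", 0.0, 45)])
```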
 drop_zero_counts(self)¶
Creates a copy of this data set that doesn’t include any zero counts.
 Returns
DataSet
 process_times(self, process_times_array_fn)¶
Manipulate this DataSet’s timestamps according to process_times_array_fn.
For example, the following process_times_array_fn would change the timestamps for each circuit to sequential integers.
```
def process_times_array_fn(times):
    return list(range(len(times)))
```
 Parameters
process_times_array_fn (function) – A function which takes a single arrayoftimestamps argument and returns another similarlysized array. This function is called, once per circuit, with the circuit’s array of timestamps.
 Returns
DataSet – A new data set with altered timestamps.
 process_circuits(self, processor_fn, aggregate=False)¶
Create a new data set by manipulating this DataSet’s circuits (keys) according to processor_fn.
The new DataSet’s circuits result from running each of this DataSet’s circuits through processor_fn. This can be useful when “tracing out” qubits in a dataset containing multi-qubit data.
 Parameters
processor_fn (function) – A function which takes a single Circuit argument and returns another (or the same) Circuit. This function may also return None, in which case the data for that string is deleted.
aggregate (bool, optional) – When True, aggregate the data for circuits that processor_fn assigns to the same “new” circuit. When False, use the data from the last original circuit that maps to a given “new” circuit.
 Returns
DataSet
 process_circuits_inplace(self, processor_fn, aggregate=False)¶
Manipulate this DataSet’s circuits (keys) inplace according to processor_fn.
All of this DataSet’s circuits are updated by running each one through processor_fn. This can be useful when “tracing out” qubits in a dataset containing multiqubit data.
 Parameters
processor_fn (function) – A function which takes a single Circuit argument and returns another (or the same) Circuit. This function may also return None, in which case the data for that string is deleted.
aggregate (bool, optional) – When True, aggregate the data for circuits that processor_fn assigns to the same “new” circuit. When False, use the data from the last original circuit that maps to a given “new” circuit.
 Returns
None
 remove(self, circuits, missing_action='raise')¶
Remove (delete) the data for circuits from this DataSet.
 Parameters
circuits (iterable) – An iterable over Circuitlike objects specifying the keys (circuits) to remove.
missing_action ({"raise","warn","ignore"}) – What to do when a string in circuits is not in this data set (raise a KeyError, issue a warning, or do nothing).
 Returns
None
 _remove(self, gstr_indices)¶
Removes the data in indices given by gstr_indices
 copy(self)¶
Make a copy of this DataSet.
 Returns
DataSet
 copy_nonstatic(self)¶
Make a nonstatic copy of this DataSet.
 Returns
DataSet
 done_adding_data(self)¶
Promotes a non-static DataSet to a static (read-only) DataSet.
This method should be called after all data has been added.
 Returns
None
 __getstate__(self)¶
 __setstate__(self, state_dict)¶
 save(self, file_or_filename)¶
 write_binary(self, file_or_filename)¶
Write this data set to a binaryformat file.
 Parameters
file_or_filename (string or file object) – If a string, interpreted as a filename. If this filename ends in “.gz”, the file will be gzip compressed.
 Returns
None
 load(self, file_or_filename)¶
 read_binary(self, file_or_filename)¶
Read a DataSet from a binary file, clearing any previously contained data.
The file should have been created with :meth:`DataSet.write_binary`.
 Parameters
file_or_filename (str or buffer) – The file or filename to load from.
 Returns
None
 rename_outcome_labels(self, old_to_new_dict)¶
Replaces existing outcome labels with new ones as per old_to_new_dict.
 Parameters
old_to_new_dict (dict) – A mapping from old/existing outcome labels to new ones. Strings in keys or values are automatically converted to 1-tuples. Missing outcome labels are left unaltered.
 Returns
None
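The renaming rule (strings become 1-tuples, unmapped labels pass through) can be sketched on a single count dictionary (illustrative only, not pyGSTi code; helper names are hypothetical):

```python
def rename_outcomes(counts, old_to_new_dict):
    """Sketch: rename outcome labels per old_to_new_dict, treating bare
    strings as 1-tuples and leaving unmapped labels unchanged."""
    def as_tuple(lbl):
        return (lbl,) if isinstance(lbl, str) else tuple(lbl)
    mapping = {as_tuple(k): as_tuple(v) for k, v in old_to_new_dict.items()}
    return {mapping.get(as_tuple(lbl), as_tuple(lbl)): n
            for lbl, n in counts.items()}

# "0" is auto-upgraded to ("0",); ("1",) is absent from the map and kept.
renamed = rename_outcomes({("0",): 45, ("1",): 55}, {"0": "up"})
```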
 add_std_nqubit_outcome_labels(self, nqubits)¶
Adds all the “standard” outcome labels (e.g. ‘0010’) on nqubits qubits.
This is useful to ensure that, even if not all outcomes appear in the data, all are recognized as being potentially valid outcomes (and so attempts to get counts for these outcomes will return 0 rather than raising an error).
 Parameters
nqubits (int) – The number of qubits. For example, if equal to 3 the outcome labels ‘000’, ‘001’, … ‘111’ are added.
 Returns
None
 add_outcome_labels(self, outcome_labels, update_ol=True)¶
Adds new valid outcome labels.
Ensures that all the elements of outcome_labels are stored as valid outcomes for circuits in this DataSet, adding new outcomes as necessary.
 Parameters
outcome_labels (list or generator) – A list or generator of string or tuplevalued outcome labels.
update_ol (bool, optional) – Whether to update internal mappings to reflect the new outcome labels. Leave this as True unless you really know what you’re doing.
 Returns
None
 auxinfo_dataframe(self, pivot_valuename=None, pivot_value=None, drop_columns=False)¶
Create a Pandas dataframe with auxdata from this dataset.
 Parameters
pivot_valuename (str, optional) – If not None, the resulting dataframe is pivoted using pivot_valuename as the column whose values name the pivoted table’s column names. If None and pivot_value is not None, “ValueName” is used.
pivot_value (str, optional) – If not None, the resulting dataframe is pivoted such that values of the pivot_value column are rearranged into new columns whose names are given by the values of the pivot_valuename column. If None and pivot_valuename is not None, “Value” is used.
drop_columns (bool or list, optional) – A list of column names to drop (prior to performing any pivot). If True appears in this list or is given directly, then all constant-valued columns are dropped as well. No columns are dropped when drop_columns == False.
 Returns
pandas.DataFrame
 class pygsti.data.MultiDataSet(oli_dict=None, time_dict=None, rep_dict=None, circuit_indices=None, outcome_labels=None, outcome_label_indices=None, file_to_load_from=None, collision_actions=None, comment=None, comments=None, aux_info=None)¶
Bases:
object
A collection of DataSets that hold data for the same circuits.
The MultiDataSet class allows for the combined access and storage of several static DataSets that contain the same circuits (in the same order) AND the same time-dependence structure (if applicable).
It is designed to behave similarly to a dictionary of DataSets, so that a DataSet is obtained by:
dataset = multiDataset[dataset_name]
where dataset_name may be a string OR a tuple.
 Parameters
oli_dict (ordered dictionary, optional) – Keys specify dataset names. Values are 1D numpy arrays which specify outcome label indices. Each value is indexed by the values of circuit_indices.
time_dict (ordered dictionary, optional) – Same format as oli_dict except stores arrays of floatingpoint time stamp data.
rep_dict (ordered dictionary, optional) – Same format as oli_dict except stores arrays of integer repetition counts (can be None if there are no repetitions)
circuit_indices (ordered dictionary, optional) – An OrderedDict with keys equal to circuits (tuples of operation labels) and values equal to integer indices associating a row/element of counts with the circuit.
outcome_labels (list of strings) – Specifies the set of spam labels for the DataSet. Indices for the spam labels are assumed to ascend from 0, starting with the first element of this list. These indices will associate each element of the time series with a spam label. Only specify this argument OR outcome_label_indices, not both.
outcome_label_indices (ordered dictionary) – An OrderedDict with keys equal to spam labels (strings) and value equal to integer indices associating a spam label with given index. Only specify this argument OR outcome_labels, not both.
file_to_load_from (string or file object, optional) – Specify this argument and no others to create a MultiDataSet by loading from a file (just like using the load(…) function).
collision_actions (dictionary, optional) – Specifies how duplicate circuits should be handled for the data sets. Keys must match those of oli_dict and values are “aggregate” or “keepseparate”. See documentation for DataSet. If None, then “aggregate” is used for all sets by default.
comment (string, optional) – A user-specified comment string that gets carried around with the data. A common use for this field is to attach to the data details regarding its collection.
comments (dict, optional) – A user-specified dictionary of comments, one per dataset. Keys are dataset names (same as oli_dict keys).
aux_info (dict, optional) – A user-specified dictionary of per-circuit auxiliary information. Keys should be the circuits in this MultiDataSet and values should be Python dictionaries.
 property outcome_labels(self)¶
Get a list of all the outcome labels contained in this MultiDataSet.
 Returns
list of strings or tuples – A list where each element is an outcome label (which can be a string or a tuple of strings).
 __iter__(self)¶
 __len__(self)¶
 __getitem__(self, dataset_name)¶
 __setitem__(self, dataset_name, dataset)¶
 __contains__(self, dataset_name)¶
 keys(self)¶
A list of the keys (dataset names) of this MultiDataSet
 Returns
list
 items(self)¶
Iterator over (dataset name, DataSet) pairs.
 values(self)¶
Iterator over DataSets corresponding to each dataset name.
 datasets_aggregate(self, *dataset_names)¶
Generate a new DataSet by combining the outcome counts of multiple member DataSets.
Data with the same timestamp and outcome are merged into a single “bin” in the returned DataSet.
 Parameters
dataset_names (list of strs) – one or more dataset names.
 Returns
DataSet – a single DataSet containing the summed counts of each of the data named by the parameters.
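The aggregation rule, restricted to a single circuit’s counts, can be sketched in plain Python (an illustrative emulation, not pyGSTi code; the helper name and data layout are assumptions):

```python
def datasets_aggregate(per_name_counts, *dataset_names):
    """Sketch: sum, per outcome, the counts of the named member datasets;
    bins with the same outcome merge into one."""
    total = {}
    for name in dataset_names:
        for outcome, n in per_name_counts[name].items():
            total[outcome] = total.get(outcome, 0) + n
    return total

combined = datasets_aggregate(
    {"run1": {"0": 40, "1": 60}, "run2": {"0": 45, "1": 55}},
    "run1", "run2")
```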
 add_dataset(self, dataset_name, dataset, update_auxinfo=True)¶
Add a DataSet to this MultiDataSet.
The dataset must be static and conform with the circuits and time-dependent structure passed upon construction or those inherited from the first dataset added.
 Parameters
dataset_name (string) – The name to give the added dataset (i.e. the key the new data set will be referenced by).
dataset (DataSet) – The data set to add.
update_auxinfo (bool, optional) – Whether the auxiliary information (if any exists) in dataset is added to the information already stored in this MultiDataSet.
 Returns
None
 __str__(self)¶
Return str(self).
 copy(self)¶
Make a copy of this MultiDataSet
 Returns
MultiDataSet
 __getstate__(self)¶
 __setstate__(self, state_dict)¶
 save(self, file_or_filename)¶
 write_binary(self, file_or_filename)¶
Write this MultiDataSet to a binary-format file.
 Parameters
file_or_filename (file or string) – Either a filename or a file object. In the former case, if the filename ends in “.gz”, the file will be gzip compressed.
 Returns
None
 load(self, file_or_filename)¶
 read_binary(self, file_or_filename)¶
Read a MultiDataSet from a file, clearing any data it previously contained.
The file should have been created with :method:`MultiDataSet.write_binary`.
 Parameters
file_or_filename (file or string) – Either a filename or a file object. In the former case, if the filename ends in “.gz”, the file will be gzip uncompressed as it is read.
 Returns
None
 add_auxiliary_info(self, circuit, aux)¶
Add auxiliary meta information to circuit.
 Parameters
circuit (tuple or Circuit) – A tuple of operation labels specifying the circuit or a Circuit object
aux (dict, optional) – A dictionary of auxiliary meta information to be included with this set of data counts (associated with circuit).
 Returns
None
 class pygsti.data.DataComparator(dataset_list_or_multidataset, circuits='all', op_exclusions=None, op_inclusions=None, ds_names=None, allow_bad_circuits=False)¶
A comparison between multiple data sets, presumably taken in different contexts.
This object can be used to run all of the “context dependence detection” methods described in “Probing context-dependent errors in quantum processors”, by Rudinger et al. (See that paper’s supplemental material for explicit demonstrations of this object.)
This object stores the p-values and log-likelihood ratio values from a consistency comparison between two or more data sets, and provides methods to:
Perform a hypothesis test to decide which sequences contain statistically significant variation.
Plot p-value histograms and log-likelihood ratio box plots.
Extract (1) the “statistically significant total variation distance” for a circuit, (2) various other quantifications of the “amount” of context dependence, and (3) the level of statistical significance at which any context dependence is detected.
 Parameters
dataset_list_or_multidataset (List of DataSets or MultiDataSet) – Either a list of DataSets, containing two or more sets of data to compare, or a MultiDataSet object, containing two or more sets of data to compare. Note that these DataSets should contain data for the same set of Circuits (although any additional Circuits can be ignored using the parameters below). This object is then used to test whether the results indicate that the outcome probabilities for these Circuits have changed between the “contexts” in which the data was obtained.
circuits ('all' or list of Circuits, optional (default is 'all')) – If ‘all’, the comparison is implemented for all Circuits in the DataSets. Otherwise, this should be a list containing all the Circuits to run the comparison for (although note that some of these Circuits may be ignored with non-default options for the next two inputs).
op_exclusions (None or list of gates, optional (default is None)) – If not None, all Circuits containing any of the gates in this list are discarded, and no comparison will be made for those strings.
op_inclusions (None or list of gates, optional (default is None)) – If not None, a Circuit will be dropped from the list to run the comparisons for if it doesn’t include some gate from this list (or is the empty circuit).
ds_names (None or list, optional (default is None)) – If dataset_list_multidataset is a list of DataSets, this can be used to specify names for the DataSets in the list. E.g., [“Time 0”, “Time 1”, “Time 3”] or [“Driving”,”NoDriving”].
allow_bad_circuits (bool, optional) – Whether or not the data is allowed to have zero total counts for any circuits in any of the passes. If False, an error will be raised when there are such unimplemented circuits. If True, the data from circuits that weren’t run in one or more of the passes will be discarded before any analysis is performed (equivalent to excluding them explicitly with the circuits input).
 run(self, significance=0.05, per_circuit_correction='Hochberg', aggregate_test_weighting=0.5, pass_alpha=True, verbosity=2)¶
Runs statistical hypothesis testing.
This detects whether there is statistically significant variation between the DataSets in this DataComparator. This performs hypothesis tests on the data from individual circuits, and a joint hypothesis test on all of the data. With the default settings, this is the method described and implemented in “Probing context-dependent errors in quantum processors”, by Rudinger et al. With non-default settings, this is some minor variation on that method.
Note that the default values of all the parameters are likely sufficient for most purposes.
 Parameters
significance (float in (0,1), optional (default is 0.05)) – The “global” statistical significance to implement the tests at. I.e., with the standard per_circuit_correction value (and some other values for this parameter), the probability that a sequence flagged as context dependent is actually from a context-independent circuit is no more than significance. Precisely, significance is what the “family-wise error rate” (FWER) of the full set of hypothesis tests (1 “aggregate test”, and 1 test per sequence) is controlled to, as long as per_circuit_correction is set to the default value, or another option that controls the FWER of the per-sequence comparison (see below).
per_circuit_correction (string, optional (default is 'Hochberg')) –
The multi-hypothesis test correction used for the per-circuit/sequence comparisons. (See “Probing context-dependent errors in quantum processors”, by Rudinger et al. for the details of what the per-circuit comparison is.) This can be any string that is an allowed value for the local_corrections input parameter of the HypothesisTest object. This includes:
’Hochberg’. This implements the Hochberg multi-test compensation technique. This is strictly the best method available in the code, if you wish to control the FWER, and it is the method described in “Probing context-dependent errors in quantum processors”, by Rudinger et al.
’Holms’. This implements the Holms multi-test compensation technique. This controls the FWER, and it results in a strictly less powerful test than the Hochberg correction.
’Bonferroni’. This implements the well-known Bonferroni multi-test compensation technique. This controls the FWER, and it results in a strictly less powerful test than the Hochberg correction.
’none’. This implements no multi-test compensation for the per-sequence comparisons, so they are all implemented at a “local” significance level that is altered from significance only by the (in-built) Bonferroni-like correction between the “aggregate” test and the per-sequence tests. This option does not control the FWER, and many sequences may be flagged as context dependent even if none are.
‘Benjamini-Hochberg’. This implements the Benjamini-Hochberg multi-test compensation technique. This does not control the FWER, and instead controls the “False Discovery Rate” (FDR); see, for example, https://en.wikipedia.org/wiki/False_discovery_rate. That means that the global significance is maintained for the test of “Is there any context dependence?”. I.e., one or more tests will trigger when there is no context dependence with at most a probability of significance. But, if one or more per-sequence tests trigger, then we are only guaranteed that (in expectation) no more than a fraction “local significance” of the circuits that have been flagged as context dependent actually aren’t. Here, “local significance” is the significance at which the per-sequence tests are, together, implemented, which is significance * (1 - aggregate_test_weighting) if the aggregate test doesn’t detect context dependence and significance if it does (as long as pass_alpha is True). This method is strictly more powerful than the Hochberg correction, but it controls a different, weaker quantity.
aggregate_test_weighting (float in [0,1], optional (default is 0.5)) – The weighting, in a generalized Bonferroni correction, to put on the “aggregate test” that jointly tests all of the data for context dependence (in contrast to the per-sequence tests). If this is 0 then the aggregate test is not implemented, and if it is 1 only the aggregate test is implemented (unless it triggers and pass_alpha is True).
pass_alpha (bool, optional (default is True)) – The aggregate test is implemented first, at the “local” significance defined by aggregate_test_weighting and significance (see above). If pass_alpha is True, then when the aggregate test triggers all the local significance for this test is passed on to the per-sequence tests (which are then jointly implemented with significance significance, locally corrected for the multi-test correction as specified above), and when the aggregate test doesn’t trigger this local significance isn’t passed on. If pass_alpha is False then the local significance of the aggregate test is never passed on. See “Probing context-dependent errors in quantum processors”, by Rudinger et al. (or the hypothesis testing literature) for discussions of why this “significance passing” still maintains a (global) FWER of significance. Note that the default value of True always results in a strictly more powerful test.
verbosity (int, optional (default is 2)) – If > 0 then a summary of the results of the tests is printed to screen. Otherwise, the various .get_…() methods need to be queried to obtain the results of the hypothesis tests.
 Returns
None
 tvd(self, circuit)¶
Returns the observed total variation distance (TVD) for the specified circuit.
This is only possible if the comparison is between two sets of data. See Eq. (19) in “Probing contextdependent errors in quantum processors”, by Rudinger et al. for the definition of this observed TVD.
This is a quantification of the “amount” of context dependence for this circuit (see also jsd(), sstvd() and ssjsd()).
 Parameters
circuit (Circuit) – The circuit to return the TVD of.
 Returns
float – The TVD for the specified circuit.
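The observed TVD between two contexts is half the summed absolute difference of the outcome frequencies. The function below is a minimal plain-dict sketch of that quantity (cf. Eq. (19) of the Rudinger et al. paper), not pyGSTi's implementation:

```python
def observed_tvd(counts1, counts2):
    """Total variation distance between the outcome frequency
    distributions of two count dictionaries for one circuit,
    taken in two different "contexts"."""
    n1, n2 = sum(counts1.values()), sum(counts2.values())
    outcomes = set(counts1) | set(counts2)
    # TVD = (1/2) * sum_k |f1_k - f2_k| over outcome frequencies
    return 0.5 * sum(abs(counts1.get(o, 0) / n1 - counts2.get(o, 0) / n2)
                     for o in outcomes)

print(observed_tvd({'0': 50, '1': 50}, {'0': 60, '1': 40}))  # ≈ 0.1
```
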
 sstvd(self, circuit)¶
Returns the “statistically significant total variation distance” (SSTVD) for the specified circuit.
This is only possible if the comparison is between two sets of data. The SSTVD is None if the circuit has not been found to have statistically significant variation. Otherwise it is equal to the observed TVD. See Eq. (20) and surrounding discussion in “Probing contextdependent errors in quantum processors”, by Rudinger et al., for more information.
This is a quantification of the “amount” of context dependence for this circuit (see also jsd(), tvd() and ssjsd()).
 Parameters
circuit (Circuit) – The circuit to return the SSTVD of.
 Returns
float – The SSTVD for the specified circuit.
 property maximum_sstvd(self)¶
Returns the maximum, over circuits, of the “statistically significant total variation distance” (SSTVD).
This is only possible if the comparison is between two sets of data. See the .sstvd() method for information on SSTVD.
 Returns
float – The maximum SSTVD over circuits.
 pvalue(self, circuit)¶
Returns the p-value for the log-likelihood ratio test for the specified circuit.
 Parameters
circuit (Circuit) – The circuit to return the pvalue of.
 Returns
float – The p-value of the specified circuit.
 property pvalue_pseudothreshold(self)¶
Returns the (multi-test-adjusted) statistical significance pseudothreshold for the per-sequence p-values.
The p-values under consideration are those obtained from the log-likelihood ratio test. This is a “pseudothreshold”, because it is data-dependent in general, but all the per-sequence p-values below this value are statistically significant. This quantity is given by Eq. (9) in “Probing context-dependent errors in quantum processors”, by Rudinger et al.
 Returns
float – The statistical significance pseudothreshold for the per-sequence p-values.
 llr(self, circuit)¶
Returns the log_likelihood ratio (LLR) for the input circuit.
This is the quantity defined in Eq (4) of “Probing contextdependent errors in quantum processors”, by Rudinger et al.
 Parameters
circuit (Circuit) – The circuit to return the LLR of.
 Returns
float – The LLR of the specified circuit.
 property llr_pseudothreshold(self)¶
Returns the statistical significance pseudothreshold for the per-sequence log-likelihood ratio (LLR).
This result has been multi-test-adjusted.
This is a “pseudothreshold”, because it is data-dependent in general, but all LLRs above this value are statistically significant. This quantity is given by Eq (10) in “Probing context-dependent errors in quantum processors”, by Rudinger et al.
 Returns
float – The statistical significance pseudothreshold for the per-sequence LLR.
 jsd(self, circuit)¶
Returns the observed JensenShannon divergence (JSD) between “contexts” for the specified circuit.
The JSD is a rescaling of the LLR, given by dividing the LLR by 2*N where N is the total number of counts (summed over contexts) for this circuit. This quantity is given by Eq (15) in “Probing contextdependent errors in quantum processors”, Rudinger et al.
This is a quantification of the “amount” of context dependence for this circuit (see also tvd(), sstvd() and ssjsd()).
 Parameters
circuit (Circuit) – The circuit to return the JSD of
 Returns
float – The JSD of the specified circuit.
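The JSD equals LLR / (2N) as stated above; equivalently, it can be computed directly as a count-weighted Jensen-Shannon divergence between the two contexts' outcome distributions. Below is a minimal sketch of that direct computation (using natural log, two contexts only), not pyGSTi's implementation:

```python
from math import log

def jsd(counts1, counts2):
    """Jensen-Shannon divergence (natural log) between the outcome
    distributions of two count dicts, weighting each context by its
    share of the total counts."""
    n1, n2 = sum(counts1.values()), sum(counts2.values())
    n = n1 + n2
    total = 0.0
    for o in set(counts1) | set(counts2):
        f1, f2 = counts1.get(o, 0) / n1, counts2.get(o, 0) / n2
        fm = (counts1.get(o, 0) + counts2.get(o, 0)) / n  # pooled frequency
        for w, f in ((n1 / n, f1), (n2 / n, f2)):
            if f > 0:
                total += w * f * log(f / fm)  # weighted KL to the pooled dist.
    return total

print(jsd({'0': 50, '1': 50}, {'0': 50, '1': 50}))  # 0.0 for identical data
```
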
 property jsd_pseudothreshold(self)¶
The statistical significance pseudothreshold for the JensenShannon divergence (JSD) between “contexts”.
This is a rescaling of the pseudothreshold for the LLR, returned by the method .llr_pseudothreshold; see that method for more details. This threshold is also given by Eq (17) in “Probing contextdependent errors in quantum processors”, by Rudinger et al.
Note that this pseudothreshold is not defined if the total number of counts (summed over contexts) for a sequence varies between sequences.
 Returns
float – The pseudothreshold for the JSD of a circuit, if welldefined.
 ssjsd(self, circuit)¶
Returns the “statistically significant Jensen-Shannon divergence” (SSJSD) between “contexts” for circuit.
This is the JSD of the circuit (see .jsd()), if the circuit has been found to be context dependent, and otherwise it is None. This quantity is the JSD version of the SSTVD given in Eq. (20) of “Probing context-dependent errors in quantum processors”, by Rudinger et al.
This is a quantification of the “amount” of context dependence for this circuit (see also tvd(), sstvd() and jsd()).
 Parameters
circuit (Circuit) – The circuit to return the SSJSD of.
 Returns
float – The SSJSD of the specified circuit (None if the circuit shows no statistically significant variation).
 property aggregate_llr(self)¶
Returns the “aggregate” log-likelihood ratio (LLR).
This value compares the null hypothesis of no context dependence in any sequence with the full model of arbitrary context dependence. This is the sum of the per-sequence LLRs, and it is defined in Eq (11) of “Probing context-dependent errors in quantum processors”, by Rudinger et al.
 Returns
float – The aggregate LLR.
 property aggregate_llr_threshold(self)¶
The (multi-test-adjusted) statistical significance threshold for the “aggregate” log-likelihood ratio (LLR).
Above this value, the LLR is significant. See .aggregate_llr for more details. This quantity is the LLR version of the quantity defined in Eq (14) of “Probing contextdependent errors in quantum processors”, by Rudinger et al.
 Returns
float – The threshold above which the aggregate LLR is statistically significant.
 property aggregate_pvalue(self)¶
Returns the p-value for the “aggregate” log-likelihood ratio (LLR).
This compares the null hypothesis of no context dependence in any sequence with the full model of arbitrary dependence. This LLR is defined in Eq (11) in “Probing context-dependent errors in quantum processors”, by Rudinger et al., and it is converted to a p-value via Wilks’ theorem (see discussion therein).
Note that this p-value is often zero to machine precision when there is context dependence, so a more useful number is often returned by aggregate_nsigma (that quantity is equivalent to this p-value but expressed on a different scale).
 Returns
float – The p-value of the aggregate LLR.
 property aggregate_pvalue_threshold(self)¶
The (multi-test-adjusted) statistical significance threshold for the p-value of the “aggregate” LLR.
Here, LLR refers to the log-likelihood ratio. Below this p-value the LLR would be deemed significant. See the .aggregate_pvalue method for more details.
 Returns
float – The statistical significance threshold for the p-value of the “aggregate” LLR.
 property aggregate_nsigma(self)¶
The number of standard deviations the “aggregate” LLR is above the context-independent mean.
More specifically, the number of standard deviations above the context-independent mean that the “aggregate” log-likelihood ratio (LLR) is. This quantity is defined in Eq (13) of “Probing context-dependent errors in quantum processors”, by Rudinger et al.
 Returns
float – The number of signed standard deviations of the aggregate LLR.
 property aggregate_nsigma_threshold(self)¶
The significance threshold above which the signed standard deviations of the aggregate LLR is significant.
The (multi-test-adjusted) statistical significance threshold for the signed standard deviations of the “aggregate” log-likelihood ratio (LLR). See the .aggregate_nsigma method for more details. This quantity is defined in Eq (14) of “Probing context-dependent errors in quantum processors”, by Rudinger et al.
 Returns
float – The statistical significance threshold above which the signed standard deviations of the aggregate LLR is significant.
 worst_circuits(self, number)¶
Returns the “worst” circuits, i.e. those with the smallest p-values.
 Parameters
number (int) – The number of circuits to return.
 Returns
list – A list of tuples containing the worst number circuits along with the corresponding p-values.
 class pygsti.data.FreeformDataSet(circuits=None, circuit_indices=None)¶
Bases:
object
An association between Circuits and arbitrary data.
 Parameters
circuits (list of (tuples or Circuits), optional) – Each element is a tuple of operation labels or a Circuit object. Indices for these strings are assumed to ascend from 0. These indices must correspond to the time series of spam-label indices (above). Only specify this argument OR circuit_indices, not both.
circuit_indices (ordered dictionary, optional) – An OrderedDict with keys equal to circuits (tuples of operation labels) and values equal to integer indices associating a row/element of counts with the circuit. Only specify this argument OR circuits, not both.
 to_dataframe(self, pivot_valuename=None, pivot_value='Value', drop_columns=False)¶
Create a Pandas dataframe with the data from this freeform dataset.
 Parameters
pivot_valuename (str, optional) – If not None, the resulting dataframe is pivoted using pivot_valuename as the column whose values name the pivoted table’s column names. If None and pivot_value is not None, “ValueName” is used.
pivot_value (str, optional) – If not None, the resulting dataframe is pivoted such that values of the pivot_value column are rearranged into new columns whose names are given by the values of the pivot_valuename column. If None and pivot_valuename is not None, “Value” is used.
drop_columns (bool or list, optional) – A list of column names to drop (prior to performing any pivot). If True appears in this list or is given directly, then all constantvalued columns are dropped as well. No columns are dropped when drop_columns == False.
 Returns
pandas.DataFrame
 __iter__(self)¶
 __len__(self)¶
 __contains__(self, circuit)¶
Test whether data set contains a given circuit.
 Parameters
circuit (tuple or Circuit) – A tuple of operation labels or a Circuit instance which specifies the circuit to check for.
 Returns
bool – whether circuit was found.
 __hash__(self)¶
Return hash(self).
 __getitem__(self, circuit)¶
 __setitem__(self, circuit, info_dict)¶
 __delitem__(self, circuit)¶
 keys(self)¶
Returns the circuits used as keys of this DataSet.
 Returns
list – A list of Circuit objects which index the data counts within this data set.
 items(self)¶
Iterator over (circuit, info_dict) pairs.
 Returns
Iterator
 values(self)¶
Iterator over infodicts for each circuit.
 Returns
Iterator
 copy(self)¶
Make a copy of this FreeformDataSet.
 Returns
FreeformDataSet
 class pygsti.data.HypothesisTest(hypotheses, significance=0.05, weighting='equal', passing_graph='Holms', local_corrections='Holms')¶
Bases:
object
A set of statistical hypothesis tests on a set of null hypotheses.
This object has not been carefully tested.
 Parameters
hypotheses (list or tuple) –
Specifies the set of null hypotheses. This should be a list containing elements that are either
A “label” for a hypothesis, which is just some hashable object such as a string.
A tuple of “nested hypotheses”, which are also just labels for some null hypotheses.
The elements of this list are then subject to multitest correction of the “closed test procedure” type, with the exact correction method specified by passing_graph. For each element that is itself a tuple of hypotheses, these hypotheses are then further corrected using the method specified by local_corrections.
significance (float in (0,1), optional) – The global significance level. If either there are no “nested hypotheses” or the correction used for the nested hypotheses locally controls the family-wise error rate (FWER) (such as if local_corrections='Holms'), then the hypothesis test encoded by this object will control the FWER to significance.
weighting (string or dict) – Specifies what proportion of significance is initially allocated to each element of hypotheses. If a string, must be ‘equal’. In this case, the local significance allocated to each element of hypotheses is significance/len(hypotheses). If not a string, a dictionary whereby each key is an element of hypotheses and each value is a non-negative number (the values will be normalized to sum to one inside the function).
passing_graph (string or numpy.array) –
Specifies where the local significance from each test in hypotheses that triggers is passed to. If a string, then it must be ‘Holms’. In this case a test that triggers passes its local significance to all the remaining hypotheses that have not yet triggered, split evenly over these hypotheses. If it is an array then its value at [i,j] is the proportion of the “local significance” that is passed from the hypothesis with index i (in the tuple hypotheses) to the hypothesis with index j if the hypothesis with index i is rejected (and if j has already been rejected, that proportion is redistributed over the other hypotheses that i is to pass its significance to). The only restriction on this array is that each row must sum to <= 1 (and it is suboptimal for a row to sum to less than 1).
Note that a nested hypothesis is not allowed to pass significance out of itself, so any rows that request doing this will be ignored. This is because a nested hypothesis represents a set of hypotheses that are to be jointly tested using some multi-test correction, and so it could only pass significance out if all of the hypotheses in that nested hypothesis are rejected. As this is unlikely in most use-cases, it has not been allowed for.
local_corrections (str, optional) –
The type of multi-test correction used for testing any nested hypotheses. After all of the “top level” testing has been implemented on all non-nested hypotheses, whatever “local” significance remains for each of the “nested hypotheses” is multi-test corrected using this procedure. Must be one of:
’Holms’. This implements the Holms multi-test compensation technique. This controls the FWER for each set of nested hypotheses (and so controls the global FWER, in combination with the “top level” corrections). This requires no assumptions about the null hypotheses.
’Bonferroni’. This implements the well-known Bonferroni multi-test compensation technique. This is a strictly less powerful test than the Hochberg correction.
Note that neither ‘Holms’ nor ‘Bonferroni’ gains any advantage from being implemented using “nesting”: if all the hypotheses were put into the “top level”, the same corrections could be achieved.
’Hochberg’. This implements the Hochberg multi-test compensation technique. It is not a “closed test procedure”, so it is not something that can be implemented in the top level. To be provably valid, it is necessary for the p-values of the nested hypotheses to be non-negatively dependent. When that is true, this is strictly better than the Holms and Bonferroni corrections whilst still controlling the FWER.
’none’. This implements no multi-test compensation. This option does not control the FWER of the nested hypotheses, so it will generally not control the global FWER as specified.
‘Benjamini-Hochberg’. This implements the Benjamini-Hochberg multi-test compensation technique. This does not control the FWER of the nested hypotheses, and instead controls the “False Discovery Rate” (FDR); see wikipedia. That means that the global significance is maintained in the sense that the probability of one or more tests triggering is at most significance. But, if one or more tests are triggered in a particular nested hypothesis test, we are only guaranteed that (in expectation) no more than a fraction “local significance” of the triggered tests are false alarms. This method is strictly more powerful than the Hochberg correction, but it controls a different, weaker quantity.
 _initialize_to_weighted_holms_test(self)¶
Initializes the passing graph to the weighted Holms test.
 add_pvalues(self, pvalues)¶
Insert the pvalues for the hypotheses.
 Parameters
pvalues (dict) – A dictionary specifying the pvalue for each hypothesis.
 Returns
None
 run(self)¶
Implements the multiple hypothesis testing routine encoded by this object.
This populates the self.hypothesis_rejected dictionary, that shows which hypotheses can be rejected using the procedure specified.
 Returns
None
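The ‘Holms’ correction named above is the standard Holm step-down procedure: sort the p-values ascending and reject the i-th smallest while it is at most significance/(m-i). The following is a minimal self-contained sketch of that procedure (the function and hypothesis labels are illustrative, not pyGSTi's internals):

```python
def holm_rejections(pvalues, significance=0.05):
    """Holm step-down multiple-test procedure: returns a dict mapping
    each hypothesis label to whether it is rejected at the given
    family-wise significance level."""
    m = len(pvalues)
    rejected = {h: False for h in pvalues}
    for i, (hyp, p) in enumerate(sorted(pvalues.items(), key=lambda kv: kv[1])):
        if p <= significance / (m - i):
            rejected[hyp] = True
        else:
            break  # step-down: once one test fails, all larger p-values fail
    return rejected

pvals = {'circuit_A': 0.001, 'circuit_B': 0.02, 'circuit_C': 0.4}
print(holm_rejections(pvals))  # A and B rejected, C not
```
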
 _implement_nested_hypothesis_test(self, hypotheses, significance, correction='Holms')¶
Todo
 pygsti.data.simulate_data(model_or_dataset, circuit_list, num_samples, sample_error='multinomial', seed=None, rand_state=None, alias_dict=None, collision_action='aggregate', record_zero_counts=True, comm=None, mem_limit=None, times=None)¶
Creates a DataSet using the probabilities obtained from a model.
 Parameters
model_or_dataset (Model or DataSet object) – The source of the underlying probabilities used to generate the data. If a Model, the model whose probabilities generate the data. If a DataSet, the data set whose frequencies generate the data.
circuit_list (list of (tuples or Circuits) or ExperimentDesign or None) – Each tuple or Circuit contains operation labels and specifies a gate sequence whose counts are included in the returned DataSet, e.g.
[ (), ('Gx',), ('Gx','Gy') ]
If an ExperimentDesign, then the design’s .all_circuits_needing_data list is used as the circuit list.
num_samples (int or list of ints or None) – The simulated number of samples for each circuit. This only has effect when sample_error == "binomial" or "multinomial". If an integer, all circuits have this number of total samples. If a list, integer elements specify the number of samples for the corresponding circuit. If None, then model_or_dataset must be a DataSet, and total counts are taken from it (on a per-circuit basis).
sample_error (string, optional) –
What type of sample error is included in the counts. Can be:
”none”  no sample error: counts are floating point numbers such that the exact probability can be found by the ratio of count / total.
”clip”  no sample error, but clip probabilities to [0,1] so, e.g., counts are always positive.
”round”  same as “clip”, except counts are rounded to the nearest integer.
”binomial”  the number of counts is taken from a binomial distribution. Distribution has parameters p = (clipped) probability of the circuit and n = number of samples. This can only be used when there are exactly two SPAM labels in model_or_dataset.
”multinomial”  counts are taken from a multinomial distribution. Distribution has parameters p_k = (clipped) probability of the gate string using the kth SPAM label and n = number of samples.
seed (int, optional) – If not None, a seed for numpy’s random number generator, which is used to sample from the binomial or multinomial distribution.
rand_state (numpy.random.RandomState) – A RandomState object to generate samples from. Can be useful to set instead of seed if you want reproducible distribution samples across multiple random function calls but you don’t want to bother with manually incrementing seeds between those calls.
alias_dict (dict, optional) – A dictionary mapping single operation labels into tuples of one or more other operation labels which translate the given circuits before values are computed using model_or_dataset. The resulting Dataset, however, contains the untranslated circuits as keys.
collision_action ({"aggregate", "keepseparate"}) – Determines how duplicate circuits are handled by the resulting DataSet. Please see the constructor documentation for DataSet.
record_zero_counts (bool, optional) – Whether zerocounts are actually recorded (stored) in the returned DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.
comm (mpi4py.MPI.Comm, optional) – When not None, an MPI communicator for distributing the computation across multiple processors and ensuring that the same dataset is generated on each processor.
mem_limit (int, optional) – A rough memory limit in bytes which is used to determine job allocation when there are multiple processors.
times (iterable, optional) – When not None, a list of timestamps at which data should be sampled. num_samples samples will be simulated at each time value, meaning that each circuit in circuit_list will be evaluated with the given time value as its start time.
 Returns
DataSet – A static data set filled with counts for the specified circuits.
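The sample_error="multinomial" behavior amounts to drawing each circuit's counts from a multinomial distribution over its outcome probabilities. A minimal stdlib sketch of that sampling step for a single circuit is below (pyGSTi itself uses numpy's random machinery; the function name here is illustrative):

```python
import random
from collections import Counter

def sample_counts(probs, num_samples, seed=None):
    """Draw multinomially distributed counts for one circuit, given a
    dict of outcome probabilities that sum to 1."""
    rng = random.Random(seed)
    outcomes = list(probs)
    draws = rng.choices(outcomes, weights=[probs[o] for o in outcomes],
                        k=num_samples)
    counts = Counter(draws)
    return {o: counts.get(o, 0) for o in outcomes}  # keep zero counts too

counts = sample_counts({'0': 0.9, '1': 0.1}, num_samples=100, seed=0)
print(counts)  # per-outcome counts, always summing to num_samples
```
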
 pygsti.data._adjust_probabilities_inbounds(ps, tol)¶
 pygsti.data._adjust_unit_sum(ps, tol)¶
 pygsti.data._sample_distribution(ps, sample_error, nSamples, rndm_state)¶
 pygsti.data.aggregate_dataset_outcomes(dataset, label_merge_dict, record_zero_counts=True)¶
Creates a DataSet which merges certain outcomes in input DataSet.
This is used, for example, to aggregate a 2-qubit, 4-outcome DataSet into a 1-qubit, 2-outcome DataSet.
 Parameters
dataset (DataSet object) – The input DataSet whose results will be simplified according to the rules set forth in label_merge_dict
label_merge_dict (dictionary) – The dictionary whose keys define the new DataSet outcomes, and whose items are lists of input DataSet outcomes that are to be summed together. For example, if a two-qubit DataSet has outcome labels “00”, “01”, “10”, and “11”, and we want to “aggregate out” the second qubit, we could use label_merge_dict = {‘0’:[‘00’,’01’], ‘1’:[‘10’,’11’]}. When doing this, however, it may be better to use :function:`filter_dataset`, which also updates the circuits.
record_zero_counts (bool, optional) – Whether zerocounts are actually recorded (stored) in the returned (merged) DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.
 Returns
merged_dataset (DataSet object) – The DataSet with outcomes merged according to the rules given in label_merge_dict.
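The merging rule can be sketched on a plain dict of counts for a single circuit, using the two-qubit label_merge_dict example above (a hypothetical helper, not the DataSet-returning pyGSTi function):

```python
def merge_outcomes(counts, label_merge_dict):
    """Sum input outcome counts into the new outcome labels defined by
    label_merge_dict, mirroring the aggregation rule described above."""
    return {new: sum(counts.get(old, 0) for old in olds)
            for new, olds in label_merge_dict.items()}

counts = {'00': 10, '01': 20, '10': 30, '11': 40}
merge_dict = {'0': ['00', '01'], '1': ['10', '11']}  # aggregate out qubit 2
print(merge_outcomes(counts, merge_dict))  # {'0': 30, '1': 70}
```
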
 pygsti.data._create_qubit_merge_dict(num_qubits, qubits_to_keep)¶
Creates a dictionary appropriate for use with :function:`aggregate_dataset_outcomes`.
The returned dictionary instructs aggregate_dataset_outcomes to aggregate all but the specified qubits_to_keep when the outcome labels are those of num_qubits qubits (i.e. strings of 0’s and 1’s).
 Parameters
num_qubits (int) – The total number of qubits
qubits_to_keep (list) – A list of integers specifying which qubits should be kept, that is, not aggregated, when the returned dictionary is passed to aggregate_dataset_outcomes.
 Returns
dict
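The shape of the returned dictionary can be reproduced with a short sketch (an assumed reimplementation for illustration, not the pyGSTi source):

```python
from itertools import product

# Illustrative sketch of the merge dictionary built by
# _create_qubit_merge_dict: group all n-qubit bitstrings by the
# values of the kept qubits.
def qubit_merge_dict(num_qubits, qubits_to_keep):
    merge = {}
    for bits in product('01', repeat=num_qubits):
        full = ''.join(bits)                            # e.g. '01'
        key = ''.join(full[q] for q in qubits_to_keep)  # kept qubits only
        merge.setdefault(key, []).append(full)
    return merge

# Keep only qubit 0 of a 2-qubit system:
print(qubit_merge_dict(2, [0]))
# {'0': ['00', '01'], '1': ['10', '11']}
```

Passing this dictionary to aggregate_dataset_outcomes then sums away the discarded qubits' outcomes.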
 pygsti.data._create_merge_dict(indices_to_keep, outcome_labels)¶
Creates a dictionary appropriate for use with :function:`aggregate_dataset_outcomes`.
Each element of outcome_labels should be an n-character string (or a 1-tuple of such a string). The returned dictionary's keys will be all the unique results of keeping only the characters indexed by indices_to_keep from each outcome label. The dictionary's values will be a list of all the original outcome labels which reduce to the key value when the characters not in indices_to_keep are removed.
For example, if outcome_labels == ['00','01','10','11'] and indices_to_keep == [1] then this function returns the dict {'0': ['00','10'], '1': ['01','11']}.
Note: if the elements of outcome_labels are 1-tuples then so are the elements of the returned dictionary's values.
 Parameters
indices_to_keep (list) – A list of integer indices specifying which character positions should be kept (i.e. not aggregated together by aggregate_dataset_outcomes).
outcome_labels (list) – A list of the outcome labels to potentially merge. This can be a list of strings or of 1tuples containing strings.
 Returns
dict
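The character-index reduction described above can be sketched as follows (an assumed illustration of the documented behavior, not the pyGSTi source):

```python
# Illustrative sketch of _create_merge_dict: group outcome labels by the
# characters at the kept index positions, preserving 1-tuple wrapping.
def create_merge_dict(indices_to_keep, outcome_labels):
    merge = {}
    for lbl in outcome_labels:
        s = lbl[0] if isinstance(lbl, tuple) else lbl        # unwrap 1-tuples
        key = ''.join(s[i] for i in indices_to_keep)
        if isinstance(lbl, tuple):
            key = (key,)                                     # keep tuple-ness
        merge.setdefault(key, []).append(lbl)
    return merge

# The example from the docstring above:
print(create_merge_dict([1], ['00', '01', '10', '11']))
# {'0': ['00', '10'], '1': ['01', '11']}
```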
 pygsti.data.filter_dataset(dataset, sectors_to_keep, sindices_to_keep=None, new_sectors=None, idle=((),), record_zero_counts=True, filtercircuits=True)¶
Creates a DataSet that is the restriction of dataset to sectors_to_keep.
This function aggregates (sums) outcomes in dataset which differ only in sectors (usually qubits - see below) not in sectors_to_keep, and removes any operation labels which act specifically on sectors not in sectors_to_keep (e.g. an idle gate acting on all sectors, because its .sslbls is None, will not be removed).
Here "sectors" are state-space labels, present in the circuits of dataset. Each sector also corresponds to a particular character position within the outcome labels of dataset. Thus, for this function to work, the outcome labels of dataset must all be 1-tuples whose sole element is an n-character string such that each character represents the outcome of a single sector. If the state-space labels are integers, then they can serve as both a label and an outcome-string position. The argument new_sectors may be given to rename the kept state-space labels in the returned DataSet's circuits.
A typical case is when the state space is that of n qubits, and the state-space labels are the integers 0 to n-1. As stated above, in this case there is no need to specify sindices_to_keep. One may want to "rebase" the indices to 0 in the returned data set using new_sectors (e.g. sectors_to_keep == [4,5,6] and new_sectors == [0,1,2]).
 Parameters
dataset (DataSet object) – The input DataSet whose data will be processed.
sectors_to_keep (list or tuple) – The statespace labels (strings or integers) of the “sectors” to keep in the returned DataSet.
sindices_to_keep (list or tuple, optional) – The 0-based indices of the labels in sectors_to_keep which give the positions of the corresponding letters in each outcome string (see above). If the state-space labels are integers (labeling qubits) that are also letter positions, then this may be left as None. For example, if the outcome strings of dataset are '00', '01', '10', and '11', and the first position refers to qubit "Q1" and the second to qubit "Q2" (present in operation labels), then to extract just the "Q2" data, sectors_to_keep should be ["Q2"] and sindices_to_keep should be [1].
new_sectors (list or tuple, optional) – New sector names to map the elements of sectors_to_keep onto in the output DataSet's circuits. None means the labels are not renamed. This can be useful if, for instance, you want to run a 2-qubit protocol that expects the qubits to be labeled "0" and "1" on qubits "4" and "5" of a larger set. Simply set sectors_to_keep == [4,5] and new_sectors == [0,1].
idle (string or Label, optional) – The operation label to be used when there are no kept components of a “layer” (element) of a circuit.
record_zero_counts (bool, optional) – Whether zero counts present in the original dataset are recorded (stored) in the returned (filtered) DataSet. If False, then such zero counts are ignored, except for potentially registering new outcome labels.
filtercircuits (bool, optional) – Whether or not to “filter” the circuits, by removing gates that act outside of the sectors_to_keep.
 Returns
filtered_dataset (DataSet object) – The DataSet with outcomes and circuits filtered as described above.
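The outcome-aggregation half of this operation can be pictured with plain count dictionaries (a hypothetical sketch; the real function also rewrites the circuits' operation labels):

```python
# Sketch of restricting outcomes to the kept sectors: keep only the
# characters at sindices_to_keep in each outcome string, summing counts
# that become identical (illustrative helper, not the pyGSTi source).
def restrict_outcomes(counts, sindices_to_keep):
    restricted = {}
    for outcome, n in counts.items():
        key = ''.join(outcome[i] for i in sindices_to_keep)
        restricted[key] = restricted.get(key, 0) + n
    return restricted

# Keep only the second sector (e.g. qubit "Q2" at string position 1):
counts = {'00': 20, '01': 30, '10': 10, '11': 40}
print(restrict_outcomes(counts, [1]))
# {'0': 30, '1': 70}
```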
 pygsti.data.trim_to_constant_numtimesteps(ds)¶
Trims a DataSet so that each circuit's data comprises the same number of timesteps.
Returns a new dataset that has data for the same number of time steps for every circuit. This is achieved by discarding, for every circuit, all time-series data with a time-step index beyond 'min_timestep_index', where 'min_timestep_index' is the minimum number of time steps over circuits.
 Parameters
ds (DataSet) – The dataset to trim.
 Returns
DataSet – The trimmed dataset, obtained by potentially discarding some of the data.
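The trimming rule amounts to truncating every circuit's series to the shortest length, sketched here with each circuit's data represented as a plain list of (time, outcome) records (a hypothetical layout, not the DataSet internals):

```python
# Sketch of trim_to_constant_numtimesteps on a dict of time-series lists.
def trim_to_constant_len(series_by_circuit):
    """Truncate every circuit's series to the minimum series length."""
    min_steps = min(len(s) for s in series_by_circuit.values())
    return {c: s[:min_steps] for c, s in series_by_circuit.items()}

data = {'c1': [(0.0, '0'), (1.0, '1'), (2.0, '0')],
        'c2': [(0.0, '1'), (1.0, '1')]}
print(trim_to_constant_len(data))
# every circuit now has exactly 2 timesteps
```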
 pygsti.data._subsample_timeseries_data(ds, step)¶
Creates a DataSet where each circuit's data is subsampled.
Returns a new dataset where, for every circuit, we only keep the data at every 'step' timestep. Specifically, the outcomes at the i-th time for each circuit are kept for each i such that i modulo 'step' is zero.
 Parameters
ds (DataSet) – The dataset to subsample
step (int) – The subsampling time step. Only data at every step increment in time is kept.
 Returns
DataSet – The subsampled dataset.
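The "i modulo step is zero" rule is just a strided slice, sketched here on the same list-based layout (hypothetical, not the DataSet internals):

```python
# Sketch of _subsample_timeseries_data: keep the i-th record of each
# circuit's series whenever i % step == 0, i.e. a stride-'step' slice.
def subsample(series_by_circuit, step):
    return {c: s[::step] for c, s in series_by_circuit.items()}

data = {'c1': [(t, '0') for t in range(6)]}
print(subsample(data, 2))
# {'c1': [(0, '0'), (2, '0'), (4, '0')]}
```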
 pygsti.data.make_rpe_data_set(model_or_dataset, string_list_d, num_samples, sample_error='binomial', seed=None)¶
Generate a fake RPE DataSet using the probabilities obtained from a model.
This is a thin wrapper for pygsti.data.simulate_data, changing the default behavior of sample_error and taking a dictionary of circuits as input.
 Parameters
model_or_dataset (Model or DataSet object) – If a Model, the model whose probabilities generate the data. If a DataSet, the data set whose frequencies generate the data.
string_list_d (Dictionary of list of (tuples or Circuits)) – Each tuple or Circuit contains operation labels and specifies a gate sequence whose counts are included in the returned DataSet. The dictionary must have the key ‘totalStrList’; easiest if this dictionary is generated by make_rpe_string_list_d.
num_samples (int or list of ints or None) – The simulated number of samples for each circuit. This only has effect when sample_error == “binomial” or “multinomial”. If an integer, all circuits have this number of total samples. If a list, integer elements specify the number of samples for the corresponding circuit. If None, then model_or_dataset must be a DataSet, and total counts are taken from it (on a percircuit basis).
sample_error (string, optional) –
What type of sample error is included in the counts. Can be:
"none" - no sample error: counts are floating-point numbers such that the exact probability can be found by the ratio count / total.
"round" - same as "none", except counts are rounded to the nearest integer.
"binomial" - the number of counts is taken from a binomial distribution. The distribution has parameters p = probability of the circuit and n = number of samples. This can only be used when there are exactly two SPAM labels in model_or_dataset.
"multinomial" - counts are taken from a multinomial distribution. The distribution has parameters p_k = probability of the circuit using the k-th SPAM label and n = number of samples. This should not be used for RPE.
seed (int, optional) – If not None, a seed for numpy’s random number generator, which is used to sample from the binomial or multinomial distribution.
 Returns
DataSet – A static data set filled with counts for the specified circuits.
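A minimal sketch of the "binomial" sample-error mode described above, drawing two-outcome counts from Binomial(n, p) via n Bernoulli trials with the standard library (pyGSTi itself uses numpy's random number generator; the helper name here is hypothetical):

```python
import random

# Hypothetical sketch of binomial sample error for a two-outcome circuit:
# each of num_samples shots yields outcome '1' with probability p.
def sample_counts(p, num_samples, seed=None):
    rng = random.Random(seed)
    ones = sum(rng.random() < p for _ in range(num_samples))
    return {'0': num_samples - ones, '1': ones}

counts = sample_counts(0.25, 1000, seed=0)
print(counts)  # counts sum to 1000; '1' is near 250 on average
```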