LASSO correlation

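The least absolute shrinkage and selection operator (LASSO) is a linear regression technique that penalises the sum of the absolute values of the model coefficients. This penalty drives the coefficients of uninformative inputs to exactly zero. The fit therefore selects a sparse subset of inputs, which makes it well suited to tracking slow correlations among a large collection of time-domain data streams. For gravitational-wave detector characterisation, this technique is used to find correlations between environmental sensors and noise in the primary strain channel.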
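As a minimal illustration of how the L1 penalty performs channel selection, the NumPy sketch below fits a synthetic primary signal that depends on only two of twenty hypothetical auxiliary channels. It minimises (1/2n)||y - Xw||^2 + alpha*||w||_1, the objective convention used by scikit-learn's `Lasso`, by cyclic coordinate descent. The function names and the synthetic data are assumptions made for illustration. This is not the gwdetchar implementation.

```python
import numpy as np


def soft_threshold(z, alpha):
    """Shrink z towards zero by alpha; values within alpha of zero become 0."""
    return np.sign(z) * max(abs(z) - alpha, 0.0)


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/2n) * ||y - X w||^2 + alpha * ||w||_1 by cyclic coordinate descent.

    X has shape (n_samples, n_channels) and y has shape (n_samples,);
    both are assumed centred, so no intercept is fitted.
    """
    n_samples, n_channels = X.shape
    w = np.zeros(n_channels)
    col_norms = (X ** 2).sum(axis=0) / n_samples
    residual = y - X @ w
    for _ in range(n_iter):
        for j in range(n_channels):
            if col_norms[j] == 0.0:
                continue  # an all-zero column cannot contribute to the fit
            # Partial residual: y minus the contribution of every channel except j
            residual += X[:, j] * w[j]
            rho = X[:, j] @ residual / n_samples
            w[j] = soft_threshold(rho, alpha) / col_norms[j]
            # Restore the full residual using the updated coefficient
            residual -= X[:, j] * w[j]
    return w


rng = np.random.default_rng(42)
n_samples, n_channels = 500, 20
aux = rng.standard_normal((n_samples, n_channels))
aux = (aux - aux.mean(axis=0)) / aux.std(axis=0)  # standardise each channel
# Synthetic primary driven only by channels 3 and 7, plus a little noise
primary = 2.0 * aux[:, 3] - 1.5 * aux[:, 7] + 0.1 * rng.standard_normal(n_samples)
primary -= primary.mean()

coef = lasso_coordinate_descent(aux, primary, alpha=0.1)
print(np.flatnonzero(coef))  # expected: [3 7]
```

Raising `alpha` zeroes more coefficients, and lowering it admits more channels. Choosing it from the data is the stated purpose of `find_alpha`.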

The core gwdetchar.lasso module provides the following functions:

find_outliers(ts[, N, method])

Find outliers within a TimeSeries

remove_outliers(ts[, N, method])

Find and remove outliers within a TimeSeries

fit(data, target[, alpha])

Fit some data to the target using a Lasso model

find_alpha(data, target)

Find the best alpha value to use for the given data

remove_flat(tsdict)

Remove flat timeseries from a TimeSeriesDict

remove_bad(tsdict)

Remove data that cannot be scaled from a TimeSeriesDict
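The preprocessing steps named by these functions can be sketched with NumPy under a few simple assumptions. A plain dict of arrays stands in for a TimeSeriesDict. "Bad" means containing non-finite samples, and "flat" means constant. An outlier is a sample more than `n_sigma` standard deviations from the mean, in the spirit of the `--remove-outliers` std. dev. limit. All function and channel names here are hypothetical, and gwdetchar's own functions may define these criteria differently. In particular, replacing outliers with the mean of the remaining samples is only one possible choice.

```python
import numpy as np


def drop_flat_and_bad(channels):
    """Keep only channels whose samples are all finite and not constant."""
    kept = {}
    for name, values in channels.items():
        values = np.asarray(values, dtype=float)
        if not np.all(np.isfinite(values)):
            continue  # NaN/inf samples: the channel cannot be standardised
        if np.ptp(values) == 0:
            continue  # flat channel: zero variance, nothing to correlate
        kept[name] = values
    return kept


def find_outliers_sigma(values, n_sigma=5.0):
    """Return indices of samples more than n_sigma standard deviations from the mean."""
    values = np.asarray(values, dtype=float)
    deviation = np.abs(values - values.mean())
    return np.flatnonzero(deviation > n_sigma * values.std())


def remove_outliers_sigma(values, n_sigma=5.0):
    """Return a copy with outliers replaced by the mean of the remaining samples."""
    values = np.array(values, dtype=float)  # np.array copies, so the input is untouched
    bad = find_outliers_sigma(values, n_sigma)
    if bad.size:
        good = np.ones(values.size, dtype=bool)
        good[bad] = False
        values[bad] = values[good].mean()
    return values


def standardise(values):
    """Scale to zero mean and unit variance before using as a regression input."""
    return (values - values.mean()) / values.std()


# Hypothetical auxiliary channels: one useful, one flat, one containing a NaN
rng = np.random.default_rng(0)
channels = {
    "AUX_GOOD": rng.standard_normal(1000),
    "AUX_FLAT": np.full(1000, 3.0),
    "AUX_NAN": np.append(rng.standard_normal(999), np.nan),
}
channels["AUX_GOOD"][500] = 50.0  # inject a single glitch

clean = drop_flat_and_bad(channels)
print(sorted(clean))  # ['AUX_GOOD']
print(find_outliers_sigma(clean["AUX_GOOD"]))  # [500]
scaled = standardise(remove_outliers_sigma(clean["AUX_GOOD"]))
```

Checking for non-finite samples before flatness keeps a NaN from silently propagating into the range test.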

The gwdetchar.lasso.plot module also provides functions for efficiently writing plots of LASSO data products:

plot.configure_mpl_tex()

Configure Matplotlib with LaTeX when using multiprocessing

plot.save_figure(fig, pngfile, **kwargs)

Save a figure

Command-line utility

Note

This utility requires authentication with LIGO.ORG credentials for archived frame data access.

gwdetchar.lasso

The gwdetchar.lasso command-line interface searches for long, slow correlations between one primary channel (typically gravitational-wave strain) and many (typically thousands of) auxiliary channels. For a full explanation of the available command-line arguments and options, run:

$ python -m gwdetchar.lasso --help
usage: python -m gwdetchar.lasso [-h] [-V] -i IFO [-j NPROC] [-J NPROC_PLOT]
                                 [-o OUTPUT_DIR] [-s SUMMARY_DIR]
                                 [-f CHANNEL_FILE] [-T {second,minute}]
                                 [-p PRIMARY_CHANNEL] [-pf PRIMARY_FILE]
                                 [-P PRIMARY_FRAMETYPE]
                                 [--primary-cache PRIMARY_CACHE]
                                 [-O REMOVE_OUTLIERS] [-R REMOVE_OUTLIERS_PF]
                                 [-t THRESHOLD] [--remove-bad-chans]
                                 [-b FLOW FHIGH FLOW FHIGH]
                                 [-x FILTER_PADDING] [-a ALPHA] [-C]
                                 [-c CLUSTER_COEFFICIENT]
                                 [-L LINE_SIZE_PRIMARY] [-l LINE_SIZE_AUX]
                                 [--low-noise-state-flag]
                                 [--segment-padding SEGMENT_PADDING]
                                 [--segment-min-length SEGMENT_MIN_LENGTH]
                                 [--intersect-data-segs]
                                 gpsstart gpsend

positional arguments:
  gpsstart              GPS start time or datetime of analysis
  gpsend                GPS end time or datetime of analysis

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -i IFO, --ifo IFO     IFO prefix for this analysis, default: None
  -j NPROC, --nproc NPROC
                        the number of processes to use when reading data,
                        default: 1
  -J NPROC_PLOT, --nproc-plot NPROC_PLOT
                        number of processes to use for plot rendering, will be
                        ignored if not using LaTeX (default: None)
  -o OUTPUT_DIR, --output-dir OUTPUT_DIR
                        output directory for plots (default: .)
  -s SUMMARY_DIR, --summary-dir SUMMARY_DIR
                        output directory for html summary pages (default: .)
  -f CHANNEL_FILE, --channel-file CHANNEL_FILE
                        path for channel file (default: None)
  -T {second,minute}, --trend-type {second,minute}
                        type of trend for correlation (default: minute)
  -p PRIMARY_CHANNEL, --primary-channel PRIMARY_CHANNEL
                        name of primary channel to use (default: {ifo}:DMT-
                        SNSH_EFFECTIVE_RANGE_MPC.mean)
  -pf PRIMARY_FILE, --primary-file PRIMARY_FILE
                        filepath of .gwf custom primary channel if using
                        custom channel (default: None)
  -P PRIMARY_FRAMETYPE, --primary-frametype PRIMARY_FRAMETYPE
                        frametype for --primary-channel, default: guess by
                        channel name (default: None)
  --primary-cache PRIMARY_CACHE
                        cache file for --primary-channel, default: None
                        (default: None)
  -O REMOVE_OUTLIERS, --remove-outliers REMOVE_OUTLIERS
                        Std. dev. limit for removing outliers (default: None)
  -R REMOVE_OUTLIERS_PF, --remove-outliers-pf REMOVE_OUTLIERS_PF
                        Fractional limit for removing outliers between 0 and 1
                        (default: None)
  -t THRESHOLD, --threshold THRESHOLD
                        threshold for making a plot (default: 0.0001)
  --remove-bad-chans    remove flat/bad channels (default: False)

Signal processing options:
  -b FLOW FHIGH FLOW FHIGH, --band-pass FLOW FHIGH FLOW FHIGH
                        lower and upper frequencies for bandpass on h(t)
                        (default: None)
  -x FILTER_PADDING, --filter-padding FILTER_PADDING
                        amount of time (seconds) to pad data for filtering
                        (default: 3.0)

Lasso options:
  -a ALPHA, --alpha ALPHA
                        alpha parameter for lasso fit (default: None)
  -C, --no-cluster      do not generate clustered channel plots (default:
                        False)
  -c CLUSTER_COEFFICIENT, --cluster-coefficient CLUSTER_COEFFICIENT
                        correlation coefficient threshold for clustering
                        (default: 0.85)
  -L LINE_SIZE_PRIMARY, --line-size-primary LINE_SIZE_PRIMARY
                        line width of primary channel (default: 1)
  -l LINE_SIZE_AUX, --line-size-aux LINE_SIZE_AUX
                        line width of auxilary channel (default: 0.75)

Segment processing options:
  --low-noise-state-flag
                        use low noise data quality flag for segments (default:
                        False)
  --segment-padding SEGMENT_PADDING
                        padding of time on either side of a data-quality
                        segment to remove (default: 1200)
  --segment-min-length SEGMENT_MIN_LENGTH
                        data-quality segments must be at least this many
                        seconds (default: 10800)
  --intersect-data-segs
                        intersect data quality segments with available data
                        (default: False)
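The two segment options can be illustrated with a small pure-Python sketch. It assumes that `--segment-padding` trims that many seconds from each end of every data-quality segment and that `--segment-min-length` is applied to the trimmed segment. Whether gwdetchar applies the length cut before or after padding is an assumption here, and the function name and GPS times are hypothetical.

```python
def condition_segments(segments, padding=1200, min_length=10800):
    """Trim `padding` seconds from both ends of each (start, end) segment,
    then keep only trimmed segments at least `min_length` seconds long."""
    kept = []
    for start, end in segments:
        start, end = start + padding, end - padding
        if end - start >= min_length:
            kept.append((start, end))
    return kept


# Hypothetical GPS segments (seconds) for a low-noise data-quality flag
segments = [
    (1000000000, 1000020000),  # 20000 s: survives trimming
    (1000030000, 1000035000),  # 5000 s: too short even before trimming
    (1000040000, 1000052000),  # 12000 s: only 9600 s remain after trimming
]
print(condition_segments(segments))  # [(1000001200, 1000018800)]
```

With the defaults and this ordering, a segment must span at least 10800 + 2 × 1200 = 13200 s before trimming to survive.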