LASSO correlation
The least absolute shrinkage and selection operator (LASSO) is a regularised regression technique from machine learning, used here to track slow correlations among a large collection of time-domain data streams. For gravitational-wave detector characterisation, it is used to find correlations between environmental sensors and noise in the primary strain channel.
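To illustrate why LASSO suits this kind of channel selection, the following is a minimal pure-Python sketch of LASSO regression by coordinate descent on synthetic data. It is not the gwdetchar implementation: the function names, the synthetic "channels", and the choice of alpha are all assumptions made for illustration. The point is that the L1 penalty drives the coefficients of unrelated channels to exactly zero, leaving only the channels that actually track the target.

```python
import math
import random


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; values within the penalty become 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def standardise(series):
    """Rescale a series to zero mean and unit (population) variance."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    return [(x - mean) / std for x in series]


def lasso_coordinate_descent(columns, target, alpha, n_iter=100):
    """Minimise (1 / 2n) * ||y - Xw||^2 + alpha * ||w||_1 (hypothetical helper).

    columns is a list of centred feature columns, one per auxiliary channel;
    target is the centred primary series.
    """
    n = len(target)
    weights = [0.0] * len(columns)
    residual = list(target)  # y - Xw, with w = 0 initially
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            norm = sum(c * c for c in col) / n
            # covariance of column j with the residual, excluding its own term
            rho = sum(c * r for c, r in zip(col, residual)) / n + weights[j] * norm
            new_weight = soft_threshold(rho, alpha) / norm
            delta = new_weight - weights[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                weights[j] = new_weight
    return weights


# Synthetic data (assumption): six auxiliary channels, of which only
# channels 1 and 4 contribute to the primary series.
rng = random.Random(0)
n_samples = 300
channels = [standardise([rng.gauss(0, 1) for _ in range(n_samples)]) for _ in range(6)]
noise = [rng.gauss(0, 0.1) for _ in range(n_samples)]
raw_target = [2.0 * a - 1.0 * b + e for a, b, e in zip(channels[1], channels[4], noise)]
mean_target = sum(raw_target) / n_samples
target = [y - mean_target for y in raw_target]

weights = lasso_coordinate_descent(channels, target, alpha=0.1)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print(selected)  # indices of channels with non-zero coefficients
```

A larger alpha removes more channels from the model, while a smaller alpha keeps more of them, which is why choosing alpha well matters when thousands of auxiliary channels are involved.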
The core gwdetchar.lasso module provides the following functions:
Find outliers within a …
Find and remove outliers within a …
Fit some data to the target using a Lasso model
Find the best alpha value to use for the given data
Remove flat timeseries from a …
Remove data that cannot be scaled from a …
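The outlier and flat-channel steps described above can be sketched in a few lines of standard-library Python. This is an illustrative approximation only: the helper names, the plain lists and dict standing in for time series containers, and the 5-sigma default are assumptions, not the signatures or behaviour of the gwdetchar.lasso functions.

```python
import statistics


def find_outliers_sketch(values, n_sigma=5.0):
    """Return indices of samples more than n_sigma standard deviations from the mean.

    Hypothetical helper for illustration, operating on a plain list of floats.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # a flat series has no outliers by this definition
    return [i for i, v in enumerate(values) if abs(v - mean) > n_sigma * std]


def remove_flat_sketch(channels):
    """Drop channels whose samples never change (dict of name -> list of floats)."""
    return {name: data for name, data in channels.items() if min(data) != max(data)}


# A slowly varying series with one glitch at index 40 (synthetic data).
values = [1.0 + 0.01 * (i % 5) for i in range(100)]
values[40] = 50.0
outliers = find_outliers_sketch(values)

# Channel names here are made up for the example.
channels = {"AUX-A": values, "AUX-B": [3.0] * 100}
kept = remove_flat_sketch(channels)

print(outliers, list(kept))
```

Flat channels carry no information for a regression and cannot be rescaled to unit variance, so dropping them before fitting avoids division by zero during standardisation.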
The gwdetchar.lasso.plot module also provides functions for efficiently writing plots of LASSO data products:
Configure Matplotlib with LaTeX when using multiprocessing
Save a figure
Command-line utility
Note
This utility requires authentication with LIGO.ORG credentials for archived frame data access.
gwdetchar.lasso
The gwdetchar.lasso command-line interface searches for long, slow correlations between one channel identified as the primary (typically gravitational-wave strain) and many auxiliary channels (typically thousands). For a full explanation of the available command-line arguments and options, run:
$ python -m gwdetchar.lasso --help
usage: python -m gwdetchar.lasso [-h] [-V] -i IFO [-j NPROC] [-J NPROC_PLOT]
                                 [-o OUTPUT_DIR] [-s SUMMARY_DIR]
                                 [-f CHANNEL_FILE] [-T {second,minute}]
                                 [-p PRIMARY_CHANNEL] [-pf PRIMARY_FILE]
                                 [-P PRIMARY_FRAMETYPE]
                                 [--primary-cache PRIMARY_CACHE]
                                 [-O REMOVE_OUTLIERS] [-R REMOVE_OUTLIERS_PF]
                                 [-t THRESHOLD] [--remove-bad-chans]
                                 [-b FLOW FHIGH FLOW FHIGH]
                                 [-x FILTER_PADDING] [-a ALPHA] [-C]
                                 [-c CLUSTER_COEFFICIENT]
                                 [-L LINE_SIZE_PRIMARY] [-l LINE_SIZE_AUX]
                                 [--low-noise-state-flag]
                                 [--segment-padding SEGMENT_PADDING]
                                 [--segment-min-length SEGMENT_MIN_LENGTH]
                                 [--intersect-data-segs]
                                 gpsstart gpsend

positional arguments:
  gpsstart              GPS start time or datetime of analysis
  gpsend                GPS end time or datetime of analysis

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -i IFO, --ifo IFO     IFO prefix for this analysis, default: None
  -j NPROC, --nproc NPROC
                        the number of processes to use when reading data,
                        default: 1
  -J NPROC_PLOT, --nproc-plot NPROC_PLOT
                        number of processes to use for plot rendering, will be
                        ignored if not using LaTeX (default: None)
  -o OUTPUT_DIR, --output-dir OUTPUT_DIR
                        output directory for plots (default: .)
  -s SUMMARY_DIR, --summary-dir SUMMARY_DIR
                        output directory for html summary pages (default: .)
  -f CHANNEL_FILE, --channel-file CHANNEL_FILE
                        path for channel file (default: None)
  -T {second,minute}, --trend-type {second,minute}
                        type of trend for correlation (default: minute)
  -p PRIMARY_CHANNEL, --primary-channel PRIMARY_CHANNEL
                        name of primary channel to use (default:
                        {ifo}:DMT-SNSH_EFFECTIVE_RANGE_MPC.mean)
  -pf PRIMARY_FILE, --primary-file PRIMARY_FILE
                        filepath of .gwf custom primary channel if using
                        custom channel (default: None)
  -P PRIMARY_FRAMETYPE, --primary-frametype PRIMARY_FRAMETYPE
                        frametype for --primary-channel, default: guess by
                        channel name (default: None)
  --primary-cache PRIMARY_CACHE
                        cache file for --primary-channel, default: None
                        (default: None)
  -O REMOVE_OUTLIERS, --remove-outliers REMOVE_OUTLIERS
                        Std. dev. limit for removing outliers (default: None)
  -R REMOVE_OUTLIERS_PF, --remove-outliers-pf REMOVE_OUTLIERS_PF
                        Fractional limit for removing outliers between 0 and 1
                        (default: None)
  -t THRESHOLD, --threshold THRESHOLD
                        threshold for making a plot (default: 0.0001)
  --remove-bad-chans    remove flat/bad channels (default: False)

Signal processing options:
  -b FLOW FHIGH FLOW FHIGH, --band-pass FLOW FHIGH FLOW FHIGH
                        lower and upper frequencies for bandpass on h(t)
                        (default: None)
  -x FILTER_PADDING, --filter-padding FILTER_PADDING
                        amount of time (seconds) to pad data for filtering
                        (default: 3.0)

Lasso options:
  -a ALPHA, --alpha ALPHA
                        alpha parameter for lasso fit (default: None)
  -C, --no-cluster      do not generate clustered channel plots (default:
                        False)
  -c CLUSTER_COEFFICIENT, --cluster-coefficient CLUSTER_COEFFICIENT
                        correlation coefficient threshold for clustering
                        (default: 0.85)
  -L LINE_SIZE_PRIMARY, --line-size-primary LINE_SIZE_PRIMARY
                        line width of primary channel (default: 1)
  -l LINE_SIZE_AUX, --line-size-aux LINE_SIZE_AUX
                        line width of auxilary channel (default: 0.75)

Segment processing options:
  --low-noise-state-flag
                        use low noise data quality flag for segments (default:
                        False)
  --segment-padding SEGMENT_PADDING
                        padding of time on either side of a data-quality
                        segment to remove (default: 1200)
  --segment-min-length SEGMENT_MIN_LENGTH
                        data-quality segments must be at least this many
                        seconds (default: 10800)
  --intersect-data-segs
                        intersect data quality segments with available data
                        (default: False)
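
As a sketch of how the options listed in the help output combine into one call, the following Python snippet assembles a command line for a one-day analysis. Every flag used appears in the help output, but the values are hypothetical: the interferometer prefix H1, the GPS times, the process count, and the output directory name are placeholders chosen for illustration. The snippet only builds and prints the command. It does not run it, since the utility needs LIGO.ORG credentials and access to archived frame data.

```python
import shlex

# Hypothetical values for illustration only
ifo = "H1"
gps_start = 1262304018
gps_end = gps_start + 86400  # one day of data

command = [
    "python", "-m", "gwdetchar.lasso",
    "--ifo", ifo,                      # required: IFO prefix
    "--nproc", "4",                    # processes used when reading data
    "--output-dir", "lasso-plots",     # where plots are written
    "--trend-type", "minute",          # minute trends (the documented default)
    "--remove-outliers", "5",          # standard-deviation limit for outliers
    "--remove-bad-chans",              # drop flat/bad channels
    str(gps_start), str(gps_end),      # positional GPS start and end times
]

command_line = shlex.join(command)
print(command_line)
```

Where credentials and data access are available, the same list can be passed to `subprocess.run(command, check=True)` from the standard library instead of being printed.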