De-Doppler Analysis

In this code, the following terminology is used:

Hit: Single strong narrowband signal in an observation.
Event: Strong narrowband signal that is associated with multiple hits across ON observations.

Note

This code works for .dat files that were produced by seti_event.py after turboSETI version 0.8.2 and blimpy version 1.1.7 (~mid 2019). Drift rates from earlier versions were recorded with the wrong sign, so for those files the drift rate sign must be flipped in the make_table function.
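A minimal sketch of that sign correction follows. It assumes legacy hits have already been loaded as a list of dicts with a hypothetical "DriftRate" key in Hz/s; it only illustrates the idea and is not the make_table code.

```python
def flip_drift_sign(hits, key="DriftRate"):
    """Return copies of legacy hit records with the drift-rate sign inverted.

    `hits` is a list of dicts; `key` (a hypothetical column name) holds the
    drift rate in Hz/s as written by the older turboSETI/blimpy versions.
    """
    corrected = []
    for hit in hits:
        fixed = dict(hit)        # shallow copy so the caller's records stay untouched
        fixed[key] = -hit[key]   # undo the incorrectly recorded sign
        corrected.append(fixed)
    return corrected


legacy_hits = [{"DriftRate": 0.25, "SNR": 12.0}, {"DriftRate": -1.5, "SNR": 30.1}]
fixed_hits = flip_drift_sign(legacy_hits)
print([h["DriftRate"] for h in fixed_hits])  # [-0.25, 1.5]
```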

Authors

Version 2.0 - Sofia Sheikh (ssheikhmsa@gmail.com) and Karen Perez (kip2105@columbia.edu)
Version 1.0 - Emilio Enriquez (jeenriquez@gmail.com)

plotSETI Command Main Program

Main program module for the plotSETI executable. It automates two large functions:

find_event_pipeline()
plot_event_pipeline()

turbo_seti.find_event.run_pipelines.clean_event_stuff(path_out_dir)

Clean up the output directory of old artifacts.

Parameters:	path_out_dir (str) – Output path of directory holding old artifacts.
Returns:	None.
Return type:	None

turbo_seti.find_event.run_pipelines.count_text_lines(path_list_file)

Count the text lines in a file.

Parameters:	path_list_file (str) – Path of the file containing a list of text lines.
Returns:	Count of text lines.
Return type:	int
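A minimal sketch of counting the entries of a .lst file is shown below. The helper name is hypothetical, and skipping blank lines is an assumption that may differ from what count_text_lines() itself does.

```python
import os
import tempfile


def count_nonblank_lines(path_list_file):
    """Count the non-blank lines in a text file such as a .lst file list.

    Hypothetical helper: assumes blank lines should not be counted.
    """
    with open(path_list_file, "r", encoding="utf-8") as fh:
        return sum(1 for line in fh if line.strip())


# Usage: a throwaway .lst file with three entries and a trailing blank line.
with tempfile.TemporaryDirectory() as tmp:
    lst_path = os.path.join(tmp, "dat_files.lst")
    with open(lst_path, "w", encoding="utf-8") as fh:
        fh.write("A1.dat\nB.dat\nA2.dat\n\n")
    n_lines = count_nonblank_lines(lst_path)

print(n_lines)  # 3
```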

turbo_seti.find_event.run_pipelines.execute_pipelines(args)

Interface to the pipeline functions, called by main().

Parameters:	args (dict) –

turbo_seti.find_event.run_pipelines.main(args=None)

This is the entry point to the plotSETI executable.

Parameters:	args (dict) –

turbo_seti.find_event.run_pipelines.make_lists(path_h5_dir, path_h5_list, path_dat_dir, path_dat_list)

Create a list of .h5 files and a list of .dat files.

Parameters:

path_h5_dir (str) – Directory where the h5 files reside.
path_h5_list (str) – Path of the output list of h5 files.
path_dat_dir (str) – Directory where the dat files reside.
path_dat_list (str) – Path of the output list of dat files.

Returns:	The number of files in the cadence on success; 0 on failure.
Return type:	int
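The behaviour described for make_lists() can be sketched with glob. This is a hypothetical re-implementation, assuming the .h5 and .dat files pair up one-to-one and sort into the same order; the real function may validate its inputs differently.

```python
import glob
import os
import tempfile


def make_lists_sketch(path_h5_dir, path_h5_list, path_dat_dir, path_dat_list):
    """Write sorted .h5 and .dat file lists; return the cadence size, or 0 on failure.

    Hypothetical sketch: assumes every .h5 file has exactly one matching .dat file.
    """
    h5_files = sorted(glob.glob(os.path.join(path_h5_dir, "*.h5")))
    dat_files = sorted(glob.glob(os.path.join(path_dat_dir, "*.dat")))
    if not h5_files or len(h5_files) != len(dat_files):
        return 0  # failure: nothing found, or the two lists do not pair up
    for out_path, names in ((path_h5_list, h5_files), (path_dat_list, dat_files)):
        with open(out_path, "w", encoding="utf-8") as fh:
            fh.write("\n".join(names) + "\n")
    return len(h5_files)


# Usage: two empty placeholder observations, each with an .h5 and a .dat file.
with tempfile.TemporaryDirectory() as tmp:
    for stem in ("obs1", "obs2"):
        for ext in (".h5", ".dat"):
            open(os.path.join(tmp, stem + ext), "w").close()
    n_cadence = make_lists_sketch(tmp, os.path.join(tmp, "h5.lst"),
                                  tmp, os.path.join(tmp, "dat.lst"))
    with open(os.path.join(tmp, "dat.lst"), encoding="utf-8") as fh:
        dat_lines = fh.read().splitlines()

# An empty directory is reported as a failure.
with tempfile.TemporaryDirectory() as empty:
    n_fail = make_lists_sketch(empty, os.path.join(empty, "h5.lst"),
                               empty, os.path.join(empty, "dat.lst"))
```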

Find Event Pipeline

Front-facing script to find drifting, narrowband events in a set of generalized cadences of ON-OFF radio SETI observations.

The main function in this file, find_event_pipeline(), calls find_events() from find_event.py to read a list of turboSETI .dat files. It then finds events within this group of files.

class turbo_seti.find_event.find_event_pipeline.PathRecord(path_dat, tstart, source_name, fch1, foff, nchans)

Definition of a DAT record.

turbo_seti.find_event.find_event_pipeline.close_enough(x, y)

Determine whether x and y are close enough to be considered roughly equal.
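One way to express such a check is math.isclose() with a relative tolerance. The 1e-3 tolerance and the helper name are assumptions for illustration; the tolerance used by close_enough() is not documented here. A plausible use, judging only from the PathRecord fields above, is comparing header values such as foff across the files of a cadence.

```python
import math


def close_enough_sketch(x, y, rel_tol=1e-3):
    """Return True when x and y agree to within a relative tolerance.

    The default tolerance is an assumption made for illustration only.
    """
    return math.isclose(x, y, rel_tol=rel_tol)


# Two channel widths (MHz) that differ only by rounding, then two clearly different values.
same_channel = close_enough_sketch(-2.7939677238464355e-06, -2.79396772e-06)
different = close_enough_sketch(1.0, 1.1)
print(same_channel, different)  # True False
```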

turbo_seti.find_event.find_event_pipeline.find_event_pipeline(dat_file_list_str, h5_file_list_str=None, check_zero_drift=False, filter_threshold=3, on_off_first='ON', number_in_cadence=6, on_source_complex_cadence=False, saving=True, csv_name=None, user_validation=False, sortby_tstart=True, SNR_cut=None, min_drift_rate=None, max_drift_rate=None)

Find event pipeline.

Parameters:

dat_file_list_str (str) – The string name of a plaintext file ending in .lst that contains the filenames of .dat files, each on a new line, that were created with seti_event.py. The .lst should contain a set of cadences (ON observations alternating with OFF observations). The cadence can be of any length, provided that the ON source is every other file. This includes the Breakthrough Listen standard ABACAD as well as OFF-first cadences like BACADA. The minimum cadence length is 2; the maximum cadence length is unspecified (currently tested up to 6). Example: ABACAD|ABACAD|ABACAD
h5_file_list_str (str | None) – The string name of a plaintext file ending in .lst that contains the filenames of .h5 files, each on a new line, that were created with seti_event.py. The .lst should contain a set of cadences (ON observations alternating with OFF observations). The cadence can be of any length, provided that the ON source is every other file. This includes the Breakthrough Listen standard ABACAD as well as OFF-first cadences like BACADA. The minimum cadence length is 2; the maximum cadence length is unspecified (currently tested up to 6).
check_zero_drift (bool) – A True/False flag that tells the program whether to include hits that have a drift rate of 0 Hz/s. Earth-based RFI tends to have no drift rate, while signals from the sky are expected to have non-zero drift rates.
filter_threshold (int, default is 3) –
Specification for how strict the hit filtering will be. There are 3 different levels of filtering, specified by the integers 1, 2, and 3:

* filter_threshold = 1 applies the following parameter checks: check_zero_drift, SNR_cut, min_drift_rate, and max_drift_rate. However, it applies no ON-OFF check.
* filter_threshold = 2 returns hits that passed level 1 AND that are in at least one ON table but no OFF tables.
* filter_threshold = 3 returns events that passed level 2 AND that are present in ALL ON tables.
on_off_first (str {'ON', 'OFF'}) – Tells the code whether the .dat sequence starts with the ON or the OFF observation. Valid entries are ‘ON’ and ‘OFF’ only. Default is ‘ON’.
number_in_cadence (int) – The number of files in a single ON-OFF cadence. Default is 6 for ABACAD.
on_source_complex_cadence (bool or str) – If using a complex cadence (i.e. ONs and OFFs not alternating), set this to the string target name used in the .dat filenames. The code will then determine which files in your dat_file_list_str cadence are ONs and which are OFFs. Default is False.
saving (bool) – A True/False flag that tells the program whether to save the output array as a .csv.
user_validation (bool) – A True/False flag that, when set to True, asks if the user wishes to continue with their input parameters (and requires a ‘y’ or ‘n’ typed as confirmation) before beginning to run the program. Recommended when first learning the program, not recommended for automated scripts.
sortby_tstart (bool) – If True, the input file list is sorted by header.tstart.
SNR_cut (None (default value) or float value > 0) – If None, then all SNR values from the dedoppler results in the dat files are accepted as-is. Otherwise, the specified value is the threshold SNR below which hits will be discarded.
min_drift_rate (None (default value) or float value > 0) – If None, then all drift rate values from the dedoppler results in the dat files are accepted as-is. Otherwise, the specified value is the threshold drift rate below which hits will be discarded.
max_drift_rate (None (default value) or float value > 0) – If None, then all drift rate values from the dedoppler results in the dat files are accepted as-is. Otherwise, the specified value is the threshold drift rate above which hits will be discarded.

Returns:

A pandas DataFrame with all the events that were found, or None if no events were found.

Return type:

pandas.DataFrame or None

Notes

Each HDF5 file is ASSUMED(!!) to have the same base name as its .dat file.
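Under that naming assumption, the .h5 path for a given .dat file can be derived with pathlib, as in this sketch. The helper name, the h5_dir argument, and the file names are all hypothetical.

```python
from pathlib import Path


def h5_path_for(dat_path, h5_dir=None):
    """Return the .h5 path assumed to share the .dat file's base name.

    If h5_dir is None, the .h5 file is assumed to sit beside the .dat file.
    """
    dat = Path(dat_path)
    directory = Path(h5_dir) if h5_dir is not None else dat.parent
    return directory / (dat.stem + ".h5")


beside = h5_path_for("/data/cadence/target_A1.dat")
elsewhere = h5_path_for("/data/cadence/target_A1.dat", h5_dir="/data/h5")
print(beside.name, elsewhere.parent.name)  # target_A1.h5 h5
```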

Examples

>>> from turbo_seti.find_event.find_event_pipeline import find_event_pipeline
>>> dat_file_list_str = 'dat_files.lst'  # hypothetical .lst listing the cadence's .dat files
>>> find_event_pipeline(dat_file_list_str,
...                     SNR_cut=10,
...                     min_drift_rate=0.1,
...                     max_drift_rate=4,
...                     check_zero_drift=False,
...                     filter_threshold=3,
...                     on_off_first='ON',
...                     number_in_cadence=6,
...                     on_source_complex_cadence=False,
...                     saving=True,
...                     user_validation=False)
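The level 1 parameter checks described under filter_threshold can be sketched in plain Python. This is an illustration only: hits are assumed to be dicts with hypothetical "SNR" and "DriftRate" (Hz/s) keys, and the drift-rate limits are assumed to apply to the absolute drift rate. The pipeline itself works on pandas DataFrames.

```python
def level1_filter(hits, check_zero_drift=False, SNR_cut=None,
                  min_drift_rate=None, max_drift_rate=None):
    """Apply the filter_threshold = 1 parameter checks to a list of hit dicts."""
    kept = []
    for hit in hits:
        drift = abs(hit["DriftRate"])  # assumption: limits compare |drift rate|
        if not check_zero_drift and drift == 0.0:
            continue  # zero-drift hits are usually Earth-based RFI
        if SNR_cut is not None and hit["SNR"] < SNR_cut:
            continue  # too faint
        if min_drift_rate is not None and drift < min_drift_rate:
            continue  # drifting too slowly
        if max_drift_rate is not None and drift > max_drift_rate:
            continue  # drifting too fast
        kept.append(hit)
    return kept


hits = [
    {"SNR": 25.0, "DriftRate": 0.0},   # zero drift
    {"SNR": 8.0, "DriftRate": 0.3},    # below SNR_cut
    {"SNR": 40.0, "DriftRate": -1.2},  # passes every check
    {"SNR": 15.0, "DriftRate": 6.0},   # above max_drift_rate
]
survivors = level1_filter(hits, SNR_cut=10, min_drift_rate=0.1, max_drift_rate=4)
everything = level1_filter(hits, check_zero_drift=True)
print(len(survivors), len(everything))  # 1 4
```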

turbo_seti.find_event.find_event_pipeline.get_file_header(filepath_h5)

Extract and return the header of the H5 file.

Parameters:	filepath_h5 (str) – Full or relative path name of the H5 file.
Returns:	header
Return type:	Waterfall header object

Find Event

Backend script to find drifting, narrowband events in a generalized cadence of radio SETI observations (any number of ons, any number of offs, any pattern - streamlined for alternating on-off sequences).

The main function in this file, find_events(), uses the other helper functions in this file (described below) to read a list of turboSETI .dat files. It then finds events within this group of files.

turbo_seti.find_event.find_event.calc_freq_range(hit, delta_t=0.0, max_dr=True, follow=False)

Calculates a range of frequencies where RFI in an off-source could be related to a hit in an on-source, given a freq and drift_rate.

Parameters:	hit (dict) – delta_t (float, optional) – max_dr (bool, optional) – follow (bool, optional) –
Returns:	[low_bound, high_bound]
Return type:	list
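The idea behind calc_freq_range() can be illustrated with a simplified sketch: a signal at frequency f drifting at rate r (Hz/s) moves by at most |r| * delta_t Hz in delta_t seconds. The function and argument names below are hypothetical, and the way the real function chooses its time span and drift rate (via delta_t, max_dr and follow) is not reproduced.

```python
def freq_range_sketch(freq_mhz, drift_rate_hz_s, delta_t_s):
    """Return [low_bound, high_bound] in MHz reachable by a drifting signal.

    The signal can move by |drift_rate| * delta_t Hz in either direction;
    dividing by 1e6 converts Hz to MHz.
    """
    half_width_mhz = abs(drift_rate_hz_s) * delta_t_s / 1e6
    return [freq_mhz - half_width_mhz, freq_mhz + half_width_mhz]


# A 2 Hz/s drift over 300 s spans 600 Hz = 0.0006 MHz on each side.
low, high = freq_range_sketch(1420.0, 2.0, 300.0)
print(low, high)
```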

turbo_seti.find_event.find_event.end_search(t0)

Ends the search when there are no candidates left, or when the filter level matches the user-specified level.

Parameters:	t0 (time) –

turbo_seti.find_event.find_event.find_events(dat_file_list, check_zero_drift=False, filter_threshold=3, on_off_first='ON', complex_cadence=False, SNR_cut=None, min_drift_rate=None, max_drift_rate=None)

Reads a list of turboSETI .dat files and finds events within them.

Parameters:

dat_file_list (list) – A Python list of .dat files with ON observations of a single target alternating with OFF observations. This cadence can be of any length, provided that the ON source is every other file. This includes the Breakthrough Listen standard ABACAD as well as OFF-first cadences like BACADA. The minimum cadence length is 2; the maximum cadence length is unspecified (currently tested up to 6).
check_zero_drift (bool, optional) – A True/False flag that tells the program whether to include hits that have a drift rate of 0 Hz/s. Earth-based RFI tends to have no drift rate, while signals from the sky are expected to have non-zero drift rates. Default is False.
filter_threshold (int, default is 3) –
Specification for how strict the hit filtering will be. There are 3 different levels of filtering, specified by the integers 1, 2, and 3:

* filter_threshold = 1 applies the following parameter checks: check_zero_drift, SNR_cut, min_drift_rate, and max_drift_rate. However, it applies no ON-OFF check.
* filter_threshold = 2 returns hits that passed level 1 AND that are in at least one ON table but no OFF tables.
* filter_threshold = 3 returns events that passed level 2 AND that are present in ALL ON tables.
on_off_first (str {'ON', 'OFF'}, optional) – Tells the code whether the .dat sequence starts with the ON or the OFF observation. Valid entries are ‘ON’ and ‘OFF’ only.
complex_cadence (bool or list, optional) – A Python list of 1s and 0s indicating which files in dat_file_list are ON sources and which are OFF sources, for complex (i.e. non-alternating) cadences. Default is False.
SNR_cut (None (default value) or float value > 0) – If None, then all SNR values from the dedoppler results in the dat files are accepted as-is. Otherwise, the specified value is the threshold SNR below which hits will be discarded.
min_drift_rate (None (default value) or float value > 0) – If None, then all drift rate values from the dedoppler results in the dat files are accepted as-is. Otherwise, the specified value is the threshold drift rate below which hits will be discarded.
max_drift_rate (None (default value) or float value > 0) – If None, then all drift rate values from the dedoppler results in the dat files are accepted as-is. Otherwise, the specified value is the threshold drift rate above which hits will be discarded.

Examples

It is highly recommended that users interact with this program via the front-facing find_event_pipeline.py script. See the usage of that file in its own documentation.

If you would like to run find_events without calling find_event_pipeline.py, the usage is as follows:

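>>> from turbo_seti.find_event import find_event
>>> file_sublist = ['A1.dat', 'B.dat', 'A2.dat', 'C.dat', 'A3.dat', 'D.dat']  # hypothetical ABACAD cadence
>>> find_event.find_events(file_sublist, SNR_cut=10, check_zero_drift=False,
...                        filter_threshold=3, on_off_first='ON', complex_cadence=False)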

Notes

It calls other functions to find events within this group of files. filter_threshold allows the return of a table of events with hits at different levels of filtering, where A denotes an ON observation and B an OFF observation. The values of filter_threshold mean:

* 1: hits above an SNR cut, without an A-B check
* 2: hits that are in at least one A and no Bs
* 3: hits that are in all As and no Bs
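The ON-OFF logic of levels 2 and 3 can be illustrated with a deliberately simplified sketch. It matches hits by frequency alone within a hypothetical tolerance tol_mhz and ignores drift, whereas the real code also accounts for the drift rate (see calc_freq_range() and follow_event()). The on_mask list of 1s and 0s mirrors the complex_cadence convention; all names and frequencies here are illustrative.

```python
def on_off_filter(freqs_per_obs, on_mask, level, tol_mhz=0.001):
    """Return ON-source hit frequencies (MHz) that survive filter level 2 or 3.

    freqs_per_obs: one list of hit frequencies per observation, in cadence order.
    on_mask: 1 for an ON observation (A), 0 for an OFF observation (B).
    Simplification: two hits "match" when their frequencies differ by at most
    tol_mhz; drift is ignored and nearby duplicates are not merged.
    """
    if level not in (2, 3):
        raise ValueError("this sketch only models filter levels 2 and 3")

    def seen_in(freq, obs_freqs):
        return any(abs(freq - f) <= tol_mhz for f in obs_freqs)

    on_obs = [obs for obs, is_on in zip(freqs_per_obs, on_mask) if is_on]
    off_obs = [obs for obs, is_on in zip(freqs_per_obs, on_mask) if not is_on]

    # Level 3 needs a match in every ON, so scanning the first ON's hits is enough.
    candidates = on_obs[0] if level == 3 else sorted({f for obs in on_obs for f in obs})
    survivors = []
    for freq in candidates:
        if any(seen_in(freq, obs) for obs in off_obs):
            continue  # also present in an OFF (B): treated as RFI
        if level == 3 and not all(seen_in(freq, obs) for obs in on_obs):
            continue  # missing from at least one ON (A)
        survivors.append(freq)
    return survivors


# ABACAD cadence with made-up hit frequencies in MHz.
cadence = [
    [1420.0000, 1500.0000, 1600.0000],  # A1
    [1500.0002],                        # B
    [1420.0003, 1600.0001],             # A2
    [],                                 # C
    [1420.0005],                        # A3
    [],                                 # D
]
mask = [1, 0, 1, 0, 1, 0]
level2 = on_off_filter(cadence, mask, level=2)
level3 = on_off_filter(cadence, mask, level=3)
print(level3)  # [1420.0]
```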

turbo_seti.find_event.find_event.follow_event(hit, on_table, get_count=True)

Follows a given hit to the next observation of the same target and looks for hits which could be part of the same event.

Parameters:	hit (dict) – on_table (dict) – get_count (bool) –
Returns:	new_on_table or count
Return type:	dict or int

turbo_seti.find_event.find_event.not_yet_seen(mylist, argument)

Search a list to see if argument is already there.

Parameters:

mylist (list) – List of things that have already been seen.
argument (int) – An integer to add to the list if not already seen.
Returns:	True :: Not yet seen so the argument was added. False :: Already seen.
Return type:	bool
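The documented contract of not_yet_seen() (add the argument if it is absent, and report whether it was added) can be sketched in a few lines; the function name below is hypothetical.

```python
def not_yet_seen_sketch(mylist, argument):
    """Append argument to mylist if absent; return True if it was added."""
    if argument in mylist:
        return False  # already seen: leave the list unchanged
    mylist.append(argument)
    return True


seen = []
first = not_yet_seen_sketch(seen, 7)   # True, seen becomes [7]
second = not_yet_seen_sketch(seen, 7)  # False, seen stays [7]
```

The list membership test is O(n); if many hits were tracked, keeping a set alongside the list would make each lookup O(1).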

turbo_seti.find_event.find_event.read_dat(filename)

Read a turboSETI .dat file.

Parameters:	filename (str) – Name of .dat file to open.
Returns:	df_data – pandas DataFrame of hits.
Return type:	pandas.DataFrame
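A stdlib-only sketch of parsing a .dat-style text file is shown below, under stated assumptions: lines beginning with '#' are header comments, and each hit is one row of whitespace-separated numbers. The caller supplies the column names; the four used in the example are hypothetical placeholders, not the real turboSETI column layout, and the sample values are made up.

```python
import io


def read_dat_sketch(stream, column_names):
    """Parse a turboSETI-style .dat text stream into a list of row dicts.

    Assumptions (illustrative only): '#' lines are header comments, data rows
    are whitespace-separated numbers, and column_names lists the columns in
    file order.
    """
    rows = []
    for line in stream:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        values = [float(tok) for tok in line.split()]
        rows.append(dict(zip(column_names, values)))
    return rows


# Tiny synthetic file; a real .dat file has more columns.
sample = io.StringIO(
    "# Source:TARGET\n"
    "# Top_Hit_#  Drift_Rate  SNR  Frequency\n"
    "1  -0.392  25.3  1420.000123\n"
    "2   0.000  11.8  1420.500456\n"
)
hits = read_dat_sketch(sample, ["TopHitNum", "DriftRate", "SNR", "Freq"])
print(len(hits))  # 2
```

Passing the resulting list of dicts to pandas.DataFrame() would give a table comparable in shape to the DataFrame that read_dat() returns.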

Plot DAT

turbo_seti.find_event.plot_dat.make_plot(dat, fil, f_start, f_stop, t0, candidate=None, check_zero_drift=False, alpha=1, color='black')

Parameters:

dat (str) – The .dat file containing information about the hits detected.
fil (str) – Filterbank or h5 file corresponding to the .dat file.
f_start (float) – Start frequency, in MHz.
f_stop (float) – Stop frequency, in MHz.
t0 (float) – Start time of the candidate event, in MJD.
candidate (dict, optional) – A single row from a pandas dataframe containing information about one of the candidate signals detected. Contains information about the candidate signal to be plotted. The necessary data includes the start and stop frequencies, the drift rate, and the source name. The dataframe the candidate comes from is generated in plot_all_hit_and_candidates above as candidate_event_dataframe. The default is None.
check_zero_drift (bool, optional) – A True/False flag that tells the program whether to include hits that have a drift rate of 0 Hz/s. Earth-based RFI tends to have no drift rate, while signals from the sky are expected to have non-zero drift rates. The default is False.
alpha (float, optional) – The opacity of the overlaid hit plot. This should be between 0 and 1, with 0 being invisible and 1 being the default opacity. This is passed to matplotlib.pyplot.
color (str, optional) – Allows for the specification of the color of the overlayed hits. The default is ‘black’.

turbo_seti.find_event.plot_dat.plot_dat(dat_list_string, fils_list_string, candidate_event_table_string, outdir=None, check_zero_drift=False, alpha=1, color='black', window=None)

Makes a plot similar to the one produced by plot_candidate_events, but also includes the hits detected, in addition to the candidate signal.

Calls plot_hit_candidate and make_plot

Parameters:

dat_list_string (str) – List of .dat files in the cadence.
fils_list_string (str) – List of filterbank or .h5 files in the cadence.
candidate_event_table_string (str) – The string name of a .csv file that contains the list of events at a given filter level, created as output from find_event_pipeline.py.
outdir (str, optional) – Path to the directory where the plots will be saved. The default is None, which will result in the plots being saved to the directory where the .dat files are located.
check_zero_drift (bool, optional) – A True/False flag that tells the program whether to include hits that have a drift rate of 0 Hz/s. Earth-based RFI tends to have no drift rate, while signals from the sky are expected to have non-zero drift rates. The default is False.
alpha (float, optional) – The opacity of the overlaid hit plot. This should be between 0 and 1, with 0 being invisible and 1 being the default opacity. This is passed to matplotlib.pyplot.
color (str, optional) – Allows for the specification of the color of the overlayed hits. The default is ‘black’.
window (tuple, optional) – Sets the start and stop frequencies of the plot, in MHz. The input takes the form of a tuple (start, stop), and start is assumed to be less than stop. If given, the resulting plot will range exactly between the start/stop frequencies. The default is None, which will result in a plot of the entire range of hits detected.
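The window semantics described above can be sketched as follows. Hits are assumed to be dicts with a hypothetical "Freq" key in MHz; the helper only selects the frequency range and the hits inside it, and does no plotting.

```python
def hits_in_window(hits, window=None):
    """Return (start, stop, selected_hits) for the plotted frequency range, in MHz.

    With window=None the range spans all hits; otherwise window must be a
    (start, stop) tuple with start < stop.
    """
    if window is None:
        freqs = [h["Freq"] for h in hits]
        start, stop = min(freqs), max(freqs)
    else:
        start, stop = window
        if start >= stop:
            raise ValueError("window must be (start, stop) with start < stop")
    selected = [h for h in hits if start <= h["Freq"] <= stop]
    return start, stop, selected


hits = [{"Freq": 1419.9}, {"Freq": 1420.2}, {"Freq": 1421.5}]
full = hits_in_window(hits)
zoom = hits_in_window(hits, window=(1420.0, 1421.0))
print(zoom[0], zoom[1], [h["Freq"] for h in zoom[2]])  # 1420.0 1421.0 [1420.2]
```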

turbo_seti.find_event.plot_dat.plot_hit_candidate(dat_file_list, fil_file_list, source_name_list, all_hits_frame, candidate=None, check_zero_drift=False, outdir=None, alpha=1, color='black', window=None)

Parameters:

dat_file_list (list) – A Python list of strings containing the filenames of the .dat files created when running the turboSETI candidate search on the .h5 or .fil files below.
fil_file_list (list) – A Python list of strings containing the filenames of the .fil or .h5 files that make up the cadence used to create the .csv file of candidate events.
source_name_list (list) – A Python list of strings containing the source names of the cadence, in chronological order (descending through the plot panels).
all_hits_frame (pandas.DataFrame) – A pandas dataframe containing information about all the hits detected. The necessary data includes the start and stop frequencies, the drift rate, and the source name. This dataframe is generated in plot_all_hit_and_candidates above.
candidate (dict, optional) – A single row from a pandas dataframe containing information about one of the candidate signals detected. Contains information about the candidate signal to be plotted. The necessary data includes the start and stop frequencies, the drift rate, and the source name. The dataframe the candidate comes from is generated in plot_all_hit_and_candidates above as candidate_event_dataframe. The default is None.
check_zero_drift (bool, optional) – A True/False flag that tells the program whether to include hits that have a drift rate of 0 Hz/s. Earth-based RFI tends to have no drift rate, while signals from the sky are expected to have non-zero drift rates. The default is False.
outdir (str, optional) – Path to the directory where the plots will be saved. The default is None, which will result in the plots being saved to the directory where the .dat files are located.
alpha (float, optional) – The opacity of the overlaid hit plot. This should be between 0 and 1, with 0 being invisible and 1 being the default opacity. This is passed to matplotlib.pyplot.
color (str, optional) – Allows for the specification of the color of the overlaid hits. The default is ‘black’.
window (tuple, optional) – Sets the start and stop frequencies of the plot, in MHz. The input takes the form of a tuple (start, stop), and start is assumed to be less than stop. The resulting plot will range exactly between the start/stop frequencies. The default is None, which will result in a plot of the entire range of hits detected.

Plot Event Pipeline

Front-facing script to plot drifting, narrowband events in a set of generalized cadences of ON-OFF radio SETI observations.

class turbo_seti.find_event.plot_event_pipeline.PathRecord(path_h5, tstart, source_name)

Definition of an H5 path record.

turbo_seti.find_event.plot_event_pipeline.plot_event_pipeline(event_csv_string, fils_list_string, user_validation=False, offset=0, filter_spec=None, sortby_tstart=True, plot_dir=None)

This function calls plot_candidate_events() to plot the events in an output .csv file generated by find_event_pipeline.py.

Parameters:

event_csv_string (str) – The string name of a .csv file that contains the list of events at a given filter level, created as output from find_event_pipeline.py. The .csv should have a filename containing information about its parameters, for example “kepler1093b_0015_f2_snr10.csv”. Remember that the file was created with a particular cadence (e.g. ABACAD), and ensure that this cadence matches the order of the files in fils_list_string.
fils_list_string (str) – The string name of a plaintext file ending in .lst that contains the filenames of .fil files, each on a new line, that corresponds to the cadence used to create the .csv file used for event_csv_string.
user_validation (bool, optional) – A True/False flag that, when set to True, asks if the user wishes to continue with their input parameters (and requires a ‘y’ or ‘n’ typed as confirmation) before beginning to run the program. Recommended when first learning the program, not recommended for automated scripts.
offset (int or str, optional) – The amount that the overdrawn “best guess” line from the event parameters in the csv should be shifted from its original position to enhance readability. Can be set to 0 (default; draws the line on top of the estimated event) or ‘auto’ (shifts the line to the left by an auto-calculated amount, with additional lines showing the original position).
sortby_tstart (bool) – If True, the input file list is sorted by header.tstart.
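With sortby_tstart=True, the input files are put in time order using each header's tstart. A sketch of that ordering, using a namedtuple with the PathRecord fields documented above; the namedtuple definition, file names, source names, and tstart values are illustrative assumptions.

```python
from collections import namedtuple

# Hypothetical stand-in with the documented PathRecord fields.
PathRecord = namedtuple("PathRecord", ["path_h5", "tstart", "source_name"])

records = [
    PathRecord("obs_C.h5", 58000.0104, "TARGET_OFF2"),
    PathRecord("obs_A1.h5", 58000.0000, "TARGET"),
    PathRecord("obs_B.h5", 58000.0035, "TARGET_OFF1"),
]

# Sort the cadence into observation order by start time (MJD).
ordered = sorted(records, key=lambda rec: rec.tstart)
ordered_paths = [rec.path_h5 for rec in ordered]
print(ordered_paths)  # ['obs_A1.h5', 'obs_B.h5', 'obs_C.h5']
```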

Examples

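>>> from turbo_seti.find_event.plot_event_pipeline import plot_event_pipeline
>>> event_csv_string = 'kepler1093b_0015_f2_snr10.csv'  # example name from above
>>> fils_list_string = 'h5_files.lst'  # hypothetical .lst of cadence files
>>> plot_event_pipeline(event_csv_string, fils_list_string,
...                     user_validation=False, offset=0)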

Plot Event

Backend script to plot drifting, narrowband events in a generalized cadence of ON-OFF radio SETI observations. The main function in this file, plot_candidate_events(), uses the other helper functions in this file (described below) to plot events from a turboSETI event .csv file.

turbo_seti.find_event.plot_event.make_waterfall_plots(fil_file_list, on_source_name, f_start, f_stop, drift_rate, f_mid, filter_level, source_name_list, offset=0, plot_dir=None, **kwargs)

Makes waterfall plots of an event for an entire on-off cadence.

Parameters:

fil_file_list (list) – List of filterbank files in the cadence.
on_source_name (str) – Name of the on_source target.
f_start (float) – Start frequency, in MHz.
f_stop (float) – Stop frequency, in MHz.
drift_rate (float) – Drift rate in Hz/s.
f_mid (float) – Middle frequency of the event, in MHz.
filter_level (int) – Filter level (1, 2, or 3) that produced the event.
source_name_list (list) – List of source names in the cadence, in order.
bandwidth (int) – Width of the plot, incorporating drift info.
kwargs (dict) – Keyword args to be passed to matplotlib imshow().

Notes

Makes a series of waterfall plots, to be read from top to bottom, displaying a full cadence at the frequency of a recorded event from find_event. Calls plot_waterfall().

turbo_seti.find_event.plot_event.overlay_drift(f_event, f_start, f_stop, drift_rate, t_duration, offset=0, alpha=1, color='#cc0000')

Creates a dashed red line at the recorded frequency and drift rate of the plotted event. The line can overlay the signal exactly or be offset by some amount (offset can be 0 or ‘auto’).
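The geometry of such an overlay can be sketched as below: over t_duration seconds the frequency moves by drift_rate * t_duration Hz, which is divided by 1e6 to get MHz. Treating f_event as the frequency at the start of the observation and modelling offset as a plain numeric frequency shift are assumptions; how offset='auto' chooses its shift is not reproduced, and the helper name is hypothetical.

```python
def drift_line(f_event_mhz, drift_rate_hz_s, t_duration_s, offset_mhz=0.0):
    """Return ((f0, t0), (f1, t1)) endpoints of a drifting-signal overlay line.

    Assumes f_event_mhz is the frequency at t = 0. offset_mhz shifts the
    whole line in frequency (negative values move it to the left).
    """
    f0 = f_event_mhz + offset_mhz
    f1 = f0 + drift_rate_hz_s * t_duration_s / 1e6  # Hz -> MHz
    return (f0, 0.0), (f1, t_duration_s)


# -0.5 Hz/s over 300 s moves the line 150 Hz (0.00015 MHz) lower in frequency.
start, end = drift_line(1420.0, -0.5, 300.0)
shifted_start, shifted_end = drift_line(1420.0, -0.5, 300.0, offset_mhz=-0.0002)
```

The two endpoints could then be drawn with, for example, matplotlib's plt.plot([f0, f1], [t0, t1], linestyle='--', color='#cc0000').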

turbo_seti.find_event.plot_event.plot_candidate_events(candidate_event_dataframe, fil_file_list, filter_level, source_name_list, offset=0, plot_dir=None, **kwargs)

Calls make_waterfall_plots() on each event in the input .csv file.

Parameters:

candidate_event_dataframe (dict) – A pandas dataframe containing information about a candidate event. The necessary data includes the start and stop frequencies, the drift rate, and the source name. To determine the required variable names and formatting conventions, see the output of find_event_pipeline.
fil_file_list (list) – A Python list of strings containing the filenames of the .fil files that make up the cadence used to create the .csv file of events.
filter_level (str) – A string indicating the filter level of the cadence used to generate the candidate_event_dataframe. Used only for output file naming; the convention is “f1”, “f2”, or “f3”. Descriptions of the three levels of filtering can be found in the documentation for find_event.py.
source_name_list (list) – A Python list of strings containing the source names of the cadence, in chronological order (descending through the plot panels).
offset (int or str, optional) – The amount that the overdrawn “best guess” line from the event parameters in the csv should be shifted from its original position to enhance readability. Can be set to 0 (default; draws the line on top of the estimated event) or ‘auto’ (shifts the line to the left by an auto-calculated amount, with additional lines showing the original position).
kwargs (dict) –

Examples

It is highly recommended that users interact with this program via the front-facing plot_event_pipeline.py script. See the usage of that file in its own documentation.

If you would like to run plot_candidate_events without calling plot_event_pipeline.py, the usage is as follows:

>>> from turbo_seti.find_event import plot_event
>>> plot_event.plot_candidate_events(candidate_event_dataframe, fil_file_list,
...                                  filter_level, source_name_list, offset=0)

turbo_seti.find_event.plot_event.plot_waterfall(wf, source_name, f_start=None, f_stop=None, **kwargs)

Plot waterfall of data in a .fil or .h5 file.

Parameters:

wf (blimpy.Waterfall object) – Waterfall object of an H5 or Filterbank file containing the dynamic spectrum data.
source_name (str) – Name of the target.
f_start (float) – Start frequency, in MHz.
f_stop (float) – Stop frequency, in MHz.
kwargs (dict) – Keyword args to be passed to matplotlib imshow().

Notes

Plot a single-panel waterfall plot (frequency vs. time vs. intensity) for one of the ON or OFF observations in the cadence of interest, at the frequency of the expected event. Calls overlay_drift().