De-Doppler Search
turboSETI Command Main Program
Main program module for executable turboSETI
Find Doppler
turbo_seti doppler search module
This module is deeply dependent on classes and functions in data_handler.py.
Main class: FindDoppler
Independent functions:
- search_coarse_channel - runs the Doppler search for a given coarse channel.
- load_the_data - loads everything needed by search_coarse_channel.
- populate_tree - populates “tree_findoppler”, which is used by several functions.
- hitsearch - searches for hits at a given drift rate.
- tophitsearch - searches for hits with the largest SNR within 2*tsteps fine frequency channels.
-
class
turbo_seti.find_doppler.find_doppler.
FindDoppler
(datafile, max_drift=10.0, min_drift=1e-05, snr=25.0, out_dir='./', coarse_chans=None, obs_info=None, flagging=False, n_coarse_chan=None, kernels=None, gpu_backend=False, gpu_id=0, precision=1, append_output=False, log_level_int=20, blank_dc=True) Initializes FindDoppler object.
Parameters: - datafile (string) – Input filename (.h5 or .fil)
- max_drift (float) – Maximum drift rate in Hz/second.
- min_drift (float) – Minimum drift rate in Hz/second.
- snr (float) – Minimum signal-to-noise ratio (SNR) for a hit. A ratio greater than 1 means there is more signal than noise.
- out_dir (string) – Directory where output files should be placed. By default this is the current working directory.
- coarse_chans (list(int)) – The input comma-separated list of coarse channels to analyze, if any. By default, all coarse channels will be searched. Use this to search only specified channels, e.g. [7,12] will search channels 7 and 12 only.
- obs_info (dict) – Used to hold information found on file, including info about pulsars, RFI, and SEFD.
- flagging (bool) – Flag the edges of the PFF for BL data (with 3 Hz resolution per channel)? (True/False) Anybody - please improve this cryptic description.
- n_coarse_chan (int) – Number of coarse channels in the file. If None (default), blimpy will make this determination (undesirable, in general).
- kernels (Kernels, optional) – Pre-configured class of Kernels.
- gpu_backend (bool, optional) – Use GPU accelerated Kernels? (True/False)
- gpu_id (int) – If gpu_backend=True, then this is the GPU device to use. Default is 0.
- precision (int {2: float64, 1: float32}, optional) – Floating point precision for the GPU. The default is 1 (recommended).
- append_output (bool, optional) – Append output DAT & LOG files? (True/False) Default is False. DEPRECATED.
- log_level_int (int, optional) – Python logging threshold level (INFO, DEBUG, or WARNING). Default is logging.INFO (20).
- blank_dc (bool, optional) – Remove the DC spike? (True/False) Default is True (recommended).
-
last_logwriter
(arg_path, arg_text) Write the last LogWriter entry.
Parameters: - arg_path (str) – Path of log for the final log entries.
- arg_text (str) – Text message to include at end of the log file.
Return type: None.
-
search
(n_partitions=1, progress_bar='n') Top level search routine.
Parameters: - n_partitions (int) – Number of Dask partitions (processes) to use in parallel. Defaults to single-partition (process).
- progress_bar (str {'y', 'n'}, optional) – Enable command-line progress bar.
Return type: None.
Notes
- self.data_handle.cchan_list : the list of coarse channel objects for searching, created by self.data_handle = DATAHandle() during __init__() execution.
If using dask (n_partitions > 1):
- Launch multiple drift searches in parallel.
- Each search works on a single coarse channel object.
- n_partitions governs the maximum number of partitions to run in parallel.
Otherwise, the coarse channel objects are searched in sequence.
It is not recommended to mix dask partitions with GPU mode as this could cause GPU queuing.
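A minimal usage sketch of the class and its search() routine, using only parameters documented above. The wrapper name run_search, the example file name, and the chosen max_drift and snr values are placeholders for illustration, not recommendations. The import is deferred so that the function can be defined without turbo_seti installed.

```python
def run_search(datafile, out_dir="./", n_partitions=1):
    """Run a de-Doppler search on one file (illustrative wrapper, not part of turbo_seti)."""
    # Deferred import: turbo_seti is only needed when the search actually runs.
    from turbo_seti.find_doppler.find_doppler import FindDoppler

    fd = FindDoppler(
        datafile,            # .h5 or .fil input file
        max_drift=4.0,       # Hz/second (placeholder value)
        snr=10.0,            # minimum SNR for a hit (placeholder value)
        out_dir=out_dir,     # directory where output files are placed
        gpu_backend=False,   # CPU kernels; mixing dask partitions with GPU mode is discouraged
    )
    # n_partitions > 1 launches the coarse-channel searches in parallel with dask.
    fd.search(n_partitions=n_partitions)


# Example call (hypothetical file name):
# run_search("observation.h5", out_dir="./results/", n_partitions=2)
```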
-
turbo_seti.find_doppler.find_doppler.
hitsearch
(fd, spectrum, specstart, specend, snr_thresh, drift_rate, header, tdwidth, max_val, the_median, the_stddev) Searches for hits that exceed the given SNR threshold.
Note that the “max” arrays share the same index values as any given spectrum. They represent maximums with respect to the frequency columns in the range (0, FFT length).
Let S be the subspectrum given by spectrum[specstart:specend]. Set hit-counter to 0. For each element of S:
- Subtract the given median and divide that result by the given standard deviation, giving the new element value.
- If the element value > snr_thresh, then:
  - Increment hit-counter.
  - If the element value > current max SNR at the common index, then set the current max SNR at the common index = this element, and set the current max drift rate at the common index = drift rate of this element.
Increment the grand total of hits by the hit-counter.
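The steps above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the library's implementation: the function name hitsearch_sketch is hypothetical, the two “max” arrays are passed as plain ndarrays named maxsnr and maxdrift, and the hit count is returned rather than added to a grand total.

```python
import numpy as np


def hitsearch_sketch(spectrum, specstart, specend, snr_thresh, drift_rate,
                     maxsnr, maxdrift, the_median, the_stddev):
    """Vectorised version of the per-element loop described above (assumed layout)."""
    # Normalise the subspectrum S to SNR units.
    snr = (spectrum[specstart:specend] - the_median) / the_stddev
    # Positions within S above the threshold, mapped back to the common index.
    hit_idx = np.flatnonzero(snr > snr_thresh) + specstart
    hit_snr = snr[hit_idx - specstart]
    # Only hits that beat the maximum recorded so far update the "max" arrays.
    better = hit_snr > maxsnr[hit_idx]
    maxsnr[hit_idx[better]] = hit_snr[better]
    maxdrift[hit_idx[better]] = drift_rate
    return hit_idx.size  # the hit-counter for this call


maxsnr = np.zeros(6)
maxdrift = np.zeros(6)
first = np.array([0.0, 1.0, 50.0, 2.0, 30.0, 0.0])
second = np.array([0.0, 1.0, 20.0, 2.0, 40.0, 0.0])
n_first = hitsearch_sketch(first, 1, 5, 10.0, 0.5, maxsnr, maxdrift, 1.0, 1.0)
n_second = hitsearch_sketch(second, 1, 5, 10.0, 1.0, maxsnr, maxdrift, 1.0, 1.0)
```

After the second call, index 2 still holds the SNR and drift rate from the first call, because the second spectrum did not exceed the stored maximum there.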
Parameters: - fd (FindDoppler) – Instance of FindDoppler class.
- spectrum (ndarray) – Array of data values along the frequency axis of length = FFT length.
- specstart (int) – First index to search for hit in spectrum.
- specend (int) – Last index to search for hit in spectrum.
- snr_thresh (float) – Minimum signal to noise ratio for candidacy.
- drift_rate (float) – Drift rate at which we are searching for hits.
- header (dict) – Header in fits header format. See data_handler.py’s DATAH5 class header.
- tdwidth (int) – FFT Length = # fine channels / # coarse channels.
- max_val (max_vals) – Object to be filled with max values from this search and then returned. Length of each subarray = FFT length.
-
turbo_seti.find_doppler.find_doppler.
load_the_data
(cchan_dict, precision) Load the DATAH5 object, spectra matrix, and the associated drift indexes.
Parameters: - cchan_dict (dict) – A single coarse channel object created by data_handler.py DATAHandle __split_h5.
- precision (int {2: float64, 1: float32}) – Floating point precision for the GPU.
Returns: - datah5_obj (DATAH5 object (complex!))
- spectra (numpy.ndarray) – Spectra data array. Set by the data_handler.py load_data function.
- drift_indexes (numpy.ndarray) – Drift index matrix. Set by the data_handler.py load_data function.
-
class
turbo_seti.find_doppler.find_doppler.
max_vals
Class used to initialize some maximums.
-
turbo_seti.find_doppler.find_doppler.
populate_tree
(fd, spectra, tree_findoppler, nframes, tdwidth, tsteps, fftlen, shoulder_size, roll=0, reverse=0) This function populates the findoppler tree with the spectra.
Parameters: - fd (FindDoppler object) – Instance of FindDoppler class.
- spectra (ndarray) – Spectra matrix.
- tree_findoppler (ndarray) – Tree to be populated with spectra.
- nframes (int) –
- tdwidth (int) –
- tsteps (int) –
- fftlen (int) – Length of fast fourier transform (fft) matrix.
- shoulder_size (int) – Size of shoulder region.
- roll (int, optional) – Used to calculate amount each entry to the spectra should be rolled (shifted).
- reverse (int, optional) – Used to determine which way spectra should be rolled (shifted).
Returns: Spectra-populated version of the input tree_findoppler.
Return type: ndarray
Notes
It creates two “shoulders” (each region of tsteps*(shoulder_size/2) in size) to avoid “edge” issues. It uses np.roll() for drift-rate blocks higher than 1.
-
turbo_seti.find_doppler.find_doppler.
search_coarse_channel
(cchan_dict, fd, dataloader=None, logwriter=None, filewriter=None) Run a turboseti search on a single coarse channel.
Parameters: - cchan_dict (dict) – A single coarse channel object created by data_handler.py DATAHandle __split_h5. Contains the following fields:
  - filename : file path (common to all objects)
  - f_start : start frequency of coarse channel
  - f_stop : stop frequency of coarse channel
  - cchan_id : coarse channel number
  - n_coarse_chan : total number of coarse channels (common to all objects)
- fd (FindDoppler object) – Instance of the FindDoppler class.
- logwriter (LogWriter, optional) – A LogWriter to write log output into. If None, one will be created.
- filewriter (FileWriter, optional) – A FileWriter to use to write the dat file. If None, one will be created.
Returns: Returns True if no exceptions occur (needed for dask).
Return type: bool
Notes
This function is separate from the FindDoppler class to allow parallelization. This should not be called directly, but rather via the FindDoppler.search() routine. One exception: turboseti_search package.
-
turbo_seti.find_doppler.find_doppler.
tophitsearch
(fd, tree_findoppler_original, max_val, tsteps, header, tdwidth, fftlen, max_drift, drift_rate_resolution, logwriter=None, filewriter=None, obs_info=None) This finds the hits with largest SNR within a nearby window of frequency channels. The window size is calculated so that we cannot report multiple overlapping hits.
Parameters: - tree_findoppler_original (ndarray) – Spectra-populated findoppler tree
- max_val (max_vals) – Contains max values from hitsearch
- tsteps (int) –
- header (dict) – Header in fits header format. Used to report tophit in filewriter. See DATAH5.
- tdwidth (int) –
- fftlen (int) – Length of fast fourier transform (fft) matrix
- max_drift (float) – Maximum drift rate in Hz/second
- drift_rate_resolution (float) – The drift rate corresponding to drifting rightwards one bin in the whole observation
- logwriter (LogWriter, optional) – Logwriter to which we should write if we find a top hit.
- filewriter (FileWriter, optional) – Filewriter corresponding to file to which we should save the local maximum of tophit. See report_tophit().
- obs_info (dict, optional) –
Returns: Same filewriter that was input.
Return type: FileWriter
Data Handler
Filterbank data handler for the find_doppler.py functions.
-
class
turbo_seti.find_doppler.data_handler.
DATAH5
(filename, f_start=None, f_stop=None, t_start=None, t_stop=None, cchan_id=0, n_coarse_chan=None, kernels=None, gpu_backend=False, precision=1, gpu_id=0) This class is where the waterfall data is loaded, as well as the DATAH5 header info. Don’t be surprised at the use of FITS header names! [?] It creates other attributes related to the dedoppler search (load_drift_indexes).
Parameters: - filename (string) – Name of file.
- f_start (float) – Start frequency in MHz.
- f_stop (float) – Stop frequency in MHz.
- t_start (int) – Start integration ID.
- t_stop (int) – Stop integration ID.
- cchan_id (int) – Coarse channel ID.
- n_coarse_chan (int) – Total number of coarse channels.
- kernels (Kernels) – Pre-configured class of kernels.
-
close
() Closes file and sets the data attribute .closed to True. A closed object can no longer be used for I/O operations. close() may be called multiple times without error.
-
class
turbo_seti.find_doppler.data_handler.
DATAHandle
(filename=None, out_dir='./', n_coarse_chan=None, coarse_chans=None, kernels=None, gpu_backend=False, precision=1, gpu_id=0) Class to set up the input file for further processing of data. Handles conversion to h5 (from fil), extraction of coarse channel info, waterfall info, and file size checking.
Parameters: - filename (str) – Name of file (.h5 or .fil).
- out_dir (str) – Directory where output files should be saved.
- n_coarse_chan (int) – Number of coarse channels.
- coarse_chans (list or None) – List of coarse channels.
- kernels (Kernels, optional) – Pre-configured class of Kernels.
- gpu_backend (bool, optional) – Use GPU accelerated Kernels?
- precision (int {2: float64, 1: float32}, optional) – Floating point precision. Default: 1.
- gpu_id (int) – If gpu_backend=True, then this is the device ID to use.
File Writers
-
class
turbo_seti.find_doppler.file_writers.
FileWriter
(filename, header) Used to write information to turboSETI output files.
Initializes FileWriter object and writes its header.
Parameters: - filename (str) – Name of file on which we would like to perform operations.
- header (dict) – Information to be written to header of file filename.
-
report_header
(header) Write header information per given obs.
Parameters: header (dict) – Information to be written to file header.
-
report_tophit
(max_val, ind, ind_tuple, tdwidth, fftlen, header, total_n_candi, obs_info=None) Examines the top hit in a region: finds the local maximum and saves it.
Parameters: - max_val (findopp) –
- ind (int) – Index at which top hit is located in max_val’s maxdrift and maxsnr.
- ind_tuple (tuple(int, int) (lbound, ubound)) –
- tdwidth (int) –
- fftlen (int) – Length of the fast fourier transform matrix.
- header (dict) – Contains info on coarse channel to be written to file.
- total_n_candi (int) –
- obs_info (dict, optional) – Used to hold info found on file, including info about pulsars, RFI, and SEFD.
Returns: The FileWriter object that called this function.
Return type: FileWriter
-
class
turbo_seti.find_doppler.file_writers.
GeneralWriter
(filename='', mode='a') Wrapper class for file operations.
Initializes GeneralWriter object. Opens given file with given mode, sets new object’s filehandle to the file object, sets the new object’s filename to the file’s name, then closes the file.
Parameters: - filename (str) – Name of file on which we would like to perform operations.
- mode (str {'a', 'r', 'w', 'x'}, optional) – Mode in which to open the file. Takes the same modes as Python's built-in open function: read (r), append (a), write (w), or create (x).
-
is_open
() Checks if file is open.
Returns: True if file is open, False otherwise.
Return type: boolean
-
open
(mode='a') Opens the file with the given mode, then closes it. Does not actually leave the file open; it is only used for changing the mode.
Parameters: mode (str {'a', 'r', 'w', 'x'}, optional) – Mode to assign to this file. Takes the same modes as Python's built-in open function: read (r), append (a), write (w), or create (x).
-
writable
() Checks if file is open, and if it is, checks that mode is either write or append.
Returns: True if file is open and writeable, False otherwise.
Return type: boolean
-
write
(info_str, mode='a') Sets file mode to a writeable mode and opens it if it is not already open in a writeable mode, writes info_str to it, and then closes it. If the file was not previously open when this is called, the file is closed after writing in order to maintain the state the filewriter was in before.
Parameters: - info_str (str) – Data to be written to file.
- mode (str {'a', 'w'}, optional) – Mode for file. If it is not a writeable mode, it will be set to a writeable mode.
-
class
turbo_seti.find_doppler.file_writers.
LogWriter
(filename='', mode='a') Used to write data to log.
Initializes GeneralWriter object. Opens given file with given mode, sets new object’s filehandle to the file object, sets the new object’s filename to the file’s name, then closes the file.
Parameters: - filename (str) – Name of file on which we would like to perform operations.
- mode (str {'a', 'r', 'w', 'x'}, optional) – Mode in which to open the file. Takes the same modes as Python's built-in open function: read (r), append (a), write (w), or create (x).
Kernels
Hitsearch
This kernel implements a GPU accelerated version of the hitsearch() method written as a RAW CUDA kernel.
De-Doppler
This kernel implements a slightly modified version of the Taylor Tree algorithm published by J.H. Taylor in 1974.
- The GPU implementation is based on the Cupy array library, accelerated with CUDA and ROCm.
- The CPU implementation is based on Numba Just-In-Time compilation.
turbo_seti.find_doppler.kernels._taylor_tree._core_numba.
flt
This function Taylor-tree-sums a data stream. It assumes the data stream is arranged as all points in the first spectrum, then all points in the second spectrum, and so on. Data are summed across time.
Parameters:
- outbuf (array_like) – Input data array, replaced by dedispersed data at the output.
- nchn (int) – Number of timesteps in the data.
References
- Ramachandran, 07-Nov-97, nfra. – Original algorithm.
- Siemion, 2011 – float/64 bit addressing (C-code)
- Chen, 2014 – python version
- Enriquez + P.Schellart, 2016 – cython version
- Cruz, 2020 – numba version
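To illustrate what a de-Doppler sum produces, the sketch below uses a brute-force sum in place of the Taylor tree. For each drift index it adds every time step after shifting it along a straight, rounded drift path. The function name brute_force_drift_sum and the 2-D (time, frequency) input layout are assumptions made for this example, and np.roll wraps around at the band edges. The Taylor tree reaches a comparable result with fewer additions by reusing partial sums, and its drift paths are not guaranteed to match the rounded straight lines used here.

```python
import numpy as np


def brute_force_drift_sum(data):
    """Sum a (time, frequency) array along straight drift paths (not the Taylor tree)."""
    tsteps, nchan = data.shape
    out = np.zeros((tsteps, nchan), dtype=data.dtype)
    for drift in range(tsteps):          # total drift, in bins, over the observation
        for t in range(tsteps):
            # Bins drifted by time step t along a straight line, rounded to whole bins.
            shift = int(round(drift * t / (tsteps - 1))) if tsteps > 1 else 0
            # Rolling left by `shift` aligns channel f + shift with output channel f.
            out[drift] += np.roll(data[t], -shift)
    return out


# A synthetic signal that starts in channel 2 and drifts one bin per time step.
data = np.zeros((4, 8))
for t in range(4):
    data[t, 2 + t] = 1.0

summed = brute_force_drift_sum(data)
best_drift, best_chan = np.unravel_index(summed.argmax(), summed.shape)
```

The strongest sum lands at the drift index and start channel that match the injected signal, which is the property the hit search relies on.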
-
class
turbo_seti.find_doppler.kernels.
Kernels
(gpu_backend=False, precision=2, gpu_id=0) Dynamically loads the right modules according to parameters.
Parameters: - gpu_backend (bool, optional) – Enable GPU acceleration.
- precision (int {2: float64, 1: float32}, optional) – Floating point precision.
-
get_spectrum
(tt_output, tsteps, tdwidth, drift_index) The different Taylor tree kernels produce slightly different output. Both outputs can be thought of as indexed by [row index][frequency], although each is stored as a 1-dimensional array. In the GPU version, the row index is the same as the “drift index”: 0 is the least drift, 1 is the next least drift, et cetera. In the CPU version, the row index is the bit-reversed drift index. This method lets the caller get data for a particular drift without knowing how the rows are ordered. There’s a good chance that one or both of these is suboptimal; please update this comment if you change the underlying algorithm.
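The two row orderings can be sketched as follows. This is not the library's code: the function name get_spectrum_sketch and the bit_reversed_rows flag are hypothetical, and the sketch assumes tsteps is a power of two and that tt_output holds tsteps * tdwidth values.

```python
import numpy as np


def get_spectrum_sketch(tt_output, tsteps, tdwidth, drift_index, bit_reversed_rows):
    """Return the row for one drift index from a flattened [row][frequency] array."""
    nbits = tsteps.bit_length() - 1      # log2(tsteps) when tsteps is a power of two
    row = drift_index
    if bit_reversed_rows:
        # Reverse the nbits-wide binary representation of the drift index.
        row = int(format(drift_index, f"0{nbits}b")[::-1], 2)
    return tt_output.reshape(tsteps, tdwidth)[row]


tt_output = np.arange(12)                # 4 rows of 3 frequency bins, flattened
direct = get_spectrum_sketch(tt_output, 4, 3, 1, bit_reversed_rows=False)
reversed_order = get_spectrum_sketch(tt_output, 4, 3, 1, bit_reversed_rows=True)
```

With four rows, drift index 1 (binary 01) maps to row 1 in the direct ordering and to row 2 (binary 10) in the bit-reversed ordering.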
Helper Functions
-
turbo_seti.find_doppler.helper_functions.
FlipX
(outbuf, xdim, ydim, xp=None) This function takes in an array of values and iteratively flips ydim chunks of values of length xdim.
Parameters: - outbuf (ndarray) – An array with shape like (int, 1)
- xdim (int) – Size of segments to be flipped.
- ydim (int) – Amount of segments of size xdim to be flipped.
- xp (Numpy or Cupy, optional) – Math module to be used. If None, Numpy will be used.
Examples
If you have an array [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] and enter it with xdim = 5 and ydim = 2, the array will be modified to become [5, 4, 3, 2, 1, 10, 9, 8, 7, 6]. Note that if you wish for the whole array to be modified in this way, xdim * ydim should equal the length of the array. If ydim * xdim is greater than the length of the array, this function will error.
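The example above can be reproduced with a short NumPy sketch. The name flipx_sketch is hypothetical, the input is assumed to be a flat 1-D array, and raising ValueError for an oversized xdim * ydim is a choice made for this example; the text above only says that the function errors.

```python
import numpy as np


def flipx_sketch(outbuf, xdim, ydim):
    """Reverse, in place, each of the first ydim chunks of length xdim."""
    n = xdim * ydim
    if n > outbuf.size:
        raise ValueError("xdim * ydim exceeds the length of the array")
    # View the first n values as ydim rows of xdim values and reverse each row.
    # ravel() on the reversed view makes a copy, so the assignment is safe.
    outbuf[:n] = outbuf[:n].reshape(ydim, xdim)[:, ::-1].ravel()


arr = np.arange(1, 11)
flipx_sketch(arr, xdim=5, ydim=2)
# arr is now [5, 4, 3, 2, 1, 10, 9, 8, 7, 6]
```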
-
turbo_seti.find_doppler.helper_functions.
bitrev
(inval, nbits) This function bit-reverses the given value “inval” with the number of bits, “nbits”.
Parameters: - inval (int) – Number to be bit-reversed.
- nbits (int) – The length of inval in bits. If user only wants the bit-reverse of a certain amount of bits of inval, nbits is the amount of bits to be reversed counting from the least significant (rightmost) bit. Any bits beyond this length will not be reversed and will be truncated from the result.
Returns: The bit-reverse of inval. If there are more significant bits beyond nbits, they are truncated.
Return type: int
References
- Ramachandran, 10-Nov-97, nfra. – Original C implementation.
- Chen, 2014 – Python version.
- Elkins (texadactyl), 2020 – Speedup.
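A minimal sketch of the behaviour described above, including the truncation of bits beyond nbits. The name bitrev_sketch is hypothetical, and this plain loop makes no attempt to reproduce the speedup mentioned in the references.

```python
def bitrev_sketch(inval, nbits):
    """Reverse the lowest nbits bits of inval; higher bits are dropped."""
    result = 0
    for _ in range(nbits):
        # Move the current lowest bit of inval onto the low end of result.
        result = (result << 1) | (inval & 1)
        inval >>= 1
    return result


low_bit = bitrev_sketch(0b0001, 4)     # 0b1000
truncated = bitrev_sketch(0b1101, 3)   # only 0b101 is reversed, giving 0b101
```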
-
turbo_seti.find_doppler.helper_functions.
chan_freq
(header, fine_channel, tdwidth, ref_frame) Find channel frequency. Note issue #98.
Parameters: - header –
- fine_channel –
- tdwidth –
- ref_frame –
Returns: chanfreq
Return type: float
-
turbo_seti.find_doppler.helper_functions.
comp_stats
(np_arr, xp=None) Compute median and stddev of floating point vector array in a fast way, discarding outliers.
Parameters: - np_arr (ndarray) – Floating point vector array.
- xp (Numpy or Cupy, optional) – Math module to be used. If None, Numpy will be used.
Returns: the_median, the_stddev – Median and standard deviation of input array with outliers removed.
Return type: numpy.float32, numpy.float32
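One way to compute such outlier-resistant statistics is sketched below. The rule for what counts as an outlier is an assumption: this example keeps only values between the 5th and 95th percentiles, which may differ from what comp_stats does. The name comp_stats_sketch is hypothetical.

```python
import numpy as np


def comp_stats_sketch(np_arr):
    """Median and standard deviation after dropping values outside an assumed 5-95% band."""
    low, median, high = np.percentile(np_arr, [5, 50, 95])
    # Keep only the values inside the percentile band before taking the stddev.
    kept = np_arr[(np_arr >= low) & (np_arr <= high)]
    return np.float32(median), np.float32(kept.std())


data = np.array([1.0] * 98 + [1000.0, -1000.0])
the_median, the_stddev = comp_stats_sketch(data)
```

The two extreme values dominate the plain standard deviation of data, but they fall outside the band and so do not affect the_stddev.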
Merge DAT and LOG Files
Source file for merge_dats_logs()
-
turbo_seti.find_doppler.merge_dats_logs.
merge_dats_logs
(arg_h5: str, arg_dir: str, arg_type: str, cleanup='n') Merge multiple DAT (or LOG) files.
Parameters: - arg_h5 (str) – HDF5 file used by search() to produce the DAT and LOG files.
- arg_dir (str) – Directory holding multiple DAT and LOG files after FindDoppler.search() which ran with more than 1 partition.
- arg_type (str) – File extension of interest (‘dat’ or ‘log’).
- arg_h5 (str) – HDF5 file used by