Computational Astronomy Tut 5 ==================================== Due date: 1700 Thurs 6 May. For this tut I want you to write 3 interacting python programs. Overall description: -------------------- The input data consists of time recordings of the radio output of several pulsars. Each pulsar has been recorded at a number of closely-spaced (radio) frequencies. The aim of the exercise is, for each pulsar, to: - Estimate its period in seconds. - Estimate its dispersion measure (DM) in pc cm^-3. Program 1 is a 'wrapper' which will cycle through the list of pulsars. For each pulsar, this program will construct a file name, and call first program 2, then program 3, providing this file name and other appropriate data on the command line of these calls. Program 2 calculates the period of the pulsar and also writes some data to file for use by program 3. Program 3 uses the data provided by program 2 to calculate the dispersion measure for the pulsar. If there are routines which both programs 2 and 3 make use of, you should put these in a separate module, which you then import independently to programs 2 and 3. Duplication of code will be viewed as 'evil' and will attract the appropriate demerit. I will make the input data available in my home area of the NASSP file system, at /home/ims/data/teaching/comp_astron/2010/assignments/tut_5/ These files have names like 'tut_5_data_?_noisy.fits' where the ? is an integer. Required files you must submit to me: ------------------------------------- * The code for the three programs (plus code for any additional module you write). * A plot of the power spectrum for the first pulsar. This should be png or jpeg. * A file (may be fits, text, pdf or any other reasonable format) containing a list of period and DM for each pulsar. Don't just give me two columns of numbers with no other information! Annotate it in a professional manner, ie include name and units for the numbers. Names: ------ For this tut I am going to ask you to name your programs and some output files according to the following scheme. You don't for example need to include your own name in the file name. Programs: ......... ca_t5_wrapper.py ca_t5_period.py ca_t5_disp.py ca_t5_utils.py (if you have a module for common code, as mentioned above) Output files: ............. ca_t5_p1_power.png (or .jpg as appropriate) ca_t5_results Detailed description of programs 2 and 3 follows. Program 2: ---------- * Name: ca_t5_period.py * Input data: a FITS file which has a 2D array in its primary HDU. Each row of this is a time series which records the radio flux from a pulsar. Each row corresponds to a different radio frequency. Thus, after you open this array in python, the 0th axis (ie, .shape[0]) is the spectral, or frequency axis, while the 1st axis (.shape[1]) is the time axis. Information about the pixel size (which in the 0th axis corresponds to the channel width, in the 1st axis corresponds to the integration time) is contained in appropriate WCS keywords. * This program should: - Obtain 3 things from the command line: 1: the name of its input file; 2: name of the output power-spectrum plot file. 3: name of the output 'intermediate' file. - Read a 2D image from the primary HDU of the input FITS file. - Read the WCS keywords 'CRPIX1', 'CRVAL1', 'CUNIT1' and 'CDELT1' from the header of this HDU. - Read the WCS keyword 'CDELT2' from the header of this HDU. - Get the sizes of the input array (respectively, the number of spectral channels, and the number of time samples). - Initialize a 2D array to store the Fourier transform of each spectral channel. This should be complex-valued of course, and of size [numChannels, numTimeSamples]. - Initialize a 1D array to store a cumulative power spectrum. Set its contents to zero. - For each spectral channel: % Calculate the discrete Fourier transform (DFT) of the time series at this radio frequency. Numpy and scipy have fft routines to do this. Save this FT in the appropriate row of the FT array you initialized before starting this loop. % Calculate a power spectrum from this FT. % Add this to the cumulative power spectrum. - Now you have to locate the position of the first peak in the cumulative power spectrum (CPS). An appropriate algorithm to do this is to find the position of the maximum value of the CPS, subject to the following constraints: % Constraint 1: Exclude from consideration the 0th pixel, and every subsequent pixel, until the value begins to rise. For example, suppose the values of the first few pixels of this CPS were: Pixel: Value: Include? 0 1.32 No 1 0.97 No 2 0.43 No 3 0.22 No 4 0.29 Yes % Constraint 2: exclude the second half of the vector. Note that numpy objects have an argmax() method, but you might find it more convenient to use the maxloc() function of my module misc_utils (see below for notes on importing my modules). - Suppose we refer to the pixel number of this peak as peak N. - Now you have to calculate the period of the pulsar. This is equal to T/float(N), where T is the total observing duration. You can calculate T by multiplying the sample integration time stored in the 'CDELT2' WCS keyword by the number of time samples. - You should store the period at some place the wrapper program can receive it. How this information makes its way into the final list of results is up to you. - Now plot the cumulative power spectrum as a png or jpeg. NOTE! % You should ONLY plot the first 3.5*N pixels. % Label the axes properly. In particular, the X axis. Appropriate names for this are 'Reciprocal period' or 'cycle frequency'. Units should be s^-1. You will need to scale the X values to get the axis numbers right. % I only require this plot for the first pulsar, but it is probably simpler to make plots for all pulsars, then just send me the 1st of these. - Finally, create a FITS file which contains a table extension having the following properties: % The extension contains 2 columns. # Column 1 contains the real values of FTarray[:,N]. # Column 2 contains the imaginary values of FTarray[:,N]. You have to do it this way because FITS has no complex data type. % The extension contains the spectral-axis WCS keywords you read from the input file. Formally speaking we should use a different format for these, but for simplicity just use the same names. % Store the pulsar period in an appropriate keyword. - This FITS file is the 'intermediate' file and is meant to be read by program 3. There is no reason not to use the same file name for all pulsars, but whether to do this is up to you. Program 3: ---------- * Name: ca_t5_disp.py * Input data: the 'intermediate' FITS file written by program 2. This contains a binary table extension with 2 columns. The 2 columns record respectively the real and imaginary parts of a complex-valued spectrum. Information about the spectral channel size and starting points is contained in appropriate WCS keywords. * This program should: - Obtain 1 thing from the command line: 1: the name of its input file. - Read a complex-valued spectrum from the columns of the 1st extension of the input FITS file. - Read the WCS keywords 'CRPIX1', 'CRVAL1', 'CUNIT1' and 'CDELT1' from the header of this HDU. - Read the pulsar period from the keyword you stored it under. - Get the number of spectral channels (which is just the number of table rows). - Now you will be calculating the discrete Fourier transform of this spectrum. The reason for doing this is that the number of oscillations in it is proportional (approximately) to the dispersion measure. There is only 1 complication: that is that the number of oscillations may be quite small - 1 or 2 at most. So a bare FT of the spectrum will not be very precise, because the discrete FT only works with integer numbers of oscillations. To get more significant figures in the final result we need to 'zero-pad'. So: - Initialize a complex-valued, 1D array, of size equal to 100*numChannels. Ie we will be using a 100 to 1 zero pad. Set all its values to zero. - Store the spectrum you have read from file in the first numChannels pixels of this array. The remaining pixel values are left at zero. - Do a FFT on the result. - Calculate the power spectrum (PS) from this FT. - Now we want to find the position of the first peak in the PS. The algorithm for this is exactly the same as used for program 2. Let's call the position of this peak M, and the zero-pad multiple (which I have specified as 100 here, but remember, don't pepper your code with hard-coded 100s, store it once in a variable then use that variable!) as Z. - The DM is approximately given by M * T * 10^3 DM ~ --------------------------------------- 4.15 * float(Z) * (f_1**-3 - f_0**-3) Here T is the pulsar period in seconds, and f_0 and f_1 are radio frequencies in GHz. The frequencies are obtained from the WCS keywords as follows: f_0 = CRVAL1 + (0 - CRPIX1) * CDELT1 f_1 = CRVAL1 + (numChannels - 1 - CRPIX1) * CDELT1 - Convey this value of DM to the 'wrapper' script. Typical values of DM (which comes out in the correct units of pc cm^-3) are in the range 0-100. Eg Vela I think is 67. Importing my code: ------------------ I am going to make some modules available in my file system. I will set up symlinks in /home/ims/python/lib So if you add this directory to your PYTHONPATH environment variable, you will be able to import any module names you find in it. I haven't tested this yet, if it doesn't work I'll have to find another way, but let's give it a try. * * * * * I could think of more things to add to this tut but I think this will keep you busy for a while.