Effelsberg-Bonn HI Survey (EBHIS)
In a few months the new 21-cm 7-beam receiver will be installed. Our team (J. Kerp, P. Kalberla, R. Keller, A. Zensus, and myself) will perform an HI survey mapping the full northern hemisphere. The aim is to map both the Galactic and extragalactic sky at the same time, using two distinct spectrometer backends. Using a multi-beam receiver allows us to reach outstanding sensitivity, with detections of 10^7 M_sun dwarfs at the distance of the Virgo cluster (16 Mpc).
Survey properties
Area: 20600 (8500*) square degrees
Angular resolution: 9 arcmin (fully sampled)
Redshift range: up to ~0.07

EBHISe (extragalactic backend):
    Number of spectral channels: 3200
    Bandwidth: 100 MHz
    Velocity resolution: 6 km/s
    RMS limit: 5 (2*) mJy/beam
    Mass limit @ 16 Mpc: 1.8 (0.8*) x 10^7 M_sun

EBHISg (Galactic backend):
    Number of spectral channels: 1024
    Bandwidth: 5 MHz
    Velocity resolution: 1 km/s
    RMS limit: 12 (5.5*) mJy/beam
    N_HI limit: 0.9 (0.4*) x 10^18 cm^-2

* towards the SDSS area
Scientific aims
EBHISe - Extragalactic Survey
The low-mass end of the HI Mass Function (HIMF)
The abundance of low-mass objects (M ≤ 10^7 M_sun) and the effects of environmental conditions on them are currently not well understood. Exploring this population of galaxies or clouds is therefore a major task of all blind HI surveys. The Arecibo telescope will probe this population with its medium-deep environmental survey, using 1 minute of integration time per position. The collecting area of Arecibo is ten times larger than that of Effelsberg, but we will spend 10 minutes per position, making the Effelsberg data superior in a statistical sense due to its 8500 sq. degree coverage, compared to 2000 sq. degrees from Arecibo.
The local baryon budget
We will detect all HI clouds with a mass of M_HI ≥ 2.5 x 10^5 M_sun (assuming a line width of Δv = 80 km s^-1) out to a distance of 1.5 Mpc with a significance above 8σ. Accordingly, it is feasible to determine the neutral local baryon fraction towards the barycenter of the Local Group of Galaxies, which lies too far north for other radio observatories, with a unique signal-to-noise ratio.
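For reference, such mass limits follow from the standard relation between the integrated 21-cm line flux and the HI mass of an optically thin cloud at distance $D$:

$$\frac{M_\mathrm{HI}}{M_\odot} = 2.36 \times 10^{5} \left(\frac{D}{\mathrm{Mpc}}\right)^{2} \int S_\nu\,\mathrm{d}v \qquad \left[\int S_\nu\,\mathrm{d}v\ \mathrm{in\ Jy\,km\,s^{-1}}\right].$$

A fixed flux limit therefore translates into a mass limit growing with $D^2$, which is why the same survey that reaches $2.5 \times 10^5\,M_\odot$ clouds at 1.5 Mpc detects only $\gtrsim 10^7\,M_\odot$ dwarfs at the Virgo distance of 16 Mpc.
High-mass HI galaxies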
Owing to its large survey area of 8500 sq. degrees and its high sensitivity, the excellent statistics of the Effelsberg survey make it possible to quantify evolutionary effects as a function of redshift. Assuming that these effects have a (1+z)^3 dependence, at the edge of our frequency limit, corresponding to z = 0.066, the evolutionary effect will amount to 23%, which is easily detectable. The mass detection limit will be about 9 x 10^9 M_sun at that redshift.
Search for galaxies near low-redshift Lyman-alpha absorbers
This might be a short-term project within the Effelsberg survey, because we can start from an already existing source catalogue and search for HI emission towards particular lines of sight. If we detect HI emission, follow-up measurements with sensitive radio interferometers will be necessary to clarify whether the absorption originates in the galaxy halo or in accreted neutral clouds within the gravitational potential of the mass concentration.
Environmental dependence of the HI properties of galaxies
The large number and variety of galaxy clusters and groups in the northern sky allows us to study galaxy evolution and mass accretion as a function of environment and mass concentration. We can trace the effects of the intergalactic medium on the HI content of galaxies through correlation studies with X-ray data of the intergroup/intercluster medium. We will also use the SDSS data to study the optical properties of the galaxy clusters and groups as a function of the HI mass distribution. The proposed Effelsberg survey will allow us to study, for example, the M81/M82 group with a factor of four higher sensitivity than the HIJASS observation presented by Boyce et al. (2001).
EBHISg - Milky Way Survey
High-Velocity Clouds
High-velocity cloud research is a traditional theme of radio astronomy at Bonn University. We discovered the interaction of HVCs with the Galactic halo (Kerp, Lesch & Mack 1994; Pietz et al. 1996; Brüns, Kerp & Pagels 2001) and performed a large-scale correlation with ROSAT All-Sky Survey data (Kerp et al. 1999; Pradas et al. 2006). Using the Effelsberg survey, we will focus on the transition regions between dwarf galaxies and compact high-velocity clouds, and determine the frequency and mass spectrum of the ultra-compact high-velocity clouds, which may trace a population of Galactic clouds that has been overlooked up to now. The warm neutral gas which envelopes the cold cores of the HVCs will allow us to trace ram-pressure stripping events in the environment of the HVCs; studying HVCs with well-determined distance limits will allow us to probe the influence of stripping and heat conduction on the stability of HVCs in the disk-halo interface.
Multiphase and extraplanar gas
Studies of high- and intermediate-velocity clouds (objects of the infrared cirrus) show evidence for physical connections between both populations (Pietz et al. 1996; Kappes, Kerp & Richter 2003). We intend to study this connection in much greater detail, focusing in particular on the IVC phenomenon, which might be the key to clarifying the connection of high-altitude gas with Galactic fountain processes. Here, the analysis of line shapes is essential for investigating the multiphase structure.
Mass spectrum
The discovery of parsec- to AU-sized clouds within the Galactic halo (Lockman 2002) focused our interest on the tiniest structures which can exist within the extreme environment of the Milky Way halo. The Effelsberg survey will be the major resource for determining the mass and size spectrum of structures within the Milky Way halo, because of its unique signal-to-noise ratio, dense angular sampling, and sensitivity of 10^18 cm^-2 (24 mK rms at a line width of Δv = 20 km s^-1; 11 mK rms towards the SDSS area).
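The quoted column density sensitivity follows from the standard optically thin relation between brightness temperature and HI column density:

$$N_\mathrm{HI} = 1.823 \times 10^{18} \int T_B\,\mathrm{d}v \quad \mathrm{cm^{-2}} \qquad \left[T_B\ \mathrm{in\ K},\ v\ \mathrm{in\ km\,s^{-1}}\right].$$

With 24 mK rms over a line width of 20 km s^-1 this gives $1.823 \times 10^{18} \times 0.024 \times 20 \approx 0.9 \times 10^{18}\,\mathrm{cm^{-2}}$, consistent with the limit quoted in the survey properties above.
HI-shells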
The Canadian Galactic Plane Survey and the Southern Galactic Plane Survey (Taylor et al. 2002, 2003; McClure-Griffiths et al. 2001) provide a wealth of information on shells and cavities produced by stellar evolution within the Milky Way. The shallower Effelsberg survey of the low Galactic latitude sky will supplement these radio interferometer surveys with information on the warm neutral gas, with unique signal-to-noise quality. This will allow us to obtain a coherent view of the evolution of shells within the solar neighborhood and the inner Galaxy, and of the imprint of the density waves on these structures, which is only traceable in the warm gas.
X-ray absorption
Future X-ray missions like XEUS and Con-X will need to observe the early universe through the X-ray attenuating gas distribution of the Milky Way. The warm neutral medium is, by an order of magnitude, the most efficient absorber of soft X-ray photons with energies below 0.3 keV (Kerp et al. 1998; for a review see also Kerp 2003). To overcome the "cosmic conspiracy" it is necessary to observe the sky not only through the Lockman window or the Chandra Deep Field South window, but towards multiple lines of sight of interest with extremely well-studied HI column density distributions. The Effelsberg survey, in combination with the Parkes narrow-band survey, will be the standard resource for identifying such regions of "simple" soft X-ray absorption. (jk)
Data reduction
Data reduction scheme
We will use a data-reduction pipeline similar to that of HIPASS. For RFI mitigation, stray-radiation correction, and baseline subtraction we will make use of already existing and tested software developed in Bonn. During the first survey phase we will optimize our observational setup and adapt the software accordingly. The software is subdivided into modules, which are equipped with an MBFITS data interface. The whole data-reduction pipeline is sketched in Appendix D. In addition to this pipeline, we have to develop and test new ways to determine an accurate spectral baseline for the Milky Way survey. Here, we plan to follow the least-squares frequency switching approach of Arecibo (Heiles 2005). This software does not exist in Bonn but will be provided as part of our collaboration with Dr. M. Putman. To control the detection rate of galaxies within the data cubes, we will use the same approach as initially described by Rosenberg & Schneider (2002). This software module also needs to be implemented into the Effelsberg data-reduction chain.
The data-reduction chain uses the multi-beam FITS format (MBFITS) to exchange data products between the individual modules. Each module has an MBFITS input and provides a FITS-format table that comprises information on, e.g., RFI-contaminated spectral channels, the calibration factors, the stray-radiation correction, or the ideal bandpass. These FITS outputs can optionally be used by the merger module for the correction of the raw data. This modular structure allows us to debug the software very efficiently during development; a sketch of the pattern is given below. After the merger, the "gridder" calculates a corrected data cube, which can be analysed with the "galaxy parameteriser" (GaPA) presented in Appendix B. The RFI, flux, and stray-radiation modules are already programmed for the Effelsberg telescope. A new development is necessary for the modelling of the bandpass. Here we plan to take advantage of the experience of our colleagues from Arecibo, using the least-squares frequency switching method of Heiles (2005).
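To illustrate the modular pattern described above (analysis modules emit correction tables, and a merger applies them to the raw data), here is a minimal sketch. It is ours, not the Bonn code: the production modules are written in C++ and exchange MBFITS/FITS tables on disk, and all module names, table contents, and thresholds below are hypothetical.

```python
# Illustrative sketch of the modular pipeline pattern: each module analyses
# the raw spectra and emits a correction table; the merger applies them.
import numpy as np

class Module:
    """A pipeline module: consumes raw spectra, emits a correction table."""
    name = "base"
    def analyse(self, spectra):            # spectra: (n_dumps, n_channels)
        raise NotImplementedError

class RFIFlagger(Module):
    name = "rfi"
    def analyse(self, spectra):
        # flag samples deviating > 5 sigma from the per-channel median
        med = np.median(spectra, axis=0)
        rms = 1.4826 * np.median(np.abs(spectra - med), axis=0)
        return {"flags": np.abs(spectra - med) > 5.0 * rms}

class Calibrator(Module):
    name = "cal"
    def analyse(self, spectra):
        return {"gain": np.ones(spectra.shape[1])}   # placeholder gain table

def merger(spectra, tables):
    """Apply the correction tables to the raw data."""
    out = spectra / tables["cal"]["gain"]
    out[tables["rfi"]["flags"]] = np.nan             # blank flagged samples
    return out

spectra = np.random.normal(1.0, 0.01, (100, 1024))
tables = {m.name: m.analyse(spectra) for m in (RFIFlagger(), Calibrator())}
clean = merger(spectra, tables)
```

The point of the pattern is that each module can be debugged in isolation by inspecting its output table, exactly as described above.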
RFI detection/mitigation
Radio Frequency Interference
The sky is not "dark" for radio astronomers. Telecommunication and radio broadcast are present at different power levels everywhere on Earth. Today the sensitivity of radio telescopes and their receiver systems has improved to the point of measuring ever fainter signals from distant sources, especially when investigating the high-redshift universe with future telescopes like LOFAR or the SKA. On the other hand, the protected bands for modern radio telescopes are becoming continually more polluted by signals from a variety of electronic devices which produce Radio Frequency Interference (RFI) across the entire electromagnetic frequency range. We studied the interference situation at the Effelsberg 100-m telescope at 21-cm wavelength utilising a new FPGA-based spectrometer (Stanko et al. 2005), which provides spectra of high temporal resolution (down to 100 ms).
Algorithm
The first step in the detection of interference is the separation of all spectral features from the underlying baseline. Due to the high temporal resolution of our data we have several spectra for each specific position on the sky. We therefore implemented a 2-dimensional baseline-fitting procedure (in the time-frequency plane) with automatic setting of proper fit windows.
We use an edge-detection algorithm (utilising horizontal/vertical matched filters) to obtain a first guess of the features in the time-frequency plane. All signals above a certain threshold ($4\sigma_\mathrm{rms}$, robustly calculated) are excluded from the subsequent 2-dimensional baseline fit. In the successive iteration steps the fitted baseline is subtracted from the data, which yields a residual. This residual is inspected for signals in excess of a trigger level $x_\mathrm{trig}\sigma_\mathrm{rms}$, which in turn defines new windows. The procedure is repeated until the fit has converged. The final residual contains only astronomical line emission and RFI signals. To distinguish between the two, a follow-up statistical analysis is performed. It mainly uses the variability of most interferences as a classifier, but we implemented some additional detectors based on morphological considerations. We implemented a graphical frontend to the detection algorithm using C++/Qt, which provides a sophisticated widget library as well as container classes (arrays, lists) and multi-threading support. The latter makes it easy to use multi-processor systems to accelerate the computation. The FITS I/O is done using the CFITSIO library. The accompanying figures show the progress-bar tab while using four threads (left panel) and a graphical preview of the currently detected RFI peaks overplotted on the current data (right panel; x-axis: frequency, y-axis: time), in this case mainly narrow-band, time-variable signals. A simplified sketch of the fit-and-clip loop follows.
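As an illustration, a 1-D simplification of this fit-and-clip loop might look as follows (the production code fits in the 2-D time-frequency plane and seeds its windows with the edge-detection first guess; function names and the polynomial baseline are assumptions of this sketch):

```python
# 1-D simplification of the iterative baseline/clipping scheme described
# above. Samples above trig*sigma are excluded, the baseline is refitted,
# and the loop repeats until the exclusion windows stabilise.
import numpy as np

def iterative_baseline(freq, spec, deg=3, trig=4.0, max_iter=20):
    """Fit a polynomial baseline, excluding samples above trig*sigma_rms.

    Returns the fitted baseline and a boolean mask of excluded channels
    (astronomical lines and RFI end up in the mask)."""
    x = freq - freq.mean()                    # center for numerical stability
    mask = np.zeros(spec.size, dtype=bool)    # True = excluded from the fit
    for _ in range(max_iter):
        coeff = np.polyfit(x[~mask], spec[~mask], deg)
        resid = spec - np.polyval(coeff, x)
        # robust rms estimate from the median absolute deviation
        sigma = 1.4826 * np.median(np.abs(resid[~mask]))
        new_mask = np.abs(resid) > trig * sigma
        if np.array_equal(new_mask, mask):    # windows stable -> converged
            break
        mask = new_mask
    return np.polyval(coeff, x), mask

# toy spectrum: sloped baseline plus a narrow "RFI" spike
freq = np.linspace(1400.0, 1427.0, 1024)                        # MHz
spec = 0.5 + 0.02 * (freq - 1413.5) + np.random.normal(0.0, 0.02, freq.size)
spec[500:505] += 0.5
baseline, flagged = iterative_baseline(freq, spec)
```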
RFI situation at Effelsberg
Using the number counts of the detected RFI signals it is possible to constrain the direction of their origin. The accompanying figure shows the frequency of occurrence (weighted by SNR) of the RFI signals as a function of the angular distance from the observatory building. At an angular distance of about 100$^\circ$ a local maximum can be identified. We attribute this to the so-called spill-over ring of the telescope receiver system, meaning that a fraction of the interference originates from the observatory building itself. Towards smaller angular distances the count rate increases. We associate this behaviour with one of the sidelobes of the telescope, supporting our interpretation.
Stray-radiation correction
In HI (21-cm) astronomy it is well known that the Galactic HI disk can cause strong spectral "wings" at small radial velocities. These are produced when a strong emitter (and the HI disk is certainly one) lies in a direction the sidelobes of the telescope beam point to. In this way the observed signal of higher-latitude sources can be polluted, which is known as "stray radiation". Our institute hosts the world's expert on stray radiation, Peter Kalberla, who performed the correction for the famous Leiden-Argentine-Bonn (LAB) survey. He will also be heavily involved in the new EBHIS, performing its stray-radiation correction.
The Gamma test
The Gamma test is a statistical method which allows one to compute the noise in data with an arbitrary function underlying it. This function does not necessarily have to be known; the one major condition for the Gamma test to work is that the function is smooth (and continuous). This turns out to be extremely useful in time-series modelling, where one tries to find models using neural networks or genetic algorithms. A major problem in training a neural network is the stopping condition, i.e. to know when to stop the training. For that, one must know the noise which is incorporated in the data. But also for other applications, astronomy among them, it is useful to have a noise estimator which is very robust against baseline fluctuations. The Gamma test was originally developed at Cardiff University; for a proof see the Ph.D. thesis of Dafydd Evans (Evans 2002a: "The Gamma Test: Data derived estimates of noise for unknown smooth models using near neighbour asymptotics", Department of Computer Science, Cardiff University, 2002).

In order to understand what the Gamma test does, it is useful to start with the following simple consideration. Assume we have a function f(x), the model. Then we can write

$$y = f(x) + r, \qquad y' = f(x') + r',$$

where $r$ and $r'$ are noise values drawn from the same noise distribution. In the limit of

$$x' \to x$$

we can safely assume that

$$f(x') - f(x)$$

goes to zero if the model is continuous. It follows that

$$(y' - y)^2 \;\to\; (r' - r)^2.$$

We compute the expectation values of both sides:

$$\tfrac{1}{2}\,\mathrm{E}\big[(y' - y)^2\big] \;\to\; \tfrac{1}{2}\,\mathrm{E}\big[(r' - r)^2\big] = \mathrm{var}(r).$$

The right-hand side turns out to be the variance of $r$, which is what we are looking for. But if this is already the answer, what is the Gamma test about? The problem with this calculation is that observational data are not sampled densely enough to actually evaluate this limit; the Gamma test is the tool to deal with that.

To set things up we introduce some definitions. Suppose we have $M$ observations

$$\{(x_i, y_i) : 1 \le i \le M\}, \qquad y_i = f(x_i) + r_i,$$

where the samples are generated by an unknown function $f$ with noise added. We assume the noise distribution to have zero mean, which is without loss of generality, because we could transform the model to fulfil this. We require $f$ to be smooth for obvious reasons.

From our first simple consideration we know that it might be a good idea to exploit the continuity of $f$. Consider two points $x_i$ and $x_j$ which are close together; we then expect $f(x_i)$ and $f(x_j)$ to be close together as well, and if they are not, this can only be due to noise. The Gamma method is based on the statistic

$$\gamma = \frac{1}{2M} \sum_{i=1}^{M} \left( y'_i - y_i \right)^2,$$

where $y'_i$ denotes the function value at the nearest neighbour of $x_i$; the statistic is therefore sometimes called a near-neighbour statistic. We now write a similar form not only for the first but for the $p$-th nearest neighbours,

$$\gamma(p) = \frac{1}{2M} \sum_{i=1}^{M} \left( y_{N[i,p]} - y_i \right)^2,$$

where the notation $N[i,p]$ means the $p$-th nearest neighbour of $x_i$ (or the list of such neighbours if several are equidistant). We call the distance between a point and its $p$-th nearest neighbour $\delta(p)$, computed as the mean squared distance over all samples:

$$\delta(p) = \frac{1}{M} \sum_{i=1}^{M} \left| x_{N[i,p]} - x_i \right|^2.$$

So we not only have the nearest-neighbour statistic, which we started from, but a whole family of statistics for the $p$-th nearest neighbours. One can now show that

$$\gamma(p) = \mathrm{var}(r) + A\,\delta(p) + o\big(\delta(p)\big).$$

If $\delta$ goes to zero this becomes the pure variance of $r$, provided the constant $A$ is finite. This is why we need the smoothness of the model $f$: then the partial derivatives of $f$ are bounded and $A$ becomes finite. For the equation to be valid a further condition must be fulfilled, namely that the first four moments of the noise distribution are bounded; this is needed for technical reasons in the proof of the Gamma test. Of course we require the noise on different outputs to be independent of each other, and the noise on the output to be homogeneous over the input space. If the latter is not fulfilled, this is not fatal for practical applications: the Gamma test will still provide a useful estimate of the average of the variance.

The Gamma test now computes all values of $\delta(p)$ and $\gamma(p)$ up to a maximum $p_\mathrm{max}$. By linear regression we find the vertical intercept, which is effectively the limit of $\gamma$ for $\delta \to 0$, i.e. the variance of $r$. The slope of the regression line also provides useful information, as it depends on the model complexity: the slope is proportional to the constant $A(p)$, which is related to the partial derivatives of $f$. Incidentally, the time complexity of the algorithm is of the order $M \log M$, which is acceptable as long as $M$ is not too large.

We finish with an example. Let's use a simple sine function as the model and add some Gaussian noise: 1000 samples were generated according to the sine model plus Gaussian noise with a standard deviation of 0.1. Performing the Gamma test first yields the $\gamma$ and $\delta$ values as a function of $p$. These values lie more or less on a straight line; note that this plot does not always look so clean, as the method only makes a statement about the intercept, not about the points themselves. We extract a variance of about $9.8 \times 10^{-3}$; taking the square root gives $\sigma \approx 0.1$. For comparison, the mean square error is about 0.7, which is of course totally off, because the amplitude of the sine function is much larger than the amplitude of the noise. We utilize the Gamma test for galaxy finding within the survey data cubes; see the next topic (Galaxy finding). A compact implementation sketch follows.
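For illustration, a minimal NumPy version of the procedure just described, reproducing the sine example, might look like this (we use a brute-force $O(M^2)$ neighbour search for clarity instead of the $M \log M$ kd-tree approach; variable names are ours):

```python
# Minimal Gamma test: estimate the noise variance var(r) as the intercept
# of the regression of gamma(p) against delta(p).
import numpy as np

def gamma_test(x, y, p_max=10):
    """Return (intercept, slope); the intercept estimates var(r)."""
    x = np.atleast_2d(x).T if x.ndim == 1 else x
    # pairwise squared distances in input space
    d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)                   # exclude self-matches
    order = np.argsort(d2, axis=1)                 # neighbours by distance
    delta = np.empty(p_max)
    gamma = np.empty(p_max)
    idx = np.arange(len(x))
    for p in range(1, p_max + 1):
        nn = order[:, p - 1]                       # p-th nearest neighbour
        delta[p - 1] = np.mean(d2[idx, nn])        # mean squared distance
        gamma[p - 1] = 0.5 * np.mean((y[nn] - y) ** 2)
    slope, intercept = np.polyfit(delta, gamma, 1) # straight-line fit
    return intercept, slope

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 2.0 * np.pi, 1000)
y = np.sin(x) + rng.normal(0.0, 0.1, x.size)
var_r, slope = gamma_test(x, y)
print(var_r)   # approximately 0.01, i.e. sigma of about 0.1
```

Running this yields an intercept of roughly $10^{-2}$, i.e. $\sigma \approx 0.1$, in line with the numbers quoted above.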
Galaxy finding/parametrization
The Effelsberg HI extragalactic survey will produce a huge data volume. In comparison to the HIPASS survey we observe a smaller portion of the sky, but due to the faster dumping of spectra we end up with several TBytes of raw data. Based on our experience with HIPASS data cubes in establishing the HICAT database, we expect to detect several thousand galaxies in total. This makes a manual analysis and parameterisation of the galaxies impossible. Several finder algorithms exist (e.g. MULTIFIND, TOPHAT, and wavelet-based algorithms), but none of them is able to detect all galaxies within a data cube. Merging the results of the different finder algorithms yields a nearly complete list of possible galaxies. The major drawback of this approach is that the candidate list contains about three times more candidates than real galaxies, so a manual verification and parameterisation is unavoidable (using shell scripts and the MIRIAD software package). Based on the experience of the HICAT group members, the whole procedure would last more than a full man-year. We studied several methods to find a more sophisticated approach, consisting of two main concepts. First, we utilize a new finder routine developed at Cardiff University, which is based on the Gamma test method described above; P. Boyce (2003) proposed an algorithm using a moving Gamma test for galaxy detection. The Gamma test is able to detect a complete list of galaxies, although low-flux galaxies compete with artificial spectral features caused by RFI signals. The latter are indeed a main problem for the application of this method to the HIPASS cubes, but as we have a proven RFI detection toolbox for the FPGA-based spectrometer, we expect excellent results for our Effelsberg data. Second, we have written a graphical user interface especially designed for the fastest possible verification and parameterisation of candidate galaxies. It provides access to the Gamma test finder algorithm as well as to a newly developed X-ray finder (whose efficiency is still to be proven). Based on the candidate lists it performs an automatic parameterisation; the user can confirm or reject candidates within a few seconds and, if necessary, calibrate bandpass shapes. The compiled galaxy lists can be saved in XML format and, with the help of an XSL style file, instantly visualized with a web browser. The accompanying figures show screenshots of the graphical user interface displaying three projections of an HI data cube, in addition to windows showing one projection of the X-ray and the Gamma-value cubes. The compiled source list is provided in the lower-left portion of each figure. In the right figure, the moment map, including data, model fit (a two-dimensional elliptical Gaussian), and residual, as well as the peak-flux and weighted-mean spectra used for parameterisation, are displayed in separate windows. A sketch of such a moment-map fit is given below.
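The moment-map fit mentioned above can be sketched as follows; the map size, noise level, and starting values are made up for the example, and the actual GaPA fit may differ in detail:

```python
# Sketch of the moment-map parameterisation step: fit a two-dimensional
# elliptical Gaussian to a (synthetic) moment-0 map and form the residual.
import numpy as np
from scipy.optimize import curve_fit

def elliptical_gaussian(coords, amp, x0, y0, sx, sy, theta):
    """2-D elliptical Gaussian rotated by theta, on flattened coordinates."""
    x, y = coords
    ct, st = np.cos(theta), np.sin(theta)
    xr = (x - x0) * ct + (y - y0) * st
    yr = -(x - x0) * st + (y - y0) * ct
    return amp * np.exp(-0.5 * ((xr / sx) ** 2 + (yr / sy) ** 2))

y, x = np.mgrid[0:64, 0:64]
coords = (x.ravel(), y.ravel())
truth = (1.0, 30.0, 34.0, 5.0, 3.0, 0.4)          # synthetic "galaxy"
moment0 = elliptical_gaussian(coords, *truth) + np.random.normal(0, 0.02, x.size)
popt, _ = curve_fit(elliptical_gaussian, coords, moment0,
                    p0=(1.0, 32.0, 32.0, 4.0, 4.0, 0.0))
residual = moment0 - elliptical_gaussian(coords, *popt)
```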
Gridding
Gridding is the process of producing a data cube from a set of spectra taken at arbitrary coordinates; as the name indicates, the result lives on a regular grid. The procedure is related to interpolation/smoothing, which is sometimes called regridding when the data are transformed from one regular grid to another. In our case, however, we do not start with regularly spaced data. Each of the spectra contains the flux of nearby sources convolved with the telescope beam (the true intensity distribution is filtered with the beam and then sampled, i.e. measured at discrete positions).

The simplest gridding method would be to calculate for each pixel the mean of all data points within a certain radius. This is equivalent to box filtering, which can introduce ringing artifacts (the Gibbs phenomenon). Barnes (2001; see also Gridzilla) proposes to use the median estimator instead of the mean and introduces a smooth (Gaussian) cut-off function to avoid ringing. The median, however, leads to about 25% higher noise in the final data cubes; it was used mainly to deal with RFI signals. We therefore chose the mean estimator, weighted with a Gaussian of a certain kernel width (which slightly reduces the effective spatial resolution). In return we gain the possibility to serialize the data processing: while Barnes needed to keep in memory all spectra entering a median, we can process the spectra one by one, which enormously reduces the memory consumption of our algorithm. Each spectrum is loaded sequentially, and according to its spatial coordinates all pixels within a certain range are determined. Based on the distance of the spectrum's coordinates to each of those pixels, a weighting factor is computed from the smoothing Gaussian. The product of the weighting factor and the spectral intensities is added to the appropriate pixels in the cube, while the weighting factor itself is added to a weighting map which contains the sum of all weighting factors for each spatial pixel coordinate. After processing all spectra, the data cube has to be divided by the weighting map to provide correct fluxes for each pixel.

In astronomy we have to deal with the fact that observations intrinsically take place in a spherical coordinate system. The previous paragraphs did not account for the projection effects which obviously occur when we map from the observed curvilinear coordinates to a rectangular grid (using a certain coordinate projection/transformation); the latter is necessary if we want to produce a data cube in FITS format. However, only slight changes to our method are required to incorporate general coordinate projections. First, we need the transformation equations to convert between the true coordinates on the sphere of the sky (the World Coordinate System, WCS) and the pixel representation within the data cube. For each input spectrum we calculate the pixel coordinates from its RA/Dec or Lon/Lat coordinates. The distance out to which we filter is also defined in the WCS. We determine the equivalent pixels and add the weighted values of the spectrum into the data cube (as explained above, using the WCS distance to determine the weighting factors). Finally we divide the summed-up data cube by the summed-up weighting factors in each pixel. So the only difference to the algorithm described earlier is that we use the correct measure to determine distances.
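A minimal sketch of this serialized gridding scheme might look as follows (flat-sky approximation only; as described above, the real gridder measures distances in the WCS, and the kernel width and truncation radius here are illustrative):

```python
# Serialized gridding: spectra are processed one by one, accumulated into
# the cube with Gaussian weights, and normalized by the weight map at the end.
import numpy as np

def grid_spectra(spectra, lon, lat, grid_lon, grid_lat, kernel_fwhm):
    """spectra: (n_spec, n_chan); lon/lat per spectrum and grid axes in deg."""
    sigma = kernel_fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    r2_max = (3.0 * sigma) ** 2                    # truncate the Gaussian
    gy, gx = np.meshgrid(grid_lat, grid_lon, indexing="ij")
    cube = np.zeros((grid_lat.size, grid_lon.size, spectra.shape[1]))
    weight = np.zeros((grid_lat.size, grid_lon.size))
    for spec, l0, b0 in zip(spectra, lon, lat):    # one spectrum at a time
        r2 = (gx - l0) ** 2 + (gy - b0) ** 2
        sel = r2 < r2_max                          # pixels within range
        w = np.exp(-0.5 * r2[sel] / sigma ** 2)
        cube[sel] += w[:, None] * spec             # weighted accumulation
        weight[sel] += w                           # sum of weights per pixel
    return cube / np.maximum(weight, 1e-12)[:, :, None]

# example: 3 spectra of 64 channels gridded onto a 32 x 32 grid
spectra = np.random.normal(0.0, 1.0, (3, 64))
cube = grid_spectra(spectra,
                    np.array([10.1, 10.2, 10.3]), np.array([45.0, 45.1, 45.05]),
                    np.linspace(9.5, 10.5, 32), np.linspace(44.5, 45.5, 32),
                    kernel_fwhm=0.2)
```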
In order to provide a convenient interface to the gridding algorithm explained above, we chose to implement it in C++/Qt, which provides a sophisticated widget library as well as container classes (arrays, lists) and multi-threading support. The latter makes it easy to use multi-processor systems to accelerate the gridding. The FITS I/O is done using the CFITSIO library; the WCS-pixel conversion uses WCSlib. First, the user has to define the input files containing FITS tables with the spectra to be gridded. The viewport then shows the positions of the spectra in the currently chosen coordinate system, which can be changed in the "Fits header" tab. At the moment WCSlib provides more than 20 projections. The user can also convert the RA-Dec coordinates to Galactic coordinates, as well as define the relevant FITS header entries (the number of pixels in the grid, the spatial resolution, the reference pixel, etc.).
The viewport updates its content on the fly. For convenience one can also change the displayed coordinate mesh. Finally, on the "Gridding" tab one can choose between different weighting functions (at the moment Gaussian weighting and a weighting based on the zeroth-order spherical Bessel function are implemented; the latter, though, is quite time-consuming). Here the beam size (according to the input spectra), the desired kernel size, and the maximum radius out to which the filtering is carried out can be adjusted.
Bandpass calibration (using LSFS; see Heiles 2007)
Least-squares frequency switching (LSFS; Heiles technote) is a new method to reconstruct both the signal and the gain function (known as bandpass or baseline) from spectral-line observations taken in frequency-switching mode. LSFS utilizes not just two but a set of three or more Local Oscillator (LO) frequencies. The reconstruction is based on a least-squares fitting scheme. Here we present a detailed analysis of the stability of the LSFS method in a statistical sense and test its robustness against Radio Frequency Interference (RFI), Intermediate Frequency (IF) gain instabilities, and continuum sources. It turns out that LSFS is indeed a very powerful method and is robust against most of these problems. Nevertheless, LSFS fails in the presence of RFI signals or strong line emission. We worked out solutions to overcome these limitations (see "The Robustness of Least-Squares Frequency Switching (LSFS)").
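To illustrate the principle behind LSFS, here is a toy reconstruction: with several LO shifts, the logarithm of each measured spectrum is the sum of a fixed (log-)bandpass and a shifted (log-)signal, so both can be separated by a sparse linear least-squares fit. All sizes, shifts, noise levels, and the degeneracy constraint below are illustrative; see Heiles (2007) for the actual formulation and for the RFI and gain-stability caveats discussed above.

```python
# Toy LSFS: solve log P_j(i) = g(i) + s(i + d_j) for the log-bandpass g and
# log-signal s, given several LO shifts d_j, via sparse least squares.
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import lsqr

n_chan = 256
shifts = [0, 3, 7, 12]                                 # LO offsets in channels
rng = np.random.default_rng(1)
g = 0.1 * np.sin(np.arange(n_chan) / 20.0)             # log-bandpass (unknown)
s = np.zeros(n_chan + max(shifts))                     # log-signal (unknown)
s[100:110] = 0.5                                       # a spectral line

n_s = s.size
A = lil_matrix((n_chan * len(shifts) + 1, n_chan + n_s))
b = []
for j, dj in enumerate(shifts):
    for i in range(n_chan):
        r = j * n_chan + i
        A[r, i] = 1.0                                  # bandpass term g(i)
        A[r, n_chan + i + dj] = 1.0                    # shifted signal s(i + d_j)
        b.append(g[i] + s[i + dj] + rng.normal(0.0, 1.0e-3))
A[-1, :n_chan] = 1.0                                   # constraint: mean(g) = 0
b.append(0.0)
x = lsqr(A.tocsr(), np.array(b))[0]
g_est, s_est = x[:n_chan], x[n_chan:]                  # recovered up to a constant
```

The extra constraint row pins the mean of the log-bandpass to zero; without it the system is degenerate, since adding a constant to $g$ and subtracting it from $s$ leaves the data unchanged.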