Project description

  • Student: Andrei Cristian Ignat
  • Supervisor: Dr. Christoph Deil
  • Time: June 10, 2013 to August 9, 2013 (9 weeks)
  • Place: MPIK Heidelberg, H.E.S.S. group of Prof. Werner Hofmann

Abstract

Astronomical gamma-ray data analysis can be very CPU- and/or I/O-intensive. The purpose of this nine-week project for a first-year physics student is to time and profile typical data analysis tasks, with a focus on the speedups that can be obtained for the maximum likelihood fitting step by using multiple CPU cores.

Project plan

We have nine weeks, and it is very hard to predict how fast results will be obtained, so week 6 is reserved either to continue with the main project or to do one of the side projects, and the two weeks at the end are set aside to write up the report and finish loose ends.

  • Week 1: Learn some gamma-ray astrophysics (see references above)
  • Week 2: Learn some gamma-ray data analysis methods (see references above)
  • Week 3: Define and produce test data sets (one Galactic and one extragalactic; one Fermi and one HESS)
  • Week 4: Run and time analyses with ctools on at least two machines and measure the speedup as a function of the number of cores (see the timing sketch after this list).
  • Week 5: Profile the analyses to find out where the CPU time is spent. Possibly try different compilers (gcc, clang, icc) and optimiser flags.
  • Week 6: Continue the main project or, if there is time, do one of these things: time HAP, gt_apps_mp or Sherpa (see above) or some of the other ctools tasks (see here). Looking at CPU usage, memory usage and disk I/O would also be interesting, to get a rough overview of what the analyses are doing (e.g. ctselect is probably limited by disk I/O speed); a resource-monitoring sketch is given after this list.
  • Week 7: Write up report
  • Week 8: Iterate project report (e.g. clearer description or double-check results or add additional plots or ...)
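
To get going with the week 4 timing, a minimal sketch along the following lines could be used. It assumes that the maximum likelihood step is the ctools ctlike executable, that its OpenMP parallelism honours the OMP_NUM_THREADS environment variable, and that the file names are placeholders for the week 3 test data sets:

```python
#!/usr/bin/env python
"""Timing sketch: measure ctlike wall-clock time vs. number of cores."""
import os
import subprocess
import time

# Hypothetical ctlike call; replace the file names with the real
# test data sets produced in week 3.
COMMAND = [
    "ctlike",
    "inobs=events.fits",
    "inmodel=model.xml",
    "outmodel=result.xml",
]

results = {}
for ncores in [1, 2, 4, 8]:
    # Assumption: the fit is parallelised with OpenMP, so the number
    # of threads can be controlled via OMP_NUM_THREADS.
    env = dict(os.environ, OMP_NUM_THREADS=str(ncores))
    t0 = time.time()
    subprocess.check_call(COMMAND, env=env)
    results[ncores] = time.time() - t0

# Speedup relative to the single-core run: S(n) = T(1) / T(n)
for ncores in sorted(results):
    print("cores=%d  time=%.2f s  speedup=%.2f"
          % (ncores, results[ncores], results[1] / results[ncores]))
```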
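
For the week 6 resource overview, a rough per-process monitor can be built with the third-party psutil package. This is only a sketch: the ctselect command is a placeholder, and io_counters() is not available on every platform (it works on Linux):

```python
#!/usr/bin/env python
"""Monitoring sketch: sample CPU, memory and disk I/O of one analysis step."""
import subprocess
import psutil  # third-party package

# Placeholder command; substitute the analysis step to monitor.
proc = subprocess.Popen(["ctselect", "inobs=events.fits",
                         "outobs=selected.fits"])

p = psutil.Process(proc.pid)
while proc.poll() is None:
    try:
        cpu = p.cpu_percent(interval=1.0)  # CPU % over the last second
        mem = p.memory_info().rss / 1e6    # resident memory in MB
        io = p.io_counters()               # cumulative bytes read/written
        print("cpu=%5.1f%%  mem=%7.1f MB  read=%d B  written=%d B"
              % (cpu, mem, io.read_bytes, io.write_bytes))
    except psutil.NoSuchProcess:
        break  # process finished between poll() and sampling
```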

The project report, notes and scripts in the https://github.com/gammapy/gamma-speed/ repo are the product of your project. They should be a starting point for further work on HESS, Fermi and CTA data analysis speed by others in the future. Detailed descriptions of which tools you tried for timing and profiling (and possibly for measuring memory usage and disk I/O), which of them are useful and which aren't, and how to use them are helpful.

The most useful outcome would be an automatic script that measures certain aspects of ctools performance for typical analysis scenarios and that can easily be re-run to try out speed improvements and prevent performance regressions, although this level of automation is most likely not possible in the given time. To get an idea of what I have in mind here, have a look at the PyPy speed center or the pandas benchmarks as measured by vbench.
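
Just as a sketch of what such a harness might look like, the script below runs a fixed set of hypothetical scenarios and appends timestamped wall-clock times to a CSV file, so that successive runs can be compared for regressions; all scenario names, commands and file names are placeholders:

```python
#!/usr/bin/env python
"""Benchmark harness sketch: append timestamped timings to a CSV."""
import csv
import datetime
import subprocess
import time

# Hypothetical scenarios: name -> command line (placeholders only).
SCENARIOS = {
    "ctlike_galactic": ["ctlike", "inobs=gal.fits",
                        "inmodel=gal.xml", "outmodel=gal_out.xml"],
    "ctlike_extragalactic": ["ctlike", "inobs=egal.fits",
                             "inmodel=egal.xml", "outmodel=egal_out.xml"],
}

timestamp = datetime.datetime.now().isoformat()
with open("benchmarks.csv", "a") as f:
    writer = csv.writer(f)
    for name in sorted(SCENARIOS):
        t0 = time.time()
        subprocess.check_call(SCENARIOS[name])
        # One row per scenario and run; plot the history per scenario
        # to spot performance regressions.
        writer.writerow([timestamp, name, "%.3f" % (time.time() - t0)])
```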

Further references

Here are some more useful references for tools you might use:
