{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["
\n", "
\n", "
\n", "
ObsPy Tutorial
\n", "
Introduction to File Formats and read/write in ObsPy
\n", "
\n", "
\n", "
"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Seismo-Live: http://seismo-live.org\n", "\n", "##### Authors:\n", "* Lion Krischer ([@krischer](https://github.com/krischer))\n", "* Tobias Megies ([@megies](https://github.com/megies))\n", "---"]}, {"cell_type": "markdown", "metadata": {}, "source": ["![](images/obspy_logo_full_524x179px.png)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["This is oftentimes not taught, but fairly important to understand, at least at a basic level. This also teaches you how to work with these in ObsPy."]}, {"cell_type": "markdown", "metadata": {}, "source": ["**This notebook aims to give a quick introductions to ObsPy's core functions and classes. Everything here will be repeated in more detail in later notebooks.**"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "plt.style.use('ggplot')\n", "plt.rcParams['figure.figsize'] = 12, 8"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## SEED Identifiers\n", "\n", "According to the [SEED standard](www.fdsn.org/seed_manual/SEEDManual_V2.4.pdf), which is fairly well adopted, the following nomenclature is used to identify seismic receivers:\n", "\n", "* **Network code**: Identifies the network/owner of the data. Assigned by the FDSN and thus unique.\n", "* **Station code**: The station within a network. *NOT UNIQUE IN PRACTICE!* Always use together with a network code!\n", "* **Location ID**: Identifies different data streams within one station. Commonly used to logically separate multiple instruments at a single station.\n", "* **Channel codes**: Three character code: 1) Band and approximate sampling rate, 2) The type of instrument, 3) The orientation\n", "\n", "This results in full ids of the form **NET.STA.LOC.CHAN**, e.g. **IV.PRMA..HHE**.\n", "\n", "\n", "---\n", "\n", "\n", "In seismology we generally distinguish between three separate types of data:\n", "\n", "1. **Waveform Data** - The actual waveforms as time series.\n", "2. **Station Data** - Information about the stations' operators, geographical locations, and the instrument's responses.\n", "3. **Event Data** - Information about earthquakes.\n", "\n", "Some formats have elements of two or more of these."]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Waveform Data\n", "\n", "![stream](images/Stream_Trace.svg)\n", "\n", "There are a myriad of waveform data formats but in Europe and the USA two formats dominate: **MiniSEED** and **SAC**\n", "\n", "\n", "### MiniSEED\n", "\n", "* This is what you get from datacenters and also what they store, thus the original data\n", "* Can store integers and single/double precision floats\n", "* Integer data (e.g. counts from a digitizer) are heavily compressed: a factor of 3-5 depending on the data\n", "* Can deal with gaps and overlaps\n", "* Multiple components per file\n", "* Contains only the really necessary parameters and some information for the data providers"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["import obspy\n", "\n", "# ObsPy automatically detects the file format.\n", "st = obspy.read(\"data/example.mseed\")\n", "print(st)\n", "\n", "# Fileformat specific information is stored here.\n", "print(st[0].stats.mseed)"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["st.plot()"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["# This is a quick interlude to teach you basics about how to work\n", "# with Stream/Trace objects.\n", "\n", "# Most operations work in-place, e.g. they modify the existing\n", "# objects. We'll create a copy here.\n", "st2 = st.copy()\n", "\n", "# To use only part of a Stream, use the select() function.\n", "print(st2.select(component=\"Z\"))\n", "\n", "# Stream objects behave like a list of Trace objects.\n", "tr = st2[0]\n", "\n", "tr.plot()\n", "\n", "# Some basic processing. Please note that these modify the\n", "# existing object.\n", "tr.detrend(\"linear\")\n", "tr.taper(type=\"hann\", max_percentage=0.05)\n", "tr.filter(\"lowpass\", freq=0.5)\n", "\n", "tr.plot()"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["# You can write it again by simply specifing the format.\n", "st.write(\"temp.mseed\", format=\"mseed\")"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### SAC\n", "\n", "* Custom format of the `sac` code.\n", "* Simple header and single precision floating point data.\n", "* Only a single component per file and no concept of gaps/overlaps.\n", "* Used a lot due to `sac` being very popular and the additional basic information that can be stored in the header."]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["st = obspy.read(\"data/example.sac\")\n", "print(st)\n", "st[0].stats.sac.__dict__"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["st.plot()"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["# You can once again write it with the write() method.\n", "st.write(\"temp.sac\", format=\"sac\")"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Station Data\n", "\n", "![inv](images/Inventory.svg)\n", "\n", "Station data contains information about the organziation that collections the data, geographical information, as well as the instrument response. It mainly comes in three formats:\n", "\n", "* `(dataless) SEED`: Very complete but pretty complex and binary. Still used a lot, e.g. for the Arclink protocol\n", "* `RESP`: A strict subset of SEED. ASCII based. Contains **ONLY** the response.\n", "* `StationXML`: Essentially like SEED but cleaner and based on XML. Most modern format and what the datacenters nowadays serve. **Use this if you can.**\n", "\n", "\n", "ObsPy can work with all of them but today we will focus on StationXML."]}, {"cell_type": "markdown", "metadata": {}, "source": ["They are XML files:"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["!head data/all_stations.xml"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["import obspy\n", "\n", "# Use the read_inventory function to open them.\n", "inv = obspy.read_inventory(\"data/all_stations.xml\")\n", "print(inv)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["You can see that they can contain an arbirary number of networks, stations, and channels."]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["# ObsPy is also able to plot a map of them.\n", "inv.plot(projection=\"local\");"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["# As well as a plot the instrument response.\n", "inv.select(network=\"IV\", station=\"SALO\", channel=\"BH?\").plot_response(0.001);"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["# Coordinates of single channels can also be extraced. This function\n", "# also takes a datetime arguments to extract information at different\n", "# points in time.\n", "inv.get_coordinates(\"IV.SALO..BHZ\")"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["# And it can naturally be written again, also in modified state.\n", "inv.select(channel=\"BHZ\").write(\"temp.xml\", format=\"stationxml\")"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Event Data\n", "\n", "![events](./images/Event.svg)\n", "\n", "Event data is essentially served in either very simple formats like NDK or the CMTSOLUTION format used by many waveform solvers:"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["!cat data/GCMT_2014_04_01__Mw_8_1"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Datacenters on the hand offer **QuakeML** files, which are surprisingly complex in structure but can store complex relations."]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["# Read QuakeML files with the read_events() function.\n", "cat = obspy.read_events(\"data/GCMT_2014_04_01__Mw_8_1.xml\")\n", "print(cat)"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["print(cat[0])"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["cat.plot(projection=\"ortho\");"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["# Once again they can be written with the write() function.\n", "cat.write(\"temp_quake.xml\", format=\"quakeml\")"]}, {"cell_type": "markdown", "metadata": {}, "source": ["To show off some more things, I added a file containing all events from 2014 in the GCMT catalog."]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["import obspy\n", "\n", "cat = obspy.read_events(\"data/2014.ndk\")\n", "\n", "print(cat)"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["cat.plot();"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["cat.filter(\"depth > 100000\", \"magnitude > 7\")"]}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}}, "nbformat": 4, "nbformat_minor": 2}