
How to install thredds_crawler

  1. Download and install ActivePython
  2. Open Command Prompt
  3. Type pypm install thredds-crawler
Python 2.7 | Python 3.2 | Python 3.3

  • Windows (32-bit): 0.5 available
  • Windows (64-bit): 0.5 available
  • Mac OS X (10.5+): 0.5 available
  • Linux (32-bit): 0.5 available
  • Linux (64-bit): 0.5 available
 
Author
License
GPLv3
Dependencies
Latest release
version 0.5 on Aug 9th, 2013

A simple crawler/parser for THREDDS catalogs

## Usage

### Select

You can select datasets based on their THREDDS ID using the 'select' parameter. Python regex is supported.

```python
from thredds_crawler.crawl import Crawl

# Keep only the datasets whose THREDDS ID matches ".*-Agg"
c = Crawl("http://tds.maracoos.org/thredds/MODIS.xml", select=[".*-Agg"])
print(c.datasets)
# [
#   <LeafDataset id: MODIS-Agg, name: MODIS-Complete Aggregation, services: ['OPENDAP', 'ISO']>,
#   <LeafDataset id: MODIS-2009-Agg, name: MODIS-2009 Aggregation, services: ['OPENDAP', 'ISO']>,
#   <LeafDataset id: MODIS-2010-Agg, name: MODIS-2010 Aggregation, services: ['OPENDAP', 'ISO']>,
#   <LeafDataset id: MODIS-2011-Agg, name: MODIS-2011 Aggregation, services: ['OPENDAP', 'ISO']>,
#   <LeafDataset id: MODIS-2012-Agg, name: MODIS-2012 Aggregation, services: ['OPENDAP', 'ISO']>,
#   <LeafDataset id: MODIS-2013-Agg, name: MODIS-2013 Aggregation, services: ['OPENDAP', 'ISO']>,
#   <LeafDataset id: MODIS-One-Agg, name: 1-Day-Aggregation, services: ['OPENDAP', 'ISO']>,
#   <LeafDataset id: MODIS-Three-Agg, name: 3-Day-Aggregation, services: ['OPENDAP', 'ISO']>,
#   <LeafDataset id: MODIS-Seven-Agg, name: 7-Day-Aggregation, services: ['OPENDAP', 'ISO']>
# ]
```
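To check what a pattern such as `.*-Agg` would pick up before pointing the crawler at a live server, the matching can be tried offline with the standard `re` module. This is only a sketch: `select_ids` is a hypothetical helper, `MODIS-2013-Daily` is a made-up ID added to show a non-match, and the use of `re.match` (anchored at the start of the ID) is an assumption about how the crawler applies the patterns.

```python
import re


def select_ids(ids, patterns):
    """Return the IDs matched by at least one pattern (hypothetical helper)."""
    compiled = [re.compile(p) for p in patterns]
    # Assumption: each pattern is applied from the start of the ID (re.match).
    return [i for i in ids if any(c.match(i) for c in compiled)]


# Three IDs from the catalog output above, plus one made-up ID.
ids = ["MODIS-Agg", "MODIS-2009-Agg", "MODIS-One-Agg", "MODIS-2013-Daily"]
selected = select_ids(ids, [".*-Agg"])
print(selected)
# ['MODIS-Agg', 'MODIS-2009-Agg', 'MODIS-One-Agg']
```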

### Skip

You can skip datasets based on their name and catalogRefs based on their xlink:title. By default, the crawler uses four regular expressions to skip lists of thousands upon thousands of individual files that are part of aggregations or FMRCs:

  • .*files/
  • .*Individual Files.*
  • .*File_Access.*
  • .*Forecast Model Run.*

By setting the skip parameter to anything other than a superset of the default you run the risk of having some angry system admins after you.

```python
# Skipping everything!
from thredds_crawler.crawl import Crawl

c = Crawl("http://tds.maracoos.org/thredds/MODIS.xml", skip=[".*"])
assert len(c.datasets) == 0
```
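One way to stay on the safe side of the warning above is to build the `skip` list by extending the four defaults instead of replacing them. The sketch below copies the default patterns from the list above into a local constant; `DEFAULT_SKIPS`, `is_skipped`, and the extra `.*Climatology.*` pattern are hypothetical names chosen for illustration, and matching with `re.match` is an assumption about the crawler's behaviour.

```python
import re

# The four default skip patterns listed above, copied by hand.
DEFAULT_SKIPS = [
    ".*files/",
    ".*Individual Files.*",
    ".*File_Access.*",
    ".*Forecast Model Run.*",
]

# A superset of the defaults: everything above plus one hypothetical pattern.
my_skips = DEFAULT_SKIPS + [".*Climatology.*"]


def is_skipped(title, patterns):
    """Return True if any pattern matches the title (hypothetical helper)."""
    return any(re.match(p, title) for p in patterns)


# Passing the list to the crawler needs the package and network access:
# from thredds_crawler.crawl import Crawl
# c = Crawl("http://tds.maracoos.org/thredds/MODIS.xml", skip=my_skips)
```

Because `my_skips` still contains every default, the large per-file listings stay excluded while the extra pattern is applied on top.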

## Dataset

You can get some basic information about a LeafDataset, including the services available.

```python
from thredds_crawler.crawl import Crawl

c = Crawl("http://tds.maracoos.org/thredds/MODIS.xml", select=[".*-Agg"])
dataset = c.datasets[0]
print(dataset.id)
# MODIS-Agg
print(dataset.name)
# MODIS-Complete Aggregation
print(dataset.services)
# [
#   {
#     'url': 'http://tds.maracoos.org/thredds/dodsC/MODIS-Agg.nc',
#     'name': 'odap',
#     'service': 'OPENDAP'
#   },
#   {
#     'url': 'http://tds.maracoos.org/thredds/iso/MODIS-Agg.nc',
#     'name': 'iso',
#     'service': 'ISO'
#   }
# ]
```

If you have a list of datasets you can easily return all endpoints of a certain type:

```python
from thredds_crawler.crawl import Crawl

c = Crawl("http://tds.maracoos.org/thredds/MODIS.xml", select=[".*-Agg"])
# Collect the URL of every OPENDAP service across all selected datasets
urls = [s.get("url") for d in c.datasets for s in d.services if s.get("service").lower() == "opendap"]
print(urls)
# [
#   'http://tds.maracoos.org/thredds/dodsC/MODIS-Agg.nc',
#   'http://tds.maracoos.org/thredds/dodsC/MODIS-2009-Agg.nc',
#   'http://tds.maracoos.org/thredds/dodsC/MODIS-2010-Agg.nc',
#   'http://tds.maracoos.org/thredds/dodsC/MODIS-2011-Agg.nc',
#   'http://tds.maracoos.org/thredds/dodsC/MODIS-2012-Agg.nc',
#   'http://tds.maracoos.org/thredds/dodsC/MODIS-2013-Agg.nc',
#   'http://tds.maracoos.org/thredds/dodsC/MODIS-One-Agg.nc',
#   'http://tds.maracoos.org/thredds/dodsC/MODIS-Three-Agg.nc',
#   'http://tds.maracoos.org/thredds/dodsC/MODIS-Seven-Agg.nc'
# ]
```
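The same filtering can be wrapped in a small function so that the service type becomes a parameter. This is a sketch that runs without network access: `FakeDataset` is a hypothetical stand-in for a `LeafDataset` (only the `services` list of dicts shown above is assumed), and `endpoints` is not part of the package.

```python
from collections import namedtuple

# Hypothetical stand-in for LeafDataset; only .id and .services are modelled.
FakeDataset = namedtuple("FakeDataset", ["id", "services"])


def endpoints(datasets, service_type):
    """Return the URLs of all services of the given type (case-insensitive)."""
    wanted = service_type.lower()
    return [
        s.get("url")
        for d in datasets
        for s in d.services
        if s.get("service", "").lower() == wanted
    ]


datasets = [
    FakeDataset(
        id="MODIS-Agg",
        services=[
            {"url": "http://tds.maracoos.org/thredds/dodsC/MODIS-Agg.nc",
             "name": "odap", "service": "OPENDAP"},
            {"url": "http://tds.maracoos.org/thredds/iso/MODIS-Agg.nc",
             "name": "iso", "service": "ISO"},
        ],
    )
]

iso_urls = endpoints(datasets, "ISO")
opendap_urls = endpoints(datasets, "opendap")
```

With a real crawl, `endpoints(c.datasets, "opendap")` should give the same list as the comprehension above, provided the dataset objects expose `services` in the form shown.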

## Metadata

The entire THREDDS catalog metadata record is saved along with the dataset object. It is an etree Element object, ready for you to pull information out of. See the [THREDDS metadata spec](http://www.unidata.ucar.edu/projects/THREDDS/tech/catalog/v1.0.2/InvCatalogSpec.html#metadata).

```python
from thredds_crawler.crawl import Crawl

c = Crawl("http://tds.maracoos.org/thredds/MODIS.xml", select=[".*-Agg"])
dataset = c.datasets[0]
# Tags in the metadata record are qualified with the THREDDS catalog namespace
print(dataset.metadata.find("{http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0}documentation").text)
# Ocean Color data are provided as a service to the broader community, and can be
# influenced by sensor degradation and or algorithm changes. We make efforts to keep
# this dataset updated and calibrated. The products in these files are experimental.
# Aggregations are simple means of available data over the specified time frame.
# Use at your own discretion.
```
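Since every lookup needs the full namespace in braces, a small helper can keep the calls short and avoid an `AttributeError` when an element is missing. The sketch below uses only the standard `xml.etree.ElementTree` module and a made-up metadata snippet; `child_text` and the sample XML are hypothetical, and the real record returned by the crawler may contain different elements.

```python
import xml.etree.ElementTree as ET

NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"

# Made-up metadata record for illustration; not taken from a real catalog.
sample = (
    '<metadata xmlns="%s">'
    '<documentation type="summary">Example summary text.</documentation>'
    '<creator><name>Example Org</name></creator>'
    '</metadata>' % NS
)
metadata = ET.fromstring(sample)


def child_text(element, tag):
    """Return the text of the first direct child `tag` in the THREDDS
    namespace, or None if there is no such child."""
    found = element.find("{%s}%s" % (NS, tag))
    return found.text if found is not None else None


summary = child_text(metadata, "documentation")
missing = child_text(metadata, "keyword")  # no such child in the sample
# Nested elements need the namespace on every step of the path.
creator_name = metadata.find("{%s}creator/{%s}name" % (NS, NS)).text
```

With a real crawl, `child_text(dataset.metadata, "documentation")` would stand in for the longer `find(...)` call above.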

## Known Issues

  • Will not handle catalogs that reference themselves


Last updated Aug 9th, 2013
