How to install serpextract
- Download and install ActivePython
- Open Command Prompt
- Run pypm install serpextract
serpextract provides easy extraction of keywords from search engine results pages (SERPs).
Latest release on PyPI:
$ pip install serpextract
Or the latest development version (not recommended):
$ pip install -e git+https://github.com/Parsely/serpextract.git#egg=serpextract
On the command line, serpextract returns the engine name and keyword components, separated by a comma and enclosed in quotes:
$ serpextract "http://www.google.ca/url?sa=t&rct=j&q=ars%20technica"
"Google","ars technica"
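Because the output is a line of quoted, comma-separated fields, a script consuming it can read it back with a CSV parser. The following is a minimal sketch, assuming the output has exactly the shape shown in the example above; `parse_cli_line` is a hypothetical helper name and is not part of serpextract.

```python
import csv
import io


def parse_cli_line(line):
    """Split one line of quoted, comma-separated output into its fields.

    Hypothetical helper: it only assumes the "engine","keyword" shape
    shown in the example above.
    """
    reader = csv.reader(io.StringIO(line))
    return next(reader)


engine, keyword = parse_cli_line('"Google","ars technica"')
print(engine)
print(keyword)
```

Using the csv module rather than splitting on commas keeps keywords that themselves contain a comma in one piece.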
You can also print out a list of all the SearchEngineParsers currently available in your local cache via:
$ serpextract -l
The list of search engine parsers that Piwik, and therefore serpextract, uses is far from exhaustive. If you want serpextract to try to guess whether a given referring URL is a SERP, you can pass use_naive_method=True to serpextract.is_serp or serpextract.extract. By default, the naive method is disabled.
Naive search engine detection tries to find an instance of r'\.?search\.' in the netloc of a URL. If found, serpextract will then try to find a keyword in the query portion of the URL by looking for the following params in order:
_naive_params = ('q', 'query', 'k', 'keyword', 'term',)
If one of these is found, a keyword is extracted and an ExtractResult is constructed as:
ExtractResult(domain, keyword, None) # No parser, but engine name and keyword
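The naive method described above can be sketched with the standard library alone. This is an illustrative approximation, not serpextract's actual implementation: `naive_extract` is a hypothetical name, it returns a plain tuple rather than an ExtractResult, and it uses the URL's netloc as the domain, which may differ from what the library reports.

```python
import re
from urllib.parse import parse_qs, urlparse

# Pattern and parameter order as quoted in the text above.
_naive_re = re.compile(r'\.?search\.')
_naive_params = ('q', 'query', 'k', 'keyword', 'term',)


def naive_extract(url):
    """Return (netloc, keyword) if the URL looks like a SERP, else None.

    Hypothetical sketch: the netloc stands in for the engine name.
    """
    parts = urlparse(url)
    if not _naive_re.search(parts.netloc):
        return None
    query = parse_qs(parts.query)
    # Try each candidate parameter in order; the first one present wins.
    for param in _naive_params:
        values = query.get(param)
        if values:
            return parts.netloc, values[0]
    return None


print(naive_extract('http://search.example.com/?q=ars%20technica'))
print(naive_extract('http://www.example.com/?q=ars%20technica'))
```

The second call prints None because the netloc contains no "search." component, so the query string is never inspected.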
If you have a custom search engine that you'd like to track and that is not currently supported by Piwik/serpextract, you can create your own instance of serpextract.SearchEngineParser and either pass it explicitly to serpextract.is_serp or serpextract.extract, or add it to the internal list of parsers.
You can also permanently add a custom parser to the internal list of parsers that serpextract maintains so that you no longer have to explicitly pass a parser object to serpextract.is_serp or serpextract.extract.
There are some basic tests for popular search engines, but more are required:
$ pip install -r requirements.txt
$ nosetests
Internally, this module caches an OrderedDict representation of Piwik's list of search engines, stored in serpextract/search_engines.pickle. The list is not expected to change often, so the module ships with a cached version.
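To illustrate the caching idea, here is a minimal sketch of writing an OrderedDict to a pickle file and loading it back. The keys, values, and helper names (`save_cache`, `load_cache`) are hypothetical stand-ins; the real cache's structure is not described here and may differ.

```python
import os
import pickle
import tempfile
from collections import OrderedDict

# Hypothetical stand-in data; the real cache's keys and values may differ.
engines = OrderedDict([
    ('google.com', {'name': 'Google', 'params': ['q']}),
    ('search.example.com', {'name': 'Example', 'params': ['query']}),
])


def save_cache(data, path):
    """Serialize the mapping to disk."""
    with open(path, 'wb') as fh:
        pickle.dump(data, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_cache(path):
    """Load the mapping back; only unpickle files you trust."""
    with open(path, 'rb') as fh:
        return pickle.load(fh)


# Write to a temporary directory so the sketch does not touch a real cache.
cache_path = os.path.join(tempfile.mkdtemp(), 'search_engines.pickle')
save_cache(engines, cache_path)
restored = load_cache(cache_path)
print(list(restored))
```

An OrderedDict keeps its insertion order through the round trip, which matters if parsers are meant to be tried in a fixed order.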