


How to install collective.portlet.similarcontent

  1. Download and install ActivePython
  2. Open Command Prompt
  3. Type pypm install collective.portlet.similarcontent
Platform            Available versions
Windows (32-bit)    1.5, 1.4, 1.3
Windows (64-bit)    1.5, 1.4, 1.3
Mac OS X (10.5+)    1.5, 1.4, 1.3
Linux (32-bit)      1.5, 1.4, 1.3
Linux (64-bit)      1.5, 1.4, 1.3
 
Author
License
ZPL 2.1
Dependencies
Latest release
version 1.5 on Dec 13th, 2011

Introduction

A Plone portlet that uses the catalog internals to find content 'similar' to the page you are looking at.

This portlet uses some deep, dark data structures within ZCatalog and ZCTextIndex, so it could break in the future if those structures change. Then again, they have been the same for the past 8 years or so ;)

This portlet also runs in time linear in the number of documents on your site, so it could well slow things down. That said, I've tried to make it pretty efficient.

How it Works

In a nutshell, this portlet compares the text content of an object with that of all other objects on the site to find objects with similar content. The steps are as follows:

  1. Find the path of this document
  2. Look up the record_id (docid) of this path in the catalog
  3. Look in the SearchableText index to find all word ids (wids) in this document
  4. Work out the top 20 most 'important' words in this document [*]
  5. For each of the top 20 words, find all documents containing any of those words
  6. Use a vector space model to measure similarity of each candidate document to our top 20 words
  7. Return the top 10 most similar documents.
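The seven steps above can be sketched in plain Python. This is a minimal, self-contained illustration under stated assumptions, not the portlet's code: the catalog and ZCTextIndex internals are replaced by two hypothetical in-memory structures, `paths` (path to docid) and `doc_words` (docid to a list of wids, repeats allowed), and the "vector space model" is assumed to be cosine similarity over TF*IDF weights restricted to the top words. The function names `build_index` and `similar_documents` are likewise hypothetical.

```python
import math
from collections import Counter

# Hypothetical stand-ins for the catalog internals (not Plone APIs):
#   paths:     document path -> docid
#   doc_words: docid -> list of word ids (wids), repeats allowed


def build_index(doc_words):
    """Inverted index: wid -> set of docids that contain it."""
    index = {}
    for docid, words in doc_words.items():
        for wid in set(words):
            index.setdefault(wid, set()).add(docid)
    return index


def similar_documents(paths, doc_words, path, n_words=20, n_results=10):
    index = build_index(doc_words)
    n_docs = len(doc_words)
    idf = {wid: math.log(n_docs / len(docs)) for wid, docs in index.items()}

    docid = paths[path]                     # Steps 1-2: path -> docid
    tf = Counter(doc_words[docid])          # Step 3: wids in this document

    # Step 4: keep the n_words words with the highest TF*IDF weight.
    ranked = sorted(tf, key=lambda w: tf[w] * idf[w], reverse=True)
    query = {w: tf[w] * idf[w] for w in ranked[:n_words]}

    # Step 5: every other document containing at least one query word.
    candidates = set().union(*(index[w] for w in query)) - {docid}

    # Step 6: cosine similarity between the query vector and each
    # candidate's TF*IDF vector, restricted to the query words.
    q_norm = math.sqrt(sum(v * v for v in query.values()))
    scores = {}
    for cand in candidates:
        ctf = Counter(doc_words[cand])
        vec = {w: ctf[w] * idf[w] for w in query}
        c_norm = math.sqrt(sum(v * v for v in vec.values()))
        if q_norm and c_norm:
            dot = sum(query[w] * vec[w] for w in query)
            scores[cand] = dot / (q_norm * c_norm)

    # Step 7: the n_results most similar documents, best first.
    return sorted(scores, key=scores.get, reverse=True)[:n_results]


doc_words = {
    1: ["plone", "portlet", "catalog", "catalog"],
    2: ["plone", "portlet", "catalog"],
    3: ["plone", "recipe", "cake"],
    4: ["garden", "tomato"],
}
paths = {"/site/a": 1, "/site/b": 2, "/site/c": 3, "/site/d": 4}
print(similar_documents(paths, doc_words, "/site/a"))  # → [2, 3]
```

Rebuilding the inverted index on every call keeps the sketch self-contained but costs a pass over every document; the portlet instead reads the existing SearchableText index (step 3). Document 4 shares no words with document 1, so it never becomes a candidate.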

[*] We work out the top 20 words using a TF*IDF algorithm (the same one used in ZCTextIndex.OkapiIndex) to find the words that appear disproportionately often in this document compared with documents in general.
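The word selection in the footnote can be illustrated with textbook TF*IDF weighting, tf × log(N / df), where N is the number of documents and df is the number of documents containing the word. This is an assumption for illustration: the exact formula in ZCTextIndex.OkapiIndex may differ, and this sketch does not try to reproduce it. The function name `top_words` is hypothetical.

```python
import math
from collections import Counter


def top_words(doc, corpus, k=20):
    """Return the k words of `doc` with the highest TF*IDF weight.

    doc:    list of words in the current document.
    corpus: list of documents (each a list of words) that includes doc,
            so every word of doc has a document frequency of at least 1.
    Uses the textbook weighting tf * log(N / df), not OkapiIndex's formula.
    """
    n_docs = len(corpus)
    df = Counter()
    for d in corpus:
        df.update(set(d))          # count each word once per document
    tf = Counter(doc)
    weights = {w: tf[w] * math.log(n_docs / df[w]) for w in tf}
    return sorted(weights, key=weights.get, reverse=True)[:k]


corpus = [
    ["the", "portlet", "portlet", "catalog"],
    ["the", "catalog", "index"],
    ["the", "garden"],
]
print(top_words(corpus[0], corpus, k=2))  # → ['portlet', 'catalog']
```

A word that appears in every document, like "the" here, gets weight log(1) = 0, so common words fall to the bottom of the ranking regardless of how often they occur.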

TODO

Add some caching ;)

Changelog

1.5 - 2011-12-12
  • Fixed portlet edit permission used [vangheem]
1.4
  • Added checks for security and language on results [Alessio Siniscalchi]
1.3
  • Fixed broken 1.2 release egg
1.2
  • Added ability to only search certain types [matth]
  • Do not display portlet if no similar items found [matth]
1.1
  • Fixed a bug in the important-word selection code [matth]
1.0
  • Initial release


Last updated Dec 13th, 2011
