Consolidating Third-Party Knowledge Base Sources

This is a follow-up to, and continuation of, last week’s Knowledge Base Content post. In writing it I veered some way from my chosen topic, hence it’s posted under Technology, but it’s something I think is worth writing about – combining both local and specific internet search results in a single relevance-based list.

Third-party content

Third-party content is often taken directly from, or derived from, trouble-ticket data (I discussed trouble-ticket [Incident] data last week). I’m describing it here because it’s a type of content you don’t control. Typically hardware and software vendors publish this kind of content for the benefit of their customers. Sometimes the content is free; other times it’s restricted and paid for. (The Serio knowledge base is found here.)

Third-party content presents some interesting challenges, the most important of which is integration when the content exists on an Internet website. I’m going to discuss these challenges here, as this is something I’ve been asked about before.

In an ideal world you’d set up your own knowledge base and simply ‘add in’ third-party knowledge base sources, so that when you run a search, you search all of the following at the same time:

  • your local content (your own trouble-ticket data and documents),
  • the knowledge base directory at vendor A,
  • the knowledge base directory at vendor B, and so on,

and have a single, consolidated results list based on relevance.

It sounds great, but is very difficult (impossible?) to implement at this time. In a nutshell, if the content exists on another site you can either:

  • Spider the content yourself, like a search engine does. However, webmasters have a habit of banning spiders they don’t recognise, or which operate without permission on their websites, so as to conserve bandwidth for real users.
  • Ask the remote website to query its own content for you – but this presents problems when constructing a single results list based on relevance, because your own knowledge base software cannot see the actual document (i.e., does a remote document rank above or below a local, private document?).
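To make the ranking problem in the second option a little more concrete, here is a minimal sketch of one way to interleave result lists when all you have from each source is the order of its results. It uses reciprocal rank fusion, which is my choice for illustration and not something any vendor or Serio is known to offer; the document IDs, the function name and the constant k=60 are all assumptions. Note that it only combines positions, so it sidesteps rather than solves the question of whether a remote document is really more relevant than a local one.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Merge several best-first lists of document IDs into one list.

    Only the position of each document in its own list is used, so the
    sources' relevance scores never need to be comparable (or even known).
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            # A document near the top of any list earns a larger share.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so output is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for one query, each list already ordered best-first.
local = ["local:ticket-1042", "local:doc-printer-setup", "local:ticket-0977"]
vendor_a = ["vendorA:kb-311", "vendorA:kb-208"]
vendor_b = ["vendorB:article-77"]

merged = reciprocal_rank_fusion([local, vendor_a, vendor_b])
for position, doc_id in enumerate(merged, start=1):
    print(position, doc_id)
```

With these inputs the top hit from every source lands ahead of any source's second hit, which is fair to each source but says nothing about which top hit is actually the best answer. That gap is exactly the difficulty described above.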

So, like I said, it’s technically demanding – even leaving aside the ‘terms of service’ issues that arise when using the content of another organisation.
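On the spidering option, one small courtesy a home-grown spider can extend is to check the site's robots.txt before fetching anything. The sketch below uses Python's standard urllib.robotparser module; the robots.txt content, the 'MyKbSpider' user agent and the vendor-a.example URLs are made up for illustration. Passing this check is not the same as having permission under a site's terms of service, so treat it as a bare minimum only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, as a vendor site might publish it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /kb/
"""


def allowed_to_fetch(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical spider name and URLs.
for url in (
    "https://vendor-a.example/kb/article-1",
    "https://vendor-a.example/private/notes",
):
    print(url, allowed_to_fetch(ROBOTS_TXT, "MyKbSpider", url))
```

In a real spider you would load the live file with set_url() and read() instead of a hard-coded string, and you would still need to throttle requests so as not to eat the bandwidth meant for real users.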

Tip: If you are a Serio user, there is something you can do – add your best external knowledge base sources to your ‘Useful Links’ chapter (see ‘Useful Links’ in the HowTo guide for more information).

Probably the best hope for solving the problem I’m describing lies with major search engines such as Yahoo! and Google. What they need to do is allow searchers to set up customised pages that restrict searching to particular points within sites (specifically where the KB content lives, not just a particular URL) and let the search engine perform a consolidated search for you based on your choices.

However, for this to solve the problem, search engines would need to see your own local (trouble-ticket) content, and would need to understand that it is private and not to be served to other searchers. The search engines could fund this either by a fee or by advertising, as now.

I'll come back to the 'proper' thread of these posts later in the week.