FAST Search for SharePoint 2010 – Indexing Database Content – Guidance

If you are doing any work with FAST Search for SharePoint 2010 and need to index database content (e.g. SQL tables), as a general rule of thumb you should use BCS for this.

FS4SP does have a JDBC connector that is quite capable of indexing database content, but don’t use it just because you have FAST and think you need to.

The reason is simple – you will have a much easier migration to SharePoint 2013, because the JDBC connector is no longer included there.

SharePoint 2013 vs. FAST Search for SharePoint 2010

OK, so the key differences between SP2013 and FAST Search for SharePoint 2010 are officially up on TechNet.

In summary, a number of features have gone.  Some people may be upset, though overall I think the majority of changes make sense because they simplify the platform.  FAST Search for SharePoint (FS4SP) included a number of features that were baked in from FAST ESP (the standalone, pre-Microsoft product) and became redundant when SharePoint was added to the mix.

So, what are the key differences?

  • FAST Search Database Connector: Unsupported.  The FS4SP DB connector was built in when FS4SP was rebuilt from FAST ESP.  Even with the release of FS4SP, Microsoft’s recommendation was to use BCS wherever possible (instead of the FAST connectors), primarily because they would be deprecated. In summary: you should be using BCS (or a third-party connector) to index DB content.
  • FAST Search Lotus Notes connector: Unsupported. Use BCS or, for enhanced security handling and if you have the budget, consider BA Insight.
  • FAST Search Web Crawler: Unsupported. The SP2013 web crawler provides similar capabilities to the FAST Search web crawler.
  • Find Similar Results: Unsupported.  It was hardly used.
  • FQL Operators:
    • ANY: Now has the same effect as OR.  Use WORDS instead – e.g. WORDS(TV, Television)
    • RANK: Use XRANK with updated syntax
    • XRANK: Updated syntax
  • Approach for Querying URLs: FS4SP provided the ability to query URLs using these operators: STARTS-WITH, ENDS-WITH and PHRASE. For performance reasons (at query time), this is no longer supported.  Instead you must query the full URL or the leading part of the URL, or add managed properties yourself to search any other part of the URL.
  • Search Scope Filters: FS4SP Scopes need to be converted to SP2013 result sources
  • Anti-Phrasing: Unsupported.  FS4SP had the ability to filter out common words/phrases – e.g. “how can I”, “what is”, “who is”. SP2013 keeps these phrases in the search query. The workaround is either to train your users not to enter redundant phrases (e.g. “how can I”), or to extend the search query web part to filter the phrases out before the query is submitted to SharePoint.
  • Offensive Content Filtering: Unsupported.  This was not built into SP2013 given the limited usage it had on FS4SP.
  • Substring Search: Unsupported.  This was not used much – primarily in situations where recall (overall number of documents retrieved) was more important than precision (high degree of relevance) – so this is not a big deal for most companies. Turning on substring search also had the downside of bloating the search index.
  • Person and Location Entity Extraction: You need to use your own custom dictionaries.  Typically each business is different and has its own people and locations it cares about. On FS4SP there was actually a lot of tuning necessary to get it working properly, because you would get overlap between People and Location names.
  • Number of Custom Entity Extractors: This is now limited to 12, and for many businesses this won’t be an issue, primarily because on a given Search Centre, for performance and screen real-estate reasons, you will generally want to keep the number of search filters to an absolute maximum of 6 to 10.  You really only want to include the filters that the majority of your end users will actually use and not bloat it with ones that will benefit 1 user in 1000.  The limitation of 12 could be an issue for organisations that are well advanced on their FAST implementation and are using Search for multiple applications that depend on entity extraction.
  • Document Formats: FS4SP supported several hundred file types after enabling the Advanced Filter Pack. However, many of these were legacy file types and investment has not been made in building these into SP2013.  If you have a file type that is not supported then your best option is to look for a third-party iFilter/Connector – e.g. from the iFilterShop or BA Insight.
  • Pipeline Extensibility: This feature allows you to perform dynamic calculations or manipulations of document metadata before it is indexed.  On FS4SP you used to create your code and build it as an exe file and then put it on each FS4SP server that had a Document Processor role.  Now on SP2013 the approach is to use Web Service calls. I haven’t tried this yet on SP2013, though with FS4SP there was a fairly fundamental performance problem with the pipeline extensibility: your extension (the exe) was opened and closed for EVERY single document going through the index. In some cases, due to extending the pipeline, I’d see crawl performance drop from 30 to 40 docs/sec to 10 docs/sec or less! Due to that performance impact, on FS4SP it’s absolutely critical that the exe you write is as optimised as you can make it. I’m looking forward to testing this out on SP2013.
  • Custom XML Processing: Unsupported. This was another feature baked into FAST ESP that then made its way into FS4SP. It provided a way to manipulate XML files as they were going through the index – though it generally wasn’t super easy to configure.  The approach now is to call out to a web service that will process the XML for you.
  • Docpush: Docpush was used to add (mostly) test documents to the index from the command line. This was built into the original FAST ESP product and made its way into FS4SP – though it isn’t really needed now. If you just need to do a quick “is search at least partially working” test on SharePoint 2013 and you don’t have proper content sources to hook up to, you can still do as you would on SP2010 or 2007 – just upload documents to a Document Library on a test site and run a crawl – pretty straightforward.
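
To make the anti-phrasing workaround above a bit more concrete, here is a minimal sketch of the "filter the phrases before you submit the query" idea. It is written in Python purely for illustration; a real search query web part extension would be written against the SharePoint object model, and none of the names below (ANTI_PHRASES, strip_anti_phrases) come from SharePoint or FS4SP. The phrase list is an assumption seeded with the examples from the list above.

```python
import re

# Hypothetical anti-phrase list: the three examples mentioned in the post.
# A real deployment would maintain its own list.
ANTI_PHRASES = ["how can i", "what is", "who is"]


def strip_anti_phrases(query, phrases=ANTI_PHRASES):
    """Remove anti-phrases (case-insensitive, whole words only) from a query."""
    cleaned = query
    # Handle longer phrases first so a short phrase never splits a longer one.
    for phrase in sorted(phrases, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        cleaned = re.sub(pattern, " ", cleaned, flags=re.IGNORECASE)
    # Collapse the whitespace left behind by removed phrases.
    cleaned = " ".join(cleaned.split())
    # If nothing is left, fall back to the original text rather than
    # submitting an empty query.
    return cleaned or query.strip()


if __name__ == "__main__":
    print(strip_anti_phrases("How can I reset my password"))
```

The fallback to the original query when everything is stripped is a design choice for this sketch, not something FS4SP is known to have done.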

FAST Search for SharePoint 2010 – Get All Results Back

On FAST ESP, there used to be a technique to get back all documents in your index – enter the minus character, followed by a term that wouldn’t be in your index.

For example: -394kidfdkadkfl2k2

This would bring back all documents.

In FAST Search for SharePoint 2010, this does not work.

The trick instead is to enter the hash character (#) by itself, without the parentheses, in the search query textbox.