Search Suggestions

  • Thread starter Thread starter Jonathan Wood
  • Start date Start date
J

Jonathan Wood

Greetings,

I want to add search to a site I'm building in the simplest way possible.

I know I could use Google. However, it's really database content that I want
to search. The actual URL could change that displays a particular database
record.

I have a table with fields that include TITLE and DESCRIPTION. I'd like to
search both columns for the word(s) entered by the user. I don't know if MS
SQL supports "LIKE %term%" syntax or how well that works. I think full-text
search is included with MS SQL 2005 but don't know how well it works, or how
appropriate it is to search two different columns, particularly on a small
site like I'm developing.

I'd appreciate any suggestions.

Thanks.

Jonathan
 
Building a good search for your site is a very challenging project.
Here are some tips from my experience. I had to build search for my
e-commerce site http://www.mspiercing.com

Biggest problem is misspells. People often misspell words and if you do not
account for it they end up with empty results and quit the site thinking you
do not have what you looking for. There is a free NetSpell component you can
download from http://sourceforge.net/projects/netspell/ . I end up modifying
the code to suit my needs. But the major part is ok there.

Then I played with Free Text but my conclusion it's not suitable for phrase
searching. So i end up writing my own. Here are the steps.

1. Create dictionary of dead words (such as 'the', 'in'....). That list
depends on your site. For example my site has a dead word 'ring'. Almost
anything there can be called 'ring'
2. Create substitution dictionary. For example on my site people look for
needle or needles. Essentially same product must come up in a search. Hence
I do the substitution on the fly. Mostly plural to singular form. But not
always. For example on my site people might look for 'belly ring' or 'navel
ring'. Hence 'Navel' = 'Belly' in my substitution dictionary.
3. Create dictionary of words that is used on a site. Simply by scanning
your Title and Description fields. Pass it through 'Dead Words' and
'Substitution' dictionaries and create a table (WordId, Word)
gold = 1
plated = 2
....

4. Create 'Search Helper' table. Sometimes it's called Markov's chain. You
do not need PHD to understand what it is.
I choose to use chain with length 3.
(ItemId, WordId1, WordId2, WordId3, Weight)

So for example ItemId: 25, Title: "Gold plated belly button ring with gem"
becomes following

gold = 1
plated = 2
belly = 3
gem = 4
(ring, with - dead words)

Now Markov's chaing with lenght 5 for this item with weight = 1.01

25,1,2,3,1.01
25,2,3,4,1.01
25,3,4,-1,1.01
25,4,-1,-1,1.01

I assign higher weight for keywords that are in Title that to those that are
in Description.
So idea is to bring up those where 3 words match together higher than 3
words match separately.

So if someone looking for 'Gold plated ring' item 25 will be ranked first.
Cause 'Gold plated' go together.
If someone looking for 'Gold belly ring' the ones that have a title 'Gold
belly ring' will be ranked first. And 'Gold plated' will be bellow them.

So if someone is looking for 'Gold belly ring' you issue following SQL
statements (something like this, i did not check it for SQL correctness)
"Gold Belly ring" = 1,3
SELECT ItemId FROM (
SELECT ItemId, 10*(Weight) FROM tblSearchHelper WHERE WordId1 = 1 AND
Word2=3
UNION
SELECT ItemId, Weigth FROM tblSearchHelper WHERE WordId1 = 1 OR Word2 = 3
) tblTemp
ORDER BY SUM(Weight) DESC


Then items will be sorted by weight in descending order. We multiplied
weight by 10 in first SELECT so the items that have 'gold ring' together
outweighed those that do not have them together.

Obviously it works for my type of site where i control the content.

George.
 
Thanks for the detailed explanation.

I'm actually going to save your email for if I ever get around to doing
something like. However, this type of effort cannot be justified on my
current project. For it, I'm leaning heavily to using LIKE.

I did think about doing what you've done some time back but the client
finally decided to purchase DT Search. So some of the details in your
description are of interest to me, even if I can't apply them at this time.

Thanks again.

Jonathan
 
Back
Top