Search not giving results we expect

Hi,

When we search for a specific term (is-oil) in our database, we get no results. However, when we search for "shell" we do get results, and in the (preview) screen of the CV we then also see the term "IS-Oil". Is this because of the hyphen? Is there a workaround, or is this something on the back-end? At the moment we are almost unable to search, as we do a lot of searches on these kinds of terms.

 
The poster of the following message is an official representative of CATS.

Danny,

Andrew would be more qualified than I am to answer this question, but I know it has to do with the new search we implemented, which is wildcard (*) based. Within this search, the word "IS" is treated as a stop word. If Andrew can expand on this, great, but I think that is the main reason why this particular search is having problems for you.

 

I was noticing some interesting results too. From what I can tell, the hyphen (-) is treated the same as NOT.

If I search for Ryan Elenbaum, I find myself. If I search for Ryan - Elenbaum or Ryan NOT Elenbaum, I get all the other Ryans but not myself.

 

Any updates on a workaround for searching on these specific terms?

 

Interesting to see I've just acquired another name :)

 
The poster of the following message is an official representative of CATS.

I'm not sure there is a simple workaround for this. I will have to defer comment to Andrew.

 
The poster of the following message is an official representative of CATS.

There are a couple of problems with the search that you referenced, "IS-Oil".

To provide keyword search, we need to break all data into words. We do this by treating every character that is not alphanumeric (a-z, 0-9) as a word break and indexing what's in between. We've created exceptions for some characters, such as + and @, because they appear in common searches like email addresses or C++ (a programming language). For this reason we differ from some boolean search engines: we use AND, NOT and OR, and do not use + or -.
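To make the word-breaking idea concrete, here is a minimal Python sketch. It is only an illustration, not the actual CATS code: the function name tokenize and the exact exception set ("+" and "@", taken from the examples above) are assumptions.

```python
import re

# Assumed exception set: characters kept inside words.
# Only "+" and "@" are named in this thread; the real list may differ.
KEEP = "+@"
TOKEN_RE = re.compile("[a-z0-9" + re.escape(KEEP) + "]+")

def tokenize(text):
    """Lowercase the text and return each run of kept characters as a word.

    Every other character (spaces, hyphens, dots, brackets) is a word break.
    """
    return TOKEN_RE.findall(text.lower())

print(tokenize("C++ developer (user@host)"))  # ['c++', 'developer', 'user@host']
print(tokenize("IS-Oil"))                     # ['is', 'oil']
```

Under these assumptions the hyphen in "IS-Oil" acts as a word break, so the term is indexed as two separate words.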

We do not make exceptions for characters like the dot (.) or hyphen (-), because they have numerous meanings. For example, if we treated the dot as part of a word, the last word in any sentence would be indexed with its trailing period, and you'd have to use an asterisk to search for it. Take this sentence:

"I have experience with welding."

A search for "welding" would match no records; you'd have to search for "welding*" or "welding." instead. This is obviously not desired.
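The effect can be reproduced with a small Python sketch. This is a simplified illustration under assumed names (index_words and its keep parameter are hypothetical), not the real indexer:

```python
import re

def index_words(text, keep=""):
    """Split text into lowercase words; characters in `keep` stay inside words."""
    pattern = "[a-z0-9" + re.escape(keep) + "]+"
    return set(re.findall(pattern, text.lower()))

sentence = "I have experience with welding."

# Dot treated as a word break: "welding" is indexed on its own.
print("welding" in index_words(sentence))            # True

# Dot kept as a word character: only "welding." is indexed.
print("welding" in index_words(sentence, keep="."))  # False
```

With the dot kept, an exact search for "welding" finds nothing, which is the problem described above.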

Similarly, the hyphen is used quite frequently to start bullet lists, to indicate ranges, for exclusion, etc., as these examples show:

"Experience:"
"-Welding"
"-Soldering"
"-Pipe Fitting"

"Worked March-May 2009"
"Forklift operator-9 years"

If we indexed the hyphen as part of a word, a number of searches against the examples above would fail. A search for "operator", for instance, would not match "operator-9".

So your first problem is that the hyphen is stripped when CATS processes your search, so searching for "IS-Oil" is effectively the same as searching for "Is Oil".

Secondly, our search includes a set of stopwords, which could also be described as plain words, like "is", "a", "the", etc. These are words that appear very frequently and hold little meaning. So your search for "Is Oil" would be further reduced to just "Oil".
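Putting the two steps together, the reduction of this query can be sketched in Python as follows. Again, this is only an illustration: the stopword list here contains just the three words named above, and normalize_query is a hypothetical name, not part of CATS.

```python
import re

# Assumed stopword list; only "is", "a" and "the" are named in this thread.
STOPWORDS = {"is", "a", "the"}

def normalize_query(query):
    """Break the query on non-alphanumeric characters, then drop stopwords."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    return [word for word in words if word not in STOPWORDS]

print(normalize_query("IS-Oil"))  # ['oil']
print(normalize_query("shell"))   # ['shell']
```

In this sketch "IS-Oil" ends up as the single word "oil", while "shell" passes through unchanged.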

We try to employ the latest in search technologies (our engine is actually used by some pretty big sites, Craigslist for example). We're constantly tweaking it and polishing it for our particular domain. I'd say it's right where it needs to be for 95% of our users' searches; the particular search issue you're having falls into the other 5%. We'll do our best to close that margin even further as we go forward, but in the meantime certain searches like this one, which sit on the periphery, just aren't going to be possible.

