Boolean query syntax

  • Default maximum length: 1,500 characters (this limit can be changed; contact us)
  • Query cannot be empty
  • Operators must be CAPITALIZED
  • NEAR accepts distances from 0 to 99 (e.g. NEAR/3)
  • Operators must always be surrounded by terms (e.g. coffee AND tea AND decaf)
  • Double-check opening and closing quotes and parentheses
  • Query terms cannot contain special characters unless they are enclosed in quotes
  • Special characters are: ` ! @ # $ % ^ ( ) _ - = ~ + [ ] { } | " ' : ; . , < > ? / 1 2 3 4 5 6 7 8 9 0
  • Spaces are special characters
  • Terms containing special characters, and phrases of two or more words, should be enclosed (escaped) in quotes (e.g. "#beautifulflowers", "customer service", "123 Ave Rosemont", "rendez-vous")
  • Queries can contain stopwords, but they must be enclosed in quotes
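The rules above lend themselves to a quick pre-check before a query is submitted. The Python sketch below is a hypothetical client-side helper, not part of the product: the name `precheck_query` and the exact checks are assumptions based only on the rules listed here, and the system's own validation may behave differently.

```python
import re

MAX_LENGTH = 1500  # default limit; it may have been changed for your account
OPERATORS = {"AND", "OR", "NOT", "NEAR", "NOTNEAR", "WITH", "NOTWITH", "EXCLUDE"}


def precheck_query(query: str, max_length: int = MAX_LENGTH) -> list:
    """Return a list of likely problems; an empty list means none were detected."""
    problems = []
    if not query.strip():
        problems.append("query is empty")
    if len(query) > max_length:
        problems.append(f"query is longer than {max_length} characters")
    # Double quotes must come in opening/closing pairs.
    if query.count('"') % 2 != 0:
        problems.append("unbalanced double quotes")
    # NEAR/n and NOTNEAR/n accept distances from 0 to 99.
    for match in re.finditer(r"\b(?:NOT)?NEAR/(\d+)", query):
        if int(match.group(1)) > 99:
            problems.append(f"distance out of range: {match.group(0)}")
    # Outside quotes, a lowercase operator word is treated as a plain term.
    unquoted = re.sub(r'"[^"]*"', " ", query)
    for word in re.findall(r"[A-Za-z]+", unquoted):
        if word.upper() in OPERATORS and word != word.upper():
            problems.append(f"operator not capitalized: {word}")
    return problems
```

For example, `precheck_query("coffee and tea")` flags `and`, while a quoted stopword such as `"not"` is left alone.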

Operators

Note that operators must be capitalized; otherwise they are treated as query terms. Each operator must also be preceded and followed by a query term or query phrase.

OR operator

Inside a query, the OR operator may be used to retrieve documents containing either of two terms.

Example:

  • onions OR cheese will detect "Onions make my eyes water", "My favorite cheese is cheddar", and "I want cheese and onions on my pizza".

AND operator

Inside a query, the AND operator may be used to retrieve documents containing both specified terms.

Example:

  • onions AND cheese will detect "I want cheese and onions on my pizza" and "I like cheese on my onion rings", but not "Onions make my eyes water" or "My favorite cheese is cheddar".
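The difference between OR and AND can be sketched as set membership over the words of a document. This Python example is illustrative only: the function names are hypothetical, it assumes simple word tokenization, and it uses a crude stand-in for stemming (dropping one trailing "s") that is far simpler than the stemming the system actually applies.

```python
import re


def stem(word: str) -> str:
    # Crude stand-in for real stemming: drop a single trailing "s".
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def words_in(text: str) -> set:
    return {stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())}


def match_or(a: str, b: str, text: str) -> bool:
    # OR: at least one of the two terms is present.
    found = words_in(text)
    return stem(a.lower()) in found or stem(b.lower()) in found


def match_and(a: str, b: str, text: str) -> bool:
    # AND: both terms are present somewhere in the document.
    found = words_in(text)
    return stem(a.lower()) in found and stem(b.lower()) in found
```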

NEAR operator

A NEAR operator is effectively an AND operator that lets you control the distance between two terms. By default, onions NEAR cheese means that the term cheese must occur within 10 words of onions. To change the distance, add a number suffix: onions NEAR/50 cheese means that onions must occur within 50 words of cheese. The distance can be between 0 and 99.

Other examples include:

  • (onions OR bananas) NEAR/5 (cheese OR dinner) would tag "The banana split was included with dinner" and "The steak dinner with onions was my favorite." This query will not detect sentences like "The cheese platter on the dinner menu was superb" or "Bananas, strawberries, and ice cream are not a balanced dinner."
  • (onions NEAR/5 cheese) would tag a comment like "Do you want onions on top of your cheese?" but not "Their cheese is my favorite but only on the dish with caramelized onions."
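The NEAR examples above can be reproduced with a small sketch. It assumes that "within N words" means the difference between the two word positions is at most N, and it reuses a crude trailing-"s" stand-in for stemming; both are assumptions, and the name `near` is hypothetical. Each list argument plays the role of an OR group such as (onions OR bananas).

```python
import re


def stem(word: str) -> str:
    # Crude stand-in for real stemming: drop a single trailing "s".
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def tokenize(text: str) -> list:
    return [stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())]


def near(left: list, right: list, text: str, distance: int = 10) -> bool:
    """True if any term in `left` occurs within `distance` words of any term in `right`."""
    words = tokenize(text)
    left_terms = {stem(t.lower()) for t in left}
    right_terms = {stem(t.lower()) for t in right}
    left_pos = [i for i, w in enumerate(words) if w in left_terms]
    right_pos = [i for i, w in enumerate(words) if w in right_terms]
    return any(abs(i - j) <= distance for i in left_pos for j in right_pos)
```

With `distance=5`, "banana" and "dinner" in "The banana split was included with dinner" are exactly 5 positions apart, so the first example matches, while "Bananas" and "dinner" in the last example are 9 apart and do not.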

Do not use the NEAR operator in the following ways:

  • "onions NEAR/10 cheese" – this does nothing
  • onions "NEAR/10" cheese – this does nothing

NOTNEAR operator

NOTNEAR is effectively a NOT operator that lets you control the distance between two terms. By default, onions NOTNEAR cheese means that the term onions cannot occur within 10 words of the term cheese. To change the distance, add a number suffix: onions NOTNEAR/50 cheese means that onions cannot occur within 50 words of cheese. The distance can be between 0 and 99.
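A minimal NOTNEAR sketch follows. The exact semantics are an assumption: here the query matches when the first term occurs and none of its occurrences is within the given distance of the second term. Distance is again taken as the difference in word positions, stemming is approximated by dropping a trailing "s", and the name `notnear` is hypothetical.

```python
import re


def stem(word: str) -> str:
    # Crude stand-in for real stemming: drop a single trailing "s".
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def notnear(left: str, right: str, text: str, distance: int = 10) -> bool:
    """Assumed reading: `left` occurs, and no occurrence of it is within `distance` words of `right`."""
    words = [stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())]
    left_pos = [i for i, w in enumerate(words) if w == stem(left.lower())]
    right_pos = [i for i, w in enumerate(words) if w == stem(right.lower())]
    if not left_pos:
        return False  # the first term must be present at all
    return all(abs(i - j) > distance for i in left_pos for j in right_pos)
```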

WITH operator

A WITH operator requires that the two terms occur within the same sentence. It works like a NEAR operator, except that the match window is the sentence rather than a set number of words.

  • onions WITH cheese means that the term cheese must occur in the same sentence as onions.

NOTWITH operator

A NOTWITH operator requires that the two terms do not occur within the same sentence. It works like a NOTNEAR operator, except that the match window is the sentence rather than a set number of words. onions NOTWITH cheese means that the term cheese must not occur in the same sentence as onions.
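WITH and NOTWITH differ from NEAR and NOTNEAR only in their window, which can be sketched by splitting text into sentences first. This Python example assumes a naive splitter on ".", "!" and "?" (the system's sentence detection is certainly more sophisticated), does no stemming, and treats NOTWITH as "the first term occurs, but never in a sentence with the second"; the function names are hypothetical.

```python
import re


def sentences(text: str) -> list:
    # Naive splitter on ., ! and ?; real sentence detection handles far more cases.
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def words_in(sentence: str) -> set:
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def with_op(a: str, b: str, text: str) -> bool:
    """Both terms occur together in at least one sentence."""
    a, b = a.lower(), b.lower()
    return any(a in ws and b in ws for ws in map(words_in, sentences(text)))


def notwith_op(a: str, b: str, text: str) -> bool:
    """Assumed reading: `a` occurs, but never in the same sentence as `b`."""
    a, b = a.lower(), b.lower()
    per_sentence = [words_in(s) for s in sentences(text)]
    return any(a in ws for ws in per_sentence) and not any(a in ws and b in ws for ws in per_sentence)
```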

NOT operator

The NOT operator excludes any document containing the term that follows it. onions NOT celery returns all documents that mention onions, excluding those that also contain "celery". A query must contain at least one non-excluded term when using the NOT operator.

Example

  • onions NOT celery will detect "I like onions very much" but not "I like onions on my sandwich and celery on the side."

EXCLUDE operator

Two query terms of any type may be joined by an EXCLUDE operator, e.g. York EXCLUDE "New York". Its effect differs from that of the NOT operator: the query returns documents containing the word "York", excluding only those in which every occurrence of "York" is part of "New York".

Consider the following sample text:

I spent the day in York, visiting the magnificent cathedral. Then it was time to head back to London for my flight home to New York.

This text would generate the following results for the provided queries:

  • York NOT "New York": FALSE
  • York EXCLUDE "New York": TRUE

Parentheses

Queries can use parentheses to control the logic of the query, and parentheses may appear in any combination.

Two examples of queries that make good use of parentheses:

  • ((onions OR cheese) AND celery) NOT horrible
  • (onions OR cheese) NEAR (horrible OR disgusting)

Every left parenthesis must have a corresponding right parenthesis. Queries can have nested parentheses up to 10 levels deep.
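Both parenthesis rules can be checked with a single pass that tracks the current nesting depth. In this hypothetical helper, parentheses inside double quotes are assumed to be literal text (a quoted term may contain special characters) and are ignored; the 10-level limit comes from the rule above.

```python
from typing import Optional


def check_parentheses(query: str, max_depth: int = 10) -> Optional[str]:
    """Return a description of the first problem found, or None if the parentheses look fine."""
    depth = deepest = 0
    in_quotes = False
    for ch in query:
        if ch == '"':
            in_quotes = not in_quotes  # parentheses inside a quoted term are treated as text
        elif in_quotes:
            continue
        elif ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return "unmatched right parenthesis"
    if depth > 0:
        return "unmatched left parenthesis"
    if deepest > max_depth:
        return f"nested {deepest} levels deep; the limit is {max_depth}"
    return None
```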

Queries

Terms and Phrases

Single query terms are the simplest query element, consisting of a single word. An operator, or a word that appears in a stopword list, can be used as a query term only if it is enclosed in quotes. A query term cannot contain punctuation or other special characters such as ` ! @ # $ % ^ ( ) _ = ~ + [ ] { } | " ' : ; . , < > ? / - unless it is enclosed in quotes.

Phrases must be enclosed in double quotes. A single word enclosed in quotes is not treated as a phrase search; it is treated like a single word. The case-sensitivity operator (~) must be placed outside the quotes, like so: ~"Tim Cook"

Wildcards

A wildcard character (*) may be used at the end of a single-word query term or within a phrase. It lets the system match every word that starts with the letters before the wildcard. In a phrase, a wildcard works only when it is attached to the last term. For example:

  • excit* would match excite, exciting, excitement, etc.
  • "running fast*" would match "running fast" and "running faster".

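A wildcard query must have a prefix of at least three letters, and the wildcard must come at the end of the term: d* and do* are invalid because their prefixes are too short, and dog*M is invalid because the wildcard is not at the end. Queries like "*" and Commonwealth AND "*" are invalid and achieve nothing.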
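The wildcard rules can be expressed as a validity check plus a prefix match. This sketch is hypothetical: it ignores stemming and case, treats a quoted pattern as a phrase, and allows a wildcard only at the end of the last word with a prefix of at least three letters.

```python
import re


def pattern_words(pattern: str) -> list:
    return pattern.strip('"').lower().split()


def is_valid_wildcard(pattern: str) -> bool:
    words = pattern_words(pattern)
    if not words or any("*" in w for w in words[:-1]):
        return False  # empty, or a wildcard on a word other than the last
    last = words[-1]
    if "*" not in last:
        return True  # no wildcard at all
    prefix = last[:-1]
    return last.endswith("*") and "*" not in prefix and len(prefix) >= 3


def wildcard_matches(pattern: str, text: str) -> bool:
    words = pattern_words(pattern)
    if not words:
        return False
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    n = len(words)
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if window[:-1] != words[:-1]:
            continue  # every word before the last must match exactly
        last = words[-1]
        if last.endswith("*") and window[-1].startswith(last[:-1]):
            return True
        if window[-1] == last:
            return True
    return False
```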

Nested Queries

To reference a query, place a caret (^) before the query name and wrap the caret and the name in parentheses, e.g. (^Dirty). This tells the system to look up the named query and use it inside another query. For example, consider the following queries:

  • Dirty: dirty OR filth* OR disgust* OR nasty
  • Bathroom: bathroom* OR toilet* OR restroom* OR lavator*
  • Restaurant_Interior: restaurant OR table* OR chair* OR carpet* OR furniture* OR plate* OR cup*

Two queries can be combined to create a nested query. For example:

  • Dirty Bathroom: (^Dirty) AND (^Bathroom)

The names of queries that are referenced in a nested query cannot contain spaces. Only the AND and OR operators work with nested queries.
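One way to picture a nested query is as textual substitution: each (^Name) reference is replaced by the referenced query wrapped in parentheses. The sketch below assumes that model; the dictionary `QUERIES` and the function `expand` are hypothetical, and the system may resolve references differently internally.

```python
import re

# Hypothetical saved queries, keyed by name (taken from the examples above).
QUERIES = {
    "Dirty": "dirty OR filth* OR disgust* OR nasty",
    "Bathroom": "bathroom* OR toilet* OR restroom* OR lavator*",
}


def expand(query: str, queries: dict) -> str:
    """Replace each (^Name) reference with the referenced query, wrapped in parentheses."""
    def substitute(match):
        name = match.group(1)
        if name not in queries:
            raise KeyError(f"unknown query: {name}")
        return "(" + queries[name] + ")"

    # Names cannot contain spaces, so a reference is "(^" + non-space characters + ")".
    return re.sub(r"\(\^([^\s()]+)\)", substitute, query)
```

Under this model, the Dirty Bathroom query expands to `(dirty OR filth* OR disgust* OR nasty) AND (bathroom* OR toilet* OR restroom* OR lavator*)`.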

Metadata criteria

Certain metadata criteria can be included by enclosing accepted keywords within braces:

  • {<entity|document> <entity type> : <sentiment criteria>}

In the syntax above, the first component of the metadata criteria is either entity or document.

If the first component is entity, it may be followed by an entity type. This may be any of the entity types supported by the Salience entity extraction model, such as company, person, place, or product.

Optionally, a sentiment criterion may be added. It compares the document or entity sentiment either to a single value or to a range.

Based on these specifications, the following metadata query phrases are valid:

  • "merger announcement" NEAR/5 {entity company}
  • "merger announcement" AND {document: sentiment > 0.2}
  • "merger announcement" AND {entity company: sentiment > 0.2}
  • "merger announcement" AND {entity company: 0 < sentiment < 0.25}
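The sentiment part of these criteria is either a single comparison or a range. The sketch below evaluates such a criterion against a sentiment score; it is a hypothetical helper that supports only the `<` and `>` forms shown in the examples above, and it does not parse the surrounding braces or entity types.

```python
import operator
import re

NUMBER = r"-?\d+(?:\.\d+)?"
COMPARE = {"<": operator.lt, ">": operator.gt}


def sentiment_matches(criterion: str, score: float) -> bool:
    """Evaluate a criterion such as "sentiment > 0.2" or "0 < sentiment < 0.25"."""
    c = criterion.replace(" ", "")
    ranged = re.fullmatch(rf"({NUMBER})([<>])sentiment([<>])({NUMBER})", c)
    if ranged:
        low, op1, op2, high = ranged.groups()
        return COMPARE[op1](float(low), score) and COMPARE[op2](score, float(high))
    single = re.fullmatch(rf"sentiment([<>])({NUMBER})", c)
    if single:
        op, value = single.groups()
        return COMPARE[op](score, float(value))
    raise ValueError(f"unsupported criterion: {criterion!r}")
```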

NOTE:

The NEAR and WITH operators work on text-level elements, so the {document: sentiment} construction cannot be used with these operators.
Valid: "merger announcement" NEAR/5 {entity company}
Invalid: "merger announcement" NEAR/5 {document: sentiment > 0.2}

Case Sensitivity

By default, query terms are handled in a case-insensitive manner. Case-sensitivity on a query term can be enforced using the ~ operator.

  • ~Google NEAR/10 Microsoft matches the sentence "Both tech giants Microsoft and Google are investing heavily in mobile technologies" but not the sentence "who wins in search, microsoft, bing or google?"
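The effect of ~ can be sketched by comparing words exactly when a term starts with ~ and case-insensitively otherwise. This example is hypothetical: it combines that rule with a simple NEAR check (distance as the difference in word positions, no stemming) to reproduce the two sentences above.

```python
import re


def positions(term: str, words: list) -> list:
    # A leading ~ makes the comparison case-sensitive; otherwise case is ignored.
    if term.startswith("~"):
        return [i for i, w in enumerate(words) if w == term[1:]]
    return [i for i, w in enumerate(words) if w.lower() == term.lower()]


def near(a: str, b: str, text: str, distance: int = 10) -> bool:
    words = re.findall(r"[A-Za-z0-9]+", text)
    return any(abs(i - j) <= distance for i in positions(a, words) for j in positions(b, words))
```

In the second sentence, microsoft still matches the case-insensitive term Microsoft, but nothing matches ~Google, so the query fails.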

Stemming

By default, query terms are stemmed. In a phrase, only the right-most word is stemmed; the other words in a multi-word phrase are not stemmed.

  • The phrase "driving on faster roads" matches text such as "driving on faster road", because roads is stemmed, but does not match "driving on fast roads", because faster is not stemmed.
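The phrase-stemming rule can be sketched by comparing every word of the phrase exactly except the last, which is compared after stemming. The `toy_stem` function here is a hypothetical stand-in that only strips a trailing "s"; the system's stemmer is far more capable.

```python
import re


def toy_stem(word: str) -> str:
    # Hypothetical stand-in for a real stemmer: strips one trailing "s".
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def phrase_matches(phrase: str, text: str) -> bool:
    """Match a phrase word for word, stemming only its right-most word."""
    query = phrase.lower().split()
    if not query:
        return False
    words = re.findall(r"[a-z0-9]+", text.lower())
    n = len(query)
    for i in range(len(words) - n + 1):
        window = words[i:i + n]
        if window[:-1] == query[:-1] and toy_stem(window[-1]) == toy_stem(query[-1]):
            return True
    return False
```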

Turning off stemming

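Placing the ! character in front of the first term of a query turns off stemming for the entire query. To turn off stemming for that first term only, enclose the term and its ! in parentheses. A ! placed in front of any later term turns off stemming for that term only.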

  • !cat OR dog will turn off stemming for both terms in the query.
  • (!cat) OR dog just turns off stemming for cat.
  • cat OR !dog just turns off stemming for dog.

Special characters may be used in query terms and phrases only if they are enclosed in quotes.

Correct Query:

Gepp OR Gunther OR Hasso OR "Hayden-Smith" OR Hirakubo OR Kanai OR Mathis OR Moeller OR "Nijssen_Smith" OR Sherman OR Shimizu OR "U'Ren" OR Daiji

Wrong Query:

Gepp OR Gunther OR Hasso OR Hayden-Smith OR Hirakubo OR Kanai OR Mathis OR Moeller OR Nijssen_Smith OR Sherman OR Shimizu OR U'Ren OR Daiji
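When building a long OR list from raw names, quoting can be automated by wrapping any term that contains a special character (including a space). This is a hypothetical helper; it uses the special-character list at the top of this page, does not handle terms that themselves contain double quotes, and does not detect stopwords.

```python
# Special characters as listed at the top of this page; a space also counts.
SPECIAL = set("`!@#$%^()_-=~+[]{}|\"':;.,<>?/0123456789 ")


def quote_if_needed(term: str) -> str:
    """Wrap a term in double quotes if it contains any special character."""
    return f'"{term}"' if any(ch in SPECIAL for ch in term) else term


def or_query(terms: list) -> str:
    return " OR ".join(quote_if_needed(t) for t in terms)
```

Applied to the surnames above, `or_query` produces the correct query.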

Scores

Query results are accompanied by two scores: Query Relevancy and Query Sentiment.

Query Relevancy

Query Relevancy is a count of the query terms found within a document. It is particularly useful for judging how well your queries fit your text. Consider the following text:

I have one cat and I used to have a dog too.

The query relevancy score for the query cat OR dog OR bird will be 2 because the query detects two of the query terms.
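The relevancy example can be reproduced by counting how many of the query's terms appear in the text. This sketch assumes each distinct term is counted once and ignores stemming and wildcards; the function name `query_relevancy` is hypothetical.

```python
import re


def query_relevancy(or_terms: list, text: str) -> int:
    """Count how many of the query's terms appear in the text (no stemming or wildcards)."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return sum(1 for term in or_terms if term.lower() in words)
```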

Query Sentiment

Query Sentiment is calculated by scoring the sentiment of each identified query term separately, using model- and dictionary-driven approaches, and then averaging the scores of all mentioned terms.
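The averaging step can be written out directly. The per-term scores passed in below are hypothetical inputs; how the system scores each individual term is not shown here, so this sketch covers only the final average.

```python
from statistics import mean
from typing import Optional


def query_sentiment(term_scores: dict) -> Optional[float]:
    """Average the per-term sentiment scores of the query terms found in a document."""
    if not term_scores:
        return None  # no query terms found, so there is nothing to average
    return mean(term_scores.values())
```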

Examples

The most important thing to keep in mind when creating queries is to keep them simple and organized. Here are some examples of queries that vary in complexity:

Germ

anti* OR bact* OR germ* OR "anti-bacterial"

This uses simple OR logic, with the wildcard (*) accounting for plural forms as well as typos and misspellings.

Internet Banking – Mobile Access

((internet OR online OR paperless) AND (bank*)) AND (mobile OR cell* OR phone* OR access*)

This uses OR logic and wildcards in the same way as the previous example. Parentheses are needed around each group joined by AND to keep the intended logic.

Price (Negative)

(pric* OR cost* OR fee* OR item*) AND (high OR expensive OR premium OR "so much" OR disappoint* OR spendy OR ("too" AND (high OR "much" OR expensive)) OR ("not" AND (good OR competitive* OR worth OR fair))) OR ("too expensive" OR "a little expensive")

Sometimes, customers use two separate queries for a single topic (e.g. instead of one query for price, one for Price (Positive) and one for Price (Negative)). A downside of this approach is that false positives and false negatives can occur. For example, the comment "it has high quality and reasonable prices" would match both the Price (Positive) and the Price (Negative) queries, when it belongs only with Price (Positive).

Price (Negative), revised

(pric* OR cost* OR fee* OR item*) AND (expensive OR premium OR "so much" OR disappoint* OR ("too" AND ("much" OR expensive)) OR ("not" AND (good OR competitive* OR worth OR fair))) OR (("too expensive" OR "a little expensive") AND (price* OR cost* OR fee* OR item*)) NEAR/8 (high OR courses)

To fix the problem above, we removed "high" from the list of negative terms, added parentheses at the beginning and end of the original query, and added a NEAR/8 clause at the end of the query. Together, the AND and NEAR/8 operators remove the false positive by requiring that "high" occur within 8 words of "price", "cost", "fee", or "item".

Stopwords

Stopword filtering removes small, common words that have little effect on the content, such as prepositions and conjunctions. In a query, all stopwords must be enclosed in quotes.

Download list of stopwords

Or call us at 1-800-377-8036