Order of search results
Pandosearch determines the order of search results by calculating a score for each document. We do this on the basis of how often and where in the document a search term occurs. Another factor is how unique a search term is when you look at all the available documents. Optionally, other factors can also be important, such as how old a document is. The document with the highest score will be at the top.
This is explained in more detail below.
Note: each Pandosearch implementation has its own calculation that can be customised as required. This article discusses the basic rules that are the same for almost every Pandosearch implementation.
Score for relevance
For each incoming search query, Pandosearch calculates a relevance score for all the available documents. A document is usually a web page, but can also be a PDF file or other information. The document with the highest score comes first in the search results.
This score is basically determined using the answers to the following two questions:
- In which fields does the search term occur?
- For all fields containing the search term: how heavily does this field weigh in the calculation?
There are also some other factors that come into play. The rest of this article discusses these in more detail.
Searchable fields
For all documents found, Pandosearch translates the raw information (the HTML as it appears online) into specific fields for searching.
The term “field” is deliberately generic: a field can hold all kinds of information, and which fields exist varies greatly with the implementation and the type of document. In this article we will assume that the document is a web page. A web page typically has fields such as:
- title: the title you see at the top of your browser window.
- body: the entire content of the web page visible to visitors.
- meta tags: pieces of “invisible” information in the source code that Pandosearch can read. A separate article provides more information about what these are and what they can do for you.
In addition, a document can contain many more fields that we will not address for now. Use the “Diagnostics” function on the Pando Panel if you want to see which fields Pandosearch recognises and what information they contain in your own implementation.
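To make the idea of fields more concrete, here is a minimal sketch of how raw HTML could be split into a title, a body and meta fields. It uses only Python's standard `html.parser` module. The class name `FieldExtractor` and the `meta_` prefix are assumptions made for this illustration; this is not how Pandosearch itself is implemented.

```python
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collects title text, body text and named meta tags from an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self._open = []                          # "title"/"body" elements currently open
        self._text = {"title": [], "body": []}   # text fragments per field
        self.meta = {}                           # meta name -> content

    def handle_starttag(self, tag, attrs):
        if tag in self._text:
            self._open.append(tag)
        elif tag == "meta":
            attributes = dict(attrs)
            if attributes.get("name") and attributes.get("content"):
                self.meta[attributes["name"]] = attributes["content"]

    def handle_endtag(self, tag):
        if tag in self._open:
            self._open.remove(tag)

    def handle_data(self, data):
        # Text is assigned to the innermost open title/body element.
        # Simplification: script and style content inside the body is not filtered out.
        if self._open:
            self._text[self._open[-1]].append(data)

    def fields(self) -> dict:
        # Join the fragments and collapse runs of whitespace.
        result = {name: " ".join(" ".join(parts).split())
                  for name, parts in self._text.items()}
        result.update({f"meta_{name}": value for name, value in self.meta.items()})
        return result


html = """<html><head><title>Customer Service</title>
<meta name="description" content="Opening hours and contact details">
</head><body><h1>Contact us</h1>
<p>Our customer service team is here to help.</p></body></html>"""

extractor = FieldExtractor()
extractor.feed(html)
fields = extractor.fields()
print(fields)
```

The result is a plain dictionary with one entry per field, which is roughly the kind of overview the “Diagnostics” function gives you for a real document.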
Back to the calculation: supposing someone searches for “customer service”, Pandosearch will look in all searchable fields for the term “customer service”. Every field in which the term is found counts towards the total score for the document. Fields in which it is not found do not count.
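As a rough illustration of this step, the sketch below checks which fields of a document contain a search term. The function name `fields_containing` and the example field names are hypothetical, and the plain case-insensitive substring match is a simplification; a real search engine analyses the text in more sophisticated ways.

```python
def fields_containing(document: dict[str, str], term: str) -> list[str]:
    """Return the names of the fields in which the term occurs (case-insensitive)."""
    needle = term.lower()
    return [name for name, text in document.items() if needle in text.lower()]


page = {
    "title": "Customer Service",
    "body": "Contact our customer service team by phone or e-mail.",
    "meta_description": "Opening hours and contact details.",
}

matched = fields_containing(page, "customer service")
print(matched)  # ['title', 'body']
```

Only the title and the body would count towards the score here; the meta description does not contain the term.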
Weighting of fields
Besides whether a search term occurs at all, it also matters in which field it occurs. In the example of a search for “customer service”, several pages may contain the phrase “customer service” somewhere in the main body. But chances are that there is only one page with the title “Customer Service”.
Generally speaking, a document is often more relevant if the title also contains a search term than if only the body text contains a search term. After all, the title is often a short summary of the most important content of a document.
This difference in relevance is reflected in the weighting of each field. By default, a field is given a weighting of 1. We then give the field containing the title a weighting of 5, for example. This means that the presence of a search term in the title is 5 times as important as in fields without a specific weighting.
An extreme example of how weighting is applied is the keymatch functionality. A search term that occurs as a keymatch is given a very high weighting factor (for example 99), which in practice means that the document almost always comes out on top in the search results.
If a search term occurs in several fields, or several times in one field, the individual scores add up to a higher total score. The idea is that a document in which a search term occurs many times is more relevant than a document in which it is mentioned only once or twice.
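The weighting and adding up described above can be sketched as follows. This is a simplified illustration, not the actual Pandosearch calculation: the names `FIELD_WEIGHTS` and `score` are made up for the example, the weights are the example values from this article (5 for the title, 1 for everything else), and occurrences are counted with a plain case-insensitive substring count.

```python
# Example weights from this article; fields that are not listed get the default.
FIELD_WEIGHTS = {"title": 5.0}
DEFAULT_WEIGHT = 1.0


def score(document: dict[str, str], term: str) -> float:
    """Sum (field weight x number of occurrences) over all fields."""
    needle = term.lower()
    total = 0.0
    for name, text in document.items():
        occurrences = text.lower().count(needle)
        total += FIELD_WEIGHTS.get(name, DEFAULT_WEIGHT) * occurrences
    return total


page_a = {
    "title": "Customer Service",
    "body": "Our customer service is open daily.",
}
page_b = {
    "title": "Contact",
    "body": "Call customer service. Customer service replies within a day.",
}

score_a = score(page_a, "customer service")  # 5 * 1 (title) + 1 * 1 (body) = 6.0
score_b = score(page_b, "customer service")  # 1 * 2 (body) = 2.0
ranked = sorted([("page_a", score_a), ("page_b", score_b)],
                key=lambda item: item[1], reverse=True)
print(ranked)
```

Although page_b mentions the term more often in its body, page_a ranks first because a single match in the heavily weighted title outweighs two matches in the body.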
In new implementations, we often start with a basic set of weighting factors that experience has shown to lead to good search results for visitors. If necessary, we will adjust them if it is found that certain fields are weighted too heavily or not heavily enough. In addition, we can also add new fields or omit fields from the calculation altogether if they prove to have no added value in practice.
Other factors
The basic way the mechanism works is described above. If we go a little deeper, there are even more factors that come into play. These are sometimes very technical and not always relevant to every implementation. It would therefore go too far to cover everything here. Here are a few examples to give you an impression:
- The shorter the text in the searched fields, the higher a document ranks in the results. The idea behind this is that if a search term appears once in a short text, it is probably more relevant than if it appears twice in a very long piece of text which also contains many other words.
- If someone searches for several words at once, documents containing all the search terms will be ranked higher than documents containing only some of the search terms. This takes precedence over how often search terms occur. The idea behind this is that a document matching every search term entered is more likely to be exactly what you are looking for.
- If a word occurs very rarely in all the documents combined, the score will be higher when it is found than for a word that occurs very often. This is called the inverse document frequency and is a way to prevent very common words (the, it, a) from playing too big a role in searches when someone searches for a whole sentence. The idea is that the less often a word occurs, the more likely it is that the document it is in will be what someone wants to find.
- For things like blog posts or news articles, we can set older articles to receive a lower score. The most recent news is often the most relevant to visitors and should therefore be at the top of the search results if there are several documents in which a search term occurs equally often.
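To give an impression of what such factors can look like, here is a small sketch of three of them: inverse document frequency, a field-length factor and an age factor. The formulas are common textbook choices and the one-year half-life is an arbitrary assumption; the formulas and settings used in an actual Pandosearch implementation are not described in this article and may differ.

```python
import math


def inverse_document_frequency(term: str, documents: list[str]) -> float:
    """Smoothed IDF: the fewer documents contain the term, the higher the value."""
    containing = sum(1 for text in documents if term in text.lower().split())
    return math.log((1 + len(documents)) / (1 + containing)) + 1


def length_factor(text: str) -> float:
    """Shorter fields get a higher factor: 1 / sqrt(number of words)."""
    return 1 / math.sqrt(max(len(text.split()), 1))


def age_factor(age_in_days: float, half_life_in_days: float = 365.0) -> float:
    """The factor halves every time one half-life has passed (assumed: one year)."""
    return 0.5 ** (age_in_days / half_life_in_days)


documents = [
    "the customer service desk",
    "the opening hours of the shop",
    "the refund policy",
]

idf_common = inverse_document_frequency("the", documents)    # occurs in every document
idf_rare = inverse_document_frequency("refund", documents)   # occurs in one document
print(idf_common, idf_rare)
print(length_factor("customer service"), length_factor(documents[1]))
print(age_factor(0), age_factor(365))
```

In a full calculation, factors like these would be multiplied into the weighted field scores, so that a rare word in a short, recent document counts for more than a common word in a long, old one.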
Conclusion
If you have any questions about the search order within your implementation after reading this article, please contact us through one of the available support channels and we will be happy to tell you more.