Do Semantic “Similarity” Searches Produce Better and Faster Results?

The Prior Art Search Challenge

It is difficult to assess the novelty of an invention without a thorough understanding of previous work in the field. A prior art search is essential for defining the boundary between what can be claimed in a patent and what has already been published. Such searches are commonly performed using keywords, classification codes, or both. With these classic methods, it is challenging to obtain a comprehensive result that includes hidden or unexpected prior art, particularly under time constraints. The quality of the search depends greatly on the time spent and on the technical background and skill of the searcher.

Semantic search, similarity search and citation analysis can be combined to overcome these challenges in many cases. This article focuses on the benefits of similarity search for finding potential prior art.

Similarity Search

A number of patent databases (both public and paid) provide similarity search options. Similarity search engines may operate in one or more of the following ways:

  1. Perform text mining and machine learning to extract contextual similarities between the target patent and the assets stored in the patent databases;
  2. Search for patents that have common citations or share citations within the same family;
  3. Retrieve a list of similar records while ignoring stop words and common words such as “method”, “process” or “device”;
  4. Display potentially relevant prior art documents ranked by relevancy.
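
As a rough illustration of items 1, 3 and 4 above, the sketch below ranks a few short texts against a target text using TF-IDF weighting and cosine similarity after dropping stop words. It is not how any particular patent database implements similarity search. The stop-word list, the sample texts, the document IDs and the function names (`tokenize`, `tfidf_vectors`, `rank_by_similarity`) are all assumptions made for this example.

```python
import math
import re
from collections import Counter

# Hypothetical stop list: a few general stop words plus the generic
# patent terms mentioned in the list above.
STOP_WORDS = {"a", "an", "the", "of", "for", "and", "to", "in", "with",
              "method", "process", "device"}

def tokenize(text):
    """Lowercase the text, split on non-letters and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using smoothed TF-IDF."""
    token_lists = [tokenize(d) for d in docs]
    n = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({t: c * (math.log((1 + n) / (1 + doc_freq[t])) + 1)
                        for t, c in counts.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_by_similarity(target, candidates):
    """candidates maps doc_id -> text; returns [(doc_id, score)], most similar first."""
    ids = list(candidates)
    vectors = tfidf_vectors([target] + [candidates[i] for i in ids])
    target_vec = vectors[0]
    scored = [(i, cosine(target_vec, vec)) for i, vec in zip(ids, vectors[1:])]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Made-up texts, used only to exercise the functions.
target = "A method for inter-vehicle communication between autonomous vehicles"
candidates = {
    "DOC-A": "Device for communication between autonomous vehicles on a road",
    "DOC-B": "Guide tube for inserting a catheter",
    "DOC-C": "Process for controlling the course of an autonomous land vehicle",
}
ranking = rank_by_similarity(target, candidates)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.0%}")
```

Because “method”, “process” and “device” are on the stop list, they contribute nothing to the score, so two documents do not look similar merely because both claim “a method”. Commercial engines are likely to use richer signals (full text, claims, citations), which this sketch does not attempt to model.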

MaxVal validated the effectiveness of a similarity search engine¹ by running similarity searches for three patents classified under different IPC classes: US8315756B2, US8838292B2 and US9149609B2. In each case, the search retrieved a large number of patents ranked in descending order of relevancy score, starting with the most similar at a 100% rating. The results rated above 90% relevancy were then filtered using either a keyword or a classification-based restriction to obtain a manageable set of about 200-300 documents that was easy to review. In all three cases, the examiner-cited references were retrieved with a rating above 90% relevancy through the similarity search, as shown in the table below.
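
The screening steps described above (keep hits above 90% relevancy, restrict by keyword or classification, then check which examiner-cited references survive) can be sketched as a small filter. The record layout (`SearchHit`), the function names and the sample hits are hypothetical. They do not reflect Questel Orbit's export format or API, and the IDs and scores are invented purely to exercise the code.

```python
from dataclasses import dataclass

@dataclass
class SearchHit:
    publication: str   # publication number of the retrieved document
    relevancy: float   # relevancy score in percent (0-100)
    ipc: str           # main IPC symbol, e.g. "G05D 1/00"
    title: str

def shortlist(hits, min_relevancy=90.0, ipc_prefix=None, keyword=None):
    """Keep hits rated above min_relevancy that satisfy whichever
    restriction (IPC prefix and/or title keyword) is supplied."""
    kept = []
    for hit in hits:
        if hit.relevancy <= min_relevancy:
            continue
        if ipc_prefix is not None and not hit.ipc.startswith(ipc_prefix):
            continue
        if keyword is not None and keyword.lower() not in hit.title.lower():
            continue
        kept.append(hit)
    # Most relevant first, mirroring the ranked result list.
    return sorted(kept, key=lambda h: h.relevancy, reverse=True)

def retrieved_references(shortlisted, examiner_cited):
    """Return the examiner-cited publication numbers present in the shortlist."""
    found = {hit.publication for hit in shortlisted}
    return [ref for ref in examiner_cited if ref in found]

# Invented hits; not real search results.
hits = [
    SearchHit("HIT-2", 94.5, "G05D 1/02", "Land vehicle position control"),
    SearchHit("HIT-1", 100.0, "G05D 1/00", "Autonomous vehicle course control"),
    SearchHit("HIT-3", 92.0, "A61M 25/06", "Guide tube"),
    SearchHit("HIT-4", 88.0, "G05D 1/00", "Vehicle steering"),
]
short = shortlist(hits, ipc_prefix="G05D")
print([hit.publication for hit in short])
print(retrieved_references(short, ["HIT-2", "HIT-9"]))
```

Comparing the shortlist against a known set of examiner citations, as `retrieved_references` does, is one simple way to check that narrowing the result set has not discarded the references that matter.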

Target Patent | Technology | Examiner-Cited References Retrieved through Similarity Search
--- | --- | ---
US8315756B2 | G01C 22/00 (Decentralised systems, e.g. inter-vehicle communication) | US20100121518A1, US20140039716A1, US20130325210A1, US9092987B2, US9070022B2
US8838292B2 | G05D 1/00 (Control of position or course in two dimensions specially adapted to land vehicles…) | US20030225477A1, US20040024527A1, US20040254729A1, US20080269992A1, US20060106538A1, US20070043502A1, US8315756B2
US9149609B2 | A61M 25/06 (Guide tubes) | US4611594A, US5074871A, US5968057A, US5688234A, US20020026203A1, US5908435A, US20020068954A1, US7169154B1, US6517551B1, US20060135987A1, US20050119668A1, US20060253145A1, US20070191878A1, US20070208351A1

In summary, similarity search augments keyword- and classification-based searches and improves the quality of the result set that needs to be examined: in the three cases tested, the examiner-cited references were all retrieved with a relevancy rating above 90%. A similarity search function can therefore make prior art searching more efficient by reducing the probability of missing relevant prior art in the time available.

¹ MaxVal used Questel Orbit to test the similarity search functionality.
