The World

scribble

Ralph YY's Blog

12 Aug 2017
Solr Load Timeout

Most people know Elasticsearch, which is very popular: besides keyword search, it can also serve as a fast key/document store that returns JSON. It is built on Lucene, the same library Solr is built on. Compared to Elasticsearch, Solr is not as fast, but we have an old system still using Solr, and it recently ran into a loading issue.

Issue

When we run certain Solr searches, such as name is Lily and age is 50, Solr does not respond and our application looks frozen (it probably has no timeout setting). Other criteria, however, work fine.
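Since the application seems to wait forever, a client-side timeout would at least keep it from freezing. Below is a minimal Python sketch, assuming the application calls Solr over plain HTTP; the function name fetch_with_timeout and the 5-second default are hypothetical, not part of our system. Note that urllib's timeout applies to each blocking socket operation, so it is not a hard deadline for the whole response.

```python
import socket
import urllib.error
import urllib.request
from typing import Optional


def fetch_with_timeout(url: str, timeout_s: float = 5.0) -> Optional[bytes]:
    """Return the response body, or None if the server does not answer in time.

    The timeout applies to connecting and to each blocking socket read,
    so it is not a hard deadline for the whole download.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.read()
    except urllib.error.URLError as exc:
        # A timeout while connecting or sending arrives wrapped in URLError.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return None
        raise  # other errors (refused connection, HTTP 500, ...) still surface
    except (socket.timeout, TimeoutError):
        # A timeout while waiting for or reading the response is raised directly.
        return None
```

Returning None instead of raising lets the caller show a "search timed out" message rather than hanging.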

Troubleshoot

Solr is queried over plain HTTP, just like Elasticsearch, so we can debug through the Solr admin page

http://11.11.11.11/solr

or call the select handler directly with an HTTP URL

http://11.11.11.11/solr/select?q=realname:"Lily" AND age:"50"
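When calling this URL outside a browser, the spaces and quotes in q have to be percent-encoded. A small Python sketch using only the standard library; build_select_url is a hypothetical helper, and the extra parameters simply mirror the indent and debugQuery values echoed in the response header below.

```python
from urllib.parse import urlencode


def build_select_url(base: str, query: str, **params: str) -> str:
    """Build a /select URL, percent-encoding the query and any extra parameters."""
    # urlencode turns spaces into '+' and escapes ':' and '"', which Solr decodes back.
    return f"{base.rstrip('/')}/select?" + urlencode({"q": query, **params})


url = build_select_url("http://11.11.11.11/solr", 'realname:"Lily" AND age:"50"',
                       indent="true", debugQuery="true")
print(url)
# prints http://11.11.11.11/solr/select?q=realname%3A%22Lily%22+AND+age%3A%2250%22&indent=true&debugQuery=true
```

The resulting string can be passed to urllib.request.urlopen, ideally with a timeout.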

The search took a long time to load and returned a very long text/XML response. It contains one matching record:

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="debugQuery">true</str>
<str name="indent">true</str>
<str name="q">realname:"Lily" AND age:"50"</str>
</lst>
</lst>
<result name="response" numFound="1" start="0">
<doc>
<str name="filename">
/mnt/filings/111/111.pdf
</str>
<str name="realname">Lily</str>
<str name="age">50</str>
<str name="id">87476620170406name2017</str>
<date name="filingdate">2017-04-06T00:00:00Z</date>
<str name="name">name2017.pdf</str>
<arr name="content_type">
<str>text/plain; charset=ISO-8859-1</str>
</arr>
<arr name="content">
<str>
<PDF> begin 644 xxxx.pdf
...

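This is a PDF file. Solr apparently did not extract the text from it (the content_type field says text/plain), so the raw file ended up in the content field as a very long string full of special characters. The "begin 644 xxxx.pdf" line looks like the header of a uuencoded file.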
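To spot documents like this without scrolling through the whole response, the XML can be parsed and each document's content length measured. A minimal sketch, assuming the standard Solr XML response layout shown above; oversized_docs and the 10,000-character limit are hypothetical choices. It is also worth glancing at QTime in the header (0 in the response above): it counts only the search itself, so a slow load with a small QTime points at the size of what is being returned.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple, Union


def oversized_docs(xml_text: Union[str, bytes], field: str = "content",
                   limit: int = 10_000) -> Iterator[Tuple[str, int]]:
    """Yield (id, length) for each <doc> whose `field` holds more than `limit` characters."""
    root = ET.fromstring(xml_text)
    for doc in root.iter("doc"):
        doc_id = doc.findtext("str[@name='id']") or "?"
        length = 0
        # The field may be a single <str> or a multi-valued <arr> of <str>;
        # iter("str") covers both because it includes the element itself.
        for node in doc.findall(f"*[@name='{field}']"):
            length += sum(len(s.text or "") for s in node.iter("str"))
        if length > limit:
            yield doc_id, length
```

For example, the raw bytes from a /select call can be passed straight in, and each suspicious id printed with its length.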

Remove The Index

Let’s remove this document by its id and see whether everything works again. There are two ways to delete it:

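  • delete by id through a URL (did not work for us; the URL we tried repeated "update?stream.body=" twice, shown corrected here):
    http://11.11.11.11/solr/update?stream.body=<delete><query>id:87476620170406name2017</query></delete>&commit=true

  • use curl with the correct Content-Type header (this works for me). Keep commit=true inside the quoted URL: left outside the quotes as &commit=true, the shell treats & as a background operator, so the commit parameter never reaches Solr.
    curl -H 'Content-Type: text/xml' 'http://11.11.11.11/solr/update?commit=true' --data-binary '<delete><query>id:87476620170406name2017</query></delete>'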
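The same delete can also be sent from code. A minimal Python sketch, assuming the same /solr/update endpoint. It uses Solr's delete-by-id form (<delete><id>DOC_ID</id></delete>) instead of the delete-by-query form above, which should be equivalent here on the assumption that id is the schema's unique key. build_delete_request is a hypothetical helper that only builds the request; nothing is sent until it is passed to urlopen.

```python
import urllib.request
from xml.sax.saxutils import escape


def build_delete_request(base: str, doc_id: str, commit: bool = True) -> urllib.request.Request:
    """Build a POST to Solr's /update handler that deletes one document by id."""
    # escape() guards against ids containing &, < or >.
    body = f"<delete><id>{escape(doc_id)}</id></delete>".encode("utf-8")
    # commit=true goes in the URL query string, as in the curl command.
    url = f"{base.rstrip('/')}/update" + ("?commit=true" if commit else "")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "text/xml"}, method="POST"
    )


req = build_delete_request("http://11.11.11.11/solr", "87476620170406name2017")
# Sending it is a separate step, e.g. urllib.request.urlopen(req, timeout=10).
```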
    

After deleting this PDF, Solr responds quickly again, and so does our application. Issue solved.

Conclusion

Solr has its own mechanism for searching content, but when the content is malformed and full of special characters, Solr does not handle it well, which causes the slow response. In any case, such content is bad data, and we need to clean it up before inserting it into Solr.
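One way to catch such data before it reaches Solr is a sanity check at indexing time. The Python sketch below is only a heuristic under our own assumptions: it flags text that starts with a uuencode header like the "begin 644" line above, text that is a raw PDF (starting with "%PDF-"), or text with too many non-printable characters. The function name and the 5% threshold are hypothetical and would need tuning on real data.

```python
import re

# Heuristic signatures (assumptions, tune for your own data):
# a uuencode header such as "begin 644 xxxx.pdf", optionally after a "<PDF>" tag,
# and the "%PDF-" magic bytes of an undecoded PDF.
UUENCODE_HEADER = re.compile(r"^(?:<PDF>\s*)?begin [0-7]{3} \S+", re.MULTILINE)
RAW_FILE_PREFIXES = ("%PDF-",)


def looks_like_undecoded_file(text: str, max_unprintable: float = 0.05) -> bool:
    """Return True if `text` looks like a raw file dump rather than extracted text."""
    if not text:
        return False
    head = text[:1000]
    if UUENCODE_HEADER.search(head) or head.lstrip().startswith(RAW_FILE_PREFIXES):
        return True
    # Control characters other than whitespace are rare in real text.
    unprintable = sum(1 for ch in text if not (ch.isprintable() or ch.isspace()))
    return unprintable / len(text) > max_unprintable
```

Documents that fail the check could be logged and skipped, or sent back through text extraction, instead of being indexed as-is.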



Til next time,
at 00:00
