12 Aug 2017
Solr Load Timeout
Everyone knows Elasticsearch, which is super popular. It can be used not only for keyword search but also as a fast key/document database that returns JSON. It is based on Lucene, the same library Solr is built on. In our experience Solr is not as fast as Elasticsearch, but we have an old system that still uses Solr, and it recently ran into a loading issue.
Issue
When we run certain Solr searches, such as name is Lily and age is 50, Solr does not respond and our application appears frozen (it probably has no timeout setting). However, searches with other criteria work fine.
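Since the frozen application probably had no timeout, a client-side timeout would at least turn the hang into a visible error. Here is a minimal sketch in Python using only the standard library. The helper name `fetch_with_timeout` and the 10-second default are illustrative assumptions, not part of our actual application:

```python
import socket
import urllib.error
import urllib.request


def fetch_with_timeout(url, timeout_seconds=10.0):
    """Return the response body, or None if the request fails or times out.

    Hypothetical helper: the name and the default timeout are assumptions.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return response.read()
    except (socket.timeout, urllib.error.URLError) as error:
        # Depending on where the timeout hits, it surfaces as URLError
        # (while connecting or sending) or socket.timeout (while reading).
        print("Solr request failed or timed out:", error)
        return None
```

With something like this in place, a slow query shows up as a logged failure instead of a frozen page.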
Troubleshoot
Solr is queried over plain HTTP requests, the same as Elasticsearch, so we can debug through the Solr admin page:
http://11.11.11.11/solr
or query it directly with the HTTP URL below:
http://11.11.11.11/solr/select?q=realname:"Lily" AND age:"50"
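The URL above contains spaces and quotes, which a browser encodes silently but a script has to encode itself. Here is a small Python sketch of building it, assuming the same host and the `/solr/select` handler shown above. The helper name `build_select_url` is made up for illustration. `fl` is Solr's field-list parameter, used here as an optional way to keep a huge `content` field out of the response while debugging:

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, fields=None):
    """Build a Solr select URL with the query string percent-encoded."""
    params = {"q": query, "indent": "true"}
    if fields:
        # fl restricts which stored fields Solr returns for each document.
        params["fl"] = ",".join(fields)
    return base_url.rstrip("/") + "/select?" + urlencode(params)


url = build_select_url(
    "http://11.11.11.11/solr",
    'realname:"Lily" AND age:"50"',
    fields=["id", "filename"],
)
print(url)
```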
When we run this search, it takes a long time to load and returns a very long text/xml response. It contains 1 record, shown below:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="debugQuery">true</str>
<str name="indent">true</str>
<str name="q">realname:"Lily" AND age:"50"</str>
</lst>
</lst>
<result name="response" numFound="1" start="0">
<doc>
<str name="filename">
/mnt/filings/111/111.pdf
</str>
<str name="realname">Lily</str>
<str name="age">50</str>
<str name="id">87476620170406name2017</str>
<date name="filingdate">2017-04-06T00:00:00Z</date>
<str name="name">name2017.pdf</str>
<arr name="content_type">
<str>text/plain; charset=ISO-8859-1</str>
</arr>
<arr name="content">
<str>
<PDF> begin 644 xxxx.pdf
...
This is a PDF file, but Solr apparently did not recognize the PDF format (the content_type above is text/plain), so it indexed the raw file as one very long content value full of special characters.
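To find other documents like this one without reading the XML by eye, the response can be scanned for suspicious `content` values. Here is a rough sketch in Python, assuming a response shaped like the one above. The function name, the 100,000-character threshold, and the sample XML are illustrative assumptions, not values from our system:

```python
import xml.etree.ElementTree as ET


def find_suspicious_docs(response_xml, max_content_chars=100_000):
    """Return ids of docs whose content is very long or looks uuencoded."""
    suspicious = []
    root = ET.fromstring(response_xml)
    for doc in root.iter("doc"):
        doc_id = doc.findtext("str[@name='id']", default="(no id)")
        for value in doc.findall("arr[@name='content']/str"):
            text = (value.text or "").strip()
            # "begin 644 <name>" is the header line of a uuencoded file.
            if len(text) > max_content_chars or "begin 644 " in text[:200]:
                suspicious.append(doc_id)
                break
    return suspicious


# Made-up sample response with one bad and one normal document.
sample = """<response><result name="response" numFound="2" start="0">
<doc><str name="id">doc-a</str>
<arr name="content"><str>&lt;PDF&gt; begin 644 xxxx.pdf</str></arr></doc>
<doc><str name="id">doc-b</str>
<arr name="content"><str>plain readable text</str></arr></doc>
</result></response>"""
print(find_suspicious_docs(sample))
```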
Remove The Index
Let’s try removing this document by its id and see how things go. There are 2 ways to remove the data.
- delete it by id through a URL (this did not work for me):
http://11.11.11.11/solr/update?stream.body=update?stream.body=<delete><query>id:87476620170406name2017</query></delete>&commit=true
- use curl to delete (this lets us set the correct Content-Type, and it worked for me):
curl -H 'Content-Type: text/xml' 'http://11.11.11.11/solr/update?commit=true' --data-binary '<delete><query>id:87476620170406name2017</query></delete>'
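The same delete can also be sent from a script. Here is a hedged Python sketch that mirrors the curl call, assuming the `/solr/update` handler from above and that `commit=true` is passed in the query string. `build_delete_request` is a hypothetical helper name, and the request is only built here, not sent:

```python
import urllib.request
from xml.sax.saxutils import escape


def build_delete_request(base_url, doc_id):
    """Build a POST request that deletes one document by id and commits."""
    body = "<delete><query>id:{}</query></delete>".format(escape(doc_id))
    return urllib.request.Request(
        base_url.rstrip("/") + "/update?commit=true",
        data=body.encode("utf-8"),
        # Same Content-Type header as in the curl command.
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


request = build_delete_request("http://11.11.11.11/solr", "87476620170406name2017")
print(request.full_url)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request, timeout=10)
```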
After deleting this PDF, Solr responds very fast again and our application loads quickly as well. Issue solved.
Conclusion
Solr has its own mechanism for searching content, but when the content is malformed and full of special characters, Solr does not handle it well, which causes the slow response. Such content is bad data, though, so we need to fix it before inserting it into Solr.
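One way to catch such bad data before it reaches Solr is a cheap sanity check on the extracted text. Here is a minimal sketch in Python. The function name and the 0.85 threshold are assumptions chosen for illustration and would need tuning on real documents:

```python
def looks_like_text(content, min_ratio=0.85):
    """Heuristic: True if most characters are letters, digits, or whitespace.

    Encoded binary data tends to be dense with punctuation and control
    characters, so its ratio is usually much lower than that of prose.
    """
    if not content:
        return False
    plain = sum(1 for ch in content if ch.isalnum() or ch.isspace())
    return plain / len(content) >= min_ratio


print(looks_like_text("Annual filing for 2017, page 1 of 12."))
print(looks_like_text('#$%^&*()_+{}|:"<>?~`'))
```

Documents that fail the check could be sent back through a proper PDF text extractor, or skipped, instead of being indexed as raw content.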
Til next time,