Troubleshooting SearchBlox: Common Issues and Fast Fixes
1) Crawler not indexing or stops mid-run
- Symptoms: No new documents appear; the crawler task shows errors or has stalled.
- Quick fixes:
- Check crawler logs (Admin UI → Collections → View Logs) for HTTP errors, timeouts, or authentication failures.
- Verify seed URLs and robots.txt: ensure seeds are reachable and not blocked by robots.txt.
- Increase timeouts or reduce concurrency in collection settings if target sites throttle requests.
- Test credentials for sites requiring Basic/Form auth; re-enter and save in collection credentials.
- If using a proxy, confirm proxy settings and credentials are correct.
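The robots.txt check above can be scripted before a crawl. The following is a minimal sketch using only the Python standard library; the function name `blocked_seeds`, the sample rules, and the `example.com` URLs are illustrative assumptions, and the user agent defaults to `*` because the crawler's actual user-agent string is not given here.

```python
from urllib.robotparser import RobotFileParser


def blocked_seeds(robots_txt, seeds, user_agent="*"):
    """Return the seed URLs that the given robots.txt rules disallow."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network call is made here.
    parser.parse(robots_txt.splitlines())
    return [url for url in seeds if not parser.can_fetch(user_agent, url)]


# Hypothetical rules and seeds, for illustration only.
sample_robots = """User-agent: *
Disallow: /private/
"""
seeds = [
    "https://example.com/docs/",
    "https://example.com/private/report.html",
]
print(blocked_seeds(sample_robots, seeds))
```

In practice the robots.txt content would be fetched from each target site first; any seed this returns is likely to be skipped by a crawler that honors robots.txt.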
2) Documents indexed but not appearing in search results
- Symptoms: Indexed count increases, but search returns no results or is missing expected documents.
- Quick fixes:
- Confirm collection is selected in the search UI or query parameters.
- Refresh index/commit (Admin → Collections → Run Index or use API) to ensure latest segments are visible.
- Check field mapping and schema: ensure title/content fields are mapped and not marked as non-searchable.
- Verify security filters or query restrictions aren’t excluding documents (date filters, size, permissions).
- Inspect document size/type filters in collection settings that may skip certain filetypes.
3) Slow searches or high query latency
- Symptoms: Queries take long or time out under load.
- Quick fixes:
- Check CPU/RAM and Elasticsearch health (OS metrics + Admin dashboards). Upgrade VM resources if usage is consistently high.
- Optimize queries: avoid heavy wildcard or leading wildcard searches; use fielded queries.
- Use caching for frequent queries and enable result caching where applicable.
- Tune index settings (shard/replica counts) and force-merge only during maintenance windows.
- Review analyzers and stop-words—complex analyzers can slow matching.
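To find the expensive queries mentioned above, a query log can be screened for leading wildcards. This is a rough sketch, not a full query parser: the regular expression and the name `has_leading_wildcard` are assumptions for illustration, and it only looks at whether `*` or `?` opens a term.

```python
import re

# A wildcard counts as "leading" when it opens a term: at the start of the
# query, after whitespace, after "(", or right after a "field:" prefix.
LEADING_WILDCARD = re.compile(r"(?:^|[\s(:])[*?]\w")


def has_leading_wildcard(query):
    """Return True if any term in the query starts with * or ?."""
    return bool(LEADING_WILDCARD.search(query))


for q in ["*report", "title:*report", "report*", "title:report*", "*:*"]:
    print(q, has_leading_wildcard(q))
```

Trailing wildcards such as `report*` and the match-all form `*:*` are deliberately not flagged.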
4) Connector/authentication failures (SharePoint, Google Drive, Databases)
- Symptoms: Connector reports auth errors or returns zero documents.
- Quick fixes:
- Re-authorize OAuth apps when tokens expire; follow provider re-consent flow.
- Confirm API quotas and scopes—ensure required scopes are granted and API project is enabled.
- Validate service account permissions for Google/SharePoint or DB user privileges for database connectors.
- Check network access: firewall or IP allowlists may block connector traffic.
- Examine connector-specific logs for API error codes (401, 403, 429) and act accordingly.
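One way to "act accordingly" on those status codes is a small decision function. The sketch below is an assumption about a reasonable policy, not connector behavior: the action names, the 60-second cap, and the exponential backoff for 429 are all illustrative choices.

```python
from typing import Optional, Tuple


def next_action(status: int,
                retry_after: Optional[float] = None,
                attempt: int = 0) -> Tuple[str, float]:
    """Map an HTTP status from a connector API to (action, seconds to wait)."""
    if status == 401:
        # Expired or invalid token: re-run the OAuth flow or re-enter credentials.
        return ("reauthorize", 0.0)
    if status == 403:
        # Authenticated but not allowed: check scopes and account permissions.
        return ("check_permissions", 0.0)
    if status == 429:
        # Rate limited: honor a Retry-After value if the provider sent one,
        # otherwise back off exponentially, capped at 60 seconds (assumed cap).
        if retry_after is not None:
            return ("retry", float(retry_after))
        return ("retry", min(60.0, float(2 ** attempt)))
    return ("inspect_logs", 0.0)


print(next_action(401))
print(next_action(429, attempt=3))
print(next_action(429, retry_after=30))
```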
5) Duplicate documents in index
- Symptoms: Same content appears multiple times in results.
- Quick fixes:
- Enable duplicate detection in collection settings or configure a unique document ID (URL, GUID).
- Normalize URLs (remove session params) in crawler rules so identical content uses same ID.
- Apply canonical tags or metadata mapping during ingestion to collapse duplicates.
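The URL normalization step can be illustrated outside the crawler. This is a minimal standard-library sketch: the parameter names in `SESSION_PARAMS` are common examples rather than a list taken from SearchBlox, and the function assumes URLs carry no `user:password@` part.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Example session parameter names (assumed); extend for the sites you crawl.
SESSION_PARAMS = {"jsessionid", "phpsessid", "sid", "sessionid"}


def normalize_url(url):
    """Return a canonical form of url so identical pages share one ID."""
    parts = urlsplit(url)
    # Drop session parameters and sort the rest so parameter order does not matter.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    )
    # Drop path parameters such as ";jsessionid=..." (this removes any ";..." suffix).
    path = parts.path.split(";", 1)[0] or "/"
    # Lowercase the host and discard the fragment, which never reaches the server.
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, urlencode(query), ""))


print(normalize_url("https://Example.com/docs/page.html;jsessionid=ABC?b=2&sid=123&a=1#top"))
```

Two URLs that normalize to the same string can then be treated as the same document.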
6) File parsing errors (PDF, Office, emails)
- Symptoms: Files indexed without text or with garbled content.
- Quick fixes:
- Check parser logs to see which MIME types fail.
- Ensure Tika/parse libraries are up to date on the SearchBlox server.
- Confirm file encoding and test converting sample files to plain text to isolate parser issues.
- Increase JVM heap if large files cause OOM in parsers.
7) Elasticsearch or service crashes
- Symptoms: SearchBlox service stops, Elasticsearch unresponsive.
- Quick fixes:
- Inspect logs (SearchBlox and Elasticsearch) for OOM or disk-full errors.
- Free disk space on data and temp partitions — Elasticsearch will stop when disk is critically full.
- Increase heap sizes as the product documentation recommends, and follow its CPU/RAM sizing guidance.
- Restart services in order: stop SearchBlox app, restart Elasticsearch, then start SearchBlox.
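The log inspection step can be partly automated by counting known failure signatures. The sketch below assumes two message fragments that the JVM and the operating system commonly emit (`java.lang.OutOfMemoryError` and `No space left on device`); it is a starting point, and the pattern names and function name are hypothetical.

```python
import re
from collections import Counter

# Assumed signatures: a JVM out-of-memory error and the OS-level disk-full message.
FAILURE_PATTERNS = {
    "out_of_memory": re.compile(r"java\.lang\.OutOfMemoryError"),
    "disk_full": re.compile(r"No space left on device"),
}


def count_failures(lines):
    """Count how many lines match each failure signature."""
    counts = Counter()
    for line in lines:
        for name, pattern in FAILURE_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Illustrative message fragments, not real log output from any installation.
sample = [
    "java.lang.OutOfMemoryError: Java heap space",
    "java.io.IOException: No space left on device",
    "service started",
]
print(count_failures(sample))
```

An open file object can be passed directly as `lines`, so the same function works on a SearchBlox or Elasticsearch log file once you know its location.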
8) Security / permissions issues for search results
- Symptoms: Users see results they shouldn’t or see none due to permission filtering.
- Quick fixes:
- Verify secure search configuration and mapping of ACL fields in collections.
- Ensure user identity propagation from your auth system into queries (headers or token).
- Test with a known user account to confirm expected result visibility.
Diagnostic checklist (fast)
- Check logs first (crawler, connector, parser, Elasticsearch).
- Confirm network reachability (ping, curl seeds/APIs).
- Re-run indexing for affected collection.
- Validate credentials and OAuth tokens.
- Monitor CPU/RAM/disk and JVM heap usage.
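The disk part of this checklist is easy to script. The following sketch uses `shutil.disk_usage` from the Python standard library; the 85% and 95% thresholds are assumptions modeled on Elasticsearch's default low and flood-stage disk watermarks, so adjust them to your cluster settings.

```python
import shutil


def disk_used_percent(path):
    """Return the percentage of disk space used on the volume holding path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def disk_status(used_percent, warn=85.0, critical=95.0):
    """Classify disk usage against warning and critical thresholds (assumed values)."""
    if used_percent >= critical:
        return "critical"
    if used_percent >= warn:
        return "warn"
    return "ok"


# "." is a placeholder; point this at the data and temp partitions instead.
percent = disk_used_percent(".")
print(f"{percent:.1f}% used: {disk_status(percent)}")
```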
When to escalate
- Repeated OOM, disk-full, or corrupted indices — stop indexing and contact SearchBlox support with logs and system metrics.
- Connector provider-side 5xx or quota restrictions — check provider status pages and open provider support tickets if needed.