ICC-Crawler
ICC web crawler for language research
About this crawler
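ICC-Crawler is a web crawler identified by the regular-expression pattern ICC-Crawler in the User-Agent request header. It is categorised as academic. Use the pattern listed under Technical details to detect, log, allow, or block ICC-Crawler traffic in your web server or CDN edge rules. robots.txt does not support regular expressions; there, name the crawler by its product token in a User-agent line instead.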
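As a rough illustration of the detection step, here is a minimal Python sketch. It assumes the User-Agent value has already been read from the request, and the names ICC_CRAWLER_RE and is_icc_crawler are hypothetical. It matches case-insensitively, mirroring the [NC] and ~* flags used in the Apache and Nginx rules on this page.

```python
import re
from typing import Optional

# Hypothetical name. Case-insensitive, like the Apache [NC] and Nginx ~* rules.
ICC_CRAWLER_RE = re.compile(r"ICC-Crawler", re.IGNORECASE)


def is_icc_crawler(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent header contains the ICC-Crawler token."""
    # A missing or empty header is treated as "not ICC-Crawler".
    if not user_agent:
        return False
    return ICC_CRAWLER_RE.search(user_agent) is not None


if __name__ == "__main__":
    sample = ("ICC-Crawler/2.0 (Mozilla-compatible; ; "
              "http://ucri.nict.go.jp/en/icccrawler.html)")
    print(is_icc_crawler(sample))                             # True
    print(is_icc_crawler("Mozilla/5.0 (X11; Linux x86_64)"))  # False
```

The same predicate can feed a log filter, an allow list, or a block rule, so the pattern lives in one place.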
Block rate (top 25k sites): 2.0%
Technical details
- Name
- ICC-Crawler
- Pattern
- ICC-Crawler
- Tags
- academic
- Reference
- https://ucri.nict.go.jp/en/icccrawler.html
- Added
- 2018/02/28
- Instances
- 1 known sample
Sample User-Agent strings
ICC-Crawler/2.0 (Mozilla-compatible; ; http://ucri.nict.go.jp/en/icccrawler.html)
Block this crawler
robots.txt — disallow ICC-Crawler:
User-agent: ICC-Crawler
Disallow: /
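To sanity-check these rules, Python's standard urllib.robotparser can evaluate them locally. This is only a sketch: example.com is a placeholder host, and the parser applies Python's own user-agent matching, which is not necessarily the logic ICC-Crawler uses. robots.txt is also advisory, so it only keeps out crawlers that choose to honour it.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from this page.
ROBOTS_TXT = """\
User-agent: ICC-Crawler
Disallow: /
"""

parser = RobotFileParser()
# parse() accepts an iterable of lines, so no network fetch is needed.
parser.parse(ROBOTS_TXT.splitlines())

icc_ua = ("ICC-Crawler/2.0 (Mozilla-compatible; ; "
          "http://ucri.nict.go.jp/en/icccrawler.html)")

# example.com is a placeholder; only the URL path matters here.
print(parser.can_fetch(icc_ua, "https://example.com/any/page"))              # False
print(parser.can_fetch("SomeOtherBot/1.0", "https://example.com/any/page"))  # True
```

Because the file has no "User-agent: *" group, crawlers other than ICC-Crawler are unaffected.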
Apache .htaccess — return 403:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ICC-Crawler [NC]
RewriteRule .* - [F,L]
Nginx — return 403 inside a server block:
if ($http_user_agent ~* "ICC-Crawler") {
    return 403;
}
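If editing the server configuration is not an option, roughly the same 403 response can be produced inside the application. The sketch below assumes a Python WSGI application; the middleware class BlockICCCrawler is hypothetical and not part of any library.

```python
import re
from typing import Callable, Iterable

# Case-insensitive, like the Apache [NC] and Nginx ~* rules above.
ICC_CRAWLER_RE = re.compile(r"ICC-Crawler", re.IGNORECASE)


class BlockICCCrawler:
    """Hypothetical WSGI middleware: 403 for ICC-Crawler, pass-through otherwise."""

    def __init__(self, app: Callable) -> None:
        self.app = app

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        # WSGI exposes the User-Agent request header as HTTP_USER_AGENT.
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if ICC_CRAWLER_RE.search(user_agent):
            body = b"Forbidden"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Any other client is handed to the wrapped application unchanged.
        return self.app(environ, start_response)
```

For example, wrapping the standard library's wsgiref.simple_server.demo_app as BlockICCCrawler(demo_app) answers ICC-Crawler requests with 403 and serves everyone else normally. Rejecting at the web server is still the lighter option, since the request never reaches the application.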