Nginx-ru mailing list archive (nginx-ru@sysoev.ru)


Re: How to fend off 80legs.com?



On Tue, 29 Oct 2013 10:40:10 -0400
"Gaidamak" <nginx-forum@xxxxxxxx> wrote:

> This pest has started hitting us regularly.
> 
> http://www.80legs.com/webcrawler.html
> 
> What is the proper way to get rid of it?
> 

Ban it by user agent, or do what they themselves recommend on their site:

If you'd like us to stop crawling your website, the best thing to do is to
block our web crawler using the robots.txt specification. To do this, add the
following to your robots.txt:

   User-agent: 008
   Disallow: /

If you block 008 using robots.txt, you will see crawl requests die down
gradually, rather than immediately. This happens because of our distributed
architecture. Our computers only periodically receive robots.txt information
for domains they are crawling.
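
For the user-agent option, a minimal nginx sketch might look like the following. It assumes the crawler keeps sending a User-Agent that contains "80legs" (as in the log line quoted below). The `server_name` is a placeholder taken from that log excerpt, and 403 is just one reasonable status to return.

```nginx
server {
    listen 80;
    server_name site.domain.com;  # placeholder host, taken from the log excerpt

    # Case-insensitive match against the User-Agent header.
    # Assumption: the crawler's UA string still contains "80legs".
    if ($http_user_agent ~* "80legs") {
        return 403;  # refuse the request; 444 would close the connection with no response
    }

    # ... the rest of the existing configuration stays as it is ...
}
```

Unlike the robots.txt rule, which by their own description takes effect only gradually, this starts refusing matching requests as soon as nginx reloads its configuration (for example with `nginx -s reload`).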


> There is a lot of this in the logs:
> 
> 109.166.134.39 - - [29/Oct/2013:18:34:09 +0400] site.domain.com "GET
> /page/url/  HTTP/1.1" 502 107 "-" "Mozilla/5.0 (compatible; 008/0.85;
> http://www.80legs.com/webcrawler.html) Gecko/2008032620" 0.000
> 
> Posted at Nginx Forum: 
> http://forum.nginx.org/read.php?21,244236,244236#msg-244236
> 
> _______________________________________________
> nginx-ru mailing list
> nginx-ru@xxxxxxxxx
> http://mailman.nginx.org/mailman/listinfo/nginx-ru

-- 
Peter B. Pokryshev <ppb@xxxxxxxxxxxx>

Copyright © Lexa Software, 1996-2009.