Spider Crawling and Web Spider Traps

1 Besides the familiar 301 redirect, search engine spiders are wary of other kinds of redirects, such as 302 temporary redirects, JavaScript redirects, Flash redirects and meta refresh jumps. It is recommended that you do not use any redirect other than a 301, and that you use even a 301 only when you have to.
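To make the difference between these redirects concrete, here is a minimal sketch in Python. The function name `classify_redirect` and its labels are my own, made up for illustration. It only looks at the HTTP status code and at a meta refresh tag in the page body; it does not try to detect JavaScript or Flash redirects.

```python
import re

# Matches <meta http-equiv="refresh" ...> with or without quotes around the value.
META_REFRESH = re.compile(r'<meta[^>]+http-equiv\s*=\s*["\']?refresh', re.IGNORECASE)


def classify_redirect(status, body=""):
    """Label how a response sends the visitor somewhere else (hypothetical helper)."""
    if status == 301:
        return "301 permanent"
    if status in (302, 307):
        return "temporary"
    if META_REFRESH.search(body):
        return "meta refresh"
    return "none"
```

A page that answers 200 but carries a meta refresh tag in its body comes back as "meta refresh", which is one of the redirects the point above suggests avoiding.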

2 Once a page has been found, the spider must be able to crawl its content.

frames

Let the spider follow simple HTML links to reach your pages. JavaScript links and Flash links are spider traps, so take care with them.
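As a rough illustration of why plain HTML links matter, the sketch below collects links the way a very simple crawler might, using Python's standard `html.parser`. The class name `PlainLinkCollector`, the function `plain_links` and the sample markup are all made up for this example; real search engine spiders are more capable than this.

```python
from html.parser import HTMLParser


class PlainLinkCollector(HTMLParser):
    """Collects href values from ordinary <a> tags only."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # A javascript: pseudo-link gives the crawler no URL to follow.
        if href and not href.lower().startswith("javascript:"):
            self.links.append(href)


def plain_links(html):
    collector = PlainLinkCollector()
    collector.feed(html)
    return collector.links


SAMPLE = """
<a href="/products.html">Products</a>
<a href="javascript:openPage('about')">About</a>
<span onclick="location.href='/contact.html'">Contact</span>
"""
```

Calling `plain_links(SAMPLE)` finds only `/products.html`. The two JavaScript-driven links are invisible to this kind of parser.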

Once a page has been found, the spider should be able to crawl it. Database-generated dynamic URLs with many parameters, session IDs, pages built entirely in Flash, frame structures, large numbers of redirects and large amounts of duplicate content can all turn spiders away at the door. Watch out for these.
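One of these problems, URLs with a lot of parameters, is easy to check for yourself. The sketch below counts the query parameters in a URL. The limit of 3 is an assumption I picked for illustration, not a rule published by any search engine, and `has_too_many_params` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, parse_qsl

MAX_PARAMS = 3  # assumed threshold for illustration only


def has_too_many_params(url, limit=MAX_PARAMS):
    """Return True when the URL carries more query parameters than the limit."""
    query = urlsplit(url).query
    return len(parse_qsl(query, keep_blank_values=True)) > limit
```

A URL such as `http://example.com/list?cat=2&sort=price&page=3&view=grid&ref=home` has five parameters and would be flagged, while `http://example.com/item?id=7` would not.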

1 If you do not know what frames are, you can skip this part, because you have already avoided this spider trap.

2, Flash

4, redirects

1 Using Flash in part of a web page to enhance the visual effect is perfectly normal; many ads and icons are Flash today. This is only one part of an HTML page, so it will not have much impact.

5,

2 Pages were designed with frames in the early days, but websites rarely use frames now, so there is not much to say. Whether or not you use them, keep this trap in mind.

3, sessionID

Hello, this is the first time I have published an article here. If I have got anything wrong, I hope the experts will point it out.

1 Using a session ID (sessionID) to track user visits means that each visit generates a separate ID, which is then added to the URL. Every time the spider visits, the site treats it as a new user and gives it a different URL for the same page, so the spider cannot crawl the site normally. This is a spider trap.
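To see the problem from the spider's side, the sketch below takes two URLs for the same page that differ only in their session ID and strips the ID out. The parameter names in `SESSION_KEYS` are common ones that I am assuming for illustration, and `strip_session_id` is a hypothetical helper; your site may use a different name.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed parameter names; adjust to whatever your site actually uses.
SESSION_KEYS = {"sessionid", "sid", "phpsessid", "jsessionid"}


def strip_session_id(url):
    """Return the URL with any session ID query parameter removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_KEYS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


first = "http://example.com/page?id=5&sessionID=abc123"
second = "http://example.com/page?id=5&sessionID=xyz789"
```

To a crawler, `first` and `second` look like two different pages. After `strip_session_id` both become `http://example.com/page?id=5`, which is the single URL you want the spider to see.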

1 For a search engine to find a site's home page, there must be good external links pointing to it. Once the home page has been found, the spider will crawl along its links.

2 To track user visits, it is usually recommended to use cookies instead of a sessionID.
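As a small sketch of the cookie approach, the function below builds a `Set-Cookie` header with Python's standard `http.cookies` module. The cookie name `sessionid` and the helper name `session_cookie_header` are my own choices for illustration; how you send the header depends on your server or framework.

```python
from http.cookies import SimpleCookie


def session_cookie_header(session_id):
    """Build a Set-Cookie header line that carries the session ID (hypothetical helper)."""
    cookie = SimpleCookie()
    cookie["sessionid"] = session_id
    cookie["sessionid"]["path"] = "/"         # send the cookie for every page on the site
    cookie["sessionid"]["httponly"] = True    # keep it away from page scripts
    return cookie.output(header="Set-Cookie:")
```

The ID now travels in a header rather than in the address, so every visitor, spider or not, sees the same URL for the same page.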

Some sites use a sessionID.

2 Some websites, however, are one big Flash file, and that is a spider trap. The spider has only a single Flash link to follow and no other content, so avoid building a site this way.

1, the search engine must be able to find the web pages.
