MichaelWon
New Participant
January 9, 2017
Investigating

Segmentation: Bot Traffic Identification & Exclusion Tool

  • January 9, 2017
  • 12 replies
  • 26851 views

I need help identifying bot traffic because we get a ton of it: somewhere between 30-40% of our page views and 5-15% of visits. I am not referring to known bots (e.g., the Google spider) or malicious bots attempting to take down our site or defraud us. I am referring to third-party scrapers coming to us for information. This type of traffic is not viewed negatively because 1) it is not harmful to our site experience, 2) everyone does it, and 3) it is difficult to police.

Because we get so much bot traffic, we spend a chunk of time determining whether swings in our KPIs are real or due to non-human traffic. This slows us down considerably. The bots coming to our site use standard devices, user agent strings, and operating systems, and they also change their IP addresses frequently. I am able to qualitatively identify this traffic because of the following:

1. This traffic is typed/bookmarked.

2. This traffic never has any of our campaign parameters.

3. This traffic lands on pages that would not normally be a direct landing page (i.e. a specific product page)

4. This traffic is from the 'Other' device type.

5. Page Views = 1 per visit.

6. Visits = Visitors, and visit counts are very high (e.g., > 1k) when looking at captured IP addresses.

So, whoever is crawling our site is deleting their cookies on the same IP address, with each visit registering a single page view. See attached for a screenshot.

It would be great to somehow aggregate visits from different visitors (cookies) where certain behaviors are taking place. For example:

Exclude all 'Visitors' if

1. 'Any value' for a given variable (evar/prop) shows up more than X times.

AND

2. PVs per Visit for each visit <= 1

AND

3. Traffic Source for all visits is typed/bookmarked.
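The three rules above can be sketched in code. This is a minimal illustration against a hypothetical hit-level export, not an Adobe Analytics feature; the column names (`visitor_id`, `visit_id`, `ip`, `traffic_source`, `page_views`) and the `max_per_ip` threshold are assumptions for the example:

```python
# Sketch of the exclusion rules, assuming a hit-level export with
# hypothetical columns. Not an Adobe schema; names are illustrative.
import pandas as pd

def flag_bot_visitors(visits: pd.DataFrame, max_per_ip: int = 1000) -> set:
    """Return visitor_ids matching all three exclusion rules."""
    # Rule 1: 'any value' of the chosen variable (here, IP address)
    # shows up in more than X distinct visits.
    ip_counts = visits.groupby("ip")["visit_id"].nunique()
    hot_ips = set(ip_counts[ip_counts > max_per_ip].index)

    # Rules 2 & 3, rolled up per visitor: every one of the visitor's
    # visits has <= 1 page view and a typed/bookmarked traffic source.
    per_visitor = visits.groupby("visitor_id").agg(
        all_single_pv=("page_views", lambda s: (s <= 1).all()),
        all_typed=("traffic_source", lambda s: (s == "typed/bookmarked").all()),
        ips=("ip", set),
    )
    mask = (
        per_visitor["all_single_pv"]
        & per_visitor["all_typed"]
        & per_visitor["ips"].apply(lambda ips: bool(ips & hot_ips))
    )
    return set(per_visitor[mask].index)
```

In SQL this would be a GROUP BY on visitor with HAVING clauses over the same three conditions; the point is that the logic is an AND across per-visit and per-IP aggregates.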

We can solve for this in SQL, but I'm not sure it's doable in Adobe. Any thoughts?

12 replies

Employee
January 9, 2017

 Hi Michael,

 

We're actively investigating bot identification and filtering, so someone from my team may reach out to you for a deeper conversation. Our goal is to automatically identify non-human traffic and filter it out. We'd also like to report on it so you're aware of what content malicious bots are consuming. Would you prefer that the data be excluded entirely, or do you want the full reporting abilities of Analytics to be applied to bot behavior? What kinds of reports would you want on bots? 

 

In the meantime, have you considered classifying IP addresses as bot-or-not? Combined with Virtual Report Suites, you can create a virtual report suite that excludes bot visitors via a segment, and that segment can be updated over time. The easiest way to keep it current is to classify the relevant IP addresses as bots and use the segment to exclude visitors where bot_check (a classification of IP address) equals "bot".

Because bot IP addresses can change over time, a more complete solution is to combine IP+day as a Customer ID in the Visitor ID service, and use Customer Attributes to exclude visitors with a Customer ID of IP+day. This would require regular updates via FTP, but it would give you the flexibility to exclude data from a virtual report suite at any time.
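The IP+day keys for that daily upload could be generated with something like the following sketch. The key format and the (customer_id, bot_check) row layout are assumptions for illustration; the actual upload format is defined by your classification or Customer Attributes configuration:

```python
# Sketch of building a daily IP+day exclusion list, as described above.
# Key format and row layout are assumed, not an Adobe specification.
from datetime import date

def ip_day_key(ip: str, day: date) -> str:
    """Combine an IP and a date into one key, e.g. '203.0.113.9_2017-01-09'."""
    return f"{ip}_{day.isoformat()}"

def build_exclusion_rows(bot_ips, day: date) -> list:
    """Rows for a daily FTP upload: (customer_id, bot_check) pairs."""
    return [(ip_day_key(ip, day), "bot") for ip in sorted(bot_ips)]
```

Regenerating and uploading this file each day keeps the exclusion segment aligned with IPs that rotate, since yesterday's bot IP no longer matches today's key.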