This blog has been receiving suspicious access since this February*1. It adds an extra 1,000 to 2,000 accesses per day, whereas this blog usually gets 200 to 300 visits per day. Major analytics services such as Google Analytics automatically remove this kind of bot-like traffic, but some other services still report these suspicious accesses, as shown below.
If the objective were scraping, there would be no need for so many accesses: once all pages have been downloaded, checking for daily updates is enough, and that would not take more than 1,000 accesses per day.
Even if the suspicious access causes no actual damage, filtering it out of the reports produced by analytics services becomes a necessary daily task. Moreover, it is psychologically draining.
Usually, blocking such access is the job of the infrastructure, but users of a blog service can't touch that infrastructure. While considering possible responses on the client side, the idea of access control by time zone came up.
- Idea: access control by time zone, not IP address
- Code commentary
- Code to redirect access from certain countries
- Off topic
- Reference
Idea: access control by time zone, not IP address
This suspicious access has the following characteristics.
Attribute | Observed values |
---|---|
Referrer | None, Direct |
Source | Singapore, China |
OS | Unknown, Android 5 |
Browser | Unknown |
Technologies such as Tor can change the apparent source of each access, but this suspicious access doesn't seem to use them. One service identified the source as Singapore, and another identified it as China. It makes sense to restrict access from these locations first.
Web services with a geolocation API, such as those below, can identify the country from the source IP address. Some of them are paid, and others are free but cap the number of API calls, which makes them somewhat difficult to adopt.
User Agent information, such as OS and browser names, can be easily spoofed, making it unsuitable for this purpose.
The idea of identifying a source country by client-side time zone and restricting access accordingly is not new. There are already similar practices being used by others, as shown below.
Code commentary
The function "getCountry" in the code introduced above builds two dictionaries, as follows.
Dictionary | Record sample |
---|---|
countries | JP: "Japan", |
timezones | Japan: {a: "Asia/Tokyo", r: 1 }, |
Typically, the following code returns a string such as "Asia/Tokyo".
Intl.DateTimeFormat().resolvedOptions().timeZone
The source country can be estimated by cross-referencing the client-side time zone with these two dictionaries. The time zone set on a PC or smartphone doesn't always match the user's actual location, but this post treats such cases as exceptions.
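As a minimal sketch of this cross-referencing, the snippet below resolves a time zone name to a country name. It is not the original "getCountry": the name `estimateCountry`, the tiny tables `COUNTRIES` and `TIMEZONES`, the `c` field holding country codes, and the reading of `a` as "alias of" are all assumptions made for illustration.

```javascript
// Hypothetical, heavily trimmed tables; a real lookup needs every time zone.
const COUNTRIES = { JP: "Japan", SG: "Singapore", CN: "China" };
const TIMEZONES = {
  "Asia/Tokyo": { c: ["JP"] },       // assumed shape: c = country codes
  "Asia/Singapore": { c: ["SG"] },
  "Asia/Shanghai": { c: ["CN"] },
  Japan: { a: "Asia/Tokyo", r: 1 },  // assumed meaning: a = alias target
};

// Own-property lookup so inherited keys such as "constructor" never match.
function lookup(table, key) {
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : undefined;
}

// Returns the estimated country name, or null when the zone is unknown.
function estimateCountry(timeZone) {
  let entry = lookup(TIMEZONES, timeZone);
  if (entry && entry.a) {
    entry = lookup(TIMEZONES, entry.a); // follow the alias once
  }
  if (!entry || !Array.isArray(entry.c) || entry.c.length === 0) {
    return null;
  }
  const name = lookup(COUNTRIES, entry.c[0]);
  return name === undefined ? null : name;
}

// The client-side time zone reported by the runtime (browser or Node.js).
const clientZone = Intl.DateTimeFormat().resolvedOptions().timeZone;
console.log(clientZone, "->", estimateCountry(clientZone));
```

Returning null for an unknown zone keeps the caller in control of what happens to visitors that can't be classified.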
Code to redirect access from certain countries
Let's consider code that redirects to a predetermined destination, depending on the country estimated from the client-side time zone.
For example, if the access is from Japan (time zone: Japan), the code redirects to NHK News in Japan, but for access from other countries, it redirects to NHK WORLD.
const REDIRECT_DESTINATION_JP = "https://www3.nhk.or.jp/news/";
const REDIRECT_DESTINATION_OTHER = "https://www3.nhk.or.jp/nhkworld/";
const TARGET_COUNTRIES = ["Japan"];
let userCountry = getCountry();
if (TARGET_COUNTRIES.includes(userCountry)) {
  window.location.replace(REDIRECT_DESTINATION_JP);
} else {
  window.location.replace(REDIRECT_DESTINATION_OTHER);
}
This code can be adapted to redirect bot-like access by turning "TARGET_COUNTRIES" into a blacklist, named "PROHIBITED_COUNTRIES" in the code.
const REDIRECT_DESTINATION = "https://chinadigitaltimes.net/space/CDS%E4%B8%93%E9%A1%B5%EF%BC%9A%E6%95%8F%E6%84%9F%E8%AF%8D%E5%BA%93";
const PROHIBITED_COUNTRIES = ["China", "Singapore"];
let userCountry = getCountry();
if (PROHIBITED_COUNTRIES.includes(userCountry)) {
  // Estimated to come from a blacklisted country: redirect away.
  window.location.replace(REDIRECT_DESTINATION);
} else {
  // Other visitors stay on the page.
  //window.location.replace(REDIRECT_DESTINATION_OTHER);
}
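One possible refinement, shown only as a sketch, is to separate the decision from the redirect itself so the decision can be checked outside a browser. The helper name `pickRedirect` and the destination `https://example.com/blocked` are hypothetical, and the snippet assumes that "getCountry" may fail to return a string for an unknown time zone.

```javascript
// Hypothetical values for illustration; swap in your own list and destination.
const PROHIBITED_COUNTRIES = ["China", "Singapore"];
const REDIRECT_DESTINATION = "https://example.com/blocked";

// Pure decision: returns the URL to redirect to, or null to stay on the page.
function pickRedirect(country, prohibited, destination) {
  if (typeof country !== "string") {
    return null; // country could not be estimated: let the visitor through
  }
  return prohibited.includes(country) ? destination : null;
}

// Side effect: only attempted in a browser where getCountry() has been loaded.
if (typeof window !== "undefined" && typeof getCountry === "function") {
  const target = pickRedirect(getCountry(), PROHIBITED_COUNTRIES, REDIRECT_DESTINATION);
  if (target !== null) {
    window.location.replace(target);
  }
}
```

Letting unclassified visitors through is a design choice here. A stricter policy could redirect them as well, at the cost of blocking more legitimate readers.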
Off topic
script tag
When a script is loaded as a linked file via a script tag rather than embedded in the HTML file, pay attention to which attribute is specified.
async | The JS file is downloaded in parallel with HTML parsing, and JavaScript is executed as soon as the download finishes, even if the HTML has not been fully parsed. |
defer | The JS file is downloaded in parallel with HTML parsing, and JavaScript is executed after the HTML file has been parsed. |
redirect
There are two ways to redirect. "window.location.replace" is the better choice when users pass through a junction page before reaching the final destination, because the junction page is not left in the browser history and the Back button does not send users back into the redirect.
window.location.replace("http://www.w3schools.com");
- The current URL is replaced with the redirect destination.
- The current URL is not recorded in history.
- The Back button cannot return to the page that redirected.
window.location.href = "http://www.w3schools.com";
- Users jump to the redirect destination as if they had clicked a link.
- The current URL is recorded in history.
- The Back button returns to the page that redirected.
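The difference between the two can be illustrated with a toy model of the history stack. This is not the browser API: `ToyHistory` and the `.example` URLs are hypothetical, and the model only mimics the documented behavior that assigning to "href" adds a history entry while "replace" overwrites the current one.

```javascript
// Toy model of a tab's session history (illustration only, not a browser API).
class ToyHistory {
  constructor(urls) {
    this.entries = [...urls];
    this.index = this.entries.length - 1; // the last URL is the current page
  }
  get current() {
    return this.entries[this.index];
  }
  // Mimics window.location.href = url: a new entry is appended.
  assign(url) {
    this.entries = this.entries.slice(0, this.index + 1);
    this.entries.push(url);
    this.index += 1;
  }
  // Mimics window.location.replace(url): the current entry is overwritten.
  replace(url) {
    this.entries[this.index] = url;
  }
  // Mimics the Back button.
  back() {
    if (this.index > 0) this.index -= 1;
    return this.current;
  }
}

const visited = ["https://search.example/", "https://blog.example/junction"];

const viaHref = new ToyHistory(visited);
viaHref.assign("https://destination.example/");
const backAfterHref = viaHref.back(); // lands on the junction page again

const viaReplace = new ToyHistory(visited);
viaReplace.replace("https://destination.example/");
const backAfterReplace = viaReplace.back(); // skips the junction page

console.log({ backAfterHref, backAfterReplace });
```

In the "href" case the Back button lands on the junction page, which would redirect again, while in the "replace" case it returns to the page visited before the junction page.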