Thursday, November 24, 2016

Project At Crossroads



My MB project is now at a crossroads. It has been 2 years since I first launched out to research and experiment on ideas and perhaps a possible startup. I initially started out trying to build a cloud-based storage application but later encountered the web traffic spam problem. I pivoted to work on solving this problem and added machine learning to its toolbox. Over time, I realized that the data was showing me something exciting about the correlation between spam and hacking. I did another pivot and applied the technology to web security. MB has come a long way in its 2-year development journey. I have verified many interesting hypotheses about identifying bot and hacker behavior, as well as suspicious web traffic patterns. I must admit that the statistical analysis of real-world web traffic data has many powerful applications. Using the analysis techniques that I designed and implemented, I was able to identify both bot and malicious web traffic with surprising accuracy and a very good degree of recall. I am really proud of the technology that I have built in this project. I am thankful to A*Star/I2R for allowing employees like me to research and explore new ideas and passions during our spare time.

However, the project is now at a crossroads. I will have to halt public access to project MB for an indefinite period until I can find support to justify the continuation of the project and service. For those of you who have graciously participated in the experiments with MB, you have until 29th November 2016 to remove your redirection or use of our service.

Until the time when I can resurrect the MB service, I want to thank those who have supported the project during this period of experiments and exploration. I personally feel that Singapore should continue to seed these types of exploratory work to breed new innovation, and perhaps out of it will come the next tech giant...

Thursday, November 3, 2016

How I Identify Fake Web Traffic on My Site

Since I last posted about How I Identify Hackers on My Site, my MB project site has experienced a gradual but significant increase in web traffic. However, I am not celebrating, because the majority of it is fake web traffic. I know that the people creating this fake web traffic are also reading my blog, probably through Feedly or a similar feed reader. Nonetheless, I still want you to know that I am aware of what you are doing. The great thing about a technology like MB is that I have also designed it to identify fake web traffic as part of its machine learning capability. To understand what I am talking about, you need to follow my thoughts and take a look at the data.

On the left is a sample list of the fake web traffic that I have been getting. All of it comes from Internet Service Providers (ISPs) all over the world, stretching from Asia to the Middle East and North/South America. These are ISPs providing Internet connection services for home users and mobile data or WiFi for the public. Interesting data indeed; on that note, I do think that these fake web traffic sources actually belong to a single botnet.

At first glance, the data does not look suspicious, and MB does not block it because the web traffic is new and MB is still gathering and analyzing the patterns. However, internally MB's detailed analysis has already marked it as probably fake. If you (the hacker) are reading this post, yes, it wasn't difficult at all for MB to figure this out. MB's interface and reporting tool may be simple, but make no mistake that it is a very intelligent piece of technology.

Before you dismiss this as pure guesswork, you need to take a look at the data chart that MB uses as part of its machine learning capability. I have taken this report directly out of the service.


First, notice the sudden and significant drop in attack traffic between day 6 and day 5. Second, this drop is approximately 2-3 days before the sudden rise in fake web traffic between day 3 and day 2. MB takes into consideration the historical data that was used to model the site that it is trying to protect. When anomalies like those shown in the chart occur, MB proactively notifies me that it has noticed some strange events happening on the site. Again, this analysis approach follows a pattern that I described in the storm chasing article: first you see the storm forming, then touchdown occurs and the reconnaissance or attack takes place. More importantly, MB is aware of what is happening underneath the wires and data packets flowing in and out of your site, server or IoT device. You don't need to be a web security expert or have a PhD to prevent attacks from happening. You just need to adopt a predictive technology like MB to protect your Internet property. ;)
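As a rough illustration of the kind of check involved (my own sketch, not MB's actual model): compare each day's count against a short trailing window and flag large z-score deviations. The counts, window size and threshold below are all made up.

```python
from statistics import mean, stdev

def flag_anomalies(daily_counts, window=4, z_threshold=3.0):
    """Flag days whose count deviates sharply from a trailing baseline.

    daily_counts: request counts, oldest first.
    Returns the indices of the anomalous days."""
    flagged = []
    for i in range(window, len(daily_counts)):
        base = daily_counts[i - window:i]
        mu, sigma = mean(base), stdev(base)
        sigma = sigma or 1.0          # guard against a flat baseline
        if abs(daily_counts[i] - mu) / sigma > z_threshold:
            flagged.append(i)
    return flagged

# A stable stretch, then a sharp drop (index 4) and a spike 2 days later.
counts = [100, 104, 98, 102, 40, 45, 300]
print(flag_anomalies(counts))  # [4, 6]
```

Both the drop and the later spike stand out against the trailing baseline, which is the shape of the storm-then-touchdown pattern described above.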

MB will blow your mind when you try it out for yourself!

Friday, October 28, 2016

How I Identify Hackers on My Site

The question I get asked most often is how I identify hackers on my website. I get this question whenever I make a presentation or share with people about my web security project, MB. I have decided to share a particular use case that I recently encountered on my project website to showcase how a preemptive technology like MB approaches web security. It is no longer a case of trying to fix a compromise, applying forensic tools to find the perpetrator, recovering the stolen data and avoiding ransom or negative publicity. Rather, technology is becoming proactive, looking for the attackers before they even make an attempt. This is my story of such an approach.

This morning at about 8am, while I was commuting to work, I received an email notification from my machine learning service MB about some anomalies currently happening on my website. I occasionally get automated notifications from MB whenever it detects patterns that are unusual, or web traffic visitations outside of the 80th percentile range that it considers normal based on my existing site profile. MB uses the site profile model that it has built up over a given period of time, with the aid of human feedback, to figure out what is an anomaly and what is normal. The nature and web traffic dynamics of every site are different, so I designed MB with the capability to handle each site dynamically using a unique model in its learning computation. On the right is a sample of the notification that I got on my way to the office. Notifications like this one are often a cue for me to check the analysis report generated by MB. I use the analysis report to look for hackers or bots attempting to bypass or do reconnaissance on my web data or site.
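A minimal sketch of the percentile idea described above. The 80th percentile is the only number from the post; the profile data, function names and nearest-rank method are my own illustrative assumptions, not MB's implementation.

```python
def percentile(values, p):
    """Nearest-rank percentile of a list of numbers (0 < p <= 100)."""
    ordered = sorted(values)
    k = max(0, int(round(p / 100 * len(ordered))) - 1)
    return ordered[k]

def is_anomalous(hourly_count, site_profile, p=80):
    """True when a new hourly visit count falls above the band the
    site profile considers normal (here: the p-th percentile)."""
    return hourly_count > percentile(site_profile, p)

profile = [12, 15, 9, 14, 11, 13, 10, 16, 12, 14]  # typical hourly counts
print(is_anomalous(90, profile))   # True: a sudden burst
print(is_anomalous(12, profile))   # False: an ordinary hour
```

A real profile would be per-site and continuously updated, which is the point of the unique-model-per-site design mentioned above.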

When I finally get the time to review the analysis report, I quickly scan through hundreds of web traffic records using the visual cues that MB has provided for me. I will typically only look at data that has been classified as suspicious. With the help of a tool like MB, I focus especially on the period around the time when the alert was generated. On the left is a sample list of the buggers that MB identified as having caused the alert to be flagged and reported.


Now, at first glance, the data looks rather normal because the web traffic largely came from home users. However, MB provides simple ways for me to slice and dice the data to understand the patterns that it considered before alerting me. In the 4 cases on the left, I review them for an indicative touchdown. The aggregate set of web traffic data surrounding the alert forms the digital storm patterns brewing towards a touchdown. In this case, the Chinese visitor is the start of the storm formation, 5 hours before the actual touchdown occurred. Touchdown occurs when subsequent storm patterns point towards attempts that clearly translate to suspicious behavior with respect to data and visitor interaction or contact. In the example shown, each of the later 3 visitors consistently made roughly 3-4 attempts, with behavioral patterns that suggest the parties from each of these countries are actually the same person. What is even more interesting is that the data shows deliberate, sequential attempts by the perpetrator to conceal himself or herself after realizing that he or she had been discovered. You can also read about my other hacker discoveries here.
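The "same person behind several IPs" observation can be illustrated with a toy behavioral grouping. Everything here (the field names, the choice of fingerprint features) is hypothetical; the post does not describe MB's real features.

```python
from collections import defaultdict

def fingerprint(visit):
    """A crude behavioral fingerprint: user agent, screen size and the
    ordered sequence of paths requested. All field names are illustrative."""
    return (visit["user_agent"], visit["screen"], tuple(visit["paths"]))

def group_by_behavior(visits):
    """Group visits that share a fingerprint even when their IPs differ."""
    groups = defaultdict(list)
    for v in visits:
        groups[fingerprint(v)].append(v["ip"])
    return {fp: ips for fp, ips in groups.items() if len(ips) > 1}

visits = [
    {"ip": "1.2.3.4", "user_agent": "UA-X", "screen": "1366x768",
     "paths": ["/login", "/admin"]},
    {"ip": "5.6.7.8", "user_agent": "UA-X", "screen": "1366x768",
     "paths": ["/login", "/admin"]},
    {"ip": "9.9.9.9", "user_agent": "UA-Y", "screen": "800x600",
     "paths": ["/"]},
]
for fp, ips in group_by_behavior(visits).items():
    print(ips)  # the two IPs behaving identically
```

Two visitors from different countries with identical fingerprints are a reasonable hint, though not proof, of a single actor behind both.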

Crazy as all this may sound, this is the future of preemptive web security. Early detection could give law enforcement a head start to trace and identify hackers before the attack, and also allow sufficient warning to prepare for the intruder before he or she attempts to compromise the target. In my work in this area, I have also created what I call the web traffic quality meter. These 2 charts are taken from my MB service reports on my blog site and project site.


Not all sites are created equal when it comes to spammers, hackers and thieves. Some sites are more polluted than others. As the data shows, my blog site has an estimated 8% human visitors while my project site has an estimated 17%. This measurement is important because my hypothesis is that the level of spam traffic on a website can be effectively used to predict security issues before they actually happen! I am not sure if my work in this area will ever be noticed, but I am happy and contented for now to be able to experiment and discover new ways of keeping us safe on the Internet. Cheers!
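Once each visit has been labeled (by a classifier like MB's), the quality meter itself boils down to a simple share calculation. A sketch, with invented labels:

```python
def human_share(visits):
    """Percentage of visits labeled human; the `label` field is assumed
    to come from an upstream bot/human classifier."""
    if not visits:
        return 0.0
    humans = sum(1 for v in visits if v["label"] == "human")
    return round(100 * humans / len(visits), 1)

# 8 human visits out of 100, mirroring the blog-site figure above.
visits = [{"label": "human"}] * 8 + [{"label": "bot"}] * 92
print(human_share(visits))  # 8.0
```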

Wednesday, October 26, 2016

Can We Stop a DDoS?


After the massive DDoS attack on Dyn last Friday, everyone seems to be talking about IoT and DDoS vulnerabilities. These vulnerabilities are not new; in fact, they have been widely predicted. I listed some characteristics of a vulnerable device in my previous post about smart TVs, and they are worth repeating here:
  • Devices are always connected to the Internet.
  • Devices are not actively monitored by their owners.
  • Legacy software is often found embedded in these devices.
  • Software in these devices is rarely, if ever, updated on a regular basis.
  • End users of these devices are mostly not competent in IT security.
A friend recently sent me another DDoS news article, but this time it is an incident much closer to home: DDoS attacks on StarHub. In short, nothing online is safe from a DDoS attack. I have also written briefly about my experiments on DDoS before. Effective solutions to a DDoS attack are few. Many solution providers claim to deal with it effectively, but in reality it is a very difficult problem, depending on the amount of data generated during the attack. In my previous article, I mentioned that the most effective known method to handle a DDoS seems to be the widely used fan-out method: distribute the DDoS traffic out to the regions closest to the attack sources. This way, the large volume of attack traffic is quickly diluted and becomes much easier to manage and handle. However, this technique requires a massive content delivery network (CDN) that is not available to the majority of website and IoT owners. I do think that there are other techniques that could deal with this problem better than a reverse brute-force distribution approach.

Reverse the Protocol


The key problem to tackle first is the amount of data choking a server when an attack occurs. To solve this problem, we need to dig deeper into the way our Internet works. TCP exacerbates this problem because it buffers data at the lower levels and then stitches it together when the entire message has arrived, handing it over to the higher levels for further processing. Simple logic tells us that the more data arrives, the greater the choke. Now, here is the way I think we should fix the problem. Don't buffer the data; in fact, don't accept any data until the receiving end has agreed to accept it. This is different from the current approach of first accepting data and then dropping it. Rather, we should reverse this: first acknowledge (before sending data), then accept data. Devices and routers should refuse to forward or route any data that has not been acknowledged as accepted. This way, only a minimal amount of data flows through the Internet because the data is stuck at the sending end rather than flooding and choking the network. The difficulty with this solution is in how we implement this acknowledge-first, accept-later approach. ;)
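To make the acknowledge-first idea concrete, here is a toy in-memory model of the proposed handshake. It is a sketch of the concept only, not a real transport protocol; all names and numbers are illustrative.

```python
class Receiver:
    """Grants explicit permission before any payload is accepted."""

    def __init__(self, capacity):
        self.capacity = capacity   # total bytes the receiver will accept
        self.granted = {}          # sender id -> remaining granted budget

    def request_to_send(self, sender_id, nbytes):
        """Phase 1: the sender asks first; nothing is buffered yet."""
        if nbytes <= self.capacity:
            self.capacity -= nbytes
            self.granted[sender_id] = nbytes
            return True
        return False               # no grant: the data never leaves the sender

    def accept(self, sender_id, payload):
        """Phase 2: data is accepted only against an earlier grant."""
        budget = self.granted.get(sender_id, 0)
        if len(payload) > budget:
            return False           # ungranted data is refused, not buffered
        self.granted[sender_id] = budget - len(payload)
        return True

rx = Receiver(capacity=10)
print(rx.request_to_send("good", 5))        # True: grant issued
print(rx.accept("good", b"hello"))          # True: within the grant
print(rx.accept("flooder", b"x" * 1000))    # False: no grant, nothing buffered
```

The flooder's payload is rejected before any buffering happens, which is exactly the property the paragraph above argues for: the choke point moves from the receiver back to the sender.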

Clip the Capacity


The second problem to tackle is the blind forwarding and routing of data packets on our networks. Blind, meaning that our network infrastructure is built to function without any intelligence. Our routers and gateways are dumb multiplexing machines that were designed for speed, not intelligence or security. In a DDoS, the massive volume of data floods the physical layer to a point where congestion brings everything to a halt. At this stage of the attack, the entire bandwidth and capacity is completely utilized. Clipping the capacity based on the destination machine will prevent limitless data from flooding the entire network. The reality is that beyond a certain point of flooding, the destination device or machine will not be able to handle the load anyway, so why continue to flood the network? This is where smarter routers and gateways can help to enforce the sanity of the network traffic flowing in and out of their jurisdiction. There is no need to do in-depth analysis of network traffic; rather, leverage anonymous bandwidth statistics, with the assumption of collaborating routers and gateways outside each jurisdiction. Routers and gateways collaborate with others outside their jurisdiction to clip the bandwidth along the routing path until it reaches the source and destination of the flooding. This way, only the attackers' channels are affected, leaving the rest of the innocent parties with the capacity and bandwidth to continue their activities.
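Per-destination clipping can be sketched with a classic token bucket keyed by destination. This is my illustration of the idea, not a router implementation; the rates and bursts are arbitrary.

```python
import time

class DestinationClipper:
    """Caps the bytes-per-second a router will forward to any one
    destination; traffic to other destinations is untouched."""

    def __init__(self, rate_bytes_per_s, burst):
        self.rate = rate_bytes_per_s
        self.burst = burst
        self.buckets = {}  # destination -> (tokens, last refill time)

    def allow(self, dest, nbytes, now=None):
        now = time.monotonic() if now is None else now
        tokens, last = self.buckets.get(dest, (self.burst, now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if nbytes <= tokens:
            self.buckets[dest] = (tokens - nbytes, now)
            return True
        self.buckets[dest] = (tokens, now)
        return False

clip = DestinationClipper(rate_bytes_per_s=100, burst=200)
print(clip.allow("victim", 150, now=0.0))   # True: within the burst
print(clip.allow("victim", 150, now=0.1))   # False: bucket nearly drained
print(clip.allow("other", 150, now=0.1))    # True: other destinations unaffected
```

The collaborative part of the proposal would amount to routers along the path sharing these per-destination rates so that clipping happens close to the flood's sources rather than at the victim's edge.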

Apply them Together


In my opinion, the combination of these two could be enough to render DDoS toothless and bring control back from the criminals into the hands of everyone. In the recent case of Dyn, if a system like this had been in place, Dyn's servers could quickly have used the reverse protocol to limit the flooding to a level they could manage, while the collaborative clipping of capacity along the routing path would have prevented others on the same network from being affected by the attack. It would be great to see this design become a reality someday!

Wednesday, October 12, 2016

Spotting Dangerous Web Traffic

I have been trying to raise awareness of the importance of blocking bot, suspicious or dangerous web traffic on sites and IoT devices for more than a year now. However, no one seems to be listening. Everyone shouts about hacking and major compromises like those at Yahoo and the IoT DDoS. Yet, other than crying about it, very few individual consumers seem to be actively looking to solve this problem. I was hoping to help the individual consumer rather than the enterprise, because I think that might be the new and upcoming market. If you are interested in what I am doing or want to try my solution, please drop me an email. ;)

My Work & Experiments


Anyway, I wanted to share that I have been working and experimenting in my spare time on a solution that approaches the problem from a slightly different angle than the traditional detection approach. For those of you who have not been following my work, I am working on prevention rather than detection.

In my opinion, the current generation of firewalls is no longer effective because it is too static compared to the dynamic techniques used by criminal hackers. I have seen many new startups spring up to tackle this problem, and big investments in the cyber security space recently (here and here) too, but I wonder how effective these tools are, or whether they are merely hype. I do hope that the best technology will win out in the long run.

From my experiments, I have come to believe that this problem can be tackled more gracefully if we use and look at the right data and context. I have applied machine learning to this problem to scale and stretch the prevention approach that I started out with even further, into the realm of prediction. Technically, my experimental approach with machine learning is rather simple. I hand-craft and model a set of high-quality labels to start with. Using the labeled data, I benchmark it against a wide set of dimensions within the web traffic data. The dimensions are slowly filtered out and reduced based on empirical observations over tens of thousands of real-world live data points. This process is repeated until I can weed out the false positives and negatives with a satisfactory ratio of precision and recall. The final product is a machine learning model that runs as an unsupervised system, which I promote as a service, MB™. The results are pretty exciting and interesting so far.
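The label-then-filter loop described above can be sketched roughly as follows: score each traffic dimension by how well it separates the labeled classes, then drop the weak ones. The features, samples and scoring method are invented for illustration; the post does not disclose MB's actual dimensions.

```python
from statistics import mean

def feature_scores(samples, labels):
    """Score each feature by how far apart its class means sit.
    samples: list of feature vectors; labels: 1 = bad traffic, 0 = good."""
    good = [s for s, y in zip(samples, labels) if y == 0]
    bad = [s for s, y in zip(samples, labels) if y == 1]
    return [abs(mean(b[i] for b in bad) - mean(g[i] for g in good))
            for i in range(len(samples[0]))]

def prune_features(samples, labels, keep=2):
    """Keep only the `keep` most discriminative dimensions."""
    scores = feature_scores(samples, labels)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:keep])
    return kept, [[s[i] for i in kept] for s in samples]

# Invented features: [requests/min, pages per visit, noise dimension]
samples = [[200, 50, 3], [180, 60, 1], [2, 3, 2], [3, 4, 3]]
labels = [1, 1, 0, 0]   # first two are bots
kept, reduced = prune_features(samples, labels, keep=2)
print(kept)  # [0, 1] -- the noise dimension is dropped
```

Repeating this prune-and-rebenchmark cycle against fresh labeled data is one plausible reading of the iterative process the paragraph describes.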

Two Dangerous Samples


Here are two recent samples that MB™ caught without any human intervention. The data is taken from my network report as the bots crawl around for information and participate in referral spamming. These machines are typically part of a botnet doing data reconnaissance for their criminal masters.


The web traffic from Syria actually belongs to an organization whose members have links to the Syrian Electronic Army. The second sample came from a South American home user who probably does not even know that his or her computer has been hacked and used as part of a botnet. The clues are not always as obvious as in the two samples shown here, because both include referral data that comes from well-known referral spammers. MB™ has surprised me many times before by being able to pick them out even when obvious data is not immediately available for my validation.

Preventing these types of web traffic from obtaining information about your website or IoT device is critical because, without any information available to them, it is hard for criminal hackers to quickly decide how to exploit your device, server or site. Trying exploits randomly quickly exposes them to other existing security tools such as malware detection, antivirus and penetration monitoring tools.

I recently deployed MB™ on my Blogger site and the results shocked me. It turns out that almost 90% of all the traffic that comes to my blog site is from bots, scrapers and other questionable sources. Below is a summary report extracted from my MB™ dashboard.


Hope to hear from you soon, in the meantime, stay safe online! My contact is
support@malleablebyte.org

Tuesday, October 11, 2016

Don't Assume You Are Not A Target


When it comes to websites and IoT devices, a little bit of clarity from your site visitation data can go a long way to protecting it from hackers, thieves, scammers and spammers.

Protect your IoT and site today!

Wednesday, September 28, 2016

Does ShenZhen Sunrise Technology Sound Familiar?


Have you seen web traffic from them coming to your site as well? If you have then this is a post that will interest you.

By the way, I am a self-professed digital storm chaser. I wrote about the bot infestation on my Blogger site just a few weeks ago, and after seeing the data and analysis from my machine learning service MB™, I am fully convinced that the previous spikes in traffic on my blog posts are largely bot-generated. In a way, these new insights have generated new excitement for me in my digital storm chasing hobby. I have recently been tracking web traffic coming from ShenZhen Sunrise Technology Co. Web traffic from them is interesting because they have been showing up on my blog since I enabled MB™ on my Blogger site. Before implementing MB™, I had no visibility at all into the web traffic harassing my blog site; now I can finally take some much needed action.

It seems that the company may originally have been a security camera manufacturer. However, they seem to have since started developing many other types of popular consumer devices. Ok, let's get back into the dirt. Now, what is interesting about their web traffic is that it displays patterns very similar to the bot-like web crawlers or spiders employed by many search engines. This was the first reason I got interested in them. The data that I have collected seems to consistently point them to Baidu. However, how the two companies are related is still a mystery to me. The second reason that got me interested in them was the fairly recent reported attacks by botnets employing security cameras. The characteristics and vulnerability of poorly protected IoT devices are becoming an issue that needs to be closely watched, but the technologies available to do these types of automated and predictive monitoring are few and far between. Hopefully, MB™ can make a difference in this area too.
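One standard way to test whether traffic that appears to be a search-engine spider really belongs to that engine is the reverse-then-forward DNS check that the major engines themselves recommend. Below is a sketch with injectable lookup functions so it runs offline; in practice the lookups would be socket.gethostbyaddr and socket.gethostbyname, and the sample IP/hostname table here is hypothetical, not real Baidu data.

```python
def is_genuine_crawler(ip, allowed_suffixes, reverse_dns, forward_dns):
    """Reverse-then-forward DNS check: the IP's hostname must belong to
    the search engine's domain AND resolve back to the same IP.
    reverse_dns / forward_dns are injected so the logic is testable
    offline; in production they would be live socket lookups."""
    host = reverse_dns(ip)
    if host is None or not host.endswith(tuple(allowed_suffixes)):
        return False
    return forward_dns(host) == ip

# Hypothetical lookup tables standing in for live DNS.
PTR = {"180.76.15.1": "baiduspider-180-76-15-1.crawl.baidu.com",
       "1.2.3.4": "host.shady-isp.example"}
A = {"baiduspider-180-76-15-1.crawl.baidu.com": "180.76.15.1"}

check = lambda ip: is_genuine_crawler(
    ip, [".baidu.com"], PTR.get, lambda h: A.get(h))
print(check("180.76.15.1"))  # True: hostname and forward lookup agree
print(check("1.2.3.4"))      # False: not a baidu.com hostname
```

A crawler coming from a device manufacturer's IP range would fail this check even if its traffic patterns mimic Baidu's spiders, which is one way to separate the speculations below.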

Some Interesting Speculations


These are some speculations that I have thought about but have no direct evidence to draw any real conclusions from.
  • Are the security cameras implemented by Sunrise being taken over by hackers and used for intelligence reconnaissance?
  • Are the hacked or compromised surveillance cameras being used by Baidu?
  • Is Baidu using their IP addresses to hide its web crawling activities?
  • Is Sunrise secretly using its Baidu-linked surveillance cameras to do referral spamming?
Anyway, the chances that Sunrise's devices or computers are compromising Baidu are much higher than the other way around. So I hope you are reading this, Baidu!

What to do if they are crawling on your site too?


I recommend that you block them from crawling your site immediately, because no device manufacturer from China is ever going to be interested in reading your blog, device or website. The fact that the web traffic exhibits bot-like crawling patterns should be a danger sign substantial enough to tell you that you need to keep them out of your blog, device or site. The only problem you may face is that you cannot tell who they are by merely looking at their IP addresses. You need to employ a technology similar to MB™ to help you do just that. You can also try out my project website to check the probability of whether your device or site visitors are rogue or friendly. I hope that I have helped to keep you better informed with this post!
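One simple pattern-based signal (my own illustration, not MB's method) is timing regularity: crawlers often request pages at metronome-steady intervals, while humans browse in bursts. The threshold below is arbitrary.

```python
from statistics import mean, pstdev

def looks_machine_like(timestamps, max_cv=0.1):
    """Flag visitors whose inter-request intervals are suspiciously
    regular: coefficient of variation (stdev/mean) below max_cv.
    timestamps: request times in seconds, ascending."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 3:
        return False               # too few requests to judge
    m = mean(gaps)
    return m > 0 and pstdev(gaps) / m < max_cv

crawler = [0, 10, 20, 30, 40, 50]   # metronome-steady, every 10s
human = [0, 3, 40, 41, 95, 300]     # bursty and irregular
print(looks_machine_like(crawler))  # True
print(looks_machine_like(human))    # False
```

A heuristic like this catches only naive crawlers; the point of a learned model like MB™'s is to combine many such signals rather than rely on any single one.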