

Confess Now, Or The Taxman And Big Data Analytics Will Come After You

Governments worldwide are under mounting pressure to narrow their tax gaps, defined as the difference between the taxes a government expects to collect and the taxes it actually collects. Closer to home, according to a 2011 Economic Transformation Program (ETP) update by Dato’ Sri Idris Jala, Minister in the PM’s Department and CEO of PEMANDU, Malaysia’s tax gap stands at 20%.

Based on the country’s 2014 direct tax collection of RM133.7 billion, a 20% tax gap leaves an estimated (and whopping) RM33 billion worth of taxes uncollected and unaccounted for.
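The RM33 billion figure works out if the 20% gap is measured against the taxes the government *expects* to collect, so that the RM133.7 billion actually collected is 80% of the expected total. This reading is our assumption; the article does not spell out the base. A quick check:

```python
def uncollected_tax(collected: float, gap_rate: float) -> float:
    """Uncollected tax, assuming gap_rate is a share of *expected* tax.

    expected = collected / (1 - gap_rate); gap = expected - collected.
    """
    expected = collected / (1 - gap_rate)
    return expected - collected


# 2014 direct tax collection (RM billion) and the 20% gap cited above
gap = uncollected_tax(133.7, 0.20)
print(f"RM{gap:.1f} billion")  # RM33.4 billion
```

Applying the 20% directly to the amount actually collected would give only about RM26.7 billion, so the choice of base matters when comparing tax gap figures.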

In contrast, many developed nations have tax gaps of between 10% and 15%.

Fraud is a major component of a nation’s tax gap, and it comes in many flavors. Individuals and companies evade taxes by underreporting their income, underpaying what they owe or not filing at all. The hidden economy, which includes money laundering, prostitution, and arms and drug trafficking, is another key contributor to a country’s tax gap.

In the past, efforts to narrow the tax gap by widening field audit and investigation coverage met with limited success because of manpower constraints and the difficulty of monitoring and identifying fraudulent individuals and organizations.

This is about to change. Governments are increasingly turning to sophisticated techniques such as big data analytics to identify tax fraud and increase their tax revenue.

Numerous business cases for applying advanced data analytics to tax collection have been developed in recent years. They include propensity to pay, tax collection risk scoring, anti-money laundering and GST fraud identification.

The Tax Big Data Analytics Maturity Model (Figure 1) shows that the high-impact business cases that promise serious returns for the government, such as GST fraud identification, anti-money laundering and real-time fraud detection, require analytics that go beyond traditional reporting, statistical and predictive models. These high-value projects call for advanced techniques such as text mining, path analysis, connection analysis, affinity analysis and visualization.


Figure 1: Tax Big Data Analytics Maturity Model (Image Credit: Teradata)

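Let’s take GST scams as an example. The government will be able to use big data analytics to clamp down on companies that charge GST but do not pay the output tax. It will also be able to use advanced analytics techniques to detect GST carousel fraud (Figure 2), in which goods are traded around a chain of companies and one of them collects GST on its sales, then disappears without paying it to the authorities.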
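At its simplest, catching companies that charge GST but do not pay the output tax is a reconciliation: total the GST on each company's sales invoices and compare it with the output tax declared in its return. The sketch below assumes hypothetical record layouts (seller ID and GST amount per invoice, declared output tax per seller) and a hypothetical tolerance; the article does not describe the actual data sources or thresholds.

```python
from collections import defaultdict


def flag_output_tax_shortfalls(invoices, returns, tolerance=0.01):
    """Flag companies whose invoiced GST exceeds the output tax they declared.

    invoices:  iterable of (seller_id, gst_charged) pairs from sales invoices
    returns:   dict mapping seller_id -> output tax declared in its GST return
    tolerance: fractional shortfall allowed before flagging (hypothetical)
    """
    charged = defaultdict(float)
    for seller, gst in invoices:
        charged[seller] += gst

    flagged = {}
    for seller, total in charged.items():
        declared = returns.get(seller, 0.0)  # no return filed counts as zero
        if total - declared > tolerance * total:
            flagged[seller] = round(total - declared, 2)
    return flagged


invoices = [("A", 600.0), ("A", 400.0), ("B", 300.0), ("C", 50.0)]
returns = {"A": 1000.0, "B": 120.0}
print(flag_output_tax_shortfalls(invoices, returns))  # {'B': 180.0, 'C': 50.0}
```

Company A declared everything it charged, B declared less than it invoiced, and C filed nothing, so B and C would be queued for audit.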

Traditionally, it has been impossible for tax fraud analysts to scrutinize millions of transactions by hand. A carousel fraud, for instance, is usually executed within a few weeks, and by the time tax agents detect it, the fraudsters have disappeared with the stolen tax money.

With big data analytics, millions of transactions can be processed automatically. A graph data model can be built to represent GST fraud, and powerful visualization tools can present chains of suspicious transactions. Fraud analysts can drill down to see the companies and individuals connected to these transactions and flag suspicious cases for audit, allowing anti-fraud and investigation teams to act before the fraud is completed.
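In a graph model, a carousel shows up as a cycle: invoices pass from company to company and come back to where they started. The sketch below treats companies as nodes and invoices as directed seller-to-buyer edges, and uses a depth-first search to list simple cycles. It is a simplified illustration under our own assumptions (company names and the minimum of three companies per loop are hypothetical), not the model any tax authority uses; at scale one would typically rely on a graph database or a library routine such as NetworkX's `simple_cycles`.

```python
from collections import defaultdict


def find_carousels(invoices, min_length=3):
    """Return simple cycles in a seller -> buyer invoice graph.

    Each cycle is reported once, starting from its lexicographically
    smallest company, so rotations of the same loop are not repeated.
    min_length is the minimum number of companies in a reported loop.
    """
    graph = defaultdict(set)
    for seller, buyer in invoices:
        graph[seller].add(buyer)

    cycles = []

    def dfs(start, node, path, on_path):
        for nxt in sorted(graph[node]):
            if nxt == start and len(path) >= min_length:
                cycles.append(path + [start])  # closed the loop
            elif nxt > start and nxt not in on_path:
                on_path.add(nxt)
                dfs(start, nxt, path + [nxt], on_path)
                on_path.remove(nxt)

    for start in sorted(graph):
        dfs(start, start, [start], {start})
    return cycles


# A -> B -> C -> D -> A is a loop; B -> E is an ordinary sale
invoices = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("B", "E")]
print(find_carousels(invoices))  # [['A', 'B', 'C', 'D', 'A']]
```

The loop it returns is the kind of chain an analyst would then drill into and flag for audit.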

Yep, the taxman is comin’ after you with big data.


Figure 2: An Example of a GST Carousel Fraud (Image Credit: Inland Revenue Authority of Singapore)


Social Analytics Powers Highly Accurate Election Result Prediction

In my last article, I hinted that one can predict an election outcome using social media data. In this article, I will share more about how we combined social data with other data sources to predict the outcome of an election in a small constituency in Malaysia with 97% accuracy.


Unlocking the Value of Predictive Analytics

Predictive analytics is a combination of art and science. It combines human judgement based on anecdotal assumptions, the ability to dissect multiple data sources in different formats, a deep understanding of statistical modelling (and the numbers behind it), and other data science techniques such as machine learning, factor analysis and random forests, among many others. And let’s not forget the hours of refinement and reflection. At times, it involves simple mathematics.

Nate Silver’s forecasting models, from his 2008 US Presidential Election predictions to the polls-plus model, are living proof that predicting an election outcome or voter behaviour is a mixture of art and science. Ahead of the 2016 US Presidential Election, Silver expressed this view:

Polls shift rapidly and often prove to be fairly inaccurate, even on the eve of the election. Non-polling factors, particularly endorsements, can provide some additional guidance, but none of them is a magic bullet.

To unlock the real value of predictive analytics, one needs to stop believing that an investment in technology (with a click-of-a-button user expectation) will give you a crystal-ball answer to your business problems. In a commercial world full of buzzwords and marketing jargon such as “big data”, “data science”, “world-class product” and “backed by the largest tech VCs”, it is easy to lose sight of what value means when you look at it from a single perspective (e.g. a product or tool). Value means combining people, process and technology.

We predicted the election results with 97% accuracy. How did we do it?

With opinions on predictive approaches differing around the world, we took the opportunity to predict the results of a 2016 by-election in a small constituency of 42,000 voters. When the official results were announced at 2100 hours, our prediction matched the actual result with 97% accuracy.

We spent hours applying statistical modelling techniques to test our assumptions and predictors: analysing the significance of the Cube Law; multiple linear regression (MLR) on factors such as ethnicity and age; statistical analysis of voter sentiment from online polling and social media data; census data; historical voting performance; the effect of age and internet penetration; citizens’ emotions (public mood) at the locality level; national events; and other data sets.
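The Cube Law mentioned above is a long-standing empirical rule for first-past-the-post systems: in a two-party contest, the ratio of seats won tends to equal the cube of the ratio of votes. The article does not say how the rule was applied to a single-seat by-election, so the function below only illustrates the rule itself; the `exponent` parameter is our addition for testing other powers.

```python
def cube_law_seat_share(vote_share: float, exponent: float = 3.0) -> float:
    """Seat share implied by the Cube Law for a two-party contest.

    Cube Law: seats_A / seats_B = (votes_A / votes_B) ** exponent,
    with vote_share = votes_A / (votes_A + votes_B), strictly below 1.
    """
    vote_ratio = vote_share / (1 - vote_share)
    seat_ratio = vote_ratio ** exponent
    return seat_ratio / (1 + seat_ratio)


print(round(cube_law_seat_share(0.55), 3))  # 0.646
```

A 55% vote share translates into roughly 65% of seats, which is why the rule exaggerates swings in winner-takes-all systems.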

We create a crystal ball that gives you a set of realistic answers

Since there is no magic bullet for predicting election results, the best way to represent a predictive outcome (i.e. the probability of it happening) is as a set of scenarios: best case, worst case and base case. For the by-election, we predicted the incumbent would win 37% (worst case), 52% (base case) and 57% (best case). Each scenario was carefully reviewed and tested using a set of weighting filters derived from multiple data sources to represent real-life situations, so the actual result should fall within the range spanned by these scenarios.


We also ran a Monte Carlo simulation, a technique that has been used in the past to forecast US Presidential Elections, as a final sanity check on our predicted results. The overall modelling framework and approach we followed is shown below.
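A Monte Carlo check of this kind can be sketched as follows: draw many plausible vote shares around a central estimate and count how often the incumbent finishes above 50%. The normal distribution, the 4-point standard deviation and the 52% centre (borrowed from the base case above) are assumptions for illustration; the article does not describe its actual simulation inputs.

```python
import random


def simulate_win_probability(mean_share, sd, n_runs=100_000, seed=42):
    """Estimate P(incumbent vote share > 50%) by Monte Carlo sampling.

    Vote share is modelled (by assumption) as Normal(mean_share, sd),
    clipped to [0, 1]. A fixed seed keeps runs reproducible.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_runs):
        share = min(max(rng.gauss(mean_share, sd), 0.0), 1.0)
        if share > 0.5:
            wins += 1
    return wins / n_runs


print(simulate_win_probability(0.52, 0.04))  # roughly 0.69
```

If the simulated win probability disagrees sharply with the scenario analysis, that is a signal to revisit the assumptions rather than a result in its own right.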


While the framework shown may appear unexciting to some modellers or data scientists, our real competitive edge lies in data preparation and cleaning; extrapolating and testing the many predictors that may influence voters’ choices; bias estimation; logistic regression; and the layers of assumptions applied to achieve a near-perfect prediction.

Our cutting-edge approach includes a customized sentiment engine built on a Naïve Bayes classifier. It detects local dialects in both English and Malay and identifies patterns of favourability, or likelihood of voting for either party, across a large sample of social media data (collected both by location and by keyword). We also used emotion analytics to measure public mood at specific locations within the constituency.
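A Naïve Bayes sentiment classifier of the general kind described above can be sketched in a few lines: count how often each word appears in posts with each label, then score a new post by adding log-probabilities with add-one (Laplace) smoothing. The tiny mixed English/Malay training set, the labels and the whitespace tokenizer are hypothetical; the authors' engine, its dialect handling and its training data are not described in the article.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods of each word
            score = math.log(n_docs / total_docs)
            n_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (n_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


texts = [
    "sokong calon ini bagus",   # Malay: "support this candidate, good"
    "great candidate i support",
    "teruk tak puas hati",      # Malay: "terrible, not satisfied"
    "bad service not happy",
]
labels = ["favourable", "favourable", "unfavourable", "unfavourable"]
model = NaiveBayesSentiment().fit(texts, labels)
print(model.predict("saya sokong dia"))  # favourable
```

Because the model only counts words, mixed-language posts need no special handling here; real dialect detection would require far more training data and careful tokenization.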

Last words on Predictive Analytics

More often than not, with the rapid evolution of computing technology and the internet, users forget that technology (or a tool) is only an automated enabler, one that can mislead you into believing that the fancy charts or dashboards on your screen are the gospel truth. Sadly, many never question the underlying assumptions or the accuracy of the data they view every day.

The hard truth is that, in the world of data science, analytics is a classic recipe (i.e. mathematics and statistics) cooked in a brand-new electric oven (i.e. technology) by an amazing first-class science graduate (i.e. people) who is not afraid to explore new approaches or push boundaries (i.e. process).


For more information on how we can apply predictive analytics to help your organization, drop us a message at

(This case study is republished with the permission of Berkshire Media)

Video analytics assures peace and inmate wellbeing in police lockups

The Self-Monitoring Analytics Reporting Technology (SMART) Lock-up project is a collaboration between the Royal Malaysian Police (PDRM) and MIMOS to address critical issues such as the mortality and health of inmates during remand. It is an integrated lock-up management system employing video analytics, wireless communications, systems integrity and business intelligence. The lock-up facility is fully equipped with surveillance cameras, whose video is fed through the analytics system to detect suspicious behaviour. Any triggered event alerts security personnel at the control centre and the officers-in-charge at the lock-up area. Overall, SMART provides total situational awareness and gives lock-up officers vital real-time information for effective situation management.

One of the many challenges facing PDRM lock-up facilities that use conventional surveillance is an over-reliance on officers continuously monitoring inmates, both by sight and on CCTV monitors. Another major problem is that conventional cameras must be close to inmates to identify individuals clearly, but placing cameras close in a prison environment is difficult and impractical because they can be damaged or covered.

PDRM may also need to protect itself against false accusations about an officer’s actions or about another inmate, and irrefutable video evidence can help it defend against such claims. Another concern is the possibility of inmate unrest; major unrest in particular can have destructive and disruptive consequences. By closely monitoring inmate activity, officers can increase their level of control and react in real time before situations escalate.

Lock-ups can also be places where inmates smuggle in contraband such as cell phones or drugs, which a smart video system can help detect. The system can also monitor officers’ compliance and their treatment of inmates, and record whether officers are performing their jobs effectively.


SMART Lock-up provides five key features:

  • advanced video analytics that highlight unusual activities and suspicious events, with wide-angle coverage of the entire area under surveillance
  • a wireless system that enables sharing of situational awareness
  • a 3D location indicator that highlights detected events
  • integrity and security protection to ensure authorised use and data integrity
  • a high-level reporting dashboard based on business intelligence

With the solution, each facility is fully equipped with surveillance cameras in the cells, walkways and perimeter. Video from the cameras is fed through the analytics system to detect suspicious behaviour by inmates, such as fighting, loitering, climbing, vandalism and suicidal actions. Any triggered event alerts security personnel at the control centre and the officers-in-charge at the lock-up area via a surveillance screen that shows a 3D location marker pinpointing the incident area. Additional alerts, such as a siren, can be added to further enhance situational awareness. All events are time-stamped for audit trail purposes.

In addition, a wireless system enables information to be shared at any time within the premises, and access can be extended to authorized users such as district-, state- and headquarters-level officers. The 3D location indicator highlights detected events, supporting faster response times and more effective resource deployment. SMART Lock-up also provides integrity and security protection to ensure authorised use and data integrity. All recorded data is then relayed to the dashboard system for further reporting and analysis.
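The alerting flow described above (a detected event, an alert to the control centre and officer-in-charge, and a time-stamped record for the audit trail) can be sketched as below. The event types come from the text; the entry fields, the recipient names and the hash chaining, one common way to make an audit log tamper-evident, are our own assumptions and not details of the SMART Lock-up design. Actual alert delivery (screens, sirens, wireless) is out of scope here; the code only records who was alerted.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Behaviours listed in the article; the identifiers are hypothetical
SUSPICIOUS = {"fighting", "loitering", "climbing", "vandalism", "suicidal_action"}


def _digest(body):
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class AuditLog:
    """Append-only event log; each entry stores the hash of the previous one."""
    entries: list = field(default_factory=list)

    def record(self, event_type, location, recipients):
        entry = {
            "time": datetime.now(timezone.utc).isoformat(),
            "event": event_type,
            "location": location,
            "alerted": recipients,
            "prev": self.entries[-1]["hash"] if self.entries else "0" * 64,
        }
        entry["hash"] = _digest(entry)  # hash covers all fields above
        self.entries.append(entry)
        return entry

    def verify(self):
        """Return False if any entry was altered or the chain is broken."""
        prev = "0" * 64
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if e["prev"] != prev or e["hash"] != _digest(body):
                return False
            prev = e["hash"]
        return True


def handle_detection(log, event_type, location):
    """Log suspicious events together with the parties that would be alerted."""
    if event_type not in SUSPICIOUS:
        return None
    return log.record(event_type, location, ["control_centre", "officer_in_charge"])


log = AuditLog()
handle_detection(log, "fighting", "Cell 3")
handle_detection(log, "climbing", "Walkway B")
print(len(log.entries), log.verify())  # 2 True
```

Editing any stored entry afterwards changes its hash and breaks the chain, which `verify` then reports.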


Lock-up cell under surveillance
*Innovation, illustrations and drawings are covered and protected under several patents

Business Benefits:
The SMART Lock-up solution will shift lock-up operation and management from human-intensive to machine-assisted monitoring. It will facilitate self-monitoring with automatic notifications and alerts, and serve as a prevention, detection, evidence-gathering and site management tool. The system’s intelligent behavioural analysis strengthens the lock-up operation’s security strategy by focusing on early detection of, and rapid response to, abnormal behaviour or misconduct. Adequate misconduct prevention and intervention therefore benefits both the inmates in custody and the lock-up officers, and leads to an improved public perception of lock-up security.

The Future:
The solution can be replicated in prisons, immigration detention depots, juvenile facilities, rehabilitation centres and other detention centres and facilities.

Singapore libraries use Big Data Analytics for their users and employees

The Singapore National Library Board (NLB) – with 25 public libraries, over 1.5 million titles and more than 30 million loans per year – offers an excellent opportunity to build a business case for Big Data Analytics (BDA). And they did just that.


Working with vendors while building up its internal capability in parallel, the NLB is currently executing several ROI-generating projects that ooze Big Data appeal:

  • Superior search results – mining past loan records for patterns and performing text analytics on them, as well as on books’ bibliographies, to generate enhanced search results and recommendations
  • Demand Analysis – forecasting the demand for new and existing titles
  • Planning a library’s collection – optimization technology used to plan each library’s category mix, maximizing the number of loans given space and budget constraints
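Recommendations mined from past loan records are often built on co-occurrence: titles that the same patrons tend to borrow are suggested together. The sketch below counts such co-loans; the loan records and title names are hypothetical, and the article does not say which algorithm NLB or its vendors actually use.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_loan_counts(loans):
    """Count how many patrons borrowed each pair of titles.

    loans: iterable of (patron_id, title) records.
    """
    titles_by_patron = defaultdict(set)
    for patron, title in loans:
        titles_by_patron[patron].add(title)

    pairs = defaultdict(Counter)
    for titles in titles_by_patron.values():
        for a, b in combinations(sorted(titles), 2):
            pairs[a][b] += 1
            pairs[b][a] += 1
    return pairs


def recommend(pairs, title, k=3):
    """Titles most often co-borrowed with `title`, most frequent first."""
    return [t for t, _ in pairs[title].most_common(k)]


loans = [
    ("p1", "Python Basics"), ("p1", "Data Science 101"),
    ("p2", "Python Basics"), ("p2", "Data Science 101"), ("p2", "Statistics"),
    ("p3", "Python Basics"), ("p3", "Data Science 101"),
    ("p4", "Cooking"),
]
pairs = co_loan_counts(loans)
print(recommend(pairs, "Python Basics"))  # ['Data Science 101', 'Statistics']
```

Counting distinct patrons rather than raw loans keeps one heavy borrower from dominating the suggestions.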

All of this Big Data Analytics work runs on a Hadoop cluster of 13 virtual servers on 3 virtual machine hosts. Elegant!

Business Information ASEAN (Sept 2014) has the full article.

Yahoo! Japan predicts Japan’s election outcome with single-digit accuracy

Yahoo! Japan deploys Big Data Analytics (BDA) to create a better Japan by analyzing the online behavior of its massive user base. Through BDA, it is able to:

  • Capture users’ behavior in detail by analyzing access logs and searches for images and videos
  • Optimize advertising via machine learning, so its advertisements improve day after day
  • Successfully forecast Japan’s economic growth index before the government could publish the report
  • Predict the country’s election outcome, accurate to one digit