What is Hadoop?
Apache Hadoop is an open-source software framework, written in Java, for the distributed storage and processing of very large datasets across clusters of machines. Developed by Doug Cutting and Mike Cafarella in 2005, the core of Apache Hadoop consists of the Hadoop Distributed File System (HDFS) for storage and MapReduce for processing data. The basic philosophy of Hadoop is to reduce dependence on expensive legacy hardware by enabling distributed parallel processing of very large amounts of data across inexpensive, standard, commodity servers, so that data can be processed and stored without volume limitations. Hadoop makes the process of storing and managing data economical and reliable.
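The MapReduce model mentioned above has three conceptual phases: map, shuffle and reduce. The classic word-count example can be sketched in plain Python (no Hadoop cluster needed) to show how each phase works; the function names here are illustrative, not part of any Hadoop API:

```python
from collections import defaultdict

def map_phase(records):
    # Map: emit (key, value) pairs; here, (word, 1) for every word,
    # the way a Hadoop mapper processes its input split.
    for line in records:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all values by key. In real Hadoop the framework
    # does this transparently between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: aggregate each key's values independently, which is why
    # reducers can run in parallel on different nodes.
    return {key: sum(values) for key, values in grouped.items()}

lines = ["hadoop stores data", "hadoop processes data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

In a real cluster the same three phases run across many machines at once, with the framework handling data movement and failure recovery.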
Wondering why this revolutionary technology was named ‘Hadoop’ and has a yellow elephant as its mascot? Watch the video of its creator Doug Cutting explaining how Hadoop got its name.
What are the key features of Hadoop?
- Reliable– fail-safe design that prevents loss of data even in the event of hardware failure.
- Powerful– a unique storage method based on a distributed file system, resulting in faster data processing.
- Scalable– stores and distributes datasets so they can be operated on in parallel, allowing businesses to run applications on thousands of nodes.
- Cost-effective– runs on commodity machines and networks.
- Simple and flexible APIs– enable a large ecosystem of solutions such as log processing, recommendation systems, data warehousing and fraud detection.
How is it revolutionizing the Finance Industry?
Big Data is no longer just a buzzword for the banking and financial industry, for it addresses various issues through the 5Vs: Volume, Velocity, Variety and Veracity. Wondering what the fifth V is? As Bernard Marr, a leading business and data expert, explains, it is the most important V of big data, for it defines how the other attributes pay off: Value. Value refers to an organization’s ability to turn data into something worth more. It is this value addition that makes Big Data and Hadoop not just a new trend but a breakthrough for the finance industry.
Banks like BNY Mellon, Morgan Stanley, Bank of America, Credit Suisse, PNC, etc. are already working on strategies around Big Data in Banking, and other banks are rapidly catching up.
How is it used in banking and finance?
The reason for Hadoop’s success in the banking and finance domain is its ability to address various issues faced by the financial industry at minimal cost and time. Despite its many benefits, applying Hadoop to a particular problem requires due diligence. Some of the scenarios in which it is used are:
Fraud Detection
Hadoop effectively addresses common industry challenges such as fraud, financial crime and data breaches. By analyzing point-of-sale data, authorizations, transactions and other data points, banks can identify and mitigate fraud. Big Data also helps in picking up unusual patterns and alerting banks to them, while drastically reducing the time and resources these tasks require.
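The "unusual pattern" detection described above still has to be written by analysts; Hadoop only supplies the parallel plumbing. As a minimal sketch of one such hypothetical rule (the threshold and the transaction history are invented for illustration), a transaction can be flagged when its amount deviates too far from a customer's history:

```python
from statistics import mean, stdev

def flag_unusual(amounts, new_amount, threshold=3.0):
    # Flag a transaction whose amount deviates from the customer's
    # historical mean by more than `threshold` standard deviations.
    mu, sigma = mean(amounts), stdev(amounts)
    if sigma == 0:
        return new_amount != mu
    return abs(new_amount - mu) / sigma > threshold

history = [42.0, 38.5, 55.0, 47.25, 40.0]  # hypothetical card history
print(flag_unusual(history, 45.00))   # False: within the normal range
print(flag_unusual(history, 5000.0))  # True: far outside it
```

In practice such a rule would run as the reduce step of a MapReduce job, with each customer's transactions grouped to one reducer, so millions of accounts can be scored in parallel.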
Risk Management
Big Data solutions help assess risk accurately. Hadoop gives a complete and accurate view of risk and impact, enabling firms to make informed decisions by analyzing transactional data to determine risk based on market behavior and to score customers and potential clients.
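Customer scoring of the kind mentioned above is ultimately business logic the bank defines itself. A toy sketch, assuming a purely hypothetical linear model in which each risk factor is already normalized to [0, 1] and the weights are invented for illustration:

```python
def risk_score(features, weights):
    # Hypothetical linear risk score: a weighted sum of normalized
    # factors, where a higher score means a riskier customer.
    return sum(weights[k] * features[k] for k in weights)

weights = {"late_payments": 0.5, "utilization": 0.3, "market_volatility": 0.2}
customer = {"late_payments": 0.1, "utilization": 0.6, "market_volatility": 0.4}

score = risk_score(customer, weights)
print(round(score, 2))  # 0.31
```

Real institutions use far richer models, but the shape is the same: the scoring function is applied independently to each customer record, which is exactly the kind of embarrassingly parallel work MapReduce distributes well.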
Data Storage and Security
Protection, easy storage and ready access to financial data are key needs of banks and finance firms. The Hadoop Distributed File System (HDFS) provides scalable and reliable data storage designed to span large clusters of commodity servers, while MapReduce processes the data on each node in parallel, shipping only the processing code to the node that holds the data. Because information is replicated across more than one node, Hadoop offers a safer data storage option with built-in redundancy.
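The redundancy described above comes from HDFS splitting files into blocks and keeping multiple copies of each block on different machines (three by default). A simplified sketch of that placement idea, using round-robin assignment rather than HDFS's actual rack-aware policy, with invented node and block names:

```python
def place_replicas(blocks, nodes, replication=3):
    # Assign each block to `replication` distinct nodes, round-robin.
    # This mimics, in a very simplified way, HDFS's default 3-way
    # replication; real HDFS also considers rack topology.
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement

nodes = ["node1", "node2", "node3", "node4"]
layout = place_replicas(["blk_0", "blk_1"], nodes)
print(layout)
# blk_0 -> node1, node2, node3 ; blk_1 -> node2, node3, node4
```

If any single node fails, every block it held still exists on two other nodes, which is why Hadoop tolerates commodity-hardware failures without data loss.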
Banks need to analyze unstructured data residing in sources such as social media profiles, emails, calls, complaint logs and discussion forums, as well as traditional sources such as transactional data, cash and equity, and trade and lending records, to better understand their customers. Hadoop allows financial firms to access and analyze this data, providing accurate insights that help them make the right decisions.
Hadoop is also used in other departments like customer segmentation and experience analysis, credit risk assessment, targeted services, etc.
Does Hadoop have limitations too?
Although Hadoop has been embraced by several banking organizations and forms the backbone of many applications running on Big Data technology, there are also several reasons why Hadoop may not always be the best solution. Some of them are:
Big Data understanding
Hadoop is normally brought in when a Big Data initiative is undertaken. But before using it, one must ask the right questions and consider whether it is the right solution. An organization that has a huge inflow of data from various sources, and that struggles to store and effectively use its existing data, is a good candidate for Hadoop and Big Data solutions.
It is not a solution, but a tool
Hadoop is not a complete solution. Although fraud detection and risk management leverage the strengths of Hadoop, Hadoop by itself does not solve these problems. Programmers need to write code with an understanding of the problem so that it uses Hadoop’s strong points to solve the business need. For example, Hadoop does not pick out unusual patterns on its own; it merely allows large volumes of data to be processed concurrently.
Not a unique service
Hadoop enables analysis, but many other products allow you to analyze data. So, although Hadoop can be used for analysis, implementing the framework only to address analytical needs is not a smart idea. Hadoop is beneficial only if one finds more than one scenario where its unique strengths can be put to proper use.
Security concerns
Like any other technology, Hadoop is not foolproof. Data is at risk because Hadoop lacks encryption at the storage and network levels. Also, since Hadoop keeps multiple copies of data so that it can be recovered after a failure, and is written in Java, a language with well-documented security vulnerabilities, it is more exposed to data breaches.
Here’s an analogy: consider a dinner knife. You can use it very well to butter your toast, cut a piece of potato, wedge open a shut tin, or even drive wide-notched screws; but it is useless if you want to drink soup or make a phone call.
Hadoop is like that knife. Programmers use it to do things effectively where it is applicable. Hadoop does not do fraud detection or risk management; in fact, it does not implement any business logic by itself. It simply manages the storage and retrieval of data in a distributed way.
Role of Hexanika in bringing Hadoop to Banks
Hexanika is a RegTech big data software company that has developed a software platform, SmartJoin, and a software product, SmartReg, for financial institutions to address data sourcing and reporting challenges in regulatory compliance. We leverage distributed parallel processing using Hadoop to rapidly process large volumes of data.
Hexanika offers these end-to-end platform and product solutions at affordable costs, and the technology itself is easy for banks to understand and implement: scalable, sustainable and in tune with emerging trends in the banking and financial services world, ensuring a win-win situation for all stakeholders.
Banks using Big Data: http://www.datanami.com/2012/07/09/how_four_financial_giants_crunch_big_data/
Problems addressed using Hadoop: https://www.linkedin.com/grp/post/4047138-66789987
10 Reasons why Hadoop is not the best Big Data solution all time: http://www.efytimes.com/e1/fullnews.asp?edid=152456
Big Data: 5 major advantages of Hadoop: http://www.itproportal.com/2013/12/20/big-data-5-major-advantages-of-hadoop/