Hadoop-Related Software

Hadoop-Related Software Overview

What is Hadoop Software?

Hadoop is an open-source distributed data store and processing framework from the Apache Software Foundation. An entire ecosystem of products has evolved around the Hadoop data store, to the point where it has become its own technology category.

The central idea of Hadoop is that data is spread across many inexpensive commodity servers. There are also several commercial distributions of Hadoop, from Cloudera and Hortonworks, which wrap services around the technology.

Unlike a traditional database, Hadoop can handle huge volumes of both structured and unstructured data including log files, streaming data, images, audio and video files. All of this data can be put into the Hadoop cluster and accessed, modified and processed in place, eliminating the need to duplicate and structure data in a traditional warehouse.

Once this huge volume of structured and unstructured data has been stored, how do you extract any value from it? Since Hadoop is not a relational database, structured query languages like SQL do not work on it directly. Instead, Hadoop has its own data processing and query framework called MapReduce, which developers can use to write programs that retrieve whatever data is needed. However, MapReduce has several constraints that limit performance, and newer products like Apache Spark provide an alternative distributed computing framework that is often significantly more efficient. Similarly, products like Hive and Cloudera Impala provide a SQL-like query language, which is much easier for data analysts to learn and use.
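To make the MapReduce idea more concrete, here is a minimal single-machine sketch in plain Python. It does not use Hadoop's actual API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample log lines are hypothetical, chosen only to illustrate the three stages a MapReduce job goes through when counting words.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as a framework would do
    between the map and reduce stages."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three stages in order on an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input standing in for log files stored in a cluster.
    sample_logs = ["error disk full", "warning disk slow", "error network down"]
    print(word_count(sample_logs))
```

In a real cluster the map and reduce functions would run in parallel on many servers, each working on the data stored locally; this sketch runs everything in one process purely to show the flow of data.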

The best Hadoop-related software includes:

Apache Spark, Amazon EMR, and Hadoop.

Hadoop-Related Software TrustMap

TrustMaps are two-dimensional charts that compare products based on trScore and research frequency by prospective buyers. Products must have 10 or more ratings to appear on this TrustMap.

Hadoop-Related Products

The list of products below is based purely on reviews (sorted from most to least). There is no paid placement and analyst opinions do not influence their rankings. Here is our Promise to Buyers to ensure information on our site is reliable, useful, and worthy of your trust.

Hadoop

Top Rated

Hadoop is open source software from Apache that supports distributed processing and data storage. Hadoop is popular for its scalability, reliability, and ability to run on commodity hardware.

Apache Hive

Customer Verified

Top Rated

Apache Hive is database/data warehouse software that supports data querying and analysis of large datasets stored in the Hadoop distributed file system (HDFS) and other compatible systems, and is distributed under an open source license.

Apache Spark

Top Rated

Amazon EMR (Elastic MapReduce)

Customer Verified

Top Rated

Amazon EMR is a cloud-native big data platform for processing vast amounts of data quickly, at scale. Using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi (Incubating), and Presto, coupled with the scalability of Amazon EC2 and scalable…

Azure Data Lake Storage

Customer Verified

Top Rated

Azure Data Lake Storage Gen2 is a highly scalable and cost-effective data lake solution for big data analytics. It combines the power of a high-performance file system with massive scale and economy to help you speed your time to insight. Data Lake Storage Gen2 extends Azure Blob…

IBM Analytics Engine

Top Rated

IBM BigInsights is an analytics and data visualization tool that leverages Hadoop.

Hortonworks Data Platform

Top Rated

Hortonworks Data Platform (HDP) is an open source framework for distributed storage and processing of large, multi-source data sets. HDP modernizes IT infrastructure and keeps data secure—in the cloud or on-premises—while helping to drive new revenue streams, improve customer experience,…

Apache Pig

Top Rated

Apache Pig is a programming tool for creating MapReduce programs used in Hadoop.

Azure HDInsight

Top Rated

HDInsight is an implementation of the Apache Hadoop technology stack on the Microsoft Azure cloud platform. It is based on the Hortonworks Hadoop distribution. Microsoft Azure HDInsight includes implementations of Apache Spark, HBase, Storm, Pig, Hive, Sqoop, Oozie, Ambari, etc.…

Kognitio

WX2 is the analytics-focused data warehouse appliance from UK company Kognitio.

HPE Ezmeral Data Fabric (MapR)

Top Rated

HPE Ezmeral Data Fabric (formerly MapR, acquired by HPE in 2019) is a software-defined datastore and file system that simplifies data management and analytics by unifying data across core, edge, and multicloud sources into a single platform. Just as a loom weaves multiple threads…

Arcadia Data

Top Rated

Cloudera Data Science Workbench

Top Rated

Cloudera Data Science Workbench enables secure self-service data science for the enterprise. It is a collaborative environment where developers can work with a variety of libraries and frameworks.

Alluxio (formerly Tachyon)

Top Rated

Alluxio (formerly Tachyon) is an open source virtual distributed storage system.

IBM Db2 Big SQL

Top Rated

IBM offers Db2 Big SQL, an enterprise-grade, hybrid, ANSI-compliant SQL-on-Hadoop engine that delivers massively parallel processing (MPP) and advanced data query. Big SQL offers a single database connection or query for disparate sources such as HDFS, RDBMS, NoSQL databases, object stores…

Apache Flume

Top Rated

Apache Flume is a product enabling the flow of logs and other data into a Hadoop environment.

Cloudera Manager

Top Rated

Cloudera Manager is a management application for Apache Hadoop and the enterprise data hub, from Cloudera.

Presto (formerly Presto DB)

Top Rated

Presto is an open source SQL query engine designed to run queries on data stored in Hadoop or in traditional databases. Teradata's support for the development of Presto followed its acquisition of Hadapt and Revelytix.

VMware Tanzu Data Services (Greenplum, GemFire, RabbitMQ, Tanzu SQL)

Top Rated

VMware Tanzu Data Services is a portfolio of on-demand caching, messaging, and database software on VMware Tanzu for development teams building modern applications.

SAP Vora

SAP Vora is a computing engine designed to provide better accessibility to Hadoop data from SAP HANA. SAP Vora manages unstructured Hadoop data by building structured data hierarchies and making the data queryable through an SQL interface.

Hydrograph

Top Rated

Bitwise offers Hydrograph, a data integration tool that provides ETL functionality on Hadoop and Spark.

Apache Sqoop

Top Rated

Apache Sqoop is a tool for transferring data between Apache Hadoop and other, structured data stores.

Cloudera Distribution Hadoop (CDH)

Top Rated

CDH is Cloudera’s 100% open source platform distribution, including Apache Hadoop and built specifically to meet enterprise demands. CDH delivers everything needed for enterprise use right out of the box. By integrating Hadoop with more than a dozen other critical open source projects,…

Starburst Enterprise

Top Rated

Starburst Enterprise is a fully supported, production-tested and enterprise-grade distribution of open source Trino (formerly Presto® SQL). It aims to improve performance and security while making it easy to deploy, connect, and manage a Trino environment. Through connecting to any…

Apache Drill

Top Rated

Apache Drill is a schema-free query engine for use with NoSQL or Hadoop data or file storage systems and databases.
