• Neural networks in reality

    The flood of news and articles about artificial intelligence creates the illusion that we live in a fantastic time. But when you start asking people what exactly these high technologies do for them in real life, the answers come down to a few Google features, mobile games, and a story about Chinese videos. Speaking of those Chinese videos: for some reason, the central mass media constantly show them whenever they demonstrate Moscow's intelligent technologies.

    On paper, it seems, these «intellects» are already installed everywhere, and the whole country has long since switched to neural networks, but only in demonstration pictures, in diagrams, and in hand-waving explanations. This creates a cognitive dissonance: why not take a video camera and film at least a fragment of how Russia's super mega technologies actually work?

    As Nikita Sergeevich said, «science ceases to be self-indulgence when its fruits are applied in the national economy.» Yet today's artificial intelligence is familiar to us only from games. Many people genuinely want to see something useful in reality. So we took the trouble to record our own video of neural networks operating at real facilities.

    Read more →
  • The Project «Fabula»: How to find the desired video-fragment or person in a pile of video files?

      If a person is well over 20, they have already accumulated a huge film library of their life, plus videos from friends, relatives, and work… It is no longer possible to find anyone or anything specific in it. I recently spent a week preparing a video compilation for my daughter's anniversary. The media are even more overloaded with video archives, and every day millions of terabytes of video content appear in the world. Such is the era of BIG DATA.

      Read more →
    • Machine Learning & Big Data: Let’s Find The Relationship Between Them


        Machine learning is a famous term in technology. Today we will relate it to another famous term, Big Data. Both have become buzzwords these days. Let’s first find out what each of them means.

        Big data refers to the process of collecting and analyzing very large data sets (called Big Data) to discover useful hidden patterns and other information, such as customer preferences and market trends, which helps organizations stay informed and make customer-oriented business decisions.
        Read more →
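In miniature, "discovering hidden patterns" in a pile of records is just aggregation at scale. A toy sketch with hypothetical data (a real big-data job would run the same idea over billions of rows on a distributed engine):

```python
from collections import Counter

# Toy purchase log (hypothetical data standing in for a big data set).
purchases = [
    ("alice", "laptop"), ("bob", "phone"), ("alice", "mouse"),
    ("carol", "phone"), ("dave", "phone"), ("erin", "laptop"),
]

# The "hidden pattern" in miniature: which products dominate demand.
popularity = Counter(item for _, item in purchases)
top_item, top_count = popularity.most_common(1)[0]
print(top_item, top_count)  # phone 3
```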
      • The QC House of Cards

          There’s Gold in Them Thar Hills

          Gold rushes can make people crazy. 1848 was indicator enough of that. When Sam Brannan announced to the world: ‘Gold! Gold! Gold from the American River!’, half the world’s population (or so it seemed to the tiny California population living there at the time) descended on what was soon to be the newest state of the union.

          San Francisco, until then a small hamlet with a few hundred pioneers, became a centre of vice, murder and debauchery overnight.


          Two hundred years earlier, tulip mania had hit Europe, and, like the California gold rush with its argonauts, or 49ers, it impoverished more people than it enriched. In the early 2000s, too, the Dot.Com bubble fed a speculative tendency in people, and irrationality took over all reason.
          Read more →
        • Four Ways Quantum Computing Will Change Artificial Intelligence Forever

            If science were a dating app, quantum physics and machine learning probably wouldn’t be a match. They’re from completely different fields and often require completely different backgrounds and skills. But, throw in a little quantum computing and, suddenly, that science-matchmaking app becomes Tinder and the attraction between the two is palpable.


            (Credit: cmo.adobe.com/articles/2017/5/how-will-artificial-intelligence-impact-business-tlp-ptr.html#gs.5zlifl)

            Even though the extent of change that quantum computing will unleash on AI is up for debate, many experts now more than suspect that quantum computing will alter AI at some level. Analysts from bank holding company BBVA, for example, point toward the natural synergy between quantum computing and AI as the reason why quantum machine learning will eventually best classical machine learning.

            “Quantum machine learning can be more efficient than classic machine learning, at least for certain models that are intrinsically hard to learn using conventional computers,” says Samuel Fernández Lorenzo, a quantum algorithm researcher who collaborates with BBVA’s New Digital Businesses area. “We still have to find out to what extent do these models appear in practical applications.”
            Read more →
          • Reach Out Top Hadoop Consulting Companies To Leverage Big Data In 2020


              Hadoop is divided into modules, each of which performs a distinct task crucial to a computing system, and it is designed specifically for big data analytics. The Apache Software Foundation developed this platform, and developers worldwide use it extensively to build big data Hadoop solutions quickly and easily.

              Big data offers several perks: examining the root causes of failures, realizing the potential of data-driven marketing, improving customer engagement, and much more. By offering multiple solutions in a single stream, it helps lower an organization's costs.

              Big data is used across industries such as retail, manufacturing, financial insurance, education, transportation, agriculture, healthcare, and energy, which is why demand for it keeps expanding. The global Hadoop market is envisioned to grow to $84.6 billion by 2021, at an expected CAGR of 63.4%.
              Read more →
            • Could Quantum Computing Help Reverse Climate Change?

                The unique powers of quantum computation may give humanity an important weapon — or several weapons — against climate change, according to one quantum computer pioneer.
                One possible way to deal with the excess carbon in the atmosphere and reach global climate goals is to suck it out. That sounds easy enough, but the technology to do so cheaply isn’t quite here yet, according to Jeremy O’Brien, Chief Executive Officer of PsiQuantum, a quantum computing startup.

                There is currently no way to simulate large complex molecules, like carbon dioxide. Classical computers cannot simulate these types of molecules because the problem grows exponentially with the size or complexity of the simulated molecule, according to O’Brien, who outlined the issue in an article at the World Economic Forum’s recent annual meeting.

                “Crudely speaking, if simulating a molecule with 10 atoms takes a minute, a molecule with 11 takes two minutes, one with 12 atoms takes four minutes and so on,” he writes. “This exponential scaling quickly renders a traditional computer useless: simulating a molecule with just 70 atoms would take longer than the lifetime of the universe (13 billion years).”
                Read more →
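O’Brien’s back-of-the-envelope scaling can be checked directly. A quick sketch using the figures from his quote (1 minute at 10 atoms, doubling per extra atom):

```python
# Doubling time per extra atom, starting from 1 minute at 10 atoms,
# exactly as in the quoted example.
def sim_minutes(atoms: int, base_atoms: int = 10) -> float:
    return 2.0 ** (atoms - base_atoms)

minutes_per_year = 60 * 24 * 365
years_for_70_atoms = sim_minutes(70) / minutes_per_year
# 2^60 minutes is on the order of trillions of years, dwarfing the
# ~13-billion-year age of the universe cited in the article.
print(f"{years_for_70_atoms:.3g} years")
```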
              • The World’s Top 12 Quantum Computing Research Universities

                In just a few years, quantum computing and quantum information theory have gone from a fringe subject, offered in small classes at odd hours in the corner of the physics building annex, to a full complement of classes in well-funded programs held at quantum centers and institutes at leading universities.

                The question now for many would-be quantum computing students is not, “Are there universities that even offer classes in quantum computing?” but, rather, “Which universities are leaders in quantum computing research?”

                We’ll look at some of the best right now:

                The Institute for Quantum Computing — University of Waterloo


                The University of Waterloo can proudly declare that, while many universities avoided offering quantum computing classes like cat adoption agencies avoided adoption applications from the Schrodinger family, this Canadian university went all in.

                And it paid off.
                Read more →
              • Introducing One Ring — an open-source pipeline for all your Spark applications

                  If you utilize Apache Spark, you probably have a few applications that consume data from external sources and produce intermediate results, which are then consumed by applications further down the processing chain, and so on until you get a final result.


                  We know this because we have a similar pipeline with lots of processes, like this one:


                  A process flowchart with more than 50 applications and about 70 datasets


                  Each rectangle is a Spark application with its own set of execution parameters, and each arrow is an equally parametrized dataset (externally stored ones are highlighted in color; note the number of intermediate ones). This example is not the most complex of our processes; it’s a fairly simple one. And we don’t assemble such workflows manually: we generate them from Process Templates (outlined as groups on this flowchart).


                  So here comes One Ring, a Spark pipelining framework with very robust configuration abilities, which makes it easier to compose and execute even the most complex Process as a single large Spark job.


                  And we just made it open source. Perhaps you’re interested in the details.

                  We got you covered!
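The core idea, parametrized steps over named datasets, driven entirely by configuration, can be illustrated in a few lines. This is a toy sketch, not One Ring's actual API; the step names and parameters are invented:

```python
# Hypothetical pipeline config: each step names an operation, its input
# and output datasets, and its parameters (mirroring the flowchart above,
# where rectangles are apps and arrows are datasets).
pipeline = [
    {"name": "filter_signal", "input": "raw", "output": "signal",
     "params": {"min_value": 10}},
    {"name": "double", "input": "signal", "output": "result", "params": {}},
]

# Toy operations standing in for real Spark applications.
OPS = {
    "filter_signal": lambda rows, p: [r for r in rows if r >= p["min_value"]],
    "double": lambda rows, p: [r * 2 for r in rows],
}

def run(pipeline, datasets):
    # Execute steps in order, materializing each intermediate dataset
    # under its configured name.
    for step in pipeline:
        op = OPS[step["name"]]
        datasets[step["output"]] = op(datasets[step["input"]], step["params"])
    return datasets

datasets = run(pipeline, {"raw": [3, 10, 25]})
print(datasets["result"])  # [20, 50]
```

Because the whole chain is data, a Process Template can generate such configs instead of anyone wiring the steps by hand, which is the point of the framework.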
                • Five Methods For Database Obfuscation

                    ClickHouse users already know that its biggest advantage is high-speed processing of analytical queries. But claims like this need to be backed by reliable performance testing. That's what we want to talk about today.



                    We started running tests in 2013, long before the product was available as open source. Back then, just like now, our main concern was data processing speed in Yandex.Metrica. We had been storing that data in ClickHouse since January of 2009. Part of the data had been written to a database starting in 2012, and part was converted from OLAPServer and Metrage (data structures previously used by Yandex.Metrica). For testing, we took the first subset at random from data for 1 billion pageviews. Yandex.Metrica didn't have any queries at that point, so we came up with queries that interested us, using all the possible ways to filter, aggregate, and sort the data.

                    ClickHouse performance was compared with similar systems like Vertica and MonetDB. To avoid bias, testing was performed by an employee who hadn't participated in ClickHouse development, and special cases in the code were not optimized until all the results were obtained. We used the same approach to get a data set for functional testing.

                    After ClickHouse was released as open source in 2016, people began questioning these tests.

                    Read more →
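The approach described above, combining "all the possible ways to filter, aggregate, and sort the data", is naturally combinatorial. A sketch of generating such a query matrix (column and table names here are illustrative, not Yandex.Metrica's real schema):

```python
import itertools

# Hypothetical filter, aggregate, and sort fragments; the cross product
# yields one benchmark query per combination.
filters = ["", "WHERE CounterID = 34"]
aggregates = ["count()", "uniq(UserID)"]
orders = ["", "ORDER BY 1 DESC"]

queries = [
    " ".join(part for part in
             (f"SELECT {agg} FROM hits", f, o) if part)
    for f, agg, o in itertools.product(filters, aggregates, orders)
]
print(len(queries))  # 8 query variants
```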
                  • Machine Learning in Static Analysis of Program Source Code


                      Machine learning has become firmly entrenched in a variety of human fields, from speech recognition to medical diagnosis. The approach is so popular that people try to use it wherever they can. Some attempts to replace classical approaches with neural networks turn out to be unsuccessful. This time we'll look at machine learning from the perspective of creating effective static code analyzers for finding bugs and potential vulnerabilities.
                      Read more →
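To make the idea concrete: an ML-based analyzer needs to turn source code into features a model can learn from. A minimal sketch of one common representation, token-type n-grams, using Python's own tokenizer (this is an illustration of the general technique, not the method of any particular analyzer):

```python
import io
import tokenize
from collections import Counter

# Represent a code fragment as a bag of token-type bigrams, the kind of
# feature vector a classifier could learn bug patterns from.
def token_bigrams(source: str) -> Counter:
    toks = [tokenize.tok_name[t.type]
            for t in tokenize.generate_tokens(io.StringIO(source).readline)]
    return Counter(zip(toks, toks[1:]))

# "if a == a" is a classic copy-paste bug pattern a model might flag.
features = token_bigrams("if a == a:\n    pass\n")
print(features.most_common(3))
```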
                      • How Ecommerce Is Fueled by the Pillars of AI Technology



                        At present, artificial intelligence is implemented across the corridors of business operations, as well as in the way we shop and trade online. To hit a home run in the retail game, AI applications, PIM solutions, and e-commerce development tools now offer smart capabilities: predictive analysis, recommendation engines, inventory management, and warehouse automation, creating a more profitable shopping experience for consumers.

                        Now more than ever, e-commerce is an AI innovation game


                        Artificial intelligence often seems complicated to newbies, but in reality it is simple to use and gives you the ability to predict customer needs. This paves the way for e-commerce companies to become a “big brand” or “big business” with revolutionary AI tools.

                        Now that AI algorithms are paving the way for consumer acceptance of AI like never before, how can you use them to create more profitable outcomes in e-commerce?

                        Interesting E-commerce Stats:


                        With an estimated global population of 7.7 billion, 25 percent of people shop through e-commerce stores. According to Statista, 52% of e-commerce stores will have omnichannel capabilities by 2020, meaning they can communicate with and sell to their consumers via multiple channels: for example, an e-commerce website, a Facebook e-shop, an email account, and an Instagram account.

                        Here are examples of AI tools and PIM software that can help e-commerce businesses set a high bar for customer service and marketing:
                        Read more →
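The quoted figures imply a concrete number of shoppers; the arithmetic is a one-liner (figures exactly as quoted above):

```python
# 25% of an estimated 7.7 billion people shop through e-commerce stores.
world_population = 7.7e9
ecommerce_share = 0.25

shoppers = world_population * ecommerce_share
print(round(shoppers / 1e9, 3), "billion online shoppers")  # about 1.9 billion
```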
                      • Apache Hadoop Code Quality: Production VS Test


                          To get high-quality production code, it's not enough just to ensure maximum test coverage. Without a doubt, great results require the main project code and the tests to work together efficiently. Therefore, tests deserve as much attention as the main code. A decent test suite is a key success factor, as it will catch regressions in production. Let's take a look at PVS-Studio static analyzer warnings to see why errors in tests are no less important than errors in production code. Today's focus: Apache Hadoop.
                          Read more →
                        • Analyzing the Code of ROOT, Scientific Data Analysis Framework

                            While Stockholm was holding the 118th Nobel Week, I was sitting in our office, where we develop the PVS-Studio static analyzer, working on an analysis review of the ROOT project, a big-data processing framework used in scientific research. This code wouldn't win a prize, of course, but the authors can definitely count on a detailed review of the most interesting defects plus a free license to thoroughly check the project on their own.

                            Introduction



                            ROOT is a modular scientific software toolkit. It provides all the functionalities needed to deal with big data processing, statistical analysis, visualisation and storage. It is mainly written in C++. ROOT was born at CERN, at the heart of the research on high-energy physics. Every day, thousands of physicists use ROOT applications to analyze their data or to perform simulations.
                            Read more →
                          • What's new in ML.NET and Model Builder

                              We are excited to announce updates to Model Builder and improvements in ML.NET. You can learn more in the «What’s new in ML.NET?» session at .NET Conf.

                              ML.NET is an open-source and cross-platform machine learning framework (Windows, Linux, macOS) for .NET developers.

                              ML.NET offers Model Builder (a simple UI tool) and CLI to make it super easy to build custom ML Models using AutoML.

                              Using ML.NET, developers can leverage their existing tools and skill sets to develop and infuse custom AI into their applications by creating custom machine learning models for common scenarios like sentiment analysis, recommendation, image classification, and more.

                              Read more →
                            • How we created IoT system for managing solar energy usage

                                If you have no idea about the development architecture and the mechanical/electrical design behind IoT solutions, they can seem to have "seemingly supernatural qualities or powers". For example, if you showed a working IoT system to people from the 18th century, they'd think it was magic. This article is about busting that myth, or, to put it more technically, about hints for fine-tuning IoT development for an awesome project in the solar energy management area.

                                Read more →
                              • PVS-Studio Visits Apache Hive


                                  For the past ten years, the open-source movement has been one of the key drivers of the IT industry's development and a crucial component of it. The role of open-source projects is becoming more and more prominent, not only in quantity but also in quality, which changes how they are positioned on the IT market in general. Our courageous PVS-Studio team is not sitting idle: it takes an active part in strengthening the presence of open-source software by finding hidden bugs in the enormous depths of codebases and offering free license options to the authors of such projects. This article is just another piece of that activity! Today we are going to talk about Apache Hive. I've got the report, and there are things worth looking at.
                                  Read more →
                                • Contextual Emotion Detection in Textual Conversations Using Neural Networks


                                    Nowadays, talking to conversational agents is becoming a daily routine, and it is crucial for dialogue systems to generate responses that are as human-like as possible. One of the main aspects deserving primary attention is providing emotionally aware responses to users. In this article, we describe the recurrent neural network architecture for emotion detection in textual conversations that participated in SemEval-2019 Task 3 “EmoContext”, part of an annual workshop on semantic evaluation. The task objective is to classify emotions (i.e. happy, sad, angry, and others) in a 3-turn conversational data set.
                                    Read more →
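The task shape is easy to picture in code. A toy sketch of the 3-turn setup (the authors use a recurrent network; here a hypothetical keyword lexicon stands in purely to illustrate the input/output format):

```python
# Illustrative emotion cues; a real system would learn these from data
# with a recurrent encoder rather than use a hand-written lexicon.
LEXICON = {
    "happy": {"great", "glad", "awesome"},
    "angry": {"hate", "furious", "annoying"},
    "sad":   {"miss", "lonely", "crying"},
}

def classify(turn1: str, turn2: str, turn3: str) -> str:
    # EmoContext-style input is a full 3-turn context; the label describes
    # the emotion of the third turn ("others" when nothing matches).
    words = set(turn3.lower().split())
    for label, cues in LEXICON.items():
        if words & cues:
            return label
    return "others"

print(classify("How was the trip?", "It was fine.", "I am so glad I went!"))
# → happy
```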
                                  • How to speed up LZ4 decompression in ClickHouse?

                                      When you run queries in ClickHouse, you might notice that the profiler often shows the LZ_decompress_fast function near the top. What is going on? This question had us wondering how to choose the best compression algorithm.

                                      ClickHouse stores data in compressed form. When running queries, ClickHouse tries to do as little as possible, in order to conserve CPU resources. In many cases, all the potentially time-consuming computations are already well optimized, plus the user wrote a well thought-out query. Then all that's left to do is to perform decompression.



                                      So why does LZ4 decompression become a bottleneck? LZ4 seems like an extremely lightweight algorithm: the decompression rate is usually 1 to 3 GB/s per processor core, depending on the data. This is much faster than the typical disk subsystem. Moreover, we use all available CPU cores, and decompression scales linearly across all physical cores.
                                      Read more →
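Throughput figures like "1 to 3 GB/s per core" come from measurements of exactly this kind. A minimal sketch of how to measure decompression throughput, using stdlib zlib as a stand-in codec (LZ4 itself, e.g. via the third-party `lz4` package, decompresses far faster than zlib):

```python
import time
import zlib

# Highly compressible payload, as analytical column data often is.
payload = b"ClickHouse stores data in compressed form. " * 100_000
blob = zlib.compress(payload)

start = time.perf_counter()
out = zlib.decompress(blob)
elapsed = time.perf_counter() - start

assert out == payload
print(f"{len(payload) / elapsed / 1e6:.0f} MB/s decompressed")
```

Swapping in an LZ4 binding and running on realistic column data would reproduce the per-core numbers quoted above.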