15.3
Rating

Data Mining *

Deep data analysis


Millions of orders per second matching engine testing

C++ *Data Mining *Big Data *Data Engineering *
Sandbox

Some time ago I worked on a matching engine for a cryptocurrency exchange, which I developed from scratch in pure C++. It was an interesting and challenging experience, and testing the engine turned out to be a challenge of its own: you need to obtain test data, run the tests, collect statistics, and finally analyze the collected data to find weak points and bottlenecks. In this article I focus on testing the C++ matching engine and show how testing can suggest optimizations even without changing the code. The matching engine I developed handles more than 1,000,000 TPS (transactions per second) and is 10 times faster than the matching engine of the Binance cryptocurrency exchange (see one post on the Binance Blog).
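To make the "feed data, run, collect statistics" loop concrete, here is a minimal sketch in Python. It is a toy stand-in, not the author's C++ engine: the functions `match` and `measure_tps`, the synthetic order generator, and the price and quantity ranges are all assumptions made for illustration.

```python
import heapq
import random
import time


def match(orders):
    """Match limit orders with price-time priority; return the number of fills."""
    bids, asks = [], []  # bids: max-heap via negated price; asks: min-heap
    trades = 0
    for seq, (side, price, qty) in enumerate(orders):
        if side == "buy":
            # Cross against the cheapest resting asks while the price allows it.
            while qty > 0 and asks and asks[0][0] <= price:
                ask_price, ask_seq, ask_qty = heapq.heappop(asks)
                fill = min(qty, ask_qty)
                qty -= fill
                trades += 1
                if ask_qty > fill:
                    # Put the unfilled remainder back with its original sequence number.
                    heapq.heappush(asks, (ask_price, ask_seq, ask_qty - fill))
            if qty > 0:
                heapq.heappush(bids, (-price, seq, qty))
        else:
            # Cross against the highest resting bids while the price allows it.
            while qty > 0 and bids and -bids[0][0] >= price:
                neg_price, bid_seq, bid_qty = heapq.heappop(bids)
                fill = min(qty, bid_qty)
                qty -= fill
                trades += 1
                if bid_qty > fill:
                    heapq.heappush(bids, (neg_price, bid_seq, bid_qty - fill))
            if qty > 0:
                heapq.heappush(asks, (price, seq, qty))
    return trades


def measure_tps(n_orders=100_000, seed=42):
    """Generate synthetic orders, time the matching loop, return (orders/sec, fills)."""
    rng = random.Random(seed)
    orders = [
        (rng.choice(("buy", "sell")), rng.randint(95, 105), rng.randint(1, 10))
        for _ in range(n_orders)
    ]
    start = time.perf_counter()
    trades = match(orders)
    elapsed = time.perf_counter() - start
    return n_orders / elapsed, trades


if __name__ == "__main__":
    tps, trades = measure_tps()
    print(f"{tps:,.0f} orders/s, {trades} fills")
```

Keeping order generation outside the timed region means the measurement covers only the matching loop. The absolute number from a Python toy like this says nothing about a real C++ engine.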

Read more
Total votes 5: ↑5 and ↓0 +5
Views 3.7K
Comments 1

Benefits of Hybrid Data Lake: How to combine Data Warehouse with Data Lake

NIX corporate blog Data Mining *Data Engineering *

Hey, hey! I am Ilya Kalchenko, a Data Engineer at NIX and a fan of Python and of data processing, big and small. In this article, I want to discuss the benefits of hybrid data lakes for efficient and secure data organization.

To begin with, let's clarify the concepts of Data Warehouse and Data Lake, then delve into their use cases and delimit their areas of responsibility.

Read more
Rating 0
Views 1.1K
Comments 0

Coins classifier Neural Network: Head or Tail?

Python *Data Mining *Big Data *Data Engineering *TensorFlow *

Home of this article: https://robotics.snowcron.com/coins/02_head_or_tail.htm

The overall goal of this series is to build a coin classifier capable of scanning your pocket change and finding rare or valuable coins. This is the second article in the series, so let me remind you what happened earlier (https://habr.com/ru/post/538958/).

In the previous step we obtained a rather large dataset of image pairs downloaded from the online coin site meshok.ru. Those images were uploaded by people we do not know, and although each pair is supposed to show the coin's head in one image and its tail in the other, we cannot rule out pairs with two heads and no tail, or vice versa. Also, at the moment we have no idea which image shows the head and which shows the tail, and this may matter when we feed data to our final classifier.

So let's write a program to distinguish heads from tails. It is a rather simple task for a convolutional neural network that uses transfer learning.

As before, we are going to use the Google Colab environment, taking advantage of the free GPU it gives us access to. We will store data on Google Drive, so the first thing we need is to allow Colab to access the Drive:
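The article's own snippet is cut off in this preview. A minimal sketch of the usual way to mount Drive from a Colab notebook might look like the following. The helper name `mount_drive` and the mount point `/content/drive` are assumptions for illustration, and the `google.colab` module is only available inside Colab.

```python
def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive when running inside Colab; return True if mounted."""
    try:
        # This import only succeeds inside a Colab runtime.
        from google.colab import drive
    except ImportError:
        # Outside Colab there is nothing to mount.
        return False
    # Inside Colab this asks the user to authorize access to their Drive.
    drive.mount(mount_point)
    return True


if __name__ == "__main__":
    if mount_drive():
        print("Drive is available under /content/drive")
    else:
        print("Not running in Colab; skipping the Drive mount")
```

Wrapping the call in a helper lets the same notebook code run locally, where it simply skips the mount.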

Read more
Rating 0
Views 673
Comments 0

How PVS-Studio Checked ELKI in January

PVS-Studio corporate blog Open source *Java *Data Mining *

If you feel like the New Year just came, and you missed the first half of January, then all this time you've been busy looking for tricky bugs in the code you maintain. It also means that our article is what you need. PVS-Studio has checked the ELKI open source project to show you errors that may occur in the code, how cunningly they can hide there, and how you can deal with them.



Read more →
Total votes 3: ↑3 and ↓0 +3
Views 463
Comments 1

Crime, Race and Lethal Force in the USA — Part 3

Open source *Python *Data Mining *Big Data *
Translation
This is the concluding part of my article devoted to a statistical analysis of police shootings and criminality among the white and the black population of the United States. In the first part, we talked about the research background, goals, assumptions, and source data; in the second part, we investigated the national use-of-force and crime data and tracked their connection with race.
Read more →
Total votes 3: ↑3 and ↓0 +3
Views 1.4K
Comments 0

Crime, Race and Lethal Force in the USA — Part 1

Open source *Python *Data Mining *
Translation

Do the police in the US really shoot black people more often than white people? Is the use of lethal force connected with race? How is crime related to race? What are the odds of getting shot by the police if you are white and if you are black? We take public data and Python with pandas to shed some light on these questions, setting propaganda and politics aside.
Read more →
Total votes 7: ↑5 and ↓2 +3
Views 2.1K
Comments 1

10 Best Email Scraping Tools for Sales Prospecting in 2020

Data Mining *
Sandbox
We all know how hard it is to build an email sales list from scratch, especially for small companies with limited resources and few options. In fact, many companies buy ready-made profiled lists from third parties and send identical mass emails. The low quality of such lists can put your business in a vulnerable position. However, there is a better way to build a highly targeted email list: email scraping tools.

Email scraping uses a bot to collect publicly shown email addresses. What makes this great is that you control where the email lists come from and who can opt in. Moreover, you don't have to rely on second-hand sources. I have compiled a list of the 10 best email scraping tools for sales prospecting. Let's take a look.

1. Zoominfo

A full-featured email scraping platform with a comprehensive database. You can search for titles and companies directly within the platform. It works more like a directory that covers professionals in all industries, with contact information. Email lists are its main asset. That said, it comes with a price tag, but it is worth the investment if you are looking for accurate sales leads. Zoominfo is an excellent option for enterprise-level sales prospecting.

Read more →
Total votes 3: ↑3 and ↓0 +3
Views 1.2K
Comments 0

How to find an English teacher. Part 1

Python *Programming *Data Mining *Data visualization Natural Language Processing *


In the modern world, ideas about using data science for extra benefit keep popping up. For instance, Google can use your history of watched videos to recommend new ones, and online shops use recommendation systems to increase the size of your purchase. However… if companies use data for their own benefit, could we do the same for our own needs, such as looking for an online English teacher?


Disclaimer

This approach is based on my own experience and may not suit your point of view, ideas, or principles.

Total votes 2: ↑1 and ↓1 0
Views 1.2K
Comments 0

Approach to calculating individual risk in COVID-19

Python *Data Mining *C *

In February 2020, when the disease came to Europe, it became apparent to me that our timid hopes that the epidemic would subside and be finally buried in China's soil were ruined. It was already evident from the Chinese statistics that the virus was lethal enough to scare and mild enough to pass unnoticed in many cases, which guaranteed its effective spread. The question was when it would reach each next country.

Another question was individual risk, especially the risk of death for someone who contracts the virus. An average figure of around 5% was circulating in late January and early February. It was known that males were more susceptible to fatal outcomes. By February, it was also evident that the virus doesn't kill only the elderly: middle-aged people were significantly affected as well.

Read more →
Rating 0
Views 882
Comments 0

COVID YAAA! or Yet Another Analyze Attempt

Data Mining *R *Data visualization Machine learning *Health



Hello, Habr!


About a month ago, I developed a feeling of constant anxiety. I began to eat poorly, sleep even worse, and constantly read tons of news about the pandemic. According to it, the coronavirus was either capturing or liberating our planet, was either a conspiracy of world governments or the vengeance of the pangolin, and threatened either everyone at once or me and my sleeping cat personally…


Hundreds of articles, social media posts, and youtube-telegram-instagram-tik-tok (yes, I sin) content of varying quality led me to nothing but an even greater sense of anxiety.


But one day I bought buckwheat and decided to end it all. As soon as possible!

What did you do?
Total votes 1: ↑0 and ↓1 -1
Views 983
Comments 0

«Build it & Break it»: How some algorithms generate captcha, while others crack it

Data Mining *
Sandbox
Hello, Habr! Let me present a translation of the article "«Ломай меня полностью!» Как одни алгоритмы генерируют капчу, а другие её взламывают" by miroslavmirm.

No matter what kind of intelligence you have, artificial or natural, after this detailed analysis no captcha will be an obstacle. At the end of the article, you can find the simplest and most effective workaround.

CAPTCHA is a completely automated public Turing test to tell computers and humans apart: it automatically sets specific tasks that are difficult for computers but simple for humans. This technology has become the security standard for preventing automated voting, registration, spam, brute-force attacks on websites, etc.
Read more →
Rating 0
Views 2K
Comments 0

Using Data Science for house hunting in Montreal

Data Mining *R *DIY

Introduction


I happen to live in Montreal, in my condo on the edge of the McGill Ghetto, close to Saint Laurent Boulevard, or the Main as locals call it, with all its attractions: bars, restaurants, night clubs, drunken students. Once upon a time, on a particularly lively night, listening to the sounds of McGill frosh students drunkenly heading home after a hard night of studying, I thought that it might be a good idea to move into my own house, a little bit further away from the action.



Read more →
Total votes 10: ↑9 and ↓1 +8
Views 3.9K
Comments 0

Free API Moscow Stock Exchange (MOEX) in Google Sheets

Data Mining *Algorithms *API *Google API *Finance in IT
Last year the number of private investors on the Moscow Stock Exchange (MOEX) doubled and reached 3.86 million: about 1.9 million people opened accounts at MOEX in 2019. The Saint Petersburg Stock Exchange, which specializes in trading shares of foreign companies, saw its number of accounts more than triple over the past year, from 910,000 to 3.06 million.



This means that almost 2 million newbies without any actual trading experience and lacking any specialized software for trading/position analysis have entered the market.
Read more →
Total votes 4: ↑4 and ↓0 +4
Views 6.3K
Comments 1

Machine Learning for your flat hunt. Part 2

Python *Programming *Data Mining *Data visualization Machine learning *


Have you thought about the influence of the nearest metro station on the price of your flat?
What about several kindergartens around your apartment? Are you ready to plunge into the world of geospatial data?


The world provides so much information…



Read more →
Total votes 4: ↑4 and ↓0 +4
Views 1.3K
Comments 0

Contextual Emotion Detection in Textual Conversations Using Neural Networks

VK corporate blog Python *Data Mining *Big Data *Machine learning *

Nowadays, talking to conversational agents is becoming a daily routine, and it is crucial for dialogue systems to generate responses that are as human-like as possible. One of the main aspects deserving primary attention is providing emotionally aware responses to users. In this article, we describe a recurrent neural network architecture for emotion detection in textual conversations that participated in SemEval-2019 Task 3 “EmoContext”; SemEval is an annual workshop on semantic evaluation. The task objective is to classify the emotion (happy, sad, angry, or others) in a 3-turn conversational data set.
Read more →
Total votes 37: ↑37 and ↓0 +37
Views 2.7K
Comments 0

How do you choose products in stores?

SAS corporate blog Data Mining *Machine learning *Social networks and communities
Translation

The most important single ingredient in the formula of success is knowing how to get along with people. Theodore Roosevelt

In the previous article I tried to cover the basics of pricing analytics. Now I'd like to talk about something more interesting.

Have you ever thought about why you choose certain products in stores, why you prefer them to other similar ones? Many shopping trips are spontaneous, so it's probably impossible to give a clear answer for all the times you go shopping. But the general idea is obvious: you go shopping for a specific reason (to get food, a gadget, for entertainment, to play blackjack). In this article I'm going to use available data from grocery retailers to talk about how a set of basic logical assumptions and community analysis can help us determine the way customers choose products.
Read more →
Total votes 10: ↑8 and ↓2 +6
Views 1.2K
Comments 1

A selection of Datasets for Machine learning

Python *Data Mining *Open data *Machine learning *Artificial Intelligence
Hi guys,

This article is a guide to open datasets for machine learning. To start with, I will gather a selection of interesting and (relatively) fresh datasets. As a bonus, at the end of the article I will attach useful links for finding datasets on your own.

Fewer words, more data.


A selection of datasets for machine learning:


Read more →
Total votes 12: ↑11 and ↓1 +10
Views 6.1K
Comments 2
