• Tarantool: an analyst's view

      Hi all! I'm Andrey Kapustin. I work as a system analyst at Mail.ru Group. Our products form a unified ecosystem. Many independent infrastructures generate data in it: taxi and food delivery services, email services, social networks, etc. The faster and more precisely we can predict a client's needs, the sooner and more accurately we can offer our products.

      Many system analysts and engineers are keen to know: 

      1. How to design the architecture of a trigger platform for real-time marketing?
      2. How to arrange a data structure that would be in line with the requirements of a marketing strategy for interacting with clients?
      3. How to ensure the stable operation of the system under very heavy workloads?

      Such systems are based on high-load processing and Big Data analysis technologies. We have accumulated considerable experience in these areas, and our expertise is in high demand on the market. I'm going to show how we help our customers switch from offline to online in their interactions with clients using Real-Time Marketing solutions based on Tarantool.
      Read more →
    • Visualizing Network Topologies: Zero to Hero in Two Days

      • Translation

      Hey everyone! This is a follow-up article on a local Cisco Russia DevNet Marathon online event I attended in May 2020. It was a series of educational webinars on network automation followed by daily challenges based on the discussed topics.
      On the final day, the participants were challenged to automate the topology analysis and visualization of an arbitrary network segment and, optionally, to track and visualize the changes.


      The task was definitely not trivial and not widely covered in public blog posts. In this article, I would like to break down my own solution that finally took first place and describe the selected toolset and considerations.

      Let's get started.


      Read more →
    • Big / Bug Data: Analyzing the Apache Flink Source Code

        Applications used in the field of Big Data process huge amounts of information, and this often happens in real time. Naturally, such applications must be highly reliable so that no error in the code can interfere with data processing. To achieve high reliability, one needs to keep a wary eye on the code quality of projects developed for this area. The PVS-Studio static analyzer is one of the solutions to this problem. Today, the Apache Flink project developed by the Apache Software Foundation, one of the leaders in the Big Data software market, was chosen as a test subject for the analyzer.
        Read more →
      • The Rules for Data Processing Pipeline Builders


          "Come, let us make bricks, and burn them thoroughly."
          – legendary builders

          You may have noticed by 2020 that data is eating the world. And whenever any reasonable amount of data needs processing, a complicated multi-stage data processing pipeline will be involved.


          At Bumble — the parent company operating Badoo and Bumble apps — we apply hundreds of data transforming steps while processing our data sources: a high volume of user-generated events, production databases and external systems. This all adds up to quite a complex system! And just as with any other engineering system, unless carefully maintained, pipelines tend to turn into a house of cards — failing daily, requiring manual data fixes and constant monitoring.


          For this reason, I want to share certain good engineering practises with you, ones that make it possible to build scalable data processing pipelines from composable steps. While some engineers understand such rules intuitively, I had to learn them by doing, making mistakes, fixing, sweating and fixing things again…


          So behold! I bring you my favourite Rules for Data Processing Pipeline Builders.

          Read more →
        • Spring Boot app with Apache Kafka in Docker container

          • Translation
          • Tutorial

          Privet, comrades!

          In this article I'll show how easy it is to set up a Spring Java app with the Kafka message broker. We will use Docker containers for the Kafka ZooKeeper/broker apps and configure plaintext authorization for access from both local and external networks.

          A link to the final project on GitHub can be found at the end of the article.

          Read more
        • Patroni cluster (with Zookeeper) in a docker swarm on a local machine

          • Tutorial

          Anyone who stores crucial data (in SQL databases in particular) can hardly avoid thinking about building some kind of safe cluster, a distant guardian that protects consistency and availability at all times. Even if the main server with your precious database gets knocked out for good, the show must go on, right? This basically means the database must still be available, and its data must be up to date with the data on the failed server.

          As you might have noticed, there are dozens of ways to go, and Patroni is just one of them. There are plenty of articles providing a more or less detailed comparison of the available options, so I assume I'm free to skip the part where I lure you over to Patroni's side. Let's start from the point where, among other options, you are already leaning towards Patroni and are willing to try it out in a more or less realistic setup.

          I am not originally a DevOps engineer, so when the need for a high-availability cluster arose, I hit every single bump on the road. I hope this tutorial helps you get the job done with ease! If you don't want any more explanations, jump right in. Otherwise, you might want to read some more notes on the setup I went with.

          Read more
        • OPPO, Huawei, Xiaomi. Chinese app stores join forces to take on Google

            Major players in the Chinese app market are joining forces to take on the almighty Google Play store. Xiaomi, Oppo and Vivo are reportedly launching the Global Developer Service Alliance (GDSA), a platform that allows Android developers to publish their apps in the partner stores with a single upload.

            The GDSA is expected to launch in nine countries—including India, Indonesia, Malaysia, Russia, Spain, Thailand, the Philippines, and Vietnam—although paid app support may vary across the regions. Canalys’ Nicole Peng explains the wide reach of this alliance:

            By forming this alliance each company will be looking to leverage the others’ advantages in different regions, with Xiaomi’s strong user base in India, Vivo and Oppo in Southeast Asia, and Huawei in Europe. 

            Read more
          • Linux Switchdev the Mellanox way

              This is a transcription of a talk that was presented at CSNOG 2020 — video is at the end of the page



              Greetings! My name is Alexander Zubkov. I work at Qrator Labs, where we protect our customers against DDoS attacks and provide BGP analytics.

              We started using Mellanox switches around 2 or 3 years ago. At that time we got acquainted with Switchdev in Linux, and today I want to share our experience with you.
              Read more →
            • 5 Thought-Provoking Use Cases Of Blockchain In Diverse Industries

                Blockchain is a decentralized technology that maintains a record of all transactions occurring over a peer-to-peer network. Because of its many high-level use cases, numerous industries have described blockchain as 'magic beans.'

                Blockchains store the record in a decentralized system that is interconnected. This technology lessens vulnerability and enhances transparency in all industrial sectors as information is stored digitally, and it does not have any centralized point to carry out the transactions.

                Do You Know?

                Read more
              • Agreements as Code: how to refactor IaC and save your sanity?


                  Before we start, I'd like to get on the same page with you. So, could you please answer? How much time will it take to:


                  • Create a new environment for testing?
                  • Update Java & OS in the Docker image?
                  • Grant access to servers?

                  Here is a spoiler from TechLeadConf. Unfortunately, it's in Russian.


                  It will take longer than you expect. I will explain why.

                  Read more →
                • MySQL 8.x Group Replication (Master-Slave) with Docker Compose

                    This post covers the following situation: how to set up simple dockerized MySQL services with group replication. In our case, we’ll take the latest MySQL (version 8.x.x).

                    FYI: all the code mentioned (working and manually tested) is located here.

                    I will skip uninteresting steps like ‘what are MySQL and Docker, and why did we choose them’. We want to set up a DB that is as trouble-proof as possible. That’s our plan.

                    Read more →
                  • InterSystems IRIS – the All-Purpose Universal Platform for Real-Time AI/ML

                      Author: Sergey Lukyanchikov, Sales Engineer at InterSystems

                      Challenges of real-time AI/ML computations


                      We will start with examples that we have faced in the Data Science practice at InterSystems:

                      • A “high-load” customer portal is integrated with an online recommendation system. The plan is to reconfigure promo campaigns at the level of the entire retail network (we will assume that a “segment-tactic” matrix will be used instead of a “flat” promo campaign master). What will happen to the recommender mechanisms? What will happen to data feeds and updates into the recommender mechanisms (with the volume of input data having increased 25,000 times)? What will happen to the recommendation rule generation setup (given the need to lower the recommendation rule filtering threshold 1,000 times due to a thousandfold increase in the volume and “assortment” of the rules generated)?
                      • An equipment health monitoring system uses “manual” data sample feeds. Now it is connected to a SCADA system that transmits thousands of process parameter readings each second. What will happen to the monitoring system (will it be able to handle equipment health monitoring on a second-by-second basis)? What will happen once the input data receives a new block of several hundred columns with readings from sensors recently added to the SCADA system (will it be necessary to shut down the monitoring system to integrate the new sensor data into the analysis, and for how long)?
                      • In a complex of AI/ML mechanisms (recommendation, monitoring, forecasting), each mechanism depends on the others’ results. How many man-hours will it take every month to adapt the functioning of those AI/ML mechanisms to changes in the input data? What is the overall “delay” in the AI/ML mechanisms’ support of business decision making (the refresh frequency of supporting information against the feed frequency of new input data)?

                      Read more →
                    • The 2020 National Internet Segment Reliability Research


                        The National Internet Segment Reliability Research explains how the outage of a single Autonomous System might affect the connectivity of the impacted region with the rest of the world. Most of the time, the most critical AS in the region is the dominant ISP on the market, but not always.

                        As the number of alternate routes between ASes increases (and do not forget that the Internet stands for “interconnected network”, and each network is an AS), so do the fault tolerance and stability of the Internet across the globe. Although some paths are more important than others from the beginning, establishing as many alternate routes as possible is the only viable way to ensure an adequately robust network.

                        The global connectivity of any given AS, regardless of whether it is an international giant or a regional player, depends on the quantity and quality of its paths to Tier-1 ISPs.

                        Usually, Tier-1 implies an international company offering global IP transit service over connections with other Tier-1 providers. Nevertheless, there is no guarantee that such connectivity will be maintained all the time. For many ISPs at all “tiers”, losing connection to just one Tier-1 peer would likely render them unreachable from some parts of the world.
                        Read more →
                      • The hunt for vulnerability: executing arbitrary code on NVIDIA GeForce NOW virtual machines

                          Introduction


                          Against the backdrop of the coronavirus pandemic, the demand for cloud gaming services has noticeably increased. These services provide computing power to launch video games and stream gameplay to user devices in real-time. The most obvious advantage of this gaming type is that gamers do not need to have high-end hardware. An inexpensive computer is enough to run the client, spending time in self-isolation while the remote server carries out all calculations.

                          NVIDIA GeForce NOW is one of these cloud-based game streaming services. According to Google Trends, worldwide search queries for GeForce NOW peaked in February 2020. This correlates with the beginning of quarantine restrictions in many Asian, European, and North and South American countries, as well as other world regions. At the same time in Russia, where the self-isolation regime began in March, we see a similar picture with a corresponding delay.

                          Given the high interest in GeForce NOW, we decided to explore this service from an information security standpoint.
                          Read more →
                        • How Can AI & Data Science Help to Fight the Coronavirus?

                          Do you know that AI can save us from a worldwide pandemic?

                          Yeah, it's true. Researchers around the world have touted that these two buzzing technologies can provide substantial social benefit in this worldwide health crisis.

                          Before I begin, I would like to take this moment to say THANK YOU to all our COVID-19 Warriors standing on the frontline and working day and night for us. We can’t thank them enough. Our healthcare staff, police, scientists, security guards, and sweepers: their contribution is overwhelming and commendable.

                          Discovering a drug for any disease demands the joint efforts of the world's brightest minds. The process is notoriously long, complicated, and expensive. That is how health experts are approaching the search for a COVID-19 medicine. In the midst of such a crisis, artificial intelligence solutions are offering new hope that a cure might appear faster.
                          Read more →
                        • Data Science vs AI: All You Need To Know

                            What do these terms mean? And what is the difference?


                            Data Science and Artificial Intelligence are creating a lot of buzz these days. But what do these terms mean? And what is the difference between them?

                            While the terms Data Science and Artificial Intelligence (AI) come under the same domain and are interconnected, they have their own specific applications and meanings.

                            There’s no slowing down the spread of AI and data science. Many tech giants are investing heavily in these technologies. According to a recent survey, artificial intelligence could add $15.7 trillion to the global economy by 2030.

                            In this piece, I will explain the concepts of AI and data science and their differences in detail. So, without wasting any more time, let’s get started!
                            Read more →
                          • Using kconfig for own projects

                              Intro


                              Every Linux professional writes scripts. Sometimes they are light and linear; sometimes they are complex scripts with functions and libs (yes, you can write your own bash library for use in other scripts).


                              But some scripts need a configuration file to work. For instance, I wrote a script that builds an Ubuntu image for PXE, and I need to change the build process without changing the build script. The best way to solve this is to add configuration files.

                              Read more →
                            • IIoT platform databases – How Mail.ru Cloud Solutions deals with petabytes of data coming from a multitude of devices



                                Hello, my name is Andrey Sergeyev and I work as Head of IoT Solution Development at Mail.ru Cloud Solutions. We all know there is no such thing as a universal database, especially when the task is to build an IoT platform capable of processing millions of events from various sensors in near real time.

                                Our product, Mail.ru IoT Platform, started as a Tarantool-based prototype. I’m going to tell you about our journey, the problems we faced and the solutions we found. I will also show you the current architecture of a modern Industrial Internet of Things platform. In this article we will look into:

                                • our requirements for the database, universal solutions, and the CAP theorem
                                • whether the “database + application server in one” approach is a silver bullet
                                • the evolution of the platform and the databases used in it
                                • the number of Tarantools we use and how we came to this
                                Read more →
                              • The Project «Fabula»: How to find the desired video-fragment or person in a pile of video files?

                                  Anyone well over 20 has already accumulated a huge film library of their life, along with videos from friends, relatives, and their place of work… Finding someone or something specific in it is no longer possible. Recently, I spent a week preparing a video compilation for my daughter's anniversary. The media are even more overloaded with video archives. And every day, millions of terabytes of video content appear in the world. And this is in the era of BIG DATA.

                                  Read more →