• Benefits of a Hybrid Data Lake: How to Combine a Data Warehouse with a Data Lake

      Hey, hey! I am Ilya Kalchenko, a Data Engineer at NIX, a fan of processing data big and small, and of Python. In this article, I want to discuss the benefits of hybrid data lakes for efficient and secure data organization.

      To begin with, I invite you to sort out the concepts of Data Warehouses and Data Lakes. Let's delve into the use cases and delimit their areas of responsibility.

      Read more
    • Q1 2021 DDoS attacks and BGP incidents

        The year 2021 started on a high note for Qrator Labs: on January 19, our company celebrated its 10th anniversary. Shortly after, in February, our network mitigated quite an impressive 750 Gbps DDoS attack based on old and well-known DNS amplification. Furthermore, there is a constant flow of BGP incidents, some of which turn into global routing anomalies. We have started reporting them on the newly created Qrator.Radar Twitter account.

        Nevertheless, with the first quarter of the year over, we can take a closer look at DDoS attack statistics and BGP incidents for January through March 2021.

        Read more
      • You are standing at a red light at an empty intersection. How to make traffic lights smarter?

          Types of smart traffic lights: adaptive and neural-network-based

          Adaptive control works at relatively simple intersections, where the rules and options for switching phases are fairly obvious. It is only applicable where the load is not constant in all directions; otherwise it simply has nothing to adapt to, since there are no free time windows. The first adaptively controlled intersections appeared in the United States in the early 1970s. Unfortunately, they have reached Russia only now, and by some estimates their number across the country does not exceed 3,000.

          Neural networks are a higher level of traffic regulation. They take many factors into account at once, and not all of them are obvious. Their result is based on self-learning: the computer receives live throughput data and searches all possible phase plans for the one that lets the largest total number of vehicles from all directions pass per unit of time in a comfortable mode. When asked how this is done, programmers usually answer that they do not know, since the neural network is a black box. But we will reveal the basic principles to you…
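          As a toy illustration (not from the article), here is a sketch of the underlying optimization idea in Python: given hypothetical arrival rates per approach, pick the green-time split that serves the most vehicles per cycle. All names and numbers below are made up.

              CYCLE = 90            # cycle length, seconds
              SATURATION = 0.5      # vehicles per second of green that a lane can discharge

              arrivals = {          # hypothetical measured arrivals, vehicles per second
                  "north_south": 0.30,
                  "east_west": 0.20,
              }

              def served(green_ns):
                  """Total vehicles served per cycle for a given north-south green time."""
                  green_ew = CYCLE - green_ns
                  ns = min(arrivals["north_south"] * CYCLE, SATURATION * green_ns)
                  ew = min(arrivals["east_west"] * CYCLE, SATURATION * green_ew)
                  return ns + ew

              # Brute-force search over candidate splits -- a stand-in for what an adaptive
              # controller or a trained model does continuously with live detector data.
              best_green_ns = max(range(10, CYCLE - 10), key=served)
              print(best_green_ns, served(best_green_ns))

          A real controller would rerun such a search continuously as detector readings change; a neural network replaces the explicit model with learned behaviour.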

          Adaptive traffic lights, at least those deployed by the leading companies in Russia, use rather outdated technology for counting vehicles at intersections: physical sensors or a video background detector. A capacitive sensor or an induction loop sees a vehicle only at the installation site, within a few meters, unless of course you spend millions on laying them along the entire length of the roadway. A video background detector only shows how full the roadway is relative to that roadway. The camera must see the area clearly, which is quite difficult at long distances because of perspective, and it is highly susceptible to atmospheric interference: even a light snowstorm will be diagnosed as traffic, since the background video detector does not distinguish the type of detection.

          Read more
        • 11 Kubernetes implementation mistakes – and how to avoid them


            I manage a team that designs and introduces an in-house Kubernetes as a Service at Mail.ru Cloud Solutions. We often see a lack of understanding of this technology, so I'd like to talk about common strategic mistakes made when implementing Kubernetes in major projects.

            Most of the problems arise because the technology is quite sophisticated. There are non-obvious implementation and operation challenges, as well as advantages that go underused, all of which result in lost money. Another issue is the global lack of knowledge of and experience with Kubernetes: learning it by the book can be tricky, and hiring qualified staff can be challenging. All the hype complicates Kubernetes-related decision making. Curiously enough, Kubernetes is often implemented rather formally: just so it is there and somehow makes life better.

            Hopefully, this post will help you make a decision you will feel proud of later (and won't regret or feel like building a time machine to undo).
            Read more →
          • Building projects (CI/CD): tools

              In some projects, the build script plays the role of Cinderella. The team focuses its main effort on code development, while the build process itself may be handled by people far removed from development (for example, those responsible for operations or deployment). If the build script works at all, everyone prefers not to touch it, and nobody thinks about optimization. However, in large heterogeneous projects the build process can be quite complex, and it is worth approaching it as an independent project. If you treat the build script as a secondary, unimportant project, the result will be an indigestible imperative script that is hard to maintain.
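              For illustration only (this is not the toolkit chosen in the article): the difference between one imperative script and treating the build as a project can be sketched as a small declarative task graph. All task names and commands here are hypothetical.

                  import subprocess

                  # Hypothetical build described as data: each task lists its dependencies
                  # and the command it runs.
                  TASKS = {
                      "generate": {"deps": [], "cmd": ["python", "gen_sources.py"]},
                      "compile":  {"deps": ["generate"], "cmd": ["make", "-j4"]},
                      "test":     {"deps": ["compile"], "cmd": ["pytest", "tests"]},
                      "package":  {"deps": ["compile", "test"],
                                   "cmd": ["tar", "czf", "app.tar.gz", "build"]},
                  }

                  def run(task, done=None):
                      """Run a task after its dependencies, skipping anything already done."""
                      done = set() if done is None else done
                      if task in done:
                          return
                      for dep in TASKS[task]["deps"]:
                          run(dep, done)
                      subprocess.run(TASKS[task]["cmd"], check=True)
                      done.add(task)

                  run("package")

              Real build tools add caching, up-to-date checks, and parallelism on top of exactly this kind of dependency graph, which is why the toolkit is worth choosing carefully.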


              In this note we will take a look at the criteria by which we chose the toolkit, and in the next one, at how we use this toolkit. (There is also a Russian version.)


              CI/CD (opensource.com)

              Read more →
            • Distributed Artificial Intelligence with InterSystems IRIS

                Author: Sergey Lukyanchikov, Sales Engineer at InterSystems

                What is Distributed Artificial Intelligence (DAI)?

                Attempts to find a “bullet-proof” definition have not produced a result: it seems the term is slightly “ahead of its time”. Still, we can analyze the term itself semantically and derive that distributed artificial intelligence is the same AI (see our effort to suggest an “applied” definition), only partitioned across several computers that are not clustered together (neither data-wise, nor via applications, nor by providing access to particular computers in principle). Ideally, distributed artificial intelligence should be arranged in such a way that none of the computers participating in that “distribution” has direct access to the data or applications of another computer: the only alternative is the transmission of data samples and executable scripts via “transparent” messaging. Any deviation from that ideal leads to “partially distributed artificial intelligence”, for example distributed data with a central application server, or the other way around. One way or the other, we end up with a set of “federated” models (i.e., models each trained on their own data sources, or each trained by their own algorithms, or “both at once”).

                Distributed AI scenarios “for the masses”

                We will not be discussing edge computing, confidential data operators, scattered mobile searches, or similar fascinating but not yet widely and deliberately applied scenarios. We will be much “closer to life” if we consider, for instance, the following scenario (a detailed demo of it can and should be watched here): a company runs a production-level AI/ML solution, and the quality of its functioning is systematically checked by an external data scientist (i.e., an expert who is not an employee of the company). For a number of reasons, the company cannot grant the data scientist access to the solution, but it can send him a sample of records from a required table on a schedule or after a particular event (for example, the end of a training session for one or several models in the solution). With that we assume that the data scientist owns some version of the AI/ML mechanisms already integrated into the production-level solution the company is running, and it is likely that those mechanisms are developed, improved, and adapted to the concrete use cases of that concrete company by the data scientist himself. Deployment of those mechanisms into the running solution, monitoring of their functioning, and other lifecycle aspects are handled by a data engineer (a company employee).
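                As a deliberately generic sketch of that scenario (it does not use InterSystems IRIS APIs; the file name, fields, and metric are hypothetical): the company exports a sample of records, and the data scientist computes a quality metric from the sample alone.

                    import csv

                    # Company side: export a sample of records (prediction vs. actual outcome)
                    # and send the file to the external data scientist via the agreed channel.
                    def export_sample(rows, path="sample.csv"):
                        with open(path, "w", newline="") as f:
                            writer = csv.DictWriter(f, fieldnames=["record_id", "predicted", "actual"])
                            writer.writeheader()
                            writer.writerows(rows)

                    # Data scientist side: assess model quality from the sample alone,
                    # with no access to the company's data or applications.
                    def accuracy_from_sample(path="sample.csv"):
                        with open(path, newline="") as f:
                            rows = list(csv.DictReader(f))
                        hits = sum(1 for r in rows if r["predicted"] == r["actual"])
                        return hits / len(rows) if rows else float("nan")

                    export_sample([
                        {"record_id": 1, "predicted": "churn", "actual": "churn"},
                        {"record_id": 2, "predicted": "stay", "actual": "churn"},
                    ])
                    print(accuracy_from_sample())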

                Read more
              • 2020 Network Security and Availability Report

                  By the beginning of 2021, the Qrator Labs filtering network had expanded to 14 scrubbing centers with a total filtering bandwidth capacity of 3 Tbps, and the São Paulo scrubbing facility became fully operational in early 2021;

                  Throughout 2020, new partner services were fully integrated into the Qrator Labs infrastructure and customer dashboard: SolidWall WAF and RuGeeks CDN;

                  Upgraded filtering logic allows Qrator Labs to serve even bigger infrastructures with full-scale cybersecurity protection and DDoS attack mitigation;

                  The newest AMD processors are now widely used by Qrator Labs in packet processing.

                  DDoS attacks were on the rise during 2020, with the most relentless attacks described as short and overwhelmingly intense.

                  However, BGP incidents were an area where it was evident that change was, and still is, needed, as there was a significant number of devastating hijacks and route leaks.

                  In 2020, we began providing our services in Singapore under a new partnership and opened a new scrubbing center in Dubai, where our fully functioning branch is staffed by the best professionals to serve local customers.

                  Read more
                • Architecting Architecture: Makers and Takers

                  • Translation

                  The step has been made. Not sure where to, but for sure from the point of no return. Keep calm and keep walking. It is about time to look around and understand the smelly and slippery route before you. And what are those noisy creatures swarming around our fishy “innovative” design we called the Mandelbrot blueprint? You don't get a buzzing-noise like that, just buzzing and buzzing, without its meaning something.

                  Read more
                • Architecting Architecture

                  • Translation

                  Architect. This word sounds so mysterious. So mysterious that to understand it you are almost forced to add something, like “System Architect” or “Program Architect”. Such an addition does not make things clearer, but it certainly adds weight to the title. Now you know: that's some serious guy! I prefer to leave no doubt, and around 10 years ago I added “Enterprise Architect of Information Systems” to my email signature. It's a powerful perk. Like “Chosen One”. With architects it is always a matter of naming, you know. Maybe that is why the only way to become an architect is to be named one by others. Like with vampires. One of them has to byte you! That is probably the easiest way to earn the title, as there is no degree or school to grant you one. And if there's a troubling title, somebody's making a trouble, and the only reason for making a trouble that I know of is because you're an Enterprise. A huge, old, and complex multinational corporation. Like a one-legged pirate. Strong and scary, but not a good runner. You own your ship, you had good days, you have some gold, you need new ways.

                  To get to new treasures and avoid losing the second leg to the piranha regulators and local business sharks swarming the waters near every enterprise ship, every pirate has a map. The map is a list of major features and requirements in the desired order and priority.

                  Read more
                • HDB++ TANGO Archiving System

                  • Translation
                  • Tutorial

                  What is HDB++?


                  HDB++ is a TANGO archiving system that allows you to save data received from devices in the TANGO system.


                  Here we will describe working on Linux (TangoBox 9.3, based on Ubuntu 18.04), a ready-made system where everything is already configured.


                  What is the article about?


                  • System architecture.
                  • How to set up archiving.

                  It took me about two weeks to understand the architecture and write my own Python scripts for this case.


                  What is it for?


                  It allows you to store the history of readings from your equipment.


                  • You don't need to think about how to store data in the database.
                  • You just need to specify which attributes to archive from which equipment (see the sketch below).
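
                  A minimal sketch of that idea with PyTango follows. The device names ("archiving/hdbpp/confmanager01", "archiving/hdbpp/eventsubscriber01", "sys/tg_test/1") and the exact configuration attributes and commands are assumptions that depend on your HDB++/TangoBox installation, so treat it as an outline rather than a recipe.

                      import tango

                      # The attribute we want to archive, as a full TANGO attribute name (assumed host/port).
                      attr = "tango://tangobox:10000/sys/tg_test/1/double_scalar"

                      # The HDB++ configuration manager decides which attributes get archived
                      # and which event subscriber (archiver) handles them.
                      cm = tango.DeviceProxy("archiving/hdbpp/confmanager01")   # assumed device name
                      cm.write_attribute("SetAttributeName", attr)
                      cm.write_attribute("SetArchiver", "archiving/hdbpp/eventsubscriber01")
                      cm.write_attribute("SetPollingPeriod", 3000)   # ms
                      cm.write_attribute("SetPeriodEvent", 10000)    # ms
                      cm.command_inout("AttributeAdd")               # start archiving the attribute

                      # Read the live value of the same attribute for comparison.
                      dev = tango.DeviceProxy("sys/tg_test/1")
                      print(dev.read_attribute("double_scalar").value)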
                  Read more →