• Network Infrastructure — how is it seen by hyperscalers

    Network architecture at hyperscalers is subject to constant innovation and is ever-evolving to meet demand. Network operators constantly experiment with solutions and find new ways to keep the network reliable and cost-effective. Hyperscalers periodically publish their findings and innovations in a variety of scientific and technical groups.

    The purpose of this article is to summarize how hyperscalers design and manage networks. The goal is to help connect the dots and to dissect and digest data from a variety of sources, including my personal experience working with hyperscalers.

    DISCLAIMER: All information in this article is acquired from public resources. This article contains my own opinion which might not match and does not represent the opinion of my employer.

    Read more...
  • REST or Events? Choose the right communication style for your microservices

      Microservices architecture is a well-known pattern for building a complex system out of loosely coupled modules. It scales better and makes it easier for multiple teams to develop a system without interfering with each other too much. However, it is important to choose the right way for the services to communicate. Otherwise, this kind of architecture can do more harm than good.

      Read more
    • PVS-Studio's New Website: How We Designed It

        The PVS-Studio website turns 15 this year. This is quite significant for any internet resource. Back when our website was born, Russia declared 2006 the Year of the Humanities. That same year, in June, Denis Kryuchkov established a new platform, "Habrhabr" (now known as Habr). In November, Microsoft officially completed Windows Vista. That same month we registered the viva64.com domain.

        We celebrated our domain's 10th anniversary with a redesign of the website. After that, we changed only the resource's capacity and features, and never touched the design in any way. During this time, the number of articles grew so much that we needed to add tags to facilitate navigation. Right now we are also working on our YouTube channel. This means you will see more and more new videos on our website as well. We keep adding new web pages at a tremendous rate, while the website's usability stays the same.

        The time has come for big changes!

        Read more
      • Application performance monitoring and health metrics without APM

        • Translation

        I have already written about AIOps and machine learning methods for working with IT incidents, about hybrid umbrella monitoring, and about various approaches to service management. Now I would like to share a very specific algorithm: how to quickly get information about the operating condition of business applications using synthetic monitoring, and how to build a health metric for business services on that basis at no special cost. The story is based on a real case of implementing the algorithm in the IT system of an airline.

        Many APM systems exist today, such as AppDynamics, Dynatrace, and others, with a built-in UX control module that uses synthetic checks. If the task is to learn about failures sooner than customers do, I will explain why none of these APM systems is needed. Health metrics are also a fashionable APM feature nowadays, and I will show how you can build them without APM.

        Read more
      • Q1 2021 DDoS attacks and BGP incidents

          The year 2021 started on a high note for Qrator Labs: on January 19, our company celebrated its 10th anniversary. Shortly after, in February, our network mitigated quite an impressive 750 Gbps DDoS attack based on old and well-known DNS amplification. Furthermore, there is a constant flow of BGP incidents, some of which become global routing anomalies. We started reporting them on our newly created Twitter account for Qrator.Radar.

          Now that the first quarter of the year is over, we can take a closer look at DDoS attack statistics and BGP incidents for January through March 2021.

          Read more
        • You are standing at a red light at an empty intersection. How to make traffic lights smarter?

            Types of smart traffic lights: adaptive and neural networks

            Adaptive control works at relatively simple intersections, where the rules and possibilities for switching phases are quite obvious. It is applicable only where the load is not constant in all directions; otherwise it simply has nothing to adapt to, because there are no free time windows. The first adaptively controlled intersections appeared in the United States in the early 1970s. Unfortunately, they have reached Russia only now, and by some estimates there are no more than 3,000 of them in the country.

            Neural networks are a higher level of traffic regulation. They take into account many factors at once, not all of them obvious. Their result is based on self-learning: the computer receives live throughput data and searches all possible algorithms for the maximum, so that in total as many vehicles as possible pass from all directions, comfortably, per unit of time. Asked how this is done, programmers usually answer: we do not know, the neural network is a black box, but we will reveal the basic principles to you…

            Adaptive traffic lights, at least those from the leading companies in Russia, use rather outdated technology for counting vehicles at intersections: physical sensors or a video background detector. A capacitive sensor or an induction loop sees a vehicle only at the installation site, within a few meters, unless of course you spend millions on laying them along the entire length of the roadway. The video background detector shows only how much of the roadway is filled with vehicles. The camera must see this area clearly, which is quite difficult at a long distance because of perspective, and it is highly susceptible to atmospheric interference: even a light snowstorm will be diagnosed as the presence of traffic, because the background video detector does not distinguish what it has detected.

            Read more
          • 2020 Network Security and Availability Report

              By the beginning of 2021, the Qrator Labs filtering network had expanded to 14 scrubbing centers and a total of 3 Tbps of filtering bandwidth capacity, with the São Paulo scrubbing facility fully operational in early 2021;

              New partner services were fully integrated into the Qrator Labs infrastructure and customer dashboard throughout 2020: SolidWall WAF and RuGeeks CDN;

              Upgraded filtering logic allows Qrator Labs to serve even bigger infrastructures with full-scale cybersecurity protection and DDoS attack mitigation;

              The newest AMD processors are now widely used by Qrator Labs in packet processing.

              DDoS attacks were on the rise during 2020, with the most relentless attacks described as short and overwhelmingly intensive.

              However, BGP incidents were an area where it was evident that some change was, and still is, needed, as there was a significant number of devastating hijacks and route leaks.

              In 2020, we began providing our services in Singapore under a new partnership and opened a new scrubbing center in Dubai, where our fully functioning branch is staffed by the best professionals to serve local customers.

              Read more
            • HDB++ TANGO Archiving System

              • Translation
              • Tutorial

              What is HDB++?


              This is the TANGO archiving system. It lets you save data received from devices in a TANGO system.


              This article describes working on Linux (TangoBox 9.3, based on Ubuntu 18.04), a ready-made system where everything is already configured.


              What is the article about?


              • System architecture.
              • How to set up archiving.

              It took me about 2 weeks to understand the architecture and write my own Python scripts for this case.


              What is it for?


              It lets you store the history of readings from your equipment.


              • You don't need to think about how to store data in the database.
              • You just need to specify which attributes to archive from which equipment.
              Read more →
            • Agreements as Code: how to refactor IaC and save your sanity?


                Before we start, I'd like to get on the same page with you. So, could you please answer: how much time will it take to:


                • Create a new environment for testing?
                • Update Java and the OS in a Docker image?
                • Grant access to servers?

                Here is a spoiler from TechLeadConf. Unfortunately, it's in Russian.


                It will take longer than you expect. I will explain why.

                Read more →
              • The 2020 National Internet Segment Reliability Research


                  The National Internet Segment Reliability Research explains how the outage of a single Autonomous System might affect the connectivity of the impacted region with the rest of the world. Most of the time, the most critical AS in the region is the dominant ISP on the market, but not always.

                  As the number of alternate routes between ASes increases (and do not forget that the Internet stands for “interconnected network”, where each network is an AS), so do the fault tolerance and stability of the Internet across the globe. Although some paths are inherently more important than others, establishing as many alternate routes as possible is the only viable way to ensure an adequately robust network.

                  The global connectivity of any given AS, regardless of whether it is an international giant or a regional player, depends on the quantity and quality of its paths to Tier-1 ISPs.

                  Usually, Tier-1 implies an international company offering global IP transit service over connections with other Tier-1 providers. Nevertheless, there is no guarantee that such connectivity will be maintained all the time. For many ISPs at all “tiers”, losing connection to just one Tier-1 peer would likely render them unreachable from some parts of the world.
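
                  The single-AS outage idea can be illustrated with a toy graph. The sketch below is not the methodology of the research; the topology, the AS names, and the helper functions are all hypothetical. It fails one AS at a time and counts how many remaining ASes lose every path to a Tier-1 provider.

```python
from collections import deque

# Hypothetical toy topology: AS name -> set of neighbouring ASes.
LINKS = {
    "T1": {"A", "B"},
    "A": {"T1", "C", "D"},
    "B": {"T1", "D"},
    "C": {"A"},
    "D": {"A", "B", "E"},
    "E": {"D"},
}
TIER1 = {"T1"}  # assumption: a single Tier-1 provider


def reachable_from_tier1(links, tier1, failed):
    """Breadth-first search from the Tier-1 set, skipping the failed AS."""
    seen = set(tier1) - {failed}
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        for peer in links[node]:
            if peer != failed and peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen


def isolated_by_failure(links, tier1, failed):
    """ASes (other than the failed one) left with no path to any Tier-1."""
    survivors = set(links) - {failed}
    return survivors - reachable_from_tier1(links, tier1, failed)


# Number of ASes cut off by each single non-Tier-1 failure.
impact = {
    asn: len(isolated_by_failure(LINKS, TIER1, asn))
    for asn in LINKS
    if asn not in TIER1
}
```

                  In this toy graph only the ASes that are the sole upstream for another AS cause any isolation when they fail; the AS with an alternate route through a second provider survives either provider's outage.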
                  Read more →
                • Looking back at 3 months of the global traffic shapeshifting

                    There will be no TL;DR in this article, sorry.

                    These have been three months that genuinely changed the world. An entire lifetime passed between February 1, when the coronavirus pandemic had just started to spread outside of China and European countries were about to react, and April 30, when nations almost all over the world were locked down under quarantine measures. We want to take a look at the repercussions and the cyclic nature of the reaction and, of course, provide an overview of DDoS attacks and BGP incidents over this three-month timeframe.

                    In general, there seems to be an objective pattern in almost every country’s shift into the quarantine lockdown.
                    Read more →
                  • Bcache against Flashcache for Ceph Object Storage


                      Fast SSDs are getting cheaper every year, but they are still smaller and more expensive than traditional HDDs. HDDs, in turn, have much higher latency and are easily saturated. We want a storage system with both low latency and high capacity. There is a well-known practice for optimizing the performance of big, slow devices: caching. Since most of the data on a disk is not accessed most of the time, while some percentage of it is accessed frequently, we can achieve a higher quality of service by using a small cache.

                      Server hardware and operating systems have a lot of caches working on different levels. Linux has a page cache for block devices, a dirent cache and an inode cache on the filesystem layer. Disks have their own cache inside. CPUs have caches. So, why not add one more persistent cache layer for a slow disk?
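
                      The effect of a small cache on a skewed workload can be sketched with a simple LRU simulation. This illustrates the general caching idea only, not how bcache or flashcache work internally; the `LRUCache` class, the trace, and the capacity are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache standing in for a small fast device in front of a slow one."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block_id -> placeholder, oldest first
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self.blocks:
            # Cache hit: mark the block as most recently used.
            self.blocks.move_to_end(block_id)
            self.hits += 1
            return
        # Cache miss: the block would be fetched from the slow disk.
        self.misses += 1
        self.blocks[block_id] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block


def hit_ratio(trace, capacity):
    """Replay a sequence of block reads and return the fraction served from cache."""
    cache = LRUCache(capacity)
    for block_id in trace:
        cache.read(block_id)
    return cache.hits / len(trace)


# Hypothetical workload: 4 hot blocks read in every round,
# interleaved with 100 cold blocks that are each read only once.
trace = []
for cold in range(100):
    trace.extend([0, 1, 2, 3, 1000 + cold])

ratio = hit_ratio(trace, capacity=8)
```

                      With this trace, an 8-block cache serves most reads even though 104 distinct blocks are touched, because the hot blocks are never the least recently used ones.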
                      Read more →
                    • How to test Ansible and don't go nuts


                        This is a translation of my talk at DevOps-40 on 2020-03-18:


                        After the second commit, any code becomes legacy. This happens because the original ideas no longer meet the actual requirements for the system. This is neither good nor bad; it is the nature of infrastructure and of agreements between people. Refactoring should bring the requirements and the actual state back into alignment. Let me call it Infrastructure as Code refactoring.

                        Read more →
                      • Why Enterprise Chat Apps isn’t built on Server-side Database like Hangouts, Slack, & Hip chat?

                        A real-time chat application, whether the conversation takes place on mobile or desktop, is one of the most significant collaboration tools for any organization. Hangouts, Slack, and HipChat have helped businesses of every size, from small companies to enterprises, hold conversations among their employees and with clients.

                        These big players come into play wherever team collaboration is required. They are built on a server-side database: messages sent from one device to another are stored in the provider's server database. Ultimately, this results in a huge amount of data being stored in the server-side (cloud) database.

                        Cloud storage consumption will therefore be quite high. A client-side database, where relayed messages are stored on the client device, is more efficient. Messages are queued to minimize the amount of data kept on the server.
                        Read more →
                      • New action to disrupt world’s largest online criminal network



                          Today, Microsoft and partners across 35 countries took coordinated legal and technical steps to disrupt one of the world’s most prolific botnets, called Necurs, which has infected more than nine million computers globally. This disruption is the result of eight years of tracking and planning and will help ensure the criminals behind this network are no longer able to use key elements of its infrastructure to execute cyberattacks.

                          A botnet is a network of computers that a cybercriminal has infected with malicious software, or malware. Once infected, criminals can control those computers remotely and use them to commit crimes. Microsoft’s Digital Crimes Unit, BitSight and others in the security community first observed the Necurs botnet in 2012 and have seen it distribute several forms of malware, including the GameOver Zeus banking trojan.
                          Read more →
                        • Monitor linux — cross platform firmware with zabbix server

                            About


                            This is a small cross-platform Linux distribution with a Zabbix server. It's a simple way to deploy a powerful monitoring system on ARM platforms and x86_64.


                            It works as firmware (a non-changeable systemd image with config files) and has a web interface for system management, such as network settings, passwords, and so on.


                            Who is interested


                            • System admins and engineers who need to deploy a Zabbix server quickly.
                            • Everyone who wants to deploy Zabbix on ARM.
                            • Enthusiasts
                            Read more →
                          • A Brief Comparison of the SDS Architectures for Virtualization

                            • Translation

                            The search for a suitable storage platform: GlusterFS vs. Ceph vs. Virtuozzo Storage


                            This article outlines the key features and differences of such software-defined storage (SDS) solutions as GlusterFS, Ceph, and Virtuozzo Storage. Its goal is to help you find a suitable storage platform.

                            Gluster



                            Let’s start with GlusterFS, which is often used as storage for virtual environments in open-source-based hyper-converged products with SDS. It is also offered by Red Hat alongside Ceph.
                            GlusterFS employs a stack of translators, services that handle file distribution and other tasks. It also uses services like Brick, which handles disks, and Volume, which handles pools of bricks. Next, the DHT (distributed hash table) service distributes files into groups based on hashes.
                            Note: We’ll skip the sharding service due to issues related to it, which are described in linked articles.
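
                            Hash-based distribution can be sketched in a few lines. This is a deliberate simplification and not GlusterFS's actual hash function or layout logic; the brick names, the use of MD5, and the `place` helper are assumptions made for illustration. A file name is hashed, and the hash selects one replica group that stores the whole file on each of its bricks.

```python
import hashlib

# Hypothetical volume: three replica groups, each a pair of bricks on different servers.
REPLICA_GROUPS = [
    ("server1:/brick1", "server2:/brick1"),
    ("server3:/brick1", "server4:/brick1"),
    ("server5:/brick1", "server6:/brick1"),
]


def place(file_name, groups=REPLICA_GROUPS):
    """Pick the replica group for a file from a hash of its name.

    The whole file goes to every brick in the chosen group, so large files
    are not spread across groups.
    """
    digest = hashlib.md5(file_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(groups)
    return groups[index]


# The same name always maps to the same pair of bricks.
bricks = place("vm-disk-001.qcow2")
```

                            Because placement depends only on the name, files of very different sizes can land in the same group and fill it unevenly, which is the kind of imbalance hash-based placement is prone to.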


                            When a file is written onto GlusterFS storage, it is placed on a brick in one piece and copied to another brick on another server. The next file will be placed on two or more other bricks. This works well if the files are of about the same size and the volume consists of a single group of bricks. Otherwise the following issues may arise:
                            Read more →
                          • Full disclosure: 0day vulnerability (backdoor) in firmware for Xiaongmai-based DVRs, NVRs and IP cameras


                              This is a full disclosure of a recent backdoor integrated into DVR/NVR devices built on top of HiSilicon SoCs with Xiaongmai firmware. The described vulnerability allows an attacker to gain root shell access and full control of the device. The full disclosure format was chosen for this report due to a lack of trust in the vendor. Proof-of-concept code is presented below.
                              Read more →