• We have published a model for text repunctuation and recapitalization for four languages




      When working with speech recognition models, we often encounter misconceptions among potential customers and users (mostly because people have a hard time distinguishing substance from form). People also tend to believe that punctuation marks and spaces are somehow obviously present in spoken speech, when in fact spoken and written speech are entirely different beasts.


      Of course you can just start each sentence with a capital letter and put a full stop at the end. But it is preferable to have some relatively simple and universal solution for "restoring" punctuation marks and capital letters in sentences that our speech recognition system generates. And it would be really nice if such a system worked with any texts in general.


      For this reason, we would like to share a system that:


      • Inserts capital letters and basic punctuation marks (dot, comma, hyphen, question mark, exclamation mark, dash for Russian);
      • Works for 4 languages (Russian, English, German, Spanish) and can be extended;
      • By design is domain agnostic and is not based on any hard-coded rules;
      • Has non-trivial metrics and succeeds in the task of improving text readability.

      To reiterate — the purpose of such a system is only to improve the readability of the text. It does not add information to the text that did not originally exist.
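For contrast, the naive baseline mentioned above (capitalize the first letter and put a full stop at the end) can be sketched in a few lines. This is only an illustration of that baseline, not the published model; the function name `naive_restore` is hypothetical.

```python
def naive_restore(text: str) -> str:
    """Naive baseline: capitalize the first letter, append a full stop.

    Hypothetical helper for illustration only; it cannot place commas,
    question marks, or capital letters inside the text.
    """
    stripped = text.strip()
    if not stripped:
        return stripped
    restored = stripped[0].upper() + stripped[1:]
    # Add a full stop only if the text has no terminal mark already.
    if restored[-1] not in ".!?":
        restored += "."
    return restored


print(naive_restore("hello how are you i am fine"))
# prints "Hello how are you i am fine."
```

The output shows why the baseline falls short: the sentence boundary and the question mark inside the utterance are lost, which is the gap a learned repunctuation model is meant to close.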

    • Helpful service for microservice JSON-RPC based test automation

        Test automation for a product built on a microservice architecture can be very situational, depending on the testing goals and the ways to achieve them. Life is easy if you are testing a service that is an isolated entity: it receives some data and provides the result of its work in a response, by callback, or through an additional endpoint. In this case, all you need to do is cover all the endpoints of the service and probably learn to catch its callbacks. However, that is not the only case. Sometimes you need to test a service that is not totally isolated but is part of a chain of interactions. Such a service could send data to other services within your infrastructure or even to third parties. This time you have plenty of additional things to worry about:

      • How to fund an MVP-stage startup? An ultimate guide to initial funding

        • Tutorial

        Free money to fund your own business is probably the most cherished dream of every budding entrepreneur.

        And getting a grant is what can bring it to life. While small business owners dream of a grant to fund their startup, the process of obtaining one is not as easy and cloudless as it may seem.

        However, if you know where to look for the right fund and how to apply, you can significantly shorten the path to getting seed money.

        Difference between a grant, attracted investment, and loan.

      • Millions of orders per second matching engine testing

        Some time ago I worked on matching engine development for a cryptocurrency exchange. It was an interesting and challenging experience. I developed the engine in pure C++ from scratch. Testing it is also quite a challenging task: you need to get data for testing, perform the tests, collect statistics, and finally analyze the collected data to find weak points and bottlenecks. I want to focus on testing the C++ matching engine and show how testing can give insights for optimizations even without changing the code. The matching engine I developed can handle more than 1,000,000 TPS (transactions per second) and is 10 times faster than the matching engine of the Binance cryptocurrency exchange (see one post on Binance Blog).

      • PIM or MDM: which system is better for retail?

          Effective data management is a critical aspect in retail. You have to manage information about customers, products, services, staff, materials, and so on. You should have a source that you will trust. And you need to store, process, moderate, and administer data in this system. 

          Until recently, retailers only knew MDM - Master Data Management. A traditional MDM system is a system that knows about different data sources. It contains the “golden standard” of data. 

          Imagine that your stores hold one set of customer information, your online store holds another, and your marketing services hold a third. An MDM system collects all of this information in a single source. The system can identify the same client spelled differently and eliminate errors in the data using various algorithms.
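As a rough illustration of how the same client spelled differently might be detected, here is a minimal sketch using fuzzy string matching from Python's standard library. The function names and the 0.85 similarity threshold are assumptions made for this example; real MDM products use their own, typically more elaborate, matching algorithms.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop dots and commas, and collapse whitespace."""
    cleaned = name.lower().replace(".", " ").replace(",", " ")
    return " ".join(cleaned.split())


def same_client(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two names as one client if their similarity ratio is high.

    The threshold is an assumed value chosen for illustration.
    """
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


def merge_clients(records: list[str], threshold: float = 0.85) -> list[list[str]]:
    """Greedily group records that match the first name of an existing group."""
    groups: list[list[str]] = []
    for record in records:
        for group in groups:
            if same_client(group[0], record, threshold):
                group.append(record)
                break
        else:
            # No existing group matched, so start a new one.
            groups.append([record])
    return groups


print(merge_clients(["John Smith", "Jon Smith", "Mary Jones"]))
# prints [['John Smith', 'Jon Smith'], ['Mary Jones']]
```

Greedy grouping against the first record of each group keeps the sketch short; it is order-dependent and would not scale to large customer bases without blocking or indexing.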

          The evolution of MDM systems has led to the emergence of highly specialized master systems. Modern business does not need to implement a heavy MDM to manage only product data. There are PIM systems for this task.

        • Software testers — an endangered species?

            Nothing and nobody will escape oblivion. Whatever you may say, the history of mankind is a history of automation and the subsequent evolution of workers. This happened during both the first and the second industrial revolutions. The same thing happened with the digital revolution. Now machine learning and artificial intelligence are being implemented everywhere. What is the future of software testing?

          • CWE Top 25 2021. What is it, what is it for and how is it useful for static analysis?

              PVS-Studio first provided support for the CWE classification in release 6.21, on January 15, 2018. Years have passed since then, and we would like to tell you about the improvements related to the support of this classification in the latest analyzer version.


            • Big Data Tools with IntelliJ IDEA Ultimate, PyCharm Professional, DataGrip 2021.3 EAP, and DataSpell Support

                Recently we released a new build of the Big Data Tools plugin that is compatible with the 2021.3 versions of IntelliJ IDEA and PyCharm. DataGrip 2021.3 support will be available immediately after the release in October. The plugin also supports our new data science IDE – JetBrains DataSpell. If you still use previous versions, now is the perfect time to upgrade both your IDE and the plugin. 

                This year, we introduced a number of new features as well as some features that have been there for a while, for example, running Spark Submit with a run configuration.

                Here’s a list of the key improvements:

              • How malware gets into the App Store and why Apple can't stop that

                  Only after I had published a post detailing three iOS 0-day vulnerabilities and expressing my frustration with the Apple Security Bounty Program did I receive a reply from Apple:

                  We saw your blog post regarding this issue and your other reports.

                  We apologize for the delay in responding to you. We want to let you know that we are still investigating these issues and how we can address them to protect customers. Thank you again for taking the time to report these issues to us, we appreciate your assistance.

                  Please let us know if you have any questions.

                  Indeed, I do have questions. The same ones that you have ignored. I'm gonna repeat them. Why was the fix for the analyticsd vulnerability quietly included in the iOS 14.7 update but not mentioned on its security content list? Why did you promise to include it in the next update's list but break your word not once but thrice? Why do you keep ignoring these questions?

                  After my previous post, some people expressed doubts that such code can make its way into the App Store. It's understandable why they think this way: Apple makes people believe that the App Store is safe by repeating it over and over. Moreover, they claim that they disallow alternative stores and application sideloading to keep users safe, and that otherwise users would be in great danger. Android has alternative stores and unrestricted sideloading, and have you heard about any kind of security problems with Android recently? I haven't. But in the last few months alone there were so many reports about security and privacy issues on Apple platforms. And the real reason that Apple doesn't allow any alternatives to the App Store is that they receive a 30% commission on all purchases made inside any app, and it's a tremendously lucrative business for them. They also enact censorship by choosing to allow or disallow any app into the App Store based purely on the subjective opinions of their employees and managers.

                  So in this article I'm going to dispute the claim that the App Store is safe, voice my complaints about the App Store review process, and provide a detailed explanation (including source code) of how malicious apps on the App Store conceal their functionality from the App Store review team and are able to sneak into the App Store.

                • Why we need dynamic code analysis: the example of the PVS-Studio project

                    In May 2021, CppCast recorded a podcast called ABI stability (CppCast #300). In this podcast, Marshall Clow and the hosts discussed rather old news: Visual Studio compilers support the AddressSanitizer tool. We integrated ASan into our testing system a long time ago. Now we want to tell you about a couple of interesting errors it found.


                  • Difficulties You Might Encounter When Using vue-i18n

                    After a few months of frustration with trying to use vue-i18n, the "de-facto" internationalization library for Vue.js, I decided it was time to replace it. That is why I created fluent-vue, a Vue.js internationalization plugin that uses Mozilla's Fluent syntax to allow for natural-sounding translations.

                    In this post, I try to explain what problems I encountered when trying to use the vue-i18n library in my app, and how Fluent syntax solves them.

                  • Disclosure of three 0-day iOS vulnerabilities and critique of Apple Security Bounty program

                    • Translation

                    I want to share my frustrating experience participating in the Apple Security Bounty program. I've reported four 0-day vulnerabilities this year between March 10 and May 4. As of now, three of them are still present in the latest iOS version (15.0) and one was fixed in 14.7, but Apple decided to cover it up and not list it on the security content page. When I confronted them, they apologized, assured me it happened due to a processing issue, and promised to list it on the security content page of the next update. There have been three releases since then, and they broke their promise each time.

                    Read more to learn the specifics of 0-day vulnerabilities.

                  • Insights Into Proactive Threat Hunting

                    Proactive search for complex threats seems to be a useful technology but inaccessible for many organizations. Is it really so? What do companies need to do to start Threat Hunting? What tools are needed for threat hunting? What trends in this area can be seen on the market in the coming years? These are some of the questions I would like to answer in my article today.

                    What is Threat Hunting?

                    Threat Hunting is a search for threats in a proactive mode when the information security specialist is sure that the network is compromised. He should understand how his network operates in order to be able to identify various attacks by examining the existing anomalies.

                    Threat Hunting is a search for threats that have already bypassed automated detection systems. Moreover, most often, you do not have signals or alerts that allow you to detect an intrusion.

                    From the SOC perspective, Threat Hunting is an extension of the service that allows you to counter any level of intruders, including those who use previously unknown tools and methods.

                    Threat Hunting can be based on data obtained by a security specialist, or it can be based on a hypothesis. If testing the hypothesis gives a positive result, the hypothesis can later be used to improve the processes and mechanisms for detecting threats. Threat Hunting also allows you to find blind spots in the security system and expand the monitoring area.

                    What organizations need Threat Hunting?

                    Proactive threat hunting is relevant to those organizations that can become the target of a complex, targeted APT attack. At the same time, given the trend towards supply chain attacks, a small company may also become a target for motivated attackers.

                  • MISRA C: struggle for code quality and security

                      A couple of years ago the PVS-Studio analyzer got its first diagnostic rules to check program code compliance with the MISRA C and MISRA C++ standards. We collected feedback and saw that our clients were interested in using the analyzer to check their projects for MISRA compliance. So, we decided to further develop the analyzer in this direction. The article covers the MISRA C/C++ standard and the MISRA Compliance report. It also shows what we already managed to do and what we plan to achieve by the end of the year.


                    • Who controls App Store: Martians or AI? Closed session of Russia's Federation Council and Apple leaked online



                        A video recording of a closed session of the upper house of Russia's parliament was leaked online by Telegram channel A000MP97. In the video, Andrei Klimov, head of the Ad Hoc Sovereignty and Preventing Interference in the Domestic Affairs Commission, demands that Apple disclose who controls the App Store: people from Mars or artificial intelligence?

                        On September 16th, a closed session of the Commission took place, and representatives of Apple and Google were among those invited. The session discussed ways to protect the sovereignty of the country, in particular the fact that the Navalny app was still available in the Apple App Store and Google Play. The services were accused of being complicit with organisations deemed extremist and banned in Russia, as well as of interfering in Russian elections.
                      • High-level pipelining in TL-Verilog, RISC-V from Imagination, formal tools and open-source EDA on ChipEXPO in Moscow

                          This year, the ChipEXPO conference in Moscow invited several Western speakers to present, in English, emerging technologies in high-level HDLs, formal verification, open-source EDA, and the use of industrial RISC-V cores for education. You can join these presentations on September 14-16 for free using this link (you may need to use Google Translate from Russian to get through the registration): https://eventswallet.com/en/events/282/

                          The whole program is here

                          The English-speaking presentations and tutorials include:

                        • UAVCAN HITL UAV Simulator for PX4

                            Hi from RaccoonLab, a team of enthusiasts in field robotics! We want to share our true-HITL UAVCAN-based simulator for PX4.

                            We believe a unified UAVCAN bus for drone onboard electronics will become a mainstream approach shortly. Our simulator is already based on UAVCAN (as opposed to UART-MAVLINK) and emulates exactly the same messages as real UAVCAN sensors.

                          • Access the power of hardware accelerated video codecs in your Windows applications via FFmpeg / libavcodec

                            • Tutorial
                            Since 2011, all Intel GPUs (integrated and discrete Intel Graphics products) have included Intel Quick Sync Video (QSV), the dedicated hardware core for video encoding and decoding. Intel QSV is supported by all popular video processing applications across multiple OSes, including FFmpeg. The tutorial focuses on Intel QSV based video encoding and decoding acceleration in Windows native (desktop) applications using FFmpeg/libavcodec for video processing. To illustrate the concepts described, the open source 3D Streaming Toolkit is used.
                          • Comparing Huawei ExaGear to Apple's Rosetta 2 and Microsoft's solution

                            • Translation

                            November 10, 2020 was in many ways a landmark date in the microprocessor industry: Apple unveiled its new Mac Mini, the main feature of which was the new M1 chip, developed in-house. It is not an exaggeration to say that this processor is a landmark achievement for the ARM ecosystem: finally, an ARM architecture chip whose performance surpassed x86 architecture chips from competitors such as Intel, in a niche that they had dominated for decades.

                            But the main interest for us is not the M1 processor itself, but the Rosetta 2 binary translation technology. This allows the user to run legacy x86 software that has not been migrated to the ARM architecture. Apple has a lot of experience in developing binary translation solutions and is a recognized leader in this area. The first version of the Rosetta binary translator appeared in 2006, where it aided Apple in the transition from PowerPC to x86 architecture. Although this time the platforms were different from those of 2006, it was obvious that all the experience that Apple engineers had accumulated over the years was not lost, but used to develop the next version, Rosetta 2.

                            We were keen to compare this new solution from Apple with a similar product, Huawei ExaGear (with its lineage from Eltechs ExaGear), developed by our team. At the same time, we evaluated the performance of binary translation from x86 to Arm provided by Microsoft (part of MS Windows 10 for Arm devices) on the Huawei MateBook E laptop. At present, these are the only other x86 to Arm binary translation solutions that we are aware of on the open market.
