Pull to refresh

All streams

Show first
Rating limit

How we tackled document recognition issues for autonomus and automatic payments using OCR and NER

Python *Natural Language Processing *
Sandbox

In this article, I would like to describe how we’ve tackled the named entity recognition (aka NER) issue at Sber with the help of advanced AI techniques. It is one of many natural language processing (NLP) tasks that allows you to automatically extract data from unstructured text. This includes monetary values, dates, or names, surnames and positions.

Just imagine countless textual documents even a medium-sized organisation deals with on a daily basis, let alone huge corporations. Take Sber, for example: it is the largest financial institution in Russia, Central and Eastern Europe that has about 16,500 offices with over 250,000 employees, 137 million retail and 1.1 million corporate clients in 22 countries. As you can imagine, with such an enormous scale, the company collaborates with hundreds of suppliers, contractors and other counterparties, which implies thousands of contracts. For instance, the estimated number of legal documents to be processed in 2022 has been over 65,000, each of them consisting of 30 pages on average. During the lifecycle of a contract, a contract usually updated with 3 to 5 additional agreements. On top of this, a contract is accompanied by various source documents describing transactions. And in the PDF format, too.

Previously, the processing duty befell our service centre’s employees who checked whether payment details in a bill match those in the contract and then sent the document to the Accounting Department where an accountant double-checked everything. This is quite a long journey to a payment, right?

Read more
Rating 0
Views 25
Comments 0

An Antidote to Absent-Mindedness, or How I Gained Access to an OpenShift Node without an SSH Key

Иннотех corporate blog System administration **nix *DevOps *Openshift *
Translation

Typically when a Node falls out of the OpenShift cluster, this is resolved by simply restarting the offending element. What should you do, however, if you’ve forgotten the SSH key or left it in the office? You can attempt to restore access by using your wit and knowledge of Linux commands. Renat Garaev, lead developer at Innotech, described how he found the solution for this riddle and what was the outcome.

Read more
Rating 0
Views 4.1K
Comments 0

«If I had a heart...» Artificial Intelligence

Reading room Artificial Intelligence Science fiction

Most people fear of artificial intelligence (AI) for the unpredictability of its possible actions and impact [1], [2]. In regard to this technology concerns are voiced also by AI experts themselves - scientists, engineers, among whom are the foremost faces of their professions [3], [4], [5]. And you possibly share these concerns because it's like leaving a child alone at home with a loaded gun on the table - in 2021, AI was first used on the battlefield in completely autonomous way: with an independent determination of a target and a decision to defeat it without operator participation [6]. But let’s be honest, since humanity has taken in the opportunities this new tool could give us, there is already no way back – this is how the law of gengle works [7].

Imagine the feeling of a caveman observing our modern routine world: electricity, Internet, smartphones, robots... etc. In the next two hundred years in large part thankfully to AI humankind will undergo the number of transformations it has since the moment we have learned to control the fire [8]. The effect of this technology will surpass all our previous changes as a civilization. And even as a species, because our destiny is not to create AI, but to literally become it.

... more, give me more, give me more ...
Rating 0
Views 2.3K
Comments 0

Text-based CAPTCHA in 2022

Information Security *Machine learning *Artificial Intelligence
Translation

The first text-based CAPTCHA ( we’ll call it just CAPTCHA for the sake of brevity ) was used in 1997 by AltaVista search engine. It prevented bots from adding Uniform Resource Locator (URLs) to their web search engine.

Back then it was a decent defense measure. However the progress can't be stopped, and this defense was bypassed using OCR available at those times (for example FineReader).

CAPTCHA became more complex, noise was added to it, along with distortions, so the popular OCRs couldn’t recognize this text. And then OCRs custom made for this task appeared. It costed extra money and knowledge for the attacking side. The CAPTCHA developers were required to understand the challenges the attackers met, what distortions to add, in order to make the automation of the CAPTCHA recognition more complex.

The misunderstanding of the principles the OCRs were based on, some CAPTCHAs were given such distortions, that they were more of a hassle for regular users than for a machine.

OCRs for different types of CAPTCHAs were made using heuristics, and the most complicated part of it was the CAPTCHA segmentation for the stand along symbols, that subsequently could be easily recognized by the CNN (for example LeNet-5), also SVM showed a good result even on the raw pixels.

In this article I’ll try to grasp the whole history of CAPTCHA recognition, from heuristics to the contemporary automated recognition systems. We’ll figure out, if a CAPTCHA is still alive.

I’ll review the yandex.com CAPTCHA. The Russian version of the same CAPTCHA is more complex.

Read more
Total votes 4: ↑3 and ↓1 +2
Views 604
Comments 0

Collective meaning recognition

Search engines *Semantics *Algorithms *Natural Language Processing *
Translation

The published material is in the Appendix of my book [1]

Modern civilization finds itself at a crossroads in which to choose the meaning of life. Because of the development of technology, the majority of the world's population may be "superfluous" - not in demand in the production of values. There is another option, where each person is a supreme value, an absolute individual and can be indispensably useful in the technology of the collective mind.

In the eighties of the last century, the task of creating a scientific field of "collective intelligence" was set. Collective intelligence is defined as the ability of the collective to find solutions to problems more effectively than each participant individually. The right collective mind must be...

Read more
Total votes 2: ↑2 and ↓0 +2
Views 589
Comments 0

Wi-Fi and CWMP (TR-069) / USP (TR-369) protocols: frequecy optimization attempt

Wireless technologies *Popular science

I guess, it's not a big deal to say that Wi-Fi (IEEE 802.11 standards) is the one of the most popular and most spread communication technology of the current day. Especially indoors. The growing number of Wi-Fi devices still remains that leads to the overcrowded spectrums: both 2.4 GHz and 5 GHz.


This fact means increasing of demand for some optimization routines for utilization of resources. And therefore some RRM (Radio Resource Management) systems become required.



Read more →
Total votes 1: ↑1 and ↓0 +1
Views 3.3K
Comments 0

A Step-by-Step Guide To Integrate Video Calling Features Within Apps Using WebRTC

API *Video conferencing
Tutorial

WebRTC integrations have emerged as a game-changer in the Video Calling Technology over the years. The protocol has redefined the way real-time video communications take Developers can integrate WebRTCs commonly available as JavaScript APIs to add audio and video solutions to their apps. place. Developers can integrate WebRTCs commonly available as JavaScript APIs to add audio and video solutions to their apps. This tutorial will take you through the steps in developing a two-way video call between two devices. 

WebRTC (Web Real-Time Communication) is a set of rules that can establish bidirectional and full-duplex communication between our two devices using JavaScript. It connects your devices and enables transfer of unlimited real-time audio and video across any operating system. However, the WebRTC agents created for both devices do not know any information about each other inorder to establish the media exchange. At this point, a third, mutually agreed-upon server is introduced. This server which connects the devices to transfer data with necessary information about the endpoints is known as the Signaling Server. 

Before we start off with the steps, it is necessary to become familiar with the basics of the integration process. 

Read more
Rating 0
Views 1.2K
Comments 0

Utilitarian blockchain. 1. Assets

Decentralized networks Research and forecasts in IT Finance in IT Cryptocurrencies
image

In the modern world, the term " **blockchain** " is steadily associated with cryptocurrencies, NFTs, mining, trading and financial pyramids. However, even among programmers and IT people there is not always a clear understanding of what it is and what it is for.

This article attempts to look at this still relatively new element of the information and human space in practical and slightly philosophical aspects.

> **Disclaimer**: The article will use simple language to explain non-trivial concepts, so non-critical distortion of technical details is possible.
Read more →
Rating 0
Views 627
Comments 0

How to mimic Agile correctly?

Agile *

A similar article should have appeared earlier, about ten or fifthteen years ago, when Agile was just starting to be implemented in companies. How many mistakes, problems, conflicts could be avoided if managers immediately approached the issue correctly ...

But during this time, the experience of "implementing" Agile in different conditions, in different companies has accumulated, which should be generalized and widely disseminated.

Read more
Rating 0
Views 782
Comments 0

Two Factor Authentication — More Security, Less Effort

API *

Today we're talking multi-factor authentication, also known as two-factor authentication, and 2-step verification. It's got a few names but what is it?

Well, essentially it's proving your identity in more than one way. The principle being that if one of these authentication factors is defeated, that's not enough to give access to your data. So your data should be secure if someone steals your password for example. It's not enough to just log in twice. These different authentication factors have to work in a fundamentally different way. So you can't just use a second password or a password in a pin; because passwords and pins both rely on the same thing - your memory. So if they have to work in different ways, what different factors are available for us to use? Well, here are some of the common ones.

Read more
Rating 0
Views 792
Comments 0

Information is changed by entropy

Data recovery *
Recovery mode

According to the no-cloning theorem it is impossible to create an independent and identical copy of an arbitrary unknown quantum state. We cannot delete any quantum information as well. All changes in time of the state vector in quantum mechanics are described by the action of unitary operators. Accordingly, there must be an operator performing a deleting operation. The operator must be a zero matrix in order to nullify quantum information totally in all cases. But a zero matrix is not a uninaty or hermitian matrix. Therefore there is no such unitary operator that can delete information.

   This might be proven in another way. Let us imagine the double-slit thought experiment where interference exists when we do not know about the system and interference does not when we know about the system. Assume we have a storage where the data is stored and the experiment is being conducted with knowing about the system. Suppose we destroy the storage. What does the screen in the experiment show us? Quantum mechanics tells us that there must not be interference. Should it appear after the data is destroyed? Since the wave function has collapsed it cannot be restored. If there is a chance to delete the information in the experiment, it means that the wave function must go back to the initial state and show us interference, which is a contradiction.

   Based on the foregoing, we will consider the quantum eraser experiment. In that experiment information is neither erased nor disappeared. It is being changed. That is the key point. We increase entropy. If there is a 50 per cent chance to get interference then the entropy = 1 (max value). The same with a spin. If we change spins of elementary particles, for example in the Stern–Gerlach experiment with different axes measurements, we do not delete the information about the states of particles, we increase the entropy. Changing does not equal deleting. 

Read more
Total votes 3: ↑3 and ↓0 +3
Views 729
Comments 1