Development

c3037 yesterday at 19:35

Notes about OpenTracing and Logs

Go *

1) OpenTracing (OT) != Logs but they are very similar.

2) Every application has 2 types of scopes: ApplicationScope (AScope) and RequestScope (RScope).

142

c3037 yesterday at 19:29

ArGOtecture

Go *

This is an article that describes my vision of building a system that actively uses Go as the main programming language and SOA/microservices as a design paradigm.

Here I will try to cover 4 chapters that together allow us to build a solid and reliable system.

178

Anastasia_Kochetova 30 June at 21:45

How Analyst Days/14 went for us

Innotech corporate blog System Analysis and Design *Conferences IT-companies

Translation

Conference participation is one of the most important practices for professional development. Hence, Innotech is actively sending out both its speakers and listeners for the biggest events. Senior Analyst Anastasia Kochetova shares her impressions from the Analyst Days/14 conference.

150

snakers4 30 June at 15:39

Multilingual Text-to-Speech Models for Indic Languages

Machine learning *Natural Language Processing *Voice user interfaces *

In this article, we shall provide some background on how multilingual multi-speaker models work and test an Indic TTS model that supports 9 languages and 17 speakers (Hindi, Malayalam, Manipuri, Bengali, Rajasthani, Tamil, Telugu, Gujarati, Kannada).

It seems a bit counter-intuitive at first that one model can support so many languages and speakers, given that each Indic language has its own alphabet, but we shall see how it was implemented.

Also, we shall list the specs of these models like supported sampling rates and try something cool – making speakers of different Indic languages speak Hindi. Please, if you are a native speaker of any of these languages, share your opinion on how these voices sound, both in their respective language and in Hindi.

249

vldmrvslv 29 June at 17:24

Detecting attempts of mass influencing via social networks using NLP. Part 2

Python *Data Mining *Twitter API *Big Data *Natural Language Processing *

Tutorial

In Part 1 of this article, I built and compared two classifiers to detect trolls on Twitter. You can check it out here.

Now the time has come to look more deeply into the datasets and find some patterns using exploratory data analysis and topic modelling.

EDA

To do just that, I first created a word cloud of the most common words, which you can see below.

227

vldmrvslv 29 June at 17:20

Detecting attempts of mass influencing via social networks using NLP. Part 1

Python *Data Mining *Twitter API *Big Data *Natural Language Processing *

Tutorial

Over the last few decades, the world has been turning into an information society, in which information plays a substantial end-to-end role in all aspects and processes of life. In view of the growing demand for a free flow of information, social networks have become a force to be reckoned with. The ways of waging war have also changed: instead of conventional weapons, governments now use political warfare, including fake news, a type of propaganda aimed at deliberate disinformation or hoaxes. And the lack of content control mechanisms makes it easy to spread any information as long as people believe in it.

Based on this premise, I’ve decided to experiment with different NLP approaches and build a classifier that could be used to detect either bots or fake content generated by trolls on Twitter in order to influence people.

In this first part of the article, I will cover the data collection process, preprocessing, feature extraction, classification itself and the evaluation of the models’ performance. In Part 2, I will dive deeper into the troll problem, conduct exploratory analysis to find patterns in the trolls’ behaviour and define the topics that seemed of great interest to them back in 2016.

Features for analysis

From all possible data to use (like hashtags, account language, tweet text, URLs, external links or references, tweet date and time), I settled upon English tweet text, Russian tweet text and hashtags. Tweet text is the main feature for analysis because it contains almost all essential characteristics that are typical for trolling activities in general, such as abuse, rudeness, external resources references, provocations and bullying. Hashtags were chosen as another source of textual information as they represent the central message of a tweet in one or two words.
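
As an illustration of the two textual features named above, here is a minimal Python sketch that separates a tweet into its hashtags and the remaining text. The helper name `split_features` and the regular expression are assumptions made for this example; the article's own extraction code is not shown here and may differ.

```python
import re

# Hypothetical pattern: "#" followed by word characters.
# In Python 3, \w is Unicode-aware, so Cyrillic hashtags match too.
HASHTAG_RE = re.compile(r"#(\w+)")


def split_features(tweet: str) -> dict:
    """Separate a tweet into lowercased hashtags and the remaining text."""
    hashtags = [tag.lower() for tag in HASHTAG_RE.findall(tweet)]
    text = HASHTAG_RE.sub(" ", tweet)
    text = " ".join(text.split())  # collapse the whitespace left behind
    return {"text": text, "hashtags": hashtags}


sample = "Breaking news you won't believe #Politics #election2016"
features = split_features(sample)
print(features["text"])      # Breaking news you won't believe
print(features["hashtags"])  # ['politics', 'election2016']
```

Keeping hashtags apart from the tweet body lets each be vectorised separately, which matches the idea of treating them as a second source of textual information.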

245

ptsecurity 24 June at 12:27

IDS Bypass at Positive Hack Days 11: writeup and solutions

Positive Technologies corporate blog Information Security *Network technologies *CTF *

The IDS Bypass contest was held at the Positive Hack Days conference for the third time. This year we created six game hosts, each with a flag. To get the flag, participants had to either exploit a vulnerability on the server or fulfill another condition, for example, enumerate lists of domain users.

The tasks and vulnerabilities themselves were quite straightforward. The difficulty lay in bypassing the IDS: the system inspected participants' network traffic using special rules that look for attacks. If such a rule was triggered, the participant's network request was blocked, and the bot sent them the text of the triggered rule in Telegram.

And yes, this year we tried to move away from the usual CTFd and IDS logs towards a more convenient Telegram bot. All that was needed to take part was to message the bot and pick a username. The bot then sent an OVPN file to connect to the game network, after which all interaction (viewing tasks and the game dashboard, delivering flags) took place solely through the bot. This approach paid off 100%!

452

vldmrvslv 23 June at 18:04

How we tackled document recognition issues for autonomous and automatic payments using OCR and NER

Python *Natural Language Processing *

Sandbox

In this article, I would like to describe how we’ve tackled the named entity recognition (aka NER) issue at Sber with the help of advanced AI techniques. NER is one of many natural language processing (NLP) tasks; it allows you to automatically extract data from unstructured text, such as monetary values, dates, names, surnames and positions.

Just imagine the countless textual documents that even a medium-sized organisation deals with on a daily basis, let alone huge corporations. Take Sber, for example: it is the largest financial institution in Russia, Central and Eastern Europe, with about 16,500 offices, over 250,000 employees, 137 million retail and 1.1 million corporate clients in 22 countries. As you can imagine, at such an enormous scale, the company collaborates with hundreds of suppliers, contractors and other counterparties, which implies thousands of contracts. For instance, the estimated number of legal documents to be processed in 2022 is over 65,000, each consisting of 30 pages on average. During its lifecycle, a contract is usually updated with 3 to 5 additional agreements. On top of this, a contract is accompanied by various source documents describing transactions. And in the PDF format, too.

Previously, the processing duty fell to our service centre’s employees, who checked whether the payment details in a bill matched those in the contract and then sent the document to the Accounting Department, where an accountant double-checked everything. This is quite a long journey to a payment, right?

178

Renatk 16 June at 15:51

An Antidote to Absent-Mindedness, or How I Gained Access to an OpenShift Node without an SSH Key

Innotech corporate blog System administration **nix *DevOps *Openshift *

Translation

Typically, when a node falls out of the OpenShift cluster, this is resolved by simply restarting the offending element. What should you do, however, if you’ve forgotten the SSH key or left it in the office? You can attempt to restore access by using your wit and knowledge of Linux commands. Renat Garaev, lead developer at Innotech, describes how he solved this riddle and what the outcome was.

4.2K

vadimszzz 10 June at 04:17

iOS security testing & reverse engineering guide

Information Security *Development for iOS *Reverse engineering *

Comprehensive guide for iOS app security testing and reverse engineering.

1.8K

andrey78910 9 June at 12:10

Text-based CAPTCHA in 2022

Information Security *Machine learning *Artificial Intelligence

Translation

The first text-based CAPTCHA (we’ll call it just CAPTCHA for the sake of brevity) was used in 1997 by the AltaVista search engine. It prevented bots from adding Uniform Resource Locators (URLs) to its web search engine.

Back then it was a decent defense measure. However, progress can't be stopped, and this defense was bypassed using the OCR available at the time (for example, FineReader).

CAPTCHA became more complex: noise and distortions were added so that popular OCRs couldn’t recognize the text. Then OCRs custom-made for this task appeared, which cost the attacking side extra money and expertise. CAPTCHA developers, in turn, had to understand the challenges the attackers faced and which distortions to add in order to make automated CAPTCHA recognition harder.

Because the principles the OCRs were based on were misunderstood, some CAPTCHAs were given distortions that were more of a hassle for regular users than for a machine.

OCRs for different types of CAPTCHAs were built using heuristics. The most complicated part was segmenting the CAPTCHA into standalone symbols, which could then be easily recognized by a CNN (for example, LeNet-5); an SVM also showed good results, even on raw pixels.

In this article I’ll try to cover the whole history of CAPTCHA recognition, from heuristics to contemporary automated recognition systems. We’ll figure out whether CAPTCHA is still alive.

I’ll review the yandex.com CAPTCHA. The Russian version of the same CAPTCHA is more complex.

765

SergeyBPshenichnikov 8 June at 18:38

Algebra of text without formulas

Search engines *Semantics *Algorithms *Natural Language Processing *

Translation

The article is an abstract of my book [1] based on previously presented publications [2], [3], [4], [5]

869

SergeyBPshenichnikov 7 June at 22:41

Collective meaning recognition

Search engines *Semantics *Algorithms *Natural Language Processing *

Translation

The published material is in the Appendix of my book [1]

Modern civilization finds itself at a crossroads where it must choose the meaning of life. Because of the development of technology, the majority of the world's population may become "superfluous", that is, not in demand in the production of values. There is another option, in which each person is a supreme value, an absolute individual, and can be indispensably useful in the technology of the collective mind.

In the eighties of the last century, the task of creating a scientific field of "collective intelligence" was set. Collective intelligence is defined as the ability of the collective to find solutions to problems more effectively than each participant individually. The right collective mind must be...

691

danilovmy 7 June at 06:59

Django ModelAdmins autoregister

Python *Django *

Some time ago I discovered that Django has the ability to auto-register ModelAdmins. Since this is not common knowledge and carries a number of benefits, I decided to write an article about it to bring it to the attention of the Django community.

867

parthiba 30 May at 13:18

A Step-by-Step Guide To Integrate Video Calling Features Within Apps Using WebRTC

API *Video conferencing

Tutorial

WebRTC integrations have emerged as a game-changer in video calling technology over the years. The protocol has redefined the way real-time video communications take place. Developers can integrate WebRTC, commonly available as JavaScript APIs, to add audio and video solutions to their apps. This tutorial will take you through the steps of developing a two-way video call between two devices.

WebRTC (Web Real-Time Communication) is a set of rules that can establish bidirectional, full-duplex communication between two devices using JavaScript. It connects your devices and enables the transfer of unlimited real-time audio and video across any operating system. However, the WebRTC agents created for the two devices know nothing about each other, and they need that information in order to establish the media exchange. At this point, a third, mutually agreed-upon server is introduced. This server, which connects the devices and passes along the necessary information about the endpoints, is known as the Signaling Server.

Before we start off with the steps, it is necessary to become familiar with the basics of the integration process.

1.3K

kaze_no_saga 24 May at 08:11

Queries in PostgreSQL. Index scan

Postgres Professional corporate blog PostgreSQL *SQL *

Translation

In previous articles we discussed query execution stages and statistics. Last time, I started on data access methods, namely Sequential scan. Today we will cover Index Scan.

3.2K

djezzzl 19 May at 21:48

Enhanced ActiveRecord preloading

Ruby *Ruby on Rails *

In this guide, I'd like to share with you tips and tricks about ActiveRecord preloading and how you can enhance it to the next level.

2.5K

Gehta 17 May at 14:48

Two Factor Authentication — More Security, Less Effort

API *

Today we're talking about multi-factor authentication, also known as two-factor authentication or 2-step verification. It's got a few names, but what is it?

Well, essentially it's proving your identity in more than one way. The principle is that if one of these authentication factors is defeated, that's not enough to give access to your data. So your data should be secure if someone steals your password, for example. It's not enough to just log in twice: the different authentication factors have to work in fundamentally different ways. So you can't just use a second password, or a password and a PIN, because passwords and PINs both rely on the same thing: your memory. So if they have to work in different ways, what different factors are available for us to use? Well, here are some of the common ones.

834

tony-space 14 May at 21:06

Idiomatic Event Loop in C++

Programming *C++ *Concurrent computing *

Sometimes programming with mutexes gets too complicated and messy. Maybe you need to meet a new friend — the Event Loop pattern.
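
To make the pattern concrete, here is a minimal sketch of an event loop. It is written in Python rather than C++ and is not the article's implementation; the class name `EventLoop` and its `post`/`stop` methods are assumptions chosen for illustration. The idea shown is that shared state is touched only by one loop thread, so callers post work to a queue instead of taking locks themselves.

```python
import queue
import threading


class EventLoop:
    """Runs posted callables one at a time on a single worker thread."""

    def __init__(self):
        self._tasks = queue.Queue()  # thread-safe FIFO of pending tasks
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self):
        while True:
            task = self._tasks.get()  # blocks until a task is available
            if task is None:          # sentinel posted by stop()
                break
            task()

    def post(self, task):
        """Schedule a callable to run on the loop thread."""
        self._tasks.put(task)

    def stop(self):
        """Run everything already queued, then shut the loop down."""
        self._tasks.put(None)
        self._thread.join()


counter = {"value": 0}


def increment():
    # Executes only on the loop thread, so no explicit lock is needed here.
    counter["value"] += 1


def producer(loop):
    for _ in range(1000):
        loop.post(increment)


loop = EventLoop()
producers = [threading.Thread(target=producer, args=(loop,)) for _ in range(4)]
for thread in producers:
    thread.start()
for thread in producers:
    thread.join()
loop.stop()
print(counter["value"])  # 4000
```

The queue still synchronises internally, but that locking is confined to one well-tested component instead of being spread across the code that manipulates the shared state.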

2.9K

alextretyak 1 May at 00:00

Lexical Analysis in 11l

Programming *Compilers *

This article discusses the lexical analyzer, which is an integral part of any compiler.

The task of the lexical analyzer is to split the source code of the program into tokens.

So for example the code

print(1 + 2)

will be tokenized as
print, (, 1, +, 2 and )
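
The tokenization above can be sketched with a small regular-expression lexer. This is a Python illustration, not the 11l lexer itself; the token classes and the `tokenize` function are assumptions that cover only identifiers, integers, and a few single-character operators.

```python
import re

# Illustrative token patterns; these are not the actual rules of the 11l lexer.
TOKEN_RE = re.compile(r"""
    (?P<name>[A-Za-z_]\w*)    # identifiers such as print
  | (?P<number>\d+)           # integer literals
  | (?P<op>[+\-*/()])         # single-character operators and parentheses
  | (?P<space>\s+)            # whitespace, which is skipped
""", re.VERBOSE)


def tokenize(source: str) -> list[str]:
    """Split source code into a flat list of token strings."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(
                f"unexpected character {source[pos]!r} at position {pos}"
            )
        if match.lastgroup != "space":
            tokens.append(match.group())
        pos = match.end()
    return tokens


print(tokenize("print(1 + 2)"))  # ['print', '(', '1', '+', '2', ')']
```

A real lexer would normally also record each token's category and source position, which this sketch omits for brevity.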

2.5K
