Pull to refresh
240.92
Rating

Python *

Interpreted high-level programming language for general-purpose programming

Show first
  • New
  • Top
Rating limit
  • All
  • ≥0
  • ≥10
  • ≥25
  • ≥50
  • ≥100

We have published a model for text repunctuation and recapitalization for four languages

Python *Big Data *Machine learning *Natural Language Processing *


Open In Colab


Working with speech recognition models we often encounter misconceptions among potential customers and users (mostly related to the fact that people have a hard time distinguishing substance over form). People also tend to believe that punctuation marks and spaces are somehow obviously present in spoken speech, when in fact real spoken speech and written speech are entirely different beasts.


Of course you can just start each sentence with a capital letter and put a full stop at the end. But it is preferable to have some relatively simple and universal solution for "restoring" punctuation marks and capital letters in sentences that our speech recognition system generates. And it would be really nice if such a system worked with any texts in general.


For this reason, we would like to share a system that:


  • Inserts capital letters and basic punctuation marks (dot, comma, hyphen, question mark, exclamation mark, dash for Russian);
  • Works for 4 languages (Russian, English, German, Spanish) and can be extended;
  • By design is domain agnostic and is not based on any hard-coded rules;
  • Has non-trivial metrics and succeeds in the task of improving text readability;

To reiterate — the purpose of such a system is only to improve the readability of the text. It does not add information to the text that did not originally exist.

Read more →
Total votes 2: ↑1 and ↓1 0
Views 252
Comments 0

Mode on: Comparing the two best colorization AI's

RUVDS.com corporate blog Python *Image processing *Machine learning *TensorFlow *

This article continues a series of notes about colorization. During today's experiment, we’ll be comparing a recent neural network with the good old Deoldify to gauge the rate at which the future is approaching.

This is a practical project, so we won’t pay extra attention to the underlying philosophy of the Transformer architecture. Besides, any attempt to explain the principles of its operation to a wide public in hand waving terms would become misguiding.

A lecturer: Mr. Petrov! How does a transformer work?
Petrov with a bass voice: Hum-m-m-m.


Google Colorizing Transformer vs Deoldify

Read more →
Total votes 17: ↑17 and ↓0 +17
Views 845
Comments 0

Data Phoenix Digest — 01.07.2021

Python *Algorithms *Big Data *Machine learning *Artificial Intelligence

We at Data Science Digest have always strived to ignite the fire of knowledge in the AI community. We’re proud to have helped thousands of people to learn something new and give you the tools to push ahead. And we’ve not been standing still, either.

Please meet Data Phoenix, a Data Science Digest rebranded and risen anew from our own flame. Our mission is to help everyone interested in Data Science and AI/ML to expand the frontiers of knowledge. More news, more updates, and webinars(!) are coming. Stay tuned!

The new issue of the new Data Phoenix Digest is here! AI that helps write code, EU’s ban on biometric surveillance, genetic algorithms for NLP, multivariate probabilistic regression with NGBoosting, alias-free GAN, MLOps toys, and more…

If you’re more used to getting updates every day, subscribe to our Telegram channel or follow us on social media: TwitterFacebook.

Read more
Total votes 1: ↑0 and ↓1 -1
Views 712
Comments 1

DataScience Digest — 24.06.21

Python *Algorithms *Big Data *Machine learning *Artificial Intelligence

The new issue of DataScienceDigest is here!

The impact of NLP and the growing budgets to drive AI transformations. How Airbnb standardized metric computation at scale. Cross-Validation, MASA-SR, AgileGAN, EfficientNetV2, and more.

If you’re more used to getting updates every day, subscribe to our Telegram channel or follow us on social media: Twitter, LinkedIn, Facebook.

Read more
Total votes 2: ↑1 and ↓1 0
Views 681
Comments 0

You are standing at a red light at an empty intersection. How to make traffic lights smarter?

Python *IT Infrastructure *Big Data *

Types of smart traffic lights: adaptive and neural networks

Adaptive works at relatively simple intersections, where the rules and possibilities for switching phases are quite obvious. Adaptive management is only applicable where there is no constant loading in all directions, otherwise it simply has nothing to adapt to – there are no free time windows. The first adaptive control intersections appeared in the United States in the early 70s of the last century. Unfortunately, they have reached Russia only now, their number according to some estimates does not exceed 3,000 in the country.

Neural networks – a higher level of traffic regulation. They take into account a lot of factors at once, which are not even always obvious. Their result is based on self-learning: the computer receives live data on the bandwidth and selects the maximum value by all possible algorithms, so that in total, as many vehicles as possible pass from all sides in a comfortable mode per unit of time. How this is done, usually programmers answer – we do not know, the neural network is a black box, but we will reveal the basic principles to you…

Adaptive traffic lights use, at least, leading companies in Russia, rather outdated technology for counting vehicles at intersections: physical sensors or video background detector. A capacitive sensor or an induction loop only sees the vehicle at the installation site-for a few meters, unless of course you spend millions on laying them along the entire length of the roadway. The video background detector shows only the filling of the roadway with vehicles relative to this roadway. The camera should clearly see this area, which is quite difficult at a long distance due to the perspective and is highly susceptible to atmospheric interference: even a light snowstorm will be diagnosed as the presence of traffic – the background video detector does not distinguish the type of detection.

Read more
Total votes 3: ↑3 and ↓0 +3
Views 945
Comments 0

Data Science Digest — 21.04.21

Python *Algorithms *Big Data *Machine learning *Artificial Intelligence

Hi All,

I’m pleased to invite you all to enroll in the Lviv Data Science Summer School, to delve into advanced methods and tools of Data Science and Machine Learning, including such domains as CV, NLP, Healthcare, Social Network Analysis, and Urban Data Science. The courses are practice-oriented and are geared towards undergraduates, Ph.D. students, and young professionals (intermediate level). The studies begin July 19–30 and will be hosted online. Make sure to apply — Spots are running fast!

If you’re more used to getting updates every day, follow us on social media:

Telegram
Twitter
LinkedIn
Facebook

Regards,
Dmitry Spodarets.

Read more
Total votes 3: ↑2 and ↓1 +1
Views 396
Comments 0

Neural network Telegram bot with StyleGAN and GPT-2

Python *Machine learning *Artificial Intelligence Social networks and communities

The Beginning


So we have already played with different neural networks. Cursed image generation using GANs, deep texts from GPT-2 — we have seen it all.


This time I wanted to create a neural entity that would act like a beauty blogger. This meant it would have to post pictures like Instagram influencers do and generate the same kind of narcissistic texts. \


Initially I planned to post the neural content on Instagram but using the Facebook Graph API which is needed to go beyond read-only was too painful for me. So I reverted to Telegram which is one of my favorite social products overall.


The name of the entity/channel (Aida Enelpi) is a bad neural-oriented pun mostly generated by the bot itself.


One of the first posts generated by Aida

Read more →
Rating 0
Views 1.1K
Comments 1

Data Science Digest — We Are Back

Python *Algorithms *Big Data *Machine learning *Artificial Intelligence

Hi All,

I have some good news for you…

Data Science Digest is back! We’ve been “offline” for a while, but no worries — You’ll receive regular digest updates with top news and resources on AI/ML/DS every Wednesday, starting today.

If you’re more used to getting updates every day, follow us on social media:

Telegram - https://t.me/DataScienceDigest
Twitter - https://twitter.com/Data_Digest
LinkedIn - https://www.linkedin.com/company/data-science-digest/
Facebook - https://www.facebook.com/DataScienceDigest/

And finally, your feedback is very much appreciated. Feel free to share any ideas with me and the team, and we’ll do our best to make Data Science Digest a better place for all.

Regards,
Dmitry Spodarets.

Read more
Rating 0
Views 555
Comments 0

HDB++ TANGO Archiving System

Open source *Python *IT Infrastructure *Data storage *Data storages *
Translation
Tutorial
main

What is HDB++?


This is a TANGO archiving system, allows you to save data received from devices in the TANGO system.


Working with Linux will be described here (TangoBox 9.3 on base Ubuntu 18.04), this is a ready-made system where everything is configured.


What is the article about?


  • System architecture.
  • How to set up archiving.

It took me ~ 2 weeks to understand the architecture and write my own scripts for python for this case.


What is it for?


Allows you to store the history of the readings of your equipment.


  • You don't need to think about how to store data in the database.
  • You just need to specify which attributes to archive from which equipment.
Read more →
Rating 0
Views 395
Comments 0

Coins classifier Neural Network: Head or Tail?

Python *Data Mining *Big Data *Data Engineering *TensorFlow *

Home of this article: https://robotics.snowcron.com/coins/02_head_or_tail.htm

The global objective of these articles is to build a coin classifier, capable of scanning your pocket change and find rare / valuable coins. This is a second article in a series, so let me remind you what happened earlier (https://habr.com/ru/post/538958/).

During previous step we got a rather large dataset composed of pairs of images, loaded from an online coins site meshok.ru. Those images were uploaded to the Internet by people we do not know, and though they are supposed to contain coin's head in one image and tail in the other, we can not rule out a situation when we have two heads and no tail and vice versa. Also at the moment we have no idea which image contains head and which contains tail: this might be important when we feed data to our final classifier.

So let's write a program to distinguish heads from tails. It is a rather simple task, involving a convolutional neural network that is using transfer learning.

Same way as before, we are going to use Google Colab environment, taking the advantage of a free video card they grant us an access to. We will store data on a Google Drive, so first thing we need is to allow Colab to access the Drive:

Читать далее
Rating 0
Views 574
Comments 0

Implementing Offline traceroute Tool Using Python

Python *Network technologies *
Translation

Hey everyone! This post was born from a question asked by an IT forum member. The summary of the question looked as follows:


  • There is a set of text files containing routing tables collected from various network devices.
  • Each file represents one device.
  • Device platforms and routing table formats may vary.
  • It is required to analyze a routing path from any device to an arbitrary subnet or host on-demand.
  • Resulting output should contain a list of routing table entries that are used for the routing to the given destination on each hop.

The one who asked a question worked as a TAC engineer. It is often that they collect or receive from the customers some text 'snapshots' of the network state for further offline analysis while troubleshooting the issues. Some automation could really save a lot of time.


I found this task interesting and also applicable to my own needs, so I decided to write a Proof-of-Concept implementation in Python 3 for Cisco IOS, IOS-XE, and ASA routing table format.


In this article, I’ll try to reconstruct the resulting script development process and my considerations behind each step.


Let’s get started.

Read more →
Rating 0
Views 1.7K
Comments 0

Coins Classification using Neural Networks

Python *Big Data *Data Engineering *
Tutorial

See more at robotics.snowcron.comThis is the first article in a serie dedicated to coins classification.Having countless "dogs vs cats" or "find a pedestrian on the street" classifiers all over the Internet, coins classification doesn't look like a difficult task. At first. Unfortunately, it is degree of magnitude harder - a formidable challenge indeed. You can easily tell heads of tails? Great. Can you figure out if the number is 1 mm shifted to the left? See, from classifier's view it is still the same head... while it can make a difference between a common coin priced according to the number on it and a rare one, 1000 times more expensive.Of course, we can do what we usually do in image classification: provide 10,000 sample images... No, wait, we can not. Some types of coins are rare indeed - you need to sort through a BASKET (10 liters) of coins to find one. Easy arithmetics suggests that to get 10000 images of DIFFERENT coins you will need 10,000 baskets of coins to start with. Well, and unlimited time.So it is not that easy.Anyway, we are going to begin with getting large number of images and work from there. We will use Russian coins as an example, as Russia had money reform in 1994 and so the number of coins one can expect to find in the pocket is limited. Unlike USA with its 200 years of monetary history. And yes, we are ONLY going to focus on current coins: the ultimate goal of our work is to write a program for smartphone to classify coins you have received in a grocery store as a change.Which makes things even worse, as we can not count on good lighting and quality cameras anymore. But we'll still try.In addition to "only Russian coins, beginning from 1994", we are going to add an extra limitation: no special occasion coins. Those coins look distinctive, so anyone can figure that this coin is special. We focus on REGULAR coins. Which limits their number severely.Don't take me wrong: if we need to apply the same approach to a full list of coins... it will work. But I got 15 GB of images for that limited set, can you imagine how large the complete set will be?!To get images, I am going to scan one of the largest Russian coins site "meshok.ru".This site allows buyers and sellers to find each other; sellers can upload images... just what we need. Unfortunately, a business-oriented seller can easily upload his 1 rouble image to 1, 2, 5, 10 roubles topics, just to increase the exposure.

So we can not count on the topic name, we have to determine what coin is on the photo ourselves.To scan the site, a simple scanner was written, based on the Python's Beautiful Soup library. In just few hours I got over 50,000 photos. Not a lot by Machine Learning standards, but definitely a start.After we got the images, we have to - unfortunately - revisit them by hand, looking for images we do not want in our training set, or for images that should be edited somehow. For example, someone could have uploaded a photo of his cat. We don't need a cat in our dataset.First, we delete all images, that can not be split to head/hail.

Читать далее
Rating 0
Views 693
Comments 0

Doing «Data Science» even if you have never heard the words before

Python *Algorithms *Mathematics *Machine learning *Artificial Intelligence

There’s a lot of talk about machine learning nowadays. A big topic – but, for a lot of people, covered by this terrible layer of mystery. Like black magic – the chosen ones’ art, above the mere mortal for sure. One keeps hearing the words “numpy”, “pandas”, “scikit-learn” - and looking each up produces an equivalent of a three-tome work in documentation.

I’d like to shatter some of this mystery today. Let’s do some machine learning, find some patterns in our data – perhaps even make some predictions. With good old Python only – no 2-gigabyte library, and no arcane knowledge needed beforehand.

Interested? Come join us.

Read more
Rating 0
Views 865
Comments 0

Visualizing Network Topologies: Zero to Hero in Two Days

Python *Cisco *Network technologies *
Translation

Hey everyone! This is a follow-up article on a local Cisco Russia DevNet Marathon online event I attended in May 2020. It was a series of educational webinars on network automation followed by daily challenges based on the discussed topics.
On a final day, the participants were challenged to automate a topology analysis and visualization of an arbitrary network segment and, optionally, track and visualize the changes.


The task was definitely not trivial and not widely covered in public blog posts. In this article, I would like to break down my own solution that finally took first place and describe the selected toolset and considerations.

Let's get started.


Read more →
Total votes 2: ↑2 and ↓0 +2
Views 11K
Comments 0

Big Data Tools EAP 12 Is Out: Experimental Python Support and Search Function in Zeppelin Notebooks

JetBrains corporate blog Python *Scala *Big Data *

Update 12 of the Big Data Tools plugin for IntelliJ IDEA Ultimate, PyCharm Professional Edition, and DataGrip has been released. You can install it from the JetBrains Plugin Repository or from inside your IDE. The plugin allows you to edit Zeppelin notebooks, upload files to cloud filesystems, and monitor Hadoop and Spark clusters.


In this release, we've added experimental Python support and global search inside Zeppelin notebooks. We’ve also addressed a variety of bugs. Let's talk about the details.


Read more →
Rating 0
Views 554
Comments 1

Authors' contribution