InvokeAI 2.2 is now available to everyone. This update brings in exciting features, like UI Outpainting, Embedding Management and more. See highlighted updates below, or the full release notes for everything included in the release.
Python *
Interpreted high-level programming language for general-purpose programming
Python Junior Plus, or the beginner's Roadmap to becoming a Python programmer
Hello! My name is Mikhail Emelyanov, I am embedded software engineer, and I was inspired to write this little roadmap on the capabilities of Python language by a certain commonality among the existing Python tutorials found on the web.
The usual suggestions to study, say, “Algorithms and Data Structures” or “Databases” are especially jarring. You can spend years studying these topics, and even after decades you'd still be able to find something you didn't know yet even without ever venturing outside the scope of Algorithms!
Using video game analogies, we can say that novice programmers often stand on the shore of the lake of boiling lava with an island with the ever-coveted jobs in the center, while the islands in between, which you have to jump on, gradually increasing your skills in successive mini-quests, are either missing, or arranged haphazardly, or their fairly smooth sequence breaks off, never having managed to get you any farther from the shore. Let's try to build a path of hint islands, a number of which, although not without effort, will finally allow us to reach our goal.
I trained a neural network on my drawings and give the model for free (and teach you to create your own)
Great for seamless patterns, abstract drawings, and watercolor-styled images. How to use it and train a neural network on your own pictures?
Download the model here: https://huggingface.co/netsvetaev/netsvetaev-free
InvokeAI 2.1 Release
The InvokeAI team is excited to share our latest feature release, with a set of new features, UI enhancements, and CLI capabilities.
What are OAuth 2 and OpenID Connect (OIDC)?
Author: Denis Zherdetskiy
Detecting attempts of mass influencing via social networks using NLP. Part 2
In Part 1 of this article, I built and compared two classifiers to detect trolls on Twitter. You can check it out here.
Now, time has come to look more deeply into the datasets to find some patterns using exploratory data analysis and topic modelling.
EDA
To do just that, I first created a word cloud of the most common words, which you can see below.
Detecting attempts of mass influencing via social networks using NLP. Part 1
During the last decades, the world’s population has been developing as an information society, which means that information started to play a substantial end-to-end role in all life aspects and processes. In view of the growing demand for a free flow of information, social networks have become a force to be reckoned with. The ways of war-waging have also changed: instead of conventional weapons, governments now use political warfare, including fake news, a type of propaganda aimed at deliberate disinformation or hoaxes. And the lack of content control mechanisms makes it easy to spread any information as long as people believe in it.
Based on this premise, I’ve decided to experiment with different NLP approaches and build a classifier that could be used to detect either bots or fake content generated by trolls on Twitter in order to influence people.
In this first part of the article, I will cover the data collection process, preprocessing, feature extraction, classification itself and the evaluation of the models’ performance. In Part 2, I will dive deeper into the troll problem, conduct exploratory analysis to find patterns in the trolls’ behaviour and define the topics that seemed of great interest to them back in 2016.
Features for analysis
From all possible data to use (like hashtags, account language, tweet text, URLs, external links or references, tweet date and time), I settled upon English tweet text, Russian tweet text and hashtags. Tweet text is the main feature for analysis because it contains almost all essential characteristics that are typical for trolling activities in general, such as abuse, rudeness, external resources references, provocations and bullying. Hashtags were chosen as another source of textual information as they represent the central message of a tweet in one or two words.
How we tackled document recognition issues for autonomus and automatic payments using OCR and NER
In this article, I would like to describe how we’ve tackled the named entity recognition (aka NER) issue at Sber with the help of advanced AI techniques. It is one of many natural language processing (NLP) tasks that allows you to automatically extract data from unstructured text. This includes monetary values, dates, or names, surnames and positions.
Just imagine countless textual documents even a medium-sized organisation deals with on a daily basis, let alone huge corporations. Take Sber, for example: it is the largest financial institution in Russia, Central and Eastern Europe that has about 16,500 offices with over 250,000 employees, 137 million retail and 1.1 million corporate clients in 22 countries. As you can imagine, with such an enormous scale, the company collaborates with hundreds of suppliers, contractors and other counterparties, which implies thousands of contracts. For instance, the estimated number of legal documents to be processed in 2022 has been over 65,000, each of them consisting of 30 pages on average. During the lifecycle of a contract, a contract usually updated with 3 to 5 additional agreements. On top of this, a contract is accompanied by various source documents describing transactions. And in the PDF format, too.
Previously, the processing duty befell our service centre’s employees who checked whether payment details in a bill match those in the contract and then sent the document to the Accounting Department where an accountant double-checked everything. This is quite a long journey to a payment, right?
Django ModelAdmins autoregister
Some time ago I discovered that Django has the ability to auto-register ModelAdmins
. Since this is not common knowledge and carries a number of benefits, I decided to write an article about it to bring it to the attention of the Django community.
The Python and Django conferences you don't want to miss in 2022
Quick reminder for those, who would like to broaden their horizons: there is “PyCon DE Berlin 2022 ” coming up already next week!
Django admin dynamic Inline positioning
Recently I've received an interesting request from a client about one of our Django projects.
He asked if it would be possible to show an inline component above other fields in the Django admin panel.
At the beginning I thought, that there shouldn't be any issue with that.
Though there was no easy solution other then installing another battery to the project. My gut feeling told me, there were another way around that problem.
Stop losing clients! Or how a developer can test a website, by the example of PVS-Studio. Part 1
A website with bugs could be a real pain in the neck for business. Just one 404 or 500 error could end up costing an obscene amount of money for the company and hurt a good reputation. But there is a way to avoid this issue: the website testing. That's sort of what this article is about. After reading this article, you will learn how to test code in Django, create your "own website tester" and much more. Welcome to the article.
Easy concurrency with Python Shared Object
Project repository.
Year old article about general concepts of the project.
So you want to build a multitasking system using python? But you actually hesitate because you know you'll have to either use multitasking module, which is slow and/or somewhat inconvenient, or a more powerfull external tool like Redis or RabbitMQ or even large DBMS like MongoDB or PostgreSQL, which require some glue (i.e. very far from native python code) and apply their own restrictions on what you can do with your data. If you think «why do I need so much hassle if I just want to run few worker threads in python using the data structures I already have in my python program and using functions I've already written? I just want to run this code in threads! Oh, I wish there was no GIL in Python» — then welcome to the club.
Of course many of us can build from scratch a decent tool that would make use of multiple cores. However, having already existing working software (Pandas, Tensorflow, SciPy, etc) is always cheaper than any development of new software. But the status quo in CPython tells us one thing: you cannot remove GIL because everything is based on GIL. Although making shit into gold could require much work, the ability to alleviate the transition from slow single-threaded shit to a slow not-so-single-threaded gold-looking shit might be worth it, so you won't have to rewrite your whole system from scratch.
We have published a model for text repunctuation and recapitalization for four languages
Working with speech recognition models we often encounter misconceptions among potential customers and users (mostly related to the fact that people have a hard time distinguishing substance over form). People also tend to believe that punctuation marks and spaces are somehow obviously present in spoken speech, when in fact real spoken speech and written speech are entirely different beasts.
Of course you can just start each sentence with a capital letter and put a full stop at the end. But it is preferable to have some relatively simple and universal solution for "restoring" punctuation marks and capital letters in sentences that our speech recognition system generates. And it would be really nice if such a system worked with any texts in general.
For this reason, we would like to share a system that:
- Inserts capital letters and basic punctuation marks (dot, comma, hyphen, question mark, exclamation mark, dash for Russian);
- Works for 4 languages (Russian, English, German, Spanish) and can be extended;
- By design is domain agnostic and is not based on any hard-coded rules;
- Has non-trivial metrics and succeeds in the task of improving text readability;
To reiterate — the purpose of such a system is only to improve the readability of the text. It does not add information to the text that did not originally exist.
Mode on: Comparing the two best colorization AI's
This article continues a series of notes about colorization. During today's experiment, we’ll be comparing a recent neural network with the good old Deoldify to gauge the rate at which the future is approaching.
This is a practical project, so we won’t pay extra attention to the underlying philosophy of the Transformer architecture. Besides, any attempt to explain the principles of its operation to a wide public in hand waving terms would become misguiding.
A lecturer: Mr. Petrov! How does a transformer work?
Petrov with a bass voice: Hum-m-m-m.
Google Colorizing Transformer vs Deoldify
Data Phoenix Digest — 01.07.2021
We at Data Science Digest have always strived to ignite the fire of knowledge in the AI community. We’re proud to have helped thousands of people to learn something new and give you the tools to push ahead. And we’ve not been standing still, either.
Please meet Data Phoenix, a Data Science Digest rebranded and risen anew from our own flame. Our mission is to help everyone interested in Data Science and AI/ML to expand the frontiers of knowledge. More news, more updates, and webinars(!) are coming. Stay tuned!
The new issue of the new Data Phoenix Digest is here! AI that helps write code, EU’s ban on biometric surveillance, genetic algorithms for NLP, multivariate probabilistic regression with NGBoosting, alias-free GAN, MLOps toys, and more…
If you’re more used to getting updates every day, subscribe to our Telegram channel or follow us on social media: Twitter, Facebook.
DataScience Digest — 24.06.21
The new issue of DataScienceDigest is here!
The impact of NLP and the growing budgets to drive AI transformations. How Airbnb standardized metric computation at scale. Cross-Validation, MASA-SR, AgileGAN, EfficientNetV2, and more.
If you’re more used to getting updates every day, subscribe to our Telegram channel or follow us on social media: Twitter, LinkedIn, Facebook.
[Tutorial] How to set up Atom IDE for python development
Setting up Atom for working with python is quite a tricky task. I've spent a lot of time making it work. Autocompleting, autoformatting, type hints, and much more will be available to you after reading this tutorial.
DataScience Digest — 10.06.21
The new issue of DataScienceDigest is here!
Machine learning in healthcare, the top 10 TED talks on AI, fraud detection in Uber, DatasetGAN, Text-to-Image generation via transformers, and more…
DataScience Digest — 02.06.21
New issue of DataScienceDigest is here! OpenAI is launching a $100 million startup fund, Albumentations 1.0 has been released, lessons on ML platforms, image cropping on Twitter, and more.