Sound

snakers4 March 30, 2021 at 06:33 AM

High-Quality Text-to-Speech Made Accessible, Simple and Fast
There is a lot of commotion in text-to-speech right now: a great variety of toolkits, a plethora of commercial APIs from GAFA companies (built on both new and older technologies), and many Silicon Valley startups trying to ship products akin to "deep fakes" in speech.

But despite all this ruckus we have not yet seen open solutions that would fulfill all of these criteria:
- Natural-sounding speech;
- A large library of voices in many languages;
- Support for 16kHz and 8kHz out of the box;
- No GPUs / ML engineering team / training required;
- Unique voices not infringing upon third-party licenses;
- High throughput on slow hardware. Decent performance on one CPU thread;
- Minimalism and lack of dependencies. One-line usage, no builds or coding in C++ required;
- Positioned as a solution, not yet another toolkit / compilation of models developed by other people;
- Not affiliated in any way with the ecosystems of Google / Yandex / Sberbank.
We decided to share with the community our open non-commercial solution that fits all of these criteria. Since we have published the whole pipeline, we do not focus much on cherry-picked examples, and we encourage you to visit our project's GitHub repo to test our TTS for yourself.
SvyatoslavMC March 5, 2021 at 08:18 AM

Short-lived Music or MuseScore Code Analysis
- PVS-Studio corporate blog,
- Open source,
- C++,
- C,
- Sound
In some areas, a programming background alone is not enough to develop software. Medical software is one example, and music software, the subject of this article, is another. Such projects need the advice of subject matter experts, which makes development more expensive, so developers sometimes save on code quality. The check of the MuseScore project described in this article shows why code quality expertise matters. Hopefully, programming and musical humor will brighten up the technical text.

snakers4 January 14, 2021 at 10:09 AM

Modern Portable Voice Activity Detector Released
- Open source,
- Machine learning,
- Sound
Currently, there are hardly any high-quality, modern, free, public voice activity detectors except for the WebRTC Voice Activity Detector. WebRTC, though, is starting to show its age and suffers from many false positives.

Also, in some cases it is crucial to be able to anonymize large-scale spoken corpora (i.e. remove personal data). Typically, personal data is considered private / sensitive if it contains (i) a name or (ii) some private ID. Name recognition is a highly subjective matter that depends on locale and business case, but Voice Activity and Number Detection are quite general tasks.

Key features:
- Modern, portable;
- Low memory footprint;
- Superior metrics to WebRTC;
- Trained on huge spoken corpora and noise / sound libraries;
- Slower than WebRTC, but fast enough for IoT / edge / mobile applications;
- Unlike WebRTC (which mostly tells silence from voice), our VAD can tell voice from noise / music / silence;
- PyTorch (JIT) and ONNX checkpoints;
Typical use cases:
- Spoken corpora anonymization;
- Can be used together with WebRTC;
- Voice activity detection for IoT / edge / mobile use cases;
- Data cleaning and preparation, number and voice detection in general;
- PyTorch and ONNX can be used with a wide variety of deployment options and backends in mind;
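To make the task concrete, here is a minimal sketch of the simplest possible detector: a per-frame energy threshold. This is neither WebRTC's method nor the neural model announced above; the function names, the 30 ms frame size and the 0.02 threshold are all assumptions chosen for illustration. It also shows why telling voice from noise / music matters: a pure tone is flagged as "active" just like speech would be.

```python
import math


def frame_rms(samples):
    """Root-mean-square level of one frame of float samples in [-1, 1]."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def energy_vad(samples, sample_rate=16000, frame_ms=30, threshold=0.02):
    """Return one boolean per full frame: True if the frame is louder than threshold.

    Hypothetical helper for illustration only; the threshold is an arbitrary assumption.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        flags.append(frame_rms(frame) >= threshold)
    return flags


# Half a second of silence followed by half a second of a 220 Hz tone.
sr = 16000
silence = [0.0] * (sr // 2)
tone = [0.5 * math.sin(2 * math.pi * 220 * n / sr) for n in range(sr // 2)]
flags = energy_vad(silence + tone, sample_rate=sr)

# The tone is not speech, yet every frame containing only tone is marked active.
print(flags)
```

An energy threshold can only separate silence from sound, so any music or background noise above the threshold counts as activity.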
snakers4 September 17, 2020 at 07:48 PM

Modern Google-level STT Models Released
- Big Data,
- Machine learning,
- Start-up development,
- Sound
We are proud to announce that we have built from the ground up and released our high-quality (i.e. on par with premium Google models) speech-to-text models for the following languages:
- English;
- German;
- Spanish;
You can find all of our models in our repository together with examples and quality and performance benchmarks. We also invested some time into making our models as accessible as possible: you can try our examples as well as the PyTorch, ONNX and TensorFlow checkpoints, or load a model via TorchHub.

Model            PyTorch  ONNX  TensorFlow  Quality  Colab
English (en_v1)  ✓        ✓     ✓           link
German (de_v1)   ✓        ✓     ✓           link
Spanish (es_v1)  ✓        ✓     ✓           link
itmo March 30, 2020 at 04:49 PM

Juggling work and study at ITMO University: CS edition
We talked to the graduates of the Speech Information Systems MA program at ITMO about the ways our university helped jumpstart their careers. [More stories from our startups]:
- ITMO University startup accelerator introduces Laeneco, a smart stethoscope
- Quantum communications: building 100% secure data transfer systems
ZvoogHub September 2, 2019 at 01:34 PM

How to use MIDI for web in short
- Web design,
- Open source,
- JavaScript,
- WebGL,
- Sound
MIDI player
- Use MIDI parser to read notes from MIDI file
- Use WebAudioFont to play musical instruments in a browser
- See result
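The first step, reading notes from a MIDI file, can be sketched in a few lines. The post itself works in JavaScript with a MIDI parser library and WebAudioFont; the Python below is only an illustration of what such a parser does, based on the Standard MIDI File layout (an `MThd` header chunk followed by `MTrk` track chunks). All function names are hypothetical, and the sketch ignores rarely used system messages.

```python
import struct


def iter_chunks(data):
    """Yield (chunk_id, body) for every chunk in a Standard MIDI File."""
    pos = 0
    while pos + 8 <= len(data):
        chunk_id, length = struct.unpack_from(">4sI", data, pos)
        yield chunk_id, data[pos + 8:pos + 8 + length]
        pos += 8 + length


def read_vlq(data, pos):
    """Read a variable-length quantity; return (value, next_position)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


def read_notes(track):
    """Return (tick, channel, note, velocity) for each Note On in a track body."""
    notes, pos, tick, status = [], 0, 0, None
    while pos < len(track):
        delta, pos = read_vlq(track, pos)
        tick += delta
        byte = track[pos]
        if byte == 0xFF:                      # meta event: type, length, data
            length, pos = read_vlq(track, pos + 2)
            pos += length
        elif byte in (0xF0, 0xF7):            # sysex event: length, data
            length, pos = read_vlq(track, pos + 1)
            pos += length
        else:
            if byte & 0x80:                   # new status byte
                status = byte
                pos += 1
            if status is None:                # data byte with no status to reuse
                raise ValueError("data byte before any status byte")
            kind = status & 0xF0
            n_data = 1 if kind in (0xC0, 0xD0) else 2
            data = track[pos:pos + n_data]
            pos += n_data
            if kind == 0x90 and data[1] > 0:  # Note On with velocity 0 means Note Off
                notes.append((tick, status & 0x0F, data[0], data[1]))
    return notes


def parse_midi(data):
    """Return (division, [notes per track]) for a MIDI file given as bytes."""
    chunks = list(iter_chunks(data))
    if not chunks or chunks[0][0] != b"MThd":
        raise ValueError("not a Standard MIDI File")
    _fmt, _n_tracks, division = struct.unpack_from(">HHH", chunks[0][1])
    return division, [read_notes(body) for cid, body in chunks[1:] if cid == b"MTrk"]


# A tiny hand-built file: C4 at tick 0, then E4 at tick 480.
body = bytes([0x00, 0x90, 0x3C, 0x40,
              0x83, 0x60, 0x80, 0x3C, 0x00,
              0x00, 0x90, 0x40, 0x50,
              0x00, 0xFF, 0x2F, 0x00])
demo_file = (b"MThd" + struct.pack(">IHHH", 6, 0, 1, 480)
             + b"MTrk" + struct.pack(">I", len(body)) + body)
division, notes = parse_midi(demo_file)
print(division, notes)
```

The resulting list of ticks, note numbers and velocities is the kind of data a player would then schedule on an instrument.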
ValdikSS June 18, 2019 at 12:00 AM

Bluetooth stack modifications to improve audio quality on headphones without AAC, aptX, or LDAC codecs
Before reading this article, it is recommended to read the previous one: Audio over Bluetooth: most detailed information about profiles, codecs, and devices (also available in Russian).

Some wireless headphone users note low sound quality and a lack of high frequencies when using the standard Bluetooth SBC codec, which is supported by all headphones and other Bluetooth audio devices. A common recommendation for getting better sound quality is to buy devices and headphones with aptX or LDAC codec support. These codecs require licensing fees, which is why devices with them are more expensive.

It turns out that the low quality of SBC is caused by artificial limitations of all current Bluetooth stacks and headphones' configuration, and this limitation can be circumvented on any existing device with software modification only.
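To see why a software-only change can matter, it helps to look at how the SBC bitrate follows from its parameters. The sketch below assumes the frame-length formula for two-channel SBC from the A2DP specification; the function names are hypothetical. The value 53 is used only as an example of a commonly cited bitpool for 44.1 kHz joint stereo, and the higher values are purely illustrative rather than settings taken from the article.

```python
import math


def sbc_frame_length(bitpool, subbands=8, blocks=16, joint_stereo=True):
    """Frame length in bytes for two-channel SBC (stereo or joint stereo).

    Assumes the formula from the A2DP specification:
    header and scale factors, plus the bit-allocated audio payload.
    """
    channels = 2
    header = 4 + (4 * subbands * channels) // 8
    if joint_stereo:
        payload_bits = subbands + blocks * bitpool  # one extra "join" bit per subband
    else:
        payload_bits = blocks * bitpool
    return header + math.ceil(payload_bits / 8)


def sbc_bitrate(bitpool, sample_rate=44100, subbands=8, blocks=16, joint_stereo=True):
    """Bitrate in bits per second; each frame carries subbands * blocks samples."""
    frame_len = sbc_frame_length(bitpool, subbands, blocks, joint_stereo)
    return 8 * frame_len * sample_rate / (subbands * blocks)


# Bitrate grows roughly linearly with bitpool (values above 53 are illustrative).
for bitpool in (35, 53, 76):
    print(bitpool, round(sbc_bitrate(bitpool) / 1000), "kbps")
```

Under these assumptions, the codec itself does not fix the bitrate: it is the negotiated bitpool that caps it, which is consistent with the claim that the limitation comes from configuration rather than from SBC as such.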
ValdikSS June 18, 2019 at 12:00 AM

Audio over Bluetooth: most detailed information about profiles, codecs, and devices
- Wireless technologies,
- Sound
This article is also available in Russian.

The mass market of smartphones without a 3.5 mm audio jack has changed the headphone industry: for many users, wireless Bluetooth headphones have become the main way to listen to music and to communicate in headset mode.
Bluetooth device manufacturers rarely disclose detailed product specifications, and Bluetooth audio articles on the Internet are contradictory and sometimes incorrect. They do not cover all the features and often repeat the same false information.
Let's try to understand the protocol and the capabilities of Bluetooth stacks, headphones and speakers, and Bluetooth codecs for music and speech; find out what affects the quality and delay of the transmitted audio; and learn how to capture and decode information about supported codecs and other device features.

TL;DR:
- SBC codec is OK
- Headphones have their own per-codec equalizer and post processing configuration
- aptX is not as good as the advertisements say
- LDAC is marketing fluff
- Voice audio quality is still low
- Browsers are able to execute audio encoders compiled to WebAssembly from C using emscripten, and they won't even lag.
vibe_crc February 20, 2019 at 11:31 AM

Designing Sound for Pathfinder: Kingmaker
Pathfinder: Kingmaker (PF:K for short) is a role-playing video game created by Owlcat Games and released in fall 2018 on Steam and GOG. Inspired by classic BioWare games, the project uses the ruleset of a popular board game system, has real-time-with-pause combat and an isometric camera, and tells a non-linear story with multiple unique endings.

In this article, I will share a little about how we worked on designing the audio throughout the game’s development including task management, the search for inspiration, and troubleshooting. An experienced specialist may not find anything particularly groundbreaking in this recap, but beginners and enthusiasts will definitely discover some points of interest.

shiru8bit February 7, 2019 at 01:14 PM

PC Speaker To Eleven
- Abnormal programming,
- Assembler,
- Demoscene,
- Old hardware,
- Sound
Known now as a "motherboard speaker", or just a "beeper", the PC Speaker was introduced in 1981 along with the first IBM personal computer. Since the PC was a successor to the big serious computers for serious business, the speaker was designed to produce very basic system beeps, so it never really had a chance to shine as a music device in the numerous entertainment programs of the emerging home market. Overshadowed by the much more advanced sound chips of popular home game systems and quickly replaced by powerful sound cards, it mostly served as a fallback option, playing severely downgraded versions of content made for better sound hardware.

"System Beeps" is a music album in the form of an MS-DOS program that features original music composed for the PC Speaker using the same basic old techniques as those found in classic PC games. It follows the usual retro computing demoscene formula (take something rusty and obsolete, and push it to eleven) and attempts to reveal the long-hidden potential of this humble little sound device. You can hear it in action on Bandcamp, or in the video below, and form an opinion on how successful this attempt was. The following article is an in-depth overview of the original PC Speaker capabilities and the making of the project, for those who would like to know more.

