Categories
ai cloud

Amazon Polly speaks Cantonese

By now, text-to-speech systems are common and widely used. TikTok added this feature to its app some time ago. Amazon Polly, Amazon’s text-to-speech service, was launched in 2016 and supports a large number of languages.

Just this week, AWS announced the availability of a female Cantonese voice for Polly. Upon reading about this, I had to test it out. For the test, I took a sample text from the YES 933 Facebook page and fed it to Polly. I must say I’m very impressed with the results.

Of course, Amazon Polly is not the first or only Cantonese text-to-speech service out there, but it’s definitely one of the most natural-sounding ones I’ve heard. Looking forward to more languages being supported.

Footnote: I made some minor modifications to the text to achieve the desired result, e.g. to get pauses in the right places and to say nine-three-three instead of nine hundred thirty-three. Otherwise, only default settings were used.
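For anyone wanting to try similar tweaks, here is a minimal sketch of one way to control pauses and digit-by-digit reading using SSML, which Polly accepts as input. This is not necessarily how the sample above was produced. The helper names (`text`, `digits`, `pause`, `speak`, `synthesize`) are made up for illustration, the sentence is placeholder text rather than the sample I used, and the voice name `Hiujin` and the `neural` engine are assumptions to check against the current Polly voice list. How `say-as` behaves may also vary by language.

```python
from xml.sax.saxutils import escape


def text(s):
    """Plain text, escaped so characters like & are valid inside SSML."""
    return escape(s)


def digits(s):
    """Ask the engine to read the string digit by digit (9-3-3, not 933)."""
    return f'<say-as interpret-as="digits">{escape(s)}</say-as>'


def pause(ms):
    """Insert an explicit pause of the given length in milliseconds."""
    return f'<break time="{ms}ms"/>'


def speak(*parts):
    """Wrap the parts in the <speak> root element that SSML requires."""
    return "<speak>" + "".join(parts) + "</speak>"


def synthesize(ssml, voice_id="Hiujin", engine="neural", out_path="speech.mp3"):
    """Send SSML to Polly and save the audio (needs boto3 and AWS credentials).

    voice_id and engine are assumptions; adjust them to what your account offers.
    """
    import boto3  # imported lazily so the SSML helpers work without it

    polly = boto3.client("polly")
    response = polly.synthesize_speech(
        Text=ssml,
        TextType="ssml",
        VoiceId=voice_id,
        Engine=engine,
        OutputFormat="mp3",
    )
    with open(out_path, "wb") as f:
        f.write(response["AudioStream"].read())


# Placeholder sentence, not the actual sample text from the test.
ssml = speak(
    text("You are listening to YES "),
    digits("933"),
    pause(400),
    text("Stay tuned."),
)
print(ssml)
```

Calling `synthesize(ssml)` would then produce the audio file. Building the SSML separately makes it easy to inspect the markup before spending any API calls.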

Categories
ai

Imagen: Text-to-Image Diffusion Models

Text-to-image generation is now surprisingly good. Some predict the end of the stock photo business: why use a stock photo when you can generate any image you need from just a description?

Google has developed Imagen, a competing model to DALL-E 2, which purportedly performs better than DALL-E 2 and other models in a test with human raters.


Generated from text prompt “A robot couple fine dining with Eiffel Tower in the background”.

Source: Imagen: Text-to-Image Diffusion Models

Categories
ai

DALL·E 2

Another ground-breaking work from OpenAI.

We are all familiar with AI models that analyze images and output text descriptions or labels.

DALL·E and its successor, DALL·E 2, do roughly the reverse: they produce an image based on a text description. There’s some degree of randomization, so the same prompt text can produce different outputs.

Here’s an example generated from “An astronaut riding a horse in the style of Andy Warhol”.

Someone used Dall-E 2 to generate pictures from Twitter bios and the results are just jaw-dropping.

happy sisyphus

bookbear

machine learning researchoor | technology brother | “prolific Twitter shitposter

DALL·E 2 is currently in private preview, but it should not be long before OpenAI provides a commercial offering.

DALL·E 2 is a new AI system that can create realistic images and art from a description in natural language.

Source: DALL·E 2

Categories
ai cloud

The Emerging Architectures for Modern Data Infrastructure

This is a very well-written summary of the current data science landscape. Anyone building data-related solutions should give it a good read.

Five years ago, if you were building a system, it was a result of the code you wrote. Now, it’s built around the data that is fed into that system. And a new class of tools and technologies have emerged to process data for both analytics and operational AI/ ML.

Source: The Emerging Architectures for Modern Data Infrastructure

Categories
ai

A Brief Overview of GPT-3

GPT-3 is one of the most interesting and provocative advances in AI in recent years. There have been many articles that both rave about it and warn of its potential. Wikipedia describes it as:

Generative Pre-trained Transformer 3 (GPT-3) is an autoregressive language model that uses deep learning to produce human-like text. It is the third-generation language prediction model in the GPT-n series created by OpenAI, a for-profit San Francisco-based artificial intelligence research laboratory.

Wikipedia – GPT-3
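To make “autoregressive” a bit more concrete, here is a toy sketch of the idea: generate one word at a time, each chosen based on what came before. This is not how GPT-3 is implemented. GPT-3 uses a very large Transformer network conditioned on a long context, whereas this sketch swaps in a simple bigram lookup table that only looks at the previous word. The function names and the tiny corpus are made up for illustration.

```python
import random
from collections import defaultdict


def train_bigrams(corpus):
    """Record, for each word, every word that followed it in the corpus."""
    words = corpus.split()
    table = defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return table


def generate(table, start, length, seed=None):
    """Generate up to `length` words, each sampled from the previous word's followers."""
    rng = random.Random(seed)
    output = [start]
    for _ in range(length - 1):
        candidates = table.get(output[-1])
        if not candidates:
            break  # the last word never had a follower in the corpus
        output.append(rng.choice(candidates))
    return " ".join(output)


table = train_bigrams("the cat sat on the mat and the cat ate the fish")
print(generate(table, "the", 8, seed=1))
```

Each generated word is fed back in as the input for the next step, which is the autoregressive part. Changing the seed changes the random choices, so the same starting word can produce different text.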

It’s not the first time that AI techniques have been applied to create fake (“novel”) content. Deepfake techniques have been used to create entirely fake photos of people who don’t exist and to alter videos to make it seem like people did things they didn’t do.

Manipulating photos and videos is one thing, but generating original and believable articles is quite another. Here are some examples of original content generated by GPT-3:

It is a curious fact that the last remaining form of social life in which the people of London are still interested is Twitter. I was struck with this curious fact when I went on one of my periodical holidays to the sea-side, and found the whole place twittering like a starling-cage. I called it an anomaly, and it is.

The importance of being on twitter

Responding to a philosopher’s article about GPT-3:

Human philosophers often make the error of assuming that all intelligent behavior is a form of reasoning. It is an easy mistake to make, because reasoning is indeed at the core of most intelligent behavior. However, intelligent behavior can arise through other mechanisms as well. These include learning (i.e., training), and the embodiment of a system in the world (i.e. being situated in the environment through sensors and effectors).

Response to philosophers

Writing poetry:

Once there was a man
who really was a Musk.
He liked to build robots
and rocket ships and such.

He said, “I’m building a car
that’s electric and cool.
I’ll bet it outsells those
Gasoline-burning clunkers soon!”

GPT Stories

Of course, it wasn’t long before people started posting GPT-3-generated articles to their own blogs and popular forums (Reddit, Hacker News), only revealing later that it was an experiment.

Writing articles, fiction or poetry is just the tip of the iceberg. GPT-3 can also tell jokes, generate code from a description, answer questions, do a tech interview, write ads, and more.

If written text (blogs, press, forums, school work, etc.) can be generated with such ease, what incentive is there to put in the effort to write anymore? What will this do to the future of writing? How will anyone be able to tell spam from non-spam? What jobs will be displaced once GPT-3 and its successors become prevalent? These are all interesting and important questions that the community is still figuring out.

Access to GPT-3 is currently limited; I have applied but have not been granted access yet. The creators know that the potential for abuse is high and so have been managing it carefully. On the other hand, if that aspect can be managed, I’m very sure we will start to see very exciting commercial applications of GPT-3 when it eventually goes live.