2019-08 Open archive of 240,000 hours' worth of talk radio, including 2.8 billion words of machine-transcription

- August 02, 2019

A group of MIT Media Lab researchers has published RadioTalk, a massive corpus of talk radio audio with machine-generated transcriptions: 240,000 hours' worth of speech in total, marked up with machine-readable metadata.

The audio was scraped from streaming radio services between October 2018 and March 2019, and the transcripts run to 2.8 billion words. The researchers hope the corpus will be used by "researchers in the fields of natural language processing, conversational analysis, and the social sciences."

https://boingboing.net/2019/08/01/pump-up-the-volume.html
