Posts

Showing posts from 2019

How Big Tech Avoids Regulation

I disagree with the author's claim that "past AI researchers were not interested in the study of ethics"; there is a large literature on this, going back as far as the 1950s. It is no surprise that big companies (and the universities they fund) cannot be trusted to police themselves. Research funding has always posed ethical dilemmas in AI: most of the money has always come from the Department of Defense or from large corporations. And, of course, there has long been a cozy relationship between big tech companies and the Department of Defense, since well before Google, Apple, and Amazon. https://theintercept.com/2019/12/20/mit-ethical-ai-artificial-intelligence/ Adapted by Aniceto Pérez y Madrid, Philosopher of Technology and Editor of Actualidad Deep Learning (@forodeeplearn)

Deepfake Bot Submissions to Federal Public Comment Websites Cannot Be Distinguished from Human Submissions

The federal public comment period is an important way for federal agencies to incorporate public input into policy decisions. Now that comments are accepted online, comment periods are vulnerable to internet-scale attacks. For example, in 2017 more than 21 million (96% of 22 million) of the comments submitted on the FCC's proposal to repeal net neutrality were detectably generated by search-and-replace techniques. Publicly available AI methods can now generate "deepfakes": computer-generated text that mimics original human speech with great accuracy. https://techscience.org/a/2019121801/?utm_campaign=the_cybersecurity_202&utm_medium=Email&utm_source=Newsletter&wpisrc=nl_cybersecurity202&wpmm=1
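The search-and-replace pattern described above can be illustrated with a toy sketch (the comments and the 25% threshold below are invented for illustration, not taken from the study): two submissions built from the same template differ only at the word slots where a synonym was swapped in, so their word-level difference is tiny.

```python
# Toy sketch: flag comment pairs that look like search-and-replace clones.
# All data and the 0.25 threshold are hypothetical illustrations.

def template_distance(a: str, b: str) -> float:
    """Fraction of word positions that differ (1.0 if lengths differ)."""
    wa, wb = a.lower().split(), b.lower().split()
    if len(wa) != len(wb):
        return 1.0
    diffs = sum(x != y for x, y in zip(wa, wb))
    return diffs / len(wa)

comments = [
    "I strongly oppose the plan to repeal net neutrality protections",
    "I firmly oppose the proposal to repeal net neutrality protections",
    "Please keep the existing rules in place for all consumers",
]

# Pairs whose word-level difference is under 25% are likely template clones.
clones = [
    (i, j)
    for i in range(len(comments))
    for j in range(i + 1, len(comments))
    if template_distance(comments[i], comments[j]) < 0.25
]
print(clones)  # the first two comments differ in only 2 of 10 words
```

Real analyses of the FCC comments clustered millions of submissions this way; the sketch only shows the core idea on three strings.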

"People don't notice that they are constantly paying with their own data"

https://sevilla.abc.es/economia/abci-gente-no-fija-paga-continuamente-propios-datos-201912150135_noticia.html The interview takes place in her office in the Berlaymont building, decorated with Nordic sobriety. The first question concerns her new responsibilities as manager of the digital single market within the framework of the "Green Deal" launched by the new Commission President Ursula von der Leyen, in which digital technology can be both part of the problem and part of the solution. "It is definitely both," she says, "and I think this will affect the entire mandate, because when you look at the goal of being carbon neutral by 2050, it is a completely horizontal objective. Many digital solutions consume an enormous amount of energy, such as server farms or autonomous cars. But, on the other hand, the fight against climate change cannot be waged without digital solutions." What do you think will be the main difference of this new legisl...

AI Vendors as State Actors

Kate Crawford, director of the AI Now Institute, argues in this Columbia Law Review publication that vendors of AI systems used by the US government should be treated as "state actors" in order to establish accountability, given that public officials lack sufficient expertise. https://columbialawreview.org/wp-content/uploads/2019/11/Crawford-Schultz-AI_systems_as_state_actors.pdf

Presentation Professor Luciano Floridi – Onlife and Being Human in a Hyperconnected Era: What Utopia?

Is there such a thing as a digital utopia? Can technologies help us realize our ideals and purposes? And what does "benefit for a purpose" mean, anyway? At Sogeti's Utopia for Beginners Executive Summit, these questions were addressed by some highly acclaimed thinkers and visionaries. In the following video, Professor Luciano Floridi shares his vision of "Utopia for Beginners". Luciano Floridi is Professor of Philosophy and Ethics of Information at the University of Oxford. Floridi is one of the European Union's most influential advisers in the field of information ethics. He also advised Google on how to address citizens' new right to be forgotten on the Internet. He is the Director of the Digital Ethics Lab, a partnership between the University of Oxford and industry. The lab's goal is "to identify and enhance the opportunities of digital innovation while reducing the risks ...

AI and data irony – Ferrari without fuel?

Or, a lake without oxygen? While data is exploding and dancing like never before, AI is still not able to convert this ocean into the juice of actionable intelligence as much as, or as fast as, we hoped it would. What could be holding AI back? https://www.dqindia.com/ai-data-irony-ferrari-without-fuel/

U.S. Police Already Using 'Spot' Robot From Boston Dynamics in the Real World

Massachusetts State Police (MSP) has been quietly testing ways to use the four-legged Boston Dynamics robot known as Spot, according to new documents obtained by the American Civil Liberties Union of Massachusetts. And while Spot isn’t equipped with a weapon just yet, the documents provide a terrifying peek at our RoboCop future. https://gizmodo.com/u-s-police-already-using-spot-robot-from-boston-dynami-1840029868

Questioning The Long-Term Importance Of Big Data In AI

No asset is more prized in today's digital economy than data. It has become widespread to the point of cliche to refer to data as "the new oil." As one recent Economist headline put it, data is "the world's most valuable resource." Data is so highly valued today because of the essential role it plays in powering machine learning and artificial intelligence solutions. Training an AI system to function effectively—from Netflix's recommendation engine to Google's self-driving cars—requires massive troves of data. https://www.forbes.com/sites/robtoews/2019/11/04/questioning-the-long-term-importance-of-big-data-in-ai/#2d71ccc42177

OpenAI releases the full version of GPT-2

OpenAI has released the full version of GPT-2, its most advanced text-generation system, with 1.5 billion parameters. https://www.theverge.com/2019/11/7/20953040/openai-text-generation-ai-gpt-2-full-model-release-1-5b-parameters

HAI conference on video: Ethics, Policy, and Governance

Conference on the ethics, policy, and governance of AI, October 28-29, 2019. The videos include all the talks, among them one by Fei-Fei Li, co-director of HAI. Day 1: https://youtu.be/k0jF-UMC1b4 Day 2: https://youtu.be/qPx9P1Mybu8

Head fake: MIT work shows fake news detection isn't quite there yet

Efforts to detect fake news are not as far along as they should be, since the best approaches rely on pattern detection that can be exploited by malicious actors. https://www.zdnet.com/article/head-fake-mit-says-fake-news-detection-is-not-what-it-appears/

California bans political 'deepfakes' during election campaigns

The law will only allow distribution of this kind of content by news media or for purposes of satire or parody. https://www.elmundo.es/tecnologia/2019/10/08/5d9c997a21efa0ed088b4580.html

Fraud with a deepfake: the dark side of artificial intelligence

In recent years, fake news has been a serious concern. Fakes are believed to have played an important role in electoral processes such as the 2016 US election and the Brexit referendum. https://www.pandasecurity.com/mediacenter/news/deepfake-voice-fraud/

2019-09 Contributing Data to Deepfake Detection Research

Deep learning has given rise to technologies that would have been thought impossible only a handful of years ago. Modern generative models are one example of these, capable of synthesizing hyperrealistic images, speech, music, and even video. These models have found use in a wide variety of applications, including making the world more accessible through text-to-speech, and helping generate training data for medical imaging. https://ai.googleblog.com/2019/09/contributing-data-to-deepfake-detection.html

AI can’t protect us from deepfakes, argues new report

A new report from Data & Society raises doubts about automated solutions to deceptively altered videos. Relying on AI could even make things worse by concentrating more data and power in the hands of private corporations. https://www.theverge.com/2019/9/18/20872084/ai-deepfakes-solution-report-data-society-video-altered

2019-09 NHS trusts sign first deals with Google

Five National Health Service trusts have signed partnerships with Google to process sensitive patient records, in what are believed to be the first deals of their kind.  The deals came after DeepMind, the London-based artificial intelligence company, transferred control of its health division to its Californian parent. DeepMind had contracts to process medical data from six NHS trusts in Britain to develop its Streams app, which alerts doctors and nurses when patients are at risk of acute kidney injury, and to conduct artificial intelligence research. https://www.ft.com/content/641e0d84-da21-11e9-8f9b-77216ebe1f17

2019-09 Amazon Releases Data Set of Annotated Conversations to Aid Development of Socialbots

Today I am happy to announce the public release of the Topical Chat Dataset, a text-based collection of more than 235,000 utterances (over 4,700,000 words) that will help support high-quality, repeatable research in the field of dialogue systems.  The goal of Topical Chat is to enable innovative research in knowledge-grounded neural response-generation systems by tackling hard challenges that are not addressed by other publicly available datasets. Those challenges, which we have seen universities begin to tackle in the Alexa Prize Socialbot Grand Challenge, include transitioning between topics in a natural manner, knowledge selection and enrichment, and integration of fact and opinion into dialogue. https://developer.amazon.com/es/blogs/alexa/post/885ec615-314f-425f-a396-5bcffd33dd76/amazon-releases-data-set-of-annotated-conversations-to-aid-development-of-socialbots

2019-09 Andrew Ng at Amazon re:MARS 2019

In eras of technological disruption, leadership matters.  Andrew Ng speaks about the progress of AI, how to accelerate AI adoption, and what's around the corner for AI, at Amazon re:MARS 2019 in Las Vegas, Nevada. https://www.deeplearning.ai/blog/andrew-ng-at-amazon-remars-2019/?utm_campaign=BlogAndrewReMarsSeptember12019&utm_content=100648184&utm_medium=social&utm_source=linkedin&hss_channel=lcp-18246783

Facebook expands use of face recognition

https://nakedsecurity.sophos.com/2019/09/06/facebook-expands-use-of-face-recognition/

2019-09 Announcing Two New Natural Language Dialog Datasets

Today’s digital assistants are expected to complete tasks and return personalized results across many subjects, such as movie listings, restaurant reservations and travel plans. However, despite tremendous progress in recent years, they have not yet reached human-level understanding. This is due, in part, to the lack of quality training data that accurately reflects the way people express their needs and preferences to a digital assistant. This is because the limitations of such systems bias what we say—we want to be understood, and so tailor our words to what we expect a digital assistant to understand. In other words, the conversations we might observe with today’s digital assistants don’t reach the level of dialog complexity we need to model human-level understanding. https://ai.googleblog.com/2019/09/announcing-two-new-natural-language.html

2019-08 Waymo Open Dataset

https://waymo.com/open/

2016-01 1.5 TB dataset of anonymized user interactions released by Yahoo

The Yahoo News Feed dataset is a collection based on a sample of anonymized user interactions on the news feeds of several Yahoo properties, including the Yahoo homepage, Yahoo News, Yahoo Sports, Yahoo Finance, Yahoo Movies, and Yahoo Real Estate. The dataset stands at a massive ~110B lines (1.5TB bzipped) of user-news item interaction data, collected by recording the user-news item interactions of about 20M users from February 2015 to May 2015. In addition to the interaction data, we are providing the demographic information (age segment and gender) and the city in which the user is based for a subset of the anonymized users. On the item side, we are releasing the title, summary, and key-phrases of the pertinent news article. The interaction data is timestamped with the user's local time and also contains partial information of the device on which the user accessed the news feeds, which allows for interesting work in contextual recommendation and temporal data mining. https://www.d...

2019-08 Waymo is going to share its self-driving data—but it’s still not enough

Waymo says it will share some of the data it’s gathered from its vehicles for free so other researchers working on autonomous driving can use it. Waymo isn’t the first to do this: Lyft, Argo AI, and other firms have already open-sourced some data sets. But Waymo’s move is notable because its vehicles have covered millions of miles on roads already. https://www.technologyreview.com/f/614211/waymo-is-going-to-share-its-self-driving-databut-its-still-not-enough/?utm_medium=tr_social&utm_campaign=site_visitor.unpaid.engagement&utm_source=Twitter#Echobox=1566491935

2019-08 20 Open Datasets for Natural Language Processing

https://noeliagorod.com/2019/08/19/20-open-datasets-for-natural-language-processing/amp/

2018-12 It all Boils Down to the Training Data

Is your model not performing well? Try digging into your data. Instead of getting marginal improvements in performance by searching for state-of-the-art models, drastically improve your model’s accuracy by improving the quality of your data. https://medium.com/labelbox/it-all-boils-down-to-the-training-data-393376f24e6a

2019-08 AI NEEDS YOUR DATA—AND YOU SHOULD GET PAID FOR IT

Robert Chang, a Stanford ophthalmologist, normally stays busy prescribing drops and performing eye surgery. But a few years ago, he decided to jump on a hot new trend in his field: artificial intelligence. Doctors like Chang often rely on eye imaging to track the development of conditions like glaucoma. With enough scans, he reasoned, he might find patterns that could help him better interpret test results. https://www.wired.com/story/ai-needs-data-you-should-get-paid/

2019-08 Dataset search tool

https://toolbox.google.com/datasetsearch

2019-08 Open archive of 240,000 hours' worth of talk radio, including 2.8 billion words of machine-transcription

A group of MIT Media Lab researchers have published Radiotalk, a massive corpus of talk radio audio with machine-generated transcriptions, with a total of 240,000 hours' worth of speech, marked up with machine-readable metadata.  The audio was scraped from streaming radio services between Oct 2018 and Mar 2019, and the transcripts run to 2.8 billion words. The researchers hope the corpus will be used by "researchers in the fields of natural language processing, conversational analysis, and the social sciences." https://boingboing.net/2019/08/01/pump-up-the-volume.html

2019-07 Transforming Skewed Data for Machine Learning

Skewed data is common in data science; skew is the degree of distortion from a normal distribution. For example, below is a plot of the house prices from Kaggle’s House Price Competition that is right skewed, meaning there are a minority of very large values. https://medium.com/@ODSC/transforming-skewed-data-for-machine-learning-90e6cc364b0
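As a sketch of the remedy such articles typically suggest, a log transform pulls a right-skewed feature toward symmetry. The data below are synthetic (lognormal, standing in for house prices); the figures are illustrative only.

```python
# Sketch: measure skewness before and after a log transform.
# Synthetic lognormal data stands in for right-skewed house prices.
import numpy as np

rng = np.random.default_rng(0)
prices = rng.lognormal(mean=12, sigma=0.8, size=10_000)  # right-skewed

def skewness(x: np.ndarray) -> float:
    """Sample skewness: the third standardized moment."""
    z = (x - x.mean()) / x.std()
    return float((z ** 3).mean())

before = skewness(prices)
after = skewness(np.log1p(prices))  # log1p also handles zeros gracefully

print(f"skew before: {before:.2f}, after: {after:.2f}")
```

The log of a lognormal variable is normal, so the transformed feature's skewness lands near zero; for left-skewed data, other transforms (square, exponential) play the analogous role.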

2019-07 Free Data Sets for Machine Learning

I am a big fan of learning through practical application. I have found that when studying machine learning, it can be really useful to obtain some publicly available data sets to apply the latest technique I have learnt to. Or you might want a really simple data set to benchmark a solution or compare… https://towardsdatascience.com/free-data-sets-for-machine-learning-73e74554cc21

2017-09 Dealing With Imbalanced Datasets

Dealing with imbalanced datasets is an everyday problem. SMOTE (the Synthetic Minority Oversampling TEchnique) and its variants are oversampling techniques for this problem that have recently become a very popular way to improve model performance. https://www.datasciencecentral.com/profiles/blogs/dealing-with-imbalanced-datasets
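The core interpolation idea behind SMOTE can be sketched in a few lines of NumPy. This is a toy illustration of the technique, not the full published algorithm or the imbalanced-learn API, and the data are random.

```python
# Toy SMOTE sketch: synthesize minority samples by interpolating between
# a minority point and one of its k nearest minority-class neighbours.
import numpy as np

def smote_oversample(X_min: np.ndarray, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Return n_new synthetic samples drawn within the minority class."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # distances from point i to every other minority point
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        d[i] = np.inf  # exclude the point itself
        neighbours = np.argsort(d)[:k]
        j = rng.choice(neighbours)
        gap = rng.random()  # position along the segment between i and j
        out.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(out)

rng = np.random.default_rng(1)
X_minority = rng.normal(size=(20, 2))           # 20 minority points in 2-D
X_synthetic = smote_oversample(X_minority, n_new=30)
print(X_synthetic.shape)  # (30, 2)
```

Because every synthetic point lies on a segment between two real minority points, the oversampled class stays inside the region the minority data already occupies, unlike naive duplication, which only repeats existing points.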

2019-07 Building Better Deep Learning Requires New Approaches Not Just Bigger Data

In its rush to solve all the world’s problems through deep learning, Silicon Valley is increasingly embracing the idea of AI as a universal solver that can be rapidly adapted to any problem in any domain simply by taking a stock algorithm and feeding it relevant training data. The problem with this assumption is that today’s deep learning systems are little more than correlative pattern extractors that search large datasets for basic patterns and encode them into software. While impressive compared to the standards of previous eras, these systems are still extraordinarily limited, capable only of identifying simplistic correlations rather than actually semantically understanding their problem domain. In turn, the hand-coded era’s focus on domain expertise, ethnographic codification and deeply understanding a problem domain has given way to parachute programming in which deep learning specialists take an off-the-shelf algorithm, shove in a pile of training data, dump out the resulting m...

2019-06 Deep learning Data Sets for Every Data Scientist

https://www.datasciencecentral.com/profiles/blogs/deep-learning-data-sets-for-every-data-scientist

2019-07 Imbalanced vs Balanced Dataset in Machine Learning

Balanced dataset: before giving a definition, here is an example. Suppose a dataset N contains 1,000 data points split between two classes, N1 and N2. N1 holds 580 data points and N2 holds 420; N1's points are positive (+ve) and N2's are negative (-ve). Since the class sizes are roughly similar, N1 ≈ N2, and N is a balanced dataset. https://medium.com/@suvhradipghosh/imbalanced-vs-balanced-dataset-in-machine-learning-4faec5629b7e
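The arithmetic in that example can be restated as a quick balance check. The 40% threshold below is a common rule of thumb, not something stated in the post.

```python
# Balance check for the worked example: N1 = 580, N2 = 420, N = 1000.
counts = {"N1": 580, "N2": 420}

n = sum(counts.values())
minority_share = min(counts.values()) / n
print(f"N = {n}, minority share = {minority_share:.2f}")

# Rule of thumb (an assumption, not from the post): call the dataset
# balanced when the minority class holds at least ~40% of the points.
is_balanced = minority_share >= 0.40
print("balanced" if is_balanced else "imbalanced")  # prints "balanced"
```

Swap in counts like {"N1": 950, "N2": 50} and the same check reports "imbalanced", which is the case the SMOTE-style techniques above are meant to address.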

2019-07 Should We Give Google Our Health Care Data?

Google is the latest company to stake its claim as king of health care technology. Streams, a tool to diagnose kidney disease, is being trialed by the UK’s National Health Service (NHS). Far more sophisticated tools are clearly in the pipeline. It recently unveiled "promising" artificial intelligence that can identify lung cancer a year before a doctor could. https://www.forbes.com/sites/forbestechcouncil/2019/07/01/should-we-give-google-our-health-care-data/#357989c3ce44

2019-06 Use Data Lakes to Bet on the Future of Artificial Intelligence

Artificial intelligence has moved far beyond the stuff of science fiction. And, for all the benefits AI provides today, we can only guess at what the future of artificial intelligence holds. To help ensure that they will be able to take advantage of any and all AI advancements, many companies are making use of data lakes. https://www.itprotoday.com/storage/use-data-lakes-bet-future-artificial-intelligence

Free Datasets

https://www.kdnuggets.com/2011/02/free-public-datasets.html

2019-06 5 reasons why data lakes are vital for startup analytics | CIO

Whereas data warehouses and data marts tend to force companies into narrow data paradigms and silos, data lakes emphasize a more holistic and expansive view of analytics. Data lakes deliver a more adaptive approach towards analyzing data, and stress the value of all information, instead of pre-screened bits and pieces. https://www.cio.com/article/3315660/5-reasons-why-data-lakes-are-vital-for-startup-analytics.html

2019 Quality Analysis in Data Mining Projects “Cruising the Data Ocean” Blog Series - Part 6 of 6

In my previous posts, I discussed how to identify, acquire, cleanse, and extract meaning from Internet content and use it to build your business applications. But how do you ensure that your system always returns the highest-quality results? This is where quality analysis plays an essential role in your web data mining project. https://www.searchtechnologies.com/blog/data-mining-quality-analysis

2019 Building Search, Analytics, and BI Applications with Data from the Internet “Cruising the Data Ocean” Blog Series - Part 5 of 6

In my previous posts, I provided the tools and techniques for selecting, extracting, cleansing, and understanding content from the Internet in order to support your business use case. In this blog, I'll discuss how to use the processed data for your own custom search, analytics, and business intelligence (BI) applications. https://www.searchtechnologies.com/blog/building-search-analytics-applications

2019 Cleansing and Formatting Content for Data Mining Projects "Cruising the Data Ocean" Blog Series - Part 3 of 6

In the first and second parts of this blog series, I discussed how to identify and acquire content from various Internet sources for your data mining needs. In this third blog, I'll provide an overview of some common techniques and tools for data cleansing and formatting. https://www.searchtechnologies.com/blog/data-cleansing-techniques-data-mining

2019 How to Acquire Content from the Internet for Data Mining "Cruising the Data Ocean" Blog Series - Part 2 of 6

In the first part of this blog series, I discussed how to identify the sources for your data mining needs. Once you've done that, you will need to fetch it and download it to your own computers so it can be processed. I'll cover this step here in the second part of the blog series. https://www.searchtechnologies.com/blog/web-content-extraction-data-mining

2019 Data Mining Tools and Techniques for Harvesting Data from the Internet “Cruising the Data Ocean” Blog Series - Part 1 of 6

Have you ever said that sentence? In my recent experience, this sentence is coming up more and more. After all, the Internet has so much incredible information; if only it could be downloaded and processed, just think of how valuable it could be. https://www.searchtechnologies.com/blog/web-data-mining-tools-techniques

2019-05-21 Dealing with the Lack of Data in Machine Learning

In many projects I carried out, companies, despite having fantastic AI business ideas, display a tendency to slowly become frustrated when they realize that they do not have enough data… However, solutions do exist! The purpose of this article is to briefly introduce you to some of them (the ones that are proven effective in… https://medium.com/@alexandregonfalonieri/dealing-with-the-lack-of-data-in-machine-learning-725f2abd2b92?source=email-ae8114b14513-1559097493099-digest.reader------0-59------------------8a078c29_af2e_4f47_ab31_da7f05e48097-1&sectionName=top

2019-06 5 Million Faces — Top 15 Free Image Datasets for Facial Recognition

https://lionbridge.ai/datasets/5-million-faces-top-15-free-image-datasets-for-facial-recognition/

2019-06 Deep Learning Predictions of Diabetic Retinopathy Associated with Progression of Renal Disease in Type 1 Diabetes

Kaggle competition http://diabetes.diabetesjournals.org/content/68/Supplement_1/546-P

2019-06-15 Top 8 Sources For Machine Learning and Analytics Datasets

Open datasets https://medium.com/datadriveninvestor/top-8-sources-for-machine-learning-and-analytics-datasets-5d2d94ada8ab