
Friday, May 12, 2023

China AI Emerges from iFlytek

 New to me, though not unexpected.  How regulated will this be?

Chinese AI Company iFlytek Launches Its Generative Language Model Called Spark Model

By Niharika Singh -May 11, 2023

iFlytek, a leading Chinese AI company, has launched its natural language processing (NLP) model, iFLY-GEN, to rival OpenAI’s popular GPT-3 language model. iFlytek claims that iFLY-GEN is more advanced and precise than GPT-3, boasting higher accuracy and better performance in tasks such as language translation and natural language understanding.

According to iFlytek, iFLY-GEN is based on a new NLP algorithm developed by the company’s research team. This algorithm combines deep learning, reinforcement learning, and unsupervised learning techniques to achieve more accurate and efficient language processing. The company says that iFLY-GEN has achieved state-of-the-art performance in several NLP benchmarks, including the GLUE benchmark and the SuperGLUE benchmark.

iFlytek’s iFLY-GEN has been trained on a massive amount of data, including over 2 billion Chinese language sentences, as well as English, Japanese, and Korean language data. The company says that this extensive training has allowed iFLY-GEN to better understand the nuances and complexities of human language, enabling it to produce more accurate and natural-sounding responses.

iFLY-GEN is already used in several applications, including customer service chatbots, voice assistants, and language translation services. iFlytek claims that iFLY-GEN’s superior performance makes it more effective in these applications than GPT-3, which has sometimes been criticized for producing nonsensical or offensive responses.

While iFlytek’s iFLY-GEN may be a strong competitor to GPT-3, it remains to be seen whether it can gain the same widespread adoption and recognition. OpenAI’s GPT-3 has become a dominant force in the NLP field, with many companies and developers using it to power their language-based applications. GPT-3’s popularity is partly due to its accessibility, as it can be accessed through OpenAI’s API, allowing developers to integrate it into their projects efficiently.

iFlytek, on the other hand, is primarily known in China and needs more visibility and recognition in other parts of the world. Additionally, iFlytek’s business model differs from OpenAI’s, as iFlytek primarily focuses on selling its language technology to businesses and government agencies rather than offering an API for developers.

Despite these challenges, iFlytek’s iFLY-GEN has the potential to shake up the NLP market, particularly in China, where iFlytek has a strong presence. China is one of the largest and fastest-growing markets for language technology, and iFlytek’s advanced NLP capabilities could give it a competitive edge in this market.

In conclusion, iFlytek’s iFLY-GEN represents a significant new entrant in the NLP market, and its advanced capabilities could make it a formidable competitor to OpenAI’s GPT-3. While it remains to be seen how widely adopted iFLY-GEN will be outside of China, it has the potential to disrupt the NLP market and further accelerate the development of advanced language technology.

Check out the PR release.  ... ' 

Monday, March 20, 2023

Cohere.ai Pre trained Models and Tools for NLP Tasks

Brought to my attention.  Useful when combined with GPT models?  Experiences?

Harness the power of text

 Cohere.ai


See their blog for examples, uses:  https://txt.cohere.ai/

Automatically generate writing according to any criteria.

Generate

Meet your AI-generated content writer

Generate is powered by a large language model that has read billions of words, learning the patterns and idiosyncrasies of sentences. Using this knowledge, it writes content, predicts outcomes or answers questions at your command.

Reading billions of words, to write the ones you need.

The Generate API is trained on vast amounts of text spanning all topics and industries. With Generate, you ‘instruct’ the model with your specific text generation ask. This could be a copywriting task, named entity recognition, or even paraphrasing or summarization.
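The "instruct the model" idea above amounts to wrapping your task and input text in a prompt. A minimal sketch (my illustration, not Cohere's code; the SDK call in the comments is approximate, check their docs for the real interface):

```python
def generate_request(task, text):
    """Compose an instruction-style prompt for a generation API
    (illustrative wording; see Cohere's docs for the real interface)."""
    return f"{task}:\n\n{text}\n\nOutput:"

prompt = generate_request(
    "Summarize the following product review in one sentence",
    "The headphones sound great but the battery barely lasts two hours.",
)

# With Cohere's Python SDK this prompt would be sent roughly like:
#   import cohere
#   co = cohere.Client("YOUR_API_KEY")
#   response = co.generate(prompt=prompt)   # exact parameters may differ
print(prompt)
```

The same prompt pattern covers copywriting, paraphrasing, or summarization; only the `task` string changes.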

Get Started  ....

Wednesday, January 18, 2023

GPT as a Corporate Lobbyist

 A considerable, interesting piece.  Now, how well can we detect this?  Also covered by Schneier with much further analysis and comment:   

Large Language Models as Corporate Lobbyists

9 Pages Posted: 4 Jan 2023 Last revised: 16 Jan 2023   By John Nay

Stanford University - CodeX - Center for Legal Informatics; New York University (NYU); Brooklyn Artificial Intelligence Research; Brooklyn Investment Group (BKLN.com)

Date Written: January 2, 2023

Abstract

We demonstrate a proof-of-concept of a large language model conducting corporate lobbying related activities. An autoregressive large language model (OpenAI’s text-davinci-003) determines if proposed U.S. Congressional bills are relevant to specific public companies and provides explanations and confidence levels. For the bills the model deems as relevant, the model drafts a letter to the sponsor of the bill in an attempt to persuade the congressperson to make changes to the proposed legislation. We use hundreds of novel ground-truth labels of the relevance of a bill to a company to benchmark the performance of the model, which outperforms the baseline of predicting the most common outcome of irrelevance. We also benchmark the performance of the previous OpenAI GPT-3 model (text-davinci-002), which was the state-of-the-art model on many academic natural language tasks until text-davinci-003 was recently released. The performance of text-davinci-002 is worse than a simple benchmark. These results suggest that, as large language models continue to exhibit improved natural language understanding capabilities, performance on corporate lobbying related tasks will continue to improve. Longer-term, if AI begins to influence law in a manner that is not a direct extension of human intentions, this threatens the critical role that law as information could play in aligning AI with humans. This Essay explores how this is increasingly a possibility. Initially, AI is being used to simply augment human lobbyists for a small proportion of their daily tasks. However, firms have an incentive to use less and less human oversight over automated assessments of policy ideas and the written communication to regulatory agencies and Congressional staffers. The core question raised is where to draw the line between human-driven and AI-driven policy influence.
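The paper's first step, asking the model whether a bill is relevant to a company, reduces to prompt construction. A hypothetical sketch of what such a prompt might look like (my wording, not the authors' actual prompt; the completion call to text-davinci-003 is omitted):

```python
def relevance_prompt(bill_title, bill_summary, company, business_description):
    """Build a prompt asking an LLM whether a bill is relevant to a company,
    in the spirit of the paper's setup (hypothetical wording, not the
    authors' actual prompt)."""
    return (
        f"Company: {company}\n"
        f"Business: {business_description}\n"
        f"Bill: {bill_title}\n"
        f"Summary: {bill_summary}\n"
        "Is this bill relevant to the company? Answer YES or NO, then give a\n"
        "one-sentence explanation and a confidence from 0 to 100."
    )

prompt = relevance_prompt(
    "Clean Water Act Amendments",
    "Tightens discharge limits for industrial facilities.",
    "Acme Chemicals",
    "Manufactures industrial solvents.",
)
# The prompt would then be sent to a completion endpoint such as
# OpenAI's text-davinci-003; the API call itself is omitted here.
print(prompt)
```

Bills the model deems relevant would feed a second prompt that drafts the persuasion letter, which is what makes the benchmark's ground-truth relevance labels the critical measurement.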

Keywords: Artificial Intelligence, AI, Machine Learning, Natural Language Processing, NLP, Self-Supervised Learning, Large Language Models, GPT, Foundation Models, AI Safety, AI Alignment, AI & Law, AI Policy, Computational Legal Studies, Computational Law, Law-Making, Public Policy, Policy-Making, Lobbying

JEL Classification: C45, C55, K49, O30  ... 

Friday, December 16, 2022

Playing with Chat GPT: A Future of Delivering Knowledge?

The end of journalism, or the ability to tell the difference?    Continuing to improve.   But will some of us still be needed to point out the key details?  Experimenting there too... 

ChatGPT shrugged

Natasha Lomas@riptari / 6:44 PM EST•December 5, 2022

ChatGPT is a new artificial intelligence (AI) tool that’s designed to help people communicate with computers in a more natural and intuitive way — using natural language processing (NLP) technology. But what’s behind the development of ChatGPT and how can the tech be used to help humanity? These are some of the questions that we will explore in this article…

That rather bland lede, which we’ve slightly edited, was generated by OpenAI’s ChatGPT in response to a prompt by this (human) reporter. TechCrunch kicked off a conversation with the large language model by asking it to explain its function and purpose. We wanted to see if we could use the chatbot-style Q&A format it’s most recently been embedded into to probe the tech and get AI to articulate some of its limitations.

“ChatGPT is a natural language processing (NLP) model developed by OpenAI. It is designed to generate human-like responses to text input, allowing users to engage in natural, conversational interactions with the model,” the AI responded to our ice breaker. “ChatGPT works by using machine learning algorithms to analyze and understand the meaning of text input, and then generating a response based on that input. The model is trained on large amounts of text data, allowing it to learn the patterns and structures of natural language.”

Additionally, the bot told us its approach allows it to generate responses that are “relevant, coherent, and appropriate” to the context of the conversation. (With more probing, it went on to heavily caveat that claim — writing later: “It is important to note that my responses are not guaranteed to be perfect or always correct, and there may be cases where my responses may not be relevant, coherent, or appropriate.”)

ChatGPT further described its mission as being “intended to help humanity by providing a more natural and efficient way to interact with artificial intelligence” — expanding on its claimed purpose thusly:

By allowing users to communicate with AI in a more human-like way, ChatGPT can help to make AI more accessible and useful for a wider range of applications. This can help to improve the efficiency and accuracy of tasks that rely on natural language processing, such as language translation, text summarization, and information retrieval. Additionally, by providing a more intuitive and human-like interface for AI, ChatGPT can help to make AI more approachable and less intimidating to users who are less familiar with technology. Overall, ChatGPT is intended to help humanity by improving the usability and effectiveness of AI.  ... ' 

Thursday, September 22, 2022

Linguistics and the Development of NLP


ACM OPINION

Linguistics and the Development of NLP   By The Gradient

September 9, 2022

Christopher Manning is the director of the Stanford University AI Lab and an associate director of the Stanford Human-Centered Artificial Intelligence Institute.

In this podcast, Chris Manning, an ACM Fellow, AAAI Fellow, and past president of ACL, discusses a number of topics, including tree recursive neural networks, GloVe, neural machine translation, computational linguistic approaches to parsing, and his current work, which is focused on applying deep learning to natural language processing.

Full article

Friday, August 26, 2022

Data Platform for Chatbot Development

Just reviewed this; some good thoughts, data prep in particular. 

A Data Platform for Chatbot Development

Alex Woodie

One of the most compelling use cases for AI at the moment is developing chatbots and conversational agents. While the AI part of the equation works reasonably well, getting the training data organized to build and train accurate chatbots has emerged as the bottleneck for wider adoption. That’s what drove the folks at Dashbot to develop a data platform specifically for chatbot creation and optimization.

Recent advances in natural language processing (NLP) and transfer learning have helped to lower the technical bar to building chatbots and conversational agents. Instead of creating a whole NLP system from scratch, users can borrow a pre-trained deep learning model and customize just a few layers. When you combine this democratization of NLP tech with the workplace disruptions of COVID, we have a situation where chatbots appear to have sprung up everywhere almost overnight.

Andrew Hong also saw this sudden surge in chatbot creation and usage while working at a venture capital firm a few years ago. With the chatbot market expanding at a 24% CAGR (according to one forecast), it’s a potentially lucrative place for a technology investor, and Hong wanted to be in on it.

“I was looking to invest in this space. Everybody was investing in chatbots,” Hong told Datanami recently. “But then it kind of occurred to me there’s actually a data problem here. That’s when I poked deeper and saw this problem.”  The problem (as you may have guessed) is that conversational data is a mess. According to Hong, organizations are devoting extensive data science and data engineering resources to prepare large amounts of raw chat transcripts and other conversational data so it can be used to train chatbots and agents.

The problem boils down to this: Without a lot of manual work to prep, organize, and analyze massive amounts of text data used for training, the chatbots and agents don’t work very well. Keeping the bots running efficiently also requires ongoing optimization, which Hong’s company, Dashbot, helps to automate.

“A lot of this is literally hieroglyphics,” Hong said of call transcripts, emails, and other text that’s used to train chatbots. “Raw conversational data is undecipherable. It’s like a giant file with billions of lines of just words. You really can’t even ask it a question.”

While a good chatbot seems to work effortlessly, there’s a lot of work going on behind the scenes to get there. For starters, raw text files that serve as the training data must be cleansed, prepped, and labeled. Sentences must be strung together, and questions and answers in a conversation grouped. As part of this process, the data is typically extracted from a data lake and loaded into a repository where it can be queried and analyzed, such as a relational database.

Next, there’s data science work involved. On the first pass, a machine learning algorithm might help to identify clusters in the text files. That might be followed by topic modeling to narrow down the topics that people are discussing. Sentiment analysis may be performed to help identify the topics that are associated with the highest frustration of users.

Finally, the training data is segmented by intents. Once an intent is associated with a particular piece of training data, then it can be used by an NLP system to train a chatbot to answer a particular question. A chatbot may be programmed to recognize and respond to 100 or more individual intents, and its performance on each of these varies with the quality of the training data.

Dashbot was founded in 2016 to automate as many of these steps as possible, and to help make the data preparation as turnkey as possible before handing the training data over to NLP chatbot vendors like Amazon Lex, IBM Watson, and Google Cloud Dialogflow.

“I think a tool like this needs to exist beyond chatbots,” said Hong, who joined Dashbot as its CEO in 2020. “How do you turn unstructured data into something usable? I think this ETL pipeline we built is going to help do that.”

Chatbot Data Prep

Instead of requiring data engineers and data scientists to spend days working with huge numbers of text files, Hong developed Dashbot’s offering, dubbed Conversational Data Cloud, to automate many of the steps required to turn raw text into the refined JSON document that the major NLP vendors expect.

“A lot of enterprises have call center transcripts just piling up in their Amazon data lakes. We can tap into that, transform that in a few seconds,” Hong said. “We can integrate with any conversational channel. It can be your call centers, chat bots, voice agents. You can even upload raw conversational files sitting on a data lake.”

The Dashbot product is broken up into three parts, including a data playground used for ETL and data cleansing; a reporting module, where the user can run analytics on the data; and an optimization layer.

The data prep occurs in the data playground, Hong said, while the analytics layer is useful for asking questions of the data that can help illuminate problems, such as: “In the last seven days how many people have called in and asked about this new product line that we just launched and how many people are frustrated by it?”  ... ' 


Thursday, November 11, 2021

IBM Adds NLP to Watson Discovery

 In our early looks at Watson, precisely what we wanted to add to Discovery to make it useful, for better links directly to business decision makers.   This could be key for novel uses.  Show me some examples.

IBM to Add New Natural Language Processing Enhancements to Watson Discovery  in PRNewswire.

New planned features are designed to help business users quickly start applying AI to find more precise document insights with less training time and data science skills

Businesses in financial services, insurance and legal services turn to Watson Discovery to help automate processes and enhance customer care

NEWS PROVIDED BY IBM

ARMONK, N.Y., Nov. 10, 2021 /PRNewswire/ -- IBM (NYSE: IBM) today announced new natural language processing (NLP) enhancements planned for IBM Watson Discovery. These planned updates are designed to help business users in industries such as financial services, insurance and legal services enhance customer care and accelerate business processes by uncovering insights and synthesizing information from complex documents.

Businesses are increasingly turning to NLP and machine learning to help them comb through rising volumes of documents and data sets in a wide range of formats. By applying AI to get document insights, business users can reduce research time and help their employees make more fact-driven decisions during complex, time sensitive tasks such as processing insurance claims, conducting financial analyses and reviewing legal agreements or contracts.

The new planned features that IBM announced today are designed to make it easier for Watson Discovery users to quickly customize the underlying NLP models on the unique language of their business. Stemming from NLP advancements developed by IBM Research, business users can train Watson Discovery to help read, understand and surface more precise insights from large sets of complex, industry-specific documents even if they don't have significant data science skills.

Pre-trained document structure understanding: Watson Discovery's Smart Document Understanding feature, available now in the Plus, Enterprise and Premium plans, includes a new pre-trained model that is designed to automatically understand the visual structure and layout of a document without additional training from a developer or data scientist. This helps users quickly find answers that were previously hidden or difficult to find like text in complex table structures or images.

Automatic text pattern detection: IBM has released a new advanced pattern creation feature in beta in the Plus, Premium and Enterprise plans that is designed to help users quickly identify business-specific text patterns within their documents. This is key for tasks like analyzing massive amounts of contracts or financial reports, which may report the same type of information, such as an increase or decrease in revenue, in different formats or using different phrases. Developed by IBM Research, it helps provide efficient ways of labeling data and training models. It is designed to start learning the underlying text patterns from as few as two examples and then refines the pattern based on user feedback. This helps users more rapidly train a model without manual and time-intensive tasks like defining rules and expressions.

Advanced NLP customization capabilities: Training NLP models to identify highly customized, business-specific words and phrases – for example insurance claim forms may include specific claim reasons or affected products – is a time-consuming task that requires significant data prep, labeling, and orchestration. Models trained on generic data sets often fail to retrieve the right information. With a new custom entity extractor feature, now available in beta for Watson Discovery Premium users, IBM is simplifying this process by reducing the effort for data prep, simplifying labeling with active learning and bulk annotation capabilities, and enabling simple model deployment that can accelerate training time.

The planned updates announced today are part of a pipeline of developments stemming from IBM Research. For example, answer finding was recently made generally available in Watson Discovery and Watson Assistant's Search Skill. It is designed to help busy professionals and customers identify the precise insights they need.  .... ' 

Tuesday, June 22, 2021

China Claims to Exceed GPT-3 Language

Continued push on more powerful language models.

China outstrips GPT-3 with even more ambitious AI language model

By Anthony Spadafora   in TechRadar,  First Published 2 weeks ago

WuDao 2.0 model was trained using 1.75tn parameters

A Chinese AI institute has unveiled a new natural language processing (NLP) model that is even more sophisticated than those created by both Google and OpenAI.

The WuDao 2.0 model was created by the Beijing Academy of Artificial Intelligence (BAAI) and developed with the help of over 100 scientists from multiple organizations. What makes this pre-trained AI model so special is the fact that it uses 1.75tn parameters to simulate conversations, understand pictures, write poems and even create recipes.

Parameters are variables that are defined by machine learning models and as these models evolve, the parameters themselves also improve to allow an algorithm to get better at finding the correct outcome over time. Once a model has been trained on a specific data set like human speech samples, the outcome can then be applied to solving other similar problems.  ... " 
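For a sense of what the 1.75tn figure counts: in a plain fully connected network, the parameters are simply the learned weights and biases between layers. A toy calculation (my illustration; WuDao 2.0 is a far larger transformer, not a dense net like this):

```python
def dense_params(layer_sizes):
    """Parameters of a fully connected network:
    weights (n_in * n_out) plus biases (n_out) per layer."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# a tiny 3-layer net: 784 inputs -> 128 hidden -> 10 outputs
print(dense_params([784, 128, 10]))  # 101770
```

Scaling the same bookkeeping to 1.75 trillion learned values is what makes training such models an infrastructure problem as much as an algorithmic one.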

Thursday, February 25, 2021

Example of Question Answering Application: Jarvis

Question Answering Applications 

Developing a Question Answering Application with NVIDIA Jarvis   By James Sohn | February 25, 2021

There is a high chance that you have asked your smart speaker a question like, “How tall is Mount Everest?” If you did, it probably said, “Mount Everest is 29,032 feet above sea level.” Have you ever wondered how it found an answer for you?

Question answering (QA) is loosely defined as a system combining information retrieval (IR) and natural language processing (NLP), concerned with answering questions posed by humans in a natural language. If you are not familiar with information retrieval, it is a technique for obtaining information relevant to a query from a pool of resources, such as webpages or documents in a database. The easiest way to understand the concept is the search engine that you use daily. 

You then need an NLP system to find an answer within the IR system that is relevant to the query. Although I just listed what you need for building a QA system, it is not a trivial task to build IR and NLP from scratch. Here’s how NVIDIA Jarvis makes it easy to develop a QA system.
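The IR half of that decomposition can be illustrated with a toy term-overlap retriever (this is just a sketch to fix the idea, not Jarvis; a real system uses proper indexing and learned rankers):

```python
import re

def tokens(text):
    """Lowercase word tokens for crude matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, passages):
    """IR stage: pick the passage sharing the most terms with the query."""
    return max(passages, key=lambda p: len(tokens(query) & tokens(p)))

passages = [
    "Mount Everest is 29,032 feet above sea level.",
    "The Nile is the longest river in Africa.",
]
print(retrieve("How tall is Mount Everest?", passages))
```

The NLP stage would then read the retrieved passage and extract the answer span ("29,032 feet"), which is the part Jarvis's pretrained QA models handle.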

Jarvis overview

NVIDIA Jarvis is a fully accelerated application framework for building multimodal conversational AI services that use an end-to-end deep learning pipeline. The Jarvis framework includes optimized services for speech, vision, and natural language understanding (NLU) tasks. In addition to providing several pretrained models for the entire pipeline of your conversational AI service, Jarvis is also architected for deployment at scale. In this post, I look closely into the QA function of Jarvis and how you can create your own QA application with it.  ... " 

Sunday, January 31, 2021

What's a Transformer?

A major advance in NLP capabilities?  VentureBeat does a good job of explaining, and suggests this will be one of the major AI advances of the year.  Essentially, as I see it, a library and model that supports better conversation between people and machines.  Open-sourced.  Worth a deeper look. 

Microsoft trains world’s largest Transformer language model  By Khari Johnson  @kharijohnson  in Venturebeat

Microsoft AI & Research today shared what it calls the largest Transformer-based language generation model ever and open-sourced a deep learning library named DeepSpeed to make distributed training of large models easier.

At 17 billion parameters, Turing NLG is twice the size of Nvidia’s Megatron, now the second biggest Transformer model, and includes 10 times as many parameters as OpenAI’s GPT-2. Turing NLG achieves state-of-the-art results on a range of NLP tasks.

Like Google’s Meena and initially with GPT-2, at first Turing NLG may only be shared in private demos.

Language generation models with the Transformer architecture predict the word that comes next. They can be used to write stories, generate answers in complete sentences, and summarize text.

Experts from across the AI field told VentureBeat 2019 was a seminal year for NLP models using the Transformer architecture, an approach that led to advances in language generation and GLUE benchmark leaders like Facebook’s RoBERTa, Google’s XLNet, and Microsoft’s MT-DNN.

Also today: Microsoft open-sourced DeepSpeed, a deep learning library that’s optimized for developers to deliver low latency, high throughput inference.

DeepSpeed contains the Zero Redundancy Optimizer (ZeRO) for training models with 100 million parameters or more at scale, which Microsoft used to train Turing NLG.

“Beyond saving our users time by summarizing documents and emails, T-NLG can enhance experiences with the Microsoft Office suite by offering writing assistance to authors and answering questions that readers may ask about a document,” Microsoft AI Research applied scientist Corby Rosset wrote in a blog post today.  ... " 

Saturday, December 07, 2019

History of Natural Language Processing

Interesting historical piece with hints about how things evolve.

Andrey Markov & Claude Shannon Counted Letters to Build the First Language-Generation Models

By Oscar Schwartz

This is part three of a six-part series on the history of natural language processing.
In 1913, the Russian mathematician Andrey Andreyevich Markov sat down in his study in St. Petersburg with a copy of Alexander Pushkin’s 19th century verse novel, Eugene Onegin, a literary classic at the time. Markov, however, did not start reading Pushkin’s famous text. Rather, he took a pen and piece of drafting paper, and wrote out the first 20,000 letters of the book in one long string of letters, eliminating all punctuation and spaces. Then he arranged these letters in 200 grids (10-by-10 characters each) and began counting the vowels in every row and column, tallying the results..... "

Sunday, October 20, 2019

AI for Reading Understanding

Been reading and exploring about what reading understanding means.  Here's an update from Quanta Magazine on the topic.   We still have far to go when we have to deal with changing context, common sense, and even inferring things like implications of cause and effect.   We did lots of work with 'sentiment analysis' long ago, and it's much easier to do now, with lots of easy-to-plug-in capabilities, but the result is still statistically weak.  Shows how difficult building a semi-general-purpose chatbot is.    We discovered that during several efforts.     Good read here at the link:

Machines Beat Humans on a Reading Test. But Do They Understand?
A tool known as BERT can now beat humans on advanced reading-comprehension tests. But it's also revealed how far AI has to go.

In the fall of 2017, Sam Bowman, a computational linguist at New York University, figured that computers still weren’t very good at understanding the written word. Sure, they had become decent at simulating that understanding in certain narrow domains, like automatic translation or sentiment analysis (for example, determining if a sentence sounds “mean or nice,” he said). But Bowman wanted measurable evidence of the genuine article: bona fide, human-style reading comprehension in English. So he came up with a test.

In an April 2018 paper coauthored with collaborators from the University of Washington and DeepMind, the Google-owned artificial intelligence company, Bowman introduced a battery of nine reading-comprehension tasks for computers called GLUE (General Language Understanding Evaluation). The test was designed as “a fairly representative sample of what the research community thought were interesting challenges,” said Bowman, but also “pretty straightforward for humans.” For example, one task asks whether a sentence is true based on information offered in a preceding sentence. If you can tell that “President Trump landed in Iraq for the start of a seven-day visit” implies that “President Trump is on an overseas visit,” you’ve just passed.
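A trivial baseline makes clear why the task is harder than it looks. A toy word-overlap "entailment" check (my illustration, not from the paper) calls it entailment when most hypothesis words appear in the premise, exactly the shallow cue GLUE is designed to expose:

```python
import re

def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def overlap_entails(premise, hypothesis, threshold=0.7):
    """Toy baseline: declare entailment when most hypothesis words
    appear in the premise."""
    h = tokens(hypothesis)
    return len(h & tokens(premise)) / len(h) >= threshold

premise = "President Trump landed in Iraq for the start of a seven-day visit"
hypothesis = "President Trump is on an overseas visit"
print(overlap_entails(premise, hypothesis))  # False - the shallow cue
# misses this true entailment: "overseas" never appears in the premise
```

Recognizing that "landed in Iraq" implies "overseas" requires world knowledge, not word matching, which is the gap the benchmark probes.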

The machines bombed. Even state-of-the-art neural networks scored no higher than 69 out of 100 across all nine tasks: a D-plus, in letter grade terms. Bowman and his coauthors weren’t surprised. Neural networks — layers of computational connections built in a crude approximation of how neurons communicate within mammalian brains — had shown promise in the field of “natural language processing” (NLP), but the researchers weren’t convinced that these systems were learning anything substantial about language itself. And GLUE seemed to prove it. “These early results indicate that solving GLUE is beyond the capabilities of current models and methods,” Bowman and his coauthors wrote.   .... " 

Saturday, October 19, 2019

Attention for Advanced Forecasting and Classification

Interesting and quite technical view of forecasting and classification that is worth a look.  Of course accurate and timely forecasting is important for most businesses.  A considerable piece; below is an intro, with much more at the link.  I have never seen it done accurately enough with these kinds of methods.

Attention for time series forecasting and classification
Harnessing the most recent advances in NLP for time series forecasting and classification  By Isaac Godfried

Transformers (specifically self-attention) have powered significant recent progress in NLP. They have enabled models like BERT, GPT-2, and XLNet to form powerful language models that can be used to generate text, translate text, answer questions, classify documents, summarize text, and much more. With their recent success in NLP one would expect widespread adaptation to problems like time series forecasting and classification. After all, both involve processing sequential data. However, to this point research on their adaptation to time series problems has remained limited. Moreover, while some results are promising, others remain more mixed. In this article, I will review current literature on applying transformers as well as attention more broadly to time series problems, discuss the current barriers/limitations, and brainstorm possible solutions to (hopefully) enable these models to achieve the same level success as in NLP. This article will assume that you have a basic understanding of soft-attention, self-attention, and transformer architecture. If you don’t please read one of the linked articles. You can also watch my video from the PyData Orono presentation night.
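The self-attention the article builds on is compact enough to write out directly. A minimal scaled dot-product attention over small vectors (a stdlib sketch for intuition; real implementations are batched tensor ops):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: each query scores every key,
    and the output is the score-weighted mix of the values."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# one query attending over two timesteps
q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
print(attention(q, k, v))  # a weighted mix of the two value vectors
```

For time series, the "keys" are past timesteps, so the weights show directly which history the forecast attends to, one reason attention is attractive despite the mixed results the article discusses.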

Attention for time series data: Review

The need to accurately forecast and classify time series data spans across just about every industry and long predates machine learning. For instance, in hospitals you may want to triage patients with the highest mortality early-on and forecast patient length of stay; in retail you may want to predict demand and forecast sales; utility companies want to forecast power usage, etc. .... " 

Monday, September 09, 2019

Voice Applications, Why and How

Good, non-device-specific look at voice applications.   And an overview of what people are doing, why, and where to start. And it's really not just about AI; think assistance in context.

Got speech? These guidelines will help you get started building voice applications
Speech adds another level of complexity to AI applications—today’s voice applications provide a very early glimpse of what is to come.     By Ben Lorica, Yishay Carmiel  in O'Reilly Media .... 

Monday, February 11, 2019

BERT for Natural Language Understanding

Was just introduced to this again; worth a look:

BERT Technology introduced in 3-minutes   By Suleiman Khan, Ph.D.  in Medium

Google BERT is a pre-training method for natural language understanding that performs various NLP tasks better than ever before.

BERT works in two steps. First, it uses a large amount of unlabeled data to learn a language representation in an unsupervised fashion, called pre-training. Then, the pre-trained model can be fine-tuned in a supervised fashion using a small amount of labeled training data to perform various supervised tasks. Pre-training machine learning models has already seen success in various domains including image processing and natural language processing (NLP). .... " 
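The pre-training step can be made concrete with a toy sketch of BERT's masked-language-model objective: hide a fraction of the tokens and ask the model to recover them from bidirectional context. The function, sentence, and seed below are my own illustration; only the roughly 15% masking rate comes from the BERT paper.

```python
import random

def mask_tokens(tokens, mask_token="[MASK]", mask_prob=0.15, seed=1):
    """Illustrative BERT-style masking for the pre-training step.

    Randomly replaces ~15% of tokens with [MASK]; during pre-training
    the model learns to predict the original tokens at those positions.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            labels.append(tok)        # the model must recover this token
        else:
            masked.append(tok)
            labels.append(None)       # no loss on unmasked positions
    return masked, labels

sentence = "the quick brown fox jumps over the lazy dog".split()
masked, labels = mask_tokens(sentence)
```

Fine-tuning then replaces this objective with a small supervised head on top of the same pre-trained representation, which is why so little labeled data is needed.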

Tuesday, December 11, 2018

Finishing Your Sentences with Common Sense

Yes, it's one way to tailor common-sense-driven sentences.   Here are some snippets of work underway in the space, starting with a NYT article, then links to technical details.  The overall challenge for common sense natural language understanding research at this level is well described, but the solutions are technical:

Finally, a Machine That Can Finish Your Sentence

Completing someone else’s thought is not an easy trick for A.I. But new systems are starting to crack the code of natural language.    By Cade Metz  in the NYT.   ... "

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.  ..."

And of course, a data challenge for this problem, with early success results:

A Large-Scale Adversarial Dataset for Grounded Commonsense Inference

Rowan Zellers, Yonatan Bisk, Roy Schwartz, Yejin Choi
Paul G. Allen School of Computer Science & Engineering, University of Washington
Allen Institute for Artificial Intelligence

Further description of the data challenge in Swag:

Given a partial description like “she opened the hood of the car,” humans can reason about the situation and anticipate what might come next (“then, she examined the engine”). In this paper, we introduce the task of grounded commonsense inference, unifying natural language inference and commonsense reasoning.

We present Swag, a new dataset with 113k multiple choice questions about a rich spectrum of grounded situations. To address the recurring challenges of the annotation artifacts and human biases found in many existing datasets, we propose Adversarial Filtering (AF), a novel procedure that constructs a de-biased dataset by iteratively training an ensemble of stylistic classifiers, and using them to filter the data. To account for the aggressive adversarial filtering, we use state-of-the-art language models to massively oversample a diverse set of potential counterfactuals.
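As a rough illustration of the AF idea, and emphatically not the authors' implementation, one can stand in for the ensemble of trained stylistic classifiers with a single hand-rolled statistic (word count) and keep only the machine-written distractors that statistic finds hardest to separate from real endings:

```python
def adversarial_filter(real_endings, fake_endings, n_keep):
    """Minimal sketch of Adversarial Filtering (AF).

    The paper iteratively trains an ensemble of stylistic classifiers;
    here a single length statistic learned from the real endings stands
    in for that ensemble. Fakes that are easiest to tell apart are
    dropped, keeping only the hardest, most human-like distractors.
    """
    # "Train": the real endings define a target statistic (mean word count).
    target = sum(len(r.split()) for r in real_endings) / len(real_endings)
    # "Filter": rank fakes by how hard they are to distinguish (closer = harder).
    ranked = sorted(fake_endings, key=lambda s: abs(len(s.split()) - target))
    return ranked[:n_keep]

real = ["she examined the engine", "he closed the hood"]
fakes = [
    "a",
    "she looked under it",
    "this ending is extremely long and rambling on purpose",
    "he checked the oil",
]
kept = adversarial_filter(real, fakes, n_keep=2)
```

The very short and very long fakes are filtered out, leaving the two distractors that look stylistically like the real endings, which is the de-biasing effect the paper is after.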

Empirical results demonstrate that while humans can solve the resulting inference problems with high accuracy (88%), various competitive models struggle on our task. We provide comprehensive analysis that indicates significant opportunities for future research.  ... "

Wednesday, May 02, 2018

Sequencing Problems for Natural Language Processing

Good piece by William Vorhies

Temporal Convolutional Nets (TCNs) Take Over from RNNs for NLP Predictions   Posted by William Vorhies in DSC

Summary: Our starting assumption that sequence problems (language, speech, and others) are the natural domain of RNNs is being challenged.  Temporal Convolutional Nets (TCNs) which are our workhorse CNNs with a few new features are outperforming RNNs on major applications today.  Looks like RNNs may well be history.

It’s only been since 2014 or 2015 that our DNN-powered applications passed the 95% accuracy point on text and speech recognition, allowing for whole generations of chatbots, personal assistants, and instant translators.

Convolutional Neural Nets (CNNs) are the acknowledged workhorse of image and video recognition while Recurrent Neural Nets (RNNs) became the same for all things language.

One of the key differences is that CNNs can recognize features in static images (or video when considered one frame at a time) while RNNs excelled at text and speech which were recognized as sequence or time-dependent problems.  That is where the next predicted character or word or phrase depends on those that came before (left-to-right) introducing the concept of time and therefore sequence.

Actually, RNNs are good at all types of sequence problems, including speech/text recognition, language-to-language translation, handwriting recognition, sequence data analysis (forecasting), and even automatic code generation in many different configurations. .... " 
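The "few new features" that let a CNN handle sequence problems boil down to causal convolution: the output at time t may look only at the current and earlier time steps, never the future. A minimal NumPy sketch of that building block (my own illustration, not Vorhies' code):

```python
import numpy as np

def causal_conv1d(x, kernel, dilation=1):
    """1-D causal convolution, the basic building block of a TCN.

    Output at time t depends only on x[t], x[t-d], x[t-2d], ... --
    never on future steps, preserving the left-to-right dependence
    that made RNNs the default for sequence problems.
    """
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])  # left-pad only: no access to the future
    out = np.zeros_like(x, dtype=float)
    for t in range(len(x)):
        for i in range(k):
            out[t] += kernel[i] * xp[pad + t - i * dilation]
    return out

x = np.arange(6, dtype=float)               # [0, 1, 2, 3, 4, 5]
y = causal_conv1d(x, kernel=[0.5, 0.5])     # average of current and previous step
```

Stacking such layers with growing dilation gives the long effective history that lets TCNs compete with RNNs, while every time step is computed in parallel rather than sequentially.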

Saturday, October 07, 2017

Deep Learning Trends


Four deep learning trends from ACL 2017  by Abigail See

Part One: Linguistic Structure and Word Embeddings

Introduction

“NLP is booming”, declared Joakim Nivre at the presidential address of ACL 2017, which I attended in Vancouver earlier this month. As evidenced by the throngs of attendees, interest in NLP is at an all-time high – an increase that is chiefly due to the successes of the deep learning renaissance, which recently swept like a tidal wave over the field.

Beneath the optimism however, I noticed a tangible anxiety at ACL, as one field adjusts to its rapid transformation by another. Researchers asked whether there is anything of the old NLP left – or was it all swept away by the tidal wave? Are neural networks the only technique we need any more? How do we do good science now that experiments are so empirical, papers are immediately on arXiv, and access to GPUs can determine success?

[Slide: "I don't have money for GPUs! Is NLP dead? And language? I really like my features!"] Mirella Lapata expresses the community's concerns in her keynote.

Though these difficult questions were at the forefront of the conference (the presidential address even alluded to a recent high-profile debate on the subject), the overall mood was positive nonetheless. At ACL 2017, the NLP community continued to enthusiastically embrace deep learning, though with a healthy skepticism. As researchers are starting to reach a clearer view of what works and what doesn’t with current neural methods, there is a growing trend to consult older NLP wisdom to guide and improve those methods. In this post I take a look at what’s happening at this pivotal time for NLP research.  .... "

Wednesday, September 21, 2016

The State of Natural Language

Long ago I took a course on language theory.   I have since remained intrigued by how the evolution of the field ultimately links to Natural Language Processing (NLP), and ultimately to intelligence and how we perceive and interact with our world.  Some of our efforts sought to program bots to communicate.  This seems a good place to consider the language a bot might use.

So, for those of you who have not taken the course, here is a portion of the intro to Ling001 at the University of Pennsylvania as it is taught this fall by Mark Liberman.     Also the slides that Liberman uses.   I make no claim that I understand all this, but it makes me think.

Also their Language Log blog.

Friday, August 19, 2016

AliceBot Chatbot Engine

I am reminded of the Alice Bot, which we experimented with in the 90s.   Now the idea has taken hold as a simpler way of delivering AI, especially for human natural language interaction.

About Alicebot:

A.L.I.C.E. (Artificial Linguistic Internet Computer Entity) is an award-winning free natural language artificial intelligence chat robot. The software used to create A.L.I.C.E. is available as free ("open source") Alicebot and AIML software.

Try talking to A.L.I.C.E. just like a real person, but remember you are really chatting with a machine! A.L.I.C.E.'s Alicebot engine utilizes AIML (Artificial Intelligence Markup Language) to form responses to your questions and inputs.
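AIML's category/pattern/template mechanism can be approximated in a few lines of Python. This is a hypothetical sketch of the matching idea only, not the actual Alicebot engine; the patterns and responses here are invented for illustration.

```python
def matches(pattern, words):
    """Match an AIML-style pattern (tokens plus '*' wildcards) against input words."""
    if not pattern:
        return not words
    if pattern[0] == "*":
        # AIML's '*' wildcard absorbs one or more words
        return any(matches(pattern[1:], words[i:]) for i in range(1, len(words) + 1))
    return bool(words) and pattern[0] == words[0] and matches(pattern[1:], words[1:])

def respond(categories, user_input, default="I do not understand."):
    """Return the template of the first category whose pattern matches the input."""
    words = user_input.upper().split()
    for pattern, template in categories.items():
        if matches(pattern.split(), words):
            return template
    return default

# Invented example categories, standing in for AIML <category> entries.
categories = {
    "HELLO *": "Hi there!",
    "WHAT IS YOUR NAME": "My name is A.L.I.C.E.",
}
```

The real engine adds recursion (`<srai>`), per-user state, and a large hand-written category base, but pattern-to-template matching like this is the core of how responses are formed.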

Unlike other commercial chat robot software costing thousands of dollars, the Alicebot engine and AIML are freely available under the terms of the GNU General Public License (used by GNU/Linux and thousands of other software projects). The A.L.I.C.E. project includes hundreds of contributors from around the world.

You can read more about the history of A.L.I.C.E., or find out how you can participate in the A.L.I.C.E. development community. ... " 

More on the relationship to early AI.   And the Alicebot AI Foundation.  Relationship to the work of Joseph Weizenbaum.

Via Jim Spohrer.