The Eponymous Pickle: Analysis

Saturday, April 29, 2023

Exploratory Data Analysis with Pandas Python 2023

Nicely done, general beginner's piece on data analysis.

by Rob Mulla

https://www.youtube.com/watch?v=xi0vhXFPegw

122,509 views Premiered Dec 31, 2021 Medallion Python Data Science Coding Videos

In this video about exploratory data analysis with pandas and python, Kaggle grandmaster Rob Mulla will teach you the basics of how to explore data using python and pandas. Exploratory Data Analysis is a necessary tool for any data scientist. Pandas is a MUST for anyone getting into data science with python. Python is the #1 coding language for data science and has been growing over the years as an essential tool, with Pandas being the main data wrangling module. Kaggle Grandmaster Rob goes over it all in this video. In this video we discuss the basics of how to explore data including...

Timestamps:

00:00 Introduction

01:00 Imports and reading data

03:35 Data Understanding

06:40 Data Preparation

20:57 Feature Understanding

27:35 Feature Relationships

35:30 Asking a Question about the Data

40:00 Final Thoughts

Follow me on twitch for live coding streams: https://www.twitch.tv/medallionstallion
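
The steps named in the timestamps above can be sketched in a few lines of pandas. This is only a minimal illustration, not the code from the video: the toy table, its column names, and the question asked at the end are all hypothetical, and a real analysis would read a file rather than build the data inline.

```python
import pandas as pd

# Hypothetical toy data standing in for a real file (assumption, not from the video).
raw = pd.DataFrame(
    {
        "Name": ["Comet", "Viper", "Comet", "Titan", "Blaze"],
        "Speed (mph)": [45.0, 70.0, 45.0, None, 60.0],
        "Height (ft)": [80.0, 188.0, 80.0, 255.0, 120.0],
        "Year": [1995, 1990, 1995, 2001, 2010],
        "Notes": ["", "steel", "", "steel", "wood"],
    }
)

# Data understanding: size, column types, summary statistics.
print(raw.shape)
print(raw.dtypes)
print(raw.describe())

# Data preparation: drop an unneeded column, rename columns, remove duplicate rows.
df = (
    raw.drop(columns=["Notes"])
    .rename(
        columns={
            "Name": "name",
            "Speed (mph)": "speed_mph",
            "Height (ft)": "height_ft",
            "Year": "year",
        }
    )
    .drop_duplicates()
    .reset_index(drop=True)
)
missing_per_column = df.isna().sum()  # count of missing values in each column

# Feature understanding: how often each value of a single column occurs.
year_counts = df["year"].value_counts()

# Feature relationships: pairwise correlation between two numeric columns.
corr = df[["speed_mph", "height_ft"]].corr()

# Asking a question about the data: what is the mean speed per decade?
df["decade"] = (df["year"] // 10) * 10
mean_speed_by_decade = df.groupby("decade")["speed_mph"].mean()
print(mean_speed_by_decade)
```

Building the table inline keeps the sketch self-contained; with real data the first step would typically be a call such as `pd.read_csv` on the file of interest.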

Wednesday, November 02, 2022

Natural Language Processing Software Evaluates Middle School Science Essays

Penn State News, Mariah Chuprinski, October 11, 2022

Computer scientists at Pennsylvania State University (Penn State) and the University of Wisconsin-Madison (UW-Madison) appraised natural language processing software for evaluating students' science essays. Researchers augmented the PyrEval tool to assess concepts in student writing based on predetermined and computable rubrics. The PyrEval-CR software "can provide middle school students immediate feedback on their science essays," while also summarizing subjects or ideas in the essays "from one or more classrooms, so teachers can quickly determine if students have genuinely understood a science lesson," said Penn State's Rebecca Passonneau. Researchers tested PyrEval-CR on hundreds of science essays from Wisconsin public schools. ... "

Wednesday, February 09, 2022

Weird Computer-Generated Phrases Tip-Off Scientific Publishing Fraud

It is unclear how well this would work fairly and in context.

Weird Computer-Generated Phrases Tip-Off Scientific Publishing Fraud

By Bulletin of the Atomic Scientists

Scientists authored six million peer-reviewed publications in 2020, and among them are thousands of fabricated articles. Modern plagiarists are making use of software and perhaps even emerging AI technologies to draft articles — and they're getting away with it.

Technology is also being used to identify fraudulent research. A computer system named the Problematic Paper Screener searches through published science and seeks out "tortured phrases" in order to find suspect work. A tortured phrase is an established scientific concept paraphrased into a nonsensical sequence of words. "Artificial intelligence" becomes "counterfeit consciousness." "Signal to noise" becomes "flag to clamor."

As of January 2022, researchers found tortured phrases in 3,191 peer-reviewed articles published, including in reputable flagship publications. They also found published papers that appear to have been partly generated with AI language models like GPT-2, a system developed by OpenAI. Unlike papers where authors seem to have used paraphrasing software, which changes existing text, these AI models can produce text out of whole cloth.

From Bulletin of the Atomic Scientists
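
The idea of screening text for "tortured phrases" can be illustrated with a simple lookup. This is a rough sketch only and not how the Problematic Paper Screener is actually implemented: the function name and the sample sentence are hypothetical, and the phrase list holds just the two pairs quoted in the article, where a real screener would need a much larger, curated list.

```python
import re

# Known tortured phrases mapped to the established term they distort.
# Only the two pairs quoted in the article are included here.
TORTURED_PHRASES = {
    "counterfeit consciousness": "artificial intelligence",
    "flag to clamor": "signal to noise",
}


def find_tortured_phrases(text):
    """Return (phrase, expected_term, position) tuples in order of appearance."""
    hits = []
    for phrase, expected in TORTURED_PHRASES.items():
        # Allow whitespace or hyphens between the words of a phrase.
        words = [re.escape(word) for word in phrase.split()]
        pattern = r"\b" + r"[\s-]+".join(words) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((phrase, expected, match.start()))
    return sorted(hits, key=lambda hit: hit[2])


# Hypothetical sample sentence for illustration.
sample = "We apply counterfeit consciousness to improve the flag-to-clamor ratio."
for phrase, expected, position in find_tortured_phrases(sample):
    print(f"{position}: '{phrase}' may be a paraphrase of '{expected}'")
```

A fixed list like this only catches phrases that are already known, so a match is a reason for a human to look closer, not proof of fraud.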

Sunday, May 24, 2020

Neuroimaging Data with Varying Results

We examined fMRI for possible uses in neuromarketing applications, and also found such variation in analysis results. Standards in the analysis approach are important. This is another general caution about the results you achieve, even with very large databases.

Neuroimaging Results Altered by Varying Analysis Pipelines
Nature

A survey of neuroimaging studies found that nearly every study used a different analysis pipeline, and the analytical choices of individual researchers significantly impacted findings gleaned from a functional magnetic resonance imaging (fMRI) dataset. The team provided the same dataset to 70 independent research groups and asked them to test nine hypotheses, each of which asserted that activity in a specific brain region correlated with a specific task feature. There were considerable variations between each team's results, even when their underlying maps were highly correlated. The finding highlights the potential consequences of a lack of standardized pipelines for processing complex data. ... "

Thursday, November 21, 2019

Identification from Single Strand of Hair

New identification techniques.

Scientists can now identify someone from a single strand of hair
By Eva Frederick in Science Magazine

A new forensic technique could have criminals—and some prosecutors—tearing their hair out: Researchers have developed a method they say can identify a person from as little as 1 centimeter of a single strand of hair—and that is eight times more sensitive than similar protein analysis techniques. If the new method ever makes it into the courtroom, it could greatly expand the ability to identify the people at the scene of a crime. .... "

Sunday, July 23, 2017

Machine Learning and Big Data

I was recently asked, what's the difference? .... Just Architecture vs Math?

Machine learning with Big Data is, in many ways, different than "regular" machine learning. This informative image is helpful in identifying the steps in machine learning with Big Data, and how they fit together into a process of their own ...

By Matthew Mayo, in KDnuggets. (See the excellent chart, mentioned below, at the link)

Big Data is no longer buzzword terminology or cutting edge, conceptually; rather, it just is. Big Data is not easily or precisely definable, but it is generally easy to identify when you see it.

While successful applications of machine learning cannot rely solely on cramming ever-increasing amounts of Big Data at algorithms and hoping for the best, the ability to leverage large amounts of data for machine learning tasks is a must-have skill for practitioners at this point.

While much of machine learning holds true regardless of data amounts, there are aspects which are the exclusive domain of Big Data modeling, or which apply more so than they do to smaller data amounts. Data scientist Rubens Zimbres outlines a process for applying machine learning to Big Data in his original graphic below. ..... "

Friday, December 02, 2016

Intelligence Analysis

New book of interest. How are these kinds of structures embedded into AI?

Intelligence Analysis as Discovery of Evidence, Hypotheses, and Arguments: Connecting the Dots
by Gheorghe Tecuci, David A. Schum, Dorin Marcu, Mihai Boicu

This unique book on intelligence analysis covers several vital but often overlooked topics. It teaches the evidential and inferential issues involved in "connecting the dots" to draw defensible and persuasive conclusions from masses of evidence: from observations we make, or questions we ask, we generate alternative hypotheses as explanations or answers; we make use of our hypotheses to generate new lines of inquiry and discover new evidence; and we test the hypotheses with the discovered evidence.

To facilitate understanding of these issues and enable the performance of complex analyses, the book introduces an intelligent analytical tool, called Disciple-CD. Readers will practice with Disciple-CD and learn how to formulate hypotheses; develop arguments that reduce complex hypotheses to simpler ones; collect evidence to evaluate the simplest hypotheses; and assess the relevance and the believability of evidence, which combine in complex ways to determine its inferential force and the probabilities of the hypotheses. .... "

Saturday, October 29, 2016

Conversational Analytics Using AI

An obvious thought: place bot-style AI into a conversational interaction about doing analytics for given problems and data. The caution remains that we don't yet know how to manage complex conversations, just simplistic ones. Perhaps this could serve as a kind of conversationally managed automation of data science. This effort was new to me, and I plan to take a look. Drastin unveils the world's first conversational analytics product powered by Artificial Intelligence. See the Drastin site.

Thursday, September 22, 2016

Image Captioning by Tensorflow is Now Open Source

I have mentioned before that this is a problem we addressed for several AI-oriented applications. We called it 'image recognition'. Now the general solution is open source. There are some sample images in the article, and they are impressive. The general solution of this captioning problem is an important one.

Show and Tell: image captioning open sourced in TensorFlow
Thursday, September 22, 2016
Posted by Chris Shallue, Software Engineer, Google Brain Team

In 2014, research scientists on the Google Brain team trained a machine learning system to automatically produce captions that accurately describe images. Further development of that system led to its success in the Microsoft COCO 2015 image captioning challenge, a competition to compare the best algorithms for computing accurate image captions, where it tied for first place.

Today, we’re making the latest version of our image captioning system available as an open source model in TensorFlow. This release contains significant improvements to the computer vision component of the captioning system, is much faster to train, and produces more detailed and accurate descriptions compared to the original system. These improvements are outlined and analyzed in the paper Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge, published in IEEE Transactions on Pattern Analysis and Machine Intelligence. .... "
