Showing posts with label R Programming. Show all posts

Friday, May 10, 2019

Visualizing Census Data

R-based capabilities for visualizing census data. More hints in this short article at the link. I found myself needing this in a project where it could have been useful.

Making maps to visualize census data    Posted by Mab Alam in DSC

Visualizing census data could not be easier thanks to a few great packages in R.

Not that in Python you could not do spatial analysis/visualization of census data, but certainly not as easily as in R because of some tailored rstats packages for this purpose.

Kyle Walker developed a package called tidycensus. This package allows for easy access, analysis and visualization of Census Bureau data on hundreds of variables.  ... " 

Thursday, April 27, 2017

R Packages for Data Wrangling

Most of this is quite well known, but the article provides an excellent, lengthy table of resources. I like the extent of the information; this is an update, and there will be things you don't know. 'Data wrangling' means manipulating and examining your data to put it into a form that is suitable for analysis and testing. It can be time consuming, so doing it efficiently is very useful. In the past (and still often today) this was done with a package like Excel or a suitable database system, often whatever you know best or whatever is standard in your business.

 Great R packages for data import, wrangling and visualization
By Sharon Machlis, Executive Editor, Online & Data Analytics, Computerworld

The focus here is on data: from R tips to desktop tools to taking a hard look at data claims.  .... "

Thursday, November 03, 2016

Power BI adds R

Interesting that many dashboard-level visualization tools are adding links to deeper data science methods like R. This should allow us to start with general visualization, then scale up to data science. In Computerworld, by Sharon Machlis:

Microsoft adds R data visualizations without R to Power BI
Microsoft has added R data visualizations to its Power BI data analysis cloud service -- options which no longer require either knowledge of R or having R on a local system.

These new custom visualizations work in Power BI desktop as well as the cloud service (but the desktop software does require a local R installation, although not knowledge of R). ... " 

Tuesday, October 18, 2016

MicroStrategy Desktop Now Free

Examined MicroStrategy for the enterprise some time ago, and found it to be very good.  Nice to see this added to the growing free BI viz options.  For students or the small business especially.  Worth a look.  Plan to try the R integration.

MicroStrategy Desktop BI software now free
BI vendor MicroStrategy said today that its Desktop software is now free, adding to the affordable self-service BI landscape that includes Tableau Public, Microsoft Power BI and others. MicroStrategy Desktop 10.5 is available for download at https://www.microstrategy.com/us/desktop.

Customers who currently use MicroStrategy Web "can seamlessly connect MicroStrategy Desktop to their existing projects," the company said in its announcement. "Additionally, by downloading their dashboards from the server to MicroStrategy Desktop, MicroStrategy Web users can work locally and offline." MicroStrategy Web is still a paid product.  .... "

 ... The desktop version of the software can connect to multiple types of data sources from spreadsheets to Hadoop for data visualization, including some GIS capabilities from Esri. There's also an add-on R Integration Pack. .. ." 

Monday, October 10, 2016

Model Based Machine Learning

Thinking of the possibilities for analyzing supply chain problems with Bayesian approaches. Has anyone examined this? The post shows some technical examples using R. Via DominoDataLab.

This guest post was written by Daniel Emaasit, a Ph.D Student of Transportation Engineering at the University of Nevada, Las Vegas. Daniel’s research interests include the development of probabilistic machine learning methods for high-dimensional data, with applications to urban mobility, transport planning, highway safety, & traffic operations. ....  

This blog post follows my journey from traditional statistical modeling to Machine Learning (ML) and introduces a new paradigm of ML called Model-Based Machine Learning (Bishop, 2013). Model-Based Machine Learning may be of particular interest to statisticians, engineers, or related professionals looking to implement machine learning in their research or practice.

During my Masters in Transportation Engineering (2011-2013), I used traditional statistical modeling in my research to study transportation related problems such as highway crashes. When I started my PhD, I wanted to explore using machine learning because of the powerful academic and industry use cases I had read about. In particular, I wanted to develop methods that learned how people travel within cities, allowing for better planning of transportation infrastructure.  ... " 

Friday, September 16, 2016

Getting Visual in the Smart City

Analysis of NYC Yellow Cab Taxi data  Posted by SupSta

In DSC, a visualized analysis of NYC Yellow Taxi Cab data using R. Gives an example of what can be done. Is this useful for Uber? For an understanding of what a driverless car world might look like? For future smart city planning? The future city is a digital city. I always start with a visual of the data involved, and make the visualization interactive if at all possible.

Friday, September 09, 2016

What is R?

Nicely done, non-technical.  Check out some examples too.

What is R? R Explained in less than Two Minutes, to Absolutely Anyone
by Bernard Marr

If you're looking at ways you can harness the power of Big Data analytics in your business, but are not necessarily a techie person yourself, it can be a confusing field at first.

For this reason I'm publishing a series of short posts aimed at explaining some of the key concepts and technologies behind Big Data and data analytics, aimed at an audience which is not primarily composed of IT specialists or data scientists  .... " 

Thursday, September 01, 2016

Value of Quick Descriptive Statistics

Just received from Jason Brownlee:

" ... Hi, you can learn a lot about your dataset by reviewing some basic descriptive statistics. ... The R platform provides a seemingly unlimited array of functions for poking and prodding your dataset in order to learn just that little bit more.   .. " 

My view: I am a big proponent of scanning data in this way. It addresses both correctness and basic descriptive measures. It also lets you quickly add some basic visualization to the data, which in turn lets you look for subtler patterns. Also, I note that this does not need to be done in R; it can be done in Excel, Tableau, Spotfire, SPSS or many other tools. Descriptive methods have saved me a large amount of later, more complex effort. Do try it. ...

Jason Brownlee sends along a note on how to do a simple descriptive review of data in R. The list is below. Jason summarizes:  " .. Below is my list of the 8 descriptive statistics I recommend you look at when reviewing your dataset in R:

1. Peek at the first few rows of your data
2. Review the number of rows and columns you have.
3. Review the data types of each column
4. Take a look at the class distribution (for classification problems)
5. Calculate a simple 5-number summary for each column
6. Review the standard deviations for each numerical column
7. Check the skewness of each column, handy to see what transforms to apply
8. Review the correlations between attributes

Learn the exact snippet of code to use for each statistic in the blog post:
  http://machinelearningmastery.com/descriptive-statistics-examples-with-r/
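
As a rough illustration of what the eight checks look like in practice, here is a minimal sketch in Python using only the standard library. It is not the R code from Jason's post; the toy dataset, the column names (`height`, `weight`, `label`) and the helper functions are all made up for illustration.

```python
import math
import statistics
from collections import Counter

# Hypothetical toy dataset; names and values are invented for illustration.
rows = [
    {"height": 150.0, "weight": 50.0, "label": "a"},
    {"height": 160.0, "weight": 58.0, "label": "a"},
    {"height": 170.0, "weight": 69.0, "label": "b"},
    {"height": 180.0, "weight": 77.0, "label": "b"},
    {"height": 190.0, "weight": 91.0, "label": "b"},
]
numeric_cols = ["height", "weight"]


def column(name):
    """Return one column of the toy dataset as a list."""
    return [row[name] for row in rows]


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile, maximum."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


head = rows[:3]                                                         # 1. first few rows
shape = (len(rows), len(rows[0]))                                       # 2. rows and columns
dtypes = {k: type(v).__name__ for k, v in rows[0].items()}              # 3. data types
class_counts = Counter(column("label"))                                 # 4. class distribution
summaries = {c: five_number_summary(column(c)) for c in numeric_cols}   # 5. 5-number summary
std_devs = {c: statistics.stdev(column(c)) for c in numeric_cols}       # 6. standard deviations
skews = {c: skewness(column(c)) for c in numeric_cols}                  # 7. skewness
corr = pearson(column("height"), column("weight"))                      # 8. correlation
```

Even on a toy table like this, the class counts and the correlation are the kind of quick signal that can save later, more complex work.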

Nicely done, for more writing by Jason, see:   http://machinelearningmastery.com/author/jasonb/

And see his e-book on the subject.

Will report on the book in September.

Wednesday, August 31, 2016

Visualizations in R

I was reminded of this,  Machlis Musings ... By Sharon Machlis  5 data visualizations in 5 minutes: each in 5 lines or less of R  ...     Nicely done.  Does not mean everything is this simple, but simple is always a good place to start.

Saturday, August 27, 2016

Product Association Rules Using R

Nice DSC article by Gregory Choi that relates directly to retail purchasing. How do I mine for associations in purchasing based on past buying data? The results could be used to put together bundles for possible co-marketing. This uses the old and overworked 'diapers and beer' example, with R as a sample methodology.
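
To make the idea concrete, here is a minimal sketch of the three usual association-rule measures (support, confidence, lift) for pairs of items. It is written in plain Python rather than R and is not the method from the article; the baskets, the `pair_rules` function and the `min_support` threshold are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def pair_rules(baskets, min_support=0.4):
    """Return (lhs, rhs, support, confidence, lift) for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]      # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n) # confidence vs. baseline P(rhs)
            rules.append((lhs, rhs, support, confidence, lift))
    # Highest lift first: these are the strongest candidate bundles.
    return sorted(rules, key=lambda rule: rule[4], reverse=True)


rules = pair_rules(baskets)
```

A lift above 1 suggests the two items appear together more often than chance alone would predict, which is the signal one would look for when proposing a bundle.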

Saturday, August 20, 2016

Using Neural Networks with R

Back in the first go-around with neural networks we had to write our own, but now they are easily available via R (and many other places). This is a reasonable introduction, but not really the best for an actual application. If you already know how R is used, though, it gets you to what you need to test a solution architecture. In KDNuggets.

Friday, August 19, 2016

Microsoft Links Power BI to R

 ... By Sharon Machlis

I remain pleasantly surprised at Microsoft's enthusiasm for adding R to its analytics ecosystem (and not [at least yet] fulfilling suspicions its end game is to fork a version of R that is semi-proprietary). 

Today offered another example, with an R for the Masses with Power BI webinar touting R as an option for data heavy-lifting within its Power BI platform. Granted, as an R user I'm biased, but it makes a lot more sense to lean on a language already in wide use for data work, as opposed to expecting people to learn its own DAX and M. During today's webinar, Microsoft revealed that its own survey showed more than 80% of respondents wanted to use R for advanced data work.

Power BI users could already run R scripts within the software to pull in data, to reshape and otherwise wrangle data, and to visualize data. During the webinar, Microsoft announced an R Script Showcase with examples designed to "find inspiration for leveraging R scripts in Power BI." There are already examples for using R to find clusters within your data, generate forecasts and create decision trees.   ... "

Thursday, August 11, 2016

R Connects to Watson for Decision Improvement

Every successful Cognitive/AI  project I have been involved in since the 80s has also used machine learning analytics techniques now readily available in R.   Machine learning data science can be seen as a way to add intelligence to a system, and prepare it for better decision based methods.

  I think we will continue to see this collaboration between methods.  This is a new kind of ensemble technique that will evolve.   Decision making can be directly involved. Will continue to follow this development.

" ... New R extension gives data scientists easy access to IBM's Watson   By Katherine Noyes  

Data scientists have a lot of tools at their disposal, but not all of them are equally accessible. Aiming to put IBM's Watson AI within closer reach, analytics firm Columbus Collaboratory on Thursday released a new open-source R extension called CognizeR.

R is an open-source language that's widely used by data scientists for statistical and analytics applications. Previously, data scientists would have had to exit R to tap Watson's capabilities, coding the calls to Watson's application programming interfaces (APIs) in another language, such as Java or Python.

Now, CognizeR lets them tap into Watson's so-called "cognitive" artificial-intelligence services without leaving their native development environment.

"Data scientists can now seamlessly tap into our cognitive services to unlock data that lives in unstructured forms like chats, emails, social media, images, and documents," wrote Rob High, vice president and CTO for Watson, in a blog post.  .. " 

Wednesday, August 10, 2016

Scaling Data Science in R

Ran into exactly this problem recently. Mostly we solved it by using R as an abstraction layer with sampled data. There are a number of other solutions in the article:

You’ve got three options: Scaling up, scaling out, or using R as an abstraction layer.
By Federico Castanedo 

For more on this topic, Brian Kreeger and Roger Fried will be hosting a live webcast, Scalable Data Science with R, on August 16, 2016.

R is among the top five data science tools in use today according to O’Reilly research; the latest kdnuggets survey puts it in first, and IEEE Spectrum ranks it as the fifth most popular programming language.

The latest Rexer Data Miner survey revealed that in the past eight years, there has been a three-fold increase in the number of respondents using R, and a seven-fold increase in the number of analysts/scientists who have said that R is their primary tool.

Despite its popularity, the main drawback of vanilla R is its inherently “single threaded” nature and its need to fit all the data being processed in RAM. But nowadays, data sets are typically in the range of GBs and they are growing quickly to TBs. In short, current growth in data volume and variety is demanding more efficient tools by data scientists   ... " 
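
One common way to work with sampled data when the full set will not fit in RAM is reservoir sampling, which keeps a fixed-size uniform random sample while reading the data once. The sketch below is in Python and is only an assumption about what such a sampling step could look like; it is not taken from the article, and the function name `reservoir_sample` is made up.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = item
    return sample


# Usage: sample 100 values from a generator without materializing it.
subset = reservoir_sample((x * x for x in range(1_000_000)), k=100, seed=42)
```

Only the `k` sampled items are ever held in memory, so the analysis that follows can run on a single machine.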

Wednesday, June 08, 2016

An Advanced Beginners Guide to R

Nicely done.    So you’ve gone through the Computerworld Beginner’s Guide to R and want to take some next steps in your R journey? In this advanced beginner’s guide, you’ll learn data wrangling, best packages to use for different tasks, how to make maps with R and more    .. " 

Sunday, May 22, 2016

A Condensed Guide to Data Science

Nicely done introduction by Vincent Granville of DSC, with links to many useful resources. A good list: condensed, but still with lots there, and with considerable variability in technical depth, which is necessary if you discuss computing languages.

Hitchhiker's Guide to Data Science, Machine Learning, R, Python  by Vincent Granville

Thousands of articles and tutorials have been written about data science and machine learning. Hundreds of books, courses and conferences are available. You could spend months just figuring out what to do to get started, even to understand what data science is about. .... 

In this short contribution, I share what I believe to be the most valuable resources - a small list of top resources and starting points. This will be most valuable to any data practitioner who has very little free time.  .....  " 

DSC is an excellent resource to follow.

Wednesday, May 18, 2016

Apprenticeships in the US

In K@W: Are apprenticeships making a comeback? Yes, I think so. A good place for them would be data science and programming, which are at least 50% practice-based skills if you want to do them well, and in a standardized way. Elsewhere as well. I recall my father detailing his grueling experiences as a machinist's apprentice.

These are not necessarily done best in the university setting; we saw that a long time ago. Perhaps a model that links the university and experienced practitioners together? I also saw that in a recent experience, but the company was not willing to create the needed apprentice environment.

Tuesday, April 05, 2016

Principal Component Analysis using R

Principal Component Analysis (PCA) was a favorite technique of ours for a long time, because it addressed the dimensionality of a problem. That was important for problems that had many socially influenced dimensions. These problems also typically had relatively sparse data, because it could be expensive to obtain. Here is a good example of PCA in R, and motivation for its use. Worth knowing.
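
For readers who want to see the mechanics, here is a minimal sketch of finding the first principal component by power iteration on the covariance matrix. It is plain Python, not the R code from the linked example; the data points and the function name `first_principal_component` are made up, and the sketch assumes the data is not constant and that the starting vector is not orthogonal to the leading direction.

```python
import math


def first_principal_component(data, iterations=200):
    """Return (unit direction, share of variance explained) for the top component."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly multiply by the covariance matrix and normalize.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance along v; the trace is the total variance.
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)
    )
    total_variance = sum(cov[i][i] for i in range(d))
    return v, eigenvalue / total_variance


# Hypothetical points lying close to the line y = 2x.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
direction, explained = first_principal_component(points)
```

Here two measured dimensions collapse to essentially one, which is the kind of reduction that made the technique useful on sparse, many-dimensional problems.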

Sunday, March 13, 2016

Classification Methods in R

We used classification methods to determine, for example, how marketing should be placed or how it should be adapted. Most of this work was done long before R, with adapted codes, and I am still in the process of understanding how some of these methods have been implemented in R. So I am collecting examples for possible testing. Here in DSC, a collection of examples of classification in R. Among these, we used only a handful. Instructive.