
Data analytics using RStudio

Instructions
(15pts) For following the instructions and submission requirements below:
• Put your name at the top of this file
• All data cleaning should be done using a dplyr chain connected directly to the read.csv() line (only
one read.csv() per Part, you don’t need separate ones for each question)
• You only have to “clean” what you need to clean to answer the questions; I’m not expecting you to clean every column in every dataset.
• All visualizations should be completed using the ggplot2 package.
• All questions should be answered in a single line (or chain) of code. No saving intermediate datasets
or objects.
• No irrelevant or unnecessary code or output in your final document.
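A minimal sketch of the “one read.csv() per Part, cleaning chained directly onto it” pattern. To keep it runnable offline, `read.csv(text = ...)` stands in for the real URL (in the assignment you would pass the dataset link instead), and the columns lender, city, and amount are hypothetical placeholders, not the actual column names in the course files.

```r
library(dplyr)
library(stringr)

# read.csv() piped straight into the cleaning steps, so the cleaned data is
# the only object created. `text =` replaces the URL for this toy example;
# the column names are hypothetical.
ppp_demo <- read.csv(text = "lender,city,amount
Bank A, miami ,150000
Bank B,CORAL GABLES,250000
Bank A,Miami,NA") %>%
  # Tidy city names: drop stray spaces, then use consistent title case
  mutate(city = str_to_title(str_trim(city))) %>%
  # Drop rows that have no loan amount
  filter(!is.na(amount))
```

Only the columns a later question depends on need this treatment, in line with the instructions above.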
SUBMISSIONS
• Submit your Rmd file AND a knitted document to Blackboard
• Your knitted document must show your code AND output
• Submit the right documents on the first try
Question 1: (5pts) Please load any package needed in this script in the code chunk below.
library(ggplot2)
library(dplyr)
library(tidyr)
library(stringr)
library(lubridate)
library(scales)
Part 1 – Paycheck Protection Program
The Paycheck Protection Program (PPP) is a $953 billion business loan program established by the US
government through the Coronavirus Aid, Relief, and Economic Security Act (CARES Act) to help certain
businesses, self-employed workers, sole proprietors, certain nonprofit organizations, and tribal businesses
continue paying their workers.
Loan data is available through the Small Business Administration website. I took the data for all loans over
$150K, filtered for Florida, cleaned it a bit to match topics covered in our course, and am providing you
with it through the link below:
• https://dxl-datasets.s3.amazonaws.com/data/ppp_fl.csv
Read in and clean data (10pts)
Question 2: (5pts) Return the total amount of money loaned by month from 2020 to 2021.
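One way to approach monthly totals, sketched on toy data: parse the approval date, round each date down to the first of its month with lubridate::floor_date(), then group and sum. The column names date_approved and amount, and the "mm/dd/yyyy" text format, are assumptions for illustration only.

```r
library(dplyr)
library(lubridate)

# Toy loans; date_approved and amount are hypothetical column names
loans <- data.frame(
  date_approved = c("04/03/2020", "04/28/2020", "05/01/2020", "02/10/2021"),
  amount = c(200000, 300000, 150000, 500000)
)

# Assigned only so the result can be checked; the assignment wants an unsaved chain
monthly <- loans %>%
  mutate(date_approved = mdy(date_approved),            # text -> Date
         month = floor_date(date_approved, "month")) %>% # e.g. 2020-04-28 -> 2020-04-01
  filter(year(date_approved) %in% 2020:2021) %>%
  group_by(month) %>%
  summarise(total_loaned = sum(amount, na.rm = TRUE))
```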
Question 3: (5pts) Return the name and total money loaned for Florida’s top PPP lender in 2021 (“top”
= most money loaned).
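A sketch of the “top lender” pattern: restrict to 2021 first, total by lender, then keep the largest total with slice_max(). The columns lender, date_approved, and amount are hypothetical; the 2020 row shows why filtering before summing matters.

```r
library(dplyr)
library(lubridate)

# Toy data with hypothetical column names
loans <- data.frame(
  lender = c("Bank A", "Bank B", "Bank A", "Bank B"),
  date_approved = as.Date(c("2021-01-15", "2021-03-02", "2021-02-20", "2020-06-01")),
  amount = c(400000, 600000, 300000, 900000)
)

top_lender <- loans %>%
  filter(year(date_approved) == 2021) %>%        # drop the 2020 loan first
  group_by(lender) %>%
  summarise(total_loaned = sum(amount, na.rm = TRUE)) %>%
  slice_max(total_loaned, n = 1)                  # row with the most money loaned
```

Bank B has the single largest loan overall, but Bank A lends more in 2021, so the year filter changes the answer.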
Question 4: (5pts) What percent of loan money went to borrowers in the city of Miami?
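A percent-of-total can be computed in a single summarise() by subsetting the amount vector inside the numerator. This sketch assumes hypothetical city and amount columns and that city spellings have already been standardised (a match on "Miami" is case-sensitive).

```r
library(dplyr)

# Toy data; city and amount are hypothetical column names
loans <- data.frame(
  city = c("Miami", "Tampa", "Miami", "Orlando"),
  amount = c(250000, 500000, 250000, 1000000)
)

miami_share <- loans %>%
  summarise(pct_miami = 100 * sum(amount[city == "Miami"], na.rm = TRUE) /
              sum(amount, na.rm = TRUE))
```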
Question 5: (7pts) Please display the average jobs by business age with a bar plot, with business age
grouped into the following 4 categories: Startup, 0-2 Years, 3+ Years, and Other/Unknown.
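A sketch of the binning step with dplyr::case_when(), where a final `TRUE ~` branch catches everything else (including NA) as Other/Unknown, followed by a mean and a bar plot. The raw business_age labels below are invented for illustration, so inspect unique() on the real column before writing the mapping; converting to a factor keeps the bars in the order the question lists rather than alphabetical order.

```r
library(dplyr)
library(ggplot2)

# Toy data; business_age labels and column names are hypothetical
loans <- data.frame(
  business_age = c("startup", "new (<= 2 yrs)", "existing (> 2 yrs)",
                   "change of ownership", NA, "existing (> 2 yrs)"),
  jobs = c(10, 20, 40, 5, 15, 60)
)

age_jobs <- loans %>%
  mutate(age_group = case_when(
           business_age == "startup" ~ "Startup",
           business_age == "new (<= 2 yrs)" ~ "0-2 Years",
           business_age == "existing (> 2 yrs)" ~ "3+ Years",
           TRUE ~ "Other/Unknown"              # anything else, including NA
         ),
         age_group = factor(age_group,
                            levels = c("Startup", "0-2 Years", "3+ Years", "Other/Unknown"))) %>%
  group_by(age_group) %>%
  summarise(avg_jobs = mean(jobs, na.rm = TRUE))

p <- ggplot(age_jobs, aes(x = age_group, y = avg_jobs)) +
  geom_col(fill = "steelblue") +
  labs(x = "Business age", y = "Average jobs reported")
```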
Question 6: (7pts) Please visualize the total amount borrowed in each zipcode in the city of Coral Gables
with a horizontal bar plot. All zipcodes should be 5-digits long. Order in descending order, and format
amount loaned as a currency in units of a million (e.g. $40M for $40,000,000).
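A sketch of three pieces this question needs: trimming ZIP+4 values to 5 digits with stringr::str_sub(), ordering bars with reorder(), and labelling the axis in millions with scales::label_dollar(scale = 1e-6, suffix = "M"). Mapping the ZIP code to y gives horizontal bars; coord_flip() is an alternative. The city, zip, and amount columns and the ZIP+4 format are assumptions.

```r
library(dplyr)
library(ggplot2)
library(stringr)
library(scales)

# Toy data; column names and the ZIP+4 value are hypothetical
loans <- data.frame(
  city = "Coral Gables",
  zip = c("33134", "33146-1234", "33134", "33143"),
  amount = c(20e6, 15e6, 20e6, 5e6)
)

zip_totals <- loans %>%
  filter(city == "Coral Gables") %>%
  mutate(zip = str_sub(as.character(zip), 1, 5)) %>%   # keep the first 5 digits
  group_by(zip) %>%
  summarise(total = sum(amount, na.rm = TRUE))

# reorder() puts the largest total at the top of the horizontal chart
p <- ggplot(zip_totals, aes(x = total, y = reorder(zip, total))) +
  geom_col(fill = "darkgreen") +
  scale_x_continuous(labels = label_dollar(scale = 1e-6, suffix = "M")) +
  labs(x = "Total amount loaned", y = "ZIP code")
```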
Part 2 – AIAAIC
AIAAIC (AI, Algorithmic, and Automation Incidents and Controversies) is an independent, non-partisan,
public interest initiative that examines and makes the case for real AI, algorithmic, and automation transparency and openness.
AIAAIC is looking to make AI, algorithms, and automation more transparent by:
• Empowering civil society entities including researchers, academics, teachers, NGOs, journalists, and
think tanks
• Educating end users, citizens, students, and others
• Making the case to policymakers, regulators, and businesses
The data provided details incidents and controversies driven by and relating to artificial intelligence, algorithms,
and automation, and can be read in from:
• https://dxl-datasets.s3.amazonaws.com/data/aiaaic.csv
Read in and clean data (10pts)
Question 7: (7pts) Visualize the number of safety risk incidents by year in the USA with a horizontal bar
chart. Your final plot should look similar to the template below (you’ll have different axis limits):
[Template plot: “Safety risk incidents are on the rise in the USA”. Horizontal bars, one per year (2014 to 2023); x-axis: Incidents.]
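A sketch of the count-and-plot step on toy data. It assumes hypothetical year, country, and issues columns, and that an incident’s issues field can list several issues in one cell, so str_detect() is used rather than an exact match; check the real AIAAIC columns before adapting it.

```r
library(dplyr)
library(ggplot2)
library(stringr)

# Toy incidents; column names and values are hypothetical
incidents <- data.frame(
  year = c(2019, 2020, 2020, 2021, 2021, 2021, 2021),
  country = c("USA", "USA", "UK", "USA", "USA", "USA", "China"),
  issues = c("Safety", "Privacy; Safety", "Safety", "Safety", "Safety", "Bias", "Safety")
)

usa_safety <- incidents %>%
  filter(country == "USA", str_detect(issues, "Safety")) %>%
  count(year, name = "incidents")            # one row per year

p <- ggplot(usa_safety, aes(x = incidents, y = factor(year))) +
  geom_col(fill = "firebrick") +
  labs(x = "Incidents", y = NULL,
       title = "Safety risk incidents are on the rise in the USA")
```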
Question 8: (7pts) Visualize the number of safety risk incidents by year separately for China, USA, and
the UK. Your final plot should look similar to the template below (again, you’ll have different axis limits).
NOTE: This question is potentially difficult. Skip to Part 3 (much easier) and return later if needed.
[Template plot: “Safety risk incidents by year”. Three side-by-side panels (China, UK, USA), each with one horizontal bar per year (2011, 2014 to 2023); x-axis: Number of Incidents.]
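A sketch of one possible approach, under a loudly stated assumption: if a single incident’s country cell can list several countries (for example "USA; UK"), tidyr::separate_rows() splits it into one row per country before filtering and counting, and facet_wrap() then draws one panel per country. The column names, the "; " separator, and the toy values are all hypothetical.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(stringr)

# Toy incidents; the multi-country cells are an assumption about the data
incidents <- data.frame(
  year = c(2020, 2020, 2021, 2021, 2022),
  country = c("USA; UK", "China", "USA", "UK; France", "China; USA"),
  issues = c("Safety", "Safety", "Safety; Bias", "Safety", "Privacy")
)

by_country <- incidents %>%
  filter(str_detect(issues, "Safety")) %>%
  separate_rows(country, sep = ";\\s*") %>%          # one row per country
  filter(country %in% c("China", "UK", "USA")) %>%
  count(country, year, name = "incidents")

p <- ggplot(by_country, aes(x = incidents, y = factor(year))) +
  geom_col(fill = "darkorange") +
  facet_wrap(~ country, nrow = 1) +                   # panels side by side
  labs(x = "Number of Incidents", y = NULL,
       title = "Safety risk incidents by year")
```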
Part 3
The data for this section comes from Atlanta’s open data portal, and contains information related to purchasing by Atlanta city government from 2016 to 2018.
The data is stored at the address below.
• https://dxl-datasets.s3.amazonaws.com/data/atl_ledger.csv
Read in and clean data (5pts)
Please recreate any two of the following three plots using ggplot. If you need to clean or aggregate data, you should … Your colors do not have to match mine, but please do change the defaults.
Question 9 (7pts)
Question 10 (7pts)
[Plot A: “Annual spending by expense category”. Bars by Year (2016, 2017, 2018); y-axis: Total Spending (in millions $); fill legend: Expense Category (CAPITAL OUTLAYS, CONTRACTED SERVICES, SUPPLIES).]
[Plot B: “Spending by department in 2018”. Horizontal bars, one per city department; x-axis: Total 2018 Spending (millions $).]
[Plot C: spending by expense category (CAPITAL OUTLAYS, CONTRACTED SERVICES, SUPPLIES), faceted by fund: GENERAL FUNDS, SPECIAL REVENUE FUNDS, CAPITAL PROJECTS FUNDS, ENTERPRISE FUNDS; y-axis label truncated in the source (“Spending (in million…”).]
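A sketch of the aggregation-plus-plot pattern behind the annual spending chart, on toy data: sum by year and expense category, label the y-axis in millions with scales::label_dollar(scale = 1e-6), and replace the default fill colours with a ColorBrewer palette. The ledger column names are hypothetical, and dodged bars are just one guess at the template’s layout.

```r
library(dplyr)
library(ggplot2)
library(scales)

# Toy ledger; year, expense_category, and amount are hypothetical names
ledger <- data.frame(
  year = c(2016, 2016, 2017, 2017, 2018, 2018),
  expense_category = c("SUPPLIES", "CAPITAL OUTLAYS", "SUPPLIES",
                       "CONTRACTED SERVICES", "SUPPLIES", "SUPPLIES"),
  amount = c(30e6, 50e6, 35e6, 60e6, 20e6, 25e6)
)

annual <- ledger %>%
  group_by(year, expense_category) %>%
  summarise(total = sum(amount, na.rm = TRUE), .groups = "drop")

p <- ggplot(annual, aes(x = factor(year), y = total, fill = expense_category)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = label_dollar(scale = 1e-6)) +   # 50e6 -> "$50"
  scale_fill_brewer(palette = "Set2") +                        # non-default colours
  labs(x = "Year", y = "Total Spending (in millions $)",
       fill = "Expense Category", title = "Annual spending by expense category")
```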
