Tales Gomes

Scenario

You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your ecommendations, so they must be backed up with compelling data insights and professional data visualizations.

Ask Phase

Guiding questions

What is the problem you are trying to solve?
- How do annual members and casual riders use Cyclistic bikes differently?
- Why would casual riders buy Cyclistic annual memberships?
- How can Cyclistic use digital media to influence casual riders to become members?
How can your insights drive business decisions?
- improve the marketing campaign
Identify the business task
- Undertand the diferente between casual users and members to improve the marketing campaign
Consider key stakeholders
- Main stakeholders:
  - Cyclistic executive team
  - Lily Moreno
- Secundary stakeholder:
  - Cyclistic marketing analytics team leader

Prepare

Where is your data located?
- Cyclistic bikeshare database
How is the data organized?
- the data base is organized in 12 files with month data from july 2020 to june 2021.
Are there issues with bias or credibility in this data?
- Reliable -Yes, the data is reliable. The data is a primary source data based on a fictional company.
- Original - Yes, the original public data can be located.
- Comprehensive - Yes, no vital information is missing.
- Current - Yes, the data base is updated monyhly.
How are you addressing licensing, privacy, security, and accessibility?
- the data is distributed in this license.
How did you verify the data’s integrity?
- Using R (ver. 4.1) and Rstudio (ver. 1.4)
How does it help you answer your question?
- R is a powerful tool that makes it easy to manipulate large databases.
Are there any problems with the data?
- Yes, Some missing values, but it did not interfere with the analysis.

Process Phases

Ingesting the data

Ingesting the data using the vroom library and loading into the bikeshare_data.

library(tidyverse)  # used to filter the data

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

✔ ggplot2 3.3.6     ✔ purrr   0.3.4
✔ tibble  3.1.7     ✔ dplyr   1.0.9
✔ tidyr   1.2.0     ✔ stringr 1.4.0
✔ readr   2.1.2     ✔ forcats 0.5.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

library(lubridate)  #used to work with date class.


Attaching package: 'lubridate'

The following objects are masked from 'package:base':

    date, intersect, setdiff, union

library(Hmisc)

Carregando pacotes exigidos: lattice

Carregando pacotes exigidos: survival

Carregando pacotes exigidos: Formula


Attaching package: 'Hmisc'

The following objects are masked from 'package:dplyr':

    src, summarize

The following objects are masked from 'package:base':

    format.pval, units

library(kableExtra)


Attaching package: 'kableExtra'

The following object is masked from 'package:dplyr':

    group_rows

# loding the files name and
files <- fs::dir_ls(path = "../database/")
files

../database/202007-divvy-tripdata.csv ../database/202008-divvy-tripdata.csv 
../database/202009-divvy-tripdata.csv ../database/202010-divvy-tripdata.csv 
../database/202011-divvy-tripdata.csv ../database/202012-divvy-tripdata.csv 
../database/202101-divvy-tripdata.csv ../database/202102-divvy-tripdata.csv 
../database/202103-divvy-tripdata.csv ../database/202104-divvy-tripdata.csv 
../database/202105-divvy-tripdata.csv ../database/202106-divvy-tripdata.csv 
../database/202107-divvy-tripdata.csv ../database/202108-divvy-tripdata.csv 
../database/202109-divvy-tripdata.csv ../database/202110-divvy-tripdata.csv 
../database/202111-divvy-tripdata.csv ../database/202112-divvy-tripdata.csv

bikeshare_data <- vroom::vroom(files, col_names = TRUE)

Rows: 8081804 Columns: 13

── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
dbl  (4): start_lat, start_lng, end_lat, end_lng
dttm (2): started_at, ended_at

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

glimpse(bikeshare_data)

Rows: 8,081,804
Columns: 13
$ ride_id            <chr> "762198876D69004D", "BEC9C9FBA0D4CF1B", "D2FD8EA432…
$ rideable_type      <chr> "docked_bike", "docked_bike", "docked_bike", "docke…
$ started_at         <dttm> 2020-07-09 15:22:02, 2020-07-24 23:56:30, 2020-07-…
$ ended_at           <dttm> 2020-07-09 15:25:52, 2020-07-25 00:20:17, 2020-07-…
$ start_station_name <chr> "Ritchie Ct & Banks St", "Halsted St & Roscoe St", …
$ start_station_id   <chr> "180", "299", "329", "181", "268", "635", "113", "2…
$ end_station_name   <chr> "Wells St & Evergreen Ave", "Broadway & Ridge Ave",…
$ end_station_id     <chr> "291", "461", "156", "94", "301", "289", "140", "31…
$ start_lat          <dbl> 41.90687, 41.94367, 41.93259, 41.89076, 41.91172, 4…
$ start_lng          <dbl> -87.62622, -87.64895, -87.63643, -87.63170, -87.626…
$ end_lat            <dbl> 41.90672, 41.98404, 41.93650, 41.91831, 41.90799, 4…
$ end_lng            <dbl> -87.63483, -87.66027, -87.64754, -87.63628, -87.631…
$ member_casual      <chr> "member", "member", "casual", "casual", "member", "…

Verifing missing values

bikeshare_data |>
    is.na() |>
    colSums()

           ride_id      rideable_type         started_at           ended_at 
                 0                  0                  0                  0 
start_station_name   start_station_id   end_station_name     end_station_id 
            785465             786088             849162             849623 
         start_lat          start_lng            end_lat            end_lng 
                 0                  0               8137               8137 
     member_casual 
                 0

The missing data are grouped at the location variable (station name, latitude and longitude)

filtering the data

Filtering and Process the data using the tools in the tidyverse.
- In this fase we created the following variables:
  - trip_duration - the trip duration in minutes;
  - weekday_day - The day of the week the trip takes place;
  - is_weekend - Test if the day is a weekend;
  - date_month - Stores the month the trip takes place;
  - date_hour - Stores the hour the trip takes place;
  - date_season - Stores the season of the year;
  - day_time - Stores the time of the day;
  - trip_route - Stores the route of the trip (start station to end station).
- Then we keep the following variable:
  - start_station_name;
  - ride_id;
  - rideable_type;
  - and member_casual.
- the we exclude the remaning original variables.
- then we change the class of the categorical variables to factor.
- And finally, we filter the data to contain only trip duration longer than 0 minutes.

I chose not to exclude missing values due to being limited to location variables (station names and geographic markers), as well as excluding trips shorter than two minutes to minimize data collection errors.

# Filterring data.
bikeshare_data <- bikeshare_data |>
    mutate(trip_duration = as.numeric(ended_at - started_at)/60, weekday_day = wday(started_at,
        label = TRUE), is_weekend = ifelse((wday(started_at) == 7 | wday(started_at) ==
        1), "yes", "no"), date_month = month(started_at, label = TRUE), date_hour = hour(started_at),
        date_season = case_when(month(started_at) == 1 | month(started_at) == 2 |
            month(started_at) == 3 ~ "winter", month(started_at) == 4 | month(started_at) ==
            5 | month(started_at) == 6 ~ "spring", month(started_at) == 7 | month(started_at) ==
            8 | month(started_at) == 9 ~ "summer", month(started_at) == 10 | month(started_at) ==
            11 | month(started_at) == 12 ~ "fall"), day_time = case_when(hour(started_at) <
            6 ~ "dawn", hour(started_at) >= 6 & hour(started_at) < 12 ~ "morning",
            hour(started_at) >= 12 & hour(started_at) < 18 ~ "afternoon", hour(started_at) >=
                18 ~ "night"), trip_route = str_c(start_station_name, end_station_name,
            sep = " to ")) |>
    relocate(start_station_name, .before = trip_route) |>
    select(-(started_at:end_lng)) |>
    mutate(is_weekend = factor(is_weekend, levels = c("yes", "no"), ordered = TRUE),
        rideable_type = factor(rideable_type, levels = c("docked_bike", "electric_bike",
            "classic_bike"), ordered = TRUE), member_casual = factor(member_casual,
            levels = c("member", "casual"), ordered = TRUE), date_season = factor(date_season,
            levels = c("winter", "spring", "summer", "fall"), ordered = TRUE), date_hour = factor(date_hour,
            levels = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
                17, 18, 19, 20, 21, 22, 23), ordered = TRUE), day_time = factor(day_time,
            levels = c("dawn", "morning", "afternoon", "night"), ordered = TRUE)) |>
    filter(trip_duration > 2)  # Only taking in account trips Higher than 2 minutes

Checking the data

glimpse(bikeshare_data)

Rows: 7,848,936
Columns: 21
$ ride_id            <chr> "762198876D69004D", "BEC9C9FBA0D4CF1B", "D2FD8EA432…
$ rideable_type      <ord> docked_bike, docked_bike, docked_bike, docked_bike,…
$ started_at         <dttm> 2020-07-09 15:22:02, 2020-07-24 23:56:30, 2020-07-…
$ ended_at           <dttm> 2020-07-09 15:25:52, 2020-07-25 00:20:17, 2020-07-…
$ trip_route         <chr> "Ritchie Ct & Banks St to Wells St & Evergreen Ave"…
$ start_station_name <chr> "Ritchie Ct & Banks St", "Halsted St & Roscoe St", …
$ start_station_id   <chr> "180", "299", "329", "181", "268", "635", "113", "2…
$ end_station_name   <chr> "Wells St & Evergreen Ave", "Broadway & Ridge Ave",…
$ end_station_id     <chr> "291", "461", "156", "94", "301", "289", "140", "31…
$ start_lat          <dbl> 41.90687, 41.94367, 41.93259, 41.89076, 41.91172, 4…
$ start_lng          <dbl> -87.62622, -87.64895, -87.63643, -87.63170, -87.626…
$ end_lat            <dbl> 41.90672, 41.98404, 41.93650, 41.91831, 41.90799, 4…
$ end_lng            <dbl> -87.63483, -87.66027, -87.64754, -87.63628, -87.631…
$ member_casual      <ord> member, member, casual, casual, member, casual, mem…
$ trip_duration      <dbl> 3.833333, 23.783333, 7.250000, 20.933333, 5.133333,…
$ weekday_day        <ord> qui, sex, qua, sex, sáb, ter, qui, seg, qui, seg, s…
$ is_weekend         <ord> no, no, no, no, yes, no, no, no, no, no, no, no, no…
$ date_month         <ord> jul, jul, jul, jul, jul, jul, jul, jul, jul, jul, j…
$ date_hour          <ord> 15, 23, 19, 19, 10, 16, 11, 16, 11, 18, 15, 18, 9, …
$ date_season        <ord> summer, summer, summer, summer, summer, summer, sum…
$ day_time           <ord> afternoon, night, night, night, morning, afternoon,…

Analyse Phase

First, we analyze the data broadly to see patterns, then group it by user type to see differences.

# Using Hmisc package

bikeshare_summary <- bikeshare_data |>
    select(-c(ride_id, start_station_name, trip_route)) |>
    describe(descript = "Statistical Description Summary", tabular = TRUE)

html(bikeshare_summary, exclude1 = TRUE, align = "c", scroll = TRUE, rows = 50)

Statistical Description Summary

9 Variables 7848936 Observations

rideable_type

n	missing	distinct
7848936	0	3

 Value        docked_bike electric_bike  classic_bike
 Frequency        2170489       2440349       3238098
 Proportion         0.277         0.311         0.413

member_casual

n	missing	distinct
7848936	0	2

 Value       member  casual
 Frequency  4344181 3504755
 Proportion   0.553   0.447

trip_duration

n	missing	distinct	Info	Mean	Gmd	.05	.10	.25	.50	.75	.90	.95
7848936	0	31695	1	24.29	27.51	3.600	4.650	7.433	13.017	23.617	40.867	62.300

lowest :	2.016667	2.033333	2.050000	2.066667	2.083333
highest:	52701.383333	53921.600000	54283.350000	55691.683333	55944.150000

weekday_day

n	missing	distinct
7848936	0	7

lowest : dom seg ter qua qui , highest: ter qua qui sex sáb

 Value          dom     seg     ter     qua     qui     sex     sáb
 Frequency  1192589  968361 1010452 1062400 1049751 1157123 1408260
 Proportion   0.152   0.123   0.129   0.135   0.134   0.147   0.179

is_weekend

n	missing	distinct
7848936	0	2

 Value          yes      no
 Frequency  2600849 5248087
 Proportion   0.331   0.669

date_month

n	missing	distinct
7848936	0	12

lowest : jan fev mar abr mai , highest: ago set out nov dez

 Value          jan     fev     mar     abr     mai     jun     jul     ago     set
 Frequency    93893   48012  222701  328578  516940  709441 1340055 1387323 1252437
 Proportion   0.012   0.006   0.028   0.042   0.066   0.090   0.171   0.177   0.160
                                   
 Value          out     nov     dez
 Frequency   986096  598027  365433
 Proportion   0.126   0.076   0.047

date_hour

n	missing	distinct
7848936	0	24

lowest : 0 1 2 3 4 , highest: 19 20 21 22 23

date_season

n	missing	distinct
7848936	0	4

 Value       winter  spring  summer    fall
 Frequency   364606 1554959 3979815 1949556
 Proportion   0.046   0.198   0.507   0.248

day_time

n	missing	distinct
7848936	0	4

 Value           dawn   morning afternoon     night
 Frequency     335805   1791759   3505742   2215630
 Proportion     0.043     0.228     0.447     0.282

size <- nrow(bikeshare_data)

bikeshare_data |>
  group_by(date_hour) |>
  summarise(percent = round((n() / size) * 100, 2)) |>
  arrange(desc(percent)) |>
  kable(caption = "Total percent usage by the hour of the day") |> 
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed", "responsive"),
    full_width = FALSE,
    
    position = "center",
    fixed_thead = TRUE
  )|>
  scroll_box(width = "100%", height = "500px")

Total percent usage by the hour of the day
date_hour	percent
17	10.03
18	8.80
16	8.39
15	7.13
19	6.50
14	6.48
13	6.39
12	6.25
11	5.32
20	4.52
10	4.23
8	4.14
9	3.71
7	3.49
21	3.41
22	2.87
23	2.12
6	1.94
0	1.41
1	0.96
5	0.72
2	0.58
3	0.32
4	0.29

Analyzing the data generated by the “describe” function we can infer that:
- Regarding the type of bikes, “classic_bike” is more than 41% of all trips, followed by “eletric_bike” with 31% and “docked_bike” with 28%;
- Regarding to the type of user, “member” represents 55% while “casual” represents 45%;
- Regarding to the day, “weekend” represents 33% of the races with a peak on Saturday and a minimum on Monday;
- Regarding the time of day, it can be observed that the peak occurs at 17, 18 and 16 hours. The races decrease from afternoon, night, morning, until dawn.
- Regarding to the month and season, the values decrease from summer,fall, spring to winter. With the busiest months being August, July, September and Octuber and the least busy months being February, January and March;
- Regarding to the duration of the trip, the average duration is 24.26 minutes.

bikeshare_skim_member <- bikeshare_data |>
    select(-c(ride_id, start_station_name, trip_route)) |>
    group_by(member_casual) |>
    skimr::skim() |>
    as_tibble()


bikeshare_skim_member |>
    skimr::yank("numeric") |>
    as_tibble() |>
    kable(caption = "Trip duration difference between Casual users and members") |>
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
        full_width = FALSE, position = "center", fixed_thead = TRUE) |>
    scroll_box(width = "100%", height = "200px")

Trip duration difference between Casual users and members
skim_variable	member_casual	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
trip_duration	member	0	1	14.71053	45.9871	2.016667	6.183333	10.43333	17.91667	33421.37	▇▁▁▁▁
trip_duration	casual	0	1	36.15625	306.0679	2.016667	10.000000	17.55000	32.28333	55944.15	▇▁▁▁▁

bikeshare_skim_member |>
    skimr::yank("factor") |>
    as_tibble() |>
    kable(caption = "Behavior difference of Members and Casual users") |>
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
        full_width = FALSE, position = "center", fixed_thead = TRUE) |>
    scroll_box(width = "100%", height = "500px")

Behavior difference of Members and Casual users
skim_variable	member_casual	complete_rate	ordered	n_unique	top_counts
rideable_type	member	1	TRUE	3	cla: 1982937, ele: 1318011, doc: 1043233
rideable_type	casual	1	TRUE	3	cla: 1255161, doc: 1127256, ele: 1122338
weekday_day	member	1	TRUE	7	qua: 672545, qui: 645504, sex: 641688, ter: 640662
weekday_day	casual	1	TRUE	7	sáb: 782770, dom: 655855, sex: 515435, qui: 404247
is_weekend	member	1	TRUE	2	no: 3181957, yes: 1162224
is_weekend	casual	1	TRUE	2	no: 2066130, yes: 1438625
date_month	member	1	TRUE	12	ago: 700245, set: 670754, jul: 642136, out: 593567
date_month	casual	1	TRUE	12	jul: 697919, ago: 687078, set: 581683, out: 392529
date_hour	member	1	TRUE	24	17: 456215, 18: 390430, 16: 368660, 15: 292211
date_hour	casual	1	TRUE	24	17: 331026, 18: 300233, 16: 289602, 15: 267719
date_season	member	1	TRUE	4	sum: 2013135, fal: 1269120, spr: 807456, win: 254470
date_season	casual	1	TRUE	4	sum: 1966680, spr: 747503, fal: 680436, win: 110136
day_time	member	1	TRUE	4	aft: 1894379, mor: 1167354, nig: 1143103, daw: 139345
day_time	casual	1	TRUE	4	aft: 1611363, nig: 1072527, mor: 624405, daw: 196460

Regarding the difference in usage between members and casual users, we can observe the following:
- The trip duration is 2,45 times longer for casual users than members. Averaging 36.15 min for casual users and 14.71 min for members;
- Regarding the type of bicycle, the most used for members, in descending order, are “classic”, “eletric”, and “docked”. For casual users they are “classic”, “docked” and “eletric”;
- Regarding the time of year, both users follow the general average with a peak in summer and less use in winter, however for casual users spring is busier than fall;
- The busiest member months are August, September and July. For casual users, the busiest months are July, August and September;
- Regarding the day of the week, the busiest days for members, in descending order, are Wednesday, Thursday and Friday. For casual users, the busiest days are Saturday, Sunday and Friday. With greater usage of the service on weekends for casual members compared to members;
- Regarding the time of day both types of users have more runs in the afternoon, however in casual members the night is busier than in the morning.
The Stations and routes more often used are the following:

bikeshare_data |>
    group_by(start_station_name) |>
    summarise(number_of_trips = n()) |>
    arrange(-number_of_trips) |>
    drop_na(start_station_name) |>
    slice(1:10) |>
    kable(caption = "Top 10 most popular station") |>
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
        full_width = FALSE, position = "center", fixed_thead = TRUE)

Top 10 most popular station
start_station_name	number_of_trips
Streeter Dr & Grand Ave	108220
Clark St & Elm St	62730
Theater on the Lake	60991
Michigan Ave & Oak St	60550
Wells St & Concord Ln	60330
Millennium Park	59264
Wells St & Elm St	52592
Clark St & Armitage Ave	49025
Clark St & Lincoln Ave	48212
Lake Shore Dr & Monroe St	48087

bikeshare_data |>
    group_by(trip_route) |>
    summarise(number_of_trips = n()) |>
    arrange(-number_of_trips) |>
    drop_na(trip_route) |>
    slice(1:10) |>
    kable(caption = "Top 10 most popular route") |>
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
        full_width = FALSE, position = "center", fixed_thead = TRUE)

Top 10 most popular route
trip_route	number_of_trips
Streeter Dr & Grand Ave to Streeter Dr & Grand Ave	16190
Lake Shore Dr & Monroe St to Lake Shore Dr & Monroe St	9421
Millennium Park to Millennium Park	9291
Michigan Ave & Oak St to Michigan Ave & Oak St	8816
Buckingham Fountain to Buckingham Fountain	6818
Theater on the Lake to Theater on the Lake	6147
Indiana Ave & Roosevelt Rd to Indiana Ave & Roosevelt Rd	6136
Ellis Ave & 60th St to Ellis Ave & 55th St	5871
Fort Dearborn Dr & 31st St to Fort Dearborn Dr & 31st St	5839
Ellis Ave & 55th St to Ellis Ave & 60th St	5339

Share Phase

By the hour and the time of the day

bikeshare_data |>
    group_by(member_casual, date_hour) |>
    summarise(n_trip = n(), .groups = "drop") |>
    ggplot(aes(x = date_hour, y = n_trip, color = member_casual, group = member_casual)) +
    geom_line() + geom_point() + facet_grid(rows = vars(member_casual)) + ggthemes::scale_color_tableau(palette = "Superfishel Stone") +
    ggthemes::scale_fill_tableau(palette = "Superfishel Stone") + scale_y_continuous(labels = scales::unit_format(unit = "k",
    scale = 0.001)) + ggthemes::theme_hc() + labs(title = "Number of trips by the hour of the day",
    color = "Type of user", fill = "Type of user", x = "hour of the day", y = "Number of trips (in Thousands)")

bikeshare_data |>
    group_by(member_casual, date_hour) |>
    summarise(mean_trip = mean(trip_duration), .groups = "drop") |>
    ggplot(aes(x = date_hour, y = mean_trip, color = member_casual, group = member_casual)) +
    geom_line() + geom_point() + facet_grid(rows = vars(member_casual), scales = "free_y") +
    ggthemes::scale_color_tableau(palette = "Superfishel Stone") + ggthemes::scale_fill_tableau(palette = "Superfishel Stone") +
    ggthemes::theme_hc() + labs(title = "Number of trips by he hour of the day",
    color = "Type of user", fill = "Type of user", x = "hour of the day", y = "Trip duration (min)")

bikeshare_data |>
    group_by(member_casual, day_time) |>
    summarise(n_trip = n(), .groups = "drop") |>
    ggplot(mapping = aes(day_time, n_trip)) + geom_col(aes(color = member_casual,
    fill = member_casual), position = "dodge2") + ggthemes::scale_color_tableau(palette = "Superfishel Stone") +
    ggthemes::scale_fill_tableau(palette = "Superfishel Stone") + scale_y_continuous(limits = c(0,
    2e+06), n.breaks = 10, labels = scales::unit_format(unit = "MM", scale = 1e-06)) +
    ggthemes::theme_hc() + labs(title = "Number of trips by the time of the day",
    color = "Type of user", fill = "Type of user", x = "Time of the day", y = "Number of trips (in Millions)")

By Month and Season

bikeshare_data |>
    group_by(member_casual, date_month) |>
    summarise(n_trip = n(), .groups = "drop") |>
    ggplot(aes(x = date_month, y = n_trip, color = member_casual, group = member_casual)) +
    geom_line() + geom_point() + facet_grid(rows = vars(member_casual)) + ggthemes::scale_color_tableau(palette = "Superfishel Stone") +
    ggthemes::scale_fill_tableau(palette = "Superfishel Stone") + scale_y_continuous(labels = scales::unit_format(unit = "k",
    scale = 0.001)) + ggthemes::theme_hc() + labs(title = "Number of trips by the months and type of user",
    color = "Type of user", fill = "Type of user", x = "Month of the year", y = "Number of trips (in Thousands)")

bikeshare_data |>
    group_by(member_casual, date_month) |>
    summarise(mean_duration = mean(trip_duration), .groups = "drop") |>
    ggplot(aes(x = date_month, y = mean_duration, color = member_casual, group = member_casual)) +
    geom_line() + geom_point() + facet_grid(rows = vars(member_casual), scales = "free_y") +
    ggthemes::scale_color_tableau(palette = "Superfishel Stone") + ggthemes::scale_fill_tableau(palette = "Superfishel Stone") +
    ggthemes::theme_hc() + labs(title = "Trip duration by the Months and type of user",
    color = "Type of user", fill = "Type of user", x = "hour of the day", y = "Trip duration (min)")

bikeshare_data |>
    group_by(member_casual, date_season) |>
    summarise(n_trip = n(), .groups = "drop") |>
    ggplot(mapping = aes(date_season, n_trip)) + geom_col(aes(color = member_casual,
    fill = member_casual), position = "dodge2") + scale_y_continuous(n.breaks = 10,
    labels = scales::unit_format(unit = "MM", scale = 1e-06)) + ggthemes::scale_color_tableau(palette = "Superfishel Stone") +
    ggthemes::scale_fill_tableau(palette = "Superfishel Stone") + ggthemes::theme_hc() +
    labs(title = "Number of trips by seasons and type of user", color = "Type of user",
        fill = "Type of user", x = "season of the year", y = "Number of trips (in Millions)")

By type of the bike

bikeshare_data |>
    group_by(member_casual, rideable_type) |>
    summarise(n_trip = n(), .groups = "drop") |>
    ggplot(mapping = aes(rideable_type, n_trip)) + geom_col(aes(color = member_casual,
    fill = member_casual), position = "dodge2") + scale_y_continuous(n.breaks = 10,
    labels = scales::unit_format(unit = "MM", scale = 1e-06)) + ggthemes::scale_color_tableau(palette = "Superfishel Stone") +
    ggthemes::scale_fill_tableau(palette = "Superfishel Stone") + ggthemes::theme_hc() +
    labs(title = "Number of trips by the type of the bike and user", color = "Type of user",
        fill = "Type of user", x = "type of the bike", y = "Number of trips (in Millions)")

bikeshare_data |>
    group_by(member_casual, rideable_type, date_hour) |>
    summarise(n_trip = n(), .groups = "drop") |>
    ggplot(aes(x = date_hour, y = n_trip, color = rideable_type, group = interaction(member_casual,
        rideable_type))) + geom_line() + geom_point() + facet_grid(rows = vars(member_casual),
    scales = "free_y") + ggthemes::scale_color_tableau(palette = "Superfishel Stone") +
    ggthemes::scale_fill_tableau(palette = "Superfishel Stone") + scale_y_continuous(labels = scales::unit_format(unit = "k",
    scale = 0.001)) + ggthemes::theme_hc() + labs(title = "Number of trips by the hour of the day, type of bike and type of user",
    color = "Type of bike", fill = "Type of bike", x = "hour of the day", y = "Number of trips (in Thousands)")

bikeshare_data |>
    group_by(member_casual, rideable_type, date_hour) |>
    summarise(mean_duration = mean(trip_duration), .groups = "drop") |>
    ggplot(aes(x = date_hour, y = mean_duration, color = rideable_type, group = interaction(member_casual,
        rideable_type))) + geom_line() + geom_point() + facet_grid(rows = vars(member_casual),
    scales = "free_y") + ggthemes::scale_color_tableau(palette = "Superfishel Stone") +
    ggthemes::scale_fill_tableau(palette = "Superfishel Stone") + ggthemes::theme_hc() +
    labs(title = "Duration of the trips by the type of the bike", color = "Type of bike",
        fill = "Type of bike", x = "hour of the day", y = "Trip duration (min)")

bikeshare_data |>
    group_by(member_casual, rideable_type, date_month) |>
    summarise(n_trip = n(), .groups = "drop") |>
    ggplot(aes(x = date_month, y = n_trip, color = rideable_type, group = interaction(member_casual,
        rideable_type))) + geom_line() + geom_point() + facet_grid(rows = vars(member_casual),
    scales = "free_y") + ggthemes::scale_color_tableau(palette = "Superfishel Stone") +
    ggthemes::scale_fill_tableau(palette = "Superfishel Stone") + scale_y_continuous(labels = scales::unit_format(unit = "k",
    scale = 0.001)) + ggthemes::theme_hc() + labs(title = "Number of trips by the months",
    color = "Type of bike", fill = "Type of bike", x = "Month of the year", y = "No. of trips (in Thousands)")

Stations and the Routs more offen used

bikeshare_data |>
    group_by(start_station_name) |>
    summarise(n_trip = n()) |>
    arrange(-n_trip) |>
    drop_na(start_station_name) |>
    slice(1:10) |>
    ggplot(mapping = aes(fct_reorder(start_station_name, -n_trip), n_trip)) + geom_col(aes(color = n_trip,
    fill = n_trip), position = "dodge2") + coord_flip() + ggthemes::scale_color_continuous_tableau(palette = "Blue") +
    ggthemes::scale_fill_continuous_tableau(palette = "Blue") + ggthemes::theme_hc() +

theme(legend.position = "none") + labs(title = "10 most used start point", x = "",
    y = "")

bikeshare_data |>
    group_by(trip_route) |>
    summarise(n_trip = n()) |>
    arrange(-n_trip) |>
    drop_na(trip_route) |>
    slice(1:10) |>
    ggplot(mapping = aes(fct_reorder(trip_route, -n_trip), n_trip)) + geom_col(aes(color = n_trip,
    fill = n_trip), position = "dodge2") + coord_flip() + ggthemes::scale_color_continuous_tableau(palette = "Blue") +
    ggthemes::scale_fill_continuous_tableau(palette = "Blue") + ggthemes::theme_hc() +
    theme(legend.position = "none") + labs(title = "Top 10 Routes", x = "", y = "")

Act

Key findings

Different of members, casual user use service more often during the weekend;
Also have the mean duration of the trips 2,45 times longer than members;
They have the highest trip duration during dawn (from 12 am to 5 am). With the pic at 2 am;
Casual users use the service more from July to October;
Although February is the month with the least trips, it is also the month with the longest duration of trips;
Casual members use more docked bikes thans members and also have a longer average travel time for this type of bike.

Recommendations

Create a subscription based on time-of-day to encourage casual users that ride, for exemple, from 9 pm to 5 am to subscribe;
Implement discounts or a points system based on loyalty (frequency of use) and high trip-duration users;
Create seasonal subscriptions such as summer and spring. Or implement discounts on temporary subscriptions (3, 6, 9, 12 months);
Create subscription especific to ride on week day or on the weekend;
Create subscriptions for specific types of bicycles. Plans to use only docked bicycles, for example.