Google Data Analytics Capstone Project
How Does a Bike-Share Navigate Speedy?

Scenario

You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your ecommendations, so they must be backed up with compelling data insights and professional data visualizations.

Ask Phase

Guiding questions

  1. What is the problem you are trying to solve?

    • How do annual members and casual riders use Cyclistic bikes differently?

    • Why would casual riders buy Cyclistic annual memberships?

    • How can Cyclistic use digital media to influence casual riders to become members?

  2. How can your insights drive business decisions?

    • improve the marketing campaign
  3. Identify the business task

    • Undertand the diferente between casual users and members to improve the marketing campaign
  4. Consider key stakeholders

    • Main stakeholders:

      • Cyclistic executive team

      • Lily Moreno

    • Secundary stakeholder:

      • Cyclistic marketing analytics team leader

Prepare

  1. Where is your data located?

  2. How is the data organized?

    • the data base is organized in 12 files with month data from july 2020 to june 2021.
  3. Are there issues with bias or credibility in this data?

    • Reliable -Yes, the data is reliable. The data is a primary source data based on a fictional company.

    • Original - Yes, the original public data can be located.

    • Comprehensive - Yes, no vital information is missing.

    • Current - Yes, the data base is updated monyhly.

  4. How are you addressing licensing, privacy, security, and accessibility?

    • the data is distributed in this license.
  5. How did you verify the data’s integrity?

    • Using R (ver. 4.1) and Rstudio (ver. 1.4)
  6. How does it help you answer your question?

    • R is a powerful tool that makes it easy to manipulate large databases.
  7. Are there any problems with the data?

    • Yes, Some missing values, but it did not interfere with the analysis.

Process Phases

Ingesting the data

  • Ingesting the data using the vroom library and loading into the bikeshare_data.
library(tidyverse)  # used to filter the data
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
✔ ggplot2 3.3.6     ✔ purrr   0.3.4
✔ tibble  3.1.7     ✔ dplyr   1.0.9
✔ tidyr   1.2.0     ✔ stringr 1.4.0
✔ readr   2.1.2     ✔ forcats 0.5.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
library(lubridate)  #used to work with date class. 

Attaching package: 'lubridate'
The following objects are masked from 'package:base':

    date, intersect, setdiff, union
library(Hmisc)
Carregando pacotes exigidos: lattice
Carregando pacotes exigidos: survival
Carregando pacotes exigidos: Formula

Attaching package: 'Hmisc'
The following objects are masked from 'package:dplyr':

    src, summarize
The following objects are masked from 'package:base':

    format.pval, units
library(kableExtra)

Attaching package: 'kableExtra'
The following object is masked from 'package:dplyr':

    group_rows
# loding the files name and
files <- fs::dir_ls(path = "../database/")
files
../database/202007-divvy-tripdata.csv ../database/202008-divvy-tripdata.csv 
../database/202009-divvy-tripdata.csv ../database/202010-divvy-tripdata.csv 
../database/202011-divvy-tripdata.csv ../database/202012-divvy-tripdata.csv 
../database/202101-divvy-tripdata.csv ../database/202102-divvy-tripdata.csv 
../database/202103-divvy-tripdata.csv ../database/202104-divvy-tripdata.csv 
../database/202105-divvy-tripdata.csv ../database/202106-divvy-tripdata.csv 
../database/202107-divvy-tripdata.csv ../database/202108-divvy-tripdata.csv 
../database/202109-divvy-tripdata.csv ../database/202110-divvy-tripdata.csv 
../database/202111-divvy-tripdata.csv ../database/202112-divvy-tripdata.csv 
bikeshare_data <- vroom::vroom(files, col_names = TRUE)
Rows: 8081804 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
dbl  (4): start_lat, start_lng, end_lat, end_lng
dttm (2): started_at, ended_at

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(bikeshare_data)
Rows: 8,081,804
Columns: 13
$ ride_id            <chr> "762198876D69004D", "BEC9C9FBA0D4CF1B", "D2FD8EA432…
$ rideable_type      <chr> "docked_bike", "docked_bike", "docked_bike", "docke…
$ started_at         <dttm> 2020-07-09 15:22:02, 2020-07-24 23:56:30, 2020-07-…
$ ended_at           <dttm> 2020-07-09 15:25:52, 2020-07-25 00:20:17, 2020-07-…
$ start_station_name <chr> "Ritchie Ct & Banks St", "Halsted St & Roscoe St", …
$ start_station_id   <chr> "180", "299", "329", "181", "268", "635", "113", "2…
$ end_station_name   <chr> "Wells St & Evergreen Ave", "Broadway & Ridge Ave",…
$ end_station_id     <chr> "291", "461", "156", "94", "301", "289", "140", "31…
$ start_lat          <dbl> 41.90687, 41.94367, 41.93259, 41.89076, 41.91172, 4…
$ start_lng          <dbl> -87.62622, -87.64895, -87.63643, -87.63170, -87.626…
$ end_lat            <dbl> 41.90672, 41.98404, 41.93650, 41.91831, 41.90799, 4…
$ end_lng            <dbl> -87.63483, -87.66027, -87.64754, -87.63628, -87.631…
$ member_casual      <chr> "member", "member", "casual", "casual", "member", "…

Verifing missing values

bikeshare_data |>
    is.na() |>
    colSums()
           ride_id      rideable_type         started_at           ended_at 
                 0                  0                  0                  0 
start_station_name   start_station_id   end_station_name     end_station_id 
            785465             786088             849162             849623 
         start_lat          start_lng            end_lat            end_lng 
                 0                  0               8137               8137 
     member_casual 
                 0 

The missing data are grouped at the location variable (station name, latitude and longitude)

filtering the data

  • Filtering and Process the data using the tools in the tidyverse.

    • In this fase we created the following variables:

      • trip_duration - the trip duration in minutes;

      • weekday_day - The day of the week the trip takes place;

      • is_weekend - Test if the day is a weekend;

      • date_month - Stores the month the trip takes place;

      • date_hour - Stores the hour the trip takes place;

      • date_season - Stores the season of the year;

      • day_time - Stores the time of the day;

      • trip_route - Stores the route of the trip (start station to end station).

    • Then we keep the following variable:

      • start_station_name;

      • ride_id;

      • rideable_type;

      • and member_casual.

    • the we exclude the remaning original variables.

    • then we change the class of the categorical variables to factor.

    • And finally, we filter the data to contain only trip duration longer than 0 minutes.

I chose not to exclude missing values due to being limited to location variables (station names and geographic markers), as well as excluding trips shorter than two minutes to minimize data collection errors.

# Filterring data.
bikeshare_data <- bikeshare_data |>
    mutate(trip_duration = as.numeric(ended_at - started_at)/60, weekday_day = wday(started_at,
        label = TRUE), is_weekend = ifelse((wday(started_at) == 7 | wday(started_at) ==
        1), "yes", "no"), date_month = month(started_at, label = TRUE), date_hour = hour(started_at),
        date_season = case_when(month(started_at) == 1 | month(started_at) == 2 |
            month(started_at) == 3 ~ "winter", month(started_at) == 4 | month(started_at) ==
            5 | month(started_at) == 6 ~ "spring", month(started_at) == 7 | month(started_at) ==
            8 | month(started_at) == 9 ~ "summer", month(started_at) == 10 | month(started_at) ==
            11 | month(started_at) == 12 ~ "fall"), day_time = case_when(hour(started_at) <
            6 ~ "dawn", hour(started_at) >= 6 & hour(started_at) < 12 ~ "morning",
            hour(started_at) >= 12 & hour(started_at) < 18 ~ "afternoon", hour(started_at) >=
                18 ~ "night"), trip_route = str_c(start_station_name, end_station_name,
            sep = " to ")) |>
    relocate(start_station_name, .before = trip_route) |>
    select(-(started_at:end_lng)) |>
    mutate(is_weekend = factor(is_weekend, levels = c("yes", "no"), ordered = TRUE),
        rideable_type = factor(rideable_type, levels = c("docked_bike", "electric_bike",
            "classic_bike"), ordered = TRUE), member_casual = factor(member_casual,
            levels = c("member", "casual"), ordered = TRUE), date_season = factor(date_season,
            levels = c("winter", "spring", "summer", "fall"), ordered = TRUE), date_hour = factor(date_hour,
            levels = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
                17, 18, 19, 20, 21, 22, 23), ordered = TRUE), day_time = factor(day_time,
            levels = c("dawn", "morning", "afternoon", "night"), ordered = TRUE)) |>
    filter(trip_duration > 2)  # Only taking in account trips Higher than 2 minutes
  • Checking the data
glimpse(bikeshare_data)
Rows: 7,848,936
Columns: 21
$ ride_id            <chr> "762198876D69004D", "BEC9C9FBA0D4CF1B", "D2FD8EA432…
$ rideable_type      <ord> docked_bike, docked_bike, docked_bike, docked_bike,…
$ started_at         <dttm> 2020-07-09 15:22:02, 2020-07-24 23:56:30, 2020-07-…
$ ended_at           <dttm> 2020-07-09 15:25:52, 2020-07-25 00:20:17, 2020-07-…
$ trip_route         <chr> "Ritchie Ct & Banks St to Wells St & Evergreen Ave"…
$ start_station_name <chr> "Ritchie Ct & Banks St", "Halsted St & Roscoe St", …
$ start_station_id   <chr> "180", "299", "329", "181", "268", "635", "113", "2…
$ end_station_name   <chr> "Wells St & Evergreen Ave", "Broadway & Ridge Ave",…
$ end_station_id     <chr> "291", "461", "156", "94", "301", "289", "140", "31…
$ start_lat          <dbl> 41.90687, 41.94367, 41.93259, 41.89076, 41.91172, 4…
$ start_lng          <dbl> -87.62622, -87.64895, -87.63643, -87.63170, -87.626…
$ end_lat            <dbl> 41.90672, 41.98404, 41.93650, 41.91831, 41.90799, 4…
$ end_lng            <dbl> -87.63483, -87.66027, -87.64754, -87.63628, -87.631…
$ member_casual      <ord> member, member, casual, casual, member, casual, mem…
$ trip_duration      <dbl> 3.833333, 23.783333, 7.250000, 20.933333, 5.133333,…
$ weekday_day        <ord> qui, sex, qua, sex, sáb, ter, qui, seg, qui, seg, s…
$ is_weekend         <ord> no, no, no, no, yes, no, no, no, no, no, no, no, no…
$ date_month         <ord> jul, jul, jul, jul, jul, jul, jul, jul, jul, jul, j…
$ date_hour          <ord> 15, 23, 19, 19, 10, 16, 11, 16, 11, 18, 15, 18, 9, …
$ date_season        <ord> summer, summer, summer, summer, summer, summer, sum…
$ day_time           <ord> afternoon, night, night, night, morning, afternoon,…

Analyse Phase

  • First, we analyze the data broadly to see patterns, then group it by user type to see differences.
# Using Hmisc package

bikeshare_summary <- bikeshare_data |>
    select(-c(ride_id, start_station_name, trip_route)) |>
    describe(descript = "Statistical Description Summary", tabular = TRUE)

html(bikeshare_summary, exclude1 = TRUE, align = "c", scroll = TRUE, rows = 50)
Statistical Description Summary

9 Variables   7848936 Observations

rideable_type
image
nmissingdistinct
784893603
 Value        docked_bike electric_bike  classic_bike
 Frequency        2170489       2440349       3238098
 Proportion         0.277         0.311         0.413
 

member_casual
nmissingdistinct
784893602
 Value       member  casual
 Frequency  4344181 3504755
 Proportion   0.553   0.447
 

trip_duration
image
nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
7848936031695124.2927.51 3.600 4.650 7.43313.01723.61740.86762.300
lowest : 2.016667 2.033333 2.050000 2.066667 2.083333
highest:52701.38333353921.60000054283.35000055691.68333355944.150000

weekday_day
image
nmissingdistinct
784893607
lowest : dom seg ter qua qui , highest: ter qua qui sex sáb
 Value          dom     seg     ter     qua     qui     sex     sáb
 Frequency  1192589  968361 1010452 1062400 1049751 1157123 1408260
 Proportion   0.152   0.123   0.129   0.135   0.134   0.147   0.179
 

is_weekend
nmissingdistinct
784893602
 Value          yes      no
 Frequency  2600849 5248087
 Proportion   0.331   0.669
 

date_month
image
nmissingdistinct
7848936012
lowest : jan fev mar abr mai , highest: ago set out nov dez
 Value          jan     fev     mar     abr     mai     jun     jul     ago     set
 Frequency    93893   48012  222701  328578  516940  709441 1340055 1387323 1252437
 Proportion   0.012   0.006   0.028   0.042   0.066   0.090   0.171   0.177   0.160
                                   
 Value          out     nov     dez
 Frequency   986096  598027  365433
 Proportion   0.126   0.076   0.047
 

date_hour
image
nmissingdistinct
7848936024
lowest : 0 1 2 3 4 , highest: 19 20 21 22 23
date_season
image
nmissingdistinct
784893604
 Value       winter  spring  summer    fall
 Frequency   364606 1554959 3979815 1949556
 Proportion   0.046   0.198   0.507   0.248
 

day_time
image
nmissingdistinct
784893604
 Value           dawn   morning afternoon     night
 Frequency     335805   1791759   3505742   2215630
 Proportion     0.043     0.228     0.447     0.282
 

size <- nrow(bikeshare_data)

bikeshare_data |>
  group_by(date_hour) |>
  summarise(percent = round((n() / size) * 100, 2)) |>
  arrange(desc(percent)) |>
  kable(caption = "Total percent usage by the hour of the day") |> 
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed", "responsive"),
    full_width = FALSE,
    
    position = "center",
    fixed_thead = TRUE
  )|>
  scroll_box(width = "100%", height = "500px")
Total percent usage by the hour of the day
date_hour percent
17 10.03
18 8.80
16 8.39
15 7.13
19 6.50
14 6.48
13 6.39
12 6.25
11 5.32
20 4.52
10 4.23
8 4.14
9 3.71
7 3.49
21 3.41
22 2.87
23 2.12
6 1.94
0 1.41
1 0.96
5 0.72
2 0.58
3 0.32
4 0.29
  • Analyzing the data generated by the “describe” function we can infer that:

    • Regarding the type of bikes, “classic_bike” is more than 41% of all trips, followed by “eletric_bike” with 31% and “docked_bike” with 28%;

    • Regarding to the type of user, “member” represents 55% while “casual” represents 45%;

    • Regarding to the day, “weekend” represents 33% of the races with a peak on Saturday and a minimum on Monday;

    • Regarding the time of day, it can be observed that the peak occurs at 17, 18 and 16 hours. The races decrease from afternoon, night, morning, until dawn.

    • Regarding to the month and season, the values decrease from summer,fall, spring to winter. With the busiest months being August, July, September and Octuber and the least busy months being February, January and March;

    • Regarding to the duration of the trip, the average duration is 24.26 minutes.

bikeshare_skim_member <- bikeshare_data |>
    select(-c(ride_id, start_station_name, trip_route)) |>
    group_by(member_casual) |>
    skimr::skim() |>
    as_tibble()


bikeshare_skim_member |>
    skimr::yank("numeric") |>
    as_tibble() |>
    kable(caption = "Trip duration difference between Casual users and members") |>
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
        full_width = FALSE, position = "center", fixed_thead = TRUE) |>
    scroll_box(width = "100%", height = "200px")
Trip duration difference between Casual users and members
skim_variable member_casual n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
trip_duration member 0 1 14.71053 45.9871 2.016667 6.183333 10.43333 17.91667 33421.37 ▇▁▁▁▁
trip_duration casual 0 1 36.15625 306.0679 2.016667 10.000000 17.55000 32.28333 55944.15 ▇▁▁▁▁
bikeshare_skim_member |>
    skimr::yank("factor") |>
    as_tibble() |>
    kable(caption = "Behavior difference of Members and Casual users") |>
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
        full_width = FALSE, position = "center", fixed_thead = TRUE) |>
    scroll_box(width = "100%", height = "500px")
Behavior difference of Members and Casual users
skim_variable member_casual n_missing complete_rate ordered n_unique top_counts
rideable_type member 0 1 TRUE 3 cla: 1982937, ele: 1318011, doc: 1043233
rideable_type casual 0 1 TRUE 3 cla: 1255161, doc: 1127256, ele: 1122338
weekday_day member 0 1 TRUE 7 qua: 672545, qui: 645504, sex: 641688, ter: 640662
weekday_day casual 0 1 TRUE 7 sáb: 782770, dom: 655855, sex: 515435, qui: 404247
is_weekend member 0 1 TRUE 2 no: 3181957, yes: 1162224
is_weekend casual 0 1 TRUE 2 no: 2066130, yes: 1438625
date_month member 0 1 TRUE 12 ago: 700245, set: 670754, jul: 642136, out: 593567
date_month casual 0 1 TRUE 12 jul: 697919, ago: 687078, set: 581683, out: 392529
date_hour member 0 1 TRUE 24 17: 456215, 18: 390430, 16: 368660, 15: 292211
date_hour casual 0 1 TRUE 24 17: 331026, 18: 300233, 16: 289602, 15: 267719
date_season member 0 1 TRUE 4 sum: 2013135, fal: 1269120, spr: 807456, win: 254470
date_season casual 0 1 TRUE 4 sum: 1966680, spr: 747503, fal: 680436, win: 110136
day_time member 0 1 TRUE 4 aft: 1894379, mor: 1167354, nig: 1143103, daw: 139345
day_time casual 0 1 TRUE 4 aft: 1611363, nig: 1072527, mor: 624405, daw: 196460
  • Regarding the difference in usage between members and casual users, we can observe the following:

    • The trip duration is 2,45 times longer for casual users than members. Averaging 36.15 min for casual users and 14.71 min for members;

    • Regarding the type of bicycle, the most used for members, in descending order, are “classic”, “eletric”, and “docked”. For casual users they are “classic”, “docked” and “eletric”;

    • Regarding the time of year, both users follow the general average with a peak in summer and less use in winter, however for casual users spring is busier than fall;

    • The busiest member months are August, September and July. For casual users, the busiest months are July, August and September;

    • Regarding the day of the week, the busiest days for members, in descending order, are Wednesday, Thursday and Friday. For casual users, the busiest days are Saturday, Sunday and Friday. With greater usage of the service on weekends for casual members compared to members;

    • Regarding the time of day both types of users have more runs in the afternoon, however in casual members the night is busier than in the morning.

  • The Stations and routes more often used are the following:

bikeshare_data |>
    group_by(start_station_name) |>
    summarise(number_of_trips = n()) |>
    arrange(-number_of_trips) |>
    drop_na(start_station_name) |>
    slice(1:10) |>
    kable(caption = "Top 10 most popular station") |>
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
        full_width = FALSE, position = "center", fixed_thead = TRUE)
Top 10 most popular station
start_station_name number_of_trips
Streeter Dr & Grand Ave 108220
Clark St & Elm St 62730
Theater on the Lake 60991
Michigan Ave & Oak St 60550
Wells St & Concord Ln 60330
Millennium Park 59264
Wells St & Elm St 52592
Clark St & Armitage Ave 49025
Clark St & Lincoln Ave 48212
Lake Shore Dr & Monroe St 48087
bikeshare_data |>
    group_by(trip_route) |>
    summarise(number_of_trips = n()) |>
    arrange(-number_of_trips) |>
    drop_na(trip_route) |>
    slice(1:10) |>
    kable(caption = "Top 10 most popular route") |>
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
        full_width = FALSE, position = "center", fixed_thead = TRUE)
Top 10 most popular route
trip_route number_of_trips
Streeter Dr & Grand Ave to Streeter Dr & Grand Ave 16190
Lake Shore Dr & Monroe St to Lake Shore Dr & Monroe St 9421
Millennium Park to Millennium Park 9291
Michigan Ave & Oak St to Michigan Ave & Oak St 8816
Buckingham Fountain to Buckingham Fountain 6818
Theater on the Lake to Theater on the Lake 6147
Indiana Ave & Roosevelt Rd to Indiana Ave & Roosevelt Rd 6136
Ellis Ave & 60th St to Ellis Ave & 55th St 5871
Fort Dearborn Dr & 31st St to Fort Dearborn Dr & 31st St 5839
Ellis Ave & 55th St to Ellis Ave & 60th St 5339

Share Phase

By the hour and the time of the day

bikeshare_data |>
    group_by(member_casual, date_hour) |>
    summarise(n_trip = n(), .groups = "drop") |>
    ggplot(aes(x = date_hour, y = n_trip, color = member_casual, group = member_casual)) +
    geom_line() + geom_point() + facet_grid(rows = vars(member_casual)) + ggthemes::scale_color_tableau(palette = "Superfishel Stone") +
    ggthemes::scale_fill_tableau(palette = "Superfishel Stone") + scale_y_continuous(labels = scales::unit_format(unit = "k",
    scale = 0.001)) + ggthemes::theme_hc() + labs(title = "Number of trips by the hour of the day",
    color = "Type of user", fill = "Type of user", x = "hour of the day", y = "Number of trips (in Thousands)")

bikeshare_data |>
    group_by(member_casual, date_hour) |>
    summarise(mean_trip = mean(trip_duration), .groups = "drop") |>
    ggplot(aes(x = date_hour, y = mean_trip, color = member_casual, group = member_casual)) +
    geom_line() + geom_point() + facet_grid(rows = vars(member_casual), scales = "free_y") +
    ggthemes::scale_color_tableau(palette = "Superfishel Stone") + ggthemes::scale_fill_tableau(palette = "Superfishel Stone") +
    ggthemes::theme_hc() + labs(title = "Number of trips by he hour of the day",
    color = "Type of user", fill = "Type of user", x = "hour of the day", y = "Trip duration (min)")

bikeshare_data |>
    group_by(member_casual, day_time) |>
    summarise(n_trip = n(), .groups = "drop") |>
    ggplot(mapping = aes(day_time, n_trip)) + geom_col(aes(color = member_casual,
    fill = member_casual), position = "dodge2") + ggthemes::scale_color_tableau(palette = "Superfishel Stone") +
    ggthemes::scale_fill_tableau(palette = "Superfishel Stone") + scale_y_continuous(limits = c(0,
    2e+06), n.breaks = 10, labels = scales::unit_format(unit = "MM", scale = 1e-06)) +
    ggthemes::theme_hc() + labs(title = "Number of trips by the time of the day",
    color = "Type of user", fill = "Type of user", x = "Time of the day", y = "Number of trips (in Millions)")

By Month and Season

bikeshare_data |>
    group_by(member_casual, date_month) |>
    summarise(n_trip = n(), .groups = "drop") |>
    ggplot(aes(x = date_month, y = n_trip, color = member_casual, group = member_casual)) +
    geom_line() + geom_point() + facet_grid(rows = vars(member_casual)) + ggthemes::scale_color_tableau(palette = "Superfishel Stone") +
    ggthemes::scale_fill_tableau(palette = "Superfishel Stone") + scale_y_continuous(labels = scales::unit_format(unit = "k",
    scale = 0.001)) + ggthemes::theme_hc() + labs(title = "Number of trips by the months and type of user",
    color = "Type of user", fill = "Type of user", x = "Month of the year", y = "Number of trips (in Thousands)")

bikeshare_data |>
    group_by(member_casual, date_month) |>
    summarise(mean_duration = mean(trip_duration), .groups = "drop") |>
    ggplot(aes(x = date_month, y = mean_duration, color = member_casual, group = member_casual)) +
    geom_line() + geom_point() + facet_grid(rows = vars(member_casual), scales = "free_y") +
    ggthemes::scale_color_tableau(palette = "Superfishel Stone") + ggthemes::scale_fill_tableau(palette = "Superfishel Stone") +
    ggthemes::theme_hc() + labs(title = "Trip duration by the Months and type of user",
    color = "Type of user", fill = "Type of user", x = "hour of the day", y = "Trip duration (min)")

bikeshare_data |>
    group_by(member_casual, date_season) |>
    summarise(n_trip = n(), .groups = "drop") |>
    ggplot(mapping = aes(date_season, n_trip)) + geom_col(aes(color = member_casual,
    fill = member_casual), position = "dodge2") + scale_y_continuous(n.breaks = 10,
    labels = scales::unit_format(unit = "MM", scale = 1e-06)) + ggthemes::scale_color_tableau(palette = "Superfishel Stone") +
    ggthemes::scale_fill_tableau(palette = "Superfishel Stone") + ggthemes::theme_hc() +
    labs(title = "Number of trips by seasons and type of user", color = "Type of user",
        fill = "Type of user", x = "season of the year", y = "Number of trips (in Millions)")

By type of the bike

bikeshare_data |>
    group_by(member_casual, rideable_type) |>
    summarise(n_trip = n(), .groups = "drop") |>
    ggplot(mapping = aes(rideable_type, n_trip)) + geom_col(aes(color = member_casual,
    fill = member_casual), position = "dodge2") + scale_y_continuous(n.breaks = 10,
    labels = scales::unit_format(unit = "MM", scale = 1e-06)) + ggthemes::scale_color_tableau(palette = "Superfishel Stone") +
    ggthemes::scale_fill_tableau(palette = "Superfishel Stone") + ggthemes::theme_hc() +
    labs(title = "Number of trips by the type of the bike and user", color = "Type of user",
        fill = "Type of user", x = "type of the bike", y = "Number of trips (in Millions)")

bikeshare_data |>
    group_by(member_casual, rideable_type, date_hour) |>
    summarise(n_trip = n(), .groups = "drop") |>
    ggplot(aes(x = date_hour, y = n_trip, color = rideable_type, group = interaction(member_casual,
        rideable_type))) + geom_line() + geom_point() + facet_grid(rows = vars(member_casual),
    scales = "free_y") + ggthemes::scale_color_tableau(palette = "Superfishel Stone") +
    ggthemes::scale_fill_tableau(palette = "Superfishel Stone") + scale_y_continuous(labels = scales::unit_format(unit = "k",
    scale = 0.001)) + ggthemes::theme_hc() + labs(title = "Number of trips by the hour of the day, type of bike and type of user",
    color = "Type of bike", fill = "Type of bike", x = "hour of the day", y = "Number of trips (in Thousands)")

bikeshare_data |>
    group_by(member_casual, rideable_type, date_hour) |>
    summarise(mean_duration = mean(trip_duration), .groups = "drop") |>
    ggplot(aes(x = date_hour, y = mean_duration, color = rideable_type, group = interaction(member_casual,
        rideable_type))) + geom_line() + geom_point() + facet_grid(rows = vars(member_casual),
    scales = "free_y") + ggthemes::scale_color_tableau(palette = "Superfishel Stone") +
    ggthemes::scale_fill_tableau(palette = "Superfishel Stone") + ggthemes::theme_hc() +
    labs(title = "Duration of the trips by the type of the bike", color = "Type of bike",
        fill = "Type of bike", x = "hour of the day", y = "Trip duration (min)")

bikeshare_data |>
    group_by(member_casual, rideable_type, date_month) |>
    summarise(n_trip = n(), .groups = "drop") |>
    ggplot(aes(x = date_month, y = n_trip, color = rideable_type, group = interaction(member_casual,
        rideable_type))) + geom_line() + geom_point() + facet_grid(rows = vars(member_casual),
    scales = "free_y") + ggthemes::scale_color_tableau(palette = "Superfishel Stone") +
    ggthemes::scale_fill_tableau(palette = "Superfishel Stone") + scale_y_continuous(labels = scales::unit_format(unit = "k",
    scale = 0.001)) + ggthemes::theme_hc() + labs(title = "Number of trips by the months",
    color = "Type of bike", fill = "Type of bike", x = "Month of the year", y = "No. of trips (in Thousands)")

Stations and the Routs more offen used

bikeshare_data |>
    group_by(start_station_name) |>
    summarise(n_trip = n()) |>
    arrange(-n_trip) |>
    drop_na(start_station_name) |>
    slice(1:10) |>
    ggplot(mapping = aes(fct_reorder(start_station_name, -n_trip), n_trip)) + geom_col(aes(color = n_trip,
    fill = n_trip), position = "dodge2") + coord_flip() + ggthemes::scale_color_continuous_tableau(palette = "Blue") +
    ggthemes::scale_fill_continuous_tableau(palette = "Blue") + ggthemes::theme_hc() +

theme(legend.position = "none") + labs(title = "10 most used start point", x = "",
    y = "")

bikeshare_data |>
    group_by(trip_route) |>
    summarise(n_trip = n()) |>
    arrange(-n_trip) |>
    drop_na(trip_route) |>
    slice(1:10) |>
    ggplot(mapping = aes(fct_reorder(trip_route, -n_trip), n_trip)) + geom_col(aes(color = n_trip,
    fill = n_trip), position = "dodge2") + coord_flip() + ggthemes::scale_color_continuous_tableau(palette = "Blue") +
    ggthemes::scale_fill_continuous_tableau(palette = "Blue") + ggthemes::theme_hc() +
    theme(legend.position = "none") + labs(title = "Top 10 Routes", x = "", y = "")

Act

Key findings

  • Different of members, casual user use service more often during the weekend;
  • Also have the mean duration of the trips 2,45 times longer than members;
  • They have the highest trip duration during dawn (from 12 am to 5 am). With the pic at 2 am;
  • Casual users use the service more from July to October;
  • Although February is the month with the least trips, it is also the month with the longest duration of trips;
  • Casual members use more docked bikes thans members and also have a longer average travel time for this type of bike.

Recommendations

  • Create a subscription based on time-of-day to encourage casual users that ride, for exemple, from 9 pm to 5 am to subscribe;
  • Implement discounts or a points system based on loyalty (frequency of use) and high trip-duration users;
  • Create seasonal subscriptions such as summer and spring. Or implement discounts on temporary subscriptions (3, 6, 9, 12 months);
  • Create subscription especific to ride on week day or on the weekend;
  • Create subscriptions for specific types of bicycles. Plans to use only docked bicycles, for example.