Bike-Share Project Analysis

Author: Johnz0 | Published: 2019-11-15 21:46

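Introduction: With the rise of bike sharing, this project explores data from the bike-share systems of three major US cities: Chicago, New York City, and Washington, D.C. The goal is to give bike-share companies key information, such as which start station is most popular and which trip is most popular, to help guide where bikes are deployed.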

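I. Analysis Steps

  • Write code to import the data and answer interesting questions by computing descriptive statistics.
  • Write a script that takes raw input and creates an interactive experience in the terminal to present those statistics.
  • Pose the questions.
  • Build the terminal application script.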
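
The "raw input, interactive terminal" step comes down to asking a question repeatedly until the answer is acceptable. Below is a minimal sketch of that loop. The helper name `ask_choice` and its `read` parameter are hypothetical and not part of the original script; `read` defaults to the built-in `input` and exists only so the loop can be exercised without a keyboard.

```python
def ask_choice(prompt, options, read=input):
    """Keep asking until the answer (case-insensitive) is one of `options`."""
    valid = [option.lower() for option in options]
    while True:
        # normalise every attempt, including retries
        answer = read(prompt).strip().lower()
        if answer in valid:
            return answer
        print('Invalid input, expected one of: ' + ', '.join(valid))
```

In a terminal it could be called as `ask_choice('Which city?\n', ['chicago', 'new york city', 'washington'])`.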

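II. Questions

  • Which month is most common among the start times (the Start Time column)?
  • Which day of the week (for example Monday, Tuesday) is most common among the start times?
  • Which hour of the day is most common among the start times?
  • What is the total trip duration (Trip Duration), and what is the average trip duration?
  • Which start station (Start Station) is most popular, and which end station (End Station) is most popular?
  • Which trip is most popular (that is, which combination of start station and end station)?
  • How many users are there of each user type?
  • How many users are there of each gender?
  • Which birth year is the earliest, which is the most recent, and which is the most common?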
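
Most of these questions reduce to "which value occurs most often", which pandas answers with `Series.mode()`. The sketch below shows the idea on a tiny made-up table; the rows are invented for illustration and only the column names `Start Time` and `Trip Duration` come from the dataset.

```python
import pandas as pd

# Tiny made-up sample; the real CSV files have many more rows and columns.
trips = pd.DataFrame({
    'Start Time': ['2017-01-02 08:15:00', '2017-01-09 08:40:00',
                   '2017-02-07 17:05:00', '2017-01-16 09:30:00'],
    'Trip Duration': [300, 620, 450, 900],
})
trips['Start Time'] = pd.to_datetime(trips['Start Time'])

# mode() returns the most frequent value(s); [0] takes the first one
most_common_month = trips['Start Time'].dt.month.mode()[0]
most_common_day = trips['Start Time'].dt.day_name().mode()[0]
most_common_hour = trips['Start Time'].dt.hour.mode()[0]

# total and average duration are plain aggregations
total_duration = trips['Trip Duration'].sum()
mean_duration = trips['Trip Duration'].mean()
```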

III. Implementation

Tool: Python
Editor: PyCharm

import time
import pandas as pd
import numpy as np


CITY_DATA = { 'chicago': 'chicago.csv',
              'new york city': 'new_york_city.csv',
              'washington': 'washington.csv' }

def get_filters():
    """
    Asks user to specify a city, month, and day to analyze.

    Returns:
        (str) city - name of the city to analyze
        (str) month - name of the month to filter by, or "all" to apply no month filter
        (str) day - name of the day of week to filter by, or "all" to apply no day filter
    """
    print('Hello! Let\'s explore some US bikeshare data!')
    # get user input for city (chicago, new york city, washington); keep asking until it is valid
    city = input("Which city do you want to analyze? Enter: chicago, new york city, washington\n").lower()
    while city not in CITY_DATA:
        # lower-case the retry as well so that e.g. "Chicago" is accepted
        city = input('Invalid input.\nWould you like to see data for chicago, '
                     'new york city, or washington?\n').lower()

    # get user input for month (all, january, february, ... , june)
    months = ['all', 'january', 'february', 'march', 'april', 'may', 'june']
    month = input("Which month do you want to analyze? Enter: all, january, february, "
                  "march, april, may, june\n").lower()
    while month not in months:
        month = input('Invalid input.\nWhich month do you want to analyze? Enter: all, january, february, '
                      'march, april, may, june\n').lower()

    # get user input for day of week (all, monday, tuesday, ... sunday)
    days = ['all', 'monday','tuesday','wednesday','thursday','friday','saturday','sunday']
    day = input("Which day of the week do you want to analyze? Enter: "
                "all, monday, tuesday, wednesday, thursday, friday, saturday, sunday\n").lower()
    while day not in days:
        day = input("Invalid input.\nWhich day of the week do you want to analyze? Enter: "
                    "all, monday, tuesday, wednesday, thursday, friday, saturday, sunday\n").lower()

    print('-'*40)
    return city, month, day


def load_data(city, month, day):
    """
    Loads data for the specified city and filters by month and day if applicable.

    Args:
        (str) city - name of the city to analyze
        (str) month - name of the month to filter by, or "all" to apply no month filter
        (str) day - name of the day of week to filter by, or "all" to apply no day filter
    Returns:
        df - Pandas DataFrame containing city data filtered by month and day
    """
    # load data file into a dataframe
    df = pd.read_csv(CITY_DATA[city])

    # convert the Start Time column to datetime
    df['Start Time'] = pd.to_datetime(df['Start Time'])

    # extract month and day of week from Start Time to create new columns
    df['month'] = df['Start Time'].dt.month
    # Series.dt.weekday_name was removed in pandas 1.0; dt.day_name() returns the same names
    df['day_of_week'] = df['Start Time'].dt.day_name()

    # filter by month if applicable
    if month != 'all':
        # use the index of the months list to get the corresponding int
        months = ['january', 'february', 'march', 'april', 'may', 'june']
        month = months.index(month) + 1

        # filter by month to create the new dataframe
        df = df[df['month'] == month]

    # filter by day of week if applicable
    if day != 'all':
        # filter by day of week to create the new dataframe
        df = df[df['day_of_week'] == day.title()]
    return df


def time_stats(df):
    """Displays statistics on the most frequent times of travel."""

    print('\nCalculating The Most Frequent Times of Travel...\n')
    start_time = time.time()

    # display the most common month
    common_month = df['month'].mode()[0]
    print('The most common month: ', common_month)

    # display the most common day of week
    common_day_of_week = df['day_of_week'].mode()[0]
    print('The most common day of week: ', common_day_of_week)

    # display the most common start hour
    df['start_hour'] = df['Start Time'].dt.hour
    common_start_hour = df['start_hour'].mode()[0]
    print('The most common start hour: ', common_start_hour)


    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def station_stats(df):
    """Displays statistics on the most popular stations and trip."""

    print('\nCalculating The Most Popular Stations and Trip...\n')
    start_time = time.time()

    # display most commonly used start station
    common_start_station = df['Start Station'].mode()[0]
    print('The most commonly used start station: ', common_start_station)

    # display most commonly used end station
    common_end_station = df['End Station'].mode()[0]
    print('The most commonly used end station: ', common_end_station)

    # display most frequent combination of start station and end station trip
    # join with a separator so the two station names stay readable and unambiguous
    df['Station'] = df['Start Station'] + ' -> ' + df['End Station']
    frequent_station = df['Station'].mode()[0]
    print('The most frequent trip: ', frequent_station)

    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def trip_duration_stats(df):
    """Displays statistics on the total and average trip duration."""

    print('\nCalculating Trip Duration...\n')
    start_time = time.time()

    # display total travel time
    total_travel_time = df['Trip Duration'].sum()
    print('The total travel time: ', total_travel_time)

    # display mean travel time
    mean_travel_time = df['Trip Duration'].mean()
    print('The mean travel time: ', mean_travel_time)

    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def user_stats(df):
    """Displays statistics on bikeshare users."""

    print('\nCalculating User Stats...\n')
    start_time = time.time()

    # Display counts of user types
    count_user_types = df['User Type'].value_counts()
    print('Counts of user types: ', count_user_types)

    # Display counts of gender
    try:
        count_gender = df['Gender'].value_counts()
        print('Counts of gender: ', count_gender)
    except KeyError:
        print('Counts of gender: sorry, this city has no gender data.')

    # Display earliest, most recent, and most common year of birth
    try:
        earliest_birth = df['Birth Year'].min()
        most_recent_birth = df['Birth Year'].max()
        most_common_birth = df['Birth Year'].mode()[0]
        print('Earliest year of birth:', earliest_birth)
        print('Most recent year of birth:', most_recent_birth)
        print('Most common year of birth:', most_common_birth)
    except KeyError:
        print('Sorry, this city has no Birth Year data.')

    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def main():
    while True:
        city, month, day = get_filters()
        df = load_data(city, month, day)

        time_stats(df)
        station_stats(df)
        trip_duration_stats(df)
        user_stats(df)

        restart = input('\nWould you like to restart? Enter yes or no.\n')
        if restart.lower() != 'yes':
            break


if __name__ == "__main__":
    main()
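
The script finds the most popular trip by building a combined string column from the two station names. As an alternative sketch (not the author's method), the pair can be counted directly with `groupby`, which keeps the two station names separate and also yields the trip count. The function name `most_popular_trip` and the sample rows are made up for illustration.

```python
import pandas as pd

def most_popular_trip(df):
    """Return ((start, end), count) for the most frequent station pair."""
    # size() counts rows per (Start Station, End Station) combination
    pair_counts = df.groupby(['Start Station', 'End Station']).size()
    return pair_counts.idxmax(), int(pair_counts.max())

# Invented sample data with the same column names as the CSV files
sample = pd.DataFrame({
    'Start Station': ['A', 'A', 'B', 'A'],
    'End Station':   ['B', 'B', 'A', 'C'],
})
best_pair, best_count = most_popular_trip(sample)
```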

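IV. Interactive Experience

This file is a script that takes raw input and creates an interactive experience in the terminal to answer questions about the dataset.
Enter the question you want to look at:

[Screenshot: input prompts (输入.png)]
The answers are printed:
[Screenshot: answers (答案.png)]
P.S. The script can still be improved; this is only a simple first version. A visualization tool could also be added so that the script generates the required charts automatically from the data you ask for, which would be very convenient.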
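
As one possible starting point for the visualization idea in the P.S., the sketch below counts trips per start hour and optionally saves a bar chart. It assumes matplotlib is installed for the plotting part; the function names `trips_per_hour` and `plot_trips_per_hour`, the output file name, and the sample rows are all hypothetical and not part of the original script.

```python
import pandas as pd

def trips_per_hour(df):
    """Count trips for each start hour (0-23), including hours with no trips."""
    hours = pd.to_datetime(df['Start Time']).dt.hour
    return hours.value_counts().reindex(range(24), fill_value=0)

def plot_trips_per_hour(df, path='trips_per_hour.png'):
    """Save a bar chart of trips per start hour; needs matplotlib installed."""
    import matplotlib
    matplotlib.use('Agg')  # render to a file without opening a window
    import matplotlib.pyplot as plt

    counts = trips_per_hour(df)
    ax = counts.plot(kind='bar')
    ax.set_xlabel('Start hour')
    ax.set_ylabel('Number of trips')
    plt.tight_layout()
    plt.savefig(path)
    plt.close()
    return path

# Invented sample rows; only the 'Start Time' column name comes from the dataset
sample = pd.DataFrame({'Start Time': ['2017-01-02 08:15:00',
                                      '2017-01-02 08:45:00',
                                      '2017-01-02 17:05:00']})
counts = trips_per_hour(sample)
```

In the script, `plot_trips_per_hour(df)` could be called after `load_data` so that the chart reflects the chosen city, month, and day filters.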

Article link: https://www.haomeiwen.com/subject/pmqeictx.html