湾区自行车共享
在本章的最后,我们将使用学到的所有方法来考察一个新的、大型的数据集。我们还将介绍 map_table,一个强大的可视化工具。
Bay Area Bike Share 服务发布了一个数据集,描述了其系统中2014年9月至2015年8月的每一次自行车租赁。总共有354,152次租赁。各列如下:
- 租赁ID
- 租赁时长(秒)
- 开始日期
- 出发站名称和出发终端代码
- 到达站名称和到达终端代码
- 自行车的序列号
- 订阅者类型和邮政编码
from datascience import *
%matplotlib inline
path_data = '../../../assets/data/'
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
import math
from scipy import stats
import numpy as np
import warnings
warnings.simplefilter(action='ignore', category=np.VisibleDeprecationWarning)
trips = Table.read_table(path_data + 'trip.csv')
trips
Trip ID | Duration | Start Date | Start Station | Start Terminal | End Date | End Station | End Terminal | Bike # | Subscriber Type | Zip Code
913460 | 765 | 8/31/2015 23:26 | Harry Bridges Plaza (Ferry Building) | 50 | 8/31/2015 23:39 | San Francisco Caltrain (Townsend at 4th) | 70 | 288 | Subscriber | 2139
913459 | 1036 | 8/31/2015 23:11 | San Antonio Shopping Center | 31 | 8/31/2015 23:28 | Mountain View City Hall | 27 | 35 | Subscriber | 95032
913455 | 307 | 8/31/2015 23:13 | Post at Kearny | 47 | 8/31/2015 23:18 | 2nd at South Park | 64 | 468 | Subscriber | 94107
913454 | 409 | 8/31/2015 23:10 | San Jose City Hall | 10 | 8/31/2015 23:17 | San Salvador at 1st | 8 | 68 | Subscriber | 95113
913453 | 789 | 8/31/2015 23:09 | Embarcadero at Folsom | 51 | 8/31/2015 23:22 | Embarcadero at Sansome | 60 | 487 | Customer | 9069
913452 | 293 | 8/31/2015 23:07 | Yerba Buena Center of the Arts (3rd @ Howard) | 68 | 8/31/2015 23:12 | San Francisco Caltrain (Townsend at 4th) | 70 | 538 | Subscriber | 94118
913451 | 896 | 8/31/2015 23:07 | Embarcadero at Folsom | 51 | 8/31/2015 23:22 | Embarcadero at Sansome | 60 | 363 | Customer | 92562
913450 | 255 | 8/31/2015 22:16 | Embarcadero at Sansome | 60 | 8/31/2015 22:20 | Steuart at Market | 74 | 470 | Subscriber | 94111
913449 | 126 | 8/31/2015 22:12 | Beale at Market | 56 | 8/31/2015 22:15 | Temporary Transbay Terminal (Howard at Beale) | 55 | 439 | Subscriber | 94130
913448 | 932 | 8/31/2015 21:57 | Post at Kearny | 47 | 8/31/2015 22:12 | South Van Ness at Market | 66 | 472 | Subscriber | 94702
... (354142 rows omitted)我们将只关注“免费行程”,即持续时间少于1800秒(半小时)的行程。更长的行程需要收费。
下面的直方图显示,大多数行程大约在10分钟(600秒)左右。很少有接近30分钟(1800秒)的,可能是因为人们尽量在截止时间之前归还自行车以避免付费。
commute = trips.where('Duration', are.below(1800))
commute.hist('Duration', unit='Second')
Histogram with 'Duration (Second)' on the x-axis and 'Percent per Second' on the y-axis. There are 10 bins. The tallest bars are between 250 and approximately 550. The histogram has a right tail extending to 1750.通过指定更多的箱,我们可以获得更多细节。但整体形状变化不大。
commute.hist('Duration', bins=60, unit='Second')
Histogram with 'Duration (Second)' on the x-axis and 'Percent per Second' on the y-axis. Bins now have a width of 60 so there are many bars ranging from 0 to above 1750. The tallest bars are on the left hand side of the graph and there is a right tail.使用 group 和 pivot 探索数据
我们可以使用 group 来识别使用最频繁的出发站:
starts = commute.group('Start Station').sort('count', descending=True)
starts
Start Station | count
San Francisco Caltrain (Townsend at 4th) | 25858
San Francisco Caltrain 2 (330 Townsend) | 21523
Harry Bridges Plaza (Ferry Building) | 15543
Temporary Transbay Terminal (Howard at Beale) | 14298
2nd at Townsend | 13674
Townsend at 7th | 13579
Steuart at Market | 13215
Embarcadero at Sansome | 12842
Market at 10th | 11523
Market at Sansome | 11023
... (60 rows omitted)最多的行程始于旧金山 Townsend 和 4th 街的加州火车(Caltrain)站。人们乘火车进入城市,然后使用共享单车前往下一个目的地。
group 方法也可以用于按出发站和到达站对租赁进行分类。
commute.group(['Start Station', 'End Station'])
Start Station | End Station | count
2nd at Folsom | 2nd at Folsom | 54
2nd at Folsom | 2nd at South Park | 295
2nd at Folsom | 2nd at Townsend | 437
2nd at Folsom | 5th at Howard | 113
2nd at Folsom | Beale at Market | 127
2nd at Folsom | Broadway St at Battery St | 67
2nd at Folsom | Civic Center BART (7th at Market) | 47
2nd at Folsom | Clay at Battery | 240
2nd at Folsom | Commercial at Montgomery | 128
2nd at Folsom | Davis at Jackson | 28
... (1619 rows omitted)54次行程既从 Folsom 的 2nd 街站出发又在那里结束。更多的数量(437次)是在 Folsom 的 2nd 街站和 Townsend 的 2nd 街站之间。
pivot 方法进行相同的分类,但将结果显示在列联表中,该表显示了出发站和到达站的所有可能组合,即使其中一些组合不对应任何行程。请记住,pivot 语句的第一个参数指定了透视表的列标签;第二个参数标记行。
在 Beale at Market 附近有一个火车站和湾区捷运(BART)站,这解释了为什么在那里出发和结束的行程数量很高。
commute.pivot('Start Station', 'End Station')
End Station | 2nd at Folsom | 2nd at South Park | 2nd at Townsend | 5th at Howard | Adobe on Almaden | Arena Green / SAP Center | Beale at Market | Broadway St at Battery St | California Ave Caltrain Station | Castro Street and El Camino Real | Civic Center BART (7th at Market) | Clay at Battery | Commercial at Montgomery | Cowper at University | Davis at Jackson | Embarcadero at Bryant | Embarcadero at Folsom | Embarcadero at Sansome | Embarcadero at Vallejo | Evelyn Park and Ride | Franklin at Maple | Golden Gate at Polk | Grant Avenue at Columbus Avenue | Harry Bridges Plaza (Ferry Building) | Howard at 2nd | Japantown | MLK Library | Market at 10th | Market at 4th | Market at Sansome | Mechanics Plaza (Market at Battery) | Mezes Park | Mountain View Caltrain Station | Mountain View City Hall | Palo Alto Caltrain Station | Park at Olive | Paseo de San Antonio | Post at Kearny | Powell Street BART | Powell at Post (Union Square) | Redwood City Caltrain Station | Redwood City Medical Center | Redwood City Public Library | Rengstorff Avenue / California Street | Ryland Park | SJSU - San Salvador at 9th | SJSU 4th at San Carlos | San Antonio Caltrain Station | San Antonio Shopping Center | San Francisco Caltrain (Townsend at 4th) | San Francisco Caltrain 2 (330 Townsend) | San Francisco City Hall | San Jose City Hall | San Jose Civic Center | San Jose Diridon Caltrain Station | San Mateo County Center | San Pedro Square | San Salvador at 1st | Santa Clara County Civic Center | Santa Clara at Almaden | South Van Ness at Market | Spear at Folsom | St James Park | Stanford in Redwood City | Steuart at Market | Temporary Transbay Terminal (Howard at Beale) | Townsend at 7th | University and Emerson | Washington at Kearny | Yerba Buena Center of the Arts (3rd @ Howard)
2nd at Folsom | 54 | 190 | 554 | 107 | 0 | 0 | 40 | 21 | 0 | 0 | 44 | 78 | 54 | 0 | 9 | 77 | 32 | 41 | 14 | 0 | 0 | 11 | 30 | 416 | 53 | 0 | 0 | 169 | 114 | 302 | 33 | 0 | 0 | 0 | 0 | 0 | 0 | 60 | 121 | 88 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 694 | 445 | 21 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 38 | 57 | 0 | 0 | 39 | 237 | 342 | 0 | 17 | 31
2nd at South Park | 295 | 164 | 71 | 180 | 0 | 0 | 208 | 85 | 0 | 0 | 112 | 87 | 160 | 0 | 37 | 56 | 178 | 83 | 116 | 0 | 0 | 57 | 73 | 574 | 500 | 0 | 0 | 139 | 199 | 1633 | 119 | 0 | 0 | 0 | 0 | 0 | 0 | 299 | 84 | 113 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 559 | 480 | 48 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 66 | 152 | 0 | 0 | 374 | 429 | 143 | 0 | 63 | 209
2nd at Townsend | 437 | 151 | 185 | 92 | 0 | 0 | 608 | 350 | 0 | 0 | 80 | 329 | 168 | 0 | 386 | 361 | 658 | 506 | 254 | 0 | 0 | 27 | 315 | 2607 | 295 | 0 | 0 | 110 | 225 | 845 | 177 | 0 | 0 | 0 | 0 | 0 | 0 | 120 | 100 | 141 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 905 | 299 | 14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 72 | 508 | 0 | 0 | 2349 | 784 | 417 | 0 | 57 | 166
5th at Howard | 113 | 177 | 148 | 83 | 0 | 0 | 59 | 130 | 0 | 0 | 203 | 76 | 129 | 0 | 30 | 57 | 49 | 166 | 54 | 0 | 0 | 85 | 78 | 371 | 478 | 0 | 0 | 303 | 158 | 168 | 90 | 0 | 0 | 0 | 0 | 0 | 0 | 93 | 183 | 169 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 690 | 1859 | 48 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 116 | 102 | 0 | 0 | 182 | 750 | 200 | 0 | 43 | 267
Adobe on Almaden | 0 | 0 | 0 | 0 | 11 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 17 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7 | 7 | 16 | 0 | 0 | 0 | 0 | 0 | 19 | 23 | 265 | 0 | 20 | 4 | 5 | 10 | 0 | 0 | 14 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Arena Green / SAP Center | 0 | 0 | 0 | 0 | 7 | 64 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 16 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 21 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 24 | 3 | 7 | 0 | 0 | 0 | 0 | 0 | 6 | 20 | 7 | 0 | 56 | 12 | 38 | 259 | 0 | 0 | 13 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Beale at Market | 127 | 79 | 183 | 59 | 0 | 0 | 59 | 661 | 0 | 0 | 201 | 75 | 101 | 0 | 247 | 178 | 38 | 590 | 165 | 0 | 0 | 54 | 435 | 57 | 72 | 0 | 0 | 286 | 236 | 163 | 26 | 0 | 0 | 0 | 0 | 0 | 0 | 49 | 227 | 179 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 640 | 269 | 25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 243 | 128 | 0 | 0 | 16 | 167 | 35 | 0 | 64 | 45
Broadway St at Battery St | 67 | 89 | 279 | 119 | 0 | 0 | 1022 | 110 | 0 | 0 | 62 | 283 | 226 | 0 | 191 | 198 | 79 | 231 | 35 | 0 | 0 | 5 | 70 | 168 | 49 | 0 | 0 | 32 | 97 | 341 | 214 | 0 | 0 | 0 | 0 | 0 | 0 | 169 | 71 | 218 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 685 | 438 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 18 | 106 | 0 | 0 | 344 | 748 | 50 | 0 | 79 | 47
California Ave Caltrain Station | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 38 | 1 | 0 | 0 | 0 | 29 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 192 | 40 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 0 | 0 | 0 | 17 | 10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 57 | 0 | 0
Castro Street and El Camino Real | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 931 | 34 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7 | 0 | 0 | 0 | 4 | 12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
... (60 rows omitted)我们还可以使用 pivot 来找出出发站和到达站之间的最短骑行时间。这里 pivot 被赋予 Duration 作为可选的 values 参数,并以 min 作为要对每个单元格中的值执行的函数。
commute.pivot('Start Station', 'End Station', 'Duration', min)
End Station | 2nd at Folsom | 2nd at South Park | 2nd at Townsend | 5th at Howard | Adobe on Almaden | Arena Green / SAP Center | Beale at Market | Broadway St at Battery St | California Ave Caltrain Station | Castro Street and El Camino Real | Civic Center BART (7th at Market) | Clay at Battery | Commercial at Montgomery | Cowper at University | Davis at Jackson | Embarcadero at Bryant | Embarcadero at Folsom | Embarcadero at Sansome | Embarcadero at Vallejo | Evelyn Park and Ride | Franklin at Maple | Golden Gate at Polk | Grant Avenue at Columbus Avenue | Harry Bridges Plaza (Ferry Building) | Howard at 2nd | Japantown | MLK Library | Market at 10th | Market at 4th | Market at Sansome | Mechanics Plaza (Market at Battery) | Mezes Park | Mountain View Caltrain Station | Mountain View City Hall | Palo Alto Caltrain Station | Park at Olive | Paseo de San Antonio | Post at Kearny | Powell Street BART | Powell at Post (Union Square) | Redwood City Caltrain Station | Redwood City Medical Center | Redwood City Public Library | Rengstorff Avenue / California Street | Ryland Park | SJSU - San Salvador at 9th | SJSU 4th at San Carlos | San Antonio Caltrain Station | San Antonio Shopping Center | San Francisco Caltrain (Townsend at 4th) | San Francisco Caltrain 2 (330 Townsend) | San Francisco City Hall | San Jose City Hall | San Jose Civic Center | San Jose Diridon Caltrain Station | San Mateo County Center | San Pedro Square | San Salvador at 1st | Santa Clara County Civic Center | Santa Clara at Almaden | South Van Ness at Market | Spear at Folsom | St James Park | Stanford in Redwood City | Steuart at Market | Temporary Transbay Terminal (Howard at Beale) | Townsend at 7th | University and Emerson | Washington at Kearny | Yerba Buena Center of the Arts (3rd @ Howard)
2nd at Folsom | 61 | 97 | 164 | 268 | 0 | 0 | 271 | 407 | 0 | 0 | 483 | 329 | 306 | 0 | 494 | 239 | 262 | 687 | 599 | 0 | 0 | 639 | 416 | 282 | 80 | 0 | 0 | 506 | 237 | 167 | 250 | 0 | 0 | 0 | 0 | 0 | 0 | 208 | 264 | 290 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 300 | 303 | 584 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 590 | 208 | 0 | 0 | 318 | 149 | 448 | 0 | 429 | 165
2nd at South Park | 61 | 60 | 77 | 86 | 0 | 0 | 78 | 345 | 0 | 0 | 290 | 188 | 171 | 0 | 357 | 104 | 81 | 490 | 341 | 0 | 0 | 369 | 278 | 122 | 60 | 0 | 0 | 416 | 142 | 61 | 68 | 0 | 0 | 0 | 0 | 0 | 0 | 60 | 237 | 106 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 63 | 66 | 458 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 399 | 63 | 0 | 0 | 79 | 61 | 78 | 0 | 270 | 96
2nd at Townsend | 137 | 67 | 60 | 423 | 0 | 0 | 311 | 469 | 0 | 0 | 546 | 520 | 474 | 0 | 436 | 145 | 232 | 509 | 494 | 0 | 0 | 773 | 549 | 325 | 221 | 0 | 0 | 667 | 367 | 265 | 395 | 0 | 0 | 0 | 0 | 0 | 0 | 319 | 455 | 398 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 125 | 133 | 742 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 777 | 241 | 0 | 0 | 291 | 249 | 259 | 0 | 610 | 284
5th at Howard | 215 | 300 | 384 | 68 | 0 | 0 | 357 | 530 | 0 | 0 | 179 | 412 | 364 | 0 | 543 | 419 | 359 | 695 | 609 | 0 | 0 | 235 | 474 | 453 | 145 | 0 | 0 | 269 | 161 | 250 | 306 | 0 | 0 | 0 | 0 | 0 | 0 | 234 | 89 | 202 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 256 | 221 | 347 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 375 | 402 | 0 | 0 | 455 | 265 | 357 | 0 | 553 | 109
Adobe on Almaden | 0 | 0 | 0 | 0 | 84 | 275 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 701 | 387 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 229 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 441 | 452 | 318 | 0 | 0 | 0 | 0 | 0 | 309 | 146 | 182 | 0 | 207 | 358 | 876 | 101 | 0 | 0 | 369 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Arena Green / SAP Center | 0 | 0 | 0 | 0 | 305 | 62 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 526 | 546 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 403 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 288 | 875 | 685 | 0 | 0 | 0 | 0 | 0 | 440 | 420 | 153 | 0 | 166 | 624 | 759 | 116 | 0 | 0 | 301 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Beale at Market | 219 | 343 | 417 | 387 | 0 | 0 | 60 | 155 | 0 | 0 | 343 | 122 | 153 | 0 | 115 | 216 | 170 | 303 | 198 | 0 | 0 | 437 | 235 | 149 | 204 | 0 | 0 | 535 | 203 | 88 | 72 | 0 | 0 | 0 | 0 | 0 | 0 | 191 | 316 | 191 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 499 | 395 | 526 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 575 | 173 | 0 | 0 | 87 | 94 | 619 | 0 | 222 | 264
Broadway St at Battery St | 351 | 424 | 499 | 555 | 0 | 0 | 195 | 62 | 0 | 0 | 520 | 90 | 129 | 0 | 70 | 340 | 284 | 128 | 101 | 0 | 0 | 961 | 148 | 168 | 357 | 0 | 0 | 652 | 351 | 218 | 221 | 0 | 0 | 0 | 0 | 0 | 0 | 255 | 376 | 316 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 611 | 599 | 799 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 738 | 336 | 0 | 0 | 169 | 291 | 885 | 0 | 134 | 411
California Ave Caltrain Station | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 82 | 1645 | 0 | 0 | 0 | 628 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1771 | 0 | 484 | 131 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1077 | 0 | 0 | 0 | 870 | 911 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 531 | 0 | 0
Castro Street and El Camino Real | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 74 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 499 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 201 | 108 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 654 | 0 | 0 | 0 | 953 | 696 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
... (60 rows omitted)有人有一次非常快速的行程(271秒,约4.5分钟),从 Beale at Market 到 Folsom 的 2nd 街站,大约五个街区远。2nd Avenue 站与 Adobe on Almaden 之间没有自行车行程,因为后者在不同的城市。
绘制地图
表 stations 包含每个自行车站的地理信息,包括纬度、经度和“landmark”,即站点所在城市的名称。
stations = Table.read_table(path_data + 'station.csv')
stations
station_id | name | lat | long | dockcount | landmark | installation
2 | San Jose Diridon Caltrain Station | 37.3297 | -121.902 | 27 | San Jose | 8/6/2013
3 | San Jose Civic Center | 37.3307 | -121.889 | 15 | San Jose | 8/5/2013
4 | Santa Clara at Almaden | 37.334 | -121.895 | 11 | San Jose | 8/6/2013
5 | Adobe on Almaden | 37.3314 | -121.893 | 19 | San Jose | 8/5/2013
6 | San Pedro Square | 37.3367 | -121.894 | 15 | San Jose | 8/7/2013
7 | Paseo de San Antonio | 37.3338 | -121.887 | 15 | San Jose | 8/7/2013
8 | San Salvador at 1st | 37.3302 | -121.886 | 15 | San Jose | 8/5/2013
9 | Japantown | 37.3487 | -121.895 | 15 | San Jose | 8/5/2013
10 | San Jose City Hall | 37.3374 | -121.887 | 15 | San Jose | 8/6/2013
11 | MLK Library | 37.3359 | -121.886 | 19 | San Jose | 8/6/2013
... (60 rows omitted)我们可以使用 Marker.map_table 绘制站点位置的地图。该函数操作一个表格,其列依次为纬度、经度以及每个点的可选标识符。
Marker.map_table(stations.select('lat', 'long', 'name').relabel('name', 'labels'))
<datascience.maps.Map at 0x7f015faaf1c0>地图使用 OpenStreetMap 创建,这是一个开放的在线地图系统,你可以像使用 Google Maps 或任何其他在线地图一样使用它。放大到旧金山查看站点的分布情况。点击标记查看是哪个站点。
你也可以通过彩色圆点在地图上表示点。以下是旧金山自行车站的这样的地图。
sf = stations.where('landmark', are.equal_to('San Francisco'))
sf_map_data = sf.select('lat', 'long', 'name').relabel('name', 'labels')
Circle.map_table(sf_map_data, color='green')
<datascience.maps.Map at 0x7f0191ad31f0>更具信息量的地图:join 的应用
这些自行车站位于湾区的五个不同城市。为了通过为每个城市使用不同颜色来区分这些点,让我们首先使用 group 来识别所有城市并为每个城市分配一种颜色。
cities = stations.group('landmark').relabeled('landmark', 'city')
cities
city | count
Mountain View | 7
Palo Alto | 5
Redwood City | 7
San Francisco | 35
San Jose | 16colors = cities.with_column('color', make_array('blue', 'red', 'green', 'orange', 'purple'))
colors
city | count | color
Mountain View | 7 | blue
Palo Alto | 5 | red
Redwood City | 7 | green
San Francisco | 35 | orange
San Jose | 16 | purple现在我们可以通过 landmark 连接 stations 和 colors,然后选择绘制地图所需的列。
joined = stations.join('landmark', colors, 'city')
colored = joined.select('lat', 'long', 'name', 'color').relabel('name', 'labels')
Marker.map_table(colored)
<datascience.maps.Map at 0x7f016578f6d0>现在标记有五种不同的颜色,对应五个不同的城市。
为了查看大多数自行车租赁的出发地,让我们找出出发站:
starts = commute.group('Start Station').sort('count', descending=True)
starts
Start Station | count
San Francisco Caltrain (Townsend at 4th) | 25858
San Francisco Caltrain 2 (330 Townsend) | 21523
Harry Bridges Plaza (Ferry Building) | 15543
Temporary Transbay Terminal (Howard at Beale) | 14298
2nd at Townsend | 13674
Townsend at 7th | 13579
Steuart at Market | 13215
Embarcadero at Sansome | 12842
Market at 10th | 11523
Market at Sansome | 11023
... (60 rows omitted)我们可以通过首先将 starts 与 stations 连接来包含绘制这些站点地图所需的地理数据:
station_starts = stations.join('name', starts, 'Start Station')
station_starts
name | station_id | lat | long | dockcount | landmark | installation | count
2nd at Folsom | 62 | 37.7853 | -122.396 | 19 | San Francisco | 8/22/2013 | 7841
2nd at South Park | 64 | 37.7823 | -122.393 | 15 | San Francisco | 8/22/2013 | 9274
2nd at Townsend | 61 | 37.7805 | -122.39 | 27 | San Francisco | 8/22/2013 | 13674
5th at Howard | 57 | 37.7818 | -122.405 | 15 | San Francisco | 8/21/2013 | 7394
Adobe on Almaden | 5 | 37.3314 | -121.893 | 19 | San Jose | 8/5/2013 | 522
Arena Green / SAP Center | 14 | 37.3327 | -121.9 | 19 | San Jose | 8/5/2013 | 590
Beale at Market | 56 | 37.7923 | -122.397 | 19 | San Francisco | 8/20/2013 | 8135
Broadway St at Battery St | 82 | 37.7985 | -122.401 | 15 | San Francisco | 1/22/2014 | 7460
California Ave Caltrain Station | 36 | 37.4291 | -122.143 | 15 | Palo Alto | 8/14/2013 | 300
Castro Street and El Camino Real | 32 | 37.386 | -122.084 | 11 | Mountain View | 12/31/2013 | 1137
... (58 rows omitted)现在我们只提取绘制地图所需的数据,为每个站点添加颜色和面积。面积是每个站点出发的租赁数量的0.3倍,选择常数0.3是为了使圆点在地图上以适当的比例显示。
starts_map_data = station_starts.select('lat', 'long', 'name').with_columns(
'colors', 'blue',
'areas', station_starts.column('count') * 0.3
)
starts_map_data.show(3)
Circle.map_table(starts_map_data.relabel('name', 'labels'))
<IPython.core.display.HTML object><datascience.maps.Map at 0x7f016552bdf0>旧金山那个巨大的色块显示,该市的东部地区是湾区无可匹敌的自行车租赁之都。