
How to use Python flow-control statements (Python flow orchestration)

http://www.itjxue.com 2023-04-09 21:51 Source: unknown

What does it mean for a data analyst to do data analysis in Python? Which parts of Python are needed, and how is it actually done?

Analysis with Programming recently joined Planet Python. Here I'll share how to get started with data analysis in Python. The topics are:

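Importing the data: importing a CSV file, locally or from the web;

Data transformation;

Descriptive statistics of the data;

Hypothesis testing: one-sample t-test;

Visualization;

Creating custom functions.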

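Importing the data

This is a key step: before any analysis can happen, the data has to be loaded. Data usually comes in CSV format, and when it doesn't, it can at least be converted to CSV. In Python, we do it as follows: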

import pandas as pd

# Reading data locally

df = pd.read_csv('/Users/al-ahmadgaidasaad/Documents/d.csv')

# Reading data from web

data_url = ""

df = pd.read_csv(data_url)

To read a CSV file we use the pandas data-analysis library. Its read_csv function can read both local files and data from the web.
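As a minimal sketch of the same call that does not depend on a file on disk, the snippet below hands read_csv an in-memory text buffer. The three rows are the first three observations that appear later in this article; the names csv_text and df_sample are made up for illustration.

```python
import io

import pandas as pd

# First three rows of the article's dataset, typed in as CSV text
csv_text = """Abra,Apayao,Benguet,Ifugao,Kalinga
1243,2934,148,3300,10553
4158,9235,4287,8063,35257
1787,1922,1955,1074,4544
"""

# read_csv accepts a local path, a URL, or any file-like object
df_sample = pd.read_csv(io.StringIO(csv_text))

print(df_sample.shape)
print(list(df_sample.columns))
```

The same call works unchanged when the buffer is replaced by a path or a URL string.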


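Data transformation

Now that the data is in the workspace, the next step is to transform it. Statisticians and scientists typically use this step to remove data the analysis does not need. Let's first take a look at the first and last rows of the data with df.head() and df.tail().

For R programmers, this is equivalent to print(head(df)), which prints the first six rows, and print(tail(df)), which prints the last six. Python prints 5 rows by default, whereas R prints 6. So the R code head(df, n = 10) becomes df.head(n = 10) in Python, and the same goes for the tail of the data.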


In R, column and row names are extracted with colnames and rownames, respectively. In Python, we use the columns and index attributes instead:

# Extracting column names

print(df.columns)

# OUTPUT

Index([u'Abra', u'Apayao', u'Benguet', u'Ifugao', u'Kalinga'], dtype='object')

# Extracting row names or the index

print(df.index)

# OUTPUT

Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78], dtype='int64')

The data is transposed with the T attribute:

# Transpose data
print(df.T)
# OUTPUT
             0      1     2      3     4      5     6      7     8      9
Abra      1243   4158  1787  17152  1266   5576   927  21540  1039   5424
Apayao    2934   9235  1922  14501  2385   7452  1099  17038  1382  10588
Benguet    148   4287  1955   3536  2530    771  2796   2463  2592   1064
Ifugao    3300   8063  1074  19607  3315  13134  5134  14226  6842  13828
Kalinga  10553  35257  4544  31687  8520  28252  3106  36238  4973  40140

         ...     69     70     71     72     73     74     75     76     77
Abra     ...  12763   2470  59094   6209  13316   2505  60303   6311  13345
Apayao   ...  37625  19532  35126   6335  38613  20878  40065   6756  38902
Benguet  ...   2354   4045   5987   3530   2585   3519   7062   3561   2583
Ifugao   ...   9838  17125  18940  15560   7746  19737  19422  15910  11096
Kalinga  ...  65782  15279  52437  24385  66148  16513  61808  23349  68663

            78
Abra      2623
Apayao   18264
Benguet   3745
Ifugao   16787
Kalinga  16900

Other transformations, such as sorting, are done with the sort_values method. Now let's extract a specific column. In Python this is done with the iloc (position-based) or loc (label-based) indexers; the older ix indexer was deprecated in pandas 0.20 and removed in 1.0, so iloc is used here. Assuming we want the first five rows of the first column, we have:

print(df.iloc[:, 0].head())
# OUTPUT
0     1243
1     4158
2     1787
3    17152
4     1266
Name: Abra, dtype: int64

By the way, Python indexing starts at 0, not 1. To take the first 3 columns of the 11th through 21st rows (index labels 10 to 20), we have:

print(df.iloc[10:21, 0:3])
# OUTPUT
     Abra  Apayao  Benguet
10    981    1311     2560
11  27366   15093     3039
12   1100    1701     2382
13   7212   11001     1088
14   1048    1427     2847
15  25679   15661     2942
16   1055    2191     2119
17   5437    6461      734
18   1029    1183     2302
19  23710   12222     2598
20   1091    2343     2654

The command above is equivalent to df.loc[10:20, ['Abra', 'Apayao', 'Benguet']].
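The difference between position-based and label-based slicing is easy to trip over, so here is a small sketch on a made-up frame (toy and its columns a, b, c are hypothetical, not part of the article's data). With iloc the end of a slice is excluded, as in ordinary Python slicing; with loc the end label is included.

```python
import pandas as pd

# Hypothetical toy frame with the default integer index 0..5
toy = pd.DataFrame({"a": range(6), "b": range(10, 16), "c": range(20, 26)})

by_position = toy.iloc[2:5, 0:2]       # rows at positions 2, 3, 4 (end excluded)
by_label = toy.loc[2:4, ["a", "b"]]    # rows labelled 2, 3, 4 (end included)

print(by_position)
print(by_label)
```

Both selections pick the same three rows and two columns, which is why the iloc slice in the text ends at 21 while the loc slice ends at 20.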

To drop columns from the data, here column 1 (Apayao) and column 2 (Benguet), we use the drop method:

print(df.drop(df.columns[[1, 2]], axis = 1).head())
# OUTPUT
    Abra  Ifugao  Kalinga
0   1243    3300    10553
1   4158    8063    35257
2   1787    1074     4544
3  17152   19607    31687
4   1266    3315     8520

The axis argument tells the function whether to drop columns or rows: axis = 1 drops columns, while axis = 0 drops rows.
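To make the two values of axis concrete, the sketch below drops columns and then rows from a three-row sample of the data. The frame name toy and the result names are illustrative only.

```python
import pandas as pd

# Three rows and three columns taken from the article's dataset
toy = pd.DataFrame({"Abra": [1243, 4158, 1787],
                    "Apayao": [2934, 9235, 1922],
                    "Benguet": [148, 4287, 1955]})

without_cols = toy.drop(["Apayao", "Benguet"], axis=1)  # drop by column label
without_rows = toy.drop([0, 2], axis=0)                 # drop by index label

print(list(without_cols.columns))
print(list(without_rows.index))
```

Note that drop returns a new DataFrame; toy itself is left unchanged.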


Descriptive statistics

The next step is to describe the statistical properties of the data with the describe method:

print(df.describe())
# OUTPUT
               Abra        Apayao      Benguet        Ifugao       Kalinga
count     79.000000     79.000000    79.000000     79.000000     79.000000
mean   12874.379747  16860.645570  3237.392405  12414.620253  30446.417722
std    16746.466945  15448.153794  1588.536429   5034.282019  22245.707692
min      927.000000    401.000000   148.000000   1074.000000   2346.000000
25%     1524.000000   3435.500000  2328.000000   8205.000000   8601.500000
50%     5790.000000  10588.000000  3202.000000  13044.000000  24494.000000
75%    13330.500000  33289.000000  3918.500000  16099.500000  52510.500000
max    60303.000000  54625.000000  8813.000000  21031.000000  68663.000000


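Hypothesis testing

Python has a good package for statistical inference: stats in scipy. Its ttest_1samp function implements the one-sample t-test. So, to test the mean rice yield in the Abra column under the null hypothesis that the population mean yield is 15000, we have:

from scipy import stats as ss
# Perform one sample t-test using 15000 as the true mean
print(ss.ttest_1samp(a = df.loc[:, 'Abra'], popmean = 15000))
# OUTPUT
(-1.1281738488299586, 0.26270472069109496)

The function returns a tuple with the following values:

t : float or array, the t statistic

prob : float or array, the two-tailed p-value

From the output above, the p-value is about 0.263, far greater than α = 0.05, so there is not sufficient evidence to conclude that the mean rice yield differs from 15000. Applying the same test to all variables, again assuming a mean of 15000, we have:

print(ss.ttest_1samp(a = df, popmean = 15000))
# OUTPUT
(array([ -1.12817385,   1.07053437, -65.81425599,  -4.564575  ,   6.17156198]),
 array([  2.62704721e-01,   2.87680340e-01,   4.15643528e-70,
          1.83764399e-05,   2.82461897e-08]))

The first array holds the t statistics and the second the corresponding p-values.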
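To see where the t statistic comes from, it can be recomputed by hand from the summary figures that describe printed for the Abra column (mean, std and count). The sketch below uses only the standard library; the helper name t_from_summary is made up for illustration, and it computes the statistic only, not the p-value.

```python
import math


def t_from_summary(mean, std, n, popmean):
    """One-sample t statistic: (mean - popmean) / (std / sqrt(n))."""
    return (mean - popmean) / (std / math.sqrt(n))


# Summary values for Abra as printed by df.describe() earlier in the article
t_abra = t_from_summary(mean=12874.379747, std=16746.466945, n=79, popmean=15000)

print(round(t_abra, 4))  # should be close to the -1.1282 reported by ttest_1samp
```

This works because describe reports the sample standard deviation (n - 1 in the denominator), which is the one the t-test uses.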


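Visualization

Python has many visualization modules, the most popular being the matplotlib library. Worth a mention as well are the bokeh and seaborn modules. In an earlier post I demonstrated the box-and-whisker plot functionality of matplotlib.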


# Import the module for plotting
import matplotlib.pyplot as plt
df.plot(kind = 'box')
plt.show()

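Now we can polish the chart with the ggplot style that ships with matplotlib, modelled on R's ggplot2 theme. To use it, we only need one more line in the code above:

import matplotlib.pyplot as plt
plt.style.use('ggplot') # Sets the plotting theme to the ggplot2 look
df.plot(kind = 'box')
plt.show()

This produces the restyled box plot.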


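It is much cleaner than the default matplotlib.pyplot theme. In this article, though, I would rather introduce the seaborn module, a library for statistical data visualization. So we have:

# Import the seaborn library
import seaborn as sns
# Do the boxplot
sns.boxplot(data = df, width = 0.5, palette = "pastel")
plt.show()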

A much better-looking box plot. Let's keep going.

sns.violinplot(data = df, width = 0.5, palette = "pastel")
plt.show()
sns.histplot(df.iloc[:, 2], bins = 15, kde = True) # replaces the deprecated distplot; add sns.rugplot for rug marks
plt.show()
with sns.axes_style("white"):
    sns.jointplot(x = df.iloc[:, 1], y = df.iloc[:, 2], kind = "kde")
    plt.show()
sns.lmplot(x = "Benguet", y = "Ifugao", data = df)
plt.show()


Creating custom functions

In Python, we use the def statement to define a custom function. For example, a function that adds two numbers can be written as:

def add_2int(x, y):
    return x + y

print(add_2int(2, 2))
# OUTPUT
4

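By the way, indentation matters in Python. Indentation defines the scope of a function body, just as braces {…} do in R. Here is an example from one of our earlier posts:

Generate 10 samples from a normal distribution with $\mu = 3$ and $\sigma^2 = 5$;

Based on a 95% confidence level, compute $\bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n}$ and $\bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}$;

Repeat 100 times; then

Compute the percentage of confidence intervals that contain the true mean.

In Python, the program is as follows: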

import numpy as np
import scipy.stats as ss

def case(n = 10, mu = 3, sigma = np.sqrt(5), p = 0.025, rep = 100):
    # Columns of m: sample mean, lower limit, upper limit, 1 if the CI covers mu
    m = np.zeros((rep, 4))
    for i in range(rep):
        norm = np.random.normal(loc = mu, scale = sigma, size = n)
        xbar = np.mean(norm)
        low = xbar - ss.norm.ppf(q = 1 - p) * (sigma / np.sqrt(n))
        up = xbar + ss.norm.ppf(q = 1 - p) * (sigma / np.sqrt(n))
        if (mu > low) & (mu < up):
            rem = 1
        else:
            rem = 0
        m[i, :] = [xbar, low, up, rem]
    inside = np.sum(m[:, 3])
    per = 100 * inside / rep  # share of covering intervals, as a percentage
    desc = ("There are " + str(inside) + " confidence intervals that contain "
            "the true mean (" + str(mu) + "), that is " + str(per) + " percent of the total CIs")
    return {"Matrix": m, "Decision": desc}

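The code above is easy to read, but the explicit loop makes it slow. The improved version below, thanks to suggestions from Python experts, vectorizes the computation: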

import numpy as np
import scipy.stats as ss

def case2(n = 10, mu = 3, sigma = np.sqrt(5), p = 0.025, rep = 100):
    # Half-width of the interval is the same for every replication
    scaled_crit = ss.norm.ppf(q = 1 - p) * (sigma / np.sqrt(n))
    # One row per replication, one column per observation
    norm = np.random.normal(loc = mu, scale = sigma, size = (rep, n))
    xbar = norm.mean(1)
    low = xbar - scaled_crit
    up = xbar + scaled_crit
    rem = (mu > low) & (mu < up)
    m = np.c_[xbar, low, up, rem]
    inside = np.sum(m[:, 3])
    per = 100 * inside / rep  # share of covering intervals, as a percentage
    desc = ("There are " + str(inside) + " confidence intervals that contain "
            "the true mean (" + str(mu) + "), that is " + str(per) + " percent of the total CIs")
    return {"Matrix": m, "Decision": desc}
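For readers who want to check the coverage claim without NumPy or SciPy, the same experiment can be sketched with the standard library alone. This is not the article's implementation: the function name coverage, the seed parameter, and the choice of 2000 repetitions are assumptions made so that the run is reproducible.

```python
import random
from statistics import NormalDist, mean


def coverage(n=10, mu=3.0, sigma=5 ** 0.5, p=0.025, rep=100, seed=0):
    """Fraction of known-sigma confidence intervals that contain mu."""
    rng = random.Random(seed)  # seeded generator so repeated runs agree
    half_width = NormalDist().inv_cdf(1 - p) * sigma / n ** 0.5
    hits = 0
    for _ in range(rep):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = mean(sample)
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / rep


print(coverage(rep=2000))
```

With p = 0.025 in each tail, the printed fraction should land near 0.95, which is the nominal confidence level.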


How are loop control statements used in Python? For example, what does i = i + 4 mean in this code?

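The keyword while is followed by the loop condition, here that i is less than the length of the variable; once the condition becomes false, that is, once i is no longer less than that length, the loop stops. Start with i = 0. In i = i + 4, the current value 0 is substituted for the i on the right-hand side, giving 4, and that result is then assigned to the variable i on the left, so i is now 4. The condition is checked again: if it no longer holds the loop stops, and if it still holds the loop continues. Since i is now 4, the next pass computes 4 + 4 = 8 and rebinds i to 8 before the condition is tested once more. The point is that i is a variable: its value can change, and it can be reassigned. The loop keeps comparing i against the condition, stopping when it fails and adding another 4 when it holds, and so on.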
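The code the question refers to is not shown, so the following is only a sketch of the pattern being described, assuming the loop walks through a 16-character string (text and visited are hypothetical names).

```python
text = "abcdefghijklmnop"   # hypothetical 16-character string
i = 0
visited = []

while i < len(text):        # keep looping while i is less than the length
    visited.append(i)       # record the value of i seen on this pass
    i = i + 4               # evaluate i + 4, then rebind i to the result

print(visited)              # [0, 4, 8, 12]
print(i)                    # 16
```

When i reaches 16 the condition 16 < 16 is false, so the loop ends without visiting 16.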

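What control statements does Python have?

Control statements:

if statement: runs a block when its condition holds. Often combined with else and elif (equivalent to else if).

for statement: iterates over an iterable such as a list, string, dictionary, or set, processing each element in turn.

while statement: runs a block repeatedly while its condition is true.

try statement: used with except and finally to handle exceptions raised while the program runs.

class statement: defines a class (a type).

def statement: defines a function or a method of a class.

pass statement: a placeholder that does nothing.

assert statement: used while debugging to check that a condition holds.

with statement: syntax available since Python 2.6 that runs a block inside a context, for example acquiring a lock before the block runs and releasing it when the block exits.

yield statement: used inside a generator function to return one element. Since Python 2.5, yield is also an expression.

raise statement: raises an exception.

import statement: imports a module or package.

from … import statement: imports a module from a package, or an object from a module.

import … as statement: binds the imported object to a name of your choice.

in operator: tests whether an object is contained in a string, list, or tuple.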
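Several of the statements listed above can be seen together in one short script. The function names classify, evens and safe_divide are made up for illustration; the lock comes from the standard threading module.

```python
import threading


def classify(n):
    # if / elif / else pick exactly one branch
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"


def evens(limit):
    # yield makes this a generator function
    for k in range(limit):          # for iterates over an iterable
        if k % 2 == 0:
            yield k


def safe_divide(a, b):
    # try / except / finally handle an exception
    try:
        return a / b
    except ZeroDivisionError:
        return None
    finally:
        pass                        # cleanup would go here; pass does nothing


labels = [classify(n) for n in (-2, 0, 5)]
assert "zero" in labels             # in tests membership; assert checks a condition

lock = threading.Lock()
with lock:                          # lock acquired on entry, released on exit
    total = sum(evens(7))

print(labels, list(evens(7)), safe_divide(1, 0), total)
```

The finally clause runs whether or not the division succeeds, and the with block releases the lock even if its body raises.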

How is the while statement used in Python?

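1. The while loop statement

Iteration means executing the same block of code repeatedly. A programming construct that implements iteration is called a loop.

Suppose the task is to print the numbers from 1 to 100 on the screen. Using only what has been covered so far, one might write:

print(1)
print(2)
print(3)
# print(4) through print(99) omitted here
print(100)

That is tedious and not very smart. One benefit of learning to program is that repetitive work becomes easy to handle. Python has two kinds of loop: the while loop and the for loop.

Like the if statement, the while statement first evaluates a Boolean expression. If it is true, the loop body is executed and the expression is evaluated again; if it is false, the loop ends. The syntax of a while loop is:

while expression:
    loop body

Using a while loop to print the numbers from 1 to 100: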

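>>> n = 1   # printing starts at 1, so n is first set to 1
>>> while n <= 100:   # each pass checks whether n is less than or equal to 100
...     print(n)   # print the value of n
...     n = n + 1   # increase n by 1 each time so that 2, 3, 4, ... get printed
...
……
100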
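Since the text mentions that Python also has a for loop, here is a hedged sketch of the same counting task written with for and range. Collecting the values in a list named numbers is only for illustration, so the result can be inspected instead of printing 100 lines.

```python
numbers = []
for n in range(1, 101):     # range(1, 101) yields 1, 2, ..., 100
    numbers.append(n)

print(numbers[0], numbers[-1], len(numbers))   # 1 100 100
```

With for, the counter is advanced by range itself, so there is no "n = n + 1" line to forget.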

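A while loop keeps checking its condition and exits only when the condition no longer holds. This makes it easy to write a number-guessing game. One person first enters a number (a natural number) as the answer. Another person then guesses: if the guess is larger than the answer, print "Too high!"; if it is smaller, print "Too low!"; and when the guess finally equals the answer, print "Congratulations, you got it!". Example code:

answer = int(input('Enter the secret number: '))
number = int(input('Enter your guess: '))
while number != answer:   # if number differs from answer (a wrong guess), enter the loop
    if number > answer:   # the guess is larger than the answer
        print('Too high!')
    else:   # the guess is smaller than the answer
        print('Too low!')
    number = int(input('Guess again: '))
print('Congratulations, you got it!')   # leaving the loop means the guess was right

It is best to run this program in script mode, enter some data, and check the result:

Enter the secret number: 77
Enter your guess: 20
Too low!
Guess again: 90
Too high!
Guess again: 80
Too high!
Guess again: 77
Congratulations, you got it!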
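Because the game reads from the keyboard, it is awkward to test automatically. One possible variation, sketched below, replays a prepared list of guesses instead of calling input; the function name play and the message strings are assumptions for illustration.

```python
def play(answer, guesses):
    """Replay the guessing game with a prepared sequence of guesses."""
    messages = []
    remaining = iter(guesses)
    number = next(remaining)
    while number != answer:             # same loop condition as the game above
        messages.append("too high" if number > answer else "too low")
        number = next(remaining)        # raises StopIteration if guesses run out
    messages.append("correct")
    return messages


print(play(77, [20, 90, 80, 77]))
# ['too low', 'too high', 'too high', 'correct']
```

The sequence of guesses matches the sample session shown above, so the messages line up with it.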

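2. Beware of infinite loops

An infinite loop is one in which the program keeps repeating the same piece of code and cannot terminate the loop through its own control flow. Beginners can easily write one by accident. For example, in the earlier while loop that prints 1 to 100, deleting the last line, "n = n + 1", turns it into an infinite loop: n never increases, so the condition "n <= 100" always holds and the loop never exits. Try running the following code:

>>> n = 1
>>> while n <= 100:
...     print(n)
...
……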

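After the final Enter key is pressed, the program prints 1 on the screen without end until the window is forcibly closed or Ctrl+C is pressed to terminate it. An infinite loop can freeze the computer or cause other unintended consequences, so take extra care with while loops and check that there is an "exit" where the condition can eventually fail.

It should be noted that some applications need an infinite loop (for example, the outermost layer of an operating system is an infinite loop that keeps the computer running). In other words, writing an infinite loop is not necessarily wrong, but do it only when you know exactly what you are doing.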
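A common way to write a deliberate infinite loop safely is while True combined with break, which is not covered in the text above. The sketch below is one illustration of that idiom; the function name first_power_of_two_above is made up.

```python
def first_power_of_two_above(limit):
    value = 1
    while True:                 # deliberately "infinite"
        if value > limit:       # the exit: leave once the goal is reached
            break
        value = value * 2
    return value


print(first_power_of_two_above(100))   # 128
```

The loop is guaranteed to end because value doubles on every pass and must eventually exceed any finite limit.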


(Editor: IT教学网)
