Category Archives: Python|R

Core Python Programming Reading Note 3

August 30, 2012 by debugo · 2 Comments

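1. StringIO, cStringIO
A StringIO object behaves much like a file object, except that it is an in-memory "file" rather than a file on disk. It can be manipulated in the same way as a disk file.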

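>>> import StringIO
>>> s = StringIO.StringIO("hello world")
>>> s.write("abcdefg\n")
>>> s.write('加油中国')
>>> s.seek(0)
>>> print s.read()
abcdefg
加油中国
>>> s.seek(-4, 2)    # 4 bytes back from the end (two characters, assuming a 2-byte encoding such as GBK)
>>> print s.read()
中国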

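As shown, StringIO behaves almost exactly like a file. It also provides a convenient method for retrieving the whole buffer: getvalue().
To retrieve the data with read(), the "file pointer" must first be positioned with seek().
The standard library also provides the cStringIO module, which behaves much like StringIO but runs faster.
There are a few caveats when using cStringIO:
1. cStringIO.StringIO cannot be used as a base class;
2. If initial data is passed when creating a cStringIO.StringIO object, the resulting object is read-only. The following code is therefore an error:
s = cStringIO.StringIO("JGood\n")
s.write("OOOKKK")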
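The difference between getvalue() and read() can be sketched as follows. This is a minimal illustration that assumes Python 3, where io.StringIO takes the place of the StringIO and cStringIO modules; the variable names are made up for the example.

```python
import io

# io.StringIO is the Python 3 successor of the StringIO module.
buf = io.StringIO()
buf.write("abcdefg\n")
buf.write("hello")

# getvalue() returns the whole buffer regardless of the current position.
whole = buf.getvalue()

# read() starts at the current position, which is the end after writing.
at_end = buf.read()

# Rewind first, then read() sees everything.
buf.seek(0)
after_seek = buf.read()

print(repr(whole))
print(repr(at_end))
print(repr(after_seek))
```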
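2. module
A module is the logical way of organizing Python code, while a file is the physical way of organizing it. A file can therefore be regarded as a module, and a module as a file.
A module's file name is the module name plus the .py extension. Each module defines its own namespace.
The module search path is the list sys.path. To make your own modules importable, append their directory to it: sys.path.append("/home/luffy/pycode")
There are three namespaces: the built-in namespace (__builtins__), the global namespace, and the local namespace. The built-in namespace is loaded first, followed by the global namespace of the module being executed. The local namespace changes constantly as functions are called and return, whereas the global namespace persists for the life of the module. Use globals() and locals() to obtain the names in the global and local namespaces respectively.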

>>> globals()
{'a': 1, 'lam': <function <lambda> at 0x01A687F0>, '__builtins__': <module '__bu
iltin__' (built-in)>, '__package__': None, 'x': 99, 'y': 10, '__name__': '__main
__', 'primes': <function <lambda> at 0x01A68830>, '__doc__': None}

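A useful feature of Python is that you can get a namespace almost anywhere you need to store data. For example, you can add attributes to a function (such as version or __doc__):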

>>> a =1
>>> def f():
...     a = 2
...     print a
...
>>> f()
2
>>> f.a=3
>>> f()
2
>>> print f.a
3

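As this shows, the local variable a inside f and the function attribute f.a are different things: they live in different namespaces.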
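The separation between a function's local names and its attributes can be inspected directly. The following is a small sketch written for Python 3; it reuses the names a and f from the session above.

```python
a = 1  # module-level (global) name

def f():
    a = 2                      # local name, visible only inside f
    return sorted(locals())    # names in f's local namespace

f.a = 3                        # attribute stored on the function object

local_names = f()
print(local_names)             # ['a']
print(a)                       # 1
print(f.a)                     # 3
print(f.__dict__)              # {'a': 3}
```

Function attributes live in f.__dict__, which is why assigning f.a never changes what f() sees as a.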
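—– Customary import order —–
Python standard library modules
Third-party Python modules
Application-specific modules
—– The from-import statement —–
Brings names from a module into the current scope.
Note that from module import * pollutes the namespace of the current scope, so use it with care.
—– The import xxx as yyy statement —–
import Tkinter as tk
—– Modules are executed on import —–
Loading a module causes it to be executed: the top-level code of the imported module runs immediately.
A module is loaded only once, no matter how many times it is imported. This prevents its code from being executed repeatedly when it is imported from several places.
—– Name collisions with imported names —–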

# importee.py
foo = 'abc'
def show():
    print 'importee %s' % foo

# importer.py
from importee import foo, show
show()                        # prints: importee abc
foo = 'efg'
print 'importee %s' % foo     # foo in the current (importer) namespace
show()                        # still uses foo from importee's own namespace

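C:\Users\luffy>python importer.py
importee abc
importee efg
importee abc
Importing this way is error-prone, so it is better to use a plain import and refer to names by their fully qualified identifiers.
The __import__() function
__import__(module_name[, globals[, locals]])
The reload() function
Re-imports a module that has already been imported. The argument must be the module object itself, not a string containing its name.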
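A short sketch of both functions, assuming Python 3, where reload() has moved to importlib.reload(). The modules math and json are used only because they are always available.

```python
import importlib

# __import__ takes the module name as a string and returns the module object.
math_mod = __import__("math")
print(math_mod.sqrt(16.0))          # 4.0

# importlib.import_module is the usual wrapper for dynamic imports.
json_mod = importlib.import_module("json")
print(json_mod.dumps([1, 2]))       # [1, 2]

# reload() takes the module object (not its name) and returns the module.
reloaded = importlib.reload(json_mod)
print(reloaded is json_mod)         # True
```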
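3. Package
A package is a hierarchical directory structure that defines a Python application environment made up of modules and subpackages. Packages address the following needs:
Adding hierarchical organization to a flat namespace;
Grouping related modules together (subpackages);
Resolving conflicting module names;
Using a directory structure instead of a jumble of files.
A package directory needs an __init__.py file, which is executed when the package is first imported. A directory without one is not treated as a package, and the interpreter issues an ImportWarning.
The sys.modules variable is a dictionary of the modules currently loaded (completely and successfully) into the interpreter. Module names are the keys and the module objects are the values (their repr shows where each was loaded from).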
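The following sketch builds a throwaway package in a temporary directory and imports it, to show __init__.py being executed and the entries that appear in sys.modules. It assumes Python 3; the package name demo_pkg_sketch and its contents are invented for the illustration.

```python
import importlib
import os
import sys
import tempfile

# Build a tiny package on disk: demo_pkg_sketch/__init__.py and tools.py.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg_sketch")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as fh:
    fh.write("initialized = True\n")
with open(os.path.join(pkg_dir, "tools.py"), "w") as fh:
    fh.write("def double(x):\n    return 2 * x\n")

# Make the parent directory importable, then import the submodule.
sys.path.insert(0, root)
importlib.invalidate_caches()
tools = importlib.import_module("demo_pkg_sketch.tools")

# Importing the submodule also imported (and executed) the package itself.
pkg = sys.modules["demo_pkg_sketch"]
result = tools.double(21)
print(pkg.initialized)                                  # True
print(result)                                           # 42
print(sys.modules["demo_pkg_sketch.tools"] is tools)    # True
```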
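4. lambda functions
A lambda is a quick way to define a minimal single-expression function. The idea is borrowed from Lisp, and a lambda can be used anywhere a function is required. Lambdas are often used together with map, reduce, and filter, for example to compute a factorial:
print reduce(lambda x,y:x*y, range(1, 1001))
A custom sort order:
list_people=[People(21,'male'),People(20,'female'),People(34,'male'),People(19,'female')]
list_people.sort(lambda p1,p2:cmp(p1.age,p2.age))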
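The two snippets above rely on Python 2 features (the cmp argument of sort and the built-in reduce). A rough Python 3 equivalent is sketched below; the People class is a hypothetical stand-in, since the original notes do not define it.

```python
from functools import reduce  # reduce lives in functools in Python 3

class People(object):
    """Hypothetical stand-in for the People class used in the text."""
    def __init__(self, age, gender):
        self.age = age
        self.gender = gender

list_people = [People(21, 'male'), People(20, 'female'),
               People(34, 'male'), People(19, 'female')]

# key= receives one element and returns the value to sort by.
list_people.sort(key=lambda p: p.age)
ages = [p.age for p in list_people]
print(ages)                                   # [19, 20, 21, 34]

# The same reduce-with-lambda pattern on a small input: 5! = 120.
factorial_5 = reduce(lambda x, y: x * y, range(1, 6))
print(factorial_5)                            # 120
```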
Other shorthand forms:
>>> arrayA = [1,2,3,4,5,6,7]
>>> arrayB = [ number for number in arrayA if number % 2 ]
>>> people_who_want_to_watch_av_film = [person1, person2, person3, person4]
Then we check their ages, build the list of people who are allowed to watch, and print it:
>>> people_who_can_watch_av_film = ['Hi, %s %s, you can watch av!' % (person['surname'], person['givename']) for person in people_who_want_to_watch_av_film if person['age'] >= 18]
A lambda factory:

>>> def make_incrementor(n):
...     return lambda x: x + n
...
>>> f = make_incrementor(42)
>>> f(0)
42
>>> f(1)
43

Removing the even numbers from a list:

>>> map(lambda x:x if x%2 else None,[x for x in range(100)])[1::2]
Finding primes:
>>> primes = lambda n:[x for x in range(2,n) if not [y for y in range(2,int(x**0.5 +1)) if x % y == 0]]
>>> primes(500)
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71,
73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157,
163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241,
251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347,
349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439,
443, 449, 457, 461, 463, 467, 479, 487, 491, 499]
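For comparison, here is a more readable sketch of the same two ideas in Python 3. It uses a named function and filter rather than nested lambdas; the logic is still plain trial division, not a faster sieve.

```python
def primes(n):
    """Return the primes below n by trial division up to sqrt(x)."""
    return [x for x in range(2, n)
            if all(x % y != 0 for y in range(2, int(x ** 0.5) + 1))]

# Odd numbers below 10 via filter, instead of map plus slicing.
odds = list(filter(lambda x: x % 2, range(10)))

print(primes(30))   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
print(odds)         # [1, 3, 5, 7, 9]
```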

5. class & magic method
(1). __init__ & __del__

class Simple:
    def __init__(self, other_parameter=None):   # called when an instance is created
        pass
    def __del__(self):                          # called when the instance is about to be destroyed
        pass
    def func(self):
        pass

(2). __str__ and __unicode__
__str__() is a Python “magic method” that defines what should be returned if you call str() on the object. Thus, you should always return a nice, human-readable string for the object’s __str__. Although this isn’t required, it’s strongly encouraged.

class Person():
    def __init__(self, name):
        self.name = name
    def __str__(self):
        return 'name is %s' % self.name
print Person('luffy')

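__unicode__(self) works the same way; it is called by the built-in function unicode(obj).
__repr__(self) is similar; it is called by repr(obj) and should return a more detailed, unambiguous representation of the object.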
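A minimal sketch of the difference between the two methods, written for Python 3 (which has no __unicode__). The repr format chosen here is only one reasonable convention.

```python
class Person(object):
    def __init__(self, name):
        self.name = name

    def __str__(self):
        # Human-readable form, used by str() and print().
        return 'name is %s' % self.name

    def __repr__(self):
        # Detailed form, used by repr() and the interactive prompt.
        return 'Person(name=%r)' % self.name

p = Person('luffy')
print(str(p))     # name is luffy
print(repr(p))    # Person(name='luffy')
print([p])        # [Person(name='luffy')]  (containers use repr for their items)
```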
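(3). Private members
Python objects have no access control; every attribute can be reached from outside. Programmers rely on conventions instead: a name with two leading underscores, such as __name, marks an attribute as private. Inside a class body such names are mangled to _ClassName__name, which discourages but does not prevent outside access.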
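The effect of the double-underscore convention can be seen in a few lines. Account is a hypothetical class invented for this sketch.

```python
class Account(object):          # hypothetical example class
    def __init__(self, owner):
        self.__owner = owner    # stored on the instance as _Account__owner

    def owner(self):
        return self.__owner     # mangled the same way inside the class

acct = Account('luffy')
print(acct.owner())                      # luffy
print(hasattr(acct, '__owner'))          # False
print(acct._Account__owner)              # luffy
```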
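(4). __iter__
The built-in function iter
i1 = iter(itr, 'c')
This returns an iterator over itr that stops as soon as a call yields 'c'. What does this require of itr? In this form itr must be callable, that is, it must implement __call__.
i1 = iter(itr)
In this form itr must implement __iter__, and that method must return an iterator object.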

class Itr(object):
    def __init__(self):
        self.result = ['a', 'b', 'c', 'd']
        self.i = iter(self.result)
    def __call__(self):
        res = next(self.i)
        print("__call__ called, which would return ", res)
        return res
    def __iter__(self):
        print("__iter__ called")
        return iter(self.result)
itr = Itr()

# itr must be callable, otherwise iter() cannot return a callable-iterator
>>> i1 = iter(itr, 'c')
>>> print("i1 = ", i1)
('i1 = ', <callable-iterator object at 0x01ACE9D0>)
# For i2 the class only needs to implement __iter__
>>> i2 = iter(itr)        # i2 is just an ordinary list iterator
__iter__ called
>>> print("i2 = ", i2)
('i2 = ', <listiterator object at 0x01ACEA30>)
>>> for i in i1:
...     print i
...
('__call__ called, which would return ', 'a')
a
('__call__ called, which would return ', 'b')
b
('__call__ called, which would return ', 'c')
>>> for i in i2:
...     print i
...
a
b
c
d
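A common practical use of the two-argument form is reading a stream in fixed-size chunks until it is exhausted. The sketch below assumes Python 3 and uses an in-memory stream so that it is self-contained.

```python
import io
from functools import partial

stream = io.StringIO("abcdefghij")

# iter(callable, sentinel) calls the callable repeatedly and stops
# when it returns the sentinel; read(4) returns '' at end of stream.
chunks = list(iter(partial(stream.read, 4), ''))
print(chunks)     # ['abcd', 'efgh', 'ij']
```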

(5). super()
Calling a parent class's methods:

>>> class A:
...     def __init__(self):
...             print 'enter A init1'
...             print 'enter A init2'
...     def __del__(self):
...             print "leave A"
...
>>> class B(A):
...     def __init__(self):
...             print "enter B init 1"
...             # A.__init__(self), or super(B, self).__init__() if A were a new-style class
...             print "enter B init 2"
...     def __del__(self):
...             print "leave B"
...
>>> B()
enter B init 1
enter B init 2
leave B
<__main__.B instance at 0x01B16300>

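This shows that Python inheritance does not automatically call the parent class's constructor or destructor; you must call them explicitly.
1. super is not a function but a class. An expression such as super(B, self) calls the initializer of the super class and produces a super object;
2. The initializer of super does nothing special; it simply records the class and the instance;
3. A call to super(B, self).func does not necessarily call func of the current class's parent; it calls func of the class that follows B in the MRO of the instance's type;
4. For multiple inheritance, Python uses the MRO (method resolution order) to ensure that each parent's method is called in turn, and only once (provided every class uses super);
5. Mixing super with unbound method calls is dangerous: a parent method that should be called may be skipped, or a parent method may be called more than once.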
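Points 3 and 4 are easiest to see with a diamond-shaped hierarchy. In this sketch (Python 3, with illustrative class names A to D) every __init__ records its name before delegating with super.

```python
calls = []

class A(object):
    def __init__(self):
        calls.append('A')

class B(A):
    def __init__(self):
        calls.append('B')
        super(B, self).__init__()   # next class in the MRO, not necessarily A

class C(A):
    def __init__(self):
        calls.append('C')
        super(C, self).__init__()

class D(B, C):
    def __init__(self):
        calls.append('D')
        super(D, self).__init__()

D()
mro_names = [cls.__name__ for cls in D.__mro__]
print(mro_names)   # ['D', 'B', 'C', 'A', 'object']
print(calls)       # ['D', 'B', 'C', 'A']
```

When the instance is a D, super(B, self) resolves to C rather than A, and A.__init__ runs exactly once.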
(6). Function name and doc string
func_doc The function's documentation string, or None if unavailable Writable
__doc__ Another way of spelling func_doc Writable
func_name The function's name Writable
__name__ Another way of spelling func_name Writable
(7). magic method
object.__lt__(self, other)
object.__le__(self, other)
object.__eq__(self, other)
object.__ne__(self, other)
object.__gt__(self, other)
object.__ge__(self, other)
object.__cmp__(self, other) # Called by comparison operations if rich comparison (see above) is not defined. Should return a negative integer if self < other, zero if self == other, a positive integer if self > other.
object.__hash__(self) # Called by built-in function hash()
If a class does not define a __cmp__() or __eq__() method it should not define a __hash__() operation either;
if it defines __cmp__() or __eq__() but not __hash__(), its instances will not be usable in hashed collections.
If a class defines mutable objects and implements a __cmp__() or __eq__() method, it should not implement __hash__(),
since hashable collection implementations require that an object's hash value is immutable
(if the object's hash value changes, it will be in the wrong hash bucket).
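The rules quoted above can be illustrated with a small value class that defines __eq__ and __hash__ consistently. Point is a hypothetical class for this sketch, and its instances are treated as immutable by convention only.

```python
class Point(object):            # hypothetical value class
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __ne__(self, other):    # must be written explicitly in Python 2
        return not self == other

    def __hash__(self):
        # Equal objects must produce equal hashes.
        return hash((self.x, self.y))

    def __lt__(self, other):
        return (self.x, self.y) < (other.x, other.y)

points = {Point(1, 2), Point(1, 2), Point(0, 5)}
print(len(points))                           # 2
print(Point(1, 2) in points)                 # True
print(min(points).x)                         # 0
```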

Posted in Python|R.

Core Python Programming Reading Note 2

August 29, 2012 by debugo · 2 Comments

1. zip, enumerate
enumerate(iterable) takes any iterable, such as a string or a list, and yields (index, element) pairs.
Example 1: find the positions at which 1 occurs in a string:

def xread_line(line):
    return ((idx, int(val)) for idx, val in enumerate(line) if val != '0')

print list(xread_line('0001110101'))
[(3, 1), (4, 1), (5, 1), (7, 1), (9, 1)]

print xread_line('0001110101')
<generator object <genexpr> at 0x01ABA468>

Example 2:

for i in range(len(L)):
    item = L[i]
    # ... compute some result based on item ...
    L[i] = result

This can be rewritten using enumerate() as:

for i, item in enumerate(L):
    # ... compute some result based on item ...
    L[i] = result
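A concrete version of the pattern, together with zip from the section title, might look like this in Python 3. The sample data is made up for the illustration.

```python
names = ['ann', 'bob', 'cy']       # hypothetical sample data
scores = [90, 72, 85]

# enumerate: update a list in place using index and item together.
for i, name in enumerate(names):
    names[i] = name.capitalize()

# zip: pair elements of two sequences position by position.
pairs = list(zip(names, scores))

# enumerate accepts a start value for the counter.
ranked = ['%d. %s' % (n, name) for n, name in enumerate(names, 1)]

print(names)    # ['Ann', 'Bob', 'Cy']
print(pairs)    # [('Ann', 90), ('Bob', 72), ('Cy', 85)]
print(ranked)   # ['1. Ann', '2. Bob', '3. Cy']
```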


Posted in Python|R.

Core Python Programming Reading Note 1

August 27, 2012 by debugo · Leave a comment

1. built-in functions
int(obj)
str(obj)
len(obj)
type(obj)
help(obj) shows the documentation for an object
dir(obj) lists an object's attributes and methods

>>> dir(int)
['__abs__', '__add__', '__and__', '__class__', '__cmp__', '__coerce__', '__delat
tr__', '__div__', '__divmod__', '__doc__', '__float__', '__floordiv__', '__forma
t__', '__getattribute__', '__getnewargs__', '__hash__', '__hex__', '__index__',
'__init__', '__int__', '__invert__', '__long__', '__lshift__', '__mod__', '__mul
__', '__neg__', '__new__', '__nonzero__', '__oct__', '__or__', '__pos__', '__pow
__', '__radd__', '__rand__', '__rdiv__', '__rdivmod__', '__reduce__', '__reduce_
ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ro
r__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rx
or__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '_
_truediv__', '__trunc__', '__xor__', 'bit_length', 'conjugate', 'denominator', '
imag', 'numerator', 'real']

>>> help(unicode)
Help on class unicode in module __builtin__:

class unicode(basestring)
 |  unicode(string [, encoding[, errors]]) -> object
 |
 |  Create a new Unicode object from the given encoded string.
 |  encoding defaults to the current default string encoding.
 |  errors can be 'strict', 'replace' or 'ignore' and defaults to 'strict'.
 |
 |  Method resolution order:
 |      unicode
 |      basestring
 |      object
 |
 |  Methods defined here:
 |
 |  __add__(...)
 |      x.__add__(y) <==> x+y
 |
 |  __contains__(...)
 |      x.__contains__(y) <==> y in x
 |
 |  __eq__(...)
 |      x.__eq__(y) <==> x==y
-- More  --

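The help viewer works like man: press space to page down and q to quit.
2. id, type, isinstance
Every object has an id, obtained with the id() function. An object's type can be obtained with type().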

>>> str=u'abcd'
>>> type(str)
<type 'unicode'>
>>> id(str)
19647568

Types of classes and instances:

>>> class Foo:
...     def a(self):
...             print self
...
>>> type(Foo)
<type 'classobj'>
>>> a=Foo()
>>> type(a)
<type 'instance'>

To check whether an object is an instance of a class, use isinstance():

>>> isinstance(a, Foo)
True
>>> isinstance(1.23,(int,float))
True
>>> isinstance(1.23,(long,complex))
False

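When an object's reference count drops to 0, it is reclaimed automatically by the garbage collector.
3. eval, exec, compile
eval(str [,globals [,locals ]])
evaluates the string str as a Python expression and returns the result.
Similarly, the exec statement executes the string str as Python code. The form exec(str) is also accepted, but exec produces no return value.
Finally, execfile(filename [,globals [,locals ]])
executes the code in a file. See the following example: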

>>> eval('3+4')
7
>>> exec 'a=100'
>>> a
100
>>> execfile(r'c:\test.py')
hello,world!

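By default, code run by eval(), exec, and execfile() executes in the current namespace. All three also accept one or two optional dictionaries to use as the global and local namespaces for the executed code. For example:

>>> globals = {'x': 7,
...            'y': 10,
...            'birds': ['Parrot', 'Swallow', 'Albatross']
...           }
>>> locals = {}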

# Use the dictionaries above as the global and local namespaces

>>> a = eval("3*x + 4*y", globals, locals)
>>> exec "for b in birds: print b" in globals, locals   # note the syntax here
>>> execfile("foo.py", globals, locals)

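If you omit one or both namespace arguments, the current global and local namespaces are used.
Note that the exec statement in the example is used differently from eval() and execfile(): exec is a statement (like print or while), whereas eval() and execfile() are built-in functions.
When a string is run by exec, eval(), or execfile(), the interpreter first compiles it to bytecode and then executes it. Compilation takes time, so if a piece of code will be executed many times it is better to precompile it once; this avoids recompiling on every run and can make the program noticeably faster.
compile(str, filename, kind) compiles a string to bytecode. str is the string to compile, filename is the name of the file the code is considered to come from (it appears in tracebacks), and kind specifies how the code is compiled: 'single' for a single statement, 'exec' for a sequence of statements, or 'eval' for an expression. compile() returns a code object, which can in turn be passed to eval() or the exec statement for execution. For example: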

>>> str = "for i in range(0,10): print i"
>>> c = compile(str, '', 'exec')      # compile to a bytecode object
>>> exec c                            # execute it
0
1
2
3
4
5
6
7
8
9
>>> str2 = "3*x + 4*y"
>>> c2 = compile(str2, '', 'eval')    # compile as an expression
>>> locals = {'x': 3, 'y': 4}
>>> eval(c2, {}, locals)
25
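The precompilation idea carries over to Python 3, where exec is a function rather than a statement. The sketch below compiles once and evaluates several times; the file name "<sketch>" is an arbitrary label chosen for the example.

```python
# Compile the expression once, then evaluate it many times
# against different namespaces.
expr = compile("3*x + 4*y", "<sketch>", "eval")

results = [eval(expr, {}, {"x": x, "y": 10}) for x in range(3)]
print(results)             # [40, 43, 46]

# 'exec' mode handles statements; names they bind land in the
# namespace dictionary that is passed in.
namespace = {}
block = compile("total = sum(range(5))", "<sketch>", "exec")
exec(block, namespace)
print(namespace["total"])  # 10
```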

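4. is (is not) & == (!=)
is tests whether two variables refer to the same object, that is, whether their id() values are equal. == tests whether the objects' contents are equal (as defined by the object's __eq__ or __cmp__ method).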

>>> str1=u'abcde'
>>> str2=u'abcde'
>>> str1 == str2
True
>>> str1 is str2
False
>>> id(str1)
19647568
>>> id(str2)
31030136

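Note that CPython caches small integers and interns some string constants, so variables bound to equal values of these kinds may share the same id.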
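Because caching makes small integers and short strings unreliable for demonstrating identity, lists give a clearer picture. A minimal sketch that runs on Python 3:

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a

print(a == b)          # True: same contents
print(a is b)          # False: two separate list objects
print(a is c)          # True: both names refer to one object
print(id(a) == id(c))  # True

c.append(4)
print(a)               # [1, 2, 3, 4]
```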
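Some mechanisms that Python does not have:
a). No separate float and double; there is only float
b). No pointers
c). No char/byte type
d). No short type, but there are int and long, and the type is chosen automatically.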

>>> i = 1
>>> type(i)
<type 'int'>
>>> i = 10000000000000000000000
>>> type(i)
<type 'long'>

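★5. map, reduce, filter, apply
These built-ins are implemented in C and optimized, so they can replace explicit loops to improve performance.
map(fun, list) # applies the one-argument function fun to each element of list and returns a list of the results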

>>> a=(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
>>> fun=lambda x:x**2
>>> print map(fun,a)
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400]
reduce(function, sequence[, initial])     # folds sequence with the two-argument function: function takes the accumulated value and the next item and returns the new accumulated value.
>>> reduce(lambda x, y: x+y, [1, 2, 3, 4, 5])         # computed as ((((1+2)+3)+4)+5)
15
filter(function, sequence)    # calls function(item) for each item in sequence and returns the items for which the result is true, as a list, string, or tuple depending on the type of sequence:
>>> myfilter=lambda x:x%2!=0 and x%3!=0
>>> filter(myfilter,range(2,25))
[5, 7, 11, 13, 17, 19, 23]
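In Python 3 the same three functions behave slightly differently: map and filter return lazy iterators, and reduce has moved to functools. A sketch of the examples above under that assumption:

```python
from functools import reduce   # reduce moved to functools in Python 3

# map and filter are lazy in Python 3, so wrap them in list() to see results.
squares = list(map(lambda x: x ** 2, range(1, 6)))
total = reduce(lambda x, y: x + y, [1, 2, 3, 4, 5])
# The optional initial value seeds the accumulator and is the result
# for an empty sequence.
empty_total = reduce(lambda x, y: x + y, [], 0)
kept = list(filter(lambda x: x % 2 != 0 and x % 3 != 0, range(2, 25)))

print(squares)      # [1, 4, 9, 16, 25]
print(total)        # 15
print(empty_total)  # 0
print(kept)         # [5, 7, 11, 13, 17, 19, 23]
```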

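apply(func [, args [, kwargs ]]) # calls func indirectly, taking its positional arguments from the sequence args and its keyword arguments from the dictionary kwargs. The return value of apply is the return value of func. The positional arguments in the tuple are ordered, and their order must match that of func's formal parameters.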
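apply() was removed in Python 3; argument unpacking with * and ** does the same job and also works in Python 2. The function describe below is hypothetical and exists only to show the call.

```python
def describe(name, age, city='unknown'):   # hypothetical function for illustration
    return '%s, %d, %s' % (name, age, city)

args = ('luffy', 19)
kwargs = {'city': 'Foosha'}

# Equivalent to the Python 2 call apply(describe, args, kwargs).
result = describe(*args, **kwargs)
print(result)      # luffy, 19, Foosha
```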
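# Suppose an image must be split into many tiles of equal size. The conventional approach is a double loop that enumerates the top-left corner of each tile.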

def split_all_image(im, size_box):
    """
    split the image into many subimages
    the subimage's size (x,y) is size_box
    type size_box = tuple
    ex: split_all_image(im , (16,16))
    """
    max_x, max_y = im.size
    step_x, step_y = size_box
    # largest top-left corner that still leaves room for a whole tile
    max_x -= step_x
    max_y -= step_y
    ys = xrange(0, max_y + 1, step_y)    # + 1 so the last full tile is included
    tmp = [map(lambda y: (x, y), ys) for x in xrange(0, max_x + 1, step_x)]
    xy = reduce(lambda x, y: x + y, tmp)
    blocks = map(lambda x: get_signle_block(im, (x[0], x[1], x[0]+step_x, x[1]+step_y)), xy)
    return blocks
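The coordinate arithmetic can be checked without an imaging library. The sketch below (Python 3) computes only the tile boxes; tile_boxes is a hypothetical helper, and it uses itertools.product in place of the map and reduce combination.

```python
from itertools import product

def tile_boxes(size, size_box):
    """Return (left, upper, right, lower) boxes for every whole tile."""
    max_x, max_y = size
    step_x, step_y = size_box
    # Top-left corners that still leave room for a complete tile.
    xs = range(0, max_x - step_x + 1, step_x)
    ys = range(0, max_y - step_y + 1, step_y)
    return [(x, y, x + step_x, y + step_y) for x, y in product(xs, ys)]

boxes = tile_boxes((32, 48), (16, 16))
print(len(boxes))     # 6
print(boxes[0])       # (0, 0, 16, 16)
print(boxes[-1])      # (16, 32, 32, 48)
```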

In data analysis, operations on collection objects and functions such as map, filter, and reduce are especially important.

# Remove duplicates from a list
i = [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 5, 6, 9, 0]
# o = [0, 1, 2, 3, 4, 5, 6, 7, 9]
o = list( set(i) )
# Flatten a list of lists
i = [[1], [2, 3], [4, 5, 6], [1, 2, 3]]
# o = [1, 2, 3, 4, 5, 6, 1, 2, 3]
o = sum( i, [] )
# Number the items of a sequence
i = ['a', 'b', 'c']
# o = [(0, 'a'), (1, 'b'), (2, 'c')]
o = list( enumerate(i) )
# Reverse a sequence
i = ['a', 'b', 'c']
# o = ['c', 'b', 'a']
o = list( reversed(i) )
# Wrap every non-list element in a list
i = [[1], [2, 3], 4]
# o = [[1], [2, 3], [4]]
o = [ x if type(x)==type([]) else [x,] for x in i ]
# Elements that are not unique
i = [1, 2, 3, 4, 5, 6, 7, 8, 1, 3, 5, 7, 8]
# o = [1, 3, 5, 7, 8, 1, 3, 5, 7, 8]
o = [ x for x in i if i.count(x)!= 1 ]
# Positions of all words of length 3 in a list
i = ['How', 'are', 'you', '?', 'Fine', '.', 'Thank', 'you', '.']
# o = [0, 1, 2, 7]
o = [ idx for idx, x in enumerate(i) if len(x) == 3 ]
# Partition a list by element type
i = [1, 'hello', 5, 6, 'world', 7]
# o = [[1, 5, 6, 7], ['hello', 'world']]
o = [ [ y for y in i if type(y) == x ] for x in set([type(x) for x in i])]
# Inverted index
i = [1, 1, 4, 5, 9, 6, 4]
# o = [(1, [0, 1]), (4, [2, 6]), (5, [3]), (6, [5]), (9, [4])]
o = [ ( x, [ idx for idx, y in enumerate(i) if x == y ] ) for x in set(i) ]
# Intervals between consecutive items
i = [2, 3, 5, 7, 11, 13, 17, 23]
# o = [(2, 3), (3, 5), (5, 7), (7, 11), (11, 13), (13, 17), (17, 23)]
o = zip( i[:-1], i[1:] )
# Intervals 2
i = [2, 3, 5, 7, 11, 13, 17, 23]
# o = [(2, 3), (5, 7), (11, 13), (17, 23)]
o = zip( i[::2], i[1::2] )
# Intervals 3
i = [2, 3, 5, 7, 11, 13, 17, 23]
# o = [(2, 3, 5), (5, 7, 11), (11, 13, 17)]
o = zip( i[::2], i[1::2], i[2::2] )
# Transpose a matrix
i = [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
# o = [('a', 'd', 'g'), ('b', 'e', 'h'), ('c', 'f', 'i')]
o = zip(*i)
# Cartesian product of two rows
i = [[1, 2, 3], [4, 5, 6]]
# o = [(1, 4), (1, 5), (1, 6), (2, 4), (2, 5), (2, 6), (3, 4), (3, 5), (3, 6)]
o = [ ( a, b ) for a in i[0] for b in i[1] ]
# Cartesian product of three rows
i = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# o = [[1, 4, 7], [1, 4, 8], [1, 4, 9],
# [1, 5, 7], [1, 5, 8], [1, 5, 9],
# [1, 6, 7], [1, 6, 8], [1, 6, 9],
# [2, 4, 7], [2, 4, 8], [2, 4, 9],
# ...
# [3, 6, 7], [3, 6, 8], [3, 6, 9]]
o = [ [ a, b, c ] for a in i[0] for b in i[1] for c in i[2] ]
# Split into paragraphs 1 (indented lines continue the previous paragraph)
i = '''\
this a phase
this a phase
this
 a
 phase
this a phase
'''.splitlines()
# o = ['this a phase', 'this a phase', 'this\n a\n phase', 'this a phase']
o = [ idx for idx, l in enumerate(i) if not l.startswith(' ') ]
o = zip( o, o[1:]+[None,] )
o = [ '\n'.join(i[s:e]) for s, e in o ]
# Split into sections 2
i = '''
[sectionA]
itemA
itemB
[sectionB]
itemC
itemD
[sectionC]
[sectionD]
itemE
'''.splitlines()
# o = {'sectionA': ['itemA', 'itemB'],
# 'sectionB': ['itemC', 'itemD'],
# 'sectionC': [],
# 'sectionD': ['itemE']}
o = [ idx for idx, l in enumerate(i) if l.startswith('[') and l.endswith(']') ]
o = zip( o, o[1:]+[None,] )
o = dict([ ( i[s][1:-1], [ l for l in i[s+1:e] if l != '' ] ) for s, e in o ])
# Build a replacement dict (each item maps to the first item of its group)
i = (('a', 'b', 'c'), ('e', 'f', 'g'), ('h'))
# o = {'a': 'a', 'b': 'a', 'c': 'a', 'e': 'e', 'f': 'e', 'g': 'e', 'h': 'h'}
o = dict( sum( [ zip( x, [x[0],]*len(x) ) for x in i ], [] ) )
# Merge dicts
i = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4, 'c': 6}, {'c': 7, 'd': 8}]
# o = {'a': [1, 3], 'b': [2, 4], 'c': [6, 7], 'd': [8]}
o = set(sum([ x.keys() for x in i ],[]))
o = dict([ (key,[ x[key] for x in i if key in x ]) for key in o ])
# Merge dicts keyed by 'idx' (combines 'Partition a list' and 'Merge dicts')
i = [{'a': 1, 'b': 2, 'idx': 'hello'},
{'a': 3, 'b': 4, 'c': 6, 'idx': 'hello'},
{'c': 7, 'd': 8, 'idx': 'world'}]
# o = {'hello': {'a': [1, 3], 'b': [2, 4], 'c': [6]},
# 'world': {'c': [7], 'd': [8]}}
o = set([ d['idx'] for d in i ])
ods = [ [ d for d in i if d['idx'] == k ] for k in o ]
oks = [ set(sum([ d.keys() for d in od ],[])) for od in ods ]
ods = [ dict([ ( k, [ d[k] for d in od if k in d ] ) for k in ok if k!='idx' ])
        for od, ok in zip(ods, oks) ]
o = dict( zip( o, ods ) )
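Several of these recipes depend on Python 2 behavior (zip and dict.keys() returning lists). As a rough sketch of how three of them might be written in Python 3, using collections.defaultdict and itertools.chain as substitutes for the original one-liners:

```python
from collections import defaultdict
from itertools import chain

dicts = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4, 'c': 6}, {'c': 7, 'd': 8}]

# Merge dicts: collect every value seen for each key, in input order.
merged = defaultdict(list)
for d in dicts:
    for key, value in d.items():
        merged[key].append(value)
merged = dict(merged)

# Flatten a list of lists without the repeated copying of sum(lists, []).
flat = list(chain.from_iterable([[1], [2, 3], [4, 5, 6]]))

# Intervals: zip returns an iterator in Python 3, so wrap it in list().
primes = [2, 3, 5, 7, 11]
pairs = list(zip(primes[:-1], primes[1:]))

print(merged)   # {'a': [1, 3], 'b': [2, 4], 'c': [6, 7], 'd': [8]}
print(flat)     # [1, 2, 3, 4, 5, 6]
print(pairs)    # [(2, 3), (3, 5), (5, 7), (7, 11)]
```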

Book: Core Python Programming, Second Edition (《Python核心编程第二版》)

Posted in Python|R.
