Data Structures

Chapter 4 notes

Data Structures

Data Structures: Approaches to organizing abstract data types, such that the data can be accessed efficiently
List-Like Structures: Also referred to as sequences and collections, a data structure that holds multiple individual values gathered together under one variable name, accessed via indices. This includes structures like lists, arrays, and tuples. Lists are simultaneously a general type of data structure and a specific data type in some languages.
Index: A number used to access a particular element from a list-like data structure. Traditionally, most programming languages assign the first item of a list-like data structure the index 0.

python 数据结构的种类

String
Lists
File Input and Output
Dictionaries
hash maps

Value vs Reference

Passing by Value：各自有一个备份（拷贝传递）
- 新建一个内存区域存储值
- 修改的是之前值的拷贝
Passing by Ref：共享一个 value
- 实际上传递的是指向内存地址的坐标
- 会改变当前内存区域的值

passing value or ref in python

大多数 python 的高级数据类型（比如 list ）都是 passing by reference，而基础类型是 passing by value
python 的运算符通常是 passing (assignment) by value，因为是基于基础类型的操作
python 的 member function(method) 通常是 passing by reference，因为是基于高级数据结构的操作

Mutability in Python

Mutability：指变量在声明后是否可以更改
Mutable Variable：声明后可更改的变量
Immutable Variable：声明后不可更改的变量

实际上，python 所有的传递都是引用传递。之所以基础类型的传递看起来像是值传递（拷贝），是因为 python 中的基础类型都是 Immutable Variable。因此对基础类型的赋值期间，做了以下的操作：

申请一块新内存
将指向原有值的引用指向新内存
在新内存中存储新址

需要注意的是，local 变量是独立存在的。跨越 scope 的赋值并不能改变原有变量的值（引用）；比如在函数内部对变量进行修改不会影响传递进函数的变量。

实际上，python 在调用函数的时候进行的操作，都是引用指向的变化。假设我我们有以下程序：

#Add one to anInteger
def addOne(anInteger):  
    anInteger = anInteger + 1
    print("anInteger:", anInteger)

#Create myInteger with the value 5
myInteger = 5   
print("myInteger before addOne:", myInteger)
#Call addOne on myInteger
addOne(myInteger)   
print("myInteger after addOne:", myInteger)

首先，myInteger 指向了内存内容为 5 的区块
其次，当调用 addOne() 函数时，当传递 myInteger 给 anInteger 时，python 建立了一个新的名字指向了 myInteger 区块
当在函数中对 anInteger 进行修改时，由于 myInteger 属于 Immutable Variable，因此 python 此时只能创建一个新的内存区块，用于存储 anInteger 的新值，此时为 6
因此我们会在最后打印结果里看到 myInteger 的值依然是 5，因为 myInteger 的指向没有发生过变化。

python 中内存地址的打印

python id() 函数返回变量对应的内存地址。

#address printing 
myInt = 5
print(id(myInt1))

通过该函数可以观察到 Immutable variable 赋值之后内存地址会发生变化，但 mutable variable 不会。
List 的地址指向其元素区块的首地址
member function 不会改变 List 的地址，但重新指向新的 List 会。

python 中，如果对两个 name 赋予相同的值：

如果是 immutable variable，那么 python 会自动关联当前值相同的 name，导致的结果是这两个 name 会指向同一个内存地址。实际上，即便是同一个变量，当其值发生变化后再变化为最初的值，其指向的地址和一开始指向的地址是完全一样的。
如果是 mutable variable，那么 python 创建的则是两个独立的副本。

Methods

Method 同 C++ 成员函数。一些常用的 method:

#string
isdigit()
isupper()
#check if the specified prefix exsit in the string  
string.startswith(prefix)

所有 method 都可以以非成员函数（function）的形式调用，比如：

#equal calls
isdigit(myString)
myString.isdigit()

Strings

String: A data structure that holds a list, or a string, of characters.
Character：可以是字母，数字，符号，或者其他。
- 通过 Unicode (一种 16 进制的代号) 来表示
- 某些特殊的字符不显示，比如换行符（Newline Character，分为 line feed 和 carriage return）

Declaring Strings in Python

引号的打印

如果 string 中包含了双引号（quotation mark） " 和单引号（apostrophes） '，那么有如下三种方法打印：

如果以双引号" 开始，那么一定会以第一个遇到的双引号 " 结束。打印双引号之间的内容，比如：

a_string = "Helloworld"
Helloworld

如果想打印双引号，那么可以使用单引号起始。string 会在遇到下一个单引号的时候结束，打印两个单引号之间的内容，比如：

a_string = '"Helloworld"'
"Helloworld"

想打印混合单双引号的情况：python 提供了三个单引号 ''' 这种形式来处理该问题。连续的三个单引号会被视作双引号或者单引号，在遇到下一个连续三个单引号的时候结束，打印之间的内容。比如：

a_string = ''''"Helloworld'''
'"Helloworld

换行与反斜杠的打印

在 python 中，\（forward slash）之后的序列被称为 escape sequence。python 会检查 \ 后的内容是否与 escape sequence 中的内容匹配。如果匹配，则应用该内容。比较重要的有：

\n：换行（newline）
\t : tab
\"：反斜杠后面可以接 string 终结标志，比如双引号。python 会认为这种组合是我们希望打印反斜杠后面的字符，而不是终结 string。
\\：打印第二个反斜杠，并忽略其 escape sequence 起点的作用

注意：三个单引号无法被反斜杠标记。三个单引号起头遇到下一个三个单引号的序列之前，会将之间所有的内容强制转换为 string。因此，如果在使用三个单引号标记的 string 中使用了回车键，那么该回车会被视作换行，并反映到打印中。比如：

#an enter is after 5
a_string = '''12345 
67890'''

会打印：

12345
67890

String Concatenation and Slicing

Concatenation

通过 + 实现
通过 += 实现

my_string = "Hello"
my_string += "!"

Slicing

使用 index + foroop 分割 string

a_string = "hello"
for i in range(0, 3):
    my_string += astring[i]

使用 start & end 分割 string（下标支持变量）。start 和 end 也可以是 literal：

start = 0
end = 3
my_string = a_string[start : end]
#0,1,2,3
my_string = a_string[0:3]
#0,1,2, 3 alterntive
my_string = a_string[:3]
#4 to the end
my_string = a_string[3:]

python 有范围保护。如果 index 的范围超出了 string 的最大值，那么 substring 只会截取到被截取 string 的最后一位：

my_string = a_string[1:100]
#will print
ello

Slicing 和间隔

如果在截取 string 的时候需要添加间隔，可以使用如下的方法：

a_str = "Hello, world!"
#take every other 2 chars from second char (inclusive), "el,w"
my_string = a_str[1:9:2]
#take every other 3 chars from the beginning of the string, "Hl r!"
my_string = a_str[::3]

负数 index

负数 index 在 python 中表示从结束的方向进行下标的计量，比如：

#will print 3, notice the count start with 1, not zero
my_string = "01234"
print(my_string[-2])

负数 Index 也可以用于范围表示，比如：

my_string = "01234"
#012, the number till 3
print(my_string[:-2])
#34, the number from 3 to the end
print(my_string[-2:])

slicing 的连续使用

python 中会出现如下带有两个方括号写法：

myString = "1234567890"
print(myString[::2][2:])

这种实际上是进行了多次 slicing，也就是：

myString = "1234567890"
# 13579
myString = myString[::2]
# 579
myString = myString[2:]

String Searching

in

判断 substring 是否存在：in。

#print True
a_string = "I like it!"
print("I" in a_string)

string member find()

搜索指定的 surstring 的位置：成员函数 find()。find 会按指定关键字对指定字符串进行搜索，并范围第一个匹配的子字符串的起始下标。如果没有找到，则返回 -1。需要注意的是，搜索区分大小写。

#result is 2
myString = "ABCDE"
print(myString.find("CDE"))

通常，我们可以利用 find() 的返回值来作为循环的判断条件：

my_string = "ABCDEABCDEABCDEFGHIJFGHIJABCDEABCDEFGHIJ"
keyword = "AB"

find_location = my_string.find(keyword)

#while keyword is in the string, keep search
while find_location >= 0:
    print(keyword, "found at", find_location)
    #get the next index
    find_location = my_string.find(keyword, find_location + 1)

find() 可以添加 index 来搜索 string 中指定的范围：

myString = "ABCDEABCDEABCDE"
#Prints the first index of "CDE" in myString after 5
print(myString.find("CDE", 5)) 
#Prints the first index of "CDE" in myString between 3 and 6
print(myString.find("CDE", 3, 6))

其他的 string 成员

split()

split() 可以按指定的字符作为间隔符划分字符串：

#will print ['I', 'like', 'shorts!']
my_string = "I like shorts!"
print(my_string.split())

如果分隔符处于字符串的最后一位，那么 split() 还会额外产生一个空字符串作为最后的一个分割部分。

我们可以利用 split() 返回值的特性来计算 string 中单词的数量：

#Given the assumption that spaces indicate a new word
def num_words(a_string):
    return len(a_string.split())

utilites

以下的所有成员调用均不会改变原有 string 的内容。

# capitalize the first char in the string
print(myString.capitalize())
# lower all charactors in the string
print(myString.lower())
# caplitalize all characters in the string
print(myString.upper())
# caplitalize all characters follow by a space in the string
print(myString.title())
# strip out all spaces before or after the string
# e.g. "   I like shorts!   " -> "I like shorts!"
print(myString.strip())
# find and replace ALL instances of thekeyword with your own word
print(myString.replace("MY", "YOUR"))

# join an result yield by spilt() (a list) with a specialized character into a string
# e.g. ['I', 'like', 'shorts!'] -> "I-like-shorts!"
# notice "-"(the dedicated spacer) is the string that called join()
my_list = my_string.split()
print("-".join(my_list))

Lists

List 是 python 提供的，通过 index 访问的一种有序容器。List 有两种性质：

Mutability：
- list 内部的元素是否可以更改
- list 的长度是否可以更改
Homogeneity：
- 同一个 list 是否能接受多种不同类型的变量。homogenous 类型只能接收类型相同的变量，heterogenous 类型则相反。

Tuples

Tuples 是 python 提供的，类 list 的，但属于 immutiable 类型的数据结构。

Declaring Tuples

# using paranthesis
# using value 
myTuple = (1,2,3)
# using variable
my_int1 = 1
my_int2 = 2
myTuple = (my_int1, my_int2)

Tuple 支持由不同类型的变量组成（因为 Python 不会提前检查变量的类型）：

my_int = 1
my_str = "two"
myTuple = (my_int, my_str)

Tuple 被打印时，paranthesis 也会被打印
Tuple 元素为 string 的时候，quote（双引号）也会被打印

Reading Tuples

# using index
print(myTuple[0])
# using slice
print(myTuple[3:])

Tuple 还可以进行 unpacking，也就是将里面的元素全部释放出来作为各自单独的存在。unpacking 的时候，可以对每个元素赋予新的 name：

my_str = "Hello"
my_float = "5.1"
my_int = 5

# packing 
my_tuple = (my_str, my_float, my_int)

# unpacking
(my_new_str, my_new_float, my_new_int) = my_tuple

Tuple 的使用场景

一次返回多个变量：

#Returns a tuple containing the quotient and remainder
def quotientAndRemainder(dividend, divisor):    
    #do sthing...
    #Returns the tuple of the quotient and remainder
    return (quotient, remainder)

使用 unpacking 功能管理返回的变量，提高可读性：

(myQuotient, myRemainder) = quotientAndRemainder(myDividend, myDivisor)
print("Quotient:", myQuotient)
print("Remainder:", myRemainder)

nesting tuples

# define a nested tuple
mySuperTuple = ((1, 2, 3), (4, 5, 6), (7, 8, 9))

# define a nested tuple with variable
myTuple1 = (1, 2, 3)
myTuple2 = (4, 5, 6)
myTuple3 = (7, 8, 9)

mySuperTuple = (myTuple1, myTuple2, myTuple3)

#access first element in the second sub tuple
print(mySuperTuple[1][0])

Lists

List 可以使用所有 Tuple 支持的操作：

定义
读取
pack & unpack
nesting

定义 List 使用 square bracket：

my_list = [1,2,3]

List 与 Tuple 的不同之处在于 List 是可写的。

List 的赋值不是拷贝

list_2 = [1,2,3]
list_1 = list_2

上述代码中：

list_1 和 list_2 指向的是同一个 list
list 的 assignment 不会创建另外一个新的 list

List member function

sort()

作用：对元素按升序排列
后果：改变 list 内容

my_list = [6,2,3,1,5,4]
# [1, 2, 3, 4, 5, 6]
my_list.sort()

reverse()

作用：反转 list 中的元素顺序
后果：改变 list 内容

my_list = [6,2,3,1,5,4]
# [4, 5, 1, 3, 2, 6]
my_list.reverse()

append()

作用：在 list 末尾添加元素
后果：改变 list 内容

my_list = [6,2,3,1,5,4]
# [6, 2, 3, 1, 5, 4, 7]
my_list.append(7)

extend()

作用：在 list 末尾添加参数 list
后果：改变 list 内容

my_list = [6,2,3,1,5,4]
my_list2 = [0,0,0]
# [6, 2, 3, 1, 5, 4, 0, 0, 0]
my_list.extend(my_list2)

insert(idx, value)

作用：在指定下标处插入指定元素，插入位置之后的其他所有元素右移
后果：改变 list 内容

my_list = [6,2,3,1,5,4]
# [99, 6, 2, 3, 1, 5, 4]
my_list.insert(0,99)

pop()

作用：移除 list 最后一个元素并返回该元素
后果：改变 list 内容

my_list = [6,2,3,1,5,4]
# [6, 2, 3, 1, 5]
# 4
i = my_list.pop()

remove() & del

作用：移除 list 中指定的元素。
- remove() 的参数是 list 元素的内容
- del 的参数是 list 中元素的范围
后果：改变 list 内容

my_list = [6,2,3,1,5,4]
# [2, 3, 1, 5, 4]
my_list.remove(6)
print(my_list)
# [2, 3, 1]
del my_list[-2:]
print(my_list)

Lists, loops and functions

遍历 list 使用 for loop 和关键字 in：

for item in a_list:
    #do sth......

function 和 list

由于 list 是可写的，我们需要尤其注意其在函数内部的使用。总的来说，函数不应该改变输入到函数的内部的 list。

注意如下的形式：

for num in list:
   #do sth in num..

这种情况下 num 是局部变量，修改其值不影响对应的 list 元素本身。

tuple vs lists

tuple 用于大小确定的形式， list 用于不确定大小的形式
list 的元素通常有相似的属性，tuple 则用于不同意义的（或是有不同数据类型的）（convention）

Advanced List-Like Structures

stacks

LIFO: Last in, First out

只能从最上部添加元素
只能从最上部（最近添加）开始读取元素，不移动开最上面的元素，就不能读取下部的元素。
适用于的场景：有序的打包任务（比如清理厨房，然后清理卧室：必须做完清理厨房的子任务才能去清理卧室）

queue

FIFO:First in, First out

要访问数据必须要移除该数据

Linked List

元素存储是离散的，通过链（指针）来维护
插入操作非常快

File Input and Output

File Input and Output: The complementary processes of saving data to a file and loading data from a file, generally such that the state of the memory of the program is the same after saving and loading have occurred.

file types

encoding: 程序解释 file 的规则

Reading, Writing, Appending

Opening and Closing Files

打开文件需要包含在 try-block 里面，防止非法的数据
打开文件时通常需要注明 mode（只读，可写，append）等等。
文件被打开时，OS通常不允许其他程序修改该文件。所以程序处理完文件时，需要关闭文件。

Reading, Writing, Appending

Reading：loading file into program
Writing: 将当前内容写入文件（可能有覆盖的风险）
Appending: 将当前内容写入当前文件内容之后

Writing Files in Python

Writing 只能写 string
Wrtiing 默认无换行，需要手动添加 \n 在输出末尾

# open file "out_file.txt" in writing mode
output_file = open("out_file.txt", "w")

# wirte content to out_file.txt, string ONLY, NO NEWLINE by default
output_file.write(str(myInt1))
# mannally add newline
output_file.write(str(myInt1) + "\n")

# close file
output_file.close()

Write list

使用 for 循环分别写出每个元素
使用 writeline() 函数（不能换行，不能写非 string 的 list，不推荐）
使用 write() 与 join 组合（也不能支持非 string 的 list）
使用 print() 直接输出到文件（推荐）

# wirte every element each time
for name in list:
   output_file.write(str(name) + "\n")

# write a whole list in one time with writelines()
output_file.writelines(my_list)

# write a whole list in one time with join():
output_file.write("\n".join(my_list))

# print comes with line break by default, so no "\n" needed
print(name, file = output_file)

Appending to files

在已有内容后写入
good for loging

# appending mode
output_file = open("xxx.text", "a")

Reading Files in Python

使用 readline() 读取每一行内容（会读取换行）
- 控制额外的换行：
  - 使用 print() 的 tag end=“” 取消 print() 自带的换行
  - 使用 strip() 取消文本前后的内容
使用 read() 读取所有内容

# open input_file
input_file = open("xxx.txt", "r")
# readline() with "\n"
print(input_file.readline())

# readline() without "\n" by changing print()
print(input_file.readline(), end = "")

# readline() without "\n" by adding strip()
print(input_file.readline().strip())

# take size as parameter, default -1, which means the whole file
input_file.read()

读取的内容可以赋给变量。类型转换后会自动删除 white space 的内容：

my_int = int(input_file.readline())

Loading into Lists

python 中的文件可以通过 for 循环来一行一行全部读取

for line in input_file:
    my_list.append(line.strip())

Save and Load Functions

可以将读写封装为函数，提高使用效率：

def save(file_name, data):
    output_file = open(file_name, "w")
    for line in data:
        print(line, file = output_file)
    output_file.close()



def load(file_name):
    a_list = []
    input_file = open(file_name, "r")
    for line in input_file:
        a_list.append(line.strip())
    input_file.close()
    return a_list

my_list = [1,2,3,4,5,6,7]
save("test.txt", my_list)

loading_list = load("test.txt")
print(loading_list)

Dictionary

Dictionaries: A data structure comprised of key-value pairs, where a key is entered into the dictionary to get out a value.
- Dictionary Key: A value then, when passed into a dictionary, returns a corresponding value
- Dictionary Value: A value returned in response to a key in a dictionary.

Dictionary 和 List 的区别

List 只能通过 index 访问，如果要找特定数据，需要一个一个找，有序
Dictionary（Map）可以通过 key 直接访问，无序

Dictionaries in Python

Creating Dicts

使用大括号 { braces 创建。key 和 value 通过冒号 (colon) : 组成 pair“：

#define
my_dict = {"sprockets" : 5, "widgets" : 11, "cogs" : 3, "gizmos": 15}

#access and modifiy
my_dict["sprockets"] = 1

#if unsure whether a operation is an creating or modifiying:
dictionary.setdefault("key", 0)
dictionary["key"] += 1

Dictionary 可以直接通过函数传递。
Key is immutiable。

Adding to and Removing

添加 / 删除 / 访问 / 验证操作都是基于 Key
访问不存在的元素会抛出 KeyError
查看 pair 是否存在可以通过使用 in 关键字查询 key 是否存在于字典中

# adding pair 
myDictionary["gadgets"] = 1 

# delete pair
del myDictionary["gadgets"]

Traversing

# if we only concern the value
for val in dict.value():
   if val > 5:
     # do sth

# if we only care about key
for k in dict:
 # do sth

for k in dict.key():
  # do sth
  
# if we need to bring them in both

for (k, val) in dict.items():
   # do sth

Dictionary Applications

使用名称作为 key，内容作为 value (比如统计词的出现数)
value 部分可以放入更多的详细信息（比如包含信息的 tuple, list ）
字典自身可以嵌套，组成更复杂的数据结构。这种结构更像是面向对象的概念

使用 key 统计 value 出现的次数

dict = {}
for name in a_list:
    # if the name already has the record, add 1 count
    if name in dict:
        dict[name] += 1
    # else, create the name and add 1 count
    else:
        dict[name] == 1

将 value 当做 key 使用

这种情况下需要知道 value 的范围。假设 value 是餐桌的编号（1-4），而 key 是客人的名字；如果想统计哪个桌子上都有谁：

for tab_num in range(1, 4):
    for (name, table) in seating_chart.item():
        if tab_num == table:
            print(name, end = " ")

References

String methods

目录

Data Structures

Data Structures

python 数据结构的种类

Value vs Reference

passing value or ref in python

Mutability in Python

python 中内存地址的打印

Methods

Strings

Declaring Strings in Python

引号的打印

换行与反斜杠的打印

String Concatenation and Slicing

Concatenation

Slicing

Slicing 和 间隔

负数 index

slicing 的连续使用

String Searching

in

string member find()

其他的 string 成员

split()

utilites

Lists

Tuples

Declaring Tuples

Reading Tuples

Tuple 的使用场景

nesting tuples

Lists

List 的赋值不是拷贝

List member function

sort()

reverse()

append()

extend()

insert(idx, value)

pop()

remove() & del

Lists, loops and functions

function 和 list

tuple vs lists

Advanced List-Like Structures

stacks

queue

Linked List

File Input and Output

file types

Reading, Writing, Appending

Opening and Closing Files

Reading, Writing, Appending

Writing Files in Python

Write list

Appending to files

Reading Files in Python

Loading into Lists

Save and Load Functions

Dictionary

Dictionary 和 List 的区别

Dictionaries in Python

Creating Dicts

Adding to and Removing

Traversing

Dictionary Applications

使用 key 统计 value 出现的次数

将 value 当做 key 使用

References

Slicing 和间隔