======Data Structures======
//Chapter 4 notes//
----
====Data Structures====
  * **Data Structures**: Approaches to organizing abstract data types, such that the data can be accessed efficiently
  * **List-Like Structures**: A data structure, also called a sequence or collection, that holds multiple individual values gathered together under one variable name and accessed via indices. This includes structures like lists, arrays, and tuples. Lists are simultaneously a general type of data structure and a specific data type in some languages.
  * **Index**: A number used to access a particular element from a list-like data structure. Traditionally, most programming languages assign the first item of a list-like data structure the index 0.
==Python data structures (topics in this chapter)==
  * Strings
  * Lists
  * File Input and Output
  * Dictionaries
  * Hash maps
===Value vs Reference===
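  * Passing by Value: each side has its own copy (pass by copy)
    * A new memory region is created to store the value
    * Changes modify the copy, not the original value
  * Passing by Ref: both sides share one value
    * What is actually passed is a pointer to the value's memory address
    * Changes modify the value in the original memory region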
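==Passing by value or by reference in Python==
  * Most of Python's higher-level data types (such as list) behave as if passed by reference, while basic types behave as if passed by value
  * Python operators usually behave like passing (assignment) by value, because they operate on basic types and produce new values
  * Python member functions (methods) usually behave like passing by reference, because they operate on (and can modify) higher-level data structures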
===Mutability in Python===
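  * Mutability: whether a variable's value (the object it refers to) can be changed in place after it is created
  * Mutable Variable: a variable whose value can be changed after it is declared
  * Immutable Variable: a variable whose value cannot be changed after it is declared
In fact, **all passing in Python is by reference**. Basic types only look as if they were passed by value (copied) because Python's basic types are all immutable. So when a basic-type variable is assigned a new value, Python:
  * creates a new block of memory
  * stores the new value in that memory
  * points the name's reference at the new memory
Note that local variables exist **independently**. Assignment cannot reach across scopes to change the original variable's value (reference): for example, reassigning a parameter inside a function does not affect the variable that was passed into the function.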
\\ \\ 
In fact, all Python does when calling a function is change what references point to. Suppose we have the following program:
<code py>
#Add one to anInteger
def addOne(anInteger):  
    anInteger = anInteger + 1
    print("anInteger:", anInteger)

#Create myInteger with the value 5
myInteger = 5   
print("myInteger before addOne:", myInteger)
#Call addOne on myInteger
addOne(myInteger)   
print("myInteger after addOne:", myInteger)

</code>
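  - First, ''myInteger'' points to the memory block holding ''5''.
  - Next, when ''addOne()'' is called and ''myInteger'' is passed as ''anInteger'', Python creates a new name, ''anInteger'', that points to the same block as ''myInteger''.
  - When ''anInteger'' is modified inside the function, the value is an immutable integer, so Python can only create a new memory block to store the new value of ''anInteger'', here ''6''.
  - That is why the final print still shows ''myInteger'' as ''5'': what ''myInteger'' points to never changed.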
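To check this walkthrough, one can compare ''id()'' values before and after the reassignment. The sketch below is a variant of ''addOne'' that returns the ids instead of printing them (''add_one_ids'' and ''append_one'' are hypothetical names for illustration); it also contrasts a list, which a method changes in place, so the caller sees the change.

<code py>
def add_one_ids(an_integer):
    """Return (id before reassignment, id after reassignment, new value)."""
    before = id(an_integer)          # same object the caller passed in
    an_integer = an_integer + 1      # rebinds the local name to a new int object
    after = id(an_integer)
    return before, after, an_integer


def append_one(a_list):
    """Mutate the caller's list in place through a method."""
    a_list.append(1)


my_integer = 5
before, after, new_value = add_one_ids(my_integer)
print(before == id(my_integer))  # True: the parameter started as the caller's object
print(after == id(my_integer))   # False: reassignment pointed the name elsewhere
print(my_integer, new_value)     # 5 6

my_list = []
append_one(my_list)
print(my_list)                   # [1]: the method changed the shared list
</code>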
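==Printing memory addresses in Python==
Python's ''id()'' function returns the identity of the object a variable refers to; in CPython this is the object's memory address.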
<code py>
#address printing
myInt = 5
print(id(myInt))
</code>

<WRAP center round tip 100%>
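  * With ''id()'' you can see that reassigning an immutable variable changes its address, while modifying a mutable variable in place (e.g. with a method) does not.
  * The address of a //List// is the address of the list object itself; each element is a separate object with its own address.
  * Member functions (methods) do not change a //List//'s address, but pointing the name at a new //List// does.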

</WRAP>

<WRAP center round box 100%>
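In Python, when two names are assigned the same value:
  * For immutable values, Python may reuse one existing object, so both names point to the same memory address. CPython reliably does this for small integers (-5 to 256) and many short strings; in those cases, even if a variable's value changes and then changes back, **it points to exactly the same address as at the start**. This is a CPython implementation detail, not a language guarantee.
  * For mutable values, each newly created object (e.g. two separate list literals) is independent, so the two names refer to two separate objects.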
</WRAP>
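The ''is'' operator compares identity (equivalent to comparing ''id()'' values), while ''=='' compares values. A small sketch, assuming CPython, whose small-integer cache is an implementation detail rather than a language rule:

<code py>
# CPython caches small integers, so these two names share one object
a = 5
b = 5
print(a == b, a is b)        # True True (on CPython)

# changing a value and changing it back returns to the cached object
c = 5
first_id = id(c)
c = c + 1
c = c - 1
print(id(c) == first_id)     # True (on CPython, because 5 is cached)

# two separate list literals always create two independent objects
list_a = [1, 2, 3]
list_b = [1, 2, 3]
print(list_a == list_b, list_a is list_b)   # True False
list_a.append(4)
print(list_b)                # [1, 2, 3]: list_b is unaffected
</code>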
===Methods===
A method is the equivalent of a C++ member function. Some commonly used methods:
<code py>
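#string
my_string = "Hello123"
# True if all characters are digits
my_string.isdigit()
# True if all cased characters are uppercase
my_string.isupper()
# check whether the string starts with the specified prefix
my_string.startswith("Hello")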
</code>
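A method can also be called through its class like a plain function, with the object passed as the first argument, for example:
<code py>
myString = "123"
#equivalent calls
str.isdigit(myString)
myString.isdigit()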
</code>
====Strings====
  * String: A data structure that holds a list, or a string, of characters.
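  * Character: a letter, digit, symbol, or something else.
    * Represented by //Unicode//, which assigns each character a numeric code point, usually written in hexadecimal (e.g. U+0041 for ''A'')
    * Some special characters are not displayed, such as the newline character (//Newline Character//, which comes in two forms: line feed and carriage return)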
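The built-in functions ''ord()'' and ''chr()'' convert between a character and its Unicode code point, which makes the hexadecimal codes and the invisible newline characters visible. A short sketch:

<code py>
code_point = ord("A")        # 65
hex_form = hex(code_point)   # '0x41', written U+0041 in Unicode charts
back = chr(code_point)       # 'A': chr() is the inverse of ord()
lf = ord("\n")               # 10: line feed
cr = ord("\r")               # 13: carriage return
print(code_point, hex_form, back, lf, cr)   # 65 0x41 A 10 13
</code>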
===Declaring Strings in Python===
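==Printing quotation marks==
If a string contains double quotes (//quotation marks//) ''%%"%%'' or single quotes (//apostrophes//) ''%%'%%'', there are three ways to print it:
  * A string that starts with a double quote ''%%"%%'' always ends at the next double quote ''%%"%%''. The content between the quotes is printed, for example: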
<code py>
a_string = "Helloworld"
print(a_string)
# Helloworld
</code>
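  * To print double quotes, start the string with a single quote. The string then ends at the next single quote, and everything between the two single quotes is printed, for example: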
<code py>
a_string = '"Helloworld"'
print(a_string)
# "Helloworld"
</code>
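  * To print a mix of single and double quotes, Python provides the triple single quote form ''%%'''%%''. A string opened with three single quotes may contain both single and double quotes; it ends at the next run of three single quotes, and everything in between is printed. For example: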
<code py>
a_string = ''''"Helloworld'''
print(a_string)
# '"Helloworld
</code>
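==Printing newlines and backslashes==
In Python, a ''\'' (//backslash//) followed by certain characters forms an //escape sequence//. Python checks whether the characters after ''\'' match a known escape sequence; if they do, it applies that meaning. The important ones are:
  * ''\n'': newline
  * ''\t'': //tab//
  * ''\%%"%%'': a backslash can be followed by the string's terminating character, such as a double quote. Python treats this combination as a literal character to print, rather than as the end of the string.
  * ''\\'': prints the second backslash, which loses its role as the start of an escape sequence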
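A short sketch of these escape sequences; ''len()'' shows that each escape sequence counts as a single character:

<code py>
quoted = "She said \"hi\""      # \" keeps the quote inside the string
path = "C:\\temp"               # \\ becomes one backslash
tabbed = "a\tb"                 # \t is a single tab character
two_lines = "line1\nline2"      # \n is a single newline character

print(quoted)                   # She said "hi"
print(path)                     # C:\temp
print(len(path), len(tabbed), len(two_lines))   # 7 3 11
print(two_lines)                # printed on two lines
</code>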

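Note: a string opened with three single quotes turns **everything** up to the next three single quotes into string content. So if you press Enter inside a triple-quoted string, that line break becomes a newline character and shows up when the string is printed (escape sequences such as ''\n'' still work inside it as well). For example: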
<code py>
# a line break (Enter) follows 5
a_string = '''12345
67890'''
print(a_string)
</code>
This prints:
<code py>
12345
67890
</code>
===String Concatenation and Slicing===
==Concatenation==
  * Done with ''+''
  * Done with ''+=''
<code py>
my_string = "Hello"
my_string += "!"
</code>
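Both forms only join strings. A small sketch (the variable names are illustrative) showing that a number must be converted with ''str()'' first, otherwise Python raises a //TypeError//:

<code py>
age = 5
# "+" only joins strings; mixing in an int raises TypeError
try:
    message = "Age: " + age
except TypeError:
    message = "Age: " + str(age)   # convert the number first
message += "!"
print(message)    # Age: 5!
</code>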
==Slicing==
  * Take part of a string with an index and a for loop:
<code py>
a_string = "hello"
my_string = ""
for i in range(0, 3):
    my_string += a_string[i]
# my_string is now "hel"
</code>
  * Slice a string with ''start'' and ''end'' (the indices can be variables or literals); ''end'' itself is **excluded**:
<code py>
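a_string = "hello"
start = 0
end = 3
my_string = a_string[start : end]
#indices 0, 1, 2 ("hel"); index 3 is excluded
my_string = a_string[0:3]
#same as [0:3]: start defaults to 0
my_string = a_string[:3]
#from index 3 (the 4th character) to the end ("lo")
my_string = a_string[3:]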
</code>
Python guards against out-of-range slices: if the end index goes past the end of the string, the substring simply stops at the string's last character:
<code py>
a_string = "hello"
my_string = a_string[1:100]
print(my_string)
# ello
</code>
==Slicing with a step==
To take characters at a regular interval while slicing, add a third value, the step:
<code py>
a_str = "Hello, world!"
#take every 2nd char from index 1 (inclusive) up to index 9 (exclusive): "el,w"
my_string = a_str[1:9:2]
#take every 3rd char from the beginning of the string: "Hl r!"
my_string = a_str[::3]
</code>
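The step can also be negative, which walks through the string backwards. A sketch using the same ''a_str'':

<code py>
a_str = "Hello, world!"
# a step of -1 with start and end omitted reverses the string
reversed_str = a_str[::-1]
# every 2nd character, counting backwards from the last one
backwards_step = a_str[::-2]
print(reversed_str)      # !dlrow ,olleH
print(backwards_step)    # !lo olH
</code>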
==Negative indices==
In Python, a negative index counts **from the end** of the string, for example:
<code py>
#prints 3; note that counting from the end starts at -1, not 0
my_string = "01234"
print(my_string[-2])
</code>
Negative indices can also be used in ranges, for example:
<code py>
my_string = "01234"
#012: everything before index -2
print(my_string[:-2])
#34: from index -2 to the end
print(my_string[-2:])
</code>
==Chained slicing==
In Python you may see code with two pairs of square brackets like this:
<code py>
myString = "1234567890"
print(myString[::2][2:])
</code>
This actually performs slicing twice, that is:
<code py>
myString = "1234567890"
# 13579
myString = myString[::2]
# 579
myString = myString[2:]
</code>
===String Searching===
==in==
To check whether a substring exists, use ''in'':
<code py>
#prints True
a_string = "I like it!"
print("I" in a_string)
</code>
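==The string method find()==
To search for the position of a given substring, use the **method** ''find()''. ''find()'' searches the string for the keyword and returns the **starting index** of the **first** matching substring. If there is no match, it returns ''-1''. Note that the search is **case-sensitive**.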
<code py>
#result is 2
myString = "ABCDE"
print(myString.find("CDE")) 
</code>
The return value of ''find()'' is often used as a loop condition:
<code py>
my_string = "ABCDEABCDEABCDEFGHIJFGHIJABCDEABCDEFGHIJ"
keyword = "AB"

find_location = my_string.find(keyword)

#while keyword is in the string, keep search
while find_location >= 0:
    print(keyword, "found at", find_location)
    #get the next index
    find_location = my_string.find(keyword, find_location + 1)
</code>
''find()'' can also take indices to restrict the search to part of the string:
<code py>
myString = "ABCDEABCDEABCDE"
#prints 7: the first index of "CDE" at or after index 5
print(myString.find("CDE", 5))
#prints -1: no "CDE" fits entirely within indices 3 to 5 (end index 6 is excluded)
print(myString.find("CDE", 3, 6))
</code>
===Other string methods===
==split()==
''split()'' splits a string into a list, using the given separator (by default, any run of whitespace):
<code py>
#will print ['I', 'like', 'shorts!']
my_string = "I like shorts!"
print(my_string.split())
</code>

<WRAP center round box 100%>
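If a separator is passed explicitly (e.g. ''split(",")'') and it is the last character of the string, //split()// produces an extra **empty string** as the last piece. Splitting on the default whitespace never produces empty strings.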
</WRAP>
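A sketch contrasting an explicit separator with the default whitespace splitting (the sample strings are illustrative):

<code py>
explicit = "a,b,c,".split(",")      # trailing separator gives a final ''
default_split = "a  b ".split()     # runs of whitespace; no empty strings
single_space = "a  b ".split(" ")   # every single space is a separator
print(explicit)        # ['a', 'b', 'c', '']
print(default_split)   # ['a', 'b']
print(single_space)    # ['a', '', 'b', '']
</code>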
Because ''split()'' returns a list of words, we can use its length to count the words in a string:
<code py>
#Given the assumption that spaces indicate a new word
def num_words(a_string):
    return len(a_string.split())
</code>
==Utilities==
None of the following method calls change the original string (strings are immutable); each returns a new string.
<code py>
myString = "I like MY shorts!"
# capitalize the first character and lowercase the rest
print(myString.capitalize())
# convert all characters to lowercase
print(myString.lower())
# convert all characters to uppercase
print(myString.upper())
# capitalize the first letter of each word
print(myString.title())
# strip out whitespace at the start and end of the string
# e.g. "   I like shorts!   " -> "I like shorts!"
print("   I like shorts!   ".strip())
# find and replace ALL occurrences of the keyword with your own word
print(myString.replace("MY", "YOUR"))

# join the list produced by split() into one string, with a chosen separator
# e.g. ['I', 'like', 'shorts!'] -> "I-like-shorts!"
# notice that "-" (the separator) is the string that calls join()
my_list = "I like shorts!".split()
print("-".join(my_list))
</code>
====Lists====
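A //List// is an **ordered** container provided by Python and accessed by index. A //List// has two properties:
  * **Mutability**:
    * whether the elements inside the list can be changed
    * whether the length of the list can be changed
  * **Homogeneity**:
    * whether one list can hold values of several different types. A //homogeneous// list can only hold values of the same type; a //heterogeneous// list can hold values of different types.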
===Tuples===
A tuple is a list-like data structure provided by Python, but unlike a list it is immutable.
==Declaring Tuples==
<code py>
# using parentheses
# with literal values
myTuple = (1,2,3)
# using variable
my_int1 = 1
my_int2 = 2
myTuple = (my_int1, my_int2)
</code>
A //Tuple// can hold values of different types (because Python does not check variable types in advance):
<code py>
my_int = 1
my_str = "two"
myTuple = (my_int, my_str)
</code>
<WRAP center round info 100%>
  * When a //Tuple// is printed, its parentheses are printed too
  * When a //Tuple// element is a string, its quotes are printed too (single quotes by default)
</WRAP>
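A quick sketch of what printing a tuple looks like; ''str()'' returns the same text that ''print()'' shows:

<code py>
my_tuple = (1, "two", 3.0)
text = str(my_tuple)
print(text)          # (1, 'two', 3.0): parentheses and quotes included
print(my_tuple[1])   # two: the element printed on its own has no quotes
</code>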
==Reading Tuples==
<code py>
myTuple = (1, 2, 3, 4, 5)
# using index: 1
print(myTuple[0])
# using slice: (4, 5)
print(myTuple[3:])
</code>
A //Tuple// can also be unpacked, which releases each element as a separate value. When unpacking, each element can be given a new name:
<code py>
my_str = "Hello"
my_float = 5.1
my_int = 5

# packing 
my_tuple = (my_str, my_float, my_int)

# unpacking
(my_new_str, my_new_float, my_new_int) = my_tuple

</code>
==When to use tuples==
  * Returning several values at once:
<code py>
#Returns a tuple containing the quotient and remainder
def quotientAndRemainder(dividend, divisor):
    quotient = dividend // divisor
    remainder = dividend % divisor
    #Returns the tuple of the quotient and remainder
    return (quotient, remainder)
</code>
  * Unpacking the returned tuple into named variables improves readability:
<code py>
myDividend = 17
myDivisor = 5
(myQuotient, myRemainder) = quotientAndRemainder(myDividend, myDivisor)
print("Quotient:", myQuotient)
print("Remainder:", myRemainder)
</code>
==nesting tuples==
<code py>
# define a nested tuple
mySuperTuple = ((1, 2, 3), (4, 5, 6), (7, 8, 9))

# define a nested tuple with variable
myTuple1 = (1, 2, 3)
myTuple2 = (4, 5, 6)
myTuple3 = (7, 8, 9)

mySuperTuple = (myTuple1, myTuple2, myTuple3)

#access first element in the second sub tuple
print(mySuperTuple[1][0])
</code>
===Lists===
A //List// supports every operation a //Tuple// supports:
  * definition
  * reading
  * packing & unpacking
  * nesting
A //List// is defined with square brackets:
<code py>
my_list = [1,2,3]
</code>
The difference between a //List// and a //Tuple// is that a //List// is writable (mutable).
==Assigning a list does not copy it==
<code py>
list_2 = [1,2,3]
list_1 = list_2
</code>
In the code above:
  * ''list_1'' and ''list_2'' refer to the same list
  * assigning a list does not create a new list
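A sketch of the consequence: a change made through one name shows up through the other. To get an independent list, make a copy, e.g. with ''list.copy()'' or the slice ''[:]''; note that both make a //shallow// copy (nested lists inside are still shared).

<code py>
list_2 = [1, 2, 3]
list_1 = list_2            # same list, two names
list_1.append(4)
print(list_2)              # [1, 2, 3, 4]: the change shows through both names
print(list_1 is list_2)    # True

list_3 = list_2.copy()     # a new list with the same elements (list_2[:] also works)
list_3.append(5)
print(list_2)              # [1, 2, 3, 4]: unaffected
print(list_3)              # [1, 2, 3, 4, 5]
</code>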
===List methods===
==sort()==
  * What it does: sorts the elements in **ascending** order
  * Side effect: changes the contents of the list
<code py>
my_list = [6,2,3,1,5,4]
# [1, 2, 3, 4, 5, 6]
my_list.sort()
</code>
==reverse()==
  * What it does: reverses the order of the elements in the list
  * Side effect: changes the contents of the list
<code py>
my_list = [6,2,3,1,5,4]
# [4, 5, 1, 3, 2, 6]
my_list.reverse()
</code>
==append()==
  * What it does: adds an element to the end of the list
  * Side effect: changes the contents of the list
<code py>
my_list = [6,2,3,1,5,4]
# [6, 2, 3, 1, 5, 4, 7]
my_list.append(7)
</code>
==extend()==
  * What it does: adds the elements of the argument list to the end of the list
  * Side effect: changes the contents of the list
<code py>
my_list = [6,2,3,1,5,4]
my_list2 = [0,0,0]
# [6, 2, 3, 1, 5, 4, 0, 0, 0]
my_list.extend(my_list2)
</code>
==insert(idx, value)==
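  * What it does: inserts the given element at the given index; every element after that position shifts one place to the right
  * Side effect: changes the contents of the list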
<code py>
my_list = [6,2,3,1,5,4]
# [99, 6, 2, 3, 1, 5, 4]
my_list.insert(0,99)

</code>
==pop()==
  * What it does: removes the last element of the list and returns it
  * Side effect: changes the contents of the list
<code py>
my_list = [6,2,3,1,5,4]
# [6, 2, 3, 1, 5]
# 4
i = my_list.pop()
</code>
==remove() & del==
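  * What it does: removes the specified element(s) from the list.
    * ''remove()'' takes the element's value (only the first matching element is removed)
    * ''del'' takes an index or a range (slice) of the list
  * Side effect: changes the contents of the list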
<code py>
my_list = [6,2,3,1,5,4]
# [2, 3, 1, 5, 4]
my_list.remove(6)
print(my_list)
# [2, 3, 1]
del my_list[-2:]
print(my_list)
</code>
===Lists, loops and functions===
To iterate over a list, use a for loop with the keyword ''in'':
<code py>
a_list = [1, 2, 3]
for item in a_list:
    print(item)  # do something with each item
</code>
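==Functions and lists==
Because a //list// is writable and a function receives a reference to the caller's list, we need to be especially careful when using one inside a function: changes made inside the function are visible to the caller. In general, a function **should not** modify a //list// passed into it.\\ \\ 
Note the following form: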
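<code py>
my_list = [1, 2, 3]
for num in my_list:
    num = num * 2  # rebinds the name num only
</code>
Here ''num'' is just a name bound to each element in turn; reassigning it does not change the element in the list (''my_list'' is still ''[1, 2, 3]'').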
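A sketch of the difference between a function that changes the caller's list and one that returns a new list instead (''double_in_place'' and ''doubled'' are hypothetical names):

<code py>
def double_in_place(numbers):
    """Changes the caller's list: usually NOT what the caller expects."""
    for i in range(len(numbers)):
        numbers[i] = numbers[i] * 2


def doubled(numbers):
    """Builds and returns a new list; the input list is left untouched."""
    result = []
    for num in numbers:
        result.append(num * 2)
    return result


original = [1, 2, 3]
new_list = doubled(original)
print(original, new_list)   # [1, 2, 3] [2, 4, 6]

double_in_place(original)
print(original)             # [2, 4, 6]: the caller's list was modified
</code>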
==tuple vs lists==
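  * Tuples are for data of a fixed size; lists are for data whose size is not known in advance
  * By convention, list elements usually share similar properties, while tuple elements usually have different meanings (or different data types)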
===Advanced List-Like Structures===
==stacks==  
//**LIFO**//: **Last in, First out**
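  * Elements can only be added at the top
  * Elements can only be read starting from the top (the most recently added one); you cannot read a lower element without first removing the elements above it.
  * Use case: **ordered**, nested tasks (e.g. cleaning the kitchen, then the bedroom: all the kitchen subtasks must be finished before moving on to the bedroom)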
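In Python, a plain //list// can serve as a stack: ''append()'' pushes onto the top, ''pop()'' removes from the top, and index ''-1'' peeks at the top. A minimal sketch:

<code py>
stack = []
stack.append(1)        # push
stack.append(2)
stack.append(3)
top = stack[-1]        # peek at the top without removing it: 3
popped = stack.pop()   # 3: the last item pushed is the first out
print(top, popped, stack)   # 3 3 [1, 2]
</code>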


==queue==
//**FIFO**//:**First in, First out**
  * To access an item, you must remove it from the front of the queue
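A plain //list// can act as a queue via ''pop(0)'', but that shifts every remaining element. The standard library's ''collections.deque'' removes from the front efficiently with ''popleft()''. A minimal sketch:

<code py>
from collections import deque

queue = deque()
queue.append("first")      # enqueue at the back
queue.append("second")
queue.append("third")
served = queue.popleft()   # dequeue from the front: the first item added
print(served, list(queue)) # first ['second', 'third']
</code>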
==Linked List==
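  * Elements are stored at scattered locations in memory and connected by links (pointers)
  * Insertion is very fast: once the insertion point is known, only the neighboring links change and no elements need to be shifted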
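Python has no built-in linked list type, so the sketch below defines a minimal singly linked list by hand (''Node'', ''insert_after'' and ''to_list'' are hypothetical names). Inserting only relinks pointers; nothing is shifted:

<code py>
class Node:
    """One element of a singly linked list: a value plus a link to the next node."""
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def insert_after(node, value):
    """Insert a new node right after `node` by relinking one pointer (no shifting)."""
    node.next = Node(value, node.next)


def to_list(head):
    """Walk the links from the head and collect the values into a Python list."""
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values


head = Node(1, Node(3))     # 1 -> 3
insert_after(head, 2)       # 1 -> 2 -> 3
print(to_list(head))        # [1, 2, 3]
</code>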
====File Input and Output====
  * **File Input and Output**: The complementary processes of saving data to a file and loading data from a file, generally such that the state of the memory of the program is the same after saving and loading have occurred.
==file types==
  * encoding: the rules a program uses to interpret a file
===Reading, Writing, Appending===
==Opening and Closing Files==
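  * Open files inside a try block, so that errors such as a missing file or invalid data can be handled
  * When opening a file, you usually specify a mode (read-only, write, append, etc.)
  * While a file is open, the OS may prevent other programs from modifying it, so close the file once the program is done with it.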
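A sketch of opening a file inside a try block. It uses Python's ''with'' statement, which closes the file automatically even if an error occurs; ''read_text'' and the file names are hypothetical:

<code py>
def read_text(file_name):
    """Return the file's contents, or None if the file does not exist."""
    try:
        # "with" closes the file automatically, even if an error occurs inside it
        with open(file_name, "r") as input_file:
            return input_file.read()
    except FileNotFoundError:
        print("No such file:", file_name)
        return None


with open("example.txt", "w") as output_file:   # "w": write mode (overwrites)
    output_file.write("hello\n")

print(read_text("example.txt"))         # hello
print(read_text("no_such_file.txt"))    # prints the message, then None
</code>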
==Reading, Writing, Appending==
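  * Reading: loading a file into the program
  * Writing: writing the current content to a file (existing content may be overwritten)
  * Appending: writing the current content **after** the file's existing content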
===Writing Files in Python===
  * ''write()'' only accepts strings
  * Writing adds no newline by default; add ''\n'' to the end of each output manually
<code py>
myInt1 = 5
# open file "out_file.txt" in write mode
output_file = open("out_file.txt", "w")

# write content to out_file.txt: strings ONLY, NO NEWLINE by default
output_file.write(str(myInt1))
# manually add a newline
output_file.write(str(myInt1) + "\n")

# close file
output_file.close()
</code>
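==Writing a list==
  * Use a //for// loop to write each element separately
  * Use ''writelines()'' (adds no line breaks and cannot write a list that contains non-strings; not recommended)
  * Use ''write()'' combined with ''join()'' (also cannot handle a list of non-strings)
  * Use ''print()'' to output directly to the file (**recommended**)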
<code py>
my_list = ["Ann", "Bob", "Cy"]
output_file = open("names.txt", "w")

# write every element, one at a time
for name in my_list:
    output_file.write(str(name) + "\n")

# write the whole list at once with writelines() (no line breaks added)
output_file.writelines(my_list)

# write the whole list at once with join()
output_file.write("\n".join(my_list))

# print() adds a line break by default, so no "\n" is needed
for name in my_list:
    print(name, file = output_file)

output_file.close()
</code>
==Appending to files==
  * Writes after the existing content
  * Good for logging
<code py>
# appending mode
output_file = open("xxx.text", "a")
</code>
===Reading Files in Python===
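  * ''readline()'' reads one line at a time (the newline at the end is included)
    * To control the extra line break:
      * pass ''end=""'' to ''print()'' to suppress the line break ''print()'' adds itself
      * use ''strip()'' to remove whitespace (including the newline) from both ends of the text
  * ''read()'' reads the entire content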
<code py>
# open input_file
input_file = open("xxx.txt", "r")
# readline() with "\n"
print(input_file.readline())

# readline() without "\n" by changing print()
print(input_file.readline(), end = "")

# readline() without "\n" by adding strip()
print(input_file.readline().strip())

# take size as parameter, default -1, which means the whole file
input_file.read()
</code>

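The content read can be assigned to a variable. Type conversions such as ''int()'' ignore the surrounding whitespace, including the trailing newline: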
<code py>
my_int = int(input_file.readline())
</code>
==Loading into Lists==
  * A file can be read in full, line by line, with a for loop
<code py>
for line in input_file:
    my_list.append(line.strip())
</code>
==Save and Load Functions==
  * Reading and writing can be wrapped in functions for convenient reuse:
<code py>
def save(file_name, data):
    output_file = open(file_name, "w")
    for line in data:
        print(line, file = output_file)
    output_file.close()


def load(file_name):
    a_list = []
    input_file = open(file_name, "r")
    for line in input_file:
        a_list.append(line.strip())
    input_file.close()
    return a_list

my_list = [1,2,3,4,5,6,7]
save("test.txt", my_list)

loading_list = load("test.txt")
# ['1', '2', '3', '4', '5', '6', '7']: values are read back as strings
print(loading_list)

</code>
====Dictionary====
  * Dictionaries: A data structure comprised of **key-value pairs**, where a key is entered into the dictionary to get out a value.
    * Dictionary Key: A value that, when passed into a dictionary, returns a corresponding value
    * Dictionary Value: A value returned in response to a key in a dictionary.
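==Differences between Dictionary and List==
  * A list can only be accessed by index; to find a specific piece of data you have to check the elements one by one. **Ordered**.
  * A dictionary (map) can be accessed directly by key. **Unordered** in the sense that it has no positional indices (since Python 3.7, dictionaries do preserve insertion order).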
===Dictionaries in Python===
==Creating Dicts==
Create a dictionary with braces (//braces//) ''{''. Each ''key'' and ''value'' are joined by a colon (//colon//) '':'' to form a pair:
<code py>
#define
my_dict = {"sprockets" : 5, "widgets" : 11, "cogs" : 3, "gizmos": 15}

#access and modify
my_dict["sprockets"] = 1

#if unsure whether an operation creates a new key or modifies an existing one:
my_dict.setdefault("key", 0)
my_dict["key"] += 1
</code>
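  * A //Dictionary// can be passed directly into a function.
  * //Keys// must be **immutable** (strictly, hashable).
==Adding and Removing==
  * Adding, deleting, accessing, and checking are all done by **key**
  * Accessing a key that does not exist raises a //KeyError//
  * To check whether a pair exists, use the ''in'' keyword to test whether the key is in the dictionary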
<code py>
myDictionary = {"sprockets" : 5}
# adding a pair
myDictionary["gadgets"] = 1

# deleting a pair
del myDictionary["gadgets"]
</code>
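A sketch of three ways to deal with a key that may be missing: check with ''in'', catch the //KeyError//, or use ''dict.get()'' with a default value:

<code py>
my_dict = {"sprockets": 5, "widgets": 11}

# checking with "in" avoids a KeyError
if "gadgets" in my_dict:
    print(my_dict["gadgets"])
else:
    print("no gadgets")

# reading a missing key directly raises KeyError
try:
    count = my_dict["gadgets"]
except KeyError:
    count = 0

# get() returns a default instead of raising
safe_count = my_dict.get("gadgets", 0)
print(count, safe_count)   # 0 0
</code>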
==Traversing==
<code py>
my_dict = {"sprockets" : 5, "widgets" : 11, "cogs" : 3}

# if we only care about the values
for val in my_dict.values():
    if val > 5:
        print(val)

# if we only care about the keys
for k in my_dict:
    print(k)

# equivalent, using keys() explicitly
for k in my_dict.keys():
    print(k)

# if we need both
for (k, val) in my_dict.items():
    print(k, val)
</code>
===Dictionary Applications===
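  * Use a name as the key and its content as the value (e.g. counting how often each word appears)
  * The value can hold more detailed information (e.g. a tuple or list of details)
  * Dictionaries can be nested to build more complex data structures; this comes close to object-oriented ideas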
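A sketch of a nested dictionary (the ''inventory'' data is hypothetical): the outer key selects an item and the inner dictionary holds its details.

<code py>
inventory = {
    "sprockets": {"count": 5, "price": 1.25},
    "widgets": {"count": 11, "price": 0.5},
}

# the outer key selects an item; the inner key selects one of its details
print(inventory["widgets"]["count"])   # 11

# nested values can be modified in place
inventory["sprockets"]["count"] -= 1
print(inventory["sprockets"])          # {'count': 4, 'price': 1.25}
</code>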
==Counting occurrences with keys==
<code py>
a_list = ["Ann", "Bob", "Ann"]
counts = {}
for name in a_list:
    # if the name already has a record, add 1 to its count
    if name in counts:
        counts[name] += 1
    # otherwise, create the record with a count of 1
    else:
        counts[name] = 1
</code>
==Using values as keys==
This requires knowing the range of the values. Suppose the values are table numbers (1 to 4) and the keys are guests' names; to find out who sits at each table:
<code py>
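seating_chart = {"Ann": 1, "Bob": 2, "Cy": 1, "Di": 4}
# tables 1 to 4; range() excludes its end value, so use 5
for tab_num in range(1, 5):
    print("Table", tab_num, end = ": ")
    for (name, table) in seating_chart.items():
        if tab_num == table:
            print(name, end = " ")
    print()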
</code>
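The loop above scans the whole dictionary once per table. An alternative sketch (''by_table'' is a hypothetical name) inverts the dictionary in a single pass, building a list of names for each table number with ''setdefault()'':

<code py>
seating_chart = {"Ann": 1, "Bob": 2, "Cy": 1, "Di": 4}

# build table number -> list of names in one pass over the dictionary
by_table = {}
for (name, table) in seating_chart.items():
    by_table.setdefault(table, []).append(name)

print(by_table)   # {1: ['Ann', 'Cy'], 2: ['Bob'], 4: ['Di']}
</code>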
====References====
  * [[https://docs.python.org/3.7/library/stdtypes.html#string-methods|String methods]]