Python数据结构与算法补全系列(三)--搜索和排序

搜索

搜索是指从元素集合中找到特定元素的算法过程。搜索通常返回 True 或 False, 分别表示元素是否存在；或者返回目标元素的位置。

顺序搜索

存储于列表等集合中的元素彼此存在线性或顺序的关系，每个元素的位置与其他元素相关。在 Python 列表中，元素的位置就是它的下标。因为下标是有序的，所以能够顺序访问，由此可以进行顺序搜索。

# 无序列表的顺序搜索
def sequential_search(alist, item):
    pos = 0
    found = False
    
    while pos < len(alist) and not found:
        if alist[pos] == item:
            found = True
        else:
            pos = pos + 1
    
    return found

假设列表中的元素按升序排列，搜索的效率会比无序列表高一些：

# 有序列表的顺序搜索
def ordered_sequential_search(alist, item):
    pos = 0
    found = False
    stop = False
    while pos < len(alist) and not found and not stop:
        if alist[pos] == item:
            found = True
        else:
            if alist[pos] > item:
                stop = True
            else:
                pos = pos + 1
    
    return found

顺序搜索的时间复杂度：O(n)

二分搜索

对于有序列表可以从中间的水元素着手，将其和目标元素比较，这样可以很快地排除一半的元素：

# 有序列表的二分搜索
def binary_search(alist, item):
    first = 0
    last = len(alist) - 1
    found = Fasle
    
    while first <= last and not found:
        mid_point = (first + last) // 2
        if alist[mid_point] == item:
            found = True
        else:
            if item < alist[mid_point]:
                last = mid_point - 1
            else:
                first = mid_point + 1
    
    return found

二分搜索的递归版本：

# 二分搜索的递归版本
def binary_search(alist, item):
    if len(alist) == 0:
        return False
    else:
        mid_point = len(alist) // 2
        if alist[mid_point] == item:
            return True
        else:
            if item < alist[mid_point]:
                return binary_search(alist[:midpoint], item)
            else:
                return binary_search(alist[midpoint+1:], item)

二分搜索的时间复杂度：O(logn)

排序

排序是指将集合中的元素按某种顺序排列的过程。排序是计算机科学中的一个很重要的研究领域。给大量的元素排序可能销号大量的计算资源。排序算法的效率与待处理的元素数目有关。衡量排序算法的总体效率可以用总比较次数和总交换次数等。

冒泡排序

冒泡排序多次遍历列表，它比较相邻的元素，将不合顺序的交换，每轮遍历都将下一个最大值放到正确的位置上。

def bubble_sort(alist):
    for passnum in range(len(alist)-1, 0, -1):
        for i in range(passnum):
            if alist[i] > alist[i + 1]:
                alist[i], alist[i + 1] = alist[i + 1], alist[i]

冒泡排序通常被认为是效率最低的排序算法，因为在确定最终的位置前必须交换元素。如果在一轮遍历中没有发生元素交换，就可以确定列表已经有序。
按照上述思路修改排序算法：

# 短冒泡
def short_bubble_sort(alist):
    exchanges = True
    passnum = len(alist) - 1
    while passnum > 0 and exchanges:
        exchanges = False
        for i in range(passnum):
            if alist[i] > alist[i + 1]:
                exchanges = True
                alist[i], alist[i + 1] = alist[i + 1], alist[i]
                
        passnum = passnum - 1

冒泡排序的时间复杂度：O(n^2)

选择排序

选择排序在冒泡排序的基础上做了改进，每次遍历列表时只做一次交换。选择排序在每次遍历时寻找最大值，并在遍历完之后将它放到正确的位置上。

def selection_sort(alist):
    for fillslot in range(len(alist)-1, 0, -1):
        position_of_max = 0
        for location in range(1, fillslot + 1):
            if alist[location] > alist[position_of_max]:
                position_of_max = location
        
        alist[fillslot], alist[position_of_max] = \
        alist[position_of_max], alist[fillslot]

选择排序与冒泡排序的比较次数相同，但由于减少了交换次数，因此排序算法通常更快。
选择排序的时间复杂度：O(n^2)

插入排序

插入排序在列表较低的一端维护一个有序的子列表，并逐个将每个新元素插入到这个子列表。

def insertion_sort(alist):
    for index in range(1, len(alist)):
        
        current_value = alist[index]
        position = index
        
        while position > 0 and alist[position-1] > current_value:
            alist[position] = alist[position-1]
            position = position-1
        
        alist[position] = current_value

最坏的情况下，插入排序的时间复杂的为O(n^2)

希尔排序

希尔排序也成”递减增量排序”，它对插入排序做了改进，将列表分成数个子列表，并对每一个子列表应用插入排序。

def shell_sort(alist):
    sublistcount = len(alist) // 2
    while sublistcount > 0:
        for startposition in range(sublistcount):
            gap_inserttion_sort(alist, startposition, sublistcount)
        
        print("After increments of size", sublistcount,
             "The list is", alist)
        
        sublistcount = sublistcount // 2
         
def gap_inserttion_sort(alist, start, gap):
    for i in range(start+gap, len(alist), gap):
        
        current_value = alist[i]
        position = i
        
        while position >= gap and \
        alist[position-gap] > current_value:
            alist[position] = alist[position-gap]
            position = position-gap
            
        alist[position] = current_value
>>> alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
>>> shell_sort(alist)
After increments of size 4 The list is [20, 26, 44, 17, 54, 31, 93, 55, 77]
After increments of size 2 The list is [20, 17, 44, 26, 54, 31, 77, 55, 93]
After increments of size 1 The list is [17, 20, 26, 31, 44, 54, 55, 77, 93]

每一轮的遍历都生成了更有序的列表，这使得最后一次的遍历非常高效。
希尔排序的时间复杂度可以达到 O(n^(3/2))

归并排序

归并排序是一种递归算法，每次讲一个列表一分为二。如果列表为空或只有一个元素，就认为它是有序的（基本情况）；如果不为空就将列表一分为二并调用归并排序。当两部分都有序后，就进行归并操作。

def merge_sort(alist):
    print("Splitting ", alist)
    if len(alist) > 1:
        mid = len(alist) // 2
        lefthalf = alist[:mid]
        righthalf = alist[mid:]
        
        merge_sort(lefthalf)
        merge_sort(righthalf)
        
        i = 0
        j = 0
        k = 0
        
        while i < len(lefthalf) and j < len(righthalf):
            if lefthalf[i] < righthalf[j]:
                alist[k] = lefthalf[i]
                i = i + 1
            else:
                alist[k] = righthalf[j]
                j = j + 1
            k = k + 1
        
        while i < len(lefthalf):
            alist[k] = lefthalf[i]
            i = i + 1
            k = k + 1
            
        while j < len(righthalf):
            alist[k] = righthalf[j]
            j = j + 1
            k = k + 1
    print("Merging ", alist)

归并排序的时间复杂度为O(nlogn)

快速排序

快速排序首先选出一个基准值，如第一个元素。基准值的作用是帮助切分列表。在最终的有序列表中，基准值的位置通常被称为分割点。算法在分割点切分子列表，并进行对快速排序的子调用。
在排序过程中，选出基准值后下一步是分区操作。分区操作的目的是根据待排序元素与基准值的相对大小将他们放到正确的一遍，同时逐渐逼近分割点。

def quick_sort(alist):
    quick_sort_helper(alist, 0, len(alist) - 1)

def quick_sort_helper(alist, first, last):
    if first < last:
        
        splitpoint = partition(alist, first, last)
        
        quick_sort_helper(alist, first, splitpoint-1)
        quick_sort_helper(alist, splitpoint+1, last)
        
def partition(alist, first, last):
    pivotvalue = alist[first]
    
    leftmark = first + 1
    rightmark = last
    
    done = False
    while not done:
        
        while leftmark <= rightmark and \
        alist[leftmark] <= pivotvalue:
            leftmark = leftmark + 1
            
        while alist[rightmark] >= pivotvalue and \
        rightmark >= leftmark:
            rightmark = rightmark - 1
        
        if rightmark < leftmark:
            done = True
        else:
            temp = alist[leftmark]
            alist[leftmark] = alist[rightmark]
            alist[rightmark] = temp
    
    temp = alist[first]
    alist[first] = alist[rightmark]
    alist[rightmark] = temp
    
    return rightmark
>>> alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
>>> quick_sort(alist)
>>> alist
[17, 20, 26, 31, 44, 54, 55, 77, 93]

实际操作过程中可使用三数取中法即在选择基准值时考虑列表的头元素、中间元素和尾元素来避免切分不均匀。
快速排序的的时间复杂度为O(nlogn),并且不需要像归并排序那样使用额外空间。