Several Ways to Deduplicate a List in Python, with a Performance Comparison

2023-02-22 13:36:42

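This article walks through several ways to remove duplicate items from a Python list: a plain loop, set(), enumerate() with a list comprehension, and collections.OrderedDict.fromkeys(). It also includes a test program that measures how fast each method runs and shows whether the original element order survives deduplication.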

Performance test program

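import random
import time

def removing_duplicated(lst):
    # Deduplication function (placeholder; each section below supplies a body)
    pass

def test_removing_duplicated():
    limit = 1000000
    # Build the test list: `limit` random integers between 0 and 10 inclusive
    test_list = [random.randint(0, 10) for _ in range(limit)]
    print('Created a test list of', limit, 'random integers between 0 and 10')

    # Show the first 20 elements so the original order is visible
    top20 = test_list[:20]
    print('First 20 elements of the test list:', top20)

    # Time only the deduplication call
    start = time.perf_counter()
    new_list = removing_duplicated(test_list)
    consuming = time.perf_counter() - start

    print('List after deduplication:', new_list)
    print('Elapsed time (s):', consuming)

if __name__ == '__main__':
    test_removing_duplicated()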

Here removing_duplicated() is the deduplication function to be implemented; the sections below give its concrete implementations.
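The test program only prints the result, so whether the order survived has to be judged by eye. A small check can make that explicit. The following is a minimal sketch; the helper names first_occurrence_order and preserves_order are hypothetical and not part of the original program.

```python
def first_occurrence_order(lst):
    """Return the distinct elements of lst in order of first appearance."""
    seen = []
    for item in lst:
        if item not in seen:
            seen.append(item)
    return seen


def preserves_order(original, deduplicated):
    """True if deduplicated lists each distinct element in first-occurrence order."""
    return list(deduplicated) == first_occurrence_order(original)


sample = [7, 8, 6, 7, 7, 0, 10]
print(preserves_order(sample, [7, 8, 6, 0, 10]))   # order kept
print(preserves_order(sample, [0, 6, 7, 8, 10]))   # order changed
```

Passing the output of any removing_duplicated() implementation as the second argument gives a yes/no answer instead of a visual comparison.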

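1. Deduplication with a plain loop

This method walks the list and appends each element to a new list, skipping any element the new list already contains.

It preserves the original order.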

def removing_duplicated(lst):
    # Deduplication function: keep only the first occurrence of each element
    res = []
    for c in lst:
        if c not in res:
            res.append(c)
    return res

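A more compact form (note that using a list comprehension only for its side effects is generally discouraged in Python):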

def removing_duplicated(lst):
    res = []
    [res.append(c) for c in lst if c not in res]
    return res

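Test program output:

Created a test list of 1000000 random integers between 0 and 10
First 20 elements of the test list: [7, 8, 6, 7, 7, 0, 10, 2, 4, 5, 5, 7, 7, 5, 6, 7, 10, 6, 4, 8]
List after deduplication: [7, 8, 6, 0, 10, 2, 4, 5, 1, 3, 9]
Elapsed time (s): 0.09773898124694824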
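The test data holds only 11 distinct values, so the `c not in res` check never scans more than 11 items, which helps explain why this run is quick. With many distinct values, each membership test on a list becomes a longer linear scan. One common variation, which is not part of the original article, tracks the seen elements in a set next to the result list. The sketch below assumes the elements are hashable; the name removing_duplicated_seen is hypothetical.

```python
def removing_duplicated_seen(lst):
    seen = set()   # fast membership checks (assumes hashable elements)
    res = []       # keeps first occurrences in their original order
    for c in lst:
        if c not in seen:
            seen.add(c)
            res.append(c)
    return res


print(removing_duplicated_seen([7, 8, 6, 7, 7, 0, 10, 2, 4, 5, 5]))
# prints [7, 8, 6, 0, 10, 2, 4, 5]
```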

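2. Deduplication with a set (set())

This is the simplest method and the fastest in these tests. The drawback is that the original element order is lost; it also requires the elements to be hashable.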

def removing_duplicated(lst):
    
    return list(set(lst))

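Test program output:

Created a test list of 1000000 random integers between 0 and 10
First 20 elements of the test list: [9, 9, 4, 2, 6, 0, 10, 0, 1, 2, 0, 8, 0, 5, 4, 4, 6, 2, 1, 4]
List after deduplication: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Elapsed time (s): 0.011970043182373047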
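The result here comes out sorted, but that is a CPython implementation detail for small integers; a set makes no ordering promise, so it should not be relied on. If the set-based approach is still wanted and the original order matters, one option is to sort the unique values by their first index in the source list. This is a sketch that assumes hashable elements; removing_duplicated_keep_order is a hypothetical name. Because lst.index is a linear scan for every unique value, it is only reasonable when distinct values are few.

```python
def removing_duplicated_keep_order(lst):
    # Deduplicate with a set, then restore first-occurrence order
    return sorted(set(lst), key=lst.index)


sample = [9, 9, 4, 2, 6, 0, 10, 0, 1, 2]
print(removing_duplicated_keep_order(sample))
# prints [9, 4, 2, 6, 0, 10, 1]
```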

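3. Deduplication with enumerate()

Combining a list comprehension with enumerate() also removes duplicates while preserving element order.

This method is very slow and is not recommended.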

def removing_duplicated(lst):
    
    return [i for n, i in enumerate(lst) if i not in lst[:n]]

Because this method is so slow, the number of test elements was reduced to 10000.

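Test program output:

Created a test list of 10000 random integers between 0 and 10
First 20 elements of the test list: [0, 6, 9, 6, 6, 5, 9, 5, 8, 7, 4, 4, 7, 3, 9, 10, 5, 6, 1, 4]
List after deduplication: [0, 6, 9, 5, 8, 7, 4, 3, 10, 1, 2]
Elapsed time (s): 0.07678914070129395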
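The slowness comes largely from `lst[:n]`: every iteration copies the first n elements into a new list before scanning it. Over a list of length N, the slices alone add up to N(N-1)/2 copied elements. The sketch below only counts that copying work; slice_copy_count is a hypothetical helper for illustration.

```python
def slice_copy_count(length):
    """Total elements copied by lst[:n] for n = 0 .. length - 1."""
    return sum(range(length))


print(slice_copy_count(10_000))      # prints 49995000
print(slice_copy_count(1_000_000))   # prints 499999500000
```

Going from 10000 to 1000000 elements multiplies the copying by roughly 10000, which is consistent with the smaller test size being needed for this method.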

4. Deduplication with collections.OrderedDict.fromkeys()

This method also performs well, and it preserves element order.

import collections
def removing_duplicated(lst):
    # Requires `import collections`; dictionary keys are unique and keep insertion order
    return list(collections.OrderedDict.fromkeys(lst))

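Test program output:

Created a test list of 1000000 random integers between 0 and 10
First 20 elements of the test list: [0, 6, 7, 1, 1, 9, 8, 4, 2, 10, 0, 7, 5, 9, 7, 10, 1, 4, 1, 6]
List after deduplication: [0, 6, 7, 1, 9, 8, 4, 2, 10, 5, 3]
Elapsed time (s): 0.03390908241271973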
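Since Python 3.7 the built-in dict is guaranteed to preserve insertion order, so the same idea also works without importing collections. A minimal sketch, assuming Python 3.7 or later and hashable elements; the function name removing_duplicated_dict is hypothetical.

```python
def removing_duplicated_dict(lst):
    # dict keys are unique and, on Python 3.7+, keep insertion order
    return list(dict.fromkeys(lst))


sample = [0, 6, 7, 1, 1, 9, 8, 4, 2, 10, 0, 7]
print(removing_duplicated_dict(sample))
# prints [0, 6, 7, 1, 9, 8, 4, 2, 10]
```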

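Summary

To deduplicate a list while keeping element order, collections.OrderedDict.fromkeys() worked best in these tests (about 0.034 s versus about 0.098 s for the plain loop on 1000000 elements). If element order does not matter, set() worked best (about 0.012 s).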
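A single timing measurement, as in the test program, can vary noticeably between runs. For a side-by-side comparison, the standard library's timeit module can repeat each call and report the best time. The following is a sketch, not the article's original benchmark; the function names, the list size, and the repeat count are assumptions chosen to keep it quick.

```python
import random
import timeit
from collections import OrderedDict


def dedup_loop(lst):
    res = []
    for c in lst:
        if c not in res:
            res.append(c)
    return res


def dedup_set(lst):
    return list(set(lst))


def dedup_ordered_dict(lst):
    return list(OrderedDict.fromkeys(lst))


def compare(methods, data, repeat=3):
    """Return {function name: best time in seconds} over `repeat` single runs."""
    return {
        f.__name__: min(timeit.repeat(lambda: f(data), number=1, repeat=repeat))
        for f in methods
    }


# Same value range as the article's test data, but a smaller list
data = [random.randint(0, 10) for _ in range(100_000)]
results = compare([dedup_loop, dedup_set, dedup_ordered_dict], data)
for name, seconds in results.items():
    print(f"{name}: {seconds:.6f} s")
```

Absolute numbers depend on the machine and Python version, so only the relative ranking is meaningful.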