topK问题-白红宇的个人博客

topK问题

发布日期：2021-06-30 19:56:06 浏览次数：2 分类：技术文章

本文共 3846 字，大约阅读时间需要 12 分钟。

给定一个字符串数组，让你找出前k个出现次数最多的字符串

比如：

输入：

1 2 4 5 6 5 8 6 6 9

输出：

No.1:6, times：3

No.2:5, times：2

No.3:2, times：1

输入：

abc abc aaa snfh asnfdk aaa kjsda asd 123

输出：

No.1:abc, times：2

No.2:aaa, times：2

No.3:123, times：1

第一行的整数代表是要显示出现次数前k名的字符串

第二行输入字符串，每个字符串用空格分开

如果出现次数相同，任意选取即可

import java.io.BufferedReader;import java.io.IOException;import java.io.InputStreamReader;import java.util.HashMap;import java.util.Map.Entry;public class Main {    static class Node {        public String str;        public int times;        public Node(String s, int t) {            str = s;            times = t;        }    }    public static void main(String[] args) throws IOException {        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));        int k = Integer.valueOf(br.readLine().trim());        String line = null;        StringBuilder str = new StringBuilder();        while ((line = br.readLine()) != null) {            str.append(line);        }        br.close();        String[] s = str.toString().split(" +");        str = null;        printTopK(s, k);    }    public static void printTopK(String[] arr, int topK) {        if (arr == null || topK < 1)            return;        HashMap
   
     map = new HashMap<>();        // 生成哈希表(记录字符串的词频)        int len = arr.length; // 海量数据访问栈内存比较好        for (int i = 0; i < len; ++i) {            if (!map.containsKey(arr[i])) {                map.put(arr[i], 1);            } else {                map.put(arr[i], map.get(arr[i]) + 1);            }        }        Node[] heap = new Node[topK];        int index = 0;        // 遍历哈希表，决定是否进堆，一共topK个堆元素，恢复堆有序，最后留下的一定是满足条件最大的几个        for (Entry
    
      entry : map.entrySet()) {            String str = entry.getKey();            int times = entry.getValue();            Node node = new Node(str, times);            if (index != topK) { // 堆没满之前                heap[index] = node;                heapInsert(heap, index++); // 插入时恢复堆有序            } else { // 堆已经满了，后续的直接和最小元素比来决定去和留                if (heap[0].times < node.times) {                    heap[0] = node;                    sink(heap, 0, topK - 1); // 下沉恢复堆有序                }            }        }        // 现在需要有序输出，也就是堆排序了        int N = topK - 1; // 下标        while (N > 0) {            swap(heap, 0, N--); // 最小的放到最后            sink(heap, 0, N); // 剩下的继续恢复堆有序        }        // 按照排名打印堆排序后的topK条记录        for (int i = 0; i != topK; ++i) {            if (heap[i] == null) {                break;            } else {                System.out.print("No." + (i + 1) + ":");                System.out.println(heap[i].str + ", times：" + heap[i].times);            }        }    }    private static void sink(Node[] heap, int i, int n) {        int parent = i + 1; // 避免i=0死循环，因为i*2=0 <=0恒成立，第parent个结点，下标为parent-1        int N = n + 1;// 第N个结点，下标为N-1        while ((parent << 1) <= N) { // 看有没有孩子            int j = parent << 1; // 左孩子            if (j < N && isLess(heap, j + 1, j)) { // 如果不满足j
     
      > 1; // 父节点需要除以2            if (heap[index].times < heap[parent].times) {                swap(heap, index, parent);                index = parent; // 交换后跟踪下标            } else {                break;            }        }    }    private static void swap(Node[] heap, int index, int parent) {        Node temp = heap[index];        heap[index] = heap[parent];        heap[parent] = temp;    }}

思路就是建立出小顶堆，然后每次和堆顶元素比较，比堆顶大，那么就替换堆顶元素，然后下沉恢复堆有序，堆里始终保持着到目前为止出现次数最大的几个字符串，遍历字符串数组完成即可，最后堆排序完成输出就满足了要求。

生成哈希表复杂度O(n), 有n条数据

每次进堆的时候恢复堆有序需要O（logk），因为堆数组是k个，是我们需要排名出来的前k个元素，所以前k次进堆并恢复堆有序时间复杂度为O(klogk)

剩下n-k个元素需要检查更新小顶堆，时间复杂度O( (n-k)logk )

接着k个元素堆排序O(klogk)

总的时间复杂度O(n)+2O(klogk)+O( (n-k)logk )

因为我们排出来的k一般很小，比如10W条数据需要前20条，那么这个k相遇于n来说可以忽略

所以总体时间复杂度为O(nlogk)

k是需要排名列出的前k条记录

n为总体数据量

===========================Talk is cheap, show me the code=======================

转载地址：https://liuchenyang0515.blog.csdn.net/article/details/82930835 如侵犯您的版权，请留言回复原文章的地址，我们会给您删除此文章，给您带来不便请您谅解！

上一篇：顺时针打印矩阵

下一篇：9. Palindrome Number(回文数)

发表评论

关于作者

喝酒易醉，品茶养心，人生如梦，品茶悟道，何以解忧？唯有杜康！

-- 愿君每日到此一游！

发表评论

最新留言

关于作者

推荐文章