随机算法 & 数据结构

rd.nextInt(n) 可以生成 [0 - n) 之间的整数，注意开区间。
一般要靠 arraylist，因为有 get() 函数。
remove 任意位置是 O(n)，但是每次只 remove 末位的话，平均是 O(1)，在知道 index 的情况下 swap 一下就好了。

随机数的帖子

http://www.1point3acres.com/bbs/forum.php?mod=viewthread&tid=119376&highlight=%CB%E6%BB%FA

Insert Delete GetRandom O(1)

实现 O(1) 的 GetRaondom 可以通过如下方式：

生成范围随机数
对应的数据结构支持 O(1) get
可以维护连续 key 区间，保证随机数时间复杂度

于是一开始想了下自己维护一个 LinkedList，但是不支持 O(1) get 只能算了。。

ArrayList 的问题在于，它的任意位置 delete 操作是 O(n)，和自身长度成比例的，因为要 shift 元素，怎么能把这个操作变成 O(1) 呢？

这里要做个取巧的事，考虑到所有value都是 integer，我们可以直接 swap，这样删除 “尾巴” 就是 O(1) 的平均复杂度了。

rd.nextInt(n) 可以生成 [0 - n) 之间的整数，注意开区间。
正好可以当 index 用。

需要考虑一个细节, 就是只有一个元素时删除元素. 一次AC

public class RandomizedSet {

    HashMap<Integer, Integer> map;
    ArrayList<Integer> list;
    /** Initialize your data structure here. */
    public RandomizedSet() {
        map = new HashMap<Integer, Integer>();
        list = new ArrayList<Integer>();
    }

    /** Inserts a value to the set. Returns true if the set did not already contain the specified element. */
    public boolean insert(int val) {
        if(!map.containsKey(val)){
            list.add(val);
            map.put(val, list.size()-1);
            return true;
        }
        return false;
    }

    /** Removes a value from the set. Returns true if the set contained the specified element. */
    public boolean remove(int val) {
        if(!map.containsKey(val)) return false;
        int idx = map.get(val);
        int tmp = list.get(list.size()-1);
        list.set(list.size()-1,val);
        list.set(idx, tmp);
        list.remove(list.size()-1);
        map.put(tmp, idx);
        map.remove(val);
        return true;
    }

    /** Get a random element from the set. */
    public int getRandom() {
        if(list.size()==0) return 0;
        Random rd = new Random();
        int idx = rd.nextInt(list.size());
        return list.get(idx);
    }
}

Insert Delete GetRandom O(1) - Duplicates allowed

HashMap 的 value 只存一个 index 就不够用了，得存一个 Set<>，记录相同元素所有的 index，反正 add / remove 的平均复杂度都是 O(1)。

(G) Implement HashTable with get,set,delete,getRandom functions in O(1).

这个也是平均复杂度，所用的技巧和上一题完全一样，ArrayList + HashMap

(G) Get random number except blacklist

http://www.1point3acres.com/bbs/thread-78548-1-1.html

设计一个随机数产生器，有一个以列表形式保存的已经排序blacklist，输出的数字如果出现在其中就要剔除。（面试完之后，听人提醒这题很类似CTCI上的问题12.3）这题面试官没让我写code，我说了几个答案他都觉得不满意，因为他想要一种“渐进的”方法解决此题

题目要求:

给定区间 [0，n]，和一个排好序的 blacklist，List<> 随机返回一个不在 blacklist 中的数。

分析：

这道题最简单的思路是先从 blacklist 的角度想。先生成一个[0, n] 之间的数，然后看这个数在不在 blacklist 里，如果在的话，再生成一次。

问题一：如果 blacklist 不是 set 的形式给的，至少要 binary search;
问题二：运气不好，或者 blacklist 非常大的话，我们可能要调用多次 getRandom，而这个 API 可能是非常贵的。

因此另一个思考角度是，从 white list 出发。

对于【0，10】，blacklist = 【1,2,3,7,8】来讲，white list = 【0,4,5,6,9,10】.因此假设我们有足够内存去维护 whitelist，我们可以直接生成并返回一个 white list 中的元素，这样可以保证一次 getRandom() 操作保证得到想要的元素。

假如 white list 内存放不下呢？

方法照旧。每次我们生成一个 white list 的合理 index 之后，我们要找的就是 “第 index + 1 个不在 blacklist 中的数”。这个可以通过扫 blacklist 实现，O(1) 的内存开销。+

相当于实现了一个 index -> whitelist[index] 的 mapping，可以保证一次 getRandom() 就可以得到有效元素。

i 从 0 到 n 循环，如果大于 blacklist 最后一个数，或者小于 blacklist 当前数，都 count --;
否则 blackPtr ++ ;
每次循环的时候，count 扣完了，当前 i 就是目标元素。

static Random rand = new Random();

    public static int getRandom(int n, List<Integer> blackList){
        int totalNum = n + 1;
        int whiteListSize = totalNum - blackList.size();

        int index = rand.nextInt(whiteListSize);
        int count = index + 1;
        int blackPtr = 0;
        for(int i = 0; i <= n; i++){
            if(i > blackList.get(blackList.size() - 1) || i < blackList.get(blackPtr)){
                count --;
            } else {
                blackPtr ++;
            }

            if(count == 0) return i;
        }

        return -1;
    }

    public static void main(String[] args){
        List<Integer> blackList = new ArrayList<Integer>();
        blackList.add(1);blackList.add(2);blackList.add(3);
        blackList.add(7);blackList.add(8);
        blackList.add(13);blackList.add(19);blackList.add(20);
        for(int i = 0; i < 50; i++){
            System.out.print(" , " + getRandom(20, blackList));
        }
    }

huixuan : 我之前还想过这个问题可以变形成另外一个问题：比如产生长度为k的随机序列，序列中数字的值在[0, n]中sample，但是不可以有重复。要求讨论两个case: k和n差不多大、k远小于n

Random number generator with weight

题目描述：给定几种元素，以及对应的 weight，按每个元素的概率分布符合其 weight 的方式生成随机元素。

如：【0,1,2,3,4】的 weights 【0.1, 0.2, 0.2, 0.4, 0,1】，按 weight 返回【0,4】

核心是 cumulative distribution. 计算出 [0.1, 0.3, 0.5, 0.9, 1] 的 cdf 之后，我们可以随机返回一个【0,1】之间的数，然后对于 cdf array 做 binary search.

我们要寻找的，就是第一个 random number < cdf[i] 所对应的元素 num[i].

Reservoir Sampling (Quora)

题目描述： Given a data set of unknown size N, uniformly select k elements from the set such that each element has a 1/N probability of being chosen.

http://www.geeksforgeeks.org/reservoir-sampling/

分析 & 证明：

核心是 proof by induction.
从 k = 1 开始，对于第 i 个元素，我们有 1 / i 的几率让当前元素成为 candidate，相应的，有 (i - 1) / i 的概率直接扔掉这个新元素。
假设我们有 3 个元素，那么第 3 个元素只会被 check 一次，有 1/3 的几率留下，2/3 的几率被扔掉；
而它的前一个元素，2，会被 check 两次，一次是自己，一次是 3 的时候；而它成为最终选中的元素需要同时满足两个条件：
- 考虑到自己的时候，随到了 keep : (1/i = 1/2 概率)
- 考虑后面元素的时候，没替换掉它，即后面元素都 discard 了：(2/3 概率)
- 两项相乘，等于 1/3.
这个 induction 的推导，对于任意有效 k 和 n 也都适用，从后往前。
- 倒数第二个元素选中的概率 =(k / n - 1) * (1-k/n*1/k)=(k / n - 1) * (n - 1 / n) = k / n
- 倒数第三个元素选中的概率是=(k/n-2)*(1-k/(n-1)*1/k)*(1-k/n*1/k)=k/n
- 算概率的原则是对于倒数第K个元素, 被选中的概率乘以依次不会被替换的概率. 被替换的概率是1-k/i*1/k.

O(n) 时间，O(k) 空间。算法和测试代码如下：

 static Random rand = new Random();

    public static int[] reserviorSampling(int k, int[] nums){
        if(k >= nums.length) return nums;

        int i = 0;
        int[] rst = new int[k];
        for(; i < k; i++){
            rst[i] = nums[i];
        }

        for(; i < nums.length; i++){
            // random is exclusive
            int num = rand.nextInt(i + 1);
            if(num < k) rst[num] = nums[i];
        }

        return rst;
    }

    public static void main(String[] args){
        int[] count = new int[10];
        for(int i = 0; i < 10000; i++){
            int[] sampled = reserviorSampling(5, new int[]{0,1,2,3,4,5,6,7,8,9});
            for(int num : sampled) count[num]++;
        }

        for(int i = 0; i < 10; i++){
            System.out.println("Count of " + i + " " + count[i] + " times");
        }
    }

Linked List Random Node

刚写完这章的总结，leetcode 就搞了个蓄水池抽样的题出来。。好与时俱进啊！

public class Solution {

    /** @param head The linked list's head.
        Note that the head is guaranteed to be not null, so it contains at least one node. */
    ListNode head;
    public Solution(ListNode head) {
        this.head = head;
    }

    /** Returns a random node's value. */
    public int getRandom() {
        int ans = -1; int count = 1;
        Random rd = new Random();
        ListNode cur = head;
        while(cur!=null){
            if(rd.nextInt(count++)==0)
                ans = cur.val;
            cur = cur.next;
        }
        return ans;
    }
}

Shuffle an Array

Shuffle array, Fisher–Yates shuffle, Knuth 洗牌算法

代码和流程惊人的简单：

遍历每个 i ，随机从 i 还有 i 后面的区间抽一个数，和 i 交换；
重点是 index = 【i, n - 1】区间。每个元素有留在原位置的概率。

其原理非常类似于一群人在一个黑箱子里抽彩票，for 循环中的每次迭代相当于一个人，每次抽样范围 index = 箱子中剩下的所有彩票，其数量依次递减。到最后每个人虽然抽奖时箱子里彩票总数不同，但是获奖概率却是均等的。

public class Solution {
    int[] nums;
    Random rand;
    public Solution(int[] nums) {
        this.nums = nums;
        this.rand = new Random();
    }

    /** Resets the array to its original configuration and return it. */
    public int[] reset() {
        return nums;
    }

    /** Returns a random shuffling of the array. */
    public int[] shuffle() {
        if(nums == null) return null;
        int[] copy = nums.clone();
        for(int i=1;i<copy.length;i++){
            int j = rand.nextInt(i+1);
            swap(copy, i, j);
        }
        return copy;
    }

    public void swap(int[] num, int i, int j){
        int t = num[i];
        num[i] = num[j];
        num[j] = t;
    }
}

给一个函数foo()可以随机生成1-5, 利用foo函数随机生成1-7的数字.

错误的想法: 利用foo()*foo()构造1-25的均匀分布随机数, 再通过1-25来构造7的倍数, 来获得1-7. 错误在foo()*foo()没法生成1-25的全部数字, 比如14; 也不均匀, 4可以通过(1,4) (2,2) (2,2) (4,1)生成.

正确的解法: 利用5*foo()-5+foo(), 先生成(0, 5, 10, 15, 20)的起始点, 再通过加foo()来填补区间, 生成均匀的1-25. 而此时我们需要的是7的倍数, 7, 14, 21. 在这里我们取能取到最多的21, 当生成的num<22时,我们对num%7+1可以得到1-7. num>=22时, 重新生成.

int foo() // given method that returns 1 to 5 with equal probability
{
    // some code here
}

int my_rand() // returns 1 to 7 with equal probability
{
    int i;
    i = 5*foo() + foo() - 5;
    if (i < 22)
        return i%7 + 1;
    return my_rand();
}

二维平面上给一个矩形（和坐标轴平行），在矩形中给一个unified random的点。(G)

在x和y range之间生成一个double

Random r = new Random();
double randomValue = rangeMin + (rangeMax - rangeMin) * r.nextDouble();

随机算法 & 数据结构

rd.nextInt(n) 可以生成 [0 - n) 之间的整数，注意开区间。

一般要靠 arraylist，因为有 get() 函数。

remove 任意位置是 O(n)，但是每次只 remove 末位的话，平均是 O(1)，在知道 index 的情况下 swap 一下就好了。

生成范围随机数

对应的数据结构支持 O(1) get

可以维护连续 key 区间，保证随机数时间复杂度

这里要做个取巧的事，考虑到所有value都是 integer，我们可以直接 swap，这样删除 “尾巴” 就是 O(1) 的平均复杂度了。

rd.nextInt(n) 可以生成 [0 - n) 之间的整数，注意开区间。

正好可以当 index 用。

需要考虑一个细节, 就是只有一个元素时删除元素. 一次AC

HashMap 的 value 只存一个 index 就不够用了，得存一个 Set<>，记录相同元素所有的 index，反正 add / remove 的平均复杂度都是 O(1)。

(G) Get random number except blacklist

题目要求:

分析：

问题一：如果 blacklist 不是 set 的形式给的，至少要 binary search;

问题二：运气不好，或者 blacklist 非常大的话，我们可能要调用多次 getRandom，而这个 API 可能是非常贵的。

因此另一个思考角度是，从 white list 出发。

假如 white list 内存放不下呢？

相当于实现了一个 index -> whitelist[index] 的 mapping，可以保证一次 getRandom() 就可以得到有效元素。

i 从 0 到 n 循环，如果大于 blacklist 最后一个数，或者小于 blacklist 当前数，都 count --;

否则 blackPtr ++ ;

每次循环的时候，count 扣完了，当前 i 就是目标元素。

huixuan : 我之前还想过这个问题可以变形成另外一个问题：比如产生长度为k的随机序列，序列中数字的值在[0, n]中sample，但是不可以有重复。要求讨论两个case: k和n差不多大、k远小于n

Random number generator with weight

核心是 cumulative distribution. 计算出 [0.1, 0.3, 0.5, 0.9, 1] 的 cdf 之后，我们可以随机返回一个 【0,1】 之间的数，然后对于 cdf array 做 binary search.

我们要寻找的，就是第一个 random number < cdf[i] 所对应的元素 num[i].

题目描述： Given a data set of unknown size N, uniformly select k elements from the set such that each element has a 1/N probability of being chosen.

分析 & 证明：

核心是 proof by induction.

从 k = 1 开始，对于第 i 个元素，我们有 1 / i 的几率让当前元素成为 candidate，相应的，有 (i - 1) / i 的概率直接扔掉这个新元素。

假设我们有 3 个元素，那么第 3 个元素只会被 check 一次，有 1/3 的几率留下，2/3 的几率被扔掉；

而它的前一个元素，2，会被 check 两次，一次是自己，一次是 3 的时候；而它成为最终选中的元素需要同时满足两个条件：

考虑到自己的时候，随到了 keep : (1/i = 1/2 概率)

考虑后面元素的时候，没替换掉它，即后面元素都 discard 了：(2/3 概率)

两项相乘，等于 1/3.

这个 induction 的推导，对于任意有效 k 和 n 也都适用，从后往前。

倒数第二个元素选中的概率 =(k / n - 1) * (1-k/n*1/k)=(k / n - 1) * (n - 1 / n) = k / n

倒数第三个元素选中的概率是=(k/n-2)*(1-k/(n-1)*1/k)*(1-k/n*1/k)=k/n

算概率的原则是对于倒数第K个元素, 被选中的概率乘以依次不会被替换的概率. 被替换的概率是1-k/i*1/k.

O(n) 时间，O(k) 空间。算法和测试代码如下：

代码和流程惊人的简单：

遍历每个 i ，随机从 i 还有 i 后面的区间抽一个数，和 i 交换；

重点是 index = 【i, n - 1】区间。每个元素有留在原位置的概率。

其原理非常类似于一群人在一个黑箱子里抽彩票，for 循环中的每次迭代相当于一个人，每次抽样范围 index = 箱子中剩下的所有彩票，其数量依次递减。到最后每个人虽然抽奖时箱子里彩票总数不同，但是获奖概率却是均等的。

在x和y range之间生成一个double

results matching ""

No results matching ""

核心是 cumulative distribution. 计算出 [0.1, 0.3, 0.5, 0.9, 1] 的 cdf 之后，我们可以随机返回一个【0,1】之间的数，然后对于 cdf array 做 binary search.

倒数第二个元素选中的概率 =(k / n - 1) * (1-k/n1/k)=(k / n - 1) (n - 1 / n) = k / n

倒数第三个元素选中的概率是=(k/n-2)(1-k/(n-1)1/k)(1-k/n1/k)=k/n