Close
youtube history feature efficient update

Low Level Design of Youtube’s last watched video feature (part – 1)

Introduction

I’m sure most of you have seen/used the last watched history feature of youtube. But ever wondered how it must be working internally? Well I am also curious about this to decipher how it really works so effortlessly. Before we dive into the discussion I have divided this blog into two parts where in this part we dive into sub-problems that are pivotal to efficiently maintain the watched video history.

Problem defination

A few assumptions I will be making here. If you see youtube’s last watched, it maintains per day of watched history. We assume that youtube maintains a month of history per user & per day only 100 videos. In this part lets discuss a critical part of how to efficiently maintain today’s history, because if you see lots of action will happen in today history.

  • Every watched video will appear at the top of today’s history.
  • In case when you watch an older video from the day, it comes back on the top again.
  • If the limit of 100 videos reached we need to remove the last video.

Implementation of last video watched history efficient update

usage Doubly linked list for Efficient update across

Let’s try to think about the problem here. If you see what is really happening when you watch a video it just comes on the top, be it an older one that watched hours ago or a new one. So we need to use a data structure which can efficiently move watched video from the ith location to top and should not disturb other locations. Basically we need O(1) time to move the video from the list. What data structure should we use? I think the Doubly Linked List would work nicely here. Nice we figured one part of the problem.

usage of HashMap to keep the Doubly linked list node cached

Now let’s think about how to figure out whether the video is watched already available in the list. If you think then in the case of DLL, it has O(n) searching time complexity. That does not look good, so what to do now? If you think a bit more you would notice that if we store the reference to the node of DLL. We should be able to search in O(1) wouldn’t that be great. Do you think any other data structure can help us here? I can think of a hashMap where we can store <Video_ID, Ref_fo_DLL_Obj> to achieve an O(1) search time.

Existing video details replacement from the hash map and Doubly linked list

One last item is when the limit reaches we need to remove the last video, that too in O(1). But DLL we only maintain the head not the tail. But in this case we need to maintain so that we can achieve O(1). Now think a bit more the DLL is starting to resemble another structure like Deque(Double ended queue).

All coming together for efficient watched history update

So over all if implementation Dqueue as DLL and hashMap. We could achieve O(1) additional, O(1) moving & O(1) tail deletion. Let’s dive into the implementation part now. Below VideoDoublyLinkedList.java has implementation of DLL where as WatchedVideoDayHistory.java has logic to keep the history list upto date as what video last watched. Copy the code to your editor, try to run and understand.

Main.Java

import working.example.tech.LastWatchedYoutubeVideo.
WatchedVideoDayHistory;

import working.example.tech.interfaces.TestCaseRunner;
public class Main {
    public static void main(String[] args) {
         TestCaseRunner wVideoHist = new WatchedVideoDayHistory();
         wVideoHist.RunTest();
    }
}

WatchedVideoDayHistory.java

package working.example.tech.LastWatchedYoutubeVideo;
import working.example.tech.interfaces.TestCaseRunner;
import java.util.HashMap;
import java.util.Vector;
public class WatchedVideoDayHistory implements TestCaseRunner {
    private VideoDoublyLinkedList DLL;

    public WatchedVideoDayHistory() {
        this.DLL = null;
    }

    private void run(Vector<Integer> listOfvideoId, Integer perDayHistorySize) {
        HashMap<Integer, VideoDoublyLinkedList.Node> videoMap = new HashMap<>();
        DLL = new VideoDoublyLinkedList();
        for (Integer videoId : listOfvideoId) {
            VideoDoublyLinkedList.Node exist = videoMap.get(videoId);
            if (videoMap.size() < perDayHistorySize) {
                if (exist == null) {
                    videoMap.put(videoId, DLL.add(videoId)) ;
                } else {
                    DLL.remove(exist);
                    videoMap.put(videoId, DLL.add(videoId));
                }
            } else {
                // replace existing and add new
                DLL.remove(DLL.getTail());
                DLL.add(videoId);
            }
            showOut();
            System.out.println();
        }
    }
    @Override
    public void RunTest() {
        Vector<Integer> list = new Vector<>();
        list.add(123);
        list.add(234);
        list.add(345);
        list.add(123);
        list.add(432);
        list.add(456);
        list.add(345);
        list.add(432);
        list.add(432);
        this.run(list, 100);
    }

    @Override
    public void showOut() {
        VideoDoublyLinkedList.Node head = DLL.getHead();
        while(head != null) {
            System.out.print(" " + head.VideoId);
            head = head.next;
        }
    }
}

VideoDoublyLinkedList.java

package working.example.tech.LastWatchedYoutubeVideo;
public class VideoDoublyLinkedList {
    private Node head;
    private Node tail;


    public VideoDoublyLinkedList() {
        head = null;
        tail = null;
    }
    class Node {
        int VideoId;
        Node next;
        Node prev;
        public Node(int videoId, Node next, Node prev) {
            VideoId = videoId;
            this.next = next;
            this.prev = prev;
        }
    }

    public Node add(int Id) {
        Node node = new Node(Id, null, null);
        if (head == null) {
            head = node;
            tail = head;
        } else {
            node.next = head;
            head.prev = node;
            head = node;
        }
        return node;
    }

    public Node remove(Node in) {
        Node deleted = in;
        if (in == head && in == tail) {
            head = null;
            tail = null;
        } else if (in == head) {
            head = in.next;
            head.prev.next = null;
            head.prev = null;
        } else if (in == tail) {
            tail = in.prev;
            tail.next.prev = null;
            tail.next = null;
        } else {
            Node tempNode = in.prev;
            tempNode.next = in.next;
            in.prev = null;
        }
        return deleted;
    }

    public Node getHead() {
        return head;
    }

    public Node getTail() {
        return tail;
    }
}

Input:

Video Ids : 123 234 345 123 432 456 345 432 432 432

Output:

 123
 234 123
 345 234 123
 123 345 234
 432 123 345 234
 456 432 123 345 234
 345 456 432 123 234
 432 345 456 123 234
 432 345 456 123 234

Conclusion

In this part we discussed the youtube’s last watched video sub problem, and how it can be implemented efficiently. in the next part, I shall discuss low level design of the entire feature. So stay tuned!

References

https://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html

https://www.scaler.com/topics/data-structures/doubly-linked-list/

More blogs

https://www.workingexample.tech/how-to-write-a-web-crawler-for-large-language-models-at-scale-part-2/

https://www.workingexample.tech/how-to-write-a-web-crawler-for-large-language-models-at-scale-part-i/

https://www.workingexample.tech/low-level-design-of-youtubes-last-watched-video-feature-part-2/