Getting Started

Oh hey!! It’s mid-term already, can you believe it? By the way, thanks for staying with us this far. It means a lot.

Now let’s talk about the elephant in the room. What’s the mid-term, and why am I putting so much stress on it? Well, the Mid-Term Evaluation marks a crucial phase in one’s GSoC journey. During this period, Google asks contributors and mentors to assess the progress made thus far, evaluating the contributor’s capabilities and determining their suitability for continuing the project.

And on clearing the evaluation, the contributors get their share of the stipend. So yay!!

So... what’s in store for today?

You are probably familiar with our tool by now. If not, go check it out here. Anyway, you can see that it could attract a large audience of people who wish to edit their videos or upload their stuff to Wikimedia Commons.

But is our tool capable of handling such traffic?

Putting the tool to test

OK, so to test how the tool copes with heavy traffic, here’s what I did.

I opened multiple instances of the tool: multiple tabs, multiple windows, and so on. I tried opening multiple instances with the same account (trying to upload multiple videos from the same account), and opening the tool with multiple accounts in different windows. And I tried editing the videos at the same instant.

Here’s what I gathered: in every case, the latest video edited by the tool’s server was being sent to all the connected clients. It seemed like we were dancing with our hands tied.

[Image: issue explaining the problem]

As you can see in the image above, the instances send their respective videos V1, V2, V3, V4. But they all seem to be receiving the same video after editing (shown by the red arrows).

On Investigating I Found

OK, so this functionality was implemented on the last day of Wiki Hackathon, so it’s natural for it to have some vulnerabilities. Anyway, here’s how it worked at the time:

// maintain a list of socket connections
const socketConnections = [];

// capturing the "connection" event; the handler receives the new socket
io.on("connection", (socket) => {
  socketConnections.push(socket.id);

  // if user is logged in, update the socket id in database
  if (userLoggedIn) {
    UserModel.update(socket.id);
  }
});

// sending data to all sockets
io.to(socketConnections).send(data);

Note: The code above is only pseudocode, meant to help you understand how it worked.

For those who couldn’t follow, here’s what was happening:

  • On every connection, a new socket id was generated.
  • The newly generated id was appended to a list of connections. If the user was logged in, we stored the last known socket id in the User model.
  • The data was then sent to all the connected socket ids.

What Really Happened?

So with multiple connections, each socket id was appended to the list of socket connections (even though they belonged to different users, or some of them were no longer active). Sending every piece of data to every socket id meant that, for some connections, the previously sent data was overridden by the new one.

This is why some users were getting a different edited result from the one they ordered. They did get the edited video they asked for, but it was then overwritten by the next video the server finished editing.
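
To make the failure concrete, here is a small self-contained simulation. It does not use Socket.IO at all: `FakeClient` and `broadcast` are hypothetical stand-ins, and the sketch assumes each client simply displays the last video it received.

```javascript
// Hypothetical stand-in for a connected browser tab: it remembers
// only the most recent video the server pushed to it.
class FakeClient {
  constructor(socketId) {
    this.socketId = socketId;
    this.lastReceived = null;
  }
  receive(video) {
    this.lastReceived = video;
  }
}

// Mimics the old behaviour: every result goes to every connection.
function broadcast(clients, video) {
  for (const client of clients) {
    client.receive(video);
  }
}

const clients = ["s1", "s2", "s3"].map((id) => new FakeClient(id));

// Three tabs each asked for their own edit; the server finishes them in order.
broadcast(clients, "edited-V1");
broadcast(clients, "edited-V2");
broadcast(clients, "edited-V3");

// Every tab now holds the same, latest video.
const results = clients.map((c) => c.lastReceived);
console.log(results); // [ 'edited-V3', 'edited-V3', 'edited-V3' ]
```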

My proposal

Here’s what I proposed: why don’t we send the concerned data to the concerned socket id, instead of sending everything to everyone?

I know this doesn’t sound like genius, but here’s what I did.

Every new connection means generating a new socket id (even on refreshing the tab, the server generates a new socket id for the user). So why not send the socket id back to the client, have the client send that id along when it requests the payload data, and then send each client its respective data?

Not following?

  • On a new connection, we check whether the user is logged in. If they are, we send the user data back to them, along with the newly generated socket id.

  • We store the user data in the browser’s localStorage, and send it back to the server along with the API request for payload data.

  • The server extracts the socket id (sent by the client), and then sends the payload data back to that socket id over the socket connection.

  • Oh, and we never store the socket id in the database, as it changes on every refresh. We do not want to overwhelm our database.
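
The server side of these steps could be sketched roughly like this. `io.to(socketId).emit(...)` is the Socket.IO call for reaching a single socket (each socket joins a room named after its own id). The function name `sendProgress`, the event name `"progress"`, and the request shape are assumptions made for illustration, and the stub `io` object exists only so the snippet runs without a real server.

```javascript
// Send a payload only to the socket that asked for it.
// `io` is expected to look like a Socket.IO server instance.
function sendProgress(io, request, payload) {
  const { socketId } = request; // supplied by the client with its API call
  if (!socketId) {
    throw new Error("request is missing a socketId");
  }
  io.to(socketId).emit("progress", payload);
}

// Minimal stub so the sketch runs standalone: records who got what.
const delivered = [];
const stubIo = {
  to(room) {
    return {
      emit(event, data) {
        delivered.push({ room, event, data });
      },
    };
  },
};

sendProgress(stubIo, { socketId: "tab-A" }, { video: "edited-V1" });
sendProgress(stubIo, { socketId: "tab-B" }, { video: "edited-V2" });

// Each payload went to exactly one socket id.
console.log(delivered);
```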

Seems correct, right?

Yeah, this approach actually worked pretty well, given the back-and-forth communication between the client and the server. I tested it by opening multiple windows and editing two different videos at the same time. In both of them I got the correct, desired result. Yay!!

But…

Here’s an interesting fact about localStorage

The localStorage read-only property of the window interface allows you to access a Storage object for the Document’s origin; the stored data is saved across browser sessions. It is similar to sessionStorage, except that while localStorage data has no expiration time, sessionStorage data gets cleared when the page session ends — that is, when the page is closed.

Source: MDN web docs

Anyway, you might have guessed the issue here. If not, let me explain. The data stored in localStorage doesn’t have an expiration date, so it won’t be deleted. Besides, the data stored in it persists across browser sessions and is shared by every tab of the same browser that has our tool open. So when uploading multiple videos from multiple tabs, all of the instances would see the same localStorage data.

So if any user logs in, the server sends the new socket id and user data back to the client. We store that user data in localStorage and use it when required. On opening multiple tabs in the same browser, localStorage gets overwritten by every new tab instance, and every tab ends up sharing the same socket id.

And so even though two tabs would send two different videos to edit, since they share the same socket id (the same data in localStorage), they would be getting the same processed video back from the server after editing.
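
Here is a tiny simulation of that overwrite. A plain `Map` plays the role of localStorage (a hypothetical stand-in, since the real one only exists in a browser), and `openTab` is an illustrative helper, not code from the tool.

```javascript
// Hypothetical stand-in for the browser's localStorage: a single
// key/value store that every tab of the same origin reads and writes.
const sharedStorage = new Map();

// Each "tab" connects, gets a fresh socket id from the server, and
// saves it under the same key, as the client did with localStorage.
function openTab(socketIdFromServer) {
  sharedStorage.set("socketId", socketIdFromServer);
  return {
    // The id this tab would attach to its next API request.
    idForRequest: () => sharedStorage.get("socketId"),
  };
}

const tabA = openTab("socket-A");
const tabB = openTab("socket-B"); // overwrites tab A's entry

console.log(tabA.idForRequest()); // socket-B
console.log(tabB.idForRequest()); // socket-B
```

Both tabs now send the same id with their requests, so the server has no way to tell them apart.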

Now what

As described in the extract from the MDN web docs above, the data stored in sessionStorage stays as long as the page session exists and gets cleared when the page session ends. If a new page opens up, we get fresh storage for that particular session.

Basically, every new tab would have its own dedicated storage where we could store the socket id, and since it is page-dependent, the data stored in it won’t be the same across multiple tabs of the same browser.

That works fine, but instead of using another storage option, why not store the socket id in memory? And yes, this is what we’ve resorted to for now. We store the socket id sent from the server in a React Context and use it across the entire app when required. This prevents the shared socket id case entirely, so each instance (irrespective of whether it’s the same user or a different one) has a unique socket id, which it sends back to the server to receive its payload data.
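
The idea of keeping the id in memory can be shown without React. In the sketch below a plain closure stands in for the Context value; `createSocketIdStore` is a hypothetical name, and the only assumption is that each tab runs its own copy of the app and therefore creates its own store.

```javascript
// Each tab runs its own copy of the app, so each gets its own store.
function createSocketIdStore() {
  let socketId = null; // lives only in this tab's memory
  return {
    set(id) {
      socketId = id;
    },
    get() {
      return socketId;
    },
  };
}

const firstTab = createSocketIdStore();
const secondTab = createSocketIdStore();

firstTab.set("socket-A");
secondTab.set("socket-B");

console.log(firstTab.get()); // socket-A
console.log(secondTab.get()); // socket-B
```

Nothing is shared between the two stores, and the value disappears on refresh, which is fine because the server hands out a new socket id on every connection anyway.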

What’s in the future?

Well, the above approach works for the current version of the tool at least, and makes the tool more efficient and stable than it was.

But we won’t stop here just yet. OK, here’s a sneak peek of the future: we are planning to introduce a new feature called “Pending work”, which, as the name suggests, allows users to store their pending work (videos) with us. Users can keep their previously edited videos with us, marked as not yet decided on, i.e. whether they want to upload the videos to Commons or download them.

The main use case of this feature, which I found appealing, is when the user edits large videos. This obviously takes time, and in the meantime, if they have more videos to edit (which can be far shorter than the previous one), they can edit them in parallel instead of waiting for the large video to finish. The video already being edited would go to their “Pending work” queue and keep processing in the background while the user continues with their work.

As useful as this sounds, it’s going to look just as fancy. We are planning to show a sidebar with all the videos the user currently has stored with us, whether they are still being processed or finished with editing. If they are still being processed, we’ll show a live progress bar (from the payload data), along with the current video progress (if any).

How is this going to work out?

Well, for now it’s still at a very early stage, but here’s what I thought. Maybe someone among you readers can help me with this?

  • We maintain a map (dictionary) to store the active socket connections against the user id.
  • I don’t know yet how I’m going to check for active connections. One approach could be to catch the “disconnect” event of the sockets and, upon disconnection, send the user id as well, which will help me delete the socket id from that user’s active connections.
  • Here’s a skeleton of the map:
socketConnections <map> = {userID: [ active socket ids ]}
  • As far as sending the required data is concerned, I now have to send all the payload data of a particular user to all of that user’s instances, i.e. to all of their socket ids.
  • Why? Because I need to show the live preview for the videos in “Pending work”, as well as for the video which is currently being edited.
  • How do we map the data to the respective videos? Maybe link them with the video ids. I have not thought this through, but cross-mapping them with video ids and user ids looks like a way to go.
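
One possible shape for that map, as a rough sketch rather than the final design: a `Map` from user id to a `Set` of socket ids, with every message tagged by video id. All names here (`connectionsByUser`, `addConnection`, `removeConnection`, `notifyUser`) are hypothetical, and the `send` callback is injected so the sketch does not depend on any particular socket library.

```javascript
// userId -> Set of socket ids currently open for that user.
const connectionsByUser = new Map();

function addConnection(userId, socketId) {
  if (!connectionsByUser.has(userId)) {
    connectionsByUser.set(userId, new Set());
  }
  connectionsByUser.get(userId).add(socketId);
}

// Meant to be called from the socket's "disconnect" handler.
function removeConnection(userId, socketId) {
  const sockets = connectionsByUser.get(userId);
  if (!sockets) return;
  sockets.delete(socketId);
  if (sockets.size === 0) {
    connectionsByUser.delete(userId); // no open tabs left
  }
}

// Send one video's progress to every open tab of its owner.
// `send(socketId, message)` is injected so this stays transport-agnostic.
function notifyUser(userId, videoId, progress, send) {
  const sockets = connectionsByUser.get(userId) ?? new Set();
  for (const socketId of sockets) {
    send(socketId, { videoId, progress });
  }
}

addConnection("user-1", "s1");
addConnection("user-1", "s2");
addConnection("user-2", "s3");
removeConnection("user-1", "s1"); // that tab was closed

const sent = [];
notifyUser("user-1", "video-42", 80, (socketId, message) => {
  sent.push({ socketId, ...message });
});
console.log(sent); // [ { socketId: 's2', videoId: 'video-42', progress: 80 } ]
```

Tagging each message with `videoId` lets a tab route the update to the right entry in the sidebar, whichever tab started that edit.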

PS: The above design may or may not be used. We usually discuss designs thoroughly in our bi-weekly catch-ups. And I’m pretty sure the above approach is going to have some vulnerabilities of its own.

Is it time?

For this blog?.. Yea

For the program?.. Oh, it’s only half time!! So yeah, with this, we come to the end of our blog for now. We hope you enjoyed this one. If anything interesting comes up, or if you wanna say hi, you know where to connect with me :)

Stay tuned for the next half. We promise to bring out more interesting stuff. Till then, bye-bye.