Photo Organizing!

I’m sure you have about 10,000 pictures all in one big folder with no organization, multiple copies, and a wishful intent of organizing them. When you finally find the time to sit down and do your organizing, the task seems so insurmountable that you suddenly realize you forgot to scrub your kitchen floor, do the laundry, weed the garden, and complete every other chore you’ve been putting off! I was there — I made it through.


I was inspired by Timehop, of all things, when I sat down to organize all of my photos. The intention was that I would combine my entire family’s photos together, organized chronologically. Timehop, for those of you who don’t know, connects to your social media accounts (Facebook, Twitter, Instagram, Google Photos, and Dropbox) to gather your pictures. Each day, Timehop shows you the pictures taken on that day throughout your history. It’s pretty fun watching your kids grow up and seeing how much they have grown each day. I quickly found that, while Timehop is pretty great, it requires pictures. I’m not the big picture taker in the family (my beautiful wife is), so my Timehop was barren and hers was full of all the pictures. This made me sad, but it also inspired me to bring all of our pictures together (along with all the other pictures I had sitting around on several devices). So here’s what I did to bring my vision to life.

The Work

I started by bringing all of my pictures from all of my devices to my computer. That meant finding an SD card reader, the power cord for my old Nikon camera (circa 2001), a plethora of CDs, all the odd folders on my PC, the photo folders I had on Dropbox, and the photo folders I had in Google Drive, and bringing them all together in one place on my PC to start organizing. Once I had everything in one place, I realized there were quite a few pictures. Roughly 230GB worth of photos!

I started organizing in folders by year, then month, then day to get a good chronological order. I quickly realized there were many, many, many duplicates, and organizing chronologically helped to surface them. I would manually inspect each set of duplicates and keep the better-quality one. I also realized some of the images (typically the ones I had in backup folders) were missing their date information. I put these in a separate folder. Some of them had the date imprinted in the photo, and some of those imprints were blatantly wrong (why is the sun out at 2AM?). These also went in the “unknown” folder. I went through all of the pictures and got them sorted chronologically as best as I could. There were a LOT of duplicate photos I was able to remove.
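Manually eyeballing duplicates works, but exact byte-for-byte copies (the kind backup folders breed) can be flagged automatically with a content hash. A minimal sketch in Python, stdlib only; the folder path is whatever you’re using as a staging area:

```python
import hashlib
import os
from collections import defaultdict

def file_hash(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(root):
    """Group files under `root` by content hash; return groups of 2+ identical files."""
    groups = defaultdict(list)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            groups[file_hash(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Note this only catches exact copies; resized or re-saved versions of the same photo still need the manual inspection described above.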

For the “unknown” folder, I would occasionally find a dated copy of an unknown picture and was able to remove the unknown one. At the end of all of this organization, I was able to identify the dates of some of the unknown pictures by their context with other pictures (same outfit, same people and location, etc.) and updated the EXIF data with the correct dates. There were a few photos that were also mis-dated, but I didn’t find those until later.

After chronologically organizing these pictures, I realized I had mixed in some of my wife’s pictures and some of my in-laws’ pictures. This had a really cool effect of seeing what the in-laws were up to when the wife and I were somewhere else. It was interesting to see what my wife was up to when I was hanging out with friends before we even met. This side effect was an awesome early payoff for the work I had done already. I wanted to include the rest of the family in this, but haven’t had much luck in getting buy-in. Google Photos has the ability to share an album with others, allowing collaboration on it. I might revisit this option in the future, especially since my brother recently lost all of his photos.

The Backup

So, after getting all of these photos finally organized (it took a few hours each night for about a week), I wanted all of this hard work backed up. I looked to Google Drive to be the workhorse for this. I uploaded all of the pictures to Google Drive one evening and went to bed. That morning I had a bunch of emails from Google: my Google Drive was almost full, I had options to upgrade my space before I ran out, and, finally, my Google Drive was full. I learned that your Google Drive quota covers all of your uploads as well as all of your emails! So, I stopped receiving emails sometime that evening. It was a bit alarming discovering this… so I quickly removed the partially uploaded photos from Google Drive and turned to Dropbox.

Dropbox uploads were done through a folder sync using the companion app. This seemed to go well; I set it up and let it run. Dropbox indexes all of the files it needs to upload before uploading them, so it took a few hours to do that before the actual upload started. The upload lasted several more hours before I started getting emails in the middle of the night about my Dropbox becoming full, then actually being full… So, the next day, I stopped the upload and removed the partial upload from Dropbox. Strike two…

When I originally uploaded my photos to Google Drive, I thought it automatically moved them to Google Photos. Google is magic, right? It turns out they had recently announced they were going to stop doing this in the name of simplicity. It was still possible, but confusing to configure. So, I skipped Google Drive and went straight to Google Photos. I read up on what the requirements were and decided to take the plunge… if it didn’t work out, the worst that could happen would be a few emails, right?

The uploads took most of the day. I wasn’t sitting in front of the computer the whole time, but I would periodically check in to see how it was doing. I wasn’t getting email warnings, so that was a bonus. I also installed the Google Photos app on my iPhone, and it started backing up all of the photos I’d ever taken on the phone to Google Photos as well. After all of the uploads finished (finally), I got my second awesome surprise: Google Photos’ “assistant” started identifying duplicate photos! All of the hard work I had put in earlier identifying and eliminating duplicate photos was being done automatically by Google Photos’ assistant! It even found more duplicates. It also automatically created albums based on the geotags of some of the pictures. It was super neat to see albums starting to be created.

Over time, the Google Photos assistant started identifying people and pets in the pictures and made albums for each person. In the year since I finished, it’s also been able to identify the same person at different ages and asks if the two faces are the same person. I no doubt have an uneasy feeling that this is training some AI on facial recognition, but, as with any other social media site, it’s a price I’m willing to pay to get the awesome benefits of this free service. Because of this facial recognition, Google Photos has provided me videos of my son growing up through the years, of my pets, and of my wife and me since we met. It’s been rewarding each and every day.

I did end up sharing the entire library with my wife. She sees whenever a new photo is uploaded, so I don’t have to AirDrop or text her the pictures I do take; she just gets them in her Google Photos app. I also don’t have to worry about losing pictures on my phone. I can create and share albums in real time with people instead of having to remember to send them later. It’s been an interesting project that’s been more and more rewarding as each day passes.

Wrap Up

My Timehop feed has improved: I have an 811-day streak! It’s been really awesome to see what my wife was doing on the day I graduated high school, or what I was doing on the day she graduated college. It’s shown how large this world is when you step outside your own life and realize everyone is living their own. It really makes me want to get all of the family’s pictures into one place and see how my cousins, nieces, nephews, and the rest of my family are doing. While I’m not physically with these people every day, a collective photo album like this would almost be like being there.

The Importance of Data Backups

If it hasn’t happened to you yet, you will eventually suffer a data loss event. The most common one: the hard drive with all your family pictures on it suddenly and unexpectedly dies. If you haven’t planned for this, you may have to spend a bunch of money on data recovery to get your valuable family pictures back. Worst case, you spend the money only to find out they can’t be recovered. This happened to my brother recently. He thought he had a backup system in place, but when his hard drive crashed, even I couldn’t help him. He found out the hard way that his backup system wasn’t running as expected, and he had to shell out $5,000 for a specialized clean-room direct-read data recovery process that was ultimately unable to recover his pictures. If he’d had a working automatic backup system in place, he could have prevented this disaster.

Planning for Disaster

If your computer were to die today, do you know whether all of your important data is easily recoverable? If your answer is “no” or you don’t know, perhaps it’s time to set up something simple and easy now to prevent the type of disaster my brother experienced. In Windows, most of your documents are automatically stored in your “My Documents” or “Documents” folder (depending on your Windows version). This is by design on Microsoft’s part, as it keeps all of your user files in one location. Back up that one single folder and you’ve backed up most (if not all) of your important files. There are several premium services out there to help you keep your data safe. There are also some free services you could use.
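As a bare-minimum starting point before (or alongside) a cloud service, a date-stamped zip of that one folder is a few lines of Python. The folder names below are placeholders for wherever your documents and backup drive actually live:

```python
import datetime
import os
import shutil

def backup_folder(source, dest_dir):
    """Zip `source` into `dest_dir` under a date-stamped name; return the archive path."""
    os.makedirs(dest_dir, exist_ok=True)
    stamp = datetime.date.today().isoformat()
    base = os.path.join(dest_dir, f"documents-backup-{stamp}")
    # shutil.make_archive appends the .zip extension itself
    return shutil.make_archive(base, "zip", source)
```

Schedule it with Task Scheduler or cron and, as the story above shows, periodically test that the archives actually restore.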


Dropbox provides a limited amount of space for free accounts, with the option of upgrading for more. This service has a companion application you install on devices you want to automatically back up files from. I personally use Dropbox for backing up files from my PC and from this server. Your files are accessible via the companion app when you install it on a new device, or on the Dropbox website.

Google Drive provides a limited amount of free space and has a companion application that is installed on each computer from which you want to automatically back up files. The companion app can be configured to sync multiple computers or to back up individual computers into a single online Google Drive. This could be a useful utility for those with multiple computers they want to back up but not share data between, or for those who want to sync working files across multiple devices. I personally use Google Drive for backing up document files (PDFs, spreadsheets, text documents, etc.).

Google Photos allows free and unlimited storage for down-scaled images, and the service provides a few extra perks like de-duplication, automatic photo album creation, automatic styling, automatic organization, and automatic videos. This was very helpful to me when I consolidated all of the pictures I’ve taken over the past 18 years from several different locations. This is a great option if all you care about are photos and you take the majority of them with your phone. Photos are automatically down-scaled and uploaded to Google Photos. I personally use Google Photos for all of my photo needs. I chose it when I filled my Google Drive and Dropbox space with photos and was looking for a free alternative. If you are a photographer or are concerned about retaining the original, high-resolution photo, there is a paid option for Google Photos which allows you to store the original photo rather than the down-scaled image.

iCloud provides a limited amount of free space for Apple users. This drive is a little more complicated to set up and is typically used for syncing files across multiple Apple products. I’ve actually lost files using this service due to its rather unintuitive interface and confusing backup strategy.

To get started, you simply need to pick one. I recommend Dropbox for file backups as it’s simple to use and easy to set up. I highly recommend Google Photos for all of your photo backup needs. Remember, if you try one and don’t quite like it, you can always switch to another!


My brother’s backup strategy was to use an external hard drive and back data up to it periodically. It wasn’t a terrible plan; I used to do the same. He was at least thinking of data backup before disaster struck. I’ve helped clients of mine with fixing their computers, and in most of my conversations on data backup the concepts were brand new to them. Many of them didn’t understand the importance of data backup until it was too late and data was lost. My brother’s backup attempts were valiant, but his downfall was never testing the backup system and never checking whether it was working. This double-whammy ultimately led to a false sense of security, something regular maintenance and upkeep would have caught; that upkeep is essential when you create your own system. If you want backups to be easy and truly a “set it and forget it” thing, online backup providers are your ideal choice: they maintain their systems for you, and you don’t have to worry about your backup hardware failing.

If you would like help setting up your backup plan, I am available to help! Simply contact me and we can set up a time to backup your data!


Hacked accounts, identity theft, security breaches. Security compromise is no longer a question of “if” but a question of “when.” With companies hoarding personal information for marketing, the payload of a successful data breach becomes increasingly valuable for follow-up attacks on individuals. Identity theft leveraging stolen information is typical after a large data breach. The payload is even more valuable if passwords are involved, since attackers can then more easily gain access to your other accounts and impersonate you.

Data Breaches

The elephant in the room. These are occurring more and more at companies that aren’t security-first oriented. Startups and established, trusted businesses alike are at risk and are often victims of data breaches. They occur so often it’s hard to keep track, and they are often left undiscovered for several years. What can an individual consumer do to prevent these breaches? Practically speaking… nothing. You could decide not to use these businesses, but let’s face it, that is rather inconvenient and practically impossible unless you want to completely remove yourself from the internet. Even physical devices such as card skimmers are used to steal financial information. So unless you want to deal solely in cash, hold no money in a bank, and stay off the internet entirely, you are prone to data breaches.

Take Precautions

There are tools to help protect yourself! To find out whether you are part of a data breach, you can sign up for the completely free “Have I Been Pwned” website (that’s pronounced “Have I Been Owned”) to be notified if the information you provide is involved in a data breach. They also have a password checker to determine if a password you use has been identified in a breach. These are two very useful tools for determining which data breaches you are involved in and what information of yours may be subject to compromise. I use this service for all of my email accounts and have also added this website’s domain as an added precaution. It’s completely free, and Troy Hunt is very active in its development. I strongly urge you to check it out, even if just to see if your account was involved in any of their identified data breaches. Spoiler alert: it is, especially if you have an account, like me, with any of these companies:

  1. Adobe
  2. LinkedIn
  3. Dropbox
  4. Evite
  5. Bitly
  6. Elance
  7. MyFitnessPal
  8. Apollo
  9. Evony
  10. Houzz
  11. NextGenUpdate
  12. ShareThis
  13. Zomato
  14. FlashFlashRevolution
  15. Neteller
  16. and another 2,844 data breaches my accounts have been involved in

Some of these don’t seem familiar to me (ShareThis, Neteller, and Apollo), but upon further investigation, these are services other sites use for processing. This becomes more of a Privacy Policy fine-print issue, in that I never knew until a breach was published that another site used these for data processing. More interesting is that at least one breach was at a Chinese company (NetEase) that I’ve never used, and, through the breach disclosure, other victims have indicated the same!

Because of this data-processing issue, and the fact that nobody reads privacy policies (did you read mine?), a data breach at one of these services yields a much larger amount of data and a much larger payoff for hackers. To combat this, you should never reuse a password. Ever. If you currently have that one easy-to-remember password you use for all of your sites, you are not in the minority. You can see if your password has been compromised by searching for it at Have I Been Pwned’s Password Search. It’s astonishing to me that the password “password” has been involved in 3,730,471 breach records at the time of writing this article. Over 35 gigabytes of related breach record data can be downloaded for this password alone. Password reuse is a contagion for your accounts. If you use a password that has been breached, you should assume your account is breached and immediately reset every account that uses that password. I searched through my super-hard-to-guess, personal-to-me passwords and 3 of them had already been breached. I then spent the better part of 2 hours changing passwords.
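The password search is built so you never send your actual password anywhere: the service documents a k-anonymity “range” API that takes only the first five characters of the password’s SHA-1 hash and returns candidate hash suffixes you match locally. A sketch using that endpoint (stdlib only):

```python
import hashlib
import urllib.request

def sha1_parts(password):
    """Split the uppercase SHA-1 of a password into the 5-char prefix
    sent to the API and the 35-char suffix that is matched locally."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]

def pwned_count(password):
    """Return how many times a password appears in known breaches (0 if none)."""
    prefix, suffix = sha1_parts(password)
    url = f"https://api.pwnedpasswords.com/range/{prefix}"
    with urllib.request.urlopen(url) as resp:
        body = resp.read().decode("utf-8")
    # Each response line is "<suffix>:<count>"; only our suffix matters.
    for line in body.splitlines():
        candidate, _, count = line.partition(":")
        if candidate == suffix:
            return int(count)
    return 0
```

Because only the five-character prefix ever leaves your machine, the service learns nothing about which password (if any) you were checking.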

Password Management

Those 2 hours had me wondering if there was a better way than remembering which sites I have logins for and what those logins even are. Enter LastPass. LastPass is a free password manager that handles creating and storing all of your passwords. If you’ve used your browser to remember your usernames and passwords for all your sites, LastPass does the same thing, but using secure methods. Your browser stores all of your usernames and passwords in plain text. What that means is that anyone who has access to your computer can steal your usernames and passwords. It also means that if your computer is infected with a virus or malware, that malware can steal the usernames and passwords stored by your browser. Plain-text password storage is equivalent to writing down your passwords on a sticky note attached to your computer. It’s highly insecure and easily stolen.

LastPass requires a single strong master password (or better yet, a passphrase!). This password encrypts your passwords and sites into one single block of data known as a “blob.” There is no way to tell what sites you have passwords for or what the passwords are. The encrypted data is stored on LastPass’ servers with no way of decrypting it unless you know your master password. You cannot recover your master password, and it cannot be stolen unless you store it insecurely yourself (in plain text on your computer, for example). You do have a hint you can use in case you forget your master password, and there are a few options for recovery if you completely forget it (which I have on two occasions).
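This article doesn’t cover LastPass’s exact cryptography, but the general pattern for turning a master password into an encryption key is a key-derivation function such as PBKDF2. A stdlib-only sketch (the iteration count and salt handling are illustrative choices, not LastPass’s actual parameters):

```python
import hashlib
import os

def derive_key(master_password, salt, iterations=600_000):
    """Derive a 32-byte encryption key from a master password.
    Only the salt and iteration count need to be stored; the key
    (and the password itself) never have to leave the user's machine."""
    return hashlib.pbkdf2_hmac(
        "sha256", master_password.encode("utf-8"), salt, iterations
    )

# A fresh random salt per user; the same password + salt always
# reproduces the same key, so forgetting the password loses the key.
salt = os.urandom(16)
key = derive_key("correct horse battery staple", salt)
```

This is also why the master password can’t simply be “reset”: without it, the derived key, and therefore the encrypted blob, is unrecoverable.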

Getting set up on LastPass can be a bit cumbersome, but there is a password import utility that will import all of the website and password information saved within your browser into LastPass. This is a great first step in securing your online accounts. Once imported, you can disable password storing in your browser and leverage the LastPass browser extension to auto-fill, auto-save, and auto-update your passwords. When you create a new account on a website, LastPass gives you the option of generating a new random password for the site. This makes keeping strong, unique passwords for each site a breeze. I typically start at a high number of characters (~100) and reduce it if the website requires a shorter maximum password. The longer the password, the harder it is to crack. LastPass also provides options to include numbers and symbols and/or to make the password human-pronounceable (i.e., a passphrase).
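Generating a long random password like the ones described above is straightforward with Python’s `secrets` module, which is designed for cryptographic use. A sketch (the symbol set here is an arbitrary choice; real sites vary in which symbols they allow):

```python
import secrets
import string

def generate_password(length=100, symbols=True):
    """Generate a cryptographically secure random password."""
    alphabet = string.ascii_letters + string.digits
    if symbols:
        alphabet += "!@#$%^&*()-_=+"
    return "".join(secrets.choice(alphabet) for _ in range(length))
```

Start long (the ~100 characters mentioned above) and shorten only when a site’s maximum length forces you to.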

Once you have all of your passwords migrated to LastPass, the tool provides a Security Challenge that takes all of your passwords and runs some tests against them. It checks how many are reused, how strong they are, whether they have been involved in a known data breach, and whether they should be changed based on their age in LastPass. The fewer the problems, the higher the Security Challenge score. Ideally, you would want 100%, but that isn’t always practical.
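LastPass’s internal checks aren’t described here, but the reuse portion of a check like this is easy to picture. A toy sketch, assuming a simple site-to-password mapping (the vault contents below are made up):

```python
from collections import Counter

def reuse_report(vault):
    """Given a site -> password mapping, report passwords used on 2+ sites."""
    counts = Counter(vault.values())
    return {pw: n for pw, n in counts.items() if n > 1}

# Hypothetical vault: two sites share a password, one is unique.
vault = {
    "shop.example": "hunter2",
    "mail.example": "hunter2",
    "bank.example": "xK9#qZuniquepass",
}
```

Every password flagged by a report like this is one breached site away from compromising every account that shares it.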

Wrap Up

So, you might be a bit alarmed by all of this. You might even be overwhelmed. Security takes vigilance and persistence, but it also takes some time. If you’re sitting there using “password” for all of your sites and you’ve found it to be involved in thousands of breaches, don’t fret. You’ve identified that you have a password hygiene problem, so now you can fix it. Head on over to LastPass or any other password manager you feel comfortable with and start cleaning up the mess. Take it one step at a time and soon you will have a strong grasp on your account security. You are not alone! Myself, I have a current Security Challenge score of 51%, which is pretty garbage if you ask me. Seems like I have some password hygiene to take care of!

Server-Side vs Client-Side Trust


I like to pretend I’m an avid gamer. I try to keep up with the latest gaming trends… well, at least I try to. I have a few games I typically go to, and a few games I am excited to play once they release (and I wait until a sale, or until they’ve been out long enough that the price drops… I am an adult with responsibilities, after all…). I’ve played some games that are great (like Diablo III) and some games that are great in concept but lacking in execution (like Tom Clancy’s The Division). My go-to games are generally networked and have other players playing them in either a cooperative or adversarial capacity. There are some games, however, that draw more hackers than others. Why is that? This article is an attempt to explain the exploitation practices of these so-called “hackers” and the drive behind their exploits.

First, before we talk about how “hacking” works, we should set up some basics of network-based game play. There are several methods of accomplishing this type of play, and I’d like to discuss the pros and cons of each.


Peer-to-Peer

Peer-to-peer, sometimes referred to as P2P, is exactly what it sounds like. One player is the physical host of a game, and the other players connect to that host. In this type of networked gaming, the host has an advantage when it comes to latency (aka lag). The inputs from the host have a latency of effectively zero, while all other players have a latency based on the connection speed of the host player.

The main pro of this type of networking is its ease of use. It doesn’t require any specific setup or resources for a player to host a game. It also does not require the gaming company to establish and maintain any dedicated servers. Many console games use this type of setup, as the network demand is low and the games are typically casual.

Dedicated Server

Dedicated servers are hosted game servers that are specifically built to host a particular game. In games that use this type of connection, all players connect to the dedicated server, and each player’s latency is based on their individual connection to the hosting server. Typically, the dedicated server is geographically positioned close to an internet backbone. This type of server is generally used for more competitive play (like Overwatch or Counter-Strike) and can be set up in a LAN environment for offline play.

Cloud Hosted

Cloud-hosted servers are rather new to the gaming industry. These servers are typically allocated on demand for a particular game and are shut down after the game resolves. This reduces a company’s overall cost compared to maintaining dedicated servers, as capacity expands and contracts with player demand. Games that utilize this type of server generally have a matchmaking system that finds and groups players, allocates a new server, then loads the players onto the server. As the technology matures, this type of gaming server is likely to be adopted by many more games.

Building Trust

With any game comes some level of trust between players. For video games, this trust can be enforced by the server hosting the multiplayer game or by the game client itself. Server-side trust is generally the most trustworthy: commands come in from each player’s client, are validated, and then the game state is updated and sent to each player’s client. Client-side trust, however, is less trustworthy: the server assumes the commands it receives from each client are true, and no validation is performed.

Server Side Trust

In Server Side Trust, when a command is received from any player’s client, it is validated against the rules of the game and the game state. If a player’s client sends a fake command (“Player A shot Player B on the other side of the map and did a million damage”), it is logically checked against the game state (Is Player A in range of Player B?) and the rules of the game (Does the weapon Player A uses allow a million damage?). If a command violates the game state or the game rules, it is either ignored or flagged as suspicious. If enough suspicious commands are flagged for a player, that player could be banned from the game, as they indicate cheating.
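As a sketch of that validation step, here’s a toy server-side check. The range and damage limits are made-up values for illustration, not from any real game:

```python
import math

# Toy game rules (hypothetical values): the longest-range weapon reaches
# 50 units, and no weapon can deal more than 120 damage in one shot.
MAX_RANGE = 50.0
MAX_DAMAGE = 120

def validate_shot(shooter_pos, target_pos, damage):
    """Validate a 'Player A shot Player B' command against the game rules.
    Returns (accepted, reason); rejected commands would be ignored or
    flagged as suspicious by the server."""
    distance = math.dist(shooter_pos, target_pos)
    if distance > MAX_RANGE:
        return False, "target out of range"
    if damage > MAX_DAMAGE:
        return False, "impossible damage"
    return True, "ok"
```

A real server would check far more (line of sight, fire rate, ammo), but the principle is the same: the client’s claim is tested against state the server owns.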

Client Side Trust

In Client Side Trust, when a server receives a command from a player’s client, it is regarded as the truth. If, somehow, a fake command is sent from a client (say, “Player A shot Player B on the other side of the map and did a million damage”), the server trusts this command as true and accurate, updates the game state, then relays the state to each player’s client. The result for Player B is that they suddenly drop dead, killed by Player A. This is obviously a problem. Client side trust assumes the game client is secure, so extra steps are necessary to ensure the client is not modified and the messages sent to the server are genuine. In-transit communications can be protected through encryption, provided messages cannot be intercepted before encryption or after decryption (which is hard to guarantee).

Never Trust the Client

There’s a saying in software development: “Never trust the client.” This does not refer to a person, but rather to a consumer of server-side processes, be it a web application, game client, or anything else that processes transactions with a central (or distributed) server. Client side trust is inherently insecure. Server side validation is always required when communicating with a third party (in this case, a client application). This validation is crucial to ensuring the integrity of the system as a whole. Validation will break down eventually, most likely in very creative ways, and recovering becomes much easier if the server is already validating incoming messages against the rules of the system. Assuming that everything arriving at a server-side system from a third-party client is an attempt at breaking the system is a heavy-handed approach, but it will reap major benefits as the system grows. Having a validation system in place to thwart adversarial communications will always benefit your trustworthy clients.

In The Wild

Now that we all understand the different types of network based systems, let’s take a look at real world applications, how they are built, and the effects of the architecture of these systems.

Tom Clancy’s The Division

This game was built with client side trust as a feature. There are trade-offs, of course, but an unfortunate side effect is that the game client, or the traffic the game client sends to a server, can be manipulated. This particular game has a lot of computational complexity, including world state and physics. These complexities would need to reside on the server if server side trust were to be leveraged, and that, in turn, becomes expensive. Having a server powerful enough to model the physics of enemies and the world (which plays a large interactive part in game play) becomes almost cost prohibitive. Ubisoft’s approach to Tom Clancy’s The Division was to embrace client side trust from the beginning of development. This allowed the development team to quickly deliver a working (and beautiful!) game to their customers. As a side effect, the game is rampant with cheaters in the PVP areas where competition is high. In this case, the negative cost of cheaters in a PVP area affected a smaller portion of their customers, as the PVP area was opt-in. The positive benefits: very complex processes run on each player’s platform (PC, console, etc.), reducing the cost of server hosting for the game. Here is a partial list of computations the server would need to validate from each client:

  1. Enemy position, inventory, and state (fire, poison, bleed, etc)
  2. Player position, inventory, and state
  3. NPC position, inventory, and state
  4. Projectile information (bullet drop, grenade location, damage, etc)
  5. Objective information (for each player connected)

This becomes cumbersome in a vast game like Tom Clancy’s The Division. It would also require a lot of changes to the game client if the game were to switch to server side trust. The server would have to maintain the entire game state with each message from each client. It would also eliminate a unique aspect of Tom Clancy’s The Division: each player has their own world with its own state. This allows a player to join another player’s game (in a completely different state) and help them in their quest line. It also enables a changing world, where certain events in a player’s world permanently change that world somehow. World of Warcraft accomplished this in its many expansions, but the combat calculations in World of Warcraft, and the overall computations required, are minimized and streamlined for server side trust.

World of Warcraft

Possibly the best example of server side trust is World of Warcraft. This game is very light on many aspects while still immersing a player in the world through rich lore. The game itself is broken into different servers, each with a maximum player capacity. Each area within any given server is then broken up into smaller worlds, and each of those worlds is broken down further into smaller areas. This is called server sharding, and it helps balance the overall load of any sharded area based on population. This sharding is also why some of the characters you see in a major city disappear when they leave the city: they are migrated to a different server shard. It also explains why, when you enter an area after an event, that area has changed: you’ve moved to a different shard based on your quest progress.
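As an illustration only (not Blizzard’s actual algorithm), the core of a deterministic shard assignment can be as simple as hashing the zone and player together; a production system would also weigh current shard population and quest phase:

```python
import hashlib

def shard_for(zone, player_id, shards_per_zone=4):
    """Deterministically map a player in a zone to one of its shards.
    Using a stable hash means the same player always lands on the
    same shard for a given zone, with no lookup table needed."""
    digest = hashlib.md5(f"{zone}:{player_id}".encode("utf-8")).digest()
    return f"{zone}-shard-{digest[0] % shards_per_zone}"
```

Two players on the same shard see each other; a player hashed to a different shard of the same city simply isn’t rendered, which matches the “characters disappearing” effect described above.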

Aside from sharding, the battle calculations aren’t very complex. There is no bullet drop, there is no projectile pathing, there is no enemy inventory (enemy loot drops are calculated at the time of death based on loot tables). The entire game was developed for server side trust, and some sacrifices were made to accomplish this. Those sacrifices were made up for through rich storytelling and an immersive world.

Player Unknown’s Battlegrounds

PlayerUnknown’s Battlegrounds is another popular, competitive game that uses client side trust. It has accumulated many cheaters due to its competitive nature. Among the things handled with client side trust within the game client are:

  1. Inventory
  2. Hit Detection
  3. Collision
  4. Ammo
  5. Health
  6. Momentum

Any one of these values could be replaced with a message indicating a different value. If I sent a message saying I just picked up the best gun in the game, I’d have the best gun in the game, conjured up out of thin air. I could also send a message indicating I hit a player anywhere on the map, and that player would take damage. This game recently adopted Easy Anti-Cheat as a measure to prevent tampering with the client side trust. It works by providing a middle layer between the application itself and any process interacting with it. It verifies the integrity of data before sending it to the server and flags any suspicious messages. It also monitors processes that would tamper with the game client and flags those actions as well. Enough flags and Easy Anti-Cheat notifies the game company, who can later ban the player. This effectively moves server side trust into a client side layer that is not part of the client itself. This type of middleware is currently a better solution for applications built on client side trust than rewriting them to introduce server side trust.

Wrap Up

Competitive games are generally the target of cheaters, and many of them opt for client side trust for the lower traffic latency and the more complex algorithms it allows. The only solution for applications with client side trust that require a level of integrity is middleware that monitors the application process and any interactions with it. This is not entirely foolproof, but it does offer other players a greater deal of protection against cheaters. But, as with anything that gives someone an edge in a competition, where there's a will, there's a way.

OAuth2 Explained

I've recently been involved with the implementation of OAuth2 at my current position. As part of that involvement, I've had to read through the OAuth2 specifications and audit our system for compliance. Compliance is pretty cut and dried: it either works, or it doesn't. Since it was working and nobody on the team understood it, I decided to document our OAuth2 flow through our system when a user attempts to access a protected resource. This proved extremely valuable in helping every member of the team understand the entire flow. When OAuth2-related problems did arise, or something needed to be tweaked, this document was paramount in the team understanding the problem and addressing it quickly. We pride ourselves on going fast… we are Go developers, after all.

What is OAuth2?

You may have heard of the term before, but you have definitely used it. Whenever you sign into something using your social media account (Google, Facebook, etc.), you are most likely using OAuth2. One of the main hallmarks of OAuth2 is granting another service access to some piece of information on your social media account. When you decide to log into a website using your Facebook account, Facebook asks you some questions. These questions are the core of what's important in OAuth2: delegation. They are OAuth2's way of asking you, "Do you trust this third party, and are you OK with sharing this data with it?" Facebook is making you aware of what the third party is requesting and letting you decide whether you are OK with that. You can generally choose which access permissions you want to delegate to the app you are logging into, and you get to see what this third party needs in order to operate. You can certainly tell a third party is mining data if it asks for everything about you simply to log you in. This is a great layer of transparency! When you agree to these permissions, you are delegating access to this third party. If you're alarmed, you should be.

One of the many problems OAuth2 looks to address is transparency. Knowing exactly what has access to your data, and putting that access in your control, is one of the many, many, many reasons I prefer OAuth2. When I want to revoke access, I can! I simply log into my Facebook (or whatever account I used to log in) and see what has access to my account. I can revoke access to anything I want, immediately. This control is precisely why OAuth2 has become omnipresent: fine-grained control of third party access to your private data is in your hands.

Now, this isn't necessarily the case everywhere. Some OAuth2 providers (like Facebook and Google) provide well documented OAuth2 APIs. These APIs define the backbone of access control: scopes. Each OAuth2 authorization request includes a list of these scopes, and each scope represents pre-defined access to your account data. Some of them (like Facebook's email) are generally benign and serve as something equivalent to a username, so the third party knows who you are within its own system. Others (like Google's `` scope) may be used maliciously. In Google's case, this particular scope allows the relying party to "View and manage data transfers between users in your organization." Sounds pretty bad, right? This is why you have the opportunity to review the requested scopes when granting (or denying) authorization to any third party. You get to audit permissions before granting them! What a wonderful solution! So, if you are using an app that manages your Google organization, this scope is perfectly normal. However, there is a caveat to this auditing: in some circumstances, scopes may not be granular enough. A single scope may grant access to more protected data than it suggests.

When a non-granular scope is used, more information than the user expects is shared. This seems to have been the case in Facebook's Cambridge Analytica fiasco. Facebook has since addressed the issue, but it looks like a permissions problem related to third party access and improper scope granularity. This is all speculation, but it appears that one or more scopes provided access to more information than they should have. This loophole would have allowed a third party to access and mine data that was never intended to be shared, but was ultimately accessible. This particular case did not provide transparency for data access. But fear not! There is definitely something you can do about some of this!

Revoking previously granted access to third parties involves logging into your provider account (Facebook, Google, etc.) and checking on existing access. For Facebook, check your Account Settings for Apps and Websites (for Google, check here). You can review which sites and apps have access to your data, and what they have access to. You can also see which access has expired and which has previously been removed. You should review these periodically for anything you don't use anymore. Apps and websites you no longer use may still have access to your data! Cleaning this up takes about a minute and helps lock down your data. While this won't prevent data leakage (as in the Cambridge Analytica scenario), it will maximize protection of your account details.

Beyond authorization delegation, OAuth2 adds security by requiring third party websites and applications to pre-register with the provider. This creates trust between the third party and the OAuth2 provider, and it grants the provider extreme control: the provider can revoke ALL access to the third party by blocking its OAuth2 account. Once this trust is established, the third party is granted an ID and a Secret. These two pieces of information are heavily used in the various OAuth2 flows.

As a third party, you can obtain Access Tokens in a number of ways called grants. The preferred grant flow is the Code Grant flow (detailed below). Others include the Token, Password, and Implicit Grant flows. I won't cover these in this article, but you can read more about them in the OAuth2 Specifications.

Code Grant

The Code Grant is the most secure grant for all parties. It requires the third party to begin the OAuth2 delegation flow by requesting that the OAuth2 provider start a Code Grant. In this flow, the third party directs the user to the OAuth2 provider, where they are prompted to log in and accept (or deny) the requested permissions (via scopes). If you look in the URL, you can see the Client ID, the grant type, and the requested scopes. Once accepted, the OAuth2 provider redirects the user back to the third party's callback handler with an OAuth2 code. The third party must handle this code by exchanging it for an Access Token. This is accomplished by presenting the code received from the OAuth2 provider, the third party's registered ID, and the secret the OAuth2 provider issued to the third party. This proves to the OAuth2 provider that the third party initiated the call and that it isn't an attempt to hijack an OAuth2 code. Once this is received and validated by the OAuth2 provider, the third party receives an Access Token and an optional Refresh Token.

A simplified view of the OAuth2 Code Grant Flow
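The code-for-token exchange described above boils down to a single POST to the provider's token endpoint with a handful of form fields. Here's a minimal Go sketch; the field names follow the OAuth2 spec's code grant, while all the values are placeholders.

```go
package main

import (
	"fmt"
	"net/url"
)

// exchangeForm builds the body of the POST a third party sends to the
// OAuth2 provider's token endpoint to trade its code for an Access Token.
// All argument values in main are invented for illustration.
func exchangeForm(code, clientID, clientSecret, redirectURI string) url.Values {
	return url.Values{
		"grant_type":    {"authorization_code"},
		"code":          {code},         // the code from the callback
		"client_id":     {clientID},     // the third party's registered ID
		"client_secret": {clientSecret}, // proves the third party's identity
		"redirect_uri":  {redirectURI},  // must match the original request
	}
}

func main() {
	form := exchangeForm("abc123", "my-app-id", "s3cret",
		"https://my-app.example.com/callback")
	fmt.Println(form.Encode())
}
```

Note that the secret only ever travels in this server-to-server call, never through the user's browser, which is what makes the Code Grant safer than the alternatives.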


The Access Token has a short life, depending on what the OAuth2 provider has set. The Refresh Token generally has a longer life (around 2 weeks in many cases). This means the third party has unrestricted access to the data you agreed to provide, whenever it wants to retrieve it, until the Access Token expires (or is revoked, for you security-minded people out there). Once an Access Token expires (or is revoked), the third party may attempt to renew it using the Refresh Token by providing the client ID, secret, and Refresh Token to the OAuth2 provider. If access has not been revoked, a new Access Token (and an optional new Refresh Token) is provided, renewing the life of the access. If access has been revoked, neither an Access Token nor a Refresh Token is returned, and the third party is notified that it is no longer authorized. If the Refresh Token is not used before it expires, the entire access automatically expires and the user will need to log in and accept the permissions again.

OAuth2 Refresh Flow

Refresh Tokens have a longer expiration date; Facebook's is 90 days, and any Facebook activity within that app renews it. A third party could access your data every day to keep it up to date, which automatically renews the Access Token on Facebook's servers. A new Refresh Token can also be requested every day to keep that from expiring as well. I know I haven't used Charm Farm in forever, but its access still hasn't expired on Facebook!


I could see an addendum to the OAuth2 spec increasing transparency by allowing an optional expiration the user can adjust on the permission review page, or by OAuth2 providers allowing manipulation of the Access Token and Refresh Token expiration dates. This wouldn't hinder the operation of an app or website, but it would have the added effect of automatically expiring application access you don't use often, at the user's option. It would also lessen the burden on the user of revoking access for long-forgotten apps that are still mooching personal data because they have kept up the refresh cycle. Anything free costs something, and OAuth2 has made it easier to see that the cost is information. Putting more control into users' hands is the logical next step. The best thing you can do right now to stay safe is to be vigilant: periodically review the access other apps and websites have to your data on all your social media accounts. This will reduce the avenues available for leaking private information.