Friday, March 9, 2018

Recent breaking changes in SharePoint Online

If you've ever wondered why SharePoint Online apps/add-ins/products are sold almost exclusively as subscriptions, here is your answer.

We at KWizCom run fully automated tests on several production tenants, and on our own fast-ring release tenant, every night and again after every build is pushed to fast ring or to production.

More often than we'd like, we see glitches or breaking changes pushed to production that make some or all of our code stop working as intended.

Here are a couple of stories from the past 48 hours:

First


Our product manager sets up a live demo of our forms solution. EVERYTHING breaks. Pages load with no settings; after a refresh, with only some settings; and at other times normally, with everything working.

We investigate and notice that SPO (SharePoint Online) is returning a JS file we store under the site assets with different content every time.

We cancel all caching, open the file in the browser - and you won't believe it! We get a different file every time!

We rename the file, so the file at that URL doesn't exist anymore. Is the problem gone? No, it's worse. Now the URL returns 3 different versions of the file, as well as an occasional 404 not found error.

I'm freaking out, calling on all my Microsoft contacts to investigate. Within 6 hours the problem is gone like it never happened.

Was I dreaming? NO! I got the video to prove it:



Now, obviously I can't reproduce it, so I can't give Microsoft additional information. There is no way to know what happened or how to prevent it from happening again.

(Sounds to me like a new caching feature was being tested out and pushed to production, then later pulled back.)
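
For anyone who wants to capture this kind of behavior with more than a video, the check from this story (request the same URL repeatedly with caching bypassed and compare what comes back) can be sketched roughly as below. This is only a sketch in Python, not our actual test code: the function names are made up for illustration, and a real SPO request would also need authentication, which is left out here. The `fetch` parameter exists so an authenticated fetcher can be swapped in.

```python
import hashlib
import urllib.request
from collections import Counter


def fetch_bytes(url):
    """Fetch the URL once, asking caches along the way not to serve a stored copy.

    Assumption: no authentication is needed. For a real SPO tenant, replace this
    function with one that sends the required cookies or tokens.
    """
    request = urllib.request.Request(
        url, headers={"Cache-Control": "no-cache", "Pragma": "no-cache"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


def count_versions(url, attempts=10, fetch=fetch_bytes):
    """Request the same URL several times and count the distinct responses.

    Returns a Counter keyed by the SHA-256 digest of each body (or by an
    "error: ..." label when the request fails, e.g. with a 404).
    """
    seen = Counter()
    for _ in range(attempts):
        try:
            body = fetch(url)
        except Exception as error:
            seen[f"error: {error}"] += 1
            continue
        seen[hashlib.sha256(body).hexdigest()] += 1
    return seen
```

A healthy file should produce a Counter with exactly one key. More than one digest, or a mix of digests and errors, is the symptom described above, and the digests give you something concrete to attach to a support ticket.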

Second

You thought the first story was strange? Try this one. We have a provider-hosted app that customers are required to install in order to get our products working.
We have literally had this app for YEARS with no problems.
Two days ago we get a support call that no one can install our provider-hosted app.
We check and see that our app has install and uninstall remote event receivers, and the event receiver is throwing error 500.

This event receiver code hasn't changed in years, so what happened?
Well, the Visual Studio project auto-generated a built-in method for us that gets the clientContext. It used to either work or return null if the app was not trusted.
Sometime in the last 48 hours it stopped doing that and started throwing an exception instead, blowing up our event handler.
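
The general shape of a fix for this kind of change is to treat "threw an exception" the same way as the old "returned null" case. The sketch below shows that pattern in Python purely for illustration. It is not our event receiver code and not the auto-generated helper: `try_create_context`, `handle_app_installed`, and the `factory` argument are hypothetical names standing in for whatever creates the context in a real project.

```python
import logging

logger = logging.getLogger("event_receiver")


def try_create_context(factory):
    """Call a context factory, mapping any exception to None.

    This restores the older contract described above: a usable context,
    or None when the app is not trusted.
    """
    try:
        return factory()
    except Exception:
        # Log the details for diagnosis, but do not let the handler blow up.
        logger.exception("Context factory raised; treating the app as not trusted")
        return None


def handle_app_installed(factory):
    """Hypothetical install handler that returns a status instead of failing."""
    context = try_create_context(factory)
    if context is None:
        return "skipped: no client context"
    # Work that needs the context would go here.
    return "ok"
```

Whether skipping is the right response when no context is available depends on what the receiver needs to do. The point is only that the handler decides, rather than an unexpected exception turning into an error 500.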


Fun times.
In the first case, there was nothing we could do but wait; in the second, it was a simple fix. But you get my point as to why we need to constantly monitor and push updates for our products?

Now you might better understand why I am a strong advocate of allowing vendors to update apps remotely without forcing each user to install a new package (like they try to make us do in the new SPFx ecosystem).

We literally push hundreds of updates and fixes every month for all our products without bothering the users to install them. I can't imagine we would have happy customers if they needed to install hundreds of updated packages every month...

Now, if you are wondering why we need so many updates, and boasting about your single app that was built X years ago and is still working today, ask yourself what it does. My guess: not much more than saying "hello" or showing the current time and weather... ;)

Hope this provides some insight to those who are looking to support SPO or any cloud service in their products. Sound off in the comments if you have more to add!
