Thursday, February 28, 2008

Live Site Issue: File Names (Resolved)

We are currently working to resolve an issue with downloads in which users are prompted to download the wrong file. Our team has identified the cause and is working to fix it as quickly as possible.


UPDATE 12:38 PM EST - Issue Resolved

The issue with identical file names has now been resolved. It turned out to be a bigger problem than simple file names. A new script that we moved from the development environment to the live production server had a seemingly small bug that prevented it from limiting a change to a single user's account, so the change was applied site-wide instead. The only recourse was to remove the offending script until it could be examined closely and to restore the data from a backup.

This is where things got a little messy. Our system backs up data regularly in case something like this happens. However, some of the backups already contained the incorrect data. Luckily, the backup taken at 6:00 AM EST today had the correct data. In a perfect world, we would be able to restore only the single table and fields affected by the bug, but unfortunately, that is not possible. Databases are generally monolithic creatures, with one kind of data, like product information, intimately tied to other tables, like user and order information. The only reliable solution is to restore the entire data set to the last known good state, which means the 6:00 AM backup. As a result, there is currently a data gap from 6:00 AM EST until 12:00 noon today that cannot be recovered at this time.

Ultimately, this is a good thing: it exposed weaknesses in our data-loss recovery process that need to be improved, and I am putting those improvements at the top of our task list. Changes will include a better testing environment before scripts go live, a stricter QC check when a script moves to the live server, and a faster, automated data and site restore option. It took a few hours to restore the data when it should have taken only a few minutes. We didn't want the six-hour data gap, but in hindsight, we should have simply accepted it for the time being. We spent far too long looking for ways around the gap, to no avail.

Finally, the system should be working properly now. If it is not, we will be on it all day. For sellers: orders placed during the data gap will not be recovered; unfortunately, this is unavoidable. You can still send the downloads to those customers manually using the Add Download Email link on the Product Detail page in your account.

Thank you for bearing with us during this hectic time,