
Showing posts with label Bandwidth Matters.

Monday, October 14, 2013

10 successful big data sandbox strategies


Keep in mind these ten strategies when building and managing big data test environments. 
Being able to experiment with big data and queries in a safe and secure “sandbox” test environment is important to both IT and end business users as companies get going with big data. However, setting up a big data sandbox test environment differs from establishing traditional test environments for transactional data and reports. Here are ten key strategies to keep in mind for building and managing big data sandboxes:

1. Data mart or master data repository?

The database administrator needs to decide early on whether test sandboxes will use data directly from the master data repository that production uses, or whether it is better to replicate sections of this data into separate data marts reserved for testing only. The advantage of using the full repository is that tests run against the same data production uses, so test results will be more accurate. The disadvantage is that test workloads can contend with production for that data. With the data mart strategy, you don’t risk contention with production, but the marts will likely need periodic refreshes to stay reasonably synchronized with production data if they are to closely approximate the production environment.

2. Work out scheduling

Scheduling is one of the most important big data sandbox activities, because it keeps sandbox resources in productive use. A common approach is to run a group of smaller jobs concurrently while a longer job is running, so that resources are allocated to as many jobs as possible. The key is for IT to sit down with the various user areas that use sandboxes so that everyone understands the schedule, the rationale behind it, and when they can expect their jobs to run.
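
The idea of fitting smaller jobs around a longer one can be sketched as a greedy list scheduler. This is a minimal Python sketch under stated assumptions: the cluster is modeled as a fixed number of interchangeable nodes, each job's node count and run time are known estimates, and the `Job` class, the `schedule` function and the job names are hypothetical, not part of any particular scheduler product.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    nodes: int    # nodes the job occupies while running (assumed known)
    hours: float  # estimated run time (assumed known)


def schedule(jobs, capacity):
    """Greedy list scheduling: consider the longest jobs first and, whenever
    nodes are free, start every waiting job that fits. Returns {name: start hour}."""
    waiting = sorted(jobs, key=lambda j: j.hours, reverse=True)
    running = []  # min-heap of (finish_hour, name, nodes)
    free, now, starts = capacity, 0.0, {}
    while waiting or running:
        # Fill idle nodes with any waiting job small enough to fit.
        for job in list(waiting):
            if job.nodes <= free:
                waiting.remove(job)
                free -= job.nodes
                starts[job.name] = now
                heapq.heappush(running, (now + job.hours, job.name, job.nodes))
        if not running:
            raise ValueError("a job needs more nodes than the cluster has")
        # Jump to the next job completion and release its nodes.
        now, _, nodes = heapq.heappop(running)
        free += nodes
    return starts


# Hypothetical workload on an 8-node sandbox cluster.
jobs = [
    Job("churn-model", nodes=6, hours=10),
    Job("ad-hoc-query", nodes=2, hours=1),
    Job("report-test", nodes=2, hours=2),
    Job("etl-check", nodes=1, hours=0.5),
]
starts = schedule(jobs, capacity=8)
```

Here the 10-hour job and the 2-hour job start together at hour 0, and the two short jobs start at hours 2 and 3 as nodes free up, all while the long job is still running.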

3. Set limits

If months go by without a specific data mart or sandbox being used, business users and IT should have mutually agreed policies in place for purging these resources and returning them to a pool from which they can be re-provisioned for other activities. The test environment should be managed as carefully as its production counterpart, so that resources stay allocated only while they are actively being used.
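
A purge policy like this can be sketched as a simple idle-time check. In this minimal Python sketch, the inventory of last-use dates, the function name `sandboxes_to_reclaim` and the 90-day default are all assumptions for illustration; the actual limit is whatever business users and IT agree on.

```python
from datetime import date, timedelta


def sandboxes_to_reclaim(last_used, today, max_idle_days=90):
    """Return the names of sandboxes idle longer than the agreed limit, oldest first.

    last_used maps sandbox name -> date it was last used. The 90-day default is
    an assumed policy value, not a recommendation.
    """
    cutoff = today - timedelta(days=max_idle_days)
    idle = sorted((used, name) for name, used in last_used.items() if used < cutoff)
    return [name for _, name in idle]


# Hypothetical inventory of sandboxes and their last-use dates.
last_used = {
    "marketing-mart": date(2013, 6, 1),
    "finance-mart": date(2013, 10, 1),
    "ops-sandbox": date(2013, 3, 15),
}
to_reclaim = sandboxes_to_reclaim(last_used, today=date(2013, 10, 14))
```

The function only produces a candidate list; whether a flagged sandbox is actually purged remains a policy decision.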

4. Use clean data

One of the first big data pipeline jobs should be preparing and cleaning data so that it is of reasonable quality for testing, especially if you are using the “data mart” approach. It is a bad habit (dating back to testing for standard reports and transactions) to fill test regions with incomplete, inaccurate, or even broken data simply because nobody cleaned it up before dumping it there. Resist this temptation with big data.
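
As a minimal sketch of such a cleaning job, the Python function below drops incomplete rows, rows with unparseable values, and exact duplicates before data reaches a test region. The record layout (`customer_id`, `amount`) and the function name `clean_records` are hypothetical; a real pipeline would apply equivalent rules at scale in its own tooling.

```python
def clean_records(records):
    """Return cleaned copies of raw rows: trimmed, complete, numeric, de-duplicated.

    The field names customer_id and amount are hypothetical.
    """
    seen = set()
    cleaned = []
    for raw in records:
        # Trim stray whitespace from string fields.
        row = {k: v.strip() if isinstance(v, str) else v for k, v in raw.items()}
        # Drop incomplete rows.
        if not row.get("customer_id") or row.get("amount") in (None, ""):
            continue
        # Drop rows whose amount cannot be parsed as a number.
        try:
            row["amount"] = float(row["amount"])
        except (TypeError, ValueError):
            continue
        # Drop exact duplicates.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


raw_rows = [
    {"customer_id": " C1 ", "amount": "19.99"},
    {"customer_id": "C1", "amount": "19.99"},  # duplicate once trimmed
    {"customer_id": "", "amount": "5"},        # incomplete
    {"customer_id": "C2", "amount": "n/a"},    # broken value
    {"customer_id": "C3", "amount": "7"},
]
clean = clean_records(raw_rows)
```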

5. Monitor resources

Assuming big data resources are centralized in the data center, IT should set resource allowances and monitor sandbox utilization. One area that often needs close attention is the tendency to over-provision resources as more end-user departments take up sandbox activities.
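
Allowances and monitoring can be combined into a periodic check. This Python sketch assumes allowances and usage are both measured in nodes and that usage figures come from some existing monitoring source; `check_allowances`, the department names and the 80% warning threshold are illustrative assumptions.

```python
def check_allowances(allowances, usage, capacity, warn_pct=80):
    """Compare each department's sandbox usage with its allowance (in nodes) and
    check whether the allowances together exceed the cluster's capacity."""
    alerts = []
    for dept, limit in sorted(allowances.items()):
        used = usage.get(dept, 0)
        if used > limit:
            alerts.append(f"{dept}: over allowance ({used}/{limit} nodes)")
        elif used * 100 >= warn_pct * limit:  # integer math avoids float rounding
            alerts.append(f"{dept}: near allowance ({used}/{limit} nodes)")
    committed = sum(allowances.values())
    if committed > capacity:
        # Over-provisioning: more capacity has been promised than exists.
        alerts.append(f"allowances total {committed} nodes, cluster has {capacity}")
    return alerts


# Hypothetical allowances and current usage on a 50-node cluster.
alerts = check_allowances(
    allowances={"marketing": 20, "finance": 10, "ops": 30},
    usage={"marketing": 22, "finance": 8, "ops": 5},
    capacity=50,
)
```

The last check targets the over-provisioning tendency specifically: each new department's allowance may look reasonable on its own while the total quietly exceeds what the data center can supply.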

6. Watch for project overlap

At some point, it makes sense to have a corporate “steering committee” for big data that tracks the various sandbox projects going on throughout the company to ensure that there is no overlap and/or duplicated effort.  

7. Consider centralizing compute resources and management in IT

Some companies start out with big data projects in specific departments but quickly learn that they can’t work on big data, do their daily work, and then manage compute resources, too. Ultimately, they move the equipment into the data center for IT to manage. This frees them to focus on the business and ways that big data can bring in value.

8. Use a data team

Even in sandbox experimentation, it’s important to have the requisite big data skills team on hand to assist with tasks. Typically, this team consists of a business analyst, a data scientist, and an IT support person who can fine-tune hardware and software resources and coordinate with database specialists.

9. Stay on task with business cases

It’s important to infuse creativity into sandbox activities, but not to the point where you lose sight of the business case you set out to bring value to.

10. Define what a sandbox is!

Participants from the business side, in particular, might not be familiar with the term “sandbox” or what it implies. Like a childhood sandbox, a big data sandbox is a place to play and experiment freely with big data, but with purpose. Part of that purpose is abiding by the sandbox’s ground rules, such as when, where and how to use it, and experimenting in ways that produce meaningful results for the business.

Tuesday, September 6, 2011

Bite some bits from your bandwidth

With broadband connections aplenty, it can be easy to forget that, at the end of the day, someone is paying for the bits and bytes transferred around the globe. The price of buying an app, not the cost of using it, is usually foremost on a developer’s mind.

The overall cost of an app can be broken down into three parts: the app, bandwidth, and time.

  • Cost of the app: People consider the upfront cost of the app, plus any upgrades and/or subscriptions they need for the app to do what they want.
  • Cost of bandwidth: People are generally aware of bandwidth usage; however, they often discover the real cost of their apps only after excessive consumption. This is especially problematic when a large bandwidth bill arrives unexpectedly.
  • Cost of time: The time required to use an app is worth money. Slow or malfunctioning apps are not only frustrating but also consume valuable time that could be put to use elsewhere. Apps that are difficult to learn may be abandoned, leaving the user with nothing but wasted time and frustration.

Video-based apps are a great example of where people give heavy consideration to the cost of the app, but not as much to the other two factors until their bandwidth bill arrives or they look at the clock.

We can assume that the upfront cost of your app and upgrades is more or less fixed, but what about your bandwidth usage? Are you saving time through your use of the Internet and your app, or are you adding bandwidth and time? Apps can quickly rack up the bandwidth bill if data transfer is not managed and monitored during development. There are many places where optimizations can save bandwidth, and app developers should make this a priority, not an afterthought. Here are some ways you can save bandwidth and time in your apps.

1: In-app processing

Processing content within the app before sending it to a server can reduce bandwidth consumption. If content to be uploaded can be compressed or otherwise optimized, the shorter transfer saves both bandwidth costs and time spent communicating over the Internet. If the app can perform a data processing step about as fast as the server can, it may be better to do that processing in the app before sending the data, rather than sending raw data to the server and waiting for the results.
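
For the compression case, a minimal Python sketch might gzip a JSON payload before upload and fall back to the uncompressed form when compression does not help. It assumes a JSON API and a server that accepts gzip-encoded request bodies (many servers need explicit configuration for this); `prepare_upload` and the sample readings are hypothetical.

```python
import gzip
import json


def prepare_upload(payload):
    """Serialize a payload compactly and gzip it when that makes it smaller.

    Returns (body, headers). Sending Content-Encoding: gzip on a request assumes
    the receiving server is configured to decompress request bodies.
    """
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    compressed = gzip.compress(raw)
    if len(compressed) < len(raw):
        headers["Content-Encoding"] = "gzip"
        return compressed, headers
    # Tiny payloads can grow under gzip (header and trailer overhead), so send them as-is.
    return raw, headers


# Hypothetical batch of repetitive sensor readings, which compresses well.
readings = {"readings": [{"sensor": "temp", "value": 21.5}] * 200}
body, headers = prepare_upload(readings)
```

Checking the compressed size before choosing matters because gzip adds fixed overhead; for very small requests, compression costs CPU time and saves nothing.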

2: Server-side processing

Oftentimes it may be better to have processing handled on the server side. In this case, you will want to minimize the number of times your app has to communicate with the server. If you combine multiple data requests into one request containing multiple sub-requests that the server parses, you save round trips and time. If you can do the same for returned data, you reap the time savings twice.
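
Request batching can be sketched with a simple JSON envelope: the app packs several sub-requests into one body, and the server answers them all in one combined response. The envelope format (`batch`, `op`, `args`, `results`) and the functions `build_batch` and `handle_batch` are hypothetical, not a standard protocol; in practice the two functions would sit on opposite ends of a single HTTP call.

```python
import json


def build_batch(sub_requests):
    """App side: pack several sub-requests into one request body."""
    return json.dumps({"batch": [{"id": i, **req} for i, req in enumerate(sub_requests)]})


def handle_batch(body, handlers):
    """Server side: parse the batch, run each sub-request, return one combined response."""
    results = []
    for sub in json.loads(body)["batch"]:
        handler = handlers.get(sub["op"])
        if handler is None:
            results.append({"id": sub["id"], "error": f"unknown op {sub['op']}"})
        else:
            results.append({"id": sub["id"], "result": handler(**sub.get("args", {}))})
    return json.dumps({"results": results})


# Hypothetical server operations and one batched round trip.
handlers = {"square": lambda x: x * x, "greet": lambda name: f"hi {name}"}
body = build_batch([
    {"op": "square", "args": {"x": 4}},
    {"op": "greet", "args": {"name": "Ann"}},
    {"op": "delete_all"},
])
response = json.loads(handle_batch(body, handlers))
```

Each sub-request carries an `id` so the app can match results to requests, and a failing sub-request reports an error without sinking the rest of the batch.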

3: Count your bytes

For server/app data exchange, JSON and SOAP are good technologies, but you may be able to save bandwidth with a format of your own devising. This applies to both app-side and server-side processing and can be helpful in situations requiring a live data feed. I have used delimited text for returning results from a server to an app. Splitting delimited text can be quicker and easier than parsing JSON or SOAP if the data is simple enough.
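
To illustrate the delimited-text approach, the sketch below encodes simple feed records as pipe-delimited lines, decodes them again, and can be compared with the size of the same records as JSON. The field layout (`symbol`, `price`, `volume`) and the `|` delimiter are assumptions for the example; both ends must agree on field order in advance.

```python
import json

FIELDS = ("symbol", "price", "volume")  # hypothetical record layout, agreed by both sides


def to_delimited(rows):
    """One record per line, fields separated by '|' in FIELDS order."""
    return "\n".join("|".join(str(row[f]) for f in FIELDS) for row in rows)


def from_delimited(text):
    """Split lines and fields back into typed records."""
    rows = []
    for line in text.splitlines():
        symbol, price, volume = line.split("|")
        rows.append({"symbol": symbol, "price": float(price), "volume": int(volume)})
    return rows


rows = [
    {"symbol": "ABC", "price": 12.5, "volume": 300},
    {"symbol": "XYZ", "price": 7.25, "volume": 1200},
]
delimited = to_delimited(rows)
as_json = json.dumps(rows)
```

The trade-off is fragility: a field containing `|` or a newline breaks the format, and adding a field means updating both sides, which is why this suits only simple, stable data.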

4: Cache your content

Caching content on servers has long been standard practice for reducing bandwidth. Applying caching techniques to app development is equally valuable for performance and efficiency. Constantly pinging servers can usually be avoided unless a live data feed is required. If you don’t need a live feed of data, cache locally on the device whenever possible.
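
A minimal local cache with a time-to-live can be sketched as follows: responses are reused until they expire, so the app contacts the server only on a miss. This Python version keeps entries in memory and takes an injectable clock so it can be tested without waiting; the `TTLCache` class, the forecast example and the 60-second TTL are illustrative assumptions, and a real app would likely persist entries to device storage.

```python
import time


class TTLCache:
    """Reuse fetched values until they are older than ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key, fetch):
        """Return the cached value for key; call fetch() only on a miss or after expiry."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: no network traffic
        value = fetch()
        self._entries[key] = (now + self.ttl, value)
        return value


# Simulated clock and fetch function so the behavior can be checked without a network.
fake_now = [0.0]
fetch_count = [0]


def fetch_forecast():
    fetch_count[0] += 1
    return "sunny"


cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])
first = cache.get("forecast", fetch_forecast)   # miss: fetches
second = cache.get("forecast", fetch_forecast)  # hit: served from cache
fake_now[0] = 61.0
third = cache.get("forecast", fetch_forecast)   # expired: fetches again
```

The TTL is the knob that trades freshness for bandwidth: the longer it is, the fewer requests the app makes, at the cost of showing older data.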

5: Optimize your media

Apple has already taken steps that directly affect bandwidth use and streaming within apps. If you plan to use video in your apps, make sure it is encoded with the best settings for the actual content. The best experience for an end user comes from well-tuned streams. If your content is static and manageable in size, consider bundling it within the app rather than delivering it through an in-app download or streaming.

Summary

If the bandwidth usage of your app has not been optimized, then the overall cost of your app has not been optimized. If you can reduce the amount of bandwidth your app is using, you can also reduce the amount of time needed for users to use your app, and this will add value overall. Broadband access may be more available than ever, but developers should still consider bandwidth optimization from the beginning, for the benefit of end users and the Internet in general.