Samb Business Intelligen...
May 30

Surface Computing coming from Microsoft this coming Fiscal Year!
http://on10.net/Blogs/larry/first-look-microsoft-surfacing-computing/
This is pretty exciting, and kind of a surprise that it's coming to the public so soon...

May 23

How to open up a new Report in its own window in Reporting Services
I've heard this request a lot and here's one way to do it...
javascript:void(window.open('http://pathToTheReport', '_blank'))
Drop this in the Jump To URL textbox.

A good demonstration/exercise of Visual Thinking
http://communicationnation.blogspot.com/
Dave Gray, the founder of XPlane and a profoundly visual guy, has posted a great slide lesson from Ryan Coleman on the value and methodology of visual thinking. Anyone who does any sort of modeling or vision shaping should take a look at this site.

May 21

Chess Move Visualization - Influence Waves and Curves for anticipated movement
http://turbulence.org/spotlight/thinking/chess.html
Now this is a great site - you can play against the computer and watch it analyze moves. The chess board will gently pulse to show the influence of the various pieces. In the left image below, you can see waves over the squares around the king and (very lightly) over the squares where the pawns might capture. When the machine (Black) is thinking, a network of curves is overlaid on the board; see image at right. The curves show potential moves--often several turns in the future--considered by the computer. Orange curves are moves by black; green curves are ones by white. The brighter curves are thought by the program to be better for white.

As business processes become automated and personalized by products like Biztalk Server and platforms like Windows Workflow Foundation, a lot of visualization possibilities will emerge that relate to the chess visualization: what was considered, and why a step was taken or not.

May 18

Code Camp 2007 - SQL Server 2005 Data Mining (DM)
Welcome to Code Camp.
I decided to post most of the relevant material here, so that everyone can download it or share it with friends. This session will cover an intro to Data Mining and will also take you through a rapid coding example of using DM for prediction in a heads-up graphical display with DirectX.

For this demo, I thought I would use some data that is *not* business related. I do business stuff all day long - so let's do something that has nothing to do with business, but where we could learn something about DM that can be transferred to business problems. My second objective is to have some fun. So, the data set is Forest Fire data (you can find some great data sets here) - specifically, fires by lat/longitude in Colorado by reason and how much acreage was destroyed. There are over 4800 data points in this data file and it was fun just trying to figure out how to get all that data displayed.

The data set looks like this, nothing too special... FN is a unique fire number, which is quite useful for data mining to keep its wits about itself. Notice that virtually everything we are going to work with is a floating point number - lat, long, and acreage. Only reason, which is coded to things like 1 = lightning, 6 = campfires, etc., is a discrete integer that is not continuous (yeah, Sam, Integers are not continuous).

When you plot this out by lat/longitude in DirectX you get this kind of a visualization. The spheres represent each row from the data set, and the size and color of each sphere show how much acreage was destroyed. The back wall shows 10 years of fire data grouped by year and month, followed by a prediction of what the next 30 days will look like. This is using the Time Series DM Model. There are all kinds of things in the visualization to help you understand what is what. There are whiskers that help you know the exact lat/long for each fire when you brush up against them with the mouse. I also show the reason and numeric value of acreage destroyed in a tool tip.
This is a cool visualization and a great heads-up display for a smoke jumper unit or USDF agency that monitors fires - but what about predicting big fires based on this data? This is where DM comes in. The best models for prediction are the classification algorithms: decision trees, logistic regression and the granddaddy: Neural Nets. Each has its own pros and cons.

Upon building a model using FN as the key and predicting [Acres Total] with lat and long as the inputs, we get a decision tree. It's a little hard to make out, but basically it finds rectangular regions that have lots of fires.

Microsoft's great advantage is that we operationalize this model by allowing you to "join" (yeah, SQL join - that one) your data to this model's truth table - using SQL - so that means Reporting Services, Integration Services, BSM, PPS, Excel - anything can link to this to help you predict values. For example, if I took the data table we originally plotted and joined it to the Data Mining model above like so:

You'll notice that Data Mining gives you some new functions you can call: PREDICT, PREDICTVARIANCE, PREDICTHISTOGRAM, PREDICTPROBABILITY and many others.

I can then traverse this record set and "paint" the height of a cylinder (each cylinder is a fire) with the "prediction" of the acreage destroyed... the green is the result of decision trees and purple is Neural Nets. Interesting how placement is almost identical, but the acreage destroyed prediction is very different. Neural Nets are a great "general" purpose classifier, which might work better on problems that are based on - is it there, or not? Decision Trees and Logistic Regression seem to be better at placement and HOW MUCH BURNED - OR WILL BURN, well you get the idea.
If I gridded out Colorado by 200 even points longitude and 200 even points latitude, I could count the number of fires in those corresponding grid cells and plot the value of each cell to see how much fire occurred there (we are talking counts here, remember, not acreage). We would get this...

But what if I just used the overall grid points and did a prediction against those sparse points - would I get the same thing? Um, no, no you wouldn't. You would, in fact, get this (I visualized it with colors just to show the issue)...

...see, we are dealing with floating points, and the precision of these numbers requires us to be a little more careful than just dividing by 200 and running those points against the DM model.

Here is one solution, but it isn't very speedy... Take the grid approach, but instead of using the sparse grid lines, do a number of jittered samples in each grid square - I chose randomly, but you could also do some sort of sweep. Then I compute the average of the predictions and plot that as the height. This allows you to use better precision floating points in the lat/long area, and the breaks are much cleaner than that nasty mess I showed you above.

Here's the DMX (think SQL) to do this... it's called a singleton query.

"SELECT Predict([Acres Total]) From [Fires] NATURAL PREDICTION JOIN (SELECT " & CStr(flat) & " AS [Latitude], " & CStr(flong) & " AS [Longitude]) AS t"

where flat and flong are doubles (floats) that are the jittered values. The recordset returns the predicted Acres that will be destroyed.

Here's a visualization of the end game... So, clearly this is a map of Colorado, but the surface is perturbed - the height is the acreage predicted to burn, based on 10 years of history. This is with 5 jittered samples per grid point...

...and here it is with 10 jittered samples...
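The jittered-sampling idea can be sketched in a few lines. This is only an illustration in Python, not the code used for the demo: the names `singleton_query`, `jittered_average` and `predict_grid` are hypothetical, and `predict` is a stand-in for whatever actually runs the singleton query against the mining model (the connection code is not shown in this post, so it is left out here too).

```python
import random


def singleton_query(lat, lon):
    """Build the DMX singleton query text for one jittered (lat, lon) point."""
    return (
        "SELECT Predict([Acres Total]) From [Fires] "
        "NATURAL PREDICTION JOIN "
        f"(SELECT {lat!r} AS [Latitude], {lon!r} AS [Longitude]) AS t"
    )


def jittered_average(predict, lat0, lat1, lon0, lon1, samples=5, rng=None):
    """Average predict(lat, lon) over random points inside one grid square."""
    if rng is None:
        rng = random.Random()
    total = 0.0
    for _ in range(samples):
        lat = rng.uniform(lat0, lat1)  # jitter inside the square,
        lon = rng.uniform(lon0, lon1)  # not on the sparse grid lines
        total += predict(lat, lon)
    return total / samples


def predict_grid(predict, lat_min, lat_max, lon_min, lon_max,
                 cells=200, samples=5, seed=0):
    """Return a cells x cells list of averaged predictions (surface heights)."""
    rng = random.Random(seed)
    dlat = (lat_max - lat_min) / cells
    dlon = (lon_max - lon_min) / cells
    grid = []
    for i in range(cells):
        row = []
        for j in range(cells):
            row.append(jittered_average(
                predict,
                lat_min + i * dlat, lat_min + (i + 1) * dlat,
                lon_min + j * dlon, lon_min + (j + 1) * dlon,
                samples, rng))
        grid.append(row)
    return grid
```

With a real model, `predict` would send `singleton_query(lat, lon)` to the server and return the single predicted value. Since every sample is one round trip, the cost grows as cells x cells x samples, which is why saving the resulting grid and rendering from the saved copy is attractive.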
This took about an hour to create on a brand new HP laptop with 4GB of RAM and a dual core 2.16GHz processor - so it won't win any speed records, but you can save the results and then the rendering would be immediate. This is such a cool visual that I will make a post later on with all the details...

Finally, I wanted to show you how we get that prediction of the next 30 days on the back wall of the original visual... The Time Series has a great function that lets us peer into the future a little, or 30 days in this case...

"SELECT FLATTENED PredictTimeSeries([Acres Total],30) From [DateView]"

The *FLATTENED* clause lets me take a nesting and materialize it as a recordset that I can traverse. In this case each record is the Prediction of Acreage to be destroyed, starting tomorrow (based on the last data point) and going 30 days into the future.

This session is only an hour long, so I will be flying as I cover this material - this is a great technology that has uses virtually everywhere. SQL Server 2005 really makes it much more approachable for the average developer.

I want to know more - where can I go?
Drop me any thoughts you may have...

May 16

Visualize the World Population with a TreeMap
http://www.hivegroup.com/world.html
Bill Morein, an engineer on Visio, pointed me to these visualization examples that utilize treemaps...

SQL Server "Katmai" will support Spatial Data Types
Last week at the BI Conference they announced that Katmai - the next version of SQL Server (currently in CTP and slated for a 2008 release) - will support spatial capabilities. Here are a few known details...
When you add these types of features and the ability to call a web service like www.local.live.com you get a pretty interesting environment for business intelligence questions that center on "where?".
Update: Here's an interesting post to get excited about: http://blogs.msdn.com/isaac/archive/2007/05/16/sql-server-spatial-support-an-introduction.aspx

Visualizations of Digg Activity
Digg Labs and Brian Shaler have some pretty neat visualizations of blog activity and "interest". I am especially fond of the swarm visualization.
http://brian.shaler.name/digg/heatmap/
As more and more companies capture visits and linkages over short periods of time, and as these social networks become powerful in their spending ability and influence, these types of visualizations will become commonplace.

"You can explain anything with Animation" - Walt Disney

May 14

Mike Gannotti's Sharepoint/Silverlight Integration Contest
http://sharepoint.microsoft.com/blogs/mikeg/Lists/Posts/Post.aspx?ID=248
Mike used to work here in the Mid-Atlantic and is, without a doubt, one of the best Sharepoint resources in the company. Now he is down in the South East and creating quite a hubbub with his Sharepoint Blog. He is quite a versatile guy with deep knowledge around taxonomy, search, web parts and integrating digital media for training and conferencing. Take a look at his contest - he's even giving away a pretty valuable prize.

May 11

Add "minibars" to your SQL Server Reporting Services Reports
Over the last few months I have noticed a "fad" in BI visualization: miniature bars added to report rows that show magnitude of change. They don't have any axes and they are relative to each other, so in many ways they are similar to Ed Tufte's Sparklines. Here's an example of one from www.cnbc.com

They are quite useful and eye-catching. Here's a way to make your own, based on the one above. Here's what we will end up with...

...and here's how to do it step by step...

Here's a Data Table in SQL 2005 we can use...

So, here we are after just general layout and font settings (size=8, Arial). The Modulus Row coloring is a trick - apply the following expression to Background Color of the Table rows...
=iif(RowNumber(Nothing) Mod 2, "Gainsboro", "Silver")

Now it's time to do the bitmaps, which are Valued expressions based on positive and negative values. Each Bitmap is valued as follows:

=iif(Fields!Last_Month.Value<0, "quotedown","quoteup")

- where the quoted name is the name of the image. Don't forget to set Sizing to "Clip" and use the Padding Values in the Image Property to get everything looking nice. So that brings us to this look...

The Numeric Values are formatted in Bold and the Color is an expression:

OK, so let's get the minibars in there now... Click on the Report Menu in Layout mode and select Properties... Now pick the Code tab - you can code anything you want as a function - I used VB...
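As a rough illustration of what a minibar helper might do, here is a minimal sketch in Python rather than VB. The name `mini_bar` and its parameters are hypothetical, and the sketch makes two assumptions: negative values are drawn by magnitude (with color carrying the sign), and values larger than the maximum are clamped to a full-width bar.

```python
def mini_bar(value, max_value, width=10, glyph="n"):
    """Return a run of `glyph` whose length is proportional to |value| / max_value.

    Rendered in a symbol font, the repeated glyph reads as a solid bar.
    """
    if value is None or max_value is None or max_value <= 0:
        return ""  # nothing sensible to draw
    ratio = min(abs(value) / max_value, 1.0)  # clamp to the cell width
    return glyph * int(ratio * width + 0.5)   # round half up to whole characters


# Example: a value at half the maximum fills half of a 10-character cell.
print(mini_bar(5, 10))
```

The function only returns text; the bar effect comes entirely from the font applied to the report cell, which keeps the helper trivial to call from an expression.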
Apply the Expression to the empty fields to the right of each value like so:

=code.sMiniBar(Fields!Last_Month.Value, max(Fields!Last_Month.Value), 10)

The parameters are the value to plot, the maximum of the range of values (conveniently handled by the max() function from Reporting Services), and the number of characters to map to in the field (10 is a good value). Here's what you end up with...

"N's??!", yes N's! When you apply WingDings (or any other symbolic font) you get bars. Now we can set the color based on value like we did with the values earlier.

Something I always struggle with is the use of color in reports - most of the colors that are available in the Reporting Services Palette are garish - too intense and too bright. So, I used custom settings to get these more muted colors, which I think are easier to read and more pleasing to the eye. You would also be surprised how much difference the border outlines on the cells and rows can make to a report.

I like mine better than the original with two exceptions - I still need to figure out how to get the icon, the numeric and the mini-bar a little more unified/centered under the column titles, and I don't like the fact that the images are plainly not accepting the background coloration. Any advice on those? BTW, you'll notice that even though they are spaced farther apart in my implementation, they do line up much more nicely from a justification perspective.

Here's another font suggestion - use WebDings and pass "g" back in the code instead of "n" - set the font size to 6pt to get a narrow, continuous bar.

These kinds of reports would make great scorecard/dashboard parts in Sharepoint.

May 10

OfficeWriter now part of Microsoft Stable
http://seattlepi.nwsource.com/business/315084_officewriter10.html
It will be delivered in the "Katmai" release - here's more about what's in the next SQL release...
http://www.microsoft.com/presspass/press/2007/may07/05-09KatmaiPR.mspx

"Time as a Flow" Visualization method
An algorithmically generated visualization based on statistical information provided by Last.fm software: every song listened to by a particular user over an 18 month period of time. Each colored band represents a musical artist, progressing left to right. The span is wider when listening was more frequent, and skinnier when it was not.

Market Radar
7 years of data for the S&P 500 - I like the comparison of the dot com Boom and Bust periods.

May 09

Code Camp Preview 3 - It's Alive!
Most of the legends are now in. I've tried to follow the advice of Ed Tufte on this visualization - many of his rules are as follows:
You'll notice the cool whiskers I added (blue lines), which help a user know exactly where they are pointing the mouse from a lat/long perspective. I also added a new Data Mining Model for Logistic Regression (shown as pink cylinders). Logistic Regression seems to work better than Neural Nets with regard to Acreage Lost (depicted by height) - which makes sense because it is designed to work better with continuous data than discrete data.

In the next few days I will complete the visualization by adding a time line plot of fires by year, month and day on the back wall and use the Time Series DM model to predict some future steps. I've learned a lot about the Data Mining algorithms in SQL 2005 and about using DMX while doing this project. I hope you've enjoyed the progress.

May 08

Code Camp Data Mining Preview 2
OK - I made a bit of progress getting the DM Algorithms integrated with the display. I added the legends for the Rubber Sheet (Count of Fires in grid cells)...

I built three DM Mining Models - two Decision Trees and one Neural Net. Here's a lift chart that I used to understand the efficiencies of both...

Off the bat I noticed that the Neural Net seems less sensitive to magnitudes (like acreage destroyed) and more sensitive to presence of a value (in this case lat/long), whereas the Decision Trees seem to be quite sensitive to Magnitude. Hard to describe - better to see...

Just to remind you what we are doing - forest fires plotted by lat/long in Colorado... You can see a few big fires that swallow all kinds of acreage and then many small fires that burn up small patches of acreage.

Here's the Neural Net View - Height of each purple cylinder (Predict([Acres Total])) is how much acreage is destroyed at lat/long...

Here's the Decision Tree view (Green Cylinders) - same data, just a different Algorithm - I overlay to show you how DT's seem to be more sensitive on Acreage Destroyed, but they also predict *Negative* damage in some areas...
Here are the negative values hiding "under" the Colorado polygon...

So, overall here's my take - Neural Net = great generalized pattern recognizer, more sensitive to presence than magnitude; Decision Trees = useful at both presence (lat/long) and magnitude - perhaps overly so.

Over the next few days I'll get the Legend and Interaction features cleaned up. Drop me comments if you want...