30 second guide to data warehousing
Many years ago, my horrid manager refused me a wonderful opportunity to go on an all expenses paid training course all about data warehousing in some exotic location.
I was moaning about this to a colleague over lunch. She was an ex-teacher and happened to work in the prestigious data warehousing consultancy group.
'There, there, Norman. Don't cry. Tell me exactly what you wanted to learn from this course ?'
'Well Sue, I just feel so stupid. I don't even know what a fact table is, or a slowly changing dimension, let alone a star schema - all that fancy data warehousing terminology'.
'Shut up and listen. You buy a sandwich in Tesco. The sandwich costs 2.55 GBP. You have a table called TRANSACTION with a column called PRICE. There are other tables called PRODUCT, REGION, STORE, DATE and CAMPAIGN. There are a load of foreign keys from the fact table to the dimensions and the data model is highly normalised.'
'The TRANSACTION table is a fact table because it records a fact - an event that actually happened. Fact tables tend to be large. Just think of all those massive queues for all the checkouts at all the Tesco stores.'
'The other tables are called dimensions - these tables tend to be smaller and describe elements of the business and allow managers to report on sales by product/region/store/campaign/month/year/quarter.'
'Oh I see but what about a star schema ?'
'Draw a picture with the fact table in the middle and the dimension tables around the edges. Connect the tables together. What do you see ?'
'Oh I see. A pretty star. OK then. What about a snowflake ?'
'Draw 7 stars and join them up. What do you see ?'
'Oh I see. A lovely snowflake. Thanks a lot, Sue. That really has been very useful.'
'No problem. Data warehousing isn't actually that hard.'
'Now what is the Pareto Principle ?'
Unfortunately, my helpful teacher suddenly remembered she had an urgent meeting to go to and the '30 second Guide to CRM' was postponed.
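For anyone who prefers typing to drawing, Sue's sandwich example can be sketched in a few lines of Python using the built-in sqlite3 module. This is only a minimal illustration, not how Tesco (or anyone else) actually does it. The table and column names (sales_transaction, product, store, price_pence) and the sample rows are my own assumptions; the fact table is renamed because TRANSACTION is an SQL keyword, and the price is held in pence so the totals stay exact.

```python
import sqlite3


def build_star_schema():
    """Create a tiny in-memory star: one fact table, two dimensions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimension tables: small, descriptive (names are hypothetical).
        CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE store (store_id INTEGER PRIMARY KEY, city TEXT);

        -- Fact table: one row per event, with foreign keys to the dimensions.
        CREATE TABLE sales_transaction (
            transaction_id INTEGER PRIMARY KEY,
            product_id INTEGER REFERENCES product(product_id),
            store_id INTEGER REFERENCES store(store_id),
            price_pence INTEGER
        );

        -- Made-up sample data: the 2.55 GBP sandwich sold twice, one coffee.
        INSERT INTO product VALUES (1, 'Sandwich'), (2, 'Coffee');
        INSERT INTO store VALUES (1, 'London'), (2, 'Leeds');
        INSERT INTO sales_transaction VALUES
            (1, 1, 1, 255), (2, 1, 2, 255), (3, 2, 1, 150);
        """
    )
    return conn


def sales_by_product(conn):
    """Report number of sales and total takings (in pence) per product."""
    return conn.execute(
        """
        SELECT p.name, COUNT(*), SUM(t.price_pence)
        FROM sales_transaction AS t
        JOIN product AS p ON p.product_id = t.product_id
        GROUP BY p.name
        ORDER BY p.name
        """
    ).fetchall()


if __name__ == "__main__":
    for name, sales, total_pence in sales_by_product(build_star_schema()):
        print(name, sales, total_pence)
```

Swapping the join to the store table gives sales by city instead, which is the whole point of the dimensions: the fact table stays the same and the managers pick the angle.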
Rule is dead, long live Rule
A long day tuning SQL queries using Siebel 7.8 and Oracle 10gR2...
We used the Siebel recommended settings (TechNote 582).
We used the Oracle recommended settings.
We gathered table statistics.
We gathered index statistics.
We gathered column histograms.
We dropped statistics on empty tables (Alert 1162).
We set some miscellaneous (magic) underscore parameters to encourage CBO to use the correct index.
We pored over 10053 trace files.
We used a 15 year old, deprecated, desupported optimizer technology to reduce a complex 27 table (outer) join query with a subquery to subsecond from an hourglass.
We opened an SR with Oracle Technical Support.
We opened an SR with Siebel Technical Support.
We used a stored outline.
We went home.
helping people read books
Someone recently asked me at a dinner party: 'So, Norman, tell me what you do in life ?'. I spontaneously replied: 'I help people read books'. The lady (for it was a she) exclaimed: 'Oh how absolutely fabulous. You are a teacher'. 'Err, well, no. I actually work in IT'. 'Oh I see. You work in training. Why didn't you just say so ?' 'Err, well no. I am a sort of IT consultant'.
Anyway, after an embarrassing stony silence, thankfully I managed to steer the conversation to the safer domain of the wide range of choices for secondary school education in our locality. This fascinating subject occupied us right through until the dessert and coffee were served.
But the point I was trying to make was that Siebel and Oracle are incredibly large, complicated, wide ranging software products. I have worked with Siebel for three years and Oracle for a little longer but there are still so many areas and modules in both products that I have no practical experience of whatsoever.
I remember once reading Tom Kyte stating that he did not have access to the Oracle source code, nor did he have a hotline to RDBMS engineering. The basis of his extensive Oracle knowledge was primarily the documentation set. I remember being hugely impressed by this simple statement. [ Sorry I did look but failed to locate the reference ]
I am a Siebel 'consultant' trying to help people use Siebel more effectively. Most of the information needed to do that is actually contained in the documentation. The only problem is that the 'documentation' is simply overwhelming as it includes the manuals, FAQs, Alerts, Release Notes, Service Requests etc etc.
I have a couple of advantages: Firstly, I am continually exposed to a wide variety of different Siebel related issues day after day so I do have a degree of experience of real-world problems (and hopefully the resolution).
Secondly, and more importantly, I do have access to a network of highly talented, intelligent individuals with far more experience and intelligence than yours truly. Now this wouldn't be an advantage unless that group of people were prepared to share their knowledge and I am pleased to say that they are. This isn't necessarily true at all companies I have worked for.
Normally, I lug my heavy laptop, hanging over my shoulder, attached to my body like a young helpless infant, all around Europe. Today I was in Stockholm and the weather was unusually hot (30°C). To reach the office, I had to take a train and a tube in the morning rush hour. Consequently, I left the laptop behind in the hotel and arrived onsite free from back pain and feeling blissfully liberated.
I told the customer that we would purely be using the public documentation that is freely available to me and him. No hidden cheat-sheets, no private internal emails, no top tips from engineering. He was impressed (I think).
Then, of course, inevitably, we hit a very obscure, bizarre problem that neither of us had encountered before, so it was time to make another call on that network.
history of Oracle
A couple of people stumble across this blog searching for the 'History of Oracle' but ultimately go away disappointed.
For those people, there is a brief but interesting timeline (covering 1977 to 2001) detailing the development of Oracle Corporation in a free screensaver available from Club Oracle.
The screensaver is the one titled 'Oracle Defining Moments - 25 Years of Technology Innovation'.
state of the database nation
A Gartner/IDC report summarising the state of the database market in 2005 contains some interesting nuggets of information.
The database market is still growing at 9.4% (which surprised me a little).
OpenSource databases account for less than 1% of the market but are growing fast (47%).
The Linux platform (thanks mainly to Oracle) is showing the strongest growth (84%).
Despite these two statements of fact, Oracle are not perturbed by the threat of OpenSource (pass the salt cellar).
Market share:
- Oracle - 44.6%
- IBM - 21.4%
- Microsoft - 16.8%
- OpenSource (MySQL, Ingres) - less than 1%